RAG-Powered Knowledge Base
Problem — Teams needed accurate, cited answers from internal docs without hallucination.
Why it was hard — Balancing retrieval quality, context length, and latency for real-time chat.
Tech stack — LLM, BGE-M3, Qdrant, LangChain, FastAPI
Architecture — User query → embedding → vector search (Qdrant) → top-k retrieval → LLM prompt with context → streamed response.
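The retrieval-and-prompt steps above can be sketched in miniature. This is a hedged illustration, not the production code: the in-memory `DOCS` list, the toy 3-dimensional vectors, and the `top_k`/`build_prompt` helpers are all hypothetical stand-ins for the real BGE-M3 embeddings and Qdrant collection, which require a running service.

```python
import math

# Hypothetical in-memory stand-in for the Qdrant collection.
# The real system stores BGE-M3 embeddings (1024-dim); these toy
# 3-dim vectors only illustrate the top-k retrieval shape.
DOCS = [
    ("doc-a", [0.9, 0.1, 0.0], "Refunds are processed within 5 days."),
    ("doc-b", [0.1, 0.9, 0.0], "VPN access requires an IT ticket."),
    ("doc-c", [0.8, 0.2, 0.1], "Refund exceptions need manager approval."),
]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec, k=2):
    """Vector search: rank docs by similarity to the query embedding."""
    ranked = sorted(DOCS, key=lambda d: cosine(query_vec, d[1]), reverse=True)
    return ranked[:k]

def build_prompt(question, hits):
    """Assemble the LLM prompt with retrieved context and citable ids."""
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, _, text in hits)
    return (
        "Answer using only the context below and cite source ids.\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

# A query embedding close to the "refunds" documents retrieves both.
hits = top_k([1.0, 0.0, 0.0], k=2)
prompt = build_prompt("How do refunds work?", hits)
print([h[0] for h in hits])  # → ['doc-a', 'doc-c']
```

In the deployed pipeline the cosine loop is replaced by a Qdrant similarity search, the prompt is sent to the LLM, and the completion is streamed back through FastAPI; only the data flow is shown here.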