Live Demo: 👉 https://rag-chatbot-otcfoy7nvekby3tmmxycba.streamlit.app/
A conversational AI chatbot that lets you upload PDFs and ask questions about them. Built with Retrieval-Augmented Generation (RAG), it retrieves the most relevant chunks from your documents, reranks them for accuracy, and generates structured answers using LLaMA 3.3 70B via Groq.
- Multi-PDF support — upload and index multiple PDFs in a single session
- Smart Mode — answers using document context, falls back to general knowledge when needed
- Strict Mode — answers strictly from the uploaded documents, no external knowledge
- Reranking pipeline — BGE cross-encoder reranker on top of MMR retrieval for higher precision
- Streaming responses — token-by-token output with a live typing cursor
- Query-type detection — automatically detects compare, summarize, definition, explain, or general queries and formats the answer accordingly
- Source attribution — shows which file and page each answer was pulled from, grouped by document
- Chat export — download the full conversation as a `.txt` file
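
The query-type detection feature can be illustrated with a small keyword-based classifier. This is only a sketch under stated assumptions: the function name `classify_query` and the regex rules are hypothetical, and the actual app may detect query types differently (for example, by asking the LLM).

```python
import re

# Hypothetical keyword rules, checked in order. The labels match the query
# types listed above; the patterns themselves are illustrative assumptions.
QUERY_PATTERNS = [
    ("compare", re.compile(r"\b(compare|versus|vs\.?|difference between)\b", re.I)),
    ("summarize", re.compile(r"\b(summari[sz]e|summary|overview)\b", re.I)),
    ("definition", re.compile(r"\b(define|definition of|what is|what are|meaning of)\b", re.I)),
    ("explain", re.compile(r"\b(explain|how does|how do|why)\b", re.I)),
]


def classify_query(query: str) -> str:
    """Return the first matching query type, or "general" if none match."""
    for label, pattern in QUERY_PATTERNS:
        if pattern.search(query):
            return label
    return "general"
```

The returned label could then select a set of format instructions, such as a comparison table for `compare` or a short bulleted digest for `summarize`.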
User Query
|
Query Classifier --> Format Instructions
|
History-Aware Query Rewriter (LLaMA 3.3 via Groq)
|
MMR Retriever (ChromaDB + all-MiniLM-L6-v2)
fetch_k=40 --> k=20
|
BGE Reranker (BAAI/bge-reranker-base)
|
Top 5 Chunks
|
Answer Generation (LLaMA 3.3 70B via Groq)
|
Streaming Response + Source Expander
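
The retrieval funnel in the diagram (40 candidates, 20 after MMR, 5 after reranking) can be sketched in plain Python. This is a simplified illustration, not the code in `rag_chain.py`: `mmr_select` and `rerank` are hypothetical helpers, the vectors are toy values, and `score_fn` stands in for the BGE cross-encoder.

```python
import math
from typing import Callable, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(
    query_vec: Sequence[float],
    doc_vecs: List[Sequence[float]],
    k: int,
    lambda_mult: float = 0.5,
) -> List[int]:
    """Greedily pick k indices, trading relevance against redundancy."""
    relevance = [cosine(query_vec, d) for d in doc_vecs]
    selected: List[int] = []
    remaining = list(range(len(doc_vecs)))

    def score(i: int) -> float:
        # Redundancy is the highest similarity to anything already chosen.
        redundancy = max(
            (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
        )
        return lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


def rerank(
    query: str,
    chunks: List[str],
    score_fn: Callable[[str, str], float],
    top_k: int = 5,
) -> List[str]:
    """Sort chunks by a (query, chunk) relevance score and keep the best top_k."""
    ranked = sorted(chunks, key=lambda c: score_fn(query, c), reverse=True)
    return ranked[:top_k]
```

With `lambda_mult=1.0` the selection reduces to plain similarity ranking; lower values penalize near-duplicate chunks more heavily, which is why MMR runs before the more expensive cross-encoder pass.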
- Ingest — PDFs are loaded with `PyPDFLoader`, split into 800-token chunks with 150-token overlap, embedded with `all-MiniLM-L6-v2`, and stored in ChromaDB.
- Retrieve — On each query, chat history is used to rewrite the question into a clear standalone query. MMR retrieval fetches 40 candidates, filters to 20 for diversity, then the BGE cross-encoder reranks and returns the top 5.
- Generate — The top chunks are passed to LLaMA 3.3 70B via Groq along with the detected query type and structured format instructions. The response streams token by token.
- Sources — Retrieved chunks are grouped by source file and shown with page numbers and text excerpts in a collapsible expander.
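
The chunking arithmetic in the Ingest step (800-token chunks, 150-token overlap) amounts to a sliding window that advances 650 tokens at a time. The sketch below is a simplification under stated assumptions: `chunk_tokens` is a hypothetical helper that works on an already tokenized list, whereas the text splitter used in `ingest.py` may split on separators and measure length differently.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def chunk_tokens(
    tokens: Sequence[T], chunk_size: int = 800, overlap: int = 150
) -> List[Sequence[T]]:
    """Split tokens into windows of chunk_size that share `overlap` tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")

    step = chunk_size - overlap  # 650 with the defaults
    chunks: List[Sequence[T]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a window reaches the end, so no tail-only chunk is emitted.
        if start + chunk_size >= len(tokens):
            break
    return chunks
```

For a 2,000-token document this yields three chunks starting at tokens 0, 650, and 1300, with the last 150 tokens of each chunk repeated at the start of the next.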
rag-chatbot/
├── app.py # Streamlit UI — chat interface, sidebar, export
├── rag_chain.py # RAG pipeline — retriever, reranker, LLM chain
├── ingest.py # PDF ingestion — load, chunk, embed, store
├── requirements.txt # Python dependencies
├── .env # API keys (not committed)
├── .gitignore
└── README.md
git clone https://github.com/snigdha031/rag-chatbot.git
cd rag-chatbot

python -m venv venv
source venv/bin/activate # Mac/Linux
venv\Scripts\activate # Windows

pip install -r requirements.txt

Create a `.env` file in the root directory:

GROQ_API_KEY=your_groq_api_key_here

Get your free Groq API key at console.groq.com.

streamlit run app.py

| Setting | Default | Description |
|---|---|---|
| Embedding model | all-MiniLM-L6-v2 | HuggingFace sentence transformer |
| Reranker model | BAAI/bge-reranker-base | Cross-encoder reranker |
| LLM | llama-3.3-70b-versatile | Via Groq API |
| Chunk size | 800 tokens | With 150 token overlap |
| MMR fetch_k | 40 | Candidates fetched before MMR |
| MMR k | 20 | After MMR diversity filter |
| Reranker top_k | 5 | Final chunks passed to LLM |
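
One way to keep these defaults in a single place is a small settings object that also checks the funnel stays consistent (fetch_k ≥ k ≥ top_k). This is a hypothetical sketch: `RagConfig` and its field names are not taken from the repository, and only the default values come from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RagConfig:
    """Hypothetical container for the defaults listed in the table."""

    embedding_model: str = "all-MiniLM-L6-v2"
    reranker_model: str = "BAAI/bge-reranker-base"
    llm_model: str = "llama-3.3-70b-versatile"
    chunk_size: int = 800
    chunk_overlap: int = 150
    mmr_fetch_k: int = 40
    mmr_k: int = 20
    rerank_top_k: int = 5

    def __post_init__(self) -> None:
        # Overlap must leave room for the window to advance.
        if not 0 <= self.chunk_overlap < self.chunk_size:
            raise ValueError("chunk_overlap must be in [0, chunk_size)")
        # Each retrieval stage can only narrow the candidate set.
        if not self.mmr_fetch_k >= self.mmr_k >= self.rerank_top_k >= 1:
            raise ValueError("expected mmr_fetch_k >= mmr_k >= rerank_top_k >= 1")
```

Constructing `RagConfig(mmr_k=50)` raises a `ValueError`, since MMR cannot return more chunks than it fetched.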
- Push the repo to GitHub
- Go to share.streamlit.io
- Connect your GitHub repo
- Add your API key under Settings > Secrets: `GROQ_API_KEY = "your_groq_api_key_here"`
- Click Deploy — your app gets a public URL instantly
Note: ChromaDB persists to disk. On Streamlit Cloud the filesystem resets on each reboot, so uploaded PDFs will need to be re-indexed after the app restarts. For persistent storage across restarts, swap ChromaDB for a hosted vector DB like Pinecone or Qdrant Cloud.
| Layer | Tool |
|---|---|
| UI | Streamlit |
| LLM | LLaMA 3.3 70B via Groq |
| Orchestration | LangChain |
| Embeddings | all-MiniLM-L6-v2 (HuggingFace) |
| Vector store | ChromaDB |
| Reranker | BAAI/bge-reranker-base |
| PDF loading | PyPDFLoader |
MIT License — see LICENSE for details.
