Ask questions about any document and get accurate, cited answers powered by a multi-agent AI pipeline. Upload a PDF, DOCX, or TXT file and instantly chat with it — summarize it, extract key points, or ask anything specific.
- Upload any document — PDF, DOCX, or plain text
- Ask questions in natural language — the system finds the most relevant parts of your document and answers based only on what's actually in it
- Summarize — get a full summary of any document in seconds
- Extract key points — pull out the most important facts and takeaways
- Cited answers — every response tells you which part of the document it came from
- Semantic caching — identical or very similar queries are cached so you don't waste API calls
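To illustrate the semantic caching idea from the list above, here is a minimal sketch. It assumes queries have already been turned into embedding vectors; the `SemanticCache` class and its methods are hypothetical names, not the project's actual code. A cached answer is reused only when the closest stored query reaches the similarity threshold.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class SemanticCache:
    """Hypothetical in-memory cache keyed by query embedding."""

    def __init__(self, threshold=0.95):
        self.threshold = threshold
        self.entries = []  # list of (embedding, answer) pairs

    def get(self, query_embedding):
        """Return the closest cached answer if it meets the threshold, else None."""
        best_score, best_answer = -1.0, None
        for embedding, answer in self.entries:
            score = cosine_similarity(query_embedding, embedding)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None

    def put(self, query_embedding, answer):
        """Store an answer under the embedding of the query that produced it."""
        self.entries.append((query_embedding, answer))
```

On a cache miss, the caller would run the full pipeline and then `put` the new answer, so a near-identical follow-up question costs no API call.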
When you upload a document, it gets split into small chunks and converted into numerical embeddings using a local HuggingFace model. These embeddings are stored in ChromaDB on your machine.
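The chunking step can be sketched roughly as follows. This is an illustration only: it treats whitespace-separated words as a stand-in for tokens, and `chunk_text` is a hypothetical helper; the splitter in `utils/text_splitter.py` may work differently. Each chunk repeats the last `overlap` words of the previous one, so a sentence cut at a boundary still appears whole in at least one chunk.

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into word-based chunks that overlap by `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks
```

Each resulting chunk would then be passed to the embedding model and stored in ChromaDB together with its position, which is what later makes citations possible.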
When you ask a question, four agents work together:
Your Question
↓
Orchestrator Agent — figures out what you're asking (Q&A, summarize, key points)
↓
Retrieval Agent — searches ChromaDB for the most relevant chunks
↓
Reasoning Agent — thinks through the retrieved context step by step
↓
Generation Agent — writes the final answer with source citations
↓
Answer shown in Streamlit UI
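The flow above can be sketched as four plain functions chained together. Everything here is a simplified assumption: keyword rules stand in for the LLM-based orchestrator, word overlap stands in for the ChromaDB vector search, and no Gemini call is made. All function names are hypothetical; the point is only to show how each stage hands its output to the next.

```python
import re


def _words(text):
    """Lowercased word set, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))


def orchestrate(question):
    """Classify the request as 'summarize', 'key_points', or 'qa'."""
    q = question.lower()
    if "summar" in q:
        return "summarize"
    if "key point" in q or "takeaway" in q:
        return "key_points"
    return "qa"


def retrieve(question, chunks, top_k=5):
    """Rank chunks by words shared with the question (stand-in for vector search)."""
    q_words = _words(question)
    scored = [(len(q_words & _words(chunk)), index, chunk)
              for index, chunk in enumerate(chunks)]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(index, chunk) for score, index, chunk in scored[:top_k] if score > 0]


def reason(intent, retrieved):
    """Turn retrieved chunks into notes that keep track of their source."""
    return [{"source": index, "note": f"({intent}) {chunk}"}
            for index, chunk in retrieved]


def generate(notes):
    """Assemble the final answer and the list of cited chunk indices."""
    if not notes:
        return {"answer": "No relevant passage found in the document.", "sources": []}
    return {"answer": " ".join(n["note"] for n in notes),
            "sources": [n["source"] for n in notes]}


def answer_question(question, chunks, top_k=5):
    """Run the four stages in order and return the answer with its sources."""
    intent = orchestrate(question)
    retrieved = retrieve(question, chunks, top_k)
    notes = reason(intent, retrieved)
    return {"intent": intent, **generate(notes)}
```

Keeping the chunk index attached at every stage is what lets the final answer report which part of the document it came from.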
| Layer | Tool | Why |
|---|---|---|
| Frontend | Streamlit | Fast, clean UI with no frontend code |
| LLM | Gemini 1.5 Flash | Free tier (1M tokens/minute) |
| Embeddings | sentence-transformers (HuggingFace) | Runs locally, completely free |
| Vector DB | ChromaDB | Local, persistent, no setup needed |
| Framework | LangChain | Agent orchestration and prompt management |
| Document parsing | PyMuPDF, python-docx | PDF and DOCX support |
Total cost to run: $0. Everything except Gemini runs locally, and Gemini's free tier allows 15 requests per minute and 1 million tokens per minute.
multi_agent_rag/
├── app.py # Streamlit frontend — run this
├── requirements.txt
├── .env.example # Copy this to .env and add your key
│
├── agents/
│ ├── orchestrator.py # Routes queries to the right agent
│ ├── retrieval_agent.py # Semantic search over your document
│ ├── reasoning_agent.py # Chain-of-thought reasoning
│ └── generation_agent.py # Final answer with citations
│
├── core/
│ ├── document_processor.py # Ingests and chunks documents
│ ├── embeddings.py # HuggingFace embedding generation
│ ├── vector_store.py # ChromaDB interface
│ └── prompt_templates.py # All LangChain prompts
│
├── utils/
│ ├── text_splitter.py # Semantic chunking logic
│ └── helpers.py # Utility functions
│
└── config/
  └── settings.py # Config loaded from .env
git clone https://github.com/your-username/multi-agent-rag.git
cd multi-agent-rag
pip install -r requirements.txt

Go to aistudio.google.com/app/apikey, sign in with your Google account, and create a key. It takes about 30 seconds.

cp .env.example .env

Open .env and paste your key:

GEMINI_API_KEY=your_key_here

streamlit run app.py

Your browser will open automatically at http://localhost:8501.
- Open the app in your browser
- Upload a PDF, DOCX, or TXT file using the sidebar
- Wait a few seconds while the document is processed and indexed
- Start asking questions in the chat box
- Use the Summarize or Key Points buttons for instant one-click analysis
- Expand the Sources section under any answer to see exactly which part of the document was used
- "What is the main argument of this paper?"
- "List all the dates and deadlines mentioned."
- "What does the author recommend in the conclusion?"
- "Summarize section 3."
- "Are there any risks or limitations mentioned?"
| Variable | Description |
|---|---|
| GEMINI_API_KEY | Your Google Gemini API key (required) |
| CHUNK_SIZE | Token size per document chunk (default: 500) |
| CHUNK_OVERLAP | Overlap between chunks (default: 50) |
| TOP_K_RESULTS | Number of chunks retrieved per query (default: 5) |
| CACHE_SIMILARITY_THRESHOLD | Cosine similarity above which a cached answer is reused (default: 0.95) |
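As a rough sketch of how these variables might be read, the snippet below builds a settings object from the environment and falls back to the documented defaults. It assumes the values from `.env` are already present in the process environment (for example via a loader such as python-dotenv); `Settings` and `load_settings` are hypothetical names and may not match `config/settings.py`.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    gemini_api_key: str
    chunk_size: int = 500
    chunk_overlap: int = 50
    top_k_results: int = 5
    cache_similarity_threshold: float = 0.95


def load_settings(env=None):
    """Build Settings from a mapping of environment variables."""
    env = os.environ if env is None else env
    api_key = env.get("GEMINI_API_KEY", "")
    if not api_key:
        # The key is the only variable without a default.
        raise ValueError("GEMINI_API_KEY is required")
    return Settings(
        gemini_api_key=api_key,
        chunk_size=int(env.get("CHUNK_SIZE", 500)),
        chunk_overlap=int(env.get("CHUNK_OVERLAP", 50)),
        top_k_results=int(env.get("TOP_K_RESULTS", 5)),
        cache_similarity_threshold=float(env.get("CACHE_SIMILARITY_THRESHOLD", 0.95)),
    )
```

Passing a plain dictionary instead of `os.environ` makes the loader easy to exercise without touching the real environment, for example `load_settings({"GEMINI_API_KEY": "test", "CHUNK_SIZE": "200"})`.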
- Works best with text-heavy documents — scanned image PDFs without OCR won't be indexed properly
- Very large documents (100+ pages) may take 30–60 seconds to process on first upload
- Gemini free tier has a rate limit of 15 requests per minute — if you hit it, just wait a moment and retry
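The "wait a moment and retry" advice for the rate limit can also be automated. The following is a minimal sketch, not part of the app: `RateLimitError` is a hypothetical stand-in for whatever exception the Gemini client raises when the limit is hit, and the 4-second base delay is an assumption derived from 15 requests per minute (one request every 4 seconds).

```python
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the client's rate-limit exception."""


def call_with_retry(func, max_attempts=4, base_delay=4.0, sleep=time.sleep):
    """Call func(), retrying with exponential backoff on RateLimitError."""
    for attempt in range(max_attempts):
        try:
            return func()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts, let the caller see the error
            # Wait 4s, 8s, 16s, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
```

The `sleep` parameter exists only so the waiting can be replaced in tests; in normal use the default `time.sleep` applies.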
MIT — free to use, modify, and distribute.