# 🧠 Multi-Agent RAG: Document Q&A System

Ask questions about any document and get accurate, cited answers powered by a multi-agent AI pipeline. Upload a PDF, DOCX, or TXT file and instantly chat with it — summarize it, extract key points, or ask anything specific.

What it does

- Upload any document: PDF, DOCX, or plain text
- Ask questions in natural language: the system finds the most relevant parts of your document and answers based only on what's actually in it
- Summarize: get a full summary of any document in seconds
- Extract key points: pull out the most important facts and takeaways
- Cited answers: every response tells you which part of the document it came from
- Semantic caching: identical or very similar queries are cached so you don't waste API calls
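
To make the semantic caching idea concrete, here is a minimal sketch, not the project's actual cache. The `SemanticCache` class and its methods are hypothetical, plain lists of floats stand in for the sentence-transformers embeddings, and the 0.95 default mirrors `CACHE_SIMILARITY_THRESHOLD`.

```python
import math
from typing import Optional


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class SemanticCache:
    """Hypothetical in-memory cache keyed by query embedding."""

    def __init__(self, threshold: float = 0.95) -> None:
        self.threshold = threshold
        self._entries: list[tuple[list[float], str]] = []

    def get(self, query_embedding: list[float]) -> Optional[str]:
        """Return the answer of the most similar cached query, or None on a miss."""
        best_answer, best_score = None, self.threshold
        for embedding, answer in self._entries:
            score = cosine_similarity(query_embedding, embedding)
            if score >= best_score:
                best_answer, best_score = answer, score
        return best_answer

    def put(self, query_embedding: list[float], answer: str) -> None:
        """Remember the answer that was produced for this query embedding."""
        self._entries.append((query_embedding, answer))
```

On a hit the stored answer is returned without calling the LLM; on a miss the pipeline would run as usual and `put` the result. A linear scan is fine for a handful of cached queries; a real implementation might store the query embeddings in the vector database instead.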

How it works

When you upload a document, it gets split into small chunks and converted into numerical embeddings using a local HuggingFace model. These embeddings are stored in ChromaDB on your machine.
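
The chunking step can be pictured with the sketch below. It is a simplification under stated assumptions: it uses a plain fixed-size sliding window rather than the semantic chunking in `utils/text_splitter.py`, it counts whitespace-separated words as a stand-in for tokens, and the `chunk_text` name is hypothetical. The defaults match `CHUNK_SIZE` and `CHUNK_OVERLAP`.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks of at most chunk_size words.

    Words stand in for tokens here; a real tokenizer may count differently.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks
```

Each chunk would then be passed to the embedding model and stored alongside its vector. The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk.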

When you ask a question, four agents work together:

Your Question
      ↓
Orchestrator Agent — figures out what you're asking (Q&A, summarize, key points)
      ↓
Retrieval Agent — searches ChromaDB for the most relevant chunks
      ↓
Reasoning Agent — thinks through the retrieved context step by step
      ↓
Generation Agent — writes the final answer with source citations
      ↓
Answer shown in Streamlit UI
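
The flow above can be sketched as four plain functions chained together. This is an illustration only: every name here is hypothetical, the keyword routing and word-overlap retrieval are stand-ins for the real LLM calls and ChromaDB search, and the actual agents in `agents/` are built on LangChain.

```python
from dataclasses import dataclass


@dataclass
class Answer:
    text: str
    sources: list[int]  # indices of the chunks the answer drew on


def orchestrate(question: str) -> str:
    """Classify the request as 'summarize', 'key_points', or 'qa' (keyword heuristic)."""
    q = question.lower()
    if "summar" in q:
        return "summarize"
    if "key point" in q or "takeaway" in q:
        return "key_points"
    return "qa"


def retrieve(question: str, chunks: list[str], top_k: int = 5) -> list[int]:
    """Stand-in for vector search: rank chunks by words shared with the question."""
    q_words = set(question.lower().split())
    scored = [(len(q_words & set(c.lower().split())), i) for i, c in enumerate(chunks)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for score, i in scored[:top_k] if score > 0]


def reason(task: str, question: str, context: list[str]) -> str:
    """Stand-in for the reasoning step: assemble the notes a model would work from."""
    return f"task={task}; question={question}; context={' | '.join(context)}"


def generate(notes: str, source_ids: list[int]) -> Answer:
    """Stand-in for the generation step: wrap the notes and attach citations."""
    return Answer(text=notes, sources=source_ids)


def answer_question(question: str, chunks: list[str]) -> Answer:
    """Run the four stages in order and return the cited answer."""
    task = orchestrate(question)
    ids = retrieve(question, chunks)
    notes = reason(task, question, [chunks[i] for i in ids])
    return generate(notes, ids)
```

Keeping each stage as a separate function with a narrow input and output is what lets the source indices travel from retrieval through to the final answer, which is how the cited answers are possible.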

Tech stack

| Layer | Tool | Why |
| --- | --- | --- |
| Frontend | Streamlit | Fast, clean UI with no frontend code |
| LLM | Gemini 1.5 Flash | Free tier: 1M tokens/day |
| Embeddings | sentence-transformers (HuggingFace) | Runs locally, completely free |
| Vector DB | ChromaDB | Local, persistent, no setup needed |
| Framework | LangChain | Agent orchestration and prompt management |
| Document parsing | PyMuPDF, python-docx | PDF and DOCX support |

Total cost to run: $0 — everything except Gemini runs locally. Gemini's free tier gives you 15 requests/minute and 1 million tokens per day.

Project structure

multi_agent_rag/
├── app.py                        # Streamlit frontend — run this
├── requirements.txt
├── .env.example                  # Copy this to .env and add your key
│
├── agents/
│   ├── orchestrator.py           # Routes queries to the right agent
│   ├── retrieval_agent.py        # Semantic search over your document
│   ├── reasoning_agent.py        # Chain-of-thought reasoning
│   └── generation_agent.py      # Final answer with citations
│
├── core/
│   ├── document_processor.py    # Ingests and chunks documents
│   ├── embeddings.py            # HuggingFace embedding generation
│   ├── vector_store.py          # ChromaDB interface
│   └── prompt_templates.py      # All LangChain prompts
│
├── utils/
│   ├── text_splitter.py         # Semantic chunking logic
│   └── helpers.py               # Utility functions
│
└── config/
    └── settings.py              # Config loaded from .env

Setup

1. Clone the repo

git clone https://github.com/your-username/multi-agent-rag.git
cd multi-agent-rag

2. Install dependencies

pip install -r requirements.txt

3. Get your free Gemini API key

Go to aistudio.google.com/app/apikey, sign in with your Google account, and create a key. It takes about 30 seconds.

4. Set up your environment

cp .env.example .env

Open .env and paste your key:

GEMINI_API_KEY=your_key_here

5. Run the app

streamlit run app.py

Your browser will open automatically at http://localhost:8501.

Usage

1. Open the app in your browser
2. Upload a PDF, DOCX, or TXT file using the sidebar
3. Wait a few seconds while the document is processed and indexed
4. Start asking questions in the chat box
5. Use the Summarize or Key Points buttons for instant one-click analysis
6. Expand the Sources section under any answer to see exactly which part of the document was used

Example queries

- "What is the main argument of this paper?"
- "List all the dates and deadlines mentioned."
- "What does the author recommend in the conclusion?"
- "Summarize section 3."
- "Are there any risks or limitations mentioned?"

Environment variables

| Variable | Description |
| --- | --- |
| `GEMINI_API_KEY` | Your Google Gemini API key (required) |
| `CHUNK_SIZE` | Token size per document chunk (default: 500) |
| `CHUNK_OVERLAP` | Overlap between chunks (default: 50) |
| `TOP_K_RESULTS` | Number of chunks retrieved per query (default: 5) |
| `CACHE_SIMILARITY_THRESHOLD` | Cosine similarity above which a cached answer is reused (default: 0.95) |
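
As a rough illustration of how these variables might be read, the sketch below builds a settings object from the process environment with the documented defaults. The `Settings` class and `load_settings` function are hypothetical and are not taken from the project's config module. The sketch also assumes the `.env` file has already been loaded into the environment, for example by a tool such as python-dotenv.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    gemini_api_key: str
    chunk_size: int = 500
    chunk_overlap: int = 50
    top_k_results: int = 5
    cache_similarity_threshold: float = 0.95


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping (os.environ by default), applying the defaults."""
    env = os.environ if env is None else env
    api_key = env.get("GEMINI_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("GEMINI_API_KEY is required")
    return Settings(
        gemini_api_key=api_key,
        chunk_size=int(env.get("CHUNK_SIZE", 500)),
        chunk_overlap=int(env.get("CHUNK_OVERLAP", 50)),
        top_k_results=int(env.get("TOP_K_RESULTS", 5)),
        cache_similarity_threshold=float(env.get("CACHE_SIMILARITY_THRESHOLD", 0.95)),
    )
```

Failing fast on a missing key gives a clearer error at startup than a rejected API call on the first question.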

Known limitations

- Works best with text-heavy documents. Scanned image PDFs without OCR won't be indexed properly.
- Very large documents (100+ pages) may take 30–60 seconds to process on first upload.
- The Gemini free tier has a rate limit of 15 requests per minute. If you hit it, just wait a moment and retry.
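
The "wait and retry" advice for the rate limit could also be automated. The sketch below shows one common approach, exponential backoff, under stated assumptions: `RateLimitError` and `call_with_retry` are hypothetical names, the app is not confirmed to retry this way, and the 4-second base delay is simply 60 seconds divided by 15 requests.

```python
import time


class RateLimitError(Exception):
    """Hypothetical marker for a 'too many requests' response from the LLM API."""


def call_with_retry(fn, max_attempts: int = 4, base_delay: float = 4.0, sleep=time.sleep):
    """Call fn(); on RateLimitError wait base_delay * 2**attempt seconds, then retry."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** attempt)
```

The `sleep` parameter exists so the waiting can be replaced in tests. In real use the function passed as `fn` would wrap the Gemini request and translate the client library's rate-limit error into `RateLimitError`.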

Built with

License

MIT — free to use, modify, and distribute.
