An end-to-end RAG pipeline with hybrid dense/sparse retrieval, cross-encoder reranking, contextual chunking, and DeepSeek LLM generation, served via a full-stack NiceGUI web interface.
```
┌─────────────────────────────────────────────────────────────────────┐
│                       NiceGUI Frontend (:8000)                       │
│             Upload PDF | Ask Questions | View Results                │
└──────────────┬────────────────────────────────────┬─────────────────┘
               │                                    │
               ▼                                    ▼
┌──────────────────────────┐      ┌──────────────────────────────────┐
│    Ingestion Pipeline    │      │          Query Pipeline          │
│                          │      │                                  │
│  ┌────────────────────┐  │      │  ┌────────────────────────────┐  │
│  │  PDF Text Extract  │  │      │  │  Dense Embedding (BGE-M3)  │  │
│  │      (pypdf)       │  │      │  └─────────────┬──────────────┘  │
│  └─────────┬──────────┘  │      │                ▼                 │
│            ▼             │      │  ┌────────────────────────────┐  │
│  ┌────────────────────┐  │      │  │   Hybrid Search (Qdrant)   │  │
│  │ Contextual Chunking│  │      │  │     Top-20 candidates      │  │
│  │   (DeepSeek LLM)   │  │      │  └─────────────┬──────────────┘  │
│  └─────────┬──────────┘  │      │                ▼                 │
│            ▼             │      │  ┌────────────────────────────┐  │
│  ┌────────────────────┐  │      │  │  Cross-Encoder Reranking   │  │
│  │  Dense Embedding   │  │      │  │   (BGE-Reranker-v2-M3)     │  │
│  │      (BGE-M3)      │  │      │  └─────────────┬──────────────┘  │
│  └─────────┬──────────┘  │      │                ▼                 │
│            ▼             │      │  ┌────────────────────────────┐  │
│  ┌────────────────────┐  │      │  │  DeepSeek LLM Generation   │  │
│  │   Qdrant Upsert    │  │      │  │    (Chain-of-Thought)      │  │
│  │  (Vector Indexing) │  │      │  └────────────────────────────┘  │
│  └────────────────────┘  │      └──────────────────────────────────┘
└──────────────────────────┘
                         │
              ┌──────────┴───────────┐
              │   Qdrant (Docker)    │
              │   localhost:6333     │
              └──────────────────────┘
```
| Component | Technology |
|---|---|
| Frontend | NiceGUI (Quasar/Vue-based) |
| Backend | FastAPI |
| Vector Database | Qdrant (Docker) |
| Embeddings | BAAI/bge-m3 (Sentence Transformers) |
| Reranker | BAAI/bge-reranker-v2-m3 |
| LLM | DeepSeek Chat API |
| PDF Parsing | pypdf |
- **Contextual Chunking**: each chunk is enriched with document-level context via DeepSeek before indexing
- **Hybrid Retrieval**: dense vector similarity search via Qdrant
- **Cross-Encoder Reranking**: BGE-Reranker-v2-M3 reranks the top-20 candidates for precision
- **Chain-of-Thought Generation**: the DeepSeek LLM generates reasoned answers with thinking traces
- **Web UI**: drag-and-drop PDF upload, real-time Q&A, collapsible reasoning display
- **REST API**: programmatic access via the `/upload-pdf` and `/ask` endpoints
- Python 3.10+
- Docker (for Qdrant)
- A DeepSeek API key (get one at platform.deepseek.com)
```bash
git clone https://github.com/ayman-tech/deep-rag.git
cd deep-rag
```

Create and activate a virtual environment:

```bash
python -m venv .venv

# Windows
.\.venv\Scripts\activate
# macOS/Linux
source .venv/bin/activate
```

Install the dependencies:

```bash
pip install -r requirements.txt
```

Create a `.env` file in the project root:

```
DEEPSEEK_API_KEY=your_deepseek_api_key_here
QDRANT_URL=http://localhost:6333
```

Start Qdrant:

```bash
docker run -d -p 6333:6333 -p 6334:6334 qdrant/qdrant
```

With Web UI (recommended):

```bash
python ui.py
```

Opens at http://localhost:8000

API-only mode (headless):

```bash
python main.py
```

Swagger docs at http://localhost:8000/docs
1. Open http://localhost:8000
2. Upload a PDF using the drag-and-drop uploader
3. Type a question and click **Ask**
4. View the answer and expand the reasoning trace
Postman is recommended for exploring the API; equivalent curl commands are shown below.
Upload a PDF:

```bash
curl -X POST http://localhost:8000/upload-pdf \
  -F "file=@document.pdf"
```

Ask a question:

```bash
curl -X POST http://localhost:8000/ask \
  -H "Content-Type: application/json" \
  -d '{"query": "What are the key findings?"}'
```

Response format:

```json
{
  "query": "What are the key findings?",
  "reasoning": "Based on the context provided...",
  "answer": "The key findings include...",
  "sources_used": 5
}
```

```
rag/
├── main.py            # FastAPI-only entry point (headless API)
├── ui.py              # NiceGUI + FastAPI entry point (Web UI + API)
├── config.py          # Settings & environment config (Pydantic)
├── requirements.txt   # Python dependencies
├── .env               # Environment variables (not committed)
└── src/
    ├── __init__.py
    ├── ingestion.py   # PDF parsing, contextual chunking, vector indexing
    ├── retrieval.py   # Hybrid search, cross-encoder reranking
    └── generation.py  # DeepSeek LLM answer generation
```
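Beyond curl, the two endpoints can be called from Python. The sketch below uses only the standard library; the endpoint paths and JSON shape come from the examples above, while the helper names (`build_ask_request`, `ask`) are illustrative, not part of the project.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # where ui.py / main.py is serving

def build_ask_request(query: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST request for the /ask endpoint."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/ask",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def ask(query: str, base_url: str = BASE_URL) -> dict:
    """Send a question to a running server and return the parsed JSON answer
    (a dict with "query", "reasoning", "answer", and "sources_used")."""
    with urllib.request.urlopen(build_ask_request(query, base_url)) as resp:
        return json.load(resp)
```

Calling `ask("What are the key findings?")` against a running instance returns the same JSON object shown in the response-format example.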
**Ingestion pipeline**

1. **PDF Extraction**: text is extracted page-by-page using `pypdf`
2. **Contextual Chunking**: each page chunk is sent to DeepSeek along with a full-document summary to generate a contextual header (inspired by Anthropic's contextual retrieval)
3. **Embedding**: the enriched chunk is encoded into a dense vector using `BAAI/bge-m3`
4. **Indexing**: vectors are upserted into Qdrant with metadata (page number, context, original text)
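The contextual-chunking step (2) can be sketched as a small pure function. This is an illustrative outline, not the project's actual `ingestion.py` code: the prompt wording is assumed, and the `llm` callable stands in for a DeepSeek chat-completion call.

```python
from typing import Callable

# Prompt shape is an assumption, loosely following Anthropic's
# contextual-retrieval recipe: give the model the whole-document
# summary plus one chunk, and ask for a short situating header.
CONTEXT_PROMPT = """\
Document summary:
{summary}

Chunk:
{chunk}

Write one short sentence situating this chunk within the document."""

def make_contextual_chunk(chunk: str, summary: str,
                          llm: Callable[[str], str]) -> str:
    """Prepend an LLM-generated context header to a chunk before embedding.

    `llm` is any prompt -> completion function; in this project it would
    wrap the DeepSeek chat API.
    """
    context = llm(CONTEXT_PROMPT.format(summary=summary, chunk=chunk))
    return f"{context}\n\n{chunk}"
```

The enriched string (header plus original text) is what BGE-M3 embeds, so retrieval can match document-level wording that the raw chunk alone lacks.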
**Query pipeline**

1. **Embedding**: the user query is encoded with the same `BAAI/bge-m3` model
2. **Retrieval**: the top-20 candidate chunks are retrieved from Qdrant via cosine similarity
3. **Reranking**: a cross-encoder (`BAAI/bge-reranker-v2-m3`) rescores all candidates for precision
4. **Generation**: the top-5 reranked chunks are sent to the DeepSeek LLM along with the query for chain-of-thought answer generation
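The 20-to-5 rerank step can be sketched as a pure function over a pluggable scorer. In the real pipeline `score` would be the sentence-transformers cross-encoder (`BAAI/bge-reranker-v2-m3`) scoring (query, chunk) pairs; the function below is a simplified stand-in, not the project's `retrieval.py`.

```python
from typing import Callable, Sequence

def rerank(query: str, candidates: Sequence[str],
           score: Callable[[str, str], float], top_k: int = 5) -> list[str]:
    """Rescore retrieved candidates and keep the best top_k.

    The pipeline retrieves 20 candidates from Qdrant, rescores each
    (query, chunk) pair with the cross-encoder, and forwards the top 5
    to the LLM.
    """
    return sorted(candidates, key=lambda c: score(query, c), reverse=True)[:top_k]
```

For a quick local check you can pass a toy scorer, e.g. word-overlap between query and chunk, in place of the cross-encoder.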
All settings are managed via `config.py` and can be overridden with environment variables:
| Variable | Default | Description |
|---|---|---|
| `DEEPSEEK_API_KEY` | (required) | DeepSeek API key |
| `QDRANT_URL` | `http://localhost:6333` | Qdrant server URL |
| `COLLECTION_NAME` | `gold_standard_rag` | Qdrant collection name |
| `DENSE_MODEL` | `BAAI/bge-m3` | Sentence transformer model |
| `RERANK_MODEL` | `BAAI/bge-reranker-v2-m3` | Cross-encoder reranker model |
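A minimal standard-library sketch of how these variables might be read. The actual `config.py` uses Pydantic settings (per the project layout); only the variable names and defaults below come from the table, the rest is illustrative.

```python
import os
from dataclasses import dataclass, field

def _env(name: str, default: str) -> str:
    """Read an environment variable, falling back to the documented default."""
    return os.getenv(name, default)

@dataclass
class Settings:
    # DEEPSEEK_API_KEY has no default: fail fast if it is missing.
    deepseek_api_key: str = field(
        default_factory=lambda: os.environ["DEEPSEEK_API_KEY"])
    qdrant_url: str = field(
        default_factory=lambda: _env("QDRANT_URL", "http://localhost:6333"))
    collection_name: str = field(
        default_factory=lambda: _env("COLLECTION_NAME", "gold_standard_rag"))
    dense_model: str = field(
        default_factory=lambda: _env("DENSE_MODEL", "BAAI/bge-m3"))
    rerank_model: str = field(
        default_factory=lambda: _env("RERANK_MODEL", "BAAI/bge-reranker-v2-m3"))
```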
MIT License

Ayman Sayed © aymanAI.com Copyright 2026