# 🧠 DeepRAG: Hybrid Retrieval-Augmented Generation System

An end-to-end RAG pipeline with hybrid dense/sparse retrieval, cross-encoder reranking, contextual chunking, and DeepSeek LLM generation, served via a full-stack NiceGUI web interface.


πŸ—οΈ Architecture

```
┌─────────────────────────────────────────────────────────────────────┐
│                        NiceGUI Frontend (:8000)                     │
│                   Upload PDF  |  Ask Questions  |  View Results     │
└──────────────┬──────────────────────────────────┬───────────────────┘
               │                                  │
               ▼                                  ▼
┌──────────────────────────┐       ┌──────────────────────────────────┐
│    Ingestion Pipeline    │       │         Query Pipeline           │
│                          │       │                                  │
│  ┌────────────────────┐  │       │  ┌────────────────────────────┐  │
│  │  PDF Text Extract  │  │       │  │  Dense Embedding (BGE-M3)  │  │
│  │      (pypdf)       │  │       │  └─────────────┬──────────────┘  │
│  └─────────┬──────────┘  │       │                │                 │
│            ▼             │       │                ▼                 │
│  ┌────────────────────┐  │       │  ┌────────────────────────────┐  │
│  │ Contextual Chunking│  │       │  │   Hybrid Search (Qdrant)   │  │
│  │   (DeepSeek LLM)   │  │       │  │   Top-20 candidates        │  │
│  └─────────┬──────────┘  │       │  └─────────────┬──────────────┘  │
│            ▼             │       │                │                 │
│  ┌────────────────────┐  │       │                ▼                 │
│  │  Dense Embedding   │  │       │  ┌────────────────────────────┐  │
│  │      (BGE-M3)      │  │       │  │  Cross-Encoder Reranking   │  │
│  └─────────┬──────────┘  │       │  │  (BGE-Reranker-v2-M3)      │  │
│            ▼             │       │  └─────────────┬──────────────┘  │
│  ┌────────────────────┐  │       │                │                 │
│  │   Qdrant Upsert    │  │       │                ▼                 │
│  │  (Vector Indexing) │  │       │  ┌────────────────────────────┐  │
│  └────────────────────┘  │       │  │  DeepSeek LLM Generation   │  │
└──────────────────────────┘       │  │  (Chain-of-Thought)        │  │
                                   │  └────────────────────────────┘  │
                                   └─────────────────┬────────────────┘
                                                     │
                                          ┌──────────┴──────────┐
                                          │   Qdrant (Docker)   │
                                          │   localhost:6333    │
                                          └─────────────────────┘
```

βš™οΈ Tech Stack

Component Technology
Frontend NiceGUI (Quasar/Vue-based)
Backend FastAPI
Vector Database Qdrant (Docker)
Embeddings BAAI/bge-m3 (Sentence Transformers)
Reranker BAAI/bge-reranker-v2-m3
LLM DeepSeek Chat API
PDF Parsing pypdf

## ✨ Features

- 🔗 **Contextual Chunking**: each chunk is enriched with document-level context via DeepSeek before indexing
- 🔍 **Hybrid Retrieval**: dense vector similarity search via Qdrant
- 🎯 **Cross-Encoder Reranking**: BGE-Reranker-v2-M3 reranks the top-20 candidates for precision
- 💭 **Chain-of-Thought Generation**: the DeepSeek LLM generates reasoned answers with thinking traces
- 🖥️ **Web UI**: drag-and-drop PDF upload, real-time Q&A, collapsible reasoning display
- 🔌 **REST API**: programmatic access via the `/upload-pdf` and `/ask` endpoints

## 📋 Prerequisites

- 🐍 Python 3.10+
- 🐳 Docker (for Qdrant)
- 🔑 DeepSeek API key (get one at platform.deepseek.com)

## 🚀 Quick Start

### 1. Clone the repository

```bash
git clone https://github.com/ayman-tech/deep-rag.git
cd deep-rag
```

### 2. Create a virtual environment

```bash
python -m venv .venv

# Windows
.\.venv\Scripts\activate

# macOS/Linux
source .venv/bin/activate
```

### 3. Install dependencies

```bash
pip install -r requirements.txt
```

### 4. Configure environment variables

Create a `.env` file in the project root:

```env
DEEPSEEK_API_KEY=your_deepseek_api_key_here
QDRANT_URL=http://localhost:6333
```

### 5. Start Qdrant (Docker)

```bash
docker run -d -p 6333:6333 -p 6334:6334 qdrant/qdrant
```

### 6. Run the application 🎉

With the Web UI (recommended):

```bash
python ui.py
```

Opens at → http://localhost:8000

API-only mode (headless):

```bash
python main.py
```

Swagger docs at → http://localhost:8000/docs

## 💡 Usage

### 🖥️ Web UI

1. 🌐 Open http://localhost:8000
2. 📄 Upload a PDF using the drag-and-drop uploader
3. ❓ Type a question and click **Ask**
4. ✅ View the answer and expand the reasoning trace

### 🔌 REST API Testing

Postman is recommended for interactive testing; the equivalent curl commands are shown below.

Upload a PDF:

```bash
curl -X POST http://localhost:8000/upload-pdf \
  -F "file=@document.pdf"
```

Ask a question:

```bash
curl -X POST http://localhost:8000/ask \
  -H "Content-Type: application/json" \
  -d '{"query": "What are the key findings?"}'
```

Response format:

```json
{
  "query": "What are the key findings?",
  "reasoning": "Based on the context provided...",
  "answer": "The key findings include...",
  "sources_used": 5
}
```
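For programmatic access, the curl calls above translate directly to Python. Below is a minimal stdlib client sketch; the endpoint path and JSON fields come from this README, while `build_ask_payload` and `ask` are illustrative names, not part of the repo:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"

def build_ask_payload(query: str) -> bytes:
    """Serialize the /ask request body, matching the curl example."""
    return json.dumps({"query": query}).encode("utf-8")

def ask(query: str, base_url: str = BASE_URL) -> dict:
    """POST a question to /ask and return the parsed JSON response."""
    req = urllib.request.Request(
        f"{base_url}/ask",
        data=build_ask_payload(query),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# With the server running, something like:
#   result = ask("What are the key findings?")
#   print(result["answer"], "| sources:", result["sources_used"])
```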

πŸ“ Project Structure

rag/
β”œβ”€β”€ main.py              # FastAPI-only entry point (headless API)
β”œβ”€β”€ ui.py                # NiceGUI + FastAPI entry point (Web UI + API)
β”œβ”€β”€ config.py            # Settings & environment config (Pydantic)
β”œβ”€β”€ requirements.txt     # Python dependencies
β”œβ”€β”€ .env                 # Environment variables (not committed)
└── src/
    β”œβ”€β”€ __init__.py
    β”œβ”€β”€ ingestion.py     # PDF parsing, contextual chunking, vector indexing
    β”œβ”€β”€ retrieval.py     # Hybrid search, cross-encoder reranking
    └── generation.py    # DeepSeek LLM answer generation

## 🔬 How It Works

### 📥 Ingestion Pipeline

1. **PDF Extraction**: text is extracted page by page using pypdf
2. **Contextual Chunking**: each page chunk is sent to DeepSeek along with the full document summary to generate a contextual header (inspired by Anthropic's contextual retrieval)
3. **Embedding**: the enriched chunk is encoded into a dense vector using BAAI/bge-m3
4. **Indexing**: vectors are upserted into Qdrant with metadata (page number, context, original text)
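The contextual-chunking step (2) can be sketched as a small function. This is an illustrative sketch, not the repo's code: `llm` stands in for a DeepSeek chat call, and the prompt wording is assumed:

```python
def contextualize_chunk(chunk: str, doc_summary: str, llm) -> str:
    """Prepend an LLM-written context header to a chunk before embedding.

    `llm` is any callable(prompt) -> str; in the real pipeline this would
    wrap the DeepSeek Chat API. Names and prompt text here are illustrative.
    """
    prompt = (
        "Document summary:\n" + doc_summary + "\n\n"
        "Chunk:\n" + chunk + "\n\n"
        "Write one short sentence situating this chunk within the document."
    )
    header = llm(prompt)
    # The enriched text (header + original chunk) is what gets embedded.
    return header + "\n\n" + chunk

# With a stub LLM, the original chunk text survives intact:
stub = lambda prompt: "This chunk discusses retrieval quality."
enriched = contextualize_chunk(
    "Recall improved by 12%.", "A RAG evaluation report.", stub
)
```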

### 🔎 Query Pipeline

1. **Embedding**: the user query is encoded with the same BAAI/bge-m3 model
2. **Retrieval**: the top-20 candidate chunks are retrieved from Qdrant via cosine similarity
3. **Reranking**: a cross-encoder (BAAI/bge-reranker-v2-m3) rescores all candidates for precision
4. **Generation**: the top-5 reranked chunks are sent to the DeepSeek LLM with the query for chain-of-thought answer generation
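The rerank-then-truncate logic of steps 2-4 can be sketched independently of the models. Here `score_fn` stands in for BGE-Reranker-v2-M3's query/passage pair scoring, and the word-overlap scorer is a toy substitute for demonstration only:

```python
def rerank(query: str, candidates: list[str], score_fn, top_k: int = 5) -> list[str]:
    """Rescore retrieved candidates and keep the best top_k.

    `score_fn(query, passage) -> float` is a placeholder for a
    cross-encoder's relevance score (higher = more relevant).
    """
    scored = [(score_fn(query, c), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:top_k]]

def overlap(query: str, passage: str) -> float:
    """Toy scorer: fraction of query words that appear in the passage."""
    q = set(query.lower().split())
    return len(q & set(passage.lower().split())) / max(len(q), 1)

top = rerank(
    "vector database",
    ["Qdrant is a vector database.", "Bananas are yellow."],
    overlap,
    top_k=1,
)
```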

## ⚡ Configuration

All settings are managed via `config.py` and can be overridden with environment variables:

| Variable           | Default                   | Description                           |
|--------------------|---------------------------|---------------------------------------|
| `DEEPSEEK_API_KEY` | (required)                | DeepSeek API key                      |
| `QDRANT_URL`       | `http://localhost:6333`   | Qdrant server URL                     |
| `COLLECTION_NAME`  | `gold_standard_rag`       | Qdrant collection name                |
| `DENSE_MODEL`      | `BAAI/bge-m3`             | Sentence-transformer embedding model  |
| `RERANK_MODEL`     | `BAAI/bge-reranker-v2-m3` | Cross-encoder reranker model          |
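The default/override behavior in the table can be sketched with the standard library alone. The repo's `config.py` uses Pydantic; this dataclass is only an illustrative equivalent:

```python
import os
from dataclasses import dataclass, field

@dataclass
class Settings:
    """Environment-overridable settings mirroring the table above.

    DEEPSEEK_API_KEY has no default, so instantiation raises KeyError
    if it is unset (it is the one required variable).
    """
    deepseek_api_key: str = field(
        default_factory=lambda: os.environ["DEEPSEEK_API_KEY"])
    qdrant_url: str = field(
        default_factory=lambda: os.environ.get("QDRANT_URL", "http://localhost:6333"))
    collection_name: str = field(
        default_factory=lambda: os.environ.get("COLLECTION_NAME", "gold_standard_rag"))
    dense_model: str = field(
        default_factory=lambda: os.environ.get("DENSE_MODEL", "BAAI/bge-m3"))
    rerank_model: str = field(
        default_factory=lambda: os.environ.get("RERANK_MODEL", "BAAI/bge-reranker-v2-m3"))
```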

## 📄 License

MIT License

Copyright © 2026 Ayman Sayed (aymanAI.com)

## About

Retrieval-Augmented Generation (RAG) system using FastAPI, the Qdrant vector database, and the DeepSeek LLM, with hybrid (dense + sparse) retrieval and cross-encoder reranking. The document ingestion pipeline handles chunking, embedding generation via sentence transformers, and vector indexing, all served through a NiceGUI web interface for PDF upload and natural-language querying.
