A production-grade Multimodal RAG (Retrieval-Augmented Generation) system for intelligent document understanding, semantic retrieval, and grounded question answering.
Ask-My-Documents is an advanced multimodal RAG pipeline designed to process complex documents such as research papers, technical PDFs, and enterprise documents. Unlike basic PDF chatbot implementations, this project focuses on retrieval quality, multimodal understanding, and production-style document ingestion pipelines.
The system combines:
- Semantic chunking — title-aware, context-preserving document segmentation
- OCR-aware extraction — robust handling of scanned and image-heavy documents
- Table-aware parsing — structured preservation of tabular data
- Image-aware understanding — vision-language model integration for multimodal content
- AI-powered semantic enrichment — LLM-augmented chunk metadata
- Vector retrieval pipelines — high-performance semantic search via ChromaDB
| Feature | Description |
|---|---|
| 📄 PDF Ingestion | Parse and extract content from complex PDFs |
| 🧠 Semantic Enrichment | AI-enhanced chunk-level understanding |
| 🖼️ Multimodal Understanding | Vision-Language Model integration (Qwen2-VL) |
| 📊 Table Extraction | Structure-preserving table parsing |
| 🔎 Vector Retrieval | Semantic search with ChromaDB |
| 🧩 Title-Aware Chunking | Context-coherent document segmentation |
| 📚 Grounded Generation | Source-cited answer generation |
| 🏷️ Metadata-Aware Retrieval | Rich metadata for precision filtering |
| 🚀 Production Architecture | Modular, extensible ingestion pipeline |
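The "Metadata-Aware Retrieval" and "Grounded Generation" rows can be illustrated with a small standard-library sketch. It is not the project's actual code: `RetrievedChunk`, `filter_chunks`, `build_grounded_prompt`, and the metadata keys (`source`, `page`, `type`) are hypothetical names, and the sample chunks are made-up data. The idea is to filter retrieved chunks by metadata, then number the survivors so the model can cite them.

```python
from dataclasses import dataclass, field


@dataclass
class RetrievedChunk:
    """A retrieved chunk plus the metadata used for filtering and citation."""
    text: str
    metadata: dict = field(default_factory=dict)


def filter_chunks(chunks, **required):
    """Keep only chunks whose metadata matches every required key/value."""
    return [
        c for c in chunks
        if all(c.metadata.get(k) == v for k, v in required.items())
    ]


def build_grounded_prompt(question, chunks):
    """Number each chunk so the model can cite it as [1], [2], ..."""
    sources = []
    for i, c in enumerate(chunks, start=1):
        source = c.metadata.get("source", "unknown")
        page = c.metadata.get("page", "?")
        sources.append(f"[{i}] ({source}, p. {page}) {c.text}")
    context = "\n".join(sources)
    return (
        "Answer using only the numbered sources below. "
        "Cite sources as [n]. If the answer is not in the sources, say so.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}"
    )


# Made-up example data, for illustration only.
chunks = [
    RetrievedChunk("Revenue grew 12% in 2023.",
                   {"source": "report.pdf", "page": 4, "type": "table"}),
    RetrievedChunk("The model uses a vision encoder.",
                   {"source": "paper.pdf", "page": 2, "type": "text"}),
]
table_chunks = filter_chunks(chunks, type="table")
prompt = build_grounded_prompt("How much did revenue grow?", table_chunks)
print(prompt)
```

In a real deployment the filtering would more likely be pushed down into the vector store query than done in Python after retrieval.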
| Component | Technology |
|---|---|
| Document Parsing | Unstructured |
| Embeddings | BAAI/bge-small-en-v1.5 |
| Vector Database | ChromaDB |
| Multimodal Model | Qwen2-VL |
| Framework | LangChain |
| OCR | Tesseract OCR |
| PDF Processing | Poppler |
| Backend | Python 3.10+ |
Document Upload
│
▼
Document Parsing (Unstructured)
│
▼
Semantic Element Extraction
│
▼
Title-Aware Chunking
│
▼
Multimodal Content Extraction
│
▼
AI Semantic Enrichment
│
▼
Embeddings Generation
│
▼
ChromaDB Vector Storage
│
▼
Hybrid Retrieval + Re-ranking
│
▼
Grounded Answer Generation
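The "Title-Aware Chunking" step in the pipeline above can be sketched with the standard library alone. This is a minimal illustration, not the project's implementation: elements are assumed to arrive as `(category, text)` pairs using category names such as `Title`, `NarrativeText`, and `Table`, and `Chunk`, `title_aware_chunks`, and `max_chars` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """Text grouped under the section title it belongs to."""
    title: str
    parts: list = field(default_factory=list)

    @property
    def text(self):
        return "\n".join(self.parts)


def title_aware_chunks(elements, max_chars=500):
    """Group (category, text) elements into chunks that never span a title.

    A title with no body text before the next title produces no chunk.
    """
    chunks = []
    current = Chunk(title="")
    for category, text in elements:
        if category == "Title":
            # A new title always closes the chunk in progress.
            if current.parts:
                chunks.append(current)
            current = Chunk(title=text)
            continue
        # Split inside a section if this element would exceed the size budget.
        if current.parts and len(current.text) + 1 + len(text) > max_chars:
            chunks.append(current)
            current = Chunk(title=current.title)
        current.parts.append(text)
    if current.parts:
        chunks.append(current)
    return chunks


elements = [
    ("Title", "Introduction"),
    ("NarrativeText", "RAG combines retrieval with generation."),
    ("NarrativeText", "It grounds answers in source documents."),
    ("Title", "Method"),
    ("Table", "model | score"),
]
chunks = title_aware_chunks(elements, max_chars=60)
for c in chunks:
    print(repr(c.title), "->", repr(c.text))
```

Carrying the section title onto every chunk, including chunks created by a size split, keeps each one tied to its heading, which is the context that a fixed-size splitter loses.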
ask-my-documents/
│
├── data/
├── notebooks/
├── src/
│   ├── ingestion/
│   ├── chunking/
│   ├── embeddings/
│   ├── retrieval/
│   ├── llm/
│   ├── evaluation/
│   └── utils/
│
├── vector_db/
├── app/
├── requirements.txt
└── README.md
git clone https://github.com/your-username/ask-my-documents.git
cd ask-my-documents
python -m venv venv
# Windows
venv\Scripts\activate
# Linux / macOS
source venv/bin/activate
pip install -r requirements.txt
Linux:
sudo apt-get install poppler-utils tesseract-ocr
Windows:
Download and install Poppler and Tesseract OCR, then add both to your system PATH.
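After installing the system dependencies, a short check like the following can confirm that both tools are reachable on PATH. It is a sketch under one assumption: that the relevant executables are `pdftoppm` (shipped with Poppler) and `tesseract`. `REQUIRED_TOOLS` and `missing_tools` are illustrative names, not part of the project.

```python
import shutil

# Assumed executables: `pdftoppm` ships with Poppler, `tesseract` with Tesseract OCR.
REQUIRED_TOOLS = {"pdftoppm": "Poppler", "tesseract": "Tesseract OCR"}


def missing_tools(which=shutil.which):
    """Return the names of required packages whose executable is not on PATH."""
    return [name for exe, name in REQUIRED_TOOLS.items() if which(exe) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("Poppler and Tesseract OCR are both available.")
```

The `which` parameter exists only so the lookup can be replaced in tests. In normal use, call `missing_tools()` with no arguments.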
- Semantic document chunking
- Multimodal preprocessing pipeline
- OCR-aware extraction
- Table-aware document understanding
- Image-aware retrieval enrichment
- Vision-language model integration
- Retrieval-ready semantic indexing
- Hybrid Retrieval (BM25 + Dense Retrieval)
- Cross-Encoder Re-ranking
- RAG Evaluation Pipeline (Ragas)
- Citation-aware responses
- Streamlit / FastAPI deployment
- Image embedding retrieval
- Parent-child retrieval
- Cross-modal search
- 📖 Research Paper Assistant — Query academic papers with grounded citations
- 🛠️ Technical Documentation QA — Instant answers from complex technical manuals
- 🏢 Enterprise Document Search — Retrieve insights from internal knowledge bases
- 📊 Table-Aware QA — Ask questions directly about tabular data
- 🌐 Multimodal Knowledge Retrieval — Combine text and image understanding
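The "Hybrid Retrieval (BM25 + Dense Retrieval)" item listed earlier needs some way to merge two ranked result lists. The source does not say which fusion method the project uses. The sketch below shows reciprocal rank fusion (RRF), one common choice. The chunk IDs are made-up, `reciprocal_rank_fusion` is a hypothetical helper, and `k=60` is a conventional default rather than a project setting.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document IDs; a higher fused score ranks first."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every ID it contains.
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending), breaking ties by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Made-up rankings standing in for BM25 and dense-retrieval results.
bm25_ranking = ["chunk-3", "chunk-1", "chunk-7"]
dense_ranking = ["chunk-1", "chunk-9", "chunk-3"]
fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
print(fused)
```

RRF uses only rank positions, so BM25 scores and embedding similarities do not need to be normalized onto a common scale before fusing.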
This project is licensed under the MIT License.
Developed as part of an advanced RAG engineering and multimodal retrieval learning journey.