🚀 RAG-LLM: Production-Grade Retrieval-Augmented Generation

A full-featured Retrieval-Augmented Generation system built with FastAPI, Milvus, and LM Studio. Implements advanced RAG techniques including Hybrid Search, HyDE, Cross-Encoder Reranking, Sub-Query Decomposition, Prompt Optimization, and Self-RAG.

📚 Learning Project — This project is built purely for learning purposes. The goal is to mimic production-level architecture and components to understand how RAG systems are implemented behind the scenes in real-world applications — from the layered backend architecture (API → Controller → Services) to the vector search pipeline, cross-encoder reranking, and self-reflection loops.

🖥️ Local LLM — The LLM runs entirely locally via LM Studio on a machine with 128GB RAM, so there are no external API calls or cloud dependencies for inference.
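
Because LM Studio exposes an OpenAI-compatible endpoint, the generation step can be exercised directly with the standard openai client. A minimal sketch (not the repository's llm_client.py), assuming the default base URL, API key, and model name shown in the .env example further down:

```python
# Minimal sketch: call LM Studio through its OpenAI-compatible API.
# Assumes LM Studio is serving at http://127.0.0.1:1234/v1 (see the .env example below).
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="local-model",  # whichever model identifier LM Studio reports
    messages=[
        {"role": "system", "content": "Only answer from the provided context."},
        {"role": "user", "content": "What is retrieval-augmented generation?"},
    ],
)
print(response.choices[0].message.content)
```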

πŸ–ΌοΈ Demo

Chat Interface

Streaming chat with RAG responses and source attribution

demo.mp4

📑 Table of Contents

  • Architecture Overview
  • RAG Pipeline — Step by Step
  • Folder Structure
  • Tech Stack
  • Getting Started
  • API Reference
  • Configuration
  • Understanding the Code — Jupyter Notebook
  • Running Tests
  • Docker Deployment
  • License
  • Screenshots

πŸ— Architecture Overview

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Client    │────▢│           FastAPI Application          │────▢│  Milvus  β”‚
β”‚ (Swagger UI)β”‚     β”‚                                        β”‚     β”‚ Vector DBβ”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                    β”‚  β”‚        RAG Controller             β”‚ β”‚
                    β”‚  β”‚                                   β”‚ β”‚     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                    β”‚  β”‚  Query ──▢ HyDE ──▢ Hybrid Search β”‚ │────▢│  Redis   β”‚
                    β”‚  β”‚    β”‚                    β”‚         β”‚ β”‚     β”‚ (Celery) β”‚
                    β”‚  β”‚    β–Ό                    β–Ό         β”‚ β”‚     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                    β”‚  β”‚  Rerank ──▢ Optimize ──▢ LLM Gen  β”‚ β”‚
                    β”‚  β”‚                          β”‚        β”‚ β”‚     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                    β”‚  β”‚                     Self-RAG Loop β”‚ │────▢│ LM Studioβ”‚
                    β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚     β”‚ (LLM)    β”‚
                    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Layer Architecture

| Layer | Directory | Responsibility |
| --- | --- | --- |
| API | src/api/ | HTTP endpoints, request/response handling, JWT auth |
| Controller | src/controllers/ | Business logic orchestration, RAG pipeline |
| Services | src/services/ | Domain logic — ingestion, retrieval, generation |
| Infrastructure | src/vector_db/ | Database clients, schema management |
| Core | src/core/ | Config, security, shared utilities |
| Models | src/models/ | Pydantic schemas, data models |
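
As a rough sketch of how these layers hand off to one another (class and function names here are illustrative, not the repository's actual signatures), the API layer stays thin and delegates to the controller, which composes the services:

```python
# Illustrative sketch of the API -> Controller -> Services hand-off.
# Names are hypothetical; see src/api/, src/controllers/ and src/services/ for the real ones.
from fastapi import APIRouter, Depends

router = APIRouter()

class RAGController:
    """Orchestrates the pipeline; knows nothing about HTTP or storage details."""

    def __init__(self, retriever, reranker, generator):
        self.retriever, self.reranker, self.generator = retriever, reranker, generator

    async def answer(self, query: str) -> dict:
        candidates = await self.retriever.search(query)       # services/retrieval
        context = self.reranker.rerank(query, candidates)     # services/retrieval
        return await self.generator.generate(query, context)  # services/generator

def get_controller() -> RAGController:
    # In the real app this is wired up via FastAPI dependency injection (src/api/dependencies.py).
    raise NotImplementedError

@router.post("/query")
async def query_endpoint(body: dict, controller: RAGController = Depends(get_controller)):
    return await controller.answer(body["query"])
```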

🔄 RAG Pipeline — Step by Step

When a user sends a query, the following happens inside the RAGController:

User Query
    │
    ▼
┌─────────────────────────────────────────────┐
│  Step 1: QUERY ENHANCEMENT (HyDE)           │
│                                             │
│  • LLM generates a "hypothetical answer"    │
│  • That answer is embedded into a vector    │
│  • This vector is used for retrieval        │
│  • Why? The hypothetical doc is closer in   │
│    embedding space to real answers than the │
│    original question                        │
└─────────────────────────────────────────────┘
    │
    ▼
┌─────────────────────────────────────────────┐
│  Step 2: HYBRID SEARCH                      │
│                                             │
│  Two parallel search strategies:            │
│                                             │
│  Dense Search (Milvus):                     │
│    • COSINE similarity on HNSW index        │
│    • Captures semantic meaning              │
│                                             │
│  Sparse Search (BM25):                      │
│    • Keyword-based term matching            │
│    • Captures exact keyword relevance       │
│                                             │
│  Fusion:                                    │
│    • Reciprocal Rank Fusion (RRF)           │
│    • Merges both ranked lists               │
│    • RRF(d) = Σ 1/(k + rank(d))             │
└─────────────────────────────────────────────┘
    │
    ▼
┌─────────────────────────────────────────────┐
│  Step 3: RERANKING                          │
│                                             │
│  • Cross-Encoder: ms-marco-MiniLM-L-6-v2    │
│  • Scores each (query, passage) pair        │
│  • Much more accurate than bi-encoder       │
│  • Re-orders by true relevance score        │
│  • Selects top-K most relevant chunks       │
└─────────────────────────────────────────────┘
    │
    ▼
┌─────────────────────────────────────────────┐
│  Step 4: PROMPT OPTIMIZATION                │
│                                             │
│  Lost-in-the-Middle Reordering:             │
│    • LLMs pay more attention to start/end   │
│    • Best chunks → positions 1 and N        │
│    • Weaker chunks → middle positions       │
│                                             │
│  Prompt Compression:                        │
│    • Removes filler phrases                 │
│    • Normalizes whitespace                  │
│    • Trims to token budget                  │
└─────────────────────────────────────────────┘
    │
    ▼
┌─────────────────────────────────────────────┐
│  Step 5: LLM GENERATION                     │
│                                             │
│  • Context + query → LLM (LM Studio)        │
│  • System prompt enforces grounding         │
│  • "Only answer from the provided context"  │
│  • Supports streaming (SSE)                 │
└─────────────────────────────────────────────┘
    │
    ▼
┌─────────────────────────────────────────────┐
│  Step 6: SELF-RAG (Reflection)              │
│                                             │
│  • LLM evaluates its own answer             │
│  • Checks: Is it grounded? Complete?        │
│  • If confidence < threshold:               │
│    → Generates a refined query              │
│    → Re-retrieves with new query            │
│    → Merges new context + regenerates       │
│  • Maximum 1 retry to avoid loops           │
└─────────────────────────────────────────────┘
    │
    ▼
  Final Answer + Sources + Metadata
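
The fusion rule from Step 2 is only a few lines of code. A minimal sketch of Reciprocal Rank Fusion over the two ranked lists (k = 60 is the commonly used constant; the value in hybrid_search.py may differ):

```python
# Reciprocal Rank Fusion: RRF(d) = Σ 1/(k + rank(d)), summed over every list that ranked d.
def rrf_fuse(dense_ranked: list[str], sparse_ranked: list[str], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranked in (dense_ranked, sparse_ranked):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Chunks found by both strategies accumulate two terms and float to the top.
    return sorted(scores, key=scores.get, reverse=True)
```

Step 3's cross-encoder scoring follows the standard sentence-transformers pattern. A sketch, assuming the retrieved chunks are plain strings:

```python
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query: str, chunks: list[str], top_k: int = 5) -> list[str]:
    # Score every (query, passage) pair jointly, then keep the top_k chunks.
    scores = reranker.predict([(query, chunk) for chunk in chunks])
    ranked = sorted(zip(chunks, scores), key=lambda pair: pair[1], reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]
```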

Ingestion Pipeline

When a document is uploaded:

Upload File (PDF/TXT/MD)
    │
    ▼
┌──────────────┐    ┌──────────────┐    ┌──────────────┐    ┌──────────┐
│  Load File   │───▶│   Chunking   │───▶│  Embedding   │───▶│  Milvus  │
│  (PyPDF2)    │    │  (512 tok,   │    │  (MiniLM-L6  │    │  Insert  │
│              │    │  128 overlap)│    │   384-dim)   │    │          │
└──────────────┘    └──────────────┘    └──────────────┘    └──────────┘
                         │                    │
                    Text cleaning        L2-normalized
                    (whitespace,         dense vectors
                     special chars)      (cosine ready)
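
In code, the load → chunk → embed hop looks roughly like the sketch below (using LangChain's RecursiveCharacterTextSplitter and sentence-transformers, which the ingestion services wrap; depending on the installed LangChain version the splitter may live in langchain_text_splitters instead):

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter
from sentence_transformers import SentenceTransformer

# For PDFs the text is extracted with PyPDF2 first; a plain text file is read directly.
raw_text = open("tests/sample.txt", encoding="utf-8").read()

splitter = RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=128)
chunks = splitter.split_text(raw_text)

model = SentenceTransformer("all-MiniLM-L6-v2")
# normalize_embeddings=True yields L2-normalized 384-dim vectors, ready for COSINE search in Milvus.
vectors = model.encode(chunks, normalize_embeddings=True)
```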

📂 Folder Structure

RAG-LLM/
│
├── src/                          # Main application source
│   ├── main.py                   # FastAPI app factory, CORS, rate limiting, lifespan
│   ├── rag_utils.py              # Legacy RAG utilities (preserved for reference)
│   │
│   ├── core/                     # Cross-cutting concerns
│   │   ├── config.py             # Pydantic Settings — all env vars centralized
│   │   └── security.py           # JWT auth, password hashing, RBAC, input sanitization
│   │
│   ├── models/                   # Data models
│   │   ├── schemas.py            # Pydantic request/response schemas for all endpoints
│   │   └── database.py           # In-memory user store (replace with DB in production)
│   │
│   ├── api/                      # API layer
│   │   ├── dependencies.py       # FastAPI dependency injection (user store, vector client)
│   │   └── v1/                   # Versioned API routes
│   │       ├── auth.py           # POST /register, POST /login, GET /me, GET /users
│   │       ├── ingest.py         # POST /ingest, GET /ingest, DELETE /ingest/{doc_id}
│   │       └── query.py          # POST /query (JSON + SSE streaming)
│   │
│   ├── controllers/              # Business logic orchestration
│   │   └── rag_controller.py     # Full RAG pipeline: HyDE → Search → Rerank → LLM → Self-RAG
│   │
│   ├── services/                 # Domain services (core RAG logic lives here)
│   │   ├── ingestion/            # Document processing pipeline
│   │   │   ├── chunker.py        # Text loading (PDF/TXT) + RecursiveCharacterTextSplitter
│   │   │   ├── embedder.py       # Sentence-transformer embeddings (all-MiniLM-L6-v2)
│   │   │   └── indexer.py        # Orchestrates: load → chunk → embed → store in Milvus
│   │   │
│   │   ├── retrieval/            # Search & retrieval strategies
│   │   │   ├── hybrid_search.py  # Dense (Milvus) + Sparse (BM25) with RRF fusion
│   │   │   ├── reranker.py       # Cross-encoder reranking (ms-marco-MiniLM-L-6-v2)
│   │   │   ├── hyde.py           # HyDE: hypothetical document embedding for better recall
│   │   │   └── sub_query.py      # Decomposes complex queries into 2-4 sub-queries
│   │   │
│   │   └── generator/            # LLM interaction & response generation
│   │       ├── llm_client.py     # Multi-provider LLM client (LM Studio/OpenAI/Ollama)
│   │       ├── prompt_optimizer.py # Lost-in-the-middle reordering + compression
│   │       └── self_rag.py       # Self-reflection loop with confidence evaluation
│   │
│   ├── vector_db/                # Vector database layer
│   │   ├── client.py             # Milvus client — CRUD, hybrid search, schema migration
│   │   └── schema.py             # Collection schema constants and index configs
│   │
│   └── utils/                    # Shared utilities
│       ├── logger/logger.py      # Centralized logging config
│       └── embedding/generic.py  # Advanced TextTransformer (multiple chunking strategies)
│
├── workers/                      # Async task processing
│   ├── celery_app.py             # Celery configuration (Redis broker)
│   └── ingestion_worker.py       # Background document ingestion task
│
├── tests/                        # Test suite
│   ├── test_api.py               # Integration tests (12 tests covering all endpoints)
│   └── sample.txt                # Sample document for testing
│
├── notebooks/                    # Jupyter notebooks
│   ├── rag_explained.ipynb       # ⬅️  Interactive walkthrough of the RAG pipeline
│   ├── embedding.ipynb           # Embedding experiments
│   └── vectordb.ipynb            # Vector database experiments
│
├── frontend/                     # Frontend app (React + Vite + shadcn/ui)
├── data/                         # Data directories
├── docker-compose.yml            # Full stack: Milvus + Redis + API + Worker
├── Dockerfile                    # Multi-stage Docker build
├── requirements.txt              # Python dependencies
└── .env                          # Environment variables

βš™οΈ Tech Stack

Component Technology Purpose
API Framework FastAPI Async HTTP server with auto-generated OpenAPI docs
Vector Database Milvus Dense vector storage with HNSW/COSINE indexing
Embedding Model all-MiniLM-L6-v2 384-dim sentence embeddings
Reranker Model ms-marco-MiniLM-L-6-v2 Cross-encoder for passage reranking
LLM Provider LM Studio / OpenAI / Ollama Text generation via OpenAI-compatible API
Task Queue Celery + Redis Async background document processing
Auth JWT (python-jose) Token-based authentication with RBAC
Rate Limiting SlowAPI Per-endpoint rate limits
Sparse Search rank_bm25 BM25 keyword matching for hybrid search

🚀 Getting Started

Prerequisites

  • Python 3.10+
  • Docker & Docker Compose (for Milvus & Redis)
  • LM Studio running locally (or any OpenAI-compatible LLM server)

1. Start Infrastructure

# Start Milvus (vector DB) and Redis (task queue)
docker-compose up -d standalone redis

2. Install Dependencies

pip install -r requirements.txt

3. Configure Environment

Edit .env with your settings:

# LLM (point to your LM Studio or OpenAI endpoint)
LLM_BASE_URL=http://127.0.0.1:1234/v1
LLM_API_KEY=lm-studio
LLM_MODEL=local-model

# Milvus
MILVUS_HOST=localhost
MILVUS_PORT=19530

# JWT Secret (change in production!)
JWT_SECRET_KEY=super-secret-change-me-in-prod

4. Run the API

uvicorn src.main:app --host 0.0.0.0 --port 8081

5. Open Swagger UI

Visit http://localhost:8081/docs for interactive API documentation.

Quick workflow:

  1. POST /api/v1/auth/register — create a user
  2. POST /api/v1/auth/login — get a JWT token
  3. Click Authorize 🔒 → enter Bearer <your_token>
  4. POST /api/v1/ingest — upload a document
  5. GET /api/v1/ingest — see your ingested documents
  6. POST /api/v1/query — ask questions about your documents (scripted end-to-end below)
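
The same workflow, scripted with requests. This is a hedged sketch: the exact field names (username, password, access_token, the file form field) follow common FastAPI conventions and may differ from the schemas in src/models/schemas.py:

```python
import requests

BASE = "http://localhost:8081/api/v1"
creds = {"username": "alice", "password": "s3cret"}  # field names are assumptions

requests.post(f"{BASE}/auth/register", json=creds)
token = requests.post(f"{BASE}/auth/login", json=creds).json()["access_token"]
headers = {"Authorization": f"Bearer {token}"}

# Upload a document, then ask a question about it.
with open("tests/sample.txt", "rb") as f:
    requests.post(f"{BASE}/ingest", headers=headers, files={"file": f})

answer = requests.post(
    f"{BASE}/query",
    headers=headers,
    json={"query": "What is this document about?", "top_k": 5},
).json()
print(answer)
```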

📑 API Reference

Authentication

| Method | Endpoint | Auth | Description |
| --- | --- | --- | --- |
| POST | /api/v1/auth/register | None | Create a new user account |
| POST | /api/v1/auth/login | None | Login → returns JWT token |
| GET | /api/v1/auth/me | Bearer | Current user info |
| GET | /api/v1/auth/users | Admin | List all users |

Document Ingestion

| Method | Endpoint | Auth | Description |
| --- | --- | --- | --- |
| POST | /api/v1/ingest | Bearer | Upload document (PDF/TXT/MD) |
| GET | /api/v1/ingest | Bearer | List your ingested documents |
| DELETE | /api/v1/ingest/{doc_id} | Bearer | Delete a specific document |
| DELETE | /api/v1/ingest | Bearer | Delete all your documents |

RAG Query

| Method | Endpoint | Auth | Description |
| --- | --- | --- | --- |
| POST | /api/v1/query | Bearer | Ask a question (supports SSE streaming) |

Query Request Body:

{
  "query": "What is retrieval-augmented generation?",
  "stream": false,
  "enable_hyde": true,
  "enable_reranking": true,
  "enable_self_rag": true,
  "top_k": 5,
  "filters": {}
}
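
With "stream": true the endpoint responds with server-sent events instead of a single JSON body. A sketch of consuming the stream (the exact event payload shape is an assumption):

```python
import requests

resp = requests.post(
    "http://localhost:8081/api/v1/query",
    headers={"Authorization": f"Bearer {token}"},
    json={"query": "What is retrieval-augmented generation?", "stream": True},
    stream=True,
)
for line in resp.iter_lines():
    # SSE frames arrive as "data: ..." lines; the payload shape depends on the server.
    if line.startswith(b"data:"):
        print(line[5:].decode().strip())
```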

Health

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | / | App info |
| GET | /health | Health check + Milvus status |

⚙ Configuration

All settings are in .env and loaded via src/core/config.py:

| Variable | Default | Description |
| --- | --- | --- |
| LLM_BASE_URL | http://127.0.0.1:1234/v1 | LLM endpoint |
| LLM_MODEL | local-model | Model name |
| EMBEDDING_MODEL | all-MiniLM-L6-v2 | Sentence-transformer model |
| EMBEDDING_DIM | 384 | Embedding dimension |
| CHUNK_SIZE | 512 | Characters per chunk |
| CHUNK_OVERLAP | 128 | Overlap between chunks |
| RETRIEVAL_TOP_K | 20 | Candidates from search |
| RETRIEVAL_FINAL_K | 5 | Final chunks after reranking |
| ENABLE_HYDE | true | Enable HyDE query enhancement |
| ENABLE_RERANKING | true | Enable cross-encoder reranking |
| ENABLE_SELF_RAG | true | Enable self-reflection loop |
| RATE_LIMIT_QUERY | 30/minute | Query endpoint rate limit |
| JWT_SECRET_KEY | ... | JWT signing key |
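
Loading follows the usual pydantic-settings pattern. A simplified sketch of what src/core/config.py does (the fields shown are a subset and the attribute names are assumptions; env vars map onto them case-insensitively):

```python
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env")

    llm_base_url: str = "http://127.0.0.1:1234/v1"
    llm_model: str = "local-model"
    embedding_model: str = "all-MiniLM-L6-v2"
    embedding_dim: int = 384
    chunk_size: int = 512
    chunk_overlap: int = 128
    retrieval_top_k: int = 20
    retrieval_final_k: int = 5
    enable_hyde: bool = True
    enable_reranking: bool = True
    enable_self_rag: bool = True

settings = Settings()  # values from .env override the defaults above
```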

📓 Understanding the Code — Jupyter Notebook

For a deep-dive into how each component works, open the interactive notebook:

jupyter notebook notebooks/rag_explained.ipynb

The notebook walks through:

  1. Document Chunking — how text is split into overlapping segments
  2. Embedding — how chunks become 384-dim vectors
  3. Vector Search — how Milvus finds similar documents
  4. BM25 Sparse Search — keyword-based retrieval
  5. Hybrid Fusion (RRF) — combining dense + sparse results
  6. Cross-Encoder Reranking — improving result quality
  7. HyDE — hypothetical document embeddings
  8. Prompt Optimization — lost-in-the-middle reordering (sketched after this list)
  9. Self-RAG — self-reflection and re-retrieval
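
Item 8's reordering trick is small enough to show inline. A sketch of the idea, assuming the chunks arrive sorted best-first from the reranker:

```python
def lost_in_the_middle_reorder(chunks_best_first: list[str]) -> list[str]:
    """Put the strongest chunks at the start and end of the prompt and push the
    weakest toward the middle, where LLMs tend to pay the least attention."""
    front, back = [], []
    for i, chunk in enumerate(chunks_best_first):
        (front if i % 2 == 0 else back).append(chunk)
    return front + back[::-1]

# [c1, c2, c3, c4, c5] ranked best to worst becomes [c1, c3, c5, c4, c2]:
# the two best chunks end up at position 1 and position N.
```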

🧪 Running Tests

# Run all 12 integration tests
python -m pytest tests/test_api.py -v

Tests cover: health checks, authentication, document ingestion, and RAG queries.


🐳 Docker Deployment

# Full stack (Milvus + Redis + API + Celery Worker)
docker-compose up -d

# API only (if infra already running)
docker-compose up -d rag-api

📄 License

MIT

πŸ–ΌοΈ Screenshots

Chat Interface

Streaming chat with RAG responses and source attribution

Chat

Document Management

Drag-and-drop upload, document listing with metadata

Documents

LM Studio (Local LLM)

Running selene-1-mini-llama-3.1-8b model locally

LM Studio

