A tiny, dependency-light Retrieval-Augmented Generation (RAG) toolkit in pure Python. Index a folder of documents and ask grounded, source-cited questions about them. It runs fully offline out of the box using built-in TF-IDF embeddings — no API keys, no network — and swaps in OpenAI for production-quality results with a single line of code.
Most RAG tutorials drag in a heavy vector database and a paid API before you can
run a single query. raglite keeps the whole pipeline transparent and hackable:
roughly 400 lines of readable code that show exactly how chunking, embedding,
vector retrieval and answer generation fit together — while still exposing clean
seams to plug in real production components.
documents ──▶ chunking ──▶ embeddings ──▶ vector store ──▶ retrieval ──▶ answer
              (sentence-aware,  (TF-IDF or   (cosine /      (top-k)     (extractive
               overlapping)      OpenAI)      dot product)               or LLM)
Each stage is an interchangeable component behind a small protocol, so you can replace the offline embedder with OpenAI, or the extractive answerer with a chat model, without touching the rest of the system.
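To make the "small protocol" idea concrete, here is a minimal sketch using Python's typing.Protocol. The method name `embed`, its signature, and both toy backends are assumptions made for illustration; raglite's actual Embedder interface may look different.

```python
from typing import Protocol, Sequence, runtime_checkable


@runtime_checkable
class Embedder(Protocol):
    """Hypothetical interface: any object with a matching embed() qualifies."""

    def embed(self, texts: Sequence[str]) -> list[list[float]]: ...


class LengthEmbedder:
    """Toy backend: maps each text to [character count, word count]."""

    def embed(self, texts: Sequence[str]) -> list[list[float]]:
        return [[float(len(t)), float(len(t.split()))] for t in texts]


class VowelEmbedder:
    """A second toy backend exposing the same method signature."""

    def embed(self, texts: Sequence[str]) -> list[list[float]]:
        return [[float(sum(t.lower().count(v) for v in "aeiou"))] for t in texts]


def index(texts: Sequence[str], embedder: Embedder) -> list[list[float]]:
    # The caller depends only on the protocol, never on a concrete class,
    # so backends can be swapped without changing this function.
    return embedder.embed(texts)


docs = ["hello world", "RAG"]
print(index(docs, LengthEmbedder()))
print(index(docs, VowelEmbedder()))
```

Neither backend inherits from Embedder; structural typing means a matching method is enough, which is what keeps the rest of the system untouched when a component is replaced.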
git clone https://github.com/GoodVaibhs/raglite.git
cd raglite
pip install -e .

Optional production backends:

pip install -e ".[openai]"

from raglite import RAGPipeline
rag = RAGPipeline()
rag.add_text("The Eiffel Tower is located in Paris, France.", source="geo")
rag.add_directory("examples/sample_docs") # index .txt files in a folder
rag.build()
result = rag.ask("Where is the Eiffel Tower?")
print(result.answer)
for s in result.sources:
    print(f" {s.score:.2f} {s.chunk.source}")

# One-shot question
python -m raglite.cli --docs examples/sample_docs --ask "What are the stages of a RAG pipeline?"
# Interactive REPL
python -m raglite.cli --docs examples/sample_docs

from raglite import RAGPipeline, OpenAIEmbedder, OpenAIAnswerer
rag = RAGPipeline(
    embedder=OpenAIEmbedder(model="text-embedding-3-small"),
    answerer=OpenAIAnswerer(model="gpt-4o-mini"),
)
rag.add_directory("examples/sample_docs").build()
print(rag.ask("How does RAG reduce hallucination?").answer)

Set OPENAI_API_KEY in your environment first.
- Sentence-aware chunking with configurable overlap so facts that straddle a boundary are not lost.
- Hashed TF-IDF embeddings — deterministic, fast and dependency-free, with smoothed inverse-document-frequency weighting.
- Cosine similarity via normalised dot products — vectors are L2-normalised on insertion, so a query is a single matrix-vector multiply.
- Protocol-based components (Embedder, Answerer) make backends swappable.
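The techniques listed above can be sketched end to end with the standard library alone. This is an illustration written from the descriptions, not raglite's source: the function names, the bucket count `DIM`, the use of `zlib.crc32` as the hash, and the particular smoothed IDF formula (a common variant) are all assumptions.

```python
import math
import re
import zlib

DIM = 4096  # number of hash buckets (illustrative choice)


def chunk(text, max_sentences=2, overlap=1):
    """Group sentences into chunks; consecutive chunks share `overlap` sentences."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    step = max(1, max_sentences - overlap)
    chunks = []
    for start in range(0, len(sentences), step):
        chunks.append(" ".join(sentences[start:start + max_sentences]))
        if start + max_sentences >= len(sentences):
            break  # the last sentence is covered, so stop
    return chunks


def tokens(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def bucket(token):
    # crc32 is stable across runs, unlike Python's salted built-in hash().
    return zlib.crc32(token.encode("utf-8")) % DIM


def fit_idf(docs):
    """Smoothed IDF per bucket: log((1 + N) / (1 + df)) + 1."""
    df = [0] * DIM
    for doc in docs:
        for b in {bucket(t) for t in tokens(doc)}:
            df[b] += 1
    n = len(docs)
    return [math.log((1 + n) / (1 + d)) + 1.0 for d in df]


def embed(text, idf):
    """Hashed TF-IDF vector, scaled to unit L2 norm."""
    vec = [0.0] * DIM
    for t in tokens(text):
        vec[bucket(t)] += 1.0
    vec = [tf * w for tf, w in zip(vec, idf)]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def top_k(query_vec, matrix, k=1):
    """For unit vectors, the dot product equals cosine similarity."""
    scores = [sum(q * x for q, x in zip(query_vec, row)) for row in matrix]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order[:k]]


text = (
    "The Eiffel Tower is located in Paris. It was completed in 1889. "
    "Mount Fuji is the tallest mountain in Japan. It is an active volcano."
)
chunks = chunk(text)
idf = fit_idf(chunks)
matrix = [embed(c, idf) for c in chunks]  # rows are normalised on insertion
best, score = top_k(embed("Where is the Eiffel Tower?", idf), matrix)[0]
print(chunks[best], round(score, 2))
```

Because every stored row already has unit length, scoring a query needs no per-row norm computation, and the sentence shared between neighbouring chunks keeps a fact retrievable even when it sits on a chunk boundary.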
pip install -e ".[dev]"
pytest -v

The test suite covers chunking edge cases, embedding similarity properties, and an end-to-end retrieval check.
src/raglite/
  chunking.py      sentence-aware, overlapping chunker
  embeddings.py    TF-IDF (offline) + OpenAI (optional) embedders
  vectorstore.py   in-memory cosine-similarity store
  llm.py           extractive (offline) + OpenAI (optional) answerers
  pipeline.py      high-level RAGPipeline API
  cli.py           command-line interface
tests/             pytest suite
examples/          sample documents
MIT — see LICENSE.