raglite

A tiny, dependency-light Retrieval-Augmented Generation (RAG) toolkit in pure Python. Index a folder of documents and ask grounded, source-cited questions about them. It runs fully offline out of the box using built-in TF-IDF embeddings (no API keys, no network), and you can switch to OpenAI for production-quality results by passing a different embedder and answerer to the pipeline.

Why this exists

Most RAG tutorials drag in a heavy vector database and a paid API before you can run a single query. raglite keeps the whole pipeline transparent and hackable: roughly 400 lines of readable code that show exactly how chunking, embedding, vector retrieval and answer generation fit together — while still exposing clean seams to plug in real production components.

How it works

documents ──▶ chunking ──▶ embeddings ──▶ vector store ──▶ retrieval ──▶ answer
            (sentence-aware,  (TF-IDF or     (cosine /        (top-k)     (extractive
             overlapping)      OpenAI)        dot product)                 or LLM)

Each stage is an interchangeable component behind a small protocol, so you can replace the offline embedder with OpenAI, or the extractive answerer with a chat model, without touching the rest of the system.
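
To illustrate what "behind a small protocol" can look like, here is a minimal sketch using typing.Protocol. The method name embed, its signature, and the LengthEmbedder and index helpers are assumptions made for this example; check embeddings.py for the interface raglite actually defines.

```python
from typing import Protocol, Sequence


class Embedder(Protocol):
    """Hypothetical embedder shape: a batch of texts in, one vector per text out."""

    def embed(self, texts: Sequence[str]) -> list[list[float]]: ...


class LengthEmbedder:
    """Toy backend that satisfies the protocol structurally (no inheritance)."""

    def embed(self, texts: Sequence[str]) -> list[list[float]]:
        # One-dimensional "embedding": the character count of each text.
        return [[float(len(t))] for t in texts]


def index(texts: Sequence[str], embedder: Embedder) -> list[list[float]]:
    # The caller depends only on the protocol, so any conforming backend works.
    return embedder.embed(texts)


vectors = index(["hello", "hi"], LengthEmbedder())
```

Because the check is structural, a replacement backend only needs a matching method; it does not have to import or subclass anything from the rest of the system.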

Install

git clone https://github.com/GoodVaibhs/raglite.git
cd raglite
pip install -e .

Optional production backends:

pip install -e ".[openai]"

Quick start (Python)

from raglite import RAGPipeline

rag = RAGPipeline()
rag.add_text("The Eiffel Tower is located in Paris, France.", source="geo")
rag.add_directory("examples/sample_docs")   # index .txt files in a folder
rag.build()

result = rag.ask("Where is the Eiffel Tower?")
print(result.answer)
for s in result.sources:
    print(f"  {s.score:.2f}  {s.chunk.source}")

Quick start (CLI)

# One-shot question
python -m raglite.cli --docs examples/sample_docs --ask "What are the stages of a RAG pipeline?"

# Interactive REPL
python -m raglite.cli --docs examples/sample_docs

Using OpenAI for production-quality answers

from raglite import RAGPipeline, OpenAIEmbedder, OpenAIAnswerer

rag = RAGPipeline(
    embedder=OpenAIEmbedder(model="text-embedding-3-small"),
    answerer=OpenAIAnswerer(model="gpt-4o-mini"),
)
rag.add_directory("examples/sample_docs").build()
print(rag.ask("How does RAG reduce hallucination?").answer)

Set OPENAI_API_KEY in your environment before running this example.

Design notes

  • Sentence-aware chunking with configurable overlap so facts that straddle a boundary are not lost.
  • Hashed TF-IDF embeddings — deterministic, fast and dependency-free, with smoothed inverse-document-frequency weighting.
  • Cosine similarity via normalised dot products — vectors are L2-normalised on insertion, so a query is a single matrix-vector multiply.
  • Protocol-based components (Embedder, Answerer) make backends swappable.
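
The first three notes can be sketched end to end in plain Python. This is an illustrative approximation, not raglite's code: the function names, the 1024-bucket size, the crc32 hashing, the regex sentence splitter, and the particular smoothed IDF formula, log((1 + N) / (1 + df)) + 1, are all assumptions chosen for the example.

```python
import math
import re
import zlib
from collections import Counter

DIM = 1024  # number of hash buckets; an arbitrary choice for this sketch


def chunk_sentences(text, max_chars=200, overlap=1):
    """Pack sentences into chunks of about max_chars, repeating the last
    `overlap` sentences of each chunk at the start of the next."""
    # Naive split: break on whitespace that follows ., ! or ?
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], []
    for sentence in sentences:
        if current and len(" ".join(current + [sentence])) > max_chars:
            chunks.append(" ".join(current))
            # Carry the tail of the finished chunk into the next one.
            current = current[-overlap:] if overlap > 0 else []
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def bucket(token):
    # crc32 is stable across runs, unlike Python's built-in hash() for strings.
    return zlib.crc32(token.encode("utf-8")) % DIM


def fit_idf(docs):
    """Smoothed IDF per hash bucket: log((1 + N) / (1 + df)) + 1."""
    df = Counter()
    for doc in docs:
        df.update({bucket(t) for t in tokenize(doc)})
    n = len(docs)
    return [math.log((1 + n) / (1 + df[b])) + 1.0 for b in range(DIM)]


def embed(text, idf):
    """TF-IDF vector over hash buckets, scaled to unit L2 norm."""
    vec = [0.0] * DIM
    for b, count in Counter(bucket(t) for t in tokenize(text)).items():
        vec[b] = count * idf[b]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def top_k(query_vec, matrix, k=1):
    """With unit-length rows, each dot product is a cosine similarity."""
    scores = [sum(q * x for q, x in zip(query_vec, row)) for row in matrix]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order[:k]]


text = (
    "The Eiffel Tower is located in Paris, France. "
    "Paris is the capital of France. "
    "Retrieval-augmented generation grounds answers in retrieved text."
)
chunks = chunk_sentences(text, max_chars=80, overlap=1)
idf = fit_idf(chunks)
matrix = [embed(c, idf) for c in chunks]  # normalised once, on insertion
best = top_k(embed("Where is the Eiffel Tower?", idf), matrix, k=1)
print(chunks[best[0][0]])
```

The middle sentence lands in both chunks, which is the overlap at work. Normalising each vector when it is stored means scoring a query needs no per-row norm computation; with NumPy the loop in top_k would collapse to a single matrix-vector product.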

Testing

pip install -e ".[dev]"
pytest -v

The test suite covers chunking edge cases, embedding similarity properties, and an end-to-end retrieval check.

Project layout

src/raglite/
  chunking.py      sentence-aware, overlapping chunker
  embeddings.py    TF-IDF (offline) + OpenAI (optional) embedders
  vectorstore.py   in-memory cosine-similarity store
  llm.py           extractive (offline) + OpenAI (optional) answerers
  pipeline.py      high-level RAGPipeline API
  cli.py           command-line interface
tests/             pytest suite
examples/          sample documents

License

MIT — see LICENSE.
