This project is a local RAG system built with TypeScript and Node.js. It ingests PDF and Markdown files, stores document chunks in Chroma, combines vector search with BM25 retrieval, reranks the results, and generates grounded answers with citations.
The goal is simple: answer questions from your own documents with evidence, instead of relying only on model memory.
- loads .pdf, .md, and .markdown files from data/raw
- splits content into overlapping chunks
- stores embeddings in Chroma
- builds BM25 search from processed chunks
- combines keyword and vector retrieval
- reranks results before generation
- returns answers with citations
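To make "overlapping chunks" concrete, here is a minimal sketch of character-based chunking. The `chunkText` helper, the `Chunk` shape, and the default sizes of 800 and 120 characters are assumptions for illustration only; the project may split by tokens or rely on a LangChain text splitter instead.

```typescript
// Hypothetical chunk shape; the real fields in chunks.json may differ.
interface Chunk {
  id: string;
  source: string;
  text: string;
  start: number; // character offset in the normalized text
}

// Splits text into fixed-size windows that share `chunkOverlap` characters.
// The default sizes are illustrative assumptions, not the project's values.
function chunkText(
  text: string,
  source: string,
  chunkSize = 800,
  chunkOverlap = 120,
): Chunk[] {
  if (chunkSize <= 0) throw new Error("chunkSize must be positive");
  if (chunkOverlap < 0 || chunkOverlap >= chunkSize) {
    throw new Error("chunkOverlap must be in [0, chunkSize)");
  }
  // Collapse whitespace runs so chunk sizes are comparable across files.
  const normalized = text.replace(/\s+/g, " ").trim();
  const step = chunkSize - chunkOverlap;
  const chunks: Chunk[] = [];
  for (let start = 0; start < normalized.length; start += step) {
    const piece = normalized.slice(start, start + chunkSize);
    chunks.push({ id: `${source}#${chunks.length}`, source, text: piece, start });
    // Stop once a window has reached the end of the text.
    if (start + chunkSize >= normalized.length) break;
  }
  return chunks;
}
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.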
A plain LLM can answer fluently, but not always reliably from your source material. This project adds retrieval first, then asks the model to answer only from the retrieved context. That makes the output easier to inspect and more useful for document-based question answering.
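One way to ask the model to answer only from retrieved context is to number the passages in the prompt and request citations by number. The sketch below assumes a hypothetical `buildGroundedPrompt` helper and `RetrievedChunk` shape; the project's actual prompt wording is not shown in this README.

```typescript
// Hypothetical shape of a retrieved passage.
interface RetrievedChunk {
  source: string;
  text: string;
}

// Builds a prompt that restricts the model to the numbered passages
// and asks it to cite them as [1], [2], and so on.
function buildGroundedPrompt(question: string, chunks: RetrievedChunk[]): string {
  const context = chunks
    .map((chunk, index) => `[${index + 1}] (${chunk.source})\n${chunk.text}`)
    .join("\n\n");
  return [
    "Answer the question using only the numbered context below.",
    "Cite the passages you rely on by number, for example [1].",
    "If the context does not contain the answer, say that you do not know.",
    "",
    "Context:",
    context,
    "",
    `Question: ${question}`,
  ].join("\n");
}
```

Because each passage carries its number and source, a reader can trace every cited claim in the answer back to a specific chunk.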
- Documents are loaded from data/raw.
- Text is normalized and chunked.
- Chunks are saved to data/processed/chunks.json.
- The same chunks are embedded and stored in Chroma.
- At query time, the system retrieves with Chroma and BM25.
- Results are fused, reranked, and passed to Ollama for the final answer.
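The fusion step in the list above can be illustrated with reciprocal rank fusion (RRF), a common way to merge a vector ranking with a BM25 ranking without comparing their raw scores. This README does not say which fusion method the project uses, so treat RRF, the `reciprocalRankFusion` name, and the constant `k = 60` as assumptions.

```typescript
interface FusedHit {
  id: string;
  score: number;
}

// Merges several ranked lists of chunk ids. Each list contributes
// 1 / (k + rank) per id, with rank starting at 1 for the best hit.
function reciprocalRankFusion(rankings: string[][], k = 60): FusedHit[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return Array.from(scores, ([id, score]) => ({ id, score })).sort(
    (a, b) => b.score - a.score,
  );
}

// Example: chunk "b" ranks high in both lists, so it comes out on top.
const vectorHits = ["a", "b", "c"];
const bm25Hits = ["b", "d", "a"];
const fused = reciprocalRankFusion([vectorHits, bm25Hits]);
```

Chunks found by both retrievers accumulate two contributions, which is why they tend to outrank chunks that only one retriever returned.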
- Node.js
- TypeScript
- Express
- Ollama
- ChromaDB
- LangChain
- wink-bm25-text-search
- @xenova/transformers
Install dependencies and copy the environment file:
npm install
cp .env.example .env
Default services expected locally:
- Ollama at http://127.0.0.1:11434
- Chroma at http://127.0.0.1:8000
Default Ollama models:
ollama pull llama3.1:8b
ollama pull nomic-embed-text
Add your source documents to data/raw.
Ingest documents:
npm run ingest
Query from the CLI:
npm run query -- "What does the document say about cancellations?"
Run the API server:
npm run dev
Build and run:
npm run build
npm start
Run evaluation:
npm run evaluate
Returns the active model, embedding model, and collection.
Runs ingestion and returns document and chunk counts.
Example request:
curl -X POST http://localhost:3000/query \
-H "Content-Type: application/json" \
-d '{"question":"Summarize the cancellation terms with citations.","topK":6}'
Main settings come from .env:
- OLLAMA_BASE_URL
- OLLAMA_LLM_MODEL
- OLLAMA_EMBEDDING_MODEL
- CHROMA_URL
- CHROMA_COLLECTION_NAME
- RAW_DOCS_DIR
- PROCESSED_DOCS_DIR
- DEFAULT_TOP_K
- DEFAULT_VECTOR_K
- DEFAULT_BM25_K
- CHUNK_SIZE
- CHUNK_OVERLAP
- RERANKER_MODEL
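As a rough illustration of how a few of these settings might be read and validated, the sketch below parses an environment-like object into a typed config. The `loadConfig` helper and `AppConfig` shape are hypothetical. The URL and model fallbacks mirror the defaults listed earlier in this README, while the numeric fallbacks (800, 120, 6) are assumptions, not the project's documented values.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical subset of the settings; the real config may hold more fields.
interface AppConfig {
  ollamaBaseUrl: string;
  chromaUrl: string;
  llmModel: string;
  embeddingModel: string;
  chunkSize: number;
  chunkOverlap: number;
  defaultTopK: number;
}

function readString(env: Env, key: string, fallback: string): string {
  const raw = env[key];
  return raw !== undefined && raw.trim() !== "" ? raw.trim() : fallback;
}

function readInt(env: Env, key: string, fallback: number): number {
  const raw = env[key];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  if (!isFinite(value) || Math.floor(value) !== value || value < 0) {
    throw new Error(`${key} must be a non-negative integer, got "${raw}"`);
  }
  return value;
}

function loadConfig(env: Env): AppConfig {
  const config: AppConfig = {
    ollamaBaseUrl: readString(env, "OLLAMA_BASE_URL", "http://127.0.0.1:11434"),
    chromaUrl: readString(env, "CHROMA_URL", "http://127.0.0.1:8000"),
    llmModel: readString(env, "OLLAMA_LLM_MODEL", "llama3.1:8b"),
    embeddingModel: readString(env, "OLLAMA_EMBEDDING_MODEL", "nomic-embed-text"),
    chunkSize: readInt(env, "CHUNK_SIZE", 800), // assumed fallback
    chunkOverlap: readInt(env, "CHUNK_OVERLAP", 120), // assumed fallback
    defaultTopK: readInt(env, "DEFAULT_TOP_K", 6), // assumed fallback
  };
  // An overlap as large as the chunk itself would never advance through the text.
  if (config.chunkOverlap >= config.chunkSize) {
    throw new Error("CHUNK_OVERLAP must be smaller than CHUNK_SIZE");
  }
  return config;
}
```

In a Node.js process this would typically be called as `loadConfig(process.env)` once the .env file has been loaded. Failing fast on a malformed value is usually easier to debug than a retrieval run that silently uses the wrong chunk size.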
- data/processed/chunks.json is required because BM25 is built from it at runtime.
- Chroma and the processed chunk file are both part of retrieval.
- First query can be slower because the reranker loads lazily.
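The lazy loading mentioned in the last note can be sketched as a memoized loader: nothing is loaded at startup, the first call pays the cost, and later calls reuse the result. Everything named here (`lazy`, `loadReranker`, `getReranker`, and the word-overlap scoring that stands in for a real model) is a hypothetical illustration, not the project's code.

```typescript
// Runs `factory` on the first call and returns the cached result afterwards.
function lazy<T>(factory: () => T): () => T {
  let cached: { value: T } | undefined;
  return () => {
    if (cached === undefined) {
      cached = { value: factory() };
    }
    return cached.value;
  };
}

type Reranker = (query: string, passages: string[]) => Promise<number[]>;

let loadCount = 0; // only here to show that loading happens once

// Stand-in for an expensive model load. The scoring below simply counts
// query terms that occur in each passage; a real reranker would run a model.
async function loadReranker(): Promise<Reranker> {
  loadCount += 1;
  return async (query, passages) => {
    const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
    return passages.map((passage) => {
      const text = passage.toLowerCase();
      return terms.filter((term) => text.indexOf(term) !== -1).length;
    });
  };
}

// The promise itself is cached, so concurrent first queries share one load.
const getReranker = lazy(loadReranker);

async function rerank(query: string, passages: string[]): Promise<string[]> {
  const score = await getReranker();
  const scores = await score(query, passages);
  return passages
    .map((passage, index) => ({ passage, score: scores[index] }))
    .sort((a, b) => b.score - a.score)
    .map((entry) => entry.passage);
}
```

One trade-off of caching the promise is that a failed load stays cached as a rejection, so a production version would probably clear the cache on error to allow a retry.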