RAG Lab

A local-first visual document analysis workbench. Upload PDFs and images, ask questions, and get answers grounded in your documents — all running on your own hardware.

RAG Lab uses ColPali visual embeddings to understand document layout and content visually, paired with any Ollama-compatible LLM for inference. Run models locally or connect to Ollama Cloud for larger models.

Features

RAG & retrieval

Visual RAG — ColPali embeddings capture page layout, tables, and figures natively without text extraction. Optional OCR mode extracts text from retrieved pages for LLMs that work better with text input.
Hybrid retrieval — Combines ColPali visual search with BM25 keyword search. Visual embeddings find charts and layouts; keyword matching anchors on specific terms. Matching text snippets appear as expandable citations below each retrieved page.
LLM reranking (opt-in) — Second-stage relevance scoring of the top-K retrieved pages by a small LLM. Improves precision for ambiguous queries. Toggle in Settings → Retrieval.
HyDE query expansion (opt-in) — Generates a hypothetical answer passage and retrieves against it, bridging the lexical gap between questions and documents. Toggle in Settings → Retrieval.
Adaptive retrieval — Score-slope analysis dynamically adjusts how many pages are retrieved per query.
Batch processing — Process multiple documents with per-document streaming responses.
Full-document summarization — Generic queries ("summarize") automatically process all pages in sequential chunks.
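
To make the hybrid and adaptive retrieval ideas above concrete, here is a minimal sketch. It is an illustration under stated assumptions, not RAG Lab's actual implementation: the README does not say how visual and BM25 results are merged, so reciprocal rank fusion is swapped in as one common choice, and the "steepest drop" rule is only one possible reading of score-slope analysis. The function names, the constant k=60, and the min_k/max_k bounds are all hypothetical.

```python
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several best-first rankings of page IDs into one list.

    Assumption: RRF stands in for whatever fusion the project really uses.
    """
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, page_id in enumerate(ranking, start=1):
            scores[page_id] = scores.get(page_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by page ID for determinism.
    return sorted(scores, key=lambda p: (-scores[p], p))


def adaptive_cutoff(scores: Sequence[float], min_k: int = 1, max_k: int = 10) -> int:
    """Choose how many results to keep by cutting at the steepest score drop.

    `scores` must already be sorted in descending order.
    """
    limit = min(max_k, len(scores))
    if limit <= min_k:
        return limit
    # Drop between results i-1 and i; cutting at i keeps the first i results.
    drops = [(scores[i - 1] - scores[i], i) for i in range(min_k, limit)]
    best_drop, cut = max(drops, key=lambda d: (d[0], -d[1]))
    # With a flat score curve there is no natural cut, so keep everything allowed.
    return cut if best_drop > 0 else limit


if __name__ == "__main__":
    visual = ["p3", "p1", "p7"]   # hypothetical ColPali ranking
    keyword = ["p1", "p9", "p3"]  # hypothetical BM25 ranking
    print(reciprocal_rank_fusion([visual, keyword]))
    print(adaptive_cutoff([0.92, 0.90, 0.88, 0.41, 0.40]))
```

Rank-based fusion avoids having to normalize visual similarity scores and BM25 scores onto a common scale, which is why it is a frequent default for hybrid search.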

LLM & UX

Any Ollama model — Works with any local model, plus Ollama Cloud models via API key (Settings → Advanced). Vision, text, and reasoning models supported.
Prompt templates — Create custom extraction templates (e.g., K-1 line item extractor) for structured data output.
Conversation memory — Mem0 automatically extracts and recalls context across chat sessions (disabled during RAG to keep answers document-grounded).
Multi-session — Create, switch, and manage independent analysis sessions. Chat history is paginated (50 messages at a time) with a Load earlier pill for long conversations.
Stream retry — If an LLM stream is interrupted, the partial response stays visible and a one-click Retry button reruns the same request.
Dark mode — Theme toggle persists via localStorage; respects prefers-color-scheme on first load.
Keyboard shortcuts — ? opens the overlay. Esc closes modals. Ctrl+N new session, Ctrl+, settings, Ctrl+B sidebar.
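
The paginated chat history described above can be pictured as cursor-based paging from the newest message backwards. The sketch below is an assumption about the general shape, not the project's API: load_page, its before cursor, and the in-memory list are hypothetical, and only the page size of 50 comes from the text.

```python
from typing import List, Optional, Tuple

PAGE_SIZE = 50  # page size stated in the feature list


def load_page(
    messages: List[str],
    before: Optional[int] = None,
    page_size: int = PAGE_SIZE,
) -> Tuple[List[str], Optional[int]]:
    """Return one page of messages (oldest first) ending just before `before`.

    The second element is the cursor for the next "Load earlier" request,
    or None once the start of the conversation has been reached.
    """
    end = len(messages) if before is None else max(0, min(before, len(messages)))
    start = max(0, end - page_size)
    next_cursor = start if start > 0 else None
    return messages[start:end], next_cursor


if __name__ == "__main__":
    history = [f"message {i}" for i in range(120)]
    page, cursor = load_page(history)          # newest 50 messages
    while cursor is not None:                  # simulate clicking "Load earlier"
        page, cursor = load_page(history, cursor)
    print(len(page))                           # size of the oldest page
```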

Security & observability

Defaults are safe; most integrations are opt-in, so a local install sends nothing to external services.

Auth — FastAPI-Users with username + argon2, JWT cookie (SameSite=lax, HttpOnly). Password policy: 8–128 chars, one letter + one digit. Login is constant-time to defeat account enumeration.
Rate limiting — Per-IP limits on /auth/login, /auth/register, and /documents/upload (env-overridable).
CSRF protection — Origin-header validation on cookie-authenticated state-changing requests.
Security headers — X-Frame-Options: DENY, X-Content-Type-Options: nosniff, Referrer-Policy, Permissions-Policy always on. Content-Security-Policy opt-in via CSP_ENABLE=true.
Path traversal + upload guards — Filenames validated against an extension allowlist, resolved paths must stay within the session's upload dir, size-capped at MAX_UPLOAD_SIZE.
No pickle — Caches use numpy/JSON; IPC serializes tensors with torch.save, loads them with torch.load(weights_only=True), and transports them as base64-in-JSON. Pickle deserialization is an RCE vector and this project refuses to rely on it.
Log redaction — JWT_SECRET, OLLAMA_API_KEY, Authorization: Bearer …, hf_…, sk-… are scrubbed from every log line before it reaches a handler.
Structured logs (opt-in) — LOG_FORMAT=json flips console + file output to one JSON record per line for Loki/Elastic/Datadog.
Error tracking (opt-in) — SENTRY_DSN activates Sentry on both backend (sentry-sdk[fastapi]) and frontend (@sentry/sveltekit).
Metrics — Prometheus scrape at /metrics (admin-only) with HTTP volume + latency, cache hit rates, LLM inference + retrieval durations.
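
Two of the guards listed above, the upload path check and log redaction, are easy to picture in code. The following is a minimal sketch using only the standard library; it is not the project's implementation. The extension allowlist, the function and class names, and the exact regular expressions are assumptions chosen for illustration.

```python
import logging
import re
from pathlib import Path

# Hypothetical allowlist; the project's real list may differ.
ALLOWED_EXTENSIONS = {".pdf", ".png", ".jpg", ".jpeg"}


def safe_upload_path(upload_dir: Path, filename: str) -> Path:
    """Return the resolved target path, or raise ValueError if it is unsafe."""
    if Path(filename).suffix.lower() not in ALLOWED_EXTENSIONS:
        raise ValueError(f"extension not allowed: {filename!r}")
    base = upload_dir.resolve()
    target = (base / filename).resolve()
    # Reject anything that escapes the session's upload directory.
    if not target.is_relative_to(base):
        raise ValueError(f"path escapes upload dir: {filename!r}")
    return target


# Illustrative patterns for the secret shapes named in the list above.
_REDACTIONS = [
    (re.compile(r"(Authorization:\s*Bearer\s+)\S+", re.IGNORECASE), r"\1[REDACTED]"),
    (re.compile(r"\b((?:JWT_SECRET|OLLAMA_API_KEY)=)\S+"), r"\1[REDACTED]"),
    (re.compile(r"\b(?:hf_|sk-)[A-Za-z0-9_-]{8,}"), "[REDACTED]"),
]


def redact(text: str) -> str:
    """Replace anything that looks like a secret with a placeholder."""
    for pattern, replacement in _REDACTIONS:
        text = pattern.sub(replacement, text)
    return text


class RedactingFilter(logging.Filter):
    """Scrub secrets from a record before a handler formats it."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = redact(record.getMessage())
        record.args = None  # the message is already fully formatted
        return True
```

In this sketch the filter would be attached to each handler (handler.addFilter(RedactingFilter())) so that records propagated from child loggers are also scrubbed. Resolving the path before the containment check matters, because a plain string prefix comparison can be fooled by ".." segments or absolute filenames.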

Quick Start

Prerequisites

Python 3.10+ and a CUDA-capable NVIDIA GPU
Ollama installed and running
Node.js 18+ for the frontend

1. Clone and install

git clone https://github.com/inkind79/rag-lab.git
cd rag-lab

# Python environment
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Frontend
cd frontend
npm install
cd ..

2. Pull models

# Any Ollama model for chat — vision models recommended for document analysis
ollama pull <your-preferred-model>   # e.g., gemma4, qwen3-vl, llama3.2-vision, phi4

# Required by Mem0 for conversation memory
ollama pull nomic-embed-text         # embeddings
ollama pull gemma3:4b                # memory extraction (small text model)

RAG Lab auto-detects your installed Ollama models on startup — no configuration needed.
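
For readers curious how model discovery of this kind can work, here is a small sketch that queries Ollama's /api/tags endpoint, which lists locally installed models. It assumes Ollama is listening on its default address (http://localhost:11434); the helper names are hypothetical and this is not necessarily how RAG Lab performs detection.

```python
import json
import urllib.request
from typing import Any, Dict, List

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local address


def model_names(tags_response: Dict[str, Any]) -> List[str]:
    """Extract sorted model names from an Ollama /api/tags response body."""
    return sorted(m["name"] for m in tags_response.get("models", []))


def list_installed_models(base_url: str = OLLAMA_URL) -> List[str]:
    """Ask a running Ollama server which models are installed locally."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return model_names(json.load(resp))
```

Calling list_installed_models() after step 2 should include the models you pulled; if the call fails with a connection error, Ollama is probably not running.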

3. Configure

cp .env.example .env
# Optionally edit .env to set a persistent JWT_SECRET (a random one is generated if not set)
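
If you want a persistent JWT_SECRET, one way to produce a suitable value is Python's standard secrets module. This is a sketch, not a project script: the helper name and the 32-byte length are assumptions, and the printed line is meant to be pasted into .env by hand.

```python
import secrets


def generate_jwt_secret(num_bytes: int = 32) -> str:
    """Return a URL-safe random string suitable for use as a signing secret."""
    return secrets.token_urlsafe(num_bytes)


if __name__ == "__main__":
    # Copy the printed line into .env.
    print(f"JWT_SECRET={generate_jwt_secret()}")
```

A fixed secret is presumably what keeps existing login cookies valid across backend restarts, since tokens signed with a previous random secret would no longer verify.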

4. Run

# Terminal 1: Backend
uvicorn fastapi_app:app --host 127.0.0.1 --port 8000

# Terminal 2: Frontend
cd frontend && npm run dev

Open http://localhost:5173 — follow the "Create one" link on the login page to register, then sign in.

Architecture

rag-lab/
├── fastapi_app.py              # Application entry point
├── src/
│   ├── api/                    # FastAPI routes (auth, chat, sessions, documents)
│   ├── models/                 # ColPali adapter, Ollama/HF handlers, LanceDB, RAG retriever
│   ├── services/               # Document processor, response generator, batch processor
│   └── utils/                  # Memory management, model configs, logging
├── frontend/
│   └── src/
│       ├── routes/             # SvelteKit pages (+page.svelte, +layout.svelte)
│       └── lib/
│           ├── components/     # Markdown, DocumentPanel, TemplatePanel, Settings
│           ├── stores/         # Svelte stores (chat, session, toast)
│           └── api/            # API client (streamChat, sessions, documents)
└── config/                     # Model configs, global settings

Stack

Layer	Technology
Frontend	SvelteKit 2, Svelte 5, TypeScript
Backend	FastAPI, Python 3.10+
Embeddings	ColQwen3.5 (ColPali visual embeddings)
Vector store	LanceDB (multi-vector, local)
LLM inference	Ollama (any local model — vision, text, or reasoning)
Memory	Mem0 (automatic, local Ollama + ChromaDB)
Auth	FastAPI-Users (SQLite, registration, argon2 hashing, JWT cookies)

System Requirements

GPU: NVIDIA with 8GB+ VRAM (16GB+ recommended for larger models)
RAM: 16GB+
Storage: 20GB+ (models + dependencies)
OS: Linux / WSL2 (requires NVIDIA CUDA support)

Development

# Run tests (CI-minimal install — no torch/colpali/lancedb needed)
pip install -r requirements-test.txt
pytest                                # ~5s full suite

# Lint
ruff check .
cd frontend && npm run check          # svelte-check

# Regenerate TS types after changing a Pydantic request/response model
python scripts/gen_types.py

# Benchmark retrieval against a golden set
python -m src.eval.cli --golden tests/fixtures/eval/sample_golden_set.json

# Browser E2E (requires backend + Vite running)
cd frontend && npm run test:e2e       # Playwright smoke suite

Continuous integration runs test / lint / security (gitleaks + bandit) on every PR — see .github/workflows/.

Contributing

Contributions are welcome. Please open an issue first to discuss what you'd like to change.

Fork the repo
Create a feature branch (git checkout -b feature/my-feature)
Commit your changes
Push to the branch and open a Pull Request

License

MIT — see LICENSE for details.
