RAGRig

Open-source RAG workbench for traceable, model-ready knowledge pipelines.

源栈: from scattered sources to traceable, model-ready knowledge.

About

RAGRig is an open-source RAG workbench for small and medium-sized teams.

RAGRig is not another chat-with-file wrapper. It focuses on the operational layer around RAG: ingestion, parsing, cleaning, chunking, embedding, indexing, retrieval, answer grounding, model/provider selection, evaluation, and traceability.

Why RAGRig

Local-first: start with local files, Postgres/pgvector, Ollama, LM Studio, BGE, and self-hosted OpenAI-compatible runtimes.
Cloud-ready: support mainstream cloud model entry points such as OpenAI, OpenRouter, and Gemini, with Vertex AI and Bedrock tracked as roadmap/provider catalog items.
Traceable by design: connect each answer back to source URI, document version, chunk, pipeline run, and model/provider diagnostics.
Model-flexible: keep LLM, embedding, reranker, OCR, and parser providers behind explicit registry contracts.
Vector-store portable: use Postgres/pgvector as the default and keep Qdrant as an optional backend.
Pipeline-oriented: make parsing, cleaning, chunking, embedding, indexing, and reranking inspectable instead of hiding them behind a chat box.
Plugin-first: extend sources, sinks, models, vector stores, parsers, preview tools, and workflow nodes without bloating the core.
Quality-gated: core modules target 100% test coverage; optional cloud/enterprise integrations use contract tests and opt-in live smoke checks.

Architecture

flowchart LR
    inputs["Inputs<br/>files, URLs, object storage, docs, DB"]
    pipeline["Pipeline<br/>parse, clean, chunk, embed, index"]
    core["RAGRig core<br/>KB, docs, versions, chunks, runs, audit"]
    providers["Provider registry<br/>LLM, embedding, reranker, parser"]
    vectors["Vector backends<br/>pgvector default, Qdrant optional"]
    console["Web Console<br/>configure, preview, health, playground"]
    answer["Retrieval + Answer<br/>hits, citations, diagnostics"]

    inputs --> pipeline
    providers --> pipeline
    pipeline --> core
    core --> vectors
    vectors --> answer
    core --> console
    providers --> console
    answer --> console

Tech Stack

Layer	Current / Default	Optional / Roadmap
App/API	Python, FastAPI	MCP/export surfaces
Web Console	FastAPI-served lightweight console	richer workflow UI
Metadata DB	PostgreSQL	SQLite for smoke/test paths
Vector backend	pgvector	Qdrant
Local models	Ollama, LM Studio, OpenAI-compatible endpoints	vLLM, llama.cpp, Xinference, LocalAI
Cloud models	OpenAI, OpenRouter, Gemini	Vertex AI, Bedrock, Azure OpenAI, Anthropic catalog entries
Inputs	local files, Markdown/TXT, S3-compatible sources	PDF/DOCX upload, URLs, enterprise connectors
Quality	pytest, coverage, contract tests	opt-in live provider smoke

Roadmap

Local Pilot

The next roadmap milestone is a simple local pilot. It is one step toward the broader platform, not the project positioning itself.

Target user journey:

Start the local stack.
Open the Web Console.
Create a knowledge base.
Upload Markdown, TXT, PDF, or DOCX, or import one public page, sitemap, or docs page list.
Choose a model provider.
Run ingestion and indexing.
Ask a question in Playground and inspect answer citations, retrieval hits, chunks, and provider diagnostics.

See Local Pilot spec for scope and acceptance criteria.

Later Milestones

richer Web Console workflow management
advanced PDF/DOCX/OCR parsing
broader source and sink plugins
evaluation dashboards and regression gates
enterprise permission, audit, and connector hardening

Web Console

The Web Console is the main operator surface for RAGRig. The intended first-run shape is:

knowledge base list
source setup and ingestion tasks
model configuration and health checks
pipeline run history
document and chunk preview
retrieval and answer Playground
health and database/vector status

Prototype:

Quick Start

Vercel Preview + Supabase

RAGRig can run as a Vercel Preview deployment backed by Supabase Postgres. This is for online product preview; Docker remains the recommended local pilot path.

Required Vercel Preview environment variables:

DATABASE_URL=postgresql://USER:PASSWORD@HOST:PORT/postgres?sslmode=require
VECTOR_BACKEND=pgvector
APP_ENV=preview

For local migration and make db-check against Supabase, also set:

DB_RUNTIME_HOST=HOST
DB_HOST_PORT=PORT

Run migrations from a trusted local or CI environment before using the Preview DB:

DATABASE_URL='postgresql://USER:PASSWORD@HOST:PORT/postgres?sslmode=require' \
DB_RUNTIME_HOST='HOST' \
DB_HOST_PORT='PORT' \
uv run alembic upgrade head

After Vercel creates a Preview deployment:

VERCEL_PREVIEW_URL='https://your-preview-url.vercel.app' make vercel-preview-smoke

Model credentials are optional for Preview startup; no model credentials are required for startup. See EVI-130 for the full deployment contract.

10-Minute Local Pilot Demo

Run the minimal preflight first. It checks only startup-critical items such as the app import path, an ephemeral database health check, writable artifacts, and Docker availability for Docker mode.

make pilot-docker-preflight

Model configuration is optional for startup. Ollama, LM Studio, Gemini, OpenAI, OpenRouter, rerankers, and external stores are reported as readiness items elsewhere; missing model credentials should not stop the app from booting.

Start the demo stack:

make pilot-up
make pilot-docker-smoke

Open the Web Console:

http://localhost:8000/console

Create a knowledge base, then upload the demo documents:

examples/local-pilot/company-handbook.md
examples/local-pilot/support-faq.md

Use the suggested questions in:

examples/local-pilot/demo-questions.json

After upload, inspect the pipeline run, open chunk preview, ask a Playground question, and confirm the answer is grounded with citations.

Docker Local Pilot

Build and start the local pilot stack:

make pilot-up
make pilot-docker-smoke

Open:

http://localhost:8000/console

Stop the stack:

make pilot-down

The Docker image does not bundle LLM weights or model runtimes. For local models, run Ollama or LM Studio on the host and configure an OpenAI-compatible endpoint with RAGRIG_ANSWER_BASE_URL. Cloud providers such as Gemini, OpenAI, and OpenRouter are enabled by passing their API keys as environment variables.

To build only the application image:

make pilot-docker-build

Developer Setup

Install dependencies:

make sync

Create local environment:

cp .env.example .env

Start the database and run migrations:

docker compose up --build -d db
make migrate
make db-check

Run the current local ingestion and indexing smoke path:

make ingest-local
make index-local
make retrieve-check QUERY="RAGRig Guide"

Run the Local Pilot API smoke:

make local-pilot-smoke

Start the Web Console:

make run-web

Open:

http://localhost:8000/console

If ports 8000 or 5432 are already in use, update .env:

APP_HOST_PORT=18000
DB_HOST_PORT=15433

Optional Qdrant path:

docker compose --profile qdrant up -d qdrant
uv sync --extra vectorstores
VECTOR_BACKEND=qdrant make index-local
VECTOR_BACKEND=qdrant make retrieve-check QUERY="RAGRig Guide"

Verification

Default checks:

make format
make lint
make test
make coverage
make web-check
make local-pilot-smoke
make dependency-inventory

Browser-level Local Pilot Console check:

make local-pilot-console-e2e

This starts an ephemeral SQLite-backed app, verifies a failed upload/retry path, uploads Markdown/PDF/DOCX through the Web Console, checks pipeline/chunk UI, and asks one grounded Playground question. It requires npm and a local Chrome/Chromium browser; set RAGRIG_CONSOLE_E2E_BROWSER_CHANNEL=chromium if Chrome is not available.

Supply-chain checks:

make licenses
make sbom
make audit

make audit needs network access to vulnerability services. Offline environments should run make audit-dry-run and record the missing live audit as a release blocker.

Authentication

RAGRig ships with password-based authentication and per-workspace isolation.

Configuration

Variable	Default	Description
`RAGRIG_AUTH_ENABLED`	`true`	Enable auth enforcement. Set `false` for local dev (no login required).
`RAGRIG_AUTH_SESSION_DAYS`	`30`	Session token lifetime in days.
`RAGRIG_AUTH_SECRET_PEPPER`	dev default	HMAC pepper for token hashing. Always override in production.

First-run setup

When RAGRIG_AUTH_ENABLED=true, navigate to the web console — you will be redirected to the login page. Register the first account via Create account. The first account automatically receives the owner role for the default workspace.

Role-based access

Role	Description
`owner`	Full access, including member management and role assignment
`admin`	Can manage members (except owner assignment) and all write operations
`editor`	Can write to knowledge bases, run pipelines, upload documents
`viewer`	Read-only access

Write routes (POST /knowledge-bases, POST /knowledge-bases/{name}/upload, pipeline and source operations) require editor or above. Processing-profile mutations and rollbacks require admin or above.

Member management

# List workspace members
curl /auth/workspace/members \
  -H "Authorization: Bearer rag_session_..."

# Change a member's role (admin/owner only)
curl -X PATCH /auth/workspace/members/{user_id} \
  -H "Authorization: Bearer rag_session_..." \
  -H "Content-Type: application/json" \
  -d '{"role": "editor"}'

# Remove a member (admin/owner only)
curl -X DELETE /auth/workspace/members/{user_id} \
  -H "Authorization: Bearer rag_session_..."

API keys

Token-based API access is supported alongside browser sessions:

# Create an API key via a registered session (replace TOKEN and NAME)
curl -X POST /auth/api-keys \
  -H "Authorization: Bearer rag_session_..." \
  -H "Content-Type: application/json" \
  -d '{"name": "ci-key"}'

# Use the returned key on API requests
curl /knowledge-bases \
  -H "Authorization: Bearer rag_live_..."

Local development (auth disabled)

For local iteration without login overhead:

RAGRIG_AUTH_ENABLED=false uv run uvicorn ragrig.main:app --reload

All requests are routed to the default workspace as an anonymous user.

Production Guardrails

RAGRig blocks the deterministic fake reranker fallback in production by default. Set RAGRIG_ALLOW_FAKE_RERANKER=true only for explicit demos or accepted degraded environments. The /health endpoint reports the current reranker policy.

Documentation

Key specs:

Operations:

Repository Layout

.
├── assets/             # Project icon
├── docs/               # Specs, operations docs, prototypes
├── scripts/            # Smoke, ops, and verification commands
├── src/ragrig/         # RAGRig application code
├── tests/              # Unit and contract tests
├── docker-compose.yml  # Local Postgres/pgvector and optional services
├── pyproject.toml      # Python dependencies and tooling
└── Makefile            # Common developer commands

License

RAGRig is licensed under the Apache License 2.0. See LICENSE.

Name		Name	Last commit message	Last commit date
Latest commit History 340 Commits
.github		.github
alembic		alembic
api		api
assets		assets
docs		docs
examples/local-pilot		examples/local-pilot
frontend		frontend
scripts		scripts
src/ragrig		src/ragrig
tests		tests
.dockerignore		.dockerignore
.env.example		.env.example
.gitignore		.gitignore
CONTRIBUTING.md		CONTRIBUTING.md
Dockerfile		Dockerfile
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
README.zh-CN.md		README.zh-CN.md
SECURITY.md		SECURITY.md
alembic.ini		alembic.ini
docker-compose.yml		docker-compose.yml
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt
uv.lock		uv.lock
vercel.json		vercel.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

RAGRig

About

Why RAGRig

Architecture

Tech Stack

Roadmap

Local Pilot

Later Milestones

Web Console

Quick Start

Vercel Preview + Supabase

10-Minute Local Pilot Demo

Docker Local Pilot

Developer Setup

Verification

Authentication

Configuration

First-run setup

Role-based access

Member management

API keys

Local development (auth disabled)

Production Guardrails

Documentation

Repository Layout

License

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

RAGRig

About

Why RAGRig

Architecture

Tech Stack

Roadmap

Local Pilot

Later Milestones

Web Console

Quick Start

Vercel Preview + Supabase

10-Minute Local Pilot Demo

Docker Local Pilot

Developer Setup

Verification

Authentication

Configuration

First-run setup

Role-based access

Member management

API keys

Local development (auth disabled)

Production Guardrails

Documentation

Repository Layout

License

About

Topics

Resources

License

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages