Self-hosted context layer for AI coding agents. MainRag exposes private repositories, docs, and prior agent conversations to coding agents (Codex, Claude Code, ...) through MCP — with citations and tenant boundaries.
Coding agents query the right files first time, with citations, on the customer's own infrastructure.
Under the hood: PostgreSQL FTS + Qdrant (HNSW + INT8), GTE-ModernBERT embeddings, cross-encoder reranking, and code intelligence (symbols, call-graph, N-hop traversal).
MainRag is a self-hosted retrieval backend that turns a heterogeneous corpus (source code, Markdown docs, PDFs, web crawls, chat transcripts) into a queryable knowledge base. It is built for LLM agents and human developers who need grounded, citable, low-latency answers over large private codebases (~860k chunks tested) without sending data to a third party.
- Embedding model: `Alibaba-NLP/gte-modernbert-base` (768d, 8192-token context)
- Reranker: `BAAI/bge-reranker-base` (cross-encoder)
- Vector store: Qdrant 1.16 with HNSW + Scalar Quantization (INT8)
- Lexical index: PostgreSQL 18 FTS (GIN, `UNION ALL` simple+english)
- Intelligence layer: tree-sitter symbol extraction, call-graph edges, N-hop BFS traversal
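For illustration, the Qdrant side of this stack can be provisioned with a collection config along these lines. Field names follow Qdrant's REST API; the specific HNSW parameters and quantile are assumptions for the sketch, not mainrag's actual tuning:

```python
import json

# Body for PUT /collections/<name> on Qdrant: 768-d cosine vectors
# (matching gte-modernbert-base), HNSW index, INT8 scalar quantization.
# m/ef_construct/quantile values here are illustrative assumptions.
payload = {
    "vectors": {"size": 768, "distance": "Cosine"},
    "hnsw_config": {"m": 16, "ef_construct": 128},
    "quantization_config": {
        "scalar": {"type": "int8", "quantile": 0.99, "always_ram": True}
    },
}
print(json.dumps(payload, indent=2))
```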
Last verified: 2026-04-24 via commit `2d597cb`.
A typical customer scenario:
- Customer infrastructure: mainrag runs on the customer's own servers (PostgreSQL + Qdrant + TEI). No source code, retrieval index, or query history leaves their network.
- Private repository ingestion: the admin source endpoints (`POST /api/v1/admin/sources` + `/sync`) index the customer's monorepo, internal docs, prior agent conversations, and design tickets.
- Codex via MCP: Codex (or any MCP-aware coding agent) connects to mainrag's MCP endpoint (`/api/v1/mcp/tools`). When the agent needs context, it calls `search_code`, `find_callers`, or `get_symbol_card` — results come back with citations to specific files and lines.
- Cited PR review: the agent's proposed patch references the same citations. Reviewers can re-open the retrieval results, see what the agent saw, and verify whether the change is grounded in the right context.
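As a concrete sketch, an agent's tool call against the execute endpoint can be assembled like this. The `tool`/`arguments` field names are an assumption about the wire format; see `docs/api.md` for the real request shapes:

```python
import json
import urllib.request

# Endpoint path is from the text above; payload shape is an assumption.
EXECUTE_URL = "http://localhost:3001/api/v1/mcp/tools/execute"

def build_tool_call(tool: str, arguments: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request invoking one MCP tool."""
    body = json.dumps({"tool": tool, "arguments": arguments}).encode()
    return urllib.request.Request(
        EXECUTE_URL, data=body, headers={"Content-Type": "application/json"}
    )

req = build_tool_call("search_code", {"query": "where is clip deletion handled?"})
print(req.get_method())  # urllib infers POST when a body is attached
```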
▶ 3-minute MCP demo: `docs/demo-mcp-codex.md`. What you see in the cast: `docker compose up` → 3 services healthy → 13 MCP tools listed → `search_code` returns a cited result → mainrag CLI search hits the same backend → closing one-liner.
Pairs with noaide for the operator console: mainrag provides context; noaide provides supervision and audit.
MainRag ships a Model Context Protocol server alongside the HTTP API.
Coding agents on private codebases need grounded retrieval over the company repository, not just the open files — MainRag is that retrieval layer. 13 tools are exposed live under `/api/v1/mcp/tools` (list) and `/api/v1/mcp/tools/execute` (call):
| Tool | Purpose |
|---|---|
| `search_code` | Hybrid retrieval over indexed sources with citations |
| `search_symbols` | Identifier-aware lookup (functions, types, methods) |
| `find_callers` / `find_callees` | Call-graph navigation (1..N hops) |
| `get_symbol_callgraph` / `get_symbol_card` | Symbol-centric context bundles |
| `explain_path` | Why was this chunk retrieved? (signal breakdown) |
| `list_sources` / `get_source_stats` | Inventory of indexed corpora |
| `browse_layers` / `explore` / `get_ownership` / `report_dead_end` | Agent-driven exploration |
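Conceptually, `find_callers` with N hops is a breadth-first walk over stored call-graph edges. A minimal sketch, with illustrative symbol names and edge data (the real traversal runs server-side against the PostgreSQL graph):

```python
from collections import deque

def n_hop_callers(edges, start, max_hops):
    """Return {caller: hop_distance} for every symbol that reaches
    `start` within max_hops, via BFS over (caller, callee) edges."""
    callers = {}
    for caller, callee in edges:
        callers.setdefault(callee, set()).add(caller)
    seen = {start: 0}
    queue = deque([start])
    while queue:
        sym = queue.popleft()
        if seen[sym] == max_hops:
            continue  # do not expand past the hop budget
        for c in callers.get(sym, ()):
            if c not in seen:
                seen[c] = seen[sym] + 1
                queue.append(c)
    del seen[start]
    return seen

edges = [("main", "handler"), ("handler", "hybrid_search"), ("cli", "hybrid_search")]
print(n_hop_callers(edges, "hybrid_search", 2))
# → {'handler': 1, 'cli': 1, 'main': 2}
```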
End-to-end demo (3 minutes from `docker compose up` to a Codex patch): `docs/demo-mcp-codex.md`.
Pure vector search overfits to paraphrase. Pure keyword search misses synonyms. Most hybrid stacks stop at RRF. MainRag adds:
- Multi-signal ranking: RRF (BM25 + vector) + call-graph popularity + symbol-expansion (identifier tokenization) + parent-context boosting.
- `UNION ALL` FTS: the `simple` and `english` tsvector configurations run in parallel, so identifier substrings (`hybrid_search`) and natural-language queries ("how to delete a clip") both hit.
- Cross-encoder rerank: the top-N from hybrid fusion is re-scored by a ModernBERT cross-encoder before being returned.
- Code intelligence, not just text: tree-sitter parses 25+ languages into symbols, edges are stored as a proper graph, and N-hop call chains are reachable via a single API call.
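The fusion step at the core of this can be sketched as plain Reciprocal Rank Fusion over the two ranked lists. `k = 60` is the conventional RRF constant; the extra mainrag signals (call-graph popularity, symbol expansion, parent boosting) are omitted here:

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["auth.rs", "login.rs", "session.rs"]    # lexical ranking (illustrative)
vector_hits = ["login.rs", "session.rs", "token.rs"]  # semantic ranking (illustrative)
print(rrf_fuse([bm25_hits, vector_hits]))
# → ['login.rs', 'session.rs', 'auth.rs', 'token.rs']
```

Documents appearing in both lists (here `login.rs`, `session.rs`) accumulate score from each, which is why they outrank a single-list top hit.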
Coding agents on private codebases need grounded retrieval over the company repository, not just the open files. MainRag is that retrieval layer — citable, tenant-bounded, fully self-hosted.
Pairs with noaide — context (mainrag) + control (noaide) for coding agents.
mainrag is the context layer in a four-pillar stack for AI coding agents in regulated environments:
```mermaid
flowchart LR
    subgraph CONTEXT["Context"]
        MR["mainrag<br/>private-code retrieval<br/>MCP, citations"]
    end
    subgraph CONTROL["Control"]
        NO["noaide<br/>operator console<br/>JSONL transparency"]
    end
    subgraph RUNTIME["Runtime"]
        PS["project-sentinel<br/>sandbox, audit<br/>control planes"]
    end
    subgraph TRUST["Trust"]
        CO["complianceos<br/>regulated deployment<br/>on-prem evaluation"]
    end
    AG["Coding agent<br/>(Codex / Claude / Gemini)"]
    AG -->|context query| MR
    AG -->|observed by| NO
    AG -.->|optionally sandboxed in| PS
    CO -.->|frames customer evaluation| AG
```
- mainrag answers: what context does the agent need?
- noaide answers: what is the agent doing right now?
- project-sentinel answers: what runtime boundaries enforce safety?
- complianceos answers: how does a regulated customer evaluate this?
Measured on a single workstation (AMD Ryzen 9 5900HS, RTX 3050 Ti 4 GB, 16 GB RAM), corpus size 859k chunks, 10 canonical queries × 3 repetitions (n=30, wall-clock including CLI startup overhead ~30–50 ms).
| Metric | Value |
|---|---|
| p50 | 132 ms |
| p95 | 187 ms |
| p99 | 208 ms |
| mean ± stdev | 131 ± 36 ms |
| min / max | 68 / 213 ms |
Evidence: `data/benchmarks/search_latency_20260424T140514Z.json`; script: `scripts/benchmark-search.py`.
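The percentiles in the table come from the raw JSON samples. A nearest-rank computation is one common convention (whether `scripts/benchmark-search.py` uses this exact method or an interpolating one is an assumption; the sample values below are illustrative, only min/max match the table):

```python
import math

def percentile(samples_ms, p):
    """Nearest-rank percentile over a list of latency samples."""
    s = sorted(samples_ms)
    rank = max(1, math.ceil(p / 100 * len(s)))
    return s[rank - 1]

samples = [68, 95, 110, 120, 130, 134, 150, 170, 187, 213]  # illustrative
print(percentile(samples, 50), percentile(samples, 95))
# → 130 213
```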
Relevance is tracked through a small, manually curated 10-query reference set, with each top-5 result rated GOOD / PARTIAL / WEAK by hand. This is not a production benchmark — it is the internal regression set used during the alpha cycle. A larger, publicly reproducible eval is on the v0.2-beta roadmap.
| Model | GOOD | PARTIAL | WEAK |
|---|---|---|---|
| `BAAI/bge-base-en-v1.5` | 50 % | 20 % | 30 % |
| `Alibaba-NLP/gte-modernbert-base` | 70 % (+20 pp) | 20 % | 10 % |
Evidence: `docs/search-baseline-bge-base.md`, `docs/search-baseline-gte-modernbert.md`.
Honest status per area, in four buckets — Implemented (code on main + tests in CI), Demo-backed (code on main + a recorded walkthrough or fixture), Partial (code on main but a polish or hardening gap remains), Roadmap (an open issue, no code yet).
| Capability | Status |
|---|---|
| Hybrid retrieval (BM25 + vector + cross-encoder rerank) | Implemented |
| MCP server (13 tools live under `/api/v1/mcp/tools`) | Implemented |
| Code intelligence (tree-sitter, 25+ languages) | Implemented |
| Call-graph + N-hop BFS traversal | Implemented |
| Watch-mode (incremental re-indexing) | Implemented |
| RLS + JWT + rate-limit + pepper-hashed API keys | Implemented |
| Performance baseline (Recall@10 70 %, p50 132 ms) | Demo-backed (data/benchmarks/) |
| MCP demo walkthrough | Demo-backed (docs/demo-mcp-codex.md, docs/images/mcp-codex-demo.gif) |
| Multi-tenant isolation | Partial (RLS on sources / files / chunks; symbols / call_graph_edges and DEFAULT_USER_ID outbox are v0.2 work — see #10) |
| Multi-tenant beta hardening | Roadmap (#10, scoped for v0.2-beta) |
```mermaid
flowchart LR
    CLI["mainrag CLI / MCP"] -->|HTTP / JSON| API["axum API :3001<br/>auth · rate-limit · CORS"]
    API --> PG[("PostgreSQL<br/>FTS + RLS<br/>symbols · call-graph")]
    API --> QD[("Qdrant<br/>HNSW + INT8<br/>~860k vec")]
    API --> EMB["TEI GTE<br/>embedder :8091"]
    API --> RR["TEI BGE<br/>reranker :8082"]
```
Terminal-readable ASCII variant
```
                ┌────────────────────────────┐
                │     mainrag CLI / MCP      │
                └─────────────┬──────────────┘
                              │ HTTP / JSON
                ┌─────────────▼──────────────┐
                │    axum API (port 3001)    │
                │  auth · rate-limit · CORS  │
                └─────────────┬──────────────┘
                              │
     ┌───────────────┬────────┴───────┬────────────────┐
     │               │                │                │
┌────▼─────┐   ┌─────▼─────┐   ┌──────▼──────┐   ┌─────▼─────┐
│PostgreSQL│   │  Qdrant   │   │   TEI GTE   │   │  TEI BGE  │
│FTS + RLS │   │HNSW + INT8│   │  Embedder   │   │ Reranker  │
│ symbols  │   │ 860k vec  │   │    :8091    │   │   :8082   │
│callgraph │   │           │   │             │   │           │
└──────────┘   └───────────┘   └─────────────┘   └───────────┘
```
Full diagram and data-flow: docs/architecture.md.
Requires: Docker + nvidia-container-toolkit, PostgreSQL 18, Rust 1.75+.
```bash
# 1. Build workspace
cargo build --release --workspace

# 2. Start embedder + reranker + Qdrant
docker compose up -d

# 3. Apply schema
psql "$DATABASE_URL" -f schema_intelligence.sql

# 4. Run the API
./target/release/mainrag-api

# 5. From another shell: register a source via the admin API,
#    sync it, then search
TOKEN=$(curl -sf -X POST http://localhost:3001/api/v1/auth/login \
  -H "Content-Type: application/json" \
  -d '{"username":"admin","password":"<your-admin-password>"}' | jq -r .token)
SRC_ID=$(curl -sf -X POST http://localhost:3001/api/v1/admin/sources \
  -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
  -d '{"name":"my-repo","source_type":"fs","path":"./path/to/code"}' | jq -r .id)
curl -sf -X POST "http://localhost:3001/api/v1/admin/sources/$SRC_ID/sync" \
  -H "Authorization: Bearer $TOKEN" | jq '.status'
./target/release/mainrag search "how does hybrid_search work"
```

See `docs/operations.md` for deployment, service topology, model requirements (~600 MB for GTE embedder + reranker), and the `mainrag.env` reference.
- Hybrid retrieval — BM25 ⊕ vector ⊕ cross-encoder rerank. See `docs/architecture.md`.
- Code intelligence — symbol extraction (25+ languages via tree-sitter), call-graph with N-hop BFS. See `docs/intelligence.md`.
- HTTP API + MCP server — axum on `:3001`, MCP tools for Claude/agents. See `docs/api.md`.
- Watch mode — incremental re-indexing on file changes; PDF/export/git/web plugins.
- Security — row-level security on PostgreSQL, dual-key JWT rotation, rate limiting, pepper-hashed API keys, request-size limits, security headers.
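As a sketch of the pepper-hashed API-key idea (the concept only; mainrag's actual scheme and key format are not specified here): keys are stored as an HMAC digest keyed with a server-side pepper, so a database leak alone is not enough to verify candidate keys offline.

```python
import hashlib
import hmac

def hash_api_key(api_key: str, pepper: bytes) -> str:
    """Lookup value stored in the DB: HMAC-SHA256 of the key, keyed with a
    pepper that lives only in server config (never in the database)."""
    return hmac.new(pepper, api_key.encode(), hashlib.sha256).hexdigest()

# Key string and pepper below are made-up examples.
digest = hash_api_key("mr_live_abc123", b"server-side-pepper")
print(len(digest))  # 64 hex chars
```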
```
.
├── api/                    Rust axum server + retrieval pipeline + intelligence
├── cli/                    Rust CLI (mainrag binary)
├── docs/                   Public docs (architecture, api, operations, intelligence) + baselines
├── ops/                    systemd units, migration/backup infrastructure
├── scripts/                Python utilities (benchmark, migration, enrichment)
├── data/                   Benchmark artifacts (gitignored except JSON results)
├── docker-compose.yml
├── schema_intelligence.sql
└── Cargo.toml              workspace root
```
| Doc | Scope |
|---|---|
| `docs/architecture.md` | System components, data flow, ranking pipeline |
| `docs/api.md` | HTTP endpoints, auth, request/response shapes |
| `docs/operations.md` | Deployment, services, env vars, health checks |
| `docs/intelligence.md` | Call-graph, N-hop traversal, symbol cards |
| `docs/search-baseline-gte-modernbert.md` | Current relevance evidence (10 queries) |
| `docs/search-baseline-bge-base.md` | Prior BGE baseline (historical) |
| `docs/demo-mcp-codex.md` | 3-minute MCP/Codex demo walkthrough |
| `examples/` | Copy-pasteable walkthroughs (index OSS repo · call MCP tools · agent with context) |
The repository's social preview (the image GitHub renders when the
repo is shared) lives at
docs/images/og-preview.png and is
reproducible from
docs/images/og-preview.source.html.
Uploading it is a manual maintainer step under Settings → General →
Social preview — GitHub does not expose a REST endpoint for that
upload.
This is an early public preview (v0.1.0-alpha.1). The system runs
production traffic on a single node but public-facing APIs, CI, and the
plugin interface are not yet stabilized. Expect breaking changes.
Not for production multi-tenant use. MainRag v0.1.0-alpha is a
single-tenant developer preview. The transactional outbox and the
DEFAULT_USER_ID hardening are scoped for v0.2 (multi-tenant beta) —
see #10 for the
plan.
MainRag is developed using AI coding agents — the same tools it serves with private-code context.
Licensed under the Apache License, Version 2.0. See LICENSE.