Self-hosted team brain for AI coding agents — your code, docs, and decisions in one hybrid-searchable index, served over MCP.
한국어 README | English
Code-search tools know your code but nothing else. Memory servers know your notes but not your code. A development team's knowledge doesn't split that cleanly: the question "how do refunds work here?" is answered by RefundService.kt and the design memo that says why partial refunds are disabled — and your agent should get both from the same search.
mindcairn is a small, self-hosted MCP server an individual developer can stand up on a spare machine. Point it at a repo, let it analyze the structure, ingest your design docs and captured decisions into the same index, and every Claude Code / Cursor session on your team gets one search_codebase that answers with code, docs, and decisions together — over plain HTTP, or Tailscale for remote teammates.
- One index for code + docs + decisions — chunks from your repo, Notion exports / markdown (
ingest_doc), and decisions captured mid-session (capture_decision) all live in the same hybrid index. - Hybrid search that actually finds identifiers — BM25 sparse + dense (bge-m3) fused with RRF. On our internal eval this took Hit@5 from 63% to 88% (exact-identifier queries: 50% → 75%, natural-language queries: 75% → 100%).
- Built for teams of 1–10 — read-only mode, an IP allowlist for write tools, usage logging, and a
report_issuefeedback loop. No SSO, no cluster, no per-seat pricing.
Respectfully — these are different tools for different jobs. mindcairn's job is the intersection.
| mindcairn | claude-context | mem0 | basic-memory | |
|---|---|---|---|---|
| Remembers | code + docs + decisions | code only | conversation facts | markdown notes |
| Deployment | self-hosted, no new 3rd party | OpenAI key + Zilliz Cloud | SaaS or self-host | local files |
| Team sharing | read-only mode + IP allowlist + usage log | — | — | — |
Honest table — what runs where:
| Step | Where it runs | Your code leaves the machine? |
|---|---|---|
| Embeddings | local Ollama (bge-m3), default |
No (unless you opt into EMBEDDING_PROVIDER=openai) |
| Vector storage & search | local Qdrant + SQLite | No |
init analysis (discovery/strategy) |
Claude — via your existing Claude Code CLI login or API key | Yes — repo structure and snippets are sent for analysis |
| Chunk enrichment (labeling) | Claude (Haiku) | Yes — chunk contents are sent; set ENRICHER=off to block |
If you already use Claude Code, mindcairn adds zero new third parties. (claude-context requires OpenAI + Zilliz Cloud accounts.) We deliberately don't claim "fully local": the LLM analysis steps call Claude, and you control how much of that happens.
From the verification benchmarks — every number from actual run logs, no estimates. Judging criterion: correct file inside search_codebase top-5.
| Repo | Language | init time |
Files | Chunks | Search hits |
|---|---|---|---|---|---|
| gin-gonic/gin | Go | 166.7s | 58 | 111 | 3/3 (all #1) |
| trpc/trpc | TS monorepo | 267.3s (+ repair build 174.2s) | 182 | 473 | 3/3 |
| fastapi/fastapi | Python | 113.3s | 48 | 81 | 3/3 |
9/9 top-5 hits across three non-JVM repos, init in 113–267 seconds each. On our internal Kotlin/Spring deployment, hybrid search raised Hit@5 from 63% to 88%.
If you use Claude Code, Cursor, or any MCP-capable coding agent, you don't have to run anything by hand. Open your project and paste this:
Set up mindcairn for this repository.
- Clone
https://github.com/YoonSuHyeon/mindcairnnext to this project (skip if it's already there) and runbun installinside it.- Make sure Docker and Ollama are running, then from the mindcairn folder run
docker compose up -d(Qdrant) andollama pull bge-m3.- Run
bun run src/cli/index.ts init --repo <absolute path of THIS repo> --tag <short-name> --yes.- Start the server with
bun run src/cli/index.ts serve <short-name>, then register it with me:claude mcp add --transport http mindcairn-<short-name> http://localhost:8765/mcp.From then on, use the
search_codebasetool whenever I ask about this codebase.
The agent runs each step, resolves any preflight errors it hits (missing Docker / Ollama / model), and wires up the MCP connection — you just approve. Want to drive it yourself? The manual steps are below.
Bun (required — uses bun:sqlite), Docker, and either Ollama or an OpenAI API key for embeddings. LLM analysis uses the Claude Code CLI (no API key needed) or ANTHROPIC_API_KEY.
On macOS:
brew install oven-sh/bun/bun
brew install --cask docker # then launch Docker Desktop once and wait until it's up
brew install ollama # then launch the Ollama app (or run `ollama serve`)For the LLM analysis, either be logged in to the Claude Code CLI (claude on your PATH) or export ANTHROPIC_API_KEY. Neither? Use No-LLM mode.
Verify the two daemons are actually running — the most common quickstart failure:
docker info > /dev/null && echo "docker OK" # errors if Docker Desktop isn't running
curl -s localhost:11434 > /dev/null && echo "ollama OK" # errors if Ollama isn't running(Linux: Docker Engine + Ollama; same checks apply.)
docker compose up -d # Qdrant (vector DB)
ollama pull bge-m3 # embedding model (or set EMBEDDING_PROVIDER=openai)
bun install
# Interactive wizard: preflight → preset → analysis → indexing → MCP instructions
bun run src/cli/index.ts init
# Non-interactive
bun run src/cli/index.ts init --repo /path/to/your/repo --tag my-app --yes
# Serve as an MCP server
bun run src/cli/index.ts serve my-appThen connect your agent:
claude mcp add --transport http mindcairn-my-app http://localhost:8765/mcpAlready running Qdrant or something on these ports? Skip
docker compose upif a Qdrant is already listening on 6333 — mindcairn just uses it. If port 8765 is taken,serverefuses to start (instead of silently sharing the port); pick another withMINDCAIRN_MCP_PORT=8770. Note the wizardinitis interactive — run it on its own line, not as part of a pasted block, or use the non-interactive form.
Measured on a clean machine: init finished in 88 seconds against spring-petclinic (37 files) — well under 30 minutes end to end, including Docker images and the embedding model pull.
# stop: Ctrl-C the serve process; docker compose down (index data survives)
# reset one instance: delete its local state + Qdrant collection
# (collection name = mindcairn_<tag> with non-alphanumerics replaced by "_": my-app → mindcairn_my_app)
rm -rf .mindcairn/my-app
curl -X DELETE http://localhost:6333/collections/mindcairn_my_app
# reset everything: docker compose down -v (drops all Qdrant volumes)- Hybrid search (BM25 + dense, RRF fusion) — a code-aware tokenizer (camelCase/snake_case splitting, full-identifier tokens, CJK support) feeds a sparse BM25 vector next to the dense embedding in Qdrant; queries fuse both with Reciprocal Rank Fusion. Exact function names and natural-language questions both work.
- Layer-aware chunking — a strategy agent inspects the repo and designs chunkers per architectural layer (controller / service / repository / DTO / entity), instead of fixed-size text windows. Falls back to a generic strategy when coverage is low.
- Decision capture —
capture_decisionsaves a decision/fact/incident from the middle of a coding session into the index; it's searchable within seconds, next to the code it's about. - Docs ingestion — push Notion exports or arbitrary markdown into the same index via the
ingest_doctool or batch scripts, so design docs answer alongside code. - LLM enrichment (optional) — a fast model (Haiku) attaches structured labels to each chunk (class name, methods, tables, keywords); embeddings index the label rather than raw noise. Set
ENRICHER=offto skip. - Incremental sync — re-index only changed files (
synccommand); new chunks automatically participate in BM25. - Team sharing, safely —
MINDCAIRN_READONLY=1exposes search tools only;MINDCAIRN_WRITE_IPSallowlists who can write; every query is logged for usage review. - Feedback & regression loop —
report_issuecaptures bad answers,eval_query+ a golden set catch search-quality regressions before they ship.
| Tool | What it does |
|---|---|
search_codebase |
Hybrid search over code + ingested docs + decisions |
find_pattern |
Find implementations of a pattern/convention |
explain_module |
Summarize a module by name |
get_chunk |
Fetch a full chunk by id |
capture_decision |
Save a team decision into the index |
ingest_doc |
Index an external document |
list_captured |
List captured decisions/docs |
learn_preference |
Store a team preference/rule |
eval_query |
Run a search-quality eval case |
report_issue |
File feedback on a bad answer |
With MINDCAIRN_READONLY=1, only the search/read tools plus report_issue are exposed.
your repo ────▶ ┌──────────────────────────────────────┐
(read-only) │ init (one-time, < 30 min) │
docs/decisions │ 1 preset detect languages/globs │
(ingest_doc, │ 2 discovery LLM analyzes structure │
capture_…) │ 3 strategy layer-aware chunk plan │
│ 4 build chunk → enrich → embed │
└───────────────┬──────────────────────┘
▼
┌────────────┐ ┌──────────────┐
│ Qdrant │ │ SQLite │
│ dense+BM25 │ │ chunks, usage│
│ (RRF) │ │ logs, evals │
└─────┬──────┘ └──────┬───────┘
└────────┬────────┘
▼
┌───────────────────────────┐ HTTP /mcp
│ MCP server :8765 │◀──────────── Claude Code / Cursor
│ search_codebase, │ (teammates via
│ find_pattern, ingest_doc… │ Tailscale, etc.)
└───────────────────────────┘
Everything is optional — defaults work with local Ollama + Claude CLI. See .env.example.
| Variable | Default | Description |
|---|---|---|
EMBEDDING_PROVIDER |
ollama |
ollama or openai |
OPENAI_API_KEY |
— | Required when EMBEDDING_PROVIDER=openai |
MINDCAIRN_EMBED_MODEL |
bge-m3 / text-embedding-3-small |
Embedding model per provider |
MINDCAIRN_EMBED_DIM |
1024 / 1536 |
Embedding dimension |
ANTHROPIC_API_KEY |
— | If set, LLM calls use the API; otherwise the Claude CLI (OAuth) |
MINDCAIRN_LLM |
auto | Force claude-cli or api |
MINDCAIRN_CLAUDE_BIN |
claude |
Path to the Claude CLI binary |
MINDCAIRN_LLM_TIMEOUT_MS |
120000 |
Per-call LLM timeout (ms); a hung CLI/API call is killed so indexing can't wedge |
ENRICHER |
auto |
Chunk labeling: claude-cli / api / off / auto |
MINDCAIRN_MODEL_LARGE |
(Claude Opus) | Model for discovery/strategy |
MINDCAIRN_MODEL_FAST |
(Claude Haiku) | Model for enrich/eval |
MINDCAIRN_QDRANT_HOST |
http://localhost:6333 |
Qdrant endpoint |
MINDCAIRN_OLLAMA_HOST |
http://localhost:11434 |
Ollama endpoint |
MINDCAIRN_MCP_PORT |
8765 |
MCP server port |
MINDCAIRN_MCP_HOST |
0.0.0.0 |
MCP bind address |
MINDCAIRN_READONLY |
off | 1 = expose search tools + report_issue only (shared team instance) |
MINDCAIRN_WRITE_IPS |
— | Comma-separated IP allowlist for write tools. localhost always allowed; if unset, remote writes are blocked (localhost-only) |
MINDCAIRN_TRUST_PROXY |
— | Trust the proxy's X-Forwarded-For so req.ip is the real client. loopback / IP / hop-count. Set only when behind a reverse proxy — otherwise clients could spoof their IP and bypass MINDCAIRN_WRITE_IPS |
MINDCAIRN_OUTPUT_DIR |
.mindcairn |
Where indexes/presets are stored |
Most per-company adaptation needs no code: presets control what gets indexed, and the LLM-designed chunking strategy is a plain JSON file you can edit (custom chunkers per convention — @Table entities, code-enum interfaces, your own annotations). Adding symbol-level parsing for a new language is a code change with a clear seam. See docs/extending.md.
templates/commands/ ships a ready-made Claude Code slash command (/mindcairn-start) — copy it into your project's .claude/commands/, replace the placeholder skill/agent names with your team's, and you get a mindcairn-first workflow entry point (task kickoff, free-form search, SQL drafting, debugging, retros).
Straight from the verification round — known today, on the roadmap:
- Symbol-level chunking is Kotlin/Java only. Go/Python/TS get file-unit chunks (the regex parser doesn't extract their classes/methods). Search works, but per-function precision is lower than for JVM languages. Incremental
syncalso only picks up changed.ktfiles today and requires a priorbuild --ref(which records the baseline SHA) — for other languages, re-runbuild. - Multiple file-unit chunkers can chunk the same file more than once — gin: 58 files → 111 chunks; trpc: 182 files → 473 chunks. Search-time quotas soften the bias, but index size and enrich cost grow.
- The claude-cli path is bound by your session usage limit. Enrichment can fail under a 429 even with retries; re-running
initretries only the failed chunks thanks to the cache. - Korean-first UX — the CLI wizard, log output, and enrichment labels are currently in Korean (the codebase comes from a Korean team). Search itself is language-agnostic (bge-m3 is multilingual), but a UI language option is not implemented yet.
- No graph layer, no automatic memory extraction. This is pure hybrid retrieval; decisions enter the index only via explicit
capture_decision/ingest_doc.
Does my code leave my machine?
Embeddings and vector search are local (Ollama + Qdrant). The discovery/strategy analysis and optional chunk enrichment send code snippets to Claude — via your existing Claude Code CLI login or an API key. Set ENRICHER=off to minimize LLM calls; the one-time discovery + strategy analysis is all that remains. For zero LLM calls, hand-write the strategy JSON and build --strategy directly — see No-LLM mode. See also Privacy & data flow.
Why Bun and not Node?
mindcairn uses bun:sqlite for chunk storage and gets a fast dev loop for free. Node is not supported.
Which languages are supported? Kotlin/Java get a dedicated structural parser today; TypeScript/JavaScript, Python, Go, Rust, and SQL are indexed via the generic file-unit strategy (see Limitations). The strategy agent picks what fits your repo.
How do remote teammates connect?
The server binds 0.0.0.0, so anything that gives them network access works — we use Tailscale. Run the shared instance with MINDCAIRN_READONLY=1 and put your own IP in MINDCAIRN_WRITE_IPS.
How is this different from Glean / Sourcegraph?
Those are excellent for large orgs — and priced/operated like it. mindcairn targets the other end: one developer, one spare machine, one init command, a team of one to ten.
Can it index my Notion / internal docs?
Yes — ingest_doc (MCP tool) for one-offs, plus batch scripts for Notion exports. Docs, decisions, and code share the same hybrid index.
