The only memory layer that learns how you work — not just what you said. Persistent, local memory for AI coding agents: Claude Code, Codex CLI, Cursor, any MCP client. Temporal knowledge graph · procedural memory · AST codebase ingest · cross-project analogy · 3D WebGL visualization.
Why this, not mem0 / Letta / Zep / Supermemory / Cognee? → docs/vs-competitors.md
- The problem it solves
- 60-second demo
- Benchmarks — how it compares
- Competitor comparison
- What you get
- Architecture
- Install
- Quick start
- MCP tools reference
- TypeScript SDK
- Dashboard
- Update
- Upgrading from v7.x to v8.0
- Ollama setup
- Configuration
- Roadmap
- Support the project
- Philosophy & license
AI coding agents have amnesia. Every new Claude Code / Codex / Cursor session starts from zero. Yesterday's architectural decisions, bug fixes, stack choices, and hard-won lessons vanish the moment you close the terminal. You re-explain the same things, re-discover the same solutions, paste the same context into every new chat.
total-agent-memory gives the agent a persistent brain — on your machine, not in someone else's cloud.
Every decision, solution, error, fact, file change, and session summary is:
- Captured — explicitly via `memory_save` or implicitly via hooks on file edits / bash errors / session end
- Linked — automatically extracted into a knowledge graph (entities, relations, temporal facts)
- Searchable — 6-stage hybrid retrieval (BM25 + dense + graph + CrossEncoder + MMR + RRF fusion), 96.2% R@5 on public LongMemEval
- Private — 100% local. SQLite + FastEmbed + optional Ollama. No data leaves your machine.
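The RRF fusion at the end of that retrieval pipeline can be sketched in a few lines. This is an illustration of the technique, not the server's exact implementation; `k=60` is the conventional constant from the RRF literature:

```python
from collections import defaultdict

def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: merge ranked ID lists from several retrievers.

    Each retriever contributes 1 / (k + rank) per document, so documents that
    rank decently in several lists beat documents one retriever loved alone.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Toy example: BM25 and dense retrieval disagree on the winner,
# but both rank doc 7 highly, so fusion promotes it.
bm25  = [3, 7, 12, 9]
dense = [7, 12, 3, 21]
fused = rrf_fuse([bm25, dense])   # doc 7 first
```

The same fusion works unchanged for any number of input rankings, which is why extra stages (fuzzy, graph expansion) can be bolted on without re-tuning weights.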
You: "remember we picked pgvector over ChromaDB because of multi-tenant RLS"
Claude: ✓ memory_save(type=decision, content="Chose pgvector over ChromaDB",
context="WHY: single Postgres, per-tenant RLS")
[3 days later, different session, possibly different project directory:]
You: "why did we pick pgvector again?"
Claude: ✓ memory_recall(query="vector database choice")
→ "Chose pgvector over ChromaDB for multi-tenant RLS. Single DB
instance, row-level security per tenant."
It's not just retrieval. It's procedural too:
You: "migrate auth middleware to JWT-only session tokens"
Claude: ✓ workflow_predict(task_description="migrate auth middleware...")
→ confidence 0.82, predicted steps:
1. read src/auth/middleware.go + tests
2. update session fixtures in tests/
3. run migration 0042
4. regenerate OpenAPI spec
similar past: wf#118 (success), wf#93 (success)
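One way a predictor like this can work — an illustrative nearest-neighbour sketch, not the server's actual algorithm; the `PAST_WORKFLOWS` records and the string-similarity scoring are assumptions:

```python
from difflib import SequenceMatcher

# Hypothetical records of the shape workflow_track might persist (illustrative).
PAST_WORKFLOWS = [
    {"id": 118, "task": "migrate session middleware to JWT tokens",
     "steps": ["read middleware + tests", "update fixtures", "run migration"]},
    {"id": 93, "task": "add OAuth scopes to auth middleware",
     "steps": ["read middleware + tests", "extend scopes table"]},
]

def predict(task, history, threshold=0.3):
    """Score past workflows by similarity to the new task, reuse the best
    match's step list, and report that similarity as a rough confidence."""
    scored = sorted(
        ((SequenceMatcher(None, task.lower(), w["task"].lower()).ratio(), w)
         for w in history),
        key=lambda p: p[0], reverse=True)
    best_score, best = scored[0]
    if best_score < threshold:
        return {"confidence": 0.0, "predicted_steps": [], "similar_past": []}
    return {"confidence": round(best_score, 2),
            "predicted_steps": best["steps"],
            "similar_past": [w["id"] for s, w in scored if s >= threshold]}

result = predict("migrate auth middleware to JWT-only session tokens", PAST_WORKFLOWS)
```

A real implementation would compare embeddings rather than raw strings, but the shape of the output — steps plus a confidence plus the supporting precedents — is the same.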
Public LongMemEval benchmark (xiaowu0162/longmemeval-cleaned, 470 questions, the dataset everyone publishes against):
R@5 (recall_any) on public LongMemEval
─────────────────────────────────────────
100% ─┤
│
96.2% ┤ ████ ← total-agent-memory v7.0 (LOCAL, 38.8 ms, MIT)
95.0% ┤ ████ ← Mastra "Observational" (cloud)
│ ████
│ ████
85.4% ┤ ████ ← Supermemory (cloud, $0.01/1k tok)
│ ████
│ ████
│ ████
80% ┤ ████
└──────────────────────────────────────────
Reproducible: evals/longmemeval-2026-04-17.json · Runner: benchmarks/longmemeval_bench.py
| Question type | Count | Our R@5 |
|---|---|---|
| knowledge-update | 72 | 100.0% |
| single-session-user | 64 | 100.0% |
| multi-session | 121 | 96.7% |
| single-session-assistant | 56 | 96.4% |
| temporal-reasoning | 127 | 95.3% ← bi-temporal KG pays off |
| single-session-preference | 30 | 80.0% ← weakest spot |
| TOTAL | 470 | 96.2% |
p50 (warm) ▌ 0.065 ms
p95 (warm) ▌▌ 2.97 ms
LongMemEval ▌▌▌▌▌ 38.8 ms/query ← includes embedding + CrossEncoder rerank
p50 (cold) ▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌▌ 1333 ms ← first query after process start
Warm / cold reproducible from evals/results-2026-04-17.json.
We're not replacing chatbot memory — we're occupying the coding-agent + MCP + local niche.
| | mem0 | Letta | Zep | Supermemory | Cognee | LangMem | total-agent-memory |
|---|---|---|---|---|---|---|---|
| Funding / status | $24M YC | $10M seed | $12M seed | $2.6M seed | $7.5M seed | in LangChain | self-funded OSS |
| Runs 100% local | 🟡 | ✅ | 🟡 | ❌ | 🟡 | 🟡 | ✅ |
| MCP-native | via SDK | ❌ | 🟡 Graphiti | 🟡 | ❌ | ❌ | ✅ 60+ tools |
| Knowledge graph | 🔒 $249/mo | ❌ | ✅ | ✅ | ✅ | ❌ | ✅ |
| Temporal facts (`kg_at`) | ❌ | ❌ | ✅ | ❌ | 🟡 | ❌ | ✅ |
| Procedural memory | ❌ | ❌ | ❌ | ❌ | ❌ | 🟡 | ✅ workflow_predict |
| Cross-project analogy | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ analogize |
| Self-improving rules | ❌ | ❌ | ❌ | ❌ | 🟡 | ❌ | ✅ learn_error |
| AST codebase ingest | ❌ | ❌ | ❌ | ❌ | 🟡 | ❌ | ✅ tree-sitter 9 lang |
| Pre-edit risk warnings | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ file_context |
| 3D WebGL graph viewer | ❌ | ❌ | 🟡 | ✅ | ❌ | ❌ | ✅ |
| Price for graph features | $249/mo | free | cloud | usage | free | free | free |
Full side-by-side with pricing, latency, accuracy, "when to pick each" → docs/vs-competitors.md.
| Capability | Tool | One-liner |
|---|---|---|
| 🧠 Procedural memory | `workflow_predict` / `workflow_track` | "How did I solve this last time?" — predicts steps with confidence |
| 🔗 Cross-project analogy | `analogize` | "Was there something like this in another repo?" — Jaccard + Dempster-Shafer |
| ⚠️ Pre-edit risk warnings | `file_context` | Surfaces past errors / hot spots on the file you're about to edit |
| 🛡 Self-improving rules | `learn_error` + `self_rules_context` | Bash failures → patterns → auto-consolidated behavioral rules at N≥3 |
| 🕰 Temporal facts | `kg_add_fact` / `kg_at` | Append-only KG with valid_from/valid_to — query what was true at any point |
| 🎯 Task workflow phases | `classify_task` / `phase_transition` | Automatic L1–L4 complexity classification, state machine across van/plan/creative/build/reflect/archive |
| 🧩 Structured decisions | `save_decision` | Options + criteria matrix + rationale + discarded alternatives → searchable decision records with per-criterion embeddings |
| 💸 Token-efficient retrieval | `memory_recall(mode="index")` + `memory_get` | 3-layer workflow: compact IDs → timeline → batched full fetch. ~83% token saving on typical queries |
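The cross-project analogy row mentions Jaccard similarity; the core of that idea fits in a few lines. A minimal sketch — the real tool also fuses evidence with Dempster–Shafer, which is omitted here, and the project/tag data is invented:

```python
def jaccard(a, b):
    """Overlap of two tag/entity sets: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def analogize(current_tags, projects, min_score=0.2):
    """Rank other projects by tag overlap with the current problem."""
    ranked = sorted(((jaccard(current_tags, tags), name)
                     for name, tags in projects.items()), reverse=True)
    return [(round(s, 2), name) for s, name in ranked if s >= min_score]

current = {"postgres", "rls", "multi-tenant", "migration"}
projects = {
    "billing-api": {"postgres", "rls", "sharding", "migration", "terraform"},
    "mobile-app":  {"swift", "push-notifications"},
}
matches = analogize(current, projects)   # billing-api shares 3 of 6 tags → 0.5
```

Set overlap is cheap enough to run across every known project on each query, which is what makes "was there something like this in another repo?" answerable at interactive latency.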
- 6-stage hybrid retrieval (BM25 + dense + fuzzy + graph + CrossEncoder + MMR, RRF fusion) — 96.2% R@5 public
- Multi-representation embeddings — each record embedded as raw + summary + keywords + questions + compressed
- AST codebase ingest — tree-sitter across 9 languages (Python, TS/JS, Go, Rust, Java, C/C++, Ruby, C#)
- Auto-reflection pipeline — `memory_save` → LaunchAgent file-watch → graph edges appear ~30 s later
- rtk-style content filters — strip noise from pytest / cargo / git / docker logs while preserving URLs, paths, code
- 3D WebGL knowledge graph viewer — 3,500+ nodes, 120,000+ edges, click-to-focus, filters
- Hive plot & adjacency matrix — alternate graph views sorted by node type
- A2A protocol — memory shared between multiple agents (backend + frontend + mobile in a team)
- `design-explore` skill — drop-in Claude Code skill that walks L3-L4 tasks through options → criteria matrix → `save_decision` before code (see `examples/skills/design-explore/SKILL.md`)
- `<private>...</private>` inline redaction in any saved content
- Cloud LLM/embed providers with per-phase routing (OpenAI / Anthropic / OpenRouter / Together / Groq / Cohere / any OpenAI-compat)
- `activeContext.md` Obsidian projection for human-readable session state
- Phase-scoped rules (`self_rules_context(phase="build")`) — ~70% token reduction
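The multi-representation bullet above can be pictured as building several text views per record, each embedded separately so a query can match whichever form is closest. The field names below are illustrative, not the server's schema:

```python
def representations(record):
    """Build the text views of one record. Each view gets its own embedding;
    a terse query may land closest to "keywords" while a natural-language
    question lands closest to "questions"."""
    content = record["content"]
    return {
        "raw": content,
        "summary": record.get("summary", content[:150]),   # fall back to a prefix
        "keywords": " ".join(record.get("tags", [])),
        "questions": " ".join(record.get("questions", [])),
        "compressed": " ".join(content.split()[:30]),       # crude 30-word cap
    }

reprs = representations({
    "content": "Chose pgvector over ChromaDB because multi-tenant RLS needs one Postgres",
    "tags": ["database", "multi-tenant"],
    "questions": ["why pgvector?"],
})
```

The payoff is recall symmetry: the query "why pgvector?" need not share any token with the raw content to hit the record through its question-form view.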
┌─────────────────────────────────────────────────┐
│ Your AI coding agent │
│ (Claude Code · Codex CLI · Cursor · any MCP) │
└──────────────────────┬──────────────────────────┘
│ MCP (stdio or HTTP)
│ 60+ tools
┌──────────────────────▼──────────────────────────┐
│ total-agent-memory server │
│ ┌──────────────┐ ┌────────────────────┐ │
│ │ memory_save │ │ memory_recall │ │
│ │ memory_upd │ │ 6-stage pipeline: │ │
│ │ kg_add_fact │ │ BM25 (FTS5) │ │
│ │ learn_error │ │ + dense (FastEmbed)│ │
│ │ file_context │ │ + fuzzy │ │
│ │ workflow_* │ │ + graph expansion │ │
│ │ analogize │ │ + CrossEncoder † │ │
│ │ ingest_code │ │ + MMR diversity † │ │
│ └──────┬───────┘ │ → RRF fusion │ │
│ │ └──────────┬──────────┘ │
└───────────┼─────────────────────┼────────────────┘
│ │
┌───────────▼─────────────────────▼────────────────┐
│ Storage │
│ ┌────────────┐ ┌────────────┐ ┌─────────────┐ │
│ │ SQLite │ │ FastEmbed │ │ Ollama │ │
│ │ + FTS5 │ │ HNSW │ │ (optional) │ │
│ │ + KG tbls │ │ binary-q │ │ qwen2.5-7b │ │
│ └────────────┘ └────────────┘ └─────────────┘ │
└───────────────────────────────────────────────────┘
│
│ file-watch + debounce
┌───────────▼────────────────────────────────────┐
│ Auto-reflection pipeline (LaunchAgent) │
│ triple_extraction → deep_enrichment → reprs │
│ (async, 10s debounce, drains in background) │
└─────────────────────────────────────────────────┘
│
┌───────────▼─────────────────────────────────────┐
│ Dashboard (localhost:37737) │
│ / - stats, savings, queue depths │
│ /graph/live - 3D WebGL force-graph │
│ /graph/hive - D3 hive plot │
│ /graph/matrix - adjacency matrix │
└─────────────────────────────────────────────────┘
† CrossEncoder + MMR are on-demand via `rerank=true` / `diverse=true`
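The on-demand MMR stage (†) is a greedy relevance-vs-redundancy trade-off. A minimal sketch — λ and the similarity numbers are toy values, not the server's tuning:

```python
def mmr(query_sim, doc_sims, lam=0.5, k=3):
    """Maximal Marginal Relevance: greedily pick docs that score high against
    the query but low against already-selected docs.
    query_sim[i]   — similarity of doc i to the query
    doc_sims[i][j] — pairwise similarity between docs i and j"""
    selected, candidates = [], list(range(len(query_sim)))
    while candidates and len(selected) < k:
        def score(i):
            redundancy = max((doc_sims[i][j] for j in selected), default=0.0)
            return lam * query_sim[i] - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Docs 0 and 1 are near-duplicates (sim 0.98): MMR keeps 0 and jumps to the
# less relevant but novel doc 2 instead of returning both duplicates.
picked = mmr([0.9, 0.85, 0.3],
             [[1.0, 0.98, 0.10],
              [0.98, 1.0, 0.12],
              [0.10, 0.12, 1.0]], k=2)
```

This is why `diverse=true` matters for memory recall: without it, five hits about the same decision crowd out the one hit about the bug that decision caused.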
Two paths. Same 60+ tools, same dashboard, different deployment shapes.
| OS | Command | Background services |
|---|---|---|
| macOS 10.15+ | `./install.sh --ide claude-code` | LaunchAgents (`launchctl`) |
| Linux (Ubuntu 22.04+, Debian 12+, Fedora 38+) | `./install.sh --ide claude-code` | systemd `--user` |
| WSL2 (Windows 11 + Ubuntu/Debian) | `./install.sh --ide claude-code` | systemd `--user` — requires `/etc/wsl.conf` with `[boot] systemd=true`; otherwise falls back to shell-loop autostart |
| Windows 10/11 native | `.\install.ps1 -Ide claude-code` | Task Scheduler |
Full per-platform walkthrough, WSL2 Windows-host-vs-WSL IDE nuances, the
wsl -e MCP-command pattern, IDE coverage matrix, and uninstall/diagnostic
flows: docs/installation.md.
```bash
git clone https://github.com/vbcherepanov/claude-total-memory.git ~/claude-memory-server
cd ~/claude-memory-server
bash install.sh --ide claude-code   # or: cursor | gemini-cli | opencode | codex
```

The installer:

- Clones + creates `~/claude-memory-server/.venv/`
- Installs deps from `requirements.txt` and `requirements-dev.txt`
- Pre-downloads the FastEmbed multilingual MiniLM model
- Registers the MCP server via `claude mcp add-json memory ...` (stored in `~/.claude.json`, the canonical store Claude Code actually reads)
- Copies all hooks (`session-*`, `user-prompt-submit.sh`, `post-tool-use.sh`, `pre-edit.sh`, `on-bash-error.sh`, etc.) into `~/.claude/hooks/` and registers them in `~/.claude/settings.json`
- Grants `permissions.allow` for 20+ `mcp__memory__*` tools so hook-driven calls don't prompt for confirmation
- Installs background services for the current OS:
  - macOS — 4 LaunchAgents (`reflection`, `orphan-backfill`, `check-updates`, `dashboard`) under `~/Library/LaunchAgents/`
  - Linux / WSL2 — 7 systemd `--user` units (`*.service`, `*.timer`, `*.path`) under `~/.config/systemd/user/`; gracefully degrades if `systemd --user` is unavailable (WSL without `/etc/wsl.conf`)
- Applies all migrations to a fresh `memory.db`
- Starts the dashboard at `http://127.0.0.1:37737`
Restart Claude Code → /mcp → memory should show Connected with 60+ tools.
```powershell
git clone https://github.com/vbcherepanov/claude-total-memory.git $HOME\claude-memory-server
cd $HOME\claude-memory-server
powershell -ExecutionPolicy Bypass -File install.ps1 -Ide claude-code
```

Same 9 steps as Unix, but:

- MCP config path is `%USERPROFILE%\.claude\settings.json` (or `.cursor\mcp.json`, etc.)
- Hooks copied to `%USERPROFILE%\.claude\hooks\` — `.ps1` versions (auto-capture, memory-trigger, user-prompt-submit, post-tool-use, pre-edit, on-bash-error, session-start/end, on-stop, codex-notify)
- Background services via Task Scheduler:
  - `total-agent-memory-reflection` — every 5 min (no native FileSystemWatcher equivalent)
  - `total-agent-memory-orphan-backfill` — daily 00:00 + 6 h repetition
  - `total-agent-memory-check-updates` — weekly Mon 09:00
  - `ClaudeTotalMemoryDashboard` — AtLogon
All installers preserve ~/.claude-memory/memory.db and your config files; only services + hook registrations are removed.
```bash
./install.sh --uninstall    # macOS/Linux/WSL2 — removes LaunchAgents OR systemd units
.\install.ps1 -Uninstall    # Windows — unregisters Scheduled Tasks + cleans settings.json
```

One-shot health check — prints ✓/✗ for each subsystem (OS detect, venv, MCP import, services, dashboard HTTP, Ollama, DB migrations):

```bash
bash scripts/diagnose.sh     # macOS / Linux / WSL2
.\scripts\diagnose.ps1       # Windows
```

Exit code 0 = all green, 1 = something broken.
```bash
git clone https://github.com/vbcherepanov/claude-total-memory.git
cd claude-total-memory
bash install-docker.sh --with-compose
```

Brings up 5 services:

| Service | Role | Exposed |
|---|---|---|
| `mcp` | MCP server (HTTP transport) | `127.0.0.1:3737/mcp` |
| `dashboard` | Web UI | `127.0.0.1:37737` |
| `ollama` | Local LLM runtime | `127.0.0.1:11434` |
| `reflection` | File-watch queue drainer | internal |
| `scheduler` | Ofelia cron (backfill + update check) | internal |
First run pulls qwen2.5-coder:7b (~4.7 GB) + nomic-embed-text (~275 MB) — 5–10 min cold start.
GPU note: Docker Desktop on macOS doesn't forward Metal. Native install is faster on Mac. On Linux with NVIDIA Container Toolkit, uncomment the deploy.resources.reservations.devices block in docker-compose.yml.
memory_save(content="install works", type="fact")
memory_stats()
Open http://127.0.0.1:37737/ — dashboard, knowledge graph, token savings.
Once installed, in any Claude Code / Codex CLI / Cursor session:
1. Resume where you left off (auto on session start, but you can also invoke)
session_init(project="my-api")
→ {summary: "yesterday: migrated auth middleware to JWT",
next_steps: ["update OpenAPI spec", "notify frontend team"],
pitfalls: ["don't revert migration 0042 — dev DB already migrated"]}
2. Save a decision (agent does this automatically after hooks are registered)
memory_save(
type="decision",
content="Chose pgvector over ChromaDB for multi-tenant RLS",
context="WHY: single Postgres instance, per-tenant row-level security",
project="my-api",
tags=["database", "multi-tenant"],
)
3. Recall across sessions / projects
memory_recall(query="vector database choice", project="my-api", limit=5)
→ RRF-fused results from 6 retrieval tiers
4. Predict approach before starting a task
workflow_predict(task_description="migrate auth middleware to JWT-only")
→ {confidence: 0.82, predicted_steps: [...], similar_past: [...]}
5. Check a file's risk before editing (auto via hook, also manual)
file_context(path="/Users/me/my-api/src/auth/middleware.go")
→ {risk_score: 0.71, warnings: ["last 3 edits caused test failures in ..."], hot_spots: [...]}
6. Get full stats
memory_stats()
→ {sessions: 515, knowledge: {active: 1859, ...}, storage_mb: 119.5, ...}
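A toy model of how a risk score like the one in step 5 could be derived from edit history — purely illustrative, not `file_context`'s actual formula; the history tuples are invented:

```python
def risk_score(edit_history, half_life=5):
    """Toy pre-edit risk estimate: the share of past edits to this file that
    broke tests, weighted so recent edits count more (exponential decay by
    recency, halving every `half_life` edits)."""
    score = weight = 0.0
    for edits_ago, broke_tests in edit_history:   # edits_ago: 0 = most recent
        w = 0.5 ** (edits_ago / half_life)
        score += w if broke_tests else 0.0
        weight += w
    return round(score / weight, 2) if weight else 0.0

# Two of the last three edits broke tests → the score skews high.
history = [(0, True), (1, True), (2, False), (6, True)]
risk = risk_score(history)
```

Whatever the real scoring looks like, the shape is the same: a bounded 0–1 number the pre-edit hook can compare against a warning threshold.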
Core retrieval (9): memory_save, memory_recall, memory_get, memory_update, memory_delete, memory_history, memory_extract_session, memory_relate, memory_search_by_tag
Knowledge graph (8): kg_add_fact, kg_invalidate_fact, kg_at, kg_timeline, memory_graph, memory_graph_index, memory_graph_stats, memory_concepts
Episodic / session (6): memory_episode_save, memory_episode_recall, session_init, session_end, memory_timeline, memory_history
Procedural / workflows (4): workflow_learn, workflow_predict, workflow_track, classify_task
Task phases (4, v8.0): task_create, phase_transition, task_phases_list, complete_task
Decisions (1, v8.0): save_decision
Intents (3, v8.0): save_intent, list_intents, search_intents
Self-improvement (6): self_rules, self_rules_context, self_insight, self_patterns, self_error_log, rule_set_phase (v8.0)
Pre-edit guard / error learning (3): file_context, learn_error, self_error_log
Analogy / cross-project (2): analogize, ingest_codebase
Reflection / consolidation (4): memory_reflect_now, memory_consolidate, memory_forget, memory_observe
Stats / export (5): memory_stats, memory_export, memory_self_assess, memory_context_build, benchmark
Skills (3): memory_skill_get, memory_skill_update, file_context
Total: 60+ tools. Each is documented below with input schema and example.
When you only know the topic but not which records matter, use progressive disclosure:
- Index — `memory_recall(query="auth refactor", mode="index", limit=20)` → ~2 KB of `{id, title, score, type, project, created_at}` per hit. No content, no cognitive expansion.
- Timeline — `memory_recall(query="auth refactor", mode="timeline", limit=5, neighbors=2)` → top-K hits padded with ±neighbours from the same session, sorted chronologically.
- Fetch — `memory_get(ids=[3622, 3606])` → full content for ONLY the IDs you chose (max 50 per call, `detail="summary"` truncates to 150 chars).

Typical saving: 80–90% fewer tokens vs `memory_recall(detail="full", limit=20)` when you end up using 2–3 of the 20 hits.
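That saving follows from simple arithmetic. The token counts below are assumptions for illustration, not measurements:

```python
# Fetching 20 full records vs indexing 20 and fetching only the 3 you need.
FULL_RECORD_TOKENS = 500   # assumed average size of one full record
INDEX_ROW_TOKENS = 25      # assumed size of one {id, title, score, ...} row

full_path = 20 * FULL_RECORD_TOKENS                           # detail="full", limit=20
index_path = 20 * INDEX_ROW_TOKENS + 3 * FULL_RECORD_TOKENS   # index, then memory_get 3 IDs

saving = 1 - index_path / full_path   # fraction of tokens avoided
```

The saving grows with record size and shrinks with the number of records you actually fetch, which is why the workflow pays off most on broad exploratory queries.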
Core memory (15)
memory_recall · memory_get · memory_save · memory_update · memory_delete · memory_search_by_tag · memory_history · memory_timeline · memory_stats · memory_consolidate · memory_export · memory_forget · memory_relate · memory_extract_session · memory_observe
Knowledge graph (6)
memory_graph · memory_graph_index · memory_graph_stats · memory_concepts · memory_associate · memory_context_build
Episodic memory & skills (4)
memory_episode_save · memory_episode_recall · memory_skill_get · memory_skill_update
Reflection & self-improvement (8)
memory_reflect_now · memory_self_assess · self_error_log · self_insight · self_patterns · self_reflect · self_rules · self_rules_context
Temporal knowledge graph (4)
kg_add_fact · kg_invalidate_fact · kg_at · kg_timeline
Procedural memory (3)
workflow_learn · workflow_predict · workflow_track
Pre-flight guards & automation (8)
file_context (pre-edit risk scoring) · learn_error (auto-consolidating error capture) · session_init / session_end · ingest_codebase (AST, 9 languages) · analogize (cross-project analogy) · benchmark (regression gate)
Full JSON schemas: python -m claude_total_memory.cli tools --json or open the dashboard at localhost:37737/tools.
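The temporal KG tools (`kg_add_fact` / `kg_invalidate_fact` / `kg_at`) imply an append-only fact table with validity intervals. A minimal SQLite sketch of the idea — the schema is illustrative, not the server's actual DDL:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE facts (
    subject TEXT, predicate TEXT, object TEXT,
    valid_from TEXT, valid_to TEXT)""")

# A fact is never UPDATEd in place: superseding it closes the old interval
# and inserts a new open-ended row.
db.execute("INSERT INTO facts VALUES ('my-api','uses','ChromaDB','2026-01-01','2026-03-10')")
db.execute("INSERT INTO facts VALUES ('my-api','uses','pgvector','2026-03-10',NULL)")

def kg_at(ts):
    """Return every fact that was true at timestamp ts."""
    rows = db.execute("""SELECT subject, predicate, object FROM facts
        WHERE valid_from <= ? AND (valid_to IS NULL OR valid_to > ?)""", (ts, ts))
    return rows.fetchall()

feb = kg_at("2026-02-01")   # the superseded fact was still true in February
now = kg_at("2026-04-01")   # only the current fact is true today
```

Because history is never destroyed, "what did we believe in February?" stays answerable forever, which is exactly what the temporal-reasoning benchmark rows exercise.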
For Node.js / browser / any TS project that isn't an MCP-native agent:
```bash
npm i @vbch/total-agent-memory-client
```

```typescript
import { connectStdio } from "@vbch/total-agent-memory-client";

const memory = await connectStdio();

await memory.save({
  type: "decision",
  content: "Picked pgvector over ChromaDB for multi-tenant RLS",
  project: "my-api",
});

const hits = await memory.recallFlat({
  query: "vector database choice",
  project: "my-api",
  limit: 5,
});
```

Also ships a LangChain adapter example, procedural-memory integration, and HTTP transport (for team / serverless setups).
Package repo: github.com/vbcherepanov/total-agent-memory-client
- `/` — live stats, queue depths, token savings from filters, representation coverage
- `/graph/live` — 3D WebGL force-graph (Three.js), 3,500+ nodes / 120,000+ edges, click-to-focus, type filters, search
- `/graph/hive` — D3 hive plot, nodes on radial axes by type
- `/graph/matrix` — canvas adjacency matrix sorted by type
- `/knowledge` — paginated knowledge browser, tag filters
- `/sessions` — last 50 sessions with summaries + next steps
- `/errors` — consolidated error patterns
- `/rules` — active behavioral rules + fire counts
- SSE pill in header — live reconnect indicator
Screenshots → docs/screenshots/ (coming)
```bash
cd ~/claude-memory-server
./update.sh
```

7 stages:

1. Pre-flight — disk check + DB snapshot (keeps last 7)
2. Source pull (git) or SHA-256-verified tarball
3. Deps — `pip install -r requirements.txt -r requirements-dev.txt` (only if hash changed)
4. Full pytest suite — aborts with snapshot if red
5. Schema migrations — `python src/tools/version_status.py`
6. LaunchAgent reload — reflection + backfill + update-check
7. MCP reconnect notification — in-app `/mcp` → `memory` → Reconnect
Manual equivalent:
```bash
cd ~/claude-memory-server
git pull
.venv/bin/pip install -r requirements.txt -r requirements-dev.txt
.venv/bin/python src/tools/version_status.py
.venv/bin/python -m pytest tests/
# in Claude Code: /mcp → memory → Reconnect
```

v8.0 is backward compatible — your existing v7 installation keeps working unchanged. All new features are opt-in via MCP tool calls or env vars.

```bash
cd ~/claude-memory-server && ./update.sh
# Applies migrations 011-013 idempotently, restarts LaunchAgents, updates dependencies
```

Then restart Claude Code: `/mcp restart memory`.
- Migrations 011–013 apply on MCP startup (privacy_counters, task_phases, intents). Zero-downtime, idempotent.
- Existing `memory_save` calls keep working — they now additionally strip `<private>...</private>` sections if present.
- Existing `memory_recall` calls keep working — default mode is still `"search"`. New `mode="index"` is opt-in.
- Existing `session_end` calls keep working — `auto_compress=False` by default. Pass `auto_compress=True` to opt in.
- Existing `self_rules_context` calls keep working — default returns all rules (no phase filter).
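The `<private>` stripping mentioned above amounts to removing tagged spans before persistence. A regex sketch — the server's actual redaction logic may differ:

```python
import re

def strip_private(text):
    """Drop <private>...</private> spans before content is persisted,
    then collapse the leftover double spaces."""
    text = re.sub(r"<private>.*?</private>", "", text, flags=re.DOTALL)
    return re.sub(r"  +", " ", text).strip()

clean = strip_private("Rotated API key <private>sk-live-abc123</private> for staging")
```

The non-greedy `.*?` matters: with a greedy match, two private spans in one record would swallow the public text between them.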
1. Cloud providers (only if you want to replace/augment Ollama):
```bash
export MEMORY_LLM_PROVIDER=openai       # or "anthropic"
export MEMORY_LLM_API_KEY=sk-...
export MEMORY_LLM_MODEL=gpt-4o-mini     # or "claude-haiku-4-5"
```

See Cloud providers for OpenRouter / per-phase routing / Cohere examples.

2. Install additional hooks (for UserPromptSubmit capture + citation):

```bash
./install.sh --ide claude-code   # re-run installer; it now registers user-prompt-submit.sh hook
```

The hook is additive — existing hooks keep working.

3. `activeContext.md` Obsidian integration (if you want markdown projection):

```bash
export MEMORY_ACTIVECONTEXT_VAULT=~/Documents/project/Projects   # default
# Disable: export MEMORY_ACTIVECONTEXT_DISABLE=1
```

Each `session_end` writes `<vault>/<project>/activeContext.md`.
None. All v7 MCP tool signatures are preserved. New parameters are optional with safe defaults.
If you switch to a cloud embedding provider (MEMORY_EMBED_PROVIDER=openai/cohere), the server will refuse to start if existing DB embeddings have a different dimension than the new provider returns. This is deliberate — it prevents silent data corruption.
Either:
- Keep `MEMORY_EMBED_PROVIDER=fastembed` (default 384d) and only change the LLM provider, OR
- Re-embed the DB: `python src/tools/reembed.py --provider openai --model text-embedding-3-small`
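The guard described above boils down to one comparison at startup. A sketch of the idea, not the server's actual code:

```python
def check_embedding_dim(stored_dim, provider_dim):
    """Refuse to start when the configured embed provider's vector size
    doesn't match what's already in the DB — mixing 384d and 1536d vectors
    in one index would silently corrupt similarity scores."""
    if stored_dim is not None and stored_dim != provider_dim:
        raise RuntimeError(
            f"DB embeddings are {stored_dim}d but provider returns "
            f"{provider_dim}d — re-embed or keep the original provider")

check_embedding_dim(384, 384)        # fastembed DB, fastembed provider: fine
try:
    check_embedding_dim(384, 1536)   # fastembed DB, openai provider: refused
    blocked = False
except RuntimeError:
    blocked = True
```

Failing loudly at startup is the point: a dimension mismatch that slipped through would not crash, it would just quietly return nonsense neighbours.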
Quick reference — see full docs in MCP tools reference:
| Tool | Purpose |
|---|---|
| `classify_task(description)` | Returns `{level 1-4, suggested_phases, estimated_tokens}` |
| `task_create(task_id, description)` | Starts state machine in "van" phase |
| `phase_transition(task_id, new_phase, artifacts?)` | Moves task through van/plan/creative/build/reflect/archive |
| `task_phases_list(task_id)` | Chronological phase history |
| `save_decision(title, options, criteria_matrix, selected, rationale, ...)` | Structured decision with per-criterion indexing |
| `memory_get(ids, detail)` | Batched full-content fetch for IDs from `memory_recall(mode="index")` |
| `save_intent` / `list_intents` / `search_intents` | UserPromptSubmit-captured prompts |
| `rule_set_phase(rule_id, phase)` | Tag a rule for phase-scoped loading |
Extended tools:
- `memory_recall(mode="index"|"timeline", decisions_only=False, ...)` — 3-layer token-efficient workflow
- `session_end(auto_compress=True, transcript=None, ...)` — LLM-generated summary
- `self_rules_context(phase="build"|"plan"|...)` — phase filter
- `save_knowledge(...)` — now strips `<private>...</private>` sections automatically
v8.0 doesn't remove any v7 functionality. If you hit an issue, you can:
1. Set env vars to revert behaviour:

   ```bash
   export MEMORY_LLM_PROVIDER=ollama        # revert to local LLM
   export MEMORY_EMBED_PROVIDER=fastembed   # revert to local embeddings
   export MEMORY_ACTIVECONTEXT_DISABLE=1    # disable markdown projection
   export MEMORY_POST_TOOL_CAPTURE=0        # disable opt-in capture (default anyway)
   ```

2. Migrations 011/012/013 are additive (no `DROP`/`ALTER` on existing tables), so DB downgrade is not destructive — old code continues reading older tables.

3. Worst case: `git checkout v7.0.0 && ./update.sh --skip-migrations`.
Without Ollama: works fully — raw content is saved, retrieval via BM25 + FastEmbed dense embeddings.
With Ollama: you also get LLM-generated summaries, keywords, question-forms, compressed representations, and deep enrichment (entities, intent, topics).
```bash
brew install ollama             # or: curl -fsSL https://ollama.com/install.sh | sh
ollama serve &
ollama pull qwen2.5-coder:7b    # default — best quality/speed on M-series
ollama pull nomic-embed-text    # optional, alternative embedder
```

Use OpenAI, Anthropic, or any OpenAI-compat endpoint (OpenRouter, Together, Groq, DeepSeek, LM Studio, llama.cpp) instead of local Ollama.
OpenAI:

```bash
export MEMORY_LLM_PROVIDER=openai
export MEMORY_LLM_API_KEY=sk-...
export MEMORY_LLM_MODEL=gpt-4o-mini
```

Anthropic:

```bash
export MEMORY_LLM_PROVIDER=anthropic
export MEMORY_LLM_API_KEY=sk-ant-...
export MEMORY_LLM_MODEL=claude-haiku-4-5
```

OpenRouter (100+ models via one endpoint):

```bash
export MEMORY_LLM_PROVIDER=openai
export MEMORY_LLM_API_BASE=https://openrouter.ai/api/v1
export MEMORY_LLM_API_KEY=sk-or-...
export MEMORY_LLM_MODEL=anthropic/claude-haiku-4.5
```

Per-phase routing (cheap model for bulk, quality for compression):

```bash
export MEMORY_TRIPLE_PROVIDER=openai
export MEMORY_TRIPLE_MODEL=gpt-4o-mini
export MEMORY_ENRICH_PROVIDER=anthropic
export MEMORY_ENRICH_MODEL=claude-haiku-4-5
```

Embeddings (dimension must match existing DB or re-embed required):

```bash
export MEMORY_EMBED_PROVIDER=openai
export MEMORY_EMBED_MODEL=text-embedding-3-small   # 1536d
# or Cohere:
export MEMORY_EMBED_PROVIDER=cohere
export MEMORY_EMBED_API_KEY=...
```

| Model | Size | Use case |
|---|---|---|
| `qwen2.5-coder:7b` | 4.7 GB | default — best quality/speed ratio |
| `qwen2.5-coder:32b` | 19 GB | highest quality, needs 32 GB+ RAM |
| `llama3.1:8b` | 4.9 GB | general-purpose alternative |
| `phi3:mini` | 2.3 GB | low-RAM machines |
Environment variables (all optional):
| Variable | Default | Purpose |
|---|---|---|
| `MEMORY_DB` | `~/.claude-memory/memory.db` | SQLite location |
| `MEMORY_LLM_ENABLED` | `auto` | `auto\|true\|false\|force` — LLM enrichment toggle |
| `MEMORY_LLM_MODEL` | `qwen2.5-coder:7b` | Ollama model for enrichment |
| `MEMORY_LLM_PROBE_TTL_SEC` | `60` | Cache TTL for Ollama availability probe |
| `MEMORY_LLM_TIMEOUT_SEC` | `60` | Global fallback timeout for Ollama requests (s) |
| `MEMORY_TRIPLE_TIMEOUT_SEC` | `30` | Timeout for deep triple extraction (s) |
| `MEMORY_ENRICH_TIMEOUT_SEC` | `45` | Timeout for deep enrichment (s) |
| `MEMORY_REPR_TIMEOUT_SEC` | `60` | Timeout for representation generation (s) |
| `MEMORY_TRIPLE_MAX_PREDICT` | `2048` | `num_predict` cap for triple extraction |
| `OLLAMA_URL` | `http://localhost:11434` | Ollama endpoint |
| `MEMORY_EMBED_MODE` | `fastembed` | `fastembed\|sentence-transformers\|ollama` |
| `DASHBOARD_PORT` | `37737` | HTTP dashboard port |
| `MEMORY_MCP_PORT` | `3737` | HTTP MCP transport port (Docker path) |
CPU-only / WSL hosts: if Ollama keeps timing out, lower `MEMORY_TRIPLE_MAX_PREDICT` before raising timeouts. `install-codex.sh` writes conservative defaults automatically.
Full config: see claude_total_memory/config.py.
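The loading pattern behind the table is plain env-var-with-default lookup. A sketch using a subset of the variables above with their documented defaults — the real loader lives in `claude_total_memory/config.py` and its internal names are not shown here:

```python
import os

def load_config(env=None):
    """Read a few of the documented env vars, falling back to their defaults."""
    env = os.environ if env is None else env
    return {
        "db_path": env.get("MEMORY_DB",
                           os.path.expanduser("~/.claude-memory/memory.db")),
        "llm_enabled": env.get("MEMORY_LLM_ENABLED", "auto"),
        "llm_model": env.get("MEMORY_LLM_MODEL", "qwen2.5-coder:7b"),
        "ollama_url": env.get("OLLAMA_URL", "http://localhost:11434"),
        "dashboard_port": int(env.get("DASHBOARD_PORT", "37737")),
    }

cfg = load_config({})                              # empty env → all defaults
override = load_config({"DASHBOARD_PORT": "8080"}) # single var overridden
```

Passing an explicit dict instead of `os.environ` is what makes config code like this trivially testable.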
- ✅ Task workflow phases (L1-L4 classifier + 6-phase state machine)
- ✅ Structured `save_decision` with criteria matrix + multi-representation criterion indexing
- ✅ Cloud LLM/embed providers (OpenAI, Anthropic, Cohere, any OpenAI-compat)
- ✅ `session_end(auto_compress=True)` via LLM provider
- ✅ Progressive disclosure: `memory_recall(mode="index")` + `memory_get(ids)`
- ✅ `activeContext.md` Obsidian live-doc projection
- ✅ Phase-scoped rules via tag filter
- ✅ `<private>...</private>` inline redaction
- ✅ HTTP citation endpoints `/api/knowledge/{id}` + `/api/session/{id}`
- ✅ UserPromptSubmit + PostToolUse (opt-in) capture hooks
- ✅ Unified `install.sh --ide {claude-code|cursor|gemini-cli|opencode|codex}`
- Plugin marketplace publish (when Claude Code API opens)
- `has_llm()` per-phase provider caching
- GitHub Actions: install smoke tests + LongMemEval nightly
- "Endless mode" — continuous session without hard boundaries (virtual sessions by idle >N hours)
- MLX local LLM integration (A1 plan from memory #3583)
- Speculative decoding for local path (+1.5-1.8× LLM speed)
total-agent-memory is, and will always be, free and MIT-licensed. No paid tier, no gated features, no "enterprise edition". The benchmarks on this page are the entire product.
If it's saving you hours of context-pasting every week and you want to help keep development going — or just say thanks — a donation means a lot.
| Goal | |
|---|---|
| ☕ $5 — a coffee | One evening of focused OSS work |
| 🍕 $25 — a pizza | A new MCP tool end-to-end (design, code, tests, docs) |
| 🎧 $100 — a weekend | A major feature: e.g. the preference-tracking module that closes the 80% gap on LongMemEval |
| 💎 $500+ — a sprint | A release cycle: new subsystem + migrations + docs + benchmark artifact |
- ⭐ Star the repo — GitHub discovery runs on this
- 🐦 Share benchmarks on X / HN / Reddit — reach matters more than donations
- 🐛 Open issues with repro cases — bug reports are pure gold
- 📝 Write a blog post about how you use it
- 🔧 Submit a PR — fixes, new tools, new integrations
- 🌍 Translate the README — first docs in RU / DE / JA / ZH very welcome
- 💬 Tell your team — peer recommendations convert 10× better than marketing
- Building something that would benefit from a custom integration, on-prem deployment, or team-shared memory? Email vbcherepanov@gmail.com — open to contract work and partnerships.
- AI / dev-tools company whose roadmap overlaps? Same email — happy to talk.
MIT forever. No commercial-license switch, no VC money, no dark patterns. The memory layer belongs to the developers using it, not to a SaaS vendor.
Local-first is the product. If you want a cloud memory service, mem0 and Supermemory are great. If you want your data on your disk, untouched by anyone else — this.
Honest benchmarks. Every number on this page is reproducible from the artifacts in evals/ and the scripts in benchmarks/. If you can't reproduce a claim, open an issue — it's a bug.
- Open an issue before a large PR — saves everyone time.
- `pytest tests/` must stay green. Add tests for new tools.
- Update `evals/scenarios/*.json` if you change retrieval behavior.
- Docs-only / typo PRs welcome without discussion.
MIT — see LICENSE.
Built for coding agents. Runs on your machine. Free forever.
Compare to mem0 / Letta / Zep / Supermemory ·
Benchmark artifact ·
TypeScript SDK ·
Donate