Turbo Quant Memory for AI Agents


Latest release · Python 3.11+ · MCP server · Local-first

Other languages: Russian | Ukrainian

Turbo Quant Memory is the memory layer that makes AI agents feel like long-term teammates instead of short-term chat sessions.

If you use Claude Code, Codex, Cursor, OpenCode, Gemini CLI, or any MCP client, this is how you keep your institutional knowledge alive between tasks.

Why It Matters

Most agent workflows fail in the same place: memory.

  • Great insights disappear in chat history.
  • Every new task restarts from zero.
  • Teams re-explain the same architecture again and again.

Turbo Quant Memory fixes this by making your project knowledge persistent, searchable, and reusable.

Why Teams Choose Turbo Quant Memory

| Typical AI workflow | With Turbo Quant Memory |
| --- | --- |
| Agents forget context between sessions | Agents can continue from saved project knowledge |
| Decisions stay buried in old threads | Decisions become reusable notes |
| Team knowledge stays inside one person's head | Knowledge becomes shared, searchable, and portable |
| Token budget is wasted on repeated reading | Context is loaded smarter, so more budget goes to reasoning |

The Core Promise

Your agents stop behaving like temporary assistants and start behaving like members of the team.

What Makes It Different

  • Local-first by design: your memory stays under your control.
  • One memory layer for many clients: same knowledge, same standards, same outcomes.
  • Cross-agent continuity: start in Codex, continue in Gemini CLI, come back to Codex, and keep the same project memory.
  • Built for real delivery: capture decisions, patterns, and handoffs that compound over time.
  • Transparent and auditable: memory is explicit, structured, and easy to inspect.

Quick Start

Use this 60-second flow:

  1. Install once:

     uv tool install git+https://github.com/Lexus2016/turbo_quant_memory@v0.3.1

  2. Add the tqmemory MCP server in your client (the client will launch it automatically):

     # Codex
     codex mcp add tqmemory -- turbo-memory-mcp serve

     # Gemini CLI
     gemini mcp add tqmemory turbo-memory-mcp serve

     # Claude Code (project scope)
     claude mcp add --scope project tqmemory -- turbo-memory-mcp serve

  3. Restart the client and run any tqmemory tool.

Need a ready config for Gemini CLI, Cursor, OpenCode, or Antigravity? Use CLIENT_INTEGRATIONS.md.
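Many MCP clients that read a JSON config file accept the common `mcpServers` shape. The sketch below is an illustration under that assumption, not a verified drop-in config — the exact file name and location vary by client:

```json
{
  "mcpServers": {
    "tqmemory": {
      "command": "turbo-memory-mcp",
      "args": ["serve"]
    }
  }
}
```

The server name (`tqmemory`) must match across clients if you want them to share the same memory layer.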

Shared Memory Across Agents

This works out of the box in the standard local setup. You do not need a separate sync service, export/import flow, or agent-specific memory configuration.

This is shared local memory, not remote cloud sync. If Codex and Gemini CLI run on the same machine and open the same repository, they can use the same memory layer automatically.

To keep one shared project memory across Codex, Gemini CLI, and other MCP clients:

  1. Install turbo-memory-mcp once on the machine.
  2. Add the same tqmemory MCP server in each client you use.
  3. Open the same repository in each client.

When those conditions are true, the clients resolve the same project memory automatically. That means you can start work in Codex, continue in Gemini CLI, and return to Codex without rebuilding context.

If a client is launched outside the repository root, set TQMEMORY_PROJECT_ROOT explicitly so it resolves the same project identity.
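As a sketch of that case (the path below is a placeholder for your own checkout, not a required location):

```shell
# Pin the project identity before launching the MCP client from outside the repo.
# The path is a placeholder; point it at your actual repository root.
export TQMEMORY_PROJECT_ROOT="$HOME/projects/my-repo"

# Then start your client as usual, e.g.:
#   codex      # or: gemini, claude
```

Setting the variable in your shell profile keeps the identity stable across sessions.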

Who This Is For

  • AI-first engineering teams
  • Solo builders running multiple agents
  • Product teams that want consistent AI execution quality
  • Anyone tired of repeating context every day

Why Pick This

Choose Turbo Quant Memory if you want:

  • faster onboarding for every new task
  • fewer repeated mistakes
  • stronger continuity across sessions
  • higher ROI from every agent run

Benchmark-Proven Cost Advantage

Benchmark summary

On this repository corpus, the compact memory path shows strong savings that directly reduce model spend:

  • semantic_search only: 63.96% fewer bytes sent to the model on average
  • semantic_search + hydrate(top1): 44.1% fewer bytes on average
  • semantic_search latency: 68.13 ms average
  • hydrate latency: 41.63 ms average

Why this is a practical advantage:

  • less repeated reading means fewer billed input tokens
  • lower token pressure means lower cost per task
  • context budget stays available for reasoning instead of reloading files
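To see how a byte reduction maps to spend, here is a back-of-the-envelope sketch. Only the 63.96% figure comes from the benchmark above; the 100 KB baseline and the ~4-bytes-per-token ratio are assumptions for illustration:

```shell
# Back-of-the-envelope: what a 63.96% byte reduction means in input tokens.
baseline=100000                      # assumed bytes per task without compact retrieval
remaining=3604                       # 36.04% of the baseline still gets sent (100% - 63.96%)
bytes_sent=$(( baseline * remaining / 10000 ))
tokens=$(( bytes_sent / 4 ))         # rough heuristic: ~4 bytes per input token
saved_tokens=$(( baseline / 4 - tokens ))
echo "sent: ${bytes_sent} bytes (~${tokens} tokens), saved: ~${saved_tokens} tokens per task"
```

With these assumed numbers the compact path sends ~9,010 tokens instead of ~25,000, and the ~16,000 saved tokens per task are what turn into lower billed input and more headroom for reasoning.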

New In v0.3.1

  • Published shared-memory guidance for Codex and Gemini CLI handoffs inside the README and client integration docs.
  • Added a ready Gemini CLI fixture plus smoke-check steps for validating the same tqmemory server across clients.
  • Clarified that shared memory is local same-machine continuity, not remote cloud sync.

About

Local-first MCP memory server for AI coding agents with compact retrieval and project/global scopes.
