MindReader is organized as an npm workspaces monorepo:

    mindreaderv2/
      packages/
        mindgraph/            # Python core - Graphiti memory engine, CLI, background worker
        mindreader-ui/        # Express server + React UI - visualization & management
          server/
            server.js         # App assembly (~150 lines)
            routes/           # 8 route modules (graph, entity, categories, search, cleanup, audit, tokens, cli)
            lib/              # Shared utilities (daemon, categorizer, LLM client, preprocessor)
          ui/                 # React frontend
        openclaw-plugin/      # Optional AI agent integration - auto-recall/capture
Data flows through the system as follows:

    Conversations ──> Capture ────────> Preprocessor ────────────> Neo4j Knowledge Graph ──> Recall ──> AI Context
                        │                 │                          │                         │
                        ▼                 ▼                          ▼                         ▼
                      Filter messages   Classify facts             Auto-categorize           Semantic search
                      Extract facts     Attributes → Neo4j         Auto-tag                  Entity profiles
                      Find entities     Relationships → Graphiti   Relationship repair       Structured JSON
                                                                     │
                                                                     ▼
                                                                   Self-Evolution
                                                                   (web search LLM discovers
                                                                    new entities & relationships)

- Capture — Conversations are filtered, then an LLM preprocessor extracts and classifies facts before storage
- Preprocess — Each fact is classified as either an attribute (written directly to entity tags/summary in Neo4j) or a relationship (forwarded to Graphiti for graph storage). This prevents junk entities like "Developer" or "15 Years Experience" from polluting the graph.
- Organize — LLM auto-categorizes, auto-tags, and maintains the graph continuously
- Evolve — Any node can be expanded via web-search-powered research, discovering new entities and connections
- Recall — Semantic search retrieves relevant memories with full entity context
Traditional knowledge graph systems treat everything as an entity. Tell one that "Dell is a developer with 15 years experience" and you get three entities: "Dell", "Developer", and "15 Years Experience". The last two are attributes, not independent entities, and they pollute the graph.
MindReader's preprocessing pipeline solves this:
- Known entity lookup — Before classification, the preprocessor searches Neo4j for entities mentioned in the text
- LLM classification — Each fact is classified as an attribute (role, skill, trait, preference → direct Neo4j update) or a relationship (connection between entities → Graphiti)
- Direct attribute writes — Tags and summary updates are written directly to existing entity nodes, avoiding unnecessary graph traversal
- Graceful degradation — If preprocessing fails, the system falls back to Graphiti with custom extraction instructions that still prevent junk entity creation
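The attribute/relationship split above can be sketched as a small router. This is a minimal illustration, not MindReader's actual code: the `Fact` shape, the `route_facts` name, and the sample facts are hypothetical, and the LLM classification is assumed to have already run.

```python
from dataclasses import dataclass

# Hypothetical fact shape; the real preprocessor output may differ.
@dataclass
class Fact:
    subject: str   # entity the fact is about
    kind: str      # "attribute" or "relationship", as assigned by the classifier
    value: str     # tag text, or a sentence describing the relationship

def route_facts(facts):
    """Split classified facts into direct tag updates and graph-bound episodes."""
    attribute_updates = {}   # entity name -> tags to write directly to the node
    graph_episodes = []      # sentences forwarded to the graph engine
    for fact in facts:
        if fact.kind == "attribute":
            attribute_updates.setdefault(fact.subject, []).append(fact.value)
        else:
            # Anything not classified as an attribute takes the graph path.
            graph_episodes.append(fact.value)
    return attribute_updates, graph_episodes

# Made-up sample input for illustration.
facts = [
    Fact("Dell", "attribute", "developer"),
    Fact("Dell", "attribute", "15 years experience"),
    Fact("Dell", "relationship", "Dell works with Alice"),
]
updates, episodes = route_facts(facts)
print(updates)
print(episodes)
```

With this routing, "developer" and "15 years experience" end up as tags on the existing "Dell" node instead of becoming nodes of their own.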
Both the manual store path (memory_store tool) and the auto-capture path (end-of-conversation hook) go through this pipeline. The auto-capture path additionally extracts key facts from conversation history before classification, filtering out code, debug output, and tool results.
| Mode | Behavior | LLM Calls |
|---|---|---|
| `merged` (default) | One LLM call extracts facts + classifies | 1 |
| `two-pass` | Separate extraction and classification steps | 1 + N |
Set via the PREPROCESS_MODE environment variable.
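A rough sketch of how the mode could be read and what it implies for call volume. The helper names are hypothetical, and reading "N" as one classification call per extracted fact is an assumption; the table does not define it.

```python
import os

VALID_MODES = ("merged", "two-pass")

def preprocess_mode(env=None):
    """Read PREPROCESS_MODE, falling back to the documented default."""
    env = os.environ if env is None else env
    mode = env.get("PREPROCESS_MODE", "merged")
    if mode not in VALID_MODES:
        raise ValueError(f"PREPROCESS_MODE must be one of {VALID_MODES}, got {mode!r}")
    return mode

def estimated_llm_calls(mode, fact_count):
    # Assumption: in two-pass mode, N is the number of extracted facts.
    return 1 if mode == "merged" else 1 + fact_count

print(preprocess_mode({}), estimated_llm_calls("merged", 4))
print(preprocess_mode({"PREPROCESS_MODE": "two-pass"}), estimated_llm_calls("two-pass", 4))
```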
Search across entity names, summaries, and tags from a single search bar (Ctrl+K).
- Tag-aware search — Search "swimmer" finds entities tagged "swimmer" even if the word isn't in the name
- Context-aware — On the graph tab, search highlights and zooms to matching nodes
- Relevance ranking — Exact name match > prefix match > contains match > tag match
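The ranking order above can be illustrated with a few lines of Python. This is a sketch under assumptions, not the server's implementation: `match_rank` and `search` are hypothetical names, summaries are left out for brevity, and every sample entity except "Aria Lu" is made up.

```python
def match_rank(query, name, tags):
    """Return a rank for one entity (lower is better), or None if it does not match."""
    q = query.lower()
    n = name.lower()
    if n == q:
        return 0   # exact name match
    if n.startswith(q):
        return 1   # prefix match
    if q in n:
        return 2   # contains match
    if any(q == tag.lower() for tag in tags):
        return 3   # tag match
    return None

def search(query, entities):
    """Rank (name, tags) pairs by match quality and return the matching names."""
    ranked = []
    for name, tags in entities:
        rank = match_rank(query, name, tags)
        if rank is not None:
            ranked.append((rank, name))
    ranked.sort()
    return [name for _, name in ranked]

entities = [
    ("Aria Lu", ["swimmer", "daughter"]),
    ("Swimmer", []),
    ("Swimmer's Club", []),
    ("Junior Swimmer League", []),
]
print(search("swimmer", entities))
```

"Aria Lu" is still found through her tag, but she sorts after the three entities whose names contain the query.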
| Feature | How It Works |
|---|---|
| Auto-Categorization | LLM classifies uncategorized entities using name, summary, and tags |
| Auto-Tagging | Extracts descriptive tags (roles, skills, locations, relationships) in the same LLM call |
| Batch Re-categorization | Process uncategorized entities in configurable batches with one click |
| Duplicate Detection | Scans for entities with similar names and properties |
| Relationship Repair | Detects reversed, misspelled, and vague relationships (rule-based + LLM) |
| Orphan Cleanup | Finds and removes disconnected entities |
| View | Purpose |
|---|---|
| List | Browse and search entities with pagination, filter by category |
| Timeline | See memories organized chronologically (Today, Yesterday, This Week, Earlier) |
| Graph | Interactive visual exploration of entities and relationships |
| Categories | Browse and manage entity categories with per-category counts |
| Activity | Audit log of captured and recalled memories |
| Tokens | Track LLM API usage and costs |
| Maintenance | Cleanup tools, relationship repair, batch re-categorization |
- Zoom, pan, click to explore
- Nodes sized by connection count — important entities are larger
- Color-coded by category (person, project, company, etc.)
- Hover for quick preview with category, tags, and summary
- Filter by category to focus on what matters
- 6 layout modes — Force, ForceAtlas2, Radial, Circular, Cluster, Grid
Evolve supports up to 3 rounds of progressive discovery:
- Round 1: Internal discovery (connections to existing graph entities) + external web research
- Round 2: Deeper research, excluding Round 1 discoveries, finding second-degree connections
- Round 3: Further depth, building on all prior discoveries
Each round sends previous discoveries as exclusion context to avoid re-discovering the same entities. The UI shows a sequential feed with round dividers. After each round, choose to continue or save.
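The round structure can be sketched as a loop that feeds every prior discovery back in as exclusion context. Everything here is illustrative: `evolve`, `fake_discover`, and the `FAKE_WEB` neighbour map are hypothetical stand-ins for the web-search LLM call.

```python
def evolve(seed, discover, max_rounds=3, keep_going=lambda round_no, found: True):
    """Run up to max_rounds of discovery, excluding everything found so far."""
    known = [seed]
    rounds = []
    for round_no in range(1, max_rounds + 1):
        # Previous discoveries are passed along so they are not re-discovered.
        found = [e for e in discover(seed, exclude=list(known)) if e not in known]
        if not found:
            break
        rounds.append(found)
        known.extend(found)
        # Mirrors the "continue or save" choice after each round.
        if not keep_going(round_no, found):
            break
    return rounds

# Stand-in for web research: a fixed neighbour map instead of an LLM call.
FAKE_WEB = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": ["F"]}

def fake_discover(seed, exclude):
    results = []
    for known in exclude:
        for neighbour in FAKE_WEB.get(known, []):
            if neighbour not in exclude and neighbour not in results:
                results.append(neighbour)
    return results

rounds = evolve("A", fake_discover)
print(rounds)
```

Each inner list corresponds to one round divider in the feed; stopping after round 1 is the same as choosing "save" at the first prompt.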
Memories decay over time unless reinforced by access. This keeps the knowledge graph fresh and prevents stale data from cluttering recall results.
- Strength decay: Every entity and relationship has a `strength` field (0.0-1.0). Strength decays exponentially based on time since last access: `strength = exp(-lambda * days_since_last_access)`. Default lambda=0.03 gives a ~23-day half-life.
- Access reinforcement: When an entity is accessed (search, recall, detail view), its strength is boosted and `last_accessed_at` is reset. Frequently-used memories stay strong.
- Auto-expiry: When strength drops below the threshold (default 0.1), the item is soft-expired via an `expired_at` timestamp. It's hidden from normal queries but preserved for history.
- Cascade expiry: When all relationships of an entity are expired, the entity is auto-expired too.
- Contradiction expiry: Handled by Graphiti. When new info contradicts an existing edge, the old edge is invalidated immediately regardless of strength.
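A short worked example of the decay arithmetic, using the documented defaults (lambda 0.03, threshold 0.1, reinforce delta 0.3). The function names are hypothetical, and treating the access boost as additive and capped at 1.0 is an assumption; how the boost combines with the decay formula is not spelled out here.

```python
import math

LAMBDA = 0.03           # documented default decay rate
THRESHOLD = 0.1         # documented default expiry threshold
REINFORCE_DELTA = 0.3   # documented default strength boost on access

def decayed_strength(days_since_last_access, lam=LAMBDA):
    """strength = exp(-lambda * days_since_last_access)"""
    return math.exp(-lam * days_since_last_access)

def reinforce(strength, delta=REINFORCE_DELTA):
    # Assumption: the boost is additive and capped at 1.0.
    return min(1.0, strength + delta)

def is_expired(strength, threshold=THRESHOLD):
    return strength < threshold

half_life = math.log(2) / LAMBDA                 # about 23.1 days
days_to_expiry = -math.log(THRESHOLD) / LAMBDA   # about 76.8 days

print(f"half-life: {half_life:.1f} days")
print(f"untouched items expire after about {days_to_expiry:.1f} days")
print(f"strength after 30 days: {decayed_strength(30):.3f}")
```

Under these defaults an item that is never accessed stays visible for roughly eleven weeks before it is soft-expired.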
| Variable | Default | Description |
|---|---|---|
| `MEMORY_DECAY_ENABLED` | `true` | Enable/disable the decay system |
| `MEMORY_DECAY_INTERVAL_MS` | `3600000` (1 hr) | How often the decay job runs |
| `MEMORY_DECAY_LAMBDA` | `0.03` | Decay rate (~23-day half-life) |
| `MEMORY_DECAY_THRESHOLD` | `0.1` | Auto-expire below this strength |
| `MEMORY_DECAY_REINFORCE_DELTA` | `0.3` | Strength boost on access |
| Endpoint | Method | Description |
|---|---|---|
| `/api/decay/status` | GET | Decay statistics and config |
| `/api/decay/run` | POST | Manually trigger a decay cycle |
| `/api/decay/restore/:name` | POST | Un-expire an entity and its relationships |
Enable "Time Travel" in the graph view to reveal a date slider:
- Drag the slider to see the graph at any point in time
- Nodes created after the selected date disappear
- Expired nodes reappear as ghosts
- Date gauge ticks show key dates
- Auto Play: animates from the oldest entity to now over 30 seconds
All expiry is a soft delete. The underlying data model uses `created_at` and `expired_at` timestamps:

    // Show the graph as of March 1, 2026
    MATCH (e:Entity)
    WHERE e.created_at <= datetime('2026-03-01')
      AND (e.expired_at IS NULL OR e.expired_at > datetime('2026-03-01'))
    RETURN e
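The same visibility rule can be expressed in a few lines of Python, which may help when reasoning about what the slider shows. The `visible_at` helper and the sample nodes are made up for illustration.

```python
from datetime import datetime

def visible_at(node, as_of):
    """Mirror the query predicate: created on or before as_of, and not yet expired."""
    created = node["created_at"]
    expired = node.get("expired_at")
    return created <= as_of and (expired is None or expired > as_of)

# Made-up sample nodes.
nodes = [
    {"name": "old-active", "created_at": datetime(2026, 1, 10)},
    {"name": "expired-early", "created_at": datetime(2026, 1, 5), "expired_at": datetime(2026, 2, 1)},
    {"name": "expired-later", "created_at": datetime(2026, 2, 1), "expired_at": datetime(2026, 3, 15)},
    {"name": "created-later", "created_at": datetime(2026, 3, 10)},
]

as_of = datetime(2026, 3, 1)
snapshot = [n["name"] for n in nodes if visible_at(n, as_of)]
print(snapshot)
```

A node that expired on March 15 is still part of the March 1 snapshot, which is why expired nodes reappear when the slider moves back in time.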
Each entity has two text fields:
- Summary (max 200 chars): Brief identifier used in recall context and search previews
- Details (max 10KB, markdown): Comprehensive information — every fact, relationship context, and historical change
When new facts are captured about an entity, an LLM synthesizes the existing details with the new information into an updated markdown document. The LLM also generates a concise 200-char summary.
This happens automatically on:
- Auto-capture: conversation facts merged into entity details
- Manual store: `memory_store` tool updates trigger synthesis
- Evolve: web search discoveries merged into details
- Direct API: `POST /api/entities` can set details explicitly
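Whichever path produces the text, the two size limits on summary and details can be checked before writing. This is a minimal sketch: `validate_entity_text` is a hypothetical helper, and reading "10KB" as 10,240 bytes of UTF-8 is an assumption.

```python
SUMMARY_MAX_CHARS = 200
DETAILS_MAX_BYTES = 10 * 1024   # assumption: "10KB" means 10,240 bytes of UTF-8

def validate_entity_text(summary, details):
    """Return a list of limit violations; an empty list means both fields fit."""
    errors = []
    if len(summary) > SUMMARY_MAX_CHARS:
        errors.append(f"summary is {len(summary)} chars (max {SUMMARY_MAX_CHARS})")
    size = len(details.encode("utf-8"))
    if size > DETAILS_MAX_BYTES:
        errors.append(f"details is {size} bytes (max {DETAILS_MAX_BYTES})")
    return errors

print(validate_entity_text("Competitive swimmer", "## Aria Lu\n- swimmer\n- daughter"))
print(validate_entity_text("x" * 201, "y" * 10241))
```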
| Endpoint | Method | Description |
|---|---|---|
| `/api/entity/:name` | GET | Returns entity with `details` field |
| `/api/entity/:name/details` | PUT | Update details directly (body: `{ details: "markdown..." }`) |
| `/api/entities` | POST | Create entities with optional `details` field |
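As an example of calling the details endpoint, the sketch below builds (but does not send) a `PUT` request with the standard library. The base URL is an assumption (the API is assumed to be served on the default UI port 18900), and `build_details_update` is a hypothetical helper.

```python
import json
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "http://localhost:18900"   # assumption: API served on the default UI port

def build_details_update(name, details_markdown):
    """Build a PUT /api/entity/:name/details request without sending it."""
    # Entity names can contain spaces, so the path segment is percent-encoded.
    url = f"{BASE_URL}/api/entity/{quote(name, safe='')}/details"
    body = json.dumps({"details": details_markdown}).encode("utf-8")
    return Request(url, data=body, method="PUT",
                   headers={"Content-Type": "application/json"})

req = build_details_update("Aria Lu", "## Aria Lu\n- Competitive swimmer")
print(req.get_method(), req.full_url)
# Send with urllib.request.urlopen(req) once the server is running.
```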
Details are included in the fulltext search index. Searching for any term in the details will find the entity, even if the summary doesn't mention it.
For systems that require precise, deterministic memory management without LLM processing:
POST /api/entities
Create or update entities directly in Neo4j with batch support, optional relationships, and upsert behavior.
See Direct Entity API Reference for full documentation with examples.
Full command-line interface for power users and automation:

    mg search "swimming competitions"        # Semantic search with entity profiles
    mg search "Aria" --json                  # Machine-readable JSON output
    mg tags "Aria Lu"                        # View tags: Aria Lu [person]: swimmer, daughter
    mg tags "Aria Lu" --add "competitive"    # Add a tag
    mg tags --backfill                       # LLM-extract tags for all entities
    mg add "Alice is a data scientist"       # Store a new memory
    mg entities --limit 20                   # List entities
    mg maint scan                            # Scan for issues
    mg maint fix                             # Auto-fix duplicates and orphans

MindReader is built as a first-class OpenClaw extension.
- Auto-capture (`agent_end` hook): extracts entities, facts, and relationships from conversations
- Auto-recall (`before_agent_start` hook): retrieves relevant memories and injects them into agent context
- Tool calls: `memory_search`, `memory_store`, `memory_entities` tools
- Web UI: graph explorer served on a configurable port (default 18900)
- Auto-sync: plugin files automatically sync from the monorepo on gateway restart
Configuration in openclaw.json under plugins.entries.mindreader.config:
| Key | Description | Default |
|---|---|---|
| `neo4jUri` | Neo4j bolt connection URI | `bolt://localhost:7687` |
| `neo4jUser` | Neo4j username | `neo4j` |
| `neo4jPassword` | Neo4j password | (empty) |
| `llmProvider` | LLM provider (openai/dashscope/anthropic) | (from .env) |
| `llmApiKey` | LLM API key | (from .env) |
| `llmBaseUrl` | LLM API base URL | (from .env) |
| `llmModel` | LLM model name | (from .env) |
| `autoCapture` | Enable auto-capture from conversations | `true` |
| `autoRecall` | Enable auto-recall before agent responses | `true` |
| `recallLimit` | Max memories to recall | `5` |
| `captureMaxChars` | Max chars to capture per conversation | `2000` |
| `uiPort` | Web UI port | `18900` |
| `uiEnabled` | Enable web UI | `true` |

Values left empty in `openclaw.json` fall through to `.env` variables automatically.
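The fall-through behaviour can be pictured as a three-step lookup: plugin config first, then `.env`, then a built-in default. The sketch below is an illustration only; `resolve_config` is a hypothetical helper, and the key-to-variable mapping is inferred from the two configuration tables rather than taken from the plugin source.

```python
def resolve_config(plugin_config, env, env_names, defaults):
    """Resolve each key from plugin config, then the environment, then defaults."""
    resolved = {}
    for key, env_name in env_names.items():
        value = plugin_config.get(key)
        if value in (None, ""):
            value = env.get(env_name)      # empty in openclaw.json: try .env
        if value in (None, ""):
            value = defaults.get(key)      # still empty: use the built-in default
        resolved[key] = value
    return resolved

# Assumed mapping between openclaw.json keys and .env variables.
ENV_NAMES = {
    "neo4jUri": "NEO4J_URI",
    "llmProvider": "LLM_PROVIDER",
    "llmModel": "LLM_MODEL",
    "uiPort": "UI_PORT",
}
DEFAULTS = {"neo4jUri": "bolt://localhost:7687", "uiPort": 18900}

plugin_config = {"llmProvider": "", "llmModel": "gpt-4o-mini"}
env = {"LLM_PROVIDER": "ollama", "LLM_MODEL": "llama3.2"}
config = resolve_config(plugin_config, env, ENV_NAMES, DEFAULTS)
print(config)
```

Here `llmModel` comes from the plugin config, `llmProvider` falls through to `.env`, and the Neo4j URI and UI port come from the defaults.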
Configuration is stored in .env at the monorepo root. The setup wizard generates this automatically.
| Variable | Description |
|---|---|
| `LLM_PROVIDER` | `openai`, `dashscope`, `anthropic`, or `ollama` |
| `LLM_API_KEY` | API key for the LLM provider |
| `LLM_BASE_URL` | API base URL |
| `LLM_MODEL` | Primary LLM model |
| `LLM_SMALL_MODEL` | Smaller/faster model for extraction |
| `LLM_EVOLVE_MODEL` | Model for evolve feature (ideally with web search) |
| `EMBEDDER_API_KEY` | Embedder API key (defaults to `LLM_API_KEY`) |
| `EMBEDDER_BASE_URL` | Embedder API base URL |
| `EMBEDDER_MODEL` | Embedding model name |
| `EMBEDDER_DIM` | Embedding dimensions |
| `NEO4J_URI` | Neo4j connection URI |
| `NEO4J_USER` | Neo4j username |
| `NEO4J_PASSWORD` | Neo4j password |
| `UI_PORT` | Web UI port (default: 18900) |
| `SEQ_URL` | Seq structured logging URL (optional) |
| `SEQ_API_KEY` | Seq API key (optional) |
| Provider | Status | Default Model | Web Search | Notes |
|---|---|---|---|---|
| OpenAI | Supported | `gpt-4o-mini` | No | Most widely available |
| DashScope (Alibaba) | Supported | `qwen3.5-flash` | Yes (built-in) | Best for evolve feature |
| Anthropic | Supported (native) | `claude-sonnet-4-6` | No | Uses native Anthropic SDK |
| Ollama | Supported | `llama3.2` | No | Free, local, no API key. Embeddings via `nomic-embed-text` |

    # Dev mode (server + UI with hot reload)
    npm run dev

    # Build the UI for production
    npm run build

    # Run setup wizard
    npm run setup

    # CLI tool
    mg --help    # or: python3 packages/mindgraph/python/mg_cli.py --help

MindReader runs on Linux, macOS, and Windows. The setup wizard detects the platform and uses the appropriate script (bash on Unix, PowerShell on Windows). The Python daemon uses platform-specific stdin handling (asyncio pipes on Unix, threaded readline on Windows).
MindReader supports multi-tenant isolation via a mandatory tenantId property on all data nodes and edges.
- Every `Entity`, `Episodic`, `TokenUsage`, and `AuditLog` node and every `RELATES_TO`/`MENTIONS` edge has a `tenantId` property
- In open-source (single-user) mode, `tenantId` is always `"master"`
- In cloud mode, the upstream API proxy sets the `X-Tenant-Id` header per authenticated user
- All queries automatically filter by `tenantId` via AsyncLocalStorage context injection
- Category and Migration nodes are system-wide (no tenant filtering)
| Variable | Default | Description |
|---|---|---|
| `INTERNAL_SECRET` | (empty) | Shared secret for `X-Internal-Secret` header validation. When set, only requests with this secret can set `X-Tenant-Id` to non-`"master"` values. |
- Express server should NOT be publicly accessible in cloud deployments
- X-Internal-Secret header validates that tenant ID comes from the trusted proxy
- AsyncLocalStorage ensures tenant context flows to all downstream queries automatically
- Two-tenant isolation test suite (`tests/tenant-isolation.test.js`) validates no cross-tenant leakage

    # Start MindReader server first, then:
    node tests/tenant-isolation.test.js
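The tenant flow described in this section (header validation, then context injection into queries) can be sketched in Python. The real server is Express and uses Node's AsyncLocalStorage; this sketch uses `contextvars`, the closest standard-library analogue, purely to illustrate the pattern. All function names are hypothetical, and falling back to `"master"` when no secret is configured is an assumption.

```python
import contextvars
import hmac

# Per-request tenant context; "master" matches single-user mode.
tenant_var = contextvars.ContextVar("tenant_id", default="master")

def resolve_tenant(headers, internal_secret):
    """Decide which tenant a request runs as, based on its headers."""
    requested = headers.get("X-Tenant-Id", "master")
    if requested == "master":
        return "master"
    if not internal_secret:
        # Assumption: with no secret configured, everything stays on "master".
        return "master"
    supplied = headers.get("X-Internal-Secret", "")
    # Constant-time comparison of the shared secret.
    if not hmac.compare_digest(supplied.encode(), internal_secret.encode()):
        raise PermissionError("X-Tenant-Id requires a valid X-Internal-Secret")
    return requested

def tenant_filter():
    """Query fragment and parameters that scope a query to the current tenant."""
    return "n.tenantId = $tenantId", {"tenantId": tenant_var.get()}

def handle_request(headers, internal_secret):
    tenant = resolve_tenant(headers, internal_secret)
    token = tenant_var.set(tenant)
    try:
        # Any code called from here sees the tenant without it being passed along.
        return tenant_filter()
    finally:
        tenant_var.reset(token)

trusted = {"X-Tenant-Id": "acme", "X-Internal-Secret": "s3cret"}
print(handle_request(trusted, "s3cret"))
print(handle_request({}, "s3cret"))
```

Keeping the tenant in request-scoped context rather than in function arguments means a query helper cannot forget to pass it, which is the property the isolation test suite checks from the outside.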