Durable project state, architecture-as-built, and open design questions. Human-facing and committed to the repo. Actionable work lives in GitHub Issues; this file holds the context behind it.
Last updated: 2026-06-17
A conversational long-term memory pipeline, evaluated on the LoCoMo benchmark (long-term conversational QA — 5 question categories, scored by a Claude-Haiku LLM-as-judge).
Note: the `README.md` still describes an earlier design (a nanoGPT KV-cache replacement with `src/graph/`, `src/memory/`, `src/model/`). That structure no longer exists. The real code lives in `src/library/`. The README needs updating; see issues.
turn
→ ingest_entities (Phase 1: spaCy NER → L0 GROUND nodes + speaker)
→ ingest_concepts (Phase 2: Haiku LLM → L1 CONCEPT nodes, INSTANTIATES edges)
→ sleep_consolidation (global re-merge pass — "hippocampal replay")
→ assemble_graph / assemble_graph_only (context within a token budget)
→ LLM answer → LoCoMo LLM-as-judge score
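To make the "context within a token budget" step concrete, here is a minimal sketch of greedy budgeted selection. It is an illustration under stated assumptions, not the actual `assemble_graph` logic: the `Snippet` type, the `score` field, the whitespace token count, and `assemble_within_budget` are all hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snippet:
    """Hypothetical unit of retrievable context (not a type from src/library/)."""
    text: str
    score: float  # assumed relevance score; higher means more relevant


def approx_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: count whitespace-separated words.
    return len(text.split())


def assemble_within_budget(snippets: list[Snippet], budget: int) -> list[Snippet]:
    """Greedily keep the highest-scoring snippets that still fit the budget."""
    chosen: list[Snippet] = []
    used = 0
    for snip in sorted(snippets, key=lambda s: s.score, reverse=True):
        cost = approx_tokens(snip.text)
        if used + cost <= budget:  # skip anything that would overflow
            chosen.append(snip)
            used += cost
    return chosen


if __name__ == "__main__":
    pool = [
        Snippet("Alice adopted a dog in May", 0.9),
        Snippet("Bob moved to Lisbon last year for work", 0.7),
        Snippet("Alice likes hiking", 0.5),
    ]
    for snip in assemble_within_budget(pool, budget=10):
        print(snip.text)
```

With a budget of 10, the second snippet is skipped because it would overflow, and the shorter third snippet is admitted instead. A real assembler would presumably count tokens with the answering model's tokenizer.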
Three node levels in one unified graph:
| Level | Source | Behaviour |
|---|---|---|
| L0 GROUND | spaCy NER + speaker | Concrete entities; stable immediately; never merge |
| L1 CONCEPT | Haiku extraction | Abstract concepts; accrue maturity; merge-eligible |
| L2 META | — | Declared, currently unused |
Plus: Hebbian edge reinforcement, maturity-gated merge events ("aha moments"), a temporal/event/place sub-graph, a JSONL ingest log, and a holographic "bioelectric field" layer (field.py, field_reconstruction.py).
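The level rules and the Hebbian reinforcement idea can be sketched roughly as follows. This is a simplified illustration, not the code in `graph.py`: the class names, the saturating update rule, the `rate` parameter, and the fixed `threshold` argument are assumptions made for the example (the real thresholds are density-scaled, as noted under gaps).

```python
from dataclasses import dataclass, field
from enum import Enum


class Level(Enum):
    GROUND = 0   # L0: concrete entities (NER + speaker); never merge
    CONCEPT = 1  # L1: abstract concepts; accrue maturity; merge-eligible
    META = 2     # L2: declared, currently unused


@dataclass
class Node:
    label: str
    level: Level
    maturity: float = 0.0


@dataclass
class Graph:
    nodes: dict[str, Node] = field(default_factory=dict)
    weights: dict[tuple[str, str], float] = field(default_factory=dict)

    def add(self, label: str, level: Level) -> Node:
        # Reuse an existing node with the same label, else create one.
        return self.nodes.setdefault(label, Node(label, level))

    def reinforce(self, a: str, b: str, rate: float = 0.1) -> float:
        """Hebbian-style update: co-activated nodes strengthen their edge."""
        key = (a, b) if a <= b else (b, a)  # undirected edge key
        w = self.weights.get(key, 0.0)
        w += rate * (1.0 - w)  # saturating increment keeps w below 1.0
        self.weights[key] = w
        return w

    def merge_eligible(self, label: str, threshold: float) -> bool:
        """Maturity gate: only sufficiently mature L1 nodes may merge."""
        node = self.nodes[label]
        return node.level is Level.CONCEPT and node.maturity >= threshold


if __name__ == "__main__":
    g = Graph()
    g.add("Alice", Level.GROUND)
    g.add("pet ownership", Level.CONCEPT).maturity = 0.8
    g.reinforce("Alice", "pet ownership")
    print(g.weights)
    print(g.merge_eligible("pet ownership", threshold=0.5))
    print(g.merge_eligible("Alice", threshold=0.5))
```

The saturating update is one common way to keep edge weights bounded; the actual reinforcement rule may differ.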
- `src/library/graph.py`: relational concept graph (write policy, maturity, merge)
- `src/library/context.py`: `ContextAssembler`; orchestrates ingest + context assembly
- `src/library/llm_extractor.py`: triple extractor (currently shadowed; see gaps)
- `src/library/field.py` / `field_reconstruction.py`: holographic field layer
- `src/locomo.py`: LoCoMo parsing, scoring, sleep consolidation, convergence
- `src/{gpt2,llama,qwen}_evaluator.py`, `evaluate.py`, `experiment.py`: eval runners
- `paper/DESIGN.md`: original design rationale (partly aspirational; see gaps)
1. Relational-signature identity is only half-built. DESIGN.md's central claim is that concept identity is its relational signature (language-agnostic, structural). In the code, node matching at write time is pure surface form: character-bigram Jaccard on labels (`_label_similarity`). Only merging uses the relational signature (`_signature_similarity`). The language-agnostic promise is therefore not realized at write/retrieval time.
2. Thresholds are heuristic, not calibrated. DESIGN.md's "open questions" (maturity threshold, merge threshold, provisional lifetime) were answered with dynamic, density-scaled formulas rather than empirically tuned values. No `results/` directory exists yet, so they are uncalibrated.
3. Holographic field is oversold by the README. The field layer's own evaluation concludes it is a compression tool, not a recovery mechanism; graph topology does the recovery work.
4. Redundancy / dead code. `llm_extractor.py` (triple extraction) is shadowed by the direct `_extract_concepts_llm` path in `context.py`, which wins. The L2 META level is declared but never used.
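The difference between surface-form matching and relational-signature matching in the first gap can be illustrated with a small sketch. It assumes a signature is a set of `(relation, neighbour)` pairs and scores both measures with plain Jaccard overlap; the real `_label_similarity` and `_signature_similarity` may normalise or weight things differently, and the function names below are stand-ins.

```python
def char_bigrams(label: str) -> set[str]:
    """Set of adjacent character pairs in a lowercased, stripped label."""
    s = label.lower().strip()
    return {s[i:i + 2] for i in range(len(s) - 1)}


def jaccard(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B|, defined as 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def label_similarity(a: str, b: str) -> float:
    # Surface form only: sensitive to spelling and language.
    return jaccard(char_bigrams(a), char_bigrams(b))


def signature_similarity(
    sig_a: set[tuple[str, str]], sig_b: set[tuple[str, str]]
) -> float:
    # Structural: compares what each node is connected to, not its label.
    return jaccard(sig_a, sig_b)


if __name__ == "__main__":
    # Same concept in two languages, with identical (hypothetical) relations.
    sig_dog = {("owned_by", "alice"), ("is_a", "animal")}
    sig_perro = {("owned_by", "alice"), ("is_a", "animal")}

    print(label_similarity("dog", "perro"))          # 0.0: no shared bigrams
    print(signature_similarity(sig_dog, sig_perro))  # 1.0: same neighbourhood
```

Under these assumptions, "dog" and "perro" never match at write time even though their neighbourhoods are identical, which is the gap described in item 1.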
- Signature identity at write time (gap 1) — the most interesting design thread.
- Real LoCoMo numbers to calibrate the dynamic thresholds (gap 2) — unblocks evidence-based tuning.
- Task tracking: GitHub Issues (adopted 2026-06-17).
- This file: durable project context and decisions, not a task list.