🧠 Episodic memory engine for LLMs: persistent, associative, and beyond context windows.
Kōhaku (琥珀): amber, preserved in time.
Like insects trapped in amber, memories are captured, compressed, and preserved, not lost to context limits.
Kōhaku is a neural episodic memory system:
- Stores experiences as HDC hypervectors
- Retrieves via associative similarity
- Works as a drop-in memory layer for any LLM
Not:
- ❌ RAG
- ❌ vector database
But:
✅ learned memory with recall
LLMs forget.
- Context windows are finite
- RAG loses nuance
- Summaries lose detail
There is no true memory system.
Every other LLM-memory tool (Mem0, Zep/Graphiti, Letta) is the same recipe: dense embeddings + vector search + a graph. They all do one thing: retrieve by cosine top-k. Dense embeddings have no invertible structure, so that's the ceiling.
kohaku's hyperdimensional substrate can do algebra over memory, something that's impossible to copy without changing substrates:
from kohaku import AnalogicalMemory
mem = AnalogicalMemory()
mem.add_record("USA", {"currency": "dollar", "capital": "washington"})
mem.add_record("Mexico", {"currency": "peso", "capital": "mexico_city"})
mem.get("USA", "currency").value # -> "dollar" (attribute recall)
mem.analogy("USA", "Mexico", "dollar").value # -> "peso" (the dollar of Mexico)
No model call, no extra storage, just binding + bundling + cleanup. That last
line ("What is the dollar of Mexico?", Kanerva 2010) is relational transfer
over an agent's own memory: learn a preference in one domain, infer the analog
in another. Try it: PYTHONPATH=python python3 examples/analogy_demo.py.
Operating envelope (honest; see benchmarks/bench_analogy.py): attribute
recall stays exact past 40 attributes/record; analogical transfer is ≥95%
accurate up to ~16 bound pairs/record at 10k-D, then degrades gracefully (every
answer carries a confidence + margin so you can threshold).
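The binding + bundling + cleanup recipe mentioned above can be sketched in plain NumPy. This is a minimal illustration of the textbook construction, not kohaku's actual implementation: it assumes bipolar hypervectors, elementwise multiplication as binding, and summation as bundling, and the names random_hv, encode_record, and cleanup are hypothetical helpers invented for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 10_000  # dimensionality, matching the 10k-D figure quoted above

def random_hv():
    """Random bipolar (+1/-1) hypervector (hypothetical helper)."""
    return rng.integers(0, 2, size=D) * 2 - 1

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

roles = {name: random_hv() for name in ("currency", "capital")}
values = {name: random_hv()
          for name in ("dollar", "peso", "washington", "mexico_city")}

def encode_record(fields):
    # Binding = elementwise multiply (role * value); bundling = elementwise sum.
    return sum(roles[role] * values[value] for role, value in fields.items())

def cleanup(noisy):
    """Nearest known value, its cosine (confidence), and the gap to the runner-up (margin)."""
    ranked = sorted(((cosine(noisy, vec), name) for name, vec in values.items()),
                    reverse=True)
    (best_score, best_name), (runner_up, _) = ranked[0], ranked[1]
    return best_name, best_score, best_score - runner_up

usa = encode_record({"currency": "dollar", "capital": "washington"})
mexico = encode_record({"currency": "peso", "capital": "mexico_city"})

# Attribute recall: multiplying by the role unbinds it (binding is self-inverse).
attribute = cleanup(usa * roles["currency"])

# Analogy: usa * mexico acts as a mapping between the two records,
# so applying it to "dollar" lands near "peso" plus noise.
analogy = cleanup(values["dollar"] * usa * mexico)

print("attribute:", attribute)
print("analogy:  ", analogy)
```

Because the result of each query is only approximately equal to a stored value, the cleanup step reports a confidence and a margin alongside the answer, which is the kind of signal a caller could threshold on as records grow more crowded.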
- Hyperdimensional computing (HDC)
- Associative memory / Hopfield networks
- Memory-augmented architectures
- Episodic vs semantic memory
- 🐍 Python: the full engine and API; the pure-Python path is the correctness baseline and works with zero native dependencies.
- 🦀 Rust accelerator (optional): bit-packed XOR + popcount cosine top-k
behind a PyO3 extension.
pip install . (from the repo root, via maturin) builds kohaku._kohaku_rs (kohaku._BACKEND == "rust-accel"), parity-tested against NumPy in CI. Retrieval crosses the FFI boundary zero-copy (borrowed int8 arrays, no list marshaling). The big win is kohaku.RetrievalIndex, a resident packed index that packs the keys once and is ~160–230× faster than NumPy on repeated probes (benchmarks/bench_backends.py); query and query_with_decay use a per-memory cached index automatically. One-shot batches stay on NumPy (re-packing every call is ~parity with BLAS).
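To see why a bit-packed XOR + popcount gives the same ranking as cosine, here is a small NumPy sketch. It is an illustration of the general technique only, not the Rust extension's code: for bipolar vectors of dimension D, cosine equals 1 - 2 * hamming / D, so packing the keys once and counting differing bits reproduces the dense result. The pack helper and the lookup-table popcount are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(1)
D = 1024  # multiple of 8, so packing needs no padding bits

keys = rng.integers(0, 2, size=(5, D)) * 2 - 1   # five bipolar keys
query = keys[3].copy()
query[:100] *= -1                                 # noisy copy of key 3

# Popcount via a 256-entry lookup table (one entry per byte value).
POPCOUNT = np.array([bin(i).count("1") for i in range(256)], dtype=np.uint16)

def pack(bipolar):
    """Map +1 -> bit 1, -1 -> bit 0, then pack 8 components per byte."""
    return np.packbits(bipolar > 0, axis=-1)

packed_keys = pack(keys)     # packed once, reused for every probe
packed_query = pack(query)

# XOR marks the differing components; popcount sums them into a Hamming distance.
hamming = POPCOUNT[np.bitwise_xor(packed_keys, packed_query)].sum(axis=1)
cos_packed = 1.0 - 2.0 * hamming / D

# Dense reference: for bipolar vectors every norm is sqrt(D).
cos_dense = (keys @ query) / D

print(np.allclose(cos_packed, cos_dense), int(np.argmax(cos_packed)))
```

The packed form touches one eighth of the bytes of an int8 layout and replaces multiply-accumulate with XOR and popcount, which is where a resident index that packs the keys only once would get its advantage on repeated probes.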
pip install ./python # pure-Python baseline
pip install . # + Rust accelerator (needs a Rust toolchain + maturin)
pip install kohaku
from kohaku import Memory
mem = Memory()
mem.store("User prefers Italian wine")
mem.store("User is allergic to shellfish", importance=0.9, tags=["health"])
hits = mem.query("What does the user like to drink?")
for h in hits:
print(h.text, round(h.similarity, 3))
# → User prefers Italian wine 0.63
mem.save("user.json") # labels + metadata; HVs re-derived on load
mem2 = Memory.load("user.json")
Memory is the one-line front door: store strings, get ranked MemoryHit
results back (.text, .similarity, .salience, .source, .tags). It wraps
the full EnrichedMemoryStore (temporal validity, salience, source-trust,
tags) behind a string-in/string-out API. Reach for EnrichedMemoryStore,
MemorySystem, and friends directly when you need provenance graphs, version
history, or consolidation daemons.
The default encoder bundles per-token hypervectors, so similarity is token
overlap: "the customer enjoys merlot" won't match "User prefers Italian
wine". For meaning-based recall, plug in an EmbeddingEncoder that projects
a dense embedding into HDC space (SimHash: the sign of a fixed random projection,
which approximately preserves cosine):
pip install "kohaku[semantic]" # pulls sentence-transformers
from kohaku import Memory, EmbeddingEncoder
enc = EmbeddingEncoder(model_name="all-MiniLM-L6-v2") # or embed_fn=<your callable>
mem = Memory(encoder=enc)
mem.store("User prefers Italian wine")
mem.query("the customer enjoys a glass of merlot")[0].text
# → 'User prefers Italian wine' (zero shared tokens, still matches)
EmbeddingEncoder takes any embed_fn (str -> float array), such as
sentence-transformers, OpenAI embeddings, or your own, so there's no hard
dependency. A store saved with a custom encoder must be reloaded with the same one
(Memory.load(path, encoder=enc)).
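The SimHash step described above (sign of a fixed random projection) can be sketched as follows. This is a generic illustration under stated assumptions, not EmbeddingEncoder's internals: random Gaussian vectors stand in for real sentence embeddings, the 384-dimensional input size is only illustrative, and simhash is a hypothetical helper. The property being shown is that two vectors at angle θ agree on each sign bit with probability 1 - θ/π, so the angle can be read back from the bipolar codes.

```python
import numpy as np

rng = np.random.default_rng(2)
dense_dim, D = 384, 10_000            # illustrative sizes (assumptions)

# The projection is drawn once and then kept fixed for every vector.
projection = rng.standard_normal((D, dense_dim))

def simhash(vec):
    """Project a dense vector into D bipolar components (hypothetical helper)."""
    return np.where(projection @ vec >= 0, 1, -1)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-ins for embeddings: b is a perturbed copy of a, c is unrelated.
a = rng.standard_normal(dense_dim)
b = a + 0.5 * rng.standard_normal(dense_dim)
c = rng.standard_normal(dense_dim)

ha, hb, hc = simhash(a), simhash(b), simhash(c)

true_angle = float(np.arccos(cosine(a, b)))
agreement = float(np.mean(ha == hb))            # fraction of matching signs
estimated_angle = float(np.pi * (1.0 - agreement))

print("true angle:     ", round(true_angle, 3))
print("estimated angle:", round(estimated_angle, 3))
print("related pair scores higher:", cosine(ha, hb) > cosine(ha, hc))
```

The mapping is monotone in the angle, so ranking by similarity in the bipolar space follows the ranking in the dense space up to sampling noise that shrinks as D grows. It also shows why the same encoder (and hence the same projection) is needed at load time: a different projection yields unrelated codes.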
Exact cosine retrieval is O(N·D) per query. Flip on the bipolar-LSH index to
narrow each similarity query to a small candidate set before exact ranking:
mem = Memory(ann=True) # maintains a kohaku.ann.LSHIndex
# ... store thousands of memories ...
mem.query("...") # sub-linear: LSH candidates, exact re-rank
Results are unchanged except for the rare LSH miss: candidates are always
scored with exact cosine, and salience/recency sorts or empty candidate sets
fall back to a full scan. LSHIndex is pure NumPy (no FAISS/hnswlib) and can
be used standalone.
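The candidate-then-re-rank flow described above can be sketched with a toy bit-sampling LSH. This is not kohaku.ann.LSHIndex; TinyLSH, its parameters, and the query helper are hypothetical and chosen only to show the shape of the idea: hash each bipolar vector by a few sampled components per table, collect bucket collisions as candidates, score those candidates with exact cosine, and fall back to a full scan if no candidate is found.

```python
from collections import defaultdict

import numpy as np

rng = np.random.default_rng(3)
D, N = 2048, 2000
keys = rng.integers(0, 2, size=(N, D)) * 2 - 1   # stored bipolar memories

class TinyLSH:
    """Toy bit-sampling LSH over bipolar vectors (illustrative only)."""

    def __init__(self, dim, n_tables=8, bits_per_table=12):
        # Each table hashes on its own random subset of components.
        self.positions = [rng.choice(dim, size=bits_per_table, replace=False)
                          for _ in range(n_tables)]
        self.tables = [defaultdict(list) for _ in range(n_tables)]

    @staticmethod
    def _signature(vec, pos):
        return (vec[pos] > 0).tobytes()

    def add(self, idx, vec):
        for pos, table in zip(self.positions, self.tables):
            table[self._signature(vec, pos)].append(idx)

    def candidates(self, vec):
        found = set()
        for pos, table in zip(self.positions, self.tables):
            found.update(table.get(self._signature(vec, pos), ()))
        return found

index = TinyLSH(D)
for i, key in enumerate(keys):
    index.add(i, key)

def query(vec, top_k=1):
    # An empty candidate set falls back to scanning everything.
    cand = sorted(index.candidates(vec)) or list(range(N))
    sims = keys[cand] @ vec / D          # exact cosine for bipolar vectors
    order = np.argsort(-sims)[:top_k]
    return [(cand[i], float(sims[i])) for i in order]

# Probe with a noisy copy of memory 42 (40 of 2048 components flipped).
probe = keys[42].copy()
probe[rng.choice(D, size=40, replace=False)] *= -1

results = query(probe)
print(results, "candidates scored:", len(index.candidates(probe)), "of", N)
```

Because the final scores come from exact cosine, the approximation only affects which items get scored, never how they are scored. More tables reduce misses at the cost of memory, and more bits per table shrink the candidate set at the cost of recall.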
save_system / load_system persist an entire enriched setup (episodic
hypervectors, per-memory metadata, and the provenance / version / relationship
side stores) into one directory with a manifest:
from kohaku import save_system, load_system
save_system(store, "snapshot/", provenance=pg, versions=vs, relationships=rel)
bundle = load_system("snapshot/")
bundle.store, bundle.provenance, bundle.versions, bundle.relationships
SQLite side stores are copied via the backup API (so :memory: stores persist
too), and recall is exact after the round-trip.
from kohaku import EpisodicMemory, save, load
mem = EpisodicMemory(capacity=1000)
# ... store entries ...
save(mem, "memories.hkb") # packed binary, ~10x smaller than JSON
save(mem, "memories.json") # human-readable
mem2 = load("memories.hkb") # round-trip preserves IDs, timestamps, recall
from kohaku import consolidate_to_memory
semantic = consolidate_to_memory(mem, similarity_threshold=0.3)
# Greedy bundle-of-bundles clustering: N noisy episodic traces → K semantic centroids.
from kohaku import ItemMemory, HopfieldAssociator, MemorySystem, encode_text
# Online HDC learning: prototypes update with every example
im = ItemMemory()
for example in cat_examples:
im.add("cat", encode_text(example))
im.train_from_feedback("cat", encode_text("a dog barked"), correct=False)
top = im.predict(encode_text("a kitten napping"), top_k=3)
# Modern Hopfield: clean noisy queries by softmax-weighted retrieval
hop = HopfieldAssociator(beta=0.05)
for proto in canonical_prototypes:
hop.store(proto)
cleaned = hop.complete(noisy_query)
# Combined episodic + semantic store with sleep-style consolidation
ms = MemorySystem(episodic_capacity=1000)
ms.store_episode(key, value, label="meeting on monday")
ms.consolidate_to_semantic(similarity_threshold=0.3) # promote clusters → prototypes
results = ms.recall(query, top_k=3, use_decay=True) # tagged by source
from kohaku import DecayConfig, query_with_decay
cfg = DecayConfig(half_life=100.0, floor=0.05)
results = query_with_decay(mem, query_key, top_k=5, config=cfg)
# Older memories decay exponentially: weight = max(0.5 ** (age / half_life), floor)
python demo/server.py # starts a localhost server with REAL kohaku
open http://127.0.0.1:8000
The page detects the API and switches from offline simulation to live mode: every
similarity number, decay weight, and .hkb file size you see is computed by the live
library. Add a phrase, click any node, drag the days slider, hit save; it's all real.
PYTHONPATH=python python3 demo/demo.py # rich-terminal walkthrough
Give models memory, not just context.