konjoai/kohaku

🦀 Kōhaku

Language ML License Status

🧠 Episodic memory engine for LLMs: persistent, associative, and beyond context windows.


🍂 Meaning

Kōhaku (琥珀): amber, preserved in time.

Like insects trapped in amber, memories are captured, compressed, and preserved, not lost to context limits.


🚀 What it is

Kōhaku is a neural episodic memory system:

  • Stores experiences as HDC hypervectors
  • Retrieves via associative similarity
  • Works as a drop-in memory layer for any LLM

Not:

  • ❌ RAG
  • ❌ vector database

But:

✅ learned memory with recall
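
To make "HDC hypervectors + associative similarity" concrete, here is a minimal standalone sketch that does not use kohaku at all. It assumes bipolar (+1/-1) hypervectors, a dimensionality of 2048, and a bundle-of-tokens encoder; the names `token_hv`, `encode`, and `cosine` are illustrative and are not kohaku's API.

```python
import math
import random

D = 2048  # hypervector dimensionality (assumed for this sketch)

def token_hv(token: str) -> list:
    """Deterministic random bipolar (+1/-1) hypervector for one token."""
    rng = random.Random(token)  # string seed: same token, same vector, every run
    return [1 if rng.random() < 0.5 else -1 for _ in range(D)]

def encode(text: str) -> list:
    """Bundle (element-wise sum) the hypervectors of the distinct tokens."""
    total = [0] * D
    for token in set(text.lower().split()):
        for i, bit in enumerate(token_hv(token)):
            total[i] += bit
    return total

def cosine(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

stored = encode("user prefers italian wine")
related = cosine(stored, encode("user likes italian food"))    # shares 2 of 4 tokens
unrelated = cosine(stored, encode("quantum flux capacitor"))   # shares none
print(round(related, 2), round(unrelated, 2))
```

Texts that share tokens land close together, while unrelated texts are nearly orthogonal. That near-orthogonality of random high-dimensional vectors is what lets a single vector carry several bundled items.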


❗ The problem

LLMs forget.

  • Context windows are finite
  • RAG loses nuance
  • Summaries lose detail

There is no true memory system.


✨ Why kohaku is different: memory that reasons

Every other LLM-memory tool (Mem0, Zep/Graphiti, Letta) follows the same recipe: dense embeddings + vector search + a graph. They all do one thing: retrieve by cosine top-k. Dense embeddings have no invertible structure, so that's the ceiling.

kohaku's hyperdimensional substrate can do algebra over memory β€” something that's impossible to copy without changing substrates:

from kohaku import AnalogicalMemory

mem = AnalogicalMemory()
mem.add_record("USA",    {"currency": "dollar", "capital": "washington"})
mem.add_record("Mexico", {"currency": "peso",   "capital": "mexico_city"})

mem.get("USA", "currency").value          # -> "dollar"   (attribute recall)
mem.analogy("USA", "Mexico", "dollar").value   # -> "peso"  (the dollar of Mexico)

No model call, no extra storage: just binding, bundling, and cleanup. That last line ("What is the dollar of Mexico?", Kanerva 2010) is relational transfer over an agent's own memory: learn a preference in one domain, infer the analog in another. Try it: PYTHONPATH=python python3 examples/analogy_demo.py.
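
For readers who want to see the algebra itself, the following standalone sketch reproduces the "dollar of Mexico" trick with plain Python lists. It is not kohaku's implementation: it assumes bipolar vectors, element-wise multiplication as binding, an un-thresholded element-wise sum as bundling, and a largest-dot-product cleanup. All function names are hypothetical.

```python
import random

D = 2048  # dimensionality (assumed for this sketch)
rng = random.Random(0)

def random_hv():
    return [rng.choice((-1, 1)) for _ in range(D)]

def bind(a, b):
    """Binding: element-wise multiply (its own inverse for bipolar vectors)."""
    return [x * y for x, y in zip(a, b)]

def bundle(*vectors):
    """Bundling: element-wise sum, kept as integers rather than thresholded."""
    return [sum(column) for column in zip(*vectors)]

def cleanup(noisy, codebook):
    """Cleanup: name of the codebook entry with the largest dot product."""
    return max(codebook, key=lambda name: sum(x * y for x, y in zip(noisy, codebook[name])))

roles = {"currency": random_hv(), "capital": random_hv()}
values = {name: random_hv() for name in ("dollar", "peso", "washington", "mexico_city")}

usa = bundle(bind(roles["currency"], values["dollar"]),
             bind(roles["capital"], values["washington"]))
mexico = bundle(bind(roles["currency"], values["peso"]),
                bind(roles["capital"], values["mexico_city"]))

# Attribute recall: unbind the role from the record, then clean up the noise.
recalled = cleanup(bind(roles["currency"], usa), values)

# Analogy: the USA-to-Mexico mapping applied to "dollar" lands nearest to "peso".
mapping = bind(usa, mexico)
analog = cleanup(bind(values["dollar"], mapping), values)
print(recalled, analog)  # -> dollar peso
```

Multiplying the two records yields a mapping that contains `dollar * peso` plus cross-term noise. Binding it with `dollar` cancels the `dollar` factor and leaves `peso` plus noise, which the cleanup step removes.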

Operating envelope (honest numbers, see benchmarks/bench_analogy.py): attribute recall stays exact past 40 attributes/record; analogical transfer is ≥95% accurate up to ~16 bound pairs/record at 10,000 dimensions, then degrades gracefully (every answer carries a confidence and a margin, so you can threshold on them).


🧠 What you learn

  • Hyperdimensional computing (HDC)
  • Associative memory / Hopfield networks
  • Memory-augmented architectures
  • Episodic vs semantic memory

⚙️ Architecture

  • 🐍 Python: the full engine and API; the pure-Python path is the correctness baseline and works with zero native dependencies.
  • 🦀 Rust accelerator (optional): bit-packed XOR + popcount cosine top-k behind a PyO3 extension. pip install . (from the repo root, via maturin) builds kohaku._kohaku_rs (after which kohaku._BACKEND == "rust-accel"), parity-tested against NumPy in CI. Retrieval crosses the FFI boundary zero-copy (borrowed int8 arrays, no list marshaling). The big win is kohaku.RetrievalIndex, a resident index that packs the keys once and is ~160–230× faster than NumPy on repeated probes (benchmarks/bench_backends.py); query and query_with_decay use a per-memory cached index automatically. One-shot batches stay on NumPy, because re-packing on every call only reaches rough parity with BLAS.
pip install ./python    # pure-Python baseline
pip install .           # + Rust accelerator (needs a Rust toolchain + maturin)
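
The "bit-packed XOR + popcount" idea behind the accelerator can be illustrated in plain Python. The sketch below is not the Rust code and does not reflect kohaku's memory layout. It assumes bipolar vectors and uses a Python integer as the packed bit string, and the helper names are hypothetical.

```python
import random

D = 1024  # dimensionality (assumed for this sketch)
rng = random.Random(0)

def pack(hv):
    """Pack a bipolar vector into one int: bit i is set when hv[i] == +1."""
    bits = 0
    for i, x in enumerate(hv):
        if x > 0:
            bits |= 1 << i
    return bits

def popcount(n):
    return bin(n).count("1")  # int.bit_count() is available on Python 3.10+

def packed_dot(pa, pb):
    """Dot product of two bipolar vectors computed from their packed forms."""
    hamming = popcount(pa ^ pb)  # positions where the signs disagree
    return D - 2 * hamming       # agreements minus disagreements

a = [rng.choice((-1, 1)) for _ in range(D)]
b = [rng.choice((-1, 1)) for _ in range(D)]

plain_dot = sum(x * y for x, y in zip(a, b))
fast_dot = packed_dot(pack(a), pack(b))
cosine = fast_dot / D  # every bipolar vector has norm sqrt(D)
print(plain_dot == fast_dot, cosine)
```

Packing is the expensive part. Once the keys are packed, each comparison is one XOR and one popcount per machine word, which is consistent with the README's point that a resident index packed once pays off on repeated probes.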

🚀 Quick Start

pip install kohaku

from kohaku import Memory

mem = Memory()
mem.store("User prefers Italian wine")
mem.store("User is allergic to shellfish", importance=0.9, tags=["health"])

hits = mem.query("What does the user like to drink?")
for h in hits:
    print(h.text, round(h.similarity, 3))
# → User prefers Italian wine 0.63

mem.save("user.json")          # labels + metadata; HVs re-derived on load
mem2 = Memory.load("user.json")

Memory is the one-line front door: store strings, get ranked MemoryHit results back (.text, .similarity, .salience, .source, .tags). It wraps the full EnrichedMemoryStore (temporal validity, salience, source-trust, tags) behind a string-in/string-out API. Reach for EnrichedMemoryStore, MemorySystem, and friends directly when you need provenance graphs, version history, or consolidation daemons.

🧬 Semantic recall (opt-in)

The default encoder bundles per-token hypervectors, so similarity is token overlap: "the customer enjoys merlot" won't match "User prefers Italian wine". For meaning-based recall, plug in an EmbeddingEncoder that projects a dense embedding into HDC space (SimHash: the sign of a fixed random projection, which approximately preserves cosine):

pip install "kohaku[semantic]"     # pulls sentence-transformers

from kohaku import Memory, EmbeddingEncoder

enc = EmbeddingEncoder(model_name="all-MiniLM-L6-v2")   # or embed_fn=<your callable>
mem = Memory(encoder=enc)
mem.store("User prefers Italian wine")
mem.query("the customer enjoys a glass of merlot")[0].text
# → 'User prefers Italian wine'   (zero shared tokens, still matches)

EmbeddingEncoder takes any embed_fn (str -> float array), such as sentence-transformers, OpenAI embeddings, or your own, so there's no hard dependency. A store saved with a custom encoder must be reloaded with the same one (Memory.load(path, encoder=enc)).
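
The SimHash projection mentioned above is easy to sketch without any embedding model. The code below is an illustration under stated assumptions, not the EmbeddingEncoder source: it uses a tiny 8-dimensional "embedding", a Gaussian projection matrix, and hypothetical helper names. For two inputs at angle θ, the expected similarity of their sign vectors is 1 - 2θ/π, which is why cosine order is approximately preserved.

```python
import random

D = 2048   # HDC dimensionality (assumed)
DIM = 8    # dense embedding size (tiny, for illustration only)
rng = random.Random(0)

# Fixed random projection: one Gaussian row per output dimension.
PROJECTION = [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(D)]

def simhash(embedding):
    """Map a dense vector to a bipolar hypervector: sign of each random projection."""
    return [1 if sum(r * e for r, e in zip(row, embedding)) >= 0 else -1
            for row in PROJECTION]

def similarity(a, b):
    """Cosine between two bipolar hypervectors."""
    return sum(x * y for x, y in zip(a, b)) / D

x = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
same_direction = [2.0 * v for v in x]                     # dense cosine  1
orthogonal = [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]     # dense cosine  0
opposite = [-v for v in x]                                # dense cosine -1

hx = simhash(x)
sim_same = similarity(hx, simhash(same_direction))
sim_orth = similarity(hx, simhash(orthogonal))
sim_opp = similarity(hx, simhash(opposite))
print(sim_same, round(sim_orth, 2), sim_opp)
```

Because only the sign of each projection is kept, vector length is discarded and direction alone determines the code. The projection must stay fixed between encoding and querying, which matches the requirement to reload a store with the same encoder.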

⚑ Scaling past 10⁴ memories

Exact cosine retrieval is O(NΒ·D) per query. Flip on the bipolar-LSH index to narrow each similarity query to a small candidate set before exact ranking:

mem = Memory(ann=True)            # maintains a kohaku.ann.LSHIndex
# ... store thousands of memories ...
mem.query("...")                  # sub-linear: LSH candidates, exact re-rank

Results are unchanged except for the rare LSH miss: candidates are always scored with exact cosine, and salience/recency sorts or empty candidate sets fall back to a full scan. LSHIndex is pure NumPy (no FAISS/hnswlib) and can be used standalone.
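
The candidate-then-re-rank pattern described above can be sketched with a toy index. This is not kohaku.ann.LSHIndex, whose hashing scheme is not shown here. The sketch assumes a bit-sampling LSH (each table keys on the signs of a few fixed coordinates), and the class name, table count, and bit count are all hypothetical.

```python
import random
from collections import defaultdict

D = 1024      # dimensionality (assumed)
TABLES = 8    # number of hash tables (assumed)
BITS = 12     # sampled coordinates per table (assumed)
rng = random.Random(0)

class BitSamplingLSH:
    """Toy LSH for bipolar vectors: each table keys on the signs of a few coordinates."""

    def __init__(self):
        self.samples = [rng.sample(range(D), BITS) for _ in range(TABLES)]
        self.tables = [defaultdict(set) for _ in range(TABLES)]
        self.vectors = {}

    def _keys(self, hv):
        return [tuple(hv[i] for i in idx) for idx in self.samples]

    def add(self, item_id, hv):
        self.vectors[item_id] = hv
        for table, key in zip(self.tables, self._keys(hv)):
            table[key].add(item_id)

    def query(self, hv, top_k=1):
        candidates = set()
        for table, key in zip(self.tables, self._keys(hv)):
            candidates |= table.get(key, set())
        if not candidates:                  # LSH miss: fall back to a full scan
            candidates = set(self.vectors)

        def cos(item_id):                   # exact cosine for the re-rank
            return sum(x * y for x, y in zip(hv, self.vectors[item_id])) / D

        return sorted(candidates, key=cos, reverse=True)[:top_k]

index = BitSamplingLSH()
stored = {i: [rng.choice((-1, 1)) for _ in range(D)] for i in range(200)}
for item_id, hv in stored.items():
    index.add(item_id, hv)
hit = index.query(stored[42])
print(hit)  # -> [42]
```

Hashing only decides which items get scored. The ranking itself always uses exact cosine, so an LSH miss can drop a result but cannot reorder the ones that are returned.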

📦 Whole-system snapshots

save_system / load_system persist an entire enriched setup (episodic hypervectors, per-memory metadata, and the provenance / version / relationship side stores) into one directory with a manifest:

from kohaku import save_system, load_system

save_system(store, "snapshot/", provenance=pg, versions=vs, relationships=rel)
bundle = load_system("snapshot/")
bundle.store, bundle.provenance, bundle.versions, bundle.relationships

SQLite side stores are copied via the backup API (so :memory: stores persist too), and recall is exact after the round-trip.
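
The reason an in-memory store can survive a snapshot is the standard library's `sqlite3.Connection.backup`, which copies a live database into another connection. The sketch below shows only that mechanism. The table schema and file name are made up for illustration and are not kohaku's on-disk layout.

```python
import os
import sqlite3
import tempfile

# An in-memory side store with one illustrative table (hypothetical schema).
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE provenance (memory_id TEXT, source TEXT)")
src.execute("INSERT INTO provenance VALUES (?, ?)", ("m1", "user"))
src.commit()

snapshot_dir = tempfile.mkdtemp()
path = os.path.join(snapshot_dir, "provenance.sqlite")

# Connection.backup copies the source database into the target connection.
dst = sqlite3.connect(path)
src.backup(dst)
dst.close()
src.close()

# Reopen the snapshot from disk: the in-memory rows survived.
restored = sqlite3.connect(path)
rows = restored.execute("SELECT memory_id, source FROM provenance").fetchall()
restored.close()
print(rows)  # -> [('m1', 'user')]
```

A plain file copy has nothing to copy for a `:memory:` database, whereas the backup API works on any open connection.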


💾 Persistence (v0.4.0)

from kohaku import EpisodicMemory, save, load

mem = EpisodicMemory(capacity=1000)
# ... store entries ...
save(mem, "memories.hkb")        # packed binary, ~10x smaller than JSON
save(mem, "memories.json")       # human-readable

mem2 = load("memories.hkb")      # round-trip preserves IDs, timestamps, recall

🌱 Consolidation

from kohaku import consolidate_to_memory

semantic = consolidate_to_memory(mem, similarity_threshold=0.3)
# Greedy bundle-of-bundles clustering: N noisy episodic traces → K semantic centroids.
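
One plausible reading of "greedy bundle-of-bundles clustering" is sketched below in plain Python. It is an assumption-laden illustration, not the consolidate_to_memory source: each trace joins the most similar existing centroid if the cosine clears the threshold, and otherwise starts a new centroid. All helper names are hypothetical.

```python
import math
import random

D = 2048  # dimensionality (assumed)
rng = random.Random(0)

def random_hv():
    return [rng.choice((-1, 1)) for _ in range(D)]

def noisy(hv, flip=0.1):
    """Copy of hv with roughly `flip` of its signs inverted."""
    return [-x if rng.random() < flip else x for x in hv]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def consolidate(traces, similarity_threshold=0.3):
    """Greedy pass: join the best centroid above the threshold, else start a new one."""
    centroids = []  # each centroid is the running element-wise sum of its members
    for trace in traces:
        best = max(centroids, key=lambda c: cosine(c, trace), default=None)
        if best is not None and cosine(best, trace) >= similarity_threshold:
            for i, x in enumerate(trace):
                best[i] += x          # bundle the trace into the centroid
        else:
            centroids.append(list(trace))
    return centroids

cat, dog = random_hv(), random_hv()
traces = [noisy(cat) for _ in range(3)] + [noisy(dog) for _ in range(3)]
rng.shuffle(traces)
centroids = consolidate(traces)
print(len(centroids))  # -> 2
```

Bundling averages out the independent noise in each trace, so a centroid ends up closer to the underlying prototype than any single trace is.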

🧠 Online learning + Hopfield + episodic↔semantic (v0.5.0)

from kohaku import ItemMemory, HopfieldAssociator, MemorySystem, encode_text

# Online HDC learning: prototypes update with every example
im = ItemMemory()
for example in cat_examples:
    im.add("cat", encode_text(example))
im.train_from_feedback("cat", encode_text("a dog barked"), correct=False)
top = im.predict(encode_text("a kitten napping"), top_k=3)

# Modern Hopfield: clean noisy queries by softmax-weighted retrieval
hop = HopfieldAssociator(beta=0.05)
for proto in canonical_prototypes:
    hop.store(proto)
cleaned = hop.complete(noisy_query)

# Combined episodic + semantic store with sleep-style consolidation
ms = MemorySystem(episodic_capacity=1000)
ms.store_episode(key, value, label="meeting on monday")
ms.consolidate_to_semantic(similarity_threshold=0.3)  # promote clusters → prototypes
results = ms.recall(query, top_k=3, use_decay=True)   # tagged by source
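
The "softmax-weighted retrieval" in the Hopfield comment above can be shown in a few lines. This is a generic single-step modern Hopfield update written for illustration, not the HopfieldAssociator source. It assumes bipolar patterns, reuses beta=0.05 from the snippet, and snaps the result back to bipolar with a sign function; the function name `complete` only mirrors the method name for readability.

```python
import math
import random

D = 2048  # dimensionality (assumed)
rng = random.Random(0)

def random_hv():
    return [rng.choice((-1, 1)) for _ in range(D)]

def complete(query, patterns, beta=0.05):
    """One update: softmax over beta * <pattern, query>, then a weighted sum of patterns."""
    scores = [beta * sum(x * y for x, y in zip(p, query)) for p in patterns]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]   # subtract the max for stability
    total = sum(weights)
    mixed = [sum(w * p[i] for w, p in zip(weights, patterns)) / total
             for i in range(D)]
    return [1 if v >= 0 else -1 for v in mixed]     # snap back to bipolar

patterns = [random_hv() for _ in range(5)]
noisy_query = [-x if rng.random() < 0.25 else x for x in patterns[2]]
cleaned = complete(noisy_query, patterns)
print(cleaned == patterns[2])  # -> True
```

With a quarter of the signs flipped, the query's dot product with its true pattern is still around D/2, far above the near-zero overlap with the other patterns. The softmax therefore puts almost all of its weight on the correct pattern.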

🕰️ Temporal decay

from kohaku import DecayConfig, query_with_decay

cfg = DecayConfig(half_life=100.0, floor=0.05)
results = query_with_decay(mem, query_key, top_k=5, config=cfg)
# Older memories decay exponentially: weight = max(0.5 ** (age / half_life), floor)
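
The decay formula in the comment above is simple enough to work through by hand. The sketch below implements exactly that formula; how kohaku combines the weight with similarity is not stated here, so multiplying the two is an assumption, as are the function names and the example ages.

```python
def decay_weight(age, half_life=100.0, floor=0.05):
    """weight = max(0.5 ** (age / half_life), floor), as in the comment above."""
    return max(0.5 ** (age / half_life), floor)

def rank_with_decay(hits, half_life=100.0, floor=0.05):
    """Sort (label, similarity, age) hits by similarity * decay weight (assumed combination)."""
    scored = [(label, sim * decay_weight(age, half_life, floor)) for label, sim, age in hits]
    return sorted(scored, key=lambda item: item[1], reverse=True)

print(decay_weight(0))      # 1.0   (brand new)
print(decay_weight(100))    # 0.5   (one half-life)
print(decay_weight(1000))   # 0.05  (0.5 ** 10 is about 0.001, so the floor applies)

ranked = rank_with_decay([("old but close", 0.9, 300), ("fresh", 0.6, 0)])
print(ranked[0][0])         # fresh: 0.6 * 1.0 beats 0.9 * 0.125
```

The floor keeps very old memories retrievable: without it, a memory ten half-lives old would be weighted at roughly 0.1% and could never outrank anything recent.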

🎬 Live demo

python demo/server.py        # starts a localhost server with REAL kohaku
open http://127.0.0.1:8000

The page detects the API and switches from offline simulation to live mode: every similarity number, decay weight, and .hkb file size you see is computed by the live library. Add a phrase, click any node, drag the days slider, hit save. It's all real.

PYTHONPATH=python python3 demo/demo.py    # rich-terminal walkthrough

🎯 Vision

Give models memory β€” not just context.

About

🦀 Kohaku: neural episodic memory engine 🧠 using HDC hypervectors. Stores long-term context 📚 and retrieves via associative recall 🔗, a persistent memory layer beyond RAG for LLMs ✨
