🗂️ NicheMem: Idle but Not Forgotten

Reward-Independent Memory Ownership for Long-Horizon Agents

Figure: Idle but Not Forgotten. Usage-driven memory evicts the dormant runbook; NicheMem's owner sleeps on it and retains it.

Memory allocation must be structural, not usage-driven.

Papers • Honest Scope • Quick Start • Key Results • Mechanism • What Broke • Roadmap • Citation

📖 Abstract

Every deployed agent-memory system allocates its bounded budget by recency or current utility — LRU context paging, usage-weighted compression, learned relevance selectors. NicheMem identifies a structural failure of this entire family: in long-horizon deployments where task families recur after dormancy (quarterly reports, annual audits, rare incident runbooks), the memory most worth protecting is precisely the memory emitting no usage signal. Usage-driven policies erase dormant-family competence and pay a re-acquisition cost at every reactivation.
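The failure mode described above can be reproduced with a toy example. The sketch below is not taken from the repository: `run_lru`, the family names, and the schedule are hypothetical, and a single store entry stands in for a whole skill. It only shows that a recency-keyed store drops a family as soon as that family stops appearing in the stream.

```python
from collections import OrderedDict

def run_lru(stream, capacity):
    """Replay a task stream through an LRU store; return the families still held."""
    store = OrderedDict()
    for family in stream:
        if family in store:
            store.move_to_end(family)      # a use refreshes recency
        else:
            store[family] = "skill"        # (re)acquire the skill entry
            if len(store) > capacity:
                store.popitem(last=False)  # evict the least recently used entry
    return set(store)

# Hypothetical schedule: "audit" is learned once, then lies dormant while
# three other families keep arriving.
stream = ["audit"] + ["email", "billing", "deploy"] * 5
survivors = run_lru(stream, capacity=3)
print("audit" in survivors)  # → False
```

With a capacity of three, the dormant family is evicted the first time a fourth family arrives, and nothing in the usage stream ever argues for bringing it back.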

Core thesis: retention of dormant-family skills tracks a single property — whether what survives the budget is decided by the usage stream or by a usage-independent ownership structure. NicheMem reaches the ownership structure emergently: memory modules acquire task families through quality-weighted winner-take-all competition; converged ownership is pinned; eviction pressure is confined within niches; and each module's skill footprint is a structural floor that budget rebalancing may not squeeze.

This is the sibling instantiation of the GAUSE reward-independence principle, transported from learner populations (weight space) into agent memory (context space) — with three mapped substrate differences that are the most novel findings here.

⚠️ Honest Scope (read first)

All evidence in this repo is mechanism-level. CycleBench-Sim replaces the frozen LLM with a calibrated agent abstraction (per-rule success is a function of whether the needed skill entries survive in retrieved context, and at what fidelity). This isolates who owns budget and what survives churn — the allocation mechanism — exactly as tabular experiments isolate capacity allocation in GAUSE.

✅ What is tested: memory allocation dynamics, retention laws, the mechanism, the theory.
❌ What is not tested: end-to-end LLM behavior. No LLM experiment has been run.
🔭 Every LLM-level statement in the papers is a stated, falsifiable prediction for the future LLM tier; the papers are written to serve as that tier's pre-registration. See paper/Next Steps and Review.tex for the cost-and-priority plan (~$300 buys the decisive gate experiment).

🚀 Quick Start

git clone https://github.com/HowardLiYH/NicheMem.git
cd NicheMem
pip install numpy scipy matplotlib

python3 experiments/run_all.py        # full suite E1–E10 → results/*.json  (~15 min, CPU)
python3 experiments/verify_theory.py  # numerical checks of every proposition
python3 experiments/make_figures.py   # publication figures → paper/figures/
cd paper && latexmk -pdf main.tex     # + "NicheMem Explainer.tex", "Deep Dive.tex"

Every number in the papers is generated from results/*.json — and audited against them.

🎯 Key Results

Figure: the dissociation. Usage-driven policies cluster at 0.35–0.41 post-reactivation; reward-independent policies at 0.92–0.94.

The dissociation (E1, 24 seeds, matched budget, Holm-adjusted p ≤ 1.2×10⁻³⁰):

Policy class	Post-reactivation	Active-family	Verdict
Usage-driven (LRU / usage-compress / EMO / learned utility)	0.35 – 0.41	0.91 – 0.93	forgets dormant families
Unbounded RAG ("store everything")	0.84 → 0.37 at long horizons	0.85	retrieval decays into de-facto forgetting
Reward-independent (quotas / NicheMem / oracle)	0.92 – 0.94	0.93 – 0.94	retains exactly

Coverage is a commodity; retention is not. Active-family success is substantively equivalent across every bounded arm — the dissociation is dormancy-specific.
Retention tracks the class, not the policy. The four usage-driven heuristics are statistically indistinguishable from one another. What the score measures (recency, frequency, utility, relevance) is irrelevant; that it is a function of the usage stream is decisive.
NicheMem recovers 97% of the post-reactivation success of a privileged hand-built oracle (0.915 vs 0.944) with no labels, no quotas, and no trained router, at a cost of 1.8× the evaluation calls, concentrated in an organization phase that touches ~7% of the stream.
Three retention-curve shapes, each matching its derived law: step (eviction; collapse within ~2 epochs of dormancy — there is no grace period), geometric (score decay), graded-then-cliff with near-miss failures (lossy summarization: the agent half-remembers and fails plausibly).
Verified theory: LRU survival bound, fidelity-decay law, exactness of retention under ownership + skill floors, cold-start acquisition chain (analytic 2.620 vs Monte-Carlo 2.615), and a break-even condition that retention satisfies by a 14:1 margin at the operating point and fails below a measurable budget (NicheMem honestly loses to LRU at B=2,400).
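The three retention-curve shapes named above can be pictured with a few lines of Python. This is only a qualitative sketch: the functional forms and every parameter (`evict_at`, `keep`, `fidelity_keep`, `usable_above`) are illustrative assumptions, not the derived laws or fitted values from the papers.

```python
def step_curve(t, evict_at=2):
    """Eviction: full competence until the entry is evicted, then none."""
    return 1.0 if t < evict_at else 0.0

def geometric_curve(t, keep=0.7):
    """Score decay: retention shrinks by a constant factor per dormant epoch."""
    return keep ** t

def graded_cliff_curve(t, fidelity_keep=0.85, usable_above=0.5):
    """Lossy summarization: fidelity erodes gradually, then the skill is unusable."""
    fidelity = fidelity_keep ** t
    return fidelity if fidelity >= usable_above else 0.0

# Print the three curves side by side over a few dormant epochs.
for t in range(7):
    print(t, step_curve(t), round(geometric_curve(t), 3), round(graded_cliff_curve(t), 3))
```

The graded region before the cliff is where the near-miss failures would sit: the entry is still retrieved, but at reduced fidelity.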

⚙️ The Mechanism

Compete. For each task, the memory modules compete in a tournament: each retrieves from its own store; the module whose context yields the best verifiable outcome wins, ingests the distilled skill, and applies a quality-weighted exponentiated-gradient affinity update (a zero-quality win moves nothing, which is the cold-start floor).
Pin. At affinity > 0.9 (~8 quality wins), ownership locks: routing becomes a table lookup; the owner stops contesting other families (competitive exclusion).
Idle ≠ Forget. A dormant family's owner receives no tasks → no ingests, no reclaims. Its skill footprint is a structural floor invisible to budget rebalancing; only episodic working memory flexes with need. Retention is exact — an identity, not a bound.
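The compete and pin steps above can be sketched as follows. This is a minimal illustration, not the code in src/cyclebench/: the learning rate `ETA`, the four-module population, and the function names are assumptions chosen so that the toy lands near the "~8 quality wins" figure; only the 0.9 threshold comes from the text.

```python
import math

ETA = 0.45           # assumed learning rate (not from the source)
PIN_THRESHOLD = 0.9  # pin threshold stated in the text

def eg_update(affinity, winner, quality, eta=ETA):
    """Quality-weighted exponentiated-gradient step on one family's affinities."""
    scaled = [a * math.exp(eta * quality) if m == winner else a
              for m, a in enumerate(affinity)]
    total = sum(scaled)
    return [s / total for s in scaled]  # renormalize so affinities sum to 1

def wins_until_pinned(num_modules=4, quality=1.0, max_wins=1000):
    """Count consecutive wins by module 0 before its affinity passes the threshold."""
    affinity = [1.0 / num_modules] * num_modules
    for wins in range(1, max_wins + 1):
        affinity = eg_update(affinity, winner=0, quality=quality)
        if affinity[0] > PIN_THRESHOLD:
            return wins
    return None  # never pinned within max_wins

print(wins_until_pinned())             # → 8
print(wins_until_pinned(quality=0.0))  # → None
```

Under these assumed values, eight full-quality wins push the owner past the threshold, while zero-quality wins leave every affinity where it started, so a module cannot take ownership through wins that carry no skill.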

🔬 Findings That Refuted Our Own Predictions

Ten predictions were pre-registered before the first run; four were refuted and are reported as findings (full outcome table in main.pdf, Appendix B):

#	We predicted	The data said
1	Smarter usage heuristics retain more	No — the class decides; the heuristic doesn't
2	NicheMem is robust to cluster noise	No — label noise defeats every label-keyed store, pinned NicheMem included; robustness lives in per-task tournament selection at 12× cost
3	A staleness trigger rescues within-family drift (as in GAUSE)	No-op — memory stores self-repair through distillation; the weight-space remedy doesn't transfer
4	Surplus modules idle harmlessly (as GAUSE proves for learners)	No — competence is acquired through winning, so surplus competitors duplicate acquisition and waste budget under tight B

Plus two documented self-inflicted failures kept in the record because each is a design theorem:

The budget side door: our first "budget tracks need" allocator silently starved dormant niches — a reward-chaser smuggled in through budget rebalancing (cost 15 retention points before diagnosis). Repair: skill floors are structural.
Retrieval–refresh coupling: when a store cell exceeds the retrieval top-k, un-retrieved skills age invisibly and LRU evicts them during active periods — it dents even the oracle.
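The repair for the budget side door can be pictured with a small allocator. This is a hedged sketch: `allocate`, the module names, and the token counts are hypothetical, and the proportional split of the flexible remainder is an assumption rather than the repository's rule. The only property it is meant to show is that skill footprints are reserved before any need-based rebalancing happens.

```python
def allocate(budget, footprints, needs):
    """Split a token budget: skill footprints are reserved first, the rest follows need."""
    reserved = sum(footprints.values())
    if reserved > budget:
        # Assumption: treat an uncoverable floor as an error instead of squeezing it.
        raise ValueError("budget cannot cover the skill floors")
    flexible = budget - reserved
    total_need = sum(needs.values())
    shares = {}
    for module, floor in footprints.items():
        extra = flexible * needs[module] / total_need if total_need else 0.0
        shares[module] = floor + extra
    return shares

footprints = {"audit": 300, "email": 200, "deploy": 250}  # hypothetical token counts
needs = {"audit": 0, "email": 3, "deploy": 1}             # "audit" is dormant
shares = allocate(1550, footprints, needs)
print(shares)  # the dormant "audit" module keeps exactly its 300-token footprint
```

A need-only allocator would hand the dormant module nothing, which is the side door described above; reserving the floor first means only working memory flexes.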

And the drift inversion: at full rule replacement, the most retentive arm becomes the worst (protected wrongness crowds working memory) — retention's value is bounded by within-family stability, crossover ≈ ⅔ replacement.

📚 The Paper Suite

Document	Pages	Role
`paper/main.tex`	17	The research paper (NeurIPS style): propositions + proofs, all experiments, pre-registration outcomes
`paper/NicheMem Explainer.tex`	19	Architecture & mechanism companion: intuition, worked demos, applications map, deployment recipe, when not to use it
`paper/Deep Dive.tex`	20	Mathematical foundations: derivations, worked numeric examples verified from code, verbatim load-bearing code, failure post-mortems, auto-generated reference tables
`paper/Next Steps and Review.tex`	4	What's deferred (LLM tiers), cost estimates, and an honest self-review

Repository map: src/cyclebench/ (simulator) • experiments/ (suite, theory checks, figures) • results/ (canonical JSONs) • PLAN.md, PREREGISTRATION.md, THEORY.md (process record).

🛣️ Roadmap — the LLM Tier

The decisive missing experiment is cheap: a Tier-1-LLM mini gate (~$60–300 self-hosted; 4 policies × 5 seeds on a small frozen model with real procedurally-generated tool quirks) answers whether the class dissociation survives a real model. Then: learned family inference in the loop (<$30), Tier-2 (ALFWorld/GAIA-style pools under cyclic schedules, $500–2,500), and the Tier-3 longitudinal "runbook survives 800 tasks of dormancy" hero run. Full plan with gates and costs: paper/Next Steps and Review.tex.

Pre-registered prediction for that tier: usage-driven stores will lose dormant-family competence at rates governed by their reclaim channel; structurally-owned stores will not; and the gap will shrink as zero-shot competence rises.

🧬 Relation to GAUSE

Both projects instantiate one principle in different substrates — capacity protection must not depend on the signal whose absence defines the thing being protected:

	GAUSE (learners)	NicheMem (agent memory)
Capacity	learner calibration (weights)	token budget (context)
Regimes	market/environment states	task families
Forgetting cost	post-reactivation error	post-reactivation task failure
What did not transfer	—	drift trigger (stores self-repair) • surplus harmlessness (acquisition coupling) • single-allocator assumption (budget side door)

📄 Citation

@techreport{li2026nichemem,
  title  = {Idle but Not Forgotten: Reward-Independent Memory Ownership
            for Long-Horizon Agents --- A Mechanism-Level Study with CycleBench-Sim},
  author = {Li, Yuhao},
  institution = {University of Pennsylvania},
  year   = {2026},
  note   = {Mechanism-level evidence; LLM-tier predictions pre-registered}
}

Built on the GAUSE reward-independence principle • All predictions pre-registered • All refutations reported
