epmem: opt-in kernel-mediated copy of long-present constant WMEs into smem, with episode eviction (experimental) #578

Closed
kimjune01 wants to merge 3 commits into SoarGroup:development from kimjune01:epmem-consolidation
kimjune01 commented Mar 25, 2026

What this PR does

Adds an opt-in mechanism that periodically scans episodic memory for constant WMEs continuously present for ≥ N episodes and inserts them into semantic memory via the kernel's CLI_add path, outside the agent's production system. Optionally evicts old episode rows after each run.

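The feature is off by default: consolidate defaults to off and consolidate-evict-age defaults to 0 (eviction disabled). consolidate-interval and consolidate-threshold have numeric defaults but take effect only once the scan is enabled. The scan (including eviction) only runs when consolidate=on; setting consolidate-evict-age alone has no effect.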

This is kernel-mediated semantic memory insertion

The central architectural choice in this PR is that a kernel-side scan, not the agent's own productions, decides what gets added to semantic memory. The scan is driven by an epmem persistence heuristic (continuous presence for N episodes). This is not a claim about how Soar should derive semantic memory, or about which regularities "deserve" promotion. It is a mechanism that exposes one heuristic and lets users decide when and whether to invoke it.

Any agent that queries smem after a run will observe entries that were not there before. Any agent that queries episodes older than the eviction age will get different retrieval results. These are deliberate semantic changes, not transparent optimizations.

Open for discussion on whether such a mechanism belongs in Soar and under what conditions. Not a merge proposal.

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| consolidate | on/off | off | Enable the periodic scan |
| consolidate-interval | integer | 100 | Episodes between scans |
| consolidate-threshold | integer | 10 | Minimum continuous presence in episodes before a WME qualifies |
| consolidate-evict-age | integer | 0 (off) | Minimum episode age before eviction eligible; 0 disables eviction |
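
The gating described above can be sketched as follows. This is a minimal illustration, not the PR's code: the `ConsolidateParams` class and both function names are hypothetical, and the modulo trigger is an assumption (the commit notes mention a last-consolidation stat, so the real hook may count episodes since the last run instead).

```python
from dataclasses import dataclass


@dataclass
class ConsolidateParams:
    """Hypothetical container mirroring the four epmem parameters."""
    consolidate: bool = False   # consolidate (on/off)
    interval: int = 100         # consolidate-interval
    threshold: int = 10         # consolidate-threshold
    evict_age: int = 0          # consolidate-evict-age; 0 disables eviction


def scan_due(p: ConsolidateParams, episode_id: int) -> bool:
    """True when a scan should fire at this episode (assumed modulo trigger)."""
    return p.consolidate and p.interval > 0 and episode_id % p.interval == 0


def eviction_due(p: ConsolidateParams, episode_id: int) -> bool:
    """Eviction is part of the scan, so it also requires consolidate=on."""
    return scan_due(p, episode_id) and p.evict_age > 0
```

With the defaults nothing fires, and `ConsolidateParams(evict_age=50)` on its own still never evicts, because eviction is reached only through the scan.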

Algorithm

| Step | Operation |
| --- | --- |
| Scan | Constant WMEs present in `epmem_wmes_constant_now` with `start_episode_id` ≤ current - threshold |
| Dedup | Exclude `wc_id`s already recorded in the `epmem_consolidated` tracking table |
| Group | Cluster qualifying WMEs by epmem parent node |
| Sanitize | Reverse-hash attribute/value strings, skip empty or unquotable strings |
| Write | Build a `(<c_n> ^attr value ...)` string per parent, pass to `CLI_add` |
| Mark | Insert consolidated `wc_id`s into `epmem_consolidated` to prevent reprocessing |
| Evict | If consolidate-evict-age > 0, delete point and range rows and episode rows older than current - evict-age; `_now` intervals are untouched |
| Trigger | Every consolidate-interval episodes |
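
The Group, Sanitize, and Write steps can be sketched as below. This is an illustrative reconstruction under stated assumptions: `sanitize` and `build_smem_strings` are hypothetical names, and the quoting rule (pipe-quote anything outside a small safe character set, reject strings that contain a pipe) is a guess at what "unquotable" means. The `n` in `<c_n>` is also read as a per-parent counter, which the PR does not spell out. The sketch only builds the strings and does not call `CLI_add`.

```python
from collections import defaultdict

SAFE_EXTRA = set("-_.")  # assumed set of characters allowed without quoting


def sanitize(s):
    """Return s, pipe-quoted if needed, or None if empty or unquotable."""
    if not s:
        return None
    if all(ch.isalnum() or ch in SAFE_EXTRA for ch in s):
        return s
    if "|" in s:
        return None  # this sketch treats embedded pipes as unquotable
    return "|" + s + "|"


def build_smem_strings(rows):
    """rows: iterable of (parent_id, attr, value) with already reverse-hashed strings."""
    groups = defaultdict(list)
    for parent, attr, value in rows:
        a, v = sanitize(attr), sanitize(value)
        if a is None or v is None:
            continue  # skip the WME, keep the rest of the parent's group
        groups[parent].append("^%s %s" % (a, v))
    # One string per epmem parent node, numbered in parent-id order.
    return ["(<c_%d> %s)" % (n, " ".join(groups[parent]))
            for n, parent in enumerate(sorted(groups))]
```

For example, rows `(1, "color", "red")` and `(1, "name", "big box")` under the same parent yield the single string `(<c_0> ^color red ^name |big box|)`.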

The scan uses a single SQL statement joining epmem_wmes_constant, epmem_wmes_constant_now, and the dedup table. For WMEs currently open in _now, "continuous presence for N episodes" is detectable by comparing start_episode_id against current - threshold, avoiding aggregation over episode history. This relies on current epmem interval semantics and only considers WMEs in the _now set at scan time. It does not attempt to detect WMEs that were interrupted and re-added during the window, or to deduplicate WMEs that are semantically or structurally equivalent across different wc_ids.
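
A minimal sketch of such a scan against an in-memory SQLite database follows. The table names, `wc_id`, and `start_episode_id` come from the description above; the remaining columns (`parent_n_id`, `attribute_s_id`, `value_s_id`), the exact join shape, and the helper names are assumptions made for illustration, not the PR's actual prepared statement.

```python
import sqlite3

# Assumed shape of the single-statement scan: open intervals old enough to
# qualify, minus wc_ids already recorded in the dedup table.
SCAN_SQL = """
SELECT c.wc_id, c.parent_n_id, c.attribute_s_id, c.value_s_id
FROM epmem_wmes_constant AS c
JOIN epmem_wmes_constant_now AS n ON n.wc_id = c.wc_id
LEFT JOIN epmem_consolidated AS d ON d.wc_id = c.wc_id
WHERE n.start_episode_id <= ? AND d.wc_id IS NULL
ORDER BY c.parent_n_id, c.wc_id
"""


def make_db():
    """Create a toy schema (column layout is an assumption)."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE epmem_wmes_constant (
            wc_id INTEGER PRIMARY KEY, parent_n_id INTEGER,
            attribute_s_id INTEGER, value_s_id INTEGER);
        CREATE TABLE epmem_wmes_constant_now (
            wc_id INTEGER, start_episode_id INTEGER);
        CREATE TABLE epmem_consolidated (wc_id INTEGER PRIMARY KEY);
    """)
    return db


def scan(db, current_episode, threshold):
    """Return qualifying, not-yet-consolidated constant WMEs."""
    cutoff = current_episode - threshold
    return db.execute(SCAN_SQL, (cutoff,)).fetchall()


def mark(db, rows):
    """Record consolidated wc_ids so later scans skip them."""
    db.executemany(
        "INSERT OR IGNORE INTO epmem_consolidated (wc_id) VALUES (?)",
        [(r[0],) for r in rows])
```

Calling `scan`, then `mark` on its result, then `scan` again returns an empty list the second time, which is the per-`wc_id` dedup behavior and nothing stronger: a structurally identical WME stored under a different `wc_id` would still be picked up.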

Known limitations

  • Constant WMEs only. Identifier (graph-structured) WMEs are not copied; resulting smem LTIs are therefore flat.
  • Newly synthesized structures, not transferred objects. Entries grouped by epmem parent-node ID become newly synthesized LTIs with fresh identifiers. They do not carry identity continuity from any epmem episode or prior smem LTI, and may not correspond to any cognitively meaningful semantic object.
  • Per-wc_id dedup only. The epmem_consolidated table prevents reprocessing of the same wc_id but does not deduplicate semantically or structurally equivalent WMEs across different wc_ids. Repeated runs can produce multiple smem LTIs covering overlapping content.
  • Asymmetric eviction. Evicting from epmem does not retract previously created smem entries. Repeated runs accumulate smem content monotonically; cleaning up is left to the user.
  • No cross-tier invalidation. If underlying epmem WMEs change or disappear, the smem entries created from them are not updated or removed.
  • _now intervals preserved during eviction. Active WME intervals are never split; only completed ranges, points, and episode rows older than the eviction age are removed.
  • Continuous-presence is a narrow heuristic. Frequent-but-interrupted patterns are not caught. Frequency, co-occurrence, and causal structure are not detected.
  • No benchmarks of agent-level impact. This PR does not claim that enabling the mechanism improves agent behavior on any task.
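
The narrowness of the continuous-presence heuristic can be seen in a small worked example. The sketch assumes that a WME removed and later re-added gets a fresh open interval whose `start_episode_id` is the re-add episode; both helper names are hypothetical.

```python
def qualifies(start_episode_id, current_episode, threshold):
    """The scan's test: the open interval began at least `threshold` episodes ago."""
    return start_episode_id <= current_episode - threshold


def open_interval_start(presence):
    """presence[i] is True if the WME was present in episode i + 1.

    Returns the first episode of the interval still open at the last
    episode, or None if the WME is absent at the last episode.
    """
    if not presence or not presence[-1]:
        return None
    start = len(presence)
    # Walk backwards while the preceding episode also contains the WME.
    while start > 1 and presence[start - 2]:
        start -= 1
    return start


steady = [True] * 15                        # present in episodes 1..15
flicker = [True] * 8 + [False] + [True] * 6  # absent only in episode 9

steady_ok = qualifies(open_interval_start(steady), 15, 10)    # interval starts at 1
flicker_ok = qualifies(open_interval_start(flicker), 15, 10)  # interval starts at 10
```

Here `steady_ok` is true and `flicker_ok` is false, even though the second WME was present in 14 of 15 episodes: a single gap resets the interval start, so the threshold is measured from episode 10 rather than episode 1.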

Changes

| File | Change |
| --- | --- |
| episodic_memory.h | 4 params, 1 stat, 7 prepared statements, function declaration |
| episodic_memory.cpp | Param/stat init, SQL statements, schema table + drop, epmem_consolidate() with eviction (~200 lines), hook in epmem_go() |
| EpMemFunctionalTests.hpp | 3 new test declarations |
| EpMemFunctionalTests.cpp | 3 test implementations |
| 2 new .soar test agents | Consolidation and eviction test agents |

Tests

46/46 epmem tests pass (43 existing + 3 new), zero regressions:

  • testConsolidation — stable WMEs written to smem after consolidation fires
  • testConsolidationOff — smem stays empty when feature is disabled
  • testConsolidationEviction — old episodes evicted, recent episodes preserved, smem entries intact

Background references

Periodically scan episodic memory for stable WME structures and write
them to semantic memory as new LTIs.  This implements the compose+test
framework (Casteigts et al., 2019) for automatic episodic-to-semantic
knowledge transfer — the operation Soar's long-term declarative stores
have been missing.

Algorithm:
  compose — union of constant WMEs currently active in epmem
  test    — continuous presence >= consolidate-threshold episodes
  write   — create smem LTI with qualifying augmentations via CLI_add

New parameters (all under epmem):
  consolidate            on/off  (default off)
  consolidate-interval   integer (default 100) — episodes between runs
  consolidate-threshold  integer (default 10)  — min episode persistence

Deduplication via epmem_consolidated tracking table prevents repeated
writes across consolidation runs.  Table is dropped on reinit alongside
other epmem graph tables.

Off by default — zero behavior change until explicitly enabled.

Limitations (deferred to follow-up):
  - Only consolidates constant-valued WMEs, not identifier edges
  - No back-invalidation across the WM/smem tier boundary
  - last-consolidation stat does not persist across agent reinit

Motivation: Derbinsky & Laird (2013) proved forgetting is essential to
Soar's scaling but only built it for working and procedural memory.
Episodic and semantic memory have no eviction and no capacity bound.
This patch addresses the first half: automatic semantic learning from
episodic experience.  With semantic entries derived from episodes,
episodic eviction becomes safe (merged episodes leave no reconstruction
debt), and R4's forgettable WME scope expands automatically.

Reference:
  Casteigts et al. (2019), "Computing Parameters of Sequence-Based
    Dynamic Graphs," Theory of Computing Systems.
  Derbinsky & Laird (2013), "Effective and efficient forgetting of
    learned knowledge in Soar's working and procedural memories,"
    Cognitive Systems Research.
  https://june.kim/prescription-soar (full prescription)

After consolidation writes stable WMEs to smem, old episodes become
redundant.  Delete point entries and episode rows older than
consolidate-evict-age episodes.  This is safe: the consolidated
knowledge is in smem, so there is no reconstruction debt.

New parameter:
  consolidate-evict-age  integer (default 0 = off) — min age before
  an episode is eligible for eviction

Range and _now interval entries are preserved (they span multiple
episodes).  Only point entries and episode rows are removed.

Reference: Derbinsky & Laird (2013), §5 — "forgotten working-memory
knowledge may be recovered via deliberate reconstruction from semantic
memory."  Consolidation creates the semantic entries; eviction removes
the source episodes that are no longer needed for reconstruction.

- Delete _range entries whose intervals end before the eviction cutoff
  (previously only _point entries were evicted, leaving dead weight)
- Wrap all eviction DELETEs in BEGIN/COMMIT when lazy_commit is off
  for atomicity (when lazy_commit is on, already inside a transaction)

Retrieval of evicted episodes is already safe: epmem_install_memory
checks valid_episode and returns ^retrieved no-memory.
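
The eviction step described in these commit notes can be sketched with SQLite as below. The table and column names (`epmem_wmes_constant_point`, `epmem_wmes_constant_range`, `epmem_episodes`, `episode_id`, `end_episode_id`) are assumptions about the epmem schema, the `evict` and `make_eviction_db` helpers are hypothetical, and the sketch always wraps the deletes in one transaction rather than modeling the lazy_commit distinction.

```python
import sqlite3


def make_eviction_db():
    """Toy schema for the sketch (column layout is an assumption)."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE epmem_episodes (episode_id INTEGER PRIMARY KEY);
        CREATE TABLE epmem_wmes_constant_point (wc_id INTEGER, episode_id INTEGER);
        CREATE TABLE epmem_wmes_constant_range (
            wc_id INTEGER, start_episode_id INTEGER, end_episode_id INTEGER);
        CREATE TABLE epmem_wmes_constant_now (wc_id INTEGER, start_episode_id INTEGER);
    """)
    return db


def evict(db, current_episode, evict_age):
    """Delete completed history older than the cutoff; return episodes removed."""
    if evict_age <= 0:
        return 0  # consolidate-evict-age = 0 disables eviction
    cutoff = current_episode - evict_age
    with db:  # one transaction: commit on success, roll back on error
        db.execute(
            "DELETE FROM epmem_wmes_constant_point WHERE episode_id < ?", (cutoff,))
        db.execute(
            "DELETE FROM epmem_wmes_constant_range WHERE end_episode_id < ?", (cutoff,))
        cur = db.execute(
            "DELETE FROM epmem_episodes WHERE episode_id < ?", (cutoff,))
        # epmem_wmes_constant_now is deliberately never touched.
    return cur.rowcount
```

With 100 episodes stored and `evict(db, 100, 50)`, the cutoff is episode 50, so episodes 1 through 49 are removed along with any point rows and completed ranges that end before it, while open `_now` intervals stay in place regardless of when they started.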

scijones (Contributor) commented

Wow, this and the other pull request. I need to review these, but thanks! These look exciting.

scijones self-assigned this Mar 25, 2026

kimjune01 changed the title from "Episodic-to-semantic memory consolidation (experimental)" to "epmem: opt-in scan writing stable WMEs to smem, with episode eviction (experimental)" Apr 9, 2026

kimjune01 changed the title from "epmem: opt-in scan writing stable WMEs to smem, with episode eviction (experimental)" to "epmem: opt-in kernel-mediated copy of long-present constant WMEs into smem, with episode eviction (experimental)" Apr 9, 2026

kimjune01 (Author) commented

Retiring this PR. The assumptions it was built on were wrong:

  1. D&L 2013 extrapolation. The prior diagnosis extrapolated from D&L 2013's WM/procedural forgetting work to claim smem/epmem eviction was needed. D&L 2013 does not make this claim — it treats smem as the stable backup store that enables WM forgetting (R4), not as something needing its own eviction.

  2. epmem→smem pathway. This PR assumed epmem and smem share a compatible graph representation where episodes accumulate into semantic entries. John Laird rejected this in a 2026-04-09 meeting: smem's data structure is completely different from epmem's. Smem is populated from working memory via deliberate agent action, not derived from epmem. Laird 2022 Figure 6 confirms: smem's source of knowledge is "Existence in Working memory."

  3. Scaling motivation. Laird rates Soar's real-time capability "yes, yes" with "millions of items" and diverse knowledge at "tens of millions of rules, facts, and episodes" (§10 items 3, 12). The scaling-is-the-bottleneck framing does not hold.

See the re-intake at june.kim for the corrected SOAP analysis.

kimjune01 closed this Apr 11, 2026