smem: add --sweep-dominated for destructive structural compaction (experimental, changes retrieval behavior) by kimjune01 · Pull Request #580 · SoarGroup/Soar

kimjune01 · 2026-03-27T16:36:39Z

What this PR does

Adds smem --sweep-dominated [<budget>]: a destructive CLI command that deletes LTIs identified by the same structural inclusion predicate used by smem --redundancy-check, up to an optional budget. Also adds a kernel routine delete_ltm() used as the deletion path.

Off by default. Must be invoked explicitly. Deletion is irreversible within the store; this command is intended for manual maintenance workflows, not routine agent execution.

Important: this command changes smem behavior, not just storage

Structural containment under the inclusion predicate does not imply retrieval equivalence. Deleting a "contained" LTI can change retrieval results qualitatively, not just shift ranking:

Base-level activation. If the contained LTI has been retrieved more recently than the container, its activation history can rank it above the container for cues both satisfy. Removing it changes which LTI is returned by such cues.
Spreading activation. Edges into and out of the contained LTI contribute to the global spread distribution. Removing it changes activation propagation for any query that touches its neighborhood, even when the query does not match it.
Identity, aliases, and cue interactions. Downstream productions may match on specific LTI identifiers, aliases, or graph neighborhoods. Removing an LTI breaks these even when a structural container exists.

Strict behavior preservation would at least require merging activation history and spreading contributions from the removed LTI into its container, and likely additional semantic reconciliation beyond that. This PR does not attempt any of it. Retrieval results and activation dynamics may change in ways this command does not predict.

Algorithm

Mark. Reuses the structural inclusion predicate to identify contained LTIs.
Filter. Attempts to exclude LTIs currently referenced as WME ids in working memory via smem_in_wmem. This filter is best-effort only. It does not track LTIs referenced as WME values, so a leaf LTI present in working memory only as a value may not be protected.
Sweep. Calls delete_ltm() for each contained entry in reverse ID order, up to the optional budget.
Report. Prints what was deleted, what the filter excluded, what survived.
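The four phases above can be sketched as a minimal, hypothetical C++ model. This is not the kernel code. `LtiId`, `SweepReport`, and `sweep_dominated` are illustrative names. The mark-phase output is passed in as a plain list, and the `in_wmem` set stands in for the best-effort `smem_in_wmem` filter. The budget is assumed to count deletions only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <set>
#include <vector>

using LtiId = std::uint64_t;  // hypothetical stand-in for the kernel's LTI id type

struct SweepReport {
    std::vector<LtiId> deleted;      // handed to the deletion routine
    std::vector<LtiId> excluded;     // skipped by the working-memory filter
    std::vector<LtiId> over_budget;  // contained, but left in place once the budget ran out
};

// Sweep over the mark-phase output `contained`.
// `in_wmem` models the best-effort filter: like smem_in_wmem it only knows
// LTIs that appear as WME ids, so value-only references are NOT protected.
// `budget` caps the number of deletions; std::nullopt means no cap.
SweepReport sweep_dominated(std::vector<LtiId> contained,
                            const std::set<LtiId>& in_wmem,
                            std::optional<std::size_t> budget,
                            const std::function<void(LtiId)>& delete_ltm) {
    assert(delete_ltm);  // the deletion path must be provided
    SweepReport report;
    // Reverse ID order: highest ids are swept first.
    std::sort(contained.begin(), contained.end(), std::greater<LtiId>());
    for (LtiId id : contained) {
        if (in_wmem.count(id) != 0) {
            report.excluded.push_back(id);
        } else if (budget && report.deleted.size() >= *budget) {
            report.over_budget.push_back(id);
        } else {
            delete_ltm(id);
            report.deleted.push_back(id);
        }
    }
    return report;
}
```

Passing the deletion routine as a callback keeps the sketch independent of any store. Because the filter only sees the ids it is given, it shares the same blind spot for LTIs referenced as WME values.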

`delete_ltm(pLTI_ID)`: centralized deletion bookkeeping

Consolidates the bookkeeping currently needed by this command into one routine. The steps below are what is currently performed; this list is not a claim of completeness against the entire smem schema.

Disconnects outgoing edges via existing disconnect_ltm.
Updates inbound parent bookkeeping: child counts, LTI-child counts, attribute frequencies (NULL-safe for mixed constant/LTI attributes).
Invalidates spreading activation trajectories for affected parents (guarded on spreading-enabled).
Renormalizes edge weights on surviving parents.
Cleans the auxiliary tables currently known to reference the deleted ID: smem_prohibited, smem_trajectories, smem_likelihoods, smem_trajectory_num, the spread tables used by the current spreading implementation, activation history, aliases, and fake activations.
Deletes from smem_lti and updates lti node count statistics.
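The parent-side steps are easier to follow in a toy in-memory model. Everything in it is hypothetical. The `ToyStore`, `LtiRow`, and `Edge` types replace the SQLite tables. The uniform `1 / lti_child_count` edge weight is an assumption, not Soar's actual spreading weight scheme. Auxiliary-table cleanup and trajectory invalidation are not modeled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <set>
#include <string>
#include <vector>

using LtiId = std::uint64_t;  // hypothetical

// One augmentation row. An empty value_lti models a NULL value_lti_id,
// i.e. a constant-valued augmentation such as ^name alice.
struct Edge {
    std::string attr;
    std::optional<LtiId> value_lti;
};

struct LtiRow {
    std::vector<Edge> edges;
    int child_count = 0;            // all augmentations
    int lti_child_count = 0;        // augmentations whose value is an LTI
    double lti_child_weight = 0.0;  // ASSUMED uniform spreading weight per LTI child
};

struct ToyStore {
    std::map<LtiId, LtiRow> ltis;
    std::map<std::string, int> attr_frequency;  // number of LTIs using each attribute
};

void delete_ltm(ToyStore& s, LtiId dead) {
    auto victim = s.ltis.find(dead);
    assert(victim != s.ltis.end());

    // Outgoing side (disconnect_ltm in the kernel): the dead LTI stops
    // counting toward each distinct attribute it uses.
    std::set<std::string> own_attrs;
    for (const Edge& e : victim->second.edges) own_attrs.insert(e.attr);
    for (const std::string& a : own_attrs) --s.attr_frequency[a];

    // Inbound side: repair every surviving parent that points at `dead`.
    for (auto& [pid, row] : s.ltis) {
        if (pid == dead) continue;
        std::set<std::string> touched;
        int removed = 0;
        for (const Edge& e : row.edges) {
            if (e.value_lti == dead) {
                touched.insert(e.attr);
                ++removed;
            }
        }
        if (removed == 0) continue;

        row.edges.erase(std::remove_if(row.edges.begin(), row.edges.end(),
                                       [dead](const Edge& e) { return e.value_lti == dead; }),
                        row.edges.end());
        row.child_count -= removed;
        row.lti_child_count -= removed;

        // NULL-safe check: a constant-valued edge with the same attribute
        // still counts as a use, so the frequency must not drop.
        for (const std::string& a : touched) {
            bool still_used = std::any_of(row.edges.begin(), row.edges.end(),
                                          [&a](const Edge& e) { return e.attr == a; });
            if (!still_used) --s.attr_frequency[a];
        }

        // Renormalize over the surviving LTI children (uniform weights assumed).
        row.lti_child_weight = row.lti_child_count > 0 ? 1.0 / row.lti_child_count : 0.0;
    }

    s.ltis.erase(victim);
}
```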

If other auxiliary tables or invariants touch the deleted ID and are not listed above, they are not being maintained by this routine. A maintainer review of the full schema is a prerequisite for using this command in production.

Known limitations

Not retrieval-equivalent in the presence of BLA or spreading. Covered above.
Working-memory reference filter is incomplete. smem_in_wmem tracks LTIs as WME ids only, not as values. A leaf LTI present in working memory only as a WME value can slip through the filter and be deleted while still referenced. This is best-effort, not a safety guarantee.
Spreading activation test coverage is partial. Trajectory invalidation and weight renormalization are implemented, but they run only when spreading is enabled, and no spreading-enabled regression run is included here.
Full smem regression suite not run. Only the targeted tests listed below are reported.
Deletion is irreversible within the store. No undo path, no transactional rollback after the command returns.
No benchmarks of agent-level impact. This PR does not claim that deleting structurally contained entries improves any measurable agent behavior.

Test plan

Build succeeds (CMake + make, macOS)
testSweepDominated: flat entries, verify deletion and survival
testSweepDominatedWithInboundRefs: parent with LTI child, child contained and deleted, parent retains constant attributes
Post-sweep redundancy check confirms no remaining contained entries
Test with budget argument
Test with spreading activation enabled
Run full smem test suite for regressions

Periodically scan episodic memory for stable WME structures and write them to semantic memory as new LTIs. This implements the compose+test framework (Casteigts et al., 2019) for automatic episodic-to-semantic knowledge transfer, the operation Soar's long-term declarative stores have been missing.

Algorithm:
- compose: union of constant WMEs currently active in epmem
- test: continuous presence >= consolidate-threshold episodes
- write: create an smem LTI with the qualifying augmentations via CLI_add

New parameters (all under epmem):
- consolidate on/off (default off)
- consolidate-interval integer (default 100): episodes between runs
- consolidate-threshold integer (default 10): minimum episode persistence

Deduplication via the epmem_consolidated tracking table prevents repeated writes across consolidation runs. The table is dropped on reinit alongside the other epmem graph tables.

Off by default: zero behavior change until explicitly enabled.

Limitations (deferred to follow-up):
- Only consolidates constant-valued WMEs, not identifier edges
- No back-invalidation across the WM/smem tier boundary
- last-consolidation stat does not persist across agent reinit

Motivation: Derbinsky & Laird (2013) proved forgetting is essential to Soar's scaling but only built it for working and procedural memory. Episodic and semantic memory have no eviction and no capacity bound. This patch addresses the first half: automatic semantic learning from episodic experience. With semantic entries derived from episodes, episodic eviction becomes safe (merged episodes leave no reconstruction debt), and R4's forgettable WME scope expands automatically.

Reference:
- Casteigts et al. (2019), "Computing Parameters of Sequence-Based Dynamic Graphs," Theory of Computing Systems.
- Derbinsky & Laird (2013), "Effective and efficient forgetting of learned knowledge in Soar's working and procedural memories," Cognitive Systems Research.
- https://june.kim/prescription-soar (full prescription)
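Here is a hedged sketch of one compose/test/write pass. It assumes each active constant WME is known with the episode where its current continuous presence began, as a stand-in for an epmem _now interval. It also assumes persistence is counted inclusively over [start, now]. `Wme`, `consolidate`, and `consolidation_due` are hypothetical names. The returned map groups qualifying augmentations by identifier, one new LTI each; the real code would call CLI_add at that point. The `consolidated` set plays the role of the epmem_consolidated table.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// (identifier, attribute, constant value): only constant-valued WMEs are
// consolidated, matching the stated limitation.
using Wme = std::tuple<std::string, std::string, std::string>;
using Augmentations = std::vector<std::pair<std::string, std::string>>;

// consolidate-interval gate: run every `interval` episodes when enabled.
bool consolidation_due(bool enabled, long episode, long interval) {
    return enabled && interval > 0 && episode % interval == 0;
}

// compose: `active` holds every currently active constant WME, mapped to the
//          episode where its current continuous presence began.
// test:    keep WMEs present for >= threshold episodes, counting [start, now].
// write:   group survivors by identifier, one new LTI per identifier.
// `consolidated` plays the role of the epmem_consolidated dedup table.
std::map<std::string, Augmentations> consolidate(const std::map<Wme, long>& active,
                                                 long now, long threshold,
                                                 std::set<Wme>& consolidated) {
    assert(threshold > 0);
    std::map<std::string, Augmentations> to_write;
    for (const auto& [wme, start] : active) {
        const long persisted = now - start + 1;
        if (persisted < threshold) continue;             // test fails
        if (!consolidated.insert(wme).second) continue;  // written in an earlier run
        const auto& [id, attr, value] = wme;
        to_write[id].emplace_back(attr, value);
    }
    return to_write;
}
```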

After consolidation writes stable WMEs to smem, old episodes become redundant. Delete point entries and episode rows older than consolidate-evict-age episodes. This is safe: the consolidated knowledge is in smem, so there is no reconstruction debt.

New parameter:
- consolidate-evict-age integer (default 0 = off): minimum age before an episode is eligible for eviction

Range and _now interval entries are preserved (they span multiple episodes). Only point entries and episode rows are removed.

Reference: Derbinsky & Laird (2013), §5: "forgotten working-memory knowledge may be recovered via deliberate reconstruction from semantic memory." Consolidation creates the semantic entries; eviction removes the source episodes that are no longer needed for reconstruction.

- Delete _range entries whose intervals end before the eviction cutoff (previously only _point entries were evicted, leaving dead weight)
- Wrap all eviction DELETEs in BEGIN/COMMIT when lazy_commit is off, for atomicity (when lazy_commit is on, they already run inside a transaction)

Retrieval of evicted episodes is already safe: epmem_install_memory checks valid_episode and returns ^retrieved no-memory.
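The retention rules across these two commits can be modeled in memory. This is an illustrative sketch, not epmem's schema. `EpisodeStore` and `evict` are hypothetical. The cutoff is assumed to be `current_episode - consolidate-evict-age` with a strict `<` comparison. The BEGIN/COMMIT wrapping is not modeled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>
#include <set>
#include <utility>
#include <vector>

// Hypothetical in-memory stand-ins for the epmem tables involved.
struct EpisodeStore {
    std::set<long> episodes;                    // episode rows
    std::multimap<long, int> points;            // episode -> point entry payload
    std::vector<std::pair<long, long>> ranges;  // closed [start, end] range entries
    std::vector<long> now_starts;               // open _now intervals: never evicted
};

// Returns how many rows were removed. evict_age == 0 means eviction is off.
// ASSUMPTION: rows that end strictly before cutoff = current - evict_age go.
std::size_t evict(EpisodeStore& s, long current, long evict_age) {
    if (evict_age <= 0) return 0;
    const long cutoff = current - evict_age;
    std::size_t removed = 0;

    // Episode rows and point entries older than the cutoff.
    auto ep_end = s.episodes.lower_bound(cutoff);
    removed += static_cast<std::size_t>(std::distance(s.episodes.begin(), ep_end));
    s.episodes.erase(s.episodes.begin(), ep_end);

    auto pt_end = s.points.lower_bound(cutoff);
    removed += static_cast<std::size_t>(std::distance(s.points.begin(), pt_end));
    s.points.erase(s.points.begin(), pt_end);

    // Range entries whose interval fully ended before the cutoff.
    const std::size_t before = s.ranges.size();
    s.ranges.erase(std::remove_if(s.ranges.begin(), s.ranges.end(),
                                  [cutoff](const std::pair<long, long>& r) { return r.second < cutoff; }),
                   s.ranges.end());
    removed += before - s.ranges.size();

    // _now intervals are still open, so they are left alone.
    return removed;
}
```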

Implements Kilpeläinen-Mannila tree inclusion to detect which LTI entries are structurally dominated by others. Detection only — no eviction. Works at the raw hash level via web_expand to avoid Symbol allocation overhead.

…rror routing

Codex review found four issues:
1. Greedy child matching could produce false negatives when a first-fit choice blocked later required matches. Replaced with a backtracking injective matcher.
2. Cycle detection conflated "currently exploring" with "proven to include". Split into separate active_pairs (recursion stack) and memo (proven results). Cycles now conservatively return false.
3. CLI_redundancy_check returned void, and DoSMem always returned true. Changed to bool with SetError routing on failure.
4. help smem did not list --redundancy-check. Added.


Round 2 Codex review found two remaining issues:
1. Per-attribute a_used vectors allowed two distinct B nodes to map to the same A node via different attributes. Fixed by threading a global b_to_a map (B node → A node assignment) through all recursion. Backtracking undoes assignments on failure.
2. The active-pair cycle check returned false (pessimistic), which missed valid cyclic equivalences like @1 ^next @1 vs @2 ^next @2. Changed to coinductive (optimistic): revisiting an active pair returns true. If the assumption is wrong, the non-cyclic proof obligations will fail.

Round 3 Codex review found two issues:
1. Failed speculative branches leaked descendant b_to_a bindings. Fixed by snapshotting b_to_a before each branch and restoring it on failure. Also pre-bind b_child before recursion so descendants see the intended assignment.
2. Memoization keyed by (lti_a, lti_b) was unsound because results depend on the current b_to_a context. Dropped the memo entirely: smem entries are shallow (depth 1-2), so re-evaluation is cheap.


Round 4 Codex review found two issues:
1. The root of B was not pinned to the root of A in the global assignment. Counterexample: @b ^next @b vs @A ^next @A1, @A1 ^next @A1; B's root could map to A1 instead of A. Fixed by seeding b_to_a[lti_b] = lti_a before recursion.
2. smem --redundancy-check was missing from the runtime help screen in smem_settings.cpp. Added.
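Taken together, rounds 1 through 4 describe a backtracking matcher with these properties:
- a global, injective `b_to_a` assignment
- snapshot/restore on failed branches
- coinductive handling of active pairs
- a pinned root

The following is a minimal C++ sketch of that design, written from the commit messages rather than from the PR's code. `Aug`, `Graph`, and `InclusionChecker` are hypothetical, and constant values are compared as plain strings.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <set>
#include <string>
#include <utility>
#include <vector>

using LtiId = std::uint64_t;  // hypothetical

// One augmentation: either an LTI-valued edge (lti set) or a constant.
struct Aug {
    std::string attr;
    std::optional<LtiId> lti;
    std::string constant;  // used only when lti is empty
};
using Graph = std::map<LtiId, std::vector<Aug>>;

class InclusionChecker {
public:
    explicit InclusionChecker(const Graph& g) : g_(g) {}

    // True if B's structure embeds in A's under one injective node mapping.
    bool includes(LtiId a, LtiId b) {
        b_to_a_.clear();
        active_.clear();
        b_to_a_[b] = a;  // round 4: pin B's root to A's root
        return match_node(a, b);
    }

private:
    const Graph& g_;
    std::map<LtiId, LtiId> b_to_a_;             // global B node -> A node assignment
    std::set<std::pair<LtiId, LtiId>> active_;  // pairs on the recursion stack

    bool match_node(LtiId a, LtiId b) {
        // Coinductive: revisiting an active pair is assumed to succeed.
        if (!active_.insert({a, b}).second) return true;
        std::vector<bool> a_used(g_.at(a).size(), false);
        const bool ok = match_augs(a, g_.at(b), 0, a_used);
        active_.erase({a, b});
        return ok;
    }

    // Backtracking: map b_augs[i..] onto distinct, unused augmentations of a.
    bool match_augs(LtiId a, const std::vector<Aug>& b_augs, std::size_t i,
                    std::vector<bool>& a_used) {
        if (i == b_augs.size()) return true;
        const Aug& bx = b_augs[i];
        const std::vector<Aug>& a_augs = g_.at(a);
        for (std::size_t j = 0; j < a_augs.size(); ++j) {
            const Aug& ax = a_augs[j];
            if (a_used[j] || ax.attr != bx.attr || ax.lti.has_value() != bx.lti.has_value())
                continue;
            if (!bx.lti) {  // constant: values must match exactly
                if (ax.constant != bx.constant) continue;
                a_used[j] = true;
                if (match_augs(a, b_augs, i + 1, a_used)) return true;
                a_used[j] = false;
                continue;
            }
            const auto snapshot = b_to_a_;  // round 3: undo leaked bindings on failure
            if (bind(*bx.lti, *ax.lti) && match_node(*ax.lti, *bx.lti)) {
                a_used[j] = true;
                if (match_augs(a, b_augs, i + 1, a_used)) return true;
                a_used[j] = false;
            }
            b_to_a_ = snapshot;
        }
        return false;
    }

    // Bind b -> a, keeping the assignment consistent and injective.
    bool bind(LtiId b, LtiId a) {
        auto it = b_to_a_.find(b);
        if (it != b_to_a_.end()) return it->second == a;
        for (const auto& entry : b_to_a_)
            if (entry.second == a) return false;  // a is already the image of another B node
        b_to_a_[b] = a;
        return true;
    }
};
```

With the root pinned, the round 4 counterexample (a B self-loop against a non-self-loop A root) fails at the first `bind`, because B's root is already assigned to A's root.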


…ated LTIs

Mark phase reuses tree inclusion detection from SoarGroup#579. Sweep phase:
1. R4 safety check: skip LTIs currently referenced in working memory
2. Dependency-safe deletion: disconnect_ltm, then delete from all smem tables (augmentations, activation history, aliases, lti)
3. Per-invocation budget via an optional numeric argument

Includes a functional test proving full eviction: add 3 LTIs where @1 is structurally dominated by @2, sweep, and verify @1 is gone and @2/@3 survive. A post-sweep redundancy check confirms no remaining dominated entries.

Replaces raw SQL deletion in sweep with a proper kernel-side routine that composes existing bookkeeping paths:
1. disconnect_ltm for outgoing edges (existing)
2. Inbound edge update: for each parent pointing to this LTI, decrement child counts, LTI-child counts, and attribute frequencies
3. Invalidate spreading activation via invalidate_from_lti
4. Clean all 8+ auxiliary tables: prohibited, trajectories, likelihoods, trajectory_num, current/uncommitted/committed spread, activation history, aliases, fake activations
5. Delete from smem_lti and update node count

Addresses all three correctness blockers from the Codex round 1 review of the sweep PR.


Round 2 Codex review found: remaining_attr_q excluded constant rows because NULL <> ? is NULL in SQLite (not true). A parent with ^name @dead AND ^name alice would incorrectly decrement the attribute frequency after deleting @dead. Fixed with (value_lti_id IS NULL OR value_lti_id<>?).

Added testSweepDominatedWithInboundRefs: parent @1 has ^name alice (constant) and ^friend @2 (LTI child). @2 is dominated by @3 and swept. Verifies @1 retains ^name after the LTI child is removed.
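The NULL pitfall comes from SQL three-valued logic. `NULL <> ?` evaluates to NULL (unknown), and WHERE keeps only rows whose condition is true. The sketch below models this with a small `Tri` enum, with `std::optional` standing in for the nullable `value_lti_id` column. `Row` and `count_remaining` are hypothetical names, not the kernel's `remaining_attr_q`.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// SQL three-valued logic: comparisons involving NULL yield Unknown.
enum class Tri { False, True, Unknown };

Tri sql_ne(const std::optional<long>& lhs, long rhs) {  // lhs <> rhs
    if (!lhs) return Tri::Unknown;
    return *lhs != rhs ? Tri::True : Tri::False;
}

Tri sql_is_null(const std::optional<long>& v) { return v ? Tri::False : Tri::True; }

Tri sql_or(Tri x, Tri y) {
    if (x == Tri::True || y == Tri::True) return Tri::True;
    if (x == Tri::Unknown || y == Tri::Unknown) return Tri::Unknown;
    return Tri::False;
}

// A parent's augmentation row; an empty value_lti_id models NULL (a constant).
struct Row {
    std::string attr;
    std::optional<long> value_lti_id;
};

// Count rows of `attr` that remain once `dead` is gone. WHERE keeps a row
// only when its condition is True; Unknown is filtered out just like False.
int count_remaining(const std::vector<Row>& rows, const std::string& attr, long dead,
                    bool null_safe) {
    int n = 0;
    for (const Row& r : rows) {
        if (r.attr != attr) continue;
        const Tri cond = null_safe
            ? sql_or(sql_is_null(r.value_lti_id), sql_ne(r.value_lti_id, dead))  // fixed predicate
            : sql_ne(r.value_lti_id, dead);                                       // original predicate
        if (cond == Tri::True) ++n;
    }
    return n;
}
```

For a parent with `^name alice` and `^name @7`, where @7 is being deleted, the original predicate counts zero remaining `^name` rows and would wrongly decrement the frequency. The fixed predicate counts one.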

… test

Round 3 Codex review found: (1) surviving parents need trajectory invalidation and edge weight renormalization after child removal; (2) the NULL-safe attribute frequency check (already fixed). Guard spreading invalidation on a spreading-enabled check to avoid hangs when the spreading tables aren't initialized.

Added testSweepDominatedWithInboundRefs: parent with an LTI child, where the child is dominated and swept. Verifies the parent survives with constant attributes intact and no redundancy remains after the sweep.

kimjune01 · 2026-04-11T06:02:34Z

Retiring this PR. The assumptions it was built on were wrong:

D&L 2013 extrapolation. The prior diagnosis extrapolated from D&L 2013's WM/procedural forgetting to claim smem needed destructive compaction. D&L 2013 treats smem as the stable backup enabling WM forgetting (R4), not as a target for eviction or compaction.
Scaling motivation retired. Laird rates real-time capability "yes, yes" with "millions of items" and diverse knowledge at "tens of millions" (§10 items 3, 12).
epmem→smem pathway unsupported. The sweep-dominated mechanism assumed structural compatibility between epmem and smem that does not exist. Laird rejected the unified-graph model (2026-04-09 meeting). Smem is populated from WM, not derived from epmem.

See the re-intake at june.kim for the corrected SOAP analysis.

kimjune01 added 12 commits March 25, 2026 04:52

Add smem --redundancy-check for tree inclusion detection

e32d4d6


kimjune01 marked this pull request as ready for review March 27, 2026 18:52

kimjune01 changed the title ~~Add smem --sweep-dominated: budgeted eviction of dominated LTIs (draft)~~ smem: add --sweep-dominated for structural eviction (experimental, not behavior-preserving) Apr 9, 2026

kimjune01 changed the title ~~smem: add --sweep-dominated for structural eviction (experimental, not behavior-preserving)~~ smem: add --sweep-dominated for destructive structural compaction (experimental, changes retrieval behavior) Apr 9, 2026

kimjune01 closed this Apr 11, 2026

kimjune01 wants to merge 12 commits into SoarGroup:development from kimjune01:smem-sweep-dominated
