feat(salience): length filter + tuned exemplars by cipher813 · Pull Request #150 · cipher813/mnemon

cipher813 · 2026-05-22T00:02:44Z

Summary

Two iterations on build_standing_set.py based on the first real prod-vault scoring output (where bare-snapshot picks were tiny noise like "halt the run" / "propagate"):

--min-content-length filter (default 50 chars). Hard-filter sub-threshold memories BEFORE scoring.
Tuned exemplar lists. Expanded CONSTRAINT_EXEMPLARS from 10 → 21 patterns, TIME_BOUNDED_EXEMPLARS from 10 → 17. Coverage now includes session-handoff shapes (the dominant noise class).

Length filter

scripts/salience_phase0.sh score                              # default: 50-char min
scripts/salience_phase0.sh score --min-content-length 100      # stricter
scripts/salience_phase0.sh score --min-content-length 0        # disable

Hard-filter (not soft penalty) because no amount of constraint-match or contradiction-win saves a 2-word memory from being noise in a standing-tier context. Operator can disable or tune.
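
The hard-filter step can be pictured with a minimal sketch. It is not the code in build_standing_set.py, which this PR description does not show. The `Memory` record, the `filter_by_length` helper, and the choice to measure length on whitespace-stripped content are assumptions made for illustration; only the 50-character default and the "0 disables" behavior come from the description above.

```python
from dataclasses import dataclass

DEFAULT_MIN_CONTENT_LENGTH = 50  # mirrors the --min-content-length default


@dataclass
class Memory:
    """Hypothetical record type; the real vault schema is not shown in this PR."""
    id: str
    content: str


def filter_by_length(memories, min_content_length=DEFAULT_MIN_CONTENT_LENGTH):
    """Split memories into (kept, dropped) before any scoring happens.

    A threshold of 0 (or less) disables the filter, matching
    `--min-content-length 0`.
    """
    if min_content_length <= 0:
        return list(memories), []
    kept, dropped = [], []
    for memory in memories:
        # Assumption: length is measured on whitespace-stripped content.
        if len(memory.content.strip()) >= min_content_length:
            kept.append(memory)
        else:
            dropped.append(memory)
    return kept, dropped


# Toy vault: two of the noise picks quoted above plus one made-up longer memory.
vault = [
    Memory("m1", "halt the run"),
    Memory("m2", "propagate"),
    Memory("m3", "Always verify a fix against the failing test before declaring it done."),
]
kept, dropped = filter_by_length(vault)
print([m.id for m in kept])     # ['m3']
print([m.id for m in dropped])  # ['m1', 'm2']
```

Because dropped memories never reach the scorer, no combination of other signals can promote them, which is the property the hard filter is meant to guarantee.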

Exemplar tuning

CONSTRAINT_EXEMPLARS (10 → 21): organized into 5 groups based on standing rules observed in your recall context this session:

SOTA / institutional (5)
Verification / discipline (5)
Failure / error handling (4)
Process / coordination (3)
Existential constraints (3)

TIME_BOUNDED_EXEMPLARS (10 → 17):

Session handoff shapes (5) — dominant noise pattern
PR / commit references (2)
Tiny single-thought patterns (3) — defense in depth with length filter
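
One way to picture how the two exemplar lists might act on a memory is the sketch below. The actual scoring formula in build_standing_set.py is not shown in this PR, so the function names, the max-similarity rule, and the plain subtraction are all assumptions. The toy 2-D vectors stand in for real embeddings (which would come from vec.npz), and the real scorer also uses other signals such as breadth that are omitted here.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def exemplar_score(vec, constraint_vecs, time_bounded_vecs):
    """Hypothetical score: best constraint match minus best time-bounded match."""
    constraint_match = max(cosine(vec, e) for e in constraint_vecs)
    time_penalty = max(cosine(vec, e) for e in time_bounded_vecs)
    return constraint_match - time_penalty


# Toy exemplar embeddings: axis 0 ~ "standing constraint", axis 1 ~ "session handoff".
constraint_vecs = [[1.0, 0.0], [0.9, 0.1]]
time_bounded_vecs = [[0.0, 1.0]]

constraint_like = [1.0, 0.1]  # close to the constraint exemplars
handoff_like = [0.1, 1.0]     # close to the time-bounded exemplar

score_constraint = exemplar_score(constraint_like, constraint_vecs, time_bounded_vecs)
score_handoff = exemplar_score(handoff_like, constraint_vecs, time_bounded_vecs)
print(score_constraint > score_handoff)  # True
```

Under this reading, adding more exemplars to either list widens the region of embedding space that counts as a match, which is why covering session-handoff shapes explicitly should raise their penalty.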

Composes with PR #149

Once both merge:

scripts/salience_phase0.sh snapshot     # PR #149: pulls vec.npz
scripts/salience_phase0.sh score        # this PR: tuned + length-filtered

Expected outcome: constraint-shape memories rise, session-handoff noise gets explicit time-penalty, sub-50-char single-thoughts are excluded entirely.

Verified

Local 4-doc smoke: length filter drops 1 sub-50-char doc; scoring still discriminates correctly
Full pytest 801/801 passing, harness 13/13

Future iterations (NOT in this PR)

Vault-derived auto-exemplars — sample high-confidence feedback/preference memories as positive exemplars instead of hand-tuning. True prototype-network design. ~1h work.
Per-content-type length thresholds — handoffs often have meta headers ("- Topic: X") that inflate length without adding constraint content.
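
As a rough illustration of the per-content-type idea (explicitly not part of this PR), the sketch below measures length after dropping meta-header lines and then applies a type-specific minimum. The regex, the `MIN_LENGTH_BY_TYPE` values, and both helper names are hypothetical.

```python
import re

# Hypothetical pattern for meta-header lines such as "- Topic: X".
META_HEADER = re.compile(r"^\s*-\s+[A-Za-z][\w ]*:(\s|$)")

# Hypothetical per-type minimums; only the 50-char default comes from this PR.
MIN_LENGTH_BY_TYPE = {"handoff": 100, "default": 50}


def effective_length(content):
    """Character count of the content with meta-header lines excluded."""
    body_lines = [
        line for line in content.splitlines()
        if not META_HEADER.match(line)
    ]
    return len("\n".join(body_lines).strip())


def passes_length_filter(content, content_type="default"):
    """Apply the threshold for this content type to the header-free length."""
    threshold = MIN_LENGTH_BY_TYPE.get(content_type, MIN_LENGTH_BY_TYPE["default"])
    return effective_length(content) >= threshold


handoff = "- Topic: salience scoring\n- Status: in progress\nproceed"
print(len(handoff) >= 50)                        # True: raw length clears the default
print(effective_length(handoff))                 # 7: only "proceed" remains
print(passes_length_filter(handoff, "handoff"))  # False
```

The example handoff clears a raw 50-character check only because of its headers, which is the inflation the bullet above describes.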

Test plan

Length filter works (drops 1 doc in local 4-doc smoke)
801/801 pytest, 13/13 harness
After PR #149 (fix(scripts): salience_phase0.sh snapshot pulls vec.npz alongside sqlite) and this PR merge: score against prod produces meaningfully better picks (constraint-shape rises, handoff noise falls)

🤖 Generated with Claude Code

Two iterations on the embedding-based scorer based on first real prod-vault scoring output (PR #149 will get embeddings into the snapshot first; this PR makes the scoring itself more robust):

1. --min-content-length filter (default 50 chars). Hard-filter tiny memories like "halt the run" / "propagate" / "Option 1 or 3" BEFORE scoring. They technically score via breadth (FTS-match many queries) but carry no actual standing-tier constraint: too thin to condition reasoning. Operator can set to 0 to disable, or any other threshold.

   Rationale for hard filter (not soft penalty): no amount of other signal saves a 2-word memory from being noise in a standing-tier context. Soft penalty would still let edge cases through; hard filter is robust.

2. Tuned exemplar lists.

   CONSTRAINT_EXEMPLARS expanded from 10 to 21 patterns drawn from real Brian-coded standing rules observed during this session:
   - SOTA / institutional rules (5)
   - Verification / discipline (5)
   - Failure / error handling (4)
   - Process / coordination (3)
   - Existential constraints (3)

   TIME_BOUNDED_EXEMPLARS expanded from 10 to 17 patterns. New coverage includes:
   - Session handoff shapes (5): the dominant noise pattern in bare-prod-snapshot scoring (PR #149 output) was tiny session handoffs like "Session: proceed". These need explicit negative-exemplar coverage.
   - PR / commit references (2)
   - Tiny single-thought patterns (3): "halt the run", "propagate", "Option 1 or 3"; defense in depth with the length filter.

Verified locally: smoke against 4-doc vault, length filter drops 1 sub-50-char doc; scoring still discriminates. Full pytest 801, harness 13/13.

Composes with PR #149. Once both merge:

    scripts/salience_phase0.sh snapshot   # pulls vec.npz alongside sqlite
    scripts/salience_phase0.sh score      # tuned exemplars + length filter

Expected: scoring against prod will produce meaningfully better picks. Constraint-shape memories should rise, session-handoff noise should be penalized (high time_penalty), and sub-50-char single-thought memories should be filtered out entirely.

Future iteration paths (NOT in this PR):
- Vault-derived auto-exemplars (sample high-confidence feedback/preference memories as positive exemplars instead of hand-tuning). True prototype-network design. ~1h work.
- Per-content-type length thresholds (handoffs often have meta headers like "- Topic: X" that inflate length without adding constraint content). Operator-tunable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

cipher813 merged commit 052db8b into main May 22, 2026
9 checks passed

cipher813 deleted the feat/salience-length-filter-exemplar-tune branch May 22, 2026 00:03
