feat(salience): length filter + tuned exemplars #150
Merged
Two iterations on the embedding-based scorer, based on the first real prod-vault scoring output (PR #149 will get embeddings into the snapshot first; this PR makes the scoring itself more robust):

1. --min-content-length filter (default 50 chars). Hard-filter tiny memories like "halt the run" / "propagate" / "Option 1 or 3" BEFORE scoring. They technically score via breadth (FTS-match many queries) but carry no actual standing-tier constraint: they are too thin to condition reasoning. Operator can set to 0 to disable, or any other threshold.

   Rationale for hard filter (not soft penalty): no amount of other signal saves a 2-word memory from being noise in a standing-tier context. Soft penalty would still let edge cases through; hard filter is robust.

2. Tuned exemplar lists.

   CONSTRAINT_EXEMPLARS expanded from 10 to 21 patterns drawn from real Brian-coded standing rules observed during this session:
   - SOTA / institutional rules (5)
   - Verification / discipline (5)
   - Failure / error handling (4)
   - Process / coordination (3)
   - Existential constraints (3)

   TIME_BOUNDED_EXEMPLARS expanded from 10 to 17 patterns. New coverage includes:
   - Session handoff shapes (5): the dominant noise pattern in bare-prod-snapshot scoring (PR #149 output) was tiny session handoffs like "Session: proceed". These need explicit negative-exemplar coverage.
   - PR / commit references (2)
   - Tiny single-thought patterns (3): "halt the run", "propagate", "Option 1 or 3"; defense in depth with the length filter.

Verified locally: smoke against 4-doc vault, length filter drops 1 sub-50-char doc; scoring still discriminates. Full pytest 801, harness 13/13.

Composes with PR #149. Once both merge:

    scripts/salience_phase0.sh snapshot  # pulls vec.npz alongside sqlite
    scripts/salience_phase0.sh score     # tuned exemplars + length filter

Expected: scoring against prod will produce meaningfully better picks. Constraint-shape memories should rise, session-handoff noise should be penalized (high time_penalty), and sub-50-char single-thought memories should be filtered out entirely.

Future iteration paths (NOT in this PR):
- Vault-derived auto-exemplars (sample high-confidence feedback/preference memories as positive exemplars instead of hand-tuning). True prototype-network design. ~1h work.
- Per-content-type length thresholds (handoffs often have meta headers like "- Topic: X" that inflate length without adding constraint content). Operator-tunable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Two iterations on `build_standing_set.py` based on the first real prod-vault scoring output (where bare-snapshot picks were tiny noise like "halt the run" / "propagate"):

1. `--min-content-length` filter (default 50 chars). Hard-filter sub-threshold memories BEFORE scoring.
2. `CONSTRAINT_EXEMPLARS` from 10 → 21 patterns, `TIME_BOUNDED_EXEMPLARS` from 10 → 17. Coverage now includes session-handoff shapes (the dominant noise class).

Length filter
Hard-filter (not soft penalty) because no amount of constraint-match or contradiction-win saves a 2-word memory from being noise in a standing-tier context. Operator can disable or tune.
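A minimal sketch of the hard filter described above, not the actual code in `build_standing_set.py`. The `Memory` dataclass, the `filter_by_length` helper, and the choice to measure length on whitespace-stripped content are assumptions made for illustration; only the 50-character default and the "0 disables" behavior come from this PR.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    # Hypothetical record shape; the real vault schema is not shown in this PR.
    id: str
    content: str


def filter_by_length(memories, min_content_length=50):
    """Split memories into (kept, dropped) before any scoring happens.

    A threshold of 0 (or less) disables the filter, mirroring the
    operator override described for --min-content-length.
    """
    if min_content_length <= 0:
        return list(memories), []
    kept, dropped = [], []
    for memory in memories:
        # Assumption: length is measured on stripped content, in characters.
        if len(memory.content.strip()) >= min_content_length:
            kept.append(memory)
        else:
            dropped.append(memory)
    return kept, dropped


vault = [
    Memory("m1", "halt the run"),
    Memory("m2", "Always verify a claim against the primary source "
                 "before recording it as settled."),
    Memory("m3", "Session: proceed"),
]

kept, dropped = filter_by_length(vault, min_content_length=50)
print([m.id for m in kept], [m.id for m in dropped])
```

Because the short memories never reach the scorer, no combination of other signals can pull them back in, which is the property a soft penalty would not give.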
Exemplar tuning
`CONSTRAINT_EXEMPLARS` (10 → 21): organized into 5 groups based on standing rules observed in your recall context this session.

`TIME_BOUNDED_EXEMPLARS` (10 → 17).

Composes with PR #149
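Once both merge, run `scripts/salience_phase0.sh snapshot` (pulls vec.npz alongside sqlite), then `scripts/salience_phase0.sh score` (tuned exemplars + length filter).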
Expected outcome: constraint-shape memories rise, session-handoff noise gets explicit time-penalty, sub-50-char single-thoughts are excluded entirely.
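One way the expected outcome could arise from exemplar matching is sketched below. This is an illustrative toy, not the scorer in this PR: the formula (best constraint-exemplar similarity minus a weighted best time-bounded similarity), the `time_weight` parameter, and the 2-dimensional vectors are all assumptions, and the breadth and contradiction signals mentioned elsewhere in this PR are left out.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def max_similarity(vec, exemplars):
    """Similarity to the closest exemplar; 0.0 when the exemplar list is empty."""
    return max((cosine(vec, e) for e in exemplars), default=0.0)


def salience_score(vec, constraint_exemplars, time_bounded_exemplars, time_weight=1.0):
    # Positive exemplars pull the score up; negative (time-bounded)
    # exemplars act as a penalty. The combination rule is hypothetical.
    constraint_match = max_similarity(vec, constraint_exemplars)
    time_penalty = max_similarity(vec, time_bounded_exemplars)
    return constraint_match - time_weight * time_penalty


# Toy embeddings: axis 0 stands for "standing rule", axis 1 for "session handoff".
constraint_exemplars = [[1.0, 0.0], [0.9, 0.1]]
time_bounded_exemplars = [[0.0, 1.0]]

rule_vec = [0.95, 0.05]     # resembles a constraint-shape memory
handoff_vec = [0.1, 0.9]    # resembles a "Session: proceed" handoff

rule_score = salience_score(rule_vec, constraint_exemplars, time_bounded_exemplars)
handoff_score = salience_score(handoff_vec, constraint_exemplars, time_bounded_exemplars)
print(rule_score > handoff_score)
```

Under these assumptions, adding more session-handoff shapes to the negative list widens the region of embedding space that receives a high penalty, which is why broader `TIME_BOUNDED_EXEMPLARS` coverage should push handoff noise down.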
Verified
Future iterations (NOT in this PR)
Test plan
- `score` against prod produces meaningfully better picks (constraint-shape rises, handoff noise falls)

🤖 Generated with Claude Code