
feat(salience): length filter + tuned exemplars #150

Merged
cipher813 merged 1 commit into main from feat/salience-length-filter-exemplar-tune
May 22, 2026


@cipher813
Owner

Summary

Two iterations on build_standing_set.py, driven by the first real prod-vault scoring output (where the bare-snapshot picks were tiny noise such as "halt the run" and "propagate"):

  1. --min-content-length filter (default 50 chars). Hard-filter sub-threshold memories BEFORE scoring.
  2. Tuned exemplar lists. Expanded CONSTRAINT_EXEMPLARS from 10 → 21 patterns and TIME_BOUNDED_EXEMPLARS from 10 → 17. Coverage now includes session-handoff shapes (the dominant noise class).

Length filter

scripts/salience_phase0.sh score                              # default: 50-char min
scripts/salience_phase0.sh score --min-content-length 100      # stricter
scripts/salience_phase0.sh score --min-content-length 0        # disable

Hard filter (not soft penalty) because no amount of constraint-match or contradiction-win signal saves a 2-word memory from being noise in a standing-tier context. The operator can disable the filter or tune its threshold.
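A minimal sketch of a length hard filter applied before scoring, assuming memories are simple records with a text field. The Memory dataclass, filter_by_length, and the argparse wiring are hypothetical illustrations, not the actual code in build_standing_set.py; only the --min-content-length flag name, its 50-char default, and the "0 disables" behavior come from this PR.

```python
import argparse
from dataclasses import dataclass


@dataclass
class Memory:
    # Hypothetical record shape; the real schema in build_standing_set.py may differ.
    id: str
    content: str


def filter_by_length(memories: list[Memory], min_content_length: int = 50) -> list[Memory]:
    """Hard-filter memories shorter than min_content_length, before any scoring.

    A threshold of 0 (or below) disables the filter, mirroring --min-content-length 0.
    Assumption: surrounding whitespace is not counted toward the length.
    """
    if min_content_length <= 0:
        return list(memories)
    return [m for m in memories if len(m.content.strip()) >= min_content_length]


# Hypothetical CLI wiring for the flag described above.
parser = argparse.ArgumentParser(description="salience scoring (sketch)")
parser.add_argument("--min-content-length", type=int, default=50,
                    help="drop memories shorter than this many characters; 0 disables")

vault = [
    Memory("a", "halt the run"),
    Memory("b", "Never merge a PR without running the full pytest suite and the harness first."),
]
args = parser.parse_args([])  # defaults: 50-char minimum
kept = filter_by_length(vault, args.min_content_length)
print([m.id for m in kept])  # ['b']
```

Filtering before scoring, rather than subtracting a penalty afterward, means a 12-character memory like "halt the run" never reaches the breadth or exemplar signals at all, so no combination of them can lift it back into the standing set.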

Exemplar tuning

CONSTRAINT_EXEMPLARS (10 → 21): organized into 5 groups based on standing rules observed in your recall context this session:

  • SOTA / institutional (5)
  • Verification / discipline (5)
  • Failure / error handling (4)
  • Process / coordination (3)
  • Existential constraints (3)

TIME_BOUNDED_EXEMPLARS (10 → 17):

  • Session handoff shapes (5) — dominant noise pattern
  • PR / commit references (2)
  • Tiny single-thought patterns (3) — defense in depth with length filter
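One way two exemplar lists can drive an embedding-based score is sketched below, assuming each memory and each exemplar string has already been embedded (for example, vectors like those in vec.npz). The nearest-exemplar aggregation (max cosine similarity) and the score = constraint_sim minus penalty_weight times time_penalty combination are hypothetical; the real scorer may aggregate and weight differently. Toy 2-D vectors stand in for real embeddings.

```python
import numpy as np


def _unit_rows(x) -> np.ndarray:
    # Row-normalize so that dot products are cosine similarities.
    x = np.asarray(x, dtype=float)
    return x / np.clip(np.linalg.norm(x, axis=1, keepdims=True), 1e-12, None)


def exemplar_scores(mem_vecs, constraint_vecs, time_vecs, penalty_weight=1.0):
    """Score memories against two exemplar sets (nearest-exemplar sketch).

    constraint_sim: max cosine similarity to any CONSTRAINT exemplar (higher is better).
    time_penalty:   max cosine similarity to any TIME_BOUNDED exemplar (higher is worse).
    score:          constraint_sim - penalty_weight * time_penalty (hypothetical combination).
    """
    m, c, t = _unit_rows(mem_vecs), _unit_rows(constraint_vecs), _unit_rows(time_vecs)
    constraint_sim = (m @ c.T).max(axis=1)
    time_penalty = (m @ t.T).max(axis=1)
    return constraint_sim, time_penalty, constraint_sim - penalty_weight * time_penalty


# Toy 2-D vectors stand in for real embeddings of exemplar strings and memories.
constraint_vecs = [[1.0, 0.0]]        # "standing rule" direction
time_vecs = [[0.0, 1.0]]              # "session handoff" direction
mem_vecs = [[1.0, 0.1], [0.1, 1.0]]   # a rule-like memory, a handoff-like memory
constraint_sim, time_penalty, score = exemplar_scores(mem_vecs, constraint_vecs, time_vecs)
print(score.round(3))  # rule-like memory scores positive, handoff-like memory negative
```

Under this framing, adding the five session-handoff shapes to TIME_BOUNDED_EXEMPLARS raises time_penalty for handoff-like memories directly, which is why broader negative coverage can push that noise class down even when it also matches some constraint exemplars.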

Composes with PR #149

Once both merge:

scripts/salience_phase0.sh snapshot     # PR #149: pulls vec.npz
scripts/salience_phase0.sh score        # this PR: tuned + length-filtered

Expected outcome: constraint-shape memories rise, session-handoff noise receives an explicit time penalty, and sub-50-char single-thought memories are excluded entirely.

Verified

  • Local 4-doc smoke: length filter drops 1 sub-50-char doc; scoring still discriminates correctly
  • Full pytest 801, harness 13/13

Future iterations (NOT in this PR)

  • Vault-derived auto-exemplars — sample high-confidence feedback/preference memories as positive exemplars instead of hand-tuning. True prototype-network design. ~1h work.
  • Per-content-type length thresholds — handoffs often have meta headers ("- Topic: X") that inflate length without adding constraint content.
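The per-content-type threshold idea could look roughly like this sketch, which measures length only after dropping meta-header lines. The META_HEADER regex, the THRESHOLDS table, and the "handoff"/"default" type names are all hypothetical; the only grounded detail is that handoffs carry headers such as "- Topic: X" that inflate raw length.

```python
import re

# Hypothetical heuristic: a line shaped like "- Topic: X" is a meta header, not content.
META_HEADER = re.compile(r"^\s*-\s*[A-Z][\w ]*:\s")

# Hypothetical per-content-type thresholds (characters); type names are illustrative.
THRESHOLDS = {"handoff": 80, "default": 50}


def effective_length(content: str) -> int:
    """Length of the content after dropping meta-header lines."""
    body = [line for line in content.splitlines() if not META_HEADER.match(line)]
    return len("\n".join(body).strip())


def passes_length_filter(content: str, content_type: str) -> bool:
    # Unknown content types fall back to the default threshold.
    threshold = THRESHOLDS.get(content_type, THRESHOLDS["default"])
    return effective_length(content) >= threshold


handoff = "- Topic: salience scoring\n- Status: in progress\n- Next: rerun score\nSession: proceed"
rule = "Always verify SOTA claims against a primary source before citing them in a PR."
print(len(handoff), effective_length(handoff), passes_length_filter(handoff, "handoff"))
print(passes_length_filter(rule, "feedback"))
```

Here the handoff's raw length clears even the stricter handoff threshold, but once its header lines are stripped only "Session: proceed" remains, so it is filtered out while the header-free rule passes.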

Test plan

🤖 Generated with Claude Code

Two iterations on the embedding-based scorer, based on the first real
prod-vault scoring output (PR #149 will get embeddings into the
snapshot first; this PR makes the scoring itself more robust):

1. --min-content-length filter (default 50 chars). Hard-filter tiny
   memories like "halt the run" / "propagate" / "Option 1 or 3"
   BEFORE scoring. They technically score well via breadth (they
   FTS-match many queries) but carry no actual standing-tier
   constraint; they are too thin to condition reasoning. The
   operator can set the threshold to 0 to disable the filter, or
   to any other value.

   Rationale for hard filter (not soft penalty): no amount of
   other signal saves a 2-word memory from being noise in a
   standing-tier context. Soft penalty would still let edge cases
   through; hard filter is robust.

2. Tuned exemplar lists. CONSTRAINT_EXEMPLARS expanded from 10 to
   21 patterns drawn from real Brian-coded standing rules observed
   during this session:
   - SOTA / institutional rules (5)
   - Verification / discipline (5)
   - Failure / error handling (4)
   - Process / coordination (3)
   - Existential constraints (3)

   TIME_BOUNDED_EXEMPLARS expanded from 10 to 17 patterns. New
   coverage includes:
   - Session handoff shapes (5): the dominant noise pattern in
     bare-prod-snapshot scoring (PR #149 output) was tiny session
     handoffs like "Session: proceed". These need explicit
     negative-exemplar coverage.
   - PR / commit references (2)
   - Tiny single-thought patterns (3): "halt the run", "propagate",
     "Option 1 or 3" — defense in depth with the length filter.

Verified locally: smoke against 4-doc vault, length filter drops
1 sub-50-char doc; scoring still discriminates. Full pytest 801,
harness 13/13.

Composes with PR #149 — once both merge:
  scripts/salience_phase0.sh snapshot   # pulls vec.npz alongside sqlite
  scripts/salience_phase0.sh score      # tuned exemplars + length filter

Expected: scoring against prod should produce meaningfully better
picks. Constraint-shape memories should rise, session-handoff
noise should be penalized (high time_penalty), and sub-50-char
single-thought memories should be filtered out entirely.

Future iteration paths (NOT in this PR):
  - Vault-derived auto-exemplars (sample high-confidence feedback/
    preference memories as positive exemplars instead of hand-
    tuning). True prototype-network design. ~1h work.
  - Per-content-type length thresholds (handoffs often have meta
    headers like "- Topic: X" that inflate length without adding
    constraint content). Operator-tunable.
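The vault-derived auto-exemplar path could be sketched as follows, assuming each vault memory carries a kind, a confidence, and an embedding. The VaultMemory fields, the 0.8 confidence cutoff, k=20, and all function names are hypothetical. The "prototype" here follows the usual prototypical-network convention: the class prototype is the mean of the unit-normalized exemplar embeddings, and memories are scored by cosine similarity to it.

```python
import random
from dataclasses import dataclass

import numpy as np


@dataclass
class VaultMemory:
    # Hypothetical fields; the real vault schema may differ.
    id: str
    kind: str          # e.g. "feedback", "preference", "handoff"
    confidence: float
    vec: np.ndarray


def sample_positive_exemplars(memories, kinds=("feedback", "preference"),
                              min_confidence=0.8, k=20, seed=0):
    """Sample up to k high-confidence feedback/preference memories as positives."""
    pool = [m for m in memories if m.kind in kinds and m.confidence >= min_confidence]
    rng = random.Random(seed)  # seeded so the exemplar set is reproducible
    return rng.sample(pool, min(k, len(pool)))


def prototype(exemplars) -> np.ndarray:
    """Class prototype: mean of unit-normalized exemplar embeddings, re-normalized."""
    if not exemplars:
        raise ValueError("no exemplars passed the kind/confidence filter")
    vecs = np.stack([e.vec / np.linalg.norm(e.vec) for e in exemplars])
    proto = vecs.mean(axis=0)
    return proto / np.linalg.norm(proto)


def prototype_score(vec: np.ndarray, proto: np.ndarray) -> float:
    # Cosine similarity between a memory embedding and the unit-length prototype.
    return float(vec @ proto / np.linalg.norm(vec))


vault = [
    VaultMemory("f1", "feedback", 0.9, np.array([1.0, 0.0])),
    VaultMemory("p1", "preference", 0.95, np.array([1.0, 0.2])),
    VaultMemory("h1", "handoff", 0.99, np.array([0.0, 1.0])),   # wrong kind
    VaultMemory("f2", "feedback", 0.3, np.array([0.0, 1.0])),   # low confidence
]
proto = prototype(sample_positive_exemplars(vault))
print([round(prototype_score(m.vec, proto), 3) for m in vault])
```

Compared with hand-tuned lists, this derives positives from the vault itself, so the exemplar set would track whatever standing rules the operator has actually confirmed rather than a fixed snapshot of patterns.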

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cipher813 merged commit 052db8b into main on May 22, 2026
9 checks passed
cipher813 deleted the feat/salience-length-filter-exemplar-tune branch on May 22, 2026 00:03
