feat(salience): Phase 1 — first-class standing tier (default-off, soak-gated)#154
Merged
Adds the standing-context recall tier as a schema-backed, MCP-tooled,
CLI-driven first-class feature. Memories explicitly promoted via
memory_promote are injected into every <mnemon-context> envelope on
every prompt, regardless of query similarity. Cap=15 (hard ceiling 20).
Default-off behind STANDING_TIER_ENABLED; soak-gated.
Closes the salience-tier Phase 1 work from the 2026-05-22 reframing —
ships the substrate gated, operator promotes ~5 career-context
memories via memory_promote MCP tool, flips the flag, observes ≥1
week soak for runway-style under-weighting recurrence vs absence.
Phase 1 IS the validation gate (vs the original synthetic A/B against
the Phase 0 env-var form, which carried no marginal information once
Phase 1 ships gated — same injection mechanism).
Schema additive: documents.tier TEXT NOT NULL DEFAULT 'situational'.
Idx idx_documents_tier on live rows for cap-count + search-exclusion
queries. Pre-existing memories default to 'situational'; harmless if
the flag stays off.
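
A minimal sketch of what an additive, idempotent tier migration could look like, assuming a SQLite-backed store. The `invalidated_at IS NULL` liveness predicate, the toy table layout, and the `migrate_tier` helper name are assumptions for illustration; only the column definition and the index name come from the text above.

```python
import sqlite3


def migrate_tier(conn: sqlite3.Connection) -> None:
    """Add the tier column and its partial index if they are missing."""
    columns = {row[1] for row in conn.execute("PRAGMA table_info(documents)")}
    if "tier" not in columns:
        # Additive: existing rows pick up the default value.
        conn.execute(
            "ALTER TABLE documents "
            "ADD COLUMN tier TEXT NOT NULL DEFAULT 'situational'"
        )
    # Partial index over live rows only. The liveness predicate is assumed.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_documents_tier "
        "ON documents(tier) WHERE invalidated_at IS NULL"
    )
    conn.commit()


# Toy schema standing in for the real documents table (hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents ("
    "id INTEGER PRIMARY KEY, content TEXT, invalidated_at TEXT)"
)
conn.execute("INSERT INTO documents (content) VALUES ('pre-existing memory')")

migrate_tier(conn)
migrate_tier(conn)  # running it twice is a no-op

tiers = [row[0] for row in conn.execute("SELECT tier FROM documents")]
index_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index'"
    )
]
```

Because the column is added with a default, rows written before the migration read back as `'situational'` without a backfill pass.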
Store API:
- promote_to_standing(id): raises StandingTierCapReached at cap,
  StandingTierProvenanceRejected on hook-sourced (Layer 4 compose),
  StandingTierError on missing/invalidated; idempotent re-promote.
- demote_to_situational(id): round-trip; no-op on already-situational.
- list_standing(): live tier members, content-included, ordered DESC
  by created_at.
- standing_tier_status(): count/cap/hard_ceiling stats.
- search_bm25 + search_vector: gain include_standing kw (default False)
  so the unconditionally-injected tier isn't double-counted in
  ranked retrieval.
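
The promote/demote contract described above can be sketched with an in-memory toy store. This is not the project's implementation: `Doc`, `ToyStore`, the `source == "hook"` provenance field, the `'standing'` tier literal, the exception hierarchy, and the order of the checks are all assumptions made for illustration. Only the method names, exception names, and return values follow the text.

```python
from dataclasses import dataclass, field


class StandingTierError(Exception):
    """Raised for missing or invalidated memories."""


class StandingTierCapReached(StandingTierError):
    """Raised when the standing tier is already at its cap."""


class StandingTierProvenanceRejected(StandingTierError):
    """Raised when a hook-sourced memory is promoted."""


@dataclass
class Doc:
    id: int
    content: str
    source: str = "user"  # "hook" marks hook-sourced memories (assumed field)
    invalidated: bool = False
    tier: str = "situational"


@dataclass
class ToyStore:
    cap: int = 15
    docs: dict = field(default_factory=dict)

    def _live_standing(self) -> int:
        # Invalidated rows do not count against the cap.
        return sum(
            1 for d in self.docs.values()
            if d.tier == "standing" and not d.invalidated
        )

    def promote_to_standing(self, doc_id: int) -> bool:
        doc = self.docs.get(doc_id)
        if doc is None or doc.invalidated:
            raise StandingTierError(f"memory {doc_id} is missing or invalidated")
        if doc.source == "hook":
            raise StandingTierProvenanceRejected(f"memory {doc_id} is hook-sourced")
        if doc.tier == "standing":
            return True  # idempotent re-promote, even at cap
        if self._live_standing() >= self.cap:
            raise StandingTierCapReached(f"standing tier is at cap ({self.cap})")
        doc.tier = "standing"
        return True

    def demote_to_situational(self, doc_id: int) -> bool:
        doc = self.docs.get(doc_id)
        if doc is None or doc.tier != "standing":
            return False  # no-op on already-situational
        doc.tier = "situational"
        return True

    def search(self, term: str, include_standing: bool = False) -> list:
        # Standing docs are injected unconditionally, so ranked retrieval
        # skips them unless the caller opts in.
        return [
            d.id for d in self.docs.values()
            if not d.invalidated
            and term in d.content
            and (include_standing or d.tier != "standing")
        ]


store = ToyStore(cap=2)
for i, src in enumerate(["user", "user", "user", "hook"], start=1):
    store.docs[i] = Doc(id=i, content=f"runway fact {i}", source=src)
store.promote_to_standing(1)
store.promote_to_standing(2)
```

With `cap=2`, a third promotion raises `StandingTierCapReached` until one member is demoted, which mirrors the cap=2 fixture mentioned in the test list.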
MCP tools (server.py, both stdio + Streamable HTTP):
- memory_promote(id): surfaces cap / provenance / missing rejections
  as user-actionable messages.
- memory_demote(id): idempotent.
- memory_list_standing(): JSON array consumed by build_context.
Tool count: 14 → 17.
build_context wiring (context_surfacing.py): when STANDING_TIER_ENABLED
(config constant OR MNEMON_STANDING_TIER_ENABLED env override accepting
1/true/yes/on), call memory_list_standing in one round-trip; render
as "Standing context" sub-section inside the existing Layer 1
envelope. Phase 0 env-var path (MNEMON_STANDING_TIER_FILE → standing.json
→ standing-rendered.md cache) PRESERVED as fallback so operators
retain per-session override mechanism.
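
A small sketch of the flag check and section ordering described above. The helper names `standing_tier_enabled` and `render_sections` are hypothetical, and so are the case-insensitive matching and the literal reading of "config constant OR env override" (either one enables the tier). The envelope tag and nonce are deliberately left out because their format is not shown here.

```python
import os

TRUTHY = {"1", "true", "yes", "on"}
STANDING_TIER_ENABLED = False  # config constant, default-off


def standing_tier_enabled(environ=None) -> bool:
    """Return True if the config constant or the env override enables the tier."""
    env = os.environ if environ is None else environ
    raw = env.get("MNEMON_STANDING_TIER_ENABLED", "")
    # Case-insensitive matching is an assumption, not confirmed by the PR.
    return STANDING_TIER_ENABLED or raw.strip().lower() in TRUTHY


def render_sections(standing, situational) -> str:
    """Render the standing block ahead of situational recall."""
    parts = []
    if standing:
        parts.append("## Standing context")
        parts.extend(f"- {memory}" for memory in standing)
    parts.append("## Situational recall")
    parts.extend(f"- {memory}" for memory in situational)
    return "\n".join(parts)


enabled = standing_tier_enabled({"MNEMON_STANDING_TIER_ENABLED": "YES"})
standing = ["runway fact"] if enabled else []
rendered = render_sections(standing, ["query-matched memory"])
```

Passing the environment as a parameter keeps the check testable without mutating `os.environ`.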
CLI: `mnemon standing list / promote <id> / demote <id>`. `mnemon
status` gains a Standing tier: N/CAP line.
Composes with: Layer 0 is_well_shaped (capture rejection upstream),
Layer 1 envelope (standing block inside same envelope + nonce as
situational), Layer 4 HOOK_SOURCE_CONFIDENCE_CEILING (provenance
demotion enforced via StandingTierProvenanceRejected), rc16 source_key
(orthogonal), capture-attention Phase A (recurrence_count signals
candidates that operator-review can promote — bridges to salience-tier
Phase 2 promotion signals).
22 new tests in tests/test_standing_tier.py covering: promote
success / cap-rejection (cap=2 fixture, 3rd raises) / hook-sourced
rejection / invalidated rejection / missing rejection / cap respects
invalidated / demote round-trip / demote idempotent / demote frees
cap slot / list_standing ordering + content / search excludes by
default / search includes when requested / build_context flag-off /
flag-on memory_list_standing call / env-var truthy parsing.
Suite 814 → 836 passing. test_server_remote.py tool-count assertions
bumped 14 → 17.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cipher813 added a commit that referenced this pull request on May 22, 2026:
Seals the 2026-05-22 substrate arc:
- #153 capture-attention Phase A (recurrence-weighted preserve+relate+boost)
- #154 salience-tier Phase 1 (first-class standing tier, +3 MCP tools)
- #155 build_standing_set.py exemplar bias fix

Both new feature paths gated default-off (CAPTURE_ATTENTION_ENABLED,
STANDING_TIER_ENABLED); operator flips per the soak workflow.

Post-merge ritual:
- tag v0.7.0rc1 + GitHub Release
- twine upload (dist/mnemon-memory-0.7.0rc1.{tar.gz,whl})
- mnemon upgrade web --app-name mnemon-memory --mnemon-version 0.7.0rc1
- mnemon doctor 7/7 against live remote
- operator promotes 5 career memories + flips MNEMON_STANDING_TIER_ENABLED
  for ≥1 week soak

Suite 836 passing. mnemon --version returns 0.7.0rc1.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cipher813 added a commit that referenced this pull request on May 22, 2026:
Closes the unit-test-coverage gap that let memory_check_contradictions
ship to Fly with a hidden bug. Pre-existing test_server.py mocks every
external dep (LLM, NLI, embedder, vecstore), which is right for
isolated-contract testing but leaves a gap: a tool's real call path
can raise an uncaught exception that no mocked test exercises.
tests/test_tools_integration.py iterates the entire registered MCP
tool manager and invokes each tool against a real seeded vault with
minimal-valid inputs. Three test classes:
1. test_every_tool_invokes_cleanly — no unhandled exception, return
shape matches MCP contract (str/dict/list). Coverage-check
assertions force every newly-registered tool to have a fixture
input AND every removed tool to have its fixture cleaned up.
2. test_no_tool_returns_opaque_error_string — outputs must not
contain "Error occurred during tool execution" or "Internal
server error" verbatim. Opaque envelopes come from the MCP
transport wrapping an escaped Python exception — never
acceptable as a tool's own output.
3. test_destructive_tools_respect_dry_run — locks the dry_run
contract on memory_check_contradictions + memory_sweep. Mutating
inputs are stubbed to "would-decay" labels via mocked NLI; the
test asserts pre-state == post-state.
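
The first two checks in the list above can be sketched as one loop over a tool registry. Here `tools` is a plain dict standing in for the registered MCP tool manager and `fixtures` holds minimal-valid inputs per tool; both, along with the two toy tools and the `check_tools` helper, are hypothetical. The opaque marker strings are the ones quoted in the list.

```python
OPAQUE_MARKERS = (
    "Error occurred during tool execution",
    "Internal server error",
)


def check_tools(tools: dict, fixtures: dict) -> dict:
    """Invoke every tool with its fixture input and validate the output."""
    # Coverage check: new tools need a fixture, removed tools need cleanup.
    missing = sorted(set(tools) - set(fixtures))
    stale = sorted(set(fixtures) - set(tools))
    assert not missing, f"tools without fixture inputs: {missing}"
    assert not stale, f"fixtures for removed tools: {stale}"

    results = {}
    for name, fn in tools.items():
        out = fn(**fixtures[name])  # an escaping exception fails the check
        assert isinstance(out, (str, dict, list)), f"{name}: bad return type"
        text = out if isinstance(out, str) else repr(out)
        assert not any(marker in text for marker in OPAQUE_MARKERS), (
            f"{name}: opaque error envelope in output"
        )
        results[name] = out
    return results


# Toy tools standing in for real MCP tools (hypothetical behavior).
def memory_promote(id: int) -> str:
    return f"promoted {id}"


def memory_list_standing() -> list:
    return []


tools = {
    "memory_promote": memory_promote,
    "memory_list_standing": memory_list_standing,
}
fixtures = {"memory_promote": {"id": 1}, "memory_list_standing": {}}
results = check_tools(tools, fixtures)
```

Comparing the two key sets in both directions is what forces a fixture to be added or removed in the same change that touches the tool list.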
Heavy paths (NLI classify, FastEmbed re-embed) are stubbed so the
suite stays under ~17s; real-model paths are validated by
scripts/calibrate_capture_threshold.py and the operator Layer-3
web test ritual (extended in follow-up PR #158).
Catches the failure class that bit memory_check_contradictions on
2026-05-22: the LLM-required path would have raised under [server]
extras even with the broad try/except, and this canary's assertion
"no unhandled exception" would have fired on PR #154 when the
salience-tier tools were added — forcing the fix before merge.
Note: this PR doesn't yet exercise the [server]-only install matrix
(that's PR #158). It DOES catch any tool that raises through its
wrapper regardless of extras — the universal subset of the gap.
Composes with feedback_no_silent_fails: every tool must catch
failures at its boundary and return a clean error string, never
let an exception escape.
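
One way to express the "catch at the boundary, return a clean error string" rule is a decorator applied to each tool. This is an illustrative sketch, not the project's wrapper: `tool_boundary` is a hypothetical name, and the `ImportError` raised in the toy tool body is a stand-in for whatever the real LLM-required path raises.

```python
import functools


def tool_boundary(fn):
    """Convert any exception escaping a tool into a readable error string."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        try:
            return fn(*args, **kwargs)
        except Exception as exc:
            # Name the tool and the failure instead of letting the transport
            # wrap the exception in an opaque envelope.
            return f"{fn.__name__} failed: {type(exc).__name__}: {exc}"
    return wrapper


@tool_boundary
def memory_check_contradictions(dry_run: bool = True) -> str:
    # Hypothetical failure standing in for a missing optional dependency.
    raise ImportError("LLM backend not installed")


message = memory_check_contradictions()
```

The caller receives a string it can act on, and a canary that scans for opaque transport errors has nothing to flag.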
Suite 847 → 850 passing.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cipher813 added a commit that referenced this pull request on May 22, 2026:
…#159)

Follow-up to PR #158: closes the [server]-extras gap that the local
integration canary can't see + extends the operator Layer-3 web test
ritual to probe every MCP tool against the live Fly app.

Two artifacts:

1. .github/workflows/ci-server-extras.yml: installs
   mnemon-memory[server] ONLY (the Fly Docker install) + pytest as a
   separate test runner. Runs the full suite under that minimal install.
   Includes a guard that asserts llama-cpp-python is NOT installed under
   [server], so future PRs can't accidentally drag the LLM dep into the
   production path. This is the workflow that would have caught
   memory_check_contradictions's LLM hard-dependency on PR #154 when the
   salience-tier tools were first added; ci.yml passed because [dev]
   installs everything.

2. scripts/promote_stable.sh layer3 --exercise-all-tools: opt-in flag
   that, after the test Fly app is up but before downgrade, iterates
   every registered MCP tool against the remote and asserts each returns
   cleanly. Catches Fly-specific breakage (missing baked models,
   Anthropic MCP proxy timeouts, transport regressions) that the local
   Python-level canary in tests/test_tools_integration.py can't see.

Tool list resolved dynamically from mcp._tool_manager._tools, so tools
added in future PRs are exercised automatically, with no per-release
maintenance burden. Per-tool inputs mirror the integration-test
fixture; destructive tools (memory_forget, memory_rebuild) skipped;
mutating tools constrained to dry_run / round-trip.

scripts/_layer3_remote_helper.py gains an exercise-all-tools subcommand
wired through the FastMCP tool manager. Two regression-lock tests added
to tests/test_promote_stable.sh harness (13 → 15 passing) covering
helper dispatch + flag plumbing through the bash dispatcher
(cmd_layer3 "$@" forwarding + EXERCISE_ALL_TOOLS=1 set). Full Python
suite still 850 passing.

Driver: Brian's 2026-05-22 ask after the memory_check_contradictions
incident: "given the difficulty of checking each individual mnemon tool
available, are we properly using unit tests to confirm that everything
works as expected?" PR #158 addressed the Python-level canary; this PR
addresses the deployment-environment + Fly-level canary. Together they
form the test trio for catching the 2026-05-22 failure class on the next
PR rather than on the next operator MCP call.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
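
The "llama-cpp-python is NOT installed under [server]" guard mentioned above could be expressed in Python roughly as follows. The workflow's actual guard is not shown in this PR, so this is only a sketch; `installed_forbidden` is a hypothetical helper, and it relies on `llama_cpp` being the import name of the llama-cpp-python package.

```python
import importlib.util


def installed_forbidden(modules):
    """Return the subset of top-level module names that are importable."""
    return [
        name for name in modules
        if importlib.util.find_spec(name) is not None
    ]


# Demonstration with one stdlib module and one name that should not exist.
found = installed_forbidden(["json", "definitely_not_a_real_module_xyz"])

# In a [server]-only environment the guard would be:
#   assert installed_forbidden(["llama_cpp"]) == []
```

Using `find_spec` avoids importing the package, so the guard stays cheap and free of side effects even when the dependency is present.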
Summary

Ships the salience-tier substrate as a schema-backed, MCP-tooled, CLI-driven first-class feature. Memories explicitly promoted via `memory_promote` are injected into every `<mnemon-context>` envelope on every prompt, regardless of query similarity. Cap=15 (hard ceiling 20). Default-off behind `STANDING_TIER_ENABLED`.

Closes the salience-tier Phase 1 work per the 2026-05-22 reframing: Phase 1 IS the validation gate. Operator promotes ~5 career-context memories via the new `memory_promote` MCP tool, flips the flag, observes ≥1 week soak for runway-style under-weighting recurrence vs absence. The earlier synthetic-A/B-against-Phase-0 plan was reframed because the injection mechanism is identical between Phase 0's env-var-flagged form and Phase 1's schema-backed form, so an A/B carried no marginal information.

Driver

The original Claude Desktop failure mode: load-bearing facts (Brian's runway, career posture) were retrieved into context but under-weighted against a dominant generic prior. RAG worked. Salience didn't. This PR closes the recall-side weighting gap. The upstream curation gap (capture fragmenting load-bearing facts into pieces) was closed by capture attention Phase A (#153, merged earlier today).
What ships

Store API

- `promote_to_standing(id)`: raises `StandingTierCapReached` / `StandingTierProvenanceRejected` / `StandingTierError` with user-actionable messages. Idempotent re-promote returns True.
- `demote_to_situational(id)`: round-trip. Idempotent on already-situational (returns False).
- `list_standing()`: live tier members, content-included, ordered DESC by created_at.
- `standing_tier_status()`: `{count, cap, hard_ceiling}`.
- `search_bm25` + `search_vector`: gain `include_standing: bool = False` kw param. Tier 1 docs excluded from ranked retrieval by default (no double-counting; they're already injected unconditionally).

MCP tools (both stdio + Streamable HTTP)

- `memory_promote(id)`: surfaces cap / provenance / missing rejections as readable messages.
- `memory_demote(id)`: idempotent.
- `memory_list_standing()`: JSON array consumed by `build_context`.

CLI

- `mnemon standing list / promote <id> / demote <id>`
- `mnemon status` gains a `Standing tier: N/CAP` line.

build_context wiring

- `STANDING_TIER_ENABLED` (config constant OR `MNEMON_STANDING_TIER_ENABLED` env override accepting `1`/`true`/`yes`/`on`): single `memory_list_standing` round-trip → renders as `## Standing context` sub-section inside the existing Layer 1 envelope, ahead of `## Situational recall`.
- Phase 0 env-var path preserved as fallback (`MNEMON_STANDING_TIER_FILE` → `standing.json` → `standing-rendered.md` cache). Operators retain a per-session override mechanism.

Schema

`documents.tier TEXT NOT NULL DEFAULT 'situational'`: additive migration in `_migrate_tier()`. Index `idx_documents_tier` scoped to live rows. Pre-existing memories default to `'situational'`; harmless if flag stays off.

Composability (all preserved)

- Layer 0 (`is_well_shaped`): capture rejection runs upstream of any tier consideration
- Layer 1: `<mnemon-context>` data-marking + nonce
- Layer 4 (`HOOK_SOURCE_CONFIDENCE_CEILING` + provenance): hook-sourced memories cannot be promoted; explicit `StandingTierProvenanceRejected` rejection
- rc16 `source_key` upsert: unchanged; tier orthogonal
- Capture-attention Phase A: `recurrence_count` signals candidates that operator-review can promote (bridges to salience-tier Phase 2 promotion signals work)

Soak gates (before flipping default-on)

- `STANDING_TIER_ENABLED=true` (set via env or config)

Test plan

- 22 new tests (`pytest tests/test_standing_tier.py`)
- Tool-count assertions in `test_server_remote.py` (bumped 14 → 17)
- `mnemon status` shows tier line; `mnemon standing list/promote/demote` round-trip works on a temp vault
- Operator promotes ~5 career-context memories via the `memory_promote` MCP tool; set `MNEMON_STANDING_TIER_ENABLED=true` in the Claude Code launching shell; verify the `<mnemon-context>` block contains `## Standing context` sub-section; observe ≥1 week soak.

🤖 Generated with Claude Code