feat(contradiction): NLI-based detection (no LLM) + dry_run + 0.7.0rc2 bump #157
Merged
Rebuilds memory_check_contradictions on a Natural Language Inference
cross-encoder instead of the LLM classifier. NLI is the canonical
non-LLM ML primitive for "entailment / contradiction / neutral on a
sentence pair" — and ships through the same FastEmbed-style ONNX
runtime path already in mnemon, with zero new deps.
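To make the "entailment / contradiction / neutral on a sentence pair" primitive concrete, here is a minimal sketch of turning the three logits a cross-encoder emits for one (premise, hypothesis) pair into a label. The function names are hypothetical and the label order is an assumption; check the model's own config before relying on it.

```python
import math

# Assumed label order for illustration only; verify against the model's
# config (id2label) before relying on it.
LABELS = ("contradiction", "entailment", "neutral")


def softmax(logits):
    """Numerically stable softmax over a short list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_from_logits(logits, labels=LABELS):
    """Return (label, probability) for the highest-scoring class."""
    if len(logits) != len(labels):
        raise ValueError("expected one logit per label")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]
```

Running the model itself (tokenizing the pair, calling the ONNX session) is left out on purpose; this only covers the post-processing step.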
Driver: the LLM-based path couldn't actually work on Fly. mnemon's
[server] extras don't install llama-cpp-python per the 2026-05-21
"mnemon is LLM-free by design" decision, so check_contradictions has
been silently broken in production since the [server]/[llm] split.
Surfaced 2026-05-22 when claude.ai tried memory_check_contradictions
on memory #2543 and got an opaque "Error occurred during tool
execution" envelope (Anthropic MCP proxy timeout on the slow/broken
LLM path).
SOTA / institutional fit:
- NLI is the named ML task for this exact pair-relationship problem
- cross-encoder/nli-deberta-v3-xsmall: 22M params, ~87 MB INT8 ONNX,
~10-20ms per pair bidirectional on CPU
- Mirrors mnemon's existing FastEmbed ONNX pattern (lazy load,
prewarm at lifespan, baked into Fly image)
- Zero new pyproject deps — onnxruntime + tokenizers + huggingface_hub
all transitive via FastEmbed already
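The "lazy load, prewarm at lifespan" pattern mentioned above can be sketched as a thread-safe holder that loads the model on first use. This is an illustration under assumptions, not the code in src/mnemon/nli.py: the class name LazyModel and the injected loader callable are hypothetical.

```python
import threading


class LazyModel:
    """Load an expensive model once, on first use (hypothetical helper)."""

    def __init__(self, loader):
        self._loader = loader          # zero-argument callable that builds the model
        self._model = None
        self._lock = threading.Lock()

    def get(self):
        # Double-checked locking: skip the lock once the model is loaded.
        if self._model is None:
            with self._lock:
                if self._model is None:
                    self._model = self._loader()
        return self._model

    def prewarm(self):
        """Force the load now, e.g. during server startup."""
        self.get()

    def is_loaded(self):
        return self._model is not None
```

Calling prewarm() at startup moves the load cost out of the first request; callers that skip it still get a working model on the first get().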
Bidirectional classification disambiguates the mnemon taxonomy:
both entail → "same" (semantic equivalence)
new entails old only → "update" (new supersedes old)
contradiction either way → "contradiction"
both neutral → "unrelated"
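The four rules above can be sketched as a small pure function over the two directional labels. The function name and argument names are hypothetical. The one combination the list does not spell out (old entails new only) goes through a fallback argument, which defaults to "unrelated" purely as a placeholder assumption.

```python
VALID_LABELS = {"entailment", "contradiction", "neutral"}


def classify_relation(new_to_old, old_to_new, fallback="unrelated"):
    """Map two directional NLI labels onto the taxonomy listed above.

    new_to_old: label with the new memory as premise, the old as hypothesis.
    old_to_new: label with the roles swapped.
    fallback:   result for combinations the list does not cover (assumption).
    """
    if new_to_old not in VALID_LABELS or old_to_new not in VALID_LABELS:
        raise ValueError("labels must be entailment, contradiction, or neutral")
    # Contradiction in either direction wins over everything else.
    if "contradiction" in (new_to_old, old_to_new):
        return "contradiction"
    if new_to_old == "entailment" and old_to_new == "entailment":
        return "same"
    if new_to_old == "entailment":
        return "update"              # new entails old only
    if new_to_old == "neutral" and old_to_new == "neutral":
        return "unrelated"
    return fallback                  # old entails new only: not specified above
```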
Existing CONTRADICTION_OVERLAP_THRESHOLD=0.7 cosine gate is preserved
— filters obviously-unrelated pairs before they reach NLI, protecting
against the rare NLI false-positive on cross-topic content.
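A minimal sketch of such a gate, assuming embeddings are plain lists of floats and that the comparison is "greater than or equal to" the threshold (the source gives only the constant name and value). The helper names are hypothetical.

```python
import math

CONTRADICTION_OVERLAP_THRESHOLD = 0.7  # value from the commit message


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def candidates_for_nli(new_vec, existing, threshold=CONTRADICTION_OVERLAP_THRESHOLD):
    """Return ids of existing memories similar enough to be worth an NLI call.

    existing: iterable of (memory_id, vector) pairs.
    """
    return [
        mem_id
        for mem_id, vec in existing
        if cosine_similarity(new_vec, vec) >= threshold
    ]
```

Only the ids that survive this filter would be passed on to the bidirectional classifier, which keeps the NLI call count low and avoids cross-topic pairs.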
dry_run param: closes the read/command-separation violation
(check_* mutating state). When True, reports what WOULD decay
without applying mutations. Useful for operator audit; the 2026-05-22
standing-tier promotion incident (three contradictory liquidity
figures) would have benefited from this.
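The dry_run behaviour can be sketched as "compute the plan always, apply it only when asked". Everything here is hypothetical scaffolding (the CheckResult shape, the in-memory confidences dict, the decay factor); it only illustrates the separation, not mnemon's actual storage or decay rule.

```python
from dataclasses import dataclass, field


@dataclass
class CheckResult:
    dry_run: bool
    would_decay: list = field(default_factory=list)  # ids that qualify for decay
    decayed: list = field(default_factory=list)      # ids actually mutated


def apply_contradiction_decay(confidences, contradicted_ids, factor=0.5, dry_run=False):
    """Decay confidence of contradicted memories, or only report it when dry_run.

    confidences: dict of memory id -> confidence (mutated unless dry_run).
    factor:      illustrative decay multiplier (assumption).
    """
    result = CheckResult(dry_run=dry_run)
    for mem_id in contradicted_ids:
        if mem_id not in confidences:
            continue
        result.would_decay.append(mem_id)
        if not dry_run:
            confidences[mem_id] *= factor
            result.decayed.append(mem_id)
    return result
```

With dry_run=True the would_decay list is still populated, so an operator can audit the plan before committing it.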
Clean error surface: when NLI can't load (e.g., model download
fails on a fresh local install without network), MCP tool returns
"skipped — NLI classifier unavailable" instead of an opaque
"Error occurred during tool execution" envelope. Fail-loud per
feedback_no_silent_fails.
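A sketch of that error surface, assuming the tool wrapper catches NLIUnavailableError and turns it into a readable message. The exception here is a local stand-in for the one exported by src/mnemon/nli.py, the run_check callable is hypothetical, and the message wording only approximates the real one.

```python
class NLIUnavailableError(RuntimeError):
    """Stand-in for the exception raised when the NLI model cannot be loaded."""


def check_contradictions_tool(memory_id, run_check):
    """Run a contradiction check and always return a human-readable string.

    run_check: callable taking a memory id and returning a summary string
               (hypothetical; stands in for the real pipeline).
    """
    try:
        summary = run_check(memory_id)
    except NLIUnavailableError as exc:
        # Report the skip explicitly instead of letting the error escape.
        return (
            f"Contradiction check for #{memory_id} skipped: "
            f"NLI classifier unavailable ({exc}). No vault state was modified."
        )
    return f"Contradiction check for #{memory_id}: {summary}"
```

Only the specific unavailability error is caught; any other exception still propagates, which keeps genuine bugs loud.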
Dockerfile bakes the 87 MB INT8 ONNX model + tokenizer at build
time. Cold start adds NLI load to the prewarm path (~5-8s total).
Health check start period bumped 30s → 45s.
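One way to keep the added load time down is to prewarm the models concurrently at startup. The sketch below assumes blocking prewarm functions run in worker threads; reading "parallel to FastEmbed" as concurrent execution is an assumption, and prewarm_all is a hypothetical helper, not the code in server_remote.py.

```python
import asyncio


async def prewarm_all(prewarmers):
    """Run blocking prewarm callables concurrently in worker threads.

    prewarmers: dict of name -> zero-argument callable.
    Returns a dict of name -> None on success, or the exception raised.
    """
    names = list(prewarmers)
    results = await asyncio.gather(
        *(asyncio.to_thread(prewarmers[name]) for name in names),
        return_exceptions=True,  # one failed load must not abort the others
    )
    return {
        name: (result if isinstance(result, BaseException) else None)
        for name, result in zip(names, results)
    }
```

A startup hook could await prewarm_all and log any non-None entries, so a failed NLI load is visible at boot instead of on the first tool call.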
Files added:
- src/mnemon/nli.py: model wrapper (lazy load, classify_pair, classify_pair_bidirectional, prewarm, is_available, NLIUnavailableError)
- tests/test_nli.py: 11 tests covering bidirectional label mapping, error surfacing, availability
Files changed:
- src/mnemon/contradiction.py: LLM imports + prompt construction removed; NLI pipeline explicit; return shape gains nli_unavailable + dry_run flags
- src/mnemon/server.py: memory_check_contradictions tool gains dry_run param + clean nli_unavailable surface
- src/mnemon/server_remote.py: NLI prewarm in lifespan startup (parallel to FastEmbed)
- Dockerfile: bake NLI model; HF_HOME set; health check start period bump
- tests/test_contradiction.py: mock NLI instead of LLM; +2 tests for dry_run and nli_unavailable
Suite 836 → 847 passing. End-to-end smoke test: real NLI inference
correctly classifies a contradiction case ($500K vs $200K liquidity)
through the MCP tool with dry_run respected.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This was referenced May 22, 2026
Summary

Rebuilds memory_check_contradictions on a Natural Language Inference cross-encoder instead of the LLM classifier. Closes the long-latent defect that check_contradictions has been silently broken on Fly since the [server]/[llm] extras split (no llama-cpp-python in [server] per the 2026-05-21 "mnemon is LLM-free by design" decision).

Surfaced 2026-05-22 when claude.ai called memory_check_contradictions on memory #2543 (composite runway) and got an opaque "Error occurred during tool execution" envelope (Anthropic MCP proxy timing out on the slow/broken LLM path). Brian's framing: "is it a requirement to use an llm to check contradictions? there is no ml tool that can be used instead?" Yes there is, it's NLI, and it's the right institutional fit.

SOTA / institutional fit

- cross-encoder/nli-deberta-v3-xsmall: 22M params, ~87 MB INT8 ONNX, ~10-20ms per pair bidirectional on CPU
- onnxruntime + tokenizers + huggingface_hub all transitive via FastEmbed already

Mapping NLI → mnemon taxonomy (bidirectional)

- contradiction
- same (semantic equivalence)
- update (new supersedes old)
- same (existing dominates)
- unrelated

Cosine gate (CONTRADICTION_OVERLAP_THRESHOLD=0.7) preserved upstream of NLI to filter obviously-unrelated pairs.

dry_run parameter

Closes the read/command-separation violation (the check_* naming on a mutating function). When dry_run=True, the tool reports what WOULD have decayed without applying mutations (no confidence changes, no relations inserted). Useful for operator audit before committing destructive changes.

Clean error surface

When NLI isn't loadable (model download fails on a fresh local install without network), MCP tool returns

"Contradiction check for #X skipped — NLI classifier unavailable on this server (model load failed). ... No vault state was modified."

instead of an opaque envelope. Fail-loud per feedback_no_silent_fails.

Files

New:
- src/mnemon/nli.py: model wrapper, lazy-load singleton, classify_pair, classify_pair_bidirectional, prewarm, is_available, NLIUnavailableError
- tests/test_nli.py: 11 tests covering bidirectional label mapping, error surfacing, availability

Changed:
- src/mnemon/contradiction.py: LLM imports + prompt construction removed; NLI pipeline explicit; return shape gains nli_unavailable + dry_run flags
- src/mnemon/server.py: memory_check_contradictions(id, dry_run=False); clean nli_unavailable surface
- src/mnemon/server_remote.py: NLI prewarm in lifespan startup (parallel to FastEmbed)
- Dockerfile: bake 87 MB INT8 ONNX model + tokenizer; HF_HOME set; health check start period 30s → 45s
- tests/test_contradiction.py: mock NLI instead of LLM; +2 tests for dry_run and nli_unavailable

Version bump: 0.7.0rc1 → 0.7.0rc2.

Test plan

- pytest tests/test_nli.py
- pytest tests/test_contradiction.py
- dry_run respected (would-decay count surfaced, no mutations applied)
- v0.7.0rc2, GitHub Release, python -m build, twine upload
- mnemon upgrade web --app-name mnemon-memory --mnemon-version 0.7.0rc2 to ship Dockerfile changes (NLI bake) to Fly
- mnemon doctor 7/7 + smoke-test memory_check_contradictions from claude.ai

🤖 Generated with Claude Code