feat(contradiction): NLI-based detection (no LLM) + dry_run + 0.7.0rc2 bump by cipher813 · Pull Request #157 · cipher813/mnemon

cipher813 · 2026-05-22T16:29:09Z

Summary

Rebuilds memory_check_contradictions on a Natural Language Inference (NLI) cross-encoder instead of the LLM classifier. Fixes a long-latent defect: check_contradictions has been silently broken on Fly since the [server]/[llm] extras split, because [server] does not install llama-cpp-python (per the 2026-05-21 "mnemon is LLM-free by design" decision).

Surfaced 2026-05-22 when claude.ai called memory_check_contradictions on memory #2543 (composite runway) and got an opaque "Error occurred during tool execution" envelope (the Anthropic MCP proxy timing out on the slow/broken LLM path). Brian's framing: "is it a requirement to use an llm to check contradictions? there is no ml tool that can be used instead?" Yes, there is: it's NLI, and it's the right institutional fit.

SOTA / institutional fit

NLI (entailment / contradiction / neutral on a sentence pair) is the named ML task for exactly this problem
cross-encoder/nli-deberta-v3-xsmall: 22M params, ~87 MB INT8 ONNX, ~10-20ms per pair bidirectional on CPU
Mirrors mnemon's existing FastEmbed ONNX pattern (lazy load, prewarm at lifespan, baked into Fly image)
Zero new pyproject deps — onnxruntime + tokenizers + huggingface_hub all transitive via FastEmbed already
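
The "lazy load, prewarm at lifespan" pattern mentioned above can be sketched roughly as follows. This is a minimal illustration, not the contents of src/mnemon/nli.py: `LazyModel`, `_fake_loader`, and `load_count` are hypothetical names, and the loader is a stand-in rather than a real ONNX session.

```python
import threading
from typing import Callable, Optional


class LazyModel:
    """Load an expensive model once, on first use, and reuse it afterwards."""

    def __init__(self, loader: Callable[[], object]) -> None:
        self._loader = loader  # hypothetical: would build the ONNX session
        self._model: Optional[object] = None
        self._lock = threading.Lock()
        self.load_count = 0  # only here to make the behavior observable

    def get(self) -> object:
        # Double-checked locking: after the first load, callers skip the lock.
        if self._model is None:
            with self._lock:
                if self._model is None:
                    self._model = self._loader()
                    self.load_count += 1
        return self._model

    def prewarm(self) -> None:
        """Force the load at startup so the first request does not pay for it."""
        self.get()


def _fake_loader() -> object:
    # Stand-in for reading the model and tokenizer from disk (assumption).
    return {"name": "stand-in-model"}


nli_model = LazyModel(_fake_loader)
nli_model.prewarm()          # e.g. called once from the server's startup hook
model = nli_model.get()      # later calls reuse the already-loaded object
```

Calling `prewarm()` during startup moves the load cost out of the first tool call, which is presumably why the health check start period is extended.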

Mapping NLI → mnemon taxonomy (bidirectional)

| premise→hypothesis | hypothesis→premise | mnemon label |
|---|---|---|
| contradiction (either direction) | | `contradiction` |
| entailment | entailment | `same` (semantic equivalence) |
| neutral | entailment | `update` (new supersedes old) |
| entailment | neutral | `same` (existing dominates) |
| neutral | neutral | `unrelated` |
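
The mapping in the table can be expressed as a small pure function. This is a sketch under assumptions: `map_bidirectional` is a hypothetical name, and the function takes the two directional NLI labels as plain strings rather than calling a model.

```python
NLI_LABELS = {"entailment", "contradiction", "neutral"}


def map_bidirectional(forward: str, backward: str) -> str:
    """Map two directional NLI labels to a mnemon label.

    forward  = label for premise -> hypothesis
    backward = label for hypothesis -> premise
    """
    if forward not in NLI_LABELS or backward not in NLI_LABELS:
        raise ValueError(f"unknown NLI label: {forward!r} / {backward!r}")
    # A contradiction in either direction wins over everything else.
    if "contradiction" in (forward, backward):
        return "contradiction"
    # Forward entailment covers both "same" rows of the table.
    if forward == "entailment":
        return "same"
    # Only the backward direction entails: the new memory supersedes the old.
    if backward == "entailment":
        return "update"
    return "unrelated"


print(map_bidirectional("neutral", "entailment"))
```

Keeping the mapping separate from inference makes each row of the table testable without loading the model.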

Cosine gate (CONTRADICTION_OVERLAP_THRESHOLD=0.7) preserved upstream of NLI to filter obviously-unrelated pairs.
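
A rough sketch of how such a gate sits in front of the classifier. The constant name and value come from this PR; `cosine_similarity`, `passes_gate`, and the use of `>=` at the boundary are assumptions made for illustration.

```python
import math
from typing import Sequence

CONTRADICTION_OVERLAP_THRESHOLD = 0.7  # value stated in this PR


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def passes_gate(
    a: Sequence[float],
    b: Sequence[float],
    threshold: float = CONTRADICTION_OVERLAP_THRESHOLD,
) -> bool:
    """Return True when the pair is similar enough to be worth an NLI call."""
    # Assumption: pairs exactly at the threshold are let through.
    return cosine_similarity(a, b) >= threshold


# Orthogonal embeddings are filtered out before NLI; near-duplicates are not.
print(passes_gate([1.0, 0.0], [0.0, 1.0]))
print(passes_gate([1.0, 0.0], [1.0, 0.5]))
```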

`dry_run` parameter

Closes the read/command-separation violation (the check_* naming on a mutating function). When dry_run=True, the tool reports what WOULD have decayed without applying mutations (no confidence changes, no relations inserted). Useful for operator audit before committing destructive changes.
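
The dry_run behavior can be illustrated with a toy version of the mutation step. Everything here is hypothetical (the `Memory` and `CheckResult` shapes, the `decay` factor, the function name); it only shows the report-without-mutating idea, not the actual code in contradiction.py.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Memory:
    id: int
    confidence: float


@dataclass
class CheckResult:
    dry_run: bool
    would_decay: List[int] = field(default_factory=list)  # filled when dry_run
    decayed: List[int] = field(default_factory=list)      # filled otherwise


def apply_contradictions(
    contradicted: List[Memory], dry_run: bool = False, decay: float = 0.5
) -> CheckResult:
    """Decay contradicted memories, or only report them when dry_run is set."""
    result = CheckResult(dry_run=dry_run)
    for memory in contradicted:
        if dry_run:
            # Report only: the memory object is left untouched.
            result.would_decay.append(memory.id)
        else:
            memory.confidence *= decay  # hypothetical decay rule
            result.decayed.append(memory.id)
    return result


stale = Memory(id=1, confidence=0.8)
preview = apply_contradictions([stale], dry_run=True)
print(preview.would_decay, stale.confidence)
```

An operator can inspect `would_decay` first and then rerun with `dry_run=False` to commit the change.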

Clean error surface

When NLI isn't loadable (for example, the model download fails on a fresh local install without network), the MCP tool returns "Contradiction check for #X skipped — NLI classifier unavailable on this server (model load failed). ... No vault state was modified." instead of an opaque envelope. Fail-loud per feedback_no_silent_fails.
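
A minimal sketch of that error surface, assuming the tool catches NLIUnavailableError at its boundary. The exception class below is a local stand-in for the one named in the file list, `check_contradictions_tool` and `unavailable_classifier` are hypothetical, and the message text is abridged from the wording quoted above.

```python
from typing import Callable


class NLIUnavailableError(RuntimeError):
    """Local stand-in for the exception exported by src/mnemon/nli.py."""


def unavailable_classifier(premise: str, hypothesis: str) -> str:
    # Simulates a server where the model could not be loaded.
    raise NLIUnavailableError("model load failed")


def check_contradictions_tool(
    memory_id: int, classify: Callable[[str, str], str]
) -> str:
    """Return a readable message instead of letting the exception escape."""
    try:
        label = classify("existing memory text", "new memory text")
    except NLIUnavailableError:
        # Abridged wording; the real message contains more detail.
        return (
            f"Contradiction check for #{memory_id} skipped: NLI classifier "
            "unavailable on this server (model load failed). "
            "No vault state was modified."
        )
    return f"Contradiction check for #{memory_id}: {label}"


print(check_contradictions_tool(2543, unavailable_classifier))
```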

Files

New:

src/mnemon/nli.py — model wrapper, lazy-load singleton, classify_pair, classify_pair_bidirectional, prewarm, is_available, NLIUnavailableError
tests/test_nli.py — 11 tests covering bidirectional label mapping, error surfacing, availability

Changed:

src/mnemon/contradiction.py — LLM imports + prompt construction removed; NLI pipeline explicit; return shape gains nli_unavailable + dry_run flags
src/mnemon/server.py — memory_check_contradictions(id, dry_run=False); clean nli_unavailable surface
src/mnemon/server_remote.py — NLI prewarm in lifespan startup (parallel to FastEmbed)
Dockerfile — bake 87 MB INT8 ONNX model + tokenizer; HF_HOME set; health check start period 30s → 45s
tests/test_contradiction.py — mock NLI instead of LLM; +2 tests for dry_run and nli_unavailable

Version bump: 0.7.0rc1 → 0.7.0rc2.

Test plan

11 new NLI tests passing (pytest tests/test_nli.py)
Refactored contradiction tests passing (pytest tests/test_contradiction.py)
Full suite 836 → 847 passing, no regression
End-to-end smoke test with real NLI inference: $500K-vs-$200K liquidity pair correctly classified as contradiction via the MCP tool; dry_run respected (would-decay count surfaced, no mutations applied)
Post-merge: tag v0.7.0rc2, GitHub Release, python -m build, twine upload
Operator runs mnemon upgrade web --app-name mnemon-memory --mnemon-version 0.7.0rc2 to ship Dockerfile changes (NLI bake) to Fly
mnemon doctor 7/7 + smoke-test memory_check_contradictions from claude.ai
Then proceed with Phase 1 standing-tier soak

🤖 Generated with Claude Code

…7.0rc2

Rebuilds memory_check_contradictions on a Natural Language Inference cross-encoder instead of the LLM classifier. NLI is the canonical non-LLM ML primitive for "entailment / contradiction / neutral on a sentence pair", and ships through the same FastEmbed-style ONNX runtime path already in mnemon, with zero new deps.

Driver: the LLM-based path couldn't actually work on Fly. mnemon's [server] extras don't install llama-cpp-python per the 2026-05-21 "mnemon is LLM-free by design" decision, so check_contradictions has been silently broken in production since the [server]/[llm] split. Surfaced 2026-05-22 when claude.ai tried memory_check_contradictions on memory #2543 and got an opaque "Error occurred during tool execution" envelope (Anthropic MCP proxy timeout on the slow/broken LLM path).

SOTA / institutional fit:
- NLI is the named ML task for this exact pair-relationship problem
- cross-encoder/nli-deberta-v3-xsmall: 22M params, ~87 MB INT8 ONNX, ~10-20ms per pair bidirectional on CPU
- Mirrors mnemon's existing FastEmbed ONNX pattern (lazy load, prewarm at lifespan, baked into Fly image)
- Zero new pyproject deps: onnxruntime + tokenizers + huggingface_hub all transitive via FastEmbed already

Bidirectional classification disambiguates the mnemon taxonomy:
- both entail → "same" (semantic equivalence)
- new entails old only → "update" (new supersedes old)
- contradiction either way → "contradiction"
- both neutral → "unrelated"

Existing CONTRADICTION_OVERLAP_THRESHOLD=0.7 cosine gate is preserved: it filters obviously-unrelated pairs before they reach NLI, protecting against the rare NLI false-positive on cross-topic content.

dry_run param: closes the read/command-separation violation (check_* mutating state). When True, reports what WOULD decay without applying mutations. Useful for operator audit; the 2026-05-22 standing-tier promotion incident (three contradictory liquidity figures) would have benefited from this.

Clean error surface: when NLI can't load (e.g., model download fails on a fresh local install without network), MCP tool returns "skipped — NLI classifier unavailable" instead of an opaque "Error occurred during tool execution" envelope. Fail-loud per feedback_no_silent_fails.

Dockerfile bakes the 87 MB INT8 ONNX model + tokenizer at build time. Cold start adds NLI load to the prewarm path (~5-8s total). Health check start period bumped 30s → 45s.

Files added:
- src/mnemon/nli.py: model wrapper (lazy load, classify_pair, classify_pair_bidirectional, prewarm, is_available, NLIUnavailableError)
- tests/test_nli.py: 11 tests covering bidirectional label mapping, error surfacing, availability

Files changed:
- src/mnemon/contradiction.py: LLM imports + prompt construction removed; NLI pipeline explicit; return shape gains nli_unavailable + dry_run flags
- src/mnemon/server.py: memory_check_contradictions tool gains dry_run param + clean nli_unavailable surface
- src/mnemon/server_remote.py: NLI prewarm in lifespan startup (parallel to FastEmbed)
- Dockerfile: bake NLI model; HF_HOME set; health check start period bump
- tests/test_contradiction.py: mock NLI instead of LLM; +2 tests for dry_run and nli_unavailable

Suite 836 → 847 passing. End-to-end smoke test: real NLI inference correctly classifies a contradiction case ($500K vs $200K liquidity) through the MCP tool with dry_run respected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

cipher813 merged commit c38d233 into main May 22, 2026
9 checks passed

cipher813 deleted the feat/contradiction-detection-nli branch May 22, 2026 16:30

This was referenced May 22, 2026

test(tools): all-tools integration round-trip canary (follow-up to #157) #158

Merged

test(coverage): enforce ≥80% coverage + README badge — 86% baseline #160

Merged

cipher813 merged 1 commit into main from feat/contradiction-detection-nli
