
feat(contradiction): NLI-based detection (no LLM) + dry_run + 0.7.0rc2 bump #157

Merged
cipher813 merged 1 commit into main from feat/contradiction-detection-nli on May 22, 2026
Conversation

@cipher813
Owner

Summary

Rebuilds memory_check_contradictions on a Natural Language Inference (NLI) cross-encoder instead of the LLM classifier. Fixes a long-latent defect: check_contradictions has been silently broken on Fly since the [server]/[llm] extras split, because [server] does not install llama-cpp-python (per the 2026-05-21 "mnemon is LLM-free by design" decision).

Surfaced 2026-05-22 when claude.ai called memory_check_contradictions on memory #2543 (composite runway) and got an opaque "Error occurred during tool execution" envelope (Anthropic MCP proxy timing out on the slow/broken LLM path). Brian's framing: "is it a requirement to use an llm to check contradictions? there is no ml tool that can be used instead?" Yes, there is: it's NLI, and it's the right institutional fit.

SOTA / institutional fit

  • NLI (entailment / contradiction / neutral on a sentence pair) is the named ML task for exactly this problem
  • cross-encoder/nli-deberta-v3-xsmall: 22M params, ~87 MB INT8 ONNX, ~10-20ms per pair bidirectional on CPU
  • Mirrors mnemon's existing FastEmbed ONNX pattern (lazy load, prewarm at lifespan, baked into Fly image)
  • Zero new pyproject deps: onnxruntime + tokenizers + huggingface_hub all transitive via FastEmbed already
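
The lazy-load-plus-prewarm pattern in the list above can be sketched roughly as follows. This is a minimal illustration, not the contents of src/mnemon/nli.py: the `LazyModel` class and the injected `loader` callable are hypothetical, and only the names `prewarm`, `is_available`, and `NLIUnavailableError` come from the PR (their real signatures may differ).

```python
import threading
from typing import Callable, Optional


class NLIUnavailableError(RuntimeError):
    """Stand-in for the exception named in the PR; the real class may differ."""


class LazyModel:
    """Hypothetical wrapper: load an expensive model once, on first use."""

    def __init__(self, loader: Callable[[], object]) -> None:
        self._loader = loader              # assumed: builds the ONNX session + tokenizer
        self._model: Optional[object] = None
        self._error: Optional[Exception] = None
        self._attempted = False
        self._lock = threading.Lock()

    def get(self) -> object:
        """Return the cached model, loading it on the first call."""
        with self._lock:
            if not self._attempted:
                self._attempted = True
                try:
                    self._model = self._loader()
                except Exception as exc:   # remember the failure, do not retry
                    self._error = exc
            if self._error is not None:
                raise NLIUnavailableError(str(self._error)) from self._error
            return self._model

    def prewarm(self) -> bool:
        """Force the load at startup; report success instead of raising."""
        try:
            self.get()
            return True
        except NLIUnavailableError:
            return False

    def is_available(self) -> bool:
        """True if the model is (or can be) loaded."""
        return self.prewarm()
```

Calling `prewarm()` during server startup moves the load cost out of the first tool call, while a local install that never calls it pays the cost lazily on first use.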

Mapping NLI → mnemon taxonomy (bidirectional)

| premise→hypothesis | hypothesis→premise | mnemon label |
|---|---|---|
| contradiction (either direction) | (any) | contradiction |
| entailment | entailment | same (semantic equivalence) |
| neutral | entailment | update (new supersedes old) |
| entailment | neutral | same (existing dominates) |
| neutral | neutral | unrelated |
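
The mapping above can be written as a small pure function. This is a sketch of the table only; the function name `map_bidirectional` and its argument names are illustrative and are not taken from the mnemon source.

```python
from typing import Literal

NLILabel = Literal["entailment", "contradiction", "neutral"]
MnemonLabel = Literal["contradiction", "same", "update", "unrelated"]


def map_bidirectional(p_to_h: NLILabel, h_to_p: NLILabel) -> MnemonLabel:
    """Collapse two directional NLI labels into one mnemon label."""
    # A contradiction in either direction wins outright.
    if "contradiction" in (p_to_h, h_to_p):
        return "contradiction"
    # premise→hypothesis entailment covers both "same" rows of the table
    # (entailment/entailment and entailment/neutral).
    if p_to_h == "entailment":
        return "same"
    # Only hypothesis→premise entails: the table's "update" row.
    if h_to_p == "entailment":
        return "update"
    # Neutral in both directions.
    return "unrelated"
```

Keeping the mapping free of model calls makes it easy to test every row of the table without loading the classifier.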

Cosine gate (CONTRADICTION_OVERLAP_THRESHOLD=0.7) preserved upstream of NLI to filter obviously-unrelated pairs.
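
A rough sketch of such a gate, assuming embeddings are plain float sequences and that a pair passes when its similarity is at or above the threshold (the PR does not say whether the comparison is strict). The helper names `cosine_similarity` and `passes_gate` are hypothetical; only the constant name and its value come from the PR.

```python
import math
from typing import Sequence

CONTRADICTION_OVERLAP_THRESHOLD = 0.7  # value stated in the PR


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as unrelated
    return dot / (norm_a * norm_b)


def passes_gate(
    a: Sequence[float],
    b: Sequence[float],
    threshold: float = CONTRADICTION_OVERLAP_THRESHOLD,
) -> bool:
    """Only pairs that clear the gate would be sent on to the NLI model."""
    # Assumption: inclusive comparison; the real code may use ">".
    return cosine_similarity(a, b) >= threshold
```

Because the gate runs on embeddings that already exist, it costs far less than an NLI forward pass and keeps cross-topic pairs away from the classifier.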

dry_run parameter

Closes the read/command-separation violation (the check_* naming on a mutating function). When dry_run=True, the tool reports what WOULD have decayed without applying mutations (no confidence changes, no relations inserted). Useful for operator audit before committing destructive changes.
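
The dry-run behaviour described above follows a common pattern: compute the full set of intended changes, then apply them only when `dry_run` is false. The sketch below is illustrative only; `Memory`, `CheckResult`, `apply_decay`, and the decay factor are hypothetical and do not reflect mnemon's actual data model or return shape.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Memory:
    id: int
    confidence: float


@dataclass
class CheckResult:
    dry_run: bool
    would_decay: List[int] = field(default_factory=list)  # reported, not applied
    decayed: List[int] = field(default_factory=list)      # actually mutated


def apply_decay(
    contradicted: List[Memory], factor: float = 0.5, dry_run: bool = False
) -> CheckResult:
    """Decay contradicted memories, or only report them when dry_run is set."""
    result = CheckResult(dry_run=dry_run)
    for mem in contradicted:
        if dry_run:
            result.would_decay.append(mem.id)  # no state change
        else:
            mem.confidence *= factor           # the mutating path
            result.decayed.append(mem.id)
    return result
```

An operator could call this once with `dry_run=True` to review the would-decay list, then again with the default to commit the changes.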

Clean error surface

When NLI isn't loadable (e.g., model download fails on a fresh local install without network), the MCP tool returns "Contradiction check for #X skipped — NLI classifier unavailable on this server (model load failed). ... No vault state was modified." instead of an opaque envelope. Fail-loud per feedback_no_silent_fails.
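
One way to get this kind of error surface is to catch the classifier-unavailable exception at the tool boundary and return a readable message. The sketch below is an assumption-laden illustration: `check_contradictions_tool` and the injected `run_check` callable are hypothetical, and the message wording is paraphrased, not the tool's exact output.

```python
from typing import Callable, Dict


class NLIUnavailableError(RuntimeError):
    """Stand-in for the exception named in the PR; the real class may differ."""


def check_contradictions_tool(
    memory_id: int, run_check: Callable[[int], Dict[str, int]]
) -> str:
    """Hypothetical tool wrapper: turn a load failure into a readable reply."""
    try:
        result = run_check(memory_id)
    except NLIUnavailableError as exc:
        # Fail loud, but as a normal tool response rather than a raw exception.
        return (
            f"Contradiction check for #{memory_id} skipped: "
            f"NLI classifier unavailable ({exc}). No vault state was modified."
        )
    return (
        f"Contradiction check for #{memory_id}: "
        f"{result['contradictions']} contradiction(s) found."
    )
```

Catching only the specific exception keeps unrelated bugs visible while giving the client a message it can act on.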

Files

New:

  • src/mnemon/nli.py — model wrapper, lazy-load singleton, classify_pair, classify_pair_bidirectional, prewarm, is_available, NLIUnavailableError
  • tests/test_nli.py — 11 tests covering bidirectional label mapping, error surfacing, availability

Changed:

  • src/mnemon/contradiction.py — LLM imports + prompt construction removed; NLI pipeline explicit; return shape gains nli_unavailable + dry_run flags
  • src/mnemon/server.py — memory_check_contradictions(id, dry_run=False); clean nli_unavailable surface
  • src/mnemon/server_remote.py — NLI prewarm in lifespan startup (parallel to FastEmbed)
  • Dockerfile — bake 87 MB INT8 ONNX model + tokenizer; HF_HOME set; health check start period 30s → 45s
  • tests/test_contradiction.py — mock NLI instead of LLM; +2 tests for dry_run and nli_unavailable

Version bump: 0.7.0rc1 → 0.7.0rc2.

Test plan

  • 11 new NLI tests passing (pytest tests/test_nli.py)
  • Refactored contradiction tests passing (pytest tests/test_contradiction.py)
  • Full suite 836 → 847 passing, no regression
  • End-to-end smoke test with real NLI inference: $500K-vs-$200K liquidity pair correctly classified as contradiction via the MCP tool; dry_run respected (would-decay count surfaced, no mutations applied)
  • Post-merge: tag v0.7.0rc2, GitHub Release, python -m build, twine upload
  • Operator runs mnemon upgrade web --app-name mnemon-memory --mnemon-version 0.7.0rc2 to ship Dockerfile changes (NLI bake) to Fly
  • mnemon doctor 7/7 + smoke-test memory_check_contradictions from claude.ai
  • Then proceed with Phase 1 standing-tier soak

🤖 Generated with Claude Code

…7.0rc2

Rebuilds memory_check_contradictions on a Natural Language Inference
cross-encoder instead of the LLM classifier. NLI is the canonical
non-LLM ML primitive for "entailment / contradiction / neutral on a
sentence pair" — and ships through the same FastEmbed-style ONNX
runtime path already in mnemon, with zero new deps.

Driver: the LLM-based path couldn't actually work on Fly. mnemon's
[server] extras don't install llama-cpp-python per the 2026-05-21
"mnemon is LLM-free by design" decision, so check_contradictions has
been silently broken in production since the [server]/[llm] split.
Surfaced 2026-05-22 when claude.ai tried memory_check_contradictions
on memory #2543 and got an opaque "Error occurred during tool
execution" envelope (Anthropic MCP proxy timeout on the slow/broken
LLM path).

SOTA / institutional fit:
- NLI is the named ML task for this exact pair-relationship problem
- cross-encoder/nli-deberta-v3-xsmall: 22M params, ~87 MB INT8 ONNX,
  ~10-20ms per pair bidirectional on CPU
- Mirrors mnemon's existing FastEmbed ONNX pattern (lazy load,
  prewarm at lifespan, baked into Fly image)
- Zero new pyproject deps — onnxruntime + tokenizers + huggingface_hub
  all transitive via FastEmbed already

Bidirectional classification disambiguates the mnemon taxonomy:
  both entail               → "same"  (semantic equivalence)
  new entails old only      → "update" (new supersedes old)
  contradiction either way  → "contradiction"
  both neutral              → "unrelated"

Existing CONTRADICTION_OVERLAP_THRESHOLD=0.7 cosine gate is preserved
— filters obviously-unrelated pairs before they reach NLI, protecting
against the rare NLI false-positive on cross-topic content.

dry_run param: closes the read/command-separation violation
(check_* mutating state). When True, reports what WOULD decay
without applying mutations. Useful for operator audit; the 2026-05-22
standing-tier promotion incident (three contradictory liquidity
figures) would have benefited from this.

Clean error surface: when NLI can't load (e.g., model download
fails on a fresh local install without network), MCP tool returns
"skipped — NLI classifier unavailable" instead of an opaque
"Error occurred during tool execution" envelope. Fail-loud per
feedback_no_silent_fails.

Dockerfile bakes the 87 MB INT8 ONNX model + tokenizer at build
time. Cold start adds NLI load to the prewarm path (~5-8s total).
Health check start period bumped 30s → 45s.

Files added:
  src/mnemon/nli.py        — model wrapper (lazy load, classify_pair,
                              classify_pair_bidirectional, prewarm,
                              is_available, NLIUnavailableError)
  tests/test_nli.py        — 11 tests covering bidirectional label
                              mapping, error surfacing, availability

Files changed:
  src/mnemon/contradiction.py    — LLM imports + prompt construction
                                    removed; NLI pipeline explicit;
                                    return shape gains nli_unavailable
                                    + dry_run flags
  src/mnemon/server.py           — memory_check_contradictions tool
                                    gains dry_run param + clean
                                    nli_unavailable surface
  src/mnemon/server_remote.py    — NLI prewarm in lifespan startup
                                    (parallel to FastEmbed)
  Dockerfile                     — bake NLI model; HF_HOME set;
                                    health check start period bump
  tests/test_contradiction.py    — mock NLI instead of LLM; +2 tests
                                    for dry_run and nli_unavailable

Suite 836 → 847 passing. End-to-end smoke test: real NLI inference
correctly classifies a contradiction case ($500K vs $200K liquidity)
through the MCP tool with dry_run respected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cipher813 cipher813 merged commit c38d233 into main May 22, 2026
9 checks passed
@cipher813 cipher813 deleted the feat/contradiction-detection-nli branch May 22, 2026 16:30