cipher813 · May 22, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,5 +1,76 @@
 # Changelog
 
+## [0.7.0rc2] - 2026-05-22
+
+### Features
+
+- **Contradiction detection rebuilt with NLI (no LLM dep).** The
+  `memory_check_contradictions` MCP tool now uses a Natural Language
+  Inference cross-encoder (`cross-encoder/nli-deberta-v3-xsmall`,
+  22M params, ~87 MB INT8 ONNX) instead of an LLM classifier. NLI
+  is the canonical non-LLM ML primitive for this exact task —
+  entailment / contradiction / neutral classification on a sentence
+  pair — and ships through the same FastEmbed-style ONNX runtime
+  path already in mnemon. **Zero new dependencies** (onnxruntime +
+  tokenizers + huggingface_hub all transitively required by
+  FastEmbed already). Replaces the prior LLM-based path, which
+  could not work on Fly: `[server]` extras don't install
+  `llama-cpp-python` per the 2026-05-21 "mnemon is LLM-free by
+  design" decision, so the LLM path had been effectively broken
+  since the original `[server]`/`[llm]` split.
+  - **Bidirectional classification.** Each candidate pair is run
+    through the cross-encoder twice (premise→hypothesis +
+    hypothesis→premise, ~10-20ms total on CPU INT8). The two
+    directions disambiguate the mnemon taxonomy: both entail →
+    `same`; new entails old but not vice versa → `update`;
+    contradiction in either direction → `contradiction`; both
+    neutral → `unrelated`.
+  - **Cosine gate preserved.** Existing
+    `CONTRADICTION_OVERLAP_THRESHOLD=0.7` still filters candidates
+    before NLI — protects against the rare NLI false-positive on
+    obviously-unrelated pairs.
+  - **Model baked into Fly image.** Dockerfile downloads the 87 MB
+    quantized ONNX model + tokenizer at build time, mirroring the
+    existing FastEmbed bake. Cold start adds the NLI load to the
+    pre-warm path (~5-8 seconds total vs 3-5 seconds prior). Health
+    check start period bumped 30s → 45s.
+  - **Clean error surface.** When NLI isn't loadable (e.g., model
+    download fails on a fresh local install without network), the
+    MCP tool returns a clear "skipped — NLI classifier unavailable"
+    message instead of an opaque "Error occurred during tool
+    execution" envelope. Fail-loud per
+    `feedback_no_silent_fails`. Composes with the recalled
+    `feedback-mnemon-pypi-upload-claude-is-authorized` mental model:
+    surface failure causes specifically, never the generic envelope.
+
+- **`dry_run` parameter on `memory_check_contradictions`.** When
+  `dry_run=True`, the tool reports what would have decayed without
+  applying any mutations (no confidence changes, no relations
+  inserted). Closes the read/command-separation violation in the
+  prior `check_*` naming and lets operators audit before committing
+  destructive changes. The 2026-05-22 standing-tier promotion
+  incident (operator review of three contradictory liquidity
+  figures) would have benefited from this.
+
+### Internal
+
+- **New module `src/mnemon/nli.py`** mirroring `embedder.py`:
+  lazy-loaded singleton, `prewarm()` for lifespan startup,
+  `classify_pair()` for single-direction, `classify_pair_bidirectional()`
+  for the mnemon-taxonomy mapping, `is_available()` probe,
+  `NLIUnavailableError` named exception. Operator override:
+  `MNEMON_NLI_ONNX_VARIANT` env var to swap between FP32 / FP16 /
+  INT8 variants (default INT8 AVX-512 for x86 Fly).
+- **`contradiction.py` refactored**: LLM imports + prompt
+  construction removed; vector gate + NLI classify pipeline now
+  explicit in the docstring. Return shape gains `nli_unavailable`
+  and `dry_run` flags for caller-side handling.
+- **Tests**: `tests/test_nli.py` (11 new) covers bidirectional
+  label mapping, error surfacing, availability probe.
+  `tests/test_contradiction.py` refactored to mock the NLI layer
+  instead of `mnemon.llm.generate`; adds dry-run mutation-skip
+  test + nli-unavailable clean-flag test. Suite 836 → 847 passing.
+
 ## [0.7.0rc1] - 2026-05-22
 
 ### Fixes

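The bidirectional mapping and cosine gate described in the changelog entry above can be sketched roughly as follows. This is a minimal illustration, not the code in `nli.py`: the function names `map_bidirectional` and `classify_candidate`, the label strings, the `>=` gate comparison, and the precedence rules (contradiction checked first; "old entails new only" treated as `unrelated`) are all assumptions, since the changelog does not specify them.

```python
from typing import Callable, Optional

CONTRADICTION_OVERLAP_THRESHOLD = 0.7  # value quoted in the changelog


def map_bidirectional(new_to_old: str, old_to_new: str) -> str:
    """Map two single-direction NLI labels onto the mnemon taxonomy.

    new_to_old: label with the new memory as premise, the old one as hypothesis.
    old_to_new: label for the reverse direction.
    """
    # Assumption: a contradiction in either direction takes precedence.
    if "contradiction" in (new_to_old, old_to_new):
        return "contradiction"
    if new_to_old == "entailment" and old_to_new == "entailment":
        return "same"
    if new_to_old == "entailment":
        return "update"
    # Assumption: every remaining combination (including "old entails
    # new only") falls through to "unrelated".
    return "unrelated"


def classify_candidate(
    new_text: str,
    old_text: str,
    cosine: float,
    classify_pair: Callable[[str, str], str],
) -> Optional[str]:
    """Apply the cosine gate first, then run NLI in both directions."""
    if cosine < CONTRADICTION_OVERLAP_THRESHOLD:
        return None  # gated out before any NLI call
    return map_bidirectional(
        classify_pair(new_text, old_text),
        classify_pair(old_text, new_text),
    )
```

Passing the single-direction classifier in as a callable keeps the mapping testable without loading the ONNX model.
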
diff --git a/Dockerfile b/Dockerfile
@@ -15,6 +15,17 @@ RUN pip install --no-cache-dir ".[server]"
 ENV FASTEMBED_CACHE_DIR=/app/.cache/fastembed
 RUN python -c "from fastembed import TextEmbedding; TextEmbedding(model_name='BAAI/bge-small-en-v1.5', cache_dir='/app/.cache/fastembed')"
 
+# Bake the NLI cross-encoder (~87 MB INT8 ONNX) for
+# memory_check_contradictions — same rationale as the FastEmbed bake.
+# Without this, the first contradiction check pays a 5-15 second
+# download cost AND risks Anthropic's MCP-proxy timeout on the call.
+# Models land under /app/.cache/huggingface (the HF cache root set via HF_HOME below).
+ENV HF_HOME=/app/.cache/huggingface
+RUN python -c "from huggingface_hub import hf_hub_download; \
+    hf_hub_download(repo_id='cross-encoder/nli-deberta-v3-xsmall', filename='onnx/model_qint8_avx512.onnx'); \
+    hf_hub_download(repo_id='cross-encoder/nli-deberta-v3-xsmall', filename='tokenizer.json'); \
+    hf_hub_download(repo_id='cross-encoder/nli-deberta-v3-xsmall', filename='config.json')"
+
 # Vault data persists in /data (mount a Fly volume here)
 ENV MNEMON_VAULT_DIR=/data
 RUN mkdir -p /data
@@ -25,10 +36,12 @@ ENV PORT=8080
 EXPOSE 8080
 
 # Health check has a generous start period because the server pre-loads
-# the embedding model on startup (see server_remote.py) — uvicorn does
-# not bind the port until that load completes (~3-5 seconds on warm
-# disk, longer on first-ever boot if the model isn't yet cached).
-HEALTHCHECK --interval=30s --timeout=5s --start-period=30s --retries=3 \
+# BOTH the embedding model and the NLI classifier on startup (see
+# server_remote.py) — uvicorn does not bind the port until both loads
+# complete (~5-8 seconds on warm disk, longer on first-ever boot if
+# models aren't yet cached). Start period bumped to 45s for the dual
+# pre-warm.
+HEALTHCHECK --interval=30s --timeout=5s --start-period=45s --retries=3 \
     CMD python -c "import urllib.request, sys; sys.exit(0 if urllib.request.urlopen('http://localhost:8080/health', timeout=3).status == 200 else 1)"
 
 CMD ["mnemon", "serve-remote"]
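
The pre-warm behaviour the health-check comment relies on (load once at startup, fail with a named exception, expose an availability probe) might look something like the sketch below. The changelog names `prewarm()`, `is_available()` and `NLIUnavailableError`; the `LazyClassifier` class, its `get()` method, and the injected `loader` callable are hypothetical stand-ins for whatever `nli.py` does to build the ONNX session.

```python
import threading
from typing import Any, Callable, Optional


class NLIUnavailableError(RuntimeError):
    """Raised when the classifier cannot be loaded."""


class LazyClassifier:
    """Lazy-loaded singleton holder (hypothetical; assumes loader returns non-None)."""

    def __init__(self, loader: Callable[[], Any]) -> None:
        self._loader = loader
        self._model: Optional[Any] = None
        self._lock = threading.Lock()

    def get(self) -> Any:
        # Double-checked locking: only the first caller pays the load cost.
        if self._model is None:
            with self._lock:
                if self._model is None:
                    try:
                        self._model = self._loader()
                    except Exception as exc:
                        # Surface the specific cause under a named exception.
                        raise NLIUnavailableError(
                            f"NLI classifier unavailable: {exc}"
                        ) from exc
        return self._model

    def prewarm(self) -> None:
        """Force the load now, e.g. from a server lifespan hook."""
        self.get()

    def is_available(self) -> bool:
        """Probe that reports load failure as False instead of raising."""
        try:
            self.get()
        except NLIUnavailableError:
            return False
        return True
```

A failed load leaves the holder empty, so a later call retries rather than caching the failure; whether the real module does the same is not stated in the diff.
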
diff --git a/pyproject.toml b/pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 
 [project]
 name = "mnemon-memory"
-version = "0.7.0rc1"
+version = "0.7.0rc2"
 description = "Universal long-term memory layer for AI agents via MCP"
 readme = "README.md"
 license = "MIT"

diff --git a/src/mnemon/__init__.py b/src/mnemon/__init__.py
@@ -1,3 +1,3 @@
 """mnemon — Universal long-term memory layer for AI agents via MCP."""
 
-__version__ = "0.7.0rc1"
+__version__ = "0.7.0rc2"
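
As a rough illustration of the `dry_run` and `nli_unavailable` flags the changelog adds to the return shape, the sketch below classifies every candidate first and only then applies mutations. Only the two flag names come from the changelog; the `check_contradictions` signature, the `contradicted` key, and the injected `classify` and `decay` callables are assumptions made for the example.

```python
from typing import Any, Callable, Dict, Iterable, Tuple


class NLIUnavailableError(RuntimeError):
    """Raised by the classifier when the model cannot be loaded."""


def check_contradictions(
    new_text: str,
    candidates: Iterable[Tuple[str, str]],  # (memory_id, old_text) pairs
    classify: Callable[[str, str], str],    # returns a taxonomy label
    decay: Callable[[str], None],           # mutation applied to a memory id
    dry_run: bool = False,
) -> Dict[str, Any]:
    report: Dict[str, Any] = {
        "dry_run": dry_run,
        "nli_unavailable": False,
        "contradicted": [],
    }
    try:
        # Classify everything before mutating, so a mid-run failure
        # cannot leave a partially applied result.
        verdicts = [(mid, classify(new_text, old)) for mid, old in candidates]
    except NLIUnavailableError:
        report["nli_unavailable"] = True
        return report
    for mid, verdict in verdicts:
        if verdict == "contradiction":
            report["contradicted"].append(mid)
            if not dry_run:
                decay(mid)  # skipped entirely in dry-run mode
    return report
```

With `dry_run=True` the report lists the same memory ids a real run would decay, which is what makes it usable for an operator audit.
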