Merged
71 changes: 71 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,76 @@
# Changelog

## [0.7.0rc2] - 2026-05-22

### Features

- **Contradiction detection rebuilt with NLI (no LLM dep).** The
`memory_check_contradictions` MCP tool now uses a Natural Language
Inference cross-encoder (`cross-encoder/nli-deberta-v3-xsmall`,
22M params, ~87 MB INT8 ONNX) instead of an LLM classifier. NLI
is the canonical non-LLM ML primitive for this exact task —
entailment / contradiction / neutral classification on a sentence
pair — and ships through the same FastEmbed-style ONNX runtime
path already in mnemon. **Zero new dependencies** (onnxruntime +
tokenizers + huggingface_hub all transitively required by
FastEmbed already). Replaces the prior LLM-based path, which could
not work on Fly: the `[server]` extras don't install
`llama-cpp-python` (per the 2026-05-21 "mnemon is LLM-free by
design" decision), so the LLM path had been effectively broken
since the original `[server]`/`[llm]` split.
- **Bidirectional classification.** Each candidate pair is run
through the cross-encoder twice (premise→hypothesis +
hypothesis→premise, ~10-20ms total on CPU INT8). The two
directions disambiguate the mnemon taxonomy: both entail →
`same`; new entails old but not vice versa → `update`;
contradiction in either direction → `contradiction`; both
neutral → `unrelated`.
- **Cosine gate preserved.** Existing
`CONTRADICTION_OVERLAP_THRESHOLD=0.7` still filters candidates
before NLI — protects against the rare NLI false-positive on
obviously-unrelated pairs.
- **Model baked into Fly image.** Dockerfile downloads the 87 MB
quantized ONNX model + tokenizer at build time, mirroring the
existing FastEmbed bake. Cold start adds the NLI load to the
pre-warm path (~5-8 seconds total vs 3-5 seconds prior). Health
check start period bumped 30s → 45s.
- **Clean error surface.** When NLI isn't loadable (e.g., model
download fails on a fresh local install without network), the
MCP tool returns a clear "skipped — NLI classifier unavailable"
message instead of an opaque "Error occurred during tool
execution" envelope. Fail-loud per
`feedback_no_silent_fails`. Composes with the recalled
`feedback-mnemon-pypi-upload-claude-is-authorized` mental model:
surface failure causes specifically, never the generic envelope.

- **`dry_run` parameter on `memory_check_contradictions`.** When
`dry_run=True`, the tool reports what WOULD have decayed without
applying any mutations (no confidence changes, no relations
inserted). Closes the read/command-separation violation in the
prior `check_*` naming; useful for operator audit before
committing destructive changes (the 2026-05-22 standing-tier
promotion incident — operator review of three contradictory
liquidity figures — would have benefited from this).
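
The bidirectional label mapping described above can be sketched as a small pure function. This is an illustrative sketch, not the code in `src/mnemon/nli.py`: the function name `map_bidirectional`, the label strings, and the precedence order (contradiction checked first) are assumptions. The changelog does not say what happens when old entails new but not vice versa, so the sketch arbitrarily folds that case into `unrelated`.

```python
from typing import Literal

# Label vocabularies are assumptions for this sketch.
NLILabel = Literal["entailment", "contradiction", "neutral"]
Verdict = Literal["same", "update", "contradiction", "unrelated"]


def map_bidirectional(new_to_old: NLILabel, old_to_new: NLILabel) -> Verdict:
    """Combine two single-direction NLI labels into one taxonomy verdict.

    new_to_old: label with the new memory as premise, old as hypothesis.
    old_to_new: label with the roles swapped.
    """
    # A contradiction in either direction decides the verdict
    # (checking it first is an assumption about precedence).
    if "contradiction" in (new_to_old, old_to_new):
        return "contradiction"
    # Mutual entailment: the two memories say the same thing.
    if new_to_old == "entailment" and old_to_new == "entailment":
        return "same"
    # New entails old but not vice versa: new is a more specific update.
    if new_to_old == "entailment":
        return "update"
    # Both neutral. The unspecified "old entails new only" case also
    # lands here, which is a choice made for this sketch.
    return "unrelated"
```

Keeping the mapping separate from model inference makes it testable without loading the ONNX model, which is in the spirit of the label-mapping tests mentioned under Internal.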

### Internal

- **New module `src/mnemon/nli.py`** mirroring `embedder.py`:
lazy-loaded singleton, `prewarm()` for lifespan startup,
`classify_pair()` for single-direction, `classify_pair_bidirectional()`
for the mnemon-taxonomy mapping, `is_available()` probe,
`NLIUnavailableError` named exception. Operator override:
`MNEMON_NLI_ONNX_VARIANT` env var to swap between FP32 / FP16 /
INT8 variants (default INT8 AVX-512 for x86 Fly).
- **`contradiction.py` refactored**: LLM imports + prompt
construction removed; vector gate + NLI classify pipeline now
explicit in the docstring. Return shape gains `nli_unavailable`
and `dry_run` flags for caller-side handling.
- **Tests**: `tests/test_nli.py` (11 new) covers bidirectional
label mapping, error surfacing, availability probe.
`tests/test_contradiction.py` refactored to mock the NLI layer
instead of `mnemon.llm.generate`; adds dry-run mutation-skip
test + nli-unavailable clean-flag test. Suite 836 → 847 passing.
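
The gate, classify, and dry-run flow described in the `contradiction.py` entry can be sketched roughly as follows. Everything here is hypothetical except the names taken from the changelog (`NLIUnavailableError`, the 0.7 threshold, and the `nli_unavailable` / `dry_run` flags): `CheckResult`, `check_contradictions`, the injected `classify` and `apply_decay` callables, and the `>=` comparison at the gate are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Mirrors CONTRADICTION_OVERLAP_THRESHOLD=0.7 from the changelog.
OVERLAP_THRESHOLD = 0.7


class NLIUnavailableError(RuntimeError):
    """Stand-in for the named exception the changelog mentions."""


@dataclass
class CheckResult:
    findings: List[Tuple[str, str]] = field(default_factory=list)
    mutations_applied: int = 0
    nli_unavailable: bool = False
    dry_run: bool = False


def check_contradictions(
    new_text: str,
    candidates: List[Tuple[str, float]],  # (old_text, cosine similarity)
    classify: Callable[[str, str], str],  # returns a taxonomy verdict
    apply_decay: Callable[[str], None],   # the mutating side effect
    dry_run: bool = False,
) -> CheckResult:
    result = CheckResult(dry_run=dry_run)
    for old_text, cosine in candidates:
        # Cosine gate: skip pairs below the threshold before any NLI call.
        if cosine < OVERLAP_THRESHOLD:
            continue
        try:
            verdict = classify(new_text, old_text)
        except NLIUnavailableError:
            # Surface the cause as an explicit flag, not a generic error.
            result.nli_unavailable = True
            return result
        if verdict != "contradiction":
            continue
        result.findings.append((old_text, verdict))
        # In dry-run mode, report the finding but apply no mutation.
        if not dry_run:
            apply_decay(old_text)
            result.mutations_applied += 1
    return result
```

Passing `classify` and `apply_decay` in as callables is a choice made for this sketch; it lets a test substitute a fake classifier, much as the refactored tests mock the NLI layer.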

## [0.7.0rc1] - 2026-05-22

### Fixes
21 changes: 17 additions & 4 deletions Dockerfile
@@ -15,6 +15,17 @@ RUN pip install --no-cache-dir ".[server]"
ENV FASTEMBED_CACHE_DIR=/app/.cache/fastembed
RUN python -c "from fastembed import TextEmbedding; TextEmbedding(model_name='BAAI/bge-small-en-v1.5', cache_dir='/app/.cache/fastembed')"

# Bake the NLI cross-encoder (~87 MB INT8 ONNX) for
# memory_check_contradictions — same rationale as the FastEmbed bake.
# Without this, the first contradiction check pays a 5-15 second
# download cost AND risks Anthropic's MCP-proxy timeout on the call.
# Model lives in /app/.cache/huggingface (default HF cache root).
ENV HF_HOME=/app/.cache/huggingface
RUN python -c "from huggingface_hub import hf_hub_download; \
hf_hub_download(repo_id='cross-encoder/nli-deberta-v3-xsmall', filename='onnx/model_qint8_avx512.onnx'); \
hf_hub_download(repo_id='cross-encoder/nli-deberta-v3-xsmall', filename='tokenizer.json'); \
hf_hub_download(repo_id='cross-encoder/nli-deberta-v3-xsmall', filename='config.json')"

# Vault data persists in /data (mount a Fly volume here)
ENV MNEMON_VAULT_DIR=/data
RUN mkdir -p /data
@@ -25,10 +36,12 @@ ENV PORT=8080
EXPOSE 8080

# Health check has a generous start period because the server pre-loads
-# the embedding model on startup (see server_remote.py) — uvicorn does
-# not bind the port until that load completes (~3-5 seconds on warm
-# disk, longer on first-ever boot if the model isn't yet cached).
-HEALTHCHECK --interval=30s --timeout=5s --start-period=30s --retries=3 \
+# BOTH the embedding model and the NLI classifier on startup (see
+# server_remote.py) — uvicorn does not bind the port until both loads
+# complete (~5-8 seconds on warm disk, longer on first-ever boot if
+# models aren't yet cached). Start period bumped to 45s for the dual
+# pre-warm.
+HEALTHCHECK --interval=30s --timeout=5s --start-period=45s --retries=3 \
CMD python -c "import urllib.request, sys; sys.exit(0 if urllib.request.urlopen('http://localhost:8080/health', timeout=3).status == 200 else 1)"

CMD ["mnemon", "serve-remote"]
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"

[project]
name = "mnemon-memory"
-version = "0.7.0rc1"
+version = "0.7.0rc2"
description = "Universal long-term memory layer for AI agents via MCP"
readme = "README.md"
license = "MIT"
2 changes: 1 addition & 1 deletion src/mnemon/__init__.py
@@ -1,3 +1,3 @@
"""mnemon — Universal long-term memory layer for AI agents via MCP."""

-__version__ = "0.7.0rc1"
+__version__ = "0.7.0rc2"