Optimistic concurrency control on content updates (v0.11.0)#87

Merged
fstamatelopoulos merged 3 commits into main from feat/optimistic-locking
Jun 12, 2026

@fstamatelopoulos
Owner

Summary

Closes the last-write-wins gap surfaced by a real incident: two agent sessions updated the same document at nearly the same time, and the later write silently shadowed the earlier one. Versioning made the merge recoverable; this change makes the overwrite preventable.

The contract: content updates (document_id or update_if_exists) now require expected_content_hash — the content_hash of the version the edit was based on, returned by every read surface. The check is atomic inside cerefox_ingest_document (SELECT … FOR UPDATE), closing the read→chunk+embed→write race at the one place all transports share.

  • Stale hash → CEREFOX_CONFLICT (SQLSTATE 40001; HTTP 409 on REST): re-read → merge → retry with the new hash. Error text teaches agents the protocol.
  • Missing hash → CEREFOX_TOKEN_REQUIRED (SQLSTATE 22023; HTTP 400).
  • last_write_wins: true (CLI --last-write-wins) — explicit, audit-logged escape hatch. Passed internally by ingest-dir / guides ingest (filesystem is their source of truth) and by the frozen Python fallback (preserves its historical behavior).
  • Handlers also fast-fail a stale token before the embedding spend (advisory; RPC stays authoritative).
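
The contract above can be modelled in a few lines. This is a minimal sketch, not the project's implementation: `FakeStore`, `update_with_retry`, the exception classes, and the use of SHA-256 for `content_hash` are all hypothetical stand-ins, and a `threading.Lock` plays the role that `SELECT … FOR UPDATE` plays inside `cerefox_ingest_document`.

```python
import hashlib
import threading


class ConflictError(Exception):
    """Stands in for CEREFOX_CONFLICT (SQLSTATE 40001, HTTP 409 on REST)."""


class TokenRequiredError(Exception):
    """Stands in for CEREFOX_TOKEN_REQUIRED (SQLSTATE 22023, HTTP 400)."""


def content_hash(text):
    # Assumption: SHA-256 is used purely for illustration.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class FakeStore:
    """Hypothetical in-memory stand-in for a single stored document."""

    def __init__(self, text):
        self._lock = threading.Lock()  # plays the role of the row lock
        self._text = text

    def read(self):
        # Every read returns the content together with its hash (the token).
        with self._lock:
            return self._text, content_hash(self._text)

    def update(self, new_text, expected_content_hash=None, last_write_wins=False):
        with self._lock:  # the check and the write happen under one lock
            if not last_write_wins:
                if expected_content_hash is None:
                    raise TokenRequiredError("expected_content_hash is required")
                if expected_content_hash != content_hash(self._text):
                    raise ConflictError("stale hash: re-read, merge, retry")
            self._text = new_text
            return content_hash(new_text)


def update_with_retry(store, edit, max_attempts=3):
    """Re-read, re-apply the edit to the fresh content, and retry on conflict."""
    for _ in range(max_attempts):
        current, token = store.read()
        try:
            return store.update(edit(current), expected_content_hash=token)
        except ConflictError:
            continue  # another writer got in first; the loop re-reads
    raise ConflictError(f"still conflicting after {max_attempts} attempts")
```

Here `edit` is a function from the current text to the new text, so re-applying it after a conflict is what performs the merge. A real agent would typically need a smarter merge than re-running the same edit, and `last_write_wins=True` skips both checks, which is why the PR treats it as an audited escape hatch.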

Token distribution: content_hash returned by cerefox_get_document / cerefox_search (docs mode) / cerefox_metadata_search RPCs → MCP tool headers, CLI output (document get, search), REST EF responses, and the web document API. The web edit page sends the hash it loaded and shows a merge-needed error on 409.

Versioning: schema_version 0.4.0 → 0.5.0 (RPC-only — content_hash already existed; no table migration). GPT Actions OpenAPI → 2.0.0. Design of record: docs/specs/concurrency-control-design.md (iter-32; local embedder slides to v0.12+).

⚠️ Breaking (intended, v0.11.0)

  • Pre-v0.11 clients' content updates fail against an upgraded server until the client runs cerefox self-update. Creates and all reads are unaffected.
  • Existing Custom GPTs need the v2.0.0 OpenAPI block re-pasted.
  • Suggested rollout: merge → cut v0.11.0 → self-update clients → cerefox server deploy → re-paste GPT block.

Docs

Design doc; AGENT_GUIDE (params, "Concurrent writers" section, workflows, rules) + AGENT_QUICK_REFERENCE (rule 9, workflows; get_help bundle regenerated); cli.md (flags, outputs, recipes); connect-agents (OpenAPI 2.0.0 + 409/400 responses + GPT system prompt teaches the merge protocol); agent-coordination ("Concurrent writers are conflict-guarded"); solution-design §6.1; requirements FR-1.11; README feature row; CLAUDE.md design decision #9; e2e-use-cases §6C; CHANGELOG (Changed — BREAKING); plan.md (iter-32 → v0.11.0).

Test plan

  • _shared: 211 pass (incl. new fast-fail conflict unit tests); typecheck clean
  • packages/memory: 136 pass / 0 fail; live suites now gate on deployed schema ≥ 0.5.0 (probe-and-skip with a deploy hint instead of "function not found")
  • Live conflict-path coverage (runs after schema deploy): no-token → TOKEN_REQUIRED; with hash → updated; stale hash → CONFLICT; --last-write-wins → bypass; failed attempts create no version snapshots
  • Frontend builds; cerefox_get_help bundle in sync
  • Post-merge (maintainer): cut v0.11.0 → self-update → cerefox server deploy → verify with the live suites → re-paste GPT Actions block
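
The probe-and-skip gate on the deployed schema version could look roughly like the sketch below. The function names and the wording of the hint are assumptions made for illustration; only the 0.5.0 threshold and the `cerefox server deploy` command come from this PR.

```python
from typing import Optional, Tuple

MIN_SCHEMA = (0, 5, 0)  # live conflict-path suites need schema >= 0.5.0


def parse_version(text: str) -> Tuple[int, ...]:
    # Compare versions numerically; as plain strings "0.10.0" sorts below "0.5.0".
    return tuple(int(part) for part in text.strip().split("."))


def skip_reason(deployed: str) -> Optional[str]:
    """Return a skip message when the deployed schema is too old, else None."""
    if parse_version(deployed) < MIN_SCHEMA:
        return (
            f"deployed schema {deployed} is below 0.5.0; "
            "run `cerefox server deploy` before the live suites"
        )
    return None
```

A test runner would call `skip_reason` with the probed version and skip the live suites with that message instead of failing later with "function not found".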

🤖 Generated with Claude Code

fstamatelopoulos and others added 3 commits June 12, 2026 12:25
…ntent_hash)

Motivated by a real incident: two agent sessions updated the same doc at
nearly the same time and the later write silently shadowed the earlier one
(last-write-wins); versioning made the merge recoverable, but recovery is
not prevention.

Content updates (document_id or update_if_exists) now require
expected_content_hash — the content_hash of the version the edit was based
on — checked ATOMICALLY inside cerefox_ingest_document (SELECT … FOR
UPDATE), closing the read→embed→write race at the one place all transports
share. Stale hash → CEREFOX_CONFLICT (40001; HTTP 409 on REST): re-read,
merge, retry. Missing hash → CEREFOX_TOKEN_REQUIRED (22023; HTTP 400).
last_write_wins=true is the explicit escape hatch (audit-logged); the
filesystem-sync flows (ingest-dir, guides ingest) and the frozen Python
fallback pass it internally.

content_hash is now returned by every document-shaped read (get_document /
search docs-mode / metadata_search RPCs, MCP tool headers, CLI output, REST
EF responses, web document API) so writers always hold the token. The web
edit page sends the hash it loaded and surfaces a merge-needed error on 409.
Handlers also fast-fail a stale token BEFORE the embedding spend (advisory;
the RPC stays authoritative).

BREAKING: pre-v0.11 clients' content updates fail against an upgraded
server until self-updated; GPT Actions need the v2.0.0 OpenAPI block
re-pasted. Creates unaffected. schema_version 0.4.0 → 0.5.0 (RPC-only; no
table migration — content_hash already existed). Live test suites gate on
the deployed schema version (probe-and-skip below 0.5.0) and now cover
conflict / token-required / last-write-wins paths end to end.

Design of record: docs/specs/concurrency-control-design.md (iter-32, v0.11.0;
local embedder slides to v0.12+).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Doc-sweep follow-up to the iter-32 implementation: every doc that teaches
update flows now carries the expected_content_hash contract:

- agent-coordination: new "Concurrent writers are conflict-guarded" section
  (this guide IS the multi-agent doc) + decision-log append pattern passes
  the hash.
- AGENT_GUIDE / cli.md: ID-based-update recipes gain the get → hash → update
  → on-conflict-merge steps.
- connect-agents: Custom GPT system prompt teaches the 409 merge protocol;
  metadataSearch response documents content_hash.
- solution-design §6.1: the FOR UPDATE concurrency check documented next to
  the atomic-ingest property it extends.
- requirements-and-specs: FR-1.11 (optimistic concurrency, P0).
- README: "Concurrency-safe updates" feature row.
- CLAUDE.md: Key Design Decision #9.
- e2e-use-cases: section 6C — the new live/unit conflict-path test matrix.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…aved keys

The edit page's key suggestions embedded the usage count in the option label
(`status (108)`); Mantine Autocomplete inserts the LABEL into the input on
select, so picking a suggestion and saving stored the literal string
"status (108)" as the metadata key — polluting the KB taxonomy (the key list
then showed "status (108) (1)").

Fix: options are bare keys; the count is rendered dropdown-only via
renderOption ("status · 108 docs"). The search filter's key Select was never
affected (Select keeps value/label separate) but now labels the count as
"(N docs)" so it reads as a usage count, not an ID. IngestPage's native
datalist already inserted bare keys.

(The one polluted document in the maintainer KB — keys "type (135)" /
"status (108)" — was repaired via `document edit --set-meta/--unset-meta`.)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@fstamatelopoulos fstamatelopoulos merged commit 304a8b1 into main Jun 12, 2026
3 checks passed
@fstamatelopoulos fstamatelopoulos deleted the feat/optimistic-locking branch June 13, 2026 00:07