Merged
2 changes: 1 addition & 1 deletion AGENT_GUIDE.md
@@ -72,7 +72,7 @@ Save a new document or update an existing one.
| `last_write_wins` | No | Explicitly skip the concurrency check (default `false`). Use ONLY when an external source of truth makes conflicts meaningless (file re-sync). Recorded in the audit log. **Never use it to silence a conflict.** |
| `project_name` | No | **Single** project name (created if absent). On update: **non-destructive add** — ensures this membership exists, preserves others. See "Project membership semantics" below. |
| `project_names` | No | **List** of project names (each created if absent). On update: **destructive replace** — sets the document's full project set to exactly this list. Use when you want to set multiple projects at once, or deliberately change the membership list. Wins over `project_name` when both are passed. |
| `metadata` | No | Arbitrary JSON. Use at minimum: `type` and `status`. |
| `metadata` | No | Arbitrary JSON. Use at minimum: `type` and `status`. **On update, omitting this keeps the document's existing metadata** (v0.11.1); pass `{}` to deliberately clear all tags. |
| `author` | No | Your agent name for audit attribution. Always set this. |
| `source` | No | Origin label (default "agent"). |

2 changes: 1 addition & 1 deletion AGENT_QUICK_REFERENCE.md
@@ -7,7 +7,7 @@ Cerefox is a persistent, shared knowledge base. You have **10 MCP tools** (9 of
| Tool | Purpose | Key params |
|------|---------|------------|
| `cerefox_search` | Find documents (hybrid FTS + semantic) | `query` (required), `project_name`, `metadata_filter`, `requestor` |
| `cerefox_ingest` | Save or update a document | `title`, `content` (required), `document_id` (update by ID), `expected_content_hash` (**required on content updates** — see rule 9), `last_write_wins`, `update_if_exists`, `project_name` (single, non-destructive add on update), `project_names` (list, destructive replace on update), `metadata`, `author` |
| `cerefox_ingest` | Save or update a document | `title`, `content` (required), `document_id` (update by ID), `expected_content_hash` (**required on content updates** — see rule 9), `last_write_wins`, `update_if_exists`, `project_name` (single, non-destructive add on update), `project_names` (list, destructive replace on update), `metadata` (omit on update to keep existing tags; `{}` clears), `author` |
| `cerefox_get_document` | Get full document by ID (header includes `content_hash` — the update token) | `document_id` (required) |
| `cerefox_list_versions` | Version history of a document | `document_id` (required) |
| `cerefox_metadata_search` | Find or list docs by metadata, project, or time (no text query) | `metadata_filter`, `project_name` (list a project's docs), `updated_since`, `include_content` — **at least one** of metadata_filter/project_name/updated_since/created_since |
17 changes: 16 additions & 1 deletion CHANGELOG.md
@@ -9,7 +9,22 @@ Versioning: [Semantic Versioning](https://semver.org/spec/v2.0.0.html) — all `

## [Unreleased]

Open roadmap.
### Fixed

- **Content updates no longer wipe a document's metadata.** Every transport defaulted
an absent `metadata` argument to `{}` and the ingest RPC applied it verbatim — so any
content update that didn't re-pass the tags (CLI `document ingest` without
`--metadata`, MCP `cerefox_ingest`, the REST EF, the frozen Python fallback) silently
cleared the document's metadata. Because metadata is not versioned, the loss was
unrecoverable. The contract is now **NULL = "not provided" → keep existing** (create
uses `{}`), enforced once in the `cerefox_ingest_document` RPC; pass `{}` explicitly
to deliberately clear. Schema version 0.5.0 → **0.6.0** (RPC-only; run
`cerefox server deploy` — v0.11.1 clients sending NULL against a 0.5.0 server would
fail the NOT NULL constraint on update).
- **CLI parity: `cerefox metadata search` no longer requires `--metadata-filter`.**
  The MCP tool and EF were relaxed in v0.10.x, but the CLI was missed. Now at least one of
  `--metadata-filter` / `--project-name` / `--updated-since` / `--created-since` is required;
  `--project-name` alone lists that project's documents.
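
The metadata contract in the first entry above can be sketched in TypeScript as follows. This is only an illustration of the semantics, under stated assumptions: the real enforcement lives in the `cerefox_ingest_document` RPC, whose body is not shown in this diff, and the `Metadata` alias and `resolveMetadata` helper are hypothetical names introduced here.

```typescript
type Metadata = Record<string, unknown>;

/**
 * Hypothetical helper (not part of Cerefox) that mirrors the described contract.
 *
 * @param existing metadata currently stored, or null when the document
 *                 does not exist yet (a create)
 * @param provided metadata passed by the caller; null/undefined means
 *                 "not provided"
 */
function resolveMetadata(
  existing: Metadata | null,
  provided: Metadata | null | undefined,
): Metadata {
  if (provided !== null && provided !== undefined) {
    // An explicit value always wins, including {} (a deliberate clear).
    return provided;
  }
  // Not provided: keep what is stored on update, start empty on create.
  return existing ?? {};
}

// Usage: the three cases the changelog entry distinguishes.
const keptOnUpdate = resolveMetadata({ type: "note", status: "draft" }, null);
const clearedOnUpdate = resolveMetadata({ type: "note" }, {});
const emptyOnCreate = resolveMetadata(null, undefined);
```

The point of the sketch is that "absent" and "empty object" are different inputs; the old `?? {}` default in each transport collapsed them into one.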

---

4 changes: 2 additions & 2 deletions _shared/mcp-tools/get-help-content.ts

Large diffs are not rendered by default.

5 changes: 4 additions & 1 deletion _shared/mcp-tools/ingest.ts
@@ -71,7 +71,10 @@ async function handler(
const project_name = args.project_name as string | undefined;
const project_names_raw = args.project_names;
const source = (args.source as string | undefined) ?? "agent";
const metadata = (args.metadata as Record<string, unknown> | undefined) ?? {};
// null = "not provided": the RPC keeps existing metadata on update and uses
// {} on create (v0.11.1 — defaulting to {} here used to wipe a document's
// tags on every content update that didn't re-pass them).
const metadata = (args.metadata as Record<string, unknown> | undefined) ?? null;
const update_if_exists = (args.update_if_exists as boolean | undefined) ?? false;
const author = (args.author as string | undefined) ?? "mcp-agent";
const author_type = "agent"; // MCP path is always agent
6 changes: 6 additions & 0 deletions docs/TODO.md
@@ -13,6 +13,12 @@

## Known Tasks (Not Yet Scheduled)

### Data Safety
- [ ] **Metadata versioning / recovery** — version snapshots capture content only;
a metadata wipe is unrecoverable (bit us in the v0.11.1 incident). Proposal with
options (audit-log before/after values, and/or metadata on version rows):
[`docs/research/metadata-versioning.md`](research/metadata-versioning.md).

### Search & Ranking
- [ ] Reciprocal Rank Fusion (RRF) for hybrid search instead of linear alpha blending
- [ ] True BM25 ranking via pg_textsearch or ParadeDB extension
6 changes: 3 additions & 3 deletions docs/guides/cli.md
@@ -43,7 +43,7 @@ cerefox document ingest --paste --title "<title>" [OPTIONS] # stdin
| `--title` | `-t` | str | filename stem | Document title. Required with `--paste`. |
| `--project-name` | `--project`, `-p` | str | _none_ | Project name to assign the document to (created if missing). |
| `--paste` | — | flag | off | Read markdown from stdin. Requires `--title`. |
| `--metadata` | `-m` | JSON | `{}` | Extra metadata as a JSON object, e.g. `'{"tags":["work"]}'`. |
| `--metadata` | `-m` | JSON | _not provided_ | Extra metadata as a JSON object, e.g. `'{"tags":["work"]}'`. **On update, omitting this keeps the document's existing metadata** (v0.11.1); pass `'{}'` to deliberately clear all metadata. |
| `--update-if-exists` | `-u` | flag | off | Title/source-path-based fallback update. Mutually exclusive with `--document-id`. |
| `--document-id` | `-i` | UUID | _none_ | Deterministic ID-based update. Errors if the document doesn't exist. |
| `--expected-content-hash` | — | sha256 | _none_ | **Required on content updates** (v0.11 optimistic concurrency): the `content_hash` of the version this edit is based on, shown by `cerefox document get` / `cerefox search`. Stale → conflict error (re-read, merge, retry). |
@@ -410,8 +410,8 @@ cerefox metadata search --metadata-filter '<json>' [OPTIONS]

| Flag | Type | Default | Description |
|---|---|---|---|
| `--metadata-filter <json>` (`-f`) | JSON | **required** | Metadata filter, e.g. `'{"type":"decision-log"}'`. |
| `--project-name <name>` (`-p`) | str | _none_ | Filter by project name. |
| `--metadata-filter <json>` (`-f`) | JSON | _none_ | Metadata filter, e.g. `'{"type":"decision-log"}'`. Optional since v0.11.1 — at least one of filter / `--project-name` / `--updated-since` / `--created-since` is required (parity with the MCP tool). |
| `--project-name <name>` (`-p`) | str | _none_ | Filter by project name. Sufficient on its own to list that project's documents. |
| `--updated-since TEXT` | ISO-8601 | _none_ | Documents updated after this timestamp. |
| `--created-since TEXT` | ISO-8601 | _none_ | Documents created after this timestamp. |
| `--limit INTEGER` | int | `10` | Max results. |
6 changes: 5 additions & 1 deletion docs/guides/connect-agents.md
@@ -609,7 +609,7 @@ In the action editor, paste this schema (replace `<your-project-ref>`):
openapi: 3.1.0
info:
title: Cerefox Knowledge Base
version: 2.0.0
version: 2.1.0
servers:
- url: https://<your-project-ref>.supabase.co/functions/v1
paths:
@@ -728,6 +728,10 @@ paths:
default: agent
metadata:
type: object
description: >
Arbitrary JSON metadata. On an UPDATE, omitting this keeps
the document's existing metadata (v2.1.0); pass {} to
deliberately clear all tags.
update_if_exists:
type: boolean
default: false
13 changes: 9 additions & 4 deletions docs/plan.md
@@ -3477,13 +3477,18 @@ bundler (`--use-api`, issue #84). Design of record:
[`docs/research/local-cerefox-design.md`](research/local-cerefox-design.md).

**Near-term tracks** (iteration numbers are planning IDs, not ship order):
1. **Iteration 32 — Optimistic concurrency control**, target **v0.11.0**, on
`feat/optimistic-locking`. Motivated by a real two-agent last-write-wins incident.
Content updates now require `expected_content_hash` (compare-and-swap on the existing
1. **Iteration 32 — Optimistic concurrency control**: ✅ **SHIPPED v0.11.0**
(2026-06-12; schema 0.5.0; deployed + live-validated on the maintainer cloud).
Content updates require `expected_content_hash` (compare-and-swap on the existing
`content_hash`, atomic in the ingest RPC via `FOR UPDATE`) or an explicit
`last_write_wins`. Design of record:
[`docs/specs/concurrency-control-design.md`](specs/concurrency-control-design.md).
Implemented across RPC + MCP + EF + CLI + web + docs; schema 0.5.0.
**v0.11.1 follow-up** (on `fix/metadata-preserve-on-update`, schema 0.6.0):
content updates without metadata no longer wipe a document's tags
(`p_metadata` NULL = keep existing), plus CLI `metadata search` parity (filter
optional with another scope). The wipe incident also spawned the
**metadata-versioning** backlog proposal:
[`docs/research/metadata-versioning.md`](research/metadata-versioning.md).
2. **Iteration 31 — Local ONNX embedder** (fully-offline World B), target **v0.12+**
(slid from v0.11.0 to make room for iter-32), on `feat/local-embedder`.
Design committed; P0 implementation pending review. See iter-31 in the log above.
66 changes: 66 additions & 0 deletions docs/research/metadata-versioning.md
@@ -0,0 +1,66 @@
# Metadata Versioning — proposal (backlog, unscheduled)

**Status**: Proposal / design sketch — NOT scheduled. Lives in `research/` until
it graduates to an iteration (then a design-of-record in `specs/`).
**Date**: 2026-06-13
**Motivation**: the v0.11.1 incident — a metadata-wipe bug destroyed document
tags with **no recovery path**, because version snapshots capture content only.
Content enjoys two safety layers (optimistic locking for prevention, versioning
for recovery); metadata now has prevention-adjacent protection (absent ≠ clear,
v0.11.1) but still **zero recovery**.

## Current state

- `cerefox_document_versions` + archived chunks snapshot **content only**.
- Metadata lives solely on the live `cerefox_documents.metadata` JSONB column.
- Metadata writes (ingest-with-metadata, `document edit --set-meta/--unset-meta`,
web edit, `cerefox_set_document_projects` for memberships) produce audit
entries, but the audit log records **descriptions, not values** — you can see
*that* metadata changed, not *what it was*.
- This was a deliberate simplicity choice (two-table design, lean version rows).

## Options

### Option A — snapshot metadata into version rows (smallest)

Add `metadata JSONB` to `cerefox_document_versions`; `cerefox_snapshot_version`
copies the document's metadata at snapshot time.

- Pros: one column + one line in the snapshot RPC; restore-from-version can
optionally restore tags; zero new tables.
- Cons: only captures metadata at **content-update** moments — metadata-only
edits between content updates still vanish without a trace; retention
cleanup expires it with the version.

### Option B — audit log records metadata before/after values

Add `meta_before` / `meta_after` JSONB columns to `cerefox_audit_log`,
populated on `update-metadata` and `update-content` operations.

- Pros: covers **every** metadata change (including metadata-only edits);
audit log is immutable and survives version cleanup; recovery = read the
last good `meta_before`.
- Cons: grows the audit table (metadata is small; likely fine); recovery is
manual-ish (no one-click restore), though a `document edit --restore-meta
<audit-id>` verb could be added later.
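
The recovery path Option B describes ("read the last good `meta_before`") might look roughly like the sketch below. Everything here is an assumption for illustration: only the `meta_before` / `meta_after` column names come from the proposal, while the `AuditEntry` shape, the `lastGoodMetadata` helper, and the rule that "good" means "non-empty" are hypothetical.

```typescript
type Metadata = Record<string, unknown>;

// Hypothetical row shape; only metaBefore / metaAfter map to proposed columns.
interface AuditEntry {
  id: number;
  documentId: string;
  metaBefore: Metadata | null;
  metaAfter: Metadata | null;
}

/**
 * Walks a document's audit entries from newest to oldest and returns the
 * most recent non-empty "before" snapshot, or null if none exists.
 * Assumes `entries` is ordered oldest to newest.
 */
function lastGoodMetadata(
  entries: AuditEntry[],
  documentId: string,
): Metadata | null {
  for (const entry of [...entries].reverse()) {
    if (entry.documentId !== documentId) continue;
    if (entry.metaBefore && Object.keys(entry.metaBefore).length > 0) {
      return entry.metaBefore;
    }
  }
  return null;
}

// Usage: entry 2 is a wipe, so its metaBefore holds the tags to restore.
const auditLog: AuditEntry[] = [
  { id: 1, documentId: "doc-a", metaBefore: {}, metaAfter: { type: "note" } },
  { id: 2, documentId: "doc-a", metaBefore: { type: "note" }, metaAfter: {} },
];
const recovered = lastGoodMetadata(auditLog, "doc-a");
```

A real implementation would probably let the operator pick a specific audit entry rather than guessing which snapshot is "good", which is what the suggested `--restore-meta <audit-id>` verb implies.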

### Option C — full metadata version table

A dedicated `cerefox_metadata_versions` table, one row per metadata change.

- Cons: a third versioning concept for marginal benefit over B. Rejected
unless A+B prove insufficient.

## Leaning

**B, possibly A+B together** (they're independent and both cheap). B is the
real recovery net because metadata-only edits are the common case; A makes
"restore this version" semantically complete. Both are additive schema changes
(migration + `schema_version` bump). Revisit when the pain recurs or before
v1.0's stability commitment freezes the schema surface.

## Non-goals

- Optimistic locking for metadata-only edits (separate, smaller discussion —
the v0.11.0 design doc scoped it out deliberately).
- Project-membership versioning (memberships are M2M rows, not metadata).
6 changes: 4 additions & 2 deletions packages/memory/src/cli/commands/ingest-dir.ts
@@ -95,7 +95,9 @@ async function action(dir: string, options: IngestDirOptions): Promise<void> {
"No --author / CEREFOX_AUTHOR_NAME set — audit log will record these writes as 'unknown'.",
);
}
const metadata = parseJsonObjectArg(options.metadata, "--metadata") ?? {};
// undefined = "not provided": re-ingesting existing files keeps their
// current metadata (v0.11.1). Pass --metadata '{}' to clear on update.
const metadata = parseJsonObjectArg(options.metadata, "--metadata");

const settings = loadSettings();
if (!settings.supabaseUrl || !settings.supabaseKey) {
@@ -140,7 +142,7 @@ async function action(dir: string, options: IngestDirOptions): Promise<void> {
title: basename(file, extname(file)),
source: options.source ?? "cli",
projectName: options.projectName ?? null,
metadata: metadata as Record<string, unknown>,
metadata: metadata ?? null,
updateExisting: Boolean(options.updateIfExists),
author,
authorType: authorType as "user" | "agent",
9 changes: 6 additions & 3 deletions packages/memory/src/cli/commands/ingest.ts
@@ -114,7 +114,10 @@ async function action(
"No --author / CEREFOX_AUTHOR_NAME set — audit log will record this write as 'unknown'.",
);
}
const metadata = parseJsonObjectArg(options.metadata, "--metadata") ?? {};
// undefined = "not provided": on update the existing metadata is KEPT
// (v0.11.1 — the old `?? {}` default wiped a document's tags on every
// content update without --metadata). Pass --metadata '{}' to clear.
const metadata = parseJsonObjectArg(options.metadata, "--metadata");

let projectNames: string[] | undefined;
if (options.projectNames) {
@@ -152,7 +155,7 @@ async function action(
source: options.source ?? "cli",
projectName: options.projectName ?? null,
projectNames: projectNames ?? null,
metadata: metadata as Record<string, unknown>,
metadata: metadata ?? null,
updateExisting: Boolean(options.updateIfExists),
documentId: options.documentId ?? null,
author,
@@ -166,7 +169,7 @@ async function action(
source: options.source ?? "cli",
projectName: options.projectName ?? null,
projectNames: projectNames ?? null,
metadata: metadata as Record<string, unknown>,
metadata: metadata ?? null,
updateExisting: Boolean(options.updateIfExists),
documentId: options.documentId ?? null,
author,
28 changes: 20 additions & 8 deletions packages/memory/src/cli/commands/metadata-search.ts
@@ -38,7 +38,7 @@ interface MetadataSearchRow {
}

async function action(options: {
metadataFilter: string;
metadataFilter?: string;
projectName?: string;
updatedSince?: string;
createdSince?: string;
@@ -48,11 +48,21 @@ async function action(options: {
requestor?: string;
json?: boolean;
}): Promise<void> {
const metadataFilter = parseJsonObjectArg(options.metadataFilter, "--metadata-filter");
if (!metadataFilter || Object.keys(metadataFilter).length === 0) {
// Parity with the MCP tool / EF (v0.10.x relaxation, CLI caught up in
// v0.11.1): the filter is optional, but at least one narrowing criterion is
// required so this never becomes an unbounded whole-KB dump. An empty filter
// + --project-name lists that project's documents.
const metadataFilter =
parseJsonObjectArg(options.metadataFilter, "--metadata-filter") ?? {};
if (
Object.keys(metadataFilter).length === 0 &&
!options.projectName &&
!options.updatedSince &&
!options.createdSince
) {
throw userError(
"--metadata-filter is required and must be a non-empty JSON object.",
`Example: --metadata-filter '{"type":"decision-log"}'.`,
"Provide at least one of: --metadata-filter, --project-name, --updated-since, or --created-since.",
`Examples: --metadata-filter '{"type":"decision-log"}' · --project-name "research" (lists that project's docs).`,
);
}

@@ -136,10 +146,12 @@ async function action(options: {
export function registerMetadataSearch(program: Command): void {
program
.command("metadata-search")
.description("Find documents by metadata criteria (no text query).")
.requiredOption(
.description(
"Find or list documents by metadata, project, or time criteria (no text query).",
)
.option(
"-f, --metadata-filter <json>",
"JSON object; only docs whose metadata contains ALL pairs are returned.",
"JSON object; only docs whose metadata contains ALL pairs are returned. Optional — omit to list by --project-name / time range alone (at least one criterion is required).",
)
.option("-p, --project-name <name>", "Filter to a specific project.")
.option("--updated-since <iso>", "Only docs updated on/after this ISO timestamp.")