64 changes: 64 additions & 0 deletions docs/atlas/refinement-coverage.md
@@ -0,0 +1,64 @@
---
title: Atlas Zod refinement coverage
status: living
source: src/atlas/types.ts
generated: 2026-06-12
---

# Atlas Zod refinement coverage

This document enumerates every Zod refinement and transform currently in
`src/atlas/types.ts` (the foundational Atlas contract). For each, it
records whether the constraint is **JSON-Schema-expressible** (and therefore
survives `zod-to-json-schema` conversion at orchestrator-shell boot) or
whether it **requires a post-pass** Zod parse after JSON Schema validation
(because it is a runtime predicate / transform that `zod-to-json-schema`
silently drops).

This file is paired with the test `src/__tests__/atlas-refinement-coverage.test.ts`
(T9 per spec §7.9). The test asserts that the refinement count in this doc
matches the count found in source. If you add a new `.refine(...)`,
`.superRefine(...)`, or `.transform(...)` to `src/atlas/types.ts`, you MUST
add a corresponding row here; otherwise T9 fails with a stale-doc message.
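
A minimal sketch of the kind of count-drift check T9 describes, assuming the test counts call sites textually and counts table rows in this file. The real counting logic in `atlas-refinement-coverage.test.ts` is not shown here, and `countRefinementCalls` and `countTableRows` are hypothetical names.

```typescript
// Hypothetical helpers; the real T9 test may count differently.

/** Counts `.refine(`, `.superRefine(`, and `.transform(` call sites in source text. */
function countRefinementCalls(source: string): number {
  const matches = source.match(/\.(?:refine|superRefine|transform)\s*\(/g);
  return matches === null ? 0 : matches.length;
}

/** Counts table data rows, assuming the markdown holds exactly one table. */
function countTableRows(markdown: string): number {
  const rows = markdown
    .split("\n")
    .filter((line) => line.trim().startsWith("|"));
  // rows[0] is the header and rows[1] is the `| --- |` separator.
  return Math.max(0, rows.length - 2);
}

const sampleSource = `
const A = z.object({ s: z.string() }).refine((v) => v.s.length > 0);
const B = A.superRefine(() => {}).transform((v) => v);
`;
const sampleDoc = `
| Refinement | Schema |
| --- | --- |
| one | A |
| two | B |
| three | B |
`;

const inSource = countRefinementCalls(sampleSource);
const inDoc = countTableRows(sampleDoc);
if (inSource !== inDoc) {
  throw new Error(`stale doc: ${inDoc} rows documented, ${inSource} found in source`);
}
```

A purely textual count like this would also pick up matches inside comments or strings, so a real check might need to be stricter.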

## Refinement table

| Refinement | Schema | JSON-Schema-expressible? | Post-pass note |
| ----------------------------------------------------------------------- | --------------------------------------------- | -------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `subsystemHasNoDelimiter` (fragment) | `CandidateFragmentSchema` (line ~145) | No (runtime predicate over a string body) | Rejects when `subsystem` contains `:`, `⟦`, or `⟧`. JSON Schema cannot express a predicate over unicode delimiters as a portable `pattern`. Enforced by the CLI helper's post-pass `CandidateFragmentSchema.parse(input)` step (see spec §4.2.1, STEP 2). |
| `subsystemHasNoDelimiter` (finalized candidate) | `CandidateSchema` (line ~207) | No (runtime predicate) | Same predicate as the fragment row above, applied to the finalized Tier-3 `Candidate` after canonicalization. JSON Schema is not the validation surface for finalized rows — they are validated in TS by `CandidateSchema.parse(...)` — so this lives purely in Zod. |
| `episodic.needsReview === true` | `EpisodicCandidateFragmentSchema` (line ~166) | No (semantic invariant, not structural) | Rejects when `needsReview !== true`. Episodic leaves are "guilty until validated" — the per-family invariant cannot be expressed as a JSON Schema `const` on a `boolean` because the base `CandidateFragmentSchema` permits both values; only the episodic narrowing forbids `false`. Enforced as a SECOND parse via `EpisodicCandidateFragmentSchema` when `sourcetype === "episodic"` (spec §4.6). |
| `episodic.provenance.classification.provenance_class === "derived"` | `EpisodicCandidateFragmentSchema` (line ~173) | No (semantic invariant) | Rejects when `provenance_class !== "derived"`. Episodic leaves can never be `"primary"`. Enforced post-pass via the episodic-narrowed schema. |
| `episodic.provenance.classification.confidence === "low"` | `EpisodicCandidateFragmentSchema` (line ~177) | No (semantic invariant) | Rejects when `confidence !== "low"`. Episodic confidence is clamped to `"low"` by policy. Enforced post-pass via the episodic-narrowed schema. |
| `episodic.provenance.classification.validation_status === "unverified"` | `EpisodicCandidateFragmentSchema` (line ~181) | No (semantic invariant) | Rejects when `validation_status !== "unverified"`. Episodic claims are unverified by construction. Enforced post-pass via the episodic-narrowed schema. |
| `episodic sensitivity floor` (transform) | `EpisodicCandidateFragmentSchema` (line ~190) | No (`.transform` mutates the parsed value; not expressible in JSON Schema) | Coerces `sensitivity === "public"` upward to `"internal"`; `"internal"` / `"proprietary"` / `"secret"` are preserved verbatim. This is a "coerce up to floor" rewrite, NOT a "reject below floor" predicate, so even the JSON-Schema `enum` shape would not catch it (the input is allowed; the value just gets rewritten before persistence). Enforced post-pass via `EpisodicCandidateFragmentSchema.parse(...)`. |
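
To make two of the rows concrete, here is a plain-TypeScript sketch of the delimiter predicate and the sensitivity-floor rewrite, written without Zod so it runs standalone. The names `hasNoDelimiter` and `applySensitivityFloor` are illustrative assumptions; the actual implementations live in `src/atlas/types.ts` and may differ.

```typescript
// Illustrative stand-ins, not the code in src/atlas/types.ts.

type Sensitivity = "public" | "internal" | "proprietary" | "secret";

/** True when `subsystem` contains none of `:`, `⟦`, or `⟧`. */
function hasNoDelimiter(subsystem: string): boolean {
  return !/[:⟦⟧]/.test(subsystem);
}

/** Coerces "public" up to "internal"; every other value passes through unchanged. */
function applySensitivityFloor(value: Sensitivity): Sensitivity {
  return value === "public" ? "internal" : value;
}

const okSubsystem = hasNoDelimiter("atlas-harvest");
const badSubsystem = hasNoDelimiter("atlas:harvest");
const floored = applySensitivityFloor("public");
const kept = applySensitivityFloor("secret");
```

The first function rejects invalid input, while the second accepts every input and may change it, which is the distinction the last table row draws between a predicate and a transform.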

## Summary

- Total refinements / transforms in `src/atlas/types.ts`: **7**
- JSON-Schema-expressible: **0**
- Post-pass required: **7**

All seven entries are runtime predicates or transforms; none survive
`zod-to-json-schema` conversion. The CLI helper at
`atlas harvest write-fragment --stdin` therefore re-parses every fragment
through Zod (`CandidateFragmentSchema.parse` for base fragments, and additionally
`EpisodicCandidateFragmentSchema.parse` when `sourcetype === "episodic"`)
to enforce the six fragment-level entries; the finalized-candidate predicate is
enforced separately by `CandidateSchema.parse(...)`. See spec §4.1.1, §4.2.1,
and §4.6 for the full orchestrator-shell vs CLI-helper split.
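
The re-parse order described above can be sketched as follows, assuming each schema is represented by a function that throws on rejection. `parseBase` and `parseEpisodic` are hypothetical stand-ins for `CandidateFragmentSchema.parse` and `EpisodicCandidateFragmentSchema.parse`, the `Fragment` shape is reduced to a few fields, and only a subset of the checks is modelled.

```typescript
// Hypothetical stand-ins for the two Zod parses; only a few checks are modelled.

interface Fragment {
  sourcetype: string;
  subsystem: string;
  needsReview: boolean;
  sensitivity: "public" | "internal" | "proprietary" | "secret";
}

function parseBase(input: Fragment): Fragment {
  if (/[:⟦⟧]/.test(input.subsystem)) {
    throw new Error("subsystem contains a delimiter");
  }
  return input;
}

function parseEpisodic(input: Fragment): Fragment {
  if (input.needsReview !== true) {
    throw new Error("episodic fragments must set needsReview to true");
  }
  // Sensitivity floor: rewrite the value rather than reject the input.
  const sensitivity = input.sensitivity === "public" ? "internal" : input.sensitivity;
  return { ...input, sensitivity };
}

/** The base parse always runs; the episodic parse runs only for episodic fragments. */
function postPass(input: Fragment): Fragment {
  const base = parseBase(input);
  return base.sourcetype === "episodic" ? parseEpisodic(base) : base;
}

const episodicOut = postPass({
  sourcetype: "episodic",
  subsystem: "harvest",
  needsReview: true,
  sensitivity: "public",
});
```

Because the episodic parse can rewrite the value, a caller following this pattern would persist the returned object, not the original input.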

## Future-edit note

If you add a `.refine(...)`, `.superRefine(...)`, `.transform(...)`, or
`.regex(...)` to `src/atlas/types.ts`, you must:

1. Add a row to the table above describing the constraint, the host
   schema, whether it is JSON-Schema-expressible, and where it is
   enforced.
2. Update the **Summary** counts.
3. Re-run `npx vitest run src/__tests__/atlas-refinement-coverage.test.ts`
   and confirm green.

T9 fails fast on count drift so the silent-drop class of bug (a new
runtime predicate added to `types.ts` but never wired into the CLI
post-pass) is caught at test time, not at first failing leaf.
19 changes: 11 additions & 8 deletions package-lock.json

5 changes: 4 additions & 1 deletion package.json
@@ -72,7 +72,8 @@
 "pgvector": "^0.2.0",
 "simple-git": "^3.27.0",
 "yaml": "^2.8.3",
-"zod": "^3.23.8"
+"zod": "^3.23.8",
+"zod-to-json-schema": "^3.25.2"
 },
 "peerDependencies": {
 "@xenova/transformers": "^2.17.0",
@@ -100,6 +101,8 @@
 "@types/jsdom": "^28.0.1",
 "@types/node": "^25.0.6",
 "@types/pg": "^8.11.10",
+"ajv": "^8.20.0",
+"ajv-formats": "^3.0.1",
 "jsdom": "^28.0.0",
 "tsx": "^4.21.0",
 "typescript": "^5.9.3",
64 changes: 64 additions & 0 deletions runs/fragments/README.md
@@ -0,0 +1,64 @@
# Atlas Harvest — fragment on-disk contract

Fragments under `<runs-dir>/<run-id>/fragments/<stem>.json` are the canonical
durable artifact of a Tier-1 leaf-fleet run. They are the seam between the
agent-orchestration half (the leaf fleet) and the deterministic in-process
half (`atlas harvest run` and downstream tiers).

## On-disk format

One JSON object per file, pretty-printed, validated against
`CandidateFragmentSchema` (in `src/atlas/types.ts`) or, when
`sourcetype: "episodic"`, against `EpisodicCandidateFragmentSchema`, which layers
the four episodic-invariant refinements (`needsReview`, `provenance_class`,
`confidence`, `validation_status`) and the sensitivity-floor transform on top
of the base.

See `scripts/atlas-harvest/leaf-prompt.md` for the field-by-field contract and
worked examples.

## Stem derivation

The file stem can be supplied explicitly via `--stem <stem>` to the
`atlas harvest write-fragment` CLI. When `--stem` is omitted, the stem is
derived from the fragment's canonical-key components, concretely
`claimSlug(<sourcetype>:<subsystem>:claimSlug(claimSlugHint || title))`
(`claimSlugHint` is optional on `CandidateFragmentSchema`; the CLI falls back
to the fragment `title` when no hint is supplied). The stem and the
fragment's `canonical_key` are produced by different functions and yield
different strings: the stem is a filesystem-safe slug, not a copy of the
canonical key. The derivation is nonetheless deterministic across runs, and two
fragments with the same claim text but a different sourcetype or subsystem
never collide.
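
The nested formula above can be sketched as follows. The `claimSlug` shown here is a hypothetical slugifier (lowercase, non-alphanumeric runs collapsed to a dash); the real function is not reproduced in this README and may normalize differently. `deriveStem` and `StemInput` are likewise illustrative names.

```typescript
// `claimSlug` here is a hypothetical slugifier; the real one may normalize differently.
function claimSlug(text: string): string {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse every non-alphanumeric run into one dash
    .replace(/^-+|-+$/g, ""); // trim leading and trailing dashes
}

interface StemInput {
  sourcetype: string;
  subsystem: string;
  title: string;
  claimSlugHint?: string;
}

/** Mirrors claimSlug(<sourcetype>:<subsystem>:claimSlug(claimSlugHint || title)). */
function deriveStem(f: StemInput): string {
  const inner = claimSlug(f.claimSlugHint || f.title);
  return claimSlug(`${f.sourcetype}:${f.subsystem}:${inner}`);
}

// Same title, different subsystem: the stems differ.
const stemA = deriveStem({ sourcetype: "episodic", subsystem: "harvest", title: "Retry on 429" });
// stemA === "episodic-harvest-retry-on-429"
const stemB = deriveStem({ sourcetype: "episodic", subsystem: "ingest", title: "Retry on 429" });
// stemB === "episodic-ingest-retry-on-429"
```

Under this assumed slugifier the `:` separators become dashes, which illustrates why the stem is a different string from the canonical key.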

## Canonical write boundary

In Phase 0, `atlas harvest write-fragment --stdin` is the only supported writer
for this directory. Direct `fs.writeFile` from leaves is deprecated as of
Phase 0: it still works (existing leaves are not broken), but it is no longer
the supported write path, and Phase 1 will remove the leaf-side writer entirely.

The write CLI reads a single fragment JSON from stdin, validates it, and
writes it to `<runs-dir>/<run-id>/fragments/<stem>.json`.

## Schema validation

The CLI Zod-parses the input before writing. Exit-code matrix (spec §4.2.1):

- `0` — success (fragment written; absolute path printed to stdout)
- `1` — stdin/IO failure (bad JSON, unreadable stdin, write error other than EEXIST)
- `2` — stem collision (file already exists)
- `3` — schema validation failure (base `CandidateFragmentSchema` rejected the input)
- `4` — episodic invariant violation (one of `needsReview`/`provenance_class`/`confidence`/`validation_status` failed the episodic refinement)

On failure, stderr carries the underlying Zod or IO error message; the exit
code distinguishes the failure class so the caller (leaf adapter, CI gate) can
route accordingly.
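
A caller-side sketch of routing on the exit-code matrix above. The `FailureClass` labels and the `classifyExit` name are assumptions made for illustration; only the numeric codes come from this README.

```typescript
// Labels are hypothetical; the numeric codes follow the matrix in this README.
type FailureClass =
  | "success"
  | "io"
  | "stem-collision"
  | "schema"
  | "episodic-invariant"
  | "unknown";

/** Maps a write-fragment exit code onto a failure class a caller can switch on. */
function classifyExit(code: number): FailureClass {
  switch (code) {
    case 0:
      return "success";
    case 1:
      return "io";
    case 2:
      return "stem-collision";
    case 3:
      return "schema";
    case 4:
      return "episodic-invariant";
    default:
      return "unknown"; // any code outside the documented matrix
  }
}

const collision = classifyExit(2);
```

Keeping an explicit `"unknown"` branch means a code outside the documented matrix is surfaced rather than mistaken for one of the five documented classes.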

## Atomic create

The CLI creates fragment files EXCLUSIVELY (the underlying open uses the `wx`
flag). A pre-existing file at the same stem yields exit code 2 (`EEXIST`) and
no write occurs — the prior fragment is never silently overwritten.

To re-mint a fragment at the same stem, delete the file first (or run with a
fresh `--run-id`).
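
A minimal sketch of exclusive creation using Node's `fs.writeFileSync` with `flag: "wx"`, which fails with `EEXIST` when the path already exists. `writeFragmentExclusive` is a hypothetical name, the temp directory stands in for a real fragments directory, and the mapping of other errors to exit code 1 follows the matrix in this README, not the CLI's actual source.

```typescript
import { mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

/** Returns 0 on success, 2 when the file already exists, 1 on any other write error. */
function writeFragmentExclusive(path: string, body: string): 0 | 1 | 2 {
  try {
    // "wx" opens for writing but fails if the path exists, so nothing is overwritten.
    writeFileSync(path, body, { encoding: "utf8", flag: "wx" });
    return 0;
  } catch (err) {
    return (err as { code?: string }).code === "EEXIST" ? 2 : 1;
  }
}

// Throwaway directory standing in for <runs-dir>/<run-id>/fragments/.
const dir = mkdtempSync(join(tmpdir(), "fragments-"));
const target = join(dir, "example-stem.json");

const first = writeFragmentExclusive(target, '{"v":1}\n');
const second = writeFragmentExclusive(target, '{"v":2}\n');
const onDisk = readFileSync(target, "utf8");
```

In this sketch the second call reports the collision and the first write's content stays on disk.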
2 changes: 2 additions & 0 deletions scripts/atlas-harvest/blitz-manifest.md
@@ -34,6 +34,8 @@ runs AFTER the fleet, over the fragments this fleet produces.
| `FRAGMENTS_DIR` | Absolute path to `runs/<RUN_ID>/fragments/`. The single write target. |
| `AS_OF` | The harvest "as of" calendar date (`YYYY-MM-DD`) stamped into provenance freshness for sources that lack their own date. |

- Phase-0 canonical write path: pipe fragments through `atlas harvest write-fragment --stdin`. See `runs/fragments/README.md` for the on-disk contract.

## Fragment id convention

Each leaf owns a unique, filesystem-safe, deterministic file stem so parallel