Add --check mode to generate_schemas.py + roundtrip tests (M4) #78
Extends scripts/generate_schemas.py with a `--check` flag that verifies each committed docs/*.json schema is byte-identical to what the live Pydantic models produce. It runs the same post-processing as `write`, so v0.5's stable required-fields contract is preserved. Drift exits non-zero with a unified-diff preview (capped at 40 lines per file) plus the remediation command.

Wires the check into CI before the test step, so a Pydantic edit that forgets to regenerate fails fast with an actionable message.

Refactors each `write_X_schema()` into a pure `build_X_schema() -> (Path, str)` and a thin write wrapper using a new `_emit()` helper. Tests call the builders directly via importlib.util (no subprocess), so a model edit that breaks the roundtrip is caught locally before CI.

tests/test_schema_roundtrip.py (7 tests):
- per-schema roundtrip for manifest, report, packet, checks catalog;
- end-to-end `--check` exits 0 on a clean repo;
- negative control: synthetic drift triggers exit 1 with diff preview;
- builder purity: deterministic, returns (Path, str), trailing newline.

CONTRIBUTING.md documents the model-edit → regen → commit workflow.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The initial M4 commit (94d5c63) included three required-field additions that referenced future fields not present on main:
- ReleaseDecision.required gained "contribution_rules" (M8 audit)
- A new ContributionRule.required block was added (M8)
- loaded_plugins.required gained "validation_status", "validation_errors", "runtime_errors" (M5 plugin validation)

These came from concurrent in-flight changes that contaminated the edit view of scripts/generate_schemas.py before commit. None of those fields exist on main's ReleaseDecision or loaded_plugins payloads, so `python scripts/generate_schemas.py --check` correctly reported drift in docs/report-schema.v0.16.json on CI.

This commit restores the post-processing required lists to exactly what main has, so the M4 mechanism (--check, builder/write split, roundtrip tests) is the only contract change in this PR. The M5/M8 additions belong with their respective model changes in a future PR.

Verified:
- `python scripts/generate_schemas.py --check` exits 0 against main's models;
- all 7 tests in tests/test_schema_roundtrip.py pass;
- `git diff origin/main -- scripts/generate_schemas.py` shows only mechanism changes (helpers, --check flag, build/write split, BUILDERS, argparse).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
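The drift here came from pinned `required` lists naming fields that the models on main do not have. As a hypothetical hardening (not part of this PR), the post-processing step could refuse such pins outright. The sketch assumes Pydantic v2-style `$defs` and a `pins` mapping from definition name to required list; `pin_required` is an illustrative name, and the real `build_report_schema()` post-processing may be structured differently:

```python
import copy
from typing import Any, Dict, List


def pin_required(schema: Dict[str, Any], pins: Dict[str, List[str]]) -> Dict[str, Any]:
    """Return a copy of schema with each pinned definition's `required` list frozen.

    Hypothetical guard: refuse to pin a name that the live model does not expose
    as a property, so a field from an unmerged branch fails here instead of
    surfacing later as CI drift.
    """
    out = copy.deepcopy(schema)
    defs = out.get("$defs", {})  # assumes Pydantic v2-style "$defs"
    for def_name, required in pins.items():
        if def_name not in defs:
            raise KeyError(f"schema has no definition named {def_name!r}")
        props = defs[def_name].get("properties", {})
        unknown = sorted(set(required) - set(props))
        if unknown:
            raise ValueError(f"{def_name}.required references unknown fields: {unknown}")
        defs[def_name]["required"] = list(required)
    return out
```

Under those assumptions, a stray `"contribution_rules"` pin would fail at generation time with the offending field named, rather than appearing as a drift diff on CI.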
Pushed fix
Root cause: the initial M4 commit included three required-field additions in the report-schema post-processing that referenced future fields not on main:
These came from concurrent in-flight M5/M8 changes that contaminated the edit view of
Fix: the strip commit restores the three sections to exactly what
Verification:
The push was a fast-forward (
The reviewer flagged that ``build_checks_catalog()`` called
``check_catalog()`` without an explicit ``plugins_enabled=False``.
With ``AGENTS_SHIPGATE_ENABLE_PLUGINS=1`` plus any third-party check
plugin installed on the host, the default ``check_catalog()`` resolves
plugins from entry points and includes their metadata in the result.
``--check`` would then either:
- falsely flag drift in the committed built-in-only
``docs/checks.json``, or
- on a ``write`` run, silently overwrite the committed catalog with
a plugin-augmented one.
Either path breaks the "deterministic artifact, regardless of host
environment" guarantee that the M4 mechanism is supposed to provide.
Fix: pass ``plugins_enabled=False`` explicitly. This is the same value
the implicit default resolves to on a clean machine, but it is immune
to environment contamination.
Regression test ``test_checks_catalog_ignores_enabled_plugins``:
- installs a fake plugin entry point with a distinctive check_id;
- sets ``AGENTS_SHIPGATE_ENABLE_PLUGINS=1`` via monkeypatch;
- cross-checks the threat by calling ``registry.check_catalog()``
directly and asserting the canary IS present (so the test fails
loudly if the upstream plugin path ever stops loading — guards
against vacuous passes);
- then asserts ``build_checks_catalog()`` output is byte-identical
to a clean run with plugins env unset and does not contain the
canary check_id.
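The fix and its regression test can be sketched together in self-contained form. Everything here is a stand-in: `check_catalog` is a stub whose env-var handling (`== "1"`) is an assumption, the catalog entries are invented, and `unittest.mock.patch.dict` replaces pytest's `monkeypatch` so the snippet runs without pytest. Only the env var name, `plugins_enabled=False`, the canary idea, and the test name come from the commit message:

```python
import json
import os
from typing import Dict, List, Optional
from unittest import mock

ENV_VAR = "AGENTS_SHIPGATE_ENABLE_PLUGINS"
CANARY = "canary-plugin-check"  # distinctive id that only the fake plugin provides


def _fake_plugin_checks() -> List[Dict[str, str]]:
    # Stand-in for a third-party check discovered via entry points.
    return [{"check_id": CANARY}]


def check_catalog(plugins_enabled: Optional[bool] = None) -> List[Dict[str, str]]:
    # Hypothetical registry stub: when the kwarg is omitted, the environment decides.
    if plugins_enabled is None:
        plugins_enabled = os.environ.get(ENV_VAR) == "1"
    checks = [{"check_id": "builtin-example"}]
    return (checks + _fake_plugin_checks()) if plugins_enabled else checks


def build_checks_catalog() -> str:
    # The fix: pin the kwarg so the artifact never depends on the host environment.
    return json.dumps(check_catalog(plugins_enabled=False), indent=2) + "\n"


def test_checks_catalog_ignores_enabled_plugins() -> None:
    with mock.patch.dict(os.environ):
        os.environ.pop(ENV_VAR, None)
        clean = build_checks_catalog()
    with mock.patch.dict(os.environ, {ENV_VAR: "1"}):
        # Cross-check the threat first, so the test cannot pass vacuously.
        assert CANARY in {c["check_id"] for c in check_catalog()}
        contaminated = build_checks_catalog()
    assert contaminated == clean
    assert CANARY not in contaminated
```

The canary assertion inside the second `with` block is what prevents a vacuous pass: if plugin discovery silently broke, the byte-identity assertion alone would still succeed.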
The docstring and description string are unchanged; only the build
call site gets the explicit kwarg, so the generated artifact stays
byte-identical to ``main``'s committed ``docs/checks.json``.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pushed Fix:
The committed artifact is now a pure function of the built-in catalog, regardless of host environment.
Regression test:
Notes:
PR commit chain:
Ready for re-review.
Summary
- Trust-hardening M4: extend `scripts/generate_schemas.py` with a `--check` flag that verifies committed `docs/*.json` schemas are byte-identical to what the live Pydantic models produce. Drift fails fast in CI with a unified-diff preview and a remediation command.
- Refactors each `write_X_schema()` into a pure `build_X_schema() -> (Path, str)` plus a thin write wrapper using a new `_emit()` helper. Tests call the builders directly via `importlib.util` (no subprocess; the builders themselves do no I/O).
- `tests/test_schema_roundtrip.py` (7 tests, all passing): per-schema roundtrip + end-to-end `--check` exit-0 + negative-control drift detection + builder purity invariant.
- CI runs `python scripts/generate_schemas.py --check` before the test step, so a Pydantic edit that forgets to regenerate fails in seconds with a clear diff instead of being discovered downstream.
- `CONTRIBUTING.md` documents the model-edit → regen → commit workflow with the exact commands.

Why this matters
Before this PR, the `docs/*-schema.v*.json` artifacts were generated and committed, but their parity with the Pydantic models was enforced only by a docstring note ("CI calls this script and asserts the working tree is clean afterward") that was never actually wired into CI. M4 makes that contract real and structural.

This is the foundation slice of the 30-day Trust Hardening Pass. It lands first so that subsequent slices (severity floor, baseline integrity, plugin validation) can safely bump `report_schema_version` or extend `CheckMetadata`, knowing that any forgotten regen will fail CI loudly.

What's preserved
- `build_report_schema()` is untouched. The v0.5 `required` lists, the constant pinning on `schema_version`/`report_schema_version`, the inline-enum tightening for `agent_action`/`provenance_kind`, and the per-framework `frameworks.{google_adk,langchain,crewai}` blocks all stay verbatim.
- `git status` is clean).

Files
- `scripts/generate_schemas.py`: `--check` mode, builder/write split, `_emit()` helper, argparse with `--help`, `BUILDERS` tuple as single source of truth.
- `tests/test_schema_roundtrip.py`: 7 new tests.
- `.github/workflows/ci.yml`: one new step before `Test`.
- `CONTRIBUTING.md`: new "Schema Changes" section.

Test plan
- `python scripts/generate_schemas.py` writes byte-identical artifacts on a clean repo.
- `python scripts/generate_schemas.py --check` exits 0 on a clean repo.
- With synthetic drift, `--check` exits 1 with a unified-diff preview, the `(X more diff lines truncated)` suffix, and the remediation command.
- `pytest tests/test_schema_roundtrip.py`: 7 passed.
- `pytest tests/test_cli.py tests/test_schema_roundtrip.py tests/test_inputs.py tests/test_findings.py`: 94 passed.
- `ruff check scripts/generate_schemas.py tests/test_schema_roundtrip.py`: clean.
- `python scripts/generate_schemas.py --help` renders correctly.

🤖 Generated with Claude Code
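The summary notes that the roundtrip tests load the script through `importlib.util` rather than spawning a subprocess. A minimal sketch of that loading pattern, assuming builders take no arguments and return `(Path, str)` as described; `load_script` and `assert_roundtrip` are hypothetical helpers, not taken from `tests/test_schema_roundtrip.py`:

```python
import importlib.util
import sys
from pathlib import Path
from types import ModuleType


def load_script(path: Path, name: str = "generate_schemas") -> ModuleType:
    """Import a standalone script by file path so tests can call its builders in-process."""
    spec = importlib.util.spec_from_file_location(name, str(path))
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot build an import spec for {path}")
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module  # register before exec so the module can be resolved by name
    spec.loader.exec_module(module)
    return module


def assert_roundtrip(module: ModuleType, builder_name: str) -> None:
    """Fail if the committed file differs from what the named builder produces."""
    path, text = getattr(module, builder_name)()
    committed = path.read_bytes().decode("utf-8")
    assert committed == text, f"{path} is stale; regenerate it and commit the result"
```

Because the script is imported in-process, a builder that raises shows up as an ordinary Python traceback in the test output rather than as a bare non-zero exit code from a child process.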