Add --check mode to generate_schemas.py + roundtrip tests (M4)#78

Merged
pengfei-threemoonslab merged 3 commits into main from claude/lucid-lamarr-e77055 on May 16, 2026

Conversation

@pengfei-threemoonslab
Contributor

Summary

Trust-hardening M4: extend scripts/generate_schemas.py with a --check flag that verifies committed docs/*.json schemas are byte-identical to what the live Pydantic models produce. Drift fails fast in CI with a unified-diff preview and a remediation command.

  • Refactor each write_X_schema() into a pure build_X_schema() -> (Path, str) plus a thin write wrapper using a new _emit() helper. Tests call the builders directly via importlib.util — no subprocess, no I/O.
  • New tests/test_schema_roundtrip.py (7 tests, all passing): per-schema roundtrip + end-to-end --check exit-0 + negative-control drift detection + builder purity invariant.
  • New CI step Verify generated schemas are up to date runs python scripts/generate_schemas.py --check before the test step, so a Pydantic edit that forgets to regenerate fails in seconds with a clear diff instead of being discovered downstream.
  • CONTRIBUTING.md documents the model-edit → regen → commit workflow with the exact commands.
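
A minimal sketch of the builder/write split described above, not the script's actual code. The `docs_dir` parameter, the `manifest-schema.json` file name, and the `_model_schema()` stand-in are assumptions made so the example runs on its own; the PR only states that builders return `(Path, str)` and that writes go through `_emit()`.

```python
import json
from pathlib import Path


def _model_schema() -> dict:
    # Hypothetical stand-in for the schema a live Pydantic model would produce.
    return {
        "title": "Manifest",
        "type": "object",
        "properties": {"name": {"type": "string"}},
    }


def build_manifest_schema(docs_dir: Path) -> tuple[Path, str]:
    """Pure builder: compute the target path and exact text, with no I/O."""
    text = json.dumps(_model_schema(), indent=2, sort_keys=True) + "\n"
    return docs_dir / "manifest-schema.json", text


def _emit(path: Path, text: str) -> None:
    """The only function in this sketch that touches the filesystem."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write bytes so newlines are not translated on any platform.
    path.write_bytes(text.encode("utf-8"))


def write_manifest_schema(docs_dir: Path) -> Path:
    """Thin wrapper: build, then emit."""
    path, text = build_manifest_schema(docs_dir)
    _emit(path, text)
    return path
```

Because the builder performs no I/O, a test can compare its returned text against the committed file without creating or modifying anything on disk.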

Why this matters

Before this PR, the docs/*-schema.v*.json artifacts were generated and committed, but their parity with the Pydantic models was enforced only by a docstring note ("CI calls this script and asserts the working tree is clean afterward") that was never actually wired into CI. M4 makes that contract real and structural.

This is the foundation slice of the 30-day Trust Hardening Pass — landing first so subsequent slices (severity floor, baseline integrity, plugin validation) can safely bump report_schema_version or extend CheckMetadata with the knowledge that any forgotten regen will fail CI loudly.

What's preserved

  • The existing post-processing in build_report_schema() is untouched. The v0.5 required lists, the constant pinning on schema_version / report_schema_version, the inline-enum tightening for agent_action / provenance_kind, the per-framework frameworks.{google_adk,langchain,crewai} blocks — all stay verbatim.
  • The script's write-mode behavior is byte-identical to before (verified by re-running and asserting git status is clean).
  • No model files, no public-surface docs, no other tests touched.
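
For readers unfamiliar with this kind of post-processing, here is a rough illustration of pinning a version field to a constant and setting an explicit `required` list on a generated JSON Schema. The `tighten()` helper, the field names other than `schema_version`, and the `"0.0-example"` value are hypothetical; the real logic lives in `build_report_schema()` and is not reproduced here.

```python
import copy


def tighten(schema: dict, pins: dict[str, str], required: list[str]) -> dict:
    """Return a copy of `schema` with pinned constants and a fixed required list."""
    out = copy.deepcopy(schema)  # never mutate the model-derived schema
    for name, value in pins.items():
        # "const" restricts the property to exactly one allowed value.
        out["properties"][name]["const"] = value
    out["required"] = list(required)
    return out


# Hypothetical raw schema, shaped like what a model export might produce.
raw = {
    "type": "object",
    "properties": {
        "schema_version": {"type": "string"},
        "findings": {"type": "array"},
    },
}

tight = tighten(raw, {"schema_version": "0.0-example"}, ["schema_version", "findings"])
```

Working on a deep copy keeps the step repeatable: running it twice on the same input yields the same output, which is what a byte-for-byte check depends on.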

Files

  • scripts/generate_schemas.py: --check mode, builder/write split, _emit() helper, argparse with --help, BUILDERS tuple as single source of truth.
  • tests/test_schema_roundtrip.py — 7 new tests.
  • .github/workflows/ci.yml — one new step before Test.
  • CONTRIBUTING.md — new Schema Changes section.
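
One way a `BUILDERS` tuple can drive both modes from a single loop is sketched below. Only the names `BUILDERS` and `--check` come from this PR; `build_example_schema`, the `docs/example-schema.json` path, and the output wording are assumptions for illustration.

```python
import argparse
from pathlib import Path
from typing import Callable, Optional

Builder = Callable[[], tuple[Path, str]]


def build_example_schema() -> tuple[Path, str]:
    # Hypothetical builder; a real one would derive the text from a model.
    return Path("docs/example-schema.json"), '{"type": "object"}\n'


# Single source of truth: write mode and check mode iterate the same tuple.
BUILDERS: tuple[Builder, ...] = (build_example_schema,)


def main(argv: Optional[list[str]] = None) -> int:
    parser = argparse.ArgumentParser(description="Generate or verify committed schemas.")
    parser.add_argument("--check", action="store_true",
                        help="verify committed files instead of writing them")
    args = parser.parse_args(argv)

    drifted = []
    for build in BUILDERS:
        path, text = build()
        expected = text.encode("utf-8")
        if args.check:
            # A missing file counts as drift, same as differing bytes.
            if not path.exists() or path.read_bytes() != expected:
                drifted.append(path)
        else:
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_bytes(expected)

    for path in drifted:
        print(f"drift detected: {path}")
    return 1 if drifted else 0
```

Adding a schema then means adding one builder to the tuple, and both modes pick it up without further changes.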

Test plan

  • python scripts/generate_schemas.py writes byte-identical artifacts on a clean repo.
  • python scripts/generate_schemas.py --check exits 0 on a clean repo.
  • After manually editing a committed schema, --check exits 1 with a unified-diff preview, the (X more diff lines truncated) suffix, and the remediation command.
  • pytest tests/test_schema_roundtrip.py — 7 passed.
  • pytest tests/test_cli.py tests/test_schema_roundtrip.py tests/test_inputs.py tests/test_findings.py — 94 passed.
  • ruff check scripts/generate_schemas.py tests/test_schema_roundtrip.py — clean.
  • python scripts/generate_schemas.py --help renders correctly.
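
The truncated diff preview mentioned in the test plan could be produced along these lines. This is a sketch using the standard library's `difflib`; the `diff_preview` name and the file labels are assumptions, while the 40-line cap and the suffix wording follow the commit message and test plan.

```python
import difflib

MAX_DIFF_LINES = 40  # per-file cap stated in the commit message


def diff_preview(committed: str, generated: str, name: str,
                 limit: int = MAX_DIFF_LINES) -> str:
    """Return a unified diff of at most `limit` lines plus a truncation note."""
    lines = list(difflib.unified_diff(
        committed.splitlines(),
        generated.splitlines(),
        fromfile=f"{name} (committed)",
        tofile=f"{name} (generated)",
        lineterm="",
    ))
    shown = lines[:limit]
    hidden = len(lines) - len(shown)
    if hidden > 0:
        shown.append(f"({hidden} more diff lines truncated)")
    return "\n".join(shown)
```

Identical inputs produce an empty string, so a caller can treat any non-empty preview as drift.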

🤖 Generated with Claude Code

pengfei-threemoonslab and others added 2 commits May 15, 2026 22:13
Extends scripts/generate_schemas.py with a `--check` flag that verifies
each committed docs/*.json schema is byte-identical to what the live
Pydantic models produce — running the same post-processing as `write`,
so v0.5's stable required-fields contract stays preserved.

Drift exits non-zero with a unified-diff preview capped at 40 lines per
file, plus the remediation command. Wires the check in CI before the
test step so a Pydantic edit that forgets to regenerate fails fast with
an actionable message.

Refactors each `write_X_schema()` into a pure `build_X_schema() ->
(Path, str)` and a thin write wrapper using a new `_emit()` helper.
Tests call the builders directly via importlib.util — no subprocess —
so a model edit failing the roundtrip is caught locally before CI.

tests/test_schema_roundtrip.py (7 tests):
- per-schema roundtrip for manifest, report, packet, checks catalog;
- end-to-end `--check` exits 0 on a clean repo;
- negative control: synthetic drift triggers exit 1 with diff preview;
- builder purity: deterministic, returns (Path, str), trailing newline.
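
Loading a standalone script as a module with importlib.util, so its builders can be called in-process, might look like the following sketch. The `load_script` helper and the throwaway script contents are hypothetical; only the use of importlib.util instead of a subprocess comes from this commit.

```python
import importlib.util
import tempfile
from pathlib import Path
from types import ModuleType


def load_script(path: Path, name: str = "generate_schemas") -> ModuleType:
    """Import a .py file that is not on sys.path and return it as a module."""
    spec = importlib.util.spec_from_file_location(name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


# Demo with a throwaway script standing in for scripts/generate_schemas.py.
with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "generate_schemas.py"
    script.write_text(
        "from pathlib import Path\n"
        "def build_example_schema():\n"
        "    return Path('docs/example.json'), '{}\\n'\n",
        encoding="utf-8",
    )
    generator = load_script(script)
    path, text = generator.build_example_schema()
```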

CONTRIBUTING.md documents the model-edit → regen → commit workflow.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The initial M4 commit (94d5c63) included three required-field
additions that referenced future fields not present on main:

- ReleaseDecision.required gained "contribution_rules" (M8 audit)
- A new ContributionRule.required block was added (M8)
- loaded_plugins.required gained "validation_status",
  "validation_errors", "runtime_errors" (M5 plugin validation)

These came from concurrent in-flight changes that contaminated the
edit view of scripts/generate_schemas.py before commit. None of those
fields exist on main's ReleaseDecision or loaded_plugins payloads,
so `python scripts/generate_schemas.py --check` correctly reported
drift in docs/report-schema.v0.16.json on CI.

This commit restores the post-processing required lists to exactly
what main has, so the M4 mechanism (--check, builder/write split,
roundtrip tests) is the only contract change in this PR. The M5/M8
additions belong with their respective model changes in a future PR.

Verified: `python scripts/generate_schemas.py --check` exits 0
against main's models; tests/test_schema_roundtrip.py all 7 tests
pass; `git diff origin/main -- scripts/generate_schemas.py` shows
only mechanism changes (helpers, --check flag, build/write split,
BUILDERS, argparse).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@pengfei-threemoonslab
Contributor Author

Pushed fix 630d346 addressing the review.

Root cause: the initial M4 commit included three required-field additions in the report-schema post-processing that referenced future fields not on main:

  • ReleaseDecision.required had contribution_rules (M8)
  • A new ContributionRule.required block (M8)
  • loaded_plugins.required had validation_status, validation_errors, runtime_errors (M5)

These came from concurrent in-flight M5/M8 changes that contaminated the edit view of scripts/generate_schemas.py before commit. Locally --check passed against the (also-contaminated) untracked docs/report-schema.v0.17.json — but on CI the comparison is against committed docs/report-schema.v0.16.json which correctly has none of those fields, producing the drift you saw.

Fix: the strip commit restores the three sections to exactly what origin/main has. The diff from main now contains only M4 mechanism (--check flag, builder/write split, _emit(), BUILDERS, argparse) — no field-list additions for unimplemented models.

Verification:

  • git diff origin/main -- scripts/generate_schemas.py | grep -E '^[+-]' | grep -vE '^(\+\+\+|---)' shows only mechanism diffs, no future-field strings.
  • python scripts/generate_schemas.py --check exits 0 against main's models.
  • pytest tests/test_schema_roundtrip.py — 7 passed.

The push was a fast-forward (94d5c63..630d346), no force-push, no history rewrite. Ready for re-review.

The reviewer flagged that ``build_checks_catalog()`` called
``check_catalog()`` without an explicit ``plugins_enabled=False``.
With ``AGENTS_SHIPGATE_ENABLE_PLUGINS=1`` plus any third-party check
plugin installed on the host, the default ``check_catalog()`` resolves
plugins from entry points and includes their metadata in the result.
``--check`` would then either:

  - falsely flag drift in the committed built-in-only
    ``docs/checks.json``, or
  - on a ``write`` run, silently overwrite the committed catalog with
    a plugin-augmented one.

Either path breaks the "deterministic artifact, regardless of host
environment" guarantee that the M4 mechanism is supposed to provide.

Fix: pass ``plugins_enabled=False`` explicitly. Same value the
implicit default would have on a clean machine, but immune to env
contamination.

Regression test ``test_checks_catalog_ignores_enabled_plugins``:

  - installs a fake plugin entry point with a distinctive check_id;
  - sets ``AGENTS_SHIPGATE_ENABLE_PLUGINS=1`` via monkeypatch;
  - cross-checks the threat by calling ``registry.check_catalog()``
    directly and asserting the canary IS present (so the test fails
    loudly if the upstream plugin path ever stops loading — guards
    against vacuous passes);
  - then asserts ``build_checks_catalog()`` output is byte-identical
    to a clean run with plugins env unset and does not contain the
    canary check_id.

Docstring/description string unchanged — only the build call site
gets the explicit kwarg, so the generated artifact stays byte-
identical to ``main``'s committed ``docs/checks.json``.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@pengfei-threemoonslab
Contributor Author

Pushed cd1b1cf addressing the P3 finding.

Fix: build_checks_catalog() now calls check_catalog(plugins_enabled=False) explicitly. Previously the implicit default fell back to env-var lookup, so AGENTS_SHIPGATE_ENABLE_PLUGINS=1 plus any installed plugin would:

  • import third-party code at generator time, and
  • either falsely flag drift against the built-in-only committed docs/checks.json, or — on a write run — silently overwrite the committed catalog with a plugin-augmented one.

The committed artifact is now a pure function of the built-in catalog, regardless of host environment.

Regression test: tests/test_schema_roundtrip.py::test_checks_catalog_ignores_enabled_plugins:

  1. installs a fake plugin entry point with a distinctive check_id (PLUGIN-DETERMINISM-CANARY);
  2. sets AGENTS_SHIPGATE_ENABLE_PLUGINS=1 via monkeypatch.setenv;
  3. threat-model cross-check — calls registry.check_catalog() directly and asserts the canary IS present, so the regression below cannot pass vacuously if the upstream plugin path ever stops loading;
  4. asserts generator.build_checks_catalog() output is byte-identical to a clean run with plugins env unset and does not contain the canary check_id.
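
The four steps above can be sketched in a self-contained form. Everything here is a stand-in: the `check_catalog` below only imitates an env-var fallback, the built-in check ID is invented, and `unittest.mock.patch.dict` replaces pytest's `monkeypatch` so the example runs without pytest or a real plugin entry point.

```python
import json
import os
from typing import Optional
from unittest import mock

ENV_FLAG = "AGENTS_SHIPGATE_ENABLE_PLUGINS"
CANARY = "PLUGIN-DETERMINISM-CANARY"
BUILTIN = [{"check_id": "BUILTIN-EXAMPLE"}]  # hypothetical built-in check
PLUGIN = [{"check_id": CANARY}]              # fake plugin contribution


def check_catalog(plugins_enabled: Optional[bool] = None) -> list[dict]:
    """Stand-in registry: with no explicit value, fall back to the env flag."""
    if plugins_enabled is None:
        plugins_enabled = os.environ.get(ENV_FLAG) == "1"
    return BUILTIN + PLUGIN if plugins_enabled else list(BUILTIN)


def build_checks_catalog() -> str:
    # The explicit kwarg makes the output independent of the environment.
    return json.dumps(check_catalog(plugins_enabled=False), indent=2) + "\n"


def test_checks_catalog_ignores_enabled_plugins() -> None:
    # Baseline: build with the flag unset (patch.dict restores it afterwards).
    with mock.patch.dict(os.environ):
        os.environ.pop(ENV_FLAG, None)
        clean = build_checks_catalog()

    with mock.patch.dict(os.environ, {ENV_FLAG: "1"}):
        # Cross-check: the default path really does load the plugin,
        # so the assertions below cannot pass vacuously.
        ids = [c["check_id"] for c in check_catalog()]
        assert CANARY in ids
        dirty = build_checks_catalog()

    assert dirty == clean
    assert CANARY not in dirty
```

The cross-check inside the second `with` block is the part that keeps the test honest: if the plugin path stopped loading, the test would fail there instead of passing for the wrong reason.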

Notes:

  • Description string in the catalog payload is unchanged — only the build call site gets the explicit kwarg — so the generated docs/checks.json stays byte-identical to main's committed file. No artifact regen included in this PR.
  • Pushed as fast-forward 630d346..cd1b1cf — no history rewrite.
  • All 8 roundtrip tests pass locally; --check exits 0 against main's models.

PR commit chain:

  • 94d5c63 Add --check mode + roundtrip tests (M4)
  • 630d346 Strip M5/M8 contamination (P1 fix)
  • cd1b1cf Force plugins_enabled=False (P3 fix)

Ready for re-review.

pengfei-threemoonslab merged commit 1ca14d9 into main on May 16, 2026
1 check passed
pengfei-threemoonslab deleted the claude/lucid-lamarr-e77055 branch May 16, 2026 05:38