Add --check mode to generate_schemas.py + roundtrip tests (M4) by pengfei-threemoonslab · Pull Request #78 · ThreeMoonsLab/agents-shipgate

pengfei-threemoonslab · 2026-05-16T05:13:59Z

Summary

Trust-hardening M4: extend scripts/generate_schemas.py with a --check flag that verifies committed docs/*.json schemas are byte-identical to what the live Pydantic models produce. Drift fails fast in CI with a unified-diff preview and a remediation command.

Refactor each write_X_schema() into a pure build_X_schema() -> (Path, str) plus a thin write wrapper using a new _emit() helper. Tests call the builders directly via importlib.util — no subprocess, no I/O.
New tests/test_schema_roundtrip.py (7 tests, all passing): per-schema roundtrip + end-to-end --check exit-0 + negative-control drift detection + builder purity invariant.
New CI step "Verify generated schemas are up to date" runs python scripts/generate_schemas.py --check before the test step, so a Pydantic model edit that forgets to regenerate fails in seconds with a clear diff instead of being discovered downstream.
CONTRIBUTING.md documents the model-edit → regen → commit workflow with the exact commands.
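The builder/write split and the --check comparison described above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: build_example_schema, the inline schema dict, the check() function, and the docs_dir parameter are assumptions made for the example, while _emit and BUILDERS reuse names from the PR description.

```python
import json
import tempfile
from pathlib import Path


def build_example_schema(docs_dir: Path) -> tuple[Path, str]:
    """Pure builder: returns the target path and serialized text, no I/O."""
    # Stand-in for a schema produced from a Pydantic model (assumption).
    schema = {"title": "Example", "type": "object", "required": ["id"]}
    # Sorted keys plus a trailing newline keep the output deterministic.
    text = json.dumps(schema, indent=2, sort_keys=True) + "\n"
    return docs_dir / "example-schema.json", text


def _emit(path: Path, text: str) -> None:
    """Thin write helper: the only place that touches the filesystem."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")


def check(builders, docs_dir: Path) -> int:
    """Return 0 when every committed file matches its builder, else 1."""
    drifted = []
    for build in builders:
        path, expected = build(docs_dir)
        actual = path.read_text(encoding="utf-8") if path.exists() else None
        if actual != expected:
            drifted.append(path)
    return 1 if drifted else 0


BUILDERS = (build_example_schema,)

with tempfile.TemporaryDirectory() as tmp:
    docs = Path(tmp) / "docs"
    missing_status = check(BUILDERS, docs)  # nothing written yet
    for build in BUILDERS:
        _emit(*build(docs))
    clean_status = check(BUILDERS, docs)  # freshly generated files
    target, _ = build_example_schema(docs)
    target.write_text("{}\n", encoding="utf-8")
    drift_status = check(BUILDERS, docs)  # hand-edited file
```

Because the builders never write, a test can compare their return value against the committed file directly, and the write path and the check path cannot diverge in how they serialize.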

Why this matters

Before this PR, the docs/*-schema.v*.json artifacts were generated and committed, but their parity with the Pydantic models was enforced only by a docstring note ("CI calls this script and asserts the working tree is clean afterward") describing a CI step that was never actually wired up. M4 makes that contract real by enforcing it structurally in CI.

This is the foundation slice of the 30-day Trust Hardening Pass — landing first so subsequent slices (severity floor, baseline integrity, plugin validation) can safely bump report_schema_version or extend CheckMetadata with the knowledge that any forgotten regen will fail CI loudly.

What's preserved

The existing post-processing in build_report_schema() is untouched. The v0.5 required lists, the constant pinning on schema_version / report_schema_version, the inline-enum tightening for agent_action / provenance_kind, the per-framework frameworks.{google_adk,langchain,crewai} blocks — all stay verbatim.
The script's write-mode behavior is byte-identical to before (verified by re-running and asserting git status is clean).
No model files, no public-surface docs, no other tests touched.
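For readers unfamiliar with this kind of post-processing, here is a rough sketch of what pinning a version field to a constant and overriding a required list can look like on a plain JSON Schema dict. The helpers pin_const and set_required, the sample schema, and the value "0.1" are hypothetical; the actual build_report_schema() logic is not shown in this PR and may differ.

```python
import copy


def pin_const(schema: dict, field: str, value: str) -> dict:
    """Return a copy of schema with properties[field] pinned to one value."""
    out = copy.deepcopy(schema)
    # JSON Schema's "const" keyword restricts a value to exactly one option.
    out["properties"][field] = {"type": "string", "const": value}
    return out


def set_required(schema: dict, fields: list[str]) -> dict:
    """Return a copy of schema with an explicit required list."""
    out = copy.deepcopy(schema)
    out["required"] = list(fields)
    return out


# Hypothetical raw schema, standing in for model-generated output.
raw = {
    "type": "object",
    "properties": {
        "schema_version": {"type": "string"},
        "findings": {"type": "array"},
    },
}

# "0.1" is a placeholder value, not a version taken from the project.
pinned = set_required(pin_const(raw, "schema_version", "0.1"), ["schema_version"])
```

Returning copies rather than mutating in place keeps each step easy to test in isolation, which fits the builder purity goal stated in the summary.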

Files

scripts/generate_schemas.py — --check mode, builder/write split, _emit() helper, argparse with --help, BUILDERS tuple as single source of truth.
tests/test_schema_roundtrip.py — 7 new tests.
.github/workflows/ci.yml — one new step before Test.
CONTRIBUTING.md — new Schema Changes section.
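The tests are described as calling the builders via importlib.util rather than a subprocess. A minimal sketch of that loading pattern is below; the load_script helper, the module name, and the stand-in script with its build_demo_schema function are all assumptions for illustration, since scripts/ is typically not an importable package.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_script(path: Path, name: str = "generate_schemas_under_test"):
    """Import a standalone script file as a module without running it as __main__."""
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


# Hypothetical stand-in script; the real one lives at scripts/generate_schemas.py.
SCRIPT = '''
from pathlib import Path

def build_demo_schema():
    return Path("docs/demo.json"), '{"type": "object"}\\n'

if __name__ == "__main__":
    raise SystemExit("not reached when imported")
'''

with tempfile.TemporaryDirectory() as tmp:
    script_path = Path(tmp) / "generate_demo.py"
    script_path.write_text(SCRIPT, encoding="utf-8")
    generator = load_script(script_path)
    # Builders are pure, so calling them twice should give equal results.
    path, text = generator.build_demo_schema()
    second = generator.build_demo_schema()
```

Loading the file this way leaves the `if __name__ == "__main__"` block untouched, so a test can exercise individual builders without triggering argument parsing or any writes.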

Test plan

python scripts/generate_schemas.py writes byte-identical artifacts on a clean repo.
python scripts/generate_schemas.py --check exits 0 on a clean repo.
Manually editing a committed schema and running --check exits 1 with a unified-diff preview, the (X more diff lines truncated) suffix, and the remediation command.
pytest tests/test_schema_roundtrip.py — 7 passed.
pytest tests/test_cli.py tests/test_schema_roundtrip.py tests/test_inputs.py tests/test_findings.py — 94 passed.
ruff check scripts/generate_schemas.py tests/test_schema_roundtrip.py — clean.
python scripts/generate_schemas.py --help renders correctly.
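The drift output in the test plan (a unified-diff preview with a truncation suffix) could be produced along these lines. This is a sketch under assumptions: the diff_preview function and its parameters are hypothetical, the 40-line cap is taken from the commit message, and the suffix wording mirrors the one quoted in the test plan.

```python
import difflib

MAX_DIFF_LINES = 40  # per-file cap stated in the commit message


def diff_preview(committed: str, generated: str, name: str,
                 limit: int = MAX_DIFF_LINES) -> str:
    """Return a unified diff, truncated to `limit` lines plus a suffix."""
    lines = list(
        difflib.unified_diff(
            committed.splitlines(),
            generated.splitlines(),
            fromfile=f"{name} (committed)",
            tofile=f"{name} (generated)",
            lineterm="",
        )
    )
    shown = lines[:limit]
    hidden = len(lines) - len(shown)
    if hidden > 0:
        shown.append(f"({hidden} more diff lines truncated)")
    return "\n".join(shown)


# Every line differs, so the full diff is far longer than the cap.
committed = "\n".join(f"line {i}" for i in range(100))
generated = "\n".join(f"LINE {i}" for i in range(100))
preview = diff_preview(committed, generated, "docs/example.json")
```

Capping the preview keeps CI logs readable when a model change touches many properties, while the suffix still signals how much output was cut.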

🤖 Generated with Claude Code

Extends scripts/generate_schemas.py with a `--check` flag that verifies each committed docs/*.json schema is byte-identical to what the live Pydantic models produce, running the same post-processing as `write`, so v0.5's stable required-fields contract stays preserved. Drift exits non-zero with a unified-diff preview capped at 40 lines per file, plus the remediation command.

Wires the check in CI before the test step so a Pydantic edit that forgets to regenerate fails fast with an actionable message.

Refactors each `write_X_schema()` into a pure `build_X_schema() -> (Path, str)` and a thin write wrapper using a new `_emit()` helper. Tests call the builders directly via importlib.util (no subprocess), so a model edit failing the roundtrip is caught locally before CI.

tests/test_schema_roundtrip.py (7 tests):
- per-schema roundtrip for manifest, report, packet, checks catalog;
- end-to-end `--check` exits 0 on a clean repo;
- negative control: synthetic drift triggers exit 1 with diff preview;
- builder purity: deterministic, returns (Path, str), trailing newline.

CONTRIBUTING.md documents the model-edit → regen → commit workflow.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The initial M4 commit (94d5c63) included three required-field additions that referenced future fields not present on main:
- ReleaseDecision.required gained "contribution_rules" (M8 audit)
- A new ContributionRule.required block was added (M8)
- loaded_plugins.required gained "validation_status", "validation_errors", "runtime_errors" (M5 plugin validation)

These came from concurrent in-flight changes that contaminated the edit view of scripts/generate_schemas.py before commit. None of those fields exist on main's ReleaseDecision or loaded_plugins payloads, so `python scripts/generate_schemas.py --check` correctly reported drift in docs/report-schema.v0.16.json on CI.

This commit restores the post-processing required lists to exactly what main has, so the M4 mechanism (--check, builder/write split, roundtrip tests) is the only contract change in this PR. The M5/M8 additions belong with their respective model changes in a future PR.

Verified: `python scripts/generate_schemas.py --check` exits 0 against main's models; tests/test_schema_roundtrip.py all 7 tests pass; `git diff origin/main -- scripts/generate_schemas.py` shows only mechanism changes (helpers, --check flag, build/write split, BUILDERS, argparse).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

pengfei-threemoonslab · 2026-05-16T05:27:59Z

Pushed fix 630d346 addressing the review.

Root cause: the initial M4 commit included three required-field additions in the report-schema post-processing that referenced future fields not on main:

ReleaseDecision.required had contribution_rules (M8)
A new ContributionRule.required block (M8)
loaded_plugins.required had validation_status, validation_errors, runtime_errors (M5)

These came from concurrent in-flight M5/M8 changes that contaminated the edit view of scripts/generate_schemas.py before commit. Locally --check passed against the (also-contaminated) untracked docs/report-schema.v0.17.json — but on CI the comparison is against committed docs/report-schema.v0.16.json which correctly has none of those fields, producing the drift you saw.

Fix: the strip commit restores the three sections to exactly what origin/main has. The diff from main now contains only M4 mechanism (--check flag, builder/write split, _emit(), BUILDERS, argparse) — no field-list additions for unimplemented models.

Verification:

git diff origin/main -- scripts/generate_schemas.py | grep -E '^[+-]' | grep -vE '^(\+\+\+|---)' shows only mechanism diffs, no future-field strings.
python scripts/generate_schemas.py --check exits 0 against main's models.
pytest tests/test_schema_roundtrip.py — 7 passed.

The push was a fast-forward (94d5c63..630d346), no force-push, no history rewrite. Ready for re-review.

The reviewer flagged that ``build_checks_catalog()`` called ``check_catalog()`` without an explicit ``plugins_enabled=False``. With ``AGENTS_SHIPGATE_ENABLE_PLUGINS=1`` plus any third-party check plugin installed on the host, the default ``check_catalog()`` resolves plugins from entry points and includes their metadata in the result. ``--check`` would then either:
- falsely flag drift in the committed built-in-only ``docs/checks.json``, or
- on a ``write`` run, silently overwrite the committed catalog with a plugin-augmented one.

Either path breaks the "deterministic artifact, regardless of host environment" guarantee that the M4 mechanism is supposed to provide.

Fix: pass ``plugins_enabled=False`` explicitly. Same value the implicit default would have on a clean machine, but immune to env contamination.

Regression test ``test_checks_catalog_ignores_enabled_plugins``:
- installs a fake plugin entry point with a distinctive check_id;
- sets ``AGENTS_SHIPGATE_ENABLE_PLUGINS=1`` via monkeypatch;
- cross-checks the threat by calling ``registry.check_catalog()`` directly and asserting the canary IS present (so the test fails loudly if the upstream plugin path ever stops loading; guards against vacuous passes);
- then asserts ``build_checks_catalog()`` output is byte-identical to a clean run with plugins env unset and does not contain the canary check_id.

Docstring/description string unchanged: only the build call site gets the explicit kwarg, so the generated artifact stays byte-identical to ``main``'s committed ``docs/checks.json``.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

pengfei-threemoonslab · 2026-05-16T05:36:15Z

Pushed cd1b1cf addressing the P3 finding.

Fix: build_checks_catalog() now calls check_catalog(plugins_enabled=False) explicitly. Previously the implicit default fell back to env-var lookup, so AGENTS_SHIPGATE_ENABLE_PLUGINS=1 plus any installed plugin would:

import third-party code at generator time, and
either falsely flag drift against the built-in-only committed docs/checks.json, or — on a write run — silently overwrite the committed catalog with a plugin-augmented one.

The committed artifact is now a pure function of the built-in catalog, regardless of host environment.
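The difference between the implicit default and the explicit kwarg can be illustrated with a toy version of the call. Everything here is a simplified stand-in: the real check_catalog() resolves plugins from entry points, whereas this sketch uses hard-coded, hypothetical check ids and assumes that a default of None means "consult the environment variable".

```python
import os

ENV_FLAG = "AGENTS_SHIPGATE_ENABLE_PLUGINS"
BUILTIN_CHECKS = ("BUILTIN-001", "BUILTIN-002")  # hypothetical ids
PLUGIN_CHECKS = ("THIRD-PARTY-001",)  # hypothetical id


def check_catalog(plugins_enabled=None):
    """Toy catalog: None means 'decide from the environment' (assumption)."""
    if plugins_enabled is None:
        plugins_enabled = os.environ.get(ENV_FLAG) == "1"
    ids = list(BUILTIN_CHECKS)
    if plugins_enabled:
        ids.extend(PLUGIN_CHECKS)
    return ids


def build_checks_catalog():
    """The explicit kwarg makes the result independent of the host env."""
    return check_catalog(plugins_enabled=False)


previous = os.environ.get(ENV_FLAG)
os.environ[ENV_FLAG] = "1"
try:
    implicit = check_catalog()  # picks up the plugin id
    explicit = build_checks_catalog()  # built-ins only
finally:
    # Restore the environment so the demo has no side effects.
    if previous is None:
        os.environ.pop(ENV_FLAG, None)
    else:
        os.environ[ENV_FLAG] = previous
```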

Regression test: tests/test_schema_roundtrip.py::test_checks_catalog_ignores_enabled_plugins:

installs a fake plugin entry point with a distinctive check_id (PLUGIN-DETERMINISM-CANARY);
sets AGENTS_SHIPGATE_ENABLE_PLUGINS=1 via monkeypatch.setenv;
threat-model cross-check — calls registry.check_catalog() directly and asserts the canary IS present, so the regression below cannot pass vacuously if the upstream plugin path ever stops loading;
asserts generator.build_checks_catalog() output is byte-identical to a clean run with plugins env unset and does not contain the canary check_id.
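The shape of that regression test, including the guard against a vacuous pass, might look roughly like this. It is a self-contained sketch, not the project's test: FAKE_ENTRY_POINTS and the toy check_catalog/build_checks_catalog are hypothetical stand-ins, and unittest.mock.patch.dict is used here in place of pytest's monkeypatch so the block runs without pytest.

```python
import json
import os
from unittest import mock

ENV_FLAG = "AGENTS_SHIPGATE_ENABLE_PLUGINS"
CANARY = "PLUGIN-DETERMINISM-CANARY"

# Hypothetical stand-in for entry-point discovery: plugin name -> check ids.
FAKE_ENTRY_POINTS: dict[str, list[str]] = {}


def check_catalog(plugins_enabled=None):
    """Toy registry call; None means the environment decides (assumption)."""
    if plugins_enabled is None:
        plugins_enabled = os.environ.get(ENV_FLAG) == "1"
    ids = ["BUILTIN-001"]
    if plugins_enabled:
        for plugin_ids in FAKE_ENTRY_POINTS.values():
            ids.extend(plugin_ids)
    return ids


def build_checks_catalog() -> str:
    """Toy builder that always excludes plugins."""
    return json.dumps(check_catalog(plugins_enabled=False), indent=2) + "\n"


def test_checks_catalog_ignores_enabled_plugins():
    # Baseline: plugins flag unset, no plugin installed.
    with mock.patch.dict(os.environ):
        os.environ.pop(ENV_FLAG, None)
        clean = build_checks_catalog()

    with mock.patch.dict(os.environ, {ENV_FLAG: "1"}), \
            mock.patch.dict(FAKE_ENTRY_POINTS, {"canary": [CANARY]}):
        # Guard against a vacuous pass: the threat must actually be live.
        assert CANARY in check_catalog()
        contaminated_env_output = build_checks_catalog()

    assert contaminated_env_output == clean
    assert CANARY not in contaminated_env_output
```

The first assertion inside the patched block is what keeps the test honest: if plugin loading ever stopped working in the harness, the later equality check would pass for the wrong reason.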

Notes:

Description string in the catalog payload is unchanged — only the build call site gets the explicit kwarg — so the generated docs/checks.json stays byte-identical to main's committed file. No artifact regen included in this PR.
Pushed as fast-forward 630d346..cd1b1cf — no history rewrite.
All 8 round-trip tests pass locally; --check exits 0 against main's models.

PR commit chain:

94d5c63 Add --check mode + roundtrip tests (M4)
630d346 Strip M5/M8 contamination (P1 fix)
cd1b1cf Force plugins_enabled=False (P3 fix)

Ready for re-review.

pengfei-threemoonslab and others added 2 commits May 15, 2026 22:13

pengfei-threemoonslab merged commit 1ca14d9 into main May 16, 2026
1 check passed

pengfei-threemoonslab deleted the claude/lucid-lamarr-e77055 branch May 16, 2026 05:38

This was referenced May 16, 2026

Add plugin validation pipeline + --strict-plugins (M5) #83 (Merged)
Enforce no-import / no-exec trust invariant across all adapters (M3) #79 (Merged)

pengfei-threemoonslab merged 3 commits into main from claude/lucid-lamarr-e77055
