Add severity-override floor + policy audit (M1, v0.17) #80
Pushed fix commit […]
P1.1: schema artifacts. Ran […]
P1.2: policy-pack rule regression. Extended […]
P2.3: rich-form expiry was advisory. Added […]
P2.4: broken test fixtures. The two flagged cases overrode […]
Collateral fix. […]
Schema-generator change worth a second look: I added a manifest-schema override in […]
Force-pushed from 5148e21 to 88551a2.
Pushed […]
P1.1: v0.17 collision with M8 merged to main. Rebased onto […]
P1.2: action-surface policy severity bypass. Real and important. The resolver previously compared overrides against […] Fix: extended […]
Three new test cases in tests/test_severity_override_floor.py: tier-crossing rejection without ack, tier-crossing with ack carries […]
The audit row now reports the effective […]
STABILITY.md "Severity-override floor" gains a new "Dynamic-severity check classes" paragraph documenting the contract for […]
Final state after both fixes: 1150 passed, 3 skipped; ruff clean.
Closes the largest trust hole in the release gate: today any manifest
can write `checks.severity_overrides: SHIP-POLICY-APPROVAL-MISSING: info`
and silently turn off a critical finding. The original severity lands
in `evidence.default_severity` for audit but reviewers rarely look there.
M1 makes the gate honest:
- `CheckMetadata.floor_severity` declares a hard lower bound on what
a manifest override is allowed to resolve to. 16 release-critical
built-ins now declare floors (critical→floor=high for policy/action;
high→floor=medium for auth/scope/inventory/sidefx).
- Below-floor overrides are rejected as manifest config errors (exit 2).
The floor is hard; no acknowledgement bypasses it.
- `checks.severity_overrides` accepts both legacy scalar form and a
new rich form `{severity, reason, expires}`.
- New `checks.acknowledge_overrides[]` block gates tier-crossing
downgrades (critical↔high, high↔normal). Tier-crossing upgrades
and same-tier downgrades never require ack.
- Expired ack entries fail manifest load with exit 2 — no advisory
bypass.
- New `report.policy_audit.severity_overrides_applied[]` surfaces
every applied override at the top of the report. Required +
non-nullable on the wire (mirrors v0.12 agent_summary pattern).
- Markdown report renders a `## Policy Audit` section between
Release Decision and Summary when overrides exist.
- GitHub step summary adds a one-liner counting overrides +
downgrades + tier-crossed.
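
As a rough illustration of the audit surface, the sketch below models one audit row and derives the step-summary one-liner from a list of rows. The field names follow the PR description; the class name `AuditRow`, the function `summary_line`, the `"downgrade"`/`"upgrade"` values for `direction`, and the example `manifest_path` string are assumptions made for this example, not the project's actual code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class AuditRow:
    """Hypothetical stand-in for one severity_overrides_applied[] entry."""
    check_id: str
    default_severity: str
    applied_severity: str
    manifest_path: str
    reason: Optional[str]
    tier_crossed: bool
    direction: str  # assumed values: "downgrade" or "upgrade"
    expires: Optional[str]


def summary_line(rows: List[AuditRow]) -> str:
    """Count overrides, downgrades, and tier crossings for a one-line summary."""
    downgrades = sum(1 for row in rows if row.direction == "downgrade")
    crossed = sum(1 for row in rows if row.tier_crossed)
    return (
        f"Policy audit: {len(rows)} override(s) · "
        f"{downgrades} downgrade · {crossed} tier-crossed"
    )


rows = [
    AuditRow(
        check_id="SHIP-DOC-MISSING-DESCRIPTION",
        default_severity="medium",
        applied_severity="low",
        manifest_path="checks.severity_overrides",  # illustrative value only
        reason=None,
        tier_crossed=False,
        direction="downgrade",
        expires=None,
    )
]
print(summary_line(rows))
```

An empty list still produces a line with zero counts, which matches the idea that the audit block is always present on the wire even when no overrides apply.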
Schema bump: report_schema_version 0.16 → 0.17.
Breaking for manifests currently downgrading any of the 16 floored
checks below their new floor. Failure mode is loud (exit 2 with a
routable error message), not silent.
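
A minimal sketch of the hard-floor rule, assuming severities are totally ordered from `info` up to `critical`. `Severity`, `enforce_floor`, and the error wording are hypothetical names chosen for this example; `ConfigError` here is a local stand-in for the config error that the CLI would map to exit 2.

```python
from enum import IntEnum
from typing import Optional


class Severity(IntEnum):
    """Ordered so that a larger value means a more severe finding."""
    INFO = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


class ConfigError(Exception):
    """Local stand-in for the manifest config error (exit 2 at the CLI)."""


def enforce_floor(
    check_id: str, requested: Severity, floor: Optional[Severity]
) -> Severity:
    """Return the requested severity, or raise if it falls below the floor.

    There is deliberately no acknowledgement parameter: the floor is hard.
    """
    if floor is not None and requested < floor:
        raise ConfigError(
            f"{check_id}: override to {requested.name.lower()} is below "
            f"floor {floor.name.lower()}"
        )
    return requested


# An override that lands exactly on the floor is accepted.
enforce_floor("SHIP-POLICY-APPROVAL-MISSING", Severity.HIGH, Severity.HIGH)
```

Checks without a declared floor pass `floor=None` and accept any severity, which is why only the floored built-ins are affected by the breaking change.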
Architecture:
- New module `core/severity_overrides.py` (331 LOC) owns the
validation policy as a pure function with explicit `today=`
injection for deterministic tests.
- Legacy `apply_severity_overrides(findings, dict[str, Severity])`
signature unchanged — existing direct callers (test_findings.py,
test_policy_packs.py) keep working byte-for-byte.
- Resolver runs up front in cli/scan.py; mutation pass below only
sees a manifest that has passed policy validation.
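
The `today=` injection can be pictured with a small pure helper like the one below. The name `is_expired` and its signature are assumptions for illustration; the only behaviour taken from this PR is that an entry counts as expired on its `expires` date and on any later day.

```python
from datetime import date
from typing import Optional


def is_expired(expires: date, today: Optional[date] = None) -> bool:
    """Return True on the expires date itself and on every day after it.

    Passing `today` explicitly keeps the function deterministic in tests;
    the wall clock is consulted only when the caller omits it.
    """
    current = today if today is not None else date.today()
    return current >= expires


deadline = date(2025, 1, 10)
print(is_expired(deadline, today=date(2025, 1, 9)))   # day before
print(is_expired(deadline, today=date(2025, 1, 10)))  # expires-today
```

Because the date is a parameter rather than a global lookup, a test can pin it without monkey-patching `date.today`.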
Tests:
- `tests/test_severity_override_floor.py` (507 LOC, 27 cases):
floor enforcement (hard, no ack bypass), tier-crossing semantics
(downgrade-requires-ack, upgrade-never-requires-ack), expiry
(today and past = expired), unknown check_id rejection, legacy
scalar coercion, rich-form round-trip, audit shape, duplicate-ack
rejection, CheckMetadata self-consistency.
Follow-up (not in this PR):
- Run `python scripts/generate_schemas.py` to write
`docs/report-schema.v0.17.json` and refresh `docs/checks.json` +
`docs/manifest-v0.1.json` with the new fields. The generator
already knows how to mark `policy_audit` required + non-nullable.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…expiry
Four blockers caught in review:
P1.1 — Schema artifacts missing. The PR bumped emitted reports to
report_schema_version 0.17 but did not commit docs/report-schema.v0.17.json
or refresh the public surfaces. Fixed by:
- Running scripts/generate_schemas.py to write docs/report-schema.v0.17.json,
refresh docs/checks.json and docs/manifest-v0.1.json with the new fields.
- Bumping v0.16 → v0.17 in .well-known/agents-shipgate.json, README.md (3
callsites), docs/INDEX.md, docs/agent-contract-current.md (3 callsites),
AGENTS.md (3 callsites), docs/examples.md, docs/autofix-policy.md,
llms.txt (2 callsites), skills/agents-shipgate/SKILL.md. v0.16 moves
to the frozen-reference list in each.
- Updating tests/test_provenance_kind.py CURRENT_SCHEMA + tests/test_reports.py
REPORT_SCHEMA_V16 → REPORT_SCHEMA_V17 references to validate against the
v0.17 schema.
- Regenerating llms-full.txt from the updated sources.
- Regenerating samples/*/expected/report.json so the golden fixtures
carry report_schema_version: 0.17.
P1.2 — Policy-pack rule override regression. cli/scan.py passed only
check_catalog(...) to resolve_severity_overrides, but run_checks already
treats policy-pack rule IDs as known via extra_known_check_ids. A
manifest overriding e.g. ORG-HIGH-RISK-OWNER-MISSING failed as
"unknown check_id". Fixed by:
- Extending resolve_severity_overrides with
extra_known_check_defaults: dict[str, Severity] | None, mapping each
policy-pack rule ID to its declared default severity. The resolver
builds a synthetic CheckMetadata with category="policy_pack" and
floor_severity=None — floors are a built-in trust contract by design.
- Wiring {resolved.rule.id: resolved.rule.severity for ... in
policy_packs.rules} from cli/scan.py.
- Updating the existing tests/test_policy_packs.py fixture (the exact
high → medium silent-downgrade pattern M1 is closing) to add an
acknowledge_overrides entry — the canonical example of the new
trust contract applied to policy-pack rule IDs.
- Adding 4 new test cases in tests/test_severity_override_floor.py
covering policy-pack rule ID acceptance, tier-crossing semantics,
same-tier passthrough, and the ack path.
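
The P1.2 fix can be sketched as merging policy-pack rule defaults into the set of known checks. `CheckMeta` and `known_checks` are hypothetical simplifications (the real `CheckMetadata` has more fields); what the sketch keeps from the commit is that outside-catalog IDs get synthetic metadata with `category="policy_pack"` and no floor.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class CheckMeta:
    """Simplified, hypothetical stand-in for CheckMetadata."""
    check_id: str
    default_severity: str
    category: str
    floor_severity: Optional[str] = None


def known_checks(
    catalog: Dict[str, CheckMeta],
    extra_known_check_defaults: Optional[Dict[str, str]] = None,
) -> Dict[str, CheckMeta]:
    """Return the catalog plus synthetic entries for policy-pack rule IDs."""
    merged = dict(catalog)
    for check_id, severity in (extra_known_check_defaults or {}).items():
        if check_id not in merged:
            # Policy-pack rules never carry a floor: floors are reserved
            # for built-in checks.
            merged[check_id] = CheckMeta(
                check_id, severity, category="policy_pack", floor_severity=None
            )
    return merged


catalog = {
    "SHIP-POLICY-APPROVAL-MISSING": CheckMeta(
        "SHIP-POLICY-APPROVAL-MISSING", "critical", "policy", "high"
    )
}
merged = known_checks(catalog, {"ORG-HIGH-RISK-OWNER-MISSING": "high"})
```

With this shape, an override on `ORG-HIGH-RISK-OWNER-MISSING` resolves against a known default instead of failing as an unknown check ID, while an ID that appears in neither source is still rejected.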
P2.3 — Rich-form override `expires` was advisory. STABILITY.md and the
schema docstring promised `expires` is a hard expiry, but the resolver
only enforced expiry on acknowledge_overrides — rich-form override
entries with an expired `expires` were silently applied. Fixed by:
- New _enforce_override_expiry() helper, parallel to
_enforce_ack_expiry(). Same hard contract: exit 2 on/past the expires
date, no advisory bypass.
- 3 new test cases (expired, expires-today, expires-tomorrow).
P2.4 — Two test cases reasoned wrongly about tiers. The fixtures used
SHIP-SCHEMA-MISSING-BOUNDS (default high) and overrode to medium,
calling it "same tier" — but high → medium IS tier-crossing under the
documented tier definition (high tier → normal tier). The resolver
correctly rejected those without an ack. Fixed by:
- Swapping the fixtures to use SHIP-DOC-MISSING-DESCRIPTION (default
medium) → low (both in normal tier, genuinely same-tier).
- The corresponding ruff import-sort issue was auto-fixed.
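
The tier reasoning behind P2.4 can be checked with a small sketch. It assumes the three documented tiers (critical, high, and a normal tier holding medium and low); `info` is left out because this PR does not say where it falls. `TIER`, `RANK`, and `requires_ack` are hypothetical names.

```python
# Tier membership as described in the PR: critical / high / normal (medium, low).
TIER = {"critical": "critical", "high": "high", "medium": "normal", "low": "normal"}

# Relative ordering, used only to tell a downgrade from an upgrade.
RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}


def requires_ack(default: str, applied: str) -> bool:
    """An acknowledgement is needed only for a downgrade that changes tier."""
    is_downgrade = RANK[applied] < RANK[default]
    return is_downgrade and TIER[applied] != TIER[default]


print(requires_ack("high", "medium"))  # the original fixture: crosses a tier
print(requires_ack("medium", "low"))   # the corrected fixture: same tier
```

Under this definition the original `SHIP-SCHEMA-MISSING-BOUNDS` fixture (high → medium) does need an ack, and the replacement `SHIP-DOC-MISSING-DESCRIPTION` fixture (medium → low) does not.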
Plus one collateral regression caught by tests:
- report/tool_surface_diff.py iterated manifest.checks.severity_overrides
values expecting scalars, but they're now SeverityOverrideEntry
objects (legacy scalar form is coerced at load time via
ChecksConfig._coerce_severity_overrides). Extract entry.severity for
the hash/summary so the diff stays stable for repos that didn't add
reason/expires.
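
To illustrate the collateral fix, the sketch below coerces both override forms into one entry type and then hashes only the severity. `OverrideEntry`, `coerce`, and `overrides_digest` are hypothetical names, and the SHA-256-over-JSON scheme is an assumption for the example; the actual hash in `report/tool_surface_diff.py` may be computed differently.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Dict, Optional, Union


@dataclass(frozen=True)
class OverrideEntry:
    """Hypothetical stand-in for SeverityOverrideEntry."""
    severity: str
    reason: Optional[str] = None
    expires: Optional[str] = None


def coerce(raw: Dict[str, Union[str, dict]]) -> Dict[str, OverrideEntry]:
    """Accept the legacy scalar form and the rich form side by side."""
    entries = {}
    for check_id, value in raw.items():
        if isinstance(value, str):
            entries[check_id] = OverrideEntry(severity=value)
        else:
            entries[check_id] = OverrideEntry(**value)
    return entries


def overrides_digest(entries: Dict[str, OverrideEntry]) -> str:
    """Hash only check ID and severity so reason/expires never move the digest."""
    flat = {check_id: entry.severity for check_id, entry in entries.items()}
    payload = json.dumps(flat, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


legacy = coerce({"SHIP-DOC-MISSING-DESCRIPTION": "low"})
rich = coerce(
    {"SHIP-DOC-MISSING-DESCRIPTION": {"severity": "low", "reason": "tracked"}}
)
print(overrides_digest(legacy) == overrides_digest(rich))
```

Projecting each entry back to its severity before hashing is what keeps the diff unchanged for repositories that stay on the scalar form.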
Test results:
- pytest: 1122 passed, 3 skipped, 0 failed.
- ruff: all checks passed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reviewer caught a real bypass: an action-surface policy declared at
``severity: critical`` would silently downgrade to ``high`` without an
acknowledgement, because the resolver compared the override against
``CheckMetadata.default_severity`` (catalog static = high for
SHIP-ACTION-POLICY-VIOLATION) instead of the manifest-declared severity
the finding would actually emit at.

Fix: the resolver now treats ``extra_known_check_defaults`` as
"effective default severity per check ID". For check IDs in the catalog,
the resolver takes ``max(catalog default, supplied default)`` for
tier-crossing and audit purposes; floor enforcement still uses the
catalog floor (the static gate floor for the check class).

cli/scan.py aggregates the strongest declared severity across
``manifest.action_surface.policies[]`` and passes it as
``extra_known_check_defaults["SHIP-ACTION-POLICY-VIOLATION"]``. The same
parameter still carries policy-pack rule defaults: the dict unifies
"outside-catalog IDs" and "catalog IDs with dynamic emitted severity"
under one shape, taking the stronger value when both apply.

The reproducer case the reviewer described now correctly raises
ConfigError with the critical → high tier-boundary diagnostic without an
ack, and applies cleanly with one. The
``policy_audit.severity_overrides_applied`` row reports
``default_severity: critical`` (the effective default) instead of
``high`` (the catalog static), so reviewers see the real downgrade.

Three new test cases in tests/test_severity_override_floor.py:
- ``test_action_policy_critical_overrides_to_high_is_tier_crossing``
- ``test_action_policy_critical_overrides_to_high_with_ack_passes``
- ``test_action_policy_dynamic_default_only_used_when_stronger`` (the
  resolver never weakens the catalog default; dynamic values only
  escalate).

STABILITY.md documents the dynamic-severity behavior under
"Severity-override floor", clarifying the contract for
SHIP-ACTION-POLICY-VIOLATION and policy-pack rules.
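
The "effective default" rule can be sketched as a max over a severity ranking. `RANK`, `strongest`, and `effective_default` are hypothetical names for this example; the behaviour shown (a supplied default may raise the catalog default but never lower it) is the one described in this commit.

```python
from typing import Iterable, Optional

RANK = {"info": 0, "low": 1, "medium": 2, "high": 3, "critical": 4}


def strongest(severities: Iterable[str]) -> str:
    """Pick the most severe value, e.g. across all declared action policies."""
    return max(severities, key=RANK.__getitem__)


def effective_default(catalog_default: str, supplied: Optional[str] = None) -> str:
    """Dynamic values only escalate; a weaker supplied default is ignored."""
    if supplied is None:
        return catalog_default
    return strongest([catalog_default, supplied])


# Catalog static is "high"; one manifest policy declares "critical".
declared = strongest(["medium", "critical"])
print(effective_default("high", declared))
```

Tier-crossing and audit then compare the override against this effective value, so a critical → high override on SHIP-ACTION-POLICY-VIOLATION is seen as the downgrade it really is.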
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Force-pushed from 88551a2 to c10630a.
Summary

- Today a manifest can downgrade `SHIP-POLICY-APPROVAL-MISSING` (critical) to `info` silently. M1 enforces hard severity floors on 16 release-critical built-in checks, surfaces every applied override in a top-of-report `policy_audit` block, and rejects expired acknowledgements as manifest config errors.
- `report_schema_version` 0.16 → 0.17. Breaking for manifests that currently downgrade any of the floored checks below their new floor. The failure mode is exit 2 with a routable error message, not silent.
- `core/severity_overrides.py` (331 LOC pure resolver), `tests/test_severity_override_floor.py` (507 LOC, 27 cases).

Design highlights
Hard floor. `CheckMetadata.floor_severity` declares the lowest severity an override is allowed to resolve to. Floors set on:

- critical → high (5 checks): `SHIP-POLICY-APPROVAL-MISSING`, `SHIP-ACTION-{FINANCIAL-WRITE-CONTROL-MISSING, DESTRUCTIVE-ROLLBACK-MISSING, WILDCARD-SCOPE, EFFECT-ESCALATED, APPROVAL-REMOVED}`
- high → medium (11 checks): `SHIP-AUTH-*` (4), `SHIP-SCOPE-*` (2), `SHIP-INVENTORY-*` (2), `SHIP-POLICY-CONFIRMATION-MISSING`, `SHIP-SIDEFX-IDEMPOTENCY-MISSING`

No acknowledgement bypasses the floor.
Tier-crossing acknowledgement. `checks.acknowledge_overrides[]` is required for any severity downgrade that crosses a tier boundary (critical / high / medium-low). Tier-crossing upgrades and same-tier downgrades never require ack.

Rich override shape. `severity_overrides` accepts both the legacy scalar form (`SHIP-XYZ: medium`) and `{severity, reason, expires}`. Reason flows into the audit row; expires gives reviewers a time-bounded override.

Expiry is hard. Expired ack → manifest config error (exit 2), no advisory-mode bypass.
Audit envelope. New `report.policy_audit.severity_overrides_applied[]` with `{check_id, default_severity, applied_severity, manifest_path, reason, tier_crossed, direction, expires}`. Required + non-nullable on the wire (mirrors v0.12 `agent_summary` pattern).

Architectural notes
- `core/severity_overrides.py` owns the validation policy as a pure function with explicit `today=` injection. Lets tests pin the date without monkey-patching.
- `apply_severity_overrides(findings, dict[str, Severity])` signature unchanged: direct callers in `tests/test_findings.py` and `tests/test_policy_packs.py` keep working byte-for-byte.
- Resolver runs up front in `cli/scan.py`; the mutation pass below only ever sees a manifest that has passed policy validation. Failing fast = failing routable.
- `AgentsShipgateManifest.severity_overrides()` still returns the flat scalar projection; new `severity_override_entries()` returns the rich shape and `acknowledge_overrides()` returns the ack list.

Files touched
- `core/models.py`: `report_schema_version: "0.17"`, `CheckMetadata.floor_severity` + validator, new `SeverityOverrideAuditEntry` + `PolicyAudit`, `ReadinessReport.policy_audit`
- `core/severity_overrides.py` (new)
- `core/findings.py`: `build_report(policy_audit=...)` kwarg
- `config/schema.py`: `SeverityOverrideEntry`, `OverrideAcknowledgement`, scalar back-compat coercion, duplicate-ack rejection
- `checks/registry.py`: `floor_severity` declared on 16 built-ins
- `cli/scan.py`: `resolve_severity_overrides` wired before mutation; audit threaded into `build_report`
- `report/markdown.py`: `## Policy Audit` section between Release Decision and Summary
- `ci/github_summary.py`: `Policy audit: N override(s) · K downgrade · J tier-crossed`
- `scripts/generate_schemas.py`: `policy_audit` required + non-nullable on the wire
- `STABILITY.md`
- `CHANGELOG.md`
- `tests/test_severity_override_floor.py` (new)

Test plan
- `python scripts/generate_schemas.py`: regenerate `docs/report-schema.v0.17.json` + refresh `docs/checks.json` + `docs/manifest-v0.1.json`. The generator already knows how to mark `policy_audit` required + non-nullable; this PR updates the generator but leaves schema file generation to the developer because the sandbox has no Python runtime.
- `pytest tests/test_severity_override_floor.py -v`: new test file (27 cases).
- `pytest tests/test_findings.py tests/test_config.py tests/test_scan.py tests/test_policy_packs.py -v`: regression on the legacy paths that bypass the resolver and call `apply_severity_overrides` directly with a scalar dict. Should pass byte-for-byte.
- `ruff check src tests`: lint.
- […] `samples/support_refund_agent` and confirm `report.policy_audit.severity_overrides_applied` exists as an empty array (no overrides in that fixture).
- […] `checks.severity_overrides: SHIP-POLICY-APPROVAL-MISSING: info` to a manifest, confirm exit 2 with the floor error message.
- […] `SHIP-AUTH-MANIFEST-BROAD-SCOPE: medium` without an ack, confirm exit 2 with the tier-boundary error message. Add the matching `acknowledge_overrides` entry, confirm scan succeeds and the audit row carries the reason.

Out of scope
- […] `--json-summary`, M7 agent-mode unification, M8 truth-table doc). Each lands in its own PR; this one is the trust-spine pivot.

🤖 Generated with Claude Code