ThreeMoonsLab · pengfei-threemoonslab · May 17, 2026 · May 16, 2026
diff --git a/AGENTS.md b/AGENTS.md
@@ -250,8 +250,9 @@ Other stable top-level fields:
 - `findings[].blocks_release` (v0.16+, explicit release-policy blockers from Action Surface Diff policies)
 - `action_surface_facts` / `action_surface_diff` (v0.16+, deterministic action snapshot and base/head action delta)
 - `release_decision.contribution_rules[]` (v0.17+, per-finding audit of how each finding contributed to the decision; one row per `report.findings` entry, with `category` ∈ `{blocker, review_item, excluded}` and `rule` ∈ `{policy_block_new, severity_block_new, policy_baseline_accepted, severity_baseline_accepted, review_required, sub_threshold, suppressed}`)
+- `policy_audit.severity_overrides_applied[]` (v0.17+, top-of-report audit envelope listing every manifest-driven severity override with `{check_id, default_severity, applied_severity, manifest_path, reason, tier_crossed, direction, expires}`)
 
-The full schema is at [`docs/report-schema.v0.17.json`](docs/report-schema.v0.17.json) (current; emitted reports carry `report_schema_version: "0.17"`). v0.17 adds the per-finding `release_decision.contribution_rules[]` audit, on top of v0.16's first-class Action Surface Diff fields, v0.15's per-finding `provenance_kind` enum, v0.14's `insufficient_evidence` value in the `release_decision.decision`/`agent_summary.verdict` enums, and v0.13's `codex_plugin_surface` block. Older reports validate against [`docs/report-schema.v0.16.json`](docs/report-schema.v0.16.json) (frozen reference). What's-stable is documented in [STABILITY.md](STABILITY.md).
+The full schema is at [`docs/report-schema.v0.17.json`](docs/report-schema.v0.17.json) (current; emitted reports carry `report_schema_version: "0.17"`). v0.17 adds the top-level `policy_audit` block surfacing applied severity overrides and the per-finding `release_decision.contribution_rules[]` audit, on top of v0.16's first-class Action Surface Diff fields, v0.15's per-finding `provenance_kind` enum, v0.14's `insufficient_evidence` value in the `release_decision.decision`/`agent_summary.verdict` enums, and v0.13's `codex_plugin_surface` block. Older reports validate against [`docs/report-schema.v0.16.json`](docs/report-schema.v0.16.json) (frozen reference). What's-stable is documented in [STABILITY.md](STABILITY.md).
 
 **Release gating signal**: prefer `release_decision.decision` (`"blocked" | "review_required" | "insufficient_evidence" | "passed"`) over `summary.status`. The new field is **baseline-aware** — a baseline-matched critical surfaces in `release_decision.review_items` (accepted debt), not `release_decision.blockers`. `summary.status` stays baseline-blind for v0.7 compatibility, so a baseline-matched-only critical produces both `summary.status = "release_blockers_detected"` AND `release_decision.decision = "review_required"` (intentional divergence — see [STABILITY.md](STABILITY.md#release_decisiondecision-vs-summarystatus)). `insufficient_evidence` (added v0.14) signals that the scan saw too many low-confidence tools or source-loader warnings to be trustworthy; consumers that switch on the enum must fall back to `review_required` for unknown future values.
 

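Reviewer note on the release gating signal described in the AGENTS.md hunk above: a consumer that switches on `release_decision.decision` needs a fallback for enum values it does not recognise. The following is a minimal sketch of such a consumer, not code from this repository. The helper names `gating_decision` and `should_fail_ci` are hypothetical, and treating a missing field the same way as an unknown value is an assumption.

```python
# Values of release_decision.decision documented in AGENTS.md.
KNOWN_DECISIONS = {"blocked", "review_required", "insufficient_evidence", "passed"}


def gating_decision(report: dict) -> str:
    """Return the release decision from a parsed report.json.

    Unknown future enum values fall back to "review_required", as the
    AGENTS.md text requires. Treating a missing field the same way is an
    assumption of this sketch.
    """
    decision = report.get("release_decision", {}).get("decision")
    if decision not in KNOWN_DECISIONS:
        return "review_required"
    return decision


def should_fail_ci(report: dict) -> bool:
    """Hypothetical CI policy: anything other than "passed" needs attention."""
    return gating_decision(report) != "passed"


# A baseline-matched-only critical: summary.status is baseline-blind,
# release_decision.decision is baseline-aware.
example = {
    "summary": {"status": "release_blockers_detected"},
    "release_decision": {"decision": "review_required"},
}
print(gating_decision(example))
```

The sketch deliberately ignores `summary.status`, since that field stays baseline-blind and would report accepted debt as a blocker.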
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -2,13 +2,71 @@
 
 ## Unreleased
 
+- **v0.17 / M1 trust-hardening: severity-override floor + audit.**
+  - `core.models.CheckMetadata` gains an optional `floor_severity` field
+    (Severity | None). 16 release-critical built-in checks now declare a
+    hard floor:
+    - `SHIP-POLICY-APPROVAL-MISSING` (critical → floor "high")
+    - `SHIP-ACTION-{FINANCIAL-WRITE-CONTROL-MISSING, DESTRUCTIVE-ROLLBACK-MISSING,
+      WILDCARD-SCOPE, EFFECT-ESCALATED, APPROVAL-REMOVED}` (critical → floor "high")
+    - `SHIP-AUTH-{MISSING-SCOPE, MANIFEST-BROAD-SCOPE, TOOL-BROAD-SCOPE,
+      SCOPE-COVERAGE-MISSING}` (high → floor "medium")
+    - `SHIP-SCOPE-{TOOL-OUTSIDE-PURPOSE, PROHIBITED-TOOL-PRESENT}` (high → floor "medium")
+    - `SHIP-INVENTORY-{WILDCARD-TOOLS, LOW-CONFIDENCE-PRODUCTION-SURFACE}` (high → floor "medium")
+    - `SHIP-POLICY-CONFIRMATION-MISSING` (high → floor "medium")
+    - `SHIP-SIDEFX-IDEMPOTENCY-MISSING` (high → floor "medium")
+  - Any `checks.severity_overrides` entry that resolves below the floor
+    is rejected as a manifest config error (exit 2). The floor is hard;
+    no acknowledgement bypasses it. **Breaking** for manifests that
+    previously downgraded these checks below their new floor — fix by
+    raising the override to floor-or-above, or removing the override.
+  - `checks.severity_overrides` accepts both the legacy scalar form
+    (`SHIP-XYZ: medium`) and a new rich form
+    (`SHIP-XYZ: { severity, reason, expires }`). Reason flows into the
+    new audit row; expires gives reviewers a time-bounded override.
+  - New `checks.acknowledge_overrides[]` block. Required for any
+    severity override whose application crosses a severity tier
+    boundary (critical ↔ high, high ↔ medium/low/info) as a downgrade.
+    Tier-crossing **upgrades** never require ack (strictly more
+    conservative). Same-tier downgrades (medium → low) don't require ack.
+    For checks emitted with manifest-declared severity (action-surface
+    policies via `SHIP-ACTION-POLICY-VIOLATION`, policy-pack rules)
+    the resolver compares against the strongest declared severity
+    across the manifest, not the static catalog default — so a
+    `severity: critical` action policy with override `high` is
+    correctly tier-crossing and requires ack.
+  - An expired `acknowledge_overrides` entry raises a manifest config error
+    (exit 2); there is no advisory-mode bypass. The same hard contract
+    applies to `expires` on rich-form `severity_overrides` entries.
+  - New top-level `report.policy_audit` block surfacing every applied
+    override:
+    `policy_audit.severity_overrides_applied[].{check_id,
+    default_severity, applied_severity, manifest_path, reason,
+    tier_crossed, direction, expires}`. Always emitted on scans (empty
+    envelope when no overrides applied); required + non-nullable on
+    the wire (mirrors the v0.12 `agent_summary` pattern). Lands at
+    `report_schema_version: "0.17"` alongside M8's
+    `release_decision.contribution_rules[]` — both audits are additive
+    and share the same schema bump.
+  - The Markdown report renders a new "Policy Audit" section between
+    Release Decision and Summary when overrides exist. The GitHub step
+    summary adds a one-line count of overrides, tier-crossing overrides,
+    and upgrades/downgrades.
+  - New module `core/severity_overrides.py` owns floor/tier/ack/expiry
+    resolution as a pure function; `core/findings.py::apply_severity_overrides`
+    still consumes a flat `dict[str, Severity]` so existing direct
+    callers and tests stay byte-compatible.
+  - `AgentsShipgateManifest.severity_overrides()` still returns the
+    flat scalar projection for back-compat; new
+    `severity_override_entries()` returns the rich shape and
+    `acknowledge_overrides()` returns the ack list.
 - Added `release_decision.contribution_rules[]` — a deterministic
   per-finding audit of how each finding contributed to the release
   decision (M8 of the Trust Hardening Pass). Bumps
-  `report_schema_version` to `0.17`. Exactly one row per
-  `report.findings` entry (including suppressed) with `category` ∈
-  `{blocker, review_item, excluded}` and `rule` ∈ `{policy_block_new,
-  severity_block_new, policy_baseline_accepted,
+  `report_schema_version` to `0.17` (shared with M1's `policy_audit`).
+  Exactly one row per `report.findings` entry (including suppressed)
+  with `category` ∈ `{blocker, review_item, excluded}` and `rule` ∈
+  `{policy_block_new, severity_block_new, policy_baseline_accepted,
   severity_baseline_accepted, review_required, sub_threshold,
   suppressed}`. The new `STABILITY.md` "Release decision truth table"
   documents which `(rule, category)` pair fires for every

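Reviewer note on the floor, tier, and acknowledgement rules in the CHANGELOG hunk above: the rules can be sketched as one pure function. This is an illustration under stated assumptions, not the contents of `core/severity_overrides.py`. The names `resolve_override`, `AuditRow`, and `ManifestConfigError` are hypothetical, the tier grouping (info/low/medium together) follows the changelog wording "high ↔ medium/low/info", and the ack list is modelled as a dict from check ID to an optional expiry date.

```python
from dataclasses import dataclass
from datetime import date

# Severity ordering, weakest to strongest.
RANK = {"info": 0, "low": 1, "medium": 2, "high": 3, "critical": 4}
# Tier grouping assumed from the changelog wording: critical / high / the rest.
TIER = {"info": 0, "low": 0, "medium": 0, "high": 1, "critical": 2}


class ManifestConfigError(Exception):
    """Hypothetical stand-in for the manifest config error (exit 2)."""


@dataclass(frozen=True)
class AuditRow:
    check_id: str
    default_severity: str
    applied_severity: str
    tier_crossed: bool
    direction: str


def resolve_override(check_id, default, applied, *, floor=None, acks=None, today=None):
    """Validate one severity override and return its audit row.

    acks maps check_id -> expiry date (None means no expiry); this shape is
    an assumption of the sketch.
    """
    acks = acks or {}
    today = today or date.today()

    # The floor is hard: no acknowledgement bypasses it.
    if floor is not None and RANK[applied] < RANK[floor]:
        raise ManifestConfigError(f"{check_id}: {applied!r} is below floor {floor!r}")

    # An ack is expired on or past its expiry date.
    expires = acks.get(check_id)
    if expires is not None and today >= expires:
        raise ManifestConfigError(f"{check_id}: acknowledgement expired on {expires}")

    if RANK[applied] < RANK[default]:
        direction = "downgrade"
    elif RANK[applied] > RANK[default]:
        direction = "upgrade"
    else:
        direction = "same"
    tier_crossed = TIER[applied] != TIER[default]

    # Only tier-crossing downgrades need an acknowledgement.
    if direction == "downgrade" and tier_crossed and check_id not in acks:
        raise ManifestConfigError(f"{check_id}: tier-crossing downgrade needs an ack")

    return AuditRow(check_id, default, applied, tier_crossed, direction)
```

Under these assumptions, `medium` to `low` passes without an ack, `critical` to `high` requires one, and `high` to `low` on a check with floor `medium` is rejected even when an ack is present.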
diff --git a/README.md b/README.md
@@ -190,7 +190,7 @@ Set `pr_comment: "true"` to post a compact PR summary:
 
 ## What it produces
 
-- **Tool-Use Readiness Report** — `agents-shipgate-reports/report.{md,json,sarif}`. Markdown for human release review, JSON for tools and coding agents (current schema [v0.17](docs/report-schema.v0.17.json); gating signal is `release_decision.decision`; v0.17 adds the per-finding `release_decision.contribution_rules[]` audit on top of v0.16's first-class Action Surface Diff fields and v0.15's per-finding `provenance_kind`), SARIF for GitHub code-scanning workflows.
+- **Tool-Use Readiness Report** — `agents-shipgate-reports/report.{md,json,sarif}`. Markdown for human release review, JSON for tools and coding agents (current schema [v0.17](docs/report-schema.v0.17.json); gating signal is `release_decision.decision`; v0.17 adds the top-level `policy_audit` block surfacing every applied severity override plus the per-finding `release_decision.contribution_rules[]` decision audit on top of v0.16's first-class Action Surface Diff fields and v0.15's per-finding `provenance_kind`), SARIF for GitHub code-scanning workflows.
 - **Release Evidence Packet** — `agents-shipgate-reports/packet.{md,json,html}` (and `packet.pdf` with the `[pdf]` extras). Reviewer-shaped synthesis with fixed sections, including tool-surface and action-surface diffs when available. Governed by [packet schema v0.5](docs/packet-schema.v0.5.json) — see [STABILITY.md §Release Evidence Packet](STABILITY.md#release-evidence-packet-v05).
 
 ## Exit codes
@@ -226,7 +226,7 @@ Agents Shipgate is designed to be agent-friendly. If you're a coding agent (Clau
 - **[`prompts/`](prompts/)** — reusable prompts for common workflows
 - **[`skills/agents-shipgate/`](skills/agents-shipgate/)** + **[`.claude/commands/shipgate.md`](.claude/commands/shipgate.md)** — self-contained Claude Code skill (bundled prompts and CI recipe) and `/shipgate` slash command. See [`docs/agents/use-with-claude-code.md`](docs/agents/use-with-claude-code.md) to install in your own project.
 - **[`docs/ai-search-summary.md`](docs/ai-search-summary.md)** — human-readable summary for AI search, answer engines, and coding agents
-- **[`docs/manifest-v0.1.json`](docs/manifest-v0.1.json)** + **[`docs/report-schema.v0.17.json`](docs/report-schema.v0.17.json)** — JSON Schemas for live editor validation (current; emitted reports carry `report_schema_version: "0.17"`). v0.17 adds `release_decision.contribution_rules[]` (per-finding decision audit); v0.16 added `action_surface_facts` and `action_surface_diff`; v0.15 added the per-finding `provenance_kind` enum. Read `release_decision.decision` for release gating in new consumers; read `agent_summary.first_recommended_action` for a deterministic next step.
+- **[`docs/manifest-v0.1.json`](docs/manifest-v0.1.json)** + **[`docs/report-schema.v0.17.json`](docs/report-schema.v0.17.json)** — JSON Schemas for live editor validation (current; emitted reports carry `report_schema_version: "0.17"`). v0.17 adds the top-level `policy_audit` block surfacing applied severity overrides and the per-finding `release_decision.contribution_rules[]` decision audit; v0.16 added `action_surface_facts` and `action_surface_diff`; v0.15 added the per-finding `provenance_kind` enum. Read `release_decision.decision` for release gating in new consumers; read `agent_summary.first_recommended_action` for a deterministic next step.
 - **[`docs/checks.json`](docs/checks.json)** — machine-readable check catalog
 
 Every command has a `--json` form. Errors emit a structured `next_action` line on stderr when `AGENTS_SHIPGATE_AGENT_MODE=1`.
@@ -414,7 +414,7 @@ Agents Shipgate is a static, manifest-first scanner. It is intentionally narrow:
 - It does not verify runtime behavior, latency, prompt quality, or routing decisions.
 - It does not replace dynamic security testing or human security review of the underlying systems.
 - It only inspects what is declared in `shipgate.yaml`, local OpenAPI specs, MCP exports, simple OpenAI API artifacts, optional SDK AST metadata, static Google ADK/LangChain/CrewAI inputs, and static Codex plugin package metadata; tools that are not declared or statically discoverable are not scanned.
-- The manifest remains `version: "0.1"` so existing configs keep working. Current reports carry `report_schema_version: "0.17"` (additive over v0.16, adding `release_decision.contribution_rules[]` — a deterministic per-finding audit of how each finding contributed to the release decision) while preserving the stable payload contract documented in the report schema.
+- The manifest remains `version: "0.1"` so existing configs keep working. Current reports carry `report_schema_version: "0.17"` (additive over v0.16's action-surface diff, adding the top-level `policy_audit` block surfacing applied severity overrides and the per-finding `release_decision.contribution_rules[]` decision audit) while preserving the stable payload contract documented in the report schema.
 
 See [ROADMAP.md](ROADMAP.md) for what is planned next.
 

diff --git a/STABILITY.md b/STABILITY.md
@@ -101,6 +101,41 @@ In `agents-shipgate-reports/report.json`, the following are guaranteed:
 - `tool_inventory[].{name, source_type, source_ref, risk_tags, auth_scopes, owner, confidence}`
 - `loaded_plugins[].{name, value, distribution, version, check_id}`
 - `loaded_plugins[].{validation_status, validation_errors, runtime_errors}` (v0.17+ / M5) — plugin validation provenance, required + present on every entry. `validation_status` is one of `valid | load_failed | bad_signature | bad_metadata | id_collision | bad_floor`; the two error lists are always present and empty for clean plugins. Invalid plugins still appear in this array (with `check_id: null` for entries that failed before metadata parsing), so reviewers can see what was skipped without reading scanner logs. Plugin findings whose `check_id` does not match the declared metadata are dropped at runtime and recorded under `runtime_errors`.
+- `policy_audit.severity_overrides_applied[].{check_id, default_severity, applied_severity, manifest_path, reason, tier_crossed, direction, expires}` (v0.17+ / M1) — top-of-report audit envelope for severity overrides applied during scan. Always present on emitted scans (empty when no overrides applied); required + non-nullable on the wire. `direction` is one of `downgrade | upgrade | same`. `tier_crossed=true` indicates the override crossed a severity tier boundary (critical / high / medium-low); tier-crossing downgrades require a matching `checks.acknowledge_overrides` entry, which is reflected in `reason`. `expires` is an ISO-8601 date carried from the matching acknowledgement (or the rich-form override entry); on/past this date the manifest fails to load with exit 2.
+
+### Severity-override floor
+
+`checks.severity_overrides` continues to accept the legacy scalar form
+(`SHIP-XYZ: medium`) and additionally accepts a rich form
+(`SHIP-XYZ: { severity, reason, expires }`). Reviewers should prefer the
+rich form for any tier-crossing or release-critical override.
+
+Some built-in checks declare a per-check **hard floor**
+(`CheckMetadata.floor_severity`). When set, a manifest override that
+resolves to a weaker severity than the floor is rejected as a config
+error (exit 2). The floor is hard — `acknowledge_overrides` does NOT
+bypass it. Use `agents-shipgate list-checks --json` to inspect each
+check's floor.
+
+`checks.acknowledge_overrides[]` (v0.17+) — required for severity
+overrides whose application crosses a severity tier boundary as a
+downgrade. Stable shape: `{check_id, reason, expires?}`. Within-tier
+downgrades (e.g., medium → low) and any upgrade never require ack.
+Tiers (stable within `0.x`): `critical / high / medium-low`. Expired
+ack entries are a manifest config error.
+
+**Dynamic-severity check classes** (v0.17+). For check IDs whose
+emitted finding severity depends on user-declared manifest values —
+specifically `SHIP-ACTION-POLICY-VIOLATION` (emits at
+`action_surface.policies[].severity`) and policy-pack rule IDs (emit
+at the pack rule's `severity`) — the resolver uses the **strongest
+declared severity** across the manifest as the tier-crossing
+comparison base, not the static catalog default. This closes the
+bypass where a `severity: critical` action policy with override
+`high` could appear same-tier against the catalog's `high` default.
+The `policy_audit.severity_overrides_applied[].default_severity`
+row reports the effective (dynamic-aware) default so reviewers see
+the real before/after.
 
 ### Scenario Suggestion YAML
 

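Reviewer note on the STABILITY.md hunk above: two pieces of it lend themselves to a short sketch. The first is the dynamic-severity comparison base (strongest declared severity across the manifest rather than the catalog default). The second is how a consumer might pull tier-crossing downgrades out of `policy_audit.severity_overrides_applied[]`. Both helpers below are hypothetical illustrations, not the resolver shipped in this PR. Falling back to the catalog default when the manifest declares no severities is an assumption.

```python
# Severity ordering, weakest to strongest.
RANK = {"info": 0, "low": 1, "medium": 2, "high": 3, "critical": 4}


def effective_default(catalog_default, declared_severities):
    """Comparison base for dynamic-severity check IDs.

    Uses the strongest severity declared in the manifest; falls back to the
    static catalog default when nothing is declared (assumption).
    """
    if not declared_severities:
        return catalog_default
    return max(declared_severities, key=RANK.__getitem__)


def tier_crossing_downgrades(report):
    """Return audit rows a reviewer would likely look at first."""
    rows = report["policy_audit"]["severity_overrides_applied"]
    return [r for r in rows if r["tier_crossed"] and r["direction"] == "downgrade"]


# A `severity: critical` action policy compared against the catalog's `high`.
base = effective_default("high", ["medium", "critical"])

# Illustrative report fragment using the field names listed in STABILITY.md;
# the check IDs and values are made up for the example.
report = {
    "policy_audit": {
        "severity_overrides_applied": [
            {"check_id": "SHIP-ACTION-POLICY-VIOLATION", "default_severity": base,
             "applied_severity": "high", "tier_crossed": True,
             "direction": "downgrade"},
            {"check_id": "SHIP-EXAMPLE", "default_severity": "medium",
             "applied_severity": "low", "tier_crossed": False,
             "direction": "downgrade"},
        ]
    }
}
flagged = [row["check_id"] for row in tier_crossing_downgrades(report)]
```

Because the envelope is always emitted, even when empty, the consumer can index `policy_audit.severity_overrides_applied` directly on v0.17 reports without guarding for a missing key.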
diff --git a/docs/INDEX.md b/docs/INDEX.md
@@ -21,7 +21,7 @@ A single entry point for human readers and AI agents walking the `docs/` tree.
 - [`checks.md`](checks.md) — full check catalog (human-readable)
 - [`checks.json`](checks.json) — machine-readable check catalog (regenerated each release)
 - [`manifest-v0.1.json`](manifest-v0.1.json) — JSON Schema for `shipgate.yaml`
-- [`report-schema.v0.17.json`](report-schema.v0.17.json) — JSON Schema for `report.json` (current; emitted reports carry `report_schema_version: "0.17"`, which adds the per-finding `release_decision.contribution_rules[]` audit on top of v0.16's first-class Action Surface Diff fields)
+- [`report-schema.v0.17.json`](report-schema.v0.17.json) — JSON Schema for `report.json` (current; emitted reports carry `report_schema_version: "0.17"`, which adds the top-level `policy_audit` block surfacing applied severity overrides plus the per-finding `release_decision.contribution_rules[]` audit on top of v0.16's first-class Action Surface Diff fields)
 - [`agent-action-guide.md`](agent-action-guide.md) — per-category recipe for what to do with a finding (canonical fix per check category, last-resort suppression rules)
 - [`upstream-integrations.md`](upstream-integrations.md) — per-framework 60-second drop-in for adding Shipgate to an existing project (OpenAI Agents SDK, LangChain, CrewAI, ADK, MCP-only, OpenAPI-only, OpenAI Messages API, Anthropic Messages API)
 - [`report-schema.v0.16.json`](report-schema.v0.16.json) — frozen v0.16 reference schema; pre-v0.17 reports validate against this