Merged
3 changes: 2 additions & 1 deletion AGENTS.md
@@ -250,8 +250,9 @@ Other stable top-level fields:
- `findings[].blocks_release` (v0.16+, explicit release-policy blockers from Action Surface Diff policies)
- `action_surface_facts` / `action_surface_diff` (v0.16+, deterministic action snapshot and base/head action delta)
- `release_decision.contribution_rules[]` (v0.17+, per-finding audit of how each finding contributed to the decision; one row per `report.findings` entry, with `category` ∈ `{blocker, review_item, excluded}` and `rule` ∈ `{policy_block_new, severity_block_new, policy_baseline_accepted, severity_baseline_accepted, review_required, sub_threshold, suppressed}`)
- `policy_audit.severity_overrides_applied[]` (v0.17+, top-of-report audit envelope listing every manifest-driven severity override with `{check_id, default_severity, applied_severity, manifest_path, reason, tier_crossed, direction, expires}`)

-The full schema is at [`docs/report-schema.v0.17.json`](docs/report-schema.v0.17.json) (current; emitted reports carry `report_schema_version: "0.17"`). v0.17 adds the per-finding `release_decision.contribution_rules[]` audit, on top of v0.16's first-class Action Surface Diff fields, v0.15's per-finding `provenance_kind` enum, v0.14's `insufficient_evidence` value in the `release_decision.decision`/`agent_summary.verdict` enums, and v0.13's `codex_plugin_surface` block. Older reports validate against [`docs/report-schema.v0.16.json`](docs/report-schema.v0.16.json) (frozen reference). What's-stable is documented in [STABILITY.md](STABILITY.md).
+The full schema is at [`docs/report-schema.v0.17.json`](docs/report-schema.v0.17.json) (current; emitted reports carry `report_schema_version: "0.17"`). v0.17 adds the top-level `policy_audit` block surfacing applied severity overrides and the per-finding `release_decision.contribution_rules[]` audit, on top of v0.16's first-class Action Surface Diff fields, v0.15's per-finding `provenance_kind` enum, v0.14's `insufficient_evidence` value in the `release_decision.decision`/`agent_summary.verdict` enums, and v0.13's `codex_plugin_surface` block. Older reports validate against [`docs/report-schema.v0.16.json`](docs/report-schema.v0.16.json) (frozen reference). What's-stable is documented in [STABILITY.md](STABILITY.md).

**Release gating signal**: prefer `release_decision.decision` (`"blocked" | "review_required" | "insufficient_evidence" | "passed"`) over `summary.status`. The new field is **baseline-aware** — a baseline-matched critical surfaces in `release_decision.review_items` (accepted debt), not `release_decision.blockers`. `summary.status` stays baseline-blind for v0.7 compatibility, so a baseline-matched-only critical produces both `summary.status = "release_blockers_detected"` AND `release_decision.decision = "review_required"` (intentional divergence — see [STABILITY.md](STABILITY.md#release_decisiondecision-vs-summarystatus)). `insufficient_evidence` (added v0.14) signals that the scan saw too many low-confidence tools or source-loader warnings to be trustworthy; consumers that switch on the enum must fall back to `review_required` for unknown future values.
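
A minimal sketch of a consumer that follows this guidance, assuming the report has already been parsed into a dict. The names `gating_decision` and `may_ship` are hypothetical, and treating a missing field the same way as an unknown value is an assumption rather than documented behavior:

```python
# Decision values listed in the paragraph above; anything else counts as unknown.
KNOWN_DECISIONS = {"blocked", "review_required", "insufficient_evidence", "passed"}


def gating_decision(report: dict) -> str:
    """Read release_decision.decision, falling back to review_required.

    summary.status is deliberately ignored because it is baseline-blind.
    """
    decision = report.get("release_decision", {}).get("decision")
    if isinstance(decision, str) and decision in KNOWN_DECISIONS:
        return decision
    # Unknown future enum value (or, by assumption, a missing field).
    return "review_required"


def may_ship(report: dict) -> bool:
    """Hypothetical consumer policy: only an explicit 'passed' ships unattended."""
    return gating_decision(report) == "passed"
```

With a baseline-matched-only critical, `summary.status` would read `release_blockers_detected` while this helper returns `review_required`, which is the intentional divergence described above.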

66 changes: 62 additions & 4 deletions CHANGELOG.md
@@ -2,13 +2,71 @@

## Unreleased

- **v0.17 / M1 trust-hardening: severity-override floor + audit.**
- `core.models.CheckMetadata` gains an optional `floor_severity` field
(Severity | None). 16 release-critical built-in checks now declare a
hard floor:
- `SHIP-POLICY-APPROVAL-MISSING` (critical → floor "high")
- `SHIP-ACTION-{FINANCIAL-WRITE-CONTROL-MISSING, DESTRUCTIVE-ROLLBACK-MISSING,
WILDCARD-SCOPE, EFFECT-ESCALATED, APPROVAL-REMOVED}` (critical → floor "high")
- `SHIP-AUTH-{MISSING-SCOPE, MANIFEST-BROAD-SCOPE, TOOL-BROAD-SCOPE,
SCOPE-COVERAGE-MISSING}` (high → floor "medium")
- `SHIP-SCOPE-{TOOL-OUTSIDE-PURPOSE, PROHIBITED-TOOL-PRESENT}` (high → floor "medium")
- `SHIP-INVENTORY-{WILDCARD-TOOLS, LOW-CONFIDENCE-PRODUCTION-SURFACE}` (high → floor "medium")
- `SHIP-POLICY-CONFIRMATION-MISSING` (high → floor "medium")
- `SHIP-SIDEFX-IDEMPOTENCY-MISSING` (high → floor "medium")
- Any `checks.severity_overrides` entry that resolves below the floor
is rejected as a manifest config error (exit 2). The floor is hard;
no acknowledgement bypasses it. **Breaking** for manifests that
previously downgraded these checks below their new floor — fix by
raising the override to floor-or-above, or removing the override.
- `checks.severity_overrides` accepts both the legacy scalar form
(`SHIP-XYZ: medium`) and a new rich form
(`SHIP-XYZ: { severity, reason, expires }`). Reason flows into the
new audit row; expires gives reviewers a time-bounded override.
- New `checks.acknowledge_overrides[]` block. Required for any
severity override whose application crosses a severity tier
boundary (critical ↔ high, high ↔ medium/low/info) as a downgrade.
Tier-crossing **upgrades** never require ack (strictly more
conservative). Same-tier downgrades (medium → low) don't require ack.
For checks emitted with manifest-declared severity (action-surface
policies via `SHIP-ACTION-POLICY-VIOLATION`, policy-pack rules)
the resolver compares against the strongest declared severity
across the manifest, not the static catalog default — so a
`severity: critical` action policy with override `high` is
correctly tier-crossing and requires ack.
- Expired `acknowledge_overrides` entry raises a manifest config error
(exit 2) — no advisory-mode bypass. Same hard contract applies to
`expires` on rich-form `severity_overrides` entries.
- New top-level `report.policy_audit` block surfacing every applied
override:
`policy_audit.severity_overrides_applied[].{check_id,
default_severity, applied_severity, manifest_path, reason,
tier_crossed, direction, expires}`. Always emitted on scans (empty
envelope when no overrides applied); required + non-nullable on
the wire (mirrors the v0.12 `agent_summary` pattern). Lands at
`report_schema_version: "0.17"` alongside M8's
`release_decision.contribution_rules[]` — both audits are additive
and share the same schema bump.
- Markdown report renders a new "Policy Audit" section between
Release Decision and Summary when overrides exist. GitHub step
summary adds a one-liner counting overrides + tier-crossed +
upgrades/downgrades.
- New module `core/severity_overrides.py` owns floor/tier/ack/expiry
resolution as a pure function; `core/findings.py::apply_severity_overrides`
still consumes a flat `dict[str, Severity]` so existing direct
callers and tests stay byte-compatible.
- `AgentsShipgateManifest.severity_overrides()` still returns the
flat scalar projection for back-compat; new
`severity_override_entries()` returns the rich shape and
`acknowledge_overrides()` returns the ack list.
- Added `release_decision.contribution_rules[]` — a deterministic
per-finding audit of how each finding contributed to the release
decision (M8 of the Trust Hardening Pass). Bumps
-`report_schema_version` to `0.17`. Exactly one row per
-`report.findings` entry (including suppressed) with `category` ∈
-`{blocker, review_item, excluded}` and `rule` ∈ `{policy_block_new,
-severity_block_new, policy_baseline_accepted,
+`report_schema_version` to `0.17` (shared with M1's `policy_audit`).
+Exactly one row per `report.findings` entry (including suppressed)
+with `category` ∈ `{blocker, review_item, excluded}` and `rule` ∈
+`{policy_block_new, severity_block_new, policy_baseline_accepted,
severity_baseline_accepted, review_required, sub_threshold,
suppressed}`. The new `STABILITY.md` "Release decision truth table"
documents which `(rule, category)` pair fires for every
6 changes: 3 additions & 3 deletions README.md
@@ -190,7 +190,7 @@ Set `pr_comment: "true"` to post a compact PR summary:

## What it produces

-- **Tool-Use Readiness Report** — `agents-shipgate-reports/report.{md,json,sarif}`. Markdown for human release review, JSON for tools and coding agents (current schema [v0.17](docs/report-schema.v0.17.json); gating signal is `release_decision.decision`; v0.17 adds the per-finding `release_decision.contribution_rules[]` audit on top of v0.16's first-class Action Surface Diff fields and v0.15's per-finding `provenance_kind`), SARIF for GitHub code-scanning workflows.
+- **Tool-Use Readiness Report** — `agents-shipgate-reports/report.{md,json,sarif}`. Markdown for human release review, JSON for tools and coding agents (current schema [v0.17](docs/report-schema.v0.17.json); gating signal is `release_decision.decision`; v0.17 adds the top-level `policy_audit` block surfacing every applied severity override plus the per-finding `release_decision.contribution_rules[]` decision audit on top of v0.16's first-class Action Surface Diff fields and v0.15's per-finding `provenance_kind`), SARIF for GitHub code-scanning workflows.
- **Release Evidence Packet** — `agents-shipgate-reports/packet.{md,json,html}` (and `packet.pdf` with the `[pdf]` extras). Reviewer-shaped synthesis with fixed sections, including tool-surface and action-surface diffs when available. Governed by [packet schema v0.5](docs/packet-schema.v0.5.json) — see [STABILITY.md §Release Evidence Packet](STABILITY.md#release-evidence-packet-v05).

## Exit codes
@@ -226,7 +226,7 @@ Agents Shipgate is designed to be agent-friendly. If you're a coding agent (Clau
- **[`prompts/`](prompts/)** — reusable prompts for common workflows
- **[`skills/agents-shipgate/`](skills/agents-shipgate/)** + **[`.claude/commands/shipgate.md`](.claude/commands/shipgate.md)** — self-contained Claude Code skill (bundled prompts and CI recipe) and `/shipgate` slash command. See [`docs/agents/use-with-claude-code.md`](docs/agents/use-with-claude-code.md) to install in your own project.
- **[`docs/ai-search-summary.md`](docs/ai-search-summary.md)** — human-readable summary for AI search, answer engines, and coding agents
-- **[`docs/manifest-v0.1.json`](docs/manifest-v0.1.json)** + **[`docs/report-schema.v0.17.json`](docs/report-schema.v0.17.json)** — JSON Schemas for live editor validation (current; emitted reports carry `report_schema_version: "0.17"`). v0.17 adds `release_decision.contribution_rules[]` (per-finding decision audit); v0.16 added `action_surface_facts` and `action_surface_diff`; v0.15 added the per-finding `provenance_kind` enum. Read `release_decision.decision` for release gating in new consumers; read `agent_summary.first_recommended_action` for a deterministic next step.
+- **[`docs/manifest-v0.1.json`](docs/manifest-v0.1.json)** + **[`docs/report-schema.v0.17.json`](docs/report-schema.v0.17.json)** — JSON Schemas for live editor validation (current; emitted reports carry `report_schema_version: "0.17"`). v0.17 adds the top-level `policy_audit` block surfacing applied severity overrides and the per-finding `release_decision.contribution_rules[]` decision audit; v0.16 added `action_surface_facts` and `action_surface_diff`; v0.15 added the per-finding `provenance_kind` enum. Read `release_decision.decision` for release gating in new consumers; read `agent_summary.first_recommended_action` for a deterministic next step.
- **[`docs/checks.json`](docs/checks.json)** — machine-readable check catalog

Every command has a `--json` form. Errors emit a structured `next_action` line on stderr when `AGENTS_SHIPGATE_AGENT_MODE=1`.
@@ -414,7 +414,7 @@ Agents Shipgate is a static, manifest-first scanner. It is intentionally narrow:
- It does not verify runtime behavior, latency, prompt quality, or routing decisions.
- It does not replace dynamic security testing or human security review of the underlying systems.
- It only inspects what is declared in `shipgate.yaml`, local OpenAPI specs, MCP exports, simple OpenAI API artifacts, optional SDK AST metadata, static Google ADK/LangChain/CrewAI inputs, and static Codex plugin package metadata; tools that are not declared or statically discoverable are not scanned.
-- The manifest remains `version: "0.1"` so existing configs keep working. Current reports carry `report_schema_version: "0.17"` (additive over v0.16, adding `release_decision.contribution_rules[]` — a deterministic per-finding audit of how each finding contributed to the release decision) while preserving the stable payload contract documented in the report schema.
+- The manifest remains `version: "0.1"` so existing configs keep working. Current reports carry `report_schema_version: "0.17"` (additive over v0.16's action-surface diff, adding the top-level `policy_audit` block surfacing applied severity overrides and the per-finding `release_decision.contribution_rules[]` decision audit) while preserving the stable payload contract documented in the report schema.

See [ROADMAP.md](ROADMAP.md) for what is planned next.

35 changes: 35 additions & 0 deletions STABILITY.md
@@ -101,6 +101,41 @@ In `agents-shipgate-reports/report.json`, the following are guaranteed:
- `tool_inventory[].{name, source_type, source_ref, risk_tags, auth_scopes, owner, confidence}`
- `loaded_plugins[].{name, value, distribution, version, check_id}`
- `loaded_plugins[].{validation_status, validation_errors, runtime_errors}` (v0.17+ / M5) — plugin validation provenance, required + present on every entry. `validation_status` is one of `valid | load_failed | bad_signature | bad_metadata | id_collision | bad_floor`; the two error lists are always present and empty for clean plugins. Invalid plugins still appear in this array (with `check_id: null` for entries that failed before metadata parsing), so reviewers can see what was skipped without reading scanner logs. Plugin findings whose `check_id` does not match the declared metadata are dropped at runtime and recorded under `runtime_errors`.
- `policy_audit.severity_overrides_applied[].{check_id, default_severity, applied_severity, manifest_path, reason, tier_crossed, direction, expires}` (v0.17+ / M1) — top-of-report audit envelope for severity overrides applied during scan. Always present on emitted scans (empty when no overrides applied); required + non-nullable on the wire. `direction` is one of `downgrade | upgrade | same`. `tier_crossed=true` indicates the override crossed a severity tier boundary (critical / high / medium-low); tier-crossing downgrades require a matching `checks.acknowledge_overrides` entry, which is reflected in `reason`. `expires` is an ISO-8601 date carried from the matching acknowledgement (or the rich-form override entry); on/past this date the manifest fails to load with exit 2.
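
As an illustration, a consumer could tally the audit rows like this. It is a sketch, not Shipgate's own summary code: `summarize_policy_audit` is a hypothetical helper, and only the field names and the `direction` values are taken from the description above.

```python
from collections import Counter


def summarize_policy_audit(report: dict) -> dict:
    """Count applied overrides, tier-crossing rows, and upgrade/downgrade rows."""
    # The envelope is described as required and non-nullable, so direct
    # indexing is used; an empty list simply yields zero counts.
    rows = report["policy_audit"]["severity_overrides_applied"]
    directions = Counter(row["direction"] for row in rows)
    return {
        "overrides": len(rows),
        "tier_crossed": sum(1 for row in rows if row["tier_crossed"]),
        "downgrades": directions["downgrade"],
        "upgrades": directions["upgrade"],
    }
```

A reviewer-facing tool might print these counts next to the release decision so that tier-crossing downgrades are visible without opening the full report.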

### Severity-override floor

`checks.severity_overrides` continues to accept the legacy scalar form
(`SHIP-XYZ: medium`) and additionally accepts a rich form
(`SHIP-XYZ: { severity, reason, expires }`). Reviewers should prefer the
rich form for any tier-crossing or release-critical override.
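
To make the two accepted shapes concrete, here is a sketch that normalizes both into one record. `normalize_override` and `OverrideEntry` are hypothetical names, and treating `reason` and `expires` as optional in the rich form is an assumption; the manifest schema is the authority on which keys are required.

```python
from typing import Optional, TypedDict, Union


class OverrideEntry(TypedDict):
    severity: str
    reason: Optional[str]
    expires: Optional[str]  # ISO-8601 date string when present


def normalize_override(value: Union[str, dict]) -> OverrideEntry:
    """Map the scalar form or the rich form onto one common shape."""
    if isinstance(value, str):
        # Legacy scalar form, e.g. SHIP-XYZ: medium
        return {"severity": value, "reason": None, "expires": None}
    # Rich form, e.g. SHIP-XYZ: { severity, reason, expires }
    return {
        "severity": value["severity"],
        "reason": value.get("reason"),
        "expires": value.get("expires"),
    }
```

Both `normalize_override("medium")` and `normalize_override({"severity": "medium", "reason": "tracked in review"})` yield a record with the same three keys, so downstream code only handles one shape.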

Some built-in checks declare a per-check **hard floor**
(`CheckMetadata.floor_severity`). When set, a manifest override that
resolves to a weaker severity than the floor is rejected as a config
error (exit 2). The floor is hard — `acknowledge_overrides` does NOT
bypass it. Use `agents-shipgate list-checks --json` to inspect each
check's floor.
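
The floor rule can be sketched as a single comparison. This is an illustration under stated assumptions, not the resolver in `core/severity_overrides.py`: the severity ordering `info < low < medium < high < critical` is assumed, and `ManifestConfigError` and `enforce_floor` are hypothetical names standing in for whatever produces the exit-2 config error.

```python
from typing import Optional

# Assumed ordering, weakest to strongest.
SEVERITY_RANK = {"info": 0, "low": 1, "medium": 2, "high": 3, "critical": 4}


class ManifestConfigError(ValueError):
    """Hypothetical error type standing in for the exit-2 config error."""


def enforce_floor(check_id: str, applied: str, floor: Optional[str]) -> str:
    """Return the applied severity, or raise if it is weaker than the floor."""
    if floor is not None and SEVERITY_RANK[applied] < SEVERITY_RANK[floor]:
        # No acknowledgement parameter exists here on purpose: the floor is hard.
        raise ManifestConfigError(
            f"{check_id}: override '{applied}' is below floor '{floor}'"
        )
    return applied
```

An override equal to the floor passes, and a check with no declared floor accepts any severity.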

`checks.acknowledge_overrides[]` (v0.17+) — required for severity
overrides whose application crosses a severity tier boundary as a
downgrade. Stable shape: `{check_id, reason, expires?}`. Within-tier
downgrades (e.g., medium → low) and any upgrade never require ack.
Tiers (stable within `0.x`): `critical / high / medium-low`. Expired
ack entries are a manifest config error.
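
A sketch of the acknowledgement rule and the expiry rule, assuming the three tiers listed above and assuming `info` sits in the lowest tier alongside medium and low. `requires_ack` and `is_expired` are hypothetical helpers, not the project's API.

```python
from datetime import date

# Assumed ordering, weakest to strongest.
SEVERITY_RANK = {"info": 0, "low": 1, "medium": 2, "high": 3, "critical": 4}
# Tiers: critical / high / medium-low (info placed in the lowest tier by assumption).
SEVERITY_TIER = {"info": 0, "low": 0, "medium": 0, "high": 1, "critical": 2}


def requires_ack(default: str, applied: str) -> bool:
    """True only for a downgrade that lands in a different tier."""
    is_downgrade = SEVERITY_RANK[applied] < SEVERITY_RANK[default]
    crosses_tier = SEVERITY_TIER[applied] != SEVERITY_TIER[default]
    return is_downgrade and crosses_tier


def is_expired(expires: str, today: date) -> bool:
    """An entry counts as expired on or after its ISO-8601 `expires` date."""
    return today >= date.fromisoformat(expires)
```

Under these assumptions `requires_ack("high", "medium")` is true, while `requires_ack("medium", "low")` and any upgrade are false.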

**Dynamic-severity check classes** (v0.17+). For check IDs whose
emitted finding severity depends on user-declared manifest values —
specifically `SHIP-ACTION-POLICY-VIOLATION` (emits at
`action_surface.policies[].severity`) and policy-pack rule IDs (emit
at the pack rule's `severity`) — the resolver uses the **strongest
declared severity** across the manifest as the tier-crossing
comparison base, not the static catalog default. This closes the
bypass where a `severity: critical` action policy with override
`high` could appear same-tier against the catalog's `high` default.
The `policy_audit.severity_overrides_applied[].default_severity`
row reports the effective (dynamic-aware) default so reviewers see
the real before/after.
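
The effect of the dynamic comparison base can be sketched as follows. The helper names are hypothetical, the severity ordering and tier mapping are assumed, and falling back to the catalog default when the manifest declares no severities is also an assumption.

```python
from typing import Iterable

# Assumed ordering and tiers (critical / high / medium-low, info in the lowest).
SEVERITY_RANK = {"info": 0, "low": 1, "medium": 2, "high": 3, "critical": 4}
SEVERITY_TIER = {"info": 0, "low": 0, "medium": 0, "high": 1, "critical": 2}


def comparison_base(catalog_default: str, declared: Iterable[str]) -> str:
    """Pick the strongest manifest-declared severity as the comparison base."""
    declared = list(declared)
    if not declared:
        return catalog_default  # assumption: nothing declared, use the catalog
    return max(declared, key=SEVERITY_RANK.__getitem__)


def is_tier_crossing_downgrade(base: str, applied: str) -> bool:
    """True when the override is weaker than the base and in another tier."""
    return (
        SEVERITY_RANK[applied] < SEVERITY_RANK[base]
        and SEVERITY_TIER[applied] != SEVERITY_TIER[base]
    )
```

With a catalog default of `high` and declared policy severities `["medium", "critical"]`, the base becomes `critical`, so an override to `high` is a tier-crossing downgrade. Compared against the catalog default alone it would have looked same-tier, which is the bypass the paragraph above describes.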

### Scenario Suggestion YAML

2 changes: 1 addition & 1 deletion docs/INDEX.md
@@ -21,7 +21,7 @@ A single entry point for human readers and AI agents walking the `docs/` tree.
- [`checks.md`](checks.md) — full check catalog (human-readable)
- [`checks.json`](checks.json) — machine-readable check catalog (regenerated each release)
- [`manifest-v0.1.json`](manifest-v0.1.json) — JSON Schema for `shipgate.yaml`
-- [`report-schema.v0.17.json`](report-schema.v0.17.json) — JSON Schema for `report.json` (current; emitted reports carry `report_schema_version: "0.17"`, which adds the per-finding `release_decision.contribution_rules[]` audit on top of v0.16's first-class Action Surface Diff fields)
+- [`report-schema.v0.17.json`](report-schema.v0.17.json) — JSON Schema for `report.json` (current; emitted reports carry `report_schema_version: "0.17"`, which adds the top-level `policy_audit` block surfacing applied severity overrides plus the per-finding `release_decision.contribution_rules[]` audit on top of v0.16's first-class Action Surface Diff fields)
- [`agent-action-guide.md`](agent-action-guide.md) — per-category recipe for what to do with a finding (canonical fix per check category, last-resort suppression rules)
- [`upstream-integrations.md`](upstream-integrations.md) — per-framework 60-second drop-in for adding Shipgate to an existing project (OpenAI Agents SDK, LangChain, CrewAI, ADK, MCP-only, OpenAPI-only, OpenAI Messages API, Anthropic Messages API)
- [`report-schema.v0.16.json`](report-schema.v0.16.json) — frozen v0.16 reference schema; pre-v0.17 reports validate against this