Static release checks for tool-using AI agents.
Agents Shipgate is an open-source CLI and GitHub Action that scans MCP, OpenAPI, OpenAI Agents SDK, Anthropic Messages API, Google ADK, LangChain/LangGraph, CrewAI, n8n, and OpenAI API artifacts, then writes a deterministic Tool-Use Readiness Report before your agent gets production-like permissions.
Website: threemoonslab.com — quickstart, glossary, check catalog, and design partners.
No agent execution. No LLM calls. No MCP server connections. No scanner network calls. No scanner telemetry. Apache-2.0.
For a 5-minute first run, scan a bundled fixture and inspect the generated
report. If you already have uv installed, this
is a one-command check with no persistent install:
uvx agents-shipgate fixture run support_refund_agent

Otherwise, install once with pipx and run the same fixture:
pipx install agents-shipgate
agents-shipgate fixture run support_refund_agent

The fixture prints:
Fixture: support_refund_agent
Decision: blocked
Blockers: 2 Review items: 16
Counts: critical=2 high=14 medium=2
Reports: <tempdir>/reports
Fixture copy at <tempdir>; pass --keep to retain after the run.
Both blockers are on stripe.create_refund: missing approval policy and missing idempotency evidence. The fixture writes report.{md,json} and packet.{md,json,html} into the temp reports/ directory. To scan your own repo and write the standard agents-shipgate-reports/ directory, see Scan your repo below.
The public Action is listed on the GitHub Action Marketplace. Use the snippet in Use in CI to add it to a workflow.
Open a report first if you want to see the output shape before installing:
| Sample | Markdown | JSON |
|---|---|---|
| support_refund_agent | report.md | report.json |
| simple_openai_api_agent | report.md | report.json |
| simple_langchain_agent | report.md | report.json |
The support_refund_agent fixture also includes a reviewer-shaped Release
Evidence Packet in packet.md,
packet.json, and
packet.html.
Add static release checks for this tool-using AI agent with Agents Shipgate.
Run:
agents-shipgate detect --workspace . --json
If Shipgate is relevant, run:
agents-shipgate contract --json
agents-shipgate init --workspace . --write --ci --json
agents-shipgate scan -c shipgate.yaml --suggest-patches --format json
agents-shipgate apply-patches --from agents-shipgate-reports/report.json --confidence high --apply
Read `agents-shipgate-reports/report.json`, not Markdown. Summarize
`release_decision.decision`, blocker count, review item count, top critical/high
findings, and any safe patches applied. Auto-apply only high-confidence safe
patches. Do not auto-assert approval, confirmation, idempotency, broad-scope,
or prohibited-action policy decisions. Ensure `.gitignore` contains
`agents-shipgate-reports/` before committing.
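
The last step of the prompt above, making sure `.gitignore` covers the report directory, can be scripted. The following is a minimal sketch in Python, not part of the Agents Shipgate CLI; the function name `ensure_gitignore_entry` is hypothetical and only the `agents-shipgate-reports/` path comes from this README.

```python
from pathlib import Path

# Directory named in this README; everything else here is illustrative.
REPORTS_ENTRY = "agents-shipgate-reports/"


def ensure_gitignore_entry(repo_root: Path, entry: str = REPORTS_ENTRY) -> bool:
    """Append `entry` to .gitignore if it is missing.

    Returns True when the file was changed, False when the entry was
    already present (hypothetical helper, not a Shipgate API).
    """
    gitignore = repo_root / ".gitignore"
    existing = gitignore.read_text(encoding="utf-8") if gitignore.exists() else ""
    # Compare whole lines, ignoring surrounding whitespace and a leading slash.
    lines = {line.strip().lstrip("/") for line in existing.splitlines()}
    if entry in lines or entry.rstrip("/") in lines:
        return False
    # Keep the existing content intact and make sure it ends with a newline.
    separator = "" if existing == "" or existing.endswith("\n") else "\n"
    gitignore.write_text(f"{existing}{separator}{entry}\n", encoding="utf-8")
    return True
```

Calling `ensure_gitignore_entry(Path("."))` before committing is idempotent: a second call leaves the file untouched.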
- Agent builders — review MCP, OpenAPI, and SDK tool definitions before merging changes that expand the tool surface.
- Platform teams — add release gates for approval, scope, idempotency, and baseline drift to PR review.
- Security and GRC reviewers — get static release evidence without running agents or importing user code.
Run Agents Shipgate when a PR adds or changes agent tool surfaces or the policy evidence around them:
- MCP exports, OpenAPI specs, or local tool inventories.
- OpenAI Agents SDK, Google ADK, LangChain/LangGraph, CrewAI, Anthropic Messages API, or OpenAI API artifact tool definitions.
- Prompts, permission scopes, approval policies, confirmation policies, prohibited actions, or `shipgate.yaml`.
- GitHub Actions or CI release gates for a tool-using AI agent.
agents-shipgate init --workspace . --write
agents-shipgate scan -c shipgate.yaml

Reports land at `agents-shipgate-reports/report.{md,json,sarif}`; the Release Evidence Packet lands at `agents-shipgate-reports/packet.{md,json,html}`.
Install alternatives (your agent project does not need Python 3.12 — install the CLI separately):
python -m pip install agents-shipgate # global pip
uv tool install agents-shipgate # via uv

The v0.6 single-turn flow takes a workspace from "looks like an agent project" to "Shipgate integrated, scan green or with safe patches applied, CI workflow drafted":
agents-shipgate detect --json # 1. classify
agents-shipgate init --write --ci --json # 2. manifest + workflow
agents-shipgate scan -c shipgate.yaml --suggest-patches --format json # 3. scan + suggest
agents-shipgate apply-patches --from agents-shipgate-reports/report.json \
  --confidence high --apply # 4. apply safe trivial fixes

`init --ci` writes `.github/workflows/agents-shipgate.yml`. `apply-patches` is dry-run by default and refuses to mutate anything outside the manifest's directory.
For agents driving this flow programmatically, see
docs/agent-recipes.md. For framework-by-framework
minimal manifests, see docs/minimal-real-configs.md.
- uses: ThreeMoonsLab/agents-shipgate@v0.10.0
  with:
    config: shipgate.yaml
    ci_mode: advisory

Set `pr_comment: "true"` to post a compact PR summary.

| Input | Status |
|---|---|
| Model Context Protocol (MCP) exports | Supported |
| OpenAPI 3.x specs | Supported |
| OpenAI Agents SDK Python entrypoints | Supported |
| Anthropic Messages API artifacts | Supported |
| Google ADK Python and YAML config | Supported |
| LangChain/LangGraph static Python inputs | Supported |
| CrewAI static Python inputs | Supported |
| n8n workflow JSON and source-control stubs | Supported |
| OpenAI API artifacts | Supported |
| Codex plugin packages and marketplaces | Supported |
- Tool-Use Readiness Report: `agents-shipgate-reports/report.{md,json,sarif}`. Markdown for human release review, JSON for tools and coding agents (current schema v0.16; gating signal is `release_decision.decision`; v0.16 adds first-class Action Surface Diff fields on top of v0.15's per-finding `provenance_kind`), SARIF for GitHub code-scanning workflows.
- Release Evidence Packet: `agents-shipgate-reports/packet.{md,json,html}` (and `packet.pdf` with the `[pdf]` extras). Reviewer-shaped synthesis with fixed sections, including tool-surface and action-surface diffs when available. Governed by packet schema v0.5; see STABILITY.md §Release Evidence Packet.

| Code | Meaning |
|---|---|
| `0` | Pass (advisory mode or strict-no-blockers) |
| `2` | Manifest config error |
| `3` | Input parse error (file missing, malformed, path traversal blocked) |
| `4` | Other Agents Shipgate error |
| `20` | Strict-mode gate failure |
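
A CI wrapper usually wants to tell a policy gate failure (20) apart from a broken scan (2, 3, 4). The sketch below encodes the table above as a Python enum; the names `ShipgateExit`, `describe_exit`, and `is_tool_error` are hypothetical and only the numeric codes and their meanings come from this README.

```python
from enum import IntEnum


class ShipgateExit(IntEnum):
    """Exit codes as listed in the table above (names are illustrative)."""

    PASS = 0
    MANIFEST_CONFIG_ERROR = 2
    INPUT_PARSE_ERROR = 3
    OTHER_ERROR = 4
    STRICT_GATE_FAILURE = 20


def describe_exit(code: int) -> str:
    """Map a process exit code to a short label, or 'unknown' if unlisted."""
    try:
        return ShipgateExit(code).name.lower()
    except ValueError:
        return "unknown"


def is_tool_error(code: int) -> bool:
    """True when the scan itself failed, as opposed to the release gate."""
    return code in (
        ShipgateExit.MANIFEST_CONFIG_ERROR,
        ShipgateExit.INPUT_PARSE_ERROR,
        ShipgateExit.OTHER_ERROR,
    )
```

Treating tool errors separately keeps a malformed manifest from being reported as a release blocker.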
Human readers can skip this section; it exists so coding agents can find the repo's machine-readable contracts quickly.
Agents Shipgate is designed to be agent-friendly. If you're a coding agent (Claude Code, Codex, Cursor, Aider) reading this repo:
- `llms.txt`: short index of every machine-readable surface, one fetch.
- `llms-full.txt`: long-form concatenation of `AGENTS.md` + recipes + checks + concepts + autofix policy, in one document. Built by `scripts/build-llms-full.py`.
- `.well-known/agents-shipgate.json`: discovery metadata (tagline, install commands, schema URLs, gating signal, exit codes, trigger-catalog URL).
- `docs/triggers.json`: machine-readable mirror of the AGENTS.md trigger table. Apply the rules to a PR diff to decide whether to propose `agents-shipgate detect`. Schema is stable for `0.x`.
- `tools/shipgate-detect.py`: zero-install, stdlib-only detector. `curl … | python3 - --workspace . --json` returns the same structural verdict as `agents-shipgate detect --json`. Pinned to the canonical CLI by `tests/test_zero_install_detector.py`. See `docs/zero-install.md`.
- `agents-shipgate contract --json`: verify the installed CLI's local contract before relying on hard-coded schema or gating assumptions.
- `docs/agent-contract-current.md`: single source of truth for the current schema versions and which JSON fields to read. Updated whenever the contract bumps; other agent-facing surfaces link here instead of restating the contract.
- `AGENTS.md`: canonical agent-facing instructions (install, run, common tasks, JSON-mode flags, error semantics).
- `STABILITY.md`: what won't break across `0.x` versions.
- `docs/target-repo-agent-snippets.md`: copyable snippets for adding Shipgate trigger rules to downstream agent repos.
- `docs/agent-adoption-harness.md`: manual protocol for checking whether coding agents discover and use Shipgate.
- `benchmark/`: frozen archetypes, prompts, setup variants, and a public leaderboard CSV. Closes the loop on adoption-readiness changes.
- `docs/zero-install.md`: single-file detector, `uvx`, and GitHub Action paths for evaluating Shipgate without a local install.
- `prompts/`: reusable prompts for common workflows.
- `skills/agents-shipgate/` + `.claude/commands/shipgate.md`: self-contained Claude Code skill (bundled prompts and CI recipe) and `/shipgate` slash command. See `docs/agents/use-with-claude-code.md` to install in your own project.
- `docs/ai-search-summary.md`: human-readable summary for AI search, answer engines, and coding agents.
- `docs/manifest-v0.1.json` + `docs/report-schema.v0.16.json`: JSON Schemas for live editor validation (current; emitted reports carry `report_schema_version: "0.16"`). v0.16 adds `action_surface_facts` and `action_surface_diff`; v0.15 added the per-finding `provenance_kind` enum. Read `release_decision.decision` for release gating in new consumers; read `agent_summary.first_recommended_action` for a deterministic next step.
- `docs/checks.json`: machine-readable check catalog.

Every command has a --json form. Errors emit a structured next_action line on stderr when AGENTS_SHIPGATE_AGENT_MODE=1.
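
For consumers of the JSON report, a small reader that pulls out only the fields named above might look like the sketch below. It is an illustration under assumptions, not an official client: `summarize_report` and `load_report` are hypothetical helpers, and the only field paths used are `report_schema_version`, `release_decision.decision`, and `agent_summary.first_recommended_action`, whose value is treated as opaque.

```python
import json
from pathlib import Path
from typing import Any


def load_report(path: str) -> dict[str, Any]:
    """Read a report.json file from disk (hypothetical helper)."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def summarize_report(report: dict[str, Any]) -> dict[str, Any]:
    """Extract the gating signal and next step without assuming other fields.

    Missing sections yield None instead of raising, so callers can decide
    how strict to be about schema drift.
    """
    release = report.get("release_decision") or {}
    agent = report.get("agent_summary") or {}
    return {
        "schema_version": report.get("report_schema_version"),
        "decision": release.get("decision"),
        "first_recommended_action": agent.get("first_recommended_action"),
    }
```

A typical call would be `summarize_report(load_report("agents-shipgate-reports/report.json"))`; running `agents-shipgate contract --json` first remains the documented way to confirm which schema the installed CLI emits.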
Once an AI agent can refund, email, cancel, deploy, or modify a record, every tool change becomes a release event. Code review catches code; eval suites catch behavior; observability catches runtime. None of them answer the release question: given the tool surface declared in this PR, do we have explicit approval policies, scope coverage, idempotency evidence, and review readiness for every action?
Agents Shipgate produces a deterministic answer to that question, before promotion.
The longer thesis — healthcare for agents and the broader agent lifecycle readiness roadmap — lives on the marketing site.
The bundled support-refund fixture demonstrates the kind of release risks Agents Shipgate is designed to surface:
## Release Decision
Decision: blocked
Reason: 2 active findings block release.
Blockers: 2
Review items: 16
Fail policy: would_fail_ci=false (exit 0)
Top findings:
1. stripe.create_refund lacks a declared approval policy
2. stripe.create_refund lacks idempotency evidence
3. Manifest declares broad permission scopes
- `stripe.create_refund` lacks a declared approval policy, so a financial action could ship without an explicit human review gate.
- `stripe.create_refund.amount` lacks a maximum bound, weakening blast-radius control.
- `stripe.create_refund` lacks idempotency evidence while retry behavior is known, risking duplicate refunds.
- `wildcard_mcp_tools.*` exposes a wildcard tool surface, making review incomplete.
- `gmail.send_customer_email` overlaps a prohibited external-communication action without a matching confirmation policy.

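
The two blockers in this report can be thought of as presence checks over a declared tool surface. The sketch below is a toy illustration in Python, not Shipgate's implementation: the `DeclaredTool` class, its fields, and `find_blockers` are hypothetical names introduced here, and real checks work from the manifest and tool schemas.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DeclaredTool:
    """A statically declared tool (hypothetical shape for illustration)."""

    name: str
    has_side_effects: bool
    approval_policy: Optional[str] = None
    idempotency_evidence: Optional[str] = None


def find_blockers(tools: list[DeclaredTool]) -> list[str]:
    """Flag side-effecting tools that lack approval or idempotency evidence.

    Nothing is executed or imported; the check only reads declarations.
    """
    findings: list[str] = []
    for tool in tools:
        if not tool.has_side_effects:
            continue  # read-only tools are out of scope for these two checks
        if tool.approval_policy is None:
            findings.append(f"{tool.name} lacks a declared approval policy")
        if tool.idempotency_evidence is None:
            findings.append(f"{tool.name} lacks idempotency evidence")
    return findings
```

Because the input is plain declared data, the same surface always yields the same findings, which is the property that makes a static release check deterministic.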
The fastest way to understand what changes for a reviewer: walk through a Golden PR. Each one ships a sample manifest, the resulting report, the release decision, and the recommended PR-comment summary an agent should post.
- `openai-agents-sdk-refund-agent`: refund agent adds `stripe.create_refund`. Shipgate decides `blocked` because approval policy and idempotency evidence are missing. Includes the recommended Markdown PR-comment template.
- `golden-pr-from-coding-agent.md`: the artifact a coding agent should produce after running the canonical 4-call flow (PR comment, structured `agent_summary`, applied diff, review-item table).
- `mcp-only-tool-server`: MCP server with no Python framework imports; demonstrates the MCP-only adoption path.
- `openapi-support-agent`: OpenAPI-described tool surface; shows scope-coverage findings.

| Alternative | Gap Agents Shipgate Covers |
|---|---|
| Unit tests | Tests usually validate code paths, not the released tool surface and declared policies. |
| Code review | Reviewers miss generated specs, MCP exports, broad scopes, and missing approval policies. |
| Runtime traces | Useful later, but they arrive after behavior exists. Agents Shipgate runs before promotion. |
| Nothing | Tool-surface drift becomes a production surprise. |
For named comparisons against specific evaluators and platforms, see the marketing-site versus pages: vs evals, vs promptfoo, vs Braintrust, vs LangSmith, and vs observability platforms.
CI is advisory by default:
agents-shipgate scan --config shipgate.yaml --ci-mode advisory

Strict mode exits with code 20 only when unsuppressed critical findings exist. Configuration, input parsing, and internal tool errors use 2, 3, and 4 respectively:

agents-shipgate scan --config shipgate.yaml --ci-mode strict

For existing projects, save the current reviewed findings as a local baseline and fail strict CI only on new unsuppressed findings:

agents-shipgate baseline save --config shipgate.yaml --out .agents-shipgate/baseline.json
agents-shipgate scan --config shipgate.yaml --baseline .agents-shipgate/baseline.json --ci-mode strict

Teams can override severities and CI failure thresholds:

checks:
  severity_overrides:
    SHIP-AUTH-MISSING-SCOPE: critical
ci:
  fail_on:
    - critical
    - high

Agents Shipgate supports static Google ADK extraction for Python entrypoints and Agent Config YAML. The adapter detects `LlmAgent`/`Agent` definitions, function tools, `OpenAPIToolset`, `McpToolset`, callbacks, plugins, sub-agents, eval references, and explicit local tool inventories without importing ADK code.

version: "0.1"
project:
  name: adk-support-agent
agent:
  name: support-agent
  declared_purpose:
    - handle support cases
environment:
  target: production_like
tool_sources:
  - id: adk
    type: google_adk
    path: agent.py
google_adk:
  eval_sets:
    - evals/support.eval.json
  tool_inventories:
    - inventories/adk-mcp-tools.json

Dynamic ADK toolsets produce warnings or findings unless you provide explicit MCP, OpenAPI, or local tool inventory inputs.

Agents Shipgate includes static Python extraction for LangChain/LangGraph and
CrewAI. The adapters parse Python AST only; they do not import framework
packages or user modules. The supported LangChain/LangGraph patterns target
LangChain Core 0.3+, LangChain 1.x create_agent, and LangGraph 0.2+ source
shapes.
tool_sources:
  - id: langchain_agent
    type: langchain
    path: agent.py
  - id: crewai_agent
    type: crewai
    path: crew.py

For dynamic or prebuilt tool surfaces, provide explicit local inventory files:

langchain:
  tool_inventories:
    - inventories/langchain-tools.json
crewai:
  tool_inventories:
    - inventories/crewai-tools.json

v0.4 adds local declarative YAML policy packs for organization-specific release rules. Policy packs are static data and run without importing code.

checks:
  policy_packs:
    - path: policies/org-release.yaml

agents-shipgate scan --config shipgate.yaml --policy-pack policies/org-release.yaml

| Buyer | Pain | Pitch | Next step |
|---|---|---|---|
| Platform engineer shipping a first production agent | "I don't know what I don't know." | Audits manifest and tool schemas for release risks code review misses. | Run agents-shipgate init --workspace . --write. |
| Security or GRC reviewer | "Agents bypass existing controls." | Creates a static tool-surface audit trail for review. | Review the check catalog. |
| AI PM with a shipping deadline | "Security review blocks us late." | Gives teams self-serve pre-review before formal approval. | Scan the support-refund fixture. |
Agents Shipgate is a static, manifest-first scanner. It is intentionally narrow:
- It does not run agents, call tools, invoke LLMs, or verify model availability.
- It does not verify runtime behavior, latency, prompt quality, or routing decisions.
- It does not replace dynamic security testing or human security review of the underlying systems.
- It only inspects what is declared in `shipgate.yaml`, local OpenAPI specs, MCP exports, simple OpenAI API artifacts, optional SDK AST metadata, static Google ADK/LangChain/CrewAI inputs, and static Codex plugin package metadata; tools that are not declared or statically discoverable are not scanned.
- The manifest remains `version: "0.1"` so existing configs keep working. Current reports carry `report_schema_version: "0.16"` (additive over v0.15's provenance enum, adding `action_surface_facts` and `action_surface_diff`) while preserving the stable payload contract documented in the report schema.

See ROADMAP.md for what is planned next.
Agents Shipgate does not import user code, run agents, call tools, call LLMs, connect to MCP servers, make network calls, or collect telemetry by default.
See Trust model and Security policy for the default local-only guarantees and disclosure process.
Drop this full advisory workflow into .github/workflows/agents-shipgate.yml. It runs on every PR, posts a summary comment, uploads the report and packet as workflow artifacts, and never fails the job. This is the same file shipped at examples/github-actions/01-advisory-pr-comment.yml.
name: Agents Shipgate (advisory)
on:
  pull_request:
permissions:
  contents: read
  pull-requests: write
jobs:
  shipgate:
    runs-on: ubuntu-latest
    timeout-minutes: 10
    steps:
      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd
        with:
          fetch-depth: 0
      - uses: ThreeMoonsLab/agents-shipgate@v0.10.0
        with:
          ci_mode: advisory
          diff_base: target
          pr_comment: 'true'
          shipgate_version: '0.10.0'

Switch to `ci_mode: strict` only after your team has reviewed the advisory output. See examples/github-actions/ for strict / baseline / SARIF / multi-config / changed-paths recipes.

Inputs: config, ci_mode (advisory or strict), fail_on, baseline, baseline_mode, diff_from, diff_base, policy_packs, no_plugins, output_dir, upload_artifact, pr_comment, github_token, shipgate_version. Set diff_base: target for a best-effort target-branch scan in PRs; shallow checkout, missing config, schema mismatch, or scan failure disables the diff and leaves the release gate unchanged.
Outputs: decision, blocker_count, review_item_count, ci_would_fail, diff_enabled, status, critical_count, high_count, medium_count, baseline_new_count, baseline_matched_count, baseline_resolved_count, adk_agent_count, adk_dynamic_toolset_count, report_json, report_markdown, report_sarif, exit_code. Prefer decision and ci_would_fail over legacy status for new release gates.
Set shipgate_version to install a pinned PyPI release instead of the action source when your workflow requires package/version parity.
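
The `baseline_new_count`, `baseline_matched_count`, and `baseline_resolved_count` outputs suggest a three-way comparison between baseline and current findings. The sketch below shows one way such counts could be derived with set arithmetic. It is an assumption about the semantics based on the output names, not Shipgate's algorithm, and it presumes each finding has a stable string identifier, which this README does not specify.

```python
def baseline_counts(baseline_ids: set[str], current_ids: set[str]) -> dict[str, int]:
    """Compare finding identifiers from a saved baseline and a fresh scan.

    new      = present now, absent from the baseline
    matched  = present in both
    resolved = in the baseline, no longer present
    """
    return {
        "baseline_new_count": len(current_ids - baseline_ids),
        "baseline_matched_count": len(current_ids & baseline_ids),
        "baseline_resolved_count": len(baseline_ids - current_ids),
    }
```

Under this reading, a strict gate that fails only on new findings would look at `baseline_new_count` and ignore the matched set that the team has already reviewed.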
Agents Shipgate is and will remain free OSS for individuals and teams running it on their own infrastructure. The core manifest-first scanner, built-in checks, Markdown report, and JSON report are intended to remain open source. We do not collect telemetry and do not require an account.
If hosted dashboards, SSO, org-wide baselines, approval workflows, or trace-based evidence emerge, they should live in a separate optional product rather than moving core OSS functionality behind a paywall.
Teams shipping production-like tool-using agents can apply to the
Three Moons Lab design partner program
— the marketing page mirrors
docs/design-partners.md in the repo and includes a
prefilled email CTA for review criteria and contact.
The marketing site at threemoonslab.com carries the same canonical concepts in human-readable, search-optimised form: quickstart, check catalog, glossary, blog, and design partners. The in-repo docs below are the canonical contract; the marketing pages are sized for first-time readers and AI search ingest.

