Architecture

This page is for contributors and integrators who want to understand the internals. It is pinned to v0.2.0; modules and APIs may evolve.

High-level data flow

shipgate.yaml
    │
    ▼
┌─────────────────────┐
│ config/loader.py    │  YAML → AgentsShipgateManifest (Pydantic, extra="forbid")
└─────────────────────┘
    │
    ▼
┌─────────────────────┐
│ inputs/             │  load_mcp_tools / load_openapi_tools /
│   mcp.py            │  load_openai_sdk_static_tools / load_openai_api_artifacts
│   openapi.py        │  → list[LoadedToolSource]
│   openai_sdk_…py    │
│   openai_api.py     │
└─────────────────────┘
    │
    ▼
┌─────────────────────┐
│ cli/scan.py         │  flatten_and_deduplicate_tools (priority-based)
│ _flatten…           │  → list[Tool]
└─────────────────────┘
    │
    ▼
┌─────────────────────┐
│ core/risk_hints.py  │  enrich_tools_with_risk_hints (HTTP method,
│ enrich…             │  MCP annotations, tokenized keyword classifier,
│                     │  manual overrides) → list[Tool] with risk_hints
└─────────────────────┘
    │
    ▼
┌─────────────────────┐
│ checks/registry.py  │  run_checks (28 built-ins + opt-in plugins)
│ run_checks          │  → list[Finding]
└─────────────────────┘
    │
    ▼
┌─────────────────────┐
│ core/findings.py    │  assign_finding_ids (fingerprint + collision discriminator)
│                     │  apply_severity_overrides
│                     │  apply_suppressions
└─────────────────────┘
    │
    ▼
┌─────────────────────┐
│ core/baseline.py    │  apply_baseline (matched / new / resolved)
└─────────────────────┘
    │
    ▼
┌─────────────────────┐
│ report/             │  ReadinessReport → report.md, report.json
│   markdown.py       │  + GitHub step summary
│   json_report.py    │
│ ci/                 │
│   exit_policy.py    │  exit_code_for_report → 0 or 20
└─────────────────────┘
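
The apply_baseline stage in the diagram sorts findings into matched, new, and resolved. A minimal sketch of that three-way split, assuming findings are compared by fingerprint; the function name and the plain-string representation are hypothetical, not the project's API:

```python
def classify_against_baseline(current: set[str], baseline: set[str]) -> dict[str, set[str]]:
    """Partition fingerprints into the three baseline statuses (illustrative only)."""
    return {
        "matched": current & baseline,   # present now and in the baseline
        "new": current - baseline,       # present now, absent from the baseline
        "resolved": baseline - current,  # in the baseline, no longer reported
    }


statuses = classify_against_baseline({"fp_a", "fp_b"}, {"fp_b", "fp_c"})
```

Under this assumption, only the "new" bucket would represent findings introduced since the baseline was saved.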

Module layout

src/agents_shipgate/
├── __main__.py          # python -m agents_shipgate entry point
├── cli/
│   ├── main.py          # Typer app: scan, init, doctor, explain, list-checks, baseline
│   ├── scan.py          # Orchestration: load → enrich → check → report
│   └── discovery.py     # Workspace scan + manifest template generator
├── config/
│   ├── schema.py        # Pydantic manifest models (extra="forbid")
│   └── loader.py        # YAML loader + typo suggester
├── inputs/
│   ├── common.py        # resolve_input_path, load_structured_file, schema_to_parameters
│   ├── mcp.py           # MCP JSON tools/list export reader
│   ├── openapi.py       # OpenAPI 3.x reader with bounded $ref resolution
│   ├── openai_sdk_static.py  # Python AST extractor for @function_tool
│   └── openai_api.py    # OpenAI Agents API artifacts (prompts, schemas, traces)
├── core/
│   ├── models.py        # Tool, Finding, ReadinessReport, etc.
│   ├── context.py       # ScanContext (frozen dataclass)
│   ├── risk_hints.py    # Tokenized keyword classifier + manual overrides
│   ├── findings.py      # Fingerprinting, suppressions, severity overrides
│   ├── baseline.py      # Save / load / apply baseline
│   ├── errors.py        # Exception hierarchy
│   └── logging.py       # Stderr logger + optional JSON formatter
├── checks/
│   ├── registry.py      # Built-in check list + plugin loader
│   ├── base.py          # tool_finding / agent_finding helpers
│   ├── inventory.py
│   ├── documentation.py
│   ├── schema.py
│   ├── auth.py
│   ├── manifest_scope.py
│   ├── manifest_consistency.py
│   ├── policy.py
│   ├── side_effects.py
│   └── api.py           # OpenAI Agents API checks
├── report/
│   ├── markdown.py      # render_markdown_report
│   └── json_report.py   # write_json_report
└── ci/
    ├── exit_policy.py   # exit_code_for_report (advisory vs strict)
    └── github_summary.py  # GITHUB_STEP_SUMMARY emitter

Key types

`ScanContext` (`core/context.py`)

Frozen dataclass passed to every check function:

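from dataclasses import dataclass
from pathlib import Path

# AgentsShipgateManifest, Agent, Tool, and OpenAIApiArtifacts are project models.
@dataclass(frozen=True)
class ScanContext:
    manifest: AgentsShipgateManifest
    agent: Agent
    tools: list[Tool]
    config_path: Path
    api_artifacts: OpenAIApiArtifacts | None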

Pure value object — checks must not mutate it.
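
What frozen=True does and does not guarantee is worth seeing once. The sketch below uses a hypothetical two-field stand-in (MiniContext) rather than the real ScanContext: attribute rebinding raises, but a list field can still be mutated in place, so a check that needs a modified view should build a copy.

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from pathlib import Path


@dataclass(frozen=True)
class MiniContext:  # hypothetical stand-in for ScanContext
    tools: list[str]
    config_path: Path


ctx = MiniContext(tools=["send_email"], config_path=Path("shipgate.yaml"))

# Rebinding an attribute on a frozen dataclass raises FrozenInstanceError.
try:
    ctx.config_path = Path("other.yaml")
    rebind_blocked = False
except FrozenInstanceError:
    rebind_blocked = True

# frozen=True is shallow: ctx.tools.append(...) would still succeed.
# A check that wants a different tool list should derive a new object instead.
local_tools = [*ctx.tools, "extra_tool"]
derived = replace(ctx, tools=local_tools)
```

The "must not mutate" rule is therefore partly a convention: the dataclass blocks reassignment, while leaving container contents untouched is up to each check.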

`Tool` (`core/models.py`)

Pydantic model. Carries the union of fields a check might inspect: name, description, source_type, schemas, parameters, annotations, auth scopes, risk_hints, owner, extraction confidence. Source-specific fields (HTTP method, MCP annotation hints) live under annotations.

`Finding` (`core/models.py`)

Pydantic model. Required fields: check_id, title, severity, category, recommendation. Optional: tool_id, tool_name, agent_id, evidence (free-form dict), confidence, source (SourceReference). Set after creation: id, fingerprint, suppressed, suppression_reason, baseline_status.

`CheckMetadata` (`core/models.py`)

Pydantic model used by list-checks / explain. Plugins attach a CheckMetadata (or compatible dict) as run.AGENTS_SHIPGATE_METADATA to register catalog entries.
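
As a rough illustration of the attachment point described above, a plugin's run callable can carry its metadata as a function attribute. The dict keys and the empty return value below are assumptions for the sketch, not the actual CheckMetadata fields or check contract:

```python
def run(context):
    """Hypothetical plugin check that reports nothing."""
    return []


# Attach catalog metadata as an attribute on the callable.
# Key names are illustrative assumptions, not the real CheckMetadata schema.
run.AGENTS_SHIPGATE_METADATA = {
    "id": "EXAMPLE-PLUGIN-CHECK",
    "title": "Example plugin check",
}

# A loader can read the metadata without invoking the check itself.
metadata = getattr(run, "AGENTS_SHIPGATE_METADATA", None)
```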

Risk-hint classifier (`core/risk_hints.py`)

The most heuristic-laden module. Critical implementation notes:

Tokenized keyword matching. v0.2 uses re.findall(r"[a-z]+", text.lower()) to split names/descriptions/scopes into word tokens, then intersects with module-level keyword sets. This avoids substring false positives ("deploy" matches the standalone token but not the substring inside "deployments").
Source-typed gating. The keyword classifier runs only for openai_api and sdk_function source types. OpenAPI-derived tools get read_only / write directly from the HTTP method.
SDK preview safety net. SDK functions that have preview among their tokens and no HTTP method get read_only at HIGH confidence and are exempted from the keyword classifier. This is what protects fixture tools like send_email_preview from being tagged as external_write.
GET → read_only at HIGH. Any GET endpoint with no write hint gets read_only at HIGH confidence so is_effectively_read_only short-circuits policy/scope checks. The exception is GETs that pick up a destructive tag from operationId tokens (e.g. *_destroy_with_associated_resources); those are not short-circuited and still go through the policy/scope checks.
Manual overrides win. risk_overrides.tools.{tool}.tags add hints at HIGH manual confidence; remove_tags removes by tag regardless of source.

The full keyword sets live near the top of risk_hints.py and are documented in Check Catalog § Risk-hint reference.
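
The tokenized matching described above can be sketched in a few lines. The regular expression is the one quoted in this section; the keyword set and helper names are hypothetical stand-ins for the module-level sets in risk_hints.py:

```python
import re

# Hypothetical keyword set for illustration only.
DEPLOY_KEYWORDS = {"deploy", "release"}


def tokenize(text: str) -> set[str]:
    """Split a name, description, or scope into lowercase alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def matches_keywords(text: str, keywords: set[str]) -> bool:
    """True when at least one whole token is a keyword."""
    return bool(tokenize(text) & keywords)


standalone = matches_keywords("deploy_service", DEPLOY_KEYWORDS)   # token "deploy" is present
embedded = matches_keywords("list_deployments", DEPLOY_KEYWORDS)   # only "deployments" is present
naive = any(k in "list_deployments" for k in DEPLOY_KEYWORDS)      # substring test for comparison
preview_tokens = tokenize("send_email_preview")
```

Here standalone is true, embedded is false, and the naive substring test is true, which is the false positive the tokenizer avoids. Underscores, digits, and punctuation all act as separators, so send_email_preview yields preview as its own token for the safety net to find.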

Fingerprint algorithm (`core/findings.py`)

import hashlib
import json

def finding_fingerprint(finding: Finding) -> str:
    # Identity covers only the check, the tool, and canonicalized evidence.
    identity = {
        "check_id": finding.check_id,
        "tool_name": finding.tool_name,
        "evidence": _canonicalize_for_fingerprint(finding.evidence),
    }
    digest = hashlib.sha256(
        json.dumps(identity, sort_keys=True, default=str).encode("utf-8")
    ).hexdigest()[:16]
    return f"fp_{digest}"

_canonicalize_for_fingerprint recursively sorts dict keys, sorts list items by JSON representation, and excludes the default_severity key (the audit field that records pre-override severity). This last detail is what makes severity_overrides safe to apply before or after assign_finding_ids — a question that surfaced in the v0.2 review pass.
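
A minimal sketch of the canonicalization step, written from the description above rather than from the project source. The function names are hypothetical, and dropping default_severity at every nesting level is an assumption:

```python
import hashlib
import json

EXCLUDED_KEYS = {"default_severity"}  # audit field left out of the identity


def canonicalize(value):
    """Return an order-independent copy of nested evidence (illustrative)."""
    if isinstance(value, dict):
        return {
            key: canonicalize(value[key])
            for key in sorted(value, key=str)
            if key not in EXCLUDED_KEYS
        }
    if isinstance(value, (list, tuple)):
        items = [canonicalize(item) for item in value]
        # Sort by JSON text so mixed or nested items still have a total order.
        return sorted(items, key=lambda item: json.dumps(item, sort_keys=True, default=str))
    return value


def fingerprint(check_id, tool_name, evidence):
    identity = {"check_id": check_id, "tool_name": tool_name, "evidence": canonicalize(evidence)}
    payload = json.dumps(identity, sort_keys=True, default=str).encode("utf-8")
    return f"fp_{hashlib.sha256(payload).hexdigest()[:16]}"


# "SHIP-X" is a made-up check id used only for this example.
before = fingerprint("SHIP-X", "send_email", {"scopes": ["b", "a"]})
after = fingerprint("SHIP-X", "send_email", {"default_severity": "high", "scopes": ["a", "b"]})
```

Both calls produce the same fingerprint: list order is normalized and the pre-override severity never enters the hash, so recording it in evidence cannot change a finding's identity.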

When two findings collide (same fingerprint), assign_finding_ids adds an 8-char content-derived discriminator built from agent_id, category, confidence, recommendation, source, title, tool_id, tool_name. The result is order-independent — running the same checks in a different order produces the same id for each finding.
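
The order-independence claim follows from deriving the discriminator purely from each finding's own content. A sketch under stated assumptions: findings are plain dicts here, and the fingerprint_discriminator id format is hypothetical; only the field list comes from the text above.

```python
import hashlib
import json
from collections import Counter

DISCRIMINATOR_FIELDS = (
    "agent_id", "category", "confidence", "recommendation",
    "source", "title", "tool_id", "tool_name",
)


def discriminator(finding: dict) -> str:
    """8-char digest over the content fields; no dependence on list position."""
    payload = {field: finding.get(field) for field in DISCRIMINATOR_FIELDS}
    text = json.dumps(payload, sort_keys=True, default=str)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:8]


def assign_ids(findings: list[dict]) -> list[str]:
    """Return one id per finding, adding a discriminator only on collisions."""
    counts = Counter(f["fingerprint"] for f in findings)
    ids = []
    for f in findings:
        fp = f["fingerprint"]
        ids.append(f"{fp}_{discriminator(f)}" if counts[fp] > 1 else fp)
    return ids


a = {"fingerprint": "fp_1", "title": "Missing timeout"}
b = {"fingerprint": "fp_1", "title": "Missing retry policy"}
c = {"fingerprint": "fp_2", "title": "No owner"}
forward = dict(zip("abc", assign_ids([a, b, c])))
backward = dict(zip("cba", assign_ids([c, b, a])))
```

A counter-based suffix (fp_1-1, fp_1-2) would be simpler but would tie ids to check execution order, which is what a content-derived suffix avoids.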

Plugin loader (`checks/registry.py`)

Plugins load only when AGENTS_SHIPGATE_ENABLE_PLUGINS=1 is set in the environment and --no-plugins is not passed on the CLI. The loader:

Calls entry_points(group="agents_shipgate.checks").
Skips entry points where dist.metadata["Name"] (normalized) equals "agents-shipgate" — protects against builtin spoofing.
Falls back to a value-prefix check when dist is None (rare; usually pip installs).
Collects each plugin's metadata into loaded_plugins[] for the report.

See Plugin Authoring for the public-facing contract.
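
The gating and spoof-protection steps above can be approximated with importlib.metadata (the group keyword requires Python 3.10+). Function names, the PEP 503-style normalization, and the exact prefix used when dist is None are assumptions made for this sketch:

```python
import os
import re
from importlib.metadata import entry_points

BUILTIN_DIST = "agents-shipgate"
BUILTIN_VALUE_PREFIX = "agents_shipgate."  # assumed prefix for the fallback check


def normalize(name: str) -> str:
    """Normalize a distribution name: lowercase, runs of -, _ and . become -."""
    return re.sub(r"[-_.]+", "-", name).lower()


def plugins_enabled(no_plugins_flag: bool, env=None) -> bool:
    env = os.environ if env is None else env
    return env.get("AGENTS_SHIPGATE_ENABLE_PLUGINS") == "1" and not no_plugins_flag


def is_builtin_spoof(ep) -> bool:
    """True when an entry point appears to come from the built-in distribution."""
    dist = getattr(ep, "dist", None)
    if dist is not None:
        return normalize(dist.metadata["Name"]) == BUILTIN_DIST
    # No distribution info available: fall back to inspecting the target path.
    return ep.value.startswith(BUILTIN_VALUE_PREFIX)


def discover_plugins(no_plugins_flag=False, env=None, candidates=None):
    if not plugins_enabled(no_plugins_flag, env):
        return []
    if candidates is None:
        candidates = entry_points(group="agents_shipgate.checks")
    return [ep for ep in candidates if not is_builtin_spoof(ep)]
```

The env and candidates parameters exist only so the sketch can be exercised without installing packages; a real loader would also record each accepted plugin's metadata for the report.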

Trust-posture invariants

These are enforced by the test suite and grep-able from the source:

No subprocess, os.system, popen anywhere
No HTTP client (requests, urllib, httpx, aiohttp) in scanner code
YAML uses yaml.safe_load; !!python/object/... rejected
Path resolution rejects .. escape from manifest dir (tests/test_inputs.py::test_mcp_loader_rejects_path_traversal)
Plugin builtin spoof rejected (tests/test_plugins.py::test_builtin_distribution_entry_points_are_skipped)

See Trust Model § Verifying these claims.
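
The path-traversal invariant can be illustrated with pathlib alone. This is a sketch of the general technique, not the project's resolve_input_path; the function name and error type are hypothetical, and Path.is_relative_to needs Python 3.9+:

```python
from pathlib import Path


def resolve_inside(base_dir: Path, relative: str) -> Path:
    """Resolve a manifest-relative path and refuse anything outside base_dir."""
    base = base_dir.resolve()
    candidate = (base / relative).resolve()  # collapses ".." segments and symlinks
    if not candidate.is_relative_to(base):
        raise ValueError(f"path escapes manifest directory: {relative}")
    return candidate


base = Path("project")
inside = resolve_inside(base, "tools/mcp.json")

try:
    resolve_inside(base, "../outside.json")
    escaped = True
except ValueError:
    escaped = False
```

Resolving before comparing matters: a purely textual prefix check would accept project/../outside.json, and an absolute input such as /etc/passwd replaces the base entirely when joined.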

Testing

git clone https://github.com/ThreeMoonsLab/agents-shipgate.git
cd agents-shipgate
python -m pip install -e ".[dev]"
python -m pytest                           # 125 tests
python -m pytest tests/test_risk_hints.py  # tokenization invariants
python -m pytest tests/test_plugins.py     # plugin loader contract
python -m ruff check .                     # lint

CI pins:

pytest with --cov-fail-under=75 (.github/workflows/ci.yml)
Ruff rules ["E4", "E7", "E9", "F", "I", "B", "UP"] with B008 ignored (Typer defaults)
pip-audit for dependency vulnerabilities
cyclonedx-py for SBOM generation

Releases are signed with sigstore and published via PyPI Trusted Publishing (.github/workflows/release.yml).

Where to add new code

You're adding…	File / pattern
A new check	`src/agents_shipgate/checks/{name}.py` with a `run(context)` function. Add to `BUILTIN_CHECKS` in `registry.py`. Add a `CHECK_METADATA` entry. Add a test under `tests/`.
A new risk-hint heuristic	Extend `_add_automatic_hints` in `risk_hints.py`. Add tests in `tests/test_risk_hints.py` covering both true positives and the edge case that motivated it.
A new input loader	`src/agents_shipgate/inputs/{name}.py` with a `load_*_tools(source, base_dir) -> LoadedToolSource`. Wire into `cli/scan.py:_load_sources`. Use `resolve_input_path` for paths.
A new manifest field	`src/agents_shipgate/config/schema.py`. The typo suggester picks it up automatically (no list update needed). Bump the manifest schema version if it's a breaking change.
A new CLI command	`cli/main.py`. Each top-level command is a `@app.command()`. Errors → `ConfigError` (exit 2), `InputParseError` (exit 3), `AgentsShipgateError` (exit 4).
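
For the CLI row above, one way to picture the error-to-exit-code mapping is a small exception hierarchy. The class names and codes come from the table; the subclassing, the exit_code attribute, and run_command are assumptions for illustration and may not match core/errors.py:

```python
class AgentsShipgateError(Exception):
    exit_code = 4  # generic scanner failure


class ConfigError(AgentsShipgateError):
    exit_code = 2  # manifest or configuration problem


class InputParseError(AgentsShipgateError):
    exit_code = 3  # a tool source could not be parsed


def run_command(command) -> int:
    """Run a command callable and translate known errors into exit codes."""
    try:
        command()
    except AgentsShipgateError as error:
        return error.exit_code
    return 0


def bad_manifest():
    raise ConfigError("unknown key in shipgate.yaml")


code = run_command(bad_manifest)
```

Catching the base class once keeps each command body free of exit-code bookkeeping; a new command only has to raise the right error type.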

Roadmap and current debts

See ROADMAP.md for the official direction. Known internal debts that contributors are welcome to take on:

Split SHIP-API-OPERATIONAL-READINESS into atomic check IDs (currently bundles retry, timeout, test cases, output schemas, traces).
Strict mode default fails only on critical — discussion ongoing about whether [critical, high] should be the implicit default.
Baselines include created_at and aren't byte-idempotent across runs — a content-only mode would improve git diffs.
Top-level check_severity_overrides is an alias for the nested checks.severity_overrides. Pick one and deprecate the other.

Open issues with the architecture label discuss these in detail.
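
To make the strict-mode debt concrete, here is a hedged sketch of an exit policy with a configurable fail-on set. The exit codes 0 and 20 and the critical-only default come from this page; the function name, its signature, and the assumption that advisory mode always returns 0 are illustrative:

```python
def sketch_exit_code(severities, strict=False, fail_on=("critical",)):
    """Map the severities of active findings to a process exit code."""
    if strict and any(severity in fail_on for severity in severities):
        return 20
    return 0


current_default = sketch_exit_code(["high"], strict=True)
proposed_default = sketch_exit_code(["high"], strict=True, fail_on=("critical", "high"))
```

With the current default a report containing only high findings passes in strict mode; under the proposed [critical, high] default the same report would fail.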

Agents Shipgate · Apache-2.0 · maintained by Three Moons Lab · Report a false positive
