
Architecture

Pengfei Hu edited this page Apr 26, 2026 · 1 revision


This page is for contributors and integrators who want to understand the internals. It is pinned to v0.2.0; modules and APIs may change in later releases.


High-level data flow

shipgate.yaml
    │
    ▼
┌─────────────────────┐
│ config/loader.py    │  YAML → AgentsShipgateManifest (Pydantic, extra="forbid")
└─────────────────────┘
    │
    ▼
┌─────────────────────┐
│ inputs/             │  load_mcp_tools / load_openapi_tools /
│   mcp.py            │  load_openai_sdk_static_tools / load_openai_api_artifacts
│   openapi.py        │  → list[LoadedToolSource]
│   openai_sdk_…py    │
│   openai_api.py     │
└─────────────────────┘
    │
    ▼
┌─────────────────────┐
│ cli/scan.py         │  flatten_and_deduplicate_tools (priority-based)
│ _flatten…           │  → list[Tool]
└─────────────────────┘
    │
    ▼
┌─────────────────────┐
│ core/risk_hints.py  │  enrich_tools_with_risk_hints (HTTP method,
│ enrich…             │  MCP annotations, tokenized keyword classifier,
│                     │  manual overrides) → list[Tool] with risk_hints
└─────────────────────┘
    │
    ▼
┌─────────────────────┐
│ checks/registry.py  │  run_checks (28 built-ins + opt-in plugins)
│ run_checks          │  → list[Finding]
└─────────────────────┘
    │
    ▼
┌─────────────────────┐
│ core/findings.py    │  assign_finding_ids (fingerprint + collision discriminator)
│                     │  apply_severity_overrides
│                     │  apply_suppressions
└─────────────────────┘
    │
    ▼
┌─────────────────────┐
│ core/baseline.py    │  apply_baseline (matched / new / resolved)
└─────────────────────┘
    │
    ▼
┌─────────────────────┐
│ report/             │  ReadinessReport → report.md, report.json
│   markdown.py       │  + GitHub step summary
│   json_report.py    │
│ ci/                 │
│   exit_policy.py    │  exit_code_for_report → 0 or 20
└─────────────────────┘
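The last two stages (baseline and exit policy) can be sketched in a few lines. This is a simplified illustration, not the project's code: SimpleFinding, apply_baseline_sketch and exit_code_sketch are hypothetical names, the baseline is reduced to a set of fingerprints, and the rule that suppressed or baseline-matched findings never fail a strict run is an assumption. The default of failing strict runs only on critical findings follows the roadmap notes at the end of this page.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SimpleFinding:
    # Hypothetical stand-in for Finding with only the fields used here.
    fingerprint: str
    severity: str
    suppressed: bool = False
    baseline_status: str | None = None


def apply_baseline_sketch(findings: list[SimpleFinding], baseline: set[str]) -> list[str]:
    """Mark each finding matched/new; return baselined fingerprints that disappeared."""
    current = {f.fingerprint for f in findings}
    for f in findings:
        f.baseline_status = "matched" if f.fingerprint in baseline else "new"
    return sorted(baseline - current)  # the "resolved" set


def exit_code_sketch(
    findings: list[SimpleFinding],
    strict: bool = False,
    fail_on: frozenset[str] = frozenset({"critical"}),
) -> int:
    """Advisory runs always exit 0; strict runs exit 20 on any blocking finding."""
    if not strict:
        return 0
    # Assumption: suppressed and baseline-matched findings never block.
    blocking = [
        f
        for f in findings
        if f.severity in fail_on and not f.suppressed and f.baseline_status != "matched"
    ]
    return 20 if blocking else 0
```

With findings fp_a (critical) and fp_b (low) against a baseline of {fp_a, fp_c}, apply_baseline_sketch reports fp_c as resolved, and a strict run exits 0 because the only critical finding is already baselined.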

Module layout

src/agents_shipgate/
├── __main__.py          # python -m agents_shipgate entry point
├── cli/
│   ├── main.py          # Typer app: scan, init, doctor, explain, list-checks, baseline
│   ├── scan.py          # Orchestration: load → enrich → check → report
│   └── discovery.py     # Workspace scan + manifest template generator
├── config/
│   ├── schema.py        # Pydantic manifest models (extra="forbid")
│   └── loader.py        # YAML loader + typo suggester
├── inputs/
│   ├── common.py        # resolve_input_path, load_structured_file, schema_to_parameters
│   ├── mcp.py           # MCP JSON tools/list export reader
│   ├── openapi.py       # OpenAPI 3.x reader with bounded $ref resolution
│   ├── openai_sdk_static.py  # Python AST extractor for @function_tool
│   └── openai_api.py    # OpenAI Agents API artifacts (prompts, schemas, traces)
├── core/
│   ├── models.py        # Tool, Finding, ReadinessReport, etc.
│   ├── context.py       # ScanContext (frozen dataclass)
│   ├── risk_hints.py    # Tokenized keyword classifier + manual overrides
│   ├── findings.py      # Fingerprinting, suppressions, severity overrides
│   ├── baseline.py      # Save / load / apply baseline
│   ├── errors.py        # Exception hierarchy
│   └── logging.py       # Stderr logger + optional JSON formatter
├── checks/
│   ├── registry.py      # Built-in check list + plugin loader
│   ├── base.py          # tool_finding / agent_finding helpers
│   ├── inventory.py
│   ├── documentation.py
│   ├── schema.py
│   ├── auth.py
│   ├── manifest_scope.py
│   ├── manifest_consistency.py
│   ├── policy.py
│   ├── side_effects.py
│   └── api.py           # OpenAI Agents API checks
├── report/
│   ├── markdown.py      # render_markdown_report
│   └── json_report.py   # write_json_report
└── ci/
    ├── exit_policy.py   # exit_code_for_report (advisory vs strict)
    └── github_summary.py  # GITHUB_STEP_SUMMARY emitter

Key types

ScanContext (core/context.py)

Frozen dataclass passed to every check function:

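from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path

# AgentsShipgateManifest, Agent, Tool, OpenAIApiArtifacts: project models (imports omitted).
@dataclass(frozen=True)
class ScanContext:
    manifest: AgentsShipgateManifest
    agent: Agent
    tools: list[Tool]
    config_path: Path
    api_artifacts: OpenAIApiArtifacts | None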

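Pure value object: checks must not mutate it. Note that frozen=True only blocks attribute reassignment; the tools list is itself mutable, so leaving it unmodified is a convention checks must follow, not something the runtime enforces.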
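A quick demonstration of what frozen=True does and does not guarantee, using MiniContext, a hypothetical two-field stand-in for ScanContext. It also shows dataclasses.replace, the standard-library way to derive a modified copy of a frozen object instead of mutating it.

```python
import dataclasses
from dataclasses import dataclass


@dataclass(frozen=True)
class MiniContext:
    # Hypothetical two-field stand-in for ScanContext.
    config_path: str
    tools: list[str]


ctx = MiniContext(config_path="shipgate.yaml", tools=["lookup_order", "refund_order"])

# Attribute reassignment on a frozen dataclass raises FrozenInstanceError.
try:
    ctx.config_path = "other.yaml"
    reassignment_blocked = False
except dataclasses.FrozenInstanceError:
    reassignment_blocked = True

# To "change" a frozen object, build a new one; a fresh list leaves ctx.tools untouched.
filtered_ctx = dataclasses.replace(ctx, tools=[t for t in ctx.tools if t.startswith("lookup")])

# The list inside is still mutable. Never do this in a check: it is only blocked by convention.
ctx.tools.append("delete_order")
```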

Tool (core/models.py)

Pydantic model. Carries the union of fields a check might inspect: name, description, source_type, schemas, parameters, annotations, auth scopes, risk_hints, owner, extraction confidence. Source-specific fields (HTTP method, MCP annotation hints) live under annotations.

Finding (core/models.py)

Pydantic model. Required fields: check_id, title, severity, category, recommendation. Optional: tool_id, tool_name, agent_id, evidence (free-form dict), confidence, source (SourceReference). Set after creation: id, fingerprint, suppressed, suppression_reason, baseline_status.
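To show how a check turns tools into findings, here is a hypothetical check in the run(context) style. Everything in it is a stand-in: Tool, Finding and CheckContext are local dataclasses rather than the project's Pydantic models, and the check ID SHIP-EXAMPLE-NO-DESCRIPTION, the "medium" severity and the "documentation" category are made up for illustration. Only the required Finding fields and the optional tool_name/evidence fields come from the description above.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class Tool:
    # Stand-in with only the fields this example reads.
    name: str
    description: str | None = None


@dataclass
class Finding:
    # Stand-in: the five required fields plus two of the optional ones.
    check_id: str
    title: str
    severity: str
    category: str
    recommendation: str
    tool_name: str | None = None
    evidence: dict[str, Any] = field(default_factory=dict)


@dataclass(frozen=True)
class CheckContext:
    tools: list[Tool]


def run(context: CheckContext) -> list[Finding]:
    """Hypothetical check: flag tools whose description is missing or blank."""
    findings = []
    for tool in context.tools:  # read-only iteration; the context is never mutated
        if not (tool.description or "").strip():
            findings.append(
                Finding(
                    check_id="SHIP-EXAMPLE-NO-DESCRIPTION",  # made-up ID
                    title=f"Tool {tool.name} has no description",
                    severity="medium",
                    category="documentation",
                    recommendation="Describe what the tool does and when the agent should call it.",
                    tool_name=tool.name,
                    evidence={"description": tool.description},
                )
            )
    return findings
```

The id, fingerprint and suppression fields are deliberately absent: per the description above, they are filled in later by the pipeline, not by the check.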

CheckMetadata (core/models.py)

Pydantic model used by list-checks / explain. Plugins register catalog entries by attaching a CheckMetadata (or a compatible dict) to their check function as the attribute run.AGENTS_SHIPGATE_METADATA.


Risk-hint classifier (core/risk_hints.py)

The most heuristic-laden module. Critical implementation notes:

  • Tokenized keyword matching. v0.2 uses re.findall(r"[a-z]+", text.lower()) to split names/descriptions/scopes into word tokens, then intersects with module-level keyword sets. This avoids substring false positives ("deploy" matches the standalone token but not the substring inside "deployments").
  • Source-typed gating. The keyword classifier runs only for openai_api and sdk_function source types. OpenAPI-derived tools get read_only / write directly from HTTP method.
  • SDK preview safety net. SDK functions whose tokens include preview and have no HTTP method get read_only at HIGH confidence and are exempted from the keyword classifier — this is what protects fixture tools like send_email_preview from being tagged as external_write.
  • GET → read_only at HIGH. Any GET endpoint with no write hint gets read_only at HIGH confidence, so that is_effectively_read_only short-circuits the policy/scope checks. The exception is a GET that picks up a destructive tag from its operationId tokens (e.g. *_destroy_with_associated_resources); those still flow through to the policy/scope checks.
  • Manual overrides win. risk_overrides.tools.{tool}.tags add hints at HIGH manual confidence; remove_tags removes by tag regardless of source.

The full keyword sets live near the top of risk_hints.py and are documented in Check Catalog § Risk-hint reference.
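The rules above can be condensed into a toy classifier. This is a hedged sketch, not risk_hints.py: classify_sketch is a hypothetical name, the keyword sets are tiny placeholders, the "medium" confidence for keyword hits and the order in which overrides are applied are assumptions, and MCP annotations plus the destructive-tag carve-out for GET operationIds are omitted. The tokenizer regex, the source-type gate, the preview safety net, the GET rule and override precedence come from this page.

```python
import re

# Placeholder keyword sets; the real ones live near the top of risk_hints.py.
EXTERNAL_WRITE_KEYWORDS = {"send", "email", "publish", "post"}
DESTRUCTIVE_KEYWORDS = {"delete", "destroy", "drop", "deploy"}
KEYWORD_SOURCE_TYPES = {"openai_api", "sdk_function"}


def tokenize(text: str) -> set[str]:
    # Lowercase, then keep maximal runs of a-z: digits, "_" and spaces act as separators.
    return set(re.findall(r"[a-z]+", text.lower()))


def classify_sketch(name, description, source_type, http_method=None, overrides=None):
    """Return {tag: confidence} for one tool. Illustrative only."""
    tokens = tokenize(f"{name} {description}")
    hints: dict[str, str] = {}

    if http_method is not None:
        # HTTP-derived tools: the method alone decides read_only vs write.
        hints["read_only" if http_method.upper() == "GET" else "write"] = "high"
    elif source_type == "sdk_function" and "preview" in tokens:
        # Preview safety net: read_only, and the keyword classifier is skipped.
        hints["read_only"] = "high"
    elif source_type in KEYWORD_SOURCE_TYPES:
        # Set intersection on whole tokens, so "deployments" never matches "deploy".
        if tokens & EXTERNAL_WRITE_KEYWORDS:
            hints["external_write"] = "medium"
        if tokens & DESTRUCTIVE_KEYWORDS:
            hints["destructive"] = "medium"

    # Manual overrides win. Adding before removing is an assumption of this sketch.
    overrides = overrides or {}
    for tag in overrides.get("tags", []):
        hints[tag] = "high"  # manual hints carry HIGH confidence
    for tag in overrides.get("remove_tags", []):
        hints.pop(tag, None)  # removal applies regardless of where the hint came from
    return hints
```

With these placeholder sets, send_email is tagged external_write, send_email_preview only gets read_only, and list_deployments gets no hints at all.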


Fingerprint algorithm (core/findings.py)

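import hashlib
import json

# Finding (core/models.py) and _canonicalize_for_fingerprint are project code, not shown here.
def finding_fingerprint(finding: Finding) -> str:
    identity = {
        "check_id": finding.check_id,
        "tool_name": finding.tool_name,
        "evidence": _canonicalize_for_fingerprint(finding.evidence),
    }
    # sort_keys makes the JSON (and therefore the hash) independent of dict insertion order.
    digest = hashlib.sha256(
        json.dumps(identity, sort_keys=True, default=str).encode("utf-8")
    ).hexdigest()[:16]
    return f"fp_{digest}"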

_canonicalize_for_fingerprint recursively sorts dict keys, sorts list items by their JSON representation, and excludes the default_severity key (the audit field that records the pre-override severity). Because severity itself is not part of the identity and default_severity is excluded, applying a severity override never changes a fingerprint. That is what makes severity_overrides safe to apply either before or after assign_finding_ids, a question that surfaced in the v0.2 review pass.
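The canonicalization rules can be reproduced in a standalone sketch. The names canonicalize and fingerprint_sketch are hypothetical, the finding is passed as plain arguments instead of a Finding model, and dropping default_severity at every nesting level (rather than only at the top of evidence) is an assumption.

```python
from __future__ import annotations

import hashlib
import json

EXCLUDED_KEYS = {"default_severity"}  # pre-override audit field; never part of identity


def canonicalize(value):
    """Recursively drop excluded keys, sort dict keys, and sort list items by their JSON form."""
    if isinstance(value, dict):
        return {
            key: canonicalize(item)
            for key, item in sorted(value.items(), key=lambda kv: str(kv[0]))
            if key not in EXCLUDED_KEYS
        }
    if isinstance(value, (list, tuple)):
        items = [canonicalize(item) for item in value]
        return sorted(items, key=lambda item: json.dumps(item, sort_keys=True, default=str))
    return value


def fingerprint_sketch(check_id: str, tool_name: str | None, evidence: dict) -> str:
    identity = {"check_id": check_id, "tool_name": tool_name, "evidence": canonicalize(evidence)}
    payload = json.dumps(identity, sort_keys=True, default=str).encode("utf-8")
    return "fp_" + hashlib.sha256(payload).hexdigest()[:16]
```

Reordering dict keys or list items, or adding a default_severity audit key, leaves the result unchanged; changing the check ID or tool name does not.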

When two findings collide (same fingerprint), assign_finding_ids adds an 8-char content-derived discriminator built from agent_id, category, confidence, recommendation, source, title, tool_id, tool_name. The result is order-independent — running the same checks in a different order produces the same id for each finding.
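A sketch of the collision step, under stated assumptions: the discriminator is taken to be the first 8 hex characters of a SHA-256 over the listed fields serialized as sorted-key JSON, colliding ids are formed as fingerprint + "_" + discriminator, and non-colliding findings keep the bare fingerprint. The hash, separator and function names are illustrative guesses; only the field list and the 8-character length come from this page.

```python
import hashlib
import json
from collections import Counter

DISCRIMINATOR_FIELDS = (
    "agent_id", "category", "confidence", "recommendation",
    "source", "title", "tool_id", "tool_name",
)


def discriminator(finding: dict) -> str:
    # Depends only on the finding's content, never on its position in the list.
    content = {name: finding.get(name) for name in DISCRIMINATOR_FIELDS}
    payload = json.dumps(content, sort_keys=True, default=str).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:8]


def assign_ids_sketch(findings: list[dict]) -> None:
    """Set finding["id"]; only fingerprints shared by several findings get a suffix."""
    counts = Counter(f["fingerprint"] for f in findings)
    for f in findings:
        if counts[f["fingerprint"]] > 1:
            f["id"] = f"{f['fingerprint']}_{discriminator(f)}"
        else:
            f["id"] = f["fingerprint"]
```

Because the suffix is derived from content rather than from a counter, running the same findings in a different order yields the same ids. If two colliding findings agree on every listed field, this sketch still gives them the same id; how the real code handles that case is not covered here.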


Plugin loader (checks/registry.py)

Plugins load only when the AGENTS_SHIPGATE_ENABLE_PLUGINS=1 environment variable is set and the --no-plugins CLI flag is not passed. The loader:

  1. Calls entry_points(group="agents_shipgate.checks").
  2. Skips entry points where dist.metadata["Name"] (normalized) equals "agents-shipgate" — protects against builtin spoofing.
  3. Falls back to a value-prefix check when dist is None (rare; entry points from regular pip installs normally carry dist).
  4. Collects each plugin's metadata into loaded_plugins[] for the report.

See Plugin Authoring for the public-facing contract.
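The gating and spoof filter can be sketched with importlib.metadata. Assumptions: the function names are hypothetical, the fallback prefix for dist-less entry points is taken to be "agents_shipgate.", errors raised by ep.load() are not handled, and step 4 (collecting plugin metadata for the report) is omitted. entry_points(group=...) requires Python 3.10 or later.

```python
import os
import re
from importlib.metadata import entry_points

GROUP = "agents_shipgate.checks"
BUILTIN_DIST = "agents-shipgate"
BUILTIN_VALUE_PREFIX = "agents_shipgate."  # assumed fallback prefix


def plugins_enabled(no_plugins_flag: bool) -> bool:
    # Opt-in via the env var; the --no-plugins flag always wins.
    return os.environ.get("AGENTS_SHIPGATE_ENABLE_PLUGINS") == "1" and not no_plugins_flag


def normalize(name: str) -> str:
    # PEP 503-style normalization: runs of "-", "_" and "." become "-", then lowercase.
    return re.sub(r"[-_.]+", "-", name).lower()


def is_builtin_entry_point(ep) -> bool:
    dist = getattr(ep, "dist", None)
    if dist is not None:
        return normalize(dist.metadata["Name"]) == BUILTIN_DIST
    # No distribution info: fall back to the module path in the entry point value.
    return ep.value.startswith(BUILTIN_VALUE_PREFIX)


def load_plugin_checks(no_plugins_flag: bool, eps=None):
    """Return (name, check_function) pairs for third-party plugins only."""
    if not plugins_enabled(no_plugins_flag):
        return []
    if eps is None:
        eps = entry_points(group=GROUP)
    return [(ep.name, ep.load()) for ep in eps if not is_builtin_entry_point(ep)]
```

The optional eps argument exists only so the filter can be exercised with fake entry points; by default the sketch queries installed distributions.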


Trust-posture invariants

These are enforced by the test suite and grep-able from the source:

  • No subprocess, os.system, popen anywhere
  • No HTTP client (requests, urllib, httpx, aiohttp) in scanner code
  • YAML uses yaml.safe_load; !!python/object/... rejected
  • Path resolution rejects paths that use .. to escape the manifest directory (tests/test_inputs.py::test_mcp_loader_rejects_path_traversal)
  • Plugin builtin spoof rejected (tests/test_plugins.py::test_builtin_distribution_entry_points_are_skipped)

See Trust Model § Verifying these claims.
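Invariants like the first, second and fourth bullets can be checked with small stdlib-only helpers. Both functions below are hypothetical illustrations, not the project's tests: find_forbidden_calls walks a module's AST for the banned imports and calls listed above, and resolve_within rejects a path that resolves outside a base directory (Path.is_relative_to needs Python 3.9+).

```python
import ast
from pathlib import Path

# Coarse on purpose: banning "urllib" also flags harmless modules such as urllib.parse.
FORBIDDEN_MODULES = {"subprocess", "requests", "urllib", "httpx", "aiohttp"}
FORBIDDEN_CALLS = {("os", "system"), ("os", "popen")}


def find_forbidden_calls(source: str) -> list[str]:
    """Describe each banned import or os.system/os.popen call in a Python source string."""
    problems = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] in FORBIDDEN_MODULES:
                    problems.append(f"import {alias.name}")
        elif isinstance(node, ast.ImportFrom) and node.module:
            if node.module.split(".")[0] in FORBIDDEN_MODULES:
                problems.append(f"from {node.module} import ...")
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
            # Only catches the attribute form os.system(...), not "from os import system".
            target = node.func.value
            if isinstance(target, ast.Name) and (target.id, node.func.attr) in FORBIDDEN_CALLS:
                problems.append(f"{target.id}.{node.func.attr}()")
    return problems


def resolve_within(base_dir: Path, relative: str) -> Path:
    """Resolve relative against base_dir; raise if the result escapes base_dir."""
    base = base_dir.resolve()
    candidate = (base / relative).resolve()  # collapses ".." and follows symlinks
    if not candidate.is_relative_to(base):
        raise ValueError(f"{relative!r} escapes {base}")
    return candidate
```

Resolving before comparing matters: a plain string check on ".." would miss symlinks, and an absolute path such as /etc/passwd replaces the base entirely when joined, which is_relative_to also rejects.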


Testing

git clone https://github.com/ThreeMoonsLab/agents-shipgate.git
cd agents-shipgate
python -m pip install -e ".[dev]"
python -m pytest                           # 125 tests
python -m pytest tests/test_risk_hints.py  # tokenization invariants
python -m pytest tests/test_plugins.py     # plugin loader contract
python -m ruff check .                     # lint

CI pins:

  • pytest with --cov-fail-under=75 (.github/workflows/ci.yml)
  • Ruff rules ["E4", "E7", "E9", "F", "I", "B", "UP"] with B008 ignored (B008 flags function calls in argument defaults, which Typer's typer.Option(...) defaults rely on)
  • pip-audit for dependency vulnerabilities
  • cyclonedx-py for SBOM generation

Releases are signed with sigstore and published via PyPI Trusted Publishing (.github/workflows/release.yml).


Where to add new code

| You're adding… | File / pattern |
|---|---|
| A new check | src/agents_shipgate/checks/{name}.py with a run(context) function. Add it to BUILTIN_CHECKS in registry.py, add a CHECK_METADATA entry, and add a test under tests/. |
| A new risk-hint heuristic | Extend _add_automatic_hints in risk_hints.py. Add tests in tests/test_risk_hints.py covering both true positives and the edge case that motivated it. |
| A new input loader | src/agents_shipgate/inputs/{name}.py with a load_*_tools(source, base_dir) -> LoadedToolSource. Wire it into cli/scan.py:_load_sources and use resolve_input_path for paths. |
| A new manifest field | src/agents_shipgate/config/schema.py. The typo suggester picks it up automatically (no list update needed). Bump the manifest schema version if the change is breaking. |
| A new CLI command | cli/main.py. Each top-level command is an @app.command(). Errors map to exit codes: ConfigError (exit 2), InputParseError (exit 3), AgentsShipgateError (exit 4). |
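The error-to-exit-code mapping in the last row is easy to get wrong if the base class is checked first. A minimal sketch, assuming ConfigError and InputParseError both subclass AgentsShipgateError: the class names come from the table, but the classes below are local stand-ins and run_command is a hypothetical helper, not part of cli/main.py.

```python
import sys
from typing import Callable


class AgentsShipgateError(Exception):
    """Stand-in for the scanner's base exception."""


class ConfigError(AgentsShipgateError):
    """Stand-in: the manifest is invalid."""


class InputParseError(AgentsShipgateError):
    """Stand-in: a tool source could not be parsed."""


def exit_code_for_error(exc: AgentsShipgateError) -> int:
    # Subclasses are checked before the base class, which acts as the catch-all.
    if isinstance(exc, ConfigError):
        return 2
    if isinstance(exc, InputParseError):
        return 3
    return 4


def run_command(body: Callable[[], int]) -> int:
    """Run a command body, turning scanner errors into a stderr message plus exit code."""
    try:
        return body()
    except AgentsShipgateError as exc:
        print(f"error: {exc}", file=sys.stderr)
        return exit_code_for_error(exc)
```

Exceptions outside the AgentsShipgateError hierarchy are deliberately not caught, so unexpected bugs still surface with a traceback.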

Roadmap and current debts

See ROADMAP.md for the official direction. Known internal debts that contributors are welcome to take on:

  • Split SHIP-API-OPERATIONAL-READINESS into atomic check IDs (currently bundles retry, timeout, test cases, output schemas, traces).
  • By default, strict mode fails only on critical findings; discussion is ongoing about whether [critical, high] should be the implicit default.
  • Baselines include a created_at timestamp, so they aren't byte-identical across runs; a content-only mode would give cleaner git diffs.
  • Top-level check_severity_overrides is an alias for the nested checks.severity_overrides. Pick one and deprecate the other.

Open issues with the architecture label discuss these in detail.
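One possible shape for the content-only baseline mode mentioned in the debts list, as a hedged sketch: the baseline layout (a dict with created_at and a findings list of entries carrying a fingerprint) and the function name are assumptions, not the current file format. Dropping the timestamp and serializing with sorted keys and sorted entries makes two runs over the same findings produce identical bytes.

```python
import json


def dump_baseline_content_only(baseline: dict) -> str:
    """Serialize a baseline deterministically, omitting the volatile created_at field."""
    content = {key: value for key, value in baseline.items() if key != "created_at"}
    content["findings"] = sorted(content.get("findings", []), key=lambda f: f["fingerprint"])
    # sort_keys + fixed indent + trailing newline: equal content gives byte-identical output.
    return json.dumps(content, sort_keys=True, indent=2) + "\n"
```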
