Architecture
For contributors and integrators who want to understand the internals. Pinned to v0.2.0; modules and APIs may evolve.
shipgate.yaml
│
▼
┌─────────────────────┐
│ config/loader.py │ YAML → AgentsShipgateManifest (Pydantic, extra="forbid")
└─────────────────────┘
│
▼
┌─────────────────────┐
│ inputs/ │ load_mcp_tools / load_openapi_tools /
│ mcp.py │ load_openai_sdk_static_tools / load_openai_api_artifacts
│ openapi.py │ → list[LoadedToolSource]
│ openai_sdk_…py │
│ openai_api.py │
└─────────────────────┘
│
▼
┌─────────────────────┐
│ cli/scan.py │ flatten_and_deduplicate_tools (priority-based)
│ _flatten… │ → list[Tool]
└─────────────────────┘
│
▼
┌─────────────────────┐
│ core/risk_hints.py │ enrich_tools_with_risk_hints (HTTP method,
│ enrich… │ MCP annotations, tokenized keyword classifier,
│ │ manual overrides) → list[Tool] with risk_hints
└─────────────────────┘
│
▼
┌─────────────────────┐
│ checks/registry.py │ run_checks (28 built-ins + opt-in plugins)
│ run_checks │ → list[Finding]
└─────────────────────┘
│
▼
┌─────────────────────┐
│ core/findings.py │ assign_finding_ids (fingerprint + collision discriminator)
│ │ apply_severity_overrides
│ │ apply_suppressions
└─────────────────────┘
│
▼
┌─────────────────────┐
│ core/baseline.py │ apply_baseline (matched / new / resolved)
└─────────────────────┘
│
▼
┌─────────────────────┐
│ report/ │ ReadinessReport → report.md, report.json
│ markdown.py │ + GitHub step summary
│ json_report.py │
│ ci/ │
│ exit_policy.py │ exit_code_for_report → 0 or 20
└─────────────────────┘
src/agents_shipgate/
├── __main__.py # python -m agents_shipgate entry point
├── cli/
│ ├── main.py # Typer app: scan, init, doctor, explain, list-checks, baseline
│ ├── scan.py # Orchestration: load → enrich → check → report
│ └── discovery.py # Workspace scan + manifest template generator
├── config/
│ ├── schema.py # Pydantic manifest models (extra="forbid")
│ └── loader.py # YAML loader + typo suggester
├── inputs/
│ ├── common.py # resolve_input_path, load_structured_file, schema_to_parameters
│ ├── mcp.py # MCP JSON tools/list export reader
│ ├── openapi.py # OpenAPI 3.x reader with bounded $ref resolution
│ ├── openai_sdk_static.py # Python AST extractor for @function_tool
│ └── openai_api.py # OpenAI Agents API artifacts (prompts, schemas, traces)
├── core/
│ ├── models.py # Tool, Finding, ReadinessReport, etc.
│ ├── context.py # ScanContext (frozen dataclass)
│ ├── risk_hints.py # Tokenized keyword classifier + manual overrides
│ ├── findings.py # Fingerprinting, suppressions, severity overrides
│ ├── baseline.py # Save / load / apply baseline
│ ├── errors.py # Exception hierarchy
│ └── logging.py # Stderr logger + optional JSON formatter
├── checks/
│ ├── registry.py # Built-in check list + plugin loader
│ ├── base.py # tool_finding / agent_finding helpers
│ ├── inventory.py
│ ├── documentation.py
│ ├── schema.py
│ ├── auth.py
│ ├── manifest_scope.py
│ ├── manifest_consistency.py
│ ├── policy.py
│ ├── side_effects.py
│ └── api.py # OpenAI Agents API checks
├── report/
│ ├── markdown.py # render_markdown_report
│ └── json_report.py # write_json_report
└── ci/
├── exit_policy.py # exit_code_for_report (advisory vs strict)
└── github_summary.py # GITHUB_STEP_SUMMARY emitter
ScanContext is a frozen dataclass passed to every check function:
@dataclass(frozen=True)
class ScanContext:
    manifest: AgentsShipgateManifest
    agent: Agent
    tools: list[Tool]
    config_path: Path
    api_artifacts: OpenAIApiArtifacts | None

Pure value object: checks must not mutate it.
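A minimal sketch of what the frozen flag enforces, using a simplified stand-in for ScanContext. The two fields and the rebind_is_rejected helper below are illustrative assumptions, not the project's real types:

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from pathlib import Path


@dataclass(frozen=True)
class ScanContext:
    # Simplified stand-in: the real class carries more fields and richer types.
    tools: tuple[str, ...]
    config_path: Path


def rebind_is_rejected(context: ScanContext) -> bool:
    """Return True if rebinding a field raises, as a frozen dataclass should."""
    try:
        context.config_path = Path("/elsewhere")  # type: ignore[misc]
    except FrozenInstanceError:
        return True
    return False


ctx = ScanContext(tools=("send_email",), config_path=Path("shipgate.yaml"))
rejected = rebind_is_rejected(ctx)

# A check that needs a modified view builds a new object instead of mutating.
narrowed = replace(ctx, tools=())
```

Note that frozen=True only blocks attribute rebinding. A mutable field such as a list can still be changed in place, so for list-typed fields the "must not mutate" rule remains a convention that checks have to honor.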
Tool is a Pydantic model. It carries the union of fields a check might inspect: name, description, source_type, schemas, parameters, annotations, auth scopes, risk_hints, owner, extraction confidence. Source-specific fields (HTTP method, MCP annotation hints) live under annotations.
Finding is a Pydantic model. Required fields: check_id, title, severity, category, recommendation. Optional: tool_id, tool_name, agent_id, evidence (free-form dict), confidence, source (SourceReference). Set after creation: id, fingerprint, suppressed, suppression_reason, baseline_status.
CheckMetadata is a Pydantic model used by list-checks / explain. Plugins attach a CheckMetadata (or compatible dict) as run.AGENTS_SHIPGATE_METADATA to register catalog entries.
core/risk_hints.py is the most heuristic-laden module. Critical implementation notes:
- Tokenized keyword matching. v0.2 uses re.findall(r"[a-z]+", text.lower()) to split names/descriptions/scopes into word tokens, then intersects with module-level keyword sets. This avoids substring false positives ("deploy" matches the standalone token but not the substring inside "deployments").
- Source-typed gating. The keyword classifier runs only for openai_api and sdk_function source types. OpenAPI-derived tools get read_only / write directly from HTTP method.
- SDK preview safety net. SDK functions whose tokens include preview and that have no HTTP method get read_only at HIGH confidence and are exempted from the keyword classifier; this is what protects fixture tools like send_email_preview from being tagged as external_write.
- GET → read_only at HIGH. Any GET endpoint with no write hint gets read_only at HIGH confidence so is_effectively_read_only short-circuits policy/scope checks. The exception is GETs that pick up a destructive tag from operationId tokens (e.g. *_destroy_with_associated_resources); those still flow through.
- Manual overrides win. risk_overrides.tools.{tool}.tags add hints at HIGH manual confidence; remove_tags removes by tag regardless of source.
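The tokenization rule from the first bullet can be sketched as follows. The regex is the one quoted above; WRITE_KEYWORDS, tokenize, and matched_keywords are hypothetical names with a made-up keyword set, not the contents of risk_hints.py:

```python
import re

# Hypothetical keyword set for illustration; the real sets live in risk_hints.py.
WRITE_KEYWORDS = frozenset({"deploy", "send", "delete"})


def tokenize(text: str) -> set[str]:
    """Split text into lowercase alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def matched_keywords(text: str) -> set[str]:
    """Return the keywords that appear as whole tokens in text."""
    return tokenize(text) & WRITE_KEYWORDS


# "deploy" is its own token here, so it matches.
whole_token = matched_keywords("deploy_service")

# "deployments" is a single token and is not equal to "deploy", so nothing matches.
substring_only = matched_keywords("list_deployments")

# Contains both "send" and "preview" as tokens.
preview_tokens = tokenize("send_email_preview")
```

The last case shows why a preview safety net is useful: on tokens alone, send_email_preview would match a "send"-style keyword, so some explicit exemption is needed to keep it read-only.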
The full keyword sets live near the top of risk_hints.py and are documented in Check Catalog § Risk-hint reference.
def finding_fingerprint(finding: Finding) -> str:
    identity = {
        "check_id": finding.check_id,
        "tool_name": finding.tool_name,
        "evidence": _canonicalize_for_fingerprint(finding.evidence),
    }
    digest = hashlib.sha256(
        json.dumps(identity, sort_keys=True, default=str).encode("utf-8")
    ).hexdigest()[:16]
    return f"fp_{digest}"

_canonicalize_for_fingerprint recursively sorts dict keys, sorts list items by JSON representation, and excludes the default_severity key (the audit field that records pre-override severity). This last detail is what makes severity_overrides safe to apply before or after assign_finding_ids, a question that surfaced in the v0.2 review pass.
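One way the canonicalization described above could look. This is a sketch written from the prose description, not the project's code: canonicalize, EXCLUDED_KEYS, the simplified fingerprint signature, and the SHIP-EXAMPLE check id are all assumptions made for illustration.

```python
import hashlib
import json
from typing import Any, Optional

# Assumed exclusion list, based on the description of default_severity above.
EXCLUDED_KEYS = frozenset({"default_severity"})


def canonicalize(value: Any) -> Any:
    """Return an order-insensitive form of value suitable for hashing."""
    if isinstance(value, dict):
        return {
            key: canonicalize(item)
            for key, item in sorted(value.items(), key=lambda kv: str(kv[0]))
            if key not in EXCLUDED_KEYS
        }
    if isinstance(value, (list, tuple)):
        items = [canonicalize(item) for item in value]
        # Sort by JSON text so items of mixed types still have a stable order.
        return sorted(items, key=lambda item: json.dumps(item, sort_keys=True, default=str))
    return value


def fingerprint(check_id: str, tool_name: Optional[str], evidence: dict) -> str:
    """Hash the identity fields into an fp_-prefixed 16-hex-char digest."""
    identity = {
        "check_id": check_id,
        "tool_name": tool_name,
        "evidence": canonicalize(evidence),
    }
    encoded = json.dumps(identity, sort_keys=True, default=str).encode("utf-8")
    return f"fp_{hashlib.sha256(encoded).hexdigest()[:16]}"


# Same evidence, different list order and different pre-override severity.
before = fingerprint("SHIP-EXAMPLE", "send_email", {"scopes": ["b", "a"], "default_severity": "high"})
after = fingerprint("SHIP-EXAMPLE", "send_email", {"default_severity": "low", "scopes": ["a", "b"]})
other = fingerprint("SHIP-EXAMPLE", "send_email", {"scopes": ["a", "c"]})
```

Under these assumptions, before and after are equal because neither list order nor default_severity reaches the hash, while other differs because the evidence itself changed.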
When two findings collide (same fingerprint), assign_finding_ids adds an 8-char content-derived discriminator built from agent_id, category, confidence, recommendation, source, title, tool_id, tool_name. The result is order-independent — running the same checks in a different order produces the same id for each finding.
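A sketch of how a content-derived discriminator keeps ids order-independent. Findings are plain dicts here, and the "<fingerprint>_<discriminator>" id format is an assumption of this example; only the field list and the 8-character length come from the description above.

```python
import hashlib
import json
from collections import Counter

DISCRIMINATOR_FIELDS = (
    "agent_id", "category", "confidence", "recommendation",
    "source", "title", "tool_id", "tool_name",
)


def discriminator(finding: dict) -> str:
    """8-char digest of the listed fields; depends only on the finding's content."""
    payload = {field: finding.get(field) for field in DISCRIMINATOR_FIELDS}
    encoded = json.dumps(payload, sort_keys=True, default=str).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()[:8]


def assign_ids(findings: list[dict]) -> list[str]:
    """Return one id per finding; only colliding fingerprints get a suffix."""
    counts = Counter(f["fingerprint"] for f in findings)
    ids = []
    for f in findings:
        if counts[f["fingerprint"]] > 1:
            ids.append(f"{f['fingerprint']}_{discriminator(f)}")  # assumed format
        else:
            ids.append(f["fingerprint"])
    return ids


first = {"fingerprint": "fp_aaaa", "title": "Missing owner", "tool_id": "t1"}
second = {"fingerprint": "fp_aaaa", "title": "Missing owner", "tool_id": "t2"}
unique = {"fingerprint": "fp_bbbb", "title": "No auth", "tool_id": "t3"}

forward = dict(zip(("first", "second", "unique"), assign_ids([first, second, unique])))
backward = dict(zip(("unique", "second", "first"), assign_ids([unique, second, first])))
```

Because the suffix is computed from each finding's own fields rather than from its position in the list, forward and backward map every finding to the same id. A positional counter (fp_aaaa_1, fp_aaaa_2) would not have that property.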
Plugins are gated behind AGENTS_SHIPGATE_ENABLE_PLUGINS=1 (env) AND not overridden by --no-plugins (CLI). The loader:
- Calls entry_points(group="agents_shipgate.checks").
- Skips entry points where dist.metadata["Name"] (normalized) equals "agents-shipgate", which protects against builtin spoofing.
- Falls back to a value-prefix check when dist is None (rare; usually pip installs).
- Collects each plugin's metadata into loaded_plugins[] for the report.
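The gating and spoof-filter steps above could be sketched like this, assuming Python 3.10+ for entry_points(group=...). The helper names, the PEP 503 style normalization, and the "agents_shipgate." value prefix are assumptions; the actual loader may differ in each of these details.

```python
import os
import re
from importlib.metadata import EntryPoint, entry_points

BUILTIN_DIST = "agents-shipgate"
BUILTIN_VALUE_PREFIX = "agents_shipgate."  # assumed prefix for the fallback check


def normalize(name: str) -> str:
    """PEP 503 style: lowercase, runs of '-', '_', '.' collapse to one dash."""
    return re.sub(r"[-_.]+", "-", name).lower()


def plugins_enabled(no_plugins_flag: bool) -> bool:
    """Plugins need the env opt-in and must not be vetoed by --no-plugins."""
    return os.environ.get("AGENTS_SHIPGATE_ENABLE_PLUGINS") == "1" and not no_plugins_flag


def is_builtin_spoof(ep: EntryPoint) -> bool:
    """True if the entry point claims to come from the scanner's own package."""
    dist = getattr(ep, "dist", None)
    if dist is not None:
        return normalize(dist.metadata["Name"]) == BUILTIN_DIST
    # No distribution info: fall back to inspecting the "module:attr" value.
    return ep.value.startswith(BUILTIN_VALUE_PREFIX)


def discover(no_plugins_flag: bool = False) -> list[EntryPoint]:
    """Return third-party check entry points, or nothing when plugins are off."""
    if not plugins_enabled(no_plugins_flag):
        return []
    found = entry_points(group="agents_shipgate.checks")
    return [ep for ep in found if not is_builtin_spoof(ep)]
```

Requiring both the environment variable and the absence of --no-plugins means plugin code never loads by default, and a CI job can force it off regardless of the environment.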
See Plugin Authoring for the public-facing contract.
These are enforced by the test suite and grep-able from the source:
- No subprocess, os.system, popen anywhere
- No HTTP client (requests, urllib, httpx, aiohttp) in scanner code
- YAML uses yaml.safe_load; !!python/object/... rejected
- Path resolution rejects .. escape from manifest dir (tests/test_inputs.py::test_mcp_loader_rejects_path_traversal)
- Plugin builtin spoof rejected (tests/test_plugins.py::test_builtin_distribution_entry_points_are_skipped)
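As an illustration of the path-traversal invariant, here is one common way to confine input paths to a base directory. resolve_inside and PathEscapeError are hypothetical names for this sketch; the project's own resolve_input_path and exception types may behave differently.

```python
from pathlib import Path


class PathEscapeError(ValueError):
    """Hypothetical error type; the real project has its own exception hierarchy."""


def resolve_inside(base_dir: Path, relative: str) -> Path:
    """Resolve relative against base_dir and reject anything that lands outside it."""
    base = base_dir.resolve()
    # resolve() collapses ".." segments and follows symlinks before the check.
    candidate = (base / relative).resolve()
    if candidate != base and base not in candidate.parents:
        raise PathEscapeError(f"{relative!r} escapes {base}")
    return candidate


def is_rejected(base_dir: Path, relative: str) -> bool:
    """Convenience wrapper for demonstrating which inputs are refused."""
    try:
        resolve_inside(base_dir, relative)
    except PathEscapeError:
        return True
    return False
```

Comparing resolved paths rather than scanning the string for ".." also catches absolute paths and detours such as tools/../../secrets.json.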
See Trust Model § Verifying these claims.
git clone https://github.com/ThreeMoonsLab/agents-shipgate.git
cd agents-shipgate
python -m pip install -e ".[dev]"
python -m pytest # 125 tests
python -m pytest tests/test_risk_hints.py # tokenization invariants
python -m pytest tests/test_plugins.py # plugin loader contract
python -m ruff check . # lint

CI pins:
- pytest with --cov-fail-under=75 (.github/workflows/ci.yml)
- Ruff rules ["E4", "E7", "E9", "F", "I", "B", "UP"] with B008 ignored (Typer defaults)
- pip-audit for dependency vulnerabilities
- cyclonedx-py for SBOM generation
Releases are signed with sigstore and published via PyPI Trusted Publishing (.github/workflows/release.yml).
| You're adding… | File / pattern |
|---|---|
| A new check | src/agents_shipgate/checks/{name}.py with a run(context) function. Add to BUILTIN_CHECKS in registry.py. Add a CHECK_METADATA entry. Add a test under tests/. |
| A new risk-hint heuristic | Extend _add_automatic_hints in risk_hints.py. Add tests in tests/test_risk_hints.py covering both true positives and the edge case that motivated it. |
| A new input loader | src/agents_shipgate/inputs/{name}.py with a load_*_tools(source, base_dir) -> LoadedToolSource. Wire into cli/scan.py:_load_sources. Use resolve_input_path for paths. |
| A new manifest field | src/agents_shipgate/config/schema.py. The typo suggester picks it up automatically (no list update needed). Bump the manifest schema version if it's a breaking change. |
| A new CLI command | cli/main.py. Each top-level command is a @app.command(). Errors → ConfigError (exit 2), InputParseError (exit 3), AgentsShipgateError (exit 4). |
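The error-to-exit-code mapping in the last row might be wired up roughly as below. The subclass relationship, the exit_code class attribute, and the run_command wrapper are assumptions made for this sketch; only the three class names and their exit codes come from the table.

```python
from typing import Callable


class AgentsShipgateError(Exception):
    """Base error; anything not more specific exits with 4."""
    exit_code = 4


class ConfigError(AgentsShipgateError):
    """Manifest or configuration problem."""
    exit_code = 2


class InputParseError(AgentsShipgateError):
    """A tool source could not be parsed."""
    exit_code = 3


def run_command(command: Callable[[], None]) -> int:
    """Run a command body and translate known errors into process exit codes."""
    try:
        command()
    except AgentsShipgateError as exc:
        return exc.exit_code
    return 0


def bad_manifest() -> None:
    raise ConfigError("unknown key in shipgate.yaml")


def bad_input() -> None:
    raise InputParseError("tools export is not valid JSON")


def generic_failure() -> None:
    raise AgentsShipgateError("unexpected scanner failure")
```

Catching only the base class keeps the wrapper short, and unexpected exceptions still propagate with a traceback instead of being masked as a known exit code.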
See ROADMAP.md for the official direction. Known internal debts that contributors are welcome to take on:
- Split SHIP-API-OPERATIONAL-READINESS into atomic check IDs (currently bundles retry, timeout, test cases, output schemas, traces).
- Strict mode default fails only on critical; discussion is ongoing about whether [critical, high] should be the implicit default.
- Baselines include created_at and aren't byte-idempotent across runs; a content-only mode would improve git diffs.
- Top-level check_severity_overrides is an alias for the nested checks.severity_overrides. Pick one and deprecate the other.
Open issues with the architecture label discuss these in detail.
Agents Shipgate · Apache-2.0 · maintained by Three Moons Lab · Report a false positive