Skip to content

Enforce no-import / no-exec trust invariant across all adapters (M3)#79

Merged
pengfei-threemoonslab merged 4 commits into
mainfrom
claude/m3-no-import-invariant
May 16, 2026
Merged

Enforce no-import / no-exec trust invariant across all adapters (M3)#79
pengfei-threemoonslab merged 4 commits into
mainfrom
claude/m3-no-import-invariant

Conversation

@pengfei-threemoonslab
Copy link
Copy Markdown
Contributor

Summary

  • Adds an AST lint (tests/test_adapter_static_only.py) over every source under src/agents_shipgate/inputs/ that rejects exec / eval / __import__ / compile, dynamic-import surfaces (importlib.import_module, importlib.util.spec_from_file_location, importlib.machinery.SourceFileLoader, runpy.run_path, subprocess.*, os.system/popen/execv*/spawnv), and the matching import / from-import forms so an aliased re-export can't slip past the call-site checks. Runs as a dedicated Trust-model invariant lint step before the full pytest suite.
  • Rewrites tests/test_fixture_no_import.py to drive every adapter end-to-end (LangChain, CrewAI, OpenAI Agents SDK, Google ADK, MCP, OpenAPI, Anthropic, OpenAI API, n8n) against a fixture with a module-level raise RuntimeError(...) trap. Each test snapshots sys.modules before/after run_scan() and asserts no module whose __file__ resolves under the fixture root ends up cached — a strictly stronger property than the previous runpy.run_path smoke check, which only proved the fixture would fail if imported but never exercised the adapter against it.
  • Upgrades STABILITY.md § Trust-model invariants to cite both tests as the structural enforcement, framing the property as structurally absent rather than "we tried to forbid X."

Why

The public claim "No agent execution. No LLM calls. No MCP server connections. No scanner network calls." (README) and "the SDK loaders use ast.parse only" (STABILITY.md) was previously held up by convention plus two narrow runpy.run_path checks on the LangChain and CrewAI samples. A new adapter could silently introduce importlib.import_module(user_path) and the tests wouldn't notice. This PR converts the claim from a documentation promise into a structural CI gate.

What it does NOT change

  • No production code under src/agents_shipgate/ is modified.
  • The trust-model claim itself is unchanged — this PR is purely enforcement and documentation of an existing invariant.
  • No schema version bump. No new check IDs. No new exit codes. No new manifest fields.
  • Baseline scan of current inputs/ is clean — zero violations to fix before merging.

Test plan

  • pytest tests/test_adapter_static_only.py tests/test_fixture_no_import.py42 passed in 0.62s on Python 3.14 against a clean origin/main checkout.
  • ruff check tests/test_adapter_static_only.py tests/test_fixture_no_import.pyAll checks passed!
  • Verify the dedicated CI step Trust-model invariant lint runs in <1s and fails fast on synthetic violations (covered by test_lint_scanner_catches_known_violation_shapes, parametrized over 11 forbidden shapes).
  • Verify safe parsing patterns don't false-positive (test_lint_scanner_does_not_false_positive_on_safe_shapes covers re.compile, ast.parse, yaml.safe_load, json.loads).
  • Verify lint covers every known adapter file via test_invariant_lint_covers_every_adapter_module — catches a future inputs/ reorganization that would silently drop modules from the scan.

Layered design

  1. Structural enforcement (test_adapter_static_only.py, ~0.3s): catches a regression before it ships. The dedicated CI step name surfaces in the GitHub UI as Trust-model invariant lint · failed rather than a buried pytest line.
  2. Runtime enforcement (test_fixture_no_import.py, ~0.4s): proves the invariant holds against real adapter execution. The sys.modules snapshot catches even pathological loader paths (e.g., importlib.util.spec_from_file_location + lazy loader) that don't fire a raise at module top.
  3. Sample protection: samples/simple_langchain_agent/agent.py and samples/simple_crewai_agent/crew.py are asserted to retain their module-level raise RuntimeError(...) so the published demonstrators stay faithful to the no-import claim.

File-by-file

File Status LOC Purpose
tests/test_adapter_static_only.py new 291 AST lint + negative/positive controls + inventory sanity check
tests/test_fixture_no_import.py rewritten 696 (was 63) Per-adapter live-load tests + sample-protection
.github/workflows/ci.yml +8 Dedicated Trust-model invariant lint step before full test suite
STABILITY.md +25 Cite both tests as structural enforcement

🤖 Generated with Claude Code

@pengfei-threemoonslab
Copy link
Copy Markdown
Contributor Author

Thanks for the catch — verified all five bypass patterns reproduced empty-violations against the v1 lint. Fixed in 6cacfb6.

What changed

Implementation (tests/test_adapter_static_only.py):

  • Two-pass _scan_source: pass 1 walks every Import / ImportFrom and builds two alias maps — module_aliases: local_name -> canonical_module_path and name_aliases: local_name -> (canonical_module, canonical_attr). Pass 2 walks every Call and resolves attribute chains + bare-name calls back to their canonical dotted path before checking the forbidden sets.
  • Prefix families (new FORBIDDEN_ATTR_CALL_PREFIXES = ("os.exec", "os.spawn", "os.posix_spawn")) cover every os.exec* / os.spawn* / os.posix_spawn* variant the os module documents, plus any future Python addition under these prefixes. The previously-enumerated os.execv / os.execvp / os.spawnv entries are removed from the exact set (now covered by prefix).
  • TRACKED_NON_FORBIDDEN_MODULES = {"os"} marks modules with both legitimate uses (os.path, os.environ, os.getcwd) and forbidden surfaces. From-imports of forbidden attributes from a tracked module are flagged at the import line itself (clearer error than waiting for the eventual call site), and aliases are recorded for call-site resolution.
  • builtins added to FORBIDDEN_MODULESimport builtins, import builtins as b, from builtins import eval, and from builtins import eval as e are all rejected at the import line.
  • Wildcard from-import (from os import *) flagged — would otherwise pull system / execv* / popen / etc. into local scope and defeat the call-site check.

Acknowledged limitations documented in the module docstring:

  • Flow-sensitive aliasing: ev = eval; ev("1+1") — no AST signal at the call site once the name is rebound by assignment.
  • Reflective access: getattr(os, "system")("ls"), globals()["eval"]("..."), __builtins__["eval"]("...").
  • Attribute paths through aliased submodules: import os.path as p; p.system(...) does not resolve (would need to follow submodule attribute paths).

The lint is the structural floor; code review is the dataflow ceiling. Honest framing > false security.

Verification

Reproduced reviewer's exact bypass patterns against _scan_source directly:

alias-module    -> "forbidden call 'os.system'"
alias-from      -> "forbidden from-import of 'os.system'"
                   + "forbidden call 'os.system' (via from-import alias 'sh')"
execve          -> "forbidden call 'os.execve'"
spawnvp         -> "forbidden call 'os.spawnvp'"
builtins-attr   -> "forbidden import 'builtins' (dynamic Python loading surface)"

All five now flag, with attribution where the bypass was via alias.

Test surface

  • Negative-control parametrize grows from 11 → 35 cases. Every bypass shape called out in the review has a dedicated entry, plus broader coverage of the prefix families (execv, execve, execvpe, execlpe, spawnv, spawnvp, spawnvpe, posix_spawn, posix_spawnp) and module-alias / from-import-alias / wildcard / builtins patterns.
  • Safe-control test (test_lint_scanner_does_not_false_positive_on_safe_shapes) is tightened — the prior importlib workaround is gone and the assertion is now not violations. Tests legitimate os.path.join, os.environ.get, os.path.abspath, from os import getcwd, from os import environ to make sure adapters' real usage stays green.
  • 65 tests pass (up from 42), ruff clean, baseline scan of current inputs/ still clean.

No production code under src/ changed. STABILITY.md text and the live-load tests in test_fixture_no_import.py are unchanged from 22bc2fb.

@pengfei-threemoonslab
Copy link
Copy Markdown
Contributor Author

Both reproduced; both fixed in 7f6d7df.

P2 — order-of-import rebind bypass

Reproduced empty-violations against the prior commit:

rebind-os         -> []   # import os; os.system('echo hi'); import pathlib as os
rebind-aliased    -> []   # import os as runner; runner.execve(...); import pathlib as runner

Root cause: module_aliases: dict[str, str] and name_aliases: dict[str, tuple[str, str]] were last-write-wins. Pass 1 walked every import before pass 2 checked any call site, so the final alias state was used to resolve all calls. A later import pathlib as os overwrote module_aliases["os"] = "os" with module_aliases["os"] = "pathlib", and the earlier os.system(...) call resolved through pathlib.system — not forbidden.

Fix: union-of-bindings. module_aliases: dict[str, set[str]] and name_aliases: dict[str, set[tuple[str, str]]]. A local name binds to the union of every canonical module/attr pair it was ever assigned in the file. The chain resolver returns every possible canonical chain, and the call site is flagged if any resolution is forbidden. This is the conservative path the review suggested: "conservatively retain 'ever bound to os' aliases instead of overwriting them."

After fix, same inputs produce:

rebind-os         -> "forbidden call 'os.system'"
rebind-aliased    -> "forbidden call 'os.execve'"

False-positive trade-off (documented in the docstring): if a local name is ever bound to a dangerous module in any branch, every chain on that name is checked against the forbidden surface. Conservatism in the direction of "flag and let code review decide" is the right failure mode for a trust-model lint.

No-false-positive on legitimate dual-binding: verified import os; import pathlib as os; os.path.join('a','b') produces []os.path.join is in neither the exact nor prefix sets under either binding.

Four new negative-control cases cover both rebind shapes (module alias + from-import alias) plus the reverse-order variant where the dangerous binding follows the safe one.

P3 — STABILITY.md overpromised

You're right — the lint deliberately allows import importlib.metadata and from importlib.metadata import version because the plugin registry under checks/ uses it for entry-point discovery against the installed environment, not user workspace files. The STABILITY text saying "import importlib forms are forbidden"* was broader than the actual lint.

Tightened the section to enumerate exactly what's forbidden and explicitly call out importlib.metadata as allowed:

- Bare-name calls to exec / eval / __import__ / compile.
- Attribute calls to importlib.import_module,
  importlib.util.spec_from_file_location,
  importlib.util.module_from_spec,
  importlib.machinery.SourceFileLoader,
  runpy.run_path, runpy.run_module,
  subprocess.{run, call, Popen, check_call, check_output},
  os.system, os.popen, and every variant under the
  os.exec* / os.spawn* / os.posix_spawn* prefixes.
- Module imports of runpy, subprocess, importlib, importlib.util,
  importlib.machinery, and builtins — in any
  import X, import X as Y, or from X import … form.
- Wildcard `from os import *`.

importlib.metadata is intentionally allowed: the plugin registry
under checks/ (not inputs/) uses it for entry-point discovery, and
discovery happens against the installed environment, not user
workspace files. …

Final numbers

  • Negative-control parametrize: 35 → 39 cases (P2 adds 4 rebind shapes).
  • 69 tests pass (up from 65), ruff clean, baseline inputs/ still clean.
  • Production code under src/ unchanged.

pengfei-threemoonslab and others added 3 commits May 15, 2026 22:49
Adds two layers of structural enforcement for the public claim in
README and STABILITY.md that the scanner does not execute or import
user code.

1. tests/test_adapter_static_only.py (new) — AST scan of every source
   under src/agents_shipgate/inputs/. Rejects any call to exec / eval /
   __import__ / compile, any use of importlib.import_module /
   importlib.util.spec_from_file_location / importlib.util.module_from_spec /
   importlib.machinery.SourceFileLoader / runpy.run_path / runpy.run_module /
   subprocess.{run,call,Popen,check_call,check_output} / os.{system,popen,
   execv,execvp,spawnv}, and the matching import / from-import forms so an
   aliased re-export can't slip past the call-site checks. Includes negative-
   control parametrize over every forbidden shape and a false-positive
   control proving re.compile / ast.parse / yaml.safe_load / json.loads stay
   green. Baseline scan of current inputs/ is clean — no remediation work
   required to merge.

2. tests/test_fixture_no_import.py (rewritten) — per-adapter live-load
   tests. Each of LangChain, CrewAI, OpenAI Agents SDK, Google ADK, MCP,
   OpenAPI, Anthropic, OpenAI API, and n8n is driven through run_scan()
   against a fixture whose Python content (or sibling trap.py, for
   declarative adapters) raises RuntimeError at module load. Each test
   additionally snapshots sys.modules before/after and asserts no module
   whose __file__ resolves under the fixture root ends up cached — a
   strictly stronger property than relying on the runtime raise alone.
   Sample-protection assertions keep samples/simple_langchain_agent/agent.py
   and samples/simple_crewai_agent/crew.py faithful demonstrators of the
   no-import claim.

3. .github/workflows/ci.yml — adds a dedicated "Trust-model invariant
   lint" CI step that runs the AST scan before the main test suite, so a
   regression is visible at the top of CI logs rather than buried in
   pytest output.

4. STABILITY.md — upgrades the "Trust-model invariants" section to cite
   both tests as the structural enforcement and frame the property as
   "structurally absent" rather than "we tried to forbid X."

42 tests pass; ruff clean. Pre-existing CI surface is unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR reviewer found that the v1 lint could be bypassed by:

  1. Module aliasing: `import os as oo; oo.system(...)` — root Name
     was checked verbatim, never resolved to `os`.
  2. From-import alias: `from os import system as sh; sh(...)` — call
     site was a bare Name, not checked against from-import aliases.
  3. `os.exec*` / `os.spawn*` / `os.posix_spawn*` family variants —
     only `execv` / `execvp` / `spawnv` were enumerated; `execve`,
     `execvpe`, `execlpe`, `spawnvp`, `spawnvpe`, `posix_spawn`,
     `posix_spawnp` all slipped through.
  4. Builtins re-export: `import builtins; builtins.eval(...)` — the
     attribute chain wasn't in the forbidden set and `builtins` wasn't
     a forbidden import.

All four bypasses reproduced empty-violations against the v1 lint
before this commit; all are now caught and have dedicated negative-
control parametrize entries.

Implementation changes in tests/test_adapter_static_only.py:

- Two-pass `_scan_source`: pass 1 walks every `Import` / `ImportFrom`
  to build `module_aliases: local -> canonical_module_path` and
  `name_aliases: local -> (canonical_module, canonical_attr)`; pass 2
  resolves attribute chains and bare-name calls through the alias maps
  before checking the forbidden sets.
- New `FORBIDDEN_ATTR_CALL_PREFIXES = ("os.exec", "os.spawn",
  "os.posix_spawn")` catches every variant in those families without
  enumeration. The `os.execv` / `os.execvp` / `os.spawnv` exact entries
  are removed (covered by prefix).
- New `TRACKED_NON_FORBIDDEN_MODULES = {"os"}` records aliases for
  modules that have legitimate uses (`os.path`, `os.environ`,
  `os.getcwd`) but also forbidden surfaces.
- `builtins` added to `FORBIDDEN_MODULES`. `import builtins`,
  `import builtins as b`, `from builtins import eval`, and
  `from builtins import eval as e` are all rejected at the import line.
- From-imports of forbidden attributes are flagged at the import line
  itself, not only at the eventual call site, giving a clearer error.
- Wildcard from-import from a tracked module
  (`from os import *`) is rejected — would otherwise alias forbidden
  names into local scope and defeat the call-site check.
- Acknowledged limitations documented in the module docstring: flow-
  sensitive aliasing (`ev = eval; ev(...)`), reflective access
  (`getattr(os, "system")`, `globals()["eval"]`), and attribute paths
  through aliased submodules. These need code review, not lint.

Negative-control parametrize grows from 11 to 35 cases, covering every
bypass pattern in the docstring. Safe-control test cleans up the prior
`importlib` workaround and now asserts a fully-empty violation list.

65 tests pass (up from 42); ruff clean. Production code under
src/ is unchanged. Baseline scan of current inputs/ remains clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two findings from the second-round review:

  [P2] The two-pass alias map was last-write-wins, so a later
  ``import pathlib as os`` erased the earlier ``import os`` binding
  for the call-site check. ``import os; os.system("ls"); import
  pathlib as os`` therefore passed the lint with zero violations.
  Same shape with ``import os as runner; runner.execve(...); import
  pathlib as runner`` also passed.

  [P3] STABILITY.md claimed ``import importlib*`` forms are
  forbidden, but ``import importlib.metadata`` and ``from
  importlib.metadata import version`` are intentionally allowed
  (the plugin registry under checks/ uses metadata for entry-point
  discovery). The documentation overpromised what the lint enforces.

Reproduced both findings against the prior commit (rebind-os and
rebind-aliased returned []; importlib.metadata returned []) before
fixing.

Implementation (tests/test_adapter_static_only.py):

- Convert ``module_aliases: dict[str, str]`` to
  ``dict[str, set[str]]`` and ``name_aliases:
  dict[str, tuple[str, str]]`` to ``dict[str, set[tuple[str, str]]]``.
  A local name binds to the *union* of every canonical module/attr
  pair it was ever assigned in the file. ``import os; import pathlib
  as os`` now yields ``module_aliases["os"] = {"os", "pathlib"}``.
- ``_resolve_attribute_chain`` -> ``_resolve_attribute_chain_all``
  returns every canonical chain a call could resolve to. Call sites
  flag once if *any* resolution is forbidden.
- Same conservative-union logic for from-import aliases at bare-name
  call sites.
- Added 4 negative-control parametrize cases covering both
  rebind shapes (module alias + from-import alias) plus the
  reverse-order variant where the dangerous binding follows the safe
  one. Verified ``import os; import pathlib as os;
  os.path.join('a','b')`` does NOT false-positive — ``os.path.join``
  is in neither the exact nor prefix sets under either binding.

False-positive trade-off (acknowledged in the docstring): if a local
name is ever bound to a dangerous module *in any branch*, every
attribute chain on that name is checked against the forbidden surface.
For trust-model lint this is the right failure mode — suspicious
aliasing should be a code-review trigger, not a silent pass.

Documentation (STABILITY.md):

- Replace the vague ``import runpy / import subprocess / import
  importlib*`` line with an enumerated forbidden-modules list
  (``runpy``, ``subprocess``, ``importlib``, ``importlib.util``,
  ``importlib.machinery``, ``builtins``) and an enumerated
  forbidden-attribute list (including the ``os.exec*`` / ``os.spawn*``
  / ``os.posix_spawn*`` prefix families and ``os.system`` /
  ``os.popen``).
- Explicitly call out that ``importlib.metadata`` is allowed and why
  (plugin-discovery surface lives under ``checks/`` not ``inputs/``,
  and discovery runs against the installed environment, not user
  workspace files).
- Document the conservative union-of-bindings alias resolution so
  reviewers know an aliased rebinding cannot defeat the lint.

69 tests pass (up from 65); ruff clean. Production code under src/
unchanged. Baseline scan of current inputs/ still clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@pengfei-threemoonslab
Copy link
Copy Markdown
Contributor Author

You're right — and the cause was different from what the surface symptom suggested. Both findings are now resolved by rebasing the branch onto current main.

What actually happened

When I opened this PR, I branched from origin/main at 940fdcf (the latest commit at that moment). Since then, two PRs merged ahead of mine:

  • #78Add --check mode to generate_schemas.py + roundtrip tests (M4) → introduced the "Verify generated schemas are up to date" CI step, the --check flag, tests/test_schema_roundtrip.py, and the plugins_enabled=False guard at scripts/generate_schemas.py:791.
  • #81Add release_decision.contribution_rules[] audit (v0.17).

My PR branch was behind both. From the perspective of a gh pr checkout-style review against the current main, my PR looked like it was removing M4 infrastructure — because the merge would have. In fact I never touched scripts/generate_schemas.py, tests/test_schema_roundtrip.py, or that CI step; they simply weren't on my base.

Fix

Rebased the three M3 commits onto current origin/main. Resolution:

  • .github/workflows/ci.yml — only real conflict. Kept both steps in the right order:
    - name: Compile
    - name: Verify generated schemas are up to date  (M4 — restored)
    - name: Trust-model invariant lint              (M3 — this PR)
    - name: Test
    Schema drift fails CI first (fast structural drift). Trust-model lint fails CI second (also fast). Then the full pytest run.
  • STABILITY.md — clean merge; the section I edited was unchanged on main.
  • scripts/generate_schemas.py — picked up unmodified from main; plugins_enabled=False at line 791 is intact.
  • tests/test_schema_roundtrip.py — picked up unmodified from main; the regression guard test_checks_catalog_ignores_enabled_plugins (line 167) is intact.

Verification

Against the rebased branch tip (b2c7bcd):

$ python scripts/generate_schemas.py --check
(clean — exit 0)

$ pytest tests/test_adapter_static_only.py tests/test_fixture_no_import.py tests/test_schema_roundtrip.py
77 passed in <1s

$ grep plugins_enabled scripts/generate_schemas.py
791:    for check in check_catalog(plugins_enabled=False)

$ grep test_checks_catalog_ignores_enabled_plugins tests/test_schema_roundtrip.py
167:def test_checks_catalog_ignores_enabled_plugins(generator, monkeypatch):

The M4 schema-drift gate, the plugin-isolation guard, and the round-trip test suite are all back in place exactly as they shipped in #78. M3 layers on top with its own dedicated CI step.

Branch state

  • Three M3 commits replayed cleanly:
    • 63ff110 — Enforce no-import / no-exec trust invariant across all adapters (M3)
    • fab0d78 — Close lint bypass holes flagged in PR review (M3 follow-up)
    • b2c7bcd — Fix two more lint holes flagged in PR review (M3 follow-up 2)
  • Diff against current main is now: +1309 / -44 across 4 files — only the M3 surface (tests/test_adapter_static_only.py new, tests/test_fixture_no_import.py rewritten, STABILITY.md +42, .github/workflows/ci.yml +8).
  • Force-pushed with --force-with-lease.

Apologies for the noise — I should have rebased before opening this PR. Will check git log origin/main..HEAD before opening future M-item PRs to catch this.

@pengfei-threemoonslab pengfei-threemoonslab merged commit fcad6bd into main May 16, 2026
1 check passed
@pengfei-threemoonslab pengfei-threemoonslab deleted the claude/m3-no-import-invariant branch May 16, 2026 06:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant