Enforce no-import / no-exec trust invariant across all adapters (M3) by pengfei-threemoonslab · Pull Request #79 · ThreeMoonsLab/agents-shipgate

pengfei-threemoonslab · 2026-05-16T05:15:08Z

Summary

Adds an AST lint (tests/test_adapter_static_only.py) over every source under src/agents_shipgate/inputs/ that rejects exec / eval / __import__ / compile, dynamic-import surfaces (importlib.import_module, importlib.util.spec_from_file_location, importlib.machinery.SourceFileLoader, runpy.run_path, subprocess.*, os.system/popen/execv*/spawnv), and the matching import / from-import forms so an aliased re-export can't slip past the call-site checks. Runs as a dedicated Trust-model invariant lint step before the full pytest suite.
Rewrites tests/test_fixture_no_import.py to drive every adapter end-to-end (LangChain, CrewAI, OpenAI Agents SDK, Google ADK, MCP, OpenAPI, Anthropic, OpenAI API, n8n) against a fixture with a module-level raise RuntimeError(...) trap. Each test snapshots sys.modules before/after run_scan() and asserts no module whose __file__ resolves under the fixture root ends up cached — a strictly stronger property than the previous runpy.run_path smoke check, which only proved the fixture would fail if imported but never exercised the adapter against it.
Upgrades STABILITY.md § Trust-model invariants to cite both tests as the structural enforcement, framing the property as structurally absent rather than "we tried to forbid X."

Why

The public claim "No agent execution. No LLM calls. No MCP server connections. No scanner network calls." (README) and "the SDK loaders use ast.parse only" (STABILITY.md) was previously held up by convention plus two narrow runpy.run_path checks on the LangChain and CrewAI samples. A new adapter could silently introduce importlib.import_module(user_path) and the tests wouldn't notice. This PR converts the claim from a documentation promise into a structural CI gate.

What it does NOT change

No production code under src/agents_shipgate/ is modified.
The trust-model claim itself is unchanged — this PR is purely enforcement and documentation of an existing invariant.
No schema version bump. No new check IDs. No new exit codes. No new manifest fields.
Baseline scan of current inputs/ is clean — zero violations to fix before merging.

Test plan

pytest tests/test_adapter_static_only.py tests/test_fixture_no_import.py — 42 passed in 0.62s on Python 3.14 against a clean origin/main checkout.
ruff check tests/test_adapter_static_only.py tests/test_fixture_no_import.py — All checks passed!
Verify the dedicated CI step Trust-model invariant lint runs in <1s and fails fast on synthetic violations (covered by test_lint_scanner_catches_known_violation_shapes, parametrized over 11 forbidden shapes).
Verify safe parsing patterns don't false-positive (test_lint_scanner_does_not_false_positive_on_safe_shapes covers re.compile, ast.parse, yaml.safe_load, json.loads).
Verify lint covers every known adapter file via test_invariant_lint_covers_every_adapter_module — catches a future inputs/ reorganization that would silently drop modules from the scan.

Layered design

Structural enforcement (test_adapter_static_only.py, ~0.3s): catches a regression before it ships. The dedicated CI step name surfaces in the GitHub UI as Trust-model invariant lint · failed rather than a buried pytest line.
Runtime enforcement (test_fixture_no_import.py, ~0.4s): proves the invariant holds against real adapter execution. The sys.modules snapshot catches even pathological loader paths (e.g., importlib.util.spec_from_file_location + lazy loader) that don't fire a raise at module top.
Sample protection: samples/simple_langchain_agent/agent.py and samples/simple_crewai_agent/crew.py are asserted to retain their module-level raise RuntimeError(...) so the published demonstrators stay faithful to the no-import claim.

File-by-file

File	Status	LOC	Purpose
`tests/test_adapter_static_only.py`	new	291	AST lint + negative/positive controls + inventory sanity check
`tests/test_fixture_no_import.py`	rewritten	696 (was 63)	Per-adapter live-load tests + sample-protection
`.github/workflows/ci.yml`	+8	—	Dedicated `Trust-model invariant lint` step before full test suite
`STABILITY.md`	+25	—	Cite both tests as structural enforcement

🤖 Generated with Claude Code

pengfei-threemoonslab · 2026-05-16T05:32:50Z

Thanks for the catch — verified all five bypass patterns reproduced empty-violations against the v1 lint. Fixed in 6cacfb6.

What changed

Implementation (tests/test_adapter_static_only.py):

Two-pass _scan_source: pass 1 walks every Import / ImportFrom and builds two alias maps — module_aliases: local_name -> canonical_module_path and name_aliases: local_name -> (canonical_module, canonical_attr). Pass 2 walks every Call and resolves attribute chains + bare-name calls back to their canonical dotted path before checking the forbidden sets.
Prefix families (new FORBIDDEN_ATTR_CALL_PREFIXES = ("os.exec", "os.spawn", "os.posix_spawn")) cover every os.exec* / os.spawn* / os.posix_spawn* variant the os module documents, plus any future Python addition under these prefixes. The previously-enumerated os.execv / os.execvp / os.spawnv entries are removed from the exact set (now covered by prefix).
TRACKED_NON_FORBIDDEN_MODULES = {"os"} marks modules with both legitimate uses (os.path, os.environ, os.getcwd) and forbidden surfaces. From-imports of forbidden attributes from a tracked module are flagged at the import line itself (clearer error than waiting for the eventual call site), and aliases are recorded for call-site resolution.
builtins added to FORBIDDEN_MODULES — import builtins, import builtins as b, from builtins import eval, and from builtins import eval as e are all rejected at the import line.
Wildcard from-import (from os import *) flagged — would otherwise pull system / execv* / popen / etc. into local scope and defeat the call-site check.

Acknowledged limitations documented in the module docstring:

Flow-sensitive aliasing: ev = eval; ev("1+1") — no AST signal at the call site once the name is rebound by assignment.
Reflective access: getattr(os, "system")("ls"), globals()["eval"]("..."), __builtins__["eval"]("...").
Attribute paths through aliased submodules: import os.path as p; p.system(...) does not resolve (would need to follow submodule attribute paths).

The lint is the structural floor; code review is the dataflow ceiling. Honest framing > false security.

Verification

Reproduced reviewer's exact bypass patterns against _scan_source directly:

alias-module    -> "forbidden call 'os.system'"
alias-from      -> "forbidden from-import of 'os.system'"
                   + "forbidden call 'os.system' (via from-import alias 'sh')"
execve          -> "forbidden call 'os.execve'"
spawnvp         -> "forbidden call 'os.spawnvp'"
builtins-attr   -> "forbidden import 'builtins' (dynamic Python loading surface)"

All five now flag, with attribution where the bypass was via alias.

Test surface

Negative-control parametrize grows from 11 → 35 cases. Every bypass shape called out in the review has a dedicated entry, plus broader coverage of the prefix families (execv, execve, execvpe, execlpe, spawnv, spawnvp, spawnvpe, posix_spawn, posix_spawnp) and module-alias / from-import-alias / wildcard / builtins patterns.
Safe-control test (test_lint_scanner_does_not_false_positive_on_safe_shapes) is tightened — the prior importlib workaround is gone and the assertion is now not violations. Tests legitimate os.path.join, os.environ.get, os.path.abspath, from os import getcwd, from os import environ to make sure adapters' real usage stays green.
65 tests pass (up from 42), ruff clean, baseline scan of current inputs/ still clean.

No production code under src/ changed. STABILITY.md text and the live-load tests in test_fixture_no_import.py are unchanged from 22bc2fb.

pengfei-threemoonslab · 2026-05-16T05:42:54Z

Both reproduced; both fixed in 7f6d7df.

P2 — order-of-import rebind bypass

Reproduced empty-violations against the prior commit:

rebind-os         -> []   # import os; os.system('echo hi'); import pathlib as os
rebind-aliased    -> []   # import os as runner; runner.execve(...); import pathlib as runner

Root cause: module_aliases: dict[str, str] and name_aliases: dict[str, tuple[str, str]] were last-write-wins. Pass 1 walked every import before pass 2 checked any call site, so the final alias state was used to resolve all calls. A later import pathlib as os overwrote module_aliases["os"] = "os" with module_aliases["os"] = "pathlib", and the earlier os.system(...) call resolved through pathlib.system — not forbidden.

Fix: union-of-bindings. module_aliases: dict[str, set[str]] and name_aliases: dict[str, set[tuple[str, str]]]. A local name binds to the union of every canonical module/attr pair it was ever assigned in the file. The chain resolver returns every possible canonical chain, and the call site is flagged if any resolution is forbidden. This is the conservative path the review suggested: "conservatively retain 'ever bound to os' aliases instead of overwriting them."

After fix, same inputs produce:

rebind-os         -> "forbidden call 'os.system'"
rebind-aliased    -> "forbidden call 'os.execve'"

False-positive trade-off (documented in the docstring): if a local name is ever bound to a dangerous module in any branch, every chain on that name is checked against the forbidden surface. Conservatism in the direction of "flag and let code review decide" is the right failure mode for a trust-model lint.

No-false-positive on legitimate dual-binding: verified import os; import pathlib as os; os.path.join('a','b') produces [] — os.path.join is in neither the exact nor prefix sets under either binding.

Four new negative-control cases cover both rebind shapes (module alias + from-import alias) plus the reverse-order variant where the dangerous binding follows the safe one.

P3 — STABILITY.md overpromised

You're right — the lint deliberately allows import importlib.metadata and from importlib.metadata import version because the plugin registry under checks/ uses it for entry-point discovery against the installed environment, not user workspace files. The STABILITY text saying "import importlib forms are forbidden"* was broader than the actual lint.

Tightened the section to enumerate exactly what's forbidden and explicitly call out importlib.metadata as allowed:

- Bare-name calls to exec / eval / __import__ / compile.
- Attribute calls to importlib.import_module,
  importlib.util.spec_from_file_location,
  importlib.util.module_from_spec,
  importlib.machinery.SourceFileLoader,
  runpy.run_path, runpy.run_module,
  subprocess.{run, call, Popen, check_call, check_output},
  os.system, os.popen, and every variant under the
  os.exec* / os.spawn* / os.posix_spawn* prefixes.
- Module imports of runpy, subprocess, importlib, importlib.util,
  importlib.machinery, and builtins — in any
  import X, import X as Y, or from X import … form.
- Wildcard `from os import *`.

importlib.metadata is intentionally allowed: the plugin registry
under checks/ (not inputs/) uses it for entry-point discovery, and
discovery happens against the installed environment, not user
workspace files. …

Final numbers

Negative-control parametrize: 35 → 39 cases (P2 adds 4 rebind shapes).
69 tests pass (up from 65), ruff clean, baseline inputs/ still clean.
Production code under src/ unchanged.

Adds two layers of structural enforcement for the public claim in README and STABILITY.md that the scanner does not execute or import user code. 1. tests/test_adapter_static_only.py (new) — AST scan of every source under src/agents_shipgate/inputs/. Rejects any call to exec / eval / __import__ / compile, any use of importlib.import_module / importlib.util.spec_from_file_location / importlib.util.module_from_spec / importlib.machinery.SourceFileLoader / runpy.run_path / runpy.run_module / subprocess.{run,call,Popen,check_call,check_output} / os.{system,popen, execv,execvp,spawnv}, and the matching import / from-import forms so an aliased re-export can't slip past the call-site checks. Includes negative- control parametrize over every forbidden shape and a false-positive control proving re.compile / ast.parse / yaml.safe_load / json.loads stay green. Baseline scan of current inputs/ is clean — no remediation work required to merge. 2. tests/test_fixture_no_import.py (rewritten) — per-adapter live-load tests. Each of LangChain, CrewAI, OpenAI Agents SDK, Google ADK, MCP, OpenAPI, Anthropic, OpenAI API, and n8n is driven through run_scan() against a fixture whose Python content (or sibling trap.py, for declarative adapters) raises RuntimeError at module load. Each test additionally snapshots sys.modules before/after and asserts no module whose __file__ resolves under the fixture root ends up cached — a strictly stronger property than relying on the runtime raise alone. Sample-protection assertions keep samples/simple_langchain_agent/agent.py and samples/simple_crewai_agent/crew.py faithful demonstrators of the no-import claim. 3. .github/workflows/ci.yml — adds a dedicated "Trust-model invariant lint" CI step that runs the AST scan before the main test suite, so a regression is visible at the top of CI logs rather than buried in pytest output. 4. STABILITY.md — upgrades the "Trust-model invariants" section to cite both tests as the structural enforcement and frame the property as "structurally absent" rather than "we tried to forbid X." 42 tests pass; ruff clean. Pre-existing CI surface is unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

PR reviewer found that the v1 lint could be bypassed by: 1. Module aliasing: `import os as oo; oo.system(...)` — root Name was checked verbatim, never resolved to `os`. 2. From-import alias: `from os import system as sh; sh(...)` — call site was a bare Name, not checked against from-import aliases. 3. `os.exec*` / `os.spawn*` / `os.posix_spawn*` family variants — only `execv` / `execvp` / `spawnv` were enumerated; `execve`, `execvpe`, `execlpe`, `spawnvp`, `spawnvpe`, `posix_spawn`, `posix_spawnp` all slipped through. 4. Builtins re-export: `import builtins; builtins.eval(...)` — the attribute chain wasn't in the forbidden set and `builtins` wasn't a forbidden import. All four bypasses reproduced empty-violations against the v1 lint before this commit; all are now caught and have dedicated negative- control parametrize entries. Implementation changes in tests/test_adapter_static_only.py: - Two-pass `_scan_source`: pass 1 walks every `Import` / `ImportFrom` to build `module_aliases: local -> canonical_module_path` and `name_aliases: local -> (canonical_module, canonical_attr)`; pass 2 resolves attribute chains and bare-name calls through the alias maps before checking the forbidden sets. - New `FORBIDDEN_ATTR_CALL_PREFIXES = ("os.exec", "os.spawn", "os.posix_spawn")` catches every variant in those families without enumeration. The `os.execv` / `os.execvp` / `os.spawnv` exact entries are removed (covered by prefix). - New `TRACKED_NON_FORBIDDEN_MODULES = {"os"}` records aliases for modules that have legitimate uses (`os.path`, `os.environ`, `os.getcwd`) but also forbidden surfaces. - `builtins` added to `FORBIDDEN_MODULES`. `import builtins`, `import builtins as b`, `from builtins import eval`, and `from builtins import eval as e` are all rejected at the import line. - From-imports of forbidden attributes are flagged at the import line itself, not only at the eventual call site, giving a clearer error. - Wildcard from-import from a tracked module (`from os import *`) is rejected — would otherwise alias forbidden names into local scope and defeat the call-site check. - Acknowledged limitations documented in the module docstring: flow- sensitive aliasing (`ev = eval; ev(...)`), reflective access (`getattr(os, "system")`, `globals()["eval"]`), and attribute paths through aliased submodules. These need code review, not lint. Negative-control parametrize grows from 11 to 35 cases, covering every bypass pattern in the docstring. Safe-control test cleans up the prior `importlib` workaround and now asserts a fully-empty violation list. 65 tests pass (up from 42); ruff clean. Production code under src/ is unchanged. Baseline scan of current inputs/ remains clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Two findings from the second-round review: [P2] The two-pass alias map was last-write-wins, so a later ``import pathlib as os`` erased the earlier ``import os`` binding for the call-site check. ``import os; os.system("ls"); import pathlib as os`` therefore passed the lint with zero violations. Same shape with ``import os as runner; runner.execve(...); import pathlib as runner`` also passed. [P3] STABILITY.md claimed ``import importlib*`` forms are forbidden, but ``import importlib.metadata`` and ``from importlib.metadata import version`` are intentionally allowed (the plugin registry under checks/ uses metadata for entry-point discovery). The documentation overpromised what the lint enforces. Reproduced both findings against the prior commit (rebind-os and rebind-aliased returned []; importlib.metadata returned []) before fixing. Implementation (tests/test_adapter_static_only.py): - Convert ``module_aliases: dict[str, str]`` to ``dict[str, set[str]]`` and ``name_aliases: dict[str, tuple[str, str]]`` to ``dict[str, set[tuple[str, str]]]``. A local name binds to the *union* of every canonical module/attr pair it was ever assigned in the file. ``import os; import pathlib as os`` now yields ``module_aliases["os"] = {"os", "pathlib"}``. - ``_resolve_attribute_chain`` -> ``_resolve_attribute_chain_all`` returns every canonical chain a call could resolve to. Call sites flag once if *any* resolution is forbidden. - Same conservative-union logic for from-import aliases at bare-name call sites. - Added 4 negative-control parametrize cases covering both rebind shapes (module alias + from-import alias) plus the reverse-order variant where the dangerous binding follows the safe one. Verified ``import os; import pathlib as os; os.path.join('a','b')`` does NOT false-positive — ``os.path.join`` is in neither the exact nor prefix sets under either binding. False-positive trade-off (acknowledged in the docstring): if a local name is ever bound to a dangerous module *in any branch*, every attribute chain on that name is checked against the forbidden surface. For trust-model lint this is the right failure mode — suspicious aliasing should be a code-review trigger, not a silent pass. Documentation (STABILITY.md): - Replace the vague ``import runpy / import subprocess / import importlib*`` line with an enumerated forbidden-modules list (``runpy``, ``subprocess``, ``importlib``, ``importlib.util``, ``importlib.machinery``, ``builtins``) and an enumerated forbidden-attribute list (including the ``os.exec*`` / ``os.spawn*`` / ``os.posix_spawn*`` prefix families and ``os.system`` / ``os.popen``). - Explicitly call out that ``importlib.metadata`` is allowed and why (plugin-discovery surface lives under ``checks/`` not ``inputs/``, and discovery runs against the installed environment, not user workspace files). - Document the conservative union-of-bindings alias resolution so reviewers know an aliased rebinding cannot defeat the lint. 69 tests pass (up from 65); ruff clean. Production code under src/ unchanged. Baseline scan of current inputs/ still clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

pengfei-threemoonslab · 2026-05-16T05:50:30Z

You're right — and the cause was different from what the surface symptom suggested. Both findings are now resolved by rebasing the branch onto current main.

What actually happened

When I opened this PR, I branched from origin/main at 940fdcf (the latest commit at that moment). Since then, two PRs merged ahead of mine:

#78 — Add --check mode to generate_schemas.py + roundtrip tests (M4) → introduced the "Verify generated schemas are up to date" CI step, the --check flag, tests/test_schema_roundtrip.py, and the plugins_enabled=False guard at scripts/generate_schemas.py:791.
#81 — Add release_decision.contribution_rules[] audit (v0.17).

My PR branch was behind both. From the perspective of a gh pr checkout-style review against the current main, my PR looked like it was removing M4 infrastructure — because the merge would have. In fact I never touched scripts/generate_schemas.py, tests/test_schema_roundtrip.py, or that CI step; they simply weren't on my base.

Fix

Rebased the three M3 commits onto current origin/main. Resolution:

.github/workflows/ci.yml — only real conflict. Kept both steps in the right order:
```
- name: Compile
- name: Verify generated schemas are up to date  (M4 — restored)
- name: Trust-model invariant lint              (M3 — this PR)
- name: Test
```
Schema drift fails CI first (fast structural drift). Trust-model lint fails CI second (also fast). Then the full pytest run.
STABILITY.md — clean merge; the section I edited was unchanged on main.
scripts/generate_schemas.py — picked up unmodified from main; plugins_enabled=False at line 791 is intact.
tests/test_schema_roundtrip.py — picked up unmodified from main; the regression guard test_checks_catalog_ignores_enabled_plugins (line 167) is intact.

Verification

Against the rebased branch tip (b2c7bcd):

$ python scripts/generate_schemas.py --check
(clean — exit 0)

$ pytest tests/test_adapter_static_only.py tests/test_fixture_no_import.py tests/test_schema_roundtrip.py
77 passed in <1s

$ grep plugins_enabled scripts/generate_schemas.py
791:    for check in check_catalog(plugins_enabled=False)

$ grep test_checks_catalog_ignores_enabled_plugins tests/test_schema_roundtrip.py
167:def test_checks_catalog_ignores_enabled_plugins(generator, monkeypatch):

The M4 schema-drift gate, the plugin-isolation guard, and the round-trip test suite are all back in place exactly as they shipped in #78. M3 layers on top with its own dedicated CI step.

Branch state

Three M3 commits replayed cleanly:
- 63ff110 — Enforce no-import / no-exec trust invariant across all adapters (M3)
- fab0d78 — Close lint bypass holes flagged in PR review (M3 follow-up)
- b2c7bcd — Fix two more lint holes flagged in PR review (M3 follow-up 2)
Diff against current main is now: +1309 / -44 across 4 files — only the M3 surface (tests/test_adapter_static_only.py new, tests/test_fixture_no_import.py rewritten, STABILITY.md +42, .github/workflows/ci.yml +8).
Force-pushed with --force-with-lease.

Apologies for the noise — I should have rebased before opening this PR. Will check git log origin/main..HEAD before opening future M-item PRs to catch this.

pengfei-threemoonslab and others added 3 commits May 15, 2026 22:49

pengfei-threemoonslab mentioned this pull request May 16, 2026

Add plugin validation pipeline + --strict-plugins (M5) #83

Merged

6 tasks

pengfei-threemoonslab force-pushed the claude/m3-no-import-invariant branch from 7f6d7df to b2c7bcd Compare May 16, 2026 05:50

Tighten trust invariant test coverage

ed94e1a

pengfei-threemoonslab merged commit fcad6bd into main May 16, 2026
1 check passed

pengfei-threemoonslab deleted the claude/m3-no-import-invariant branch May 16, 2026 06:07

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Enforce no-import / no-exec trust invariant across all adapters (M3)#79

Enforce no-import / no-exec trust invariant across all adapters (M3)#79
pengfei-threemoonslab merged 4 commits into
mainfrom
claude/m3-no-import-invariant

pengfei-threemoonslab commented May 16, 2026

Uh oh!

pengfei-threemoonslab commented May 16, 2026

Uh oh!

pengfei-threemoonslab commented May 16, 2026

Uh oh!

pengfei-threemoonslab commented May 16, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

pengfei-threemoonslab commented May 16, 2026

Summary

Why

What it does NOT change

Test plan

Layered design

File-by-file

Uh oh!

pengfei-threemoonslab commented May 16, 2026

What changed

Verification

Test surface

Uh oh!

pengfei-threemoonslab commented May 16, 2026

P2 — order-of-import rebind bypass

P3 — STABILITY.md overpromised

Final numbers

Uh oh!

pengfei-threemoonslab commented May 16, 2026

What actually happened

Fix

Verification

Branch state

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant