Enforce no-import / no-exec trust invariant across all adapters (M3)#79
Conversation
|
Thanks for the catch — verified all five bypass patterns reproduced empty-violations against the v1 lint. Fixed in What changedImplementation (
Acknowledged limitations documented in the module docstring:
The lint is the structural floor; code review is the dataflow ceiling. Honest framing > false security. VerificationReproduced reviewer's exact bypass patterns against All five now flag, with attribution where the bypass was via alias. Test surface
No production code under |
|
Both reproduced; both fixed in P2 — order-of-import rebind bypassReproduced empty-violations against the prior commit: Root cause: Fix: union-of-bindings. After fix, same inputs produce: False-positive trade-off (documented in the docstring): if a local name is ever bound to a dangerous module in any branch, every chain on that name is checked against the forbidden surface. Conservatism in the direction of "flag and let code review decide" is the right failure mode for a trust-model lint. No-false-positive on legitimate dual-binding: verified Four new negative-control cases cover both rebind shapes (module alias + from-import alias) plus the reverse-order variant where the dangerous binding follows the safe one. P3 — STABILITY.md overpromisedYou're right — the lint deliberately allows Tightened the section to enumerate exactly what's forbidden and explicitly call out - Bare-name calls to exec / eval / __import__ / compile.
- Attribute calls to importlib.import_module,
importlib.util.spec_from_file_location,
importlib.util.module_from_spec,
importlib.machinery.SourceFileLoader,
runpy.run_path, runpy.run_module,
subprocess.{run, call, Popen, check_call, check_output},
os.system, os.popen, and every variant under the
os.exec* / os.spawn* / os.posix_spawn* prefixes.
- Module imports of runpy, subprocess, importlib, importlib.util,
importlib.machinery, and builtins — in any
import X, import X as Y, or from X import … form.
- Wildcard `from os import *`.
importlib.metadata is intentionally allowed: the plugin registry
under checks/ (not inputs/) uses it for entry-point discovery, and
discovery happens against the installed environment, not user
workspace files. …Final numbers
|
Adds two layers of structural enforcement for the public claim in
README and STABILITY.md that the scanner does not execute or import
user code.
1. tests/test_adapter_static_only.py (new) — AST scan of every source
under src/agents_shipgate/inputs/. Rejects any call to exec / eval /
__import__ / compile, any use of importlib.import_module /
importlib.util.spec_from_file_location / importlib.util.module_from_spec /
importlib.machinery.SourceFileLoader / runpy.run_path / runpy.run_module /
subprocess.{run,call,Popen,check_call,check_output} / os.{system,popen,
execv,execvp,spawnv}, and the matching import / from-import forms so an
aliased re-export can't slip past the call-site checks. Includes negative-
control parametrize over every forbidden shape and a false-positive
control proving re.compile / ast.parse / yaml.safe_load / json.loads stay
green. Baseline scan of current inputs/ is clean — no remediation work
required to merge.
2. tests/test_fixture_no_import.py (rewritten) — per-adapter live-load
tests. Each of LangChain, CrewAI, OpenAI Agents SDK, Google ADK, MCP,
OpenAPI, Anthropic, OpenAI API, and n8n is driven through run_scan()
against a fixture whose Python content (or sibling trap.py, for
declarative adapters) raises RuntimeError at module load. Each test
additionally snapshots sys.modules before/after and asserts no module
whose __file__ resolves under the fixture root ends up cached — a
strictly stronger property than relying on the runtime raise alone.
Sample-protection assertions keep samples/simple_langchain_agent/agent.py
and samples/simple_crewai_agent/crew.py faithful demonstrators of the
no-import claim.
3. .github/workflows/ci.yml — adds a dedicated "Trust-model invariant
lint" CI step that runs the AST scan before the main test suite, so a
regression is visible at the top of CI logs rather than buried in
pytest output.
4. STABILITY.md — upgrades the "Trust-model invariants" section to cite
both tests as the structural enforcement and frame the property as
"structurally absent" rather than "we tried to forbid X."
42 tests pass; ruff clean. Pre-existing CI surface is unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR reviewer found that the v1 lint could be bypassed by:
1. Module aliasing: `import os as oo; oo.system(...)` — root Name
was checked verbatim, never resolved to `os`.
2. From-import alias: `from os import system as sh; sh(...)` — call
site was a bare Name, not checked against from-import aliases.
3. `os.exec*` / `os.spawn*` / `os.posix_spawn*` family variants —
only `execv` / `execvp` / `spawnv` were enumerated; `execve`,
`execvpe`, `execlpe`, `spawnvp`, `spawnvpe`, `posix_spawn`,
`posix_spawnp` all slipped through.
4. Builtins re-export: `import builtins; builtins.eval(...)` — the
attribute chain wasn't in the forbidden set and `builtins` wasn't
a forbidden import.
All four bypasses reproduced empty-violations against the v1 lint
before this commit; all are now caught and have dedicated negative-
control parametrize entries.
Implementation changes in tests/test_adapter_static_only.py:
- Two-pass `_scan_source`: pass 1 walks every `Import` / `ImportFrom`
to build `module_aliases: local -> canonical_module_path` and
`name_aliases: local -> (canonical_module, canonical_attr)`; pass 2
resolves attribute chains and bare-name calls through the alias maps
before checking the forbidden sets.
- New `FORBIDDEN_ATTR_CALL_PREFIXES = ("os.exec", "os.spawn",
"os.posix_spawn")` catches every variant in those families without
enumeration. The `os.execv` / `os.execvp` / `os.spawnv` exact entries
are removed (covered by prefix).
- New `TRACKED_NON_FORBIDDEN_MODULES = {"os"}` records aliases for
modules that have legitimate uses (`os.path`, `os.environ`,
`os.getcwd`) but also forbidden surfaces.
- `builtins` added to `FORBIDDEN_MODULES`. `import builtins`,
`import builtins as b`, `from builtins import eval`, and
`from builtins import eval as e` are all rejected at the import line.
- From-imports of forbidden attributes are flagged at the import line
itself, not only at the eventual call site, giving a clearer error.
- Wildcard from-import from a tracked module
(`from os import *`) is rejected — would otherwise alias forbidden
names into local scope and defeat the call-site check.
- Acknowledged limitations documented in the module docstring: flow-
sensitive aliasing (`ev = eval; ev(...)`), reflective access
(`getattr(os, "system")`, `globals()["eval"]`), and attribute paths
through aliased submodules. These need code review, not lint.
Negative-control parametrize grows from 11 to 35 cases, covering every
bypass pattern in the docstring. Safe-control test cleans up the prior
`importlib` workaround and now asserts a fully-empty violation list.
65 tests pass (up from 42); ruff clean. Production code under
src/ is unchanged. Baseline scan of current inputs/ remains clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two findings from the second-round review:
[P2] The two-pass alias map was last-write-wins, so a later
``import pathlib as os`` erased the earlier ``import os`` binding
for the call-site check. ``import os; os.system("ls"); import
pathlib as os`` therefore passed the lint with zero violations.
Same shape with ``import os as runner; runner.execve(...); import
pathlib as runner`` also passed.
[P3] STABILITY.md claimed ``import importlib*`` forms are
forbidden, but ``import importlib.metadata`` and ``from
importlib.metadata import version`` are intentionally allowed
(the plugin registry under checks/ uses metadata for entry-point
discovery). The documentation overpromised what the lint enforces.
Reproduced both findings against the prior commit (rebind-os and
rebind-aliased returned []; importlib.metadata returned []) before
fixing.
Implementation (tests/test_adapter_static_only.py):
- Convert ``module_aliases: dict[str, str]`` to
``dict[str, set[str]]`` and ``name_aliases:
dict[str, tuple[str, str]]`` to ``dict[str, set[tuple[str, str]]]``.
A local name binds to the *union* of every canonical module/attr
pair it was ever assigned in the file. ``import os; import pathlib
as os`` now yields ``module_aliases["os"] = {"os", "pathlib"}``.
- ``_resolve_attribute_chain`` -> ``_resolve_attribute_chain_all``
returns every canonical chain a call could resolve to. Call sites
flag once if *any* resolution is forbidden.
- Same conservative-union logic for from-import aliases at bare-name
call sites.
- Added 4 negative-control parametrize cases covering both
rebind shapes (module alias + from-import alias) plus the
reverse-order variant where the dangerous binding follows the safe
one. Verified ``import os; import pathlib as os;
os.path.join('a','b')`` does NOT false-positive — ``os.path.join``
is in neither the exact nor prefix sets under either binding.
False-positive trade-off (acknowledged in the docstring): if a local
name is ever bound to a dangerous module *in any branch*, every
attribute chain on that name is checked against the forbidden surface.
For trust-model lint this is the right failure mode — suspicious
aliasing should be a code-review trigger, not a silent pass.
Documentation (STABILITY.md):
- Replace the vague ``import runpy / import subprocess / import
importlib*`` line with an enumerated forbidden-modules list
(``runpy``, ``subprocess``, ``importlib``, ``importlib.util``,
``importlib.machinery``, ``builtins``) and an enumerated
forbidden-attribute list (including the ``os.exec*`` / ``os.spawn*``
/ ``os.posix_spawn*`` prefix families and ``os.system`` /
``os.popen``).
- Explicitly call out that ``importlib.metadata`` is allowed and why
(plugin-discovery surface lives under ``checks/`` not ``inputs/``,
and discovery runs against the installed environment, not user
workspace files).
- Document the conservative union-of-bindings alias resolution so
reviewers know an aliased rebinding cannot defeat the lint.
69 tests pass (up from 65); ruff clean. Production code under src/
unchanged. Baseline scan of current inputs/ still clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
7f6d7df to
b2c7bcd
Compare
|
You're right — and the cause was different from what the surface symptom suggested. Both findings are now resolved by rebasing the branch onto current What actually happenedWhen I opened this PR, I branched from
My PR branch was behind both. From the perspective of a FixRebased the three M3 commits onto current
VerificationAgainst the rebased branch tip ( The M4 schema-drift gate, the plugin-isolation guard, and the round-trip test suite are all back in place exactly as they shipped in #78. M3 layers on top with its own dedicated CI step. Branch state
Apologies for the noise — I should have rebased before opening this PR. Will check |
Summary
tests/test_adapter_static_only.py) over every source undersrc/agents_shipgate/inputs/that rejectsexec/eval/__import__/compile, dynamic-import surfaces (importlib.import_module,importlib.util.spec_from_file_location,importlib.machinery.SourceFileLoader,runpy.run_path,subprocess.*,os.system/popen/execv*/spawnv), and the matching import / from-import forms so an aliased re-export can't slip past the call-site checks. Runs as a dedicatedTrust-model invariant lintstep before the full pytest suite.tests/test_fixture_no_import.pyto drive every adapter end-to-end (LangChain, CrewAI, OpenAI Agents SDK, Google ADK, MCP, OpenAPI, Anthropic, OpenAI API, n8n) against a fixture with a module-levelraise RuntimeError(...)trap. Each test snapshotssys.modulesbefore/afterrun_scan()and asserts no module whose__file__resolves under the fixture root ends up cached — a strictly stronger property than the previousrunpy.run_pathsmoke check, which only proved the fixture would fail if imported but never exercised the adapter against it.Why
The public claim "No agent execution. No LLM calls. No MCP server connections. No scanner network calls." (README) and "the SDK loaders use
ast.parseonly" (STABILITY.md) was previously held up by convention plus two narrowrunpy.run_pathchecks on the LangChain and CrewAI samples. A new adapter could silently introduceimportlib.import_module(user_path)and the tests wouldn't notice. This PR converts the claim from a documentation promise into a structural CI gate.What it does NOT change
src/agents_shipgate/is modified.inputs/is clean — zero violations to fix before merging.Test plan
pytest tests/test_adapter_static_only.py tests/test_fixture_no_import.py— 42 passed in 0.62s on Python 3.14 against a cleanorigin/maincheckout.ruff check tests/test_adapter_static_only.py tests/test_fixture_no_import.py— All checks passed!Trust-model invariant lintruns in <1s and fails fast on synthetic violations (covered bytest_lint_scanner_catches_known_violation_shapes, parametrized over 11 forbidden shapes).test_lint_scanner_does_not_false_positive_on_safe_shapescoversre.compile,ast.parse,yaml.safe_load,json.loads).test_invariant_lint_covers_every_adapter_module— catches a futureinputs/reorganization that would silently drop modules from the scan.Layered design
test_adapter_static_only.py, ~0.3s): catches a regression before it ships. The dedicated CI step name surfaces in the GitHub UI asTrust-model invariant lint · failedrather than a buried pytest line.test_fixture_no_import.py, ~0.4s): proves the invariant holds against real adapter execution. Thesys.modulessnapshot catches even pathological loader paths (e.g.,importlib.util.spec_from_file_location+ lazy loader) that don't fire araiseat module top.samples/simple_langchain_agent/agent.pyandsamples/simple_crewai_agent/crew.pyare asserted to retain their module-levelraise RuntimeError(...)so the published demonstrators stay faithful to the no-import claim.File-by-file
tests/test_adapter_static_only.pytests/test_fixture_no_import.py.github/workflows/ci.ymlTrust-model invariant lintstep before full test suiteSTABILITY.md🤖 Generated with Claude Code