test+fix: host-safety property harness + soundness fixes (cyclic repr/eq, dynamic bases, native-op bombs) #134
Merged
Adds the load-bearing sandbox invariant as a *universal property*, not a
curated example list: for ANY input — generated, adversarial, or malformed —
`Pyex.run/2` must return `{:ok, _, _}` or `{:error, %Pyex.Error{}}`, never let an
Elixir-level crash escape, and always terminate (an outer wall-clock Task is
the backstop, so a resource ceiling that *fails* to stop a program becomes a
test failure instead of a frozen suite).
On its first run the harness found three real host-safety holes, all fixed
here (each pinned by a named regression case):
1. Self-referential containers hung the host in repr/str — the recursion ran
in native Elixir, outside the step loop, so NO resource ceiling fired. A
guest could hang the host with three lines (`a=[]; a.append(a); print(a)`).
Fix: cycle-aware rendering (`Protocols.with_cycle_guard` + a transient
`Ctx.repr_seen` set) emits `[...]`/`{...}` like CPython. Also gave
`eval_py_str` a dict clause so `str(dict)` is cycle-aware (was leaking
`<ref>` via the non-cycle-aware fallback).
2. `a == a` on a self-referential container hung (structural compare recursed
forever). Fix: identity short-circuit — same heap ref compares equal
without materializing the value (matches CPython, which checks identity
before __eq__).
3. `type('T', (frozenset,), {})()` (and other builtin bases that are
`{:builtin, _}` constructors rather than classes) crashed the host: the
un-reified base leaked into the MRO and an MRO walker had no clause for it.
Fix: `c3_linearize` drops any base that doesn't reify to a real class, so
the MRO only ever contains `{:class, _}` — every MRO walker is safe at once.
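Here is a minimal Python sketch of C3 linearization with the reify-or-drop step applied at the source. `ClassDef`, `BuiltinCtor` and `reify` are hypothetical stand-ins for `{:class, _}`, `{:builtin, _}` and Pyex's reification. The merge loop is the standard C3 algorithm, not Pyex's code. Unreifiable bases are filtered out before merging, so only `ClassDef` values can reach the MRO, and that is the property the fix relies on.

```python
class ClassDef:
    """Hypothetical stand-in for a reified {:class, _} value."""

    def __init__(self, name, bases=()):
        self.name = name
        self.bases = list(bases)


class BuiltinCtor:
    """Hypothetical stand-in for a {:builtin, _} constructor such as frozenset."""

    def __init__(self, name):
        self.name = name


def reify(base):
    """Hypothetical: return a real class for `base`, or None if it has none."""
    return base if isinstance(base, ClassDef) else None


def c3_linearize(cls):
    # Drop any base that doesn't reify, so the MRO only ever holds ClassDefs.
    bases = [r for r in (reify(b) for b in cls.bases) if r is not None]
    seqs = [c3_linearize(b) for b in bases] + [bases]
    mro = [cls]
    while True:
        seqs = [s for s in seqs if s]  # discard exhausted sequences
        if not seqs:
            return mro
        # Pick the first head that appears in no other sequence's tail.
        for seq in seqs:
            head = seq[0]
            if not any(head in s[1:] for s in seqs):
                break
        else:
            raise TypeError(f"cannot create a consistent MRO for {cls.name}")
        mro.append(head)
        seqs = [s[1:] if s[0] is head else s for s in seqs]
```

For a class like `T` whose base list contains a builtin constructor, the constructor is silently skipped, and the result contains only real classes.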
Gate: full suite (6173), the 127-property CPython differential fuzz
(conformance preserved), and Dialyzer all green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_019NokzcR7BiAigPgC78zpk9
…on 1.19 Dialyzer)
CI's Elixir 1.19 Dialyzer treats `MapSet.t()` as opaque, so every `%Ctx{}`
construction site tripped `call_without_opaque` on the new `repr_seen` field.
Fix: track `repr_seen` with a plain map used as a set instead; a map is not
opaque to Dialyzer and is equivalent for cycle tracking.
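Below is a Python sketch of that cycle tracking. A plain dict stands in for the map-as-set, and the sketch also includes the identity short-circuit from the `==` fix in the first commit. The heap model (`Ref`, `Heap`) and the function names are hypothetical, not Pyex's internals. The seen set is scoped to the current rendering path (an assumption), so a container that is shared but not cyclic still prints in full, as it does in CPython.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """Hypothetical handle to a mutable container on the guest heap."""
    id: int


class Heap:
    """Hypothetical guest heap: containers live here, values hold Refs."""

    def __init__(self):
        self._cells = {}

    def alloc(self, value):
        ref = Ref(len(self._cells))
        self._cells[ref.id] = value
        return ref

    def deref(self, ref):
        return self._cells[ref.id]


def py_repr(heap, value, seen=None):
    # `seen` is a plain dict used as a set of heap ids on the current path.
    seen = {} if seen is None else seen
    if isinstance(value, Ref):
        target = heap.deref(value)
        if value.id in seen:
            # Already being rendered further up: emit a cycle marker.
            return "[...]" if isinstance(target, list) else "{...}"
        seen = {**seen, value.id: True}  # path-scoped copy, like an immutable map put
        value = target
    if isinstance(value, list):
        return "[" + ", ".join(py_repr(heap, v, seen) for v in value) + "]"
    if isinstance(value, dict):
        items = (f"{py_repr(heap, k, seen)}: {py_repr(heap, v, seen)}" for k, v in value.items())
        return "{" + ", ".join(items) + "}"
    return repr(value)


def py_eq(heap, x, y):
    # Identity short-circuit: the same heap ref is equal without dereferencing.
    if isinstance(x, Ref) and isinstance(y, Ref) and x.id == y.id:
        return True
    x = heap.deref(x) if isinstance(x, Ref) else x
    y = heap.deref(y) if isinstance(y, Ref) else y
    if isinstance(x, list) and isinstance(y, list):
        return len(x) == len(y) and all(py_eq(heap, a, b) for a, b in zip(x, y))
    if isinstance(x, dict) and isinstance(y, dict):
        return x.keys() == y.keys() and all(py_eq(heap, x[k], y[k]) for k in x)
    return x == y
```

Passing the ref for a self-referential list gives CPython's `[[...]]`. Passing the already-dereferenced list gives `[[[...]]]` instead, because the outermost container has no heap id to register.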
A sweep across the broader host-safety surface (parser depth, single-step
alloc/compute, blocking stdlib, stdlib pathologies, callback reentrancy)
turned up four operations that run native and never hit a step boundary, so
the step/memory/timeout ceilings can't interrupt them — a guest could hang
the host:
- math.factorial(10**6) — unbounded native bignum multiply chain
- bytes(10**9) — unbounded native allocation
- 'x'.rjust/ljust/center/zfill(10**9) — unbounded native string build
- time.sleep(n) — capped at 30s but ignoring the run's timeout,
so a run with timeout: 1500 could still block 30s
Fixes mirror the existing string-repetition guard (`*`, capped at 10M):
factorial n capped at 100_000, bytes count at 100_000_000, pad width at
10_000_000 (all fail fast with a clean Python error), and time.sleep now
caps to min(30s, run timeout) so it can never block longer than the budget.
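These guards all check the size argument before any native work starts. The Python sketch below uses the caps stated above. The function and exception names are hypothetical. The sketch also takes the run timeout in seconds, which is an assumption, since the unit of Pyex's `timeout` option is not spelled out here.

```python
import math

# Caps as stated above; the constant names are hypothetical.
FACTORIAL_MAX_N = 100_000
BYTES_MAX_COUNT = 100_000_000
PAD_MAX_WIDTH = 10_000_000
SLEEP_MAX_S = 30.0


class GuestResourceError(Exception):
    """Hypothetical: surfaces to the guest as a clean Python error."""


def guarded_factorial(n):
    if n > FACTORIAL_MAX_N:
        raise GuestResourceError(f"factorial() argument {n} exceeds {FACTORIAL_MAX_N}")
    return math.factorial(n)


def guarded_bytes(count):
    if count > BYTES_MAX_COUNT:
        raise GuestResourceError(f"bytes() size {count} exceeds {BYTES_MAX_COUNT}")
    return bytes(count)


def guarded_pad(s, method, width, *args):
    # Covers rjust/ljust/center/zfill: reject the width before building anything.
    if width > PAD_MAX_WIDTH:
        raise GuestResourceError(f"{method}() width {width} exceeds {PAD_MAX_WIDTH}")
    return getattr(s, method)(width, *args)


def sleep_seconds(requested_s, run_timeout_s):
    # Never block longer than the run's own budget, nor past the 30s hard cap.
    return min(requested_s, SLEEP_MAX_S, run_timeout_s)
```

Failing fast on the argument is cheap because the check costs the same no matter how large the requested size is. The native operation itself is never started.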
The host-safety harness gains named regressions for each hole plus a
resource-bomb generator (sizes from tiny to 10**12 across bytes/`*`/rjust/
zfill/range/`**`/int-from-string) so the property exercises this axis too.
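A minimal Python sketch of such a generator pairs one operation template with one size. The template strings and the exact size ladder are assumptions based on the list above, not the harness's real generator. These programs are meant only as inputs to the sandbox. Evaluating them with plain CPython would itself hang or exhaust memory.

```python
import ast
import random

# Sizes from tiny to 10**12, as described above; the exact ladder is an assumption.
SIZES = [0, 1, 10, 1_000, 10**6, 10**9, 10**12]

# One guest-program template per operation family (hypothetical wording).
TEMPLATES = [
    "bytes({n})",
    "'x' * {n}",
    "'x'.rjust({n})",
    "'x'.zfill({n})",
    "list(range({n}))",
    "2 ** {n}",
    "int('9' * {n})",
]


def resource_bomb(rng):
    """Return one guest program pairing a random operation with a random size."""
    return rng.choice(TEMPLATES).format(n=rng.choice(SIZES))


def bomb_corpus(seed, count):
    """Deterministic batch of bombs, so a failing case can be reproduced."""
    rng = random.Random(seed)
    return [resource_bomb(rng) for _ in range(count)]


def parses(program):
    """Sanity check: ast.parse only builds a tree, it never evaluates the program."""
    try:
        ast.parse(program)
    except SyntaxError:
        return False
    return True
```

Each generated program would then go through the same host-safety check as every other input.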
Gate: full suite (6183), 127-property CPython differential fuzz (conformance
preserved), and Dialyzer all green.
Extended this PR with a wider host-safety sweep beyond the original three fixes. Triaging the broader surface (parser depth, single-step alloc/compute, blocking stdlib, stdlib cyclic inputs, callback reentrancy) surfaced four more operations that ran native and bypassed every resource ceiling: `math.factorial`, `bytes(n)`, the `rjust`/`ljust`/`center`/`zfill` padding methods, and `time.sleep`.
All four are now bounded (caps mirroring the existing string-repetition guard). Gate re-run green: full suite (6183), 127-property differential fuzz, Dialyzer.
Why
Trust in a sandbox comes from the invariants it enforces and proves, not from examples that happen to pass. The load-bearing invariant for pyex (a multitenant ocap sandbox running untrusted code) is: no guest can crash, hang, or unbound the host. That was tested only by a curated list of resource bombs (`AdversarialTest`). This PR asserts it as a universal property, and on its first run the property found three real host-safety holes, all fixed here.
The harness (`test/pyex/host_safety_test.exs`)
For any input (randomly generated, adversarial, or malformed), `Pyex.run/2` must:
- return `{:ok, _, _}` or `{:error, %Pyex.Error{}}` (never let an Elixir-level exception/exit/throw escape);
- always terminate: an outer wall-clock `Task` is the backstop, so a step/memory/timeout ceiling that fails to stop a program becomes a test failure, not a frozen suite.
Plus a regression corpus pinning each host-crash-class input by name.
The three bugs it found (all fixed)
1. Self-referential `repr`/`str` hung the host. The recursion runs in native Elixir, outside the interpreter step loop, so no resource ceiling fired: `a=[]; a.append(a); print(a)` hung indefinitely. Fix: cycle-aware rendering (`Protocols.with_cycle_guard` + a transient `Ctx.repr_seen` set) emits `[...]`/`{...}` like CPython; also gave `eval_py_str` a dict clause so `str(dict)` is cycle-aware too (it had been leaking `<ref>` through the non-cycle-aware fallback).
2. `a == a` on a self-referential container hung (structural comparison recursed forever). Fix: identity short-circuit. The same heap ref compares equal without materializing the value, matching CPython (which checks identity before `__eq__`).
3. `type('T', (frozenset,), {})()` crashed the host. Builtin bases that are `{:builtin, _}` constructors (not classes) weren't reified, leaked into the MRO, and an MRO walker had no clause for them → `FunctionClauseError`. Fix: `c3_linearize` drops any base that doesn't reify to a real class, so the MRO only ever contains `{:class, _}` and every MRO walker is made safe at once (defense at the source).
Note on conformance
The cycle fixes are CPython-conformant for `==` and for the cycle marker. Self-referential `print(a)` renders one nesting level deeper than CPython (`[[[...]]]` vs `[[...]]`) because the outermost container is passed already dereferenced, so it has no heap id to register. It still terminates and clearly marks the cycle, which is the host-safety guarantee. The deviation only affects cyclic structures, which previously hung instead of printing anything, and the differential fuzz does not generate them, so there is no conformance regression (confirmed below).
Gate
- `mix format --check-formatted` ✅
- `mix compile --warnings-as-errors` ✅
- `mix test`: full suite 6173 tests, 0 failures ✅
- `mix dialyzer` ✅ (the 5 "unnecessary skips" are pre-existing `.dialyzer_ignore.exs` entries, untouched)

🤖 Generated with Claude Code