
test+fix: host-safety property harness + soundness fixes (cyclic repr/eq, dynamic bases, native-op bombs)#134

Merged: ivarvong merged 3 commits into main from host-safety-harness on Jun 29, 2026

@ivarvong

Owner

Why

Trust in a sandbox comes from invariants it enforces and proves, not from examples that happen to pass. The load-bearing invariant for pyex (a multitenant ocap sandbox running untrusted code) is: no guest can crash the host, hang it, or drive it into unbounded resource use. Until now that was tested only by a curated list of resource bombs (AdversarialTest). This PR asserts it as a universal property, and on its first run the property found three real host-safety holes, all fixed here.

The harness (test/pyex/host_safety_test.exs)

For any input — randomly generated, adversarial, or malformed — Pyex.run/2 must:

  • return {:ok, _, _} or {:error, %Pyex.Error{}} (never let an Elixir-level exception/exit/throw escape),
  • terminate within a wall-clock bound — an outer Task is the backstop, so a step/memory/timeout ceiling that fails to stop a program becomes a test failure, not a frozen suite,
  • stay within its resource ceilings.

Plus a regression corpus pinning each host-crash-class input by name.
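
The shape of that property can be sketched outside Elixir. The following is a minimal Python model, not the harness itself: `check_host_safe`, `GuestError`, and the `run` callable are hypothetical stand-ins for the test body, `%Pyex.Error{}`, and `Pyex.run/2` respectively. It only illustrates how an outer wall-clock backstop turns a hang into a verdict instead of a frozen suite.

```python
import threading


class GuestError(Exception):
    """Hypothetical stand-in for a clean, structured sandbox error."""


def check_host_safe(run, source, budget_s=1.0):
    """Run one input and classify the outcome.

    Returns "ok" or "error" when the property holds, and "escaped" or
    "timeout" when it is violated.
    """
    outcome = {}

    def worker():
        try:
            run(source)
            outcome["verdict"] = "ok"
        except GuestError:
            outcome["verdict"] = "error"      # clean, expected failure
        except BaseException:
            outcome["verdict"] = "escaped"    # host-level crash leaked out

    # Daemon thread: a hung run cannot keep the process alive.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(budget_s)                     # outer wall-clock backstop
    if thread.is_alive():
        return "timeout"
    return outcome["verdict"]


def property_holds(run, source, budget_s=1.0):
    return check_host_safe(run, source, budget_s) in ("ok", "error")
```

A property-based test would call `property_holds` on every generated input and fail on the first `False`.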

The three bugs it found (all fixed)

  1. Self-referential repr/str hung the host. The recursion runs in native Elixir, outside the interpreter step loop — so no resource ceiling fired. a=[]; a.append(a); print(a) hung indefinitely. Fix: cycle-aware rendering (Protocols.with_cycle_guard + a transient Ctx.repr_seen set) emits [...]/{...} like CPython; gave eval_py_str a dict clause so str(dict) is cycle-aware too (it had been leaking <ref> through the non-cycle-aware fallback).

  2. a == a on a self-referential container hung (structural comparison recursed forever). Fix: an identity short-circuit, so the same heap ref compares equal without materializing the value. This matches CPython, whose container comparisons check element identity before calling __eq__.

  3. type('T', (frozenset,), {})() crashed the host. Builtin bases that are {:builtin, _} constructors (not classes) weren't reified, leaked into the MRO, and an MRO walker had no clause for them → FunctionClauseError. Fix: c3_linearize drops any base that doesn't reify to a real class, so the MRO only ever contains {:class, _} — every MRO walker is made safe at once (defense at the source).
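
The first two fixes can be illustrated with a small Python sketch. It is not the pyex implementation: `render` and `structural_eq` are hypothetical names, and Python's `id()` stands in for a heap ref. It shows a "seen" set turning a cycle into a `[...]`/`{...}` marker, and an identity check ending a self-comparison before any recursion.

```python
def render(value, seen=None):
    """Cycle-aware repr for lists and dicts (sketch)."""
    seen = set() if seen is None else seen
    if not isinstance(value, (list, dict)):
        return repr(value)
    if id(value) in seen:
        # Already being rendered further up the stack: emit a marker.
        return "[...]" if isinstance(value, list) else "{...}"
    seen.add(id(value))
    try:
        if isinstance(value, list):
            return "[" + ", ".join(render(v, seen) for v in value) + "]"
        items = (f"{render(k, seen)}: {render(v, seen)}" for k, v in value.items())
        return "{" + ", ".join(items) + "}"
    finally:
        # Transient: shared but non-cyclic values still render in full.
        seen.discard(id(value))


def structural_eq(x, y):
    """List equality with an identity short-circuit (sketch)."""
    if isinstance(x, list) and x is y:
        return True  # same container: equal, no need to walk it
    if isinstance(x, list) and isinstance(y, list):
        return len(x) == len(y) and all(
            structural_eq(p, q) for p, q in zip(x, y)
        )
    return x == y
```

The short-circuit only covers the same-ref case described above; two distinct containers that each contain themselves would still recurse in this sketch.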

Note on conformance

The cycle fixes are CPython-conformant for == and for the cycle marker. Self-referential print(a) renders one nesting level deeper than CPython ([[[...]]] vs [[...]]) because the outermost container is passed already-dereferenced (no heap id to register); it terminates and clearly marks the cycle, which is the host-safety guarantee. The differential fuzz does not generate cyclic structures, so there is no conformance regression — confirmed below.
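
For reference, this is what stock CPython 3 does with the three inputs from the bug list. The snippet is plain CPython with nothing pyex-specific; the variable names are only for illustration.

```python
# Bug 1: self-referential repr/str terminates with a cycle marker.
a = []
a.append(a)
cyclic_list_repr = repr(a)        # '[[...]]'

d = {}
d["self"] = d
cyclic_dict_str = str(d)          # "{'self': {...}}"

# Bug 2: comparing a self-referential container with itself terminates.
same_ref_equal = (a == a)         # True

# Bug 3: a builtin type used as a dynamic base yields a normal class.
T = type("T", (frozenset,), {})
instance = T()
mro_names = [cls.__name__ for cls in T.__mro__]   # ['T', 'frozenset', 'object']
```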

Gate

  • mix format --check-formatted
  • mix compile --warnings-as-errors
  • mix test — full suite 6173 tests, 0 failures
  • 127-property CPython differential fuzz — 0 failures (conformance preserved) ✅
  • mix dialyzer ✅ (the 5 "unnecessary skips" are pre-existing .dialyzer_ignore.exs entries, untouched)

🤖 Generated with Claude Code

https://claude.ai/code/session_019NokzcR7BiAigPgC78zpk9

ivarvong and others added 3 commits June 29, 2026 09:28
Adds the load-bearing sandbox invariant as a *universal property*, not a
curated example list: for ANY input — generated, adversarial, or malformed —
`Pyex.run/2` must return {:ok,_,_} or {:error, %Pyex.Error{}}, never let an
Elixir-level crash escape, and always terminate (an outer wall-clock Task is
the backstop, so a resource ceiling that *fails* to stop a program becomes a
test failure instead of a frozen suite).

On its first run the harness found three real host-safety holes, all fixed
here (each pinned by a named regression case):

1. Self-referential containers hung the host in repr/str — the recursion ran
   in native Elixir, outside the step loop, so NO resource ceiling fired. A
   guest could hang the host with three lines (`a=[]; a.append(a); print(a)`).
   Fix: cycle-aware rendering (`Protocols.with_cycle_guard` + a transient
   `Ctx.repr_seen` set) emits `[...]`/`{...}` like CPython. Also gave
   `eval_py_str` a dict clause so `str(dict)` is cycle-aware (was leaking
   `<ref>` via the non-cycle-aware fallback).

2. `a == a` on a self-referential container hung (structural compare recursed
   forever). Fix: identity short-circuit — same heap ref compares equal
   without materializing the value (matches CPython, which checks identity
   before __eq__).

3. `type('T', (frozenset,), {})()` (and other builtin bases that are
   `{:builtin, _}` constructors rather than classes) crashed the host: the
   un-reified base leaked into the MRO and an MRO walker had no clause for it.
   Fix: `c3_linearize` drops any base that doesn't reify to a real class, so
   the MRO only ever contains `{:class, _}` — every MRO walker is safe at once.

Gate: full suite (6173), the 127-property CPython differential fuzz
(conformance preserved), and Dialyzer all green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_019NokzcR7BiAigPgC78zpk9
…on 1.19 Dialyzer)

CI's Elixir 1.19 Dialyzer treats MapSet.t() as opaque, so every %Ctx{}
construction site tripped call_without_opaque on the new repr_seen field.
A map-as-set is non-opaque and equivalent for cycle tracking.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_019NokzcR7BiAigPgC78zpk9
A sweep across the broader host-safety surface (parser depth, single-step
alloc/compute, blocking stdlib, stdlib pathologies, callback reentrancy)
turned up four operations that run native and never hit a step boundary, so
the step/memory/timeout ceilings can't interrupt them — a guest could hang
the host:

  - math.factorial(10**6)  — unbounded native bignum multiply chain
  - bytes(10**9)           — unbounded native allocation
  - 'x'.rjust/ljust/center/zfill(10**9) — unbounded native string build
  - time.sleep(n)          — capped at 30s but ignoring the run's timeout,
                             so a run with timeout: 1500 could still block 30s

Fixes mirror the existing string-repetition guard (`*`, capped at 10M):
factorial n capped at 100_000, bytes count at 100_000_000, pad width at
10_000_000 (all fail fast with a clean Python error), and time.sleep now
caps to min(30s, run timeout) so it can never block longer than the budget.

The host-safety harness gains named regressions for each hole plus a
resource-bomb generator (sizes from tiny to 10**12 across bytes/`*`/rjust/
zfill/range/`**`/int-from-string) so the property exercises this axis too.
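
A resource-bomb generator along these lines might look like the following Python sketch. The size ladder, templates, and function names are assumptions chosen to match the axes listed above, not the harness's actual generator; it only builds guest source strings and never evaluates them.

```python
import random

# Assumed size ladder, from tiny to 10**12.
SIZES = [0, 1, 10, 10**3, 10**6, 10**9, 10**12]

# One assumed template per axis named in the commit message.
TEMPLATES = [
    "bytes({n})",
    "'x' * {n}",
    "'x'.rjust({n})",
    "'x'.zfill({n})",
    "list(range({n}))",
    "2 ** {n}",
    "int('9' * {n})",
]


def resource_bomb(rng):
    """Return one guest program (as source text) with a random size."""
    template = rng.choice(TEMPLATES)
    return "y = " + template.format(n=rng.choice(SIZES))


def resource_bombs(seed, count):
    """Return a reproducible list of guest programs for a given seed."""
    rng = random.Random(seed)
    return [resource_bomb(rng) for _ in range(count)]
```

Seeding the generator keeps any failing input reproducible, so it can be promoted to a named regression.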

Gate: full suite (6183), 127-property CPython differential fuzz (conformance
preserved), and Dialyzer all green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_019NokzcR7BiAigPgC78zpk9
@ivarvong ivarvong changed the title from "test+fix: host-safety property harness + 3 soundness fixes it surfaced" to "test+fix: host-safety property harness + soundness fixes (cyclic repr/eq, dynamic bases, native-op bombs)" Jun 29, 2026
@ivarvong

Owner Author

Extended this PR with a wider host-safety sweep beyond the original three fixes. Triaging the broader surface (parser depth, single-step alloc/compute, blocking stdlib, stdlib cyclic inputs, callback reentrancy) surfaced four more operations that ran native and bypassed every resource ceiling:

  • math.factorial(10**6) — unbounded bignum
  • bytes(10**9) — unbounded allocation
  • 'x'.rjust/ljust/center/zfill(10**9) — unbounded string build
  • time.sleep(n) — capped at 30s but ignoring the run's timeout (a timeout: 1500 run could still block 30s)

All four are now bounded (caps mirroring the existing 10M guard on *; time.sleep caps to min(30s, run timeout)), with named regressions + a resource-bomb generator added to the harness. Encouragingly, the same sweep confirmed that deep parser nesting, cyclic deepcopy/json.dumps/membership, infinite itertools → list, and raising sorted/__hash__ callbacks were already host-safe.
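
The caps follow a simple pattern: check the size argument before handing it to the native operation. Below is a minimal Python sketch using the limits quoted in the commit message. The function names and `GuestError` are hypothetical, the sketch only rejects values above each cap, and it assumes the run timeout is given in milliseconds (as `timeout: 1500` suggests).

```python
import math

MAX_FACTORIAL_N = 100_000
MAX_BYTES_COUNT = 100_000_000
MAX_PAD_WIDTH = 10_000_000
MAX_SLEEP_S = 30.0


class GuestError(Exception):
    """Hypothetical stand-in for the clean Python error the guest sees."""


def guarded_factorial(n):
    if n > MAX_FACTORIAL_N:
        raise GuestError("factorial argument too large")
    return math.factorial(n)


def guarded_bytes(count):
    if count > MAX_BYTES_COUNT:
        raise GuestError("bytes count too large")
    return bytes(count)


def guarded_rjust(text, width, fill=" "):
    if width > MAX_PAD_WIDTH:
        raise GuestError("pad width too large")
    return text.rjust(width, fill)


def effective_sleep_s(requested_s, run_timeout_ms):
    """Seconds to actually sleep: never more than 30s or the run budget."""
    return max(0.0, min(requested_s, MAX_SLEEP_S, run_timeout_ms / 1000))
```

With these definitions, `effective_sleep_s(60, 1500)` gives 1.5 seconds instead of the 30 seconds the old cap allowed.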

Gate re-run green: full suite (6183), 127-property differential fuzz, Dialyzer.

@ivarvong ivarvong merged commit 548618e into main Jun 29, 2026
7 checks passed