docs: processless by design + how the caller adds hard process ceilings by ivarvong · Pull Request #135 · ivarvong/pyex

ivarvong · 2026-06-29T14:36:36Z

Why

A reader asked the right question after the host-safety work: the interpreter's resource ceilings are cooperative (checked between evaluation steps), so a single native op or a blocking NIF can run to completion before the next check. The truly input-independent guarantee — bound memory and time regardless of what the code does — comes from the runtime, not the interpreter.
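To make the gap concrete, here is a minimal sketch of a cooperative step budget. It is a toy evaluator, not Pyex's implementation: the module name `CooperativeBudget` and the representation of operations as zero-arity functions are assumptions made for illustration only.

```elixir
defmodule CooperativeBudget do
  # Hypothetical toy evaluator (not Pyex's code). Each "op" is a zero-arity
  # function standing in for one evaluation step.
  def run(ops, max_steps) do
    Enum.reduce_while(ops, {:ok, 0}, fn op, {:ok, steps} ->
      if steps >= max_steps do
        # The budget is only consulted here, between steps.
        {:halt, {:error, :step_limit}}
      else
        # Once an op starts, it runs to completion. A slow native call or a
        # blocking NIF inside it cannot be interrupted by this check.
        op.()
        {:cont, {:ok, steps + 1}}
      end
    end)
  end
end
```

A single op that takes ten minutes still counts as one step, which is why a limit of this kind cannot bound time or memory on its own.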

Pyex deliberately does not provide that itself: run/2 is a synchronous call on the caller's process, and Pyex.BannedCallTracer already enforces on every CI run that Process/Task/spawn never appear in lib/pyex. Keeping it processless is what makes it deterministic, embeddable, and pool-friendly. So the right move is to document how the caller adds the process layer, not to bake one in.

What this adds (docs only)

- A new "Hard ceilings the caller adds (Pyex is processless by design)" subsection in the README sandbox model, explaining the two-layer resource model:
  - In-process, cooperative (Pyex): step/memory-estimate/output/call-depth/timeout, returning a clean %Pyex.Error{}.
  - Out-of-process, hard (caller): a GC-enforced max_heap_size (with kill: true and include_shared_binaries so large off-heap binaries count too) plus a wall-clock brutal-kill watchdog.
- A ready-to-use SafeRunner wrapper (spawn_monitor so a guest OOM can't take the caller down, message-passed result, :out_of_memory / :timeout outcomes).
- A pointer from the Pyex moduledoc so integrators find it from the API docs.
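The out-of-process layer described above can be sketched roughly as follows. This is not the SafeRunner text from the README; the module name `SafeRunnerSketch`, the option names `:timeout_ms` and `:max_heap_bytes`, and the `{:crashed, reason}` outcome are assumptions. The sketch takes any zero-arity function, so a caller would wrap the interpreter call in a closure. The `include_shared_binaries` key is only set on OTP 27 or later, where that option exists.

```elixir
defmodule SafeRunnerSketch do
  # Hypothetical wrapper: runs `fun` in a monitored (not linked) worker with a
  # GC-enforced heap cap and a wall-clock watchdog.
  def run(fun, opts \\ []) when is_function(fun, 0) do
    timeout_ms = Keyword.get(opts, :timeout_ms, 5_000)
    max_heap_bytes = Keyword.get(opts, :max_heap_bytes, 40 * 1024 * 1024)
    caller = self()
    tag = make_ref()

    # max_heap_size is expressed in machine words, not bytes.
    heap = %{
      size: div(max_heap_bytes, :erlang.system_info(:wordsize)),
      kill: true,
      error_logger: false
    }

    otp = :erlang.system_info(:otp_release) |> List.to_integer()
    heap = if otp >= 27, do: Map.put(heap, :include_shared_binaries, true), else: heap

    # Monitor instead of link, so a killed worker cannot take the caller down.
    {pid, mon} =
      Process.spawn(fn -> send(caller, {tag, fun.()}) end, [:monitor, {:max_heap_size, heap}])

    receive do
      {^tag, result} ->
        Process.demonitor(mon, [:flush])
        {:ok, result}

      # A heap-limit kill surfaces as exit reason :killed. An external
      # Process.exit(pid, :kill) would look the same to this clause.
      {:DOWN, ^mon, :process, ^pid, :killed} ->
        {:error, :out_of_memory}

      {:DOWN, ^mon, :process, ^pid, reason} ->
        {:error, {:crashed, reason}}
    after
      timeout_ms ->
        # Brutal kill: cannot be trapped or delayed by the worker.
        Process.exit(pid, :kill)

        receive do
          {:DOWN, ^mon, :process, ^pid, _} -> :ok
        end

        # Discard a result that raced with the kill.
        receive do
          {^tag, _late} -> :ok
        after
          0 -> :ok
        end

        {:error, :timeout}
    end
  end
end
```

A caller would pass something like `fn -> Pyex.run(source, opts) end` as `fun`, which keeps the wrapper independent of the interpreter's return shape.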

Verified

The documented wrapper was exercised end to end, not just written:

- happy path → {:ok, 42, _}
- infinite loop with Pyex's own limits disabled → caller's {:error, :timeout} (watchdog fires)
- unbounded allocation with Pyex's memory budget disabled, 40 MB heap cap → {:error, :out_of_memory} (GC kill fires)

No library code changes — Pyex stays processless. Independent of #134 (different files; no conflict).

🤖 Generated with Claude Code

https://claude.ai/code/session_019NokzcR7BiAigPgC78zpk9

Pyex never spawns, links, or sleeps a process of its own: run/2 is a synchronous call on the caller's process, and a CI static analyzer already enforces that (Process/Task/spawn are banned in lib/pyex). That keeps the interpreter deterministic, embeddable, and pool-friendly, and leaves the process model to the caller.

Documents the two-layer resource model and gives a ready-to-use SafeRunner: the interpreter's cooperative in-process limits, plus the caller's hard, BEAM-enforced ceilings (max_heap_size with kill + include_shared_binaries, and a wall-clock brutal-kill watchdog) that bound memory and time regardless of what the guest does, including gaps in the cooperative accounting and uninterruptible NIFs.

The wrapper was verified end to end: happy path returns, an infinite loop with limits disabled hits the watchdog, and an unbounded allocation hits the heap kill.

Adds the guidance to the README sandbox model and a pointer from the Pyex moduledoc. Docs only; no library code changes (Pyex stays processless).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_019NokzcR7BiAigPgC78zpk9

…ledger

A small Plug/Bandit server demonstrating the processless model end to end: each HTTP request is its own process, so the request handler is the isolation boundary. It spawns a monitored worker, caps the worker's heap (GC-enforced), runs the untrusted Python, and watchdogs the wall clock.

Returns proper status codes (200 ok / 400 python_error / 504 timeout / 507 out_of_memory), and on success the response also carries `trace`: the host's own rendered span tree of every storage op the program caused, unforgeable by the guest.

Verified end to end over real HTTP (curl): a storage program comes back with its db.set/db.get/db.delete spans, a runaway loop with 504, a memory bomb with 507. Referenced from the README sandbox section. Docs/example only.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_019NokzcR7BiAigPgC78zpk9

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_019NokzcR7BiAigPgC78zpk9

507 (WebDAV "Insufficient Storage") and 504 (proxy "Gateway Timeout") were the wrong shape for "the guest used too much memory / time": those are execution verdicts, not transport conditions, and there is no honest HTTP code for them.

The HTTP status now describes the API call: running and bounding a job is a successful request (200) whose verdict (ok / error / timeout / out_of_memory) is a field in the body. 400 is reserved for a malformed HTTP request (empty/oversized body) and 5xx for a fault in the sandbox itself.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_019NokzcR7BiAigPgC78zpk9

…disciplined)

Reshapes the response around the principle that the HTTP status describes the sandbox SERVICE, never the guest program, so the guest can't move the operator's 5xx rate (and thus its circuit breakers, health checks, and pager). Every job that runs returns 200 with a body envelope:

    { run_id,
      verdict: ok|error|timeout|out_of_memory|host_fault,
      stdout,
      value | error{type,kind,message,line},
      usage{steps,compute_ms,duration_ms,memory_bytes,output_bytes},
      trace }   # the host capability ledger

usage + the capability ledger are folded in from Pyex's telemetry ([:pyex, :run, :stop] and [:pyex, :run, :exception]), captured per-worker, so they're present even when the program FAILED: "what did it touch before it crashed?" is exactly when the ledger matters.

A host_fault (contained interpreter bug) stays a 200 verdict but also fires a dedicated high-severity log, keeping service-health and containment-health on separate channels. 400 is reserved for a malformed HTTP request; 5xx for a fault in the service itself.

Verified over curl: ok/error/timeout/out_of_memory all return 200 with the verdict in the body, and an erroring program still returns its db.set/db.get ledger + usage.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_019NokzcR7BiAigPgC78zpk9
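The status policy in this commit message can be illustrated with a small pure function. This is a sketch under assumptions, not the example server's code: the module name `VerdictEnvelope`, the shape of the `outcome` tuples, and the reduced set of body fields (no `stdout`, `usage`, or `trace`) are all hypothetical.

```elixir
defmodule VerdictEnvelope do
  # Hypothetical mapping from a runner outcome to {http_status, body}.
  # Any job that actually ran gets 200; the guest's fate lives in `verdict`.
  def respond(run_id, outcome) do
    {200, Map.merge(%{run_id: run_id}, body(outcome))}
  end

  # 400 describes the HTTP request itself (for example an empty or oversized
  # body), never the guest program.
  def malformed(reason), do: {400, %{error: reason}}

  defp body({:ok, value}), do: %{verdict: "ok", value: value}
  defp body({:error, :timeout}), do: %{verdict: "timeout"}
  defp body({:error, :out_of_memory}), do: %{verdict: "out_of_memory"}

  # Assumed shape for an error raised by the guest program.
  defp body({:error, {:python_error, message}}),
    do: %{verdict: "error", error: %{message: message}}

  # A contained interpreter crash is still a 200 with its own verdict.
  defp body({:error, {:crashed, _reason}}), do: %{verdict: "host_fault"}
end
```

Because every clause of `body/1` sits behind the same 200, a hostile guest can change the `verdict` field but not the status code that operators alert on. The dedicated high-severity log for `host_fault` is left out of the sketch.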

ivarvong and others added 5 commits June 29, 2026 10:36

docs: fix store curl example to include 'import store'

66851aa

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_019NokzcR7BiAigPgC78zpk9

ivarvong merged commit 5dd4afb into main Jun 29, 2026
7 checks passed

ivarvong merged 5 commits into main from docs-processless-isolation
