docs: processless by design + how the caller adds hard process ceilings #135
Merged
Pyex never spawns, links, or sleeps a process of its own: run/2 is a synchronous call on the caller's process, and a CI static analyzer already enforces that (Process/Task/spawn are banned in lib/pyex). That keeps the interpreter deterministic, embeddable, and pool-friendly, and leaves the process model to the caller.

Documents the two-layer resource model and gives a ready-to-use SafeRunner: the interpreter's cooperative in-process limits, plus the caller's hard, BEAM-enforced ceilings (max_heap_size with kill + include_shared_binaries, and a wall-clock brutal-kill watchdog) that bound memory and time regardless of what the guest does, including gaps in the cooperative accounting and uninterruptible NIFs.

The wrapper was verified end to end: happy path returns, an infinite loop with limits disabled hits the watchdog, and an unbounded allocation hits the heap kill.

Adds the guidance to the README sandbox model and a pointer from the Pyex moduledoc. Docs only; no library code changes (Pyex stays processless).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_019NokzcR7BiAigPgC78zpk9
…ledger

A small Plug/Bandit server demonstrating the processless model end to end: each HTTP request is its own process, so the request handler is the isolation boundary. It spawns a monitored worker, caps the worker's heap (GC-enforced), runs the untrusted Python, and watchdogs the wall clock.

Returns proper status codes (200 ok / 400 python_error / 504 timeout / 507 out_of_memory), and on success the response also carries `trace`: the host's own rendered span tree of every storage op the program caused, unforgeable by the guest.

Verified end to end over real HTTP (curl): a storage program comes back with its db.set/db.get/db.delete spans, a runaway loop with 504, a memory bomb with 507. Referenced from the README sandbox section. Docs/example only.
507 (WebDAV "Insufficient Storage") and 504 (proxy "Gateway Timeout") were the wrong shape for "the guest used too much memory / time": those are execution verdicts, not transport conditions, and there is no honest HTTP code for them.

The HTTP status now describes the API call: running and bounding a job is a successful request (200) whose verdict (ok / error / timeout / out_of_memory) is a field in the body. 400 is reserved for a malformed HTTP request (empty/oversized body) and 5xx for a fault in the sandbox itself.
…disciplined)
Reshapes the response around the principle that the HTTP status describes the
sandbox SERVICE, never the guest program — so the guest can't move the
operator's 5xx rate (and thus its circuit breakers, health checks, and pager).
Every job that runs returns 200 with a body envelope:
{ run_id, verdict: ok|error|timeout|out_of_memory|host_fault,
stdout, value | error{type,kind,message,line},
usage{steps,compute_ms,duration_ms,memory_bytes,output_bytes},
trace } # the host capability ledger
usage + the capability ledger are folded in from Pyex's telemetry
([:pyex, :run, :stop] and [:pyex, :run, :exception]), captured per-worker, so
they're present even when the program FAILED — "what did it touch before it
crashed?" is exactly when the ledger matters. A host_fault (contained
interpreter bug) stays a 200 verdict but also fires a dedicated high-severity
log, keeping service-health and containment-health on separate channels.
400 is reserved for a malformed HTTP request; 5xx for a fault in the service
itself. Verified over curl: ok/error/timeout/out_of_memory all return 200 with
the verdict in the body, and an erroring program still returns its db.set/db.get
ledger + usage.
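
The status-versus-verdict split described above can be sketched as a small pure function. This is an illustration only, not the example server's code: the module name `VerdictEnvelopeSketch`, the outcome tuples it accepts, and the shape of the 400 body are assumptions, and the `stdout`, `usage`, and `trace` fields of the real envelope are left out for brevity.

```elixir
defmodule VerdictEnvelopeSketch do
  @moduledoc """
  Illustration only. Outcome tuples and function names are hypothetical.
  """

  # A job that actually ran always yields HTTP 200; what happened to the
  # guest program is carried in the body's verdict field.
  def for_outcome(run_id, outcome) do
    {200, Map.put(body(outcome), :run_id, run_id)}
  end

  # A malformed HTTP request (for example an empty body) is the caller's
  # mistake, so it is the one case that maps to 400.
  def for_bad_request(reason), do: {400, %{error: reason}}

  defp body({:ok, value}), do: %{verdict: "ok", value: value}
  defp body({:error, :timeout}), do: %{verdict: "timeout"}
  defp body({:error, :out_of_memory}), do: %{verdict: "out_of_memory"}

  # Hypothetical shape for a guest-level Python error.
  defp body({:error, {:python_error, message}}),
    do: %{verdict: "error", error: %{message: message}}

  # Hypothetical shape for a contained interpreter bug: still a 200 verdict.
  defp body({:error, {:host_fault, _reason}}), do: %{verdict: "host_fault"}
end
```

With this shape, a runaway guest changes only the `verdict` field, so it cannot move the service's 5xx rate.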
Why
A reader asked the right question after the host-safety work: the interpreter's resource ceilings are cooperative (checked between evaluation steps), so a single native op or a blocking NIF can run to completion before the next check. The truly input-independent guarantee — bound memory and time regardless of what the code does — comes from the runtime, not the interpreter.
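
To make the gap concrete, here is a toy evaluator loop that checks a deadline only between steps. It is a sketch under stated assumptions, not Pyex's implementation: the module name `CooperativeLimitSketch` and the idea of modelling steps as zero-arity functions are invented for illustration.

```elixir
defmodule CooperativeLimitSketch do
  @moduledoc """
  Illustration only: a toy evaluator loop, not Pyex's implementation.
  """

  # Runs zero-arity "steps" in order, checking the deadline only between steps.
  def run(steps, budget_ms) when is_list(steps) and is_integer(budget_ms) do
    deadline = System.monotonic_time(:millisecond) + budget_ms
    loop(steps, deadline, [])
  end

  defp loop([], _deadline, acc), do: {:ok, Enum.reverse(acc)}

  defp loop([step | rest], deadline, acc) do
    if System.monotonic_time(:millisecond) > deadline do
      # The overrun is only noticed here, after the previous step has
      # already run to completion.
      {:error, :timeout, Enum.reverse(acc)}
    else
      loop(rest, deadline, [step.() | acc])
    end
  end
end
```

A single step that sleeps for 50 ms under a 10 ms budget still finishes and contributes its result; the loop reports the timeout only when it reaches the next check. That is the window a hard, runtime-enforced ceiling closes.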
Pyex deliberately does not provide that itself: `run/2` is a synchronous call on the caller's process, and `Pyex.BannedCallTracer` already enforces on every CI run that `Process`/`Task`/`spawn` never appear in `lib/pyex`. Keeping it processless is what makes it deterministic, embeddable, and pool-friendly. So the right move is to document how the caller adds the process layer, not to bake one in.

What this adds (docs only)
- The interpreter's cooperative in-process limits: … timeout, returning a clean `%Pyex.Error{}`.
- The caller's hard, BEAM-enforced ceilings: `max_heap_size` (with `kill: true` and `include_shared_binaries` so large off-heap binaries count too) plus a wall-clock brutal-kill watchdog.
- A ready-to-use `SafeRunner` wrapper (`spawn_monitor` so a guest OOM can't take the caller down, message-passed result, `:out_of_memory`/`:timeout` outcomes).
- A pointer from the `Pyex` moduledoc so integrators find it from the API docs.

Verified
The documented wrapper was exercised end to end, not just written:
- Happy path: `{:ok, 42, _}`
- Infinite loop with limits disabled: `{:error, :timeout}` (watchdog fires)
- Unbounded allocation: `{:error, :out_of_memory}` (GC kill fires)

No library code changes; Pyex stays processless. Independent of #134 (different files; no conflict).
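
For readers who want the shape of the process layer without opening the README, the following is a minimal sketch of the same idea. It is not the documented `SafeRunner`: the module name `HardCeilingSketch`, the option names, and the default limits are assumptions, and the job is any zero-arity function rather than a specific Pyex call. On OTP 27 or later, `include_shared_binaries: true` could be added to the heap map so off-heap binaries count too; it is omitted here so the sketch also runs on older releases.

```elixir
defmodule HardCeilingSketch do
  @moduledoc """
  Illustration only. Names, options, and defaults are hypothetical.
  """

  # Hypothetical defaults, not values taken from the PR.
  @default_timeout_ms 1_000
  @default_max_heap_words 2_000_000

  def run(job, opts \\ []) when is_function(job, 0) do
    timeout = Keyword.get(opts, :timeout_ms, @default_timeout_ms)
    max_words = Keyword.get(opts, :max_heap_words, @default_max_heap_words)
    caller = self()
    tag = make_ref()

    # Monitored, not linked: the worker dying cannot take the caller down.
    {pid, mon} =
      spawn_monitor(fn ->
        # The VM kills this process if its heap grows past max_words.
        Process.flag(:max_heap_size, %{size: max_words, kill: true, error_logger: false})
        send(caller, {tag, job.()})
      end)

    receive do
      {^tag, result} ->
        Process.demonitor(mon, [:flush])
        {:ok, result}

      # A heap-limit kill surfaces as exit reason :killed before our own timeout.
      {:DOWN, ^mon, :process, ^pid, :killed} ->
        {:error, :out_of_memory}

      {:DOWN, ^mon, :process, ^pid, reason} ->
        {:error, {:crashed, reason}}
    after
      timeout ->
        # Wall-clock watchdog: an untrappable kill, then clean up the mailbox.
        Process.exit(pid, :kill)
        Process.demonitor(mon, [:flush])

        receive do
          {^tag, _late_result} -> :ok
        after
          0 -> :ok
        end

        {:error, :timeout}
    end
  end
end
```

Both kills produce the exit reason `:killed`, so the sketch tells them apart by which side acted first: a `:DOWN` that arrives before the deadline is treated as the heap limit, and the `after` clause is the watchdog.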
🤖 Generated with Claude Code
https://claude.ai/code/session_019NokzcR7BiAigPgC78zpk9