A subinterpreter + `os.fork()` spawning backend #448
Draft
goodboy wants to merge 3 commits into `subint_spawner_backend` from …
Experimental third spawn backend: use a fresh sub-interpreter purely as a trio-free launchpad from which to `os.fork()` + exec back into `python -m tractor._child`. Per issue #379's "fork()-workaround/hacks" thread.
Intent is to sidestep both,
- the trio+fork hazards hitting `trio_proc` (python-trio/trio#1614 et al.), since the forking interp is guaranteed trio-free.
- the shared-GIL abandoned-thread hazards hitting `subint_proc` (`ai/conc-anal/subint_sigint_starvation_issue.md`), since we don't *stay* in the subint; it only lives long enough to call `os.fork()`.
Downstream of the fork+exec, all the existing `trio_proc` plumbing is reused verbatim: `ipc_server.wait_for_peer()`, `SpawnSpec`, `Portal` yield, soft-kill.
Status: NOT wired up beyond scaffolding. The fn raises `NotImplementedError` immediately; the `bootstrap` fork/exec string builder and the `# TODO: orchestrate driver thread` block are kept in-tree as deliberate dead code so the next iteration starts from a concrete shape rather than a blank page.
The docstring calls out three open questions that need empirical validation before wiring this up:
1. Does CPython permit `os.fork()` from a non-main legacy subint?
2. Can the child stay fork-without-exec and `trio.run()` directly from within the launchpad subint?
3. How do `signal.set_wakeup_fd()` handlers and other process-global state interact when the forking thread is inside a subint?
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
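For a concrete picture of the launchpad idea, here's a minimal sketch of what a fork+exec bootstrap string could look like: source text that would be `exec()`'d inside the launchpad subint, call `os.fork()`, and have the child immediately `os.execv()` into `python -m tractor._child`. The `build_bootstrap()` name and its `child_args` parameter are hypothetical (the real `_child` CLI args aren't shown here), and nothing is executed; it only makes the intended shape concrete.

```python
from __future__ import annotations

import sys


def build_bootstrap(child_args: list[str]) -> str:
    """Return source text meant to be exec'd *inside* the launchpad subint.

    Hypothetical sketch: fork, then have the child immediately exec a
    fresh `python -m tractor._child` so no interpreter state survives.
    """
    argv = [sys.executable, '-m', 'tractor._child', *child_args]
    return (
        'import os\n'
        'pid = os.fork()\n'
        'if pid == 0:\n'
        # child: replace the process image right away
        f'    os.execv({sys.executable!r}, {argv!r})\n'
        # parent (still inside the subint): getting `pid` back out to the
        # driver is the open `# TODO: orchestrate driver thread` part
    )


if __name__ == '__main__':
    print(build_bootstrap(['--example-arg']))
```

Note the child branch never gets to run on current CPython: per the follow-up finding, the abort happens inside `PyOS_AfterFork_Child()`, i.e. within `os.fork()` itself, before any Python code in the child executes.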
Empirical finding: the WIP `subint_fork_proc` scaffold landed in `cf0e3e6f` does *not* work on current CPython. The `fork()` syscall succeeds in the parent, but the CHILD aborts immediately during `PyOS_AfterFork_Child()` → `_PyInterpreterState_DeleteExceptMain()`, which gates on the current tstate belonging to the main interp; the child dies with `Fatal Python error: not main interpreter`.
CPython devs acknowledge the fragility with an in-source comment (`// Ideally we could guarantee tstate is running main.`) but expose no user-facing hook to satisfy the precondition, so the strategy is structurally dead until upstream changes.
Rather than delete the scaffold, reshape it into a documented dead-end so the next person with this idea lands on the reason rather than rediscovering the same CPython-level refusal.
Deats,
- Move `subint_fork_proc` out of `tractor.spawn._subint` into a new dedicated `tractor.spawn._subint_fork` module (153 LOC). Module + fn docstrings now describe the blockage directly; the fn body is trimmed to a `NotImplementedError` pointing at the analysis doc, with no more dead-code `bootstrap` sketch bloating `_subint.py`.
- `_spawn.py`: keep `'subint_fork'` in `SpawnMethodKey` + the `_methods` dispatch so `--spawn-backend=subint_fork` routes to a clean `NotImplementedError` rather than "invalid backend"; a comment calls out the blockage. Collapse the duplicate py3.14 feature-gate in `try_set_start_method()` into a combined `case 'subint' | 'subint_fork':` arm.
- New 337-line analysis: `ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md`. Annotated walkthrough from the user-visible fatal error down to the specific `Modules/posixmodule.c` + `Python/pystate.c` source lines enforcing the refusal, plus an upstream-report draft.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Standalone script to validate the "main-interp worker-thread
forkserver + subint-hosted trio" arch proposed as a workaround
to the CPython-level refusal doc'd in
`ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md`.
Deliberately NOT a `tractor` test — zero `tractor` imports.
Uses `_interpreters` (private stdlib) + `os.fork()` directly so
pass/fail is a property of CPython alone, independent of our
runtime. Requires py3.14+.
Deats,
- four scenarios via `--scenario`:
  - `control_subint_thread_fork`: the KNOWN-BROKEN case as a
    harness sanity check; if the child DOESN'T abort, our
    analysis is wrong
  - `main_thread_fork`: baseline sanity, must always succeed
  - `worker_thread_fork` (architectural assertion): a regular
    `threading.Thread` attached to the main interp calls
    `os.fork()`; the child should survive post-fork cleanup
  - `full_architecture` (end-to-end): fork from a main-interp
    worker thread, then in the child create a subint driving a
    worker thread running `trio.run()`
- exit code 0 on EXPECTED outcome (for `control_*` that means
  "child aborted", not "child succeeded")
- each scenario prints a self-contained pass/fail banner; use
  the parent's `os.waitpid()` + per-scenario status prints to
  observe the child's fate
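The `worker_thread_fork` scenario and the exit-code convention above can be sketched with the stdlib alone. This is not the smoketest script itself: `fork_from_worker_thread()` and `scenario_passed()` are hypothetical names, subints are left out entirely, and it assumes a POSIX platform with `os.fork()` and py3.9+ for `os.waitstatus_to_exitcode()`.

```python
from __future__ import annotations

import os
import signal
import threading
import warnings
from typing import Callable


def fork_from_worker_thread(child_main: Callable[[], object]) -> int:
    """Call `os.fork()` from a plain `threading.Thread` on the main interp.

    Returns the child's status decoded by `os.waitstatus_to_exitcode()`:
    the exit code for a normal exit, or -N if signal N killed the child.
    """
    result: list[int] = []

    def _worker() -> None:
        with warnings.catch_warnings():
            # py3.12+ warns about fork() while other threads exist;
            # forking from a non-main thread is the point of the test
            warnings.simplefilter('ignore', DeprecationWarning)
            pid = os.fork()
        if pid == 0:
            # child: run the scenario body, then hard-exit so we never
            # unwind back into the parent's copied Python stack
            try:
                child_main()
                code = 0
            except BaseException:
                code = 1
            os._exit(code)
        _, status = os.waitpid(pid, 0)
        result.append(os.waitstatus_to_exitcode(status))

    t = threading.Thread(target=_worker)
    t.start()
    t.join()
    return result[0]


def scenario_passed(scenario: str, exitcode: int) -> bool:
    """Map a child's exit code to the EXPECTED-outcome rule: `control_*`
    scenarios pass only if the child aborted (SIGABRT)."""
    if scenario.startswith('control_'):
        return exitcode == -signal.SIGABRT
    return exitcode == 0


if __name__ == '__main__':
    code = fork_from_worker_thread(lambda: None)
    ok = scenario_passed('worker_thread_fork', code)
    print(f"worker_thread_fork: {'PASS' if ok else 'FAIL'} (exit {code})")
```

Decoding through `os.waitstatus_to_exitcode()` lets one integer cover both fates: a clean `os._exit(0)` in the child reads as `0`, while a child killed by `abort()` reads as `-SIGABRT`.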
Also, log NLNet provenance for this session's three-sub-phase
work (py3.13 gate tightening, `pytest-timeout` + marker
refactor, `subint_fork` prototype → CPython-block finding).
Prompt-IO: ai/prompt-io/claude/20260422T200723Z_797f57c_prompt_io.md
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
(RFC) Add `subint_fork` stub: blocked on CPython post-fork gate
Motivation
The plan: a third `tractor` spawn backend that uses a fresh
sub-interpreter purely as a trio-free launchpad from which to
call `os.fork()`, then runs `_actor_child_main()` against a normal
`trio` runtime in the child. The hope was to sidestep both of the
well-known hazards already plaguing the existing pair of backends,
without the full PEP 684 per-interp-GIL work landing first.
Specifically it would dodge,
- the `trio`+`fork()` hazards hitting `trio_proc` (python-trio/trio#1614, "Surviving fork()", et al.), since the forking interpreter is statically guaranteed never to have imported `trio`.
- the shared-GIL abandoned-thread hazards hitting `subint_proc` (the in-thread backend introduced in PR #446, "A subinterpreter-in-thread spawning backend"), since the launchpad subint only lives long enough to call `os.fork()`; the child process runs full vanilla CPython with `trio.run()` directly off the main interp.
Empirical finding (TL;DR): this strategy is structurally dead on current CPython. `os.fork()` from a non-main subinterpreter aborts the child immediately during `PyOS_AfterFork_Child()` → `_PyInterpreterState_DeleteExceptMain()` with `Fatal Python error: not main interpreter`.
The check is enforced by an explicit `PyStatus_ERR("not main interpreter")` gate at `Python/pystate.c:1044-1047`, called from `Modules/posixmodule.c:753`. Adjacent in-source comments (`// Ideally we could guarantee tstate is running main.`) confirm CPython devs are aware the path is fragile, but no user-facing hook is exposed to satisfy the precondition.
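From the parent's side, that fatal error is only visible through the child's wait status: a `Fatal Python error` ends in `abort()`, so the child dies by SIGABRT. The sketch below fakes the failure with `os.abort()` in a forked child (an assumption made so it runs on any POSIX CPython; it does not exercise the subint gate itself) and decodes the status the way a test harness would. `child_status_after_abort()` is a hypothetical name.

```python
import os
import signal


def child_status_after_abort() -> int:
    """Fork a child that dies the way a `Fatal Python error` does (via
    `abort()`) and return its status as decoded for the parent."""
    pid = os.fork()
    if pid == 0:
        # stand-in for PyOS_AfterFork_Child() hitting the
        # "not main interpreter" gate; nothing after this runs
        os.abort()
    _, status = os.waitpid(pid, 0)
    # a negative value means "killed by that signal number"
    return os.waitstatus_to_exitcode(status)


if __name__ == '__main__':
    code = child_status_after_abort()
    print(f'child exit code: {code} (SIGABRT is {int(signal.SIGABRT)})')
```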
Why ship this as a draft PR rather than just a buried
`ai/conc-anal/` doc: PR descriptions are discoverable, citable, and forkable in a way internal analysis docs are not.
CPython upstream contributors can link / fork / reference this
page directly when discussing whether to expose a hook.
What we are asking of CPython upstream, any of:
- … the main interp before `PyOS_AfterFork_Child()` runs.
- … `_PyInterpreterState_DeleteExceptFor(tstate->interp)` variant that lets the calling subint survive post-fork (a subsequent `execv()` clears state at the OS level anyway).
- … `Fatal Python error` into a clean `RuntimeError` raised in the parent's `os.fork()` call so the failure mode is debuggable.
- … fork-from-non-main-subint is permanently disallowed, so we can stop trying.
This PR's `_subint_fork.py` module + `ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md`
analysis doc are designed to be lifted directly into a CPython
issue tracker report.
Src of research
- …: `PyOS_AfterFork_Child` call site.
- …: `_PyInterpreterState_DeleteExceptMain` refusal.
- … (would unblock).
- … `concurrent.interpreters` public API (the gated subint we use).
- … `trio`+`fork()` hazard, the original motivator for wanting a trio-free fork launchpad in the first place.
Summary of changes
By chronological commit (stacked on PR #446):
1. … `subint_fork_proc` backend scaffold under `tractor.spawn._subint`, per the "fork()-workaround/hacks" thread of #379 ("Trying out sub-interpreters (subints), maybe `fork()` can be hacked now?"); the `bootstrap` fork+exec string-builder + `# TODO: orchestrate driver thread` block are kept in-tree as deliberate dead code so the next iteration starts from a concrete shape rather than a blank page. Three open questions are documented in the docstring for empirical follow-up.
2. … `subint_fork` as blocked at the CPython level after empirical validation. Reshape the WIP scaffold into a dedicated `tractor.spawn._subint_fork` module (153 LOC); the fn body is trimmed to a `NotImplementedError` pointing at the analysis doc, with no more dead-code `bootstrap` sketch bloating `_subint.py`.
   - `_spawn.py`: keep `'subint_fork'` in `SpawnMethodKey` + the `_methods` dispatch so `--spawn-backend=subint_fork` routes to a clean `NotImplementedError` rather than "invalid backend"; comments call out the blockage. Collapse the duplicate py3.14 feature-gate in `try_set_start_method()` into a combined `case 'subint' | 'subint_fork':` arm.
   - … `ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md`. Annotated walkthrough from the user-visible fatal error down to the specific `Modules/posixmodule.c` + `Python/pystate.c` source lines enforcing the refusal, plus an upstream-report draft.
3. … `subint_fork` workaround smoketest at `ai/conc-anal/subint_fork_from_main_thread_smoketest.py`: a standalone script (zero `tractor` imports) that validates the alternative "main-interp worker-thread forkserver + subint-hosted trio" architecture.
   - `--scenario` modes: `control_subint_thread_fork` (KNOWN-BROKEN harness sanity), `main_thread_fork` (baseline), `worker_thread_fork` (architectural assertion that fork from a regular `threading.Thread` on the main interp survives), and `full_architecture` (end-to-end: fork from a main-interp worker, then in the child create a subint driving a trio thread).
   - … `tractor` runtime; usable as a reproducer for upstream.
   - … three-sub-phase work via `ai/prompt-io/claude/20260422T200723Z_797f57c_prompt_io.md`.
Scopes changed
- `tractor.spawn._subint_fork` (new, 153 LOC): the module docstring documents the CPython block; `subint_fork_proc()` feature-gates on py3.14 `_has_subints`, then raises `NotImplementedError` pointing at the analysis doc.
- `tractor.spawn._spawn`: the `SpawnMethodKey` literal grows `'subint_fork'`; `try_set_start_method()` collapses the py3.14 feature-gate into a combined `case 'subint' | 'subint_fork':` arm; the `_methods` dispatch routes the new key to `subint_fork_proc`.
- `ai.conc_anal` (docs): `subint_fork_blocked_by_cpython_post_fork_issue.md` (337-line analysis + upstream-report draft) and `subint_fork_from_main_thread_smoketest.py` (440-line standalone CPython-level smoketest, four scenarios).
- `ai.prompt_io.claude`: NLNet provenance log.
TODOs before landing
- … to satisfy the post-fork main-interp gate, OR until we
  collectively decide the smoketest's "main-interp
  worker-thread forkserver" workaround is worth shipping under
  a different name.
- … in the analysis doc.
Future follow up
- Watch for CPython upstream changes to the post-fork main-interp gate.
  - Track resolution of any issue filed against CPython (per the "Upstream-report draft" section of `ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md`). The blocker is `_PyInterpreterState_DeleteExceptMain()` at `Python/pystate.c:1040`; any patch loosening that gate (or exposing a pre-fork tstate-swap hook) unblocks this backend.
- Re-evaluate fork-from-non-main-subint feasibility once PEP 684 (per-interp GIL) fully ships.
  - PEP 684's per-interp-GIL work touches code adjacent to `_PyInterpreterState_DeleteExceptMain`; once stable, recheck whether the gate behavior changed or whether a public `_PyInterpreterState_DeleteExceptFor()` variant is feasible to propose.
- Validate the alternative "main-interp worker-thread forkserver" architecture in the `full_architecture` scenario of `ai/conc-anal/subint_fork_from_main_thread_smoketest.py`.
  - Run the smoketest on multiple CPython 3.14.x point releases; if the pattern holds, write an issue proposing it as a fourth backend (distinct from this PR's dead `subint_fork`): fork from a regular `threading.Thread` on the main interp, then in the child host `trio.run()` from inside a freshly-created subint's worker thread.
- Resolve the three open questions in the original `eee79a03` `subint_fork_proc` docstring: Q1 (fork from non-main subint?) is answered: NO; Q2 (fork-without-exec `trio.run()` from the launchpad subint?) and Q3 (`signal.set_wakeup_fd()` + process-global state across the subint→main fork boundary) remain moot until/unless the gate is lifted.
- Surface the `XXX` annotations adjacent to the gate site in `Python/pystate.c:159-161` (`// XXX Won't this fail since PyInterpreterState_Clear() requires the "current" tstate to be set?`) in the upstream report; they suggest latent issues in `_PyInterpreterState_DeleteExceptMain` even in the happy path.
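The proposed fourth-backend shape (fork from a regular `threading.Thread` on the main interp, then in the child drive a freshly created subint from a worker thread) can be sketched against the public py3.14 `concurrent.interpreters` API (`create()`, `Interpreter.exec()`, `Interpreter.close()`). This is a sketch under assumptions, not the smoketest's code: `PAYLOAD` is a trivial placeholder standing in for `trio.run()` (whether `trio` actually runs inside a subint is what `full_architecture` has to validate), `full_architecture()` and `_child_main()` are hypothetical names, and on pre-3.14 Pythons or platforms without `os.fork()` the function just returns `None`.

```python
from __future__ import annotations

import os
import threading
import warnings

try:
    from concurrent import interpreters  # public subint API, py3.14+
except ImportError:
    interpreters = None

# Placeholder workload; the proposed backend would run `trio.run(...)`
# here instead, which is exactly what still needs validating.
PAYLOAD = 'total = sum(range(10))\nassert total == 45\n'


def _child_main() -> int:
    """Forked child: create a subint and drive it from a worker thread."""
    errors: list[BaseException] = []

    def _drive_subint() -> None:
        try:
            interp = interpreters.create()
            try:
                interp.exec(PAYLOAD)  # runs PAYLOAD inside the subint
            finally:
                interp.close()
        except BaseException as exc:
            errors.append(exc)

    t = threading.Thread(target=_drive_subint)
    t.start()
    t.join()
    return 1 if errors else 0


def full_architecture() -> int | None:
    """Fork from a main-interp worker thread; the child hosts a subint.

    Returns the child's decoded exit code, or None when the platform or
    Python version can't run the scenario at all.
    """
    if interpreters is None or not hasattr(os, 'fork'):
        return None
    exitcode: list[int] = []

    def _forker() -> None:
        with warnings.catch_warnings():
            # py3.12+ warns about fork() while other threads exist
            warnings.simplefilter('ignore', DeprecationWarning)
            pid = os.fork()
        if pid == 0:
            code = 1
            try:
                code = _child_main()
            finally:
                os._exit(code)  # never unwind into the parent's stack
        _, status = os.waitpid(pid, 0)
        exitcode.append(os.waitstatus_to_exitcode(status))

    t = threading.Thread(target=_forker)
    t.start()
    t.join()
    return exitcode[0]


if __name__ == '__main__':
    print('full_architecture exit code:', full_architecture())
```

The idea is that the fork always happens on a main-interp thread, so the tstate reaching `PyOS_AfterFork_Child()` belongs to the main interp and the "not main interpreter" gate is never hit; the subint only comes into existence after the fork.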
(this PR content was generated in some part by `claude-code`)