
A subinterpreter + os.fork() spawning backend #448

Draft
goodboy wants to merge 3 commits into subint_spawner_backend from subint_fork_backend


goodboy commented Apr 22, 2026

(RFC) Add subint_fork stub — blocked on CPython post-fork gate

STATUS: BLOCKED ON UPSTREAM CPYTHON. This PR exists as a
public, discoverable, citable artifact to push for upstream
support — the new subint_fork backend code intentionally
raises NotImplementedError. NOT FOR MERGE.

Looking for a working fork-based spawn backend? See
PR #447 — subint_forkserver:
the alternative path that ships today by forking from a regular
threading.Thread attached to the main interpreter rather than
from a non-main subint. This PR is for the not-yet-possible
fork-from-non-main-subint variant; #447 covers the working
case. Also watch issue #379 (subint umbrella) for the broader
roadmap.


Motivation

The plan: a third tractor spawn backend that uses a fresh
sub-interpreter purely as a trio-free launchpad from which to
call os.fork(), then runs _actor_child_main() against a
normal trio runtime in the child. The hope was to sidestep
both of the well-known hazards already plaguing the existing
pair of backends — without the full PEP 684 per-interp-GIL work
landing first.

Specifically it would dodge,

  • the trio+fork() hazards hitting trio_proc
    (python-trio/trio#1614, "Surviving fork()", et al.), since the forking interpreter
    is statically guaranteed never to have imported trio.
  • the shared-GIL abandoned-thread hazards hitting subint_proc
    (the in-thread backend introduced in PR #446, "A subinterpreter-in-thread
    spawning backend"), since the
    launchpad subint only lives long enough to call os.fork()
    — the child process runs full vanilla CPython with
    trio.run() directly off the main interp.

Empirical finding (TL;DR): this strategy is structurally
dead on current CPython. os.fork() from a non-main
subinterpreter aborts the child immediately during
PyOS_AfterFork_Child() →
_PyInterpreterState_DeleteExceptMain() with

Fatal Python error: _PyInterpreterState_DeleteExceptMain: not main interpreter

The check is enforced by an explicit
PyStatus_ERR("not main interpreter") gate at
Python/pystate.c:1044-1047, called from
Modules/posixmodule.c:753. Adjacent in-source comments
(// Ideally we could guarantee tstate is running main.)
confirm CPython devs are aware the path is fragile, but no
user-facing hook is exposed to satisfy the precondition.
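
From the parent's side this refusal is easy to misread as a mysterious crash. On POSIX builds CPython's fatal-error path ends in abort(), so the child is reported as terminated by SIGABRT rather than as exiting. A minimal sketch for classifying that from a raw os.waitpid() status; child_fate() is a hypothetical helper, not part of tractor or the smoketest:

```python
import os
import signal


def child_fate(status: int) -> str:
    '''
    Classify a raw `os.waitpid()` status (hypothetical helper).

    A child refused by the post-fork main-interp gate dies in
    CPython's fatal-error path, which calls abort(), so it shows
    up here as 'aborted' rather than as a normal exit.
    '''
    if os.WIFSIGNALED(status):
        sig = os.WTERMSIG(status)
        if sig == signal.SIGABRT:
            return 'aborted'
        return f'killed by {signal.Signals(sig).name}'
    if os.WIFEXITED(status):
        return f'exited {os.WEXITSTATUS(status)}'
    return 'unknown'
```

In the parent, `_, status = os.waitpid(pid, 0)` followed by `child_fate(status)` should then report 'aborted' when the fork happened from a non-main subint.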

Why ship this as a draft PR rather than just a buried
ai/conc-anal/ doc: PR descriptions are discoverable,
citable, and forkable in a way internal analysis docs are not.
CPython upstream contributors can link / fork / reference this
page directly when discussing whether to expose a hook.

What we are asking of CPython upstream — any of:

  • a user-facing pre-fork hook that swaps the calling tstate to
    the main interp before PyOS_AfterFork_Child() runs.
  • a _PyInterpreterState_DeleteExceptFor(tstate->interp)
    variant that lets the calling subint survive post-fork
    (subsequent execv() clears state at the OS level anyway).
  • minimally — convert the silent child-side
    Fatal Python error into a clean RuntimeError raised in
    the parent's os.fork() call so the failure mode is
    debuggable.
  • OR — an authoritative doc statement that
    fork-from-non-main-subint is permanently disallowed, so we
    can stop trying.
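
Until something like the third ask lands, callers can approximate it in user space, provided they can tell which interpreter they are running in. A minimal sketch; checked_fork() and ForkFromSubintError are hypothetical names, and the is_main_interp predicate is an assumption left to the caller (on py3.14 it might compare the current interpreter's ID against the main interpreter's, which CPython numbers 0):

```python
import os
from typing import Callable


class ForkFromSubintError(RuntimeError):
    '''Raised in the *parent* instead of letting the child abort.'''


def checked_fork(is_main_interp: Callable[[], bool]) -> int:
    '''
    Hypothetical guard around `os.fork()`: refuse up front, in the
    calling process, when the caller is not on the main interpreter,
    since the child would otherwise die in PyOS_AfterFork_Child()
    with "Fatal Python error: ... not main interpreter".
    '''
    if not is_main_interp():
        raise ForkFromSubintError(
            'refusing os.fork() from a non-main interpreter; the child '
            'would abort in _PyInterpreterState_DeleteExceptMain()'
        )
    return os.fork()
```

This only moves the failure into the parent as a catchable exception; it does not make fork-from-non-main-subint work, which still needs one of the CPython-side changes listed above.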

This PR's _subint_fork.py module +
ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md
analysis doc are designed to be lifted directly into a CPython
issue tracker report.


Src of research


Summary of changes

By chronological commit (stacked on PR #446):

  • (eee79a03) Add WIP subint_fork_proc backend
    scaffold under tractor.spawn._subint, per the
    "fork()-workaround/hacks" thread in #379 ("Trying out
    sub-interpreters (subints), maybe fork() can be hacked now?"); bootstrap fork+exec
    string-builder + # TODO: orchestrate driver thread block
    kept in-tree as deliberate dead code so the next iteration
    starts from a concrete shape rather than a blank page. Three
    open questions documented in the docstring for empirical
    follow-up.
  • (0f48ed2e) Doc subint_fork as blocked at the
    CPython level after empirical validation. Reshape the WIP
    scaffold into a dedicated tractor.spawn._subint_fork
    module (153 LOC) — fn body trimmed to a
    NotImplementedError pointing at the analysis doc; no more
    dead-code bootstrap sketch bloating _subint.py.
    • _spawn.py: keep 'subint_fork' in SpawnMethodKey +
      the _methods dispatch so --spawn-backend=subint_fork
      routes to a clean NotImplementedError rather than
      "invalid backend"; comments call out the blockage.
      Collapse the duplicate py3.14 feature-gate in
      try_set_start_method() into a combined
      case 'subint' | 'subint_fork': arm.
    • New 337-line analysis:
      ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md.
      Annotated walkthrough from the user-visible fatal error
      down to the specific Modules/posixmodule.c +
      Python/pystate.c source lines enforcing the refusal,
      plus an upstream-report draft.
  • (de4f470b) Add a CPython-level subint_fork
    workaround smoketest at
    ai/conc-anal/subint_fork_from_main_thread_smoketest.py
    a standalone script (zero tractor imports) that validates
    the alternative "main-interp worker-thread forkserver +
    subint-hosted trio" architecture.
    • Four --scenario modes: control_subint_thread_fork
      (KNOWN-BROKEN harness sanity), main_thread_fork
      (baseline), worker_thread_fork (architectural assertion
      that fork from a regular threading.Thread on the main
      interp survives), and full_architecture (end-to-end:
      fork from a main-interp worker, then in child create a
      subint driving a trio thread).
    • Pass/fail is a property of CPython alone, independent of
      tractor runtime — usable as a reproducer for upstream.
    • Also logs the NLNet provenance for this session's
      three-sub-phase work via
      ai/prompt-io/claude/20260422T200723Z_797f57c_prompt_io.md.

Scopes changed

  • tractor.spawn._subint_fork (new, 153 LOC) — module
    docstring documents the CPython block; subint_fork_proc()
    feature-gates on py3.14 _has_subints then raises
    NotImplementedError pointing at the analysis doc.
  • tractor.spawn._spawn: SpawnMethodKey literal grows
    'subint_fork'; try_set_start_method() collapses the
    py3.14 feature-gate into a combined
    case 'subint' | 'subint_fork': arm; _methods dispatch
    routes the new key to subint_fork_proc.
  • ai.conc_anal (docs) —
    subint_fork_blocked_by_cpython_post_fork_issue.md (337-line
    analysis + upstream-report draft) and
    subint_fork_from_main_thread_smoketest.py (440-line
    standalone CPython-level smoketest, four scenarios).
  • ai.prompt_io.claude — NLNet provenance log.

TODOs before landing

  • DO NOT MERGE until upstream CPython provides a hook
    to satisfy the post-fork main-interp gate, OR until we
    collectively decide the smoketest's "main-interp
    worker-thread forkserver" workaround is worth shipping under
    a different name.
  • File the upstream CPython issue using the report draft
    in the analysis doc.

Future follow up

  • Watch for CPython upstream changes to the post-fork
    main-interp gate.
    Track resolution of any issue filed against CPython (per
    the report draft in
    ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md
    "Upstream-report draft" section). The blocker is
    _PyInterpreterState_DeleteExceptMain() at
    Python/pystate.c:1040 — any patch loosening that gate (or
    exposing a pre-fork tstate-swap hook) unblocks this backend.

  • Re-evaluate fork-from-non-main-subint feasibility once
    PEP 684 (per-interp GIL) fully ships.
    PEP 684's per-interp-GIL work touches code adjacent to
    _PyInterpreterState_DeleteExceptMain; once
    stable, recheck whether the gate behavior changed or whether
    a public _PyInterpreterState_DeleteExceptFor() variant is
    feasible to propose.

  • Validate the alternative "main-interp worker-thread
    forkserver" architecture in the full_architecture scenario
    of ai/conc-anal/subint_fork_from_main_thread_smoketest.py.
    Run the smoketest on multiple CPython 3.14.x point releases;
    if the pattern holds, write an issue proposing it as a
    fourth backend (distinct from this PR's dead
    subint_fork) — fork from a regular threading.Thread on
    the main interp, then in the child host trio.run() from
    inside a freshly-created subint's worker thread.

  • Resolve the three open questions in the original
    eee79a03 subint_fork_proc docstring — Q1 (fork from
    non-main subint?) is answered: NO; Q2 (fork-without-exec +
    trio.run() from the launchpad subint?) and Q3
    (signal.set_wakeup_fd() + process-global state across the
    subint→main fork boundary) remain moot until/unless the
    gate is lifted.
  • Surface the XXX annotations adjacent to the gate site
    in Python/pystate.c:159-161 (// XXX Won't this fail since PyInterpreterState_Clear() requires the "current" tstate to be set?) in the upstream report — they suggest latent
    issues in _PyInterpreterState_DeleteExceptMain even in the
    happy path.



(this PR content was generated in some part by claude-code)

goodboy added 3 commits April 23, 2026 18:48
Experimental third spawn backend: use a fresh
sub-interpreter purely as a trio-free launchpad from
which to `os.fork()` + exec back into
`python -m tractor._child`. Per issue #379's
"fork()-workaround/hacks" thread.

Intent is to sidestep both,
- the trio+fork hazards hitting `trio_proc` (python-trio/trio#1614 et
  al.), since the forking interp is guaranteed trio-free.

- the shared-GIL abandoned-thread hazards hitting `subint_proc`
  (`ai/conc-anal/subint_sigint_starvation_issue.md`), since we don't
  *stay* in the subint; it only lives long enough to call `os.fork()`.

Downstream of the fork+exec, all the existing `trio_proc` plumbing is
reused verbatim: `ipc_server.wait_for_peer()`, `SpawnSpec`, `Portal`
yield, soft-kill.

Status: NOT wired up beyond scaffolding. The fn raises
`NotImplementedError` immediately; the `bootstrap` fork/exec string
builder and the `# TODO: orchestrate driver thread` block are kept
in-tree as deliberate dead code so the next iteration starts from
a concrete shape rather than a blank page.
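
For reference, a minimal sketch of what such a fork/exec bootstrap string builder could look like. `build_bootstrap()` is hypothetical (it is not the in-tree dead code); it returns source meant to be run inside the trio-free launchpad interpreter, e.g. handed to whatever exec entry point the private `_interpreters` module provides (an assumption here). Note that on current CPython, when this source runs in a non-main subint, the child never reaches `os.execv()`: it aborts inside `os.fork()` itself, during `PyOS_AfterFork_Child()`.

```python
import os
import textwrap


def build_bootstrap(argv: list[str]) -> str:
    '''
    Hypothetical fork+exec bootstrap builder.

    Returns Python source which, when exec'd, forks; the child
    immediately execv()s into `argv` so none of the forking
    interpreter's state survives, while the parent is left with
    the child's pid bound to the name `pid`.
    '''
    return textwrap.dedent(f'''\
        import os
        _argv = {argv!r}
        pid = os.fork()
        if pid == 0:
            try:
                os.execv(_argv[0], _argv)
            finally:
                # only reached if execv() itself failed
                os._exit(127)
        ''')
```

From the main interpreter (where fork is permitted) it can be exercised directly: exec the returned source for `[sys.executable, '-c', 'raise SystemExit(7)']` into a namespace dict, then `os.waitpid()` on that namespace's `pid` reports an exit code of 7.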

Docstring calls out three open questions that need
empirical validation before wiring this up:
1. Does CPython permit `os.fork()` from a non-main
   legacy subint?
2. Can the child stay fork-without-exec and
   `trio.run()` directly from within the launchpad
   subint?
3. How do `signal.set_wakeup_fd()` handlers and other
   process-global state interact when the forking
   thread is inside a subint?

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code

Empirical finding: the WIP `subint_fork_proc` scaffold
landed in `cf0e3e6f` does *not* work on current CPython.
The `fork()` syscall succeeds in the parent, but the
CHILD aborts immediately during
`PyOS_AfterFork_Child()` →
`_PyInterpreterState_DeleteExceptMain()`, which gates
on the current tstate belonging to the main interp —
the child dies with `Fatal Python error: not main
interpreter`.

CPython devs acknowledge the fragility with an in-source
comment (`// Ideally we could guarantee tstate is running
main.`) but expose no user-facing hook to satisfy the
precondition — so the strategy is structurally dead until
upstream changes.

Rather than delete the scaffold, reshape it into a
documented dead-end so the next person with this idea
lands on the reason rather than rediscovering the same
CPython-level refusal.

Deats,
- Move `subint_fork_proc` out of `tractor.spawn._subint`
  into a new `tractor.spawn._subint_fork` dedicated
  module (153 LOC). Module + fn docstrings now describe
  the blockage directly; the fn body is trimmed to a
  `NotImplementedError` pointing at the analysis doc —
  no more dead-code `bootstrap` sketch bloating
  `_subint.py`.
- `_spawn.py`: keep `'subint_fork'` in `SpawnMethodKey`
  + the `_methods` dispatch so
  `--spawn-backend=subint_fork` routes to a clean
  `NotImplementedError` rather than "invalid backend";
  comment calls out the blockage. Collapse the duplicate
  py3.14 feature-gate in `try_set_start_method()` into a
  combined `case 'subint' | 'subint_fork':` arm.
- New 337-line analysis:
  `ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md`.
  Annotated walkthrough from the user-visible fatal
  error down to the specific `Modules/posixmodule.c` +
  `Python/pystate.c` source lines enforcing the refusal,
  plus an upstream-report draft.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code

Standalone script to validate the "main-interp worker-thread
forkserver + subint-hosted trio" arch proposed as a workaround
to the CPython-level refusal doc'd in
`ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md`.

Deliberately NOT a `tractor` test — zero `tractor` imports.
Uses `_interpreters` (private stdlib) + `os.fork()` directly so
pass/fail is a property of CPython alone, independent of our
runtime. Requires py3.14+.

Deats,
- four scenarios via `--scenario`:
  - `control_subint_thread_fork` — the KNOWN-BROKEN case as a
    harness sanity; if the child DOESN'T abort, our analysis
    is wrong
  - `main_thread_fork` — baseline sanity, must always succeed
  - `worker_thread_fork` — architectural assertion: regular
    `threading.Thread` attached to main interp calls
    `os.fork()`; child should survive post-fork cleanup
  - `full_architecture` — end-to-end: fork from a main-interp
    worker thread, then in child create a subint driving a
    worker thread running `trio.run()`
- exit code 0 on EXPECTED outcome (for `control_*` that means
  "child aborted", not "child succeeded")
- each scenario prints a self-contained pass/fail banner; the
  parent's `os.waitpid()` status plus the per-scenario prints
  reveal the child's fate
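
A stripped-down sketch of that harness shape, covering only the two scenarios that need neither subints nor trio (`main_thread_fork`, `worker_thread_fork`). It is not the actual smoketest script: helper names and the `SCENARIOS` table are illustrative, and the `control_subint_thread_fork` / `full_architecture` cases, which need py3.14 subint APIs, are left out.

```python
import argparse
import os
import threading
from typing import Callable, Optional


def _fork_child_exits_cleanly() -> bool:
    '''Fork a child that exits 0 at once; True iff it did so.'''
    pid = os.fork()
    if pid == 0:
        os._exit(0)
    _, status = os.waitpid(pid, 0)
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0


def main_thread_fork() -> bool:
    return _fork_child_exits_cleanly()


def worker_thread_fork() -> bool:
    # fork from a regular threading.Thread attached to the main interp
    results: list[bool] = []
    t = threading.Thread(
        target=lambda: results.append(_fork_child_exits_cleanly()),
    )
    t.start()
    t.join()
    return results == [True]


# name -> (runner, whether a clean child exit is the EXPECTED outcome);
# a 'control_subint_thread_fork' entry would register False here.
SCENARIOS: dict[str, tuple[Callable[[], bool], bool]] = {
    'main_thread_fork': (main_thread_fork, True),
    'worker_thread_fork': (worker_thread_fork, True),
}


def run(name: str) -> int:
    runner, expect_clean = SCENARIOS[name]
    passed = runner() == expect_clean
    print(f'[{"PASS" if passed else "FAIL"}] {name}')
    # exit code 0 means "got the EXPECTED outcome"
    return 0 if passed else 1


def main(argv: Optional[list[str]] = None) -> int:
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '--scenario', choices=sorted(SCENARIOS), required=True,
    )
    return run(parser.parse_args(argv).scenario)
```

Wiring `main()` to a `__main__` guard with `sys.exit(main())` gives the `--scenario` CLI. On py3.12+ CPython emits a DeprecationWarning when `os.fork()` is called while other threads exist, so `worker_thread_fork` will typically warn in the parent; that warning concerns deadlock risk in the child, not the main-interp gate.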

Also, log NLNet provenance for this session's three-sub-phase
work (py3.13 gate tightening, `pytest-timeout` + marker
refactor, `subint_fork` prototype → CPython-block finding).

Prompt-IO: ai/prompt-io/claude/20260422T200723Z_797f57c_prompt_io.md

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code