diff --git a/.claude/settings.local.json b/.claude/settings.local.json index abb6bc300..ac895510b 100644 --- a/.claude/settings.local.json +++ b/.claude/settings.local.json @@ -1,8 +1,16 @@ { "permissions": { "allow": [ - "Bash(date *)", "Bash(cp .claude/*)", + "Read(.claude/**)", + "Read(.claude/skills/run-tests/**)", + "Write(.claude/**/*commit_msg*)", + "Write(.claude/git_commit_msg_LATEST.md)", + "Skill(run-tests)", + "Skill(close-wkt)", + "Skill(open-wkt)", + "Skill(prompt-io)", + "Bash(date *)", "Bash(git diff *)", "Bash(git log *)", "Bash(git status)", @@ -23,14 +31,12 @@ "Bash(UV_PROJECT_ENVIRONMENT=py* uv sync:*)", "Bash(UV_PROJECT_ENVIRONMENT=py* uv run:*)", "Bash(echo EXIT:$?:*)", - "Write(.claude/*commit_msg*)", - "Write(.claude/git_commit_msg_LATEST.md)", - "Skill(run-tests)", - "Skill(close-wkt)", - "Skill(open-wkt)", - "Skill(prompt-io)" + "Bash(echo \"EXIT=$?\")", + "Read(//tmp/**)" ], "deny": [], "ask": [] - } + }, + "prefersReducedMotion": false, + "outputStyle": "default" } diff --git a/.claude/skills/conc-anal/SKILL.md b/.claude/skills/conc-anal/SKILL.md index 4f498b7c3..fa121bb25 100644 --- a/.claude/skills/conc-anal/SKILL.md +++ b/.claude/skills/conc-anal/SKILL.md @@ -229,3 +229,69 @@ Unlike asyncio, trio allows checkpoints in that does `await` can itself be cancelled (e.g. by nursery shutdown). Watch for cleanup code that assumes it will run to completion. + +### Unbounded waits in cleanup paths + +Any `await .wait()` in a teardown path is +a latent deadlock unless the event's setter is +GUARANTEED to fire. If the setter depends on +external state (peer disconnects, child process +exit, subsequent task completion) that itself +depends on the current task's progress, you have +a mutual wait. + +Rule: **bound every `await X.wait()` in cleanup +paths with `trio.move_on_after()`** unless you +can prove the setter is unconditionally reachable +from the state at the await site. Concrete recent +example: `ipc_server.wait_for_no_more_peers()` in +`async_main`'s finally (see +`ai/conc-anal/subint_forkserver_test_cancellation_leak_issue.md` +"probe iteration 3") — it was unbounded, and when +one peer-handler was stuck the wait-for-no-more- +peers event never fired, deadlocking the whole +actor-tree teardown cascade. + +### The capture-pipe-fill hang pattern (grep this first) + +When investigating any hang in the test suite +**especially under fork-based backends**, first +check whether the hang reproduces under `pytest +-s` (`--capture=no`). If `-s` makes it go away +you're not looking at a trio concurrency bug — +you're looking at a Linux pipe-buffer fill. + +Mechanism: pytest replaces fds 1,2 with pipe +write-ends. Fork-child subactors inherit those +fds. High-volume error-log tracebacks (cancel +cascade spew) fill the 64KB pipe buffer. Child +`write()` blocks. Child can't exit. Parent's +`waitpid`/pidfd wait blocks. Deadlock cascades up +the tree. + +Pre-existing guards in `tests/conftest.py` encode +this knowledge — grep these BEFORE blaming +concurrency: + +```python +# tests/conftest.py:258 +if loglevel in ('trace', 'debug'): + # XXX: too much logging will lock up the subproc (smh) + loglevel: str = 'info' + +# tests/conftest.py:316 +# can lock up on the `_io.BufferedReader` and hang.. +stderr: str = proc.stderr.read().decode() +``` + +Full post-mortem + +`ai/conc-anal/subint_forkserver_test_cancellation_leak_issue.md` +for the canonical reproduction. 
Cost several +investigation sessions before catching it — +because the capture-pipe symptom was masked by +deeper cascade-deadlocks. Once the cascades were +fixed, the tree tore down enough to generate +pipe-filling log volume → capture-pipe finally +surfaced. Grep-note for future-self: **if a +multi-subproc tractor test hangs, `pytest -s` +first, conc-anal second.** diff --git a/.claude/skills/run-tests/SKILL.md b/.claude/skills/run-tests/SKILL.md index 946e871e0..b2014201c 100644 --- a/.claude/skills/run-tests/SKILL.md +++ b/.claude/skills/run-tests/SKILL.md @@ -205,6 +205,101 @@ python -m pytest tests/ -x -q --co 2>&1 | tail -5 If either fails, fix the import error before running any actual tests. +### Step 4: zombie-actor / stale-registry check (MANDATORY) + +The tractor runtime's default registry address is +**`127.0.0.1:1616`** (TCP) / `/tmp/registry@1616.sock` +(UDS). Whenever any prior test run — especially one +using a fork-based backend like `subint_forkserver` — +leaks a child actor process, that zombie keeps the +registry port bound and **every subsequent test +session fails to bind**, often presenting as 50+ +unrelated failures ("all tests broken"!) across +backends. + +**This has to be checked before the first run AND +after any cancelled/SIGINT'd run** — signal failures +in the middle of a test can leave orphan children. + +```sh +# 1. TCP registry — any listener on :1616? (primary signal) +ss -tlnp 2>/dev/null | grep ':1616' || echo 'TCP :1616 free' + +# 2. leftover actor/forkserver procs — scoped to THIS +# repo's python path, so we don't false-flag legit +# long-running tractor-using apps (e.g. `piker`, +# downstream projects that embed tractor). +pgrep -af "$(pwd)/py[0-9]*/bin/python.*_actor_child_main|subint-forkserv" \ + | grep -v 'grep\|pgrep' \ + || echo 'no leaked actor procs from this repo' + +# 3. stale UDS registry sockets +ls -la /tmp/registry@*.sock 2>/dev/null \ + || echo 'no leaked UDS registry sockets' +``` + +**Interpretation:** + +- **TCP :1616 free AND no stale sockets** → clean, + proceed. The actor-procs probe is secondary — false + positives are common (piker, any other tractor- + embedding app); only cleanup if `:1616` is bound or + sockets linger. +- **TCP :1616 bound OR stale sockets present** → + surface PIDs + cmdlines to the user, offer cleanup: + + ```sh + # 1. GRACEFUL FIRST (tractor is structured concurrent — it + # catches SIGINT as an OS-cancel in `_trio_main` and + # cascades Portal.cancel_actor via IPC to every descendant. + # So always try SIGINT first with a bounded timeout; only + # escalate to SIGKILL if graceful cleanup doesn't complete). + pkill -INT -f "$(pwd)/py[0-9]*/bin/python.*_actor_child_main|subint-forkserv" + + # 2. bounded wait for graceful teardown (usually sub-second). + # Loop until the processes exit, or timeout. Keep the + # bound tight — hung/abrupt-killed descendants usually + # hang forever, so don't wait more than a few seconds. + for i in $(seq 1 10); do + pgrep -f "$(pwd)/py[0-9]*/bin/python.*_actor_child_main|subint-forkserv" >/dev/null || break + sleep 0.3 + done + + # 3. ESCALATE TO SIGKILL only if graceful didn't finish. + if pgrep -f "$(pwd)/py[0-9]*/bin/python.*_actor_child_main|subint-forkserv" >/dev/null; then + echo 'graceful teardown timed out — escalating to SIGKILL' + pkill -9 -f "$(pwd)/py[0-9]*/bin/python.*_actor_child_main|subint-forkserv" + fi + + # 4. 
if a test zombie holds :1616 specifically and doesn't + # match the above pattern, find its PID the hard way: + ss -tlnp 2>/dev/null | grep ':1616' # prints `users:(("",pid=NNNN,...))` + # then (same SIGINT-first ladder): + # kill -INT ; sleep 1; kill -9 2>/dev/null + + # 5. remove stale UDS sockets + rm -f /tmp/registry@*.sock + + # 6. re-verify + ss -tlnp 2>/dev/null | grep ':1616' || echo 'TCP :1616 now free' + ``` + +**Never ignore stale registry state.** If you see the +"all tests failing" pattern — especially +`trio.TooSlowError` / connection refused / address in +use on many unrelated tests — check registry **before** +spelunking into test code. The failure signature will +be identical across backends because they're all +fighting for the same port. + +**False-positive warning for step 2:** a plain +`pgrep -af '_actor_child_main'` will also match +legit long-running tractor-embedding apps (e.g. +`piker` at `~/repos/piker/py*/bin/python3 -m +tractor._child ...`). Always scope to the current +repo's python path, or only use step 1 (`:1616`) as +the authoritative signal. + ## 4. Run and report - Run the constructed command. @@ -356,3 +451,175 @@ by your changes — note them and move on. **Rule of thumb**: if a test fails with `TooSlowError`, `trio.TooSlowError`, or `pexpect.TIMEOUT` and you didn't touch the relevant code path, it's flaky — skip it. + +## 9. The pytest-capture hang pattern (CHECK THIS FIRST) + +**Symptom:** a tractor test hangs indefinitely under +default `pytest` but passes instantly when you add +`-s` (`--capture=no`). + +**Cause:** tractor subactors (especially under fork- +based backends) inherit pytest's stdout/stderr +capture pipes via fds 1,2. Under high-volume error +logging (e.g. multi-level cancel cascade, nested +`run_in_actor` failures, anything triggering +`RemoteActorError` + `ExceptionGroup` traceback +spew), the **64KB Linux pipe buffer fills** faster +than pytest drains it. Subactor writes block → can't +finish exit → parent's `waitpid`/pidfd wait blocks → +deadlock cascades up the tree. + +**Pre-existing guards in the tractor harness** that +encode this same knowledge — grep these FIRST +before spelunking: + +- `tests/conftest.py:258-260` (in the `daemon` + fixture): `# XXX: too much logging will lock up + the subproc (smh)` — downgrades `trace`/`debug` + loglevel to `info` to prevent the hang. +- `tests/conftest.py:316`: `# can lock up on the + _io.BufferedReader and hang..` — noted on the + `proc.stderr.read()` post-SIGINT. + +**Debug recipe (in priority order):** + +1. **Try `-s` first.** If the hang disappears with + `pytest -s`, you've confirmed it's capture-pipe + fill. Skip spelunking. +2. **Lower the loglevel.** Default `--ll=error` on + this project; if you've bumped it to `debug` / + `info`, try dropping back. Each log level + multiplies pipe-pressure under fault cascades. +3. **If you MUST use default capture + high log + volume**, redirect subactor stdout/stderr in the + child prelude (e.g. + `tractor.spawn._subint_forkserver._child_target` + post-`_close_inherited_fds`) to `/dev/null` or a + file. + +**Signature tells you it's THIS bug (vs. a real +code hang):** + +- Multi-actor test under fork-based backend + (`subint_forkserver`, eventually `trio_proc` too + under enough log volume). +- Multiple `RemoteActorError` / `ExceptionGroup` + tracebacks in the error path. +- Test passes with `-s` in the 5-10s range, hangs + past pytest-timeout (usually 30+ s) without `-s`. 
+- Subactor processes visible via `pgrep -af + subint-forkserv` or similar after the hang — + they're alive but blocked on `write()` to an + inherited stdout fd. + +**Historical reference:** this deadlock cost a +multi-session investigation (4 genuine cascade +fixes landed along the way) that only surfaced the +capture-pipe issue AFTER the deeper fixes let the +tree actually tear down enough to produce pipe- +filling log volume. Full post-mortem in +`ai/conc-anal/subint_forkserver_test_cancellation_leak_issue.md`. +Lesson codified here so future-me grep-finds the +workaround before digging. + +## 10. Reaping zombie subactors (`tractor-reap`) + +**Symptom:** after a `pytest` run crashes, times out, +or is `Ctrl+C`'d, subactor forks (esp. under +`subint_forkserver`) can be reparented to `init` +(PPid==1) and linger. They hold onto ports, inherit +pytest's capture-pipe fds, and flakify later +sessions. + +**Two layers of defense:** + +### a) Session-scoped auto-fixture (always on) + +`tractor/_testing/pytest.py::_reap_orphaned_subactors` +runs at pytest session teardown. It walks `/proc` for +direct descendants of the pytest pid, SIGINTs them, +waits up to 3s, then SIGKILLs survivors. SC-polite: +gives the subactor runtime a chance to run its trio +cancel shield + IPC teardown before escalation. + +This is *autouse* and session-scoped — you don't need +to do anything. It just runs. + +### b) `scripts/tractor-reap` CLI (manual reap) + +For the **pytest-died-mid-session** case (Ctrl+C, OOM +kill, hung process you had to `kill -9`), the fixture +never ran. Reach for the CLI: + +```sh +# default: orphans (PPid==1, cwd==repo, cmd contains python) +scripts/tractor-reap + +# descendant-mode: from a still-live supervisor +scripts/tractor-reap --parent + +# see what would be reaped, don't signal +scripts/tractor-reap -n + +# tune the SIGINT → SIGKILL grace window +scripts/tractor-reap --grace 5 +``` + +Exit code: `0` if everyone exited on SIGINT, `1` if +SIGKILL had to escalate — so you can chain it in CI +health-checks (`scripts/tractor-reap || `). + +**What it matches** (orphan-mode): +- `PPid == 1` (reparented to init → definitely + orphaned, not just a currently-running child) +- `cwd == ` (keeps the sweep scoped; won't + touch unrelated init-children elsewhere) +- `python` in cmdline + +**What it does not do:** kill anything whose PPid is +still a live tractor parent. If the parent is alive +it's not an orphan; use `--parent ` if you need +to force-reap under a still-live supervisor. + +**When NOT to run it:** while a pytest session is +active in another terminal. It's safe (won't touch +that session's live children in orphan-mode) but can +race if the target session is mid-teardown. + +### c) `--shm` / `--shm-only`: orphan-segment sweep + +Because `tractor.ipc._mp_bs.disable_mantracker()` +turns off `mp.resource_tracker` (see +`ai/conc-anal/subint_forkserver_mp_shared_memory_issue.md`), +a hard-crashing actor can leave `/dev/shm/` +segments behind that nothing else GCs. 
+ +```sh +# process reap THEN shm sweep +scripts/tractor-reap --shm + +# shm sweep only (skip process phase) +scripts/tractor-reap --shm-only + +# dry-run: list candidates, don't unlink +scripts/tractor-reap --shm -n +``` + +**Match criteria** (very conservative — this is a +shared-system path, can't be wrong): +- segment is a regular file under `/dev/shm`, +- owned by the **current uid** (`stat.st_uid`), +- AND **no live process holds it open** — + enumerated by walking every readable + `/proc//maps` (post-mmap mappings) AND + `/proc//fd/*` (pre-mmap shm-opened fds). + +The "nobody has it open" check is the +kernel-canonical "is this leaked?" test — same +answer `lsof /dev/shm/` would give. No +reliance on tractor-specific naming, so it works +for any tractor app. Critically, it WILL NOT touch +segments held by other apps you have running +(e.g. `piker`, `lttng-ust-*`, `aja-shm-*` — +verified locally with 81 in-use segments correctly +preserved). diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index ea5b98113..6eff3bcbe 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -148,9 +148,13 @@ jobs: - name: Run tests run: > uv run - pytest tests/ -rsx + pytest + tests/ + -rsx --spawn-backend=${{ matrix.spawn_backend }} --tpt-proto=${{ matrix.tpt_proto }} + --capture=fd + # ^XXX^ can't work with --spawn-method=main_thread_forkserver # XXX legacy NOTE XXX # diff --git a/ai/conc-anal/cancel_cascade_too_slow_under_main_thread_forkserver_issue.md b/ai/conc-anal/cancel_cascade_too_slow_under_main_thread_forkserver_issue.md new file mode 100644 index 000000000..780cbb67c --- /dev/null +++ b/ai/conc-anal/cancel_cascade_too_slow_under_main_thread_forkserver_issue.md @@ -0,0 +1,202 @@ +# Cancel-cascade `trio.TooSlowError` flakes under `main_thread_forkserver` + +## Symptom + +Running the full test suite under + +```bash +./py313/bin/python -m pytest tests/ \ + --tpt-proto=tcp \ + --spawn-backend=main_thread_forkserver +``` + +surfaces a single, **rotating** `trio.TooSlowError` +failure each run. The failure isn't deterministic on +test identity — different test each run — but it +ALWAYS looks like: + +``` +FAILED tests/::test_ - trio.TooSlowError +==== 1 failed, 373 passed, 17 skipped, 11–12 xfailed, + 0–1 xpassed, ~550 warnings in ~6min ==== +``` + +Pass rate: **~99.7%** (373 of 374 non-skip tests). +Wall-clock per full run: 5–6 min. + +## Tests observed flaking so far + +Each row was the SOLE failure in a separate run: + +| run # | test | +|---|---| +| 1 | `tests/test_advanced_streaming.py::test_dynamic_pub_sub[KeyboardInterrupt]` | +| 2 | `tests/test_infected_asyncio.py::test_context_spawns_aio_task_that_errors[parent_actor_cancels_child=False]` | + +Both share the same shape: + +- **Cancel cascade** of N subactors back to a parent root actor. +- N ≥ `multiprocessing.cpu_count()` for `test_dynamic_pub_sub` + (it spawns `cpus - 1` consumers + publisher + dynamic-consumer). +- N ≈ 2 for `test_context_spawns_aio_task_that_errors` — + but each subactor is `infect_asyncio=True`, so each + cancel involves the trio↔asyncio guest-run unwind + which is structurally heavier than pure-trio. +- Test wraps the cascade in `trio.fail_after(N seconds)` + and the cap fires before the cascade completes. + +The exact failing test rotates because each test is +independently close to the cap; whichever happens to +be unlucky in scheduling/CPU-contention on a given run +is the one that times out. 
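The shared shape, distilled (a sketch only — `spawn_and_cancel_n_subactors` is a hypothetical stand-in; the real tests build their subactor trees via `tractor.open_nursery()`):

```python
import trio

FAIL_AFTER_S: int = 30  # current fork-backend cap

async def cascade_under_cap(spawn_and_cancel_n_subactors) -> None:
    # `spawn_and_cancel_n_subactors` stands in for the per-test body:
    # spawn N subactors, trigger the error/KBI, await the cancel cascade.
    with trio.fail_after(FAIL_AFTER_S):
        # if any subactor eats its full 1.6s graceful-cancel timeout,
        # N of those (plus fork-IPC teardown cost) can exceed the cap
        # -> trio.TooSlowError, i.e. the rotating flake above.
        await spawn_and_cancel_n_subactors()
```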
+ +## Root-cause family + +`hard_kill` (`tractor/spawn/_spawn.py:hard_kill`) runs +the SC-graceful teardown ladder per subactor: + +1. `Portal.cancel_actor()` — graceful IPC cancel-req. +2. Wait `terminate_after=1.6s` for sub to exit. +3. If still alive: `proc.kill()` (SIGKILL). +4. (NEW) `_unlink_uds_bind_addrs()` — post-mortem + sock-file cleanup for UDS leaks (issue #452 fix). + +For a cascade of N subactors, each pays steps 1–4. If +graceful-cancel doesn't complete within 1.6s for ANY +sub, that sub eats a full 1.6s of `move_on_after` plus +the `proc.wait()` post-SIGKILL. + +Worst case under fork backend with N=cpus subs: +- N × 1.6s = 16s+ on a 10-core box just for the + graceful timeout phase +- Plus per-spawn fork-IPC handshake cost compounds + during teardown (each sub's IPC cleanup goes through + the same forkserver coordinator) +- Plus the new autouse fixtures + (`_track_orphaned_uds_per_test`, + `_detect_runaway_subactors_per_test`, + `_reap_orphaned_subactors`) all run at test + teardown, adding small (10s of ms) but cumulative + overhead + +Current cap: 30s (`fail_after_s = 30 if +is_forking_spawner else 12`). Empirically fits the +median run but the tail breaks ~0.3% of the time. + +## NOT regressing + +To confirm this is a flake and not a regression: + +- Pre-`WakeupSocketpair`-patch baseline: tests + HUNG INDEFINITELY (busy-loop never released). +- Post-patch: pass-or-fail-fast, ~99.7% pass, the + occasional cap-hit fails in bounded time (<60s for + the offending test). +- Same test PASSES under `--spawn-backend=trio` + (no fork, no hard-kill compounding). + +So the suite is dramatically better than before; the +remaining flake is a known-tolerable steady-state. + +## Possible mitigations (ranked) + +### A. Bump the cap further + +Cheapest. Change the per-test `fail_after_s` from 30 +to e.g. 60 for fork backends. Pros: trivial. Cons: +masks any genuine slowness regression we'd want to +catch. + +### B. CPU-count-aware cap + +For tests whose N scales with `cpu_count()`, scale +the cap too: + +```python +fail_after_s = ( + max(30, cpu_count() * 3) # 3s/actor floor + if is_forking_spawner + else 12 +) +``` + +Pros: scales with the actual cancel-cascade work. +Cons: still arbitrary multiplier. + +### C. `pytest-rerunfailures` for these tests only + +Mark the known-flaky tests with +`@pytest.mark.flaky(reruns=1)` (needs +`pytest-rerunfailures` dep). Single retry hides +genuine ~0.3% transient flakes. + +Pros: no cap change, surfaces persistent failures +loudly. Cons: adds a dep, retries can mask real bugs +if used widely. + +### D. Reduce `hard_kill`'s `terminate_after` + +Drop from 1.6s → 0.8s. Cuts the worst-case cascade +time roughly in half. Risks: fewer subs get a chance +to run their cleanup before SIGKILL → more orphaned +state for the autouse reapers to handle (ironically, +adds back overhead elsewhere). + +### E. Profile + targeted fix + +Add `log.devx()` markers in `hard_kill` to time each +phase. Identify if any subactor is consistently +hitting the 1.6s cap (vs. exiting in <0.1s). If so, +that sub has a teardown bug worth fixing at source. +Pros: actually fixes the underlying slowness. Cons: +real investigation work, deferred from this round. + +## Recommendation + +Land this issue-doc as the tracker. Apply **(B)** as +a small follow-up — cheap and proportional. If it +still flakes, escalate to **(E)** with a `log.devx()` +profile-pass. + +`(C)` is a backstop if `(B)` doesn't quite get there +and we need green CI faster than (E) can deliver. 
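For reference, applying (C) is a one-line mark per affected test (sketch; the `flaky` marker comes from `pytest-rerunfailures`, which would be a new dev dependency — test parameters elided):

```python
import pytest

# single bounded retry, applied ONLY to the known cap-sensitive
# cascade tests — never suite-wide, so a persistent failure still
# surfaces loudly on the second attempt.
@pytest.mark.flaky(reruns=1)
def test_dynamic_pub_sub():
    ...
```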
+ +## Verification protocol + +After applying any mitigation: + +```bash +# Run the suite N times back-to-back, count failures. +# A persistent failure on the SAME test == real bug. +# Failures rotating across tests == still cap-related. + +for i in $(seq 1 5); do + ./py313/bin/python -m pytest tests/ \ + --tpt-proto=tcp \ + --spawn-backend=main_thread_forkserver \ + -q 2>&1 | tail -2 +done +``` + +Target: 0 failures across 5 runs ⇒ ship. 1–2 failures +still rotating ⇒ apply (C). Same test failing twice +⇒ escalate to (E). + +## See also + +- [#452](https://github.com/goodboy/tractor/issues/452) — + UDS sock-file leak (related — `hard_kill`'s + cleanup phase contributes to cascade time) +- `ai/conc-anal/trio_wakeup_socketpair_busy_loop_under_fork_issue.md` + — the upstream-trio fix that turned this from a + 100% hang into a 0.3% flake +- `ai/conc-anal/infected_asyncio_under_main_thread_forkserver_hang_issue.md` + — the asyncio variant which contributes to one of + the rotating failures +- `tractor/spawn/_spawn.py::hard_kill` — the SIGKILL + cascade source +- `tractor/_testing/_reap.py::_track_orphaned_uds_per_test`, + `_detect_runaway_subactors_per_test`, + `_reap_orphaned_subactors` — autouse cleanup + fixtures whose cumulative teardown overhead + contributes to the cascade time diff --git a/ai/conc-anal/fork_thread_semantics_execution_vs_memory.md b/ai/conc-anal/fork_thread_semantics_execution_vs_memory.md new file mode 100644 index 000000000..c07ad81d3 --- /dev/null +++ b/ai/conc-anal/fork_thread_semantics_execution_vs_memory.md @@ -0,0 +1,281 @@ +# `fork()` in a multi-threaded program — execution-side vs. memory-side of the same coin + +A reference doc for readers who've encountered one of two +opposite-sounding framings of POSIX `fork()` semantics in a +multi-threaded program and are confused by the other. + +This is a sibling to +`subint_fork_blocked_by_cpython_post_fork_issue.md` — that +doc covers a CPython-level refusal of fork-from-subint; +this one covers the more general POSIX layer, since +tractor's main-thread forkserver design rests on it. + +## TL;DR + +POSIX `fork()` only preserves the *calling* thread as a +runnable thread in the child — every other thread in the +parent simply never executes another instruction in the +child. trio's docs call this "leaked"; tractor's +`_main_thread_forkserver.py` docstring calls it "gone". +Both are correct: "gone" is the *execution* side (no +scheduler entry, no instructions retired), "leaked" is the +*memory* side (the dead threads' stacks and per-thread +heap structures still ride into the child's address space +as orphaned COW pages with no owner and no cleanup hook). +Same POSIX reality, two halves of the same coin. + +## The two framings + +[python-trio/trio#1614][trio-1614] (the canonical "trio + +fork" hazards thread) puts it this way: + +> If you use `fork()` in a process with multiple threads, +> all the other thread stacks are just leaked: there's +> nothing else you can reasonably do with them. + +`tractor.spawn._main_thread_forkserver`'s module docstring +(specifically the "What survives the fork? — POSIX +semantics" section) puts it this way: + +> POSIX `fork()` only preserves the *calling* thread as a +> runnable thread in the child. Every other thread in the +> parent — trio's runner thread, any `to_thread` cache +> threads, anything else — never executes another +> instruction post-fork. + +A reader bouncing between the two can be forgiven for +asking: well, *which* is it — leaked or gone? + +The answer is "yes". 
They're describing the same POSIX +behavior from two different angles: + +- trio is talking about the **bytes** the dead threads + leave behind — stacks, TLS slots, per-thread arena + metadata — and the fact that nothing in the child can + drive them forward, free them, or even safely walk + them. That's a memory leak in the strict sense: held + but unreachable. +- tractor is talking about the **execution** side + relevant to the forkserver design: which threads + retire instructions in the child? Exactly one — the + one that called `fork()`. Everything else, regardless + of the bytes left behind, is dead in a scheduler + sense. + +Neither framing is wrong; they're just answering +different questions. + +## POSIX `fork()` in a multi-threaded program — what actually happens + +Per POSIX (and concretely on Linux glibc), the contract +of `fork()` in a multi-threaded process is: + +1. The kernel creates a new process whose virtual + address space is a COW copy of the parent's. *All* + pages map across — code, heap, every thread's stack, + every malloc arena, every mmap region. +2. Of the parent's N threads, exactly **one** is + reified in the child as a runnable kernel task: the + thread that called `fork()`. The other N-1 threads + have *no* corresponding task in the child kernel. They + were never scheduled, never `clone()`d for the child, + never exist as runnable entities. +3. Their **memory artifacts** — pthread stacks, TLS, + `pthread_t` structures, glibc per-thread arena + bookkeeping — are still mapped in the child's address + space, because (1) duplicates *everything* page-wise. + They sit there as inert COW bytes. +4. The kernel does not clean those bytes up. There is no + "phantom-thread cleanup" pass post-fork. The kernel + doesn't know which mapped pages "belonged to" which + thread — at the kernel level mappings are + process-scoped, not thread-scoped. +5. The surviving thread (the caller of `fork()`) cannot + safely access those leaked bytes either. Any state + they encoded — held mutexes, in-flight syscalls, + half-updated invariants — is frozen at whatever + instant the parent's fork-syscall observed it. Some + of those mutexes may even still be locked from the + child's POV (the canonical "fork-in-multithreaded- + program-deadlocks" hazard; see `man pthread_atfork`). + +So: from the kernel's PoV, the child has one thread. +From the address-space's PoV, the child has all the +parent's bytes — including the corpses of the N-1 dead +threads' stacks. Both true simultaneously. + +## Why trio says "leaked" + +trio's framing makes sense from the parent's +PoV, looking at *what those threads were doing*. In a +running `trio.run()` process you typically have: + +- The trio runner thread itself — owns the `selectors` + epoll fd, the signal-wakeup-fd, the run-queue. +- Threadpool worker threads (`trio.to_thread`'s cache) + — blocked in `wait()` on the threadpool's work + condvar. +- Whatever other ad-hoc threads the application + started. + +Each of those threads owns *real work-state*: epoll +registrations, file descriptors held in +soon-to-be-completed reads, half-released locks, posted +but unconsumed wakeups. After fork, that state is still +encoded in the child's memory. None of it is invalid in +a well-formed-bytes sense. It's just that: + +- The thread that was driving it is gone. +- Nothing else in the child knows the layout well + enough to take over. 
+- Even if it did, the kernel objects backing the work + (epoll fd, signalfd) have separate post-fork + semantics that don't compose with userland trio + state. + +So the bytes are *held* (they're in the child's +address space, they count against RSS, they survive +until something clobbers them), and they're +*unreachable* in any meaningful sense — no thread can +safely drive them forward. That is the textbook +definition of a leak. + +trio's quote is reminding the user that `fork()` from a +multi-threaded process is a one-way memory hazard: +whatever those threads were doing, that work-state is +now garbage you happen to still be carrying. + +## Why tractor says "gone" + +tractor's `_main_thread_forkserver` framing is concerned +with a different question: *which thread executes in the +child, and is it safe?* + +The forkserver design rests on POSIX's "calling thread +is the sole survivor" guarantee. We pick that calling +thread very deliberately: a dedicated worker that has +provably never entered trio. So the thread that *does* +run in the child is one whose locals, TLS, and stack +contain nothing trio-related. Trio's runner thread — +the one that owned the epoll fd and the run-queue — is +*gone* from the child in the execution sense. It will +never run another instruction. The fact that its stack +bytes still exist in the child's address space (the +"leaked" view) is irrelevant to the forkserver, because +nothing in the child reads or writes those pages. + +So when the docstring says "Every other thread … is +gone the instant `fork()` returns in the child", it's +being precise about the surface that matters for the +backend: scheduler-level liveness. Nothing schedules +those threads ever again. Whether their bytes are +hanging around is a separate (and, for the design, +non-load-bearing) fact. + +## Cross-table + +The same tabular layout the `_main_thread_forkserver` +docstring uses, expanded with a fourth "what handles +it" column: + +| thread | parent | child (executing) | child (memory) | what handles it | +|---------------------|-----------|-------------------|------------------------------|-----------------------------| +| forkserver worker | continues | sole survivor | live stack | runs the child's bootstrap | +| `trio.run()` thread | continues | not running | leaked stack (zombie bytes) | overwritten by child's fresh `trio.run()` | +| any other thread | continues | not running | leaked stack (zombie bytes) | overwritten / GC'd / clobbered by `exec()` if used | + +The "child (executing)" column is the *execution* side +of the coin — what tractor cares about. The "child +(memory)" column is the *memory* side — what trio +cares about. + +The "what handles it" column is the deliberate punchline +of the design: nothing has to handle the leaked bytes +*explicitly*. They get clobbered by ordinary forward +progress in the child: + +- The fresh `trio.run()` the child boots up allocates + its own stack, scheduler, and run-queue, which over + time overlaps and overwrites the inherited zombie + pages. +- Python's GC walks live objects only; the dead-thread + Python frames aren't reachable from any + `PyThreadState`, so they get freed at the next + collection cycle. +- If the child eventually `exec()`s, the entire address + space is replaced and the leak vanishes. + +## What this means for the forkserver design + +The crucial point is that **the design doesn't and +*can't* prevent the leak**. There is no userland fix +for COW thread stacks. 
The kernel hands the child a +duplicated address space; that's what `fork()` *is*. No +amount of pre-fork hookery, `pthread_atfork()` +gymnastics, or post-fork cleanup can un-COW the dead +threads' pages without unmapping them, and unmapping +arbitrary regions of a duplicated address space is +neither portable nor safe. + +What the design *does* ensure is the orthogonal +property: the survivor thread is one that doesn't need +any of that leaked state to function. Concretely: + +- Survivor is the forkserver worker thread. +- That worker has provably never imported, called into, + or held any reference to `trio`. (Enforced by keeping + the worker's lifecycle entirely in + `_main_thread_forkserver.py` and never letting trio + task-state cross into it.) +- So the leaked pages — trio runner stack, threadpool + caches, etc. — are inert relative to the survivor. + No code path in the child references them. +- The child then boots its own fresh `trio.run()`, + which allocates new state in new pages. Over the + child's lifetime the COW'd zombie pages get + overwritten, GC'd, or (if the child eventually + `exec()`s) discarded wholesale. + +The "leak" is real but inert. It costs RSS until +clobbered; it doesn't cost correctness. That's exactly +the property the forkserver pattern is built on, and +it's also why the design needs the "calling thread is +trio-free" precondition to be airtight: if the survivor +were a trio thread, it *would* try to drive the leaked +trio state, and the leak would no longer be inert. + +## See also + +- `tractor/spawn/_main_thread_forkserver.py` — module + docstring's "What survives the fork? — POSIX + semantics" section is the in-tree, code-adjacent + prose this doc expands on. The cross-table here is a + fourth-column expansion of the table there. + +- [python-trio/trio#1614][trio-1614] — the trio issue + with the "leaked" framing, and the canonical thread + for trio + `fork()` hazards more broadly. + +- [`subint_fork_blocked_by_cpython_post_fork_issue.md`](./subint_fork_blocked_by_cpython_post_fork_issue.md) + — sibling analysis covering CPython's *post-fork* + hooks (`PyOS_AfterFork_Child`, + `_PyInterpreterState_DeleteExceptMain`) and why + fork-from-non-main-subint is a CPython-level hard + refusal. Complementary axis: this doc is about POSIX + semantics; that doc is about the CPython runtime + layer that runs *after* POSIX `fork()` returns in + the child. + +- `man pthread_atfork(3)` — canonical "fork in a + multithreaded process is dangerous" reference. + Especially the rationale section, which is the + closest thing to a normative statement of "the + surviving thread cannot safely use anything the dead + threads were touching." + +- `man fork(2)` (Linux) — "Other than [the calling + thread], … no other threads are replicated …" + paragraph is the kernel-side statement of the + execution-side framing this doc opens with. 
+ +[trio-1614]: https://github.com/python-trio/trio/issues/1614 diff --git a/ai/conc-anal/infected_asyncio_under_main_thread_forkserver_hang_issue.md b/ai/conc-anal/infected_asyncio_under_main_thread_forkserver_hang_issue.md new file mode 100644 index 000000000..0a04d253c --- /dev/null +++ b/ai/conc-anal/infected_asyncio_under_main_thread_forkserver_hang_issue.md @@ -0,0 +1,378 @@ +# `infect_asyncio` × `main_thread_forkserver` Mode-A deadlock + +## Reproducer + +```bash +./py313/bin/python -m pytest \ + tests/test_infected_asyncio.py::test_aio_simple_error \ + --tpt-proto=tcp \ + --spawn-backend=main_thread_forkserver \ + -v --capture=sys +``` + +Hangs indefinitely. Mode-A signature — both processes +parked in `epoll_wait`, **neither burning CPU**. + +## Empirical observations (caught alive) + +### Outer pytest (parent) + +`py-spy dump` on the test runner pid shows the trio +event loop parked at the bottom of `trio.run()`: + +``` +Thread (idle): "MainThread" + get_events (trio/_core/_io_epoll.py:245) + self: + timeout: 86400 + run (trio/_core/_run.py:2415) + next_send: [] + timeout: 86400 + test_aio_simple_error (tests/test_infected_asyncio.py:175) +``` + +`timeout: 86400` is trio's "no scheduled work, just wait +for I/O forever" sentinel. `next_send: []` confirms +nothing is queued. The parent is stuck inside +`tractor.open_nursery(...).run_in_actor(...)` waiting +for `ipc_server.wait_for_peer(uid)` to fire — i.e. +waiting for the spawned subactor to connect back. + +### Subactor (forked child) + +`/proc//stack`: + +``` +do_epoll_wait+0x4c0/0x500 +__x64_sys_epoll_wait+0x70/0x120 +do_syscall_64+0xef/0x1540 +entry_SYSCALL_64_after_hwframe+0x77/0x7f +``` + +`strace -p -f`: + +``` +[pid ] epoll_wait(6 +[pid ] epoll_wait(3 +``` + +**Two threads**, both parked in `epoll_wait` on +distinct epoll fds. Both blocked, neither making +progress. + +### Subactor file-descriptor table + +``` +fd=0,1,2 stdio +fd=3 eventpoll [watches fd 4] +fd=4 ↔ fd=5 unix STREAM (CONNECTED) — internal pair +fd=6 eventpoll [watches fds 7, 9] +fd=7 ↔ fd=8 unix STREAM (CONNECTED) — internal pair +fd=9 ↔ fd=10 unix STREAM (CONNECTED) — internal pair +``` + +Confirmed via `ss -xp` peer-inode lookup: **all 6 unix +sockets are internal socketpairs** (peer in same pid). + +**Critical**: zero TCP/IPv4/IPv6 sockets, despite +`--tpt-proto=tcp`: + +``` +$ sudo lsof -p | grep -iE 'TCP|IPv' +(empty) +$ sudo ss -tnp | grep +(empty) +``` + +**The subactor never opened a TCP connection back to +the parent.** + +## Diagnosis + +The subactor reaches `_actor_child_main` → +`_trio_main(actor)` → +`run_as_asyncio_guest(trio_main)`. Code path +(`tractor.spawn._entry`): + +```python +if infect_asyncio: + actor._infected_aio = True + run_as_asyncio_guest(trio_main) # ← this branch +else: + trio.run(trio_main) +``` + +`run_as_asyncio_guest` (`tractor.to_asyncio`): + +```python +def run_as_asyncio_guest(trio_main, ...): + async def aio_main(trio_main): + loop = asyncio.get_running_loop() + trio_done_fute = asyncio.Future() + ... + trio.lowlevel.start_guest_run( + trio_main, + run_sync_soon_threadsafe=loop.call_soon_threadsafe, + done_callback=trio_done_callback, + ) + out = await asyncio.shield(trio_done_fute) + return out.unwrap() + ... + return asyncio.run(aio_main(trio_main)) +``` + +Expected flow: +1. `asyncio.run(aio_main(...))` — boots fresh asyncio + loop in calling thread. +2. `aio_main` calls `trio.lowlevel.start_guest_run(...)` + — initializes trio's I/O manager, schedules first + trio slice via `loop.call_soon_threadsafe`. +3. 
asyncio loop dispatches the callback → trio runs a + slice → yields back via `call_soon_threadsafe`. +4. Trio's `async_main` (the user function) runs → + `Channel.from_addr(parent_addr)` → TCP connect to + parent. + +What we observe instead: +- 2 threads in `epoll_wait` (one trio epoll, one + asyncio epoll, both inactive) +- 6 unix-socket fds (3 socketpairs: trio + wakeup-fd-pair, asyncio wakeup-fd-pair, trio kicker + socketpair) +- ZERO TCP — `Channel.from_addr` never ran + +Most likely cause: **trio's guest-run scheduling +callback didn't get dispatched by asyncio's loop in +the forked child**, so trio's `async_main` never +executes past trio bootstrap, and the +parent-IPC-connect step is never reached. + +## Fork-survival risk surface (hypothesis) + +`trio.lowlevel.start_guest_run` builds Python-level +closures + signal handlers + wakeup-fd registrations +that depend on: + +- The asyncio event loop's `call_soon_threadsafe` + thread-id matching the loop owner thread. +- Process-wide signal-wakeup-fd state + (`signal.set_wakeup_fd`). +- Trio's `KIManager` SIGINT handler. + +Under `main_thread_forkserver`, the fork happens from +a worker thread that has **never entered trio** +(intentional — trio-free launchpad). But the FORKED +child then tries to bring up BOTH asyncio AND +trio-as-guest fresh from this trio-free thread. The +asyncio loop boots fine; trio's `start_guest_run` +initializes BUT the cross-loop dispatch (asyncio +queue → trio slice) appears to silently fail to wire +up. + +Two more hypotheses worth probing: + +1. **Wakeup-fd contention**: asyncio installs + `signal.set_wakeup_fd()`. trio's + guest-run also wants a wakeup-fd. Whoever installs + second wins; the loser's `epoll_wait` no longer + wakes on signals. Combined with the `asyncio.shield( + trio_done_fute)` + `asyncio.CancelledError` + handling in `run_as_asyncio_guest`, a missed signal + delivery could explain the indefinite park. + +2. **Trio kicker socketpair race**: trio's I/O manager + uses an internal `socket.socketpair()` to "kick" + itself out of `epoll_wait` when a non-IO task needs + scheduling. In guest mode, the kicker is still + present but is supposed to be triggered via the + asyncio dispatch. If the kicker write never gets + issued by asyncio's callback, trio's epoll never + wakes. + +## Confirmed via py-spy (live capture) + +After detaching `strace` (ptrace is exclusive — that's +why `py-spy` returns EPERM if strace is attached): + +``` +Thread (idle): "main-thread-forkserver[asyncio_actor]" + select (selectors.py:452) # asyncio epoll + _run_once (asyncio/base_events.py:2012) + run_forever (asyncio/base_events.py:683) + run_until_complete (asyncio/base_events.py:712) + run (asyncio/runners.py:118) + run (asyncio/runners.py:195) + run_as_asyncio_guest (tractor/to_asyncio.py:1770) + _trio_main (tractor/spawn/_entry.py:160) + _actor_child_main (tractor/_child.py:72) + _child_target (tractor/spawn/_main_thread_forkserver.py:910) + _worker (tractor/spawn/_main_thread_forkserver.py:605) + [thread bootstrap] + +Thread (idle): "Trio thread 14" + get_events (trio/_core/_io_epoll.py:245) # trio epoll + get_events (trio/_core/_run.py:1678) + capture (outcome/_impl.py:67) + _handle_job (trio/_core/_thread_cache.py:173) + _work (trio/_core/_thread_cache.py:196) + [thread bootstrap] +``` + +This data **rewrites the diagnosis**: trio guest-run +isn't broken across the fork — it's working as designed. +The two threads ARE the canonical guest-run architecture: + +1. **Asyncio main loop** runs in the lead thread. 
Parked + in `selectors.EpollSelector.select(timeout=-1)` — + waiting indefinitely for ANY callback to be queued. +2. **Trio's I/O manager** offloads `get_events` + (`epoll_wait`) onto a `trio._core._thread_cache` + worker thread. The worker calls + `outcome.capture(get_events)` and parks in + `epoll_wait(timeout=86400)`. +3. When trio I/O fires (or its kicker socketpair gets a + write), the worker returns from `epoll_wait`, + delivers the result via `_handle_job`'s `deliver` + callback, which schedules the next trio slice on + asyncio via `loop.call_soon_threadsafe`. + +The fact that the trio thread is *already* in +`_thread_cache._handle_job` doing `capture(get_events)` +means **trio's scheduler HAS started** — the bridge +asyncio↔trio is wired correctly post-fork. + +So `async_main` DID run far enough to register some +trio task that's now awaiting I/O. The question +becomes: **what is `async_main` waiting on?** + +Process state confirms it's NOT waiting on the TCP +connect to parent: + +``` +$ sudo lsof -p | grep -iE 'TCP|IPv' +(empty) +$ sudo ss -tnp | grep +(empty) +``` + +`Channel.from_addr(parent_addr)` — the very first +thing `async_main` does — was never reached, OR was +reached but errored before `socket()` was called. The +parent (running `ipc_server.wait_for_peer`) waits +forever for the connection; it never comes. + +## Refined hypothesis + +`async_main` is stalled in some PRE-`Channel.from_addr` +checkpoint. Candidates: + +1. **`get_console_log` / logger init** — called early in + `_trio_main` if `actor.loglevel is not None`. Logging + setup involves file/handler init that could block on + something fork-inherited (e.g. a stale lock). +2. **`debug.maybe_init_greenback`** — `start_guest_run` + includes a check (`if debug_mode(): assert 0` — + currently asserts unsupported). For non-debug mode + this is bypassed but related machinery may run. +3. **Stackscope SIGUSR1 handler install** — gated on + `_debug_mode` OR `TRACTOR_ENABLE_STACKSCOPE` env-var. + The `enable_stack_on_sig()` path captures a trio + token via `trio.lowlevel.current_trio_token()` — + could block under guest mode. +4. **Initial `await trio.sleep(0)` / first checkpoint** + in `async_main` before reaching the + `Channel.from_addr` line. Under guest mode, if the + FIRST `call_soon_threadsafe` callback never gets + processed by asyncio, trio's first slice never + completes — but the worker thread WOULD still be in + `epoll_wait` having been started by trio's I/O + manager init. + +## Confirming `async_main`'s parked location + +Add temporary logging at the top of `Actor.async_main`: + +```python +# tractor/runtime/_runtime.py around line 855 +async def async_main(self, parent_addr=None): + log.devx('async_main: ENTERED') # marker A + try: + log.devx('async_main: pre-Channel.from_addr') # marker B + chan = await Channel.from_addr( + addr=wrap_address(parent_addr) + ) + log.devx('async_main: post-Channel.from_addr') # marker C + ... +``` + +Re-run the test with `--ll=devx`. The last marker logged +tells us exactly where `async_main` parked. If only A +fires, the issue is between A and B (logger init, +stackscope, etc.). If A and B fire but not C, it's in +`Channel.from_addr` (DNS, socket creation, connect). 
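If marker B fires but C never does, the same call site can be bounded so the silent park surfaces as a log line instead of an indefinite hang (a sketch extending the snippet above, same names and assumptions):

```python
# bound the suspect await; a park now becomes a visible timeout
with trio.move_on_after(3) as cs:
    chan = await Channel.from_addr(
        addr=wrap_address(parent_addr)
    )
if cs.cancelled_caught:
    log.devx('async_main: Channel.from_addr did not complete in 3s')
```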
+ +## Related sibling bug + +`tests/test_multi_program.py::test_register_duplicate_name` +hangs under the same backend with a DIFFERENT +fingerprint: + +- Subactor at 100% CPU (busy-loop), not parked +- `recvfrom(6, "", 65536, 0, NULL, NULL) = 0` repeating + with no `epoll_wait` in between +- fd=6 is one of trio's internal AF_UNIX + socketpair fds (the kicker mechanism) + +Distinct root cause — possibly trio's kicker socketpair +inheriting a half-closed state across the fork — but +shares the broader theme: **trio internal-state +initialization isn't fully fork-safe under +`main_thread_forkserver`** for the more exotic +dispatch paths. + +## Workarounds (until fix lands) + +1. **Skip-mark on the fork backend** — temporarily mark + `tests/test_infected_asyncio.py` with + `pytest.mark.skipon_spawn_backend('main_thread_forkserver', + reason='infect_asyncio + fork interaction broken, + see ai/conc-anal/infected_asyncio_under_main_thread_forkserver_hang_issue.md')`. + Lets the rest of the test suite run green while + this is being fixed properly. + +2. **Run infected-asyncio tests under the `trio` + backend only** — they don't exercise fork + semantics, so they won't hit this bug. + +## Investigation next steps + +In rough priority: + +1. Catch the hang alive again, **detach strace**, + `py-spy --locals` the subactor — confirm trio + thread is NOT yet at `async_main`. +2. Diff `start_guest_run` setup pre-fork vs post-fork + by adding `log.devx()` markers in + `tractor.to_asyncio.run_as_asyncio_guest::aio_main` + at: + - asyncio loop bringup + - immediately before `start_guest_run` + - immediately after `start_guest_run` + - inside the `trio_done_callback` registration +3. Check whether the asyncio loop dispatches ANY + callbacks in the forked child — instrument + `loop.call_soon_threadsafe` (e.g. monkey-patch + `loop._call_soon` to log). +4. If steps 1–3 confirm that asyncio's queue is + stuck, look at whether the asyncio event-loop + policy or selector is being inherited from a + pre-fork (parent-process) state in a way that + breaks the new loop. + +## See also + +- [#379](https://github.com/goodboy/tractor/issues/379) — subint umbrella +- [#451](https://github.com/goodboy/tractor/issues/451) — Mode-A cancel-cascade hang +- `ai/conc-anal/fork_thread_semantics_execution_vs_memory.md` +- `ai/conc-anal/subint_forkserver_test_cancellation_leak_issue.md` +- python-trio/trio#1614 — trio + fork hazards diff --git a/ai/conc-anal/subint_fork_from_main_thread_smoketest.py b/ai/conc-anal/subint_fork_from_main_thread_smoketest.py new file mode 100644 index 000000000..08166eac8 --- /dev/null +++ b/ai/conc-anal/subint_fork_from_main_thread_smoketest.py @@ -0,0 +1,375 @@ +#!/usr/bin/env python3 +''' +Standalone CPython-level feasibility check for the "main-interp +worker-thread forkserver + subint-hosted trio" architecture +proposed as a workaround to the CPython-level refusal +documented in +`ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md`. + +Purpose +------- +Deliberately NOT a `tractor` test. Zero `tractor` imports. +Uses `_interpreters` (private stdlib) + `os.fork()` directly so +the signal is unambiguous — pass/fail here is a property of +CPython alone, independent of our runtime. + +Run each scenario in isolation; the child's fate is observable +only via `os.waitpid()` of the parent and the scenario's own +status prints. 
+ +Scenarios (pick one with `--scenario `) +--------------------------------------------- + +- `control_subint_thread_fork` — the KNOWN-BROKEN case we + documented in `subint_fork_blocked_by_cpython_post_fork_issue.md`: + drive a subint from a thread, call `os.fork()` inside its + `_interpreters.exec()`, watch the child abort. **Included as + a control** — if this scenario DOESN'T abort the child, our + analysis is wrong and we should re-check everything. + +- `main_thread_fork` — baseline sanity. Call `os.fork()` from + the process's main thread. Must always succeed; if this + fails something much bigger is broken. + +- `worker_thread_fork` — the architectural assertion. Spawn a + regular `threading.Thread` (attached to main interp, NOT a + subint), have IT call `os.fork()`. Child should survive + post-fork cleanup. + +- `full_architecture` — end-to-end: main-interp worker thread + forks. In the child, fork-thread (still main-interp) creates + a subint, drives a second worker thread inside it that runs + a trivial `trio.run()`. Validates the "root runtime lives in + a subint in the child" piece of the proposed arch. + +All scenarios print a self-contained pass/fail banner. Exit +code 0 on expected outcome (which for `control_*` means "child +aborted", not "child succeeded"!). + +Requires Python 3.14+. + +Usage +----- +:: + + python subint_fork_from_main_thread_smoketest.py \\ + --scenario main_thread_fork + + python subint_fork_from_main_thread_smoketest.py \\ + --scenario full_architecture + +''' +from __future__ import annotations +import argparse +import os +import sys +import threading +import time + + +# Hard-require py3.14 for the public `concurrent.interpreters` +# API (we still drop to `_interpreters` internally, same as +# `tractor.spawn._subint`). +try: + from concurrent import interpreters as _public_interpreters # noqa: F401 + import _interpreters # type: ignore +except ImportError: + print( + 'FAIL (setup): requires Python 3.14+ ' + '(missing `concurrent.interpreters`)', + file=sys.stderr, + ) + sys.exit(2) + + +# The actual primitives this script exercises live in +# `tractor.spawn._subint_forkserver` — we re-import them here +# rather than inlining so the module and the validation stay +# in sync. (Early versions of this file had them inline for +# the "zero tractor imports" isolation guarantee; now that +# CPython-level feasibility is confirmed, the validated +# primitives have moved into tractor proper.) 
+from tractor.spawn._main_thread_forkserver import ( + fork_from_worker_thread, + wait_child, +) +from tractor.spawn._subint_forkserver import ( + run_subint_in_worker_thread, +) + + +# ---------------------------------------------------------------- +# small observability helpers (test-harness only) +# ---------------------------------------------------------------- + + +def _banner(title: str) -> None: + line = '=' * 60 + print(f'\n{line}\n{title}\n{line}', flush=True) + + +def _report( + label: str, + *, + ok: bool, + status_str: str, + expect_exit_ok: bool, +) -> None: + verdict: str = 'PASS' if ok else 'FAIL' + expected_str: str = ( + 'normal exit (rc=0)' + if expect_exit_ok + else 'abnormal death (signal or nonzero exit)' + ) + print( + f'[{verdict}] {label}: ' + f'expected {expected_str}; observed {status_str}', + flush=True, + ) + + +# ---------------------------------------------------------------- +# scenario: `control_subint_thread_fork` (known-broken) +# ---------------------------------------------------------------- + + +def scenario_control_subint_thread_fork() -> int: + _banner( + '[control] fork from INSIDE a subint (expected: child aborts)' + ) + interp_id = _interpreters.create('legacy') + print(f' created subint {interp_id}', flush=True) + + # Shared flag: child writes a sentinel file we can detect from + # the parent. If the child manages to write this, CPython's + # post-fork refusal is NOT happening → analysis is wrong. + sentinel = '/tmp/subint_fork_smoketest_control_child_ran' + try: + os.unlink(sentinel) + except FileNotFoundError: + pass + + bootstrap = ( + 'import os\n' + 'pid = os.fork()\n' + 'if pid == 0:\n' + # child — if CPython's refusal fires this code never runs + f' with open({sentinel!r}, "w") as f:\n' + ' f.write("ran")\n' + ' os._exit(0)\n' + 'else:\n' + # parent side (inside the launchpad subint) — stash the + # forked PID on a shareable dict so we can waitpid() + # from the outer main interp. We can't just return it; + # _interpreters.exec() returns nothing useful. + ' import builtins\n' + ' builtins._forked_child_pid = pid\n' + ) + + # NOTE, we can't easily pull state back from the subint. + # For the CONTROL scenario we just time-bound the fork + + # check the sentinel. If sentinel exists → child ran → + # analysis wrong. If not → child aborted → analysis + # confirmed. + done = threading.Event() + + def _drive() -> None: + try: + _interpreters.exec(interp_id, bootstrap) + except Exception as err: + print( + f' subint bootstrap raised (expected on some ' + f'CPython versions): {type(err).__name__}: {err}', + flush=True, + ) + finally: + done.set() + + t = threading.Thread( + target=_drive, + name='control-subint-fork-launchpad', + daemon=True, + ) + t.start() + done.wait(timeout=5.0) + t.join(timeout=2.0) + + # Give the (possibly-aborted) child a moment to die. + time.sleep(0.5) + + sentinel_present = os.path.exists(sentinel) + verdict = ( + # "PASS" for our analysis means sentinel NOT present. 
+ 'PASS' if not sentinel_present else 'FAIL (UNEXPECTED)' + ) + print( + f'[{verdict}] control: sentinel present={sentinel_present} ' + f'(analysis predicts False — child should abort before ' + f'writing)', + flush=True, + ) + if sentinel_present: + os.unlink(sentinel) + + try: + _interpreters.destroy(interp_id) + except _interpreters.InterpreterError: + pass + + return 0 if not sentinel_present else 1 + + +# ---------------------------------------------------------------- +# scenario: `main_thread_fork` (baseline sanity) +# ---------------------------------------------------------------- + + +def scenario_main_thread_fork() -> int: + _banner( + '[baseline] fork from MAIN thread (expected: child exits normally)' + ) + + pid = os.fork() + if pid == 0: + os._exit(0) + + return 0 if _wait_child( + pid, + label='main_thread_fork', + expect_exit_ok=True, + ) else 1 + + +# ---------------------------------------------------------------- +# scenario: `worker_thread_fork` (architectural assertion) +# ---------------------------------------------------------------- + + +def _run_worker_thread_fork_scenario( + label: str, + *, + child_target=None, +) -> int: + ''' + Thin wrapper: delegate the actual fork to the + `tractor.spawn._subint_forkserver` primitive, then wait + on the child and render a pass/fail banner. + + ''' + try: + pid: int = fork_from_worker_thread( + child_target=child_target, + thread_name=f'worker-fork-thread[{label}]', + ) + except RuntimeError as err: + print(f'[FAIL] {label}: {err}', flush=True) + return 1 + print(f' forked child pid={pid}', flush=True) + ok, status_str = wait_child(pid, expect_exit_ok=True) + _report( + label, + ok=ok, + status_str=status_str, + expect_exit_ok=True, + ) + return 0 if ok else 1 + + +def scenario_worker_thread_fork() -> int: + _banner( + '[arch] fork from MAIN-INTERP WORKER thread ' + '(expected: child exits normally — this is the one ' + 'that matters)' + ) + return _run_worker_thread_fork_scenario( + 'worker_thread_fork', + ) + + +# ---------------------------------------------------------------- +# scenario: `full_architecture` +# ---------------------------------------------------------------- + + +_CHILD_TRIO_BOOTSTRAP: str = ( + 'import trio\n' + 'async def _main():\n' + ' await trio.sleep(0.05)\n' + ' return 42\n' + 'result = trio.run(_main)\n' + 'assert result == 42, f"trio.run returned {result}"\n' + 'print(" CHILD subint: trio.run OK, result=42", ' + 'flush=True)\n' +) + + +def _child_trio_in_subint() -> int: + ''' + CHILD-side `child_target`: drive a trivial `trio.run()` + inside a fresh legacy-config subint on a worker thread, + using the `tractor.spawn._subint_forkserver.run_subint_in_worker_thread` + primitive. Returns 0 on success. 
+ + ''' + try: + run_subint_in_worker_thread( + _CHILD_TRIO_BOOTSTRAP, + thread_name='child-subint-trio-thread', + ) + except RuntimeError as err: + print( + f' CHILD: run_subint_in_worker_thread timed out / thread ' + f'never returned: {err}', + flush=True, + ) + return 3 + except BaseException as err: + print( + f' CHILD: subint bootstrap raised: ' + f'{type(err).__name__}: {err}', + flush=True, + ) + return 4 + return 0 + + +def scenario_full_architecture() -> int: + _banner( + '[arch-full] worker-thread fork + child runs trio in a ' + 'subint (end-to-end proposed arch)' + ) + return _run_worker_thread_fork_scenario( + 'full_architecture', + child_target=_child_trio_in_subint, + ) + + +# ---------------------------------------------------------------- +# main +# ---------------------------------------------------------------- + + +SCENARIOS: dict[str, Callable[[], int]] = { + 'control_subint_thread_fork': scenario_control_subint_thread_fork, + 'main_thread_fork': scenario_main_thread_fork, + 'worker_thread_fork': scenario_worker_thread_fork, + 'full_architecture': scenario_full_architecture, +} + + +def main() -> int: + ap = argparse.ArgumentParser( + description=__doc__, + formatter_class=argparse.RawDescriptionHelpFormatter, + ) + ap.add_argument( + '--scenario', + choices=sorted(SCENARIOS.keys()), + required=True, + ) + args = ap.parse_args() + return SCENARIOS[args.scenario]() + + +if __name__ == '__main__': + sys.exit(main()) diff --git a/ai/conc-anal/subint_forkserver_mp_shared_memory_issue.md b/ai/conc-anal/subint_forkserver_mp_shared_memory_issue.md new file mode 100644 index 000000000..07214dad0 --- /dev/null +++ b/ai/conc-anal/subint_forkserver_mp_shared_memory_issue.md @@ -0,0 +1,187 @@ +# `subint_forkserver` × `multiprocessing.SharedMemory`: fork-inherited `resource_tracker` fd + +Surfaced by `tests/test_shm.py` under +`--spawn-backend=subint_forkserver`. Two distinct +failure modes, one root cause: +**`multiprocessing.resource_tracker` is fork-without-exec +unsafe** (canonical CPython class — bpo-38119, bpo-45209). + +**Status: resolved by `tractor/ipc/_mp_bs.py` + +`tractor/ipc/_shm.py` changes (see "Resolution" below). +This doc kept as the +post-mortem / decision record.** + +## TL;DR + +`mp.shared_memory.SharedMemory` registers each shm +allocation with the per-process +`multiprocessing.resource_tracker` singleton. The +tracker is a daemon process started lazily; the +parent owns a unix-pipe-fd to it. When the parent +forks-without-execing into a `subint_forkserver` +child, the child inherits that fd — but it refers to +the *parent's* tracker, which the child has no +business writing to. + +Two manifestations under the original (pre-fix) code: + +1. **`test_child_attaches_alot`** — child loops 1000× + `attach_shm_list()`. First `mp.SharedMemory` call + in the child triggers + `resource_tracker._ensure_running_and_write` → + `_teardown_dead_process` → `os.close(self._fd)` on + an fd the child should never have touched. Surfaces + as `OSError: [Errno 9] Bad file descriptor` + wrapped in `tractor.RemoteActorError`. + +2. **`test_parent_writer_child_reader[*]`** — first + parametrize variant "passes" (with + `resource_tracker: leaked shared_memory` warning) + because nobody ever cleans up `/shm_list`. + Subsequent variants then fail with + `FileExistsError: '/shm_list'` because the leak + persists across the parametrize loop and forkserver + children can't `shm_open(create=True)` an existing + key. 
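Manifestation 2's collision shape reproduces with the stdlib alone (illustrative only, outside tractor; the segment name here is made up):

```python
from multiprocessing.shared_memory import ShareableList

# first "variant" creates the segment and never unlinks it ...
first = ShareableList([0] * 8, name='demo_shm_list')
first.shm.close()  # close() unmaps but does NOT remove /dev/shm/demo_shm_list

# ... so the next create-with-the-same-key fails, just like the later
# parametrize variants do once the first one leaks `/shm_list`.
try:
    ShareableList([0] * 8, name='demo_shm_list')
except FileExistsError:
    print('stale segment from the previous variant still present')

first.shm.unlink()  # manual cleanup — the point is nothing does this for you
```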
+ +Trio backend (`mp_spawn`-style) doesn't surface this: +each subactor `exec`s a fresh interpreter → +independent resource tracker per subactor → no +inherited-fd issue, and the test's pre-existing leak +gets masked by the per-process tracker reset. + +Under `subint_forkserver`, the child is `os.fork()`'d +from a worker thread (no `exec`) → inherits parent's +`mp.resource_tracker._resource_tracker._fd` → EBADF +/ cross-talk on first `mp.SharedMemory` op. + +## Resolution + +We side-step the broken upstream machinery entirely +rather than try to make it fork-safe. Two-part fix +landed (commits to follow this doc): + +### 1. `tractor/ipc/_mp_bs.py::disable_mantracker()` + — unconditional disable + +The previous "3.13+ short-circuit" path used +`partial(SharedMemory, track=False)` to opt-out of +registration on 3.13+. The `track=False` switch is +necessary but not sufficient under fork: the +inherited tracker fd can still be touched indirectly +(e.g. through `_ensure_running_and_write`'s +self-check path). + +The fix takes both belts AND suspenders: + +- **Always** monkey-patch + `mp.resource_tracker._resource_tracker` to a + no-op `ManTracker` subclass whose + `register`/`unregister`/`ensure_running` are all + empty. +- **Always** wrap `SharedMemory` with + `track=False`. + +Result: the inherited tracker fd in the fork child +is still inherited (fd is a kernel object; we can't +un-inherit it across fork) but **nothing in the +shm code path will ever try to use it** — both the +tracker singleton and the per-allocation registration +are short-circuited. + +### 2. `tractor/ipc/_shm.py::open_shm_list()` + — own the cleanup + +Without `mp.resource_tracker`, nobody else will +unlink leaked segments at process exit. tractor +already controls actor lifecycle, so we register +unlink on the actor's lifetime stack: + +```python +def try_unlink(): + try: + shml.shm.unlink() + except FileNotFoundError as fne: + log.exception(...) # benign sibling-already-cleaned race + +actor.lifetime_stack.callback(try_unlink) +``` + +The `FileNotFoundError` swallow handles the case +where a sibling actor already unlinked the same +segment (legitimate race in shared-key setups). + +## Why this is the right call + +- **mp's tracker is widely criticized.** The + in-tree comment "non-SC madness" predates this + fix and matches CPython upstream's own discomfort + (e.g. the per-context tracker design rework + discussions in bpo-43475). +- **tractor already owns process lifecycle.** We + have `actor.lifetime_stack`, `Portal.cancel_actor`, + and the IPC cancel cascade. Adding mp's tracker + on top buys nothing we can't do better ourselves. +- **Backend-uniform.** No special-casing per spawn + backend. trio (`mp_spawn`-style), `subint_forkserver`, + and the future `subint` all behave identically + — register-time no-op, exit-time unlink-via- + lifetime-stack. + +## Trade-offs / known gaps + +- **Crash-leaked segments.** If an actor segfaults + or is `SIGKILL`'d before its lifetime stack runs, + `/dev/shm/` will leak. Mitigation: + `scripts/tractor-reap --shm` walks `/dev/shm`, + filters to segments owned by the current uid that + no live process is mapping or holding open (via + `/proc/*/maps` + `/proc/*/fd/*`), and unlinks + them. The "nobody-has-it-open" filter is + kernel-canonical so it never touches in-flight + segments held by sibling apps (verified locally + against 81 piker/lttng/aja-held segments — all + preserved). 
+ - Higher-level apps using shm should still pin a + UUID into the key (the `'shml_'` pattern + in `test_child_attaches_alot`) so concurrent + sessions don't collide on the same key. +- **Cross-actor unlink races.** Two actors holding + the same shm key racing on `unlink()` — handled + by the `FileNotFoundError` swallow. +- **Crashes won't show up in mp's leak warning.** + We've turned off `resource_tracker`, so the usual + `resource_tracker: There appear to be N leaked + shared_memory objects to clean up at shutdown` + warning is gone too. If we ever want it back as + a crash-detection signal, we'd need our own + equivalent (walk the actor's `_shm_list_keys` set + at root teardown, log any unfreed). + +## Verification + +```sh +# fixed under both backends: +./py314/bin/python -m pytest tests/test_shm.py \ + --spawn-backend=subint_forkserver +# 7 passed + +./py314/bin/python -m pytest tests/test_shm.py \ + --spawn-backend=trio +# 7 passed (regression check) +``` + +## References + +- CPython upstream issues: + - https://bugs.python.org/issue38119 (fork + + resource_tracker fd inheritance) + - https://bugs.python.org/issue45209 + (SharedMemory + resource_tracker) + - https://bugs.python.org/issue43475 + (per-context tracker rework discussion) +- Long-term alternative: migrate off + `multiprocessing.shared_memory` entirely to + `posix_ipc` (no tracker) or finish the + `hotbaud`-based ringbuf transport. Not blocked on + this fix — both are independently tracked. diff --git a/ai/conc-anal/subint_forkserver_orphan_sigint_hang_issue.md b/ai/conc-anal/subint_forkserver_orphan_sigint_hang_issue.md new file mode 100644 index 000000000..50c8a4c65 --- /dev/null +++ b/ai/conc-anal/subint_forkserver_orphan_sigint_hang_issue.md @@ -0,0 +1,385 @@ +# `subint_forkserver` backend: orphaned-subactor SIGINT wedged in `epoll_wait` + +Follow-up to the Phase C `subint_forkserver` spawn-backend +PR (see `tractor.spawn._subint_forkserver`, issue #379). +Surfaced by the xfail'd +`tests/spawn/test_subint_forkserver.py::test_orphaned_subactor_sigint_cleanup_DRAFT`. + +Related-but-distinct from +`subint_cancel_delivery_hang_issue.md` (orphaned-channel +park AFTER subint teardown) and +`subint_sigint_starvation_issue.md` (GIL-starvation, +SIGINT never delivered): here the SIGINT IS delivered, +trio's handler IS installed, but trio's event loop never +wakes — so the KBI-at-checkpoint → `_trio_main` catch path +(which is the runtime's *intentional* OS-cancel design) +never fires. + +## TL;DR + +When a `subint_forkserver`-spawned subactor is orphaned +(parent `SIGKILL`'d, no IPC cancel path available) and then +externally `SIGINT`'d, the subactor hangs in +`trio/_core/_io_epoll.py::get_events` (epoll_wait) +indefinitely — even though: + +1. `threading.current_thread() is threading.main_thread()` + post-fork (CPython 3.14 re-designates correctly). +2. Trio's SIGINT handler IS installed in the subactor + (`signal.getsignal(SIGINT)` returns + `.handler at 0x...>`). +3. The kernel does deliver SIGINT — the signal arrives at + the only thread in the process (the fork-inherited + worker which IS now "main" per Python). + +Yet `epoll_wait` does not return. Trio's wakeup-fd mechanism +— the machinery that turns SIGINT into an epoll-wake — is +somehow not firing the wakeup. Until that's fixed, the +intentional "KBI-as-OS-cancel" path in +`tractor/spawn/_entry.py::_trio_main:164` is unreachable +for forkserver-spawned subactors whose parent dies. 
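+For reference, the "wakeup-fd mechanism" named above can
+be demonstrated without trio at all. A minimal sketch
+(Linux-only, plain CPython — no trio/tractor) of the
+signal → wakeup-fd → epoll-wake pipeline whose final hop
+is the one that isn't firing in the stuck subactor:
+
+```python
+import os
+import select
+import signal
+
+# a Python-level handler must be installed for the pipeline to engage
+signal.signal(signal.SIGINT, lambda signum, frame: None)
+
+r, w = os.pipe()
+os.set_blocking(w, False)           # set_wakeup_fd requires non-blocking
+signal.set_wakeup_fd(w)             # signal delivery writes one byte here
+
+ep = select.epoll()
+ep.register(r, select.EPOLLIN)
+
+signal.raise_signal(signal.SIGINT)  # stand-in for the external `kill -INT`
+events = ep.poll(5.0)               # wakes immediately: the byte is readable
+assert events, 'wakeup-fd never woke the poller'
+print('epoll woke on SIGINT via wakeup-fd:', os.read(r, 1))
+```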
+ +## Symptom + +Test: `tests/spawn/test_subint_forkserver.py::test_orphaned_subactor_sigint_cleanup_DRAFT` +(currently marked `@pytest.mark.xfail(strict=True)`). + +1. Harness subprocess brings up a tractor root actor + + one `run_in_actor(_sleep_forever)` subactor via + `try_set_start_method('subint_forkserver')`. +2. Harness prints `CHILD_PID` (subactor) and + `PARENT_READY` (root actor) markers to stdout. +3. Test `os.kill(parent_pid, SIGKILL)` + `proc.wait()` + to fully reap the root-actor harness. +4. Child (now reparented to pid 1) is still alive. +5. Test `os.kill(child_pid, SIGINT)` and polls + `os.kill(child_pid, 0)` for up to 10s. +6. **Observed**: the child is still alive at deadline — + SIGINT did not unwedge the trio loop. + +## What the "intentional" cancel path IS + +`tractor/spawn/_entry.py::_trio_main:157-186` — + +```python +try: + if infect_asyncio: + actor._infected_aio = True + run_as_asyncio_guest(trio_main) + else: + trio.run(trio_main) + +except KeyboardInterrupt: + logmeth = log.cancel + exit_status: str = ( + 'Actor received KBI (aka an OS-cancel)\n' + ... + ) +``` + +The "KBI == OS-cancel" mapping IS the runtime's +deliberate, documented design. An OS-level SIGINT should +flow as: kernel → trio handler → KBI at trio checkpoint +→ unwinds `async_main` → surfaces at `_trio_main`'s +`except KeyboardInterrupt:` → `log.cancel` + clean `rc=0`. + +**So fixing this hang is not "add a new SIGINT behavior" — +it's "make the existing designed behavior actually fire in +this backend config".** That's why option (B) ("fix root +cause") is aligned with existing design intent, not a +scope expansion. + +## Evidence + +### Positive control: standalone fork-from-worker + `trio.run(sleep_forever)` + SIGINT WORKS + +```python +import os, signal, time, trio +from tractor.spawn._subint_forkserver import ( + fork_from_worker_thread, wait_child, +) + +def child_target() -> int: + async def _main(): + try: + await trio.sleep_forever() + except KeyboardInterrupt: + print('CHILD: caught KBI — trio SIGINT works!') + return + trio.run(_main) + return 0 + +pid = fork_from_worker_thread(child_target, thread_name='trio-sigint-test') +time.sleep(1.0) +os.kill(pid, signal.SIGINT) +wait_child(pid) +``` + +Result: `CHILD: caught KBI — trio SIGINT works!` + clean +exit. So the fork-child + trio signal plumbing IS healthy +in isolation. The hang appears only with the full tractor +subactor runtime on top. + +### Negative test: full tractor subactor + orphan-SIGINT + +Equivalent to the xfail test. Traceback dump via +`faulthandler.register(SIGUSR1, all_threads=True)` at the +stuck moment: + +``` +Current thread 0x00007... [subint-forkserv] (most recent call first): + File ".../trio/_core/_io_epoll.py", line 245 in get_events + File ".../trio/_core/_run.py", line 2415 in run + File "tractor/spawn/_entry.py", line 162 in _trio_main + File "tractor/_child.py", line 72 in _actor_child_main + File "tractor/spawn/_subint_forkserver.py", line 650 in _child_target + File "tractor/spawn/_subint_forkserver.py", line 308 in _worker + File ".../threading.py", line 1024 in run +``` + +### Thread + signal-mask inventory of the stuck subactor + +Single thread (`tid == pid`, comm `'subint-forkserv'`, +which IS `threading.main_thread()` post-fork): + +``` +SigBlk: 0000000000000000 # nothing blocked +SigIgn: 0000000001001000 # SIGPIPE etc (Python defaults) +SigCgt: 0000000108000202 # bit 1 = SIGINT caught +``` + +Bit 1 set in `SigCgt` → SIGINT handler IS installed. 
So +trio's handler IS in place at the kernel level — not a +"handler missing" situation. + +### Handler identity + +Inside the subactor's RPC body, `signal.getsignal(SIGINT)` +returns `.handler at +0x...>` — trio's own `KIManager` handler. tractor's only +SIGINT touches are `signal.getsignal()` *reads* (to stash +into `debug.DebugStatus._trio_handler`); nothing writes +over trio's handler outside the debug-REPL shielding path +(`devx/debug/_tty_lock.py::shield_sigint`) which isn't +engaged here (no debug_mode). + +## Ruled out + +- **GIL starvation / signal-pipe-full** (class A, + `subint_sigint_starvation_issue.md`): subactor runs on + its own GIL (separate OS process), not sharing with the + parent → no cross-process GIL contention. And `strace`- + equivalent in the signal mask shows SIGINT IS caught, + not queued. +- **Orphaned channel park** (`subint_cancel_delivery_hang_issue.md`): + different failure mode — that one has trio iterating + normally and getting wedged on an orphaned + `chan.recv()` AFTER teardown. Here trio's event loop + itself never wakes. +- **Tractor explicitly catching + swallowing KBI**: + greppable — the one `except KeyboardInterrupt:` in the + runtime is the INTENTIONAL cancel-path catch at + `_trio_main:164`. `async_main` uses `except Exception` + (not BaseException), so KBI should propagate through + cleanly if it ever fires. +- **Missing `signal.set_wakeup_fd` (main-thread + restriction)**: post-fork, the fork-worker thread IS + `threading.main_thread()`, so trio's main-thread check + passes and its wakeup-fd install should succeed. + +## Root cause hypothesis (unverified) + +The SIGINT handler fires but trio's wakeup-fd write does +not wake `epoll_wait`. Candidate causes, ranked by +plausibility: + +1. **Wakeup-fd lifecycle race around tractor IPC setup.** + `async_main` spins up an IPC server + `process_messages` + loops early. Somewhere in that path the wakeup-fd that + trio registered with its epoll instance may be + closed/replaced/clobbered, so subsequent SIGINT writes + land on an fd that's no longer in the epoll set. + Evidence needed: compare + `signal.set_wakeup_fd(-1)` return value inside a + post-tractor-bringup RPC body vs. a pre-bringup + equivalent. If they differ, that's it. +2. **Shielded cancel scope around `process_messages`.** + The RPC message loop is likely wrapped in a trio cancel + scope; if that scope is `shield=True` at any outer + layer, KBI scheduled at a checkpoint could be absorbed + by the shield and never bubble out to `_trio_main`. +3. **Pre-fork wakeup-fd inheritance.** trio in the PARENT + process registered a wakeup-fd with its own epoll. The + child inherits the fd number but not the parent's + epoll instance — if tractor/trio re-uses the parent's + stale fd number anywhere, writes would go to a no-op + fd. (This is the least likely — `trio.run()` on the + child calls `KIManager.install` which should install a + fresh wakeup-fd from scratch.) + +## Cross-backend scope question + +**Untested**: does the same orphan-SIGINT hang reproduce +against the `trio_proc` backend (stock subprocess + exec)? +If yes → pre-existing tractor bug, independent of +`subint_forkserver`. If no → something specific to the +fork-from-worker path (e.g. inherited fds, mid-epoll-setup +interference). 
+ +**Quick repro for trio_proc**: + +```python +# save as /tmp/trio_proc_orphan_sigint_repro.py +import os, sys, signal, time, glob +import subprocess as sp + +SCRIPT = ''' +import os, sys, trio, tractor +async def _sleep_forever(): + print(f"CHILD_PID={os.getpid()}", flush=True) + await trio.sleep_forever() + +async def _main(): + async with ( + tractor.open_root_actor(registry_addrs=[("127.0.0.1", 12350)]), + tractor.open_nursery() as an, + ): + await an.run_in_actor(_sleep_forever, name="sf-child") + print(f"PARENT_READY={os.getpid()}", flush=True) + await trio.sleep_forever() + +trio.run(_main) +''' + +proc = sp.Popen( + [sys.executable, '-c', SCRIPT], + stdout=sp.PIPE, stderr=sp.STDOUT, +) +# parse CHILD_PID + PARENT_READY off proc.stdout ... +# SIGKILL parent, SIGINT child, poll. +``` + +If that hangs too, open a broader issue; if not, this is +`subint_forkserver`-specific (likely fd-inheritance-related). + +## Why this is ours to fix (not CPython's) + +- Signal IS delivered (`SigCgt` bitmask confirms). +- Handler IS installed (trio's `KIManager`). +- Thread identity is correct post-fork. +- `_trio_main` already has the intentional KBI→clean-exit + path waiting to fire. + +Every CPython-level precondition is met. Something in +tractor's runtime or trio's integration with it is +breaking the SIGINT→wakeup→event-loop-wake pipeline. + +## Possible fix directions + +1. **Audit the wakeup-fd across tractor's IPC bringup.** + Add a trio startup hook that captures + `signal.set_wakeup_fd(-1)` at `_trio_main` entry, + after `async_main` enters, and periodically — assert + it's unchanged. If it moves, track down the writer. +2. **Explicit `signal.set_wakeup_fd` reset after IPC + setup.** Brute force: re-install a fresh wakeup-fd + mid-bringup. Band-aid, but fast to try. +3. **Ensure no `shield=True` cancel scope envelopes the + RPC-message-loop / IPC-server task.** If one does, + KBI-at-checkpoint never escapes. +4. **Once fixed, the `child_sigint='trio'` mode on + `subint_forkserver_proc`** becomes effectively a no-op + or a doc-only mode — trio's natural handler already + does the right thing. Might end up removing the flag + entirely if there's no behavioral difference between + modes. + +## Current workaround + +None; `child_sigint` defaults to `'ipc'` (IPC cancel is +the only reliable cancel path today), and the xfail test +documents the gap. Operators hitting orphan-SIGINT get a +hung process that needs `SIGKILL`. 
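+A minimal probe sketch for fix direction (1) above
+(`assert_wakeup_fd_unchanged` is a hypothetical helper,
+not in-tree). Since `signal.set_wakeup_fd()` returns the
+previously installed fd, a read-then-restore snapshot can
+be dropped at `_trio_main` entry, after IPC bringup, and
+periodically in between (main thread only):
+
+```python
+import signal
+
+_baseline: int | None = None
+
+def _current_wakeup_fd() -> int:
+    # read (momentarily clearing) the installed wakeup-fd, then restore.
+    # NB: the restore uses the default warn_on_full_buffer flag — fine
+    # for a throwaway debug probe, not for production.
+    fd = signal.set_wakeup_fd(-1)
+    if fd != -1:
+        signal.set_wakeup_fd(fd)
+    return fd
+
+def assert_wakeup_fd_unchanged(where: str) -> None:
+    global _baseline
+    fd = _current_wakeup_fd()
+    if _baseline is None:
+        _baseline = fd
+        print(f'[wakeup-fd probe] baseline fd={fd} at {where}', flush=True)
+    elif fd != _baseline:
+        raise AssertionError(
+            f'wakeup-fd changed {_baseline} -> {fd} at {where}'
+        )
+```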
+ +## Reproducer + +Inline, standalone (no pytest): + +```python +# save as /tmp/orphan_sigint_repro.py (py3.14+) +import os, sys, signal, time, glob, trio +import tractor +from tractor.spawn._subint_forkserver import ( + fork_from_worker_thread, +) + +async def _sleep_forever(): + print(f'SUBACTOR[{os.getpid()}]', flush=True) + await trio.sleep_forever() + +async def _main(): + async with ( + tractor.open_root_actor( + registry_addrs=[('127.0.0.1', 12349)], + ), + tractor.open_nursery() as an, + ): + await an.run_in_actor(_sleep_forever, name='sf-child') + await trio.sleep_forever() + +def child_target() -> int: + from tractor.spawn._spawn import try_set_start_method + try_set_start_method('subint_forkserver') + trio.run(_main) + return 0 + +pid = fork_from_worker_thread(child_target, thread_name='repro') +time.sleep(3.0) + +# find the subactor pid via /proc +children = [] +for path in glob.glob(f'/proc/{pid}/task/*/children'): + with open(path) as f: + children.extend(int(x) for x in f.read().split() if x) +subactor_pid = children[0] + +# SIGKILL root → orphan the subactor +os.kill(pid, signal.SIGKILL) +os.waitpid(pid, 0) +time.sleep(0.3) + +# SIGINT the orphan — should cause clean trio exit +os.kill(subactor_pid, signal.SIGINT) + +# poll for exit +for _ in range(100): + try: + os.kill(subactor_pid, 0) + time.sleep(0.1) + except ProcessLookupError: + print('HARNESS: subactor exited cleanly ✔') + sys.exit(0) +os.kill(subactor_pid, signal.SIGKILL) +print('HARNESS: subactor hung — reproduced') +sys.exit(1) +``` + +Expected (current): `HARNESS: subactor hung — reproduced`. + +After fix: `HARNESS: subactor exited cleanly ✔`. + +## References + +- `tractor/spawn/_entry.py::_trio_main:157-186` — the + intentional KBI→clean-exit path this bug makes + unreachable. +- `tractor/spawn/_subint_forkserver` — the backend whose + orphan cancel-robustness this blocks. +- `tests/spawn/test_subint_forkserver.py::test_orphaned_subactor_sigint_cleanup_DRAFT` + — the xfail'd reproducer in the test suite. +- `ai/conc-anal/subint_cancel_delivery_hang_issue.md` — + sibling "orphaned channel park" hang (different class). +- `ai/conc-anal/subint_sigint_starvation_issue.md` — + sibling "GIL starvation SIGINT drop" hang (different + class). +- tractor issue #379 — subint backend tracking. diff --git a/ai/conc-anal/subint_forkserver_test_cancellation_leak_issue.md b/ai/conc-anal/subint_forkserver_test_cancellation_leak_issue.md new file mode 100644 index 000000000..a685f14ff --- /dev/null +++ b/ai/conc-anal/subint_forkserver_test_cancellation_leak_issue.md @@ -0,0 +1,851 @@ +# `subint_forkserver` backend: `test_cancellation.py` multi-level cancel cascade hang + +> **Tracked at:** [#449](https://github.com/goodboy/tractor/issues/449) + +Follow-up tracker: surfaced while wiring the new +`subint_forkserver` spawn backend into the full tractor +test matrix (step 2 of the post-backend-lands plan). +See also +`ai/conc-anal/subint_forkserver_orphan_sigint_hang_issue.md` +— sibling tracker for a different forkserver-teardown +class which probably shares the same fundamental root +cause (fork-FD-inheritance across nested spawns). + +## TL;DR + +`tests/test_cancellation.py::test_nested_multierrors[subint_forkserver]` +hangs indefinitely under our new backend. The hang is +**inside the graceful IPC cancel cascade** — every actor +in the multi-level tree parks in `epoll_wait` waiting +for IPC messages that never arrive. Not a hard-kill / +tree-reap issue (we don't reach the hard-kill fallback +path at all). 
+ +Working hypothesis (unverified): **`os.fork()` from a +subactor inherits the root parent's IPC listener socket +FDs**. When a first-level subactor forkserver-spawns a +grandchild, that grandchild inherits both its direct +spawner's FDs AND the root's FDs — IPC message routing +becomes ambiguous (or silently sends to the wrong +channel), so the cancel cascade can't reach its target. + +## Corrected diagnosis vs. earlier draft + +An earlier version of this doc claimed the root cause +was **"forkserver teardown doesn't tree-kill +descendants"** (SIGKILL only reaches the direct child, +grandchildren survive and hold TCP `:1616`). That +diagnosis was **wrong**, caused by conflating two +observations: + +1. *5-zombie leak holding :1616* — happened in my own + workflow when I aborted a bg pytest task with + `pkill` (SIGTERM/SIGKILL, not SIGINT). The abrupt + kill skipped the graceful `ActorNursery.__aexit__` + cancel cascade entirely, orphaning descendants to + init. **This was my cleanup bug, not a forkserver + teardown bug.** Codified the fix (SIGINT-first + + bounded wait before SIGKILL) in + `feedback_sc_graceful_cancel_first.md` + + `.claude/skills/run-tests/SKILL.md`. +2. *`test_nested_multierrors` hangs indefinitely* — + the real, separate, forkserver-specific bug + captured by this doc. + +The two symptoms are unrelated. The tree-kill / setpgrp +fix direction proposed earlier would not help (1) (SC- +graceful-cleanup is the right answer there) and would +not help (2) (the hang is in the cancel cascade, not +in the hard-kill fallback). + +## Symptom + +Reproducer (py3.14, clean env): + +```sh +# preflight: ensure clean env +ss -tlnp 2>/dev/null | grep ':1616' && echo 'FOUL — cleanup first!' || echo 'clean' + +./py314/bin/python -m pytest --spawn-backend=subint_forkserver \ + 'tests/test_cancellation.py::test_nested_multierrors[subint_forkserver]' \ + --timeout=30 --timeout-method=thread --tb=short -v +``` + +Expected: `pytest-timeout` fires at 30s with a thread- +dump banner, but the process itself **remains alive +after timeout** and doesn't unwedge on subsequent +SIGINT. Requires SIGKILL to reap. + +## Evidence (tree structure at hang point) + +All 5 processes are kernel-level `S` (sleeping) in +`do_epoll_wait` (trio's event loop waiting on I/O): + +``` +PID PPID THREADS NAME ROLE +333986 1 2 subint-forkserv pytest main (the test body) +333993 333986 3 subint-forkserv "child 1" spawner subactor + 334003 333993 1 subint-forkserv grandchild errorer under child-1 + 334014 333993 1 subint-forkserv grandchild errorer under child-1 +333999 333986 1 subint-forkserv "child 2" spawner subactor (NO grandchildren!) +``` + +### Asymmetric tree depth + +The test's `spawn_and_error(breadth=2, depth=3)` should +have BOTH direct children spawning 2 grandchildren +each, going 3 levels deep. Reality: + +- Child 1 (333993, 3 threads) DID spawn its two + grandchildren as expected — fully booted trio + runtime. +- Child 2 (333999, 1 thread) did NOT spawn any + grandchildren — clearly never completed its + nursery's first `run_in_actor`. Its 1-thread state + suggests the runtime never fully booted (no trio + worker threads for `waitpid`/IPC). + +This asymmetry is the key clue: the two direct +children started identically but diverged. Probably a +race around fork-inherited state (listener FDs, +subactor-nursery channel state) that happens to land +differently depending on spawn ordering. 
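+One way to regenerate the table above against a live
+hang (a sketch; `333986` is the hypothetical pytest-main
+pid from this capture — substitute the real one):
+
+```python
+# walk the process tree via /proc and print each pid's thread count +
+# kernel wait channel (wchan), i.e. the columns in the table above.
+import os
+
+def walk_tree(root_pid: int) -> None:
+    stack: list[int] = [root_pid]
+    while stack:
+        pid = stack.pop()
+        task_dir = f'/proc/{pid}/task'
+        tids = os.listdir(task_dir)
+        comm = open(f'/proc/{pid}/comm').read().strip()
+        wchan = open(f'/proc/{pid}/wchan').read().strip()
+        print(f'{pid:<8} threads={len(tids):<3} {comm:<18} wchan={wchan}')
+        for tid in tids:
+            with open(f'{task_dir}/{tid}/children') as f:
+                stack.extend(int(c) for c in f.read().split())
+
+walk_tree(333986)
+```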
+ +### Parent-side state + +Thread-dump of pytest main (333986) at the hang: + +- Main trio thread — parked in + `trio._core._io_epoll.get_events` (epoll_wait on + its event loop). Waiting for IPC from children. +- Two trio-cache worker threads — each parked in + `outcome.capture(sync_fn)` calling + `os.waitpid(child_pid, 0)`. These are our + `_ForkedProc.wait()` off-loads. They're waiting for + the direct children to exit — but children are + stuck in their own epoll_wait waiting for IPC from + the parent. + +**It's a deadlock, not a leak:** the parent is +correctly running `soft_kill(proc, _ForkedProc.wait, +portal)` (graceful IPC cancel via +`Portal.cancel_actor()`), but the children never +acknowledge the cancel message (or the message never +reaches them through the tangled post-fork IPC). + +## What's NOT the cause (ruled out) + +- **`_ForkedProc.kill()` only SIGKILLs direct pid / + missing tree-kill**: doesn't apply — we never reach + the hard-kill path. The deadlock is in the graceful + cancel cascade. +- **Port `:1616` contention**: ruled out after the + `reg_addr` fixture-wiring fix; each test session + gets a unique port now. +- **GIL starvation / SIGINT pipe filling** (class-A, + `subint_sigint_starvation_issue.md`): doesn't apply + — each subactor is its own OS process with its own + GIL (not legacy-config subint). +- **Child-side `_trio_main` absorbing KBI**: grep + confirmed; `_trio_main` only catches KBI at the + `trio.run()` callsite, which is reached only if the + trio loop exits normally. The children here never + exit trio.run() — they're wedged inside. + +## Hypothesis: FD inheritance across nested forks + +`subint_forkserver_proc` calls +`fork_from_worker_thread()` which ultimately does +`os.fork()` from a dedicated worker thread. Standard +Linux/POSIX fork semantics: **the child inherits ALL +open FDs from the parent**, including listener +sockets, epoll fds, trio wakeup pipes, and the +parent's IPC channel sockets. + +At root-actor fork-spawn time, the root's IPC server +listener FDs are open in the parent. Those get +inherited by child 1. Child 1 then forkserver-spawns +its OWN subactor (grandchild). The grandchild +inherits FDs from child 1 — but child 1's address +space still contains **the root's IPC listener FDs +too** (inherited at first fork). So the grandchild +has THREE sets of FDs: + +1. Its own (created after becoming a subactor). +2. Its direct parent child-1's. +3. The ROOT's (grandparent's) — inherited transitively. + +IPC message routing may be ambiguous in this tangled +state. Or a listener socket that the root thinks it +owns is actually open in multiple processes, and +messages sent to it go to an arbitrary one. That +would exactly match the observed "graceful cancel +never propagates". + +This hypothesis predicts the bug **scales with fork +depth**: single-level forkserver spawn +(`test_subint_forkserver_spawn_basic`) works +perfectly, but any test that spawns a second level +deadlocks. Matches observations so far. + +## Fix directions (to validate) + +### 1. `close_fds=True` equivalent in `fork_from_worker_thread()` + +`subprocess.Popen` / `trio.lowlevel.open_process` have +`close_fds=True` by default on POSIX — they +enumerate open FDs in the child post-fork and close +everything except stdio + any explicitly-passed FDs. +Our raw `os.fork()` doesn't. Adding the equivalent to +our `_worker` prelude would isolate each fork +generation's FD set. 
+ +Implementation sketch in +`tractor.spawn._subint_forkserver.fork_from_worker_thread._worker`: + +```python +def _worker() -> None: + pid: int = os.fork() + if pid == 0: + # CHILD: close inherited FDs except stdio + the + # pid-pipe we just opened. + keep: set[int] = {0, 1, 2, rfd, wfd} + import resource + soft, _ = resource.getrlimit(resource.RLIMIT_NOFILE) + os.closerange(3, soft) # blunt; or enumerate /proc/self/fd + # ... then child_target() as before +``` + +Problem: overly aggressive — closes FDs the +grandchild might legitimately need (e.g. its parent's +IPC channel for the spawn-spec handshake, if we rely +on that). Needs thought about which FDs are +"inheritable and safe" vs. "inherited by accident". + +### 2. Cloexec on tractor's own FDs + +Set `FD_CLOEXEC` on tractor-created sockets (listener +sockets, IPC channel sockets, pipes). This flag +causes automatic close on `execve`, but since we +`fork()` without `exec()`, this alone doesn't help. +BUT — combined with a child-side explicit close- +non-cloexec loop, it gives us a way to mark "my +private FDs" vs. "safe to inherit". Most robust, but +requires tractor-wide audit. + +### 3. Explicit FD cleanup in `_ForkedProc`/`_child_target` + +Have `subint_forkserver_proc`'s `_child_target` +closure explicitly close the parent-side IPC listener +FDs before calling `_actor_child_main`. Requires +being able to enumerate "the parent's listener FDs +that the child shouldn't keep" — plausible via +`Actor.ipc_server`'s socket objects. + +### 4. Use `os.posix_spawn` with explicit `file_actions` + +Instead of raw `os.fork()`, use `os.posix_spawn()` +which supports explicit file-action specifications +(close this FD, dup2 that FD). Cleaner semantics, but +probably incompatible with our "no exec" requirement +(subint_forkserver is a fork-without-exec design). + +**Likely correct answer: (3) — targeted FD cleanup +via `actor.ipc_server` handle.** (1) is too blunt, +(2) is too wide-ranging, (4) changes the spawn +mechanism. + +## Reproducer (standalone, no pytest) + +```python +# save as /tmp/forkserver_nested_hang_repro.py (py3.14+) +import trio, tractor + +async def assert_err(): + assert 0 + +async def spawn_and_error(breadth: int = 2, depth: int = 1): + async with tractor.open_nursery() as n: + for i in range(breadth): + if depth > 0: + await n.run_in_actor( + spawn_and_error, + breadth=breadth, + depth=depth - 1, + name=f'spawner_{i}_{depth}', + ) + else: + await n.run_in_actor( + assert_err, + name=f'errorer_{i}', + ) + +async def _main(): + async with tractor.open_nursery() as n: + for i in range(2): + await n.run_in_actor( + spawn_and_error, + name=f'top_{i}', + breadth=2, + depth=1, + ) + +if __name__ == '__main__': + from tractor.spawn._spawn import try_set_start_method + try_set_start_method('subint_forkserver') + with trio.fail_after(20): + trio.run(_main) +``` + +Expected (current): hangs on `trio.fail_after(20)` +— children never ack the error-propagation cancel +cascade. Pattern: top 2 direct children, 4 +grandchildren, 1 errorer deadlocks while trying to +unwind through its parent chain. + +After fix: `trio.TooSlowError`-free completion; the +root's `open_nursery` receives the +`BaseExceptionGroup` containing the `AssertionError` +from the errorer and unwinds cleanly. + +## Update — 2026-04-23: partial fix landed, deeper layer surfaced + +Three improvements landed as separate commits in the +`subint_forkserver_backend` branch (see `git log`): + +1. 
**`_close_inherited_fds()` in fork-child prelude** + (`tractor/spawn/_subint_forkserver.py`). POSIX + close-fds-equivalent enumeration via + `/proc/self/fd` (or `RLIMIT_NOFILE` fallback), keep + only stdio. This is fix-direction (1) from the list + above — went with the blunt form rather than the + targeted enum-via-`actor.ipc_server` form, turns + out the aggressive close is safe because every + inheritable resource the fresh child needs + (IPC-channel socket, etc.) is opened AFTER the + fork anyway. +2. **`_ForkedProc.wait()` via `os.pidfd_open()` + + `trio.lowlevel.wait_readable()`** — matches the + `trio.Process.wait` / `mp.Process.sentinel` pattern + used by `trio_proc` and `proc_waiter`. Gives us + fully trio-cancellable child-wait (prior impl + blocked a cache thread on a sync `os.waitpid` that + was NOT trio-cancellable due to + `abandon_on_cancel=False`). +3. **`_parent_chan_cs` wiring** in + `tractor/runtime/_runtime.py`: capture the shielded + `loop_cs` for the parent-channel `process_messages` + task in `async_main`; explicitly cancel it in + `Actor.cancel()` teardown. This breaks the shield + during teardown so the parent-chan loop exits when + cancel is issued, instead of parking on a parent- + socket EOF that might never arrive under fork + semantics. + +**Concrete wins from (1):** the sibling +`subint_forkserver_orphan_sigint_hang_issue.md` class +is **now fixed** — `test_orphaned_subactor_sigint_cleanup_DRAFT` +went from strict-xfail to pass. The xfail mark was +removed; the test remains as a regression guard. + +**test_nested_multierrors STILL hangs** though. + +### Updated diagnosis (narrowed) + +DIAGDEBUG instrumentation of `process_messages` ENTER/ +EXIT pairs + `_parent_chan_cs.cancel()` call sites +showed (captured during a 20s-timeout repro): + +- 80 `process_messages` ENTERs, 75 EXITs → 5 stuck. +- **All 40 `shield=True` ENTERs matched EXIT** — every + shielded parent-chan loop exits cleanly. The + `_parent_chan_cs` wiring works as intended. +- **The 5 stuck loops are all `shield=False`** — peer- + channel handlers (inbound connections handled by + `handle_stream_from_peer` in stream_handler_tn). +- After our `_parent_chan_cs.cancel()` fires, NEW + shielded process_messages loops start (on the + session reg_addr port — probably discovery-layer + reconnection attempts). These don't block teardown + (they all exit) but indicate the cancel cascade has + more moving parts than expected. + +### Remaining unknown + +Why don't the 5 peer-channel loops exit when +`service_tn.cancel_scope.cancel()` fires? They're in +`stream_handler_tn` which IS `service_tn` in the +current configuration (`open_ipc_server(parent_tn= +service_tn, stream_handler_tn=service_tn)`). A +standard nursery-scope-cancel should propagate through +them — no shield, no special handler. Something +specific to the fork-spawned configuration keeps them +alive. + +Candidate follow-up experiments: + +- Dump the trio task tree at the hang point (via + `stackscope` or direct trio introspection) to see + what each stuck loop is awaiting. `chan.__anext__` + on a socket recv? An inner lock? A shielded sub-task? +- Compare peer-channel handler lifecycle under + `trio_proc` vs `subint_forkserver` with equivalent + logging to spot the divergence. +- Investigate whether the peer handler is caught in + the `except trio.Cancelled:` path at + `tractor/ipc/_server.py:448` that re-raises — but + re-raise means it should still exit. Unless + something higher up swallows it. 
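+For the first probe in that list, a sketch using only
+public `trio.lowlevel` introspection (no `stackscope`
+dependency) — spawn `dump_task_tree` into any long-lived
+nursery of the actor under investigation and read off the
+periodic dumps at the hang point:
+
+```python
+import trio
+
+def _walk(task: trio.lowlevel.Task, indent: str = '') -> None:
+    # where is this task's coroutine currently suspended?
+    frame = task.coro.cr_frame
+    where = (
+        f'{frame.f_code.co_filename}:{frame.f_lineno}'
+        if frame is not None else '<finished>'
+    )
+    print(f'{indent}{task.name} @ {where}', flush=True)
+    for nursery in task.child_nurseries:
+        for child in nursery.child_tasks:
+            _walk(child, indent + '  ')
+
+async def dump_task_tree(period: float = 5.0) -> None:
+    while True:
+        await trio.sleep(period)
+        print('--- trio task tree ---', flush=True)
+        _walk(trio.lowlevel.current_root_task())
+```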
+ +### Attempted fix (DID NOT work) — hypothesis (3) + +Tried: in `_serve_ipc_eps` finally, after closing +listeners, also iterate `server._peers` and +sync-close each peer channel's underlying stream +socket fd: + +```python +for _uid, _chans in list(server._peers.items()): + for _chan in _chans: + try: + _stream = _chan._transport.stream if _chan._transport else None + if _stream is not None: + _stream.socket.close() # sync fd close + except (AttributeError, OSError): + pass +``` + +Theory: closing the socket fd from outside the stuck +recv task would make the recv see EBADF / +ClosedResourceError and unblock. + +Result: `test_nested_multierrors[subint_forkserver]` +still hangs identically. Either: +- The sync `socket.close()` doesn't propagate into + trio's in-flight `recv_some()` the way I expected + (trio may hold an internal reference that keeps the + fd open even after an external close), or +- The stuck recv isn't even the root blocker and the + peer handlers never reach the finally for some + reason I haven't understood yet. + +Either way, the sync-close hypothesis is **ruled +out**. Reverted the experiment, restored the skip- +mark on the test. + +### Aside: `-s` flag does NOT change `test_nested_multierrors` behavior + +Tested explicitly: both with and without `-s`, the +test hangs identically. So the capture-pipe-fill +hypothesis is **ruled out** for this test. + +The earlier `test_context_stream_semantics.py` `-s` +observation was most likely caused by a competing +pytest run in my session (confirmed via process list +— my leftover pytest was alive at that time and +could have been holding state on the default +registry port). + +## Update — 2026-04-23 (late): cancel delivery ruled in, nursery-wait ruled BLOCKER + +**New diagnostic run** instrumented +`handle_stream_from_peer` at ENTER / `except +trio.Cancelled:` / finally, plus `Actor.cancel()` +just before `self._parent_chan_cs.cancel()`. Result: + +- **40 `handle_stream_from_peer` ENTERs**. +- **0 `except trio.Cancelled:` hits** — cancel + never fires on any peer-handler. +- **35 finally hits** — those handlers exit via + peer-initiated EOF (normal return), NOT cancel. +- **5 handlers never reach finally** — stuck forever. +- **`Actor.cancel()` fired in 12 PIDs** — but the + PIDs with peer handlers that DIDN'T fire + Actor.cancel are exactly **root + 2 direct + spawners**. These 3 actors have peer handlers + (for their own subactors) that stay stuck because + **`Actor.cancel()` at these levels never runs**. + +### The actual deadlock shape + +`Actor.cancel()` lives in +`open_root_actor.__aexit__` / `async_main` teardown. +That only runs when the enclosing `async with +tractor.open_nursery()` exits. The nursery's +`__aexit__` calls the backend `*_proc` spawn target's +teardown, which does `soft_kill() → +_ForkedProc.wait()` on its child PID. That wait is +trio-cancellable via pidfd now (good) — but nothing +CANCELS it because the outer scope only cancels when +`Actor.cancel()` runs, which only runs when the +nursery completes, which waits on the child. + +It's a **multi-level mutual wait**: + +``` +root blocks on spawner.wait() + spawner blocks on grandchild.wait() + grandchild blocks on errorer.wait() + errorer Actor.cancel() ran, but process + may not have fully exited yet + (something in root_tn holding on?) +``` + +Each level waits for the level below. 
The bottom +level (errorer) reaches Actor.cancel(), but its +process may not fully exit — meaning its pidfd +doesn't go readable, meaning the grandchild's +waitpid doesn't return, meaning the grandchild's +nursery doesn't unwind, etc. all the way up. + +### Refined question + +**Why does an errorer process not exit after its +`Actor.cancel()` completes?** + +Possibilities: +1. `_parent_chan_cs.cancel()` fires (shielded + parent-chan loop unshielded), but the task is + stuck INSIDE the shielded loop's recv in a way + that cancel still can't break. +2. After `Actor.cancel()` returns, `async_main` + still has other tasks in `root_tn` waiting for + something that never arrives (e.g. outbound + IPC reply delivery). +3. The `os._exit(rc)` in `_worker` (at + `_subint_forkserver.py`) doesn't run because + `_child_target` never returns. + +Next-session candidate probes (in priority order): + +1. **Instrument `_worker`'s fork-child branch** to + confirm whether `child_target()` returns (and + thus `os._exit(rc)` is reached) for errorer + PIDs. If yes → process should die; if no → + trace back into `_actor_child_main` / + `_trio_main` / `async_main` to find the stuck + spot. +2. **Instrument `async_main`'s final unwind** to + see which await in the teardown doesn't + complete. +3. **Compare under `trio_proc` backend** at the + same `_worker`-equivalent level to see where + the flows diverge. + +### Rule-out: NOT a stuck peer-chan recv + +Earlier hypothesis was that the 5 stuck peer-chan +loops were blocked on a socket recv that cancel +couldn't interrupt. This pass revealed the real +cause: cancel **never reaches those tasks** because +their owning actor's `Actor.cancel()` never runs. +The recvs are fine — they're just parked because +nothing is telling them to stop. + +## Update — 2026-04-23 (very late): leaves exit, middle actors stuck in `trio.run` + +Yet another instrumentation pass — this time +printing at: + +- `_worker` child branch: `pre child_target()` / + `child_target RETURNED rc=N` / `about to + os._exit(rc)` +- `_trio_main`: `about to trio.run` / + `trio.run RETURNED NORMALLY` / `FINALLY` + +**Fresh-run results** (`test_nested_multierrors[ +subint_forkserver]`, depth=1/breadth=2, 1 root + 14 +forked = 15 actors total): + +- **9 processes completed the full flow** — + `trio.run RETURNED NORMALLY` → `child_target + RETURNED rc=0` → `about to os._exit(0)`. These + are the LEAVES of the tree (errorer actors) plus + their direct parents (depth-0 spawners). They + actually exit their processes. +- **5 processes are stuck INSIDE `trio.run(trio_main)`** + — they hit "about to trio.run" but NEVER see + "trio.run RETURNED NORMALLY". These are root + + top-level spawners + one intermediate. + +**What this means:** `async_main` itself is the +deadlock holder, not the peer-channel loops. +Specifically, the outer `async with root_tn:` in +`async_main` never exits for the 5 stuck actors. +Their `trio.run` never returns → `_trio_main` +catch/finally never runs → `_worker` never reaches +`os._exit(rc)` → the PROCESS never dies → its +parent's `_ForkedProc.wait()` blocks → parent's +nursery hangs → parent's `async_main` hangs → ... + +### The new precise question + +**What task in the 5 stuck actors' `async_main` +never completes?** Candidates: + +1. The shielded parent-chan `process_messages` + task in `root_tn` — but we explicitly cancel it + via `_parent_chan_cs.cancel()` in `Actor.cancel()`. 
+ However, `Actor.cancel()` only runs during + `open_root_actor.__aexit__`, which itself runs + only after `async_main`'s outer unwind — which + doesn't happen. So the shield isn't broken. + +2. `await actor_nursery._join_procs.wait()` or + similar in the inline backend `*_proc` flow. + +3. `_ForkedProc.wait()` on a grandchild that + actually DID exit — but the pidfd_open watch + didn't fire for some reason (race between + pidfd_open and the child exiting?). + +The most specific next probe: **add DIAG around +`_ForkedProc.wait()` enter/exit** to see whether +the pidfd-based wait returns for every grandchild +exit. If a stuck parent's `_ForkedProc.wait()` +NEVER returns despite its child exiting, the +pidfd mechanism has a race bug under nested +forkserver. + +Alternative probe: instrument `async_main`'s outer +nursery exits to find which nursery's `__aexit__` +is stuck, drilling down from `trio.run` to the +specific `async with` that never completes. + +### Cascade summary (updated tree view) + +``` +ROOT (pytest) STUCK in trio.run +├── top_0 (spawner, d=1) STUCK in trio.run +│ ├── spawner_0_d1_0 (d=0) exited (os._exit 0) +│ │ ├── errorer_0_0 exited (os._exit 0) +│ │ └── errorer_0_1 exited (os._exit 0) +│ └── spawner_0_d1_1 (d=0) exited (os._exit 0) +│ ├── errorer_0_2 exited (os._exit 0) +│ └── errorer_0_3 exited (os._exit 0) +└── top_1 (spawner, d=1) STUCK in trio.run + ├── spawner_1_d1_0 (d=0) STUCK in trio.run (sibling race?) + │ ├── errorer_1_0 exited + │ └── errorer_1_1 exited + └── spawner_1_d1_1 (d=0) STUCK in trio.run + ├── errorer_1_2 exited + └── errorer_1_3 exited +``` + +Grandchildren (d=0 spawners) exit OR stick — +asymmetric. Not purely depth-determined. Some race +condition in nursery teardown when multiple +siblings error simultaneously. + +## Update — 2026-04-23 (late, probe iteration 3): hang pinpointed to `wait_for_no_more_peers()` + +Further DIAGDEBUG at every milestone in `async_main` +(runtime UP / EXITED service_tn / EXITED root_tn / +FINALLY ENTER / RETURNING) plus `_ForkedProc.wait` +ENTER/RETURNED per-pidfd. Result: + +**Every stuck actor reaches `async_main: FINALLY +ENTER` but NOT `async_main: RETURNING`.** + +That isolates the hang to a specific await in +`async_main`'s finally block at +`tractor/runtime/_runtime.py:1837+`. The suspect: + +```python +# Ensure all peers (actors connected to us as clients) are finished +if ipc_server := actor.ipc_server and ipc_server.has_peers(check_chans=True): + ... + await ipc_server.wait_for_no_more_peers() # ← UNBOUNDED, blocks forever +``` + +`_no_more_peers` is an `Event` set only when +`server._peers` empties (see +`ipc/_server.py:526-530`). If ANY peer-handler is +stuck (the 5 unclosed loops from the earlier pass), +it keeps its channel in `server._peers`, so the +event never fires, so the wait hangs. + +### Applied fix (partial, landed as defensive-in-depth) + +`tractor/runtime/_runtime.py:1981` — +`wait_for_no_more_peers()` call now wrapped in +`trio.move_on_after(3.0)` + a warning log when the +timeout fires. Commented with the full rationale. + +**Verified:** with this fix, ALL 15 actors reach +`async_main: RETURNING` cleanly (up from 10/15 +reaching end before). + +**Unfortunately:** the test still hangs past 45s +total — meaning there's YET ANOTHER unbounded wait +downstream of `async_main`. The bounded +`wait_for_no_more_peers` unblocks one level, but +the cascade has another level above it. + +### Candidates for the remaining hang + +1. 
`open_root_actor`'s own finally / post- + `async_main` flow in `_root.py` — specifically + `await actor.cancel(None)` which has its own + internal waits. +2. The `trio.run()` itself doesn't return even + after the root task completes because trio's + nursery still has background tasks running. +3. Maybe `_serve_ipc_eps`'s finally has an await + that blocks when peers aren't clearing. + +### Current stance + +- Defensive `wait_for_no_more_peers` bound landed + (good hygiene regardless). Revealing a real + deadlock-avoidance gap in tractor's cleanup. +- Test still hangs → skip-mark restored on + `test_nested_multierrors[subint_forkserver]`. +- The full chain of unbounded waits needs another + session of drilling, probably at + `open_root_actor` / `actor.cancel` level. + +### Summary of this investigation's wins + +1. **FD hygiene fix** (`_close_inherited_fds`) — + correct, closed orphan-SIGINT sibling issue. +2. **pidfd-based `_ForkedProc.wait`** — cancellable, + matches trio_proc pattern. +3. **`_parent_chan_cs` wiring** — + `Actor.cancel()` now breaks the shielded parent- + chan `process_messages` loop. +4. **`wait_for_no_more_peers` bounded** — + prevents the actor-level finally hang. +5. **Ruled-out hypotheses:** tree-kill missing + (wrong), stuck socket recv (wrong). +6. **Pinpointed remaining unknown:** at least one + more unbounded wait in the teardown cascade + above `async_main`. Concrete candidates + enumerated above. + +## Update — 2026-04-23 (VERY late): pytest capture pipe IS the final gate + +After landing fixes 1-4 and instrumenting every +layer down to `tractor_test`'s `trio.run(_main)`: + +**Empirical result: with `pytest -s` the test PASSES +in 6.20s.** Without `-s` (default `--capture=fd`) it +hangs forever. + +DIAG timeline for the root pytest PID (with `-s` +implied from later verification): + +``` +tractor_test: about to trio.run(_main) +open_root_actor: async_main task started, yielding to test body +_main: about to await wrapped test fn +_main: wrapped RETURNED cleanly ← test body completed! +open_root_actor: about to actor.cancel(None) +Actor.cancel ENTER req_chan=False +Actor.cancel RETURN +open_root_actor: actor.cancel RETURNED +open_root_actor: outer FINALLY +open_root_actor: finally END (returning from ctxmgr) +tractor_test: trio.run FINALLY (returned or raised) ← trio.run fully returned! +``` + +`trio.run()` fully returns. The test body itself +completes successfully (pytest.raises absorbed the +expected `BaseExceptionGroup`). What blocks is +**pytest's own stdout/stderr capture** — under +`--capture=fd` default, pytest replaces the parent +process's fd 1,2 with pipe write-ends it's reading +from. Fork children inherit those pipe fds +(because `_close_inherited_fds` correctly preserves +stdio). High-volume subactor error-log tracebacks +(7+ actors each logging multiple +`RemoteActorError`/`ExceptionGroup` tracebacks on +the error-propagation cascade) fill the 64KB Linux +pipe buffer. Subactor writes block. Subactor can't +progress. Process doesn't exit. Parent's +`_ForkedProc.wait` (now pidfd-based and +cancellable, but nothing's cancelling here since +the test body already completed) keeps the pipe +reader alive... but pytest isn't draining its end +fast enough because test-teardown/fixture-cleanup +is in progress. + +**Actually** the exact mechanism is slightly +different: pytest's capture fixture MIGHT be +actively reading, but faster-than-writer subactors +overflow its internal buffer. Or pytest might be +blocked itself on the finalization step. 
+ +Either way, `-s` conclusively fixes it. + +### Why I ruled this out earlier (and shouldn't have) + +Earlier in this investigation I tested +`test_nested_multierrors` with/without `-s` and +both hung. That's because AT THAT TIME, fixes 1-4 +weren't all in place yet. The test was hanging at +multiple deeper levels long before reaching the +"generate lots of error-log output" phase. Once +the cascade actually tore down cleanly, enough +output was produced to hit the capture-pipe limit. + +**Classic order-of-operations mistake in +debugging:** ruling something out too early based +on a test that was actually failing for a +different reason. + +### Fix direction (next session) + +Redirect subactor stdout/stderr to `/dev/null` (or +a session-scoped log file) in the fork-child +prelude, right after `_close_inherited_fds()`. This +severs the inherited pytest-capture pipes and lets +subactor output flow elsewhere. Under normal +production use (non-pytest), stdout/stderr would +be the TTY — we'd want to keep that. So the +redirect should be conditional or opt-in via the +`child_sigint`/proc_kwargs flag family. + +Alternative: document as a gotcha and recommend +`pytest -s` for any tests using the +`subint_forkserver` backend with multi-level actor +trees. Simpler, user-visible, no code change. + +### Current state + +- Skip-mark on `test_nested_multierrors[subint_forkserver]` + restored with reason pointing here. +- Test confirmed passing with `-s` after all 4 + cascade fixes applied. +- The 4 cascade fixes are NOT wasted — they're + correct hardening regardless of the capture-pipe + issue, AND without them we'd never reach the + "actually produces enough output to fill the + pipe" state. + +## Stopgap (landed) + +`test_nested_multierrors` skip-marked under +`subint_forkserver` via +`@pytest.mark.skipon_spawn_backend('subint_forkserver', +reason='...')`, cross-referenced to this doc. Mark +should be dropped once the peer-channel-loop exit +issue is fixed. + +## References + +- `tractor/spawn/_subint_forkserver.py::fork_from_worker_thread` + — the primitive whose post-fork FD hygiene is + probably the culprit. +- `tractor/spawn/_subint_forkserver.py::subint_forkserver_proc` + — the backend function that orchestrates the + graceful cancel path hitting this bug. +- `tractor/spawn/_subint_forkserver.py::_ForkedProc` + — the `trio.Process`-compatible shim; NOT the + failing component (confirmed via thread-dump). +- `tests/test_cancellation.py::test_nested_multierrors` + — the test that surfaced the hang. +- `ai/conc-anal/subint_forkserver_orphan_sigint_hang_issue.md` + — sibling hang class; probably same underlying + fork-FD-inheritance root cause. +- tractor issue #379 — subint backend tracking. diff --git a/ai/conc-anal/subint_forkserver_thread_constraints_on_pep684_issue.md b/ai/conc-anal/subint_forkserver_thread_constraints_on_pep684_issue.md new file mode 100644 index 000000000..b3c4563d3 --- /dev/null +++ b/ai/conc-anal/subint_forkserver_thread_constraints_on_pep684_issue.md @@ -0,0 +1,186 @@ +# Revisit `subint_forkserver` thread-cache constraints once msgspec PEP 684 support lands + +> **Tracked at:** [#450](https://github.com/goodboy/tractor/issues/450) + +Follow-up tracker for cleanup work gated on the msgspec +PEP 684 adoption upstream ([jcrist/msgspec#563](https://github.com/jcrist/msgspec/issues/563)). 
+ +Context — why this exists +------------------------- + +The `tractor.spawn._subint_forkserver` submodule currently +carries two "non-trio" thread-hygiene constraints whose +necessity is tangled with issues that *should* dissolve +under PEP 684 isolated-mode subinterpreters: + +1. `fork_from_worker_thread()` / `run_subint_in_worker_thread()` + internally allocate a **dedicated `threading.Thread`** + rather than using `trio.to_thread.run_sync()`. +2. The test helper is named + `run_fork_in_non_trio_thread()` — the + `non_trio` qualifier is load-bearing today. + +This doc catalogs *why* those constraints exist, which of +them isolated-mode would fix, and what the +audit-and-cleanup path looks like once msgspec #563 is +resolved. + +The three reasons the constraints exist +--------------------------------------- + +### 1. GIL-starvation class → fixed by PEP 684 isolated mode + +The class-A hang documented in +`subint_sigint_starvation_issue.md` is entirely about +legacy-config subints **sharing the main GIL**. Once +msgspec #563 lands and tractor flips +`tractor.spawn._subint` to +`concurrent.interpreters.create()` (isolated config), each +subint gets its own GIL. Abandoned subint threads can't +contend for main's GIL → can't starve the main trio loop +→ signal-wakeup-pipe drains normally → no SIGINT-drop. + +This class of hazard **dissolves entirely**. The +non-trio-thread requirement for *this reason* disappears. + +### 2. Destroy race / tstate-recycling → orthogonal; unclear + +The `subint_proc` dedicated-thread fix (commit `26fb8206`) +addressed a different issue: `_interpreters.destroy(interp_id)` +was blocking on a trio-cache worker that had run an +earlier `interp.exec()` for that subint. Working +hypothesis at the time was "the cached thread retains the +subint's tstate". + +But tstate-handling is **not specific to GIL mode** — +`_PyXI_Enter` / `_PyXI_Exit` (the C-level machinery both +configs use to enter/leave a subint from a thread) should +restore the caller's tstate regardless of GIL config. So +isolated mode **doesn't obviously fix this**. It might be: + +- A py3.13 bug fixed in later versions — we saw the race + first on 3.13 and never re-tested on 3.14 after moving + to dedicated threads. +- A genuine CPython quirk around cached threads that + exec'd into a subint, persisting across GIL modes. +- Something else we misdiagnosed — the empirical fix + (dedicated thread) worked but the analysis may have + been incomplete. + +Only way to know: once we're on isolated mode, empirically +retry `trio.to_thread.run_sync(interp.exec, ...)` and see +if `destroy()` still blocks. If it does, keep the +dedicated thread; if not, one constraint relaxed. + +### 3. Fork-from-main-interp-tstate (the constraint in this module's helper names) + +The fork-from-main-interp-tstate invariant — CPython's +`PyOS_AfterFork_Child` → +`_PyInterpreterState_DeleteExceptMain` gate documented in +`subint_fork_blocked_by_cpython_post_fork_issue.md` — is +about the calling thread's **current** tstate at the +moment `os.fork()` runs. If trio's cache threads never +enter subints at all, their tstate is plain main-interp, +and fork from them would be fine. + +The reason the smoke test + +`run_fork_in_non_trio_thread` test helper +currently use a dedicated `threading.Thread` is narrow: +**we don't want to risk a trio cache thread that has +previously been used as a subint driver being the one that +picks up the fork job**. 
If cached tstate doesn't get +cleared (back to reason #2), the fork's child-side +post-init would see the wrong interp and abort. + +In an isolated-mode world where msgspec works: + +- `subint_proc` would use the public + `concurrent.interpreters.create()` + `Interpreter.exec()` + / `Interpreter.close()` — which *should* handle tstate + cleanly (they're the "blessed" API). +- If so, trio's cache threads are safe to fork from + regardless of whether they've previously driven subints. +- → the `non_trio` qualifier in + `run_fork_in_non_trio_thread` becomes + *overcautious* rather than load-bearing, and the + dedicated-thread primitives in `_subint_forkserver.py` + can likely be replaced with straight + `trio.to_thread.run_sync()` wrappers. + +TL;DR +----- + +| constraint | fixed by isolated mode? | +|---|---| +| GIL-starvation (class A) | **yes** | +| destroy race on cached worker | unclear — empirical test on py3.14 + isolated API required | +| fork-from-main-tstate requirement on worker | **probably yes, conditional on the destroy-race question above** | + +If #2 also resolves on py3.14+ with isolated mode, +tractor could drop the `non_trio` qualifier from the fork +helper's name and just use `trio.to_thread.run_sync(...)` +for everything. But **we shouldn't do that preemptively** +— the current cautious design is cheap (one dedicated +thread per fork / per subint-exec) and correct. + +Audit plan when msgspec #563 lands +---------------------------------- + +Assuming msgspec grows `Py_mod_multiple_interpreters` +support: + +1. **Flip `tractor.spawn._subint` to isolated mode.** Drop + the `_interpreters.create('legacy')` call in favor of + the public API (`concurrent.interpreters.create()` + + `Interpreter.exec()` / `Interpreter.close()`). Run the + three `ai/conc-anal/subint_*_issue.md` reproducers — + class-A (`test_stale_entry_is_deleted` etc.) should + pass without the `skipon_spawn_backend('subint')` marks + (revisit the marker inventory). + +2. **Empirical destroy-race retest.** In `subint_proc`, + swap the dedicated `threading.Thread` back to + `trio.to_thread.run_sync(Interpreter.exec, ..., + abandon_on_cancel=False)` and run the full subint test + suite. If `Interpreter.close()` (or the backing + destroy) blocks the same way as the legacy version + did, revert and keep the dedicated thread. + +3. **If #2 clean**, audit `_subint_forkserver.py`: + - Rename `run_fork_in_non_trio_thread` → drop the + `_non_trio_` qualifier (e.g. `run_fork_in_thread`) or + inline the two-line `trio.to_thread.run_sync` call at + the call sites and drop the helper entirely. + - Consider whether `fork_from_worker_thread` + + `run_subint_in_worker_thread` still warrant being + separate module-level primitives or whether they + collapse into a compound + `trio.to_thread.run_sync`-driven pattern inside the + (future) `subint_forkserver_proc` backend. + +4. **Doc fallout.** `subint_sigint_starvation_issue.md` + and `subint_cancel_delivery_hang_issue.md` both cite + the legacy-GIL-sharing architecture as the root cause. + Close them with commit-refs to the isolated-mode + migration. This doc itself should get a closing + post-mortem section noting which of #1/#2/#3 actually + resolved vs persisted. + +References +---------- + +- `tractor.spawn._subint_forkserver` — the in-tree module + whose constraints this doc catalogs. +- `ai/conc-anal/subint_sigint_starvation_issue.md` — the + GIL-starvation class. +- `ai/conc-anal/subint_cancel_delivery_hang_issue.md` — + sibling Ctrl-C-able hang class. 
+- `ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md` + — why fork-from-subint is blocked (this drives the + forkserver-via-non-subint-thread workaround). +- `ai/conc-anal/subint_fork_from_main_thread_smoketest.py` + — empirical validation for the workaround. +- [PEP 684 — per-interpreter GIL](https://peps.python.org/pep-0684/) +- [PEP 734 — `concurrent.interpreters` public API](https://peps.python.org/pep-0734/) +- [jcrist/msgspec#563 — PEP 684 support tracker](https://github.com/jcrist/msgspec/issues/563) +- tractor issue #379 — subint backend tracking. diff --git a/ai/conc-anal/test_register_duplicate_name_daemon_connect_race_issue.md b/ai/conc-anal/test_register_duplicate_name_daemon_connect_race_issue.md new file mode 100644 index 000000000..67d754710 --- /dev/null +++ b/ai/conc-anal/test_register_duplicate_name_daemon_connect_race_issue.md @@ -0,0 +1,273 @@ +# `test_register_duplicate_name` racy connect-failure on `daemon` fixture readiness + +## Symptom + +`tests/test_multi_program.py::test_register_duplicate_name` +fails intermittently under BOTH transports + ALL spawn +backends with connect-refused errors: + +``` +# under --tpt-proto=uds +FAILED tests/test_multi_program.py::test_register_duplicate_name +- ConnectionRefusedError: [Errno 111] Connection refused +( ^^^ this exc was collapsed from a group ^^^ ) + +# under --tpt-proto=tcp +FAILED tests/test_multi_program.py::test_register_duplicate_name +- OSError: all attempts to connect to 127.0.0.1:36003 failed +( ^^^ this exc was collapsed from a group ^^^ ) +``` + +Distinct from the cancel-cascade `TooSlowError` flake +class — see +`cancel_cascade_too_slow_under_main_thread_forkserver_issue.md`. +This is a **connect-time race** before the daemon is +fully ready to `accept()`, not a teardown-cascade +slowness. + +## Root cause: blind `time.sleep()` in `daemon` fixture + +`tests/conftest.py::daemon` boots a sub-py-process via +`subprocess.Popen([python, '-c', 'tractor.run_daemon(...)'])`, +then **blindly sleeps** a fixed delay before yielding +`proc` to the test: + +```python +# excerpt from tests/conftest.py::daemon +proc = subprocess.Popen([ + sys.executable, '-c', code, +]) + +bg_daemon_spawn_delay: float = _PROC_SPAWN_WAIT # 0.6 +if tpt_proto == 'uds': + bg_daemon_spawn_delay += 1.6 +if _non_linux and ci_env: + bg_daemon_spawn_delay += 1 + +# XXX, allow time for the sub-py-proc to boot up. +# !TODO, see ping-polling ideas above! +time.sleep(bg_daemon_spawn_delay) + +assert not proc.returncode +yield proc +``` + +Inherent fragility: the delay is "long enough on dev +boxes most of the time" but has no actual +synchronization with the daemon's `bind()` + `listen()` +completion. Under any of: + +- Loaded box (CI parallelism, big rebuild in + background, low-cpu-freq) +- Cold first-run (`importlib` cache miss, JIT warmup) +- Higher-than-expected `tractor` import cost +- Filesystem latency (UDS sockfile create, slow + tmpfs) + +...the sleep finishes BEFORE the daemon has bound its +listen socket → first test client call to +`tractor.find_actor()` / `wait_for_actor()` / +`open_nursery(registry_addrs=[reg_addr])`'s implicit +connect → `ConnectionRefusedError` (TCP) or +`FileNotFoundError`/`ConnectionRefusedError` (UDS). + +## Reproducer + +Easiest: run the suite under load. 
+ +```bash +# create CPU pressure on another core in parallel +stress-ng --cpu 2 --timeout 600s & + +./py313/bin/python -m pytest \ + tests/test_multi_program.py::test_register_duplicate_name \ + --spawn-backend=main_thread_forkserver \ + --tpt-proto=tcp -v +``` + +Reproduces ~30-50% of the time on a dev laptop. On a +quiet idle box, may need 5-10 runs to hit. + +## Why the existing `_PROC_SPAWN_WAIT` tuning is +inadequate + +Recent `bg_daemon_spawn_delay` rename +(de-monotonic-grow fix) just-shipped removed the +*accumulation* bug where each invocation made the +NEXT test's wait longer too. Net effect: every +invocation now uses the SAME `0.6 + 1.6` (UDS) or +`0.6` (TCP) sleep, no growth. Good — but does +NOTHING for the underlying race. Each individual +test still relies on a blind sleep that may or may +not be sufficient. + +Bumping the constant higher pushes flake rate down +but never to zero AND adds dead time to every +non-flaking run. Not a fix, just a knob. + +## Side effects + +- **Inter-test cascade**: a single failure can cascade + via leaked subprocesses (the `daemon` fixture's + cleanup may not fully tear down a daemon that never + reached "ready"). The `_reap_orphaned_subactors` + session-end + `_track_orphaned_uds_per_test` + per-test fixtures handle most of this now, but the + affected test itself still fails. +- **Worsens under fork-spawn backends**: the daemon + has more init work + (`_main_thread_forkserver`-coordinator-thread + startup, etc.) so the sleep has to cover MORE. + +## Fix design — replace blind sleep with active poll + +The right primitive is **poll the daemon's bind +address until it accepts a connection or we time +out**, with the timeout being a hard ceiling rather +than a baseline. Two implementation paths: + +### Path A — TCP/UDS connect-poll loop + +Try `socket.connect(reg_addr)` in a tight loop with +short backoff (~50ms), succeed on the first non-error +return, fail-loud on a hard cap (e.g. 10s). Same +primitive works for both transports because both use +`socket.connect()` semantics. + +Rough shape: + +```python +def _wait_for_daemon_ready( + reg_addr, + tpt_proto: str, + timeout: float = 10.0, + poll_interval: float = 0.05, +) -> None: + deadline = time.monotonic() + timeout + while True: + if tpt_proto == 'tcp': + sock = socket.socket(socket.AF_INET) + target = reg_addr # (host, port) + else: # uds + sock = socket.socket(socket.AF_UNIX) + target = os.path.join(*reg_addr) + try: + sock.settimeout(poll_interval) + sock.connect(target) + except ( + ConnectionRefusedError, + FileNotFoundError, + socket.timeout, + ) as exc: + if time.monotonic() >= deadline: + raise TimeoutError( + f'Daemon never accepted on {target!r} ' + f'within {timeout}s' + ) from exc + time.sleep(poll_interval) + else: + sock.close() + return +``` + +Pros: trivial primitive, no tractor-runtime +dependency, works pre-yield in the fixture body, +fail-fast on truly-broken daemon. +Cons: doesn't actually do an IPC handshake, just +proves listen-side is up. A daemon that bound but +hasn't initialized its registrar table yet would +still race. 
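For concreteness, here is roughly how the fixture
call-site would change under Path A — a minimal
sketch only, assuming the `daemon` fixture keeps its
current `Popen` shape and that the helper lands under
the `_wait_for_daemon_ready()` name used above:

```python
# sketch: tests/conftest.py::daemon with the blind sleep replaced
proc = subprocess.Popen([
    sys.executable, '-c', code,
])

# active poll: the timeout is a hard ceiling, not a
# baseline — a healthy daemon normally satisfies this
# within one poll interval.
_wait_for_daemon_ready(reg_addr, tpt_proto, timeout=10.0)

# `poll()` refreshes `returncode`; `None` means the
# daemon proc is still up.
assert proc.poll() is None
yield proc
```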
+ +### Path B — `tractor.find_actor()` poll + +Use the actual discovery API the test would call: + +```python +async def _wait_for_daemon_ready_via_discovery( + reg_addr, + timeout: float = 10.0, + poll_interval: float = 0.05, +): + deadline = trio.current_time() + timeout + async with tractor.open_root_actor( + registry_addrs=[reg_addr], + # ephemeral root just for the probe + ): + while True: + try: + async with tractor.find_actor( + 'registrar', # daemon's own name + registry_addrs=[reg_addr], + ) as portal: + if portal is not None: + return + except Exception: + pass + if trio.current_time() >= deadline: + raise TimeoutError(...) + await trio.sleep(poll_interval) +``` + +Pros: actually proves the discovery path works, +handles the "bound but not ready" case naturally. +Cons: requires booting an ephemeral root actor JUST +for the probe (overhead), more code, and runs in trio +which complicates the sync-fixture context. Need a +`trio.run()` wrapper. + +### Recommended: Path A with optional handshake check + +Path A is much simpler + handles 95% of the bug +class. If "bound-but-not-ready" turns out to still +race (it shouldn't — `tractor.run_daemon` doesn't +return from `bind()` until the registrar is +fully populated), escalate to Path B as a focused +follow-up. + +## Workarounds (until fix lands) + +1. **Bump `_PROC_SPAWN_WAIT`** higher (current: 0.6). + 2.0–3.0 hides most flakes at the cost of adding + dead time to every test. Not a fix but reduces + blast radius while the proper poll lands. +2. **`pytest-rerunfailures`** with `reruns=1` on the + `daemon` fixture's tests specifically. Hides the + flake but doesn't address it. +3. **Mark known-affected tests as `xfail(strict=False)`** + under `--ci`. Lets CI go green at the cost of + silently hiding regressions. + +(Recommend skipping all three — implement the active +poll instead.) + +## Investigation next steps + +1. Implement Path A as a `_wait_for_daemon_ready()` + helper in `tests/conftest.py`. Replace the + `time.sleep(bg_daemon_spawn_delay)` call with it. +2. Drop the `_PROC_SPAWN_WAIT` constant entirely + (active poll obsoletes blind sleep). +3. Run the suite 5-10 times to validate flake rate + drops to 0. +4. If flakes persist, profile whether the daemon + process exits with non-zero before the poll's + deadline hits — that'd be a different bug + (daemon startup crash) that the blind sleep was + masking. +5. Cross-check `tests/test_multi_program.py::test_*` + — multiple tests use the `daemon` fixture; all + should benefit from the same poll primitive. + +## Related + +- `tests/conftest.py::daemon` — the fixture under + fix +- `tests/conftest.py::_PROC_SPAWN_WAIT` — the + constant to drop +- `cancel_cascade_too_slow_under_main_thread_forkserver_issue.md` + — distinct flake class (cancel-cascade + `TooSlowError` at teardown, not connect-time race) +- `trio_wakeup_socketpair_busy_loop_under_fork_issue.md` + — different bug entirely; this race was masked + pre-WakeupSocketpair-patch by the busy-loop + hangs. 
diff --git a/ai/conc-anal/trio_wakeup_socketpair_busy_loop_under_fork_issue.md b/ai/conc-anal/trio_wakeup_socketpair_busy_loop_under_fork_issue.md new file mode 100644 index 000000000..213841e99 --- /dev/null +++ b/ai/conc-anal/trio_wakeup_socketpair_busy_loop_under_fork_issue.md @@ -0,0 +1,221 @@ +# trio `WakeupSocketpair.drain()` busy-loop in forked child (peer-closed missed-EOF) + +## Reproducer + +```bash +./py313/bin/python -m pytest \ + tests/test_multi_program.py::test_register_duplicate_name \ + --tpt-proto=tcp \ + --spawn-backend=main_thread_forkserver \ + -v --capture=sys +``` + +Subactor pegs a CPU core indefinitely; parent test +hangs waiting for the subactor. + +## Empirical evidence (caught alive) + +``` +$ sudo strace -p +recvfrom(6, "", 65536, 0, NULL, NULL) = 0 +recvfrom(6, "", 65536, 0, NULL, NULL) = 0 +recvfrom(6, "", 65536, 0, NULL, NULL) = 0 +... (no `epoll_wait`, no other syscalls, just this back-to-back) +``` + +Pattern: tight C-level `recvfrom` loop returning 0 +each call. No `epoll_wait` between iterations → +**not trio's task scheduler**. Pure synchronous C +loop. + +``` +$ sudo readlink /proc//fd/6 +socket:[] + +$ sudo lsof -p | grep ' 6u' + goodboy 6u unix 0xffff... 0t0 type=STREAM (CONNECTED) +``` + +fd=6 is an **AF_UNIX socket** in CONNECTED state. +Even though the test uses `--tpt-proto=tcp`, this fd +is NOT a tractor IPC channel — it's an internal +trio socketpair. + +## Root-cause: `WakeupSocketpair.drain()` + +`/site-packages/trio/_core/_wakeup_socketpair.py`: + +```python +class WakeupSocketpair: + def __init__(self) -> None: + self.wakeup_sock, self.write_sock = socket.socketpair() + self.wakeup_sock.setblocking(False) + self.write_sock.setblocking(False) + ... + + def drain(self) -> None: + try: + while True: + self.wakeup_sock.recv(2**16) + except BlockingIOError: + pass +``` + +`socket.socketpair()` on Linux defaults to AF_UNIX +SOCK_STREAM. Both ends non-blocking. Normal flow: + +1. Signal/wake event → `write_sock.send(b'\x00')` + queues a byte. +2. `wakeup_sock` becomes readable → trio's epoll + triggers. +3. Trio calls `drain()` to flush the buffer. +4. drain loops on `wakeup_sock.recv(64KB)`. +5. Eventually buffer empty → non-blocking socket + raises `BlockingIOError` → except → break. + +**Bug surface — peer-closed missed-EOF**: + +Non-blocking socket semantics: +- buffer has data → `recv` returns N>0 bytes (loop continues) +- buffer empty → `recv` raises `BlockingIOError` +- **peer FIN'd → `recv` returns 0 bytes (NEITHER exception NOR + break — infinite tight loop)** + +`drain()` does not handle the `b''` return-value +(EOF) case. If `write_sock` has been closed (or the +process holding it is gone), every iteration returns +0 → infinite loop → 100% CPU on a single core. + +## Why this triggers under `main_thread_forkserver` + +Under `os.fork()` from the forkserver-worker thread: + +1. Parent has a `WakeupSocketpair` instance with + `wakeup_sock=fdN`, `write_sock=fdM`. Both fds + open in parent. +2. Fork → child inherits BOTH fds (kernel-level fd + table dup). +3. `_close_inherited_fds()` runs in child → + closes everything except stdio. `wakeup_sock` and + `write_sock` of the parent's `WakeupSocketpair` + ARE closed in child. +4. Child's trio (running fresh) creates its OWN + `WakeupSocketpair` → NEW fd numbers (e.g. fd 6, 7). +5. **In `infect_asyncio` mode** the asyncio loop is + the host; trio runs as guest via + `start_guest_run`. trio still creates its + `WakeupSocketpair` in the I/O manager but its + role is different. 
+ +The race window: somewhere between (3) and (5), if a +`WakeupSocketpair` Python object reference inherited +via COW (from parent's pre-fork heap) survives long +enough that `drain()` is called on it AFTER its fds +were closed but BEFORE the child's NEW socketpair +takes over the recycled fd numbers — the recycled fd +will be one of the child's NEW socketpair ends, whose +peer might be FIN-flagged (e.g. parent-process +peer-end is closed). + +Or simpler: the `wait_for_actor`/`find_actor` discovery +flow in `test_register_duplicate_name` triggers an +unusual code path where a stale `WakeupSocketpair` +gets `drain()`-called on a fd whose peer has already +closed. + +## Why `drain()` shouldn't loop indefinitely on EOF +(upstream trio bug) + +Even WITHOUT fork, `drain()` should treat `b''` as +EOF and break. The current code is correct for the +"buffer drained on a healthy socketpair" scenario but +incorrect for the "peer is gone" scenario. It's a +defensive-programming gap in trio. + +A one-line patch upstream: + +```python +def drain(self) -> None: + try: + while True: + data = self.wakeup_sock.recv(2**16) + if not data: + break # peer-closed; nothing more to drain + except BlockingIOError: + pass +``` + +## Workarounds (until the underlying issue lands) + +1. **Skip-mark on the fork backend**: + `tests/test_multi_program.py` → + `pytest.mark.skipon_spawn_backend('main_thread_forkserver', + reason='trio WakeupSocketpair.drain busy-loop, see ai/conc-anal/trio_wakeup_socketpair_busy_loop_under_fork_issue.md')`. + +2. **Defensive monkey-patch in tractor's + forkserver-child prelude** — wrap + `WakeupSocketpair.drain` to handle `b''`: + + ```python + # in `_actor_child_main` or `_close_inherited_fds`'s + # post-fork prelude: + from trio._core._wakeup_socketpair import WakeupSocketpair + _orig_drain = WakeupSocketpair.drain + def _safe_drain(self): + try: + while True: + data = self.wakeup_sock.recv(2**16) + if not data: + return # peer closed + except BlockingIOError: + pass + WakeupSocketpair.drain = _safe_drain + ``` + + Tracks upstream — remove once trio fixes. + +3. **Upstream the fix**: 1-line PR to `python-trio/trio` + adding `if not data: break` to `drain()`. + +## Investigation next steps + +1. **Confirm via py-spy**: when caught alive, detach + strace first then + `sudo py-spy dump --pid --locals`. The + busy thread should show `drain` from `WakeupSocketpair` + in the call chain. +2. **Identify which write-end peer is closed**: from + the inode of fd 6, look up the matching peer + inode via `ss -xp` and see whose process it + was/is. +3. **Verify the missed-EOF hypothesis**: hand-craft a + minimal `WakeupSocketpair` repro: + + ```python + from trio._core._wakeup_socketpair import WakeupSocketpair + ws = WakeupSocketpair() + ws.write_sock.close() # simulate peer-gone + ws.drain() # should hang forever + ``` + +## Sibling bug + +`tests/test_infected_asyncio.py::test_aio_simple_error` +hangs under the same backend with a DIFFERENT +fingerprint (Mode-A deadlock, both parties in +`epoll_wait`, no busy-loop). Distinct root cause — +see `infected_asyncio_under_main_thread_forkserver_hang_issue.md`. + +Both share the broader theme: **trio internal-state +initialization isn't fully fork-safe under +`main_thread_forkserver`** for the more exotic +dispatch paths. 
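Returning to the `drain()` claim itself: the three
non-blocking `recv()` outcomes described above can be
reproduced with a bare stdlib `socketpair()`,
independent of trio or fork — pure standard-library
behavior, nothing below is project code:

```python
import socket

a, b = socket.socketpair()   # AF_UNIX SOCK_STREAM on Linux
a.setblocking(False)

b.send(b'\x00')
print(a.recv(4096))          # b'\x00' -> data queued; drain loop continues

try:
    a.recv(4096)             # buffer empty, peer still open
except BlockingIOError:
    print('empty buffer raises -> drain() exits here')

b.close()                    # peer FIN
print(a.recv(4096))          # b'' -> EOF: no data AND no exception —
                             # the return value drain() currently spins on
```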
+ +## See also + +- [#379](https://github.com/goodboy/tractor/issues/379) — subint umbrella +- python-trio/trio#1614 — trio + fork hazards +- `trio._core._wakeup_socketpair.WakeupSocketpair` + source (the smoking gun) +- `ai/conc-anal/fork_thread_semantics_execution_vs_memory.md` +- `ai/conc-anal/infected_asyncio_under_main_thread_forkserver_hang_issue.md` diff --git a/ai/prompt-io/claude/20260422T200723Z_797f57c_prompt_io.md b/ai/prompt-io/claude/20260422T200723Z_797f57c_prompt_io.md new file mode 100644 index 000000000..e606db8ff --- /dev/null +++ b/ai/prompt-io/claude/20260422T200723Z_797f57c_prompt_io.md @@ -0,0 +1,155 @@ +--- +model: claude-opus-4-7[1m] +service: claude +session: subints-phase-b-hardening-and-fork-block +timestamp: 2026-04-22T20:07:23Z +git_ref: 797f57c +scope: code +substantive: true +raw_file: 20260422T200723Z_797f57c_prompt_io.raw.md +--- + +## Prompt + +Session-spanning work on the Phase B `subint` spawn-backend. +Three distinct sub-phases in one log: + +1. **Py3.13 gate tightening** — diagnose a reproducible hang + of subint spawn flow under py3.13 (works on py3.14), trace + to a private `_interpreters` module vintage issue, tighten + our feature gate from "`_interpreters` present" to "public + `concurrent.interpreters` present" (i.e. py3.14+). +2. **Test-harness hardening** — add `pytest-timeout` dep, put + `@pytest.mark.timeout(30, method='thread')` on the + three known-hanging subint tests cataloged in + `ai/conc-anal/subint_sigint_starvation_issue.md`. Separately, + code-review the user's in-flight `skipon_spawn_backend` + marker implementation; find four bugs; refactor to use + `item.iter_markers()`. +3. **`subint_fork` prototype → CPython-block finding** — draft + a WIP `subint_fork_proc` backend using a sub-interpreter as + a launchpad for `os.fork()` (to sidestep trio#1614). User + tests on py3.14, hits + `Fatal Python error: _PyInterpreterState_DeleteExceptMain: + not main interpreter`. Walk CPython sources (local clone at + `~/repos/cpython/`) to pinpoint the refusal + (`Modules/posixmodule.c:728` → `Python/pystate.c:1040`). + Revert implementation to a `NotImplementedError` stub in a + new `_subint_fork.py` submodule, document the finding in a + third `conc-anal/` doc with an upstream-report draft for + the CPython issue tracker. Finally, discuss user's proposed + workaround architecture (main-interp worker-thread + forkserver) and draft a standalone smoke-test script for + feasibility validation. + +## Response summary + +All three sub-phases landed concrete artifacts: + +**Sub-phase 1** — `_subint.py` + `_spawn.py` gates + error +messages updated to require py3.14+ via the public +`concurrent.interpreters` module presence check. Module +docstring revised to explain the empirical reason +(py3.13's private `_interpreters` vintage wedges under +multi-trio-task usage even though minimal standalone +reproducers work fine there). Test-module +`pytest.importorskip` likewise switched. + +**Sub-phase 2** — `pytest-timeout>=2.3` added to `testing` +dep group. 
`@pytest.mark.timeout(30, method='thread')` +applied on: +- `tests/discovery/test_registrar.py::test_stale_entry_is_deleted` +- `tests/test_cancellation.py::test_cancel_while_childs_child_in_sync_sleep` +- `tests/test_cancellation.py::test_multierror_fast_nursery` +- `tests/test_subint_cancellation.py::test_subint_non_checkpointing_child` + +`method='thread'` documented inline as load-bearing — the +GIL-starvation path that drops `SIGINT` would equally drop +`SIGALRM`, so only a watchdog-thread timeout can reliably +escape. + +`skipon_spawn_backend` plugin refactored into a single +`iter_markers`-driven loop in `pytest_collection_modifyitems` +(~30 LOC replacing ~30 LOC of nested conditionals). Four +bugs dissolved: wrong `.get()` key, module-level `pytestmark` +suppressing per-test marks, unhandled `pytestmark = [list]` +form, `pytest.Makr` typo. Marker help text updated to +document the variadic backend-list + `reason=` kwarg +surface. + +**Sub-phase 3** — Prototype drafted (then reverted): + +- `tractor/spawn/_subint_fork.py` — new dedicated submodule + housing the `subint_fork_proc` stub. Module docstring + + fn docstring explain the attempt, the CPython-level + block, and the reason for keeping the stub in-tree + (documentation of the attempt + starting point if CPython + ever lifts the restriction). +- `tractor/spawn/_spawn.py` — `'subint_fork'` registered as a + `SpawnMethodKey` literal + in `_methods`, so + `--spawn-backend=subint_fork` routes to a clean + `NotImplementedError` pointing at the analysis doc rather + than an "invalid backend" error. +- `ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md` — + third sibling conc-anal doc. Full annotated CPython + source walkthrough from user-visible + `Fatal Python error` → `Modules/posixmodule.c:728 + PyOS_AfterFork_Child()` → `Python/pystate.c:1040 + _PyInterpreterState_DeleteExceptMain()` gate. Includes a + copy-paste-ready upstream-report draft for the CPython + issue tracker with a two-tier ask (ideally "make it work", + minimally "cleaner error than `Fatal Python error` + aborting the child"). +- `ai/conc-anal/subint_fork_from_main_thread_smoketest.py` — + standalone zero-tractor-import CPython-level smoke test + for the user's proposed workaround architecture + (forkserver on a main-interp worker thread). Four + argparse-driven scenarios: `control_subint_thread_fork` + (reproduces the known-broken case as a test-harness + sanity), `main_thread_fork` (baseline), `worker_thread_fork` + (architectural assertion), `full_architecture` + (end-to-end trio-in-subint in forked child). User will + run on py3.14 next. + +## Files changed + +See `git log 26fb820..HEAD --stat` for the canonical list. 
+New files this session: +- `tractor/spawn/_subint_fork.py` +- `ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md` +- `ai/conc-anal/subint_fork_from_main_thread_smoketest.py` + +Modified (diff pointers in raw log): +- `tractor/spawn/_subint.py` (py3.14 gate) +- `tractor/spawn/_spawn.py` (`subint_fork` registration) +- `tractor/_testing/pytest.py` (`skipon_spawn_backend` refactor) +- `pyproject.toml` (`pytest-timeout` dep) +- `tests/discovery/test_registrar.py`, + `tests/test_cancellation.py`, + `tests/test_subint_cancellation.py` (timeout marks, + cross-refs to conc-anal docs) + +## Human edits + +Several back-and-forth iterations with user-driven +adjustments during the session: + +- User corrected my initial mis-classification of + `test_cancel_while_childs_child_in_sync_sleep[subint-False]` + as Ctrl-C-able — second strace showed `EAGAIN`, putting + it squarely in class A (GIL-starvation). Re-analysis + preserved in the raw log. +- User independently fixed the `.get(reason)` → `.get('reason', reason)` + bug in the marker plugin before my review; preserved their + fix. +- User suggested moving the `subint_fork_proc` stub from + the bottom of `_subint.py` into its own + `_subint_fork.py` submodule — applied. +- User asked to keep the forkserver-architecture + discussion as background for the smoke-test rather than + committing to a tractor-side refactor until the smoke + test validates the CPython-level assumptions. + +Commit messages in this range (b025c982 … 797f57c) were +drafted via `/commit-msg` + `rewrap.py --width 67`; user +landed them with the usual review. diff --git a/ai/prompt-io/claude/20260422T200723Z_797f57c_prompt_io.raw.md b/ai/prompt-io/claude/20260422T200723Z_797f57c_prompt_io.raw.md new file mode 100644 index 000000000..395523fef --- /dev/null +++ b/ai/prompt-io/claude/20260422T200723Z_797f57c_prompt_io.raw.md @@ -0,0 +1,343 @@ +--- +model: claude-opus-4-7[1m] +service: claude +timestamp: 2026-04-22T20:07:23Z +git_ref: 797f57c +diff_cmd: git log 26fb820..HEAD # all session commits since the destroy-race fix log +--- + +Session-spanning conversation covering the Phase B hardening +of the `subint` spawn-backend and an investigation into a +proposed `subint_fork` follow-up which turned out to be +blocked at the CPython level. This log is a narrative capture +of the substantive turns (not every message) and references +the concrete code + docs the session produced. Per diff-ref +mode the actual code diffs are pointed at via `git log` on +each ref rather than duplicated inline. + +## Narrative of the substantive turns + +### Py3.13 hang / gate tightening + +Diagnosed a reproducible hang of the `subint` backend under +py3.13 (test_spawning tests wedge after root-actor bringup). +Root cause: py3.13's vintage of the private `_interpreters` C +module has a latent thread/subint-interaction issue that +`_interpreters.exec()` silently fails to progress under +tractor's multi-trio usage pattern — even though a minimal +standalone `threading.Thread` + `_interpreters.exec()` +reproducer works fine on the same Python. Empirically +py3.14 fixes it. + +Fix (from this session): tighten the `_has_subints` gate in +`tractor.spawn._subint` from "private module importable" to +"public `concurrent.interpreters` present" — which is 3.14+ +only. This leaves `subint_proc()` unchanged in behavior (we +still call the *private* `_interpreters.create('legacy')` +etc. under the hood) but refuses to engage on 3.13. 
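(Illustrative sketch only — not the in-tree diff; the
`_has_subints` name matches the narrative above but the
exact shape and wording are assumptions:)

```python
# sketch: presence gate on the public PEP 734 module (3.14+ only),
# replacing the old "private `_interpreters` importable" check
try:
    import concurrent.interpreters  # noqa: F401
    _has_subints: bool = True
except ImportError:
    _has_subints = False
```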
+ +Also tightened the matching gate in +`tractor.spawn._spawn.try_set_start_method('subint')` and +rev'd the corresponding error messages from "3.13+" to +"3.14+" with a sentence explaining why. Test-module +`pytest.importorskip` switched from `_interpreters` → +`concurrent.interpreters` to match. + +### `pytest-timeout` dep + `skipon_spawn_backend` marker plumbing + +Added `pytest-timeout>=2.3` to the `testing` dep group with +an inline comment pointing at the `ai/conc-anal/*.md` docs. +Applied `@pytest.mark.timeout(30, method='thread')` (the +`method='thread'` is load-bearing — `signal`-method +`SIGALRM` suffers the same GIL-starvation path that drops +`SIGINT` in the class-A hang pattern) to the three known- +hanging subint tests cataloged in +`subint_sigint_starvation_issue.md`. + +Separately code-reviewed the user's newly-staged +`skipon_spawn_backend` pytest marker implementation in +`tractor/_testing/pytest.py`. Found four bugs: + +1. `modmark.kwargs.get(reason)` called `.get()` with the + *variable* `reason` as the dict key instead of the string + `'reason'` — user-supplied `reason=` was never picked up. + (User had already fixed this locally via `.get('reason', + reason)` by the time my review happened — preserved that + fix.) +2. The module-level `pytestmark` branch suppressed per-test + marker handling (the `else:` was an `else:` rather than + independent iteration). +3. `mod_pytestmark.mark` assumed a single + `MarkDecorator` — broke on the valid-pytest `pytestmark = + [mark, mark]` list form. +4. Typo: `pytest.Makr` → `pytest.Mark`. + +Refactored the hook to use `item.iter_markers(name=...)` +which walks function + class + module scopes uniformly and +handles both `pytestmark` forms natively. ~30 LOC replaced +the original ~30 LOC of nested conditionals, all four bugs +dissolved. Also updated the marker help string to reflect +the variadic `*start_methods` + `reason=` surface. + +### `subint_fork_proc` prototype attempt + +User's hypothesis: the known trio+`fork()` issues +(python-trio/trio#1614) could be sidestepped by using a +sub-interpreter purely as a launchpad — `os.fork()` from a +subint that has never imported trio → child is in a +trio-free context. In the child `execv()` back into +`python -m tractor._child` and the downstream handshake +matches `trio_proc()` identically. + +Drafted the prototype at `tractor/spawn/_subint.py`'s bottom +(originally — later moved to its own submod, see below): +launchpad-subint creation, bootstrap code-string with +`os.fork()` + `execv()`, driver-thread orchestration, +parent-side `ipc_server.wait_for_peer()` dance. Registered +`'subint_fork'` as a new `SpawnMethodKey` literal, added +`case 'subint' | 'subint_fork':` feature-gate arm in +`try_set_start_method()`, added entry in `_methods` dict. + +### CPython-level block discovered + +User tested on py3.14 and saw: + +``` +Fatal Python error: _PyInterpreterState_DeleteExceptMain: not main interpreter +Python runtime state: initialized + +Current thread 0x00007f6b71a456c0 [subint-fork-lau] (most recent call first): + File "