fix(codex): force valid model on app-server spawn to avoid silent hang #34
Merged
SMA-24: Verify orchestrator fixes from PR #19 + PR #21 cherry-picks.

- 2 regression tests pinning PR #19 (workflow_dir + reuse_policy + hook_env simultaneous reload) and PR #21 (start_session failure cleanup) invariants.
- New docs/llm-wiki/orchestrator-phase-transition.md entry.
- Zero production code changes (git diff main -- src/ empty pre-merge).

QA: targeted 91 passed / 1 skipped, full suite 524 passed / 6 skipped.
WORKFLOW.md `codex.command` now passes `-c model=gpt-5-codex` so codex 0.130's `thread/start` returns a real response even when the user's `~/.codex/config.toml` pins an invalid model.

Without the override, codex hangs silently on `thread/start` (no JSON-RPC error, no exit), and Symphony eventually surfaces this as the confusing `error: port_exit: subprocess stdout closed` after the subprocess is killed for unrelated reasons.

Overriding at the command level keeps the user-level codex config untouched, since other tools that share `~/.codex/` may rely on whatever the user has set.

Full diagnosis (reproduction, root cause, upstream + Symphony follow-ups) lives at docs/SMA-26/diagnosis.md.
…nched backends

When the Symphony orchestrator is launched in the background (e.g. `nohup python -m symphony.cli ... &` or a systemd-style unit with no TTY), the orchestrator process inherits a closed or half-broken fd 0. The previous `subprocess.run` for hooks did not pin `stdin`, so the hook's `bash` and any grandchild it spawned inherited that broken fd.

The most visible victim is `after_create`'s `scripts/symphony-setup-worktree.sh`, which runs `python -m venv .venv` + `pip install`. CPython aborts at startup with Fatal Python error: init_sys_streams: can't initialize sys standard streams / OSError: [Errno 9] Bad file descriptor and the orchestrator surfaces this as a confusing `hook after_create exited 1`. Symptomatically, the very first hook invocation per backend tends to succeed (workspace freshly created) while subsequent ones fail. This was observed live as 4+ consecutive `hook_failed` after ~9 healthy dispatches against the same backend on macOS.

The one-line `stdin=subprocess.DEVNULL` pins fd 0 for every hook spawn regardless of how the parent was launched. It has no effect on hooks that don't read stdin (every existing hook in the repo).

Pairs with the `-c model=gpt-5-codex` codex command override on this branch: both are backend-reliability fixes that make Symphony behave the same whether you run it from a TTY or in the background.
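The stdin pinning described in this commit message can be sketched as follows. This is a minimal illustration and not Symphony's actual hook runner: `run_hook` and its parameters are hypothetical names, and only the `stdin=subprocess.DEVNULL` argument is taken from the commit message.

```python
import subprocess
import sys


def run_hook(argv, cwd=None, env=None, timeout=300):
    """Run a hook command with fd 0 pinned to /dev/null (hypothetical helper)."""
    return subprocess.run(
        argv,
        cwd=cwd,
        env=env,
        # Never inherit the parent's stdin, which may be closed or broken
        # when the orchestrator was started without a TTY.
        stdin=subprocess.DEVNULL,
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,
    )


# A child that reads stdin sees immediate EOF rather than a bad descriptor.
probe = run_hook([sys.executable, "-c", "import sys; print(len(sys.stdin.read()))"])
print(probe.returncode, probe.stdout.strip())
```

Because the child always gets a valid, empty stdin, interpreter startup in grandchildren (such as `python -m venv`) no longer depends on how the orchestrator itself was launched.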
cskwork added a commit that referenced this pull request on May 17, 2026
Patch release rolling up the post-0.6.0 reliability fixes:

- #19 refresh workflow dir on hook reload (cherry-pick)
- #21 stop failed phase-transition backend (cherry-pick)
- #22 isolate corrupt file tickets during scan
- #23 honor symphony.autocommitExclude in commit_workspace_on_done (opt-in escape hatch; default behavior unchanged)
- #34 force valid codex model on app-server spawn to avoid silent hang
- #34 pin hook stdin to /dev/null for background-launched backends

All changes are bug fixes restoring intended behavior: no user-facing feature additions or signature changes. Pinning pyproject.toml and src/symphony/__init__.py in lockstep.
cskwork added a commit that referenced this pull request on May 17, 2026
SMA-25 (Verify autocommitExclude mechanism from PR #23) ran on codex and was self-Blocked at Learn when the merge gate failed against .git/worktrees/<ID>/: the codex sandbox refused writes through the worktree's git admin dir. That sandbox gap is fixed separately in PR #36.

The agent reached Learn with substantive artefacts though, and they are worth keeping independently of the merge-gate failure. This recovers only the high-signal files from `symphony/SMA-25` (commit 9c964fe) and leaves out the per-turn status echoes and raw pytest JSON/diff runs that were progress noise rather than reference material.

Recovered:

- docs/SMA-25/{explore,plan,work,qa,learn}/*: phase deliverables (Explore notes + reuse inventory, implementation plan, work + qa-rewind summaries, QA api-surface + details, Learn details).
- docs/features/SMA-25/index.md: As-Is/To-Be one-pager.
- docs/llm-wiki/workspace-auto-commit-excludes.md: new wiki entry covering the opt-in `symphony.autocommitExclude` mechanism, including the base-squash safety case Explore surfaced beyond the original brief.
- docs/llm-wiki/INDEX.md: one new row for the wiki entry (kept the existing SMA-24 `orchestrator-phase-transition` row, which the SMA-25 branch had silently dropped because it was forked before SMA-24 merged into main).

Not recovered (deliberately):

- docs/SMA-25/todo/turn-*-status.md + blocker.md (stale-worker echo logs from the pre-restart cycle).
- docs/SMA-25/qa/runs/*.json + qa/diff/*.diff (raw pytest output, large and reproducible from `pytest` directly).
- src/symphony/*.py / tests/*.py / pyproject.toml / __init__.py changes on the SMA-25 branch: those were forked before PR #19/#21/#22/#23/#34/#36 landed and would revert main.
Summary

- `codex.command` now passes `-c model=gpt-5-codex` to codex `app-server` so an invalid model in the user's `~/.codex/config.toml` no longer makes `thread/start` hang silently.
- `docs/SMA-26/diagnosis.md` with reproduction, root cause, and follow-up notes.

Why

codex `0.130.0` does not return a JSON-RPC error on `thread/start` when the configured model is invalid; it just stops responding. Symphony's `_request()` has no per-call timeout, so the silent hang surfaces much later as the confusing `error: port_exit: subprocess stdout closed` once the subprocess is killed for unrelated reasons. Six straight retries on SMA-25 hit exactly this failure mode.

The override pins a valid model at spawn time without mutating the user's home config (other tools that share `~/.codex/` may rely on whatever the user has set).

How verified

- Manual JSON-RPC handshake against `codex app-server` (with and without the override):
  - Without `-c model=...`: only the `initialize` response comes back, `thread/start` (id=2) returns zero bytes, and `timeout` kills the process at 7s.
  - With the override: `thread/start` responds immediately with `result.thread.id = 019e34f0-c760-7540-b18e-d9fbfedd65bd` plus the expected `thread/started` notification.
- Symphony backend dispatch verified live; see `docs/SMA-26/diagnosis.md` for the full trace and file:line references.

Test plan

- […] `pytest -q`).
- […] `~/.codex/config.toml`, a codex-routed ticket dispatches without `port_exit`.
- […] `~/.codex/config.toml`, the override has no negative effect; codex picks `gpt-5-codex` as expected.

Follow-ups (separate)

- […] `thread/start` for invalid models instead of hanging.
- […] `_request()` (or at minimum to `thread/start`) so silent hangs surface fast as a clear `RequestTimeout` rather than a stale `port_exit` minutes later.
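The per-call timeout proposed in the follow-ups could look roughly like the sketch below. It is an illustration under stated assumptions, not Symphony's `_request()`: `JsonRpcPort`, `FAKE_SERVER`, and `demo` are hypothetical names, the wire format is assumed to be one JSON object per line, and a small stand-in child process imitates a server that answers `initialize` but goes silent on `thread/start`. Only the `RequestTimeout` name and the two method names come from the PR description.

```python
import json
import queue
import subprocess
import sys
import threading
import time


class RequestTimeout(Exception):
    """Raised when no matching response arrives before the deadline."""


# Stand-in child: answers `initialize`, silently swallows everything else.
FAKE_SERVER = r'''
import json, sys
for raw in sys.stdin:
    req = json.loads(raw)
    if req["method"] == "initialize":
        print(json.dumps({"id": req["id"], "result": {}}), flush=True)
'''


class JsonRpcPort:
    """Line-delimited JSON-RPC over a child's stdin/stdout (hypothetical)."""

    def __init__(self, argv):
        self.proc = subprocess.Popen(
            argv, stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True
        )
        self._lines = queue.Queue()
        # Read stdout on a daemon thread so a silent child never blocks the caller.
        threading.Thread(target=self._pump, daemon=True).start()

    def _pump(self):
        for line in self.proc.stdout:
            self._lines.put(line)
        self._lines.put(None)  # sentinel: stdout closed

    def request(self, req_id, method, params=None, timeout=5.0):
        msg = {"id": req_id, "method": method, "params": params or {}}
        self.proc.stdin.write(json.dumps(msg) + "\n")
        self.proc.stdin.flush()
        deadline = time.monotonic() + timeout
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                raise RequestTimeout(f"{method} (id={req_id}) timed out")
            try:
                line = self._lines.get(timeout=remaining)
            except queue.Empty:
                raise RequestTimeout(f"{method} (id={req_id}) timed out") from None
            if line is None:
                raise ConnectionError("subprocess stdout closed")
            if not line.strip():
                continue
            reply = json.loads(line)
            if reply.get("id") == req_id:
                return reply
            # Notifications and unrelated replies are skipped; keep waiting.

    def close(self):
        self.proc.kill()
        self.proc.wait()


def demo():
    port = JsonRpcPort([sys.executable, "-c", FAKE_SERVER])
    try:
        init = port.request(1, "initialize", timeout=10.0)
        try:
            port.request(2, "thread/start", timeout=0.5)
            outcome = "responded"
        except RequestTimeout:
            outcome = "timeout"
        return init, outcome
    finally:
        port.close()
```

Calling `demo()` exercises both paths: the `initialize` reply is returned normally, while the unanswered `thread/start` fails fast with `RequestTimeout` instead of waiting for the child to be killed.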