feat: model Codex subagent workflows by tpn · Pull Request #242 · wesm/agentsview

tpn · 2026-03-26T17:33:49Z

Closes #233.

Summary

model Codex spawn_agent calls as real subagent-linked tool calls using existing subagent_session_id plumbing
turn terminal Codex wait / <subagent_notification> status into structured tool output instead of raw user-lane JSON
fall back to full parse only for Codex subagent-specific incremental events, and bump dataVersion so upgraded hosts auto-resync and re-export historical file-backed Codex sessions

Details

This PR teaches the Codex parser to preserve the information needed for the existing subagent UI path instead of dropping it on the floor:

function_call messages now retain call_id as ToolUseID
spawn_agent is categorized as a Task
spawn_agent outputs map the parent tool call to the child Codex session ID (codex:<agent_id>)
wait terminal results become paired tool output
<subagent_notification> becomes a fallback source of terminal output when no matching wait result was recorded
non-terminal running notifications are ignored so they do not suppress later completion/error results

The frontend also hides raw <subagent_notification> transcript rows, so already-synced databases stop showing the JSON blobs immediately after upgrading.

Upgrade / Rollout

no SQLite schema rebuild required
no PostgreSQL drop/recreate required
dataVersion is bumped so upgraded hosts automatically perform a full local resync
agentsview pg push already forces a full PG push after a local resync, so exporters will re-export automatically on the next full push path

Practical effect for self-hosted PG users:

upgrade the exporter binary on each host
run agentsview pg push --full once, or let the existing periodic exporter do the next full-sync cycle
keep the shared PG instance in place

Known limit:

historical sessions whose raw Codex JSONL source file is already gone cannot be retroactively relinked, because the old SQLite archive never stored the dropped function_call_output linkage data

Test Plan

CC=/usr/bin/gcc CXX=/usr/bin/g++ CGO_ENABLED=1 go test -tags fts5 ./internal/db -count=1
CC=/usr/bin/gcc CXX=/usr/bin/g++ CGO_ENABLED=1 go test -tags fts5 ./internal/parser -count=1
npm test -- --run src/lib/utils/messages.test.ts

roborev-ci · 2026-03-26T17:37:40Z

roborev: Combined Review (`79dbf9d`)

The PR requires changes to
address two medium-severity issues related to Codex subagent state tracking and result threading.

Medium

Location: internal/parser/codex.go:164, internal/parser/codex.go:244
Problem: Subagent completion notifications are
always associated with the original spawn_agent call because agentCalls is only populated from spawn_agent outputs. If Codex issues a wait for that agent before the notification arrives, the notification still gets attached to spawn_agent, and a later wait output can produce a second
completion result for the same agent. This misthreads the result and can duplicate the completion message.
Fix: When parsing a wait call, extract its ids and remap those agent IDs to the wait call ID, or otherwise defer notification/result binding until you know whether a
matching wait exists.
Location: internal/parser/codex.go:224
Problem: handleFunctionCallOutput marks every agent in status.Map() as completed once any terminal text is rendered. For mixed wait results where one agent
is completed/errored and another is still running, the running agent is incorrectly marked in agentResults, so its later completion notification/output will be dropped.
Fix: Only set agentResults[agentID] = true for entries whose status actually produced terminal text (
completed or errored), not for every key in the status object.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

tpn · 2026-03-26T18:26:26Z

Addressed the Codex subagent threading feedback in 8856c1b.

Fixes included:

remap waited agent ids to the wait call so later notifications bind to the right tool call
mark only terminal (completed / errored) agent statuses as resolved
keep multi-agent wait output deterministic by sorting agent ids
dedupe orphaned terminal notifications instead of emitting repeated plain user rows

roborev-ci · 2026-03-26T18:38:49Z

roborev: Combined Review (`8856c1b`)

Review completed: 1 Medium severity
issue found regarding duplicate subagent results.

Medium

Location: internal/parser/codex.go around handleFunctionCallOutput (wait branch)
Problem: A terminal <subagent_notification> that arrives after the wait call but before the wait function_call_output will be emitted twice. handleSubagentNotification marks the agent as completed and appends a tool result, but the later wait output path does not consult agentResults before appending its own tool result for the same agent.
Fix: Filter already
-completed agent IDs out of the wait output before emitting a result, or skip emitting the wait tool result when every terminal status in that payload was already recorded from notifications.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

tpn · 2026-03-26T18:58:28Z

Latest roborev item from the PR thread is addressed in a169042.

This follow-up makes the wait path skip agent ids that already produced terminal output via notifications, so a terminal notification followed by a later terminal wait result no longer duplicates the same completion output.

roborev-ci · 2026-03-26T19:39:49Z

roborev: Combined Review (`a169042`)

Verdict: The changes successfully implement Codex subagent workflows, but there is a synchronization issue with how terminal subagent notifications are finalized.

Medium

Location:
internal/parser/codex.go (handleSubagentNotification)
Problem: Terminal subagent notifications are finalized immediately against the current owner in agentCalls. If a child finishes before the parent later emits a wait call for that same agent, the completion is permanently attached to spawn _agent (or emitted as a standalone user message), and the later wait tool call is left without any result because agentResults[agentID] already suppresses it.
Fix: Keep terminal status text pending per agent_id and prefer rebinding it to a later wait call when one appears, or defer deduplication until the parser knows whether a wait call exists later in the session.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

tpn · 2026-03-26T20:03:56Z

Addressed the latest PR-thread roborev finding in d2f4db9.

This follow-up keeps terminal subagent notifications pending until the parser knows whether a later wait call should own them, so a completion that arrives before wait now rebinds correctly instead of being frozen onto spawn_agent or dropped into a standalone user row.

roborev-ci · 2026-03-26T20:40:50Z

roborev: Combined Review (`d2f4db9`)

Summary Verdict: One medium-severity issue identified regarding the chronological ordering of orphaned sub
agent notifications in the Codex parser.

Medium

Location: internal/parser/codex.go (handleSubagentNotification, flushPendingAgentResults)
Problem: Terminal subagent notifications that arrive after spawn_agent but before any wait are buffered until EOF
. If later transcript messages follow and no later wait rebind happens, the synthesized tool-result message is appended after those later messages, and orphaned notifications are additionally reordered by agent ID during the final flush. That breaks the original conversation chronology in the parsed output.
Fix: Preserve buffered notification order
. Either emit the fallback result as soon as it is clear no later wait will claim it, or keep pending notifications in arrival order and insert/flush them before later messages rather than only at end-of-file.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

tpn · 2026-03-26T21:55:13Z

Addressed the latest PR-thread roborev finding in 63bae3b.

This follow-up preserves chronology for no-wait fallback cases by reserving notification order and inserting the synthesized result back into the parsed stream at that original position when no later wait claims it.

roborev-ci · 2026-03-26T22:15:03Z

roborev: Combined Review (`63bae3b`)

Synthesis unavailable. Showing individual review outputs.

codex — default (done)

Review Findings

Severity: Medium; Location: internal/parser/codex.go:177-223, internal/parser/codex.go:1000-1063; Problem: Incremental parsing still misses the case where a later wait call should rebind a subagent result that was already persisted earlier as a fallback under spawn_agent or as an orphaned notification. ParseCodexSessionFrom builds a fresh builder with no prior pendingAgentResults, and codexIncrementalNeedsFullParse() only forces a full parse on notification/output lines, not on appended wait calls, so a session can remain permanently threaded to the wrong tool call until another subagent event happens. Fix: Persist enough subagent state across incremental parses to support rebinding, or conservatively force a full parse when a Codex wait call is appended.
Severity: Low; Location: internal/parser/codex.go:241-266, internal/parser/codex.go:316-351; Problem: Duplicate terminal notifications received before a result is materialized overwrite the earlier pending entry in pendingAgentResults, including its original ordinal/timestamp. That means deduping can move the surviving result later in the timeline than the first completion event, which breaks chronology when there are intervening messages. Fix: When a pending terminal result for an agent already exists, keep the earliest pending entry instead of overwriting it.

Summary

Adds Codex subagent parsing/threading support, treats subagent notifications as system messages, and bumps the parser data version for resync.

codex — security (done)

Summary: No security issues identified in the reviewed diff. The changes are confined to local Codex session parsing, subagent result correlation, and frontend system-message classification, and they do not introduce new injection, authz/authn, credential exposure, path traversal, dependency, CI/CD, concurrency, or sensitive-data handling risks under the repository’s stated threat model.

No issues found.

gemini — default (failed)

Error: Review failed. Check CI logs for details.

tpn · 2026-03-26T22:44:09Z

Addressed the newest PR-thread roborev finding in 63dc056.

This follow-up forces a full reparse when a late Codex wait call is appended after earlier notifications, and it also preserves the earliest pending terminal notification instead of overwriting it with later duplicates.

roborev-ci · 2026-03-26T23:16:53Z

roborev: Combined Review (`63dc056`)

Synthesis unavailable. Showing individual review outputs.

codex — default (done)

Review Findings

Severity: Medium
Location: internal/parser/codex.go:46, internal/parser/codex.go:241
Problem: Result de-duplication is keyed globally by agent_id (agentResults map[string]bool), so once any terminal result is recorded for an agent, later wait calls for that same agent will have their function_call_output silently dropped. This leaves a valid later tool call with no matching tool result and breaks chronology for repeated waits/retries.
Fix: Track delivered results per call_id + agent_id (or per bound wait call) instead of per agent globally, so only duplicate sources for the same wait invocation are suppressed.
Severity: Low
Location: internal/parser/codex.go:1063
Problem: codexIncrementalNeedsFullParse forces a full reparse for every <subagent_notification>, including nonterminal statuses like running that handleSubagentNotification intentionally ignores. Long-running subagents will now trigger unnecessary resync churn with no persisted message change.
Fix: Parse the notification status here as well and require a full parse only for terminal completed/errored notifications that can actually affect stored messages.

Summary

This change adds Codex subagent/session linking and wait-result threading, including notification fallback and incremental-parse fallback logic.

codex — security (done)

Summary: The diff is limited to Codex session parsing, message classification, and test helpers. I did not identify any security regressions related to injection, auth/access control, credential exposure, path traversal, unsafe deserialization, dependency/CI issues, sensitive data handling, concurrency-based security bypasses, or error-driven information leakage.

No issues found.

gemini — default (failed)

Error: Review failed. Check CI logs for details.

tpn · 2026-03-27T00:39:19Z

Im going back to the drawing board on the Codex subagent threading so this lands as a coherent fix instead of another incremental patch. The first few rounds showed that trying to force this workflow into the existing single-result pairing model is still leaving chronology/ownership edge cases, so Im reworking it more comprehensively before the next push.

tpn · 2026-03-28T04:12:57Z

Superseded by #247.

I kept running into ownership and chronology edge cases while trying to fit Codex subagent terminal status into the existing single tool_calls.result_content model. The v2 PR switches to first-class tool_result_events in SQLite + PostgreSQL, keeps result_content as a derived/latest summary, and renders expandable chronological history in the UI.

Please review #247 instead of this branch for the current approach.

Closes #233. Supersedes #242. ## Summary - model Codex subagent terminal status as first-class `tool_result_events` in SQLite and PostgreSQL - keep `tool_calls.result_content` as a derived/latest compatibility summary instead of the canonical storage shape - render the latest summary by default and expose full chronological subagent result history in the tool block UI - preserve event history across full resyncs, PG pushes, and orphan-copy recovery ## How This Differs From #242 `#242` tried to solve Codex subagent workflows inside the existing single-result model by: - rebinding `wait` / `<subagent_notification>` output into one mutable `tool_calls.result_content` - synthesizing tool-result-like transcript rows to preserve some ordering information - adding parser heuristics to decide which source should "own" the final blob That approach improved the raw JSON problem, but it kept running into ownership and chronology edge cases because Codex subagent status is not actually a single final blob. This PR switches to a different model: - add additive `tool_result_events` tables in both SQLite and PostgreSQL - attach chronological result events directly to the originating tool call - derive `tool_calls.result_content` from those events for compatibility/search/compact UI - keep the existing subagent session linkage on `spawn_agent` Practical effect: - chronology is preserved without retroactively mutating older transcript rows - `wait` output and fallback notifications are stored as canonical event history - PG-backed multi-host views and local SQLite views share the same result-event model ## Details - `spawn_agent` remains a `Task` tool call and still maps to `subagent_session_id` - terminal `wait` output becomes `tool_result_events` - terminal `<subagent_notification>` messages also become `tool_result_events` - blocked categories are enforced server-side for event history as well as the compact summary - orphan-copy / resync paths preserve stored event history for sessions whose source files no longer exist ## Upgrade / Rollout - SQLite: additive schema change plus `dataVersion` bump - PostgreSQL: additive schema change, no drop/recreate required - exporters should upgrade and run `agentsview pg push --full` once, or let the normal full-sync path re-export after resync Known limit: - sessions whose original Codex JSONL source is already gone can only preserve event history that was already stored in SQLite; they cannot derive brand new history from missing raw files ## Test Plan - `CC=/usr/bin/gcc CXX=/usr/bin/g++ CGO_ENABLED=1 go test -tags fts5 ./internal/parser ./internal/sync ./internal/db ./internal/postgres -count=1` - `cd frontend && npm test -- --run src/lib/components/content/ToolBlock.test.ts src/lib/utils/messages.test.ts` ## Local Review - local `codex`, `claude-code`, and `gemini` review runs covered both `default` and `security` - the first v2 pass surfaced issues around stale `tool_result_events`, blocked-category history leakage, and same-ordinal orphan chronology - those follow-up fixes are included in this branch

tpn added 4 commits March 26, 2026 10:17

feat: model codex subagent workflows

4298da4

fix: preserve codex notification fallbacks

b289b7f

fix: ignore nonterminal codex notifications

d8c0e43

fix: trigger resync for codex subagent rollout

79dbf9d

tpn mentioned this pull request Mar 26, 2026

feat: model Codex subagent workflows instead of showing raw subagent_notification JSON #233

Closed

fix: thread codex subagent results correctly

8856c1b

fix: dedupe codex wait and notification results

a169042

fix: rebind pending codex wait results

d2f4db9

fix: preserve codex subagent chronology

63bae3b

fix: reparse late codex wait bindings

63dc056

tpn mentioned this pull request Mar 28, 2026

feat: model Codex subagent result events #247

Merged

wesm closed this Mar 28, 2026

Conversation

tpn commented Mar 26, 2026

Summary

Details

Upgrade / Rollout

Test Plan

Uh oh!

roborev-ci bot commented Mar 26, 2026

roborev: Combined Review (79dbf9d)

Medium

Uh oh!

tpn commented Mar 26, 2026

Uh oh!

roborev-ci bot commented Mar 26, 2026

roborev: Combined Review (8856c1b)

Medium

Uh oh!

tpn commented Mar 26, 2026

Uh oh!

roborev-ci bot commented Mar 26, 2026

roborev: Combined Review (a169042)

Medium

Uh oh!

tpn commented Mar 26, 2026

Uh oh!

roborev-ci bot commented Mar 26, 2026

roborev: Combined Review (d2f4db9)

Medium

Uh oh!

tpn commented Mar 26, 2026

Uh oh!

roborev-ci bot commented Mar 26, 2026

roborev: Combined Review (63bae3b)

codex — default (done)

Review Findings

Summary

codex — security (done)

gemini — default (failed)

Uh oh!

tpn commented Mar 26, 2026

Uh oh!

roborev-ci bot commented Mar 26, 2026

roborev: Combined Review (63dc056)

codex — default (done)

Review Findings

Summary

codex — security (done)

gemini — default (failed)

Uh oh!

tpn commented Mar 27, 2026

Uh oh!

tpn commented Mar 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

roborev: Combined Review (`79dbf9d`)

roborev: Combined Review (`8856c1b`)

roborev: Combined Review (`a169042`)

roborev: Combined Review (`d2f4db9`)

roborev: Combined Review (`63bae3b`)

roborev: Combined Review (`63dc056`)