Skip to content

feat: model Codex subagent workflows#242

Closed
tpn wants to merge 9 commits intowesm:mainfrom
tpn:tpn/233-codex-subagent
Closed

feat: model Codex subagent workflows#242
tpn wants to merge 9 commits intowesm:mainfrom
tpn:tpn/233-codex-subagent

Conversation

@tpn
Copy link
Copy Markdown
Contributor

@tpn tpn commented Mar 26, 2026

Closes #233.

Summary

  • model Codex spawn_agent calls as real subagent-linked tool calls using existing subagent_session_id plumbing
  • turn terminal Codex wait / <subagent_notification> status into structured tool output instead of raw user-lane JSON
  • fall back to full parse only for Codex subagent-specific incremental events, and bump dataVersion so upgraded hosts auto-resync and re-export historical file-backed Codex sessions

Details

This PR teaches the Codex parser to preserve the information needed for the existing subagent UI path instead of dropping it on the floor:

  • function_call messages now retain call_id as ToolUseID
  • spawn_agent is categorized as a Task
  • spawn_agent outputs map the parent tool call to the child Codex session ID (codex:<agent_id>)
  • wait terminal results become paired tool output
  • <subagent_notification> becomes a fallback source of terminal output when no matching wait result was recorded
  • non-terminal running notifications are ignored so they do not suppress later completion/error results

The frontend also hides raw <subagent_notification> transcript rows, so already-synced databases stop showing the JSON blobs immediately after upgrading.

Upgrade / Rollout

  • no SQLite schema rebuild required
  • no PostgreSQL drop/recreate required
  • dataVersion is bumped so upgraded hosts automatically perform a full local resync
  • agentsview pg push already forces a full PG push after a local resync, so exporters will re-export automatically on the next full push path

Practical effect for self-hosted PG users:

  1. upgrade the exporter binary on each host
  2. run agentsview pg push --full once, or let the existing periodic exporter do the next full-sync cycle
  3. keep the shared PG instance in place

Known limit:

  • historical sessions whose raw Codex JSONL source file is already gone cannot be retroactively relinked, because the old SQLite archive never stored the dropped function_call_output linkage data

Test Plan

  • CC=/usr/bin/gcc CXX=/usr/bin/g++ CGO_ENABLED=1 go test -tags fts5 ./internal/db -count=1
  • CC=/usr/bin/gcc CXX=/usr/bin/g++ CGO_ENABLED=1 go test -tags fts5 ./internal/parser -count=1
  • npm test -- --run src/lib/utils/messages.test.ts

@roborev-ci
Copy link
Copy Markdown

roborev-ci bot commented Mar 26, 2026

roborev: Combined Review (79dbf9d)

The PR requires changes to
address two medium-severity issues related to Codex subagent state tracking and result threading.

Medium

  • Location: internal/parser/codex.go:164, internal/parser/codex.go:244
    Problem: Subagent completion notifications are
    always associated with the original spawn_agent call because agentCalls is only populated from spawn_agent outputs. If Codex issues a wait for that agent before the notification arrives, the notification still gets attached to spawn_agent, and a later wait output can produce a second
    completion result for the same agent. This misthreads the result and can duplicate the completion message.
    Fix: When parsing a wait call, extract its ids and remap those agent IDs to the wait call ID, or otherwise defer notification/result binding until you know whether a
    matching wait exists.

  • Location: internal/parser/codex.go:224
    Problem: handleFunctionCallOutput marks every agent in status.Map() as completed once any terminal text is rendered. For mixed wait results where one agent
    is completed/errored and another is still running, the running agent is incorrectly marked in agentResults, so its later completion notification/output will be dropped.
    Fix: Only set agentResults[agentID] = true for entries whose status actually produced terminal text (
    completed or errored), not for every key in the status object.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

@tpn
Copy link
Copy Markdown
Contributor Author

tpn commented Mar 26, 2026

Addressed the Codex subagent threading feedback in 8856c1b.

Fixes included:

  • remap waited agent ids to the wait call so later notifications bind to the right tool call
  • mark only terminal (completed / errored) agent statuses as resolved
  • keep multi-agent wait output deterministic by sorting agent ids
  • dedupe orphaned terminal notifications instead of emitting repeated plain user rows

@roborev-ci
Copy link
Copy Markdown

roborev-ci bot commented Mar 26, 2026

roborev: Combined Review (8856c1b)

Review completed: 1 Medium severity
issue found regarding duplicate subagent results.

Medium

  • Location: internal/parser/codex.go around handleFunctionCallOutput (wait branch)
  • Problem: A terminal <subagent_notification> that arrives after the wait call but before the wait function_call_output will be emitted twice. handleSubagentNotification marks the agent as completed and appends a tool result, but the later wait output path does not consult agentResults before appending its own tool result for the same agent.
  • Fix: Filter already
    -completed agent IDs out of the wait output before emitting a result, or skip emitting the wait tool result when every terminal status in that payload was already recorded from notifications.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

@tpn
Copy link
Copy Markdown
Contributor Author

tpn commented Mar 26, 2026

Latest roborev item from the PR thread is addressed in a169042.

This follow-up makes the wait path skip agent ids that already produced terminal output via notifications, so a terminal notification followed by a later terminal wait result no longer duplicates the same completion output.

@roborev-ci
Copy link
Copy Markdown

roborev-ci bot commented Mar 26, 2026

roborev: Combined Review (a169042)

Verdict: The changes successfully implement Codex subagent workflows, but there is a synchronization issue with how terminal subagent notifications are finalized.

Medium

  • Location:
    internal/parser/codex.go (handleSubagentNotification)
  • Problem: Terminal subagent notifications are finalized immediately against the current owner in agentCalls. If a child finishes before the parent later emits a wait call for that same agent, the completion is permanently attached to spawn _agent (or emitted as a standalone user message), and the later wait tool call is left without any result because agentResults[agentID] already suppresses it.
  • Fix: Keep terminal status text pending per agent_id and prefer rebinding it to a later wait call when one appears, or defer deduplication until the parser knows whether a wait call exists later in the session.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

@tpn
Copy link
Copy Markdown
Contributor Author

tpn commented Mar 26, 2026

Addressed the latest PR-thread roborev finding in d2f4db9.

This follow-up keeps terminal subagent notifications pending until the parser knows whether a later wait call should own them, so a completion that arrives before wait now rebinds correctly instead of being frozen onto spawn_agent or dropped into a standalone user row.

@roborev-ci
Copy link
Copy Markdown

roborev-ci bot commented Mar 26, 2026

roborev: Combined Review (d2f4db9)

Summary Verdict: One medium-severity issue identified regarding the chronological ordering of orphaned sub
agent notifications in the Codex parser.

Medium

  • Location: internal/parser/codex.go (handleSubagentNotification, flushPendingAgentResults)
  • Problem: Terminal subagent notifications that arrive after spawn_agent but before any wait are buffered until EOF
    . If later transcript messages follow and no later wait rebind happens, the synthesized tool-result message is appended after those later messages, and orphaned notifications are additionally reordered by agent ID during the final flush. That breaks the original conversation chronology in the parsed output.
  • Fix: Preserve buffered notification order
    . Either emit the fallback result as soon as it is clear no later wait will claim it, or keep pending notifications in arrival order and insert/flush them before later messages rather than only at end-of-file.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

@tpn
Copy link
Copy Markdown
Contributor Author

tpn commented Mar 26, 2026

Addressed the latest PR-thread roborev finding in 63bae3b.

This follow-up preserves chronology for no-wait fallback cases by reserving notification order and inserting the synthesized result back into the parsed stream at that original position when no later wait claims it.

@roborev-ci
Copy link
Copy Markdown

roborev-ci bot commented Mar 26, 2026

roborev: Combined Review (63bae3b)

Synthesis unavailable. Showing individual review outputs.

codex — default (done)

Review Findings

  • Severity: Medium; Location: internal/parser/codex.go:177-223, internal/parser/codex.go:1000-1063; Problem: Incremental parsing still misses the case where a later wait call should rebind a subagent result that was already persisted earlier as a fallback under spawn_agent or as an orphaned notification. ParseCodexSessionFrom builds a fresh builder with no prior pendingAgentResults, and codexIncrementalNeedsFullParse() only forces a full parse on notification/output lines, not on appended wait calls, so a session can remain permanently threaded to the wrong tool call until another subagent event happens. Fix: Persist enough subagent state across incremental parses to support rebinding, or conservatively force a full parse when a Codex wait call is appended.

  • Severity: Low; Location: internal/parser/codex.go:241-266, internal/parser/codex.go:316-351; Problem: Duplicate terminal notifications received before a result is materialized overwrite the earlier pending entry in pendingAgentResults, including its original ordinal/timestamp. That means deduping can move the surviving result later in the timeline than the first completion event, which breaks chronology when there are intervening messages. Fix: When a pending terminal result for an agent already exists, keep the earliest pending entry instead of overwriting it.

Summary

Adds Codex subagent parsing/threading support, treats subagent notifications as system messages, and bumps the parser data version for resync.


codex — security (done)

Summary: No security issues identified in the reviewed diff. The changes are confined to local Codex session parsing, subagent result correlation, and frontend system-message classification, and they do not introduce new injection, authz/authn, credential exposure, path traversal, dependency, CI/CD, concurrency, or sensitive-data handling risks under the repository’s stated threat model.

No issues found.


gemini — default (failed)

Error: Review failed. Check CI logs for details.

@tpn
Copy link
Copy Markdown
Contributor Author

tpn commented Mar 26, 2026

Addressed the newest PR-thread roborev finding in 63dc056.

This follow-up forces a full reparse when a late Codex wait call is appended after earlier notifications, and it also preserves the earliest pending terminal notification instead of overwriting it with later duplicates.

@roborev-ci
Copy link
Copy Markdown

roborev-ci bot commented Mar 26, 2026

roborev: Combined Review (63dc056)

Synthesis unavailable. Showing individual review outputs.

codex — default (done)

Review Findings

  • Severity: Medium

  • Location: internal/parser/codex.go:46, internal/parser/codex.go:241

  • Problem: Result de-duplication is keyed globally by agent_id (agentResults map[string]bool), so once any terminal result is recorded for an agent, later wait calls for that same agent will have their function_call_output silently dropped. This leaves a valid later tool call with no matching tool result and breaks chronology for repeated waits/retries.

  • Fix: Track delivered results per call_id + agent_id (or per bound wait call) instead of per agent globally, so only duplicate sources for the same wait invocation are suppressed.

  • Severity: Low

  • Location: internal/parser/codex.go:1063

  • Problem: codexIncrementalNeedsFullParse forces a full reparse for every <subagent_notification>, including nonterminal statuses like running that handleSubagentNotification intentionally ignores. Long-running subagents will now trigger unnecessary resync churn with no persisted message change.

  • Fix: Parse the notification status here as well and require a full parse only for terminal completed/errored notifications that can actually affect stored messages.

Summary

This change adds Codex subagent/session linking and wait-result threading, including notification fallback and incremental-parse fallback logic.


codex — security (done)

Summary: The diff is limited to Codex session parsing, message classification, and test helpers. I did not identify any security regressions related to injection, auth/access control, credential exposure, path traversal, unsafe deserialization, dependency/CI issues, sensitive data handling, concurrency-based security bypasses, or error-driven information leakage.

No issues found.


gemini — default (failed)

Error: Review failed. Check CI logs for details.

@tpn
Copy link
Copy Markdown
Contributor Author

tpn commented Mar 27, 2026

Im going back to the drawing board on the Codex subagent threading so this lands as a coherent fix instead of another incremental patch. The first few rounds showed that trying to force this workflow into the existing single-result pairing model is still leaving chronology/ownership edge cases, so Im reworking it more comprehensively before the next push.

@tpn
Copy link
Copy Markdown
Contributor Author

tpn commented Mar 28, 2026

Superseded by #247.

I kept running into ownership and chronology edge cases while trying to fit Codex subagent terminal status into the existing single tool_calls.result_content model. The v2 PR switches to first-class tool_result_events in SQLite + PostgreSQL, keeps result_content as a derived/latest summary, and renders expandable chronological history in the UI.

Please review #247 instead of this branch for the current approach.

@wesm wesm closed this Mar 28, 2026
wesm pushed a commit that referenced this pull request Mar 28, 2026
Closes #233.
Supersedes #242.

## Summary

- model Codex subagent terminal status as first-class
`tool_result_events` in SQLite and PostgreSQL
- keep `tool_calls.result_content` as a derived/latest compatibility
summary instead of the canonical storage shape
- render the latest summary by default and expose full chronological
subagent result history in the tool block UI
- preserve event history across full resyncs, PG pushes, and orphan-copy
recovery

## How This Differs From #242

`#242` tried to solve Codex subagent workflows inside the existing
single-result model by:

- rebinding `wait` / `<subagent_notification>` output into one mutable
`tool_calls.result_content`
- synthesizing tool-result-like transcript rows to preserve some
ordering information
- adding parser heuristics to decide which source should "own" the final
blob

That approach improved the raw JSON problem, but it kept running into
ownership and chronology edge cases because Codex subagent status is not
actually a single final blob.

This PR switches to a different model:

- add additive `tool_result_events` tables in both SQLite and PostgreSQL
- attach chronological result events directly to the originating tool
call
- derive `tool_calls.result_content` from those events for
compatibility/search/compact UI
- keep the existing subagent session linkage on `spawn_agent`

Practical effect:

- chronology is preserved without retroactively mutating older
transcript rows
- `wait` output and fallback notifications are stored as canonical event
history
- PG-backed multi-host views and local SQLite views share the same
result-event model

## Details

- `spawn_agent` remains a `Task` tool call and still maps to
`subagent_session_id`
- terminal `wait` output becomes `tool_result_events`
- terminal `<subagent_notification>` messages also become
`tool_result_events`
- blocked categories are enforced server-side for event history as well
as the compact summary
- orphan-copy / resync paths preserve stored event history for sessions
whose source files no longer exist

## Upgrade / Rollout

- SQLite: additive schema change plus `dataVersion` bump
- PostgreSQL: additive schema change, no drop/recreate required
- exporters should upgrade and run `agentsview pg push --full` once, or
let the normal full-sync path re-export after resync

Known limit:

- sessions whose original Codex JSONL source is already gone can only
preserve event history that was already stored in SQLite; they cannot
derive brand new history from missing raw files

## Test Plan

- `CC=/usr/bin/gcc CXX=/usr/bin/g++ CGO_ENABLED=1 go test -tags fts5
./internal/parser ./internal/sync ./internal/db ./internal/postgres
-count=1`
- `cd frontend && npm test -- --run
src/lib/components/content/ToolBlock.test.ts
src/lib/utils/messages.test.ts`

## Local Review

- local `codex`, `claude-code`, and `gemini` review runs covered both
`default` and `security`
- the first v2 pass surfaced issues around stale `tool_result_events`,
blocked-category history leakage, and same-ordinal orphan chronology
- those follow-up fixes are included in this branch
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: model Codex subagent workflows instead of showing raw subagent_notification JSON

2 participants