fix(subagent): reliable gateway tool-loop for non-Anthropic providers (persist tool-result messages + JSON-normalize outputs) #2257
Open
rafaelreis-r wants to merge 2 commits into
…-fix histories on resume
The gateway-native subagent loop (agent.use_gateway_loop, mandatory for
non-Anthropic models) persisted assistant turns and per-tool execution rows
but never persisted the user-role message carrying the tool results back to
the model (`void userMessageIdx`). Within a single uninterrupted run this is
invisible — the tool-result messages live in the in-memory array. But on any
resume (worker restart / re-claim), loadPriorMessages rebuilt the history from
subagent_messages alone, which was missing every tool-result message, leaving
adjacent assistant turns with dangling tool-calls. AI SDK v6 then rejects the
prompt — "Tool results are missing for tool calls ..." or "messages do not
match the ModelMessage[] schema" — and the job loops to max_attempts. The
replayState.priorTools reconciler never gets a chance: chat() throws on the
malformed history before the tool-dispatch section runs.
The legacy Anthropic-direct path always persisted this message; only the
gateway path dropped it.
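For illustration, the broken resume history can be sketched as below. This is a minimal sketch, not the project's code: the `Message` and `ToolCall` shapes and the `findDanglingToolCalls` helper are hypothetical, and the rule that a trailing assistant turn is not reported (its tools may simply not have run yet) is an assumption.

```typescript
// Hypothetical, simplified message shapes; NOT the project's real types.
type ToolCall = { toolCallId: string; toolName: string };
type Message =
  | { role: "assistant"; toolCalls: ToolCall[] }
  | { role: "user"; text?: string; toolResults?: { toolCallId: string }[] };

// Return the ids of tool calls that the immediately following message does not answer.
function findDanglingToolCalls(history: Message[]): string[] {
  const dangling: string[] = [];
  history.forEach((msg, i) => {
    if (msg.role !== "assistant" || msg.toolCalls.length === 0) return;
    // Assumption: a trailing assistant turn is not dangling, its tools may not have run yet.
    if (i === history.length - 1) return;
    const next = history[i + 1];
    const answered = new Set(
      next && next.role === "user"
        ? (next.toolResults ?? []).map((r) => r.toolCallId)
        : [],
    );
    for (const call of msg.toolCalls) {
      if (!answered.has(call.toolCallId)) dangling.push(call.toolCallId);
    }
  });
  return dangling;
}

// History as rebuilt on resume before the fix: the tool-result message that
// should sit between the two assistant turns was never persisted.
const resumed: Message[] = [
  { role: "user", text: "summarize page X" },
  { role: "assistant", toolCalls: [{ toolCallId: "call_1", toolName: "brain_get_page" }] },
  { role: "assistant", toolCalls: [] },
];

const danglingIds = findDanglingToolCalls(resumed); // ["call_1"]
```

A history in which a user-role message carrying the result for `call_1` sits between the two assistant turns yields an empty list.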
Fix:
- toolLoop: add onToolResultMessage callback, fired before the in-memory
push (write-before-use), wired in runSubagentViaGateway to persist the
user-role tool-result message. No more message_idx gaps going forward.
- runSubagentViaGateway: rebuildReplayHistory() self-heals already-corrupted
jobs by reconstructing each missing tool-result message from settled
subagent_tool_executions rows (matched by provider toolCallId) and
re-persisting it. Conservative: a still-pending/absent exec leaves the turn
un-healed rather than fabricating a result. nextMessageIdx now derives from
max(known idx)+1 instead of array length so a healed gap can't collide.
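The self-healing step and the index derivation can be sketched roughly as follows. This is a sketch under stated assumptions, not the code in this PR: the row shapes and the `healGaps` and `nextIdxFromKnown` helpers are hypothetical, "settled" is taken to mean any status other than pending, and the missing tool-result message is assumed to belong at the assistant turn's index plus one.

```typescript
// Hypothetical row shapes loosely modelled on subagent_messages and
// subagent_tool_executions; the real columns may differ.
type MessageRow = { messageIdx: number; role: "user" | "assistant"; toolCallIds: string[] };
type ExecRow = { toolCallId: string; status: "pending" | "complete" | "failed"; output: unknown };
type HealedMessage = {
  messageIdx: number;
  role: "user";
  results: { toolCallId: string; output: unknown }[];
};

// Rebuild a missing tool-result message only when every tool call of the
// preceding assistant turn has a settled execution row.
function healGaps(rows: MessageRow[], execs: ExecRow[]): HealedMessage[] {
  const present = new Set(rows.map((r) => r.messageIdx));
  const settled = new Map<string, ExecRow>();
  for (const e of execs) {
    if (e.status !== "pending") settled.set(e.toolCallId, e); // assumption: failed counts as settled
  }

  const healed: HealedMessage[] = [];
  for (const row of rows) {
    if (row.role !== "assistant" || row.toolCallIds.length === 0) continue;
    const slot = row.messageIdx + 1; // assumed position of the tool-result message
    if (present.has(slot)) continue; // already persisted, nothing to heal

    const results: { toolCallId: string; output: unknown }[] = [];
    let complete = true;
    for (const id of row.toolCallIds) {
      const exec = settled.get(id);
      if (!exec) { complete = false; break; } // pending or absent: never fabricate a result
      results.push({ toolCallId: id, output: exec.output });
    }
    if (complete) healed.push({ messageIdx: slot, role: "user", results });
  }
  return healed;
}

// Derive the next index from the highest known index, not from the row count.
function nextIdxFromKnown(knownIdx: number[]): number {
  return knownIdx.length === 0 ? 0 : Math.max(...knownIdx) + 1;
}

const rows: MessageRow[] = [
  { messageIdx: 0, role: "user", toolCallIds: [] },
  { messageIdx: 1, role: "assistant", toolCallIds: ["call_1"] },
  { messageIdx: 3, role: "assistant", toolCallIds: ["call_2"] },
];
const execs: ExecRow[] = [
  { toolCallId: "call_1", status: "complete", output: { ok: true } },
  { toolCallId: "call_2", status: "pending", output: null },
];
const healedRows = healGaps(rows, execs); // one message, at idx 2, for call_1
const nextIdx = nextIdxFromKnown(rows.map((r) => r.messageIdx)); // 4
```

With three rows at idx 0, 1 and 3, an array-length counter would hand out 3 again and collide with the existing row, whereas the max-based counter returns 4.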
Tests: test/e2e/subagent-gateway-resume.test.ts (forward persistence + self-
healing resume without tool re-execution). Existing gateway-path and
cross-provider crash-replay suites stay green.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… them
AI SDK v6 validates each tool-result `json` output against a strict JSONValue
Zod schema. gbrain tool outputs routinely carry non-JSON values: node-postgres
returns `timestamptz` columns as JS `Date`, so brain_get_page /
brain_list_pages rows include Date-typed updated_at/created_at fields. A raw
Date (also undefined / bigint) makes the entire tool message fail the
ModelMessage union, and generateText throws "Invalid prompt: The messages do
not match the ModelMessage[] schema" on the turn after such a tool runs. This
fired on every live multi-turn run with a non-Anthropic (gateway-path) model.
Crash-replay masked it: the value had already been round-tripped through the
jsonb column to an ISO string, so a resumed run validated fine until the next
live tool call hit a fresh Date again. (This was the "attempt 1" failure
shadowed by the separate tool-result-message persistence bug.)
Fix: normalize the tool-result `json` value via toJsonValue() (JSON
round-trip: Date -> ISO string, undefined dropped, non-serializable -> string
fallback) at the toModelMessages boundary, plus a non-throwing safeStringify()
for the error-text branch. Persisted blocks are unaffected (persistMessage
already JSON-stringifies on write, so stored and sent forms now agree).
Tests: test/gateway-model-messages.test.ts covers Date->ISO normalization,
undefined drop, and a real AI SDK generateText validation accepting a
Date-bearing tool output (no ModelMessage rejection).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
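The JSON round-trip described above can be sketched as follows. This is a minimal illustration, not the PR's implementation: the function names mirror the commit message, but the bodies, the local `JSONValue` type, and the choice to map a top-level `undefined` to `null` are assumptions.

```typescript
// Local stand-in for the JSON value type; an assumption, not the AI SDK's export.
type JSONValue =
  | null
  | string
  | number
  | boolean
  | JSONValue[]
  | { [key: string]: JSONValue };

// Normalize an arbitrary tool output into plain JSON data.
function toJsonValue(value: unknown): JSONValue {
  try {
    // Date.prototype.toJSON turns dates into ISO strings; undefined-valued
    // object properties are dropped by JSON.stringify.
    const text = JSON.stringify(value);
    // JSON.stringify returns undefined for a top-level undefined, function or symbol.
    if (text === undefined) return null; // assumption: map these to null
    return JSON.parse(text) as JSONValue;
  } catch {
    // bigint values and circular structures make JSON.stringify throw.
    return String(value);
  }
}

// Stringify for error text without letting JSON.stringify's exceptions escape.
function safeStringify(value: unknown): string {
  try {
    return JSON.stringify(value) ?? String(value);
  } catch {
    return String(value);
  }
}

// A row shaped like the ones described above, with a Date-typed column.
const row = { updated_at: new Date("2024-01-02T03:04:05.000Z"), note: undefined };
const normalized = toJsonValue(row); // { updated_at: "2024-01-02T03:04:05.000Z" }
```

The string fallback is lossy (a circular object becomes `"[object Object]"`), which trades fidelity for a prompt that always validates.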
Fixes #2256.
The gateway-native subagent loop (`agent.use_gateway_loop=true`, mandatory for non-Anthropic providers) couldn't reliably complete a multi-turn tool conversation. Two independent bugs combined to make subagent jobs fail on every retry until `max_attempts` ran out. Both surface as AI SDK v6 errors: `Invalid prompt: The messages do not match the ModelMessage[] schema` and `Tool results are missing for tool calls <ids>`. The Anthropic-direct path is unaffected.

Commit 1: persist tool-result messages; self-heal on resume
`toolLoop()` fed each turn's tool results back as a `role:'user'` message but never persisted it (`void userMessageIdx`), and `runSubagentViaGateway` discarded `ToolLoopResult.messages`. So `subagent_messages` had gaps at every even `message_idx`. On any resume, `loadPriorMessages` rebuilt a history of adjacent assistant turns with dangling tool-calls, which AI SDK v6 rejects before the `replayState.priorTools` reconciler (which only runs inside tool-dispatch, after `chat()`) ever gets a chance. The job then looped to `max_attempts`. The legacy Anthropic-direct path always persisted this message.
- `toolLoop`: new `onToolResultMessage` callback, fired before the in-memory push (write-before-use).
- `runSubagentViaGateway`: wires it to persist the tool-result message; and `rebuildReplayHistory()` self-heals already-corrupted jobs by reconstructing each missing tool-result message from settled `subagent_tool_executions` rows (matched by provider `toolCallId`) and re-persisting it. Conservative: a still-pending/absent exec leaves the turn un-healed rather than fabricating data. `nextMessageIdx` now derives from `max(known idx)+1` so a healed gap can't collide under the unique `(job_id, message_idx)` constraint.

Commit 2: JSON-normalize tool-result outputs
AI SDK v6 validates each tool-result `json` output against a strict `JSONValue` schema. node-postgres returns `timestamptz` as JS `Date`, so `brain_get_page` / `brain_list_pages` rows carry Date-typed `updated_at` / `created_at`; a raw `Date` (also `undefined` / `bigint`) fails the `ModelMessage` union the turn after such a tool runs. Replay masked it (the jsonb round-trip had already stringified the Date).
- `toModelMessages`: normalize the `json` value via `toJsonValue()` (JSON round-trip: `Date`→ISO, `undefined` dropped, non-serializable→string), plus a non-throwing `safeStringify()` for the error-text branch. Persisted blocks are unaffected (write path already JSON-stringifies, so stored and sent forms now agree).

Tests
- `test/e2e/subagent-gateway-resume.test.ts`: forward persistence (contiguous `message_idx`, no gaps) + self-healing resume from a pre-fix corrupted job without re-executing the tool.
- `test/gateway-model-messages.test.ts`: `Date`→ISO normalization, `undefined` drop, and a real `generateText` validation accepting a Date-bearing tool output (no `ModelMessage` rejection).
- `subagent-gateway-path` and cross-provider `subagent-crash-replay-multi-provider` suites stay green (33 pass on this branch).

Validation
Deployed against a live `llama-server:qwen36b` dream backlog where 23/23 multi-turn jobs were stuck looping. After the fix, jobs that previously failed every retry run the full multi-turn synthesis to completion with contiguous `message_idx` (verified one job: 16 messages, idx 0–15, real synthesis output; the rest drained the same way). The only non-completions were pre-existing `wall-clock timeout` deaths on the heaviest transcripts that had already exhausted most of their retry budget on the old bug, so they are not a regression.