fix: stateful-tools domain handling + preserve tool output in conversation history#237
Merged
Merged
Conversation
6 tests that verify workers register under the correct Conductor domain when stateful=True. All assertions inspect the workflow execution via server API — no mocks, no LLM output parsing, fully deterministic. Tests: 1. Stateful tool completes (not stuck SCHEDULED) 2. Stateful stop_when callback executes in domain 3. Stateful swarm handoff + termination execute in domain 4. Mixed @tool and @tool(stateful=True) share domain 5. Concurrent stateful executions get different domains (isolation) 6. Non-stateful agents work without domain (regression guard) Counterfactual verified: removing the domain fix causes test 2 to fail (stop_when stuck SCHEDULED with pollCount=0, timeout after 300s).
… handoffs
Server always generates {parent}_handoff_check for SWARM workflows, but SDK
only registered the worker when agent.handoffs was truthy. This caused SWARM
loops (like coder↔qa) to deadlock when the parent had no explicit handoffs
(only children had OnTextMention handoffs).
Fix: also register when agent.strategy == "swarm" and agent.agents.
Includes 37 deterministic tests covering all multi-agent strategies.
…gies TypeScript SDK already had the correct condition (handoffs.length > 0 || strategy === "swarm") unlike the Python bug, but had no tests proving it. Adds 9 tests: - SWARM parent with NO handoffs (issue fixer exact pattern) - Bare SWARM, 3-agent SWARM, requiredWorkers filter - Non-SWARM strategies correctly skip handoff_check - Single agent (no children) correctly skips - Counterfactual: old buggy condition (handoffs only) would miss SWARM - Issue fixer exact topology verification
Required for Mode A idempotency-replay handling: when the server returns an existing execution for an idempotency_key, workers must register under the workflow's original taskToDomain, not the freshly generated run_id. This was assumed by the design spec but was actually uncommitted user work on coding_agent — pulling it forward here. Updates the 4 start/stream call sites to compute worker_domain via _resolve_worker_domain(execution_id, run_id) and use it for _prepare_workers, _register_and_start_skill_workers, and the LocalLivenessCheck registered-pair query.
A stateful agent re-run with the same task names but a different ``run_id`` (e.g., a fresh execution that isn't an idempotency replay) needs ``WorkerManager.start()`` to register the new ``(task_name, domain)`` pairs. The previous ``elif new_workers:`` branch only fired when the *task-name set* changed; identical names under a different domain were silently skipped, leaving the new domain unpolled. Convert to ``else:`` at all four sites (sync/async ``run`` and ``start``, plus the framework and graph paths). ``WorkerManager`` is domain-aware and only spawns missing pairs, so this remains safe under concurrent re-runs and avoids the fork() deadlock window of a full stop/restart cycle. Adds test_prepare_workers_starts_worker_manager_when_only_domain_changes to lock the contract: start() is called once even when the task names have already been registered, as long as the domain differs.
The previous compactToolHistory implementation truncated any tool result
older than the 6 most recent to 500 chars (with '...[truncated]'). That
caused the agent to lose context — a 5KB glob_find result kept only ~500
chars of file names, so on the next turn the agent re-issued the same
glob_find with a different filter, then re-read the same files. Observed
in workflow 637d179b-e0b5-4efd-a33f-2b2811ccbc01 where iter 14's
glob_find result was clipped to '...AgentspanAIMod...[truncated]' and
the agent kept reissuing nearly-identical queries trying to see more.
Removed:
* Mid-conversation truncation of older tool results
* contextbook_read 'keep only latest per section' dedup (relied on
truncation; now that we keep full results, dedup is unnecessary)
* truncateToolResult helper and TOOL_RESULT_TRUNCATE_LENGTH /
RECENT_TOOL_RESULTS_TO_KEEP constants — no longer used
Kept:
* Write-only tool collapsing — contextbook_write / contextbook_summary
produce 'wrote N chars' confirmations that add no downstream value
and are emitted by the agent itself (can't lose information)
Token-budget pressure flows through condenseIfNeeded, which drops ENTIRE
old messages — cleaner than partial truncation since the agent either
has full context for a message or doesn't see it at all.
Verified: ./gradlew test → 578 server tests pass.
7 tasks
shaileshpadave
approved these changes
May 14, 2026
bradyyie
approved these changes
May 14, 2026
bradyyie
left a comment
Collaborator
There was a problem hiding this comment.
LGTM , tested as much as i could isolated and seems to work ! and all existing tests pass
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Two critical fixes carved out of the larger coding-agent work (PR #163) as a small, independently-reviewable PR. The PAC/PAE work is being landed separately.
a) Stateful tools — domain handling
A series of fixes for stateful agents (`stateful=True` / `@tool(stateful=True)`) where workers must register under the execution's Conductor domain for isolation:
b) Tool call truncation — preserve tool output in conversation history
`AgentChatCompleteTaskMapper.compactToolHistory()` previously truncated any tool result older than the 6 most recent to 500 chars (`...[truncated]`). That caused the agent to lose context — a 5KB `glob_find` result kept only ~500 chars of file names, so on the next turn the agent re-issued the same `glob_find` with a different filter, then re-read the same files. Observed in workflow `637d179b-e0b5-4efd-a33f-2b2811ccbc01`.
After this fix:
Test plan
Scope
This PR intentionally does not include: