fix: stateful-tools domain handling + preserve tool output in conversation history by v1r3n · Pull Request #237 · agentspan-ai/agentspan

v1r3n · 2026-05-14T04:23:57Z

Summary

Two critical fixes carved out of the larger coding-agent work (PR #163) as a small, independently-reviewable PR. The PAC/PAE work is being landed separately.

a) Stateful tools — domain handling

A series of fixes for stateful agents (`stateful=True` / `@tool(stateful=True)`) where workers must register under the execution's Conductor domain for isolation:

`fix: register handoff_check worker for SWARM parents without explicit handoffs` — any SWARM parent always needs a `handoff_check` worker; the previous gate only registered when `agent.handoffs` was non-empty, causing SWARM workflows to stall.
`feat(sdk): add _resolve_worker_domain + use worker_domain at 4 sites` — single source of truth for resolving the per-execution worker domain; used by `run()`, `start()`, `stream()`, and `stream_async()` so every entry point computes the same domain.
`fix(sdk): re-register stateful workers when polling domain changes` — when the polling thread observes a domain it hasn't registered, it re-runs the late-registration path. Prevents long-lived agents from getting stuck after a domain rebind.
`test(e2e): add suite 14 — stateful domain propagation` — 6 deterministic e2e tests (no LLM-output parsing) for the domain-propagation contract.
`test(ts): prove SWARM handoff_check registration works for all topologies` — TS-side regression net for the SWARM fix.

b) Tool call truncation — preserve tool output in conversation history

`AgentChatCompleteTaskMapper.compactToolHistory()` previously truncated any tool result older than the 6 most recent to 500 chars (`...[truncated]`). That caused the agent to lose context — a 5KB `glob_find` result kept only ~500 chars of file names, so on the next turn the agent re-issued the same `glob_find` with a different filter, then re-read the same files. Observed in workflow `637d179b-e0b5-4efd-a33f-2b2811ccbc01`.

After this fix:

Tool result content is never truncated mid-conversation.
Token-budget pressure flows through `condenseIfNeeded` which drops entire old messages — cleaner than partial truncation.
The only remaining transformation is collapsing write-only tool confirmations (`contextbook_write`, `contextbook_summary`) to `[ok]`, since those carry no downstream value.

Test plan

`./gradlew test` (server) → BUILD SUCCESSFUL, 578 tests pass
`pytest tests/unit/` (Python SDK) → 1572 pass, 0 fail
Watch CI run for full python-e2e + typescript-e2e on this branch

Scope

This PR intentionally does not include:

PAC/PAE (Plan-and-Compile / Plan-and-Execute) — separate PR
Worker liveness monitoring — separate PR if desired
Issue-fixer harness / examples — separate branch

@tool

6 tests that verify workers register under the correct Conductor domain when stateful=True. All assertions inspect the workflow execution via server API — no mocks, no LLM output parsing, fully deterministic. Tests: 1. Stateful tool completes (not stuck SCHEDULED) 2. Stateful stop_when callback executes in domain 3. Stateful swarm handoff + termination execute in domain 4. Mixed @tool and @tool(stateful=True) share domain 5. Concurrent stateful executions get different domains (isolation) 6. Non-stateful agents work without domain (regression guard) Counterfactual verified: removing the domain fix causes test 2 to fail (stop_when stuck SCHEDULED with pollCount=0, timeout after 300s).

… handoffs Server always generates {parent}_handoff_check for SWARM workflows, but SDK only registered the worker when agent.handoffs was truthy. This caused SWARM loops (like coder↔qa) to deadlock when the parent had no explicit handoffs (only children had OnTextMention handoffs). Fix: also register when agent.strategy == "swarm" and agent.agents. Includes 37 deterministic tests covering all multi-agent strategies.

…gies TypeScript SDK already had the correct condition (handoffs.length > 0 || strategy === "swarm") unlike the Python bug, but had no tests proving it. Adds 9 tests: - SWARM parent with NO handoffs (issue fixer exact pattern) - Bare SWARM, 3-agent SWARM, requiredWorkers filter - Non-SWARM strategies correctly skip handoff_check - Single agent (no children) correctly skips - Counterfactual: old buggy condition (handoffs only) would miss SWARM - Issue fixer exact topology verification

Required for Mode A idempotency-replay handling: when the server returns an existing execution for an idempotency_key, workers must register under the workflow's original taskToDomain, not the freshly generated run_id. This was assumed by the design spec but was actually uncommitted user work on coding_agent — pulling it forward here. Updates the 4 start/stream call sites to compute worker_domain via _resolve_worker_domain(execution_id, run_id) and use it for _prepare_workers, _register_and_start_skill_workers, and the LocalLivenessCheck registered-pair query.

A stateful agent re-run with the same task names but a different ``run_id`` (e.g., a fresh execution that isn't an idempotency replay) needs ``WorkerManager.start()`` to register the new ``(task_name, domain)`` pairs. The previous ``elif new_workers:`` branch only fired when the *task-name set* changed; identical names under a different domain were silently skipped, leaving the new domain unpolled. Convert to ``else:`` at all four sites (sync/async ``run`` and ``start``, plus the framework and graph paths). ``WorkerManager`` is domain-aware and only spawns missing pairs, so this remains safe under concurrent re-runs and avoids the fork() deadlock window of a full stop/restart cycle. Adds test_prepare_workers_starts_worker_manager_when_only_domain_changes to lock the contract: start() is called once even when the task names have already been registered, as long as the domain differs.

The previous compactToolHistory implementation truncated any tool result older than the 6 most recent to 500 chars (with '...[truncated]'). That caused the agent to lose context — a 5KB glob_find result kept only ~500 chars of file names, so on the next turn the agent re-issued the same glob_find with a different filter, then re-read the same files. Observed in workflow 637d179b-e0b5-4efd-a33f-2b2811ccbc01 where iter 14's glob_find result was clipped to '...AgentspanAIMod...[truncated]' and the agent kept reissuing nearly-identical queries trying to see more. Removed: * Mid-conversation truncation of older tool results * contextbook_read 'keep only latest per section' dedup (relied on truncation; now that we keep full results, dedup is unnecessary) * truncateToolResult helper and TOOL_RESULT_TRUNCATE_LENGTH / RECENT_TOOL_RESULTS_TO_KEEP constants — no longer used Kept: * Write-only tool collapsing — contextbook_write / contextbook_summary produce 'wrote N chars' confirmations that add no downstream value and are emitted by the agent itself (can't lose information) Token-budget pressure flows through condenseIfNeeded, which drops ENTIRE old messages — cleaner than partial truncation since the agent either has full context for a message or doesn't see it at all. Verified: ./gradlew test → 578 server tests pass.

bradyyie

LGTM , tested as much as i could isolated and seems to work ! and all existing tests pass

v1r3n added 6 commits May 14, 2026 00:18

v1r3n mentioned this pull request May 14, 2026

feat: Strategy.PLAN_EXECUTE — PAC/PAE compile-and-execute for LLM plans #238

Merged

7 tasks

Merge branch 'main' into feat/stateful-tools-and-tool-output-fix

830f4aa

shaileshpadave approved these changes May 14, 2026

View reviewed changes

bradyyie approved these changes May 14, 2026

View reviewed changes

v1r3n merged commit cf8e17f into main May 16, 2026
8 checks passed

v1r3n deleted the feat/stateful-tools-and-tool-output-fix branch May 16, 2026 16:49

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: stateful-tools domain handling + preserve tool output in conversation history#237

fix: stateful-tools domain handling + preserve tool output in conversation history#237
v1r3n merged 7 commits into
mainfrom
feat/stateful-tools-and-tool-output-fix

v1r3n commented May 14, 2026

Uh oh!

bradyyie left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

v1r3n commented May 14, 2026

Summary

a) Stateful tools — domain handling

b) Tool call truncation — preserve tool output in conversation history

Test plan

Scope

Uh oh!

bradyyie left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants