Coding agent by v1r3n · Pull Request #163 · agentspan-ai/agentspan

v1r3n · 2026-04-26T06:36:11Z

No description provided.

Multi-agent coding agent that takes a GitHub issue number, analyzes the codebase, implements a fix with tests, and creates a PR autonomously. Architecture: pipeline-wrapped swarm (Issue Analyst >> SWARM >> PR Creator) with Tech Lead (Opus), Coder (Sonnet), DG Code Reviewer (skill agent), and QA Lead (Sonnet). Includes contextbook for durable team memory, stateful workers, idempotency via issue number, and full e2e test gate. All project-specific values are configurable constants for reuse.

@tool

3-chunk plan: tools (21 @tool functions), agent assembly (6 agents + instructions + pipeline), and verification. 8 tasks, ~30 steps. Files: _issue_fixer_tools.py, _issue_fixer_instructions.py, 100_issue_fixer_agent.py.

Six detailed prompt strings: Issue Analyst, Tech Lead, Coder, DG Reviewer, QA Lead, PR Creator. Parameterized with {repo}, {branch_prefix}, {max_review_cycles}, {max_e2e_retries}.

serve() re-registers workers in the default domain (no domain), but start() already registered them under the execution's unique domain UUID. The domain mismatch caused stateful tool tasks (like contextbook_read) to stay in SCHEDULED state with pollCount=0 — no worker was polling in the correct domain. handle.join() blocks until completion using the workers already registered by start() under the correct domain.

_register_workers recursed into sub-agents without passing the domain parameter. This caused sub-agent tools (e.g. contextbook_read on issue_analyst, a child of the pipeline) to register in the default domain while the Conductor tasks were scheduled in the execution's unique domain — resulting in pollCount=0 and SCHEDULED forever.

When an agent has stateful=True, the Conductor server schedules all tasks (tools, stop_when, termination, handoff, check_transfer, etc.) under the execution's unique domain UUID. But only tool workers were registered with that domain — all 11 system worker types (guardrails, stop_when, gate, callbacks, termination, check_transfer, router, handoff, swarm transfers, manual selection) were registered without a domain, causing them to poll in the default domain and never pick up the scheduled tasks (pollCount=0, stuck in SCHEDULED forever). Fix: add domain parameter to all _register_*_worker helpers and propagate it from _register_workers through every call site and recursive sub-agent registration. Affected methods: - _register_guardrail_worker - _register_single_guardrail_worker - _register_stop_when_worker - _register_gate_worker - _register_callback_worker - _register_termination_worker - _register_check_transfer_worker - _register_router_worker - _register_handoff_worker - _register_swarm_transfer_workers - _register_manual_selection_worker

@tool

6 tests that verify workers register under the correct Conductor domain when stateful=True. All assertions inspect the workflow execution via server API — no mocks, no LLM output parsing, fully deterministic. Tests: 1. Stateful tool completes (not stuck SCHEDULED) 2. Stateful stop_when callback executes in domain 3. Stateful swarm handoff + termination execute in domain 4. Mixed @tool and @tool(stateful=True) share domain 5. Concurrent stateful executions get different domains (isolation) 6. Non-stateful agents work without domain (regression guard) Counterfactual verified: removing the domain fix causes test 2 to fail (stop_when stuck SCHEDULED with pollCount=0, timeout after 300s).

Root cause of failed execution: Issue Analyst cloned repo into a random temp dir, but all other agents' tools operated in the SDK's CWD. The Coder, QA Lead, and PR Creator couldn't see the cloned repo. Fix: - Add set_working_dir()/get_working_dir() to tools module - Add _resolve(path) helper that resolves relative paths against the shared working directory - Add _cwd() helper for subprocess calls - All 21 tools now use _resolve() for file paths and _cwd() for subprocess cwd - Contextbook stored inside working dir (.contextbook/) - Entry point creates temp dir with random UUID suffix and calls set_working_dir() before starting the pipeline - Issue Analyst instructions updated to clone into "." (the working dir) instead of a separate mktemp dir - All agent instructions updated with "must use tools" reminders to prevent LLM from hallucinating code instead of calling tools

…ination Observed failures in execution 56063a70: - Issue Analyst: looped on contextbook_read 12 times, never produced output - Tech Lead: spent 25 turns reading files, never wrote implementation_plan - Coder: never ran (no handoff from Tech Lead) - PR Creator: nothing to commit Fixes: - Add explicit turn budgets to every agent ("you have N turns, budget them") - Add step-by-step turn allocation (turns 1-3: read, turns 4-15: explore, etc.) - Add anti-loop rules ("do NOT call contextbook_read repeatedly") - Add CRITICAL rules: "you MUST write implementation_plan before handing off" - Add "if you run out of turns without writing the plan, you have FAILED" - Simplify PR Creator to 5-turn max with numbered steps - Every agent: "output HANDOFF_TO_X in your response text" (not as a tool call) - DG Reviewer: "complete in 10 turns or fewer"

Observed in fe7daec4: Tech Lead hit max_turns=25 after reading 17 files but never wrote implementation_plan. Each turn used only 1 tool call. Changes: - max_turns: tech_lead 30->80, coder 100->200, qa_lead 40->80 - All instructions now say "call multiple tools in parallel" with specific examples of what to batch per turn - Tech Lead: phased approach with "reserve 30% of turns for writing" and "do NOT spend more than 70% reading" - Every agent: parallel-first patterns ("read these 3-5 files at once") - Removed fixed turn budgets that were too rigid, replaced with percentage-based guidance

Problems in c014e645: - Coder made 6 edits and committed but never output HANDOFF_TO_DG. It kept making tool calls after commit, handoff check found no handoff text, swarm exited. DG and QA never ran. Fixes: 1. Coder instructions: "After git commit, your VERY NEXT response must be the HANDOFF text with ZERO tool calls" and "The handoff text must be the ONLY content in that response" 2. New Documentation Agent (pipeline stage 3): - Runs after coding swarm, before PR creator - For features: MUST create example + update docs (mandatory) - For bug fixes: update relevant docs only if needed - Pipeline: issue_analyst >> coding_swarm >> docs_agent >> pr_creator - max_turns=40, has read/write/edit/glob/grep/list/run_command tools

…alls Reflects all changes since the original spec: - New pipeline: issue_analyst >> coding_swarm >> docs_agent >> pr_creator - Docs Agent (Stage 3): updates docs, creates mandatory examples for features - Working Directory section: shared temp dir, clone to ".", all tools relative - Tech Lead max_turns: 30 -> 80 - Coder max_turns: 100 -> 200 - QA Lead max_turns: 40 -> 80 - Tool assignment matrix: added Docs Agent column - Parallel tool calls: noted in design notes - Updated PR Creator to Stage 4

Two-layer fix: 1. Issue Analyst adds '.contextbook/' to .gitignore after cloning 2. All git add commands use ':!.contextbook' exclude pathspec

New constants (user-overridable): DOCS_PLAN_DIR = "docs/plan" — Tech Lead writes implementation plans here DOCS_DESIGN_DIR = "docs/design" — Docs Agent writes feature design docs here Tech Lead now writes the plan to BOTH: - {docs_plan_dir}/issue-<N>-plan.md (persisted in repo, committed) - contextbook implementation_plan (for agent communication) Docs Agent now writes feature design docs to: - {docs_design_dir}/issue-<N>-<feature-slug>.md Both paths are format placeholders in instructions, resolved from constants.

…e, reactions

Issues often reference external URLs (RFCs, API docs, related PRs, design docs). The web_fetch tool fetches a URL, strips HTML to plain text, and returns up to 16K chars. Assigned to Tech Lead, Coder, and Docs Agent.

Each PR now includes a machine-readable JSON block capturing the full context of what changed, why, and by whom. Designed so that release reviews can programmatically aggregate PR context across a release. Flow: 1. Coder writes change_context JSON to contextbook after implementing (issue_number, title, change_type, date, author, root_cause, what_changed [{file, change}], testing, risks, related_issues) 2. Coder updates it again after writing tests (testing field, test files) 3. PR Creator reads change_context and embeds it in the PR body inside a collapsible <details> block with a ```json code fence New contextbook section: "change_context"

Major restructure: replace single 4-agent swarm with deterministic pipeline stages and focused 2-agent review loops. Old: issue_analyst >> [SWARM: tech_lead, coder, dg, qa] >> docs >> pr Problem: coder never handed off to DG — swarm exited after coder. New: issue_analyst >> tech_lead >> [impl_loop] >> [test_loop] >> docs >> pr Pipeline stages: 1. Issue Analyst — fetch issue, clone, branch 2. Tech Lead — analyze code, write implementation plan (deeper thinking) 3. Implementation Loop (outer SWARM): - code_review_loop (inner SWARM: coder <-> DG, until CODE_APPROVED) - tl_reviewer (Tech Lead final review, IMPL_APPROVED or NEEDS_REWORK) 4. Test Loop (SWARM: qa_lead <-> test_coder, until TESTS_PASS) 5. Docs Agent — design docs, API docs, examples (MANDATORY for features) 6. PR Creator — commit, push, create PR with change_context JSON Key improvements: - DG review is GUARANTEED — 2-agent swarm must alternate - TL final review gate — implementation must be approved before testing - QA evidence: test results saved to {qa_evidence_dir}/issue-<N>/ - Each loop is pluggable — swap DG with any code reviewer - Separate test_coder instance for test writing - New TL_REVIEW_INSTRUCTIONS for final sign-off

New mode: address review comments on an existing PR and update it. Usage: python 100_issue_fixer_agent.py 42 # Fix issue #42 (new) python 100_issue_fixer_agent.py 42 --pr 157 # Address PR #157 feedback Feedback pipeline: pr_feedback >> impl_loop >> test_loop >> pr_updater Flow: 1. PR Feedback Agent: fetches PR comments, reviews, inline review comments (file:line), writes structured feedback to contextbook 2. Implementation Loop: coder addresses feedback, DG reviews changes 3. Test Loop: QA verifies tests still pass after changes 4. PR Updater: pushes to same branch, adds summary comment to PR with table of feedback items and resolutions New agents: pr_feedback, pr_updater New instructions: PR_FEEDBACK_INSTRUCTIONS, PR_UPDATER_INSTRUCTIONS Idempotency key for feedback: "issue-{N}-pr-{PR}-feedback"

WorkerManager._start_new_workers() deduplicates by task_def_name only, ignoring domain. When the same tool (e.g. contextbook_read) is registered under both default domain (None) and a stateful execution domain, the domain-specific worker was skipped because the name already existed. This caused SCHEDULED tasks with pollCount=0 in domain-specific queues — the worker was polling the default domain, not the execution's domain. Fix: track (task_def_name, domain) tuples in the existing set instead of just task_def_name. A worker under domain=None and the same worker under a specific domain are different polling targets that both need their own process.

…antee DG runs Root cause: SWARM relies on LLM to output handoff text. The coder LLM never outputs "HANDOFF_TO_DG" — it keeps making tool calls until max_turns. Result: DG never ran in any execution. 4 code_review_loop iterations, 0 dg_reviewer sub-workflows. Fix: Replace SWARM with SEQUENTIAL pipeline for review stages. Before (SWARM — unreliable): code_review_loop = SWARM(coder, dg_reviewer) # DG never runs After (SEQUENTIAL — deterministic): code_then_review = coder >> dg_reviewer # DG ALWAYS runs The outer impl_loop remains a SWARM for the TL approval cycle: impl_loop = SWARM(code_then_review, tl_reviewer) - code_then_review runs (coder + DG guaranteed) - tl_reviewer checks: IMPL_APPROVED or NEEDS_REWORK Testing also made sequential: test_then_verify = qa_lead >> test_coder >> qa_reviewer - QA plans, coder writes tests, QA reviews + runs e2e Also: added write_file tool to QA agents for writing QA evidence files.

Multi-agent coding agent that takes a GitHub issue number, analyzes the codebase, implements a fix with tests, and creates a PR autonomously. Architecture: pipeline-wrapped swarm (Issue Analyst >> SWARM >> PR Creator) with Tech Lead (Opus), Coder (Sonnet), DG Code Reviewer (skill agent), and QA Lead (Sonnet). Includes contextbook for durable team memory, stateful workers, idempotency via issue number, and full e2e test gate. All project-specific values are configurable constants for reuse.

@tool

3-chunk plan: tools (21 @tool functions), agent assembly (6 agents + instructions + pipeline), and verification. 8 tasks, ~30 steps. Files: _issue_fixer_tools.py, _issue_fixer_instructions.py, 100_issue_fixer_agent.py.

Six detailed prompt strings: Issue Analyst, Tech Lead, Coder, DG Reviewer, QA Lead, PR Creator. Parameterized with {repo}, {branch_prefix}, {max_review_cycles}, {max_e2e_retries}.

serve() re-registers workers in the default domain (no domain), but start() already registered them under the execution's unique domain UUID. The domain mismatch caused stateful tool tasks (like contextbook_read) to stay in SCHEDULED state with pollCount=0 — no worker was polling in the correct domain. handle.join() blocks until completion using the workers already registered by start() under the correct domain.

…resh starts E2E tests (Tasks 10–12 of the worker-liveness plan) — six tests under tests/integration/test_worker_liveness_live.py covering all three failure modes plus a validity counter-test for each: - LocalLivenessCheck raises WorkerStartupError when no worker subprocess is alive after registration (Mode B-1). - ServerLivenessMonitor surfaces WorkerStallError through join() when a SCHEDULED task in our domain has pollCount=0 past the stall threshold, with liveness_stall_policy="raise" (Mode B-2). - AgentHandle.is_resumed flips True and the "Resumed existing execution" INFO log is emitted on idempotency_key replay (Mode A). Each test pairs with a counter-test (liveness_enabled=False or fresh idempotency_key) confirming the new check, not unrelated behavior, is what's signaling. All assertions are algorithmic. Hot-path fix: skip _extract_domain on fresh starts. is_resumed and the resume INFO log can only fire when the caller passed an idempotency_key — without one, the server cannot return a previously-recorded domain. The original implementation made an extra get_workflow GET on every start/run regardless. This regressed the test_runtime unit tests (which assert get_workflow is called exactly once) and added latency to the common case. All four call sites — start/run, sync/async — now gate the resume detection behind ``if idempotency_key:``. Verification: - tests/unit/test_runtime.py: 160 pass (was 5 failing on the extra GET). - tests/integration/test_worker_liveness_live.py: 6/6 pass in 26s.

A stateful agent re-run with the same task names but a different ``run_id`` (e.g., a fresh execution that isn't an idempotency replay) needs ``WorkerManager.start()`` to register the new ``(task_name, domain)`` pairs. The previous ``elif new_workers:`` branch only fired when the *task-name set* changed; identical names under a different domain were silently skipped, leaving the new domain unpolled. Convert to ``else:`` at all four sites (sync/async ``run`` and ``start``, plus the framework and graph paths). ``WorkerManager`` is domain-aware and only spawns missing pairs, so this remains safe under concurrent re-runs and avoids the fork() deadlock window of a full stop/restart cycle. Adds test_prepare_workers_starts_worker_manager_when_only_domain_changes to lock the contract: start() is called once even when the task names have already been registered, as long as the domain differs.

…XECUTE Adds an optional ``planSource`` field on AgentConfig that points to a SIMPLE task (e.g., contextbook_read) which can serve a structured plan when the planner agent's text output cannot be parsed as a JSON fence. This is the recovery path for plan extraction failures — instead of immediately falling through to the agentic fallback agent, the harness can first try a deterministic source. Useful when the planner writes its plan to a contextbook section as part of its own toolchain and the extraction-from-LLM-output is unreliable. - server: AgentConfig.planSource (Map<String,Object>) — {tool, args}. - python sdk: Agent(plan_source={...}) constructor kwarg + serialization. - typescript sdk: not wired in this commit; will be added when the TS PLAN_EXECUTE harness needs it.

Adds parallel e2e tests in Python, Java, and TypeScript verifying that the GraalJS plan compiler honors ``max_tokens`` declared on a generate block. Counterfactual coverage: without the compiler reading ``gen.max_tokens``, every LLM_CHAT_COMPLETE task uses the default 4096 ceiling; the test instructs the planner to set max_tokens=8192 and requests sections >250 words, so a regression would surface as truncated output and missing word-count thresholds. All assertions are algorithmic — file existence, word counts, expected sections.

No functional changes — line-wrap and indentation adjustments across AgentCompiler, AgentChatCompleteTaskMapper, AgentService, and AgentCompilerTest.

Replaces the single ``CODER_PLANNER_INSTRUCTIONS`` with three distinct prompts to align the example with the deterministic-coding-workflows harness: - ``CODER_EXPLORER_INSTRUCTIONS`` — read-only exploration phase that writes a structured plan to the contextbook with the JSON fence required by Strategy.PLAN_EXECUTE. - ``CODER_PLANNER_INSTRUCTIONS`` — slim copy-and-emit role used after exploration so the planner output is deterministic enough for plan extraction. - ``CODER_IMPLEMENTER_FALLBACK_INSTRUCTIONS`` — agentic fallback when the deterministic plan execution fails validation.

CODING_AGENT_HARNESS_DESIGN.md — 1,568-line specification for a coding-agent runtime: conversation state, tool safety, file edits, command execution, sub-agent delegation, interruption recovery, session persistence. Reference architecture for staged build. GENERIC_AGENT_HARNESS_DESIGN.md — 2,814-line generalization beyond coding: resource + capability model with adapters for filesystem, browser, database, workflow, and device domains. Plugin/credential boundaries and security policy. .gitignore: ignore /.claude (per-user IDE artifacts).

… package rename # Conflicts: # sdk/java/src/main/java/ai/agentspan/Agent.java # sdk/java/src/main/java/ai/agentspan/internal/AgentConfigSerializer.java # sdk/java/src/test/java/ai/agentspan/e2e/E2ePlanExecuteTest.java

@builder

Addresses 17 findings from the dg adversarial code review of Strategy.PLAN_EXECUTE. The implementation had multiple paths where errors silently routed to "success" and the planner's free-text could inject script into the orchestrator. This commit fails closed everywhere. JavaScriptBuilder.compilePlanToWorkflowScript: - success_condition is filtered by safeCondition() — length-capped at 256 chars and rejected if it contains function/loop/assignment/host-access tokens. Plan validation surfaces "unsafe success_condition: ..." instead of evaluating planner-supplied JS. - Topological sort detects cycles and returns a structured error with the full cycle path, instead of silently dropping nodes. - Cycle/duplicate-id/unsafe-condition/bad-output_schema all return {workflow_def: null, error: "..."} which the parent now surfaces. - output_schema is validated as an instance-shape example object: real JSON Schema input ({"type":"object","properties":{...}}) is rejected rather than producing toolInputs.type / toolInputs.properties garbage. - LLM_CHAT_COMPLETE temperature is read from gen.temperature (default 0), not hardcoded. - LLM user message no longer appends "Respond with valid JSON only." — the system message + jsonOutput:true + temperature carry the contract. - INLINE parse task is followed by a SWITCH(parsed.__parse_error). On parse failure → TERMINATE FAILED; on success → SIMPLE tool task. This prevents the tool task from firing with all-undefined args. - Validation SWITCH inverts: decisionCases:{passed:onSuccess}, defaultCase:onFailure. Aggregator returning null/error/anything-else fails closed instead of silently routing to onSuccess. - optional:true is removed from all parse, eval, validation, terminate, on_success, on_failure, static, generated, and gate tasks. Failures bubble through SUB_WORKFLOW so the parent SWITCH can route to fallback. retryCount:1 on tool tasks covers transient errors. - timeoutSeconds is sourced from harnessTimeoutSeconds input (default 600 if absent), so the dynamic sub-workflow tracks the parent's contract. - workflow_def is no longer array-wrapped — bare JSON.stringify(wfDef). parse_wf in the parent simplifies to JSON.parse(wfDefJson). MultiAgentCompiler.compilePlanExecute: - After compile_plan, a new compile_status INLINE + compile_gate SWITCH TERMINATE on compile_error with the actual error message, so cycle / duplicate-id / unsafe-condition errors are visible to the caller instead of disappearing into a null-def SUB_WORKFLOW launch. - has_json predicate now requires Array.isArray(p.steps) && length > 0, matching what compilePlan actually accepts. Weaker predicates trip- wired the compile path with plans the compiler immediately rejected. - Sub-workflow input forwards cwd, credentials, and media in addition to prompt/session_id/__agentspan_ctx__/context. Compiled plan tools that depend on working dir or credentials no longer silently default. This is why examples had to hardcode WORK_DIR. - SUB_WORKFLOW.optional:true removed — sub-workflow failures must propagate so the fallback agent is reached. - harnessTimeoutSeconds is plumbed into compile_plan inputs. - planSource.tool is validated at compile time via isToolRegisteredInHarness — typos throw IllegalArgumentException at deploy instead of silently failing at runtime. - Fallback config rebuild uses toBuilder().maxTurns(fbMaxTurns).build() instead of an explicit 8-field whitelist copy that silently dropped agents/strategy/memory/promptInputs/toolChoice/handoffs/callbacks/ metadata/etc when fallback_max_turns was set. AgentConfig: @builder(toBuilder = true) so toBuilder() is available for the fallback rebuild above. Tests: PlanCompilerScriptTest updated to read the bare-object workflow_def (no more List<Map> unwrap) and to surface compile errors as test failures instead of NPEs. All 5 tests + the broader server suite pass. Skipped from this batch: - Per-execution workflow naming (race fix): on closer reading the placeholder is registered once at deploy and is not overwritten at runtime; the actual sub-workflow definition is injected inline via SubWorkflowParams.workflowDefinition. The "race" Gilfoyle described doesn't actually exist in the code path. - Move plan compiler to .js resource (large refactor — separate commit). - TS SDK planSource parity (separate SDK). - Failure-mode e2e tests (separate batch). - Spec doc sync (separate commit).

… consistency - TypeScript SDK: AgentOptions and Agent class gain planSource field; serializer forwards it as planSource on the wire to match the Java/Python AgentConfig. Closes the previously-flagged cross-SDK gap. - docs/design/deterministic-coding-workflows.md - Replace plan_json_section with plan_source (matches actual SDK). - Clarify output_schema is an instance-shape example object, not real JSON Schema. Real JSON Schema is rejected at compile time. - Document success_condition as a length-capped, denylist-filtered JS expression rather than free-form jq/JS. - Replace the "degrade to Strategy.SEQUENTIAL" bullets in the Risks table with what the impl actually does: try plan_source → fall through to fallback agent → TERMINATE if no fallback. - extractJsonFenceScript Case 1: when the planner returned an already-parsed plan object, markdown_plan is now the original coerced planner text (when available) instead of a re-pretty-printed JSON. Other cases already passed the original — this aligns Case 1 with the rest so the fallback agent always sees the LLM's actual prose, not a re-serialized object.

…ed paths Adds 18 unit tests exercising the failure modes the dg review flagged. All run as ordinary GraalVM Polyglot / pure Java unit tests — no live Conductor server, no LLM API calls. PlanCompilerScriptTest (+13 tests): - testCycleInDependsOnIsRejected — a→b→a returns a structured cycle error with the full path, instead of the previous silent partial DAG. - testDuplicateStepIdIsRejected — duplicate step ids produce a clear validation error. - testEmptyStepsArrayIsRejected — empty steps array short-circuits with "non-empty steps array". - testUnsafeSuccessConditionIsRejected — six adversarial conditions (function literal, while loop, Java.type, eval, assignment, var declaration) are each rejected by safeCondition. - testSafeSuccessConditionsAreAccepted — counter-test: typical safe conditions ($.exit_code === 0, $.indexOf('passed') >= 0, etc.) compile. - testJsonSchemaAsOutputSchemaIsRejected — passing real JSON Schema ({"type":"object","properties":...}) errors with a clear "use an example object instead" message instead of producing toolInputs.type / toolInputs.properties garbage. - testInstanceShapeOutputSchemaIsAccepted — counter-test: plain instance-shape compiles cleanly. - testGeneratedOpUsesParseGateSwitch — every generated op's chain contains a SWITCH(pgate_*) that routes parse errors to TERMINATE. - testNoTaskIsOptional — verifies the optional:true cancer is gone across static, generated, validation, on_success, on_failure, TERMINATE, INLINE, and SWITCH tasks. - testValidationSwitchFailsClosed — decisionCases:{passed:onSuccess}, defaultCase:onFailure with a TERMINATE inside, so any non-passed aggregator result fails closed. - testTimeoutFromHarnessConfig — harnessTimeoutSeconds=1234 flows into workflow.timeoutSeconds. - testDefaultTimeoutWhenHarnessTimeoutAbsent — no harness timeout falls back to 600. - testWorkflowDefIsNotArrayWrapped — workflow_def is a bare JSON object, not [wfDef] (the historical array wrap is gone, plus parse_wf simplified). MultiAgentCompilerTest (+5 tests): - testPlanExecutePlanSourceWithUnknownToolIsRejectedAtCompile — planSource.tool that's not registered anywhere in the harness throws IllegalArgumentException at compile time. Surfaces typos at deploy instead of as silent runtime no-ops. - testPlanExecutePlanSourceMissingToolFieldIsRejected — planSource map without a 'tool' key fails fast. - testPlanExecutePlanSourceWithRegisteredToolCompiles — counter-test: registered tool produces a plan_reader SIMPLE task. - testPlanExecuteSurfacesCompileErrors — has_plan branch contains the new compile_status INLINE + compile_gate SWITCH that TERMINATEs with "Plan compilation failed: ..." instead of letting parse_wf trip on a null workflow def. - testPlanExecuteSubWorkflowForwardsCwdCredentialsMedia — SUB_WORKFLOW inputs include cwd, credentials, media in addition to prompt/session_id. - testPlanExecuteSubWorkflowIsNotOptional — execTask.isOptional() is false so failures propagate to the fallback SWITCH. Total: 18 new tests, 0 failures, full server suite stays green. Note on Python e2e tests for the same failure modes (no fence → fallback, validation failure → fallback, etc.): those require a freshly-built Conductor server with the new compiler code. Adding them requires a ``./gradlew build`` + server restart, which is destructive against the locally-running shared server and was not done in this commit. The Java unit tests above exercise the same compile paths against the same compiler script, just without the LLM round-trip.

@builder

… fallback routing Addresses 5 findings from a second /dg adversarial review of the PLAN_EXECUTE implementation. Three were criticals introduced in the previous "fix" pass. JavaScriptBuilder.compilePlanToWorkflowScript:safeCondition (rewrite) Previous denylist was the wrong shape — it blocked function/eval/Java but left ``constructor``, ``prototype``, ``__proto__``, ``Object``, ``Reflect``, ``Proxy``, bracket access, comma operator, and ``+``-based string concatenation. The bypass $.constructor.constructor('return Java.type(...)')() passed the filter and reached GraalJS evaluation — a textbook sandbox escape via the Function constructor. Replaced with three layers: 1. Character allowlist — only ``$.()'"!=<>&|-`` plus alphanumeric/ underscore/whitespace. Rejects ``;``, ``{``, ``}``, backticks, backslashes (no escape evasion), ``,`` (no comma operator), ``?``/``:`` (no ternary), ``[``/``]`` (no bracket-key access), ``+``/``*``/``/`` (no string concat tricks). 2. String literals are stripped before identifier check so legitimate conditions like ``$.kind === 'constructor'`` are preserved. 3. Identifier denylist on the stripped string blocks the sandbox- escape primitives and host-bridge globals: constructor, prototype, __proto__, __defineGetter__, Function, eval, Object, Reflect, Proxy, Java, Promise, etc., plus all JS control-flow keywords. Bare assignment is still rejected. The ``globallThis`` (3-l) typo from the prior iteration is gone — globalThis is in the new identifier list with correct spelling. JavaScriptBuilder.compilePlanToWorkflowScript:output_schema sniff (broaden) Previous heuristic required both ``properties`` AND ``type === 'object'``. Missed ``{type:'object', required:[...]}`` (no properties) and ``{$schema, properties}`` (no top-level type). Now flags presence of any JSON-Schema-only key: $schema, properties, required, additionalProperties, definitions, $defs, $ref, allOf, anyOf, oneOf, patternProperties. MultiAgentCompiler.compilePlanExecute (compile_gate restructure) Two related fixes: (a) Previous gate had decision case only for ``compile_error`` and an empty ``defaultCase``. The compile_status script also emitted ``no_def`` (defensive sentinel for null wfDef without an error string) — that fell through, parse_wf JSON.parse(null) returned null, SUB_WORKFLOW launched with null definition. The very silent- degrade the gate was meant to prevent. Folded both error sentinels into one ``compile_failed`` so the gate cannot have a fall-through. (b) Compile failures TERMINATEd the parent harness even when a fallback agent was configured — an architectural inversion since fallback exists exactly to recover from deterministic-path failures, and compile failure is the canonical such failure. Now routes compile_failed into buildFallbackBranch when fallbackConfig != null (with a distinct prefix to avoid task-name collision with the exec-failure fallback emitted further down). Only TERMINATEs when no fallback is configured, with the actual error string in terminationReason. Tests: - PlanCompilerScriptTest.testUnsafeSuccessConditionIsRejected expanded from 6 cases to 22, covering the round-2 bypasses: $.constructor. constructor(...), $['constructor'], comma operator, backticks, backslash escapes (\\u0063onstructor), Object/Reflect/Proxy globals, ternary, __defineGetter__, deeper-position assignment. - testSuccessConditionAllowsLiteralBannedWordInString — new counter- test asserting ``$.kind === 'constructor'`` (banned word in string literal) IS accepted. The new filter strips string literals before the identifier check. - testPlanExecuteSurfacesCompileErrors updated for the renamed ``compile_failed`` sentinel and the route-to-fallback behavior: expects last task in the failed branch to be SUB_WORKFLOW (fallback) when fallback is configured. - testPlanExecuteCompileErrorTerminatesWhenNoFallback (new) — counter- test for the no-fallback path: TERMINATE with the actual error message in terminationReason. - testAgentConfigToBuilderPreservesAllFieldsExceptOverridden (new) — locks the contract for the toBuilder() rebuild. Verifies maxTurns override applies but model/instructions/maxTokens/temperature/ credentials all survive. Catches future regressions if @builder config changes. Verification: PlanCompilerScriptTest 19/19, MultiAgentCompilerTest 42/42, full server suite green.

…, output_select, namespace, docs Addresses the 5 important findings from the second /dg adversarial review. MultiAgentCompiler.compilePlanExecute (markdown_plan wired through) Previously the fallback agent prompt received ``plannerResult`` (the coerced string from planner_coerce). When the planner returned an already-parsed Map via jsonOutput:true, ``plannerResult`` was the re-stringified Map — losing the LLM's original prose. Meanwhile ``extract_json`` was always computing ``markdown_plan`` (the original text) but no consumer read it. Fixed by passing ``${extractRef}.output.result.markdown_plan`` as the fallback plan-text reference into both buildPlanExecutionBranch and buildFallbackOnlyBranch. The fallback agent now sees the planner's actual prose, which is more useful for agentic recovery. MultiAgentCompiler.compilePlanExecute (output_select non-optional) Previous output_select used ``optional:true`` to mask unresolved ``${...}`` refs from inactive branches. That suppression also swallowed real expression bugs. With the new compile-failure fallback path (``${prefix}_compile_fallback``), there are now four possible result refs but only one runs per execution. Replaced with a ``safe()`` helper inside the JS expression that detects Conductor's literal-``${...}`` markers (constructed via String.fromCharCode(36) so this script's own source isn't pre-resolved) and treats them as null. Coalesces in priority order: plan_exec → fallback → compile_fallback → noplan_fallback. Drops ``optional:true`` so genuine expression failures now surface. MultiAgentCompiler.isToolRegisteredInHarness (namespace tightened) Previously recursed into sub-agents. The plan_reader SIMPLE task is emitted in the parent harness's task namespace, so a tool registered only on a deeper sub-agent's worker would not be polled for the parent's task name — a silent runtime hang. Restricted the check to harness-level tools only. The user must declare the plan_source tool on the harness. testPlanExecutePlanSourceWithRegisteredToolCompiles renamed to testPlanExecutePlanSourceWithHarnessLevelToolCompiles and updated to put the tool on the harness (not the planner sub-agent). New testPlanExecutePlanSourceWithSubAgentOnlyToolIsRejected counter-test asserts a tool only on a sub-agent now fails compile with IllegalArgumentException. 100_issue_fixer_agent.py (brittleness documented) The ``coder`` PLAN_EXECUTE harness has one sub-agent and no fallback (per commit 1fad27a). Any compile/extract/validation/tool failure TERMINATEs the SUB_WORKFLOW, ending the SWARM iteration without a chance for QA feedback recovery. Added a comment block above the agent definition explaining the trade-off and how to add a fallback if agentic recovery is wanted. deterministic-coding-workflows.md (failure-mode table expanded) Replaced the 3-row failure table with a 7-row enumeration covering every failure mode the implementation handles: no fence, compile errors (cycle/duplicate-id/unsafe-condition/bad-schema), null-def defensive case, parse failures inside generated ops, tool failures, validation failures, and plan_source.tool misconfiguration. Each row notes the detection point and the routing. Verification: server suite green (43 MultiAgentCompiler tests, 19 PlanCompilerScript tests, all other compilers unchanged).

…lidation result, brace-matching, error message Addresses the 5 findings from a third /dg adversarial review (2 important + 3 minor). Two of the importants are direct consequences of fixes from prior rounds. JavaScriptBuilder.compilePlanToWorkflowScript: Half-wired ambient input forwarding (the fix from re-review #2 was incomplete). Parent harness forwards cwd/credentials/media to the SUB_WORKFLOW input correctly, but compilePlanToWorkflowScript only injected __agentspan_ctx__ and session_id into per-tool SIMPLE task inputParameters — so tools saw workflow.input.cwd populated at the dynamic-workflow level but their own input maps didn't include it. Filesystem tools running inside a plan_execute strategy were effectively rootless. Refactored: added an injectAmbient(args) helper at the top of the script that sets all five ambient inputs (__agentspan_ctx__, session_id, cwd, credentials, media). Replaced six per-site __agentspan_ctx__/session_id assignments (static op, generated op toolInputs, validation, on_success, on_failure) with single injectAmbient(...) calls. DRYs the change and prevents per-site drift. Static 'completed' literal shadowing fallback output (consequence of the re-review #1 optional:true removal): when plan.validation is absent, outputParameters emitted result='completed' / status='completed' as literal strings — regardless of actual completion. Conductor resolves output mapping on FAILED workflows too. Then the parent's output_select coalesce read _plan_exec.output.result first, saw the literal 'completed', decided it was truthy and not a ${...} placeholder, and returned it. Fallback ran but its recovered output was shadowed. Tests masked it because every test path used a validation block. Fixed by tracking lastOpRef during step emission (= taskReferenceName of the last top-level task in each step). When lastAggRef is null, result is now ref(lastOpRef + '.output.result') — pointing at the last tool's actual output. On a FAILED workflow this resolves to a literal ${...} string which the parent's safe() helper detects and skips, letting the fallback's recovered output surface as the harness result. Status output keeps the 'completed' literal as a placeholder for the no-validation path — that string is informational only and not in the fallback-shadowing chain. JavaScriptBuilder.extractJsonFenceScript Case 4: Brace-matching JSON extraction counted { / } without tracking string- literal state. {"key": "}{}"} miscounted and sliced the wrong substring. Now tracks in-string state with backslash-escape handling during the brace walk so braces inside string values are ignored. MultiAgentCompiler.isToolRegisteredInHarness error message: Said "is not registered as a tool on '<harness>' or any of its sub- agents" but the validator only checks harness-level tools (after the re-review #2 namespace tightening). The message gave users false hope of putting the tool on a sub-agent — they'd move it and fail again. Updated to "is not registered as a harness-level tool on '<harness>'. The plan_reader task is emitted in the harness's task namespace, so the tool must be declared in tools=[...] on the harness itself (declaring it on a sub-agent does not work)." Tests: - testInnerToolTasksReceiveCwdCredentialsMedia (new): compiles a real plan with static op + generated op + validation + on_success + on_failure, walks every emitted SIMPLE task, asserts every one of them contains cwd/credentials/media/session_id/__agentspan_ctx__ as inputParameters. Catches the prior incomplete-plumbing regression. - testNoValidationPlanResultPointsAtLastTaskNotLiteral (new): asserts outputParameters.result references a task output (${...}.output.result) instead of the literal 'completed' for plans without a validation block. Locks the fix against future regression. Verification: PlanCompilerScriptTest 21/21, MultiAgentCompilerTest 43/43, full server suite green.

…ator, sibling-fix gaps Addresses 4 findings from a fourth /dg adversarial review (1 critical + 3 important). Three of the four are 'fix-didn't-reach-siblings' from prior rounds — patches that landed at one site but missed structurally identical neighbors. Adds a structural invariant test to catch this class of regression going forward. JavaScriptBuilder.compilePlanToWorkflowScript — injectAmbient ordering CRITICAL. The LLM-generated tool branch built toolInputs with ``var toolInputs = injectAmbient({})`` first, then overlaid LLM-driven schema keys on top. The static-args branch did the inverse (correct) ordering: copy LLM args first, ambient override last. The comment on the helper definition explicitly said ambient values are "forced overrides — LLM-supplied args cannot redirect these." An LLM-authored ``output_schema`` containing a key named ``cwd``, ``credentials``, ``media``, ``session_id``, or ``__agentspan_ctx__`` could redirect the filesystem root or substitute credentials — exactly the threat the helper exists to prevent. Fix: changed the LLM branch to start with ``var toolInputs = {}``, populate from the schema-key loop (or _args fallback in the catch), and call ``injectAmbient(toolInputs)`` at the end so ambient keys always win. Mirrors the static-args branch. JavaScriptBuilder.compilePlanToWorkflowScript — lastOpRef = joinRef Conductor's JoinTask.output is ``{taskRef → outputMap}`` with no top-level ``result`` key. ``${joinRef.output.result}`` resolved to a literal placeholder, the parent's safe() helper stripped it, and any plan whose terminal step was parallel:true with no validation block produced an empty result. Fix: after the JOIN, emit an INLINE aggregator (``parallel_agg_<step>``) that collects each branch's last task's output into an array and points lastOpRef at it. ``${aggRef.output.result}`` now resolves to the array of branch outputs. JavaScriptBuilder.extractJsonFenceScript — Case 5c not string-aware Round 3 added string-literal-aware brace matching to Case 4 (planner-text path) but the structurally identical block for planReaderContent (Case 5c) was untouched. A plan_source tool returning content with ``"steps"`` or ``{`` / ``}`` inside string literals would mis-slice. Fix: extracted the brace-walk into a ``findMatchingBrace(text, openIdx)`` helper at the top of the script and replaced both Case 4 and Case 5c with helper calls. Single source of truth — future fixes won't drift. MultiAgentCompiler — plan_reader doesn't forward cwd/credentials/media Round 3's ambient-forwarding fix landed in compilePlanToWorkflowScript via injectAmbient but missed the plan_reader SIMPLE task in buildPlanExecutionBranch. A reader tool needing cwd (e.g. workspace filesystem reads) would silently fail and fall through to the no_plan branch. Fix: added cwd/credentials/media forwards to readerInputs alongside the existing session_id/__agentspan_ctx__ forwards. Kept optional:true with a comment explaining why (a reader pointed at a not-yet-written contextbook section is a normal "no fallback content" condition; the no_plan SWITCH path surfaces real failures via the fallback agent or TERMINATE). Tests: - testEverySimpleTaskHasFiveAmbientKeys (NEW, structural invariant) — compiles a plan exercising static + generated + parallel + validation + on_success + on_failure ops, walks every emitted SIMPLE task (recursing into FORK_JOIN branches AND SWITCH decisionCases for the parseGate-wrapped tool tasks), and asserts every one carries cwd, credentials, media, session_id, __agentspan_ctx__. This is the "fix-didn't-reach-siblings" guard — if any future tool emission forgets injectAmbient(), this test fails immediately. - testGeneratedOpAmbientKeysWinOverLLMSuppliedSchemaKeys (NEW, regression for the critical) — compiles a plan whose output_schema contains keys named ``cwd``, ``credentials``, ``media`` (the LLM-driven attack shape) plus a non-colliding ``safe_field``. Asserts the emitted SIMPLE task's cwd/credentials/media point at workflow.input.* (ambient won), and safe_field points at the parsed result (legitimate non-ambient flow). - collectTasks helper now recurses into SWITCH decisionCases and defaultCase — required for the structural invariant test to walk the parseGate-wrapped tool tasks. Verification: PlanCompilerScriptTest 23/23, MultiAgentCompilerTest 43/43, full server suite green.

… SWITCH for result references Addresses 3 findings from a fifth /dg adversarial review (2 important, 1 minor). Same root cause as the round-4 lastOpRef-=-joinRef fix, recurring at a third host: parseGate SWITCH terminals. Root pattern (round 4 + round 5): wrapper task types like SWITCH (parseGate) and JOIN don't expose a meaningful ``.result`` on their output. SWITCH outputs the case decision; JOIN outputs ``{taskRef → outputMap}``. References like ``${pgate_*.output.result}`` or ``${join_*.output.result}`` resolve to literal placeholders. The round-4 fix wrapped JOIN with a parallel_agg INLINE that produces a real ``.result``. But the same disease afflicts parseGate SWITCHes — when a generated op terminates a branch (``chain.push(parseGate)``), the chain's terminal is the SWITCH, not the inner tool. parallel_agg's per-branch inputs and the sequential lastOpRef both ended up pointing at the SWITCH wrapper. Fix: introduced ``innerRefMap`` and ``terminalRef(task)``. When emitting a parseGate, register ``innerRefMap[parseGate.taskReferenceName] = toolRef`` — the actual SIMPLE tool task wrapped by the SWITCH. ``terminalRef(task)`` returns the inner ref when one exists, else the task's own ref. Use ``terminalRef`` at: - parallel_agg's per-branch inputs (was ``ref(joinOn[ja] + ...)``, now ``ref(terminalRef(branches[ja][...]) + ...)``). joinOn keeps the SWITCH ref for ordering. - sequential lastOpRef assignment (was branch task's ref directly, now ``terminalRef(branches[b2][t])``). Tests: - testSequentialTerminalGeneratedOpResultPointsAtInnerTool (NEW) — sequential plan ending in generated op asserts outputParameters.result references ``t_*`` (inner tool), not ``pgate_*`` (SWITCH wrapper). - testParallelTerminalGeneratedOpAggregatorPointsAtInnerTool (NEW) — parallel branches all ending in generated ops; asserts parallel_agg inputs ``b0``, ``b1`` reference ``t_s1_*`` (inner tools), not ``pgate_s1_*``. Both tests are CLAUDE.md-compliant: written first, manually verified to fail against pre-fix code (lastOpRef pointed at parseGate SWITCH), then pass after the terminalRef change. Three rounds of the same root cause: - round 3: cwd/credentials/media — parent forwarded to SUB_WORKFLOW input, missed per-tool task injection (sibling miss). - round 4: lastOpRef = joinRef, JOIN has no .result (sibling miss). - round 5: lastOpRef = parseGate SWITCH, SWITCH has no .result (sibling miss). The structural invariant test from round 4 caught ambient-key forgetfulness but didn't assert result-reference correctness for terminal tasks. The two new round-5 tests close that specific gap. Verification: PlanCompilerScriptTest 25/25, MultiAgentCompilerTest 43/43, full server suite green.

… nudge + non-empty SWITCH passed branch Bumped Conductor from 3.3.0-SNAPSHOT (local build) to 3.30.0.rc12. Two latent regressions surfaced under the new server. JavaScriptBuilder.compilePlanToWorkflowScript — user-message json mention OpenAI's Responses API requires the literal word "json" in input (user) messages when ``text.format`` is ``json_object``. The system message's "JSON" mention is not sufficient — Responses API checks user-message content specifically. My round-1 fix had removed the trailing "Respond with valid JSON only." user-message nudge on the rationale that system prompt + jsonOutput:true carried the contract. That dismissal as "cargo cult" was wrong: it was the OpenAI-required guard. Restored as "Respond as json." (minimal, lowercase, satisfies the check). Comment updated to explain why this is required, not redundant. JavaScriptBuilder.compilePlanToWorkflowScript — empty SWITCH branch Conductor 3.30's SWITCH falls through to defaultCase when the matched decision case is empty. With the round-2 inversion (``decisionCases:{passed: onSuccess}``, ``defaultCase: onFailure``) and the common case of no on_success actions, val_agg='passed' matched the 'passed' case (0 tasks), Conductor fell through to defaultCase, TERMINATE ran, workflow falsely FAILED. Insert a no-op INLINE in the 'passed' branch when on_success is empty so the matched case is never empty. The no-op produces a sentinel result downstream consumers ignore. Tests: - testValidationPassedBranchHasNoOpWhenOnSuccessIsEmpty (NEW) — asserts the 'passed' branch is non-empty when no on_success is declared, and the inserted task is the ok_noop INLINE. - testValidationPassedBranchPreservesOnSuccessTasks (NEW) — counter- test: when on_success IS provided, those tasks land in the passed branch and the no-op is NOT inserted. Verification: - PlanCompilerScriptTest: 27/27 (was 25; +2 regression tests) - MultiAgentCompilerTest: 43/43 - Python unit tests: 1629/1629 - Python e2e tests/integration/test_plan_execute_live.py: 3/3 in 31.9s (under Conductor 3.30.0.rc12 with real OpenAI calls)

bradyyie

QA review: This PR has merge conflicts. It's a massive change (80+ files) spanning Python SDK, TypeScript SDK, Java SDK, and server — adds plan-execute harness, coding agent, worker liveness, state updates protocol, and many tests. Too large and conflicting to approve as-is. Recommend: rebase against main and consider splitting the liveness, plan-execute, and coding agent features into separate PRs.

# Conflicts: # sdk/java/src/main/java/ai/agentspan/internal/ToolRegistry.java # sdk/python/src/agentspan/agents/agent.py # sdk/python/src/agentspan/agents/tool.py # server/src/main/java/dev/agentspan/runtime/model/AgentConfig.java

Move the coding-agent and generic-agent harness design docs out of this PR. The harness module itself is uncommitted WIP and will land on the separate feat/coding-agent-harness branch. This PR now covers only: a) stateful-tools fixes in the SDKs (domain propagation, worker liveness, handoff_check registration, stateful re-registration on domain change) b) PAC/PAE — Strategy.PLAN_EXECUTE, prefill_tools, planSource, success condition eval, parallel FORK_JOIN validations

Continuation of PAC/PAE work, restored from in-progress WIP after the main-sync split: * PAC (Plan-And-Compile): plans.py SDK helpers, PlanAndCompileTask + config on the server, MultiAgentCompiler/AgentCompiler/ToolCompiler/GuardrailCompiler + accompanying tests (SynthOutputScriptTest, EnrichToolsScriptTest, PlanAndCompileTaskTest, AgentCompileE2ETest). * PAE (Plan-And-Execute): docs/concepts/plan-execute.md, integration tests (test_plan_execute_live, test_pac_toolType_routing_e2e), example flows (103_plan_and_compile.py, 104_plan_execute_guardrails.py, 106_plan_execute_agent_fanout.py, 107_pac_mcp_proof.py). * UI: CompiledPlanView, AgentExecutionDiagram updates, agentExecutionUtils, LeftPanelTabs, Execution.jsx state machine for compiled-plan rendering. * SDK plumbing: ModelContextWindows, WorkflowTaskUtils, JavaScriptBuilder, AgentChatCompleteTaskMapper, AgentRun/StartRequest/GoogleADKNormalizer. * Java SDK examples: Example48Planner; Agent + AgentConfigSerializer updates. * Worker domain contract spec (docs/design/WORKER_DOMAIN_CONTRACT.md). * Restored main-side fields that the stash apply had overwritten: AgentConfig.maskedFields, Agent.synthesize/masked_fields, ToolDef.retry_count/retry_delay_seconds, plus their serializer paths.

@tool

… tests The WIP stash apply in be808e0 clobbered three main-side PRs in runtime.py (the merge had them, the WIP version did not). Restoring them and fixing two unrelated CI breakers introduced by the same WIP commit. runtime.py — restore three main PRs: * #200 fix(python-sdk): register transfer tool workers for hybrid agents (this is a stateful-workers fix — without it, hybrid agents with both tools and sub-agents deadlock because the transfer_to_<name> SIMPLE tasks have no Python workers polling for them) -> re-add _register_hybrid_transfer_workers method + its call site -> include transfer_to_<sub> names in _collect_worker_names * #196 fix(sdk): guard _collect against non-Agent sub-agents (LangGraph) -> add isinstance() guard in both _collect inner functions * #188 feat(sdk): retry_count / retry_delay_seconds in @tool decorator -> re-add params to _default_task_def signature -> tool_registry.py passes ToolDef's retry_count/retry_delay_seconds server/build.gradle — revert conductorVersion to 3.30.0.rc12. The WIP bumped to rc13 which lives only in a local conductor checkout (not published to Maven). CI server-tests couldn't resolve the JARs. sdk/typescript/src/types.ts — ToolDef.call is now optional (call?()). The WIP made it required, breaking ToolDef literals like the one in code-execution.ts:246 (CodeExecutor.asTool()) which don't need to be prefill-callable. tests/unit/test_contextbook_flow.py — three stale assertions updated to match the WIP's intentional behavior changes in _issue_fixer_tools.py: * VALID sections set now includes 'task_brief' (added by WIP) * Fetcher write step now also writes task_brief (matches the new _fetcher_done gate in 100_issue_fixer_agent.py) * Repeat-read assertion checks the new 'REPEAT READ #4' / 'STOP RE-READING' banner instead of the old 'repeat read limit exceeded' substring. Local verification: * Python: 1689 unit tests pass (was 1681 pass + 8 fail) * TypeScript: full build + dts build succeed * Server: rc12 is the latest published Maven version (the previous coding_agent state used rc12 successfully)

Two more carry-overs from the WIP overwrite, both blocking server-tests in CI: AgentService.java — restore main PR #195's deleteExecutionRecord and pruneExecutions methods (plus the java.time imports they need). The WIP version of AgentService.java predates that PR, but AgentController still calls both methods — the compileJava target failed with 'cannot find symbol' until these were re-added. server/.../util/PlanCompilerScriptTest.java — delete (WIP intended). The test exercised JavaScriptBuilder.compilePlanToWorkflowScript(), which the PAC/PAE refactor removed. Replacement coverage now lives in PlanAndCompileTaskTest, SynthOutputScriptTest, EnrichToolsScriptTest, and ModelContextWindowsTest — all already committed in be808e0. Local verification: ./gradlew test → BUILD SUCCESSFUL, 578 tests, 0 failures, 0 errors

The project enforces a 'no inline fully-qualified names' lint via a custom Gradle task. Two carry-overs from the WIP overwrite violated it: * AgentCompiler.java:1257 — `.collect(java.util.stream.Collectors.joining(", "))` -> add `import java.util.stream.Collectors;`, use `Collectors.joining` * AgentChatCompleteTaskMapper.java:665 — `java.util.Set<Integer> ... = new java.util.LinkedHashSet<>();` -> add `import java.util.LinkedHashSet;`, drop the `java.util.` prefix (`java.util.Set` is already imported via `import java.util.Set;`) Verified: ./gradlew checkNoInlineFQN -> BUILD SUCCESSFUL ./gradlew test -> BUILD SUCCESSFUL (578 tests, 0 fail)

Run ./gradlew :spotlessApply on the 5 WIP files that violated formatting: * AgentCompiler.java * PlanAndCompileTask.java * AgentChatCompleteTaskMapperTest.java * AgentCompilerTest.java * PlanAndCompileTaskTest.java Mechanical line-break / chain-call reformatting; no behavior change. Verified: ./gradlew :spotlessJavaCheck -> BUILD SUCCESSFUL ./gradlew test -> BUILD SUCCESSFUL

…sult.error Carry-over from the WIP overwrite: PR #201 was on main when we merged it in, but the stash apply rolled runtime.py back to the pre-#201 version. Restored: * Two _run* tail paths inspect tasks for the first FAILED task and use its reasonForIncompletion (prefixed with the task's referenceTaskName) as AgentResult.error, falling back to the workflow-level reason. * _extract_failed_task_reason helper. So instead of the opaque workflow reason, callers now see something like 'Task ''manager_llm'' failed: LLM API returned 429' — diagnosable without opening the execution history UI.

v1r3n added 30 commits April 23, 2026 20:51

docs: add issue fixer agent implementation plan

b5562f0

3-chunk plan: tools (21 @tool functions), agent assembly (6 agents + instructions + pipeline), and verification. 8 tasks, ~30 steps. Files: _issue_fixer_tools.py, _issue_fixer_instructions.py, 100_issue_fixer_agent.py.

feat(examples): add 21 tools for issue fixer agent

6a441bb

feat(examples): add agent instructions for issue fixer

e422152

Six detailed prompt strings: Issue Analyst, Tech Lead, Coder, DG Reviewer, QA Lead, PR Creator. Parameterized with {repo}, {branch_prefix}, {max_review_cycles}, {max_e2e_retries}.

feat(examples): add issue fixer agent — autonomous issue-to-PR pipeline

71a4983

docs: update spec entry point to use join() instead of serve()

56c94ca

fix(examples): prevent .contextbook from being committed

4c2a262

Two-layer fix: 1. Issue Analyst adds '.contextbook/' to .gitignore after cloning 2. All git add commands use ':!.contextbook' exclude pathspec

fix(examples): fetch full issue context including assignees, mileston…

7524526

…e, reactions

docs: add issue fixer agent implementation plan

1aa6a9c

3-chunk plan: tools (21 @tool functions), agent assembly (6 agents + instructions + pipeline), and verification. 8 tasks, ~30 steps. Files: _issue_fixer_tools.py, _issue_fixer_instructions.py, 100_issue_fixer_agent.py.

feat(examples): add 21 tools for issue fixer agent

22a26ee

feat(examples): add agent instructions for issue fixer

fcb0f2e

Six detailed prompt strings: Issue Analyst, Tech Lead, Coder, DG Reviewer, QA Lead, PR Creator. Parameterized with {repo}, {branch_prefix}, {max_review_cycles}, {max_e2e_retries}.

feat(examples): add issue fixer agent — autonomous issue-to-PR pipeline

c834bcb

v1r3n added 18 commits May 6, 2026 15:49

feat(sdk): re-export WorkerStartupError, WorkerStallError

b9c699d

style(server): apply spotless formatting

ccc13c1

No functional changes — line-wrap and indentation adjustments across AgentCompiler, AgentChatCompleteTaskMapper, AgentService, and AgentCompilerTest.

bradyyie reviewed May 12, 2026

View reviewed changes

v1r3n and others added 9 commits May 13, 2026 10:37

Merge branch 'main' into coding_agent

ca0f9f0

This was referenced May 14, 2026

fix: stateful-tools domain handling + preserve tool output in conversation history #237

Merged

feat: Strategy.PLAN_EXECUTE — PAC/PAE compile-and-execute for LLM plans #238

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Coding agent#163

Coding agent#163
v1r3n wants to merge 131 commits into
mainfrom
coding_agent

v1r3n commented Apr 26, 2026

Uh oh!

bradyyie left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

v1r3n commented Apr 26, 2026

Uh oh!

bradyyie left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants