feat: add WhaleFlow — declarative multi-agent workflow orchestration by AdityaVG13 · Pull Request #2482 · Hmbown/CodeWhale

AdityaVG13 · 2026-06-01T05:34:47Z

New crate: crates/whaleflow, providing declarative, JSON-config-driven sub-agent swarm orchestration for CodeWhale. Inspired by Claude Code's Dynamic Workflows.

WorkflowConfig JSON schema with phases, tasks, dependencies
Topological scheduler with semaphore-based concurrency control
File-scope conflict detection for parallel write safety
Git worktree isolation per task (create → extract → apply → clean)
Structured WorkflowResult with per-task cost/token tracking
workflow_run tool schema for model invocation
TUI integration via WhaleFlowSpawner (SubAgentManager bridge)
18 tests: 15 unit + 3 integration

Summary

Testing

cargo fmt --all -- --check
cargo clippy --workspace --all-targets --all-features
cargo test --workspace --all-features

Checklist

Updated docs or comments as needed
Added or updated tests where relevant
Verified TUI behavior manually if UI changes

Greptile Summary

This PR adds crates/whaleflow, a new declarative multi-agent orchestration layer that lets the model drive sub-agent swarms through a JSON workflow config. The TUI crate is extended with a workflow_run tool backed by WhaleFlowSpawner, which translates the scheduler's phase/task graph into SubAgentManager calls with optional git-worktree isolation.

Scheduler & config: topological phase ordering, semaphore-bounded parallelism, file-scope conflict detection, and a validated WorkflowConfig schema. IsolationMode::Worktree correctly returns Some(path) from cwd_path(), and git operations in WhaleFlowSpawner are correctly offloaded to tokio::task::spawn_blocking.
timeout_secs / max_steps: both fields are now fully wired — timeout_secs wraps the poll loop in tokio::time::timeout and max_steps flows through SubAgentSpawnOptions — but neither appears in WORKFLOW_RUN_SCHEMA, so the model cannot discover or use them.
Parallel abort semantics: dropping a JoinHandle in Tokio detaches rather than cancels the task; when FailurePolicy::Abort fires mid-phase the remaining handles are dropped and those sub-agents continue running in the background, potentially writing to the shared workspace after the workflow reports Aborted.

Confidence Score: 3/5

Safe to merge only after addressing the parallel-abort task-detachment issue; sub-agents for Shared-isolation ReadWrite tasks can continue writing to the workspace after the orchestrator considers the workflow aborted.

The worktree lifecycle, blocking-IO isolation, and timeout wiring are well-implemented. However, the parallel fan-out code drops JoinHandles on abort without cancelling the underlying tokio tasks — those tasks are detached and keep running, potentially making file changes in the main workspace that the scheduler has already declared complete or skipped. This is a real behavioral defect on a core execution path.

Files needing attention: crates/whaleflow/src/scheduler.rs (parallel abort detachment) and crates/whaleflow/src/tool.rs (schema missing max_steps/timeout_secs)

Important Files Changed

Filename	Overview
crates/whaleflow/src/scheduler.rs	Core scheduler with topological sort, parallel fan-out, and failure handling. Two issues: parallel Abort drops JoinHandles without cancelling detached tasks (allowing background writes after abort), and non-deterministic phase ordering for independent phases due to HashMap iteration.
crates/whaleflow/src/tool.rs	Exposes workflow_run to the model and wires execute_workflow. The JSON schema for task properties is missing max_steps and timeout_secs, making both fully-implemented fields invisible to the model.
crates/tui/src/tools/workflow/mod.rs	WhaleFlowSpawner bridges the whaleflow scheduler to SubAgentManager. Git operations are correctly offloaded to spawn_blocking, timeout wrapping is sound, and worktree lifecycle (create→extract→apply→remove) is properly sequenced with warnings on partial failure.
crates/whaleflow/src/worktree.rs	Worktree lifecycle management using std::process::Command. extract_changes runs git diff HEAD which only captures uncommitted working-tree changes — committed sub-agent work is silently lost (flagged in a prior review thread).
crates/whaleflow/src/config.rs	Workflow/Phase/Task schema with validation, conflict detection, and cycle checking. scopes_overlap correctly uses path-segment boundary comparisons. IsolationMode::cwd_path() now returns Some for Worktree, resolving a prior concern.
crates/tui/src/core/engine.rs	Wires WhaleFlowSpawner into the tool registry. Correctly guards workflow_tool registration on runtime availability, preserving prior panic-on-None behavior for sub-agent tools.
crates/whaleflow/tests/integration_test.rs	Five integration tests covering three-phase workflows, partial failure, JSON round-trip, and abort policies. MockSpawner returns instantly, which masks the detached-task behavior when Abort fires in a parallel phase.
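
The non-deterministic phase ordering noted for scheduler.rs can be illustrated with a small sketch. This is not the PR's scheduler: `topo_order` and the shape of `deps` are hypothetical, and the sketch assumes phases are identified by string ids. It uses Kahn's algorithm with a `BTreeSet` as the ready set, so independent phases always come out in the same (alphabetical) order instead of following `HashMap` iteration order.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Returns phase ids in dependency order, or None when the graph has a cycle
/// or references an unknown phase. `deps` maps each phase id to the ids it
/// depends on (hypothetical shape, not the PR's WorkflowConfig).
fn topo_order(deps: &BTreeMap<String, Vec<String>>) -> Option<Vec<String>> {
    // Count unmet dependencies per phase and index each phase's dependents.
    let mut pending: BTreeMap<&str, usize> = BTreeMap::new();
    let mut dependents: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for (phase, reqs) in deps {
        pending.insert(phase.as_str(), reqs.len());
        for req in reqs {
            dependents.entry(req.as_str()).or_default().push(phase.as_str());
        }
    }

    // A BTreeSet keeps the ready set sorted, so ties break alphabetically.
    let mut ready: BTreeSet<&str> = pending
        .iter()
        .filter(|(_, unmet)| **unmet == 0)
        .map(|(phase, _)| *phase)
        .collect();

    let mut order = Vec::with_capacity(deps.len());
    while let Some(next) = ready.pop_first() {
        order.push(next.to_string());
        for dependent in dependents.get(next).into_iter().flatten() {
            let unmet = pending.get_mut(dependent)?;
            *unmet -= 1;
            if *unmet == 0 {
                ready.insert(*dependent);
            }
        }
    }

    // Fewer emitted phases than inputs means some dependency was never met.
    (order.len() == deps.len()).then_some(order)
}
```

With phases `plan`, `audit` and `build`, where the latter two both depend on `plan`, this sketch always yields `plan`, `audit`, `build`, which makes scheduler tests reproducible.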

Sequence Diagram

sequenceDiagram
    participant M as Model
    participant WT as WorkflowRunTool
    participant EW as execute_workflow
    participant S as Scheduler
    participant WFS as WhaleFlowSpawner
    participant WM as WorktreeManager
    participant SAM as SubAgentManager

    M->>WT: "workflow_run({config: ...})"
    WT->>EW: execute_workflow(config_json, spawner)
    EW->>S: Scheduler::new(config, spawner)
    EW->>S: run()

    loop for each phase (topo order)
        S->>S: build_prompt(task) — inject upstream results
        alt parallel phase
            par for each task
                S->>WFS: spawn(task_id, prompt, cwd, timeout, max_steps)
                alt "isolation = worktree"
                    WFS->>WM: create(task_id, workspace) via spawn_blocking
                    WM-->>WFS: worktree_path
                end
                WFS->>SAM: spawn_background_with_assignment_options
                loop poll 250ms
                    WFS->>SAM: get_result(agent_id)
                    SAM-->>WFS: status
                end
                alt completed and worktree
                    WFS->>WM: extract_changes (git diff HEAD)
                    WFS->>WM: apply_patch (git apply)
                    WFS->>WM: remove (git worktree remove)
                end
                WFS-->>S: AgentResult
            end
        else sequential phase
            S->>WFS: spawn(...)
            WFS-->>S: AgentResult
        end
    end

    S-->>EW: WorkflowResult
    EW-->>WT: result JSON
    WT-->>M: ToolResult(success, json)

Comments Outside Diff (3)

crates/whaleflow/src/worktree.rs, line 2029-2036

Blocking std::process::Command called from inside async context

WorktreeManager::create, extract_changes, apply_patch, and remove all use std::process::Command::output() / wait_with_output(), which are blocking syscalls. These methods are called directly inside the async fn spawn(...) implementation of WhaleFlowSpawner, which itself runs on a tokio worker thread (via tokio::spawn in the scheduler's parallel fan-out path). Blocking a tokio worker thread with long-running git operations (especially git worktree add or git apply on a large repository) can starve the async runtime and degrade all concurrent tasks.

The fix is to wrap each Command call in tokio::task::spawn_blocking(|| ...) and .await the result, or switch to tokio::process::Command.

crates/whaleflow/src/config.rs, line 761-784

scopes_overlap has false positives due to string prefix vs. path prefix mismatch

strip_glob("src/auth/**") yields "src/auth" and strip_glob("src/auth_admin/**") yields "src/auth_admin". The check "src/auth_admin".starts_with("src/auth") returns true (string prefix), so these two entirely disjoint directory scopes are incorrectly flagged as overlapping. std::path::Path::starts_with enforces component boundaries and would correctly return false here (Path::new("src/auth_admin").starts_with(Path::new("src/auth")) → false). The impact is a spurious OverlappingScopes warning logged at WARN level; the workflow continues, but users see misleading diagnostics.
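
The difference between the string-prefix check and the component-wise check can be reproduced with the standard library alone. In the sketch below, `strip_glob` and `scopes_overlap` are simplified stand-ins written for illustration (the PR's real helpers are not shown in this thread), and the sketch assumes scopes are directory globs ending in `/**` or `/*`.

```rust
use std::path::Path;

/// Strips a trailing `/**` or `/*` glob suffix (simplified, hypothetical helper).
fn strip_glob(scope: &str) -> &str {
    scope.trim_end_matches("/**").trim_end_matches("/*")
}

/// True when one scope's directory is a component-wise prefix of the other.
/// `Path::starts_with` only matches whole path components, so `src/auth`
/// is not treated as a prefix of `src/auth_admin`.
fn scopes_overlap(a: &str, b: &str) -> bool {
    let (a, b) = (Path::new(strip_glob(a)), Path::new(strip_glob(b)));
    a.starts_with(b) || b.starts_with(a)
}
```

Under these assumptions, `scopes_overlap("src/auth/**", "src/auth_admin/**")` is false while `scopes_overlap("src/auth/**", "src/auth/tokens/**")` is true, whereas the plain string check `"src/auth_admin".starts_with("src/auth")` is true for the disjoint pair.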
crates/whaleflow/src/config.rs, line 635-648

depends_on_results can reference tasks in the same parallel phase, silently receiving no data

Validation confirms that depends_on_results IDs exist somewhere in the workflow, but it does not verify that they belong to a prior phase. If task B in a parallel phase lists task A (in the same phase) in depends_on_results, both tasks are spawned concurrently. When the scheduler calls build_prompt for B before A has completed, self.results.get("A") returns None, and the prompt silently includes "### A (not available)\n\n" instead of real context. The model sees no error — it just gets empty upstream data, producing subtly wrong behavior with no diagnostic.
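
One way to close this gap is to reject such configs at validation time. The sketch below is an assumption-laden illustration, not the PR's validator: `Task`, `Phase`, the `parallel` flag and `check_result_deps` are hypothetical shapes, and phases are assumed to be listed in execution order already.

```rust
use std::collections::HashSet;

// Hypothetical, simplified config shapes for illustration only.
struct Task {
    id: String,
    depends_on_results: Vec<String>,
}

struct Phase {
    parallel: bool,
    tasks: Vec<Task>,
}

/// Checks that every `depends_on_results` entry names a task that has
/// finished before the dependent task can start.
fn check_result_deps(phases: &[Phase]) -> Result<(), String> {
    let mut finished: HashSet<&str> = HashSet::new();
    for phase in phases {
        // Tasks of a parallel phase only count as finished once the phase ends.
        let mut held_back: Vec<&str> = Vec::new();
        for task in &phase.tasks {
            for dep in &task.depends_on_results {
                if !finished.contains(dep.as_str()) {
                    return Err(format!(
                        "task `{}` reads results of `{}`, which has not finished by then",
                        task.id, dep
                    ));
                }
            }
            if phase.parallel {
                held_back.push(task.id.as_str());
            } else {
                // In a sequential phase, later tasks may read earlier ones.
                finished.insert(task.id.as_str());
            }
        }
        finished.extend(held_back);
    }
    Ok(())
}
```

Failing validation turns the silent "(not available)" prompt into an explicit configuration error before any sub-agent is spawned.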

Reviews (4): Last reviewed commit: "fix(whaleflow): improve scopes_overlap w..."

Greptile also left 1 inline comment on this PR.

New crate crates/whaleflow providing declarative JSON-config-driven sub-agent swarm orchestration for CodeWhale. Inspired by Claude Code's Dynamic Workflows (Opus 4.8, May 2026).

- WorkflowConfig JSON schema with phases, tasks, dependencies
- Topological scheduler with semaphore-based concurrency control
- File-scope conflict detection for parallel write safety
- Git worktree isolation per task (create → extract → apply → clean)
- Structured WorkflowResult with per-task cost/token tracking
- workflow_run tool schema for model invocation
- TUI integration via WhaleFlowSpawner (SubAgentManager bridge)
- 18 tests: 15 unit + 3 integration

…_blocking

- cwd_path() now returns worktree path for Worktree variant (was dead code)
- parallel phases now honor Abort failure policy
- WorktreeManager git calls wrapped in tokio::task::spawn_blocking
- timeout_secs wired end-to-end with tokio::time::timeout on polling loop
- AgentSpawner trait extended with timeout_secs/max_steps parameters
- WorkflowRunTool no longer claims ReadOnly capability
- unknown agent_type now logs a warning instead of silently defaulting

Addresses Greptile review: P1 (blocking Command), P2 (dead timeout_secs)

greptile-apps · 2026-06-01T06:13:15Z

+    pub fn extract_changes(task_id: &str, workspace: &Path) -> Result<String, SpawnError> {
+        let relative = format!(".worktrees/whaleflow-{}", task_id);
+        let worktree_path = workspace.join(&relative);
+
+        let output = Command::new("git")
+            .arg("-C")
+            .arg(&worktree_path)
+            .arg("diff")
+            .arg("HEAD")
+            .output()
+            .map_err(|e| {
+                SpawnError::WorktreeError(format!("git diff in worktree failed: {}", e))
+            })?;
+
+        if !output.status.success() {
+            let stderr = String::from_utf8_lossy(&output.stderr);
+            return Err(SpawnError::WorktreeError(format!(
+                "git diff in worktree failed: {}",
+                stderr.trim()
+            )));
+        }
+
+        Ok(String::from_utf8_lossy(&output.stdout).into_owned())


git diff HEAD misses committed changes — data loss for committing sub-agents

extract_changes runs git diff HEAD inside the worktree, which compares the working directory against the worktree's current HEAD. If a sub-agent commits any of its work (advancing HEAD in the worktree), git diff HEAD only captures uncommitted changes after the last commit — all committed changes between the worktree's initial HEAD and its final HEAD are excluded from the patch. When the worktree is then removed, those committed changes are permanently lost.

The fix is to record the initial HEAD SHA when create is called, then run git diff <initial_sha> in extract_changes. That form compares the working tree against the initial commit, so it captures both committed and uncommitted changes in one patch. Note that git diff <initial_sha>..HEAD compares the two commits only and would omit uncommitted edits.
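
A minimal sketch of the two git invocations involved, assuming the base SHA is stored alongside the worktree at creation time. `base_sha_command` and `diff_since_base` are hypothetical helper names, not functions from the PR; they only build the `std::process::Command` values and leave execution and error handling to the caller. Untracked files do not appear in `git diff` output unless they are added to the index (for example with `git add -N`), so a real implementation would need to account for those separately.

```rust
use std::path::Path;
use std::process::Command;

/// Builds `git -C <worktree> rev-parse HEAD`, to be run right after the
/// worktree is created so the starting commit can be remembered.
fn base_sha_command(worktree: &Path) -> Command {
    let mut cmd = Command::new("git");
    cmd.arg("-C").arg(worktree).arg("rev-parse").arg("HEAD");
    cmd
}

/// Builds `git -C <worktree> diff <base_sha>`, which diffs the working tree
/// against the recorded starting commit rather than the current HEAD, so
/// commits made by the sub-agent are included in the patch.
fn diff_since_base(worktree: &Path, base_sha: &str) -> Command {
    let mut cmd = Command::new("git");
    cmd.arg("-C").arg(worktree).arg("diff").arg(base_sha);
    cmd
}
```

The caller would run the first command once in create, keep the trimmed stdout as the base SHA, and pass it to the second command in extract_changes.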

… tests

- max_steps flows through SubAgentSpawnOptions to per-agent step budget
- extract_changes errors now logged instead of silently ignored
- files_touched populated from worktree diff output
- TaskStatus re-exported from whaleflow crate
- 3 new tests: abort in parallel phase, abort stops subsequent phases, timeout_secs/max_steps deserialization

Addresses Greptile P1 (silent failure on extract_changes)

greptile-apps · 2026-06-01T07:03:20Z

+                let handle = tokio::spawn(async move {
+                    let _permit = sem.acquire().await;
+                    spawner
+                        .spawn(task_id_for_closure, prompt, task.agent_type.clone(), cwd, timeout_secs, max_steps)
+                        .await
+                });
+                handles.push((task_id, handle));
+            }
+


Spawned tasks are detached, not cancelled, on Abort in parallel phase

When a task fails and FailurePolicy::Abort triggers inside the for (task_id, handle) in handles loop, the loop breaks and the remaining JoinHandles in handles are dropped. Dropping a tokio::spawn JoinHandle in Tokio detaches the task — the underlying async task continues running to completion. For Shared-isolation ReadWrite tasks this means additional sub-agents keep writing to the main workspace after the scheduler has already reported the workflow as Aborted, producing changes the orchestrator never learns about and that can conflict with subsequent workflow runs. The integration test passes because the MockSpawner returns instantly, so the detached task finishes before the test asserts anything, masking the real-world behavior.
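
The shape of a fix can be shown with a runtime-agnostic analog using std threads, where dropping a `JoinHandle` likewise detaches the thread. This is a sketch under stated assumptions, not the PR's scheduler: `Job` and `run_phase` are hypothetical, and the simulated work is just a sleep loop. The two ideas it demonstrates are a shared cancellation flag that running tasks check, and joining every handle so nothing outlives the phase. In Tokio the corresponding tools would be `JoinHandle::abort` or a `JoinSet`, which aborts its tasks when dropped. Since the sequence diagram suggests agents are started in the background and then polled, cancelling the polling task alone may still leave the agent running, so the agent itself would also need a stop signal.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Simulated task: `steps` units of work, or an immediate failure.
struct Job {
    steps: u32,
    fails: bool,
}

/// Runs all jobs concurrently. After the first observed failure the shared
/// flag tells the remaining workers to stop, and every handle is still
/// joined so no worker keeps running after this function returns.
fn run_phase(jobs: Vec<Job>) -> Vec<Result<u32, String>> {
    let cancel = Arc::new(AtomicBool::new(false));
    let handles: Vec<_> = jobs
        .into_iter()
        .map(|job| {
            let cancel = Arc::clone(&cancel);
            thread::spawn(move || {
                if job.fails {
                    return Err("task failed".to_string());
                }
                let mut done = 0;
                while done < job.steps {
                    // Cooperative cancellation point between units of work.
                    if cancel.load(Ordering::SeqCst) {
                        return Err(format!("cancelled after {done} steps"));
                    }
                    thread::sleep(Duration::from_millis(1));
                    done += 1;
                }
                Ok(done)
            })
        })
        .collect();

    let mut results = Vec::with_capacity(handles.len());
    for handle in handles {
        let result = handle.join().expect("worker panicked");
        if result.is_err() {
            // Abort policy: signal the others instead of dropping their handles.
            cancel.store(true, Ordering::SeqCst);
        }
        results.push(result);
    }
    results
}
```

A test built on this pattern should use a worker that takes real time, since an instantly returning mock finishes before the abort path is exercised.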

mo-vic · 2026-06-05T14:08:00Z

Cool feature! I can't wait to play with it.

Hmbown · 2026-06-06T01:59:02Z

Thanks @AdityaVG13. I did a fresh v0.9 stewardship pass on this and #2486.

This is the right WhaleFlow direction and I want to preserve it as the source branch for the workflow runner work, but I am not going to merge or directly harvest the full PR into the current v0.9 integration branch as-is.

Current release evidence:

The current codex/v0.9.0-stewardship branch still has design references only; it does not have crates/whaleflow, WorkflowRunTool, or workflow_run registered in code yet.
This PR is draft/dirty and its latest matrix has failing lint and Windows tests.
The review findings are real release risks for an agent/workflow executor: aborting a parallel phase can detach still-running agents, worktree extraction can miss committed work, and the model-facing schema still needs to expose the execution controls it implements.
The milestone definition is stricter than a JSON-only runner: v0.9 wants a typed workflow IR / Rust executor path with branch/leaf semantics, replay/evidence, and clear safety boundaries.

Safe path from here: keep this PR open as the intent/source branch, then land WhaleFlow in smaller maintainer slices against the v0.9 branch. The first viable slice should be something like typed config/IR validation plus deterministic scheduler tests, behind a feature/config gate and without exposing a write-capable workflow_run tool until cancellation, worktree diff capture, and replay/evidence semantics are airtight. Any harvested slice should credit you in the commit/PR body/changelog.

Thanks also @mo-vic for the product interest here. The feature is exciting; the restraint is only because workflow orchestration is exactly the kind of surface where a partial merge can write to the wrong place or leave agent work running after the UI says it stopped.

@AdityaVG13

Adds @AdityaVG13 to the contribution-gate allowlist now that WhaleFlow #2482/#2486 have been harvested into the maintained v0.9 IR/TraceStore foundation with public credit.

Hmbown · 2026-06-06T03:54:43Z

Thanks @AdityaVG13. As part of v0.9 stewardship, I opened #2821 as a narrow maintainer harvest of the safe typed-IR naming surface from this WhaleFlow direction.

What was harvested: explicit WorkflowSpec, WorkflowNode, branch/leaf specs, budget/permission/model/promotion policy metadata structs, and workflow_ir_roundtrip coverage in crates/whaleflow.

What remains intentionally out of scope for that maintainer PR: runtime workflow_run exposure, executor behavior, deterministic replay, worktree application, and model/provider routing. This PR still carries the broader orchestration intent, so I am not treating it as replaced wholesale.

Add the explicit WorkflowSpec/WorkflowNode metadata surface requested for the v0.9 WhaleFlow IR, including budget, permission, model, and promotion policy records plus serde roundtrip coverage. Runtime execution, replay, and worktree application remain out of scope. Refs #2668, #2482, #2486. Co-authored-by: AdityaVG13 <44177453+AdityaVG13@users.noreply.github.com>

Hmbown · 2026-06-06T04:24:15Z

Thanks again @AdityaVG13. I opened #2823 as another narrow v0.9 maintainer harvest from this WhaleFlow direction.

What #2823 takes: a crate-local mock executor skeleton over WorkflowSpec, acceptance-style tests for #2669 control flow, ExpandSpec::max_children, generated-node validation, and pure BranchTournament / ParetoFrontier reducer scaffolding.

What remains intentionally out of scope: live workflow_run, real subagent spawning, worktree apply/extract, TraceStore writes, replay, provider routing, and TUI workflow mode. This PR remains the broader source branch for that intent, so I am not treating it as replaced wholesale.

Add a crate-local mock executor over WorkflowSpec that records leaf, branch, and control-node results for Sequence, BranchSet, Leaf, Reduce, TeacherReview, LoopUntil, Cond, and Expand. Add reducer scaffolding for BranchTournament and ParetoFrontier, plus #2669 acceptance-style tests, without exposing workflow_run, spawning agents, or applying worktrees. Refs #2669. Harvests narrow WhaleFlow executor intent from #2482/#2486. Co-authored-by: AdityaVG13 <44177453+AdityaVG13@users.noreply.github.com>

Hmbown · 2026-06-06T05:08:05Z

v0.9 stewardship update: I opened #2829 as another narrow WhaleFlow maintainer slice inspired by the broader workflow direction here.

#2829 adds crate-only deterministic replay from recorded leaf/control records, including stable leaf input hashes and replay_diverged for missing records. It deliberately avoids runtime commands, live provider calls, worktree replay, and the broader draft scope, so this PR remains open as the larger source branch.

Thanks @AdityaVG13 for the WhaleFlow draft direction; the credit is preserved in the #2829 PR body and changelog.
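
The replay idea described above can be sketched as a lookup keyed by leaf id and a stable hash of the leaf input. Everything here is hypothetical and written for illustration: `Recording`, `ReplayError::ReplayDiverged`, and the choice of FNV-1a are assumptions, not the design in #2829. A hand-written FNV-1a is used because the hashing algorithm behind the standard library's `DefaultHasher` is not guaranteed to stay the same across Rust releases.

```rust
use std::collections::HashMap;

/// 64-bit FNV-1a: a simple hash with a fixed, documented definition.
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325; // FNV offset basis
    for byte in bytes {
        hash ^= u64::from(*byte);
        hash = hash.wrapping_mul(0x100000001b3); // FNV prime
    }
    hash
}

#[derive(Debug, PartialEq)]
enum ReplayError {
    /// No recorded output exists for this leaf with this exact input.
    ReplayDiverged { leaf_id: String, input_hash: u64 },
}

/// Recorded leaf outputs keyed by (leaf id, input hash). Hypothetical shape.
#[derive(Default)]
struct Recording {
    outputs: HashMap<(String, u64), String>,
}

impl Recording {
    fn record(&mut self, leaf_id: &str, input: &str, output: &str) {
        let key = (leaf_id.to_string(), fnv1a64(input.as_bytes()));
        self.outputs.insert(key, output.to_string());
    }

    /// Returns the recorded output, or a divergence error when the input
    /// differs from what was recorded (or nothing was recorded at all).
    fn replay(&self, leaf_id: &str, input: &str) -> Result<&str, ReplayError> {
        let input_hash = fnv1a64(input.as_bytes());
        self.outputs
            .get(&(leaf_id.to_string(), input_hash))
            .map(String::as_str)
            .ok_or(ReplayError::ReplayDiverged {
                leaf_id: leaf_id.to_string(),
                input_hash,
            })
    }
}
```

Failing with an explicit divergence error, rather than falling back to a live call, keeps replay free of provider traffic.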

Hmbown · 2026-06-06T05:13:18Z

v0.9 stewardship update: I opened #2830 as a crate-only model-policy slice for WhaleFlow.

#2830 adds role/capability model selection, mock provider plumbing, and fail-closed JSON repair parsing without live provider calls or runtime provider switching. It keeps the broader #2672 provider-routing/adapter work out of scope while preserving the broader WhaleFlow draft direction here.

Thanks @AdityaVG13; the changelog and PR body keep the broader WhaleFlow draft credit trail intact.

Hmbown · 2026-06-06T05:16:52Z

Another narrow WhaleFlow foundation slice is up in #2831. It keeps the work inside codewhale-whaleflow: the rlm_cache_change.star dogfood workflow now compiles and runs through the mock executor, with candidate branches, LoopUntil verification, tournament selection, teacher review, and reduction represented in the IR.

This still does not claim the full runtime workflow mode, live provider replay, or shared persistence pieces from this broader proposal. It is intended as CI-backed scaffolding we can build on safely for v0.9.0. Thanks @AdityaVG13 for the draft architecture and cost-tracking direction that shaped these WhaleFlow slices.

Hmbown · 2026-06-06T05:23:40Z

Follow-up v0.9 stewardship update: #2833 adds another crate-only WhaleFlow foundation slice.

New in #2833:

WorkflowMemoUsage separates ARMH/shared-memo telemetry from provider token/cost usage;
leaf, branch, and workflow results carry memo counters;
mock execution aggregates memo usage and replay preserves recorded counters.

This still does not expose workflow_run, live RLM/provider calls, shared DB memo lookup, TraceStore writes, or TUI workflow mode. Thanks @AdityaVG13 for the WhaleFlow draft and cache/cost direction; this is deliberately just the typed telemetry shape needed before runtime behavior is safe.
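
The separation of memo telemetry from provider usage might look roughly like the sketch below. Only the name `WorkflowMemoUsage` comes from the comment above; its fields, `ProviderUsage`, `LeafResult` and `aggregate` are assumptions made for illustration and may not match #2833.

```rust
/// Provider-side usage, tracked separately from memo telemetry (assumed fields).
#[derive(Debug, Default, Clone, Copy, PartialEq)]
struct ProviderUsage {
    input_tokens: u64,
    output_tokens: u64,
}

/// Memo counters; the field names are hypothetical.
#[derive(Debug, Default, Clone, Copy, PartialEq)]
struct WorkflowMemoUsage {
    hits: u64,
    misses: u64,
}

/// A leaf result carrying both kinds of counters side by side.
struct LeafResult {
    provider: ProviderUsage,
    memo: WorkflowMemoUsage,
}

/// Sums both counter sets independently over all leaves, so a memo hit
/// never gets folded into token totals or the other way around.
fn aggregate(leaves: &[LeafResult]) -> (ProviderUsage, WorkflowMemoUsage) {
    leaves.iter().fold(
        (ProviderUsage::default(), WorkflowMemoUsage::default()),
        |(mut provider, mut memo), leaf| {
            provider.input_tokens += leaf.provider.input_tokens;
            provider.output_tokens += leaf.provider.output_tokens;
            memo.hits += leaf.memo.hits;
            memo.misses += leaf.memo.misses;
            (provider, memo)
        },
    )
}
```

Keeping the two structs distinct means a leaf served from the memo can report a hit with zero provider tokens without distorting cost totals.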

greptile-apps Bot reviewed Jun 1, 2026

Comment thread crates/whaleflow/src/config.rs

Comment thread crates/whaleflow/src/config.rs

AdityaVG13 marked this pull request as draft June 1, 2026 05:54

greptile-apps Bot reviewed Jun 1, 2026

AdityaVG13 added 3 commits June 1, 2026 02:19

fix(whaleflow): validate missing file_scope in parallel ReadWrite tasks

574a606

fix(whaleflow): improve scopes_overlap with path-boundary matching

069b7db

AdityaVG13 mentioned this pull request Jun 1, 2026

Feat/whaleflow cost tracking #2486

Draft

6 tasks

greptile-apps Bot reviewed Jun 1, 2026

This was referenced Jun 1, 2026

[codex] v0.8.50 triage harvest #2504

Merged

EPIC: v0.9.0 WhaleFlow branch/leaf workflow mode #2667

Open

Hmbown added the whaleflow label (WhaleFlow branch/leaf workflow runtime and workflow mode) Jun 3, 2026

Hmbown added this to the v0.9.0 milestone Jun 3, 2026

Hmbown added the v0.9.0 (Targeting v0.9.0) and workflow-runtime (Workflow IR, executor, control flow, and replay runtime) labels Jun 3, 2026

Hmbown mentioned this pull request Jun 3, 2026

v0.9.0 Open PR harvest: merge, supersede, or close long-lived branches #2722

Open

This was referenced Jun 6, 2026

WhaleFlow: typed workflow IR and TraceStore state migrations #2668

Closed

feat(whaleflow): add typed workflow spec IR #2821

Merged

Hmbown mentioned this pull request Jun 6, 2026

feat(whaleflow): add mock executor skeleton #2823

Merged

Hmbown mentioned this pull request Jun 6, 2026

WhaleFlow: Rust executor skeleton for branch/leaf workflows #2669

Open

Hmbown mentioned this pull request Jun 6, 2026

feat(whaleflow): replay recorded workflow outputs #2829

Merged

Hmbown mentioned this pull request Jun 6, 2026

feat(whaleflow): add model role policy registry #2830

Merged

AdityaVG13 wants to merge 5 commits into Hmbown:main from AdityaVG13:feat/whaleflow

3 participants
