
feat: add WhaleFlow — declarative multi-agent workflow orchestration #2482

Draft
AdityaVG13 wants to merge 5 commits into Hmbown:main from AdityaVG13:feat/whaleflow


@AdityaVG13 (Contributor) commented Jun 1, 2026

New crate crates/whaleflow, providing declarative, JSON-config-driven sub-agent swarm orchestration for CodeWhale. Inspired by Claude Code's Dynamic Workflows.

  • WorkflowConfig JSON schema with phases, tasks, dependencies
  • Topological scheduler with semaphore-based concurrency control
  • File-scope conflict detection for parallel write safety
  • Git worktree isolation per task (create → extract → apply → clean)
  • Structured WorkflowResult with per-task cost/token tracking
  • workflow_run tool schema for model invocation
  • TUI integration via WhaleFlowSpawner (SubAgentManager bridge)
  • 18 tests: 15 unit + 3 integration

Summary

Testing

  • cargo fmt --all -- --check
  • cargo clippy --workspace --all-targets --all-features
  • cargo test --workspace --all-features

Checklist

  • Updated docs or comments as needed
  • Added or updated tests where relevant
  • Verified TUI behavior manually if UI changes

Greptile Summary

This PR adds crates/whaleflow, a new declarative multi-agent orchestration layer that lets the model drive sub-agent swarms through a JSON workflow config. The TUI crate is extended with a workflow_run tool backed by WhaleFlowSpawner, which translates the scheduler's phase/task graph into SubAgentManager calls with optional git-worktree isolation.

  • Scheduler & config: topological phase ordering, semaphore-bounded parallelism, file-scope conflict detection, and a validated WorkflowConfig schema. IsolationMode::Worktree correctly returns Some(path) from cwd_path(), and git operations in WhaleFlowSpawner are correctly offloaded to tokio::task::spawn_blocking.
  • timeout_secs / max_steps: both fields are now fully wired — timeout_secs wraps the poll loop in tokio::time::timeout and max_steps flows through SubAgentSpawnOptions — but neither appears in WORKFLOW_RUN_SCHEMA, so the model cannot discover or use them.
  • Parallel abort semantics: dropping a JoinHandle in Tokio detaches rather than cancels the task; when FailurePolicy::Abort fires mid-phase the remaining handles are dropped and those sub-agents continue running in the background, potentially writing to the shared workspace after the workflow reports Aborted.

Confidence Score: 3/5

Safe to merge only after addressing the parallel-abort task-detachment issue; sub-agents for Shared-isolation ReadWrite tasks can continue writing to the workspace after the orchestrator considers the workflow aborted.

The worktree lifecycle, blocking-IO isolation, and timeout wiring are well-implemented. However, the parallel fan-out code drops JoinHandles on abort without cancelling the underlying tokio tasks — those tasks are detached and keep running, potentially making file changes in the main workspace that the scheduler has already declared complete or skipped. This is a real behavioral defect on a core execution path.

crates/whaleflow/src/scheduler.rs (parallel abort detachment) and crates/whaleflow/src/tool.rs (schema missing max_steps/timeout_secs)

Important Files Changed

  • crates/whaleflow/src/scheduler.rs: Core scheduler with topological sort, parallel fan-out, and failure handling. Two issues: parallel Abort drops JoinHandles without cancelling the detached tasks (allowing background writes after abort), and phase ordering for independent phases is non-deterministic due to HashMap iteration.
  • crates/whaleflow/src/tool.rs: Exposes workflow_run to the model and wires execute_workflow. The JSON schema for task properties is missing max_steps and timeout_secs, making both fully implemented fields invisible to the model.
  • crates/tui/src/tools/workflow/mod.rs: WhaleFlowSpawner bridges the whaleflow scheduler to SubAgentManager. Git operations are correctly offloaded to spawn_blocking, timeout wrapping is sound, and the worktree lifecycle (create→extract→apply→remove) is properly sequenced with warnings on partial failure.
  • crates/whaleflow/src/worktree.rs: Worktree lifecycle management using std::process::Command. extract_changes runs git diff HEAD, which only captures uncommitted working-tree changes; committed sub-agent work is silently lost (flagged in a prior review thread).
  • crates/whaleflow/src/config.rs: Workflow/Phase/Task schema with validation, conflict detection, and cycle checking. scopes_overlap correctly uses path-segment boundary comparisons. IsolationMode::cwd_path() now returns Some for Worktree, resolving a prior concern.
  • crates/tui/src/core/engine.rs: Wires WhaleFlowSpawner into the tool registry. Correctly guards workflow_tool registration on runtime availability, preserving the prior panic-on-None behavior for sub-agent tools.
  • crates/whaleflow/tests/integration_test.rs: Five integration tests covering three-phase workflows, partial failure, JSON round-trip, and abort policies. MockSpawner returns instantly, which masks the detached-task behavior when Abort fires in a parallel phase.
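The scheduler.rs note about non-deterministic ordering of independent phases comes down to how ties are broken. A minimal sketch, assuming a hypothetical map from phase id to the set of phase ids it depends on (not the PR's actual WorkflowConfig types): Kahn's algorithm over BTreeMap/BTreeSet, which always releases ready phases in sorted id order, so the same workflow schedules identically on every run.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Orders phases so each phase comes after the phases it depends on.
/// `deps` maps a phase id to the ids it depends on (hypothetical shape).
/// BTreeMap/BTreeSet iterate in sorted order, so independent phases are
/// always emitted in the same order, unlike iterating a HashMap.
pub fn topo_order(deps: &BTreeMap<String, BTreeSet<String>>) -> Result<Vec<String>, String> {
    // Unresolved dependency count per phase.
    let mut remaining: BTreeMap<&str, usize> = BTreeMap::new();
    // Reverse edges: dependency id -> phases waiting on it.
    let mut dependents: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for (phase, ds) in deps {
        remaining.insert(phase.as_str(), ds.len());
        for d in ds {
            if !deps.contains_key(d) {
                return Err(format!("phase {} depends on unknown phase {}", phase, d));
            }
            dependents.entry(d.as_str()).or_default().push(phase.as_str());
        }
    }
    // Phases with no unresolved dependencies, kept sorted for determinism.
    let mut ready: BTreeSet<&str> = remaining
        .iter()
        .filter(|&(_, &n)| n == 0)
        .map(|(&p, _)| p)
        .collect();
    let mut order = Vec::with_capacity(deps.len());
    // pop_first always takes the smallest ready id, breaking ties by name.
    while let Some(next) = ready.pop_first() {
        order.push(next.to_string());
        if let Some(waiting) = dependents.get(next) {
            for &w in waiting {
                let n = remaining.get_mut(w).expect("every dependent is a known phase");
                *n -= 1;
                if *n == 0 {
                    ready.insert(w);
                }
            }
        }
    }
    // Any phase never released is part of a cycle.
    if order.len() != deps.len() {
        return Err("dependency cycle between phases".to_string());
    }
    Ok(order)
}
```

With audit and build both independent and verify depending on build, this returns audit, build, verify on every run; a HashMap-backed ready set could release audit and build in either order.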

Sequence Diagram

```mermaid
sequenceDiagram
    participant M as Model
    participant WT as WorkflowRunTool
    participant EW as execute_workflow
    participant S as Scheduler
    participant WFS as WhaleFlowSpawner
    participant WM as WorktreeManager
    participant SAM as SubAgentManager

    M->>WT: "workflow_run({config: ...})"
    WT->>EW: execute_workflow(config_json, spawner)
    EW->>S: Scheduler::new(config, spawner)
    EW->>S: run()

    loop for each phase (topo order)
        S->>S: build_prompt(task) — inject upstream results
        alt parallel phase
            par for each task
                S->>WFS: spawn(task_id, prompt, cwd, timeout, max_steps)
                alt "isolation = worktree"
                    WFS->>WM: create(task_id, workspace) via spawn_blocking
                    WM-->>WFS: worktree_path
                end
                WFS->>SAM: spawn_background_with_assignment_options
                loop poll 250ms
                    WFS->>SAM: get_result(agent_id)
                    SAM-->>WFS: status
                end
                alt completed and worktree
                    WFS->>WM: extract_changes (git diff HEAD)
                    WFS->>WM: apply_patch (git apply)
                    WFS->>WM: remove (git worktree remove)
                end
                WFS-->>S: AgentResult
            end
        else sequential phase
            S->>WFS: spawn(...)
            WFS-->>S: AgentResult
        end
    end

    S-->>EW: WorkflowResult
    EW-->>WT: result JSON
    WT-->>M: ToolResult(success, json)
```

Comments Outside Diff (3)

  1. crates/whaleflow/src/worktree.rs, lines 2029-2036

    P1 Blocking std::process::Command called from inside async context

    WorktreeManager::create, extract_changes, apply_patch, and remove all use std::process::Command::output() / wait_with_output(), which are blocking syscalls. These methods are called directly inside the async fn spawn(...) implementation of WhaleFlowSpawner, which itself runs on a tokio worker thread (via tokio::spawn in the scheduler's parallel fan-out path). Blocking a tokio worker thread with long-running git operations (especially git worktree add or git apply on a large repository) can starve the async runtime and degrade all concurrent tasks.

    The fix is to wrap each Command call in tokio::task::spawn_blocking(|| ...) and .await the result, or switch to tokio::process::Command.


  2. crates/whaleflow/src/config.rs, lines 761-784

    P2 scopes_overlap has false positives due to string prefix vs. path prefix mismatch

    strip_glob("src/auth/**") yields "src/auth" and strip_glob("src/auth_admin/**") yields "src/auth_admin". The check "src/auth_admin".starts_with("src/auth") returns true (string prefix), so these two entirely disjoint directory scopes are incorrectly flagged as overlapping. std::path::Path::starts_with enforces component boundaries and would correctly return false here (Path::new("src/auth_admin").starts_with(Path::new("src/auth")) → false). The impact is a spurious OverlappingScopes warning logged at WARN level; the workflow continues, but users see misleading diagnostics.


  3. crates/whaleflow/src/config.rs, lines 635-648

    P2 depends_on_results can reference tasks in the same parallel phase, silently receiving no data

    Validation confirms that depends_on_results IDs exist somewhere in the workflow, but it does not verify that they belong to a prior phase. If task B in a parallel phase lists task A (in the same phase) in depends_on_results, both tasks are spawned concurrently. When the scheduler calls build_prompt for B before A has completed, self.results.get("A") returns None, and the prompt silently includes "### A (not available)\n\n" instead of real context. The model sees no error — it just gets empty upstream data, producing subtly wrong behavior with no diagnostic.

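Outside-diff comment 2 recommends std::path::Path::starts_with over a raw string-prefix check. A sketch of that component-boundary comparison; strip_glob here is a hypothetical simplification that only trims a trailing /** or /*, and the PR's own helper may handle more patterns.

```rust
use std::path::Path;

/// Hypothetical simplification of strip_glob: drop a trailing "/**" or "/*",
/// so "src/auth/**" becomes "src/auth".
fn strip_glob(scope: &str) -> &str {
    scope.trim_end_matches("/**").trim_end_matches("/*")
}

/// True when one scope's directory contains the other's. Path::starts_with
/// compares whole path components, so "src/auth_admin" does not start with
/// "src/auth", while "src/auth/session.rs" does.
pub fn scopes_overlap(a: &str, b: &str) -> bool {
    let (pa, pb) = (Path::new(strip_glob(a)), Path::new(strip_glob(b)));
    pa.starts_with(pb) || pb.starts_with(pa)
}
```

For contrast, the string check "src/auth_admin".starts_with("src/auth") is true, which is exactly the false positive the comment describes.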

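Outside-diff comment 3 asks validation to confirm that every depends_on_results ID belongs to a prior phase rather than merely existing. A sketch of that check over hypothetical, trimmed-down Phase/Task structs, assuming the phases are already listed in execution order. It is deliberately conservative: it also rejects same-phase references in a sequential phase, where an earlier task's result could in principle be available.

```rust
use std::collections::HashMap;

/// Hypothetical, trimmed-down shapes; the PR's WorkflowConfig has more fields.
pub struct Task {
    pub id: String,
    pub depends_on_results: Vec<String>,
}

pub struct Phase {
    pub tasks: Vec<Task>,
}

/// Rejects a depends_on_results entry that names an unknown task or a task
/// in the same or a later phase, whose result cannot exist yet when
/// build_prompt runs. Assumes `phases` is in execution order.
pub fn validate_result_deps(phases: &[Phase]) -> Result<(), String> {
    // Index of the phase that runs each task.
    let mut phase_of: HashMap<&str, usize> = HashMap::new();
    for (i, phase) in phases.iter().enumerate() {
        for task in &phase.tasks {
            phase_of.insert(task.id.as_str(), i);
        }
    }
    for (i, phase) in phases.iter().enumerate() {
        for task in &phase.tasks {
            for dep in &task.depends_on_results {
                match phase_of.get(dep.as_str()) {
                    None => return Err(format!("{}: unknown task {}", task.id, dep)),
                    Some(&j) if j >= i => {
                        return Err(format!(
                            "{}: result of {} is not available before phase {}",
                            task.id, dep, i
                        ))
                    }
                    Some(_) => {}
                }
            }
        }
    }
    Ok(())
}
```

Failing at validation time turns the silent "### A (not available)" prompt into an explicit config error.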

Reviews (4): Last reviewed commit: "fix(whaleflow): improve scopes_overlap w..."

Greptile also left 1 inline comment on this PR.

New crate crates/whaleflow providing declarative JSON-config-driven
sub-agent swarm orchestration for CodeWhale. Inspired by Claude Code's
Dynamic Workflows (Opus 4.8, May 2026).

- WorkflowConfig JSON schema with phases, tasks, dependencies
- Topological scheduler with semaphore-based concurrency control
- File-scope conflict detection for parallel write safety
- Git worktree isolation per task (create → extract → apply → clean)
- Structured WorkflowResult with per-task cost/token tracking
- workflow_run tool schema for model invocation
- TUI integration via WhaleFlowSpawner (SubAgentManager bridge)
- 18 tests: 15 unit + 3 integration
@gemini-code-assist (Contributor)

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

Comment thread crates/whaleflow/src/config.rs
Comment thread crates/whaleflow/src/config.rs
@AdityaVG13 AdityaVG13 marked this pull request as draft June 1, 2026 05:54
…_blocking

- cwd_path() now returns worktree path for Worktree variant (was dead code)
- parallel phases now honor Abort failure policy
- WorktreeManager git calls wrapped in tokio::spawn_blocking
- timeout_secs wired end-to-end with tokio::time::timeout on polling loop
- AgentSpawner trait extended with timeout_secs/max_steps parameters
- WorkflowRunTool no longer claims ReadOnly capability
- unknown agent_type now logs a warning instead of silently defaulting

Addresses Greptile review: P1 (blocking Command), P2 (dead timeout_secs)
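The commit above wires timeout_secs by wrapping the polling loop (every 250 ms, per the sequence diagram) in tokio::time::timeout. A blocking, std-only analogue with hypothetical names, not the PR's async code, shows the same control flow in a form that is easy to test: poll until the agent reports a result or the deadline passes, and never sleep past the deadline.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical status a poll returns; stands in for the sub-agent result.
#[derive(Debug, PartialEq)]
pub enum AgentStatus<T> {
    Running,
    Finished(T),
}

/// Calls `check` every `interval` until it returns Finished, or returns None
/// once `timeout` has elapsed so the caller can record the task as timed out.
pub fn poll_with_timeout<T>(
    timeout: Duration,
    interval: Duration,
    mut check: impl FnMut() -> AgentStatus<T>,
) -> Option<T> {
    let deadline = Instant::now() + timeout;
    loop {
        if let AgentStatus::Finished(value) = check() {
            return Some(value);
        }
        let now = Instant::now();
        if now >= deadline {
            return None;
        }
        // Sleep for the poll interval, but never past the deadline.
        thread::sleep(interval.min(deadline - now));
    }
}
```

In the async version, tokio::time::timeout takes over the deadline check and tokio::time::sleep replaces thread::sleep, so the runtime's worker thread is not blocked while waiting.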
Comment on lines +66 to +88
```rust
pub fn extract_changes(task_id: &str, workspace: &Path) -> Result<String, SpawnError> {
    let relative = format!(".worktrees/whaleflow-{}", task_id);
    let worktree_path = workspace.join(&relative);

    let output = Command::new("git")
        .arg("-C")
        .arg(&worktree_path)
        .arg("diff")
        .arg("HEAD")
        .output()
        .map_err(|e| {
            SpawnError::WorktreeError(format!("git diff in worktree failed: {}", e))
        })?;

    if !output.status.success() {
        let stderr = String::from_utf8_lossy(&output.stderr);
        return Err(SpawnError::WorktreeError(format!(
            "git diff in worktree failed: {}",
            stderr.trim()
        )));
    }

    Ok(String::from_utf8_lossy(&output.stdout).into_owned())
```

P1 git diff HEAD misses committed changes — data loss for committing sub-agents

extract_changes runs git diff HEAD inside the worktree, which compares the working directory against the worktree's current HEAD. If a sub-agent commits any of its work (advancing HEAD in the worktree), git diff HEAD only captures uncommitted changes after the last commit — all committed changes between the worktree's initial HEAD and its final HEAD are excluded from the patch. When the worktree is then removed, those committed changes are permanently lost.

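The fix is to record the initial HEAD SHA when create is called, then run git diff <initial_sha> in extract_changes. That compares the working tree against the initial commit, so it captures both committed and uncommitted changes to tracked files. By contrast, git diff <initial_sha>..HEAD compares two commits and captures only the committed changes, not uncommitted edits.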
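A sketch of the reviewer's suggested fix, using hypothetical helper names and plain String errors instead of the PR's SpawnError: record the worktree's HEAD right after creation, then diff the working tree against that commit instead of against the current HEAD. One caveat applies to both git diff HEAD and git diff <sha>: untracked files (created but never added) appear in neither; staging everything first (git add -A, then git diff --cached <sha>) is one way to include them.

```rust
use std::path::Path;
use std::process::Command;

/// Hypothetical helper: diff arguments for a worktree created at `initial_sha`.
/// `git diff <sha>` compares the working tree with that commit, so it includes
/// work the sub-agent committed as well as uncommitted edits to tracked files.
pub fn diff_args(initial_sha: &str) -> Vec<String> {
    vec!["diff".to_string(), initial_sha.to_string()]
}

/// Runs `git -C <dir> <args...>`; returns stdout, or trimmed stderr on failure.
fn git(dir: &Path, args: &[String]) -> Result<String, String> {
    let out = Command::new("git")
        .arg("-C")
        .arg(dir)
        .args(args)
        .output()
        .map_err(|e| format!("failed to run git: {}", e))?;
    if !out.status.success() {
        return Err(String::from_utf8_lossy(&out.stderr).trim().to_string());
    }
    Ok(String::from_utf8_lossy(&out.stdout).into_owned())
}

/// Call right after `git worktree add` and store the SHA with the task.
pub fn record_initial_head(worktree: &Path) -> Result<String, String> {
    git(worktree, &["rev-parse".to_string(), "HEAD".to_string()]).map(|s| s.trim().to_string())
}

/// Replacement for the `git diff HEAD`-based extraction.
pub fn extract_changes_since(worktree: &Path, initial_sha: &str) -> Result<String, String> {
    git(worktree, &diff_args(initial_sha))
}
```

As with the original, these git calls block, so in the async spawner they would still belong inside spawn_blocking.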


Comment thread crates/tui/src/tools/workflow/mod.rs Outdated
… tests

- max_steps flows through SubAgentSpawnOptions to per-agent step budget
- extract_changes errors now logged instead of silently ignored
- files_touched populated from worktree diff output
- TaskStatus re-exported from whaleflow crate
- 3 new tests: abort in parallel phase, abort stops subsequent phases,
  timeout_secs/max_steps deserialization

Addresses Greptile P1 (silent failure on extract_changes)
@AdityaVG13 AdityaVG13 mentioned this pull request Jun 1, 2026
6 tasks
Comment on lines +129 to +137
```rust
    let handle = tokio::spawn(async move {
        let _permit = sem.acquire().await;
        spawner
            .spawn(task_id_for_closure, prompt, task.agent_type.clone(), cwd, timeout_secs, max_steps)
            .await
    });
    handles.push((task_id, handle));
}
```


P1 Spawned tasks are detached, not cancelled, on Abort in parallel phase

When a task fails and FailurePolicy::Abort triggers inside the for (task_id, handle) in handles loop, the loop breaks and the remaining JoinHandles in handles are dropped. Dropping a tokio::spawn JoinHandle in Tokio detaches the task — the underlying async task continues running to completion. For Shared-isolation ReadWrite tasks this means additional sub-agents keep writing to the main workspace after the scheduler has already reported the workflow as Aborted, producing changes the orchestrator never learns about and that can conflict with subsequent workflow runs. The integration test passes because the MockSpawner returns instantly, so the detached task finishes before the test asserts anything, masking the real-world behavior.
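In Tokio, the usual way to cancel rather than detach is to call JoinHandle::abort on the remaining handles, or to collect them in a tokio::task::JoinSet, which aborts its remaining tasks when dropped. Abort only takes effect at the task's next .await, though. If the sub-agent itself runs in the background under SubAgentManager, aborting the polling task may not stop it, so an explicit cancel through the manager is likely needed as well (an assumption about the PR's internals). std::thread::JoinHandle also detaches its thread when dropped, so the following std-only model with hypothetical names reproduces the hazard and the shape of a fix: raise a shared cancel flag on the first failure, then join every handle before reporting the phase as aborted.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Hypothetical worker: `steps` units of work, checking the shared cancel
/// flag before each one. `fail` simulates a task that errors immediately.
fn run_task(id: usize, steps: usize, fail: bool, cancel: &AtomicBool) -> Result<usize, String> {
    for done in 0..steps {
        if cancel.load(Ordering::SeqCst) {
            return Err(format!("task {} cancelled after {} steps", id, done));
        }
        if fail {
            return Err(format!("task {} failed", id));
        }
        thread::sleep(Duration::from_millis(1));
    }
    Ok(steps)
}

/// Runs one parallel phase under an Abort policy. The first failure raises
/// the cancel flag; every handle is then joined rather than dropped, so no
/// worker is still running when the phase reports its outcome.
pub fn run_phase_with_abort(specs: Vec<(usize, bool)>) -> (Vec<Result<usize, String>>, bool) {
    let cancel = Arc::new(AtomicBool::new(false));
    let handles: Vec<_> = specs
        .into_iter()
        .enumerate()
        .map(|(id, (steps, fail))| {
            let cancel = Arc::clone(&cancel);
            thread::spawn(move || {
                let result = run_task(id, steps, fail, &cancel);
                if result.is_err() {
                    cancel.store(true, Ordering::SeqCst);
                }
                result
            })
        })
        .collect();
    // Joining is the key step: dropping these handles would detach the threads.
    let results: Vec<_> = handles
        .into_iter()
        .map(|h| h.join().expect("worker panicked"))
        .collect();
    (results, cancel.load(Ordering::SeqCst))
}
```

A test built on this model would use a slow worker alongside a failing one, which is the case the instantly returning MockSpawner cannot exercise.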


@Hmbown Hmbown added the whaleflow WhaleFlow branch/leaf workflow runtime and workflow mode label Jun 3, 2026
@Hmbown Hmbown added this to the v0.9.0 milestone Jun 3, 2026
@Hmbown Hmbown added v0.9.0 Targeting v0.9.0 workflow-runtime Workflow IR, executor, control flow, and replay runtime labels Jun 3, 2026
@mo-vic commented Jun 5, 2026

Cool feature! I can't wait to play with it.

@Hmbown (Owner) commented Jun 6, 2026

Thanks @AdityaVG13. I did a fresh v0.9 stewardship pass on this and #2486.

This is the right WhaleFlow direction and I want to preserve it as the source branch for the workflow runner work, but I am not going to merge or directly harvest the full PR into the current v0.9 integration branch as-is.

Current release evidence:

  • The current codex/v0.9.0-stewardship branch still has design references only; it does not have crates/whaleflow, WorkflowRunTool, or workflow_run registered in code yet.
  • This PR is draft/dirty and its latest matrix has failing lint and Windows tests.
  • The review findings are real release risks for an agent/workflow executor: aborting a parallel phase can detach still-running agents, worktree extraction can miss committed work, and the model-facing schema still needs to expose the execution controls it implements.
  • The milestone definition is stricter than a JSON-only runner: v0.9 wants a typed workflow IR / Rust executor path with branch/leaf semantics, replay/evidence, and clear safety boundaries.

Safe path from here: keep this PR open as the intent/source branch, then land WhaleFlow in smaller maintainer slices against the v0.9 branch. The first viable slice should be something like typed config/IR validation plus deterministic scheduler tests, behind a feature/config gate and without exposing a write-capable workflow_run tool until cancellation, worktree diff capture, and replay/evidence semantics are airtight. Any harvested slice should credit you in the commit/PR body/changelog.

Thanks also @mo-vic for the product interest here. The feature is exciting; the restraint is only because workflow orchestration is exactly the kind of surface where a partial merge can write to the wrong place or leave agent work running after the UI says it stopped.

@Hmbown (Owner) commented Jun 6, 2026

Thanks @AdityaVG13. As part of v0.9 stewardship, I opened #2821 as a narrow maintainer harvest of the safe typed-IR naming surface from this WhaleFlow direction.

What was harvested: explicit WorkflowSpec, WorkflowNode, branch/leaf specs, budget/permission/model/promotion policy metadata structs, and workflow_ir_roundtrip coverage in crates/whaleflow.

What remains intentionally out of scope for that maintainer PR: runtime workflow_run exposure, executor behavior, deterministic replay, worktree application, and model/provider routing. This PR still carries the broader orchestration intent, so I am not treating it as replaced wholesale.

Hmbown added a commit that referenced this pull request Jun 6, 2026
Add the explicit WorkflowSpec/WorkflowNode metadata surface requested for the v0.9 WhaleFlow IR, including budget, permission, model, and promotion policy records plus serde roundtrip coverage. Runtime execution, replay, and worktree application remain out of scope.

Refs #2668, #2482, #2486.

Co-authored-by: AdityaVG13 <44177453+AdityaVG13@users.noreply.github.com>
@Hmbown (Owner) commented Jun 6, 2026

Thanks again @AdityaVG13. I opened #2823 as another narrow v0.9 maintainer harvest from this WhaleFlow direction.

What #2823 takes: a crate-local mock executor skeleton over WorkflowSpec, acceptance-style tests for #2669 control flow, ExpandSpec::max_children, generated-node validation, and pure BranchTournament / ParetoFrontier reducer scaffolding.

What remains intentionally out of scope: live workflow_run, real subagent spawning, worktree apply/extract, TraceStore writes, replay, provider routing, and TUI workflow mode. This PR remains the broader source branch for that intent, so I am not treating it as replaced wholesale.

Hmbown added a commit that referenced this pull request Jun 6, 2026
Add a crate-local mock executor over WorkflowSpec that records leaf, branch, and control-node results for Sequence, BranchSet, Leaf, Reduce, TeacherReview, LoopUntil, Cond, and Expand. Add reducer scaffolding for BranchTournament and ParetoFrontier, plus #2669 acceptance-style tests, without exposing workflow_run, spawning agents, or applying worktrees.

Refs #2669.
Harvests narrow WhaleFlow executor intent from #2482/#2486.

Co-authored-by: AdityaVG13 <44177453+AdityaVG13@users.noreply.github.com>
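The #2823 commit describes a mock executor that records leaf, branch, and control-node results without spawning agents. A minimal sketch of that idea over a hypothetical four-variant subset of the node kinds it names (the real WorkflowSpec/WorkflowNode types are richer): a recursive walk that appends each executed leaf to a trace, with a fixed iteration count standing in for the LoopUntil verifier.

```rust
/// Hypothetical subset of a workflow IR for illustration only; the real
/// node set also has Reduce, TeacherReview, Cond, Expand and policy data.
#[derive(Debug, Clone)]
pub enum Node {
    Leaf(String),
    Sequence(Vec<Node>),
    BranchSet(Vec<Node>),
    LoopUntil {
        body: Box<Node>,
        max_iters: u32,
        // Mock stand-in for a verifier: the loop condition passes on this iteration.
        passes_on: u32,
    },
}

/// Walks the tree without spawning agents, appending each executed leaf id
/// to `trace`. BranchSet children run in declaration order, so traces are
/// deterministic.
pub fn mock_execute(node: &Node, trace: &mut Vec<String>) {
    match node {
        Node::Leaf(id) => trace.push(id.clone()),
        Node::Sequence(children) | Node::BranchSet(children) => {
            for child in children {
                mock_execute(child, trace);
            }
        }
        Node::LoopUntil { body, max_iters, passes_on } => {
            for iter in 1..=*max_iters {
                mock_execute(body, trace);
                // Stop once the mock check passes; max_iters caps runaway loops.
                if iter >= *passes_on {
                    break;
                }
            }
        }
    }
}
```

For Sequence[plan, BranchSet[a, b], LoopUntil(verify, max 5, passes on 2)] the trace is plan, a, b, verify, verify.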
@Hmbown (Owner) commented Jun 6, 2026

v0.9 stewardship update: I opened #2829 as another narrow WhaleFlow maintainer slice inspired by the broader workflow direction here.

#2829 adds crate-only deterministic replay from recorded leaf/control records, including stable leaf input hashes and replay_diverged for missing records. It deliberately avoids runtime commands, live provider calls, worktree replay, and the broader draft scope, so this PR remains open as the larger source branch.

Thanks @AdityaVG13 for the WhaleFlow draft direction; the credit is preserved in the #2829 PR body and changelog.
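#2829 mentions stable leaf input hashes and a replay_diverged outcome for missing records, but not the hash function or record layout. A minimal sketch, assuming 64-bit FNV-1a as the stable hash (an illustrative choice, not stated in the source) and a hypothetical record map keyed by (leaf id, input hash). A fixed algorithm matters here because std's DefaultHasher documents that its algorithm is unspecified and its hashes should not be relied on across Rust releases.

```rust
use std::collections::HashMap;

/// 64-bit FNV-1a: the same bytes give the same hash on every platform and
/// toolchain, which recorded replays need.
pub fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325; // FNV offset basis
    for &b in bytes {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3); // FNV prime
    }
    h
}

/// Hypothetical replay outcome for one leaf.
#[derive(Debug, PartialEq)]
pub enum ReplayOutcome {
    Hit(String),
    Diverged { leaf: String, input_hash: u64 },
}

/// Looks up a recorded output by (leaf id, input hash). A missing record means
/// the leaf's input changed since recording, so replay reports divergence
/// instead of guessing.
pub fn replay_leaf(records: &HashMap<(String, u64), String>, leaf: &str, input: &str) -> ReplayOutcome {
    let input_hash = fnv1a64(input.as_bytes());
    match records.get(&(leaf.to_string(), input_hash)) {
        Some(out) => ReplayOutcome::Hit(out.clone()),
        None => ReplayOutcome::Diverged { leaf: leaf.to_string(), input_hash },
    }
}
```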

@Hmbown (Owner) commented Jun 6, 2026

v0.9 stewardship update: I opened #2830 as a crate-only model-policy slice for WhaleFlow.

#2830 adds role/capability model selection, mock provider plumbing, and fail-closed JSON repair parsing without live provider calls or runtime provider switching. It keeps the broader #2672 provider-routing/adapter work out of scope while preserving the broader WhaleFlow draft direction here.

Thanks @AdityaVG13; the changelog and PR body keep the broader WhaleFlow draft credit trail intact.

@Hmbown (Owner) commented Jun 6, 2026

Another narrow WhaleFlow foundation slice is up in #2831. It keeps the work inside codewhale-whaleflow: the rlm_cache_change.star dogfood workflow now compiles and runs through the mock executor, with candidate branches, LoopUntil verification, tournament selection, teacher review, and reduction represented in the IR.

This still does not claim the full runtime workflow mode, live provider replay, or shared persistence pieces from this broader proposal. It is intended as CI-backed scaffolding we can build on safely for v0.9.0. Thanks @AdityaVG13 for the draft architecture and cost-tracking direction that shaped these WhaleFlow slices.

@Hmbown (Owner) commented Jun 6, 2026

Follow-up v0.9 stewardship update: #2833 adds another crate-only WhaleFlow foundation slice.

New in #2833:

  • WorkflowMemoUsage separates ARMH/shared-memo telemetry from provider token/cost usage;
  • leaf, branch, and workflow results carry memo counters;
  • mock execution aggregates memo usage and replay preserves recorded counters.

This still does not expose workflow_run, live RLM/provider calls, shared DB memo lookup, TraceStore writes, or TUI workflow mode. Thanks @AdityaVG13 for the WhaleFlow draft and cache/cost direction; this is deliberately just the typed telemetry shape needed before runtime behavior is safe.
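#2833 keeps ARMH/shared-memo telemetry in WorkflowMemoUsage, separate from provider token/cost usage, and has mock execution aggregate it. A sketch of that separation with hypothetical field names (the actual counters are not listed in this thread): two ledgers summed independently, so a memo hit is reported as a hit rather than disappearing into zero-cost provider usage.

```rust
use std::ops::AddAssign;

/// Illustrative counters; the WorkflowMemoUsage name comes from #2833, the fields are hypothetical.
#[derive(Debug, Default, Clone, Copy, PartialEq)]
pub struct WorkflowMemoUsage {
    pub memo_hits: u64,
    pub memo_misses: u64,
}

/// Provider-side spend, kept in its own ledger (hypothetical fields).
#[derive(Debug, Default, Clone, Copy, PartialEq)]
pub struct ProviderUsage {
    pub input_tokens: u64,
    pub output_tokens: u64,
}

impl AddAssign for WorkflowMemoUsage {
    fn add_assign(&mut self, other: Self) {
        self.memo_hits += other.memo_hits;
        self.memo_misses += other.memo_misses;
    }
}

impl AddAssign for ProviderUsage {
    fn add_assign(&mut self, other: Self) {
        self.input_tokens += other.input_tokens;
        self.output_tokens += other.output_tokens;
    }
}

/// Hypothetical per-leaf result carrying both ledgers.
pub struct LeafUsage {
    pub provider: ProviderUsage,
    pub memo: WorkflowMemoUsage,
}

/// Sums leaf results into workflow totals without mixing the two ledgers.
pub fn aggregate(leaves: &[LeafUsage]) -> (ProviderUsage, WorkflowMemoUsage) {
    let mut provider = ProviderUsage::default();
    let mut memo = WorkflowMemoUsage::default();
    for leaf in leaves {
        provider += leaf.provider;
        memo += leaf.memo;
    }
    (provider, memo)
}
```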



3 participants