test(whaleflow): replay dogfood workflow from recorded trace by Hmbown · Pull Request #2852 · Hmbown/CodeWhale

Hmbown · 2026-06-06T09:01:11Z

Summary

add recorded mock-trace replay coverage for the real workflows/rlm_cache_change.star dogfood workflow
assert the dogfood replay includes regression-tests, teacher-review, and summarize-cache-change
assert removing the dogfood regression-tests record produces ReplayDiverged instead of falling back to live execution
update the v0.9 acceptance matrix and changelog evidence while keeping live workflow_run, provider calls, TraceStore writes, worktree application, and TUI pod monitor behavior deferred
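The fail-closed behavior asserted above (a missing record yields ReplayDiverged rather than a live run) can be pictured with a minimal sketch. Only the `ReplayDiverged` name and the leaf names come from this PR. `RecordedTrace`, `replay_leaf`, `replay_workflow`, and the string-keyed record map are hypothetical illustrations, not WhaleFlow's actual API.

```rust
use std::collections::HashMap;
use std::fmt;

/// Hypothetical error type; only the `ReplayDiverged` name is taken from the PR.
#[derive(Debug, PartialEq)]
enum ReplayError {
    ReplayDiverged { leaf_id: String },
}

impl fmt::Display for ReplayError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ReplayError::ReplayDiverged { leaf_id } => {
                write!(f, "replay diverged: no recorded result for leaf `{leaf_id}`")
            }
        }
    }
}

/// Hypothetical recorded trace: leaf id -> recorded output.
struct RecordedTrace {
    records: HashMap<String, String>,
}

impl RecordedTrace {
    /// Strict replay: a missing record is an error, never a cue to execute the leaf live.
    fn replay_leaf(&self, leaf_id: &str) -> Result<String, ReplayError> {
        self.records
            .get(leaf_id)
            .cloned()
            .ok_or_else(|| ReplayError::ReplayDiverged {
                leaf_id: leaf_id.to_string(),
            })
    }
}

/// Replays leaves in order; collecting into `Result` stops at the first divergence.
fn replay_workflow(trace: &RecordedTrace, leaf_ids: &[&str]) -> Result<Vec<String>, ReplayError> {
    leaf_ids.iter().map(|id| trace.replay_leaf(id)).collect()
}
```

Because the lookup never has an "else run it live" branch, deleting a record from the trace (as the regression-tests test does) can only surface as an error.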

Refs #2726 and #2679. Preserves and credits the WhaleFlow direction from #2482/#2486; thanks @AdityaVG13 for the original WhaleFlow draft and cost-tracking foundation.

Verification

cargo test -p codewhale-whaleflow rlm_cache_change --locked
cargo fmt --all --check
git diff --check
cmp -s CHANGELOG.md crates/tui/CHANGELOG.md
./scripts/release/check-versions.sh
./scripts/release/check-ohos-deps.sh

greptile-apps

Hmbown has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.

gemini-code-assist

Code Review

This pull request adds replay test coverage for the rlm_cache_change.star dogfood workflow. The tests verify both successful replay from recorded mock traces and correct handling of missing records, which result in ReplayDiverged. The changes include new tests and trace-reconstruction helper functions in crates/whaleflow/src/starlark_authoring.rs, along with corresponding documentation and changelog updates.

The review feedback suggests simplifying .cloned().unwrap_or(None) to .cloned().flatten(). It also flags a critical limitation in the trace-reconstruction helper collect_leaf_records: because it traverses the AST statically, it cannot correctly reconstruct traces for workflows whose loops execute more than once.


gemini-code-assist · 2026-06-06T09:02:14Z

+                        .map(|dependency| {
+                            (
+                                dependency.clone(),
+                                resolved_outputs.get(dependency).cloned().unwrap_or(None),


Here resolved_outputs.get(dependency).cloned() already yields an Option<Option<T>>. The trailing .unwrap_or(None) can therefore be replaced with the more idiomatic .flatten(), which collapses the nesting in one step and reads more clearly.

resolved_outputs.get(dependency).cloned().flatten(),
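To show that the suggestion is behavior-preserving, here is a small sketch using a hypothetical `HashMap<String, Option<String>>` as a stand-in for `resolved_outputs` (the real map's value type may differ). It covers all three cases: a dependency resolved to `Some`, one resolved to `None`, and one that is absent.

```rust
use std::collections::HashMap;

/// Original form: get -> Option<&Option<String>>, cloned -> Option<Option<String>>.
fn lookup_unwrap_or(resolved: &HashMap<String, Option<String>>, dep: &str) -> Option<String> {
    resolved.get(dep).cloned().unwrap_or(None)
}

/// Suggested form: Option::flatten (stable since Rust 1.40) collapses Option<Option<T>>.
fn lookup_flatten(resolved: &HashMap<String, Option<String>>, dep: &str) -> Option<String> {
    resolved.get(dep).cloned().flatten()
}
```

Both forms map "missing" and "resolved to None" to the same `None`, so the rewrite is a readability change only.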

gemini-code-assist · 2026-06-06T09:02:14Z

+                WorkflowNode::Leaf(leaf) => {
+                    let result = results
+                        .iter()
+                        .find(|result| result.leaf_id == leaf.id)
+                        .expect("mock execution should record every declared leaf")
+                        .clone();


Limitation in Trace Reconstruction for Loops / Multiple Executions

The collect_leaf_records helper statically traverses the workflow AST (&workflow.nodes) to reconstruct the replay trace.

Because it performs a static traversal:

It will only visit each Leaf node once, even if that leaf is executed multiple times (e.g., inside a LoopUntil block with multiple iterations).

The .find() call on line 590 will always retrieve the first execution result of that leaf, ignoring subsequent iterations.

As a result, for any workflow whose loop body executes more than once, the reconstructed trace will be incomplete or incorrect, because only the first execution of each leaf is recorded. This leads to ReplayDiverged errors during replay. Consider refactoring this helper to map directly over the dynamic execution results (execution.leaf_results) and resolve their dependencies dynamically. Alternatively, document the limitation if the helper is strictly intended for single-iteration test scenarios.
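The difference between the two reconstruction strategies can be sketched with simplified, hypothetical types. `LeafResult`, `ReplayRecord`, `Node`, `collect_static`, and `collect_dynamic` are illustrations, not the real WhaleFlow types or the PR's helper. The static walk visits each declared leaf once and `find`s its first result. The dynamic version emits one record per actual execution and tags it with an iteration index.

```rust
use std::collections::HashMap;

/// Hypothetical execution result, in execution order.
#[derive(Clone, Debug, PartialEq)]
struct LeafResult {
    leaf_id: String,
    output: String,
}

/// Hypothetical replay record; `iteration` counts prior executions of the same leaf.
#[derive(Debug, PartialEq)]
struct ReplayRecord {
    leaf_id: String,
    iteration: usize,
    output: String,
}

/// Hypothetical, simplified workflow AST.
enum Node {
    Leaf(String),
    LoopUntil(Vec<Node>),
}

/// Static walk over declared nodes: each leaf is visited once, so `find` only
/// ever returns its first execution and later loop iterations are dropped.
fn collect_static(nodes: &[Node], results: &[LeafResult], out: &mut Vec<ReplayRecord>) {
    for node in nodes {
        match node {
            Node::Leaf(id) => {
                if let Some(r) = results.iter().find(|r| &r.leaf_id == id) {
                    out.push(ReplayRecord {
                        leaf_id: r.leaf_id.clone(),
                        iteration: 0,
                        output: r.output.clone(),
                    });
                }
            }
            Node::LoopUntil(body) => collect_static(body, results, out),
        }
    }
}

/// Dynamic mapping over execution results: one record per actual execution.
fn collect_dynamic(results: &[LeafResult]) -> Vec<ReplayRecord> {
    let mut seen: HashMap<&str, usize> = HashMap::new();
    results
        .iter()
        .map(|r| {
            let count = seen.entry(r.leaf_id.as_str()).or_insert(0);
            let record = ReplayRecord {
                leaf_id: r.leaf_id.clone(),
                iteration: *count,
                output: r.output.clone(),
            };
            *count += 1;
            record
        })
        .collect()
}
```

Take a loop whose body runs twice. The static walk yields one record for the looped leaf, while the dynamic mapping yields two, which is what a faithful replay would need to consume.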

test(whaleflow): replay dogfood workflow from recorded trace

afac41e

greptile-apps Bot reviewed Jun 6, 2026


Hmbown merged commit 73c8318 into codex/v0.9.0-stewardship Jun 6, 2026
2 checks passed

Hmbown deleted the codex/v090-whaleflow-dogfood-replay branch June 6, 2026 09:01

This was referenced Jun 6, 2026

v0.9.0 WhaleFlow MVP cutline: IR, executor, replay, and pod monitor before teacher loops #2726

Open

WhaleFlow: ship rlm_cache_change.star as the first dogfood workflow #2679

Open

v0.9.0 Release acceptance matrix: required checks before tagging #2729

Open

gemini-code-assist Bot reviewed Jun 6, 2026
