
test(whaleflow): replay dogfood workflow from recorded trace #2852

Merged
Hmbown merged 1 commit into codex/v0.9.0-stewardship from codex/v090-whaleflow-dogfood-replay on Jun 6, 2026

Hmbown (Owner) commented Jun 6, 2026

Summary

  • add recorded mock-trace replay coverage for the real workflows/rlm_cache_change.star dogfood workflow
  • assert the dogfood replay includes regression-tests, teacher-review, and summarize-cache-change
  • assert removing the dogfood regression-tests record produces ReplayDiverged instead of falling back to live execution
  • update the v0.9 acceptance matrix and changelog evidence while keeping live workflow_run, provider calls, TraceStore writes, worktree application, and TUI pod monitor behavior deferred
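The "diverge instead of falling back" behavior asserted above can be illustrated with a minimal sketch. Everything here is a hypothetical stand-in (the `ReplayError` enum, the `replay` function, and the flat `HashMap` trace are assumptions for illustration, not the codewhale-whaleflow API); only the name ReplayDiverged comes from this PR. The point is that a missing record ends the replay with an error rather than triggering a live call.

```rust
use std::collections::HashMap;

/// Hypothetical error type; only the name `ReplayDiverged` comes from the PR.
#[derive(Debug, PartialEq)]
enum ReplayError {
    ReplayDiverged { missing_leaf: String },
}

/// Replays leaves strictly from a recorded trace (leaf id -> recorded output).
/// A leaf without a record is reported as a divergence; there is deliberately
/// no branch that would execute the leaf live.
fn replay(leaf_ids: &[&str], trace: &HashMap<String, String>) -> Result<Vec<String>, ReplayError> {
    let mut outputs = Vec::new();
    for id in leaf_ids {
        match trace.get(*id) {
            Some(output) => outputs.push(output.clone()),
            None => {
                return Err(ReplayError::ReplayDiverged {
                    missing_leaf: (*id).to_string(),
                })
            }
        }
    }
    Ok(outputs)
}

fn main() {
    let leaves = ["regression-tests", "teacher-review", "summarize-cache-change"];

    // Full trace: every leaf has a record, so replay succeeds.
    let mut trace: HashMap<String, String> = leaves
        .iter()
        .map(|id| (id.to_string(), format!("recorded output of {id}")))
        .collect();
    assert!(replay(&leaves, &trace).is_ok());

    // Drop one record: replay must report a divergence, not run anything live.
    trace.remove("regression-tests");
    assert_eq!(
        replay(&leaves, &trace),
        Err(ReplayError::ReplayDiverged {
            missing_leaf: "regression-tests".to_string()
        })
    );
}
```

Returning an error on the first missing record keeps replay deterministic: a test that passes can only have used recorded data.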

Refs #2726 and #2679. Preserves and credits the WhaleFlow direction from #2482/#2486; thanks @AdityaVG13 for the original WhaleFlow draft and cost-tracking foundation.

Verification

  • cargo test -p codewhale-whaleflow rlm_cache_change --locked
  • cargo fmt --all --check
  • git diff --check
  • cmp -s CHANGELOG.md crates/tui/CHANGELOG.md
  • ./scripts/release/check-versions.sh
  • ./scripts/release/check-ohos-deps.sh

@greptile-apps greptile-apps Bot left a comment

Hmbown has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.

@Hmbown Hmbown merged commit 73c8318 into codex/v0.9.0-stewardship Jun 6, 2026
2 checks passed
@Hmbown Hmbown deleted the codex/v090-whaleflow-dogfood-replay branch June 6, 2026 09:01
@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request adds replay test coverage for the rlm_cache_change.star dogfood workflow, verifying successful replay from recorded mock traces and proper handling of missing records (resulting in ReplayDiverged). The changes include new tests and helper functions for trace reconstruction in crates/whaleflow/src/starlark_authoring.rs, alongside corresponding documentation and changelog updates. Feedback on the code changes highlights a potential simplification using .flatten() instead of .cloned().unwrap_or(None), and identifies a critical limitation in the trace reconstruction helper collect_leaf_records which statically traverses the AST and will fail to correctly reconstruct traces for workflows containing loops.

.map(|dependency| {
    (
        dependency.clone(),
        resolved_outputs.get(dependency).cloned().unwrap_or(None),
medium

After .cloned(), the value is an Option<Option<T>>, so .unwrap_or(None) can be replaced with the more idiomatic .flatten(). The result is identical, and the intent (collapse the nested Option) is easier to read.

                                resolved_outputs.get(dependency).cloned().flatten(),
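
A small self-contained check of the equivalence claimed above. The `ResolvedOutputs` alias and both helper functions are hypothetical stand-ins; the real map's key and value types are not shown in the excerpt, so `String` is assumed here purely for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the real map: a dependency id maps to an
/// optional output (None meaning the entry exists but holds no output).
type ResolvedOutputs = HashMap<String, Option<String>>;

/// The form currently in the diff.
fn lookup_unwrap_or(outputs: &ResolvedOutputs, dependency: &str) -> Option<String> {
    outputs.get(dependency).cloned().unwrap_or(None)
}

/// The suggested form: `flatten` collapses Option<Option<T>> into Option<T>.
fn lookup_flatten(outputs: &ResolvedOutputs, dependency: &str) -> Option<String> {
    outputs.get(dependency).cloned().flatten()
}

fn main() {
    let mut outputs = ResolvedOutputs::new();
    outputs.insert("regression-tests".to_string(), Some("ok".to_string()));
    outputs.insert("teacher-review".to_string(), None);

    // Covers all three cases: present with a value, present without one,
    // and absent from the map entirely.
    for key in ["regression-tests", "teacher-review", "missing"] {
        assert_eq!(lookup_unwrap_or(&outputs, key), lookup_flatten(&outputs, key));
    }
}
```

Both forms map "entry absent" and "entry present but None" to the same None, so the change is purely stylistic.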

Comment on lines +588 to +593
WorkflowNode::Leaf(leaf) => {
    let result = results
        .iter()
        .find(|result| result.leaf_id == leaf.id)
        .expect("mock execution should record every declared leaf")
        .clone();
medium

Limitation in Trace Reconstruction for Loops / Multiple Executions

The collect_leaf_records helper statically traverses the workflow AST (&workflow.nodes) to reconstruct the replay trace.

Because it performs a static traversal:

  1. It will only visit each Leaf node once, even if that leaf is executed multiple times (e.g., inside a LoopUntil block with multiple iterations).
  2. The .find() call on line 590 will always retrieve the first execution result of that leaf, ignoring subsequent iterations.

This means any workflow containing loops that execute more than once will produce an incomplete or incorrect replay trace, leading to ReplayDiverged errors during replay. Consider refactoring this helper to map directly over the dynamic execution results (execution.leaf_results) and resolve their dependencies dynamically, or document this limitation if it is strictly intended for single-iteration test scenarios.
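
The difference between the two approaches can be sketched with simplified, hypothetical types. `WorkflowNode`, `LeafResult`, `collect_static`, and `collect_dynamic` below are illustrative stand-ins, not the definitions in crates/whaleflow/src/starlark_authoring.rs, and the real helper also resolves dependencies, which this sketch omits.

```rust
/// Hypothetical, simplified stand-in for a recorded leaf execution.
#[derive(Debug, Clone, PartialEq)]
struct LeafResult {
    leaf_id: String,
    output: String,
}

/// Hypothetical, simplified stand-in for the workflow AST.
enum WorkflowNode {
    Leaf { id: String },
    LoopUntil { body: Vec<WorkflowNode> },
}

/// Static traversal: each declared leaf is visited once, and `find` returns
/// the first matching result, so later loop iterations are dropped.
fn collect_static(nodes: &[WorkflowNode], results: &[LeafResult], out: &mut Vec<LeafResult>) {
    for node in nodes {
        match node {
            WorkflowNode::Leaf { id } => {
                if let Some(result) = results.iter().find(|r| &r.leaf_id == id) {
                    out.push(result.clone());
                }
            }
            WorkflowNode::LoopUntil { body } => collect_static(body, results, out),
        }
    }
}

/// Dynamic alternative: one record per execution result, in execution order.
fn collect_dynamic(results: &[LeafResult]) -> Vec<LeafResult> {
    results.to_vec()
}

fn main() {
    // A loop whose single leaf ran twice (first attempt failed, second passed).
    let nodes = vec![WorkflowNode::LoopUntil {
        body: vec![WorkflowNode::Leaf { id: "regression-tests".to_string() }],
    }];
    let results = vec![
        LeafResult { leaf_id: "regression-tests".to_string(), output: "fail".to_string() },
        LeafResult { leaf_id: "regression-tests".to_string(), output: "pass".to_string() },
    ];

    let mut static_records = Vec::new();
    collect_static(&nodes, &results, &mut static_records);

    // The static walk keeps only the first iteration; the dynamic one keeps both.
    assert_eq!(static_records.len(), 1);
    assert_eq!(static_records[0].output, "fail");
    assert_eq!(collect_dynamic(&results).len(), 2);
}
```

For the single-iteration dogfood workflow the two approaches produce the same records, which is consistent with the new tests passing despite the limitation.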
