fix(plan): add tier-model failover to plan step executor (#1448) by RyderFreeman4Logos · Pull Request #1450 · RyderFreeman4Logos/cli-sub-agent

RyderFreeman4Logos · 2026-05-18T03:27:06Z

Summary

Plan steps with tier annotations now iterate through the tier's model fallback chain on failover-eligible failures (HTTP 400/429, rate limit, quota exhaustion)
Previously, resolve_step_tool collapsed the tier into a single (tool, model_spec) pair and execute_csa_step tried only that one model — gemini-cli returning 400 caused the entire workflow to abort with no recovery
StepTarget::CsaTool gains tier_name field so the executor can build the full fallback chain via ordered_tier_candidates()
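To illustrate why carrying the tier name matters, here is a minimal sketch of a step target that keeps its tier and expands it into an ordered candidate list. The `StepTarget` shape, the `Tiers` map, and `candidates_for` are simplified, hypothetical stand-ins; the real `StepTarget::CsaTool` and `ordered_tier_candidates()` in this PR may look different, and the model spec strings in any usage are placeholders.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for the real `StepTarget`.
#[derive(Debug, Clone, PartialEq)]
enum StepTarget {
    CsaTool {
        tool: String,
        model_spec: Option<String>,
        /// Kept so the executor can rebuild the whole fallback chain later.
        tier_name: Option<String>,
    },
    Bash,
}

/// Hypothetical tier table: tier name -> model specs in fallback order.
type Tiers = HashMap<String, Vec<String>>;

/// Expand a step target into the ordered list of model specs to try.
/// Falls back to the single resolved spec when no usable tier is attached.
fn candidates_for(target: &StepTarget, tiers: &Tiers) -> Vec<String> {
    match target {
        StepTarget::CsaTool { tier_name: Some(name), model_spec, .. } => {
            match tiers.get(name) {
                Some(models) if !models.is_empty() => models.clone(),
                _ => model_spec.iter().cloned().collect(),
            }
        }
        StepTarget::CsaTool { model_spec, .. } => model_spec.iter().cloned().collect(),
        StepTarget::Bash => Vec::new(),
    }
}
```

Without `tier_name`, only the second arm would be reachable and the executor would see a single candidate, which is the pre-fix behavior described above.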

Changes

New: plan_cmd_tier_failover.rs — execute_csa_step_with_tier_failover() loops through tier candidates using classify_next_model_failure_with_elapsed() for failover gating
Modified: plan_cmd_steps.rs — StepTarget::CsaTool carries tier_name, resolve_step_tool preserves it
Modified: plan_cmd_exec.rs — StepExecutionOutcome gains stderr field for failover pattern detection
Extracted: plan_cmd_steps_test_helpers.rs — test-only execute_plan/execute_step helpers (monolith gate compliance)
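The new `stderr` field exists so that failover gating can inspect what the tool printed. A rough sketch of such pattern detection follows; `classify_failover` and its pattern table are hypothetical and much cruder than whatever `classify_next_model_failure_with_elapsed()` actually does (it ignores elapsed time entirely, for one).

```rust
/// Hypothetical pattern table: (lowercase needle, reason label).
const FAILOVER_PATTERNS: &[(&str, &str)] = &[
    ("http 429", "http 429"),
    ("http 400", "http 400"),
    ("rate limit", "rate limit"),
    ("quota", "quota exhausted"),
];

/// Return a failover reason if a failed step's stderr matches a known pattern.
/// A successful exit code never triggers failover in this sketch.
fn classify_failover(exit_code: i32, stderr: &str) -> Option<&'static str> {
    if exit_code == 0 {
        return None;
    }
    let lower = stderr.to_lowercase();
    FAILOVER_PATTERNS
        .iter()
        .find(|(needle, _)| lower.contains(*needle))
        .map(|(_, reason)| *reason)
}
```

Plain substring matching like this can misfire on unrelated output, so a production classifier would likely want stricter patterns.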

Closes #1448

Test plan

cargo check --package cli-sub-agent passes
cargo clippy --package cli-sub-agent -- -D warnings clean
cargo fmt --check clean
csa review --range main...HEAD verdict: PASS (session 01KRWHSC6ZZ)
Manual: run csa plan run patterns/mktd/workflow.toml against a repo where gemini-cli returns 400 — verify it falls through to claude-code/codex

🤖 Generated with Claude Code

Plan steps with a tier annotation now iterate through the tier's model list on failover-eligible failures (HTTP 400/429, rate limit, quota exhaustion). Previously, resolve_step_tool collapsed the tier into a single (tool, model_spec) pair and execute_csa_step tried only that one model, so gemini-cli returning 400 caused the entire workflow to abort.

Changes:
- StepTarget::CsaTool gains tier_name field so the executor can build the full fallback chain via ordered_tier_candidates()
- New plan_cmd_tier_failover module with execute_csa_step_with_tier_failover that loops through candidates using classify_next_model_failure_with_elapsed
- StepExecutionOutcome gains stderr field for failover pattern detection
- Extracted test-only helpers to plan_cmd_steps_test_helpers (monolith gate)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

gemini-code-assist

Code Review

This pull request introduces tier-level failover for CSA steps, allowing the system to automatically attempt alternative models when a primary model fails due to issues like rate limiting. Additionally, it adds stderr capturing to step execution outcomes and refactors test helpers into a separate module. Review feedback identifies that the failover summary log currently omits the failure reason for the final candidate in a tier. Furthermore, it is noted that stderr capturing for bash steps is currently ineffective because the underlying process inherits stderr rather than piping it for capture.

gemini-code-assist · 2026-05-18T03:29:36Z

+                if let Some(rate_limit) = detected
+                    && idx + 1 < candidates.len()
+                {
+                    warn!(
+                        "{label} - {spec_label} failed ({}); advancing to next tier model",
+                        rate_limit.reason
+                    );
+                    eprintln!("{label} - {spec_label} FAILOVER ({})", rate_limit.reason);
+                    failures.push(TierAttemptFailure {
+                        model_spec: spec_label.to_string(),
+                        reason: rate_limit.reason,
+                    });
+                    continue;
+                }


The "all models failed" summary log (line 116) will be incomplete because the last model's failure is not added to the failures vector. The current logic only pushes to failures when idx + 1 < candidates.len(). If the last candidate fails, its reason is omitted from the final summary warning.

    if let Some(rate_limit) = detected {
        if idx + 1 < candidates.len() {
            warn!(
                "{label} - {spec_label} failed ({}); advancing to next tier model",
                rate_limit.reason
            );
            eprintln!("{label} - {spec_label} FAILOVER ({})", rate_limit.reason);
            failures.push(TierAttemptFailure {
                model_spec: spec_label.to_string(),
                reason: rate_limit.reason,
            });
            continue;
        } else {
            failures.push(TierAttemptFailure {
                model_spec: spec_label.to_string(),
                reason: rate_limit.reason,
            });
        }
    }
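The same idea can be shown in isolation: record every failed attempt first, then decide whether to advance. The sketch below is a standalone illustration, not the PR's code. `run_with_failover` and the closure-based `attempt` are hypothetical, `TierAttemptFailure` is redeclared with only the two fields visible in the diff, and every error is treated as failover-eligible for brevity.

```rust
/// Redeclared here with the two fields visible in the diff above.
#[derive(Debug, PartialEq)]
struct TierAttemptFailure {
    model_spec: String,
    reason: String,
}

/// Try each candidate in order. On exhaustion, return one entry per
/// candidate, including the last one, so a summary log is complete.
fn run_with_failover<F>(
    candidates: &[&str],
    mut attempt: F,
) -> Result<String, Vec<TierAttemptFailure>>
where
    F: FnMut(&str) -> Result<String, String>,
{
    let mut failures = Vec::new();
    for &spec in candidates {
        match attempt(spec) {
            Ok(output) => return Ok(output),
            // Push unconditionally; the loop itself decides whether
            // another candidate remains.
            Err(reason) => failures.push(TierAttemptFailure {
                model_spec: spec.to_string(),
                reason,
            }),
        }
    }
    Err(failures)
}
```

Because the push does not depend on the index, the duplicated `failures.push` in the two branches of the suggestion collapses into one call.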

gemini-code-assist · 2026-05-18T03:29:36Z

    };

    let stdout = String::from_utf8_lossy(&output.stdout).to_string();
+    let stderr_str = String::from_utf8_lossy(&output.stderr).to_string();


The stderr_str captured here will be empty for bash steps because spawn_bash (line 172) uses Stdio::inherit() for stderr. To correctly populate this field for the journal or pattern detection, spawn_bash would need to use Stdio::piped(), and execute_bash_step would then need to manually print the captured stderr to maintain terminal visibility.
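A minimal sketch of the piped approach described in this comment, using only `std::process`. `run_shell_capturing_stderr` and `CapturedStep` are hypothetical names, and `sh -c` stands in for whatever `spawn_bash` really invokes.

```rust
use std::io::{self, Write};
use std::process::{Command, Stdio};

/// Hypothetical result type for this sketch.
struct CapturedStep {
    stdout: String,
    stderr: String,
    success: bool,
}

/// Run a shell script with stderr piped, then replay the captured bytes
/// to the parent's stderr so the terminal still shows them.
fn run_shell_capturing_stderr(script: &str) -> io::Result<CapturedStep> {
    let output = Command::new("sh")
        .arg("-c")
        .arg(script)
        .stdout(Stdio::piped())
        .stderr(Stdio::piped()) // with Stdio::inherit(), output.stderr stays empty
        .output()?;

    // Replay after the child exits; the user sees it late, not streamed.
    io::stderr().write_all(&output.stderr)?;

    Ok(CapturedStep {
        stdout: String::from_utf8_lossy(&output.stdout).to_string(),
        stderr: String::from_utf8_lossy(&output.stderr).to_string(),
        success: output.status.success(),
    })
}
```

The trade-off is that stderr only appears once the child has finished. Keeping live output would need a reader thread that tees the pipe while the process runs.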

RyderFreeman4Logos and others added 2 commits May 17, 2026 20:20

chore: update weave.lock version stamps to 0.1.738

e919aff

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

RyderFreeman4Logos merged commit f253f0f into main May 18, 2026
3 of 7 checks passed

gemini-code-assist Bot reviewed May 18, 2026

RyderFreeman4Logos merged 2 commits into main from fix/1448-plan-step-tier-failover
