feat(provider): complete provider fallback chain (#2574)#2773
Conversation
Add the configuration and data-model layer for automatic provider
fallback. When active provider returns a retryable error (429, 5xx,
timeout), CodeWhale can switch to the next configured provider without
user intervention.
Changes:
- `crates/config/src/lib.rs`:
- `ConfigToml.fallback_providers: Vec<ProviderKind>` — serialised
from `fallback_providers = ["deepseek", "openrouter"]` in TOML
- `ProviderChain` struct: tracks ordered provider list + current
position, with advance/has_next/is_fallback_active helpers
- 6 new unit tests covering chain construction, advance, exhaust,
dedup, remaining count, and TOML parsing
- `crates/tui/src/tui/app.rs`:
- `App.fallback_providers: Vec<String>` — provider route names
- `App.fallback_depth: Option<usize>` — current chain position
- `App.last_fallback_reason: Option<String>` — UI display
Follow-up PRs will wire:
- Runtime fallback execution in the agent loop
- `/provider fallback` command
- Status bar fallback indicator
Closes Hmbown#2574
…own#2574) Full provider fallback chain implementation: - `crates/config/src/lib.rs`: - `ConfigToml.fallback_providers: Vec<ProviderKind>` — TOML parsing - `ProviderChain` struct with advance/has_next/is_fallback_active - 6 unit tests - `crates/tui/src/tui/app.rs`: - `fallback_providers`, `fallback_depth`, `last_fallback_reason` fields - `advance_fallback(reason)` — advance to next fallback provider - `reset_fallback()` — return to primary - `is_fallback_active()` — query current state - `load_fallback_from_toml()` — init from config - `crates/tui/src/commands/provider.rs`: - `/provider fallback` — show chain + position + last reason - `/provider fallback reset` — return to primary provider Follow-up: wire `advance_fallback` into the engine error handler for automatic runtime fallback on 429/5xx/timeout. Closes Hmbown#2574
Add automatic provider fallback so that when the active provider returns a retryable error (429, 5xx, timeout, network), CodeWhale switches to the next configured provider without user intervention. ### Configuration ```toml # ~/.codewhale/config.toml provider = "nvidia-nim" fallback_providers = ["deepseek", "openrouter"] ``` ### Changes **Config layer** (`crates/config/src/lib.rs`): - `ConfigToml.fallback_providers: Vec<ProviderKind>` — deserialised from `fallback_providers = ["deepseek", "openrouter"]` in TOML - `ProviderChain` struct: ordered provider list with position tracking, `advance()`, `has_next()`, `is_fallback_active()`, `remaining()` - 6 unit tests covering construction, advance, exhaust, dedup, remaining count, and TOML parsing **App state** (`crates/tui/src/tui/app.rs`): - `fallback_providers: Vec<String>` — provider route names - `fallback_depth: Option<usize>` — current chain position (None = primary) - `last_fallback_reason: Option<String>` — for UI display - `advance_fallback(reason)` — advance to next fallback provider - `reset_fallback()` — return to primary - `is_fallback_active()` — query current state - `load_fallback_from_toml()` — init from ConfigToml **Runtime integration** (`crates/tui/src/tui/ui.rs`): - `apply_engine_error_to_app()`: when engine error is recoverable (Network / RateLimit / Timeout) and fallback providers are configured, automatically calls `advance_fallback()` instead of going offline **Command** (`crates/tui/src/commands/provider.rs`): - `/provider fallback` — show current chain, position, and last fallback reason - `/provider fallback reset` — return to primary provider **Footer** (`crates/tui/src/tui/footer_ui.rs`): - `footer_state_label()`: shows "fallback →" in warning colour when a fallback provider is active ### User experience 1. Configure fallback providers in config.toml 2. Primary provider encounters 429/5xx/timeout 3. Footer changes to "fallback →" (orange) 4. Status message: "Switched to deepseek (fallback 1/2): HTTP 429..." 5. `/provider fallback` shows chain: active provider, position, reason 6. `/provider fallback reset` returns to primary ### Testing - 93 config tests pass (6 new ProviderChain tests) - 185 provider command tests pass - Dead code warning suppressed for `load_fallback_from_toml` (awaiting App::new let-binding refactor for startup init) Closes Hmbown#2574
|
Thanks @idling11 for taking the time to contribute. This repository is currently observing a maintainer-managed contribution gate in dry-run mode, so this pull request is staying open. When enforcement is enabled, pull requests from contributors who are not listed in Please read |
There was a problem hiding this comment.
Code Review
This pull request introduces a provider fallback chain feature (#2574) that automatically switches to alternative providers when recoverable errors (like rate limits, timeouts, or network issues) occur. It adds configuration support, a ProviderChain state manager, TUI integration via /provider fallback, and status indicators. The review identified several issues: a critical bug in advance_fallback that skips the first fallback provider and breaks single-fallback setups, a UI bug that incorrectly displays the current fallback marker when none is active, a 0-based index display issue in status messages, and a robustness concern regarding potential out-of-bounds indexing in ProviderChain::current().
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| let current_depth = self.fallback_depth.unwrap_or(0); | ||
| let next_depth = current_depth + 1; |
There was a problem hiding this comment.
Critical Bug: Fallback Chain Skipping and Premature Exhaustion
In advance_fallback, when self.fallback_depth is None (indicating the primary provider is active), current_depth defaults to 0, making next_depth equal to 1.
This causes two major issues:
- Skipping the first fallback: The first fallback provider at
self.fallback_providers[0]is completely skipped, and the chain immediately transitions to the second fallback provider at index1. - Complete failure with a single fallback: If only one fallback provider is configured (
self.fallback_providers.len() == 1),next_depth(1) is>=the length of the fallback providers (1). The fallback chain is immediately considered exhausted, and fallback never executes.
To fix this, next_depth should be computed as 0 when self.fallback_depth is None, and incremented by 1 only when it is Some.
| let current_depth = self.fallback_depth.unwrap_or(0); | |
| let next_depth = current_depth + 1; | |
| let next_depth = self.fallback_depth.map_or(0, |d| d + 1); |
| let depth = app.fallback_depth.unwrap_or(0); | ||
| let mut lines = vec![format!("Active: {active}")]; | ||
| for (i, name) in app.fallback_providers.iter().enumerate() { | ||
| let marker = if i == depth { " ◀ current" } else { "" }; |
There was a problem hiding this comment.
Bug: Incorrect Current Marker Display
When app.fallback_depth is None (meaning the primary provider is active, not any fallback), depth defaults to 0. This incorrectly marks the first fallback provider (i == 0) with the ◀ current label in the /provider fallback command output.
The marker should only be displayed if app.fallback_depth is explicitly Some(i).
| let depth = app.fallback_depth.unwrap_or(0); | |
| let mut lines = vec![format!("Active: {active}")]; | |
| for (i, name) in app.fallback_providers.iter().enumerate() { | |
| let marker = if i == depth { " ◀ current" } else { "" }; | |
| let mut lines = vec![format!("Active: {active}")]; | |
| for (i, name) in app.fallback_providers.iter().enumerate() { | |
| let marker = if app.fallback_depth == Some(i) { " ◀ current" } else { "" }; |
| app.status_message = Some(format!( | ||
| "Switched to {} (fallback {}/{}): {message}", | ||
| app.api_provider.as_str(), | ||
| app.fallback_depth.unwrap_or(0), |
There was a problem hiding this comment.
Bug: Incorrect Fallback Index Display
When app.fallback_depth is Some(depth) (0-indexed), displaying app.fallback_depth.unwrap_or(0) results in a 0-based index (e.g., "fallback 0/2" for the first fallback). To match the user-friendly 1-based index described in the PR (e.g., "fallback 1/2"), we should map Some(depth) to depth + 1.
app.fallback_depth.map_or(0, |d| d + 1),| pub fn current(&self) -> ProviderKind { | ||
| self.providers[self.position] | ||
| } |
There was a problem hiding this comment.
Robustness: Safe Indexing in current()
Since ProviderChain has public fields providers and position, external code can mutate them and potentially break the invariant position < providers.len(). To prevent out-of-bounds panics, use safe indexing with .get().copied() and fall back to the first provider (which is guaranteed to exist).
pub fn current(&self) -> ProviderKind {
self.providers.get(self.position).copied().unwrap_or(self.providers[0])
}| pub fn advance_fallback(&mut self, reason: impl Into<String>) -> bool { | ||
| if self.fallback_providers.is_empty() { | ||
| return false; | ||
| } | ||
| let current_depth = self.fallback_depth.unwrap_or(0); | ||
| let next_depth = current_depth + 1; | ||
| if next_depth >= self.fallback_providers.len() { | ||
| self.last_fallback_reason = Some(format!( | ||
| "Fallback chain exhausted after {}: {}", | ||
| self.fallback_providers.len(), | ||
| reason.into() | ||
| )); | ||
| return false; | ||
| } | ||
| let next_name = &self.fallback_providers[next_depth]; | ||
| if let Some(next_provider) = ApiProvider::parse(next_name) { | ||
| self.fallback_depth = Some(next_depth); | ||
| self.last_fallback_reason = | ||
| Some(format!("Fell back to {}: {}", next_name, reason.into())); | ||
| self.api_provider = next_provider; | ||
| true | ||
| } else { | ||
| self.last_fallback_reason = Some(format!("Unknown fallback provider: {next_name}")); | ||
| false | ||
| } | ||
| } |
There was a problem hiding this comment.
Off-by-one: first fallback provider always skipped (or never reached)
fallback_providers is 0-indexed (first fallback = index 0), but advance_fallback starts with current_depth = unwrap_or(0) then accesses fallback_providers[next_depth] — where next_depth = current_depth + 1 = 1 on the very first call, skipping index 0. With exactly one fallback configured, next_depth = 1 >= len() = 1 is true immediately, so the exhaustion path fires and the only fallback is never tried.
For fallback_providers = ["deepseek", "openrouter"]:
- Call 1:
current_depth=0,next_depth=1, accessesfallback_providers[1]="openrouter"—"deepseek"is skipped forever. - Call 2:
current_depth=1,next_depth=2 >= 2→ "exhausted" even though the chain only visited one provider.
The fix is to treat fallback_depth as 1-based, so the first advance maps to fallback_providers[0]: change the bounds check to next_depth > self.fallback_providers.len() and the access to &self.fallback_providers[next_depth - 1].
| /// Reset the fallback chain to the primary provider. | ||
| pub fn reset_fallback(&mut self) { | ||
| self.fallback_depth = None; | ||
| self.last_fallback_reason = None; | ||
| } |
There was a problem hiding this comment.
reset_fallback clears the depth flag but doesn't restore the primary provider
reset_fallback sets fallback_depth = None and clears last_fallback_reason, but never touches api_provider. After a fallback switch, api_provider holds the fallback provider (e.g. "openrouter"). Calling /provider fallback reset makes is_fallback_active() return false and removes the footer indicator, yet all subsequent requests still go to the fallback provider — the opposite of what the command advertises. There is currently no field that remembers the original primary provider, so reset_fallback has no value to restore. A primary_provider: ApiProvider field (set at startup and never changed by fallback logic) would fix both this method and the display.
| let active = app.api_provider.as_str().to_string(); | ||
| let depth = app.fallback_depth.unwrap_or(0); | ||
| let mut lines = vec![format!("Active: {active}")]; | ||
| for (i, name) in app.fallback_providers.iter().enumerate() { | ||
| let marker = if i == depth { " ◀ current" } else { "" }; | ||
| lines.push(format!(" [{i}] {name}{marker}")); | ||
| } |
There was a problem hiding this comment.
Fallback status display incorrectly marks the first fallback as current when the primary is active
When no fallback is active (fallback_depth = None), depth = unwrap_or(0) = 0, so the loop marks fallback_providers[0] with " ◀ current" even though the primary provider is in use. Comparing against Some(i + 1) instead of i == depth fixes the aliasing.
| let active = app.api_provider.as_str().to_string(); | |
| let depth = app.fallback_depth.unwrap_or(0); | |
| let mut lines = vec![format!("Active: {active}")]; | |
| for (i, name) in app.fallback_providers.iter().enumerate() { | |
| let marker = if i == depth { " ◀ current" } else { "" }; | |
| lines.push(format!(" [{i}] {name}{marker}")); | |
| } | |
| let active = app.api_provider.as_str().to_string(); | |
| let depth = app.fallback_depth; // None = primary active | |
| let mut lines = vec![format!("Active: {active}")]; | |
| for (i, name) in app.fallback_providers.iter().enumerate() { | |
| // depth is 1-based (Some(1) = first fallback = index 0) | |
| let marker = if depth == Some(i + 1) { " ◀ current" } else { "" }; | |
| lines.push(format!(" [{i}] {name}{marker}")); | |
| } |
|
Thanks @idling11 for taking this on, and thanks @hsdbeebou for the original #2574 request. The fallback-chain direction is useful, but after reading the diff, linked issue, and review comments, I do not think we should fast-path this full automatic fallback implementation into the v0.9 stewardship branch as-is. The immediate code issues are fixable, but important:
The larger issue is the provider-routing contract. The existing So I would like to keep this open as provider-routing work rather than merge or close it for v0.9. A good next split would be either:
I’m intentionally not closing #2574; the intent is still valuable. This just needs the safer provider-routing pass the issue comment called out before we ship it. |
There was a problem hiding this comment.
idling11 has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.
There was a problem hiding this comment.
idling11 has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.
There was a problem hiding this comment.
idling11 has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.
There was a problem hiding this comment.
idling11 has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.
Oh yes, you're right. I'll take another look at the overall implementation plan. It will most likely involve splitting the PR into smaller parts, and I'll submit it according to your suggestion. |
There was a problem hiding this comment.
idling11 has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.
|
Thanks @idling11 for taking the time to contribute. This repository is currently observing a maintainer-managed contribution gate in dry-run mode, so this pull request is staying open. When enforcement is enabled, pull requests from contributors who are not listed in Please read |
Add automatic provider fallback so that when the active provider returns
a retryable error (429, 5xx, timeout, network), CodeWhale switches to
the next configured provider without user intervention.
Configuration
Changes
Config layer (
crates/config/src/lib.rs):ConfigToml.fallback_providers: Vec<ProviderKind>— deserialised fromfallback_providers = ["deepseek", "openrouter"]in TOMLProviderChainstruct: ordered provider list with position tracking,advance(),has_next(),is_fallback_active(),remaining()count, and TOML parsing
App state (
crates/tui/src/tui/app.rs):fallback_providers: Vec<String>— provider route namesfallback_depth: Option<usize>— current chain position (None = primary)last_fallback_reason: Option<String>— for UI displayadvance_fallback(reason)— advance to next fallback providerreset_fallback()— return to primaryis_fallback_active()— query current stateload_fallback_from_toml()— init from ConfigTomlRuntime integration (
crates/tui/src/tui/ui.rs):apply_engine_error_to_app(): when engine error is recoverable(Network / RateLimit / Timeout) and fallback providers are configured,
automatically calls
advance_fallback()instead of going offlineCommand (
crates/tui/src/commands/provider.rs):/provider fallback— show current chain, position, and last fallback reason/provider fallback reset— return to primary providerFooter (
crates/tui/src/tui/footer_ui.rs):footer_state_label(): shows "fallback →" in warning colour whena fallback provider is active
User experience
/provider fallbackshows chain: active provider, position, reason/provider fallback resetreturns to primaryTesting
load_fallback_from_toml(awaiting App::new let-binding refactor for startup init)
Closes #2574
Testing
cargo fmt --all -- --checkcargo clippy --workspace --all-targets --all-featurescargo test --workspace --all-featuresChecklist
Greptile Summary
This PR adds an automatic provider fallback chain: when a recoverable error (429, 5xx, timeout) occurs, CodeWhale switches to the next configured provider without user intervention and shows a "fallback →" indicator in the footer. The config layer (
ProviderChain) and footer indicator are implemented correctly, but the core runtime state inapp.rshas two bugs that make the feature behave incorrectly in all configurations.advance_fallbacktreatsfallback_depthas 1-based (0 = primary) but indexes intofallback_providerswithnext_depthdirectly instead ofnext_depth - 1. With one fallback configured, the exhaustion path fires immediately and the fallback is never tried. With two fallbacks,\"deepseek\"(index 0) is always skipped.reset_fallbackdoesn't restore the primary provider:reset_fallback()clearsfallback_depthso the UI shows "primary active", butapi_provideris never restored and requests keep going to the last fallback provider.Confidence Score: 3/5
The fallback chain silently misbehaves for every valid configuration: single-fallback setups never switch providers, multi-fallback setups skip the first entry, and the reset command leaves the wrong provider active while telling the user otherwise.
Both core behaviors the feature is built around — advancing through the chain in order and returning to the primary on reset — are broken. A user who configures one fallback provider and hits a 429 will see 'chain exhausted' with no fallback attempted. A user who resets after a fallback will get a misleading success message while requests keep going to the fallback provider.
crates/tui/src/tui/app.rs — advance_fallback and reset_fallback both need fixes before the feature can work as documented.
Important Files Changed
advance_fallbackhas an off-by-one that skips the first fallback (making a single-fallback config completely unusable), andreset_fallbacknever restoresapi_providerto the primary./provider fallbackcommand; display logic incorrectly marks the first fallback as "current" when the primary provider is still active due to a depth-zero aliasing issue.fallback_providersfield andProviderChainstruct with correct dedup, advance, and exhaustion logic; 6 unit tests cover the key scenarios.advance_fallback; status message format is correct, though it inherits the off-by-one fromadvance_fallbackfor the fallback position counter.Reviews (1): Last reviewed commit: "feat(provider): complete provider fallba..." | Re-trigger Greptile