feat(provider): complete provider fallback chain (#2574) by idling11 · Pull Request #2773 · Hmbown/CodeWhale

idling11 · 2026-06-05T03:54:07Z

Add automatic provider fallback so that when the active provider returns
a retryable error (429, 5xx, timeout, network), CodeWhale switches to
the next configured provider without user intervention.

Configuration

# ~/.codewhale/config.toml
provider = "nvidia-nim"
fallback_providers = ["deepseek", "openrouter"]

Changes

Config layer (crates/config/src/lib.rs):

ConfigToml.fallback_providers: Vec<ProviderKind> — deserialised from
fallback_providers = ["deepseek", "openrouter"] in TOML
ProviderChain struct: ordered provider list with position tracking,
advance(), has_next(), is_fallback_active(), remaining()
6 unit tests covering construction, advance, exhaust, dedup, remaining
count, and TOML parsing

App state (crates/tui/src/tui/app.rs):

fallback_providers: Vec<String> — provider route names
fallback_depth: Option<usize> — current chain position (None = primary)
last_fallback_reason: Option<String> — for UI display
advance_fallback(reason) — advance to next fallback provider
reset_fallback() — return to primary
is_fallback_active() — query current state
load_fallback_from_toml() — init from ConfigToml

Runtime integration (crates/tui/src/tui/ui.rs):

apply_engine_error_to_app(): when engine error is recoverable
(Network / RateLimit / Timeout) and fallback providers are configured,
automatically calls advance_fallback() instead of going offline

Command (crates/tui/src/commands/provider.rs):

/provider fallback — show current chain, position, and last fallback reason
/provider fallback reset — return to primary provider

Footer (crates/tui/src/tui/footer_ui.rs):

footer_state_label(): shows "fallback →" in warning colour when
a fallback provider is active

User experience

Configure fallback providers in config.toml
Primary provider encounters 429/5xx/timeout
Footer changes to "fallback →" (orange)
Status message: "Switched to deepseek (fallback 1/2): HTTP 429..."
/provider fallback shows chain: active provider, position, reason
/provider fallback reset returns to primary

Testing

93 config tests pass (6 new ProviderChain tests)
185 provider command tests pass
Dead code warning suppressed for load_fallback_from_toml
(awaiting App::new let-binding refactor for startup init)

Closes #2574

Testing

cargo fmt --all -- --check
cargo clippy --workspace --all-targets --all-features
cargo test --workspace --all-features

Checklist

Updated docs or comments as needed
Added or updated tests where relevant
Verified TUI behavior manually if UI changes

Greptile Summary

This PR adds an automatic provider fallback chain: when a recoverable error (429, 5xx, timeout) occurs, CodeWhale switches to the next configured provider without user intervention and shows a "fallback →" indicator in the footer. The config layer (ProviderChain) and footer indicator are implemented correctly, but the core runtime state in app.rs has two bugs that make the feature behave incorrectly in all configurations.

First fallback always skipped / single-fallback never used: advance_fallback treats fallback_depth as 1-based (0 = primary) but indexes into fallback_providers with next_depth directly instead of next_depth - 1. With one fallback configured, the exhaustion path fires immediately and the fallback is never tried. With two fallbacks, \"deepseek\" (index 0) is always skipped.
reset_fallback doesn't restore the primary provider: reset_fallback() clears fallback_depth so the UI shows "primary active", but api_provider is never restored and requests keep going to the last fallback provider.

Confidence Score: 3/5

The fallback chain silently misbehaves for every valid configuration: single-fallback setups never switch providers, multi-fallback setups skip the first entry, and the reset command leaves the wrong provider active while telling the user otherwise.

Both core behaviors the feature is built around — advancing through the chain in order and returning to the primary on reset — are broken. A user who configures one fallback provider and hits a 429 will see 'chain exhausted' with no fallback attempted. A user who resets after a fallback will get a misleading success message while requests keep going to the fallback provider.

crates/tui/src/tui/app.rs — advance_fallback and reset_fallback both need fixes before the feature can work as documented.

Important Files Changed

Filename	Overview
crates/tui/src/tui/app.rs	Adds fallback state fields and helper methods, but `advance_fallback` has an off-by-one that skips the first fallback (making a single-fallback config completely unusable), and `reset_fallback` never restores `api_provider` to the primary.
crates/tui/src/commands/provider.rs	Adds `/provider fallback` command; display logic incorrectly marks the first fallback as "current" when the primary provider is still active due to a depth-zero aliasing issue.
crates/config/src/lib.rs	Adds `fallback_providers` field and `ProviderChain` struct with correct dedup, advance, and exhaustion logic; 6 unit tests cover the key scenarios.
crates/tui/src/tui/ui.rs	Wires recoverable errors to `advance_fallback`; status message format is correct, though it inherits the off-by-one from `advance_fallback` for the fallback position counter.
crates/tui/src/tui/footer_ui.rs	Adds "fallback →" indicator in warning colour when a fallback is active; straightforward and correct.

_{Reviews (1): Last reviewed commit: "feat(provider): complete provider fallba..." | Re-trigger Greptile}

Greptile also left 3 inline comments on this PR.

Add the configuration and data-model layer for automatic provider fallback. When active provider returns a retryable error (429, 5xx, timeout), CodeWhale can switch to the next configured provider without user intervention. Changes: - `crates/config/src/lib.rs`: - `ConfigToml.fallback_providers: Vec<ProviderKind>` — serialised from `fallback_providers = ["deepseek", "openrouter"]` in TOML - `ProviderChain` struct: tracks ordered provider list + current position, with advance/has_next/is_fallback_active helpers - 6 new unit tests covering chain construction, advance, exhaust, dedup, remaining count, and TOML parsing - `crates/tui/src/tui/app.rs`: - `App.fallback_providers: Vec<String>` — provider route names - `App.fallback_depth: Option<usize>` — current chain position - `App.last_fallback_reason: Option<String>` — UI display Follow-up PRs will wire: - Runtime fallback execution in the agent loop - `/provider fallback` command - Status bar fallback indicator Closes Hmbown#2574

…own#2574) Full provider fallback chain implementation: - `crates/config/src/lib.rs`: - `ConfigToml.fallback_providers: Vec<ProviderKind>` — TOML parsing - `ProviderChain` struct with advance/has_next/is_fallback_active - 6 unit tests - `crates/tui/src/tui/app.rs`: - `fallback_providers`, `fallback_depth`, `last_fallback_reason` fields - `advance_fallback(reason)` — advance to next fallback provider - `reset_fallback()` — return to primary - `is_fallback_active()` — query current state - `load_fallback_from_toml()` — init from config - `crates/tui/src/commands/provider.rs`: - `/provider fallback` — show chain + position + last reason - `/provider fallback reset` — return to primary provider Follow-up: wire `advance_fallback` into the engine error handler for automatic runtime fallback on 429/5xx/timeout. Closes Hmbown#2574

Add automatic provider fallback so that when the active provider returns a retryable error (429, 5xx, timeout, network), CodeWhale switches to the next configured provider without user intervention. ### Configuration ```toml # ~/.codewhale/config.toml provider = "nvidia-nim" fallback_providers = ["deepseek", "openrouter"] ``` ### Changes **Config layer** (`crates/config/src/lib.rs`): - `ConfigToml.fallback_providers: Vec<ProviderKind>` — deserialised from `fallback_providers = ["deepseek", "openrouter"]` in TOML - `ProviderChain` struct: ordered provider list with position tracking, `advance()`, `has_next()`, `is_fallback_active()`, `remaining()` - 6 unit tests covering construction, advance, exhaust, dedup, remaining count, and TOML parsing **App state** (`crates/tui/src/tui/app.rs`): - `fallback_providers: Vec<String>` — provider route names - `fallback_depth: Option<usize>` — current chain position (None = primary) - `last_fallback_reason: Option<String>` — for UI display - `advance_fallback(reason)` — advance to next fallback provider - `reset_fallback()` — return to primary - `is_fallback_active()` — query current state - `load_fallback_from_toml()` — init from ConfigToml **Runtime integration** (`crates/tui/src/tui/ui.rs`): - `apply_engine_error_to_app()`: when engine error is recoverable (Network / RateLimit / Timeout) and fallback providers are configured, automatically calls `advance_fallback()` instead of going offline **Command** (`crates/tui/src/commands/provider.rs`): - `/provider fallback` — show current chain, position, and last fallback reason - `/provider fallback reset` — return to primary provider **Footer** (`crates/tui/src/tui/footer_ui.rs`): - `footer_state_label()`: shows "fallback →" in warning colour when a fallback provider is active ### User experience 1. Configure fallback providers in config.toml 2. Primary provider encounters 429/5xx/timeout 3. Footer changes to "fallback →" (orange) 4. Status message: "Switched to deepseek (fallback 1/2): HTTP 429..." 5. `/provider fallback` shows chain: active provider, position, reason 6. `/provider fallback reset` returns to primary ### Testing - 93 config tests pass (6 new ProviderChain tests) - 185 provider command tests pass - Dead code warning suppressed for `load_fallback_from_toml` (awaiting App::new let-binding refactor for startup init) Closes Hmbown#2574

github-actions · 2026-06-05T03:54:17Z

Thanks @idling11 for taking the time to contribute.

This repository is currently observing a maintainer-managed contribution gate in dry-run mode, so this pull request is staying open. When enforcement is enabled, pull requests from contributors who are not listed in .github/APPROVED_CONTRIBUTORS will be closed automatically.

Please read CONTRIBUTING.md for the expected contribution shape. A maintainer can grant PR access by commenting /lgtm on a pull request.

gemini-code-assist

Code Review

This pull request introduces a provider fallback chain feature (#2574) that automatically switches to alternative providers when recoverable errors (like rate limits, timeouts, or network issues) occur. It adds configuration support, a ProviderChain state manager, TUI integration via /provider fallback, and status indicators. The review identified several issues: a critical bug in advance_fallback that skips the first fallback provider and breaks single-fallback setups, a UI bug that incorrectly displays the current fallback marker when none is active, a 0-based index display issue in status messages, and a robustness concern regarding potential out-of-bounds indexing in ProviderChain::current().

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-05T03:56:54Z

+        let current_depth = self.fallback_depth.unwrap_or(0);
+        let next_depth = current_depth + 1;


Critical Bug: Fallback Chain Skipping and Premature Exhaustion

In advance_fallback, when self.fallback_depth is None (indicating the primary provider is active), current_depth defaults to 0, making next_depth equal to 1.

This causes two major issues:

Skipping the first fallback: The first fallback provider at self.fallback_providers[0] is completely skipped, and the chain immediately transitions to the second fallback provider at index 1.

Complete failure with a single fallback: If only one fallback provider is configured (self.fallback_providers.len() == 1), next_depth (1) is >= the length of the fallback providers (1). The fallback chain is immediately considered exhausted, and fallback never executes.

To fix this, next_depth should be computed as 0 when self.fallback_depth is None, and incremented by 1 only when it is Some.

Suggested change

let current_depth = self.fallback_depth.unwrap_or(0);

let next_depth = current_depth + 1;

let next_depth = self.fallback_depth.map_or(0, |d| d + 1);

gemini-code-assist · 2026-06-05T03:56:54Z

+            let depth = app.fallback_depth.unwrap_or(0);
+            let mut lines = vec![format!("Active: {active}")];
+            for (i, name) in app.fallback_providers.iter().enumerate() {
+                let marker = if i == depth { " ◀ current" } else { "" };


Bug: Incorrect Current Marker Display

When app.fallback_depth is None (meaning the primary provider is active, not any fallback), depth defaults to 0. This incorrectly marks the first fallback provider (i == 0) with the ◀ current label in the /provider fallback command output.

The marker should only be displayed if app.fallback_depth is explicitly Some(i).

Suggested change

let depth = app.fallback_depth.unwrap_or(0);

let mut lines = vec![format!("Active: {active}")];

for (i, name) in app.fallback_providers.iter().enumerate() {

let marker = if i == depth { " ◀ current" } else { "" };

let mut lines = vec![format!("Active: {active}")];

for (i, name) in app.fallback_providers.iter().enumerate() {

let marker = if app.fallback_depth == Some(i) { " ◀ current" } else { "" };

gemini-code-assist · 2026-06-05T03:56:54Z

+            app.status_message = Some(format!(
+                "Switched to {} (fallback {}/{}): {message}",
+                app.api_provider.as_str(),
+                app.fallback_depth.unwrap_or(0),


Bug: Incorrect Fallback Index Display

When app.fallback_depth is Some(depth) (0-indexed), displaying app.fallback_depth.unwrap_or(0) results in a 0-based index (e.g., "fallback 0/2" for the first fallback). To match the user-friendly 1-based index described in the PR (e.g., "fallback 1/2"), we should map Some(depth) to depth + 1.

app.fallback_depth.map_or(0, |d| d + 1),

gemini-code-assist · 2026-06-05T03:56:54Z

+    pub fn current(&self) -> ProviderKind {
+        self.providers[self.position]
+    }


Robustness: Safe Indexing in current()

Since ProviderChain has public fields providers and position, external code can mutate them and potentially break the invariant position < providers.len(). To prevent out-of-bounds panics, use safe indexing with .get().copied() and fall back to the first provider (which is guaranteed to exist).

pub fn current(&self) -> ProviderKind { self.providers.get(self.position).copied().unwrap_or(self.providers[0]) }

greptile-apps · 2026-06-05T03:57:48Z

+    pub fn advance_fallback(&mut self, reason: impl Into<String>) -> bool {
+        if self.fallback_providers.is_empty() {
+            return false;
+        }
+        let current_depth = self.fallback_depth.unwrap_or(0);
+        let next_depth = current_depth + 1;
+        if next_depth >= self.fallback_providers.len() {
+            self.last_fallback_reason = Some(format!(
+                "Fallback chain exhausted after {}: {}",
+                self.fallback_providers.len(),
+                reason.into()
+            ));
+            return false;
+        }
+        let next_name = &self.fallback_providers[next_depth];
+        if let Some(next_provider) = ApiProvider::parse(next_name) {
+            self.fallback_depth = Some(next_depth);
+            self.last_fallback_reason =
+                Some(format!("Fell back to {}: {}", next_name, reason.into()));
+            self.api_provider = next_provider;
+            true
+        } else {
+            self.last_fallback_reason = Some(format!("Unknown fallback provider: {next_name}"));
+            false
+        }
+    }


Off-by-one: first fallback provider always skipped (or never reached)

fallback_providers is 0-indexed (first fallback = index 0), but advance_fallback starts with current_depth = unwrap_or(0) then accesses fallback_providers[next_depth] — where next_depth = current_depth + 1 = 1 on the very first call, skipping index 0. With exactly one fallback configured, next_depth = 1 >= len() = 1 is true immediately, so the exhaustion path fires and the only fallback is never tried.

For fallback_providers = ["deepseek", "openrouter"]:

Call 1: current_depth=0, next_depth=1, accesses fallback_providers[1] = "openrouter" — "deepseek" is skipped forever.

Call 2: current_depth=1, next_depth=2 >= 2 → "exhausted" even though the chain only visited one provider.

The fix is to treat fallback_depth as 1-based, so the first advance maps to fallback_providers[0]: change the bounds check to next_depth > self.fallback_providers.len() and the access to &self.fallback_providers[next_depth - 1].

greptile-apps · 2026-06-05T03:57:49Z

+    /// Reset the fallback chain to the primary provider.
+    pub fn reset_fallback(&mut self) {
+        self.fallback_depth = None;
+        self.last_fallback_reason = None;
+    }


reset_fallback clears the depth flag but doesn't restore the primary provider

reset_fallback sets fallback_depth = None and clears last_fallback_reason, but never touches api_provider. After a fallback switch, api_provider holds the fallback provider (e.g. "openrouter"). Calling /provider fallback reset makes is_fallback_active() return false and removes the footer indicator, yet all subsequent requests still go to the fallback provider — the opposite of what the command advertises. There is currently no field that remembers the original primary provider, so reset_fallback has no value to restore. A primary_provider: ApiProvider field (set at startup and never changed by fallback logic) would fix both this method and the display.

greptile-apps · 2026-06-05T03:57:50Z

+            let active = app.api_provider.as_str().to_string();
+            let depth = app.fallback_depth.unwrap_or(0);
+            let mut lines = vec![format!("Active: {active}")];
+            for (i, name) in app.fallback_providers.iter().enumerate() {
+                let marker = if i == depth { " ◀ current" } else { "" };
+                lines.push(format!("  [{i}] {name}{marker}"));
+            }


Fallback status display incorrectly marks the first fallback as current when the primary is active

When no fallback is active (fallback_depth = None), depth = unwrap_or(0) = 0, so the loop marks fallback_providers[0] with " ◀ current" even though the primary provider is in use. Comparing against Some(i + 1) instead of i == depth fixes the aliasing.

Suggested change

let active = app.api_provider.as_str().to_string();

let depth = app.fallback_depth.unwrap_or(0);

let mut lines = vec![format!("Active: {active}")];

for (i, name) in app.fallback_providers.iter().enumerate() {

let marker = if i == depth { " ◀ current" } else { "" };

lines.push(format!(" [{i}] {name}{marker}"));

}

let active = app.api_provider.as_str().to_string();

let depth = app.fallback_depth; // None = primary active

let mut lines = vec![format!("Active: {active}")];

for (i, name) in app.fallback_providers.iter().enumerate() {

// depth is 1-based (Some(1) = first fallback = index 0)

let marker = if depth == Some(i + 1) { " ◀ current" } else { "" };

lines.push(format!(" [{i}] {name}{marker}"));

}

Hmbown · 2026-06-05T04:00:13Z

Thanks @idling11 for taking this on, and thanks @hsdbeebou for the original #2574 request. The fallback-chain direction is useful, but after reading the diff, linked issue, and review comments, I do not think we should fast-path this full automatic fallback implementation into the v0.9 stewardship branch as-is.

The immediate code issues are fixable, but important:

crates/tui/src/tui/app.rs skips the first fallback provider on the first advance, so a single-fallback config never actually falls back.
reset_fallback() clears the UI state but does not restore the primary provider, so the UI can say “primary” while requests still appear to be pointed at the fallback provider.
/provider fallback can mark the first fallback as current when the primary provider is active.

The larger issue is the provider-routing contract. The existing /provider switch path validates credentials, mutates the in-memory config, respawns the engine/client, syncs session state, updates model/capability-scoped state, and persists settings. This PR currently mutates TUI-side fallback/provider state from apply_engine_error_to_app(), but it does not go through that full switch path, so it can leave the UI and actual request client out of sync. That is especially risky because fallback can change API key/vendor billing, model capability, tool/schema support, context limits, and the user’s expectation mid-turn.

So I would like to keep this open as provider-routing work rather than merge or close it for v0.9. A good next split would be either:

a design/config PR that defines the fallback contract and tests the config model without claiming runtime auto-switch support yet, or
a runtime PR that routes fallback through the same engine/client rebuild semantics as /provider, with tests for first fallback, single fallback, reset-to-primary, all-fallbacks-exhausted, no-fallback-on-auth/model-not-found, and visible transcript/status disclosure.

I’m intentionally not closing #2574; the intent is still valuable. This just needs the safer provider-routing pass the issue comment called out before we ship it.

greptile-apps

idling11 has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.

…tive

greptile-apps

idling11 has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.

greptile-apps

idling11 has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.

greptile-apps

idling11 has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.

idling11 · 2026-06-05T04:12:48Z

Thanks @idling11 for taking this on, and thanks @hsdbeebou for the original #2574 request. The fallback-chain direction is useful, but after reading the diff, linked issue, and review comments, I do not think we should fast-path this full automatic fallback implementation into the v0.9 stewardship branch as-is.

The immediate code issues are fixable, but important:

crates/tui/src/tui/app.rs skips the first fallback provider on the first advance, so a single-fallback config never actually falls back.

reset_fallback() clears the UI state but does not restore the primary provider, so the UI can say “primary” while requests still appear to be pointed at the fallback provider.

/provider fallback can mark the first fallback as current when the primary provider is active.

The larger issue is the provider-routing contract. The existing /provider switch path validates credentials, mutates the in-memory config, respawns the engine/client, syncs session state, updates model/capability-scoped state, and persists settings. This PR currently mutates TUI-side fallback/provider state from apply_engine_error_to_app(), but it does not go through that full switch path, so it can leave the UI and actual request client out of sync. That is especially risky because fallback can change API key/vendor billing, model capability, tool/schema support, context limits, and the user’s expectation mid-turn.

So I would like to keep this open as provider-routing work rather than merge or close it for v0.9. A good next split would be either:

a design/config PR that defines the fallback contract and tests the config model without claiming runtime auto-switch support yet, or

a runtime PR that routes fallback through the same engine/client rebuild semantics as /provider, with tests for first fallback, single fallback, reset-to-primary, all-fallbacks-exhausted, no-fallback-on-auth/model-not-found, and visible transcript/status disclosure.

I’m intentionally not closing #2574; the intent is still valuable. This just needs the safer provider-routing pass the issue comment called out before we ship it.

Oh yes, you're right. I'll take another look at the overall implementation plan. It will most likely involve splitting the PR into smaller parts, and I'll submit it according to your suggestion.

greptile-apps

idling11 has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.

github-actions · 2026-06-05T23:20:17Z

Thanks @idling11 for taking the time to contribute.

This repository is currently observing a maintainer-managed contribution gate in dry-run mode, so this pull request is staying open. When enforcement is enabled, pull requests from contributors who are not listed in .github/APPROVED_CONTRIBUTORS will be closed automatically.

Please read CONTRIBUTING.md for the expected contribution shape. A maintainer can grant PR access by commenting /lgtm on a pull request.

idling11 added 3 commits June 5, 2026 11:14

gemini-code-assist Bot reviewed Jun 5, 2026

View reviewed changes