Auto model router adds a full network round-trip before every message #1549

## Problem

When `model = "auto"` is configured, every user message triggers a **serial** Flash Router API call (`deepseek-v4-flash`, `max_tokens: 96`) before the actual model request can be dispatched. This adds an unavoidable network round-trip (typically 300 ms–2 s, up to 4 s on timeout) to every turn, even for trivially routable messages like "continue", "yes", or "list files".

The router call is strictly serial — the real API request cannot start until the router responds. The effect is a perceptible delay before streaming begins on every message, making the TUI feel sluggish compared to a fixed-model setup.

**The current implementation is equivalent to: to decide whether to drive a Ferrari or a Corolla, you first take an Uber to go ask for directions, then come back to get your car.**

Who is affected: every user who enables `model = "auto"` (whether via the `/model auto` command or the Model Picker).

## Proposed solution

Replace or supplement the Flash Router with a **local, zero-network-cost heuristic that runs first**. The router API call should be reserved only for genuinely ambiguous cases the local heuristic cannot confidently resolve.

Concrete proposal:

1. **Heuristic-first dispatch**: Run `auto_model_heuristic()` synchronously before any network call. If the heuristic returns a strong signal (e.g. message is very short → Flash, contains complex keywords → Pro), use it immediately and **skip the Flash Router entirely**.
2. **Router only for grey zone**: Only invoke the Flash Router when the heuristic lands on the default branch (100–500 chars, no decisive keywords). Those are the cases where a lightweight LLM classifier can add value.
3. **Optional: parallel speculative dispatch**: Send the router request and the request to the heuristic's preferred model simultaneously. If the router agrees, keep streaming that response; if it disagrees, abort and retry with the correct model. (More complex, higher ceiling.)
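A minimal sketch of steps 1 and 2, to make the three-way decision concrete. Everything here is illustrative: the `Route` enum, the `route_locally` name, and the keyword list are hypothetical and are not taken from `auto_model_heuristic()`. The 100 and 500 character bounds come from the grey zone described in step 2, and sending messages above 500 chars to Pro is an assumption.

```rust
/// Result of the local pre-check. `AskRouter` means the heuristic is not
/// confident and the Flash Router should be consulted.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Route {
    Flash,
    Pro,
    AskRouter,
}

// Hypothetical keyword list; the real one lives in `auto_model_heuristic()`.
const PRO_KEYWORDS: [&str; 4] = ["refactor", "architecture", "debug", "design"];

// Bounds of the grey zone (100 to 500 chars) from the proposal.
const SHORT_LIMIT: usize = 100;
const LONG_LIMIT: usize = 500;

/// Purely local routing: no network call, only string inspection.
fn route_locally(message: &str) -> Route {
    let lower = message.to_lowercase();
    let len = message.chars().count();

    // Decisive keywords win regardless of length.
    if PRO_KEYWORDS.iter().any(|k| lower.contains(k)) {
        return Route::Pro;
    }
    // Very short messages ("continue", "yes") go straight to Flash.
    if len < SHORT_LIMIT {
        return Route::Flash;
    }
    // Assumption: long messages without keywords are treated as Pro work.
    if len > LONG_LIMIT {
        return Route::Pro;
    }
    // Grey zone: only here is the router round-trip worth paying for.
    Route::AskRouter
}

fn main() {
    let grey = "please summarise ".repeat(12); // 204 chars, no keyword
    for msg in ["continue", "refactor the auth module", grey.as_str()] {
        println!("{} chars -> {:?}", msg.chars().count(), route_locally(msg));
    }
}
```

With this shape, the caller only awaits the router when `route_locally` returns `Route::AskRouter`; the other two outcomes can dispatch the real request immediately.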

## Use case

I use `model = "auto"` to balance cost and capability — Flash for quick lookups, Pro for real work. But the upfront delay on every message is noticeable enough that I frequently switch back to a fixed model just to avoid the hesitation. The feature loses its value if the cost is a perceptible pause on even the simplest exchanges.

## Alternatives considered

- **Pure heuristic with no router**: the current keyword + length heuristic in `auto_model_heuristic()` plus `auto_reasoning::select()` already covers a wide range of messages. Removing the router entirely would eliminate the latency but lose the semantic understanding the Flash Router can bring to nuanced requests.
- **Cached routing**: reuse the previous turn's routing decision within the same session. This works for long-running conversations but breaks on topic shifts, and the first message still pays the latency cost.
- **Parallel routing**: dispatch both the router call and the most likely model request at once. Technically feasible, but wastes API quota whenever the router disagrees.
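To illustrate the trade-off in the parallel option, here is a rough sketch using plain threads. It is not how the TUI is written (the real code is async): `call_router`, `call_model`, `Outcome`, and `speculative_dispatch` are hypothetical stand-ins that sleep instead of hitting the network. The `wasted_request` flag marks the case where the speculative request was paid for but discarded.

```rust
use std::thread;
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Model {
    Flash,
    Pro,
}

/// Stand-in for the Flash Router call; the sleep simulates the round-trip.
fn call_router(message: &str) -> Model {
    thread::sleep(Duration::from_millis(30));
    if message.contains("refactor") { Model::Pro } else { Model::Flash }
}

/// Stand-in for the real model request.
fn call_model(model: Model, message: &str) -> String {
    thread::sleep(Duration::from_millis(50));
    format!("{:?} reply to: {}", model, message)
}

#[derive(Debug)]
struct Outcome {
    reply: String,
    model: Model,
    wasted_request: bool,
}

/// Start the guessed request right away and ask the router concurrently.
fn speculative_dispatch(message: &str, guess: Model) -> Outcome {
    let owned = message.to_string();
    let speculative = thread::spawn(move || call_model(guess, &owned));

    // Runs on the current thread while the speculative request is in flight.
    let routed = call_router(message);

    if routed == guess {
        // Router agrees: the speculative reply is the final reply.
        let reply = speculative.join().expect("speculative request panicked");
        Outcome { reply, model: guess, wasted_request: false }
    } else {
        // Router disagrees: discard the speculative result and retry.
        // A real client would cancel the in-flight request; here the
        // thread is simply detached and its result ignored.
        drop(speculative);
        Outcome { reply: call_model(routed, message), model: routed, wasted_request: true }
    }
}

fn main() {
    println!("{:?}", speculative_dispatch("list files", Model::Flash));
    println!("{:?}", speculative_dispatch("refactor the parser", Model::Flash));
}
```

When the guess is right, the router latency is hidden behind the model request. When it is wrong, the turn pays for the router, the discarded request, and the retry, which is the quota cost noted above.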

The heuristic-first approach is the simplest change with the largest latency win.

## Impact

Every single turn in auto mode is affected. For power users who leave auto mode on for the whole session, that means dozens to hundreds of turns per day, each paying 300 ms to 2 s of dead time before any visible response.

## Additional context

Relevant source:
- Router dispatch: `crates/tui/src/commands/config.rs:922-958` — builds and calls the Flash router API
- Heuristic fallback: `crates/tui/src/commands/config.rs:720-735` — `auto_model_heuristic()` (keyword + length, pure local, ~0 ms)
- Integration point: `crates/tui/src/tui/ui.rs:4397-4412` — where the routing result is consumed; the UI blocks on `resolve_auto_model_selection().await` before it can send the real `Op::SendMessage`
