Plano already sits in the routing layer, but it does not use that position to reduce cost for developers automatically. We should build a first-class GPU free-tier arbitrage policy that routes low-stakes or bursty agent traffic to free or low-cost providers when they are available, falls back deterministically to the primary provider when they are unavailable or overloaded, and gives full trace visibility into every routing decision.
Requirements
What "done" looks like
A developer can add a minimal config block to enable arbitrage, run a request, and see in the trace the provider selected, the reason (for example, free-tier available), and the fallback chain if applicable.
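
To make the expected behavior concrete, here is a minimal sketch of the routing decision and the trace record it could produce. It is not Plano's API or config schema: `Provider`, `RoutingTrace`, `route`, the `low_stakes` flag, and the reason strings are all hypothetical names chosen for illustration, and availability is modeled as a simple callable rather than a real health or quota check.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Provider:
    # Hypothetical provider record; not a Plano type.
    name: str
    # Hypothetical probe: returns True when the provider can take traffic
    # (for example, free-tier quota remains and it is not overloaded).
    is_available: Callable[[], bool]


@dataclass
class RoutingTrace:
    # The three fields the "done" criteria ask to see in the trace.
    selected: Optional[str] = None
    reason: str = ""
    fallback_chain: List[str] = field(default_factory=list)


def route(low_stakes: bool, free_tier: List[Provider], primary: Provider) -> RoutingTrace:
    """Pick a provider and record why. Free-tier providers are tried in list order."""
    trace = RoutingTrace()
    if low_stakes:
        # Fixed iteration order keeps the decision deterministic.
        for provider in free_tier:
            if provider.is_available():
                trace.selected = provider.name
                trace.reason = "free-tier available"
                return trace
            # Record each provider that was tried and skipped.
            trace.fallback_chain.append(provider.name)
        trace.reason = "free-tier unavailable, fell back to primary"
    else:
        trace.reason = "request not eligible for arbitrage"
    trace.selected = primary.name
    return trace
```

With two free-tier providers where only the second is available, `route(True, [a, b], primary)` selects `b` and records `a` in `fallback_chain`. If none are available, the primary is selected and the chain lists every provider that was skipped. Trying providers in a fixed order is one simple way to get the deterministic fallback described above; a real policy would likely also weigh cost and load.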