Expose workload-aware model selection + normalized cache metrics for gateway routing #67

## Summary
`llm-gateway` now selects providers and models per route class and adds durable response/prefix caching with a TTL. To keep gateway policy thin and free of hardcoded provider quirks, `llm-providers` needs to expose stronger provider-native model-selection and caching capabilities.

## Why this matters
The current gateway integration has to hardcode Groq- and Cerebras-specific behavior to get good results: picking models per route class and stripping tools on cheap routes. That logic belongs primarily in `llm-providers`, where every client benefits from it and policy stays centralized.

## Requested enhancements in `llm-providers`
1. **Route/workload-aware model defaults**
- Add a first-class API for model selection by workload class (e.g. `summary`, `planning`, `code_draft`, `long_context`, `tool_loop`) instead of a single generic default.
- Keep each provider's best models configurable so consumers do not have to hardcode them.
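
A minimal sketch of what a workload-aware default lookup could look like. Everything here is hypothetical: `defaultModelFor`, `WorkloadModelTable`, and the model ids are illustrative placeholders, not existing `llm-providers` names or real provider model ids. The point is that the workload-to-model mapping lives in one table owned by `llm-providers`, with a per-provider fallback when no workload-specific pick exists.

```typescript
// Hypothetical sketch: these names are illustrative, not part of the current llm-providers API.
type Workload = "summary" | "planning" | "code_draft" | "long_context" | "tool_loop";

// Per-provider table: an optional model per workload plus a generic fallback.
interface WorkloadModelTable {
  fallback: string;
  byWorkload: Partial<Record<Workload, string>>;
}

// Placeholder model ids; the real provider-tuned picks would live inside llm-providers.
const BUILT_IN_TABLES: Record<string, WorkloadModelTable> = {
  groq: {
    fallback: "groq-general-model",
    byWorkload: { summary: "groq-fast-model", tool_loop: "groq-tool-model" },
  },
  cerebras: {
    fallback: "cerebras-general-model",
    byWorkload: { long_context: "cerebras-long-context-model" },
  },
};

// Returns the provider's tuned model for a workload, or its fallback when none is set.
function defaultModelFor(
  provider: string,
  workload: Workload,
  tables: Record<string, WorkloadModelTable> = BUILT_IN_TABLES,
): string {
  const table = tables[provider];
  if (table === undefined) {
    throw new Error(`unknown provider: ${provider}`);
  }
  return table.byWorkload[workload] ?? table.fallback;
}
```

A gateway would then call `defaultModelFor("groq", "summary")` instead of keeping its own Groq/Cerebras model map; throwing on an unknown provider (rather than silently guessing) is one possible design choice.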

2. **Provider-native prompt caching normalization**
- Ensure providers that support prompt caching expose its controls and report cache usage consistently.
- Normalize usage fields across providers to include (when available):
  - `cachedInputTokens`
  - `cacheReadInputTokens`
  - `cacheWriteInputTokens`
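
A sketch of the normalization, assuming two raw usage shapes: an OpenAI-compatible one (`prompt_tokens_details.cached_tokens`, counted inside `prompt_tokens`) and an Anthropic-style one (`cache_read_input_tokens` / `cache_creation_input_tokens`, where `input_tokens` excludes the cached portions). Those raw field names follow the public OpenAI and Anthropic usage objects and should be checked against each provider `llm-providers` wraps. Treating `cachedInputTokens` as equal to cache reads is an assumption; the issue does not define how the two fields differ.

```typescript
// Normalized usage shape; the three cache fields are the ones proposed in this issue.
interface NormalizedUsage {
  inputTokens: number; // total prompt tokens, including any cached portion
  outputTokens: number;
  cachedInputTokens?: number;
  cacheReadInputTokens?: number;
  cacheWriteInputTokens?: number;
}

// OpenAI-compatible shape: cached tokens are a subset of prompt_tokens.
interface OpenAIStyleUsage {
  prompt_tokens: number;
  completion_tokens: number;
  prompt_tokens_details?: { cached_tokens?: number };
}

// Anthropic-style shape: input_tokens excludes cache reads and cache writes.
interface AnthropicStyleUsage {
  input_tokens: number;
  output_tokens: number;
  cache_creation_input_tokens?: number;
  cache_read_input_tokens?: number;
}

function fromOpenAIStyle(u: OpenAIStyleUsage): NormalizedUsage {
  const cached = u.prompt_tokens_details?.cached_tokens;
  return {
    inputTokens: u.prompt_tokens,
    outputTokens: u.completion_tokens,
    cachedInputTokens: cached, // assumption: "cached" means "read from cache"
    cacheReadInputTokens: cached,
    // This shape has no separate cache-write count, so cacheWriteInputTokens stays unset.
  };
}

function fromAnthropicStyle(u: AnthropicStyleUsage): NormalizedUsage {
  const read = u.cache_read_input_tokens ?? 0;
  const write = u.cache_creation_input_tokens ?? 0;
  return {
    // Re-add cached portions so inputTokens means the same total for every provider.
    inputTokens: u.input_tokens + read + write,
    outputTokens: u.output_tokens,
    cachedInputTokens: u.cache_read_input_tokens,
    cacheReadInputTokens: u.cache_read_input_tokens,
    cacheWriteInputTokens: u.cache_creation_input_tokens,
  };
}
```

The main design point is that `inputTokens` must mean the same thing across providers; otherwise gateway telemetry would still need provider-specific arithmetic to compute cache-hit ratios.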

3. **Model capability metadata surface**
- Expose capability metadata per model (tools, streaming, context window, reasoning profile, structured-output reliability) so router layers can make policy decisions without static registries.
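
One way a router could consume such metadata, sketched under the assumption that `llm-providers` exposes a per-model record like `ModelCapabilities` (a hypothetical name, as are the field names and the three-level reliability scale). With this in place, "strip tools on cheap routes" becomes a query over capabilities rather than a hardcoded provider check.

```typescript
// Hypothetical capability record; field names and value scales are illustrative only.
type ReasoningProfile = "none" | "light" | "extended";
type Reliability = "low" | "medium" | "high";

interface ModelCapabilities {
  modelId: string;
  tools: boolean;
  streaming: boolean;
  contextWindowTokens: number;
  reasoning: ReasoningProfile;
  structuredOutput: Reliability;
}

// What a route needs from a model; unset fields impose no constraint.
interface Requirements {
  needsTools?: boolean;
  needsStreaming?: boolean;
  minContextTokens?: number;
  minStructuredOutput?: Reliability;
}

const RELIABILITY_RANK: Record<Reliability, number> = { low: 0, medium: 1, high: 2 };

function satisfies(cap: ModelCapabilities, req: Requirements): boolean {
  if (req.needsTools && !cap.tools) return false;
  if (req.needsStreaming && !cap.streaming) return false;
  if (req.minContextTokens !== undefined && cap.contextWindowTokens < req.minContextTokens) return false;
  if (
    req.minStructuredOutput !== undefined &&
    RELIABILITY_RANK[cap.structuredOutput] < RELIABILITY_RANK[req.minStructuredOutput]
  ) {
    return false;
  }
  return true;
}

// Returns the ids of all catalog models that meet the route's requirements, in catalog order.
function eligibleModels(catalog: ModelCapabilities[], req: Requirements): string[] {
  return catalog.filter((cap) => satisfies(cap, req)).map((cap) => cap.modelId);
}
```

Ranking among eligible models (cost, latency, reasoning profile) is left to gateway policy; the metadata only removes the need for a static registry of what each model can do.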

4. **Stable override contract**
- Provide a documented override path (env/config/hook) for consumers to set provider+workload model preferences without forking selection logic.
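
A sketch of a single resolution function covering the three override paths the issue lists (hook, env, config), assuming a precedence of hook, then environment, then config, then the built-in default. That order, the `resolveModel` name, and the `LLM_PROVIDERS_MODEL_<PROVIDER>_<WORKLOAD>` variable naming are all hypothetical choices. The environment is passed in as a plain record so the function stays testable and does not depend on a particular runtime.

```typescript
// Hypothetical override contract; names, env-var format, and precedence are assumptions.
type Workload = "summary" | "planning" | "code_draft" | "long_context" | "tool_loop";

type OverrideHook = (provider: string, workload: Workload) => string | undefined;

interface OverrideSources {
  hook?: OverrideHook; // programmatic override supplied by the consumer
  env?: Record<string, string | undefined>; // e.g. a copy of the process environment
  config?: Record<string, Partial<Record<Workload, string>>>; // provider -> workload -> model
}

// e.g. ("groq", "code_draft") -> "LLM_PROVIDERS_MODEL_GROQ_CODE_DRAFT"
function envKey(provider: string, workload: Workload): string {
  return `LLM_PROVIDERS_MODEL_${provider}_${workload}`.toUpperCase();
}

// Precedence: hook > env > config > built-in default. Empty env values count as unset.
function resolveModel(
  provider: string,
  workload: Workload,
  builtInDefault: string,
  sources: OverrideSources = {},
): string {
  const fromEnv = sources.env?.[envKey(provider, workload)];
  return (
    sources.hook?.(provider, workload) ??
    (fromEnv ? fromEnv : undefined) ??
    sources.config?.[provider]?.[workload] ??
    builtInDefault
  );
}
```

Because every path funnels through one function, consumers can override a single provider/workload pair without re-implementing or forking the rest of the selection logic.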

## Gateway-side context (already implemented)
- Per-route provider model overrides for Groq/Cerebras.
- Response cache with TTL and max-entries eviction for safe routes.
- Cache-hit/usage telemetry integration.
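
For reference, a response cache with TTL and max-entries eviction can be as small as the sketch below. This is not the gateway's actual implementation; `TtlResponseCache` is a hypothetical name, eviction here drops the oldest-inserted entry (FIFO, using `Map` insertion order), and the clock is injectable so expiry can be tested deterministically.

```typescript
// Illustrative sketch only, not llm-gateway's real cache.
interface CacheEntry<V> {
  value: V;
  expiresAt: number; // epoch milliseconds
}

class TtlResponseCache<V> {
  private readonly entries = new Map<string, CacheEntry<V>>();
  private readonly ttlMs: number;
  private readonly maxEntries: number;
  private readonly now: () => number;

  constructor(ttlMs: number, maxEntries: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.maxEntries = maxEntries;
    this.now = now;
  }

  // Returns the cached value, or undefined if missing or expired (expired entries are dropped).
  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }

  // Stores a value with a fresh TTL, then evicts the oldest entries beyond maxEntries.
  set(key: string, value: V): void {
    this.entries.delete(key); // re-inserting moves the key to the end of insertion order
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    while (this.entries.size > this.maxEntries) {
      const oldest = this.entries.keys().next().value;
      if (oldest === undefined) break;
      this.entries.delete(oldest);
    }
  }

  get size(): number {
    return this.entries.size;
  }
}
```

Restricting this cache to "safe routes" (deterministic, side-effect-free requests) is a gateway policy decision that sits outside the cache itself.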

## Acceptance criteria
- A gateway consumer can request a provider's default model by workload intent and get a stable, provider-tuned choice.
- Cache usage fields are normalized enough that gateway telemetry does not need provider-specific parsing.
- Consumers no longer need hardcoded per-provider model maps for basic routing quality.