Expose workload-aware model selection + normalized cache metrics for gateway routing #67

@stackbilt-admin

Summary

llm-gateway now does route-class-aware provider/model selection and adds durable response/prefix caching with TTL. We need llm-providers to expose stronger provider-native model/caching capabilities so gateway policy can stay thin and avoid hardcoded provider quirks.

Why this matters

The current gateway integration has to hardcode Groq/Cerebras behavior to get good results (model picks by route class, tool stripping on cheap routes). That logic belongs primarily in llm-providers so that all clients benefit and policy stays centralized.

Requested enhancements in llm-providers

  1. Route/workload-aware model defaults
  • Add a first-class API for model selection by workload class (e.g. summary, planning, code_draft, long_context, tool_loop) instead of a single generic default.
  • Keep provider-specific best models configurable without per-consumer hardcoding.
  2. Provider-native prompt caching normalization
  • Ensure supported providers expose prompt-cache controls and consistently report cache usage.
  • Normalize usage fields across providers to include (when available):
    • cachedInputTokens
    • cacheReadInputTokens
    • cacheWriteInputTokens
  3. Model capability metadata surface
  • Expose capability metadata per model (tools, streaming, context window, reasoning profile, structured-output reliability) so router layers can make policy decisions without static registries.
  4. Stable override contract
  • Provide a documented override path (env/config/hook) for consumers to set provider+workload model preferences without forking selection logic.
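
To make the requests above concrete, here is a rough TypeScript sketch of what workload-aware selection with capability metadata and a consumer override path could look like. Everything in it is hypothetical: `Workload`, `ProviderProfile`, `Overrides`, `selectModel`, the provider name, and the model ids are placeholders for discussion, not the existing llm-providers API or real model names.

```typescript
// Hypothetical sketch only: none of these names exist in llm-providers today.
type Workload = "summary" | "planning" | "code_draft" | "long_context" | "tool_loop";

// Capability metadata a router could inspect instead of a static registry.
interface ModelCapabilities {
  tools: boolean;
  streaming: boolean;
  contextWindow: number; // in tokens
}

interface ModelEntry {
  id: string;
  capabilities: ModelCapabilities;
}

// Per-provider table: a generic fallback plus optional per-workload picks.
interface ProviderProfile {
  fallback: string;
  byWorkload: Partial<Record<Workload, string>>;
  models: Record<string, ModelEntry>;
}

// Consumer overrides keyed by provider, then workload (could come from env/config).
type Overrides = Record<string, Partial<Record<Workload, string>>>;

// Resolution order: consumer override, then provider workload default, then fallback.
function selectModel(
  profiles: Record<string, ProviderProfile>,
  provider: string,
  workload: Workload,
  overrides: Overrides = {},
): ModelEntry {
  const profile = profiles[provider];
  if (!profile) throw new Error(`unknown provider: ${provider}`);
  const id =
    overrides[provider]?.[workload] ?? profile.byWorkload[workload] ?? profile.fallback;
  const entry = profile.models[id];
  if (!entry) throw new Error(`unknown model "${id}" for provider ${provider}`);
  return entry;
}

// Placeholder provider and model ids, chosen only for illustration.
const profiles: Record<string, ProviderProfile> = {
  "example-provider": {
    fallback: "general-model",
    byWorkload: { summary: "small-fast-model", tool_loop: "tool-capable-model" },
    models: {
      "general-model": {
        id: "general-model",
        capabilities: { tools: true, streaming: true, contextWindow: 32000 },
      },
      "small-fast-model": {
        id: "small-fast-model",
        capabilities: { tools: false, streaming: true, contextWindow: 8000 },
      },
      "tool-capable-model": {
        id: "tool-capable-model",
        capabilities: { tools: true, streaming: true, contextWindow: 32000 },
      },
    },
  },
};
```

With something like this, the gateway could call `selectModel(profiles, "example-provider", "summary")` and decide whether to strip tools by reading `capabilities.tools` on the result, rather than keeping its own per-provider map.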

Gateway-side context (already implemented)

  • Per-route provider model overrides for Groq/Cerebras.
  • Response cache with TTL and max-entries eviction for safe routes.
  • Cache-hit/usage telemetry integration.
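
For readers unfamiliar with the gateway side, the response cache behaves roughly like the sketch below. This is an illustrative approximation, not the gateway's actual code: the `TtlCache` name, the injectable clock, and the oldest-inserted-first eviction order are all assumptions made for the example.

```typescript
// Illustrative TTL cache with a max-entries bound; not the gateway implementation.
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private ttlMs: number;
  private maxEntries: number;
  private now: () => number;

  constructor(ttlMs: number, maxEntries: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.maxEntries = maxEntries;
    this.now = now; // injectable clock keeps the cache testable
  }

  get(key: string): V | undefined {
    const hit = this.entries.get(key);
    if (!hit) return undefined;
    if (hit.expiresAt <= this.now()) {
      // Expired entries are dropped lazily on read.
      this.entries.delete(key);
      return undefined;
    }
    return hit.value;
  }

  set(key: string, value: V): void {
    // Delete first so a rewritten key moves to the newest position.
    this.entries.delete(key);
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    // Map iterates in insertion order, so the first key is the oldest.
    while (this.entries.size > this.maxEntries) {
      const oldest = this.entries.keys().next().value;
      if (oldest === undefined) break;
      this.entries.delete(oldest);
    }
  }

  get size(): number {
    return this.entries.size;
  }
}
```

The relevant point for this issue is that the cache itself is provider-agnostic; only model selection and usage parsing currently need provider-specific knowledge.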

Acceptance criteria

  • A gateway consumer can ask for a provider's default model by workload intent and get a stable, provider-tuned choice.
  • Cache usage fields are normalized enough that gateway telemetry does not need provider-specific parsing.
  • Consumers no longer need hardcoded per-provider model maps for basic routing quality.
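
As a sketch of the cache-usage criterion, normalization could look something like the following. The `NormalizedUsage` shape reuses the three field names requested above; `normalizeUsage`, the `kind` tags, and `inputTokens`/`outputTokens` are hypothetical. The two raw shapes are assumptions loosely modeled on OpenAI-style and Anthropic-style usage payloads and should be checked against each provider's documentation. Mirroring cache reads into `cachedInputTokens` is also an assumption about the intended semantics.

```typescript
// Hypothetical normalized shape; only the three cache field names come from this issue.
interface NormalizedUsage {
  inputTokens: number;
  outputTokens: number;
  cachedInputTokens?: number;
  cacheReadInputTokens?: number;
  cacheWriteInputTokens?: number;
}

// Assumed raw shapes; verify against provider docs before relying on them.
interface OpenAiStyleUsage {
  prompt_tokens: number;
  completion_tokens: number;
  prompt_tokens_details?: { cached_tokens?: number };
}

interface AnthropicStyleUsage {
  input_tokens: number;
  output_tokens: number;
  cache_read_input_tokens?: number;
  cache_creation_input_tokens?: number;
}

type RawUsage =
  | { kind: "openai-style"; usage: OpenAiStyleUsage }
  | { kind: "anthropic-style"; usage: AnthropicStyleUsage };

// Maps provider-specific usage into one shape; cache fields are omitted when absent.
function normalizeUsage(raw: RawUsage): NormalizedUsage {
  switch (raw.kind) {
    case "openai-style": {
      const cached = raw.usage.prompt_tokens_details?.cached_tokens;
      return {
        inputTokens: raw.usage.prompt_tokens,
        outputTokens: raw.usage.completion_tokens,
        ...(cached !== undefined ? { cachedInputTokens: cached } : {}),
      };
    }
    case "anthropic-style": {
      const read = raw.usage.cache_read_input_tokens;
      const write = raw.usage.cache_creation_input_tokens;
      return {
        inputTokens: raw.usage.input_tokens,
        outputTokens: raw.usage.output_tokens,
        // Assumption: tokens read from cache also count as "cached input".
        ...(read !== undefined ? { cachedInputTokens: read, cacheReadInputTokens: read } : {}),
        ...(write !== undefined ? { cacheWriteInputTokens: write } : {}),
      };
    }
  }
}
```

Gateway telemetry would then read `cachedInputTokens` and friends directly, with no branching on provider.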
