feat: LLM governance — cost tracking, fallback chains, spending limits, rate limiting #2958

@vibegui

LLM Governance Features

MCP Mesh captures token counts and has pricing data per model, but lacks cost calculation, fallback routing, spending limits, and rate limiting. These are the top 4 features needed for LiteLLM-like governance over LLM spend — without integrating a separate Python proxy.

Why not integrate LiteLLM directly?

  • LiteLLM is a Python proxy server — architectural mismatch with Mesh's Bun/TS stack
  • Mesh already has native AI SDK v3 adapters with higher fidelity (reasoning tokens, multi-modal, tool calling loops)
  • Adding LiteLLM would mean running a separate Python service, doubling infra complexity
  • Building cost tracking + fallback chains natively covers most of what LiteLLM would provide here (an estimated 80% of its value) without the complexity

Feature 1: Cost Calculation & Tracking

Gap: Mesh captures inputTokens/outputTokens AND has per-model pricing data (costs.input/costs.output), but never multiplies them together.

What to build

  • Add cost_usd field to monitoring records (computed: inputTokens * inputCost + outputTokens * outputCost, with both costs normalized to USD per token)
  • New computeLlmCost() utility in apps/mesh/src/ai-providers/cost.ts
  • Emit cost in monitorLlmCall() and recordLlmCallMetrics() at both onFinish and onError callbacks
  • Add COST_USD to MONITORING_LOG_ATTR and MonitoringRow
  • New counter metric tool.execution.cost_usd
  • Cost aggregation query in monitoring-sql.ts (sum by model/user/time period)
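A minimal sketch of what computeLlmCost() could look like, following the formula above. The ModelCosts and TokenUsage shapes are hypothetical stand-ins for whatever types Mesh already has, and the sketch assumes prices are stored as USD per token; if pricing data is per million tokens, the result needs to be divided accordingly.

```typescript
// Hypothetical shapes; the real Mesh types may differ.
interface ModelCosts {
  input: number; // assumed unit: USD per input token
  output: number; // assumed unit: USD per output token
}

interface TokenUsage {
  inputTokens?: number;
  outputTokens?: number;
}

// Returns the cost in USD, or null when the model has no pricing data,
// so callers can tell "free" apart from "not priced".
function computeLlmCost(
  usage: TokenUsage,
  costs: ModelCosts | undefined,
): number | null {
  if (!costs) return null;
  // Missing counts (e.g. on the onError path) are treated as zero tokens.
  const input = usage.inputTokens ?? 0;
  const output = usage.outputTokens ?? 0;
  return input * costs.input + output * costs.output;
}
```

Returning null rather than 0 for unpriced models is a design choice in this sketch: it keeps aggregation queries from silently under-reporting spend.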

Key files

  • apps/mesh/src/monitoring/emit-llm-call.ts
  • apps/mesh/src/monitoring/schema.ts
  • apps/mesh/src/monitoring/record-llm-call-metrics.ts
  • apps/mesh/src/api/routes/decopilot/stream-core.ts (~lines 701 and 757)
  • apps/mesh/src/storage/monitoring-sql.ts

Feature 2: Model Fallback Chains

Gap: If a provider returns an error (rate limit, outage), the request fails; there is no retry with an alternative model.

What to build

  • withFallback() wrapper in apps/mesh/src/ai-providers/fallback.ts
  • On 429 or 5xx errors, automatically try next model in chain
  • Log fallback events via OTel span events
  • Integrate into OpenAI-compat endpoint (x-fallback-models header or request body extension)
  • Integrate into decopilot stream (read fallback config from agent/virtual MCP config)
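The fallback behaviour described above could be sketched roughly as follows. This assumes provider errors expose an HTTP status on a status property (the ProviderError shape is hypothetical), and the onFallback callback is an illustrative hook where an OTel span event could be recorded; neither is confirmed by the existing code.

```typescript
// Hypothetical error shape: assumes provider errors carry an HTTP status.
interface ProviderError {
  status?: number;
}

// True for the statuses named in this issue: 429 and any 5xx.
function isRetryable(err: unknown): boolean {
  const status = (err as ProviderError | null)?.status;
  return typeof status === "number" && (status === 429 || (status >= 500 && status <= 599));
}

// Tries each model in order and returns the first successful result.
// Non-retryable errors, or an error from the last model, are rethrown.
async function withFallback<T>(
  models: string[],
  call: (model: string) => Promise<T>,
  onFallback?: (from: string, to: string, err: unknown) => void,
): Promise<T> {
  if (models.length === 0) throw new Error("withFallback: empty model chain");
  for (let i = 0; i < models.length; i++) {
    try {
      return await call(models[i]);
    } catch (err) {
      const next = models[i + 1];
      if (!isRetryable(err) || next === undefined) throw err;
      // Hypothetical hook: e.g. record a span event for the fallback.
      onFallback?.(models[i], next, err);
    }
  }
  // Not reached: the loop always returns or throws.
  throw new Error("withFallback: chain exhausted");
}
```

One caveat for the streaming path: a wrapper like this only helps if the error surfaces before any chunk has been sent to the client, so the decopilot integration would likely need to wrap stream setup rather than the full stream.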

Key files

  • apps/mesh/src/api/routes/openai-compat.ts
  • apps/mesh/src/api/routes/decopilot/stream-core.ts

Feature 3: Spending Limits & Budget Enforcement

Gap: No way to cap spend per org/user/API key. Depends on Feature 1.

What to build

  • New spending_limits table (migration 062-spending-limits.ts):
    • entity_type (organization / api_key / user), entity_id, limit_usd, period (daily/weekly/monthly)
  • Storage operations: CRUD + getCurrentSpend() + checkBudget()
  • Budget enforcement middleware (Hono) that checks spend before LLM calls
  • Returns 429 with an x-budget-remaining header when the limit is exceeded
  • Fail open on DB errors (allow the request rather than block it)
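A rough sketch of how checkBudget() and the period windows could fit together. The getCurrentSpend dependency is injected here as a stand-in for the storage operation; the type names, UTC boundaries, and Monday week start are all assumptions of this sketch, not decisions made in the issue.

```typescript
type Period = "daily" | "weekly" | "monthly";

// Hypothetical in-memory view of a spending_limits row.
interface SpendingLimit {
  limitUsd: number;
  period: Period;
}

interface BudgetDecision {
  allowed: boolean;
  remainingUsd: number | null; // null when spend could not be determined
}

// Start of the current budget window in UTC (assumption: weeks start Monday).
function periodStart(period: Period, now: Date): Date {
  const y = now.getUTCFullYear();
  const m = now.getUTCMonth();
  const d = now.getUTCDate();
  if (period === "monthly") return new Date(Date.UTC(y, m, 1));
  if (period === "weekly") {
    const daysSinceMonday = (now.getUTCDay() + 6) % 7;
    // Date.UTC normalizes day values below 1 into the previous month.
    return new Date(Date.UTC(y, m, d - daysSinceMonday));
  }
  return new Date(Date.UTC(y, m, d));
}

// getCurrentSpend is assumed to sum cost_usd for the entity since `since`.
async function checkBudget(
  limit: SpendingLimit,
  getCurrentSpend: (since: Date) => Promise<number>,
  now: Date = new Date(),
): Promise<BudgetDecision> {
  try {
    const spent = await getCurrentSpend(periodStart(limit.period, now));
    return {
      allowed: spent < limit.limitUsd,
      remainingUsd: Math.max(0, limit.limitUsd - spent),
    };
  } catch {
    // Fail open: a storage error should not block LLM traffic.
    return { allowed: true, remainingUsd: null };
  }
}
```

The middleware would then map allowed: false to a 429 and copy remainingUsd into the x-budget-remaining header.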

Key files

  • apps/mesh/migrations/062-spending-limits.ts (new)
  • apps/mesh/src/storage/spending-limits.ts (new)
  • apps/mesh/src/api/middleware/budget-check.ts (new)
  • apps/mesh/src/api/app.ts

Feature 4: Rate Limiting (RPM/TPM)

Gap: Mesh relies entirely on provider rate limits; there is no org-level enforcement.

What to build

  • In-memory sliding window rate limiter (Map-based, single process)
  • rate_limits table alongside spending_limits migration
    • entity_type, entity_id, rpm_limit, tpm_limit
  • Rate limit middleware (Hono) that checks RPM before LLM calls
  • Returns 429 with Retry-After and x-ratelimit-* headers
  • TPM enforcement is post hoc: token usage is recorded after each call, and the next request is rejected if the window is over the limit
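The Map-based limiter above could be sketched as a sliding-window log, one timestamp array per entity. The class name, the key format, and the return shape are illustrative assumptions; the window defaults to 60 seconds to match RPM.

```typescript
interface RateLimitResult {
  allowed: boolean;
  remaining: number; // requests left in the current window
  retryAfterSec: number; // 0 when allowed
}

// Sliding-window log limiter; state lives in this process only.
class SlidingWindowLimiter {
  private readonly hits = new Map<string, number[]>();
  private readonly windowMs: number;

  constructor(windowMs: number = 60000) {
    this.windowMs = windowMs;
  }

  // Records the request if allowed; otherwise reports how long to wait.
  // `key` is a hypothetical entity key such as "organization:123".
  check(key: string, limit: number, now: number = Date.now()): RateLimitResult {
    const cutoff = now - this.windowMs;
    // Keep only timestamps that are still inside the window.
    const recent = (this.hits.get(key) ?? []).filter((t) => t > cutoff);
    if (recent.length >= limit) {
      this.hits.set(key, recent);
      // The oldest hit leaving the window frees the next slot.
      const oldest = recent[0] ?? now;
      const waitMs = oldest + this.windowMs - now;
      return { allowed: false, remaining: 0, retryAfterSec: Math.max(1, Math.ceil(waitMs / 1000)) };
    }
    recent.push(now);
    this.hits.set(key, recent);
    return { allowed: true, remaining: limit - recent.length, retryAfterSec: 0 };
  }
}
```

The retryAfterSec value maps directly onto the Retry-After header. Because the state is per process, limits would be enforced per instance if Mesh runs more than one replica.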

Key files

  • apps/mesh/src/api/middleware/rate-limiter.ts (new)
  • apps/mesh/src/api/middleware/rate-limit-check.ts (new)
  • apps/mesh/src/api/app.ts

Implementation Order

Phase 1 (quick wins):     Feature 1 — cost calculation
Phase 2 (reliability):    Feature 2 — model fallback chains
Phase 3 (governance):     Features 3 & 4 — spending limits + rate limiting

Middleware ordering on routes

rate-limit check → budget check → handler
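As a framework-free illustration of that ordering, the sketch below runs checks in sequence and stops at the first rejection. The Check and Rejection types are hypothetical; in Mesh these would be Hono middleware registered in the same order.

```typescript
// Hypothetical rejection shape returned by a failing check.
interface Rejection {
  status: number;
  headers: Record<string, string>;
}

// A check returns null to let the request continue.
type Check = (entityId: string) => Rejection | null;

// Runs checks in order; the first rejection short-circuits the rest,
// so later checks (and the handler) never run for a rejected request.
function runChecks(entityId: string, checks: Check[]): Rejection | null {
  for (const check of checks) {
    const rejection = check(entityId);
    if (rejection) return rejection;
  }
  return null;
}
```

Putting the rate-limit check first presumably keeps the cheap in-memory check ahead of the budget check, which needs a database read.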

🤖 Generated with Claude Code
