feat: LLM governance — cost tracking, fallback chains, spending limits, rate limiting

# LLM Governance Features

MCP Mesh already captures token counts and stores pricing data per model, but it lacks cost calculation, fallback routing, spending limits, and rate limiting. These four features are what Mesh needs for LiteLLM-like governance over LLM spend without integrating a separate Python proxy.

## Why not integrate LiteLLM directly?

- LiteLLM is a Python proxy server — architectural mismatch with Mesh's Bun/TS stack
- Mesh already has native AI SDK v3 adapters with higher fidelity (reasoning tokens, multi-modal, tool calling loops)
- Adding LiteLLM would mean running a separate Python service, doubling infra complexity
- Building cost tracking + fallback chains gives 80% of LiteLLM's value without the complexity

---

## Feature 1: Cost Calculation & Tracking

**Gap:** Mesh captures `inputTokens`/`outputTokens` AND has per-model pricing data (`costs.input`/`costs.output`), but never multiplies them together.

### What to build
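- Add `cost_usd` field to monitoring records (computed: `inputTokens * costs.input + outputTokens * costs.output`, assuming the stored prices are USD per token; scale accordingly if they are per million tokens)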
- New `computeLlmCost()` utility in `apps/mesh/src/ai-providers/cost.ts`
- Emit cost in `monitorLlmCall()` and `recordLlmCallMetrics()` at both `onFinish` and `onError` callbacks
- Add `COST_USD` to `MONITORING_LOG_ATTR` and `MonitoringRow`
- New counter metric `tool.execution.cost_usd`
- Cost aggregation query in `monitoring-sql.ts` (sum by model/user/time period)
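
A minimal sketch of what `computeLlmCost()` could look like. Only the names `inputTokens`, `outputTokens`, `costs.input`, and `costs.output` come from this issue; the `TokenUsage` and `ModelCosts` types, the per-token pricing unit, and the `null` return for unpriced models are assumptions for illustration.

```typescript
// Hypothetical shapes: the real usage and pricing types in Mesh may differ.
interface TokenUsage {
  inputTokens?: number;
  outputTokens?: number;
}

interface ModelCosts {
  input: number; // assumed unit: USD per single input token
  output: number; // assumed unit: USD per single output token
}

/**
 * Returns the cost of one LLM call in USD, or null when the model has no
 * pricing data. Missing token counts are treated as 0 so that partial usage
 * (for example from an onError callback) still yields a number.
 */
function computeLlmCost(
  usage: TokenUsage,
  costs: ModelCosts | undefined,
): number | null {
  if (!costs) return null;
  const input = usage.inputTokens ?? 0;
  const output = usage.outputTokens ?? 0;
  return input * costs.input + output * costs.output;
}
```

For example, with made-up prices of $0.000003 per input token and $0.000015 per output token, a call with 1,000 input and 500 output tokens costs about $0.0105. Returning `null` rather than `0` for unpriced models keeps "unknown cost" distinguishable from "free" in aggregation queries.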

### Key files
- `apps/mesh/src/monitoring/emit-llm-call.ts`
- `apps/mesh/src/monitoring/schema.ts`
- `apps/mesh/src/monitoring/record-llm-call-metrics.ts`
- `apps/mesh/src/api/routes/decopilot/stream-core.ts` (~lines 701 and 757)
- `apps/mesh/src/storage/monitoring-sql.ts`

---

## Feature 2: Model Fallback Chains

**Gap:** If a provider returns an error (rate limit, outage), the request fails. There is no retry with an alternative model.

### What to build
- `withFallback()` wrapper in `apps/mesh/src/ai-providers/fallback.ts`
- On 429 or 5xx errors, automatically try next model in chain
- Log fallback events via OTel span events
- Integrate into OpenAI-compat endpoint (`x-fallback-models` header or request body extension)
- Integrate into decopilot stream (read fallback config from agent/virtual MCP config)
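
A rough sketch of the `withFallback()` wrapper, to make the retry rule concrete. It assumes provider errors expose an HTTP status as `status` or `statusCode`, which may not match the real adapters; the `FallbackEvent` type and the `onFallback` hook (where an OTel span event could be recorded) are hypothetical names.

```typescript
// Assumption: provider errors carry an HTTP status as `status` or `statusCode`.
function isRetryableStatus(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const e = err as { status?: unknown; statusCode?: unknown };
  const status =
    typeof e.status === "number"
      ? e.status
      : typeof e.statusCode === "number"
        ? e.statusCode
        : undefined;
  return status !== undefined && (status === 429 || (status >= 500 && status <= 599));
}

// Hypothetical event shape passed to the logging hook.
interface FallbackEvent {
  from: string;
  to: string;
  error: unknown;
}

/** Tries each model in order, moving on only after a 429 or 5xx error. */
async function withFallback<T>(
  models: string[],
  call: (model: string) => Promise<T>,
  onFallback?: (event: FallbackEvent) => void,
): Promise<T> {
  if (models.length === 0) throw new Error("withFallback: empty model chain");
  for (const [i, model] of models.entries()) {
    try {
      return await call(model);
    } catch (err) {
      const next = models[i + 1];
      // Non-retryable errors (e.g. 400, 401) and the last model rethrow as-is.
      if (!isRetryableStatus(err) || next === undefined) throw err;
      onFallback?.({ from: model, to: next, error: err });
    }
  }
  throw new Error("withFallback: unreachable");
}
```

A caller would pass the primary model first, e.g. `withFallback([primary, ...fallbacks], (m) => callProvider(m))`, where `callProvider` stands in for the actual adapter call. For streaming responses, this pattern only works cleanly if the error surfaces before the first chunk has been sent to the client.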

### Key files
- `apps/mesh/src/api/routes/openai-compat.ts`
- `apps/mesh/src/api/routes/decopilot/stream-core.ts`

---

## Feature 3: Spending Limits & Budget Enforcement

**Gap:** No way to cap spend per org/user/API key. Depends on Feature 1.

### What to build
- New `spending_limits` table (migration `062-spending-limits.ts`):
  - `entity_type` (organization / api_key / user), `entity_id`, `limit_usd`, `period` (daily/weekly/monthly)
- Storage operations: CRUD + `getCurrentSpend()` + `checkBudget()`
- Budget enforcement middleware (Hono) that checks spend before LLM calls
- Returns 429 with `x-budget-remaining` header when exceeded
- Fail open on DB errors (if the spend lookup itself fails, let the request through rather than block LLM traffic)
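
The budget check and its fail-open behavior could be sketched as follows. The `SpendingLimit` fields mirror the proposed table columns; the injected `getCurrentSpend` callback, the `BudgetDecision` type, UTC period boundaries, and weeks starting on Monday are all assumptions, not decisions made in this issue.

```typescript
type Period = "daily" | "weekly" | "monthly";

interface SpendingLimit {
  entityType: "organization" | "api_key" | "user";
  entityId: string;
  limitUsd: number;
  period: Period;
}

// Hypothetical result type; remainingUsd would feed the x-budget-remaining header.
interface BudgetDecision {
  allowed: boolean;
  remainingUsd: number | null; // null when spend could not be determined
}

/** Start of the current budget window in UTC (assumption: weeks start on Monday). */
function periodStart(period: Period, now: Date): Date {
  const y = now.getUTCFullYear();
  const m = now.getUTCMonth();
  const d = now.getUTCDate();
  if (period === "monthly") return new Date(Date.UTC(y, m, 1));
  if (period === "daily") return new Date(Date.UTC(y, m, d));
  const daysSinceMonday = (now.getUTCDay() + 6) % 7; // getUTCDay(): 0 = Sunday
  return new Date(Date.UTC(y, m, d - daysSinceMonday));
}

async function checkBudget(
  limit: SpendingLimit,
  getCurrentSpend: (limit: SpendingLimit, since: Date) => Promise<number>,
  now: Date = new Date(),
): Promise<BudgetDecision> {
  try {
    const spent = await getCurrentSpend(limit, periodStart(limit.period, now));
    const remainingUsd = Math.max(0, limit.limitUsd - spent);
    return { allowed: spent < limit.limitUsd, remainingUsd };
  } catch {
    // Fail open: a storage error must not block LLM traffic.
    return { allowed: true, remainingUsd: null };
  }
}
```

Because the check runs before the call and the cost is only known afterwards, a single expensive request can still push spend past the limit; the limit is enforced from the next request on.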

### Key files
- `apps/mesh/migrations/062-spending-limits.ts` (new)
- `apps/mesh/src/storage/spending-limits.ts` (new)
- `apps/mesh/src/api/middleware/budget-check.ts` (new)
- `apps/mesh/src/api/app.ts`

---

## Feature 4: Rate Limiting (RPM/TPM)

**Gap:** Mesh relies entirely on provider rate limits. No org-level enforcement.

### What to build
- In-memory sliding window rate limiter (Map-based, single process)
- `rate_limits` table, created in the same migration as `spending_limits`
  - `entity_type`, `entity_id`, `rpm_limit`, `tpm_limit`
- Rate limit middleware (Hono) that checks RPM before LLM calls
- Returns 429 with `Retry-After` and `x-ratelimit-*` headers
- TPM enforcement is post hoc: token usage is recorded after each call, and the next request is rejected if the window total is over the limit
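
The RPM side of the in-memory limiter might look roughly like this. It is a sketch of a sliding-window log kept in a `Map`; the class name, the key format (e.g. `"organization:org_1"`), and the result shape are hypothetical.

```typescript
interface RateLimitResult {
  allowed: boolean;
  remaining: number;
  retryAfterSeconds: number; // 0 when allowed; would feed the Retry-After header
}

class SlidingWindowRateLimiter {
  // key -> timestamps (ms) of accepted requests, oldest first
  private readonly hits = new Map<string, number[]>();
  private readonly windowMs: number;

  constructor(windowMs: number = 60_000) {
    this.windowMs = windowMs;
  }

  check(key: string, limit: number, now: number = Date.now()): RateLimitResult {
    const cutoff = now - this.windowMs;
    // Drop timestamps that have slid out of the window.
    const recent = (this.hits.get(key) ?? []).filter((t) => t > cutoff);

    if (recent.length >= limit) {
      this.hits.set(key, recent);
      // The oldest accepted request frees a slot once it leaves the window.
      const oldest = recent[0] ?? now;
      const retryAfterSeconds = Math.max(
        1,
        Math.ceil((oldest + this.windowMs - now) / 1000),
      );
      return { allowed: false, remaining: 0, retryAfterSeconds };
    }

    recent.push(now);
    this.hits.set(key, recent);
    return { allowed: true, remaining: limit - recent.length, retryAfterSeconds: 0 };
  }
}
```

Rejected requests are not recorded, so a client that keeps retrying does not extend its own lockout. Since the state lives in one process, limits are per instance; this sketch also never evicts idle keys, which a real implementation would need to handle.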

### Key files
- `apps/mesh/src/api/middleware/rate-limiter.ts` (new)
- `apps/mesh/src/api/middleware/rate-limit-check.ts` (new)
- `apps/mesh/src/api/app.ts`

---

## Implementation Order

```
Phase 1 (quick wins):     Feature 1 — cost calculation
Phase 2 (reliability):    Feature 2 — model fallback chains
Phase 3 (governance):     Features 3 & 4 — spending limits + rate limiting
```

## Middleware ordering on routes

```
rate-limit check → budget check → handler
```
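
To illustrate the effect of this ordering, here is a framework-free sketch. It is not Hono's API: `Req`, `Res`, `Check`, and `chain` are stand-ins invented for the example. The point is that a rejection from the first check short-circuits, so the budget check never runs for a rate-limited request.

```typescript
// Stand-in types for illustration only; Mesh would use Hono's context and middleware.
interface Req {
  orgId: string;
}
interface Res {
  status: number;
  headers: Record<string, string>;
  body: string;
}
type Check = (req: Req) => Promise<Res | null>; // null means "continue"
type Handler = (req: Req) => Promise<Res>;

/** Runs checks in order and stops at the first one that returns a response. */
function chain(checks: Check[], handler: Handler): Handler {
  return async (req) => {
    for (const check of checks) {
      const rejection = await check(req);
      if (rejection) return rejection; // later checks and the handler never run
    }
    return handler(req);
  };
}
```

Used as `chain([rateLimitCheck, budgetCheck], llmHandler)`, a request rejected by the rate limiter returns its 429 immediately, without the budget check having run.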

---

🤖 Generated with [Claude Code](https://claude.com/claude-code)
