
CostPilot

LLM Cost and Latency Optimization Dashboard

An LLM usage analytics dashboard for cost, latency, routing, and optimization decisions.

Overview

CostPilot is a dashboard and middleware toolkit that tracks LLM usage, cost, latency, cache hit rate, model selection, and workflow-level spend. It serves as the financial control panel for AI systems.

Features

  • Real-time Cost Tracking: Monitor token usage and costs across all LLM providers
  • Latency Analytics: Track response times with p50, p95, p99 percentiles
  • Workflow Attribution: Attribute costs to specific workflows and features
  • Cache Hit Rate: Measure prompt caching effectiveness and savings
  • Model Comparison: Compare cost and performance across models
  • Expensive Prompt Detection: Identify costly prompts for optimization
  • Budget Alerts: Set spending thresholds and get notified
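
As a rough illustration of the latency percentiles listed above, p50/p95/p99 can be computed from recorded latencies with a nearest-rank method (a minimal sketch, not CostPilot's actual implementation):

```python
def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile of a list of latency samples (ms)."""
    ordered = sorted(samples)
    # Nearest rank: smallest index covering pct% of the samples.
    rank = max(1, round(pct / 100 * len(ordered)))
    return ordered[rank - 1]

latencies_ms = [95, 120, 250, 310, 330, 480, 610, 900, 1200, 2100]
p50 = percentile(latencies_ms, 50)   # 330
p95 = percentile(latencies_ms, 95)   # 2100
```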

Architecture

┌──────────────┐     ┌──────────────┐     ┌────────────┐
│    SDK/MW    │────▶│   FastAPI    │────▶│ PostgreSQL │
│   (Python)   │     │    Server    │     │            │
└──────────────┘     └──────┬───────┘     └────────────┘
                            │
                     ┌──────▼───────┐
                     │   Next.js    │
                     │  Dashboard   │
                     └──────────────┘

Quick Start

Using Docker Compose

cp .env.example .env
docker-compose up -d

The dashboard will be available at http://localhost:3000 and the API at http://localhost:8000.

Manual Setup

Server

cd server
pip install -r requirements.txt
uvicorn app.main:app --reload

Dashboard

cd dashboard
npm install
npm run dev

SDK

cd sdk
pip install -e .

SDK Usage

Basic Usage

from costpilot import CostPilotClient

client = CostPilotClient(
    server_url="http://localhost:8000",
    api_key="your-api-key",
    project="my-project"
)

await client.log_usage({
    "model": "gpt-4o",
    "workflow": "summarization",
    "prompt_tokens": 1500,
    "completion_tokens": 500,
    "latency_ms": 1200,
    "cached": False
})

Decorator Usage

from costpilot.decorators import track_cost, track_llm_call

@track_cost(workflow="content-generation")
@track_llm_call(model="gpt-4o")
async def generate_content(prompt: str) -> str:
    response = await openai.chat.completions.create(...)
    return response.choices[0].message.content

ASGI Middleware

from costpilot.middleware import CostPilotMiddleware

# `app` is your ASGI application (e.g. FastAPI or Starlette);
# `client` is the CostPilotClient configured above.
app.add_middleware(CostPilotMiddleware, client=client)

Token Economics

CostPilot calculates costs based on provider pricing data:

  • Input tokens: Charged per 1K tokens at the input rate
  • Output tokens: Charged per 1K tokens at the output rate
  • Cache savings: Cached tokens are tracked separately for savings calculation
  • Workflow aggregation: Costs roll up to workflow and project levels
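
The rules above reduce to a simple per-record formula; a sketch (the rates shown are illustrative, real values come from the pricing configuration):

```python
def record_cost_usd(prompt_tokens: int, completion_tokens: int,
                    input_rate_per_1k: float, output_rate_per_1k: float) -> float:
    """Cost of one usage record from per-1K-token input and output rates."""
    return (prompt_tokens / 1000) * input_rate_per_1k \
         + (completion_tokens / 1000) * output_rate_per_1k

# Hypothetical rates; real values come from the pricing config.
cost = record_cost_usd(1500, 500, input_rate_per_1k=0.005,
                       output_rate_per_1k=0.015)
# ≈ 0.015 USD
```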

Pricing Configuration

Pricing data is loaded from YAML configuration files. See config/pricing.example.yaml for the format.
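
Loaded into memory, a pricing table might look like the following (keys and rates are illustrative, not the canonical schema; see the example file for the real format):

```python
# Hypothetical in-memory pricing table keyed by provider and model,
# with per-1K-token USD rates for input and output.
PRICING = {
    "openai": {
        "gpt-4o": {"input_per_1k_usd": 0.005, "output_per_1k_usd": 0.015},
        "gpt-4o-mini": {"input_per_1k_usd": 0.00015, "output_per_1k_usd": 0.0006},
    },
}

rate = PRICING["openai"]["gpt-4o"]["input_per_1k_usd"]
```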

Supported providers:

  • OpenAI (GPT-4o, GPT-4o-mini, GPT-4 Turbo, GPT-3.5 Turbo)
  • Anthropic (Claude 3.5 Sonnet, Claude 3 Opus, Claude 3 Haiku)
  • Google (Gemini 1.5 Pro, Gemini 1.5 Flash)
  • Mistral (Mistral Large, Mistral Medium)

API Endpoints

Endpoint                             Method  Description
/api/v1/usage                        POST    Log a usage record
/api/v1/usage/batch                  POST    Log batch usage records
/api/v1/costs                        GET     Query cost data
/api/v1/costs/by-workflow            GET     Costs grouped by workflow
/api/v1/costs/by-model               GET     Costs grouped by model
/api/v1/analytics/over-time          GET     Cost trends over time
/api/v1/analytics/expensive-prompts  GET     Most expensive prompts
/api/v1/analytics/optimization       GET     Optimization suggestions
/api/v1/health                       GET     Health check
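
The batch endpoint suits buffered delivery; a sketch of client-side chunking before posting to /api/v1/usage/batch (the chunk size and buffering policy are assumptions, not documented SDK behavior):

```python
from typing import Iterator

def chunk_records(records: list[dict], size: int = 100) -> Iterator[list[dict]]:
    """Split buffered usage records into fixed-size chunks,
    each suitable for one POST to /api/v1/usage/batch."""
    for start in range(0, len(records), size):
        yield records[start:start + size]

batches = list(chunk_records([{"model": "gpt-4o"}] * 250, size=100))
# 3 batches of 100, 100, and 50 records
```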

Dashboard Pages

  • Home: Summary overview with key metrics and charts
  • Costs: Detailed cost breakdown by model, workflow, and time
  • Latency: Latency distribution and trends per model
  • Workflows: Workflow-level spend and performance metrics

Development

Running Tests

# SDK tests
cd sdk && pytest

# Server tests
cd server && pytest

# Dashboard build
cd dashboard && npm run build

Environment Variables

See .env.example for all configuration options.

Budget Alert Payloads

Budget thresholds can include webhook_url, warning_threshold_percent, and critical_threshold_percent. CostPilot records the last threshold crossed and exposes the alert payload at /api/v1/budget/status/{project}:

{
  "project": "my-project",
  "monthly_budget_usd": 1000.0,
  "current_spend_usd": 850.0,
  "percent_used": 85.0,
  "status": "warning",
  "threshold_percent": 80,
  "triggered_at": "2026-05-08T22:00:00Z"
}
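
The status field follows from the configured thresholds; a sketch of how the evaluation might work (the exact semantics are an assumption based on the payload above):

```python
def budget_status(current_spend_usd: float, monthly_budget_usd: float,
                  warning_threshold_percent: float = 80,
                  critical_threshold_percent: float = 95) -> tuple[str, float]:
    """Map spend against budget to a (status, percent_used) pair."""
    percent_used = 100 * current_spend_usd / monthly_budget_usd
    if percent_used >= critical_threshold_percent:
        status = "critical"
    elif percent_used >= warning_threshold_percent:
        status = "warning"
    else:
        status = "ok"
    return status, percent_used

status, pct = budget_status(850.0, 1000.0)
# → ("warning", 85.0), matching the example payload
```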

Outbound webhook delivery requires an approved sender integration; until one is configured, the API stores the webhook configuration and exposes the alert payload through the status endpoint.

License

MIT
