Workload-aware Kubernetes HPA scale-down — annotates pods with deletion-cost based on real-time activity metrics from a pluggable backend, so idle pods are removed first.
When Kubernetes HPA scales down a deployment, it has no awareness of which pods are actively processing work. This can lead to mid-operation pod termination, causing failures and retries.
Pod Cost Manager solves this by:
- Querying a pluggable metrics backend for real-time pod activity metrics (configurable per workload)
- Computing a deletion cost (0-1000) for each pod based on current workload
- Annotating each pod with `controller.kubernetes.io/pod-deletion-cost` so Kubernetes preferentially removes idle pods first
- Optionally managing HPA `minReplicas` to prevent scale-down below the active pod count
The pod-deletion-cost annotation is a Kubernetes-native mechanism respected by any HPA controller, including KEDA.
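As a sketch, the resulting annotation on a managed pod looks like this (pod name and cost value are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: trino-worker-abc12   # example pod name
  annotations:
    # Higher cost = Kubernetes removes this pod later during scale-down
    controller.kubernetes.io/pod-deletion-cost: "750"
```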
The result: HPA scale-down decisions become workload-aware, eliminating unnecessary failures.
```
Metrics Backend  ──>  Pod Cost Manager  ──>  Kubernetes API
 (Prometheus,             (CronJob)          (pod annotations + HPA patches)
  Datadog, etc.)
```
The manager runs as a Kubernetes CronJob (default: every minute). Each run:
- Discovers target pods via label selectors
- Queries the metrics backend for pod-level metrics in batch (one query per metric type, not per pod)
- Enriches metrics with Kubernetes pod status (age, readiness, termination state)
- Calculates a weighted cost score per pod
- Annotates each pod with `controller.kubernetes.io/pod-deletion-cost`
- (Optional) Patches HPA `minReplicas = max(active_pods + buffer, baseline)`
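The `minReplicas` computation can be sketched as follows. This is a minimal illustration, not the tool's actual code; in particular, the threshold for counting a pod as "active" is an assumption here:

```python
def desired_min_replicas(pod_costs, buffer=1, baseline=2, active_threshold=100):
    """Compute the minReplicas floor from per-pod deletion costs.

    A pod counts as 'active' when its cost exceeds the threshold
    (an illustrative cutoff); the floor is the active count plus a
    buffer, never below the baseline minimum.
    """
    active_pods = sum(1 for cost in pod_costs.values() if cost > active_threshold)
    return max(active_pods + buffer, baseline)

# Two busy workers, one idle: floor becomes 2 active + 1 buffer = 3
costs = {"worker-0": 750, "worker-1": 30, "worker-2": 420}
print(desired_min_replicas(costs))  # 3
```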
Each pod receives a score from 0 (idle, safe to remove) to 1000 (very active, protect):
| Metric | Weight | Max Points | Description |
|---|---|---|---|
| Running Splits | 10 per split | 400 | Actively executing query fragments |
| Waiting Splits | 5 per split | 300 | Queued work ready to execute |
| High-Priority Tasks (L0-L2) | 15 per task | 150 | Critical execution pipeline tasks |
| Low-Priority Tasks (L3-L4) | 10 per task | 100 | Background execution tasks |
| CPU Utilization | 0.5 per % | 50 | Current CPU usage rate |
| Output Buffers | 50 per buffer | 500 | Active result streaming buffers (critical) |
All metric names, weights, and caps are fully configurable via Helm values. The defaults are tuned for Trino distributed query workloads but can be adapted to any application.
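Using the default weights and caps above, the scoring can be sketched as a simplified reimplementation (metric key names here are illustrative, not the tool's actual identifiers):

```python
# Illustrative per-unit weights and per-metric caps, mirroring the defaults above.
WEIGHTS = {
    "running_splits": 10,
    "waiting_splits": 5,
    "tasks_high_priority": 15,
    "tasks_low_priority": 10,
    "cpu_percent": 0.5,
    "output_buffers": 50,
}
CAPS = {
    "running_splits": 400,
    "waiting_splits": 300,
    "tasks_high_priority": 150,
    "tasks_low_priority": 100,
    "cpu_percent": 50,
    "output_buffers": 500,
}

def pod_cost(metrics, max_total=1000):
    """Each metric contributes value * weight, capped per metric;
    the total is clamped to max_total."""
    score = sum(min(value * WEIGHTS[name], CAPS[name]) for name, value in metrics.items())
    return min(round(score), max_total)

# A moderately busy worker: 12 running splits, 4 waiting splits,
# 2 high- and 1 low-priority task, 60% CPU, 3 active output buffers.
busy = {
    "running_splits": 12, "waiting_splits": 4,
    "tasks_high_priority": 2, "tasks_low_priority": 1,
    "cpu_percent": 60, "output_buffers": 3,
}
print(pod_cost(busy))  # 120 + 20 + 30 + 10 + 30 + 150 = 360
```

Note how the per-metric caps keep any single metric from saturating the score: even a worker with thousands of running splits contributes at most 400 points from that metric.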
Edge cases are handled before the weighted calculation:
| Condition | Cost | Rationale |
|---|---|---|
| Terminating pod | 0 | Already being removed |
| Not-ready pod | 50 | Unhealthy, prefer removal |
| New pod (< 3 min) | 500 | Protect during startup |
| No metrics available | 500 | Assume active (safe default) |
A small random jitter (default 0-10) is added to break ties between equally-scored pods.
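The edge-case precedence and jitter can be sketched like this (a simplified model; the pod fields and `base_cost_fn` hook are assumptions for illustration):

```python
import random
from datetime import datetime, timedelta, timezone

def effective_cost(pod, metrics, base_cost_fn, new_pod_threshold=timedelta(minutes=3)):
    """Apply edge-case overrides before the weighted score.

    'pod' is an illustrative dict with 'terminating', 'ready', and
    'created' fields derived from Kubernetes pod status.
    """
    if pod.get("terminating"):
        return 0        # already being removed
    if not pod.get("ready", True):
        return 50       # unhealthy, prefer removal
    if datetime.now(timezone.utc) - pod["created"] < new_pod_threshold:
        return 500      # protect during startup
    if metrics is None:
        return 500      # no metrics: assume active (safe default)
    # Otherwise use the weighted score, plus jitter to break ties
    return base_cost_fn(metrics) + random.randint(0, 10)
```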
Output buffers represent active result streaming from workers to the coordinator. Killing a worker with active output buffers causes immediate query failure with no recovery path. The high weight (50 per buffer) and cap (500 points) ensure these workers are strongly protected.
For Trino output buffer metrics, deploy the companion trino-buffer-exporter.
```shell
# Build the image
docker build -t pod-cost-manager:latest .

# Install the chart
helm upgrade --install pod-cost-manager ./chart \
  -n trino \
  -f chart/values.yaml
```

```shell
# Check CronJob
kubectl get cronjobs -n trino

# Check recent job runs
kubectl get jobs -n trino -l app=pod-cost-manager

# View pod annotations
kubectl get pods -n trino -l component=worker -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.annotations.controller\.kubernetes\.io/pod-deletion-cost}{"\n"}{end}'
```

| Parameter | Description | Default |
|---|---|---|
| `image.repository` | Container image repository | `pod-cost-manager` |
| `image.tag` | Container image tag | `latest` |
| `image.pullPolicy` | Image pull policy | `IfNotPresent` |
| Parameter | Description | Default |
|---|---|---|
| `metricsBackend.type` | Backend type: `prometheus` | `prometheus` |
| `metricsBackend.url` | Server URL | `http://prometheus-server.prometheus.svc` |
| `metricsBackend.port` | Server port | `80` |
| `metricsBackend.timeout_seconds` | Query timeout | `10` |
| `metricsBackend.auth.type` | Auth type: `none`, `basic`, `bearer` | `none` |
| `metricsBackend.auth.existingSecret` | Secret name for auth credentials | `""` |
The architecture supports custom metrics backends. To add one (e.g., Datadog, CloudWatch), implement the `MetricsBackend` ABC in `pod-cost-manager.py` and register it in `create_metrics_backend()`.
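The exact `MetricsBackend` interface lives in `pod-cost-manager.py` and is not reproduced here; as a hedged sketch, assuming a single batch-query method, a custom backend might look like:

```python
from abc import ABC, abstractmethod

# Assumed shape of the interface -- the real ABC in pod-cost-manager.py
# may define different method names and signatures.
class MetricsBackend(ABC):
    @abstractmethod
    def query_pod_metrics(self, metric_name: str) -> dict[str, float]:
        """Return {pod_name: value} for one metric, queried in batch."""

class StaticBackend(MetricsBackend):
    """A trivial backend returning fixed values, useful for local testing."""

    def __init__(self, data: dict[str, dict[str, float]]):
        self.data = data

    def query_pod_metrics(self, metric_name: str) -> dict[str, float]:
        return self.data.get(metric_name, {})
```

A Datadog or CloudWatch backend would follow the same pattern, translating each metric name into one batched API query rather than one query per pod.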
| Parameter | Description | Default |
|---|---|---|
| `podSelector.<name>.enabled` | Enable this selector | `true` |
| `podSelector.<name>.namespace` | Kubernetes namespace | `trino` |
| `podSelector.<name>.labels` | Label selector map | `{app: trino, component: worker, release: trino}` |
You can define multiple selectors for blue/green or multi-cluster deployments. See examples/.
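A hypothetical blue/green fragment might look like this (release names are illustrative; see `examples/values-blue-green.yaml` for a real one):

```yaml
podSelector:
  blue:
    enabled: true
    namespace: trino
    labels: {app: trino, component: worker, release: trino-blue}
  green:
    enabled: false   # flip during cutover
    namespace: trino
    labels: {app: trino, component: worker, release: trino-green}
```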
| Parameter | Description | Default |
|---|---|---|
| `metricNames.runningSplits` | Running splits metric | `trino_execution_executor_TaskExecutor_RunningSplits` |
| `metricNames.waitingSplits` | Waiting splits metric | `trino_execution_executor_TaskExecutor_WaitingSplits` |
| `metricNames.runningTasksLevelPattern` | Task level pattern (use `{level}`) | `trino_execution_executor_TaskExecutor_RunningTasksLevel{level}` |
| `metricNames.cpuUsage` | CPU usage metric | `container_cpu_usage_seconds_total` |
| `metricNames.cpuContainerName` | Container name for CPU query | `trino-worker` |
| `metricNames.cpuRateWindow` | Rate window for CPU query | `15m` |
| `metricNames.activeOutputBuffers` | Active output buffers metric | `trino_worker_active_output_buffers` |
| `metricNames.outputBufferedBytes` | Output buffered bytes metric | `trino_worker_output_buffered_bytes` |
| `metricNames.taskLevels` | Number of task priority levels | `5` |
| Parameter | Description | Default |
|---|---|---|
| `edgeCaseCosts.terminatingPod` | Cost for terminating pods | `0` |
| `edgeCaseCosts.notReadyPod` | Cost for not-ready pods | `50` |
| `edgeCaseCosts.newPod` | Cost for newly created pods | `500` |
| `edgeCaseCosts.noMetricsPod` | Cost when no metrics available | `500` |
| Parameter | Description | Default |
|---|---|---|
| `podAge.newPodThresholdHours` | Hours below which a pod is "new" | `0.05` (~3 minutes) |
| Parameter | Description | Default |
|---|---|---|
| `jitter.min` | Minimum jitter value | `0` |
| `jitter.max` | Maximum jitter value | `10` |
| Parameter | Description | Default |
|---|---|---|
| `costCalculation.weights.running_splits` | Weight per running split | `10` |
| `costCalculation.weights.waiting_splits` | Weight per waiting split | `5` |
| `costCalculation.weights.running_tasks_high_priority` | Weight per L0-L2 task | `15` |
| `costCalculation.weights.running_tasks_low_priority` | Weight per L3-L4 task | `10` |
| `costCalculation.weights.cpu_utilization` | Weight per CPU % | `0.5` |
| `costCalculation.weights.output_buffers` | Weight per active buffer | `50` |
| `costCalculation.caps.running_splits` | Max points from running splits | `400` |
| `costCalculation.caps.waiting_splits` | Max points from waiting splits | `300` |
| `costCalculation.caps.running_tasks_high` | Max points from L0-L2 tasks | `150` |
| `costCalculation.caps.running_tasks_low` | Max points from L3-L4 tasks | `100` |
| `costCalculation.caps.cpu_utilization` | Max points from CPU | `50` |
| `costCalculation.caps.output_buffers` | Max points from output buffers | `500` |
| `costCalculation.caps.max_total` | Overall maximum cost | `1000` |
| Parameter | Description | Default |
|---|---|---|
| `hpaManagement.enabled` | Enable dynamic `minReplicas` management | `false` |
| `hpaManagement.buffer` | Extra pods above active count | `1` |
| `hpaManagement.baseline_minimum` | Minimum floor for `minReplicas` | `2` |
| `hpaManagement.hpa_names` | Map of selector keys to HPA names | `{}` |
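A hypothetical values fragment enabling HPA management (the selector key and HPA object name are illustrative):

```yaml
hpaManagement:
  enabled: true
  buffer: 1              # keep one spare pod above the active count
  baseline_minimum: 2    # never push minReplicas below 2
  hpa_names:
    blue: trino-worker-blue   # selector key -> HPA object name
```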
| Parameter | Description | Default |
|---|---|---|
| `schedule` | CronJob schedule | `*/1 * * * *` |
| `namespace` | Deployment namespace | `trino` |
| `dryRun` | Log without patching | `false` |
| `logging.level` | Log level | `INFO` |
| `logging.format` | Log format | `json` |
| `datadog.enabled` | Enable Datadog log annotations | `false` |
Create a Kubernetes secret with your token:
```shell
kubectl create secret generic prometheus-credentials \
  --from-literal=token=YOUR_TOKEN \
  -n trino
```

Configure in values:

```yaml
metricsBackend:
  type: "prometheus"
  url: "https://prometheus.example.com"
  port: 443
  auth:
    type: "bearer"
    existingSecret: "prometheus-credentials"
```

For basic auth:

```shell
kubectl create secret generic prometheus-credentials \
  --from-literal=password=YOUR_PASSWORD \
  -n trino
```

```yaml
metricsBackend:
  auth:
    type: "basic"
    existingSecret: "prometheus-credentials"
```

The username is configured in the config YAML; only the password is injected from the secret.
See the examples/ directory:
- values-basic.yaml - Single release
- values-blue-green.yaml - Blue/green deployment with HPA management
- values-with-auth.yaml - Prometheus with bearer token auth
- values-custom-metrics.yaml - Custom metric names and tuning
- custom-backends.py - Implementing alternative metrics backends (Datadog, CloudWatch, static/testing)
- Kubernetes 1.24+ (pod deletion cost annotation support)
- A metrics backend (Prometheus with application metrics, or implement a custom backend)
- Helm 3.x
- (Optional) trino-buffer-exporter for Trino output buffer metrics
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
Pod Cost Manager is maintained by Simon, the agentic marketing platform that combines customer data with real-world signals to orchestrate personalized, 1:1 campaigns at scale. We built this tool to manage intelligent autoscaling for our Trino query infrastructure and open-sourced it so others can benefit.
Apache License 2.0. See LICENSE for details.