chirichexe/kratos

KRATOS

Framework for Application-Aware GPU Scheduling in Kubernetes

KRATOS is a Kubernetes operator for studying application-aware GPU scheduling of CUDA workloads on heterogeneous clusters.

The framework does not replace Kubernetes or Volcano. It adds an intermediate decision layer that learns from previous executions, scores eligible nodes, and generates scheduling hints.

The current design goal is to let users describe CUDA workloads together with their scheduling requirements, such as GPU memory, compute capability, priority, replica count, and distributed constraints.

After an initial execution, the controller is expected to collect profiling information from NVIDIA Nsight Compute (for example, whether a kernel is compute-bound or memory-bound) and reuse that profile to score nodes for later runs. This reuse of past profiles is what makes the scheduling policy application-aware.
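To make the idea of profile-driven node scoring concrete, here is a minimal sketch in Go. It is not the KRATOS scoring policy: the `Bound` label, the `Node` fields, and the 0.8/0.2 weights are all assumptions chosen for illustration. It only shows how a compute-bound or memory-bound label could shift the ranking of otherwise eligible nodes.

```go
package main

import (
	"fmt"
	"sort"
)

// Bound is a hypothetical label for the bottleneck reported by a profiling run.
type Bound int

const (
	ComputeBound Bound = iota
	MemoryBound
)

// Node carries illustrative GPU characteristics; the field names are assumptions.
type Node struct {
	Name         string
	PeakTFLOPS   float64 // peak arithmetic throughput
	BandwidthGBs float64 // peak memory bandwidth in GB/s
}

// ratio normalizes a value against the best value in the pool.
func ratio(v, best float64) float64 {
	if best <= 0 {
		return 0
	}
	return v / best
}

// Score weights the characteristic that matches the workload's bottleneck
// more heavily. The weights are arbitrary and only illustrate the idea.
func Score(n Node, b Bound, bestTFLOPS, bestBandwidth float64) float64 {
	compute := ratio(n.PeakTFLOPS, bestTFLOPS)
	memory := ratio(n.BandwidthGBs, bestBandwidth)
	if b == ComputeBound {
		return 0.8*compute + 0.2*memory
	}
	return 0.2*compute + 0.8*memory
}

// Rank returns node names ordered from highest to lowest score.
func Rank(nodes []Node, b Bound) []string {
	var bestT, bestB float64
	for _, n := range nodes {
		if n.PeakTFLOPS > bestT {
			bestT = n.PeakTFLOPS
		}
		if n.BandwidthGBs > bestB {
			bestB = n.BandwidthGBs
		}
	}
	sorted := append([]Node(nil), nodes...) // copy so the input is not reordered
	sort.SliceStable(sorted, func(i, j int) bool {
		return Score(sorted[i], b, bestT, bestB) > Score(sorted[j], b, bestT, bestB)
	})
	names := make([]string, len(sorted))
	for i, n := range sorted {
		names[i] = n.Name
	}
	return names
}

func main() {
	nodes := []Node{
		{Name: "node-a", PeakTFLOPS: 30, BandwidthGBs: 500},
		{Name: "node-b", PeakTFLOPS: 15, BandwidthGBs: 1500},
	}
	fmt.Println(Rank(nodes, ComputeBound))
	fmt.Println(Rank(nodes, MemoryBound))
}
```

With these sample values, the compute-bound ranking favors the node with higher arithmetic throughput and the memory-bound ranking favors the node with higher bandwidth.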

Status

Planned integrations include:

  • Kubernetes for resource lifecycle management.
  • NVIDIA Nsight Compute and DCGM for CUDA profiling and GPU metrics.
  • Prometheus and Grafana for runtime observability.

Architecture

[Figure: KRATOS architectural diagram]

Getting Started

Clone the repository and run the local test suite:

git clone git@github.com:chirichexe/kratos.git
cd kratos
make test

The make test target generates Kubernetes manifests, regenerates deepcopy code, runs formatting and vet checks, downloads envtest binaries, and then runs the Go tests.

Install the CRD into the Kubernetes cluster selected by your current kubectl context:

make install

Run the controller locally against that cluster:

make run

In another terminal, create a sample CUDA workload:

kubectl apply -f config/samples/gpu_v1alpha1_cudaexperiment.yaml
kubectl get cudaexperiments.gpu.scheduler.io
kubectl get jobs,pods -l gpu.scheduler.io/experiment=cuda-vector-add
kubectl logs job/cuda-vector-add-execution

For a local GPU-enabled Kubernetes lab, see Local GPU Lab.

CUDAExperiment

Users describe a CUDA workload with one Kubernetes custom resource:

apiVersion: gpu.scheduler.io/v1alpha1
kind: CUDAExperiment
metadata:
  name: cuda-vector-add
spec:
  image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0
  runtimeClassName: nvidia
  replicas: 1
  gpuRequired: 1
  minimumComputeCapability: "7.0"
  minimumMemory: 4Gi
  priority: normal
  profilingEnabled: true
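
A field such as minimumComputeCapability is a "major.minor" string, so any eligibility check has to compare it numerically rather than lexically (otherwise "10.0" would sort below "9.0"). The sketch below shows one way to do that in Go. The function names are hypothetical, and this README does not state how or whether the controller performs this check.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCapability splits a "major.minor" string such as "7.0" into integers.
func parseCapability(s string) (major, minor int, err error) {
	parts := strings.SplitN(s, ".", 2)
	if len(parts) != 2 {
		return 0, 0, fmt.Errorf("invalid compute capability %q", s)
	}
	if major, err = strconv.Atoi(parts[0]); err != nil {
		return 0, 0, fmt.Errorf("invalid compute capability %q: %w", s, err)
	}
	if minor, err = strconv.Atoi(parts[1]); err != nil {
		return 0, 0, fmt.Errorf("invalid compute capability %q: %w", s, err)
	}
	return major, minor, nil
}

// MeetsCapability reports whether a node's compute capability is at least
// the requested minimum, comparing the major version first, then the minor.
func MeetsCapability(node, minimum string) (bool, error) {
	nMaj, nMin, err := parseCapability(node)
	if err != nil {
		return false, err
	}
	mMaj, mMin, err := parseCapability(minimum)
	if err != nil {
		return false, err
	}
	if nMaj != mMaj {
		return nMaj > mMaj, nil
	}
	return nMin >= mMin, nil
}

func main() {
	for _, node := range []string{"6.1", "7.0", "8.6"} {
		ok, err := MeetsCapability(node, "7.0")
		fmt.Println(node, ok, err)
	}
}
```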

The current controller creates one Kubernetes Job named <experiment-name>-execution, sets the NVIDIA GPU limit from gpuRequired, uses the configured runtime class, and records the Job name in status.
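
The mapping described above can be sketched with plain Go structs. This is an illustration, not the controller's code: `ExperimentSpec`, `JobPlan`, and `PlanJob` are hypothetical names, and a real controller would build a `batch/v1` Job object instead. The resource name `nvidia.com/gpu` is the one exposed by the NVIDIA device plugin.

```go
package main

import (
	"fmt"
	"strconv"
)

// gpuResource is the extended resource name advertised by the NVIDIA device plugin.
const gpuResource = "nvidia.com/gpu"

// ExperimentSpec mirrors a few CUDAExperiment fields (hypothetical Go shape).
type ExperimentSpec struct {
	Image            string
	RuntimeClassName string
	GPURequired      int
}

// JobPlan is a simplified stand-in for the Job the controller would create.
type JobPlan struct {
	Name             string
	Image            string
	RuntimeClassName string
	Limits           map[string]string
}

// PlanJob derives the Job name and GPU limit from the experiment.
func PlanJob(experimentName string, spec ExperimentSpec) JobPlan {
	return JobPlan{
		Name:             experimentName + "-execution",
		Image:            spec.Image,
		RuntimeClassName: spec.RuntimeClassName,
		Limits: map[string]string{
			gpuResource: strconv.Itoa(spec.GPURequired),
		},
	}
}

func main() {
	plan := PlanJob("cuda-vector-add", ExperimentSpec{
		Image:            "nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0",
		RuntimeClassName: "nvidia",
		GPURequired:      1,
	})
	fmt.Printf("%+v\n", plan)
}
```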

When profilingEnabled: true, the controller creates a profiled Job:

  • stage-workload is an initContainer that uses the experiment image and stages the CUDA executable into a shared volume.
  • profiling-runner is the controller-owned Nsight Compute container. It uses the Nsight Compute image, requests the GPU, launches the staged workload once under ncu, imports the generated .ncu-rep, and prints raw metrics in its logs.
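
The two steps performed by profiling-runner (profile once, then import the report and print raw metrics) could be expressed as the following `ncu` command lines. This is a sketch under assumptions: the `/shared` paths and the helper names are hypothetical, and the flags the controller actually passes are not documented here. The flags used (`--export`, `--force-overwrite`, `--import`, `--page raw`) are standard Nsight Compute CLI options, and `ncu` adds the `.ncu-rep` extension to the export name when it is missing.

```go
package main

import (
	"fmt"
	"strings"
)

// ProfileArgs builds the command that runs the staged executable under ncu
// and writes a report to reportBase + ".ncu-rep".
func ProfileArgs(reportBase, executable string, args ...string) []string {
	cmd := []string{"ncu", "--force-overwrite", "--export", reportBase, executable}
	return append(cmd, args...)
}

// ImportArgs builds the command that reads the report back and prints
// the raw metrics page to standard output.
func ImportArgs(reportBase string) []string {
	return []string{"ncu", "--import", reportBase + ".ncu-rep", "--page", "raw"}
}

func main() {
	// Hypothetical locations inside the shared volume.
	report := "/shared/profile"
	exe := "/shared/vectorAdd"

	fmt.Println(strings.Join(ProfileArgs(report, exe), " "))
	fmt.Println(strings.Join(ImportArgs(report), " "))
}
```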

The default profiling runner image is kratos-nsight-compute-poc:latest. Set KRATOS_NSIGHT_COMPUTE_IMAGE on the controller manager to use a registry image. For custom workload images, set spec.command[0] to the executable path inside the image. If command is omitted, the controller uses the NVIDIA sample path /cuda-samples/vectorAdd.
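
The defaulting rules in this paragraph can be summarized in a few lines of Go. This is a sketch of the described behavior, not the controller source: the function names are hypothetical, while the environment variable, default image, and default executable path are the ones stated above.

```go
package main

import (
	"fmt"
	"os"
)

const (
	runnerImageEnv      = "KRATOS_NSIGHT_COMPUTE_IMAGE"
	defaultRunnerImage  = "kratos-nsight-compute-poc:latest"
	defaultWorkloadPath = "/cuda-samples/vectorAdd"
)

// RunnerImage returns the profiling runner image, preferring the environment
// override. The lookup function is injected so the logic is easy to test.
func RunnerImage(getenv func(string) string) string {
	if v := getenv(runnerImageEnv); v != "" {
		return v
	}
	return defaultRunnerImage
}

// WorkloadExecutable returns spec.command[0] when a command is set,
// and the NVIDIA sample path otherwise.
func WorkloadExecutable(command []string) string {
	if len(command) == 0 || command[0] == "" {
		return defaultWorkloadPath
	}
	return command[0]
}

func main() {
	fmt.Println(RunnerImage(os.Getenv))
	fmt.Println(WorkloadExecutable(nil))
	fmt.Println(WorkloadExecutable([]string{"/opt/app/my-kernel", "--size", "1024"}))
}
```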

The longer-term roadmap for the operator covers profile lookup, cluster scoring, node-selection hints, Volcano submission, and profile updates after execution.

Development

Run the package tests:

make test

Regenerate Kubernetes assets after API or RBAC changes:

make manifests
make generate

Documentation

License

KRATOS is licensed under the Apache License 2.0.
