KRATOS is a Kubernetes operator for studying application-aware GPU scheduling of CUDA workloads on heterogeneous clusters.
The framework does not replace Kubernetes or Volcano. It adds an intermediate decision layer that learns from previous executions, scores eligible nodes, and generates scheduling hints.
The current design goal is to let users describe CUDA workloads together with their scheduling requirements, such as GPU memory, compute capability, priority, replica count, and distributed constraints.
After an initial execution, the controller is expected to collect profiling information from NVIDIA Nsight Compute (for example, whether a kernel is compute-bound or memory-bound) and reuse that profile to score nodes for later runs, which makes the scheduling policy application-aware.
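The profile-driven scoring described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Profile`, `Node`, `Score`, and `Rank` names, the normalised capability fields, and the 0.8/0.2 weights are hypothetical and are not taken from the KRATOS code base.

```go
package main

import (
	"fmt"
	"sort"
)

// Bottleneck is a hypothetical classification derived from a profiling run.
type Bottleneck int

const (
	ComputeBound Bottleneck = iota
	MemoryBound
)

// Profile is a hypothetical summary of one profiled execution.
type Profile struct {
	Bottleneck Bottleneck
}

// Node is a hypothetical view of a GPU node. Both scores are assumed to be
// normalised to [0, 1] relative to the best node in the cluster.
type Node struct {
	Name           string
	ComputeScore   float64 // relative peak compute throughput
	BandwidthScore float64 // relative memory bandwidth
}

// Score favours the node capability that matches the workload bottleneck.
// The 0.8/0.2 weights are arbitrary placeholders.
func Score(p Profile, n Node) float64 {
	if p.Bottleneck == MemoryBound {
		return 0.8*n.BandwidthScore + 0.2*n.ComputeScore
	}
	return 0.8*n.ComputeScore + 0.2*n.BandwidthScore
}

// Rank returns node names ordered from highest to lowest score.
// It sorts a copy so the caller's slice is left unchanged.
func Rank(p Profile, nodes []Node) []string {
	sorted := append([]Node(nil), nodes...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return Score(p, sorted[i]) > Score(p, sorted[j])
	})
	names := make([]string, len(sorted))
	for i, n := range sorted {
		names[i] = n.Name
	}
	return names
}

func main() {
	nodes := []Node{
		{Name: "node-a", ComputeScore: 1.0, BandwidthScore: 0.5},
		{Name: "node-b", ComputeScore: 0.6, BandwidthScore: 1.0},
	}
	fmt.Println(Rank(Profile{Bottleneck: MemoryBound}, nodes))
	fmt.Println(Rank(Profile{Bottleneck: ComputeBound}, nodes))
}
```

With these placeholder numbers, a memory-bound profile ranks node-b first and a compute-bound profile ranks node-a first, which is the kind of application-aware preference the design aims for.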
Planned integrations include:
- Kubernetes for resource lifecycle management.
- NVIDIA Nsight Compute and DCGM for CUDA profiling and GPU metrics.
- Prometheus and Grafana for runtime observability.
Clone the repository and run the local test suite:
git clone git@github.com:chirichexe/kratos.git
cd kratos
make test

The make test target generates Kubernetes manifests, regenerates deepcopy
code, runs formatting and vet checks, downloads envtest binaries, and then runs
the Go tests.
Install the CRD into the Kubernetes cluster selected by your current
kubectl context:
make install

Run the controller locally against that cluster:
make run

In another terminal, create a sample CUDA workload:
kubectl apply -f config/samples/gpu_v1alpha1_cudaexperiment.yaml
kubectl get cudaexperiments.gpu.scheduler.io
kubectl get jobs,pods -l gpu.scheduler.io/experiment=cuda-vector-add
kubectl logs job/cuda-vector-add-execution

For a local GPU-enabled Kubernetes lab, see Local GPU Lab.
Users describe a CUDA workload with one Kubernetes custom resource:
apiVersion: gpu.scheduler.io/v1alpha1
kind: CUDAExperiment
metadata:
  name: cuda-vector-add
spec:
  image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0
  runtimeClassName: nvidia
  replicas: 1
  gpuRequired: 1
  minimumComputeCapability: "7.0"
  minimumMemory: 4Gi
  priority: normal
  profilingEnabled: true

The current controller creates one Kubernetes Job named
<experiment-name>-execution, sets the NVIDIA GPU limit from gpuRequired,
uses the configured runtime class, and records the Job name in status.
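The mapping from the custom resource to the Job can be sketched as follows. This is a simplified illustration using only the standard library: `CUDAExperimentSpec` copies a few fields from the sample manifest as plain Go types, and `JobPlan` and `PlanJob` are hypothetical stand-ins for the real batch/v1 Job construction, which is not shown in this README. The only names taken from outside the sketch are the `-execution` suffix described above and `nvidia.com/gpu`, the resource name exposed by the NVIDIA device plugin.

```go
package main

import (
	"fmt"
	"strconv"
)

// CUDAExperimentSpec holds a subset of the fields from the sample manifest.
// Plain Go types are used for illustration; the real API types may differ.
type CUDAExperimentSpec struct {
	Image            string
	RuntimeClassName string
	GPURequired      int64
}

// JobPlan is a hypothetical, simplified stand-in for the Job the controller creates.
type JobPlan struct {
	Name             string
	Image            string
	RuntimeClassName string
	Limits           map[string]string
}

// gpuResource is the extended resource name exposed by the NVIDIA device plugin.
const gpuResource = "nvidia.com/gpu"

// PlanJob derives the Job name and GPU limit from the experiment.
func PlanJob(experimentName string, spec CUDAExperimentSpec) JobPlan {
	return JobPlan{
		Name:             experimentName + "-execution",
		Image:            spec.Image,
		RuntimeClassName: spec.RuntimeClassName,
		Limits: map[string]string{
			gpuResource: strconv.FormatInt(spec.GPURequired, 10),
		},
	}
}

func main() {
	plan := PlanJob("cuda-vector-add", CUDAExperimentSpec{
		Image:            "nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0",
		RuntimeClassName: "nvidia",
		GPURequired:      1,
	})
	fmt.Printf("%+v\n", plan)
}
```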
When profilingEnabled: true, the controller creates a profiled Job:
- stage-workload is an initContainer that uses the experiment image and stages the CUDA executable into a shared volume.
- profiling-runner is the controller-owned Nsight Compute container. It uses the Nsight Compute image, requests the GPU, launches the staged workload once under ncu, imports the generated .ncu-rep, and prints raw metrics in its logs.
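The two ncu steps performed by the profiling runner (profile once, then import the report and print raw metrics) might look like the command lines built below. This is a sketch under assumptions: the helper names `ProfileArgs` and `ImportArgs` and the `/shared/...` paths are hypothetical, and the exact flags KRATOS passes are not stated here. The flags used (`--export`, `--force-overwrite`, `--import`, `--page raw`) are standard Nsight Compute CLI options.

```go
package main

import (
	"fmt"
	"strings"
)

// ProfileArgs builds an ncu invocation that runs the executable once and
// writes a report. ncu adds the ".ncu-rep" extension to the export name
// when it is missing.
func ProfileArgs(reportBase, executable string, args ...string) []string {
	cmd := []string{"ncu", "--force-overwrite", "--export", reportBase, executable}
	return append(cmd, args...)
}

// ImportArgs builds an ncu invocation that reads a saved report and prints
// its raw metrics page to standard output.
func ImportArgs(reportBase string) []string {
	return []string{"ncu", "--import", reportBase + ".ncu-rep", "--page", "raw"}
}

func main() {
	// Hypothetical locations inside the shared volume.
	report := "/shared/profile"
	exe := "/shared/vectorAdd"

	fmt.Println(strings.Join(ProfileArgs(report, exe), " "))
	fmt.Println(strings.Join(ImportArgs(report), " "))
}
```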
The default profiling runner image is kratos-nsight-compute-poc:latest. Set
KRATOS_NSIGHT_COMPUTE_IMAGE on the controller manager to use a registry image.
For custom workload images, set spec.command[0] to the executable path inside
the image. If command is omitted, the controller uses the NVIDIA sample path
/cuda-samples/vectorAdd.
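The two defaults described above (the profiling runner image and the executable path) amount to simple fallbacks. The sketch below shows one way to express them; the function names `ProfilerImage` and `Executable` are hypothetical, and the registry image in `main` is a made-up example. Only the environment variable name and the two default values come from the text above.

```go
package main

import (
	"fmt"
	"os"
)

const (
	profilerImageEnv     = "KRATOS_NSIGHT_COMPUTE_IMAGE"
	defaultProfilerImage = "kratos-nsight-compute-poc:latest"
	defaultExecutable    = "/cuda-samples/vectorAdd"
)

// ProfilerImage returns the image from the environment when it is set,
// and the default proof-of-concept image otherwise.
func ProfilerImage() string {
	if v := os.Getenv(profilerImageEnv); v != "" {
		return v
	}
	return defaultProfilerImage
}

// Executable returns command[0] when a command is given, and the NVIDIA
// sample path otherwise.
func Executable(command []string) string {
	if len(command) > 0 && command[0] != "" {
		return command[0]
	}
	return defaultExecutable
}

func main() {
	fmt.Println(ProfilerImage())
	fmt.Println(Executable(nil))

	// Hypothetical registry image and custom executable path.
	os.Setenv(profilerImageEnv, "registry.example.com/kratos/nsight-compute:1.0")
	fmt.Println(ProfilerImage())
	fmt.Println(Executable([]string{"/opt/app/bin/solver", "--steps", "100"}))
}
```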
The longer-term operator roadmap covers profile lookup, cluster scoring, node-selection hints, Volcano submission, and profile updates after execution.
Run the package tests:
make test

Regenerate Kubernetes assets after API or RBAC changes:
make manifests
make generate

- Published documentation
- Project structure
- Getting started
- Architecture
- Operator lifecycle
- Observability
- Experiment notes
KRATOS is licensed under the Apache License 2.0.
