
Make sure gpu-operator works with older GPUs + ubuntu24 + k3s #46

Description

@danielpodwysocki

Our current version of gpu-operator defaults to templating in driver images that do not exist when used with older GPUs and Ubuntu 24.04:

Warning  Failed     11m (x5 over 14m)     kubelet            Failed to pull image "nvcr.io/nvidia/driver:550.127.05-ubuntu24.04": rpc error: code = NotFound desc = failed to pull and unpack image "nvcr.io/nvidia/driver:550.127.05-ubuntu24.04": failed to resolve reference "nvcr.io/nvidia/driver:550.127.05-ubuntu24.04": nvcr.io/nvidia/driver:550.127.05-ubuntu24.04: not found

This results in the state machine getting stuck: the driver daemonset sits in ImagePullBackOff, and the pods that wait on it never leave Init:

gpu-feature-discovery-fl9rq                                   0/2     Init:0/3           0          14m
gpu-operator-788c6bf9fb-d2rjj                                 1/1     Running            0          15m
gpu-operator-node-feature-discovery-gc-7f6fbc9775-4xtw6       1/1     Running            0          15m
gpu-operator-node-feature-discovery-master-6ccd579c8c-8djhp   1/1     Running            0          15m
gpu-operator-node-feature-discovery-worker-kw8pm              1/1     Running            0          15m
nvidia-container-toolkit-daemonset-9ncqq                      0/1     Init:0/1           0          14m
nvidia-dcgm-exporter-dtld7                                    0/1     Init:0/1           0          14m
nvidia-device-plugin-daemonset-96sfj                          0/2     Init:0/2           0          14m
nvidia-driver-daemonset-gkn4s                                 0/1     ImagePullBackOff   0          15m
nvidia-operator-validator-545q8                               0/1     Init:0/4           0          14m
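
To separate the pod that is actually failing from the ones that are only waiting on it, a listing like the one above can be bucketed by its STATUS column. This is a minimal sketch, assuming the usual five-column `kubectl get pods` layout (NAME, READY, STATUS, RESTARTS, AGE); the function name `classify_pods` is hypothetical and not part of any tool.

```python
def classify_pods(listing: str) -> dict[str, list[str]]:
    """Group pod names into coarse buckets based on the STATUS column."""
    buckets: dict[str, list[str]] = {
        "running": [],
        "image_error": [],
        "waiting_init": [],
        "other": [],
    }
    for line in listing.strip().splitlines():
        fields = line.split()
        # Skip blank or truncated lines and the optional header row.
        if len(fields) < 5 or fields[0] == "NAME":
            continue
        name, status = fields[0], fields[2]
        if status == "Running":
            buckets["running"].append(name)
        elif "ImagePull" in status:
            # Matches ImagePullBackOff, ErrImagePull and their Init: variants.
            buckets["image_error"].append(name)
        elif status.startswith("Init:"):
            buckets["waiting_init"].append(name)
        else:
            buckets["other"].append(name)
    return buckets


# Three rows copied from the listing in this issue.
SAMPLE = """
gpu-operator-788c6bf9fb-d2rjj                                 1/1     Running            0          15m
nvidia-container-toolkit-daemonset-9ncqq                      0/1     Init:0/1           0          14m
nvidia-driver-daemonset-gkn4s                                 0/1     ImagePullBackOff   0          15m
"""

result = classify_pods(SAMPLE)
```

With the sample rows, only `nvidia-driver-daemonset-gkn4s` lands in `image_error`, which points at the driver image as the thing to fix first.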

Bumping gpu-operator to the latest version gets valid images templated. However, for k3s deployments we still need to make sure the CONTAINERD_ envs are set to the custom k3s paths (see https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/getting-started.html).
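
As a rough sketch of what setting those envs could look like, the snippet below builds a Helm values override for the container toolkit. The two paths are assumptions based on typical k3s defaults, and the `toolkit.env` key is assumed to match the chart's values layout; both should be checked against the linked NVIDIA page and the node itself. The helper name `k3s_toolkit_values` is hypothetical.

```python
import json

# Assumed k3s locations; verify on the node and against the linked NVIDIA docs.
K3S_CONTAINERD_CONFIG = "/var/lib/rancher/k3s/agent/etc/containerd/config.toml"
K3S_CONTAINERD_SOCKET = "/run/k3s/containerd/containerd.sock"


def k3s_toolkit_values(
    config: str = K3S_CONTAINERD_CONFIG,
    socket: str = K3S_CONTAINERD_SOCKET,
) -> dict:
    """Return a values override that points the toolkit at k3s's containerd."""
    return {
        "toolkit": {
            "env": [
                {"name": "CONTAINERD_CONFIG", "value": config},
                {"name": "CONTAINERD_SOCKET", "value": socket},
            ]
        }
    }


if __name__ == "__main__":
    # JSON is also valid YAML, so this output can be saved to a values file.
    print(json.dumps(k3s_toolkit_values(), indent=2))
```

Saving the printed output to a file and passing it to Helm with `-f` is one way to keep the override in version control instead of relying on a long list of `--set` flags.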
