
No nvidia-container-runtime on Fedora 42? #57

@jlost


I am encountering issues running nvkind on Fedora 42 with the latest Docker and NVIDIA driver 580.95.05 (as reported by nvidia-smi).
These are the latest drivers and nvidia-container-toolkit package provided by RPM Fusion.

```
❯ nvidia-ctk --version
NVIDIA Container Toolkit CLI version 1.17.4
❯ uname -a
Linux hostname 6.16.12-200.fc42.x86_64
```

Following the README instructions, the first problem I encounter is the step that tells me to run `sudo nvidia-ctk runtime configure --runtime=docker --set-as-default --cdi.enabled`. This produces an /etc/docker/daemon.json that sets the default runtime to nvidia. Running a containerized nvidia-smi with the --gpus or --device flags then fails because the nvidia-container-runtime binary is missing. It is not provided by the latest nvidia-container-toolkit package, presumably because this runtime implementation has been retired in favor of CDI.
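One way to confirm this mismatch on the host is to check whether the runtime that daemon.json names as the default actually resolves to an executable. The sketch below assumes the layout `nvidia-ctk runtime configure` is expected to write (a "default-runtime" key plus a "runtimes" map whose entries carry a "path"); `check_default_runtime` is a hypothetical helper, not part of nvidia-ctk or Docker.

```python
import json
import shutil
from typing import Callable, Optional


def check_default_runtime(
    daemon_json_text: str,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> str:
    """Report whether Docker's configured default runtime has a usable binary.

    Hypothetical diagnostic. Assumes daemon.json has a "default-runtime" key
    and a "runtimes" map of {name: {"path": ..., "args": [...]}} entries.
    """
    config = json.loads(daemon_json_text)
    # Docker uses runc when no default runtime is configured.
    default = config.get("default-runtime", "runc")
    if default == "runc":
        return "default runtime is runc; nothing to check"
    entry = config.get("runtimes", {}).get(default)
    if entry is None:
        return f"default runtime {default!r} is not defined under 'runtimes'"
    path = entry.get("path", "")
    # Look the configured path up on PATH (or check it directly if absolute).
    resolved = which(path) if path else None
    if resolved is None:
        return f"default runtime {default!r} points at {path!r}, which was not found"
    return f"default runtime {default!r} resolves to {resolved}"
```

On the host this could be called as `check_default_runtime(open("/etc/docker/daemon.json").read())`; the `which` parameter exists only so the lookup can be swapped out when testing.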

If I remove the changes to daemon.json and use CDI (--device) with the default runtime, I can run a containerized nvidia-smi and move on to creating an nvkind cluster. That's when I run into another error:

```
> nvkind cluster create
[...]
time="2025-10-24T13:35:24Z" level=info msg="Using config version 3"
time="2025-10-24T13:35:24Z" level=info msg="Using CRI runtime plugin name \"io.containerd.cri.v1.runtime\""
time="2025-10-24T13:35:24Z" level=info msg="Wrote updated config to /etc/containerd/config.d/99-nvidia.toml"
time="2025-10-24T13:35:24Z" level=info msg="It is recommended that containerd daemon be restarted."
umount: /proc/driver/nvidia: not mounted
F1024 09:35:24.918249 3194125 main.go:45] Error: patching /proc/driver/nvidia on node 'nvkind-xjqfc-worker': running script on nvkind-xjqfc-worker: executing command: exit status 1
```

Inside the worker node, /proc/driver/nvidia is populated, but nothing is mounted directly at that path. Having a mount there seems to be an implementation detail specific to nvidia-container-runtime.
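The `umount: ... not mounted` failure above is what happens when a path is unmounted without first checking that it is a mount point. As a minimal sketch, assuming the check is done by reading /proc/self/mountinfo inside the node, a guard could look like the following. Both functions are hypothetical and are not nvkind's actual patch script; whether skipping the unmount is enough for nvkind to work under CDI is not established here.

```python
import re
import subprocess


def is_mount_point(target: str, mountinfo_text: str) -> bool:
    """Return True if `target` is listed as a mount point in mountinfo text.

    The 5th whitespace-separated field of each /proc/self/mountinfo line is
    the mount point; special characters in it are octal-escaped (e.g. \\040).
    """
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue
        # Undo the kernel's octal escaping, e.g. "\040" -> " ".
        mount_point = re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), fields[4])
        if mount_point == target:
            return True
    return False


def unmount_if_mounted(target: str, mountinfo_text: str, run=subprocess.run) -> bool:
    """Hypothetical guard: only call umount when `target` is really a mount point."""
    if not is_mount_point(target, mountinfo_text):
        return False  # e.g. the CDI case: populated directory, but no mount here
    run(["umount", target], check=True)
    return True
```

In the CDI setup described above, `is_mount_point("/proc/driver/nvidia", open("/proc/self/mountinfo").read())` would be expected to return False inside the worker, which is consistent with the `umount` error.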

Am I overlooking some option that makes nvkind work with CDI rather than nvidia-container-runtime? Or is the expectation that I install an older NVIDIA Container Toolkit (NCT) version from another source? That seems odd, since nvidia-container-runtime appears to be on its way out.
