Proposal: Use a docker image for the worker nodes #22

@p11o

Instead of running a setup script on each node at startup, it may be better to bake that setup into a Docker image for the worker nodes. This should give quicker startup and more robust versioning.

Here is a basic draft. I can open a PR for this if it makes sense.

FROM kindest/node:v1.31.4

# Add the NVIDIA container toolkit apt repository (with a signed-by keyring),
# install the toolkit, then configure CDI and make the nvidia runtime the
# containerd default.
RUN apt-get update && \
    apt-get install -y gpg && \
    curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
        gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg && \
    curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
        sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
        tee /etc/apt/sources.list.d/nvidia-container-toolkit.list && \
    apt-get update && \
    apt-get install -y nvidia-container-toolkit && \
    nvidia-ctk config --set nvidia-container-runtime.modes.cdi.annotation-prefixes=nvidia.cdi.k8s.io/ && \
    nvidia-ctk runtime configure --runtime=containerd --set-as-default --cdi.enabled

# Wrapper that fixes up /proc/driver/nvidia before handing off to the kind entrypoint
COPY entrypoint /entrypoint

ENTRYPOINT [ "/entrypoint", "/sbin/init" ]
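
For anyone reviewing the `sed` step in the `RUN` line above, here is a small sketch that can be run without network access. It applies the same substitution to a single sample repository line. The sample line is an assumption about what the downloaded list file looks like; the real file is fetched from nvidia.github.io at build time and may differ.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumed sample line, not fetched from the real list file.
# Single quotes keep $(ARCH) literal instead of running a command substitution.
sample='deb https://nvidia.github.io/libnvidia-container/stable/deb/$(ARCH) /'
keyring=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg

# Same substitution as in the Dockerfile: insert a [signed-by=...] option
# after "deb" so apt only trusts this repository with the NVIDIA keyring.
rewritten=$(printf '%s\n' "$sample" | sed "s#deb https://#deb [signed-by=${keyring}] https://#g")

printf '%s\n' "$rewritten"
```

The `#` delimiter is what lets the keyring path, which contains `/`, appear in the replacement without escaping.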

entrypoint

#!/usr/bin/env bash

# Unmount the masked /proc/driver/nvidia to allow
# dynamically generated MIG devices to be discovered
umount -R /proc/driver/nvidia

# Make it so that calls into nvidia-smi / libnvidia-ml.so do not
# attempt to recreate nvidia device nodes or reset their permissions if
# tampered with
# (absolute paths so the result does not depend on the working directory)
cp /proc/driver/nvidia/params /root/gpu-params
sed -i 's/^ModifyDeviceFiles: 1$/ModifyDeviceFiles: 0/' /root/gpu-params
mount --bind /root/gpu-params /proc/driver/nvidia/params

exec /usr/local/bin/entrypoint "$@"
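
The `ModifyDeviceFiles` edit in the entrypoint can be checked on a machine without a GPU. The sketch below runs the same `sed` expression against a scratch file. The file contents are an assumed sample rather than a copy of a real `/proc/driver/nvidia/params`, and `sed -i` is used in its GNU form, as on the Debian-based node image.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Scratch file standing in for /proc/driver/nvidia/params (assumed sample contents).
params=$(mktemp)
printf '%s\n' 'ModifyDeviceFiles: 1' 'DeviceFileUID: 0' 'DeviceFileGID: 0' > "$params"

# Same expression as the entrypoint: flip the flag only when the line matches exactly.
sed -i 's/^ModifyDeviceFiles: 1$/ModifyDeviceFiles: 0/' "$params"

# Capture the edited line, then clean up the scratch file.
result=$(grep '^ModifyDeviceFiles' "$params")
rm -f "$params"

echo "$result"
```

Because the pattern is anchored with `^` and `$`, a file that already has the flag set to 0 is left unchanged.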
