- About
- Prerequisites
- Building the IX device plugin
- Configuring the IX device plugin
- Enabling GPU Support in Kubernetes
- Running GPU Jobs
- Split GPU Board to Multiple GPU Devices
- Shared Access to GPUs
## About

The IX device plugin for Kubernetes is a DaemonSet that allows you to automatically:

- Expose the number of GPUs on each node of your cluster
- Keep track of the health of your GPUs
- Run GPU-enabled containers in your Kubernetes cluster
## Prerequisites

The list of prerequisites for running the IX device plugin is described below:

- Iluvatar driver and software stack >= v1.1.0
- Kubernetes version >= 1.10
## Building the IX device plugin

```shell
make all
```

This will build the ix-device-plugin binary and the ix-device-plugin image; see the logs for more details.
## Configuring the IX device plugin

The IX device plugin has a number of options that can be configured:

```yaml
# check ix-device-plugin.yaml
apiVersion: v1
kind: ConfigMap
data:
  ix-config: |-
    resourceName: "iluvatar.com/gpu"
    flags:
      splitboard: false
      usevolcano: false
      reset_gpu: false
```

| Field | Type | Description |
|---|---|---|
| `flags.splitboard` | boolean | Split the GPU devices on every board (e.g. BI-V150) if set to true |
| `flags.usevolcano` | boolean | Enable Volcano integration (use ix-device-plugin with ix-volcano-plugin) |
| `flags.reset_gpu` | boolean | Enable GPU reset |
| Parameter | Default | Description |
|---|---|---|
| `image.repository` | `ix-device-plugin` | Image repository |
| `image.tag` | `<tag>` | Image tag |
| `image.pullPolicy` | `IfNotPresent` | Image pull policy |
| `ixConfig.flags.splitboard` | `false` | Enable splitboard mode |
| `ixConfig.flags.usevolcano` | `false` | Enable Volcano integration |
| `ixConfig.flags.reset_gpu` | `false` | Enable GPU reset functionality |
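The parameters in the table above can also be collected in a values file instead of being passed individually on the command line. A hypothetical `my-values.yaml` (the registry name is a placeholder; the key layout follows the table above):

```yaml
# my-values.yaml -- hypothetical Helm override file
image:
  repository: registry.local/ix-device-plugin   # placeholder registry
  tag: "4.3.0"
  pullPolicy: IfNotPresent
ixConfig:
  flags:
    splitboard: false
    usevolcano: false
    reset_gpu: false
```

It can then be passed to Helm with `helm install ix-device-plugin ix-device-plugin-4.3.0.tgz -f my-values.yaml -n kube-system`.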
```shell
helm install ix-device-plugin ix-device-plugin-4.3.0.tgz \
  --set image.repository=registry.local/ix-device-plugin \
  --set image.tag=test \
  --set image.pullPolicy=Always \
  -n kube-system
```

You can install the ix-device-plugin chart in two modes: with the Volcano plugin enabled or without Volcano.
Enable the usevolcano flag:

```shell
helm install ix-device-plugin ix-device-plugin-4.3.0.tgz \
  --set ixConfig.flags.usevolcano=true \
  -n kube-system
```

## Enabling GPU Support in Kubernetes

Once you have configured the options above on all the GPU nodes in your cluster, you can enable GPU support by deploying the following DaemonSet:
```yaml
# ix-device-plugin.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: iluvatar-device-plugin
  namespace: kube-system
  labels:
    app.kubernetes.io/name: iluvatar-device-plugin
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: iluvatar-device-plugin
  template:
    metadata:
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ""
      labels:
        app.kubernetes.io/name: iluvatar-device-plugin
    spec:
      priorityClassName: "system-node-critical"
      securityContext: null
      containers:
        - name: iluvatar-device-plugin
          securityContext:
            capabilities:
              drop:
                - ALL
            privileged: true
          image: "ix-device-plugin:4.3.0"
          imagePullPolicy: IfNotPresent
          livenessProbe:
            exec:
              command:
                - ls
                - /var/lib/kubelet/device-plugins/iluvatar-gpu.sock
            periodSeconds: 5
          startupProbe:
            exec:
              command:
                - ls
                - /var/lib/kubelet/device-plugins/iluvatar-gpu.sock
            periodSeconds: 5
          resources: {}
          volumeMounts:
            - mountPath: /var/lib/kubelet/device-plugins
              name: device-plugin
            - mountPath: /run/udev
              name: udev-ctl
              readOnly: true
            - mountPath: /sys
              name: sys
              readOnly: true
            - mountPath: /dev
              name: dev
            - name: ixc
              mountPath: /ixconfig
      volumes:
        - hostPath:
            path: /var/lib/kubelet/device-plugins
          name: device-plugin
        - hostPath:
            path: /run/udev
          name: udev-ctl
        - hostPath:
            path: /sys
          name: sys
        - hostPath:
            path: /etc/udev/
          name: udev-etc
        - hostPath:
            path: /dev
          name: dev
        - name: ixc
          configMap:
            name: ix-config
```

```shell
kubectl create -f ix-device-plugin.yaml
```

## Running GPU Jobs

A GPU can be exposed to a pod by adding `iluvatar.com/gpu` to the pod definition, and you can restrict the GPU resources by adding `resources.limits` to the pod definition.
```shell
$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: corex-example
spec:
  containers:
    - name: corex-example
      image: corex:4.0.0
      command: ["/usr/local/corex/bin/ixsmi"]
      args: ["-l"]
      resources:
        limits:
          iluvatar.com/gpu: 1 # requesting 1 GPU
EOF
```

Check the output:

```shell
kubectl logs corex-example
```
```
+-----------------------------------------------------------------------------+
| IX-ML: <version>    Driver Version: <version>    CUDA Version: <version>    |
|-------------------------------+----------------------+----------------------|
| GPU  Name                     | Bus-Id               | Clock-SM  Clock-Mem  |
| Fan  Temp  Perf  Pwr:Usage/Cap| Memory-Usage         | GPU-Util  Compute M. |
|===============================+======================+======================|
| 0    Iluvatar BI-V150S        | 00000000:8A:00.0     | 500MHz    1600MHz    |
| 0%   33C   P0    N/A / N/A    | 114MiB / 32768MiB    | 0%        Default    |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU        PID      Process name                                Usage(MiB) |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
```

## Split GPU Board to Multiple GPU Devices

The IX device plugin allows splitting one GPU board into multiple GPU devices through a set of extended options in its configuration file. The extended options for splitting a board can be seen below:
```yaml
flags:
  splitboard: false
```

That is, a boolean flag, `flags.splitboard`, can now be specified. If this flag is set to true, the plugin will split the GPU board into multiple GPUs, and kubelet will advertise multiple `iluvatar.com/gpu` resources to Kubernetes instead of one per GPU board. Otherwise, the plugin advertises only one `iluvatar.com/gpu` resource per GPU board.
For example:

```yaml
flags:
  splitboard: true
```

If this configuration were applied to a node with one GPU board (e.g. a BI-V150, which has 2 GPU chips on it), the plugin would now advertise 2 `iluvatar.com/gpu` resources to Kubernetes instead of 1.
```shell
$ kubectl describe node
...
Capacity:
  iluvatar.com/gpu: 2
...
```
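With splitboard enabled on such a node, a pod can request both chips of the board through the same `iluvatar.com/gpu` resource. A hypothetical example (the pod name is a placeholder; the image and command follow the earlier example):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: corex-split-example   # hypothetical name
spec:
  containers:
    - name: corex-split-example
      image: corex:4.0.0
      command: ["/usr/local/corex/bin/ixsmi"]
      args: ["-l"]
      resources:
        limits:
          iluvatar.com/gpu: 2 # both chips of the split board
```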
## Shared Access to GPUs

The IX device plugin allows oversubscription of GPUs through a set of extended options in its configuration file. The extended options for sharing using time-slicing can be seen below:

```yaml
sharing:
  timeSlicing:
    replicas: <num-replicas>
    ...
```

That is, a number of replicas, `sharing.timeSlicing.replicas`, can now be specified. These replicas represent the number of shared accesses that will be granted to a GPU.
For example:

```yaml
flags:
  splitboard: false
sharing:
  timeSlicing:
    replicas: 4
```

If this configuration were applied to a node with 2 GPUs on it, the plugin would now advertise 8 `iluvatar.com/gpu` resources to Kubernetes instead of 2.
```shell
$ kubectl describe node
...
Capacity:
  iluvatar.com/gpu: 8
...
```
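The capacity arithmetic in the two sections above (board splitting, then time-slicing replicas) can be sketched as a small function. This is an illustration of the advertised-resource math only, not the plugin's actual code, and it assumes the two options compose multiplicatively:

```python
def advertised_gpus(boards: int, chips_per_board: int,
                    splitboard: bool, replicas: int = 1) -> int:
    """Number of iluvatar.com/gpu resources kubelet would advertise.

    Illustration only: assumes time-slicing replicas multiply each
    advertised device, per the examples above.
    """
    devices = boards * chips_per_board if splitboard else boards
    return devices * replicas

# One BI-V150 board (2 chips) with splitboard enabled -> 2 resources
print(advertised_gpus(boards=1, chips_per_board=2, splitboard=True))
# Two GPUs with 4 time-slicing replicas -> 8 resources
print(advertised_gpus(boards=2, chips_per_board=1, splitboard=False, replicas=4))
```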