Deep-Spark/ix-device-plugin

IX device plugin for Kubernetes

About

The IX device plugin for Kubernetes is a DaemonSet that allows you to automatically:

  • Expose the number of GPUs on each node of your cluster
  • Keep track of the health of your GPUs
  • Run GPU-enabled containers in your Kubernetes cluster

Prerequisites

Running the IX device plugin requires:

  • Iluvatar driver and software stack >= v1.1.0
  • Kubernetes version >= 1.10
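
The Kubernetes prerequisite can be checked from a script with a version comparison. The sketch below is one way to do it, assuming a `sort` that supports `-V` (for example GNU coreutils). The helper name `version_ge` is hypothetical and not part of the plugin.

```shell
# Hypothetical helper (not part of ix-device-plugin): succeeds when
# version $1 is greater than or equal to version $2.
# Assumes a `sort` that supports -V (version sort), e.g. GNU coreutils.
version_ge() {
  have="${1#v}"   # strip a leading "v", as in "v1.28.3"
  want="${2#v}"
  lowest="$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)"
  [ "$lowest" = "$want" ]
}

# Example: compare a sample version string against the 1.10 minimum.
if version_ge "v1.28.3" "1.10"; then
  echo "Kubernetes version OK"
else
  echo "Kubernetes version too old"
fi
```

In practice the first argument would be the server version reported by `kubectl version`.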

Building the IX device plugin

make all

This builds the ix-device-plugin binary and the ix-device-plugin image. See the build output for details.

Configuring the IX device plugin

The IX device plugin reads its options from a ConfigMap:

# check ix-device-plugin.yaml
apiVersion: v1
kind: ConfigMap
data:
  ix-config: |-
    resourceName: "iluvatar.com/gpu"
    flags:
      splitboard: false
      usevolcano: false
      reset_gpu: false

Field             Type     Description
flags.splitboard  boolean  Split every GPU board (e.g. BI-V150) into its individual GPU devices when true
flags.usevolcano  boolean  Enable Volcano integration (use ix-device-plugin together with ix-volcano-plugin)
flags.reset_gpu   boolean  Enable GPU reset

Helm Install

Values

Parameter                  Default           Description
image.repository           ix-device-plugin  Image repository
image.tag                  <tag>             Image tag
image.pullPolicy           IfNotPresent      Image pull policy
ixConfig.flags.splitboard  false             Enable splitboard mode
ixConfig.flags.usevolcano  false             Enable Volcano integration
ixConfig.flags.reset_gpu   false             Enable GPU reset functionality

Example

Install with Custom Image

helm install ix-device-plugin ix-device-plugin-4.3.0.tgz \
  --set image.repository=registry.local/ix-device-plugin \
  --set image.tag=test \
  --set image.pullPolicy=Always \
  -n kube-system

Install with Volcano plugin

The ix-device-plugin chart can be installed with or without Volcano integration. To install it with Volcano, enable the usevolcano flag:

helm install ix-device-plugin ix-device-plugin-4.3.0.tgz \
  --set ixConfig.flags.usevolcano=true \
  -n kube-system
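
When several flags are set at once, a values file can be easier to review than a chain of `--set` arguments. The sketch below writes such a file using only the parameter names from the Values table. The function name `write_ix_values` and the file name `ix-values.yaml` are hypothetical.

```shell
# Hypothetical helper: write a Helm values file for the chart.
# Arguments: <file> <splitboard> <usevolcano> <reset_gpu>, each flag true or false.
# The keys mirror the Values table; nothing else is assumed about the chart.
write_ix_values() {
  cat > "$1" <<EOF
ixConfig:
  flags:
    splitboard: $2
    usevolcano: $3
    reset_gpu: $4
EOF
}

write_ix_values ix-values.yaml false true false
cat ix-values.yaml
```

The file can then be passed with Helm's standard `-f` flag, for example `helm install ix-device-plugin ix-device-plugin-4.3.0.tgz -f ix-values.yaml -n kube-system`.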

Enabling GPU Support in Kubernetes

Once you have configured the options above on all GPU nodes in your cluster, you can enable GPU support by deploying the following DaemonSet:

# ix-device-plugin.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: iluvatar-device-plugin
  namespace: kube-system
  labels:
    app.kubernetes.io/name: iluvatar-device-plugin
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: iluvatar-device-plugin
  template:
    metadata:
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ""
      labels:
        app.kubernetes.io/name: iluvatar-device-plugin
    spec:
      priorityClassName: "system-node-critical"
      securityContext:
        null
      containers:
        - name: iluvatar-device-plugin
          securityContext:
            capabilities:
              drop:
              - ALL
            privileged: true
          image: "ix-device-plugin:4.3.0"
          imagePullPolicy: IfNotPresent
          livenessProbe:
            exec:
              command:
              - ls
              - /var/lib/kubelet/device-plugins/iluvatar-gpu.sock
            periodSeconds: 5
          startupProbe:
            exec:
              command:
              - ls
              - /var/lib/kubelet/device-plugins/iluvatar-gpu.sock
            periodSeconds: 5
          resources:
            {}
          volumeMounts:
            - mountPath: /var/lib/kubelet/device-plugins
              name: device-plugin
            - mountPath: /run/udev
              name: udev-ctl
              readOnly: true
            - mountPath: /sys
              name: sys
              readOnly: true
            - mountPath: /dev
              name: dev
            - name: ixc
              mountPath: /ixconfig
      volumes:
        - hostPath:
            path: /var/lib/kubelet/device-plugins
          name: device-plugin
        - hostPath:
            path: /run/udev
          name: udev-ctl
        - hostPath:
            path: /sys
          name: sys
        - hostPath:
            path: /etc/udev/
          name: udev-etc
        - hostPath:
            path: /dev
          name: dev
        - name: ixc
          configMap:
              name: ix-config

$ kubectl create -f ix-device-plugin.yaml
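
After the DaemonSet is created, one way to confirm that the plugin has registered with the kubelet is to look for iluvatar.com/gpu under the node's Capacity. The sketch below filters `kubectl describe node` output read from standard input. The function name `gpu_capacity` is hypothetical, and the sample input only stands in for real cluster output.

```shell
# Hypothetical helper: print the iluvatar.com/gpu count listed under
# "Capacity:" in `kubectl describe node` output supplied on stdin.
gpu_capacity() {
  awk '
    /^Capacity:/ { in_cap = 1; next }   # entering the Capacity block
    /^[^ ]/      { in_cap = 0 }         # any other top-level key ends it
    in_cap && $1 == "iluvatar.com/gpu:" { print $2 }
  '
}

# Sample input standing in for real output, so the sketch runs without a cluster.
printf 'Capacity:\n  cpu: 64\n  iluvatar.com/gpu: 2\nAllocatable:\n  iluvatar.com/gpu: 1\n' | gpu_capacity
```

On a live cluster the same filter would be used as `kubectl describe node <node-name> | gpu_capacity`.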

Running GPU Jobs

To expose GPUs to a pod, request the iluvatar.com/gpu resource under resources.limits in the container spec. The limit sets how many GPUs the container receives.

$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: corex-example
spec:
  containers:
  - name: corex-example
    image: corex:4.0.0
    command: ["/usr/local/corex/bin/ixsmi"]
    args: ["-l"]
    resources:
      limits:
        iluvatar.com/gpu: 1 # request 1 GPU
EOF
$ kubectl logs corex-example
+-----------------------------------------------------------------------------+
|  IX-ML: <version>      Driver Version: <version>      CUDA Version: <version>           |
|-------------------------------+----------------------+----------------------|
| GPU  Name                     | Bus-Id               | Clock-SM  Clock-Mem  |
| Fan  Temp  Perf  Pwr:Usage/Cap|      Memory-Usage    | GPU-Util  Compute M. |
|===============================+======================+======================|
| 0    Iluvatar BI-V150S        | 00000000:8A:00.0     | 500MHz    1600MHz    |
| 0%   33C   P0    N/A / N/A    | 114MiB / 32768MiB    | 0%        Default    |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU        PID      Process name                                Usage(MiB) |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
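
To request a different number of GPUs, only the value under resources.limits changes. The sketch below generates the same pod manifest with a configurable count. The function name `gpu_pod_manifest` is hypothetical, and the image and command are copied from the example above.

```shell
# Hypothetical helper: print a pod manifest that requests <gpu-count> GPUs.
# Arguments: <pod-name> <gpu-count>. Image and command follow the example above.
gpu_pod_manifest() {
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: $1
spec:
  containers:
  - name: $1
    image: corex:4.0.0
    command: ["/usr/local/corex/bin/ixsmi"]
    args: ["-l"]
    resources:
      limits:
        iluvatar.com/gpu: $2
EOF
}

gpu_pod_manifest corex-example 2
```

The generated manifest can be piped to `kubectl apply -f -`, as in the example above.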

Split GPU Board to Multiple GPU Devices

The IX device plugin can split one GPU board into multiple GPU devices through an extended option in its configuration file.

With SplitBoard

The option that controls board splitting is shown below:

flags:
    splitboard: false

flags.splitboard is a boolean flag. When it is true, the plugin splits each GPU board into its individual GPUs, and the kubelet advertises multiple iluvatar.com/gpu resources to Kubernetes for one board. When it is false, the plugin advertises a single iluvatar.com/gpu resource per GPU board.

For example:

flags:
    splitboard: true

If this configuration were applied to a node with one GPU board (e.g. BI-V150, which has 2 GPU chips on it), the plugin would advertise 2 iluvatar.com/gpu resources to Kubernetes instead of 1.

$ kubectl describe node
...
Capacity:
  iluvatar.com/gpu: 2
...
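
The arithmetic behind this example can be written out as a small sketch. It assumes every board on the node carries the same number of chips. The function name `split_capacity` and its parameters are hypothetical and are not plugin options.

```shell
# Hypothetical helper: number of iluvatar.com/gpu resources a node advertises.
# Arguments: <boards> <chips-per-board> <splitboard: true|false>
# Assumes all boards on the node have the same chip count.
split_capacity() {
  if [ "$3" = "true" ]; then
    echo $(( $1 * $2 ))   # one resource per chip
  else
    echo "$1"             # one resource per board
  fi
}

split_capacity 1 2 true    # one BI-V150 board, split: prints 2
split_capacity 1 2 false   # same board, not split: prints 1
```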

Shared Access to GPUs

The IX device plugin allows oversubscription of GPUs through a set of extended options in its configuration file.

With Time-Slicing

The options for sharing through time-slicing are shown below:

sharing:
    timeSlicing:
        replicas: <num-replicas>
    ...

sharing.timeSlicing.replicas sets the number of replicas per GPU, that is, how many shared accesses are granted for each GPU.

For example:

flags:
    splitboard: false
sharing:
    timeSlicing:
        replicas: 4

If this configuration were applied to a node with 2 GPUs, the plugin would advertise 8 iluvatar.com/gpu resources to Kubernetes (2 GPUs × 4 replicas) instead of 2.

$ kubectl describe node
...
Capacity:
  iluvatar.com/gpu: 8
...
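
The same calculation as a minimal sketch: the advertised count is the number of GPUs multiplied by the replica count. The function name `shared_capacity` is hypothetical and not part of the plugin.

```shell
# Hypothetical helper: resources advertised when time-slicing is enabled.
# Arguments: <gpus> <replicas>, where replicas is sharing.timeSlicing.replicas.
shared_capacity() {
  echo $(( $1 * $2 ))
}

shared_capacity 2 4   # the example above: prints 8
```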
