- About
- Prerequisites
- Building the IX device plugin
- Configuring the IX device plugin
- Enabling GPU Support in Kubernetes
- Running GPU Jobs
- Split GPU Board to Multiple GPU Devices
- Shared Access to GPUs
## About

The IX device plugin for Kubernetes is a DaemonSet that allows you to automatically:

- Expose the number of GPUs on each node of your cluster
- Keep track of the health of your GPUs
- Run GPU-enabled containers in your Kubernetes cluster
## Prerequisites

The list of prerequisites for running the IX device plugin is described below:

- Iluvatar driver and software stack >= v1.1.0
- Kubernetes version >= 1.10
## Building the IX device plugin

```shell
make all
```

This will build the ix-device-plugin binary and the ix-device-plugin image; see the logs for more details.
## Configuring the IX device plugin

The IX device plugin has a number of options that can be configured:

```yaml
# check ix-device-plugin.yaml
apiVersion: v1
kind: ConfigMap
data:
  ix-config: |-
    resourceName: "iluvatar.com/gpu"
    flags:
      splitboard: false
      usevolcano: false
      reset_gpu: false
```

| Field | Type | Description |
|---|---|---|
| `flags.splitboard` | boolean | Split the GPU devices on every board (e.g. BI-V150) if set to true |
| `flags.usevolcano` | boolean | Enable Volcano integration (use ix-device-plugin with ix-volcano-plugin) |
| `flags.reset_gpu` | boolean | Enable GPU reset |
| Parameter | Default | Description |
|---|---|---|
| `image.repository` | `ix-device-plugin` | Image repository |
| `image.tag` | `<tag>` | Image tag |
| `image.pullPolicy` | `IfNotPresent` | Image pull policy |
| `ixConfig.flags.splitboard` | `false` | Enable splitboard mode |
| `ixConfig.flags.usevolcano` | `false` | Enable Volcano integration |
| `ixConfig.flags.reset_gpu` | `false` | Enable GPU reset functionality |
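The parameters in the table above can also be collected in a values file instead of being passed individually on the command line. A hypothetical `my-values.yaml` (the registry name is a placeholder; the key layout follows the table above):

```yaml
# my-values.yaml -- hypothetical Helm override file
image:
  repository: registry.local/ix-device-plugin   # placeholder registry
  tag: "4.3.0"
  pullPolicy: IfNotPresent
ixConfig:
  flags:
    splitboard: false
    usevolcano: false
    reset_gpu: false
```

It can then be passed to Helm with `helm install ix-device-plugin ix-device-plugin-4.3.0.tgz -f my-values.yaml -n kube-system`.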
```shell
helm install ix-device-plugin ix-device-plugin-4.3.0.tgz \
  --set image.repository=registry.local/ix-device-plugin \
  --set image.tag=test \
  --set image.pullPolicy=Always \
  -n kube-system
```

You can install the ix-device-plugin chart in two modes: with the Volcano plugin enabled or without Volcano.
Enable the usevolcano flag:

```shell
helm install ix-device-plugin ix-device-plugin-4.3.0.tgz \
  --set ixConfig.flags.usevolcano=true \
  -n kube-system
```

## Enabling GPU Support in Kubernetes

Once you have configured the options above on all the GPU nodes in your cluster, you can enable GPU support by deploying the following DaemonSet:
```yaml
# ix-device-plugin.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: iluvatar-device-plugin
  namespace: kube-system
  labels:
    app.kubernetes.io/name: iluvatar-device-plugin
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: iluvatar-device-plugin
  template:
    metadata:
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ""
      labels:
        app.kubernetes.io/name: iluvatar-device-plugin
    spec:
      priorityClassName: "system-node-critical"
      securityContext: null
      containers:
        - name: iluvatar-device-plugin
          securityContext:
            capabilities:
              drop:
                - ALL
            privileged: true
          image: "ix-device-plugin:4.3.0"
          imagePullPolicy: IfNotPresent
          livenessProbe:
            exec:
              command:
                - ls
                - /var/lib/kubelet/device-plugins/iluvatar-gpu.sock
            periodSeconds: 5
          startupProbe:
            exec:
              command:
                - ls
                - /var/lib/kubelet/device-plugins/iluvatar-gpu.sock
            periodSeconds: 5
          resources: {}
          volumeMounts:
            - mountPath: /var/lib/kubelet/device-plugins
              name: device-plugin
            - mountPath: /run/udev
              name: udev-ctl
              readOnly: true
            - mountPath: /sys
              name: sys
              readOnly: true
            - mountPath: /dev
              name: dev
            - name: ixc
              mountPath: /ixconfig
      volumes:
        - hostPath:
            path: /var/lib/kubelet/device-plugins
          name: device-plugin
        - hostPath:
            path: /run/udev
          name: udev-ctl
        - hostPath:
            path: /sys
          name: sys
        - hostPath:
            path: /etc/udev/
          name: udev-etc
        - hostPath:
            path: /dev
          name: dev
        - name: ixc
          configMap:
            name: ix-config
```

```shell
kubectl create -f ix-device-plugin.yaml
```

## Running GPU Jobs

A GPU can be exposed to a pod by adding `iluvatar.com/gpu` to the pod definition, and you can restrict the GPU resources by adding `resources.limits` to the pod definition.
```shell
$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: corex-example
spec:
  containers:
    - name: corex-example
      image: corex:4.0.0
      command: ["/usr/local/corex/bin/ixsmi"]
      args: ["-l"]
      resources:
        limits:
          iluvatar.com/gpu: 1 # requesting 1 GPU
EOF
```

Check the output:

```shell
kubectl logs corex-example
```
```
+-----------------------------------------------------------------------------+
| IX-ML: <version>    Driver Version: <version>    CUDA Version: <version>    |
|-------------------------------+----------------------+----------------------|
| GPU  Name                     | Bus-Id               | Clock-SM  Clock-Mem  |
| Fan  Temp  Perf  Pwr:Usage/Cap| Memory-Usage         | GPU-Util  Compute M. |
|===============================+======================+======================|
| 0    Iluvatar BI-V150S        | 00000000:8A:00.0     | 500MHz    1600MHz    |
| 0%   33C   P0    N/A / N/A    | 114MiB / 32768MiB    | 0%        Default    |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU        PID      Process name                                Usage(MiB) |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
```

## Split GPU Board to Multiple GPU Devices

The IX device plugin allows splitting one GPU board into multiple GPU devices through a set of extended options in its configuration file. The extended options for splitting a board can be seen below:
```yaml
flags:
  splitboard: false
```

That is, a boolean flag, `flags.splitboard`, can now be specified. If this flag is set to true, the plugin will split the GPU board into multiple GPUs, and kubelet will advertise multiple `iluvatar.com/gpu` resources to Kubernetes instead of one per GPU board. Otherwise, the plugin advertises only one `iluvatar.com/gpu` resource per GPU board.
For example:

```yaml
flags:
  splitboard: true
```

If this configuration were applied to a node with one GPU board (e.g. a BI-V150, which has 2 GPU chips on it), the plugin would now advertise 2 `iluvatar.com/gpu` resources to Kubernetes instead of 1.
```shell
$ kubectl describe node
...
Capacity:
  iluvatar.com/gpu: 2
...
```
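With splitboard enabled on such a node, a pod can request both chips of the board through the same `iluvatar.com/gpu` resource. A hypothetical example (the pod name is a placeholder; the image and command follow the earlier example):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: corex-split-example   # hypothetical name
spec:
  containers:
    - name: corex-split-example
      image: corex:4.0.0
      command: ["/usr/local/corex/bin/ixsmi"]
      args: ["-l"]
      resources:
        limits:
          iluvatar.com/gpu: 2 # both chips of the split board
```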
## Shared Access to GPUs

The IX device plugin allows oversubscription of GPUs through a set of extended options in its configuration file. The extended options for sharing using time-slicing can be seen below:

```yaml
sharing:
  timeSlicing:
    replicas: <num-replicas>
    ...
```

That is, a number of replicas, `sharing.timeSlicing.replicas`, can now be specified. These replicas represent the number of shared accesses that will be granted to a GPU.
For example:

```yaml
flags:
  splitboard: false
sharing:
  timeSlicing:
    replicas: 4
```

If this configuration were applied to a node with 2 GPUs on it, the plugin would now advertise 8 `iluvatar.com/gpu` resources to Kubernetes instead of 2.
```shell
$ kubectl describe node
...
Capacity:
  iluvatar.com/gpu: 8
...
```
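The capacity arithmetic in the two sections above (board splitting, then time-slicing replicas) can be sketched as a small function. This is an illustration of the advertised-resource math only, not the plugin's actual code, and it assumes the two options compose multiplicatively:

```python
def advertised_gpus(boards: int, chips_per_board: int,
                    splitboard: bool, replicas: int = 1) -> int:
    """Number of iluvatar.com/gpu resources kubelet would advertise.

    Illustration only: assumes time-slicing replicas multiply each
    advertised device, per the examples above.
    """
    devices = boards * chips_per_board if splitboard else boards
    return devices * replicas

# One BI-V150 board (2 chips) with splitboard enabled -> 2 resources
print(advertised_gpus(boards=1, chips_per_board=2, splitboard=True))
# Two GPUs with 4 time-slicing replicas -> 8 resources
print(advertised_gpus(boards=2, chips_per_board=1, splitboard=False, replicas=4))
```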