# Adding a GPU-enabled node to a Kubernetes cluster created with Cluster API
Create a cluster in `us-east-1`, then add the additional VMs defined in `kubernetes-gpu/additional-vms.yaml`. Make sure to change `clusterName` in the MachineDeployment as necessary.
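The field to adjust looks roughly like this (an illustrative excerpt, not the full manifest in `additional-vms.yaml`; the names and `apiVersion` depend on your Cluster API version):

```yaml
apiVersion: cluster.x-k8s.io/v1beta1  # depends on your Cluster API version
kind: MachineDeployment
metadata:
  name: gpu-nodes              # illustrative name
spec:
  clusterName: my-cluster      # change to your cluster's name
  replicas: 1
  template:
    spec:
      clusterName: my-cluster  # keep in sync with spec.clusterName
```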
On the management cluster:

```shell
kubectl apply -f additional-vms.yaml
```
Afterwards, activate the GPU-enabled runtime by applying `runtimeclass.yaml`.
On the GPU-enabled cluster:

```shell
kubectl apply -f runtimeclass.yaml
```
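For reference, a RuntimeClass that maps pods onto an `nvidia` containerd handler generally follows this pattern (a sketch of what `runtimeclass.yaml` contains, not necessarily its exact contents; older clusters use `node.k8s.io/v1beta1`):

```yaml
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: nvidia   # referenced by pods via runtimeClassName
handler: nvidia  # must match the runtime name in containerd's config
```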
Enable the NVIDIA Device Plugin, which has been modified to use the NVIDIA containerd runtime class:

```shell
kubectl apply -f nvidia-device-plugin.yaml
```
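The modification referred to is adding a `runtimeClassName` to the device plugin's DaemonSet pod template, along these lines (an illustrative excerpt; the image tag is an assumption, not the one pinned in this repo):

```yaml
# excerpt of the device plugin DaemonSet spec
spec:
  template:
    spec:
      runtimeClassName: nvidia  # run the plugin itself under the NVIDIA runtime
      containers:
        - name: nvidia-device-plugin-ctr
          image: nvcr.io/nvidia/k8s-device-plugin:v0.14.1  # illustrative tag
```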
Finally, run a GPU-enabled example that requests one GPU and uses the containerd runtime class:

```shell
kubectl apply -f notebook-example.yml
kubectl port-forward tf-notebook 8888:8888
```
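A pod that requests a GPU through the runtime class follows this pattern (a sketch; the image and exact spec in `notebook-example.yml` may differ):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: tf-notebook
spec:
  runtimeClassName: nvidia  # use the NVIDIA containerd runtime
  containers:
    - name: tf-notebook
      image: tensorflow/tensorflow:latest-gpu-jupyter  # illustrative image
      resources:
        limits:
          nvidia.com/gpu: 1  # request a single GPU via the device plugin
      ports:
        - containerPort: 8888
```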
- Check out Image Builder into a directory that is a sibling of this repository:

  ```shell
  $ ls -1 | grep 'kubernetes-gpu\|image-builder'
  kubernetes-gpu
  image-builder
  ```

- Next, modify `nvidia-containerd.json` to provide the absolute path to the `kubernetes-gpu` repository for the `extra_repos` option. In the example, it's located in my `$HOME` directory under `workspace`.
- Create a `.secrets/secrets.json` file that contains the missing credentials options from Image Builder. Since we pass this file in last, it will override everything else in Packer. Here's an example:

  ```json
  {
    "aws_region": "us-east-1",
    "aws_profile": "com",
    "ami_regions": "us-east-1",
    "vpc_id": "vpc-123456",
    "subnet_id": "subnet-123456",
    "ssh_keypair_name": "cluster-api-provider-aws",
    "iam_instance_profile": "nodes.cluster-api-provider-aws.sigs.k8s.io"
  }
  ```

- Image Builder by default uses the regular Amazon Linux 2 base AMI, but we want to use one that has the NVIDIA drivers pre-installed, so make the following change in `image-builder/images/capi/packer/ami/packer.json`:

  ```diff
  @@ -135,11 +135,11 @@
         "instance_type": "t3.small",
         "source_ami": "{{user `amazon_2_ami`}}",
         "source_ami_filter": {
           "filters": {
             "virtualization-type": "hvm",
  -          "name": "amzn2-ami-hvm-2*",
  +          "name": "amzn2-ami-graphics-hvm-2*",
             "root-device-type": "ebs",
             "architecture": "x86_64"
           },
           "owners": ["amazon"],
           "most_recent": true
  ```

- Afterwards, run `build.sh` and take the resulting AMI ID from its output, for example `ami-0728541f6e02e632d`; provide this as the AMI ID in your MachineDeployment.
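The `extra_repos` entry mentioned above might look like the following (an assumption about the variable's shape — paths are illustrative, and you should check how `nvidia-containerd.json` in this repo actually uses it):

```json
{
  "extra_repos": "/home/me/workspace/kubernetes-gpu"
}
```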
- Image Builder is using a base Amazon Linux 2 image with NVIDIA drivers already installed.
- Adding a containerd runtime class to the containerd configuration.
- Installing the `libnvidia-container` and `nvidia-container-runtime` repositories, then installing all RPMs that are dependencies of `nvidia-container-runtime`.
- Using an AWS instance type with a GPU attached instead of a regular instance type.
- Adding a Runtime Class to support mixed workloads (ones that use a GPU and ones that don't).
- Using the NVIDIA Device Plugin to handle requesting and limits for GPUs in workloads.
- Providing an alternative runtime environment for workloads, so that GPU-enabled workloads stay separate from non-GPU workloads (this follows a similar concept to Kata Containers).
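The containerd side of the runtime class above is configured roughly like this in `config.toml` (a sketch; the plugin key and binary path can vary by containerd release and package):

```toml
# /etc/containerd/config.toml (excerpt)
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
  runtime_type = "io.containerd.runc.v2"
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
    BinaryName = "/usr/bin/nvidia-container-runtime"
```

The `nvidia` key under `runtimes` is what the RuntimeClass `handler` field refers to.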