
[Issue]: GPU device count not working on Minikube cluster #149

Description

@mwessman-amd

Problem Description

Hello,

I'm currently running a Minikube k8s cluster and using this plugin to enable my MI300X GPUs. The node has 8 GPUs, but I only need to enable a few of them.

https://instinct.docs.amd.com/projects/k8s-device-plugin/en/latest/user-guide/configuration.html

I followed the guide above and tested both approaches: setting the environment variable and creating the ConfigMap for the plugin. I have confirmed that the container running the plugin mounts the ConfigMap correctly and that config.yaml can be found at /etc/config.yaml. However, whenever I run the plugin, it still enables all 8 of my GPUs.

Is there another way to make sure it only selects a few GPUs on my node, or can I somehow give it specific GPUs to use? Any help is appreciated.
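
For anyone trying to confirm the symptom, the following is a minimal sketch (not part of the plugin) of how the GPU count the node actually advertises could be checked. It assumes the plugin registers its devices under the extended resource name `amd.com/gpu`, and that `kubectl` is on the PATH and pointed at the Minikube cluster; the helper names `advertised_gpus` and `read_cluster` are hypothetical.

```python
import json
import subprocess

# Assumption: the AMD device plugin registers GPUs under this resource name.
GPU_RESOURCE = "amd.com/gpu"


def advertised_gpus(node_json: str, resource: str = GPU_RESOURCE) -> dict:
    """Map node name -> allocatable count of `resource`.

    `node_json` is the output of `kubectl get nodes -o json` (a List object)
    or of `kubectl get node <name> -o json` (a single Node object).
    """
    doc = json.loads(node_json)
    # A List has an "items" array; a single Node is treated as a one-item list.
    nodes = doc.get("items", [doc])
    counts = {}
    for node in nodes:
        name = node["metadata"]["name"]
        allocatable = node.get("status", {}).get("allocatable", {})
        # Quantities are reported as strings (e.g. "8"); a missing key means 0.
        counts[name] = int(allocatable.get(resource, 0))
    return counts


def read_cluster() -> dict:
    """Query the current kubectl context and return the per-node GPU counts."""
    result = subprocess.run(
        ["kubectl", "get", "nodes", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return advertised_gpus(result.stdout)
```

Calling `read_cluster()` before and after changing the plugin configuration would show whether the setting has any effect on what the node reports. With the behavior described above, the node would still report 8.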

Operating System

Ubuntu 22.04.5 LTS (Jammy Jellyfish)

CPU

AMD EPYC 9534 64-Core Processor

GPU

8 x AMD Instinct MI300X

ROCm Version

6.3.3.60303-74~22.04

ROCm Component

No response

Steps to Reproduce

No response

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response
