Context
PR #137 introduced `NvmlGpuInfo` in `hardware.py`, which wraps raw pynvml calls behind a `GpuInfoProvider` Protocol. Reviewers (@ncclementi, @mdboom) noted that we should migrate from pynvml to `cuda.core.system` for consistency with the rest of the CUDA Python ecosystem.
`cuda.core.system` now exposes all the APIs needed (device count, compute capability, memory info, driver version). However, merging the migration has to wait for the cuda-core 1.0 release.
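Because callers depend only on the `GpuInfoProvider` Protocol, swapping the backend from pynvml to `cuda.core.system` should not touch them. The sketch below shows that seam for the four data points listed above. It assumes hypothetical method names (`device_count`, `compute_capability`, `memory_info`, `driver_version`) and hypothetical helpers (`GpuMemoryInfo`, `FakeGpuInfo`, `meets_min_capability`); the real Protocol in `rapids_cli/hardware.py` may be shaped differently, and no cuda.core or pynvml call is shown because the exact API is not confirmed here.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class GpuMemoryInfo:
    """Hypothetical container for per-device memory figures, in bytes."""
    total: int
    used: int


class GpuInfoProvider(Protocol):
    """Hypothetical shape of the provider interface (names are assumptions)."""

    def device_count(self) -> int: ...
    def compute_capability(self, index: int) -> tuple[int, int]: ...
    def memory_info(self, index: int) -> GpuMemoryInfo: ...
    def driver_version(self) -> str: ...


class FakeGpuInfo:
    """In-memory provider that satisfies the Protocol structurally; no GPU needed."""

    def __init__(self, capabilities: list[tuple[int, int]], total_bytes: int, driver: str) -> None:
        self._caps = capabilities
        self._total = total_bytes
        self._driver = driver

    def device_count(self) -> int:
        return len(self._caps)

    def compute_capability(self, index: int) -> tuple[int, int]:
        return self._caps[index]

    def memory_info(self, index: int) -> GpuMemoryInfo:
        # Every fake device reports the same total and zero usage.
        return GpuMemoryInfo(total=self._total, used=0)

    def driver_version(self) -> str:
        return self._driver


def meets_min_capability(provider: GpuInfoProvider, minimum: tuple[int, int]) -> bool:
    """Example consumer: depends only on the Protocol, not on pynvml or cuda.core.

    Returns False when no devices are present.
    """
    count = provider.device_count()
    return count > 0 and all(
        provider.compute_capability(i) >= minimum for i in range(count)
    )
```

A cuda.core-backed `NvmlGpuInfo` would then be one more class with the same four methods, and code such as `meets_min_capability` would not need to change.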
Related
- #145: cuda-bindings dependency issue (`cuda_toolkit_check` fails on fresh install as `cuda-bindings` is not declared in dependencies)
- `GpuInfoProvider` abstraction
Work needed
Once cuda-core 1.0 is released:
- Update `NvmlGpuInfo` in `rapids_cli/hardware.py` to use `cuda.core.system` instead of pynvml
- Update `HardwareInfoError` wrapping to catch cuda.core exceptions instead of `pynvml.NVMLError`
- Remove `nvidia-ml-py` (pynvml) from `dependencies.yaml` if it is fully replaced
- Update tests in `test_hardware.py` to mock cuda.core instead of pynvml