Refactor public API and add NVML installation fallback#2
Open
riccardoc95 wants to merge 37 commits into
Open
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Changelog
Public API
The low-level
nvml_*wrappers are no longer exported.The public API has been split into files for easier maintenance. It is now focused on:
cm_start()cm_timestamp()cm_stop()cm_parser()cm_vizdf()cm_plot_usage()CudaMonSession()R CMD check notes
The
cm_plot_usage()global variable notes were addressed by explicitly binding the ggplot variable:tm <- value <- type <- label_y <- step <- NULLNVML deprecation warning
The deprecated
nvmlDeviceGetTemperature()call was replaced with the newernvmlDeviceGetTemperatureV()NVML temperature API path.Torch/reticulate dependency
The previous Python/torch-based GPU test path was removed from inst.
Process discovery
The monitoring code no longer parses
pscommand output directly. Process discovery was moved to thepsR package: a script with functionprocess_pidsis added.Bioconductor GPU CI
Added
.BBSoptionswith GPU reliance configuration so Bioconductor can opt the package into GPU-capable builders.Installation fallback
The
configurelogic now includes a fallback for systems where NVML is not available. This allows the package to install in non-NVIDIA or non-CUDA environments while still enabling NVML-backed monitoring when the library is available.Documentation
Documentation was expanded and updated, including:
Current MIG-related limitations
gpu_utilization_pctandmemory_utilization_pctare not always available when running on MIG-enabled devices.At the moment, these missing metrics are handled conservatively: when a metric is not reported, it is excluded from the plot.
The memory total field was also renamed from
memory_total_bytestomemory_total_device_bytesto make its meaning explicit. This value refers to the total memory of the physical GPU device, not necessarily the memory limit of a specific MIG instance.