
Feature/multi gpu load support #456

Open

ThatDeltaGuy wants to merge 14 commits into hass-agent:main from ThatDeltaGuy:Feature/Multi-GPU-Support

ThatDeltaGuy commented Jun 16, 2026

Added support for multiple GPUs in the GPU Load sensor, including integrated graphics.

Summary

The GPU Load sensor previously summed every detected GPU's utilization into a single number, with no way to pick a specific GPU. This adds a GPU selector (pick one, or "All" to average) and fixes the underlying aggregation, which on systems with both an iGPU and a dGPU could silently exceed 100% or report the wrong adapter entirely.

The problem

  • The sensor summed the Utilization Percentage counter across every GPU Engine instance. With two active GPUs under load simultaneously, the reported value could exceed 100%.
  • Worse, on some hardware (confirmed via live testing on a mixed AMD iGPU + NVIDIA dGPU system), Windows doesn't increment the phys_n segment of the counter instance name per adapter: it stays phys_0 for every adapter. Any attempt to group by phys_n would therefore have silently merged unrelated GPUs into one bucket.
  • The only thing in the counter instance name that's actually unique per adapter is the LUID. But Win32_VideoController (the WMI class previously used to resolve friendly GPU names) doesn't expose LUID at all — there's no property for it. That ruled out WMI as a way to pair counter data with a GPU's name.
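A minimal Python sketch of why the LUID, not phys_n, has to be the grouping key (the PR itself is C#; this is an illustration, not its code). It assumes the usual GPU Engine instance-name shape, pid_<pid>_luid_0x<high>_0x<low>_phys_<n>_eng_<n>_engtype_<type>; the helper names and the sample instance names are hypothetical. Lowercasing the extracted LUID also folds the inconsistent hex casing described further down into one key.

```python
import re
from typing import Optional

# Assumed instance-name shape for the "GPU Engine" counter set, e.g.
# "pid_1234_luid_0x00000000_0x0000C2F1_phys_0_eng_0_engtype_3D".
_LUID_RE = re.compile(r"luid_(0x[0-9a-f]+_0x[0-9a-f]+)", re.IGNORECASE)
_PHYS_RE = re.compile(r"phys_(\d+)", re.IGNORECASE)


def extract_luid(instance_name: str) -> Optional[str]:
    """Return the LUID part of a counter instance name, lowercased, or None."""
    match = _LUID_RE.search(instance_name)
    # Lowercasing makes "0x0000C2F1" and "0x0000c2f1" the same dictionary key.
    return match.group(1).lower() if match else None


def extract_phys(instance_name: str) -> Optional[int]:
    """Return the phys_n index, which is NOT reliably unique per adapter."""
    match = _PHYS_RE.search(instance_name)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    # Hypothetical instances from two different adapters that both report phys_0.
    igpu = "pid_100_luid_0x00000000_0x0000C2F1_phys_0_eng_0_engtype_3D"
    dgpu = "pid_200_luid_0x00000000_0x0001A3B4_phys_0_eng_0_engtype_3D"
    print(extract_phys(igpu) == extract_phys(dgpu))  # True: phys_n can't tell them apart
    print(extract_luid(igpu) == extract_luid(dgpu))  # False: the LUID can
```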

How it works now

  • Selector + averaging, not summing. The sensor config UI gained a GPU dropdown (reusing the existing settings combobox, now renamed CbSetting1 since it's shared across Network/Internal Sensor/GPU types rather than being network-specific). Selecting "All" averages every GPU's load instead of summing it.
  • DXGI for enumeration. Added the Vanara.PInvoke.DXGI package (pinned to the same version as the existing Vanara.PInvoke.PowrProf) to enumerate adapters via IDXGIFactory1/IDXGIAdapter1.GetDesc1(). This gives a LUID and a friendly name from one authoritative call, so there's no need to correlate two different APIs. DXGI also lists every installed adapter unconditionally, so an idle GPU (e.g. an iGPU that hasn't rendered anything yet) is still selectable — the old GPU Engine counter approach only ever exposed an adapter once something had scheduled 3D work on it.
  • Grouping by LUID, not phys_n. Performance counter instances are now grouped by the LUID extracted from their instance name, which is the only identifier that's actually unique per physical adapter.
  • LUID case normalization. Windows doesn't consistently capitalize the hex digits in the LUID across different processes' counter instances for the same adapter. Without normalizing to lowercase, the same physical GPU could get split into two separate dictionary entries depending on which process's instance happened to be read.
  • Filtering out phantom adapters. The selectable list excludes adapters that DXGI flags as DXGI_ADAPTER_FLAG_SOFTWARE, such as the WARP/Microsoft Basic Render Driver software rasterizer, but the GPU Engine counters can still report instances against it (or against other non-hardware adapters, e.g. an indirect display driver). Counter instances are now filtered down to only the adapters DXGI confirmed are real before anything gets aggregated; otherwise an unselectable phantom adapter could still quietly skew the "All" average.
  • Removed the Thread.Sleep(10), which had a //TODO: fix this comment next to it. The sensor now caches its PerformanceCounter objects across refreshes instead of recreating and re-priming them every call. A percentage-style counter needs a meaningful time delta between samples to be accurate; reusing the counter means it diffs against the previous real refresh (e.g. 30 seconds ago) instead of an artificial ~10ms window.
  • Fixed a GetState() formatting bug. value.ToString("#.##") returns an empty string for exactly 0 in .NET (not "0") — a latent bug that rarely surfaced before, since an idle GPU couldn't be selected at all. Now uses "0.##".
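The aggregation steps above (normalize the LUID, drop phantom adapters, then either report one adapter or average all of them) can be sketched in Python as follows. This is a hypothetical, language-agnostic illustration rather than the PR's C# implementation: gpu_load and its parameters are made-up names, and two behaviours are assumptions labelled in the comments: how instances within one adapter are combined (summed, then capped at 100) and whether idle adapters count as 0% in the "All" average.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple


def gpu_load(
    samples: Iterable[Tuple[str, float]],
    known_luids: Set[str],
    selected_luid: Optional[str] = None,
) -> float:
    """Aggregate per-instance utilization samples into one GPU load value.

    samples:       (luid, utilization_percent) pairs, one per counter instance
    known_luids:   lowercase LUIDs of the hardware adapters DXGI reported
    selected_luid: one adapter's LUID, or None for "All"
    """
    per_adapter: Dict[str, float] = defaultdict(float)
    for luid, value in samples:
        key = luid.lower()  # same adapter can appear with different hex casing
        if key not in known_luids:
            continue  # phantom adapter: dropped before it can affect any result
        per_adapter[key] += value

    def load_of(key: str) -> float:
        # Assumption: instances within one adapter are summed, then capped at 100.
        return min(per_adapter.get(key, 0.0), 100.0)

    if selected_luid is not None:
        return load_of(selected_luid.lower())

    if not known_luids:
        return 0.0
    # "All" averages adapters instead of summing them, so 20% + 80% reports 50%.
    # Assumption: idle adapters (no samples yet) count as 0% in the average.
    return sum(load_of(key) for key in known_luids) / len(known_luids)


if __name__ == "__main__":
    samples = [("0xA", 20.0), ("0xB", 80.0), ("0xC", 95.0)]  # 0xC is a phantom
    print(gpu_load(samples, {"0xa", "0xb"}))         # 50.0
    print(gpu_load(samples, {"0xa", "0xb"}, "0xB"))  # 80.0
```

Filtering happens before grouping, so a phantom adapter never enters the dictionary and cannot shift the "All" denominator.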

Note: there's no fully reliable way to classify every remaining unflagged adapter. Filtering on DXGI's SOFTWARE flag handles the common case, but it isn't a guarantee against every possible phantom adapter type. Adapters flagged DXGI_ADAPTER_FLAG_REMOTE are not filtered, since they can represent a real GPU accessed over a Remote Desktop session.
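A small Python sketch of the flag check described in the note. The constant values mirror the DXGI_ADAPTER_FLAG enumeration in dxgi.h (NONE = 0, REMOTE = 1, SOFTWARE = 2); AdapterInfo is a hypothetical stand-in for the fields read from DXGI_ADAPTER_DESC1, and the adapter names and LUIDs are made up. Only the SOFTWARE bit is tested, so REMOTE adapters stay selectable.

```python
from dataclasses import dataclass
from typing import List

# Values of the DXGI_ADAPTER_FLAG enumeration (dxgi.h).
DXGI_ADAPTER_FLAG_NONE = 0
DXGI_ADAPTER_FLAG_REMOTE = 1
DXGI_ADAPTER_FLAG_SOFTWARE = 2


@dataclass
class AdapterInfo:
    """Hypothetical stand-in for the fields read from DXGI_ADAPTER_DESC1."""
    description: str
    luid: str
    flags: int


def selectable_adapters(adapters: List[AdapterInfo]) -> List[AdapterInfo]:
    """Drop software rasterizers (e.g. WARP); keep REMOTE and hardware adapters."""
    return [a for a in adapters if not (a.flags & DXGI_ADAPTER_FLAG_SOFTWARE)]


if __name__ == "__main__":
    adapters = [
        AdapterInfo("iGPU", "0x00000000_0x0000c2f1", DXGI_ADAPTER_FLAG_NONE),
        AdapterInfo("dGPU", "0x00000000_0x0001a3b4", DXGI_ADAPTER_FLAG_NONE),
        AdapterInfo("Microsoft Basic Render Driver", "0x00000000_0x0000d0d0",
                    DXGI_ADAPTER_FLAG_SOFTWARE),
        AdapterInfo("Remote adapter", "0x00000000_0x0000e0e0", DXGI_ADAPTER_FLAG_REMOTE),
    ]
    print([a.description for a in selectable_adapters(adapters)])
    # ['iGPU', 'dGPU', 'Remote adapter']
```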

Testing

No unit test infrastructure existed in the repo, so this adds three new NUnit projects (HASS.Agent.UnitTests, HASS.Agent.Shared.UnitTests, HASS.Agent.Satellite.Service.UnitTests), wired into the solution. 36 tests total, covering:

  • GPU id normalization ("*"/null/empty all mean "All")
  • LUID extraction and case-insensitive matching
  • Averaging vs. summing (the core regression case: two GPUs at 20%/80% must report 50%, not 100%)
  • Phantom-adapter filtering
  • StoredSensors wiring in both the Agent and Satellite Service, confirming the configured Query flows through to the sensor's selected GPU
  • One test ([Category("Hardware")]) that exercises the real GPU Engine counters and DXGI on whatever machine runs it, printing each detected GPU's live reading — skips gracefully via Assert.Ignore rather than failing on a machine with no GPU counters, so it's safe in CI.
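For illustration only, here is a pytest-style Python analogue of the GPU id normalization cases listed above; the real tests are NUnit in C#, and normalize_gpu_id and the ALL sentinel are hypothetical names. It assumes that, besides "*"/null/empty, the literal "All" also maps to "All", and that any other id is trimmed and lowercased so LUID matching is case-insensitive.

```python
from typing import Optional

ALL = "All"  # hypothetical sentinel for the "All" dropdown entry


def normalize_gpu_id(gpu_id: Optional[str]) -> str:
    """Map a configured GPU id to "All" or to a trimmed, lowercase LUID."""
    if gpu_id is None or gpu_id.strip().lower() in ("", "*", ALL.lower()):
        return ALL
    return gpu_id.strip().lower()


def test_wildcards_mean_all() -> None:
    for value in (None, "", "*", "All"):
        assert normalize_gpu_id(value) == ALL


def test_luid_matching_is_case_insensitive() -> None:
    assert normalize_gpu_id("0x00000000_0x0000C2F1") == normalize_gpu_id(
        "0x00000000_0x0000c2f1"
    )
```

Running pytest on this file collects both test_ functions.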

AI Usage

For full disclosure, I used AI only to build out most of the unit tests, update the translations into other languages, and edit this description to make it clearer. I also used AI-assisted IntelliSense to draft the code comments, which I then edited. However, I have read through everything it wrote to make sure it's understandable and tests the right things. The translations I have had to take on trust.

ThatDeltaGuy marked this pull request as ready for review June 16, 2026 21:41