Problem
src/whichllm/constants.py currently acts as a curated hardware override table for GPU bandwidth, NVIDIA compute capability, and AMD shared-memory APU markers.
That works for common GPUs, but it is not comprehensive and it directly affects recommendation quality:
- Unknown GPU_BANDWIDTH entries can make GPU tok/s estimates fall to 0.0 or low-confidence output, even when VRAM fit is correct.
- Unknown AMD/Intel shared-memory APUs can be treated like tiny discrete GPUs if the OS reports only a 512 MB or 4 GB aperture.
- New GPUs require manual constants updates before whichllm can estimate speed well.
--gpu simulation can use dbgpu, but real detected hardware still relies heavily on static lookups for bandwidth and shared-memory classification.
The ideal state is not to add every GPU to constants.py by hand. Ideally, every GPU should have usable vendor, model, VRAM/unified memory, memory type, bus width, memory bandwidth, compute capability, and memory-classification metadata. That data should come from a layered resolver, with curated constants only as overrides.
Research summary
There is no single source that solves this cleanly.
dbgpu is the closest catalog source because it exposes GPU names, memory size, memory type, bus width, memory clock, memory bandwidth, and CUDA-version-style metadata. It should stay in the stack, but it is a static TechPowerUp-backed catalog, not live detection. It cannot know the user's installed laptop variant, active VRAM split, current unified-memory behavior, driver-visible memory, or brand-new GPUs before catalog updates.
PCI IDs and pci.ids are useful for stable identity, but they do not provide VRAM, bandwidth, bus width, memory type, or usable shared-memory behavior.
Live APIs are strongest for facts about the installed machine, but each covers only part of the problem:
- NVIDIA NVML / nvidia-smi: good for name, VRAM, PCI info, driver data, and compute capability; not enough for direct VRAM bandwidth.
- CUDA APIs: can expose bus width and memory clock, so bandwidth can be derived, but this should be optional.
- AMD SMI / ROCm SMI: good on supported AMD Linux systems; may expose VRAM size/type/width/clocks and some bandwidth metrics.
- AMD ADLX: useful for Windows AMD, including integrated/discrete type and VRAM data, but it is a native Windows AMD SDK and should be optional.
- Intel Level Zero Sysman: strong source for Intel memory properties and bandwidth where available, but Python ergonomics are weaker.
- Apple system_profiler / Metal: good for identity and unified-memory classification; peak memory bandwidth still needs curated Apple Silicon data.
- DXGI: good generic Windows adapter identity and dedicated/shared memory; no bandwidth.
- WMI Win32_VideoController: easy fallback, but lower fidelity and AdapterRAM can be capped or inaccurate.
- Vulkan/OpenCL: useful generic fallback for name/vendor/device type/memory heaps, but not reliable bandwidth sources.
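The "derived bandwidth" idea from the CUDA bullet can be sketched as a small calculation. This is a minimal sketch, not whichllm code: the function name `derive_bandwidth_gbps` and its parameters are hypothetical, and the default factor of 2 assumes the API reports a double-data-rate memory clock, which does not hold uniformly across memory types. A value derived this way should therefore carry less than full confidence.

```python
def derive_bandwidth_gbps(
    memory_clock_khz: int,
    bus_width_bits: int,
    transfers_per_clock: int = 2,  # assumption: DDR-style clock reporting
) -> float:
    """Theoretical peak memory bandwidth in GB/s from clock and bus width."""
    if memory_clock_khz <= 0 or bus_width_bits <= 0:
        raise ValueError("memory clock and bus width must be positive")
    bytes_per_transfer = bus_width_bits / 8
    transfers_per_second = memory_clock_khz * 1_000 * transfers_per_clock
    return bytes_per_transfer * transfers_per_second / 1e9


# A device reporting a 1546 MHz memory clock on a 384-bit bus.
print(derive_bandwidth_gbps(1_546_000, 384))
```

Because the multiplier depends on the memory technology, a real provider would likely need a per-memory-type table rather than a single default.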
Direction
Build a layered GPU metadata system:
live OS/vendor detection
→ normalized identity
→ catalog lookup
→ derived bandwidth when possible
→ curated overrides
→ conservative fallback
→ source/confidence metadata in JSON/UI
constants.py should become a curated override layer for:
- Apple Silicon bandwidth
- shared-memory APU/iGPU markers
- known catalog corrections
- brand-new GPUs not yet covered by the catalog
- safety overrides for bad or ambiguous vendor/API data
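As an illustration of the shared-memory marker role, the sketch below classifies a GPU's memory kind from a live hint when one exists and falls back to curated name markers otherwise. Everything here is an assumption for illustration: `SHARED_MEMORY_MARKERS`, `normalize_name`, and `classify_memory_kind` are hypothetical names, and the marker strings are examples rather than the project's actual list.

```python
import re
from typing import Optional, Tuple

# Hypothetical curated markers, matched against normalized names.
SHARED_MEMORY_MARKERS = ("radeon graphics", "radeon 780m", "iris xe", "uhd graphics")


def normalize_name(raw_name: str) -> str:
    """Lowercase, drop (TM)/(R) marks, and collapse punctuation to spaces."""
    name = re.sub(r"\((tm|r)\)", " ", raw_name.lower())
    return " ".join(re.sub(r"[^a-z0-9]+", " ", name).split())


def classify_memory_kind(
    raw_name: str, live_integrated_hint: Optional[bool] = None
) -> Tuple[str, str]:
    """Return (memory_kind, source); a live hint wins over name markers."""
    if live_integrated_hint is not None:
        kind = "shared_system" if live_integrated_hint else "discrete"
        return kind, "live"
    name = normalize_name(raw_name)
    if any(marker in name for marker in SHARED_MEMORY_MARKERS):
        return "shared_system", "curated_override"
    # No evidence either way: assume discrete, but label it as a guess.
    return "discrete", "heuristic"


print(classify_memory_kind("AMD Radeon(TM) Graphics"))
```

Matching on normalized substrings rather than full marketing names is one way to avoid listing every name variant by hand, at the cost of needing care that a marker never matches a discrete card.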
Proposed architecture
Add a gpu_metadata / gpu_catalog layer with separate concepts:
DetectedGPU
Live facts from the current machine:
raw name, vendor, PCI IDs, visible memory, shared/unified/discrete hints, backend source.
CatalogGPU
Static facts:
normalized model, memory bandwidth, memory type, bus width, release generation, compute capability.
ResolvedGPU
Merged result:
best identity, memory model, bandwidth, confidence, provenance, notes.
Use source priority per field, not per GPU. Live VRAM should beat catalog VRAM. Catalog bandwidth can beat a heuristic. Curated overrides can correct known bad rows. Every important field should carry source and confidence metadata.
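Per-field source priority could look roughly like the following. This is a sketch under stated assumptions: the `Sourced` wrapper, the `SOURCE_PRIORITY` table, and `resolve_field` are hypothetical names, and the orderings shown are illustrative rather than a settled policy.

```python
from dataclasses import dataclass
from typing import Generic, Optional, Sequence, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Sourced(Generic[T]):
    """A field value together with where it came from."""
    value: T
    source: str
    confidence: str


# Hypothetical per-field orderings: earlier sources win.
SOURCE_PRIORITY = {
    "vram_mb": ("live", "catalog"),
    "memory_bandwidth_gbps": ("live", "catalog", "derived", "curated_override", "heuristic"),
}


def resolve_field(field_name: str, candidates: Sequence[Sourced]) -> Optional[Sourced]:
    """Pick the candidate whose source ranks highest for this field."""
    by_source = {candidate.source: candidate for candidate in candidates}
    for source in SOURCE_PRIORITY[field_name]:
        if source in by_source:
            return by_source[source]
    return None


vram = resolve_field(
    "vram_mb",
    [Sourced(12288, "catalog", "medium"), Sourced(8192, "live", "high")],
)
print(vram)
```

Here the live 8192 MB reading beats the catalog's 12288 MB even though the catalog candidate is listed first, because ranking is decided per field rather than per GPU.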
Example output target:
{
  "memory_bandwidth_gbps": 120,
  "memory_bandwidth_source": "catalog",
  "memory_bandwidth_confidence": "medium",
  "memory_kind": "shared_system",
  "memory_kind_source": "curated_override"
}
Fallback policy
Unknown bandwidth should not collapse tok/s to 0.0 for otherwise valid GPUs.
Recommended priority:
- Direct live bandwidth if a vendor API exposes it.
- Catalog bandwidth from dbgpu with exact or high-confidence normalized match.
- Derived bandwidth from memory clock and bus width.
- Curated Apple/APU/iGPU defaults.
- Conservative vendor/family heuristic.
- Very-low-confidence generic fallback with a wide range and clear warning.
The estimator should prefer a conservative nonzero estimate plus low confidence over a misleading 0 tok/s, unless there is truly no usable GPU path.
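The priority list above can be sketched as an ordered chain of providers that stops at the first usable value and never returns zero. This is an illustrative sketch only: the dictionary keys, the `PROVIDERS` table, the confidence labels, and the 50 GB/s generic fallback are all assumptions made for the example, not values taken from whichllm.

```python
from typing import Callable, NamedTuple, Optional, Sequence, Tuple


class BandwidthEstimate(NamedTuple):
    gbps: float
    source: str
    confidence: str


Provider = Callable[[dict], Optional[float]]


def derived_from_clock(gpu: dict) -> Optional[float]:
    """Derive GB/s from clock and bus width; assumes a DDR-style clock."""
    clock_khz = gpu.get("memory_clock_khz")
    bus_bits = gpu.get("bus_width_bits")
    if not clock_khz or not bus_bits:
        return None
    return clock_khz * 1_000 * 2 * (bus_bits / 8) / 1e9


# Hypothetical provider chain in the recommended priority order.
PROVIDERS: Sequence[Tuple[str, str, Provider]] = (
    ("live", "high", lambda gpu: gpu.get("live_bandwidth_gbps")),
    ("catalog", "medium", lambda gpu: gpu.get("catalog_bandwidth_gbps")),
    ("derived", "medium", derived_from_clock),
    ("curated_override", "medium", lambda gpu: gpu.get("curated_bandwidth_gbps")),
    ("heuristic", "low", lambda gpu: gpu.get("family_bandwidth_gbps")),
)

GENERIC_FALLBACK_GBPS = 50.0  # placeholder assumption, not a measured value


def resolve_bandwidth(
    gpu: dict, providers: Sequence[Tuple[str, str, Provider]] = PROVIDERS
) -> BandwidthEstimate:
    """Return the first positive bandwidth, else a very-low-confidence default."""
    for source, confidence, provider in providers:
        value = provider(gpu)
        if value is not None and value > 0:
            return BandwidthEstimate(float(value), source, confidence)
    return BandwidthEstimate(GENERIC_FALLBACK_GBPS, "generic_fallback", "very_low")


print(resolve_bandwidth({"catalog_bandwidth_gbps": 0.0, "family_bandwidth_gbps": 100.0}))
```

Treating a zero or missing value as "no answer" at each step is what keeps a bad catalog row from collapsing the final estimate to 0 tok/s.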
Implementation phases
- Add metadata/provenance schema for GPU fields.
- Wrap dbgpu behind a catalog provider with exact/fuzzy match confidence and validation.
- Add unknown-bandwidth fallback policy and JSON/UI provenance fields.
- Move current static constants toward curated override data.
- Improve Windows generic detection with DXGI where possible; keep WMI as fallback.
- Add optional vendor providers later: CUDA, AMD SMI/ADLX, Intel Level Zero, Metal/PyObjC.
- Add a debug hardware JSON mode for actionable hardware reports.
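The catalog-provider phase could be prototyped along these lines. This is a sketch, not the dbgpu API: the small `CATALOG` dictionary stands in for catalog rows, and `normalize_name`, `lookup_bandwidth`, and the 0.85 cutoff are hypothetical choices. The validation step requires model numbers to match, since a purely fuzzy score would happily map one model number onto a neighbouring one.

```python
import difflib
import re
from typing import Optional, Tuple

# Stand-in for catalog rows: normalized name -> bandwidth in GB/s.
CATALOG = {
    "geforce rtx 4090": 1008.0,
    "geforce rtx 4080": 716.8,
    "radeon rx 7900 xtx": 960.0,
}

VENDOR_TOKENS = {"nvidia", "amd", "intel"}


def normalize_name(raw_name: str) -> str:
    """Lowercase, strip trademark marks and vendor prefixes."""
    name = re.sub(r"\((tm|r)\)", " ", raw_name.lower())
    tokens = re.sub(r"[^a-z0-9]+", " ", name).split()
    return " ".join(token for token in tokens if token not in VENDOR_TOKENS)


def lookup_bandwidth(raw_name: str, cutoff: float = 0.85) -> Optional[Tuple[str, float, str]]:
    """Return (catalog_key, gbps, match_kind), or None if nothing validates."""
    name = normalize_name(raw_name)
    if name in CATALOG:
        return name, CATALOG[name], "exact"
    digits = re.findall(r"\d+", name)
    for key in difflib.get_close_matches(name, list(CATALOG), n=3, cutoff=cutoff):
        # Validation: a fuzzy hit only counts if the model numbers agree.
        if re.findall(r"\d+", key) == digits:
            return key, CATALOG[key], "fuzzy"
    return None


print(lookup_bandwidth("NVIDIA GeForce RTX4090"))
print(lookup_bandwidth("NVIDIA GeForce RTX 4070"))
```

The second lookup returns no match even though the name is textually very close to two catalog rows, which is the case the match-confidence and validation requirement is meant to catch.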
Acceptance criteria
- Unknown bandwidth should not automatically produce unusable 0.0 tok/s for otherwise valid GPUs.
- Known shared-memory APUs should keep working without requiring every marketing-name variant to be listed manually.
- Future GPUs should get a conservative estimate from family/generation heuristics before a constants update is released.
- constants.py should remain useful for overrides and special cases, but not be the only way to get decent speed estimates.
- JSON should make it clear when GPU bandwidth/classification came from live detection, catalog lookup, derived data, curated override, or heuristic fallback.
- dbgpu should be retained as a catalog provider unless a clearly better maintained/cataloged source replaces it.
Useful references
- dbgpu: https://github.com/painebenjamin/dbgpu
- dbgpu on PyPI: https://pypi.org/project/dbgpu/
- Win32_VideoController: https://learn.microsoft.com/en-us/windows/win32/cimwin32prov/win32-videocontroller
- system_profiler: https://manp.gs/mac/8/system_profiler
Related