
Improve GPU catalog coverage beyond constants.py #47

@Andyyyy64


Problem

src/whichllm/constants.py currently acts as a curated hardware override table for GPU bandwidth, NVIDIA compute capability, and AMD shared-memory APU markers.

That works for common GPUs, but it is not comprehensive and it directly affects recommendation quality:

  • GPUs missing from GPU_BANDWIDTH can drive tok/s estimates down to 0.0 or to low-confidence output, even when the VRAM fit is correct.
  • Unknown AMD/Intel shared-memory APUs can be treated like tiny discrete GPUs if the OS reports only a 512 MB or 4 GB aperture.
  • New GPUs require manual constants updates before whichllm can estimate speed well.
  • --gpu simulation can use dbgpu, but real detected hardware still relies heavily on static lookups for bandwidth and shared-memory classification.

The goal is not to add every GPU to constants.py by hand. Instead, every GPU should resolve to usable metadata: vendor, model, VRAM or unified memory, memory type, bus width, memory bandwidth, compute capability, and memory classification. That data should come from a layered resolver, with curated constants serving only as overrides.

Research summary

There is no single source that solves this cleanly.

dbgpu is the closest catalog source because it exposes GPU names, memory size, memory type, bus width, memory clock, memory bandwidth, and CUDA-version-style metadata. It should stay in the stack, but it is a static TechPowerUp-backed catalog, not live detection. It cannot know the user's installed laptop variant, active VRAM split, current unified-memory behavior, driver-visible memory, or brand-new GPUs before catalog updates.

PCI IDs and pci.ids are useful for stable identity, but they do not provide VRAM, bandwidth, bus width, memory type, or usable shared-memory behavior.
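To make the split concrete: a PCI ID answers "who made this device" reliably, but nothing in it says how fast the memory is. A minimal sketch, assuming IDs arrive as `"vendor:device"` hex strings; the helper names are hypothetical, while the three vendor IDs are the standard PCI assignments.

```python
# Hypothetical helper names; the vendor IDs are the standard PCI vendor assignments.
PCI_VENDOR_IDS = {
    0x10DE: "nvidia",
    0x1002: "amd",
    0x8086: "intel",
}


def parse_pci_id(text: str) -> tuple:
    """Parse a "vendor:device" hex pair such as "10de:abcd" into two ints."""
    vendor_hex, device_hex = text.strip().split(":")
    return int(vendor_hex, 16), int(device_hex, 16)


def vendor_from_pci(pci_id: str) -> str:
    """Map a PCI ID string to a vendor label; unlisted vendors become "unknown".

    The device ID could be looked up in pci.ids for a display name, but that
    still yields no VRAM, bus width, or bandwidth.
    """
    vendor_id, _device_id = parse_pci_id(pci_id)
    return PCI_VENDOR_IDS.get(vendor_id, "unknown")
```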

Live APIs are strongest for facts about the installed machine, but each covers only part of the problem:

  • NVIDIA NVML / nvidia-smi: good for name, VRAM, PCI info, driver data, and compute capability; not enough for direct VRAM bandwidth.
  • CUDA APIs: expose memory bus width and memory clock, from which peak bandwidth can be derived; this provider should be optional.
  • AMD SMI / ROCm SMI: good on supported AMD Linux systems; may expose VRAM size/type/width/clocks and some bandwidth metrics.
  • AMD ADLX: useful for Windows AMD, including integrated/discrete type and VRAM data, but it is a native Windows AMD SDK and should be optional.
  • Intel Level Zero Sysman: strong source for Intel memory properties and bandwidth where available, but Python ergonomics are weaker.
  • Apple system_profiler / Metal: good for identity and unified-memory classification; peak memory bandwidth still needs curated Apple Silicon data.
  • DXGI: good generic Windows adapter identity and dedicated/shared memory; no bandwidth.
  • WMI Win32_VideoController: easy fallback, but lower fidelity; AdapterRAM is a 32-bit value, so it caps at about 4 GB and can be inaccurate even below that.
  • Vulkan/OpenCL: useful generic fallback for name/vendor/device type/memory heaps, but not reliable bandwidth sources.
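For the CUDA path, deriving bandwidth is simple arithmetic. The sketch below mirrors the peak-bandwidth formula from NVIDIA's device-query examples (memory clock in kHz, bus width in bits, two transfers per clock). The function name is hypothetical, and the default of two transfers per clock is an assumption that may not match every memory type, so the result should be tagged as "derived" rather than exact.

```python
from typing import Optional


def derived_bandwidth_gbps(memory_clock_khz: Optional[int],
                           bus_width_bits: Optional[int],
                           transfers_per_clock: float = 2.0) -> Optional[float]:
    """Peak memory bandwidth in GB/s from memory clock (kHz) and bus width (bits).

    kHz * 1e3 gives Hz; times transfers per clock and bytes per transfer gives
    bytes/s; dividing by 1e9 gives GB/s, which simplifies to the / 1e6 below.
    transfers_per_clock=2.0 is an assumption, not a universal constant.
    """
    if not memory_clock_khz or not bus_width_bits:
        return None  # missing inputs: let the next fallback layer decide
    bytes_per_transfer = bus_width_bits / 8
    return transfers_per_clock * memory_clock_khz * bytes_per_transfer / 1e6
```

For example, a 256-bit bus with a reported 7,000,000 kHz memory clock works out to 448 GB/s under these assumptions.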

Direction

Build a layered GPU metadata system:

live OS/vendor detection
→ normalized identity
→ catalog lookup
→ derived bandwidth when possible
→ curated overrides
→ conservative fallback
→ source/confidence metadata in JSON/UI

constants.py should become a curated override layer for:

  • Apple Silicon bandwidth
  • shared-memory APU/iGPU markers
  • known catalog corrections
  • brand-new GPUs not yet covered by the catalog
  • safety overrides for bad or ambiguous vendor/API data
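One way the shared-memory markers could avoid listing every marketing variant is to match short family substrings after normalizing the reported name. This is a minimal sketch; the marker tuple and function names are hypothetical, and the two Radeon iGPU names are only illustrative entries. The output keys follow the JSON shape used later in this issue.

```python
import re
from typing import Dict, Optional

# Hypothetical override table: substrings that mark shared-memory iGPUs/APUs.
# A short family marker covers variants like "AMD Radeon(TM) 780M Graphics".
SHARED_MEMORY_MARKERS = ("radeon 780m", "radeon 890m")


def normalize_name(raw_name: str) -> str:
    """Lowercase, drop (TM)/(R) decorations, and collapse whitespace."""
    name = re.sub(r"\((?:tm|r)\)", "", raw_name.lower())
    return " ".join(name.split())


def curated_memory_kind(raw_name: str) -> Optional[Dict[str, str]]:
    """Return a memory-kind override, or None so lower layers can decide.

    A match classifies the device as shared system memory even when the OS
    reports only a small aperture (e.g. 512 MB) as its "VRAM".
    """
    name = normalize_name(raw_name)
    if any(marker in name for marker in SHARED_MEMORY_MARKERS):
        return {"memory_kind": "shared_system",
                "memory_kind_source": "curated_override"}
    return None
```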

Proposed architecture

Add a gpu_metadata / gpu_catalog layer with separate concepts:

DetectedGPU
  Live facts from the current machine:
  raw name, vendor, PCI IDs, visible memory, shared/unified/discrete hints, backend source.

CatalogGPU
  Static facts:
  normalized model, memory bandwidth, memory type, bus width, release generation, compute capability.

ResolvedGPU
  Merged result:
  best identity, memory model, bandwidth, confidence, provenance, notes.
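The three concepts map naturally onto plain dataclasses. The sketch below is one possible shape, not a settled schema: every class and field name is hypothetical and simply mirrors the descriptions above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DetectedGPU:
    """Live facts from the current machine (OS or vendor API)."""
    raw_name: str
    vendor: str
    pci_id: Optional[str] = None              # "vendor:device" hex pair
    visible_memory_gb: Optional[float] = None
    memory_hint: Optional[str] = None         # "discrete", "shared" or "unified"
    backend: str = "unknown"                  # detection source, e.g. "nvml", "dxgi", "wmi"


@dataclass
class CatalogGPU:
    """Static facts from a catalog such as dbgpu, or from curated data."""
    normalized_model: str
    memory_bandwidth_gbps: Optional[float] = None
    memory_type: Optional[str] = None
    bus_width_bits: Optional[int] = None
    generation: Optional[str] = None
    compute_capability: Optional[str] = None  # NVIDIA only, e.g. "8.9"


@dataclass
class ResolvedGPU:
    """Merged view handed to the estimator and to JSON/UI output."""
    name: str
    memory_kind: str                          # "discrete", "shared_system" or "unified"
    memory_gb: Optional[float]
    memory_bandwidth_gbps: Optional[float]
    confidence: str                           # "high", "medium", "low" or "very_low"
    provenance: Dict[str, str] = field(default_factory=dict)  # field name -> source
    notes: List[str] = field(default_factory=list)
```

Keeping live and catalog facts in separate types makes it harder to silently overwrite a measured value with a catalog guess.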

Use source priority per field, not per GPU. Live VRAM should beat catalog VRAM. Catalog bandwidth can beat a heuristic. Curated overrides can correct known bad rows. Every important field should carry source and confidence metadata.
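Per-field priority can be expressed as an ordered source list per field, with the first source that has a value winning and stamping its name and confidence onto the output. A minimal sketch, assuming candidates arrive as `{source: {field: value}}`; the source names, priorities, and confidence labels are all hypothetical choices. Here curated overrides sit ahead of the catalog for bandwidth so that known-bad catalog rows can be corrected.

```python
from typing import Any, Dict, List

# Hypothetical per-field priorities (earlier wins).
FIELD_PRIORITY: Dict[str, List[str]] = {
    "memory_gb": ["live", "curated_override", "catalog"],
    "memory_bandwidth_gbps": ["live", "curated_override", "catalog", "derived", "heuristic"],
    "memory_kind": ["curated_override", "live", "catalog", "heuristic"],
}

# Hypothetical default confidence attached to each source.
SOURCE_CONFIDENCE: Dict[str, str] = {
    "live": "high",
    "curated_override": "high",
    "catalog": "medium",
    "derived": "medium",
    "heuristic": "low",
}


def merge_fields(candidates: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Pick each field from the highest-priority source that has a value."""
    resolved: Dict[str, Any] = {}
    for field_name, priority in FIELD_PRIORITY.items():
        for source in priority:
            value = candidates.get(source, {}).get(field_name)
            if value is not None:
                resolved[field_name] = value
                resolved[f"{field_name}_source"] = source
                resolved[f"{field_name}_confidence"] = SOURCE_CONFIDENCE[source]
                break  # stop at the first source that supplied this field
    return resolved
```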

Example output target:

{
  "memory_bandwidth_gbps": 120,
  "memory_bandwidth_source": "catalog",
  "memory_bandwidth_confidence": "medium",
  "memory_kind": "shared_system",
  "memory_kind_source": "curated_override"
}

Fallback policy

Unknown bandwidth should not collapse tok/s to 0.0 for otherwise valid GPUs.

Recommended priority:

  1. Direct live bandwidth if a vendor API exposes it.
  2. Catalog bandwidth from dbgpu with exact or high-confidence normalized match.
  3. Derived bandwidth from memory clock and bus width.
  4. Curated Apple/APU/iGPU defaults.
  5. Conservative vendor/family heuristic.
  6. Very-low-confidence generic fallback with a wide range and clear warning.

The estimator should prefer a conservative nonzero estimate plus low confidence over a misleading 0 tok/s, unless there is truly no usable GPU path.
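The sketch below shows how an estimator might honor that rule: a known bandwidth gives a point estimate, while an unknown one gives a wide, very-low-confidence range whose conservative end is reported, never 0.0. It assumes the common memory-bound approximation (each generated token reads all weights once, so tok/s is roughly bandwidth divided by model size), which may not be how whichllm actually estimates speed. The generic bandwidth range is a placeholder, not measured data, and deciding that there is "truly no usable GPU path" is left to the caller.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical placeholder range (GB/s) for a GPU with no bandwidth data at all.
# Illustrative values only, not measurements.
GENERIC_BANDWIDTH_RANGE_GBPS: Tuple[float, float] = (50.0, 200.0)


@dataclass
class SpeedEstimate:
    tok_per_s: float                          # point estimate (conservative end when unsure)
    tok_per_s_range: Tuple[float, float]
    confidence: str
    warning: Optional[str] = None


def estimate_decode_tok_s(model_size_gb: float,
                          bandwidth_gbps: Optional[float],
                          bandwidth_confidence: str = "medium") -> SpeedEstimate:
    """Memory-bound approximation: each generated token reads all weights once."""
    if model_size_gb <= 0:
        raise ValueError("model_size_gb must be positive")
    if bandwidth_gbps:  # None or 0.0 both mean "unknown", never "zero speed"
        tok_s = bandwidth_gbps / model_size_gb
        return SpeedEstimate(tok_s, (tok_s, tok_s), bandwidth_confidence)
    low_bw, high_bw = GENERIC_BANDWIDTH_RANGE_GBPS
    low, high = low_bw / model_size_gb, high_bw / model_size_gb
    return SpeedEstimate(
        tok_per_s=low,  # report the conservative end of the range
        tok_per_s_range=(low, high),
        confidence="very_low",
        warning="GPU memory bandwidth unknown; using a generic fallback range",
    )
```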

Implementation phases

  1. Add metadata/provenance schema for GPU fields.
  2. Wrap dbgpu behind a catalog provider with exact/fuzzy match confidence and validation.
  3. Add unknown-bandwidth fallback policy and JSON/UI provenance fields.
  4. Move current static constants toward curated override data.
  5. Improve Windows generic detection with DXGI where possible; keep WMI as fallback.
  6. Add optional vendor providers later: CUDA, AMD SMI/ADLX, Intel Level Zero, Metal/PyObjC.
  7. Add a debug hardware JSON mode for actionable hardware reports.
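Phase 2's exact/fuzzy distinction can be sketched with the standard library alone. Here the catalog is passed in as a plain dict so the sketch does not depend on dbgpu's own API, which is not shown; the function names, the vendor-prefix list, and the 0.9 cutoff are hypothetical. `difflib.get_close_matches` handles the fuzzy step.

```python
import difflib
import re
from typing import Any, Dict, Optional, Tuple

VENDOR_PREFIXES = ("nvidia ", "amd ", "intel ")


def normalize_gpu_name(name: str) -> str:
    """Lowercase, drop trademark marks, collapse whitespace, strip vendor prefix."""
    name = re.sub(r"\((?:tm|r)\)|[®™]", "", name.lower())
    name = " ".join(name.split())
    for prefix in VENDOR_PREFIXES:
        if name.startswith(prefix):
            name = name[len(prefix):]
            break
    return name


def match_catalog(raw_name: str, catalog: Dict[str, Any],
                  cutoff: float = 0.9) -> Tuple[Optional[Any], str]:
    """Return (entry, match_kind) where match_kind is "exact", "fuzzy" or "none"."""
    index = {normalize_gpu_name(key): entry for key, entry in catalog.items()}
    key = normalize_gpu_name(raw_name)
    if key in index:
        return index[key], "exact"
    close = difflib.get_close_matches(key, list(index), n=1, cutoff=cutoff)
    if close:
        # Fuzzy hits can confuse near-identical models (e.g. a base vs. "Ti" part),
        # so callers should downgrade confidence and validate against live VRAM.
        return index[close[0]], "fuzzy"
    return None, "none"
```

A spacing variant such as "NVIDIA GeForce RTX4090" would come back as a fuzzy match for a catalog key "GeForce RTX 4090", which is exactly the case where the match kind should lower the reported confidence.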

Acceptance criteria

  • Unknown bandwidth should not automatically produce unusable 0.0 tok/s for otherwise valid GPUs.
  • Known shared-memory APUs should keep working without requiring every marketing-name variant to be listed manually.
  • Future GPUs should get a conservative estimate from family/generation heuristics before a constants update is released.
  • constants.py should remain useful for overrides and special cases, but not be the only way to get decent speed estimates.
  • JSON should make it clear when GPU bandwidth/classification came from live detection, catalog lookup, derived data, curated override, or heuristic fallback.
  • dbgpu should be retained as a catalog provider unless a clearly better maintained/cataloged source replaces it.


Labels: enhancement (New feature or request)
