Improve GPU catalog coverage beyond constants.py

## Problem

`src/whichllm/constants.py` currently acts as a curated hardware override table for GPU bandwidth, NVIDIA compute capability, and AMD shared-memory APU markers.

That works for common GPUs, but it is not comprehensive and it directly affects recommendation quality:

- Unknown `GPU_BANDWIDTH` entries can make GPU tok/s estimates fall to `0.0` or be reported with low confidence, even when the VRAM fit is correct.
- Unknown AMD/Intel shared-memory APUs can be treated like tiny discrete GPUs if the OS reports only a 512 MB or 4 GB aperture.
- New GPUs require manual constants updates before whichllm can estimate speed well.
- `--gpu` simulation can use `dbgpu`, but real detected hardware still relies heavily on static lookups for bandwidth and shared-memory classification.

The goal is not to add every GPU to `constants.py` by hand. Instead, every GPU should have usable vendor, model, VRAM/unified memory, memory type, bus width, memory bandwidth, compute capability, and memory-classification metadata. That data should come from a layered resolver, with curated constants serving only as overrides.

## Research summary

There is no single source that solves this cleanly.

`dbgpu` is the closest catalog source because it exposes GPU names, memory size, memory type, bus width, memory clock, memory bandwidth, and CUDA-version-style metadata. It should stay in the stack, but it is a static TechPowerUp-backed catalog, not live detection. It cannot know the user's installed laptop variant, active VRAM split, current unified-memory behavior, driver-visible memory, or brand-new GPUs before catalog updates.

PCI IDs and `pci.ids` are useful for stable identity, but they do not provide VRAM, bandwidth, bus width, memory type, or usable shared-memory behavior.

Live APIs are strongest for facts about the installed machine, but each covers only part of the problem:

- NVIDIA NVML / `nvidia-smi`: good for name, VRAM, PCI info, driver data, and compute capability; does not report VRAM bandwidth directly.
- CUDA APIs: can expose bus width and memory clock, so bandwidth can be derived, but this should be optional.
- AMD SMI / ROCm SMI: good on supported AMD Linux systems; may expose VRAM size/type/width/clocks and some bandwidth metrics.
- AMD ADLX: useful for Windows AMD, including integrated/discrete type and VRAM data, but it is a native Windows AMD SDK and should be optional.
- Intel Level Zero Sysman: strong source for Intel memory properties and bandwidth where available, but Python ergonomics are weaker.
- Apple `system_profiler` / Metal: good for identity and unified-memory classification; peak memory bandwidth still needs curated Apple Silicon data.
- DXGI: good generic Windows adapter identity and dedicated/shared memory; no bandwidth.
- WMI `Win32_VideoController`: easy fallback, but lower fidelity and `AdapterRAM` can be capped or inaccurate.
- Vulkan/OpenCL: useful generic fallback for name/vendor/device type/memory heaps, but not reliable bandwidth sources.
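
The CUDA bullet above notes that bandwidth can be derived from bus width and memory clock. A minimal sketch of that arithmetic follows, assuming a double-data-rate interface and the units used by `cudaDeviceProp` (`memoryClockRate` in kHz, `memoryBusWidth` in bits). The function name and the `transfers_per_clock` parameter are illustrative and not part of whichllm.

```python
def derive_bandwidth_gb_per_s(memory_clock_khz: int, bus_width_bits: int,
                              transfers_per_clock: int = 2) -> float:
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes)."""
    if memory_clock_khz <= 0 or bus_width_bits <= 0:
        raise ValueError("memory clock and bus width must be positive")
    bytes_per_transfer = bus_width_bits / 8
    transfers_per_second = memory_clock_khz * 1_000 * transfers_per_clock
    return transfers_per_second * bytes_per_transfer / 1e9


# Example: 1,546,000 kHz memory clock on a 384-bit bus.
print(round(derive_bandwidth_gb_per_s(1_546_000, 384), 1))  # 148.4
```

Reported clocks do not follow one convention across APIs and memory types, so a derived value is probably best treated as lower confidence than a confident catalog match.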

## Direction

Build a layered GPU metadata system:

```text
live OS/vendor detection
→ normalized identity
→ catalog lookup
→ derived bandwidth when possible
→ curated overrides
→ conservative fallback
→ source/confidence metadata in JSON/UI
```

`constants.py` should become a curated override layer for:

- Apple Silicon bandwidth
- shared-memory APU/iGPU markers
- known catalog corrections
- brand-new GPUs not yet covered by the catalog
- safety overrides for bad or ambiguous vendor/API data

## Proposed architecture

Add a `gpu_metadata` / `gpu_catalog` layer with separate concepts:

```text
DetectedGPU
  Live facts from the current machine:
  raw name, vendor, PCI IDs, visible memory, shared/unified/discrete hints, backend source.

CatalogGPU
  Static facts:
  normalized model, memory bandwidth, memory type, bus width, release generation, compute capability.

ResolvedGPU
  Merged result:
  best identity, memory model, bandwidth, confidence, provenance, notes.
```
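
The three concepts above could map onto plain dataclasses. This is only a sketch: the field names are assumptions drawn from the descriptions above, not an existing whichllm schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class DetectedGPU:
    """Live facts reported by the current machine."""
    raw_name: str
    vendor: str
    backend: str                          # detection backend, e.g. "nvml" or "wmi"
    pci_id: Optional[str] = None
    visible_memory_bytes: Optional[int] = None
    memory_hint: Optional[str] = None     # "discrete", "shared", or "unified"


@dataclass(frozen=True)
class CatalogGPU:
    """Static facts from a catalog row."""
    model: str
    memory_bandwidth_gbps: Optional[float] = None
    memory_type: Optional[str] = None
    bus_width_bits: Optional[int] = None
    compute_capability: Optional[str] = None


@dataclass
class ResolvedGPU:
    """Merged result that the estimator would consume."""
    name: str
    memory_kind: str
    memory_bytes: Optional[int]
    memory_bandwidth_gbps: Optional[float]
    confidence: str
    provenance: dict = field(default_factory=dict)  # field name -> source
    notes: list = field(default_factory=list)
```

Keeping `DetectedGPU` and `CatalogGPU` immutable makes it harder for a later stage to silently overwrite a live or catalog fact; only `ResolvedGPU` is meant to be assembled step by step.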

Use source priority per field, not per GPU. Live VRAM should beat catalog VRAM. Catalog bandwidth can beat a heuristic. Curated overrides can correct known bad rows. Every important field should carry source and confidence metadata.
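
The per-field rule above can be sketched as a small resolver. The source labels and the order in `FIELD_PRIORITY` are assumptions for illustration, and the values are placeholders rather than real hardware data.

```python
from dataclasses import dataclass
from typing import Any, Optional

# Hypothetical per-field source order: earlier sources win.
FIELD_PRIORITY = {
    "memory_bytes": ["curated_override", "live", "catalog"],
    "memory_bandwidth_gbps": ["curated_override", "live", "catalog",
                              "derived", "heuristic"],
}


@dataclass(frozen=True)
class Sourced:
    """A candidate value together with where it came from."""
    value: Any
    source: str
    confidence: str


def resolve_field(name: str, candidates: list[Sourced]) -> Optional[Sourced]:
    """Return the candidate from the highest-priority source for this field."""
    order = FIELD_PRIORITY[name]
    usable = [c for c in candidates if c.value is not None and c.source in order]
    return min(usable, key=lambda c: order.index(c.source), default=None)


vram = resolve_field("memory_bytes", [
    Sourced(8 * 1024**3, "catalog", "medium"),
    Sourced(6 * 1024**3, "live", "high"),   # live VRAM beats catalog VRAM
])
bandwidth = resolve_field("memory_bandwidth_gbps", [
    Sourced(90.0, "heuristic", "low"),
    Sourced(120.0, "catalog", "medium"),    # catalog beats a heuristic
])
print(vram.source, bandwidth.source)  # live catalog
```

Because each field is resolved separately, one GPU can end up with live VRAM, catalog bandwidth, and a curated memory classification, each with its own recorded source.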

Example output target:

```json
{
  "memory_bandwidth_gbps": 120,
  "memory_bandwidth_source": "catalog",
  "memory_bandwidth_confidence": "medium",
  "memory_kind": "shared_system",
  "memory_kind_source": "curated_override"
}
```

## Fallback policy

Unknown bandwidth should not collapse tok/s to `0.0` for otherwise valid GPUs.

Recommended priority:

1. Direct live bandwidth if a vendor API exposes it.
2. Catalog bandwidth from `dbgpu` with exact or high-confidence normalized match.
3. Derived bandwidth from memory clock and bus width.
4. Curated Apple/APU/iGPU defaults.
5. Conservative vendor/family heuristic.
6. Very-low-confidence generic fallback with a wide range and clear warning.

The estimator should prefer a conservative nonzero estimate plus low confidence over a misleading `0 tok/s`, unless there is truly no usable GPU path.
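
One way to express this policy is a provider chain that is walked in priority order and ends in a generic nonzero value. The sketch below uses hypothetical names (`Bandwidth`, `resolve_bandwidth`, the source labels) and placeholder numbers; it is not the current whichllm estimator.

```python
from typing import Callable, NamedTuple, Optional


class Bandwidth(NamedTuple):
    gbps: float
    source: str
    confidence: str


# A provider returns a bandwidth in GB/s, or None when it has no answer.
Provider = Callable[[], Optional[float]]


def resolve_bandwidth(providers: list[tuple[str, str, Provider]],
                      generic_gbps: float) -> Bandwidth:
    """Walk providers in priority order; fall back to a generic nonzero value."""
    for source, confidence, provider in providers:
        value = provider()
        if value is not None and value > 0:
            return Bandwidth(value, source, confidence)
    return Bandwidth(generic_gbps, "generic_fallback", "very_low")


chain = [
    ("live", "high", lambda: None),        # vendor API exposes nothing
    ("catalog", "high", lambda: None),     # no confident catalog match
    ("derived", "medium", lambda: 448.0),  # from memory clock and bus width
    ("curated", "medium", lambda: None),
    ("heuristic", "low", lambda: 200.0),
]
print(resolve_bandwidth(chain, generic_gbps=50.0))
# Bandwidth(gbps=448.0, source='derived', confidence='medium')
```

The caller decides whether a GPU path is usable at all before calling this; the chain itself never returns `0.0`, so a zero estimate can be reserved for the case where there is truly no usable GPU.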

## Implementation phases

1. Add metadata/provenance schema for GPU fields.
2. Wrap `dbgpu` behind a catalog provider with exact/fuzzy match confidence and validation.
3. Add unknown-bandwidth fallback policy and JSON/UI provenance fields.
4. Move current static constants toward curated override data.
5. Improve Windows generic detection with DXGI where possible; keep WMI as fallback.
6. Add optional vendor providers later: CUDA, AMD SMI/ADLX, Intel Level Zero, Metal/PyObjC.
7. Add a debug hardware JSON mode for actionable hardware reports.
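
Phase 2 could start as a thin provider that normalizes names and reports how confident the match is. In the sketch below a stand-in dictionary with fictional models replaces the real `dbgpu` catalog, whose API is deliberately not shown; `NOISE`, `normalize`, and `lookup` are hypothetical names.

```python
import re
from typing import Optional

# Stand-in catalog keyed by normalized model name. Models and values are fictional.
CATALOG = {
    "acme x100": {"memory_bandwidth_gbps": 500.0},
    "acme x100 ti": {"memory_bandwidth_gbps": 600.0},
}

# Tokens that carry no model identity in driver-reported names.
NOISE = {"corporation", "tm", "r", "gpu"}


def normalize(name: str) -> str:
    """Lowercase, strip punctuation, and drop noise tokens."""
    tokens = re.sub(r"[^a-z0-9]+", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in NOISE)


def lookup(raw_name: str) -> Optional[dict]:
    """Return a catalog row plus match metadata, or None when nothing fits."""
    key = normalize(raw_name)
    if key in CATALOG:
        return {**CATALOG[key], "model": key, "match": "exact", "confidence": "high"}
    query = set(key.split())
    # Fuzzy step: catalog models whose tokens all appear in the detected name.
    hits = [k for k in CATALOG if set(k.split()) <= query]
    if not hits:
        return None
    best = max(hits, key=lambda k: len(k.split()))  # prefer the most specific model
    return {**CATALOG[best], "model": best, "match": "fuzzy", "confidence": "medium"}


print(lookup("ACME X100 GPU")["match"])                   # exact
print(lookup("Acme(TM) X100 Ti Laptop GPU")["model"])     # acme x100 ti
print(lookup("Unknown Device"))                           # None
```

A fuzzy hit is reported as `medium` rather than `high` because extra tokens such as `laptop` can indicate a variant whose memory configuration differs from the catalog row.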

## Acceptance criteria

- Unknown bandwidth should not automatically produce unusable `0.0` tok/s for otherwise valid GPUs.
- Known shared-memory APUs should keep working without requiring every marketing-name variant to be listed manually.
- Future GPUs should get a conservative estimate from family/generation heuristics before a constants update is released.
- `constants.py` should remain useful for overrides and special cases, but not be the only way to get decent speed estimates.
- JSON should make it clear when GPU bandwidth/classification came from live detection, catalog lookup, derived data, curated override, or heuristic fallback.
- `dbgpu` should be retained as a catalog provider unless a clearly better-maintained catalog source replaces it.

## Useful references

- `dbgpu`: https://github.com/painebenjamin/dbgpu
- `dbgpu` on PyPI: https://pypi.org/project/dbgpu/
- PCI IDs: https://pci-ids.ucw.cz/
- Linux PCI sysfs: https://docs.kernel.org/PCI/sysfs-pci.html
- NVIDIA NVML: https://developer.nvidia.com/management-library-nvml
- NVIDIA CUDA device properties: https://docs.nvidia.com/cuda/cuda-runtime-api/structcudaDeviceProp.html
- AMD SMI Python API: https://rocm.docs.amd.com/projects/amdsmi/en/latest/reference/amdsmi-py-api.html
- AMD ADLX GPU interfaces: https://gpuopen.com/manuals/adlx/adlx-sdk-references/adlx-interfaces/gpu/iadlxgpu/type/
- Intel Level Zero Sysman: https://oneapi-src.github.io/level-zero-spec/level-zero/latest/sysman/api.html
- Windows DXGI adapter descriptor: https://learn.microsoft.com/en-us/windows/win32/api/dxgi1_6/ns-dxgi1_6-dxgi_adapter_desc3
- Windows `Win32_VideoController`: https://learn.microsoft.com/en-us/windows/win32/cimwin32prov/win32-videocontroller
- Apple `system_profiler`: https://manp.gs/mac/8/system_profiler
- Vulkan device properties: https://registry.khronos.org/VulkanSC/specs/1.0-extensions/man/html/VkPhysicalDeviceProperties.html
- OpenCL device info: https://registry.khronos.org/OpenCL/specs/unified/refpages/man/html/clGetDeviceInfo.html

## Related

- #19
- #30
- #36

