Feature Request
Support simulating multiple GPUs via the --gpu flag, including heterogeneous setups (e.g., RTX 5080 + RTX 5060 Ti 16GB).
Motivation
Many users run or plan multi-GPU setups for local LLM inference. Currently whichllm only allows simulating a single GPU, making it impossible to evaluate which models would fit on a dual-GPU rig.
Common use cases:
- Dual RTX 3060 12GB — combined 24 GB VRAM, common budget build https://www.youtube.com/watch?v=3XC8BA5UNBs
- Dual RTX 3090 — 48 GB total, popular used-market combo
- RTX 4090 + RTX 3090 — asymmetric upgrades
- H100 x2, x4 — enterprise setups
Proposed CLI Syntax
# Heterogeneous dual GPU
uvx whichllm@latest --gpu "RTX 5080,RTX 5060 Ti 16GB"
# Identical GPUs with count shorthand
uvx whichllm@latest --gpu "2x RTX 4090"
uvx whichllm@latest --gpu "4x H100"
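The syntax above could be expanded into one GPU name per simulated device before any lookup happens. The following is a minimal sketch, not whichllm's actual code: `parse_gpu_spec` is a hypothetical helper name, and requiring whitespace after the `Nx` prefix is an assumption made so that model names are never mistaken for a count.

```python
import re

# Hypothetical helper, not part of whichllm: matches a count prefix such as "2x " or "4X ".
_COUNT_PREFIX = re.compile(r"^\s*(\d+)\s*[xX]\s+(.+?)\s*$")


def parse_gpu_spec(spec: str) -> list[str]:
    """Expand a --gpu value into a flat list with one name per simulated GPU."""
    names: list[str] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas such as "RTX 4090,"
        match = _COUNT_PREFIX.match(part)
        if match:
            count = int(match.group(1))
            if count < 1:
                raise ValueError(f"GPU count must be at least 1: {part!r}")
            names.extend([match.group(2)] * count)
        else:
            names.append(part)
    if not names:
        raise ValueError("no GPU names given")
    return names


if __name__ == "__main__":
    print(parse_gpu_spec("RTX 5080,RTX 5060 Ti 16GB"))
    print(parse_gpu_spec("2x RTX 4090"))
```

Returning a flat list keeps the count shorthand and the comma form on the same code path, so each name can be resolved to a GPUInfo the same way a single --gpu value is resolved today.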
Technical Notes
The type system already partially supports this — HardwareInfo.gpu is typed as list[GPUInfo]. Physical multi-GPU detection works when multiple GPUs are present. The gap is in the CLI layer and VRAM pooling logic.
What's needed:
- CLI parsing: _handle_gpu_option() should accept comma-separated GPU names or an Nx prefix
- VRAM pooling: combine VRAM across GPUs (total available = sum - overhead)
- Speed estimation: for heterogeneous setups, the slowest GPU becomes the bottleneck for split-tensor inference
- Bandwidth model: inter-GPU communication overhead (PCIe/NVLink) affects effective throughput
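The pooling and speed points could be modelled roughly as follows. This is a sketch under stated assumptions, not a proposal for the final formulas: `SimGPU` is a hypothetical stand-in for GPUInfo, the 0.5 GB per-GPU overhead and the 0.85 interconnect efficiency are placeholder values, and the speed model simply assumes an even tensor split where every step waits for the slowest card.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimGPU:
    """Hypothetical stand-in for whichllm's GPUInfo; field names are assumptions."""
    name: str
    vram_gb: float
    mem_bandwidth_gbps: float


def pooled_vram_gb(gpus: list[SimGPU], overhead_gb_per_gpu: float = 0.5) -> float:
    """Total usable VRAM: sum of per-GPU VRAM minus a fixed per-GPU overhead (placeholder value)."""
    return sum(max(g.vram_gb - overhead_gb_per_gpu, 0.0) for g in gpus)


def effective_bandwidth_gbps(gpus: list[SimGPU], interconnect_efficiency: float = 0.85) -> float:
    """Aggregate memory bandwidth, assuming an even split gated by the slowest GPU.

    interconnect_efficiency is a placeholder factor for PCIe/NVLink overhead.
    """
    if not gpus:
        raise ValueError("at least one GPU is required")
    if len(gpus) == 1:
        return gpus[0].mem_bandwidth_gbps  # no inter-GPU traffic to account for
    slowest = min(g.mem_bandwidth_gbps for g in gpus)
    return slowest * len(gpus) * interconnect_efficiency


if __name__ == "__main__":
    # Illustrative figures only, not taken from whichllm's GPU database.
    rig = [SimGPU("GPU A", 24.0, 1000.0), SimGPU("GPU B", 16.0, 500.0)]
    print(pooled_vram_gb(rig))
    print(effective_bandwidth_gbps(rig))
```

A layer-split (pipeline) setup would behave differently, since cards then work in sequence rather than in lockstep, so the model would likely need to depend on the split mode.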
Environment
uvx whichllm@latest
- Windows 11 / any OS