Feature Request: Multi-GPU simulation via --gpu flag (heterogeneous setups) #65

@cobra91

Feature Request

Support simulating multiple GPUs via the --gpu flag, including heterogeneous setups (e.g., RTX 5080 + RTX 5060 Ti 16GB).

Motivation

Many users run or plan multi-GPU setups for local LLM inference. Currently whichllm only allows simulating a single GPU, making it impossible to evaluate which models would fit on a dual-GPU rig.

Common use cases:

  • Dual RTX 3060 12GB — combined 24 GB VRAM, common budget build https://www.youtube.com/watch?v=3XC8BA5UNBs
  • Dual RTX 3090 — 48 GB total, popular used-market combo
  • RTX 4090 + RTX 3090 — asymmetric upgrades
  • H100 x2, x4 — enterprise setups

Proposed CLI Syntax

# Heterogeneous dual GPU
uvx whichllm@latest --gpu "RTX 5080,RTX 5060 Ti 16GB"

# Identical GPUs with count shorthand
uvx whichllm@latest --gpu "2x RTX 4090"
uvx whichllm@latest --gpu "4x H100"
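A minimal sketch of how both forms could be expanded into one name per simulated GPU. The function name `parse_gpu_spec` and the regex are hypothetical and not taken from the whichllm codebase; resolving each name to a known GPU is assumed to happen elsewhere.

```python
import re

# Hypothetical helper, not part of whichllm: matches a count prefix such as
# "2x RTX 4090" or "4x H100" (digits, "x", whitespace, then the GPU name).
_COUNT_PREFIX = re.compile(r"^(\d+)\s*x\s+(.+)$", re.IGNORECASE)


def parse_gpu_spec(spec: str) -> list[str]:
    """Expand a --gpu value into a list with one GPU name per device."""
    names: list[str] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        match = _COUNT_PREFIX.match(part)
        if match:
            count = int(match.group(1))
            if count < 1:
                raise ValueError(f"GPU count must be at least 1: {part!r}")
            names.extend([match.group(2).strip()] * count)
        else:
            names.append(part)
    if not names:
        raise ValueError("no GPU names given")
    return names


if __name__ == "__main__":
    print(parse_gpu_spec("RTX 5080,RTX 5060 Ti 16GB"))
    print(parse_gpu_spec("2x RTX 4090"))
```

Requiring whitespace after the `x` keeps a plain GPU name that happens to start with a digit from being read as a count. The two forms can also be mixed in one value, e.g. "2x RTX 3090,RTX 4090".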

Technical Notes

The type system already partially supports this — HardwareInfo.gpu is typed as list[GPUInfo]. Physical multi-GPU detection works when multiple GPUs are present. The gap is in the CLI layer and VRAM pooling logic.

What's needed:

  1. CLI parsing: _handle_gpu_option() should accept comma-separated GPU names or an Nx count prefix
  2. VRAM pooling: combine VRAM across GPUs (total available = sum of per-GPU VRAM minus overhead)
  3. Speed estimation: for heterogeneous setups, the slowest GPU becomes the bottleneck for split-tensor inference
  4. Bandwidth model: inter-GPU communication overhead (PCIe/NVLink) affects effective throughput
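Points 2 to 4 could be sketched roughly as follows. Everything here is an assumption for illustration: `SimGPU` is a hypothetical stand-in rather than whichllm's `GPUInfo`, the 0.5 GB per-GPU overhead and the 0.9 interconnect efficiency are placeholder values, and the bottleneck model assumes the weights are split evenly across GPUs.

```python
from dataclasses import dataclass


@dataclass
class SimGPU:
    """Hypothetical stand-in for a simulated GPU (not whichllm's GPUInfo)."""
    name: str
    vram_gb: float
    bandwidth_gbps: float  # memory bandwidth in GB/s


def pooled_vram_gb(gpus: list[SimGPU], overhead_gb_per_gpu: float = 0.5) -> float:
    """Sum of per-GPU VRAM after subtracting an assumed fixed overhead per GPU."""
    return sum(max(g.vram_gb - overhead_gb_per_gpu, 0.0) for g in gpus)


def effective_bandwidth_gbps(
    gpus: list[SimGPU], interconnect_efficiency: float = 0.9
) -> float:
    """Aggregate bandwidth under an even split, limited by the slowest GPU.

    With an even split every GPU reads the same share of the weights, so each
    step finishes when the slowest GPU does: N * min(bandwidth). The efficiency
    factor is a crude placeholder for PCIe/NVLink communication overhead and
    only applies when there is more than one GPU.
    """
    if not gpus:
        raise ValueError("at least one GPU is required")
    slowest = min(g.bandwidth_gbps for g in gpus)
    scale = interconnect_efficiency if len(gpus) > 1 else 1.0
    return len(gpus) * slowest * scale


if __name__ == "__main__":
    # Illustrative spec values only; real values would come from the GPU database.
    rig = [
        SimGPU("RTX 5080", vram_gb=16.0, bandwidth_gbps=960.0),
        SimGPU("RTX 5060 Ti 16GB", vram_gb=16.0, bandwidth_gbps=448.0),
    ]
    print(f"pooled VRAM: {pooled_vram_gb(rig):.1f} GB")
    print(f"effective bandwidth: {effective_bandwidth_gbps(rig):.1f} GB/s")
```

A single efficiency factor is the simplest possible bandwidth model; a real implementation would likely distinguish PCIe from NVLink and may prefer a VRAM-proportional split for heterogeneous cards.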

Environment

  • uvx whichllm@latest
  • Windows 11 / any OS
