Multi-GPU support is now requested in #65 and #110, with deployment strategy details in #52. #84 also has a useful first pass, but current `main` has moved enough that I want to rework this in smaller maintainer-owned steps.
First-pass scope:
- parse repeated and comma-separated `--gpu` values, including forms like `2x RTX 4090`
- represent heterogeneous GPUs without treating them as one perfect pooled device
- use conservative VRAM fit math for multi-GPU setups
- keep speed estimates low-confidence unless backend split assumptions are explicit
- show warnings that explain split and heterogeneous-GPU assumptions
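
To make the parsing and fit items above concrete, here is a rough sketch, not a proposed implementation. `GpuSpec`, `parse_gpu_values`, and `conservative_usable_vram_gb` are hypothetical names, and neither the 1 GB per-device reserve nor the even-split rule comes from the existing code. Both are placeholder assumptions to show what "conservative" could mean when the GPUs differ.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuSpec:
    """One --gpu entry: a device name and how many of that device (hypothetical type)."""
    name: str
    count: int = 1


# Matches an optional count prefix such as "2x RTX 4090" or "2xRTX 4090".
_COUNT_PREFIX = re.compile(r"^(\d+)\s*[xX]\s*(.+)$")


def parse_gpu_values(values):
    """Parse repeated and comma-separated --gpu values into GpuSpec entries."""
    specs = []
    for value in values:              # one element per repeated --gpu flag
        for item in value.split(","):  # each flag may hold a comma-separated list
            item = item.strip()
            if not item:
                continue
            match = _COUNT_PREFIX.match(item)
            if match:
                count = int(match.group(1))
                if count < 1:
                    raise ValueError(f"GPU count must be at least 1: {item!r}")
                specs.append(GpuSpec(match.group(2).strip(), count))
            else:
                specs.append(GpuSpec(item))
    return specs


def conservative_usable_vram_gb(per_gpu_vram_gb, reserve_gb=1.0):
    """Estimate usable VRAM without pooling devices into one perfect device.

    Assumption (illustrative only): the backend splits the model evenly, so
    the smallest device limits every device. reserve_gb is a placeholder
    per-device overhead, not a measured value.
    """
    if not per_gpu_vram_gb:
        return 0.0
    smallest = min(per_gpu_vram_gb)
    return len(per_gpu_vram_gb) * max(0.0, smallest - reserve_gb)
```

Under these assumptions, a 24 GB card paired with a 12 GB card is rated well below the naive 36 GB sum, which is the kind of case the split and heterogeneous-GPU warnings would need to explain.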
Out of scope for the first pass:
- exact TP/DP throughput prediction
- PCIe lane and NVLink modeling
- backend-specific tensor-split tuning
References:
The implementation should preserve existing single-GPU behavior and include regression tests for homogeneous GPUs, heterogeneous GPUs, and partial-offload cases.