Implement maintainer-owned multi-GPU fit simulation

Multi-GPU support has been requested in #65 and #110, with deployment strategy details in #52. #84 is a useful first pass, but `main` has since moved far enough that I want to rework this in smaller, maintainer-owned steps.

First-pass scope:

- parse repeated and comma-separated `--gpu` values, including forms like `2x RTX 4090`
- represent heterogeneous GPUs without treating them as one perfect pooled device
- use conservative VRAM fit math for multi-GPU setups
- keep speed estimates low-confidence unless backend split assumptions are explicit
- show warnings that explain split and heterogeneous-GPU assumptions
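
To make the parsing item concrete, here is a minimal sketch of how repeated and comma-separated `--gpu` values might be normalized. `GpuSpec` and `parse_gpu_args` are hypothetical names used only for illustration, not existing code in this repository, and the `Nx` count-prefix grammar is an assumption based on the `2x RTX 4090` example above.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuSpec:
    """One requested GPU model and how many of it (hypothetical type)."""
    name: str
    count: int = 1


# Assumed grammar: an optional "<N>x" prefix, e.g. "2x RTX 4090" or "2xA100".
_COUNT_PREFIX = re.compile(r"^(\d+)\s*[xX]\s*(.+)$")


def parse_gpu_args(values):
    """Normalize raw --gpu values (one string per flag occurrence).

    Each value may itself be a comma-separated list. Entries are kept
    separate so heterogeneous setups are not merged into one device.
    """
    specs = []
    for raw in values:
        for part in raw.split(","):
            part = part.strip()
            if not part:
                continue  # tolerate trailing or doubled commas
            match = _COUNT_PREFIX.match(part)
            if match:
                count, name = int(match.group(1)), match.group(2).strip()
            else:
                count, name = 1, part
            if count < 1:
                raise ValueError(f"GPU count must be at least 1: {part!r}")
            specs.append(GpuSpec(name=name, count=count))
    return specs


if __name__ == "__main__":
    # Equivalent to: --gpu "2x RTX 4090, RTX 3090" --gpu A100
    for spec in parse_gpu_args(["2x RTX 4090, RTX 3090", "A100"]):
        print(spec)
```

Keeping one `GpuSpec` per model, rather than summing VRAM at parse time, leaves the heterogeneous-GPU decision to the fit logic.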

Out of scope for the first pass:

- exact tensor-parallel (TP) or data-parallel (DP) throughput prediction
- PCIe lane and NVLink modeling
- backend-specific tensor-split tuning

References:

- #65
- #52
- #84
- #110

The implementation should preserve existing single-GPU behavior and include regression tests for homogeneous GPUs, heterogeneous GPUs, and partial-offload cases.
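
As an illustration of the three regression cases, the sketch below fits whole layers onto each GPU separately instead of pooling VRAM. Everything in it is an assumption made for the example: the names `FitResult` and `conservative_fit`, the uniform per-layer size, and the fixed per-GPU reserve (which stands in for KV cache and runtime overhead). It is not the project's actual fit math.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FitResult:
    """Outcome of a per-device fit (hypothetical type)."""
    layers_per_gpu: list
    offloaded_layers: int  # layers that did not fit on any GPU
    fits_fully: bool


def conservative_fit(vram_gb, n_layers, layer_gb, reserve_gb=1.0):
    """Place whole layers on GPUs in order, never pooling VRAM.

    vram_gb: VRAM of each GPU in GB, one entry per device.
    reserve_gb: assumed per-GPU headroom held back from the model.
    """
    if layer_gb <= 0:
        raise ValueError("layer_gb must be positive")
    placed = []
    remaining = n_layers
    for vram in vram_gb:
        usable = max(vram - reserve_gb, 0.0)
        capacity = int(usable // layer_gb)  # whole layers only
        take = min(capacity, remaining)
        placed.append(take)
        remaining -= take
    return FitResult(placed, remaining, remaining == 0)


if __name__ == "__main__":
    # Homogeneous: two 24 GB cards, 40 layers of 1 GB each.
    print(conservative_fit([24.0, 24.0], 40, 1.0))
    # Heterogeneous: pooled VRAM (32 GB) exceeds the 30 GB of weights,
    # but per-device placement leaves one layer for partial offload.
    print(conservative_fit([24.0, 8.0], 40, 0.75))
    # Single GPU: same function, one-element list.
    print(conservative_fit([24.0], 20, 1.0))
```

The heterogeneous case shows why pooling is optimistic: 24 GB + 8 GB looks sufficient for 30 GB of weights, yet once the reserve is subtracted and layers are kept whole, one layer no longer fits.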

Implement maintainer-owned multi-GPU fit simulation #112
