Implement maintainer-owned multi-GPU fit simulation #112

@Andyyyy64

Multi-GPU support is now requested in #65 and #110, with deployment strategy details in #52. #84 also has a useful first pass, but current main has moved enough that I want to rework this in smaller maintainer-owned steps.

First-pass scope:

  • parse repeated and comma-separated --gpu values, including forms like 2x RTX 4090
  • represent heterogeneous GPUs without treating them as one perfect pooled device
  • use conservative VRAM fit math for multi-GPU setups
  • keep speed estimates low-confidence unless backend split assumptions are explicit
  • show warnings that explain split and heterogeneous-GPU assumptions
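To make the scope above concrete, here is a rough Python sketch of the parsing and fit steps. It is illustrative only, not the project's code: `parse_gpu_args`, `conservative_fit`, `FitResult`, the 1 GB per-device reserve, and the "even split bounded by the smallest card" rule are all assumptions made up for this example.

```python
import re
from dataclasses import dataclass, field

# Hypothetical pattern for a count prefix such as "2x RTX 4090".
_COUNT_PREFIX = re.compile(r"^(\d+)\s*[xX]\s*(.+?)\s*$")


def parse_gpu_args(values):
    """Flatten repeated and comma-separated --gpu values into one name per device."""
    gpus = []
    for value in values:
        for part in value.split(","):
            part = part.strip()
            if not part:
                continue
            match = _COUNT_PREFIX.match(part)
            if match is None:
                gpus.append(part)
                continue
            count, name = int(match.group(1)), match.group(2)
            if count < 1:
                raise ValueError(f"GPU count must be at least 1: {part!r}")
            gpus.extend([name] * count)
    return gpus


@dataclass
class FitResult:
    fits: bool
    capacity_gb: float
    warnings: list = field(default_factory=list)


def conservative_fit(required_gb, vram_gb_per_gpu, reserve_gb=1.0, even_split=True):
    """Estimate whether a model fits without treating the GPUs as one pooled device.

    Assumptions (not from the issue): each device keeps `reserve_gb` free, and with
    `even_split=True` every device holds an equal share, so the smallest card
    bounds the total. With `even_split=False` the usable VRAM is simply summed.
    """
    if not vram_gb_per_gpu:
        raise ValueError("at least one GPU is required")
    usable = [max(v - reserve_gb, 0.0) for v in vram_gb_per_gpu]
    warnings = []
    if len(usable) > 1:
        warnings.append("multi-GPU: assumes the backend can split the model across devices")
    if len(set(vram_gb_per_gpu)) > 1:
        warnings.append("heterogeneous GPUs: capacity is not pooled perfectly")
    if even_split:
        capacity = min(usable) * len(usable)
    else:
        capacity = sum(usable)
    return FitResult(required_gb <= capacity, capacity, warnings)


if __name__ == "__main__":
    print(parse_gpu_args(["2x RTX 4090", "RTX 3090,A100"]))
    print(conservative_fit(40.0, [24.0, 24.0]))
    print(conservative_fit(40.0, [24.0, 12.0]))
```

With a single GPU the sketch adds no warnings and reduces to a plain "VRAM minus reserve" check, which is one way to keep the existing single-GPU path unchanged.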

Out of scope for the first pass:

  • exact TP/DP throughput prediction
  • PCIe lane and NVLink modeling
  • backend-specific tensor-split tuning

References:

The implementation should preserve existing single-GPU behavior and include regression tests for homogeneous GPUs, heterogeneous GPUs, and partial-offload cases.
