Multi-GPU Deployment Configuration and Optimal Parallel Strategy #52

@Wangxingyan

Problem

It looks like the project only supports matching the best local LLM for single-GPU scenarios.

However, I want to use multiple GPUs (e.g., 4 or 8) to run larger models that cannot fit on a single GPU, or to achieve higher throughput via parallel inference. How can I specify the target number of GPUs for deployment matching?

Proposed Solution

  1. Allow users to specify target GPU count when searching for compatible models
  2. Provide the optimal parallel strategy configuration for model deployment
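
To make the request concrete, here is a rough sketch of the kind of matching I have in mind. Everything in it is hypothetical and not part of this project's API: the names `ParallelPlan` and `suggest_parallel_plan`, the 20% memory overhead for KV cache and activations, and the restriction of the tensor-parallel degree to powers of two are all simplifying assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParallelPlan:
    """Hypothetical result type: how to spread one model over the GPUs."""
    tensor_parallel: int      # GPUs that jointly hold one copy of the model
    replicas: int             # independent copies served in parallel
    per_gpu_mem_gb: float     # estimated memory used on each GPU


def suggest_parallel_plan(
    params_billion: float,
    bytes_per_param: float,
    gpu_mem_gb: float,
    gpu_count: int,
    overhead_fraction: float = 0.2,  # assumed headroom for KV cache etc.
) -> Optional[ParallelPlan]:
    """Pick the smallest power-of-two shard count that fits in GPU memory.

    Returns None when the model does not fit even when sharded across
    all available GPUs.
    """
    # 1e9 params * bytes_per_param / 1e9 bytes-per-GB = params_billion * bytes_per_param
    weights_gb = params_billion * bytes_per_param
    required_gb = weights_gb * (1.0 + overhead_fraction)

    tp = 1
    while tp <= gpu_count:
        per_gpu = required_gb / tp
        if per_gpu <= gpu_mem_gb:
            # Leftover GPUs are used for extra replicas (higher throughput).
            return ParallelPlan(tp, gpu_count // tp, per_gpu)
        tp *= 2
    return None


if __name__ == "__main__":
    # 70B model in fp16 on 8 GPUs with 80 GB each
    print(suggest_parallel_plan(70, 2, 80, 8))
    # 7B model in fp16 on 4 GPUs with 24 GB each
    print(suggest_parallel_plan(7, 2, 24, 4))
```

With these assumptions, a small model on several GPUs would be replicated for throughput, while a large model would be sharded first and replicated only if GPUs are left over. A real implementation would also need to consider pipeline parallelism, interconnect bandwidth, and quantization options.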

Alternatives Considered

No response

Labels: enhancement