Problem
The project currently appears to support matching the best local LLM only for single-GPU setups.
However, I want to use multiple GPUs (e.g., 4 or 8) to run larger models that cannot fit on a single GPU, or to achieve higher throughput via parallel inference. How can I specify the target number of GPUs for deployment matching?
Proposed Solution
- Allow users to specify a target GPU count when searching for compatible models
- Recommend the best parallelism strategy configuration for deploying the matched model
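
To make the request concrete, here is a rough sketch of what GPU-count-aware matching could look like. Everything in it is an assumption for illustration: the `Model` class, the `pick_tensor_parallel_size` function, the 20% memory overhead reserve, and the example model figures are hypothetical and not part of this project's existing API. It only considers tensor parallelism and only weight memory, so a real implementation would also need to account for KV cache, pipeline parallelism, and interconnect bandwidth.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Model:
    """Hypothetical model description used only for this sketch."""
    name: str
    weights_gb: float          # memory needed for the weights at the chosen precision
    num_attention_heads: int   # tensor parallelism usually needs heads divisible by the TP degree


def pick_tensor_parallel_size(
    model: Model,
    gpu_count: int,
    vram_per_gpu_gb: float,
    overhead_fraction: float = 0.2,  # assumed reserve for activations and KV cache
) -> Optional[int]:
    """Return the smallest power-of-two tensor-parallel degree that fits, or None."""
    usable_gb = vram_per_gpu_gb * (1 - overhead_fraction)
    tp = 1
    while tp <= gpu_count:
        heads_divisible = model.num_attention_heads % tp == 0
        shard_fits = model.weights_gb / tp <= usable_gb
        if heads_divisible and shard_fits:
            return tp
        tp *= 2
    return None


# Illustrative numbers only: a 70B-parameter model at 2 bytes per parameter is about 140 GB.
big = Model(name="example-70b", weights_gb=140.0, num_attention_heads=64)
small = Model(name="example-7b", weights_gb=14.0, num_attention_heads=32)

tp_big_4gpu = pick_tensor_parallel_size(big, gpu_count=4, vram_per_gpu_gb=80.0)
tp_big_1gpu = pick_tensor_parallel_size(big, gpu_count=1, vram_per_gpu_gb=80.0)
tp_small_4gpu = pick_tensor_parallel_size(small, gpu_count=4, vram_per_gpu_gb=24.0)
```

With something like this, a user could pass a GPU count (for example through a hypothetical command-line option) and get back both the list of models that fit and a suggested parallel degree for each. Returning the smallest degree that fits leaves the remaining GPUs free for data-parallel replicas, which addresses the throughput use case.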
Alternatives Considered
No response