[Hyperloom] Optimize dsr1-fp8-mi355x-sglang, gptoss-fp4-mi355x-vllm#2

Open
lishuoshuo-amd wants to merge 1 commit into main from hyperloom/ci-20260416-0632

Conversation

@lishuoshuo-amd
Owner

Description

Automated performance optimization update from Hyperloom CI.

dsr1-fp8-mi355x-sglang

| Metric | Value |
| --- | --- |
| Baseline (tok/s/GPU) | 309.88 |
| Optimized (tok/s/GPU) | 331.37 |
| Optimization Gain | +6.9% |
| InferenceX Current (tok/s/GPU) | 310.09 |
| vs InferenceX | +6.9% |

Server flag changes:

  • --num-continuous-decode-steps: 4 → 8
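
For context, a minimal sketch of how this flag would be applied to an SGLang server launch. The model path, quantization flag, and port below are placeholders for illustration, not the actual Hyperloom CI configuration:

```shell
# Hypothetical SGLang launch sketch; model path and port are placeholders,
# not the actual CI config. Only --num-continuous-decode-steps reflects
# the change in this PR: the server runs 8 decode iterations per
# scheduling step instead of 4, reducing scheduler overhead per token.
python -m sglang.launch_server \
  --model-path deepseek-ai/DeepSeek-R1 \
  --quantization fp8 \
  --num-continuous-decode-steps 8 \
  --port 30000
```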

gptoss-fp4-mi355x-vllm

| Metric | Value |
| --- | --- |
| Baseline (tok/s/GPU) | 7344.93 |
| Optimized (tok/s/GPU) | 7855.14 |
| Optimization Gain | +7.0% |
| InferenceX Current (tok/s/GPU) | 6585.93 |
| vs InferenceX | +19.3% |

Server flag changes:

  • Add --max-num-seqs 256
  • Add --enable-chunked-prefill
  • Add --max-num-batched-tokens 16384
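
A minimal sketch of the resulting vLLM launch command. The model name and port are placeholders for illustration, not the actual Hyperloom CI configuration; only the three added flags reflect this PR:

```shell
# Hypothetical vLLM launch sketch; model name and port are placeholders,
# not the actual CI config. The three added flags cap concurrent
# sequences at 256, enable chunked prefill so long prompts are split
# across scheduling steps and interleaved with decode, and raise the
# per-step token budget to 16384.
vllm serve openai/gpt-oss-120b \
  --max-num-seqs 256 \
  --enable-chunked-prefill \
  --max-num-batched-tokens 16384 \
  --port 8000
```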

Related Issue

Automated by Hyperloom CI

Type of Change

  • Configuration change

Checklist

  • I have tested my changes locally
  • I have updated documentation if necessary
  • If I changed a container image or config, I have already updated perf-changelog.yaml

…4-mi355x-vllm

- dsr1-fp8-mi355x-sglang: --num-continuous-decode-steps: 4 → 8
- gptoss-fp4-mi355x-vllm: Add --max-num-seqs 256; Add --enable-chunked-prefill; Add --max-num-batched-tokens 16384
