
[AMD] Optimize kimik2.5-int4-mi300x-vllm, dsr1-fp8-mi300x-sglang#11

Open
lishuoshuo-amd wants to merge 1 commit into main from hyperloom/ci-20260505-optimize

Conversation

@lishuoshuo-amd
Owner

Automated performance optimization from Hyperloom CI (2026-05-05).

kimik2.5-int4-mi300x-vllm

| Metric | Value |
| --- | --- |
| Baseline (tok/s/GPU) | 38.65 |
| Optimized (tok/s/GPU) | 51.86 |
| Optimization Gain | +34.2% |
| InferenceX (tok/s/GPU) | 39.25 |
| vs InferenceX | +32.1% |
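The percentage gains in the table above can be sanity-checked with a couple of lines of Python; the numbers are copied from the table and the formula is the standard relative-gain calculation:

```python
# Sanity-check the reported throughput gains for kimik2.5-int4-mi300x-vllm.
baseline, optimized, inferencex = 38.65, 51.86, 39.25

gain = (optimized / baseline - 1) * 100      # gain vs our own baseline
vs_ix = (optimized / inferencex - 1) * 100   # gain vs InferenceX

print(f"+{gain:.1f}%")   # +34.2%
print(f"+{vs_ix:.1f}%")  # +32.1%
```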

Server flag changes:

  • --max-num-seqs: 256 → 128
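For reference, a minimal sketch of where this flag lands in a vLLM server launch; `<model>` is a placeholder and only the `--max-num-seqs` value comes from this PR:

```shell
# Hedged sketch of a vLLM server launch with the tuned flag.
# <model> is a placeholder; only --max-num-seqs is taken from this PR.
# --max-num-seqs: 256 → 128 (caps concurrent sequences per scheduling iteration)
vllm serve <model> --max-num-seqs 128
```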

dsr1-fp8-mi300x-sglang

| Metric | Value |
| --- | --- |
| Baseline (tok/s/GPU) | 88.0 |
| Optimized (tok/s/GPU) | 91.7 |
| Optimization Gain | +4.2% |
| InferenceX (tok/s/GPU) | 149.31 |
| vs InferenceX | −38.6% |

Server flag changes:

  • --mem-fraction-static: 0.8 → 0.85
  • --num-continuous-decode-steps: 4 → 16
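For reference, a minimal sketch of where these flags land in an SGLang server launch; `<model>` is a placeholder and only the two tuned values come from this PR:

```shell
# Hedged sketch of an SGLang server launch with the tuned flags.
# <model> is a placeholder; only the two flag values are taken from this PR.
# --mem-fraction-static: 0.8 → 0.85 (larger GPU memory fraction for weights/KV cache)
# --num-continuous-decode-steps: 4 → 16 (more decode steps per scheduling pass)
python -m sglang.launch_server --model-path <model> \
  --mem-fraction-static 0.85 --num-continuous-decode-steps 16
```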

Related Issue

Automated by Hyperloom CI

Type of Change

  • Configuration change

Checklist

  • I have tested my changes locally
  • I have updated documentation if necessary
  • If I changed a container image or config, I have already updated perf-changelog.yaml

…glang

kimik2.5: --max-num-seqs 256 → 128 (+34.2% output throughput gain)
dsr1: --mem-fraction-static 0.8 → 0.85, --num-continuous-decode-steps 4 → 16 (+4.2%)

Co-authored-by: Cursor <cursoragent@cursor.com>
@github-actions

github-actions Bot commented May 7, 2026

Thanks for the contribution! For vLLM and SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please open a documentation PR first before we can merge your PR into the master branch. Let's keep the documentation first class so that the entire ML community can benefit from your hard work. Thank you!

PR authors are responsible for ensuring that, after merging, all GitHub Actions jobs fully pass. Often, failures are just flakes, and simply re-running the failed jobs will fix them. If re-running failed jobs is attempted, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

If additional help is needed, PR authors can reach out to core maintainers over Slack.

@lishuoshuo-amd lishuoshuo-amd added the verify-enabled Validate PR label May 7, 2026
@lishuoshuo-amd lishuoshuo-amd changed the title [AMD/Hyperloom] Optimize kimik2.5-int4-mi300x-vllm, dsr1-fp8-mi300x-sglang [AMD] Optimize kimik2.5-int4-mi300x-vllm, dsr1-fp8-mi300x-sglang May 7, 2026
