
[AMD] Optimize kimik2.5-int4-mi300x-vllm, dsr1-fp8-mi300x-sglang#11

Open
lishuoshuo-amd wants to merge 1 commit into main from hyperloom/ci-20260505-optimize

Conversation

@lishuoshuo-amd
Owner

Automated performance optimization from Hyperloom CI (2026-05-05).

kimik2.5-int4-mi300x-vllm

| Metric | Value |
| --- | --- |
| Baseline (tok/s/GPU) | 38.65 |
| Optimized (tok/s/GPU) | 51.86 |
| Optimization Gain | +34.2% |
| InferenceX (tok/s/GPU) | 39.25 |
| vs InferenceX | +32.1% |
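The percentage gains in the table above can be sanity-checked with a couple of lines of Python; the numbers are copied from the table and the formula is the standard relative-gain calculation:

```python
# Sanity-check the reported throughput gains for kimik2.5-int4-mi300x-vllm.
baseline, optimized, inferencex = 38.65, 51.86, 39.25

gain = (optimized / baseline - 1) * 100      # gain vs our own baseline
vs_ix = (optimized / inferencex - 1) * 100   # gain vs InferenceX

print(f"+{gain:.1f}%")   # +34.2%
print(f"+{vs_ix:.1f}%")  # +32.1%
```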

Server flag changes:

  • --max-num-seqs: 256 → 128
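For reference, a minimal sketch of where this flag lands in a vLLM server launch; `<model>` is a placeholder and only the `--max-num-seqs` value comes from this PR:

```shell
# Hedged sketch of a vLLM server launch with the tuned flag.
# <model> is a placeholder; only --max-num-seqs is taken from this PR.
# --max-num-seqs: 256 → 128 (caps concurrent sequences per scheduling iteration)
vllm serve <model> --max-num-seqs 128
```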

dsr1-fp8-mi300x-sglang

| Metric | Value |
| --- | --- |
| Baseline (tok/s/GPU) | 88.0 |
| Optimized (tok/s/GPU) | 91.7 |
| Optimization Gain | +4.2% |
| InferenceX (tok/s/GPU) | 149.31 |
| vs InferenceX | −38.6% |

Server flag changes:

  • --mem-fraction-static: 0.8 → 0.85
  • --num-continuous-decode-steps: 4 → 16
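For reference, a minimal sketch of where these flags land in an SGLang server launch; `<model>` is a placeholder and only the two tuned values come from this PR:

```shell
# Hedged sketch of an SGLang server launch with the tuned flags.
# <model> is a placeholder; only the two flag values are taken from this PR.
# --mem-fraction-static: 0.8 → 0.85 (larger GPU memory fraction for weights/KV cache)
# --num-continuous-decode-steps: 4 → 16 (more decode steps per scheduling pass)
python -m sglang.launch_server --model-path <model> \
  --mem-fraction-static 0.85 --num-continuous-decode-steps 16
```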

Related Issue

Automated by Hyperloom CI

Type of Change

  • Configuration change

Checklist

  • I have tested my changes locally
  • I have updated documentation if necessary
  • If I changed a container image or config, I have already updated perf-changelog.yaml

…glang

kimik2.5: --max-num-seqs 256 → 128 (+34.2% output throughput gain)
dsr1: --mem-fraction-static 0.8 → 0.85, --num-continuous-decode-steps 4 → 16 (+4.2%)

Co-authored-by: Cursor <cursoragent@cursor.com>
@github-actions

github-actions Bot commented May 7, 2026

Thanks for the contribution! For vLLM and SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please open a documentation PR first before we can merge your PR into the master branch. Let's keep the documentation first class so that the entire ML community can benefit from your hard work. Thank you!

PR authors are responsible for ensuring that, after merging, all GitHub Actions jobs fully pass. Often, failures are just flakes, and simply re-running the failed jobs will fix them. If re-running failed jobs is attempted, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

If additional help is needed, PR authors can reach out to core maintainers over Slack.

@lishuoshuo-amd lishuoshuo-amd added the verify-enabled Validate PR label May 7, 2026
@lishuoshuo-amd lishuoshuo-amd changed the title [AMD/Hyperloom] Optimize kimik2.5-int4-mi300x-vllm, dsr1-fp8-mi300x-sglang [AMD] Optimize kimik2.5-int4-mi300x-vllm, dsr1-fp8-mi300x-sglang May 7, 2026
