[AMD/ROCM] atom glm5 fp8 on mi355x by seungrokj · Pull Request #1126 · SemiAnalysisAI/InferenceX

seungrokj · 2026-04-24T03:19:35Z

Summary

Update GLM-5 FP8 MI355X ATOM benchmark config with new image
Add TP=4 search space alongside existing TP=8 for 1k1k and 8k1k sequence lengths
Set --gpu-memory-utilization 0.9 in server launch script

Test plan

CI benchmark runs pass for glm5-fp8-mi355x-atom with TP=4 and TP=8
Verify new image rocm/atom:rocm7.2.2_ubuntu24.04_py3.12_pytorch_release_2.10.0_atom0.1.2.post works correctly

🤖 Generated with Claude Code

…ization Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

github-actions · 2026-04-24T03:19:43Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook

If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

seungrokj · 2026-04-24T03:23:08Z

to align with the existing PR #1009 used glm5 instead of glm5.1 (same architecture)

claude · 2026-04-24T03:24:29Z

+    - "Image: rocm/atom:rocm7.2.2_ubuntu24.04_py3.12_pytorch_release_2.10.0_atom0.1.2.post"
+    - "Add TP=4, concurrency 4-256 for 1k1k and 8k1k sequence lengths"
+    - "Add --gpu-memory-utilization 0.9 to server launch"
+  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/


🟡 The new glm5-fp8-mi355x-atom entry in perf-changelog.yaml has pr-link 'https://github.com/SemiAnalysisAI/InferenceX/pull/' with no PR number after the trailing slash. Every other entry uses an actual PR number or a placeholder like 'XXX'/'XXXX'/'TBD', so this reads as a copy-paste omission that yields a broken URL. Please append '1126' (this PR's number).

Extended reasoning...

What the bug is

The final entry added to perf-changelog.yaml (lines 1735-1744) describes the GLM-5 FP8 MI355X ATOM update, but its pr-link field is truncated:

pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/

Nothing follows /pull/ — not a number, not even a placeholder like XXX/XXXX/TBD.

How it manifests / impact

The resulting URL https://github.com/SemiAnalysisAI/InferenceX/pull/ does not resolve to a specific PR. On GitHub, this bare path redirects to the repository pulls listing (or 404s depending on context) rather than to PR #1126, which defeats the purpose of the field: allowing future readers of the changelog to trace a perf delta back to the PR that caused it.

Why existing code doesn't prevent it

perf-changelog.yaml is free-form YAML with no schema validator enforcing that pr-link ends with a numeric PR id. Grep confirms this is the only entry in the file that ends with a bare pull/ — ~20 other entries use XXX/XXXX/TBD placeholders or real numbers. A follow-up commit (8a1620c Fix perf-changelog entry for GLM5 FP8 atom config) patched the config-keys and description fields in this same block but missed the pr-link.

Step-by-step proof

Open perf-changelog.yaml and scroll to the last entry (lines 1735-1744):
- config-keys: - glm5-fp8-mi355x-atom description: - "Update GLM-5 FP8 MI355X ATOM benchmark: ..." ... pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/

Line 1744 ends with pull/ — no number.

Pasting that URL into a browser does not route to PR [AMD/ROCM] atom glm5 fp8 on mi355x #1126 (this PR).

Compare with the preceding entry for glm5.1-fp4-mi355x-atom on line 1732 which correctly points to .../pull/1043.

How to fix

Append 1126 so the line reads:

pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1126

This is a trivial one-line fix. Severity is nit — it is a documentation/metadata field in a changelog, not runtime code, but it is caught during review and easy to correct before merge.

seungrokj · 2026-04-24T07:35:12Z

hi @functionstackx @cquil11 @billishyahao @chunfangamd can you approve this ?
e2e perf/accuracy
https://github.com/SemiAnalysisAI/InferenceX/actions/runs/24870492346

billishyahao

LGTM

functionstackx · 2026-04-25T05:13:00Z

hi @functionstackx @cquil11 @billishyahao @chunfangamd can you approve this ? e2e perf/accuracy https://github.com/SemiAnalysisAI/InferenceX/actions/runs/24870492346

@Oseltamivir can u approve and merge this in during idle time between ur mi355 dsv4 runs. Dsv4 mi355 is priority in the queue

seungrokj and others added 2 commits April 24, 2026 12:15

Update GLM5 FP8 atom config: new image, add TP=4, set gpu-memory-util…

d136db5

…ization Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Fix perf-changelog entry for GLM5 FP8 atom config

8a1620c

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

seungrokj requested a review from a team April 24, 2026 03:19

github-project-automation Bot added this to InferenceMAX Board Apr 24, 2026

seungrokj requested review from 1am9trash, billishyahao, chunfangamd and yctseng0211 as code owners April 24, 2026 03:19

Add PR link to perf-changelog for GLM5 FP8 atom config

66c42b4

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

claude Bot reviewed Apr 24, 2026

View reviewed changes

seungrokj added the AMD label Apr 24, 2026

billishyahao approved these changes Apr 24, 2026

View reviewed changes

Merge branch 'main' into srok/atom_glm5_fp8

2b41edd

Merge branch 'main' into srok/atom_glm5_fp8

68ce144

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[AMD/ROCM] atom glm5 fp8 on mi355x#1126

[AMD/ROCM] atom glm5 fp8 on mi355x#1126
seungrokj wants to merge 5 commits intomainfrom
srok/atom_glm5_fp8

seungrokj commented Apr 24, 2026

Uh oh!

github-actions Bot commented Apr 24, 2026

Uh oh!

seungrokj commented Apr 24, 2026

Uh oh!

claude Bot Apr 24, 2026

Uh oh!

seungrokj commented Apr 24, 2026

Uh oh!

billishyahao left a comment

Uh oh!

functionstackx commented Apr 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

seungrokj commented Apr 24, 2026

Summary

Test plan

Uh oh!

github-actions Bot commented Apr 24, 2026

Uh oh!

seungrokj commented Apr 24, 2026

Uh oh!

claude Bot Apr 24, 2026

Choose a reason for hiding this comment

Uh oh!

seungrokj commented Apr 24, 2026

Uh oh!

billishyahao left a comment

Choose a reason for hiding this comment

Uh oh!

functionstackx commented Apr 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants