Skip to content

[AMD/ROCM] atom glm5 fp8 on mi355x#1126

Open
seungrokj wants to merge 5 commits intomainfrom
srok/atom_glm5_fp8
Open

[AMD/ROCM] atom glm5 fp8 on mi355x#1126
seungrokj wants to merge 5 commits intomainfrom
srok/atom_glm5_fp8

Conversation

@seungrokj
Copy link
Copy Markdown
Collaborator

Summary

  • Update GLM-5 FP8 MI355X ATOM benchmark config with new image
  • Add TP=4 search space alongside existing TP=8 for 1k1k and 8k1k sequence lengths
  • Set --gpu-memory-utilization 0.9 in server launch script

Test plan

  • CI benchmark runs pass for glm5-fp8-mi355x-atom with TP=4 and TP=8
  • Verify new image rocm/atom:rocm7.2.2_ubuntu24.04_py3.12_pytorch_release_2.10.0_atom0.1.2.post works correctly

🤖 Generated with Claude Code

seungrokj and others added 2 commits April 24, 2026 12:15
…ization

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown
Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook

If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@seungrokj
Copy link
Copy Markdown
Collaborator Author

to align with the existing PR #1009 used glm5 instead of glm5.1 (same architecture)

Comment thread perf-changelog.yaml Outdated
- "Image: rocm/atom:rocm7.2.2_ubuntu24.04_py3.12_pytorch_release_2.10.0_atom0.1.2.post"
- "Add TP=4, concurrency 4-256 for 1k1k and 8k1k sequence lengths"
- "Add --gpu-memory-utilization 0.9 to server launch"
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 The new glm5-fp8-mi355x-atom entry in perf-changelog.yaml has pr-link 'https://github.com/SemiAnalysisAI/InferenceX/pull/' with no PR number after the trailing slash. Every other entry uses an actual PR number or a placeholder like 'XXX'/'XXXX'/'TBD', so this reads as a copy-paste omission that yields a broken URL. Please append '1126' (this PR's number).

Extended reasoning...

What the bug is

The final entry added to perf-changelog.yaml (lines 1735-1744) describes the GLM-5 FP8 MI355X ATOM update, but its pr-link field is truncated:

pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/

Nothing follows /pull/ — not a number, not even a placeholder like XXX/XXXX/TBD.

How it manifests / impact

The resulting URL https://github.com/SemiAnalysisAI/InferenceX/pull/ does not resolve to a specific PR. On GitHub, this bare path redirects to the repository pulls listing (or 404s depending on context) rather than to PR #1126, which defeats the purpose of the field: allowing future readers of the changelog to trace a perf delta back to the PR that caused it.

Why existing code doesn't prevent it

perf-changelog.yaml is free-form YAML with no schema validator enforcing that pr-link ends with a numeric PR id. Grep confirms this is the only entry in the file that ends with a bare pull/ — ~20 other entries use XXX/XXXX/TBD placeholders or real numbers. A follow-up commit (8a1620c Fix perf-changelog entry for GLM5 FP8 atom config) patched the config-keys and description fields in this same block but missed the pr-link.

Step-by-step proof

  1. Open perf-changelog.yaml and scroll to the last entry (lines 1735-1744):
    - config-keys:
        - glm5-fp8-mi355x-atom
      description:
        - "Update GLM-5 FP8 MI355X ATOM benchmark: ..."
        ...
      pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/
  2. Line 1744 ends with pull/ — no number.
  3. Pasting that URL into a browser does not route to PR [AMD/ROCM] atom glm5 fp8 on mi355x #1126 (this PR).
  4. Compare with the preceding entry for glm5.1-fp4-mi355x-atom on line 1732 which correctly points to .../pull/1043.

How to fix

Append 1126 so the line reads:

pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1126

This is a trivial one-line fix. Severity is nit — it is a documentation/metadata field in a changelog, not runtime code, but it is caught during review and easy to correct before merge.

@seungrokj seungrokj added the AMD label Apr 24, 2026
@seungrokj
Copy link
Copy Markdown
Collaborator Author

Copy link
Copy Markdown
Collaborator

@billishyahao billishyahao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@functionstackx
Copy link
Copy Markdown
Contributor

hi @functionstackx @cquil11 @billishyahao @chunfangamd can you approve this ? e2e perf/accuracy https://github.com/SemiAnalysisAI/InferenceX/actions/runs/24870492346

@Oseltamivir can u approve and merge this in during idle time between ur mi355 dsv4 runs. Dsv4 mi355 is priority in the queue

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

3 participants