Skip to content

[AMD/ROCM] atom qwen3.5 fp4 on mi355x#1133

Open
seungrokj wants to merge 4 commits intomainfrom
srok/atom_qwen3.5_fp4
Open

[AMD/ROCM] atom qwen3.5 fp4 on mi355x#1133
seungrokj wants to merge 4 commits intomainfrom
srok/atom_qwen3.5_fp4

Conversation

@seungrokj
Copy link
Copy Markdown
Collaborator

Summary

  • Add Qwen3.5-397B-A17B MXFP4 single-node MI355X ATOM benchmark config and launch script
  • Image: rocm/atom:rocm7.2.2_ubuntu24.04_py3.12_pytorch_release_2.10.0_atom0.1.2.post
  • TP=2 and TP=4, concurrency 4-256 for 1k1k and 8k1k sequence lengths
  • --gpu-memory-utilization 0.9 set in server launch

Test plan

  • CI benchmark runs pass for qwen3.5-fp4-mi355x-atom with TP=2 and TP=4

🤖 Generated with Claude Code

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown
Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook

If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@seungrokj seungrokj added the AMD label Apr 24, 2026
Comment thread perf-changelog.yaml Outdated
Comment on lines +1742 to +1744
- "TP=2 and TP=4, concurrency 4-256 for 1k1k and 8k1k sequence lengths"
- "Add --gpu-memory-utilization 0.9 to server launch"
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Two metadata issues in the new qwen3.5-fp4-mi355x-atom changelog entry (nit). (1) perf-changelog.yaml:1744 has an empty pr-link ending in /pull/ with no PR number — should be /pull/1133. (2) perf-changelog.yaml:1742 says "TP=2 and TP=4, concurrency 4-256 for 1k1k and 8k1k sequence lengths", but only TP=2 sweeps 4-256; TP=4 actually caps at 16 in amd-master.yaml. Both look like copy-paste from the sibling glm5.1 entry without full adjustment.

Extended reasoning...

1. Empty pr-link

The new changelog entry added at lines 1737-1744 of perf-changelog.yaml ends with:

  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/

This resolves to the PR index page, not to any specific PR, defeating the traceability purpose of the changelog. Every other entry in the file either uses a real PR number (e.g. the directly adjacent glm5.1-fp4-mi355x-atom entry on line 1735 uses /pull/1043, and the preceding qwen3.5-fp8-mi355x-atom entry on line 1734 uses /pull/1040) or a clear placeholder (XXX, XXXX, TBD). An empty suffix is unique in this file and is the only style that yields a literally broken URL. Fix: replace the trailing slash with /pull/1133. The pydantic ChangelogEntry.pr_link field in utils/matrix_logic/validation.py is typed as plain str and does not validate URL well-formedness, so this passes schema checks silently.

2. Inaccurate TP=4 concurrency range in description

Line 1742 reads:

    - "TP=2 and TP=4, concurrency 4-256 for 1k1k and 8k1k sequence lengths"

but the actual search-space added to .github/configs/amd-master.yaml is:

  - isl: 1024
    osl: 1024
    search-space:
    - { tp: 2, conc-start: 4, conc-end: 256 }
    - { tp: 4, conc-start: 4, conc-end: 16 }
  - isl: 8192
    osl: 1024
    search-space:
    - { tp: 2, conc-start: 4, conc-end: 256 }
    - { tp: 4, conc-start: 4, conc-end: 16 }

TP=4 sweeps concurrency 4-16 only (not 4-256). The sentence overstates TP=4 coverage by 16x — it's a direct copy-paste from the sibling glm5.1-fp4-mi355x-atom description (line 1733 in this PR, pr-link 1043), where the glm5.1 config legitimately sweeps TP=4 up to conc 256. Suggested rewording: "TP=2 concurrency 4-256, TP=4 concurrency 4-16, for 1k1k and 8k1k sequence lengths".

Impact and severity

Both issues are documentation/metadata only — the YAML that actually drives benchmarks is internally consistent and the runtime is unaffected. Verifiers unanimously graded both as nit. However, they should be fixed before merge because: (a) the empty pull/ URL is a traceability regression that cannot be detected by current schema validation, and (b) the description actively misleads readers about what the benchmark covers.

Step-by-step proof for issue (2)

  1. amd-master.yaml lines 316-322/317-326 (post-merge) define the qwen3.5-fp4-mi355x-atom search-space: for both isl: 1024/osl: 1024 and isl: 8192/osl: 1024, TP=2 has conc-start: 4, conc-end: 256 and TP=4 has conc-start: 4, conc-end: 16.
  2. The search-space expansion (concurrency multiplies by 2 in the usual 4, 8, 16, 32, ... sweep) therefore yields TP=4 points {4, 8, 16} only — no TP=4 runs at conc 32, 64, 128, or 256.
  3. The description string at line 1742 implies TP=4 is swept identically to TP=2 ("TP=2 and TP=4, concurrency 4-256"), which is false for this config.
  4. Compare with the sibling qwen3.5-fp4-mi355x-sglang entry earlier in the file (no changelog entry in this PR, but identical shape): same TP=2 4-256, TP=4 4-16 split. The source of the copy-paste is the glm5.1 entry (line 1733) whose description reuses the same sentence but whose actual search-space is { tp: 4, conc-start: 4, conc-end: 256 }, making the sentence correct there.

Conclusion: Both items are real metadata defects; severity is nit.

@seungrokj
Copy link
Copy Markdown
Collaborator Author

@seungrokj
Copy link
Copy Markdown
Collaborator Author

@functionstackx @cquil11 @billishyahao @chunfangamd plz approve this

@functionstackx
Copy link
Copy Markdown
Contributor

@functionstackx @cquil11 @billishyahao @chunfangamd plz approve this

@Oseltamivir can u approve and merge this during downtime in between ur dsv4 mi355 runs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

2 participants