Aiter MHC fix and keep DSv4 ATOM conc1 by Oseltamivir · Pull Request #1202 · SemiAnalysisAI/InferenceX

Oseltamivir · 2026-04-27T23:03:59Z

Summary

Applies the ROCm/aiter#2916 mhc_pre device-allocation fix at benchmark runtime for dsv4-fp4-mi355x-atom.
Removes the ATOM deepseek_v4.py sed workaround that disabled mhc_pre and forced the torch fallback.
Keeps the DSv4 ATOM config at CONC=1 only, with a fatal script guard for accidental high-concurrency runs.
Appends a perf changelog entry so CI runs the affected MI355X ATOM config.
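
The CONC=1 guard described above can be pictured roughly as follows. This is a Python sketch for illustration only: the real guard lives in the bash benchmark script, and the function name, the default of 1 when CONC is unset, and the message wording are all assumptions rather than the script's actual contents.

```python
import os
import sys

# The PR keeps the DSv4 ATOM config at CONC=1 only.
MAX_SUPPORTED_CONC = 1


def check_concurrency(env=None):
    """Return the requested concurrency, or exit fatally if it is unsupported.

    `env` is a mapping of environment variables (defaults to os.environ).
    Treating an unset CONC as 1 is an assumption made for this sketch.
    """
    env = os.environ if env is None else env
    raw = env.get("CONC", "1")
    try:
        conc = int(raw)
    except ValueError:
        sys.exit(f"FATAL: CONC must be an integer, got {raw!r}")
    if conc < 1:
        sys.exit(f"FATAL: CONC must be at least 1, got {conc}")
    if conc > MAX_SUPPORTED_CONC:
        # Fail before any server start-up so a bad sweep entry is obvious.
        sys.exit(
            f"FATAL: CONC={conc} is not supported for this config; "
            f"only CONC={MAX_SUPPORTED_CONC} is expected to work"
        )
    return conc
```

Calling `check_concurrency({"CONC": "16"})` raises `SystemExit` with the FATAL message, which a shell caller would see as a non-zero exit status.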

Why

PR #1165 introduced DeepSeek-V4-Pro support on ATOM but had to disable the aiter mhc_pre path because aiter allocated internal tensors on the wrong device. ROCm/aiter#2916 fixes that by allocating mhc_pre intermediates on residual.device. This PR vendors that pure-Python fix into the benchmark startup path without rebuilding aiter or changing the image.

Run 24953107645 showed that higher-concurrency DSv4 ATOM runs are not ready yet:

1k1k at CONC>=16 can fail during initialization with negative KV budget after high warmup peak memory.
1k1k at CONC=4 and 8k1k at CONC>=4 OOM inside the ATOM PR #650 torch sparse_attn fallback.
Eval-only currently fails independently because DSv4 has no HF tokenizer chat_template for /v1/chat/completions.

Until ATOM lands the AITER sparse-attention / multi-request path for DeepSeek-V4, this should stay a single-request marker.

Quantization notes

The ATOM PR #650 path appears to allocate routed MoE expert weights as MXFP4, not BF16: make_v4_quant_config() returns dtypes.fp4x2 for .ffn.experts, FusedMoE selects Mxfp4MoEMethod, and the triton path swizzles packed uint8 FP4 weights plus e8m0 scales. The observed OOMs are in sparse-attention temporary tensors / warmup budget, not from globally dequantizing MoE weights.
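
To make the MXFP4 versus BF16 distinction concrete, here is a rough footprint calculation. It assumes the standard OCP MX layout (two FP4 values packed per uint8 byte, one 8-bit e8m0 scale shared by each block of 32 elements); whether ATOM pads or swizzles beyond that is not covered, and the element count used in the usage note is a hypothetical round number, not a DeepSeek-V4 figure.

```python
MX_BLOCK_SIZE = 32  # elements sharing one e8m0 scale in the OCP MX formats


def mxfp4_bytes(n_elements):
    """Bytes for packed FP4 values plus their per-block e8m0 scales."""
    packed = -(-n_elements // 2)              # two 4-bit values per uint8
    scales = -(-n_elements // MX_BLOCK_SIZE)  # one 1-byte scale per block
    return packed + scales


def bf16_bytes(n_elements):
    """Bytes for the same tensor stored as BF16 (2 bytes per element)."""
    return 2 * n_elements


def compression_ratio(n_elements):
    """How many times smaller the MXFP4 storage is than BF16."""
    return bf16_bytes(n_elements) / mxfp4_bytes(n_elements)
```

For a hypothetical 1,000,000,000 expert weights, `mxfp4_bytes` gives 531,250,000 bytes against 2,000,000,000 bytes for BF16, a ratio of about 3.76. That gap is why a global dequantization of the MoE weights would be a very visible memory event, which is not what the observed OOMs look like.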

Validation

bash -n benchmarks/single_node/dsv4_fp4_mi355x_atom.sh
python utils/matrix_logic/generate_sweep_configs.py test-config --config-files .github/configs/amd-master.yaml --config-keys dsv4-fp4-mi355x-atom --no-evals
python utils/matrix_logic/generate_sweep_configs.py test-config --config-files .github/configs/amd-master.yaml --config-keys dsv4-fp4-mi355x-atom --evals-only
python utils/process_changelog.py --base-ref origin/main --head-ref HEAD --changelog-file perf-changelog.yaml --trim-conc
Local fake-package check that the embedded aiter.ops.mhc patcher applies and is idempotent.

github-actions · 2026-04-27T23:04:07Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

claude · 2026-04-27T23:07:19Z

Claude finished @Oseltamivir's task in 1m 1s.

Review: PR #1202 — Aiter MHC device fix for DSv4 ATOM

Read repo guidelines
Review changed files and gather context
Validate perf-changelog entry
Check benchmark script code style
Post review findings

LGTM — no blocking issues found.

The runtime patcher for aiter/ops/mhc.py is well-structured: idempotent (skips if already patched), validates pre-conditions before modifying anything, verifies post-conditions after writing, and fails fast with clear messages on any mismatch. Replacing the sed workaround that disabled mhc_pre with the upstream device-allocation fix (plus a grep -q assertion that ATOM still exposes the hook) is a clean improvement. Perf-changelog entry is correctly appended at the end.

claude

LGTM — runtime aiter patch is idempotent, verifies pre/post conditions, and only affects the dsv4-fp4-mi355x-atom benchmark.

Extended reasoning...

Overview

This PR modifies a single benchmark script (benchmarks/single_node/dsv4_fp4_mi355x_atom.sh) to vendor the pure-Python portion of ROCm/aiter#2916 as a runtime patch over the installed aiter/ops/mhc.py. It also removes the previous sed workaround that disabled mhc_pre entirely, and adds a corresponding perf-changelog.yaml entry. The patch adds device=residual.device to several torch.empty() calls so MHC intermediate tensors no longer land on the wrong device.

Security risks

None meaningful. The script clones from public GitHub mirrors over HTTPS at pinned SHAs (already the existing pattern for ATOM and triton_kernels), then mutates a file in an ephemeral container. There is no user input, no network-exposed surface, and no auth/crypto/permissions code involved.

Level of scrutiny

Low. This is benchmark infra code, not production. The blast radius is one CI config (dsv4-fp4-mi355x-atom); a regression here would manifest as a benchmark failure rather than affect users. The patch logic is mechanical (literal source.replace(old, new, 1)), guarded by an idempotency check, refuses to patch if the expected pre-patch text isn't found (with named missing patterns), and re-reads the file post-write to verify success. Failure modes all exit non-zero with clear FATAL messages.
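
The patch flow described above (idempotency check, pre-condition check, literal single replacement, post-write verification) might look roughly like the sketch below. It is not the script's actual patcher: the function names and the (old, new) strings are hypothetical stand-ins, chosen only to mirror the idea of adding device=residual.device to torch.empty() calls.

```python
from pathlib import Path

# Hypothetical (old, new) pairs; the real patch targets specific lines
# in the installed aiter/ops/mhc.py that are not reproduced here.
REPLACEMENTS = [
    ("a = torch.empty(m, dtype=torch.float32)",
     "a = torch.empty(m, dtype=torch.float32, device=residual.device)"),
    ("b = torch.empty(n, dtype=torch.float32)",
     "b = torch.empty(n, dtype=torch.float32, device=residual.device)"),
]


def patch_source(source, replacements):
    """Return (patched_source, changed) or exit if pre-conditions fail."""
    # Idempotency: if every replacement text is present, do nothing.
    if all(new in source for _, new in replacements):
        return source, False
    # Pre-condition: every expected pre-patch snippet must be present.
    missing = [old for old, _ in replacements if old not in source]
    if missing:
        raise SystemExit(f"FATAL: expected pre-patch text not found: {missing}")
    for old, new in replacements:
        source = source.replace(old, new, 1)  # literal, first match only
    return source, True


def patch_file(path, replacements):
    """Patch the file in place; return True if it was modified."""
    path = Path(path)
    patched, changed = patch_source(path.read_text(), replacements)
    if changed:
        path.write_text(patched)
    # Post-condition: re-read from disk and confirm every fix is present.
    final = path.read_text()
    absent = [new for _, new in replacements if new not in final]
    if absent:
        raise SystemExit(f"FATAL: patch verification failed: {absent}")
    return changed
```

Running `patch_file` twice on the same file returns True the first time and False the second, which is the property the local fake-package check is meant to exercise. A partially patched file is treated as a fatal mismatch in this sketch rather than being repaired.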

Other factors

The author lists local validation including a fake-package idempotency check. The replacement substrings are carefully whitespace-precise and unique enough that accidental matches are unlikely. The removed sed workaround is replaced with a grep -q guard so a future ATOM rev that drops the hook fails loudly rather than silently. The changelog entry follows the established format. No bugs flagged by the hunting system.

Oseltamivir force-pushed the codex/dsv4-atom-aiter-mhc-fix branch from e21a607 to 889c695 on April 27, 2026 23:04

Oseltamivir changed the title ~~[codex] Use aiter MHC device fix for DSv4 ATOM~~ Aiter MHC device fix for DSv4 ATOM Apr 27, 2026

Oseltamivir added the sweep-enabled label Apr 27, 2026

Oseltamivir marked this pull request as ready for review April 27, 2026 23:06

Oseltamivir requested a review from a team April 27, 2026 23:06

claude Bot reviewed Apr 27, 2026

fix: use aiter mhc device fix for dsv4 atom

952c923

Oseltamivir force-pushed the codex/dsv4-atom-aiter-mhc-fix branch from 889c695 to 952c923 on April 27, 2026 23:23

Oseltamivir requested review from 1am9trash, billishyahao, chunfangamd, seungrokj and yctseng0211 as code owners April 27, 2026 23:23

Oseltamivir changed the title ~~Aiter MHC device fix for DSv4 ATOM~~ [codex] Use aiter MHC fix and keep DSv4 ATOM conc1 Apr 27, 2026

Oseltamivir changed the title ~~[codex] Use aiter MHC fix and keep DSv4 ATOM conc1~~ Aiter MHC fix and keep DSv4 ATOM conc1 Apr 27, 2026

Oseltamivir added full-sweep-enabled and removed sweep-enabled labels Apr 28, 2026

Oseltamivir force-pushed the codex/dsv4-atom-aiter-mhc-fix branch from 951d350 to 952c923 on April 28, 2026 06:07

fix: restore DSv4 ATOM aiter mhc + perf stack to CI-proven state

0d94067

Matches the exact tree from 55fd191 (run 25027405568). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Oseltamivir force-pushed the codex/dsv4-atom-aiter-mhc-fix branch from 04a7baf to 0d94067 on April 28, 2026 07:44

Merge branch 'main' into codex/dsv4-atom-aiter-mhc-fix

ce87aa8

Oseltamivir merged commit 38d2da7 into main Apr 28, 2026
17 checks passed

Oseltamivir deleted the codex/dsv4-atom-aiter-mhc-fix branch April 28, 2026 16:30

github-project-automation Bot moved this to Done in InferenceMAX Board Apr 28, 2026

Oseltamivir merged 3 commits into main from codex/dsv4-atom-aiter-mhc-fix
