Pull request overview
Fixes `mhc_pre` so that intermediate tensors are allocated on the same device as the input `residual`, preventing device-mismatch failures when the global default device is not CUDA.
Changes:
- Reorders imports in `aiter/ops/mhc.py`.
- Allocates `out_pad`, `sqrsum`, `post_mix`, `comb_mix`, and `layer_input` with `device=residual.device`.
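The allocation pattern these changes apply can be sketched as follows. This is a minimal standalone illustration, not the actual `aiter` code: the helper name and shapes here are invented for the example.

```python
import torch

def alloc_like_residual(residual, shape):
    # Follow the input tensor's device explicitly; without device=,
    # torch.empty falls back to the process-wide default device.
    return torch.empty(shape, dtype=torch.float32, device=residual.device)

# With the default device pinned to CPU, an explicit device= keeps the
# intermediate co-located with the input, avoiding device mismatches.
torch.set_default_device("cpu")
residual = torch.zeros(4)  # stands in for a CUDA tensor on a GPU machine
out_pad = alloc_like_residual(residual, (4, 8))
assert out_pad.device == residual.device
```

On a GPU machine, `residual` would live on `cuda`, and the same helper keeps `out_pad` there regardless of the default device.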
```diff
+device = residual.device
 out_pad = torch.empty(
-    selected_splitk, m, (hc_mult3 + 31) // 32 * 32, dtype=dtypes.fp32
+    selected_splitk, m, (hc_mult3 + 31) // 32 * 32, dtype=dtypes.fp32, device=device
```
Consider adding a regression test that exercises `mhc_pre` when the global default device is CPU but the inputs are explicitly on CUDA (e.g., `torch.set_default_device('cpu')`, then create `residual`/`fn`/`hc_scale`/`hc_base` on CUDA). This change fixes the internal tensor allocations to follow `residual.device`, but the current tests may still pass even if allocations accidentally fall back to the default device.
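A sketch of such a regression test is below. The input shapes and the `mhc_pre` call signature are placeholders inferred from the reviewer's comment, not taken from `aiter/ops/mhc.py`, and would need to be adjusted to the real API; the test is skipped unless both CUDA and `aiter` are available.

```python
import importlib.util
import unittest

import torch

_HAS_AITER = importlib.util.find_spec("aiter") is not None

class TestMhcPreDeviceFollowsInput(unittest.TestCase):
    """Regression test sketch: global default device is CPU, inputs on CUDA."""

    @unittest.skipUnless(torch.cuda.is_available() and _HAS_AITER,
                         "requires CUDA and aiter")
    def test_default_cpu_inputs_cuda(self):
        from aiter.ops.mhc import mhc_pre  # module path from the PR

        prev = torch.get_default_device()
        torch.set_default_device("cpu")  # force the failure mode the PR fixes
        try:
            # Placeholder inputs on CUDA; real shapes/dtypes may differ.
            residual = torch.randn(8, 64, device="cuda")
            fn = torch.randn(8, 64, device="cuda")
            hc_scale = torch.randn(64, device="cuda")
            hc_base = torch.randn(64, device="cuda")
            # Before the fix, internal torch.empty calls landed on the CPU
            # default device and raised a device-mismatch error.
            mhc_pre(residual, fn, hc_scale, hc_base)
        finally:
            torch.set_default_device(prev)
```

Keeping the default-device override inside `try`/`finally` (or a fixture) matters here, since leaking a CPU default into other tests would mask exactly the class of bug this test targets.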
Motivation
Technical Details
Test Plan
Test Result
Submission Checklist