
rmsnorm gluon kernel created for gfx1250 #2912

Open

amd-jrosas wants to merge 6 commits into main from jrosas_gluon_rmsnorm

Conversation

@amd-jrosas

Motivation

Create an RMSNorm kernel in Gluon for gfx1250.

Technical Details

Translated the existing Triton implementation into a Gluon equivalent.
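
For context, the kernel computes row-wise RMS normalization. A minimal eager PyTorch sketch of the intended semantics (illustrative only; rmsnorm_ref and its signature are not the kernel or the test code):

# Each row is scaled by the reciprocal RMS of its elements; the reciprocal
# (rsigma) is also returned, mirroring the rsigma output in the diff below.
import torch

def rmsnorm_ref(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6):
    rsigma = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + eps)
    return x * rsigma * weight, rsigma.squeeze(-1)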

Test Plan

Added a test case for the Gluon implementation to the existing test_rmsnorm.py.

Test Result

Passed all test conditions.

@amd-jrosas amd-jrosas requested a review from a team April 24, 2026 14:14
@github-actions
Contributor

🏷️ CI Guide

Runs automatically on every PR:

  • ✅ Pre-checks (submodule verification, code formatting)
  • ✅ Aiter op tests (gfx942 + gfx950)
  • ✅ Triton tests on MI35X (only when aiter/ops/triton/** or related paths are changed)

Extended tests (opt-in via labels):

  • ci:triton-300x: Run an additional Triton test job on MI300X in PRs; the main branch always runs both MI35X and MI300X
  • ci:sglang: SGLang integration tests
  • ci:atom: ATOM benchmark (DeepSeek-R1 + GPT-OSS)
  • ci:vllm: vLLM benchmark
  • ci:all: All of the above

Add labels via the sidebar or gh pr edit 2912 --add-label <label>

amd-jrosas and others added 5 commits April 24, 2026 11:10
sharedLayoutWeights: gl.constexpr = gl.SwizzledSharedLayout(1, 1, 1, order=[0])

# create a swizzled shared layout for the output
gl.SwizzledSharedLayout(1, 1, 1, order=[1, 0])

This isn't assigned to anything, and it's probably also not needed: the output isn't TDM stored, so you don't need a shared layout for it?
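
A sketch of what that suggestion amounts to, assuming the weights are the only tensor staged through shared memory and the output is written with an ordinary gl.store:

# Keep a shared layout only for operands that actually go through shared memory.
sharedLayoutWeights: gl.constexpr = gl.SwizzledSharedLayout(1, 1, 1, order=[0])
# (removed) gl.SwizzledSharedLayout(1, 1, 1, order=[1, 0])  # output has no TDM store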


# Loop through the rows of the input tensor by NUM_PROG blocks
for row_idx in range(row_start, n_rows, NUM_PROG):
    input_ptr + (row_idx * input_row_stride)

This isn't assigned to anything?
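
Presumably the address was meant to be bound to a name and used by the loads that follow; a hedged sketch (row_input_ptr is a hypothetical name, not from the diff):

for row_idx in range(row_start, n_rows, NUM_PROG):
    # bind the row base address so the per-row loads actually use it
    row_input_ptr = input_ptr + (row_idx * input_row_stride)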

rms_norm = a * norm_factor * weights
# store rms norm and the norm factor
gl.store(
    rsigma_ptr + row_start, norm_factor.to(rsigma_ptr.dtype.element_ty)

Did you mean row_idx?
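
If so, the store would index by the loop variable rather than the program's starting row:

# store one rsigma per processed row instead of repeatedly overwriting row_start
gl.store(
    rsigma_ptr + row_idx, norm_factor.to(rsigma_ptr.dtype.element_ty)
)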

USE_BLOCK = COL > BLOCK_SIZE
NUM_PROG = min(ROW, get_num_sms())

grid = (NUM_PROG,)

I think you can put min(ROW, get_num_sms()) here.
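
That is, fold the expression into the grid tuple if NUM_PROG is not referenced anywhere else (it can stay as a separate variable if the kernel still needs it as a constexpr argument):

# inline the launch width directly into the grid
grid = (min(ROW, get_num_sms()),)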

output = torch.empty_like(input, device=input.device)
rsigma = torch.empty((ROW,), device=input.device, dtype=input.dtype)

MAX_FUSED_SIZE = 65536 // input.element_size()

Comment for the magic number?
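
Something along these lines, assuming 65536 is meant as a 64 KiB cap on the fused row block (the exact rationale should come from the author):

# Cap the per-row block at 64 KiB of data so BLOCK_SIZE stays within the
# assumed on-chip memory budget regardless of the input element size.
MAX_FUSED_SIZE = 65536 // input.element_size()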

@@ -0,0 +1,39 @@
# SPDX-License-Identifier: MIT

I don't think we want two different files. We want a single API and the wrapper decides whether to call triton (gfx950 and earlier) or gluon (gfx1250, if a gluon kernel exists).
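
A rough sketch of the suggested dispatch (function and helper names are illustrative, not the actual aiter API):

import torch

def _is_gfx1250() -> bool:
    # Assumption: ROCm builds of PyTorch expose the arch string as gcnArchName.
    return "gfx1250" in torch.cuda.get_device_properties(0).gcnArchName

def rms_norm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6):
    # Single public entry point; pick the backend per architecture.
    if _is_gfx1250():
        return rms_norm_gluon(x, weight, eps)   # Gluon kernel from this PR
    return rms_norm_triton(x, weight, eps)      # existing Triton kernel (gfx950 and earlier)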

