Add RDNA4 MoE WMMA kernel path #430

Draft
vivienfanghuagood wants to merge 1 commit into ROCm:main from vivienfanghuagood:fhq/rdna4-moe

vivienfanghuagood (Collaborator) commented Apr 23, 2026

Summary

  • add an RDNA4-specific MoE 2-stage WMMA path for gfx120x/gfx1201
  • refactor the public MoE compile wrappers so CDNA, gfx1250, and RDNA4 share the same external API shape
  • add RDNA4 test coverage, benchmark integration, and user-facing documentation
  • make the RDNA4 stage2 fp16/bf16 atomic path stream-safe, and add a test that exercises a real masked-reduce case (invalid routes masked out)
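
The shared API shape from the second bullet can be pictured as one public compile entry point that dispatches on the target arch. The sketch below is hypothetical: MoeStage1Config, compile_moe_stage1, and the _build_* helpers are illustrative names, not the functions in kernels/rdna_moe_gemm_2stage.py or kernels/moe_gemm_2stage_common_gfx1250.py, and the builders return placeholder strings instead of compiled kernels. The CDNA arch set (gfx90a, gfx942, gfx950) is an assumption about which targets the CDNA path covers.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class MoeStage1Config:
    """Hypothetical problem description; the same fields are passed on every arch."""
    model_dim: int
    inter_dim: int
    experts: int
    topk: int
    dtype: str  # e.g. "fp16" or "bf16"


# Hypothetical per-arch builders. Real ones would emit kernels; these return
# placeholder strings so the dispatch logic can be exercised on any machine.
def _build_cdna(cfg: MoeStage1Config) -> str:
    return f"cdna-mfma:{cfg.dtype}"


def _build_gfx1250(cfg: MoeStage1Config) -> str:
    return f"gfx1250-wmma:{cfg.dtype}"


def _build_rdna4(cfg: MoeStage1Config) -> str:
    return f"rdna4-wmma:{cfg.dtype}"


def _select_backend(arch: str) -> Callable[[MoeStage1Config], str]:
    """Map a gfx target name to a backend builder (assumed arch groupings)."""
    arch = arch.lower()
    if arch.startswith("gfx120"):  # gfx1200 / gfx1201 (RDNA4); does not match gfx1250
        return _build_rdna4
    if arch == "gfx1250":
        return _build_gfx1250
    if arch in {"gfx90a", "gfx942", "gfx950"}:  # assumed CDNA targets
        return _build_cdna
    raise NotImplementedError(f"no MoE stage1 backend for {arch}")


def compile_moe_stage1(arch: str, cfg: MoeStage1Config) -> str:
    """Single public entry point: identical signature regardless of arch."""
    return _select_backend(arch)(cfg)
```

Keeping arch selection inside a private helper means callers such as tests and benchmark scripts pass the same config on every target and only the arch string changes.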

Testing

  • python3 -m py_compile kernels/rdna_moe_gemm_2stage.py kernels/moe_gemm_2stage_common_gfx1250.py tests/kernels/test_moe_gemm_rdna4.py tests/kernels/test_moe_gemm_wmma_gfx1250.py
  • pytest -q tests/kernels/test_moe_gemm_rdna4.py
  • pytest -q tests/kernels/test_moe_gemm_rdna4.py::test_moe_reduce_valid_mask_masks_invalid_routes
  • sh -n scripts/run_benchmark.sh

Validated on gfx1201 (Radeon AI PRO R9700).
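
The masked-reduce case exercised by test_moe_reduce_valid_mask_masks_invalid_routes can be pinned down with a plain-Python reference. This is a minimal sketch under stated assumptions: the [tokens][topk][hidden] layout, the per-route weights, and the name moe_reduce_reference are hypothetical and not taken from the PR. It only illustrates the intended semantics: a route whose valid-mask entry is False contributes nothing to its token's output, even if its expert output holds garbage.

```python
import math
from typing import List, Sequence


def moe_reduce_reference(
    expert_out: Sequence[Sequence[Sequence[float]]],  # [tokens][topk][hidden] (assumed layout)
    weights: Sequence[Sequence[float]],               # [tokens][topk] routing weights
    valid_mask: Sequence[Sequence[bool]],             # [tokens][topk]; False = invalid route
) -> List[List[float]]:
    """Weighted sum over each token's top-k routes, skipping masked routes."""
    out: List[List[float]] = []
    for t, routes in enumerate(expert_out):
        hidden = len(routes[0])
        acc = [0.0] * hidden
        for k, row in enumerate(routes):
            if not valid_mask[t][k]:
                continue  # masked route: never read, so NaN/garbage cannot leak in
            w = weights[t][k]
            for h in range(hidden):
                acc[h] += w * row[h]
        out.append(acc)
    return out


# Token 0 has one invalid route filled with NaN; token 1 has two valid routes.
expert_out = [
    [[1.0, 2.0], [math.nan, math.nan]],
    [[2.0, 4.0], [4.0, 0.0]],
]
weights = [[0.5, 0.5], [0.25, 0.75]]
valid_mask = [[True, False], [True, True]]
result = moe_reduce_reference(expert_out, weights, valid_mask)
```

A GPU stage2 that accumulates each route with fp16/bf16 atomics sums in a nondeterministic order, so a comparison against a reference like this would normally use a tolerance rather than exact equality.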

vivienfanghuagood marked this pull request as draft on April 23, 2026 at 13:18