Fuse RoPE across MLite attention implementations by ISEEKYAN · Pull Request #71 · ISEEKYAN/Megatron-LM

ISEEKYAN · 2026-06-30T21:47:21Z

Summary

Add a single MLite kernel boundary for the TE-backed generic and MLA fused RoPE providers
Route the dense/MRoPE, MLA, DSA, and CSA RoPE sites through the fused path while preserving the unfused reference path
Enable RoPE fusion by default in the relevant typed implementation configs
Add adapter, routing, runtime-config, and AST layering contract test coverage
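
As a rough illustration of the routing described above, the sketch below shows a call site that prefers a fused provider when fusion is enabled and otherwise falls back to an unfused reference RoPE. Every name in it (`RopeConfig`, `apply_rope_fusion`, `rope_reference`, `apply_rope`, `fused_provider`) is hypothetical and not taken from this PR. The reference path uses the common rotate-half convention in pure Python, which may differ from the layout MLite actually uses.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Optional

Vector = List[float]


@dataclass
class RopeConfig:
    # Hypothetical typed config; the field name is an assumption, not from the PR.
    apply_rope_fusion: bool = True


def rope_reference(x: Vector, position: int, base: float = 10000.0) -> Vector:
    """Unfused reference RoPE (rotate-half convention) for one even-length vector."""
    dim = len(x)
    half = dim // 2
    out = list(x)
    for i in range(half):
        # Angle for pair i: position * base^(-2i / dim).
        theta = position * base ** (-2.0 * i / dim)
        c, s = math.cos(theta), math.sin(theta)
        # Rotate the pair (x[i], x[i + half]) by theta.
        out[i] = x[i] * c - x[i + half] * s
        out[i + half] = x[i + half] * c + x[i] * s
    return out


def apply_rope(
    x: Vector,
    position: int,
    config: RopeConfig,
    fused_provider: Optional[Callable[[Vector, int], Vector]] = None,
) -> Vector:
    """Single boundary: use the fused provider if enabled and available, else the reference."""
    if config.apply_rope_fusion and fused_provider is not None:
        return fused_provider(x, position)
    return rope_reference(x, position)
```

In this sketch, passing `fused_provider=None` or setting `apply_rope_fusion=False` keeps the reference path reachable, which is what lets a test compare the two paths on the same inputs.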

focused CPU suite: 43 passed
ruff and diff checks passed
eos Slurm 5538097: COMPLETED 0:0, 5 passed, 0 skipped (forward and backward coverage)
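
One way an AST layering contract like the one listed in the summary could be checked on CPU is to parse a module's source and assert that it does not import the backend directly. This is only a sketch under assumptions: the helper `forbidden_imports` and the module name `kernels` are hypothetical, and the PR's actual contract tests may be structured differently.

```python
import ast
from typing import Iterable, List


def forbidden_imports(source: str, forbidden: Iterable[str]) -> List[str]:
    """Return imported module names in `source` that fall under a forbidden package."""
    packages = tuple(forbidden)

    def is_forbidden(name: str) -> bool:
        # Match the package itself or any of its submodules.
        return any(name == p or name.startswith(p + ".") for p in packages)

    found: List[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            continue
        found.extend(name for name in names if is_forbidden(name))
    return found


# Hypothetical attention module that reaches past the kernel boundary.
bad_source = "import transformer_engine.pytorch as te\nfrom kernels import fused_rope\n"
# Hypothetical attention module that only uses the kernel boundary.
good_source = "from kernels import fused_rope\n"

violations = forbidden_imports(bad_source, ["transformer_engine"])
clean = forbidden_imports(good_source, ["transformer_engine"])
```

Because the check works on source text alone, it needs no GPU and no backend installation, so it fits in a CPU-only suite.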

Signed-off-by: Yan Bai <bayan@nvidia.com>

ISEEKYAN added 2 commits June 30, 2026 17:57

Add fused RoPE routing to MLite attention

b7c7f66

Signed-off-by: Yan Bai <bayan@nvidia.com>

Add RoPE fusion boundary contract tests

1a4a1a3

Signed-off-by: Yan Bai <bayan@nvidia.com>

ISEEKYAN force-pushed the mlite-attention-rope-fusion branch from fe86d0e to 1a4a1a3 July 1, 2026 00:57