
Support preshuffled layout in indexer_k_quant_and_cache / cp_gather_indexer_k_quant_cache#2879

Open
1am9trash wants to merge 2 commits into ROCm:main from 1am9trash:support-indexer_k_quant_and_cache_kernel-with-preshuffle


@1am9trash
Member

Motivation

In sglang PR#23562, we bump the NSA indexer's page_size to 64, which switches the indexer attention path to the _gluon_deepgemm_fp8_paged_mqa_logits_preshuffle kernel. That kernel consumes the indexer k cache in an MFMA 16×16 preshuffled layout, so the cache ops need to be able to write and gather that layout.

Technical Details

This PR extends indexer_k_quant_and_cache and cp_gather_indexer_k_quant_cache to support writing and gathering the preshuffled layout.

We add an optional preshuffle: bool = False argument to both ops. The default is False, so existing callers are unaffected.

  • preshuffle=False (default): unchanged row-major [block_size, head_dim] layout inside each paged block.
  • preshuffle=True: each block's k region is written / read as a sequence of MFMA 16×16 tiles.
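To make the difference between the two layouts concrete, here is a minimal sketch of how an element's offset inside one block could be computed in each case. It is an illustration only: the tile ordering (tiles walked left to right, then top to bottom) and the row-major order inside each tile are assumptions, as are the helper names and head_dim = 128. The actual ordering expected by the kernel is defined in the PR's kernel code, not here.

```python
TILE = 16  # MFMA tile edge (16x16), per the PR description


def row_major_offset(row: int, col: int, head_dim: int) -> int:
    """Offset of (row, col) in a plain [block_size, head_dim] block."""
    return row * head_dim + col


def tiled_offset(row: int, col: int, head_dim: int, tile: int = TILE) -> int:
    """Offset of (row, col) when the block is stored as consecutive tiles.

    Hypothetical ordering: tiles are laid out row of tiles by row of tiles,
    and elements inside a tile are row-major. The real kernel may differ.
    """
    tiles_per_row = head_dim // tile
    tile_index = (row // tile) * tiles_per_row + (col // tile)
    offset_in_tile = (row % tile) * tile + (col % tile)
    return tile_index * tile * tile + offset_in_tile


# Example with block_size = 64 (the new page_size) and an assumed head_dim.
BLOCK_SIZE = 64
HEAD_DIM = 128  # assumption for illustration

all_offsets = sorted(
    tiled_offset(r, c, HEAD_DIM)
    for r in range(BLOCK_SIZE)
    for c in range(HEAD_DIM)
)
```

Whatever the exact ordering, the tiled mapping is a permutation of the same block_size × head_dim elements, so the block's k region keeps the same size and only the element order changes.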

The test case is also extended with a -p/--preshuffle flag.
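For reference, a flag like this is typically wired up as a boolean switch. The sketch below assumes the test script uses argparse with a store_true action; the parser description and help text are placeholders, and the real test file may define the flag differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser; only the -p/--preshuffle flag name comes from the PR.
    parser = argparse.ArgumentParser(description="indexer k cache op test (sketch)")
    parser.add_argument(
        "-p",
        "--preshuffle",
        action="store_true",
        help="exercise the preshuffled layout instead of row-major",
    )
    return parser


# Without the flag the value is False, matching the ops' default.
default_args = build_parser().parse_args([])
preshuffle_args = build_parser().parse_args(["-p"])
```

With store_true, omitting the flag keeps the row-major path under test, which mirrors the preshuffle=False default of the ops.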

Test Plan

Test Result

Submission Checklist

@1am9trash 1am9trash requested a review from a team April 23, 2026 10:05
@github-actions
Contributor

🏷️ CI Guide

Runs automatically on every PR:

  • ✅ Pre-checks (submodule verification, code formatting)
  • ✅ Aiter op tests (gfx942 + gfx950)
  • ✅ Triton tests on MI35X (only when aiter/ops/triton/** or related paths are changed)

Extended tests (opt-in via labels):

Label        Tests
ci:sglang    SGLang integration tests
ci:atom      ATOM benchmark (DeepSeek-R1 + GPT-OSS)
ci:vllm      vLLM benchmark
ci:all       All of the above

Add labels via the sidebar or gh pr edit 2879 --add-label <label>

