[Bug] Default value of ChunkQ in deepgemm could lead to divided_by_0 error #2891

Open
qli88 wants to merge 1 commit into main from qiang_chunkQ_fix

@qli88 qli88 commented Apr 24, 2026

GLM 5.1 FP8 fails on MI300/355 because its index_n_heads is 32 while ChunkQ defaults to 64, which leads to a divide-by-zero error when calculating the Triton grid.

@qli88 qli88 requested review from a team and Copilot April 24, 2026 00:34
@github-actions
Contributor

🏷️ CI Guide

Runs automatically on every PR:

  • ✅ Pre-checks (submodule verification, code formatting)
  • ✅ Aiter op tests (gfx942 + gfx950)
  • ✅ Triton tests on MI35X (only when aiter/ops/triton/** or related paths are changed)

Extended tests (opt-in via labels):

| Label | Tests |
| --- | --- |
| ci:sglang | SGLang integration tests |
| ci:atom | ATOM benchmark (DeepSeek-R1 + GPT-OSS) |
| ci:vllm | vLLM benchmark |
| ci:all | All of the above |

Add labels via the sidebar or gh pr edit 2891 --add-label <label>

Contributor

Copilot AI left a comment

Pull request overview

This PR fixes a divide-by-zero failure in the DeepGEMM FP8 paged MQA logits Triton wrapper that can occur when the default ChunkQ is larger than heads, causing TileQCount to become 0 and breaking the Triton grid/SplitKV computation.

Changes:

  • Clamp ChunkQ to heads in deepgemm_fp8_paged_mqa_logits_stage1 to prevent TileQCount == 0.
  • Prevent downstream TotalCuCount // TileQCount from raising a ZeroDivisionError in the reported configuration (e.g., heads=32, ChunkQ=64).
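
The failure mode and the clamp described above can be illustrated with a minimal pure-Python sketch. It is not the code in `pa_mqa_logits.py`: the function names, the `batch_size * next_n * (heads // ChunkQ)` formula for the tile count, and the CU count of 304 are assumptions chosen only to reproduce the arithmetic, and the final commit on this branch may compute `TileQCount` differently.

```python
def tile_q_count(batch_size: int, next_n: int, heads: int, chunk_q: int) -> int:
    # Assumed formula: one tile per group of chunk_q heads.
    # Integer division yields 0 whenever chunk_q > heads.
    return batch_size * next_n * (heads // chunk_q)


def cus_per_tile(total_cu_count: int, batch_size: int, next_n: int,
                 heads: int, chunk_q: int = 64, clamp: bool = True) -> int:
    # Hypothetical stand-in for the downstream TotalCuCount // TileQCount step.
    if clamp:
        # The fix described in the overview: never let ChunkQ exceed heads.
        chunk_q = min(chunk_q, heads)
    tiles = tile_q_count(batch_size, next_n, heads, chunk_q)
    return total_cu_count // tiles  # raises ZeroDivisionError if tiles == 0


if __name__ == "__main__":
    # Reported configuration: heads=32 with the default ChunkQ=64.
    try:
        cus_per_tile(304, batch_size=1, next_n=1, heads=32, clamp=False)
    except ZeroDivisionError:
        print("unclamped: ZeroDivisionError")
    print("clamped:", cus_per_tile(304, batch_size=1, next_n=1, heads=32))
```

With the clamp, `heads // ChunkQ` is at least 1 for any positive `heads`, so the divisor can only be zero if `batch_size` or `next_n` is zero.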

Comment thread aiter/ops/triton/attention/pa_mqa_logits.py Outdated
@qli88 qli88 force-pushed the qiang_chunkQ_fix branch from e13756b to faf9226 Compare April 24, 2026 00:41
Update TileQCount calculation in deepgemm_fp8_paged_mqa_logits_stage1 to
avoid possible divided_by_zero issue
@qli88 qli88 force-pushed the qiang_chunkQ_fix branch from 7939a94 to d1ac490 Compare April 24, 2026 04:08