Fix Gemma4 pure-decode full attention #455

Draft
shsym wants to merge 1 commit into pie.app/v1-base-shmem from task/gemma4-pure-decode-full-attention

shsym commented Jun 30, 2026

Contributor

In pure decode, the Gemma4 31B full/global attention layers (32 Q / 4 KV heads) produced corrupted output because the manual full-attention path materialized GQA with ggml_repeat_4d instead of letting ggml_mul_mat apply its grouped-query broadcast.

This change keeps K/V grouped, as the slow path does, and views the packed decode mask as a single-query mask for manual SDPA.
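
To illustrate how materializing GQA with a repeat can disagree with a grouped broadcast, here is a minimal Python sketch. It is not the code from this PR. It assumes the repeat tiles the KV-head axis (output head h reads KV head h % n_kv), while the matmul broadcast assigns contiguous groups (query head h reads KV head h // (n_q / n_kv)). The function names are hypothetical, and the exact tensor layout in the patched path is not shown in this description.

```python
# Sketch only: compares two ways of pairing query heads with KV heads
# for the 32 Q / 4 KV head configuration mentioned above.
N_Q_HEADS = 32
N_KV_HEADS = 4


def broadcast_kv_head(q_head, n_q=N_Q_HEADS, n_kv=N_KV_HEADS):
    """Grouped-query broadcast (assumed): contiguous groups of query heads
    share one KV head, so heads 0..7 -> KV 0, heads 8..15 -> KV 1, etc."""
    group_size = n_q // n_kv
    return q_head // group_size


def tiled_kv_head(q_head, n_kv=N_KV_HEADS):
    """Tiling repeat (assumed): the KV heads are copied end to end,
    so the pattern is 0, 1, 2, 3, 0, 1, 2, 3, ..."""
    return q_head % n_kv


grouped = [broadcast_kv_head(h) for h in range(N_Q_HEADS)]
tiled = [tiled_kv_head(h) for h in range(N_Q_HEADS)]

# Query heads that would attend to a different KV head under the two schemes.
mismatched = [h for h in range(N_Q_HEADS) if grouped[h] != tiled[h]]

if __name__ == "__main__":
    print("grouped:", grouped)
    print("tiled:  ", tiled)
    print("mismatched query heads:", len(mismatched), "of", N_Q_HEADS)
```

Under these assumptions most query heads end up paired with a different KV head depending on the scheme, which is one plausible way a repeat-based path could diverge from the grouped path.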

Verified: 31B Q4 long generation and 31B Q8 output are clean, with no regression on 12B Q4.

coderabbitai Bot commented Jun 30, 2026


Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

shsym marked this pull request as draft June 30, 2026 07:06