
Relax GQA seqlens_k shape validation for backward compat with older models #28259

Merged

vraspar merged 5 commits into main from vraspar/fix-gqa-seqlens-k-shape-compat on May 1, 2026
Conversation

@vraspar
Contributor

@vraspar vraspar commented Apr 29, 2026

Problem

PR #28031 fixed an out-of-bounds GEMM security bug triggered by crafted seqlens_k by changing && to || in the shape validation in group_query_attention_helper.h. This correctly enforces the spec (a 1D tensor of shape (batch_size)) but breaks models (e.g. qwen3-0.6b, qwen3-1.7b) whose builder.py emits seqlens_k with shape [1,1] instead of [1].

Fix

Relax the shape check to accept shapes with unit dimensions around the batch axis. The validation rule is:

  1. seqlens_k must be at least 1D (scalars are rejected)
  2. Total element count must equal batch_size
  3. Each dimension must be 1 or batch_size (e.g. accepts [B], [B,1], [1,B] but rejects [2,2] for B=4)

Also fixes the same latent &&/|| bug in the JS/WebGPU EP (group-query-attention.ts).

Security: The per-element value bounds checks in Compute() are unchanged -- the OOB fix from #28031 is fully preserved.

Changes

  • group_query_attention_helper.h -- scalar rejection + element-count shape check (shared by CPU, CUDA, WebGPU EPs)
  • group-query-attention.ts -- same fix for the JS WebGPU path
  • group_query_attention_op_test.cc -- tests for [1,1] compat, multi-batch [2,1] compat, trailing-batch [1,2] compat, scalar rejection, wrong-count rejection, and invalid factored shape rejection

Relax GQA seqlens_k shape validation for backward compat with older models

PR #28031 tightened seqlens_k shape validation (&&->||), correctly
rejecting non-1D tensors per spec. However, older model builders emit
seqlens_k with shape [1,1] instead of [1], breaking HuggingFace LLMs
(qwen3-0.6b, qwen3-1.7b).

Relax shape check to allow unit dimensions around the batch axis: each
dim must be 1 or batch_size (accepts [B], [B,1], [1,1] but rejects
[2,2] for B=4). Also fixes the same latent && bug in JS/WebGPU EP.

Value bounds checks in Compute() are unchanged.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@vraspar vraspar force-pushed the vraspar/fix-gqa-seqlens-k-shape-compat branch from ba7d3a2 to c0b4397 on April 29, 2026 05:38
@vraspar
Contributor Author

vraspar commented Apr 29, 2026

Sorry about the force-push — Copilot CLI rewrote the branch and lost the incremental diff history.

Addressed all 5 comments:

  • group_query_attention_helper.h:267 — Tightened the factored-shape check so each dim must be 1 or batch_size (rejects e.g. [2,2] for B=4). Added SeqlensKInvalidFactoredShape test to cover it.
  • group-query-attention.ts:203 — Aligned error messages between JS and C++ so they match.
  • group-query-attention.ts:197 — Removed [1, 1] from the comment in both C++ and JS. Now just shows [B, 1] instead of [B].
  • group_query_attention_op_test.cc:267 — Added a comment explaining the loose tolerance: these tests validate shape acceptance, not numerical correctness. Agree exact-value tests can be a follow-up.
  • group_query_attention_op_test.cc:237 — Extended RunGQASeqlensKTest with an optional seqlens_k_shape param. All 5 shape tests use the helper now, net -73 lines.

Add JS/WebGPU test for [1,1] seqlens_k shape (the exact qwen3 regression
case) and C++ test for trailing batch dim shape {1,B}.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Contributor

Copilot AI left a comment

Pull request overview

Relaxes seqlens_k shape validation for GroupQueryAttention to restore backward compatibility with older model exporters that emit extra unit dimensions (e.g., [B,1]), while keeping the value-range checks that prevent OOB access.

Changes:

  • Update C++ CheckInputs() validation to accept seqlens_k shapes with batch_size total elements (with additional per-dimension constraints).
  • Apply equivalent validation updates in the JS/WebGPU validateInputs() path.
  • Extend CPU and JS test coverage with legacy-shape acceptance and wrong-shape rejection cases.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File changes:

  • onnxruntime/contrib_ops/cpu/bert/group_query_attention_helper.h -- Updates seqlens_k shape validation and error messages in the shared helper.
  • js/web/lib/wasm/jsep/webgpu/ops/group-query-attention.ts -- Aligns WebGPU input validation with the relaxed seqlens_k shape rules.
  • onnxruntime/test/contrib_ops/group_query_attention_op_test.cc -- Adds regression tests for legacy 2D shapes and invalid element-count/shape cases.
  • js/web/test/data/ops/group-query-attention.jsonc -- Adds a Web test case covering legacy [1,1] seqlens_k shape acceptance.


edgchen1
edgchen1 previously approved these changes Apr 29, 2026
Address review comments:
- Reject rank-0 (scalar) seqlens_k in both C++ and JS validation
- Use std::optional<vector> for test helper seqlens_k_shape param
- Add SeqlensKScalarRejected test case

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@vraspar
Contributor Author

vraspar commented Apr 29, 2026

Addressed remaining comments:

  • helper.h:265 (edgchen1 + Copilot) -- Added a NumDimensions() == 0 rejection so a scalar seqlens_k is no longer silently accepted when batch_size == 1. The same check was added in the JS path (dims.length === 0).
  • test.cc:26 (edgchen1) -- Changed the seqlens_k_shape param to std::optional<std::vector<int64_t>> so an empty {} isn't confused with a scalar shape. All call sites wrapped with an explicit std::vector<int64_t>{...}.
  • helper.h:277 (Copilot) -- Updated the PR description to reflect the full validation rule (at least 1D + element count + per-dim constraint).
  • Added a SeqlensKScalarRejected test to cover the new scalar rejection path.

edgchen1
edgchen1 previously approved these changes Apr 29, 2026
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
@vraspar
Contributor Author

vraspar commented Apr 29, 2026

Validated with https://huggingface.co/schmuell/Qwen3-1.7B

edgchen1
edgchen1 previously approved these changes Apr 29, 2026
@ankitm3k
Contributor

@vraspar your PR #28031 broke this functionality; I have tested with open source models too. FYI intel#1067

@edgchen1
Contributor

looks like the CI build is complaining about JS formatting.

Error: Following source files are not formatted: (did you run "npm run format"?)
js/web/lib/wasm/jsep/webgpu/ops/group-query-attention.ts

@vraspar
Contributor Author

vraspar commented May 1, 2026

Thanks @edgchen1, fixed the linting issue.

@vraspar vraspar merged commit 60ce9cc into main on May 1, 2026 (89 of 91 checks passed)
@vraspar vraspar deleted the vraspar/fix-gqa-seqlens-k-shape-compat branch on May 1, 2026 22:35