Relax GQA seqlens_k shape validation for backward compat with older models #28259
Conversation
PR #28031 tightened seqlens_k shape validation (`&&` -> `||`), correctly rejecting non-1D tensors per spec. However, older model builders emit seqlens_k with shape [1,1] instead of [1], breaking HuggingFace LLMs (qwen3-0.6b, qwen3-1.7b). Relax the shape check to allow unit dimensions around the batch axis: each dim must be 1 or batch_size (accepts [B], [B,1], [1,1] but rejects [2,2] for B=4). Also fixes the same latent `&&` bug in the JS/WebGPU EP. Value bounds checks in Compute() are unchanged.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Force-pushed from ba7d3a2 to c0b4397
Sorry about the force-push — Copilot CLI rewrote the branch and lost the incremental diff history. Addressed all 5 comments:
Add JS/WebGPU test for [1,1] seqlens_k shape (the exact qwen3 regression
case) and C++ test for trailing batch dim shape {1,B}.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
Relaxes seqlens_k shape validation for GroupQueryAttention to restore backward compatibility with older model exporters that emit extra unit dimensions (e.g., [B,1]), while keeping the value-range checks that prevent OOB access.
Changes:
- Update the C++ `CheckInputs()` validation to accept `seqlens_k` shapes with `batch_size` total elements (with additional per-dimension constraints).
- Apply equivalent validation updates in the JS/WebGPU `validateInputs()` path.
- Extend CPU and JS test coverage with legacy-shape acceptance and wrong-shape rejection cases.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| onnxruntime/contrib_ops/cpu/bert/group_query_attention_helper.h | Updates seqlens_k shape validation and error messages in shared helper. |
| js/web/lib/wasm/jsep/webgpu/ops/group-query-attention.ts | Aligns WebGPU input validation with the relaxed seqlens_k shape rules. |
| onnxruntime/test/contrib_ops/group_query_attention_op_test.cc | Adds regression tests for legacy 2D shapes and invalid element-count/shape cases. |
| js/web/test/data/ops/group-query-attention.jsonc | Adds a Web test case covering legacy [1,1] seqlens_k shape acceptance. |
Address review comments:
- Reject rank-0 (scalar) seqlens_k in both C++ and JS validation
- Use std::optional<vector> for test helper seqlens_k_shape param
- Add SeqlensKScalarRejected test case
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Addressed remaining comments:
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
Validated with https://huggingface.co/schmuell/Qwen3-1.7B
@vraspar your PR #28031 broke this functionality, and I have tested with open-source models too. FYI intel#1067
Looks like the CI build is complaining about JS formatting: Error: Following source files are not formatted: (did you run "npm run format"?)
Thanks @edgchen1, fixed the linting issue
Problem
PR #28031 fixed a security OOB GEMM bug, triggerable via a crafted seqlens_k, by changing `&&` to `||` in the shape validation in group_query_attention_helper.h. This correctly enforces the spec (a 1D tensor of shape (batch_size)) but breaks models (e.g. qwen3-0.6b, qwen3-1.7b) whose builder.py emits seqlens_k with shape [1,1] instead of [1].
Fix
Relax the shape check to accept shapes with unit dimensions around the batch axis. The validation rule is: every dimension must be 1 or batch_size, and the total element count must equal batch_size (accepts [B], [B,1], [1,1] but rejects [2,2] for B=4).
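The relaxed rule can be sketched as follows in TypeScript. This is an illustrative reimplementation, not the actual onnxruntime source; the function name `seqlensKShapeOk` is hypothetical.

```typescript
// Sketch of the relaxed seqlens_k shape rule (illustrative, not the real
// onnxruntime code): every dimension must be 1 or batchSize, the total
// element count must equal batchSize, and rank-0 (scalar) is rejected.
function seqlensKShapeOk(dims: readonly number[], batchSize: number): boolean {
  if (dims.length === 0) {
    return false; // scalar seqlens_k rejected
  }
  let totalElements = 1;
  for (const d of dims) {
    if (d !== 1 && d !== batchSize) {
      return false; // e.g. [2,2] for batchSize 4
    }
    totalElements *= d;
  }
  return totalElements === batchSize; // [B], [B,1], and [1,1] when B=1 pass
}
```

Under this rule, `[4]` and `[4,1]` pass for batch size 4, `[1,1]` passes for batch size 1 (the qwen3 case), and `[2,2]` fails for batch size 4 because 2 is neither 1 nor the batch size.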
Also fixes the same latent &&/|| bug in the JS/WebGPU EP (group-query-attention.ts).
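To see why the `&&` form was a latent bug, consider a minimal reconstruction (the exact clauses are assumptions based on the PR description, not the real source):

```typescript
// Hypothetical reconstruction of the &&-vs-|| validation bug. With '&&',
// the check only rejects when BOTH clauses hold, so a 2-D tensor whose
// first dim happens to equal batchSize (e.g. [batchSize, 7]) slips through.
function rejectsWithAnd(dims: number[], batchSize: number): boolean {
  return dims.length !== 1 && dims[0] !== batchSize; // buggy: too permissive
}

// With '||' (the #28031 fix), the check rejects when EITHER clause holds,
// so only a strict 1-D [batchSize] tensor passes.
function rejectsWithOr(dims: number[], batchSize: number): boolean {
  return dims.length !== 1 || dims[0] !== batchSize; // spec-strict
}
```

For batch size 4, `rejectsWithAnd([4, 7], 4)` returns false (the crafted shape is accepted, enabling the OOB path), while `rejectsWithOr` rejects it; this PR then layers the unit-dimension relaxation on top of the strict form.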
Security: The per-element value bounds checks in Compute() are unchanged -- the OOB fix from #28031 is fully preserved.
Changes