[xpu] Guard oneDNN SDPA against small head_dim causing GPU page faults #8
Open
laifenxiawucha wants to merge 1 commit into
…PU page fault in oneDNN MatMul kernel (MFDNN-14479) when head_dim is
below the kernel tiling block size. The oneDNN 3.11.1 upgrade
(PyTorch pytorch#177607) fixed the original crash (head_dim=96, seq_len=65536)
but missed small head_dim values (head_dim=16, seq_len=16413).

Root cause: can_use_overrideable_attention had no minimum head_dim
check, so shapes that crash oneDNN were still routed to it.

Fix: Add minimum head_dim guard in check_head_dim_size_xpu():
  fp32 requires head_dim >= 32 (SIMD16=64B tiling block)
  fp16/bf16 requires head_dim >= 16 (SIMD16=32B tiling block)

Fixes: intel/torch-xpu-ops#3394
Reference: MFDNN-14479
CUDA ref: aten/src/ATen/native/transformers/sdp_utils_cpp.cpp:14-34
  (CUDA has no min check - SIMT handles small dims natively)
e54b848 to
b2e8d74
Description
Guard oneDNN SDPA against head_dim values that trigger GPU page faults in the oneDNN MatMul kernel (MFDNN-14479).
The oneDNN 3.11.1 upgrade (PyTorch pytorch#177607) fixed the original crash case (head_dim=96, seq_len=65536) but the fix was incomplete — small head_dim values still crash.
Root Cause
check_head_dim_size_xpu() in aten/src/ATen/native/mkldnn/xpu/Attention.cpp only enforced a maximum head_dim (576) but had no minimum. oneDNN's MatMul kernel for SDPA requires head_dim >= SIMD tiling block size (32 for fp32, 16 for fp16/bf16). When head_dim is below this, the kernel writes beyond allocated buffers, producing a GPU page fault.
Fix
Add minimum head_dim guard in check_head_dim_size_xpu():
Shapes below the minimum fall through to the math backend.
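The diff itself is not shown on this page, so the following is only a minimal standalone sketch of the rule described above, not the code in Attention.cpp. The names DType, kMaxHeadDim, min_head_dim, and head_dim_ok_for_onednn are hypothetical and chosen for illustration; the thresholds (32 for fp32, 16 for fp16/bf16, maximum 576) are taken from this description.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the dtype enum; only the dtypes named in this PR.
enum class DType { Float32, Float16, BFloat16 };

// Maximum head_dim that was already enforced before this change (per the PR text).
constexpr int64_t kMaxHeadDim = 576;

// Minimum head_dim per dtype as stated in the PR:
// 32 for fp32, 16 for fp16/bf16.
constexpr int64_t min_head_dim(DType dtype) {
  return dtype == DType::Float32 ? 32 : 16;
}

// Returns true when head_dim is inside the range this sketch treats as safe
// for oneDNN SDPA. A false result means the caller should not select the
// oneDNN path and should fall through to the math backend instead.
constexpr bool head_dim_ok_for_onednn(DType dtype, int64_t head_dim) {
  return head_dim >= min_head_dim(dtype) && head_dim <= kMaxHeadDim;
}

// Runtime spot checks at the boundaries of the rule.
inline void check_boundaries() {
  assert(!head_dim_ok_for_onednn(DType::Float32, 16));   // below fp32 minimum
  assert(head_dim_ok_for_onednn(DType::Float32, 32));    // at fp32 minimum
  assert(head_dim_ok_for_onednn(DType::BFloat16, 16));   // at bf16 minimum
  assert(!head_dim_ok_for_onednn(DType::Float16, 8));    // below fp16 minimum
  assert(!head_dim_ok_for_onednn(DType::Float16, 577));  // above maximum
}
```

Under this rule an fp32 input with head_dim=16 is rejected, while an fp16 or bf16 input with head_dim=16 is still accepted, since 16 is exactly the stated half-precision minimum.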
Reproducer
After fix: falls back to math backend (no crash), produces correct output.
Self-Review
check_head_dim_size_xpu() is the appropriate gate for oneDNN SDPA constraints
References
aten/src/ATen/native/transformers/sdp_utils_cpp.cpp:14-34
Fixes: intel/torch-xpu-ops#3394