
test: add XPU regression test for flash attention dynamic shape generalization (#3007) #3679

Draft: Copilot wants to merge 3 commits into main from copilot/upstream-ut-fix-assertion-error


Copilot AI commented May 15, 2026

  • Analyzed issue: test_flash_attention_dynamic failing on XPU
  • Found root cause: check_flash_attention_head_dim_size used concrete .size() instead of .sym_size()
  • Confirmed pytorch upstream fixes are in pytorch/main
  • Created test/xpu/dynamo/test_cuda_repro_xpu.py with test_flash_attention_dynamic regression test
  • Added new test file to test/xpu/skip_list_common.py
  • Created test/repro/test_flash_attention_dynamic.py standalone repro test
  • Fixed lint errors:
    • TEST_HAS_MAIN: replaced unittest.main() with run_tests() from torch.testing._internal.common_utils
    • PYFMT: removed extra blank line between imports and device_type
    • Switched from unittest.TestCase to torch's TestCase

Copilot AI linked an issue May 15, 2026 that may be closed by this pull request
7 tasks
Add test/xpu/dynamo/test_cuda_repro_xpu.py with test_flash_attention_dynamic
to validate that torch.compile(dynamic=True) + XPU flash attention produces
exactly 2 Dynamo frames (no over-specialisation on seq_len).

Also add test/repro/test_flash_attention_dynamic.py as standalone reproducer
and register the new dynamo test file in skip_list_common.py.

Root cause was in pytorch's xpu/sdp_utils.cpp where
check_flash_attention_head_dim_size used concrete .size() instead of
.sym_size(), materializing the head-dim dimension as a static guard and
preventing Dynamo from generalising seq_len after the second recompilation.
Fixed upstream in pytorch commit fd1d1b0.
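
The frame-count assertion described above can be illustrated with a toy model. This is only a sketch and not Dynamo's real algorithm: `ToyCompiler`, `specialized_dims`, and the tensor shape `(5, 8, seq_len, 128)` are hypothetical names and values chosen for illustration, and the model does not attempt to explain how the head-dim check affected seq_len. It only shows why a compiler that generalises after the first shape change ends up with 2 frames, while one that keeps specialising on seq_len ends up with one frame per distinct length.

```python
class ToyCompiler:
    """Toy model of guard-based recompilation; NOT Dynamo's real algorithm."""

    def __init__(self, specialized_dims=()):
        # Hypothetical knob: dims that some check forces to stay static.
        self.specialized_dims = set(specialized_dims)
        self.dynamic_dims = set()  # dims allowed to vary without a guard
        self.frames = []           # one {dim: required_size} guard dict per compile
        self.last_shape = None

    def __call__(self, shape):
        # Cache hit: some compiled frame's guards all hold for this shape.
        if any(all(shape[d] == s for d, s in g.items()) for g in self.frames):
            return
        # Cache miss: dims that changed since the last compile become dynamic,
        # unless they are forced to stay specialized.
        if self.last_shape is not None:
            changed = {
                d for d, (a, b) in enumerate(zip(self.last_shape, shape)) if a != b
            }
            self.dynamic_dims |= changed - self.specialized_dims
        # Compile a new frame that guards every dim not marked dynamic.
        self.frames.append(
            {d: s for d, s in enumerate(shape) if d not in self.dynamic_dims}
        )
        self.last_shape = shape


def count_frames(compiler, seq_lens=(512, 513, 514)):
    # Illustrative layout: (batch, heads, seq_len, head_dim).
    for seq_len in seq_lens:
        compiler((5, 8, seq_len, 128))
    return len(compiler.frames)


generalizing = count_frames(ToyCompiler())
over_specialized = count_frames(ToyCompiler(specialized_dims={2}))
print(generalizing, over_specialized)  # prints "2 3"
```

In this toy, the healthy case compiles once with static guards and once more with seq_len left unguarded, so the third input reuses the second frame. A count above 2 is the signature of over-specialisation that the regression test is described as checking for.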

Agent-Logs-Url: https://github.com/intel/torch-xpu-ops/sessions/2acbff39-ce65-4690-80a9-20b71096b0a3

Co-authored-by: Stonepia <12094956+Stonepia@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix AssertionError in test_flash_attention_dynamic" to "test: add XPU regression test for flash attention dynamic shape generalization (#3007)" May 15, 2026
Copilot AI requested a review from Stonepia May 15, 2026 03:43
@chuanqi129

@copilot The Python lint (flake8/ruff) check failed. Please fix the lint errors in this PR.

Important: Do NOT update the plan in the PR description directly. Use reply comments to update the status.

Lint job log: https://github.com/intel/torch-xpu-ops/actions/runs/25898977896

Instructions:

  1. Review the lint failure log via the link above
  2. Fix all lint errors in the relevant source files
  3. Ensure both Python lint and Clang format checks pass
  4. Use reply comments to report progress - do NOT modify the PR description plan directly

…tion_dynamic.py

Fix two lintrunner errors:
- TEST_HAS_MAIN: replace unittest.main() with run_tests() from
  torch.testing._internal.common_utils
- PYFMT: remove extra blank line between imports and module-level code
- Use torch's TestCase base class instead of unittest.TestCase

Agent-Logs-Url: https://github.com/intel/torch-xpu-ops/sessions/c871ddab-679b-49e2-8620-824352689370

Co-authored-by: chuanqi129 <13608516+chuanqi129@users.noreply.github.com>

Copilot AI commented May 15, 2026

@copilot The Python lint (flake8/ruff) check failed. Please fix the lint errors in this PR.

Important: Do NOT update the plan in the PR description directly. Use reply comments to update the status...

Fixed in f203f4e. Two lintrunner errors in test/repro/test_flash_attention_dynamic.py:

  • TEST_HAS_MAIN: replaced unittest.main() with run_tests() from torch.testing._internal.common_utils; also switched base class to torch's TestCase
  • PYFMT: removed the extra blank line between the import block and device_type

Copilot AI requested a review from chuanqi129 May 15, 2026 03:47
@chuanqi129

@copilot New UT failures detected in xpu_distributed tests (0 total). Please check these new failures and analyze whether they are caused by the PR changes.

Important: Do NOT update the plan in the PR description directly. Use reply comments to update the status.

Job log: https://github.com/intel/torch-xpu-ops/actions/runs/25899102823
Full new failure report: https://github.com/intel/torch-xpu-ops/actions/runs/25899102823/artifacts/7032831302

New failures:

Instructions:

  1. Download the full new failure report artifact for complete error messages
  2. Analyze whether the failures are caused by PR changes
  3. If caused by PR changes, fix those failures on this PR branch directly
  4. Add failing test cases with pytest commands in the PR description for the reproducer job to use
  5. Use reply comments to report analysis status - do NOT modify the PR description plan directly


Copilot AI commented May 16, 2026

@copilot New UT failures detected in xpu_distributed tests (0 total). Please check these new failures and analyze whether they are caused by the PR changes.

Important: Do NOT update the plan in...

Reviewed the xpu_distributed report. The notification lists 0 new failures — the "New failures:" section is empty. There are no test failures attributable to this PR's changes.

@github-actions

Performance outliers, please check!

  • 🔴 [-1, 80%), should be regression

| Category | Model | Target vs. Baseline [Eager] | Target vs. Baseline [Inductor] |
| --- | --- | --- | --- |
| timm_models_bfloat16_training | ghostnet_100 | 0.862841 | 0.567453 |
| timm_models_bfloat16_training | mobilenetv3_large_100 | 0.783959 | 0.569754 |
| timm_models_bfloat16_training | mobilevit_s | 0.709906 | 0.573729 |
| timm_models_bfloat16_training | mobilenetv2_100 | 0.828178 | 0.577952 |
| timm_models_bfloat16_training | tf_efficientnet_b0 | 0.812896 | 0.624571 |
| timm_models_bfloat16_training | vit_base_patch16_siglip_256 | 0.690675 | 0.741665 |
| timm_models_bfloat16_training | deit_base_distilled_patch16_224 | 0.737149 | 0.749822 |
| timm_models_bfloat16_training | nfnet_l0 | 0.729316 | 0.767521 |
| timm_models_bfloat16_training | dm_nfnet_f0 | 0.639279 | 0.776440 |
| timm_models_bfloat16_training | adv_inception_v3 | 0.787776 | 0.787703 |
| torchbench_bfloat16_training | mobilenet_v2 | 1.063295 | 0.788832 |
| timm_models_bfloat16_training | beit_base_patch16_224 | 0.730509 | 0.806620 |
| timm_models_bfloat16_training | visformer_small | 0.724751 | 0.823908 |
  • 🟡 [80%, 90%), may be fluctuations

| Category | Model | Target vs. Baseline [Eager] | Target vs. Baseline [Inductor] |
| --- | --- | --- | --- |
| timm_models_bfloat16_training | swin_base_patch4_window7_224 | 0.854840 | 0.824171 |
| timm_models_bfloat16_training | inception_v3 | 0.841318 | 0.824523 |
| timm_models_bfloat16_training | convnextv2_nano.fcmae_ft_in22k_in1k | 0.805900 | 0.848013 |
| timm_models_bfloat16_training | repvgg_a2 | 0.845820 | 0.863090 |
| torchbench_bfloat16_training | mnasnet1_0 | 1.076459 | 0.870497 |
| timm_models_bfloat16_training | deit_tiny_patch16_224.fb_in1k | 0.877745 | 0.887364 |
| torchbench_bfloat16_training | resnext50_32x4d | 0.978178 | 0.887715 |
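
The report does not show how the bot assigns a row to a bucket. The listed rows are all consistent with bucketing on the lower of the two ratios, so the sketch below assumes that rule. The function name `classify` and its threshold parameters are hypothetical, and the actual CI logic may differ.

```python
def classify(eager_ratio, inductor_ratio,
             regression_below=0.80, fluctuation_below=0.90):
    """Bucket a model by its worst target-vs-baseline ratio.

    Assumption: the lower of the two ratios decides the bucket. This rule is
    inferred from the rows in the report, not taken from the bot's source.
    """
    worst = min(eager_ratio, inductor_ratio)
    if worst < regression_below:
        return "regression"   # the [-1, 80%) bucket
    if worst < fluctuation_below:
        return "fluctuation"  # the [80%, 90%) bucket
    return "ok"


# Three rows copied from the report: (eager ratio, inductor ratio).
rows = {
    "visformer_small": (0.724751, 0.823908),
    "swin_base_patch4_window7_224": (0.854840, 0.824171),
    "mnasnet1_0": (1.076459, 0.870497),
}
labels = {model: classify(*ratios) for model, ratios in rows.items()}
print(labels)
```

Under this assumed rule, visformer_small lands in the regression bucket because of its eager ratio even though its Inductor ratio is above 0.80, which matches where the report lists it.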


Development

Successfully merging this pull request may close these issues.

Title: [upstream_ut] AssertionError: Scalars are not equal!
