Skip to content

Revert "Add subgroup topk kernel for XPU (part1 of #3369) (#3371)"#3707

Merged
guangyey merged 1 commit into
mainfrom
yifeng/revert_topk_opt
May 19, 2026
Merged

Revert "Add subgroup topk kernel for XPU (part1 of #3369) (#3371)"#3707
guangyey merged 1 commit into
mainfrom
yifeng/revert_topk_opt

Conversation

@CuiYifeng
Copy link
Copy Markdown
Contributor

@CuiYifeng CuiYifeng commented May 19, 2026

This reverts commit 8eaa591, except new overloads of sycl_kernel_submit.
The motivation is that #3371 caused build timeout in stock CD.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR reverts the subgroup top-k XPU optimization that caused build timeouts with multiple AOT targets, restoring the existing segmented top-k path while removing the dedicated subgroup kernel files.

Review context: read .github/skills/xpu-ops-pr-review/SKILL.md.

Changes:

  • Removes the subgroup top-k kernel implementation and declaration.
  • Removes the optimized-path dispatch from topk_kernel.
  • Restores unconditional use of segmented_group_select_pairs followed by optional sorting.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

File Description
src/ATen/native/xpu/sycl/TensorTopKSbtopkKernel.h Deletes the subgroup top-k API declarations.
src/ATen/native/xpu/sycl/TensorTopKSbtopkKernel.cpp Deletes the subgroup top-k kernel implementation and launch logic.
src/ATen/native/xpu/sycl/TensorTopKKernel.cpp Restores the original top-k execution path without subgroup dispatch.

Copy link
Copy Markdown
Contributor

@jianyizh jianyizh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please write clear in description that it's reverted because of long build time in CD

@CuiYifeng
Copy link
Copy Markdown
Contributor Author

Please write clear in description that it's reverted because of long build time in CD

Updated.

@CuiYifeng CuiYifeng requested a review from jianyizh May 19, 2026 06:03
@guangyey
Copy link
Copy Markdown
Contributor

I saw the build passed and landed it immediately since the change only removed code. We need to update commit pin as soon as possible to unblock pytorch/pytorch#173895 PyTorch CI.

@guangyey guangyey merged commit 5872efe into main May 19, 2026
29 checks passed
@guangyey guangyey deleted the yifeng/revert_topk_opt branch May 19, 2026 07:56
@chuanqi129
Copy link
Copy Markdown
Contributor

@copilot New UT failures detected in op_ut tests (7 total). Please check these new failures and analyze whether they are caused by the PR changes.

Important: Do NOT update the plan in the PR description directly. Use reply comments to update the status.

Job log: https://github.com/intel/torch-xpu-ops/actions/runs/26078819694
Full new failure report: https://github.com/intel/torch-xpu-ops/actions/runs/26078819694/artifacts/7081996199

New failures:

Category Class name Test name Status Message
op_ut third_party.torch-xpu-ops.test.xpu.dynamo.test_regional_inductor_xpu.RegionalInductorInvokeSubgraphTests test_max_autotune_no_cudagraphs_serialize_False failed AttributeError: module 'torch._inductor' has no attribute 'compile_fx'
op_ut third_party.torch-xpu-ops.test.xpu.functorch.test_control_flow_xpu.TestControlFlowTraced test_while_loop_autograd_simple failed AssertionError: 'clas[152 chars] zeros: "i64[]" = torch.ops.aten.zeros.def[3514 chars]1)\n' != 'clas[152 chars] clone: "f32[3, 3]" = torch.ops.aten.clone[3636 chars]1)\n'
op_ut third_party.torch-xpu-ops.test.xpu.functorch.test_ops_xpu error ImportError: cannot import name 'skipOps' from 'common_utils' (/__w/torch-xpu-ops/torch-xpu-ops/pytorch/third_party/torch-xpu-ops/test/xpu/../../../../test/functorch/common_utils.py)
op_ut third_party.torch-xpu-ops.test.xpu.functorch.test_vmap_xpu error ImportError: cannot import name 'skipOps' from 'common_utils' (/__w/torch-xpu-ops/torch-xpu-ops/pytorch/third_party/torch-xpu-ops/test/xpu/../../../../test/functorch/common_utils.py)
op_ut third_party.torch-xpu-ops.test.xpu.test_custom_ops_xpu.TestCustomOpAPI test_no_grad_skips_autograd failed AssertionError: Scalars are not equal!

... and 2 more failures. See full list in artifact.

Instructions:

  1. Download the full new failure report artifact for complete error messages
  2. Analyze whether the failures are caused by PR changes
  3. If caused by PR changes, fix those failures on this PR branch directly
  4. Add failing test cases with pytest commands in the PR description for the reproducer job to use
  5. Use reply comments to report analysis status - do NOT modify the PR description plan directly

@github-actions
Copy link
Copy Markdown

Performance outliers, please check!

  • 🔴 [-1, 80%), should be regression
Category Model Target vs. Baseline [Eager] Target vs. Baseline [Inductor]
timm_models_bfloat16_training ghostnet_100 0.856848 0.564996
timm_models_bfloat16_training mobilenetv3_large_100 0.783607 0.569622
timm_models_bfloat16_training mobilevit_s 0.718829 0.579479
timm_models_bfloat16_training mobilenetv2_100 0.835302 0.584723
timm_models_bfloat16_training tf_efficientnet_b0 0.804629 0.619609
timm_models_bfloat16_training vit_base_patch16_siglip_256 0.695374 0.744301
timm_models_bfloat16_training deit_base_distilled_patch16_224 0.752327 0.772345
timm_models_bfloat16_training dm_nfnet_f0 0.639916 0.782825
timm_models_bfloat16_training nfnet_l0 0.738771 0.783153
torchbench_bfloat16_training mobilenet_v2 1.063866 0.791071
timm_models_bfloat16_training adv_inception_v3 0.791371 0.796311
timm_models_bfloat16_training beit_base_patch16_224 0.735183 0.813880
timm_models_bfloat16_training visformer_small 0.719781 0.822906
  • 🟡 [80%, 90%), may be fluctuations
Category Model Target vs. Baseline [Eager] Target vs. Baseline [Inductor]
timm_models_bfloat16_training swin_base_patch4_window7_224 0.828552 0.804603
timm_models_bfloat16_training inception_v3 0.836975 0.830448
timm_models_bfloat16_training repvgg_a2 0.823781 0.837616
timm_models_bfloat16_training convnextv2_nano.fcmae_ft_in22k_in1k 0.812822 0.854369
torchbench_bfloat16_training resnext50_32x4d 0.968512 0.862016
torchbench_bfloat16_training mnasnet1_0 1.071825 0.863574
timm_models_bfloat16_training deit_tiny_patch16_224.fb_in1k 0.884297 0.890124

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants