Revert "Add subgroup topk kernel for XPU (part1 of #3369) (#3371)" by CuiYifeng · Pull Request #3707 · intel/torch-xpu-ops

CuiYifeng · 2026-05-19T05:46:41Z

This reverts commit 8eaa591, except new overloads of sycl_kernel_submit.
The motivation is that #3371 caused build timeout in stock CD.

This reverts commit 8eaa591.

Copilot

Pull request overview

This PR reverts the subgroup top-k XPU optimization that caused build timeouts with multiple AOT targets, restoring the existing segmented top-k path while removing the dedicated subgroup kernel files.

Review context: read .github/skills/xpu-ops-pr-review/SKILL.md.

Changes:

Removes the subgroup top-k kernel implementation and declaration.
Removes the optimized-path dispatch from topk_kernel.
Restores unconditional use of segmented_group_select_pairs followed by optional sorting.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

File	Description
`src/ATen/native/xpu/sycl/TensorTopKSbtopkKernel.h`	Deletes the subgroup top-k API declarations.
`src/ATen/native/xpu/sycl/TensorTopKSbtopkKernel.cpp`	Deletes the subgroup top-k kernel implementation and launch logic.
`src/ATen/native/xpu/sycl/TensorTopKKernel.cpp`	Restores the original top-k execution path without subgroup dispatch.

jianyizh

Please write clear in description that it's reverted because of long build time in CD

CuiYifeng · 2026-05-19T06:03:34Z

Please write clear in description that it's reverted because of long build time in CD

Updated.

guangyey · 2026-05-19T07:56:15Z

I saw the build passed and landed it immediately since the change only removed code. We need to update commit pin as soon as possible to unblock pytorch/pytorch#173895 PyTorch CI.

chuanqi129 · 2026-05-19T11:20:51Z

@copilot New UT failures detected in op_ut tests (7 total). Please check these new failures and analyze whether they are caused by the PR changes.

Important: Do NOT update the plan in the PR description directly. Use reply comments to update the status.

Job log: https://github.com/intel/torch-xpu-ops/actions/runs/26078819694
Full new failure report: https://github.com/intel/torch-xpu-ops/actions/runs/26078819694/artifacts/7081996199

New failures:

Category	Class name	Test name	Status	Message
op_ut	third_party.torch-xpu-ops.test.xpu.dynamo.test_regional_inductor_xpu.RegionalInductorInvokeSubgraphTests	test_max_autotune_no_cudagraphs_serialize_False	failed	AttributeError: module 'torch._inductor' has no attribute 'compile_fx'
op_ut	third_party.torch-xpu-ops.test.xpu.functorch.test_control_flow_xpu.TestControlFlowTraced	test_while_loop_autograd_simple	failed	AssertionError: 'clas[152 chars] zeros: "i64[]" = torch.ops.aten.zeros.def[3514 chars]1)\n' != 'clas[152 chars] clone: "f32[3, 3]" = torch.ops.aten.clone[3636 chars]1)\n'
op_ut		third_party.torch-xpu-ops.test.xpu.functorch.test_ops_xpu	error	ImportError: cannot import name 'skipOps' from 'common_utils' (/__w/torch-xpu-ops/torch-xpu-ops/pytorch/third_party/torch-xpu-ops/test/xpu/../../../../test/functorch/common_utils.py)
op_ut		third_party.torch-xpu-ops.test.xpu.functorch.test_vmap_xpu	error	ImportError: cannot import name 'skipOps' from 'common_utils' (/__w/torch-xpu-ops/torch-xpu-ops/pytorch/third_party/torch-xpu-ops/test/xpu/../../../../test/functorch/common_utils.py)
op_ut	third_party.torch-xpu-ops.test.xpu.test_custom_ops_xpu.TestCustomOpAPI	test_no_grad_skips_autograd	failed	AssertionError: Scalars are not equal!

... and 2 more failures. See full list in artifact.

Instructions:

Download the full new failure report artifact for complete error messages
Analyze whether the failures are caused by PR changes
If caused by PR changes, fix those failures on this PR branch directly
Add failing test cases with pytest commands in the PR description for the reproducer job to use
Use reply comments to report analysis status - do NOT modify the PR description plan directly

github-actions · 2026-05-19T11:48:32Z

Performance outliers, please check!

🔴 [-1, 80%), should be regression

Category	Model	Target vs. Baseline [Eager]	Target vs. Baseline [Inductor]
timm_models_bfloat16_training	ghostnet_100	0.856848	0.564996
timm_models_bfloat16_training	mobilenetv3_large_100	0.783607	0.569622
timm_models_bfloat16_training	mobilevit_s	0.718829	0.579479
timm_models_bfloat16_training	mobilenetv2_100	0.835302	0.584723
timm_models_bfloat16_training	tf_efficientnet_b0	0.804629	0.619609
timm_models_bfloat16_training	vit_base_patch16_siglip_256	0.695374	0.744301
timm_models_bfloat16_training	deit_base_distilled_patch16_224	0.752327	0.772345
timm_models_bfloat16_training	dm_nfnet_f0	0.639916	0.782825
timm_models_bfloat16_training	nfnet_l0	0.738771	0.783153
torchbench_bfloat16_training	mobilenet_v2	1.063866	0.791071
timm_models_bfloat16_training	adv_inception_v3	0.791371	0.796311
timm_models_bfloat16_training	beit_base_patch16_224	0.735183	0.813880
timm_models_bfloat16_training	visformer_small	0.719781	0.822906

🟡 [80%, 90%), may be fluctuations

Category	Model	Target vs. Baseline [Eager]	Target vs. Baseline [Inductor]
timm_models_bfloat16_training	swin_base_patch4_window7_224	0.828552	0.804603
timm_models_bfloat16_training	inception_v3	0.836975	0.830448
timm_models_bfloat16_training	repvgg_a2	0.823781	0.837616
timm_models_bfloat16_training	convnextv2_nano.fcmae_ft_in22k_in1k	0.812822	0.854369
torchbench_bfloat16_training	resnext50_32x4d	0.968512	0.862016
torchbench_bfloat16_training	mnasnet1_0	1.071825	0.863574
timm_models_bfloat16_training	deit_tiny_patch16_224.fb_in1k	0.884297	0.890124

Revert "Add subgroup topk kernel for XPU (part1 of #3369) (#3371)"

abfadeb

This reverts commit 8eaa591.

CuiYifeng requested review from chuanqi129, Copilot and jianyizh May 19, 2026 05:46

Copilot started reviewing on behalf of CuiYifeng May 19, 2026 05:47 View session

Copilot AI reviewed May 19, 2026

View reviewed changes

jianyizh requested changes May 19, 2026

View reviewed changes

CuiYifeng requested a review from jianyizh May 19, 2026 06:03

guangyey approved these changes May 19, 2026

View reviewed changes

jianyizh approved these changes May 19, 2026

View reviewed changes

guangyey merged commit 5872efe into main May 19, 2026
29 checks passed

guangyey deleted the yifeng/revert_topk_opt branch May 19, 2026 07:56

wdziurdz mentioned this pull request May 21, 2026

test_sort failures with num_warps=8 after agama driver upgrade intel/intel-xpu-backend-for-triton#6869

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Revert "Add subgroup topk kernel for XPU (part1 of #3369) (#3371)"#3707

Revert "Add subgroup topk kernel for XPU (part1 of #3369) (#3371)"#3707
guangyey merged 1 commit into
mainfrom
yifeng/revert_topk_opt

CuiYifeng commented May 19, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

jianyizh left a comment

Uh oh!

CuiYifeng commented May 19, 2026

Uh oh!

guangyey commented May 19, 2026

Uh oh!

Uh oh!

chuanqi129 commented May 19, 2026

Uh oh!

github-actions Bot commented May 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Conversation

CuiYifeng commented May 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

jianyizh left a comment

Choose a reason for hiding this comment

Uh oh!

CuiYifeng commented May 19, 2026

Uh oh!

guangyey commented May 19, 2026

Uh oh!

Uh oh!

chuanqi129 commented May 19, 2026

Uh oh!

github-actions Bot commented May 19, 2026

Performance outliers, please check!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

CuiYifeng commented May 19, 2026 •

edited

Loading