fix: check full device (incl. index) in XPU fallback consistency check by laifenxiawucha · Pull Request #3713 · intel/torch-xpu-ops

laifenxiawucha · 2026-05-19T09:24:00Z

Root Cause

XPU fallback's check_device_consistency in XPUFallback.template compares only DeviceType (e.g., xpu), not the full Device including device index (e.g., xpu:0). When _validate_compressed_sparse_indices is dispatched to XPU backend (because some tensors are on xpu:0) and falls back, the consistency check produces error messages without device index:

Expected all tensors to be on the same device, but found at least two devices, xpu and cpu!

But the test expects (matching CUDA behavior):

Expected all tensors to be on the same device, but found at least two devices, xpu:0 and cpu!

Additionally, when tensors on xpu:0 and xpu:1 are mixed, the old DeviceType check treats them as "same device" (both xpu), allowing execution to fall through to SparseCsrTensor.cpp:294 which produces a completely different error format ("device of crow_indices (=xpu:0) must match device of col_indices (=xpu:1)"), also mismatching test expectations.

Fix

In src/ATen/native/xpu/XPUFallback.template:

Change std::optional<c10::DeviceType> reference_device → std::optional<c10::Device> reference_device
Change auto cur_device = t.device().type() → auto cur_device = t.device()

This makes the comparison include device index, matching CUDA semantics where device consistency errors include the full device identifier.

In test/xpu/test_sparse_csr_xpu.py:

Remove @skipCPUIf(True, "https://github.com/intel/torch-xpu-ops/issues/2229") decorator since the fix makes CPU test error messages consistent.

Lines changed: +5 / -5

CUDA Reference

aten/src/ATen/native/sparse/SparseCsrTensor.cpp:287-300 — PyTorch core device checks use get_device() (full device comparison) and include device index in error messages (e.g., "device of crow_indices (=xpu:0) must match device of values (=cpu)"). The XPU fallback should produce equivalently specific error messages. CUDA dispather behavior: when tensors on cuda:0 and cuda:1 are used together, errors include device index.

Validation

All 12 CPU test cases now produce matching error messages
All 12 XPU test cases continue to pass (no change in their error path)
All 4 sparse compressed layouts (CSR, CSC, BSR, BSC) covered by existing test matrix
No blast radius: check_device_consistency is a shared gate, and this change is strictly more precise

Fixes: #2229

Change check_device_consistency in XPUFallback.template to compare full c10::Device instead of just DeviceType. This ensures that multi-device errors include the device index (e.g., 'xpu:0' instead of 'xpu'), matching CUDA dispatch behavior where 'Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!' includes device indices. Remove @skipCPUIf(True, ...) decorator from test_invalid_input since the fix makes CPU test error messages consistent. Fixes: #2229

Copilot

Pull request overview

Fixes XPU fallback's device consistency check to compare the full c10::Device (including device index) instead of only c10::DeviceType, so the resulting error message matches the CUDA reference format (e.g., xpu:0 rather than xpu) and so two XPU tensors on different device indices are correctly detected as a mismatch (rather than falling through to a less precise downstream check). Re-enables the previously skipped CPU sparse-compressed test_invalid_input cases now that the error message format aligns.

Changes:

In XPUFallback.template, broaden reference_device/cur_device from DeviceType to Device, and update the surrounding comment.
Remove the @skipCPUIf(True, ...) workaround for issue #2229 in test_sparse_csr_xpu.py.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File	Description
src/ATen/native/xpu/XPUFallback.template	Compare full device (with index) in fallback consistency check; updated docstring.
test/xpu/test_sparse_csr_xpu.py	Drop the CPU skip decorator now that the message format matches.

chuanqi129 · 2026-05-19T09:26:16Z

@copilot The Python lint (flake8/ruff) check failed. Please fix the lint errors in this PR.

Important: Do NOT update the plan in the PR description directly. Use reply comments to update the status.

Lint job log: https://github.com/intel/torch-xpu-ops/actions/runs/26088359296

Instructions:

Review the lint failure log via the link above
Fix all lint errors in the relevant source files
Ensure both Python lint and Clang format checks pass
Use reply comments to report progress - do NOT modify the PR description plan directly

Agent-Logs-Url: https://github.com/intel/torch-xpu-ops/sessions/2f9838dd-c750-4eeb-a5ed-405ff87c29b9 Co-authored-by: chuanqi129 <13608516+chuanqi129@users.noreply.github.com>

Copilot · 2026-05-19T09:34:23Z

@copilot The Python lint (flake8/ruff) check failed. Please fix the lint errors in this PR.

**Important: Do NOT update the plan in the PR description directly. Use reply comments to update the status...

Implemented in commit 4cb4e3e by removing the stale skipCPUIf import from test/xpu/test_sparse_csr_xpu.py (ruff F401/format issue). I validated locally with the same CI commands: Python lint (--skip CLANGTIDY,CLANGFORMAT,MERGE_CONFLICTLESS_CSV --all-files) and clang checks (--take CLANGFORMAT,CLANGTIDY --all-files, with PYTORCH_ROOT set) both pass.

chuanqi129 · 2026-05-20T04:45:26Z

@copilot New UT failures detected in op_ut tests (8 total). Please check these new failures and analyze whether they are caused by the PR changes.

Important: Do NOT update the plan in the PR description directly. Use reply comments to update the status.

Job log: https://github.com/intel/torch-xpu-ops/actions/runs/26088819392
Full new failure report: https://github.com/intel/torch-xpu-ops/actions/runs/26088819392/artifacts/7101471875

New failures:

Category	Class name	Test name	Status	Message
op_ut	third_party.torch-xpu-ops.test.xpu.dynamo.test_regional_inductor_xpu.RegionalInductorInvokeSubgraphTests	test_max_autotune_no_cudagraphs_serialize_False	failed	AttributeError: module 'torch._inductor' has no attribute 'compile_fx'
op_ut	third_party.torch-xpu-ops.test.xpu.functorch.test_control_flow_xpu.TestControlFlowTraced	test_while_loop_autograd_simple	failed	AssertionError: 'clas[152 chars] zeros: "i64[]" = torch.ops.aten.zeros.def[3514 chars]1)\n' != 'clas[152 chars] clone: "f32[3, 3]" = torch.ops.aten.clone[3636 chars]1)\n'
op_ut		third_party.torch-xpu-ops.test.xpu.functorch.test_ops_xpu	error	ImportError: cannot import name 'skipOps' from 'common_utils' (/__w/torch-xpu-ops/torch-xpu-ops/pytorch/third_party/torch-xpu-ops/test/xpu/../../../../test/functorch/common_utils.py)
op_ut		third_party.torch-xpu-ops.test.xpu.functorch.test_vmap_xpu	error	ImportError: cannot import name 'skipOps' from 'common_utils' (/__w/torch-xpu-ops/torch-xpu-ops/pytorch/third_party/torch-xpu-ops/test/xpu/../../../../test/functorch/common_utils.py)
op_ut	third_party.torch-xpu-ops.test.xpu.test_modules_xpu.TestModuleXPU	test_forward_nn_LinearCrossEntropyLoss_xpu_bfloat16	failed	AssertionError: Scalars are not close!

... and 3 more failures. See full list in artifact.

Instructions:

Download the full new failure report artifact for complete error messages
Analyze whether the failures are caused by PR changes
If caused by PR changes, fix those failures on this PR branch directly
Add failing test cases with pytest commands in the PR description for the reproducer job to use
Use reply comments to report analysis status - do NOT modify the PR description plan directly

Copilot · 2026-05-20T04:48:37Z

@copilot New UT failures detected in op_ut tests (8 total). Please check these new failures and analyze whether they are caused by the PR changes.

**Important: Do NOT update the plan in the PR de...

I checked run 26088819392 and downloaded artifact 7101471875 (new_ut_failure_list.csv). These new failures are not caused by this PR.

Evidence:

This PR only changes:
- src/ATen/native/xpu/XPUFallback.template
- test/xpu/test_sparse_csr_xpu.py
New failures are in unrelated areas (dynamo, functorch, test_modules_xpu, test_decomp_xpu) with signatures like API/import drift (torch._inductor.compile_fx missing, skipOps import error, common_methods_invocations.skip import error, dataclass immutability), not device-consistency errors.
test_sparse_csr_xpu cases are present in the op_ut run and are not listed in new_ut_failure_list.csv.

Suggested repro commands for reproducer triage:

python test/xpu/dynamo/test_regional_inductor_xpu.py -k test_max_autotune_no_cudagraphs_serialize_False
python test/xpu/functorch/test_control_flow_xpu.py -k test_while_loop_autograd_simple
python test/xpu/functorch/test_ops_xpu.py
python test/xpu/functorch/test_vmap_xpu.py
python test/xpu/test_modules_xpu.py -k test_forward_nn_LinearCrossEntropyLoss_xpu_bfloat16
python test/xpu/dynamo/test_aot_autograd_cache_xpu.py -k "test_identical_graphs_and_configs or test_different_configs"
python test/xpu/dynamo/test_functions_xpu.py -k test_get_device_properties_tensor_device
python test/xpu/test_decomp_xpu.py

No code change was needed on this branch for these failures (no new commit).

Screenshot: N/A (no UI changes).

chuanqi129 · 2026-05-20T05:41:12Z

@copilot New UT failures detected in op_ut tests (8 total). Please check these new failures and analyze whether they are caused by the PR changes.

Important: Do NOT update the plan in the PR description directly. Use reply comments to update the status.

Job log: https://github.com/intel/torch-xpu-ops/actions/runs/26088819392
Full new failure report: https://github.com/intel/torch-xpu-ops/actions/runs/26088819392/artifacts/7102136274

New failures:

Category	Class name	Test name	Status	Message
op_ut	third_party.torch-xpu-ops.test.xpu.dynamo.test_regional_inductor_xpu.RegionalInductorInvokeSubgraphTests	test_max_autotune_no_cudagraphs_serialize_False	failed	AttributeError: module 'torch._inductor' has no attribute 'compile_fx'
op_ut	third_party.torch-xpu-ops.test.xpu.functorch.test_control_flow_xpu.TestControlFlowTraced	test_while_loop_autograd_simple	failed	AssertionError: 'clas[152 chars] zeros: "i64[]" = torch.ops.aten.zeros.def[3514 chars]1)\n' != 'clas[152 chars] clone: "f32[3, 3]" = torch.ops.aten.clone[3636 chars]1)\n'
op_ut		third_party.torch-xpu-ops.test.xpu.functorch.test_ops_xpu	error	ImportError: cannot import name 'skipOps' from 'common_utils' (/__w/torch-xpu-ops/torch-xpu-ops/pytorch/third_party/torch-xpu-ops/test/xpu/../../../../test/functorch/common_utils.py)
op_ut		third_party.torch-xpu-ops.test.xpu.functorch.test_vmap_xpu	error	ImportError: cannot import name 'skipOps' from 'common_utils' (/__w/torch-xpu-ops/torch-xpu-ops/pytorch/third_party/torch-xpu-ops/test/xpu/../../../../test/functorch/common_utils.py)
op_ut	third_party.torch-xpu-ops.test.xpu.test_modules_xpu.TestModuleXPU	test_forward_nn_LinearCrossEntropyLoss_xpu_bfloat16	failed	AssertionError: Scalars are not close!

... and 3 more failures. See full list in artifact.

Instructions:

Download the full new failure report artifact for complete error messages
Analyze whether the failures are caused by PR changes
If caused by PR changes, fix those failures on this PR branch directly
Add failing test cases with pytest commands in the PR description for the reproducer job to use
Use reply comments to report analysis status - do NOT modify the PR description plan directly

Copilot · 2026-05-20T05:42:50Z

@copilot New UT failures detected in op_ut tests (8 total). Please check these new failures and analyze whether they are caused by the PR changes.

**Important: Do NOT update the plan in the PR de...

I checked run 26088819392 (attempt 3) and downloaded artifact 7102136274 (new_ut_failure_list.csv). These failures are not caused by this PR (4cb4e3e, 9eb8964).

The PR only touches:

src/ATen/native/xpu/XPUFallback.template
test/xpu/test_sparse_csr_xpu.py

The new failures are in unrelated dynamo/functorch/test_modules_xpu/test_decomp_xpu areas with API/import/config drift signatures (torch._inductor.compile_fx missing, skipOps import error, common_methods_invocations.skip import error, dataclass frozen-field mutation), not sparse fallback device-consistency behavior.

Pytest repro commands for reproducer job:

python test/xpu/dynamo/test_regional_inductor_xpu.py -k test_max_autotune_no_cudagraphs_serialize_False
python test/xpu/functorch/test_control_flow_xpu.py -k test_while_loop_autograd_simple
python test/xpu/functorch/test_ops_xpu.py
python test/xpu/functorch/test_vmap_xpu.py
python test/xpu/test_modules_xpu.py -k test_forward_nn_LinearCrossEntropyLoss_xpu_bfloat16
python test/xpu/dynamo/test_aot_autograd_cache_xpu.py -k "test_identical_graphs_and_configs or test_different_configs"
python test/xpu/dynamo/test_functions_xpu.py -k test_get_device_properties_tensor_device
python test/xpu/test_decomp_xpu.py

No code change is needed on this branch for these failures.

Screenshot: N/A (no UI changes).

Copilot AI review requested due to automatic review settings May 19, 2026 09:24

chuanqi129 added disable_e2e Disable all e2e test jobs for the PR disable_distributed Disable distributed UT test jobs for the PR labels May 19, 2026

chuanqi129 marked this pull request as draft May 19, 2026 09:24

laifenxiawucha mentioned this pull request May 19, 2026

test/test_sparse_csr.py::TestSparseCompressedCPU::test_invalid_input meet message not match #2229

Open

chuanqi129 marked this pull request as ready for review May 19, 2026 09:24

Copilot started reviewing on behalf of laifenxiawucha May 19, 2026 09:24 View session

Copilot AI reviewed May 19, 2026

View reviewed changes

Copilot started work on behalf of chuanqi129 May 19, 2026 09:26 View session

fix: remove stale skipCPUIf import in sparse CSR XPU test

4cb4e3e

Agent-Logs-Url: https://github.com/intel/torch-xpu-ops/sessions/2f9838dd-c750-4eeb-a5ed-405ff87c29b9 Co-authored-by: chuanqi129 <13608516+chuanqi129@users.noreply.github.com>

Copilot finished work on behalf of chuanqi129 May 19, 2026 09:35

Copilot AI requested a review from chuanqi129 May 19, 2026 09:35

Copilot started work on behalf of chuanqi129 May 20, 2026 04:45 View session

Copilot finished work on behalf of chuanqi129 May 20, 2026 04:49

Copilot started work on behalf of chuanqi129 May 20, 2026 05:41 View session

Copilot finished work on behalf of chuanqi129 May 20, 2026 05:43

chunhuanMeng approved these changes May 21, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: check full device (incl. index) in XPU fallback consistency check#3713

fix: check full device (incl. index) in XPU fallback consistency check#3713
laifenxiawucha wants to merge 2 commits into
mainfrom
hermes/fix-2229-validate-sparse-indices

laifenxiawucha commented May 19, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

chuanqi129 commented May 19, 2026

Uh oh!

Copilot AI commented May 19, 2026

Uh oh!

chuanqi129 commented May 20, 2026

Uh oh!

Copilot AI commented May 20, 2026

Uh oh!

chuanqi129 commented May 20, 2026

Uh oh!

Copilot AI commented May 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Conversation

laifenxiawucha commented May 19, 2026

Root Cause

Fix

CUDA Reference

Validation

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

chuanqi129 commented May 19, 2026

Uh oh!

Copilot AI commented May 19, 2026

Uh oh!

chuanqi129 commented May 20, 2026

Uh oh!

Copilot AI commented May 20, 2026

Uh oh!

chuanqi129 commented May 20, 2026

Uh oh!

Copilot AI commented May 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants