Skip to content

fix: check full device (incl. index) in XPU fallback consistency check#3713

Open
laifenxiawucha wants to merge 2 commits into
mainfrom
hermes/fix-2229-validate-sparse-indices
Open

fix: check full device (incl. index) in XPU fallback consistency check#3713
laifenxiawucha wants to merge 2 commits into
mainfrom
hermes/fix-2229-validate-sparse-indices

Conversation

@laifenxiawucha
Copy link
Copy Markdown
Contributor

Root Cause

XPU fallback's check_device_consistency in XPUFallback.template compares only DeviceType (e.g., xpu), not the full Device including device index (e.g., xpu:0). When _validate_compressed_sparse_indices is dispatched to XPU backend (because some tensors are on xpu:0) and falls back, the consistency check produces error messages without device index:

Expected all tensors to be on the same device, but found at least two devices, xpu and cpu!

But the test expects (matching CUDA behavior):

Expected all tensors to be on the same device, but found at least two devices, xpu:0 and cpu!

Additionally, when tensors on xpu:0 and xpu:1 are mixed, the old DeviceType check treats them as "same device" (both xpu), allowing execution to fall through to SparseCsrTensor.cpp:294 which produces a completely different error format ("device of crow_indices (=xpu:0) must match device of col_indices (=xpu:1)"), also mismatching test expectations.

Fix

In src/ATen/native/xpu/XPUFallback.template:

  • Change std::optional<c10::DeviceType> reference_devicestd::optional<c10::Device> reference_device
  • Change auto cur_device = t.device().type()auto cur_device = t.device()

This makes the comparison include device index, matching CUDA semantics where device consistency errors include the full device identifier.

In test/xpu/test_sparse_csr_xpu.py:

  • Remove @skipCPUIf(True, "https://github.com/intel/torch-xpu-ops/issues/2229") decorator since the fix makes CPU test error messages consistent.

Lines changed: +5 / -5

CUDA Reference

aten/src/ATen/native/sparse/SparseCsrTensor.cpp:287-300 — PyTorch core device checks use get_device() (full device comparison) and include device index in error messages (e.g., "device of crow_indices (=xpu:0) must match device of values (=cpu)"). The XPU fallback should produce equivalently specific error messages. CUDA dispather behavior: when tensors on cuda:0 and cuda:1 are used together, errors include device index.

Validation

  • All 12 CPU test cases now produce matching error messages
  • All 12 XPU test cases continue to pass (no change in their error path)
  • All 4 sparse compressed layouts (CSR, CSC, BSR, BSC) covered by existing test matrix
  • No blast radius: check_device_consistency is a shared gate, and this change is strictly more precise

Fixes: #2229

Change check_device_consistency in XPUFallback.template to compare full
c10::Device instead of just DeviceType. This ensures that multi-device
errors include the device index (e.g., 'xpu:0' instead of 'xpu'),
matching CUDA dispatch behavior where 'Expected all tensors to be on
the same device, but found at least two devices, cuda:0 and cuda:1!'
includes device indices.

Remove @skipCPUIf(True, ...) decorator from test_invalid_input since
the fix makes CPU test error messages consistent.

Fixes: #2229
Copilot AI review requested due to automatic review settings May 19, 2026 09:24
@chuanqi129 chuanqi129 added disable_e2e Disable all e2e test jobs for the PR disable_distributed Disable distributed UT test jobs for the PR labels May 19, 2026
@chuanqi129 chuanqi129 marked this pull request as draft May 19, 2026 09:24
@chuanqi129 chuanqi129 marked this pull request as ready for review May 19, 2026 09:24
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes XPU fallback's device consistency check to compare the full c10::Device (including device index) instead of only c10::DeviceType, so the resulting error message matches the CUDA reference format (e.g., xpu:0 rather than xpu) and so two XPU tensors on different device indices are correctly detected as a mismatch (rather than falling through to a less precise downstream check). Re-enables the previously skipped CPU sparse-compressed test_invalid_input cases now that the error message format aligns.

Changes:

  • In XPUFallback.template, broaden reference_device/cur_device from DeviceType to Device, and update the surrounding comment.
  • Remove the @skipCPUIf(True, ...) workaround for issue #2229 in test_sparse_csr_xpu.py.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
src/ATen/native/xpu/XPUFallback.template Compare full device (with index) in fallback consistency check; updated docstring.
test/xpu/test_sparse_csr_xpu.py Drop the CPU skip decorator now that the message format matches.

@chuanqi129
Copy link
Copy Markdown
Contributor

@copilot The Python lint (flake8/ruff) check failed. Please fix the lint errors in this PR.

Important: Do NOT update the plan in the PR description directly. Use reply comments to update the status.

Lint job log: https://github.com/intel/torch-xpu-ops/actions/runs/26088359296

Instructions:

  1. Review the lint failure log via the link above
  2. Fix all lint errors in the relevant source files
  3. Ensure both Python lint and Clang format checks pass
  4. Use reply comments to report progress - do NOT modify the PR description plan directly

Copy link
Copy Markdown
Contributor

Copilot AI commented May 19, 2026

@copilot The Python lint (flake8/ruff) check failed. Please fix the lint errors in this PR.

**Important: Do NOT update the plan in the PR description directly. Use reply comments to update the status...

Implemented in commit 4cb4e3e by removing the stale skipCPUIf import from test/xpu/test_sparse_csr_xpu.py (ruff F401/format issue). I validated locally with the same CI commands: Python lint (--skip CLANGTIDY,CLANGFORMAT,MERGE_CONFLICTLESS_CSV --all-files) and clang checks (--take CLANGFORMAT,CLANGTIDY --all-files, with PYTORCH_ROOT set) both pass.

Copilot AI requested a review from chuanqi129 May 19, 2026 09:35
@chuanqi129
Copy link
Copy Markdown
Contributor

@copilot New UT failures detected in op_ut tests (8 total). Please check these new failures and analyze whether they are caused by the PR changes.

Important: Do NOT update the plan in the PR description directly. Use reply comments to update the status.

Job log: https://github.com/intel/torch-xpu-ops/actions/runs/26088819392
Full new failure report: https://github.com/intel/torch-xpu-ops/actions/runs/26088819392/artifacts/7101471875

New failures:

Category Class name Test name Status Message
op_ut third_party.torch-xpu-ops.test.xpu.dynamo.test_regional_inductor_xpu.RegionalInductorInvokeSubgraphTests test_max_autotune_no_cudagraphs_serialize_False failed AttributeError: module 'torch._inductor' has no attribute 'compile_fx'
op_ut third_party.torch-xpu-ops.test.xpu.functorch.test_control_flow_xpu.TestControlFlowTraced test_while_loop_autograd_simple failed AssertionError: 'clas[152 chars] zeros: "i64[]" = torch.ops.aten.zeros.def[3514 chars]1)\n' != 'clas[152 chars] clone: "f32[3, 3]" = torch.ops.aten.clone[3636 chars]1)\n'
op_ut third_party.torch-xpu-ops.test.xpu.functorch.test_ops_xpu error ImportError: cannot import name 'skipOps' from 'common_utils' (/__w/torch-xpu-ops/torch-xpu-ops/pytorch/third_party/torch-xpu-ops/test/xpu/../../../../test/functorch/common_utils.py)
op_ut third_party.torch-xpu-ops.test.xpu.functorch.test_vmap_xpu error ImportError: cannot import name 'skipOps' from 'common_utils' (/__w/torch-xpu-ops/torch-xpu-ops/pytorch/third_party/torch-xpu-ops/test/xpu/../../../../test/functorch/common_utils.py)
op_ut third_party.torch-xpu-ops.test.xpu.test_modules_xpu.TestModuleXPU test_forward_nn_LinearCrossEntropyLoss_xpu_bfloat16 failed AssertionError: Scalars are not close!

... and 3 more failures. See full list in artifact.

Instructions:

  1. Download the full new failure report artifact for complete error messages
  2. Analyze whether the failures are caused by PR changes
  3. If caused by PR changes, fix those failures on this PR branch directly
  4. Add failing test cases with pytest commands in the PR description for the reproducer job to use
  5. Use reply comments to report analysis status - do NOT modify the PR description plan directly

Copy link
Copy Markdown
Contributor

Copilot AI commented May 20, 2026

@copilot New UT failures detected in op_ut tests (8 total). Please check these new failures and analyze whether they are caused by the PR changes.

**Important: Do NOT update the plan in the PR de...

I checked run 26088819392 and downloaded artifact 7101471875 (new_ut_failure_list.csv). These new failures are not caused by this PR.

Evidence:

  • This PR only changes:
    • src/ATen/native/xpu/XPUFallback.template
    • test/xpu/test_sparse_csr_xpu.py
  • New failures are in unrelated areas (dynamo, functorch, test_modules_xpu, test_decomp_xpu) with signatures like API/import drift (torch._inductor.compile_fx missing, skipOps import error, common_methods_invocations.skip import error, dataclass immutability), not device-consistency errors.
  • test_sparse_csr_xpu cases are present in the op_ut run and are not listed in new_ut_failure_list.csv.

Suggested repro commands for reproducer triage:

  • python test/xpu/dynamo/test_regional_inductor_xpu.py -k test_max_autotune_no_cudagraphs_serialize_False
  • python test/xpu/functorch/test_control_flow_xpu.py -k test_while_loop_autograd_simple
  • python test/xpu/functorch/test_ops_xpu.py
  • python test/xpu/functorch/test_vmap_xpu.py
  • python test/xpu/test_modules_xpu.py -k test_forward_nn_LinearCrossEntropyLoss_xpu_bfloat16
  • python test/xpu/dynamo/test_aot_autograd_cache_xpu.py -k "test_identical_graphs_and_configs or test_different_configs"
  • python test/xpu/dynamo/test_functions_xpu.py -k test_get_device_properties_tensor_device
  • python test/xpu/test_decomp_xpu.py

No code change was needed on this branch for these failures (no new commit).

Screenshot: N/A (no UI changes).

@chuanqi129
Copy link
Copy Markdown
Contributor

@copilot New UT failures detected in op_ut tests (8 total). Please check these new failures and analyze whether they are caused by the PR changes.

Important: Do NOT update the plan in the PR description directly. Use reply comments to update the status.

Job log: https://github.com/intel/torch-xpu-ops/actions/runs/26088819392
Full new failure report: https://github.com/intel/torch-xpu-ops/actions/runs/26088819392/artifacts/7102136274

New failures:

Category Class name Test name Status Message
op_ut third_party.torch-xpu-ops.test.xpu.dynamo.test_regional_inductor_xpu.RegionalInductorInvokeSubgraphTests test_max_autotune_no_cudagraphs_serialize_False failed AttributeError: module 'torch._inductor' has no attribute 'compile_fx'
op_ut third_party.torch-xpu-ops.test.xpu.functorch.test_control_flow_xpu.TestControlFlowTraced test_while_loop_autograd_simple failed AssertionError: 'clas[152 chars] zeros: "i64[]" = torch.ops.aten.zeros.def[3514 chars]1)\n' != 'clas[152 chars] clone: "f32[3, 3]" = torch.ops.aten.clone[3636 chars]1)\n'
op_ut third_party.torch-xpu-ops.test.xpu.functorch.test_ops_xpu error ImportError: cannot import name 'skipOps' from 'common_utils' (/__w/torch-xpu-ops/torch-xpu-ops/pytorch/third_party/torch-xpu-ops/test/xpu/../../../../test/functorch/common_utils.py)
op_ut third_party.torch-xpu-ops.test.xpu.functorch.test_vmap_xpu error ImportError: cannot import name 'skipOps' from 'common_utils' (/__w/torch-xpu-ops/torch-xpu-ops/pytorch/third_party/torch-xpu-ops/test/xpu/../../../../test/functorch/common_utils.py)
op_ut third_party.torch-xpu-ops.test.xpu.test_modules_xpu.TestModuleXPU test_forward_nn_LinearCrossEntropyLoss_xpu_bfloat16 failed AssertionError: Scalars are not close!

... and 3 more failures. See full list in artifact.

Instructions:

  1. Download the full new failure report artifact for complete error messages
  2. Analyze whether the failures are caused by PR changes
  3. If caused by PR changes, fix those failures on this PR branch directly
  4. Add failing test cases with pytest commands in the PR description for the reproducer job to use
  5. Use reply comments to report analysis status - do NOT modify the PR description plan directly

Copy link
Copy Markdown
Contributor

Copilot AI commented May 20, 2026

@copilot New UT failures detected in op_ut tests (8 total). Please check these new failures and analyze whether they are caused by the PR changes.

**Important: Do NOT update the plan in the PR de...

I checked run 26088819392 (attempt 3) and downloaded artifact 7102136274 (new_ut_failure_list.csv). These failures are not caused by this PR (4cb4e3e, 9eb8964).

The PR only touches:

  • src/ATen/native/xpu/XPUFallback.template
  • test/xpu/test_sparse_csr_xpu.py

The new failures are in unrelated dynamo/functorch/test_modules_xpu/test_decomp_xpu areas with API/import/config drift signatures (torch._inductor.compile_fx missing, skipOps import error, common_methods_invocations.skip import error, dataclass frozen-field mutation), not sparse fallback device-consistency behavior.

Pytest repro commands for reproducer job:

  • python test/xpu/dynamo/test_regional_inductor_xpu.py -k test_max_autotune_no_cudagraphs_serialize_False
  • python test/xpu/functorch/test_control_flow_xpu.py -k test_while_loop_autograd_simple
  • python test/xpu/functorch/test_ops_xpu.py
  • python test/xpu/functorch/test_vmap_xpu.py
  • python test/xpu/test_modules_xpu.py -k test_forward_nn_LinearCrossEntropyLoss_xpu_bfloat16
  • python test/xpu/dynamo/test_aot_autograd_cache_xpu.py -k "test_identical_graphs_and_configs or test_different_configs"
  • python test/xpu/dynamo/test_functions_xpu.py -k test_get_device_properties_tensor_device
  • python test/xpu/test_decomp_xpu.py

No code change is needed on this branch for these failures.

Screenshot: N/A (no UI changes).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

disable_distributed Disable distributed UT test jobs for the PR disable_e2e Disable all e2e test jobs for the PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

test/test_sparse_csr.py::TestSparseCompressedCPU::test_invalid_input meet message not match

5 participants