fix: check full device (incl. index) in XPU fallback consistency check #3713
laifenxiawucha wants to merge 2 commits into
Change check_device_consistency in XPUFallback.template to compare full c10::Device instead of just DeviceType. This ensures that multi-device errors include the device index (e.g., 'xpu:0' instead of 'xpu'), matching CUDA dispatch behavior where 'Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!' includes device indices. Remove @skipCPUIf(True, ...) decorator from test_invalid_input since the fix makes CPU test error messages consistent. Fixes: #2229
Pull request overview
Fixes XPU fallback's device consistency check to compare the full c10::Device (including device index) instead of only c10::DeviceType, so the resulting error message matches the CUDA reference format (e.g., xpu:0 rather than xpu) and so two XPU tensors on different device indices are correctly detected as a mismatch (rather than falling through to a less precise downstream check). Re-enables the previously skipped CPU sparse-compressed test_invalid_input cases now that the error message format aligns.
Changes:
- In `XPUFallback.template`, broaden `reference_device`/`cur_device` from `DeviceType` to `Device`, and update the surrounding comment.
- Remove the `@skipCPUIf(True, ...)` workaround for issue #2229 in `test_sparse_csr_xpu.py`.
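To make the behavioral difference concrete, here is a minimal sketch of the consistency check in plain Python. It is a simplified model, not the code in `XPUFallback.template`: the `Device` class is a hypothetical stand-in for `c10::Device`, the `compare_index` flag exists only to contrast the old and new behavior, and the message wording is assumed to follow the CUDA message quoted in the PR description.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Device:
    """Hypothetical stand-in for c10::Device: a type plus an optional index."""
    type: str
    index: Optional[int] = None

    def __str__(self) -> str:
        return self.type if self.index is None else f"{self.type}:{self.index}"


def check_device_consistency(devices: Iterable[Device], compare_index: bool) -> None:
    """Raise RuntimeError if the inputs span more than one device.

    compare_index=False models the old type-only comparison;
    compare_index=True models the full-device comparison from this PR.
    """
    reference = None
    for dev in devices:
        # The old check discarded the index before comparing.
        current = dev if compare_index else Device(dev.type)
        if reference is None:
            reference = current
        elif current != reference:
            raise RuntimeError(
                "Expected all tensors to be on the same device, but found "
                f"at least two devices, {reference} and {current}!"
            )


mixed = [Device("xpu", 0), Device("xpu", 1)]

# Type-only comparison: both inputs collapse to "xpu", so nothing is raised.
check_device_consistency(mixed, compare_index=False)

# Full-device comparison: the index mismatch is caught and reported.
try:
    check_device_consistency(mixed, compare_index=True)
except RuntimeError as err:
    print(err)
```

Under this model the type-only check lets `xpu:0` and `xpu:1` through, which is the fall-through case the overview describes, while the full-device check reports both devices with their indices.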
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| src/ATen/native/xpu/XPUFallback.template | Compare full device (with index) in fallback consistency check; updated docstring. |
| test/xpu/test_sparse_csr_xpu.py | Drop the CPU skip decorator now that the message format matches. |
@copilot The Python lint (flake8/ruff) check failed. Please fix the lint errors in this PR. Important: Do NOT update the plan in the PR description directly. Use reply comments to update the status. Lint job log: https://github.com/intel/torch-xpu-ops/actions/runs/26088359296 Instructions:
Agent-Logs-Url: https://github.com/intel/torch-xpu-ops/sessions/2f9838dd-c750-4eeb-a5ed-405ff87c29b9 Co-authored-by: chuanqi129 <13608516+chuanqi129@users.noreply.github.com>
Implemented in commit 4cb4e3e by removing the stale
@copilot New UT failures detected in op_ut tests (8 total). Please check these new failures and analyze whether they are caused by the PR changes. Important: Do NOT update the plan in the PR description directly. Use reply comments to update the status. Job log: https://github.com/intel/torch-xpu-ops/actions/runs/26088819392 New failures:
... and 3 more failures. See full list in artifact. Instructions:
I checked run Evidence:
Suggested repro commands for reproducer triage:
No code change was needed on this branch for these failures (no new commit). Screenshot: N/A (no UI changes).
@copilot New UT failures detected in op_ut tests (8 total). Please check these new failures and analyze whether they are caused by the PR changes. Important: Do NOT update the plan in the PR description directly. Use reply comments to update the status. Job log: https://github.com/intel/torch-xpu-ops/actions/runs/26088819392 New failures:
... and 3 more failures. See full list in artifact. Instructions:
I checked run The PR only touches:
The new failures are in unrelated Pytest repro commands for reproducer job:
No code change is needed on this branch for these failures. Screenshot: N/A (no UI changes).
Root Cause
XPU fallback's `check_device_consistency` in `XPUFallback.template` compares only `DeviceType` (e.g., `xpu`), not the full `Device` including device index (e.g., `xpu:0`). When `_validate_compressed_sparse_indices` is dispatched to the XPU backend (because some tensors are on `xpu:0`) and falls back, the consistency check produces error messages without the device index, but the test expects messages that include it (matching CUDA behavior).
Additionally, when tensors on `xpu:0` and `xpu:1` are mixed, the old `DeviceType` check treats them as "same device" (both `xpu`), allowing execution to fall through to `SparseCsrTensor.cpp:294`, which produces a completely different error format ("device of crow_indices (=xpu:0) must match device of col_indices (=xpu:1)"), also mismatching test expectations.
Fix
In `src/ATen/native/xpu/XPUFallback.template`:
- `std::optional<c10::DeviceType> reference_device` → `std::optional<c10::Device> reference_device`
- `auto cur_device = t.device().type()` → `auto cur_device = t.device()`
This makes the comparison include device index, matching CUDA semantics where device consistency errors include the full device identifier.
In `test/xpu/test_sparse_csr_xpu.py`:
- Remove the `@skipCPUIf(True, "https://github.com/intel/torch-xpu-ops/issues/2229")` decorator since the fix makes CPU test error messages consistent.
Lines changed: +5 / -5
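Why the message format decides whether the skip can be dropped can be sketched with a regex check of the kind `assertRaisesRegex` performs. This is an illustration only: the template wording follows the CUDA message quoted in the PR description, and the device strings and the `message_matches` helper are hypothetical, not taken from `test_invalid_input`.

```python
import re

# Illustrative template only: wording follows the CUDA message quoted in the
# PR description; the device strings used below are hypothetical.
TEMPLATE = (
    "Expected all tensors to be on the same device, "
    "but found at least two devices, {} and {}!"
)


def message_matches(expected_regex: str, message: str) -> bool:
    """Mimic unittest's assertRaisesRegex, which applies re.search to str(exc)."""
    return re.search(expected_regex, message) is not None


# A test expectation written against the index-qualified form.
expected = re.escape(TEMPLATE.format("xpu:0", "cpu"))

type_only_message = TEMPLATE.format("xpu", "cpu")      # index dropped
full_device_message = TEMPLATE.format("xpu:0", "cpu")  # index kept

print(message_matches(expected, type_only_message))    # False
print(message_matches(expected, full_device_message))  # True
```

An expectation that names `xpu:0` cannot match a message that only says `xpu`, so a test written against the indexed format fails until the fallback reports the full device.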
CUDA Reference
`aten/src/ATen/native/sparse/SparseCsrTensor.cpp:287-300`: PyTorch core device checks use `get_device()` (full device comparison) and include device index in error messages (e.g., `"device of crow_indices (=xpu:0) must match device of values (=cpu)"`). The XPU fallback should produce equivalently specific error messages. CUDA dispatcher behavior: when tensors on `cuda:0` and `cuda:1` are used together, errors include device index.
Validation
`check_device_consistency` is a shared gate, and this change is strictly more precise
Fixes: #2229