Skip to content

Fix arange error type for fractional step with integer output on XPU#3701

Open
AKloniecki wants to merge 4 commits into
mainfrom
aklonieckix/fix-arange
Open

Fix arange error type for fractional step with integer output on XPU#3701
AKloniecki wants to merge 4 commits into
mainfrom
aklonieckix/fix-arange

Conversation

@AKloniecki
Copy link
Copy Markdown
Contributor

Summary

Fix arange_out_xpu to raise ValueError (instead of RuntimeError) when a fractional step truncates to zero for an integer output tensor, matching the CUDA implementation.

Changes

  • Include <ATen/native/RangeUtils.h> for arange_check_bounds.
  • Replace manual TORCH_CHECK calls with arange_check_bounds(start, end, step) for consistent double-precision validation.
  • Add TORCH_CHECK_VALUE(xstep != 0, "step must be nonzero") in the int64_t code path (mirrors RangeFactories.cu).

Test

python -m pytest -sxv test_ops_xpu.py -k test_errors_arange_xpu now passes.

Fixes #3699

When torch.arange is called with a fractional step (e.g., 0.5) and an
integer output tensor, the step truncates to 0. The CUDA backend
correctly raises ValueError via TORCH_CHECK_VALUE, but XPU was raising
RuntimeError via TORCH_CHECK.

Replace inline bounds checks with arange_check_bounds() (from
RangeUtils.h) and add TORCH_CHECK_VALUE in the int64_t branch to match
CUDA behavior.

Fixes #3699

Signed-off-by: Artur Kłoniecki <arturx.kloniecki@intel.com>
Copilot AI review requested due to automatic review settings May 18, 2026 12:23
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Skill files read: .github/skills/xpu-ops-pr-review/SKILL.md.

This PR updates the XPU arange_out_xpu argument validation to better match CUDA semantics by centralizing bound checks in arange_check_bounds(...) and ensuring the integer-output fractional-step-to-zero case raises ValueError.

Changes:

  • Added #include <ATen/native/RangeUtils.h> and switched to arange_check_bounds(start, end, step) for bounds/sign/finite validation.
  • Added an int64_t-specific TORCH_CHECK_VALUE(xstep != 0, "step must be nonzero") to match CUDA’s error type for fractional steps truncating to zero.

Comment thread src/ATen/native/xpu/RangeFactories.cpp Outdated
Comment thread src/ATen/native/xpu/RangeFactories.cpp Outdated
Copy link
Copy Markdown
Contributor

@BBBela BBBela left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have the same concern as @pbielak regarding putting the xstart, xend, and xstep into the constexpr body where they are used. I would personally do a step further and move also the using accscalar_t alias into the if constexpr as it is also used only for those three variables - but it is not blocking in any case.
Beside that everything looks good!

@chuanqi129
Copy link
Copy Markdown
Contributor

@copilot New UT failures detected in op_ut tests (1 total). Please check these new failures and analyze whether they are caused by the PR changes.

Important: Do NOT update the plan in the PR description directly. Use reply comments to update the status.

Job log: https://github.com/intel/torch-xpu-ops/actions/runs/26033231815
Full new failure report: https://github.com/intel/torch-xpu-ops/actions/runs/26033231815/artifacts/7067652345

New failures:

Category Class name Test name Status Message
op_ut third_party.torch-xpu-ops.test.xpu.functorch.test_vmap_xpu.TestVmapOperatorsOpInfoXPU test_op_has_batch_rule_count_nonzero_xpu_float32 failed Unexpected success

Instructions:

  1. Download the full new failure report artifact for complete error messages
  2. Analyze whether the failures are caused by PR changes
  3. If caused by PR changes, fix those failures on this PR branch directly
  4. Add failing test cases with pytest commands in the PR description for the reproducer job to use
  5. Use reply comments to report analysis status - do NOT modify the PR description plan directly

Agent-Logs-Url: https://github.com/intel/torch-xpu-ops/sessions/b20c84de-bfc4-446e-b55f-028f700c7dea

Co-authored-by: chuanqi129 <13608516+chuanqi129@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

Copilot AI commented May 18, 2026

I have the same concern as @pbielak regarding putting the xstart, xend, and xstep into the constexpr body where they are used. I would personally do a step further and move also the `using acc...

Addressed in 3e68faf: I moved xstart/xend/xstep into the int64_t if constexpr branch, and also moved the using accscalar_t alias there since it is only needed for those values.

@copilot New UT failures detected in op_ut tests (1 total). Please check these new failures and analyze whether they are caused by the PR changes.

**Important: Do NOT update the plan in the PR de...

I checked run 26033231815 and downloaded artifact 7067652345 (new_ut_failure_list.csv). The reported failure is test_op_has_batch_rule_count_nonzero_xpu_float32 with Unexpected success, which comes from a pre-existing xfail("count_nonzero") in test/xpu/functorch/test_vmap_xpu.py. This PR only changes src/ATen/native/xpu/RangeFactories.cpp (8584d5b, follow-up 3e68faf) and does not touch count_nonzero/functorch paths, so this failure is not caused by these PR changes. Reproducer command: python -m pytest -x -vv test/xpu/functorch/test_vmap_xpu.py -k test_op_has_batch_rule_count_nonzero_xpu_float32.

Copilot AI requested a review from chuanqi129 May 18, 2026 19:54
@github-actions
Copy link
Copy Markdown

Performance outliers, please check!

  • 🔴 [-1, 80%), should be regression
Category Model Target vs. Baseline [Eager] Target vs. Baseline [Inductor]
timm_models_bfloat16_training ghostnet_100 0.860729 0.567998
timm_models_bfloat16_training mobilenetv3_large_100 0.779542 0.569755
timm_models_bfloat16_training mobilevit_s 0.718466 0.576406
timm_models_bfloat16_training mobilenetv2_100 0.829184 0.581502
timm_models_bfloat16_training tf_efficientnet_b0 0.812186 0.627838
timm_models_bfloat16_training deit_base_distilled_patch16_224 0.737897 0.738956
timm_models_bfloat16_training vit_base_patch16_siglip_256 0.732254 0.761148
timm_models_bfloat16_training dm_nfnet_f0 0.644692 0.775626
timm_models_bfloat16_training nfnet_l0 0.743344 0.783158
torchbench_bfloat16_training mobilenet_v2 1.064741 0.789318
timm_models_bfloat16_training adv_inception_v3 0.796177 0.802161
timm_models_bfloat16_training beit_base_patch16_224 0.775618 0.839905
timm_models_bfloat16_training visformer_small 0.744350 0.875215
  • 🟡 [80%, 90%), may be fluctuations
Category Model Target vs. Baseline [Eager] Target vs. Baseline [Inductor]
torchbench_bfloat16_training dcgan 0.852079 0.812976
timm_models_bfloat16_training repvgg_a2 0.837614 0.832600
timm_models_bfloat16_training inception_v3 0.851772 0.838098
timm_models_bfloat16_training swin_base_patch4_window7_224 0.869586 0.838903
timm_models_bfloat16_training convnextv2_nano.fcmae_ft_in22k_in1k 0.816929 0.857979
torchbench_bfloat16_training mnasnet1_0 1.076022 0.875167
torchbench_bfloat16_training resnext50_32x4d 0.972332 0.883240
timm_models_bfloat16_training deit_tiny_patch16_224.fb_in1k 0.889454 0.884388

Copy link
Copy Markdown
Contributor

@BBBela BBBela left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me in current shape!
Thanks!

@chuanqi129
Copy link
Copy Markdown
Contributor

@copilot New UT failures detected in op_ut tests (1 total). Please check these new failures and analyze whether they are caused by the PR changes.

Important: Do NOT update the plan in the PR description directly. Use reply comments to update the status.

Job log: https://github.com/intel/torch-xpu-ops/actions/runs/26056824947
Full new failure report: https://github.com/intel/torch-xpu-ops/actions/runs/26056824947/artifacts/7080904551

New failures:

Category Class name Test name Status Message
op_ut third_party.torch-xpu-ops.test.xpu.functorch.test_vmap_xpu.TestVmapOperatorsOpInfoXPU test_op_has_batch_rule_count_nonzero_xpu_float32 failed Unexpected success

Instructions:

  1. Download the full new failure report artifact for complete error messages
  2. Analyze whether the failures are caused by PR changes
  3. If caused by PR changes, fix those failures on this PR branch directly
  4. Add failing test cases with pytest commands in the PR description for the reproducer job to use
  5. Use reply comments to report analysis status - do NOT modify the PR description plan directly

Copy link
Copy Markdown
Contributor

Copilot AI commented May 19, 2026

@copilot New UT failures detected in op_ut tests (1 total). Please check these new failures and analyze whether they are caused by the PR changes.

**Important: Do NOT update the plan in the PR de...

I checked run 26056824947, reviewed the failing linux-ut (op_ut) / summary job logs, and downloaded artifact 7080904551. The reported test test_op_has_batch_rule_count_nonzero_xpu_float32 fails as Unexpected success and corresponds to a pre-existing xfail("count_nonzero") in test/xpu/functorch/test_vmap_xpu.py. The last two PR commits (8584d5b, 3e68faf) only modify src/ATen/native/xpu/RangeFactories.cpp, so this is not caused by this PR’s arange changes. Reproducer: python -m pytest -x -vv test/xpu/functorch/test_vmap_xpu.py -k test_op_has_batch_rule_count_nonzero_xpu_float32.

@github-actions github-actions Bot added disable_e2e Disable all e2e test jobs for the PR disable_distributed Disable distributed UT test jobs for the PR labels May 21, 2026
@chuanqi129 chuanqi129 marked this pull request as draft May 21, 2026 14:15
@chuanqi129 chuanqi129 marked this pull request as ready for review May 21, 2026 14:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

disable_distributed Disable distributed UT test jobs for the PR disable_e2e Disable all e2e test jobs for the PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

New failing test case test_errors_arange_xpu

7 participants