Fix arange error type for fractional step with integer output on XPU by AKloniecki · Pull Request #3701 · intel/torch-xpu-ops

AKloniecki · 2026-05-18T12:23:11Z

Summary

Fix arange_out_xpu to raise ValueError (instead of RuntimeError) when a fractional step truncates to zero for an integer output tensor, matching the CUDA implementation.

Changes

Include <ATen/native/RangeUtils.h> for arange_check_bounds.
Replace manual TORCH_CHECK calls with arange_check_bounds(start, end, step) for consistent double-precision validation.
Add TORCH_CHECK_VALUE(xstep != 0, "step must be nonzero") in the int64_t code path (mirrors RangeFactories.cu).

Test

python -m pytest -sxv test_ops_xpu.py -k test_errors_arange_xpu now passes.

Fixes #3699

When torch.arange is called with a fractional step (e.g., 0.5) and an integer output tensor, the step truncates to 0. The CUDA backend correctly raises ValueError via TORCH_CHECK_VALUE, but XPU was raising RuntimeError via TORCH_CHECK. Replace inline bounds checks with arange_check_bounds() (from RangeUtils.h) and add TORCH_CHECK_VALUE in the int64_t branch to match CUDA behavior. Fixes #3699 Signed-off-by: Artur Kłoniecki <arturx.kloniecki@intel.com>

Copilot

Pull request overview

Skill files read: .github/skills/xpu-ops-pr-review/SKILL.md.

This PR updates the XPU arange_out_xpu argument validation to better match CUDA semantics by centralizing bound checks in arange_check_bounds(...) and ensuring the integer-output fractional-step-to-zero case raises ValueError.

Changes:

Added #include <ATen/native/RangeUtils.h> and switched to arange_check_bounds(start, end, step) for bounds/sign/finite validation.
Added an int64_t-specific TORCH_CHECK_VALUE(xstep != 0, "step must be nonzero") to match CUDA’s error type for fractional steps truncating to zero.

BBBela

I have the same concern as @pbielak regarding putting the xstart, xend, and xstep into the constexpr body where they are used. I would personally do a step further and move also the using accscalar_t alias into the if constexpr as it is also used only for those three variables - but it is not blocking in any case.
Beside that everything looks good!

chuanqi129 · 2026-05-18T19:49:32Z

@copilot New UT failures detected in op_ut tests (1 total). Please check these new failures and analyze whether they are caused by the PR changes.

Important: Do NOT update the plan in the PR description directly. Use reply comments to update the status.

Job log: https://github.com/intel/torch-xpu-ops/actions/runs/26033231815
Full new failure report: https://github.com/intel/torch-xpu-ops/actions/runs/26033231815/artifacts/7067652345

New failures:

Category	Class name	Test name	Status	Message
op_ut	third_party.torch-xpu-ops.test.xpu.functorch.test_vmap_xpu.TestVmapOperatorsOpInfoXPU	test_op_has_batch_rule_count_nonzero_xpu_float32	failed	Unexpected success

Instructions:

Download the full new failure report artifact for complete error messages
Analyze whether the failures are caused by PR changes
If caused by PR changes, fix those failures on this PR branch directly
Add failing test cases with pytest commands in the PR description for the reproducer job to use
Use reply comments to report analysis status - do NOT modify the PR description plan directly

Agent-Logs-Url: https://github.com/intel/torch-xpu-ops/sessions/b20c84de-bfc4-446e-b55f-028f700c7dea Co-authored-by: chuanqi129 <13608516+chuanqi129@users.noreply.github.com>

Copilot · 2026-05-18T19:53:19Z

I have the same concern as @pbielak regarding putting the xstart, xend, and xstep into the constexpr body where they are used. I would personally do a step further and move also the `using acc...

Addressed in 3e68faf: I moved xstart/xend/xstep into the int64_t if constexpr branch, and also moved the using accscalar_t alias there since it is only needed for those values.

@copilot New UT failures detected in op_ut tests (1 total). Please check these new failures and analyze whether they are caused by the PR changes.

**Important: Do NOT update the plan in the PR de...

I checked run 26033231815 and downloaded artifact 7067652345 (new_ut_failure_list.csv). The reported failure is test_op_has_batch_rule_count_nonzero_xpu_float32 with Unexpected success, which comes from a pre-existing xfail("count_nonzero") in test/xpu/functorch/test_vmap_xpu.py. This PR only changes src/ATen/native/xpu/RangeFactories.cpp (8584d5b, follow-up 3e68faf) and does not touch count_nonzero/functorch paths, so this failure is not caused by these PR changes. Reproducer command: python -m pytest -x -vv test/xpu/functorch/test_vmap_xpu.py -k test_op_has_batch_rule_count_nonzero_xpu_float32.

github-actions · 2026-05-19T02:23:06Z

Performance outliers, please check!

🔴 [-1, 80%), should be regression

Category	Model	Target vs. Baseline [Eager]	Target vs. Baseline [Inductor]
timm_models_bfloat16_training	ghostnet_100	0.860729	0.567998
timm_models_bfloat16_training	mobilenetv3_large_100	0.779542	0.569755
timm_models_bfloat16_training	mobilevit_s	0.718466	0.576406
timm_models_bfloat16_training	mobilenetv2_100	0.829184	0.581502
timm_models_bfloat16_training	tf_efficientnet_b0	0.812186	0.627838
timm_models_bfloat16_training	deit_base_distilled_patch16_224	0.737897	0.738956
timm_models_bfloat16_training	vit_base_patch16_siglip_256	0.732254	0.761148
timm_models_bfloat16_training	dm_nfnet_f0	0.644692	0.775626
timm_models_bfloat16_training	nfnet_l0	0.743344	0.783158
torchbench_bfloat16_training	mobilenet_v2	1.064741	0.789318
timm_models_bfloat16_training	adv_inception_v3	0.796177	0.802161
timm_models_bfloat16_training	beit_base_patch16_224	0.775618	0.839905
timm_models_bfloat16_training	visformer_small	0.744350	0.875215

🟡 [80%, 90%), may be fluctuations

Category	Model	Target vs. Baseline [Eager]	Target vs. Baseline [Inductor]
torchbench_bfloat16_training	dcgan	0.852079	0.812976
timm_models_bfloat16_training	repvgg_a2	0.837614	0.832600
timm_models_bfloat16_training	inception_v3	0.851772	0.838098
timm_models_bfloat16_training	swin_base_patch4_window7_224	0.869586	0.838903
timm_models_bfloat16_training	convnextv2_nano.fcmae_ft_in22k_in1k	0.816929	0.857979
torchbench_bfloat16_training	mnasnet1_0	1.076022	0.875167
torchbench_bfloat16_training	resnext50_32x4d	0.972332	0.883240
timm_models_bfloat16_training	deit_tiny_patch16_224.fb_in1k	0.889454	0.884388

BBBela

Looks good to me in current shape!
Thanks!

chuanqi129 · 2026-05-19T10:25:15Z

@copilot New UT failures detected in op_ut tests (1 total). Please check these new failures and analyze whether they are caused by the PR changes.

Important: Do NOT update the plan in the PR description directly. Use reply comments to update the status.

Job log: https://github.com/intel/torch-xpu-ops/actions/runs/26056824947
Full new failure report: https://github.com/intel/torch-xpu-ops/actions/runs/26056824947/artifacts/7080904551

New failures:

Category	Class name	Test name	Status	Message
op_ut	third_party.torch-xpu-ops.test.xpu.functorch.test_vmap_xpu.TestVmapOperatorsOpInfoXPU	test_op_has_batch_rule_count_nonzero_xpu_float32	failed	Unexpected success

Instructions:

Download the full new failure report artifact for complete error messages
Analyze whether the failures are caused by PR changes
If caused by PR changes, fix those failures on this PR branch directly
Add failing test cases with pytest commands in the PR description for the reproducer job to use
Use reply comments to report analysis status - do NOT modify the PR description plan directly

Copilot · 2026-05-19T10:27:25Z

@copilot New UT failures detected in op_ut tests (1 total). Please check these new failures and analyze whether they are caused by the PR changes.

**Important: Do NOT update the plan in the PR de...

I checked run 26056824947, reviewed the failing linux-ut (op_ut) / summary job logs, and downloaded artifact 7080904551. The reported test test_op_has_batch_rule_count_nonzero_xpu_float32 fails as Unexpected success and corresponds to a pre-existing xfail("count_nonzero") in test/xpu/functorch/test_vmap_xpu.py. The last two PR commits (8584d5b, 3e68faf) only modify src/ATen/native/xpu/RangeFactories.cpp, so this is not caused by this PR’s arange changes. Reproducer: python -m pytest -x -vv test/xpu/functorch/test_vmap_xpu.py -k test_op_has_batch_rule_count_nonzero_xpu_float32.

Copilot AI review requested due to automatic review settings May 18, 2026 12:23

Copilot started reviewing on behalf of AKloniecki May 18, 2026 12:23 View session

Copilot AI reviewed May 18, 2026

View reviewed changes

Comment thread src/ATen/native/xpu/RangeFactories.cpp Outdated

jkosnox approved these changes May 18, 2026

View reviewed changes

pbielak reviewed May 18, 2026

View reviewed changes

Comment thread src/ATen/native/xpu/RangeFactories.cpp Outdated

BBBela reviewed May 18, 2026

View reviewed changes

Copilot started work on behalf of chuanqi129 May 18, 2026 19:49 View session

Scope arange int64 temporaries to constexpr branch

3e68faf

Agent-Logs-Url: https://github.com/intel/torch-xpu-ops/sessions/b20c84de-bfc4-446e-b55f-028f700c7dea Co-authored-by: chuanqi129 <13608516+chuanqi129@users.noreply.github.com>

Copilot finished work on behalf of chuanqi129 May 18, 2026 19:54

Copilot AI requested a review from chuanqi129 May 18, 2026 19:54

BBBela approved these changes May 19, 2026

View reviewed changes

Copilot started work on behalf of chuanqi129 May 19, 2026 10:25 View session

Copilot finished work on behalf of chuanqi129 May 19, 2026 10:28

pbielak approved these changes May 19, 2026

View reviewed changes

Merge branch 'main' into aklonieckix/fix-arange

10a5471

github-actions Bot added disable_e2e Disable all e2e test jobs for the PR disable_distributed Disable distributed UT test jobs for the PR labels May 21, 2026

chuanqi129 marked this pull request as draft May 21, 2026 14:15

chuanqi129 marked this pull request as ready for review May 21, 2026 14:15

Merge branch 'main' into aklonieckix/fix-arange

2035a8b

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix arange error type for fractional step with integer output on XPU#3701

Fix arange error type for fractional step with integer output on XPU#3701
AKloniecki wants to merge 4 commits into
mainfrom
aklonieckix/fix-arange

AKloniecki commented May 18, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

BBBela left a comment

Uh oh!

chuanqi129 commented May 18, 2026

Uh oh!

Copilot AI commented May 18, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 19, 2026

Uh oh!

BBBela left a comment

Uh oh!

chuanqi129 commented May 19, 2026

Uh oh!

Copilot AI commented May 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants

Conversation

AKloniecki commented May 18, 2026

Summary

Changes

Test

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

BBBela left a comment

Choose a reason for hiding this comment

Uh oh!

chuanqi129 commented May 18, 2026

Uh oh!

Copilot AI commented May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented May 19, 2026

Performance outliers, please check!

Uh oh!

BBBela left a comment

Choose a reason for hiding this comment

Uh oh!

chuanqi129 commented May 19, 2026

Uh oh!

Copilot AI commented May 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants

Copilot AI commented May 18, 2026 •

edited

Loading