inf is returned by nn.TransformerEncoderLayer #3674

Draft
Stonepia wants to merge 2 commits into main from agent/issue-2015

@Stonepia
Contributor

inf is returned by nn.TransformerEncoderLayer

Fixes #2015

Root Cause: Commit 792afdf added XPU to the fast_path_device check in test_transformerencoderlayer, so the test now expects NaN output on XPU when a fully-masked row is passed. Unlike CUDA and CPU, however, XPU's TransformerEncoderLayer does not implement the fast path. Its attention implementation therefore produces inf/NaN through float16 overflow or softmax over -inf values, which does not match the NaN pattern the test expects. For float16, the large input values ([20., 30., 40., 50.]) overflow in the softmax/attention computation, producing NaN instead of the expected reference values.
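
The two failure modes named above can be illustrated with plain floating-point arithmetic. The sketch below is not the XPU kernel and makes no claim about how that kernel is written; `softmax` and `to_float16` are hypothetical helpers, and the float16 rounding is emulated with the standard library's half-precision `struct` format. It only shows why a row of -inf values and an overflowing float16 exponential both end in NaN.

```python
import math
import struct


def softmax(row):
    """Numerically stable softmax over a list of Python floats (illustrative helper)."""
    m = max(row)
    # For a fully masked row m is -inf, so x - m is (-inf) - (-inf), which is nan.
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def to_float16(x):
    """Round a Python float to IEEE half precision; overflow becomes +/-inf."""
    try:
        return struct.unpack("e", struct.pack("e", x))[0]
    except OverflowError:
        # struct refuses finite values beyond the float16 range; hardware would give inf.
        return math.copysign(math.inf, x)


# Case 1: every position in the attention row is masked with -inf.
masked = softmax([-math.inf] * 4)

# Case 2: an unstabilized exp() of the large test inputs, rounded to float16.
# This assumes a naive exp-then-normalize order purely for illustration.
naive_exps = [to_float16(math.exp(x)) for x in [20.0, 30.0, 40.0, 50.0]]
naive_sum = to_float16(sum(naive_exps))
naive = [e / naive_sum for e in naive_exps]  # inf / inf evaluates to nan

print(masked)
print(naive_exps)
print(naive)
```

Either case yields NaN, but by different routes, so a test that asserts one specific output pattern per device needs to know which path the device actually takes.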

Failed Tests:

  • test/test_nn.py::TestNNDeviceTypeXPU::test_transformerencoderlayer_gelu_xpu_float16
  • test/test_nn.py::TestNNDeviceTypeXPU::test_transformerencoderlayer_xpu_float16
  • test/test_nn.py::TestNNDeviceTypeXPU::test_transformerencoderlayer_xpu_float32
  • test/test_nn.py::TestNNDeviceTypeXPU::test_transformerencoderlayer_xpu_float64

Diff stat:

test/xpu/test_nn_xpu.py | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

@Stonepia added the disable_all label (Disable all ci test jobs for the PR, just keep basic lint check) on May 14, 2026
@chuanqi129
Contributor

@copilot The Python lint (flake8/ruff) check failed. Please fix the lint errors in this PR.

Lint job log: https://github.com/intel/torch-xpu-ops/actions/runs/25865859919

Instructions:

  1. Review the lint failure log via the link above
  2. Fix all lint errors in the relevant source files
  3. Ensure both Python lint and Clang format checks pass

Contributor

Copilot AI commented May 14, 2026

> @copilot The Python lint (flake8/ruff) check failed. Please fix the lint errors in this PR.
>
> Lint job log: https://github.com/intel/torch-xpu-ops/actions/runs/25865859919...

Fixed in commit a5d61d7. The PYFMT formatter required collapsing the multi-line `and (... or ...)` condition into a single line: `and ("cuda" in str(device) or "cpu" in str(device))`.
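
As a rough sketch of what that single-line condition evaluates to, the snippet below wraps it in a hypothetical helper. The function name and the `other_conditions` parameter are assumptions made for illustration; the rest of the test's condition is not shown in this PR, so it is represented only by that placeholder. Plain strings stand in for device objects.

```python
def expects_fast_path(device, other_conditions: bool = True) -> bool:
    """Hypothetical helper: True when the test should assume the fast path.

    `other_conditions` is a placeholder for the parts of the test's
    condition that are not quoted in this PR.
    """
    # str() lets this accept a plain string or a device-like object.
    return other_conditions and ("cuda" in str(device) or "cpu" in str(device))


for name in ["cuda:0", "cpu", "xpu:0"]:
    print(name, expects_fast_path(name))
```

With the substring check written this way, an XPU device string no longer matches, so the test would stop assuming fast-path behavior on XPU.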

Copilot AI requested a review from chuanqi129 May 14, 2026 14:34