Skip to content

Add symmetric memory support on XPU device#2041

Open
zhangxiaoli73 wants to merge 31 commits into
mainfrom
cherry/add-symm-xpu
Open

Add symmetric memory support on XPU device#2041
zhangxiaoli73 wants to merge 31 commits into
mainfrom
cherry/add-symm-xpu

Conversation

@zhangxiaoli73
Copy link
Copy Markdown
Contributor

PyTorch provides symmetric memory support on CUDA device.

Accordingly, we would like to provide similar feature on XPU device.

@zhangxiaoli73 zhangxiaoli73 changed the title Add symmetric memory support on XPU device [Pending on SYCL IPC] Add symmetric memory support on XPU device Sep 17, 2025
@Chao1Han Chao1Han force-pushed the cherry/add-symm-xpu branch from 546c71d to b19eed1 Compare January 21, 2026 00:57
@Chao1Han Chao1Han force-pushed the cherry/add-symm-xpu branch from 127627c to 03cedc1 Compare May 15, 2026 06:50
Copilot AI review requested due to automatic review settings May 15, 2026 06:50
@chuanqi129
Copy link
Copy Markdown
Contributor

@copilot The Python lint (flake8/ruff) check failed. Please fix the lint errors in this PR.

Important: Do NOT update the plan in the PR description directly. Use reply comments to update the status.

Lint job log: https://github.com/intel/torch-xpu-ops/actions/runs/25904657285

Instructions:

  1. Review the lint failure log via the link above
  2. Fix all lint errors in the relevant source files
  3. Ensure both Python lint and Clang format checks pass
  4. Use reply comments to report progress - do NOT modify the PR description plan directly

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds an XPU symmetric-memory backend that mirrors PyTorch's CUDA implementation. It introduces a XPUSymmetricMemory/XPUSymmetricMemoryAllocator pair on top of sycl::ext::oneapi::experimental::ipc_memory for peer-buffer exchange, plus signal-pad-based barrier/put_signal/wait_signal kernels, store-backed handle exchange utilities, and new XCCL-driven tests. CMake plumbing for an optional ISHMEM dependency is also added but no source consumes it in this PR.

Changes:

  • New XPU SymmetricMemory implementation: allocator, rendezvous via SYCL IPC, signal-pad kernels, Unix-domain-socket IPC channel helper.
  • Build system integration: new Signal.cpp compiled as SYCL, optional USE_ISHMEM discovery and linkage.
  • Tests: a new test_symmetric_memory_xccl.py plus a SymmetricMemoryTest class added to test_c10d_xccl.py.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 12 comments.

Show a summary per file
File Description
src/xccl/XPUSymmetricMemory.hpp Public API for the XPU symm-mem allocator and handle types.
src/xccl/XPUSymmetricMemory.cpp Allocator / rendezvous logic; uses SYCL IPC + store exchange; barrier/put/wait wrappers.
src/xccl/XPUSymmetricMemoryTypes.hpp Backend-specific type aliases / constants.
src/xccl/XPUSymmetricMemoryUtils.hpp Declarations for IpcChannel, StoreExchange, backend-name helper.
src/xccl/XPUSymmetricMemoryUtils.cpp Implementation of UDS fd-passing IPC channel and getSymmMemBackendXPU.
src/xccl/Signal.hpp Inline signal primitives + kernel launch declarations.
src/xccl/Signal.cpp SYCL kernels for barrier / put_signal / wait_signal.
src/xccl/CMakeLists.txt Routes Signal.cpp through the SYCL build path.
src/BuildOnLinux.cmake Optionally links torch::ishmem when USE_ISHMEM.
CMakeLists.txt Adds USE_ISHMEM discovery flow.
cmake/ISHMEM.cmake Declares torch::ishmem imported target.
cmake/Modules/FindISHMEM.cmake Locates Intel SHMEM via ISHMEM_ROOT.
test/xpu/distributed/test_c10d_xccl.py Adds SymmetricMemoryTest (rendezvous, signal pad, subgroup, put/wait).
test/xpu/distributed/test_symmetric_memory_xccl.py Adds AsyncTP fused all-gather-matmul / matmul-reduce-scatter tests.

Comment thread src/xccl/XPUSymmetricMemory.cpp Outdated
Comment thread src/xccl/Signal.hpp
Comment thread src/xccl/Signal.cpp
Comment thread test/xpu/distributed/test_c10d_xccl.py Outdated
Comment thread test/xpu/distributed/test_symmetric_memory_xccl.py Outdated
Comment thread src/xccl/XPUSymmetricMemory.hpp Outdated
Comment thread src/xccl/XPUSymmetricMemoryUtils.cpp
Comment thread src/xccl/XPUSymmetricMemoryUtils.cpp Outdated
Comment thread src/xccl/XPUSymmetricMemoryUtils.cpp Outdated
Comment thread CMakeLists.txt Outdated
@zhangxiaoli73 zhangxiaoli73 changed the title [Pending on SYCL IPC] Add symmetric memory support on XPU device Add symmetric memory support on XPU device May 15, 2026
Copy link
Copy Markdown
Contributor

Copilot AI commented May 15, 2026

@copilot The Python lint (flake8/ruff) check failed. Please fix the lint errors in this PR.

**Important: Do NOT update the plan in the PR description directly. Use reply comments to update the status...

Addressed on branch head at 48766d6: the lint-failing test file was removed, and I re-ran both checks locally with .github/scripts/lintrunner.sh (--skip CLANGTIDY,CLANGFORMAT,MERGE_CONFLICTLESS_CSV --all-files and --take CLANGFORMAT,CLANGTIDY --all-files). Both passed with No lint issues.

Copilot AI requested a review from chuanqi129 May 15, 2026 07:02
Copilot AI review requested due to automatic review settings May 18, 2026 05:33
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 10 comments.

Comment thread src/xccl/Signal.hpp
Comment thread src/xccl/XPUSymmetricMemory.cpp
Comment thread src/xccl/XPUSymmetricMemoryUtils.hpp Outdated
Comment thread test/xpu/distributed/test_c10d_xccl.py Outdated
Comment thread src/xccl/Signal.hpp Outdated
Comment thread src/xccl/XPUSymmetricMemoryUtils.cpp
Comment thread src/xccl/XPUSymmetricMemoryUtils.cpp Outdated
Comment thread src/xccl/XPUSymmetricMemory.cpp Outdated
Comment thread src/xccl/XPUSymmetricMemory.cpp Outdated
Comment thread test/xpu/distributed/test_c10d_xccl.py Outdated
Copilot AI review requested due to automatic review settings May 20, 2026 07:58
@github-actions github-actions Bot added disable_e2e Disable all e2e test jobs for the PR disable_win Disable Windows CI test jobs for the PR labels May 20, 2026
@chuanqi129 chuanqi129 marked this pull request as draft May 20, 2026 07:59
@chuanqi129 chuanqi129 marked this pull request as ready for review May 20, 2026 07:59
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 12 comments.

Comment thread src/xccl/XPUSymmetricMemoryUtils.cpp Outdated
Comment thread src/xccl/XPUSymmetricMemory.cpp
Comment thread test/xpu/distributed/test_c10d_xccl.py Outdated
Comment thread test/xpu/distributed/test_c10d_xccl.py Outdated
Comment thread test/xpu/distributed/test_c10d_xccl.py Outdated
Comment thread src/xccl/XPUSymmetricMemory.cpp
Comment thread src/xccl/XPUSymmetricMemory.cpp
Comment thread src/xccl/XPUSymmetricMemory.cpp
Comment thread src/xccl/Signal.hpp Outdated
Comment thread src/xccl/Signal.cpp
@zhangxiaoli73
Copy link
Copy Markdown
Contributor Author

@gujinghui @Chao1Han Please review

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

disable_e2e Disable all e2e test jobs for the PR disable_win Disable Windows CI test jobs for the PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants