Add symmetric memory support on XPU device #2041
@copilot The Python lint (flake8/ruff) check failed. Please fix the lint errors in this PR. Important: Do NOT update the plan in the PR description directly. Use reply comments to update the status. Lint job log: https://github.com/intel/torch-xpu-ops/actions/runs/25904657285 Instructions:
|
Pull request overview
Adds an XPU symmetric-memory backend that mirrors PyTorch's CUDA implementation. It introduces an XPUSymmetricMemory/XPUSymmetricMemoryAllocator pair built on sycl::ext::oneapi::experimental::ipc_memory for peer-buffer exchange, along with signal-pad-based barrier/put_signal/wait_signal kernels, store-backed handle-exchange utilities, and new XCCL-driven tests. CMake plumbing for an optional ISHMEM dependency is also added, but no source file in this PR consumes it.
Changes:
- New XPU SymmetricMemory implementation: allocator, rendezvous via SYCL IPC, signal-pad kernels, Unix-domain-socket IPC channel helper.
- Build system integration: new Signal.cpp compiled as SYCL, optional USE_ISHMEM discovery and linkage.
- Tests: a new test_symmetric_memory_xccl.py plus a SymmetricMemoryTest class added to test_c10d_xccl.py.
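
For reviewers unfamiliar with signal pads, the following is a minimal host-side sketch of the barrier/put_signal/wait_signal idea. It is not the PR's SYCL code. It assumes the same flag protocol that PyTorch's CUDA backend uses (a sender flips a peer's slot from 0 to 1, and the waiter flips its own slot from 1 back to 0) and emulates device atomics with a Python condition variable. `SignalPad`, `barrier`, and `run_barrier_demo` are hypothetical names introduced only for this illustration.

```python
import threading


class SignalPad:
    """Host-side stand-in for one rank's signal pad (hypothetical helper)."""

    def __init__(self, world_size):
        self._slots = [0] * world_size  # one flag per sending rank
        self._cond = threading.Condition()

    def put(self, src_rank):
        # Emulates a compare-and-swap 0 -> 1: block until the slot is clear.
        with self._cond:
            self._cond.wait_for(lambda: self._slots[src_rank] == 0)
            self._slots[src_rank] = 1
            self._cond.notify_all()

    def wait(self, src_rank):
        # Emulates a compare-and-swap 1 -> 0: block until the slot is set.
        with self._cond:
            self._cond.wait_for(lambda: self._slots[src_rank] == 1)
            self._slots[src_rank] = 0
            self._cond.notify_all()


def barrier(pads, rank):
    """Signal every peer, then wait for a signal from every peer."""
    world_size = len(pads)
    for peer in range(world_size):
        if peer != rank:
            pads[peer].put(rank)   # write into the peer's pad
    for peer in range(world_size):
        if peer != rank:
            pads[rank].wait(peer)  # consume the peer's signal on our pad


def run_barrier_demo(world_size=4):
    """Return, per rank, how many ranks had arrived when it left the barrier."""
    pads = [SignalPad(world_size) for _ in range(world_size)]
    arrived, seen_after = [], []
    lock = threading.Lock()

    def worker(rank):
        with lock:
            arrived.append(rank)
        barrier(pads, rank)
        with lock:
            seen_after.append(len(arrived))

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(world_size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen_after
```

Because a rank leaves the barrier only after consuming a signal from every peer, each entry returned by `run_barrier_demo` equals the world size. The real kernels operate on device memory mapped across processes, which this thread-based sketch does not model.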
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 12 comments.
| File | Description |
|---|---|
| src/xccl/XPUSymmetricMemory.hpp | Public API for the XPU symm-mem allocator and handle types. |
| src/xccl/XPUSymmetricMemory.cpp | Allocator / rendezvous logic; uses SYCL IPC + store exchange; barrier/put/wait wrappers. |
| src/xccl/XPUSymmetricMemoryTypes.hpp | Backend-specific type aliases / constants. |
| src/xccl/XPUSymmetricMemoryUtils.hpp | Declarations for IpcChannel, StoreExchange, backend-name helper. |
| src/xccl/XPUSymmetricMemoryUtils.cpp | Implementation of UDS fd-passing IPC channel and getSymmMemBackendXPU. |
| src/xccl/Signal.hpp | Inline signal primitives + kernel launch declarations. |
| src/xccl/Signal.cpp | SYCL kernels for barrier / put_signal / wait_signal. |
| src/xccl/CMakeLists.txt | Routes Signal.cpp through the SYCL build path. |
| src/BuildOnLinux.cmake | Optionally links torch::ishmem when USE_ISHMEM. |
| CMakeLists.txt | Adds USE_ISHMEM discovery flow. |
| cmake/ISHMEM.cmake | Declares torch::ishmem imported target. |
| cmake/Modules/FindISHMEM.cmake | Locates Intel SHMEM via ISHMEM_ROOT. |
| test/xpu/distributed/test_c10d_xccl.py | Adds SymmetricMemoryTest (rendezvous, signal pad, subgroup, put/wait). |
| test/xpu/distributed/test_symmetric_memory_xccl.py | Adds AsyncTP fused all-gather-matmul / matmul-reduce-scatter tests. |
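
As background for the "UDS fd-passing IPC channel" row above, here is a minimal sketch of passing a file descriptor over a Unix domain socket. It is an illustration in Python, not the PR's C++ IpcChannel, and it assumes the helper relies on the standard ancillary-data mechanism (SCM_RIGHTS) that `socket.send_fds` and `socket.recv_fds` wrap in Python 3.9+. The function name `pass_fd_over_uds` and the use of a pipe as the shared resource are assumptions made for the example.

```python
import os
import socket


def pass_fd_over_uds(payload: bytes) -> bytes:
    """Send a pipe's read end over a Unix socket and read through the received fd."""
    # A connected pair of Unix domain sockets stands in for two ranks.
    sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    read_end, write_end = os.pipe()
    try:
        # Put some data behind the descriptor we are about to share.
        os.write(write_end, payload)
        os.close(write_end)
        write_end = -1

        # The kernel duplicates read_end into the receiver's fd table.
        socket.send_fds(sender, [b"fd"], [read_end])
        _msg, fds, _flags, _addr = socket.recv_fds(receiver, 16, 1)

        received_fd = fds[0]
        try:
            # Reading via the received fd reaches the same underlying pipe.
            return os.read(received_fd, len(payload))
        finally:
            os.close(received_fd)
    finally:
        if write_end != -1:
            os.close(write_end)
        os.close(read_end)
        sender.close()
        receiver.close()
```

The received descriptor is a new number that refers to the same kernel object, which is why this pattern suits sharing memory handles between ranks on one host. This sketch runs both ends in one process for brevity; a real exchange would happen between separate processes.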
Addressed on branch head at |
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
|
@gujinghui @Chao1Han Please review |
PyTorch provides symmetric memory support on CUDA devices.
Accordingly, we would like to provide a similar feature on XPU devices.