Refactor Memory.h and simply the API by guangyey · Pull Request #3742 · intel/torch-xpu-ops

guangyey · 2026-05-22T10:35:23Z

Motivation

Centralize XPU memory copy/set operations in comm/Memory.h. Previously, none of the APIs defined in Memory.h were used. I refactored the header and added a set of APIs that can be reused across the code.

Raw queue.memcpy() + record_event() pairs were scattered across the codebase, making it easy to forget the event recording and introduce pinned-memory lifetime bugs. This PR introduces a small set of typed wrappers that encapsulate the correct synchronization and event-recording protocol, then migrates all callsites to use them.

Benefits: correctness by construction (can't forget record_event), single point of maintenance.
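
To illustrate the "can't forget record_event" idea, here is a minimal sketch that models the pattern with plain C++ stand-ins. `MockQueue`, `MockPinnedAllocator`, and the parameter list of the wrapper are assumptions made for illustration; they are not the types or signatures in src/comm/Memory.h, and the mock copies eagerly where a real SYCL queue would only enqueue the work.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-ins: none of these types come from torch-xpu-ops or SYCL.
namespace sketch {

struct MockQueue {
  int id = 0;
  std::size_t copies = 0;
  // Copies immediately; a real queue would only enqueue the operation.
  void memcpy(void* dst, const void* src, std::size_t n) {
    std::memcpy(dst, src, n);
    ++copies;
  }
};

struct MockPinnedAllocator {
  struct Record {
    const void* ptr;
    int queue_id;
  };
  std::vector<Record> records;
  // Models "remember that this pinned buffer is still in use on queue q".
  void record_event(const void* ptr, const MockQueue& q) {
    records.push_back({ptr, q.id});
  }
};

// The wrapper issues the copy and records the event in one call, so a
// callsite cannot do the first step and forget the second.
inline void memcpyPinnedHostToDeviceAsync(
    void* dst,
    const void* pinned_src,
    std::size_t n,
    MockQueue& q,
    MockPinnedAllocator& alloc) {
  if (n == 0) {
    return; // nothing to copy, nothing to record
  }
  q.memcpy(dst, pinned_src, n);
  alloc.record_event(pinned_src, q);
}

} // namespace sketch
```

With this shape, a callsite has a single function to call, and the event recording lives in one place.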

Copilot

Pull request overview

Skill file(s) read: .github/skills/xpu-ops-pr-review/SKILL.md.

This PR refactors XPU SYCL memory copy/memset helpers into src/comm/Memory.h, introducing a small set of wrappers intended to standardize synchronization and pinned-memory event recording, and migrates multiple callsites to use the new APIs.

Changes:

Replaced scattered queue.memcpy()/memset() usages (plus ad-hoc record_event) with centralized wrappers (memcpyAndSync, memcpyAsync, memcpyHostToDeviceAsync, memcpyPinnedHostToDeviceAsync, memcpyDeviceToHostAsync, memsetAndSync, memsetAsync).
Migrated key callsites (copy path, foreach/multi-tensor metadata uploads, scalar extraction, sparse CSR add, resize) to use the wrappers.
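
The wrapper names above suggest a split between "AndSync" and "Async" variants. The following sketch models that split with a hypothetical `DeferredQueue` that runs its work only on `wait()`. The signatures are assumed for illustration and are not taken from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <vector>

// Hypothetical model of an asynchronous queue; not a SYCL or torch-xpu-ops type.
namespace sketch_sync {

struct DeferredQueue {
  std::vector<std::function<void()>> pending;
  // Enqueue the copy; nothing is written to dst until wait() runs.
  void memcpy(void* dst, const void* src, std::size_t n) {
    pending.push_back([=] { std::memcpy(dst, src, n); });
  }
  // Run all pending work in submission order, then clear the list.
  void wait() {
    for (auto& task : pending) {
      task();
    }
    pending.clear();
  }
};

// Async variant: returns right after enqueueing; dst is not yet valid.
inline void memcpyAsync(
    void* dst, const void* src, std::size_t n, DeferredQueue& q) {
  q.memcpy(dst, src, n);
}

// Sync variant: enqueues and then blocks until the queue has drained,
// so dst can be read by the host as soon as the call returns.
inline void memcpyAndSync(
    void* dst, const void* src, std::size_t n, DeferredQueue& q) {
  q.memcpy(dst, src, n);
  q.wait();
}

} // namespace sketch_sync
```

In this model the sync variant fits host readbacks such as scalar extraction, where the value is needed immediately, while the async variant leaves ordering to the queue.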

Must-fix (blocking):

src/ATen/native/xpu/Copy.cpp device-to-device “memcpy-eligible” path now enqueues the copy via xpu::sycl::memcpyAsync, which uses the current stream internally rather than the provided copy_stream. This can break the cross-device barrier/synchronization logic and cause races.
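
The mismatch described above can be modeled in a few lines. In this sketch, `NamedQueue`, `currentQueue()`, and both wrapper variants are hypothetical; it only shows why a wrapper that looks up the current queue internally cannot honor a caller-selected `copy_stream`, and how an explicit queue parameter would avoid that. Whether the PR adopts such a parameter is not stated in the review.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>

// Hypothetical model; not the torch-xpu-ops stream or SYCL queue API.
namespace sketch_stream {

struct NamedQueue {
  std::string name;
  std::size_t copies = 0;
  explicit NamedQueue(std::string n) : name(std::move(n)) {}
  void memcpy(void* dst, const void* src, std::size_t n) {
    std::memcpy(dst, src, n);
    ++copies;
  }
};

// Stand-in for "the current stream" held as global state.
inline NamedQueue& currentQueue() {
  static NamedQueue q("current");
  return q;
}

// Implicit variant: always submits to the current queue, whatever
// queue the caller had prepared for this copy.
inline void memcpyAsyncImplicit(void* dst, const void* src, std::size_t n) {
  currentQueue().memcpy(dst, src, n);
}

// Explicit variant: the caller chooses the queue, so barriers placed
// on that queue still order the copy.
inline void memcpyAsyncOn(
    NamedQueue& q, void* dst, const void* src, std::size_t n) {
  q.memcpy(dst, src, n);
}

} // namespace sketch_stream
```

If the caller has set up barriers on `copy_stream`, only the explicit variant places the copy where those barriers apply.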

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

Show a summary per file

| File | Description |
| --- | --- |
| src/comm/Memory.h | Adds centralized SYCL memcpy/memset wrappers with pinned-host staging and event recording. |
| src/ATen/native/xpu/XPUScalar.cpp | Uses `memcpyAndSync` for scalar extraction instead of raw queue memcpy+wait. |
| src/ATen/native/xpu/sycl/ResizeKernel.cpp | Uses centralized async memcpy for storage resize copy. |
| src/ATen/native/xpu/sycl/MultiTensorApply.h | Uses pinned-H2D wrapper for metadata uploads and encapsulated event recording. |
| src/ATen/native/xpu/sycl/ForeachReduceKernels.cpp | Uses pinned-H2D wrapper for metadata/count uploads and encapsulated event recording. |
| src/ATen/native/xpu/Copy.cpp | Refactors memcpy paths to use centralized wrappers (introduces stream/queue mismatch bug). |
| src/ATen/native/sparse/xpu/sycl/SparseCsrTensorAddKernels.cpp | Uses centralized memset/memcpy wrappers for async set and sync readback of nnz. |

+    auto src = iter.data_ptr(1);
    size_t size = iter.numel() * iter.element_size(0);
-    q.copy(src, dst, size);
+    xpu::sycl::memcpyAsync(dst, src, size);


github-actions bot added the disable_e2e (Disable all e2e test jobs for the PR) and disable_distributed (Disable distributed UT test jobs for the PR) labels May 22, 2026

chuanqi129 marked this pull request as draft May 22, 2026 10:35

chuanqi129 marked this pull request as ready for review May 22, 2026 10:35

guangyey force-pushed the guangyey/memcpy branch 4 times, most recently from 1b15397 to 529a550 May 22, 2026 11:18

guangyey requested a review from Copilot May 22, 2026 11:25

Copilot started reviewing on behalf of guangyey May 22, 2026 11:25

Copilot AI reviewed May 22, 2026


Refactor Memory.h and simply the API

529a550

guangyey wants to merge 1 commit into main from guangyey/memcpy


2 participants
