
Add parameter to control whether to record stream #9

Open

RichardWooSJTU wants to merge 2 commits into PFCCLab:paddle from RichardWooSJTU:num_worst_tokens

Conversation


@RichardWooSJTU RichardWooSJTU commented Feb 28, 2026

Summary

- Add a `skip_x_record_stream` parameter (default `false`) to `intranode_dispatch` and `intranode_combine`, allowing callers to skip `record_stream` on large activation tensors (`x`, `recv_x`) to reduce GPU memory pressure.
- When `skip_x_record_stream=True`, PyTorch's CUDA caching allocator can reclaim the memory of `x`/`recv_x` earlier, instead of holding it until the communication stream finishes.

Motivation

record_stream prevents the CUDA caching allocator from reusing a tensor's memory until the recorded stream completes. For large activation tensors like x and recv_x, this significantly increases peak GPU memory usage. In scenarios where the caller can guarantee the lifetime of these tensors externally (e.g., the tensors are kept alive by Python references until the comm stream finishes), skipping record_stream is safe and reduces memory footprint.
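The effect described above can be modelled with a small toy example. This is a deliberately simplified sketch: `ToyAllocator` is not the real PyTorch caching allocator, it only mimics the rule that a block whose freeing was recorded against an unfinished stream cannot be reused until that stream completes.

```python
# Toy model of a stream-aware caching allocator (illustrative only, not
# the PyTorch allocator). A block "recorded" on a stream is held back
# from reuse until that stream finishes -- which is why record_stream
# on large tensors raises peak memory.

class ToyAllocator:
    def __init__(self):
        self.free_blocks = []   # blocks available for reuse
        self.pending = []       # (block, stream) pairs held by record_stream

    def allocate(self, size):
        # Reuse a free block if one fits, otherwise "allocate" a new one.
        for blk in self.free_blocks:
            if blk["size"] >= size:
                self.free_blocks.remove(blk)
                return blk
        return {"size": size}

    def free(self, block, recorded_stream=None):
        if recorded_stream is not None and not recorded_stream["done"]:
            # record_stream semantics: hold the block until the stream is done.
            self.pending.append((block, recorded_stream))
        else:
            self.free_blocks.append(block)

    def poll(self):
        # Release blocks whose recorded stream has since completed.
        still_pending = []
        for block, stream in self.pending:
            if stream["done"]:
                self.free_blocks.append(block)
            else:
                still_pending.append((block, stream))
        self.pending = still_pending


alloc = ToyAllocator()
comm_stream = {"done": False}

x = alloc.allocate(1024)
alloc.free(x, recorded_stream=comm_stream)  # freed, but recorded on comm stream
y = alloc.allocate(1024)                    # cannot reuse x's block yet
assert y is not x                           # peak memory: two live blocks

comm_stream["done"] = True
alloc.poll()
z = alloc.allocate(1024)                    # now x's block is reusable
assert z is x
```

With `skip_x_record_stream=True`, the freed block would go straight to `free_blocks` in this model, so `y` could reuse `x`'s memory immediately; the caller then carries the burden of keeping `x` alive until the communication stream finishes.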

Changes

| File | Change |
| --- | --- |
| `csrc/deep_ep.hpp` | Add `bool skip_x_record_stream = false` to `intranode_dispatch` and `intranode_combine` declarations |
| `csrc/deep_ep.cpp` | Gate `x`/`recv_x` `record_stream` calls behind `!skip_x_record_stream` in both functions |
| `deep_ep/buffer.py` | Add `skip_x_record_stream: bool = False` to `dispatch()` and `combine()`, pass through to the C++ runtime |
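The `buffer.py` change amounts to threading one keyword through to the extension. A minimal sketch of that pass-through, using a stub in place of the real C++ runtime (actual deep_ep signatures may differ):

```python
# Hedged sketch of the Python-side change: dispatch() gains a
# skip_x_record_stream keyword (default False) and forwards it to the
# runtime. _StubRuntime stands in for the real deep_ep C++ extension.

class _StubRuntime:
    def intranode_dispatch(self, x, skip_x_record_stream):
        # The real runtime would gate x.record_stream(comm_stream)
        # behind `not skip_x_record_stream`.
        return {"recv_x": x, "skipped_record_stream": skip_x_record_stream}


class Buffer:
    def __init__(self):
        self.runtime = _StubRuntime()

    def dispatch(self, x, skip_x_record_stream: bool = False):
        # Default False keeps existing behavior unchanged.
        return self.runtime.intranode_dispatch(
            x, skip_x_record_stream=skip_x_record_stream)


buf = Buffer()
assert buf.dispatch([1.0])["skipped_record_stream"] is False
assert buf.dispatch([1.0], skip_x_record_stream=True)["skipped_record_stream"] is True
```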

Usage

```python
# Default behavior unchanged
recv_x, *_ = buffer.dispatch(x, ...)

# Skip record_stream on x/recv_x to save memory
recv_x, *_ = buffer.dispatch(x, ..., skip_x_record_stream=True)
recv_x, *_ = buffer.combine(x, handle, ..., skip_x_record_stream=True)
```
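Because skipping `record_stream` shifts lifetime responsibility to the caller, callers must keep `x` referenced until the communication stream has finished. One hypothetical pattern is to park `(event, tensor)` pairs in a list and drop them once the event reports completion; the `Event` class below is a stand-in for `torch.cuda.Event` (in real code the event would be recorded on the communication stream):

```python
# Hypothetical caller-side lifetime guarantee for skip_x_record_stream=True:
# keep a Python reference to x until the comm stream is known finished.
# Event is a toy stand-in for torch.cuda.Event.

class Event:
    def __init__(self):
        self._done = False

    def record(self):
        # Toy model: pretend the comm stream finished as soon as recorded.
        self._done = True

    def query(self):
        return self._done


inflight = []  # (event, tensor) pairs keeping tensors alive


def dispatch_no_record(x, event):
    # Hold x via `inflight` instead of relying on record_stream.
    inflight.append((event, x))
    return x  # stand-in for recv_x


ev = Event()
recv = dispatch_no_record([1.0, 2.0], ev)
ev.record()

# Drop references only after the comm stream has completed.
inflight[:] = [(e, t) for e, t in inflight if not e.query()]
assert inflight == []
```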

