Skip to content

[WebGPU] Fix stale buffer bindings on first graph-capture replay#28325

Open
hariharans29 wants to merge 3 commits intomainfrom
hari/webgpu_graph_capture_buffer_fix
Open

[WebGPU] Fix stale buffer bindings on first graph-capture replay#28325
hariharans29 wants to merge 3 commits intomainfrom
hari/webgpu_graph_capture_buffer_fix

Conversation

@hariharans29
Copy link
Copy Markdown
Member

Lazily create the graph-mode BufferManager and gate routing through it on a graph_buffer_mgr_active_ flag set only between OnRunStart/OnRunEnd. Refactor GpuBufferAllocator to take a std::function<const BufferManager&()> getter so the allocator can resolve the active buffer manager dynamically.

Pre-existing latent bug: when graph capture is enabled, captured commands held buffer pointers from the capture run. If the allocator freed and reused a slot before the first Replay(), the dispatch read stale memory and produced one wrong logits tensor. Mostly hidden in fixed-shape decode because the allocator typically re-handed out the same addresses; exposed by MatMulNBits MLP/QKV fusions which churn the allocation pattern (#28280)

Also adds WebGpuContext::WaitForQueueIdle() and ComputeContextBase::FlushAndWait() helpers used by the buffer-mgr lifetime management.

Motivation and Context

Fix correctness on first replay

Lazily create the graph-mode BufferManager and gate routing through it on a graph_buffer_mgr_active_ flag set only between OnRunStart/OnRunEnd. Refactor GpuBufferAllocator to take a std::function<const BufferManager&()> getter so the allocator can resolve the active buffer manager dynamically.

Pre-existing latent bug: when graph capture is enabled, captured commands held buffer pointers from the capture run. If the allocator freed and reused a slot before the first Replay(), the dispatch read stale memory and produced one wrong logits tensor. Mostly hidden in fixed-shape decode because the allocator typically re-handed out the same addresses; exposed by upcoming MatMulNBits MLP/QKV fusions which churn the allocation pattern.

Also adds WebGpuContext::WaitForQueueIdle() and ComputeContextBase::FlushAndWait() helpers used by the buffer-mgr lifetime management.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses a WebGPU graph-capture correctness issue where the first replay could read stale buffer bindings. It does this by routing allocations through a dedicated “graph-mode” BufferManager only during the capture window, and by letting the allocator resolve the currently-active buffer manager dynamically.

Changes:

  • Lazily create the graph-mode BufferManager and gate WebGpuExecutionProvider::BufferManager() via a new graph_buffer_mgr_active_ flag (active only between OnRunStart/OnRunEnd).
  • Refactor GpuBufferAllocator to optionally take a std::function<const BufferManager&()> getter, enabling dynamic buffer-manager selection.
  • Add WebGpuContext::WaitForQueueIdle() (implementation) and a ComputeContextBase::FlushAndWait() convenience helper.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
onnxruntime/core/providers/webgpu/webgpu_execution_provider.h Adds graph_buffer_mgr_active_ flag to control graph buffer manager routing.
onnxruntime/core/providers/webgpu/webgpu_execution_provider.cc Lazily instantiates graph buffer manager, toggles active flag during capture, and updates allocator creation to use a getter.
onnxruntime/core/providers/webgpu/webgpu_context.cc Implements WebGpuContext::WaitForQueueIdle().
onnxruntime/core/providers/webgpu/compute_context.h Adds FlushAndWait() helper and expands BufferManagerAccessor friendship.
onnxruntime/core/providers/webgpu/allocator.h Adds <functional> and a new allocator constructor taking a buffer-manager getter; stores the getter.
onnxruntime/core/providers/webgpu/allocator.cc Implements new constructor and switches alloc/free to use the getter.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread onnxruntime/core/providers/webgpu/compute_context.h
Comment thread onnxruntime/core/providers/webgpu/compute_context.h Outdated
Comment thread onnxruntime/core/providers/webgpu/webgpu_context.cc
Fixes build error introduced when ComputeContextBase::FlushAndWait() was added without the corresponding WebGpuContext public declaration. Spotted by Copilot review on the PR.
@hariharans29 hariharans29 changed the title [WebGPU] Fix stale buffer bindings on first graph-capture replay WIP: [WebGPU] Fix stale buffer bindings on first graph-capture replay May 2, 2026
@hariharans29 hariharans29 changed the title WIP: [WebGPU] Fix stale buffer bindings on first graph-capture replay [WebGPU] Fix stale buffer bindings on first graph-capture replay May 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants