[WebGPU] Fix stale buffer bindings on first graph-capture replay#28325
Open
hariharans29 wants to merge 3 commits intomainfrom
Open
[WebGPU] Fix stale buffer bindings on first graph-capture replay#28325hariharans29 wants to merge 3 commits intomainfrom
hariharans29 wants to merge 3 commits intomainfrom
Conversation
Lazily create the graph-mode BufferManager and gate routing through it on a graph_buffer_mgr_active_ flag set only between OnRunStart/OnRunEnd. Refactor GpuBufferAllocator to take a std::function<const BufferManager&()> getter so the allocator can resolve the active buffer manager dynamically. Pre-existing latent bug: when graph capture is enabled, captured commands held buffer pointers from the capture run. If the allocator freed and reused a slot before the first Replay(), the dispatch read stale memory and produced one wrong logits tensor. Mostly hidden in fixed-shape decode because the allocator typically re-handed out the same addresses; exposed by upcoming MatMulNBits MLP/QKV fusions which churn the allocation pattern. Also adds WebGpuContext::WaitForQueueIdle() and ComputeContextBase::FlushAndWait() helpers used by the buffer-mgr lifetime management.
Contributor
There was a problem hiding this comment.
Pull request overview
This PR addresses a WebGPU graph-capture correctness issue where the first replay could read stale buffer bindings. It does this by routing allocations through a dedicated “graph-mode” BufferManager only during the capture window, and by letting the allocator resolve the currently-active buffer manager dynamically.
Changes:
- Lazily create the graph-mode
BufferManagerand gateWebGpuExecutionProvider::BufferManager()via a newgraph_buffer_mgr_active_flag (active only betweenOnRunStart/OnRunEnd). - Refactor
GpuBufferAllocatorto optionally take astd::function<const BufferManager&()>getter, enabling dynamic buffer-manager selection. - Add
WebGpuContext::WaitForQueueIdle()(implementation) and aComputeContextBase::FlushAndWait()convenience helper.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| onnxruntime/core/providers/webgpu/webgpu_execution_provider.h | Adds graph_buffer_mgr_active_ flag to control graph buffer manager routing. |
| onnxruntime/core/providers/webgpu/webgpu_execution_provider.cc | Lazily instantiates graph buffer manager, toggles active flag during capture, and updates allocator creation to use a getter. |
| onnxruntime/core/providers/webgpu/webgpu_context.cc | Implements WebGpuContext::WaitForQueueIdle(). |
| onnxruntime/core/providers/webgpu/compute_context.h | Adds FlushAndWait() helper and expands BufferManagerAccessor friendship. |
| onnxruntime/core/providers/webgpu/allocator.h | Adds <functional> and a new allocator constructor taking a buffer-manager getter; stores the getter. |
| onnxruntime/core/providers/webgpu/allocator.cc | Implements new constructor and switches alloc/free to use the getter. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Fixes build error introduced when ComputeContextBase::FlushAndWait() was added without the corresponding WebGpuContext public declaration. Spotted by Copilot review on the PR.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Lazily create the graph-mode BufferManager and gate routing through it on a graph_buffer_mgr_active_ flag set only between OnRunStart/OnRunEnd. Refactor GpuBufferAllocator to take a std::function<const BufferManager&()> getter so the allocator can resolve the active buffer manager dynamically.
Pre-existing latent bug: when graph capture is enabled, captured commands held buffer pointers from the capture run. If the allocator freed and reused a slot before the first Replay(), the dispatch read stale memory and produced one wrong logits tensor. Mostly hidden in fixed-shape decode because the allocator typically re-handed out the same addresses; exposed by MatMulNBits MLP/QKV fusions which churn the allocation pattern (#28280)
Also adds WebGpuContext::WaitForQueueIdle() and ComputeContextBase::FlushAndWait() helpers used by the buffer-mgr lifetime management.
Motivation and Context
Fix correctness on first replay