[Docs] Clarify top-level for-loops inside graph_do_while#727

Open
hughperkins wants to merge 1 commit into main from hp/doc-graph-do-while-top-level-loops

@hughperkins

Collaborator

Document that each top-level for-loop in a graph_do_while body is still its own offloaded launch with grid-wide barriers between consecutive loops, so multi-phase algorithms that need grid-wide sync between phases (e.g. a device-wide radix sort: histogram -> scan -> scatter) work correctly when called directly in the loop body.

Clarifies that the "do not nest in runtime for/if/while" guidance is about ordinary nested control flow (which demotes a loop out of top-level position and collapses the offload), not about graph_do_while itself, which is the construct designed to host a sequence of top-level offloaded loops. Verified empirically: a radix sort of N > BLOCK_DIM keys inside graph_do_while sorts correctly on every iteration.
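
To make the barrier argument concrete, here is a minimal plain-Python simulation of one counting-sort pass (histogram -> scan -> scatter) over several "blocks". It does not use the library's API at all: `sort_with_barriers`, `sort_fused`, `BLOCK_DIM` and `NUM_BUCKETS` are hypothetical names chosen for illustration, and blocks are simply run one after another. The first function models three separate launches with a grid-wide barrier between them; the second models the collapsed case, where each block runs all three phases before the next block has contributed its counts.

```python
from itertools import accumulate

NUM_BUCKETS = 4  # keys are assumed to lie in range(NUM_BUCKETS)
BLOCK_DIM = 4    # keys handled per simulated block


def blocks(keys):
    """Split the key array into BLOCK_DIM-sized chunks, one per simulated block."""
    return [keys[i:i + BLOCK_DIM] for i in range(0, len(keys), BLOCK_DIM)]


def histogram(block, hist):
    """Phase 1: add this block's key counts to the shared histogram."""
    for k in block:
        hist[k] += 1


def exclusive_scan(hist):
    """Phase 2: turn bucket counts into starting offsets."""
    return [0] + list(accumulate(hist))[:-1]


def scatter(block, offsets, out):
    """Phase 3: place each key at its bucket's next free slot."""
    for k in block:
        out[offsets[k]] = k
        offsets[k] += 1


def sort_with_barriers(keys):
    """Three separate 'launches': every block finishes a phase before the next starts."""
    hist = [0] * NUM_BUCKETS
    out = [-1] * len(keys)
    for b in blocks(keys):          # launch 1: histogram over all blocks
        histogram(b, hist)
    offsets = exclusive_scan(hist)  # launch 2: scan of the complete histogram
    for b in blocks(keys):          # launch 3: scatter over all blocks
        scatter(b, offsets, out)
    return out


def sort_fused(keys):
    """One 'launch': each block runs all phases on a possibly incomplete histogram."""
    hist = [0] * NUM_BUCKETS
    out = [-1] * len(keys)
    for b in blocks(keys):
        histogram(b, hist)
        offsets = exclusive_scan(hist)  # other blocks have not contributed yet
        scatter(b, offsets, out)
    return out


keys = [3, 1, 2, 0, 1, 3, 0, 2]  # N = 8 > BLOCK_DIM, so two blocks
print(sort_with_barriers(keys))  # [0, 0, 1, 1, 2, 2, 3, 3]
print(sort_fused(keys))          # [0, 1, 1, 3, 2, -1, 3, -1]
```

In the fused variant the first block scans a histogram that only contains its own counts, so its offsets are wrong and later writes overwrite earlier ones. That is the failure mode the nesting guidance is meant to prevent, and it does not arise when each phase is its own launch.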

Issue: #

Brief Summary

copilot:summary

Walkthrough

copilot:walkthrough

@github-actions

github-actions Bot commented Jun 6, 2026

cond[()] = ...
```

**What does break the grid-wide barrier:** nesting a `for`-loop inside *ordinary* runtime control flow — another `for`, an `if`, or a plain Python `while` — **demotes it from top-level position**, so it no longer becomes its own offloaded launch. Instead it runs as device code *within the enclosing launch*, and the grid-wide barrier between it and its siblings is lost (other blocks may not have produced their data yet). `graph_do_while` is **not** "ordinary runtime control flow" in this sense — it is precisely the construct designed to host a sequence of top-level offloaded loops, so loops directly in its body keep their barriers. Compile-time `qd.static(range(...))` loops are also fine: they unroll flat at compile time and keep their bodies at top-level position.

Collaborator Author

do we need this paragraph?
