Inference: Create finer grained cuda-graphs with better coverage of smaller batch sizes by sidsingh-nvidia · Pull Request #3527 · NVIDIA/Megatron-LM

sidsingh-nvidia · 2026-02-21T00:51:02Z

What does this PR do ?

Currently, we skip creating CUDA graphs for small batch sizes. For example, with max requests = 512 and number of CUDA graphs = 16, the smallest CUDA graph batch size is 32. With this PR, if the user passes -1 as the number of CUDA graphs (`--inference-dynamic-batching-num-cuda-graphs -1`), the CUDA graph batch sizes are chosen automatically, with fine-grained coverage of small batch sizes, i.e. [1, 2, 4, 8]. The algorithm that picks the CUDA graph batch sizes is identical to vLLM's.

Note that this is orthogonal to @mathemakitten's PR #3509, which also covers smaller batch sizes but with a much smaller total CUDA graph count. It remains to be seen which approach is more suitable.
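To make the difference concrete, here is a minimal sketch; it is not the code from this PR. The function names are hypothetical. The evenly spaced schedule is inferred from the 512 / 16 example above. The "1, 2, 4, then multiples of 8" schedule is an assumption modelled on vLLM's default capture sizes, so the exact list Megatron-LM produces may differ.

```python
from bisect import bisect_left


def linear_cuda_graph_sizes(max_requests: int, num_cuda_graphs: int) -> list[int]:
    """Evenly spaced batch sizes (assumed model of the behaviour before this PR)."""
    step = max_requests // num_cuda_graphs
    return [step * i for i in range(1, num_cuda_graphs + 1)]


def fine_grained_cuda_graph_sizes(max_requests: int) -> list[int]:
    """1, 2, 4, then multiples of 8, capped at max_requests (assumed schedule)."""
    candidates = [1, 2, 4] + list(range(8, max_requests + 1, 8)) + [max_requests]
    # Deduplicate and drop anything larger than the request limit.
    return sorted({size for size in candidates if size <= max_requests})


def padded_batch_size(sizes: list[int], batch_size: int) -> int:
    """Smallest captured size that fits batch_size.

    Assumes 1 <= batch_size <= sizes[-1]; larger values raise IndexError.
    """
    return sizes[bisect_left(sizes, batch_size)]


linear = linear_cuda_graph_sizes(512, 16)
fine = fine_grained_cuda_graph_sizes(512)
print(linear[:3], len(linear))
print(fine[:6], len(fine))
print(padded_batch_size(linear, 3), padded_batch_size(fine, 3))
```

Under these assumptions, a batch of 3 requests is padded to 32 with the evenly spaced schedule but only to 4 with the fine-grained one, at the cost of 67 graphs instead of 16.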

⚠️ For major changes (either in lines of code or in its impact), please make sure to first share a design doc with the team. If you're unsure what's the best way to do so, contact the @mcore-oncall.

Contribution process

flowchart LR
    A[Pre-checks] --> B[PR Tests]
    subgraph Code Review/Approval
        C1[Expert Review] --> C2[Final Review]
    end
    B --> C1
    C2 --> D[Merge]

Pre-checks

I want this PR in a versioned release and have added the appropriate Milestone (e.g., Core 0.8)
I have added relevant unit tests
I have added relevant functional tests
I have added proper typing to my code (see the Typing guidelines)
I have added relevant documentation
I have run the autoformatter.sh on my PR

Code review

The following process is enforced via the CODEOWNERS file for changes into megatron/core. For changes outside of megatron/core, it is up to the PR author whether or not to tag the Final Reviewer team.

For MRs into `main` branch

Feel free to message @mcore-oncall or mention them in a comment to help accelerate your merge into main. The less complex your PR is, the faster it will be approved and merged!

(Step 1): Add PR label `Expert Review`

(Step 2): Collect the expert reviewers' reviews

Attach the Expert Review label when your PR is ready for review.
GitHub auto-assigns expert reviewers based on your changes. They will get notified and pick up your PR soon.

⚠️ Only proceed to the next step once all reviewers have approved, merge conflicts are resolved, and the CI is passing.
Final Review might get declined if these requirements are not fulfilled.

(Step 3): Final Review

Add Final Review label
GitHub auto-assigns final reviewers based on your changes. They will get notified and pick up your PR soon.

(Optional Step 4): Cherry-pick into release branch

If this PR also needs to be merged into core_r* release branches, after this PR has been merged, select Cherry-pick to open a new PR into the release branch.

For MRs into `dev` branch

The proposed review process for `dev` branch is under active discussion.

MRs are mergeable after one approval by either eharper@nvidia.com or zijiey@nvidia.com.

Merging your PR

Any member of core-adlr and core-nemo will be able to merge your PR.


janEbert · 2026-02-23T17:20:29Z

Could you please create tests for this new feature? Thanks!

sidsingh-nvidia · 2026-02-23T21:28:43Z

@janEbert I have added the unit test.

janEbert · 2026-02-23T22:05:32Z

Thank you!

janEbert · 2026-02-23T22:05:56Z

/ok to test 82b6c35

svcnvidia-nemo-ci · 2026-02-23T22:31:00Z

🔄 Merge queue validation started!

You can track the progress here: https://github.com/NVIDIA/Megatron-LM/actions/runs/22327611794


sidsingh-nvidia added 2 commits February 20, 2026 16:43

--inference-dynamic-batching-num-cuda-graphs -1 sets num cuda graphs …

5b470e5

…automatically

format

9a4cb19

sidsingh-nvidia requested review from a team as code owners February 21, 2026 00:51

svcnvidia-nemo-ci added this to the Core 0.16 milestone Feb 21, 2026

svcnvidia-nemo-ci requested a review from a team February 21, 2026 00:51

sidsingh-nvidia requested a review from mathemakitten February 21, 2026 00:51

format

5e5d999

copy-pr-bot Bot temporarily deployed to test February 21, 2026 00:58 Inactive

kvareddy approved these changes Feb 21, 2026


mathemakitten mentioned this pull request Feb 23, 2026

Change the cudagraph distribution from linearly to exponentially-decreasing #3509

Open

6 tasks

mathemakitten approved these changes Feb 23, 2026


shanmugamr1992 approved these changes Feb 23, 2026


sidsingh-nvidia self-assigned this Feb 23, 2026

Merge branch 'main' into siddharth/fine-grained-cgs

f37cdae

copy-pr-bot Bot temporarily deployed to test February 23, 2026 19:08 Inactive

sidsingh-nvidia added 2 commits February 23, 2026 13:25

add unit test

c5e2abb

Merge branch 'main' into siddharth/fine-grained-cgs

82b6c35

copy-pr-bot Bot temporarily deployed to test February 23, 2026 21:27 Inactive

sidsingh-nvidia enabled auto-merge February 23, 2026 21:46

janEbert approved these changes Feb 23, 2026


sidsingh-nvidia added this pull request to the merge queue Feb 23, 2026

Merged via the queue into NVIDIA:main with commit fde3b90 Feb 23, 2026
48 of 49 checks passed

sidsingh-nvidia deleted the siddharth/fine-grained-cgs branch February 23, 2026 23:16

ko3n1g pushed a commit to ko3n1g/Megatron-LM that referenced this pull request Feb 26, 2026

Inference: Create finer grained cuda-graphs with better coverage of s…

466e01f

…maller batch sizes (NVIDIA#3527)

BoxiangW pushed a commit to BoxiangW/Megatron-LM that referenced this pull request Mar 4, 2026

Inference: Create finer grained cuda-graphs with better coverage of s…

ca4438d

…maller batch sizes (NVIDIA#3527)

sidsingh-nvidia merged 6 commits into NVIDIA:main from sidsingh-nvidia:siddharth/fine-grained-cgs


6 participants
