
Use Protocols to type-check linear_proj submodules of Attention#3434

Open
nschank wants to merge 5 commits into NVIDIA:main from nschank:linearproj

Conversation

@nschank
Contributor

@nschank nschank commented Feb 15, 2026

What does this PR do ?

Defines Protocols representing linear_proj submodules, and uses them instead of ModuleSpec to enable typechecking of its construction in SelfAttention, CrossAttention, and MLA.

I also updated Backend to return linear_proj specifically, allowing type-checking of RowParallelLinear types as instances of linear_proj directly (otherwise Backend "hides" the type and makes no type-checking occur).

While I was in attention, I also updated the naming conventions of the existing interfaces to match what we've finalized on.

Associated design doc: Typed ModuleSpec.pdf
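The core idea can be sketched with standard `typing.Protocol` classes. This is a simplified, hypothetical sketch: `LinearProj`, `LinearProjBuilder`, and the builder signature below are illustrative assumptions, not the PR's actual definitions, and the toy `RowParallelLinear` stands in for the real Megatron module:

```python
from dataclasses import dataclass
from typing import Protocol


class LinearProj(Protocol):
    """Illustrative stand-in for the interface a built linear_proj must expose."""

    def forward(self, x: list[float]) -> list[float]: ...


class LinearProjBuilder(Protocol):
    """Anything callable that constructs a LinearProj (a class, partial, or factory)."""

    def __call__(self, input_size: int, output_size: int) -> LinearProj: ...


@dataclass
class SelfAttentionSubmodules:
    # Annotating the field with the Protocol (instead of an untyped ModuleSpec)
    # lets mypy/pyright verify whatever is passed for linear_proj at the call site.
    linear_proj: LinearProjBuilder


class RowParallelLinear:
    """Toy module whose constructor happens to match the builder signature."""

    def __init__(self, input_size: int, output_size: int) -> None:
        self.input_size = input_size
        self.output_size = output_size

    def forward(self, x: list[float]) -> list[float]:
        return x


# The class object itself satisfies LinearProjBuilder structurally, so this
# assignment type-checks; a class with a mismatched __init__ would be flagged.
submodules = SelfAttentionSubmodules(linear_proj=RowParallelLinear)
proj = submodules.linear_proj(4, 4)
```

With a plain ModuleSpec, the same mismatch would only surface at construction time, not under the type checker.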

Contribution process

flowchart LR
    A[Pre-checks] --> B[PR Tests]
    subgraph Code Review/Approval
        C1[Expert Review] --> C2[Final Review]
    end
    B --> C1
    C2 --> D[Merge]

Pre-checks

  • I want this PR in a versioned release and have added the appropriate Milestone (e.g., Core 0.8)
  • I have added relevant unit tests
  • I have added relevant functional tests
  • I have added proper typing to my code (see Typing guidelines)
  • I have added relevant documentation
  • I have run the autoformatter.sh on my PR

Code review

The following process is enforced via the CODEOWNERS file for changes into megatron/core. For changes outside of megatron/core, it is up to the PR author whether or not to tag the Final Reviewer team.

For MRs into `main` branch

Feel free to message or comment the @mcore-oncall to help accelerate your merge into main. The less complex your PR is, the faster it will be approved and merged!

(Step 1): Add PR label Expert Review

(Step 2): Collect the expert reviewers reviews

  1. Attach the Expert Review label when your PR is ready for review.
  2. GitHub auto-assigns expert reviewers based on your changes. They will get notified and pick up your PR soon.

⚠️ Only proceed to the next step once all reviewers have approved, merge conflicts are resolved, and the CI is passing.
Final Review might get declined if these requirements are not fulfilled.

(Step 3): Final Review

  1. Add Final Review label
  2. GitHub auto-assigns final reviewers based on your changes. They will get notified and pick up your PR soon.

(Optional Step 4): Cherry-pick into release branch

If this PR also needs to be merged into core_r* release branches, after this PR has been merged, select Cherry-pick to open a new PR into the release branch.

For MRs into `dev` branch

The proposed review process for the `dev` branch is under active discussion.

MRs are mergeable after one approval by either eharper@nvidia.com or zijiey@nvidia.com.

Merging your PR

Any member of core-adlr and core-nemo will be able to merge your PR.

@nschank nschank requested review from a team as code owners February 15, 2026 16:42
@copy-pr-bot

copy-pr-bot Bot commented Feb 15, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@ko3n1g ko3n1g requested a review from a team February 15, 2026 16:42
@Phlip79 Phlip79 added Expert Review [deprecated] Apply this label to indicate that your PR is ready for expert review. complexity: medium labels Feb 17, 2026
@Phlip79
Member

Phlip79 commented Feb 17, 2026

/ok to test 9db13d6

Comment thread megatron/core/models/gpt/gpt_layer_specs.py Outdated
@chtruong814 chtruong814 added the needs-follow-up Issue needs follow-up label Mar 2, 2026
@nschank
Contributor Author

nschank commented Mar 7, 2026

Resynced after coming back from travel, sorry for delay!

@chtruong814 chtruong814 added needs-follow-up Issue needs follow-up and removed needs-follow-up Issue needs follow-up labels Mar 7, 2026
Contributor

@yashaswikarnati yashaswikarnati left a comment


Synced offline, just had a minor comment, overall LGTM!

@chtruong814 chtruong814 added the needs-follow-up Issue needs follow-up label Mar 14, 2026
@jaredcasper
Contributor

I also updated Backend to return linear_proj specifically, allowing type-checking of RowParallelLinear types as instances of linear_proj directly (otherwise Backend "hides" the type and makes no type-checking occur).

Can you expand on this a bit? I'm guessing this is adding the row_parallel_linear_proj() function in addition to the "row_parallel_linear()" function? Don't those have the same inputs/outputs so same types? Why the need for a special one for "_proj"?

Comment thread megatron/core/transformer/attention.py Outdated
@nschank
Contributor Author

nschank commented Mar 19, 2026

@jaredcasper Sure! Fair criticism: this is sort of in a partial state, so maybe I should add a TODO for clarity. I'm trying to solve the following problem:

backend: BackendSpecProvider = ...
submodules = SelfAttentionSubmodules(..., linear_proj=backend.get_type(), ...)

SelfAttentionSubmodules.linear_proj has a specific interface it wants to require - it knows the exact signature that a LinearProjBuilder is supposed to satisfy, and same for the LinearProjInterface it must return. So whenever you provide something via linear_proj=, the type checker is given the opportunity to check that the interface actually matches.

It can only do so if the thing being passed to linear_proj= actually has a type which can be tested against that interface. This is true of specific classes (so if I pass something of type type[RowParallelLinear]), unions of classes, Callables, functools.partial, etc.

But the return type of BackendSpecProvider.row_parallel_linear() is just type. type is basically equivalent to 🤷 as far as the type-checker is concerned, so doing linear_proj=backend.row_parallel_linear() will not catch a type error. Individual subclasses of BackendSpecProvider can provide a narrower return type for row_parallel_linear, which helps somewhat (if callers are using a subclass directly), but any time a caller is using something which the type-checker only knows is a BackendSpecProvider (but not which kind) then it will not type-check row_parallel_linear.
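The erasure can be sketched as follows; the Protocols are simplified stand-ins, `LocalBackend` is a hypothetical concrete provider, and the toy `RowParallelLinear` is not the real Megatron class:

```python
from typing import Protocol


class LinearProjBuilder(Protocol):
    """Simplified builder contract for linear_proj submodules."""

    def __call__(self, input_size: int, output_size: int) -> object: ...


class BackendSpecProvider(Protocol):
    # Returning bare ``type`` erases the contract: the checker accepts
    # linear_proj=backend.row_parallel_linear() no matter what the
    # concrete class's constructor actually looks like.
    def row_parallel_linear(self) -> type: ...

    # Returning the Protocol keeps the contract visible, so the result
    # can be passed straight into a LinearProjBuilder-typed field.
    def row_parallel_linear_proj(self) -> LinearProjBuilder: ...


class RowParallelLinear:
    def __init__(self, input_size: int, output_size: int) -> None:
        self.shape = (input_size, output_size)


class LocalBackend:
    def row_parallel_linear(self) -> type:
        return RowParallelLinear  # nothing is verified here

    def row_parallel_linear_proj(self) -> LinearProjBuilder:
        # mypy/pyright verify here that RowParallelLinear really
        # satisfies LinearProjBuilder.
        return RowParallelLinear


builder = LocalBackend().row_parallel_linear_proj()
proj = builder(2, 3)
```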

I don't have a great Protocol to use here for what generically a method named row_parallel_linear() should actually return - there are at least two distinct Protocols that row_parallel_linear() needs to satisfy (LinearProjBuilder and LinearFc2Builder), and it's not entirely obvious those two things are required to have identical interfaces. The ideal world would be if I could just say the return type is LinearProjBuilder & LinearFc2Builder (i.e. it must satisfy both at once) but Python doesn't support that.

Thus, my proposed solution here is effectively to have BackendSpecProvider offer individual methods for each particular Builder protocol that we end up introducing. If we later merge LinearProjBuilder and LinearFc2Builder into a single LinearLayerBuilder then both column_parallel_linear and row_parallel_linear could use it; but in the meantime I think we should have row_parallel_linear_proj (returning LinearProjBuilder) and I will rename row_parallel_linear() to row_parallel_linear_fc2() -> LinearFc2Builder. This basically means BackendSpecProvider might return the same class from multiple separate methods, but each one is enforcing that that class satisfies a different interface.
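The per-protocol-method approach can be sketched like this (names follow the comment above, but the Protocol bodies and the toy classes are illustrative assumptions):

```python
from typing import Protocol


class LinearProjBuilder(Protocol):
    def __call__(self, input_size: int, output_size: int) -> object: ...


class LinearFc2Builder(Protocol):
    def __call__(self, input_size: int, output_size: int) -> object: ...


class RowParallelLinear:
    def __init__(self, input_size: int, output_size: int) -> None:
        self.shape = (input_size, output_size)


class LocalBackend:
    # The same concrete class sits behind both methods, but each return
    # annotation enforces a different contract -- the closest Python gets
    # to the unsupported intersection LinearProjBuilder & LinearFc2Builder.
    def row_parallel_linear_proj(self) -> LinearProjBuilder:
        return RowParallelLinear

    def row_parallel_linear_fc2(self) -> LinearFc2Builder:
        return RowParallelLinear


backend = LocalBackend()
proj = backend.row_parallel_linear_proj()(4, 8)
fc2 = backend.row_parallel_linear_fc2()(8, 4)
```

If the two Builder protocols are later merged into one, both methods can simply share the merged return type.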

@svcnvidia-nemo-ci svcnvidia-nemo-ci added the Final Review PR is in the "final review" stage label Mar 19, 2026
@nschank
Contributor Author

nschank commented Apr 1, 2026

@jaredcasper I realized the relevant work is actually somewhat independent so am opening a separate PR for it here: #4087 - I simply reverted the Backend changes here so now we're just focusing on linear_proj, and the issue I noted with type-checking will be fixed by that PR.

@Phlip79 Phlip79 removed the Expert Review [deprecated] Apply this label to indicate that your PR is ready for expert review. label Apr 3, 2026
@Phlip79
Member

Phlip79 commented Apr 3, 2026

/ok to test 6585350

@gautham-kollu
Contributor

Appears to be cluster-related. Rerunning failed tests:
2026-04-03T23:49:21.6764216Z 8-task-1-0/0 [default0]: handles = [pynvml.nvmlDeviceGetHandleByIndex(i) for i in physical_device_indices]
2026-04-03T23:49:21.6764692Z 8-task-1-0/0 [default0]: for i in range(len(handles)):
2026-04-03T23:49:21.6765036Z 8-task-1-0/0 [default0]: for j in range(i + 1, len(handles)):
2026-04-03T23:49:21.6765587Z 8-task-1-0/0 [default0]: status = pynvml.nvmlDeviceGetP2PStatus(handles[i], handles[j], pynvml.NVML_P2P_CAPS_INDEX_NVLINK)
2026-04-03T23:49:21.6766172Z 8-task-1-0/0 [default0]:> assert status == pynvml.NVML_P2P_STATUS_OK,
2026-04-03T23:49:21.6766766Z 8-task-1-0/0 [default0]: f'No NVLink connection between GPU {physical_device_indices[i]} and GPU {physical_device_indices[j]}, '
2026-04-03T23:49:21.6767468Z 8-task-1-0/0 [default0]: f'but allow_nvlink_for_{"low_latency" if low_latency_mode else "normal"}_mode=True'
2026-04-03T23:49:21.6768294Z 8-task-1-0/0 [default0]:E AssertionError: No NVLink connection between GPU 0 and GPU 4, but allow_nvlink_for_normal_mode=True

@gautham-kollu
Contributor

/ok to test 221f80c

@yashaswikarnati yashaswikarnati requested a review from a team as a code owner April 17, 2026 18:03
@svcnvidia-nemo-ci svcnvidia-nemo-ci added Approved All necessary approvals have been made and removed Final Review PR is in the "final review" stage labels Apr 17, 2026
@chtruong814 chtruong814 added waiting-on-customer Waiting on the original author to respond and removed needs-follow-up Issue needs follow-up labels Apr 17, 2026
@yaox12 yaox12 added Approved All necessary approvals have been made and removed Approved All necessary approvals have been made labels Apr 20, 2026
@chtruong814 chtruong814 added needs-follow-up Issue needs follow-up and removed waiting-on-customer Waiting on the original author to respond labels Apr 20, 2026
@svcnvidia-nemo-ci svcnvidia-nemo-ci added waiting-on-maintainers Waiting on maintainers to respond and removed needs-follow-up Issue needs follow-up labels Apr 21, 2026
@Phlip79
Member

Phlip79 commented Apr 30, 2026

/ok to test 6776b0d

@Phlip79 Phlip79 enabled auto-merge April 30, 2026 04:07

Labels

Approved All necessary approvals have been made community-request complexity: medium waiting-on-maintainers Waiting on maintainers to respond
