Use Protocols to type-check linear_proj submodules of Attention #3434

nschank wants to merge 5 commits into NVIDIA:main
Conversation
/ok to test 9db13d6

Resynced after coming back from travel, sorry for the delay!
yashaswikarnati left a comment

Synced offline, just had a minor comment. Overall LGTM!
Can you expand on this a bit? I'm guessing this is adding the `row_parallel_linear_proj()` function in addition to the `row_parallel_linear()` function? Don't those have the same inputs/outputs, and so the same types? Why the need for a special one for `_proj`?
@jaredcasper Sure! Fair criticism; this is sort of in a partial state, so maybe I should update it with a TODO for clarity. I'm trying to solve the following problem:

```python
backend: BackendSpecProvider = ...
submodules = SelfAttentionSubmodules(..., linear_proj=backend.get_type(), ...)
```

It can only do so if the thing being passed to […]. But the return type of […]. I don't have a great Protocol to use here for what, generically, a method named […]. Thus, my proposed solution here is effectively to have […].
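The type-checking problem described above, and the proposed fix, can be sketched with plain Python Protocols. This is a hypothetical, simplified stand-in: `LinearProj`, `RowParallelLinear`, `Backend`, and `row_parallel_linear_proj` are illustrative names here, not the PR's actual definitions.

```python
from dataclasses import dataclass
from typing import Protocol


class LinearProj(Protocol):
    """Hypothetical structural interface a linear_proj submodule must satisfy."""

    def forward(self, x: list[float]) -> list[float]: ...


@dataclass
class SelfAttentionSubmodules:
    # The field accepts any *class* whose instances satisfy LinearProj,
    # which a static checker can verify structurally.
    linear_proj: type[LinearProj]


class RowParallelLinear:
    """Simplified stand-in for a real row-parallel linear layer."""

    def forward(self, x: list[float]) -> list[float]:
        return x


class Backend:
    # Too-broad return type: the checker only sees `type`, so assigning
    # the result to SelfAttentionSubmodules.linear_proj cannot be verified.
    def get_type(self) -> type:
        return RowParallelLinear

    # Narrow, protocol-typed accessor: the assignment below now type-checks,
    # because the return type names the protocol directly.
    def row_parallel_linear_proj(self) -> type[LinearProj]:
        return RowParallelLinear


backend = Backend()
submodules = SelfAttentionSubmodules(linear_proj=backend.row_parallel_linear_proj())
```

This is why a `_proj`-specific accessor can be useful even when the runtime inputs/outputs are identical: the narrower declared return type is what lets the checker see the structural match.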
@jaredcasper I realized the relevant work is actually somewhat independent, so I am opening a separate PR for it here: #4087. I simply reverted the Backend changes here, so now we're just focusing on linear_proj; the issue I noted with type-checking will be fixed by that PR.
/ok to test 6585350

Appears to be cluster-related. Rerunning failed tests.

/ok to test 221f80c

/ok to test 6776b0d
What does this PR do?
Defines Protocols representing `linear_proj` submodules, and uses them instead of `ModuleSpec` to enable type-checking of its construction in SelfAttention, CrossAttention, and MLA.

I also updated `Backend` to return `linear_proj` specifically, allowing type-checking of `RowParallelLinear` types as instances of `linear_proj` directly (otherwise `Backend` "hides" the type and no type-checking occurs).

While I was in `attention`, I also updated the naming conventions of the existing interfaces to match what we've finalized on.

Associated design doc: Typed ModuleSpec.pdf
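As a rough illustration of the description above, a `runtime_checkable` Protocol lets a submodule class be validated directly, where an opaque `ModuleSpec`-style wrapper hides the type. The names below are simplified stand-ins, not Megatron-core's real classes:

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class LinearProj(Protocol):
    """Hypothetical structural type for a linear_proj submodule."""

    def forward(self, x): ...


class RowParallelLinear:
    """Simplified stand-in for the real RowParallelLinear."""

    def forward(self, x):
        return x


class ModuleSpec:
    """Stand-in for an untyped spec: it wraps any class, so neither a static
    checker nor issubclass() can see whether the wrapped class fits the slot."""

    def __init__(self, module: type) -> None:
        self.module = module


# With the Protocol, the candidate class is checked directly:
assert issubclass(RowParallelLinear, LinearProj)


# A class missing forward() is rejected, instead of slipping through a spec:
class NotAProjection:
    pass


assert not issubclass(NotAProjection, LinearProj)
```

Static checkers perform the same structural check at the `linear_proj=...` call site, which is what the `ModuleSpec` wrapper previously prevented.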
Contribution process
```mermaid
flowchart LR
  A[Pre-checks] --> B[PR Tests]
  subgraph Code Review/Approval
    C1[Expert Review] --> C2[Final Review]
  end
  B --> C1
  C2 --> D[Merge]
```

Pre-checks
Code review
The following process is enforced via the CODEOWNERS file for changes into `megatron/core`. For changes outside of `megatron/core`, it is up to the PR author whether or not to tag the Final Reviewer team.

For MRs into `main` branch
Feel free to message or tag @mcore-oncall to help accelerate your merge into main. The less complex your PR is, the faster it will be approved and merged!
(Step 1): Add PR label

Add the `Expert Review` label when your PR is ready for review.

(Step 2): Collect the expert reviewers' reviews

Final Review might get declined if these requirements are not fulfilled.

(Step 3): Final Review

Add the `Final Review` label.

(Optional Step 4): Cherry-pick into release branch
If this PR also needs to be merged into `core_r*` release branches, after this PR has been merged, select `Cherry-pick` to open a new PR into the release branch.

For MRs into `dev` branch
The proposed review process for the `dev` branch is under active discussion. MRs are mergeable after one approval by either eharper@nvidia.com or zijiey@nvidia.com.

Merging your PR
Any member of core-adlr and core-nemo will be able to merge your PR.