Add opt-in nonuniform tensor parallelism #4585

Open
daiyaanarfeen wants to merge 1 commit into NVIDIA:dev from daiyaanarfeen:ntp-implementation-dev-pr

daiyaanarfeen commented May 1, 2026

What does this PR do?

Adds opt-in nonuniform tensor parallelism (NTP) support through isolated helper modules. This includes NTP-aware DDP/optimizer gradient resharding and a Transformer Engine userbuffer adapter that sets up communication overlap across mixed TP domains.
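
For reviewers unfamiliar with the idea, the gradient resharding mentioned above can be illustrated with a toy, framework-free sketch. This is not the code in this PR: `shard_bounds`, `reshard`, and `sum_mixed_tp_grads` are hypothetical names, plain Python lists stand in for tensors, and an even split along a single dimension is assumed. It only shows why a replica running at TP2 would need its gradient shards re-cut to the TP4 layout before they can be summed with the shards of a TP4 replica.

```python
from typing import List, Tuple


def shard_bounds(dim: int, tp_size: int) -> List[Tuple[int, int]]:
    """Return [start, end) ranges for an even split of `dim` over `tp_size` ranks."""
    if dim % tp_size != 0:
        raise ValueError(f"dim={dim} is not divisible by tp_size={tp_size}")
    width = dim // tp_size
    return [(rank * width, (rank + 1) * width) for rank in range(tp_size)]


def reshard(shards: List[List[float]], dst_tp: int) -> List[List[float]]:
    """Re-cut the shards held by one TP domain into `dst_tp` equal shards."""
    # Concatenate the source shards back into the full (unsharded) gradient.
    full = [value for shard in shards for value in shard]
    return [full[start:end] for start, end in shard_bounds(len(full), dst_tp)]


def sum_mixed_tp_grads(
    grads_a: List[List[float]], grads_b: List[List[float]]
) -> List[List[float]]:
    """Sum gradients from two replicas, using the shard layout of `grads_a`."""
    # Align replica B to replica A's TP degree, then add shard by shard.
    aligned_b = reshard(grads_b, dst_tp=len(grads_a))
    return [
        [a + b for a, b in zip(shard_a, shard_b)]
        for shard_a, shard_b in zip(grads_a, aligned_b)
    ]


# Toy data: an 8-element gradient held as 4 shards (TP4) and as 2 shards (TP2).
grads_tp4 = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
grads_tp2 = [[10.0, 10.0, 10.0, 10.0], [20.0, 20.0, 20.0, 20.0]]
summed = sum_mixed_tp_grads(grads_tp4, grads_tp2)
```

In a real distributed setting the re-cut would be done with collective communication rather than by materializing the full gradient on one rank; the sketch gathers everything locally only to keep the layout arithmetic visible.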

Issue tracking

Linked issue: N/A. This is an NVIDIA-authored feature branch for internal NTP validation; no public GitHub issue is currently linked.

Contribution process

Pre-checks

  • I have added relevant unit tests
  • I have added relevant functional tests via distributed end-to-end unit coverage for packed TP2/TP4 + TE overlap
  • I have added proper typing to my code (see the Typing guidelines)
  • I have added relevant documentation in module and API docstrings
  • I have run the autoformatter.sh on my PR

Testing

  • SKIP_DOCS=true BASE_REF=dev /Users/darfeen/.cache/uv/bin/uv run bash tools/autoformat.sh
  • CHECK_ONLY=true SKIP_DOCS=true BASE_REF=dev /Users/darfeen/.cache/uv/bin/uv run bash tools/autoformat.sh
  • /tmp/megatron-ntp-pr/.venv/bin/python -m py_compile megatron/core/distributed/nonuniform_tp.py megatron/core/extensions/nonuniform_tp_transformer_engine.py tests/unit_tests/distributed/test_nonuniform_tp.py tests/unit_tests/extension/test_nonuniform_tp_transformer_engine.py
  • /tmp/megatron-ntp-pr/.venv/bin/python -c '<direct nonuniform_tp_transformer_engine helper smoke test>'

Note: local pytest collection is blocked in this macOS uv environment because the repo excludes torch/triton from the default env; a one-off torch install then failed on missing triton during Megatron conftest import.

copy-pr-bot Bot commented May 1, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

daiyaanarfeen force-pushed the ntp-implementation-dev-pr branch from 0c04f90 to e756ddf on May 1, 2026 22:40
daiyaanarfeen marked this pull request as ready for review on May 1, 2026 22:42
daiyaanarfeen requested review from a team as code owners on May 1, 2026 22:42