Enable active-param and memory based Minitron pruning constraint by kevalmorabia97 · Pull Request #1377 · NVIDIA/Model-Optimizer

kevalmorabia97 · 2026-04-30T11:03:58Z

What does this PR do?

Type of change: New feature, new tests, documentation.

Extends the Minitron NAS pruner to support pruning by active parameter count (active_params) and memory footprint (memory_mb) in addition to the existing total parameter count (params) constraint. Also adds standalone utilities for analytical model stats.

Changes

New pruning constraint keys

active_params: prune to a target number of active (routed) params — useful for MoE models where total ≫ active; when present, active_params is the primary sort/display metric for candidates (priority: active_params > params > memory_mb)
memory_mb: prune to fit a memory budget (BF16 weights + KV-cache + Mamba state at a given sequence length and batch size)
Constraints can be combined (AND logic): e.g. {"params": 6e9, "memory_mb": 12288}

New standalone utilities (modelopt.torch.nas.plugins.megatron_model_stats)

mcore_param_count: analytically computes total and active parameter counts for GPT and Mamba/hybrid MCore models
mcore_memory_footprint_mb: estimates memory in MB (weights + KV-cache + Mamba state)
print_mcore_model_stats: rich-formatted model stats panel

Rich-formatted pruning logs — search space, top-k candidate tables, and best subnet panel printed on rank 0

prune_score_func format update — now mmlu_<N>pct_bs<bs> (e.g. mmlu_10pct_bs32) to explicitly control batch size for MMLU evaluation; old mmlu_<N>pct format removed

Infrastructure

NeMo container bumped to nvcr.io/nvidia/nemo:26.04 in CI and docs
Added examples/megatron_bridge/requirements.txt with transformers<5.0 (required for saving some Nemotron-3-Nano models)

Usage

# Prune to 3B active params (MoE-aware) — active_params is the primary sort metric
mtp.prune(model, mode=[("mcore_minitron", ss_config)], constraints={"active_params": 3e9}, config=pruning_config)

# Prune to fit a 12 GB memory budget
mtp.prune(model, mode=[("mcore_minitron", ss_config)], constraints={"memory_mb": 12288}, config=pruning_config)

Testing

Pruned Nemotron-3-Nano-30B-A3B (31.6B, A3.6B) --> A3.0B. Takes <1hr on 8x H100 (more details in #1376)

torchrun --nproc_per_node 8 examples/megatron_bridge/prune_minitron.py \
    --pp_size 8 \
    --hf_model_name_or_path nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 \
    --trust_remote_code \
    --prune_target_params 28e9 \
    --prune_target_active_params 3e9 \
    --hparams_to_skip num_attention_heads \
    --seq_length 8192 \
    --output_hf_path pruned/Nemotron-3-Nano-30B-A3B-Pruned-28B-A3B-top20-max15depth-max30width-mmlu_10pct_bs32 \
    --top_k 20 \
    --max_depth_pruning 0.15 \
    --max_width_pruning 0.30 \
    --prune_score_func mmlu_10pct_bs32 \
    --num_layers_in_first_pipeline_stage 5 \
    --num_layers_in_last_pipeline_stage 5

╭──────────────────────────────────────────────────── Original Model Stats ─────────────────────────────────────────────────────╮
│ Total Parameters                              31.58B                                                                          │
│ Active Parameters                             3.58B                                                                           │
│ Memory (BF16, seq_length=8192, batch_size=1)  weights: 60230.1 MB, kv_cache: 48.0 MB, mamba_state: 23.8 MB, Total: 60301.9 MB │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

                                                                 Top 20 Candidates with Scores
┏━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┓
┃  # ┃ export_config                                                                                                         ┃ active_params ┃ params ┃  score ┃
┡━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━┩
│  1 │ {'num_layers': 46, 'hidden_size': 2560, 'mamba_num_heads': 56, 'mamba_head_dim': 64, 'num_moe_experts': 120,          │         3.00B │ 27.06B │ 0.3399 │
│    │ 'moe_ffn_hidden_size': 1792, 'moe_shared_expert_intermediate_size': 3072}                                             │               │        │        │
│  2 │ {'num_layers': 48, 'hidden_size': 2560, 'mamba_num_heads': 56, 'mamba_head_dim': 56, 'num_moe_experts': 112,          │         3.00B │ 25.37B │ 0.4650 │
│    │ 'moe_ffn_hidden_size': 1792, 'moe_shared_expert_intermediate_size': 3072}                                             │               │        │        │
│  3 │ {'num_layers': 46, 'hidden_size': 2560, 'mamba_num_heads': 64, 'mamba_head_dim': 56, 'num_moe_experts': 112,          │         3.00B │ 25.37B │ 0.2343 │
│    │ 'moe_ffn_hidden_size': 1792, 'moe_shared_expert_intermediate_size': 3072}                                             │               │        │        │
│  4 │ {'num_layers': 52, 'hidden_size': 2688, 'mamba_num_heads': 56, 'mamba_head_dim': 48, 'num_moe_experts': 96,           │         3.00B │ 20.09B │ 0.2552 │
│    │ 'moe_ffn_hidden_size': 1536, 'moe_shared_expert_intermediate_size': 3072}                                             │               │        │        │
│  5 │ {'num_layers': 52, 'hidden_size': 2688, 'mamba_num_heads': 48, 'mamba_head_dim': 56, 'num_moe_experts': 104,          │         3.00B │ 21.61B │ 0.2601 │
│    │ 'moe_ffn_hidden_size': 1536, 'moe_shared_expert_intermediate_size': 3072}                                             │               │        │        │
│  6 │ {'num_layers': 52, 'hidden_size': 2560, 'mamba_num_heads': 48, 'mamba_head_dim': 64, 'num_moe_experts': 96,           │         3.00B │ 19.28B │ 0.3762 │
│    │ 'moe_ffn_hidden_size': 1536, 'moe_shared_expert_intermediate_size': 3712}                                             │               │        │        │
│  7 │ {'num_layers': 52, 'hidden_size': 2304, 'mamba_num_heads': 64, 'mamba_head_dim': 64, 'num_moe_experts': 104,          │         3.00B │ 22.28B │ 0.4783 │
│    │ 'moe_ffn_hidden_size': 1856, 'moe_shared_expert_intermediate_size': 3072}                                             │               │        │        │
│  8 │ {'num_layers': 52, 'hidden_size': 2560, 'mamba_num_heads': 48, 'mamba_head_dim': 48, 'num_moe_experts': 96,           │         3.00B │ 21.99B │ 0.2420 │
│    │ 'moe_ffn_hidden_size': 1792, 'moe_shared_expert_intermediate_size': 3328}                                             │               │        │        │
│  9 │ {'num_layers': 50, 'hidden_size': 2560, 'mamba_num_heads': 48, 'mamba_head_dim': 48, 'num_moe_experts': 112,          │         3.00B │ 25.37B │ 0.2399 │
│    │ 'moe_ffn_hidden_size': 1792, 'moe_shared_expert_intermediate_size': 3712}                                             │               │        │        │
│ 10 │ {'num_layers': 50, 'hidden_size': 2560, 'mamba_num_heads': 48, 'mamba_head_dim': 48, 'num_moe_experts': 112,          │         3.00B │ 26.17B │ 0.2601 │
│    │ 'moe_ffn_hidden_size': 1856, 'moe_shared_expert_intermediate_size': 3328}                                             │               │        │        │
│ 11 │ {'num_layers': 46, 'hidden_size': 2560, 'mamba_num_heads': 56, 'mamba_head_dim': 64, 'num_moe_experts': 112,          │         3.00B │ 25.37B │ 0.2503 │
│    │ 'moe_ffn_hidden_size': 1792, 'moe_shared_expert_intermediate_size': 3072}                                             │               │        │        │
│ 12 │ {'num_layers': 48, 'hidden_size': 2560, 'mamba_num_heads': 56, 'mamba_head_dim': 56, 'num_moe_experts': 104,          │         3.00B │ 23.68B │ 0.4329 │
│    │ 'moe_ffn_hidden_size': 1792, 'moe_shared_expert_intermediate_size': 3072}                                             │               │        │        │
│ 13 │ {'num_layers': 46, 'hidden_size': 2688, 'mamba_num_heads': 64, 'mamba_head_dim': 64, 'num_moe_experts': 128,          │         3.00B │ 26.17B │ 0.2587 │
│    │ 'moe_ffn_hidden_size': 1536, 'moe_shared_expert_intermediate_size': 2816}                                             │               │        │        │
│ 14 │ {'num_layers': 46, 'hidden_size': 2560, 'mamba_num_heads': 64, 'mamba_head_dim': 56, 'num_moe_experts': 104,          │         3.00B │ 23.68B │ 0.2336 │
│    │ 'moe_ffn_hidden_size': 1792, 'moe_shared_expert_intermediate_size': 3072}                                             │               │        │        │
│ 15 │ {'num_layers': 52, 'hidden_size': 2688, 'mamba_num_heads': 48, 'mamba_head_dim': 56, 'num_moe_experts': 96,           │         3.00B │ 20.09B │ 0.2559 │
│    │ 'moe_ffn_hidden_size': 1536, 'moe_shared_expert_intermediate_size': 3072}                                             │               │        │        │
│ 16 │ {'num_layers': 52, 'hidden_size': 2304, 'mamba_num_heads': 64, 'mamba_head_dim': 64, 'num_moe_experts': 96,           │         3.00B │ 20.70B │ 0.4608 │
│    │ 'moe_ffn_hidden_size': 1856, 'moe_shared_expert_intermediate_size': 3072}                                             │               │        │        │
│ 17 │ {'num_layers': 50, 'hidden_size': 2560, 'mamba_num_heads': 48, 'mamba_head_dim': 48, 'num_moe_experts': 104,          │         3.00B │ 23.68B │ 0.2455 │
│    │ 'moe_ffn_hidden_size': 1792, 'moe_shared_expert_intermediate_size': 3712}                                             │               │        │        │
│ 18 │ {'num_layers': 50, 'hidden_size': 2560, 'mamba_num_heads': 48, 'mamba_head_dim': 48, 'num_moe_experts': 104,          │         3.00B │ 24.42B │ 0.2503 │
│    │ 'moe_ffn_hidden_size': 1856, 'moe_shared_expert_intermediate_size': 3328}                                             │               │        │        │
│ 19 │ {'num_layers': 48, 'hidden_size': 2560, 'mamba_num_heads': 48, 'mamba_head_dim': 48, 'num_moe_experts': 120,          │         3.00B │ 27.92B │ 0.2587 │
│    │ 'moe_ffn_hidden_size': 1856, 'moe_shared_expert_intermediate_size': 3712}                                             │               │        │        │
│ 20 │ {'num_layers': 46, 'hidden_size': 2560, 'mamba_num_heads': 56, 'mamba_head_dim': 64, 'num_moe_experts': 104,          │         3.00B │ 23.68B │ 0.2469 │
│    │ 'moe_ffn_hidden_size': 1792, 'moe_shared_expert_intermediate_size': 3072}                                             │               │        │        │
└────┴───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴───────────────┴────────┴────────┘

╭──────────────────────────────────────────────────────────────────────── Best Subnet ─────────────────────────────────────────────────────────────────────────╮
│ export_config  {'num_layers': 52, 'hidden_size': 2304, 'mamba_num_heads': 64, 'mamba_head_dim': 64, 'num_moe_experts': 104, 'moe_ffn_hidden_size': 1856,     │
│                'moe_shared_expert_intermediate_size': 3072}                                                                                                  │
│ active_params  3.00B                                                                                                                                         │
│ params         22.28B                                                                                                                                        │
│ score          0.4783                                                                                                                                        │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

╭───────────────────────────────────────────────────── Pruned Model Stats ──────────────────────────────────────────────────────╮
│ Total Parameters                              22.28B                                                                          │
│ Active Parameters                             3.00B                                                                           │
│ Memory (BF16, seq_length=8192, batch_size=1)  weights: 42489.7 MB, kv_cache: 48.0 MB, mamba_state: 23.8 MB, Total: 42561.6 MB │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

Before your PR is "Ready for review"

Is this change backward compatible?: ✅
Did you write any new necessary tests?: ✅
Did you update Changelog?: ✅

coderabbitai · 2026-04-30T11:04:10Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Adds combinable NAS pruning targets (params, active_params, memory_mb) plus a manual export mode; introduces analytical Megatron/Mamba model stats (params and memory) consumed by the pruning searcher; refactors MCore pruning to compute candidate metrics analytically; updates CLI/docs/examples; and expands unit and distributed tests to cover new flows.

Changes

Cohort / File(s)	Summary
Documentation `examples/megatron_bridge/README.md`, `examples/pruning/README.md`, `CHANGELOG.rst`	Document combinable pruning targets (`params`, `active_params`, `memory_mb`), manual `--prune_export_config` mode, example usages, and add changelog entry.
Pruning CLI / Example `examples/megatron_bridge/prune_minitron.py`	Replace mutually-exclusive prune flags with independent/combinable NAS targets; validate argument combinations; include `batch_size`/`seq_length` in pruning config; remove num2hrb-based model param logging.
Pruner Search & Param Counting `modelopt/torch/prune/plugins/mcore_minitron.py`	Refactor searcher to use per-candidate `metrics` dict (params/active_params/memory_mb), cache keyed by metric tuples, compute metrics analytically, filter by all active ceilings, sort by primary metric, validate top-k via scoring; fix EP-aware parameter-counting and adjust exported signatures.
Analytical Model Stats `modelopt/torch/nas/plugins/megatron_model_stats.py`, `modelopt/torch/nas/plugins/__init__.py`	Add analytical utilities to parse hybrid patterns and compute total/active parameter counts and memory footprint (weights, KV-cache, Mamba state); re-export `megatron_model_stats` under megatron plugin guard.
Tests — NAS / Pruning / Stats `tests/gpu_megatron/torch/nas/plugins/test_megatron_mamba_dynamic_modules.py`, `tests/gpu_megatron/torch/prune/plugins/test_mcore_mamba_minitron_pruning.py`, `tests/examples/megatron_bridge/test_prune_minitron.py`, `tests/gpu_megatron/torch/nas/plugins/test_megatron_model_stats.py`	Add/refactor tests: distributed check that analytical `_param_num_dynamic` matches formula; NAS pruning tests for `params`, `active_params`, and `memory_mb`; deterministic CPU tests for analytical stats and memory decomposition; update example integration test to assert pruned size.
Test Utilities / Helpers `tests/_test_utils/torch/transformers_models.py`	Widen return type of `create_tiny_qwen3_moe_dir` to allow returning either `Path` or `(Path, PreTrainedModel)`.

Sequence Diagram(s)

sequenceDiagram
  participant User
  participant CLI as prune_minitron.py
  participant Stats as megatron_model_stats
  participant Pruner as mcore_minitron.Searcher
  participant Model as Teacher / Candidate Builder
  participant Output as Pruned Model / Report

  User->>CLI: run prune command (targets or --prune_export_config)
  CLI->>Stats: compute teacher metrics (params/active/memory) as needed
  CLI->>Pruner: pass metric ceilings + batch_size/seq_length (if NAS)
  Pruner->>Stats: compute candidate metrics analytically for grid
  Pruner->>Pruner: filter candidates by all ceilings, rank by primary metric
  Pruner->>Pruner: validate top-k via score_func
  Pruner->>Model: build/export chosen pruned model (or use export_config)
  Model->>Output: emit pruned model artifact
  Pruner->>Output: render search results and stats

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Add Full TE Spec support for Megatron Pruning DynamicModules + MoE bug fixes #1024 — overlaps modifications to the mcore_minitron pruning/search pipeline and parameter-count logic.

Suggested reviewers

ChenhanYu
AAnoosheh

🚥 Pre-merge checks | ✅ 5 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 34.62% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The PR title accurately reflects the main objective: extending Minitron pruning with active-parameter and memory-based constraints. The title is concise, clear, and directly related to the changeset.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Security Anti-Patterns	✅ Passed	Comprehensive security review found no unsafe torch.load, numpy.load, eval/exec, trust_remote_code, or nosec patterns in modified Python files.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch kmorabia/minitron-memory-active-param-nas

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

copy-pr-bot · 2026-04-30T11:05:10Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

github-actions · 2026-04-30T11:07:50Z

PR Preview Action v1.8.1
🚀 View preview at https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1377/
Built to branch `gh-pages` at 2026-05-01 20:04 UTC. Preview will be ready when the GitHub Pages deployment is complete.

coderabbitai

Actionable comments posted: 4

🧹 Nitpick comments (1)

examples/megatron_bridge/prune_minitron.py (1)

151-158: ⚡ Quick win

Expose the memory-budget batch size explicitly.

memory_mb in the searcher depends on both seq_length and batch_size, but this CLI only forwards seq_length and leaves batch_size at the default of 1. That makes the budget optimistic for any real generation batch greater than 1.

Suggested fix

+    parser.add_argument(
+        "--memory_batch_size",
+        type=int,
+        default=1,
+        help="Batch size to use when evaluating --prune_target_memory_mb.",
+    )
...
         pruning_config["top_k"] = args.top_k
         pruning_config["seq_length"] = args.seq_length
+        pruning_config["batch_size"] = args.memory_batch_size

Also applies to: 368-368

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@examples/megatron_bridge/prune_minitron.py` around lines 151 - 158, The CLI's
prune memory budget uses --seq_length but assumes batch_size=1, making memory_mb
optimistic for real batches; add a new argument (e.g., --prune_memory_batch_size
or --prune_batch_size) to the parser near the existing
parser.add_argument("--prune_target_memory_mb", ...) with type=int and a default
of 1, update its help text to note it affects the memory budget calculation, and
then pass this value into the memory calculation used by the searcher (the code
that computes memory_mb or calls searcher) so memory_mb accounts for seq_length
* batch_size when invoking the NAS searcher. Ensure the new flag is also used in
the other occurrence referenced (around the other prune call) so both places
compute memory consistently.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/megatron_bridge/README.md`:
- Around line 81-88: The documentation example is inconsistent: the prose says
"3B active params" but the command uses --prune_target_active_params 8e9 and the
output path contains "Pruned-8B-Active"; update them to match. Edit README.md so
that either the prose reads "8B active params" to match the prune_minitron.py
invocation and output_hf_path, or change the command flag value to 3e9 and the
output_hf_path to "Pruned-3B-Active" so all three references (the prose,
--prune_target_active_params flag, and the output_hf_path) consistently reflect
the same active-params target.

In `@modelopt/torch/nas/megatron_model_stats.py`:
- Around line 201-207: parse_main_layer_chars currently only strips PP boundary
'|' but must also skip GDN marker 'G' so hybrid patterns containing 'G' don't
propagate into mcore_param_count and trigger the unsupported-char error; update
parse_main_layer_chars to ignore 'G' (in addition to '|') when producing the
per-layer char list (ensure the function returns one char per actual layer by
filtering out '|' and 'G').

In `@modelopt/torch/prune/plugins/mcore_minitron.py`:
- Around line 430-439: active_metric_keys is currently built from iterating the
unordered frozenset _METRIC_CONSTRAINTS which can select a different primary_key
across ranks; change construction of active_metric_keys to preserve the
constraints insertion order by iterating self.constraints and selecting keys
present in _METRIC_CONSTRAINTS (e.g., active_metric_keys = [k for k in
self.constraints if k in _METRIC_CONSTRAINTS]), ensure you handle the empty-case
before accessing primary_key, and keep primary_key deterministic. In
_compute_candidate_metrics(), replace direct access to
model.hybrid_layer_pattern with the same compatibility fallback used in _prune
(use getattr(model, "hybrid_layer_pattern", None) or
self.hybrid_override_pattern) to avoid AttributeError on older Megatron-Core
versions. Ensure both changes reference the existing symbols active_metric_keys,
primary_key, _METRIC_CONSTRAINTS, _compute_candidate_metrics,
model.hybrid_layer_pattern, hybrid_override_pattern and _prune so the fixes
align with the surrounding logic.
- Around line 650-653: The code directly accesses model.hybrid_layer_pattern for
MambaModel which can raise AttributeError on older MCore versions that only
expose hybrid_override_pattern; update the access to use the same fallback logic
as earlier (check for getattr(model, "hybrid_override_pattern", None) first,
then fallback to getattr(model, "hybrid_layer_pattern", None)) when computing
hybrid_layer_pattern for MambaModel so metric-based NAS candidate evaluation
won't fail on pre-0.17 MCore; modify the block handling MambaModel to use this
fallback lookup.

---

Nitpick comments:
In `@examples/megatron_bridge/prune_minitron.py`:
- Around line 151-158: The CLI's prune memory budget uses --seq_length but
assumes batch_size=1, making memory_mb optimistic for real batches; add a new
argument (e.g., --prune_memory_batch_size or --prune_batch_size) to the parser
near the existing parser.add_argument("--prune_target_memory_mb", ...) with
type=int and a default of 1, update its help text to note it affects the memory
budget calculation, and then pass this value into the memory calculation used by
the searcher (the code that computes memory_mb or calls searcher) so memory_mb
accounts for seq_length * batch_size when invoking the NAS searcher. Ensure the
new flag is also used in the other occurrence referenced (around the other prune
call) so both places compute memory consistently.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 8c94f5fa-6801-4848-8f93-59cedf8ae6b2

📥 Commits

Reviewing files that changed from the base of the PR and between bb08094 and caae65c.

📒 Files selected for processing (7)

examples/megatron_bridge/README.md
examples/megatron_bridge/prune_minitron.py
modelopt/torch/nas/megatron_model_stats.py
modelopt/torch/prune/plugins/mcore_minitron.py
tests/gpu_megatron/torch/nas/plugins/test_megatron_mamba_dynamic_modules.py
tests/gpu_megatron/torch/nas/test_megatron_model_stats.py
tests/gpu_megatron/torch/prune/plugins/test_mcore_mamba_minitron_pruning.py

codecov · 2026-04-30T11:17:05Z

Codecov Report

❌ Patch coverage is 93.88889% with 22 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.78%. Comparing base (50706d1) to head (b2b3346).

Files with missing lines	Patch %	Lines
modelopt/torch/nas/plugins/megatron_model_stats.py	93.27%	15 Missing ⚠️
modelopt/torch/prune/plugins/mcore_minitron.py	94.85%	7 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##             main    #1377       +/-   ##
===========================================
+ Coverage   66.36%   76.78%   +10.41%     
===========================================
  Files         471      472        +1     
  Lines       50510    50824      +314     
===========================================
+ Hits        33522    39025     +5503     
+ Misses      16988    11799     -5189

Flag	Coverage Δ
examples	`33.48% <0.55%> (-7.17%)`	⬇️
gpu	`59.93% <93.88%> (+32.99%)`	⬆️
regression	`14.81% <0.55%> (-0.02%)`	⬇️
unit	`52.52% <0.55%> (-0.33%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

coderabbitai

Actionable comments posted: 2

♻️ Duplicate comments (1)

modelopt/torch/prune/plugins/mcore_minitron.py (1)

691-694: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add fallback for hybrid_override_pattern to ensure compatibility with older MCore versions.

Line 694 directly accesses model.hybrid_layer_pattern without the compatibility fallback used elsewhere in this file (lines 373-377). On MCore versions that only expose hybrid_override_pattern (pre-0.17), this will raise an AttributeError when computing metrics for hybrid/Mamba models.

🔧 Suggested fix

         # Get hybrid layer pattern for MambaModel (None for pure GPT)
         hybrid_layer_pattern: str | None = None
         if isinstance(model, MambaModel):
-            hybrid_layer_pattern = model.hybrid_layer_pattern
+            hybrid_key = (
+                "hybrid_override_pattern"
+                if hasattr(model, "hybrid_override_pattern")
+                else "hybrid_layer_pattern"
+            )
+            hybrid_layer_pattern = getattr(model, hybrid_key)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/prune/plugins/mcore_minitron.py` around lines 691 - 694, The
code assumes model.hybrid_layer_pattern exists for MambaModel but older MCore
exposes hybrid_override_pattern; update the MambaModel branch in compute metrics
to use a compatibility fallback (e.g., set hybrid_layer_pattern = getattr(model,
"hybrid_layer_pattern", None) or fallback to getattr(model,
"hybrid_override_pattern", None)) so it mirrors the earlier compatibility logic
used elsewhere in this file; ensure you reference MambaModel and the model
variable when adding the fallback so AttributeError is avoided on pre-0.17
MCore.

🧹 Nitpick comments (1)

modelopt/torch/nas/plugins/megatron_model_stats.py (1)
41-44: Move optional dependency imports inside function scope using lazy loading.

Lines 41-44 import megatron and rich at module level. Per coding guidelines, optional dependencies must be loaded lazily via import_plugin() at the point of use, not hard-imported at module level. Even though the module itself is wrapped with import_plugin() in __init__.py, the imports within the file still execute eagerly, violating the lazy-loading convention.

Defer these imports to the functions that require them (e.g., print_mcore_model_stats, _mamba_layer_params) using import_plugin() context managers at call sites.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/nas/plugins/megatron_model_stats.py` around lines 41 - 44,
Remove the module-level imports of megatron and rich (MambaModel, Console,
Panel, Table) and instead perform lazy imports inside the functions that use
them (e.g., print_mcore_model_stats and _mamba_layer_params) by calling
import_plugin() as a context manager at the beginning of each function and
importing megatron.core.models.mamba.mamba_model.MambaModel and
rich.console.Console / rich.panel.Panel / rich.table.Table there; ensure no
other code in the module relies on those names at import time and update any
references to use the locally imported symbols within those functions.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modelopt/torch/nas/plugins/megatron_model_stats.py`:
- Around line 352-353: The code slices
parse_main_layer_chars(hybrid_layer_pattern) to num_layers (assigned to
layer_chars) without verifying the parsed pattern length; update the logic in
the places using parse_main_layer_chars (the assignment to layer_chars) to first
compute parsed = parse_main_layer_chars(hybrid_layer_pattern), check that
len(parsed) >= num_layers (or exactly equals num_layers as appropriate), and if
not raise a clear exception or log an error and fail fast so
hybrid_layer_pattern mismatches aren’t silently truncated; reference
parse_main_layer_chars, hybrid_layer_pattern, num_layers, and layer_chars when
adding this validation and error handling (apply the same change at the other
call site that slices to num_layers).
- Around line 488-501: kv_per_layer and kv_bytes are computed unconditionally
which can crash if kv_channels or num_query_groups are missing and Mamba uses
0/1 fallbacks that silently undercount; fix by gating the KV calculation behind
n_attn > 0 and validating required dimensions (kv_channels, num_query_groups)
before computing kv_per_layer/kv_bytes (raise a clear ValueError if missing),
and for Mamba ensure when n_mamba > 0 you validate mamba_num_heads,
mamba_head_dim, mamba_state_dim (and optionally mamba_num_groups) are not None
rather than defaulting to 0/1; only compute mamba_bytes when the required Mamba
dims are present, otherwise raise an error so the estimator fails fast instead
of under-reporting.

---

Duplicate comments:
In `@modelopt/torch/prune/plugins/mcore_minitron.py`:
- Around line 691-694: The code assumes model.hybrid_layer_pattern exists for
MambaModel but older MCore exposes hybrid_override_pattern; update the
MambaModel branch in compute metrics to use a compatibility fallback (e.g., set
hybrid_layer_pattern = getattr(model, "hybrid_layer_pattern", None) or fallback
to getattr(model, "hybrid_override_pattern", None)) so it mirrors the earlier
compatibility logic used elsewhere in this file; ensure you reference MambaModel
and the model variable when adding the fallback so AttributeError is avoided on
pre-0.17 MCore.

---

Nitpick comments:
In `@modelopt/torch/nas/plugins/megatron_model_stats.py`:
- Around line 41-44: Remove the module-level imports of megatron and rich
(MambaModel, Console, Panel, Table) and instead perform lazy imports inside the
functions that use them (e.g., print_mcore_model_stats and _mamba_layer_params)
by calling import_plugin() as a context manager at the beginning of each
function and importing megatron.core.models.mamba.mamba_model.MambaModel and
rich.console.Console / rich.panel.Panel / rich.table.Table there; ensure no
other code in the module relies on those names at import time and update any
references to use the locally imported symbols within those functions.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 5e1794a8-d42b-4444-abd1-542dd79d975d

📥 Commits

Reviewing files that changed from the base of the PR and between caae65c and 8035a26.

📒 Files selected for processing (11)

CHANGELOG.rst
examples/megatron_bridge/README.md
examples/megatron_bridge/prune_minitron.py
examples/pruning/README.md
modelopt/torch/nas/plugins/__init__.py
modelopt/torch/nas/plugins/megatron_model_stats.py
modelopt/torch/prune/plugins/mcore_minitron.py
tests/examples/megatron_bridge/test_prune_minitron.py
tests/gpu_megatron/torch/nas/plugins/test_megatron_mamba_dynamic_modules.py
tests/gpu_megatron/torch/nas/plugins/test_megatron_model_stats.py
tests/gpu_megatron/torch/prune/plugins/test_mcore_mamba_minitron_pruning.py

✅ Files skipped from review due to trivial changes (2)

examples/pruning/README.md
CHANGELOG.rst

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/examples/megatron_bridge/test_prune_minitron.py`:
- Around line 53-56: The test calls mcore_param_count(pruned_model) with a
HuggingFace model but mcore_param_count requires the vocab_size when given an HF
model; update the call to pass the model's vocab size (use
pruned_model.config.vocab_size) so call mcore_param_count(pruned_model,
vocab_size=pruned_model.config.vocab_size) (mirror the same fix applied for the
teacher model) to ensure correct parameter counting for
AutoModelForCausalLM.from_pretrained(pruned_model_path).

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 8432c79c-56ed-4bff-916a-f3d84fafcd02

📥 Commits

Reviewing files that changed from the base of the PR and between 8035a26 and 11805a4.

📒 Files selected for processing (11)

CHANGELOG.rst
examples/megatron_bridge/README.md
examples/megatron_bridge/prune_minitron.py
examples/pruning/README.md
modelopt/torch/nas/plugins/__init__.py
modelopt/torch/nas/plugins/megatron_model_stats.py
modelopt/torch/prune/plugins/mcore_minitron.py
tests/examples/megatron_bridge/test_prune_minitron.py
tests/gpu_megatron/torch/nas/plugins/test_megatron_mamba_dynamic_modules.py
tests/gpu_megatron/torch/nas/plugins/test_megatron_model_stats.py
tests/gpu_megatron/torch/prune/plugins/test_mcore_mamba_minitron_pruning.py

✅ Files skipped from review due to trivial changes (4)

examples/pruning/README.md
modelopt/torch/nas/plugins/megatron_model_stats.py
examples/megatron_bridge/README.md
CHANGELOG.rst

🚧 Files skipped from review as they are similar to previous changes (2)

modelopt/torch/nas/plugins/init.py
examples/megatron_bridge/prune_minitron.py

coderabbitai

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

tests/_test_utils/torch/transformers_models.py (1)

111-120: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

The new tuple return type is unreachable.

create_tiny_qwen3_moe_dir() still always returns qwen3_moe_dir, so the widened annotation does not match the implementation. Either add the missing return_model path or keep the return type as Path until that behavior exists.

♻️ Suggested fix

 def create_tiny_qwen3_moe_dir(
-    tmp_path: Path | str, with_tokenizer: bool = False, **config_kwargs
+    tmp_path: Path | str, with_tokenizer: bool = False, return_model: bool = False, **config_kwargs
 ) -> Path | tuple[Path, PreTrainedModel]:
     qwen3_moe_dir = Path(tmp_path) / "tiny_qwen3_moe"
     if with_tokenizer:
-        tokenizer = tokenizer = get_tiny_tokenizer()
+        tokenizer = get_tiny_tokenizer()
         tokenizer.save_pretrained(qwen3_moe_dir)
         config_kwargs["vocab_size"] = tokenizer.vocab_size
-    get_tiny_qwen3_moe(**config_kwargs).save_pretrained(qwen3_moe_dir)
-    return qwen3_moe_dir
+    tiny_qwen3_moe = get_tiny_qwen3_moe(**config_kwargs)
+    tiny_qwen3_moe.save_pretrained(qwen3_moe_dir)
+    if return_model:
+        return qwen3_moe_dir, tiny_qwen3_moe
+    return qwen3_moe_dir

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@tests/_test_utils/torch/transformers_models.py` around lines 111 - 120, The
annotated return type claims the function can return Path or tuple[Path,
PreTrainedModel], but the implementation always returns only Path; fix by either
reverting the annotation to Path or by adding an explicit return_model: bool =
False parameter and returning the model when requested: call model =
get_tiny_qwen3_moe(**config_kwargs), model.save_pretrained(qwen3_moe_dir), and
if return_model is True return (qwen3_moe_dir, model) else return qwen3_moe_dir;
update the function signature and return type to match this behavior and adjust
references to tokenizer variable duplication (tokenizer = tokenizer) if present.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/megatron_bridge/prune_minitron.py`:
- Around line 151-158: prune_target_memory_mb is currently evaluated using
pruning_config["batch_size"] wired to --calib_mbs (fixed to 1), so add a
dedicated CLI flag (e.g., --prune_batch_size or --prune_target_batch_size) in
the same argparse block where parser.add_argument("--prune_target_memory_mb",
...) is defined and use that new argument to set pruning_config["batch_size"]
instead of using --calib_mbs; keep --calib_mbs for calibration work but ensure
the pruning path reads args.prune_batch_size (with a sensible default, e.g., 1)
and update any place that sets pruning_config["batch_size"] (and related
mentions around lines 363-365) to reference this new flag so memory estimates
reflect the intended deployment batch size.

In `@tests/gpu_megatron/torch/nas/plugins/test_megatron_mamba_dynamic_modules.py`:
- Around line 234-238: The current use of
DistributedProcessGroup.get_dist_syncd_obj with the lambda that flattens
all_rank_configs into width_ss_config can silently mask per-rank differences
(last-write-wins); before applying the flattening op, collect all_rank_configs
from get_dist_syncd_obj (or call the synchronization helper without the op),
iterate over each key across the per-rank dicts returned for local_config and
validate that every rank reports an identical value for that key, and if any key
has conflicting values raise/assert/log a clear error (or fail the test)
indicating which key and differing values were observed; then only call the
flattening op (the lambda used to build width_ss_config) after this agreement
check (references: DistributedProcessGroup.get_dist_syncd_obj, local_config,
width_ss_config, get_pipeline_model_parallel_group).
- Line 156: Replace the hard assert that enforces "size <= 4" with a pytest.skip
guard so tests don't hard-fail on larger pipeline-parallel sizes; specifically,
in the helper/test that currently does `assert size <= 4,
"test_param_num_dynamic_matches_formula only configured for upto 4 GPUs"`, check
`if size > 4:` and call `pytest.skip("...")` with a similar explanatory message
referencing the limitation, importing pytest if necessary so the test is skipped
rather than failing.

---

Outside diff comments:
In `@tests/_test_utils/torch/transformers_models.py`:
- Around line 111-120: The annotated return type claims the function can return
Path or tuple[Path, PreTrainedModel], but the implementation always returns only
Path; fix by either reverting the annotation to Path or by adding an explicit
return_model: bool = False parameter and returning the model when requested:
call model = get_tiny_qwen3_moe(**config_kwargs),
model.save_pretrained(qwen3_moe_dir), and if return_model is True return
(qwen3_moe_dir, model) else return qwen3_moe_dir; update the function signature
and return type to match this behavior and adjust references to tokenizer
variable duplication (tokenizer = tokenizer) if present.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 4d72e821-1f4b-4fcf-8aee-8442f5881c50

📥 Commits

Reviewing files that changed from the base of the PR and between 11805a4 and 90ce94e.

📒 Files selected for processing (12)

CHANGELOG.rst
examples/megatron_bridge/README.md
examples/megatron_bridge/prune_minitron.py
examples/pruning/README.md
modelopt/torch/nas/plugins/__init__.py
modelopt/torch/nas/plugins/megatron_model_stats.py
modelopt/torch/prune/plugins/mcore_minitron.py
tests/_test_utils/torch/transformers_models.py
tests/examples/megatron_bridge/test_prune_minitron.py
tests/gpu_megatron/torch/nas/plugins/test_megatron_mamba_dynamic_modules.py
tests/gpu_megatron/torch/nas/plugins/test_megatron_model_stats.py
tests/gpu_megatron/torch/prune/plugins/test_mcore_mamba_minitron_pruning.py

✅ Files skipped from review due to trivial changes (4)

examples/pruning/README.md
CHANGELOG.rst
examples/megatron_bridge/README.md
modelopt/torch/nas/plugins/megatron_model_stats.py

🚧 Files skipped from review as they are similar to previous changes (3)

tests/gpu_megatron/torch/nas/plugins/test_megatron_model_stats.py
tests/gpu_megatron/torch/prune/plugins/test_mcore_mamba_minitron_pruning.py
modelopt/torch/prune/plugins/mcore_minitron.py

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

modelopt/torch/prune/plugins/mcore_minitron.py (1)

507-509: ⚡ Quick win

Use all active metrics to break ties before slicing top_k.

Right now the cache is sorted only by primary_key, so candidates that tie on that metric keep the raw product-order from _generate_search_space_combos(). With combined constraints, that can arbitrarily push out subnets that are closer to the secondary ceilings before top_k validation ever runs.

Proposed change

-            self.all_candidates_per_constraint[constraints_cache_key] = sorted(
-                selected, key=lambda x: x.metrics[primary_key], reverse=True
-            )
+            self.all_candidates_per_constraint[constraints_cache_key] = sorted(
+                selected,
+                key=lambda x: tuple(x.metrics[k] for k in active_metric_keys),
+                reverse=True,
+            )

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/prune/plugins/mcore_minitron.py` around lines 507 - 509, The
cache currently sorts selected only by primary_key which leaves ties ordered by
the generator's product order and can evict better candidates on secondary
metrics; change the sort used when assigning
self.all_candidates_per_constraint[constraints_cache_key] to order by a tuple of
all active metric keys (primary_key first, then the other metric names in the
same priority order) so ties are broken by subsequent metrics before any top_k
slicing. Locate the assignment to
self.all_candidates_per_constraint[constraints_cache_key] and replace the
key=lambda x: x.metrics[primary_key] with a key that builds a tuple like
tuple(x.metrics[m] for m in metrics_list) (use reverse=True if higher is better)
where metrics_list contains primary_key followed by the remaining active
metrics; keep the rest of the flow (e.g., _generate_search_space_combos() and
later top_k validation) unchanged.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/gpu_megatron/torch/prune/plugins/test_mcore_mamba_minitron_pruning.py`:
- Around line 319-325: The helper _assert_top_k_candidates currently uses exact
equality for metrics (actual.metrics == metrics) which causes flaky tests for
fractional float fields like memory_mb; update _assert_top_k_candidates to
compare numeric/float metric values using pytest.approx (import pytest) rather
than exact equality—e.g., when comparing actual.metrics to expected metrics,
iterate metric keys and for values that are floats use assert actual_value ==
pytest.approx(expected_value) and for non-floats keep exact equality; keep the
existing checks for ss_config and score but replace any direct float comparisons
(including score if it can be fractional) with pytest.approx to stabilize the
test.

---

Nitpick comments:
In `@modelopt/torch/prune/plugins/mcore_minitron.py`:
- Around line 507-509: The cache currently sorts selected only by primary_key
which leaves ties ordered by the generator's product order and can evict better
candidates on secondary metrics; change the sort used when assigning
self.all_candidates_per_constraint[constraints_cache_key] to order by a tuple of
all active metric keys (primary_key first, then the other metric names in the
same priority order) so ties are broken by subsequent metrics before any top_k
slicing. Locate the assignment to
self.all_candidates_per_constraint[constraints_cache_key] and replace the
key=lambda x: x.metrics[primary_key] with a key that builds a tuple like
tuple(x.metrics[m] for m in metrics_list) (use reverse=True if higher is better)
where metrics_list contains primary_key followed by the remaining active
metrics; keep the rest of the flow (e.g., _generate_search_space_combos() and
later top_k validation) unchanged.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: c55cb2a6-fe6f-49b5-9fab-31db8a3753f4

📥 Commits

Reviewing files that changed from the base of the PR and between 90ce94e and f061c19.

📒 Files selected for processing (12)

CHANGELOG.rst
examples/megatron_bridge/README.md
examples/megatron_bridge/prune_minitron.py
examples/pruning/README.md
modelopt/torch/nas/plugins/__init__.py
modelopt/torch/nas/plugins/megatron_model_stats.py
modelopt/torch/prune/plugins/mcore_minitron.py
tests/_test_utils/torch/transformers_models.py
tests/examples/megatron_bridge/test_prune_minitron.py
tests/gpu_megatron/torch/nas/plugins/test_megatron_mamba_dynamic_modules.py
tests/gpu_megatron/torch/nas/plugins/test_megatron_model_stats.py
tests/gpu_megatron/torch/prune/plugins/test_mcore_mamba_minitron_pruning.py

✅ Files skipped from review due to trivial changes (3)

CHANGELOG.rst
modelopt/torch/nas/plugins/megatron_model_stats.py
examples/megatron_bridge/README.md

🚧 Files skipped from review as they are similar to previous changes (6)

tests/examples/megatron_bridge/test_prune_minitron.py
tests/_test_utils/torch/transformers_models.py
modelopt/torch/nas/plugins/init.py
tests/gpu_megatron/torch/nas/plugins/test_megatron_mamba_dynamic_modules.py
examples/megatron_bridge/prune_minitron.py
tests/gpu_megatron/torch/nas/plugins/test_megatron_model_stats.py

copy-pr-bot · 2026-05-01T19:21:00Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

…ch logging Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>

kevalmorabia97 requested review from a team as code owners April 30, 2026 11:03

kevalmorabia97 requested a review from ChenhanYu April 30, 2026 11:04

kevalmorabia97 marked this pull request as draft April 30, 2026 11:05

kevalmorabia97 changed the title ~~Add Megatron model stats NAS utilities and active param tracking for Minitron pruning~~ Enable active-param and memory based Minitron pruning constraint Apr 30, 2026

coderabbitai Bot reviewed Apr 30, 2026

View reviewed changes

Comment thread examples/megatron_bridge/README.md Outdated

Comment thread modelopt/torch/nas/plugins/megatron_model_stats.py Outdated

Comment thread modelopt/torch/prune/plugins/mcore_minitron.py

Comment thread modelopt/torch/prune/plugins/mcore_minitron.py Outdated

kevalmorabia97 force-pushed the kmorabia/minitron-memory-active-param-nas branch from caae65c to 8035a26 Compare April 30, 2026 15:54

kevalmorabia97 marked this pull request as ready for review April 30, 2026 15:55

kevalmorabia97 force-pushed the kmorabia/minitron-memory-active-param-nas branch from 8035a26 to 11805a4 Compare April 30, 2026 15:59

coderabbitai Bot reviewed Apr 30, 2026

View reviewed changes

Comment thread modelopt/torch/nas/plugins/megatron_model_stats.py Outdated

Comment thread modelopt/torch/nas/plugins/megatron_model_stats.py Outdated

kevalmorabia97 requested review from AAnoosheh, LianaMikael, achidiac-nv and danielkorzekwa April 30, 2026 16:05

coderabbitai Bot reviewed Apr 30, 2026

View reviewed changes

Comment thread tests/examples/megatron_bridge/test_prune_minitron.py Outdated

kevalmorabia97 force-pushed the kmorabia/minitron-memory-active-param-nas branch 2 times, most recently from 5c832dd to 90ce94e Compare April 30, 2026 19:27

coderabbitai Bot reviewed Apr 30, 2026

View reviewed changes

Comment thread examples/megatron_bridge/prune_minitron.py

Comment thread tests/gpu_megatron/torch/nas/plugins/test_megatron_mamba_dynamic_modules.py

Comment thread tests/gpu_megatron/torch/nas/plugins/test_megatron_mamba_dynamic_modules.py

kevalmorabia97 force-pushed the kmorabia/minitron-memory-active-param-nas branch from 90ce94e to f061c19 Compare April 30, 2026 20:59

coderabbitai Bot reviewed Apr 30, 2026

View reviewed changes

Comment thread tests/gpu_megatron/torch/prune/plugins/test_mcore_mamba_minitron_pruning.py

kevalmorabia97 requested a review from a team as a code owner May 1, 2026 19:20

kevalmorabia97 added 2 commits May 1, 2026 13:01

Enable active-param and memory based Minitron pruning constraint + ri…

7ad01c0

…ch logging Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>

Fixes based on Nemotron3 tests

b2b3346

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>

kevalmorabia97 force-pushed the kmorabia/minitron-memory-active-param-nas branch from 38d82fb to b2b3346 Compare May 1, 2026 20:01

Conversation

kevalmorabia97 commented Apr 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do?

Changes

Usage

Testing

Before your PR is "Ready for review"

Uh oh!

coderabbitai Bot commented Apr 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested reviewers

❌ Failed checks (1 warning)

Uh oh!

copy-pr-bot Bot commented Apr 30, 2026

Uh oh!

github-actions Bot commented Apr 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Built to branch gh-pages at 2026-05-01 20:04 UTC. Preview will be ready when the GitHub Pages deployment is complete.

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

codecov Bot commented Apr 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

copy-pr-bot Bot commented May 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

kevalmorabia97 commented Apr 30, 2026 •

edited

Loading

coderabbitai Bot commented Apr 30, 2026 •

edited

Loading

github-actions Bot commented Apr 30, 2026 •

edited

Loading

Built to branch `gh-pages` at 2026-05-01 20:04 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

codecov Bot commented Apr 30, 2026 •

edited

Loading