Support Mixed precision & Static MSE PTQ in MCore export; Nemotron Super v3 NVFP4 recipe #1363
Conversation
Signed-off-by: Jennifer Chen <jennifchen@nvidia.com>
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. This behavior can be changed in the review settings.
📝 Walkthrough
This PR adds FP8 scale sweep stride control to calibration workflows, introduces three mixed-precision NVFP4 quantization recipes for Nemotron-3-Super-120B with different calibration methods (MSE, MSE with stride-4 sweep, max-based), refactors MoE calibration completeness checks to recursively traverse SequentialQuantizer leaves, and overhauls HuggingFace export to collect and apply per-layer quantization metadata.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 5 passed | ❌ 1 failed (1 warning)
✅ Passed checks (5 passed)
Signed-off-by: Jennifer Chen <jennifchen@nvidia.com>
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4.yaml`:
- Around line 29-32: The metadata claims attention o_proj is FP8 per-tensor but
quant_cfg lacks any override for attention.o_proj; either update the description
or add the missing quantizer mapping. Fix by adding an explicit quant_cfg
override for the attention o_proj parameter name (e.g., attention.o_proj /
attention.output_projection / whatever key is used in your model mapping) to use
the FP8 per-tensor quantizer used elsewhere, or remove the attention o_proj
mention from the metadata so it matches the existing quant_cfg; ensure you
reference the exact layer key used in quant_cfg to keep mapping consistent with
the model.
- Around line 94-115: The entries using broad quantizer_name patterns
('*mixer.fc1_latent_proj*weight_quantizer',
'*mixer.fc1_latent_proj*input_quantizer',
'*mixer.fc2_latent_proj*weight_quantizer',
'*mixer.fc2_latent_proj*input_quantizer') are enabling FP8 for every latent
projection instead of only layers 1, 3, and 5; update these quantizer_name
values to target only the specific layer instances (e.g. include the layer
index/identifier for layers 1, 3, and 5 in the wildcard or use a regex/explicit
list) so only those mixer.fc1_latent_proj and mixer.fc2_latent_proj quantizers
are set to num_bits: e4m3, and leave all other latent projection quantizers at
BF16 (or remove the generic entries).
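The scoping problem described above can be sketched with Python's `fnmatch`, which behaves like the wildcard matching typically used for `quantizer_name` patterns. The layer names below are hypothetical stand-ins, not taken from the actual model mapping:

```python
from fnmatch import fnmatch

# Hypothetical quantizer names for six decoder layers; real names depend on the model.
names = [f"model.layers.{i}.mixer.fc1_latent_proj.weight_quantizer" for i in range(6)]

broad = "*mixer.fc1_latent_proj*weight_quantizer"  # matches every layer
scoped = [f"*layers.{i}.mixer.fc1_latent_proj*weight_quantizer" for i in (1, 3, 5)]

broad_hits = [n for n in names if fnmatch(n, broad)]
scoped_hits = [n for n in names if any(fnmatch(n, p) for p in scoped)]

print(len(broad_hits), len(scoped_hits))  # broad enables all 6; scoped only layers 1, 3, 5
```

This is why the generic patterns flip every latent projection to FP8: the leading `*` swallows the layer index, so only layer-qualified wildcards (or an explicit list) restrict the override to layers 1, 3, and 5.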
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 60ed9c0b-efef-4967-b321-7270a8853455
📒 Files selected for processing (1)
modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4.yaml
Codecov Report ❌. Additional details and impacted files:
@@ Coverage Diff @@
##             main    #1363       +/-   ##
===========================================
- Coverage   76.93%   58.90%   -18.03%
===========================================
  Files         471      471
  Lines       50404    53013     +2609
===========================================
- Hits        38776    31227     -7549
- Misses      11628    21786    +10158
Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
Signed-off-by: Jennifer Chen <jennifchen@nvidia.com>
Signed-off-by: Jennifer Chen <jennifchen@nvidia.com>
Signed-off-by: Jennifer Chen <jennifchen@nvidia.com>
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
modelopt/torch/quantization/model_calib.py (1)
389-412: ⚠️ Potential issue | 🟡 Minor
Custom FP8-sweep backends still ignore the new stride setting.
When a registered `backend_factory` is used, this branch still calls the old 3-argument factory signature, so `fp8_scale_sweep_stride` only takes effect on the built-in `NVFP4MSECalibrator` path below. That makes the new config a silent no-op for registry-backed sweep calibrators. Please extend the factory contract or reject a non-default stride explicitly in the backend path.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/quantization/model_calib.py` around lines 389 - 412, The registered backend factory path (lookup via _FP8_SWEEP_CALIBRATOR_REGISTRY and assigned to backend_factory) currently calls the factory with the old 3-argument signature and thus ignores fp8_scale_sweep_stride; update the branch that sets module._calibrator via backend_factory to either (A) call the factory with the new argument (pass fp8_scale_sweep_stride) and update the factory contract accordingly, or (B) explicitly detect a non-default fp8_scale_sweep_stride and raise/rollback with a clear error so users know registry-backed calibrators do not support stride; ensure the call still passes initial_amax, module._calibrator._axis, partial(_mse_quant_func, quantizer=module) and include fp8_scale_sweep_stride when choosing option A, mirroring how NVFP4MSECalibrator is constructed.
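A minimal standalone sketch of the two options in the prompt above. The wrapper and factory names here are hypothetical (only `fp8_scale_sweep_stride` comes from the review text); this is not the repository's actual dispatch code:

```python
import inspect

def build_sweep_calibrator(backend_factory, initial_amax, axis, quant_func,
                           fp8_scale_sweep_stride=1):
    """Hypothetical dispatch: pass the stride to factories that accept it (option A),
    otherwise reject a non-default stride instead of silently ignoring it (option B)."""
    if "fp8_scale_sweep_stride" in inspect.signature(backend_factory).parameters:
        # Option A: extended factory contract, stride is forwarded.
        return backend_factory(initial_amax, axis, quant_func,
                               fp8_scale_sweep_stride=fp8_scale_sweep_stride)
    if fp8_scale_sweep_stride != 1:
        # Option B: fail loudly so the config is never a silent no-op.
        raise ValueError(
            f"registry-backed sweep calibrator does not support "
            f"fp8_scale_sweep_stride={fp8_scale_sweep_stride}"
        )
    return backend_factory(initial_amax, axis, quant_func)
```

Either branch keeps legacy 3-argument factories working while making a non-default stride either effective or an explicit error.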
♻️ Duplicate comments (2)
modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4.yaml (1)
41-42: ⚠️ Potential issue | 🟡 Minor
The metadata description still overstates what this recipe quantizes.
These lines say attention `o_proj`, `fc1_latent_proj`, and `fc2_latent_proj` are FP8 per-tensor, but there are no matching overrides in `quant_cfg`, and the header comments say the latent MoE projections stay BF16. Please update the description so it matches the actual recipe.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4.yaml` around lines 41 - 42: The metadata description in the YAML (the description field) incorrectly claims that "attention o_proj", "fc1_latent_proj", and "fc2_latent_proj" are FP8 per-tensor while the recipe does not contain matching overrides in quant_cfg and header comments state latent MoE projections remain BF16/FP16; update the description text to accurately reflect the recipe by removing or changing the FP8 claims (e.g., state that latent MoE projections and those specific projections remain BF16/FP16 and only list the layers that the quant_cfg actually overrides as FP8 per-tensor).
modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4-fp8-sweep-stride4.yaml (1)
34-35: ⚠️ Potential issue | 🟡 Minor
The metadata description does not match the actual quantizer mapping.
These lines still claim attention `o_proj`, `fc1_latent_proj`, and `fc2_latent_proj` are FP8 per-tensor, but this recipe never enables those quantizers, and the header comments above say latent MoE stays BF16. Please align the description with the `quant_cfg` that follows.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4-fp8-sweep-stride4.yaml` around lines 34 - 35, The YAML description string claims that "attention o_proj, fc1_latent_proj, and fc2_latent_proj" are FP8 per-tensor, but the quant_cfg in this recipe does not enable quantizers for "attention.o_proj", "fc1_latent_proj", or "fc2_latent_proj" (and the header notes latent MoE stays BF16/FP16); fix this by either updating the description to state those projections remain BF16/FP16 (and remove the FP8 per-tensor claim) or modify quant_cfg to actually enable FP8 per-tensor quantizers for the keys "attention.o_proj", "fc1_latent_proj", and "fc2_latent_proj" so the comment matches the mapping.
🧹 Nitpick comments (1)
modelopt/torch/quantization/calib/mse.py (1)
202-206: Add a regression test for strided FP8 candidate generation.
The existing coverage only exercises the default 126-candidate path. This branch adds two new behaviors (subsampling and forced inclusion of the last candidate), so it should have a focused test for `fp8_scale_sweep_stride > 1` to lock down both the reduced candidate count and preservation of the max scale.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/quantization/calib/mse.py` around lines 202 - 206, The strided FP8 candidate-generation branch guarded by fp8_scale_sweep_stride > 1 (in the block that subsamples fp8_values into candidates and appends the last value) is untested; add a focused regression test (e.g., test_fp8_scale_sweep_stride_preserves_last_candidate) that sets fp8_scale_sweep_stride > 1, calls the code path that produces fp8_values, and asserts that the resulting candidates length is reduced according to the stride and that the final element equals the original fp8_values[-1] (verifying forced inclusion of the max scale); ensure the test covers both subsampling and the append behavior so the branch is locked down.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4.yaml`:
- Around line 16-28: Remove the unresolved Git merge conflict markers (<<<<<<<,
=======, >>>>>>>) and restore a single coherent comment block describing the
quant config; keep the more detailed version that lists both HF and
Megatron-Core names (the lines mentioning mixer.experts.<N>.{up,down}_proj,
mlp.experts.local_experts.<N>.linear_fc{1,2},
mixer.shared_experts.{up,down}_proj, and mlp.shared_experts.linear_fc{1,2}) or
merge its additional details into the shorter variant so the YAML comment is
valid and free of conflict markers.
---
Outside diff comments:
In `@modelopt/torch/quantization/model_calib.py`:
- Around line 389-412: The registered backend factory path (lookup via
_FP8_SWEEP_CALIBRATOR_REGISTRY and assigned to backend_factory) currently calls
the factory with the old 3-argument signature and thus ignores
fp8_scale_sweep_stride; update the branch that sets module._calibrator via
backend_factory to either (A) call the factory with the new argument (pass
fp8_scale_sweep_stride) and update the factory contract accordingly, or (B)
explicitly detect a non-default fp8_scale_sweep_stride and raise/rollback with a
clear error so users know registry-backed calibrators do not support stride;
ensure the call still passes initial_amax, module._calibrator._axis,
partial(_mse_quant_func, quantizer=module) and include fp8_scale_sweep_stride
when choosing option A, mirroring how NVFP4MSECalibrator is constructed.
---
Duplicate comments:
In
`@modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4-fp8-sweep-stride4.yaml`:
- Around line 34-35: The YAML description string claims that "attention o_proj,
fc1_latent_proj, and fc2_latent_proj" are FP8 per-tensor, but the quant_cfg in
this recipe does not enable quantizers for "attention.o_proj",
"fc1_latent_proj", or "fc2_latent_proj" (and the header notes latent MoE stays
BF16/FP16); fix this by either updating the description to state those
projections remain BF16/FP16 (and remove the FP8 per-tensor claim) or modify
quant_cfg to actually enable FP8 per-tensor quantizers for the keys
"attention.o_proj", "fc1_latent_proj", and "fc2_latent_proj" so the comment
matches the mapping.
In `@modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4.yaml`:
- Around line 41-42: The metadata description in the YAML (the description
field) incorrectly claims that "attention o_proj", "fc1_latent_proj", and
"fc2_latent_proj" are FP8 per-tensor while the recipe does not contain matching
overrides in quant_cfg and header comments state latent MoE projections remain
BF16/FP16; update the description text to accurately reflect the recipe by
removing or changing the FP8 claims (e.g., state that latent MoE projections and
those specific projections remain BF16/FP16 and only list the layers that the
quant_cfg actually overrides as FP8 per-tensor).
---
Nitpick comments:
In `@modelopt/torch/quantization/calib/mse.py`:
- Around line 202-206: The strided FP8 candidate-generation branch guarded by
fp8_scale_sweep_stride > 1 (in the block that subsamples fp8_values into
candidates and appends the last value) is untested; add a focused regression
test (e.g., test_fp8_scale_sweep_stride_preserves_last_candidate) that sets
fp8_scale_sweep_stride > 1, calls the code path that produces fp8_values, and
asserts that the resulting candidates length is reduced according to the stride
and that the final element equals the original fp8_values[-1] (verifying forced
inclusion of the max scale); ensure the test covers both subsampling and the
append behavior so the branch is locked down.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 0d72ce13-180e-46ea-b4d8-8c6c140d22a7
📒 Files selected for processing (5)
modelopt/torch/quantization/calib/mse.py
modelopt/torch/quantization/config.py
modelopt/torch/quantization/model_calib.py
modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4-fp8-sweep-stride4.yaml
modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4.yaml
Signed-off-by: Jennifer Chen <jennifchen@nvidia.com>
♻️ Duplicate comments (2)
modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4.yaml (1)
35-36: ⚠️ Potential issue | 🟡 Minor
Metadata precision mapping is inconsistent with the actual recipe.
Line 35 says latent MoE `fc1_latent_proj`/`fc2_latent_proj` are FP8 per-tensor, but this file has no latent-MoE quantizer overrides and the header (Line 27) says they stay BF16. Please align the description with `quant_cfg` to avoid confusion.
Proposed patch
 metadata:
   recipe_type: ptq
-  description: Super NVFP4 mixed precision — sparse MoE experts NVFP4 (W4A4, group_size 16); shared experts, mamba in/out_proj, and Latent MOE fc1_latent_proj/fc2_latent_proj FP8 per-tensor; FP8 KV cache; lm_head/MTP/SSM stay BF16/FP16. Weight-MSE calibration with FP8 scale sweep.
+  description: >-
+    Super NVFP4 mixed precision — sparse MoE experts NVFP4 (W4A4, group_size 16);
+    shared experts and mamba in/out_proj FP8 per-tensor; FP8 KV cache; latent MoE,
+    lm_head, MTP, output, and mamba conv1d stay BF16; SSM cache stays FP32
+    (optionally FP16 in vLLM). Weight-MSE calibration with FP8 scale sweep.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4.yaml` around lines 35 - 36: The description's claim that latent MoE "fc1_latent_proj/fc2_latent_proj" are FP8 per-tensor is inconsistent with the quant_cfg (and header) which leave those layers as BF16; either update the description string to state that fc1_latent_proj and fc2_latent_proj remain BF16/FP16 to match quant_cfg, or add explicit quantizer overrides in quant_cfg for the fc1_latent_proj and fc2_latent_proj modules to set them to FP8 per-tensor (matching the rest of the FP8 settings); ensure the description text and the quant_cfg entries (module names fc1_latent_proj, fc2_latent_proj) are aligned.
modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4-fp8-sweep-stride4.yaml (1)
34-35: ⚠️ Potential issue | 🟡 Minor
Description/comments conflict on latent MoE and SSM/mamba precision.
Lines 34-35 state latent MoE is FP8 per-tensor, but the recipe does not enable latent-MoE quantizers. Also, Lines 134-135 conflict with earlier comments on SSM/mamba precision. Please make these statements internally consistent.
Proposed patch
 metadata:
   recipe_type: ptq
-  description: Super NVFP4 mixed precision — sparse MoE experts NVFP4 (W4A4, group_size 16); shared experts, mamba in/out_proj, and Latent MOE fc1_latent_proj/fc2_latent_proj FP8 per-tensor; FP8 KV cache; lm_head/MTP/SSM stay BF16/FP16. Weight-MSE calibration with stride-4 FP8 scale sweep.
+  description: >-
+    Super NVFP4 mixed precision — sparse MoE experts NVFP4 (W4A4, group_size 16);
+    shared experts and mamba in/out_proj FP8 per-tensor; FP8 KV cache; latent MoE,
+    lm_head, MTP, output, and mamba conv1d stay BF16; SSM cache stays FP32
+    (optionally FP16 in vLLM). Weight-MSE calibration with stride-4 FP8 scale sweep.
@@
-  # Stay BF16: lm_head, output projection, MoE routers/gates, MTP head.
-  # SSM state / mamba conv1d stay FP16.
+  # Stay BF16: lm_head, output projection, MoE routers/gates, MTP head, mamba conv1d.
+  # SSM state stays FP32 (can be set to FP16 in vLLM).
Also applies to: 134-135
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4-fp8-sweep-stride4.yaml` around lines 34 - 35, The description text claims "latent MoE is FP8 per-tensor" and that "lm_head/MTP/SSM stay BF16/FP16" but the recipe doesn't enable latent-MoE quantizers and later lines conflict on SSM/mamba precision; fix by making the YAML fields match the prose: either add the latent-MoE quantizer key (e.g., include latent_moe in the quantizers list or set enable_latent_moe_quantizers: true) and ensure its precision is set to FP8 per-tensor, or change the description to remove the FP8 latent-MoE claim; likewise reconcile SSM/mamba entries by updating the lm_head/MTP/SSM and mamba precision fields to uniformly state BF16/FP16 (or change the description to reflect the actual configured precisions) so the "description" string and the quantizer/precision keys (latent_moe_quantizers, quantizers list, ssm_precision, mamba_precision, lm_head_precision / mtp_precision) are consistent.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In
`@modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4-fp8-sweep-stride4.yaml`:
- Around line 34-35: The description text claims "latent MoE is FP8 per-tensor"
and that "lm_head/MTP/SSM stay BF16/FP16" but the recipe doesn't enable
latent-MoE quantizers and later lines conflict on SSM/mamba precision; fix by
making the YAML fields match the prose: either add the latent-MoE quantizer key
(e.g., include latent_moe in the quantizers list or set
enable_latent_moe_quantizers: true) and ensure its precision is set to FP8
per-tensor, or change the description to remove the FP8 latent-MoE claim;
likewise reconcile SSM/mamba entries by updating the lm_head/MTP/SSM and mamba
precision fields to uniformly state BF16/FP16 (or change the description to
reflect the actual configured precisions) so the "description" string and the
quantizer/precision keys (latent_moe_quantizers, quantizers list, ssm_precision,
mamba_precision, lm_head_precision / mtp_precision) are consistent.
In `@modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4.yaml`:
- Around line 35-36: The description's claim that latent MoE
"fc1_latent_proj/fc2_latent_proj" are FP8 per-tensor is inconsistent with the
quant_cfg (and header) which leave those layers as BF16; either update the
description string to state that fc1_latent_proj and fc2_latent_proj remain
BF16/FP16 to match quant_cfg, or add explicit quantizer overrides in quant_cfg
for the fc1_latent_proj and fc2_latent_proj modules to set them to FP8
per-tensor (matching the rest of the FP8 settings); ensure the description text
and the quant_cfg entries (module names fc1_latent_proj, fc2_latent_proj) are
aligned.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 595b4531-3ced-4894-a5cc-c28f161c8f3e
📒 Files selected for processing (2)
modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4-fp8-sweep-stride4.yaml
modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4.yaml
Signed-off-by: Jennifer Chen <jennifchen@nvidia.com>
Signed-off-by: Jennifer Chen <jennifchen@nvidia.com>
# Megatron-Core names: mlp.shared_experts.linear_fc{1,2}
# - Mamba mixer linears (mixer.{in,out}_proj): FP8 per-tensor
# - KV cache: FP8
# - Attention linears ({q,k,v}_proj): BF16 (not quantized)
Can we double check attention out linear? IIRC, attention o_proj should be FP8.
Responded in Slack: only 2/9 attention layers had o_proj FP8 in the final Super NVFP4 ckpt, but we can always add it later to test whether the accuracy degradation is minimal.
Signed-off-by: Jennifer Chen <jennifchen@nvidia.com>
Signed-off-by: Jennifer Chen <jennifchen@nvidia.com>
Signed-off-by: Jennifer Chen <jennifchen@nvidia.com>
/claude review

@coderabbitai full review

✅ Actions performed: full review triggered.

@claude review
Signed-off-by: Jennifer Chen <jennifchen@nvidia.com>
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
modelopt/torch/export/unified_export_megatron.py (1)
1213-1229: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Record packed-expert quant metadata against the module prefix, not the weight key.
In both pack-remap helpers, `prefix` is the tensor key written into `state_dict` (for example `...weight`). Recording that verbatim produces `layer_config_dict` entries like `...weight.quantization`, and `process_layer_quant_config()` will then emit `quantized_layers` names ending in `.weight` instead of the HF module prefix. Serving-side layer matching will miss those packed experts.
Suggested fix
-        self._record_layer_quant_config(prefix, qformat, block_size)
+        module_prefix = prefix.rsplit(".", 1)[0] + "."
+        self._record_layer_quant_config(module_prefix, qformat, block_size)
Apply the same normalization in both `_pack_name_remapping()` and `_pack_name_remapping_gpt_oss()`.
Also applies to: 1280-1298
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/export/unified_export_megatron.py` around lines 1213 - 1229, The code records quant metadata under the full tensor key (e.g., "...weight") causing layer names to end with ".weight"; update both _pack_name_remapping and _pack_name_remapping_gpt_oss to normalize the prefix before calling self._record_layer_quant_config by stripping the tensor suffix (e.g., remove trailing ".weight" or the last dot component) so the module-level HF prefix is recorded instead of the weight key; apply the same normalization logic where self._record_layer_quant_config(prefix, qformat, block_size) is invoked in both functions.
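The normalization the comment asks for amounts to a single string operation; a minimal standalone sketch (the helper name and the sample keys are illustrative, not actual export keys):

```python
def module_prefix(tensor_key: str) -> str:
    """Strip the trailing tensor name ('weight' here) so quant metadata is recorded
    under the module prefix, e.g. '...experts.' rather than '...experts.weight'."""
    return tensor_key.rsplit(".", 1)[0] + "."

print(module_prefix("model.layers.0.mlp.experts.weight"))
# model.layers.0.mlp.experts.
```

Applying this before `self._record_layer_quant_config(...)` in both pack-remap helpers keeps the recorded names aligned with the HF module prefixes that serving-side matching expects.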
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4-amax.yaml`:
- Around line 16-36: The recipe metadata/comments misstate Latent MOE behavior:
fc1_latent_proj/fc2_latent_proj are described as FP8 per-tensor but the
quant_cfg (and lines noting "stay BF16/FP16") never enable those quantizers;
either update the human-readable description to say Latent MOE projections
remain BF16 (or FP16 per existing comment) to match quant_cfg, or enable FP8
per-tensor quantizers for fc1_latent_proj and fc2_latent_proj in the quant_cfg
so the comment matches behavior—make the change by editing the
metadata/description block and/or the quant_cfg entries that reference
fc1_latent_proj and fc2_latent_proj accordingly.
In
`@modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4-fp8-sweep-stride4.yaml`:
- Around line 34-35: The description erroneously claims the Latent MoE is "FP8
per-tensor" while the recipe does not enable quantizer patterns for
fc1_latent_proj or fc2_latent_proj; update the metadata description string (the
YAML "description" field) to remove or correct the FP8 statement for Latent MoE
(or explicitly state that fc1_latent_proj/fc2_latent_proj remain BF16/FP16) so
it matches the actual quantizer configuration for fc1_latent_proj and
fc2_latent_proj.
In
`@modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4-max-calib.yaml`:
- Around line 35-36: The description claims latent MoE is FP8 and that static
scales are "chosen by MSE", but the YAML sets no quantizers for
fc1_latent_proj/fc2_latent_proj and uses method: max; update the human-readable
description to match the actual config by either (A) enabling the latent
projection quantizers (fc1_latent_proj/fc2_latent_proj) to make latent MoE FP8,
or (B) explicitly state latent projections remain unquantized (not FP8) and
change the phrase "static scales are chosen by MSE" to reflect the configured
method (e.g., "static scales computed by max" or "method: max"); adjust the
description lines referencing "FP8 per-tensor; ... Latent MOE
fc1_latent_proj/fc2_latent_proj" and the sentence about static scale selection
accordingly so the text and the fields (fc1_latent_proj, fc2_latent_proj, and
method: max) are consistent.
In `@modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4.yaml`:
- Around line 35-36: The description string misreports the latent-MoE policy:
update the description (the YAML description field) to reflect that
fc1_latent_proj and fc2_latent_proj are left unquantized (per the quant_cfg)
rather than being "FP8 per-tensor"; edit the text around the FP8/KV/latent-MoE
sentence to explicitly state that fc1_latent_proj/fc2_latent_proj remain
unquantized (or their actual precision) and keep FP8 per-tensor/KV wording only
for the tensors that truly use FP8.
In `@modelopt/torch/export/unified_export_megatron.py`:
- Around line 815-818: The current check only treats qformat is None as
excluded, but QUANTIZATION_NONE should be treated the same so those layers are
recorded as excluded (and not dropped from the exported quant config); update
the conditional in unified_export_megatron.py where qformat is obtained from
_get_quantization_format(module) to also consider QUANTIZATION_NONE (e.g., if
qformat is None or qformat == QUANTIZATION_NONE) before calling
_record_excluded_module(prefix), and ensure any references to
_record_layer_quant_config still see these explicitly disabled quantizers as
excluded rather than omitted.
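The suggested conditional is small; sketched standalone below, with `QUANTIZATION_NONE` as a placeholder value since the real constant lives in the export code:

```python
QUANTIZATION_NONE = "none"  # placeholder; the export code defines the actual constant

def is_excluded(qformat):
    # Treat an explicitly disabled quantizer the same as a missing one, so the
    # layer is recorded as excluded instead of silently dropped from the config.
    return qformat is None or qformat == QUANTIZATION_NONE
```

With this shape, both unquantized modules and explicitly disabled quantizers flow into `_record_excluded_module(prefix)` rather than being omitted from the exported quant config.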
---
Outside diff comments:
In `@modelopt/torch/export/unified_export_megatron.py`:
- Around line 1213-1229: The code records quant metadata under the full tensor
key (e.g., "...weight") causing layer names to end with ".weight"; update both
_pack_name_remapping and _pack_name_remapping_gpt_oss to normalize the prefix
before calling self._record_layer_quant_config by stripping the tensor suffix
(e.g., remove trailing ".weight" or the last dot component) so the module-level
HF prefix is recorded instead of the weight key; apply the same normalization
logic where self._record_layer_quant_config(prefix, qformat, block_size) is
invoked in both functions.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 3adef0ee-1161-4068-a796-fc857c2455fd
📒 Files selected for processing (11)
modelopt/torch/export/unified_export_megatron.py
modelopt/torch/quantization/calib/mse.py
modelopt/torch/quantization/config.py
modelopt/torch/quantization/model_calib.py
modelopt/torch/quantization/plugins/megatron.py
modelopt/torch/quantization/qtensor/nvfp4_tensor.py
modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4-amax.yaml
modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4-fp8-sweep-stride4.yaml
modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4-max-calib.yaml
modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4.yaml
tests/gpu_megatron/torch/export/test_unified_export_megatron.py
Signed-off-by: Jennifer Chen <jennifchen@nvidia.com>
Signed-off-by: Jennifer Chen <jennifchen@nvidia.com>
Signed-off-by: Jennifer Chen <jennifchen@nvidia.com>
@CodeRabbit full review

✅ Actions performed: full review triggered.
Actionable comments posted: 1
🧹 Nitpick comments (2)
modelopt/torch/quantization/qtensor/nvfp4_tensor.py (1)
120-122: 💤 Low value
`_get_static_global_amax` is called twice for the static branch.
`_is_static_quantizer` internally calls `_get_static_global_amax` and discards the result; line 122 then calls it again. Consider hoisting to a single variable:
♻️ Proposed refactor
-        if cls._is_static_quantizer(weight_quantizer):
-            # Static path: use pre-computed per-block amax values from quantizer
-            global_amax = cls._get_static_global_amax(weight_quantizer).float()
+        global_amax = cls._get_static_global_amax(weight_quantizer)
+        if global_amax is not None:
+            # Static path: use pre-computed per-block amax values from quantizer
+            global_amax = global_amax.float()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/quantization/qtensor/nvfp4_tensor.py` around lines 120 - 122: Hoist the call to _get_static_global_amax so it's executed only once: call amax = cls._get_static_global_amax(weight_quantizer) (and .float() as needed) and reuse that value instead of calling _get_static_global_amax again; update _is_static_quantizer to accept an optional precomputed amax parameter or refactor _is_static_quantizer to avoid calling _get_static_global_amax internally so the static branch uses the hoisted amax (refer to _is_static_quantizer, _get_static_global_amax and weight_quantizer).
tests/gpu_megatron/torch/export/test_unified_export_megatron.py (1)
151-164: ⚡ Quick win
Add one end-to-end mixed-precision export case.
The new export path here is the per-layer `layer_config_dict`/`process_layer_quant_config` flow, but this matrix still only exercises uniform configs (`NVFP4_DEFAULT_CFG`/`FP8_DEFAULT_CFG`). A single mixed-recipe case would catch regressions in `quantized_layers`, excludes, and the `config.json` ↔ `hf_quant_config.json` parity you just tightened.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/gpu_megatron/torch/export/test_unified_export_megatron.py` around lines 151 - 164, Add a new parametrized test case to the existing pytest.mark.parametrize matrix (the tuple of ("model_type", "arch", "extra_module", "quant_config", "kv_cache_quant_cfg")) to exercise the per-layer mixed-precision export path: supply a quant_config that triggers the layer_config_dict / process_layer_quant_config flow (i.e., a mixed-recipe config rather than NVFP4_DEFAULT_CFG or FP8_DEFAULT_CFG) so the test exercises quantized_layers, excludes handling, and the config.json ↔ hf_quant_config.json parity; update the matrix alongside existing entries (referencing the parameters model_type/arch/extra_module/quant_config/kv_cache_quant_cfg) so one case uses a mixed per-layer config for either nemotron or llama.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4-max-calib.yaml`:
- Around line 45-79: The two routed-expert weight quantizer entries
('*mixer.experts.*weight_quantizer' and '*mlp.experts*weight_quantizer')
currently use type: dynamic; change them to use the static-weight calibration
path by replacing type: dynamic with type: static (or remove the dynamic setting
and ensure static is explicitly set) so routed-expert weights remain static in
this max-calib variant while leaving the corresponding input_quantizer entries
unchanged.
---
Nitpick comments:
In `@modelopt/torch/quantization/qtensor/nvfp4_tensor.py`:
- Around line 120-122: Hoist the call to _get_static_global_amax so it's
executed only once: call amax = cls._get_static_global_amax(weight_quantizer)
(and .float() as needed) and reuse that value instead of calling
_get_static_global_amax again; update _is_static_quantizer to accept an optional
precomputed amax parameter or refactor _is_static_quantizer to avoid calling
_get_static_global_amax internally so the static branch uses the hoisted amax
(refer to _is_static_quantizer, _get_static_global_amax and weight_quantizer).
In `@tests/gpu_megatron/torch/export/test_unified_export_megatron.py`:
- Around line 151-164: Add a new parametrized test case to the existing
pytest.mark.parametrize matrix (the tuple of ("model_type", "arch",
"extra_module", "quant_config", "kv_cache_quant_cfg")) to exercise the per-layer
mixed-precision export path: supply a quant_config that triggers the
layer_config_dict / process_layer_quant_config flow (i.e., a mixed-recipe config
rather than NVFP4_DEFAULT_CFG or FP8_DEFAULT_CFG) so the test exercises
quantized_layers, excludes handling, and the config.json ↔ hf_quant_config.json
parity; update the matrix alongside existing entries (referencing the parameters
model_type/arch/extra_module/quant_config/kv_cache_quant_cfg) so one case uses a
mixed per-layer config for either nemotron or llama.
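Sketched as plain data, the suggested widening of the test matrix might look like the snippet below. The config values are placeholders, not the repo's real objects; the point is simply that one row carries a per-layer (dict-shaped) config so the mixed-precision export path gets exercised:

```python
# Placeholders standing in for the real config objects in the test module.
NVFP4_DEFAULT_CFG = "nvfp4_default"
FP8_DEFAULT_CFG = "fp8_default"
MIXED_NVFP4_FP8_CFG = {  # hypothetical per-layer mixed recipe
    "*.mixer.experts.*": "nvfp4",
    "*.self_attention.*": "fp8",
}

# (model_type, arch, extra_module, quant_config, kv_cache_quant_cfg)
EXPORT_MATRIX = [
    ("llama", "LlamaForCausalLM", None, NVFP4_DEFAULT_CFG, None),
    ("llama", "LlamaForCausalLM", None, FP8_DEFAULT_CFG, None),
    # New row: triggers the layer_config_dict / process_layer_quant_config flow.
    ("nemotron", "NemotronHForCausalLM", None, MIXED_NVFP4_FP8_CFG, None),
]


def has_mixed_case(matrix):
    """True if at least one row carries a per-layer (dict) quant config."""
    return any(isinstance(row[3], dict) for row in matrix)
```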
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 94f3528f-2ee4-4757-b8de-3a323300ca74
📒 Files selected for processing (10)
- modelopt/torch/export/unified_export_megatron.py
- modelopt/torch/quantization/calib/mse.py
- modelopt/torch/quantization/config.py
- modelopt/torch/quantization/model_calib.py
- modelopt/torch/quantization/plugins/megatron.py
- modelopt/torch/quantization/qtensor/nvfp4_tensor.py
- modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4-fp8-sweep-stride4.yaml
- modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4-max-calib.yaml
- modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4.yaml
- tests/gpu_megatron/torch/export/test_unified_export_megatron.py
```yaml
# MoE routed experts -> NVFP4 W4A4, block_size 16, e4m3 scale.
# HF/export names: backbone.layers.*.mixer.experts.*.{up,down}_proj.
- quantizer_name: '*mixer.experts.*weight_quantizer'
  enable: true
  cfg:
    block_sizes:
      -1: 16
      type: dynamic
      scale_bits: e4m3
    num_bits: e2m1
- quantizer_name: '*mixer.experts.*input_quantizer'
  enable: true
  cfg:
    block_sizes:
      -1: 16
      type: dynamic
      scale_bits: e4m3
    num_bits: e2m1
# Megatron-Core/PTQ names: decoder.layers.*.mlp.experts.local_experts.*.linear_fc{1,2}.
- quantizer_name: '*mlp.experts*weight_quantizer'
  enable: true
  cfg:
    block_sizes:
      -1: 16
      type: dynamic
      scale_bits: e4m3
    num_bits: e2m1
- quantizer_name: '*mlp.experts*input_quantizer'
  enable: true
  cfg:
    block_sizes:
      -1: 16
      type: dynamic
      scale_bits: e4m3
    num_bits: e2m1
```
Keep routed-expert weights static in the max-calib variant.
These two `*...weight_quantizer` entries are `type: dynamic`, so this recipe does more than swap `method: mse` for `method: max`; it changes routed-expert weights to dynamic NVFP4 and bypasses the static-weight calibration path entirely. That makes this variant a different quantization recipe, not a max-calibrated version of super-nvfp4.yaml.
Suggested fix

```diff
 - quantizer_name: '*mixer.experts.*weight_quantizer'
   enable: true
   cfg:
     block_sizes:
-      type: dynamic
+      type: static
       scale_bits: e4m3
     num_bits: e2m1
@@
 - quantizer_name: '*mlp.experts*weight_quantizer'
   enable: true
   cfg:
     block_sizes:
-      type: dynamic
+      type: static
       scale_bits: e4m3
     num_bits: e2m1
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```yaml
# MoE routed experts -> NVFP4 W4A4, block_size 16, e4m3 scale.
# HF/export names: backbone.layers.*.mixer.experts.*.{up,down}_proj.
- quantizer_name: '*mixer.experts.*weight_quantizer'
  enable: true
  cfg:
    block_sizes:
      -1: 16
      type: static
      scale_bits: e4m3
    num_bits: e2m1
- quantizer_name: '*mixer.experts.*input_quantizer'
  enable: true
  cfg:
    block_sizes:
      -1: 16
      type: dynamic
      scale_bits: e4m3
    num_bits: e2m1
# Megatron-Core/PTQ names: decoder.layers.*.mlp.experts.local_experts.*.linear_fc{1,2}.
- quantizer_name: '*mlp.experts*weight_quantizer'
  enable: true
  cfg:
    block_sizes:
      -1: 16
      type: static
      scale_bits: e4m3
    num_bits: e2m1
- quantizer_name: '*mlp.experts*input_quantizer'
  enable: true
  cfg:
    block_sizes:
      -1: 16
      type: dynamic
      scale_bits: e4m3
    num_bits: e2m1
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4-max-calib.yaml`
around lines 45 - 79, The two routed-expert weight quantizer entries
('*mixer.experts.*weight_quantizer' and '*mlp.experts*weight_quantizer')
currently use type: dynamic; change them to use the static-weight calibration
path by replacing type: dynamic with type: static (or remove the dynamic setting
and ensure static is explicitly set) so routed-expert weights remain static in
this max-calib variant while leaving the corresponding input_quantizer entries
unchanged.
```python
if hasattr(self, "kv_cache_dtype"):
    self._hf_quant_config["quantization"]["kv_cache_quant_algo"] = self.kv_cache_dtype
# Use one serving-facing config for both hf_quant_config.json and config.json.
self._hf_quant_config = convert_hf_quant_config_format(raw_hf_quant_config)
```
do we change the format of hf_quant_config.json by this change?
What does this PR do?
Type of change: New recipe
- Recipes mirroring the published `hf_quant_config.json`
- Fix `_global_amax` field in `NVFP4QTensor` static quantizer detection -> fixes bug during MCore export for MSE
- Fix the dynamic check when `block_sizes` is dict-backed
- Add `fp8_scale_sweep_stride` to optionally subsample NVFP4 FP8 scale sweep candidates
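As a rough illustration of what a sweep stride does (the helper and candidate list below are made up for this sketch, not the actual calibration code):

```python
# Hypothetical sketch of stride-based subsampling of FP8 scale-sweep
# candidates; stride=1 preserves the exhaustive default behavior.
def sweep_candidates(candidates, stride=1):
    """Keep every `stride`-th candidate from the sweep list."""
    if stride <= 1:
        return list(candidates)
    return list(candidates)[::stride]


all_scales = [2.0**e for e in range(-8, 8)]  # 16 made-up scale candidates
```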
The dynamic block check previously used attribute access and failed for dict-backed `block_sizes`, so dynamic block quantizers could incorrectly enter the MoE amax completeness/sync paths. The FP8 sweep stride keeps the default exhaustive behavior while giving recipes a controlled way to reduce NVFP4 weight scale search time.
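The failure mode can be sketched in isolation; the function names and config shape below are illustrative stand-ins, not the real ModelOpt API:

```python
# Minimal sketch of the dict-backed block_sizes bug described above.
def is_dynamic_block_buggy(block_sizes):
    # Attribute-style check: works for a config object exposing .type,
    # but silently returns False for a plain dict.
    return getattr(block_sizes, "type", None) == "dynamic"


def is_dynamic_block_fixed(block_sizes):
    # Handle both dict-backed and attribute-backed configurations.
    if isinstance(block_sizes, dict):
        return block_sizes.get("type") == "dynamic"
    return getattr(block_sizes, "type", None) == "dynamic"


dict_backed = {-1: 16, "type": "dynamic", "scale_bits": "e4m3"}
```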
Testing
```shell
python3 -m py_compile modelopt/torch/quantization/model_calib.py
git diff --check -- modelopt/torch/quantization/model_calib.py
```

Super recipe
Mirrors the published nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4 hf_quant_config.json:
rest: not quantized
Usage
```python
# Add a code snippet demonstrating how to use this
```

Testing
TODO test in HF and MCore PTQ on Nemotron model
Before your PR is "Ready for review"
- Make sure you read and follow Contributor guidelines and your commits are signed (`git commit -s -S`).
- Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded `trust_remote_code=True`, `torch.load(..., weights_only=False)`, `pickle`, etc.).
- CONTRIBUTING.md: ✅ / ❌ / N/A

Additional Information
Summary by CodeRabbit
New Features
Improvements
Tests