Support YAML quant recipe in PTQ and remove first/last layer modifier code #4503
jenchen13 merged 4 commits into NVIDIA:main
Conversation
Signed-off-by: Jennifer Chen <jennifchen@nvidia.com>
This PR has been automatically converted to draft because all PRs must start as drafts. When you are ready for review, click Ready for Review to begin the review process. This will:
See the contribution guide for more details.

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here.

/ok to test 1aa8e99
ChenhanYu left a comment
Thanks for moving layer skipping into the recipe — much cleaner than the imperative get_first/last_layers_disabled_config helpers. Requesting changes for one blocker plus a few smaller items.
🚨 Attention required — public-CLI contract change
This PR removes three documented flags (--num-first-layers-to-skip-quant, --num-last-layers-to-skip-quant, --force-all-expert-routing) with no deprecation window. Worth calling out in the PR description / release notes, not buried in a refactor. Suggest keeping the flags one cycle with a DeprecationWarning pointing users at the recipe equivalent.
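A one-cycle shim along those lines could look like this (hypothetical sketch, not code from the PR; the flag names are the ones being removed, everything else is illustrative):

```python
# Hypothetical deprecation shim for quantize.py: keep the removed flags
# parseable for one release and warn users toward the recipe equivalent.
# Only the flag names come from the PR; the helper names are made up.
import argparse
import warnings


def add_deprecated_skip_flags(parser: argparse.ArgumentParser) -> None:
    """Re-register the removed flags so old invocations still parse."""
    for flag in ("--num-first-layers-to-skip-quant", "--num-last-layers-to-skip-quant"):
        parser.add_argument(
            flag, type=int, default=None,
            help="DEPRECATED: express layer skipping in the --recipe YAML instead.",
        )


def warn_if_deprecated(args: argparse.Namespace) -> None:
    """Emit a DeprecationWarning for each deprecated flag actually used."""
    for dest in ("num_first_layers_to_skip_quant", "num_last_layers_to_skip_quant"):
        if getattr(args, dest, None) is not None:
            warnings.warn(
                f"--{dest.replace('_', '-')} is deprecated; "
                "skip layers in the YAML recipe passed via --recipe instead.",
                DeprecationWarning,
            )
```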
🔗 Upstream dependency not yet closed
The recipe refactoring on the ModelOpt side (NVIDIA/Model-Optimizer#1253) doesn't look merged/closed yet. from modelopt.recipe import ModelOptPTQRecipe, load_recipe will fail against any modelopt build that predates that PR. Please confirm the minimum modelopt version this depends on, gate or pin accordingly, and ideally land/tag #1253 first so CI here isn't fragile.
🐛 Blocker — quantize.py will crash on the non-recipe path
Lines 221–232 of examples/post_training/modelopt/quantize.py (untouched by this PR) still do:

if args.num_first_layers_to_skip_quant is not None:
    mtq_config = get_first_layers_disabled_config(...)
if args.num_last_layers_to_skip_quant is not None:
    mtq_config = get_last_layers_disabled_config(...)

But this PR removed both argparse args (so args.num_first_layers_to_skip_quant doesn't exist) and both helper functions. The first call to get_modelopt_torch_quantization_config() without --recipe will hit AttributeError, and even with the attrs stubbed the helpers are gone (NameError). Please either delete this block too or restore the helpers/args. Add a smoke test that runs the legacy --export-quant-cfg NVFP4_DEFAULT_CFG path so this doesn't regress.
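If the "restore the helpers/args" route is taken, a defensive variant of that block could be sketched like this (hypothetical, not the PR's code; the helpers dict stands in for the removed get_first/last_layers_disabled_config functions):

```python
# Defensive sketch of the legacy layer-skip path: degrade cleanly whether
# or not the argparse attributes and helper functions exist. The function
# and its 'helpers' parameter are illustrative, not from the PR.
def apply_legacy_layer_skips(args, mtq_config, helpers):
    """helpers: dict with optional 'first'/'last' callables taking
    (config, num_layers) and returning a new config."""
    n_first = getattr(args, "num_first_layers_to_skip_quant", None)
    if n_first is not None and helpers.get("first"):
        mtq_config = helpers["first"](mtq_config, n_first)
    n_last = getattr(args, "num_last_layers_to_skip_quant", None)
    if n_last is not None and helpers.get("last"):
        mtq_config = helpers["last"](mtq_config, n_last)
    return mtq_config
```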
🧪 Test coverage request
Before merging, please run this end-to-end against the nmm-sandbox PTQ and QAD test suites — both the new --recipe path and the legacy --export-quant-cfg path (the one currently broken above). The shell-side dispatch in quantize.sh also needs both branches exercised.
Inline notes
examples/post_training/modelopt/quantize.py ~L170 — assert isinstance(recipe, ModelOptPTQRecipe) is stripped under python -O. For user-facing input validation, prefer:

if not isinstance(recipe, ModelOptPTQRecipe):
    raise TypeError(f"Expected PTQ recipe, but got {type(recipe).__name__} from {args.recipe}")

examples/post_training/modelopt/quantize.py ~L173–174 — only --export-kv-cache-quant gets a "being ignored" warning, but the --recipe help text says both --export-quant-cfg and --export-kv-cache-quant are ignored. If a user passes a non-default --export-quant-cfg together with --recipe, they get no feedback. Suggest one print_rank_0 that lists every overridden arg. Also, the f-string is ~120 chars — likely to fail autoformat.sh (black, 100-col limit).
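A consolidated warning along those lines could be sketched as follows (hypothetical helper; print_rank_0 is injected as print_fn so the sketch stays self-contained, and the defaults dict is an assumption):

```python
# Sketch of a single "overridden by --recipe" notice that names every
# CLI arg the recipe overrides. Helper name and defaults mapping are
# illustrative, not the PR's actual code.
def warn_overridden_by_recipe(args, defaults, print_fn=print):
    """defaults: dest name -> argparse default value. Emits one message
    listing each arg whose value differs from its default."""
    overridden = [
        f"--{name.replace('_', '-')}={getattr(args, name)!r}"
        for name, default in defaults.items()
        if getattr(args, name, default) != default
    ]
    if overridden:
        print_fn("--recipe is set; ignoring CLI args: " + ", ".join(overridden))
```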
examples/post_training/modelopt/quantize.py L5 — import copy is now unused (only used by the removed helpers). Please remove.
examples/post_training/modelopt/quantize.sh L23–32 — heuristic (/ or .yaml/.yml → recipe) is reasonable since the predefined configs are uppercase-only identifiers, but please add a one-line comment noting that contract so a future contributor doesn't add a config like models/foo and silently break it.
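For reference, that contract reads roughly like this in Python (illustrative restatement of the shell heuristic, not code from the PR):

```python
# Illustrative restatement of the quantize.sh dispatch contract noted
# above: predefined configs are uppercase-only identifiers, so any
# argument containing "/" or ending in .yaml/.yml is treated as a
# recipe path. Function name is hypothetical.
def is_recipe_arg(value: str) -> bool:
    return "/" in value or value.endswith((".yaml", ".yml"))
```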
|
/claude review |
Two issues found:
- Syntax error (blocker): line 170 in quantize.py has a comma instead of a colon on the if not isinstance(...) line, which will cause a SyntaxError at import time.
- Dead import: import copy on line 5 is no longer used after removing the layer-skip helper functions.
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
/claude review

/ok to test 0610539

🔄 Merge queue validation started! You can track the progress here: https://github.com/NVIDIA/Megatron-LM/actions/runs/25181011042
What does this PR do?
Support YAML quant recipe in PTQ, which eliminates the need for code that modifies first/last layers in the quant config. Users can now skip layers directly in the YAML, e.g.
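As an illustration of what "skip layers as data" means, a Python sketch of the wildcard-disable patterns the removed helpers used to build imperatively (key names are assumptions, NOT the actual ModelOpt recipe schema):

```python
# Illustrative only: express first/last-layer skipping as data rather
# than imperative config edits. The pattern/key names below are
# assumptions, not the real ModelOpt recipe schema.
def disabled_layer_patterns(num_first: int, num_last: int, total_layers: int):
    """Wildcard patterns marking the first/last N decoder layers as
    quantization-disabled."""
    skipped = list(range(num_first)) + list(
        range(total_layers - num_last, total_layers)
    )
    return {f"*decoder.layers.{i}.*": {"enable": False} for i in skipped}
```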
- --recipe flag: ... quantize.sh <model> NVFP4_DEFAULT_CONFIG or ... quantize.sh <model> /path/to/modelopt_recipe
- Removes --force-all-expert-routing, which is a no-op

Deprecation
Deprecating these flags in quantize.py: --num-first-layers-to-skip-quant, --num-last-layers-to-skip-quant, --force-all-expert-routing
- --force-all-expert-routing was already a no-op, in the process of deprecation
- The YAML recipe replaces --num-first-layers-to-skip-quant and --num-last-layers-to-skip-quant for skipping layers

Issue tracking
For PRs from open-source community contributors:
Linked issue:
Contribution process
Pre-checks
Code review
Feel free to message or comment @mcore-oncall to help accelerate your merge into main. The less complex your PR is, the faster it will be approved and merged!
All PRs start as draft. If you open a non-draft PR, it will be automatically converted to draft.
Step 1: Mark PR as "Ready for Review"
.github/CODEOWNERS. Final Review might get declined if these requirements are not fulfilled.
Step 2: Final Review
For PRs that change megatron/core, once all expert reviewers have approved, the Final Review label is applied automatically and final reviewers are assigned. For PRs outside megatron/core, this step is skipped.
Step 3: Approved
Once all required reviewers have approved, the Approved label is applied automatically.
Merge
Any member of mcore-engineers will be able to merge your PR.
For MRs into `dev` branch
The proposed review process for the `dev` branch is under active discussion. MRs are mergeable after one approval by either eharper@nvidia.com or zijiey@nvidia.com.