Add a general composable $import system for YAML configs, and use it to implement composable recipes#1253
#1253 · shengliangxu wants to merge 44 commits into main
Conversation
📝 **Walkthrough**

Adds a composable YAML imports system (`$import`).
**Sequence Diagram**

```mermaid
sequenceDiagram
    participant U as User
    participant L as modelopt.recipe.loader
    participant R as _load_raw_config
    participant S as _resolve_imports
    participant FS as Files / Builtins
    U->>L: load_recipe(path)
    L->>R: _load_raw_config(path)
    R->>FS: read YAML (file or builtin)
    FS-->>R: raw documents
    R-->>L: parsed raw config (may include imports)
    alt imports present
        L->>S: _resolve_imports(parsed_config)
        S->>R: _load_raw_config(import_path)
        R->>FS: read imported snippet
        FS-->>R: snippet content
        R-->>S: snippet (dict/list/_list_content)
        S->>S: merge / splice / detect circular refs
        S-->>L: resolved config
    end
    L-->>U: final resolved recipe
```
Estimated code review effort: 🎯 4 (Complex), ⏱️ ~60 minutes. Pre-merge checks: 4 passed.
Commit description:
Introduce an import mechanism that lets recipe YAML files reference reusable
config snippets by name, reducing duplication across recipes.
Syntax:

```yaml
imports:
  fp8: configs/numerics/fp8
  base_disable_all: configs/ptq/base_disable_all

quant_cfg:
  - base_disable_all              # string entry → replaced with imported dict or spliced list
  - quantizer_name: '*weight_quantizer'
    cfg: fp8                      # string cfg → replaced with imported dict
```
Features:
- Dict-based imports (keys are names, values are config paths) — no name conflicts
- Three resolution modes: string cfg value, string list entry (dict), string list entry (list splice)
- Recursive resolution with circular import detection
- Path resolution via load_config (built-in library first, then filesystem)
- Works with both single-file and directory recipe formats
New reusable config snippets (modelopt_recipes/configs/):
- numerics/fp8.yml, nvfp4_dynamic.yml, nvfp4_static.yml
- ptq/base_disable_all.yaml, default_disabled_quantizers.yaml
All 6 built-in PTQ recipes converted to use imports, reducing each by ~30 lines.
Pre-commit hook updated to skip configs/ directory and allow string entries in
quant_cfg. load_config() now accepts YAML lists for list-valued snippets.
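The three resolution modes and the circular-import detection described above can be sketched in a few lines. This is a hypothetical simplification for illustration only, not the PR's actual `_resolve_imports` implementation; the `resolve` function and its signature are invented here.

```python
def resolve(node, snippets, seen=()):
    """Toy resolver: replace string references to `snippets` with their content.

    String list entries naming a list snippet are spliced; string values naming
    a dict snippet are replaced; the `seen` chain detects circular imports.
    """
    if isinstance(node, dict):
        return {k: resolve(v, snippets, seen) for k, v in node.items()}
    if isinstance(node, list):
        out = []
        for item in node:
            resolved = resolve(item, snippets, seen)
            # A string entry that named a list snippet gets spliced in place.
            if isinstance(item, str) and item in snippets and isinstance(resolved, list):
                out.extend(resolved)
            else:
                out.append(resolved)
        return out
    if isinstance(node, str) and node in snippets:
        if node in seen:
            raise ValueError(f"circular import: {' -> '.join(seen + (node,))}")
        return resolve(snippets[node], snippets, seen + (node,))
    return node
```

With `snippets` mapping `fp8` to a dict and `base_disable_all` to a list, a bare `base_disable_all` entry in `quant_cfg` splices the list while `cfg: fp8` becomes the imported dict, matching the commit description's three modes.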
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
Force-pushed from ea910ae to 82d5a12.
**ChenhanYu** left a comment (Review):
The problem is real — 6 recipes duplicating the same 14-entry disable list and NVFP4 format definition is a maintenance nightmare. The $import system is well-designed (scoped, recursive, circular detection) and the test coverage is excellent (837 lines). The before/after on nvfp4_experts_only (75 → 18 lines) is compelling.
However, before we introduce a custom config composition system, I want to understand why we can't reuse what we already have:
1. Why not OmegaConf?
We already depend on OmegaConf — EAGLE training (main.py, PR #1134) uses it with __base__ inheritance and OmegaConf.merge() for config composition. OmegaConf natively supports:
- Structured configs with inheritance (`defaults:` list)
- Variable interpolation (`${numerics.fp8}`)
- Merge with override semantics
The $import system reimplements composition with different syntax ($import: markers, --- multi-doc for list snippets, scoped import namespaces). This means we now have two config composition systems in the codebase — OmegaConf for speculative decoding recipes and $import for quantization recipes. Contributors need to learn both.
Could the same snippet library (configs/numerics/, configs/ptq/units/) work with OmegaConf's native composition instead of a custom resolver?
2. Why not a Python factory?
The *_CFG dicts are already Python. A factory approach like:
def build_ptq_config(format="nvfp4", kv="fp8", experts_only=False):
cfg = BASE_DISABLE_ALL + DEFAULT_DISABLED
if experts_only:
cfg += expert_quantizers(format)
else:
cfg += all_linear_quantizers(format)
if kv:
cfg += kv_quantizers(kv)
return cfgwould be more debuggable (breakpoints, type checking, IDE navigation) and avoids YAML-specific complexity (multi-doc --- separator, list splicing semantics). This is essentially what nemo_run does with run.Config — Python factories with composable config objects.
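To make the factory sketch concrete, here is a self-contained, runnable version. All names and attribute values below are illustrative stand-ins, not ModelOpt's real presets or helpers.

```python
# Illustrative stand-ins for the snippet constants and helpers in the sketch
# above; none of these are ModelOpt's actual data.
BASE_DISABLE_ALL = [{"quantizer_name": "*", "enable": False}]
DEFAULT_DISABLED = [{"quantizer_name": "*lm_head*", "enable": False}]

_FORMATS = {"nvfp4": {"num_bits": "e2m1"}, "fp8": {"num_bits": "e4m3"}}

def expert_quantizers(fmt):
    return [{"quantizer_name": "*mlp.experts*weight_quantizer", "enable": True, "cfg": _FORMATS[fmt]}]

def all_linear_quantizers(fmt):
    return [{"quantizer_name": "*weight_quantizer", "enable": True, "cfg": _FORMATS[fmt]}]

def kv_quantizers(fmt):
    return [{"quantizer_name": "*[kv]_bmm_quantizer", "enable": True, "cfg": _FORMATS[fmt]}]

def build_ptq_config(format="nvfp4", kv="fp8", experts_only=False):
    cfg = BASE_DISABLE_ALL + DEFAULT_DISABLED
    cfg += expert_quantizers(format) if experts_only else all_linear_quantizers(format)
    if kv:
        cfg += kv_quantizers(kv)
    return {"quant_cfg": cfg}
```

`build_ptq_config(experts_only=True)` then yields the base exclusions followed by the expert quantizer and a KV entry, all composed by plain list concatenation.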
The YAML approach has advantages (user-editable, diffable, declarative), but our recipes are primarily consumed programmatically by mtq.quantize(), not hand-edited by end users.
3. Migration path
If we go with $import, what's the end state? Do all hardcoded *_CFG dicts eventually move to YAML? Or do we maintain both Python dicts and YAML presets indefinitely? The PR removes 51 lines from config.py but the Python API (mtq.FP8_DEFAULT_CFG) still needs to work — how does that load path change?
I'd like to hear the rationale before approving. The implementation quality is high — this is about whether we want a second composition system in the codebase vs. reusing OmegaConf or Python factories.
Thanks @ChenhanYu — these are legitimate questions. Point-by-point reply below:

**1. OmegaConf**

Two points on the premise before the substance:

**Inheritance vs Composition**

I actually did explore a `__base__`-style inheritance approach. Inheritance answers "B is a specialization of A; override these fields." It expresses a single parent + overrides:

```yaml
# inheritance flavor
__base__: fp8_default
quant_cfg:
  override_field: new_value
```

The configs we target aren't shaped that way. Concretely,
None of these four is the parent of another. They're siblings. The "base" the author has in mind is the combination, not any single snippet.

```yaml
# composition flavor (what this PR lands)
imports:
  base_disable_all: configs/ptq/units/base_disable_all
  fp8: configs/numerics/fp8
  fp8_kv: configs/ptq/units/fp8_kv
  default_disabled: configs/ptq/units/default_disabled_quantizers
quant_cfg:
  - $import: base_disable_all     # splice list snippet
  - quantizer_name: "*weight_quantizer"
    cfg: { $import: fp8 }         # merge dict snippet
  - quantizer_name: "*input_quantizer"
    cfg: { $import: fp8 }         # re-use same snippet
  - $import: fp8_kv               # splice another list snippet
  - $import: default_disabled     # splice one more
```

To express the same recipe as inheritance you'd have to pick one snippet as the "base" and encode the others as overrides on top of it — which doesn't match the actual structure (no snippet is a specialization of another) and forces authors to invent an artificial ordering. Two concrete capabilities fall out of composition that inheritance can't provide cleanly:
Where I agree: the "two composition systems" concern is legitimate — and the plan is to converge. Once this PR merges, we'll migrate the existing flat YAML configs (EAGLE, DFlash, etc.) onto the `$import` system.

**2. Python factory**

Agreed that a factory wins on debuggability. But I want to reframe the choice before arguing specifics.

So the question isn't "YAML vs. Python factory" — it's "what belongs at each layer." My claim: structural composition within a single config is YAML's sweet spot. Two side points on your specific examples:

**3. Migration path**

Direct answer:
**cjluo-nv** left a comment:

Reviewed the general/ptq and the refactor. LGTM.
> ``$import: fp8`` under ``cfg`` is a **dict value** — the snippet (a YAML dict of quantizer attributes) replaces the ``cfg`` field.

This is a bit confusing. Should we be doing something like this?
```yaml
quantize:
  algorithm: max
  quant_cfg:
    - $import: base_disable_all     # spliced from a single-element list snippet
    - quantizer_name: '*weight_quantizer'
      $import: fp8                  # cfg value replaced with imported dict
    - $import: default_disabled     # spliced from a multi-element list snippet
```
No, unless the fp8 snippet is defined using:

```yaml
cfg:
  num_bits: e4m3
  axis:
```

I don't think we want to do it, because cfg includes both the base format and many other components:
```python
{
    "num_bits": 8,            # 8-bit integer quantization
    "axis": None,             # per-tensor scale (no per-channel axis)
    "fake_quant": True,       # simulate quantization in forward pass (PTQ / QAT)
    "unsigned": False,        # signed integer range, e.g. [-128, 127] for INT8
    "narrow_range": False,    # full range; True would restrict to [-127, 127] for INT8
    "type": "static",         # static calibration (not dynamic per-inference)
    "block_sizes": None,      # no block quantization; set for NF4 / MXFP formats
    "bias": None,             # no affine bias correction
    "calibrator": "max",      # use max-abs calibration to determine amax
    "rotate": False,          # no Hadamard rotation (QuaRot / SpinQuant)
    "pass_through_bwd": True, # straight-through estimator for QAT gradients
    "trt_high_precision_dtype": "Float",  # cast QDQ nodes to fp32 for TRT StronglyTyped export
    "backend": None,          # use the built-in quantization backend
    "backend_extra_args": None,  # no extra args for custom backends
    "use_constant_amax": False,  # calibrate amax; True hard-codes FP8 E4M3 max (448.0)
}
```
Using the current design, you can configure beyond just the base format:

```yaml
quantize:
  algorithm: max
  quant_cfg:
    - $import: base_disable_all     # spliced from a single-element list snippet
    - quantizer_name: '*weight_quantizer'
      cfg:
        $import: fp8                # base format
        rotate: true                # example: more config beyond the base format
        $import: other_components
    - $import: default_disabled     # spliced from a multi-element list snippet
```
Thanks @shengliangxu — the composition-vs-inheritance distinction is well-argued, and I agree that these configs are horizontal bags of independent pieces, not a parent-child chain. That point is convincing. However, your nemo_run response actually strengthens the case for using it rather than building our own. This describes exactly what we want — it's what nemo_run already provides.
Your recipe could be expressed as:

```python
def fp8_default_recipe():
    return run.Config(
        PTQRecipe,
        quant_cfg=[
            base_disable_all(),
            weight_quantizer(format=fp8()),
            input_quantizer(format=fp8()),
            fp8_kv(),
            default_disabled(),
        ],
    )
```

Which serializes to:

```yaml
_target_: modelopt.torch.quantization.PTQRecipe
quant_cfg:
  - _target_: modelopt.torch.quantization.QuantEntry
    quantizer_name: '*'
    enable: false
  - _target_: modelopt.torch.quantization.QuantEntry
    quantizer_name: '*weight_quantizer'
    cfg:
      num_bits: e4m3
  - _target_: modelopt.torch.quantization.QuantEntry
    quantizer_name: '*input_quantizer'
    cfg:
      num_bits: e4m3
  - _target_: modelopt.torch.quantization.QuantEntry
    quantizer_name: '*[kv]_bmm_quantizer'
    cfg:
      num_bits: e4m3
```

The YAML is still readable and diffable. That said, I see the tradeoff with nemo_run's serialized YAML. I'm not blocking this PR — the implementation quality is high and the problem is real. But before we commit to a custom DSL:
Follow-up: I looked into nemo_run's `_factory_` mechanism. Side-by-side:

| | `$import` | `_factory_` |
|---|---|---|
| References resolve to | YAML snippet files | Python factory functions |
| Debuggable | No | Yes (breakpoints, IDE navigation) |
| Type-checked | No | Yes |
| Overridable at any level | Only at YAML level | At any level — override the top-level factory, a mid-level factory, or a leaf numeric format |
| Custom DSL | Yes (`$import`, `---` multi-doc, list splicing) | No (standard nemo_run `_factory_`) |
| Already in our ecosystem | No (new) | Yes (tools/launcher uses it for Slurm configs) |
**Nested factory overrides — the key advantage**

With `_factory_`, composition is overridable at every level.

Leaf factory (numeric format — defined once):

```python
@run.cli.factory
def nvfp4():
    return {"num_bits": "e2m1", "block_sizes": {-1: 16, "type": "dynamic", "scale_bits": "e4m3"}}
```

Mid-level factory (expert quantizer pattern):

```python
@run.cli.factory
def experts_quantizer(pattern="*mlp.experts*", numeric=nvfp4):
    return [
        {"quantizer_name": f"{pattern}weight_quantizer", "enable": True, "cfg": numeric()},
        {"quantizer_name": f"{pattern}input_quantizer", "enable": True, "cfg": numeric()},
    ]
```

Top-level factory:

```python
@run.cli.factory
def nvfp4_experts_only_fp8_kv():
    return PTQRecipe(
        quant_cfg=[*base_disable_all(), *experts_quantizer("*mlp.experts*", nvfp4), ...]
    )
```

User can operate at any level of abstraction:
```yaml
# Level 1: Use the whole recipe as-is (1 line)
_factory_: nvfp4_experts_only_fp8_kv
```

```yaml
# Level 2: See and customize the composition (same length as $import)
_factory_: ptq_recipe
quant_cfg:
  - _factory_: base_disable_all
  - quantizer_name: '*mlp.experts*weight_quantizer'
    cfg: { _factory_: nvfp4 }
  - quantizer_name: '*mlp.experts*input_quantizer'
    cfg: { _factory_: nvfp4 }
  - _factory_: fp8_kv
  - _factory_: default_disabled
```

```yaml
# Level 3: Override a leaf — swap NVFP4 for FP8 on one quantizer
_factory_: ptq_recipe
quant_cfg:
  - _factory_: base_disable_all
  - quantizer_name: '*mlp.experts*weight_quantizer'
    cfg: { _factory_: fp8 }       # ← changed from nvfp4 to fp8
  - quantizer_name: '*mlp.experts*input_quantizer'
    cfg: { _factory_: nvfp4 }
  - _factory_: fp8_kv
  - _factory_: default_disabled
```

**CLI overrides — nemo_run's dotlist syntax**
nemo_run supports OmegaConf-style dotlist overrides from the command line, which compose on top of `_factory_` resolution. This means any nested field can be overridden without editing the YAML at all:

```shell
# Use the standard recipe, override calibration size
python quantize.py --factory nvfp4_experts_only_fp8_kv \
    calibration.calib_size=512

# Override a nested numeric format at the leaf level
python quantize.py --factory nvfp4_experts_only_fp8_kv \
    quant_cfg.1.cfg.num_bits=e4m3 \
    quant_cfg.1.cfg.block_sizes='{-1: 128}'

# Swap the expert quantizer pattern to target a different module
python quantize.py --factory nvfp4_experts_only_fp8_kv \
    quant_cfg.1.quantizer_name='*moe.experts*weight_quantizer'
```

With `$import`, CLI overrides would require a separate OmegaConf layer on top — two systems working together. With `_factory_`, it's built in: the factory produces the config, dotlist overrides patch it, done. One system, end to end.
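To make the dotlist semantics concrete without pulling in nemo_run or OmegaConf, here is a hypothetical pure-Python sketch of what such an override pass does. The `apply_dotlist` helper is invented for illustration; the real implementation lives in OmegaConf.

```python
import ast

def apply_dotlist(cfg, overrides):
    """Apply 'a.0.b=value' style overrides to nested dicts/lists in place.

    Integer path segments index into lists; values are parsed as Python
    literals when possible, otherwise kept as strings (like 'e4m3').
    """
    for item in overrides:
        path, _, raw = item.partition("=")
        keys = path.split(".")
        node = cfg
        for key in keys[:-1]:
            node = node[int(key)] if isinstance(node, list) else node[key]
        try:
            value = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            value = raw  # bare words stay strings
        last = keys[-1]
        if isinstance(node, list):
            node[int(last)] = value
        else:
            node[last] = value
    return cfg
```

For example, `apply_dotlist(cfg, ["quant_cfg.1.cfg.num_bits=e4m3"])` patches the second `quant_cfg` entry, mirroring the CLI examples above.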
Again — not blocking this PR. But I think _factory_ achieves the same deduplication goals with less custom infrastructure and more flexibility. Worth considering before we commit to a custom DSL.
> ```python
> # ---------------------------------------------------------------------------
> # $import resolution
> # ---------------------------------------------------------------------------
> ```

nit: This is one of the Claude comment styles I do not like.

This comment is simply adding redundancy IMO.
> ```yaml
> # ``axis: null`` is explicit to match the hardcoded ``FP8_DEFAULT_CFG`` shape —
> # downstream code that keys on ``"axis" in cfg`` sees the same dict layout.
> num_bits: e4m3
> axis:
> ```

How can we specify None explicitly?

It's general YAML; these are equivalent:

```yaml
axis: ~
axis: null
axis:
```
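The equivalence is easy to confirm with PyYAML (assuming `pyyaml` is available in the environment):

```python
import yaml

# The three spellings above all parse to Python None; quoting the word
# produces the literal string "null" instead.
forms = ["axis: ~", "axis: null", "axis:"]
print([yaml.safe_load(f)["axis"] for f in forms])   # all None
print(yaml.safe_load("axis: 'null'")["axis"])       # the string "null"
```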
> # NVFP4 E2M1 blockwise quantizer attributes with FP8 E4M3 scales (static calibration).

Suggested change:

> # NVFP4 E2M1 blockwise quantizer attributes with FP8 E4M3 scales (used for NVFP4 weights since weight scales can be static).
Can we change this name to kv_fp8? I am trying to have a good naming convention:

- numeric presets are the basics: `nvfp4`, `mxfp4`, `fp8`
- operator-level presets live above that (format: `{operator_name}_{numeric_format}`): `weight_act_nvfp4_nvfp4`, `weight_act_fp8_fp8`, `weight_act_nvfp4_f16`, `weight_act_mxfp4_fp8`, `kv_nvfp4`, `kv_mxfp4`, `kv_fp8`
Let's use an example to compare more clearly, side-by-side: the scaffolding required before any factory can register, the numeric leaf, a list snippet with an inline import, the flat list of 14 exclusions, and the top-level recipe (same granularity, no wrapper factory).

**Summary**

The actual differences:
|
Thanks @shengliangxu — your side-by-side comparison is thorough and the fiddle limitations are real. I want to step back and structure the discussion around three separable questions.

**1. The circular dependency — why the resolver should move out of `modelopt.recipe`**
**Option A: custom `_factory_` resolver**

- External deps: 0 (pure Python + pyyaml + omegaconf)
- Code to write: ~300 lines (registry + resolver + OmegaConf override)
- Maintenance: we own it — bugs are ours, must track nemo_run `_factory_` spec for compatibility
- Factories: regular Python functions, no fiddle restrictions, no wrapper types needed
- Compatibility: must ensure our `_factory_` semantics match nemo_run's so tools/launcher YAMLs work with both
- Risk: our implementation drifts from nemo_run's over time
**Option B: depend on nemo_run `[core]` (if they add the extra)**

- External deps: fiddle (~2 MB) + omegaconf (~0.6 MB) + catalogue (~0.1 MB) ≈ 3 MB, 4-6 packages
- Code to write: 0 — use nemo_run's resolver as-is
- Maintenance: nemo_run team owns it — just upgrade version
- Factories: subject to fiddle's auto_config restrictions (no list comprehensions, needs wrapper types for bare dict/list returns — as you demonstrated)
- Compatibility: guaranteed — same code path as tools/launcher
- Risk: nemo_run doesn't add `[core]` extra → stuck with 179 MB ray dependency from leptonai
**Option C: `$import` (this PR as-is)**

- External deps: 0
- Code to write: 336 lines (already done)
- Maintenance: we own it
- Factories: N/A — YAML snippets, not Python functions
- Compatibility: does not converge with tools/launcher — two systems in the repo
- Risk: future recipe types (QAT, QAD) mix `$import` + OmegaConf in the same file
My preference is A (custom _factory_) if you're willing to implement it, with the explicit goal of compatibility with nemo_run's _factory_ spec so we can switch to B later if they trim their dependencies.
But I also recognize that C (this PR) solves the immediate problem well, and the two-system concern is a future issue that can be addressed when QAT/QAD recipes actually ship. What do you think?
**meenchen** left a comment:

**1. Design — Module-Import-Time YAML Loading**
[SUGGESTION] Lazy-load or fall back on failure

```python
FP8_DEFAULT_CFG: dict[str, Any] = load_config("configs/ptq/presets/model/fp8")
FP8_KV_CFG: dict[str, Any] = load_config("configs/ptq/presets/kv/fp8")
```

Packaging is correct (package-data globs cover `**/*.yaml`), so this should work post-install. But if packaging ever regresses, or a YAML is malformed, the entire `modelopt.torch.quantization.config` module fails to import — breaking all downstream code. Consider lazy-loading via PEP 562 module `__getattr__`, or wrapping in try/except with a fallback to the hardcoded dict during the migration window.
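A minimal sketch of the PEP 562 suggestion, using a dynamically created module as a stand-in for `modelopt.torch.quantization.config`. The `_PRESETS` mapping and the returned dict are placeholders for the real `load_config(...)` call.

```python
import types

_PRESETS = {"FP8_DEFAULT_CFG": "configs/ptq/presets/model/fp8",
            "FP8_KV_CFG": "configs/ptq/presets/kv/fp8"}
_cache = {}

def _lazy_getattr(name):
    """Module-level __getattr__ (PEP 562): load the YAML preset on first access."""
    if name not in _PRESETS:
        raise AttributeError(name)
    if name not in _cache:
        # Real code would call load_config(_PRESETS[name]) here, so a broken
        # YAML only fails this attribute access, not the module import.
        _cache[name] = {"loaded_from": _PRESETS[name]}
    return _cache[name]

# In config.py this would simply be `def __getattr__(name): ...` at module scope;
# a stand-in module is used here so the sketch is self-contained.
fake_config = types.ModuleType("fake_config")
fake_config.__getattr__ = _lazy_getattr
```

Accessing `fake_config.FP8_DEFAULT_CFG` triggers the load; merely importing the module does not.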
**2. Design — Filesystem-First Path Resolution**

[QUESTION] User files in CWD silently shadow built-ins

`_resolve_config_path` checks filesystem paths before `BUILTIN_CONFIG_ROOT`:

```python
paths_to_check.append(Path(f"{config_file}.yml"))        # CWD-relative first
paths_to_check.append(BUILTIN_CONFIG_ROOT.joinpath(...)) # built-in second
```

A user who happens to have `configs/numerics/fp8.yml` in their CWD silently overrides the built-in `FP8_DEFAULT_CFG`. Non-reproducible behavior depending on where Python is launched. Intentional (user override), or should absolute-style imports like `configs/...` always resolve against built-ins first?
**3. Design — Snippet Files Use Indented Top-Level Lists**

[NIT] Non-standard YAML formatting

Unit snippet files (`base_disable_all.yaml`, `default_disabled_quantizers.yaml`, etc.) use indented top-level list items:

```yaml
  - quantizer_name: '*'
    enable: false
```

YAML accepts this but it's unusual. Most editors and tools expect column-0 list items. Authors copying from these templates may be confused. Consider de-indenting for consistency.
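For what it's worth, both spellings parse identically, so the nit is purely stylistic. A quick check with PyYAML (assuming `pyyaml` is available):

```python
import yaml

# Indented and column-0 top-level sequences produce the same Python list.
indented = "  - quantizer_name: '*'\n    enable: false\n"
column0 = "- quantizer_name: '*'\n  enable: false\n"
assert yaml.safe_load(indented) == yaml.safe_load(column0) == [
    {"quantizer_name": "*", "enable": False}
]
```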
**Alternative approach: two separate problems, two simpler solutions**

The PR conflates two distinct duplication problems and solves them together with one mechanism. Separating them reveals that each has a simpler native solution.

**Problem 1: Python `*_CFG` duplication**
**Problem 2: YAML recipe duplication — `!include` vs `$import`**

| | `!include` | `$import` (this PR) |
|---|---|---|
| Reference style | `!include configs/numerics/fp8.yaml` | `imports:` section + `{$import: fp8}` |
| List snippet that needs its own includes | just works recursively | requires multi-document YAML (`---` separator) |
| Implementation | ~50 lines | 336 lines |
| Inline key override inside an include | not supported inline | `{$import: nvfp4, block_sizes: {type: static}}` |
The one feature $import adds over !include is inline dict override ({$import: nvfp4, block_sizes: {type: static}}). If that's needed in practice, it can be handled with a targeted !merge tag in the small number of places where it appears, rather than building the full indirection system for every recipe.
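For reference, a resolver along the `!include` lines discussed can be sketched with PyYAML's constructor hook. This is a hypothetical sketch, not code from this PR; here paths resolve relative to the including file.

```python
from pathlib import Path

import yaml


class IncludeLoader(yaml.SafeLoader):
    """SafeLoader that understands `!include relative/path.yaml`."""

    def __init__(self, stream):
        # Remember the including file's directory so nested paths resolve
        # relative to it; fall back to CWD for string streams.
        self._root = Path(getattr(stream, "name", ".")).parent
        super().__init__(stream)


def _include(loader, node):
    path = loader._root / loader.construct_scalar(node)
    with open(path) as f:
        # Recursion for free: included files may themselves use !include.
        return yaml.load(f, IncludeLoader)


IncludeLoader.add_constructor("!include", _include)
```

A recipe entry like `cfg: !include fp8.yaml` then puts the snippet's parsed content in place of the tag, with no separate `imports:` section.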
**Context: why OmegaConf (the other suggestion in this thread) doesn't work here either**

For completeness — `modelopt.torch.puzzletron` already uses OmegaConf's `defaults:` list heavily for config composition. The reason that pattern doesn't transfer to PTQ recipes is structural:

- Puzzletron configs are consumed by a Hydra app, so OmegaConf's config loader is already in the loop. PTQ recipes produce plain Python dicts passed to `mtq.quantize()` — there is no Hydra runtime.
- OmegaConf merges dicts by key. `quant_cfg` is an ordered list — OmegaConf has no mechanism to splice named entries into the middle of a list.

So a custom processing step is genuinely required for the list-splice case. The question is only whether that step is `!include` (~50 lines, no indirection) or `$import` (336 lines, indirection layer, multi-doc YAML).
**Summary**

| | This PR | Proposed alternative |
|---|---|---|
| `config.py` deduplication | Migrate to YAML, load at import time, circular import → move resolver to `modelopt.torch.opt` | Python `*` unpacking, stays in Python, 0 new lines in core |
| YAML recipe deduplication | `$import` with named imports, 336-line resolver | `!include` tag, ~50-line resolver, stays in `modelopt.recipe` |
| New lines in `modelopt.torch.opt` | ~336 | 0 |
| Multi-document YAML required | Yes (for list snippets with their own imports) | No |
The problems are real and worth solving. The alternative just keeps the two concerns separate and uses the simplest mechanism for each.
Problem 1: we want to get rid of the Python hard-coded stuff completely, so this argument is not applicable here.

Problem 2: `!include` is just yet another YAML DSL; there is no real difference between defining and using `!include` versus `$import`.

OmegaConf is a library implementing YAML-handling logic; it's irrelevant when talking about designs.
What does this PR do?
Type of change: New feature
Add a composable `$import` system for recipe YAML configs, plus reorganize config snippets and begin migrating hardcoded `*_CFG` dicts to YAML presets.

**Problem**

- `*_CFG` dicts in `config.py` (e.g., `FP8_DEFAULT_CFG`) had no connection to the YAML recipe system
- Loading YAML went through `modelopt.recipe`, causing circular imports when `modelopt.torch.quantization.config` needed to load YAML configs

**Solution**

Composable `$import` system:

Recipes declare an `imports` section mapping names to config snippet files. The `{$import: name}` marker is explicitly resolved at load time. For example, a recipe can import shared FP8 attributes and standard exclusions rather than duplicating them inline.

`$import` semantics:

- Multiple imports (`$import: [a, b]`) and inline override keys with ordered precedence (later imports override earlier, inline keys override all)
- List snippets use multi-document YAML with a `---` separator (first doc = imports, second doc = list content)

Config snippet library (`modelopt_recipes/configs/`):

- `numerics/` — all numeric format definitions live here (fp8, nvfp4, nvfp4_static). Any new numeric format (e.g., int8, mxfp8) should be added to this directory as the single source of truth for quantizer attributes
- `ptq/units/` — reusable quant_cfg entries (base_disable_all, default_disabled, fp8_kv, w8a8_fp8_fp8, w4a4_nvfp4_nvfp4)
- `ptq/presets/model/` — complete configs replacing hardcoded `*_CFG` dicts
- `ptq/presets/kv/` — KV cache config presets

Hardcoded config migration:

- `FP8_DEFAULT_CFG` and `FP8_KV_CFG` now load from YAML presets via `load_config()`
- Resolver moved from `modelopt.recipe._config_loader` to `modelopt.torch.opt.config_loader` to eliminate circular imports
Changes by area
Config loading infrastructure:

- `modelopt/torch/opt/config_loader.py` — new home for YAML loading + `$import` resolution (zero `modelopt` imports, no circular dependency risk)
- `modelopt/recipe/_config_loader.py` — thin re-export from `torch.opt.config_loader`
- `modelopt/recipe/loader.py` — uses shared `_resolve_imports` from config_loader

Quantization config:

- `modelopt/torch/quantization/config.py` — `FP8_DEFAULT_CFG` and `FP8_KV_CFG` loaded from YAML presets

Recipe YAML files:

- All 6 built-in PTQ recipes converted to the `$import` style

Pre-commit:

- `.pre-commit-config.yaml` — recipe validator excludes `configs/` directory
- `tools/precommit/check_modelopt_recipes.py` — recognizes `$import` entries

Documentation:

- `docs/source/guides/10_recipes.rst` — full `$import` specification with examples, inline/import style comparison, multi-document snippets, override precedence

Tests:

- `tests/unit/recipe/test_loader.py` — 20+ new tests covering all import features
All new and existing recipe loader tests pass. Built-in recipe smoke tests pass with converted recipes.
Before your PR is "Ready for review":

- `FP8_DEFAULT_CFG`/`FP8_KV_CFG` produce identical dicts
New Features
Documentation
Tests
Chores