Add a general composable $import system for YAML configs, and use it to implement composable recipes#1253
#1253 · shengliangxu wants to merge 44 commits into main
Conversation
📝 **Walkthrough**

Adds a composable YAML imports system (`$import`).
**Sequence Diagram**

```mermaid
sequenceDiagram
    participant U as User
    participant L as modelopt.recipe.loader
    participant R as _load_raw_config
    participant S as _resolve_imports
    participant FS as Files / Builtins
    U->>L: load_recipe(path)
    L->>R: _load_raw_config(path)
    R->>FS: read YAML (file or builtin)
    FS-->>R: raw documents
    R-->>L: parsed raw config (may include imports)
    alt imports present
        L->>S: _resolve_imports(parsed_config)
        S->>R: _load_raw_config(import_path)
        R->>FS: read imported snippet
        FS-->>R: snippet content
        R-->>S: snippet (dict/list/_list_content)
        S->>S: merge / splice / detect circular refs
        S-->>L: resolved config
    end
    L-->>U: final resolved recipe
```
Estimated code review effort: 🎯 4 (Complex), ⏱️ ~60 minutes. Pre-merge checks: 4 passed.
Commit description:
Introduce an import mechanism that lets recipe YAML files reference reusable
config snippets by name, reducing duplication across recipes.
Syntax:

```yaml
imports:
  fp8: configs/numerics/fp8
  base_disable_all: configs/ptq/base_disable_all

quant_cfg:
  - base_disable_all              # string entry → replaced with imported dict or spliced list
  - quantizer_name: '*weight_quantizer'
    cfg: fp8                      # string cfg → replaced with imported dict
```
Features:
- Dict-based imports (keys are names, values are config paths) — no name conflicts
- Three resolution modes: string cfg value, string list entry (dict), string list entry (list splice)
- Recursive resolution with circular import detection
- Path resolution via load_config (built-in library first, then filesystem)
- Works with both single-file and directory recipe formats
New reusable config snippets (modelopt_recipes/configs/):
- numerics/fp8.yml, nvfp4_dynamic.yml, nvfp4_static.yml
- ptq/base_disable_all.yaml, default_disabled_quantizers.yaml
All 6 built-in PTQ recipes converted to use imports, reducing each by ~30 lines.
Pre-commit hook updated to skip configs/ directory and allow string entries in
quant_cfg. load_config() now accepts YAML lists for list-valued snippets.
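The three resolution modes and the circular-import detection described above can be sketched in a few lines. This is a hypothetical simplification for illustration only, not the PR's actual `_resolve_imports` implementation; the `resolve` function and its signature are invented here.

```python
def resolve(node, snippets, seen=()):
    """Toy resolver: replace string references to `snippets` with their content.

    String list entries naming a list snippet are spliced; string values naming
    a dict snippet are replaced; the `seen` chain detects circular imports.
    """
    if isinstance(node, dict):
        return {k: resolve(v, snippets, seen) for k, v in node.items()}
    if isinstance(node, list):
        out = []
        for item in node:
            resolved = resolve(item, snippets, seen)
            # A string entry that named a list snippet gets spliced in place.
            if isinstance(item, str) and item in snippets and isinstance(resolved, list):
                out.extend(resolved)
            else:
                out.append(resolved)
        return out
    if isinstance(node, str) and node in snippets:
        if node in seen:
            raise ValueError(f"circular import: {' -> '.join(seen + (node,))}")
        return resolve(snippets[node], snippets, seen + (node,))
    return node
```

With `snippets` mapping `fp8` to a dict and `base_disable_all` to a list, a bare `base_disable_all` entry in `quant_cfg` splices the list while `cfg: fp8` becomes the imported dict, matching the commit description's three modes.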
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
Force-pushed from ea910ae to 82d5a12.
**ChenhanYu** left a comment (Review):
The problem is real — 6 recipes duplicating the same 14-entry disable list and NVFP4 format definition is a maintenance nightmare. The $import system is well-designed (scoped, recursive, circular detection) and the test coverage is excellent (837 lines). The before/after on nvfp4_experts_only (75 → 18 lines) is compelling.
However, before we introduce a custom config composition system, I want to understand why we can't reuse what we already have:
1. Why not OmegaConf?
We already depend on OmegaConf — EAGLE training (main.py, PR #1134) uses it with __base__ inheritance and OmegaConf.merge() for config composition. OmegaConf natively supports:
- Structured configs with inheritance (`defaults:` list)
- Variable interpolation (`${numerics.fp8}`)
- Merge with override semantics
The $import system reimplements composition with different syntax ($import: markers, --- multi-doc for list snippets, scoped import namespaces). This means we now have two config composition systems in the codebase — OmegaConf for speculative decoding recipes and $import for quantization recipes. Contributors need to learn both.
Could the same snippet library (configs/numerics/, configs/ptq/units/) work with OmegaConf's native composition instead of a custom resolver?
2. Why not a Python factory?
The *_CFG dicts are already Python. A factory approach like:
def build_ptq_config(format="nvfp4", kv="fp8", experts_only=False):
cfg = BASE_DISABLE_ALL + DEFAULT_DISABLED
if experts_only:
cfg += expert_quantizers(format)
else:
cfg += all_linear_quantizers(format)
if kv:
cfg += kv_quantizers(kv)
return cfgwould be more debuggable (breakpoints, type checking, IDE navigation) and avoids YAML-specific complexity (multi-doc --- separator, list splicing semantics). This is essentially what nemo_run does with run.Config — Python factories with composable config objects.
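To make the factory sketch concrete, here is a self-contained, runnable version. All names and attribute values below are illustrative stand-ins, not ModelOpt's real presets or helpers.

```python
# Illustrative stand-ins for the snippet constants and helpers in the sketch
# above; none of these are ModelOpt's actual data.
BASE_DISABLE_ALL = [{"quantizer_name": "*", "enable": False}]
DEFAULT_DISABLED = [{"quantizer_name": "*lm_head*", "enable": False}]

_FORMATS = {"nvfp4": {"num_bits": "e2m1"}, "fp8": {"num_bits": "e4m3"}}

def expert_quantizers(fmt):
    return [{"quantizer_name": "*mlp.experts*weight_quantizer", "enable": True, "cfg": _FORMATS[fmt]}]

def all_linear_quantizers(fmt):
    return [{"quantizer_name": "*weight_quantizer", "enable": True, "cfg": _FORMATS[fmt]}]

def kv_quantizers(fmt):
    return [{"quantizer_name": "*[kv]_bmm_quantizer", "enable": True, "cfg": _FORMATS[fmt]}]

def build_ptq_config(format="nvfp4", kv="fp8", experts_only=False):
    cfg = BASE_DISABLE_ALL + DEFAULT_DISABLED
    cfg += expert_quantizers(format) if experts_only else all_linear_quantizers(format)
    if kv:
        cfg += kv_quantizers(kv)
    return {"quant_cfg": cfg}
```

`build_ptq_config(experts_only=True)` then yields the base exclusions followed by the expert quantizer and a KV entry, all composed by plain list concatenation.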
The YAML approach has advantages (user-editable, diffable, declarative), but our recipes are primarily consumed programmatically by mtq.quantize(), not hand-edited by end users.
3. Migration path
If we go with $import, what's the end state? Do all hardcoded *_CFG dicts eventually move to YAML? Or do we maintain both Python dicts and YAML presets indefinitely? The PR removes 51 lines from config.py but the Python API (mtq.FP8_DEFAULT_CFG) still needs to work — how does that load path change?
I'd like to hear the rationale before approving. The implementation quality is high — this is about whether we want a second composition system in the codebase vs. reusing OmegaConf or Python factories.
Thanks @ChenhanYu — these are legitimate questions. Point-by-point reply below:

**1. OmegaConf**

Two points on the premise before the substance:

**Inheritance vs Composition**

I actually did explore a `__base__`-style inheritance approach. Inheritance answers "B is a specialization of A; override these fields." It expresses a single parent + overrides:

```yaml
# inheritance flavor
__base__: fp8_default
quant_cfg:
  override_field: new_value
```

The configs we target aren't shaped that way. Concretely,
None of these four is the parent of another. They're siblings. The "base" the author has in mind is the combination, not any single snippet.

```yaml
# composition flavor (what this PR lands)
imports:
  base_disable_all: configs/ptq/units/base_disable_all
  fp8: configs/numerics/fp8
  fp8_kv: configs/ptq/units/fp8_kv
  default_disabled: configs/ptq/units/default_disabled_quantizers
quant_cfg:
  - $import: base_disable_all     # splice list snippet
  - quantizer_name: "*weight_quantizer"
    cfg: { $import: fp8 }         # merge dict snippet
  - quantizer_name: "*input_quantizer"
    cfg: { $import: fp8 }         # re-use same snippet
  - $import: fp8_kv               # splice another list snippet
  - $import: default_disabled     # splice one more
```

To express the same recipe as inheritance you'd have to pick one snippet as the "base" and encode the others as overrides on top of it — which doesn't match the actual structure (no snippet is a specialization of another) and forces authors to invent an artificial ordering. Two concrete capabilities fall out of composition that inheritance can't provide cleanly:
Where I agree: the "two composition systems" concern is legitimate — and the plan is to converge. Once this PR merges, we'll migrate the existing flat YAML configs (EAGLE, DFlash, etc.) onto the `$import` system.

**2. Python factory**

Agreed that a factory wins on debuggability. But I want to reframe the choice before arguing specifics.

So the question isn't "YAML vs. Python factory" — it's "what belongs at each layer." My claim: structural composition within a single config is YAML's sweet spot. Two side points on your specific examples:

**3. Migration path**

Direct answer:
**cjluo-nv** left a comment:

Reviewed the general/ptq and the refactor. LGTM.
> ``$import: fp8`` under ``cfg`` is a **dict value** — the snippet (a YAML dict of quantizer attributes) replaces the ``cfg`` field.

This is a bit confusing. Should we be doing something like this?
```yaml
quantize:
  algorithm: max
  quant_cfg:
    - $import: base_disable_all     # spliced from a single-element list snippet
    - quantizer_name: '*weight_quantizer'
      $import: fp8                  # cfg value replaced with imported dict
    - $import: default_disabled     # spliced from a multi-element list snippet
```
No, unless the fp8 snippet is defined using:

```yaml
cfg:
  num_bits: e4m3
  axis:
```

I don't think we want to do it, because cfg includes both the base format and many other components:
```python
{
    "num_bits": 8,            # 8-bit integer quantization
    "axis": None,             # per-tensor scale (no per-channel axis)
    "fake_quant": True,       # simulate quantization in forward pass (PTQ / QAT)
    "unsigned": False,        # signed integer range, e.g. [-128, 127] for INT8
    "narrow_range": False,    # full range; True would restrict to [-127, 127] for INT8
    "type": "static",         # static calibration (not dynamic per-inference)
    "block_sizes": None,      # no block quantization; set for NF4 / MXFP formats
    "bias": None,             # no affine bias correction
    "calibrator": "max",      # use max-abs calibration to determine amax
    "rotate": False,          # no Hadamard rotation (QuaRot / SpinQuant)
    "pass_through_bwd": True, # straight-through estimator for QAT gradients
    "trt_high_precision_dtype": "Float",  # cast QDQ nodes to fp32 for TRT StronglyTyped export
    "backend": None,          # use the built-in quantization backend
    "backend_extra_args": None,  # no extra args for custom backends
    "use_constant_amax": False,  # calibrate amax; True hard-codes FP8 E4M3 max (448.0)
}
```
Using the current design, you can configure beyond just the base format:

```yaml
quantize:
  algorithm: max
  quant_cfg:
    - $import: base_disable_all     # spliced from a single-element list snippet
    - quantizer_name: '*weight_quantizer'
      cfg:
        $import: fp8                # base format
        rotate: true                # example: more config beyond the base format
        $import: other_components
    - $import: default_disabled     # spliced from a multi-element list snippet
```
Thanks @shengliangxu — the composition-vs-inheritance distinction is well-argued, and I agree that these configs are horizontal bags of independent pieces, not a parent-child chain. That point is convincing. However, your nemo_run response actually strengthens the case for using it rather than building our own. This describes exactly what we want — it's what nemo_run already provides.
Your recipe could be expressed as:

```python
def fp8_default_recipe():
    return run.Config(
        PTQRecipe,
        quant_cfg=[
            base_disable_all(),
            weight_quantizer(format=fp8()),
            input_quantizer(format=fp8()),
            fp8_kv(),
            default_disabled(),
        ],
    )
```

Which serializes to:

```yaml
_target_: modelopt.torch.quantization.PTQRecipe
quant_cfg:
  - _target_: modelopt.torch.quantization.QuantEntry
    quantizer_name: '*'
    enable: false
  - _target_: modelopt.torch.quantization.QuantEntry
    quantizer_name: '*weight_quantizer'
    cfg:
      num_bits: e4m3
  - _target_: modelopt.torch.quantization.QuantEntry
    quantizer_name: '*input_quantizer'
    cfg:
      num_bits: e4m3
  - _target_: modelopt.torch.quantization.QuantEntry
    quantizer_name: '*[kv]_bmm_quantizer'
    cfg:
      num_bits: e4m3
```

The YAML is still readable and diffable. That said, I see the tradeoff with nemo_run's serialized YAML. I'm not blocking this PR — the implementation quality is high and the problem is real. But before we commit to a custom DSL:
Follow-up: I looked into nemo_run's `_factory_` mechanism. Side-by-side:

| | `$import` | `_factory_` |
|---|---|---|
| References resolve to | YAML snippet files | Python factory functions |
| Debuggable | No | Yes (breakpoints, IDE navigation) |
| Type-checked | No | Yes |
| Overridable at any level | Only at YAML level | At any level — override the top-level factory, a mid-level factory, or a leaf numeric format |
| Custom DSL | Yes (`$import`, `---` multi-doc, list splicing) | No (standard nemo_run `_factory_`) |
| Already in our ecosystem | No (new) | Yes (tools/launcher uses it for Slurm configs) |
**Nested factory overrides — the key advantage**

With `_factory_`, composition is overridable at every level.

Leaf factory (numeric format — defined once):

```python
@run.cli.factory
def nvfp4():
    return {"num_bits": "e2m1", "block_sizes": {-1: 16, "type": "dynamic", "scale_bits": "e4m3"}}
```

Mid-level factory (expert quantizer pattern):

```python
@run.cli.factory
def experts_quantizer(pattern="*mlp.experts*", numeric=nvfp4):
    return [
        {"quantizer_name": f"{pattern}weight_quantizer", "enable": True, "cfg": numeric()},
        {"quantizer_name": f"{pattern}input_quantizer", "enable": True, "cfg": numeric()},
    ]
```

Top-level factory:

```python
@run.cli.factory
def nvfp4_experts_only_fp8_kv():
    return PTQRecipe(
        quant_cfg=[*base_disable_all(), *experts_quantizer("*mlp.experts*", nvfp4), ...]
    )
```

User can operate at any level of abstraction:
```yaml
# Level 1: Use the whole recipe as-is (1 line)
_factory_: nvfp4_experts_only_fp8_kv
```

```yaml
# Level 2: See and customize the composition (same length as $import)
_factory_: ptq_recipe
quant_cfg:
  - _factory_: base_disable_all
  - quantizer_name: '*mlp.experts*weight_quantizer'
    cfg: { _factory_: nvfp4 }
  - quantizer_name: '*mlp.experts*input_quantizer'
    cfg: { _factory_: nvfp4 }
  - _factory_: fp8_kv
  - _factory_: default_disabled
```

```yaml
# Level 3: Override a leaf — swap NVFP4 for FP8 on one quantizer
_factory_: ptq_recipe
quant_cfg:
  - _factory_: base_disable_all
  - quantizer_name: '*mlp.experts*weight_quantizer'
    cfg: { _factory_: fp8 }       # ← changed from nvfp4 to fp8
  - quantizer_name: '*mlp.experts*input_quantizer'
    cfg: { _factory_: nvfp4 }
  - _factory_: fp8_kv
  - _factory_: default_disabled
```

**CLI overrides — nemo_run's dotlist syntax**
nemo_run supports OmegaConf-style dotlist overrides from the command line, which compose on top of `_factory_` resolution. This means any nested field can be overridden without editing the YAML at all:

```shell
# Use the standard recipe, override calibration size
python quantize.py --factory nvfp4_experts_only_fp8_kv \
    calibration.calib_size=512

# Override a nested numeric format at the leaf level
python quantize.py --factory nvfp4_experts_only_fp8_kv \
    quant_cfg.1.cfg.num_bits=e4m3 \
    quant_cfg.1.cfg.block_sizes='{-1: 128}'

# Swap the expert quantizer pattern to target a different module
python quantize.py --factory nvfp4_experts_only_fp8_kv \
    quant_cfg.1.quantizer_name='*moe.experts*weight_quantizer'
```

With `$import`, CLI overrides would require a separate OmegaConf layer on top — two systems working together. With `_factory_`, it's built in: the factory produces the config, dotlist overrides patch it, done. One system, end to end.
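To make the dotlist semantics concrete without pulling in nemo_run or OmegaConf, here is a hypothetical pure-Python sketch of what such an override pass does. The `apply_dotlist` helper is invented for illustration; the real implementation lives in OmegaConf.

```python
import ast

def apply_dotlist(cfg, overrides):
    """Apply 'a.0.b=value' style overrides to nested dicts/lists in place.

    Integer path segments index into lists; values are parsed as Python
    literals when possible, otherwise kept as strings (like 'e4m3').
    """
    for item in overrides:
        path, _, raw = item.partition("=")
        keys = path.split(".")
        node = cfg
        for key in keys[:-1]:
            node = node[int(key)] if isinstance(node, list) else node[key]
        try:
            value = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            value = raw  # bare words stay strings
        last = keys[-1]
        if isinstance(node, list):
            node[int(last)] = value
        else:
            node[last] = value
    return cfg
```

For example, `apply_dotlist(cfg, ["quant_cfg.1.cfg.num_bits=e4m3"])` patches the second `quant_cfg` entry, mirroring the CLI examples above.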
Again — not blocking this PR. But I think _factory_ achieves the same deduplication goals with less custom infrastructure and more flexibility. Worth considering before we commit to a custom DSL.
> ```python
> # ---------------------------------------------------------------------------
> # $import resolution
> # ---------------------------------------------------------------------------
> ```

nit: This is one of the Claude comment styles I do not like.

This comment is simply adding redundancy IMO.
> ```yaml
> # ``axis: null`` is explicit to match the hardcoded ``FP8_DEFAULT_CFG`` shape —
> # downstream code that keys on ``"axis" in cfg`` sees the same dict layout.
> num_bits: e4m3
> axis:
> ```

How can we specify None explicitly?

It's general YAML; these are equivalent:

```yaml
axis: ~
axis: null
axis:
```
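The equivalence is easy to confirm with PyYAML (assuming `pyyaml` is available in the environment):

```python
import yaml

# The three spellings above all parse to Python None; quoting the word
# produces the literal string "null" instead.
forms = ["axis: ~", "axis: null", "axis:"]
print([yaml.safe_load(f)["axis"] for f in forms])   # all None
print(yaml.safe_load("axis: 'null'")["axis"])       # the string "null"
```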
> # NVFP4 E2M1 blockwise quantizer attributes with FP8 E4M3 scales (static calibration).

Suggested change:

> # NVFP4 E2M1 blockwise quantizer attributes with FP8 E4M3 scales (used for NVFP4 weights since weight scales can be static).
Can we change this name to kv_fp8? I am trying to have a good naming convention:

- numeric presets are the basics: `nvfp4`, `mxfp4`, `fp8`
- operator-level presets live above that (format: `{operator_name}_{numeric_format}`): `weight_act_nvfp4_nvfp4`, `weight_act_fp8_fp8`, `weight_act_nvfp4_f16`, `weight_act_mxfp4_fp8`, `kv_nvfp4`, `kv_mxfp4`, `kv_fp8`
Let's use an example to compare more clearly, side-by-side: the scaffolding required before any factory can register, the numeric leaf, a list snippet with an inline import, the flat list of 14 exclusions, and the top-level recipe (same granularity, no wrapper factory).

**Summary**

The actual differences:
|
Thanks @shengliangxu — your side-by-side comparison is thorough and the fiddle limitations are real. I want to step back and structure the discussion around three separable questions.

**1. The circular dependency — why the resolver should move out of `modelopt.recipe`**
**Option A: custom `_factory_` resolver**

- External deps: 0 (pure Python + pyyaml + omegaconf)
- Code to write: ~300 lines (registry + resolver + OmegaConf override)
- Maintenance: we own it — bugs are ours, must track nemo_run `_factory_` spec for compatibility
- Factories: regular Python functions, no fiddle restrictions, no wrapper types needed
- Compatibility: must ensure our `_factory_` semantics match nemo_run's so tools/launcher YAMLs work with both
- Risk: our implementation drifts from nemo_run's over time
**Option B: depend on nemo_run `[core]` (if they add the extra)**

- External deps: fiddle (~2 MB) + omegaconf (~0.6 MB) + catalogue (~0.1 MB) ≈ 3 MB, 4-6 packages
- Code to write: 0 — use nemo_run's resolver as-is
- Maintenance: nemo_run team owns it — just upgrade version
- Factories: subject to fiddle's auto_config restrictions (no list comprehensions, needs wrapper types for bare dict/list returns — as you demonstrated)
- Compatibility: guaranteed — same code path as tools/launcher
- Risk: nemo_run doesn't add `[core]` extra → stuck with 179 MB ray dependency from leptonai
**Option C: `$import` (this PR as-is)**

- External deps: 0
- Code to write: 336 lines (already done)
- Maintenance: we own it
- Factories: N/A — YAML snippets, not Python functions
- Compatibility: does not converge with tools/launcher — two systems in the repo
- Risk: future recipe types (QAT, QAD) mix `$import` + OmegaConf in the same file
My preference is A (custom _factory_) if you're willing to implement it, with the explicit goal of compatibility with nemo_run's _factory_ spec so we can switch to B later if they trim their dependencies.
But I also recognize that C (this PR) solves the immediate problem well, and the two-system concern is a future issue that can be addressed when QAT/QAD recipes actually ship. What do you think?
**meenchen** left a comment:

**1. Design — Module-Import-Time YAML Loading**
[SUGGESTION] Lazy-load or fall back on failure

```python
FP8_DEFAULT_CFG: dict[str, Any] = load_config("configs/ptq/presets/model/fp8")
FP8_KV_CFG: dict[str, Any] = load_config("configs/ptq/presets/kv/fp8")
```

Packaging is correct (package-data globs cover `**/*.yaml`), so this should work post-install. But if packaging ever regresses, or a YAML is malformed, the entire `modelopt.torch.quantization.config` module fails to import — breaking all downstream code. Consider lazy-loading via PEP 562 module `__getattr__`, or wrapping in try/except with a fallback to the hardcoded dict during the migration window.
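A minimal sketch of the PEP 562 suggestion, using a dynamically created module as a stand-in for `modelopt.torch.quantization.config`. The `_PRESETS` mapping and the returned dict are placeholders for the real `load_config(...)` call.

```python
import types

_PRESETS = {"FP8_DEFAULT_CFG": "configs/ptq/presets/model/fp8",
            "FP8_KV_CFG": "configs/ptq/presets/kv/fp8"}
_cache = {}

def _lazy_getattr(name):
    """Module-level __getattr__ (PEP 562): load the YAML preset on first access."""
    if name not in _PRESETS:
        raise AttributeError(name)
    if name not in _cache:
        # Real code would call load_config(_PRESETS[name]) here, so a broken
        # YAML only fails this attribute access, not the module import.
        _cache[name] = {"loaded_from": _PRESETS[name]}
    return _cache[name]

# In config.py this would simply be `def __getattr__(name): ...` at module scope;
# a stand-in module is used here so the sketch is self-contained.
fake_config = types.ModuleType("fake_config")
fake_config.__getattr__ = _lazy_getattr
```

Accessing `fake_config.FP8_DEFAULT_CFG` triggers the load; merely importing the module does not.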
**2. Design — Filesystem-First Path Resolution**

[QUESTION] User files in CWD silently shadow built-ins

`_resolve_config_path` checks filesystem paths before `BUILTIN_CONFIG_ROOT`:

```python
paths_to_check.append(Path(f"{config_file}.yml"))        # CWD-relative first
paths_to_check.append(BUILTIN_CONFIG_ROOT.joinpath(...)) # built-in second
```

A user who happens to have `configs/numerics/fp8.yml` in their CWD silently overrides the built-in `FP8_DEFAULT_CFG`. Non-reproducible behavior depending on where Python is launched. Intentional (user override), or should absolute-style imports like `configs/...` always resolve against built-ins first?
**3. Design — Snippet Files Use Indented Top-Level Lists**

[NIT] Non-standard YAML formatting

Unit snippet files (`base_disable_all.yaml`, `default_disabled_quantizers.yaml`, etc.) use indented top-level list items:

```yaml
  - quantizer_name: '*'
    enable: false
```

YAML accepts this but it's unusual. Most editors and tools expect column-0 list items. Authors copying from these templates may be confused. Consider de-indenting for consistency.
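For what it's worth, both spellings parse identically, so the nit is purely stylistic. A quick check with PyYAML (assuming `pyyaml` is available):

```python
import yaml

# Indented and column-0 top-level sequences produce the same Python list.
indented = "  - quantizer_name: '*'\n    enable: false\n"
column0 = "- quantizer_name: '*'\n  enable: false\n"
assert yaml.safe_load(indented) == yaml.safe_load(column0) == [
    {"quantizer_name": "*", "enable": False}
]
```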
**Alternative approach: two separate problems, two simpler solutions**

The PR conflates two distinct duplication problems and solves them together with one mechanism. Separating them reveals that each has a simpler native solution.

**Problem 1: Python `*_CFG` duplication**
**Problem 2: YAML recipe duplication — `!include` vs `$import`**

| | `!include` | `$import` (this PR) |
|---|---|---|
| Reference style | `!include configs/numerics/fp8.yaml` | `imports:` section + `{$import: fp8}` |
| List snippet that needs its own includes | just works recursively | requires multi-document YAML (`---` separator) |
| Implementation | ~50 lines | 336 lines |
| Inline key override inside an include | not supported inline | `{$import: nvfp4, block_sizes: {type: static}}` |
The one feature $import adds over !include is inline dict override ({$import: nvfp4, block_sizes: {type: static}}). If that's needed in practice, it can be handled with a targeted !merge tag in the small number of places where it appears, rather than building the full indirection system for every recipe.
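For reference, a resolver along the `!include` lines discussed can be sketched with PyYAML's constructor hook. This is a hypothetical sketch, not code from this PR; here paths resolve relative to the including file.

```python
from pathlib import Path

import yaml


class IncludeLoader(yaml.SafeLoader):
    """SafeLoader that understands `!include relative/path.yaml`."""

    def __init__(self, stream):
        # Remember the including file's directory so nested paths resolve
        # relative to it; fall back to CWD for string streams.
        self._root = Path(getattr(stream, "name", ".")).parent
        super().__init__(stream)


def _include(loader, node):
    path = loader._root / loader.construct_scalar(node)
    with open(path) as f:
        # Recursion for free: included files may themselves use !include.
        return yaml.load(f, IncludeLoader)


IncludeLoader.add_constructor("!include", _include)
```

A recipe entry like `cfg: !include fp8.yaml` then puts the snippet's parsed content in place of the tag, with no separate `imports:` section.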
**Context: why OmegaConf (the other suggestion in this thread) doesn't work here either**

For completeness — `modelopt.torch.puzzletron` already uses OmegaConf's `defaults:` list heavily for config composition. The reason that pattern doesn't transfer to PTQ recipes is structural:

- Puzzletron configs are consumed by a Hydra app, so OmegaConf's config loader is already in the loop. PTQ recipes produce plain Python dicts passed to `mtq.quantize()` — there is no Hydra runtime.
- OmegaConf merges dicts by key. `quant_cfg` is an ordered list — OmegaConf has no mechanism to splice named entries into the middle of a list.

So a custom processing step is genuinely required for the list-splice case. The question is only whether that step is `!include` (~50 lines, no indirection) or `$import` (336 lines, indirection layer, multi-doc YAML).
**Summary**

| | This PR | Proposed alternative |
|---|---|---|
| `config.py` deduplication | Migrate to YAML, load at import time, circular import → move resolver to `modelopt.torch.opt` | Python `*` unpacking, stays in Python, 0 new lines in core |
| YAML recipe deduplication | `$import` with named imports, 336-line resolver | `!include` tag, ~50-line resolver, stays in `modelopt.recipe` |
| New lines in `modelopt.torch.opt` | ~336 | 0 |
| Multi-document YAML required | Yes (for list snippets with their own imports) | No |
The problems are real and worth solving. The alternative just keeps the two concerns separate and uses the simplest mechanism for each.
Problem 1: we want to get rid of the Python hard-coded stuff completely, so this argument is not applicable here.

Problem 2: `!include` is just yet another YAML DSL; there is no real difference between defining and using `!include` versus `$import`.

OmegaConf is a library implementing YAML-handling logic; it's irrelevant when talking about designs.
What does this PR do?
Type of change: New feature
Add a composable `$import` system for recipe YAML configs, plus reorganize config snippets and begin migrating hardcoded `*_CFG` dicts to YAML presets.

**Problem**

- `*_CFG` dicts in `config.py` (e.g., `FP8_DEFAULT_CFG`) had no connection to the YAML recipe system
- Loading YAML went through `modelopt.recipe`, causing circular imports when `modelopt.torch.quantization.config` needed to load YAML configs

**Solution**

Composable `$import` system:

Recipes declare an `imports` section mapping names to config snippet files. The `{$import: name}` marker is explicitly resolved at load time. For example, a recipe can import shared FP8 attributes and standard exclusions rather than duplicating them inline.

`$import` semantics:

- Multiple imports (`$import: [a, b]`) and inline override keys with ordered precedence (later imports override earlier, inline keys override all)
- List snippets use multi-document YAML with a `---` separator (first doc = imports, second doc = list content)

Config snippet library (`modelopt_recipes/configs/`):

- `numerics/` — all numeric format definitions live here (fp8, nvfp4, nvfp4_static). Any new numeric format (e.g., int8, mxfp8) should be added to this directory as the single source of truth for quantizer attributes
- `ptq/units/` — reusable quant_cfg entries (base_disable_all, default_disabled, fp8_kv, w8a8_fp8_fp8, w4a4_nvfp4_nvfp4)
- `ptq/presets/model/` — complete configs replacing hardcoded `*_CFG` dicts
- `ptq/presets/kv/` — KV cache config presets

Hardcoded config migration:

- `FP8_DEFAULT_CFG` and `FP8_KV_CFG` now load from YAML presets via `load_config()`
- Resolver moved from `modelopt.recipe._config_loader` to `modelopt.torch.opt.config_loader` to eliminate circular imports
Changes by area
Config loading infrastructure:

- `modelopt/torch/opt/config_loader.py` — new home for YAML loading + `$import` resolution (zero `modelopt` imports, no circular dependency risk)
- `modelopt/recipe/_config_loader.py` — thin re-export from `torch.opt.config_loader`
- `modelopt/recipe/loader.py` — uses shared `_resolve_imports` from config_loader

Quantization config:

- `modelopt/torch/quantization/config.py` — `FP8_DEFAULT_CFG` and `FP8_KV_CFG` loaded from YAML presets

Recipe YAML files:

- All 6 built-in PTQ recipes converted to the `$import` style

Pre-commit:

- `.pre-commit-config.yaml` — recipe validator excludes `configs/` directory
- `tools/precommit/check_modelopt_recipes.py` — recognizes `$import` entries

Documentation:

- `docs/source/guides/10_recipes.rst` — full `$import` specification with examples, inline/import style comparison, multi-document snippets, override precedence

Tests:

- `tests/unit/recipe/test_loader.py` — 20+ new tests covering all import features
All new and existing recipe loader tests pass. Built-in recipe smoke tests pass with converted recipes.
Before your PR is "Ready for review":

- `FP8_DEFAULT_CFG`/`FP8_KV_CFG` produce identical dicts
New Features
Documentation
Tests
Chores