
Flux2-Dev Quantization #947

Merged
jingyu-ml merged 8 commits into main from jingyux/flux2-dev on Mar 14, 2026

Conversation

@jingyu-ml
Contributor

@jingyu-ml jingyu-ml commented Feb 28, 2026

What does this PR do?

Type of change: new example

Overview:

  • Register Flux2Attention and Flux2ParallelSelfAttention in the quantization plugin so bmm quantizers are patched (enables --quantize-mha).
  • Add Flux2-specific dummy input generation for HF checkpoint export.
  • Guard check_conv_and_mha with hasattr for bmm quantizer attributes (a sketch of this pattern follows this list).
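
For illustration, a minimal sketch of that hasattr guard, assuming hypothetical quantizer attribute names; the real names live in the example's check_conv_and_mha and may differ:

import torch.nn as nn

# Candidate BMM quantizer attributes; these names are assumptions for this sketch.
_BMM_QUANTIZER_ATTRS = ("q_bmm_quantizer", "k_bmm_quantizer", "v_bmm_quantizer")

def disable_bmm_quantizers(module: nn.Module) -> None:
    """Disable whichever BMM quantizers a module actually defines."""
    for attr in _BMM_QUANTIZER_ATTRS:
        if hasattr(module, attr):  # guard: not every attention class has all quantizers
            getattr(module, attr).disable()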

Usage

python quantize.py \
    --model flux2-dev \
    --model-dtype BFloat16 \
    --format fp4 --batch-size 2 --calib-size 1 \
    --n-steps 20 --quantized-torch-ckpt-save-path ./flux2-dev-fp4.pt --collect-method default \
    --hf-ckpt-dir ./flux2-dev-fp4

Testing

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes
  • Did you write any new necessary tests?: No
  • Did you add or update any necessary documentation?: No
  • Did you update Changelog?: No

Additional Information

Summary by CodeRabbit

  • New Features

    • Added Flux2-dev model support with Flux2-compatible dummy input generation and default inference params (768×1024, guidance scale 4.0).
  • Refactor

    • Made attention quantization disabling more robust by iterating available quantizers before disabling.
  • Infrastructure

    • Flux2 attention components are now optional and registered only when present to avoid import issues.
  • Tests

    • Added Flux2 test helpers and coverage validating Flux2 dummy input shapes.

Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
@jingyu-ml jingyu-ml requested a review from a team as a code owner February 28, 2026 00:18
@jingyu-ml jingyu-ml requested a review from ajrasane February 28, 2026 00:18
@jingyu-ml jingyu-ml marked this pull request as draft February 28, 2026 00:18
@copy-pr-bot

copy-pr-bot Bot commented Feb 28, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@jingyu-ml jingyu-ml self-assigned this Feb 28, 2026
@coderabbitai
Contributor

coderabbitai Bot commented Feb 28, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Integrates Flux2 support across diffusers tooling: registers a new ModelType and pipeline, adds Flux2 model defaults and a filter, generates Flux2-specific dummy inputs for export, updates the quantization plugin to conditionally register Flux2 attention classes, and expands tests to cover Flux2 model factories and dummy-input expectations.

Changes

  • Flux2 Model Registration & Defaults (examples/diffusers/quantization/models_utils.py): Import Flux2Pipeline (optional), add ModelType.FLUX2_DEV, map it to Flux2Pipeline, add the registry entry black-forest-labs/FLUX.2-dev, wire up the filter function, and add a default config (backbone, dataset, inference_extra_args: height 768, width 1024, guidance_scale 4.0).
  • Quantizer/Attention Handling (examples/diffusers/quantization/utils.py, modelopt/torch/quantization/plugins/diffusion/diffusers.py): Make FP8-MHA disablement robust by iterating over possible quantizer attributes with hasattr() before calling disable(); add guarded imports for Flux2Attention / Flux2ParallelSelfAttention and conditionally register them in QuantModuleRegistry and _QuantAttentionModuleMixin only if available (see the sketch after this list).
  • Export: Flux2 Dummy Inputs (modelopt/torch/export/diffusers_utils.py): Detect Flux2Transformer2DModel, add _flux2_inputs() to build Flux2-specific dummy inputs (3D hidden_states, encoder_hidden_states, timestep, img_ids, txt_ids, optional guidance), and insert a ("flux2", is_flux2, _flux2_inputs) entry into model_input_builders.
  • Tests: Flux2 Coverage (tests/_test_utils/torch/diffusers_models.py, tests/unit/torch/export/test_export_diffusers.py): Guard the import of Flux2Transformer2DModel, add a get_tiny_flux2() test factory (skips if unavailable), include it in export tests, and add a Flux2-specific test to validate dummy input shapes and guidance/ID structure.
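
The guarded-import-and-register pattern described above can be sketched as follows; the import path and the registration helper are assumptions standing in for the plugin's actual QuantModuleRegistry mechanism:

_QUANT_ATTENTION_CLASSES: list[type] = []

def register_quant_attention(cls: type) -> None:
    """Stand-in for the plugin's QuantModuleRegistry registration."""
    _QUANT_ATTENTION_CLASSES.append(cls)

try:
    # Flux2 attention classes only exist in newer diffusers; this import path is assumed.
    from diffusers.models.transformers.transformer_flux2 import (
        Flux2Attention,
        Flux2ParallelSelfAttention,
    )
except ImportError:
    Flux2Attention = None
    Flux2ParallelSelfAttention = None

# Register only the classes that imported, so older diffusers installs still import cleanly.
for _cls in (Flux2Attention, Flux2ParallelSelfAttention):
    if _cls is not None:
        register_quant_attention(_cls)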

Sequence Diagram(s)

sequenceDiagram
    participant Registry as ModelRegistry
    participant Exporter as Exporter
    participant Quant as QuantModuleRegistry
    participant Tests as TestSuite

    Registry->>Registry: Register ModelType.FLUX2_DEV\nMap to Flux2Pipeline + defaults + filter
    Exporter->>Registry: Query model metadata (pipeline, defaults)
    Exporter->>Exporter: Detect Flux2Transformer2DModel\nIf Flux2 -> use _flux2_inputs to build dummy inputs
    Quant->>Quant: Try import Flux2Attention / Flux2ParallelSelfAttention
    alt imports succeed
        Quant->>Quant: Register Flux2 attention modules in registry and mixin
    end
    Tests->>Exporter: Run export tests including get_tiny_flux2\nValidate dummy input shapes and guidance

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: docstring coverage is 43.75%, below the required threshold of 80.00%. Resolution: write docstrings for the functions that are missing them.

✅ Passed checks (3 passed)

  • Description Check: skipped because CodeRabbit’s high-level summary is enabled.
  • Title Check: the title "Flux2-Dev Quantization" directly and clearly summarizes the main change: adding Flux2-Dev support to the quantization pipeline.
  • Security Anti-Patterns: none detected (no torch.load with weights_only=False, numpy.load with allow_pickle=True, trust_remote_code=True, eval/exec on external input, or nosec comments).



@codecov

codecov Bot commented Feb 28, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 70.07%. Comparing base (2d7d1ec) to head (ce5e004).
⚠️ Report is 6 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #947   +/-   ##
=======================================
  Coverage   70.07%   70.07%           
=======================================
  Files         221      221           
  Lines       25531    25531           
=======================================
  Hits        17892    17892           
  Misses       7639     7639           


Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
@jingyu-ml jingyu-ml marked this pull request as ready for review March 2, 2026 05:01
@jingyu-ml jingyu-ml requested review from a team as code owners March 2, 2026 05:01
@jingyu-ml jingyu-ml requested a review from cjluo-nv March 2, 2026 05:01
@jingyu-ml jingyu-ml changed the title from Flux2-Dev to Flux2-Dev Quantization on Mar 2, 2026
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
examples/diffusers/quantization/models_utils.py (1)

21-28: ⚠️ Potential issue | 🟡 Minor

Guard the Flux2Pipeline import against older diffusers versions.

The direct import of Flux2Pipeline on line 23 will fail with ImportError for users with diffusers < 0.36.0 (when Flux2Pipeline was added). This mirrors the defensive import pattern already used in modelopt/torch/quantization/plugins/diffusion/diffusers.py for Flux2Attention (lines 37-44), and aligns with the existing pattern in MODEL_PIPELINE which supports None values.

🛡️ Suggested fix
 from diffusers import (
     DiffusionPipeline,
-    Flux2Pipeline,
     FluxPipeline,
     LTXConditionPipeline,
     StableDiffusion3Pipeline,
     WanPipeline,
 )
+
+try:
+    from diffusers import Flux2Pipeline
+except ImportError:
+    Flux2Pipeline = None

Then update line 103 in MODEL_PIPELINE to handle the None case gracefully (optional, depending on how the dictionary is used).
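
A minimal sketch of that None-gating, assuming a hypothetical lookup helper around the existing MODEL_PIPELINE dict from models_utils.py:

def get_pipeline_cls(model_type) -> type:
    """Resolve a pipeline class, failing with a clear message when it is unavailable."""
    pipeline_cls = MODEL_PIPELINE.get(model_type)  # MODEL_PIPELINE defined in models_utils.py
    if pipeline_cls is None:
        raise RuntimeError(
            f"Pipeline for {model_type} is unavailable; "
            "Flux2Pipeline requires diffusers>=0.36.0 (pip install --upgrade diffusers)."
        )
    return pipeline_cls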



📥 Commits

Reviewing files that changed from the base of the PR and between 2905cb0 and 67ec8b3.

📒 Files selected for processing (4)
  • examples/diffusers/quantization/models_utils.py
  • examples/diffusers/quantization/utils.py
  • modelopt/torch/export/diffusers_utils.py
  • modelopt/torch/quantization/plugins/diffusion/diffusers.py

filter_func_map = {
    ModelType.FLUX_DEV: filter_func_flux_dev,
    ModelType.FLUX_SCHNELL: filter_func_default,
    ModelType.FLUX2_DEV: filter_func_flux_dev,
Contributor


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Check if there's any Flux2-specific filter function defined
rg -n 'filter_func.*flux2' --type=py



🏁 Script executed:

# Also search for all flux-related filter function definitions
rg -n 'def filter_func.*flux' --type=py



🏁 Script executed:

# Check the context around line 66 in the file to understand the mapping
cat -n examples/diffusers/quantization/models_utils.py | sed -n '50,80p'



🏁 Script executed:

# Read the filter_func_flux_dev implementation
sed -n '82,120p' examples/diffusers/quantization/utils.py



🏁 Script executed:

# Search for Flux2 model references and layer patterns
rg -n 'flux2|FLUX2' --type=py -i -B 2 -A 2



🏁 Script executed:

# Check if there are any tests or examples showing Flux2 layer structure
fd -type f -name '*.py' | xargs grep -l 'flux2\|FLUX2' | head -20



🏁 Script executed:

# Look for Flux2 transformer model structure
rg -n 'Flux2Transformer2DModel|transformer_flux2' --type=py -A 5 | head -50



🏁 Script executed:

# Check if there are any config or model files for Flux2
fd -type f \( -name '*flux2*' -o -name '*Flux2*' \)



🏁 Script executed:

# Search for any documentation or tests showing Flux2 layer naming
rg -n 'time_text_embed|context_embedder|x_embedder' --type=py



Create a Flux2-specific filter function.

filter_func_flux_dev uses layer names (time_text_embed, context_embedder, x_embedder, norm_out) that are specific to Flux1. Since Flux2Transformer2DModel is architecturally different—it does not use pooled_projections, has distinct attention modules (Flux2Attention, Flux2ParallelSelfAttention), and different input handling—it likely has different layer naming conventions. A separate filter function should be created for Flux2 to ensure the correct layers are selected for quantization.
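
One possible shape for such a filter, matching filter_func_flux_dev's name-based style; every pattern below is a placeholder until the real Flux2 layer names are confirmed, and whether a match means include or exclude must follow the existing filter's contract:

import re

# Placeholder patterns: the actual Flux2 embedder/norm layer names must be read
# off Flux2Transformer2DModel before this filter can be trusted.
_FLUX2_SKIP_PATTERN = re.compile(r"(x_embedder|context_embedder|norm_out|proj_out)")

def filter_func_flux2_dev(name: str) -> bool:
    """Name-based filter for Flux2-dev, mirroring filter_func_flux_dev's signature."""
    return _FLUX2_SKIP_PATTERN.search(name) is not None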


Contributor

Copilot AI left a comment


Pull request overview

This PR adds initial Flux2-dev support across the diffusers quantization flow, ensuring Flux2 attention modules are patchable for MHA quantization and enabling HF checkpoint export by generating Flux2-appropriate dummy inputs.

Changes:

  • Register Flux2Attention / Flux2ParallelSelfAttention in the diffusers quantization plugin (when available).
  • Add Flux2-specific dummy input generation for HF checkpoint export (Flux2Transformer2DModel).
  • Make example MHA quantizer disabling more robust by guarding quantizer attributes with hasattr.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File Description
modelopt/torch/quantization/plugins/diffusion/diffusers.py Adds optional imports + registry entries for Flux2 attention modules so quantization patching applies.
modelopt/torch/export/diffusers_utils.py Adds Flux2 model detection and a Flux2-specific dummy input builder for export.
examples/diffusers/quantization/utils.py Avoids attribute errors when disabling MHA quantizers by checking attributes before calling disable().
examples/diffusers/quantization/models_utils.py Adds a new flux2-dev model type and wires it into model IDs, pipeline selection, and defaults.
Comments suppressed due to low confidence (2)

examples/diffusers/quantization/models_utils.py:28

  • Flux2Pipeline is imported unconditionally from diffusers. Since this repo declares diffusers>=0.32.2, environments with an older diffusers that doesn’t provide Flux2Pipeline will fail at import-time and break the entire example suite. Please wrap the import in a try/except ImportError (similar to other optional diffusers components in the codebase) and handle the absence gracefully (e.g., set it to None and gate usage).
from diffusers import (
    DiffusionPipeline,
    Flux2Pipeline,
    FluxPipeline,
    LTXConditionPipeline,
    StableDiffusion3Pipeline,
    WanPipeline,
)

examples/diffusers/quantization/models_utils.py:107

  • MODEL_PIPELINE registers ModelType.FLUX2_DEV: Flux2Pipeline, which will raise at import-time if Flux2Pipeline is unavailable (and will also make downstream code assume a pipeline exists). After making the import optional, please gate this mapping (and any code that consumes it) so non-Flux2 workflows still work when running with older diffusers versions.
MODEL_PIPELINE: dict[ModelType, type[DiffusionPipeline] | None] = {
    ModelType.SDXL_BASE: DiffusionPipeline,
    ModelType.SDXL_TURBO: DiffusionPipeline,
    ModelType.SD3_MEDIUM: StableDiffusion3Pipeline,
    ModelType.SD35_MEDIUM: StableDiffusion3Pipeline,
    ModelType.FLUX_DEV: FluxPipeline,
    ModelType.FLUX_SCHNELL: FluxPipeline,
    ModelType.FLUX2_DEV: Flux2Pipeline,
    ModelType.LTX_VIDEO_DEV: LTXConditionPipeline,
    ModelType.LTX2: None,
    ModelType.WAN22_T2V_14b: WanPipeline,
    ModelType.WAN22_T2V_5b: WanPipeline,
}


jingyu-ml and others added 2 commits March 5, 2026 22:34
Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
Contributor

@Edwardf0t1 Edwardf0t1 left a comment


Review: Flux2-Dev Quantization

Overall the changes are well-structured and follow existing patterns. A few issues to address:

Issues

1. Unconditional Flux2Pipeline import will break on older diffusers (examples/diffusers/quantization/models_utils.py:23)

Unlike the plugin file which gracefully handles ImportError for Flux2Attention/Flux2ParallelSelfAttention, the Flux2Pipeline import at the top of models_utils.py is unconditional and will crash on diffusers versions that don't have it. Should be guarded with a try/except like the plugin does.

2. filter_func_flux_dev reused for Flux2 — needs verification (models_utils.py:66)

FLUX2_DEV reuses filter_func_flux_dev which was written for Flux1. If Flux2 has different layer naming (given the new attention classes Flux2Attention, Flux2ParallelSelfAttention), this filter may incorrectly include/exclude layers. Please verify this is intentional or create a Flux2-specific filter.

3. Unrelated LTX filter change (examples/diffusers/quantization/utils.py:77)

The filter_func_ltx_video regex was modified to add |blocks\.(0|1|2|45|46|47)\. — this is unrelated to Flux2 support and isn't mentioned in the PR description. Should be in a separate commit or at least documented in the PR body.

4. Missing test coverage

A new model type, a changed check_conv_and_mha code path (hasattr guard), and a new dummy input builder are added with no tests. At minimum a unit test for _flux2_inputs shape correctness and the check_conv_and_mha hasattr path would help prevent regressions.
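
For instance, a shape test along these lines would cover the dummy-input builder; the call signature, key names, and dimensionalities below are assumptions based on this PR's description:

import torch

# The PR places the builder in modelopt/torch/export/diffusers_utils.py;
# the import name is from this PR, the zero-argument call is assumed.
from modelopt.torch.export.diffusers_utils import _flux2_inputs

def test_flux2_dummy_input_shapes():
    inputs = _flux2_inputs()
    assert inputs["hidden_states"].ndim == 3
    assert inputs["encoder_hidden_states"].ndim == 3
    # Positional IDs should align with the corresponding sequence lengths.
    assert inputs["img_ids"].shape[0] == inputs["hidden_states"].shape[1]
    assert inputs["txt_ids"].shape[0] == inputs["encoder_hidden_states"].shape[1]
    if "guidance" in inputs:
        assert torch.all(inputs["guidance"] == 4.0)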

Minor / Nits

  • Hardcoded magic numbers in _flux2_inputs: img_seq_len = 16 and text_seq_len = 8. Consider deriving these from the model config or documenting why they were chosen. The guidance default of 4.0 also differs from Flux1's 3.5 and deserves a brief comment (see the sketch after this list).
  • Inline config dict for Flux2 (models_utils.py:157-165): Flux1 uses a shared _FLUX_BASE_CONFIG constant. Consider extracting a _FLUX2_BASE_CONFIG for consistency/reusability.
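
Both nits could be addressed together roughly like this; all names, default shapes, and the ID layout are illustrative assumptions, not the PR's actual code:

import torch

# Named stand-ins for the magic numbers flagged above (values from this PR;
# the rationale is assumed: tiny sequences keep export tracing cheap).
IMG_SEQ_LEN = 16
TXT_SEQ_LEN = 8
GUIDANCE_SCALE = 4.0  # Flux2-dev default per this PR; Flux1 used 3.5

def _flux2_inputs(batch: int = 1, in_channels: int = 64, joint_dim: int = 4096) -> dict:
    """Hypothetical Flux2 dummy-input builder with the constants documented."""
    return {
        "hidden_states": torch.randn(batch, IMG_SEQ_LEN, in_channels),
        "encoder_hidden_states": torch.randn(batch, TXT_SEQ_LEN, joint_dim),
        "timestep": torch.ones(batch),
        "img_ids": torch.zeros(IMG_SEQ_LEN, 3),  # ID layout assumed to mirror Flux1
        "txt_ids": torch.zeros(TXT_SEQ_LEN, 3),
        "guidance": torch.full((batch,), GUIDANCE_SCALE),
    }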

What looks good

  • The try/except + None sentinel pattern for optional Flux2Attention/Flux2ParallelSelfAttention in the plugin is clean.
  • The hasattr guard in check_conv_and_mha is a good defensive fix for modules that may not have all bmm quantizer attributes.

Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
examples/diffusers/quantization/models_utils.py (1)

66-66: ⚠️ Potential issue | 🟠 Major

Use a Flux2-specific filter for ModelType.FLUX2_DEV.

Line 66 maps Flux2 to filter_func_flux_dev, which is Flux1-pattern-based and can miss/mis-target Flux2 quantizable modules.

🔧 Suggested change
 from utils import (
     filter_func_default,
     filter_func_flux_dev,
+    filter_func_flux2_dev,
     filter_func_ltx_video,
     filter_func_wan_video,
 )
@@
-        ModelType.FLUX2_DEV: filter_func_flux_dev,
+        ModelType.FLUX2_DEV: filter_func_flux2_dev,


📥 Commits

Reviewing files that changed from the base of the PR and between 185fd28 and fead5f1.

📒 Files selected for processing (2)
  • examples/diffusers/quantization/models_utils.py
  • examples/diffusers/quantization/utils.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • examples/diffusers/quantization/utils.py

Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
Contributor

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
examples/diffusers/quantization/models_utils.py (1)

106-106: Improve error message when Flux2Pipeline is unavailable.

Runtime checks already exist in pipeline_manager.py (lines 61-63, 101-104) that handle None pipeline classes. However, the error message "does not use diffusers pipelines" is misleading when the failure is due to an outdated diffusers version. Consider updating the error message to explicitly indicate the version requirement, e.g., "Model type {model_type.value} requires diffusers>=0.36.0. Please upgrade: pip install --upgrade diffusers" to provide clearer guidance to users.



📥 Commits

Reviewing files that changed from the base of the PR and between fead5f1 and ce5e004.

📒 Files selected for processing (4)
  • examples/diffusers/quantization/models_utils.py
  • examples/diffusers/quantization/utils.py
  • tests/_test_utils/torch/diffusers_models.py
  • tests/unit/torch/export/test_export_diffusers.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • examples/diffusers/quantization/utils.py

@jingyu-ml
Contributor Author

Missing test coverage

A new model type, a changed check_conv_and_mha code path (hasattr guard), and a new dummy input builder are added with no tests. At minimum a unit test for _flux2_inputs shape correctness and the check_conv_and_mha hasattr path would help prevent regressions.

Thank you.

Almost everything is fixed, but we couldn’t agree on the test case for check_conv_and_mha. This is just an example script under examples/diffusers/quantization/, not library code. Writing a unit test for it isn’t necessary, since we’ve already verified that the code works for all possible inputs covered by this example.

@jingyu-ml jingyu-ml requested a review from Edwardf0t1 March 14, 2026 05:20
Contributor

@Edwardf0t1 Edwardf0t1 left a comment


Is FP8/NVFP4 Flux2-dev supported in TRT-LLM VisualGen?

@jingyu-ml
Contributor Author

Is FP8/NVFP4 Flux2-dev supported in TRT-LLM VisualGen?

They haven’t yet, but it should be included in their next release.

@jingyu-ml jingyu-ml merged commit 1070d89 into main Mar 14, 2026
40 checks passed
@jingyu-ml jingyu-ml deleted the jingyux/flux2-dev branch March 14, 2026 16:42