
Flux2-Dev Quantization #947

Merged
jingyu-ml merged 8 commits into main from jingyux/flux2-dev on Mar 14, 2026

Conversation

@jingyu-ml
Contributor

@jingyu-ml jingyu-ml commented Feb 28, 2026

What does this PR do?

Type of change: new example

Overview:

  • Register Flux2Attention and Flux2ParallelSelfAttention in the quantization plugin so bmm quantizers are patched (enables --quantize-mha).
  • Add Flux2-specific dummy input generation for HF checkpoint export.
  • Guard check_conv_and_mha with hasattr for bmm quantizer attributes (a sketch of this pattern follows this list).
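
For illustration, a minimal sketch of that hasattr guard, assuming hypothetical quantizer attribute names; the real names live in the example's check_conv_and_mha and may differ:

import torch.nn as nn

# Candidate BMM quantizer attributes; these names are assumptions for this sketch.
_BMM_QUANTIZER_ATTRS = ("q_bmm_quantizer", "k_bmm_quantizer", "v_bmm_quantizer")

def disable_bmm_quantizers(module: nn.Module) -> None:
    """Disable whichever BMM quantizers a module actually defines."""
    for attr in _BMM_QUANTIZER_ATTRS:
        if hasattr(module, attr):  # guard: not every attention class has all quantizers
            getattr(module, attr).disable()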

Usage

python quantize.py \
    --model flux2-dev \
    --model-dtype BFloat16 \
    --format fp4 --batch-size 2 --calib-size 1 \
    --n-steps 20 --quantized-torch-ckpt-save-path ./flux2-dev-fp4.pt --collect-method default \
    --hf-ckpt-dir ./flux2-dev-fp4

Testing

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes
  • Did you write any new necessary tests?: No
  • Did you add or update any necessary documentation?: No
  • Did you update Changelog?: No

Additional Information

Summary by CodeRabbit

  • New Features

    • Added Flux2-dev model support with Flux2-compatible dummy input generation and default inference params (768×1024, guidance scale 4.0).
  • Refactor

    • Made attention quantization disabling more robust by iterating available quantizers before disabling.
  • Infrastructure

    • Flux2 attention components are now optional and registered only when present to avoid import issues.
  • Tests

    • Added Flux2 test helpers and coverage validating Flux2 dummy input shapes.

Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
@jingyu-ml jingyu-ml requested a review from a team as a code owner February 28, 2026 00:18
@jingyu-ml jingyu-ml requested a review from ajrasane February 28, 2026 00:18
@jingyu-ml jingyu-ml marked this pull request as draft February 28, 2026 00:18
@copy-pr-bot

copy-pr-bot Bot commented Feb 28, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@jingyu-ml jingyu-ml self-assigned this Feb 28, 2026
@coderabbitai
Contributor

coderabbitai Bot commented Feb 28, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Integrates Flux2 support across diffusers tooling: registers a new ModelType and pipeline, adds Flux2 model defaults and a filter, generates Flux2-specific dummy inputs for export, updates the quantization plugin to conditionally register Flux2 attention classes, and expands tests to cover Flux2 model factories and dummy-input expectations.

Changes

  • Flux2 Model Registration & Defaults (examples/diffusers/quantization/models_utils.py): Import Flux2Pipeline (optional), add ModelType.FLUX2_DEV, map it to Flux2Pipeline, add the registry entry black-forest-labs/FLUX.2-dev, wire up the filter function, and add a default config (backbone, dataset, inference_extra_args: height 768, width 1024, guidance_scale 4.0).
  • Quantizer/Attention Handling (examples/diffusers/quantization/utils.py, modelopt/torch/quantization/plugins/diffusion/diffusers.py): Make FP8-MHA disablement robust by iterating over possible quantizer attributes with hasattr() before calling disable(); add guarded imports for Flux2Attention / Flux2ParallelSelfAttention and conditionally register them in QuantModuleRegistry and _QuantAttentionModuleMixin only if available (see the sketch after this list).
  • Export: Flux2 Dummy Inputs (modelopt/torch/export/diffusers_utils.py): Detect Flux2Transformer2DModel, add _flux2_inputs() to build Flux2-specific dummy inputs (3D hidden_states, encoder_hidden_states, timestep, img_ids, txt_ids, optional guidance), and insert a ("flux2", is_flux2, _flux2_inputs) entry into model_input_builders.
  • Tests: Flux2 Coverage (tests/_test_utils/torch/diffusers_models.py, tests/unit/torch/export/test_export_diffusers.py): Guard the import of Flux2Transformer2DModel, add a get_tiny_flux2() test factory (skips if unavailable), include it in export tests, and add a Flux2-specific test to validate dummy input shapes and guidance/ID structure.
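
The guarded-import-and-register pattern described above can be sketched as follows; the import path and the registration helper are assumptions standing in for the plugin's actual QuantModuleRegistry mechanism:

_QUANT_ATTENTION_CLASSES: list[type] = []

def register_quant_attention(cls: type) -> None:
    """Stand-in for the plugin's QuantModuleRegistry registration."""
    _QUANT_ATTENTION_CLASSES.append(cls)

try:
    # Flux2 attention classes only exist in newer diffusers; this import path is assumed.
    from diffusers.models.transformers.transformer_flux2 import (
        Flux2Attention,
        Flux2ParallelSelfAttention,
    )
except ImportError:
    Flux2Attention = None
    Flux2ParallelSelfAttention = None

# Register only the classes that imported, so older diffusers installs still import cleanly.
for _cls in (Flux2Attention, Flux2ParallelSelfAttention):
    if _cls is not None:
        register_quant_attention(_cls)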

Sequence Diagram(s)

sequenceDiagram
    participant Registry as ModelRegistry
    participant Exporter as Exporter
    participant Quant as QuantModuleRegistry
    participant Tests as TestSuite

    Registry->>Registry: Register ModelType.FLUX2_DEV\nMap to Flux2Pipeline + defaults + filter
    Exporter->>Registry: Query model metadata (pipeline, defaults)
    Exporter->>Exporter: Detect Flux2Transformer2DModel\nIf Flux2 -> use _flux2_inputs to build dummy inputs
    Quant->>Quant: Try import Flux2Attention / Flux2ParallelSelfAttention
    alt imports succeed
        Quant->>Quant: Register Flux2 attention modules in registry and mixin
    end
    Tests->>Exporter: Run export tests including get_tiny_flux2\nValidate dummy input shapes and guidance

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: docstring coverage is 43.75%, below the required threshold of 80.00%. Resolution: write docstrings for the functions that are missing them.

✅ Passed checks (3 passed)

  • Description Check: skipped because CodeRabbit’s high-level summary is enabled.
  • Title Check: the title "Flux2-Dev Quantization" directly and clearly summarizes the main change: adding Flux2-Dev support to the quantization pipeline.
  • Security Anti-Patterns: none detected (no torch.load with weights_only=False, numpy.load with allow_pickle=True, trust_remote_code=True, eval/exec on external input, or nosec comments).



@codecov

codecov Bot commented Feb 28, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 70.07%. Comparing base (2d7d1ec) to head (ce5e004).
⚠️ Report is 6 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #947   +/-   ##
=======================================
  Coverage   70.07%   70.07%           
=======================================
  Files         221      221           
  Lines       25531    25531           
=======================================
  Hits        17892    17892           
  Misses       7639     7639           


Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
@jingyu-ml jingyu-ml marked this pull request as ready for review March 2, 2026 05:01
@jingyu-ml jingyu-ml requested review from a team as code owners March 2, 2026 05:01
@jingyu-ml jingyu-ml requested a review from cjluo-nv March 2, 2026 05:01
@jingyu-ml jingyu-ml changed the title from Flux2-Dev to Flux2-Dev Quantization on Mar 2, 2026
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
examples/diffusers/quantization/models_utils.py (1)

21-28: ⚠️ Potential issue | 🟡 Minor

Guard the Flux2Pipeline import against older diffusers versions.

The direct import of Flux2Pipeline on line 23 will fail with ImportError for users with diffusers < 0.36.0 (when Flux2Pipeline was added). This mirrors the defensive import pattern already used in modelopt/torch/quantization/plugins/diffusion/diffusers.py for Flux2Attention (lines 37-44), and aligns with the existing pattern in MODEL_PIPELINE which supports None values.

🛡️ Suggested fix
 from diffusers import (
     DiffusionPipeline,
-    Flux2Pipeline,
     FluxPipeline,
     LTXConditionPipeline,
     StableDiffusion3Pipeline,
     WanPipeline,
 )
+
+try:
+    from diffusers import Flux2Pipeline
+except ImportError:
+    Flux2Pipeline = None

Then update line 103 in MODEL_PIPELINE to handle the None case gracefully (optional, depending on how the dictionary is used).
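
A minimal sketch of that None-gating, assuming a hypothetical lookup helper around the existing MODEL_PIPELINE dict from models_utils.py:

def get_pipeline_cls(model_type) -> type:
    """Resolve a pipeline class, failing with a clear message when it is unavailable."""
    pipeline_cls = MODEL_PIPELINE.get(model_type)  # MODEL_PIPELINE defined in models_utils.py
    if pipeline_cls is None:
        raise RuntimeError(
            f"Pipeline for {model_type} is unavailable; "
            "Flux2Pipeline requires diffusers>=0.36.0 (pip install --upgrade diffusers)."
        )
    return pipeline_cls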



📥 Commits

Reviewing files that changed from the base of the PR and between 2905cb0 and 67ec8b3.

📒 Files selected for processing (4)
  • examples/diffusers/quantization/models_utils.py
  • examples/diffusers/quantization/utils.py
  • modelopt/torch/export/diffusers_utils.py
  • modelopt/torch/quantization/plugins/diffusion/diffusers.py

filter_func_map = {
    ModelType.FLUX_DEV: filter_func_flux_dev,
    ModelType.FLUX_SCHNELL: filter_func_default,
    ModelType.FLUX2_DEV: filter_func_flux_dev,
Contributor


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Check if there's any Flux2-specific filter function defined
rg -n 'filter_func.*flux2' --type=py



🏁 Script executed:

# Also search for all flux-related filter function definitions
rg -n 'def filter_func.*flux' --type=py



🏁 Script executed:

# Check the context around line 66 in the file to understand the mapping
cat -n examples/diffusers/quantization/models_utils.py | sed -n '50,80p'



🏁 Script executed:

# Read the filter_func_flux_dev implementation
sed -n '82,120p' examples/diffusers/quantization/utils.py



🏁 Script executed:

# Search for Flux2 model references and layer patterns
rg -n 'flux2|FLUX2' --type=py -i -B 2 -A 2



🏁 Script executed:

# Check if there are any tests or examples showing Flux2 layer structure
fd -type f -name '*.py' | xargs grep -l 'flux2\|FLUX2' | head -20



🏁 Script executed:

# Look for Flux2 transformer model structure
rg -n 'Flux2Transformer2DModel|transformer_flux2' --type=py -A 5 | head -50



🏁 Script executed:

# Check if there are any config or model files for Flux2
fd -type f \( -name '*flux2*' -o -name '*Flux2*' \)



🏁 Script executed:

# Search for any documentation or tests showing Flux2 layer naming
rg -n 'time_text_embed|context_embedder|x_embedder' --type=py



Create a Flux2-specific filter function.

filter_func_flux_dev uses layer names (time_text_embed, context_embedder, x_embedder, norm_out) that are specific to Flux1. Since Flux2Transformer2DModel is architecturally different—it does not use pooled_projections, has distinct attention modules (Flux2Attention, Flux2ParallelSelfAttention), and different input handling—it likely has different layer naming conventions. A separate filter function should be created for Flux2 to ensure the correct layers are selected for quantization.
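
One possible shape for such a filter, matching filter_func_flux_dev's name-based style; every pattern below is a placeholder until the real Flux2 layer names are confirmed, and whether a match means include or exclude must follow the existing filter's contract:

import re

# Placeholder patterns: the actual Flux2 embedder/norm layer names must be read
# off Flux2Transformer2DModel before this filter can be trusted.
_FLUX2_SKIP_PATTERN = re.compile(r"(x_embedder|context_embedder|norm_out|proj_out)")

def filter_func_flux2_dev(name: str) -> bool:
    """Name-based filter for Flux2-dev, mirroring filter_func_flux_dev's signature."""
    return _FLUX2_SKIP_PATTERN.search(name) is not None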


Contributor

Copilot AI left a comment


Pull request overview

This PR adds initial Flux2-dev support across the diffusers quantization flow, ensuring Flux2 attention modules are patchable for MHA quantization and enabling HF checkpoint export by generating Flux2-appropriate dummy inputs.

Changes:

  • Register Flux2Attention / Flux2ParallelSelfAttention in the diffusers quantization plugin (when available).
  • Add Flux2-specific dummy input generation for HF checkpoint export (Flux2Transformer2DModel).
  • Make example MHA quantizer disabling more robust by guarding quantizer attributes with hasattr.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File Description
modelopt/torch/quantization/plugins/diffusion/diffusers.py Adds optional imports + registry entries for Flux2 attention modules so quantization patching applies.
modelopt/torch/export/diffusers_utils.py Adds Flux2 model detection and a Flux2-specific dummy input builder for export.
examples/diffusers/quantization/utils.py Avoids attribute errors when disabling MHA quantizers by checking attributes before calling disable().
examples/diffusers/quantization/models_utils.py Adds a new flux2-dev model type and wires it into model IDs, pipeline selection, and defaults.
Comments suppressed due to low confidence (2)

examples/diffusers/quantization/models_utils.py:28

  • Flux2Pipeline is imported unconditionally from diffusers. Since this repo declares diffusers>=0.32.2, environments with an older diffusers that doesn’t provide Flux2Pipeline will fail at import-time and break the entire example suite. Please wrap the import in a try/except ImportError (similar to other optional diffusers components in the codebase) and handle the absence gracefully (e.g., set it to None and gate usage).
from diffusers import (
    DiffusionPipeline,
    Flux2Pipeline,
    FluxPipeline,
    LTXConditionPipeline,
    StableDiffusion3Pipeline,
    WanPipeline,
)

examples/diffusers/quantization/models_utils.py:107

  • MODEL_PIPELINE registers ModelType.FLUX2_DEV: Flux2Pipeline, which will raise at import-time if Flux2Pipeline is unavailable (and will also make downstream code assume a pipeline exists). After making the import optional, please gate this mapping (and any code that consumes it) so non-Flux2 workflows still work when running with older diffusers versions.
MODEL_PIPELINE: dict[ModelType, type[DiffusionPipeline] | None] = {
    ModelType.SDXL_BASE: DiffusionPipeline,
    ModelType.SDXL_TURBO: DiffusionPipeline,
    ModelType.SD3_MEDIUM: StableDiffusion3Pipeline,
    ModelType.SD35_MEDIUM: StableDiffusion3Pipeline,
    ModelType.FLUX_DEV: FluxPipeline,
    ModelType.FLUX_SCHNELL: FluxPipeline,
    ModelType.FLUX2_DEV: Flux2Pipeline,
    ModelType.LTX_VIDEO_DEV: LTXConditionPipeline,
    ModelType.LTX2: None,
    ModelType.WAN22_T2V_14b: WanPipeline,
    ModelType.WAN22_T2V_5b: WanPipeline,
}


jingyu-ml and others added 2 commits March 5, 2026 22:34
Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
Contributor

@Edwardf0t1 Edwardf0t1 left a comment


Review: Flux2-Dev Quantization

Overall the changes are well-structured and follow existing patterns. A few issues to address:

Issues

1. Unconditional Flux2Pipeline import will break on older diffusers (examples/diffusers/quantization/models_utils.py:23)

Unlike the plugin file which gracefully handles ImportError for Flux2Attention/Flux2ParallelSelfAttention, the Flux2Pipeline import at the top of models_utils.py is unconditional and will crash on diffusers versions that don't have it. Should be guarded with a try/except like the plugin does.

2. filter_func_flux_dev reused for Flux2 — needs verification (models_utils.py:66)

FLUX2_DEV reuses filter_func_flux_dev which was written for Flux1. If Flux2 has different layer naming (given the new attention classes Flux2Attention, Flux2ParallelSelfAttention), this filter may incorrectly include/exclude layers. Please verify this is intentional or create a Flux2-specific filter.

3. Unrelated LTX filter change (examples/diffusers/quantization/utils.py:77)

The filter_func_ltx_video regex was modified to add |blocks\.(0|1|2|45|46|47)\. — this is unrelated to Flux2 support and isn't mentioned in the PR description. Should be in a separate commit or at least documented in the PR body.

4. Missing test coverage

A new model type, a changed check_conv_and_mha code path (hasattr guard), and a new dummy input builder are added with no tests. At minimum a unit test for _flux2_inputs shape correctness and the check_conv_and_mha hasattr path would help prevent regressions.
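
For instance, a shape test along these lines would cover the dummy-input builder; the call signature, key names, and dimensionalities below are assumptions based on this PR's description:

import torch

# The PR places the builder in modelopt/torch/export/diffusers_utils.py;
# the import name is from this PR, the zero-argument call is assumed.
from modelopt.torch.export.diffusers_utils import _flux2_inputs

def test_flux2_dummy_input_shapes():
    inputs = _flux2_inputs()
    assert inputs["hidden_states"].ndim == 3
    assert inputs["encoder_hidden_states"].ndim == 3
    # Positional IDs should align with the corresponding sequence lengths.
    assert inputs["img_ids"].shape[0] == inputs["hidden_states"].shape[1]
    assert inputs["txt_ids"].shape[0] == inputs["encoder_hidden_states"].shape[1]
    if "guidance" in inputs:
        assert torch.all(inputs["guidance"] == 4.0)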

Minor / Nits

  • Hardcoded magic numbers in _flux2_inputs: img_seq_len = 16 and text_seq_len = 8. Consider deriving these from the model config or documenting why they were chosen. The guidance default of 4.0 also differs from Flux1's 3.5 and deserves a brief comment (see the sketch after this list).
  • Inline config dict for Flux2 (models_utils.py:157-165): Flux1 uses a shared _FLUX_BASE_CONFIG constant. Consider extracting a _FLUX2_BASE_CONFIG for consistency/reusability.
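
Both nits could be addressed together roughly like this; all names, default shapes, and the ID layout are illustrative assumptions, not the PR's actual code:

import torch

# Named stand-ins for the magic numbers flagged above (values from this PR;
# the rationale is assumed: tiny sequences keep export tracing cheap).
IMG_SEQ_LEN = 16
TXT_SEQ_LEN = 8
GUIDANCE_SCALE = 4.0  # Flux2-dev default per this PR; Flux1 used 3.5

def _flux2_inputs(batch: int = 1, in_channels: int = 64, joint_dim: int = 4096) -> dict:
    """Hypothetical Flux2 dummy-input builder with the constants documented."""
    return {
        "hidden_states": torch.randn(batch, IMG_SEQ_LEN, in_channels),
        "encoder_hidden_states": torch.randn(batch, TXT_SEQ_LEN, joint_dim),
        "timestep": torch.ones(batch),
        "img_ids": torch.zeros(IMG_SEQ_LEN, 3),  # ID layout assumed to mirror Flux1
        "txt_ids": torch.zeros(TXT_SEQ_LEN, 3),
        "guidance": torch.full((batch,), GUIDANCE_SCALE),
    }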

What looks good

  • The try/except + None sentinel pattern for optional Flux2Attention/Flux2ParallelSelfAttention in the plugin is clean.
  • The hasattr guard in check_conv_and_mha is a good defensive fix for modules that may not have all bmm quantizer attributes.

Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
examples/diffusers/quantization/models_utils.py (1)

66-66: ⚠️ Potential issue | 🟠 Major

Use a Flux2-specific filter for ModelType.FLUX2_DEV.

Line 66 maps Flux2 to filter_func_flux_dev, which is Flux1-pattern-based and can miss/mis-target Flux2 quantizable modules.

🔧 Suggested change
 from utils import (
     filter_func_default,
     filter_func_flux_dev,
+    filter_func_flux2_dev,
     filter_func_ltx_video,
     filter_func_wan_video,
 )
@@
-        ModelType.FLUX2_DEV: filter_func_flux_dev,
+        ModelType.FLUX2_DEV: filter_func_flux2_dev,


📥 Commits

Reviewing files that changed from the base of the PR and between 185fd28 and fead5f1.

📒 Files selected for processing (2)
  • examples/diffusers/quantization/models_utils.py
  • examples/diffusers/quantization/utils.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • examples/diffusers/quantization/utils.py

Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
Contributor

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
examples/diffusers/quantization/models_utils.py (1)

106-106: Improve error message when Flux2Pipeline is unavailable.

Runtime checks already exist in pipeline_manager.py (lines 61-63, 101-104) that handle None pipeline classes. However, the error message "does not use diffusers pipelines" is misleading when the failure is due to an outdated diffusers version. Consider updating the error message to explicitly indicate the version requirement, e.g., "Model type {model_type.value} requires diffusers>=0.36.0. Please upgrade: pip install --upgrade diffusers" to provide clearer guidance to users.



📥 Commits

Reviewing files that changed from the base of the PR and between fead5f1 and ce5e004.

📒 Files selected for processing (4)
  • examples/diffusers/quantization/models_utils.py
  • examples/diffusers/quantization/utils.py
  • tests/_test_utils/torch/diffusers_models.py
  • tests/unit/torch/export/test_export_diffusers.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • examples/diffusers/quantization/utils.py

@jingyu-ml
Contributor Author

Missing test coverage

A new model type, a changed check_conv_and_mha code path (hasattr guard), and a new dummy input builder are added with no tests. At minimum a unit test for _flux2_inputs shape correctness and the check_conv_and_mha hasattr path would help prevent regressions.

Thank you.

Almost everything is fixed, but we couldn’t agree on the test case for check_conv_and_mha. This is just an example script under examples/diffusers/quantization/, not library code. Writing a unit test for it isn’t necessary, since we’ve already verified that the code works for all possible inputs covered by this example.

@jingyu-ml jingyu-ml requested a review from Edwardf0t1 March 14, 2026 05:20
Contributor

@Edwardf0t1 Edwardf0t1 left a comment


Is FP8/NVFP4 Flux2-dev supported in TRT-LLM VisualGen?

@jingyu-ml
Contributor Author

Is FP8/NVFP4 Flux2-dev supported in TRT-LLM VisualGen?

They haven’t yet, but it should be included in their next release.

@jingyu-ml jingyu-ml merged commit 1070d89 into main Mar 14, 2026
40 checks passed
@jingyu-ml jingyu-ml deleted the jingyux/flux2-dev branch March 14, 2026 16:42