Conversation
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here.
Note: Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the CodeRabbit settings.
📝 Walkthrough

Integrates Flux2 support across diffusers tooling: registers a new ModelType and pipeline, adds Flux2 model defaults and filter, generates Flux2-specific dummy inputs for export, updates the quantization plugin to conditionally register Flux2 attention classes, and expands tests to cover Flux2 model factories and dummy-input expectations.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Registry as ModelRegistry
    participant Exporter as Exporter
    participant Quant as QuantModuleRegistry
    participant Tests as TestSuite
    Registry->>Registry: Register ModelType.FLUX2_DEV\nMap to Flux2Pipeline + defaults + filter
    Exporter->>Registry: Query model metadata (pipeline, defaults)
    Exporter->>Exporter: Detect Flux2Transformer2DModel\nIf Flux2 -> use _flux2_inputs to build dummy inputs
    Quant->>Quant: Try import Flux2Attention / Flux2ParallelSelfAttention
    alt imports succeed
        Quant->>Quant: Register Flux2 attention modules in registry and mixin
    end
    Tests->>Exporter: Run export tests including get_tiny_flux2\nValidate dummy input shapes and guidance
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks: ✅ 3 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

```
@@           Coverage Diff           @@
##             main     #947   +/-   ##
=======================================
  Coverage   70.07%   70.07%
=======================================
  Files         221      221
  Lines       25531    25531
=======================================
  Hits        17892    17892
  Misses       7639     7639
```

☔ View full report in Codecov by Sentry.
Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
examples/diffusers/quantization/models_utils.py (1)
`21-28`: ⚠️ Potential issue | 🟡 Minor

**Guard the `Flux2Pipeline` import against older diffusers versions.**

The direct import of `Flux2Pipeline` on line 23 will fail with `ImportError` for users with diffusers < 0.36.0 (when `Flux2Pipeline` was added). This mirrors the defensive import pattern already used in `modelopt/torch/quantization/plugins/diffusion/diffusers.py` for `Flux2Attention` (lines 37-44), and aligns with the existing pattern in `MODEL_PIPELINE`, which supports `None` values.

🛡️ Suggested fix

```diff
 from diffusers import (
     DiffusionPipeline,
-    Flux2Pipeline,
     FluxPipeline,
     LTXConditionPipeline,
     StableDiffusion3Pipeline,
     WanPipeline,
 )
+
+try:
+    from diffusers import Flux2Pipeline
+except ImportError:
+    Flux2Pipeline = None
```

Then update line 103 in `MODEL_PIPELINE` to handle the `None` case gracefully (optional, depending on how the dictionary is used).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/diffusers/quantization/models_utils.py` around lines 21 - 28, Guard the direct import of Flux2Pipeline by wrapping it in a try/except ImportError and set Flux2Pipeline = None on failure (same pattern used for Flux2Attention), so importing models_utils.py won’t raise for diffusers < 0.36.0; then ensure the MODEL_PIPELINE mapping (the dict containing StableDiffusion3Pipeline, FluxPipeline, WanPipeline, Flux2Pipeline, etc.) and any code that uses Flux2Pipeline checks for None before using it (e.g., skip registration or conditional branch when Flux2Pipeline is None).
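For illustration, gating the `None` case at lookup time could look like the sketch below; the helper name and error wording are hypothetical (the error text echoes the version guidance suggested elsewhere in this review, and the 0.36.0 requirement comes from the comment above):

```python
from enum import Enum

def get_pipeline_cls(model_pipeline: dict, model_type: Enum):
    # Hypothetical lookup guard: MODEL_PIPELINE values may be None when the
    # installed diffusers version lacks the pipeline class.
    pipeline_cls = model_pipeline.get(model_type)
    if pipeline_cls is None:
        raise RuntimeError(
            f"Model type {model_type.value} requires a newer diffusers release "
            "(Flux2Pipeline was added in diffusers 0.36.0). "
            "Please upgrade: pip install --upgrade diffusers"
        )
    return pipeline_cls
```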
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/diffusers/quantization/models_utils.py`:
- Line 66: The mapping uses filter_func_flux_dev (Flux1-specific) for
ModelType.FLUX2_DEV; create a new Flux2-specific filter function (e.g.,
filter_func_flux2_dev) that targets the actual Flux2 layer/attribute names and
structures (handle absence of pooled_projections, different embedder names, and
the Flux2 attention modules Flux2Attention and Flux2ParallelSelfAttention) and
replace the mapping entry to use it; implement the selection logic in
filter_func_flux2_dev to inspect module class/type and attribute names used by
Flux2Transformer2DModel and return True only for the quantizable weight/bias
tensors for those modules, then register filter_func_flux2_dev in the
ModelType.FLUX2_DEV entry.
---
Outside diff comments:
In `@examples/diffusers/quantization/models_utils.py`:
- Around line 21-28: Guard the direct import of Flux2Pipeline by wrapping it in
a try/except ImportError and set Flux2Pipeline = None on failure (same pattern
used for Flux2Attention), so importing models_utils.py won’t raise for diffusers
< 0.36.0; then ensure the MODEL_PIPELINE mapping (the dict containing
StableDiffusion3Pipeline, FluxPipeline, WanPipeline, Flux2Pipeline, etc.) and
any code that uses Flux2Pipeline checks for None before using it (e.g., skip
registration or conditional branch when Flux2Pipeline is None).
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)

- examples/diffusers/quantization/models_utils.py
- examples/diffusers/quantization/utils.py
- modelopt/torch/export/diffusers_utils.py
- modelopt/torch/quantization/plugins/diffusion/diffusers.py
```python
filter_func_map = {
    ModelType.FLUX_DEV: filter_func_flux_dev,
    ModelType.FLUX_SCHNELL: filter_func_default,
    ModelType.FLUX2_DEV: filter_func_flux_dev,
```
🧩 Analysis chain
🏁 Script executed:

```shell
# Check if there's any Flux2-specific filter function defined
rg -n 'filter_func.*flux2' --type=py
```

Repository: NVIDIA/Model-Optimizer
Length of output: 48

🏁 Script executed:

```shell
# Also search for all flux-related filter function definitions
rg -n 'def filter_func.*flux' --type=py
```

Repository: NVIDIA/Model-Optimizer
Length of output: 154

🏁 Script executed:

```shell
# Check the context around line 66 in the file to understand the mapping
cat -n examples/diffusers/quantization/models_utils.py | sed -n '50,80p'
```

Repository: NVIDIA/Model-Optimizer
Length of output: 1309

🏁 Script executed:

```shell
# Read the filter_func_flux_dev implementation
sed -n '82,120p' examples/diffusers/quantization/utils.py
```

Repository: NVIDIA/Model-Optimizer
Length of output: 1456

🏁 Script executed:

```shell
# Search for Flux2 model references and layer patterns
rg -n 'flux2|FLUX2' --type=py -i -B 2 -A 2
```

Repository: NVIDIA/Model-Optimizer
Length of output: 6822

🏁 Script executed:

```shell
# Check if there are any tests or examples showing Flux2 layer structure
fd -type f -name '*.py' | xargs grep -l 'flux2\|FLUX2' | head -20
```

Repository: NVIDIA/Model-Optimizer
Length of output: 236

🏁 Script executed:

```shell
# Look for Flux2 transformer model structure
rg -n 'Flux2Transformer2DModel|transformer_flux2' --type=py -A 5 | head -50
```

Repository: NVIDIA/Model-Optimizer
Length of output: 1926

🏁 Script executed:

```shell
# Check if there are any config or model files for Flux2
fd -type f \( -name '*flux2*' -o -name '*Flux2*' \)
```

Repository: NVIDIA/Model-Optimizer
Length of output: 236

🏁 Script executed:

```shell
# Search for any documentation or tests showing Flux2 layer naming
rg -n 'time_text_embed|context_embedder|x_embedder' --type=py
```

Repository: NVIDIA/Model-Optimizer
Length of output: 800
**Create a Flux2-specific filter function.**

`filter_func_flux_dev` uses layer names (`time_text_embed`, `context_embedder`, `x_embedder`, `norm_out`) that are specific to Flux1. Since `Flux2Transformer2DModel` is architecturally different—it does not use `pooled_projections`, has distinct attention modules (`Flux2Attention`, `Flux2ParallelSelfAttention`), and different input handling—it likely has different layer naming conventions. A separate filter function should be created for Flux2 to ensure the correct layers are selected for quantization.
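A rough sketch of what such a function could look like, mirroring the regex style of `filter_func_flux_dev`; every layer-name fragment below is an assumption and would need to be read off `Flux2Transformer2DModel.named_modules()` before use:

```python
import re

# Hypothetical Flux2 filter with the same signature as filter_func_flux_dev.
_FLUX2_SKIP_PATTERN = re.compile(
    r".*(img_in|txt_in|time_guidance_embed|norm_out|proj_out).*"  # names assumed
)

def filter_func_flux2_dev(name: str) -> bool:
    # Returns True for layers the quantization flow should treat specially,
    # following the same convention as the Flux1 filter.
    return _FLUX2_SKIP_PATTERN.match(name) is not None
```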
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@examples/diffusers/quantization/models_utils.py` at line 66, The mapping uses
filter_func_flux_dev (Flux1-specific) for ModelType.FLUX2_DEV; create a new
Flux2-specific filter function (e.g., filter_func_flux2_dev) that targets the
actual Flux2 layer/attribute names and structures (handle absence of
pooled_projections, different embedder names, and the Flux2 attention modules
Flux2Attention and Flux2ParallelSelfAttention) and replace the mapping entry to
use it; implement the selection logic in filter_func_flux2_dev to inspect module
class/type and attribute names used by Flux2Transformer2DModel and return True
only for the quantizable weight/bias tensors for those modules, then register
filter_func_flux2_dev in the ModelType.FLUX2_DEV entry.
Pull request overview
This PR adds initial Flux2-dev support across the diffusers quantization flow, ensuring Flux2 attention modules are patchable for MHA quantization and enabling HF checkpoint export by generating Flux2-appropriate dummy inputs.
Changes:
- Register `Flux2Attention`/`Flux2ParallelSelfAttention` in the diffusers quantization plugin (when available).
- Add Flux2-specific dummy input generation for HF checkpoint export (`Flux2Transformer2DModel`).
- Make example MHA quantizer disabling more robust by guarding quantizer attributes with `hasattr`.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| modelopt/torch/quantization/plugins/diffusion/diffusers.py | Adds optional imports + registry entries for Flux2 attention modules so quantization patching applies (see the sketch after this table). |
| modelopt/torch/export/diffusers_utils.py | Adds Flux2 model detection and a Flux2-specific dummy input builder for export. |
| examples/diffusers/quantization/utils.py | Avoids attribute errors when disabling MHA quantizers by checking attributes before calling disable(). |
| examples/diffusers/quantization/models_utils.py | Adds a new flux2-dev model type and wires it into model IDs, pipeline selection, and defaults. |
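For context, the "optional imports + registry entries" pattern described in the first row typically has this shape (a sketch only; the diffusers import path and the registry hand-off are assumptions, not the plugin's verbatim code):

```python
# Guarded import: older diffusers releases do not ship the Flux2 classes.
try:
    from diffusers.models.transformers.transformer_flux2 import (  # path assumed
        Flux2Attention,
        Flux2ParallelSelfAttention,
    )
except ImportError:
    Flux2Attention = None
    Flux2ParallelSelfAttention = None

# Collect the classes to hand to the quantization registry only when available;
# per the PR summary, the real plugin registers them with QuantModuleRegistry.
OPTIONAL_FLUX2_QUANT_CLASSES = [
    cls for cls in (Flux2Attention, Flux2ParallelSelfAttention) if cls is not None
]
```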
Comments suppressed due to low confidence (2)
examples/diffusers/quantization/models_utils.py:28

`Flux2Pipeline` is imported unconditionally from `diffusers`. Since this repo declares `diffusers>=0.32.2`, environments with an older diffusers that doesn’t provide `Flux2Pipeline` will fail at import time and break the entire example suite. Please wrap the import in a `try/except ImportError` (similar to other optional diffusers components in the codebase) and handle the absence gracefully (e.g., set it to `None` and gate usage).

```python
from diffusers import (
    DiffusionPipeline,
    Flux2Pipeline,
    FluxPipeline,
    LTXConditionPipeline,
    StableDiffusion3Pipeline,
    WanPipeline,
)
```
examples/diffusers/quantization/models_utils.py:107

`MODEL_PIPELINE` registers `ModelType.FLUX2_DEV: Flux2Pipeline`, which will raise at import time if `Flux2Pipeline` is unavailable (and will also make downstream code assume a pipeline exists). After making the import optional, please gate this mapping (and any code that consumes it) so non-Flux2 workflows still work when running with older diffusers versions.

```python
MODEL_PIPELINE: dict[ModelType, type[DiffusionPipeline] | None] = {
    ModelType.SDXL_BASE: DiffusionPipeline,
    ModelType.SDXL_TURBO: DiffusionPipeline,
    ModelType.SD3_MEDIUM: StableDiffusion3Pipeline,
    ModelType.SD35_MEDIUM: StableDiffusion3Pipeline,
    ModelType.FLUX_DEV: FluxPipeline,
    ModelType.FLUX_SCHNELL: FluxPipeline,
    ModelType.FLUX2_DEV: Flux2Pipeline,
    ModelType.LTX_VIDEO_DEV: LTXConditionPipeline,
    ModelType.LTX2: None,
    ModelType.WAN22_T2V_14b: WanPipeline,
    ModelType.WAN22_T2V_5b: WanPipeline,
}
```
Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
Edwardf0t1
left a comment
Review: Flux2-Dev Quantization
Overall the changes are well-structured and follow existing patterns. A few issues to address:
Issues
1. Unconditional Flux2Pipeline import will break on older diffusers (examples/diffusers/quantization/models_utils.py:23)
Unlike the plugin file which gracefully handles ImportError for Flux2Attention/Flux2ParallelSelfAttention, the Flux2Pipeline import at the top of models_utils.py is unconditional and will crash on diffusers versions that don't have it. Should be guarded with a try/except like the plugin does.
2. filter_func_flux_dev reused for Flux2 — needs verification (models_utils.py:66)
FLUX2_DEV reuses filter_func_flux_dev which was written for Flux1. If Flux2 has different layer naming (given the new attention classes Flux2Attention, Flux2ParallelSelfAttention), this filter may incorrectly include/exclude layers. Please verify this is intentional or create a Flux2-specific filter.
3. Unrelated LTX filter change (examples/diffusers/quantization/utils.py:77)
The filter_func_ltx_video regex was modified to add |blocks\.(0|1|2|45|46|47)\. — this is unrelated to Flux2 support and isn't mentioned in the PR description. Should be in a separate commit or at least documented in the PR body.
4. Missing test coverage
A new model type, a changed check_conv_and_mha code path (hasattr guard), and a new dummy input builder are added with no tests. At minimum a unit test for _flux2_inputs shape correctness and the check_conv_and_mha hasattr path would help prevent regressions.
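As an example of what such a test could look like (a hypothetical pytest sketch; the import location and signature of `_flux2_inputs` are assumptions, not taken from the PR):

```python
import pytest

torch = pytest.importorskip("torch")

def test_flux2_dummy_input_shapes():
    # Assumed module path and call signature; adjust to match the PR's helper.
    diffusers_utils = pytest.importorskip("modelopt.torch.export.diffusers_utils")
    inputs = diffusers_utils._flux2_inputs(batch_size=2)  # signature assumed
    assert inputs["hidden_states"].shape[0] == 2
    # Per the review discussion, Flux2 dummy inputs carry a guidance tensor.
    assert "guidance" in inputs
```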
Minor / Nits
- Hardcoded magic numbers in `_flux2_inputs`: `img_seq_len = 16`, `text_seq_len = 8` — consider deriving from config or documenting why these values. Also, guidance default `4.0` vs Flux1's `3.5` — worth a brief comment on why it differs (see the sketch after this list).
- Inline config dict for Flux2 (models_utils.py:157-165): Flux1 uses a shared `_FLUX_BASE_CONFIG` constant. Consider extracting a `_FLUX2_BASE_CONFIG` for consistency/reusability.
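For instance, the sequence lengths could be derived rather than hardcoded, along the lines of this sketch (illustrative only; every config attribute name below is an assumption modeled on the Flux1 transformer, not the PR's actual builder):

```python
import torch

def _flux2_inputs_sketch(config, batch_size: int = 1, device: str = "cpu"):
    """Illustrative dummy-input builder; attribute names are assumptions."""
    img_seq_len = getattr(config, "sample_size", 4) ** 2        # assumed attribute
    text_seq_len = getattr(config, "max_text_seq_len", 8)       # assumed attribute
    in_channels = getattr(config, "in_channels", 64)            # assumed attribute
    text_dim = getattr(config, "joint_attention_dim", 4096)     # assumed attribute
    return {
        "hidden_states": torch.randn(batch_size, img_seq_len, in_channels, device=device),
        "encoder_hidden_states": torch.randn(batch_size, text_seq_len, text_dim, device=device),
        "timestep": torch.ones(batch_size, device=device),
        # Per the review, Flux2's guidance default (4.0) differs from Flux1's 3.5.
        "guidance": torch.full((batch_size,), 4.0, device=device),
    }
```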
What looks good
- The try/except + `None` sentinel pattern for optional `Flux2Attention`/`Flux2ParallelSelfAttention` in the plugin is clean.
- The `hasattr` guard in `check_conv_and_mha` is a good defensive fix for modules that may not have all bmm quantizer attributes (sketched below).
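The guard being praised follows this general shape (a minimal sketch; the quantizer attribute names are assumptions from the discussion, not copied from utils.py):

```python
def disable_mha_bmm_quantizers(module):
    # Disable bmm quantizers only when present, so modules carrying a
    # partial set of quantizer attributes no longer raise AttributeError.
    for attr in ("q_bmm_quantizer", "k_bmm_quantizer", "v_bmm_quantizer"):
        if hasattr(module, attr):
            getattr(module, attr).disable()
```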
Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
Actionable comments posted: 1
♻️ Duplicate comments (1)
examples/diffusers/quantization/models_utils.py (1)
`66-66`: ⚠️ Potential issue | 🟠 Major

**Use a Flux2-specific filter for `ModelType.FLUX2_DEV`.**

Line 66 maps Flux2 to `filter_func_flux_dev`, which is Flux1-pattern-based and can miss/mis-target Flux2 quantizable modules.

🔧 Suggested change

```diff
 from utils import (
     filter_func_default,
     filter_func_flux_dev,
+    filter_func_flux2_dev,
     filter_func_ltx_video,
     filter_func_wan_video,
 )
@@
-    ModelType.FLUX2_DEV: filter_func_flux_dev,
+    ModelType.FLUX2_DEV: filter_func_flux2_dev,
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/diffusers/quantization/models_utils.py` at line 66, The mapping for ModelType.FLUX2_DEV currently uses the Flux1-style filter_func_flux_dev which misses Flux2 modules; update the dispatch so ModelType.FLUX2_DEV uses a Flux2-specific filter (e.g., filter_func_flux2), or implement filter_func_flux2 with the correct Flux2 quantizable-module patterns and replace the existing entry mapping ModelType.FLUX2_DEV -> filter_func_flux_dev to ModelType.FLUX2_DEV -> filter_func_flux2; ensure the new filter function follows the same signature as filter_func_flux_dev so callers like the quantization pipeline can use it unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/diffusers/quantization/models_utils.py`:
- Around line 23-24: The module eagerly imports Flux2Pipeline (introduced in
diffusers v0.36.0) which breaks imports for older diffusers versions; either
bump the package constraint to diffusers>=0.36.0 or guard the import: change the
top-level import of Flux2Pipeline (alongside FluxPipeline) to a safe pattern
that tries to import Flux2Pipeline and falls back (e.g., set Flux2Pipeline =
None or import within the function that needs it) so the module can import
successfully on older diffusers; update any usage sites to handle the
guarded/optional Flux2Pipeline accordingly.
---
Duplicate comments:
In `@examples/diffusers/quantization/models_utils.py`:
- Line 66: The mapping for ModelType.FLUX2_DEV currently uses the Flux1-style
filter_func_flux_dev which misses Flux2 modules; update the dispatch so
ModelType.FLUX2_DEV uses a Flux2-specific filter (e.g., filter_func_flux2), or
implement filter_func_flux2 with the correct Flux2 quantizable-module patterns
and replace the existing entry mapping ModelType.FLUX2_DEV ->
filter_func_flux_dev to ModelType.FLUX2_DEV -> filter_func_flux2; ensure the new
filter function follows the same signature as filter_func_flux_dev so callers
like the quantization pipeline can use it unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 5d86670d-b07f-4846-9add-e7a3a82a57ea
📒 Files selected for processing (2)
- examples/diffusers/quantization/models_utils.py
- examples/diffusers/quantization/utils.py
🚧 Files skipped from review as they are similar to previous changes (1)
- examples/diffusers/quantization/utils.py
🧹 Nitpick comments (1)
examples/diffusers/quantization/models_utils.py (1)
`106-106`: Improve error message when `Flux2Pipeline` is unavailable.

Runtime checks already exist in pipeline_manager.py (lines 61-63, 101-104) that handle `None` pipeline classes. However, the error message "does not use diffusers pipelines" is misleading when the failure is due to an outdated diffusers version. Consider updating the error message to explicitly indicate the version requirement, e.g., "Model type {model_type.value} requires diffusers>=0.36.0. Please upgrade: pip install --upgrade diffusers", to provide clearer guidance to users.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/diffusers/quantization/models_utils.py` at line 106, The current mapping includes ModelType.FLUX2_DEV -> Flux2Pipeline but when Flux2Pipeline is unavailable the runtime checks in pipeline_manager (the branches that handle a None pipeline class) raise an error message "does not use diffusers pipelines" which is misleading; update the error raised when the pipeline class is None (the logic that checks the pipeline class returned for ModelType.FLUX2_DEV / Flux2Pipeline) to mention the minimum diffusers version required (suggest: "Model type {model_type.value} requires diffusers>=0.36.0. Please upgrade: pip install --upgrade diffusers") so users get clear upgrade guidance instead of the generic message.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@examples/diffusers/quantization/models_utils.py`:
- Line 106: The current mapping includes ModelType.FLUX2_DEV -> Flux2Pipeline
but when Flux2Pipeline is unavailable the runtime checks in pipeline_manager
(the branches that handle a None pipeline class) raise an error message "does
not use diffusers pipelines" which is misleading; update the error raised when
the pipeline class is None (the logic that checks the pipeline class returned
for ModelType.FLUX2_DEV / Flux2Pipeline) to mention the minimum diffusers
version required (suggest: "Model type {model_type.value} requires
diffusers>=0.36.0. Please upgrade: pip install --upgrade diffusers") so users
get clear upgrade guidance instead of the generic message.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: a04e846d-b842-4433-8abb-dcbfb6ac2d89
📒 Files selected for processing (4)
- examples/diffusers/quantization/models_utils.py
- examples/diffusers/quantization/utils.py
- tests/_test_utils/torch/diffusers_models.py
- tests/unit/torch/export/test_export_diffusers.py
🚧 Files skipped from review as they are similar to previous changes (1)
- examples/diffusers/quantization/utils.py
Thank you. Almost everything is fixed, but we couldn’t agree on the test case for check_conv_and_mha. This is just an example script under examples/diffusers/quantization/, not library code. Writing a unit test for it isn’t necessary, since we’ve already verified that the code works for all possible inputs covered by this example.
Edwardf0t1
left a comment
Is FP8/NVFP4 Flux2-dev supported in TRT-LLM VisualGen?
Not yet, but it should be included in their next release.
What does this PR do?
Type of change: new example
Overview:
Usage
```bash
python quantize.py \
    --model flux2-dev \
    --model-dtype BFloat16 \
    --format fp4 \
    --batch-size 2 \
    --calib-size 1 \
    --n-steps 20 \
    --quantized-torch-ckpt-save-path ./flux2-dev-fp4.pt \
    --collect-method default \
    --hf-ckpt-dir ./flux2-dev-fp4
```

Testing
Before your PR is "Ready for review"
Additional Information
Summary by CodeRabbit
New Features
Refactor
Infrastructure
Tests