
[6056809] Fix TRT dependency in ModelOpt ONNX quantization #1189

Merged
kevalmorabia97 merged 5 commits into NVIDIA:main from gcunhase:dev/gcunhasergio/6056809_modelopt_trt_fails on Apr 7, 2026

Conversation

@gcunhase (Contributor) commented Apr 7, 2026

What does this PR do?

Type of change: Bug fix

A regression introduced by the Autotune integration into ModelOpt ONNX quantization (#951) made ModelOpt depend on TensorRT in all scenarios. This PR fixes the issue by requiring TensorRT only when `--autotune` is enabled.
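The gist of the fix, as a minimal sketch: TensorRT-dependent imports move out of module scope and run only on the autotune path. The `quantize()` signature and error wording below are hypothetical; per the walkthrough, the real change defers the imports into `_find_nodes_to_quantize_autotune` in `quantize.py`.

```python
# Sketch only: defer the TensorRT-dependent import until autotune is requested.
# Function name and arguments here are illustrative, not the actual API.
def quantize(onnx_path: str, autotune: bool = False):
    if autotune:
        try:
            # Imported only on this path, so plain quantization never needs TensorRT.
            from modelopt.onnx.quantization.autotune import QDQAutotuner
        except ImportError as e:
            raise RuntimeError(
                f"Autotune requires TensorRT, which could not be imported: {e}. "
                "Install TensorRT or rerun without --autotune."
            ) from e
        # ... autotune-driven quantization ...
    # ... default quantization path, no TensorRT needed ...
```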

Usage

$ python -m modelopt.onnx.quantization --onnx_path=${MODEL_NAME}.onnx

Testing

See bug 6056809.

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

  • Is this change backward compatible?: ✅
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: N/A
  • Did you write any new necessary tests?: ✅
  • Did you update Changelog?: N/A

Summary by CodeRabbit

  • Bug Fixes

    • Autotune dependency failures now surface as clearer runtime errors instead of only logging warnings.
  • Chores

    • Centralized autotune presets and numeric defaults into a shared configuration.
    • Core autotune components are conditionally exposed so initialization succeeds when optional acceleration libraries are absent.
    • Deferred autotune imports to runtime to improve failure handling.
  • Tests

    • Added a test ensuring the quantization CLI/parser initializes correctly without optional acceleration libraries.

gcunhase added 2 commits April 7, 2026 13:09
Signed-off-by: gcunhase <4861122+gcunhase@users.noreply.github.com>
Signed-off-by: gcunhase <4861122+gcunhase@users.noreply.github.com>
@gcunhase gcunhase requested a review from a team as a code owner April 7, 2026 17:46
@gcunhase gcunhase requested a review from ajrasane April 7, 2026 17:46
@coderabbitai Bot (Contributor) commented Apr 7, 2026

📝 Walkthrough

Moves autotune presets and numeric defaults into autotune.utils, updates import targets across the autotune package, makes TensorRT-dependent autotune exports optional via conditional imports, surfaces Autotune import failures at runtime during quantization, and adds a test ensuring the CLI parser loads without TensorRT.

Changes

  • Autotune utils/constants — modelopt/onnx/quantization/autotune/utils.py
    Adds DEFAULT_NUM_SCHEMES, DEFAULT_WARMUP_RUNS, DEFAULT_TIMING_RUNS, and MODE_PRESETS (quick, default, extensive).
  • Autotune entrypoints — modelopt/onnx/quantization/autotune/__main__.py, modelopt/onnx/quantization/__main__.py
    Remove local preset/default definitions; import MODE_PRESETS and the numeric defaults from autotune.utils and reference those shared constants.
  • Autotune package export safety — modelopt/onnx/quantization/autotune/__init__.py
    Imports utility symbols (MODE_PRESETS, StoreWithExplicitFlag, get_node_filter_list) from .utils; wraps the optional TensorRT-dependent exports in a try/except ImportError and appends them to __all__ only when available.
  • Quantization runtime handling — modelopt/onnx/quantization/quantize.py
    Removes top-level Autotune imports; performs them inside _find_nodes_to_quantize_autotune and raises a RuntimeError (including the original ImportError message and a hint) if they fail.
  • Tests: CLI without TensorRT — tests/unit/onnx/quantization/test_autotune_quantization_integration.py
    Adds test_quantization_cli_parser_imports_without_tensorrt(), which injects tensorrt = None into sys.modules, reloads the autotune package, builds the CLI parser via get_parser(), and asserts that --onnx_path dummy.onnx parses with the default quantize_mode of int8 (sketched below).
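A minimal sketch of that test. Using pytest's monkeypatch, the exact import location of get_parser(), and the parsed attribute names are assumptions; the injected tensorrt = None and the int8 default come from the walkthrough above.

```python
# Sketch only: simulate an environment without TensorRT and check that the
# quantization CLI parser still builds and parses its defaults.
import importlib
import sys

def test_quantization_cli_parser_imports_without_tensorrt(monkeypatch):
    # With None in sys.modules, any "import tensorrt" raises ImportError.
    monkeypatch.setitem(sys.modules, "tensorrt", None)

    # Re-run the package's conditional imports without TensorRT available.
    import modelopt.onnx.quantization.autotune as autotune
    importlib.reload(autotune)

    # Assumption: get_parser() is exposed by the quantization CLI module.
    from modelopt.onnx.quantization.__main__ import get_parser

    args = get_parser().parse_args(["--onnx_path", "dummy.onnx"])
    assert args.onnx_path == "dummy.onnx"
    assert args.quantize_mode == "int8"
```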

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 4 passed

  • Description Check — ✅ Passed. Check skipped: CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed. The title clearly and concisely describes the main fix: removing the unconditional TensorRT dependency from the ONNX quantization module by making it conditional on autotune usage.
  • Docstring Coverage — ✅ Passed. Docstring coverage is 100.00%, above the required threshold of 80.00%.
  • Security Anti-Patterns — ✅ Passed. The changes do not introduce any security anti-patterns; modifications are limited to refactoring imports and making TensorRT an optional dependency.



@gcunhase gcunhase requested a review from kevalmorabia97 April 7, 2026 17:47
@coderabbitai Bot (Contributor) left a comment


Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modelopt/onnx/quantization/autotune/__init__.py`:
- Around line 27-50: Update the module's __all__ so it only exposes symbols
actually imported at runtime: keep the existing base entries (e.g.
"MODE_PRESETS", "StoreWithExplicitFlag", "get_node_filter_list") and add an
_OPTIONAL_EXPORTS list containing the optional TensorRT-related names (e.g.
"AutotunerError", "AutotunerNotInitializedError",
"ChildRegionInputInsertionPoint", "ChildRegionOutputInsertionPoint",
"CombinedRegionSearch", "Config", "InsertionScheme", "InvalidSchemeError",
"NodeInputInsertionPoint", "PatternCache", "PatternSchemes", "QDQAutotuner",
"Region", "RegionPattern", "RegionType", "ResolvedInsertionPoint",
"TensorRTPyBenchmark", "TrtExecBenchmark"), then extend __all__ with only those
names present in globals() (e.g. __all__.extend(name for name in
_OPTIONAL_EXPORTS if name in globals())) so wildcard imports and introspection
won't reference missing symbols when TensorRT is unavailable.
- Around line 27-50: Replace the bare "except ImportError: pass" with a narrow
error handling strategy: catch ModuleNotFoundError (or check for the optional
onnx extras) and only suppress that specific missing-dependency error, but
re-raise any other ImportError or Exception so typos/syntax/internal import
errors in QDQAutotuner, TensorRTPyBenchmark, TrtExecBenchmark, RegionPattern,
CombinedRegionSearch, etc., are surfaced; also make __all__ conditional (build
it from the names that were actually imported or set it to an empty list when
the optional deps are absent) so the public API does not export symbols that
failed to import.
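
A minimal sketch combining both suggestions. The optional-export list is abbreviated here (the full set is enumerated above), the submodule layout in the imports is hypothetical, and the sketch assumes the missing dependency surfaces as ModuleNotFoundError:

```python
# Sketch only: narrow exception handling plus a conditional __all__.
__all__ = ["MODE_PRESETS", "StoreWithExplicitFlag", "get_node_filter_list"]

# Abbreviated; the full list of optional TensorRT-dependent names is above.
_OPTIONAL_EXPORTS = ["QDQAutotuner", "TensorRTPyBenchmark", "TrtExecBenchmark"]

try:
    from .autotuner import QDQAutotuner  # hypothetical submodule layout
    from .benchmark import TensorRTPyBenchmark, TrtExecBenchmark
except ModuleNotFoundError:
    # Suppress only a genuinely missing dependency (e.g. TensorRT); typos or
    # broken submodules raise other ImportErrors and still propagate.
    pass
else:
    __all__.extend(name for name in _OPTIONAL_EXPORTS if name in globals())
```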

In `@modelopt/onnx/quantization/quantize.py`:
- Around line 292-304: The try/except around init_benchmark_instance swallows
the original exception; change the bare except to capture the exception (e.g.,
except Exception as e) and raise a RuntimeError that includes the original error
text and chains it (raise RuntimeError("Failed to initialize benchmark...: {e}")
from e) so the original traceback and message from init_benchmark_instance are
preserved; locate the block around init_benchmark_instance in quantize.py and
update the except accordingly.
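
A sketch of the suggested chaining; apart from the init_benchmark_instance name taken from the comment above, the call site and its arguments are illustrative:

```python
# Sketch only: capture and chain the original exception instead of swallowing it.
try:
    benchmark = init_benchmark_instance(config)  # surrounding code is illustrative
except Exception as e:
    # "from e" preserves the original traceback; the f-string keeps its message.
    raise RuntimeError(f"Failed to initialize benchmark instance: {e}") from e
```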

In `@tests/unit/onnx/quantization/test_autotune_quantization_integration.py`:
- Around line 22-38: Rename the test function
quantization_cli_parser_imports_without_tensorrt to start with the pytest
discovery prefix (e.g., test_quantization_cli_parser_imports_without_tensorrt)
in tests/unit/onnx/quantization/test_autotune_quantization_integration.py so
pytest will run it; locate the function by its current name and update any
internal references or docs/comments that mention the old name, but do not
change the test body or imports.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 2ba3b5d4-cb12-41e7-96e2-a327f6d88d13

📥 Commits

Reviewing files that changed from the base of the PR and between 80d2f02 and 52049b9.

📒 Files selected for processing (6)
  • modelopt/onnx/quantization/__main__.py
  • modelopt/onnx/quantization/autotune/__init__.py
  • modelopt/onnx/quantization/autotune/__main__.py
  • modelopt/onnx/quantization/autotune/utils.py
  • modelopt/onnx/quantization/quantize.py
  • tests/unit/onnx/quantization/test_autotune_quantization_integration.py

@codecov Bot commented Apr 7, 2026

Codecov Report

❌ Patch coverage is 88.88889% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.83%. Comparing base (80d2f02) to head (f47e3ca).
⚠️ Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
modelopt/onnx/quantization/quantize.py 0.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1189      +/-   ##
==========================================
+ Coverage   74.77%   76.83%   +2.05%     
==========================================
  Files         351      351              
  Lines       40289    40291       +2     
==========================================
+ Hits        30125    30956     +831     
+ Misses      10164     9335     -829     
Flag Coverage Δ
examples 44.17% <0.00%> (+3.93%) ⬆️
gpu 56.85% <83.33%> (-0.22%) ⬇️
unit 54.97% <88.88%> (+0.13%) ⬆️

Flags with carried forward coverage won't be shown.


gcunhase added 3 commits April 7, 2026 14:39
Signed-off-by: gcunhase <4861122+gcunhase@users.noreply.github.com>
Signed-off-by: gcunhase <4861122+gcunhase@users.noreply.github.com>
Signed-off-by: gcunhase <4861122+gcunhase@users.noreply.github.com>
@kevalmorabia97 kevalmorabia97 added the cherry-pick-0.44.0 After code freeze, cherry-pick to release branch for next rc (bulk update). Only for bug fixes / doc label Apr 7, 2026
@coderabbitai Bot (Contributor) left a comment


🧹 Nitpick comments (1)
modelopt/onnx/quantization/autotune/__init__.py (1)

27-54: Lazy-load TensorRT-backed modules to avoid import-time side effects for non-autotune consumers.

While the try/except gracefully handles missing TensorRT dependencies, the package still executes the import path for all Autotune modules whenever modelopt.onnx.quantization.autotune is loaded. Consider deferring these imports until actual use via module-level lazy loading (e.g., import_plugin() pattern) to align with the project's optional dependency gating strategy. Current verified imports are all autotune-specific consumers (tests and integration code), so this is a design refinement rather than a blocker.
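
A minimal sketch of such deferral via a module-level __getattr__ (PEP 562), one way to realize the import_plugin() pattern mentioned above; the submodule paths are hypothetical:

```python
# Sketch only: defer heavy imports until an exported name is first accessed.
import importlib

_LAZY_EXPORTS = {
    "QDQAutotuner": ".autotuner",          # hypothetical submodule paths
    "TensorRTPyBenchmark": ".benchmark",
    "TrtExecBenchmark": ".benchmark",
}

def __getattr__(name):
    if name in _LAZY_EXPORTS:
        module = importlib.import_module(_LAZY_EXPORTS[name], __package__)
        value = getattr(module, name)
        globals()[name] = value  # cache so the lazy path runs once per name
        return value
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
```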

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/onnx/quantization/autotune/__init__.py` around lines 27 - 54,
Replace the current top-level try/except imports with a deferred import helper
so importing modelopt.onnx.quantization.autotune has no side-effects; create a
function (e.g., import_autotune_plugins or _ensure_autotune_loaded) that
performs the try/except import of QDQAutotuner, TensorRTPyBenchmark,
TrtExecBenchmark and the symbols from common, insertion_points, region_pattern,
region_search, and on import failure logs the same warning via logger; have the
helper inject the imported names into the module namespace (via globals()) so
callers can access QDQAutotuner, TensorRTPyBenchmark, TrtExecBenchmark,
AutotunerError, AutotunerNotInitializedError, Config, InsertionScheme,
InvalidSchemeError, PatternCache, PatternSchemes, Region, RegionType,
ChildRegionInputInsertionPoint, ChildRegionOutputInsertionPoint,
NodeInputInsertionPoint, ResolvedInsertionPoint, RegionPattern,
CombinedRegionSearch when the helper is first called; leave existing public
names/exports working and ensure any unit tests or call sites call the helper
(or call it lazily on attribute access) before using those symbols.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 8ef7e0df-2178-4197-b527-90c207bb0963

📥 Commits

Reviewing files that changed from the base of the PR and between 61fc352 and f47e3ca.

📒 Files selected for processing (1)
  • modelopt/onnx/quantization/autotune/__init__.py

@kevalmorabia97 kevalmorabia97 merged commit bdc04f1 into NVIDIA:main Apr 7, 2026
97 of 107 checks passed
kevalmorabia97 pushed a commit that referenced this pull request Apr 7, 2026
kinjalpatel27 pushed a commit that referenced this pull request Apr 13, 2026
@kevalmorabia97 kevalmorabia97 removed the cherry-pick-0.44.0 After code freeze, cherry-pick to release branch for next rc (bulk update). Only for bug fixes / doc label Apr 23, 2026