
[6056809] Fix TRT dependency in ModelOpt ONNX quantization #1189

Merged
kevalmorabia97 merged 5 commits into NVIDIA:main from gcunhase:dev/gcunhasergio/6056809_modelopt_trt_fails on Apr 7, 2026

Conversation

@gcunhase (Contributor) commented Apr 7, 2026

What does this PR do?

Type of change: Bug fix

A regression introduced by the Autotune integration into ModelOpt ONNX quantization (#951) made ModelOpt depend on TensorRT in all scenarios. This PR fixes the issue by requiring TensorRT only when `--autotune` is enabled.
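The gist of the fix, as a minimal sketch: TensorRT-dependent imports move out of module scope and run only on the autotune path. The `quantize()` signature and error wording below are hypothetical; per the walkthrough, the real change defers the imports into `_find_nodes_to_quantize_autotune` in `quantize.py`.

```python
# Sketch only: defer the TensorRT-dependent import until autotune is requested.
# Function name and arguments here are illustrative, not the actual API.
def quantize(onnx_path: str, autotune: bool = False):
    if autotune:
        try:
            # Imported only on this path, so plain quantization never needs TensorRT.
            from modelopt.onnx.quantization.autotune import QDQAutotuner
        except ImportError as e:
            raise RuntimeError(
                f"Autotune requires TensorRT, which could not be imported: {e}. "
                "Install TensorRT or rerun without --autotune."
            ) from e
        # ... autotune-driven quantization ...
    # ... default quantization path, no TensorRT needed ...
```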

Usage

$ python -m modelopt.onnx.quantization --onnx_path=${MODEL_NAME}.onnx

Testing

See bug 6056809.

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

  • Is this change backward compatible?: ✅
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: N/A
  • Did you write any new necessary tests?: ✅
  • Did you update Changelog?: N/A

Summary by CodeRabbit

  • Bug Fixes

    • Autotune dependency failures now surface as clearer runtime errors instead of only logging warnings.
  • Chores

    • Centralized autotune presets and numeric defaults into a shared configuration.
    • Core autotune components are conditionally exposed so initialization succeeds when optional acceleration libraries are absent.
    • Deferred autotune imports to runtime to improve failure handling.
  • Tests

    • Added a test ensuring the quantization CLI/parser initializes correctly without optional acceleration libraries.

gcunhase added 2 commits April 7, 2026 13:09
Signed-off-by: gcunhase <4861122+gcunhase@users.noreply.github.com>
Signed-off-by: gcunhase <4861122+gcunhase@users.noreply.github.com>
@gcunhase gcunhase requested a review from a team as a code owner April 7, 2026 17:46
@gcunhase gcunhase requested a review from ajrasane April 7, 2026 17:46
@coderabbitai Bot (Contributor) commented Apr 7, 2026

📝 Walkthrough

Moves autotune presets and numeric defaults into autotune.utils, updates import targets across the autotune package, makes TensorRT-dependent autotune exports optional via conditional imports, surfaces Autotune import failures at runtime during quantization, and adds a test ensuring the CLI parser loads without TensorRT.

Changes

  • Autotune utils/constants — modelopt/onnx/quantization/autotune/utils.py
    Adds DEFAULT_NUM_SCHEMES, DEFAULT_WARMUP_RUNS, DEFAULT_TIMING_RUNS, and MODE_PRESETS (quick, default, extensive).
  • Autotune entrypoints — modelopt/onnx/quantization/autotune/__main__.py, modelopt/onnx/quantization/__main__.py
    Remove local preset/default definitions; import MODE_PRESETS and the numeric defaults from autotune.utils and reference those shared constants.
  • Autotune package export safety — modelopt/onnx/quantization/autotune/__init__.py
    Imports utility symbols (MODE_PRESETS, StoreWithExplicitFlag, get_node_filter_list) from .utils; wraps the optional TensorRT-dependent exports in a try/except ImportError and appends them to __all__ only when available.
  • Quantization runtime handling — modelopt/onnx/quantization/quantize.py
    Removes top-level Autotune imports; performs them inside _find_nodes_to_quantize_autotune and raises a RuntimeError (including the original ImportError message and a hint) if they fail.
  • Tests: CLI without TensorRT — tests/unit/onnx/quantization/test_autotune_quantization_integration.py
    Adds test_quantization_cli_parser_imports_without_tensorrt(), which injects tensorrt = None into sys.modules, reloads the autotune package, builds the CLI parser via get_parser(), and asserts that --onnx_path dummy.onnx parses with the default quantize_mode of int8 (sketched below).
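A minimal sketch of that test. Using pytest's monkeypatch, the exact import location of get_parser(), and the parsed attribute names are assumptions; the injected tensorrt = None and the int8 default come from the walkthrough above.

```python
# Sketch only: simulate an environment without TensorRT and check that the
# quantization CLI parser still builds and parses its defaults.
import importlib
import sys

def test_quantization_cli_parser_imports_without_tensorrt(monkeypatch):
    # With None in sys.modules, any "import tensorrt" raises ImportError.
    monkeypatch.setitem(sys.modules, "tensorrt", None)

    # Re-run the package's conditional imports without TensorRT available.
    import modelopt.onnx.quantization.autotune as autotune
    importlib.reload(autotune)

    # Assumption: get_parser() is exposed by the quantization CLI module.
    from modelopt.onnx.quantization.__main__ import get_parser

    args = get_parser().parse_args(["--onnx_path", "dummy.onnx"])
    assert args.onnx_path == "dummy.onnx"
    assert args.quantize_mode == "int8"
```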

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 4 passed

  • Description Check — ✅ Passed. Check skipped: CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed. The title clearly and concisely describes the main fix: removing the unconditional TensorRT dependency from the ONNX quantization module by making it conditional on autotune usage.
  • Docstring Coverage — ✅ Passed. Docstring coverage is 100.00%, above the required threshold of 80.00%.
  • Security Anti-Patterns — ✅ Passed. The changes do not introduce any security anti-patterns; modifications are limited to refactoring imports and making TensorRT an optional dependency.



@gcunhase gcunhase requested a review from kevalmorabia97 April 7, 2026 17:47
@coderabbitai Bot (Contributor) left a comment


Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modelopt/onnx/quantization/autotune/__init__.py`:
- Around line 27-50: Update the module's __all__ so it only exposes symbols
actually imported at runtime: keep the existing base entries (e.g.
"MODE_PRESETS", "StoreWithExplicitFlag", "get_node_filter_list") and add an
_OPTIONAL_EXPORTS list containing the optional TensorRT-related names (e.g.
"AutotunerError", "AutotunerNotInitializedError",
"ChildRegionInputInsertionPoint", "ChildRegionOutputInsertionPoint",
"CombinedRegionSearch", "Config", "InsertionScheme", "InvalidSchemeError",
"NodeInputInsertionPoint", "PatternCache", "PatternSchemes", "QDQAutotuner",
"Region", "RegionPattern", "RegionType", "ResolvedInsertionPoint",
"TensorRTPyBenchmark", "TrtExecBenchmark"), then extend __all__ with only those
names present in globals() (e.g. __all__.extend(name for name in
_OPTIONAL_EXPORTS if name in globals())) so wildcard imports and introspection
won't reference missing symbols when TensorRT is unavailable.
- Around line 27-50: Replace the bare "except ImportError: pass" with a narrow
error handling strategy: catch ModuleNotFoundError (or check for the optional
onnx extras) and only suppress that specific missing-dependency error, but
re-raise any other ImportError or Exception so typos/syntax/internal import
errors in QDQAutotuner, TensorRTPyBenchmark, TrtExecBenchmark, RegionPattern,
CombinedRegionSearch, etc., are surfaced; also make __all__ conditional (build
it from the names that were actually imported or set it to an empty list when
the optional deps are absent) so the public API does not export symbols that
failed to import.
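
A minimal sketch combining both suggestions. The optional-export list is abbreviated here (the full set is enumerated above), the submodule layout in the imports is hypothetical, and the sketch assumes the missing dependency surfaces as ModuleNotFoundError:

```python
# Sketch only: narrow exception handling plus a conditional __all__.
__all__ = ["MODE_PRESETS", "StoreWithExplicitFlag", "get_node_filter_list"]

# Abbreviated; the full list of optional TensorRT-dependent names is above.
_OPTIONAL_EXPORTS = ["QDQAutotuner", "TensorRTPyBenchmark", "TrtExecBenchmark"]

try:
    from .autotuner import QDQAutotuner  # hypothetical submodule layout
    from .benchmark import TensorRTPyBenchmark, TrtExecBenchmark
except ModuleNotFoundError:
    # Suppress only a genuinely missing dependency (e.g. TensorRT); typos or
    # broken submodules raise other ImportErrors and still propagate.
    pass
else:
    __all__.extend(name for name in _OPTIONAL_EXPORTS if name in globals())
```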

In `@modelopt/onnx/quantization/quantize.py`:
- Around line 292-304: The try/except around init_benchmark_instance swallows
the original exception; change the bare except to capture the exception (e.g.,
except Exception as e) and raise a RuntimeError that includes the original error
text and chains it (raise RuntimeError("Failed to initialize benchmark...: {e}")
from e) so the original traceback and message from init_benchmark_instance are
preserved; locate the block around init_benchmark_instance in quantize.py and
update the except accordingly.
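
A sketch of the suggested chaining; apart from the init_benchmark_instance name taken from the comment above, the call site and its arguments are illustrative:

```python
# Sketch only: capture and chain the original exception instead of swallowing it.
try:
    benchmark = init_benchmark_instance(config)  # surrounding code is illustrative
except Exception as e:
    # "from e" preserves the original traceback; the f-string keeps its message.
    raise RuntimeError(f"Failed to initialize benchmark instance: {e}") from e
```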

In `@tests/unit/onnx/quantization/test_autotune_quantization_integration.py`:
- Around line 22-38: Rename the test function
quantization_cli_parser_imports_without_tensorrt to start with the pytest
discovery prefix (e.g., test_quantization_cli_parser_imports_without_tensorrt)
in tests/unit/onnx/quantization/test_autotune_quantization_integration.py so
pytest will run it; locate the function by its current name and update any
internal references or docs/comments that mention the old name, but do not
change the test body or imports.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 2ba3b5d4-cb12-41e7-96e2-a327f6d88d13

📥 Commits

Reviewing files that changed from the base of the PR and between 80d2f02 and 52049b9.

📒 Files selected for processing (6)
  • modelopt/onnx/quantization/__main__.py
  • modelopt/onnx/quantization/autotune/__init__.py
  • modelopt/onnx/quantization/autotune/__main__.py
  • modelopt/onnx/quantization/autotune/utils.py
  • modelopt/onnx/quantization/quantize.py
  • tests/unit/onnx/quantization/test_autotune_quantization_integration.py

@codecov Bot commented Apr 7, 2026

Codecov Report

❌ Patch coverage is 88.88889% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.83%. Comparing base (80d2f02) to head (f47e3ca).
⚠️ Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
modelopt/onnx/quantization/quantize.py 0.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1189      +/-   ##
==========================================
+ Coverage   74.77%   76.83%   +2.05%     
==========================================
  Files         351      351              
  Lines       40289    40291       +2     
==========================================
+ Hits        30125    30956     +831     
+ Misses      10164     9335     -829     
Flag Coverage Δ
examples 44.17% <0.00%> (+3.93%) ⬆️
gpu 56.85% <83.33%> (-0.22%) ⬇️
unit 54.97% <88.88%> (+0.13%) ⬆️

Flags with carried forward coverage won't be shown.


gcunhase added 3 commits April 7, 2026 14:39
Signed-off-by: gcunhase <4861122+gcunhase@users.noreply.github.com>
Signed-off-by: gcunhase <4861122+gcunhase@users.noreply.github.com>
Signed-off-by: gcunhase <4861122+gcunhase@users.noreply.github.com>
@kevalmorabia97 kevalmorabia97 added the cherry-pick-0.44.0 After code freeze, cherry-pick to release branch for next rc (bulk update). Only for bug fixes / doc label Apr 7, 2026
@coderabbitai Bot (Contributor) left a comment


🧹 Nitpick comments (1)
modelopt/onnx/quantization/autotune/__init__.py (1)

27-54: Lazy-load TensorRT-backed modules to avoid import-time side effects for non-autotune consumers.

While the try/except gracefully handles missing TensorRT dependencies, the package still executes the import path for all Autotune modules whenever modelopt.onnx.quantization.autotune is loaded. Consider deferring these imports until actual use via module-level lazy loading (e.g., import_plugin() pattern) to align with the project's optional dependency gating strategy. Current verified imports are all autotune-specific consumers (tests and integration code), so this is a design refinement rather than a blocker.
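
A minimal sketch of such deferral via a module-level __getattr__ (PEP 562), one way to realize the import_plugin() pattern mentioned above; the submodule paths are hypothetical:

```python
# Sketch only: defer heavy imports until an exported name is first accessed.
import importlib

_LAZY_EXPORTS = {
    "QDQAutotuner": ".autotuner",          # hypothetical submodule paths
    "TensorRTPyBenchmark": ".benchmark",
    "TrtExecBenchmark": ".benchmark",
}

def __getattr__(name):
    if name in _LAZY_EXPORTS:
        module = importlib.import_module(_LAZY_EXPORTS[name], __package__)
        value = getattr(module, name)
        globals()[name] = value  # cache so the lazy path runs once per name
        return value
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
```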

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/onnx/quantization/autotune/__init__.py` around lines 27 - 54,
Replace the current top-level try/except imports with a deferred import helper
so importing modelopt.onnx.quantization.autotune has no side-effects; create a
function (e.g., import_autotune_plugins or _ensure_autotune_loaded) that
performs the try/except import of QDQAutotuner, TensorRTPyBenchmark,
TrtExecBenchmark and the symbols from common, insertion_points, region_pattern,
region_search, and on import failure logs the same warning via logger; have the
helper inject the imported names into the module namespace (via globals()) so
callers can access QDQAutotuner, TensorRTPyBenchmark, TrtExecBenchmark,
AutotunerError, AutotunerNotInitializedError, Config, InsertionScheme,
InvalidSchemeError, PatternCache, PatternSchemes, Region, RegionType,
ChildRegionInputInsertionPoint, ChildRegionOutputInsertionPoint,
NodeInputInsertionPoint, ResolvedInsertionPoint, RegionPattern,
CombinedRegionSearch when the helper is first called; leave existing public
names/exports working and ensure any unit tests or call sites call the helper
(or call it lazily on attribute access) before using those symbols.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 8ef7e0df-2178-4197-b527-90c207bb0963

📥 Commits

Reviewing files that changed from the base of the PR and between 61fc352 and f47e3ca.

📒 Files selected for processing (1)
  • modelopt/onnx/quantization/autotune/__init__.py

@kevalmorabia97 kevalmorabia97 merged commit bdc04f1 into NVIDIA:main Apr 7, 2026
97 of 107 checks passed
kevalmorabia97 pushed a commit that referenced this pull request Apr 7, 2026
kinjalpatel27 pushed a commit that referenced this pull request Apr 13, 2026
@kevalmorabia97 kevalmorabia97 removed the cherry-pick-0.44.0 After code freeze, cherry-pick to release branch for next rc (bulk update). Only for bug fixes / doc label Apr 23, 2026