[6056809] Fix TRT dependency in ModelOpt ONNX quantization (#1189)
Conversation
Signed-off-by: gcunhase <4861122+gcunhase@users.noreply.github.com>
📝 Walkthrough
Moves autotune presets and numeric defaults into a shared configuration.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 4 passed
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modelopt/onnx/quantization/autotune/__init__.py`:
- Around line 27-50: Update the module's __all__ so it only exposes symbols
actually imported at runtime: keep the existing base entries (e.g.
"MODE_PRESETS", "StoreWithExplicitFlag", "get_node_filter_list") and add an
_OPTIONAL_EXPORTS list containing the optional TensorRT-related names (e.g.
"AutotunerError", "AutotunerNotInitializedError",
"ChildRegionInputInsertionPoint", "ChildRegionOutputInsertionPoint",
"CombinedRegionSearch", "Config", "InsertionScheme", "InvalidSchemeError",
"NodeInputInsertionPoint", "PatternCache", "PatternSchemes", "QDQAutotuner",
"Region", "RegionPattern", "RegionType", "ResolvedInsertionPoint",
"TensorRTPyBenchmark", "TrtExecBenchmark"), then extend __all__ with only those
names present in globals() (e.g. __all__.extend(name for name in
_OPTIONAL_EXPORTS if name in globals())) so wildcard imports and introspection
won't reference missing symbols when TensorRT is unavailable.
- Around line 27-50: Replace the bare "except ImportError: pass" with a narrow
error handling strategy: catch ModuleNotFoundError (or check for the optional
onnx extras) and only suppress that specific missing-dependency error, but
re-raise any other ImportError or Exception so typos/syntax/internal import
errors in QDQAutotuner, TensorRTPyBenchmark, TrtExecBenchmark, RegionPattern,
CombinedRegionSearch, etc., are surfaced; also make __all__ conditional (build
it from the names that were actually imported or set it to an empty list when
the optional deps are absent) so the public API does not export symbols that
failed to import.
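
Taken together, the two comments above amount to a change along these lines. This is a minimal sketch, not the actual file: the base and optional names are taken from the comments, but the submodule import paths are assumptions.

```python
# Sketch of modelopt/onnx/quantization/autotune/__init__.py after both fixes.

# Base exports that do not depend on TensorRT.
__all__ = ["MODE_PRESETS", "StoreWithExplicitFlag", "get_node_filter_list"]

# Optional TensorRT-backed names (abbreviated; see the full list above).
_OPTIONAL_EXPORTS = [
    "AutotunerError",
    "Config",
    "QDQAutotuner",
    "TensorRTPyBenchmark",
    "TrtExecBenchmark",
]

try:
    # Hypothetical import paths, for illustration only.
    from .common import AutotunerError, Config
    from .autotuner import QDQAutotuner, TensorRTPyBenchmark, TrtExecBenchmark
except ModuleNotFoundError:
    # Only a genuinely missing optional dependency (e.g. tensorrt) is
    # suppressed. ModuleNotFoundError is a subclass of ImportError, so any
    # other ImportError (typos, broken submodules) still propagates.
    pass

# Export only the optional names that actually imported, so wildcard imports
# and introspection never reference missing symbols.
__all__.extend(name for name in _OPTIONAL_EXPORTS if name in globals())
```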
In `@modelopt/onnx/quantization/quantize.py`:
- Around line 292-304: The try/except around init_benchmark_instance swallows
the original exception; change the bare except to capture the exception (e.g.,
except Exception as e) and raise a RuntimeError that includes the original error
text and chains it (raise RuntimeError(f"Failed to initialize benchmark...: {e}")
from e) so the original traceback and message from init_benchmark_instance are
preserved; locate the block around init_benchmark_instance in quantize.py and
update the except accordingly.
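
The exception-chaining pattern the comment asks for looks like this; `init_benchmark_instance` here is a stand-in stub so the snippet runs on its own, not the real helper from quantize.py.

```python
def init_benchmark_instance(args):
    # Stand-in for the real helper in quantize.py.
    raise ImportError("tensorrt is not installed")


try:
    benchmark = init_benchmark_instance(None)
except Exception as e:
    # "from e" chains the original exception, so its message and full
    # traceback are preserved underneath the clearer RuntimeError.
    raise RuntimeError(f"Failed to initialize benchmark instance: {e}") from e
```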
In `@tests/unit/onnx/quantization/test_autotune_quantization_integration.py`:
- Around line 22-38: Rename the test function
quantization_cli_parser_imports_without_tensorrt to start with the pytest
discovery prefix (e.g., test_quantization_cli_parser_imports_without_tensorrt)
in tests/unit/onnx/quantization/test_autotune_quantization_integration.py so
pytest will run it; locate the function by its current name and update any
internal references or docs/comments that mention the old name, but do not
change the test body or imports.
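
The suggested fix is purely a rename; a before/after sketch with the body elided and unchanged:

```python
# Before: silently skipped, because the name lacks the test_ prefix.
def quantization_cli_parser_imports_without_tensorrt():
    ...


# After: discovered and run by pytest's default "test_*" collection rule.
# Body and imports stay exactly as they were.
def test_quantization_cli_parser_imports_without_tensorrt():
    ...
```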
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 2ba3b5d4-cb12-41e7-96e2-a327f6d88d13
📒 Files selected for processing (6)
- modelopt/onnx/quantization/__main__.py
- modelopt/onnx/quantization/autotune/__init__.py
- modelopt/onnx/quantization/autotune/__main__.py
- modelopt/onnx/quantization/autotune/utils.py
- modelopt/onnx/quantization/quantize.py
- tests/unit/onnx/quantization/test_autotune_quantization_integration.py
Codecov Report
❌ Patch coverage is
Additional details and impacted files

```
@@            Coverage Diff             @@
##             main    #1189      +/-   ##
==========================================
+ Coverage   74.77%   76.83%   +2.05%
==========================================
  Files         351      351
  Lines       40289    40291       +2
==========================================
+ Hits        30125    30956     +831
+ Misses      10164     9335     -829
```
Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
🧹 Nitpick comments (1)
modelopt/onnx/quantization/autotune/__init__.py (1)
27-54: Lazy-load TensorRT-backed modules to avoid import-time side effects for non-autotune consumers. While the `try/except` gracefully handles missing TensorRT dependencies, the package still executes the import path for all Autotune modules whenever `modelopt.onnx.quantization.autotune` is loaded. Consider deferring these imports until actual use via module-level lazy loading (e.g., an `import_plugin()` pattern) to align with the project's optional dependency gating strategy. Current verified imports are all autotune-specific consumers (tests and integration code), so this is a design refinement rather than a blocker.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@modelopt/onnx/quantization/autotune/__init__.py`:
- Around line 27-54: Replace the current top-level try/except imports with a
deferred import helper so importing modelopt.onnx.quantization.autotune has no
side-effects; create a function (e.g., import_autotune_plugins or
_ensure_autotune_loaded) that performs the try/except import of QDQAutotuner,
TensorRTPyBenchmark, TrtExecBenchmark and the symbols from common,
insertion_points, region_pattern, region_search, and on import failure logs the
same warning via logger; have the helper inject the imported names into the
module namespace (via globals()) so callers can access QDQAutotuner,
TensorRTPyBenchmark, TrtExecBenchmark, AutotunerError,
AutotunerNotInitializedError, Config, InsertionScheme, InvalidSchemeError,
PatternCache, PatternSchemes, Region, RegionType,
ChildRegionInputInsertionPoint, ChildRegionOutputInsertionPoint,
NodeInputInsertionPoint, ResolvedInsertionPoint, RegionPattern,
CombinedRegionSearch when the helper is first called; leave existing public
names/exports working and ensure any unit tests or call sites call the helper
(or call it lazily on attribute access) before using those symbols.
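
One common way to implement the deferred-import helper described above is a module-level `__getattr__` (PEP 562). The sketch below is illustrative: the name-to-submodule mapping is abbreviated and assumed, beyond the submodules the prompt itself names.

```python
import importlib
import logging

logger = logging.getLogger(__name__)

# Public name -> defining submodule (abbreviated, assumed mapping).
_LAZY_ATTRS = {
    "AutotunerError": ".common",
    "Config": ".common",
    "NodeInputInsertionPoint": ".insertion_points",
    "RegionPattern": ".region_pattern",
    "CombinedRegionSearch": ".region_search",
}


def __getattr__(name: str):
    """Import TensorRT-backed symbols on first attribute access (PEP 562)."""
    if name in _LAZY_ATTRS:
        try:
            module = importlib.import_module(_LAZY_ATTRS[name], __package__)
        except ModuleNotFoundError as e:
            logger.warning("Autotune is unavailable: %s", e)
            raise AttributeError(name) from e
        value = getattr(module, name)
        globals()[name] = value  # cache so this hook runs once per name
        return value
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
```

With this pattern, importing `modelopt.onnx.quantization.autotune` never touches TensorRT; the import cost is paid on first attribute access instead.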
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 8ef7e0df-2178-4197-b527-90c207bb0963
📒 Files selected for processing (1)
modelopt/onnx/quantization/autotune/__init__.py
### What does this PR do?

**Type of change:** Bug fix

Regression bug introduced by the Autotune integration into ModelOpt ONNX quantization (#951), which made ModelOpt dependent on TensorRT in all scenarios. This PR fixes the issue by requiring TensorRT only when `--autotune` is enabled.

### Usage

```bash
$ python -m modelopt.onnx.quantization --onnx_path=${MODEL_NAME}.onnx
```

### Testing

See bug 6056809.

### Before your PR is "*Ready for review*"

- Make sure you read and follow [Contributor guidelines](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md) and your commits are signed (`git commit -s -S`).
- Make sure you read and follow the [Security Best Practices](https://github.com/NVIDIA/Model-Optimizer/blob/main/SECURITY.md#security-coding-practices-for-contributors) (e.g. avoiding hardcoded `trust_remote_code=True`, `torch.load(..., weights_only=False)`, `pickle`, etc.).
- Is this change backward compatible?: ✅
- If you copied code from any other sources or added a new PIP dependency, did you follow guidance in `CONTRIBUTING.md`?: N/A
- Did you write any new necessary tests?: ✅
- Did you update the [Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?: N/A

## Summary by CodeRabbit

* **Bug Fixes**
  * Autotune dependency failures now surface as clearer runtime errors instead of only logging warnings.
* **Chores**
  * Centralized autotune presets and numeric defaults into a shared configuration.
  * Core autotune components are conditionally exposed so initialization succeeds when optional acceleration libraries are absent.
  * Deferred autotune imports to runtime to improve failure handling.
* **Tests**
  * Added a test ensuring the quantization CLI/parser initializes correctly without optional acceleration libraries.

Signed-off-by: gcunhase <4861122+gcunhase@users.noreply.github.com>
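
Reduced to its essentials, the gating this PR describes has roughly the following shape. This is a sketch, not the actual ModelOpt code: the argument handling and error message are illustrative, and only `QDQAutotuner` and the package path are taken from the review above.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--onnx_path", required=True)
parser.add_argument("--autotune", action="store_true")
args = parser.parse_args()

if args.autotune:
    # TensorRT-backed autotuning is imported only when requested, so plain
    # quantization runs no longer require TensorRT to be installed.
    try:
        from modelopt.onnx.quantization.autotune import QDQAutotuner  # noqa: F401
    except ImportError as e:
        raise RuntimeError(f"--autotune requires TensorRT: {e}") from e
```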