fix: auto-upgrade model opset to 21 for int16/uint16 QDQ quantization #28202

Rishi-Dave wants to merge 3 commits into microsoft:main
Conversation
The update_opset_version helper already auto-bumps opset to 19 when float8 quantization is requested on older models. Extend the same pattern to int16/uint16: when the user requests QUInt16 or QInt16 weight quantization and the model's opset is below 21, bump to 21 so that native ONNX QuantizeLinear/DequantizeLinear can be emitted instead of silently falling back to the com.microsoft contrib domain. Fixes microsoft#25223
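For context, a minimal sketch of the kind of branch being added, mirroring the float8 pattern (names and the exact upgrade mechanism are illustrative; the real helper lives in quant_utils.py and reads the requested types from the quantization config):

```python
import logging

import onnx
from onnx import version_converter


def _maybe_bump_opset_for_int16(model: onnx.ModelProto, weight_quant_type: int) -> onnx.ModelProto:
    # Hypothetical stand-in for the new elif branch in update_opset_version().
    opset_version = next(o.version for o in model.opset_import if o.domain in ("", "ai.onnx"))
    int16_types = (onnx.TensorProto.UINT16, onnx.TensorProto.INT16)

    if weight_quant_type in int16_types and opset_version < 21:
        logging.warning(
            "Model opset %d does not support int16/uint16 QuantizeLinear/DequantizeLinear natively; "
            "automatically upgrading the model to opset 21.",
            opset_version,
        )
        model = version_converter.convert_version(model, 21)
    return model
```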
Pull request overview
Extends ONNX Runtime’s Python quantization utilities to automatically upgrade an input model’s ONNX opset to 21 when 16-bit integer (INT16/UINT16) QDQ quantization is requested, aligning behavior with the existing float8 opset auto-upgrade logic and adding regression tests.
Changes:
- Add an `update_opset_version` branch to auto-bump opset < 21 to 21 for INT16/UINT16 quantization types (with a warning).
- Add unit tests validating the opset 20 → 21 upgrade and the no-op behavior at opset 21 for QUInt16/QInt16.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| onnxruntime/python/tools/quantization/quant_utils.py | Adds opset auto-upgrade logic for INT16/UINT16 quantization to ensure ONNX-native QDQ compatibility. |
| onnxruntime/test/python/quantization/test_quant_util.py | Adds test coverage for the new opset upgrade behavior for 16-bit quantization types. |
update_opset_version previously only inspected weight_type, so a config like activation_type=QInt16 with weight_type=QInt8 would not trigger the opset >= 21 bump and could produce a model with int16 Q/DQ on opset < 21. Extend the helper to accept activation_type and bump when either is INT16/UINT16. Update the quantize_static call site and add subtests covering 16-bit-activation-only, 16-bit-weight-only, both-8-bit, and backward-compat (single-arg call) cases.
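A rough sketch of the extended check this commit describes (function and argument names are illustrative; the optional parameter is what keeps the original single-argument call working):

```python
import onnx


def _needs_opset21_for_16bit(weight_quant_type: int, activation_quant_type: int | None = None) -> bool:
    # Bump to opset 21 if either the weight or the activation type is int16/uint16.
    int16_types = (onnx.TensorProto.UINT16, onnx.TensorProto.INT16)
    return weight_quant_type in int16_types or activation_quant_type in int16_types
```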
tianleiwu
left a comment
Thanks for the follow-up on the activation-type gap. I found one remaining workflow issue: the config-based static quantization path can still carry an auto-derived UseQDQContribOps flag from the original opset, so it may keep emitting contrib-domain Q/DQ after the model is bumped to opset 21. Please align that path with the direct quantize_static behavior before merge.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
_int16_types = (onnx.TensorProto.UINT16, onnx.TensorProto.INT16)
needs_opset21_for_16bit = weight_quant_type in _int16_types or activation_quant_type in _int16_types
# Both 8-bit should NOT bump to 21
with self.subTest(weight_type="QInt8", activation_type="QUInt8", opset=20):
    model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 20)])
    result = update_opset_version(model, QuantType.QInt8, QuantType.QUInt8)
    result_opset = result.opset_import[0].version
    self.assertNotEqual(result_opset, 21)
get_qdq_config() was auto-setting extra_options["UseQDQContribOps"] = True whenever activation_type or weight_type was INT16/UINT16 and the model opset was < 21. This caused the config-based quantize(..., StaticQuantConfig) path to emit com.microsoft Q/DQ ops even after quantize_static() bumped the model to opset 21, where ONNX QuantizeLinear/DequantizeLinear supports INT16/UINT16 natively.

Narrow the condition so that UseQDQContribOps is only auto-set for 4-bit types (which have no opset bump) and for tensor-override-based types; 16-bit top-level weight/activation types are excluded because the opset-21 bump in quantize_static() already handles them. An explicit user-supplied UseQDQContribOps in extra_options still takes precedence via the existing override merge.

Update test_get_qdq_config.py: rename and fix the int16-opset19 subtest to assert the new correct behavior (no contrib-ops flag), and add an end-to-end test that verifies the config path produces an opset-21 model with native-domain Q/DQ nodes. Tighten the existing no-op subtest in test_quant_util.py from assertNotEqual to assertEqual(result_opset, 20) for a stricter regression guard.
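To make the intended end-to-end behavior concrete, a rough usage sketch of the config-based path (model paths and the toy data reader are placeholders; the new end-to-end test in the PR exercises the same flow):

```python
import numpy as np
from onnxruntime.quantization import CalibrationDataReader, QuantType, get_qdq_config, quantize


class RandomDataReader(CalibrationDataReader):
    """Placeholder calibration reader; the input name/shape must match the model being quantized."""

    def __init__(self, num_batches: int = 4):
        self._batches = iter(
            [{"input": np.random.rand(1, 3, 8, 8).astype(np.float32)} for _ in range(num_batches)]
        )

    def get_next(self):
        return next(self._batches, None)


# An opset < 21 float model with 16-bit activations requested through the config path.
qdq_config = get_qdq_config(
    "model_fp32.onnx",
    RandomDataReader(),
    activation_type=QuantType.QUInt16,
    weight_type=QuantType.QInt8,
)
quantize("model_fp32.onnx", "model_uint16_qdq.onnx", qdq_config)
# Expected after this change: the output model imports opset 21 and its QuantizeLinear/DequantizeLinear
# nodes are in the default ONNX domain (no com.microsoft contrib Q/DQ).
```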
tianleiwu
left a comment
Re-review after commit ef62c23
The main concern from the prior round — get_qdq_config() pre-setting UseQDQContribOps=True for 16-bit types before the opset bump — is now fixed. The narrowed condition correctly limits the auto-set to 4-bit types only, and the new end-to-end test (test_quantize_via_config_int16_opset_lt21_uses_native_qdq) validates that the config-based path produces opset-21 models with native ONNX Q/DQ.
One minor suggestion on the overrides check below.
LGTM — nice work addressing the feedback.
overrides_have_opset21_types = any(t in opset21_types for t in overrides_helper.get_quant_types())
if activation_type in opset21_types or weight_type in opset21_types or overrides_have_opset21_types:
    # Only set UseQDQContribOps for 4-bit types; 16-bit types are handled by the opset bump.
    needs_contrib_ops = activation_type in q4_types or weight_type in q4_types or overrides_have_opset21_types
Suggestion: overrides_have_opset21_types still checks both q16_types and q4_types. This means if a user places INT16 in tensor_quant_overrides and uses INT16 as the main activation/weight type, UseQDQContribOps will be set even though the opset will be bumped to 21.
For the common case (overrides-only INT16 with INT8 main types) this is correct since update_opset_version doesn't inspect overrides. But consider splitting the check for clarity:
overrides_have_q4_types = any(t in q4_types for t in overrides_helper.get_quant_types())
overrides_have_q16_only = (
    any(t in q16_types for t in overrides_helper.get_quant_types())
    and not needs_opset21_bump  # main types already trigger the bump
)
needs_contrib_ops = (
    activation_type in q4_types
    or weight_type in q4_types
    or overrides_have_q4_types
    or overrides_have_q16_only
)

Not a blocker — the current logic is safe and produces correct (if slightly suboptimal) output in the edge case.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
_int16_types = (onnx.TensorProto.UINT16, onnx.TensorProto.INT16)
needs_opset21_for_16bit = weight_quant_type in _int16_types or activation_quant_type in _int16_types
if onnx_opset.version < 21:
    opset21_types = q16_types.union(q4_types)
    overrides_have_opset21_types = any(t in opset21_types for t in overrides_helper.get_quant_types())
    if activation_type in opset21_types or weight_type in opset21_types or overrides_have_opset21_types:
        # Only set UseQDQContribOps for 4-bit types; 16-bit types are handled by the opset bump.
        needs_contrib_ops = activation_type in q4_types or weight_type in q4_types or overrides_have_opset21_types
f"The original model opset version is {opset_version}, which does not support 16-bit integer "
"quantization with native ONNX QuantizeLinear/DequantizeLinear. "
"Please update the model to opset >= 21. Automatically update the model to opset 21. "
"Please verify the quantized model."
qdq_config = get_qdq_config(
    float_model,
    data_reader,
    activation_type=QuantType.QUInt16,
    weight_type=QuantType.QInt8,
Summary
- Extend the `update_opset_version` helper to auto-bump the opset from < 21 to 21 when QUInt16/QInt16 weight quantization is requested.

Motivation
Fixes #25223.
Users exporting models from torch.export with uint16/int16 quantization hit a gap where models below opset 21 were not being upgraded. Mirroring the existing float8 branch gives users a consistent, predictable upgrade path for 16-bit QDQ.

Changes
- onnxruntime/python/tools/quantization/quant_utils.py: new `elif` branch in `update_opset_version` that bumps the opset to 21 when `weight_quant_type` is INT16/UINT16 and the current opset is < 21. Emits a warning matching the existing float8 branch style.
- onnxruntime/test/python/quantization/test_quant_util.py: new `test_update_opset_version_16bit` with parametric subtests covering QUInt16/QInt16 bumping from opset 20 → 21 and a no-op regression check for models already at opset 21.

Test Plan
- All tests pass.
- `lintrunner -a` produces no changes.