fix: auto-upgrade model opset to 21 for int16/uint16 QDQ quantization #28202

Rishi-Dave wants to merge 3 commits into microsoft:main
Conversation
The update_opset_version helper already auto-bumps opset to 19 when float8 quantization is requested on older models. Extend the same pattern to int16/uint16: when the user requests QUInt16 or QInt16 weight quantization and the model's opset is below 21, bump to 21 so that native ONNX QuantizeLinear/DequantizeLinear can be emitted instead of silently falling back to the com.microsoft contrib domain. Fixes microsoft#25223
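For context, a minimal sketch of the kind of branch being added, mirroring the float8 pattern (names and the exact upgrade mechanism are illustrative; the real helper lives in quant_utils.py and reads the requested types from the quantization config):

```python
import logging

import onnx
from onnx import version_converter


def _maybe_bump_opset_for_int16(model: onnx.ModelProto, weight_quant_type: int) -> onnx.ModelProto:
    # Hypothetical stand-in for the new elif branch in update_opset_version().
    opset_version = next(o.version for o in model.opset_import if o.domain in ("", "ai.onnx"))
    int16_types = (onnx.TensorProto.UINT16, onnx.TensorProto.INT16)

    if weight_quant_type in int16_types and opset_version < 21:
        logging.warning(
            "Model opset %d does not support int16/uint16 QuantizeLinear/DequantizeLinear natively; "
            "automatically upgrading the model to opset 21.",
            opset_version,
        )
        model = version_converter.convert_version(model, 21)
    return model
```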
Pull request overview
Extends ONNX Runtime’s Python quantization utilities to automatically upgrade an input model’s ONNX opset to 21 when 16-bit integer (INT16/UINT16) QDQ quantization is requested, aligning behavior with the existing float8 opset auto-upgrade logic and adding regression tests.
Changes:
- Add an `update_opset_version` branch to auto-bump opset < 21 to 21 for INT16/UINT16 quantization types (with a warning).
- Add unit tests validating the opset 20 → 21 upgrade and the no-op behavior at opset 21 for QUInt16/QInt16.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| onnxruntime/python/tools/quantization/quant_utils.py | Adds opset auto-upgrade logic for INT16/UINT16 quantization to ensure ONNX-native QDQ compatibility. |
| onnxruntime/test/python/quantization/test_quant_util.py | Adds test coverage for the new opset upgrade behavior for 16-bit quantization types. |
update_opset_version previously only inspected weight_type, so a config like activation_type=QInt16 with weight_type=QInt8 would not trigger the opset >= 21 bump and could produce a model with int16 Q/DQ on opset < 21. Extend the helper to accept activation_type and bump when either is INT16/UINT16. Update the quantize_static call site and add subtests covering 16-bit-activation-only, 16-bit-weight-only, both-8-bit, and backward-compat (single-arg call) cases.
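A rough sketch of the extended check this commit describes (function and argument names are illustrative; the optional parameter is what keeps the original single-argument call working):

```python
import onnx


def _needs_opset21_for_16bit(weight_quant_type: int, activation_quant_type: int | None = None) -> bool:
    # Bump to opset 21 if either the weight or the activation type is int16/uint16.
    int16_types = (onnx.TensorProto.UINT16, onnx.TensorProto.INT16)
    return weight_quant_type in int16_types or activation_quant_type in int16_types
```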
tianleiwu
left a comment
Thanks for the follow-up on the activation-type gap. I found one remaining workflow issue: the config-based static quantization path can still carry an auto-derived UseQDQContribOps flag from the original opset, so it may keep emitting contrib-domain Q/DQ after the model is bumped to opset 21. Please align that path with the direct quantize_static behavior before merge.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
_int16_types = (onnx.TensorProto.UINT16, onnx.TensorProto.INT16)
needs_opset21_for_16bit = weight_quant_type in _int16_types or activation_quant_type in _int16_types
# Both 8-bit should NOT bump to 21
with self.subTest(weight_type="QInt8", activation_type="QUInt8", opset=20):
    model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 20)])
    result = update_opset_version(model, QuantType.QInt8, QuantType.QUInt8)
    result_opset = result.opset_import[0].version
    self.assertNotEqual(result_opset, 21)
get_qdq_config() was auto-setting extra_options["UseQDQContribOps"] = True whenever activation_type or weight_type was INT16/UINT16 and the model opset was < 21. This caused the config-based quantize(..., StaticQuantConfig) path to emit com.microsoft Q/DQ ops even after quantize_static() bumped the model to opset 21, where ONNX QuantizeLinear/DequantizeLinear supports INT16/UINT16 natively.

Narrow the condition so that UseQDQContribOps is only auto-set for 4-bit types (which have no opset bump) and for tensor-override-based types; 16-bit top-level weight/activation types are excluded because the opset-21 bump in quantize_static() already handles them. An explicit user-supplied UseQDQContribOps in extra_options still takes precedence via the existing override merge.

Update test_get_qdq_config.py: rename and fix the int16-opset19 subtest to assert the new correct behavior (no contrib-ops flag), and add an end-to-end test that verifies the config path produces an opset-21 model with native-domain Q/DQ nodes. Tighten the existing no-op subtest in test_quant_util.py from assertNotEqual to assertEqual(result_opset, 20) for a stricter regression guard.
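To make the intended end-to-end behavior concrete, a rough usage sketch of the config-based path (model paths and the toy data reader are placeholders; the new end-to-end test in the PR exercises the same flow):

```python
import numpy as np
from onnxruntime.quantization import CalibrationDataReader, QuantType, get_qdq_config, quantize


class RandomDataReader(CalibrationDataReader):
    """Placeholder calibration reader; the input name/shape must match the model being quantized."""

    def __init__(self, num_batches: int = 4):
        self._batches = iter(
            [{"input": np.random.rand(1, 3, 8, 8).astype(np.float32)} for _ in range(num_batches)]
        )

    def get_next(self):
        return next(self._batches, None)


# An opset < 21 float model with 16-bit activations requested through the config path.
qdq_config = get_qdq_config(
    "model_fp32.onnx",
    RandomDataReader(),
    activation_type=QuantType.QUInt16,
    weight_type=QuantType.QInt8,
)
quantize("model_fp32.onnx", "model_uint16_qdq.onnx", qdq_config)
# Expected after this change: the output model imports opset 21 and its QuantizeLinear/DequantizeLinear
# nodes are in the default ONNX domain (no com.microsoft contrib Q/DQ).
```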
tianleiwu
left a comment
Re-review after commit ef62c23
The main concern from the prior round — get_qdq_config() pre-setting UseQDQContribOps=True for 16-bit types before the opset bump — is now fixed. The narrowed condition correctly limits the auto-set to 4-bit types only, and the new end-to-end test (test_quantize_via_config_int16_opset_lt21_uses_native_qdq) validates that the config-based path produces opset-21 models with native ONNX Q/DQ.
One minor suggestion on the overrides check below.
LGTM — nice work addressing the feedback.
overrides_have_opset21_types = any(t in opset21_types for t in overrides_helper.get_quant_types())
if activation_type in opset21_types or weight_type in opset21_types or overrides_have_opset21_types:
    # Only set UseQDQContribOps for 4-bit types; 16-bit types are handled by the opset bump.
    needs_contrib_ops = activation_type in q4_types or weight_type in q4_types or overrides_have_opset21_types
Suggestion: overrides_have_opset21_types still checks both q16_types and q4_types. This means if a user places INT16 in tensor_quant_overrides and uses INT16 as the main activation/weight type, UseQDQContribOps will be set even though the opset will be bumped to 21.
For the common case (overrides-only INT16 with INT8 main types) this is correct since update_opset_version doesn't inspect overrides. But consider splitting the check for clarity:
overrides_have_q4_types = any(t in q4_types for t in overrides_helper.get_quant_types())
overrides_have_q16_only = (
    any(t in q16_types for t in overrides_helper.get_quant_types())
    and not needs_opset21_bump  # main types already trigger the bump
)
needs_contrib_ops = (
    activation_type in q4_types
    or weight_type in q4_types
    or overrides_have_q4_types
    or overrides_have_q16_only
)

Not a blocker — the current logic is safe and produces correct (if slightly suboptimal) output in the edge case.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
_int16_types = (onnx.TensorProto.UINT16, onnx.TensorProto.INT16)
needs_opset21_for_16bit = weight_quant_type in _int16_types or activation_quant_type in _int16_types
if onnx_opset.version < 21:
    opset21_types = q16_types.union(q4_types)
    overrides_have_opset21_types = any(t in opset21_types for t in overrides_helper.get_quant_types())
    if activation_type in opset21_types or weight_type in opset21_types or overrides_have_opset21_types:
        # Only set UseQDQContribOps for 4-bit types; 16-bit types are handled by the opset bump.
        needs_contrib_ops = activation_type in q4_types or weight_type in q4_types or overrides_have_opset21_types
f"The original model opset version is {opset_version}, which does not support 16-bit integer "
"quantization with native ONNX QuantizeLinear/DequantizeLinear. "
"Please update the model to opset >= 21. Automatically update the model to opset 21. "
"Please verify the quantized model."
qdq_config = get_qdq_config(
    float_model,
    data_reader,
    activation_type=QuantType.QUInt16,
    weight_type=QuantType.QInt8,
Summary
- Extend the `update_opset_version` helper to auto-bump the opset from < 21 to 21 when QUInt16/QInt16 weight quantization is requested.

Motivation
Fixes #25223.
Users exporting models from torch.export with uint16/int16 quantization hit a gap where models below opset 21 were not being upgraded. Mirroring the existing float8 branch gives users a consistent, predictable upgrade path for 16-bit QDQ.

Changes
- onnxruntime/python/tools/quantization/quant_utils.py: new `elif` branch in `update_opset_version` that bumps the opset to 21 when `weight_quant_type` is INT16/UINT16 and the current opset is < 21. Emits a warning matching the existing float8 branch style.
- onnxruntime/test/python/quantization/test_quant_util.py: new `test_update_opset_version_16bit` with parametric subtests covering QUInt16/QInt16 bumping from opset 20 → 21 and a no-op regression check for models already at opset 21.

Test Plan
- All tests pass.
- `lintrunner -a` produces no changes.