
fix: auto-upgrade model opset to 21 for int16/uint16 QDQ quantization #28202

Open
Rishi-Dave wants to merge 3 commits into microsoft:main from Rishi-Dave:rishidave/feat/quant-utils-opset-bump-16bit

Conversation

@Rishi-Dave
Contributor

Summary

  • Extends the existing update_opset_version helper to auto-bump the opset from < 21 to 21 when QUInt16/QInt16 weight quantization is requested
  • Mirrors the existing float8 quantization opset-upgrade pattern
  • Adds test coverage with parametric subtests for 16-bit int quantization

Motivation

Fixes #25223.

Users exporting models from torch.export with uint16/int16 quantization hit a gap where models below opset 21 were not being upgraded. Mirroring the existing float8 branch gives users a consistent, predictable upgrade path for 16-bit QDQ.
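
For context, a hedged sketch of the workflow that hits this gap (the model paths, input name/shape, and the calibration reader below are placeholders for illustration, not part of this PR):

import numpy as np
from onnxruntime.quantization import CalibrationDataReader, QuantFormat, QuantType, quantize_static


class RandomCalibrationReader(CalibrationDataReader):
    """Placeholder calibration reader; a real one should yield representative inputs."""

    def __init__(self, input_name="input", shape=(1, 3, 224, 224), count=8):
        self._batches = (np.random.rand(*shape).astype(np.float32) for _ in range(count))
        self._input_name = input_name

    def get_next(self):
        batch = next(self._batches, None)
        return None if batch is None else {self._input_name: batch}


quantize_static(
    "exported_model.onnx",          # hypothetical opset-20 model exported via torch.export
    "model_int16_qdq.onnx",
    RandomCalibrationReader(),
    quant_format=QuantFormat.QDQ,
    activation_type=QuantType.QUInt16,
    weight_type=QuantType.QInt16,
)
# Before this fix, a model at opset < 21 was not upgraded, so 16-bit Q/DQ could only
# be emitted via the com.microsoft contrib domain.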

Changes

  • onnxruntime/python/tools/quantization/quant_utils.py: new elif branch in update_opset_version that bumps the opset to 21 when weight_quant_type is INT16/UINT16 and the current opset is < 21, emitting a warning in the style of the existing float8 branch (see the sketch after this list).
  • onnxruntime/test/python/quantization/test_quant_util.py: new test_update_opset_version_16bit with parametric subtests covering QUInt16 / QInt16 bumping from opset 20 → 21 and a no-op regression check for models already at opset 21.
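
A minimal sketch of the quant_utils.py change described above (an illustration only, not the exact diff; the real helper's name, argument list, and warning text may differ, and it may rewrite the opset import in place rather than run the version converter used here):

import logging

import onnx
from onnx import version_converter

_INT16_TYPES = (onnx.TensorProto.UINT16, onnx.TensorProto.INT16)


def bump_opset_for_int16(model: onnx.ModelProto, weight_quant_type: int) -> onnx.ModelProto:
    # Locate the default ("ai.onnx") opset import of the model.
    ai_onnx_opset = next(op for op in model.opset_import if op.domain in ("", "ai.onnx"))
    if weight_quant_type in _INT16_TYPES and ai_onnx_opset.version < 21:
        logging.warning(
            "Model opset %d does not support native int16/uint16 QuantizeLinear/DequantizeLinear; "
            "auto-upgrading to opset 21. Please verify the quantized model.",
            ai_onnx_opset.version,
        )
        # Assumption: upgrade via the ONNX version converter; the real helper may differ.
        model = version_converter.convert_version(model, 21)
    return model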

Test Plan

python -m pytest onnxruntime/test/python/quantization/test_quant_util.py -v

All tests pass. lintrunner -a produces no changes.

The update_opset_version helper already auto-bumps opset to 19 when
float8 quantization is requested on older models. Extend the same
pattern to int16/uint16: when the user requests QUInt16 or QInt16
weight quantization and the model's opset is below 21, bump to 21 so
that native ONNX QuantizeLinear/DequantizeLinear can be emitted
instead of silently falling back to the com.microsoft contrib domain.

Fixes microsoft#25223

Copilot AI left a comment


Pull request overview

Extends ONNX Runtime’s Python quantization utilities to automatically upgrade an input model’s ONNX opset to 21 when 16-bit integer (INT16/UINT16) QDQ quantization is requested, aligning behavior with the existing float8 opset auto-upgrade logic and adding regression tests.

Changes:

  • Add an update_opset_version branch to auto-bump opset < 21 to 21 for INT16/UINT16 quantization types (with a warning).
  • Add unit tests validating opset 20 → 21 upgrade and no-op behavior at opset 21 for QUInt16/QInt16.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File / Description
  • onnxruntime/python/tools/quantization/quant_utils.py: Adds opset auto-upgrade logic for INT16/UINT16 quantization to ensure ONNX-native QDQ compatibility.
  • onnxruntime/test/python/quantization/test_quant_util.py: Adds test coverage for the new opset upgrade behavior for 16-bit quantization types.


Comment thread: onnxruntime/python/tools/quantization/quant_utils.py (Outdated)
update_opset_version previously only inspected weight_type, so a config
like activation_type=QInt16 with weight_type=QInt8 would not trigger the
opset>=21 bump and could produce a model with int16 Q/DQ on opset<21.
Extend the helper to accept activation_type and bump when either is
INT16/UINT16. Update the quantize_static call site and add subtests
covering 16-bit-activation-only, 16-bit-weight-only, both-8bit, and
backward-compat (single-arg call) cases.
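
As a hedged illustration of the added coverage, one such subtest might look like the fragment below (it would live inside the existing test case; the argument order of update_opset_version follows the test snippet quoted later in this thread and is an assumption, not the exact diff):

# Hypothetical subtest: 16-bit activations with 8-bit weights should still bump to opset 21.
with self.subTest(weight_type="QInt8", activation_type="QInt16", opset=20):
    model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 20)])
    result = update_opset_version(model, QuantType.QInt8, QuantType.QInt16)
    self.assertEqual(result.opset_import[0].version, 21)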

@tianleiwu left a comment


Thanks for the follow-up on the activation-type gap. I found one remaining workflow issue: the config-based static quantization path can still carry an auto-derived UseQDQContribOps flag from the original opset, so it may keep emitting contrib-domain Q/DQ after the model is bumped to opset 21. Please align that path with the direct quantize_static behavior before merge.

Comment thread: onnxruntime/python/tools/quantization/quantize.py

Copilot AI left a comment


Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.



Comment on lines +978 to +979
_int16_types = (onnx.TensorProto.UINT16, onnx.TensorProto.INT16)
needs_opset21_for_16bit = weight_quant_type in _int16_types or activation_quant_type in _int16_types
Comment thread: onnxruntime/python/tools/quantization/quantize.py
Comment on lines +214 to +219
# Both 8-bit should NOT bump to 21
with self.subTest(weight_type="QInt8", activation_type="QUInt8", opset=20):
    model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 20)])
    result = update_opset_version(model, QuantType.QInt8, QuantType.QUInt8)
    result_opset = result.opset_import[0].version
    self.assertNotEqual(result_opset, 21)
… get_qdq_config

get_qdq_config() was auto-setting extra_options["UseQDQContribOps"] = True
whenever activation_type or weight_type was INT16/UINT16 and the model opset
was < 21. This caused the config-based quantize(..., StaticQuantConfig) path
to emit com.microsoft Q/DQ ops even after quantize_static() bumped the model
to opset 21, where native ONNX QuantizeLinear/DequantizeLinear supports
INT16/UINT16 natively.

Narrow the condition so that UseQDQContribOps is only auto-set for 4-bit types
(which have no opset bump) and for tensor-override-based types; 16-bit top-level
weight/activation types are excluded because the opset-21 bump in quantize_static()
already handles them. An explicit user-supplied UseQDQContribOps in extra_options
still takes precedence via the existing override merge.

Update test_get_qdq_config.py: rename and fix the int16-opset19 subtest to assert
the new correct behavior (no contrib-ops flag), and add an end-to-end test that
verifies the config path produces an opset-21 model with native-domain Q/DQ nodes.
Tighten the existing no-op subtest in test_quant_util.py from assertNotEqual to
assertEqual(result_opset, 20) for a stricter regression guard.
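
A rough end-to-end sketch of what the new config-path test verifies (paths and the calibration reader are placeholders; the get_qdq_config and quantize calls mirror the snippets quoted elsewhere in this PR, so treat the details as assumptions):

import onnx
from onnxruntime.quantization import QuantType, get_qdq_config, quantize

# Hypothetical opset-19 float model plus a placeholder CalibrationDataReader instance.
qdq_config = get_qdq_config(
    "float_model_opset19.onnx",
    data_reader,
    activation_type=QuantType.QUInt16,
    weight_type=QuantType.QInt8,
)
quantize("float_model_opset19.onnx", "model_qdq.onnx", qdq_config)

quantized = onnx.load("model_qdq.onnx")
ai_onnx_version = next(op.version for op in quantized.opset_import if op.domain in ("", "ai.onnx"))
assert ai_onnx_version == 21
# Native ONNX Q/DQ nodes use the default domain rather than com.microsoft.
assert all(
    node.domain in ("", "ai.onnx")
    for node in quantized.graph.node
    if node.op_type in ("QuantizeLinear", "DequantizeLinear")
)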

@tianleiwu left a comment


Re-review after commit ef62c23

The main concern from the prior round — get_qdq_config() pre-setting UseQDQContribOps=True for 16-bit types before the opset bump — is now fixed. The narrowed condition correctly limits the auto-set to 4-bit types only, and the new end-to-end test (test_quantize_via_config_int16_opset_lt21_uses_native_qdq) validates that the config-based path produces opset-21 models with native ONNX Q/DQ.

One minor suggestion on the overrides check below.

LGTM — nice work addressing the feedback.

overrides_have_opset21_types = any(t in opset21_types for t in overrides_helper.get_quant_types())
if activation_type in opset21_types or weight_type in opset21_types or overrides_have_opset21_types:
    # Only set UseQDQContribOps for 4-bit types; 16-bit types are handled by the opset bump.
    needs_contrib_ops = activation_type in q4_types or weight_type in q4_types or overrides_have_opset21_types


Suggestion: overrides_have_opset21_types still checks both q16_types and q4_types. This means if a user places INT16 in tensor_quant_overrides and uses INT16 as the main activation/weight type, UseQDQContribOps will be set even though the opset will be bumped to 21.

For the common case (overrides-only INT16 with INT8 main types) this is correct since update_opset_version doesn't inspect overrides. But consider splitting the check for clarity:

overrides_have_q4_types = any(t in q4_types for t in overrides_helper.get_quant_types())
overrides_have_q16_only = (
    any(t in q16_types for t in overrides_helper.get_quant_types())
    and not needs_opset21_bump  # main types already trigger the bump
)
needs_contrib_ops = (
    activation_type in q4_types
    or weight_type in q4_types
    or overrides_have_q4_types
    or overrides_have_q16_only
)

Not a blocker — the current logic is safe and produces correct (if slightly suboptimal) output in the edge case.


Copilot AI left a comment


Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.



Comment on lines 377 to +381
if onnx_opset.version < 21:
    opset21_types = q16_types.union(q4_types)
    overrides_have_opset21_types = any(t in opset21_types for t in overrides_helper.get_quant_types())
    if activation_type in opset21_types or weight_type in opset21_types or overrides_have_opset21_types:
        # Only set UseQDQContribOps for 4-bit types; 16-bit types are handled by the opset bump.
        needs_contrib_ops = activation_type in q4_types or weight_type in q4_types or overrides_have_opset21_types
Comment on lines +991 to +994
f"The original model opset version is {opset_version}, which does not support 16-bit integer "
"quantization with native ONNX QuantizeLinear/DequantizeLinear. "
"Please update the model to opset >= 21. Automatically update the model to opset 21. "
"Please verify the quantized model."
Comment on lines +325 to +329
qdq_config = get_qdq_config(
    float_model,
    data_reader,
    activation_type=QuantType.QUInt16,
    weight_type=QuantType.QInt8,


Development

Successfully merging this pull request may close these issues.

[Feature request] support U16 / S16 for QuantizeLinear and DequantizeLinear
