fix(quantization): validate bias scale in QDQ Conv → QLinearConv fusion #28229
Rishi-Dave wants to merge 3 commits into microsoft:main
Conversation
ONNX QLinearConv reuses the int32 bias values directly without re-scaling, which requires bias_scale[i] == x_scale * w_scale[i] for each output channel. Previously ConvNodeGroupSelector::Check did not verify this invariant, so a QDQ model whose bias DQ used an arbitrary scale (e.g. produced by a non-conformant quantizer) would be silently fused into QLinearConv with wrong outputs.

Add a static helper CheckConvBiasScale() that reads the three scale initializers, handles both per-tensor (scalar) and per-channel weight scales, and rejects the fusion when the tolerance check |b_scale - x_scale*w_scale| > atol + rtol*|x_scale*w_scale| (atol=1e-6, rtol=1e-2) fails for any channel. If any scale is not a constant initializer the helper returns false conservatively.

Add two Python test cases in TestConvBiasScaleValidation:
- test_mismatched_bias_scale_skips_fusion: verifies that optimized and unoptimized sessions agree when bias_scale is 2x the correct value, proving fusion was safely skipped.
- test_matching_bias_scale_allows_fusion: verifies the same agreement when bias_scale is exact, ensuring no regression on valid models.

Fixes microsoft#24711
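For reference, the tolerance test can be re-expressed as a short NumPy sketch. The real helper is C++ in `qdq_selectors.cc`; the function and constant names below are illustrative only, not the actual implementation.

```python
import numpy as np

ATOL, RTOL = 1e-6, 1e-2  # tolerances from the commit message above

def scales_match(b_scale: float, x_scale: float, w_scale: float) -> bool:
    """Accept b_scale only if it is within tolerance of x_scale * w_scale."""
    expected = np.float64(x_scale) * np.float64(w_scale)
    return abs(np.float64(b_scale) - expected) <= ATOL + RTOL * abs(expected)

# Per-tensor weight scale: a single check. Per-channel: the same check runs
# for every channel i, comparing b_scale[i] against x_scale * w_scale[i].
assert scales_match(0.002, 0.02, 0.1)      # conformant bias scale
assert not scales_match(0.004, 0.02, 0.1)  # 2x bias scale -> fusion rejected
```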
Pull request overview
This PR hardens the QDQ Conv → QLinearConv fusion in the QDQ transformer by validating that the bias dequantization scale conforms to the ONNX QLinearConv requirement (bias_scale == x_scale * w_scale[i]), preventing silent wrong results when models use non-canonical bias scales.
Changes:
- Add `CheckConvBiasScale` validation to `ConvNodeGroupSelector::Check` to skip fusion when the bias scale does not match `x_scale * w_scale[i]` within tolerance.
- Add Python tests that run the same model with optimizations enabled vs disabled to detect bias-scale-related corruption.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `onnxruntime/core/optimizer/qdq_transformer/selectors_actions/qdq_selectors.cc` | Adds bias scale validation gate to prevent incorrect QDQ Conv fusion into QLinearConv. |
| `onnxruntime/test/python/quantization/test_qdq.py` | Adds regression tests for matched vs mismatched bias scales under different optimization levels. |
…scale check

CheckConvBiasScale derived num_channels from w_scales.size() alone, which is 1 for a per-tensor weight scale even when bias_scale is a per-channel vector. That caused valid graphs (per-channel bias with scalar weight scale) to be rejected, and only one channel to be validated when num_channels was inferred wrongly. Use the larger of w_num/b_num for num_channels and broadcast either scale across channels when it is a scalar. Reject only when both are vectors with mismatched lengths.

Also extend the matching/mismatched-bias-scale tests to save the optimized graph and assert QLinearConv presence/absence directly via op counts (sketched below), so the tests fail fast if fusion silently disables for an unrelated reason.
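A rough sketch of that structural assertion, assuming paths to an on-disk QDQ model; this is illustrative, not the actual `test_qdq.py` code:

```python
import onnx
import onnxruntime as ort

def count_qlinearconv(model_path: str, optimized_path: str) -> int:
    """Run ORT's optimizer, persist the optimized graph, count QLinearConv nodes."""
    so = ort.SessionOptions()
    so.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
    so.optimized_model_filepath = optimized_path  # save the optimized graph
    ort.InferenceSession(model_path, so, providers=["CPUExecutionProvider"])
    optimized = onnx.load(optimized_path)
    return sum(node.op_type == "QLinearConv" for node in optimized.graph.node)

# Mismatched bias scale: fusion must be skipped.
#   assert count_qlinearconv("mismatched.onnx", "mismatched_opt.onnx") == 0
# Matching bias scale: fusion should still fire.
#   assert count_qlinearconv("matching.onnx", "matching_opt.onnx") >= 1
```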
tianleiwu left a comment
Thanks for tightening this fusion path. I found one remaining correctness gap in the same bias-DQ area: the scale is now validated, but the bias zero point is still discarded by the QLinearConv rewrite, so a matching-scale model with a nonzero bias zero point can still be fused into a graph that silently produces wrong output.
```cpp
// Verify bias scale == input_scale * weight_scale[i] per ONNX QLinearConv spec.
// If scales don't match within tolerance, skip fusion to avoid silent numerical errors.
if (!CheckConvBiasScale(graph_viewer, *dq_nodes[0], *dq_nodes[1], *dq_nodes[2])) {
```
This still needs to validate the bias DQ zero point before fusion. ConvMoves() only appends dq_bias input 0 as the QLinearConv bias, so the bias DQ scale and zero point are discarded during the rewrite. This helper now verifies that the discarded scale matches x_scale * w_scale, but it does not require the discarded zero point to be zero. If a model has matching b_scale and a nonzero int32 b_zp, the unfused graph computes (bias_q - b_zp) * b_scale, while the fused QLinearConv interprets bias_q directly with implicit bias zero point 0, producing silently shifted outputs. Please reject fusion unless the bias zero point is absent or a constant int32 zero, and add a regression test where matching scale plus nonzero b_zp does not produce QLinearConv.
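To make the failure mode concrete, here is a tiny numeric example of the shift described above; all values are hypothetical, chosen only for illustration:

```python
b_scale = 0.002        # matches x_scale * w_scale, so the scale check passes
bias_q, b_zp = 100, 1  # quantized bias with a nonzero zero point

unfused = (bias_q - b_zp) * b_scale  # DequantizeLinear semantics -> 0.198
fused = bias_q * b_scale             # QLinearConv assumes b_zp == 0 -> 0.200

# Every output in that channel is shifted by b_zp * b_scale = 0.002.
print(unfused, fused)
```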
…fusion

CheckConvBiasScale only validated the bias dequantize scale, but ConvMoves discards both the bias DQ scale and zero-point during the rewrite to QLinearConv (which has no bias_zp input and assumes an implicit zero). A model with matching b_scale = x_scale * w_scale but nonzero b_zp would therefore be silently fused into a graph computing bias_q * b_scale instead of the spec-correct (bias_q - b_zp) * b_scale, producing shifted outputs.

Add a sibling helper CheckConvBiasZeroPoint that, when the optional bias zero-point input is present and constant, requires it to be int32 with every element zero. Absent or empty-name inputs are treated as zero (matching the DequantizeLinear default). Non-constant or non-int32 zero points are conservatively rejected so fusion is skipped rather than producing silently wrong results. A sketch of this rule is shown below.

Wire the new check into ConvNodeGroupSelector::Check immediately after the scale check, only on the bias-DQ branch (dq_nodes.size() == 3). Add a regression test test_nonzero_bias_zp_skips_fusion that builds a conformant-scale graph with b_zp=1, asserts the optimized graph contains no QLinearConv (the primary structural guarantee), and additionally compares optimized vs unoptimized outputs as a defense-in-depth check.
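A Python rendering of the acceptance rule this commit describes; the actual helper is C++, and the `initializers` lookup here (a name → TensorProto dict) is an assumption of this sketch:

```python
import numpy as np
from onnx import numpy_helper

def bias_zero_point_is_zero(dq_bias_node, initializers) -> bool:
    """Python mirror of the described CheckConvBiasZeroPoint rule."""
    # Absent or empty-name zero point defaults to 0 in DequantizeLinear.
    if len(dq_bias_node.input) < 3 or dq_bias_node.input[2] == "":
        return True
    tensor = initializers.get(dq_bias_node.input[2])
    if tensor is None:
        return False  # not a constant initializer: reject conservatively
    zp = numpy_helper.to_array(tensor)
    # Require int32 and all-zero, matching QLinearConv's implicit bias zero point.
    return zp.dtype == np.int32 and bool(np.all(zp == 0))
```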
Thanks for catching the bias zero-point gap. Pushed the fix with regression test test_nonzero_bias_zp_skips_fusion.
Summary
- Add a `CheckConvBiasScale` validator inside `ConvNodeGroupSelector::Check`

Motivation
Fixes #24711.
The ONNX QLinearConv spec requires the int32 bias to use scale `x_scale × w_scale[i]` so the fused kernel can reuse it directly. The current QDQ selector only verifies that the bias dtype is INT32; it never checks that the bias DQ's scale satisfies this relationship. When a model is constructed with an arbitrary bias scale (e.g. user-supplied or from a non-canonical quantizer), the selector still fuses the subgraph and the QLinearConv kernel produces silently wrong outputs at `ORT_ENABLE_EXTENDED` and above on the CPU EP. CUDA and disabled-optimization paths produce correct results, making the bug particularly hard to diagnose.
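For intuition, a small illustrative computation (numbers invented for this sketch) of why the spec pins the bias scale: the Conv int32 accumulator sums products whose implicit unit is x_scale × w_scale, so the bias must be quantized in that same unit to be added directly.

```python
import numpy as np

x_scale, w_scale = 0.02, 0.1      # illustrative per-tensor scales
bias_f = np.array([0.37, -1.25])  # float bias, one value per output channel

# Spec-conformant bias quantization uses x_scale * w_scale as the scale.
bias_scale = x_scale * w_scale
bias_q = np.round(bias_f / bias_scale).astype(np.int32)

print(bias_q * bias_scale)      # ~[ 0.37 -1.25]: recovered correctly
print(bias_q * 2 * bias_scale)  # [ 0.74 -2.5 ]: a 2x scale silently doubles the bias
```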
Changes

- `onnxruntime/core/optimizer/qdq_transformer/selectors_actions/qdq_selectors.cc`: add a `CheckConvBiasScale` static helper. Returns `false` (skip fusion) when:
  - `x_scale` is not a scalar / 1-element rank-1 tensor
  - `b_scale` length is neither 1 nor `num_channels`
  - any channel's bias scale deviates from `x_scale × w_scale[i]` by more than `atol=1e-6 + rtol=1e-2 × |expected|`
- `onnxruntime/test/python/quantization/test_qdq.py`: new `TestConvBiasScaleValidation` class with two cases, mismatched bias scale (asserts optimized output matches unoptimized) and matching bias scale (asserts correctness preserved when fusion is allowed); see the sketch after this list.
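For illustration, a model of the kind the tests describe could be constructed roughly as follows. This is a hypothetical sketch, not the code in `test_qdq.py`; all names, shapes, and scale values are invented.

```python
import numpy as np
import onnx
from onnx import TensorProto, helper, numpy_helper

def make_qdq_conv_model(bias_scale_factor: float = 1.0) -> onnx.ModelProto:
    """DQ(x), DQ(w), DQ(b) -> Conv -> Q. factor=2.0 gives the mismatched case."""
    x_scale, w_scale = 0.02, 0.1
    b_scale = x_scale * w_scale * bias_scale_factor
    rng = np.random.default_rng(0)
    inits = [
        numpy_helper.from_array(np.array(x_scale, np.float32), "x_scale"),
        numpy_helper.from_array(np.array(128, np.uint8), "x_zp"),
        numpy_helper.from_array(rng.integers(0, 255, (4, 3, 3, 3), dtype=np.uint8), "w_q"),
        numpy_helper.from_array(np.array(w_scale, np.float32), "w_scale"),
        numpy_helper.from_array(np.array(128, np.uint8), "w_zp"),
        numpy_helper.from_array(rng.integers(-100, 100, (4,), dtype=np.int32), "b_q"),
        numpy_helper.from_array(np.array(b_scale, np.float32), "b_scale"),
        numpy_helper.from_array(np.array(0.1, np.float32), "y_scale"),
        numpy_helper.from_array(np.array(128, np.uint8), "y_zp"),
    ]
    nodes = [
        helper.make_node("DequantizeLinear", ["x_q", "x_scale", "x_zp"], ["x"]),
        helper.make_node("DequantizeLinear", ["w_q", "w_scale", "w_zp"], ["w"]),
        helper.make_node("DequantizeLinear", ["b_q", "b_scale"], ["b"]),
        helper.make_node("Conv", ["x", "w", "b"], ["y_f"]),
        helper.make_node("QuantizeLinear", ["y_f", "y_scale", "y_zp"], ["y_q"]),
    ]
    graph = helper.make_graph(
        nodes, "qdq_conv_bias_scale",
        [helper.make_tensor_value_info("x_q", TensorProto.UINT8, [1, 3, 8, 8])],
        [helper.make_tensor_value_info("y_q", TensorProto.UINT8, [1, 4, 6, 6])],
        inits,
    )
    return helper.make_model(graph)
```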
Test Plan

- `python -m pytest onnxruntime/test/python/quantization/test_qdq.py::TestConvBiasScaleValidation -v`
- Existing quantization tests (the `verify_quantize_conv` family) should continue to pass; fusion is unchanged for canonical quantizer-produced models where bias_scale equals input_scale × weight_scale exactly.
- `ORT_ENABLE_ALL` output now matches `ORT_DISABLE_ALL`.