fix(quantization): validate bias scale in QDQ Conv → QLinearConv fusion #28229
Rishi-Dave wants to merge 3 commits into microsoft:main
Conversation
ONNX QLinearConv reuses the int32 bias values directly without re-scaling, which requires bias_scale[i] == x_scale * w_scale[i] for each output channel. Previously ConvNodeGroupSelector::Check did not verify this invariant, so a QDQ model whose bias DQ used an arbitrary scale (e.g. produced by a non-conformant quantizer) would be silently fused into QLinearConv with wrong outputs.

Add a static helper CheckConvBiasScale() that reads the three scale initializers, handles both per-tensor (scalar) and per-channel weight scales, and rejects the fusion when the tolerance check |b_scale - x_scale*w_scale| > atol + rtol*|x_scale*w_scale| (atol=1e-6, rtol=1e-2) fails for any channel. If any scale is not a constant initializer the helper returns false conservatively.

Add two Python test cases in TestConvBiasScaleValidation:
- test_mismatched_bias_scale_skips_fusion: verifies that optimized and unoptimized sessions agree when bias_scale is 2x the correct value, proving fusion was safely skipped.
- test_matching_bias_scale_allows_fusion: verifies the same agreement when bias_scale is exact, ensuring no regression on valid models.

Fixes microsoft#24711
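For reference, the tolerance test can be re-expressed as a short NumPy sketch. The real helper is C++ in `qdq_selectors.cc`; the function and constant names below are illustrative only, not the actual implementation.

```python
import numpy as np

ATOL, RTOL = 1e-6, 1e-2  # tolerances from the commit message above

def scales_match(b_scale: float, x_scale: float, w_scale: float) -> bool:
    """Accept b_scale only if it is within tolerance of x_scale * w_scale."""
    expected = np.float64(x_scale) * np.float64(w_scale)
    return abs(np.float64(b_scale) - expected) <= ATOL + RTOL * abs(expected)

# Per-tensor weight scale: a single check. Per-channel: the same check runs
# for every channel i, comparing b_scale[i] against x_scale * w_scale[i].
assert scales_match(0.002, 0.02, 0.1)      # conformant bias scale
assert not scales_match(0.004, 0.02, 0.1)  # 2x bias scale -> fusion rejected
```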
Pull request overview
This PR hardens the QDQ Conv → QLinearConv fusion in the QDQ transformer by validating that the bias dequantization scale conforms to the ONNX QLinearConv requirement (bias_scale == x_scale * w_scale[i]), preventing silent wrong results when models use non-canonical bias scales.
Changes:
- Add `CheckConvBiasScale` validation to `ConvNodeGroupSelector::Check` to skip fusion when the bias scale does not match `x_scale * w_scale[i]` within tolerance.
- Add Python tests that run the same model with optimizations enabled vs disabled to detect bias-scale-related corruption.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `onnxruntime/core/optimizer/qdq_transformer/selectors_actions/qdq_selectors.cc` | Adds bias scale validation gate to prevent incorrect QDQ Conv fusion into QLinearConv. |
| `onnxruntime/test/python/quantization/test_qdq.py` | Adds regression tests for matched vs mismatched bias scales under different optimization levels. |
…scale check

CheckConvBiasScale derived num_channels from w_scales.size() alone, which is 1 for a per-tensor weight scale even when bias_scale is a per-channel vector. That caused valid graphs (per-channel bias with scalar weight scale) to be rejected, and only one channel to be validated when num_channels was inferred wrongly. Use the larger of w_num/b_num for num_channels and broadcast either scale across channels when it is a scalar. Reject only when both are vectors with mismatched lengths.

Also extend the matching/mismatched-bias-scale tests to save the optimized graph and assert QLinearConv presence/absence directly via op counts (sketched below), so the tests fail fast if fusion silently disables for an unrelated reason.
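A rough sketch of that structural assertion, assuming paths to an on-disk QDQ model; this is illustrative, not the actual `test_qdq.py` code:

```python
import onnx
import onnxruntime as ort

def count_qlinearconv(model_path: str, optimized_path: str) -> int:
    """Run ORT's optimizer, persist the optimized graph, count QLinearConv nodes."""
    so = ort.SessionOptions()
    so.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
    so.optimized_model_filepath = optimized_path  # save the optimized graph
    ort.InferenceSession(model_path, so, providers=["CPUExecutionProvider"])
    optimized = onnx.load(optimized_path)
    return sum(node.op_type == "QLinearConv" for node in optimized.graph.node)

# Mismatched bias scale: fusion must be skipped.
#   assert count_qlinearconv("mismatched.onnx", "mismatched_opt.onnx") == 0
# Matching bias scale: fusion should still fire.
#   assert count_qlinearconv("matching.onnx", "matching_opt.onnx") >= 1
```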
tianleiwu left a comment
Thanks for tightening this fusion path. I found one remaining correctness gap in the same bias-DQ area: the scale is now validated, but the bias zero point is still discarded by the QLinearConv rewrite, so a matching-scale model with a nonzero bias zero point can still be fused into a graph that silently produces wrong output.
```cpp
// Verify bias scale == input_scale * weight_scale[i] per ONNX QLinearConv spec.
// If scales don't match within tolerance, skip fusion to avoid silent numerical errors.
if (!CheckConvBiasScale(graph_viewer, *dq_nodes[0], *dq_nodes[1], *dq_nodes[2])) {
```
This still needs to validate the bias DQ zero point before fusion. ConvMoves() only appends dq_bias input 0 as the QLinearConv bias, so the bias DQ scale and zero point are discarded during the rewrite. This helper now verifies that the discarded scale matches x_scale * w_scale, but it does not require the discarded zero point to be zero. If a model has matching b_scale and a nonzero int32 b_zp, the unfused graph computes (bias_q - b_zp) * b_scale, while the fused QLinearConv interprets bias_q directly with implicit bias zero point 0, producing silently shifted outputs. Please reject fusion unless the bias zero point is absent or a constant int32 zero, and add a regression test where matching scale plus nonzero b_zp does not produce QLinearConv.
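To make the failure mode concrete, here is a tiny numeric example of the shift described above; all values are hypothetical, chosen only for illustration:

```python
b_scale = 0.002        # matches x_scale * w_scale, so the scale check passes
bias_q, b_zp = 100, 1  # quantized bias with a nonzero zero point

unfused = (bias_q - b_zp) * b_scale  # DequantizeLinear semantics -> 0.198
fused = bias_q * b_scale             # QLinearConv assumes b_zp == 0 -> 0.200

# Every output in that channel is shifted by b_zp * b_scale = 0.002.
print(unfused, fused)
```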
…fusion

CheckConvBiasScale only validated the bias dequantize scale, but ConvMoves discards both the bias DQ scale and zero-point during the rewrite to QLinearConv (which has no bias_zp input and assumes an implicit zero). A model with matching b_scale = x_scale * w_scale but nonzero b_zp would therefore be silently fused into a graph computing bias_q * b_scale instead of the spec-correct (bias_q - b_zp) * b_scale, producing shifted outputs.

Add a sibling helper CheckConvBiasZeroPoint that, when the optional bias zero-point input is present and constant, requires it to be int32 with every element zero. Absent or empty-name inputs are treated as zero (matching the DequantizeLinear default). Non-constant or non-int32 zero points are conservatively rejected so fusion is skipped rather than producing silently wrong results. A sketch of this rule is shown below.

Wire the new check into ConvNodeGroupSelector::Check immediately after the scale check, only on the bias-DQ branch (dq_nodes.size() == 3). Add a regression test test_nonzero_bias_zp_skips_fusion that builds a conformant-scale graph with b_zp=1, asserts the optimized graph contains no QLinearConv (the primary structural guarantee), and additionally compares optimized vs unoptimized outputs as a defense-in-depth check.
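A Python rendering of the acceptance rule this commit describes; the actual helper is C++, and the `initializers` lookup here (a name → TensorProto dict) is an assumption of this sketch:

```python
import numpy as np
from onnx import numpy_helper

def bias_zero_point_is_zero(dq_bias_node, initializers) -> bool:
    """Python mirror of the described CheckConvBiasZeroPoint rule."""
    # Absent or empty-name zero point defaults to 0 in DequantizeLinear.
    if len(dq_bias_node.input) < 3 or dq_bias_node.input[2] == "":
        return True
    tensor = initializers.get(dq_bias_node.input[2])
    if tensor is None:
        return False  # not a constant initializer: reject conservatively
    zp = numpy_helper.to_array(tensor)
    # Require int32 and all-zero, matching QLinearConv's implicit bias zero point.
    return zp.dtype == np.int32 and bool(np.all(zp == 0))
```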
Thanks for catching the bias zero-point gap. Pushed the fix with regression test test_nonzero_bias_zp_skips_fusion.
Summary
- Add a `CheckConvBiasScale` validator inside `ConvNodeGroupSelector::Check`

Motivation
Fixes #24711.
The ONNX QLinearConv spec requires the int32 bias to use scale `x_scale × w_scale[i]` so the fused kernel can reuse it directly. The current QDQ selector only verifies that the bias dtype is INT32; it never checks that the bias DQ's scale satisfies this relationship. When a model is constructed with an arbitrary bias scale (e.g. user-supplied or from a non-canonical quantizer), the selector still fuses the subgraph and the QLinearConv kernel produces silently wrong outputs at `ORT_ENABLE_EXTENDED` and above on the CPU EP. CUDA and disabled-optimization paths produce correct results, making the bug particularly hard to diagnose.
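For intuition, a small illustrative computation (numbers invented for this sketch) of why the spec pins the bias scale: the Conv int32 accumulator sums products whose implicit unit is x_scale × w_scale, so the bias must be quantized in that same unit to be added directly.

```python
import numpy as np

x_scale, w_scale = 0.02, 0.1      # illustrative per-tensor scales
bias_f = np.array([0.37, -1.25])  # float bias, one value per output channel

# Spec-conformant bias quantization uses x_scale * w_scale as the scale.
bias_scale = x_scale * w_scale
bias_q = np.round(bias_f / bias_scale).astype(np.int32)

print(bias_q * bias_scale)      # ~[ 0.37 -1.25]: recovered correctly
print(bias_q * 2 * bias_scale)  # [ 0.74 -2.5 ]: a 2x scale silently doubles the bias
```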
Changes

- `onnxruntime/core/optimizer/qdq_transformer/selectors_actions/qdq_selectors.cc`: add a `CheckConvBiasScale` static helper. Returns `false` (skip fusion) when:
  - `x_scale` is not a scalar / 1-element rank-1 tensor
  - `b_scale` length is neither 1 nor `num_channels`
  - any channel's bias scale deviates from `x_scale × w_scale[i]` by more than `atol=1e-6 + rtol=1e-2 × |expected|`
- `onnxruntime/test/python/quantization/test_qdq.py`: new `TestConvBiasScaleValidation` class with two cases, mismatched bias scale (asserts optimized output matches unoptimized) and matching bias scale (asserts correctness preserved when fusion is allowed); see the sketch after this list.
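For illustration, a model of the kind the tests describe could be constructed roughly as follows. This is a hypothetical sketch, not the code in `test_qdq.py`; all names, shapes, and scale values are invented.

```python
import numpy as np
import onnx
from onnx import TensorProto, helper, numpy_helper

def make_qdq_conv_model(bias_scale_factor: float = 1.0) -> onnx.ModelProto:
    """DQ(x), DQ(w), DQ(b) -> Conv -> Q. factor=2.0 gives the mismatched case."""
    x_scale, w_scale = 0.02, 0.1
    b_scale = x_scale * w_scale * bias_scale_factor
    rng = np.random.default_rng(0)
    inits = [
        numpy_helper.from_array(np.array(x_scale, np.float32), "x_scale"),
        numpy_helper.from_array(np.array(128, np.uint8), "x_zp"),
        numpy_helper.from_array(rng.integers(0, 255, (4, 3, 3, 3), dtype=np.uint8), "w_q"),
        numpy_helper.from_array(np.array(w_scale, np.float32), "w_scale"),
        numpy_helper.from_array(np.array(128, np.uint8), "w_zp"),
        numpy_helper.from_array(rng.integers(-100, 100, (4,), dtype=np.int32), "b_q"),
        numpy_helper.from_array(np.array(b_scale, np.float32), "b_scale"),
        numpy_helper.from_array(np.array(0.1, np.float32), "y_scale"),
        numpy_helper.from_array(np.array(128, np.uint8), "y_zp"),
    ]
    nodes = [
        helper.make_node("DequantizeLinear", ["x_q", "x_scale", "x_zp"], ["x"]),
        helper.make_node("DequantizeLinear", ["w_q", "w_scale", "w_zp"], ["w"]),
        helper.make_node("DequantizeLinear", ["b_q", "b_scale"], ["b"]),
        helper.make_node("Conv", ["x", "w", "b"], ["y_f"]),
        helper.make_node("QuantizeLinear", ["y_f", "y_scale", "y_zp"], ["y_q"]),
    ]
    graph = helper.make_graph(
        nodes, "qdq_conv_bias_scale",
        [helper.make_tensor_value_info("x_q", TensorProto.UINT8, [1, 3, 8, 8])],
        [helper.make_tensor_value_info("y_q", TensorProto.UINT8, [1, 4, 6, 6])],
        inits,
    )
    return helper.make_model(graph)
```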
Test Plan

- `python -m pytest onnxruntime/test/python/quantization/test_qdq.py::TestConvBiasScaleValidation -v`
- Existing quantization tests (the `verify_quantize_conv` family) should continue to pass; fusion is unchanged for canonical quantizer-produced models where bias_scale equals input_scale × weight_scale exactly.
- `ORT_ENABLE_ALL` output now matches `ORT_DISABLE_ALL`.