
fix(quantization): validate bias scale in QDQ Conv → QLinearConv fusion #28229

Open

Rishi-Dave wants to merge 3 commits into microsoft:main from Rishi-Dave:rishidave/fix/qdq-conv-bias-scale-validation

Conversation

@Rishi-Dave
Contributor

Summary

  • Add CheckConvBiasScale validator inside ConvNodeGroupSelector::Check
  • Skip QDQ Conv → QLinearConv fusion when bias DQ scale ≠ input_scale × weight_scale (within 1% relative tolerance)
  • Add Python test coverage for both matching and mismatched bias scales

Motivation

Fixes #24711.

The ONNX QLinearConv spec requires the int32 bias to be quantized with scale x_scale × w_scale[i] so the fused kernel can reuse the raw int32 values directly. The current QDQ selector only verifies that the bias dtype is INT32; it never checks that the bias DQ's scale satisfies this relationship. When a model is constructed with an arbitrary bias scale (e.g. user-supplied or produced by a non-canonical quantizer), the selector still fuses the subgraph, and the QLinearConv kernel produces silently wrong outputs at ORT_ENABLE_EXTENDED and above on the CPU EP. CUDA and disabled-optimization paths produce correct results, which makes the bug particularly hard to diagnose.
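
To make the failure mode concrete, here is a minimal sketch (hypothetical scale values, not taken from the issue's model) of how the fused kernel reinterprets the stored int32 bias:

```python
x_scale, w_scale = 0.02, 0.05   # hypothetical scales
real_bias = 1.0

# Canonical quantizer: bias_scale == x_scale * w_scale.
good_scale = x_scale * w_scale              # 0.001
bias_q = round(real_bias / good_scale)      # 1000 (stored as int32)
print(bias_q * x_scale * w_scale)           # ~1.0 -- fused kernel correct

# Non-canonical quantizer: arbitrary bias scale.
bad_scale = 2 * good_scale                  # 0.002
bias_q_bad = round(real_bias / bad_scale)   # 500

# Unfused graph: the bias DQ computes 500 * 0.002 == 1.0 (correct).
# Fused QLinearConv still assumes x_scale * w_scale and halves the bias.
print(bias_q_bad * x_scale * w_scale)       # ~0.5 -- silently wrong
```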

Changes

  • onnxruntime/core/optimizer/qdq_transformer/selectors_actions/qdq_selectors.cc: add a static CheckConvBiasScale helper (sketched in Python after this list). Returns false (skip fusion) when:
    • any of x/w/b scales is not a constant initializer
    • any scale dtype is not float32
    • x_scale is not a scalar / 1-element rank-1 tensor
    • b_scale length is neither 1 nor num_channels
    • any per-channel bias scale differs from x_scale × w_scale[i] by more than atol=1e-6 + rtol=1e-2 × |expected|
  • onnxruntime/test/python/quantization/test_qdq.py: new TestConvBiasScaleValidation class with two cases — mismatched bias scale (asserts optimized output matches unoptimized) and matching bias scale (asserts correctness preserved when fusion is allowed).
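
For reference, the decision logic is roughly the following: a Python sketch of the C++ helper, reflecting the rules as listed above (the constant-initializer and dtype checks happen before this point and are omitted; a later commit in this thread refines the channel inference):

```python
import numpy as np

ATOL, RTOL = 1e-6, 1e-2  # tolerances used by the check

def bias_scale_ok(x_scale, w_scales, b_scales):
    """Sketch: return False to skip fusion (conservative)."""
    w_scales = np.atleast_1d(np.asarray(w_scales, dtype=np.float32))
    b_scales = np.atleast_1d(np.asarray(b_scales, dtype=np.float32))
    num_channels = len(w_scales)
    if len(b_scales) not in (1, num_channels):
        return False
    expected = np.float32(x_scale) * w_scales      # x_scale * w_scale[i]
    diff = np.abs(b_scales - expected)             # per-tensor b_scale broadcasts
    return bool(np.all(diff <= ATOL + RTOL * np.abs(expected)))
```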

Test Plan

  • python -m pytest onnxruntime/test/python/quantization/test_qdq.py::TestConvBiasScaleValidation -v
  • Existing QDQ Conv tests (the verify_quantize_conv family) should continue to pass: fusion is unchanged for canonical quantizer-produced models, where bias_scale equals input_scale × weight_scale exactly.
  • Reproduce the issue with the model from #24711 ("Conv + Q/DQ produces bad output in CPUExecutionProvider") and confirm that CPU ORT_ENABLE_ALL output now matches ORT_DISABLE_ALL.

ONNX QLinearConv reuses the int32 bias values directly without
re-scaling, which requires bias_scale[i] == x_scale * w_scale[i] for
each output channel. Previously ConvNodeGroupSelector::Check did not
verify this invariant, so a QDQ model whose bias DQ used an arbitrary
scale (e.g. produced by a non-conformant quantizer) would be silently
fused into QLinearConv with wrong outputs.

Add a static helper CheckConvBiasScale() that reads the three scale
initializers, handles both per-tensor (scalar) and per-channel weight
scales, and rejects the fusion when
  |b_scale - x_scale*w_scale| > atol + rtol*|x_scale*w_scale|
(atol=1e-6, rtol=1e-2) for any channel. If any scale is not a
constant initializer the helper returns false conservatively.

Add two Python test cases in TestConvBiasScaleValidation (their shared
optimized-vs-unoptimized pattern is sketched after this message):
- test_mismatched_bias_scale_skips_fusion: verifies that optimized and
  unoptimized sessions agree when bias_scale is 2x the correct value,
  proving fusion was safely skipped.
- test_matching_bias_scale_allows_fusion: verifies the same agreement
  when bias_scale is exact, ensuring no regression on valid models.

Fixes microsoft#24711
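
The core comparison pattern of both tests looks roughly like this (a sketch; the real tests build the QDQ model programmatically with onnx.helper, and "model.onnx" plus the input name are placeholders):

```python
import numpy as np
import onnxruntime as ort

def run(model_path, feeds, level):
    so = ort.SessionOptions()
    so.graph_optimization_level = level
    sess = ort.InferenceSession(model_path, so, providers=["CPUExecutionProvider"])
    return sess.run(None, feeds)[0]

feeds = {"input": np.random.rand(1, 3, 8, 8).astype(np.float32)}
unopt = run("model.onnx", feeds, ort.GraphOptimizationLevel.ORT_DISABLE_ALL)
opt = run("model.onnx", feeds, ort.GraphOptimizationLevel.ORT_ENABLE_ALL)

# If the selector (correctly) skipped the invalid fusion, both paths agree.
np.testing.assert_allclose(opt, unopt, rtol=1e-4, atol=1e-4)
```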

Copilot AI left a comment

Pull request overview

This PR hardens the QDQ Conv → QLinearConv fusion in the QDQ transformer by validating that the bias dequantization scale conforms to the ONNX QLinearConv requirement (bias_scale == x_scale * w_scale[i]), preventing silent wrong results when models use non-canonical bias scales.

Changes:

  • Add CheckConvBiasScale validation to ConvNodeGroupSelector::Check to skip fusion when bias scale does not match x_scale * w_scale[i] within tolerance.
  • Add Python tests that run the same model with optimizations enabled vs disabled to detect bias-scale-related corruption.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Files reviewed:

  • onnxruntime/core/optimizer/qdq_transformer/selectors_actions/qdq_selectors.cc: adds the bias-scale validation gate to prevent incorrect QDQ Conv fusion into QLinearConv.
  • onnxruntime/test/python/quantization/test_qdq.py: adds regression tests for matched vs mismatched bias scales under different optimization levels.


Comment thread onnxruntime/test/python/quantization/test_qdq.py
…scale check

CheckConvBiasScale derived num_channels from w_scales.size() alone, which is 1
for per-tensor weight scale even when bias_scale is a per-channel vector. That
caused valid graphs (per-channel bias with scalar weight scale) to be rejected
and only one channel to be validated when num_channels was inferred wrongly.

Use the larger of w_num/b_num for num_channels and broadcast either scale
across channels when scalar; reject only when both are vectors with
mismatched lengths (see the sketch after this message).

Also extend the matching/mismatched-bias-scale tests to save the optimized
graph and assert QLinearConv presence/absence directly via op counts, so the
tests fail fast if fusion silently disables for an unrelated reason.
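
The corrected channel inference, again as a Python sketch of the C++ logic (hypothetical helper name):

```python
import numpy as np

def bias_scale_ok_v2(x_scale, w_scales, b_scales, atol=1e-6, rtol=1e-2):
    w = np.atleast_1d(np.asarray(w_scales, dtype=np.float32))
    b = np.atleast_1d(np.asarray(b_scales, dtype=np.float32))
    # Reject only when both are vectors with mismatched lengths.
    if len(w) > 1 and len(b) > 1 and len(w) != len(b):
        return False
    # Take num_channels from whichever side is per-channel and
    # broadcast the scalar side across all channels.
    num_channels = max(len(w), len(b))
    w = np.broadcast_to(w, (num_channels,))
    b = np.broadcast_to(b, (num_channels,))
    expected = np.float32(x_scale) * w
    return bool(np.all(np.abs(b - expected) <= atol + rtol * np.abs(expected)))
```

For the op-count assertion, the tests can set SessionOptions.optimized_model_filepath, reload the dumped model with onnx.load, and count nodes whose op_type is "QLinearConv".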

@tianleiwu left a comment

Thanks for tightening this fusion path. I found one remaining correctness gap in the same bias-DQ area: the scale is now validated, but the bias zero point is still discarded by the QLinearConv rewrite, so a matching-scale model with nonzero bias zero point can still be fused into silent wrong output.


// Verify bias scale == input_scale * weight_scale[i] per ONNX QLinearConv spec.
// If scales don't match within tolerance, skip fusion to avoid silent numerical errors.
if (!CheckConvBiasScale(graph_viewer, *dq_nodes[0], *dq_nodes[1], *dq_nodes[2])) {

This still needs to validate the bias DQ zero point before fusion. ConvMoves() only appends dq_bias input 0 as the QLinearConv bias, so the bias DQ scale and zero point are discarded during the rewrite. This helper now verifies that the discarded scale matches x_scale * w_scale, but it does not require the discarded zero point to be zero. If a model has matching b_scale and a nonzero int32 b_zp, the unfused graph computes (bias_q - b_zp) * b_scale, while the fused QLinearConv interprets bias_q directly with implicit bias zero point 0, producing silently shifted outputs. Please reject fusion unless the bias zero point is absent or a constant int32 zero, and add a regression test where matching scale plus nonzero b_zp does not produce QLinearConv.
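
A minimal numeric illustration of that shift (hypothetical values):

```python
b_scale = 0.001                  # matches x_scale * w_scale
bias_q, b_zp = 1500, 500         # stored int32 bias and its zero point

unfused = (bias_q - b_zp) * b_scale   # DequantizeLinear semantics: 1.0
fused = bias_q * b_scale              # QLinearConv's implicit b_zp == 0: 1.5
print(unfused, fused)                 # every output shifts by b_zp * b_scale
```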

…fusion

CheckConvBiasScale only validated the bias dequantize scale, but ConvMoves
discards both the bias DQ scale and zero-point during the rewrite to
QLinearConv (which has no bias_zp input and assumes implicit zero). A model
with matching b_scale = x_scale * w_scale but nonzero b_zp would therefore
be silently fused into a graph computing bias_q * b_scale instead of the
spec-correct (bias_q - b_zp) * b_scale, producing shifted outputs.

Add a sibling helper CheckConvBiasZeroPoint (sketched after this
message) that, when the optional bias
zero-point input is present and constant, requires it to be int32 with
every element zero. Absent or empty-name inputs are treated as zero
(matching the DequantizeLinear default). Non-constant or non-int32 zero
points are conservatively rejected so fusion is skipped rather than
producing silently wrong results.

Wire the new check into ConvNodeGroupSelector::Check immediately after the
scale check, only on the bias-DQ branch (dq_nodes.size() == 3).

Add a regression test test_nonzero_bias_zp_skips_fusion that builds a
conformant-scale graph with b_zp=1, asserts the optimized graph contains
no QLinearConv (the primary structural guarantee), and additionally
compares optimized vs unoptimized outputs as a defense-in-depth check.
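
The zero-point gate, as a Python sketch under the same caveats (the real helper is C++, and the rejection of non-constant zero points depends on graph state this sketch cannot see):

```python
import numpy as np

def bias_zero_point_ok(b_zp):
    """b_zp: the bias DQ zero-point initializer as an ndarray, or None
    when the optional input is absent/empty (implicit zero)."""
    if b_zp is None:
        return True                    # DequantizeLinear default is 0
    if b_zp.dtype != np.int32:
        return False                   # conservative: skip fusion
    return bool(np.all(b_zp == 0))     # covers per-tensor and per-channel
```
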
@Rishi-Dave
Contributor Author

Thanks for catching the bias zero-point gap. Pushed 0f7900d adding CheckConvBiasZeroPoint in qdq_selectors.cc, invoked from ConvNodeGroupSelector::Check right after CheckConvBiasScale. The helper rejects fusion unless the bias DQ's zero-point input is absent/empty (implicit 0) or a constant int32 initializer with every element equal to 0 — covering per-tensor and per-channel cases. Non-constant or non-int32 zero points are conservatively rejected.

Regression test TestConvBiasScaleValidation::test_nonzero_bias_zp_skips_fusion in test/python/quantization/test_qdq.py builds a QDQ Conv with matching b_scale == x_scale * w_scale but b_zp = 1 and asserts the optimized graph contains zero QLinearConv nodes. Happy to fold followups (stronger numeric assertion, positive control) into this PR if preferred.
