
Apply constant folding to Q/DQ patterns on constants #802

Open
pluralia wants to merge 12 commits into Xilinx:release_rai_1_8 from pluralia:ai.bypass-shape-ops-through-dq




pluralia commented May 31, 2026

This update is required to remove the Slice and Transpose ops on constant weights so that the MatMul (MM) is identified as act x weight instead of act x act.

Adds three new patterns to src/Dialect/ONNX/Transforms/ConstProp.cpp, all gated behind the existing enableQDQ flag and registered next to the current RemoveQDQForConst pattern. They target Q/DQ chains rooted at an ONNXConstantOp that the existing patterns leave behind:

  • BypassShapeOpThroughDQ<ONNXOp> — rewrites Const → DQ → ShapeOp as Const → ShapeOp → DQ for Slice/Transpose/Reshape/Squeeze/Unsqueeze, so that the existing *OfConst folders can collapse the shape op on the integer constant.

  • DropIdempotentQDQOnConst — removes a Const → DQ(s,z) → Q(s,z) round-trip on an integer constant when the DQ and Q share the same scale, zero-point, and storage dtype.

  • FoldRequantizeOnConst — materializes a new integer constant for a genuine Const → DQ(s1,z1) → Q(s2,z2,intT2) requantization by recomputing each element with round-to-nearest-even and saturation.
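The arithmetic behind the last two patterns can be sketched in standalone C++, assuming per-tensor parameters and the ONNX QuantizeLinear definition q = saturate(round_half_even(x / s) + z). QParams, dequantize, quantize, foldRequantize and isIdempotentQDQ are hypothetical names for illustration only; this is not the PR's MLIR code, which works on ONNXConstantOp attributes and may evaluate in a different precision. The sketch shows why DropIdempotentQDQOnConst is safe (with identical scale, zero point and storage type every stored value maps back to itself) and what FoldRequantizeOnConst has to compute otherwise.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>
#include <type_traits>
#include <vector>

// Hypothetical per-tensor quantization parameters (one scale, one zero point).
struct QParams {
  float scale;
  int32_t zeroPoint;
};

// DequantizeLinear: real = (q - z) * s.
inline float dequantize(int32_t q, QParams p) {
  return static_cast<float>(q - p.zeroPoint) * p.scale;
}

// QuantizeLinear: q = saturate(round_half_even(x / s) + z) into T's range.
template <typename T>
T quantize(float x, QParams p) {
  assert(p.scale != 0.0f && "scale must be non-zero");
  // std::nearbyint follows the current rounding mode; the default
  // (FE_TONEAREST) is round-to-nearest, ties-to-even.
  float r = std::nearbyint(x / p.scale) + static_cast<float>(p.zeroPoint);
  float lo = static_cast<float>(std::numeric_limits<T>::lowest());
  float hi = static_cast<float>(std::numeric_limits<T>::max());
  return static_cast<T>(std::clamp(r, lo, hi));  // saturation
}

// Fold Const -> DQ(s1, z1) -> Q(s2, z2, T2) into a new T2 constant.
template <typename T1, typename T2>
std::vector<T2> foldRequantize(const std::vector<T1>& src, QParams in,
                               QParams out) {
  std::vector<T2> dst;
  dst.reserve(src.size());
  for (T1 v : src)
    dst.push_back(quantize<T2>(dequantize(static_cast<int32_t>(v), in), out));
  return dst;
}

// Same storage type, scale and zero point: DQ -> Q is a no-op round-trip.
template <typename T1, typename T2>
bool isIdempotentQDQ(QParams in, QParams out) {
  return std::is_same_v<T1, T2> && in.scale == out.scale &&
         in.zeroPoint == out.zeroPoint;
}
```

For example, with s1 = 0.5 and z1 = 128 the uint8 values {0, 1, 128, 255} dequantize to {-64, -63.5, 0, 63.5}. Requantizing to int8 with s2 = 1 and z2 = 0 gives {-64, -64, 0, 64}, because both .5 cases round to the even neighbour, while s2 = 0.25 saturates to {-128, -128, 0, 127}.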


Copilot AI left a comment


Pull request overview

This PR extends ONNX constant propagation to further fold/remove Q/DQ patterns rooted at constants (under the existing enableQDQ gate), helping eliminate leftover Slice/Transpose/etc. that can block later optimizations.

Changes:

  • Adds BypassShapeOpThroughDQ<...> to commute shape-only ops ahead of DequantizeLinear for per-tensor quantization on constants.
  • Adds DropIdempotentQDQOnConst to remove no-op Const -> DQ -> Q round-trips on integer constants when parameters are identical.
  • Adds FoldRequantizeOnConst to materialize a new integer constant for genuine Const -> DQ -> Q requantization.
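The per-tensor restriction on the bypass can be seen in a small standalone C++ sketch (Tensor2D, transpose and dequantize are hypothetical names, not onnx-mlir APIs). With a single (scale, zero point) pair, DequantizeLinear is an elementwise map, so a shape-only op that only moves elements gives the same values whether it runs before or after the DQ. With per-axis parameters the scale vector is tied to one axis, so a Transpose would also have to remap the DQ's axis attribute and a Slice would have to slice the scale and zero-point vectors; that is presumably why the pattern handles only the per-tensor case.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical row-major 2-D tensor standing in for a constant's payload.
template <typename T>
struct Tensor2D {
  std::size_t rows, cols;
  std::vector<T> data;
  T at(std::size_t r, std::size_t c) const {
    assert(r < rows && c < cols);
    return data[r * cols + c];
  }
};

// Shape-only op: moves elements around, never changes their values.
template <typename T>
Tensor2D<T> transpose(const Tensor2D<T>& t) {
  Tensor2D<T> out{t.cols, t.rows, std::vector<T>(t.data.size())};
  for (std::size_t r = 0; r < t.rows; ++r)
    for (std::size_t c = 0; c < t.cols; ++c)
      out.data[c * t.rows + r] = t.at(r, c);
  return out;
}

// Per-tensor DequantizeLinear: the same (scale, zero point) for every
// element, so it is a pure elementwise map and commutes with transpose.
template <typename T>
Tensor2D<float> dequantize(const Tensor2D<T>& t, float scale, int32_t zp) {
  Tensor2D<float> out{t.rows, t.cols, std::vector<float>(t.data.size())};
  for (std::size_t i = 0; i < t.data.size(); ++i)
    out.data[i] =
        static_cast<float>(static_cast<int32_t>(t.data[i]) - zp) * scale;
  return out;
}
```

Once the ops are swapped, the Transpose sits directly on the integer constant, where the existing *OfConst folders can fold it away and leave a plain Const → DQ.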


jorickert requested a review from ilango100 on June 1, 2026 08:42

p-lanza left a comment


Please add unit tests.


guwacAMD commented Jun 1, 2026

DMAC FE Regression Summary

  • 151 models submitted, 145 pass / 6 fail on both sides — pass/fail set identical.
  • No new failures, no recovered passes.
  • 6 failures pre-exist on nightly (unrelated to the patch):
    • exit 137 (OOM): fara-qwen-vl-1, fara-qwen-vl-512, phi-v-next-7b-psu0, phi-v-next-7b-psu1
    • exit 134 (abort): mako-encoder-fp16, psr

Per-model frontend diff

  • 143 / 145 identical between CHK and REF.
  • 2 / 145 show real diffs: psx0-qdq (target) and microsoft-infoxlm-large (side effect).

psx0-qdq — net op-type deltas (REF → CHK)

op_type                                  REF  CHK     Δ
MatMul_qdq_actxact_uint16xuint16xuint16  120    0  −120
MatMul_qdq_uint16xuint8xuint16             0  120  +120
MatMul_qdq_bias_uint16xuint8xuint16        0    2    +2
Slice_qdq_uint8xbfloat16                 120    0  −120
Transpose_qdq_bfloat16xuint16            120    0  −120
Transpose_qdq_uint16xuint16                3    1    −2
Transpose_qdq_uint8xuint8                  4    0    −4
Dequant_uint16xbfloat16                    4    0    −4
Dequant_int16xbfloat16                     2    0    −2
Quant_bfloat16xuint16                      2    0    −2

The 120 act×act uint16×uint16 MatMuls are reclassified as AIE-runnable act×weight uint16×uint8, the 120 paired Slice and 120 Transpose ops on the weights are collapsed, and two 1×1 Convs (AIESW-34318) are now fused as MatMul_qdq_bias_uint16×uint8.

microsoft-infoxlm-large — side effect

  • 1 Conversion_uint8xuint16 removed (lm_head.dense.bias_DequantizeLinear)
  • 1 LayerNormalization_qdq weight retyped from uint16 to uint8 (B_dtype, wgt1_bytes 2→1)
  • Net: −1 op, small improvement

pluralia force-pushed the ai.bypass-shape-ops-through-dq branch from 54f8e3e to 35620a8 on June 1, 2026 22:06
