
xmc/ReplaceQDQResizePass: replace 1x1 spatial XFEResize with broadcas…#804

Open
nvunnam57128 wants to merge 2 commits into Xilinx:feature/onnx-to-tosa from nvunnam57128:nvunnam/replace-qdq-resize-1x1-to-broadcasting-add


@nvunnam57128

No description provided.

…ting Add

Mirrors the IPU-specific 1x1-input optimisation in xcompiler's
ReplaceQDQResizePass (src/pass/passes/ReplaceQDQResizePass.cpp lines
200-282).  When a quantised XFEResize takes a tensor of shape
[N, 1, 1, C] and upsamples it to [N, H, W, C] (NHWC, with H>1 or W>1),
the Resize is functionally a broadcast: there is only one source
pixel per (N, C) so bilinear/nearest collapse to replication.
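
The replication claim can be checked on a single (n, c) plane with a short sketch. The two resize functions below are generic textbook versions (nearest with floor index mapping, bilinear with half-pixel centres and edge clamping); they are assumptions for illustration and not the XFEResize or qlinear_resize implementation.

```python
def resize_nearest(plane, out_h, out_w):
    """Nearest-neighbour resize of one (n, c) plane given as a list of rows."""
    in_h, in_w = len(plane), len(plane[0])
    return [
        [plane[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def _source_coord(dst, in_size, out_size):
    """Half-pixel source coordinate, clamped to the valid index range."""
    pos = (dst + 0.5) * in_size / out_size - 0.5
    return min(max(pos, 0.0), in_size - 1.0)


def resize_bilinear(plane, out_h, out_w):
    """Bilinear resize of one (n, c) plane given as a list of rows."""
    in_h, in_w = len(plane), len(plane[0])
    out = []
    for y in range(out_h):
        fy = _source_coord(y, in_h, out_h)
        y0 = int(fy)
        y1 = min(y0 + 1, in_h - 1)
        wy = fy - y0
        row = []
        for x in range(out_w):
            fx = _source_coord(x, in_w, out_w)
            x0 = int(fx)
            x1 = min(x0 + 1, in_w - 1)
            wx = fx - x0
            # Interpolate along W on the two neighbouring rows, then along H.
            top = plane[y0][x0] * (1 - wx) + plane[y0][x1] * wx
            bottom = plane[y1][x0] * (1 - wx) + plane[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out
```

With a 1x1 source, every clamped source coordinate is 0 and every interpolation weight is 0, so both functions return the single source value at every output position.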

The rewrite drops the Resize and emits

  %zero  = onnx.Constant of shape [N, H, W, C] in the output quant type,
           storage value = output_zero_point (decodes to 0.0)
  %out   = onnx.Add(%resize_input, %zero)

ONNX Add broadcasts [N, 1, 1, C] + [N, H, W, C] -> [N, H, W, C],
producing the same numerical result as the original Resize.  The
synthetic zero Add is then collapsed by downstream eltwise / const-fold
passes (xcompiler's pipeline does the same: ReplaceQDQResizePass tags
the eltwise with original_resize_opt=true, and a later fusion absorbs
the zero into a downstream skip-connection Add).
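
A minimal numerical sketch of why the synthetic Add is an identity, assuming int8 storage and a plain dequantize, add, requantize model of the quantised Add. The helper names and the rounding and clamping choices are hypothetical; the backend's actual Add semantics are not described in this PR.

```python
def dequantize(q, scale, zero_point):
    """Map a storage value to its real value."""
    return (q - zero_point) * scale


def quantize(real, scale, zero_point, qmin=-128, qmax=127):
    """Map a real value back to storage, clamped to the int8 range (assumed)."""
    q = round(real / scale) + zero_point
    return max(qmin, min(qmax, q))


def broadcast_add_zero(x_q, out_h, out_w, scale, zero_point):
    """Emulate Add(x, zero) for x of shape [N][1][1][C] (NHWC storage values).

    The zero constant has shape [N, H, W, C] and stores zero_point everywhere,
    so each element decodes to 0.0. Returns storage values of shape [N][H][W][C].
    """
    zero_q = zero_point  # storage value that decodes to 0.0
    zero_real = dequantize(zero_q, scale, zero_point)
    out = []
    for batch in x_q:
        pixel = batch[0][0]  # the single source pixel, one value per channel
        summed = [
            quantize(dequantize(q, scale, zero_point) + zero_real, scale, zero_point)
            for q in pixel
        ]
        # Broadcasting replicates the 1x1 pixel across the H x W output grid.
        out.append([[list(summed) for _ in range(out_w)] for _ in range(out_h)])
    return out
```

Because input and output share the same scale and zero point, requantisation returns the original storage value, so every output pixel equals the source pixel.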

Match conditions:
  * single-use XFEResize
  * rank-4 static input AND output (NHWC)
  * input_shape[1] == 1 && input_shape[2] == 1
  * output_shape[1] > 1 || output_shape[2] > 1
  * batch and channel dims match across input/output
  * input and output are uniform quant types with matching scale/zp
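
The match conditions above can be sketched as one predicate. `QuantTensorType`, `matches_1x1_resize`, and `num_uses` are hypothetical names for this illustration (the real pass works on MLIR types in C++), and "single-use" is read here as the Resize result having exactly one user, which is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class QuantTensorType:
    """Hypothetical stand-in for a uniform-quantised tensor type."""
    shape: Tuple[Optional[int], ...]  # None marks a dynamic dimension
    scale: float
    zero_point: int


def matches_1x1_resize(inp, out, num_uses):
    """Return True when a Resize from `inp` to `out` meets the listed conditions."""
    if num_uses != 1:
        return False
    # Rank-4 static shapes on both sides (NHWC).
    for shape in (inp.shape, out.shape):
        if len(shape) != 4 or any(d is None for d in shape):
            return False
    n_in, h_in, w_in, c_in = inp.shape
    n_out, h_out, w_out, c_out = out.shape
    if h_in != 1 or w_in != 1:
        return False
    if not (h_out > 1 or w_out > 1):
        return False
    if n_in != n_out or c_in != c_out:
        return False
    # Matching quantisation parameters, so the Add needs no rescaling.
    return inp.scale == out.scale and inp.zero_point == out.zero_point
```

Any failed check corresponds to a case where the pattern would be left alone and the original Resize kept.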

This avoids the backend qlinear_resize kernel for the corner-case
1x1->HxW shape that often fails or runs sub-optimally on IPU; observed
on scene_parser_512_256_v2_int8 (Resize_173_8), PSO3, PSA2, PSA3 and
mep_v2/K2.

Placement: after ConvertToChannelLast (creates XFEResize), after the
5D->4D and transpose-optimisation passes (stable rank-4 NHWC), and
right before ReplaceQuantizedTileToAddPass (its analogue for Tile) so
the emitted onnx.Add is immediately lowered by ReplaceQDQEltwisePass.

Co-authored-by: Cursor <cursoragent@cursor.com>
@nvunnam57128 nvunnam57128 force-pushed the nvunnam/replace-qdq-resize-1x1-to-broadcasting-add branch from 0910ff9 to 215fc3a Compare June 1, 2026 17:33
Apply repo .clang-format (LLVM + AlwaysBreakTemplateDeclarations: Yes +
AlignAfterOpenBracket: DontAlign) to ReplaceQDQResizePass.cpp.
Whitespace-only fix for the 4 violations reported by clang-format 20.0.0git
on the previous commit (struct header break, two notifyMatchFailure
continuation breaks, and the rewriter.create<ONNXConstantOp> continuation).

Co-authored-by: Cursor <cursoragent@cursor.com>