Nvunnam/combined pr801 803 804#805

Open
nvunnam57128 wants to merge 4 commits into Xilinx:feature/onnx-to-tosa from nvunnam57128:nvunnam/combined-pr800-801-804

@nvunnam57128
No description provided.

@nvunnam57128 nvunnam57128 marked this pull request as draft June 1, 2026 17:49
@nvunnam57128 nvunnam57128 marked this pull request as ready for review June 1, 2026 17:50
rachgupt-amd and others added 4 commits June 1, 2026 23:36
…ting Add

Mirrors the IPU-specific 1x1-input optimisation in xcompiler's
ReplaceQDQResizePass (src/pass/passes/ReplaceQDQResizePass.cpp lines
200-282).  When a quantised XFEResize takes a tensor of shape
[N, 1, 1, C] and upsamples it to [N, H, W, C] (NHWC, with H>1 or W>1),
the Resize is functionally a broadcast: there is only one source
pixel per (N, C) so bilinear/nearest collapse to replication.

The rewrite drops the Resize and emits

  %zero  = onnx.Constant of shape [N, H, W, C] in the output quant type,
           storage value = output_zero_point (decodes to 0.0)
  %out   = onnx.Add(%resize_input, %zero)

ONNX Add broadcasts [N, 1, 1, C] + [N, H, W, C] -> [N, H, W, C],
producing the same numerical result as the original Resize.  The
synthetic zero Add is then collapsed by downstream eltwise / const-fold
passes (xcompiler's pipeline does the same: ReplaceQDQResizePass tags
the eltwise with original_resize_opt=true, and a later fusion absorbs
the zero into a downstream skip-connection Add).
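
The equivalence claimed above can be checked with a small standalone sketch. It is not code from the pass: `QuantParams`, `resizeFrom1x1`, and `addZeroBroadcast` are hypothetical names, and it assumes per-tensor int8 quantisation with real = scale * (stored - zero_point) and identical parameters on input and output. Under those assumptions the broadcast Add against a constant filled with the zero point reproduces the replicated input exactly.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-tensor uniform quantisation parameters (illustration only).
struct QuantParams {
  float scale;
  int32_t zeroPoint;
};

// real = scale * (stored - zeroPoint)
inline float dequantize(int8_t q, QuantParams p) {
  return p.scale * static_cast<float>(static_cast<int32_t>(q) - p.zeroPoint);
}

// Round to nearest and clamp to the int8 storage range.
inline int8_t quantize(float x, QuantParams p) {
  long v = std::lround(x / p.scale) + p.zeroPoint;
  if (v < -128) v = -128;
  if (v > 127) v = 127;
  return static_cast<int8_t>(v);
}

// Reference for the original op: upsampling [N,1,1,C] to [N,H,W,C] has a
// single source pixel per (n, c), so the result is plain replication.
std::vector<int8_t> resizeFrom1x1(const std::vector<int8_t> &in, size_t N,
                                  size_t H, size_t W, size_t C) {
  assert(in.size() == N * C);
  std::vector<int8_t> out(N * H * W * C);
  for (size_t n = 0; n < N; ++n)
    for (size_t h = 0; h < H; ++h)
      for (size_t w = 0; w < W; ++w)
        for (size_t c = 0; c < C; ++c)
          out[((n * H + h) * W + w) * C + c] = in[n * C + c];
  return out;
}

// The rewrite: Add the [N,1,1,C] input to an [N,H,W,C] constant whose
// storage value is the zero point (decodes to 0.0), with broadcasting.
std::vector<int8_t> addZeroBroadcast(const std::vector<int8_t> &in, size_t N,
                                     size_t H, size_t W, size_t C,
                                     QuantParams p) {
  assert(in.size() == N * C);
  std::vector<int8_t> zero(N * H * W * C, static_cast<int8_t>(p.zeroPoint));
  std::vector<int8_t> out(N * H * W * C);
  for (size_t n = 0; n < N; ++n)
    for (size_t h = 0; h < H; ++h)
      for (size_t w = 0; w < W; ++w)
        for (size_t c = 0; c < C; ++c) {
          size_t idx = ((n * H + h) * W + w) * C + c;
          float sum = dequantize(in[n * C + c], p) + dequantize(zero[idx], p);
          out[idx] = quantize(sum, p);
        }
  return out;
}
```

Comparing `resizeFrom1x1(in, N, H, W, C)` with `addZeroBroadcast(in, N, H, W, C, p)` for the same input should give identical storage values, which is why the match requires the input and output quant parameters to agree.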

Match conditions:
  * single-use XFEResize
  * rank-4 static input AND output (NHWC)
  * input_shape[1] == 1 && input_shape[2] == 1
  * output_shape[1] > 1 || output_shape[2] > 1
  * batch and channel dims match across input/output
  * input and output are uniform quant types with matching scale/zp
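
As a rough illustration, the match conditions listed above can be written as a single predicate. This is a standalone sketch, not the pattern's actual code: `ResizeCandidate`, `QuantInfo`, `kDynamic`, and `matches1x1BroadcastResize` are hypothetical names, rank 4 is encoded by the fixed-size array, and the single-use check is reduced to a flag.

```cpp
#include <array>
#include <cstdint>

// Hypothetical marker for a dynamic (unknown) dimension.
constexpr int64_t kDynamic = -1;

// Hypothetical uniform quantisation parameters of a tensor type.
struct QuantInfo {
  double scale;
  int64_t zeroPoint;
};

// Hypothetical summary of a quantised XFEResize op, shapes in NHWC order.
struct ResizeCandidate {
  bool singleUse;                   // the "single-use XFEResize" condition
  std::array<int64_t, 4> inShape;   // rank 4 by construction
  std::array<int64_t, 4> outShape;  // rank 4 by construction
  QuantInfo inQuant;
  QuantInfo outQuant;
};

inline bool isStatic(const std::array<int64_t, 4> &shape) {
  for (int64_t d : shape)
    if (d == kDynamic)
      return false;
  return true;
}

inline bool matches1x1BroadcastResize(const ResizeCandidate &r) {
  if (!r.singleUse)
    return false;
  // Rank-4 static input and output.
  if (!isStatic(r.inShape) || !isStatic(r.outShape))
    return false;
  // Input spatial dims are 1x1.
  if (r.inShape[1] != 1 || r.inShape[2] != 1)
    return false;
  // Output actually upsamples along H or W.
  if (!(r.outShape[1] > 1 || r.outShape[2] > 1))
    return false;
  // Batch and channel dims are unchanged.
  if (r.inShape[0] != r.outShape[0] || r.inShape[3] != r.outShape[3])
    return false;
  // Matching scale and zero point, so the added zero is exact.
  return r.inQuant.scale == r.outQuant.scale &&
         r.inQuant.zeroPoint == r.outQuant.zeroPoint;
}
```

For example, a `[1, 1, 1, 64]` to `[1, 8, 8, 64]` candidate with equal quant parameters passes, while the same candidate with a different output zero point is rejected.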

This avoids the backend qlinear_resize kernel for the corner-case
1x1->HxW shape that often fails or runs sub-optimally on IPU; observed
on scene_parser_512_256_v2_int8 (Resize_173_8), PSO3, PSA2, PSA3 and
mep_v2/K2.

Placement: after ConvertToChannelLast (creates XFEResize), after the
5D->4D and transpose-optimisation passes (stable rank-4 NHWC), and
right before ReplaceQuantizedTileToAddPass (its analogue for Tile) so
the emitted onnx.Add is immediately lowered by ReplaceQDQEltwisePass.

Co-authored-by: Cursor <cursoragent@cursor.com>

Apply repo .clang-format (LLVM + AlwaysBreakTemplateDeclarations: Yes +
AlignAfterOpenBracket: DontAlign) to ReplaceQDQResizePass.cpp.
Whitespace-only fix for the 4 violations reported by clang-format 20.0.0git
on the previous commit (struct header break, two notifyMatchFailure
continuation breaks, and the rewriter.create<ONNXConstantOp> continuation).

Co-authored-by: Cursor <cursoragent@cursor.com>

…ot/Erf/Mod, normalize Softmax axis to last, disable Where
@nvunnam57128 nvunnam57128 force-pushed the nvunnam/combined-pr800-801-804 branch from 8e7cb7f to c132477 Compare June 2, 2026 05:36
@jorickert jorickert removed request for jorickert and p-lanza June 2, 2026 07:04
@nvunnam57128 nvunnam57128 changed the title Nvunnam/combined pr800 801 804 Nvunnam/combined pr801 803 804 Jun 2, 2026