Skip to content

[AIEX] Fix invalid remat after super-reg rewrite/unbundle#987

Merged
niwinanto merged 3 commits into
aie-publicfrom
niwin.splitmap.repair
May 19, 2026
Merged

[AIEX] Fix invalid remat after super-reg rewrite/unbundle#987
niwinanto merged 3 commits into
aie-publicfrom
niwin.splitmap.repair

Conversation

@niwinanto
Copy link
Copy Markdown
Collaborator

Greedy register allocation can produce an invalid rematerialization after the AIE register-rewriter passes run, because those passes modify the IR in place but leave the LiveInterval of the VRM "Original" ancestor stale. SplitKit::defFromParent walks back through VRM.getOriginal() and uses that stale OrigLI to pick a remat candidate, which can clone an instruction whose semantics no longer match the destination's register class.

; CHECK-NEXT: {{ $}}
; CHECK-NEXT: [[MOV_RLC_imm11_pseudo2:%[0-9]+]]:er = MOV_RLC_imm11_pseudo 1
; CHECK-NEXT: [[AND1:%[0-9]+]]:er = AND [[COPY]], [[MOV_RLC_imm11_pseudo2]]
; CHECK-NEXT: [[MOV_PD_imm11_pseudo4:%[0-9]+]]:eds = MOV_PD_imm11_pseudo 128
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was an illegal copy that we have fixed now.

@niwinanto niwinanto force-pushed the niwin.splitmap.repair branch 4 times, most recently from c68212a to f2b1ad1 Compare May 7, 2026 10:42
DebugVars);

// Prevent SplitKit from rematerializing through stale ancestor LIs.
LLVM_DEBUG(dbgs() << "Clearing stale split-from mappings...\n");
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this will emit debug prints twice, once here, once in clearStaleSplitFromMappings itself.
Drop the print here?

@andcarminati
Copy link
Copy Markdown
Collaborator

Hi Niwin, could you please share the current QoR status of this fix?

# RUN: llc -mtriple=aie2p -start-before=greedy %s -o /dev/null --filetype=obj

# RUN: : not --crash llc -mtriple=aie2p -start-before=greedy %s -o /dev/null --filetype=obj
# RUN: llc -mtriple=aie2p -start-before=greedy -stop-before=virtregrewriter %s -o - | FileCheck %s
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check: with -stop-before=virtregrewrite we currently prevent the crash, right?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, we do the wrong remat in greedy and crash during code emission. Reason I stopped before virtregrewriter is to minimize the change to MIR(still use virtual registers) so that wrong remat is clear. Also, I kept the run line to reproduce the crash.

@niwinanto
Copy link
Copy Markdown
Collaborator Author

@andcarminati
Core_compute_Insn_Count for the changed kernels(AIE2PS)

Topk1D_int8_0,3077,3346,PASS,REGR(+8.74%)
Topk2D_int8_0,178758,177934,PASS,IMPR(-0.46%)
Conv2D_Transpose_fp16_1,4729,4595,PASS,IMPR(-2.83%)
Conv2D_Transpose_bf16_AIE2_1,4728,4594,PASS,IMPR(-2.83%)
Conv2D_Transpose_fp16_0,59339,56981,PASS,IMPR(-3.97%)
Conv2D_Transpose_bf16_AIE2_0,59333,56975,PASS,IMPR(-3.97%)
Average diff,+0.00%,-0.01%,,-0.01%

Topk1d tests are fragile, as their inputs change from run to run and they strongly affect the execution profile, generated code is literally same between 2 runs. And regarding the improvements it reports, RA has not really reclaimed all the regressions but it changed the allocation and that act as an enabler for the pipeliner and we have better code in the end.

On the other side, Core_PM regress for these kernels.

Conv2D_Transpose_bf16_AIE2_1,4800,5024,PASS,REGR(+4.67%)
Conv2D_Transpose_fp16_1,4800,5024,PASS,REGR(+4.67%)
Conv2D_Transpose_bf16_AIE2_0,6896,7120,PASS,REGR(+3.25%)
Conv2D_Transpose_fp16_0,6896,7120,PASS,REGR(+3.25%)

No change to Core_StackSize.

Niwin Anto added 3 commits May 19, 2026 05:18
LiveRange::clear() destructively mutates a shared LiveInterval; any cached
reference held by greedy (spill weights, queue entries, hint
reconciliation) is left dangling.
AIESuperRegRewriter and AIEUnallocatedSuperRegRewriter rewrite or
unbundle instructions in place but leave the LiveInterval of the
VRM "Original" ancestor untouched. Its VNInfos still point at slots
whose MIs have changed, so a later Greedy split that consults
OrigLI via SplitEditor::defFromParent can rematerialize the wrong
instruction (e.g. a sublane-only MOV becomes a full-composite def).

Add AIESuperRegUtils::clearStaleSplitFromMappings and call it from
both passes. It clears the VRM split-from mapping for every
descendant of a touched Original via the new
VirtRegMap::clearSplitFromReg API, restoring each descendant to
the canonical "no split parent" state of a freshly created vreg.
SplitKit then reads the descendant's own repaired LI rather than
the stale ancestor's, and predicates that examine Virt2SplitMap
directly (e.g. isAssignedReg) no longer treat the descendant as a
split product. OrigLI is left untouched.
The earlier "Sever stale VRM split-from chain after super-reg rewrites"
commit calls VRM.clearSplitFromReg() on descendants whose ancestor
LiveInterval has gone stale, so SplitKit::defFromParent no longer
rematerializes through it. A side effect is that InlineSpiller can no
longer recognise sibling vregs as belonging to the same logical group
(VRM.getOriginal() now returns the descendant itself), so each sibling
is assigned its own stack slot instead of sharing one.

Add a target hook TargetSubtargetInfo::getSpillGroupOriginal that lets
a target advertise a "logical group original" for stack-slot sharing
in InlineSpiller, independent of the VRM split-from chain. Defaults to
no override.

Plumb it through AIE:
- AIEMachineFunctionInfo gains a side map (descendant -> pre-severance
  Original) with record/get accessors.
- AIESuperRegUtils::clearStaleSplitFromMappings records the chain into
  that side map before calling VRM.clearSplitFromReg().
- AIEBaseSubtarget overrides getSpillGroupOriginal to forward to
  AIEMachineFunctionInfo.
- Callers (AIE[Unallocated]SuperRegRewriter) updated to pass MF.
@niwinanto niwinanto force-pushed the niwin.splitmap.repair branch from 3f4ae1b to 23b474e Compare May 19, 2026 11:18
Copy link
Copy Markdown
Collaborator

@andcarminati andcarminati left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for fixing this!

@niwinanto niwinanto merged commit 5ed1593 into aie-public May 19, 2026
7 checks passed
@niwinanto niwinanto deleted the niwin.splitmap.repair branch May 19, 2026 12:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants