Skip to content

[AIE] Implement top-down delay slot fixup for regions with top-fixed …#1004

Merged
andcarminati merged 1 commit into
aie-publicfrom
andreu.topdown.epilogue
May 20, 2026
Merged

[AIE] Implement top-down delay slot fixup for regions with top-fixed …#1004
andcarminati merged 1 commit into
aie-publicfrom
andreu.topdown.epilogue

Conversation

@andcarminati
Copy link
Copy Markdown
Collaborator

…bundles

When a region contains both top-fixed bundles and a delay-slot instruction, and no bot-fixed bundles are present, schedule the entire region top-down and correct the branch position afterwards via fixupDelaySlotPosition() instead of forcing bottom-up cycles.

The fixup locates the branch in TopBundles and moves it so that exactly NumDelaySlots bundles follow it:

Case 1: Already correct — return immediately.

Case 2: Too few trailing bundles — append NOPs, then fall through to Case 3 if inter-zone conflicts arise.

Case 3: Too many trailing bundles — extract the branch and re-place it at the first slot (scanning forward from TargetIdx) where neither a resource conflict (checked via the Top scoreboard at constant delta −(NumDelaySlots + 1)) nor an inter-zone conflict exists. Each failed attempt appends one NOP to TopBundles, advancing Top's scoreboard ring and preserving the NumDelaySlots invariant. A post-placement safety check handles any residual inter-zone conflict by appending a NOP and recursing.

; CHECK-NEXT: - PrologueBundles: '28'
; CHECK-NEXT: - Epilogue: bb.3.outer.loop.latch
; CHECK-NEXT: - EpilogueBundles: '33'
; CHECK-NEXT: - EpilogueBundles: '37'
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JNZ is conflicting with the move slot here.

; CHECK-NEXT: vlda.ups.2x cml1, s0, upssign1, [p2], #64; vmac dm1, dm1, x2, x6, r8
; CHECK-NEXT: vlda.ups.2x cmh1, s0, upssign1, [p2], #64; vmac dm0, dm0, x2, x4, r8
; CHECK-NEXT: vlda.ups.2x cmh0, s0, upssign1, [p3], #64; vshuffle x10, x10, x1, r2
; CHECK-NEXT: jz r16, #.LBB0_1
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This regression will disappear after enabling JNZD as default in the outerloop pipeliner.

<< PlacedIdx << "\n");

// Step 3: Add BranchMI to the chosen bundle.
TopBundles[PlacedIdx].add(BranchMI);
Copy link
Copy Markdown
Collaborator Author

@andcarminati andcarminati May 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should emit the instruction in the scoreboard here for the last check safety.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

@andcarminati andcarminati force-pushed the andreu.topdown.epilogue branch from fd7bb64 to f716dff Compare May 19, 2026 07:28
CurMBB->splice(computeSplicePoint(TopBundles, PlacedIdx, BranchMI, CurMBB),
CurMBB, BranchMI->getIterator());

// Safety check: the Top scoreboard was not updated with the branch's new
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now it is up-to-date and we can check.

@andcarminati andcarminati force-pushed the andreu.topdown.epilogue branch from f716dff to 7cc7ff0 Compare May 19, 2026 08:42
TII->getNumReservedDelaySlots(*MI));
getAIEHazardRecognizer(Bot)->setReservedCycles(Reserved);

if (RegionTopDownCycles > 0 && EnableDelaySlotTopDown &&
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we could assign this condition to DelaySlotFixupNeeded directly, then set DelaySlotCycles on it being false.

// possible. Now it is time to proceed with the bottom-up approach.
// Note: when DelaySlotFixupNeeded is true (top-fixed bundles + delay slot),
// we stay top-down for the entire region and fix up the branch position
// in leaveRegion().
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need a lot of explaining comment. I think that's caused by using a necessary repair (DelaySlotFixupNeeded) as an implicit scheduling mode. Perhaps it would be clearer to rename to 'PersistentTopDown' and derive the necessary repair from it.

for (auto It = Bundle.SlotMap.begin(); It != Bundle.SlotMap.end(); ++It) {
if (It->second == MI) {
const auto *SlotInfo = Bundle.FormatInterface->getSlotInfo(It->first);
if (SlotInfo)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks weird, not to have SlotInfo in a map keyed by it. I think we can assert on it.

auto &Instrs = Bundle.Instrs;
Instrs.erase(std::remove(Instrs.begin(), Instrs.end(), MI), Instrs.end());
for (auto It = Bundle.SlotMap.begin(); It != Bundle.SlotMap.end(); ++It) {
if (It->second == MI) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::find, even if it is a reversed lookup?

const int BranchIdx = findInBundles(TopBundles, BranchMI);
assert(BranchIdx >= 0 && "Branch with delay slot not found in TopBundles");

// Non-const: Case 2 may extend TopBundles before falling through to Case 3.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is an AI comment. It serves no purpose.

LLVM_DEBUG(dbgs() << "fixupDelaySlotPosition: appending " << Missing
<< " NOP bundle(s)\n");
for (unsigned I = 0; I < Missing; ++I)
AppendNop();
Copy link
Copy Markdown
Collaborator

@martien-de-jong martien-de-jong May 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest while (BundlesAfterBranch < NumDelaySlots) and keeping BundlesAfterBranch up to date.
The while can absorb the initial if statement, unindenting the continuation to case 3.

LLVM_DEBUG(dbgs() << "fixupDelaySlotPosition: inter-zone conflict after "
"delay-slot NOPs, moving branch forward\n");
BundlesAfterBranch =
TopBundles.size() - static_cast<unsigned>(BranchIdx) - 1;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks messy. I guess pushing an extra nop and incrementing BundlesAfterBranch would be natural to clear the conflict and initiate step 3.

…bundles

When a region contains both top-fixed bundles and a delay-slot instruction,
and no bot-fixed bundles are present, schedule the entire region top-down
and correct the branch position afterwards via `fixupDelaySlotPosition()`
instead of forcing bottom-up cycles.

The fixup locates the branch in `TopBundles` and moves it so that exactly
`NumDelaySlots` bundles follow it:

Case 1: Already correct — return immediately.

Case 2: Too few trailing bundles — append NOPs, then fall through to Case 3
if inter-zone conflicts arise.

Case 3: Too many trailing bundles — extract the branch and re-place it at
the first slot (scanning forward from `TargetIdx`) where neither a resource
conflict (checked via the Top scoreboard at constant delta `−(NumDelaySlots + 1)`)
nor an inter-zone conflict exists. Each failed attempt appends one NOP to `TopBundles`,
advancing Top's scoreboard ring and preserving the `NumDelaySlots` invariant.
A post-placement safety check handles any residual inter-zone conflict by appending a
NOP and recursing.
@andcarminati andcarminati force-pushed the andreu.topdown.epilogue branch from 7cc7ff0 to 8a6dc23 Compare May 19, 2026 13:01
Copy link
Copy Markdown
Collaborator

@martien-de-jong martien-de-jong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A top nop drop

@andcarminati
Copy link
Copy Markdown
Collaborator Author

andcarminati commented May 20, 2026

QoR data:
image
image

@andcarminati andcarminati merged commit d8404a0 into aie-public May 20, 2026
7 checks passed
@andcarminati andcarminati deleted the andreu.topdown.epilogue branch May 20, 2026 14:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants