[AIE] Implement top-down delay slot fixup for regions with top-fixed …#1004
Conversation
| ; CHECK-NEXT: - PrologueBundles: '28' | ||
| ; CHECK-NEXT: - Epilogue: bb.3.outer.loop.latch | ||
| ; CHECK-NEXT: - EpilogueBundles: '33' | ||
| ; CHECK-NEXT: - EpilogueBundles: '37' |
There was a problem hiding this comment.
JNZ is conflicting with the move slot here.
| ; CHECK-NEXT: vlda.ups.2x cml1, s0, upssign1, [p2], #64; vmac dm1, dm1, x2, x6, r8 | ||
| ; CHECK-NEXT: vlda.ups.2x cmh1, s0, upssign1, [p2], #64; vmac dm0, dm0, x2, x4, r8 | ||
| ; CHECK-NEXT: vlda.ups.2x cmh0, s0, upssign1, [p3], #64; vshuffle x10, x10, x1, r2 | ||
| ; CHECK-NEXT: jz r16, #.LBB0_1 |
There was a problem hiding this comment.
This regression will disappear after enabling JNZD as default in the outerloop pipeliner.
| << PlacedIdx << "\n"); | ||
|
|
||
| // Step 3: Add BranchMI to the chosen bundle. | ||
| TopBundles[PlacedIdx].add(BranchMI); |
There was a problem hiding this comment.
We should emit the instruction in the scoreboard here for the last check safety.
fd7bb64 to
f716dff
Compare
| CurMBB->splice(computeSplicePoint(TopBundles, PlacedIdx, BranchMI, CurMBB), | ||
| CurMBB, BranchMI->getIterator()); | ||
|
|
||
| // Safety check: the Top scoreboard was not updated with the branch's new |
There was a problem hiding this comment.
Now it is up-to-date and we can check.
f716dff to
7cc7ff0
Compare
| TII->getNumReservedDelaySlots(*MI)); | ||
| getAIEHazardRecognizer(Bot)->setReservedCycles(Reserved); | ||
|
|
||
| if (RegionTopDownCycles > 0 && EnableDelaySlotTopDown && |
There was a problem hiding this comment.
I guess we could assign this condition to DelaySlotFixupNeeded directly, then set DelaySlotCycles on it being false.
| // possible. Now it is time to proceed with the bottom-up approach. | ||
| // Note: when DelaySlotFixupNeeded is true (top-fixed bundles + delay slot), | ||
| // we stay top-down for the entire region and fix up the branch position | ||
| // in leaveRegion(). |
There was a problem hiding this comment.
You need a lot of explaining comment. I think that's caused by using a necessary repair (DelaySlotFixupNeeded) as an implicit scheduling mode. Perhaps it would be clearer to rename to 'PersistentTopDown' and derive the necessary repair from it.
| for (auto It = Bundle.SlotMap.begin(); It != Bundle.SlotMap.end(); ++It) { | ||
| if (It->second == MI) { | ||
| const auto *SlotInfo = Bundle.FormatInterface->getSlotInfo(It->first); | ||
| if (SlotInfo) |
There was a problem hiding this comment.
It looks weird, not to have SlotInfo in a map keyed by it. I think we can assert on it.
| auto &Instrs = Bundle.Instrs; | ||
| Instrs.erase(std::remove(Instrs.begin(), Instrs.end(), MI), Instrs.end()); | ||
| for (auto It = Bundle.SlotMap.begin(); It != Bundle.SlotMap.end(); ++It) { | ||
| if (It->second == MI) { |
There was a problem hiding this comment.
std::find, even if it is a reversed lookup?
| const int BranchIdx = findInBundles(TopBundles, BranchMI); | ||
| assert(BranchIdx >= 0 && "Branch with delay slot not found in TopBundles"); | ||
|
|
||
| // Non-const: Case 2 may extend TopBundles before falling through to Case 3. |
There was a problem hiding this comment.
I guess this is an AI comment. It serves no purpose.
| LLVM_DEBUG(dbgs() << "fixupDelaySlotPosition: appending " << Missing | ||
| << " NOP bundle(s)\n"); | ||
| for (unsigned I = 0; I < Missing; ++I) | ||
| AppendNop(); |
There was a problem hiding this comment.
I would suggest while (BundlesAfterBranch < NumDelaySlots) and keeping BundlesAfterBranch up to date.
The while can absorb the initial if statement, unindenting the continuation to case 3.
| LLVM_DEBUG(dbgs() << "fixupDelaySlotPosition: inter-zone conflict after " | ||
| "delay-slot NOPs, moving branch forward\n"); | ||
| BundlesAfterBranch = | ||
| TopBundles.size() - static_cast<unsigned>(BranchIdx) - 1; |
There was a problem hiding this comment.
This looks messy. I guess pushing an extra nop and incrementing BundlesAfterBranch would be natural to clear the conflict and initiate step 3.
…bundles When a region contains both top-fixed bundles and a delay-slot instruction, and no bot-fixed bundles are present, schedule the entire region top-down and correct the branch position afterwards via `fixupDelaySlotPosition()` instead of forcing bottom-up cycles. The fixup locates the branch in `TopBundles` and moves it so that exactly `NumDelaySlots` bundles follow it: Case 1: Already correct — return immediately. Case 2: Too few trailing bundles — append NOPs, then fall through to Case 3 if inter-zone conflicts arise. Case 3: Too many trailing bundles — extract the branch and re-place it at the first slot (scanning forward from `TargetIdx`) where neither a resource conflict (checked via the Top scoreboard at constant delta `−(NumDelaySlots + 1)`) nor an inter-zone conflict exists. Each failed attempt appends one NOP to `TopBundles`, advancing Top's scoreboard ring and preserving the `NumDelaySlots` invariant. A post-placement safety check handles any residual inter-zone conflict by appending a NOP and recursing.
7cc7ff0 to
8a6dc23
Compare


…bundles
When a region contains both top-fixed bundles and a delay-slot instruction, and no bot-fixed bundles are present, schedule the entire region top-down and correct the branch position afterwards via
fixupDelaySlotPosition()instead of forcing bottom-up cycles.The fixup locates the branch in
TopBundlesand moves it so that exactlyNumDelaySlotsbundles follow it:Case 1: Already correct — return immediately.
Case 2: Too few trailing bundles — append NOPs, then fall through to Case 3 if inter-zone conflicts arise.
Case 3: Too many trailing bundles — extract the branch and re-place it at the first slot (scanning forward from
TargetIdx) where neither a resource conflict (checked via the Top scoreboard at constant delta−(NumDelaySlots + 1)) nor an inter-zone conflict exists. Each failed attempt appends one NOP toTopBundles, advancing Top's scoreboard ring and preserving theNumDelaySlotsinvariant. A post-placement safety check handles any residual inter-zone conflict by appending a NOP and recursing.