Martien.maxlatencyfinder by martien-de-jong · Pull Request #1007 · Xilinx/llvm-aie

martien-de-jong · 2026-05-21T12:42:23Z

This PR reuses InterBlockEdges to get a better estimate of MaxLatency.

Rather than scanning the scheduled successor regions, it builds a DAG between the block's bottom region and its successors' top regions. These regions are made available for every block before scheduling starts on any of them.

The result is accurate, operand-specific latencies. If the successor block has already been scheduled, we check the depth of the destination nodes to correct those latencies before taking the maximum.

This is WIP; there's a ton of cleanup to do.

martien-de-jong · 2026-05-21T12:51:31Z

This is probably a dead end. With a separate DAG instance hanging around, I would like more control over, and visibility into, which DAG mutators are called, rather than having the machine scheduler call them.
Since the scheduler strategy already has a virtual buildGraph(), we could also call the 'AIE' DAG mutators directly from that buildGraph().

martien-de-jong · 2026-05-21T13:13:36Z

+; ASM-NEXT:    vlda.3d.ups.s32.d8 cm6, s1, [p2], d0; movxm le, #.L_LEnd0
+; ASM-NEXT:    vlda.ups.s32.d8 cm4, s1, [p1], m1; and r0, r2, r0; mov s1, r3
+; ASM-NEXT:    vlda.3d.ups.s32.d8 cm5, s1, [p2], d0; add r0, r0, #-4; mov r2, #-2
+; ASM-NEXT:    vlda.3d.ups.s32.d8 cm3, s1, [p2], d0; lshl r0, r0, r2; mov crSRSSign, r6


This is one of the best improvements. The loads were assigned the 7-cycle load latency, whereas the actual dependence was on the pointer increments. As a result, the loads were pushed away from the end of the block for no reason.

andcarminati · 2026-05-21T13:25:20Z

+; CHECK-NEXT:    lda r22, [p6], #-4; movs dc0, dj4; or r30, r5, r5; mov r5, dj4
+; CHECK-NEXT:    lda m2, [p6], #-4; movs dc1, dj4; or r8, r7, r7; mov r7, dj4
+; CHECK-NEXT:    lda dj2, [p6], #-4; movs dc5, dj4; mov r23, m5
+; CHECK-NEXT:    lda dj6, [p6], #-4; movs dc3, dj4; mov dj1, r31


Crazy. I guess p6 was responsible for this hugely pessimistic latency.

Use the new, more accurate exit latency, while retaining the computation of the old one for now. The old behaviour can be restored by returning OldEffectiveLatency rather than NewEffectiveLatency in MaxLatencyFinder::operator(). Reference updates have been checked superficially. Nothing outrageous stands out, but I don't guarantee correctness yet.

martien-de-jong requested review from F-Stuckmann, SagarMaheshwari99, abhinay-anubola, abnikant, andcarminati, katerynamuts, khallouh, konstantinschwarz, mludevid, niwinanto and stephenneuendorffer as code owners May 21, 2026 12:42

martien-de-jong marked this pull request as draft May 21, 2026 12:45

martien-de-jong commented May 21, 2026

Martien de Jong added 6 commits May 21, 2026 15:15

[AIE][InterBlockScheduling] Separate region creation and Pro/Epi insertion

de38636

[AIE][InterBlock] Also use GatheringRegions for regular, non-loop blocks

6c368a3

[AIE][InterBlock] Factor out InterBlockEdges into DataDependenceHelper

73adde4

succedges first attempt.

e076424

Revert to old latency computation

[AIE] Simpler MaxLatencyFinder constructor.

58f3302

[AIE] SchedStrategy.buildGraph also calls the appropriate DAG mutators

3fac8ad

This prepares for less rigid DAG mutators that can be localized, and given the function/interblock context in a less indirect way

andcarminati reviewed May 21, 2026

martien-de-jong force-pushed the martien.maxlatencyfinder branch from c084955 to c3b66ba May 21, 2026 13:29

Martien de Jong added 2 commits May 26, 2026 15:29

[NFC][MaxLatencyFinder] Remove old maxlatency computation

843f961

o findEarliestRef cs removed
o InterBlockEdges subclasses DataDependencyHelper and overrides mayAlias to implement SafeToIgnoreeMemDeps
o AAResults enters through the scheduling context in the constructor

[NFC][AIE] generalize InterBlockEdges with Depth and Height maps.

fb1c127

Eliminate SuccessorEdges in favour of InterBlockEdges

martien-de-jong wants to merge 9 commits into aie-public from martien.maxlatencyfinder

2 participants
