Martien.maxlatencyfinder#1007

Draft
martien-de-jong wants to merge 9 commits into aie-public from martien.maxlatencyfinder
@martien-de-jong
Collaborator

This PR reuses InterBlockEdges to get a better estimate of MaxLatency.

Rather than scanning the scheduled successor regions, it creates a DAG between the block's bottom region and each successor's top region. These regions are made available for every block before scheduling starts on any of them.

The result is accurate, operand-specific latencies. If the successor block has already been scheduled, we check the depth of the destination nodes to correct those latencies before taking the maximum.
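
As a rough illustration of the last step, here is a minimal, self-contained sketch of taking the maximum over depth-corrected latencies. It is not the code in this PR: the `InterBlockEdge` struct, its fields, and both function names are hypothetical stand-ins, and the correction rule (subtract the destination's depth, clamp at zero) is an assumption about what "correct those latencies" means.

```cpp
#include <algorithm>
#include <optional>
#include <vector>

// Hypothetical stand-in for one dependence edge from an instruction in the
// current block's bottom region to an instruction in a successor's top region.
struct InterBlockEdge {
  int Latency;                 // operand-specific latency of the dependence
  std::optional<int> DstDepth; // depth (cycle) of the destination node; only
                               // known once the successor has been scheduled
};

// Latency that still has to be covered before the end of the current block.
// Assumption: cycles that pass in the successor before the destination issues
// count towards the latency, so they are subtracted. An unscheduled successor
// contributes no correction.
inline int effectiveLatency(const InterBlockEdge &E) {
  int Covered = E.DstDepth.value_or(0);
  return std::max(0, E.Latency - Covered);
}

// Maximum over all corrected edge latencies; 0 if there are no edges.
inline int maxLatency(const std::vector<InterBlockEdge> &Edges) {
  int Max = 0;
  for (const InterBlockEdge &E : Edges)
    Max = std::max(Max, effectiveLatency(E));
  return Max;
}
```

With this sketch, a 7-cycle edge whose destination sits at depth 5 in a scheduled successor only contributes 2 cycles, so a 3-cycle edge to a depth-0 destination would determine the maximum.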

This is WIP, there's a ton of cleanup to do.

Collaborator Author

This is probably a dead end. With a separate DAG instance hanging around, I would like more control over, and visibility into, which DAG mutators get called, rather than having the machine scheduler call them.
Since we have a buildGraph() virtual in the scheduler strategy anyway, we could also call the 'AIE' DAG mutators directly as part of that buildGraph().

; ASM-NEXT: vlda.3d.ups.s32.d8 cm6, s1, [p2], d0; movxm le, #.L_LEnd0
; ASM-NEXT: vlda.ups.s32.d8 cm4, s1, [p1], m1; and r0, r2, r0; mov s1, r3
; ASM-NEXT: vlda.3d.ups.s32.d8 cm5, s1, [p2], d0; add r0, r0, #-4; mov r2, #-2
; ASM-NEXT: vlda.3d.ups.s32.d8 cm3, s1, [p2], d0; lshl r0, r0, r2; mov crSRSSign, r6
Collaborator Author

This is one of the best improvements. Previously, the loads were charged the full 7-cycle load latency, whereas the actual dependence was on the pointer increments. As a result, the loads got pushed away from the end of the block for no reason.
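
To make the difference concrete, here is a small sketch contrasting an instruction-wide latency with an operand-specific one for a post-incrementing load. The types and function names are hypothetical, and the 1-cycle latency used for the pointer update in the usage note is an illustrative assumption; only the 7-cycle load latency comes from the comment above.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical model of one register defined by an instruction, together with
// the number of cycles before that register can be read.
struct DefOperand {
  std::string Reg;
  int Latency;
};

// Hypothetical model of an instruction as the list of registers it defines.
struct Instr {
  std::vector<DefOperand> Defs;
};

// Pessimistic estimate: assume the consumer waits for the slowest result.
inline int conservativeLatency(const Instr &I) {
  int Max = 0;
  for (const DefOperand &D : I.Defs)
    Max = std::max(Max, D.Latency);
  return Max;
}

// Operand-specific estimate: only the register the consumer actually reads
// matters. Returns 0 if the instruction does not define that register.
inline int operandLatency(const Instr &I, const std::string &UsedReg) {
  for (const DefOperand &D : I.Defs)
    if (D.Reg == UsedReg)
      return D.Latency;
  return 0;
}
```

For a load modelled as defining `r22` with latency 7 and `p6` with an assumed latency of 1, a successor that only reads `p6` sees a latency of 7 under the conservative estimate but 1 under the operand-specific one.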

; CHECK-NEXT: lda r22, [p6], #-4; movs dc0, dj4; or r30, r5, r5; mov r5, dj4
; CHECK-NEXT: lda m2, [p6], #-4; movs dc1, dj4; or r8, r7, r7; mov r7, dj4
; CHECK-NEXT: lda dj2, [p6], #-4; movs dc5, dj4; mov r23, m5
; CHECK-NEXT: lda dj6, [p6], #-4; movs dc3, dj4; mov dj1, r31
Collaborator

Crazy. I guess p6 was responsible for this huge pessimistic latency.

Use the new, more accurate exit latency, retaining the computation of
the old one for now. The old behaviour can be restored by returning
OldEffectiveLatency rather than NewEffectiveLatency in
MaxLatencyFinder::operator()

Reference updates have been checked superficially. Nothing outrageous
stands out, but I can't guarantee correctness yet.
@martien-de-jong martien-de-jong force-pushed the martien.maxlatencyfinder branch from c084955 to c3b66ba Compare May 21, 2026 13:29
Martien de Jong added 2 commits May 26, 2026 15:29
o findEarliestRef cs removed
o InterBlockEdges subclasses DataDependencyHelper and overrides mayAlias to
  implement SafeToIgnoreeMemDeps
o AAResults enters through the scheduling context in the constructor
Eliminate SuccessorEdges in favour of InterBlockEdges
2 participants