Martien.maxlatencyfinder#1007
This is probably a dead end. With a separate DAG instance hanging around, I would like more control over, and visibility into, which DAG mutators are called, rather than having the machine scheduler call them.
Since we have a buildGraph() virtual in the scheduler strategy anyway, we could also call the 'AIE' DAG mutators directly as part of that buildGraph().
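A minimal sketch of that idea, assuming a hypothetical `DagMutation` interface and a strategy-owned list of mutators. `Dag`, `DagMutation`, `LoggingMutation` and `SchedStrategy` are illustrative names, not the actual AIE or LLVM classes. The strategy builds the graph and then applies exactly the mutators it chose, in an order it controls, instead of relying on the machine scheduler to invoke them.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a scheduling DAG; it only records applied mutators.
struct Dag {
  std::vector<std::string> AppliedMutations;
};

// Hypothetical mutator interface.
struct DagMutation {
  virtual ~DagMutation() = default;
  virtual void apply(Dag &G) = 0;
};

// Example mutator that only logs itself; a real one would add or adjust edges.
struct LoggingMutation : DagMutation {
  explicit LoggingMutation(std::string N) : Name(std::move(N)) {}
  void apply(Dag &G) override { G.AppliedMutations.push_back(Name); }
  std::string Name;
};

// Hypothetical strategy: buildGraph() decides which mutators run, and when.
class SchedStrategy {
public:
  void addMutation(std::unique_ptr<DagMutation> M) {
    Mutations.push_back(std::move(M));
  }

  void buildGraph(Dag &G) {
    // ... construct nodes and edges here ...
    // Then apply the strategy's own mutators, in registration order.
    for (auto &M : Mutations) {
      assert(M && "null mutation");
      M->apply(G);
    }
  }

private:
  std::vector<std::unique_ptr<DagMutation>> Mutations;
};
```

Because the mutators live in the strategy, each DAG instance (for example the inter-block one) can be given its own set without touching the scheduler driver.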
```
; ASM-NEXT: vlda.3d.ups.s32.d8 cm6, s1, [p2], d0; movxm le, #.L_LEnd0
; ASM-NEXT: vlda.ups.s32.d8 cm4, s1, [p1], m1; and r0, r2, r0; mov s1, r3
; ASM-NEXT: vlda.3d.ups.s32.d8 cm5, s1, [p2], d0; add r0, r0, #-4; mov r2, #-2
; ASM-NEXT: vlda.3d.ups.s32.d8 cm3, s1, [p2], d0; lshl r0, r0, r2; mov crSRSSign, r6
```
This is one of the best improvements. The loads were charged the 7-cycle load latency, whereas the actual dependence was on the pointer increments. As a result, the loads were pushed away from the end of the block for no reason.
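To illustrate, here is a sketch assuming a post-incrementing load that defines two results: the loaded register, with the 7-cycle load latency mentioned above, and the updated pointer, with an assumed 1-cycle latency. The `Def` struct, the register names and the pointer latency are illustrative, not the real AIE scheduling model. If the successor only reads the pointer, the exit latency should come from the pointer result rather than from the load result.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <vector>

// One result of an instruction and the latency after which it is available.
struct Def {
  std::string Reg;
  int Latency;
};

// Exit latency of one instruction with respect to the registers that the
// successor actually reads: only used results contribute.
int operandSpecificLatency(const std::vector<Def> &Defs,
                           const std::set<std::string> &UsedBySucc) {
  int Max = 0;
  for (const Def &D : Defs) {
    assert(D.Latency >= 0);
    if (UsedBySucc.count(D.Reg))
      Max = std::max(Max, D.Latency);
  }
  return Max;
}

// The pessimistic alternative: the largest latency of any result.
int wholeInstructionLatency(const std::vector<Def> &Defs) {
  int Max = 0;
  for (const Def &D : Defs)
    Max = std::max(Max, D.Latency);
  return Max;
}
```

For a load with `Defs = {{"dst", 7}, {"ptr", 1}}` whose successor only reads `ptr`, the operand-specific latency is 1 while the whole-instruction latency is 7.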
Revert to old latency computation
This prepares for less rigid DAG mutators that can be localized and given the function/interblock context in a less indirect way.
```
; CHECK-NEXT: lda r22, [p6], #-4; movs dc0, dj4; or r30, r5, r5; mov r5, dj4
; CHECK-NEXT: lda m2, [p6], #-4; movs dc1, dj4; or r8, r7, r7; mov r7, dj4
; CHECK-NEXT: lda dj2, [p6], #-4; movs dc5, dj4; mov r23, m5
; CHECK-NEXT: lda dj6, [p6], #-4; movs dc3, dj4; mov dj1, r31
```
Crazy. I guess p6 was responsible for this hugely pessimistic latency.
Use new, more accurate exit latency, retaining the computation of the old one for now. The old behaviour can be restored by returning OldEffectiveLatency rather than NewEffectiveLatency in MaxLatencyFinder::operator(). Reference updates have been checked superficially. Nothing outrageous stands out, but I can't guarantee correctness yet.
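A simplified sketch of keeping both estimates side by side, assuming that each is "latency minus remaining cycles to the end of the block", maximized and clamped at zero. `MaxLatencySketch`, `BottomInstr` and `ExitEdge` are hypothetical names; this is not the actual `MaxLatencyFinder` code. The old estimate charges every bottom instruction its full latency, while the new one only looks at real inter-block edges with operand-specific latencies.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical summary of an instruction in the block's bottom region.
struct BottomInstr {
  int Latency;     // full (largest) latency of the instruction
  int CyclesToEnd; // cycles from its issue cycle to the end of the block
};

// Hypothetical edge from the bottom region into a successor's top region.
struct ExitEdge {
  int OperandLatency; // latency of the operand the successor actually uses
  int SrcCyclesToEnd; // cycles from the source's issue cycle to block end
};

struct MaxLatencySketch {
  std::vector<BottomInstr> Instrs;
  std::vector<ExitEdge> Edges;

  // Old estimate: every instruction counts with its full latency.
  int oldEffectiveLatency() const {
    int Max = 0;
    for (const BottomInstr &I : Instrs) {
      assert(I.CyclesToEnd >= 0);
      Max = std::max(Max, I.Latency - I.CyclesToEnd);
    }
    return Max;
  }

  // New estimate: only real inter-block dependences count.
  int newEffectiveLatency() const {
    int Max = 0;
    for (const ExitEdge &E : Edges) {
      assert(E.SrcCyclesToEnd >= 0);
      Max = std::max(Max, E.OperandLatency - E.SrcCyclesToEnd);
    }
    return Max;
  }

  // Only the new estimate is returned; call oldEffectiveLatency() instead to
  // restore the old behaviour.
  int operator()() const { return newEffectiveLatency(); }
};
```

With a 7-cycle load issued 2 cycles before the block end, the old estimate is 5; if the only edge into the successor is a 1-cycle pointer update issued in the last cycle, the new estimate is 1.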
c084955 to c3b66ba
o findEarliestRef and related code removed
o InterBlockEdges subclasses DataDependencyHelper and overrides mayAlias to implement SafeToIgnoreeMemDeps
o AAResults enters through the scheduling context in the constructor
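The override pattern in the second item can be sketched as follows. All names here (`MemAccess`, `DataDependencyHelperSketch`, `InterBlockEdgesSketch`, `IgnoreMemDeps`) are hypothetical, and both the conservative default rule and the "ignore memory dependences when the flag is set" rule are assumptions standing in for the real alias-analysis query. The point is only that the base helper asks a virtual `mayAlias`, so the inter-block subclass can drop memory edges without changing the edge-building code.

```cpp
#include <cassert>
#include <string>

// Hypothetical memory access descriptor.
struct MemAccess {
  std::string Base; // base pointer register or symbol
  bool IsStore;
};

// Hypothetical helper that decides whether two accesses need a dependence.
class DataDependencyHelperSketch {
public:
  virtual ~DataDependencyHelperSketch() = default;
  // Conservative default (assumed): any pair involving a store may alias.
  virtual bool mayAlias(const MemAccess &A, const MemAccess &B) const {
    return A.IsStore || B.IsStore;
  }
  // Non-virtual entry point; dispatches to the (possibly overridden) mayAlias.
  bool needsEdge(const MemAccess &A, const MemAccess &B) const {
    assert(!A.Base.empty() && !B.Base.empty());
    return mayAlias(A, B);
  }
};

// Hypothetical inter-block variant: when memory dependences are known to be
// safe to ignore, it reports no aliasing; otherwise it defers to the base.
class InterBlockEdgesSketch : public DataDependencyHelperSketch {
public:
  explicit InterBlockEdgesSketch(bool Ignore) : IgnoreMemDeps(Ignore) {}
  bool mayAlias(const MemAccess &A, const MemAccess &B) const override {
    if (IgnoreMemDeps)
      return false;
    return DataDependencyHelperSketch::mayAlias(A, B);
  }

private:
  bool IgnoreMemDeps;
};
```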
Eliminate SuccessorEdges in favour of InterBlockEdges
This PR reuses InterBlockEdges to get a better estimate of MaxLatency.
Rather than scanning the scheduled successor regions, it creates a DAG between the block's bottom region and the successors' top regions, which are made available for every block before scheduling starts on any of them.
The result is accurate, operand-specific latencies. If the successor block has already been scheduled, we check the depth of the destination nodes to correct those latencies before taking the maximum.
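The correction for an already-scheduled successor can be sketched as below. The `InterBlockEdge` fields and the formula (operand latency, minus the source's remaining cycles to the block end, minus the destination's depth in the successor when known, clamped at zero) are an assumption about how these pieces fit together, not the PR's code.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical edge of the DAG spanning the block's bottom region and a
// successor's top region.
struct InterBlockEdge {
  int OperandLatency;          // latency of the specific operand consumed
  int SrcCyclesToEnd;          // cycles from the source's issue to block end
  std::optional<int> DstDepth; // depth of the destination node, known only
                               // if the successor has already been scheduled
};

// Maximum number of cycles the successor may still have to wait.
int maxExitLatency(const std::vector<InterBlockEdge> &Edges) {
  int Max = 0;
  for (const InterBlockEdge &E : Edges) {
    assert(E.OperandLatency >= 0 && E.SrcCyclesToEnd >= 0);
    int Lat = E.OperandLatency - E.SrcCyclesToEnd;
    // A destination that sits deeper in the scheduled successor absorbs
    // part of the latency.
    if (E.DstDepth)
      Lat -= *E.DstDepth;
    Max = std::max(Max, Lat);
  }
  return Max;
}
```

For example, a 7-cycle edge whose source issues 1 cycle before the block end contributes 6 while the successor is unscheduled, but only 2 once its destination is known to sit at depth 4.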
This is WIP; there's a ton of cleanup to do.