[P0] Unaccounted Routing Delay Causes Extra Stall Before ADD Operation in BICG #283

@guosran

Description

There is an II (initiation interval) mismatch between the compiler's schedule and RTL execution in the BiCG kernel. The compiler schedules under the assumption that cross-tile data movement has zero delay, but the RTL implementation adds a 1-cycle routing pipeline register delay on mesh links.

When this routing delay lands on a fully scheduled operation slot, with no NAH slots available to absorb it, one operand arrives a cycle late and the operands are misaligned. The operation then stalls for 1 unexpected cycle before it can execute, which directly increases the overall II.
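To make the mismatch concrete, here is a minimal Python sketch of the timing arithmetic. It is not the compiler's or the RTL's actual code: `operand_arrival`, `stall_cycles`, the hop counts, and the cycle numbers are all hypothetical, and it assumes the register delay is paid once per mesh link traversed.

```python
def operand_arrival(produce_cycle, hops, link_delay):
    """Cycle at which an operand reaches the consumer tile.

    produce_cycle: cycle the value would be usable with zero routing delay.
    hops:          number of mesh links crossed (assumption: delay is per link).
    link_delay:    0 in the compiler's model, 1 in the RTL per this issue.
    """
    return produce_cycle + hops * link_delay


def stall_cycles(scheduled_issue, operands, link_delay):
    """Cycles the consumer must wait past its scheduled issue cycle."""
    latest = max(operand_arrival(p, h, link_delay) for p, h in operands)
    return max(0, latest - scheduled_issue)


# Hypothetical ADD scheduled at cycle 10 with one local operand (0 hops)
# and one operand that crosses a single mesh link (1 hop).
operands = [(10, 0), (10, 1)]

compiler_view = stall_cycles(10, operands, link_delay=0)  # zero-delay model
rtl_view = stall_cycles(10, operands, link_delay=1)       # 1-cycle link register

print(compiler_view, rtl_view)  # prints "0 1"
```

With a free NAH slot before the consumer, the extra cycle could be hidden; in a fully packed schedule it shows up as the stall described above.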

Log Trace

We can observe this behavior on an add (+) operation at tile 4 (t4), which ends up occupying two cycles instead of one:

  cyc= 280 | t0:a9t29(NAH)        | t1:a9t29(grant_pred)✓ | t2:a0t30(NAH)        | t4:a9t29(+)         ◇ | t5:a8t28(!)         ✓ | t6:a0t30(ret_void)  ◇ | t8:a8t28(*)         ✓ | t9:a8t28(*)         ☒ | t12:a0t10(st)        ◇ | t13:a8t8(NAH)       
  cyc= 281 | t0:a0t30(grant_once')✓ | t1:a9t29(grant_pred)☒ | t2:a1t31(NAH)        | t4:a9t29(+)         ✓ | t5:a9t29(grant_pred)✓ | t6:a0t30(ret_void)  ◇ | t8:a9t29(strdcst)   ◇ | t9:a8t28(*)         ☒ | t12:a0t10(st)        ◇ | t13:a9t9(+)         ✓

As shown for PE (0,1), the `ADD` occupies 2 cycles; the first one is actually the stall.

<img width="4201" height="1253" alt="Image" src="https://github.com/user-attachments/assets/fd147019-cb80-433a-9800-6f055eeac061" />
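To find this pattern in a longer trace, a small script along these lines may help. It is a rough sketch, not part of the simulator: the regexes are inferred from the two log lines above, `find_delayed_fires` is a hypothetical helper, and it assumes `✓` marks the cycle in which an operation actually fires while any other marker (`◇`, `☒`) means it did not.

```python
import re

CYCLE_RE = re.compile(r"cyc=\s*(\d+)")
# tile id, schedule slot label (e.g. a9t29), op name, optional status marker
SLOT_RE = re.compile(r"t(\d+):(a\d+t\d+)\(([^)]*)\)\s*([✓◇☒]?)")

LOG = """\
  cyc= 280 | t0:a9t29(NAH)        | t1:a9t29(grant_pred)✓ | t2:a0t30(NAH)        | t4:a9t29(+)         ◇ | t5:a8t28(!)         ✓ | t6:a0t30(ret_void)  ◇ | t8:a8t28(*)         ✓ | t9:a8t28(*)         ☒ | t12:a0t10(st)        ◇ | t13:a8t8(NAH)
  cyc= 281 | t0:a0t30(grant_once')✓ | t1:a9t29(grant_pred)☒ | t2:a1t31(NAH)        | t4:a9t29(+)         ✓ | t5:a9t29(grant_pred)✓ | t6:a0t30(ret_void)  ◇ | t8:a9t29(strdcst)   ◇ | t9:a8t28(*)         ☒ | t12:a0t10(st)        ◇ | t13:a9t9(+)         ✓
"""


def parse_line(line):
    """Return (cycle, {tile: (slot, op, status)}) or None for non-trace lines."""
    m = CYCLE_RE.search(line)
    if m is None:
        return None
    slots = {int(t): (slot, op, status)
             for t, slot, op, status in SLOT_RE.findall(line)}
    return int(m.group(1)), slots


def find_delayed_fires(lines):
    """List (tile, op, wait_cycle, fire_cycle) where the same slot is shown
    without a ✓ in one cycle and with a ✓ in the very next cycle."""
    rows = [r for r in map(parse_line, lines) if r is not None]
    hits = []
    for (c0, s0), (c1, s1) in zip(rows, rows[1:]):
        if c1 != c0 + 1:
            continue  # only compare back-to-back cycles
        for tile, (slot, op, status) in s0.items():
            nxt = s1.get(tile)
            if (nxt is not None and nxt[:2] == (slot, op)
                    and status and status != "✓" and nxt[2] == "✓"):
                hits.append((tile, op, c0, c1))
    return hits


print(find_delayed_fires(LOG.splitlines()))  # [(4, '+', 280, 281)]
```

On the excerpt above it reports only the `+` at t4, which waits in cycle 280 and fires in cycle 281.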
