[P0] Unaccounted Routing Delay Causes Extra Stall Before ADD Operation in BICG #283

## Description
There is an initiation interval (II) mismatch between the compiler's schedule and the RTL execution of the BiCG kernel. The compiler schedules cross-tile data movement assuming zero delay, but the RTL implementation adds a 1-cycle routing pipeline register on mesh links.
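A minimal sketch of the two timing models follows. It assumes the register delay is paid once per mesh hop; the issue does not say whether a multi-hop route pays it per hop or once per transfer. The names `arrival_cycle` and `misalignment` and the cycle numbers are hypothetical and do not come from the compiler or the RTL.

```python
"""Operand arrival under the compiler's model vs. the RTL's (hypothetical model)."""

# Assumption: the scheduler charges 0 cycles per mesh link, while the RTL
# adds one pipeline-register cycle for every link the value traverses.
COMPILER_LINK_DELAY = 0
RTL_LINK_DELAY = 1


def arrival_cycle(ready_cycle: int, hops: int, link_delay: int) -> int:
    """Cycle at which a value ready at `ready_cycle` reaches a tile `hops` links away."""
    return ready_cycle + hops * link_delay


def misalignment(ready_cycle: int, hops: int) -> int:
    """Extra cycles the RTL needs beyond what the compiler scheduled."""
    expected = arrival_cycle(ready_cycle, hops, COMPILER_LINK_DELAY)
    actual = arrival_cycle(ready_cycle, hops, RTL_LINK_DELAY)
    return actual - expected


if __name__ == "__main__":
    # Illustrative numbers: a value ready in cycle 279 that crosses one link.
    print(misalignment(279, hops=1))  # 1
```

Under this model the scheduled and actual arrival times differ by exactly the number of hops, which is the one-cycle skew described above for a single-link transfer.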

When the 1-cycle routing delay lands on an operation whose slot is fully scheduled, with no `NAH` slots to absorb it, the operands become misaligned: one operand arrives a cycle late. The operation then stalls for one unexpected cycle before it can execute, which directly increases the overall II.
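The absorb-or-stall rule can be sketched as below. This is a simplified model, not the scheduler's actual logic: `slack` stands for the number of idle (`NAH`) cycles between the operand arrival the compiler expects and the consumer's slot, and, following the observation that the stall directly increases II, every iteration is assumed to pay the stall once. All names are hypothetical.

```python
"""When does a late operand turn into a stall? (hypothetical model)"""


def stall_cycles(extra_delay: int, slack: int) -> int:
    """Cycles the consumer waits past its scheduled slot.

    `slack` idle (NAH) cycles absorb the routing delay first; whatever is
    left over becomes a stall.
    """
    return max(0, extra_delay - slack)


def achieved_ii(scheduled_ii: int, extra_delay: int, slack: int) -> int:
    """II observed in RTL, assuming every iteration pays the same stall."""
    return scheduled_ii + stall_cycles(extra_delay, slack)


if __name__ == "__main__":
    # Fully scheduled slot (no NAH slack): the 1-cycle delay becomes a stall.
    print(stall_cycles(extra_delay=1, slack=0))  # 1
    # One NAH cycle of slack absorbs the delay entirely.
    print(stall_cycles(extra_delay=1, slack=1))  # 0
```

With an illustrative scheduled II of 4, `achieved_ii(4, 1, 0)` gives 5, while any nonzero slack keeps it at 4.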

## Log Trace
We can observe this behavior on an `add` (`+`) operation at `tile 4` (`t4`), which ends up occupying two cycles instead of one:

```text
  cyc= 280 | t0:a9t29(NAH)        | t1:a9t29(grant_pred)✓ | t2:a0t30(NAH)        | t4:a9t29(+)         ◇ | t5:a8t28(!)         ✓ | t6:a0t30(ret_void)  ◇ | t8:a8t28(*)         ✓ | t9:a8t28(*)         ☒ | t12:a0t10(st)        ◇ | t13:a8t8(NAH)       
  cyc= 281 | t0:a0t30(grant_once')✓ | t1:a9t29(grant_pred)☒ | t2:a1t31(NAH)        | t4:a9t29(+)         ✓ | t5:a9t29(grant_pred)✓ | t6:a0t30(ret_void)  ◇ | t8:a9t29(strdcst)   ◇ | t9:a8t28(*)         ☒ | t12:a0t10(st)        ◇ | t13:a9t9(+)         ✓
```

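PE (0,1) in the figure below shows the same thing: the `ADD` takes 2 cycles, and the first of them is actually the stall. In the trace, `t4` holds `a9t29(+)` in both cycle 280 (marked `◇`) and cycle 281 (marked `✓`).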

<img width="4201" height="1253" alt="Image" src="https://github.com/user-attachments/assets/fd147019-cb80-433a-9800-6f055eeac061" />
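The pattern in the trace can be detected mechanically. The sketch below is a hypothetical helper, not part of the simulator, and it relies on a reading of the log format that the issue does not document: each field is `t<tile>:<slot>(<op>)<mark>`, and `✓` marks the cycle in which an op fires. It reports any tile that holds the same slot and op in two consecutive cycles, without `✓` in the first and with `✓` in the second.

```python
"""Flag one-cycle stalls in a per-cycle tile trace (hypothetical helper)."""
import re

# Assumed field layout: t<tile>:<slot>(<op>)<optional status mark>
FIELD = re.compile(r"t(\d+):(\S+?)\(([^)]*)\)\s*(\S*)")
CYCLE = re.compile(r"cyc=\s*(\d+)")
FIRED = "✓"  # assumption: marks the cycle in which the op executes


def parse_line(line):
    """Return (cycle, {tile: (slot, op, mark)}) for one trace line."""
    fields = line.split("|")
    cycle = int(CYCLE.search(fields[0]).group(1))
    tiles = {}
    for field in fields[1:]:
        m = FIELD.search(field)
        if m:
            tile, slot, op, mark = m.groups()
            tiles[int(tile)] = (slot, op, mark)
    return cycle, tiles


def find_stalls(lines):
    """Yield (tile, op, stall_cycle) for slots that wait one cycle, then fire."""
    parsed = [parse_line(line) for line in lines if "cyc=" in line]
    for (c0, prev), (c1, cur) in zip(parsed, parsed[1:]):
        if c1 != c0 + 1:
            continue  # only compare adjacent cycles
        for tile, (slot, op, mark) in prev.items():
            nxt = cur.get(tile)
            # Same slot and op held over, not fired first, fired second.
            if nxt and nxt[:2] == (slot, op) and mark != FIRED and nxt[2] == FIRED:
                yield tile, op, c0
```

Fed the two `cyc=` lines above, `find_stalls` reports only `(4, '+', 280)`: the `+` on `t4` is held in cycle 280 and fires in cycle 281. Tiles such as `t6` and `t12` also repeat a slot but never reach `✓` within the window, so they are not flagged.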
