
[ci] pin llvm-aie to last known-good nightly (temporary, revert when llvm-aie#1005 lands) #1617

Merged
erwei-xilinx merged 1 commit into Xilinx:main from erwei-xilinx:pin-llvm-aie-temporary-regression on May 19, 2026

@erwei-xilinx
Collaborator

Summary

Temporary workaround. Pin llvm-aie (Peano) to the last known-good nightly to unblock Ryzen AI CI while Xilinx/llvm-aie#1005 (the upstream fix) is in review.

Root cause

Nightly llvm-aie==21.0.0.2026051601+55604435 (2026-05-16) introduced an llc assertion failure in AIELoopUtils::findPrologueEpilogue that fires on legitimate non-pipelined single-MBB loops with multiple non-loop predecessors — a CFG pattern produced by every AIE kernel with an outer loop ending in a halt-spin self-edge. The assertion was added to a utility extracted from the pipeliner (where the precondition holds) and then reused from a new remark emitter that runs on every single-MBB loop. Full analysis in Xilinx/llvm-aie#1005.

Bisect:

nightly                         result
2026051501+f4933ef7 (May 15)    OK (last good)
2026051601+55604435 (May 16)    CRASH (first bad)
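
The bisect result can be expressed as a small check on version strings. The following is a minimal sketch, assuming the nightly version layout seen in this PR (`<base>.<YYYYMMDDNN>+<commit>`). The function and variable names are hypothetical and are not part of llvm-aie or mlir-air. The upper bound is the first fixed nightly (21.0.0.2026052001+5ed15934) reported in the follow-up commit that references this PR.

```shell
# Hypothetical helpers; not part of llvm-aie or mlir-air.

FIRST_BAD=2026051601    # first crashing nightly per the bisect
FIRST_FIXED=2026052001  # first nightly reported to contain the llvm-aie#1005 fix

# Print the 10-digit nightly stamp (YYYYMMDDNN) from a version string
# such as 21.0.0.2026051501+f4933ef7.
nightly_stamp() {
  local v="${1%%+*}"   # drop the "+<commit>" local-version suffix
  echo "${v##*.}"      # keep the last dot-separated field
}

# Succeed (exit 0) if the given version falls in the known-bad window
# [FIRST_BAD, FIRST_FIXED).
is_affected() {
  local stamp
  stamp="$(nightly_stamp "$1")"
  [ "$stamp" -ge "$FIRST_BAD" ] && [ "$stamp" -lt "$FIRST_FIXED" ]
}
```

For example, `is_affected 21.0.0.2026051601+55604435` succeeds, while the pinned `21.0.0.2026051501+f4933ef7` does not fall in the window.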

Impact (currently blocking)

Reference failing CI run: actions/runs/26049388494.

  • NPU Phoenix (npu1, amd8845hs) — 9 xrt tests
  • NPU Strix (npu2, amdhx370) — 11 xrt tests

Same crash on both runners (same llc binary, same findPrologueEpilogue stack).

Changes

Two install sites pinned with a TEMPORARY: comment explaining the rationale and pointing at the upstream PR so future maintainers know to revert:

- python3 -m pip install --upgrade --force-reinstall llvm-aie -f https://github.com/Xilinx/llvm-aie/releases/expanded_assets/nightly
+ python3 -m pip install --upgrade --force-reinstall "llvm-aie==21.0.0.2026051501+f4933ef7" -f https://github.com/Xilinx/llvm-aie/releases/expanded_assets/nightly
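
After the pinned install runs, it may be worth confirming that the environment actually contains the pinned build rather than a cached newer one. This is a minimal sketch, assuming `pip show` prints a `Version:` line as it normally does. The helper names are hypothetical, and this check is not part of the PR.

```shell
# Hypothetical helpers; this check is not part of the PR.

PIN="21.0.0.2026051501+f4933ef7"   # the pin introduced by this PR

# Read `pip show` output on stdin and print the value of the Version field.
version_from_pip_show() {
  sed -n 's/^Version: //p'
}

# Succeed (exit 0) if the version read from stdin equals the expected pin ($1).
check_pin() {
  [ "$(version_from_pip_show)" = "$1" ]
}
```

A possible use in the CI step or the build script is `python3 -m pip show llvm-aie | check_pin "$PIN" || exit 1`, which fails the job early if a different nightly is installed.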

Revert plan

Once Xilinx/llvm-aie#1005 lands and a fresh llvm-aie nightly publishes, revert this commit to restore the unpinned install.
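
When the revert happens, a quick search can confirm that no exact-version pin is left behind. This is a minimal sketch using plain `grep`. The function name is hypothetical, and the two paths in the usage note are the files changed by this PR.

```shell
# Hypothetical helper; not part of the PR.

# Print every line under the given paths that still pins llvm-aie to an
# exact version. Exits non-zero when no pin is found.
find_llvm_aie_pins() {
  grep -rn -- 'llvm-aie==' "$@"
}
```

Running `find_llvm_aie_pins utils/build-mlir-air-using-wheels.sh .github/workflows/buildAndTestRyzenAI.yml` from the repository root should print nothing once both install sites are back to the unpinned install.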

Test plan

  • CI runs against pinned 2026051501+f4933ef7 — that exact nightly was already shown to pass on main pre-regression (last green Ryzen run was May 14 with the same code path; bisect confirms May 15 nightly also good).
  • Watch this PR's Ryzen AI checks for green.

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings May 18, 2026 23:45
Contributor

Copilot AI left a comment

Pull request overview

Pins llvm-aie to the last known-good nightly to unblock Ryzen AI CI runs affected by an llc assertion failure introduced in the 2026-05-16 nightly, with inline “TEMPORARY” comments pointing to the upstream fix PR for later reversion.

Changes:

  • Pin llvm-aie to 21.0.0.2026051501+f4933ef7 in the Ryzen AI CI workflow.
  • Pin the same llvm-aie nightly in the developer wheel-based build script, with rationale and revert guidance.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File                                         Description
utils/build-mlir-air-using-wheels.sh         Pins llvm-aie nightly in the local wheel-install build path with context for reverting.
.github/workflows/buildAndTestRyzenAI.yml    Pins llvm-aie nightly in CI to prevent llc crashes on Ryzen AI runners.

…llvm-aie#1005 lands)

llvm-aie nightlies from 2026-05-16 onwards (commit 55604435 and later) crash llc
on legitimate non-pipelined single-MBB loops via an assert in
AIELoopUtils::findPrologueEpilogue, blocking 9+ Ryzen AI xrt tests on
both NPU Phoenix and NPU Strix.

Pin to the last known-good nightly (2026-05-15, f4933ef7) in both the
Ryzen AI CI workflow and the developer wheel-install script until the
upstream fix lands. See Xilinx/llvm-aie#1005 for the root cause and fix.

Revert this commit once a clean llvm-aie nightly is published.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@erwei-xilinx erwei-xilinx merged commit 2a452f4 into Xilinx:main May 19, 2026
31 checks passed
fifield pushed a commit to fifield/mlir-air that referenced this pull request May 21, 2026
llvm-aie#1005 was merged 2026-05-19 and the fix is in nightly
21.0.0.2026052001+5ed15934 (first nightly built after the merge).
All 9 npu1 xrt tests that failed in Xilinx#1617 (02_mul_shim_1x1,
03_mul_L1L2_1x1, 06_add_shim_bf16, 07_extern_linalg,
28_gemm_loop_nest_bf16, 29_gemm_4_level_tiling_extern_vec_4x4_bf16,
34_cascade_vecadd, 36_cascade_vecmat_i32,
38_cascade_vecmat_transform_2x4_i32) pass locally with the new
nightly. Revert both install sites pinned in Xilinx#1617 to unpinned
`llvm-aie`.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>