merge main into amd-main by z1-cciauto · Pull Request #1840 · ROCm/llvm-project

z1-cciauto · 2026-03-22T16:06:53Z

No description provided.

@mgorny

These are instructions for cross-compiling LLDB on FreeBSD, based on @mgorny's [blog post](https://web.archive.org/web/20250827001729/https://www.moritz.systems/blog/freebsd-remote-process-plugin-on-non-x86-architectures/). Tested by building an arm64 binary on an amd64 machine and an amd64 binary on an arm64 machine. --------- Signed-off-by: Minsoo Choo <minsoochoo0122@proton.me> Co-authored-by: Michał Górny <mgorny@quansight.com>

) It erroneously merged the closing brace even when breaking after the opening brace. Fixes llvm#187444

…vm#187635) This argument should be used by ControllerAccess implementations to pass bootstrap information (process triple, page size, initial symbols and values) to the controller.

…ling code (llvm#184014) This patch introduces a new HandleModuleName function to avoid duplicated code in the module name handling stage. --------- Signed-off-by: yronglin <yronglin777@gmail.com> Signed-off-by: Wang, Yihan <yronglin777@gmail.com>

… of CIRDialectLLVMIRTranslationInterface (llvm#186073) Add the amendOperation override to handle CIR dialect attributes during MLIR-to-LLVM IR translation. This dispatches to amendModule for ModuleOp, enabling module metadata. This PR also adds support to emit AMDGPU-specific module flags amdhsa_code_object_version and amdgpu_printf_kind to match OGCG behavior. In CIRGenModule, the flags are stored as CIR module attributes: cir.amdhsa_code_object_version (integer) cir.amdgpu_printf_kind (string: "hostcall" or "buffered") During lowering to LLVM IR (in LowerToLLVMIR.cpp), these attributes are converted to LLVM module flags. Upstreaming basic changes from clangIR PRs: llvm/clangir@61e9ebd llvm/clangir#768 llvm/clangir#773 llvm/clangir#2100

…lvm#187627) Part 2/4: Implement HALO for coroutines that flow off final suspend. The parent PR was approved in llvm#185336, with no change since then. Since `coro.id` is unavailable in resumers, elide `coro.free` based on the frame instead of `coro.id`.

The fallback non-canonicalize path didn't work. Use a more straightforward implementation. Eventually this should use the pattern from llvm#172998

This eliminates duplicated epilog code. The unused half optimizes out just fine after inlining.

I was very puzzled the other day when it showed that VF 8 had a cost of X and VF 16 had a cost of X/2, yet it still chose VF 8. This PR adds some extra debug output to explain why this happens.

These were originally ported from rocm device libs in bc81ebe. Merge in more recent changes.

…llvm#187445) The previous value of 0 was allowing loads to move past the mops operations where it is not valid. Use a LocationSize::afterPointer() size instead. The GISel lowering currently loses the MMO, which is fine as it should be conservatively treated as a load/store to any location.

Follow the ordinary gentype conventions for the log implementation, instead of using a plain header. This doesn't quite yet enable vectorization, due to how the table is currently indexed. This should make it easier for targets to selectively overload the function for a subset of types.

…87538) This is pretty verbose and ugly. We're pulling the base implementation in for the double cases, and scalarizing it. Also fully defining the half and float cases to directly use the intrinsic, for all vector types. It would be much more convenient if we had linker based overrides for the generic implementations, rather than per source file.

…llvm#187570) This is to help with llvm#185382 and to make sure that I don't miss any PRs.

…#187462) This will allow us to more conveniently use llvm::formatv in the codebase.

This patch adds a Clang-compatible -mtune option to llc, to enable decoupled ISA and microarchitecture targeting, which is especially important for backend development. For example, it makes it easy to test the effects of a subtarget feature or scheduling model on codegen across a variety of workloads on the IR corpus benchmark: https://github.com/dtcxzyw/llvm-codegen-benchmark. The implementation adds an isolated generic codegen flag to establish a base for wider usage; the plan is to add it to `opt` as well in a follow-up patch. `llc` then consumes it and sets `tune-cpu` attributes on functions, which are further consumed by the backend.

…e archives (llvm#187113) Add checksum verification for libxml2, zlib, and zstd source archives via `cmake -E *sum` and `cmake -E compare_files` commands. This also adds the following minor changes:
* Factor out libxml2 version into variable.
* Check `tar` return code.

Bitcast the large scalar integer to a vXi64 vector, reverse the elements, and then perform a per-element vXi64 bitreverse. If we have SSSE3 or later, BITREVERSE expansion using PSHUFB is always more efficient than performing it as a scalar sequence (no need for a mayFoldIntoVector check). Fixes llvm#187353
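The element-reversal idea can be sketched in plain scalar C++. This is only an illustration of why "reverse the 64-bit elements, then bit-reverse each element" reverses the whole wide integer; it is not the X86 lowering itself and does not use PSHUFB. `WideInt`, `bitreverse64` and `bitreverseWide` are hypothetical names introduced for this sketch.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Reverse the bits of one 64-bit lane with the classic swap network.
constexpr std::uint64_t bitreverse64(std::uint64_t v) {
  v = ((v >> 1) & 0x5555555555555555ULL) | ((v & 0x5555555555555555ULL) << 1);
  v = ((v >> 2) & 0x3333333333333333ULL) | ((v & 0x3333333333333333ULL) << 2);
  v = ((v >> 4) & 0x0F0F0F0F0F0F0F0FULL) | ((v & 0x0F0F0F0F0F0F0F0FULL) << 4);
  v = ((v >> 8) & 0x00FF00FF00FF00FFULL) | ((v & 0x00FF00FF00FF00FFULL) << 8);
  v = ((v >> 16) & 0x0000FFFF0000FFFFULL) | ((v & 0x0000FFFF0000FFFFULL) << 16);
  return (v >> 32) | (v << 32);
}

// Hypothetical stand-in for a wide scalar integer viewed as 64-bit lanes.
// Lane 0 holds the least-significant 64 bits.
template <std::size_t N> using WideInt = std::array<std::uint64_t, N>;

// Reverse the lane order, then bit-reverse every lane.
template <std::size_t N>
constexpr WideInt<N> bitreverseWide(const WideInt<N> &x) {
  WideInt<N> out{};
  for (std::size_t i = 0; i < N; ++i)
    out[i] = bitreverse64(x[N - 1 - i]);
  return out;
}
```

With two lanes (a 128-bit value), bit 0 of the input ends up as bit 127 of the result, which is what a full-width bit reversal requires.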

This pass does not actually use TargetMachine/TargetLoweringInfo.

llvm#187644) …tail storage" (llvm#187410) This reverts commit bf1db77. Avoid using an `InterpFrame` member after calling its destructor this time. I hope that was the only problem.

This updates `matchExtendedReductionOperand` so the simple case of `UpdateR(PrevValue, ext(...))` is matched first as an early exit. The binop matching is then flattened to remove the extra layer of the `MatchExtends` lambda.

"Effective" is the wrong word: Both overloads are effective; they do what they're supposed to do. But the character overload does less work.

…sics (llvm#187513) Previously, GlobalISel was failing to select these intrinsics when given scalar operands, as RegBankSelect would place these on GPR banks. Fixing this enables GlobalISel to lower correctly, as in Instruction Selection the intrinsic matches the SIMD patterns in AArch64InstrInfo.td.

Async operations transfer data between global memory and LDS. Their progress is tracked by the ASYNC_CNT counter on GFX1250 and later architectures. This change introduces the representation of that counter in SIInsertWaitCnts. For now, the programmer must manually insert s_wait_asyncnt instructions. Later changes will add compiler assistance for generating the waits by including this counter in the asyncmark instructions. Assisted-by: Claude Sonnet 4.5 This is part of a stack: - llvm#185813 - llvm#185810

…ns (llvm#187241) SPIR-V backend previously only supported function annotations in llvm.global.annotations and crashed with a fatal error when encountering global variable entries

…lvm#187483) Most loop transformations, like unrolling and vectorization, expect the latch branch to be countable. Allow rotation if it turns the latch from uncountable to countable. This uses SCEV to check for countable exits if CheckExitCount is set. Currently it is not set for the LPM1 run (where SCEV is not used by other passes), only in LPM. With that, compile-time impact is mostly neutral: https://llvm-compile-time-tracker.com/compare.php?from=eba342d0ba930a404a026c80aada51c43974f0db&to=2e676337b45fae63ce9498116d8e6e43772363c5&stat=instructions:u ClamAV is consistently slower (~+0.15%) and 7zip faster in most cases (~-0.13%). Across a large test set based on C/C++ workloads, this rotates ~0.8% more loops with ~2.68M rotated loops. For the test set, ~2.7% more loops are runtime-unrolled and +6.36% more early exit loops vectorized on ARM64 macOS. This fixes a regression where std::ranges::find_last loops stopped being runtime-unrolled after llvm@5f648c3, which changed the loop structure so we stopped rotating. https://clang.godbolt.org/z/6baeE1av6 Based on llvm#162654. Co-authored-by: Marek Sedláček <mr.mareksedlacek@gmail.com> PR: llvm#187483

…nd whitespace handling (llvm#186950) The `check_alphabetical_order.py` script previously only scanned the first line of each bullet point in `ReleaseNotes.rst`, causing sorting failures when a `:doc:` tag was split across multiple lines. Also, when sorting the last entry of a section, the script inserted unnecessary whitespace. This PR fixes these two problems.

…lvm#187020) The `CallEvent` has data members that store the `LocationContext` and the `CFGElementRef` (i.e. `CFGBlock` + index of statement within that block); but the method `getReturnValueUnderConstruction` ignored these and used the currently analyzed `LocationContext` and `CFGBlock` instead of them. This was logically incorrect and would have caused problems if the `CallEvent` was used later when the "currently analyzed" things are different. However, the lit tests do pass even if I assert that the currently analyzed `LocationContext` and `CFGBlock` is the same as the ones saved in the `CallEvent`, so I'm pretty sure that there was no actual problem caused by this bad logic and this commit won't cause functional changes. I also evaluated this change on a set of open source projects (postgres, tinyxml2, libwebm, xerces, bitcoin, protobuf, qtbase, contour, openrct2) and validated that it doesn't change the results of the analysis.

…lock sub (llvm#184715) Resolves: llvm#174933 The issue describes a case where fetch_sub(n) is properly optimized but fetch_add(neg(n)) is not optimized to the same code. Although the issue is tagged for x86, I assumed this would be best handled outside of the backends, so I put this in InstCombine.
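For readers unfamiliar with the issue, the two source-level forms being compared can be sketched as follows. This only shows that the two calls are semantically equivalent for an unsigned atomic; the InstCombine change itself works on IR and is not reproduced here. The function names `viaFetchSub` and `viaFetchAddNeg` are hypothetical.

```cpp
#include <atomic>
#include <cstdint>

// Subtract n directly. Returns the value held before the update.
std::uint32_t viaFetchSub(std::atomic<std::uint32_t> &a, std::uint32_t n) {
  return a.fetch_sub(n);
}

// Add the two's-complement negation of n. Unsigned arithmetic wraps, so this
// leaves the same value in `a` and returns the same previous value.
std::uint32_t viaFetchAddNeg(std::atomic<std::uint32_t> &a, std::uint32_t n) {
  return a.fetch_add(0u - n);
}
```

Because both forms have identical observable behavior, it is reasonable to expect them to compile to the same code, which is what the issue asks for.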

Just use an empty list always.

Inline the definition of a variable into an assertion given it has no other users and no side effects.

…m#187842) Fixes llvm#171841

Lay the ground for C++26 `constexpr` math functions:
- Introduce `LIBC_ENABLE_CONSTEXPR` macro switch to specify the desire of `constexpr`-only code route.
- Introduce `LIBC_HAS_CONSTANT_EVALUATION` to indicate that we are using `constexpr`-only code in all dependent functions.
- Introduce `LIBC_CONSTEXPR` macro qualifier to aid in altering the signature of non-`constexpr` functions.

Note that non-`constexpr` qualified functions are caused by the exploitation of non-`constexpr` compatible utils, resulting in non-qualified dependent function, but it can be modified to be qualified using other code routes. If the function is `constexpr` compatible, then it's prohibited to use `LIBC_CONSTEXPR` as a function qualifier. We only qualify it with `constexpr` as usual. `LIBC_CONSTEXPR` may or may not evaluate to `constexpr` depending on the environment configurations, thus it's only used to modify the function signature in constant evaluation context and remove the qualifier if it's not desired (depending on provided configurations).

Possible side effects:
- Current qualified routes may or may not produce the desired ULP, this is implementation dependent (function by function basis) and needs further testing of the chosen code route.
- The shared tests in the current configuration can still compile with unsupported compiler. I didn't want to raise compilation error with unsupported compilers now, but we need to push compiler support with newer versions for this one to work as intended.

…/invalid-vop3-source-modifiers.mir (llvm#187888)

Use (nearly) the same code to align case statements and expression, as the other alignments do. That way we also fix two things:
- Keep the ColumnLimit intact, without duplicating the calculation.
- Align all the case colons, even for empty cases.

The new code introduced for `-verify-directives` in PR llvm#179835 enforces that the order of diagnostics matches the order of the directives. However, before checking this, it sorts the directives by SourceLocation. Perhaps non-obviously, all directives which appear inside a single comment are given the same SourceLocation, pointing to the beginning of the comment. While these are added in the order they appear in the comment, the non-stable std::sort may non-deterministically misorder them. Switching to stable_sort ensures the correct order is verified. This was observed as a random test failure on the checks in clang/test/CXX/drs/cwg25xx.cpp lines 250 and 264, in some builds of Clang. Note that those lines end in backslashes, and thus, despite appearances, the directives on the following lines are also within the same single comment.
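The ordering guarantee at stake can be shown with a small standalone sketch. `Directive` and `sortedTexts` are hypothetical stand-ins (an integer `loc` models the shared SourceLocation); this is not the Clang code, only a demonstration that `std::stable_sort` keeps elements with equal keys in insertion order, which `std::sort` does not promise.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for a verify directive. `loc` models the
// SourceLocation, which is identical for directives in the same comment.
struct Directive {
  unsigned loc;
  std::string text;
};

// Sort by location only. stable_sort preserves the relative order of
// directives whose locations compare equal.
std::vector<std::string> sortedTexts(std::vector<Directive> ds) {
  std::stable_sort(ds.begin(), ds.end(),
                   [](const Directive &a, const Directive &b) {
                     return a.loc < b.loc;
                   });
  std::vector<std::string> out;
  for (const Directive &d : ds)
    out.push_back(d.text);
  return out;
}
```

With `std::sort` in place of `std::stable_sort`, the two directives sharing a location could legally come out in either order, which matches the intermittent failure described above.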

…7640) Session::tryCreateService will try to create an instance of ServiceT by forwarding the given arguments to the ServiceT::Create method, which must return an Expected<std::unique_ptr<ServiceT>>. This enables one-line construction and registration of Services with fallible constructors (which are expected to be common).
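The forwarding pattern described here can be sketched with toy types. Everything below is an assumption made for illustration: `Expected` is a minimal stand-in rather than the real error-handling type, `PortService` is a made-up service, and the return type of `tryCreateService` is invented because the commit message does not state it. Only the shape (forward arguments to a static `Create`, register on success, propagate the error otherwise) is taken from the description.

```cpp
#include <cstddef>
#include <memory>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for Expected<T>: either a value or an error message.
template <typename T> struct Expected {
  std::optional<T> value;
  std::string error;
  explicit operator bool() const { return value.has_value(); }
};

struct Service {
  virtual ~Service() = default;
};

class Session {
public:
  // Forward the arguments to ServiceT::Create, which may fail. On success the
  // service is registered with the session. The return type is hypothetical.
  template <typename ServiceT, typename... ArgTs>
  Expected<ServiceT *> tryCreateService(ArgTs &&...args) {
    auto svc = ServiceT::Create(std::forward<ArgTs>(args)...);
    if (!svc)
      return {std::nullopt, std::move(svc.error)};
    ServiceT *raw = svc.value->get();
    services.push_back(std::move(*svc.value));
    return {raw, ""};
  }
  std::size_t numServices() const { return services.size(); }

private:
  std::vector<std::unique_ptr<Service>> services;
};

// Made-up service with a fallible named constructor.
struct PortService : Service {
  int port;
  explicit PortService(int p) : port(p) {}
  static Expected<std::unique_ptr<PortService>> Create(int p) {
    if (p <= 0)
      return {std::nullopt, "invalid port"};
    return {std::make_unique<PortService>(p), ""};
  }
};
```

A caller would then write something like `session.tryCreateService<PortService>(8080)` and check the result once, instead of constructing, checking, and registering in three separate steps.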

…lvm#187826) Use FieldDecl::getFieldIndex() instead of manually iterating over fields.

… of struct (llvm#187054) Fixes llvm#186749

Adds optional attribute to allow specialization into category Linalg ops. The default behavior of the transform op remains unchanged.

…m#186800) This relates to llvm#35980.

We can use the negate if carry trick for abdu, and it works on all legal for sbb
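As a rough illustration of the "negate if carry" idea, the branchless form can be written in portable C++. This is a sketch of the general technique, not the instruction sequence the X86 backend actually emits, which the commit message does not show. The function name `abdu` mirrors the operation name; the 32-bit width is an arbitrary choice for the example.

```cpp
#include <cstdint>

// Branchless unsigned absolute difference |a - b|.
constexpr std::uint32_t abdu(std::uint32_t a, std::uint32_t b) {
  // The subtraction wraps when a < b, which is the case where the hardware
  // subtraction would set the carry (borrow) flag.
  std::uint32_t diff = a - b;
  // All ones if a borrow occurred, zero otherwise. On x86 an
  // `sbb reg, reg` after the subtraction produces this kind of mask.
  std::uint32_t mask = 0u - static_cast<std::uint32_t>(a < b);
  // Conditional negate: (diff ^ mask) - mask equals -diff when mask is all
  // ones and diff when mask is zero.
  return (diff ^ mask) - mask;
}
```

The result is the same as `a > b ? a - b : b - a`, but it is computed without a data-dependent branch.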

…87508) Fixes llvm#187012 which is a false positive on clang-tidy end.

…i/gcs=always (llvm#186343) Previously, the implicit warnings from force-bti (or gcs=always) weren't possible to silence. The force-ibt/cet-report flags could also be handled the same way, but I haven't checked with GNU ld how they behave. And there, the force-ibt flag only produces warnings if the IBT bit is missing, while cet-report warns if either IBT or SHSTK are missing - but force-ibt probably shouldn't implicitly start warning for missing SHSTK. This addresses a discrepancy to GNU ld that was noted in llvm#186173.

…lvm#187725)

Since befaa35, the CI has consistently failed for the generic-no-wide-characters build, because in no-`wchar_t` modes the header for `__remove_cv_t` wasn't properly included. This PR adds the missing include of `<__type_traits/remove_cv.h>`. As a drive-by, `<__cstddef/size_t.h>` and `<__type_traits/is_constant_evaluated.h>`, which are included by `<cwchar>`, are now also included by `<string>` to avoid a potential regression, as we're using `size_t` and `__libcpp_is_constant_evaluated()` in `<string>`.

Fixes llvm#163732

z1-cciauto · 2026-03-22T16:11:09Z

PSDB Build Link: http://mlse-bdc-20dd129:8065/#/builders/11/builds/72

mchoo7 and others added 30 commits March 20, 2026 13:41

[clang-format] Fix an AllowShortNamespacesOnASingleLine bug (llvm#187451

f5e2876

) It erroneously merged the closing brace even when breaking after the opening brace. Fixes llvm#187444

[orc-rt] Add BootstrapInfo argument to ControllerAccess::connect. (ll…

416935e

…vm#187635) This argument should be used by ControllerAccess implementations to pass bootstrap information (process triple, page size, initial symbols and values) to the controller.

libclc: Replace flush_if_daz implementation (llvm#187569)

090c405

The fallback non-canonicalize path didn't work. Use a more straightforward implementation. Eventually this should use the pattern from llvm#172998

libclc: Implement sin and cos with sincos (llvm#187571)

7f8e236

This eliminates duplicated epilog code. The unused half optimizes out just fine after inlining.

[LV] Explain why a less profitable VF was chosen (NFCI) (llvm#187469)

a971089

I was very puzzled the other day when it showed that VF 8 had a cost of X and VF 16 had a cost of X/2, yet it still chose VF 8. This PR adds some extra debug output to explain why this happens.

libclc: Update trigpi functions (llvm#187579)

421bf13

These were originally ported from rocm device libs in bc81ebe. Merge in more recent changes.

libclc: Override cbrt for AMDGPU (llvm#187560)

c8dd829

[clang][cir] Adding myself in CODEOWNERS for CIRGenBuiltinAArch64.cpp (…

facc82d

…llvm#187570) This is to help with llvm#185382 and to make sure that I don't miss any PRs.

[lldb] Implement llvm::formatv overload for Stream::operator << (llvm…

4df2967

…#187462) This will allow us to more conveniently use llvm::formatv in the codebase.

[ExpandMemCmp] Remove unused TM/TLI dependency (llvm#187660)

ab28384

This pass does not actually use TargetMachine/TargetLoweringInfo.

Reapply "[clang][bytecode] Allocate local variables in InterpFrame … (

78f267f

llvm#187644) …tail storage" (llvm#187410) This reverts commit bf1db77. Avoid using an `InterpFrame` member after calling its destructor this time. I hope that was the only problem.

[clang-tidy] Fix "effective" -> "efficient". (llvm#187536)

4376bf2

"Effective" is the wrong word: Both overloads are effective; they do what they're supposed to do. But the character overload does less work.

[SPIR-V] Support global variable annotations in llvm.global.annotatio…

14de6da

…ns (llvm#187241) SPIR-V backend previously only supported function annotations in llvm.global.annotations and crashed with a fatal error when encountering global variable entries

[BAZEL] Add missing affine python enum gen (llvm#187669)

66bc565

Takashiidobe and others added 22 commits March 21, 2026 20:45

[gn] "port" 0ec9f7e

2be28d6

Just use an empty list always.

[Clang][HLSL] Fix -Wunused-variable

adcb17b

Inline the definition of a variable into an assertion given it has no other users and no side effects.

[clang-format] Correctly annotate Java lambda/sychronized blocks (llv…

1f1d316

…m#187842) Fixes llvm#171841

[NFC][AMDGPU] Set output to null for llvm/test/MachineVerifier/AMDGPU…

1120c97

…/invalid-vop3-source-modifiers.mir (llvm#187888)

[libc][math] Refactor sqrtbf16 function header-only (llvm#187849)

aa62224

[clang-format][NFC] Remove redundant parens enclosing braced list

f014202

[clang][CodeGen] Use FieldDecl::getFieldIndex() in VisitOffsetOfExpr (l…

24546d9

…lvm#187826) Use FieldDecl::getFieldIndex() instead of manually iterating over fields.

[clang-tidy] False negatives readability-redundant-parantheses member…

a547208

… of struct (llvm#187054) Fixes llvm#186749

[mlir][linalg] Specialize transform op - emit category ops (llvm#187506)

5b71607

Adds optional attribute to allow specialization into category Linalg ops. The default behavior of the transform op remains unchanged.

[llvm][DebugInfo] Use formatv instead of format in DWARFDebugLoc (llv…

5324c23

…m#186800) This relates to llvm#35980.

[X86] Prefer branchless code with sbb for abdu (llvm#187783)

a0d5508

We can use the negate if carry trick for abdu, and it works on all legal for sbb

[clang] Detect pointee mutations in placement new expressions (llvm#1…

b4084bd

…87508) Fixes llvm#187012 which is a false positive on clang-tidy end.

[clang][AST] Fix assertion in getFullyQualifiedType for DecltypeType (l…

720abd7

…lvm#187725)

[Clang] Support constexpr for AVX512 compress intrinsics (llvm#187656)

b1cf9b0

Fixes llvm#163732

merge main into amd-main

03e870d

z1-cciauto requested review from antiagainst, david-salinas, kuhar, lamb-j, nicolasvasilache and stellaraccident as code owners March 22, 2026 16:06

z1-cciauto requested a review from a team March 22, 2026 16:06

z1-cciauto wants to merge 191 commits into amd-main from upstream_merge_202603221206

20 participants