
merge main into amd-main #1840

Open
z1-cciauto wants to merge 191 commits into amd-main from upstream_merge_202603221206
@z1-cciauto
Collaborator

No description provided.

mchoo7 and others added 30 commits March 20, 2026 13:41
These are instructions for cross-compiling LLDB on FreeBSD, based on
@mgorny's [blog
post](https://web.archive.org/web/20250827001729/https://www.moritz.systems/blog/freebsd-remote-process-plugin-on-non-x86-architectures/).
Tested by building an arm64 binary on an amd64 machine and an amd64
binary on an arm64 machine.

---------

Signed-off-by: Minsoo Choo <minsoochoo0122@proton.me>
Co-authored-by: Michał Górny <mgorny@quansight.com>
)

It erroneously merged the closing brace even when breaking after the
opening brace.

Fixes llvm#187444
…vm#187635)

This argument should be used by ControllerAccess implementations to pass
bootstrap information (process triple, page size, initial symbols and
values) to the controller.
…ling code (llvm#184014)

This patch introduces a new HandleModuleName function to avoid
duplicated code in the module name handling stage.

---------

Signed-off-by: yronglin <yronglin777@gmail.com>
Signed-off-by: Wang, Yihan <yronglin777@gmail.com>
… of CIRDialectLLVMIRTranslationInterface (llvm#186073)

Add the amendOperation override to handle CIR dialect attributes during
MLIR-to-LLVM IR translation. This dispatches to amendModule for ModuleOp,
enabling module metadata.

This PR also adds support to emit AMDGPU-specific module flags
amdhsa_code_object_version and amdgpu_printf_kind to match OGCG
behavior.

In CIRGenModule, the flags are stored as CIR module attributes:

cir.amdhsa_code_object_version (integer)
cir.amdgpu_printf_kind (string: "hostcall" or "buffered")
During lowering to LLVM IR (in LowerToLLVMIR.cpp), these attributes are
converted to LLVM module flags.

Upstreaming basic changes from clangIR PRs: 

llvm/clangir@61e9ebd
llvm/clangir#768
llvm/clangir#773
llvm/clangir#2100
…lvm#187627)

Part 2/4: Implement HALO for coroutines that flow off final suspend.
Parent PR approved in llvm#185336,
with no change since then

Since `coro.id` is unavailable in resumers, elide `coro.free` based on
the frame instead of `coro.id`.
The fallback non-canonicalize path didn't work. Use a more
straightforward implementation. Eventually this should use
the pattern from llvm#172998
This eliminates duplicated epilog code. The unused half
optimizes out just fine after inlining.
I was very puzzled the other day when it showed that VF 8 had a cost of
X and VF 16 had a cost of X/2, yet it still chose VF 8. This PR adds
some extra debug output to explain why this happens.
These were originally ported from rocm device
libs in bc81ebe.
Merge in more recent changes.
…llvm#187445)

The previous value of 0 was allowing loads to move past the mops
operations where it is not valid. Use a LocationSize::afterPointer()
size instead.

The GISel lowering currently loses the MMO, which is fine as it should
be conservatively treated as a load/store to any location.
Follow the ordinary gentype conventions for the log implementation,
instead of using a plain header. This doesn't quite yet enable
vectorization, due to how the table is currently indexed. This should
make it easier for targets to selectively overload the function for
a subset of types.
…87538)

This is pretty verbose and ugly. We're pulling the base implementation
in for the double cases, and scalarizing it. Also fully defining the
half and float cases to directly use the intrinsic, for all vector
types. It would be much more convenient if we had linker based overrides
for the generic implementations, rather than per source file.
…llvm#187570)

This is to help with llvm#185382 and to make sure that I don't miss any PRs.
…#187462)

This will allow us to more conveniently use llvm::formatv in the
codebase.
This patch adds a Clang-compatible -mtune option to llc, to enable
decoupled ISA and microarchitecture targeting, which is especially
important for backend development. For example, it makes it easy to
test the effects of a subtarget feature or scheduling model on codegen
across a variety of workloads on the IR corpus benchmark:
https://github.com/dtcxzyw/llvm-codegen-benchmark.

The implementation adds an isolated generic codegen flag, to establish a
base for wider usage - the plan is to add it to `opt` as well in a
followup patch. Then `llc` consumes it, and sets `tune-cpu` attributes
for functions, which are further consumed by the backend.
…e archives (llvm#187113)

Add checksum verification for libxml2, zlib, and zstd source archives
via `cmake -E *sum` and `cmake -E compare_files` commands.

This also adds the following minor changes:
* Factor out libxml2 version into variable.
* Check `tar` return code.
Bitcast the large scalar integer to a vXi64 vector, reverse the elements,
and then perform a per-element vXi64 bitreverse.

If we have SSSE3 or later, BITREVERSE expansion using PSHUFB is always
more efficient than performing it as a scalar sequence (no need for
mayFoldIntoVector check).

Fixes llvm#187353
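
The bitreverse decomposition described above can be illustrated in plain C++. This is only a sketch of the arithmetic identity, not the X86 lowering code: the `Wide` alias and the function names are hypothetical, and the wide integer is modelled as an array of 64-bit limbs with limb 0 least significant.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Reverse the bits of one 64-bit lane with the usual swap ladder.
constexpr std::uint64_t bitreverse64(std::uint64_t v) {
  v = ((v >> 1) & 0x5555555555555555ULL) | ((v & 0x5555555555555555ULL) << 1);
  v = ((v >> 2) & 0x3333333333333333ULL) | ((v & 0x3333333333333333ULL) << 2);
  v = ((v >> 4) & 0x0F0F0F0F0F0F0F0FULL) | ((v & 0x0F0F0F0F0F0F0F0FULL) << 4);
  v = ((v >> 8) & 0x00FF00FF00FF00FFULL) | ((v & 0x00FF00FF00FF00FFULL) << 8);
  v = ((v >> 16) & 0x0000FFFF0000FFFFULL) | ((v & 0x0000FFFF0000FFFFULL) << 16);
  return (v >> 32) | (v << 32);
}

// Hypothetical wide integer: N limbs of 64 bits, limb 0 least significant.
template <std::size_t N>
using Wide = std::array<std::uint64_t, N>;

// "Reverse the elements, then bitreverse each element."
template <std::size_t N>
Wide<N> bitreverseWide(const Wide<N>& in) {
  Wide<N> out{};
  for (std::size_t i = 0; i < N; ++i)
    out[i] = bitreverse64(in[N - 1 - i]);
  return out;
}

// Reference implementation that moves one bit at a time.
template <std::size_t N>
Wide<N> bitreverseNaive(const Wide<N>& in) {
  Wide<N> out{};
  const std::size_t bits = 64 * N;
  for (std::size_t b = 0; b < bits; ++b) {
    const std::uint64_t bit = (in[b / 64] >> (b % 64)) & 1;
    const std::size_t d = bits - 1 - b;  // mirrored bit position
    out[d / 64] |= bit << (d % 64);
  }
  return out;
}
```

Comparing `bitreverseWide` against `bitreverseNaive` on a few inputs is a quick way to convince yourself that the two-step form is equivalent to a full-width bit reversal.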
This pass does not actually use TargetMachine/TargetLoweringInfo.
llvm#187644)

…tail storage" (llvm#187410)

This reverts commit bf1db77.

Avoid using an `InterpFrame` member after calling its destructor this
time. I hope that was the only problem.
This updates `matchExtendedReductionOperand` so the simple case of
`UpdateR(PrevValue, ext(...))` is matched first as an early exit. The
binop matching is then flattened to remove the extra layer of the
`MatchExtends` lambda.
"Effective" is the wrong word: Both overloads are effective; they do
what they're supposed to do. But the character overload does less work.
…sics (llvm#187513)

Previously, GlobalISel was failing to select these intrinsics when given
scalar operands, as RegBankSelect would place these on GPR banks. Fixing
this enables GlobalISel to lower correctly, as in Instruction Selection
the intrinsic matches the SIMD patterns in AArch64InstrInfo.td.
Async operations transfer data between global memory and LDS. Their
progress is tracked by the ASYNC_CNT counter on GFX1250 and later
architectures. This change introduces the representation of that counter
in SIInsertWaitCnts. For now, the programmer must manually insert
s_wait_asyncnt instructions. Later changes will add compiler assistance
for generating the waits by including this counter in the asyncmark
instructions.

Assisted-by: Claude Sonnet 4.5

This is part of a stack:

- llvm#185813
- llvm#185810
…ns (llvm#187241)

The SPIR-V backend previously only supported function annotations in
llvm.global.annotations and crashed with a fatal error when encountering
global variable entries.
…lvm#187483)

Most loop transformations, like unrolling and vectorization, expect the
latch branch to be countable. Allow rotation, if it turns the latch from
uncountable to countable.

This uses SCEV to check for countable exits if CheckExitCount is set.
Currently it is not set for the LPM1 run (where SCEV is not used by
other passes), only in LPM.

With that, compile-time impact is mostly neutral:

https://llvm-compile-time-tracker.com/compare.php?from=eba342d0ba930a404a026c80aada51c43974f0db&to=2e676337b45fae63ce9498116d8e6e43772363c5&stat=instructions:u

ClamAV is consistently slower (~+0.15%) and 7zip faster in most cases
(~-0.13%)

Across a large test set based on C/C++ workloads, this rotates ~0.8%
more loops with ~2.68M rotated loops.

For the test set, ~2.7% more loops are runtime-unrolled and +6.36% more
early exit loops vectorized on ARM64 macOS.

This fixes a regression where std::ranges::find_last loops stopped
being runtime-unrolled after llvm@5f648c3, which changed the loop
structure so we stopped rotating.

https://clang.godbolt.org/z/6baeE1av6

Based on llvm#162654.

Co-authored-by: Marek Sedláček <mr.mareksedlacek@gmail.com>

PR: llvm#187483
…nd whitespace handling (llvm#186950)

The `check_alphabetical_order.py` script previously only scanned the
first line of each bullet point in `ReleaseNotes.rst`, causing sorting
failures when a `:doc:` tag was split across multiple lines.

Also, when sorting the last entry of a section, the script inserted
unnecessary whitespace.

This PR fixes these two problems.
…lvm#187020)

The `CallEvent` has data members that store the `LocationContext` and
the `CFGElementRef` (i.e. `CFGBlock` + index of statement within that
block); but the method `getReturnValueUnderConstruction` ignored these
and used the currently analyzed `LocationContext` and `CFGBlock` instead
of them.

This was logically incorrect and would have caused problems if the
`CallEvent` was used later when the "currently analyzed" things are
different. However, the lit tests do pass even if I assert that the
currently analyzed `LocationContext` and `CFGBlock` is the same as the
ones saved in the `CallEvent`, so I'm pretty sure that there was no
actual problem caused by this bad logic and this commit won't cause
functional changes.

I also evaluated this change on a set of open source projects (postgres,
tinyxml2, libwebm, xerces, bitcoin, protobuf, qtbase, contour, openrct2)
and validated that it doesn't change the results of the analysis.
Takashiidobe and others added 22 commits March 21, 2026 20:45
…lock sub (llvm#184715)

Resolves: llvm#174933

The issue goes into a case where fetch_sub(n) is properly optimized but
fetch_add(neg(n)) is not optimized to the same code.

Although the issue is tagged for x86, I assumed this would be best
handled outside of the backends, so I put this in InstCombine.
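
For context, the two source-level forms mentioned above compute the same result. The sketch below only shows that equivalence with `std::atomic`; the helper names are hypothetical, and it says nothing about what code any particular compiler version emits for either form.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Subtract n directly; returns the value held before the update.
inline std::uint32_t subDirect(std::atomic<std::uint32_t>& a, std::uint32_t n) {
  return a.fetch_sub(n);
}

// Subtract n by adding its negation. Unsigned arithmetic wraps modulo 2^32,
// so 0u - n is well defined and adding it is equivalent to subtracting n.
inline std::uint32_t subViaNegatedAdd(std::atomic<std::uint32_t>& a,
                                      std::uint32_t n) {
  return a.fetch_add(0u - n);
}
```

Both helpers leave the atomic in the same state and return the same previous value, which is why it is reasonable to expect them to lower to the same instructions.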
Just use an empty list always.
Inline the definition of a variable into an assertion given it has no
other users and no side effects.
Lay the ground for C++26 `constexpr` math functions:
- Introduce `LIBC_ENABLE_CONSTEXPR` macro switch to request the
`constexpr`-only code route.
- Introduce `LIBC_HAS_CONSTANT_EVALUATION` to indicate that we are using
`constexpr`-only code in all dependent functions.
- Introduce `LIBC_CONSTEXPR` macro qualifier to aid in altering the
signature of non-`constexpr` functions.

Note that functions lack the `constexpr` qualifier when they rely on
utilities that are not `constexpr`-compatible, which leaves their
dependent functions unqualified as well; such functions can be made
qualifiable by using other code routes.

If the function is `constexpr` compatible, then it's prohibited to use
`LIBC_CONSTEXPR` as a function qualifier. We only qualify it with
`constexpr` as usual.

`LIBC_CONSTEXPR` may or may not evaluate to `constexpr` depending on the
environment configurations, thus it's only used to modify the function
signature in constant evaluation context and remove the qualifier if
it's not desired (depending on provided configurations).
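
A rough illustration of how such a macro trio could fit together is shown below. The macro bodies and the `scale_by_two` function are assumptions made up for this sketch; the actual definitions in LLVM libc may well differ.

```cpp
#include <cassert>
#include <cmath>

// Assumption: the build opts in by defining LIBC_ENABLE_CONSTEXPR.
// It is left undefined here, so the runtime route is selected.
// #define LIBC_ENABLE_CONSTEXPR

#ifdef LIBC_ENABLE_CONSTEXPR
// Assumed definitions for illustration only.
#define LIBC_HAS_CONSTANT_EVALUATION 1
#define LIBC_CONSTEXPR constexpr
#else
// Qualifier removed when the constexpr-only route is not requested.
#define LIBC_CONSTEXPR
#endif

// Hypothetical function whose default route uses a utility that is not
// constexpr before C++23 (std::ldexp), so it cannot be plain `constexpr`.
LIBC_CONSTEXPR inline double scale_by_two(double x) {
#ifdef LIBC_HAS_CONSTANT_EVALUATION
  return x * 2.0;           // constexpr-compatible route
#else
  return std::ldexp(x, 1);  // runtime route
#endif
}
```

With the switch undefined, `scale_by_two` is an ordinary inline function; defining `LIBC_ENABLE_CONSTEXPR` before the macro block would change its signature to `constexpr` and select the alternative route.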

Possible side effects:
- Current qualified routes may or may not produce the desired ULP, this
is implementation dependent (function by function basis) and needs
further testing of the chosen code route.
- The shared tests in the current configuration can still compile with
an unsupported compiler. I didn't want to raise a compilation error for
unsupported compilers now, but we need to push compiler support with
newer versions for this one to work as intended.
Use (nearly) the same code to align case statements and expression, as
the other alignments do. That way we also fix two things:
- Keep the ColumnLimit intact, without duplicating the calculation.
- Align all the case colons, even for empty cases.
The new code introduced for `-verify-directives` in PR llvm#179835 enforces
that the order of diagnostics matches the order of the directives.
However, before checking this, it sorts the directives by
SourceLocation. Perhaps non-obviously, all directives which appear
inside a single comment are given the same SourceLocation, pointing to
the beginning of the comment. While these are added in the order they
appear in the comment, the non-stable std::sort may non-deterministically
misorder them. Switching to stable_sort ensures the correct order is
verified.

This was observed as a random test failure on the checks in
clang/test/CXX/drs/cwg25xx.cpp lines 250 and 264, in some builds of
Clang. Note that those lines end in backslashes, and thus, despite
appearances, the directives on the following lines are also within the
same single comment.
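
The ordering guarantee relied on here can be shown in isolation. In this sketch the `Directive` struct and `orderedTexts` helper are hypothetical stand-ins, with an `unsigned` playing the role of `SourceLocation`; it is not the Clang code.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a verify directive.
struct Directive {
  unsigned loc;      // all directives in one comment share this value
  std::string text;  // what the directive expects
};

// Sort by location only. std::stable_sort keeps elements with equal keys in
// their original relative order; std::sort makes no such promise.
inline std::vector<std::string> orderedTexts(std::vector<Directive> ds) {
  std::stable_sort(ds.begin(), ds.end(),
                   [](const Directive& a, const Directive& b) {
                     return a.loc < b.loc;
                   });
  std::vector<std::string> out;
  for (const auto& d : ds)
    out.push_back(d.text);
  return out;
}
```

Directives that share a location come out in insertion order, so the subsequent order check sees them exactly as they were written in the comment.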
…7640)

Session::tryCreateService will try to create an instance of ServiceT by
forwarding the given arguments to the ServiceT::Create method, which
must return an Expected<std::unique_ptr<ServiceT>>.

This enables one-line construction and registration of Services with
fallible constructors (which are expected to be common).
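
The forwarding pattern described above might look roughly like the following. Everything here is a simplified assumption: `Expected` is a tiny stand-in for `llvm::Expected`, the return type of `tryCreateService` is guessed, and `PortService` is a made-up example service.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for llvm::Expected<T>: a value or an error message.
template <typename T>
class Expected {
public:
  Expected(T v) : value_(std::move(v)), ok_(true) {}
  static Expected failure(std::string msg) {
    Expected e;
    e.error_ = std::move(msg);
    return e;
  }
  explicit operator bool() const { return ok_; }
  T& operator*() { return value_; }
  const std::string& error() const { return error_; }

private:
  Expected() = default;
  T value_{};
  std::string error_;
  bool ok_ = false;
};

struct Service {
  virtual ~Service() = default;
};

class Session {
public:
  // Forward the arguments to ServiceT::Create; register the service only if
  // creation succeeded, otherwise pass the error back to the caller.
  template <typename ServiceT, typename... ArgTs>
  Expected<ServiceT*> tryCreateService(ArgTs&&... args) {
    auto svc = ServiceT::Create(std::forward<ArgTs>(args)...);
    if (!svc)
      return Expected<ServiceT*>::failure(svc.error());
    ServiceT* raw = (*svc).get();
    services_.push_back(std::move(*svc));
    return raw;
  }
  std::size_t numServices() const { return services_.size(); }

private:
  std::vector<std::unique_ptr<Service>> services_;
};

// Made-up service with a fallible named constructor.
class PortService : public Service {
public:
  static Expected<std::unique_ptr<PortService>> Create(int port) {
    if (port <= 0)
      return Expected<std::unique_ptr<PortService>>::failure("invalid port");
    return std::unique_ptr<PortService>(new PortService(port));
  }
  int port() const { return port_; }

private:
  explicit PortService(int port) : port_(port) {}
  int port_;
};
```

A call such as `session.tryCreateService<PortService>(8080)` then constructs and registers the service in one line, while a failing `Create` leaves the session unchanged.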
…lvm#187826)

Use FieldDecl::getFieldIndex() instead of manually iterating over
fields.
Adds optional attribute to allow specialization into category Linalg
ops.

The default behavior of the transform op remains unchanged.
We can use the negate-if-carry trick for abdu, and it works for all types that are legal for sbb.
…i/gcs=always (llvm#186343)

Previously, the implicit warnings from force-bti (or gcs=always) could
not be silenced.

The force-ibt/cet-report flags could also be handled the same way, but I
haven't checked with GNU ld how they behave. And there, the force-ibt
flag only produces warnings if the IBT bit is missing, while cet-report
warns if either IBT or SHSTK are missing - but force-ibt probably
shouldn't implicitly start warning for missing SHSTK.

This addresses a discrepancy with GNU ld that was noted in llvm#186173.
Since befaa35, CI has consistently failed for
the generic-no-wide-characters build, because in no-`wchar_t` modes, the
header for `__remove_cv_t` wasn't properly included.

This PR adds the missing include of `<__type_traits/remove_cv.h>`.

As a drive-by, `<__cstddef/size_t.h>` and
`<__type_traits/is_constant_evaluated.h>`, which are included by
`<cwchar>`, are now also included by `<string>` to avoid a potential
regression, as we're using `size_t` and
`__libcpp_is_constant_evaluated()` in `<string>`.