
merge main into amd-main#1801

Open
z1-cciauto wants to merge 210 commits into amd-main from
upstream_merge_202603181348

Conversation

@z1-cciauto
Collaborator

No description provided.

kaladron and others added 30 commits March 17, 2026 16:12
…187025)

When building the full library with -nostdinc, directly including
<stdint.h> may pull in host or compiler-provided headers that collide
with LLVM-libc's local macro definitions. Switch to using our internal
stdint-macros.h when LIBC_FULL_BUILD is enabled.

Additionally, declare aligned_alloc with noexcept in C++ to match common
C library declarations and avoid fatal type specification mismatches
during sysroot builds.
Noticed while working on i512 shift expansion - if we end up with
repeated splat args, the compress node is unnecessary as we're just
shuffling the same element values
This is a small refactor of how DeducedType and its derived types are
represented.

The different deduction kinds are spelled out in an enum, and how this
is tracked is simplified, to allow easier profiling.

How these types are constructed and canonicalized is also brought more
in line with how it works for the other types.

This fixes a crash reported here:
llvm#167513 (comment)
This makes a few changes that are hard to separate from each other:
* Consider forming minnum/maxnum from setcc+select non-profitable. X86
  has instructions specifically for the setcc+select pattern. (Without
  this it's hard to get good coverage on this code path.)
* Reduce duplication in the code for forming FMIN/FMAX, by using
  predicate inversion (to make setcc and select operand order match) and
  predicate invswap (to canonicalize to ordered predicates). This leaves
  us with just ordered and NaN-less predicates.
* For non-strict non-less predicates, convert them to strict ones via
  invswap (i.e. swapping the operands of both the setcc and select).
  Previously this just treated them the same as strict predicates, but I
  believe that's incorrect in terms of signed zero handling.
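
The signed zero concern can be illustrated in plain C++. This is a minimal sketch, not the DAG combine code: the helper names are hypothetical, NaN inputs are left out, and it only shows that a non-strict select matches a strict select with swapped operands, while treating the two predicates as identical changes the sign of a zero result.

```cpp
#include <cassert>
#include <cmath>

// Select-based minimum with a strict predicate: (a < b) ? a : b.
inline double selectLT(double a, double b) { return a < b ? a : b; }

// Select-based minimum with a non-strict predicate: (a <= b) ? a : b.
inline double selectLE(double a, double b) { return a <= b ? a : b; }

// True if x and y compare equal and also agree on the sign bit,
// which distinguishes +0.0 from -0.0.
inline bool sameValueAndSign(double x, double y) {
  return x == y && std::signbit(x) == std::signbit(y);
}
```

For the inputs (+0.0, -0.0), `selectLT` yields -0.0 and `selectLE` yields +0.0, whereas `selectLE(a, b)` and `selectLT(b, a)` agree for all non-NaN inputs.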
Adds a newPM pass for AArch64ConditionOptimizer.

- Refactors base logic into an Impl class
- Renames old pass with the "Legacy" suffix
- Adds the new pass manager pass using refactored logic
- Updates tests

Context and motivation in
https://llvm.org/docs/NewPassManager.html#status-of-the-new-and-legacy-pass-managers
Use a class instead of an alias, so that CycleInfo can be
forward-declared.

We can't do the same for Cycle without further changes (a LoopInfo like
CRTP scheme).
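
A minimal sketch of why a class helps here, assuming the class is a thin wrapper deriving from the template instantiation (the names with the `Sketch` suffix are hypothetical stand-ins, not the LLVM types):

```cpp
#include <cassert>

// Hypothetical stand-ins for the generic template and its context type.
struct SSAContextSketch {};

template <typename ContextT> class GenericCycleInfoSketch {
public:
  unsigned numCycles() const { return NumCycles; }
  void addCycle() { ++NumCycles; }

private:
  unsigned NumCycles = 0;
};

// If CycleInfoSketch were `using CycleInfoSketch =
// GenericCycleInfoSketch<SSAContextSketch>;`, a header could not simply
// write `class CycleInfoSketch;`. A real class can be forward-declared:
class CycleInfoSketch;

// This declaration only needs the incomplete type.
unsigned countCycles(const CycleInfoSketch &CI);

// The full definition: a thin class deriving from the instantiation.
class CycleInfoSketch : public GenericCycleInfoSketch<SSAContextSketch> {};

unsigned countCycles(const CycleInfoSketch &CI) { return CI.numCycles(); }
```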
Occasionally wait_for_file_on_target will time out on the Green Dragon
bots and we're not sure why. I'm adding this logging in an attempt to
get more clues as to what's happening when it fails.
…#187037)

AMDGPUTargetMachine also had a static method which did the same thing.
Remove it so that we have a single source of truth.
…for non-descriptor dummies (llvm#186894)

When ignore_tkr(c) is set and the actual argument is an allocatable or
pointer (stored as a descriptor), the lowering code was unconditionally
returning the descriptor pointer as-is, regardless of whether the dummy
argument expects a descriptor. For bind(c) interfaces with assumed-size
dummies (e.g., cuFFT), the dummy expects a raw pointer, not a
descriptor. Passing the descriptor caused the C function to receive the
wrong address, leading to silent data corruption and invalid descriptor
crashes at deallocation.

The fix adds a check that the early return for ignore_tkr(c) only
applies when the dummy type is itself a descriptor type. When the dummy
expects a base address, the normal path is taken, which correctly
extracts the base address from the descriptor via fir.box_addr.
The variable `matches` may be assigned the address of block-scope
`local_matches`, which is defined in a scope strictly smaller than the
scope of `matches`. Towards the end of the function, after
`local_matches` has been destroyed, `matches` is accessed, possibly
triggering a use-after-free.
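
One way to avoid this class of bug is to declare the fallback storage in the same scope as the pointer that may refer to it. The sketch below is an assumption about the shape of such a fix, not the patched function; `countMatches` and its parameters are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Counts names starting with `prefix`. Results go to `out` if provided,
// otherwise to function-local storage.
inline std::size_t countMatches(const std::vector<std::string> &names,
                                const std::string &prefix,
                                std::vector<std::string> *out = nullptr) {
  // The fallback storage lives in the same scope as `matches`, so it
  // outlives every use of the pointer below.
  std::vector<std::string> local_matches;
  std::vector<std::string> *matches = out ? out : &local_matches;

  for (const std::string &name : names)
    if (name.compare(0, prefix.size(), prefix) == 0)
      matches->push_back(name);

  return matches->size();
}
```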
Use `math.round` in lowering of `anint` so we can use passes like
`MathToNVVM` to target device code differently.
…) (llvm#186833)

This patch adds support for `float8_e3m4` and `float8_e4m3` in
`np_to_memref.py` by adding the appropriate ctypes structures.
Additionally changes minimum numpy version to 2.1.0 and uses a single
ml_dtypes version of 0.5.0.
…InputsDead (llvm#186325)

Optimize MakeRegionBranchOpSuccessorInputsDead patterns in
`ControlFlowInterfaces.cpp`:

- Add early exit to `computeReachableValuesFromSuccessorInput` when the
caller only needs to know if there is exactly one reachable value,
avoiding unnecessary traversal.

Assisted-by: Claude Code

Co-authored-by: Yang Bai <yangb@nvidia.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…rand types (llvm#186550)

This patch disables vectorizing Cmps with different operand types
because we can't form a legal vector.
This used to cause an assertion check crash once we attempted to pack
the bundle formed by Cmp's operands.
…ice array descriptors (llvm#186945)

When an allocatable CUDA Fortran device array is privatized in an OpenMP
region, the null descriptor created in the omp.private init region was
missing the allocator_idx attribute. This caused a subsequent allocate()
inside the parallel region to call malloc instead of cudaMalloc, because
the runtime's Descriptor::Allocate() reads allocator_idx from the
descriptor to select the allocator. On some systems it caused
cudaErrorInvalidValue crashes.

This patch sets allocator_idx = 2 (device allocator) on the null
fir.embox in handleNullAllocatable() when the symbol has a CUDA device
attribute, so that the Fortran runtime correctly calls cudaMalloc for
the privatized array.
…shes (llvm#186145)

When the bytecode type callback (test-kind=2) calls iface->readType()
for every builtin type, complex types like MemRefType could crash
because the generated reading code used get<T>() which asserts on
invalid parameters, rather than getChecked<T>() which returns null
gracefully.

This change:
- Adds a getChecked<T>() free function helper in
BytecodeImplementation.h that calls T::getChecked(emitError, params)
(no-context form) when a specific override exists, otherwise falls back
to get<T>(). The with-context second branch is intentionally omitted to
avoid instantiating StorageUserBase::getChecked<Args> for types that
only inherit the base template (e.g. ArrayAttr), which would require
complete storage types unavailable in the bytecode reading TU.
- Updates BytecodeBase.td default cBuilder for
DialectAttribute/DialectType to use getChecked<> instead of get<>.
- Updates all custom cBuilder strings in BuiltinDialectBytecode.td.
- Updates the no-args codegen case in BytecodeDialectGen.cpp.
- Adds a regression test in bytecode_callback_with_custom_type.mlir.
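
The dispatch described in the first bullet can be sketched with a C++17 detection idiom. This is a simplified illustration with toy types and a single `int` parameter; `StrictType`, `SimpleType`, `HasGetChecked`, and `buildChecked` are hypothetical names and do not reflect the MLIR signatures.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <type_traits>
#include <utility>

// Toy type whose plain builder asserts on invalid parameters and whose
// checked builder reports the failure instead.
struct StrictType {
  int width;
  static StrictType get(int w) {
    assert(w > 0 && "invalid width");
    return {w};
  }
  static std::optional<StrictType> getChecked(std::string &err, int w) {
    if (w <= 0) {
      err = "width must be positive";
      return std::nullopt;
    }
    return StrictType{w};
  }
};

// Toy type that only offers the plain builder.
struct SimpleType {
  int value;
  static SimpleType get(int v) { return {v}; }
};

// Detects whether T::getChecked(std::string &, int) is callable.
template <typename T, typename = void>
struct HasGetChecked : std::false_type {};

template <typename T>
struct HasGetChecked<T, std::void_t<decltype(T::getChecked(
                            std::declval<std::string &>(), 0))>>
    : std::true_type {};

// Prefers the checked builder when the type provides one, otherwise
// falls back to the plain builder.
template <typename T>
std::optional<T> buildChecked(std::string &err, int param) {
  if constexpr (HasGetChecked<T>::value)
    return T::getChecked(err, param);
  else
    return T::get(param);
}
```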

Fixes llvm#128308

Assisted-by: Claude Code
…ources (llvm#187063)

Allows us to use getMaskNode to canonicalize predicate masks in big shift lowering
…186653)

Replace the monolithic cir.binop.overflow operation and its
BinOpOverflowKind enum with three individual operations:
cir.add.overflow, cir.sub.overflow, and cir.mul.overflow.

This follows the same pattern used when BinOp and UnaryOp were
previously split into per-operation ops (cir.add, cir.sub, etc.),
eliminating enum dispatch and enabling per-op traits like Commutative.
This MR removes a hard-coded compute number in an MLIR test. This will
allow the test to not need to be updated in the future. The default
value will come from `NVVMOps.td`.
Allow `--function-order` to be combined with `--reorder-functions`
algorithms. Functions listed in the order file are pinned first
(indices 0..N-1), then the selected algorithm orders remaining
functions starting at index N.
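
A minimal sketch of the index assignment described above, assuming the algorithm's result is available as an ordered list of function names (`assignIndices` and its parameters are hypothetical, not the tool's API):

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Assigns layout indices: order-file entries first, then everything the
// reordering algorithm produced that was not already pinned.
inline std::unordered_map<std::string, unsigned>
assignIndices(const std::vector<std::string> &orderFile,
              const std::vector<std::string> &algorithmOrder) {
  std::unordered_map<std::string, unsigned> index;
  unsigned next = 0;

  // Functions listed in the order file are pinned to indices 0..N-1.
  for (const std::string &name : orderFile)
    if (index.emplace(name, next).second)
      ++next;

  // Remaining functions follow the algorithm's order, starting at N.
  for (const std::string &name : algorithmOrder)
    if (index.emplace(name, next).second)
      ++next;

  return index;
}
```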
… rules (llvm#186974)

Update `mlir/test/Target/SPIRV/struct.mlir` so it remains valid under
current SPIR-V validator checks in Logical addressing mode.

The recursive struct cases were emitting pointer-allocating globals in
storage classes rejected by `spirv-val`. Adjust those globals to
`Private` while keeping recursive member pointers in `StorageBuffer`,
and update the expected roundtrip types accordingly.

Also add the missing variable-pointer requirements to the module VCE:
- capability: `VariablePointers`
- extension: `SPV_KHR_variable_pointers`

Signed-off-by: Davide Grohmann <davide.grohmann@arm.com>
…)" (llvm#187079)

Reverts llvm#187025

Fails on openmp bot:
https://lab.llvm.org/buildbot/#/builders/10/builds/24743
('INT64_MIN' macro redefined when Clang-provided <stdint.h> is
used)
Fails on RISC-V-32 bot:
https://lab.llvm.org/buildbot/#/builders/196/builds/17067
due to MPFRNumber constructor not picking the right overload for
uint32_t argument.
)

This patch initiates the refactoring of Linux syscalls as described in
the RFC (https://discourse.llvm.org/t/rfc-linux-syscall-cleanup/87248/).

It introduces a new infrastructure in
`src/__support/OSUtil/linux/syscall_wrappers/` to house header-only
syscall wrappers. These wrappers utilize `ErrorOr` to provide a
consistent, type-safe interface for error handling across the library,
standardizing how syscall return values are converted into
errno-compatible Error objects.

Summary of changes:
- Created the `syscall_wrappers` directory and added `close.h`,
`read.h`, `write.h`, and `open.h`.
- Moved the existing `getrandom.h` into the new `syscall_wrappers`
directory and updated its callers (including HashTable/randomness.h).
- Refactored core entrypoints (`close`, `read`, `write`, `open`) to use
the new wrappers, removing direct `syscall_impl` logic and manual errno
setting.
- Updated `shm_open.cpp` to use the new `open` wrapper.
- Cleaned up `OSUtil/linux/fcntl.cpp` by removing redundant internal
implementations of `open` and `close`.
- Added a developer guide in `docs/dev/syscall_wrapper_refactor.rst`
outlining the established pattern for future migrations.
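
To illustrate the idea of converting a raw syscall return value into an error-or-value result, here is a small sketch. `ResultOrErrno` and `fromRawSyscall` are hypothetical and are not LLVM-libc's `ErrorOr` or its wrappers; the sketch relies only on the Linux convention that a raw syscall reports failure by returning a negated errno in the range [-4095, -1].

```cpp
#include <cassert>
#include <cerrno>

// Hypothetical minimal result type holding either a value or an errno.
template <typename T> class ResultOrErrno {
public:
  static ResultOrErrno ok(T v) { return ResultOrErrno(v, 0); }
  static ResultOrErrno fail(int e) { return ResultOrErrno(T{}, e); }

  bool has_value() const { return err_ == 0; }
  T value() const { return value_; }
  int error() const { return err_; }

private:
  ResultOrErrno(T v, int e) : value_(v), err_(e) {}
  T value_;
  int err_;
};

// Converts a raw Linux syscall return value into a result object so the
// caller never has to inspect or set errno by hand.
inline ResultOrErrno<long> fromRawSyscall(long ret) {
  if (ret < 0 && ret >= -4095)
    return ResultOrErrno<long>::fail(static_cast<int>(-ret));
  return ResultOrErrno<long>::ok(ret);
}
```

An entrypoint built on such a wrapper would set errno in exactly one place, after checking `has_value()`.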

---------

Co-authored-by: Michael Jones <michaelrj@google.com>
…185772)

The `__ob_trap` type specifier can be used to trap (or warn with sanitizers) when overflow or truncation occurs on the specified type.

There was a gap in coverage for this with the `-fsanitize=implicit-integer-sign-change` sanitizer. Fix this by carrying around `__ob_trap` information through `EmitIntegerSignChange()` which allows us to properly trap or warn.
Add `spirv.GroupNonUniformBroadcastFirst` op and tests.
vangthao95 and others added 22 commits March 18, 2026 09:00
…age (llvm#185835)

Instead of heap-allocating an `InterpFrame` and then immediately
heap-allocating more space for the local variables, do only one
heap-allocation and use tail storage for the local variables.
We already know how many bytes we need for the tail storage, after
all.
This also makes `InterpFrame` a little smaller since we don't need to
save an explicit pointer for the local variable memory.
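
The tail-storage layout can be sketched as follows. This is a simplified illustration, not clang's `InterpFrame`: `FrameSketch` and its members are hypothetical, and locals are accessed as raw bytes via `memcpy`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>

// Frame header followed directly by its local-variable bytes, all in a
// single heap allocation. The alignment keeps the tail suitably aligned.
struct alignas(std::max_align_t) FrameSketch {
  std::size_t NumLocalBytes;

  // Locals start right after the frame object, so no pointer is stored.
  unsigned char *locals() {
    return reinterpret_cast<unsigned char *>(this + 1);
  }

  static FrameSketch *create(std::size_t NumLocalBytes) {
    // One allocation covers both the header and the tail storage.
    void *Mem = ::operator new(sizeof(FrameSketch) + NumLocalBytes);
    FrameSketch *F = new (Mem) FrameSketch{NumLocalBytes};
    std::memset(F->locals(), 0, NumLocalBytes);
    return F;
  }

  static void destroy(FrameSketch *F) {
    F->~FrameSketch();
    ::operator delete(F);
  }

  void setInt(std::size_t Offset, int V) {
    std::memcpy(locals() + Offset, &V, sizeof(V));
  }

  int getInt(std::size_t Offset) {
    int V;
    std::memcpy(&V, locals() + Offset, sizeof(V));
    return V;
  }
};
```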

For an artificial test case doing lots of function calls with local
variables like:
```c++
constexpr int plus(int a, int b) {
        int x = a;
        int y = b;
        int z = x + y;
        return z;
}

constexpr int minus(int a, int b) {
        int x = a;
        int y = b;
        int z = x - y;
        return z;
}
constexpr int foo() {
        int a = 0;
        for (unsigned I = 0; I != 1'000'000; ++I) {
                int b = I;
                a = plus(a,b );
                a = minus(a,I);
        }
        return a;
}
static_assert(foo() == 0);
```
this saves us over 6%.

We also eliminate the per-argument `Block` heap allocation on the first
pointer-access to an argument the same way. To make this work, we change
the param ops to use the parameter index instead of the offset.
Refactor filter into a FilterTool class that can process multiple input
files sequentially and emits the remarks into a single output file.

Pull Request: llvm#187162
…#187145)

Add a LoadCompilationPrefixMap() helper in SymbolFile::FindPlugin that
walks up from the symbol file's directory looking for a
compilation-prefix-map.json file. When found, each key→value entry is
applied to the module's source path mapping list, allowing LLDB to
resolve source file paths that were rewritten by -fdebug-prefix-map at
build time without requiring manual `settings set target.source-map`.

The JSON file format maps fake paths (as written into debug info) back
to their real on-disk counterparts:
  { "/fake/srcdir": "/real/srcdir" }
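
As an illustration of what applying such a mapping means, the sketch below rewrites a path whose leading components match a fake prefix. `remapPath` is a hypothetical helper; LLDB applies the entries through the module's source path mapping list, whose matching rules may differ from this sketch.

```cpp
#include <cassert>
#include <map>
#include <string>

// Replaces a matching fake prefix with its real counterpart. A prefix
// only matches on a path-component boundary.
inline std::string
remapPath(const std::map<std::string, std::string> &prefixMap,
          const std::string &path) {
  for (const auto &[fake, real] : prefixMap) {
    if (fake.empty() || path.compare(0, fake.size(), fake) != 0)
      continue;
    if (path.size() == fake.size() || path[fake.size()] == '/')
      return real + path.substr(fake.size());
  }
  return path; // No mapping applies; keep the path unchanged.
}
```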

Directory results are cached so the filesystem is walked at most once
per unique directory across all modules loaded in a session.

Also apply the module's source path remappings in
SymbolFileDWARFDebugMap::ParseCompileUnitAtIndex when constructing
compile units from N_SO stabs. This mirrors what MakeAbsoluteAndRemap
does for the dSYM case so that fake paths baked into the debug map are
transparently resolved to real paths.

rdar://84824567

Assisted-By: Claude
llvm#181572)

Update VPReplicateRecipe::computeCost to compute the cost for stores to
invariant addresses only masked by the header mask.

This matches the legacy cost model logic, but it is slightly odd that
the legacy cost model only seems to do this for stores predicated by the
header mask (i.e. tail-folding and not executed conditionally
otherwise). This is probably something we want to re-evaluate
eventually.

PR: llvm#181572
…ion (llvm#186769)

This addresses one of the limitations of llvm#174726 by directly selecting
`v_cvt_[u16/i16]_f16` instructions for conversion between 16-bit types,
as they already handle saturation internally.
If the routine op only has one of the string or id attributes, the API
was crashing since it was attempting to search in both. Guard each
search individually.
Rename to riscv_add_like to discriminate from generic pattern in a future patch

Ideally we'd make the riscv patterns generic but they're currently using value tracking and I'm not sure if we want to do that generically?
Align it with the style of `LoopVectorize/VPlan/predicator.ll`:

* Move ascii-graphs close to IR to avoid scrolling through CHECKs when
comparing the picture and actual IR
* Rename `%cN` to ensure that `bbN` branches on `%cN`
…p. (llvm#187189)

If N was changed on the previous loop iteration, we need the handle to
point at the new N.

Fixes llvm#186969.
POSIX.1-2024 defines a header called `endian.h`, which contains macros and
helpers for handling byte order conversions.

Unfortunately it is not available on all platforms. However, LLVM libc has a
useful endian.h implementation that is essentially only casts and builtins.

This PR draws on that implementation to add a clang header so applications on
platforms without it can make use of this header. The clang header forwards to
the system header if available, so existing usages are not affected.
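
A byte order helper built from builtins can look like the sketch below. The `sketch_` names are hypothetical and this is not the header added by the PR; it assumes a GCC or Clang compiler that provides `__builtin_bswap32` and the `__BYTE_ORDER__` predefined macro.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Converts a host-order 32-bit value to big-endian byte order.
inline std::uint32_t sketch_htobe32(std::uint32_t x) {
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
  return __builtin_bswap32(x); // Swap bytes on little-endian hosts.
#else
  return x; // Already big-endian.
#endif
}

// The inverse conversion is the same operation.
inline std::uint32_t sketch_be32toh(std::uint32_t x) {
  return sketch_htobe32(x);
}
```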
…ed (llvm#183900)

__asan_region_is_poisoned() uses an exclusive end address
(end = beg + size) to validate the region [beg, end) and to compute
the aligned inner shadow region. This causes a correctness issue
near the upper boundary of a memory range and could trigger address space
overflow on 32-bit targets.

1. Incorrect handling of the last byte of a memory range

   The implementation checks AddrIsInMem(end) instead of the last
   application byte (end - 1). For regions ending at the last byte
   of Low/Mid/HighMem (e.g. __asan_region_is_poisoned(kHighMemEnd, 1)),
   this returns end (kHighMemEnd + 1) instead of the original 
   pointer. This behavior is inconsistent with the function’s 
   semantics and with __asan_address_is_poisoned().

2. Address space overflow and invalid shadow range

   If a region ends at the top of the virtual address space (kHighMemEnd),
   e.g. on 32-bit targets, end = beg + size could wrap to 0.
   This violated the invariant beg < end and could trigger
   the CHECK failure.

   Additionally, overflow in RoundUpTo alignment computations
   for aligned_b could produce an invalid shadow region spanning
   LowShadow to HighShadow across ShadowGap, leading mem_is_zero()
   to access unmapped memory and crash.

Fix by switching to an inclusive last byte:

  last = beg + size - 1

All checks are now performed on beg and last. The aligned inner 
shadow region is also computed from [beg, last]. Additional guard 
for aligned_b prevents the mapping to shadow if aligned_b is wrapped
(in this case the aligned inner region is also empty and doesn't 
require the shadow scan via mem_is_zero()).

This fixes incorrect return values at memory range ends and 
prevents overflow related crashes on 32-bit targets.

Test is extended to cover these boundary cases.
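
The wraparound can be reproduced with 32-bit unsigned arithmetic. The sketch below models a 32-bit target with `std::uint32_t`; the function names are hypothetical and it is not the sanitizer code, only an illustration of why an inclusive last byte avoids the wrap.

```cpp
#include <cassert>
#include <cstdint>

// Models addresses on a 32-bit target.
using addr32 = std::uint32_t;

// Exclusive end: wraps to 0 when the region touches the top of the
// address space.
inline addr32 exclusiveEnd(addr32 beg, addr32 size) { return beg + size; }

// Inclusive last byte: stays representable for any non-empty region
// that fits in the address space.
inline addr32 inclusiveLast(addr32 beg, addr32 size) {
  return beg + size - 1;
}

// Range check phrased on [beg, last]; memEnd is the last valid address.
inline bool regionInRange(addr32 beg, addr32 size, addr32 memBeg,
                          addr32 memEnd) {
  if (size == 0)
    return true;
  addr32 last = inclusiveLast(beg, size);
  // last < beg means the region itself wrapped around.
  return last >= beg && beg >= memBeg && last <= memEnd;
}
```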

---------

Co-authored-by: Vitaly Buka <vitalybuka@gmail.com>
This reverts commit 6261cb4 to try to
fix compile time regressions
…#185765)

This PR changes how `ModuleManager` deduplicates module files.

Previously, `ModuleManager` used `FileEntry` for assigning unique
identity to module files. This works fine for explicitly-built modules
because they don't change during the lifetime of a single Clang
instance. For implicitly-built modules however, there are two issues:
1. The `FileEntry` objects are deduplicated by `FileManager` based on
the inode number. Some file systems reuse inode numbers of previously
removed files. Because implicitly-built module files are rapidly removed
and created, this deduplication breaks and compilations may fail
spuriously when inode numbers are recycled during the lifetime of a
single Clang instance.
2. The first thing `ModuleManager` does when loading a module file is
consult the `FileManager` and check that the file size and modification
time match the expectation of the importer. This is done even when such
module file already lives in the `InMemoryModuleCache`. This introduces
racy behavior into the mechanism that explicitly tries to solve race
conditions, and may lead to spurious compilation failures.

This PR identifies implicitly-built module files by a pair of
`DirectoryEntry` of the module cache path and the path suffix
`<context-hash>/<module-name>-<module-map-path-hash>.pcm`. This gives us
canonicalization of the user-provided module cache path without turning
to `FileEntry` for the PCM file. The path suffix is Clang-generated and
is already canonical.

Some tests needed to be updated because the module cache path directory
was also used as an include directory. This PR relies on not caching the
non-existence of the module cache directory in the `FileManager`. When
other parts of Clang are trying to look up the same path and cache its
non-existence, things break. This is probably very specific to some of
our tests and not how users are setting up their compilations.
This test needs a REQUIRES: asserts, as it uses -debug-only.
generate_config_doc() was writing configure.rst directly into the source
tree, which fails when building from a read-only source directory (e.g.
when the source is on a read-only filesystem or in a packaging
environment).

The Sphinx build in libc/docs/CMakeLists.txt already copies static .rst
files from the source tree into the build tree so that generated docs
don't pollute the source directory. Move configure.rst generation to
follow this same pattern by writing to LIBC_BUILD_DIR/docs/ instead of
LIBC_SOURCE_DIR/docs/.

This also removes configure.rst from the checked-in source tree, since
it was fully generated content that was being regenerated on every CMake
configure anyway.
Loop Fusion includes some internal dependence analysis code. Currently
the pass uses both DA and internal code and chooses the best result. The
goal is to use DA for all dependence analysis requirements in fusion.
This patch changes the default value. Removing the code will be done
separately later.
…n is empty (llvm#185063)

When walking operations post-order and erasing blocks, the inner body
block of a nested transform.sequence can be erased while the outer op is
still alive. If printAsOperand is called on the outer block at that
point, it triggers verification, which calls SequenceOp::getEffects ->
getPotentialTopLevelEffects -> getBodyBlock() -> Region::front() on an
empty region, causing an assertion failure in ilist_iterator
('!NodePtr->isKnownSentinel()').

Fix by checking that the body region is non-empty before passing its
front block to detail::getPotentialTopLevelEffects in the
PossibleTopLevelTransformOpTrait.
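
The shape of the guard can be sketched with toy types. `RegionSketch`, `BlockSketch`, and `countBodyOps` are hypothetical and only mirror the idea of checking for an empty region before taking its front block.

```cpp
#include <cassert>
#include <list>

struct BlockSketch {
  int numOps;
};

struct RegionSketch {
  std::list<BlockSketch> blocks;

  bool empty() const { return blocks.empty(); }

  // Like a front() accessor that asserts on an empty container.
  const BlockSketch &front() const {
    assert(!blocks.empty() && "front() on empty region");
    return blocks.front();
  }
};

// Only touches the front block when the region actually has one.
inline int countBodyOps(const RegionSketch &body) {
  if (body.empty())
    return 0;
  return body.front().numOps;
}
```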

Fixes llvm#60213

Assisted-by: Claude Code
After llvm#186855 there was still
one additional part of the pass that assumed it was able to erase
acc.use_device. Thus extend the same solution and add test.
Add --exclude to invert filter behavior, keeping all remarks excluding
those matching the filter.

Pull Request: llvm#187163
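
The inverted filter behavior amounts to flipping the keep condition. A minimal sketch, assuming remarks are plain strings and the filter is a predicate (`filterRemarks` is a hypothetical helper, not the tool's implementation):

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Keeps remarks that match the filter, or, when `exclude` is set, keeps
// exactly those that do not match.
inline std::vector<std::string>
filterRemarks(const std::vector<std::string> &remarks,
              const std::function<bool(const std::string &)> &matches,
              bool exclude) {
  std::vector<std::string> kept;
  for (const std::string &r : remarks)
    if (matches(r) != exclude)
      kept.push_back(r);
  return kept;
}
```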
This patch adds a card that encompasses the whole documented entity
instead of just the description. This helps to visually separate the
documentation which was previously more difficult to distinguish. The
description card is also changed to only show a left border to create
less visual noise within the card.

The light theme colors are also changed slightly to not be completely
white.