merge main into amd-main by z1-cciauto · Pull Request #1801 · ROCm/llvm-project

z1-cciauto · 2026-03-18T17:48:40Z

No description provided.

…187025) When building the full library with -nostdinc, directly including <stdint.h> may pull in host or compiler-provided headers that collide with LLVM-libc's local macro definitions. Switch to using our internal stdint-macros.h when LIBC_FULL_BUILD is enabled. Additionally, declare aligned_alloc with noexcept in C++ to match common C library declarations and avoid fatal type specification mismatches during sysroot builds.

Noticed while working on i512 shift expansion - if we end up with repeated splat args, the compress node is unnecessary as we're just shuffling the same element values
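Here is a minimal scalar model of why the fold is safe. It assumes compress packs the mask-selected lanes of its first operand into the low result lanes and takes the remaining lanes from the passthru operand. Every result lane then comes from one of two splats of the same `x`, so the result is that splat. `compress` and `splat` are hypothetical helpers, not the X86 DAG nodes.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical scalar model of a vector compress: lanes of `src` whose mask
// bit is set are packed, in order, into the low lanes of the result; the
// remaining high lanes keep the value from `passthru` at the same index.
template <typename T, std::size_t N>
std::array<T, N> compress(const std::array<T, N> &src,
                          const std::array<T, N> &passthru,
                          const std::array<bool, N> &mask) {
  std::array<T, N> result = passthru;
  std::size_t out = 0;
  for (std::size_t i = 0; i < N; ++i)
    if (mask[i])
      result[out++] = src[i];
  return result;
}

// Broadcast one scalar into every lane.
template <typename T, std::size_t N>
std::array<T, N> splat(T x) {
  std::array<T, N> v;
  v.fill(x);
  return v;
}
```

Whatever the mask, `compress(splat(x), splat(x), mask)` only ever copies `x` into `x`.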

This is a small refactor of how DeducedType and its derived types are represented. The different deduction kinds are spelled out in an enum, and how this is tracked is simplified, to allow easier profiling. How these types are constructed and canonicalized is also brought more in line with how it works for the other types. This fixes a crash reported here: llvm#167513 (comment)

This makes a few changes that are hard to separate from each other:

* Consider forming minnum/maxnum from setcc+select unprofitable. X86 has instructions specifically for the setcc+select pattern. (Without this, it's hard to get good coverage on this code path.)
* Reduce duplication in the code for forming FMIN/FMAX by using predicate inversion (to make setcc and select operand order match) and predicate invswap (to canonicalize to ordered predicates). This leaves us with just ordered and NaN-less predicates.
* For non-strict non-less predicates, convert them to strict ones via invswap (i.e. swapping the operands of both the setcc and select). Previously these were treated the same as strict predicates, but I believe that's incorrect in terms of signed zero handling.
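The signed-zero concern can be reproduced in plain C++. Under IEEE 754, `+0.0` and `-0.0` compare equal, so a non-strict and a strict "min" idiom pick different operands for that pair. Inverting the predicate and swapping the operands of both the compare and the select gives `!(b >= a) ? b : a`. That is an unordered-less-than test (ULT in LLVM's fcmp terminology), and it agrees with the non-strict form, NaN inputs included. The code below is only an illustration, not the SelectionDAG code.

```cpp
#include <cassert>
#include <cmath>

// Non-strict "min": for (+0.0, -0.0) the compare is true, so it returns +0.0.
double minNonStrict(double a, double b) { return a <= b ? a : b; }

// Strict "min": for (+0.0, -0.0) the compare is false, so it returns -0.0.
double minStrict(double a, double b) { return a < b ? a : b; }

// Inverted and swapped form of minNonStrict: "b ULT a ? b : a".
// !(b >= a) is true when b < a or when either operand is NaN.
double minInvSwapped(double a, double b) { return !(b >= a) ? b : a; }
```

Treating the non-strict form as if it were `minStrict` therefore changes which zero comes back, while the inverted-and-swapped form preserves the result.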

Alive2 proofs: smin pattern: https://alive2.llvm.org/ce/z/-E2Tpc

Adds a newPM pass for AArch64ConditionOptimizer.

- Refactors base logic into an Impl class
- Renames old pass with the "Legacy" suffix
- Adds the new pass manager pass using refactored logic
- Updates tests

Context and motivation in https://llvm.org/docs/NewPassManager.html#status-of-the-new-and-legacy-pass-managers

Use a class instead of an alias, so that CycleInfo can be forward-declared. We can't do the same for Cycle without further changes (a LoopInfo like CRTP scheme).
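This C++ sketch shows why the class matters. A forward declaration such as `class CycleInfo;` only works if `CycleInfo` really is a class, because it would conflict with a later `using CycleInfo = ...;` alias. The template argument `GenericCycleInfo<int>` is a placeholder, and the real LLVM definitions differ.

```cpp
#include <cassert>

// Forward declaration. This is only legal because CycleInfo is defined below
// as a class; with `using CycleInfo = GenericCycleInfo<...>;` the two
// declarations would conflict.
class CycleInfo;

// Code that only needs a reference can be declared against the incomplete type.
int countCyclesVia(const CycleInfo &CI);

// Placeholder for the generic template (hypothetical contents).
template <typename ContextT> class GenericCycleInfo {
public:
  int NumCycles = 0;
};

// A thin class instead of an alias.
class CycleInfo : public GenericCycleInfo<int> {};

int countCyclesVia(const CycleInfo &CI) { return CI.NumCycles; }
```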

Occasionally wait_for_file_on_target will time out on the Green Dragon bots and we're not sure why. I'm adding this logging in an attempt to get more clues as to what's happening when it fails.

…#187037) AMDGPUTargetMachine also had a static method which did the same thing. Remove it so that we have a single source of truth.

…for non-descriptor dummies (llvm#186894) When ignore_tkr(c) is set and the actual argument is an allocatable or pointer (stored as a descriptor), the lowering code was unconditionally returning the descriptor pointer as-is, regardless of whether the dummy argument expects a descriptor. For bind(c) interfaces with assumed-size dummies (e.g., cuFFT), the dummy expects a raw pointer, not a descriptor. Passing the descriptor caused the C function to receive the wrong address, leading to silent data corruption and invalid descriptor crashes at deallocation. The fix adds a check that the early return for ignore_tkr(c) only applies when the dummy type is itself a descriptor type. When the dummy expects a base address, the normal path is taken, which correctly extracts the base address from the descriptor via fir.box_addr.

The variable `matches` may be assigned the address of the block-scope `local_matches`, which is defined in a scope strictly smaller than the scope of `matches`. Towards the end of the function, after `local_matches` has been destroyed, `matches` is accessed, possibly triggering a use-after-free.
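Here is a hypothetical sketch of the bug shape and one way to fix it: hoist the fallback storage into the same scope as the pointer. The function name, the vectors, and the prefix match are illustrative assumptions, not the code that was actually patched.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Buggy shape (for reference):
//   const std::vector<std::string> *matches = &cached;
//   if (cached.empty()) {
//     std::vector<std::string> local_matches = ...;
//     matches = &local_matches;
//   }                        // local_matches is destroyed here
//   use(*matches);           // dangling pointer: use-after-free
//
// Fixed shape: the fallback storage lives as long as the pointer to it.
std::size_t countMatches(const std::vector<std::string> &cached,
                         const std::vector<std::string> &candidates,
                         const std::string &prefix) {
  std::vector<std::string> local_matches; // same scope as `matches`
  const std::vector<std::string> *matches = &cached;
  if (cached.empty()) {
    for (const std::string &c : candidates)
      if (c.rfind(prefix, 0) == 0) // "starts with", pre-C++20 idiom
        local_matches.push_back(c);
    matches = &local_matches;
  }
  return matches->size(); // safe: local_matches is still alive
}
```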

Use `math.round` in lowering of `anint` so we can use passes like `MathToNVVM` to target device code differently.
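Here is a quick C++ check of the rounding rule. It assumes the intended semantics are Fortran ANINT's round-half-away-from-zero. `std::round` follows the same rule, while `std::nearbyint` under the default rounding mode sends halfway cases to the even neighbor. `anintModel` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>

// ANINT-style rounding: nearest whole number, halfway cases away from zero.
// std::round implements exactly this rule, independent of the rounding mode.
double anintModel(double x) { return std::round(x); }
```

The contrast with `std::nearbyint` is why a round-half-away operation, rather than a round-to-even one, is the natural lowering target.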

…) (llvm#186833) This patch adds support for `float8_e3m4` and `float8_e4m3` in `np_to_memref.py` by adding the appropriate ctypes structures. Additionally, it changes the minimum numpy version to 2.1.0 and uses a single ml_dtypes version, 0.5.0.

…InputsDead (llvm#186325) Optimize MakeRegionBranchOpSuccessorInputsDead patterns in `ControlFlowInterfaces.cpp`:

- Add early exit to `computeReachableValuesFromSuccessorInput` when the caller only needs to know if there is exactly one reachable value, avoiding unnecessary traversal.

Assisted-by: Claude Code
Co-authored-by: Yang Bai <yangb@nvidia.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…rand types (llvm#186550) This patch disables vectorizing Cmps with different operand types because we can't form a legal vector. This used to cause an assertion check crash once we attempted to pack the bundle formed by Cmp's operands.

…ice array descriptors (llvm#186945) When an allocatable CUDA Fortran device array is privatized in an OpenMP region, the null descriptor created in the omp.private init region was missing the allocator_idx attribute. This caused a subsequent allocate() inside the parallel region to call malloc instead of cudaMalloc, because the runtime's Descriptor::Allocate() reads allocator_idx from the descriptor to select the allocator. On some systems it caused cudaErrorInvalidValue crashes. This patch sets allocator_idx = 2 (device allocator) on the null fir.embox in handleNullAllocatable() when the symbol has a CUDA device attribute, so that the Fortran runtime correctly calls cudaMalloc for the privatized array.

…shes (llvm#186145) When the bytecode type callback (test-kind=2) calls iface->readType() for every builtin type, complex types like MemRefType could crash because the generated reading code used get<T>(), which asserts on invalid parameters, rather than getChecked<T>(), which returns null gracefully. This change:

- Adds a getChecked<T>() free function helper in BytecodeImplementation.h that calls T::getChecked(emitError, params) (no-context form) when a specific override exists, and otherwise falls back to get<T>(). The with-context second branch is intentionally omitted to avoid instantiating StorageUserBase::getChecked<Args> for types that only inherit the base template (e.g. ArrayAttr), which would require complete storage types unavailable in the bytecode reading TU.
- Updates the BytecodeBase.td default cBuilder for DialectAttribute/DialectType to use getChecked<> instead of get<>.
- Updates all custom cBuilder strings in BuiltinDialectBytecode.td.
- Updates the no-args codegen case in BytecodeDialectGen.cpp.
- Adds a regression test in bytecode_callback_with_custom_type.mlir.

Fixes llvm#128308

Assisted-by: Claude Code

…ources (llvm#187063) Allows us to use getMaskNode to canonicalize predicate masks in big shift lowering

…186653) Replace the monolithic cir.binop.overflow operation and its BinOpOverflowKind enum with three individual operations: cir.add.overflow, cir.sub.overflow, and cir.mul.overflow. This follows the same pattern used when BinOp and UnaryOp were previously split into per-operation ops (cir.add, cir.sub, etc.), eliminating enum dispatch and enabling per-op traits like Commutative.

This PR removes a hard-coded SM (compute capability) number from an MLIR test, so the test will not need to be updated in the future. The default value will come from `NVVMOps.td`.

Allow `--function-order` to be combined with `--reorder-functions` algorithms. Functions listed in the order file are pinned first (indices 0..N-1), then the selected algorithm orders remaining functions starting at index N.
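The ordering rule described above can be sketched as follows. The sketch assumes the remaining functions are ordered by an arbitrary comparator that stands in for the selected algorithm. `hybridOrder` and its parameters are hypothetical. Unknown and duplicate names in the order file are simply skipped here, which may not match the real tool's handling.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical hybrid ordering: functions named in the order file take
// indices 0..N-1 in their listed order; every other function is then ordered
// by `algo` (a stand-in for the selected algorithm) starting at index N.
std::vector<std::string> hybridOrder(
    const std::vector<std::string> &orderFile,
    const std::vector<std::string> &allFunctions,
    const std::function<bool(const std::string &, const std::string &)> &algo) {
  std::vector<std::string> result;
  std::unordered_set<std::string> pinned;
  std::unordered_set<std::string> known(allFunctions.begin(), allFunctions.end());
  for (const std::string &F : orderFile)
    if (known.count(F) && pinned.insert(F).second) // skip unknown/duplicate names
      result.push_back(F);
  std::vector<std::string> rest;
  for (const std::string &F : allFunctions)
    if (!pinned.count(F))
      rest.push_back(F);
  std::stable_sort(rest.begin(), rest.end(), algo);
  result.insert(result.end(), rest.begin(), rest.end());
  return result;
}
```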

… rules (llvm#186974) Update `mlir/test/Target/SPIRV/struct.mlir` so it remains valid under current SPIR-V validator checks in Logical addressing mode. The recursive struct cases were emitting pointer-allocating globals in storage classes rejected by `spirv-val`. Adjust those globals to `Private` while keeping recursive member pointers in `StorageBuffer`, and update the expected roundtrip types accordingly.

Also add the missing variable-pointer requirements to the module VCE:

- capability: `VariablePointers`
- extension: `SPV_KHR_variable_pointers`

Signed-off-by: Davide Grohmann <davide.grohmann@arm.com>

…)" (llvm#187079) Reverts llvm#187025. Fails on the openmp bot: https://lab.llvm.org/buildbot/#/builders/10/builds/24743 ('INT64_MIN' macro redefined when the Clang-provided <stdint.h> is used). Fails on the RISC-V-32 bot: https://lab.llvm.org/buildbot/#/builders/196/builds/17067 due to the MPFRNumber constructor not picking the right overload for a uint32_t argument.

) This patch initiates the refactoring of Linux syscalls as described in the RFC (https://discourse.llvm.org/t/rfc-linux-syscall-cleanup/87248/). It introduces new infrastructure in `src/__support/OSUtil/linux/syscall_wrappers/` to house header-only syscall wrappers. These wrappers use `ErrorOr` to provide a consistent, type-safe interface for error handling across the library, standardizing how syscall return values are converted into errno-compatible Error objects.

Summary of changes:

- Created the `syscall_wrappers` directory and added `close.h`, `read.h`, `write.h`, and `open.h`.
- Moved the existing `getrandom.h` into the new `syscall_wrappers` directory and updated its callers (including HashTable/randomness.h).
- Refactored core entrypoints (`close`, `read`, `write`, `open`) to use the new wrappers, removing direct `syscall_impl` logic and manual errno setting.
- Updated `shm_open.cpp` to use the new `open` wrapper.
- Cleaned up `OSUtil/linux/fcntl.cpp` by removing redundant internal implementations of `open` and `close`.
- Added a developer guide in `docs/dev/syscall_wrapper_refactor.rst` outlining the established pattern for future migrations.

---------

Co-authored-by: Michael Jones <michaelrj@google.com>
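Here is a minimal sketch of the wrapper pattern. It assumes the usual Linux convention that a raw syscall returns `-errno` (a value in [-4095, -1]) on failure. `ErrorOrSketch`, `Error`, `fromRawSyscall` and `entrypointResult` are hypothetical stand-ins built on `std::variant`, not LLVM libc's `ErrorOr` or its actual wrappers.

```cpp
#include <cassert>
#include <cerrno>
#include <variant>

// Hypothetical error type carrying an errno value.
struct Error {
  int errnum;
};

// Hypothetical stand-in for an error-or-value type.
template <typename T> using ErrorOrSketch = std::variant<T, Error>;

// Convert the raw "-errno on failure" convention in one place.
inline ErrorOrSketch<long> fromRawSyscall(long ret) {
  if (ret < 0 && ret >= -4095)
    return Error{static_cast<int>(-ret)};
  return ret;
}

// Entrypoint shape: on error, set errno and return -1; otherwise pass the
// value through.
inline int entrypointResult(long raw) {
  ErrorOrSketch<long> r = fromRawSyscall(raw);
  if (const Error *e = std::get_if<Error>(&r)) {
    errno = e->errnum;
    return -1;
  }
  return static_cast<int>(std::get<long>(r));
}
```

Keeping the sign-and-range check in a single helper is the point of the pattern. Individual entrypoints only decide what to do with an error, not how to detect one.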

…185772) The `__ob_trap` type specifier can be used to trap (or warn with sanitizers) when overflow or truncation occurs on the specified type. There was a gap in coverage for this with the `-fsanitize=implicit-integer-sign-change` sanitizer. Fix this by carrying around `__ob_trap` information through `EmitIntegerSignChange()` which allows us to properly trap or warn.

Add `spirv.GroupNonUniformBroadcastFirst` op and tests.

Build log reference: https://github.com/valord577/nativepkgs/actions/runs/22346192467/job/64661318198

…lvm#185991)

…age (llvm#185835) Instead of heap-allocating an `InterpFrame` and then immediately heap-allocating more space for the local variables, do only one heap allocation and use tail storage for the local variables. We already know how many bytes we need for the tail storage, after all. This also makes `InterpFrame` a little smaller since we don't need to save an explicit pointer for the local variable memory.

For an artificial test case doing lots of function calls with local variables like:

```c++
constexpr int plus(int a, int b) {
  int x = a;
  int y = b;
  int z = x + y;
  return z;
}
constexpr int minus(int a, int b) {
  int x = a;
  int y = b;
  int z = x - y;
  return z;
}
constexpr int foo() {
  int a = 0;
  for (unsigned I = 0; I != 1'000'000; ++I) {
    int b = I;
    a = plus(a, b);
    a = minus(a, I);
  }
  return a;
}
static_assert(foo() == 0);
```

this saves us over 6%.

We also eliminate the per-argument `Block` heap allocation on the first pointer access to an argument the same way. To make this work, we change the param ops to use the parameter index instead of the offset.
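Here is a minimal sketch of tail storage in a single allocation. It assumes a header object followed directly by a run of raw local-variable bytes. `FrameSketch` is hypothetical and much simpler than `InterpFrame`. Locals are accessed through `memcpy`, so the example does not depend on their alignment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>

// Hypothetical frame with tail-allocated locals: one heap allocation holds
// the object followed by `LocalBytes` bytes, so no second allocation and no
// stored pointer to it are needed.
class FrameSketch {
public:
  static FrameSketch *create(std::size_t LocalBytes) {
    void *Mem = ::operator new(sizeof(FrameSketch) + LocalBytes);
    return new (Mem) FrameSketch(LocalBytes);
  }
  static void destroy(FrameSketch *F) {
    F->~FrameSketch();
    ::operator delete(F);
  }
  // The locals begin immediately after the object itself.
  char *locals() { return reinterpret_cast<char *>(this + 1); }
  std::size_t localBytes() const { return LocalBytes; }

private:
  explicit FrameSketch(std::size_t N) : LocalBytes(N) {}
  std::size_t LocalBytes;
};
```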

Refactor filter into a FilterTool class that can process multiple input files sequentially and emits the remarks into a single output file. Pull Request: llvm#187162

…#187145) Add a LoadCompilationPrefixMap() helper in SymbolFile::FindPlugin that walks up from the symbol file's directory looking for a compilation-prefix-map.json file. When found, each key→value entry is applied to the module's source path mapping list, allowing LLDB to resolve source file paths that were rewritten by -fdebug-prefix-map at build time without requiring manual `settings set target.source-map`. The JSON file format maps fake paths (as written into debug info) back to their real on-disk counterparts: { "/fake/srcdir": "/real/srcdir" } Directory results are cached so the filesystem is walked at most once per unique directory across all modules loaded in a session. Also apply the module's source path remappings in SymbolFileDWARFDebugMap::ParseCompileUnitAtIndex when constructing compile units from N_SO stabs. This mirrors what MakeAbsoluteAndRemap does for the dSYM case so that fake paths baked into the debug map are transparently resolved to real paths. rdar://84824567 Assisted-By: Claude
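The walk-up-and-cache lookup can be sketched like this. The sketch assumes the search starts at the symbol file's directory and visits each ancestor up to the root. The existence check is injected as a callback so the logic can be exercised without touching the disk. `PrefixMapLocator` and its members are hypothetical, not LLDB API.

```cpp
#include <cassert>
#include <filesystem>
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <utility>

namespace fs = std::filesystem;

// Hypothetical locator: look for compilation-prefix-map.json in `Dir` and
// each of its ancestors, memoizing the answer per starting directory.
class PrefixMapLocator {
public:
  explicit PrefixMapLocator(std::function<bool(const fs::path &)> Exists)
      : Exists(std::move(Exists)) {}

  std::optional<fs::path> find(const fs::path &Dir) {
    auto It = Cache.find(Dir);
    if (It != Cache.end())
      return It->second; // cached result, including "not found"
    std::optional<fs::path> Found;
    for (fs::path P = Dir;; P = P.parent_path()) {
      fs::path Candidate = P / "compilation-prefix-map.json";
      if (Exists(Candidate)) {
        Found = Candidate;
        break;
      }
      if (P == P.parent_path()) // reached the root
        break;
    }
    Cache[Dir] = Found;
    ++Lookups;
    return Found;
  }

  int Lookups = 0; // number of uncached searches performed

private:
  std::function<bool(const fs::path &)> Exists;
  std::map<fs::path, std::optional<fs::path>> Cache;
};
```

Caching the negative result as well as the positive one is what keeps repeated module loads from re-walking the same directories.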

llvm#181572) Update VPReplicateRecipe::computeCost to compute the cost for stores to invariant addresses only masked by the header mask. This matches the legacy cost model logic, but it is slightly odd that the legacy cost model only seems to do this for stores predicated by the header mask (i.e. tail-folding and not executed conditionally otherwise). This is probably something we want to re-evaluate eventually. PR: llvm#181572

…ion (llvm#186769) This addresses one of the limitations of llvm#174726 by directly selecting `v_cvt_[u16/i16]_f16` instructions for conversion between 16-bit types, as they already handle saturation internally.

If the routine op only has one of the string or id attributes, the API was crashing since it was attempting to search in both. Guard each search individually.

Rename to riscv_add_like to distinguish it from a generic pattern in a future patch. Ideally we'd make the RISC-V patterns generic, but they're currently using value tracking and I'm not sure we want to do that generically.

Align it with the style of `LoopVectorize/VPlan/predicator.ll`: * Move ascii-graphs close to IR to avoid scrolling through CHECKs when comparing the picture and actual IR * Rename `%cN` to ensure that `bbN` branches on `%cN`

…p. (llvm#187189) If N was changed on the previous loop iteration, we need the handle to point at the new N. Fixes llvm#186969.

POSIX.1-2024 defines a header called `endian.h`, which contains macros and helpers for handling byte order conversions. Unfortunately, it is not available on all platforms. However, LLVM libc has a useful endian.h implementation that is essentially only casts and builtins. This PR draws on that implementation to add a clang header so applications on platforms without it can make use of this header. The clang header forwards to the system header if available, so existing usages are not affected.
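Here is a minimal sketch of the "casts and builtins" approach for one conversion. It assumes GCC or Clang, which provide the predefined `__BYTE_ORDER__` macros and `__builtin_bswap32`. `toBigEndian32` is a hypothetical name modeled on `htobe32`, not the header's actual contents.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Host-to-big-endian conversion for 32-bit values, in the style of htobe32.
// On little-endian hosts the bytes are reversed; on big-endian hosts the
// value is already in the right order.
inline std::uint32_t toBigEndian32(std::uint32_t x) {
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
  return __builtin_bswap32(x);
#else
  return x;
#endif
}
```

The result is a no-op or a single byte swap, which is why such a header can be entirely inline.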

…ed (llvm#183900) __asan_region_is_poisoned() uses an exclusive end address (end = beg + size) to validate the region [beg, end) and to compute the aligned inner shadow region. This causes correctness issues near the upper boundary of a memory range and can trigger address space overflow on 32-bit targets.

1. Incorrect handling of the last byte of a memory range. The implementation checks AddrIsInMem(end) instead of the last application byte (end - 1). For regions ending at the last byte of Low/Mid/HighMem (e.g. __asan_region_is_poisoned(kHighMemEnd, 1)), this returns end (kHighMemEnd + 1) instead of the original pointer. This behavior is inconsistent with the function’s semantics and with __asan_address_is_poisoned().
2. Address space overflow and invalid shadow range. If a region ends at the top of the virtual address space (kHighMemEnd), e.g. on 32-bit targets, end = beg + size can wrap to 0. This violates the invariant beg < end and can trigger the CHECK failure. Additionally, overflow in the RoundUpTo alignment computation for aligned_b can produce an invalid shadow region spanning LowShadow to HighShadow across the ShadowGap, leading mem_is_zero() to access unmapped memory and crash.

Fix by switching to an inclusive last byte: last = beg + size - 1. All checks are now performed on beg and last, and the aligned inner shadow region is computed from [beg, last]. An additional guard on aligned_b skips the shadow mapping if aligned_b has wrapped (in this case the aligned inner region is also empty and doesn't require a shadow scan via mem_is_zero()).

This fixes incorrect return values at memory range ends and prevents overflow-related crashes on 32-bit targets. The test is extended to cover these boundary cases.

---------

Co-authored-by: Vitaly Buka <vitalybuka@gmail.com>
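The 32-bit wraparound is easy to see with plain `uint32_t` arithmetic. This sketch models addresses as `uint32_t`. `exclusiveEnd`, `inclusiveLast` and `regionInRange` are hypothetical helpers, not sanitizer runtime functions.

```cpp
#include <cassert>
#include <cstdint>

using uptr32 = std::uint32_t; // model a 32-bit address space

// Exclusive end: wraps to 0 when the region touches the top of the space.
inline uptr32 exclusiveEnd(uptr32 beg, uptr32 size) { return beg + size; }

// Inclusive last byte: stays representable for any size >= 1 that fits.
inline uptr32 inclusiveLast(uptr32 beg, uptr32 size) { return beg + size - 1; }

// Hypothetical check that [beg, beg + size - 1] lies inside [lo, hi], both
// ends inclusive; `beg <= last` rejects regions that wrap past the top.
inline bool regionInRange(uptr32 beg, uptr32 size, uptr32 lo, uptr32 hi) {
  uptr32 last = inclusiveLast(beg, size);
  return beg <= last && beg >= lo && last <= hi;
}
```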

This reverts commit 6261cb4 to try to fix compile time regressions

…#185765) This PR changes how `ModuleManager` deduplicates module files. Previously, `ModuleManager` used `FileEntry` to assign a unique identity to module files. This works fine for explicitly-built modules because they don't change during the lifetime of a single Clang instance. For implicitly-built modules, however, there are two issues:

1. The `FileEntry` objects are deduplicated by `FileManager` based on the inode number. Some file systems reuse inode numbers of previously removed files. Because implicitly-built module files are rapidly removed and created, this deduplication breaks, and compilations may fail spuriously when inode numbers are recycled during the lifetime of a single Clang instance.
2. The first thing `ModuleManager` does when loading a module file is consult the `FileManager` and check that the file size and modification time match the expectation of the importer. This is done even when the module file already lives in the `InMemoryModuleCache`. This introduces racy behavior into the mechanism that explicitly tries to solve race conditions, and may lead to spurious compilation failures.

This PR identifies implicitly-built module files by a pair of the `DirectoryEntry` of the module cache path and the path suffix `<context-hash>/<module-name>-<module-map-path-hash>.pcm`. This gives us canonicalization of the user-provided module cache path without turning to `FileEntry` for the PCM file. The path suffix is Clang-generated and is already canonical.

Some tests needed to be updated because the module cache path directory was also used as an include directory.

This PR relies on not caching the non-existence of the module cache directory in the `FileManager`. When other parts of Clang look up the same path and cache its non-existence, things break. This is probably very specific to some of our tests and not how users set up their compilations.

This test needs a REQUIRES: asserts, as it uses -debug-only.

generate_config_doc() was writing configure.rst directly into the source tree, which fails when building from a read-only source directory (e.g. when the source is on a read-only filesystem or in a packaging environment). The Sphinx build in libc/docs/CMakeLists.txt already copies static .rst files from the source tree into the build tree so that generated docs don't pollute the source directory. Move configure.rst generation to follow this same pattern by writing to LIBC_BUILD_DIR/docs/ instead of LIBC_SOURCE_DIR/docs/. This also removes configure.rst from the checked-in source tree, since it was fully generated content that was being regenerated on every CMake configure anyway.

Loop Fusion includes some internal dependence analysis code. Currently the pass uses both DA and internal code and chooses the best result. The goal is to use DA for all dependence analysis requirements in fusion. This patch changes the default value. Removing the code will be done separately later.

…n is empty (llvm#185063) When walking operations post-order and erasing blocks, the inner body block of a nested transform.sequence can be erased while the outer op is still alive. If printAsOperand is called on the outer block at that point, it triggers verification, which calls SequenceOp::getEffects -> getPotentialTopLevelEffects -> getBodyBlock() -> Region::front() on an empty region, causing an assertion failure in ilist_iterator (`!NodePtr->isKnownSentinel()`). Fix by checking that the body region is non-empty before passing its front block to detail::getPotentialTopLevelEffects in the PossibleTopLevelTransformOpTrait. Fixes llvm#60213

Assisted-by: Claude Code

After llvm#186855 there was still one additional part of the pass that assumed it was able to erase acc.use_device. Thus extend the same solution and add test.

Add --exclude to invert filter behavior, keeping all remarks except those matching the filter. Pull Request: llvm#187163
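Here is a hypothetical sketch of the `--exclude` semantics combined with multi-file input. The same predicate selects remarks, and the flag flips whether matches are kept or dropped. Remarks from all inputs are appended to a single output. Remarks are modeled as plain strings, and `filterRemarks` is not the `llvm-remarkutil` implementation.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Keep a remark when matches(r) differs from `exclude`:
//   exclude == false -> keep matching remarks
//   exclude == true  -> keep everything except matching remarks
std::vector<std::string> filterRemarks(
    const std::vector<std::vector<std::string>> &inputs,
    const std::function<bool(const std::string &)> &matches, bool exclude) {
  std::vector<std::string> out; // single output for all input files
  for (const auto &file : inputs)
    for (const std::string &r : file)
      if (matches(r) != exclude)
        out.push_back(r);
  return out;
}
```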

This patch adds a card that encompasses the whole documented entity instead of just the description. This helps to visually separate the documentation which was previously more difficult to distinguish. The description card is also changed to only show a left border to create less visual noise within the card. The light theme colors are also changed slightly to not be completely white.

z1-cciauto · 2026-03-18T17:52:06Z

PSDB Build Link: http://mlse-bdc-20dd129:8065/#/builders/11/builds/59

kaladron and others added 30 commits March 17, 2026 16:12

[X86] Fold compress(splat(x),splat(x),mask) -> splat(x) (llvm#187042)

abd5b69

Noticed while working on i512 shift expansion - if we end up with repeated splat args, the compress node is unnecessary as we're just shuffling the same element values

[libc++] Add scripts defining two LNT runners for libc++ (llvm#187050)

fec11e3

[InstCombine] Recognize non-negative subtraction patterns (llvm#182597)

6f68daa

Alive2 proofs: smin pattern: https://alive2.llvm.org/ce/z/-E2Tpc

[gn] port 55b271d

b04b9e5

[CycleInfo] Support forward declarations (llvm#187029)

810ba55

Use a class instead of an alias, so that CycleInfo can be forward-declared. We can't do the same for Cycle without further changes (a LoopInfo like CRTP scheme).

[lldb] Add additional logging to wait_for_file_on_target (llvm#186915)

adf458c

Occasionally wait_for_file_on_target will time out on the Green Dragon bots and we're not sure why. I'm adding this logging in an attempt to get more clues as to what's happening when it fails.

[AMDGPU] Standardize on using AMDGPU::getNullPointerValue. NFC. (llvm…

79d1a2c

…#187037) AMDGPUTargetMachine also had a static method which did the same thing. Remove it so that we have a single source of truth.

[flang] Lower anint with math.round (llvm#186039)

1800651

Use `math.round` in lowering of `anint` so we can use passes like `MathToNVVM` to target device code differently.

[X86] getMaskNode - perform pre-truncation of oversized scalar mask s…

f28ef68

…ources (llvm#187063) Allows us to use getMaskNode to canonicalize predicate masks in big shift lowering

[green-dragon] fix Python and Swig flags (llvm#187052)

b5614bc

Removed Hardcoded SM Number from Mlir Test (llvm#186917)

0769dde

This PR removes a hard-coded SM (compute capability) number from an MLIR test, so the test will not need to be updated in the future. The default value will come from `NVVMOps.td`.

Add hybrid function ordering support (llvm#186003)

037c209

Allow `--function-order` to be combined with `--reorder-functions` algorithms. Functions listed in the order file are pinned first (indices 0..N-1), then the selected algorithm orders remaining functions starting at index N.

[mlir][spirv] Add spirv.GroupNonUniformBroadcastFirst Op (llvm#185818)

f0dfa36

Add `spirv.GroupNonUniformBroadcastFirst` op and tests.

[lldb] Fix build on Linux when SEGV_PKUERR is undefined (llvm#186963)

7477045

Build log reference: https://github.com/valord577/nativepkgs/actions/runs/22346192467/job/64661318198

vangthao95 and others added 22 commits March 18, 2026 09:00

AMDGPU/GlobalISel: RegBankLegalize rules for ds_add/sub_gs_reg_rtn (l…

f609344

…lvm#185991)

[llvm-remarkutil] filter: Support multiple input files (llvm#187162)

9418cdb

Refactor filter into a FilterTool class that can process multiple input files sequentially and emits the remarks into a single output file. Pull Request: llvm#187162

[AMDGPU] Use native instructions for f16 to u16/i16 saturated convers…

55cee50

…ion (llvm#186769) This addresses one of the limitations of llvm#174726 by directly selecting `v_cvt_[u16/i16]_f16` instructions for conversion between 16-bit types, as they already handle saturation internally.

[mlir][acc] Fix bindNameValue for RoutineOp (llvm#187307)

16585af

If the routine op only has one of the string or id attributes, the API was crashing since it was attempting to search in both. Guard each search individually.

[RISCV] Rename add_like pattern -> riscv_add_like (llvm#187306)

7a9299f

Rename to riscv_add_like to distinguish it from a generic pattern in a future patch. Ideally we'd make the RISC-V patterns generic, but they're currently using value tracking and I'm not sure we want to do that generically.

[NFC] Update LoopVectorize/predicator.ll test (llvm#187125)

9b0c2a1

Align it with the style of `LoopVectorize/VPlan/predicator.ll`: * Move ascii-graphs close to IR to avoid scrolling through CHECKs when comparing the picture and actual IR * Rename `%cN` to ensure that `bbN` branches on `%cN`

[DAGCombiner] Move the XORHandle in rebuildSetCC inside the while loo…

9dd2e37

…p. (llvm#187189) If N was changed on the previous loop iteration, we need the handle to point at the new N. Fixes llvm#186969.

Revert "[SLP] Loop aware cost model/tree building"

4e500bd

This reverts commit 6261cb4 to try to fix compile time regressions

[NFC] Fix mve-reg-pressure-spills.ll test (llvm#187316)

81d3f04

This test needs a REQUIRES: asserts, as it uses -debug-only.

[flang][acc] Handle deduplicated use_device (part 2) (llvm#187305)

7166468

After llvm#186855 there was still one additional part of the pass that assumed it was able to erase acc.use_device. Thus extend the same solution and add test.

[llvm-remarkutil] filter: Add --exclude flag (llvm#187163)

d54da68

Add --exclude to invert filter behavior, keeping all remarks except those matching the filter. Pull Request: llvm#187163

merge main into amd-main

b0eb19e

z1-cciauto requested review from antiagainst, fabianmcg, krzysz00, kuhar, nicolasvasilache and stellaraccident as code owners March 18, 2026 17:48

z1-cciauto requested a review from a team March 18, 2026 17:48

z1-cciauto wants to merge 210 commits into amd-main from upstream_merge_202603181348

20 participants
