Backport tile fixes 34#9542
Conversation
* [Tile] Mark alignment helpers as unsupported in tile mode Currently tile does not support `__builtin_assume_aligned` Previously we would just disable the whole codepath but with added support for `__builtin_is_constant_evaluated` it became active again. Rather than disabling it for all of CCCL with tile mode, we mark those functions that use the builtin as `_CCCL_HOST_DEVICE` * [Tile] Fix various tests * [Tile] Do not compile out device only function it is otherwise not visible
📝 WalkthroughSummary by CodeRabbit
Walkthrough
ChangesTile Compilation Support for Memory, Alignment, and Test Infrastructure
Possibly related PRs
Suggested labels
Suggested reviewers
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
libcudacxx/test/libcudacxx/std/atomics/atomics.types.generic/integral/16b_integral_ref.pass.cpp (1)
13-15: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick winsuggestion: Use
UNSUPPORTED: enable-tileand reference nvbug6327166 for consistency with the other tile-related test annotations in this layer (aligned_accessor.pass.cpp, assume_aligned.pass.cpp, align.pass.cpp). Those files document the specific ICE: "call to unknown tile builtin function!"If this test hits a different error ("asm statement is unsupported"), clarify whether it's the same underlying bug or a distinct issue.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: f98f4d3a-5a71-492d-a59b-1396923fc269
📒 Files selected for processing (29)
libcudacxx/include/cuda/__memory/align_down.hlibcudacxx/include/cuda/__memory/align_up.hlibcudacxx/include/cuda/__memory/ptr_rebind.hlibcudacxx/include/cuda/std/__cccl/compiler.hlibcudacxx/include/cuda/std/__mdspan/aligned_accessor.hlibcudacxx/include/cuda/std/__memory/align.hlibcudacxx/include/cuda/std/__memory/assume_aligned.hlibcudacxx/include/cuda/std/__memory/runtime_assume_aligned.hlibcudacxx/test/libcudacxx/cuda/functional/proclaim_return_type.pass.cpplibcudacxx/test/libcudacxx/cuda/memory/align_down.pass.cpplibcudacxx/test/libcudacxx/cuda/memory/align_up.pass.cpplibcudacxx/test/libcudacxx/cuda/memory/ptr_rebind.pass.cpplibcudacxx/test/libcudacxx/cuda/utilities/expected/device_only_types.pass.cpplibcudacxx/test/libcudacxx/cuda/utilities/expected/expected.void/device_only_types.pass.cpplibcudacxx/test/libcudacxx/cuda/utilities/optional/device_only_types.pass.cpplibcudacxx/test/libcudacxx/cuda/utilities/tuple/device_only_types.pass.cpplibcudacxx/test/libcudacxx/cuda/utilities/unexpected/device_only_types.pass.cpplibcudacxx/test/libcudacxx/cuda/utilities/utility/pair/device_only_types.pass.cpplibcudacxx/test/libcudacxx/cuda/utilities/variant/device_only_types.pass.cpplibcudacxx/test/libcudacxx/libcxx/macros/extended_data_types.fp8.fail.cpplibcudacxx/test/libcudacxx/libcxx/numerics/floating.point/device_fp128_functions.pass.cpplibcudacxx/test/libcudacxx/std/algorithms/alg.modifying/alg.copy/copy_backward.pass.cpplibcudacxx/test/libcudacxx/std/algorithms/alg.modifying/alg.copy/copy_n.pass.cpplibcudacxx/test/libcudacxx/std/algorithms/alg.modifying/alg.copy/copy_rand.pass.cpplibcudacxx/test/libcudacxx/std/atomics/atomics.types.generic/integral/16b_integral_ref.pass.cpplibcudacxx/test/libcudacxx/std/containers/views/mdspan/mdspan.aligned_accessor/aligned_accessor.pass.cpplibcudacxx/test/libcudacxx/std/utilities/memory/ptr.align/align.pass.cpplibcudacxx/test/libcudacxx/std/utilities/memory/ptr.align/assume_aligned.pass.cpplibcudacxx/test/libcudacxx/std/utilities/memory/ptr.align/assume_aligned.runfail.cpp
| // UNSUPPORTED: enable-tile | ||
| // nvbug6327166: error: Internal Compiler Error (tile codegen): "call to unknown tile builtin function!" | ||
|
|
||
| #include <cuda/std/memory> |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify whether the internal header is still functional in non-tile builds
# and whether <cuda/std/memory> transitively includes assume_aligned
rg -n "include.*assume_aligned" libcudacxx/include/cuda/std/memory
# Check if the internal header is still present and functional
fd -t f "assume_aligned.h" libcudacxx/include/cuda/std/__memory/Repository: NVIDIA/cccl
Length of output: 220
important: The change from <cuda/std/__memory/assume_aligned.h> to <cuda/std/memory> violates the guideline requiring the most precise header available. The internal header still exists and is transitively included by the umbrella header, so using the precise internal header is preferred. Either revert to the internal header or provide technical justification for the broader header (e.g., if testing the public API stability is the intent).
Source: Coding guidelines
This comment has been minimized.
This comment has been minimized.
…VIDIA#9313) We were accidentally always testing `__tile__ __device__` functions, which avoided some issues that have been fixed in the ToT compiler. We need to ensure that we properly test `__device__` only functions in a `__device__ __tile__` feature
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (1)
libcudacxx/test/libcudacxx/std/utilities/function.objects/func.invoke/invoke.pass.cpp (1)
401-404: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick winsuggestion: Align callable return values with the asserted expectation. Line 403 and Line 419 return
bool{}(false), while Line 410 and Line 426 assert a truthycuda::std::invoke(...)result. Returningtruehere avoids a latent runtime-failure path if these kernels are executed.Also applies to: 410-410, 417-420, 426-426
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: df7fa037-d24d-49dc-939d-7ae53b22f68e
📒 Files selected for processing (55)
libcudacxx/include/cuda/std/__cccl/visibility.hlibcudacxx/test/libcudacxx/cuda/containers/views/mdspan/device_mdspan/index_operator.pass.cpplibcudacxx/test/libcudacxx/cuda/containers/views/mdspan/shared_mem_mdspan/assign.pass.cpplibcudacxx/test/libcudacxx/cuda/containers/views/mdspan/shared_mem_mdspan/conversion.pass.cpplibcudacxx/test/libcudacxx/cuda/containers/views/mdspan/shared_mem_mdspan/ctor.copy.pass.cpplibcudacxx/test/libcudacxx/cuda/containers/views/mdspan/shared_mem_mdspan/ctor.default.pass.cpplibcudacxx/test/libcudacxx/cuda/containers/views/mdspan/shared_mem_mdspan/ctor.dh_array.pass.cpplibcudacxx/test/libcudacxx/cuda/containers/views/mdspan/shared_mem_mdspan/ctor.dh_extents.pass.cpplibcudacxx/test/libcudacxx/cuda/containers/views/mdspan/shared_mem_mdspan/ctor.dh_integers.pass.cpplibcudacxx/test/libcudacxx/cuda/containers/views/mdspan/shared_mem_mdspan/ctor.dh_map.pass.cpplibcudacxx/test/libcudacxx/cuda/containers/views/mdspan/shared_mem_mdspan/ctor.dh_map_acc.pass.cpplibcudacxx/test/libcudacxx/cuda/containers/views/mdspan/shared_mem_mdspan/ctor.dh_span.pass.cpplibcudacxx/test/libcudacxx/cuda/containers/views/mdspan/shared_mem_mdspan/ctor.move.pass.cpplibcudacxx/test/libcudacxx/cuda/containers/views/mdspan/shared_mem_mdspan/index_operator.pass.cpplibcudacxx/test/libcudacxx/cuda/containers/views/mdspan/shared_mem_mdspan/move.pass.cpplibcudacxx/test/libcudacxx/cuda/containers/views/mdspan/shared_mem_mdspan/swap.pass.cpplibcudacxx/test/libcudacxx/cuda/utilities/expected/device_only_types.pass.cpplibcudacxx/test/libcudacxx/cuda/utilities/expected/expected.void/device_only_types.pass.cpplibcudacxx/test/libcudacxx/cuda/utilities/expected/expected.void/tile_only_types.pass.cpplibcudacxx/test/libcudacxx/cuda/utilities/expected/tile_only_types.pass.cpplibcudacxx/test/libcudacxx/cuda/utilities/optional/device_only_types.pass.cpplibcudacxx/test/libcudacxx/cuda/utilities/optional/tile_only_types.pass.cpplibcudacxx/test/libcudacxx/cuda/utilities/tuple/device_only_types.pass.cpplibcudacxx/test/libcudacxx/cuda/utilities/tuple/tile_only_types.pass.cpplibcudacxx/test/libcudacxx/cuda/utilities/unexpected/device_only_types.pass.cpplibcudacxx/test/libcudacxx/cuda/utilities/unexpected/tile_only_types.pass.cpplibcudacxx/test/libcudacxx/cuda/utilities/utility/pair/device_only_types.pass.cpplibcudacxx/test/libcudacxx/cuda/utilities/utility/pair/tile_only_types.pass.cpplibcudacxx/test/libcudacxx/cuda/utilities/variant/device_only_types.pass.cpplibcudacxx/test/libcudacxx/cuda/utilities/variant/tile_only_types.pass.cpplibcudacxx/test/libcudacxx/std/utilities/function.objects/arithmetic.operations/divides.pass.cpplibcudacxx/test/libcudacxx/std/utilities/function.objects/arithmetic.operations/minus.pass.cpplibcudacxx/test/libcudacxx/std/utilities/function.objects/arithmetic.operations/modulus.pass.cpplibcudacxx/test/libcudacxx/std/utilities/function.objects/arithmetic.operations/multiplies.pass.cpplibcudacxx/test/libcudacxx/std/utilities/function.objects/arithmetic.operations/negate.pass.cpplibcudacxx/test/libcudacxx/std/utilities/function.objects/arithmetic.operations/plus.pass.cpplibcudacxx/test/libcudacxx/std/utilities/function.objects/bitwise.operations/bit_and.pass.cpplibcudacxx/test/libcudacxx/std/utilities/function.objects/bitwise.operations/bit_not.pass.cpplibcudacxx/test/libcudacxx/std/utilities/function.objects/bitwise.operations/bit_or.pass.cpplibcudacxx/test/libcudacxx/std/utilities/function.objects/bitwise.operations/bit_xor.pass.cpplibcudacxx/test/libcudacxx/std/utilities/function.objects/comparisons/equal_to.pass.cpplibcudacxx/test/libcudacxx/std/utilities/function.objects/comparisons/greater.pass.cpplibcudacxx/test/libcudacxx/std/utilities/function.objects/comparisons/greater_equal.pass.cpplibcudacxx/test/libcudacxx/std/utilities/function.objects/comparisons/less.pass.cpplibcudacxx/test/libcudacxx/std/utilities/function.objects/comparisons/less_equal.pass.cpplibcudacxx/test/libcudacxx/std/utilities/function.objects/comparisons/not_equal_to.pass.cpplibcudacxx/test/libcudacxx/std/utilities/function.objects/func.invoke/invoke.pass.cpplibcudacxx/test/libcudacxx/std/utilities/function.objects/logical.operations/logical_and.pass.cpplibcudacxx/test/libcudacxx/std/utilities/function.objects/logical.operations/logical_not.pass.cpplibcudacxx/test/libcudacxx/std/utilities/function.objects/logical.operations/logical_or.pass.cpplibcudacxx/test/libcudacxx/std/utilities/function.objects/negators/binary_negate.pass.cpplibcudacxx/test/libcudacxx/std/utilities/function.objects/negators/unary_negate.pass.cpplibcudacxx/test/libcudacxx/std/utilities/time/time.traits/time.traits.duration_values/max.pass.cpplibcudacxx/test/support/host_device_types.hlibcudacxx/test/support/test_macros.h
💤 Files with no reviewable changes (1)
- libcudacxx/test/libcudacxx/std/utilities/time/time.traits/time.traits.duration_values/max.pass.cpp
✅ Files skipped from review due to trivial changes (1)
- libcudacxx/test/libcudacxx/cuda/containers/views/mdspan/shared_mem_mdspan/swap.pass.cpp
🚧 Files skipped from review as they are similar to previous changes (4)
- libcudacxx/test/libcudacxx/cuda/utilities/variant/device_only_types.pass.cpp
- libcudacxx/test/libcudacxx/cuda/utilities/tuple/device_only_types.pass.cpp
- libcudacxx/test/libcudacxx/cuda/utilities/unexpected/device_only_types.pass.cpp
- libcudacxx/test/libcudacxx/cuda/utilities/optional/device_only_types.pass.cpp
|
|
||
| // REQUIRES: enable-tile | ||
|
|
||
| // UNSUPPORTED: clang-14 |
There was a problem hiding this comment.
important: Add a reason for the clang-14 skip.
Line 12 marks UNSUPPORTED: clang-14 but does not state the failure mode; add a one-line rationale like the other tile tests in this cohort so the skip remains auditable.
As per coding guidelines, “If a test is unsupported, expected to fail, disabled, or skipped on a platform, motivate it with a comment.”
Source: Coding guidelines
| #include <cuda/std/cassert> | ||
| #include <cuda/std/expected> | ||
|
|
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
rg -n 'cuda::std::initializer_list' libcudacxx/test/libcudacxx/cuda/utilities/unexpected/tile_only_types.pass.cpp
rg -n '`#include` <cuda/std/initializer_list>' libcudacxx/test/libcudacxx/cuda/utilities/unexpected/tile_only_types.pass.cppRepository: NVIDIA/cccl
Length of output: 160
🏁 Script executed:
cat -n libcudacxx/test/libcudacxx/cuda/utilities/unexpected/tile_only_types.pass.cpp | head -50Repository: NVIDIA/cccl
Length of output: 1965
🏁 Script executed:
# Check what headers are included by <cuda/std/expected>
grep -r "include.*initializer_list" libcudacxx/include/cuda/std/expected* 2>/dev/null | head -20Repository: NVIDIA/cccl
Length of output: 37
🏁 Script executed:
# Check the expected header content for includes
head -100 libcudacxx/include/cuda/std/expected | grep -E "^`#include`"Repository: NVIDIA/cccl
Length of output: 300
🏁 Script executed:
# Check if initializer_list is included in the unexpected.h chain
grep -r "initializer_list" libcudacxx/include/cuda/std/__expected/ 2>/dev/null | head -10Repository: NVIDIA/cccl
Length of output: 1409
🏁 Script executed:
# Check for similar patterns in other test files - do they directly include initializer_list?
fd -e "pass.cpp" libcudacxx/test/ --exec grep -l "cuda::std::initializer_list" {} \; | head -5 | xargs -I {} sh -c 'echo "=== {} ===" && grep "`#include`.*initializer_list" {}'Repository: NVIDIA/cccl
Length of output: 425
🏁 Script executed:
# Check for similar patterns in other test files - do they directly include initializer_list?
fd -e "pass.cpp" --full-path 'libcudacxx/test/' --exec grep -l "cuda::std::initializer_list" {} \; 2>/dev/null | head -5 | xargs -I {} sh -c 'echo "=== {} ===" && grep "`#include`.*initializer_list" {} 2>/dev/null'Repository: NVIDIA/cccl
Length of output: 560
important: Add explicit #include <cuda/std/initializer_list> header.
Line 37 uses cuda::std::initializer_list<int>{}, but the file relies on transitive inclusion from <cuda/std/expected>. Per coding guidelines, files must include all headers for symbols they directly use. The sibling test file libcudacxx/test/libcudacxx/cuda/utilities/expected/host_only_types.pass.cpp includes this header explicitly, establishing the expected pattern.
Source: Coding guidelines
| #include <cuda/std/cassert> | ||
| #include <cuda/std/variant> | ||
|
|
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
rg -n 'cuda::std::initializer_list' libcudacxx/test/libcudacxx/cuda/utilities/variant/tile_only_types.pass.cpp
rg -n '`#include` <cuda/std/initializer_list>' libcudacxx/test/libcudacxx/cuda/utilities/variant/tile_only_types.pass.cppRepository: NVIDIA/cccl
Length of output: 342
🏁 Script executed:
head -100 libcudacxx/test/libcudacxx/cuda/utilities/variant/tile_only_types.pass.cpp | tail -80Repository: NVIDIA/cccl
Length of output: 2290
🏁 Script executed:
rg -n '`#include`' libcudacxx/test/libcudacxx/cuda/utilities/variant/tile_only_types.pass.cpp | head -20Repository: NVIDIA/cccl
Length of output: 179
🏁 Script executed:
rg 'initializer_list' libcudacxx/include/cuda/std/variant -lRepository: NVIDIA/cccl
Length of output: 37
🏁 Script executed:
find libcudacxx/include -name "variant*" -type fRepository: NVIDIA/cccl
Length of output: 568
🏁 Script executed:
head -50 libcudacxx/include/cuda/std/variant | grep -n "include"Repository: NVIDIA/cccl
Length of output: 433
🏁 Script executed:
rg '`#include`.*initializer_list' libcudacxx/include/cuda/std/ -A 1 -B 1Repository: NVIDIA/cccl
Length of output: 5204
Add #include <cuda/std/initializer_list> explicitly. Lines 48, 53, and 90 construct cuda::std::initializer_list<int>{}, but this header is not included directly. Although it comes transitively through <cuda/std/variant>, the coding guideline requires including all headers for symbols you use; transitive header inclusion is not allowed.
Source: Coding guidelines
🥳 CI Workflow Results🟩 Finished in 8h 30m: Pass: 100%/113 | Total: 6d 03h | Max: 4h 16m | Hits: 40%/2709280See results here. |
This backports #9487 and #9488