Describe the bug
On a native Jetson Orin (sm_87, JetPack 6 / CUDA 12.6) build — compiling on the device itself with -DEMBEDDED_TARGET=jetson-orin, not the aarch64 cross toolchain — two build-system issues prevent the project from building/linking out of the box. Impact: blocker for native Orin builds.
1. The top-level default CUDA architecture list clobbers an explicit -DCMAKE_CUDA_ARCHITECTURES.
CMakeLists.txt sets CMAKE_CUDA_ARCHITECTURES = 80;86;89[;100a;120] whenever AARCH64_BUILD is undefined. A native Orin build is not a cross build, so AARCH64_BUILD is unset and the datacenter arch list is forced even when the user configures with -DCMAKE_CUDA_ARCHITECTURES=87. The build then targets the wrong SM.
2. The CUDA<12.8 _cudaLaunchKernelEx wrap and the cutedsl cudart shim are not propagated to the final executable when CuTe DSL is linked through a STATIC_LIBRARY.
cmake/CuteDsl.cmake (cute_dsl_setup()) attaches the cutedsl cudart shim and -Wl,--wrap=_cudaLaunchKernelEx with PRIVATE. A static archive performs no link step, so for STATIC_LIBRARY link targets neither reaches the consuming executable, and the CuTe DSL GEMM path fails to link/resolve on JetPack 6 / CUDA 12.6 (wrapped _cudaLaunchKernelEx / cudaLibrary_t).
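One way the propagation problem in item 2 could be handled is to choose the scope keyword from the target type. The following is a minimal sketch, not the project's actual cute_dsl_setup(): the function name attach_cute_dsl_link_fixups and the shim_target argument are hypothetical, and the CUDA<12.8 version gate is left out for brevity.

```cmake
# Hypothetical helper; names are illustrative and do not come from the repo.
function(attach_cute_dsl_link_fixups target shim_target)
  get_target_property(_type ${target} TYPE)
  if(_type STREQUAL "STATIC_LIBRARY")
    # A static archive is never linked itself, so PRIVATE link options are
    # simply dropped. INTERFACE exports them to whatever target finally
    # links against the archive.
    set(_scope INTERFACE)
  else()
    # Executables and shared libraries run their own link step.
    set(_scope PRIVATE)
  endif()

  # Assumes ${target} uses the keyword form of target_link_libraries
  # everywhere; CMake rejects mixing keyword and plain signatures.
  target_link_libraries(${target} ${_scope} ${shim_target})
  target_link_options(${target} ${_scope} "-Wl,--wrap=_cudaLaunchKernelEx")
endfunction()
```

With INTERFACE scope the option lands in the archive's INTERFACE_LINK_OPTIONS, which CMake applies when linking the consuming executable.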
Steps/Code to reproduce bug
Build configuration:
# On a Jetson Orin (sm_87, JetPack 6 / CUDA 12.6), native (on-device) build:
cmake -B build -DCMAKE_BUILD_TYPE=Release \
-DEMBEDDED_TARGET=jetson-orin \
-DCMAKE_CUDA_ARCHITECTURES=87 \
-DTRT_PACKAGE_DIR=/path/to/TensorRT
cmake --build build -j
# (1) Configure still resolves CMAKE_CUDA_ARCHITECTURES to 80;86;89 (+100a;120), not 87.
# (2) Linking the CuTe DSL static-library consumer fails on the wrapped _cudaLaunchKernelEx.
Expected behavior
- An explicit -DCMAKE_CUDA_ARCHITECTURES (e.g. 87) is respected; the project's default arch set applies only when the user does not provide one. Regular x86 builds are unchanged.
- The cutedsl cudart shim and the --wrap linker option reach the final executable when CuTe DSL is consumed via a static library, so JetPack 6 / CUDA 12.6 builds link.
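The first expectation above could be met with a guard along these lines. This is a sketch under stated assumptions, not the contents of the project's CMakeLists.txt: the project name is a placeholder, the AARCH64_BUILD condition and the optional 100a;120 entries are omitted, and the guard is assumed to run before CUDA is enabled.

```cmake
cmake_minimum_required(VERSION 3.18)

# Run this before project()/enable_language(CUDA). Once CUDA is enabled,
# CMake may initialize CMAKE_CUDA_ARCHITECTURES on its own, and a
# "NOT DEFINED" check placed after that point would no longer detect
# whether the user supplied a value.
if(NOT DEFINED CMAKE_CUDA_ARCHITECTURES)
  # Project default from the issue text; used only when the user passed
  # no -DCMAKE_CUDA_ARCHITECTURES=... on the command line.
  set(CMAKE_CUDA_ARCHITECTURES 80 86 89)
endif()

# "example" is a placeholder project name.
project(example LANGUAGES CXX CUDA)
message(STATUS "CUDA architectures: ${CMAKE_CUDA_ARCHITECTURES}")
```

Configuring with -DCMAKE_CUDA_ARCHITECTURES=87 creates a cache entry, which if(DEFINED ...) sees, so the default list is skipped and the status line should report 87.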
System information (Edge Device)
- Platform: NVIDIA Jetson Orin
- Software release: JetPack 6 (CUDA 12.6)
- CPU architecture: aarch64
- GPU compute capability: SM87
- Build type: Release
- Library versions:
  - TensorRT Edge-LLM version or commit hash: v0.8.0 (f9cc746)
  - CUDA: 12.6
- CMake options used:
  - EMBEDDED_TARGET: jetson-orin
  - CMAKE_CUDA_ARCHITECTURES: 87
A fix is ready and will be submitted as a PR referencing this issue.