
[Bug] Native Jetson Orin (sm_87) build: explicit -DCMAKE_CUDA_ARCHITECTURES ignored + CuTe DSL --wrap/cudart shim not propagated to static-lib consumers #117

Description

@suharvest

Describe the bug

On a native Jetson Orin build (sm_87, JetPack 6 / CUDA 12.6), compiled on the device itself with -DEMBEDDED_TARGET=jetson-orin rather than with the aarch64 cross toolchain, two build-system issues prevent the project from building and linking out of the box. Impact: blocker for native Orin builds.

1. The top-level default CUDA architecture list clobbers an explicit -DCMAKE_CUDA_ARCHITECTURES.
CMakeLists.txt sets CMAKE_CUDA_ARCHITECTURES = 80;86;89[;100a;120] whenever AARCH64_BUILD is undefined. A native Orin build is not a cross build, so AARCH64_BUILD is unset and the datacenter arch list is forced even when the user configures with -DCMAKE_CUDA_ARCHITECTURES=87. The build then targets the wrong SM.

2. The CUDA<12.8 _cudaLaunchKernelEx wrap and the cutedsl cudart shim are not propagated to the final executable when CuTe DSL is linked through a STATIC_LIBRARY.
cmake/CuteDsl.cmake (cute_dsl_setup()) attaches the cutedsl cudart shim and -Wl,--wrap=_cudaLaunchKernelEx with PRIVATE. A static archive performs no link step, so for STATIC_LIBRARY link targets neither reaches the consuming executable, and the CuTe DSL GEMM path fails to link/resolve on JetPack 6 / CUDA 12.6 (wrapped _cudaLaunchKernelEx / cudaLibrary_t).
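
For illustration, the propagation problem in item 2 could be addressed along the following lines. This is a minimal sketch, not the project's cute_dsl_setup(): the function name sketch_propagate_cute_dsl_link_reqs and the shim_target argument are hypothetical, and how the shim is actually attached in cmake/CuteDsl.cmake is not shown in this issue. The CUDA<12.8 version check is omitted for brevity.

```cmake
# Hypothetical helper (assumption, not the project's cute_dsl_setup()).
# target      : the library or executable that links CuTe DSL
# shim_target : assumed name of a library target providing the cudart shim
function(sketch_propagate_cute_dsl_link_reqs target shim_target)
  get_target_property(target_type ${target} TYPE)

  if(target_type STREQUAL "STATIC_LIBRARY")
    # A static archive has no link step of its own, so the requirements
    # are declared INTERFACE and applied when a consumer is linked.
    set(scope INTERFACE)
  else()
    # Executables and shared libraries are linked directly.
    set(scope PRIVATE)
  endif()

  # Same linker flag as named in this issue, with the scope chosen above.
  target_link_options(${target} ${scope} "-Wl,--wrap=_cudaLaunchKernelEx")
  target_link_libraries(${target} ${scope} ${shim_target})
endfunction()
```

With this shape, an executable that links the static library inherits the wrap option through INTERFACE_LINK_OPTIONS, while non-static targets keep the existing PRIVATE behavior.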

Steps/Code to reproduce bug

Build configuration:

# On a Jetson Orin (sm_87, JetPack 6 / CUDA 12.6), native (on-device) build:
cmake -B build -DCMAKE_BUILD_TYPE=Release \
  -DEMBEDDED_TARGET=jetson-orin \
  -DCMAKE_CUDA_ARCHITECTURES=87 \
  -DTRT_PACKAGE_DIR=/path/to/TensorRT
cmake --build build -j
# (1) Configure still resolves CMAKE_CUDA_ARCHITECTURES to 80;86;89 (+100a;120), not 87.
# (2) Linking the CuTe DSL static-library consumer fails on the wrapped _cudaLaunchKernelEx.

Expected behavior

  • An explicit -DCMAKE_CUDA_ARCHITECTURES (e.g. 87) is respected; the project's default arch set applies only when the user does not provide one. Regular x86 builds are unchanged.
  • The cutedsl cudart shim and the --wrap linker option reach the final executable when CuTe DSL is consumed via a static library, so JetPack 6 / CUDA 12.6 builds link.
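
The first expectation above could be met with a guard of roughly this shape. It is a sketch under assumptions: the project name is hypothetical, the optional 100a;120 entries are left out, and the guard is placed before project() because enabling the CUDA language initializes CMAKE_CUDA_ARCHITECTURES on its own when it is unset. The actual placement in the top-level CMakeLists.txt may differ.

```cmake
cmake_minimum_required(VERSION 3.18)

# Apply the project's default list only when the user gave no value.
# A -DCMAKE_CUDA_ARCHITECTURES=87 on the command line creates a cache
# entry, so if(DEFINED ...) is true and the default is skipped.
if(NOT DEFINED CMAKE_CUDA_ARCHITECTURES)
  if(NOT DEFINED AARCH64_BUILD)
    # Default arch set from this issue (optional 100a;120 entries omitted).
    set(CMAKE_CUDA_ARCHITECTURES 80 86 89)
  endif()
endif()

# Hypothetical project name, used only for this sketch.
project(arch_default_sketch LANGUAGES CXX CUDA)

message(STATUS "CUDA architectures: ${CMAKE_CUDA_ARCHITECTURES}")
```

Configuring with -DCMAKE_CUDA_ARCHITECTURES=87 would then keep 87, while a configure without the option still gets the default list, so regular x86 builds are unchanged.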

System information (Edge Device)

  • Platform: NVIDIA Jetson Orin
  • Software release: JetPack 6 (CUDA 12.6)
  • CPU architecture: aarch64
  • GPU compute capability: SM87
  • Build type: Release
  • Library versions:
    • TensorRT Edge-LLM version or commit hash: v0.8.0 (f9cc746)
    • CUDA: 12.6
  • CMake options used:
    • EMBEDDED_TARGET: jetson-orin
    • CMAKE_CUDA_ARCHITECTURES: 87

A fix is ready and will be submitted as a PR referencing this issue.
