From e52c15368743216532eca0a4d639fe2efd789810 Mon Sep 17 00:00:00 2001 From: Tshepang Mbambo Date: Fri, 24 Apr 2026 09:55:45 +0200 Subject: [PATCH 1/7] sembr src/offload/installation.md --- src/offload/installation.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/offload/installation.md b/src/offload/installation.md index ac852e01f..bf4070caf 100644 --- a/src/offload/installation.md +++ b/src/offload/installation.md @@ -1,6 +1,7 @@ # Installation -`std::offload` is partly available in nightly builds for users. For now, everyone however still needs to build rustc from source to use all features of it. +`std::offload` is partly available in nightly builds for users. +For now, everyone however still needs to build rustc from source to use all features of it. ## Build instructions From e34390ee75d592e9fbbc31d66b0437d2969dc9b8 Mon Sep 17 00:00:00 2001 From: Tshepang Mbambo Date: Fri, 24 Apr 2026 10:00:40 +0200 Subject: [PATCH 2/7] use a more correct code marker --- src/autodiff/installation.md | 44 ++++++++++++++++++------------------ src/offload/installation.md | 10 ++++---- 2 files changed, 27 insertions(+), 27 deletions(-) diff --git a/src/autodiff/installation.md b/src/autodiff/installation.md index 6b66a9dcb..2ea725e0d 100644 --- a/src/autodiff/installation.md +++ b/src/autodiff/installation.md @@ -1,11 +1,11 @@ # Installation -In the near future, `std::autodiff` should become available for users via rustup. As a rustc/enzyme/autodiff contributor however, you will still need to build rustc from source. -For the meantime, you can download up-to-date builds to enable `std::autodiff` on your latest nightly toolchain, if you are using either of: -**Linux**, with `x86_64-unknown-linux-gnu` or `aarch64-unknown-linux-gnu` -**Windows**, with `x86_64-llvm-mingw` or `aarch64-llvm-mingw` +In the near future, `std::autodiff` should become available for users via rustup. 
As a rustc/enzyme/autodiff contributor however, you will still need to build rustc from source. +For the meantime, you can download up-to-date builds to enable `std::autodiff` on your latest nightly toolchain, if you are using either of: +**Linux**, with `x86_64-unknown-linux-gnu` or `aarch64-unknown-linux-gnu` +**Windows**, with `x86_64-llvm-mingw` or `aarch64-llvm-mingw` -You can also download slightly outdated builds for **Apple** (aarch64-apple), which should generally work for now. +You can also download slightly outdated builds for **Apple** (aarch64-apple), which should generally work for now. If you need any other platform, you can build rustc including autodiff from source. Please open an issue if you want to help enabling automatic builds for your prefered target. @@ -15,8 +15,8 @@ If you want to use `std::autodiff` and don't plan to contribute PR's to the proj For now, you'll have to manually download and copy it. 1) On our github repository, find the last merged PR: [`Repo`] -2) Scroll down to the lower end of the PR, where you'll find a rust-bors message saying `Test successful` with a `CI` link. -3) Click on the `CI` link, and grep for your target. E.g. `dist-x86_64-linux` or `dist-aarch64-llvm-mingw` and click `Load summary`. +2) Scroll down to the lower end of the PR, where you'll find a rust-bors message saying `Test successful` with a `CI` link. +3) Click on the `CI` link, and grep for your target. E.g. `dist-x86_64-linux` or `dist-aarch64-llvm-mingw` and click `Load summary`. 4) Under the `CI artifacts` section, find the `enzyme-nightly` artifact, download, and unpack it. 5) Copy the artifact (libEnzyme-22.so for linux, libEnzyme-22.dylib for apple, etc.), which should be in a folder named `enzyme-preview`, to your rust toolchain directory. E.g. 
for linux: `cp ~/Downloads/enzyme-nightly-x86_64-unknown-linux-gnu/enzyme-preview/lib/rustlib/x86_64-unknown-linux-gnu/lib/libEnzyme-22.so ~/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/lib` @@ -32,7 +32,7 @@ In that case you might use the following nix configuration to get a rustc that s url = "https://ci-artifacts.rust-lang.org/rustc-builds/ec818fda361ca216eb186f5cf45131bd9c776bb4/enzyme-nightly-x86_64-unknown-linux-gnu.tar.xz"; sha256 = "sha256-Rnrop44vzS+qmYNaRoMNNMFyAc3YsMnwdNGYMXpZ5VY="; }; - + rustToolchain = pkgs.symlinkJoin { name = "rust-with-enzyme"; paths = [pkgs.rust-bin.nightly.latest.default]; @@ -49,26 +49,26 @@ In that case you might use the following nix configuration to get a rustc that s ## Build instructions First you need to clone and configure the Rust repository. Based on your preferences, you might also want to `--enable-clang` or `--enable-lld`. -```bash +```console git clone git@github.com:rust-lang/rust cd rust ./configure --release-channel=nightly --enable-llvm-enzyme --enable-llvm-link-shared --enable-llvm-assertions --enable-ninja --enable-option-checking --disable-docs --set llvm.download-ci-llvm=false ``` Afterwards you can build rustc using: -```bash +```console ./x build --stage 1 library ``` Afterwards rustc toolchain link will allow you to use it through cargo: -``` +```console rustup toolchain link enzyme build/host/stage1 rustup toolchain install nightly # enables -Z unstable-options ``` You can then run our test cases: -```bash +```console ./x test --stage 1 tests/codegen-llvm/autodiff ./x test --stage 1 tests/pretty/autodiff ./x test --stage 1 tests/ui/autodiff @@ -76,27 +76,27 @@ You can then run our test cases: ./x test --stage 1 tests/ui/feature-gates/feature-gate-autodiff.rs ``` -Autodiff is still experimental, so if you want to use it in your own projects, you will need to add `lto="fat"` to your Cargo.toml -and use `RUSTFLAGS="-Zautodiff=Enable" cargo +enzyme` instead of 
`cargo` or `cargo +nightly`. +Autodiff is still experimental, so if you want to use it in your own projects, you will need to add `lto="fat"` to your Cargo.toml +and use `RUSTFLAGS="-Zautodiff=Enable" cargo +enzyme` instead of `cargo` or `cargo +nightly`. ## Compiler Explorer and dist builds Our compiler explorer instance can be updated to a newer rustc in a similar way. First, prepare a docker instance. -```bash +```console docker run -it ubuntu:22.04 export CC=clang CXX=clang++ apt update -apt install wget vim python3 git curl libssl-dev pkg-config lld ninja-build cmake clang build-essential +apt install wget vim python3 git curl libssl-dev pkg-config lld ninja-build cmake clang build-essential ``` Then build rustc in a slightly altered way: -```bash +```console git clone https://github.com/rust-lang/rust cd rust ./configure --release-channel=nightly --enable-llvm-enzyme --enable-llvm-link-shared --enable-llvm-assertions --enable-ninja --enable-option-checking --disable-docs --set llvm.download-ci-llvm=false ./x dist ``` We then copy the tarball to our host. The dockerid is the newest entry under `docker ps -a`. -```bash +```console docker cp :/rust/build/dist/rust-nightly-x86_64-unknown-linux-gnu.tar.gz rust-nightly-x86_64-unknown-linux-gnu.tar.gz ``` Afterwards we can create a new (pre-release) tag on the EnzymeAD/rust repository and make a PR against the EnzymeAD/enzyme-explorer repository to update the tag. @@ -110,7 +110,7 @@ Following the Rust build instruction above will build LLVMEnzyme, LLDEnzyme, and We recommend that approach, if you just want to use any of them and have no experience with cmake. However, if you prefer to just build Enzyme without Rust, then these instructions might help. -```bash +```console git clone git@github.com:llvm/llvm-project cd llvm-project mkdir build @@ -121,11 +121,11 @@ ninja install ``` This gives you a working LLVM build, now we can continue with building Enzyme. 
Leave the `llvm-project` folder, and execute the following commands: -```bash +```console git clone git@github.com:EnzymeAD/Enzyme cd Enzyme/enzyme -mkdir build -cd build +mkdir build +cd build cmake .. -G Ninja -DLLVM_DIR=/llvm-project/build/lib/cmake/llvm/ -DLLVM_EXTERNAL_LIT=/llvm-project/llvm/utils/lit/lit.py -DCMAKE_BUILD_TYPE=Release -DCMAKE_EXPORT_COMPILE_COMMANDS=YES -DBUILD_SHARED_LIBS=ON ninja ``` diff --git a/src/offload/installation.md b/src/offload/installation.md index bf4070caf..00233d9b0 100644 --- a/src/offload/installation.md +++ b/src/offload/installation.md @@ -6,19 +6,19 @@ For now, everyone however still needs to build rustc from source to use all feat ## Build instructions First you need to clone and configure the Rust repository: -```bash +```console git clone git@github.com:rust-lang/rust cd rust ./configure --enable-llvm-link-shared --release-channel=nightly --enable-llvm-assertions --enable-llvm-offload --enable-llvm-enzyme --enable-clang --enable-lld --enable-option-checking --enable-ninja --disable-docs ``` Afterwards you can build rustc using: -```bash +```console ./x build --stage 1 library ``` Afterwards rustc toolchain link will allow you to use it through cargo: -``` +```console rustup toolchain link offload build/host/stage1 rustup toolchain install nightly # enables -Z unstable-options ``` @@ -26,7 +26,7 @@ rustup toolchain install nightly # enables -Z unstable-options ## Build instruction for LLVM itself -```bash +```console git clone git@github.com:llvm/llvm-project cd llvm-project mkdir build @@ -40,6 +40,6 @@ This gives you a working LLVM build. 
## Testing run -``` +```console ./x test --stage 1 tests/codegen-llvm/gpu_offload ``` From dd85042fa8945cff997a11b27a76555307b1a0f4 Mon Sep 17 00:00:00 2001 From: Tshepang Mbambo Date: Fri, 24 Apr 2026 10:03:53 +0200 Subject: [PATCH 3/7] sembr src/autodiff/installation.md --- src/autodiff/installation.md | 32 ++++++++++++++++++++++---------- 1 file changed, 22 insertions(+), 10 deletions(-) diff --git a/src/autodiff/installation.md b/src/autodiff/installation.md index 2ea725e0d..648e6c0cc 100644 --- a/src/autodiff/installation.md +++ b/src/autodiff/installation.md @@ -1,17 +1,20 @@ # Installation -In the near future, `std::autodiff` should become available for users via rustup. As a rustc/enzyme/autodiff contributor however, you will still need to build rustc from source. +In the near future, `std::autodiff` should become available for users via rustup. +As a rustc/enzyme/autodiff contributor, however, you will still need to build rustc from source. For the meantime, you can download up-to-date builds to enable `std::autodiff` on your latest nightly toolchain, if you are using either of: **Linux**, with `x86_64-unknown-linux-gnu` or `aarch64-unknown-linux-gnu` **Windows**, with `x86_64-llvm-mingw` or `aarch64-llvm-mingw` You can also download slightly outdated builds for **Apple** (aarch64-apple), which should generally work for now. -If you need any other platform, you can build rustc including autodiff from source. Please open an issue if you want to help enabling automatic builds for your prefered target. +If you need any other platform, you can build rustc including autodiff from source. +Please open an issue if you want to help enable automatic builds for your preferred target. ## Installation guide -If you want to use `std::autodiff` and don't plan to contribute PR's to the project, then we recommend to just use your existing nightly installation and download the missing component. In the future, rustup will be able to do it for you.
+If you want to use `std::autodiff` and don't plan to contribute PRs to the project, we recommend just using your existing nightly installation and downloading the missing component. +In the future, rustup will be able to do it for you. For now, you'll have to manually download and copy it. 1) On our github repository, find the last merged PR: [`Repo`] @@ -20,11 +23,14 @@ For now, you'll have to manually download and copy it. 4) Under the `CI artifacts` section, find the `enzyme-nightly` artifact, download, and unpack it. 5) Copy the artifact (libEnzyme-22.so for linux, libEnzyme-22.dylib for apple, etc.), which should be in a folder named `enzyme-preview`, to your rust toolchain directory. E.g. for linux: `cp ~/Downloads/enzyme-nightly-x86_64-unknown-linux-gnu/enzyme-preview/lib/rustlib/x86_64-unknown-linux-gnu/lib/libEnzyme-22.so ~/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/lib` -Apple support was temporarily reverted, due to downstream breakages. If you want to download autodiff for apple, please look at the artifacts from this [`run`]. +Apple support was temporarily reverted due to downstream breakages. +If you want to download autodiff for Apple, please look at the artifacts from this [`run`]. ## Installation guide for Nix user. -This setup was recommended by a nix and autodiff user. It uses [`Overlay`]. Please verify for yourself if you are comfortable using that repository. +This setup was recommended by a Nix and autodiff user. +It uses [`Overlay`]. +Please verify for yourself whether you are comfortable using that repository. In that case you might use the following nix configuration to get a rustc that supports `std::autodiff`. ```nix { @@ -48,7 +54,8 @@ In that case you might use the following nix configuration to get a rustc that s ## Build instructions -First you need to clone and configure the Rust repository. Based on your preferences, you might also want to `--enable-clang` or `--enable-lld`.
+First you need to clone and configure the Rust repository. +Based on your preferences, you might also want to `--enable-clang` or `--enable-lld`. ```console git clone git@github.com:rust-lang/rust cd rust @@ -81,7 +88,8 @@ and use `RUSTFLAGS="-Zautodiff=Enable" cargo +enzyme` instead of `cargo` or `car ## Compiler Explorer and dist builds -Our compiler explorer instance can be updated to a newer rustc in a similar way. First, prepare a docker instance. +Our compiler explorer instance can be updated to a newer rustc in a similar way. +First, prepare a docker instance. ```console docker run -it ubuntu:22.04 export CC=clang CXX=clang++ @@ -95,12 +103,15 @@ cd rust ./configure --release-channel=nightly --enable-llvm-enzyme --enable-llvm-link-shared --enable-llvm-assertions --enable-ninja --enable-option-checking --disable-docs --set llvm.download-ci-llvm=false ./x dist ``` -We then copy the tarball to our host. The dockerid is the newest entry under `docker ps -a`. +We then copy the tarball to our host. +The dockerid is the newest entry under `docker ps -a`. ```console docker cp :/rust/build/dist/rust-nightly-x86_64-unknown-linux-gnu.tar.gz rust-nightly-x86_64-unknown-linux-gnu.tar.gz ``` Afterwards we can create a new (pre-release) tag on the EnzymeAD/rust repository and make a PR against the EnzymeAD/enzyme-explorer repository to update the tag. -Remember to ping `tgymnich` on the PR to run his update script. Note: We should archive EnzymeAD/rust and update the instructions here. The explorer should soon +Remember to ping `tgymnich` on the PR to run his update script. +Note: We should archive EnzymeAD/rust and update the instructions here. +The explorer should soon be able to get the rustc toolchain from the official rust servers. @@ -129,7 +140,8 @@ cd build cmake .. 
-G Ninja -DLLVM_DIR=/llvm-project/build/lib/cmake/llvm/ -DLLVM_EXTERNAL_LIT=/llvm-project/llvm/utils/lit/lit.py -DCMAKE_BUILD_TYPE=Release -DCMAKE_EXPORT_COMPILE_COMMANDS=YES -DBUILD_SHARED_LIBS=ON ninja ``` -This will build Enzyme, and you can find it in `Enzyme/enzyme/build/lib/Enzyme.so`. (Endings might differ based on your OS). +This will build Enzyme, and you can find it in `Enzyme/enzyme/build/lib/Enzyme.so`. +(The file extension might differ based on your OS.) [`Repo`]: https://github.com/rust-lang/rust/ [`run`]: https://github.com/rust-lang/rust/pull/153026#issuecomment-3950046599 From c6e47e01ce880c7c16978dc04cf224f445eb15b2 Mon Sep 17 00:00:00 2001 From: Tshepang Mbambo Date: Fri, 24 Apr 2026 10:04:43 +0200 Subject: [PATCH 4/7] sembr src/offload/internals.md --- src/offload/internals.md | 19 +++++++++++++------ 1 file changed, 13 insertions(+), 6 deletions(-) diff --git a/src/offload/internals.md b/src/offload/internals.md index 836fd7ad2..87c0fa9de 100644 --- a/src/offload/internals.md +++ b/src/offload/internals.md @@ -1,22 +1,29 @@ # std::offload -This module is under active development. Once upstream, it should allow Rust developers to run Rust code on GPUs. +This module is under active development. +Once upstream, it should allow Rust developers to run Rust code on GPUs. We aim to develop a `rusty` GPU programming interface, which is safe, convenient and sufficiently fast by default. -This includes automatic data movement to and from the GPU, in a efficient way. We will (later) +This includes automatic data movement to and from the GPU, in a efficient way. +We will (later) also offer more advanced, possibly unsafe, interfaces which allow a higher degree of control. The implementation is based on LLVM's "offload" project, which is already used by OpenMP to run Fortran or C++ code on GPUs. While the project is under development, users will need to call other compilers like clang to finish the compilation process.
## High-level compilation design: -We use a single-source, two-pass compilation approach. +We use a single-source, two-pass compilation approach. -First we compile all functions that should be offloaded for the device (e.g nvptx64, amdgcn-amd-amdhsa, intel in the future). Currently we require cumbersome `#cfg(target_os="")` annotations, but we intend to recognize those in the future based on our offload intrinsic. -This first compilation currently does not leverage rustc's internal Query system, so it will always recompile your kernels at the moment. This should be easy to fix, but we prioritize features and runtime performance improvements at the moment. Please reach out if you want to implement it, though! +First we compile all functions that should be offloaded for the device (e.g nvptx64, amdgcn-amd-amdhsa, intel in the future). +Currently we require cumbersome `#cfg(target_os="")` annotations, but we intend to recognize those in the future based on our offload intrinsic. +This first compilation currently does not leverage rustc's internal Query system, so it will always recompile your kernels at the moment. +This should be easy to fix, but we prioritize features and runtime performance improvements at the moment. +Please reach out if you want to implement it, though! We then compile the code for the host (e.g. x86-64), where most of the offloading logic happens. On the host side, we generate calls to the openmp offload runtime, to inform it about the layout of the types (a simplified version of the autodiff TypeTrees). We also use the type system to figure out whether kernel arguments have to be moved only to the device (e.g. `&[f32;1024]`), from the device, or both (e.g. `&mut [f64]`). We then launch the kernel, after which we inform the runtime to end this environment and move data back (as far as needed). -The second pass for the host will load the kernel artifacts from the previous compilation. 
rustc in general may not "guess" or hardcode the build directory layout, and as such it must be told the path to the kernel artifacts in the second invocation. The logic for this could be integrated into cargo, but it also only requires a trivial cargo wrapper, which we could trivially provide via crates.io till we see larger adoption. +The second pass for the host will load the kernel artifacts from the previous compilation. +rustc in general may not "guess" or hardcode the build directory layout, and as such it must be told the path to the kernel artifacts in the second invocation. +The logic for this could be integrated into cargo, but it also only requires a trivial cargo wrapper, which we could trivially provide via crates.io till we see larger adoption. It might seem tempting to think about a single-source, single pass compilation approach. However, a lot of the rustc frontend (e.g. AST) will drop any dead code (e.g. code behind an inactive `cfg`). Getting the frontend to expand and lower code for two targets naively will result in multiple definitions of the same symbol (and other issues). Trying to teach the whole rustc middle and backend to be aware that any symbol now might contain two implementations is a large undertaking, and it is questionable why we should make the whole compiler more complex, if the alternative is a ~5 line cargo wrapper. We still control the full compilation pipeline and have both host and device code available, therefore there shouldn't be a runtime performance difference between the two approaches. 
From a427d2ecb9618aae991cf0cb918c875bb2a26084 Mon Sep 17 00:00:00 2001 From: Tshepang Mbambo Date: Fri, 24 Apr 2026 10:08:39 +0200 Subject: [PATCH 5/7] reflow --- src/offload/internals.md | 40 +++++++++++++++++++++++++++++----------- 1 file changed, 29 insertions(+), 11 deletions(-) diff --git a/src/offload/internals.md b/src/offload/internals.md index 87c0fa9de..78a1a852d 100644 --- a/src/offload/internals.md +++ b/src/offload/internals.md @@ -4,26 +4,44 @@ This module is under active development. Once upstream, it should allow Rust developers to run Rust code on GPUs. We aim to develop a `rusty` GPU programming interface, which is safe, convenient and sufficiently fast by default. This includes automatic data movement to and from the GPU, in a efficient way. -We will (later) -also offer more advanced, possibly unsafe, interfaces which allow a higher degree of control. +We will (later) also offer more advanced, +possibly unsafe, interfaces which allow a higher degree of control. -The implementation is based on LLVM's "offload" project, which is already used by OpenMP to run Fortran or C++ code on GPUs. -While the project is under development, users will need to call other compilers like clang to finish the compilation process. +The implementation is based on LLVM's "offload" project, +which is already used by OpenMP to run Fortran or C++ code on GPUs. +While the project is under development, +users will need to call other compilers like clang to finish the compilation process. ## High-level compilation design: + We use a single-source, two-pass compilation approach. -First we compile all functions that should be offloaded for the device (e.g nvptx64, amdgcn-amd-amdhsa, intel in the future). +First we compile all functions that should be offloaded for the device +(e.g. nvptx64, amdgcn-amd-amdhsa, and Intel in the future). Currently we require cumbersome `#cfg(target_os="")` annotations, but we intend to recognize those in the future based on our offload intrinsic.
This first compilation currently does not leverage rustc's internal Query system, so it will always recompile your kernels at the moment. This should be easy to fix, but we prioritize features and runtime performance improvements at the moment. Please reach out if you want to implement it, though! -We then compile the code for the host (e.g. x86-64), where most of the offloading logic happens. On the host side, we generate calls to the openmp offload runtime, to inform it about the layout of the types (a simplified version of the autodiff TypeTrees). We also use the type system to figure out whether kernel arguments have to be moved only to the device (e.g. `&[f32;1024]`), from the device, or both (e.g. `&mut [f64]`). We then launch the kernel, after which we inform the runtime to end this environment and move data back (as far as needed). +We then compile the code for the host (e.g. x86-64), where most of the offloading logic happens. +On the host side, we generate calls to the OpenMP offload runtime, +to inform it about the layout of the types (a simplified version of the autodiff TypeTrees). +We also use the type system to figure out whether kernel arguments have to be moved only to the device (e.g. `&[f32;1024]`), +from the device, or both (e.g. `&mut [f64]`). +We then launch the kernel, +after which we inform the runtime to end this environment and move data back (as needed). The second pass for the host will load the kernel artifacts from the previous compilation. -rustc in general may not "guess" or hardcode the build directory layout, and as such it must be told the path to the kernel artifacts in the second invocation. -The logic for this could be integrated into cargo, but it also only requires a trivial cargo wrapper, which we could trivially provide via crates.io till we see larger adoption. - -It might seem tempting to think about a single-source, single pass compilation approach. However, a lot of the rustc frontend (e.g.
AST) will drop any dead code (e.g. code behind an inactive `cfg`). Getting the frontend to expand and lower code for two targets naively will result in multiple definitions of the same symbol (and other issues). Trying to teach the whole rustc middle and backend to be aware that any symbol now might contain two implementations is a large undertaking, and it is questionable why we should make the whole compiler more complex, if the alternative is a ~5 line cargo wrapper. We still control the full compilation pipeline and have both host and device code available, therefore there shouldn't be a runtime performance difference between the two approaches. - +rustc in general is not supposed to "guess" or hardcode the build directory layout, +and as such it must be told the path to the kernel artifacts in the second invocation. +The logic for this could be integrated into cargo, +but it also only requires a trivial cargo wrapper, +which we could easily provide via crates.io until we see larger adoption. + +It might seem tempting to think about a single-source, single-pass compilation approach. +However, a lot of the rustc frontend (e.g. AST) will drop any dead code (e.g. code behind an inactive `cfg`). +Getting the frontend to expand and lower code for two targets naively will result in multiple definitions of the same symbol (and other issues). +Trying to teach the whole rustc middle end and backend to be aware that any symbol now might contain two implementations is a large undertaking, +and it is questionable whether we should make the whole compiler more complex if the alternative is a ~5-line cargo wrapper. +We still control the full compilation pipeline and have both host and device code available, +so there shouldn't be a runtime performance difference between the two approaches.
From 11f2641262ecb90cafb6d1587ee65ba04d4dd73f Mon Sep 17 00:00:00 2001 From: Tshepang Mbambo Date: Fri, 24 Apr 2026 10:11:37 +0200 Subject: [PATCH 6/7] sembr src/offload/contributing.md --- src/offload/contributing.md | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/src/offload/contributing.md b/src/offload/contributing.md index f3a1ed215..5211bdf43 100644 --- a/src/offload/contributing.md +++ b/src/offload/contributing.md @@ -1,8 +1,13 @@ # Contributing -Contributions are always welcome. This project is experimental, so the documentation and code are likely incomplete. Please ask on [Zulip](https://rust-lang.zulipchat.com/#narrow/channel/422870-t-compiler.2Fgpgpu-backend) (preferred) or the Rust Community Discord for help if you get stuck or if our documentation is unclear. +Contributions are always welcome. +This project is experimental, so the documentation and code are likely incomplete. +Please ask on [Zulip](https://rust-lang.zulipchat.com/#narrow/channel/422870-t-compiler.2Fgpgpu-backend) (preferred) or the Rust Community Discord for help if you get stuck or if our documentation is unclear. -We generally try to automate as much of the compilation process as possible for users. However, as a contributor it might sometimes be easier to directly rewrite and compile the LLVM-IR modules (.ll) to quickly iterate on changes, without needing to repeatedly recompile rustc. For people familiar with LLVM we therefore have the shell script below. Only when you are then happy with the IR changes you can work on updating rustc to generate the new, desired output. +We generally try to automate as much of the compilation process as possible for users. +However, as a contributor, it might sometimes be easier to directly rewrite and compile the LLVM IR modules (.ll) to quickly iterate on changes, without needing to repeatedly recompile rustc. +For people familiar with LLVM, we therefore provide the shell script below.
+Once you are happy with the IR changes, you can work on updating rustc to generate the new, desired output. ```sh set -e @@ -29,4 +34,6 @@ opt lib.ll -o lib.bc LIBOMPTARGET_INFO=-1 OFFLOAD_TRACK_ALLOCATION_TRACES=true ./a.out ``` -Please update the `` placeholders on the `clang-linker-wrapper` invocation. You will likely also need to adjust the library paths. See the linked usage section for details: [usage](usage.md#compile-instructions) +Please update the `` placeholders on the `clang-linker-wrapper` invocation. +You will likely also need to adjust the library paths. +See the linked usage section for details: [usage](usage.md#compile-instructions) From 4e867ff75348704cae8e666f9aa56129d6ca1b74 Mon Sep 17 00:00:00 2001 From: Tshepang Mbambo Date: Fri, 24 Apr 2026 10:12:45 +0200 Subject: [PATCH 7/7] sembr src/offload/usage.md --- src/offload/usage.md | 20 ++++++++++++++------ 1 file changed, 14 insertions(+), 6 deletions(-) diff --git a/src/offload/usage.md b/src/offload/usage.md index 4d3222123..c7ce2ded4 100644 --- a/src/offload/usage.md +++ b/src/offload/usage.md @@ -1,7 +1,9 @@ # Usage -This feature is work-in-progress, and not ready for usage. The instructions here are for contributors, or people interested in following the latest progress. -We currently work on launching the following Rust kernel on the GPU. To follow along, copy it to a `src/lib.rs` file. +This feature is a work in progress and not ready for use. +The instructions here are for contributors or people interested in following the latest progress. +We are currently working on launching the following Rust kernel on the GPU. +To follow along, copy it to a `src/lib.rs` file. ```rust #![feature(abi_gpu_kernel)] @@ -75,9 +77,12 @@ pub extern "gpu-kernel" fn kernel_1(x: *mut [f64; 256]) { ``` ## Compile instructions -It is important to use a clang compiler build on the same llvm as rustc.
Just calling clang without the full path will likely use your system clang, which probably will be incompatible. So either substitute clang/lld invocations below with absolute path, or set your `PATH` accordingly. +It is important to use a clang compiler built on the same LLVM as rustc. +Just calling clang without the full path will likely use your system clang, which is probably incompatible. +So either substitute the clang/lld invocations below with absolute paths, or set your `PATH` accordingly. -First we generate the device (gpu) code. Replace the target-cpu with the right code for your gpu. +First we generate the device (GPU) code. +Replace the target-cpu with the right value for your GPU. ``` RUSTFLAGS="-Ctarget-cpu=gfx90a --emit=llvm-bc,llvm-ir -Zoffload=Device -Csave-temps -Zunstable-options" cargo +offload build -Zunstable-options -r -v --target amdgcn-amd-amdhsa -Zbuild-std=core ``` @@ -94,8 +99,11 @@ While we integrated most offload steps into rustc by now, one binary invocation "clang-linker-wrapper" "--should-extract=gfx90a" "--device-compiler=amdgcn-amd-amdhsa=-g" "--device-compiler=amdgcn-amd-amdhsa=-save-temps=cwd" "--device-linker=amdgcn-amd-amdhsa=-lompdevice" "--host-triple=x86_64-unknown-linux-gnu" "--save-temps" "--linker-path=/ABSOlUTE_PATH_TO/rust/build/x86_64-unknown-linux-gnu/lld/bin/ld.lld" "--hash-style=gnu" "--eh-frame-hdr" "-m" "elf_x86_64" "-pie" "-dynamic-linker" "/lib64/ld-linux-x86-64.so.2" "-o" "bare" "/lib/../lib64/Scrt1.o" "/lib/../lib64/crti.o" "/ABSOLUTE_PATH_TO/crtbeginS.o" "-L/ABSOLUTE_PATH_TO/rust/build/x86_64-unknown-linux-gnu/llvm/bin/../lib/x86_64-unknown-linux-gnu" "-L/ABSOLUTE_PATH_TO/rust/build/x86_64-unknown-linux-gnu/llvm/lib/clang/21/lib/x86_64-unknown-linux-gnu" "-L/lib/../lib64" "-L/usr/lib64" "-L/lib" "-L/usr/lib" "target//release/host.o" "-lstdc++" "-lm" "-lomp" "-lomptarget" "-L/ABSOLUTE_PATH_TO/rust/build/x86_64-unknown-linux-gnu/llvm/lib" "-lgcc_s" "-lgcc" "-lpthread" "-lc" "-lgcc_s" "-lgcc"
"/ABSOLUTE_PATH_TO/crtendS.o" "/lib/../lib64/crtn.o" ``` -You can try to find the paths to those files on your system. However, I recommend to not fix the paths, but rather just re-generate them by copying a bare-mode openmp example and compiling it with your clang. By adding `-###` to your clang invocation, you can see the invidual steps. -It will show multiple steps, just look for the clang-linker-wrapper example. Make sure to still include the path to the `host.o` file, and not whatever tmp file you got when compiling your c++ example with the following call. +You can try to find the paths to those files on your system. +However, I recommend not hardcoding the paths, but rather regenerating them by copying a bare-mode OpenMP example and compiling it with your clang. +By adding `-###` to your clang invocation, you can see the individual steps. +It will show multiple steps; just look for the clang-linker-wrapper invocation. +Make sure to still include the path to your `host.o` file, and not whatever temporary file you got when compiling your C++ example with the following call. ``` myclang++ -fuse-ld=lld -O3 -fopenmp -fopenmp-offload-mandatory --offload-arch=gfx90a omp_bare.cpp -o main -### ```