Make sure you have the required hardware and software to run CUDA programs:

- A CUDA-enabled GPU with compute capability 5.0 or higher
- CUDA Toolkit version 12.x
- An appropriate NVIDIA driver (see table here)
- Rustup

Download the compiler from the releases page and unpack it, or download it directly for Windows or Linux.

Then continue with linking the compiler and cloning the sample project.
**Note: This document is modified from INSTALL.md and describes building Rust from source.**
Make sure you have installed the following build dependencies:

- `python` 3 or 2.7
- `git`
- A C compiler (when building for the host, `cc` is enough; cross-compiling may need additional compilers)
- `curl` (not needed on Windows)
- `pkg-config` if you are compiling on Linux and targeting Linux
- `libiconv` (already included with glibc on Debian-based distros)

To build Cargo, you'll also need OpenSSL (`libssl-dev` or `openssl-devel` on
most Unix distros).

On this compiler version, you'll need additional tools to compile LLVM:

- `g++`, `clang++`, or MSVC, with versions listed in LLVM's documentation
- `ninja`, or GNU `make` 3.81 or later (Ninja is recommended, especially on Windows)
- `cmake` 3.13.4 or later
- `libstdc++-static` may be required on some Linux distributions such as Fedora and Ubuntu
- Clone the source with git:

  ```shell
  git clone https://github.com/NiekAukes/rust.git
  cd rust
  git checkout kernel-dev-codegen
  ```

- Configure the build settings:

  ```shell
  ./configure --set build.extended=false --set rust.deny-warnings=false
  ```

- Build:

  ```shell
  ./x.py build --stage 1
  ```
MSVC builds of Rust additionally require an installation of Visual Studio 2017
(or later) so rustc can use its linker. The simplest way is to get
Visual Studio and check the "C++ build tools" and "Windows 10 SDK" workloads.
With these dependencies installed, you can build the compiler in a cmd.exe
shell as follows:
- Clone the source with git:

  ```shell
  git clone https://github.com/NiekAukes/rust.git
  cd rust
  git checkout kernel-dev-codegen
  ```

- Configure the build settings:

  ```shell
  x setup compiler
  ```

- Build:

  ```shell
  x build --stage 1 --set llvm.download-ci-llvm=false --set rust.deny-warnings=false
  ```
Right now, building Rust only works with some known versions of Visual Studio. If you have a more recent version installed and the build system doesn't recognize it, you may need to force rustbuild to use an older version. This can be done by manually calling the appropriate vcvars file before running the bootstrap:
```shell
CALL "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Auxiliary\Build\vcvars64.bat"
python x.py build
```

You can clone the sample project setup found in the rust-kernels repository, preferably outside the compiler folder to avoid confusion. In rust-kernels, there is the sample folder and 3 dependencies used to execute the code on the GPU.
There are two ways to link the compiler. For a downloaded release:

```shell
rustup toolchain link rust-gpuhc [path-to-compiler]/rust-gpuhc
```

Or, when building from source:

```shell
rustup toolchain link rust-gpuhc [path-to-compiler]/build/host/stage1
```

In the sample project, create a new folder called `.cargo` and, in that folder, a file `config.toml`. This file should have contents similar to:

```toml
[build]
rustc = "[path-to-compiler]/rust-gpuhc/bin/rustc"
# or, when building from source:
# rustc = "[path-to-compiler]/build/host/stage1/bin/rustc"
```
Please note that rust-analyzer is not adapted to this compiler and may give faulty feedback. Please refer to the compiler output for potential syntax errors.
To build and run the code, you can use the cargo tool Rust provides. Make sure you have followed the steps above to link the compiler to rustup or cargo.
With this compiler, writing code for the GPU is quite straightforward. To designate a function as runnable on the GPU, use the `#[kernel]` attribute. This attribute can only be used on functions. Furthermore, such a function is no longer callable on the CPU, as it is entirely replaced by a bytecode reference.
```rust
// an example of a kernel function that fills a simple array
#[kernel]
unsafe fn gpu64(mut a: Buffer<i32>) {
    let i = gpu::global_tid_x();
    a.set(i as usize, i);
}
```

An important conceptual limitation of GPU programming in Rust is that mutable references may not be passed to the GPU. This is because the reference is inherently copied to multiple threads, which is not allowed in Rust. To work around this, the compiler provides a `Buffer<T>` type, which refers to a mutable buffer on the GPU. This buffer can be read from and written to without any issues using the `set` and `get` methods.
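The same rule is visible in ordinary CPU Rust: a single `&mut` may not be shared between threads, but disjoint sub-views may. Below is a minimal plain-Rust sketch (no GPU types; `parallel_fill` is a name invented for this example) in which each scoped thread gets exclusive access to one element, which is the access pattern `Buffer<T>` models on the GPU:

```rust
use std::thread;

// Each "thread" gets exclusive access to one element, like one GPU lane
// writing through its own Buffer slot; no &mut is ever shared.
fn parallel_fill(len: usize) -> Vec<i32> {
    let mut out = vec![0i32; len];
    thread::scope(|s| {
        // chunks_mut splits the slice into disjoint mutable views
        for (tid, chunk) in out.chunks_mut(1).enumerate() {
            s.spawn(move || {
                chunk[0] = tid as i32 * 2;
            });
        }
    });
    out
}

fn main() {
    println!("{:?}", parallel_fill(4)); // [0, 2, 4, 6]
}
```

Because each thread writes only through its own `chunk`, no mutable reference is ever aliased, mirroring one GPU thread writing one `Buffer` slot.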
```rust
#[kernel]
unsafe fn add(mut a: Buffer<i32>,
              mut b: Buffer<i32>,
              mut out: Buffer<i32>) {
    let i = gpu::global_tid_x() as usize;
    out.set(i, a.get(i) + b.get(i));
    a.set(i, 0);
    b.set(i, 0);
}
```

This constraint is not directly enforced by the compiler. However, the interface to run GPU code only accepts `Buffer<T>` types, and will not allow you to pass mutable references.
Unfortunately, not all language features are supported by the compiler, and some care should be taken when writing code. A very stringent limitation is that kernels cannot use features defined in the std library, even ones that could be defined for non-std use cases. Examples of this are panics and (dynamic) memory allocation. The compiler does not support these features and will crash when you use them.
To make a program compilable, an engine must be specified. This is done by adding the `#![engine(cuda::engine)]` attribute to the crate root. This attribute is required for the compiler to know where to store the compiled code.
Note: This section assumes you are using the cuda engine as specified in the previous section.
To run a GPU kernel, you can call the `kernel.launch(threads, blocks, args...)` function. This function takes the desired number of threads to run per block, the number of blocks to run, and the arguments to pass to the kernel. The arguments must have the same types as the kernel's parameters.
```rust
#[kernel]
unsafe fn add2(a: &[i32], b: &[i32], mut out: Buffer<i32>) {
    let i = gpu::global_tid_x() as usize;
    out.set(i, a[i] + b[i]);
}

fn main() {
    let a = vec![1, 2, 3, 4, 5];
    let b = vec![5, 4, 3, 2, 1];
    let out = Buffer::alloc(5).unwrap();
    add2.launch(5, 1, &a.as_slice(), &b.as_slice(), out)
        .unwrap();
    let result = out.retrieve().unwrap();
    println!("{:?}", result);
}
```

Unlike other types, `Buffer<T>` cannot be used on the CPU. Buffers are only valid on the GPU; using a buffer on the CPU will result in undefined behavior. `Buffer::alloc` is used to create a new empty buffer with memory allocated on the GPU. To create a buffer with data, use `Buffer::allocate_with`.
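As a mental model for the example above, `launch(threads, blocks, …)` runs the kernel body once for every global thread id in `0..threads * blocks`. The sketch below simulates this sequentially on the CPU (`simulate_launch` is a hypothetical helper invented for illustration; a real launch runs the threads in parallel on the GPU):

```rust
// Sequential CPU model of launch(threads_per_block, blocks, ...):
// the body runs once per global thread id.
fn simulate_launch(threads_per_block: u32, blocks: u32, mut body: impl FnMut(u32)) {
    for block in 0..blocks {
        for thread in 0..threads_per_block {
            // inside a kernel, gpu::global_tid_x() yields this value
            body(block * threads_per_block + thread);
        }
    }
}

fn main() {
    let a = vec![1, 2, 3, 4, 5];
    let b = vec![5, 4, 3, 2, 1];
    let mut out = vec![0i32; 5];
    // same element-wise addition as the add2 kernel above
    simulate_launch(5, 1, |i| {
        let i = i as usize;
        out[i] = a[i] + b[i];
    });
    println!("{:?}", out); // [6, 6, 6, 6, 6]
}
```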
To copy data from the GPU, you can use the `retrieve` method. This method copies the data from the GPU to the CPU and returns it as a vector.

The `launch_with_dptr` function is a more advanced version of the `launch` function. It allows you to pass pointers to device memory as arguments. This can be useful when you want to pass data that is continuously updated on the GPU. Use the `to_device` method to copy variables to the GPU.
```rust
fn main() {
    let a = vec![1, 2, 3, 4, 5];
    let b = vec![5, 4, 3, 2, 1];
    // with a Buffer, the data is already on the GPU;
    // the to_device method simply converts the Buffer to a DPtr
    let mut out = Buffer::<i32>::alloc(5).unwrap().to_device().unwrap();
    let mut da = a.as_slice().to_device().unwrap();
    let mut db = b.as_slice().to_device().unwrap();
    add2.launch_with_dptr(5, 1, &mut da, &mut db, &mut out);
    let out = out.retrieve();
    println!("{:?}", out);
}
```

The compiler is still in development, and some features may not work as expected. This usually results in a crash of the compiler, but may also result in undefined behavior. If you encounter any issues, please report them on the issues page of the compiler repository, and include the code that caused the issue.
- `incompatible NVVM version`: Most likely, your driver version is not compatible with the CUDA toolkit you're running. Please install an appropriate NVIDIA driver (see table here).
- `parse invalid cast opcode for cast from 'i8*' to 'i64'`: This is a known issue with the compiler. Compiling with `--release` should fix this issue in most cases. If not, please report it on the issues page.