This repository contains the pure computational kernels for SpMM (Sparse Matrix-Matrix Multiplication) and SDDMM (Sampled Dense-Dense Matrix Multiplication). The code is designed to run on CPUs and NVIDIA GPUs. The following kernels/libraries are included:
- cuSPARSE (vendor provided library)
- Acc-SpMM: code paper
- ASpT: code paper
- RoDe: code paper
- HC-SpMM: code paper
- dgSPARSE: code paper
- GNN-Pilot: code paper
- DTC-SpMM: code paper
- Sputnik: code paper
Make sure to download all dependencies (FusedMM, Sputnik, PyTorch with CUDA support):
```bash
git submodule update --init --recursive
cd deps
bash ./install_FusedMM_base.sh
bash ./install_sputnik_base.sh
bash ./install_torch_cuda_base.sh
```

Edit the Makefile and run.sh to build the executables needed for the benchmarks. The paths are configured for the epyc5 server (AMD 24-core CPU and NVIDIA A100 GPU).
In run.sh, select the matrices on which you want to run the benchmarks. You can also configure the number of threads for the CPU programs. For every executable, the second input parameter is the dimension k of the dense matrices.
Dimensions of matrices:
- Sparse matrix: m x n
- SpMM: input dense matrix: n x k; output dense matrix: m x k
- SDDMM: left input dense matrix: m x k; right input dense matrix: k x n
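With these shapes, the SpMM computation can be sketched as a plain CSR reference loop. This is an illustrative sketch, not code from the repo; the function name `spmm_ref` and the row-major dense layout are assumptions.

```cpp
#include <vector>
#include <cstddef>

// Reference CSR SpMM sketch: C (m x k) = A (m x n, CSR) * B (n x k),
// with B and C stored row-major. Hypothetical helper, not a repo kernel.
std::vector<float> spmm_ref(int m, int k,
                            const std::vector<int>& rowptr,
                            const std::vector<int>& colidx,
                            const std::vector<float>& vals,
                            const std::vector<float>& B) {
    std::vector<float> C(static_cast<std::size_t>(m) * k, 0.0f);
    for (int i = 0; i < m; ++i)                      // each sparse row
        for (int p = rowptr[i]; p < rowptr[i + 1]; ++p)  // each nonzero
            for (int j = 0; j < k; ++j)              // each dense column
                C[i * k + j] += vals[p] * B[colidx[p] * k + j];
    return C;
}
```

A verified kernel should match this loop nest (up to floating-point rounding) for every test matrix.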
```bash
make clean
make -j
bash ./run.sh
```

You can edit spmm_bench.cpp and sddmm_bench.cpp for better reporting of results (no need to edit the kernel_* files). In their current form, they report:
- the name of the matrix and the number of rows and nonzeros
- the selected kernel
- the performance (in GFLOP/s)
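For context, both SpMM and SDDMM perform roughly 2 * nnz * k floating-point operations (one multiply and one add per stored nonzero, per dense column), so the reported rate can be derived as below. This is a sketch of the usual convention, not the exact code in the bench files; the helper name `gflops` is an assumption.

```cpp
// Hypothetical helper: throughput in GFLOP/s from the matrix's nonzero
// count, the dense dimension k, and the measured kernel time in seconds.
// Assumes the conventional 2 * nnz * k FLOP count for SpMM/SDDMM.
double gflops(long long nnz, int k, double seconds) {
    return (2.0 * nnz * static_cast<double>(k)) / seconds / 1e9;
}
```

For example, a matrix with one million nonzeros at k = 32, finishing in one millisecond, comes out to 64 GFLOP/s.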
Some kernels (dgSPARSE, GNN-Pilot, DTC-SpMM) provide several versions of their SpMM kernel. After testing with more matrices, we could choose to keep only one (the best performing).
Some kernels (Acc-SpMM, DTC-SpMM) produce wrong results. We choose to ignore this for now.
For the remaining kernels, some matrices with very small values may fail the result-verification check; this can be ignored. It happens because 32-bit floating-point numbers are used.
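One way to make the verification robust to such rounding effects is to compare with a relative rather than a purely absolute threshold. A minimal sketch, assuming the bench files currently use a tighter comparison; the function name `close_enough` and the tolerance values are illustrative, not taken from the repo:

```cpp
#include <cmath>

// Hypothetical tolerance check: accept a result when its error is small
// relative to the magnitude of the reference value, with a small absolute
// floor so that values near zero do not trigger spurious failures.
bool close_enough(float got, float ref,
                  float rel = 1e-3f, float abs_tol = 1e-6f) {
    return std::fabs(got - ref) <= abs_tol + rel * std::fabs(ref);
}
```

With a check of this shape, tiny single-precision discrepancies on matrices with very small values pass, while genuinely wrong results still fail.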