segmentation fault with >= 64 nodes on Frontier #13

Description

@BenWibking

I can run problem 1 successfully on Frontier with fewer than 64 nodes, but I get a segmentation fault with 64 or more nodes:

Running with these driver parameters:
  Problem ID    = 1

=============================================
Hypre init times:
=============================================
Hypre init:
  wall clock time = 0.000006 seconds
  Laplacian_27pt:
    (Nx, Ny, Nz) = (1600, 1600, 1600)
    (Px, Py, Pz) = (8, 8, 8)

srun: error: frontier04522: tasks 282-287: Segmentation fault
srun: Terminating StepId=2131722.0

Segmentation faults are reported for all of the other MPI ranks as well.

I built Hypre v2.31.0 with:

./configure --with-hip --with-gpu-arch=gfx90a --with-MPI-lib-dirs="${MPICH_DIR}/lib" --with-MPI-libs="mpi" --with-MPI-include="${MPICH_DIR}/include" --enable-mixedint

using the cce/17.0.0, rocm/5.7.1, and cray-mpich/8.1.28 modules.

I'm running the problem with:

#SBATCH --ntasks-per-node=8
#SBATCH --cpus-per-task=7
#SBATCH --gpus-per-task=1
#SBATCH --gpu-bind=closest
#SBATCH -N 64

export LD_LIBRARY_PATH=${CRAY_LD_LIBRARY_PATH}:${LD_LIBRARY_PATH}
export MPICH_GPU_SUPPORT_ENABLED=1

srun ./amg -problem 1 -n 200 200 200 -P 8 8 8
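
For reference, here is a small sketch of the arithmetic behind the global problem size for this run. It only multiplies the `-n` and `-P` values from the command above and compares the result to the signed 32-bit limit. The 8 x 8 x 4 process grid used for the 32-node comparison is an assumption (I have not listed the exact smaller layouts here), and the helper name `global_rows` is made up for illustration. This does not establish the cause of the crash; it just shows that the 64-node case is where the global row count first exceeds 2^31 - 1 for 200^3 points per rank, which may be relevant given the `--enable-mixedint` build.

```python
# Global grid size implied by `-n 200 200 200 -P 8 8 8` (values from the run above).
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold


def global_rows(local_n, procs):
    """Total unknowns: product over dimensions of (local points * ranks)."""
    total = 1
    for n, p in zip(local_n, procs):
        total *= n * p
    return total


local_n = (200, 200, 200)  # per-rank grid points, from -n

# 64 nodes x 8 ranks per node = 512 ranks, laid out as 8 x 8 x 8 (from -P).
rows_64 = global_rows(local_n, (8, 8, 8))

# Assumed layout for a 32-node (256-rank) run: 8 x 8 x 4. Hypothetical.
rows_32 = global_rows(local_n, (8, 8, 4))

for label, rows in (("64 nodes", rows_64), ("32 nodes (assumed)", rows_32)):
    print(f"{label}: global rows = {rows:,}, exceeds int32: {rows > INT32_MAX}")
```

The 64-node value matches the `(Nx, Ny, Nz) = (1600, 1600, 1600)` line in the output above, i.e. 4,096,000,000 rows.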
