[Issue]: make error with v1.1.0-2 #2

Description

@billcsm

Problem Description

We followed the instructions to install ANP Plugin v1.1.0-2 from GitHub. The build fails during the "make" step with the following errors:

$ make RCCL_BUILD=$RCCL_BUILD MPI_INCLUDE=$MPI_INCLUDE MPI_LIB_PATH=$MPI_LIB_PATH
/opt/rocm/bin/hipcc -fPIC -g -O2 -O3 -DNDEBUG -Werror -MMD -MP -DTARGET_PLUGIN -DNCCL_BUILD_RDMA_CORE -UANP_TELEMETRY_ENABLED -D__HIP_PLATFORM_AMD__ -Iinclude -I/opt/rocm/include -I/usr/include -I/home/ubuntu/github/rccl/build/include -I/home/ubuntu/github/rccl/build/hipify/src -I/home/ubuntu/github/rccl/build/hipify/src/include -I/opt/ompi/include -c src/net_ib.cc -o build/src/net_ib.o
src/net_ib.cc:46:9: error: 'NCCL_NET_OPTIONAL_RECV_COMPLETION' macro redefined [-Werror,-Wmacro-redefined]
   46 | #define NCCL_NET_OPTIONAL_RECV_COMPLETION    (void *)0x1
      |         ^
/home/ubuntu/github/rccl/build/hipify/src/include/nccl_net.h:18:9: note: previous definition is here
   18 | #define NCCL_NET_OPTIONAL_RECV_COMPLETION 0x1
      |         ^
src/net_ib.cc:56:9: error: 'NCCL_NET_PLUGIN_SYMBOL' macro redefined [-Werror,-Wmacro-redefined]
   56 | #define NCCL_NET_PLUGIN_SYMBOL ncclNetPlugin_v8
      |         ^
/home/ubuntu/github/rccl/build/hipify/src/include/nccl_net.h:120:9: note: previous definition is here
  120 | #define NCCL_NET_PLUGIN_SYMBOL ncclNetPlugin_v9
      |         ^
src/net_ib.cc:2732:14: error: cannot initialize a member subobject of type 'ncclResult_t (*)(void *, void *, size_t, int, void *, void **)' (aka 'ncclResult_t (*)(void *, void *, unsigned long, int, void *, void **)') with an lvalue of type 'ncclResult_t (void *, void *, int, int, void *, void **)': type mismatch at 3rd parameter ('size_t' (aka 'unsigned long') vs 'int')
 2732 |     .isend = anpNetIsend,
      |              ^~~~~~~~~~~
src/net_ib.cc:2733:14: error: cannot initialize a member subobject of type 'ncclResult_t (*)(void *, int, void **, size_t *, int *, void **, void **)' (aka 'ncclResult_t (*)(void *, int, void **, unsigned long *, int *, void **, void **)') with an lvalue of type 'ncclResult_t (void *, int, void **, int *, int *, void **, void **)': type mismatch at 4th parameter ('size_t *' (aka 'unsigned long *') vs 'int *')
 2733 |     .irecv = anpNetIrecv,
      |              ^~~~~~~~~~~
4 errors generated when compiling for gfx942.
failed to execute:/opt/rocm-6.4.0/lib/llvm/bin/clang++  --offload-arch=gfx942 --offload-arch=gfx942 --offload-arch=gfx942 --offload-arch=gfx942 --offload-arch=gfx942 --offload-arch=gfx942 --offload-arch=gfx942 --offload-arch=gfx942  -fPIC -g -O2 -O3 -DNDEBUG -Werror -MMD -MP -DTARGET_PLUGIN -DNCCL_BUILD_RDMA_CORE -UANP_TELEMETRY_ENABLED -D__HIP_PLATFORM_AMD__ -Iinclude -I/opt/rocm/include -I/usr/include -I/home/ubuntu/github/rccl/build/include -I/home/ubuntu/github/rccl/build/hipify/src -I/home/ubuntu/github/rccl/build/hipify/src/include -I/opt/ompi/include -c -x hip src/net_ib.cc -o "build/src/net_ib.o"
make: *** [Makefile:77: build/src/net_ib.o] Error 1
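
The last two errors come down to a function-pointer signature mismatch: the header in the RCCL build tree declares the `isend`/`irecv` slots with `size_t` sizes, while the plugin functions use `int`. The macro errors (`ncclNetPlugin_v8` in the plugin vs `ncclNetPlugin_v9` in the header) suggest the plugin source and the RCCL headers target different plugin API versions, though that is an inference from the log. Below is a minimal standalone sketch of the same incompatibility. All names in it (`result_t`, `HeaderIsend`, `PluginIsend`, and so on) are hypothetical stand-ins, not the real RCCL or ANP declarations; only the parameter lists are copied from the error messages.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for ncclResult_t; the real type lives in the RCCL headers.
using result_t = int;

// Slot types as the header expects them, per the compiler's error text:
// sizes are passed as size_t / size_t*.
using HeaderIsend = result_t (*)(void*, void*, std::size_t, int, void*, void**);
using HeaderIrecv = result_t (*)(void*, int, void**, std::size_t*, int*, void**, void**);

// Function types as the plugin defines them, per the same error text:
// sizes are passed as int / int*.
using PluginIsend = result_t (*)(void*, void*, int, int, void*, void**);
using PluginIrecv = result_t (*)(void*, int, void**, int*, int*, void**, void**);

// Function pointers with different parameter lists are distinct, non-convertible
// types, so a plugin function cannot initialize a header-typed slot.
constexpr bool kIsendCompatible = std::is_convertible<PluginIsend, HeaderIsend>::value;
constexpr bool kIrecvCompatible = std::is_convertible<PluginIrecv, HeaderIrecv>::value;

static_assert(!kIsendCompatible, "int vs size_t at the 3rd parameter");
static_assert(!kIrecvCompatible, "int* vs size_t* at the 4th parameter");
```

The sketch only illustrates why the compiler rejects the `.isend = anpNetIsend` and `.irecv = anpNetIrecv` initializers; it does not say which RCCL version the plugin is meant to be built against.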

Could you please investigate this issue with the AMD-ANP setup?
Thank you.

Operating System

Ubuntu 22.04.5 LTS

CPU

AMD EPYC 9965 192-Core Processor

GPU

AMD Instinct MI325X

ROCm Version

ROCm 6.4.0

ROCm Component

No response

Steps to Reproduce

No response

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response
