Added Persistent Collectives to Develop#70
Conversation
TheMasterDirk
left a comment
There was a problem hiding this comment.
A couple of minor changes, and a thought on your duplicated communicators.
I did not check the algorithms themselves, as I trust that you've got them working with the tests you added.
| #if defined(GPU_AWARE) | ||
| ALLTOALL_INIT_GPU_PAIRWISE, | ||
| ALLTOALL_INIT_GPU_NONBLOCKING, | ||
| #if defined(MPI4) |
There was a problem hiding this comment.
Is this (MPI4) something we define?
| int gpu_error; | ||
| if (request->gpu_sendbuf) | ||
| { | ||
| #if defined(APU) |
There was a problem hiding this comment.
Is this (APU) added anywhere in the CMake code?
|
|
||
| if (local_S_request->n_msgs) | ||
| { | ||
| MPI_Waitall(local_S_request->n_msgs, local_S_request->requests, MPI_STATUSES_IGNORE); MPI_Reduce_local(request->tmpbuf, request->recvbuf, request->count, |
There was a problem hiding this comment.
File could use a formatting pass. Looks like this line didn't get it on its own line
| bool gpu_aware = false; | ||
| bool copy_to_cpu = false; |
There was a problem hiding this comment.
Just a minor tweak to fix compile warnings in non-gpu mode
| bool gpu_aware = false; | |
| bool copy_to_cpu = false; | |
| #if defined(GPU) | |
| bool gpu_aware = false; | |
| bool copy_to_cpu = false; | |
| #endif | |
| bool gpu_aware = false; | ||
| bool copy_to_cpu = false; |
There was a problem hiding this comment.
| bool gpu_aware = false; | |
| bool copy_to_cpu = false; | |
| #if defined(GPU) | |
| bool gpu_aware = false; | |
| bool copy_to_cpu = false; | |
| #endif |
| bool gpu_aware = false; | ||
| bool copy_to_cpu = false; |
There was a problem hiding this comment.
| bool gpu_aware = false; | |
| bool copy_to_cpu = false; | |
| #if defined(GPU) | |
| bool gpu_aware = false; | |
| bool copy_to_cpu = false; | |
| #endif |
| bool gpu_aware = false; | ||
| bool copy_to_cpu = false; |
There was a problem hiding this comment.
| bool gpu_aware = false; | |
| bool copy_to_cpu = false; | |
| #if defined(GPU) | |
| bool gpu_aware = false; | |
| bool copy_to_cpu = false; | |
| #endif |
| if (request->cpu_sendbuf) | ||
| // Added with MPI_Comm_dup, so need to free | ||
| // TODO: now that we have cached communicators, | ||
| // can we avoid this dup?? |
There was a problem hiding this comment.
Is global_comm even needed in the request class? I don't see it used other than the dup and the free. Unless you plan to use it in other algorithms, this global_comm can be removed`.
If it is needed, then I think the dup global_comm can be avoided if we turn MPIL_COMM::global_comm into the new CachedComm type, and then also using the same type in MPIL_Request::global_comm. Let me know if there's a different rationale for why it might need to be duped.
| MPI_Comm_free(&(request->global_comm)); | ||
| } | ||
| if (request->cpu_recvbuf) | ||
| if (request->local_comm != MPI_COMM_NULL) |
There was a problem hiding this comment.
(separate comment for local_comm)
For local comm, I think a similar approach could be used if you changed the type of the of local_comm in allreduce_init_dissemination_loc_core to CachedComm, as then you would not need to duplicate it and can just use the reference counted MPI_Comm wrapper (CachedComm).
This should work as long as anything calling allreduce_init_dissemination_loc_core is providing a communicator we create. Do you expect this to always be the case?
If it's not, then we may need to create a smarter caching mechanism
Merged persistent allreduce, allgather, alltoall, and alltoallv. Also merged the GPU-Aware and Copy-to-CPU methods for each. There are tests for the standard CPU implementations as well as GPU versions. All tests passed on Tuolumne.