ch4/ofi: rma get algorithm using mirror buffers #7695
Draft
hzhou wants to merge 4 commits into pmodels:main from
When the window base is a GPU buffer and native RMA is not supported, we currently allocate a pack buffer and perform both a GPU registration and a NIC MR registration for every put or get. This is inefficient. In addition, the provider may not support many concurrent MR keys. Add MPIR_CVAR_OFI_ENABLE_WIN_MIRROR; when it is enabled, a mirror buffer is allocated for the window, which avoids allocating separate pack buffers and performing separate registrations.
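The difference in registration count can be illustrated with a toy model. This is only a sketch under stated assumptions: `toy_win`, `toy_mr_reg`, and the other `toy_` names are hypothetical and are not MPICH or libfabric APIs, plain `memcpy` stands in for both the device-to-host copy and the NIC transfer, and the assumption that the mirror is refreshed from the window memory before each get is mine, not something the commit message states.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins only: none of these are MPICH or libfabric APIs. */
static int toy_mr_reg_count = 0;

/* Pretend to register a buffer with the NIC; this only counts calls. */
static void toy_mr_reg(const void *buf, size_t len)
{
    (void) buf;
    (void) len;
    toy_mr_reg_count++;
}

typedef struct {
    char *base;    /* stands in for the GPU window memory */
    char *mirror;  /* host mirror buffer, NULL when the mirror is disabled */
    size_t size;
} toy_win;

static int toy_win_init(toy_win *win, char *base, size_t size, int enable_mirror)
{
    win->base = base;
    win->size = size;
    win->mirror = NULL;
    if (enable_mirror) {
        win->mirror = malloc(size);
        if (win->mirror == NULL)
            return -1;
        toy_mr_reg(win->mirror, size);  /* single registration for the window lifetime */
    }
    return 0;
}

/* Serve a get of len bytes at offset off, writing the result to out. */
static int toy_win_serve_get(toy_win *win, size_t off, size_t len, char *out)
{
    if (off > win->size || len > win->size - off)
        return -1;
    if (len == 0)
        return 0;
    if (win->mirror != NULL) {
        /* Assumed flow: refresh the mirror, then transfer from the mirror. */
        memcpy(win->mirror + off, win->base + off, len);
        memcpy(out, win->mirror + off, len);
        return 0;
    }
    /* Per-operation path: allocate and register a pack buffer every time. */
    char *pack = malloc(len);
    if (pack == NULL)
        return -1;
    toy_mr_reg(pack, len);
    memcpy(pack, win->base + off, len);
    memcpy(out, pack, len);
    free(pack);
    return 0;
}

static void toy_win_free(toy_win *win)
{
    free(win->mirror);
    win->mirror = NULL;
}
```

In this model, N gets cost one registration with the mirror enabled and N registrations without it, which is the overhead the commit message describes.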
Add an OFI RMA get fallback that uses active messages and the mirror buffer. The implementation is similar to the MPIDIG AM get, but it does not use MPIDIG_REQUEST, and it uses the async framework for the local async steps.
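As a rough illustration of what driving local steps through an async framework can look like, here is a minimal poll-based task queue. It is a sketch, not MPICH's async interface: every `toy_` name is hypothetical, and the countdown task merely stands in for a get that needs several local steps before it completes.

```c
#include <stddef.h>

#define TOY_MAX_TASKS 8

/* A poll function returns 1 once its task has finished, 0 otherwise. */
typedef int (*toy_poll_fn)(void *state);

typedef struct {
    toy_poll_fn poll;
    void *state;
} toy_task;

typedef struct {
    toy_task tasks[TOY_MAX_TASKS];
    size_t count;
} toy_async_queue;

/* Enqueue a task; returns -1 when the queue is full. */
static int toy_async_add(toy_async_queue *q, toy_poll_fn poll, void *state)
{
    if (q->count == TOY_MAX_TASKS)
        return -1;
    q->tasks[q->count].poll = poll;
    q->tasks[q->count].state = state;
    q->count++;
    return 0;
}

/* One progress pass: poll every task once and drop the finished ones.
 * Returns the number of tasks still pending. */
static size_t toy_async_progress(toy_async_queue *q)
{
    size_t kept = 0;
    for (size_t i = 0; i < q->count; i++) {
        toy_task t = q->tasks[i];
        if (!t.poll(t.state))
            q->tasks[kept++] = t;
    }
    q->count = kept;
    return kept;
}

/* Example task: a countdown standing in for a multi-step local operation. */
static int toy_countdown_poll(void *state)
{
    int *remaining = state;
    if (*remaining > 0)
        (*remaining)--;
    return *remaining == 0;
}
```

The point of the pattern is that the task carries only the state it needs, so no full request object is required to track the local steps.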
Refactor the asynchronous pipelined RDMA read operation so that it can be reused by other parts of the code. For now, as a compromise, it uses an MPIR_Request for the async context, which minimizes the code changes to ofi_rndv_read.c. The RDMA read operation on its own does not need the full MPIR_Request.
A direct fi_read won't work if the origin buffer is a device buffer or uses a non-contiguous datatype. The async operation MPIDI_OFI_rdmaread_poll handles the pipelined read and supports both device buffers and non-contiguous datatypes.
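The idea of a chunked read that is driven by a poll function and unpacks into a non-contiguous destination can be sketched as follows. This is not the code in ofi_rndv_read.c: all `toy_` names are hypothetical, `memcpy` stands in for the RDMA read into a host staging chunk, the destination layout is a simple fixed-stride vector, and the sketch moves one chunk per poll, whereas a real pipeline would presumably keep several chunks in flight.

```c
#include <stddef.h>
#include <string.h>

#define TOY_CHUNK 8  /* staging chunk size in bytes (illustrative value) */

typedef struct {
    const char *remote;     /* stands in for remote registered memory (packed) */
    size_t total;           /* total number of packed bytes to read */
    size_t done;            /* packed bytes already read and unpacked */
    char *dst;              /* strided (non-contiguous) destination */
    size_t blocklen;        /* bytes per block, must be > 0 */
    size_t stride;          /* distance between block starts, >= blocklen */
    char chunk[TOY_CHUNK];  /* host staging buffer */
} toy_read_state;

/* Unpack len packed bytes, starting at packed offset pos, into the strided layout. */
static void toy_unpack(toy_read_state *st, const char *src, size_t pos, size_t len)
{
    while (len > 0) {
        size_t blk = pos / st->blocklen;
        size_t in_blk = pos % st->blocklen;
        size_t n = st->blocklen - in_blk;
        if (n > len)
            n = len;
        memcpy(st->dst + blk * st->stride + in_blk, src, n);
        src += n;
        pos += n;
        len -= n;
    }
}

/* One poll: move at most one chunk. Returns 1 when the read is complete. */
static int toy_read_poll(toy_read_state *st)
{
    if (st->done < st->total) {
        size_t n = st->total - st->done;
        if (n > TOY_CHUNK)
            n = TOY_CHUNK;
        memcpy(st->chunk, st->remote + st->done, n);  /* stands in for one RDMA read */
        toy_unpack(st, st->chunk, st->done, n);
        st->done += n;
    }
    return st->done == st->total;
}
```

Because every chunk lands in host staging memory before being unpacked, the same loop could serve a device destination by swapping the unpack copy for a host-to-device copy, which is why a staged read covers both cases where a direct read does not.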
test:mpich/custom/pipeline test:mpich/ch4/ofi
Pull Request Description
MPI_Get will fall back to the MPIDIG active message layer when either the window buffers are device buffers (and libfabric HMEM is not enabled or supported) or the origin buffer is a device buffer. The active message transport in ofi uses RDMA for data greater than 16KB. With GPU memory for both the origin buffer and the window base buffer, that involves an MR registration per message and an extra allocation of staging buffers on both the origin and target sides. Besides, both staging steps are currently synchronous. This PR proposes an alternative fallback algorithm -
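The conditions described above, under which MPI_Get currently leaves the native path, can be written as a small decision function. This illustrates the existing fallback conditions only, not the algorithm this PR proposes. It is a sketch: the enum, the function, and the "eager" label for messages at or below the threshold are hypothetical names, and only the 16KB threshold and the two fallback conditions come from the description.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for the paths described in the PR text. */
typedef enum {
    TOY_GET_NATIVE,    /* native RMA */
    TOY_GET_AM_EAGER,  /* active message, payload at or below the RDMA threshold */
    TOY_GET_AM_RDMA    /* active message, payload moved by RDMA */
} toy_get_path;

#define TOY_AM_RDMA_THRESHOLD ((size_t) 16 * 1024)  /* 16KB, per the description */

static toy_get_path toy_choose_get_path(bool win_is_device, bool origin_is_device,
                                        bool hmem_enabled, size_t bytes)
{
    /* Fall back when the window is device memory without HMEM support,
     * or when the origin buffer is device memory. */
    bool fallback = (win_is_device && !hmem_enabled) || origin_is_device;
    if (!fallback)
        return TOY_GET_NATIVE;
    return bytes > TOY_AM_RDMA_THRESHOLD ? TOY_GET_AM_RDMA : TOY_GET_AM_EAGER;
}
```

For example, a 1MB get with device memory on both sides and HMEM disabled selects `TOY_GET_AM_RDMA`, the case that pays for a registration and staging buffers on every message.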
[skip warnings]
Author Checklist
Particularly focus on why, not what. Reference background, issues, test failures, xfail entries, etc.
Commits are self-contained and do not do two things at once.
Commit message is of the form: module: short description
Commit message explains what's in the commit.
Whitespace checker. Warnings test. Additional tests via comments.
For non-Argonne authors, check contribution agreement.
If necessary, request an explicit comment from your company's PR approval manager.