When the p2p bench tests are launched with more than 2 processes, form even/odd pairs and run the tests pair-wise. This patch adapts only p2p_one for now.
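The even/odd pairing can be sketched as below. This is a minimal illustration, not the code from the patch: the names `pair_partner` and `pair_role_of` are hypothetical, and both the choice that even ranks send and the handling of a leftover rank when the process count is odd are assumptions.

```c
#include <assert.h>

/* Role of a rank in an even/odd pair-wise test (hypothetical names). */
enum pair_role { PAIR_SENDER, PAIR_RECEIVER, PAIR_IDLE };

/* Return the partner of `rank`: even rank r pairs with r + 1, odd rank r
 * pairs with r - 1. Returns -1 when the partner does not exist, i.e. for
 * the last rank when nprocs is odd (idling that rank is an assumption). */
int pair_partner(int rank, int nprocs)
{
    assert(nprocs >= 2 && rank >= 0 && rank < nprocs);
    int partner = (rank % 2 == 0) ? rank + 1 : rank - 1;
    return (partner < nprocs) ? partner : -1;
}

/* Assumption: the even rank of each pair sends, the odd rank receives. */
enum pair_role pair_role_of(int rank, int nprocs)
{
    if (pair_partner(rank, nprocs) < 0)
        return PAIR_IDLE;
    return (rank % 2 == 0) ? PAIR_SENDER : PAIR_RECEIVER;
}
```

In an MPI program, `rank` and `nprocs` would come from `MPI_Comm_rank` and `MPI_Comm_size`, and the returned partner would be the destination or source rank of the send and receive calls.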
Allocate a single global buffer. Benchmarks rotate through the global buffer to discount the effect of memory caching for small messages. Bandwidth measurements use multiple buffers within the global buffer, which is closer to correct usage.
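One way to picture the rotation is as an offset computation over fixed-size slots. The helper below is a hypothetical sketch, assuming the global buffer is divided into as many whole message-sized slots as fit and that each iteration moves to the next slot; the patch may choose offsets differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: byte offset into the global buffer for iteration
 * `iter`, when messages of `msg_size` bytes rotate through a buffer of
 * `global_size` bytes. Consecutive iterations touch different memory,
 * so a small message is not served from cache on every iteration. */
size_t rotate_offset(size_t iter, size_t msg_size, size_t global_size)
{
    assert(msg_size > 0 && msg_size <= global_size);
    size_t nslots = global_size / msg_size;   /* whole slots that fit */
    return (iter % nslots) * msg_size;        /* wrap around at the end */
}
```

A send in iteration `i` would then start at `(char *) global_buf + rotate_offset(i, size, global_size)`, where `global_buf` is an assumed name for the single allocation. When the message is as large as the buffer, there is only one slot and the offset is always 0.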
There is no need to batch a window for large-message bandwidth measurements, because completing multiple messages inside a single progress poll is not realistic. Rather, having multiple large messages in flight concurrently may reduce bandwidth if the underlying algorithm, such as pipelining, overwhelms progress. Reduce the window size to 1 for large messages. This also makes the benchmark complete faster, so we can go up to a larger MAX_BUFSIZE.
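The window-size rule can be sketched as a simple threshold. Both constants below are illustrative assumptions and are not taken from the patch, which does not state here where the large-message cutoff lies or what window small messages use.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative assumptions, not values from the patch. */
#define LARGE_MSG_THRESHOLD ((size_t) 64 * 1024)  /* bytes */
#define SMALL_MSG_WINDOW 64                       /* messages per batch */

/* Number of messages to post before waiting for completion. Small
 * messages are batched; large messages are sent one at a time. */
int window_size_for(size_t msg_size)
{
    assert(msg_size > 0);
    return (msg_size >= LARGE_MSG_THRESHOLD) ? 1 : SMALL_MSG_WINDOW;
}
```

A bandwidth loop would post `window_size_for(size)` nonblocking sends and then wait on all of them, so for large sizes each iteration has exactly one message in flight.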
Similar to p2p_one and p2p_bw, add self_one and self_bw, which test sending self messages within a single process. Self tests are useful for testing local memcpy bandwidth.
Add a fast path for the case where the memory is GPU memory.
Add more collective latency tests.
Rename bcast.def to coll_latency.def so that it can hold multiple collective latency tests. Only allreduce is added for now.
Pull Request Description
It will run 4-pair bandwidth tests between consecutive even/odd ranks.
Add p2p_self tests. It is comparable to a memcpy bandwidth test.
Rotate buffers in p2p tests. Avoid undesired caching effect esp. for small messages.
Adjust window size in bandwidth tests. More accurate and faster for large message measurements.
[skip warnings]
Author Checklist
Particularly focus on why, not what. Reference background, issues, test failures, xfail entries, etc.
Commits are self-contained and do not do two things at once.
Commit message is of the form:
module: short description
Commit message explains what's in the commit.
Whitespace checker. Warnings test. Additional tests via comments.
For non-Argonne authors, check contribution agreement.
If necessary, request an explicit comment from your company's PR approval manager.