Fi mr virt addr kludge #1

Open
hkuno wants to merge 340 commits into guserav:osc-fsm-ofi-component from hkuno:fi_mr_virt_addr_kludge

Conversation

hkuno commented Sep 17, 2019

Changes to enable tests for Tech Con submission

Note that commit 64d7ea5 is a kludge.
It is unclear whether the zhpe provider for Libfabric is not properly handling the case
where FI_MR_VIRT_ADDR is not set, or whether something is going wrong in ompi.
I will open a separate issue to write a test or otherwise work out what is happening.
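
For context on why the flag matters: with `FI_MR_VIRT_ADDR` set, peers address a registered region by its virtual address; without it, they use a 0-based offset into the region. The sketch below illustrates only that distinction. It is not code from this PR or from the zhpe provider, and `SKETCH_MR_VIRT_ADDR` and `sketch_remote_rma_addr` are hypothetical names standing in for the real Libfabric flag and for whatever the OSC component actually does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for Libfabric's FI_MR_VIRT_ADDR mode bit.
 * The real value comes from the Libfabric headers. */
#define SKETCH_MR_VIRT_ADDR (1u << 0)

/* Compute the address to pass to an RMA operation.
 *   mr_mode     - memory-registration mode bits reported for the domain
 *   remote_base - virtual address where the peer's registered region starts
 *   target      - virtual address inside that region we want to access */
uint64_t sketch_remote_rma_addr(uint32_t mr_mode, uint64_t remote_base,
                                uint64_t target)
{
    if (mr_mode & SKETCH_MR_VIRT_ADDR) {
        /* Virtual addressing: the peer expects the address itself. */
        return target;
    }
    /* Offset addressing: the peer expects a 0-based offset into the region. */
    assert(target >= remote_base);
    return target - remote_base;
}
```

If a caller and a provider disagree about which of the two branches applies, every remote access is shifted by `remote_base`, which is one way the behaviour described above could arise.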

kawashima-fj and others added 30 commits July 23, 2019 08:45
These issues were introduced in the recent commit b71af0e.
This commit fixes Coverity CID 1451661 and 1451660.

Though `c_info` part was an actual bug, the `c_sendtypes` part was not.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
…binding

pcollreq/mpif-h: fix MPIX_Alltoallw_init() binding
fortran/mpif-h: fix C to Fortran error code conversion
osc/ucx: Add support for the no_locks info key
Provide locality for all procs on node
Signed-off-by: Nysal Jan K.A <jnysal@in.ibm.com>
Signed-off-by: Mikhail Brinskii <mikhailb@mellanox.com>
Signed-off-by: Austen Lauria <awlauria@us.ibm.com>
Try to prevent the compiler from optimizing out MPIR_Breakpoint().
COLL/TUNED: Update alltoall selection rule for mellanox platform
osc/ucx: Fix data corruption with non-contiguous accumulates
Signed-off-by: Ralph Castain <rhc@pmix.org>
Because IF_NAMESIZE is a reused and conditionally defined macro,
issues could arise from macro mismatches. In particular, in cases where
opal/util/if.h is included, but net/if.h is not, IF_NAMESIZE will be 32.
If net/if.h is included on Linux systems, IF_NAMESIZE will be 16. The
same macro can therefore have two different values on one system, so
different parts of the code can have differing ideas on the size of a
structure containing a char name[IF_NAMESIZE]. To avoid this error case,
we avoid reusing the IF_NAMESIZE macro and instead define our own as
OPAL_IF_NAMESIZE.

Signed-off-by: William Zhang <wilzhang@amazon.com>
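
The mismatch described in this commit message can be illustrated with a small sketch. The two structs below are hypothetical; they only model what two source files would see if one picked up the value 16 and the other the value 32. The `SKETCH_` macro names are made up for the example and are not Open MPI identifiers.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the two values described in the commit message. */
#define SKETCH_NAMESIZE_NET_IF  16  /* IF_NAMESIZE from net/if.h on Linux */
#define SKETCH_NAMESIZE_OPAL_IF 32  /* value previously supplied by opal/util/if.h */

/* Hypothetical interface record as seen by a file that included net/if.h. */
struct iface_seen_with_net_if {
    char name[SKETCH_NAMESIZE_NET_IF];
    int  index;
};

/* The same record as seen by a file that only included opal/util/if.h. */
struct iface_seen_with_opal_if {
    char name[SKETCH_NAMESIZE_OPAL_IF];
    int  index;
};

/* The two views disagree on the size of the record ... */
static_assert(sizeof(struct iface_seen_with_net_if) !=
              sizeof(struct iface_seen_with_opal_if),
              "layouts differ when the macro value differs");

/* ... and on where the field after the name lives. */
size_t index_offset_net_if(void)
{
    return offsetof(struct iface_seen_with_net_if, index);
}

size_t index_offset_opal_if(void)
{
    return offsetof(struct iface_seen_with_opal_if, index);
}
```

Passing such a record between the two files would read `index` from the wrong offset, which is why a single, unconditionally defined macro avoids the problem.
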
…entinel

fortran/use-mpi-f08: do not slurp the sentinel module files
This commit fixes issue open-mpi#6853 by removing
MacOS/Darwin-specific logic from intercept_mmap.
It also opportunistically converts tabs to spaces.

Signed-off-by: Harumi Kuno <harumi.kuno@hpe.com>
Split the sentinel library in ompi/mpi/fortran/use-mpi-f08 into
 - the real sentinel that contains no code (only used to build the .mod files)
 - an internal library that does contain some code
and have libmpi_usempif08.la slurp the latter.

This fixes a regression introduced in open-mpi/ompi@5de5e75

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
…entinel

fortran/use-mpi-f08: slurp missing code
- added PPN hint for UCX context init

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
Signed-off-by: Ralph Castain <rhc@pmix.org>
and SEEK_CUR. fixes an issue reported by Wei-keng Liao

Fixes Issue open-mpi#6858

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
opal/util: Change opal/util/if.h macro IF_NAMESIZE to OPAL_IF_NAMESIZE
io_ompio_file_open: fix offset calculation with SEEK_END
Override the defaults when provided. Ignore LSF binding file if user
overrides by specifying a policy.

Fixes open-mpi#6631

Signed-off-by: Ralph Castain <rhc@pmix.org>
Allow individual jobs to set their map/rank/bind policies
Provide a missing header and paren

Thanks to @zerothi for the assistance

Signed-off-by: Ralph Castain <rhc@pmix.org>
edgargabriel and others added 28 commits November 25, 2019 12:32
Fix typo in comment: contiaing -> containing
Add the missing code to check a return code
…message

fix misleading error message with missing #! interpreter
…y_fortran

cray ftn: modify fortran module loc checker
remove an extra and useless comma

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
…hortfloat

configury: fix a typo in mpiext/shortfloat
oshmem: fix race condition on new contexts
Squash compiler warning.

Signed-off-by: William Bailey <wbailey2@nd.edu>
Squash compiler warning.

ROMIO is third-party software but has an annoying compiler warning;
this is the minimum distance fix.

Signed-off-by: William Bailey <wbailey2@nd.edu>
Nidmap.c and onesided_aggregation.c Compiler Warnings fix
Squash compiler warning due to whitespace/brace problems.

The code block from lines 829-839 was improperly indented, which led to
both the code being confusing and a compiler warning. Comparing this code to
the current version in the MPICH repo made it clear that the code was simply
improperly indented. Fixing the indentation both makes the code readable and
squashes the compiler warning.

Signed-off-by: Maxwell Coil <mcoil@nd.edu>
The prior implementation was simply wrong.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
romio: Update ADIOI_R_Exchange_data function
…-fix

mpool/base: fix basic mpool_base() function
Squash compiler warning. Changed output specifier to match variable type (long int -> long long int).

Signed-off-by: William Bailey <wbailey2@nd.edu>
syscall() returns a long, but we are invoking shmat(), which returns
a void*.

Signed-off-by: Maxwell Coil <mcoil@nd.edu>
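
A minimal sketch of the type mismatch this commit message describes, assuming the fix amounts to an explicit conversion (the actual change is not shown on this page). The helper name `sketch_syscall_result_to_ptr` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* The conversion below is only lossless if a long fits in intptr_t. */
static_assert(sizeof(intptr_t) >= sizeof(long), "long must fit in intptr_t");

/* Convert the raw long returned by syscall() into the void* that a
 * shmat() caller expects. Going through intptr_t makes the
 * integer-to-pointer conversion explicit, so the compiler does not warn
 * about an implicit conversion. */
void *sketch_syscall_result_to_ptr(long result)
{
    return (void *) (intptr_t) result;
}
```

The failure value is preserved: a raw result of -1 becomes `(void *) -1`, which is what `shmat()` returns on error.
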
Squash compiler warning.

Signed-off-by: Maxwell Coil <mcoil@nd.edu>
Squash compiler warning.

Signed-off-by: Maxwell Coil <mcoil@nd.edu>
…_support

fcoll/two_phase_support: warning stomp
…ls4.fix.flowcontrol.bugs

portals4: fix flow control bugs
Signed-off-by: Harumi Kuno <harumi.kuno@hpe.com>

Conflicts:
	opal/mca/common/ofi/Makefile.am
	opal/mca/common/ofi/common_ofi.c
	opal/mca/common/ofi/common_ofi.h
	opal/mca/common/ofi/configure.m4
	opal/mca/common/ofi/owner.txt
	opal/mca/memory/patcher/configure.m4
Resolve merge conflicts for Dec. 11 top of tree ompi.

Signed-off-by: harumi kuno <harumi.kuno@hpe.com>
Signed-off-by: harumi kuno <harumi.kuno@hpe.com>
Per suggestion of John L. Byrne to stop cleanup error.

Signed-off-by: harumi kuno <harumi.kuno@hpe.com>
E.g., fix copyright date and remove extraneous white space

Signed-off-by: harumi kuno <harumi.kuno@hpe.com>
hkuno pushed a commit to hkuno/ompi that referenced this pull request Mar 3, 2020
This commit addresses two issues in osc/rdma:

 1) It is erroneous to attach regions that overlap. This was being
    allowed but the standard does not allow overlapping attachments.

 2) Overlapping registration regions (4k alignment of attachments)
    appear to be allowed. Add attachment bases to the bookkeeping
    structure so we can keep better track of what can be detached.

It is possible that the standard did not intend to allow #2. If that
is the case then #2 should fail in the same way as #1. There should
be no technical reason to disallow #2 at this time.

References open-mpi#7384

Signed-off-by: Nathan Hjelm <hjelmn@google.com>
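
The difference between the two cases in this commit message can be sketched as follows. This is an illustration under stated assumptions, not the osc/rdma implementation: all function names are hypothetical, regions are assumed to have non-zero length, address arithmetic is assumed not to wrap around, and the page size is fixed at the 4k mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE ((uintptr_t) 4096)  /* the 4k alignment from the text */

/* Case 1: do the attached byte ranges [base, base + len) themselves overlap? */
bool sketch_attachments_overlap(uintptr_t base_a, size_t len_a,
                                uintptr_t base_b, size_t len_b)
{
    assert(len_a > 0 && len_b > 0);
    return base_a < base_b + len_b && base_b < base_a + len_a;
}

/* Round an address down or up to a page boundary. */
uintptr_t sketch_page_floor(uintptr_t addr)
{
    return addr & ~(SKETCH_PAGE_SIZE - 1);
}

uintptr_t sketch_page_ceil(uintptr_t addr)
{
    return sketch_page_floor(addr + SKETCH_PAGE_SIZE - 1);
}

/* Case 2: do the page-aligned registration ranges that cover the two
 * attachments overlap, even if the attachments themselves do not? */
bool sketch_registrations_overlap(uintptr_t base_a, size_t len_a,
                                  uintptr_t base_b, size_t len_b)
{
    uintptr_t start_a = sketch_page_floor(base_a);
    uintptr_t end_a   = sketch_page_ceil(base_a + len_a);
    uintptr_t start_b = sketch_page_floor(base_b);
    uintptr_t end_b   = sketch_page_ceil(base_b + len_b);
    return start_a < end_b && start_b < end_a;
}
```

Two 100-byte attachments at 0x1000 and 0x1100 do not overlap as attachments, but both fall on the page starting at 0x1000, so their registrations do. Tracking the attachment base alongside the registration is what lets a detach be matched to the right attachment in that situation.
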
hkuno pushed a commit to hkuno/ompi that referenced this pull request Jun 20, 2020
This commit addresses two issues in osc/rdma:

 1) It is erroneous to attach regions that overlap. This was being
    allowed but the standard does not allow overlapping attachments.

 2) Overlapping registration regions (4k alignment of attachments)
    appear to be allowed. Add attachment bases to the bookkeeping
    structure so we can keep better track of what can be detached.

It is possible that the standard did not intend to allow #2. If that
is the case then #2 should fail in the same way as #1. There should
be no technical reason to disallow #2 at this time.

References open-mpi#7384

Signed-off-by: Nathan Hjelm <hjelmn@google.com>
(cherry picked from commit 6649aef)
Signed-off-by: Nathan Hjelm <hjelmn@google.com>