
ch4/ofi: check whether nic pci info is valid #7730

Open
hzhou wants to merge 1 commit into pmodels:main from hzhou:2602_ofi_nicpci


hzhou (Contributor) commented Feb 16, 2026

Pull Request Description

In the multi-NIC case, the provider may report invalid "null" PCI info for a NIC, which causes hwloc to fail when obtaining the topology. Rather than handling this invalid case in the topology code, this change guards against it and handles it in the higher layer. For OFI multi-NIC, we simply treat all NICs as equally close and distribute them evenly among the ranks.
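The guard-and-fallback logic described above can be sketched roughly as follows. This is not the code in this PR: every name here (`nic_pci_info_t`, `nic_pci_is_valid`, `all_nics_have_valid_pci`, `pick_nic_evenly`, `select_nic`) is hypothetical, the struct only loosely mirrors libfabric's PCI attributes, and it assumes that "null" PCI info means every address field is zero. Round-robin by local rank is one simple way to "equally distribute" NICs; the actual MPICH policy may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a NIC's provider-reported PCI address
 * (loosely modeled on libfabric's PCI attributes; not the real type). */
typedef struct {
    uint16_t domain_id;
    uint8_t bus_id;
    uint8_t device_id;
    uint8_t function_id;
} nic_pci_info_t;

/* Assumption: "null" PCI info means every address field is zero. */
bool nic_pci_is_valid(const nic_pci_info_t *pci)
{
    return pci->domain_id != 0 || pci->bus_id != 0 ||
           pci->device_id != 0 || pci->function_id != 0;
}

/* If any NIC lacks valid PCI info, topology-based distance comparisons
 * between NICs are meaningless, so require all of them to be valid. */
bool all_nics_have_valid_pci(const nic_pci_info_t *nics, int num_nics)
{
    for (int i = 0; i < num_nics; i++) {
        if (!nic_pci_is_valid(&nics[i]))
            return false;
    }
    return true;
}

/* Fallback: treat all NICs as equally close and spread the local ranks
 * across them round-robin. */
int pick_nic_evenly(int local_rank, int num_nics)
{
    assert(num_nics > 0);
    return local_rank % num_nics;
}

/* Hypothetical top-level selection: consult the topology-aware picker only
 * when the PCI info can be trusted; otherwise fall back to even distribution. */
int select_nic(const nic_pci_info_t *nics, int num_nics, int local_rank,
               int (*pick_by_topology)(const nic_pci_info_t *, int, int))
{
    if (num_nics <= 1)
        return 0;
    if (pick_by_topology != NULL && all_nics_have_valid_pci(nics, num_nics))
        return pick_by_topology(nics, num_nics, local_rank);
    return pick_nic_evenly(local_rank, num_nics);
}
```

For example, with two NICs where the first reports all-zero PCI info, `select_nic` skips the topology path entirely and local ranks 0, 1, 2, 3 map to NICs 0, 1, 0, 1. Validating once up front keeps the hwloc query from ever seeing a bogus PCI address, which matches the PR's intent of handling the problem above the topology code.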

[skip warnings]
Fixes #7727

Author Checklist

  • Provide Description
    Particularly focus on why, not what. Reference background, issues, test failures, xfail entries, etc.
  • Commits Follow Good Practice
    Commits are self-contained and do not do two things at once.
    Commit message is of the form: module: short description
    Commit message explains what's in the commit.
  • Passes All Tests
    Whitespace checker. Warnings test. Additional tests via comments.
  • Contribution Agreement
    For non-Argonne authors, check contribution agreement.
    If necessary, request an explicit comment from your company's PR approval manager.



Development

Successfully merging this pull request may close these issues.

MPICH CH4:OFI fails on AMD EPYC when Sub-NUMA Clustering (SNC) is enabled
