From 225bab35564ef77b64f31f30bf84fd37e59eeaac Mon Sep 17 00:00:00 2001 From: James Le Houx Date: Wed, 6 May 2026 22:16:00 +0000 Subject: [PATCH] fix collectBoundarySeeds: snapshot phaseFab to host before scanning MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit §3 of the profiling notebook still segfaulted on Colab T4 with 4.2.12 — this time the silent crash is in collectBoundarySeeds (FloodFill.cpp:45), which is called by PercolationCheck *before* the seed-planting phase 1 that 61cf635 already patched. The function searches the inlet/outlet domain faces for cells whose phase matches phaseID and pushes those into host-side IntVect vectors. The search itself uses amrex::LoopOnCpu reading phase_arr(i, j, k, 0) — and on a CUDA build that Array4 view points at device memory, so a host loop reading through it segfaults the same way the previous host *write* sites did. Fix follows the pattern used elsewhere when CPU code genuinely needs to walk iMultiFab data: snapshot phaseFab into a pinned-host iMultiFab once via amrex::MFInfo().SetArena(amrex::The_Pinned_Arena()), copy device → host, sync, then the existing LoopOnCpu walks the host copy. On CPU builds the snapshot is skipped via #ifdef AMREX_USE_GPU and we just alias the input phaseFab. The ParallelFor + DeviceVector approach used in the seed-planting fix isn't appropriate here because the output is a *list of positions* (not a fixed-size grid write); list-building reductions aren't a clean primitive in AMReX. The pinned-arena snapshot is one-time per PercolationCheck call and is cheap relative to the flood fill itself. Other LoopOnCpu / device-memory sites that still need similar treatment (separate commits, none in the current §3 notebook hot path): - ConnectedComponents.cpp:43 — oi.connected_components only - Diffusion.cpp:127, :238 — native binary only (not Python) - TortuosityHypre.cpp:1012 — checkMatrixProperties() debug - io/DatReader.cpp:232 — oi.read_image with .dat input - io/RawReader.cpp:488 — oi.read_image with .raw input - io/TiffReader.cpp:555, :653 — oi.read_image with .tif input These will surface in tutorials 2/4/7 (read_image workflows) on GPU; the fixes are likely the same pinned-arena snapshot or ParallelFor recipe once we hit them. https://claude.ai/code/session_011dJ5Bwq4Tnr8wxH597XJFf --- src/props/FloodFill.cpp | 23 +++++++++++++++++++---- 1 file changed, 19 insertions(+), 4 deletions(-) diff --git a/src/props/FloodFill.cpp b/src/props/FloodFill.cpp index 8c97610..8fed8bc 100644 --- a/src/props/FloodFill.cpp +++ b/src/props/FloodFill.cpp @@ -35,16 +35,31 @@ void collectBoundarySeeds(const amrex::iMultiFab& phaseFab, int phaseID, int dir amrex::Vector local_inlet; amrex::Vector local_outlet; - // Seed collection must run on CPU (building host-side IntVect vectors). - // Synchronize first to ensure phase data is available on host. + // Seed collection runs on CPU (it builds host-side IntVect vectors). + // On a CUDA build, phaseFab.const_array(mfi) is a view into device-resident + // memory — reading it from a host LoopOnCpu segfaults on T4 / A100. We + // therefore snapshot phaseFab into a pinned-host iMultiFab once, and + // iterate that copy on the CPU. On CPU builds the snapshot is a no-op + // assignment (host data already accessible) so we just alias the input. amrex::Gpu::streamSynchronize(); +#ifdef AMREX_USE_GPU + amrex::iMultiFab phaseFab_host_storage(phaseFab.boxArray(), phaseFab.DistributionMap(), + phaseFab.nComp(), phaseFab.nGrow(), + amrex::MFInfo().SetArena(amrex::The_Pinned_Arena())); + amrex::Copy(phaseFab_host_storage, phaseFab, 0, 0, phaseFab.nComp(), phaseFab.nGrow()); + amrex::Gpu::streamSynchronize(); + const amrex::iMultiFab& phaseFab_host = phaseFab_host_storage; +#else + const amrex::iMultiFab& phaseFab_host = phaseFab; +#endif + #ifdef AMREX_USE_OMP #pragma omp parallel if (amrex::Gpu::notInLaunchRegion()) #endif - for (amrex::MFIter mfi(phaseFab); mfi.isValid(); ++mfi) { + for (amrex::MFIter mfi(phaseFab_host); mfi.isValid(); ++mfi) { const amrex::Box& validBox = mfi.validbox(); - const auto phase_arr = phaseFab.const_array(mfi); + const auto phase_arr = phaseFab_host.const_array(mfi); amrex::Box inlet_intersect = validBox & domain_lo_face; if (!inlet_intersect.isEmpty()) {