fix collectBoundarySeeds: snapshot phaseFab to host before scanning #272
Merged
jameslehoux merged 1 commit into master on May 6, 2026
https://claude.ai/code/session_011dJ5Bwq4Tnr8wxH597XJFf
Performance Benchmark Results
Fastest solver: bicgstab at 64³ (0.4198 s). Benchmark: uniform block (analytical τ = (N-1)/N).
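For reference, the analytical target in the benchmark line evaluates as follows, assuming N is the per-axis cell count (N = 64 for the 64³ case):

$$
\tau = \frac{N-1}{N} = \frac{64-1}{64} = \frac{63}{64} = 0.984375
$$

so the reference value approaches 1 as N grows.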
Code Coverage Report: generated by CI, coverage data from gcovr.
Codecov Report: ✅ All modified and coverable lines are covered by tests.
§3 of the profiling notebook still segfaulted on Colab T4 with 4.2.12. This time the silent crash is in collectBoundarySeeds (FloodFill.cpp:45), which PercolationCheck calls *before* the seed-planting phase 1 that 61cf635 already patched.
The function searches the inlet/outlet domain faces for cells whose phase matches phaseID and pushes those into host-side IntVect vectors. The search itself uses amrex::LoopOnCpu reading phase_arr(i, j, k, 0), and on a CUDA build that Array4<int> view points at device memory, so a host loop reading through it segfaults the same way the previous host *write* sites did.
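The face search can be pictured in plain C++ on an ordinary host array. This is an illustrative model, not the FloodFill.cpp code: Cell, HostPhaseGrid, scanFace and collectFaceSeeds are hypothetical names, the storage is assumed x-fastest, and dir selects the flow axis (0, 1, 2). What it shows is that the scan reads every face cell from the host and appends to a variable-length list, which is only safe when the data behind the reads is host-resident.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a 3D amrex::IntVect.
struct Cell {
    int i, j, k;
};

// Hypothetical host-resident phase field with x-fastest storage.
struct HostPhaseGrid {
    int nx, ny, nz;
    std::vector<int> data;  // size nx * ny * nz

    int operator()(int i, int j, int k) const {
        const std::size_t idx = static_cast<std::size_t>(i) +
            static_cast<std::size_t>(nx) *
                (static_cast<std::size_t>(j) +
                 static_cast<std::size_t>(ny) * static_cast<std::size_t>(k));
        return data[idx];
    }
};

// Append every cell on the plane {index[dir] == plane} whose phase equals phaseID.
void scanFace(const HostPhaseGrid& g, int dir, int plane, int phaseID,
              std::vector<Cell>& out) {
    assert(dir >= 0 && dir < 3);
    std::array<int, 3> lo{0, 0, 0};
    std::array<int, 3> hi{g.nx - 1, g.ny - 1, g.nz - 1};
    lo[dir] = plane;  // collapse the box to a single face plane
    hi[dir] = plane;
    for (int k = lo[2]; k <= hi[2]; ++k)
        for (int j = lo[1]; j <= hi[1]; ++j)
            for (int i = lo[0]; i <= hi[0]; ++i)
                if (g(i, j, k) == phaseID) out.push_back({i, j, k});
}

// Inlet = low face, outlet = high face along the flow axis `dir`.
void collectFaceSeeds(const HostPhaseGrid& g, int dir, int phaseID,
                      std::vector<Cell>& inlet, std::vector<Cell>& outlet) {
    assert(dir >= 0 && dir < 3);
    const std::array<int, 3> n{g.nx, g.ny, g.nz};
    scanFace(g, dir, 0, phaseID, inlet);
    scanFace(g, dir, n[dir] - 1, phaseID, outlet);
}
```

The sketch only ever reads the two face planes, so its work scales with face area rather than volume.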
Fix follows the pattern used elsewhere when CPU code genuinely needs to walk iMultiFab data: snapshot phaseFab into a pinned-host iMultiFab once via amrex::MFInfo().SetArena(amrex::The_Pinned_Arena()), copy device → host, sync, then the existing LoopOnCpu walks the host copy. On CPU builds the snapshot is skipped via #ifdef AMREX_USE_GPU and we just alias the input phaseFab.
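The control flow of the fix (copy once on GPU builds, alias on CPU builds, then scan one host-readable view) can be modeled without AMReX. In this sketch a runtime flag gpuBuild stands in for #ifdef AMREX_USE_GPU, plain std::vectors stand in for both the device-resident iMultiFab and the pinned-host snapshot, and hostReadableView and countPhase are hypothetical helpers; none of these are AMReX APIs.

```cpp
#include <cassert>
#include <vector>

// Models the fix's control flow without AMReX (hypothetical helper).
// `gpuBuild` stands in for #ifdef AMREX_USE_GPU; plain std::vectors stand in
// for the device-resident iMultiFab and the pinned-host snapshot.
const std::vector<int>& hostReadableView(const std::vector<int>& phase,
                                         std::vector<int>& snapshot,
                                         bool gpuBuild) {
    if (gpuBuild) {
        // GPU build: copy once into caller-owned host storage. In the real fix
        // this corresponds to the device -> pinned-host copy followed by a sync.
        snapshot.assign(phase.begin(), phase.end());
        return snapshot;
    }
    // CPU build: the input is already host memory, so alias it (no copy).
    return phase;
}

// Any host scan then reads through the single view, whichever branch ran.
int countPhase(const std::vector<int>& view, int phaseID) {
    int n = 0;
    for (int v : view) n += (v == phaseID) ? 1 : 0;
    return n;
}
```

The caller owns snapshot, so the returned reference stays valid for the whole scan; in the AMReX version the equivalent would be declaring the pinned iMultiFab in the enclosing scope before the LoopOnCpu runs.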
The ParallelFor + DeviceVector approach used in the seed-planting fix isn't appropriate here because the output is a *list of positions* (not a fixed-size grid write); list-building reductions aren't a clean primitive in AMReX. The pinned-arena snapshot is taken once per PercolationCheck call and is cheap relative to the flood fill itself.
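The difference between a fixed-size grid write and a list of positions can be made concrete with the usual data-parallel idiom for building such a list, stream compaction (flag, exclusive scan, scatter). The sketch below is a serial, host-only model in plain C++17; compactMatchingIndices is a hypothetical name, the phase field is flattened to 1D, and the code is neither taken from AMReX nor what this PR implements. Only pass 1 has the one-slot-per-cell shape that a ParallelFor writes naturally; passes 2 and 3 are the extra machinery a device-side list would need.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Serial model of stream compaction (flag -> exclusive scan -> scatter).
// On a GPU each pass would be a data-parallel kernel or a device scan.
std::vector<int> compactMatchingIndices(const std::vector<int>& phase, int phaseID) {
    const std::size_t n = phase.size();
    if (n == 0) return {};

    // Pass 1: fixed-size, one-slot-per-cell write.
    std::vector<int> flag(n);
    for (std::size_t c = 0; c < n; ++c) flag[c] = (phase[c] == phaseID) ? 1 : 0;

    // Pass 2: exclusive prefix sum assigns each match its output position.
    std::vector<int> slot(n);
    std::exclusive_scan(flag.begin(), flag.end(), slot.begin(), 0);
    const int count = slot[n - 1] + flag[n - 1];

    // Pass 3: scatter matching cell indices into a list sized by the total.
    std::vector<int> out(static_cast<std::size_t>(count));
    for (std::size_t c = 0; c < n; ++c)
        if (flag[c] != 0) out[static_cast<std::size_t>(slot[c])] = static_cast<int>(c);
    return out;
}
```

Compaction tends to be worth its extra passes when the list is consumed on the device; here the seeds end up in host-side vectors anyway, which is what makes a single host snapshot the simpler route.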
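Other LoopOnCpu / device-memory sites that still need similar treatment (separate commits, none in the current §3 notebook hot path):

- ConnectedComponents.cpp:43 (oi.connected_components only)
- Diffusion.cpp:127, :238 (native binary only, not Python)
- TortuosityHypre.cpp:1012 (checkMatrixProperties() debug)
- io/DatReader.cpp:232 (oi.read_image with .dat input)
- io/RawReader.cpp:488 (oi.read_image with .raw input)
- io/TiffReader.cpp:555, :653 (oi.read_image with .tif input)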
These will surface in tutorials 2/4/7 (read_image workflows) on GPU; the fixes are likely the same pinned-arena snapshot or ParallelFor recipe once we hit them.