fix collectBoundarySeeds: snapshot phaseFab to host before scanning #272

Merged: jameslehoux merged 1 commit into master from claude/upbeat-mccarthy-f1mNN on May 6, 2026
@jameslehoux

§3 of the profiling notebook still segfaulted on Colab T4 with 4.2.12 — this time the silent crash is in collectBoundarySeeds (FloodFill.cpp:45), which is called by PercolationCheck before the seed-planting phase 1 that 61cf635 already patched.

The function searches the inlet/outlet domain faces for cells whose phase matches phaseID and pushes those into host-side IntVect vectors. The search itself uses amrex::LoopOnCpu reading phase_arr(i, j, k, 0), and on a CUDA build that Array4<int> view points at device memory, so a host loop reading through it segfaults the same way the previous host write sites did.

The fix follows the pattern used elsewhere when CPU code genuinely needs to walk iMultiFab data: allocate a pinned-host iMultiFab once via amrex::MFInfo().SetArena(amrex::The_Pinned_Arena()), copy phaseFab device → host into it, synchronize, and then let the existing LoopOnCpu walk the host copy. On CPU builds the snapshot is skipped via #ifdef AMREX_USE_GPU and the code simply aliases the input phaseFab.

The ParallelFor + DeviceVector approach used in the seed-planting fix isn't appropriate here because the output is a list of positions (not a fixed-size grid write); list-building reductions aren't a clean primitive in AMReX. The pinned-arena snapshot is one-time per PercolationCheck call and is cheap relative to the flood fill itself.
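
To make the "list of positions" point concrete, here is a minimal host-only sketch of a face scan that builds a variable-length seed list. It does not use AMReX: `HostPhaseGrid`, `Cell`, and `collect_face_seeds` are hypothetical stand-ins (for the pinned-host snapshot, `amrex::IntVect`, and `collectBoundarySeeds` respectively), and the x-fastest memory layout is an assumption made only for this illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for amrex::IntVect: one (i, j, k) cell index.
using Cell = std::array<int, 3>;

// Hypothetical host-resident phase grid, standing in for the pinned-host
// snapshot. Assumed layout: x varies fastest, then y, then z.
struct HostPhaseGrid {
    int nx, ny, nz;
    std::vector<int> data;

    int at(int i, int j, int k) const {
        return data[(static_cast<std::size_t>(k) * ny + j) * nx + i];
    }
};

// Walk one domain face normal to `dir` (low side = inlet, high side = outlet)
// and collect every cell whose phase equals `phase_id`. The result length
// depends on the data, which is why this is a list build rather than a
// fixed-size grid write.
std::vector<Cell> collect_face_seeds(const HostPhaseGrid& g, int dir,
                                     bool high_side, int phase_id) {
    assert(dir >= 0 && dir < 3);
    const std::array<int, 3> n = {g.nx, g.ny, g.nz};
    const int fixed = high_side ? n[dir] - 1 : 0;  // index of the face plane
    const int a = (dir + 1) % 3;                   // first in-plane axis
    const int b = (dir + 2) % 3;                   // second in-plane axis

    std::vector<Cell> seeds;
    for (int v = 0; v < n[b]; ++v) {
        for (int u = 0; u < n[a]; ++u) {
            Cell c{};
            c[dir] = fixed;
            c[a] = u;
            c[b] = v;
            if (g.at(c[0], c[1], c[2]) == phase_id) {
                seeds.push_back(c);  // safe: all memory here is host memory
            }
        }
    }
    return seeds;
}
```

Calling `collect_face_seeds(grid, 0, false, phaseID)` and `collect_face_seeds(grid, 0, true, phaseID)` would give the inlet and outlet seed lists for the x direction. The loop only works because every read goes through host memory, which is the property the snapshot restores on GPU builds.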

Other LoopOnCpu / device-memory sites that still need similar treatment (separate commits, none in the current §3 notebook hot path):

  • ConnectedComponents.cpp:43 — oi.connected_components only
  • Diffusion.cpp:127, :238 — native binary only (not Python)
  • TortuosityHypre.cpp:1012 — checkMatrixProperties() debug
  • io/DatReader.cpp:232 — oi.read_image with .dat input
  • io/RawReader.cpp:488 — oi.read_image with .raw input
  • io/TiffReader.cpp:555, :653 — oi.read_image with .tif input

These will surface in tutorials 2/4/7 (read_image workflows) on GPU; the fixes are likely the same pinned-arena snapshot or ParallelFor recipe once we hit them.

https://claude.ai/code/session_011dJ5Bwq4Tnr8wxH597XJFf
@jameslehoux merged commit 6f5ca0b into master on May 6, 2026
6 checks passed
github-actions Bot commented May 6, 2026

Performance Benchmark Results

| Size | Solver | Wall Time (s) | Tortuosity | Expected | Rel. Error | Iters | Status |
|------|--------|---------------|------------|----------|------------|-------|--------|
| 64³ | pcg | 0.7241 | 0.984375 | 0.984375 | 0.00e+00 | 1 | PASS |
| 64³ | flexgmres | 0.4326 | 0.984375 | 0.984375 | 0.00e+00 | N/A | PASS |
| 64³ | bicgstab | 0.4198 | 0.984375 | 0.984375 | 0.00e+00 | N/A | PASS |
| 64³ | gmres | 0.4216 | 0.984375 | 0.984375 | 0.00e+00 | N/A | PASS |
| 128³ | pcg | 8.8314 | 0.992188 | 0.992188 | 0.00e+00 | 1 | PASS |
| 128³ | flexgmres | 5.7744 | 0.992188 | 0.992188 | 0.00e+00 | N/A | PASS |
| 128³ | bicgstab | 5.6736 | 0.992188 | 0.992188 | 0.00e+00 | N/A | PASS |
| 128³ | gmres | 5.6557 | 0.992188 | 0.992188 | 0.00e+00 | N/A | PASS |

Fastest solver: bicgstab at 64³ (0.4198s)

Benchmark: uniform block (analytical τ = (N-1)/N)
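
As a quick check of the "Expected" column, the quoted formula τ = (N-1)/N can be evaluated directly. The sketch below is illustrative only: `expected_tortuosity` and `relative_error` are hypothetical helper names, and the relative-error definition is an assumption about how the "Rel. Error" column is computed, not something the benchmark output states.

```cpp
#include <cassert>
#include <cmath>

// Analytical tortuosity quoted for the uniform-block benchmark:
// tau = (N - 1) / N, where N is the number of cells along the flow direction.
double expected_tortuosity(int n) {
    assert(n > 0);
    return static_cast<double>(n - 1) / static_cast<double>(n);
}

// Assumed definition of the reported relative error: |computed - expected| / |expected|.
double relative_error(double computed, double expected) {
    return std::fabs(computed - expected) / std::fabs(expected);
}
```

For N = 64 this gives 63/64 = 0.984375, and for N = 128 it gives 127/128 = 0.9921875, which matches the table's 0.992188 at six decimal places.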

github-actions Bot commented May 6, 2026

Code Coverage Report

------------------------------------------------------------------------------
                           GCC Code Coverage Report
Directory: .
------------------------------------------------------------------------------
File                                       Lines     Exec  Cover   Missing
------------------------------------------------------------------------------
src/io/CathodeWrite.cpp                       95       83    87%   40-41,97-100,115-116,182-185
src/io/CathodeWrite.H                          1        1   100%
src/io/DatReader.cpp                         135      105    77%   26-27,30,35,92-93,99-100,107-109,135-137,141,144-148,152-155,162,164,208-209,242,245
src/io/DatReader.H                             1        1   100%
src/io/HDF5Reader.cpp                        344       84    24%   40-41,43-44,46-49,52,54-56,58-59,62,64-66,68-74,92-93,126-128,144-145,154-157,174-180,182-187,204,213-215,217,219-228,230-233,236-238,240-251,253-258,266,266,266,266,266,266,266,270,270,270,270,270,270,270,274,276,278,280,282,288,290,297,297,297,297,297,297,297,301,301,301,301,301,301,301,305,305,305,305,305,305,305-306,306,306,306,306,306,306,309,309,309,309,309,309,309-310,310,310,310,310,310,310-311,311,311,311,311,311,311,313,313,313,313,313,313,313-314,314,314,314,314,314,314-315,315,315,315,315,315,315,319,319,319,319,319,319,319,324,324,324,324,324,324,324-325,325,325,325,325,325,325-326,326,326,326,326,326,326-327,327,327,327,327,327,327,332,332,332,332,332,332,332,337,337,337,337,337,337,337-338,338,338,338,338,338,338,343,343,343,343,343,343,343,350,350,350,350,350,350,350,357-358,432-435,437-440
src/io/HDF5Reader.H                            3        3   100%
src/io/ImageLoader.cpp                        61       42    68%   25,38,48,60-62,64-70,72,77,89-90,92,94
src/io/RawReader.cpp                         266      135    50%   49-50,89-90,111-112,115-117,120-121,140-142,155-157,166-168,174-177,185-186,192-196,200-204,209-212,219-224,231-237,271,273-274,276,283-284,301,312,314,318,325,327,331-334,338,346-347,353-355,361-363,365-366,369,372,374,377-380,382-384,386,388-389,391,393-394,396,398-399,401,403-404,406,410-411,413,417-418,420,425,465,471-472,521-524,538,540-542,544,546-548,558,562-564,566,588
src/io/RawReader.H                             1        1   100%
src/io/TiffReader.cpp                        384      130    33%   59-65,67-69,71-73,75-77,79-80,82-84,86-88,90-92,94-96,98-99,101-103,106-108,111-112,114-117,119,122,124-127,143-144,148-150,152-158,160,186,210,217,226,228-231,240,242-245,248,255,288-293,306,309-317,319-320,323-327,331-335,338-342,344-348,351-357,359-363,367,369,375-377,379-393,396,398-402,404-409,413-418,420-425,428-429,432-434,555-575,577-578,581-588,590,593-609,612-614,670,673-674,677-683,685,689-700,702-703
src/io/TiffReader.H                            5        5   100%
src/props/BoundaryCondition.H                131       74    56%   63,68,70,216,224-229,233-236,238-244,247-249,252-253,255,258-261,264-265,271-272,274-279,285-287,290-296,299,303,365-366,371,373
src/props/ConnectedComponents.cpp             69       67    97%   94-95
src/props/ConnectedComponents.H                4        4   100%
src/props/DeffTensor.cpp                      62       59    95%   122,128-129
src/props/Diffusion.cpp                      510      378    74%   93-94,97-98,103-104,106-116,118,123-132,134-141,144-150,153-157,159-163,165,168-173,175-177,179,182-184,186-187,190-191,193,195-198,200,202-203,288-289,297-298,300,349,359-360,368-371,373-375,404-413,415,453,461,465-467,526-527,533,535,539,547,581,610,638,646,735-736,739-740,757-760,771-772,774,824
src/props/EffDiffFillMtx.H                   120      106    88%   58,216-217,221-225,229,231-235
src/props/EffectiveDiffusivityHypre.cpp      413      372    90%   189-191,193-197,352-355,464,616-619,621-623,625-628,637-640,647,676,688-691,693-695,697,709,720,722
src/props/EffectiveDiffusivityHypre.H          7        7   100%
src/props/FloodFill.cpp                       90       87    96%   109-110,250
src/props/HypreStructSolver.cpp              343      210    61%   87-88,121,133-134,145,299,309,311,314,346,356,358,361,367-370,372-376,378-379,381-385,388-389,391-392,394,397-398,401-402,404-407,409-413,415-416,418-422,425-426,428-429,431,434-435,438-439,441-443,445-451,453-457,460-461,463-464,466,469-470,473,475-477,479-485,487-491,494-495,497-498,500,503-504,507,509-511,513-516,518-522,525-526,528-529,531,534-535,538,541-542,555
src/props/HypreStructSolver.H                  6        6   100%
src/props/MacroGeometry.H                     17       17   100%
src/props/ParticleSizeDistribution.cpp        11       11   100%
src/props/ParticleSizeDistribution.H           6        6   100%
src/props/PercolationCheck.cpp                53       46    86%   32-33,49-51,68,73
src/props/PercolationCheck.H                   4        4   100%
src/props/PhysicsConfig.H                     90       89    98%   150
src/props/ResultsJSON.H                      225      222    98%   242,395,416
src/props/REVStudy.cpp                       151      128    84%   72,83-91,159,170-173,175,183-186,188-190
src/props/SolverConfig.H                      32       20    62%   30,32,37-44,75-76
src/props/SpecificSurfaceArea.cpp             56       55    98%   59
src/props/SpecificSurfaceArea.H                6        6   100%
src/props/ThroughThicknessProfile.cpp         38       38   100%
src/props/ThroughThicknessProfile.H            5        5   100%
src/props/Tortuosity.H                         2        2   100%
src/props/TortuosityDirect.cpp               219      191    87%   81-83,86,100-106,113-114,125,134,140,202-209,226,394,424,433
src/props/TortuosityDirect.H                   5        5   100%
src/props/TortuosityHypre.cpp                794      566    71%   149-150,155-156,240-243,246-248,311,335-337,340-341,343,353-355,358-360,390-393,573,597,601,622,639-640,642-644,646-655,657,669,671-681,685-691,693-697,701-703,705-707,709-718,727,729-739,743-751,753-756,758,768,774-777,779-781,790-793,795-797,813,816-817,840-845,856-859,861,898,903-906,909-911,915-918,920,922-925,927,932-934,936,985,994,999,1002-1007,1023-1026,1040-1044,1049-1054,1064-1068,1073-1078,1083-1087,1090-1093,1100-1103,1114,1123,1125,1129,1131,1153,1199-1200,1286-1288,1414-1417
src/props/TortuosityHypre.H                   15       15   100%
src/props/TortuosityHypreFill.H              127       98    77%   85,203,205-212,237-239,241-245,247-248,250,252,255-256,258-262
src/props/TortuosityKernels.H                 97       53    54%   52,56-60,62-65,69-74,76-80,84-85,90,129,143,157,243,245-248,250-253,257-260,262-265
src/props/TortuosityMLMG.cpp                  99       91    91%   160,181-183,185-186,193,206
src/props/TortuosityMLMG.H                     1        1   100%
src/props/TortuositySolverBase.cpp           301      237    78%   70-72,74-75,94-101,104,106,142-145,200,203,205,255,280,298,327,391,394-396,398,406-409,411-417,422,427-429,435-436,438-440,454,460,464-465,467,478,492,496-498,500,502,506
src/props/TortuositySolverBase.H              13       13   100%
src/props/VolumeFraction.cpp                  25       25   100%
src/props/VolumeFraction.H                     4        4   100%
------------------------------------------------------------------------------
TOTAL                                       5447     3908    71%
------------------------------------------------------------------------------


Generated by CI — coverage data from gcovr

codecov Bot commented May 6, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
