profiling notebook: GPU-aware install, drop legacy probes, polish prose #269
Merged
jameslehoux merged 2 commits into master on May 5, 2026
Conversation
added 2 commits on May 5, 2026 at 15:20
… mem)
Reported on Colab T4: notebook §3 crashes the kernel silently at
`_core.VoxelImage.from_numpy(arr, max_grid_size)`. Root cause: the
binding was filling the new iMultiFab with a CPU loop:
```cpp
auto fab = img->mf->array(mfi);   // Array4<int> view into iMultiFab data
for (k...) for (j...) for (i...)
    fab(i, j, k) = ptr[idx];      // host write — but in CUDA mode this
                                  // is DEVICE memory → segfault
```
Works fine on CPU builds (where iMultiFab data is host memory) but every
attempt to ingest a NumPy array on the GPU build dies before any Python
print can flush. Affects every entry point that goes through from_numpy
— so high-level oi.tortuosity / oi.percolation_check / oi.volume_fraction
on the GPU wheel were ALL broken.
Switch to the AMReX idiom: stage the host data in a Gpu::DeviceVector,
then use ParallelFor with an AMREX_GPU_DEVICE lambda so the assignment
runs on the actual hardware that owns the iMultiFab memory.
* CPU build (#ifndef AMREX_USE_GPU): src_ptr is the host pointer and
ParallelFor expands to a serial/OMP host loop. Identical behaviour
to before.
* GPU build (#ifdef AMREX_USE_GPU): copy host → device once via
Gpu::copyAsync + streamSynchronize, then ParallelFor launches a
kernel that reads from the device buffer and writes through the
Array4<int> view to its own memory.
A final streamSynchronize before FillBoundary ensures the kernel has
finished before the ghost-cell exchange reads the data.
Note: this fix requires a wheel rebuild + new release. The pure-Python
preload helper in 4.2.9 already lets _core.so load on GPU runtimes;
this change makes it actually USABLE on them.
https://claude.ai/code/session_011dJ5Bwq4Tnr8wxH597XJFf
The profiling notebook had grown a few rough edges as we iterated:
1. The install cell hard-coded `pip install openimpala`, ignoring the
GPU runtime entirely. On a Colab T4 every solve was running on the
CPU — exactly the failure mode §1a is supposed to detect. Rewrite
to mirror tutorials/02/04/07: detect nvidia-smi, install
openimpala-cuda on GPU runtimes, fall back to openimpala otherwise.
2. §1a's build_info() probe carried a 60-line subprocess banner
fallback for "pre-4.0.2 wheels". The published wheel has been at
4.2.x for some time now and build_info() is in every wheel we
publish. Drop the legacy fallback and the AttributeError handler.
3. Drop colloquialisms in markdown headers and prose:
- "Profiling & Bottleneck Hunt" -> "Profiling and Performance Tuning"
- "exists for one job" / "If you only have 5 minutes" — removed
- "*(focused, post-diagnosis)*" / "*(optional)*" parentheticals
- "wonky workaround" / "the *classic* signature" — recast neutrally
- The "issue #256 acceptance target" reference in §9b — replaced
with an explanation that scales naturally to readers who weren't
following the issue thread
4. Add a proper contents table to the intro and clean up §2/§5/§6/§7/§9
intros with consistent structure: one-paragraph context, bullet
list of what to look for, one paragraph on interpretation.
5. Fix a latent bug in §12's recommendation: the CPU-detection branch
checked `backend == "cpu"`, but build_info()'s actual values are
`cpp-cpu` / `cpp-cuda` / `cpp-hip` / `pure-python`. The check never
fired in practice. Switch to the already-computed `is_gpu_build`
flag so the "rebuild for GPU" recommendation actually appears when
relevant.
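The GPU-aware install logic described in item 1 can be sketched as follows. This is a minimal illustration, not the notebook cell itself; the package names `openimpala` / `openimpala-cuda` come from the text above, while the helper name is mine:

```python
import shutil


def pick_wheel() -> str:
    """Choose the install target based on runtime GPU availability.

    If nvidia-smi is on PATH we assume a CUDA-capable runtime (e.g. a
    Colab T4) and pick the GPU wheel; otherwise fall back to the CPU
    wheel, matching the behaviour described for tutorials/02/04/07.
    """
    if shutil.which("nvidia-smi") is not None:
        return "openimpala-cuda"
    return "openimpala"
```

In a notebook, the result would feed the install command, e.g. `%pip install -q {pick_wheel()}`.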
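The §12 fix in item 5 amounts to deriving the GPU flag from the backend string rather than comparing against a value `build_info()` never returns. A minimal sketch, assuming only the backend values listed above (the helper name is hypothetical; the notebook uses an already-computed flag):

```python
def is_gpu_build(backend: str) -> bool:
    """True for GPU backends as reported by build_info().

    build_info() reports one of: cpp-cpu, cpp-cuda, cpp-hip,
    pure-python. The old check `backend == "cpu"` matched none of
    these, so the "rebuild for GPU" recommendation never fired.
    """
    return backend in ("cpp-cuda", "cpp-hip")
```

With this predicate, a CPU wheel (`cpp-cpu` or `pure-python`) correctly triggers the "rebuild for GPU" recommendation.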
No analysis logic changes — every code cell that produced a chart, a
fit, or a number still does so identically. This is purely a polish
pass to make the notebook read as a tool rather than a scratchpad.
https://claude.ai/code/session_011dJ5Bwq4Tnr8wxH597XJFf
Performance Benchmark Results
Fastest solver: bicgstab at 64³ (0.3921s). Benchmark: uniform block (analytical τ = (N-1)/N).

Code Coverage Report
Generated by CI — coverage data from gcovr.

Codecov Report
All modified and coverable lines are covered by tests.