profiling notebook: GPU-aware install, drop legacy probes, polish prose #269

Merged
jameslehoux merged 2 commits into master from claude/upbeat-mccarthy-f1mNN
May 5, 2026

@jameslehoux

The profiling notebook had grown a few rough edges as we iterated:

  1. The install cell hard-coded pip install openimpala, ignoring the
    GPU runtime entirely. On a Colab T4 every solve was running on the
    CPU — exactly the failure mode §1a is supposed to detect. Rewrite
    to mirror tutorials/02/04/07: detect nvidia-smi, install
    openimpala-cuda on GPU runtimes, fall back to openimpala otherwise.

  2. §1a's build_info() probe carried a 60-line subprocess banner
    fallback for "pre-4.0.2 wheels". The published wheel has been at
    4.2.x for some time now and build_info() is in every wheel we
    publish. Drop the legacy fallback and the AttributeError handler.

  3. Drop colloquialisms in markdown headers and prose:

    • "Profiling & Bottleneck Hunt" -> "Profiling and Performance Tuning"
    • "exists for one job" / "If you only have 5 minutes" — removed
    • "(focused, post-diagnosis)" / "(optional)" parentheticals
    • "wonky workaround" / "the classic signature" — recast neutrally
    • The "issue #256 acceptance target" reference in §9b: replaced
      with an explanation that scales naturally to readers who weren't
      following the issue thread

  4. Add a proper contents table to the intro and clean up §2/§5/§6/§7/§9
    intros with consistent structure: one-paragraph context, bullet
    list of what to look for, one paragraph on interpretation.

  5. Fix a latent bug in §12's recommendation: the CPU-detection branch
    checked backend == "cpu", but build_info()'s actual values are
    cpp-cpu / cpp-cuda / cpp-hip / pure-python. The check never
    fired in practice. Switch to the already-computed is_gpu_build
    flag so the "rebuild for GPU" recommendation actually appears when
    relevant.
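
The install logic described in item 1 can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual cell: the helper names (`has_nvidia_gpu`, `choose_package`, `install`) are hypothetical, and only the package names `openimpala` and `openimpala-cuda` come from the description above.

```python
import shutil
import subprocess
import sys


def has_nvidia_gpu() -> bool:
    """Return True if nvidia-smi is on PATH and exits successfully."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False
    try:
        result = subprocess.run([exe], capture_output=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def choose_package(gpu_available: bool) -> str:
    """Pick the CUDA wheel on GPU runtimes and the CPU wheel otherwise."""
    return "openimpala-cuda" if gpu_available else "openimpala"


def install(package: str) -> None:
    """Install the chosen wheel into the running interpreter's environment."""
    subprocess.check_call([sys.executable, "-m", "pip", "install", package])


# In a notebook cell one would call:
#     install(choose_package(has_nvidia_gpu()))
print("GPU detected:", has_nvidia_gpu())
print("Package to install:", choose_package(has_nvidia_gpu()))
```

Keeping the detection separate from the package choice means the choice can be checked without a GPU present, and a runtime where `nvidia-smi` exists but fails still falls back to the CPU wheel.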

No analysis logic changes — every code cell that produced a chart, a
fit, or a number still does so identically. This is purely a polish
pass to make the notebook read as a tool rather than a scratchpad.
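
To make the latent bug from item 5 concrete, here is a small sketch using only the backend strings quoted above. How the notebook actually computes `is_gpu_build` is not shown in this PR, so the grouping of `cpp-cuda` and `cpp-hip` as GPU builds is an assumption, and the function names are hypothetical.

```python
# Backend strings quoted in item 5 of the description.
KNOWN_BACKENDS = ("cpp-cpu", "cpp-cuda", "cpp-hip", "pure-python")

# Assumed grouping: treat the CUDA and HIP builds as GPU builds.
GPU_BACKENDS = frozenset({"cpp-cuda", "cpp-hip"})


def old_cpu_check(backend: str) -> bool:
    """The old comparison: never true for any known backend string."""
    return backend == "cpu"


def is_gpu_build(backend: str) -> bool:
    """Hypothetical stand-in for the notebook's precomputed flag."""
    return backend in GPU_BACKENDS


def should_recommend_gpu_rebuild(backend: str) -> bool:
    """Recommend a GPU rebuild whenever the installed build is not a GPU build."""
    return not is_gpu_build(backend)


for backend in KNOWN_BACKENDS:
    old = str(old_cpu_check(backend))
    new = str(should_recommend_gpu_rebuild(backend))
    print(f"{backend:12s} old check fires: {old:5s} new check fires: {new}")
```

Under these assumptions the old comparison is false for every listed backend, which matches the observation that the recommendation never appeared.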

James Le Houx added 2 commits May 5, 2026 15:20
… mem)

Reported on Colab T4: notebook §3 crashes the kernel silently at
`_core.VoxelImage.from_numpy(arr, max_grid_size)`. Root cause: the
binding was filling the new iMultiFab with a CPU loop:

    auto fab = img->mf->array(mfi);   // Array4<int> view into iMultiFab data
    for (k...) for (j...) for (i...)
        fab(i, j, k) = ptr[idx];      // host write — but in CUDA mode this
                                      // is DEVICE memory → segfault

Works fine on CPU builds (where iMultiFab data is host memory) but every
attempt to ingest a NumPy array on the GPU build dies before any Python
print can flush. Affects every entry point that goes through from_numpy
— so high-level oi.tortuosity / oi.percolation_check / oi.volume_fraction
on the GPU wheel were ALL broken.

Switch to the AMReX idiom: stage the host data in a Gpu::DeviceVector,
then use ParallelFor with an AMREX_GPU_DEVICE lambda so the assignment
runs on the actual hardware that owns the iMultiFab memory.

  * CPU build (#ifndef AMREX_USE_GPU): src_ptr is the host pointer and
    ParallelFor expands to a serial/OMP host loop. Identical behaviour
    to before.
  * GPU build (#ifdef AMREX_USE_GPU): copy host → device once via
    Gpu::copyAsync + streamSynchronize, then ParallelFor launches a
    kernel that reads from the device buffer and writes through the
    Array4<int> view to its own memory.

Final streamSynchronize before FillBoundary makes sure the kernel is
done before the ghost-cell exchange reads it.

Note: this fix requires a wheel rebuild + new release. The pure-Python
preload helper in 4.2.9 already lets _core.so load on GPU runtimes;
this change makes it actually USABLE on them.

https://claude.ai/code/session_011dJ5Bwq4Tnr8wxH597XJFf
@jameslehoux jameslehoux merged commit 1c3d340 into master May 5, 2026
1 check passed
@github-actions

github-actions Bot commented May 5, 2026

Performance Benchmark Results

| Size | Solver | Wall Time (s) | Tortuosity | Expected | Rel. Error | Iters | Status |
|------|--------|---------------|------------|----------|------------|-------|--------|
| 64³ | pcg | 0.7918 | 0.984375 | 0.984375 | 0.00e+00 | 1 | PASS |
| 64³ | flexgmres | 0.4184 | 0.984375 | 0.984375 | 0.00e+00 | N/A | PASS |
| 64³ | bicgstab | 0.3921 | 0.984375 | 0.984375 | 0.00e+00 | N/A | PASS |
| 64³ | gmres | 0.3932 | 0.984375 | 0.984375 | 0.00e+00 | N/A | PASS |
| 128³ | pcg | 7.9763 | 0.992188 | 0.992188 | 0.00e+00 | 1 | PASS |
| 128³ | flexgmres | 5.6900 | 0.992188 | 0.992188 | 0.00e+00 | N/A | PASS |
| 128³ | bicgstab | 5.6239 | 0.992188 | 0.992188 | 0.00e+00 | N/A | PASS |
| 128³ | gmres | 5.6304 | 0.992188 | 0.992188 | 0.00e+00 | N/A | PASS |

Fastest solver: bicgstab at 64³ (0.3921s)

Benchmark: uniform block (analytical τ = (N-1)/N)
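
As a quick check of the Expected column, the analytical value τ = (N-1)/N can be evaluated directly for the two grid sizes in the table. The helper names below are illustrative only and are not part of the benchmark script.

```python
def analytical_tortuosity(n: int) -> float:
    """Analytical tortuosity of the uniform block benchmark: (N - 1) / N."""
    return (n - 1) / n


def relative_error(measured: float, expected: float) -> float:
    """Relative error of a measured value against the expected value."""
    return abs(measured - expected) / abs(expected)


for n in (64, 128):
    expected = analytical_tortuosity(n)
    # The table reports values to six decimal places.
    print(f"{n}^3: expected tau = {expected:.6f}")
```

For N = 64 this gives 63/64 = 0.984375 exactly. For N = 128 the exact value is 127/128 = 0.9921875, which the table shows rounded to six decimal places.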

@github-actions

github-actions Bot commented May 5, 2026

Code Coverage Report

------------------------------------------------------------------------------
                           GCC Code Coverage Report
Directory: .
------------------------------------------------------------------------------
File                                       Lines     Exec  Cover   Missing
------------------------------------------------------------------------------
src/io/CathodeWrite.cpp                       95       83    87%   40-41,97-100,115-116,182-185
src/io/CathodeWrite.H                          1        1   100%
src/io/DatReader.cpp                         135      105    77%   26-27,30,35,92-93,99-100,107-109,135-137,141,144-148,152-155,162,164,208-209,242,245
src/io/DatReader.H                             1        1   100%
src/io/HDF5Reader.cpp                        344       84    24%   40-41,43-44,46-49,52,54-56,58-59,62,64-66,68-74,92-93,126-128,144-145,154-157,174-180,182-187,204,213-215,217,219-228,230-233,236-238,240-251,253-258,266,266,266,266,266,266,266,270,270,270,270,270,270,270,274,276,278,280,282,288,290,297,297,297,297,297,297,297,301,301,301,301,301,301,301,305,305,305,305,305,305,305-306,306,306,306,306,306,306,309,309,309,309,309,309,309-310,310,310,310,310,310,310-311,311,311,311,311,311,311,313,313,313,313,313,313,313-314,314,314,314,314,314,314-315,315,315,315,315,315,315,319,319,319,319,319,319,319,324,324,324,324,324,324,324-325,325,325,325,325,325,325-326,326,326,326,326,326,326-327,327,327,327,327,327,327,332,332,332,332,332,332,332,337,337,337,337,337,337,337-338,338,338,338,338,338,338,343,343,343,343,343,343,343,350,350,350,350,350,350,350,357-358,432-435,437-440
src/io/HDF5Reader.H                            3        3   100%
src/io/ImageLoader.cpp                        61       42    68%   25,38,48,60-62,64-70,72,77,89-90,92,94
src/io/RawReader.cpp                         266      135    50%   49-50,89-90,111-112,115-117,120-121,140-142,155-157,166-168,174-177,185-186,192-196,200-204,209-212,219-224,231-237,271,273-274,276,283-284,301,312,314,318,325,327,331-334,338,346-347,353-355,361-363,365-366,369,372,374,377-380,382-384,386,388-389,391,393-394,396,398-399,401,403-404,406,410-411,413,417-418,420,425,465,471-472,521-524,538,540-542,544,546-548,558,562-564,566,588
src/io/RawReader.H                             1        1   100%
src/io/TiffReader.cpp                        384      130    33%   59-65,67-69,71-73,75-77,79-80,82-84,86-88,90-92,94-96,98-99,101-103,106-108,111-112,114-117,119,122,124-127,143-144,148-150,152-158,160,186,210,217,226,228-231,240,242-245,248,255,288-293,306,309-317,319-320,323-327,331-335,338-342,344-348,351-357,359-363,367,369,375-377,379-393,396,398-402,404-409,413-418,420-425,428-429,432-434,555-575,577-578,581-588,590,593-609,612-614,670,673-674,677-683,685,689-700,702-703
src/io/TiffReader.H                            5        5   100%
src/props/BoundaryCondition.H                131       74    56%   63,68,70,216,224-229,233-236,238-244,247-249,252-253,255,258-261,264-265,271-272,274-279,285-287,290-296,299,303,365-366,371,373
src/props/ConnectedComponents.cpp             69       67    97%   94-95
src/props/ConnectedComponents.H                4        4   100%
src/props/DeffTensor.cpp                      62       59    95%   122,128-129
src/props/Diffusion.cpp                      510      378    74%   93-94,97-98,103-104,106-116,118,123-132,134-141,144-150,153-157,159-163,165,168-173,175-177,179,182-184,186-187,190-191,193,195-198,200,202-203,288-289,297-298,300,349,359-360,368-371,373-375,404-413,415,453,461,465-467,526-527,533,535,539,547,581,610,638,646,735-736,739-740,757-760,771-772,774,824
src/props/EffDiffFillMtx.H                   120      106    88%   58,216-217,221-225,229,231-235
src/props/EffectiveDiffusivityHypre.cpp      389      347    89%   189-191,193-197,305,367-370,479,612-615,617-619,621-624,633-636,643,672,684-687,689-691,693,705,716,718
src/props/EffectiveDiffusivityHypre.H          7        7   100%
src/props/FloodFill.cpp                       84       81    96%   94-95,203
src/props/HypreStructSolver.cpp              343      210    61%   87-88,121,133-134,145,299,309,311,314,346,356,358,361,367-370,372-376,378-379,381-385,388-389,391-392,394,397-398,401-402,404-407,409-413,415-416,418-422,425-426,428-429,431,434-435,438-439,441-443,445-451,453-457,460-461,463-464,466,469-470,473,475-477,479-485,487-491,494-495,497-498,500,503-504,507,509-511,513-516,518-522,525-526,528-529,531,534-535,538,541-542,555
src/props/HypreStructSolver.H                  6        6   100%
src/props/MacroGeometry.H                     17       17   100%
src/props/ParticleSizeDistribution.cpp        11       11   100%
src/props/ParticleSizeDistribution.H           6        6   100%
src/props/PercolationCheck.cpp                53       46    86%   32-33,49-51,68,73
src/props/PercolationCheck.H                   4        4   100%
src/props/PhysicsConfig.H                     90       89    98%   150
src/props/ResultsJSON.H                      225      222    98%   242,395,416
src/props/REVStudy.cpp                       151      128    84%   72,83-91,159,170-173,175,183-186,188-190
src/props/SolverConfig.H                      32       20    62%   30,32,37-44,75-76
src/props/SpecificSurfaceArea.cpp             56       55    98%   59
src/props/SpecificSurfaceArea.H                6        6   100%
src/props/ThroughThicknessProfile.cpp         38       38   100%
src/props/ThroughThicknessProfile.H            5        5   100%
src/props/Tortuosity.H                         2        2   100%
src/props/TortuosityDirect.cpp               219      191    87%   81-83,86,100-106,113-114,125,134,140,202-209,226,394,424,433
src/props/TortuosityDirect.H                   5        5   100%
src/props/TortuosityHypre.cpp                784      563    71%   149-150,155-156,240-243,246-248,311,335-337,340-341,343,353-355,358-360,390-393,573,597,601,622,639-640,642-644,646-655,657,660-664,668-670,673-680,682-686,690-692,694-696,698-707,709-713,715-726,728-731,733,743,749-752,754-756,765-768,770-772,788,791-792,815-820,831-834,836,873,878-881,884-886,890-893,895,897-900,902,907-909,911,960,969,974,977-982,998-1001,1015-1019,1024-1029,1039-1043,1048-1053,1058-1062,1065-1068,1075-1078,1089,1098,1100,1104,1106,1128,1159-1160,1246-1248,1374-1377
src/props/TortuosityHypre.H                   15       15   100%
src/props/TortuosityHypreFill.H              127       98    77%   85,203,205-212,237-239,241-245,247-248,250,252,255-256,258-262
src/props/TortuosityKernels.H                 97       53    54%   52,56-60,62-65,69-74,76-80,84-85,90,129,143,157,243,245-248,250-253,257-260,262-265
src/props/TortuosityMLMG.cpp                  99       91    91%   160,181-183,185-186,193,206
src/props/TortuosityMLMG.H                     1        1   100%
src/props/TortuositySolverBase.cpp           301      237    78%   70-72,74-75,94-101,104,106,142-145,200,203,205,255,280,298,327,391,394-396,398,406-409,411-417,422,427-429,435-436,438-440,454,460,464-465,467,478,492,496-498,500,502,506
src/props/TortuositySolverBase.H              13       13   100%
src/props/VolumeFraction.cpp                  25       25   100%
src/props/VolumeFraction.H                     4        4   100%
------------------------------------------------------------------------------
TOTAL                                       5407     3874    71%
------------------------------------------------------------------------------


Generated by CI — coverage data from gcovr

@codecov

codecov Bot commented May 5, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
