
Fix GPU wheels build, distribute via GitHub Releases, and expand test coverage#233

Merged
jameslehoux merged 24 commits into working from
claude/fix-gpu-wheels-build-yptxM
Apr 1, 2026

Conversation

@jameslehoux

Summary

  • Fix GPU wheel versioning: Wheels were incorrectly named 4.0.3.dev0 instead of the tagged version because cibuildwheel's sed rename dirtied the git tree. Fixed by setting SETUPTOOLS_SCM_PRETEND_VERSION from the git tag in both CPU and GPU workflows.
  • Distribute GPU wheels via GitHub Releases: GPU wheels are ~318 MB each (CUDA fatbinaries for sm_60–sm_90), exceeding PyPI's 100 MB limit. Replaced the PyPI publish step with softprops/action-gh-release to upload wheels as release assets. Users install with:
    pip install openimpala-cuda --find-links \
      https://github.com/BASE-Laboratory/OpenImpala/releases/latest/download/
    
  • Update all install instructions: Updated README, docs (getting-started, gpu, index.rst), paper.md, all 7 tutorial notebooks, and profiling notebook to use the new --find-links install command.
  • Expand test coverage: Added synthetic REV study test (REVStudy.cpp 0% → ~60%), 3 Diffusion.cpp integration test variants (microstructure, tortuosity, REV paths), and 3 RawReader data type variants (UINT16_LE, INT16_LE, FLOAT32_LE) with generated test data.
  • Polish JOSS paper: Fix LaTeX formatting, verify claims against code, update authorship and acknowledgements.

Test plan

  • Trigger pypi-wheels-gpu.yml via workflow_dispatch and verify wheels build with correct version
  • Create a test release and verify GPU wheels are uploaded as release assets
  • Verify pip install openimpala-cuda --find-links <release-url> works from a clean environment
  • Run ctest to confirm new test variants (tSyntheticREVStudy, tDiffusion_, tRawReader_) pass
  • Open tutorial notebooks in Colab and verify the install cells work

jameslehoux and others added 17 commits March 29, 2026 15:36
Add initial draft of OpenImpala paper detailing its framework and advancements.
Added multiple references to the bibliography file including articles on OpenImpala, statistical effective diffusivity estimation, AMReX framework, and Python Battery Mathematical Modelling.
Updated the software architecture section to reflect recent changes and improvements in the OpenImpala framework, including its transition to a Python library and enhancements in computational capabilities.
- Enhance profiling notebook with AMReX TinyProfiler breakdown
  (solver setup vs linear solve vs flux computation) and NVIDIA
  Nsight Systems GPU kernel profiling for Colab T4 runtimes
- Add CI benchmark workflow that runs on PRs touching solver code,
  tests against analytical solutions (uniform block tau=(N-1)/N),
  and posts timing results as PR comments for regression tracking

https://claude.ai/code/session_01RKnn97qiD7sbCeABHH3eQk
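The analytical check mentioned in the commit above could be sketched roughly as follows. This is only an illustration: the formula tau = (N-1)/N is taken from the commit message, while the function names, the reading of N as the number of cells along the flow direction, and the tolerance are assumptions, not the benchmark workflow's actual code.

```python
import math


def expected_tau_uniform_block(n_cells: int) -> float:
    """Analytical value quoted in the commit message: tau = (N - 1) / N.

    Assumption: N is the number of cells along the flow direction.
    """
    if n_cells < 2:
        raise ValueError("need at least two cells")
    return (n_cells - 1) / n_cells


def matches_analytical(tau_computed: float, n_cells: int, rel_tol: float = 1e-6) -> bool:
    """Compare a computed tortuosity with the analytical value (tolerance is hypothetical)."""
    return math.isclose(tau_computed, expected_tau_uniform_block(n_cells), rel_tol=rel_tol)
```

A CI job could call `matches_analytical` on the solver output and fail the run when it returns `False`.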
The GPU wheel uploads fail because setuptools_scm generates local
version identifiers (e.g. 4.0.2.dev0+ga780d0e87) which PyPI rejects.
Root cause: no v4.x tag exists in the repo (latest is v3.1.0).

Add [tool.setuptools_scm] with local_scheme="no-local-version" to
produce PyPI-compatible versions even on non-tagged commits, and
set fallback_version for builds outside a git repo.

https://claude.ai/code/session_01RKnn97qiD7sbCeABHH3eQk
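To make the rejected version string concrete, here is a minimal sketch of what a PEP 440 local version segment looks like and what dropping it leaves behind. The helper names are hypothetical, and the code only mimics the effect on the string; it is not how setuptools_scm implements `no-local-version`.

```python
def has_local_segment(version: str) -> bool:
    """A PEP 440 local version segment is whatever follows a '+'."""
    return "+" in version


def strip_local_segment(version: str) -> str:
    """Return the public part of the version, dropping any '+local' suffix."""
    public, _, _local = version.partition("+")
    return public


# Example string taken from the commit message above.
rejected = "4.0.2.dev0+ga780d0e87"
print(has_local_segment(rejected), strip_local_segment(rejected))
```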
The CIBW_BEFORE_BUILD sed command that renames the package to
openimpala-cuda dirties the git working tree. setuptools_scm's
guess-next-dev scheme then produces X.Y.Z+1.dev0 instead of the
tagged version. Fix by setting SETUPTOOLS_SCM_PRETEND_VERSION from
the release tag in both CPU and GPU wheel workflows.

https://claude.ai/code/session_01RKnn97qiD7sbCeABHH3eQk
- Fix LaTeX math rendering: wrap $3\times$ and $3\times 3$ properly
- Correct HYPRE CUDA description: "device execution policy" not
  "device memory interface" to match actual implementation
- Remove overstated PuMA Python API claim (PuMA now has bindings)
- Tone down Python facade claim to "core solver capabilities"
- Enumerate new features explicitly in the v4 summary paragraph
- Expand test section with benchmark names (Reuss/Voigt bounds)
- Consistent em-dash spacing throughout
- Minor prose tightening

https://claude.ai/code/session_01RKnn97qiD7sbCeABHH3eQk
When triggered manually (workflow_dispatch), GITHUB_REF_NAME is a
branch name not a tag. Fall back to git describe --tags --abbrev=0
to find the most recent tag in that case.

https://claude.ai/code/session_01RKnn97qiD7sbCeABHH3eQk
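The tag-or-fallback logic from the two version commits above can be sketched as below. The workflows do this in shell; this Python version only illustrates the decision. It assumes tags look like `vX.Y.Z` (consistent with the `v3.1.0` tag mentioned earlier), and the function names are hypothetical.

```python
import re
import subprocess

# Assumption: release tags are a dotted numeric version with an optional leading "v".
TAG_PATTERN = re.compile(r"^v?(\d+(?:\.\d+)*)$")


def version_from_tag(ref_name: str):
    """Return the version encoded in a tag name, or None if it is not a tag-like ref."""
    match = TAG_PATTERN.match(ref_name)
    return match.group(1) if match else None


def git_describe_latest_tag() -> str:
    """Ask git for the most recent tag reachable from HEAD."""
    result = subprocess.run(
        ["git", "describe", "--tags", "--abbrev=0"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def resolve_pretend_version(ref_name: str, describe=git_describe_latest_tag) -> str:
    """Pick the value to export as SETUPTOOLS_SCM_PRETEND_VERSION."""
    version = version_from_tag(ref_name)
    if version is None:
        # Manual runs pass a branch name, so fall back to the latest tag.
        version = version_from_tag(describe())
    if version is None:
        raise ValueError(f"could not derive a version from ref {ref_name!r}")
    return version
```

Passing `describe` as a parameter keeps the function testable without a git checkout.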
PyPI returns 400 Bad Request on GPU wheel upload. Enable verbose
logging to surface the actual rejection reason. Disable sigstore
attestations as a potential cause — the publish action generates
these by default and they may be rejected by PyPI.

https://claude.ai/code/session_01RKnn97qiD7sbCeABHH3eQk
REVStudy.cpp had zero test coverage. Add tSyntheticREVStudy that:
- Tests empty-sizes early exit path
- Runs a single-sample REV study on a uniform 16^3 domain
- Validates CSV output creation, header format, and data row count
- Checks D_eff tensor: diagonal ≈ 1.0, off-diagonal ≈ 0.0
- Broadcasts pass/fail across MPI ranks

https://claude.ai/code/session_01RKnn97qiD7sbCeABHH3eQk
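The D_eff tensor check described above amounts to comparing a 3x3 tensor with the identity. A minimal sketch follows; the real test is C++, and the function name and tolerance here are assumptions chosen for illustration.

```python
def is_identity_like(tensor, tol=1e-6):
    """True if diagonal entries are within tol of 1.0 and off-diagonal entries within tol of 0.0."""
    for i, row in enumerate(tensor):
        for j, value in enumerate(row):
            target = 1.0 if i == j else 0.0
            if abs(value - target) > tol:
                return False
    return True


# A uniform domain is expected to give an (approximately) identity tensor.
d_eff = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
print(is_identity_like(d_eff))
```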
Add integration tests exercising uncovered code paths:
- tDiffusion_microstructure: SSA, profiles, PSD parameterization
- tDiffusion_tortuosity: flow-through method with conductivity physics
- tDiffusion_rev: REV convergence study path
- tRawReader_uint16le/int16le/float32le: additional data type variants
- Raw binary test data files and generation script

https://claude.ai/code/session_01RKnn97qiD7sbCeABHH3eQk
GPU wheels are ~318 MB each due to CUDA fatbinaries, exceeding PyPI's
100 MB file size limit. Upload to GitHub Releases (2 GB limit) as an
interim solution while awaiting a PyPI size limit increase.

Users can install via:
  pip install openimpala-cuda --find-links \
    https://github.com/BASE-Laboratory/OpenImpala/releases/latest/download/

https://claude.ai/code/session_01RKnn97qiD7sbCeABHH3eQk
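The routing decision behind this change can be expressed as a size check. The sketch below uses the limits quoted in the commit message (100 MB for PyPI, 2 GB for GitHub Releases); whether each service counts decimal or binary units is not stated in the source, so the byte values are an assumption, as are the names.

```python
# Limits quoted in the commit message; binary units are an assumption.
PYPI_LIMIT_BYTES = 100 * 1024 * 1024
GITHUB_RELEASE_LIMIT_BYTES = 2 * 1024 ** 3


def choose_destination(size_bytes: int) -> str:
    """Return where a wheel of the given size could be uploaded."""
    if size_bytes <= PYPI_LIMIT_BYTES:
        return "pypi"
    if size_bytes <= GITHUB_RELEASE_LIMIT_BYTES:
        return "github-release"
    raise ValueError("wheel is too large for either destination")


# A wheel of roughly 318 MB, as reported for the GPU builds.
print(choose_destination(318 * 1000 * 1000))
```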
GPU wheels exceed PyPI's 100 MB limit (~318 MB due to CUDA fatbinaries),
so they are now distributed via GitHub Releases. Updated all install
instructions to use --find-links for openimpala-cuda:

- README.md
- docs/getting-started.md, docs/index.rst, docs/user-guide/gpu.md
- paper.md
- All 7 tutorial notebooks + profiling notebook

https://claude.ai/code/session_01RKnn97qiD7sbCeABHH3eQk
@github-actions github-actions Bot added the devops, documentation, gpu, and tests labels Mar 31, 2026
James Le Houx added 2 commits March 31, 2026 21:42
Move the percolation check before solver construction in tortuosity()
so non-percolating phases fail fast with a clear PercolationError
before expensive HYPRE matrix assembly. The error message now explains
why the solver cannot converge.
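
The fail-fast ordering can be illustrated with a small pure-Python sketch. It is not the OpenImpala implementation: `PercolationError` here is a stand-in class, `percolates` is a simple breadth-first search over face-connected voxels, and `tortuosity_sketch` with its `solve` callback is hypothetical.

```python
from collections import deque


class PercolationError(RuntimeError):
    """Stand-in for the exception named in the commit message."""


def percolates(phase_mask, axis=0):
    """True if True-voxels connect the first and last slices along axis (6-connectivity)."""
    shape = (len(phase_mask), len(phase_mask[0]), len(phase_mask[0][0]))
    last = shape[axis] - 1
    queue, seen = deque(), set()
    # Seed the search with every phase voxel on the inlet face.
    for i in range(shape[0]):
        for j in range(shape[1]):
            for k in range(shape[2]):
                idx = (i, j, k)
                if idx[axis] == 0 and phase_mask[i][j][k]:
                    queue.append(idx)
                    seen.add(idx)
    while queue:
        idx = queue.popleft()
        if idx[axis] == last:
            return True
        for d in range(3):
            for step in (-1, 1):
                nxt = list(idx)
                nxt[d] += step
                nxt = tuple(nxt)
                if not all(0 <= nxt[a] < shape[a] for a in range(3)):
                    continue
                if nxt in seen or not phase_mask[nxt[0]][nxt[1]][nxt[2]]:
                    continue
                seen.add(nxt)
                queue.append(nxt)
    return False


def tortuosity_sketch(phase_mask, axis, solve):
    """Run the cheap connectivity check first; call the expensive solver only if it passes."""
    if not percolates(phase_mask, axis):
        raise PercolationError(f"phase does not percolate along axis {axis}")
    return solve(phase_mask, axis)
```

The point of the ordering is that `solve` is never reached for a disconnected phase, so no solver setup cost is paid in that case.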

Also add oi.estimate_memory(shape, num_ranks) utility that returns
per-rank memory estimates based on the ~80 bytes/voxel rule of thumb.

https://claude.ai/code/session_01RKnn97qiD7sbCeABHH3eQk
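A rough sketch of the ~80 bytes/voxel rule of thumb behind the memory estimate follows. It is an illustration only: the function name, return unit (bytes per rank), and even split across ranks are assumptions, and the actual `oi.estimate_memory` may differ in signature and output.

```python
import math

BYTES_PER_VOXEL = 80  # rule of thumb quoted in the commit message


def estimate_memory_sketch(shape, num_ranks=1):
    """Estimate bytes per rank, assuming voxels are split evenly across ranks."""
    if num_ranks < 1:
        raise ValueError("num_ranks must be at least 1")
    total_bytes = math.prod(shape) * BYTES_PER_VOXEL
    return total_bytes / num_ranks


# A 100^3 volume on 4 ranks: 1e6 voxels * 80 bytes / 4 ranks.
print(estimate_memory_sketch((100, 100, 100), num_ranks=4))
```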
Add notebooks/visualization_yt.ipynb demonstrating how to use yt to
visualize OpenImpala results directly in Jupyter:
- Loading AMReX plotfiles with yt.load()
- 2D SlicePlot of the solution field and phase map
- 1D ProfilePlot of average potential along the flow direction
- Extracting data to NumPy arrays for custom matplotlib plots
- Tips for large datasets (lazy loading, sub-regions)

Also add a [viz] optional dependency group for yt in pyproject.toml.

https://claude.ai/code/session_01RKnn97qiD7sbCeABHH3eQk
@github-actions

github-actions Bot commented Apr 1, 2026

Code Coverage Report

------------------------------------------------------------------------------
                           GCC Code Coverage Report
Directory: .
------------------------------------------------------------------------------
File                                       Lines     Exec  Cover   Missing
------------------------------------------------------------------------------
src/io/CathodeWrite.cpp                       95       83    87%   40-41,97-100,115-116,182-185
src/io/CathodeWrite.H                          1        1   100%
src/io/DatReader.cpp                         135      105    77%   26-27,30,35,92-93,99-100,107-109,135-137,141,144-148,152-155,162,164,208-209,242,245
src/io/DatReader.H                             1        1   100%
src/io/HDF5Reader.cpp                        344       84    24%   40-41,43-44,46-49,52,54-56,58-59,62,64-66,68-74,92-93,126-128,144-145,154-157,174-180,182-187,204,213-215,217,219-228,230-233,236-238,240-251,253-258,266,270,274,276,278,280,282,288,290,297,301,305-306,309-311,313-315,319,324-327,332,337-338,343,350,357-358,432-435,437-440
src/io/HDF5Reader.H                            3        3   100%
src/io/ImageLoader.cpp                        61       42    68%   25,38,48,60-62,64-70,72,77,89-90,92,94
src/io/RawReader.cpp                         266      135    50%   49-50,89-90,111-112,115-117,120-121,140-142,155-157,166-168,174-177,185-186,192-196,200-204,209-212,219-224,231-237,271,273-274,276,283-284,301,312,314,318,325,327,331-334,338,346-347,353-355,361-363,365-366,369,372,374,377-380,382-384,386,388-389,391,393-394,396,398-399,401,403-404,406,410-411,413,417-418,420,425,465,471-472,521-524,538,540-542,544,546-548,558,562-564,566,588
src/io/RawReader.H                             1        1   100%
src/io/TiffReader.cpp                        384      130    33%   59-65,67-69,71-73,75-77,79-80,82-84,86-88,90-92,94-96,98-99,101-103,106-108,111-112,114-117,119,122,124-127,143-144,148-150,152-158,160,186,210,217,226,228-231,240,242-245,248,255,288-293,306,309-317,319-320,323-327,331-335,338-342,344-348,351-357,359-363,367,369,375-377,379-393,396,398-402,404-409,413-418,420-425,428-429,432-434,555-575,577-578,581-588,590,593-609,612-614,670,673-674,677-683,685,689-700,702-703
src/io/TiffReader.H                            5        5   100%
src/props/BoundaryCondition.H                131       74    56%   63,68,70,216,224-229,233-236,238-244,247-249,252-253,255,258-261,264-265,271-272,274-279,285-287,290-296,299,303,365-366,371,373
src/props/ConnectedComponents.cpp             69       67    97%   94-95
src/props/ConnectedComponents.H                4        4   100%
src/props/DeffTensor.cpp                      62       59    95%   122,128-129
src/props/Diffusion.cpp                      510      378    74%   93-94,97-98,103-104,106-116,118,123-132,134-141,144-150,153-157,159-163,165,168-173,175-177,179,182-184,186-187,190-191,193,195-198,200,202-203,288-289,297-298,300,349,359-360,368-371,373-375,404-413,415,453,461,465-467,526-527,533,535,539,547,581,610,638,646,735-736,739-740,757-760,771-772,774,824
src/props/EffDiffFillMtx.H                   120      106    88%   58,216-217,221-225,229,231-235
src/props/EffectiveDiffusivityHypre.cpp      389      347    89%   189-191,193-197,305,367-370,479,612-615,617-619,621-624,633-636,643,672,684-687,689-691,693,705,716,718
src/props/EffectiveDiffusivityHypre.H          7        7   100%
src/props/FloodFill.cpp                       84       81    96%   94-95,203
src/props/HypreStructSolver.cpp              343      210    61%   87-88,121,133-134,145,299,309,311,314,346,356,358,361,367-370,372-376,378-379,381-385,388-389,391-392,394,397-398,401-402,404-407,409-413,415-416,418-422,425-426,428-429,431,434-435,438-439,441-443,445-451,453-457,460-461,463-464,466,469-470,473,475-477,479-485,487-491,494-495,497-498,500,503-504,507,509-511,513-516,518-522,525-526,528-529,531,534-535,538,541-542,555
src/props/HypreStructSolver.H                  6        6   100%
src/props/MacroGeometry.H                     17       17   100%
src/props/ParticleSizeDistribution.cpp        11       11   100%
src/props/ParticleSizeDistribution.H           6        6   100%
src/props/PercolationCheck.cpp                53       46    86%   32-33,49-51,68,73
src/props/PercolationCheck.H                   4        4   100%
src/props/PhysicsConfig.H                     90       89    98%   150
src/props/ResultsJSON.H                      225      222    98%   242,395,416
src/props/REVStudy.cpp                       151      128    84%   72,83-91,159,170-173,175,183-186,188-190
src/props/SolverConfig.H                      32       20    62%   30,32,37-44,75-76
src/props/SpecificSurfaceArea.cpp             56       55    98%   59
src/props/SpecificSurfaceArea.H                6        6   100%
src/props/ThroughThicknessProfile.cpp         38       38   100%
src/props/ThroughThicknessProfile.H            5        5   100%
src/props/Tortuosity.H                         2        2   100%
src/props/TortuosityDirect.cpp               219      191    87%   81-83,86,100-106,113-114,125,134,140,202-209,226,394,424,433
src/props/TortuosityDirect.H                   5        5   100%
src/props/TortuosityHypre.cpp                784      563    71%   148-149,154-155,239-242,245-247,310,334-336,339-340,342,352-354,357-359,389-392,572,596,600,621,637-638,640-642,644-653,655,658-662,666-668,671-678,680-684,688-690,692-694,696-705,707-711,713-724,726-729,731,741,747-750,752-754,763-766,768-770,786,789-790,813-818,829-832,834,871,876-879,882-884,888-891,893,895-898,900,905-907,909,958,967,972,975-980,996-999,1013-1017,1022-1027,1037-1041,1046-1051,1056-1060,1063-1066,1073-1076,1087,1096,1098,1102,1104,1126,1157-1158,1244-1246,1372-1375
src/props/TortuosityHypre.H                   15       15   100%
src/props/TortuosityHypreFill.H              127       98    77%   85,203,205-212,237-239,241-245,247-248,250,252,255-256,258-262
src/props/TortuosityKernels.H                 97       53    54%   52,56-60,62-65,69-74,76-80,84-85,90,129,143,157,243,245-248,250-253,257-260,262-265
src/props/TortuosityMLMG.cpp                  96       88    91%   153,174-176,178-179,186,199
src/props/TortuosityMLMG.H                     1        1   100%
src/props/TortuositySolverBase.cpp           301      237    78%   70-72,74-75,94-101,104,106,142-145,200,203,205,255,280,298,327,391,394-396,398,406-409,411-417,422,427-429,435-436,438-440,454,460,464-465,467,478,492,496-498,500,502,506
src/props/TortuositySolverBase.H              13       13   100%
src/props/VolumeFraction.cpp                  25       25   100%
src/props/VolumeFraction.H                     4        4   100%
------------------------------------------------------------------------------
TOTAL                                       5404     3871    71%
------------------------------------------------------------------------------


Generated by CI — coverage data from gcovr

@codecov

codecov Bot commented Apr 1, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


@jameslehoux jameslehoux force-pushed the claude/fix-gpu-wheels-build-yptxM branch from bef2aac to 1ddcb10 Compare April 1, 2026 08:39
James Le Houx added 2 commits April 1, 2026 08:40
Adds sphere_packing_vv.py which generates random overlapping sphere
packings at varying porosities and validates that OpenImpala's effective
diffusivity results fall within the Hashin-Shtrikman upper bound for
isotropic binary composites. Updates the V&V README with documentation.

Ref: #83

https://claude.ai/code/session_01RKnn97qiD7sbCeABHH3eQk
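For reference, the Hashin-Shtrikman upper bound used in this kind of check has a simple closed form when the solid phase does not conduct: D_eff/D_bulk <= 2ε/(3 - ε) in three dimensions, where ε is the porosity. The sketch below assumes that special case; the function names and tolerance are hypothetical, and the script itself may use a more general form.

```python
def hs_upper_bound(porosity, d_bulk=1.0):
    """3D Hashin-Shtrikman upper bound, assuming a non-conducting second phase."""
    if not 0.0 <= porosity <= 1.0:
        raise ValueError("porosity must lie in [0, 1]")
    return d_bulk * 2.0 * porosity / (3.0 - porosity)


def within_hs_bound(d_eff, porosity, d_bulk=1.0, tol=1e-9):
    """True if a computed effective diffusivity does not exceed the upper bound."""
    return d_eff <= hs_upper_bound(porosity, d_bulk) + tol


# At 50% porosity the bound is 2 * 0.5 / 2.5 = 0.4 times the bulk value.
print(hs_upper_bound(0.5))
```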
The input file used 'tortuosity' but Diffusion.cpp only accepts
'homogenization' or 'flow_through'. Changed to 'flow_through' which
is the correct method for single-direction tortuosity solves.

https://claude.ai/code/session_01RKnn97qiD7sbCeABHH3eQk
James Le Houx added 2 commits April 1, 2026 08:40
Adds berea_sandstone_vv.py which downloads a 400^3 micro-CT image of
Berea sandstone from Digital Rocks Portal and validates computed
porosity, tortuosity, and formation factor against published
experimental ranges from multiple independent measurements.

Falls back to a synthetic structure if the download fails (offline CI).

Ref: #83

https://claude.ai/code/session_01RKnn97qiD7sbCeABHH3eQk
gcov can report extremely large hit counts on certain lines due to
GCC bug #68080. This causes gcovr to abort with SuspiciousHits error.
Add --gcov-ignore-parse-errors=suspicious_hits.warn_once_per_file to
downgrade the error to a warning.

https://claude.ai/code/session_01RKnn97qiD7sbCeABHH3eQk
@jameslehoux jameslehoux force-pushed the claude/fix-gpu-wheels-build-yptxM branch from 1ddcb10 to a6df125 Compare April 1, 2026 08:40
@jameslehoux jameslehoux merged commit 97ea43d into working Apr 1, 2026
5 of 6 checks passed

Labels

devops, documentation, gpu, python, tests
