Merged
2 changes: 1 addition & 1 deletion .github/workflows/benchmark.yml
@@ -102,7 +102,7 @@ jobs:
import openimpala as oi

sizes = [int(s) for s in os.environ.get("BENCH_SIZES", "64,128").split(",")]
solvers = ["pcg", "flexgmres", "bicgstab", "gmres", "pfmg", "mlmg"]
solvers = ["pcg", "flexgmres", "bicgstab", "gmres"]
n_repeats = 3
results = []

Expand Down
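For context on the benchmark matrix above: the grid sizes come from the `BENCH_SIZES` environment variable, parsed exactly as in the workflow snippet. A standalone sketch of that parsing (outside the workflow, so the variable is set explicitly here):

```python
import os

# Mirror the workflow: comma-separated sizes from the environment,
# falling back to "64,128" when BENCH_SIZES is unset.
os.environ["BENCH_SIZES"] = "64,128,256"
sizes = [int(s) for s in os.environ.get("BENCH_SIZES", "64,128").split(",")]
print(sizes)  # [64, 128, 256]
```

The default string keeps CI fast while letting a manual dispatch override the sizes without editing the workflow file.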
2 changes: 1 addition & 1 deletion README.md
@@ -134,7 +134,7 @@ For **GPU acceleration** (NVIDIA CUDA), install `openimpala-cuda` from GitHub Re

```bash
pip install openimpala-cuda --find-links \
https://github.com/BASE-Laboratory/OpenImpala/releases/latest/download/
https://github.com/BASE-Laboratory/OpenImpala/releases/expanded_assets/v4.0.6
```

To install with optional dependencies:
4 changes: 2 additions & 2 deletions docs/getting-started.md
@@ -13,7 +13,7 @@ pip install openimpala
# GPU version (requires NVIDIA CUDA runtime)
# GPU wheels are distributed via GitHub Releases due to their size (~300 MB).
pip install openimpala-cuda --find-links \
https://github.com/BASE-Laboratory/OpenImpala/releases/latest/download/
https://github.com/BASE-Laboratory/OpenImpala/releases/expanded_assets/v4.0.6
```

**Requirements:** Python 3.8+ and NumPy. Optional: `mpi4py` for MPI parallelism.
@@ -25,7 +25,7 @@ For HPC clusters, download the pre-built Apptainer/Singularity container from

```bash
# Download the latest .sif file
wget https://github.com/BASE-Laboratory/OpenImpala/releases/latest/download/openimpala-v4.0.0.sif
wget https://github.com/BASE-Laboratory/OpenImpala/releases/download/v4.0.6/openimpala-v4.0.0.sif

# Run interactively
apptainer shell openimpala-v4.0.0.sif
2 changes: 1 addition & 1 deletion docs/index.rst
@@ -31,7 +31,7 @@ Install from PyPI

# GPU version (NVIDIA CUDA) — distributed via GitHub Releases
pip install openimpala-cuda --find-links \
https://github.com/BASE-Laboratory/OpenImpala/releases/latest/download/
https://github.com/BASE-Laboratory/OpenImpala/releases/expanded_assets/v4.0.6

.. toctree::
:maxdepth: 2
2 changes: 1 addition & 1 deletion docs/user-guide/gpu.md
@@ -8,7 +8,7 @@ flood fills, and solver loops are GPU-compatible.
```bash
# GPU wheels are distributed via GitHub Releases due to their size (~300 MB).
pip install openimpala-cuda --find-links \
https://github.com/BASE-Laboratory/OpenImpala/releases/latest/download/
https://github.com/BASE-Laboratory/OpenImpala/releases/expanded_assets/v4.0.6
```

The GPU wheel requires:
2 changes: 1 addition & 1 deletion notebooks/profiling_and_tuning.ipynb
@@ -59,7 +59,7 @@
"source": [
"# Install system MPI and Python packages\n",
"!apt-get install -y libopenmpi-dev > /dev/null 2>&1\n",
"!pip install openimpala-cuda --find-links https://github.com/BASE-Laboratory/OpenImpala/releases/latest/download/[all] > /dev/null 2>&1\n",
"!pip install openimpala-cuda[all] --find-links https://github.com/BASE-Laboratory/OpenImpala/releases/expanded_assets/v4.0.6 > /dev/null 2>&1\n",
"!pip install porespy > /dev/null 2>&1\n",
"print(\"Dependencies installed.\")"
]
2 changes: 1 addition & 1 deletion paper.md
@@ -83,7 +83,7 @@ with oi.Session():
print(f"Tortuosity: {result.tortuosity:.4f}")
```

Pre-compiled CPU wheels are distributed via PyPI (`pip install openimpala`) and CUDA GPU wheels via GitHub Releases (`pip install openimpala-cuda --find-links https://github.com/BASE-Laboratory/OpenImpala/releases/latest/download/`), both built using `cibuildwheel` with statically linked dependencies. Interactive tutorial notebooks are provided for Google Colab, covering workflows from basic tortuosity computation to digital twin parameterisation with PyBaMM. API reference documentation, installation guides, and interactive tutorial notebooks are available at https://base-laboratory.github.io/OpenImpala/
Pre-compiled CPU wheels are distributed via PyPI (`pip install openimpala`) and CUDA GPU wheels via GitHub Releases (`pip install openimpala-cuda --find-links https://github.com/BASE-Laboratory/OpenImpala/releases/expanded_assets/v4.0.6`), both built using `cibuildwheel` with statically linked dependencies. Interactive tutorial notebooks are provided for Google Colab, covering workflows from basic tortuosity computation to digital twin parameterisation with PyBaMM. API reference documentation, installation guides, and interactive tutorial notebooks are available at https://base-laboratory.github.io/OpenImpala/

## Testing and Quality Assurance

5 changes: 2 additions & 3 deletions src/props/HypreStructSolver.cpp
@@ -480,9 +480,8 @@ bool HypreStructSolver::runSolver(PrecondType precond_type) {
HYPRE_CHECK(ierr);
HYPRE_StructPFMGSetTol(solver, m_eps);
HYPRE_StructPFMGSetMaxIter(solver, m_maxiter);
HYPRE_StructPFMGSetNumPreRelax(solver, 2);
HYPRE_StructPFMGSetNumPostRelax(solver, 2);
HYPRE_StructPFMGSetRelaxType(solver, 2); // 2 = weighted Jacobi (more stable)
HYPRE_StructPFMGSetNumPreRelax(solver, 1);
HYPRE_StructPFMGSetNumPostRelax(solver, 1);
HYPRE_StructPFMGSetPrintLevel(solver, m_verbose > 1 ? 3 : 0);

ierr = HYPRE_StructPFMGSetup(solver, m_A, m_b, m_x);
83 changes: 11 additions & 72 deletions src/props/TortuosityMLMG.cpp
@@ -79,39 +79,18 @@ bool TortuosityMLMG::solve() {
}
mlabec.setDomainBC(lo_bc, hi_bc);

// --- Adjust Dirichlet face values for HYPRE-compatible cell-centre BCs ---
//
// AMReX MLABecLaplacian applies Dirichlet BCs at domain faces (half a cell
// outside the boundary cell centre). The shared flux integration code
// (globalFluxes / value) expects the HYPRE convention where Dirichlet
// values are at boundary cell centres: cell 0 = vlo, cell N-1 = vhi.
//
// To make the MLMG face BC produce the same cell-centre values, we extend
// the face values outward by half a cell:
// face_lo = vlo - 0.5 * (vhi - vlo) / (N - 1)
// face_hi = vhi + 0.5 * (vhi - vlo) / (N - 1)
//
// This ensures the linear solution through cell centres hits exactly
// vlo at cell 0 and vhi at cell N-1, matching HYPRE's τ = (N-1)/N.
const amrex::Box& domain = m_geom.Domain();
const int n_cells = domain.length(idir);
if (n_cells <= 1) {
amrex::Abort("TortuosityMLMG: domain must have more than 1 cell in flow direction.");
}
const amrex::Real half_step = 0.5 * (m_vhi - m_vlo) / static_cast<amrex::Real>(n_cells - 1);
const amrex::Real face_vlo = m_vlo - half_step;
const amrex::Real face_vhi = m_vhi + half_step;

// Set initial guess: linear ramp in flow direction for better convergence
m_mf_solution.setVal(0.0);
{
const amrex::Box& domain = m_geom.Domain();
const int n_cells = domain.length(idir);
if (n_cells <= 1) {
amrex::Abort("TortuosityMLMG: domain must have more than 1 cell in flow direction.");
}
const int dom_lo_dir = domain.smallEnd(idir);
const int dom_hi_dir = domain.bigEnd(idir);
const amrex::Real vlo = face_vlo;
const amrex::Real vhi = face_vhi;
// Ramp from face_vlo at the low face to face_vhi at the high face.
// Cell centres at i map to fraction (i - dom_lo + 0.5) / n_cells.
const amrex::Real inv_n = 1.0 / static_cast<amrex::Real>(n_cells);
const amrex::Real vlo = m_vlo;
const amrex::Real vhi = m_vhi;
#ifdef AMREX_USE_OMP
#pragma omp parallel if (amrex::Gpu::notInLaunchRegion())
#endif
@@ -121,7 +100,8 @@
amrex::ParallelFor(bx, [=] AMREX_GPU_DEVICE(int i, int j, int k) noexcept {
amrex::IntVect iv(i, j, k);
int idx_in_dir = iv[idir] - dom_lo_dir;
amrex::Real frac = (static_cast<amrex::Real>(idx_in_dir) + 0.5) * inv_n;
amrex::Real frac =
static_cast<amrex::Real>(idx_in_dir) / static_cast<amrex::Real>(n_cells - 1);
if (iv[idir] >= dom_lo_dir && iv[idir] <= dom_hi_dir) {
phi(i, j, k) = vlo + frac * (vhi - vlo);
} else if (iv[idir] < dom_lo_dir) {
@@ -134,7 +114,7 @@
}
m_mf_solution.FillBoundary(m_geom.periodicity());

// Set level BC (ghost cell values encode the Dirichlet face data)
// Set level BC (ghost cell values encode the Dirichlet data)
mlabec.setLevelBC(0, &m_mf_solution);

// Set coefficients: alpha*a - beta*div(B*grad)
@@ -145,48 +125,7 @@
acoef.setVal(0.0);
mlabec.setACoeffs(0, acoef);

// B-coefficients: face-centred diffusivities via harmonic mean.
//
// First, extrapolate m_mf_diff_coeff into physical boundary ghost cells
// so that boundary-face harmonic means see the correct value (the adjacent
// interior cell's D) instead of the default 0. Without this, boundary
// faces get B=0, effectively imposing zero-flux Neumann everywhere and
// preventing MLMG from enforcing Dirichlet BCs properly.
{
const amrex::Box& domain = m_geom.Domain();
#ifdef AMREX_USE_OMP
#pragma omp parallel if (amrex::Gpu::notInLaunchRegion())
#endif
for (amrex::MFIter mfi(m_mf_diff_coeff, amrex::TilingIfNotGPU()); mfi.isValid(); ++mfi) {
amrex::Array4<amrex::Real> const dc = m_mf_diff_coeff.array(mfi);
const amrex::Box& vbx = mfi.validbox();
for (int d = 0; d < AMREX_SPACEDIM; ++d) {
// Low boundary: copy interior value into ghost cell
if (vbx.smallEnd(d) == domain.smallEnd(d)) {
const amrex::Box lobx = amrex::adjCellLo(vbx, d, 1);
const int interior = domain.smallEnd(d);
amrex::ParallelFor(lobx, [=] AMREX_GPU_DEVICE(int i, int j, int k) noexcept {
amrex::IntVect iv(i, j, k);
amrex::IntVect iv_int = iv;
iv_int[d] = interior;
dc(i, j, k) = dc(iv_int);
});
}
// High boundary: copy interior value into ghost cell
if (vbx.bigEnd(d) == domain.bigEnd(d)) {
const amrex::Box hibx = amrex::adjCellHi(vbx, d, 1);
const int interior = domain.bigEnd(d);
amrex::ParallelFor(hibx, [=] AMREX_GPU_DEVICE(int i, int j, int k) noexcept {
amrex::IntVect iv(i, j, k);
amrex::IntVect iv_int = iv;
iv_int[d] = interior;
dc(i, j, k) = dc(iv_int);
});
}
}
}
}

// B-coefficients: face-centred diffusivities via harmonic mean
amrex::Array<amrex::MultiFab, AMREX_SPACEDIM> bcoefs;
for (int d = 0; d < AMREX_SPACEDIM; ++d) {
amrex::BoxArray edge_ba = m_ba;
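For intuition on the two conventions this diff touches, here is a small standalone Python sketch (function names are illustrative, not OpenImpala API). The new initial guess uses `frac = idx / (n_cells - 1)`, so the linear ramp hits the Dirichlet values exactly at the first and last cell centres; and face-centred B-coefficients are harmonic means of the two adjacent cell diffusivities, which is why a zero in a boundary ghost cell would block the face entirely (the problem the removed extrapolation loop worked around):

```python
def linear_ramp(n_cells, vlo, vhi):
    # Cell-centre initial guess matching the diff:
    # frac = idx / (n_cells - 1), valid for n_cells > 1
    # (the solver aborts earlier for n_cells <= 1).
    return [vlo + (i / (n_cells - 1)) * (vhi - vlo) for i in range(n_cells)]

def face_harmonic(d_minus, d_plus):
    # Face-centred diffusivity as the harmonic mean of the two
    # adjacent cell values; a zero on either side blocks the face,
    # so boundary ghost cells must not default to zero.
    if d_minus == 0.0 or d_plus == 0.0:
        return 0.0
    return 2.0 * d_minus * d_plus / (d_minus + d_plus)

ramp = linear_ramp(5, 0.0, 1.0)
# ramp[0] == 0.0 and ramp[-1] == 1.0: the Dirichlet values land
# on the boundary cell centres, per the HYPRE convention.
```

The endpoint behaviour is the whole point of the change: with the `(idx + 0.5) / n_cells` fraction being removed, the guess only reached `vlo`/`vhi` half a cell outside the domain.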
2 changes: 1 addition & 1 deletion tutorials/01_hello_openimpala.ipynb
@@ -10,7 +10,7 @@
"execution_count": null,
"metadata": {},
"outputs": [],
"source": "# Install OpenImpala, PoreSpy for structure generation, and Matplotlib for plots.\n!pip install -q openimpala-cuda --find-links https://github.com/BASE-Laboratory/OpenImpala/releases/latest/download/ porespy matplotlib"
"source": "# Install OpenImpala, PoreSpy for structure generation, and Matplotlib for plots.\n!pip install -q openimpala-cuda --find-links https://github.com/BASE-Laboratory/OpenImpala/releases/expanded_assets/v4.0.6 porespy matplotlib"
},
{
"cell_type": "code",
2 changes: 1 addition & 1 deletion tutorials/02_digital_twin.ipynb
@@ -12,7 +12,7 @@
"outputs": [],
"source": [
"# Install OpenImpala, PyBaMM, and visualization utilities\n",
"!pip install -q openimpala-cuda --find-links https://github.com/BASE-Laboratory/OpenImpala/releases/latest/download/ pybamm bpx tifffile matplotlib yt"
"!pip install -q openimpala-cuda --find-links https://github.com/BASE-Laboratory/OpenImpala/releases/expanded_assets/v4.0.6 pybamm bpx tifffile matplotlib yt"
]
},
{
2 changes: 1 addition & 1 deletion tutorials/03_rev_and_uncertainty.ipynb
@@ -10,7 +10,7 @@
"execution_count": null,
"metadata": {},
"outputs": [],
"source": "# Install OpenImpala and utilities\n!pip install -q openimpala-cuda --find-links https://github.com/BASE-Laboratory/OpenImpala/releases/latest/download/ tifffile matplotlib scipy"
"source": "# Install OpenImpala and utilities\n!pip install -q openimpala-cuda --find-links https://github.com/BASE-Laboratory/OpenImpala/releases/expanded_assets/v4.0.6 tifffile matplotlib scipy"
},
{
"cell_type": "code",
2 changes: 1 addition & 1 deletion tutorials/04_multiphase_and_fields.ipynb
@@ -10,7 +10,7 @@
"execution_count": null,
"metadata": {},
"outputs": [],
"source": "# Install OpenImpala, PoreSpy, yt (for AMReX plotfile visualisation), and Matplotlib.\n!pip install -q openimpala-cuda --find-links https://github.com/BASE-Laboratory/OpenImpala/releases/latest/download/ porespy yt matplotlib"
"source": "# Install OpenImpala, PoreSpy, yt (for AMReX plotfile visualisation), and Matplotlib.\n!pip install -q openimpala-cuda --find-links https://github.com/BASE-Laboratory/OpenImpala/releases/expanded_assets/v4.0.6 porespy yt matplotlib"
},
{
"cell_type": "code",
2 changes: 1 addition & 1 deletion tutorials/05_surrogate_modelling.ipynb
@@ -12,7 +12,7 @@
"outputs": [],
"source": [
"# Install OpenImpala and ML libraries\n",
"!pip install -q openimpala-cuda --find-links https://github.com/BASE-Laboratory/OpenImpala/releases/latest/download/ porespy scikit-learn matplotlib"
"!pip install -q openimpala-cuda --find-links https://github.com/BASE-Laboratory/OpenImpala/releases/expanded_assets/v4.0.6 porespy scikit-learn matplotlib"
]
},
{
2 changes: 1 addition & 1 deletion tutorials/06_topology_optimisation.ipynb
@@ -10,7 +10,7 @@
"execution_count": null,
"metadata": {},
"outputs": [],
"source": "!pip install -q openimpala-cuda --find-links https://github.com/BASE-Laboratory/OpenImpala/releases/latest/download/ matplotlib"
"source": "!pip install -q openimpala-cuda --find-links https://github.com/BASE-Laboratory/OpenImpala/releases/expanded_assets/v4.0.6 matplotlib"
},
{
"cell_type": "code",
4 changes: 2 additions & 2 deletions tutorials/07_hpc_scaling.ipynb
@@ -242,7 +242,7 @@
"execution_count": null,
"metadata": {},
"outputs": [],
"source": "!pip install -q openimpala-cuda --find-links https://github.com/BASE-Laboratory/OpenImpala/releases/latest/download/ porespy matplotlib"
"source": "!pip install -q openimpala-cuda --find-links https://github.com/BASE-Laboratory/OpenImpala/releases/expanded_assets/v4.0.6 porespy matplotlib"
},
{
"cell_type": "code",
@@ -266,7 +266,7 @@
{
"cell_type": "markdown",
"metadata": {},
"source": "## Summary\n\n| Scenario | Approach |\n|----------|----------|\n| **Laptop / Colab** | `pip install openimpala-cuda --find-links https://github.com/BASE-Laboratory/OpenImpala/releases/latest/download/`, use NumPy arrays directly |\n| **Small cluster (1-16 cores)** | `mpirun -np 16 python script.py` with NumPy loading |\n| **Large cluster (16-128+ cores)** | `mpirun -np 128 python script.py` with `oi.read_image()` for parallel I/O |\n| **HPC without Python** | `mpirun -np 128 Diffusion ./inputs` (pure C++ application) |\n\n### Solver Quick Reference\n\n| Solver | Best For | Notes |\n|--------|----------|-------|\n| **FlexGMRES** | General use | Robust default, handles non-symmetric systems |\n| **PCG** | Symmetric problems | Fastest when applicable |\n| **SMG/PFMG** | Structured grids | Geometric multigrid, excellent on regular domains |\n| **BiCGSTAB** | Non-symmetric | Alternative to GMRES |\n\nThe API is the same at every scale \u2014 only the launch command changes. The scaling study and solver comparison in Sections 6-7 provide concrete data to guide your deployment decisions.\n\n**Back to the beginning:**\n- [Tutorial 1: Getting Started](01_hello_openimpala.ipynb) \u2014 Core workflow with synthetic microstructures.\n- [Tutorial 2: From 3D Image to Device Model](02_digital_twin.ipynb) \u2014 Load real CT scans and export to PyBaMM.\n\n---\n\n## References & Further Reading\n\n1. **OpenImpala:** S. Mayner, F. Ciucci, *OpenImpala: open-source computational framework for microstructural analysis of 3D tomography data*, SoftwareX (2024). [GitHub](https://github.com/BASE-Laboratory/OpenImpala)\n2. **AMReX:** W. Zhang et al., *AMReX: a framework for block-structured adaptive mesh refinement*, J. Open Source Software 4(37), 1370 (2019). [doi:10.21105/joss.01370](https://doi.org/10.21105/joss.01370)\n3. **AMReX scaling:** A. S. Almgren et al., *Block-structured adaptive mesh refinement \u2014 theory, implementation and application*, J. Comput. Physics 332, 1-28 (2017). [doi:10.1016/j.jcp.2016.12.073](https://doi.org/10.1016/j.jcp.2016.12.073)\n4. **HYPRE:** R. D. Falgout & U. M. Yang, *hypre: A library of high performance preconditioners*, Computational Science \u2014 ICCS 2002, LNCS 2331, pp. 632-641 (2002). [doi:10.1007/3-540-47789-6_66](https://doi.org/10.1007/3-540-47789-6_66)\n5. **Parallel HDF5:** The HDF Group, *HDF5 \u2014 Parallel I/O*, [hdfgroup.org](https://www.hdfgroup.org/solutions/hdf5/)\n6. **Apptainer/Singularity for HPC:** G. M. Kurtzer et al., *Singularity: Scientific containers for mobility of compute*, PLoS ONE 12(5), e0177459 (2017). [doi:10.1371/journal.pone.0177459](https://doi.org/10.1371/journal.pone.0177459)\n7. **MPI standard:** Message Passing Interface Forum, *MPI: A Message-Passing Interface Standard, Version 4.0* (2021). [mpi-forum.org](https://www.mpi-forum.org/docs/mpi-4.0/mpi40-report.pdf)"
"source": "## Summary\n\n| Scenario | Approach |\n|----------|----------|\n| **Laptop / Colab** | `pip install openimpala-cuda --find-links https://github.com/BASE-Laboratory/OpenImpala/releases/expanded_assets/v4.0.6`, use NumPy arrays directly |\n| **Small cluster (1-16 cores)** | `mpirun -np 16 python script.py` with NumPy loading |\n| **Large cluster (16-128+ cores)** | `mpirun -np 128 python script.py` with `oi.read_image()` for parallel I/O |\n| **HPC without Python** | `mpirun -np 128 Diffusion ./inputs` (pure C++ application) |\n\n### Solver Quick Reference\n\n| Solver | Best For | Notes |\n|--------|----------|-------|\n| **FlexGMRES** | General use | Robust default, handles non-symmetric systems |\n| **PCG** | Symmetric problems | Fastest when applicable |\n| **SMG/PFMG** | Structured grids | Geometric multigrid, excellent on regular domains |\n| **BiCGSTAB** | Non-symmetric | Alternative to GMRES |\n\nThe API is the same at every scale \u2014 only the launch command changes. The scaling study and solver comparison in Sections 6-7 provide concrete data to guide your deployment decisions.\n\n**Back to the beginning:**\n- [Tutorial 1: Getting Started](01_hello_openimpala.ipynb) \u2014 Core workflow with synthetic microstructures.\n- [Tutorial 2: From 3D Image to Device Model](02_digital_twin.ipynb) \u2014 Load real CT scans and export to PyBaMM.\n\n---\n\n## References & Further Reading\n\n1. **OpenImpala:** S. Mayner, F. Ciucci, *OpenImpala: open-source computational framework for microstructural analysis of 3D tomography data*, SoftwareX (2024). [GitHub](https://github.com/BASE-Laboratory/OpenImpala)\n2. **AMReX:** W. Zhang et al., *AMReX: a framework for block-structured adaptive mesh refinement*, J. Open Source Software 4(37), 1370 (2019). [doi:10.21105/joss.01370](https://doi.org/10.21105/joss.01370)\n3. **AMReX scaling:** A. S. Almgren et al., *Block-structured adaptive mesh refinement \u2014 theory, implementation and application*, J. Comput. Physics 332, 1-28 (2017). [doi:10.1016/j.jcp.2016.12.073](https://doi.org/10.1016/j.jcp.2016.12.073)\n4. **HYPRE:** R. D. Falgout & U. M. Yang, *hypre: A library of high performance preconditioners*, Computational Science \u2014 ICCS 2002, LNCS 2331, pp. 632-641 (2002). [doi:10.1007/3-540-47789-6_66](https://doi.org/10.1007/3-540-47789-6_66)\n5. **Parallel HDF5:** The HDF Group, *HDF5 \u2014 Parallel I/O*, [hdfgroup.org](https://www.hdfgroup.org/solutions/hdf5/)\n6. **Apptainer/Singularity for HPC:** G. M. Kurtzer et al., *Singularity: Scientific containers for mobility of compute*, PLoS ONE 12(5), e0177459 (2017). [doi:10.1371/journal.pone.0177459](https://doi.org/10.1371/journal.pone.0177459)\n7. **MPI standard:** Message Passing Interface Forum, *MPI: A Message-Passing Interface Standard, Version 4.0* (2021). [mpi-forum.org](https://www.mpi-forum.org/docs/mpi-4.0/mpi40-report.pdf)"
}
],
"metadata": {