
tr/trbpsd.f90 + bpsd/bpsd_plasmaf.f90 OOB cascade with eqdata.TST-2 (follow-up to #194) #203

@HengyuLi-Ozaki-lab


Summary

After fixing the bpsd kid OOB (separate bug), running tr_run() with the bundled eqdata.TST-2 fixture hits two cascading out-of-bounds writes inside tr/trbpsd.f90 and bpsd/bpsd_plasmaf.f90. Both are pre-existing latent bugs that were silently corrupting heap chunks under the old macOS build of libtrapi.so (the resulting heap corruption was the original SIGABRT signature reported in #194).

Filed as a follow-up to #194 — the SIGABRT in that issue is now confirmed eliminated on the macOS side, but these two OOBs remain.

Bug-B: tr/trbpsd.f90 temp(nr,...) and qp(nr-1) overflow

tr/trbpsd.f90 declares two local arrays whose sizes are derived from nrmp = NRMAX+1:

real(rkind) :: temp(nrmp, nsm, 3)        ! line ~166
real(rkind) :: qp(nrmp-1)                 ! line ~ similar

NRMAX defaults to 50 (set in tr/trinit.f90:375), so nrmp = 51: temp is dimensioned (51, nsm, 3) and qp has 50 elements. Then:

call bpsd_get_data(plasmaf, ierr)        ! line ~185
do ns = 1, plasmaf%nsmax
   do nr = 1, plasmaf%nrmax               ! line ~187: plasmaf%nrmax = 52 here
      temp(nr, ns, 1) = plasmaf%data(nr, ns)%density * 1.d-20
      ...
   end do
end do
do nr = 2, plasmaf%nrmax                  ! line ~192
   qp(nr-1) = 1.d0 / plasmaf%qinv(nr)
end do

With eqdata.TST-2 (file-baked nrmax=51), bpsd_get_data returns plasmaf%nrmax = 52. The loops then write temp(52, ...) past upper bound 51 and qp(51) past upper bound 50.

Reproduction (with OFLAGS = -g -O0 -fbounds-check -fcheck=all):

At line 187 of file ../task-kyoshimi/tr/trbpsd.f90
Fortran runtime error: Index '52' of dimension 1 of array 'temp' above upper bound of 51

This OOB used to corrupt heap chunks silently in release builds (no bounds-check), which is what macOS hardened-malloc was occasionally tripping on (see #194).

Aggravating factor: NRMAX is not parameterized

There is no clean workaround at the caller layer: NRMAX is not in TR's set_param registry (tr_set_param('NRMAX', 51.0) returns ierr=1), so an MCP / Python caller cannot pre-set NRMAX to match the equilibrium file's nrmax before invoking bpsd_get_data. The fix therefore has to land in TR itself, and the options are:

  1. Expose NRMAX via the param registry, OR
  2. Allocate temp/qp based on plasmaf%nrmax (dynamically) instead of compile-time nrmp, OR
  3. Clamp the inner loops with min(plasmaf%nrmax, nrmp).
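Option 3 amounts to bounding each copy loop by both extents. The sketch below is a hypothetical stand-alone module (copy_clamped, src and buf are illustrative names, not TR code), assuming a 1-D source of length nsrc standing in for plasmaf%nrmax and a destination buffer standing in for one column of temp.

```fortran
! Hypothetical sketch of option 3: clamp the loop to min(source extent, buffer extent).
! Names are illustrative stand-ins, not the TR/bpsd API.
module clamp_sketch
  implicit none
  integer, parameter :: dp = kind(1.d0)
contains
  subroutine copy_clamped(src, nsrc, buf, ncopied)
    integer, intent(in) :: nsrc               ! plays the role of plasmaf%nrmax
    real(dp), intent(in) :: src(nsrc)
    real(dp), intent(inout) :: buf(:)         ! plays the role of temp(:, ns, 1)
    integer, intent(out) :: ncopied
    integer :: nr

    ! Never iterate past the smaller of the two extents.
    ncopied = min(nsrc, size(buf))
    do nr = 1, ncopied
       buf(nr) = src(nr) * 1.d-20   ! same 1.d-20 scaling as the density copy in trbpsd.f90
    end do
  end subroutine copy_clamped
end module clamp_sketch
```

With a 51-element buf, call copy_clamped(src, 52, buf, n) copies 51 values and returns n = 51. The trade-off is that clamping silently drops the outermost radial point(s) of the equilibrium data, so it hides the grid mismatch rather than resolving it.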

Suggested fix

The cleanest fix is option 2 — switch temp and qp to allocatable arrays sized after bpsd_get_data returns:

real(rkind), allocatable :: temp(:,:,:), qp(:)
...
call bpsd_get_data(plasmaf, ierr)
allocate(temp(plasmaf%nrmax, plasmaf%nsmax, 3), qp(plasmaf%nrmax-1))
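A stand-alone version of this allocation step, assuming a simplified stand-in for plasmaf in which density(nr, ns) and qinv(nr) are plain arrays whose extents play the role of plasmaf%nrmax and plasmaf%nsmax. fill_buffers and its argument names are hypothetical; the real code reads plasmaf%data(nr, ns)%density.

```fortran
! Hypothetical sketch of option 2: size temp/qp from the data, not from nrmp.
module alloc_sketch
  implicit none
  integer, parameter :: dp = kind(1.d0)
contains
  subroutine fill_buffers(density, qinv, temp, qp)
    real(dp), intent(in) :: density(:, :)                 ! (nrmax, nsmax)
    real(dp), intent(in) :: qinv(:)                       ! (nrmax)
    real(dp), allocatable, intent(out) :: temp(:, :, :), qp(:)
    integer :: nrmax, nsmax, nr, ns

    nrmax = size(density, 1)
    nsmax = size(density, 2)
    ! INTENT(OUT) deallocates temp/qp on entry; allocate from the source extents.
    allocate(temp(nrmax, nsmax, 3), qp(nrmax - 1))
    temp = 0.d0   ! slots 2:3 are filled elsewhere in the real routine

    do ns = 1, nsmax
       do nr = 1, nrmax
          temp(nr, ns, 1) = density(nr, ns) * 1.d-20
       end do
    end do
    do nr = 2, nrmax
       qp(nr - 1) = 1.d0 / qinv(nr)
    end do
  end subroutine fill_buffers
end module alloc_sketch
```

With a 52-point input, temp comes back as (52, nsmax, 3) and qp with 51 elements, so both loops stay in bounds without touching NRMAX.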

But see bug-C below — that fix uncovers another OOB further into the chain.

Bug-C: bpsd/bpsd_plasmaf.f90:202 plasmaf_out%rho size-1 after INTENT(OUT) reset

When attempting bug-B's allocatable-buffer fix, the next OOB surfaces inside bpsd_get_plasmaf (in the bpsd repo, file bpsd_plasmaf.f90):

subroutine bpsd_get_plasmaf(plasmaf_out, ierr)
   type(bpsd_plasmaf_type), intent(out) :: plasmaf_out    ! INTENT(OUT) resets all fields
   ...
   plasmaf_out%rho(nr) = ...                              ! line ~202; rho has been reset to size 1

INTENT(OUT) on a derived-type dummy resets it on entry: allocatable components are deallocated and any default initialization is reapplied. In this build rho ends up with an upper bound of 1 instead of nrmax, and the writer subroutine then stores rho(nr) for nr up to nrmax without first reallocating.
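To make the INTENT(OUT) reset concrete, here is a minimal sketch using a hypothetical profile_t type (not bpsd_plasmaf_type). Under the Fortran 2003 and later rules, an allocated allocatable component of an INTENT(OUT) derived-type dummy is deallocated on entry, and default-initialized components are reset. Whether bpsd_plasmaf_type uses allocatable or pointer components has not been checked here.

```fortran
! Hypothetical demo type; shows what INTENT(OUT) does to the components on entry.
module intent_out_demo
  implicit none
  integer, parameter :: dp = kind(1.d0)
  type :: profile_t
     integer :: nrmax = 0                  ! default initialization, reapplied on entry
     real(dp), allocatable :: rho(:)
  end type profile_t
contains
  subroutine inspect(p, was_allocated)
    type(profile_t), intent(out) :: p
    logical, intent(out) :: was_allocated
    ! p%rho has already been deallocated before this statement runs.
    was_allocated = allocated(p%rho)
  end subroutine inspect
end module intent_out_demo
```

Calling inspect on a profile_t whose rho was allocated with 52 elements returns was_allocated = .false. and leaves nrmax at its default of 0, so any writer that stores into rho after the reset must allocate it first.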

Reproduction: with bug-B's allocatable-buffer patch applied, the same tr_run call now fails at:

At line 202 of file ../task-kyoshimi/bpsd/bpsd_plasmaf.f90
Fortran runtime error: Index '2' of dimension 1 of array 'rho' above upper bound of 1

Suggested fix

Inside bpsd_get_plasmaf, either:

  • Use intent(inout) (if the caller is expected to pre-allocate), OR
  • After intent(out) reset, explicitly allocate(plasmaf_out%rho(plasmaf_self%nrmax)) (and the other derived-type arrays) before writing.
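The second bullet can be sketched as follows, again with a hypothetical profile_t type (illustrative, not bpsd_plasmaf_type) and assuming the stored data is available as a plain array src_rho standing in for plasmaf_self's copy of rho. get_profile is an invented name.

```fortran
! Hypothetical sketch: keep INTENT(OUT), but allocate each array component
! from the source size before writing into it.
module writer_sketch
  implicit none
  integer, parameter :: dp = kind(1.d0)
  type :: profile_t
     integer :: nrmax = 0
     real(dp), allocatable :: rho(:)
  end type profile_t
contains
  subroutine get_profile(src_rho, prof_out, ierr)
    real(dp), intent(in) :: src_rho(:)        ! stands in for the stored plasmaf_self data
    type(profile_t), intent(out) :: prof_out  ! components reset on entry
    integer, intent(out) :: ierr
    integer :: nr

    prof_out%nrmax = size(src_rho)
    ! Re-establish the extent that INTENT(OUT) discarded.
    allocate(prof_out%rho(prof_out%nrmax), stat=ierr)
    if (ierr /= 0) return
    do nr = 1, prof_out%nrmax
       prof_out%rho(nr) = src_rho(nr)
    end do
  end subroutine get_profile
end module writer_sketch
```

Each array component of the real type would need the same treatment. As an alternative, the intrinsic assignment prof_out%rho = src_rho would also work, since Fortran 2003 reallocates an allocatable left-hand side to match the right-hand side (gfortran's default behavior, controlled by -frealloc-lhs).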

This bug is technically in the bpsd repo (k-yoshimi/bpsd), but since it is the second link of the cascade and only surfaced after a bug-B fix attempt, tracking it here next to bug-B keeps the dependency chain visible. Happy to split it into a separate bpsd issue if the maintainer prefers.

Why we're not patching locally

We attempted bug-B's allocatable-buffer fix locally on kyoshimi-develop. It works for the immediate OOB but uncovers bug-C, which requires a coordinated patch across bpsd/bpsd_plasmaf.f90 and the TR caller. To avoid drifting our local tree from upstream further than necessary, we've reverted the bug-B patch and instead marked the 3 affected downstream tests (in HengyuLi-Ozaki-lab/task-web-client) as pytest.mark.xfail(strict=False) with this issue as the removal marker.

Environment

  • gfortran 15.2.0 (macOS arm64; both the MacPorts and Homebrew builds confirmed)
  • task-kyoshimi @ c4e0defe (kyoshimi-develop branch, 12 commits ahead of kyoshimi/develop after rebase to f8f5c9e + 7a436d3 + 100 docs commits)
  • bpsd @ 975af0e (local patch for the kid bug from k-yoshimi/bpsd#6, otherwise master)
  • libtrapi.so built via scripts/setup.sh (no lib/macos_stubs.f90, uses -Wl,-undefined,dynamic_lookup)
