Summary
After fixing the bpsd kid OOB (separate bug), running tr_run() with the bundled eqdata.TST-2 fixture hits two cascading out-of-bounds writes inside tr/trbpsd.f90 and bpsd/bpsd_plasmaf.f90. Both are pre-existing latent bugs that were silently corrupting heap chunks under the old macOS build of libtrapi.so (the resulting heap corruption was the original SIGABRT signature reported in #194).
Filed as a follow-up to #194 — the SIGABRT in that issue is now confirmed eliminated on the macOS side, but these two OOBs remain.
Bug-B: tr/trbpsd.f90 temp(nr,...) and qp(nr-1) overflow
tr/trbpsd.f90 declares two local stack buffers sized from nrmp = NRMAX+1:
real(rkind) :: temp(nrmp, nsm, 3) ! line ~166
real(rkind) :: qp(nrmp-1) ! line ~ similar
NRMAX defaults to 50 (set in tr/trinit.f90:375), so nrmp = 51 and the buffers are sized (51, *, 3) / 50 respectively. Then:
call bpsd_get_data(plasmaf, ierr) ! line ~185
do ns = 1, plasmaf%nsmax
do nr = 1, plasmaf%nrmax ! line ~187 — but plasmaf%nrmax = 52
temp(nr, ns, 1) = plasmaf%data(nr, ns)%density * 1.d-20
...
do nr = 2, plasmaf%nrmax ! line ~192
qp(nr-1) = 1.d0 / plasmaf%qinv(nr)
With eqdata.TST-2 (file-baked nrmax=51), bpsd_get_data returns plasmaf%nrmax = 52. The loops then write temp(52, ...) past upper bound 51 and qp(51) past upper bound 50.
Reproduction (with OFLAGS = -g -O0 -fbounds-check -fcheck=all):
At line 187 of file ../task-kyoshimi/tr/trbpsd.f90
Fortran runtime error: Index '52' of dimension 1 of array 'temp' above upper bound of 51
This OOB used to corrupt heap chunks silently in release builds (no bounds-check), which is what macOS hardened-malloc was occasionally tripping on (see #194).
Aggravating factor: NRMAX is not parameterized
There's no clean workaround at the caller layer: NRMAX is not in TR's set_param registry (tr_set_param('NRMAX', 51.0) returns ierr=1). So an MCP / Python caller can't pre-set NRMAX to match the equilibrium file's nrmax before invoking bpsd_get_data. The only options are:
- Expose NRMAX via the param registry, OR
- Allocate temp/qp based on plasmaf%nrmax (dynamically) instead of compile-time nrmp, OR
- Clamp the inner loops with min(plasmaf%nrmax, nrmp).
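For comparison with the allocatable route, the clamping option could look roughly like the standalone sketch below. It is not taken from tr/trbpsd.f90: src_nrmax and qinv are hypothetical stand-ins for plasmaf%nrmax and plasmaf%qinv, and the sizes simply mirror the numbers quoted above (nrmp = 51, source nrmax = 52).

```fortran
program clamp_sketch
  implicit none
  integer, parameter :: rkind = kind(1.0d0)
  integer, parameter :: nrmp = 51        ! stand-in for NRMAX+1 in TR
  integer, parameter :: src_nrmax = 52   ! hypothetical stand-in for plasmaf%nrmax
  real(rkind) :: qinv(src_nrmax)         ! hypothetical stand-in for plasmaf%qinv
  real(rkind) :: qp(nrmp-1)              ! fixed-size buffer, as in the issue
  integer :: nr, nr_hi

  qinv = 0.5_rkind
  qp = 0.0_rkind

  ! Clamp the upper loop bound to what the fixed-size buffer can hold.
  nr_hi = min(src_nrmax, nrmp)
  if (nr_hi < src_nrmax) then
     print '(a,i0,a,i0)', 'warning: dropping radial points ', nr_hi + 1, ' .. ', src_nrmax
  end if

  do nr = 2, nr_hi
     qp(nr-1) = 1.0_rkind / qinv(nr)     ! highest index written is nr_hi-1 <= nrmp-1
  end do

  ! Self-check: every element of qp was written with 1/0.5 = 2.
  if (any(qp /= 2.0_rkind)) error stop 'unexpected qp value'
  print '(a,i0)', 'last qp index written: ', nr_hi - 1
end program clamp_sketch
```

The clamp keeps every write in bounds, but it silently discards the outermost radial point whenever the source grid is larger than nrmp, so it trades memory corruption for data truncation.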
Suggested fix
The cleanest fix is option 2 — switch temp and qp to allocatable arrays sized after bpsd_get_data returns:
real(rkind), allocatable :: temp(:,:,:), qp(:)
...
call bpsd_get_data(plasmaf, ierr)
allocate(temp(plasmaf%nrmax, plasmaf%nsmax, 3), qp(plasmaf%nrmax-1))
But see bug-C below — that fix uncovers another OOB further into the chain.
Bug-C: bpsd/bpsd_plasmaf.f90:202 plasmaf_out%rho size-1 after INTENT(OUT) reset
When attempting bug-B's allocatable-buffer fix, the next OOB surfaces inside bpsd_get_plasmaf (in the bpsd repo, file bpsd_plasmaf.f90):
subroutine bpsd_get_plasmaf(plasmaf_out, ierr)
type(bpsd_plasmaf_type), intent(out) :: plasmaf_out ! INTENT(OUT) resets all fields
...
plasmaf_out%rho(nr) = ... ! line ~202; rho has been reset to size 1
INTENT(OUT) on a derived-type dummy argument resets its components on entry, and allocatable components are deallocated at that point (the bounds check reports rho with an upper bound of 1 here; the exact residue may depend on the compiler). The writer subroutine then writes rho(nr) for nr up to nrmax without first reallocating.
Reproduction: with bug-B's allocatable-buffer patch applied, the same tr_run call now fails at:
At line 202 of file ../task-kyoshimi/bpsd/bpsd_plasmaf.f90
Fortran runtime error: Index '2' of dimension 1 of array 'rho' above upper bound of 1
Suggested fix
Inside bpsd_get_plasmaf, either:
- Use intent(inout) (if the caller is expected to pre-allocate), OR
- After intent(out) reset, explicitly allocate(plasmaf_out%rho(plasmaf_self%nrmax)) (and the other derived-type arrays) before writing.
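A minimal standalone sketch of the second option follows. It also shows that a caller-side pre-allocation does not survive the intent(out) entry. The type plasmaf_t, the routine get_plasmaf, and the src_nrmax argument are hypothetical stand-ins, not the actual bpsd_plasmaf_type or bpsd_get_plasmaf interface.

```fortran
module plasmaf_sketch
  implicit none
  integer, parameter :: rkind = kind(1.0d0)

  ! Hypothetical stand-in for bpsd_plasmaf_type; only the fields the sketch needs.
  type :: plasmaf_t
     integer :: nrmax = 0
     real(rkind), allocatable :: rho(:)
  end type plasmaf_t

contains

  subroutine get_plasmaf(src_nrmax, plasmaf_out, was_allocated)
    integer, intent(in) :: src_nrmax
    type(plasmaf_t), intent(out) :: plasmaf_out
    logical, intent(out) :: was_allocated
    integer :: nr

    ! On entry, intent(out) has already deallocated plasmaf_out%rho.
    was_allocated = allocated(plasmaf_out%rho)

    ! Allocate to the source size before writing any element.
    allocate(plasmaf_out%rho(src_nrmax))
    plasmaf_out%nrmax = src_nrmax
    do nr = 1, src_nrmax
       plasmaf_out%rho(nr) = real(nr - 1, rkind) / real(src_nrmax - 1, rkind)
    end do
  end subroutine get_plasmaf

end module plasmaf_sketch

program intent_out_sketch
  use plasmaf_sketch
  implicit none
  type(plasmaf_t) :: p
  logical :: was_allocated

  allocate(p%rho(10))            ! caller pre-allocates, as a TR-side caller might
  call get_plasmaf(52, p, was_allocated)

  ! Self-checks: the pre-allocation is gone on entry, and rho ends up source-sized.
  if (was_allocated) error stop 'rho survived intent(out)'
  if (size(p%rho) /= 52) error stop 'rho has the wrong size'
  print '(a,l1,a,i0)', 'allocated on entry: ', was_allocated, ', size after call: ', size(p%rho)
end program intent_out_sketch
```

Because the pre-allocation is discarded on entry, a caller can only supply the buffers if the dummy is switched to intent(inout). Otherwise the allocation has to happen inside the routine, as in this sketch.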
This bug is technically in the bpsd repo (k-yoshimi/bpsd), but it is the second link of the cascade and only appears after a bug-B fix attempt, so tracking it here next to bug-B keeps the dependency chain visible. Happy to split it into a separate bpsd issue if the maintainer prefers.
Why we're not patching locally
We attempted bug-B's allocatable-buffer fix locally on kyoshimi-develop. It works for the immediate OOB but uncovers bug-C, which requires a coordinated patch across bpsd/bpsd_plasmaf.f90 and the TR caller. To avoid drifting our local tree from upstream further than necessary, we've reverted the bug-B patch and instead marked the 3 affected downstream tests (in HengyuLi-Ozaki-lab/task-web-client) as pytest.mark.xfail(strict=False) with this issue as the removal marker.
Environment
- gfortran 15.2.0 (MacPorts / Homebrew, macOS arm64 — both confirmed)
- task-kyoshimi @ c4e0defe (kyoshimi-develop branch, ahead 12 of kyoshimi/develop after rebase to f8f5c9e + 7a436d3 + 100 docs commits)
- bpsd @ 975af0e (local patch for the kid bug from k-yoshimi/bpsd#6, otherwise master)
- libtrapi.so built via scripts/setup.sh (no lib/macos_stubs.f90, uses -Wl,-undefined,dynamic_lookup)
Related
- lib/macos_stubs.f90 + these latent OOBs
- kid bound bug, unblocks reaching bug-B/C