Skip to content

fix(trbpsd): size temp/qp to plasmaf%nrmax to avoid OOB (#203)#210

Open
HengyuLi-Ozaki-lab wants to merge 1 commit into
k-yoshimi:developfrom
HengyuLi-Ozaki-lab:fix/trbpsd-oob-203
Open

fix(trbpsd): size temp/qp to plasmaf%nrmax to avoid OOB (#203)#210
HengyuLi-Ozaki-lab wants to merge 1 commit into
k-yoshimi:developfrom
HengyuLi-Ozaki-lab:fix/trbpsd-oob-203

Conversation

@HengyuLi-Ozaki-lab
Copy link
Copy Markdown

Summary

Fixes the trbpsd.f90 half of the out-of-bounds cascade tracked in #203 (bug-B).

tr_bpsd_get declared its work arrays with the compile-time bound
nrmp = NRMAX + 1:

real(rkind) :: temp(nrmp,nsm,3)
real(rkind) :: tempx(nrmp,17)

For a file-baked equilibrium fixture such as eqdata.TST-2,
bpsd_get_data(plasmaf) returns plasmaf%nrmax = 52 while nrmp = 51
(default NRMAX = 50). The copy loop then writes temp(52,...) one row
past the end, and the do nr=2,plasmaf%nrmax loop writes qp(51) while
QP is allocated to size NRMAX (= 50). NRMAX is not exposed through
the TR set_param registry, so a caller cannot side-step this by shrinking
the mesh.

Under release builds these writes corrupted the heap silently; on macOS
hardened-malloc they surfaced as the non-deterministic
free_list_checksum_botch SIGABRT in #194.

Fix

  • Make temp / tempx ALLOCATABLE and size them to
    max(plasmaf%nrmax, nrmp) after bpsd_get_data.
  • Clamp the qp(nr-1) write loop to min(plasmaf%nrmax, nrmax+1) so it
    never exceeds the QP(NRMAX) allocation.
  • Deallocate on exit.

Evidence it is numerically correct (not just crash-suppressing)

A controlled rebuild compared a reference build with this fix + the
bpsd_get_plasmaf intent(out) fix (bug-C) and -fbounds-check -fcheck=all
ON, against a release build that lets the OOB go silent. tr_tst2 output
was compared pointwise across the full state:

  • Max relative error = 1.25e-11 — pure -O2/-O0 float reassociation,
    not the OOB.

The one-past-end temp(52) / qp(51) writes land in memory the physics
never reads back, and the two extra boundary points were never consumed
(mesh_convert_gtom uses only the inner mesh). With the fix, tr_tst2
runs to completion under bounds-check on macOS with the Newton solve
converging normally.

Related

Test plan

  • Build with -fbounds-check -fcheck=all.
  • Run the TST-2 transport baseline (tr_tst2) and confirm it completes
    with no Fortran runtime bounds error.
  • Spot-check output against a known-good release build (expect agreement
    to ~1e-11).

…-yoshimi#203 bug-B)

tr_bpsd_get declared temp(nrmp,nsm,3) and tempx(nrmp,17) with the
compile-time bound nrmp = NRMAX+1. For file-baked fixtures such as
eqdata.TST-2, bpsd_get_data(plasmaf) returns plasmaf%nrmax = 52 while
nrmp = 51, so the  do nr=1,plasmaf%nrmax  copy loop wrote temp(52,...)
one row past the end. Likewise the  do nr=2,plasmaf%nrmax  loop wrote
qp(51) while QP is allocated to size NRMAX (=50).

Fix: make temp/tempx ALLOCATABLE and size them to
max(plasmaf%nrmax, nrmp) after bpsd_get_data, and clamp the qp(nr-1)
write loop to min(plasmaf%nrmax, nrmax+1). These were two of the
upstream OOBs that silently corrupted the heap under release builds
and produced the macOS SIGABRT signature in k-yoshimi#194.

Verified: with this plus the bpsd_plasmaf intent(out) fix (bug-C),
tr_tst2 runs cleanly under -fbounds-check -fcheck=all on macOS, the
Newton solve converges, and the output matches the prior silent-OOB
release build to rel err 1.25e-11 (pure float reassociation).

Local patch on kyoshimi-develop; cherry-picked to a clean PR branch
for k-yoshimi#203.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant