Skip to content

ABI v4 lockstep: pointer lifetime hardening + zero-copy column rooting#25

Open
samtalki wants to merge 6 commits into
mainfrom
abi-v4-hardening
Open

ABI v4 lockstep: pointer lifetime hardening + zero-copy column rooting#25
samtalki wants to merge 6 commits into
mainfrom
abi-v4-hardening

Conversation

@samtalki

@samtalki samtalki commented Jun 12, 2026

Copy link
Copy Markdown
Member

The Julia half of eigenergy/powerio#116. Merge that PR first, then this one, and cut binaries from the same commit (tandem CI inactive, powerio#64).

Memory safety

  • Use after free. NetworkHandle frees the Rust network in a GC finalizer, but the transform paths extracted the raw pointer and used it past the handle's last Julia-visible use. Julia may collect an object after its last use, not at the end of the enclosing call, so any allocation between pointer extraction and a ccall could run the finalizer and hand Rust a freed pointer — reachable in ordinary code such as to_dense(parse_file(path)), where _dense_from_handle issues ~12 ccalls with allocations between them. Every helper now takes the NetworkHandle and runs its ccalls under GC.@preserve h; the raw pointer never travels alone. _cstr gains the same guard for its message buffers.
  • Zero-copy Arrow columns. to_arrow(...; copy=false) returned unsafe_wrap views whose buffers were freed with the ArrowTable, so a column extracted from the table dangled. Buffer ownership moves to an ArrowBuffers owner rooted by the table and by each column (ArrowColumn{T} <: AbstractVector{T}); an extracted column now keeps the producer alive. A GC stress test drops the table and reads the column. copy=true, the default, is unchanged.

ABI v4 lockstep (PIO_ABI_VERSION = 4)

  • from_json/to_json use the powerio-json format string through pio_parse_str/pio_to_format; to_matpower wraps to_format.
  • pio_normalize, pio_to_arrow, and pio_read_dir/pio_scenario_ids with the gridfm format string; gridfm read warnings now attach to the handle.
  • pio_convert_file argument order is (path, from, to).
  • Extractors pass capacities and verify the returned totals. to_dense renames reference_bus → ref_bus_index (a dense index, unlike the 1-based ids in branch.from/to) and n_components → n_islands.
  • New PowerIO.warnings(net), sized exactly via the byte-length query. Unexported: text, warnings = convert_file(...) is the documented destructuring idiom and would shadow it.

Project.toml 0.1.0 → 0.2.0 (the dense tuple field renames are breaking). All 180 tests pass against the library built from the paired branch.

🤖 Generated with Claude Code

Scoped out (v4 surface deliberately not bound here)

  • pio_ref_bus_indices — zero-vs-many reference disambiguation; to_dense documents that ref_bus_index == -1 covers both.
  • pio_convert_str — in-memory convert; parse_str + to_format covers the use today.
  • pio_write_dir — PyPSA CSV export (reads work via parse_file(dir; from="pypsa-csv") pass-through).

Use-after-free class (H1): NetworkHandle frees the Rust network in a GC
finalizer, but the transform paths extracted the raw pointer and kept using
it after the handle's last Julia-visible use — Julia frees an object after
its last use, not at end of call, so any allocation between the extraction
and the ccall could finalize the handle mid-flight (worst case to_dense:
~12 ccalls with a dozen allocations between them). Every helper now takes
the NetworkHandle and runs its ccalls under GC.@preserve h; the raw pointer
never travels alone. _cstr gains the same guard for its buffer (L1).

Zero-copy Arrow columns (M2): buffer ownership moves from ArrowTable into
an ArrowBuffers owner that the table AND each column root, with the release
in its finalizer — a column extracted from its table now keeps the producer
alive on its own instead of dangling. Columns are ArrowColumn{T} <:
AbstractVector{T}; copy=true is unchanged.

ABI v4 lockstep (PIO_ABI_VERSION = 4):
- from_json/to_json ride the powerio-json format string through
  pio_parse_str/pio_to_format (the snapshot, validated on read);
  to_matpower wraps to_format
- pio_normalize, pio_to_arrow, pio_read_dir + pio_scenario_ids with the
  gridfm format string (read_gridfm's warnings now attach to the handle)
- pio_convert_file argument order (path, from, to)
- extractors pass caps and check the returned totals; to_dense renames
  reference_bus -> ref_bus_index and n_components -> n_islands per the v4
  vocabulary (an index vs the 1-based ids in branch.from/to; islands is
  the domain word)
- new PowerIO.warnings(net): handle warnings sized exactly via the
  byte-length query (unexported:  is
  the documented idiom and would shadow it)

Project.toml 0.1.0 -> 0.2.0 (breaking: dense tuple field renames). All 180
tests pass against the v4 library, including a GC stress test that drops
the ArrowTable and reads the extracted column.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

@cameronkhanpour cameronkhanpour left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review of this branch against the normative ABI v4 contract from the paired eigenergy/powerio#112 (powerio-capi/include/powerio.h at v0.2.1-candidate, plus powerio/src/format/mod.rs for the format-token grammar).

Verified clean (no action needed):

  • All 19 ccall signatures match the v4 header exactly, including the pio_convert_file (path, from, to) reorder, the Csize_t cap/count extractor shapes, pio_ref_bus_index as Int64, and pio_is_radial as Cint/int32_t.
  • The GC.@preserve placement is correct everywhere it was added: every raw-pointer use sits inside its preserve (the multi-line total = GC.@preserve h ccall(...) forms parse as intended), _cstr's preserve is right, and ArrowColumn's read path was checked empirically — every AbstractArray fallback (iterate/collect/==/show/broadcast/sum) routes through the preserving getindex, and the per-element GC.@preserve benchmarks to zero cost with full SIMD vectorization on Julia 1.12.
  • The two-call pio_warnings protocol (probe with NULL/0, fill with len + 1) implements the header's byte-length contract correctly.
  • "powermodels-json" / "egret-json" (used by to_powermodels/from_powermodels and tests) are still accepted by target_format_from_name on the v4 branch, so the bridges keep working.
  • The zero-copy redesign's core ordering is sound: ArrowBuffers takes ownership before _decode_arrow, stays rooted across it, and double-release is covered by the C Data Interface release-NULLing contract.

14 findings are posted inline. One more has no diff line to attach to:

CI cannot see this PR's main risk (.github/workflows/CI.yml)

The workflow builds eigenergy/powerio at unpinned default-branch HEAD. Until powerio#112 merges, the built library reports ABI 3, library_available() returns false, and every native testset @test_skips — so CI on this PR is green having executed zero of the rewritten ccalls (a botched pio_convert_file arg order would merge clean). After upstream merges, the inverse bites main until this PR lands. The "180 tests pass" evidence is necessarily local-only. Until tandem CI (powerio#64) exists, consider: (a) pinning the powerio ref this PR is developed against (workflow input or repo variable), and (b) an opt-in strictness toggle (e.g. POWERIO_REQUIRE_NATIVE=1 set in this repo's CI) that turns the silent skip into a failure, so an ABI mismatch can't masquerade as a pass. The comment at CI.yml:57–58 also still names the deleted pio_export_arrow / pio_read_gridfm symbols.

Release-notes completeness

The PR body lists the dense tuple renames as the breaking change, but copy=false consumers face three more: columns are now ArrowColumn, not Vector (isa Vector dispatch, pointer(col)/ccall interop, and setindex! all stop working — collect(col) is the migration); finalize(table) is now a silent no-op (see inline comment); and error-message prefixes changed for from_json/to_matpower failures. Worth itemizing in the 0.2.0 notes.

Overall: the pointer-lifetime hardening and the column-rooting redesign are sound and the right shape; the items I'd resolve before merge are the artifact/CI lockstep gaps (build_tarballs/Artifacts.toml comment, this CI note) and the unchecked per-bus fills, since both undercut exactly the guarantees this PR exists to provide.

Comment thread gen/build_tarballs.jl Outdated
Comment thread src/PowerIO.jl Outdated
Comment thread src/gridfm.jl
Comment thread src/arrow.jl
Comment thread src/arrow.jl
Comment thread src/PowerIO.jl Outdated
Comment thread src/PowerIO.jl
Comment thread src/PowerIO.jl Outdated
Comment thread src/gridfm.jl Outdated
Comment thread test/runtests.jl Outdated
@samtalki

Copy link
Copy Markdown
Member Author

Review findings (xhigh effort, 9 finder angles, all candidates verified against the ABI v4 header and the powerio v4 source):

  1. Artifacts.toml:22PIO_ABI_VERSION bumps to 4 but Artifacts.toml still pins the powerio v0.1.0 tarballs (ABI 3). The load handshake refuses them, so the artifact path errors at first use for every user. Merge gate: regenerate Artifacts.toml from the v4 binary release before shipping 0.2.0.
  2. gen/build_tarballs.jl:20,28version = v"0.0.1" and GitSource pin db944afe (the v0.0.1 tag: ABI 3, no gridfm feature) contradict the --features arrow,gridfm line and the v4 NOTE this PR adds. The recipe fails as committed.
  3. CHANGELOG.md — no 0.2.0 entry despite breaking renames (reference_bus → ref_bus_index, n_components → n_islands), the new warnings accessor, and the changed zero-copy ownership.
  4. README.md:102–113 — the to_arrow paragraph still documents the old contract ("keep it alive while reading"); columns now root the buffers themselves.
  5. src/PowerIO.jl:588 — the section comment claims every extractor's returned total is asserted, but pio_bus_ids, pio_bus_demand, and pio_bus_shunt discard theirs; only branches and gens assert.
  6. src/arrow.jlArrowTable is now immutable: finalize(z) silently no-ops and no eager release path remains; buffers free only at GC (the copy=false decode error path too). Add close(t::ArrowTable).
  7. src/PowerIO.jl — error provenance changed: from_json failures now read PowerIO.parse_str:, to_matpower failures PowerIO.to_format:, _to_json failures append (snapshot).
  8. src/PowerIO.jl:394 — per-call warnings cap at _WARNLEN (4096 bytes) and truncate silently; the handle path sizes exactly. Detect or document.
  9. src/PowerIO.jl:306 — the parse_file docstring edited here omits the pandapower tokens (pandapower-json/pandapower/pp) the v4 surface accepts; README format list likewise.
  10. test/runtests.jl:312 — the GC stress test narrates "dropping the last reference releases the producer" but asserts nothing (@test true); the old double-finalize coverage is gone.

Verified clean: every ccall matches the v4 header arg for arg; feature gating (pio_read_dir/pio_scenario_ids behind gridfm, pio_to_arrow behind arrow) is correct; no stale pre-v4 symbol names remain in code; the ArrowColumn rooting graph is sound.

- CHANGELOG: 0.2.0 entry (ABI 4, dense renames, warnings accessor, column
  rooting, close).
- close(t::ArrowTable) releases the producer's buffers eagerly; the immutable
  table had dropped the finalize(t) path with no replacement. Test asserts the
  release callbacks NULL and that close is idempotent.
- _expect_total verifies every cap/count fill (bus ids, demand, and shunt were
  silently discarding theirs), matching the section comment's claim.
- from_json and to_json failures are labeled with their own entry points
  instead of parse_str/(snapshot).
- gen/build_tarballs.jl: pin the powerio#112 head (ABI 4) and v0.2.1 crate
  version; the old pin was the v0.0.1 tag, which predates the gridfm feature
  the script enables.
- Docs: zero-copy contract in the README reflects column rooting and close;
  pandapower format tokens listed; _WARNLEN truncation documented.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@samtalki

Copy link
Copy Markdown
Member Author

Addressed in 05bc08b:

  • 2 build_tarballs pinned to the powerio#112 head (4511e350, ABI 4), version v0.2.1; stale v4 NOTE dropped.
  • 3 CHANGELOG 0.2.0 entry (ABI 4 requirement, dense renames, warnings, column rooting, close).
  • 4 README zero-copy paragraph and example now state the column rooting contract.
  • 5 _expect_total verifies all five cap/count fills; the section comment's "asserted below" is now true.
  • 6 Base.close(t::ArrowTable) releases the producer eagerly; idempotent.
  • 7 from_json / to_json failures labeled with their own entry points.
  • 8 _WARNLEN silent truncation documented at the constant.
  • 9 pandapower tokens in the parse_file docstring and README format list.
  • 10 GC test asserts the release callbacks NULL after close and that double close is a no-op.

Open:

  • 1 Artifacts.toml still pins the v0.1.0 (ABI 3) tarballs — unresolvable until powerio#112 merges and the v0.2.1 binaries are cut; regenerate with julia gen/update_artifacts.jl v0.2.1 before releasing 0.2.0. Merge gate.

182/182 tests pass against the library built from powerio#112 (4511e350).

- _gridfm_scenario_ids asserts the fill count matches the probe: both calls
  re-read the filesystem, so a shrinking dataset left heap garbage in the tail
  of ids.
- ArrowColumn.getproperty errors instead of handing out the bare unsafe_wrap
  view (c.data did not root the buffers); collect(c) is the supported copy.
- _feature_call_error narrows the feature-missing catches to ErrorException
  (an ArgumentError from argument conversion rethrows untouched) and dedupes
  the three copies; _exports_symbol dedupes the availability probes;
  _warn_lines dedupes the warn split, returns owned Strings everywhere, and
  appends a truncation sentinel when a capped fill lands at the buffer edge.
- _format_from_handle takes warn=false (length 0 discards the channel) so
  to_json and to_matpower skip the 4 KiB warn work; to_matpower errors are
  attributed to to_matpower again.
- to_json treats a finalized (null) handle like a missing one and falls back
  to cached data; NetworkHandle captures the allocating library's
  pio_network_free at construction so a set_library! swap cannot cross
  allocators, and set_library! invalidates the ABI handshake memo.
- Docs: docs/binary.md says arrow,gridfm; CONTRIBUTING forbids registering
  with mismatched artifact ABI; parse_file documents pypsa-csv directories;
  source_format lists the v4 value set; to_dense documents the ref_bus_index
  -1 ambiguity and the unbound pio_ref_bus_indices; ArrowTable documents
  PowerIO.columns and the Tables.jl claim is scoped to copy=true.
- Tests: the tautological handle-warnings assert now checks the synthesized
  bus ids warning content; z.id.data throws; PowerIO.columns covered.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@samtalki

Copy link
Copy Markdown
Member Author

Review findings (iteration 3 — supersession sweep after powerio#112 closed in favor of powerio#116):

  1. gen/build_tarballs.jl:17-31 — comments name "powerio#112" and "the v0.2.1 release tag", and version = v"0.2.1"; #112 is closed, superseded by powerio#116 (v0.3.0-candidate, same head 4511e350, still ABI 4). The pin commit stays valid; the PR reference and release version are stale: the v4 binary ships as powerio v0.3.0.
  2. CHANGELOG.md:5 — "Targets powerio C ABI version 4 (powerio v0.2.1)" — same staleness; should read v0.3.0.
  3. PR description — "The Julia half of v0.3.0 candidate: memory safety hardening + C ABI v4 powerio#112. Merge that PR first" — should reference #116.
  4. Unchanged and still valid: the Artifacts.toml merge gate (regenerate from the v0.3.0 binaries once cut) and the pin itself.

CI is green (test 1.9/1, Documentation, benchmark); all 14 inline review threads are resolved.

- powerio#112 closed, superseded by #116 (v0.3.0-candidate, same head
  4511e350, still ABI 4): build_tarballs version and comments, CHANGELOG,
  and the PR description now name #116 / v0.3.0.
- _network_free_fn memoizes the dlsym'd pio_network_free per library path
  (was a dlopen/dlsym per handle) and resolves before `new`, so a failed
  lookup cannot strand a handle without a finalizer.
- convert_file docstring: pypsa-csv excluded from the shared token list
  (directories, not text).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@samtalki

Copy link
Copy Markdown
Member Author

Iteration 3 addressed in 2fb602c:

  • 1, 2 build_tarballs and CHANGELOG now name powerio#116 / v0.3.0 (the pin 4511e350 was already the #116 head, so only the labels moved).
  • 3 PR description points at powerio#116.
  • Also from this iteration's fresh pass over 0ea3929: pio_network_free is now memoized per library path and resolved before new, so a failed lookup cannot strand a finalizer-less handle, and convert_file's docstring excludes pypsa-csv from the shared token list.

Open (unchanged): Artifacts.toml regenerates from the powerio v0.3.0 binaries once #116 merges and the release is cut (julia gen/update_artifacts.jl v0.3.0) — merge gate.

185/185 tests pass against the library built from 4511e350.

…ruct

The docstring attached to `const _FREE_FN`, so Documenter's
[`NetworkHandle`](@ref) in docs/src/index.md no longer resolved and the
Documentation build failed. Move the memo above the docstring.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
samtalki added a commit that referenced this pull request Jun 12, 2026
* Pointer lifetime hardening on ABI 3: GC.@preserve sweep + zero-copy column rooting

The nonbreaking half of the ABI v4 lockstep branch (PR #25), portable to the
v0.2.x binaries:

- Every helper that lowers a NetworkHandle to a raw pointer runs its ccalls
  under GC.@preserve; the raw pointer never travels alone. Julia may collect a
  handle after its last use, not at end of call, so a GC between pointer
  extraction and a ccall could finalize the handle and hand Rust a freed
  pointer — reachable in ordinary code such as to_dense(parse_file(path)).
  _cstr gains the same guard for its message buffers.
- to_arrow(copy=false) columns are ArrowColumn views rooting a shared
  ArrowBuffers owner, so an extracted column keeps the producer alive on its
  own; close(t) releases eagerly. ArrowTable stays mutable so finalize(t) from
  0.1.0 code remains legal (now a no-op). GC stress test drops the table and
  reads the column.
- The handle finalizer captures pio_network_free from the allocating library
  (a set_library! swap no longer frees across allocators); set_library! resets
  the ABI handshake.
- read_gridfm_scenarios checks the scenario count between probe and fill;
  feature-gate errors rethrow non-toolchain exceptions; warning buffers grow
  to 4096 bytes with a truncation marker near the cap.

Project.toml 0.1.1. 183 tests pass against powerio v0.2.1 (ABI 3).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* Changelog: name the one behavioral change instead of claiming none

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
@samtalki

Copy link
Copy Markdown
Member Author

Review findings (iteration 4 — periodic sweep):

  1. gen/build_tarballs.jl:28 — powerio#116's head moved (4511e350b92d9abe, a merge of main's v0.2.1 release into the candidate; the powerio-capi header diff between the two is empty). The pin comment claims "the powerio#116 head", which is now one merge behind. Bump the pin to b92d9abe to keep the claim true; the bound surface is unchanged.

No other changes: PR head d2be84c is green (all four checks), zero unresolved threads, no new review comments. powerio v0.2.1 (published 19:44Z) is an ABI 3 patch from main and does not affect this PR; the v4 release remains v0.3.0 via powerio#116, still open.

b92d9abe merges main's v0.2.1 release into the candidate; the powerio-capi
header is untouched, so the bound surface is unchanged.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants