ABI v4 lockstep: pointer lifetime hardening + zero-copy column rooting#25
ABI v4 lockstep: pointer lifetime hardening + zero-copy column rooting#25samtalki wants to merge 6 commits into
Conversation
Use-after-free class (H1): NetworkHandle frees the Rust network in a GC finalizer, but the transform paths extracted the raw pointer and kept using it after the handle's last Julia-visible use — Julia frees an object after its last use, not at end of call, so any allocation between the extraction and the ccall could finalize the handle mid-flight (worst case to_dense: ~12 ccalls with a dozen allocations between them). Every helper now takes the NetworkHandle and runs its ccalls under GC.@preserve h; the raw pointer never travels alone. _cstr gains the same guard for its buffer (L1). Zero-copy Arrow columns (M2): buffer ownership moves from ArrowTable into an ArrowBuffers owner that the table AND each column root, with the release in its finalizer — a column extracted from its table now keeps the producer alive on its own instead of dangling. Columns are ArrowColumn{T} <: AbstractVector{T}; copy=true is unchanged. ABI v4 lockstep (PIO_ABI_VERSION = 4): - from_json/to_json ride the powerio-json format string through pio_parse_str/pio_to_format (the snapshot, validated on read); to_matpower wraps to_format - pio_normalize, pio_to_arrow, pio_read_dir + pio_scenario_ids with the gridfm format string (read_gridfm's warnings now attach to the handle) - pio_convert_file argument order (path, from, to) - extractors pass caps and check the returned totals; to_dense renames reference_bus -> ref_bus_index and n_components -> n_islands per the v4 vocabulary (an index vs the 1-based ids in branch.from/to; islands is the domain word) - new PowerIO.warnings(net): handle warnings sized exactly via the byte-length query (unexported: is the documented idiom and would shadow it) Project.toml 0.1.0 -> 0.2.0 (breaking: dense tuple field renames). All 180 tests pass against the v4 library, including a GC stress test that drops the ArrowTable and reads the extracted column. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
There was a problem hiding this comment.
Review of this branch against the normative ABI v4 contract from the paired eigenergy/powerio#112 (powerio-capi/include/powerio.h at v0.2.1-candidate, plus powerio/src/format/mod.rs for the format-token grammar).
Verified clean (no action needed):
- All 19
ccallsignatures match the v4 header exactly, including thepio_convert_file(path, from, to)reorder, theCsize_tcap/count extractor shapes,pio_ref_bus_indexasInt64, andpio_is_radialasCint/int32_t. - The
GC.@preserveplacement is correct everywhere it was added: every raw-pointer use sits inside its preserve (the multi-linetotal = GC.@preserve h ccall(...)forms parse as intended),_cstr's preserve is right, andArrowColumn's read path was checked empirically — everyAbstractArrayfallback (iterate/collect/==/show/broadcast/sum) routes through the preservinggetindex, and the per-elementGC.@preservebenchmarks to zero cost with full SIMD vectorization on Julia 1.12. - The two-call
pio_warningsprotocol (probe withNULL/0, fill withlen + 1) implements the header's byte-length contract correctly. "powermodels-json"/"egret-json"(used byto_powermodels/from_powermodelsand tests) are still accepted bytarget_format_from_nameon the v4 branch, so the bridges keep working.- The zero-copy redesign's core ordering is sound:
ArrowBufferstakes ownership before_decode_arrow, stays rooted across it, and double-release is covered by the C Data Interface release-NULLing contract.
14 findings are posted inline. One more has no diff line to attach to:
CI cannot see this PR's main risk (.github/workflows/CI.yml)
The workflow builds eigenergy/powerio at unpinned default-branch HEAD. Until powerio#112 merges, the built library reports ABI 3, library_available() returns false, and every native testset @test_skips — so CI on this PR is green having executed zero of the rewritten ccalls (a botched pio_convert_file arg order would merge clean). After upstream merges, the inverse bites main until this PR lands. The "180 tests pass" evidence is necessarily local-only. Until tandem CI (powerio#64) exists, consider: (a) pinning the powerio ref this PR is developed against (workflow input or repo variable), and (b) an opt-in strictness toggle (e.g. POWERIO_REQUIRE_NATIVE=1 set in this repo's CI) that turns the silent skip into a failure, so an ABI mismatch can't masquerade as a pass. The comment at CI.yml:57–58 also still names the deleted pio_export_arrow / pio_read_gridfm symbols.
Release-notes completeness
The PR body lists the dense tuple renames as the breaking change, but copy=false consumers face three more: columns are now ArrowColumn, not Vector (isa Vector dispatch, pointer(col)/ccall interop, and setindex! all stop working — collect(col) is the migration); finalize(table) is now a silent no-op (see inline comment); and error-message prefixes changed for from_json/to_matpower failures. Worth itemizing in the 0.2.0 notes.
Overall: the pointer-lifetime hardening and the column-rooting redesign are sound and the right shape; the items I'd resolve before merge are the artifact/CI lockstep gaps (build_tarballs/Artifacts.toml comment, this CI note) and the unchecked per-bus fills, since both undercut exactly the guarantees this PR exists to provide.
|
Review findings (xhigh effort, 9 finder angles, all candidates verified against the ABI v4 header and the powerio v4 source):
Verified clean: every ccall matches the v4 header arg for arg; feature gating ( |
- CHANGELOG: 0.2.0 entry (ABI 4, dense renames, warnings accessor, column rooting, close). - close(t::ArrowTable) releases the producer's buffers eagerly; the immutable table had dropped the finalize(t) path with no replacement. Test asserts the release callbacks NULL and that close is idempotent. - _expect_total verifies every cap/count fill (bus ids, demand, and shunt were silently discarding theirs), matching the section comment's claim. - from_json and to_json failures are labeled with their own entry points instead of parse_str/(snapshot). - gen/build_tarballs.jl: pin the powerio#112 head (ABI 4) and v0.2.1 crate version; the old pin was the v0.0.1 tag, which predates the gridfm feature the script enables. - Docs: zero-copy contract in the README reflects column rooting and close; pandapower format tokens listed; _WARNLEN truncation documented. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
|
Addressed in 05bc08b:
Open:
182/182 tests pass against the library built from powerio#112 ( |
- _gridfm_scenario_ids asserts the fill count matches the probe: both calls re-read the filesystem, so a shrinking dataset left heap garbage in the tail of ids. - ArrowColumn.getproperty errors instead of handing out the bare unsafe_wrap view (c.data did not root the buffers); collect(c) is the supported copy. - _feature_call_error narrows the feature-missing catches to ErrorException (an ArgumentError from argument conversion rethrows untouched) and dedupes the three copies; _exports_symbol dedupes the availability probes; _warn_lines dedupes the warn split, returns owned Strings everywhere, and appends a truncation sentinel when a capped fill lands at the buffer edge. - _format_from_handle takes warn=false (length 0 discards the channel) so to_json and to_matpower skip the 4 KiB warn work; to_matpower errors are attributed to to_matpower again. - to_json treats a finalized (null) handle like a missing one and falls back to cached data; NetworkHandle captures the allocating library's pio_network_free at construction so a set_library! swap cannot cross allocators, and set_library! invalidates the ABI handshake memo. - Docs: docs/binary.md says arrow,gridfm; CONTRIBUTING forbids registering with mismatched artifact ABI; parse_file documents pypsa-csv directories; source_format lists the v4 value set; to_dense documents the ref_bus_index -1 ambiguity and the unbound pio_ref_bus_indices; ArrowTable documents PowerIO.columns and the Tables.jl claim is scoped to copy=true. - Tests: the tautological handle-warnings assert now checks the synthesized bus ids warning content; z.id.data throws; PowerIO.columns covered. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
|
Review findings (iteration 3 — supersession sweep after powerio#112 closed in favor of powerio#116):
CI is green (test 1.9/1, Documentation, benchmark); all 14 inline review threads are resolved. |
- powerio#112 closed, superseded by #116 (v0.3.0-candidate, same head 4511e350, still ABI 4): build_tarballs version and comments, CHANGELOG, and the PR description now name #116 / v0.3.0. - _network_free_fn memoizes the dlsym'd pio_network_free per library path (was a dlopen/dlsym per handle) and resolves before `new`, so a failed lookup cannot strand a handle without a finalizer. - convert_file docstring: pypsa-csv excluded from the shared token list (directories, not text). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
|
Iteration 3 addressed in 2fb602c:
Open (unchanged): Artifacts.toml regenerates from the powerio v0.3.0 binaries once #116 merges and the release is cut ( 185/185 tests pass against the library built from |
…ruct The docstring attached to `const _FREE_FN`, so Documenter's [`NetworkHandle`](@ref) in docs/src/index.md no longer resolved and the Documentation build failed. Move the memo above the docstring. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
* Pointer lifetime hardening on ABI 3: GC.@preserve sweep + zero-copy column rooting The nonbreaking half of the ABI v4 lockstep branch (PR #25), portable to the v0.2.x binaries: - Every helper that lowers a NetworkHandle to a raw pointer runs its ccalls under GC.@preserve; the raw pointer never travels alone. Julia may collect a handle after its last use, not at end of call, so a GC between pointer extraction and a ccall could finalize the handle and hand Rust a freed pointer — reachable in ordinary code such as to_dense(parse_file(path)). _cstr gains the same guard for its message buffers. - to_arrow(copy=false) columns are ArrowColumn views rooting a shared ArrowBuffers owner, so an extracted column keeps the producer alive on its own; close(t) releases eagerly. ArrowTable stays mutable so finalize(t) from 0.1.0 code remains legal (now a no-op). GC stress test drops the table and reads the column. - The handle finalizer captures pio_network_free from the allocating library (a set_library! swap no longer frees across allocators); set_library! resets the ABI handshake. - read_gridfm_scenarios checks the scenario count between probe and fill; feature-gate errors rethrow non-toolchain exceptions; warning buffers grow to 4096 bytes with a truncation marker near the cap. Project.toml 0.1.1. 183 tests pass against powerio v0.2.1 (ABI 3). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * Changelog: name the one behavioral change instead of claiming none Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> --------- Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
|
Review findings (iteration 4 — periodic sweep):
No other changes: PR head |
b92d9abe merges main's v0.2.1 release into the candidate; the powerio-capi header is untouched, so the bound surface is unchanged. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The Julia half of eigenergy/powerio#116. Merge that PR first, then this one, and cut binaries from the same commit (tandem CI inactive, powerio#64).
Memory safety
NetworkHandlefrees the Rust network in a GC finalizer, but the transform paths extracted the raw pointer and used it past the handle's last Julia-visible use. Julia may collect an object after its last use, not at the end of the enclosing call, so any allocation between pointer extraction and a ccall could run the finalizer and hand Rust a freed pointer — reachable in ordinary code such asto_dense(parse_file(path)), where_dense_from_handleissues ~12 ccalls with allocations between them. Every helper now takes theNetworkHandleand runs its ccalls underGC.@preserve h; the raw pointer never travels alone._cstrgains the same guard for its message buffers.to_arrow(...; copy=false)returnedunsafe_wrapviews whose buffers were freed with theArrowTable, so a column extracted from the table dangled. Buffer ownership moves to anArrowBuffersowner rooted by the table and by each column (ArrowColumn{T} <: AbstractVector{T}); an extracted column now keeps the producer alive. A GC stress test drops the table and reads the column.copy=true, the default, is unchanged.ABI v4 lockstep (
PIO_ABI_VERSION = 4)from_json/to_jsonuse thepowerio-jsonformat string throughpio_parse_str/pio_to_format;to_matpowerwrapsto_format.pio_normalize,pio_to_arrow, andpio_read_dir/pio_scenario_idswith thegridfmformat string; gridfm read warnings now attach to the handle.pio_convert_fileargument order is(path, from, to).to_denserenamesreference_bus → ref_bus_index(a dense index, unlike the 1-based ids inbranch.from/to) andn_components → n_islands.PowerIO.warnings(net), sized exactly via the byte-length query. Unexported:text, warnings = convert_file(...)is the documented destructuring idiom and would shadow it.Project.toml 0.1.0 → 0.2.0 (the dense tuple field renames are breaking). All 180 tests pass against the library built from the paired branch.
🤖 Generated with Claude Code
Scoped out (v4 surface deliberately not bound here)
pio_ref_bus_indices— zero-vs-many reference disambiguation;to_densedocuments thatref_bus_index == -1covers both.pio_convert_str— in-memory convert;parse_str+to_formatcovers the use today.pio_write_dir— PyPSA CSV export (reads work viaparse_file(dir; from="pypsa-csv")pass-through).