v0.3.0 candidate: memory safety hardening + C ABI v4 #112
`gencost_row` reads NCOST from the file as an f64 truncated to usize, so a huge or non-finite value saturates near usize::MAX. The width requirement was then computed as `start + want` (with `want = 2*ncost` for piecewise costs), which overflows: an add-overflow panic under debug overflow-checks, and in release a wraparound that makes the `require` length check pass and then panics on the reversed `row[start..start + want]` slice range. A crafted MATPOWER `mpc.gencost` row (e.g. NCOST = 1e20) therefore panics on every build profile. Through the C ABI / Python / Julia the panic is caught at the FFI boundary and degraded to a generic "panic while parsing", but the pure Rust API and the CLI take an uncaught panic — a denial of service on untrusted input. It is not a memory-safety issue: the release wraparound lands on a bounds-checked slice, so it panics rather than reading out of bounds. Size the requirement with saturating arithmetic so an implausible NCOST is rejected by the existing length check as a loud `ShortRow` error, the parser's normal malformed-input signal, on every profile and through every binding. Found by malformed-input fuzzing of the parser surface. https://claude.ai/code/session_013KSDeKD9C3YsGaR67RDKhr
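A minimal sketch of the saturating width check described above. The names `cost_slice` and `ParseError` are hypothetical stand-ins, not the crate's actual items, and the row layout is simplified to "coefficients begin at `start`".

```rust
/// Hypothetical stand-in for the parser's malformed-row error.
#[derive(Debug, PartialEq)]
enum ParseError {
    ShortRow { need: usize, have: usize },
}

/// Returns the cost coefficients of a gencost-like row, or `ShortRow`.
/// `start` is the index of the first coefficient; `ncost_raw` is NCOST as read from the file.
fn cost_slice(
    row: &[f64],
    start: usize,
    ncost_raw: f64,
    piecewise: bool,
) -> Result<&[f64], ParseError> {
    // A float-to-int `as` cast saturates: huge or +inf values become usize::MAX,
    // NaN and negative values become 0.
    let ncost = ncost_raw as usize;
    // Piecewise costs need two values per point.
    let want = if piecewise { ncost.saturating_mul(2) } else { ncost };
    // Saturate instead of overflowing, so an implausible NCOST fails the length check.
    let need = start.saturating_add(want);
    if row.len() < need {
        return Err(ParseError::ShortRow { need, have: row.len() });
    }
    // need <= row.len() here, so `start + want` did not saturate and cannot overflow.
    Ok(&row[start..start + want])
}
```

With this shape, `cost_slice(&row, 4, 1e20, true)` returns `ShortRow` on both debug and release profiles instead of panicking.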
copy_to_buf clipped error/warning messages at a raw byte count, which could split a multi-byte UTF-8 codepoint and hand consumers an invalid UTF-8 string. Back the truncation point up to a character boundary so a clipped message is always valid UTF-8, and pin the behavior with a test. https://claude.ai/code/session_01KxR1fuH4L8XHHZXtNYgrG8
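The boundary back-off can be sketched as follows. This is an illustration only: `copy_truncated` is a hypothetical name, and the NUL terminator and return value are assumptions, not the actual `copy_to_buf` signature.

```rust
/// Copies as much of `msg` as fits into `buf`, leaving room for a NUL terminator,
/// and never splits a multi-byte codepoint. Returns bytes written, excluding the NUL.
fn copy_truncated(msg: &str, buf: &mut [u8]) -> usize {
    if buf.is_empty() {
        return 0;
    }
    // Raw byte budget: everything except the terminator.
    let mut end = msg.len().min(buf.len() - 1);
    // Back up until `end` sits on a character boundary (index 0 always is one).
    while !msg.is_char_boundary(end) {
        end -= 1;
    }
    buf[..end].copy_from_slice(&msg.as_bytes()[..end]);
    buf[end] = 0;
    end
}
```

For example, clipping `"héllo"` into a 3-byte buffer writes only `"h"`: the 2-byte budget would land in the middle of `é`, so the cut backs up by one byte.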
…to v0.2.1-candidate
…mat strings everywhere

PIO_ABI_VERSION 3 -> 4. One verb per job, one meaning per word, and no format names in symbols, so the surface evolves additively from here:

- pio_to_normalized -> pio_normalize (a value transform returns a handle; the to_ family re-encodes unchanged data, per the strtol/htons lineage)
- pio_to_matpower / pio_to_json / pio_from_json cut: matpower and the new validated powerio-json snapshot flow through pio_to_format/pio_parse_str as format strings (TargetFormat::PowerioJson; write_as is now fallible because JSON has no Inf/NaN and the snapshot must round-trip exactly)
- pio_export_arrow -> pio_to_arrow; the Arrow schema is the evolution valve
- pio_write_pypsa_csv_folder -> pio_write_dir(net, to, dir); pio_read_gridfm -> pio_read_dir(dir, from, scenario); pio_gridfm_scenario_ids -> pio_scenario_ids(dir, from, ...): directory formats are strings too
- pio_convert_str joins pio_convert_file (both now (input, from, to, ...))
- every array extractor takes a cap and returns the total count; NULL out is the count query, so a caller buffer can never silently overflow (pio_n_reference_buses folds into pio_ref_bus_indices)
- pio_parse_warnings -> pio_warnings: warnings attach to the handle from any constructor (pio_read_dir drops its warnbuf), and the return is the byte length needed, so callers can size exactly
- pio_reference_bus -> pio_ref_bus_index (i64): it returns a dense index while pio_branches from/to carry bus ids; the unit is now in the name
- pio_n_components -> pio_n_islands; pio_nodal_demand/pio_nodal_shunt -> pio_bus_demand/pio_bus_shunt: bus/node/branch vocabulary fixed in the header preamble (bus = connection point, node = conductor point at a bus, reserved for the multiconductor surface; branch = any two-terminal series element)

The header preamble now states the grammar, the conventions (errbuf per libpcap/curl, cap/count per snprintf, UTF-8 boundary truncation, handle immutability), and the freeze-and-evolve policy.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
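The cap/count contract from the commit above (write at most `cap` elements, always return the total, NULL `out` is a count query) can be sketched like this. `demo_indices` and its fixed data are hypothetical; the real extractors take a network handle and are not reproduced here.

```rust
use std::slice;

/// Hypothetical extractor following the cap/count contract.
///
/// # Safety
/// `out` must be NULL or point to at least `cap` writable `i64` values.
pub unsafe extern "C" fn demo_indices(out: *mut i64, cap: usize) -> usize {
    // Stand-in for data owned by a handle.
    const DATA: [i64; 3] = [0, 4, 7];
    if !out.is_null() {
        // Never write more than the caller's stated capacity.
        let n = DATA.len().min(cap);
        let dst = unsafe { slice::from_raw_parts_mut(out, n) };
        dst.copy_from_slice(&DATA[..n]);
    }
    // Always report the total, so the caller can detect a short read and resize.
    DATA.len()
}
```

A caller would typically query the count with `(NULL, 0)`, allocate, then call again; a buffer that is too small is filled up to `cap` and the return value reveals the shortfall.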
matpower, psse, and powerio-json through parse_str; the PowerWorld .pwb and .pwd binary decoders on raw bytes. The invariant is the parser trust model: Ok or a structured Err on any input, never a panic. Excluded from the workspace (needs nightly + cargo-fuzz); see fuzz/README.md. The gencost NCOST overflow was found by exactly this harness shape. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
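The invariant the fuzz targets enforce can be illustrated without cargo-fuzz. The sketch below is std-only and is not the harness in `fuzz/`: `parse_toy` is a hypothetical stand-in for a real reader, and `upholds_invariant` only shows what "Ok or a structured Err, never a panic" means for a single input.

```rust
use std::panic;

/// Hypothetical toy parser standing in for a real reader.
/// Every failure path returns Err; nothing indexes or unwraps.
fn parse_toy(input: &[u8]) -> Result<u32, String> {
    let text = std::str::from_utf8(input).map_err(|e| e.to_string())?;
    text.trim().parse::<u32>().map_err(|e| e.to_string())
}

/// Returns true if the parser returned (Ok or Err) rather than panicking on `input`.
fn upholds_invariant(input: &[u8]) -> bool {
    panic::catch_unwind(|| {
        // The result itself is irrelevant; only the absence of a panic matters.
        let _ = parse_toy(input);
    })
    .is_ok()
}
```

A fuzzer automates the choice of `input`; the check applied to each input is the same.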
… model and panic notes The capi README gains the ABI v4 history row, the cap/count contract, the parser trust model (malformed input errors, never UB; memory scales with input and is uncapped), and the panic strategy note (guards need the default unwind; an abort build aborts cleanly). smoke.c now exercises the v4 surface: count queries, powerio-json snapshot round trip, convert_str, write_dir, pio_warnings sizing. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ble write_as Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
- dataset format dispatch moves from powerio-capi into powerio-matrix's io hub (read_dataset_dir / dataset_scenario_ids), next to where the gridfm reader lives; the single-variant DatasetFormat enum at the C boundary is gone and the C ABI is a thin wrapper, like every other format dispatch
- the three identical catch_unwind tails of pio_to_format / pio_convert_file / pio_convert_str fold into finish_conversion, mirroring finish_network
- write_as: the PowerioJson early return becomes a match arm, dropping the unreachable!() (the snapshot still skips the warning passes deliberately: warn_normalized_tap would be false for a format that preserves the labels)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…C block comment early Found by compiling smoke.c against the regenerated header: the pio_read_dir doc's directory-glob example ended the comment mid-sentence and broke every build including powerio.h. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Lockstep Julia PR: eigenergy/PowerIO.jl#25. Merge this one first, then that, and cut binaries from the same commit.
…ecoder bench The validate job's Julia shim ccalled pio_nodal_demand/pio_nodal_shunt and the un-capped extractor signatures — the one consumer the docs sweep missed (it greps .jl now). parse.rs gains parse_pwd_activsg200: the one reader whose hot loop runs per byte, regression coverage for the total byte accessors. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
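A "total" byte accessor is one that is defined for every offset: it returns `Option` instead of indexing the buffer directly. A minimal sketch, assuming little-endian 32-bit fields (the actual `.pwd` field layout and helper names are not shown in this PR text, so `read_u32_le` is hypothetical):

```rust
/// Reads a little-endian u32 at `offset`, or `None` if the buffer is too short.
fn read_u32_le(buf: &[u8], offset: usize) -> Option<u32> {
    // checked_add guards against an offset near usize::MAX from a corrupt file.
    let end = offset.checked_add(4)?;
    // `get` returns None for an out-of-range span instead of panicking.
    let bytes: [u8; 4] = buf.get(offset..end)?.try_into().ok()?;
    Some(u32::from_le_bytes(bytes))
}
```

A record scan built on such helpers can use `?` to reject a truncated record as a whole, which is why the hot loop no longer depends on per-call-site bounds checks.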
Benchmark verification before merge (criterion A/B, branch vs main baseline, shared target dir, M-series local):
Note: the only statistically significant shifts are ≤1.3% on µs-scale benchmarks whose code this PR does not touch (matpower write/roundtrip, pwb), with equal-magnitude shifts in both directions across the suite: codegen layout jitter under lto=thin/codegen-units=1, not algorithmic change. The two readers this PR actually reworked measure flat (gencost saturating arithmetic) and faster (pwd total accessors, which now also cache decoded coordinates instead of re-reading). The flagged powermodels regression vanished on an immediate rerun against the same baseline. Continuous tracking so this stops being a manual ritual: #115.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
- to_json refuses non-finite values, naming the field: serde_json would degrade Inf/NaN to null, which the snapshot's own reader then rejects; the documented write-side error now actually fires
- sniff_json learns the snapshot's top level buses key, so a .json snapshot parses without a format hint; powerio-cli gains the powerio-json arm (aliases powerio/json)
- pio_network_free / pio_string_free run under the panic guard the boundary contract documents
- capi README calls out the silent pio_convert_file argument reorder, the one v4 break invisible at link time
- new powerworld_aux fuzz target: the .aux tokenizer was the one hand-written parse_str reader no harness fed
- README examples compile again (.network + ?); languages.md drops a stale "(PR open)" label

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The strict to_json guard broke the bindings' materialization path: readers legitimately produce Inf limits (the pandapower fixture carries an infinite pmax) and Python/Julia build every Network view through the snapshot, so refusing the write refused the parse. Keep the write total, surface the degradation as a write_as fidelity warning naming the field, and pin the no-read-back consequence in the test (the validating reader still rejects the null). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Merges two malformed-input fixes, generalizes both bug classes, and revises the C ABI to v4. The header preamble (`powerio-capi/include/powerio.h`) is the normative statement of the v4 conventions.

Memory safety
- A crafted gencost `NCOST` (e.g. `1e20`) overflowed the row-width arithmetic and panicked on every build profile. The arithmetic now saturates and the row is rejected as a parse error. (`claude/keen-feynman-vv3049`)
- Error and warning messages clipped to a caller buffer are now cut on a UTF-8 character boundary, so a clipped message is always valid UTF-8. (`claude/amazing-edison-4bjitk`)
- `.pwd` reader: the byte-read helpers indexed the buffer directly and relied on per-call-site bounds checks. They now return `Option`, so an out-of-range offset from a corrupt file rejects the record instead of panicking; the record scan also retains decoded coordinates rather than re-reading them. The differential oracle tests (decoded coordinates checked against same-vintage `.aux` files across the save corpus) pass unchanged.
- Fuzz harness (`fuzz/`, workspace-excluded; nightly + cargo-fuzz): matpower, psse, and powerio-json via `parse_str`; the `.pwb`/`.pwd` decoders on raw bytes. Invariant: any input yields `Ok` or a structured `Err`, never a panic. All five targets pass seeded smoke runs.
- `.pwb` cursor reads are bounds-checked; the psse/egret/pandapower numeric casts never feed indexing; every entry point already catches panics at the boundary.

C ABI v4 (`PIO_ABI_VERSION` 3 → 4)

v3 used three different verbs for serialization, named a handle-returning transform like the string serializers, and let extractors write past a miscounted buffer. All known consumers are this repo and PowerIO.jl, and `pio_abi_version` rejects mismatched libraries at load, so the break is cheap now. The conventions are designed so it is the last one.

| v3 | v4 |
| --- | --- |
| `pio_to_matpower`, `pio_to_json`, `pio_from_json` | `matpower` and the new `powerio-json` snapshot are format strings into `pio_to_format` / `pio_parse_str` |
| `pio_to_normalized` | `pio_normalize`: a value transform returning a handle; `to_` re-encodes unchanged data |
| `pio_export_arrow` | `pio_to_arrow` |
| `pio_write_pypsa_csv_folder` | `pio_write_dir(net, to, dir, ...)` |
| `pio_read_gridfm`, `pio_gridfm_scenario_ids` | `pio_read_dir(dir, from, scenario, ...)`, `pio_scenario_ids(dir, from, ...)` |
| `pio_parse_warnings` | `pio_warnings` |
| `pio_reference_bus` (isize), `pio_reference_buses`, `pio_n_reference_buses` | `pio_ref_bus_index` (i64), `pio_ref_bus_indices(net, out, cap)`: a dense index, not a bus id, and named so |
| `pio_n_components`, `pio_nodal_demand`, `pio_nodal_shunt` | `pio_n_islands`, `pio_bus_demand`, `pio_bus_shunt` |
| `pio_convert_file(path, to, from)` | `pio_convert_file(path, from, to)`; new `pio_convert_str(text, from, to)` |

No format names remain in the symbol table; adding a format leaves the ABI unchanged.
Conventions:

- Every array extractor takes a `cap`, so a miscounted buffer reads short instead of overflowing, and `(NULL, 0)` is a count query: the `snprintf` pattern. v3 wrote exactly `pio_n_*` elements on trust.
- `pio_warnings` returns the byte length of the joined text, so a buffer can be sized exactly; v3 returned a warning count, which cannot size a buffer. Warnings attach to the handle from any constructor; only functions returning no handle (`pio_to_format`, `pio_convert_*`, `pio_write_dir`) take a `warnbuf`.
- Errors are reported through `errbuf`/`errlen` (the libpcap/curl idiom): no library-allocated strings to free, no thread-local state.
- One meaning per word: the bus/node/branch vocabulary is fixed in the header preamble, hence `bus_demand` and `n_islands`.
- Evolution happens through the Arrow schema and the `powerio-json` snapshot, whose schemas evolve without touching a C signature.

Supporting changes: `TargetFormat::PowerioJson` makes the snapshot an ordinary format, reachable from the CLI and the converters; `write_as`/`to_format` become fallible because the snapshot rejects non-finite values rather than writing `null` (foreign JSON targets are unchanged); `powerio::write_dir` and `powerio_matrix::read_dataset_dir`/`dataset_scenario_ids` are the directory-format dispatch points; `examples/smoke.c` exercises the full v4 surface and is compiled and run in CI.

Verification
Workspace test suite, 26 capi unit tests, header parity, the compiled C smoke binary end to end, PowerIO.jl's 180 tests against this branch's library, the PowerModels/Exa oracle matrix, and the fuzz smoke runs. Benchmarks against a main baseline show no regressions; the two reworked readers measure flat (matpower) and faster (`.pwd`). Full table: benchmark comment. Continuous tracking: #115.

Numbering and pairing
No version number changes in this diff; the branch keeps its working name. Recommended release number: 0.3.0 — pre-1.0 convention puts breaking changes in the minor, and the ABI handshake remains the actual compatibility gate. Merge order: this PR, then eigenergy/PowerIO.jl#25, with binaries cut from the same commit (tandem CI inactive, #64). Follow-ups: #113 (dist surface adopts these conventions), #114 (PowerDiff field renames), #115 (benchmark tracking).
🤖 Generated with Claude Code