infinityabundance/kobold-bench

kobold-bench

Benchmark harness for the gnucobol-rs hot path — the cob_move DISPLAY ↔ COMP-3 conversions that dominate legacy-file ingestion — with a parity re-check after every run. The doctrine: performance work never alters sealed semantics. A throughput number is never reported without re-confirming the byte-exact result.

Baseline (correctness mode — no SIMD, no parallelism)

Measured by kobold-throughput on one developer machine (x86-64, single thread, --release, overflow-checks = true). Your numbers will differ; reproduce with cargo run --release --bin kobold-throughput -- 5000000.

| Conversion | Throughput | Bytes | Parity |
| --- | --- | --- | --- |
| DISPLAY S9(7)V99 → COMP-3 | ~95 M records/sec | ~860 MB/sec (source) | re-checked byte-exact |

For scale intuition: a 100-million-record nightly batch with this single field is ~1 second of CPU for the decimal conversion, single-threaded — before any parallel/SIMD feature. The kernel is allocation-free per conversion (fixed stack buffers), which is what makes it Lambda/Glue-friendly.
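The arithmetic behind the table and the scale estimate can be checked with a few lines. This is a sketch under one assumption that the README does not state outright: the DISPLAY field carries its sign as a trailing overpunch, so S9(7)V99 occupies exactly 9 source bytes. The function names are illustrative and not part of kobold-bench.

```rust
/// Bytes in a zoned DISPLAY field: one byte per digit.
/// Assumes the sign is overpunched on the last digit (no separate sign byte).
pub fn display_len(digits: u32) -> u32 {
    digits
}

/// Bytes in a COMP-3 field: two digits per byte plus one trailing sign nibble.
pub fn comp3_len(digits: u32) -> u32 {
    digits / 2 + 1
}

/// Source-side bandwidth implied by a records/sec figure.
pub fn source_mb_per_sec(records_per_sec: f64, digits: u32) -> f64 {
    records_per_sec * f64::from(display_len(digits)) / 1e6
}

/// Single-threaded CPU seconds for a batch at a given conversion rate.
pub fn batch_cpu_seconds(records: f64, records_per_sec: f64) -> f64 {
    records / records_per_sec
}
```

With 9 digits at ~95 M records/sec this gives 855 MB/sec of source bytes (rounded to ~860 in the table), a 5-byte COMP-3 target, and about 1.05 s for 100 million records.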

These are baseline numbers on purpose. The point is a provably correct primitive that is already fast; optional acceleration is gravy, gated, and re-proven.

Run it

cargo run --release --bin kobold-throughput -- 10000000   # quick throughput probe + parity re-check
cargo bench                                                # criterion micro-benchmarks (per-direction)

cargo bench runs benches/cob_move.rs (Criterion) for display_to_packed and packed_to_display with Throughput::Elements, reporting time/conversion and elements/sec.
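For readers who want the shape of a throughput probe without pulling in Criterion, here is a std-only sketch. It is not the kobold-throughput source: `time_loop`, `records_per_sec`, and `gated_report` are hypothetical names, and the conversion is passed in as a closure so the snippet stays self-contained. The point it illustrates is the gate: no rate is returned unless the parity flag is true.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Records per second for a measured wall-clock interval.
pub fn records_per_sec(records: u64, elapsed: Duration) -> f64 {
    records as f64 / elapsed.as_secs_f64()
}

/// Time `records` calls of `convert`.
/// `black_box` keeps the optimizer from deleting or hoisting the work.
pub fn time_loop<F: FnMut(u64) -> u64>(records: u64, mut convert: F) -> Duration {
    let start = Instant::now();
    for i in 0..records {
        black_box(convert(black_box(i)));
    }
    start.elapsed()
}

/// Return a throughput figure only when the parity re-check passed.
pub fn gated_report(
    records: u64,
    elapsed: Duration,
    parity_ok: bool,
) -> Result<f64, &'static str> {
    if !parity_ok {
        return Err("parity re-check failed: no number reported");
    }
    Ok(records_per_sec(records, elapsed))
}
```

A binary built on this would map the `Err` case to a non-zero exit code, which matches the behaviour the README describes for kobold-throughput.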

Methodology

  • Synthetic, reproducible batches — a deterministic LCG (gen_display_batch) so the input is identical across machines and runs; mixed signs (overpunch) to exercise the sign path.
  • Parity re-check is mandatory: parity_holds round-trips a sample (DISPLAY → COMP-3 → DISPLAY) and asserts the decoded value is unchanged. kobold-throughput exits non-zero if it fails. No number ships without this.
  • Honest accounting — single-threaded, one field; multi-field records and end-to-end shim decode (with copybook layout) are heavier. This harness measures the kernel conversion, the part that must be both correct and fast.
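The first two points can be sketched together: a seeded LCG that emits DISPLAY S9(7)V99 records with mixed overpunch signs, and a round-trip check over a sample. Everything here is an assumption made for illustration, not the harness code: the LCG constants, the ASCII overpunch table (`{`/`A`-`I` positive, `}`/`J`-`R` negative), the 0xC/0xD sign nibbles, and the names `gen_record`, `pack`, `unpack`, and `parity_sample_holds`. The real gen_display_batch and parity_holds may differ, and gnucobol-rs may use another sign convention.

```rust
const POS: &[u8; 10] = b"{ABCDEFGHI"; // overpunch for +0..+9 (assumed table)
const NEG: &[u8; 10] = b"}JKLMNOPQR"; // overpunch for -0..-9 (assumed table)

/// One LCG step; same seed gives the same stream on every machine.
fn lcg_next(state: &mut u64) -> u64 {
    *state = state
        .wrapping_mul(6364136223846793005)
        .wrapping_add(1442695040888963407);
    *state >> 33
}

/// One 9-byte DISPLAY record: 8 plain digits, then a signed overpunch digit.
fn gen_record(state: &mut u64) -> [u8; 9] {
    let mut rec = [b'0'; 9];
    for slot in rec.iter_mut().take(8) {
        *slot = b'0' + (lcg_next(state) % 10) as u8;
    }
    let last = (lcg_next(state) % 10) as usize;
    let negative = lcg_next(state) % 2 == 1;
    rec[8] = if negative { NEG[last] } else { POS[last] };
    rec
}

/// DISPLAY -> COMP-3 into a fixed 5-byte stack buffer (no heap allocation).
fn pack(src: &[u8; 9]) -> Option<[u8; 5]> {
    let (last, sign) = if let Some(i) = POS.iter().position(|&c| c == src[8]) {
        (i as u8, 0x0C)
    } else {
        (NEG.iter().position(|&c| c == src[8])? as u8, 0x0D)
    };
    let mut out = [0u8; 5];
    for i in 0..4 {
        let (hi, lo) = (src[2 * i], src[2 * i + 1]);
        if !hi.is_ascii_digit() || !lo.is_ascii_digit() {
            return None;
        }
        out[i] = ((hi - b'0') << 4) | (lo - b'0');
    }
    out[4] = (last << 4) | sign;
    Some(out)
}

/// COMP-3 -> DISPLAY, rejecting invalid digit or sign nibbles.
fn unpack(p: &[u8; 5]) -> Option<[u8; 9]> {
    let mut out = [0u8; 9];
    for i in 0..4 {
        let (hi, lo) = (p[i] >> 4, p[i] & 0x0F);
        if hi > 9 || lo > 9 {
            return None;
        }
        out[2 * i] = b'0' + hi;
        out[2 * i + 1] = b'0' + lo;
    }
    let last = (p[4] >> 4) as usize;
    if last > 9 {
        return None;
    }
    out[8] = match p[4] & 0x0F {
        0x0C => POS[last],
        0x0D => NEG[last],
        _ => return None,
    };
    Some(out)
}

/// Round-trip `n` generated records; true only if every one survives unchanged.
fn parity_sample_holds(seed: u64, n: usize) -> bool {
    let mut state = seed;
    (0..n).all(|_| {
        let rec = gen_record(&mut state);
        pack(&rec).and_then(|p| unpack(&p)) == Some(rec)
    })
}
```

This sketch compares the round-tripped bytes, which is stricter than comparing decoded values; it works here because the generator always writes the sign as an overpunch.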

Gated acceleration — the plan (not yet implemented, never default)

Performance features must be strictly optional, gated, and semantics-preserving. The intended shape (tracked, with each path re-running the full gnucobol-rs differential sweep + Kani suite in CI before it can be claimed):

  • parallel (Rayon). Batch-level par_iter over independent records — embarrassingly parallel, near-linear on the 8–64 vCPU instances common on AWS. No change to per-record bytes.
  • simd. Vectorized nibble pack/unpack and overpunch handling for the COMP-3 inner loop, in an isolated unsafe module with a scalar fallback and runtime CPU-feature detection. The default build stays #![forbid(unsafe_code)].

Each, if added, is labelled "accelerated (feature-enabled)" vs the "baseline (correctness mode)" here, reported in compat_profile, and listed as not part of the sealed courts.
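One way the labelling could be derived from Cargo features is sketched below. The struct and function names are hypothetical, and the real compat_profile format is not described in this README; the only things taken from the text are the two feature names and the two label strings.

```rust
/// Hypothetical build-mode summary; not the actual compat_profile layout.
#[derive(Debug, PartialEq)]
pub struct ModeProfile {
    pub parallel: bool,
    pub simd: bool,
    pub label: &'static str,
}

/// Derive the label from compile-time feature flags.
/// With no features enabled (the default build) this is the baseline.
pub fn mode_profile() -> ModeProfile {
    let parallel = cfg!(feature = "parallel");
    let simd = cfg!(feature = "simd");
    let label = if parallel || simd {
        "accelerated (feature-enabled)"
    } else {
        "baseline (correctness mode)"
    };
    ModeProfile { parallel, simd, label }
}
```

Because `cfg!` is resolved at compile time, a default build cannot report itself as accelerated, and an accelerated build cannot report itself as baseline.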

AWS cost framing

Lambda/Glue bill on duration. A correct-and-fast kernel shrinks the decimal-conversion share of a decimal-heavy batch, which cuts compute spend and shortens reconciliation windows in proportion to how much of the job that conversion represents. Pair these numbers with the kobold-data-shim parity receipts and the kobold-lambda-layer packaging for a full S3 → verified-records reference architecture.

License

Apache-2.0 (LICENSE). Links gnucobol-rs (LGPL-3.0-or-later) — see NOTICE for the binary-distribution obligations.

End-to-end scalar benchmark (KOBOLD.BENCH.2)

kobold-bench2 measures the full shim reconciliation pipeline (FILE.1 ingest → decode → LEVEL-88 → audit) over a synthetic happy corpus, in scalar mode (no Rayon, no SIMD, no fast mode). Timing is admitted only after the output/audit hash matches the pinned baseline — a benchmark must never alter or outrun sealed semantics. A mismatch aborts with PARITY FAIL and no timing.

cargo run --release --bin kobold-bench2 -- 50000   # records
# -> reports/BENCH-2-receipt.json (records/sec, µs/record, decode-only vs full split, host/profile)

It records the output sha256, the decode-only vs full-pipeline split (so ingest+audit overhead is visible), and host (cpu/arch/profile, rayon:false, simd:false). The hostile fixtures the courts fail-close on live in kobold-data-shim's KOBOLD.CORPUS.2. No production, AWS, parallel, or customer-workload throughput is claimed.
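The admission rule (hash first, timing second) can be sketched as a single function. This is illustrative only: FNV-1a stands in for sha256 so the snippet needs no external crate, and `Receipt` and `admit_timing` are hypothetical names rather than the kobold-bench2 receipt schema.

```rust
use std::time::Duration;

/// FNV-1a 64-bit: a std-only stand-in for sha256 in this sketch.
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in bytes {
        h ^= u64::from(b);
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    h
}

/// Hypothetical receipt: produced only after the hash check passes.
#[derive(Debug, PartialEq)]
pub struct Receipt {
    pub output_hash: u64,
    pub records_per_sec: f64,
    pub us_per_record: f64,
}

/// Compare the output hash with the pinned baseline before admitting timing.
pub fn admit_timing(
    output: &[u8],
    pinned: u64,
    records: u64,
    elapsed: Duration,
) -> Result<Receipt, String> {
    let got = fnv1a64(output);
    if got != pinned {
        // No timing fields are computed or returned on a mismatch.
        return Err(format!("PARITY FAIL: got {got:016x}, pinned {pinned:016x}"));
    }
    let secs = elapsed.as_secs_f64();
    Ok(Receipt {
        output_hash: got,
        records_per_sec: records as f64 / secs,
        us_per_record: secs * 1e6 / records as f64,
    })
}
```

Returning `Result` makes it impossible for a caller to read a rate from a run whose output did not match.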

Gated parallelism (KOBOLD.PERF.1)

kobold-bench2 --features rayon adds record-level Rayon — admitted only when its output hash is byte-identical to the scalar baseline over a fixed reference corpus (perf1 parity: rayon == scalar); a mismatch aborts with RAYON PARITY FAIL and no timing. The receipt reports both modes + the speedup + custody_us_per_record (POSTING.1/EXTRACT.PROFILE.1). Same evidence, faster — never weaker evidence, faster. No production/AWS/SIMD/parallel-throughput claim; parallelism changes no emitted artifact.
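Why record-level parallelism can leave the output byte-identical is easiest to see in a small model. The sketch below uses `std::thread::scope` as a stand-in for Rayon so it compiles without dependencies; `process_record` is a toy record-local transform, and all names are hypothetical. Each worker handles a contiguous run of records, and the partial outputs are concatenated in input order.

```rust
use std::thread;

const RECORD_LEN: usize = 9;

/// Toy record-local work: 16-bit byte sum, emitted big-endian.
fn process_record(rec: &[u8]) -> [u8; 2] {
    let sum = rec.iter().fold(0u16, |acc, &b| acc.wrapping_add(u16::from(b)));
    sum.to_be_bytes()
}

/// Scalar reference path: records processed in order, outputs concatenated.
pub fn run_scalar(corpus: &[u8]) -> Vec<u8> {
    corpus.chunks_exact(RECORD_LEN).flat_map(process_record).collect()
}

/// Threaded path: contiguous slices of whole records per worker,
/// joined in spawn order so the output order matches the scalar path.
pub fn run_threads(corpus: &[u8], workers: usize) -> Vec<u8> {
    let records = corpus.len() / RECORD_LEN;
    let w = workers.max(1);
    let per_worker = ((records + w - 1) / w).max(1);
    let parts: Vec<Vec<u8>> = thread::scope(|s| {
        let handles: Vec<_> = corpus[..records * RECORD_LEN]
            .chunks(per_worker * RECORD_LEN)
            .map(|part| s.spawn(move || run_scalar(part)))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("worker panicked"))
            .collect()
    });
    parts.concat()
}

/// Admit the parallel result only if it equals the scalar output.
pub fn admit_parallel(corpus: &[u8], workers: usize) -> Result<Vec<u8>, &'static str> {
    let scalar = run_scalar(corpus);
    if run_threads(corpus, workers) != scalar {
        return Err("RAYON PARITY FAIL");
    }
    Ok(scalar)
}
```

The sketch compares output bytes directly; comparing a digest of those bytes, as the README describes, is the same check in a cheaper-to-store form.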

Local scale measurement (KOBOLD.SCALE.1)

kobold-scale [100m|1g|5g|10g] generates a declared synthetic mixed fixed-record corpus to a temp file and streams it through the sealed reconcile pipeline in fixed reconcile-blocks, so memory stays bounded even at multi-GB (1 GB corpus ≈ 57 MB peak RSS here). Scalar and Rayon use the same block unit, so their output hashes are byte-identical by construction; Rayon timing is admitted only after that match plus a pinned per-size baseline (else SCALE PARITY FAIL). The receipt records wall time, throughput, peak RSS, temp disk, and the POSTING.1 hash chain.

cargo run --release --bin kobold-scale --features rayon -- 1g
# -> reports/SCALE-1-receipt-1g.json (admitted) + SCALE-1-baseline-1g.json (pinned)
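
The bounded-memory claim follows from reusing one block-sized buffer for the whole stream. A minimal sketch, assuming fixed-length records and using a running FNV-1a digest as a stand-in for the POSTING.1 hash chain (whose actual format is not described here); `stream_blocks` and `read_full` are hypothetical names:

```rust
use std::io::{self, Read};

const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;

/// Continue an FNV-1a 64-bit digest from `seed` over `bytes`.
fn fnv1a(seed: u64, bytes: &[u8]) -> u64 {
    let mut h = seed;
    for &b in bytes {
        h ^= u64::from(b);
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    h
}

/// Fill `buf` as far as the reader allows; returns the bytes read.
fn read_full<R: Read>(src: &mut R, buf: &mut [u8]) -> io::Result<usize> {
    let mut filled = 0;
    while filled < buf.len() {
        match src.read(&mut buf[filled..])? {
            0 => break,
            n => filled += n,
        }
    }
    Ok(filled)
}

/// Stream fixed-length records in blocks; memory use is one block buffer.
/// Returns (records seen, running digest).
pub fn stream_blocks<R: Read>(
    mut src: R,
    record_len: usize,
    records_per_block: usize,
) -> io::Result<(u64, u64)> {
    let mut buf = vec![0u8; record_len * records_per_block];
    let mut digest = FNV_OFFSET;
    let mut records = 0u64;
    loop {
        let n = read_full(&mut src, &mut buf)?;
        if n == 0 {
            break;
        }
        if n % record_len != 0 {
            return Err(io::Error::new(io::ErrorKind::InvalidData, "truncated record"));
        }
        digest = fnv1a(digest, &buf[..n]);
        records += (n / record_len) as u64;
        if n < buf.len() {
            break; // short block: end of stream
        }
    }
    Ok((records, digest))
}
```

In this sketch the digest is carried across blocks, so it equals the digest of the whole corpus regardless of block size; only the buffer, not the corpus, is ever resident.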

Caution

No production SLA · no AWS cost · no mainframe equivalence · no universal throughput · no customer-workload representativeness. A synthetic-corpus number on one host is exactly that.

Per-stage profiling (KOBOLD.PERF.2)

kobold-bench2 now reports a perf2_stage_profile — the reconcile pipeline's three stages (parse / per-record / aggregate) with the bottleneck named. On this host the per-record stage dominates (~75%, the part PERF.1's Rayon parallelizes byte-identically), aggregation ~25% (serial/ordered), parse ~0.5%. Profiling never changes the emitted bytes; parallelism only touches record-local work.
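
A stage profile of this kind reduces to three durations, their shares of the total, and the name of the largest. The sketch below shows that computation; the struct layout and method names are hypothetical and are not the perf2_stage_profile schema.

```rust
use std::time::Duration;

/// Hypothetical three-stage timing record.
pub struct StageProfile {
    pub parse: Duration,
    pub per_record: Duration,
    pub aggregate: Duration,
}

impl StageProfile {
    /// Each stage's fraction of the summed stage time (0.0 if nothing was timed).
    pub fn shares(&self) -> [(&'static str, f64); 3] {
        let total = (self.parse + self.per_record + self.aggregate).as_secs_f64();
        [
            ("parse", self.parse),
            ("per_record", self.per_record),
            ("aggregate", self.aggregate),
        ]
        .map(|(name, d)| {
            let share = if total > 0.0 { d.as_secs_f64() / total } else { 0.0 };
            (name, share)
        })
    }

    /// Name of the stage with the largest share.
    pub fn bottleneck(&self) -> &'static str {
        let shares = self.shares();
        let mut best = shares[0];
        for s in shares {
            if s.1 > best.1 {
                best = s;
            }
        }
        best.0
    }
}
```

Since the aggregate stage is serial, its share also bounds what record-level parallelism can gain: with roughly a quarter of the time in aggregation, speedup over the whole pipeline cannot exceed about 4x by Amdahl's law, however many workers are added.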
