chore(db): pause branching benchmark campaign and capture resume state #352

@elitan

summary

The Linux Postgres branching benchmark campaign is paused.
When we resume, rerun everything from the start. Do not resume from old artifacts.

Goal:

  • single-host linux
  • postgres in docker
  • branching ops fast (<5s p95)
  • runtime performance high
  • low storage overhead
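
The `<5s p95` goal above could be checked mechanically. The following is a minimal sketch, assuming each branching op's duration is available as one number of seconds per line; `p95_seconds` and `check_p95_gate` are hypothetical helper names and are not part of the existing harness. It uses the nearest-rank definition of p95, which may differ from whatever the real ops bench computes.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers, not taken from benchmarks/.

p95_seconds() {
  # Reads durations (seconds, one per line) on stdin.
  # Sorts them and prints the nearest-rank 95th percentile: rank = ceil(0.95 * n).
  sort -n | awk '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1              # no samples: signal failure
      rank = int((95 * NR + 99) / 100) # integer ceil of 0.95 * NR
      print v[rank]
    }'
}

check_p95_gate() {
  # $1: threshold in seconds; succeeds only if p95 is strictly below it.
  local threshold="$1" p95
  p95="$(p95_seconds)" || return 2
  awk -v p="$p95" -v t="$threshold" 'BEGIN { exit !(p < t) }'
}
```

For example, `printf '%s\n' 0.8 1.2 0.9 1.1 | check_p95_gate 5` succeeds, while a sample set whose p95 is 9 seconds fails the gate.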

cleanup done

Benchmark VM cleanup confirmed.
Current Hetzner servers:

  • frost-live
  • frost-demo-1771268678

No frost-pg-bench-* server is running.

what is implemented

Research harness + adapters + tracking docs are in benchmarks/:

  • run-branching-research-all.sh
  • run-branching-ops-bench.sh
  • run-branching-runtime-bench.sh
  • run-branching-scale-bench.sh
  • run-branching-soak-bench.sh
  • run-ext4-control-bench.sh
  • run-branching-hetzner-batch.sh
  • provision-linux-vm.sh
  • backends/*.sh
  • tracking docs/csv in benchmarks/branching-research-*.md|csv

important scripts (copy/paste)

benchmarks/run-branching-research-all.sh
benchmarks/run-branching-hetzner-batch.sh
benchmarks/provision-linux-vm.sh
benchmarks/run-branching-ops-bench.sh
benchmarks/run-ext4-control-bench.sh
benchmarks/run-branching-runtime-bench.sh
benchmarks/run-branching-scale-bench.sh
benchmarks/run-branching-soak-bench.sh
benchmarks/collect-metrics.sh
benchmarks/parse-results.sh
benchmarks/backends/lvm-thin-adapter-core.sh
benchmarks/backends/lvm-thin-ext4-adapter.sh
benchmarks/backends/lvm-thin-meta-adapter.sh
benchmarks/backends/zfs-clone-adapter.sh
benchmarks/backends/btrfs-subvolume-adapter.sh
benchmarks/backends/xfs-reflink-adapter.sh
benchmarks/backends/backend-contract.sh

# full campaign (from scratch)
./benchmarks/run-branching-research-all.sh
# phase-1 ops only
./benchmarks/run-branching-ops-bench.sh
# ext4 baseline only
./benchmarks/run-ext4-control-bench.sh
# runtime only
./benchmarks/run-branching-runtime-bench.sh
# scale only
./benchmarks/run-branching-scale-bench.sh
# soak only
./benchmarks/run-branching-soak-bench.sh

key fixes already made

  • fixed CSV parse crash in ops bench (run-branching-ops-bench.sh) by sanitizing multiline/comma error text
  • fixed lvm prepare issues (backends/lvm-thin-adapter-core.sh):
    • enforce min LV size for pgbench init
    • save state earlier so cleanup is reliable
  • fixed runtime runner to skip failed backends instead of crashing entire batch (run-branching-runtime-bench.sh)
  • added phase-1 result override in orchestrator (exists, but do not use for next campaign)
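
To illustrate the CSV fix mentioned above, here is a minimal sketch of sanitizing free-form error text before it is written into a CSV field. This is not the code in `run-branching-ops-bench.sh`; `sanitize_csv_field` is a hypothetical name, and replacing commas with semicolons is just one possible policy (quoting the field would be another).

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helper, not the actual fix in the ops bench.

sanitize_csv_field() {
  # $1: raw text, possibly multiline and containing commas.
  # Newlines, carriage returns and tabs become spaces, commas become
  # semicolons, and runs of spaces are squeezed to a single space.
  printf '%s' "$1" | tr '\r\n\t' '   ' | tr ',' ';' | tr -s ' '
}
```

A row could then be emitted with something like `printf '%s,%s,%s\n' "$backend" "$status" "$(sanitize_csv_field "$err")"`, where the three variables are placeholders.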

previous run outputs (reference only)

These are historical signals only. Do not use them in the final decision package for the next campaign.

Reference dirs:

  • benchmarks/results/20260227-085716-branching-research-ops/
  • benchmarks/results/20260227-093442-ext4-control/
  • benchmarks/results/20260227-120250-branching-research-all/

current blocker to fix before rerun

run-branching-research-all.sh can hang in phase-1 wrapper after remote completion.

Symptom:

  • remote ops-gates.csv exists
  • local process still stuck on SSH command in run-branching-hetzner-batch.sh

Likely area:

  • SSH lifecycle / remote command return handling in run-branching-hetzner-batch.sh

restart plan (from scratch)

  1. Fix SSH hang behavior in run-branching-hetzner-batch.sh:
    • timeout around remote execution
    • completion marker check
    • forced teardown path
  2. Start a new full run from phase-1, no overrides:
    • ./benchmarks/run-branching-research-all.sh
  3. Let full campaign complete in order:
    • phase-1 ops
    • phase-2 ext4 control
    • phase-2 runtime
    • phase-2 scale
    • phase-3 ext4 control
    • phase-3 runtime
    • phase-3 scale
    • phase-4 soak (24h)
    • decision package
  4. Use only the fresh run artifacts for final ranking/report.
  5. Verify benchmark VM auto-delete after each batch and at end.
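
Step 1 of the plan (timeout, completion marker check, forced teardown) could be structured roughly as follows. This is a sketch under assumptions, not the contents of `run-branching-hetzner-batch.sh`: `run_remote_phase` is a hypothetical name, the marker check and teardown are passed in as caller-supplied commands, and GNU coreutils `timeout` is assumed to be available on the local Linux host.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical wrapper around a remote command such as ssh.

run_remote_phase() {
  # $1: time limit in seconds
  # $2: command/function that returns 0 if the completion marker exists
  # $3: command/function that tears down the benchmark VM
  # rest: the command to run (e.g. ssh ... "remote script")
  local limit="$1" check_done="$2" teardown="$3" rc=0
  shift 3

  # GNU timeout sends TERM at the limit and KILL 10s later if needed.
  # It exits 124 when the limit was hit (137 if KILL was required).
  timeout -k 10 "$limit" "$@" || rc=$?

  # If the wrapped command failed or timed out but the marker is present,
  # the remote work finished and only the local side was stuck.
  if [ "$rc" -ne 0 ] && "$check_done"; then
    echo "command exited $rc but completion marker found; treating phase as done" >&2
    rc=0
  fi

  # Teardown runs on every path so no VM is left behind.
  "$teardown" || echo "teardown reported an error" >&2
  return "$rc"
}
```

A call might look like `run_remote_phase 7200 marker_present delete_vm ssh -o BatchMode=yes -o ServerAliveInterval=15 "$HOST" ./benchmarks/run-branching-ops-bench.sh`, where `marker_present`, `delete_vm` and `$HOST` are placeholders the caller would define (for instance, `marker_present` could test for the remote `ops-gates.csv` over a separate short SSH call). One possible cause worth checking, unconfirmed here, is a remote background process holding the SSH session's stdout or stderr open after the main script exits.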

done when

  • full campaign completes from phase-1 without artifact reuse
  • final report generated in new pipeline dir
  • no leftover benchmark VM in Hetzner
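
The last criterion could be verified with a small check like the one below. It is a sketch: `leftover_bench_vms` and `assert_no_bench_vms` are hypothetical names, and the functions only filter a list of server names supplied on stdin, matching the `frost-pg-bench-*` pattern used earlier in this issue.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers for the "no leftover VM" check.

leftover_bench_vms() {
  # Reads server names (one per line) on stdin and prints benchmark VMs.
  # "|| true" keeps the exit status 0 when nothing matches.
  grep -E '^frost-pg-bench-' || true
}

assert_no_bench_vms() {
  # Fails (and lists the offenders on stderr) if any benchmark VM remains.
  local leftover
  leftover="$(leftover_bench_vms)"
  if [ -n "$leftover" ]; then
    printf 'leftover benchmark VMs:\n%s\n' "$leftover" >&2
    return 1
  fi
}
```

Assuming the `hcloud` CLI is installed and configured, the names could be fed in with something like `hcloud server list -o noheader -o columns=name | assert_no_bench_vms`.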
