Wire graph_bench.rs to CI with regression gate against bench/baselines.json #462
Merged
…se… [ORB-00321] Wire graph_bench.rs to CI with regression gate against bench/baselines.json Planned-By: codex
Task
ORB-00321 — Wire graph_bench.rs to CI with regression gate against bench/baselines.json
Description
Problem
GRAPH_SPEC.md §12 prescribes a CI-enforced perf gate: each run is compared against the committed bench/baselines.json, and a regression fires when any row is more than 20 percent slower than its baseline. The baseline was captured in ORB-00296 (P0.3), and the bump workflow was documented (label bench-baseline-bump). This task wires the existing graph_bench.rs harness to CI, emits results as an artifact, diffs them against bench/baselines.json, and blocks merges on any regression of more than 20 percent.
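The gate described above reduces to two steps: collapse repeated runs into a per-row median, then flag gated rows whose median exceeds the baseline by more than 20 percent. The sketch below illustrates that logic in Python under stated assumptions. The row names, the `median_ms` and `gate` fields, and the helpers `median_rows` and `find_regressions` are hypothetical, because this PR does not show the JSON schema of bench/baselines.json or target/bench/results.json; the real check is the shell script bench/check_baseline.sh.

```python
import statistics

# GRAPH_SPEC §12 threshold: a gated row fails when it is more than 20 percent slower.
MAX_SLOWDOWN = 0.20


def median_rows(runs):
    """Collapse repeated runs into one median timing per row.

    `runs` is a list of dicts mapping a row name to a timing in ms
    (hypothetical shape; the real results.json schema is not shown).
    """
    names = runs[0].keys()
    return {name: statistics.median(run[name] for run in runs) for name in names}


def find_regressions(results, baseline):
    """Return the sorted names of gated rows that regressed past MAX_SLOWDOWN.

    `baseline` maps a row name to {"median_ms": float, "gate": bool}
    (hypothetical schema). Rows with gate False, such as the v1
    orbit-knowledge rows, are informational and never fail the gate.
    """
    failures = []
    for name, row in baseline.items():
        # Assumption of this sketch: a row without a gate flag is gated.
        if not row.get("gate", True):
            continue
        current = results.get(name)
        if current is None:
            # Assumption of this sketch: a gated row with no result is a failure.
            failures.append(name)
        elif current > row["median_ms"] * (1 + MAX_SLOWDOWN):
            failures.append(name)
    return sorted(failures)


# Hypothetical usage: three runs, matching the three-run median described in the PR.
runs = [
    {"v2/cold_full_build": 100.0},
    {"v2/cold_full_build": 130.0},
    {"v2/cold_full_build": 110.0},
]
baseline = {"v2/cold_full_build": {"median_ms": 100.0, "gate": True}}
print(find_regressions(median_rows(runs), baseline))  # [] because 110 ms is within 120 ms
```

The strict `>` comparison means a row exactly 20 percent slower still passes, which matches the "more than 20 percent slower" wording; taking the median of several runs keeps a single noisy run from tripping the gate on its own.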
Why It Matters
Without this gate, the baseline file is decorative. The spec design choice (candidate ADR-6 in 4_decisions.md) was specifically to avoid 'ratchet up to whatever the last commit happened to measure'; that only works if the gate runs on every PR. This is the last piece of the perf contract — capture done in P0.3, gate landed here.
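A small worked example shows why comparing against the previous commit is weaker than comparing against a fixed baseline. The timings are made up for illustration, and `ratchet_accepts` and `fixed_accepts` are hypothetical helpers, not part of this PR. Each of three commits slows a row by under 20 percent relative to its predecessor, so a ratchet accepts all of them, while the fixed baseline rejects the drift once the row passes 120 ms.

```python
# Same threshold as GRAPH_SPEC §12: more than 20 percent slower fails.
MAX_SLOWDOWN = 0.20


def ratchet_accepts(timings):
    """Ratchet gate (hypothetical): each commit is compared to the one before it."""
    return all(
        curr <= prev * (1 + MAX_SLOWDOWN)
        for prev, curr in zip(timings, timings[1:])
    )


def fixed_accepts(baseline, timings):
    """Fixed gate (hypothetical): every commit is compared to the committed baseline."""
    return all(t <= baseline * (1 + MAX_SLOWDOWN) for t in timings)


# Made-up timings in ms: a 100 ms baseline, then three commits.
history = [100.0, 115.0, 132.0, 151.0]
print(ratchet_accepts(history))                # True: every step is under 20 percent
print(fixed_accepts(history[0], history[1:]))  # False: 132 ms exceeds the 120 ms limit
```

Here the ratchet lets the row drift to 151 ms, 51 percent slower than the original 100 ms, without ever tripping.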
Constraints / Notes
Plan ID: P6.2. Depends on ORB-00296 (P0.3 — baseline captured) and ORB-00318 (P5.1 — CLI surface to invoke v2 numbers). Runs in parallel with P6.1.
Acceptance Criteria
Execution Summary
Outcome: success
Changes:
- `.github/workflows/ci-bench.yml` with a `ci-bench` PR job on `ubuntu-24.04`, native PR path filters, runner core/memory verification, pinned actions, artifact upload, and `bench-baseline-bump` PR-body justification validation.
- `bench/run_graph_bench_ci.sh` to build release bench binaries, run `graph_bench.rs` three times via the `graph_build` example, measure v2 `orbit-graph-cli` rows three times, and write median rows to `target/bench/results.json`.
- `bench/check_baseline.sh` to compare gated result rows against `bench/baselines.json` and fail rows more than 20 percent slower than baseline.
- `bench/README.md` with the CI gate behavior, artifact location, v1/v2 gate decision, and exact baseline-bump justification line.

Strategic decisions:
`orbit-graph` rows (`gate: true`) against the committed baseline and keep v1 `orbit-knowledge` rows (`gate: false`) informational. | Rationale: GRAPH_SPEC §12 budgets target v2; v1 values remain useful transition context but should not decide the gate.

Assessment: The implementation is scoped to CI/bench wiring and avoids new crate dependencies. Full `make ci` was not run locally because repo instructions reserve it for the PR merge gate; `make ci-fast` passed.

Validation:
- `bash -n bench/run_graph_bench_ci.sh bench/check_baseline.sh` passed.
- `.github/workflows/ci-bench.yml` passed.
- `bench/check_baseline.sh` pass/fail cases passed, including skipping `gate: false` v1 rows.
- `GRAPH_BENCH_RUNS=1 ./bench/run_graph_bench_ci.sh` wrote `target/bench/results.json`; local macOS baseline comparison flagged v2 `cold_full_build`, so the real gate remains pinned to the verified Ubuntu runner profile.
- `make ci-fast` passed.

Validation
Branch Freshness
`origin/agent-main`, `orbit/ORB-00321-6a13de74`