Wire graph_bench.rs to CI with regression gate against bench/baselines.json by danieljhkim · Pull Request #462 · danieljhkim/orbit

danieljhkim · 2026-05-25T05:44:21Z

Task

ORB-00321 — Wire graph_bench.rs to CI with regression gate against bench/baselines.json

Description

Problem

GRAPH_SPEC.md §12 prescribes a CI-enforced perf gate: each run compares against the committed bench/baselines.json, and a regression fires when any row is more than 20 percent slower than baseline. The baseline was captured in ORB-00296 (P0.3), which also documented the bump workflow (label bench-baseline-bump). This task wires the existing graph_bench.rs harness to CI, emits results as an artifact, diffs them against bench/baselines.json, and gates merges on more-than-20-percent regressions.

Why It Matters

Without this gate, the baseline file is decorative. The spec design choice (candidate ADR-6 in 4_decisions.md) was specifically to avoid 'ratchet up to whatever the last commit happened to measure'; that only works if the gate runs on every PR. This is the last piece of the perf contract — capture done in P0.3, gate landed here.

Constraints / Notes

graph_bench.rs already exists at crates/orbit-knowledge/src/graph_bench.rs. After Phase 1's lift (ORB-00306 / P1.4), it should still run against the consumer surface. After Phases 3-4 land, it should also exercise orbit-graph directly for the v2 numbers.
Decision point: does the gate compare v1 numbers, v2 numbers, or both? Spec §12 budgets target the v2 implementation specifically. Recommendation is to gate on v2 numbers against the baseline; v1 numbers are informational. Document the choice in the task notes; deviation requires explicit rationale.
CI integration: a new GitHub Actions job runs the bench harness, writes results to target/bench/results.json, then runs a diff script against bench/baselines.json. Gate fires on any row more than 20 percent slower than baseline.
Bump workflow: PRs with the bench-baseline-bump label bypass the regression check but require a one-line justification in the PR body. The workflow verifies the justification. Implement label detection via the GitHub Actions event payload.
Runner profile: spec §12 says ubuntu-24.04, 4 vCPU, 16GB RAM. Pin to this runner. Do not run on macOS or other variants — timings will not be comparable.
Stability: run the bench 3 times and take the median to dampen noise. Document the choice.
Path filter: docs-only changes do not trigger the perf job. Use the GitHub Actions paths filter on Rust source roots.
This task lands the gate. It does NOT pre-flight v2 against the baseline (Phase 3 + 4 must land first for v2 to exist). After v2 lands, expect the first few runs to need baseline bumps as numbers settle — that is what the label workflow is for.
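
The stability note above (three runs, median per row) can be sketched as follows. This is a minimal illustration, not the contents of bench/run_graph_bench_ci.sh: the flat `{row name: milliseconds}` shape, the `incremental` row name, and all timings are assumptions made for the example.

```python
from statistics import median


def median_rows(runs):
    """Collapse several bench runs into one median timing per row.

    `runs` is a list of dicts mapping a row name to a timing in
    milliseconds. This shape is hypothetical, not the real results.json schema.
    """
    if not runs:
        raise ValueError("need at least one run")
    names = set(runs[0])
    for run in runs[1:]:
        if set(run) != names:
            raise ValueError("every run must report the same rows")
    return {name: median(run[name] for run in runs) for name in sorted(names)}


# Illustrative numbers only; the second run is a noisy outlier.
runs = [
    {"cold_full_build": 120.0, "incremental": 14.0},
    {"cold_full_build": 180.0, "incremental": 12.0},
    {"cold_full_build": 125.0, "incremental": 13.0},
]
print(median_rows(runs))
```

With three runs, the median discards a single outlier in either direction, which is why one slow run on a shared runner does not trip the gate by itself.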

Plan ID: P6.2. Depends on ORB-00296 (P0.3 — baseline captured) and ORB-00318 (P5.1 — CLI surface to invoke v2 numbers). Runs in parallel with P6.1.

Acceptance Criteria

A new GitHub Actions job (e.g. ci-bench) runs graph_bench.rs on every PR with Rust source changes
The job runs the bench 3 times and takes the median per row to dampen noise
The job emits target/bench/results.json as an Actions artifact (uploaded for download on PR pages)
A diff script (bench/check_baseline.sh or a Rust binary) compares results.json against bench/baselines.json and exits non-zero if any row is more than 20 percent slower than the baseline
PRs with the bench-baseline-bump label bypass the regression check but require a one-line justification in the PR body; the workflow verifies the justification is present
Runner profile pinned to ubuntu-24.04 with 4 vCPU and 16GB RAM per GRAPH_SPEC.md §12
Path filter ensures docs-only changes do not trigger the perf job (no Rust source touched means no bench run)
bench/README.md is updated with the new gate behavior and the label-bypass workflow (ORB-00296 added the bump workflow doc; this task adds the gate behavior section)
make ci passes on the current main branch with no false positives against the committed baseline (proves the runner profile + median strategy is stable enough for the gate)
Gate decision (v1 vs v2 numbers) documented in task notes — spec §12 budgets target v2 specifically, so the recommendation is to gate on v2 numbers with v1 numbers informational
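
The "more than 20 percent slower" criterion can be written out as a small check. This is a hedged sketch of the comparison only: the function name, the flat `{row name: milliseconds}` inputs, the `query` row, and the timings are assumptions, and the real bench/check_baseline.sh may structure its inputs differently.

```python
THRESHOLD = 0.20  # "more than 20 percent slower" per the task description


def regressions(results, baseline, threshold=THRESHOLD):
    """Return (name, baseline, measured, slowdown) for each row over threshold."""
    failed = []
    for name, base in sorted(baseline.items()):
        if name not in results:
            continue  # assumption: missing rows are reported elsewhere
        measured = results[name]
        slowdown = (measured - base) / base  # 0.25 means 25 percent slower
        if slowdown > threshold:  # strictly greater: exactly 20 percent passes
            failed.append((name, base, measured, slowdown))
    return failed


# Illustrative numbers only.
baseline = {"cold_full_build": 100.0, "query": 10.0}
results = {"cold_full_build": 120.0, "query": 12.5}

failed = regressions(results, baseline)
for name, base, measured, slowdown in failed:
    print(f"{name}: {base} -> {measured} ({slowdown:+.0%})")
exit_code = 1 if failed else 0  # a real script would exit with this status
```

The comparison is strict, so a row at exactly 20 percent over baseline still passes; only `query` fails in the example.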

Execution Summary

Outcome: success

Changes:

Added .github/workflows/ci-bench.yml with a ci-bench PR job on ubuntu-24.04, native PR path filters, runner core/memory verification, pinned actions, artifact upload, and bench-baseline-bump PR-body justification validation.
Added bench/run_graph_bench_ci.sh to build release bench binaries, run graph_bench.rs three times via the graph_build example, measure v2 orbit-graph-cli rows three times, and write median rows to target/bench/results.json.
Added bench/check_baseline.sh to compare gated result rows against bench/baselines.json and fail rows more than 20 percent slower than baseline.
Updated bench/README.md with the CI gate behavior, artifact location, v1/v2 gate decision, and exact baseline-bump justification line.
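
The bench-baseline-bump validation can be sketched against the GitHub Actions event payload, which is available as a JSON file at `GITHUB_EVENT_PATH` and carries `pull_request.labels` and `pull_request.body` for pull request events. The justification marker below is hypothetical: the exact line required by bench/README.md is not quoted in this PR description, and the function names are illustrative.

```python
import json
import os

BUMP_LABEL = "bench-baseline-bump"
# Hypothetical marker; the real justification line is defined in bench/README.md.
JUSTIFICATION_PREFIX = "Baseline-Bump-Justification:"


def bump_status(event):
    """Classify a pull_request event payload.

    Returns "gate" (run the regression check), "bypass" (label present and
    justified), or "reject" (label present but no justification line).
    """
    pr = event.get("pull_request", {})
    labels = {label["name"] for label in pr.get("labels", [])}
    if BUMP_LABEL not in labels:
        return "gate"
    body = pr.get("body") or ""  # body is null when the description is empty
    for line in body.splitlines():
        text = line.strip()
        if text.startswith(JUSTIFICATION_PREFIX):
            if text[len(JUSTIFICATION_PREFIX):].strip():
                return "bypass"
    return "reject"


def load_event(path=None):
    """Read the payload that GitHub Actions writes to GITHUB_EVENT_PATH."""
    path = path or os.environ["GITHUB_EVENT_PATH"]
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


example = {
    "pull_request": {
        "labels": [{"name": "bench-baseline-bump"}],
        "body": "Summary\n\nBaseline-Bump-Justification: v2 landed, numbers reset",
    }
}
print(bump_status(example))
```

Treating a labeled PR without a justification as a failure, rather than falling back to the normal gate, keeps the label from becoming a silent escape hatch.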

Strategic decisions:

Gate v2 orbit-graph rows (gate: true) against the committed baseline and keep v1 orbit-knowledge rows (gate: false) informational. Rationale: GRAPH_SPEC §12 budgets target v2; v1 values remain useful transition context but should not decide the gate.
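
One way to read the gate flag described above: rows are split on `gate`, and only the gated set is handed to the baseline comparison. This is a small sketch under assumptions. The `name` and `impl` keys are hypothetical; only the `gate: true` / `gate: false` distinction and the `cold_full_build` row name come from this PR.

```python
def split_rows(rows):
    """Separate rows that decide the gate from rows reported for context."""
    gated = [row for row in rows if row.get("gate") is True]
    informational = [row for row in rows if row.get("gate") is not True]
    return gated, informational


# Hypothetical row shape; a row without an explicit gate flag is informational.
rows = [
    {"name": "cold_full_build", "impl": "orbit-graph", "gate": True},
    {"name": "cold_full_build", "impl": "orbit-knowledge", "gate": False},
]

gated, informational = split_rows(rows)
print("gated:", [row["impl"] for row in gated])
print("informational:", [row["impl"] for row in informational])
```

Defaulting unflagged rows to informational means a newly added row cannot fail the gate until someone opts it in explicitly.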

Assessment: The implementation is scoped to CI/bench wiring and avoids new crate dependencies. Full make ci was not run locally because repo instructions reserve it for the PR merge gate; make ci-fast passed.

Validation:

bash -n bench/run_graph_bench_ci.sh bench/check_baseline.sh passed.
Ruby YAML parse for .github/workflows/ci-bench.yml passed.
Synthetic bench/check_baseline.sh pass/fail cases passed, including skipping gate: false v1 rows.
GRAPH_BENCH_RUNS=1 ./bench/run_graph_bench_ci.sh wrote target/bench/results.json; local macOS baseline comparison flagged v2 cold_full_build, so the real gate remains pinned to the verified Ubuntu runner profile.
make ci-fast passed.

Validation

Not reported

Branch Freshness

Base ref: origin/agent-main
Head ref: orbit/ORB-00321-6a13de74
Behind base: 0
Ahead of base: 1

…se… [ORB-00321] Wire graph_bench.rs to CI with regression gate against bench/baselines.json Planned-By: codex

feat: Wire graph_bench.rs to CI with regression gate against bench/ba…

c3aacf4


danieljhkim merged commit d303f74 into agent-main May 25, 2026
2 of 4 checks passed

danieljhkim deleted the orbit/ORB-00321-6a13de74 branch May 25, 2026 05:46
