
Wire graph_bench.rs to CI with regression gate against bench/baselines.json #462

Merged: danieljhkim merged 1 commit into agent-main from orbit/ORB-00321-6a13de74 on May 25, 2026

Conversation

@danieljhkim
Owner

Task

ORB-00321 — Wire graph_bench.rs to CI with regression gate against bench/baselines.json

Description

Problem

GRAPH_SPEC.md §12 prescribes a CI-enforced perf gate: each run compares against the committed bench/baselines.json. Regression fires when any row is more than 20 percent slower than baseline. The baseline was captured in ORB-00296 (P0.3) and the bump workflow documented (label bench-baseline-bump). This task wires the existing graph_bench.rs harness to CI, emits results as an artifact, diffs against bench/baselines.json, and gates merges on more-than-20-percent regressions.

Why It Matters

Without this gate, the baseline file is decorative. The spec design choice (candidate ADR-6 in 4_decisions.md) was specifically to avoid 'ratchet up to whatever the last commit happened to measure'; that only works if the gate runs on every PR. This is the last piece of the perf contract — capture done in P0.3, gate landed here.

Constraints / Notes

  • graph_bench.rs already exists at crates/orbit-knowledge/src/graph_bench.rs. After Phase 1's lift (ORB-00306 / P1.4), it should still run against the consumer surface. After Phases 3-4 land, it should also exercise orbit-graph directly for the v2 numbers.
  • Decision point: does the gate compare v1 numbers, v2 numbers, or both? Spec §12 budgets target the v2 implementation specifically. Recommendation is to gate on v2 numbers against the baseline; v1 numbers are informational. Document the choice in the task notes; deviation requires explicit rationale.
  • CI integration: a new GitHub Actions job runs the bench harness, writes results to target/bench/results.json, then runs a diff script against bench/baselines.json. Gate fires on any row more than 20 percent slower than baseline.
  • Bump workflow: PRs with the bench-baseline-bump label bypass the regression check but require a one-line justification in the PR body. The workflow verifies the justification. Implement label detection via the GitHub Actions event payload.
  • Runner profile: spec §12 says ubuntu-24.04, 4 vCPU, 16 GB RAM. Pin to this runner. Do not run on macOS or other variants; timings from different hardware are not comparable with the baseline.
  • Stability: run the bench 3 times and take the median to dampen noise. Document the choice.
  • Path filter: docs-only changes do not trigger the perf job. Use the GitHub Actions paths filter on Rust source roots.
  • This task lands the gate. It does NOT pre-flight v2 against the baseline (Phase 3 + 4 must land first for v2 to exist). After v2 lands, expect the first few runs to need baseline bumps as numbers settle — that is what the label workflow is for.
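
The bump-label bypass described above can be sketched as a small decision function over the `pull_request` event payload. This is an illustration in Python, not the workflow's actual implementation. The `labels[].name` and `body` fields are part of the GitHub payload, but `gate_mode` and the `Bench-Baseline-Bump:` marker are hypothetical names; the exact justification line is the one documented in bench/README.md.

```python
BUMP_LABEL = "bench-baseline-bump"
# Hypothetical marker; the real justification line is defined in bench/README.md.
JUSTIFICATION_PREFIX = "Bench-Baseline-Bump:"


def gate_mode(event: dict) -> str:
    """Classify a pull_request event payload.

    Returns "enforce" (no bump label, run the regression check),
    "bypass" (label present and a non-empty justification line found),
    or "reject" (label present but no justification).
    """
    pr = event.get("pull_request", {})
    labels = {label["name"] for label in pr.get("labels", [])}
    if BUMP_LABEL not in labels:
        return "enforce"

    body = pr.get("body") or ""  # GitHub sends null for an empty PR body
    for line in body.splitlines():
        stripped = line.strip()
        if stripped.startswith(JUSTIFICATION_PREFIX):
            reason = stripped[len(JUSTIFICATION_PREFIX):].strip()
            if reason:
                return "bypass"
    return "reject"
```

In a workflow step, the payload would typically be loaded with `json.load` from the file named by the `GITHUB_EVENT_PATH` environment variable, and a `"reject"` result would fail the job.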

Plan ID: P6.2. Depends on ORB-00296 (P0.3 — baseline captured) and ORB-00318 (P5.1 — CLI surface to invoke v2 numbers). Runs in parallel with P6.1.

Acceptance Criteria

  • A new GitHub Actions job (e.g. ci-bench) runs graph_bench.rs on every PR with Rust source changes
  • The job runs the bench 3 times and takes the median per row to dampen noise
  • The job emits target/bench/results.json as an Actions artifact (uploaded for download on PR pages)
  • A diff script (bench/check_baseline.sh or a Rust binary) compares results.json against bench/baselines.json and exits non-zero if any row is more than 20 percent slower than the baseline
  • PRs with the bench-baseline-bump label bypass the regression check but require a one-line justification in the PR body; the workflow verifies the justification is present
  • Runner profile pinned to ubuntu-24.04 with 4 vCPU and 16 GB RAM per GRAPH_SPEC.md §12
  • Path filter ensures docs-only changes do not trigger the perf job (no Rust source touched = no bench run)
  • bench/README.md is updated with the new gate behavior and the label-bypass workflow (ORB-00296 added the bump workflow doc; this task adds the gate behavior section)
  • make ci passes on the current main branch with no false positives against the committed baseline (this shows the runner profile and median strategy are stable enough for the gate)
  • Gate decision (v1 vs v2 numbers) documented in task notes — spec §12 budgets target v2 specifically, so the recommendation is to gate on v2 numbers with v1 numbers informational
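
The median-of-three criterion above amounts to collapsing several runs into one timing per row. A minimal Python sketch follows, assuming each run is a mapping from row name to milliseconds. That shape and the `median_rows` helper are assumptions for illustration; the PR does the real aggregation in bench/run_graph_bench_ci.sh.

```python
from statistics import median


def median_rows(runs: list[dict[str, float]]) -> dict[str, float]:
    """Collapse several bench runs into one median timing per row."""
    if not runs:
        raise ValueError("need at least one run")
    names = set(runs[0])
    for run in runs[1:]:
        # Every run must report the same rows, or the median is meaningless.
        if set(run) != names:
            raise ValueError("runs report different rows")
    return {name: median(run[name] for run in runs) for name in sorted(names)}
```

With three runs, the median discards a single outlier in either direction, which a mean would not: timings of 120, 95 and 101 ms yield 101 ms.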

Execution Summary


Outcome: success

Changes:

  • Added .github/workflows/ci-bench.yml with a ci-bench PR job on ubuntu-24.04, native PR path filters, runner core/memory verification, pinned actions, artifact upload, and bench-baseline-bump PR-body justification validation.
  • Added bench/run_graph_bench_ci.sh to build release bench binaries, run graph_bench.rs three times via the graph_build example, measure v2 orbit-graph-cli rows three times, and write median rows to target/bench/results.json.
  • Added bench/check_baseline.sh to compare gated result rows against bench/baselines.json and fail rows more than 20 percent slower than baseline.
  • Updated bench/README.md with the CI gate behavior, artifact location, v1/v2 gate decision, and exact baseline-bump justification line.

Strategic decisions:

  • Gate v2 orbit-graph rows (gate: true) against the committed baseline and keep v1 orbit-knowledge rows (gate: false) informational. | Rationale: GRAPH_SPEC §12 budgets target v2; v1 values remain useful transition context but should not decide the gate.
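
The decision above, combined with the 20 percent threshold, can be sketched as follows. The `gate` flag comes from this PR. The `name` and `ms` field names, the flat name-to-milliseconds baseline shape, and `find_regressions` itself are assumptions made for illustration, since the actual schema of results.json and bench/baselines.json is not shown here.

```python
THRESHOLD = 0.20  # a row fails when it is more than 20 percent slower than baseline


def find_regressions(
    results: list[dict],
    baseline: dict[str, float],
    threshold: float = THRESHOLD,
) -> list[str]:
    """Return one message per gated row that exceeds its baseline budget."""
    failures = []
    for row in results:
        if not row.get("gate", False):
            continue  # informational rows (v1) never decide the gate
        name = row["name"]
        base = baseline[name]  # a gated row without a baseline raises KeyError
        limit = base * (1.0 + threshold)
        if row["ms"] > limit:
            failures.append(
                f"{name}: {row['ms']:.1f} ms > {limit:.1f} ms (baseline {base:.1f} ms)"
            )
    return failures
```

A caller would exit non-zero when the returned list is non-empty. Faster-than-baseline rows pass silently, which matches the stated intent of not ratcheting the baseline to whatever the last commit measured.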

Assessment: The implementation is scoped to CI/bench wiring and avoids new crate dependencies. Full make ci was not run locally because repo instructions reserve it for the PR merge gate; make ci-fast passed.

Validation:

  • bash -n bench/run_graph_bench_ci.sh bench/check_baseline.sh passed.
  • Ruby YAML parse for .github/workflows/ci-bench.yml passed.
  • Synthetic bench/check_baseline.sh pass/fail cases passed, including skipping gate: false v1 rows.
  • GRAPH_BENCH_RUNS=1 ./bench/run_graph_bench_ci.sh wrote target/bench/results.json; local macOS baseline comparison flagged v2 cold_full_build, so the real gate remains pinned to the verified Ubuntu runner profile.
  • make ci-fast passed.
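
Because timings are only comparable on the pinned profile, the job verifies the runner before trusting a result. A hedged Python sketch of such a check is below. The function names and the 10 percent memory slack are assumptions (a 16 GB machine reports somewhat less than 16 GiB in `MemTotal`), and the workflow's actual verification step may differ.

```python
def parse_mem_total_kib(meminfo: str) -> int:
    """Extract MemTotal (in kB) from the text of /proc/meminfo."""
    for line in meminfo.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1])
    raise ValueError("MemTotal not found")


def runner_matches(
    cpus: int,
    mem_total_kib: int,
    want_cpus: int = 4,
    want_mem_gib: float = 16.0,
    mem_slack: float = 0.10,  # assumed tolerance for kernel-reserved memory
) -> bool:
    """Check the runner against the 4 vCPU / 16 GB profile from spec §12."""
    mem_gib = mem_total_kib / (1024 * 1024)
    return cpus == want_cpus and mem_gib >= want_mem_gib * (1.0 - mem_slack)
```

On Linux, the inputs would come from `os.cpu_count()` and the contents of `/proc/meminfo`; a mismatch should fail the job rather than produce numbers that cannot be compared with the baseline.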

Validation

  • Not reported

Branch Freshness

  • Base ref: origin/agent-main
  • Head ref: orbit/ORB-00321-6a13de74
  • Behind base: 0
  • Ahead of base: 1

…se… [ORB-00321]

Wire graph_bench.rs to CI with regression gate against bench/baselines.json

Planned-By: codex
@danieljhkim danieljhkim merged commit d303f74 into agent-main May 25, 2026
2 of 4 checks passed
@danieljhkim danieljhkim deleted the orbit/ORB-00321-6a13de74 branch May 25, 2026 05:46
