Fill tools/graph-equiv with corpus and wire CI equivalence gate by danieljhkim · Pull Request #463 · danieljhkim/orbit

danieljhkim · 2026-05-25T06:10:31Z

Task

ORB-00320 — Fill tools/graph-equiv with corpus and wire CI equivalence gate

Description

Problem

GRAPH_SPEC.md §16 Step 2 prescribes an equivalence harness that runs both orbit-knowledge (v1) and orbit-graph (v2) backends against a frozen corpus of roughly 30 representative selectors covering rust, typescript, python, and go. It fails CI on any diff outside the documented per-query tolerances from the §16 table. The harness skeleton (ORB-00297 / P0.4) landed the backend trait and v1 implementation; the v2 backend was stubbed with unimplemented!(). This task fills in the corpus, wires the v2 backend to orbit-graph through the CLI (ORB-00318 / P5.1), implements the per-query diff logic, and adds the CI gate.

Why It Matters

The equivalence harness is the technical gate for Step 3 (default flip from v1 to v2). Without it, 'v2 matches v1' is a hand-wave; with it, every PR runs the comparison and any regression blocks merge. The harness is also the place that surfaces honest disagreements early — selectors where v2's confidence ladder catches things v1 silently fudged, or where v2 misses something v1 had heuristically right. Those disagreements become triage tickets, not silent shipping risks.

Constraints / Notes

Corpus location: tools/graph-equiv/corpus/ (one file per language: rust.txt, typescript.txt, python.txt, go.txt). Each line is a selector plus query kind, e.g. 'search rate_limit' or 'refs symbol:crates/orbit-core/src/scheduler.rs#run_due:function'.
Per-query tolerances per the GRAPH_SPEC.md §16 table — see the AC list for the exact rule per query.
v2 backend invokes orbit-graph through orbit-graph-cli (subprocess) — same surface end users hit. If direct Rust-level Graph::open turns out to be preferable for perf, that is a follow-up decision documented in the task notes.
Output: structured diff report. On any failure, print which selector + query + tolerance was violated and the offending row(s).
Per-query waivers: if a specific selector proves intractable, the executor adds a documented waiver in bench/equiv-waivers.md with rationale. The waiver itself blocks until reviewed — the file is not a free pass.
Wire to CI via a new Make target and a GitHub Actions job. Gate fires on any out-of-tolerance diff.
Determinism: same corpus + same git HEAD = same result. Checksum the corpus at run start; bail with a clear error if it drifted unexpectedly.
This task lands the gate but does NOT flip orbit-graph to default — that is P7.2 once the gate has held clean for the spec-prescribed window.
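
To make the corpus line format concrete, here is a minimal sketch of parsing one line into a query kind and a selector. The names QueryKind, CorpusQuery, and parse_corpus_line are hypothetical, and skipping blank lines and '#' comments is an assumption rather than something the task specifies.

```rust
/// The five query kinds named in the tolerance rules (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum QueryKind {
    Search,
    Show,
    Refs,
    Callees,
    Impact,
}

/// One parsed corpus line: a query kind plus the selector it applies to.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CorpusQuery {
    pub kind: QueryKind,
    pub selector: String,
}

/// Parses a line such as `search rate_limit`.
/// Returns Ok(None) for blank lines and `#` comments (an assumed convention).
pub fn parse_corpus_line(line: &str) -> Result<Option<CorpusQuery>, String> {
    let line = line.trim();
    if line.is_empty() || line.starts_with('#') {
        return Ok(None);
    }
    // The kind is the first word; everything after it is the selector.
    let (kind, selector) = line
        .split_once(char::is_whitespace)
        .ok_or_else(|| format!("missing selector in corpus line: {:?}", line))?;
    let kind = match kind {
        "search" => QueryKind::Search,
        "show" => QueryKind::Show,
        "refs" => QueryKind::Refs,
        "callees" => QueryKind::Callees,
        "impact" => QueryKind::Impact,
        other => return Err(format!("unknown query kind: {:?}", other)),
    };
    Ok(Some(CorpusQuery {
        kind,
        selector: selector.trim_start().to_string(),
    }))
}
```

Rejecting unknown kinds instead of skipping them keeps a typo in a corpus file from silently shrinking the gate.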

Plan ID: P6.1. Depends on ORB-00297 (P0.4 — harness skeleton) and ORB-00318 (P5.1 — CLI to invoke v2). Runs in parallel with P6.2.

Acceptance Criteria

tools/graph-equiv/corpus/ has one selector list per language (rust, typescript, python, go), totalling roughly 30 selectors split approximately evenly across the four languages
v2 backend implementation in tools/graph-equiv invokes orbit-graph through orbit-graph-cli (subprocess) — the unimplemented stubs from ORB-00297 are gone
Per-query diff logic implements the five tolerance rules from GRAPH_SPEC.md §16: search (unordered set of (kind,file,name); v2 extra match kinds allowed), show (source bytes byte-equal), refs (set of (file,line,kind) filtered to confidence at-or-above same_module; v2 fewer fuzzy matches allowed), callees (set of (file,line,target_name)), impact at depth 3 (set of touched symbol qualified names)
bench/equiv-waivers.md exists as an empty stub with a README explaining the waiver review process (a waiver blocks until reviewed; it is not a free pass)
Running tools/graph-equiv against the current orbit workspace produces a structured diff report; on tolerance violations, exit code is non-zero and the report identifies offending selector + query + rows
A Make target (make ci-equiv or folded into make ci) runs the harness and surfaces failures
A GitHub Actions job runs the harness on every PR; gate fires on any out-of-tolerance diff
The corpus is checksummed at run start; unexpected drift bails with a clear error message
cargo build -p graph-equiv (or the binary name landed in ORB-00297) succeeds
tools/graph-equiv/README.md documents the corpus structure, the five tolerance rules, and the waiver workflow
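
The set-based tolerance rules above can be sketched as one comparison with three modes. This is one reading of the rules, not the harness's actual code: SetTolerance, SetDiff, diff_sets, and show_matches are hypothetical names, and mapping search to "v2 may add rows" and refs to "v2 may omit rows" is an interpretation of the criterion text. For refs, the confidence filter would be applied before the sets are built.

```rust
use std::collections::BTreeSet;

/// How two unordered row sets may legitimately differ (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SetTolerance {
    /// Sets must be identical (read here as the rule for callees and impact).
    Exact,
    /// v2 may return rows v1 lacks (read here as the rule for search).
    V2MayAdd,
    /// v2 may omit rows v1 returned (read here as the rule for refs).
    V2MayOmit,
}

/// Rows that violate the tolerance, split by direction.
#[derive(Debug, PartialEq, Eq)]
pub struct SetDiff<T> {
    pub missing_in_v2: Vec<T>,
    pub unexpected_in_v2: Vec<T>,
}

impl<T> SetDiff<T> {
    /// True when nothing is out of tolerance.
    pub fn is_clean(&self) -> bool {
        self.missing_in_v2.is_empty() && self.unexpected_in_v2.is_empty()
    }
}

/// Compares v1 and v2 rows and keeps only the differences the tolerance forbids.
pub fn diff_sets<T: Ord + Clone>(
    v1: &BTreeSet<T>,
    v2: &BTreeSet<T>,
    tolerance: SetTolerance,
) -> SetDiff<T> {
    let missing_in_v2: Vec<T> = v1.difference(v2).cloned().collect();
    let unexpected_in_v2: Vec<T> = v2.difference(v1).cloned().collect();
    match tolerance {
        SetTolerance::Exact => SetDiff { missing_in_v2, unexpected_in_v2 },
        SetTolerance::V2MayAdd => SetDiff { missing_in_v2, unexpected_in_v2: Vec::new() },
        SetTolerance::V2MayOmit => SetDiff { missing_in_v2: Vec::new(), unexpected_in_v2 },
    }
}

/// `show` compares raw source bytes, so slice equality is the whole rule.
pub fn show_matches(v1: &[u8], v2: &[u8]) -> bool {
    v1 == v2
}
```

Using BTreeSet rather than HashSet keeps the offending rows in a stable order, which helps the report stay identical across runs.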

Execution Summary

Outcome: success

Changes:

Filled tools/graph-equiv/corpus/ with 30 frozen query lines split across rust, typescript, python, and go, backed by small language fixtures under tools/graph-equiv/fixtures/.
Replaced the scaffold runner with a checksum-gated JSON diff harness that applies the GRAPH_SPEC §16 tolerances for search, show, refs, callees, and depth-3 impact.
Implemented the v2 backend by invoking orbit-graph-cli as a subprocess and normalizing its JSON output for comparison.
Added bench/equiv-waivers.md as the reviewed-waiver stub and expanded tools/graph-equiv/README.md with corpus, tolerance, run, and waiver documentation.
Added make ci-equiv and a GitHub Actions Graph Equivalence job that gates PRs through the harness.
Added project learning L-0054 documenting why graph-equiv keeps v1 checks fixture-scoped for CI speed.
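
As an illustration of the checksum gate, the sketch below hashes the corpus files in a fixed order and compares the result against a pinned value. The PR does not say which checksum the harness uses; FNV-1a appears here only to keep the example dependency-free, and a real gate would more plausibly use a cryptographic hash such as SHA-256. The names fnv1a64, corpus_checksum, and verify_corpus are hypothetical.

```rust
/// 64-bit FNV-1a over a byte slice (stand-in for the harness's real checksum).
pub fn fnv1a64(bytes: &[u8]) -> u64 {
    const OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
    const PRIME: u64 = 0x0000_0100_0000_01b3;
    let mut hash = OFFSET_BASIS;
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(PRIME);
    }
    hash
}

/// Checksums corpus files given as (name, contents) pairs.
/// Pairs are sorted by name first so listing order cannot change the result.
pub fn corpus_checksum(files: &[(&str, &[u8])]) -> u64 {
    let mut sorted = files.to_vec();
    sorted.sort_by(|a, b| a.0.cmp(b.0));
    let mut buf = Vec::new();
    for (name, contents) in sorted {
        // Length prefixes keep "ab" + "c" distinct from "a" + "bc".
        buf.extend_from_slice(&(name.len() as u64).to_le_bytes());
        buf.extend_from_slice(name.as_bytes());
        buf.extend_from_slice(&(contents.len() as u64).to_le_bytes());
        buf.extend_from_slice(contents);
    }
    fnv1a64(&buf)
}

/// Fails with a readable message when the corpus no longer matches the pinned value.
pub fn verify_corpus(actual: u64, expected: u64) -> Result<(), String> {
    if actual == expected {
        Ok(())
    } else {
        Err(format!(
            "corpus checksum drifted: expected {:016x}, got {:016x}; \
             update the pinned checksum only if the corpus change was intended",
            expected, actual
        ))
    }
}
```

Hashing file names along with contents means that renaming or swapping corpus files also counts as drift.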

Strategic decisions:

Kept the v1 side fixture-scoped through the orbit-knowledge extraction compatibility layer. Rationale: using the persisted legacy graph refresh path made the CI harness spend over a minute before producing a diff; the task's gate needs deterministic, reviewable corpus behavior while v2 is still exercised through the real CLI surface.

Validation:

cargo build -p graph-equiv
cargo test -p graph-equiv
cargo clippy -p graph-equiv -- -D warnings
make ci-equiv (30/30 corpus queries passed)
make ci-fast
Negative smoke: a temporary mismatched corpus exited 1 and reported offending rows.
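
The behaviour exercised by the negative smoke test (non-zero exit plus offending rows) can be sketched as follows. The harness itself emits a JSON report; this sketch renders plain text to stay dependency-free, and the Violation struct, its field names, and the output layout are all assumptions.

```rust
/// One out-of-tolerance result (hypothetical shape).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Violation {
    pub selector: String,
    pub query: String,
    pub tolerance: String,
    pub rows: Vec<String>,
}

/// Renders a summary line followed by one block per violation.
/// Assumes at most one violation per corpus query.
pub fn render_report(total: usize, violations: &[Violation]) -> String {
    let passed = total.saturating_sub(violations.len());
    let mut out = format!("{}/{} corpus queries passed\n", passed, total);
    for v in violations {
        out.push_str(&format!(
            "FAIL {} {} (tolerance: {})\n",
            v.query, v.selector, v.tolerance
        ));
        for row in &v.rows {
            out.push_str(&format!("  {}\n", row));
        }
    }
    out
}

/// Maps the violation list to a process exit code: 0 when clean, 1 otherwise.
pub fn exit_code(violations: &[Violation]) -> i32 {
    if violations.is_empty() {
        0
    } else {
        1
    }
}
```

A caller would print the rendered report and then pass the code to std::process::exit, so CI sees the failure and the log shows which selector, query, and rows caused it.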

Assessment: The equivalence gate is wired and passing locally; the main residual risk is that the initial corpus is intentionally fixture-sized, so future tasks should expand it with reviewed real-world selectors as v1/v2 parity hardens.

Validation

Not reported

Branch Freshness

Base ref: origin/agent-main
Head ref: orbit/ORB-00320-6a13e23f
Behind base: 0
Ahead of base: 1

… [ORB-00320] Planned-By: codex

feat: Fill tools/graph-equiv with corpus and wire CI equivalence gate…

64a2b9a

danieljhkim merged commit b61485f into agent-main May 25, 2026
3 of 5 checks passed

danieljhkim deleted the orbit/ORB-00320-6a13e23f branch May 25, 2026 06:47
