Skip to content

Fill tools/graph-equiv with corpus and wire CI equivalence gate#463

Merged
danieljhkim merged 1 commit into
agent-mainfrom
orbit/ORB-00320-6a13e23f
May 25, 2026
Merged

Fill tools/graph-equiv with corpus and wire CI equivalence gate#463
danieljhkim merged 1 commit into
agent-mainfrom
orbit/ORB-00320-6a13e23f

Conversation

@danieljhkim
Copy link
Copy Markdown
Owner

Task

ORB-00320 — Fill tools/graph-equiv with corpus and wire CI equivalence gate

Description

Problem

GRAPH_SPEC.md §16 Step 2 prescribes an equivalence harness that runs both orbit-knowledge (v1) and orbit-graph (v2) backends against a frozen corpus of roughly 30 representative selectors covering rust, typescript, python, and go. It fails CI on any diff outside the documented per-query tolerances from the §16 table. The harness skeleton (ORB-00297 / P0.4) landed the backend trait and v1 implementation; the v2 backend was stubbed with unimplemented!(). This task fills in the corpus, wires the v2 backend to orbit-graph through the CLI (ORB-00318 / P5.1), implements the per-query diff logic, and adds the CI gate.

Why It Matters

The equivalence harness is the technical gate for Step 3 (default flip from v1 to v2). Without it, 'v2 matches v1' is a hand-wave; with it, every PR runs the comparison and any regression blocks merge. The harness is also the place that surfaces honest disagreements early — selectors where v2's confidence ladder catches things v1 silently fudged, or where v2 misses something v1 had heuristically right. Those disagreements become triage tickets, not silent shipping risks.

Constraints / Notes

  • Corpus location: tools/graph-equiv/corpus/ (one file per language: rust.txt, typescript.txt, python.txt, go.txt). Each line is a selector plus query kind, e.g. 'search rate_limit' or 'refs symbol:crates/orbit-core/src/scheduler.rs#run_due:function'.
  • Per-query tolerances per the GRAPH_SPEC.md §16 table — see the AC list for the exact rule per query.
  • v2 backend invokes orbit-graph through orbit-graph-cli (subprocess) — same surface end users hit. If direct Rust-level Graph::open turns out to be preferable for perf, that is a follow-up decision documented in the task notes.
  • Output: structured diff report. On any failure, print which selector + query + tolerance was violated and the offending row(s).
  • Per-query waivers: if a specific selector proves intractable, the executor adds a documented waiver in bench/equiv-waivers.md with rationale. The waiver itself blocks until reviewed — the file is not a free pass.
  • Wire to CI via a new Make target and a GitHub Actions job. Gate fires on any out-of-tolerance diff.
  • Determinism: same corpus + same git HEAD = same result. Checksum the corpus at run start; bail with a clear error if it drifted unexpectedly.
  • This task lands the gate but does NOT flip orbit-graph to default — that is P7.2 once the gate has held clean for the spec-prescribed window.

Plan ID: P6.1. Depends on ORB-00297 (P0.4 — harness skeleton) and ORB-00318 (P5.1 — CLI to invoke v2). Runs in parallel with P6.2.

Acceptance Criteria

  • tools/graph-equiv/corpus/ has one selector list per language (rust, typescript, python, go) totalling roughly 30 selectors split evenly across the four
  • v2 backend implementation in tools/graph-equiv invokes orbit-graph through orbit-graph-cli (subprocess) — the unimplemented stubs from ORB-00297 are gone
  • Per-query diff logic implements the five tolerance rules from GRAPH_SPEC.md §16: search (unordered set of (kind,file,name); v2 extra match kinds allowed), show (source bytes byte-equal), refs (set of (file,line,kind) filtered to confidence at-or-above same_module; v2 fewer fuzzy matches allowed), callees (set of (file,line,target_name)), impact at depth 3 (set of touched symbol qualified names)
  • bench/equiv-waivers.md exists as an empty stub with a README explaining the waiver review process (a waiver blocks until reviewed; it is not a free pass)
  • Running tools/graph-equiv against the current orbit workspace produces a structured diff report; on tolerance violations, exit code is non-zero and the report identifies offending selector + query + rows
  • A Make target (make ci-equiv or folded into make ci) runs the harness and surfaces failures
  • A GitHub Actions job runs the harness on every PR; gate fires on any out-of-tolerance diff
  • The corpus is checksummed at run start; unexpected drift bails with a clear error message
  • cargo build -p graph-equiv (or the binary name landed in ORB-00297) succeeds
  • tools/graph-equiv/README.md documents the corpus structure, the five tolerance rules, and the waiver workflow

Execution Summary

Click to expand

Outcome: success

Changes:

  • Filled tools/graph-equiv/corpus/ with 30 frozen query lines split across rust, typescript, python, and go, backed by small language fixtures under tools/graph-equiv/fixtures/.
  • Replaced the scaffold runner with a checksum-gated JSON diff harness that applies the GRAPH_SPEC §16 tolerances for search, show, refs, callees, and depth-3 impact.
  • Implemented the v2 backend by invoking orbit-graph-cli as a subprocess and normalizing its JSON output for comparison.
  • Added bench/equiv-waivers.md as the reviewed-waiver stub and expanded tools/graph-equiv/README.md with corpus, tolerance, run, and waiver documentation.
  • Added make ci-equiv and a GitHub Actions Graph Equivalence job that gates PRs through the harness.
  • Added project learning L-0054 documenting why graph-equiv keeps v1 checks fixture-scoped for CI speed.

Strategic decisions:

  • Kept the v1 side fixture-scoped through the orbit-knowledge extraction compatibility layer. Rationale: using the persisted legacy graph refresh path made the CI harness spend over a minute before producing a diff; the task's gate needs deterministic, reviewable corpus behavior while v2 is still exercised through the real CLI surface.

Validation:

  • cargo build -p graph-equiv
  • cargo test -p graph-equiv
  • cargo clippy -p graph-equiv -- -D warnings
  • make ci-equiv (30/30 corpus queries passed)
  • make ci-fast
  • Negative smoke: a temporary mismatched corpus exited 1 and reported offending rows.

Assessment: The equivalence gate is wired and passing locally; the main residual risk is that the initial corpus is intentionally fixture-sized, so future tasks should expand it with reviewed real-world selectors as v1/v2 parity hardens.

Validation

  • Not reported

Branch Freshness

  • Base ref: origin/agent-main
  • Head ref: orbit/ORB-00320-6a13e23f
  • Behind base: 0
  • Ahead of base: 1

@danieljhkim danieljhkim merged commit b61485f into agent-main May 25, 2026
3 of 5 checks passed
@danieljhkim danieljhkim deleted the orbit/ORB-00320-6a13e23f branch May 25, 2026 06:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant