qa(phase-3): wire the per-run latency sidecar so the dormant RRI latency gate activates #957

Merged
100yenadmin merged 1 commit into main from qa/latency-sidecar-wire
Jun 16, 2026

@100yenadmin (Member)

Problem

PR #954 added additive latency hard-gates (latency_s_per_beat / latency_coldopen) to qa/release_readiness.py, but they were dormant on a real sweep:

  • release_readiness.read_latency() reads each persona run dir's latency.json sidecar (then a latency block in run.json, then score.json).
  • But the runners (qa/run_duo.sh) derive the latency rollup into the transcript dir ($T/$RUN.latency.json), not the per-run dir.

So the per-run sidecar never existed → the gate fell through to a (safe-by-design) evidence-gap SKIP instead of actually gating. Diagnosed in the Phase-3 sprint (the gate looked wired but wasn't).
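
The lookup order described above can be sketched as follows. This is a minimal illustration, not the code in qa/release_readiness.py: the name read_latency_sketch, the helper _load_json, and the assumption that score.json carries the same "latency" block as run.json are all hypothetical.

```python
import json
from pathlib import Path
from typing import Optional


def _load_json(path: Path) -> Optional[dict]:
    """Return the JSON object stored at path, or None if it is missing or unreadable."""
    try:
        data = json.loads(path.read_text())
    except (OSError, ValueError):
        return None
    return data if isinstance(data, dict) else None


def read_latency_sketch(run_dir: Path) -> Optional[dict]:
    """Hypothetical fallback chain: latency.json, then run.json, then score.json."""
    sidecar = _load_json(run_dir / "latency.json")
    if sidecar is not None:
        return sidecar
    # Assumption: both fallback files keep the figures under a "latency" key.
    for name in ("run.json", "score.json"):
        doc = _load_json(run_dir / name)
        if doc is not None and isinstance(doc.get("latency"), dict):
            return doc["latency"]
    return None  # evidence gap: the caller skips the gate instead of failing it
```

With no sidecar and no fallback block in the run dir, the sketch returns None, which corresponds to the evidence-gap SKIP the gate fell through to before this PR.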

Fix (all additive)

  • qa/latency_rollup.py — new reusable stamp_sidecars(rollup, run_dirs) helper + a --stamp-into CLI flag. Writes {s_per_beat, coldopen_s, turns_per_beat} into each run dir as <run>/latency.json (the exact shape read_latency() consumes). NULL columns preserved verbatim (read_latency treats null as absent → skip, never a fabricated 0.0); a non-existent run dir is skipped, never created (a stale/typo path can't fabricate evidence).
  • qa/release_gate.sh — after the duo run (which already produced the per-beat ledger), re-derive the same rollup and stamp it into every persona run dir before the RRI rollup reads them. Non-fatal: a stamp hiccup, or a duo with no derivable beat (NULL columns), leaves the gate a documented skip — exactly today's behavior when latency evidence is absent.
  • qa/release_readiness.py — corrected read_latency()'s stale docstring (it claimed run_duo.sh writes the per-run sidecar; in fact run_duo writes to the transcript dir and release_gate.sh stamps the per-run sidecar). That inaccuracy is part of why the gate looked wired but wasn't.
  • qa/evidence_audit.py — refreshed the stale "canonical 11 RRI gates" comment: the evaluated set is 11 by default and 13 once latency gates carry evidence (gates_total is read dynamically; RRI_GATE_NAMES stays the always-required baseline — the conditional latency gates are intentionally not in it).
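
A minimal sketch of the stamping behavior described in the first bullet, assuming the rollup is a plain mapping. The name stamp_sidecars_sketch, its return value, and the JSON formatting are illustrative assumptions; the real helper in qa/latency_rollup.py may differ.

```python
import json
from pathlib import Path
from typing import Iterable, List, Mapping, Optional

SIDECAR_KEYS = ("s_per_beat", "coldopen_s", "turns_per_beat")


def stamp_sidecars_sketch(
    rollup: Mapping[str, Optional[float]], run_dirs: Iterable[str]
) -> List[Path]:
    """Write <run>/latency.json into each existing run dir; return the paths written."""
    # Missing or None columns stay None, which json.dumps serializes as null,
    # so a reader can treat them as absent instead of as 0.0.
    payload = {key: rollup.get(key) for key in SIDECAR_KEYS}
    written: List[Path] = []
    for raw in run_dirs:
        run_dir = Path(raw)
        if not run_dir.is_dir():
            continue  # a stale or mistyped path is skipped, never created
        target = run_dir / "latency.json"
        target.write_text(json.dumps(payload, indent=2) + "\n")
        written.append(target)
    return written
```

Returning the list of written paths is a design choice of this sketch: it lets a caller log how many run dirs actually received evidence without treating a skipped dir as an error.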

Attribution note

The latency budget is a build-level signal ("is generation within budget on this build?"). The duo is the canonical deep play, so its rollup is replicated into every persona run dir; the gate aggregates the max across personas, so identical values yield exactly that build figure. (The .app persona plays write a different transcript shape — dm.combined.jsonl — so a per-persona rollup is out of scope here; the duo rollup is the available, representative evidence.)
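
The aggregation argument can be checked with a small sketch. The function worst_case and the sample figure 95.0 are hypothetical; the only thing taken from the note is that the gate uses the max across personas.

```python
from typing import Iterable, Optional


def worst_case(values: Iterable[Optional[float]]) -> Optional[float]:
    """Max across personas, ignoring absent (None) entries; None if nothing is present."""
    present = [v for v in values if v is not None]
    return max(present) if present else None


# The duo rollup is replicated into every persona run dir, so all entries are equal
# and the max is exactly the build-level figure.
duo_s_per_beat = 95.0  # made-up value for illustration
per_persona = [duo_s_per_beat] * 3
assert worst_case(per_persona) == duo_s_per_beat
```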

Verification

  • End-to-end (real CLI): a real RRI rollup over runs with over-budget latency (s_per_beat=300 > 120, coldopen_s=500 > 240), stamped via the exact latency_rollup.py --stamp-into command release_gate.sh drives, now FAILS both latency gates → gates_total=13, release_ready=False. Under budget PASSES. Absent stays a byte-identical skip (gates_total=11).
  • New tests:
    • qa/test_latency_rollup.py: stamp_sidecars unit coverage (writes the right columns, skips non-existent dirs, preserves NULL) + a --stamp-into CLI test.
    • qa/test_release_readiness.py — an end-to-end seam test driving the production rollup → stamp → gate path (synthetic duo beats, not a hand-written sidecar), asserting over-budget FAIL + under-budget PASS.
    • qa/test_release_gate_static.py — a static contract locking the release_gate.sh wiring (--stamp-into "$RUN_DIRS" runs before the RRI rollup) so the gate can't silently go dormant again.
  • Single-process (-p no:xdist): qa/test_release_readiness.py + qa/test_deterministic_rri_gate.py + qa/test_latency_rollup.py + affected static/audit/scope/orchestrate tests — all green. No committed data artifact touched (scores.db reverted; tests use tmp_path + --out tmp). license_check passes.
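
The three verified outcomes (over budget, under budget, absent) can be reproduced with a sketch like the one below. The budgets 120 and 240 and the gate names come from the PR text; the function names, the strict ">" comparison at the boundary, and the way gates_total is counted are assumptions for illustration, not the logic in qa/release_readiness.py.

```python
from typing import Dict, Optional

S_PER_BEAT_BUDGET = 120.0  # budget quoted in the verification notes
COLDOPEN_BUDGET = 240.0    # budget quoted in the verification notes
BASELINE_GATES = 11        # always-required gates, per the PR text


def latency_gate(value: Optional[float], budget: float) -> str:
    """SKIP when evidence is absent, FAIL when over budget, PASS otherwise."""
    if value is None:
        return "SKIP"
    return "FAIL" if value > budget else "PASS"


def summarize(s_per_beat: Optional[float], coldopen_s: Optional[float]) -> Dict[str, object]:
    """Assumed counting: each latency gate that carries evidence adds one to gates_total."""
    results = {
        "latency_s_per_beat": latency_gate(s_per_beat, S_PER_BEAT_BUDGET),
        "latency_coldopen": latency_gate(coldopen_s, COLDOPEN_BUDGET),
    }
    evaluated = [r for r in results.values() if r != "SKIP"]
    return {
        "gates_total": BASELINE_GATES + len(evaluated),
        "latency_ok": all(r == "PASS" for r in evaluated),
        "results": results,
    }
```

For example, summarize(300.0, 500.0) reports 13 gates with both latency gates failing, while summarize(None, None) reports the unchanged 11.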

…ncy gate activates

PR #954 added additive latency hard-gates (latency_s_per_beat / latency_coldopen) to
qa/release_readiness.py, but they were DORMANT on a real sweep: read_latency() reads each
PERSONA run dir's latency.json sidecar, while the runners derive the latency rollup into the
TRANSCRIPT dir ($T/$RUN.latency.json) — so the per-run sidecar never existed and the gate
fell through to a (safe-by-design) evidence-gap SKIP instead of actually gating.

Wire it (all additive):
- qa/latency_rollup.py: new reusable stamp_sidecars(rollup, run_dirs) + a --stamp-into CLI
  flag. It writes {s_per_beat, coldopen_s, turns_per_beat} into each run dir as
  <run>/latency.json — the exact shape read_latency() consumes. NULL columns are preserved
  (read_latency treats null as ABSENT -> skip, never a fabricated 0.0); a non-existent run dir
  is skipped, never created.
- qa/release_gate.sh: after the duo run (which produced the per-beat ledger), re-derive the
  SAME rollup and stamp it into every persona run dir BEFORE the RRI rollup reads them.
  Non-fatal: a stamp hiccup / a duo with no derivable beat leaves the gate a documented skip.
- qa/release_readiness.py: corrected read_latency()'s stale docstring (it claimed run_duo.sh
  writes the per-run sidecar; in fact run_duo writes to the transcript dir and release_gate.sh
  stamps the per-run sidecar) — the inaccuracy is part of why the gate looked wired but wasn't.
- qa/evidence_audit.py: refreshed the stale "canonical 11 RRI gates" comment — the evaluated
  set is 11 by default and 13 once latency gates carry evidence (gates_total is read
  dynamically; RRI_GATE_NAMES stays the always-required baseline).

Verify:
- A real RRI rollup over runs WITH over-budget latency evidence (s_per_beat>120 or
  coldopen_s>240) now FAILS the latency gates (gates_total 13, release_ready=False); under
  budget PASSES; absent stays a byte-identical skip (gates_total 11).
- New tests: stamp_sidecars unit coverage + a CLI --stamp-into test (test_latency_rollup.py);
  an end-to-end SEAM test driving the production rollup→stamp→gate path
  (test_release_readiness.py); a static contract locking the release_gate.sh wiring so the gate
  can't silently go dormant again (test_release_gate_static.py).
- Single-process: qa/test_release_readiness.py + qa/test_deterministic_rri_gate.py +
  qa/test_latency_rollup.py + affected static/audit/scope/orchestrate tests — all green.
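
A static ordering contract of the kind mentioned above could look roughly like this. It is a sketch only: the helper names, the choice of "release_readiness.py" as the marker for the RRI rollup call, and the crude comment stripping are assumptions, and the real qa/test_release_gate_static.py may check something different.

```python
from typing import List, Optional


def first_code_line_with(lines: List[str], marker: str) -> Optional[int]:
    """Index of the first line whose non-comment part contains marker, else None."""
    for index, line in enumerate(lines):
        code = line.split("#", 1)[0]  # crude shell-comment stripping, enough for a sketch
        if marker in code:
            return index
    return None


def stamp_runs_before_rollup(
    script_text: str,
    stamp_marker: str = "--stamp-into",
    rollup_marker: str = "release_readiness.py",  # assumed marker for the RRI rollup call
) -> bool:
    """True only if both markers appear in code and the stamp comes first."""
    lines = script_text.splitlines()
    stamp_at = first_code_line_with(lines, stamp_marker)
    rollup_at = first_code_line_with(lines, rollup_marker)
    return stamp_at is not None and rollup_at is not None and stamp_at < rollup_at
```

A test would read qa/release_gate.sh as text and assert that this function returns True, so reordering or deleting the stamp step fails the suite instead of silently returning the gate to a skip.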
@coderabbitai

coderabbitai Bot commented Jun 16, 2026

Warning

Review limit reached

@100yenadmin, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 43 minutes and 46 seconds. Learn how PR review limits work.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 4afdc1ef-6eb8-4113-b455-e2d6d4cf8cec

📥 Commits

Reviewing files that changed from the base of the PR and between 7f65192 and f33f5d0.

📒 Files selected for processing (7)
  • qa/evidence_audit.py
  • qa/latency_rollup.py
  • qa/release_gate.sh
  • qa/release_readiness.py
  • qa/test_latency_rollup.py
  • qa/test_release_gate_static.py
  • qa/test_release_readiness.py

@100yenadmin merged commit b4f50e6 into main on Jun 16, 2026
17 of 18 checks passed