qa(phase-3): RRI cadence + latency gates, observability (trends/reconcile), split-RRI orchestration, visual regression by 100yenadmin · Pull Request #954 · electricsheephq/WorldOS

100yenadmin · 2026-06-16T13:41:55Z

Phase 3 (final) — cross-version regression prevention + agent-facing signals

Closes the program. All additive (absent inputs → byte-identical existing output); pure
readers/reporters + opt-in CI signals; nothing auto-acts or SSHes without an explicit flag.

| Area | What | Notes |
| --- | --- | --- |
| RRI cadence + latency | `release_readiness.py`: latency hard-gates (`s_per_beat`/`coldopen_s`) from existing artifacts, gating only when present & over budget (`latency_baseline.json`); `--deterministic-only` mode marks LLM/persona gates SKIPPED so CI/agent get an early deterministic-release signal. New advisory `release-readiness.yml` (dispatch+schedule). | load-bearing: 41 pre-existing `test_release_readiness.py` tests unchanged & green; +4 additive; `test_deterministic_rri_gate.py` (9) |
| Observability | `scores_db.py`: `trends_json()`/`--trends-json` (per-field time-series the agent queries) + `reconcile()`/`--reconcile` (READ-ONLY `INDEX.jsonl`↔db consistency) | reconcile is a tolerant `INDEX.jsonl` reader and never rewrites it; coordinates with open sibling #573. 26 existing tests green + 15 new |
| Split-RRI orchestration | `orchestrate_split_rri.py`: one-step VM(part-B)+Mac(part-A handoff) RRI rollup | DRY-RUN/`--plan` by default (prints exact commands); remote SSH only behind explicit `--execute`. Reuses `support_vm_preflight` + `release_readiness` |
| Visual regression | `visual_regression_check.py` + `screenshot_baselines/`: strict (stdlib sha256) vs audit (PIL optional, skips if absent) | dependency-light |
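The "gating only when present & over budget" rule in the first row can be illustrated with a minimal sketch. This is not the code in `release_readiness.py`: the function name, the result shape, and the in-memory budget dict are assumptions made for illustration. The 120/240 defaults and the gate names `latency_s_per_beat` / `latency_coldopen` are the ones quoted in the commit messages in this thread.

```python
from typing import Optional

# Budgets quoted in the commit message (seconds); the real values live in
# latency_baseline.json, whose schema is not shown in this PR.
DEFAULT_BUDGET = {"s_per_beat": 120.0, "coldopen_s": 240.0}

# Metric -> gate name, as named in the follow-up PR text.
GATE_NAMES = {"s_per_beat": "latency_s_per_beat", "coldopen_s": "latency_coldopen"}


def latency_gates(evidence: Optional[dict], budget: dict = DEFAULT_BUDGET) -> list:
    """Hypothetical helper: one gate result per metric that has evidence.

    Absent or null evidence yields no gate at all, so the existing output
    is unchanged; a metric fails only when it is strictly over budget.
    """
    gates = []
    for metric, limit in budget.items():
        value = (evidence or {}).get(metric)
        if value is None:  # absent or null: skip, never treat as 0.0
            continue
        gates.append({
            "gate": GATE_NAMES[metric],
            "value": value,
            "limit": limit,
            "status": "FAIL" if value > limit else "PASS",
        })
    return gates
```

Returning an empty list for missing evidence, rather than a list of SKIPPED entries, is one way to keep the "absent inputs → byte-identical existing output" property; the actual implementation may represent skips differently.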

Verification (local single-process, integrated)

123 passed, 2 skipped (PIL-absent audit skips). The existing release_readiness.py (41) and
scores_db.py (26) suites stay green. Zero committed-artifact writes (the scores.db lazy-migration
is excluded; tests use tmp_path). 3 new test files added to CI qa-release-gate-tests.
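The two PIL-absent skips come from the audit mode of the visual regression check. A minimal sketch of the strict/audit split follows, assuming a changed-pixel ratio as the audit metric and a SKIPPED result for a missing baseline; neither detail is stated in the PR, and the function names and result dicts are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex digest of a file's raw bytes (stdlib only)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def strict_check(baseline: Path, candidate: Path) -> dict:
    """Strict mode: PASS only when both files are byte-identical."""
    if not baseline.is_file():  # assumption: no baseline means skip, not fail
        return {"mode": "strict", "status": "SKIPPED", "reason": "no baseline"}
    same = sha256_of(baseline) == sha256_of(candidate)
    return {"mode": "strict", "status": "PASS" if same else "FAIL"}


def audit_check(baseline: Path, candidate: Path, max_changed_ratio: float = 0.01) -> dict:
    """Audit mode: tolerate a small share of changed pixels; skip without Pillow."""
    try:
        from PIL import Image, ImageChops  # optional dependency
    except ImportError:
        return {"mode": "audit", "status": "SKIPPED", "reason": "PIL not installed"}
    with Image.open(baseline) as a, Image.open(candidate) as b:
        a_rgb, b_rgb = a.convert("RGB"), b.convert("RGB")
    if a_rgb.size != b_rgb.size:
        return {"mode": "audit", "status": "FAIL", "reason": "size mismatch"}
    raw = ImageChops.difference(a_rgb, b_rgb).tobytes()  # 3 bytes per RGB pixel
    changed = sum(1 for i in range(0, len(raw), 3) if raw[i:i + 3] != b"\x00\x00\x00")
    ratio = changed / max(1, len(raw) // 3)
    status = "PASS" if ratio <= max_changed_ratio else "FAIL"
    return {"mode": "audit", "status": status, "changed_ratio": ratio}
```

Importing Pillow inside the function is what keeps the strict path dependency-free: a machine without PIL still runs the sha256 comparison and reports the audit check as skipped.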

The two in-workflow FIX verdicts were shared-worktree cross-attribution false-positives (reviewers
ran a broad git diff and saw sibling builders' files) — disproven by disjoint-ownership + the integrated
green suite. Builds on #949/#950/#952/#953; completes the 3-phase QA Lab upgrade.

…ile, split-RRI orchestration, visual regression

Phase 3 (final) of the QA Lab upgrade: agent-facing signals + cross-version regression prevention. All ADDITIVE: absent inputs mean byte-identical existing output. Closes audit gaps RRI-CADENCE-1, RRI-DETERMINISM-GAP-3, RRI-SPLIT-VM-LANE-2, REGRESSION-5/7, OBS-2/3/4/5/6, NO-VISUAL-REGRESSION-DIFFING-6.

- release_readiness.py (load-bearing, additive): latency hard-gates (s_per_beat, coldopen_s) from existing disk artifacts, gating ONLY when latency evidence is present and over the budget in new latency_baseline.json (defaults 120/240, headroom over healthy ~78/~157); absent means skip, never a false fail. New --deterministic-only mode evaluates only non-LLM gates and marks LLM/persona gates SKIPPED (not failed) so CI/the agent get an early deterministic-release signal. 41 pre-existing tests unchanged and still pass; +4 additive. test_deterministic_rri_gate.py (9).
- .github/workflows/release-readiness.yml: advisory workflow_dispatch+schedule running --deterministic-only over a temp fixture, uploads RRI-deterministic.json. continue-on-error, never blocks.
- scores_db.py (additive): trends_json() + --trends-json (per-field time-series the agent queries) and reconcile() + --reconcile (READ-ONLY INDEX.jsonl vs db consistency; tolerant parser, never rewrites INDEX.jsonl, coordinates with open sibling PR #573). 26 existing tests green + 15 new.
- orchestrate_split_rri.py: one-step VM(part-B)+Mac(part-A handoff) RRI rollup; DRY-RUN/--plan by default (prints exact commands), remote SSH only behind explicit --execute. Reuses support_vm_preflight + release_readiness. test (no live SSH).
- visual_regression_check.py + qa/screenshot_baselines/: strict (stdlib sha256) vs audit (PIL optional, skips gracefully) screenshot diffing. test.
- CI: 3 new test files added to qa-release-gate-tests.

Verification (local single-process, integrated): 123 passed, 2 skipped (PIL-absent audit skips). Existing release_readiness (41) + scores_db (26) suites stay green. Zero committed-artifact writes (scores.db lazy-migration excluded; tests use tmp_path). The two workflow FIX verdicts were shared-worktree cross-attribution false-positives (reviewers saw sibling builders' files); disproven by disjoint-ownership plus integrated-green verification.
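The read-only reconcile described for scores_db.py can be pictured roughly as follows. This is a sketch under stated assumptions, not the shipped `reconcile()`: the `run_id` key, the `scores` table, and the report fields are hypothetical, since the PR text does not show the real schema.

```python
import json
import sqlite3
from pathlib import Path


def read_index(path: Path) -> tuple:
    """Tolerantly read a JSONL index: return (set of run ids, malformed line count)."""
    ids, bad = set(), 0
    if not path.is_file():
        return ids, bad
    with path.open("r", encoding="utf-8") as fh:  # read-only; the file is never rewritten
        for line in fh:
            line = line.strip()
            if not line:
                continue  # blank lines are not errors
            try:
                row = json.loads(line)
            except json.JSONDecodeError:
                bad += 1  # count the bad line and keep going
                continue
            if isinstance(row, dict) and "run_id" in row:  # hypothetical key
                ids.add(str(row["run_id"]))
            else:
                bad += 1
    return ids, bad


def reconcile(index_path: Path, conn: sqlite3.Connection) -> dict:
    """Compare index ids with db ids and report differences without fixing them."""
    index_ids, bad = read_index(index_path)
    # Hypothetical table and column names.
    db_ids = {str(r[0]) for r in conn.execute("SELECT run_id FROM scores")}
    return {
        "only_in_index": sorted(index_ids - db_ids),
        "only_in_db": sorted(db_ids - index_ids),
        "malformed_lines": bad,
        "consistent": index_ids == db_ids and bad == 0,
    }
```

Reporting mismatches instead of repairing them is what lets this coexist with another PR that owns writes to INDEX.jsonl.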

coderabbitai · 2026-06-16T13:42:09Z

Warning

Review limit reached

@100yenadmin, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 3 minutes and 40 seconds. Learn how PR review limits work.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 9890e11f-a660-44da-95d4-4bee2a8db4e2

📥 Commits

Reviewing files that changed from the base of the PR and between b014dd8 and d3d5f51.

📒 Files selected for processing (13)

.github/workflows/ci.yml
.github/workflows/release-readiness.yml
qa/latency_baseline.json
qa/orchestrate_split_rri.py
qa/release_readiness.py
qa/scores_db.py
qa/screenshot_baselines/README.md
qa/test_deterministic_rri_gate.py
qa/test_orchestrate_split_rri.py
qa/test_release_readiness.py
qa/test_scores_db.py
qa/test_visual_regression_check.py
qa/visual_regression_check.py


…ncy gate activates (#957)

PR #954 added additive latency hard-gates (latency_s_per_beat / latency_coldopen) to qa/release_readiness.py, but they were DORMANT on a real sweep: read_latency() reads each PERSONA run dir's latency.json sidecar, while the runners derive the latency rollup into the TRANSCRIPT dir ($T/$RUN.latency.json), so the per-run sidecar never existed and the gate fell through to a (safe-by-design) evidence-gap SKIP instead of actually gating.

Wire it (all additive):

- qa/latency_rollup.py: new reusable stamp_sidecars(rollup, run_dirs) + a --stamp-into CLI flag. It writes {s_per_beat, coldopen_s, turns_per_beat} into each run dir as <run>/latency.json, the exact shape read_latency() consumes. NULL columns are preserved (read_latency treats null as ABSENT -> skip, never a fabricated 0.0); a non-existent run dir is skipped, never created.
- qa/release_gate.sh: after the duo run (which produced the per-beat ledger), re-derive the SAME rollup and stamp it into every persona run dir BEFORE the RRI rollup reads them. Non-fatal: a stamp hiccup / a duo with no derivable beat leaves the gate a documented skip.
- qa/release_readiness.py: corrected read_latency()'s stale docstring (it claimed run_duo.sh writes the per-run sidecar; in fact run_duo writes to the transcript dir and release_gate.sh stamps the per-run sidecar). The inaccuracy is part of why the gate looked wired but wasn't.
- qa/evidence_audit.py: refreshed the stale "canonical 11 RRI gates" comment. The evaluated set is 11 by default and 13 once latency gates carry evidence (gates_total is read dynamically; RRI_GATE_NAMES stays the always-required baseline).

Verify:

- A real RRI rollup over runs WITH over-budget latency evidence (s_per_beat>120 or coldopen_s>240) now FAILS the latency gates (gates_total 13, release_ready=False); under budget PASSES; absent stays a byte-identical skip (gates_total 11).
- New tests: stamp_sidecars unit coverage + a CLI --stamp-into test (test_latency_rollup.py); an end-to-end SEAM test driving the production rollup→stamp→gate path (test_release_readiness.py); a static contract locking the release_gate.sh wiring so the gate can't silently go dormant again (test_release_gate_static.py).
- Single-process: qa/test_release_readiness.py + qa/test_deterministic_rri_gate.py + qa/test_latency_rollup.py + affected static/audit/scope/orchestrate tests, all green.

Co-authored-by: Eva <arncalso@gmail.com>
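The stamp-then-read seam described above can be sketched as below. The signature `stamp_sidecars(rollup, run_dirs)`, the three field names, and the `<run>/latency.json` location come from the commit message; the function bodies, the return values, and this stand-in `read_latency` are assumptions and may differ from the code in qa/latency_rollup.py and qa/release_readiness.py.

```python
import json
from pathlib import Path
from typing import Iterable

SIDECAR_FIELDS = ("s_per_beat", "coldopen_s", "turns_per_beat")


def stamp_sidecars(rollup: dict, run_dirs: Iterable) -> list:
    """Stand-in: write <run>/latency.json into each existing run dir.

    Returns the sidecar paths written (an assumption; the real return
    value is not documented in the PR).
    """
    # Missing or null metrics are written as JSON null, not as 0.0.
    payload = {field: rollup.get(field) for field in SIDECAR_FIELDS}
    written = []
    for run_dir in run_dirs:
        run_dir = Path(run_dir)
        if not run_dir.is_dir():  # non-existent run dir: skip, never create
            continue
        sidecar = run_dir / "latency.json"
        sidecar.write_text(json.dumps(payload, indent=2) + "\n", encoding="utf-8")
        written.append(sidecar)
    return written


def read_latency(run_dir: Path) -> dict:
    """Stand-in reader: null or missing metrics count as absent."""
    sidecar = Path(run_dir) / "latency.json"
    if not sidecar.is_file():
        return {}
    data = json.loads(sidecar.read_text(encoding="utf-8"))
    return {k: v for k, v in data.items() if v is not None}
```

With this shape, a run whose rollup has no derivable cold-open time still gets a sidecar, but the reader reports only the metrics that carry real values, so the corresponding gate stays a skip rather than passing on a made-up number.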

100yenadmin merged commit 2e51dbe into main Jun 16, 2026
20 checks passed

100yenadmin deleted the qa-lab/p3-cadence-obs branch June 16, 2026 13:47

100yenadmin mentioned this pull request Jun 16, 2026

qa(phase-3): wire the per-run latency sidecar so the dormant RRI latency gate activates #957

Merged

100yenadmin merged 1 commit into main from qa-lab/p3-cadence-obs
