[qa][P0-harness] Quota circuit-breaker + stale-evidence hygiene — a 429 must yield QUOTA_BLOCKED, never a junk RRI

## Root cause (rc3 attempt-1 diagnosis, 2026-06-10 — full report in the session scratchpad)
The 22-min junk sweep @a245a2c: all 4 parallel personas died on **HTTP 429 "You've hit your session limit"** at the DM cold-open (evidence: identical error in all 4 `vm2-*/backend.log`). The harness has **no quota circuit-breaker**:

- (a) `scripts/play.sh:313` burns a *second* call retrying a 429, then `:332` masks it via the narration fallback;
- (b) `qa/ui_playtest_app.sh:821-870` polls for 600s before mis-bucketing the failure as `no_actor`/`no_provider`;
- (c) the player-agent's limit banner was *scored* (veteran: sat=5 "derived" off a rate-limit banner);
- (d) the sweep then published an RRI from stale evidence (next issue) that reads like a build regression.

## Fix (one PR, qa/-only)
1. **`qa/lib_beat_driver.sh`** (~:613-617 `clawdnd_report_attempt_failure`): on a result matching `HTTP 429|hit your session limit|usage limit` → emit `[dm-attempt] QUOTA_EXHAUSTED until=<reset-if-parseable>` + write a `QUOTA_EXHAUSTED` sentinel into the run dir; callers SKIP the retry (don't burn a second call).
2. **`qa/ui_playtest_app.sh`**: ready-wait loop polls for the sentinel/marker → abort immediately with a new failure bucket `quota_exhausted` (add to `APP_FAILURE_BUCKETS_JSON` ~:209). `qa/ui_playtest_score.py`: detect the limit banner in a player verdict → same bucket, never a derived sat score.
3. **`qa/vm/sweep_v2.sh`**: after the canary + each persona, grep for the marker; on hit → kill remaining personas, `touch $RES/QUOTA_BLOCKED`, skip duo/ui_audit, and write an explicit `quota_blocked` verdict instead of a junk RRI.
4. **Preflight lane alignment**: the sweep's `support_vm_preflight.py` call must pass `--provider claude --player-agent claude` (today it checks codex defaults — wrong-lane noise).
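
The banner detection in step 2 could look roughly like the sketch below. Only the match pattern and the `quota_exhausted` bucket name come from this issue; `classify_verdict`, its return shape, and the case-insensitive match are assumptions for illustration, not the existing `qa/ui_playtest_score.py` API.

```python
import re
from typing import Optional

# Pattern copied from step 1 of this issue. Case-insensitive matching is an
# assumption; the real banner casing may make it unnecessary.
QUOTA_PATTERN = re.compile(
    r"HTTP 429|hit your session limit|usage limit", re.IGNORECASE
)


def classify_verdict(text: str, derived_sat: Optional[int]) -> dict:
    """Hypothetical helper: bucket a player verdict, refusing to score a banner."""
    if QUOTA_PATTERN.search(text):
        # A rate-limit banner is not gameplay evidence, so any derived
        # satisfaction score is discarded rather than reported.
        return {"bucket": "quota_exhausted", "sat": None}
    return {"bucket": None, "sat": derived_sat}
```

The point of returning `sat: None` instead of the derived value is that downstream rollups can tell "no evidence" apart from a low or high score.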

## Stale-evidence hygiene (same PR)
5. Before the duo: `rm -f qa/transcripts/vm2-duo.*.json` (an aborted duo currently republishes the PREVIOUS run's lens scores byte-for-byte — rc3's "story 4.0/mech 3.0" were rc2's files, verified byte-identical).
6. Rollup `--runs`: filter run dirs to `run.json build_sha == $SHA` (or wipe `qa/ui_playtest_runs/vm2-*` at sweep start, mirroring the per-persona play-state wipe) — rc3's RRI consumed three Jun-6 dirs (3 stale criticals, phantom persona).
7. Behavioral: when duo.log has no `[duo] done.` line → report `NOT_RUN` (an evidence gap), never default-RED.
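
Items 6 and 7 can be sketched as two small checks. This assumes `run.json` is a JSON object with a top-level `build_sha` key, as item 6 implies. The function names and the `RAN` label are hypothetical; only `NOT_RUN`, the `vm2-*` glob, and the `[duo] done.` line come from this issue.

```python
import json
from pathlib import Path
from typing import List


def runs_for_sha(runs_root: Path, sha: str) -> List[Path]:
    """Keep only vm2-* run dirs whose run.json records the given build SHA."""
    kept = []
    for run_dir in sorted(runs_root.glob("vm2-*")):
        try:
            meta = json.loads((run_dir / "run.json").read_text())
        except (OSError, ValueError):
            # Missing or unparseable run.json: cannot prove the dir belongs
            # to this build, so leave it out of the rollup.
            continue
        if isinstance(meta, dict) and meta.get("build_sha") == sha:
            kept.append(run_dir)
    return kept


def duo_status(duo_log: Path) -> str:
    """Return NOT_RUN unless the log contains the '[duo] done.' line."""
    try:
        lines = duo_log.read_text().splitlines()
    except OSError:
        return "NOT_RUN"  # no log at all is also an evidence gap
    if any("[duo] done." in line for line in lines):
        return "RAN"  # hypothetical label for "completed, scores are usable"
    return "NOT_RUN"
```

Dropping dirs with an unreadable `run.json` errs on the side of excluding evidence, which matches the intent here: a gap is reported as a gap rather than filled from an older run.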

## Tests
Static contract tests (in the style of `test_release_gate_static.py`, run in the qa CI lane):

- sentinel branch exists in `lib_beat_driver`;
- quota bucket in `ui_playtest_app` + buckets JSON;
- sweep greps the marker + `QUOTA_BLOCKED` path;
- preflight call carries `--provider claude`;
- duo-transcript `rm` precedes the duo;
- the rollup runs-filter exists.

Plus a unit test: `clawdnd_report_attempt_failure` on a 429 fixture emits the marker + skips the retry (extend `test_dm_session_remint.py`'s bash-via-pytest pattern).
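
A static contract test of this kind usually reduces to two text checks: a marker is present, and one marker appears before another. The helpers below are a sketch under that assumption. Their names are hypothetical and not taken from `test_release_gate_static.py`, and the string that identifies the duo invocation in `qa/vm/sweep_v2.sh` would have to be filled in from the real script.

```python
from pathlib import Path
from typing import Iterable, List


def missing_markers(source: Path, markers: Iterable[str]) -> List[str]:
    """Return the markers that do not occur anywhere in the file's text."""
    text = source.read_text()
    return [m for m in markers if m not in text]


def appears_before(text: str, first: str, second: str) -> bool:
    """True only if both strings occur and `first` starts before `second`."""
    i, j = text.find(first), text.find(second)
    return i != -1 and j != -1 and i < j
```

A test would then assert, for example, that `missing_markers(Path("qa/vm/sweep_v2.sh"), ["QUOTA_BLOCKED", "--provider claude"])` is empty. These are substring checks only, so they confirm the contract is written down in the script, not that the branch executes.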

## Acceptance
A simulated-429 fixture run produces `quota_blocked` (not an RRI); a clean re-run @ the fix SHA produces a full-length sweep. Then rc3 re-runs for real.


[qa][P0-harness] Quota circuit-breaker + stale-evidence hygiene — a 429 must yield QUOTA_BLOCKED, never a junk RRI #842
