Root cause (rc3 attempt-1 diagnosis, 2026-06-10 — full report in the session scratchpad)
The 22-min junk sweep @a245a2c: all 4 parallel personas died on HTTP 429 "You've hit your session limit" at the DM cold-open (evidence: identical error in all 4 vm2-*/backend.log). The harness has no quota circuit-breaker: (a) scripts/play.sh:313 burns a second call retrying a 429, then :332 masks it via the narration fallback; (b) qa/ui_playtest_app.sh:821-870 polls 600s before mis-bucketing as no_actor/no_provider; (c) the player-agent's limit banner was scored (veteran: sat=5 "derived" off a rate-limit banner); (d) the sweep then published an RRI from stale evidence (next issue) reading like a build regression.
Fix (one PR, qa/-only)
- qa/lib_beat_driver.sh (~:613-617 clawdnd_report_attempt_failure): on a result matching HTTP 429|hit your session limit|usage limit → emit [dm-attempt] QUOTA_EXHAUSTED until=<reset-if-parseable> + write a QUOTA_EXHAUSTED sentinel into the run dir; callers SKIP the retry (don't burn a second call).
- qa/ui_playtest_app.sh: ready-wait loop polls for the sentinel/marker → abort immediately with a new failure bucket quota_exhausted (add to APP_FAILURE_BUCKETS_JSON ~:209).
- qa/ui_playtest_score.py: detect the limit banner in a player verdict → same bucket, never a derived sat score.
- qa/vm/sweep_v2.sh: after the canary + each persona, grep for the marker; on hit → kill remaining personas, touch $RES/QUOTA_BLOCKED, skip duo/ui_audit, and write an explicit quota_blocked verdict instead of a junk RRI.
- Preflight lane alignment: the sweep's support_vm_preflight.py call must pass --provider claude --player-agent claude (today it checks codex defaults, which is wrong-lane noise).
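A minimal sketch of the banner check proposed for qa/ui_playtest_score.py, assuming the scorer sees the player verdict as plain text. The function name classify_player_verdict and the returned dict shape are hypothetical, and case-insensitive matching is an assumption; only the pattern and the quota_exhausted bucket name come from this issue.

```python
import re
from typing import Optional

# Pattern copied from the fix description; IGNORECASE is an assumption.
QUOTA_PATTERN = re.compile(
    r"HTTP 429|hit your session limit|usage limit", re.IGNORECASE
)


def classify_player_verdict(verdict_text: str, derived_sat: Optional[int]) -> dict:
    """Bucket a verdict; drop any sat score derived from a limit banner.

    Hypothetical helper: the real scorer's data model is not shown here.
    """
    if QUOTA_PATTERN.search(verdict_text):
        # A rate-limit banner is not player feedback, so no score survives.
        return {"bucket": "quota_exhausted", "sat": None}
    return {"bucket": None, "sat": derived_sat}
```

With this shape, the veteran case from the root cause (sat=5 derived off a rate-limit banner) would land in quota_exhausted with no score rather than contributing to the RRI.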
Stale-evidence hygiene (same PR)
- Before the duo: rm -f qa/transcripts/vm2-duo.*.json (an aborted duo currently republishes the PREVIOUS run's lens scores byte-for-byte; rc3's "story 4.0/mech 3.0" were rc2's files, verified byte-identical).
- Rollup --runs: filter run dirs to run.json build_sha == $SHA (or wipe qa/ui_playtest_runs/vm2-* at sweep start, mirroring the per-persona play-state wipe). rc3's RRI consumed three Jun-6 dirs (3 stale criticals, phantom persona).
- Behavioral: when duo.log has no [duo] done. line → report NOT_RUN (an evidence gap), never default-RED.
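The rollup filter and the NOT_RUN rule above can be sketched as follows. This is an illustration under stated assumptions, not the rollup's actual code: the helper names runs_for_sha and duo_status are hypothetical, run.json is assumed to be a JSON object with a top-level build_sha key, and the "RAN" label is a placeholder for whatever the rollup reports when the duo completed.

```python
import json
from pathlib import Path
from typing import List


def runs_for_sha(runs_root: Path, sha: str, prefix: str = "vm2-") -> List[Path]:
    """Keep only run dirs whose run.json build_sha equals the sweep SHA."""
    kept = []
    for run_dir in sorted(runs_root.glob(prefix + "*")):
        try:
            meta = json.loads((run_dir / "run.json").read_text())
        except (OSError, json.JSONDecodeError):
            # Missing or unreadable run.json: cannot prove it is this build.
            continue
        if isinstance(meta, dict) and meta.get("build_sha") == sha:
            kept.append(run_dir)
    return kept


def duo_status(duo_log: Path) -> str:
    """Return NOT_RUN unless the log contains a '[duo] done.' line."""
    try:
        text = duo_log.read_text(errors="replace")
    except OSError:
        return "NOT_RUN"  # no log at all is also an evidence gap
    done = any("[duo] done." in line for line in text.splitlines())
    return "RAN" if done else "NOT_RUN"  # "RAN" is a placeholder label
```

Filtering on build_sha rather than wiping keeps older run dirs on disk for later inspection; the wipe alternative is simpler but destroys that evidence.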
Tests
Static contract tests (the test_release_gate_static.py style, run in the qa CI lane): sentinel branch exists in lib_beat_driver; quota bucket in ui_playtest_app + buckets JSON; sweep greps the marker + QUOTA_BLOCKED path; preflight call carries --provider claude; duo-transcript rm precedes the duo; the rollup runs-filter exists. Plus a unit test: clawdnd_report_attempt_failure on a 429 fixture emits the marker + skips retry (extend test_dm_session_remint.py's bash-via-pytest pattern).
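A rough sketch of what such static contract checks might reduce to, assuming they only inspect script text and never execute it. The helpers missing_needles and appears_before are hypothetical, and the needle strings a real test would use depend on how the fix is actually written.

```python
from typing import Iterable, List


def missing_needles(text: str, needles: Iterable[str]) -> List[str]:
    """Return the required substrings that do not occur in the script text."""
    return [n for n in needles if n not in text]


def appears_before(text: str, first: str, second: str) -> bool:
    """True when both substrings occur and `first` starts earlier than `second`."""
    i = text.find(first)
    j = text.find(second)
    return i != -1 and j != -1 and i < j
```

A pytest case could read qa/vm/sweep_v2.sh, assert that missing_needles(text, ["QUOTA_BLOCKED", "--provider claude"]) is empty, and use appears_before to check that the duo-transcript rm comes ahead of whatever line launches the duo.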
Acceptance
A simulated-429 fixture run produces quota_blocked (not an RRI); a clean re-run @ the fix SHA produces a full-length sweep. Then rc3 re-runs for real.