fix(gate): address P1-P3 audit findings (May 13 2026)#34
Conversation
Six concerns from the May 13 log + code audit: - P1 TUI orphan log pane: replace `Vertical.previous_sibling` probe (not a Textual API) with `panes_container.children` index lookup. Removes the recurring `AttributeError` + rebuild-all-panes cascade observed 40+ times in `activity.log`. - P1 Quota 401 noise: gate the WARN/DEBUG decision on the existing `quota-auth-drift-alerted.txt` 24h marker. First 401 per window WARNs with a re-auth hint; repeats DEBUG. Was 201 identical WARNs in 30 days. - P2 Cancellation latency: thread `threading.Event` through `_spawn_and_wait_agent` (`Event.wait(5)` replaces `time.sleep(5)`), `run_with_retry` (interrupts exponential back-off sleeps), and `StructuredRunner.run` (`Popen` + watchdog terminates Claude on cancel). All four `run_with_retry` callers updated: orchestrator agent + structured stages, fixer fix-rereview, fixer_polish self-audit. Worst-case cancel latency drops from ~5min (stacked retry sleeps) to ~0s. - P2 build.json race: guard the direct write at orchestrator.py:433 the same way `_save_stage_result` guards stage outputs. Fixes the `FileNotFoundError` observed on PR #318. Split the misleading `_save_stage_result` log message: cancel-set logs "review cancelled", workspace-gone logs "workspace gone" (was uniformly "review cancelled", which mis-attributed the May 9 PR #326/#330 2-hour tails to cancellation). - P2 Self-PR routing: new `_is_own_pr(pr_author, config)` helper + `pr_author=` arg on `post_review` and `approve_pr`. Bot-authored PRs route directly to `comment_pr` instead of the GraphQL-rejected `gh pr review` round-trip. Wired through 5 `approve_pr` and 1 `post_review` call site in orchestrator + 1 `approve_pr` in health.py orphan cleanup. - P3 fix_skipped vs fix_failed: new `FixResult.skipped: bool`; `fix_skipped` decision in `log_fix_result`; orchestrator emits neutral check-run "Gate Auto-Fix: skipped (<reason>)" and `status="skipped"` for the four fixer branches that are cancellations or policy blocks (not real failures). reports.py + TUI updated so the new decision is excluded from success-rate denominator and rendered correctly. Fixes the spurious PR #340 `fix_failed` entry caused by cooldown blocking the polish loop. Explicit follow-ups in the PR description. Tests: +30 new (cancellation across all four callsites, fix_skipped per branch, post_review/approve_pr bot routing, quota 401 gating, misleading-log split). Rewrote 2 regression tests for the `subprocess.run` → `Popen` switch in `StructuredRunner`. pytest: 1274 passed (up from 1244). ruff: All checks passed. mypy: no new errors; pre-existing fixer_polish baseline preserved. Co-authored-by: Cursor <cursoragent@cursor.com>
Gate Review ✅Approved — All six P1–P3 audit findings are correctly addressed and verified. Build is fully green (1268 passed, 6 pre-existing skips, ruff clean, no type errors). Architecture, security, and logic stages each returned zero findings. The logic reviewer explicitly verified all ten postconditions against the code: _is_own_pr bot-routing, approve_pr/post_review bot-comment bypass with escalation preserved, _auth_drift_marker_fresh/fail_open DEBUG-vs-WARNING gating, log_fix_result status→decision mapping, StructuredRunner.run Popen+watchdog correctness, run_with_retry Event.wait early-exit, _spawn_and_wait_agent _active_panes cleanup, and reports.summarize fix_skipped denominator exclusion. The Popen/watchdog threading refactor, quota suppression, TUI sibling-index fix, build.json guard, bot-PR routing, and fix_skipped vs fix_failed distinction are all behaviorally correct and test-covered. No actionable findings remain. Build Results
no issues across 6 stages (1097s, confidence: high) |
Gate Review SummaryVerdict: ✅ Approved · confidence All six P1–P3 audit findings are correctly addressed and verified. Build is fully green (1268 passed, 6 pre-existing skips, ruff clean, no type errors). Architecture, security, and logic stages each returned zero findings. The logic reviewer explicitly verified all ten postconditions against the code: _is_own_pr bot-routing, approve_pr/post_review bot-comment bypass with escalation preserved, _auth_drift_marker_fresh/fail_open DEBUG-vs-WARNING gating, log_fix_result status→decision mapping, StructuredRunner.run Popen+watchdog correctness, run_with_retry Event.wait early-exit, _spawn_and_wait_agent _active_panes cleanup, and reports.summarize fix_skipped denominator exclusion. The Popen/watchdog threading refactor, quota suppression, TUI sibling-index fix, build.json guard, bot-PR routing, and fix_skipped vs fix_failed distinction are all behaviorally correct and test-covered. No actionable findings remain. Build
Stats0 blocking · 0 other · 6 stages run Next steps
How to override
Updated 2026-05-13 17:51 UTC · sticky comment, replaced in-place on each review cycle. |
Summary
Implements all six P1–P3 concerns from the May 13 2026 audit of
activity.log+reviews.jsonl. Each concern was verified against the live logs and the current code before this PR.AttributeError: 'Vertical' object has no attribute 'previous_sibling', 40+ occurrences)tui.pyquota.pyorchestrator.py,runner.py,fixer.py,fixer_polish.pybuild.jsonwrite race after worktree teardown (PR #318)orchestrator.pyCan not approve/request changes on your own pull request, 4 occurrences)github.py,orchestrator.py,health.pyfix_skippeddistinguished fromfix_failed(PR #340 cooldown false-failure)schemas.py,logger.py,fixer.py,orchestrator.py,reports.py,tui.pyWhat changed
P1.1 — TUI orphan log pane
gate/tui.py: replacedcontainer.previous_siblingprobe (not part of Textual's publicWidgetAPI) withpanes_container.childrenindex lookup. Handles the asymmetric first-pane case (no leading separator) by falling back to the trailing separator.P1.2 — Quota 401 noise
gate/quota.py: new_auth_drift_marker_fresh()reuses the existingquota-auth-drift-alerted.txt24h marker file. First 401 per window logs WARNING with aclaude auth login --method browserhint; repeats log DEBUG. File-based so the gate survives process restarts.P2.1 — Cancellation plumbing
gate/runner.py:run_with_retry(cancelled=)short-circuits between attempts and usesEvent.wait(delay)for retry sleeps.StructuredRunner.run(cancelled=)switches tosubprocess.Popen+ a watchdog thread that callsproc.terminate()on cancel (worst-case cancel latency on a structured stage drops fromstructured_stage_s = 600to ~0.5s).gate/orchestrator.py:_spawn_and_wait_agent'stime.sleep(5)→self._cancelled.wait(5). Both_run_agent_stageand_run_structured_stagethreadcancelled=self._cancelledintorun_with_retry.gate/fixer.py:fix-rereviewrun_with_retrycall getscancelled=self._cancelled.gate/fixer_polish.py:fix-polishself-auditrun_with_retrycall getscancelled=pipeline._cancelled.All new args default to
None, preserving byte-identical legacy behavior for callers that don't opt in.P2.2 —
build.jsonrace + misleading log messagegate/orchestrator.py: guarded the directbuild.jsonwrite the same way_save_stage_resultguards stage outputs. Added a try/except for theFileNotFoundErrorrace after the guard runs but before the write completes._save_stage_resultlog message:review cancelledfor the cancel-flag path,workspace gonefor the missing-workspace path. (The uniform "review cancelled" wording is what caused the audit to initially attribute the May 9 PR #326/#330 2-hour tails to cancellation — they were actually workspace-teardown races.)P2.3 — Bot-PR routing
gate/github.py: new_is_own_pr(pr_author, config)helper.post_review(pr_author=)andapprove_pr(pr_author=)short-circuit tocomment_prwhen the PR author matchesconfig["repo"]["bot_account"]. The legacy stderr fallback is retained as belt-and-suspenders for the case wherepr_authoris unavailable (e.g. health.py orphan cleanup).gate/orchestrator.py: threadedpr_author=self.pr_author, config=self.configthrough 5approve_prand 1post_reviewcall sites.gate/health.py: passedconfig=to the orphan-cleanupapprove_pr.pr_authoris intentionally omitted (active_review.json doesn't persist it); the existing stderr fallback handles bot PRs in this rare recovery path.P3.1 —
fix_skippedvsfix_failedgate/schemas.py: newFixResult.skipped: bool = False.gate/logger.py:log_fix_result(status="skipped")writesdecision: "fix_skipped". Decision lookup now uses a map. Updated docstring to document all four states.gate/fixer.py: four cancellation/policy branches now returnskipped=True(cancelled-before-start, cooldown/soft/lifetime limit, mid-iteration cancel, workspace-deleted). Real-failure branches (iteration-exhausted, crash) keep their existingskipped=False.gate/orchestrator.py: new check-run branch emitsneutralconclusion with"Gate Auto-Fix: skipped (<reason>)"for skipped results;log_fix_resultcall passesstatus="skipped".gate/reports.py:fix_skippedadded to_FIX_DECISIONS; excluded from the success-rate denominator alongsidefix_no_op.gate/tui.py: added"fix_skipped": "⚒○"icon +"dim"color to the decision maps.Verification
pytest -q:1274 passed(up from 1244; +30 new tests, +2 regression tests rewritten for thesubprocess.run→Popenswitch).ruff check .: All checks passed.mypy gate/<touched files>: 0 new errors. Pre-existingfixer_polish.py_polish_contextbaseline preserved (verified viagit stash+ re-run onmain).New tests by file
tests/test_runner.py(+7):run_with_retrycancellation across early-short-circuit / rate-limit-sleep / transient-sleep / legacy back-compat +StructuredRunnerwatchdog cancellation + post-completion no-op + no-watchdog whencancelled=None.tests/test_logger.py(+2):status="skipped"mapping + no-regression onstatus="no_op".tests/test_quota.py(+4): first 401 WARNs with re-auth hint, repeat 401 DEBUGs, post-cooldown re-WARN, non-auth failures unaffected.tests/test_github.py(+5):approve_prandpost_reviewrouting for bot author, human author, and empty author.tests/test_reports.py(+3):fix_skippedexcluded from rate, classified as fix-followup, mixed-outcome denominator.tests/test_fixer.py(+5): each of the four cancellation/policy branches returnsskipped=True; defaultskipped=Falseregression.tests/test_orchestrator.py(+2): log-line split for cancel-set vs workspace-missing.Explicit follow-ups (out of scope for this PR)
builder.run_buildcancellation: a longnpm cistill runs to completion before the new build.json guard fires. Threadingcancelledintosubprocess.runcalls inbuilder.pyis a separate change.agent_stage_sdeadline. Needs a separate investigation — distinct from the cancellation work this PR addresses.claude auth login --method browser). The code-side noise suppression in P1.2 is the dev-side piece.Test plan
cancelledwithin seconds (was: up to ~5 min stacked retry sleeps).gate cancel --pr <N>mid-structured-stage; confirmclaudesubprocess is terminated.Failed to remove log paneinactivity.logand no flicker.CLAUDE_CODE_OAUTH_TOKEN, clear keychain); confirm first review WARNs with re-auth hint, subsequent reviews DEBUG only.failed to create review: GraphQL: Review Can not request changes on your own pull requestinactivity.log; verdict appears as a PR comment.decision: "fix_skipped"(notfix_failed) and the GitHub check-run shows neutralGate Auto-Fix: skipped (Fix cooldown active …).Rollback
All changes are backward-compatible.
git revertof this commit fully restores prior behavior — new params (cancelled=,pr_author=) and newFixResult.skippeddefault to neutral values. Persisted state (state/,reviews.jsonl) is forward-compatible: a pre-PR gate process reading post-PRfix_skippedrows handles them via existing.get()defaults.Made with Cursor