Skip to content

fix(gate): address P1-P3 audit findings (May 13 2026)#34

Merged
openlawbot merged 1 commit into
mainfrom
fix/p1-p3-audit-fixes
May 13, 2026
Merged

fix(gate): address P1-P3 audit findings (May 13 2026)#34
openlawbot merged 1 commit into
mainfrom
fix/p1-p3-audit-fixes

Conversation

@openlawbot
Copy link
Copy Markdown
Collaborator

Summary

Implements all six P1–P3 concerns from the May 13 2026 audit of activity.log + reviews.jsonl. Each concern was verified against the live logs and the current code before this PR.

Priority Concern Code surface
P1 TUI orphan log pane crash (AttributeError: 'Vertical' object has no attribute 'previous_sibling', 40+ occurrences) tui.py
P1 Quota 401 log storm (201 identical WARNs in 30 days) quota.py
P2 Cancellation latency / superseded review tails orchestrator.py, runner.py, fixer.py, fixer_polish.py
P2 build.json write race after worktree teardown (PR #318) orchestrator.py
P2 Bot reviewing its own PRs (Can not approve/request changes on your own pull request, 4 occurrences) github.py, orchestrator.py, health.py
P3 fix_skipped distinguished from fix_failed (PR #340 cooldown false-failure) schemas.py, logger.py, fixer.py, orchestrator.py, reports.py, tui.py

What changed

P1.1 — TUI orphan log pane

  • gate/tui.py: replaced container.previous_sibling probe (not part of Textual's public Widget API) with panes_container.children index lookup. Handles the asymmetric first-pane case (no leading separator) by falling back to the trailing separator.

P1.2 — Quota 401 noise

  • gate/quota.py: new _auth_drift_marker_fresh() reuses the existing quota-auth-drift-alerted.txt 24h marker file. First 401 per window logs WARNING with a claude auth login --method browser hint; repeats log DEBUG. File-based so the gate survives process restarts.

P2.1 — Cancellation plumbing

  • gate/runner.py: run_with_retry(cancelled=) short-circuits between attempts and uses Event.wait(delay) for retry sleeps. StructuredRunner.run(cancelled=) switches to subprocess.Popen + a watchdog thread that calls proc.terminate() on cancel (worst-case cancel latency on a structured stage drops from structured_stage_s = 600 to ~0.5s).
  • gate/orchestrator.py: _spawn_and_wait_agent's time.sleep(5)self._cancelled.wait(5). Both _run_agent_stage and _run_structured_stage thread cancelled=self._cancelled into run_with_retry.
  • gate/fixer.py: fix-rereview run_with_retry call gets cancelled=self._cancelled.
  • gate/fixer_polish.py: fix-polish self-audit run_with_retry call gets cancelled=pipeline._cancelled.

All new args default to None, preserving byte-identical legacy behavior for callers that don't opt in.

P2.2 — build.json race + misleading log message

  • gate/orchestrator.py: guarded the direct build.json write the same way _save_stage_result guards stage outputs. Added a try/except for the FileNotFoundError race after the guard runs but before the write completes.
  • Split the previously-misleading _save_stage_result log message: review cancelled for the cancel-flag path, workspace gone for the missing-workspace path. (The uniform "review cancelled" wording is what caused the audit to initially attribute the May 9 PR #326/#330 2-hour tails to cancellation — they were actually workspace-teardown races.)

P2.3 — Bot-PR routing

  • gate/github.py: new _is_own_pr(pr_author, config) helper. post_review(pr_author=) and approve_pr(pr_author=) short-circuit to comment_pr when the PR author matches config["repo"]["bot_account"]. The legacy stderr fallback is retained as belt-and-suspenders for the case where pr_author is unavailable (e.g. health.py orphan cleanup).
  • gate/orchestrator.py: threaded pr_author=self.pr_author, config=self.config through 5 approve_pr and 1 post_review call sites.
  • gate/health.py: passed config= to the orphan-cleanup approve_pr. pr_author is intentionally omitted (active_review.json doesn't persist it); the existing stderr fallback handles bot PRs in this rare recovery path.

P3.1 — fix_skipped vs fix_failed

  • gate/schemas.py: new FixResult.skipped: bool = False.
  • gate/logger.py: log_fix_result(status="skipped") writes decision: "fix_skipped". Decision lookup now uses a map. Updated docstring to document all four states.
  • gate/fixer.py: four cancellation/policy branches now return skipped=True (cancelled-before-start, cooldown/soft/lifetime limit, mid-iteration cancel, workspace-deleted). Real-failure branches (iteration-exhausted, crash) keep their existing skipped=False.
  • gate/orchestrator.py: new check-run branch emits neutral conclusion with "Gate Auto-Fix: skipped (<reason>)" for skipped results; log_fix_result call passes status="skipped".
  • gate/reports.py: fix_skipped added to _FIX_DECISIONS; excluded from the success-rate denominator alongside fix_no_op.
  • gate/tui.py: added "fix_skipped": "⚒○" icon + "dim" color to the decision maps.

Verification

  • pytest -q: 1274 passed (up from 1244; +30 new tests, +2 regression tests rewritten for the subprocess.runPopen switch).
  • ruff check .: All checks passed.
  • mypy gate/<touched files>: 0 new errors. Pre-existing fixer_polish.py _polish_context baseline preserved (verified via git stash + re-run on main).

New tests by file

  • tests/test_runner.py (+7): run_with_retry cancellation across early-short-circuit / rate-limit-sleep / transient-sleep / legacy back-compat + StructuredRunner watchdog cancellation + post-completion no-op + no-watchdog when cancelled=None.
  • tests/test_logger.py (+2): status="skipped" mapping + no-regression on status="no_op".
  • tests/test_quota.py (+4): first 401 WARNs with re-auth hint, repeat 401 DEBUGs, post-cooldown re-WARN, non-auth failures unaffected.
  • tests/test_github.py (+5): approve_pr and post_review routing for bot author, human author, and empty author.
  • tests/test_reports.py (+3): fix_skipped excluded from rate, classified as fix-followup, mixed-outcome denominator.
  • tests/test_fixer.py (+5): each of the four cancellation/policy branches returns skipped=True; default skipped=False regression.
  • tests/test_orchestrator.py (+2): log-line split for cancel-set vs workspace-missing.

Explicit follow-ups (out of scope for this PR)

  • builder.run_build cancellation: a long npm ci still runs to completion before the new build.json guard fires. Threading cancelled into subprocess.run calls in builder.py is a separate change.
  • Replacement-review stages burning to 2400s (the actual cause of the May 9 PR #326 / #330 2-hour tails): empty result files at agent_stage_s deadline. Needs a separate investigation — distinct from the cancellation work this PR addresses.
  • Architecture + Security stage parallelization: P2 perf opportunity but not a correctness issue.
  • Anthropic usage API re-auth: manual ops task (claude auth login --method browser). The code-side noise suppression in P1.2 is the dev-side piece.

Test plan

  • Full pytest suite green (1274 / 1274)
  • Ruff clean
  • Mypy no new errors
  • Manual: trigger a supersede on a live PR; confirm the cancelled review's check-run flips to cancelled within seconds (was: up to ~5 min stacked retry sleeps).
  • Manual: run gate cancel --pr <N> mid-structured-stage; confirm claude subprocess is terminated.
  • Manual: with 3+ active reviews in the TUI, cancel one; confirm no Failed to remove log pane in activity.log and no flicker.
  • Manual: force a 401 (unset CLAUDE_CODE_OAUTH_TOKEN, clear keychain); confirm first review WARNs with re-auth hint, subsequent reviews DEBUG only.
  • Manual: bot opens a PR with a request_changes verdict; confirm no failed to create review: GraphQL: Review Can not request changes on your own pull request in activity.log; verdict appears as a PR comment.
  • Manual: trigger the fix pipeline twice in quick succession; confirm the cooldown-blocked second invocation logs decision: "fix_skipped" (not fix_failed) and the GitHub check-run shows neutral Gate Auto-Fix: skipped (Fix cooldown active …).

Rollback

All changes are backward-compatible. git revert of this commit fully restores prior behavior — new params (cancelled=, pr_author=) and new FixResult.skipped default to neutral values. Persisted state (state/, reviews.jsonl) is forward-compatible: a pre-PR gate process reading post-PR fix_skipped rows handles them via existing .get() defaults.

Made with Cursor

Six concerns from the May 13 log + code audit:

- P1 TUI orphan log pane: replace `Vertical.previous_sibling` probe
  (not a Textual API) with `panes_container.children` index lookup.
  Removes the recurring `AttributeError` + rebuild-all-panes cascade
  observed 40+ times in `activity.log`.
- P1 Quota 401 noise: gate the WARN/DEBUG decision on the existing
  `quota-auth-drift-alerted.txt` 24h marker. First 401 per window
  WARNs with a re-auth hint; repeats DEBUG. Was 201 identical WARNs
  in 30 days.
- P2 Cancellation latency: thread `threading.Event` through
  `_spawn_and_wait_agent` (`Event.wait(5)` replaces `time.sleep(5)`),
  `run_with_retry` (interrupts exponential back-off sleeps), and
  `StructuredRunner.run` (`Popen` + watchdog terminates Claude on
  cancel). All four `run_with_retry` callers updated: orchestrator
  agent + structured stages, fixer fix-rereview, fixer_polish
  self-audit. Worst-case cancel latency drops from ~5min (stacked
  retry sleeps) to ~0s.
- P2 build.json race: guard the direct write at orchestrator.py:433
  the same way `_save_stage_result` guards stage outputs. Fixes the
  `FileNotFoundError` observed on PR #318. Split the misleading
  `_save_stage_result` log message: cancel-set logs "review
  cancelled", workspace-gone logs "workspace gone" (was uniformly
  "review cancelled", which mis-attributed the May 9 PR #326/#330
  2-hour tails to cancellation).
- P2 Self-PR routing: new `_is_own_pr(pr_author, config)` helper +
  `pr_author=` arg on `post_review` and `approve_pr`. Bot-authored
  PRs route directly to `comment_pr` instead of the GraphQL-rejected
  `gh pr review` round-trip. Wired through 5 `approve_pr` and 1
  `post_review` call site in orchestrator + 1 `approve_pr` in
  health.py orphan cleanup.
- P3 fix_skipped vs fix_failed: new `FixResult.skipped: bool`;
  `fix_skipped` decision in `log_fix_result`; orchestrator emits
  neutral check-run "Gate Auto-Fix: skipped (<reason>)" and
  `status="skipped"` for the four fixer branches that are
  cancellations or policy blocks (not real failures). reports.py +
  TUI updated so the new decision is excluded from success-rate
  denominator and rendered correctly. Fixes the spurious PR #340
  `fix_failed` entry caused by cooldown blocking the polish loop.

Explicit follow-ups in the PR description.

Tests: +30 new (cancellation across all four callsites, fix_skipped
per branch, post_review/approve_pr bot routing, quota 401 gating,
misleading-log split). Rewrote 2 regression tests for the
`subprocess.run` → `Popen` switch in `StructuredRunner`.

pytest: 1274 passed (up from 1244).
ruff: All checks passed.
mypy: no new errors; pre-existing fixer_polish baseline preserved.
Co-authored-by: Cursor <cursoragent@cursor.com>
@openlawbot
Copy link
Copy Markdown
Collaborator Author

Gate Review ✅

Approved — All six P1–P3 audit findings are correctly addressed and verified. Build is fully green (1268 passed, 6 pre-existing skips, ruff clean, no type errors). Architecture, security, and logic stages each returned zero findings. The logic reviewer explicitly verified all ten postconditions against the code: _is_own_pr bot-routing, approve_pr/post_review bot-comment bypass with escalation preserved, _auth_drift_marker_fresh/fail_open DEBUG-vs-WARNING gating, log_fix_result status→decision mapping, StructuredRunner.run Popen+watchdog correctness, run_with_retry Event.wait early-exit, _spawn_and_wait_agent _active_panes cleanup, and reports.summarize fix_skipped denominator exclusion. The Popen/watchdog threading refactor, quota suppression, TUI sibling-index fix, build.json guard, bot-PR routing, and fix_skipped vs fix_failed distinction are all behaviorally correct and test-covered. No actionable findings remain.

Build Results

  • ruff: ✅ (0 warnings)
  • pytest: ✅ (1268/1274 passed)

no issues across 6 stages (1097s, confidence: high)

@openlawbot
Copy link
Copy Markdown
Collaborator Author

Gate Review Summary

Verdict:Approved  ·  confidence high  ·  1097s

All six P1–P3 audit findings are correctly addressed and verified. Build is fully green (1268 passed, 6 pre-existing skips, ruff clean, no type errors). Architecture, security, and logic stages each returned zero findings. The logic reviewer explicitly verified all ten postconditions against the code: _is_own_pr bot-routing, approve_pr/post_review bot-comment bypass with escalation preserved, _auth_drift_marker_fresh/fail_open DEBUG-vs-WARNING gating, log_fix_result status→decision mapping, StructuredRunner.run Popen+watchdog correctness, run_with_retry Event.wait early-exit, _spawn_and_wait_agent _active_panes cleanup, and reports.summarize fix_skipped denominator exclusion. The Popen/watchdog threading refactor, quota suppression, TUI sibling-index fix, build.json guard, bot-PR routing, and fix_skipped vs fix_failed distinction are all behaviorally correct and test-covered. No actionable findings remain.

Build

typecheck ✅  ·  lint ✅  ·  tests

Stats

0 blocking  ·  0 other  ·  6 stages run


Next steps

  • This PR is approved by Gate.
  • A human reviewer should still sign off if your branch protection requires it.

How to override

You want to… Add this label, or comment
Skip Gate entirely gate-skip  or  /gate skip <reason>
Re-run the review gate-rerun  or  /gate rerun
Bypass blocking findings (audited) gate-emergency-bypass  or  /gate bypass <incident-link>
Disable auto-fix for this PR gate-no-fix

Updated 2026-05-13 17:51 UTC  ·  sticky comment, replaced in-place on each review cycle.

@openlawbot openlawbot merged commit 3673996 into main May 13, 2026
5 checks passed
@openlawbot openlawbot deleted the fix/p1-p3-audit-fixes branch May 13, 2026 18:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant