fix(engine): wall-clock deadline + static lint test — run_next cannot hang#368
Open
jwaldrip wants to merge 4 commits into
Open
fix(engine): wall-clock deadline + static lint test — run_next cannot hang#368jwaldrip wants to merge 4 commits into
jwaldrip wants to merge 4 commits into
Conversation
…e infinite loops Per the goal "ensure nothing in our engine can put us in an infinite loop, that includes an internal loop to the call itself, or an agent loop that cannot progress." The deadlock detector previously emitted telemetry only — useful for dashboards, useless for prevention. The user could still get stuck in loops because nothing actually stopped the engine from returning the same action again. PR #366 fixed the proximate FB-as-witness loop; this PR fixes the structural floor. What's new: - `wouldDeadlock(slug, action)` — pre-emit predicate on deadlock-detector.ts. Returns a verdict when: * the SAME signature would fire HALT_THRESHOLD (4) consecutive times — the agent has re-ticked through the same payload three times already, the 4th MUST be different state or it's wedged. * the recent window of CHURN_HALT_MIN_TICKS (8) signatures cycles through ≤ CHURN_MAX_DISTINCT (2) values — the A↔B alternation pattern. - `buildLoopHaltAction(slug, verdict)` — produces a concrete `loop_halted` action with the offending signature inline + a recovery script for the agent. - `broadcastTick` (run-tick.ts) now calls wouldDeadlock BEFORE recording or broadcasting. When the verdict fires, it swaps the cursor's action for the halt action. The agent reads `loop_halted`, surfaces the halt to the user, and stops re-ticking. Once the underlying state changes (different signature on the next tick), the halt lifts automatically. - New `prompts/global/loop_halted/` — eta-templated agent directive that's explicit about NOT auto-recovering. Registered in the prompt index. What this guarantees: - The same action signature CANNOT be returned more than 4 consecutive times. Even if every other prevention misses (drift dedup, bolt cap, migration idempotency, witness refresh), the detector forces a halt. - A↔B alternation is bounded at 8 consecutive ticks. - The halt is recoverable: on a different signature, the counter resets. No on-disk state needed; in-memory across the same MCP process. Tests: - deadlock-detector.test.mjs: 5 new cases pinning wouldDeadlock + buildLoopHaltAction — null when no prior, fires on HALT_THRESHOLD-1 + 1, fires on churn, halt action surfaces the signature, fresh intent doesn't false-positive. - Full suite: 1622 passed (was 1617). - Boot smoke test: bundle still launches cleanly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rchitectural # Conflicts: # plugin/bin/haiku.mjs
Three issues from the bot review: 1. Migration error path + select_* gates bypassed broadcastTick. The "every return path funnels through here" claim in the broadcastTick docstring was false — four early returns went around it. The migration error case is the load-bearing one: if migrateIntent throws on every tick, runWorkflowTick would return the same `action: "error"` indefinitely with no halting mechanism. Now all four paths route through broadcastTick → wouldDeadlock fires after HALT_THRESHOLD repeats. 2. loop_halted in the recent window defeated subsequent churn detection for ~CHURN_WINDOW more ticks. recordTickResult now detects the halt marker and skips appending it to recent[]; the count + telemetry still fire so the halt is observed, but the alternation history stays clean and the next churn check doesn't false-negative. 3. Misleading test name — "returns null until HALT_THRESHOLD-1" was actually asserting the predicate FIRES. Renamed to "fires on the HALT_THRESHOLD-th consecutive identical tick". Plus per the user question on Sentry routing + recoverability: - New `haiku.deadlock.halted` telemetry fires from buildLoopHaltAction. Already-existing `haiku.deadlock.suspected` + `churn_suspected` are advisory; the new `halted` event is the hard-halt counterpart. All three flow to the OTLP / Sentry sink via emitTelemetry. - The halt is recoverable by design: when the underlying state changes (a different action signature on the next tick), the consecutive counter resets automatically. No on-disk state to clean up. The agent reads `loop_halted`, surfaces to user, fix the underlying state, next tick lifts the halt. New test covers the loop_halted-not-in-window contract. Tests: 1623 passed (was 1622). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per user goal (2026-05-15): "no matter what circumstance, the run_next call should never get stuck in a loop. It can block via a SPA pop, but it CANNOT hang for infinite recursion or stuck while loops, and you need to have tests and or lint rules that validate any call to run_next cannot hang with near absolute certainty." Three structural additions on top of the existing `RUN_NEXT_LOOP_CAP=16` iteration cap: 1. Wall-clock deadline (`RUN_NEXT_WALL_DEADLINE_MS=90_000`). Captured once at handler entry. Every loop body checks it alongside the iter counter. If the whole call exceeds 90 s (real disk-walks finish in milliseconds; 90 s = wedged), the engine returns `loop_aborted` with the elapsed-ms diagnostic. New `wallDeadlineExceeded` predicate + `wallDeadlineAbortResponse` helper in `_loop_guard.ts`. 2. Wall-deadline applied to the three structural while-loops: `select_*`, `close_feedback`, `complete_stage`. Each loop's first-iteration code now does both `++iterations > RUN_NEXT_LOOP_CAP` AND `wallDeadlineExceeded(startedAt)` checks. 3. The `gate_review` / `user_gate` loop is DELIBERATELY exempt from the wall deadline — its body calls `await_gate`, the SPA- pop blocking primitive the user explicitly carved out. A real human-in-the-loop decision can take minutes; a 90-s deadline would falsely trip. The iteration cap + no-progress sig check bound this loop on the engine's side; await_gate's own presence-loss timeout (~120 s) covers "SPA never opened." 4. NEW STATIC TEST `test/run-next-loops-bounded.test.mjs`. Parses `haiku_run_next.ts` source, finds every `while (...)` opener, asserts each loop body contains an iter-counter-gate (`++X > RUN_NEXT_LOOP_CAP` + `loopAbortResponse`) OR a no-progress signature check OR a bounded loop condition. If a future contributor adds a `while (...)` without one of those, the test fails before CI lets the change ship. This is the "lint rule" the user requested — run-time tests can't enumerate every input that might wedge a loop, but the static shape covers all inputs by construction. All 4 test cases pass against the current codebase. Tests: 1627 (was 1623; +4 new lint cases). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Per user goal (2026-05-15): "no matter what circumstance, the run_next call should never get stuck in a loop. It can block via a SPA pop, but it CANNOT hang for infinite recursion or stuck while loops, and you need to have tests and or lint rules that validate any call to run_next cannot hang with near absolute certainty."
Summary
Three structural additions on top of the existing `RUN_NEXT_LOOP_CAP=16`:
Test plan
🤖 Generated with Claude Code