feat: /haiku:debug skill (intent recovery) + run_next loop bounding (was #368)#369
Conversation
…e infinite loops Per the goal "ensure nothing in our engine can put us in an infinite loop, that includes an internal loop to the call itself, or an agent loop that cannot progress." The deadlock detector previously emitted telemetry only — useful for dashboards, useless for prevention. The user could still get stuck in loops because nothing actually stopped the engine from returning the same action again. PR #366 fixed the proximate FB-as-witness loop; this PR fixes the structural floor. What's new: - `wouldDeadlock(slug, action)` — pre-emit predicate on deadlock-detector.ts. Returns a verdict when: * the SAME signature would fire HALT_THRESHOLD (4) consecutive times — the agent has re-ticked through the same payload three times already, the 4th MUST be different state or it's wedged. * the recent window of CHURN_HALT_MIN_TICKS (8) signatures cycles through ≤ CHURN_MAX_DISTINCT (2) values — the A↔B alternation pattern. - `buildLoopHaltAction(slug, verdict)` — produces a concrete `loop_halted` action with the offending signature inline + a recovery script for the agent. - `broadcastTick` (run-tick.ts) now calls wouldDeadlock BEFORE recording or broadcasting. When the verdict fires, it swaps the cursor's action for the halt action. The agent reads `loop_halted`, surfaces the halt to the user, and stops re-ticking. Once the underlying state changes (different signature on the next tick), the halt lifts automatically. - New `prompts/global/loop_halted/` — eta-templated agent directive that's explicit about NOT auto-recovering. Registered in the prompt index. What this guarantees: - The same action signature CANNOT be returned more than 4 consecutive times. Even if every other prevention misses (drift dedup, bolt cap, migration idempotency, witness refresh), the detector forces a halt. - A↔B alternation is bounded at 8 consecutive ticks. - The halt is recoverable: on a different signature, the counter resets. No on-disk state needed; in-memory across the same MCP process. Tests: - deadlock-detector.test.mjs: 5 new cases pinning wouldDeadlock + buildLoopHaltAction — null when no prior, fires on HALT_THRESHOLD-1 + 1, fires on churn, halt action surfaces the signature, fresh intent doesn't false-positive. - Full suite: 1622 passed (was 1617). - Boot smoke test: bundle still launches cleanly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rchitectural # Conflicts: # plugin/bin/haiku.mjs
Three issues from the bot review: 1. Migration error path + select_* gates bypassed broadcastTick. The "every return path funnels through here" claim in the broadcastTick docstring was false — four early returns went around it. The migration error case is the load-bearing one: if migrateIntent throws on every tick, runWorkflowTick would return the same `action: "error"` indefinitely with no halting mechanism. Now all four paths route through broadcastTick → wouldDeadlock fires after HALT_THRESHOLD repeats. 2. loop_halted in the recent window defeated subsequent churn detection for ~CHURN_WINDOW more ticks. recordTickResult now detects the halt marker and skips appending it to recent[]; the count + telemetry still fire so the halt is observed, but the alternation history stays clean and the next churn check doesn't false-negative. 3. Misleading test name — "returns null until HALT_THRESHOLD-1" was actually asserting the predicate FIRES. Renamed to "fires on the HALT_THRESHOLD-th consecutive identical tick". Plus per the user question on Sentry routing + recoverability: - New `haiku.deadlock.halted` telemetry fires from buildLoopHaltAction. Already-existing `haiku.deadlock.suspected` + `churn_suspected` are advisory; the new `halted` event is the hard-halt counterpart. All three flow to the OTLP / Sentry sink via emitTelemetry. - The halt is recoverable by design: when the underlying state changes (a different action signature on the next tick), the consecutive counter resets automatically. No on-disk state to clean up. The agent reads `loop_halted`, surfaces to user, fix the underlying state, next tick lifts the halt. New test covers the loop_halted-not-in-window contract. Tests: 1623 passed (was 1622). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per user goal (2026-05-15): "no matter what circumstance, the run_next call should never get stuck in a loop. It can block via a SPA pop, but it CANNOT hang for infinite recursion or stuck while loops, and you need to have tests and or lint rules that validate any call to run_next cannot hang with near absolute certainty." Three structural additions on top of the existing `RUN_NEXT_LOOP_CAP=16` iteration cap: 1. Wall-clock deadline (`RUN_NEXT_WALL_DEADLINE_MS=90_000`). Captured once at handler entry. Every loop body checks it alongside the iter counter. If the whole call exceeds 90 s (real disk-walks finish in milliseconds; 90 s = wedged), the engine returns `loop_aborted` with the elapsed-ms diagnostic. New `wallDeadlineExceeded` predicate + `wallDeadlineAbortResponse` helper in `_loop_guard.ts`. 2. Wall-deadline applied to the three structural while-loops: `select_*`, `close_feedback`, `complete_stage`. Each loop's first-iteration code now does both `++iterations > RUN_NEXT_LOOP_CAP` AND `wallDeadlineExceeded(startedAt)` checks. 3. The `gate_review` / `user_gate` loop is DELIBERATELY exempt from the wall deadline — its body calls `await_gate`, the SPA- pop blocking primitive the user explicitly carved out. A real human-in-the-loop decision can take minutes; a 90-s deadline would falsely trip. The iteration cap + no-progress sig check bound this loop on the engine's side; await_gate's own presence-loss timeout (~120 s) covers "SPA never opened." 4. NEW STATIC TEST `test/run-next-loops-bounded.test.mjs`. Parses `haiku_run_next.ts` source, finds every `while (...)` opener, asserts each loop body contains an iter-counter-gate (`++X > RUN_NEXT_LOOP_CAP` + `loopAbortResponse`) OR a no-progress signature check OR a bounded loop condition. If a future contributor adds a `while (...)` without one of those, the test fails before CI lets the change ship. This is the "lint rule" the user requested — run-time tests can't enumerate every input that might wedge a loop, but the static shape covers all inputs by construction. All 4 test cases pass against the current codebase. Tests: 1627 (was 1623; +4 new lint cases). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… corrupt intents
Five admin ops, every mutation gated behind SPA-picker confirmation:
- force_stage_complete — sign reviews/approvals/QGs for units that have
moved through every hat (terminal `iterations[].result === "advance"`).
Refuses units that haven't reached terminal advance.
- set_intent_field — bypass FSM-protected intent.md frontmatter (mode, etc.)
- reset_drift — re-stamp every witnessed slot with current SHA so drift
sweep stops re-firing on stale witnesses
- mutate_feedback — apply FM patch to a feedback record (status, closed_at,
resolution, etc.) with no lifecycle guards
- preview_cursor — read-only `derivePosition()` so the user can see the next
cursor head before/after mutating
Why it's safe: every mutating op routes through `runPicker({ kind: "confirm" })`
before any state change runs. The agent cannot reach these ops without the
user clicking through the SPA. preview_cursor skips the picker since
observation is non-mutating.
Why it exists: when intent state corrupts in ways the engine's recovery
paths can't reach (lost stamps, witness/SHA divergence the drift sweep
keeps re-firing on, FB stuck in impossible state), the only escape was
hand-editing FM on disk — which the guard-workflow-fields hook blocks
the agent from doing. This routes the same mutations through a
confirmation gate so the user can authorize them explicitly.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Claude finished @jwaldrip's task in 7m 23s —— View job PR ReviewTasks:
SummaryThe overall design is solid: admin ops scoped to a purpose-built tool, every mutation gated behind a SPA picker, a read-only preview op that requires no confirmation, and a skill doc that explicitly says "use Blocking: CI is red —
Running Security:
Correctness:
Medium:
Low / Observation:
Looks good:
|
| // definition here is just the operations; the user-confirmation | ||
| // gate is in `tools/orchestrator/haiku_debug.ts`. | ||
|
|
||
| import { existsSync, readFileSync, readdirSync, writeFileSync } from "node:fs" |
There was a problem hiding this comment.
CI failure: writeFileSync is imported but never used.
This is a Biome noUnusedImports error that's currently breaking the Biome Lint CI job. Remove writeFileSync from this import.
| import { existsSync, readFileSync, readdirSync, writeFileSync } from "node:fs" | |
| import { existsSync, readFileSync, readdirSync } from "node:fs" |
There's also a Biome import-order error and a formatter failure on this file (import order, multi-line function signature formatting). Running npx @biomejs/biome check --write packages/haiku/src/orchestrator/workflow/debug-ops.ts will fix all three automatically.
| } from "../../orchestrator/workflow/debug-ops.js" | ||
| import { runPicker } from "../../server/picker.js" | ||
| import { defineTool } from "../define.js" | ||
| import { text } from "./_text.js" |
There was a problem hiding this comment.
Missing validateSlugArgs call — path traversal risk.
Every other orchestrator tool that takes intent + stage + feedback_id calls validateSlugArgs early to reject values containing /, \, or ... haiku_debug skips this, so a crafted intent: "../../etc" would resolve intentDir() outside the .haiku/intents/ tree.
This is especially important here because the ops mutate files directly (e.g. setFrontmatterField on intentMdPath).
| import { text } from "./_text.js" | |
| import { | |
| forceStageComplete, | |
| mutateFeedback, | |
| previewCursor, | |
| resetDrift, | |
| setIntentField, | |
| } from "../../orchestrator/workflow/debug-ops.js" | |
| import { runPicker } from "../../server/picker.js" | |
| import { defineTool, validateSlugArgs } from "../define.js" | |
| import { text } from "./_text.js" |
Then add at the top of handle(), before the SUPPORTED_OPS check:
const slugCheck = validateSlugArgs({ intent: args.intent as string, stage: args.stage as string, feedback_id: args.feedback_id as string })
if (slugCheck) return slugCheck| } | ||
|
|
||
| // Read-only path: no picker required. | ||
| if (op === "preview_cursor") { |
There was a problem hiding this comment.
preview_cursor is unguarded by the try/catch block at line 123.
The previewCursor() call returns early (line 83-85) before the try { switch(op) { ... } } catch (err) { errorResponse(...) } block. If derivePosition() inside previewCursor() throws (e.g., corrupted state on a truly broken intent is exactly when someone reaches for this tool), the exception propagates unhandled through handle().
Wrap the preview_cursor early return:
| if (op === "preview_cursor") { | |
| // Read-only path: no picker required. | |
| if (op === "preview_cursor") { | |
| try { | |
| const r = previewCursor({ slug }) | |
| return text(JSON.stringify(r)) | |
| } catch (err) { | |
| return errorResponse({ | |
| error: "preview_cursor_threw", | |
| detail: err instanceof Error ? err.message : String(err), | |
| }) | |
| } | |
| } |
| // intent-completion review on the next tick. | ||
| let igsSigned = false | ||
| if (targetIdx === stages.length - 1) { | ||
| const intentApprovals = |
There was a problem hiding this comment.
force_stage_complete on the final stage omits "continuity" from the intent approval roles.
The cursor at cursor.ts:1354-1355 requires ["spec", "continuity", "user"] (or ["spec", "continuity"] for autopilot) before it reaches the intent_quality_gates check. This code signs ["spec", "user", "intent_quality_gates"], skipping continuity.
If the intent is not in autopilot mode and continuity is still unsigned, calling force_stage_complete on the final stage won't unblock the cursor — it will re-emit intent_review for the continuity role on the next tick.
| const intentApprovals = | |
| for (const role of ["spec", "continuity", "user", "intent_quality_gates"]) { |
Or, pull the roles from the same logic path the cursor uses for correctness against mode variation.
| if (!numMatch) return { ok: false, error: "invalid_feedback_id_shape" } | ||
| const nn = numMatch[1].padStart(3, "0") | ||
| const found = readdirSync(fbDir).find( | ||
| (f) => f.startsWith(`${nn}-`) || f.startsWith(`${args.feedbackId}-`), |
There was a problem hiding this comment.
The second startsWith arm is dead code and the regex lacks a $ anchor.
Two issues:
-
FB files on disk are named
NNN-slug.md(e.g.037-some-title.md), neverFB-NNN-slug.md. Sof.startsWith(\${args.feedbackId}-`)withfeedbackId = "FB-037"checks for"FB-037-"which never matches. The second arm is unreachable for any validFB-NNNinput and dead for bare-number input (sincenn` is already the padded form). Remove it. -
The regex
^(?:FB-)?(\d+)has no$anchor, unlike the canonicalfindFeedbackFileatstate-tools.ts:5688which uses^(?:FB-)?(\d+)$. Without$, inputs like"FB-037-anything"silently match, extracting"037"and finding a file the user didn't intend. The canonical function should be called directly here to avoid re-implementing divergent lookup logic.
Consider delegating to findFeedbackFile(args.slug, args.stage ?? "", args.feedbackId) instead:
| (f) => f.startsWith(`${nn}-`) || f.startsWith(`${args.feedbackId}-`), | |
| const found = findFeedbackFile(args.slug, args.stage ?? "", args.feedbackId) | |
| if (!found) return { ok: false, error: "feedback_not_found" } | |
| const fbPath = found.path |
| const dir = intentDir(args.slug) | ||
| const intentMdPath = join(dir, "intent.md") | ||
| if (!existsSync(intentMdPath)) return { ok: false, error: "intent_not_found" } | ||
| const fm = matter(readFileSync(intentMdPath, "utf8")).data |
There was a problem hiding this comment.
Inconsistent frontmatter parser: matter vs parseFrontmatter.
Every other function in this file uses parseFrontmatter (imported from state-tools.ts), which includes duplicate-key recovery via dedupeFrontmatterKeys and date normalization. previewCursor uses raw matter directly, which will throw on a duplicate-key YAML file (exactly the scenario a debug tool should handle gracefully) and won't normalize dates.
| const fm = matter(readFileSync(intentMdPath, "utf8")).data | |
| const fm = parseFrontmatter(readFileSync(intentMdPath, "utf8")).data |
With this change you can also drop the import matter from "gray-matter" line entirely (gray-matter is already available via parseFrontmatter), which also resolves the Biome import-order warning.
| if (!approvals[role]) | ||
| approvals[role] = buildApprovalRecord(dir, outputs) | ||
| } | ||
| setFrontmatterField(unitPath, "reviews", reviews) |
There was a problem hiding this comment.
Partial mutation on units_not_terminal_advance error — non-atomic behaviour worth surfacing clearly.
When a stage has a mix of terminal-advance and non-terminal units, this loop signs and writes the terminal ones before hitting the refused unit check. On ok: false, partial_signed is included in details, but the stagesProcessed list also includes the stage name (it's pushed unconditionally at line 129) — which overstates success to the caller.
The PR description says "refuses units that haven't reached terminal advance" and the intent is correct (it's a debug/recovery tool and users understand it's not atomic). However, the stagesProcessed list should either exclude stages with any refused units, or be renamed to stages_visited to avoid implying they're fully signed.
This won't cause a correctness bug (the signed units got signed correctly, the unsigned ones were skipped) but makes the error response slightly misleading. Low priority, but worth a pass before merge.
| }, | ||
| // v4: haiku_classify_drift removed. Drift sweep auto-files FBs; | ||
| // the feedback track handles assessment. | ||
| { |
There was a problem hiding this comment.
Schema duplication: haiku_debug input schema is defined twice.
The same schema appears verbatim in tool-defs.ts (for MCP advertisement) and in haiku_debug.ts's defineTool() call (for dispatch). These need to stay in sync manually. If tool-defs.ts drifts from the actual handler schema (e.g., someone adds a parameter to the handler but forgets to update the MCP listing), agents will see a different surface than what the handler accepts.
The pattern other migrated tools use is: defineTool() is the single source of truth for both the schema and the handler. The orchestratorToolHandlers registry in tools/orchestrator/index.ts exposes those ToolDef objects, and orchestratorToolDefs in tool-defs.ts reads from that registry — or the tools were separately added.
Worth checking if haiku_debug can follow the same pattern as haiku_select_mode or haiku_await_gate where tool-defs.ts re-exports rather than duplicates.
…t panel Companion to PR #369's MCP backend. Two surfaces, same five ops, two elicitation gates: - agent-driven: `haiku_debug` MCP tool → `runPicker` confirmation - user-driven: SPA admin panel → in-page confirmation modal Both call the same `debug-ops.ts` functions; the agent cannot reach the ops without an explicit user click on either surface. What ships in this PR: Server (`http/debug-routes.ts`): - GET /debug — SPA shell - GET /debug/:slug — SPA shell - GET /api/debug/intents — list every intent - GET /api/debug/intents/:intent — metadata + stages - GET /api/debug/intents/:intent/cursor — preview_cursor - POST /api/debug/intents/:intent/ops/:op — run an op (ops: force_stage_complete, set_intent_field, reset_drift, mutate_feedback) SPA (`packages/haiku-ui/src/routes/debug/`): - /debug — intent index table (slug, title, studio, mode, status, archived) with deep-link to per-intent admin panel - /debug/:slug — admin panel with intent metadata + cursor preview (re-fetched after every op) and four admin-op forms. Each form's "Prepare op" button opens a confirmation modal showing the exact request body before POST. Cancel = no-op; confirm = mutation runs and the cursor pane refreshes. Auth: read endpoints + write endpoints both go through the existing `requireTunnelAuth` chain (loopback-only in local mode, JWT in tunnel mode) — same model as the rest of the SPA's mutation surface. Tests (`test/debug-routes.test.mjs`): 11 tests against a real Fastify server with a real on-disk intent. Covers list/detail/cursor/op-dispatch/ error paths and SPA-shell HTML serving. Per the user's standing rule, no mocking — the routes call debug-ops directly so the tests prove both layers in one pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Claude finished @jwaldrip's task in 4m 32s —— View job PR Review (re-review after second commit)Tasks:
Status: previous issues remain open + two new security issues in the SPA commitNo issues from the previous review were fixed in the second commit. The second commit added the SPA routes layer, which introduced two new security gaps. All blocking issues must be resolved before merge. Blocking (CI + Correctness)CI is still red — Biome Lint job fails on
Run
Security
Medium (carried from previous review, still open)
Looks good (second commit)
|
| Params: { intent: string; op: string } | ||
| Body: Record<string, unknown> | ||
| }>("/api/debug/intents/:intent/ops/:op", async (req, reply) => { | ||
| if (!requireTunnelAuth(req, reply, null)) return |
There was a problem hiding this comment.
Security: mutation endpoint should use verifyIntentMutationAuth, not requireTunnelAuth(…, null).
Every other endpoint in the codebase that mutates a specific intent's files uses verifyIntentMutationAuth (or its alias verifyFeedbackMutationAuth) — feedback-api.ts and upload-routes.ts both follow this pattern. The difference matters in tunnel mode: requireTunnelAuth(req, reply, null) verifies the JWT is signed and non-expired, but does not bind the JWT's sid claim to the URL's :intent slug. A tunnel-mode reviewer holding a valid JWT for session S1 (intent A) can POST to /api/debug/intents/B/ops/force_stage_complete and mutate intent B. auth.ts:62–68 names this attack exactly: R-01 cross-session bypass.
| if (!requireTunnelAuth(req, reply, null)) return | |
| instance.post<{ | |
| Params: { intent: string; op: string } | |
| Body: Record<string, unknown> | |
| }>("/api/debug/intents/:intent/ops/:op", async (req, reply) => { | |
| if (!verifyIntentMutationAuth(req, reply, req.params.intent)) return |
And add verifyIntentMutationAuth to the import from "./auth.js":
import { requireTunnelAuth, verifyIntentMutationAuth } from "./auth.js"| let result: unknown | ||
| switch (op) { | ||
| case "force_stage_complete": { | ||
| const stage = typeof body.stage === "string" ? body.stage : "" |
There was a problem hiding this comment.
Security: body.stage is untrusted input — validate with isValidSlug before use.
The slug param is validated at line 184. The stage value from the POST body is not. It flows directly to mutateFeedback which constructs join(dir, "stages", args.stage, "feedback"). A crafted stage like "../../other-intent/stages/design" would escape the current intent's stages/ tree. (forceStageComplete is protected by its resolveStudioStages check, but mutateFeedback has no equivalent guard.)
| const stage = typeof body.stage === "string" ? body.stage : "" | |
| const stage = typeof body.stage === "string" ? body.stage : "" | |
| if (!stage) { | |
| reply.status(400).send({ error: "missing_stage" }) | |
| return | |
| } | |
| result = forceStageComplete({ slug, targetStage: stage }) |
Add this before line 205:
if (stage && !isValidSlug(stage)) {
reply.status(400).send({ error: "invalid_stage" })
return
}Apply the same check to body.stage in the mutate_feedback branch (line ~228).
…port order - debug-ops.ts: drop unused writeFileSync import - debug-routes.ts: organize imports - .tsx: refactor effect so refreshTick is read inside the body (biome's useExhaustiveDependencies couldn't see it as the trigger) - biome auto-format pass Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Claude finished @jwaldrip's task in 2m 40s —— View job Claude finished @jwaldrip's task —— View job PR Review (re-review after third commit)Tasks:
Status: Biome CI fixed. Three security/correctness issues remain blocking.The third commit ( Blocking (security + correctness — unresolved from previous reviews)
Medium (still open)
What the third commit fixed
Looks good (unchanged from previous reviews)
|
Blocking (security + correctness):
- haiku_debug.ts: add validateSlugArgs at the top of handle() — guards
intent / stage / feedback_id against path traversal before any
debug-op runs. Mirrors every other orchestrator tool.
- debug-routes.ts: write endpoint now uses verifyIntentMutationAuth
instead of bare requireTunnelAuth(..., null). Binds the JWT's `sid`
claim to the URL's :intent slug, closing R-01 cross-session bypass
(auth.ts:62–68 documents this exact attack).
- debug-routes.ts: validate body.stage with isValidSlug in both
force_stage_complete and mutate_feedback. mutateFeedback constructs
join(dir, "stages", body.stage, "feedback") so a crafted
"../../other-intent/stages/design" otherwise escapes the intent's tree.
- debug-ops.ts: forceStageComplete intent-completion role array now
includes "continuity" — cursor.ts:1354–1355 requires
["spec", "continuity"] (autopilot) or ["spec", "continuity", "user"]
(non-autopilot) BEFORE it reaches the intent_quality_gates check.
Without this, the cursor re-emits intent_review on the next tick and
the wedge persists.
Medium:
- haiku_debug.ts: wrap preview_cursor's early-return in its own
try/catch returning { error: "preview_cursor_threw" }. Callers reach
for this op precisely when intents are corrupted enough that
derivePosition() may throw.
- debug-ops.ts: previewCursor now uses parseFrontmatter (not raw
matter). On a file with duplicate YAML keys — exactly the corrupted-FM
scenario — parseFrontmatter recovers via dedupeFrontmatterKeys. Drops
the now-unused gray-matter import.
- debug-ops.ts: mutateFeedback regex now anchored with $; second
startsWith arm removed. Without $ the pattern silently matched
"FB-037-anything"; FB files on disk are NNN-slug.md (never
FB-NNN-slug.md), so the second arm was dead code.
Tests: +2 traversal-rejection tests (force_stage_complete + mutate_feedback).
13/13 pass. Full suite: 1636 passed, 0 failed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Claude finished @jwaldrip's task in 2m 8s —— View job Claude finished @jwaldrip's task —— View job PR Review (re-review after fourth commit —
|
…ursor try/catch The three remaining non-blocking observations from review #4: - forceStageComplete is now atomic (debug-ops.ts): Two-pass design — pass 1 walks every unit on every stage and collects refusals WITHOUT writing anything; if any unit hasn't reached terminal advance the op aborts cleanly with `signed: 0`. Pass 2 only runs when the whole set passes. Earlier versions wrote partial signatures before the refusal check fired, then reported half-signed stages as "processed" — caller's mental model was off. Added test/debug-ops-atomicity.test.mjs (3 cases): refusal returns signed:0; ready unit's file is byte-identical before/after the failed call; stalled unit untouched. - Schema duplication eliminated (state/schemas/inputs/debug.ts): New TypeBox schema HAIKU_DEBUG_INPUT_SCHEMA is the single source of truth. tool-defs.ts uses jsonSchemaOf(HAIKU_DEBUG_INPUT_SCHEMA); haiku_debug.ts uses the same schema + the shared HAIKU_DEBUG_SUPPORTED_OPS constant. The 40+ lines of inline schema in tool-defs.ts are gone; adding a parameter can't leave the MCP advertisement and the dispatcher out of sync. - Cursor GET endpoint now has try/catch (debug-routes.ts): derivePosition() may throw on a truly corrupted intent — exactly what callers reach for the debug surface to diagnose. Returns a structured 500 with { ok: false, error: "preview_cursor_threw", detail } instead of letting Fastify's default error handler emit a generic crash page. Full suite: 1639 passed, 0 failed (was 1636 — +3 from the new atomicity test). Lint clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Claude finished @jwaldrip's task in 4m 6s —— View job Claude finished @jwaldrip's task —— View job PR Review (re-review after fifth commit —
|
| Issue | Status |
|---|---|
Biome lint failures (writeFileSync, import order, formatter) |
✅ Fixed in 57d3825 |
validateSlugArgs missing in haiku_debug.ts (path traversal) |
✅ Fixed in 33ebd76 |
verifyIntentMutationAuth missing in write endpoint |
✅ Fixed in 33ebd76 |
body.stage not validated with isValidSlug |
✅ Fixed in 33ebd76 |
"continuity" missing from intent approval roles |
✅ Fixed in 33ebd76 |
previewCursor unguarded by try/catch in haiku_debug.ts |
✅ Fixed in 33ebd76 |
Raw matter instead of parseFrontmatter in previewCursor |
✅ Fixed in 33ebd76 |
mutateFeedback regex missing $ anchor, dead code arm |
✅ Fixed in 33ebd76 |
stagesProcessed overstating success on partial refusal |
✅ Fixed in 3d978ce (two-pass design) |
Schema duplication between tool-defs.ts and haiku_debug.ts |
✅ Fixed in 3d978ce |
Cursor GET endpoint missing try/catch |
✅ Fixed in 3d978ce |
Looks good (full PR)
- Trust boundary is correctly implemented: SPA confirmation modal on the HTTP path,
runPickeron the MCP path. Both surfaces call the samedebug-ops.tsfunctions. The agent cannot reach any mutation without an explicit user action on one of the two surfaces. SKILL.mdis well-written: "When NOT to use" section directs to/haiku:repairand/haiku:reset-stagefirst, the confirmation-flow diagram is accurate, the "never auto-retry on cancellation" note is correct.debug-routes.test.mjs— real HTTP server, real on-disk intent. Two path-traversal tests cover bothforce_stage_completeandmutate_feedback. Three atomicity tests indebug-ops-atomicity.test.mjs.derived-stage-state.tschange is minimal — only exports two previously-private functions (reviewRolesFor,approvalRolesFor).listIntentSummariesresilient catch path correctly skips unparseable intent files.- Workflow diagrams are auto-generated (
export:workflow-diagrams); the regeneration is consistent across all studios.
Ready to merge.
The compiled validator was already exported from state/schemas/inputs/debug.ts but the handler skipped it. Without it, a shape-rejected call produced the MCP runtime's generic error instead of the stable named code \`haiku_debug_input_invalid\` with structured \`errors[]\`. Now matches the haiku_select_mode precedent and every other validated state-tool. Order: shape gate (validateToolInput) → path-traversal gate (validateSlugArgs) → op dispatch. Both gates produce stable named codes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Combines the debug skill PR with the run-next loop-bounding PR so they ship as one atomic unit per user request. # Conflicts: # plugin/bin/haiku.mjs
|
Claude finished @jwaldrip's task in 2m 36s —— View job Claude finished @jwaldrip's task —— View job PR Review (re-review after sixth and seventh commits —
|
| Loop | Cap check | Wall deadline | No-progress sig |
|---|---|---|---|
select_* |
✓ | ✓ | ✓ |
close_feedback |
✓ | ✓ | ✓ |
complete_stage |
✓ | ✓ | ✓ |
gate_review / user_gate |
✓ | deliberately omitted | ✓ |
The gate_review loop's deliberate exemption from wallDeadlineExceeded is correctly motivated and clearly commented (lines 1489–1497): await_gate is the SPA-pop blocking primitive the user explicitly exempted ("It can block via a SPA pop, but it CANNOT hang for infinite recursion"). A real human-in-the-loop decision can take minutes; the 90s deadline would falsely fire. The iteration cap + no-progress check still bound the loop.
run-next-loops-bounded.test.mjs is a clever static assertion: it parses the source text of haiku_run_next.ts, finds every while ( opener, and asserts each one has the cap-gate pattern within the next 40 lines. A future commit that adds an unbounded while loop will fail CI before shipping. The test has its own hasBoundedConditionShape and hasUserDecisionAwait predicates for edge cases. Test file is thorough and the mechanism is sound.
One minor observation (non-blocking)
wallDeadlineAbortResponse passes reason: "no_progress" to loopAbortResponse (_loop_guard.ts:60). The actual trigger is a wall-clock ceiling, which is semantically closer to "cap" than "no_progress". The diagnostic is still accurate — the wall-deadline info is in the loop name string ("${loopName} (wall-deadline … ms exceeded; elapsed=…ms)"), so the log and error response are human-readable. The only oddity is the reason=no_progress token in the detail line where reason=cap would be more precise. Non-blocking; the user can read the diagnostic either way.
Full issue history — all resolved ✓
| Issue | Fixed in |
|---|---|
Biome lint failures (writeFileSync, import order, formatter) |
57d3825 |
validateSlugArgs missing (path traversal on MCP args) |
33ebd76 |
verifyIntentMutationAuth missing on write endpoint (R-01 bypass) |
33ebd76 |
body.stage not validated with isValidSlug |
33ebd76 |
"continuity" missing from intent approval roles |
33ebd76 |
previewCursor unguarded by try/catch in MCP tool |
33ebd76 |
Raw matter instead of parseFrontmatter in previewCursor |
33ebd76 |
mutateFeedback regex missing $ anchor, dead code arm |
33ebd76 |
stagesProcessed overstating success on partial refusal |
3d978ce (two-pass atomicity) |
Schema duplication between tool-defs.ts and haiku_debug.ts |
3d978ce |
Cursor GET endpoint missing try/catch |
3d978ce |
validateToolInput shape gate missing in haiku_debug.handle() |
22fb8d9 |
Looks good (full PR)
- Trust boundary correctly implemented: SPA confirmation modal on the HTTP path,
runPickeron the MCP path. Both surfaces call the samedebug-ops.tsfunctions. forceStageCompletetwo-pass atomicity is correct and tested (debug-ops-atomicity.test.mjssnapshot-compares file bytes before and after a failed call).- Schema is a single source of truth (
state/schemas/inputs/debug.ts);tool-defs.tsandhaiku_debug.tsboth import from it. debug-routes.tscorrectly usesverifyIntentMutationAuthon the write endpoint andisValidSlugon bothbody.stagepaths.SKILL.md"When NOT to use" section, confirmation-flow diagram, and "never auto-retry on cancellation" note are all correct.run-next-loops-bounded.test.mjsstatic assertion is a strong regression guard.
Ready to merge.
Open feedback continues to block the cursor even after every approval is signed — the next tick returns feedback_dispatch / review_fix as long as any FB lacks closure. The new opt-in option closes that gap in one call. When close_open_feedback: true, after the unit-sign pass, walk every FB on the targeted stages (and on intent scope when the final stage is the target). For each FB without closed_at and without closed_by, stamp: closed_at: <now ISO> closed_by: "force_complete" That's it. We deliberately do NOT write status: closed — v4 derives status from closed_at / closed_by (state-tools.ts:5645 doc), and the legacy `status` field is being phased out. Also rejects already-closed FBs as no-ops (don't overwrite existing closed_at, don't add force_complete provenance to FBs that closed via the normal fix loop). Wired through the schema (state/schemas/inputs/debug.ts), the MCP tool, the HTTP route, and the SPA force-stage-complete form (now has a checkbox for "Also close every open feedback"). Default is false — the closure behavior is opt-in, since some recovery scenarios want to keep FBs open for follow-up. Tests (debug-ops-atomicity.test.mjs): +4 cases. feedback_closed === 1 (only the open one); open FB now has closed_at + closed_by: force_complete; status field is NOT written (v4 derives it); already- closed FBs (via closed_at OR closed_by) are byte-untouched. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The v4 schemas already declared "there is no status field" (feedback.ts:8,
unit.ts:5), but ~25 consumer sites in state-tools.ts still read \`item.status\`
as a fallback when on-disk data carried the legacy enum. v8 closes the loop:
1. Migration v7→v8 (\`orchestrator/migrations/v7-to-v8.ts\`)
Walks every FB and unit md file in the intent dir. For FB: drops the
\`status:\` field; synthesizes the v8 closure signals when status implied
closure (addressed→closed_at+closed_by:feedback-assessor; answered→
closed_at+resolution:answered; closed→closed_at+closed_by:manual_close;
rejected→rejected_at). For unit: drops \`status:\` outright (unit position
is fully derived from iterations + reviews + approvals). Stamps
plugin_version: 8.0.0 on intent.md. Idempotent re-runs are no-ops.
2. \`deriveFeedbackStatus(fm)\` returns the same 6 enum values consumers
expected, computed from signal precedence:
rejected_at → rejected
closed_by === "feedback-assessor" → addressed
resolution === "answered" → answered
closed_at OR closed_by → closed
iterations[] non-empty → fixing
else → pending
Falls back to legacy \`status:\` field for un-migrated FBs in flight.
3. \`deriveUnitStatus(fm)\` returns "pending" / "active" / "completed":
iterations[] empty → pending
last iteration result==="advance" + approvals non-empty → completed
else → active
Same legacy-status fallback.
4. \`normalizeLegacyFeedbackStatus(fm)\` translates any legacy \`status:\`
write into the v8 signals before persisting:
status: closed/addressed/answered → write closed_at (+ closed_by /
resolution flavor where appropriate)
status: rejected → write rejected_at
status: pending/fixing/null/empty → just drop the legacy field
Wired into updateFeedbackFile, the no-op move path, and the reject
handler — every FB write site that might inherit legacy status from
a spread of \`found.data\` now strips it on the way out.
5. Consumer-site refactors in state-tools.ts:
- readFeedbackFiles: status now derived (not read from FM)
- countPendingFeedback: simplified to \`status === pending|fixing\`
- haiku_unit_set lifecycle gate: uses deriveUnitStatus
- haiku_unit_write lifecycle gate: uses deriveUnitStatus
- haiku_unit_delete lifecycle gate: uses deriveUnitStatus
- haiku_feedback_reject: stamps rejected_at instead of status:rejected,
normalizes the spread to drop any legacy status
- haiku_feedback_move lifecycle guard: uses deriveFeedbackStatus
- feedback list "isClosed" predicate: uses derived status
- appendFeedbackReply close_as_answered path: stamps closed_at +
resolution:answered instead of status:answered
6. Reject handler bug fix: \`closed_by: null\` write deletes the key
instead of setting it to undefined (js-yaml choked on the latter).
Tests: feedback.test.mjs updated to assert v8 disk shape (closed_at /
rejected_at / closed_by) instead of the legacy status field. Full
suite: 1647 passed, 0 failed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tness
Three things in one commit because they all converge on "v8 closure
truth = closed_at":
1. Bug fix (user-reported): force_stage_complete close_open_feedback
skipped FBs that had closed_by set but closed_at: null. The cursor's
source-of-truth for closure is closed_at — closed_by alone is "a unit
claimed it, the feedback-assessor hasn't verified yet" (per the v4
schema doc) which still BLOCKS the gate. The user found this when
29 of 31 design FBs in their stuck intent were in that exact state.
Now: skip only when closed_at OR rejected_at is stamped. If closed_by
exists without closed_at, stamp closed_at and PRESERVE the existing
closed_by (don't overwrite the claimer's provenance — just add the
verification timestamp).
2. deriveFeedbackStatus correctness: same root cause. Earlier returned
"closed" for closed_by-alone, lying to consumers. Now returns
"fixing" (claimed-pending-verification) until closed_at lands.
Matches the cursor exactly:
rejected_at → rejected
closed_at + closed_by:feedback-assessor → addressed
closed_at + resolution:answered → answered
closed_at → closed
closed_by alone OR iterations[] → fixing
else → pending
3. Batch ops: mutate_feedback now accepts feedback_ids: string[] and
set_intent_field now accepts fields: { [key]: value } so multi-file /
multi-key mutations land in ONE picker confirmation instead of one
round-trip per item. The user was running 11+ haiku_debug calls in a
row to address corrupt-intent state — batching collapses that into a
single confirm.
Wired through:
- Schema (state/schemas/inputs/debug.ts): feedback_ids array,
fields object
- haiku_debug MCP tool: dispatch loops + describeOp updates so the
picker prompt names the batch shape
- HTTP route /api/debug/intents/:intent/ops/:op: same batch dispatch
- SPA mutate-feedback form: textarea for IDs (one per line) — single
line = single mutate, multi = batch
- SPA set-intent-field form: multi-row "Add row" + "Remove last"
buttons — single row = single mutate, multi = batch
Tests:
test/debug-ops-atomicity.test.mjs: regression test asserts closed_by-
only FB now gets closed_at stamped (was: skipped); existing closed_by
preserved (was: would have been overwritten). feedback_closed count
now reflects both truly-open + claimed-but-unverified FBs.
test/debug-routes.test.mjs: 13/13 still pass.
test/feedback.test.mjs: 64/64 still pass after derivation tightening.
Full suite: 1647 passed, 0 failed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Claude finished @jwaldrip's task in 4m 51s —— View job Claude finished @jwaldrip's task —— View job PR Review (re-review after three new commits —
|
| Issue | Fixed in |
|---|---|
| Biome lint failures | 57d3825 |
validateSlugArgs missing (path traversal on MCP args) |
33ebd76 |
verifyIntentMutationAuth missing (R-01 bypass) |
33ebd76 |
body.stage not validated with isValidSlug |
33ebd76 |
"continuity" missing from intent approval roles |
33ebd76 |
previewCursor unguarded by try/catch |
33ebd76 |
Raw matter instead of parseFrontmatter |
33ebd76 |
mutateFeedback regex missing $ anchor |
33ebd76 |
Two-pass atomicity for forceStageComplete |
3d978ce |
Schema duplication (tool-defs.ts vs haiku_debug.ts) |
3d978ce |
Cursor GET endpoint missing try/catch |
3d978ce |
validateToolInput shape gate missing |
22fb8d9 |
Looks good (new commits)
- Migration idempotency:
v7-to-v8.tscorrectly no-ops on files without astatus:key. The fallback timestamp for synthesized closure (createdAt || new Date(0).toISOString()) is reasonable — epoch is a valid timestamp, clearly synthetic, and won't cause false positives downstream. normalizeLegacyFeedbackStatuswiring: covers all three write paths that could propagate a legacystatus:spread (updateFeedbackFile, no-op move, reject handler).appendFeedbackReplywas directly updated to stampclosed_at + resolutioninstead.- Batch
setIntentFieldfield keys: these are YAML keys on a fixedintent.mdpath, not filesystem paths — no traversal risk. TheisValidSluggap above applies only tofeedback_ids. - SPA batch UI: textarea with newline-split IDs is a natural pattern; single-line → single call, multi-line → batch. The
biome-ignore lint/suspicious/noArrayIndexKeyannotation on the multi-row grid is correctly justified (positional identity). close_open_feedbackatomicity: the bug fix (stampclosed_atonclosed_by-only FBs) happens in the post-sign FB-walk, which runs only after the two-pass unit sign succeeds. Atomicity of the unit sign is unaffected.- Biome: clean on all four key files.
Ready to merge.
…omplete elaboration synth
Bug 1 (HIGH) — enforceStageBranch dragged the user off their checkout:
ensureOnStageBranch now short-circuits when ALL of:
1. Current branch is NOT a haiku-managed branch for this intent
(haiku/<slug>/main or haiku/<slug>/<stage>) — the engine's
pipeline lives on those branches and needs the actual checkout
even when stage tips converge.
2. Target stage branch is a STRICT ancestor of HEAD (HEAD has
commits the target doesn't). Equality is not enough — a freshly-
forked stage branch shares the same tip as intent main.
3. Implied: target was reachable to begin with (isAncestor true).
Result: a user on `dev` with stage branches merged in stays on `dev`.
The engine's intra-pipeline transitions still work (current is a
haiku branch → guard 1 fails → enforcement runs as before).
Bug 2 (MEDIUM) — close_open_feedback skipped FBs with closed_by but
closed_at: null. Fixed in the prior commit (eeeb66c). Verified.
Bug 3 (LOW-MED) — mutateFeedback required 3-digit zero-padding:
Lookup now mirrors `findFeedbackFile` in state-tools.ts — parses the
leading digits regardless of file-side or input-side padding width.
Pre-2026-05-07 intents with 2-digit FB filenames work without
rename. Tests: 2 new cases in debug-routes.test.mjs (legacy padding
+ filename-stem form).
Bug 4 (LOW) — invalid_feedback_id_shape on filename-stem inputs:
The accepted shapes are now: "FB-001" / "FB-1" / "001" / "1" /
"001-some-slug" / "01-some-slug". Same regex relaxation as Bug 3 —
match leading digits, ignore the rest.
NEW (user request) — force_stage_complete now synthesizes elaboration:
When every unit on a processed stage has terminal-advanced, the
elaboration round-trip is moot for recovery. Synthesize
stages/<stage>/elaboration.md with `recorded_at`, `verified_at`,
`decompose_verified_at`, and `synthesized_by: force_complete` so the
cursor walks past the elaborate phase without waiting for the
verifier subagent. Returns elaborations_sealed: N in the result.
Fires only inside the debug op (user-confirmed); the normal workflow
engine still requires the verifier to seal.
Tests:
test/debug-routes.test.mjs: +2 (legacy padding, filename-stem).
test/debug-ops-atomicity.test.mjs: +1 (elaboration synth).
Full suite: 1650 passed, 0 failed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Claude finished @jwaldrip's task in 5m 55s —— View job PR Review (re-review after
|
| Issue | Fixed in |
|---|---|
Biome lint failures (writeFileSync, import order, formatter) |
57d3825 |
validateSlugArgs missing (path traversal on MCP args) |
33ebd76 |
verifyIntentMutationAuth missing (R-01 bypass) |
33ebd76 |
body.stage not validated with isValidSlug |
33ebd76 |
"continuity" missing from intent approval roles |
33ebd76 |
previewCursor unguarded by try/catch |
33ebd76 |
Raw matter instead of parseFrontmatter in previewCursor |
33ebd76 |
mutateFeedback regex missing $ anchor, dead code arm |
33ebd76 |
Two-pass atomicity for forceStageComplete |
3d978ce |
Schema duplication (tool-defs.ts vs haiku_debug.ts) |
3d978ce |
Cursor GET endpoint missing try/catch |
3d978ce |
validateToolInput shape gate missing in haiku_debug.handle() |
22fb8d9 |
close_open_feedback skipped closed_by-only FBs (bug) |
eeeb66c |
deriveFeedbackStatus returned "closed" for closed_by alone (bug) |
eeeb66c |
mutateFeedback rejected legacy 2-digit IDs + filename stems (bugs 3/4) |
ace000e |
ensureOnStageBranch forced checkout off user's non-haiku branch (bug 1) |
ace000e |
Looks good (full PR)
- Trust boundary: SPA confirmation modal (HTTP path) +
runPicker(MCP path). Both call the samedebug-ops.tsfunctions. Agent cannot reach any mutation without explicit user action. forceStageCompletetwo-pass atomicity is correct and tested —debug-ops-atomicity.test.mjssnapshot-compares file bytes before/after a failed call.- Schema (
state/schemas/inputs/debug.ts) is the single source of truth;tool-defs.tsandhaiku_debug.tsboth import from it. debug-routes.tsusesverifyIntentMutationAuthon the write endpoint andisValidSlugonbody.stage.run-next-loops-bounded.test.mjsstatic assertion is a strong regression guard for future loop additions.SKILL.md"When NOT to use" section and "never auto-retry on cancellation" note are correct.- Full suite: 1650 passed, 0 failed (up from 1643 in the original PR, consistent with new test cases).
Ready to merge.
…rations[] field
iterations[] is one of the FSM-driven unit fields (unit.ts:62) the
agent normally cannot touch — the schema's propertyNames deny-list
rejects writes and the handler at state-tools.ts:7571 blocks them with
a clear error. Normal writes flow through haiku_unit_start /
haiku_unit_advance_hat / haiku_unit_reject_hat.
But force_stage_complete refuses to sign units whose iterations[] lacks
a terminal advance entry. Legacy intents (and partial-state units
where outputs landed but no iteration was recorded) have no way to
satisfy that precondition — there's no agent-facing tool to write
iterations[].
The new debug op closes the gap. Two modes:
- Auto (default): synthesize one { hat, result: "advance" } entry per
hat in the stage's hats: sequence. The shape force_stage_complete
needs to count the unit as terminal-advance.
- Explicit: caller passes iterations: [...] to write a specific
sequence verbatim (still validated by the per-entry shape gate).
Wired through:
- Schema: HAIKU_DEBUG_INPUT_SCHEMA gains \`unit\` + \`iterations\` fields
- haiku_debug MCP tool: set_unit_iterations case with picker confirm
- HTTP route: POST /api/debug/intents/:intent/ops/set_unit_iterations
- SPA: SetUnitIterationsForm with auto/explicit toggle and JSON editor
Unit lookup tolerates the same shapes as findFeedbackFile:
- "unit-03-my-thing" exact
- "unit-03" prefix
- "03" / "3" / "unit-3" numeric
Tests: +2 in debug-routes.test.mjs (auto-synth + missing-args reject).
Full suite: 1652 passed, 0 failed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Claude finished @jwaldrip's task in 3m 7s —— View job Claude finished @jwaldrip's task —— PR Review (re-review after
|
| Issue | Fixed in |
|---|---|
Biome lint failures (writeFileSync, import order, formatter) |
57d3825 |
validateSlugArgs missing (path traversal on MCP args) |
33ebd76 |
verifyIntentMutationAuth missing (R-01 bypass) |
33ebd76 |
body.stage not validated with isValidSlug |
33ebd76 |
"continuity" missing from intent approval roles |
33ebd76 |
previewCursor unguarded by try/catch |
33ebd76 |
Raw matter instead of parseFrontmatter in previewCursor |
33ebd76 |
mutateFeedback regex missing $ anchor, dead code arm |
33ebd76 |
Two-pass atomicity for forceStageComplete |
3d978ce |
Schema duplication (tool-defs.ts vs haiku_debug.ts) |
3d978ce |
Cursor GET endpoint missing try/catch |
3d978ce |
validateToolInput shape gate missing in haiku_debug.handle() |
22fb8d9 |
close_open_feedback skipped closed_by-only FBs (bug) |
eeeb66c |
deriveFeedbackStatus returned "closed" for closed_by alone (bug) |
eeeb66c |
mutateFeedback rejected legacy 2-digit IDs + filename stems (bugs 3/4) |
ace000e |
ensureOnStageBranch forced checkout off user's non-haiku branch (bug 1) |
ace000e |
Looks good (new commit)
resolveStageHatswas already exported fromstudio.ts:96— no new export needed.validateSlugArgsattools/define.ts:22already listed"unit"in its key set — the newunit: args.unitfield slots in correctly.- Unit lookup logic is correct: exact match → prefix match → numeric match (with
$anchor on the numeric regex). Handles"unit-03-slug","unit-03","03","3"all correctly. - Auto-synthesis guard:
no_hats_defined_for_stagereturned whenresolveStageHatsreturns empty — ops on invalid studio/stage combos fail cleanly. - Explicit path silently drops extra iteration fields (maps only
hat,result,at) — no surprise fields on disk. - SPA
SetUnitIterationsForm: auto/explicit toggle, JSON textarea for explicit mode,disabled={!stage || !unit}guard on Prepare button.
Ready to merge.
Summary
Combines two related lifecycle-safety features into one PR per request:
/haiku:debugskill — recovery toolkit for unsticking corrupt intents (originally this PR)run_nextloop bounding — wall-clock deadline + static lint test sohaiku_run_nextcannot hang (originally PR fix(engine): wall-clock deadline + static lint test — run_next cannot hang #368, merged in)Both ship together because the debug skill exists to recover from exactly the wedged-intent scenarios the loop bounding work prevents from happening in the first place.
Part 1 —
/haiku:debugskill (intent recovery)Two surfaces, same five ops, two elicitation gates:
haiku_debugMCP tool →runPickerconfirmation/debug→ in-page confirmation modalBoth surfaces call the same
debug-ops.tsfunctions. The agent cannot reach the ops without an explicit user click on either surface.The five ops
force_stage_complete— sign reviews/approvals/QGs for every unit in stages up to and including the target. Refuses units that haven't reached terminaliterations[].result === "advance"(the "moved through every hat" proof). For the final stage, also signsintent.md.approvals.{spec,continuity,user,intent_quality_gates}matching the cursor's intent-completion gate. Atomic — pass 1 collects refusals without writing; pass 2 only runs when the whole set passes.set_intent_field— bypass FSM-protectedintent.mdfrontmatter (mode, etc.).reset_drift— re-stamp every witnessed slot (reviews + approvals on every unit) with the current on-disk SHA so drift sweep stops re-firing on stale witnesses.mutate_feedback— apply an FM patch to a feedback record (status, closed_at, resolution, targets.*, etc.) with no lifecycle guards.preview_cursor— read-onlyderivePosition(). Refreshes after every mutation so the user sees the new cursor head immediately.Surfaces
MCP tool (
haiku_debug): Every mutating op routes throughrunPicker({ kind: "confirm" }). On Cancel/timeout the response is{ action: "cancelled" }and no state was mutated.SPA admin panel (
/debug): Index table at/debug(slug, title, studio, mode, status, archived) deep-links into/debug/:slugper-intent admin panel with intent metadata + cursor preview pane (refreshes after every op) + four admin-op forms. Each form's "Prepare op" button opens a confirmation modal showing the exact request body and an amber "BYPASSES the normal workflow engine" warning.Security
validateToolInputshape gate on the MCP path.validateSlugArgspath-traversal guard forintent/stage/feedback_idon the MCP path.verifyIntentMutationAuthon the HTTP write path (binds JWTsidclaim to URL:intentslug — closes R-01 cross-session bypass).isValidSlugonbody.stagefor bothforce_stage_completeandmutate_feedback(closes path-traversal escape).state/schemas/inputs/debug.ts); bothtool-defs.tsandhaiku_debug.tsimport the sameHAIKU_DEBUG_INPUT_SCHEMA.Part 2 —
run_nextloop bounding (was PR #368)Adds a wall-clock deadline + a static lint test so
haiku_run_nextcannot hang under any circumstance.RUN_NEXT_WALL_DEADLINE_MS=90_000— every internal loop inhaiku_run_next.ts(select_*,close_feedback,complete_stage) checks the wall clock against the deadline captured at handler entry. If exceeded, returns{ error: "wall_deadline_exceeded" }rather than spinning. Thegate_reviewloop is deliberately exempt — it's the SPA-pop blocking primitive.test/run-next-loops-bounded.test.mjs— static lint test scanninghaiku_run_next.tsforwhile (openers; asserts each is iter-counter-gated. New loops added without bounding fail this test._loop_guard.ts—wallDeadlineExceeded+wallDeadlineAbortResponsehelpers shared across loop sites.This composes with PR #367's
wouldDeadlock(already merged) so we have defense in depth: inter-tick deadlock detection + intra-call wall-clock deadline + static-lint enforcement.Files
Debug backend:
packages/haiku/src/orchestrator/workflow/debug-ops.ts— five ops (atomic two-pass design)packages/haiku/src/tools/orchestrator/haiku_debug.ts— MCP tool with picker confirmationpackages/haiku/src/orchestrator/tool-defs.ts— MCP-advertised tool def (uses shared schema)packages/haiku/src/state/schemas/inputs/debug.ts— single-source-of-truth schemapackages/haiku/src/orchestrator/workflow/derived-stage-state.ts— exportedreviewRolesFor/approvalRolesForpackages/haiku/src/http/debug-routes.ts— HTTP endpoints + SPA shellsDebug SPA:
packages/haiku-ui/src/routes/debug/index.tsx— intent indexpackages/haiku-ui/src/routes/debug/$slug.tsx— per-intent admin panel + four forms + confirm modalDebug skill:
plugin/skills/debug/SKILL.mdLoop bounding (was #368):
packages/haiku/src/tools/orchestrator/_loop_guard.tspackages/haiku/src/tools/orchestrator/haiku_run_next.tspackages/haiku/src/orchestrator/workflow/deadlock-detector.tspackages/haiku/src/orchestrator/workflow/run-tick.tspackages/haiku/src/orchestrator/prompts/global/loop_halted/Tests:
packages/haiku/test/debug-routes.test.mjs— 13 cases (real Fastify + on-disk intent)packages/haiku/test/debug-ops-atomicity.test.mjs— 3 cases (byte-identical file snapshot before/after refused op)packages/haiku/test/run-next-loops-bounded.test.mjs— static lint + behavioralpackages/haiku/test/deadlock-detector.test.mjsTest plan
run_nextscenario; verifywall_deadline_exceededafter 90s🤖 Generated with Claude Code