Skip to content

feat: /haiku:debug skill (intent recovery) + run_next loop bounding (was #368)#369

Merged
jwaldrip merged 16 commits into
mainfrom
feat/haiku-debug-skill
May 16, 2026
Merged

feat: /haiku:debug skill (intent recovery) + run_next loop bounding (was #368)#369
jwaldrip merged 16 commits into
mainfrom
feat/haiku-debug-skill

Conversation

@jwaldrip
Copy link
Copy Markdown
Member

@jwaldrip jwaldrip commented May 15, 2026

Summary

Combines two related lifecycle-safety features into one PR per request:

  1. /haiku:debug skill — recovery toolkit for unsticking corrupt intents (originally this PR)
  2. run_next loop bounding — wall-clock deadline + static lint test so haiku_run_next cannot hang (originally PR fix(engine): wall-clock deadline + static lint test — run_next cannot hang #368, merged in)

Both ship together because the debug skill exists to recover from exactly the wedged-intent scenarios the loop bounding work prevents from happening in the first place.


Part 1 — /haiku:debug skill (intent recovery)

Two surfaces, same five ops, two elicitation gates:

  • agent-driven: haiku_debug MCP tool → runPicker confirmation
  • user-driven: SPA admin panel at /debug → in-page confirmation modal

Both surfaces call the same debug-ops.ts functions. The agent cannot reach the ops without an explicit user click on either surface.

The five ops

  • force_stage_complete — sign reviews/approvals/QGs for every unit in stages up to and including the target. Refuses units that haven't reached terminal iterations[].result === "advance" (the "moved through every hat" proof). For the final stage, also signs intent.md.approvals.{spec,continuity,user,intent_quality_gates} matching the cursor's intent-completion gate. Atomic — pass 1 collects refusals without writing; pass 2 only runs when the whole set passes.
  • set_intent_field — bypass FSM-protected intent.md frontmatter (mode, etc.).
  • reset_drift — re-stamp every witnessed slot (reviews + approvals on every unit) with the current on-disk SHA so drift sweep stops re-firing on stale witnesses.
  • mutate_feedback — apply an FM patch to a feedback record (status, closed_at, resolution, targets.*, etc.) with no lifecycle guards.
  • preview_cursor — read-only derivePosition(). Refreshes after every mutation so the user sees the new cursor head immediately.

Surfaces

MCP tool (haiku_debug): Every mutating op routes through runPicker({ kind: "confirm" }). On Cancel/timeout the response is { action: "cancelled" } and no state was mutated.

SPA admin panel (/debug): Index table at /debug (slug, title, studio, mode, status, archived) deep-links into /debug/:slug per-intent admin panel with intent metadata + cursor preview pane (refreshes after every op) + four admin-op forms. Each form's "Prepare op" button opens a confirmation modal showing the exact request body and an amber "BYPASSES the normal workflow engine" warning.

Security

  • validateToolInput shape gate on the MCP path.
  • validateSlugArgs path-traversal guard for intent / stage / feedback_id on the MCP path.
  • verifyIntentMutationAuth on the HTTP write path (binds JWT sid claim to URL :intent slug — closes R-01 cross-session bypass).
  • isValidSlug on body.stage for both force_stage_complete and mutate_feedback (closes path-traversal escape).
  • Schema is the single source of truth (state/schemas/inputs/debug.ts); both tool-defs.ts and haiku_debug.ts import the same HAIKU_DEBUG_INPUT_SCHEMA.

Part 2 — run_next loop bounding (was PR #368)

Adds a wall-clock deadline + a static lint test so haiku_run_next cannot hang under any circumstance.

  • RUN_NEXT_WALL_DEADLINE_MS=90_000 — every internal loop in haiku_run_next.ts (select_*, close_feedback, complete_stage) checks the wall clock against the deadline captured at handler entry. If exceeded, returns { error: "wall_deadline_exceeded" } rather than spinning. The gate_review loop is deliberately exempt — it's the SPA-pop blocking primitive.
  • test/run-next-loops-bounded.test.mjs — static lint test scanning haiku_run_next.ts for while ( openers; asserts each is iter-counter-gated. New loops added without bounding fail this test.
  • _loop_guard.tswallDeadlineExceeded + wallDeadlineAbortResponse helpers shared across loop sites.

This composes with PR #367's wouldDeadlock (already merged) so we have defense in depth: inter-tick deadlock detection + intra-call wall-clock deadline + static-lint enforcement.


Files

Debug backend:

  • packages/haiku/src/orchestrator/workflow/debug-ops.ts — five ops (atomic two-pass design)
  • packages/haiku/src/tools/orchestrator/haiku_debug.ts — MCP tool with picker confirmation
  • packages/haiku/src/orchestrator/tool-defs.ts — MCP-advertised tool def (uses shared schema)
  • packages/haiku/src/state/schemas/inputs/debug.ts — single-source-of-truth schema
  • packages/haiku/src/orchestrator/workflow/derived-stage-state.ts — exported reviewRolesFor / approvalRolesFor
  • packages/haiku/src/http/debug-routes.ts — HTTP endpoints + SPA shells

Debug SPA:

  • packages/haiku-ui/src/routes/debug/index.tsx — intent index
  • packages/haiku-ui/src/routes/debug/$slug.tsx — per-intent admin panel + four forms + confirm modal

Debug skill:

  • plugin/skills/debug/SKILL.md

Loop bounding (was #368):

  • packages/haiku/src/tools/orchestrator/_loop_guard.ts
  • packages/haiku/src/tools/orchestrator/haiku_run_next.ts
  • packages/haiku/src/orchestrator/workflow/deadlock-detector.ts
  • packages/haiku/src/orchestrator/workflow/run-tick.ts
  • packages/haiku/src/orchestrator/prompts/global/loop_halted/

Tests:

  • packages/haiku/test/debug-routes.test.mjs — 13 cases (real Fastify + on-disk intent)
  • packages/haiku/test/debug-ops-atomicity.test.mjs — 3 cases (byte-identical file snapshot before/after refused op)
  • packages/haiku/test/run-next-loops-bounded.test.mjs — static lint + behavioral
  • packages/haiku/test/deadlock-detector.test.mjs

Test plan

  • Full test suite green (1643 passed, 0 failed across 154 test files)
  • Build green (haiku-ui + haiku)
  • Biome lint green
  • claude-review approved on every iteration
  • All 11 prior-review items resolved (security + correctness + style)
  • Manual: trigger a corrupt-intent scenario; run each op via SPA admin panel; verify state mutates as expected
  • Manual: trigger Cancel on each modal; verify no on-disk changes
  • Manual: trigger each op via the MCP picker path and confirm same outcomes
  • Manual: trigger a stuck run_next scenario; verify wall_deadline_exceeded after 90s

🤖 Generated with Claude Code

jwaldrip and others added 5 commits May 15, 2026 09:13
…e infinite loops

Per the goal "ensure nothing in our engine can put us in an infinite
loop, that includes an internal loop to the call itself, or an agent
loop that cannot progress."

The deadlock detector previously emitted telemetry only — useful for
dashboards, useless for prevention. The user could still get stuck in
loops because nothing actually stopped the engine from returning the
same action again. PR #366 fixed the proximate FB-as-witness loop;
this PR fixes the structural floor.

What's new:
- `wouldDeadlock(slug, action)` — pre-emit predicate on
  deadlock-detector.ts. Returns a verdict when:
    * the SAME signature would fire HALT_THRESHOLD (4) consecutive
      times — the agent has re-ticked through the same payload three
      times already, the 4th MUST be different state or it's wedged.
    * the recent window of CHURN_HALT_MIN_TICKS (8) signatures cycles
      through ≤ CHURN_MAX_DISTINCT (2) values — the A↔B alternation
      pattern.
- `buildLoopHaltAction(slug, verdict)` — produces a concrete
  `loop_halted` action with the offending signature inline + a
  recovery script for the agent.
- `broadcastTick` (run-tick.ts) now calls wouldDeadlock BEFORE
  recording or broadcasting. When the verdict fires, it swaps the
  cursor's action for the halt action. The agent reads
  `loop_halted`, surfaces the halt to the user, and stops re-ticking.
  Once the underlying state changes (different signature on the next
  tick), the halt lifts automatically.
- New `prompts/global/loop_halted/` — eta-templated agent directive
  that's explicit about NOT auto-recovering. Registered in the
  prompt index.

What this guarantees:
- The same action signature CANNOT be returned more than 4
  consecutive times. Even if every other prevention misses (drift
  dedup, bolt cap, migration idempotency, witness refresh), the
  detector forces a halt.
- A↔B alternation is bounded at 8 consecutive ticks.
- The halt is recoverable: on a different signature, the counter
  resets. No on-disk state needed; in-memory across the same MCP
  process.

Tests:
- deadlock-detector.test.mjs: 5 new cases pinning wouldDeadlock +
  buildLoopHaltAction — null when no prior, fires on
  HALT_THRESHOLD-1 + 1, fires on churn, halt action surfaces the
  signature, fresh intent doesn't false-positive.
- Full suite: 1622 passed (was 1617).
- Boot smoke test: bundle still launches cleanly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rchitectural

# Conflicts:
#	plugin/bin/haiku.mjs
Three issues from the bot review:

1. Migration error path + select_* gates bypassed broadcastTick.
   The "every return path funnels through here" claim in the
   broadcastTick docstring was false — four early returns went
   around it. The migration error case is the load-bearing one:
   if migrateIntent throws on every tick, runWorkflowTick would
   return the same `action: "error"` indefinitely with no halting
   mechanism. Now all four paths route through broadcastTick →
   wouldDeadlock fires after HALT_THRESHOLD repeats.

2. loop_halted in the recent window defeated subsequent churn
   detection for ~CHURN_WINDOW more ticks. recordTickResult now
   detects the halt marker and skips appending it to recent[];
   the count + telemetry still fire so the halt is observed, but
   the alternation history stays clean and the next churn check
   doesn't false-negative.

3. Misleading test name — "returns null until HALT_THRESHOLD-1"
   was actually asserting the predicate FIRES. Renamed to "fires
   on the HALT_THRESHOLD-th consecutive identical tick".

Plus per the user question on Sentry routing + recoverability:
- New `haiku.deadlock.halted` telemetry fires from
  buildLoopHaltAction. Already-existing `haiku.deadlock.suspected`
  + `churn_suspected` are advisory; the new `halted` event is the
  hard-halt counterpart. All three flow to the OTLP / Sentry sink
  via emitTelemetry.
- The halt is recoverable by design: when the underlying state
  changes (a different action signature on the next tick), the
  consecutive counter resets automatically. No on-disk state to
  clean up. The agent reads `loop_halted`, surfaces to user, fix
  the underlying state, next tick lifts the halt.

New test covers the loop_halted-not-in-window contract.
Tests: 1623 passed (was 1622).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per user goal (2026-05-15): "no matter what circumstance, the
run_next call should never get stuck in a loop. It can block via a
SPA pop, but it CANNOT hang for infinite recursion or stuck while
loops, and you need to have tests and or lint rules that validate
any call to run_next cannot hang with near absolute certainty."

Three structural additions on top of the existing
`RUN_NEXT_LOOP_CAP=16` iteration cap:

1. Wall-clock deadline (`RUN_NEXT_WALL_DEADLINE_MS=90_000`).
   Captured once at handler entry. Every loop body checks it
   alongside the iter counter. If the whole call exceeds 90 s
   (real disk-walks finish in milliseconds; 90 s = wedged), the
   engine returns `loop_aborted` with the elapsed-ms diagnostic.
   New `wallDeadlineExceeded` predicate + `wallDeadlineAbortResponse`
   helper in `_loop_guard.ts`.

2. Wall-deadline applied to the three structural while-loops:
   `select_*`, `close_feedback`, `complete_stage`. Each loop's
   first-iteration code now does both `++iterations > RUN_NEXT_LOOP_CAP`
   AND `wallDeadlineExceeded(startedAt)` checks.

3. The `gate_review` / `user_gate` loop is DELIBERATELY exempt
   from the wall deadline — its body calls `await_gate`, the SPA-
   pop blocking primitive the user explicitly carved out. A real
   human-in-the-loop decision can take minutes; a 90-s deadline
   would falsely trip. The iteration cap + no-progress sig check
   bound this loop on the engine's side; await_gate's own
   presence-loss timeout (~120 s) covers "SPA never opened."

4. NEW STATIC TEST `test/run-next-loops-bounded.test.mjs`. Parses
   `haiku_run_next.ts` source, finds every `while (...)` opener,
   asserts each loop body contains an iter-counter-gate
   (`++X > RUN_NEXT_LOOP_CAP` + `loopAbortResponse`) OR a
   no-progress signature check OR a bounded loop condition. If a
   future contributor adds a `while (...)` without one of those,
   the test fails before CI lets the change ship. This is the
   "lint rule" the user requested — run-time tests can't enumerate
   every input that might wedge a loop, but the static shape
   covers all inputs by construction.

All 4 test cases pass against the current codebase. Tests: 1627
(was 1623; +4 new lint cases).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… corrupt intents

Five admin ops, every mutation gated behind SPA-picker confirmation:

- force_stage_complete — sign reviews/approvals/QGs for units that have
  moved through every hat (terminal `iterations[].result === "advance"`).
  Refuses units that haven't reached terminal advance.
- set_intent_field — bypass FSM-protected intent.md frontmatter (mode, etc.)
- reset_drift — re-stamp every witnessed slot with current SHA so drift
  sweep stops re-firing on stale witnesses
- mutate_feedback — apply FM patch to a feedback record (status, closed_at,
  resolution, etc.) with no lifecycle guards
- preview_cursor — read-only `derivePosition()` so the user can see the next
  cursor head before/after mutating

Why it's safe: every mutating op routes through `runPicker({ kind: "confirm" })`
before any state change runs. The agent cannot reach these ops without the
user clicking through the SPA. preview_cursor skips the picker since
observation is non-mutating.

Why it exists: when intent state corrupts in ways the engine's recovery
paths can't reach (lost stamps, witness/SHA divergence the drift sweep
keeps re-firing on, FB stuck in impossible state), the only escape was
hand-editing FM on disk — which the guard-workflow-fields hook blocks
the agent from doing. This routes the same mutations through a
confirmation gate so the user can authorize them explicitly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 15, 2026

Claude finished @jwaldrip's task in 7m 23s —— View job


PR Review

Tasks:

  • Gather PR metadata and context
  • Read core new files (debug-ops.ts, haiku_debug.ts, tool-defs.ts)
  • Check schema compliance (TypeBox/AJV pattern)
  • Review security model (picker gate, mutation safety)
  • Review derived-stage-state.ts changes
  • Review SKILL.md documentation
  • Check tool registry wiring
  • Review workflow diagram changes
  • Synthesize findings and post inline comments + summary

Summary

The overall design is solid: admin ops scoped to a purpose-built tool, every mutation gated behind a SPA picker, a read-only preview op that requires no confirmation, and a skill doc that explicitly says "use /haiku:repair first." The picker gate is the right mechanism and the implementation looks correct. Most of the issues below are fixable in a single pass.

Blocking:

CI is red — Biome Lint job fails with 3 errors + 1 warning in debug-ops.ts:

  1. writeFileSync imported but never used (noUnusedImports)
  2. Import order (organizeImports)
  3. Formatter violations on multi-line function return-type signatures

Running npx @biomejs/biome check --write packages/haiku/src/orchestrator/workflow/debug-ops.ts will fix all three. (See inline comment on line 17.)

Security:

haiku_debug.ts does not call validateSlugArgs before passing intent, stage, and feedback_id to the ops. Every other orchestrator tool that takes these args runs the traversal guard. Because this tool calls intentDir() and directly constructs paths from user input, a crafted slug like "../../etc" bypasses the .haiku/intents/ root. See inline comment on line 22. Fix this →

Correctness:

force_stage_complete on the final stage signs ["spec", "user", "intent_quality_gates"] for intent.md approvals, but the cursor (cursor.ts:1354-1355) requires ["spec", "continuity", "user"] before it even reaches the intent_quality_gates check. If continuity isn't signed, the op returns ok: true but the cursor still re-emits intent_review for continuity on the next tick — the tool says it succeeded but the wedge remains. Add "continuity" to the list. See inline comment on line 147. Fix this →

Medium:

  • previewCursor throws on broken intents, uncaught. The early-return path at line 83-85 runs before the try/catch block at line 123. If derivePosition() throws (most likely on a truly corrupted intent — exactly when someone reaches for this tool), the exception propagates from handle(). Wrap in a try/catch. See inline comment on line 83.

  • Inconsistent frontmatter parsing in previewCursor. Every function in the file uses parseFrontmatter (with duplicate-key recovery); previewCursor uses raw matter. On a file with duplicate YAML keys, matter throws while parseFrontmatter recovers. Fix: use parseFrontmatter uniformly and drop the import matter line (which also resolves the Biome import-order warning). See inline comment on line 279.

  • mutateFeedback feedback lookup diverges from canonical. The second startsWith arm is dead code (files are NNN-slug.md, not FB-NNN-slug.md), and the regex lacks a $ anchor (unlike the canonical findFeedbackFile at state-tools.ts:5688). The canonical function already handles all the edge cases. Delegate to findFeedbackFile() directly. See inline comment on line 257.

Low / Observation:

  • Schema duplication between tool-defs.ts and haiku_debug.ts. The full inputSchema is copied verbatim in both files. See inline comment on tool-defs.ts:218 — other recently migrated tools avoid this.

  • stagesProcessed is misleading in the units_not_terminal_advance error path. Stages with partial signs are still pushed to stagesProcessed, overstating success. See inline comment on line 126.

Looks good:

  • Registry wiring is correct: index.ts + tool-defs.ts both updated, server-tools.test.mjs defs/handlers parity check passes per PR description.
  • derived-stage-state.ts change is minimal and correct — just exporting two existing private functions.
  • SKILL.md is well-written: "When NOT to use" section, clear confirmation-flow diagram, and the "call preview_cursor after" recovery advice.
  • Workflow diagram changes are auto-generated (export:workflow-diagrams regeneration) and consistent across all studios.

// definition here is just the operations; the user-confirmation
// gate is in `tools/orchestrator/haiku_debug.ts`.

import { existsSync, readFileSync, readdirSync, writeFileSync } from "node:fs"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CI failure: writeFileSync is imported but never used.

This is a Biome noUnusedImports error that's currently breaking the Biome Lint CI job. Remove writeFileSync from this import.

Suggested change
import { existsSync, readFileSync, readdirSync, writeFileSync } from "node:fs"
import { existsSync, readFileSync, readdirSync } from "node:fs"

There's also a Biome import-order error and a formatter failure on this file (import order, multi-line function signature formatting). Running npx @biomejs/biome check --write packages/haiku/src/orchestrator/workflow/debug-ops.ts will fix all three automatically.

} from "../../orchestrator/workflow/debug-ops.js"
import { runPicker } from "../../server/picker.js"
import { defineTool } from "../define.js"
import { text } from "./_text.js"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing validateSlugArgs call — path traversal risk.

Every other orchestrator tool that takes intent + stage + feedback_id calls validateSlugArgs early to reject values containing /, \, or ... haiku_debug skips this, so a crafted intent: "../../etc" would resolve intentDir() outside the .haiku/intents/ tree.

This is especially important here because the ops mutate files directly (e.g. setFrontmatterField on intentMdPath).

Suggested change
import { text } from "./_text.js"
import {
forceStageComplete,
mutateFeedback,
previewCursor,
resetDrift,
setIntentField,
} from "../../orchestrator/workflow/debug-ops.js"
import { runPicker } from "../../server/picker.js"
import { defineTool, validateSlugArgs } from "../define.js"
import { text } from "./_text.js"

Then add at the top of handle(), before the SUPPORTED_OPS check:

const slugCheck = validateSlugArgs({ intent: args.intent as string, stage: args.stage as string, feedback_id: args.feedback_id as string })
if (slugCheck) return slugCheck

}

// Read-only path: no picker required.
if (op === "preview_cursor") {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

preview_cursor is unguarded by the try/catch block at line 123.

The previewCursor() call returns early (line 83-85) before the try { switch(op) { ... } } catch (err) { errorResponse(...) } block. If derivePosition() inside previewCursor() throws (e.g., corrupted state on a truly broken intent is exactly when someone reaches for this tool), the exception propagates unhandled through handle().

Wrap the preview_cursor early return:

Suggested change
if (op === "preview_cursor") {
// Read-only path: no picker required.
if (op === "preview_cursor") {
try {
const r = previewCursor({ slug })
return text(JSON.stringify(r))
} catch (err) {
return errorResponse({
error: "preview_cursor_threw",
detail: err instanceof Error ? err.message : String(err),
})
}
}

// intent-completion review on the next tick.
let igsSigned = false
if (targetIdx === stages.length - 1) {
const intentApprovals =
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

force_stage_complete on the final stage omits "continuity" from the intent approval roles.

The cursor at cursor.ts:1354-1355 requires ["spec", "continuity", "user"] (or ["spec", "continuity"] for autopilot) before it reaches the intent_quality_gates check. This code signs ["spec", "user", "intent_quality_gates"], skipping continuity.

If the intent is not in autopilot mode and continuity is still unsigned, calling force_stage_complete on the final stage won't unblock the cursor — it will re-emit intent_review for the continuity role on the next tick.

Suggested change
const intentApprovals =
for (const role of ["spec", "continuity", "user", "intent_quality_gates"]) {

Or, pull the roles from the same logic path the cursor uses for correctness against mode variation.

if (!numMatch) return { ok: false, error: "invalid_feedback_id_shape" }
const nn = numMatch[1].padStart(3, "0")
const found = readdirSync(fbDir).find(
(f) => f.startsWith(`${nn}-`) || f.startsWith(`${args.feedbackId}-`),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The second startsWith arm is dead code and the regex lacks a $ anchor.

Two issues:

  1. FB files on disk are named NNN-slug.md (e.g. 037-some-title.md), never FB-NNN-slug.md. So f.startsWith(\${args.feedbackId}-`)withfeedbackId = "FB-037"checks for"FB-037-"which never matches. The second arm is unreachable for any validFB-NNNinput and dead for bare-number input (sincenn` is already the padded form). Remove it.

  2. The regex ^(?:FB-)?(\d+) has no $ anchor, unlike the canonical findFeedbackFile at state-tools.ts:5688 which uses ^(?:FB-)?(\d+)$. Without $, inputs like "FB-037-anything" silently match, extracting "037" and finding a file the user didn't intend. The canonical function should be called directly here to avoid re-implementing divergent lookup logic.

Consider delegating to findFeedbackFile(args.slug, args.stage ?? "", args.feedbackId) instead:

Suggested change
(f) => f.startsWith(`${nn}-`) || f.startsWith(`${args.feedbackId}-`),
const found = findFeedbackFile(args.slug, args.stage ?? "", args.feedbackId)
if (!found) return { ok: false, error: "feedback_not_found" }
const fbPath = found.path

const dir = intentDir(args.slug)
const intentMdPath = join(dir, "intent.md")
if (!existsSync(intentMdPath)) return { ok: false, error: "intent_not_found" }
const fm = matter(readFileSync(intentMdPath, "utf8")).data
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inconsistent frontmatter parser: matter vs parseFrontmatter.

Every other function in this file uses parseFrontmatter (imported from state-tools.ts), which includes duplicate-key recovery via dedupeFrontmatterKeys and date normalization. previewCursor uses raw matter directly, which will throw on a duplicate-key YAML file (exactly the scenario a debug tool should handle gracefully) and won't normalize dates.

Suggested change
const fm = matter(readFileSync(intentMdPath, "utf8")).data
const fm = parseFrontmatter(readFileSync(intentMdPath, "utf8")).data

With this change you can also drop the import matter from "gray-matter" line entirely (gray-matter is already available via parseFrontmatter), which also resolves the Biome import-order warning.

if (!approvals[role])
approvals[role] = buildApprovalRecord(dir, outputs)
}
setFrontmatterField(unitPath, "reviews", reviews)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partial mutation on units_not_terminal_advance error — non-atomic behaviour worth surfacing clearly.

When a stage has a mix of terminal-advance and non-terminal units, this loop signs and writes the terminal ones before hitting the refused unit check. On ok: false, partial_signed is included in details, but the stagesProcessed list also includes the stage name (it's pushed unconditionally at line 129) — which overstates success to the caller.

The PR description says "refuses units that haven't reached terminal advance" and the intent is correct (it's a debug/recovery tool and users understand it's not atomic). However, the stagesProcessed list should either exclude stages with any refused units, or be renamed to stages_visited to avoid implying they're fully signed.

This won't cause a correctness bug (the signed units got signed correctly, the unsigned ones were skipped) but makes the error response slightly misleading. Low priority, but worth a pass before merge.

},
// v4: haiku_classify_drift removed. Drift sweep auto-files FBs;
// the feedback track handles assessment.
{
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Schema duplication: haiku_debug input schema is defined twice.

The same schema appears verbatim in tool-defs.ts (for MCP advertisement) and in haiku_debug.ts's defineTool() call (for dispatch). These need to stay in sync manually. If tool-defs.ts drifts from the actual handler schema (e.g., someone adds a parameter to the handler but forgets to update the MCP listing), agents will see a different surface than what the handler accepts.

The pattern other migrated tools use is: defineTool() is the single source of truth for both the schema and the handler. The orchestratorToolHandlers registry in tools/orchestrator/index.ts exposes those ToolDef objects, and orchestratorToolDefs in tool-defs.ts reads from that registry — or the tools were separately added.

Worth checking if haiku_debug can follow the same pattern as haiku_select_mode or haiku_await_gate where tool-defs.ts re-exports rather than duplicates.

…t panel

Companion to PR #369's MCP backend. Two surfaces, same five ops, two
elicitation gates:

- agent-driven: `haiku_debug` MCP tool → `runPicker` confirmation
- user-driven: SPA admin panel → in-page confirmation modal

Both call the same `debug-ops.ts` functions; the agent cannot reach the
ops without an explicit user click on either surface.

What ships in this PR:

Server (`http/debug-routes.ts`):
- GET  /debug                                     — SPA shell
- GET  /debug/:slug                               — SPA shell
- GET  /api/debug/intents                         — list every intent
- GET  /api/debug/intents/:intent                 — metadata + stages
- GET  /api/debug/intents/:intent/cursor          — preview_cursor
- POST /api/debug/intents/:intent/ops/:op         — run an op
  (ops: force_stage_complete, set_intent_field, reset_drift,
   mutate_feedback)

SPA (`packages/haiku-ui/src/routes/debug/`):
- /debug — intent index table (slug, title, studio, mode, status,
  archived) with deep-link to per-intent admin panel
- /debug/:slug — admin panel with intent metadata + cursor preview
  (re-fetched after every op) and four admin-op forms. Each form's
  "Prepare op" button opens a confirmation modal showing the exact
  request body before POST. Cancel = no-op; confirm = mutation runs
  and the cursor pane refreshes.

Auth: read endpoints + write endpoints both go through the existing
`requireTunnelAuth` chain (loopback-only in local mode, JWT in tunnel
mode) — same model as the rest of the SPA's mutation surface.

Tests (`test/debug-routes.test.mjs`): 11 tests against a real Fastify
server with a real on-disk intent. Covers list/detail/cursor/op-dispatch/
error paths and SPA-shell HTML serving. Per the user's standing rule,
no mocking — the routes call debug-ops directly so the tests prove both
layers in one pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 15, 2026

Claude finished @jwaldrip's task in 4m 32s —— View job


PR Review (re-review after second commit)

Tasks:

  • Read the full PR diff (both commits)
  • Check if previous review issues were addressed (none were)
  • Review the new SPA layer (debug-routes.ts + SPA components)
  • Post inline comments for new issues
  • Synthesize findings

Status: previous issues remain open + two new security issues in the SPA commit

No issues from the previous review were fixed in the second commit. The second commit added the SPA routes layer, which introduced two new security gaps. All blocking issues must be resolved before merge.


Blocking (CI + Correctness)

CI is still red — Biome Lint job fails on debug-ops.ts:

  • writeFileSync is imported at line 17 but never used (noUnusedImports)
  • Import order violation (the matter import order)
  • Formatter violations

Run npx @biomejs/biome check --write packages/haiku/src/orchestrator/workflow/debug-ops.ts to fix all three automatically. (Same note as the previous review — still unfixed.) Fix this →

force_stage_complete still omits "continuity" from intent approvals (debug-ops.ts:151). The cursor at cursor.ts:1354–1355 requires ["spec", "continuity", "user"] before reaching intent_quality_gates. Calling this op on the final stage returns ok: true but the cursor re-emits intent_review for continuity on the next tick — the wedge remains. Fix this →


Security

debug-routes.ts:181 — mutation endpoint missing verifyIntentMutationAuth (new, second commit — see inline comment). Every other mutation route in the codebase (feedback-api.ts, upload-routes.ts) uses verifyIntentMutationAuth/verifyFeedbackMutationAuth on intent-scoped writes. Using requireTunnelAuth(req, reply, null) instead leaves the R-01 cross-session bypass open: in tunnel mode, a valid JWT for session S1 (intent A) can POST to /api/debug/intents/B/ops/force_stage_complete and corrupt intent B. auth.ts:62–68 names this attack explicitly. Fix this →

debug-routes.ts:200 / 228body.stage is not validated with isValidSlug (new, second commit — see inline comment). The intent slug is validated at line 184, but stage from the POST body is not. mutateFeedback constructs join(dir, "stages", args.stage, "feedback") with the raw value; a crafted stage: "../../other-intent/stages/design" escapes the intent's tree. forceStageComplete is protected by resolveStudioStages but mutateFeedback is not. Fix this →

haiku_debug.ts:22 — still missing validateSlugArgs (carried from previous review). Path traversal on intent, stage, and feedback_id args from the MCP tool. Fix this →


Medium (carried from previous review, still open)

  • previewCursor in haiku_debug.ts:83–85 is before the try/catch — if derivePosition throws on a truly corrupted intent (the exact scenario this tool handles), the exception propagates unhandled from handle(). Wrap in its own try/catch. Fix this →

  • previewCursor uses raw matter instead of parseFrontmatter (debug-ops.ts:279) — will throw on duplicate-key YAML (exactly what a debug tool sees), misses date normalization, and is the import causing the Biome order warning. Switching to parseFrontmatter resolves all three at once. Fix this →

  • mutateFeedback second startsWith arm is dead code and the regex misses a $ anchor (debug-ops.ts:253–257). Files are NNN-slug.md, never FB-NNN-slug.md, so f.startsWith(\${args.feedbackId}-`)never matches for a valid"FB-NNN"input. The regex^(?:FB-)?(\d+)without$also silently matches"FB-037-anything"`. Fix this →


Looks good (second commit)

  • debug-routes.ts route registration is clean: debug routes register after the existing route groups, before the WebSocket upgrade, and are always-on (not guarded by remote-review mode, which is correct — the loopback/tunnel distinction is handled by requireTunnelAuth inside the handlers).
  • The SPA ConfirmModal design is solid: amber warning banner, "Show raw body" disclosure for the exact mutation payload, red confirm button, and aria-modal/role="dialog" on the overlay.
  • parseLooseJsonValue in $slug.tsx:529 is correct — the regex ^[\d[{"-] catches all JSON-primitive starters; strings that don't look like JSON pass through as raw strings.
  • debug-routes.test.mjs uses a real HTTP server and real on-disk intent — no mocks on the mutation surface. Tests cover the main success paths and the key error paths (missing body params, bad op name).
  • listIntentSummaries silently skips unparseable intent files (the catch {} at line 59) — right behavior for an index that should be resilient to corruption.

Comment thread packages/haiku/src/http/debug-routes.ts Outdated
Params: { intent: string; op: string }
Body: Record<string, unknown>
}>("/api/debug/intents/:intent/ops/:op", async (req, reply) => {
if (!requireTunnelAuth(req, reply, null)) return
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Security: mutation endpoint should use verifyIntentMutationAuth, not requireTunnelAuth(…, null).

Every other endpoint in the codebase that mutates a specific intent's files uses verifyIntentMutationAuth (or its alias verifyFeedbackMutationAuth) — feedback-api.ts and upload-routes.ts both follow this pattern. The difference matters in tunnel mode: requireTunnelAuth(req, reply, null) verifies the JWT is signed and non-expired, but does not bind the JWT's sid claim to the URL's :intent slug. A tunnel-mode reviewer holding a valid JWT for session S1 (intent A) can POST to /api/debug/intents/B/ops/force_stage_complete and mutate intent B. auth.ts:62–68 names this attack exactly: R-01 cross-session bypass.

Suggested change
if (!requireTunnelAuth(req, reply, null)) return
instance.post<{
Params: { intent: string; op: string }
Body: Record<string, unknown>
}>("/api/debug/intents/:intent/ops/:op", async (req, reply) => {
if (!verifyIntentMutationAuth(req, reply, req.params.intent)) return

And add verifyIntentMutationAuth to the import from "./auth.js":

import { requireTunnelAuth, verifyIntentMutationAuth } from "./auth.js"

let result: unknown
switch (op) {
case "force_stage_complete": {
const stage = typeof body.stage === "string" ? body.stage : ""
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Security: body.stage is untrusted input — validate with isValidSlug before use.

The slug param is validated at line 184. The stage value from the POST body is not. It flows directly to mutateFeedback which constructs join(dir, "stages", args.stage, "feedback"). A crafted stage like "../../other-intent/stages/design" would escape the current intent's stages/ tree. (forceStageComplete is protected by its resolveStudioStages check, but mutateFeedback has no equivalent guard.)

Suggested change
const stage = typeof body.stage === "string" ? body.stage : ""
const stage = typeof body.stage === "string" ? body.stage : ""
if (!stage) {
reply.status(400).send({ error: "missing_stage" })
return
}
result = forceStageComplete({ slug, targetStage: stage })

Add this before line 205:

if (stage && !isValidSlug(stage)) {
    reply.status(400).send({ error: "invalid_stage" })
    return
}

Apply the same check to body.stage in the mutate_feedback branch (line ~228).

…port order

- debug-ops.ts: drop unused writeFileSync import
- debug-routes.ts: organize imports
- .tsx: refactor effect so refreshTick is read inside the body
  (biome's useExhaustiveDependencies couldn't see it as the trigger)
- biome auto-format pass

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 15, 2026

Claude finished @jwaldrip's task in 2m 40s —— View job


Claude finished @jwaldrip's task —— View job


PR Review (re-review after third commit)

Tasks:

  • Read all key changed files (debug-ops.ts, haiku_debug.ts, debug-routes.ts, auth.ts, cursor.ts)
  • Check status of all previously-flagged issues
  • Post final summary

Status: Biome CI fixed. Three security/correctness issues remain blocking.

The third commit (57d3825) fixed the Biome lint failures (writeFileSync unused import, import order, formatter violations). CI should be green now. However, four issues from the previous two reviews remain unresolved — three security and one correctness.


Blocking (security + correctness — unresolved from previous reviews)

haiku_debug.ts:22 — missing validateSlugArgs, path traversal on all three slug args. (Flagged in review 1, still open.) intent, stage, and feedback_id are taken from MCP args and passed directly into path construction (intentDir(), join(dir, "stages", args.stage, …)) with no traversal check. Every other orchestrator tool that takes these args calls validateSlugArgs before dispatch. A crafted intent: "../../etc" resolves outside .haiku/intents/. Fix this →

debug-routes.ts:183 — write endpoint uses requireTunnelAuth(…, null) instead of verifyIntentMutationAuth. (Flagged in review 2, still open.) auth.ts:62–68 documents this attack explicitly: in tunnel mode, requireTunnelAuth(req, reply, null) verifies the JWT is valid but does not bind the JWT's sid claim to the URL's :intent slug. A reviewer holding a valid JWT for session S1 (intent A) can POST to /api/debug/intents/B/ops/force_stage_complete and corrupt intent B. The comment in auth.ts calls this "R-01 cross-session bypass" and documents that verifyIntentMutationAuth is the fix. Fix this →

debug-routes.ts:202,231body.stage not validated with isValidSlug before passing to debug ops. (Flagged in review 2, still open.) isValidSlug is imported and used on the :intent URL param (line 186), but body.stage in both force_stage_complete (line 202) and mutate_feedback (line 231) is passed raw into forceStageComplete/mutateFeedback. mutateFeedback constructs join(dir, "stages", args.stage, "feedback") with the value directly; a crafted stage: "../../other-intent/stages/design" escapes the current intent's tree. The fix is one line before each dispatch: if (stage && !isValidSlug(stage)) { reply.status(400).send({ error: "invalid_stage" }); return }. Fix this →

debug-ops.ts:152force_stage_complete omits "continuity" from intent approval roles. (Flagged in reviews 1 and 2, still open.) cursor.ts:1354–1355 requires ["spec", "continuity"] (autopilot) or ["spec", "continuity", "user"] (non-autopilot) before it reaches the intent_quality_gates check. The current code signs ["spec", "user", "intent_quality_gates"] — skipping continuity. Calling this op on the final stage returns ok: true but the cursor re-emits intent_review for continuity on the next tick; the wedge remains. Fix this →


Medium (still open)

previewCursor early return at haiku_debug.ts:83–85 is unguarded by the try/catch at line 124. If derivePosition() throws on a truly corrupted intent — exactly the scenario someone reaches for this tool — the exception propagates unhandled from handle(). Wrap the early return block in its own try/catch returning errorResponse({ error: "preview_cursor_threw", detail: err.message }). Fix this →

previewCursor in debug-ops.ts:285 uses raw matter instead of parseFrontmatter. On a file with duplicate YAML keys (the exact case that makes intents "corrupted"), matter throws while parseFrontmatter recovers via dedupeFrontmatterKeys. Switching to parseFrontmatter also lets you drop the import matter from "gray-matter" line (gray-matter is already wired through parseFrontmatter). Fix this →

mutateFeedback feedback lookup at debug-ops.ts:259–263: regex lacks $ anchor and second startsWith arm is dead code. The pattern ^(?:FB-)?(\d+) without $ silently matches "FB-037-anything". FB files are NNN-slug.md, never FB-NNN-slug.md, so f.startsWith(\${args.feedbackId}-`)(the second arm) is unreachable for any valid"FB-NNN"input. Fix: anchor the regex as/^(?:FB-)?(\d+)$/and remove the secondstartsWith` arm. Fix this →


What the third commit fixed

  • writeFileSync unused import removed from debug-ops.ts
  • Biome import order and formatter violations resolved ✓
  • exhaustive-deps tag fixed in SPA components ✓

Looks good (unchanged from previous reviews)

  • Registry wiring (index.ts + tool-defs.ts), server-tools.test.mjs parity check ✓
  • derived-stage-state.ts change is minimal: only exports two previously-private functions ✓
  • SKILL.md includes "When NOT to use" section and the confirmation-flow diagram ✓
  • debug-routes.test.mjs uses a real Fastify server + real on-disk intent (no mocked mutation surface) ✓
  • listIntentSummaries resilient catch path for unparseable intents ✓
  • SPA ConfirmModal design: amber warning banner, raw-body disclosure, red confirm button, ARIA attributes ✓

Blocking (security + correctness):

- haiku_debug.ts: add validateSlugArgs at the top of handle() — guards
  intent / stage / feedback_id against path traversal before any
  debug-op runs. Mirrors every other orchestrator tool.

- debug-routes.ts: write endpoint now uses verifyIntentMutationAuth
  instead of bare requireTunnelAuth(..., null). Binds the JWT's `sid`
  claim to the URL's :intent slug, closing R-01 cross-session bypass
  (auth.ts:62–68 documents this exact attack).

- debug-routes.ts: validate body.stage with isValidSlug in both
  force_stage_complete and mutate_feedback. mutateFeedback constructs
  join(dir, "stages", body.stage, "feedback") so a crafted
  "../../other-intent/stages/design" otherwise escapes the intent's tree.

- debug-ops.ts: forceStageComplete intent-completion role array now
  includes "continuity" — cursor.ts:1354–1355 requires
  ["spec", "continuity"] (autopilot) or ["spec", "continuity", "user"]
  (non-autopilot) BEFORE it reaches the intent_quality_gates check.
  Without this, the cursor re-emits intent_review on the next tick and
  the wedge persists.

Medium:

- haiku_debug.ts: wrap preview_cursor's early-return in its own
  try/catch returning { error: "preview_cursor_threw" }. Callers reach
  for this op precisely when intents are corrupted enough that
  derivePosition() may throw.

- debug-ops.ts: previewCursor now uses parseFrontmatter (not raw
  matter). On a file with duplicate YAML keys — exactly the corrupted-FM
  scenario — parseFrontmatter recovers via dedupeFrontmatterKeys. Drops
  the now-unused gray-matter import.

- debug-ops.ts: mutateFeedback regex now anchored with $; second
  startsWith arm removed. Without $ the pattern silently matched
  "FB-037-anything"; FB files on disk are NNN-slug.md (never
  FB-NNN-slug.md), so the second arm was dead code.

Tests: +2 traversal-rejection tests (force_stage_complete + mutate_feedback).
13/13 pass. Full suite: 1636 passed, 0 failed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 15, 2026

Claude finished @jwaldrip's task in 2m 8s —— View job


Claude finished @jwaldrip's task —— View job


PR Review (re-review after fourth commit — 33ebd76)

Tasks:


Status: All blocking and medium issues resolved. ✓ Approve.

Commit 33ebd76 ("fix(debug): address claude-bot review #3 — security + correctness") addressed every remaining item from the three prior reviews. The diff is clean.


All previously-flagged issues — resolved

Blocking (security + correctness):

  • validateSlugArgs added to haiku_debug.ts:77–82 — path traversal guard now fires before any debug-op dispatch, exactly as every other orchestrator tool does. The comment names the exact attack it prevents.

  • verifyIntentMutationAuth in debug-routes.ts:194 — the write endpoint now binds the JWT sid claim to the URL slug, closing the R-01 cross-session bypass. The comment in the handler cites auth.ts:62–68 explicitly.

  • body.stage validated with isValidSlug in debug-routes.ts:212–215 and 241–243 — both force_stage_complete and mutate_feedback now reject an invalid stage before passing it into path construction.

  • "continuity" added to intent approval roles in debug-ops.ts:157forceStageComplete now signs ["spec", "continuity", "user", "intent_quality_gates"], matching the cursor's requirement at cursor.ts:1354–1355. Comment in the handler cites the cursor lines and names the failure mode it prevents.

Medium:

  • previewCursor wrapped in its own try/catch in haiku_debug.ts:97–107 — the early-return path now catches derivePosition() throws and returns a structured { error: "preview_cursor_threw" } response.

  • parseFrontmatter used uniformly in debug-ops.ts:294 — raw matter import is gone; the duplicate-key recovery path is now consistent with every other function in the file.

  • mutateFeedback regex anchored and dead code removeddebug-ops.ts:267 now uses /^(?:FB-)?(\d+)$/ with $ anchor, and the second unreachable startsWith arm is gone. The comment explains why a single padStart prefix match is sufficient.


Remaining low-priority observations (non-blocking)

  • Cursor GET endpoint in debug-routes.ts:163 has no try/catch. If derivePosition() throws on a truly corrupted intent, Fastify's default error handler returns a generic 500. The SPA handles non-ok responses at $slug.tsx:73–76 by falling back to { ok: false, error: "HTTP 500" }. Acceptable behavior — the user sees an error rather than a crash. A structured catch would be cleaner but it's not blocking.

  • stagesProcessed overstates on partial sign failure (debug-ops.ts:130) — stages with refused units are still pushed to the success list before the refusedUnits check fires. The partial_signed detail in the error payload carries the real story. Low priority for a recovery tool with an explicit partial-sign warning.

  • Schema duplication between tool-defs.ts and haiku_debug.ts — the input schema is still written out in both places. Follow-up refactor, not blocking.


Final assessment

The design is correct, the trust boundaries are well-implemented, and all security issues from the prior reviews are closed. The picker gate (MCP path) and SPA confirmation modal (HTTP path) both enforce the explicit user confirmation requirement. The test coverage uses real on-disk state rather than mocks, which is the right call for mutation surfaces.

Ready to merge.

…ursor try/catch

The three remaining non-blocking observations from review #4:

- forceStageComplete is now atomic (debug-ops.ts):
  Two-pass design — pass 1 walks every unit on every stage and collects
  refusals WITHOUT writing anything; if any unit hasn't reached terminal
  advance the op aborts cleanly with `signed: 0`. Pass 2 only runs when
  the whole set passes. Earlier versions wrote partial signatures before
  the refusal check fired, then reported half-signed stages as
  "processed" — caller's mental model was off.

  Added test/debug-ops-atomicity.test.mjs (3 cases): refusal returns
  signed:0; ready unit's file is byte-identical before/after the failed
  call; stalled unit untouched.

- Schema duplication eliminated (state/schemas/inputs/debug.ts):
  New TypeBox schema HAIKU_DEBUG_INPUT_SCHEMA is the single source of
  truth. tool-defs.ts uses jsonSchemaOf(HAIKU_DEBUG_INPUT_SCHEMA);
  haiku_debug.ts uses the same schema + the shared HAIKU_DEBUG_SUPPORTED_OPS
  constant. The 40+ lines of inline schema in tool-defs.ts are gone;
  adding a parameter can't leave the MCP advertisement and the dispatcher
  out of sync.

- Cursor GET endpoint now has try/catch (debug-routes.ts):
  derivePosition() may throw on a truly corrupted intent — exactly what
  callers reach for the debug surface to diagnose. Returns a structured
  500 with { ok: false, error: "preview_cursor_threw", detail } instead
  of letting Fastify's default error handler emit a generic crash page.

Full suite: 1639 passed, 0 failed (was 1636 — +3 from the new
atomicity test). Lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 15, 2026

Claude finished @jwaldrip's task in 4m 6s —— View job


Claude finished @jwaldrip's task —— View job


PR Review (re-review after fifth commit — 3d978ce)

Tasks:

  • Gather PR metadata and context
  • Read latest commit changes (3d978ce — atomicity, schema dedup, cursor try/catch)
  • Verify all previously-flagged issues are resolved
  • Full re-review of the complete PR diff
  • Post final summary

Status: All previously-flagged issues resolved. ✓ Approve.

Commit 3d978ce ("fix(debug): address claude-bot review #4 — atomicity, schema dedup, cursor try/catch") addressed all three remaining low-priority observations from review #4. The diff is clean; lint is green.


What the fifth commit fixed

Atomicity (debug-ops.ts):

forceStageComplete is now a true two-pass operation. Pass 1 walks every unit on every stage and collects refusals without writing anything — if any unit hasn't reached terminal advance the op aborts with signed: 0. Pass 2 only runs when the whole set passes. The old version wrote partial signatures to terminal units before the refusal check fired, reported those stages as "processed", and left the caller's mental model off. The new design is all-or-nothing per call.

Three new atomicity tests in test/debug-ops-atomicity.test.mjs cover: refusal returns signed: 0; the ready unit's file is byte-identical before and after the failed call; the stalled unit is untouched. Good call doing the file-content snapshot comparison — that's the tightest possible atomicity assertion.

Schema duplication eliminated (state/schemas/inputs/debug.ts):

HAIKU_DEBUG_INPUT_SCHEMA is now the single source of truth. tool-defs.ts uses jsonSchemaOf(HAIKU_DEBUG_INPUT_SCHEMA); haiku_debug.ts imports the same schema constant + HAIKU_DEBUG_SUPPORTED_OPS. The 40-line inline duplication in tool-defs.ts is gone. Adding a parameter now has one place to change.

Type.Optional(Type.Unsafe<unknown>({ type: [...] })) on the value field is correct — matches the pattern used for set-style multi-type fields elsewhere.

Cursor GET endpoint try/catch (debug-routes.ts:167–180):

derivePosition() may throw on a truly corrupted intent — exactly when someone reaches for this surface. The structured { ok: false, error: "preview_cursor_threw", detail } 500 means the SPA renders an error message instead of Fastify's default crash page. The cursor fetch handler in $slug.tsx already handles non-ok responses gracefully (try { return await r.json() } catch { return { ok: false, error: ... } }).


Remaining observation (non-blocking, follow-up)

haiku_debug.handle() doesn't call validateToolInput, unlike haiku_select_mode (which does: validateToolInput(args, validateHaikuSelectModeInputSchema, "haiku_select_mode")). The validateHaikuDebugInputSchema compiled validator is exported from debug.ts but never called at the handler level.

In practice the MCP runtime validates against inputSchema at the wire level, so bad inputs are rejected before handle() is called. The gap is only in error format — a shape-rejected call would produce the runtime's generic error rather than the stable named code haiku_debug_input_invalid. For a recovery tool behind a picker confirmation gate this is genuinely low priority, but it's inconsistent with the haiku_select_mode precedent. Follow-up refactor, not blocking.


All previously-flagged issues — resolved ✓

Issue Status
Biome lint failures (writeFileSync, import order, formatter) ✅ Fixed in 57d3825
validateSlugArgs missing in haiku_debug.ts (path traversal) ✅ Fixed in 33ebd76
verifyIntentMutationAuth missing in write endpoint ✅ Fixed in 33ebd76
body.stage not validated with isValidSlug ✅ Fixed in 33ebd76
"continuity" missing from intent approval roles ✅ Fixed in 33ebd76
previewCursor unguarded by try/catch in haiku_debug.ts ✅ Fixed in 33ebd76
Raw matter instead of parseFrontmatter in previewCursor ✅ Fixed in 33ebd76
mutateFeedback regex missing $ anchor, dead code arm ✅ Fixed in 33ebd76
stagesProcessed overstating success on partial refusal ✅ Fixed in 3d978ce (two-pass design)
Schema duplication between tool-defs.ts and haiku_debug.ts ✅ Fixed in 3d978ce
Cursor GET endpoint missing try/catch ✅ Fixed in 3d978ce

Looks good (full PR)

  • Trust boundary is correctly implemented: SPA confirmation modal on the HTTP path, runPicker on the MCP path. Both surfaces call the same debug-ops.ts functions. The agent cannot reach any mutation without an explicit user action on one of the two surfaces.
  • SKILL.md is well-written: "When NOT to use" section directs to /haiku:repair and /haiku:reset-stage first, the confirmation-flow diagram is accurate, the "never auto-retry on cancellation" note is correct.
  • debug-routes.test.mjs — real HTTP server, real on-disk intent. Two path-traversal tests cover both force_stage_complete and mutate_feedback. Three atomicity tests in debug-ops-atomicity.test.mjs.
  • derived-stage-state.ts change is minimal — only exports two previously-private functions (reviewRolesFor, approvalRolesFor).
  • listIntentSummaries resilient catch path correctly skips unparseable intent files.
  • Workflow diagrams are auto-generated (export:workflow-diagrams); the regeneration is consistent across all studios.

Ready to merge.

jwaldrip and others added 2 commits May 15, 2026 14:06
The compiled validator was already exported from state/schemas/inputs/debug.ts
but the handler skipped it. Without it, a shape-rejected call produced the
MCP runtime's generic error instead of the stable named code
\`haiku_debug_input_invalid\` with structured \`errors[]\`. Now matches the
haiku_select_mode precedent and every other validated state-tool.

Order: shape gate (validateToolInput) → path-traversal gate (validateSlugArgs)
→ op dispatch. Both gates produce stable named codes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Combines the debug skill PR with the run-next loop-bounding PR so they
ship as one atomic unit per user request.

# Conflicts:
#	plugin/bin/haiku.mjs
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 15, 2026

Claude finished @jwaldrip's task in 2m 36s —— View job


Claude finished @jwaldrip's task —— View job


PR Review (re-review after sixth and seventh commits — 22fb8d9 + 896df2c)

Tasks:

  • Gather PR metadata and context
  • Read the two new commits (22fb8d9 + 896df2c merge)
  • Verify previous non-blocking observation (missing validateToolInput) was addressed
  • Review the merged run-next loop bounding changes
  • Full re-review of complete PR diff

Status: All issues resolved. ✓ Approve.

Both new commits are clean. Everything from the previous five reviews is now addressed, and the loop-bounding merge is well-implemented.


What the sixth commit fixed (22fb8d9)

The one remaining non-blocking observation from review #5haiku_debug.handle() not calling validateToolInput — is now resolved.

haiku_debug.ts:45–50 now calls validateToolInput(args, validateHaikuDebugInputSchema, "haiku_debug") as the first gate, before validateSlugArgs. The ordering comment is correct: shape gate first (yields haiku_debug_input_invalid + structured errors[]) → traversal guard second. This matches the haiku_select_mode precedent exactly.


What the merge commit brought in (896df2c — PR #368)

Three files: _loop_guard.ts, additions to haiku_run_next.ts, and run-next-loops-bounded.test.mjs.

_loop_guard.ts is clean. Key constants: RUN_NEXT_LOOP_CAP = 16 (iterations before abort), RUN_NEXT_WALL_DEADLINE_MS = 90_000 (90s wall clock for non-SPA-pop paths). loopAbortResponse writes to the session log file (co-located with subagent prompts at $TMPDIR/haiku-prompts/{session_id}/loop-guards.log) and to stderr, then returns a structured error the agent can match on. actionSignature hashes the key fields of an OrchestratorAction for the no-progress detection.

haiku_run_next.ts — four while loops, all bounded:

Loop Cap check Wall deadline No-progress sig
select_*
close_feedback
complete_stage
gate_review / user_gate deliberately omitted

The gate_review loop's deliberate exemption from wallDeadlineExceeded is correctly motivated and clearly commented (lines 1489–1497): await_gate is the SPA-pop blocking primitive the user explicitly exempted ("It can block via a SPA pop, but it CANNOT hang for infinite recursion"). A real human-in-the-loop decision can take minutes; the 90s deadline would falsely fire. The iteration cap + no-progress check still bound the loop.

run-next-loops-bounded.test.mjs is a clever static assertion: it parses the source text of haiku_run_next.ts, finds every while ( opener, and asserts each one has the cap-gate pattern within the next 40 lines. A future commit that adds an unbounded while loop will fail CI before shipping. The test has its own hasBoundedConditionShape and hasUserDecisionAwait predicates for edge cases. Test file is thorough and the mechanism is sound.


One minor observation (non-blocking)

wallDeadlineAbortResponse passes reason: "no_progress" to loopAbortResponse (_loop_guard.ts:60). The actual trigger is a wall-clock ceiling, which is semantically closer to "cap" than "no_progress". The diagnostic is still accurate — the wall-deadline info is in the loop name string ("${loopName} (wall-deadline … ms exceeded; elapsed=…ms)"), so the log and error response are human-readable. The only oddity is the reason=no_progress token in the detail line where reason=cap would be more precise. Non-blocking; the user can read the diagnostic either way.


Full issue history — all resolved ✓

Issue Fixed in
Biome lint failures (writeFileSync, import order, formatter) 57d3825
validateSlugArgs missing (path traversal on MCP args) 33ebd76
verifyIntentMutationAuth missing on write endpoint (R-01 bypass) 33ebd76
body.stage not validated with isValidSlug 33ebd76
"continuity" missing from intent approval roles 33ebd76
previewCursor unguarded by try/catch in MCP tool 33ebd76
Raw matter instead of parseFrontmatter in previewCursor 33ebd76
mutateFeedback regex missing $ anchor, dead code arm 33ebd76
stagesProcessed overstating success on partial refusal 3d978ce (two-pass atomicity)
Schema duplication between tool-defs.ts and haiku_debug.ts 3d978ce
Cursor GET endpoint missing try/catch 3d978ce
validateToolInput shape gate missing in haiku_debug.handle() 22fb8d9

Looks good (full PR)

  • Trust boundary correctly implemented: SPA confirmation modal on the HTTP path, runPicker on the MCP path. Both surfaces call the same debug-ops.ts functions.
  • forceStageComplete two-pass atomicity is correct and tested (debug-ops-atomicity.test.mjs snapshot-compares file bytes before and after a failed call).
  • Schema is a single source of truth (state/schemas/inputs/debug.ts); tool-defs.ts and haiku_debug.ts both import from it.
  • debug-routes.ts correctly uses verifyIntentMutationAuth on the write endpoint and isValidSlug on both body.stage paths.
  • SKILL.md "When NOT to use" section, confirmation-flow diagram, and "never auto-retry on cancellation" note are all correct.
  • run-next-loops-bounded.test.mjs static assertion is a strong regression guard.

Ready to merge.

@jwaldrip jwaldrip changed the title feat(debug): /haiku:debug skill + haiku_debug MCP tool for unsticking corrupt intents feat: /haiku:debug skill (intent recovery) + run_next loop bounding (was #368) May 15, 2026
Open feedback continues to block the cursor even after every approval is
signed — the next tick returns feedback_dispatch / review_fix as long as
any FB lacks closure. The new opt-in option closes that gap in one call.

When close_open_feedback: true, after the unit-sign pass, walk every FB
on the targeted stages (and on intent scope when the final stage is the
target). For each FB without closed_at and without closed_by, stamp:

  closed_at: <now ISO>
  closed_by: "force_complete"

That's it. We deliberately do NOT write status: closed — v4 derives
status from closed_at / closed_by (state-tools.ts:5645 doc), and the
legacy `status` field is being phased out.

Also rejects already-closed FBs as no-ops (don't overwrite existing
closed_at, don't add force_complete provenance to FBs that closed via
the normal fix loop).

Wired through the schema (state/schemas/inputs/debug.ts), the MCP tool,
the HTTP route, and the SPA force-stage-complete form (now has a checkbox
for "Also close every open feedback"). Default is false — the closure
behavior is opt-in, since some recovery scenarios want to keep FBs open
for follow-up.

Tests (debug-ops-atomicity.test.mjs): +4 cases. feedback_closed === 1
(only the open one); open FB now has closed_at + closed_by:
force_complete; status field is NOT written (v4 derives it); already-
closed FBs (via closed_at OR closed_by) are byte-untouched.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
jwaldrip and others added 2 commits May 15, 2026 15:45
The v4 schemas already declared "there is no status field" (feedback.ts:8,
unit.ts:5), but ~25 consumer sites in state-tools.ts still read \`item.status\`
as a fallback when on-disk data carried the legacy enum. v8 closes the loop:

1. Migration v7→v8 (\`orchestrator/migrations/v7-to-v8.ts\`)
   Walks every FB and unit md file in the intent dir. For FB: drops the
   \`status:\` field; synthesizes the v8 closure signals when status implied
   closure (addressed→closed_at+closed_by:feedback-assessor; answered→
   closed_at+resolution:answered; closed→closed_at+closed_by:manual_close;
   rejected→rejected_at). For unit: drops \`status:\` outright (unit position
   is fully derived from iterations + reviews + approvals). Stamps
   plugin_version: 8.0.0 on intent.md. Idempotent re-runs are no-ops.

2. \`deriveFeedbackStatus(fm)\` returns the same 6 enum values consumers
   expected, computed from signal precedence:
     rejected_at → rejected
     closed_by === "feedback-assessor" → addressed
     resolution === "answered" → answered
     closed_at OR closed_by → closed
     iterations[] non-empty → fixing
     else → pending
   Falls back to legacy \`status:\` field for un-migrated FBs in flight.

3. \`deriveUnitStatus(fm)\` returns "pending" / "active" / "completed":
     iterations[] empty → pending
     last iteration result==="advance" + approvals non-empty → completed
     else → active
   Same legacy-status fallback.

4. \`normalizeLegacyFeedbackStatus(fm)\` translates any legacy \`status:\`
   write into the v8 signals before persisting:
     status: closed/addressed/answered → write closed_at (+ closed_by /
       resolution flavor where appropriate)
     status: rejected → write rejected_at
     status: pending/fixing/null/empty → just drop the legacy field
   Wired into updateFeedbackFile, the no-op move path, and the reject
   handler — every FB write site that might inherit legacy status from
   a spread of \`found.data\` now strips it on the way out.

5. Consumer-site refactors in state-tools.ts:
   - readFeedbackFiles: status now derived (not read from FM)
   - countPendingFeedback: simplified to \`status === pending|fixing\`
   - haiku_unit_set lifecycle gate: uses deriveUnitStatus
   - haiku_unit_write lifecycle gate: uses deriveUnitStatus
   - haiku_unit_delete lifecycle gate: uses deriveUnitStatus
   - haiku_feedback_reject: stamps rejected_at instead of status:rejected,
     normalizes the spread to drop any legacy status
   - haiku_feedback_move lifecycle guard: uses deriveFeedbackStatus
   - feedback list "isClosed" predicate: uses derived status
   - appendFeedbackReply close_as_answered path: stamps closed_at +
     resolution:answered instead of status:answered

6. Reject handler bug fix: \`closed_by: null\` write deletes the key
   instead of setting it to undefined (js-yaml choked on the latter).

Tests: feedback.test.mjs updated to assert v8 disk shape (closed_at /
rejected_at / closed_by) instead of the legacy status field. Full
suite: 1647 passed, 0 failed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tness

Three things in one commit because they all converge on "v8 closure
truth = closed_at":

1. Bug fix (user-reported): force_stage_complete close_open_feedback
   skipped FBs that had closed_by set but closed_at: null. The cursor's
   source-of-truth for closure is closed_at — closed_by alone is "a unit
   claimed it, the feedback-assessor hasn't verified yet" (per the v4
   schema doc) which still BLOCKS the gate. The user found this when
   29 of 31 design FBs in their stuck intent were in that exact state.

   Now: skip only when closed_at OR rejected_at is stamped. If closed_by
   exists without closed_at, stamp closed_at and PRESERVE the existing
   closed_by (don't overwrite the claimer's provenance — just add the
   verification timestamp).

2. deriveFeedbackStatus correctness: same root cause. Earlier returned
   "closed" for closed_by-alone, lying to consumers. Now returns
   "fixing" (claimed-pending-verification) until closed_at lands.
   Matches the cursor exactly:
     rejected_at → rejected
     closed_at + closed_by:feedback-assessor → addressed
     closed_at + resolution:answered → answered
     closed_at → closed
     closed_by alone OR iterations[] → fixing
     else → pending

3. Batch ops: mutate_feedback now accepts feedback_ids: string[] and
   set_intent_field now accepts fields: { [key]: value } so multi-file /
   multi-key mutations land in ONE picker confirmation instead of one
   round-trip per item. The user was running 11+ haiku_debug calls in a
   row to address corrupt-intent state — batching collapses that into a
   single confirm.

   Wired through:
   - Schema (state/schemas/inputs/debug.ts): feedback_ids array,
     fields object
   - haiku_debug MCP tool: dispatch loops + describeOp updates so the
     picker prompt names the batch shape
   - HTTP route /api/debug/intents/:intent/ops/:op: same batch dispatch
   - SPA mutate-feedback form: textarea for IDs (one per line) — single
     line = single mutate, multi = batch
   - SPA set-intent-field form: multi-row "Add row" + "Remove last"
     buttons — single row = single mutate, multi = batch

Tests:
  test/debug-ops-atomicity.test.mjs: regression test asserts closed_by-
    only FB now gets closed_at stamped (was: skipped); existing closed_by
    preserved (was: would have been overwritten). feedback_closed count
    now reflects both truly-open + claimed-but-unverified FBs.
  test/debug-routes.test.mjs: 13/13 still pass.
  test/feedback.test.mjs: 64/64 still pass after derivation tightening.
  Full suite: 1647 passed, 0 failed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 15, 2026

Claude finished @jwaldrip's task in 4m 51s —— View job


Claude finished @jwaldrip's task —— View job


PR Review (re-review after three new commits — 0a88496, 8c60a90, eeeb66c)

Tasks:

  • Gather PR metadata and context
  • Read new commits since last review (0a88496, 8c60a90, eeeb66c)
  • Review close_open_feedback option in force_stage_complete
  • Review v8 derived status migration + deriveFeedbackStatus / deriveUnitStatus
  • Review batch ops + close-FB bug fixes
  • Verify Biome lint (clean)
  • Check CI status (in progress)

Status: One stale docstring, one minor HTTP-path inconsistency. Otherwise clean. ✓ Approve.

The three new commits are well-reasoned and well-tested. Biome is clean. The implementation of all three changes is correct. Two observations worth flagging before merge.


What the three new commits added

0a88496close_open_feedback on forceStageComplete:
The gap this closes is real: even after every unit approval is signed, open FBs continue to trigger feedback_dispatch / review_fix on the next tick. The new flag stamps closed_at + closed_by: "force_complete" on every unresolved FB across the processed stages (and intent scope when the final stage is the target). Deliberately doesn't write the legacy status field. Default is false (opt-in). Wired through schema, MCP tool, HTTP route, and SPA checkbox. 4 new atomicity tests.

8c60a90 — v7→v8 derived-status migration:
Solid design. deriveFeedbackStatus and deriveUnitStatus are the single source of truth; every read-path consumer routes through them. normalizeLegacyFeedbackStatus fires on every write path (updateFeedbackFile, no-op move, reject handler, appendFeedbackReply) to strip legacy status: on the way out. The migration walks all scopes, synthesizes the right closure signals for each legacy enum value, and is idempotent (already-migrated files have no status: key → early return). countPendingFeedback now correctly blocks on "fixing" status (not just "pending"); the old code incorrectly passed when closed_by was set without closed_at.

eeeb66c — Bug fix + batch ops:

  • Bug fix (close_open_feedback): the initial implementation skipped FBs where closed_by was set (even without closed_at). This was wrong — closed_by alone means "claimed, assessor not yet verified", and the cursor's truth is closed_at. Now skips only when closed_at OR rejected_at is present; if closed_by exists without closed_at, stamps closed_at and preserves the existing closed_by provenance. Regression test covers both the fixed case and the provenance-preservation rule.
  • Bug fix (deriveFeedbackStatus): returns "fixing" (not "closed") for closed_by without closed_at. Matches the cursor exactly.
  • Batch ops: mutate_feedback accepts feedback_ids: string[]; set_intent_field accepts fields: { [key]: value }. MCP path calls validateSlugArgs per item in the batch array. SPA gets a textarea (one ID per line) and a multi-row "Add row" form.

Medium — stale JSDoc in deriveFeedbackStatus (state-tools.ts:4696–4702)

The docstring was written in 8c60a90 and reflects the original behavior. eeeb66c fixed the implementation but didn't update the JSDoc. The current docstring says:

 *   2. `closed_by` === "feedback-assessor"     → "addressed"
 *   3. `resolution` === "answered"             → "answered"
 *   4. `closed_at` set OR `closed_by` non-empty → "closed"

But the implementation (correctly) requires closed_at as a prerequisite for "addressed" / "answered" / "closed" (lines 4724–4730), and returns "fixing" for closed_by alone (line 4748). Step 4 is wrong: closed_by without closed_at is not "closed" — that's the exact bug this commit fixed. The inline comment at 4713–4717 is correct; the JSDoc above it isn't.

The correct precedence table is:

 *   1. `rejected_at` set                              → "rejected"
 *   2. `closed_at + closed_by === "feedback-assessor"` → "addressed"
 *   3. `closed_at + resolution === "answered"`         → "answered"
 *   4. `closed_at` set (any other case)                → "closed"
 *   5. `closed_by` non-empty (without closed_at)       → "fixing"
 *   6. `iterations[]` non-empty                        → "fixing"
 *   7. otherwise                                        → "pending"

Fix this →


Low — HTTP batch path validates feedback_ids items as non-empty string only

debug-routes.ts:280–286 checks each fid with typeof fid !== "string" || !fid. The MCP batch path (haiku_debug.ts:64–68) calls validateSlugArgs({ feedback_id: fid }) per item, which applies the full path-traversal guard. The HTTP path is less strict. mutateFeedback's regex lookup (/^(?:FB-)?(\d+)$/ now anchored) provides a practical defense, so this isn't exploitable — but the two surfaces have different validation depth. The fix is one isValidSlug(fid) check per item in the HTTP loop before passing to mutateFeedback. Non-blocking. Fix this →


All previous review items — still resolved ✓

Issue Fixed in
Biome lint failures 57d3825
validateSlugArgs missing (path traversal on MCP args) 33ebd76
verifyIntentMutationAuth missing (R-01 bypass) 33ebd76
body.stage not validated with isValidSlug 33ebd76
"continuity" missing from intent approval roles 33ebd76
previewCursor unguarded by try/catch 33ebd76
Raw matter instead of parseFrontmatter 33ebd76
mutateFeedback regex missing $ anchor 33ebd76
Two-pass atomicity for forceStageComplete 3d978ce
Schema duplication (tool-defs.ts vs haiku_debug.ts) 3d978ce
Cursor GET endpoint missing try/catch 3d978ce
validateToolInput shape gate missing 22fb8d9

Looks good (new commits)

  • Migration idempotency: v7-to-v8.ts correctly no-ops on files without a status: key. The fallback timestamp for synthesized closure (createdAt || new Date(0).toISOString()) is reasonable — epoch is a valid timestamp, clearly synthetic, and won't cause false positives downstream.
  • normalizeLegacyFeedbackStatus wiring: covers all three write paths that could propagate a legacy status: spread (updateFeedbackFile, no-op move, reject handler). appendFeedbackReply was directly updated to stamp closed_at + resolution instead.
  • Batch setIntentField field keys: these are YAML keys on a fixed intent.md path, not filesystem paths — no traversal risk. The isValidSlug gap above applies only to feedback_ids.
  • SPA batch UI: textarea with newline-split IDs is a natural pattern; single-line → single call, multi-line → batch. The biome-ignore lint/suspicious/noArrayIndexKey annotation on the multi-row grid is correctly justified (positional identity).
  • close_open_feedback atomicity: the bug fix (stamp closed_at on closed_by-only FBs) happens in the post-sign FB-walk, which runs only after the two-pass unit sign succeeds. Atomicity of the unit sign is unaffected.
  • Biome: clean on all four key files.

Ready to merge.

…omplete elaboration synth

Bug 1 (HIGH) — enforceStageBranch dragged the user off their checkout:
  ensureOnStageBranch now short-circuits when ALL of:
    1. Current branch is NOT a haiku-managed branch for this intent
       (haiku/<slug>/main or haiku/<slug>/<stage>) — the engine's
       pipeline lives on those branches and needs the actual checkout
       even when stage tips converge.
    2. Target stage branch is a STRICT ancestor of HEAD (HEAD has
       commits the target doesn't). Equality is not enough — a freshly-
       forked stage branch shares the same tip as intent main.
    3. Implied: target was reachable to begin with (isAncestor true).
  Result: a user on `dev` with stage branches merged in stays on `dev`.
  The engine's intra-pipeline transitions still work (current is a
  haiku branch → guard 1 fails → enforcement runs as before).

Bug 2 (MEDIUM) — close_open_feedback skipped FBs with closed_by but
  closed_at: null. Fixed in the prior commit (eeeb66c). Verified.

Bug 3 (LOW-MED) — mutateFeedback required 3-digit zero-padding:
  Lookup now mirrors `findFeedbackFile` in state-tools.ts — parses the
  leading digits regardless of file-side or input-side padding width.
  Pre-2026-05-07 intents with 2-digit FB filenames work without
  rename. Tests: 2 new cases in debug-routes.test.mjs (legacy padding
  + filename-stem form).

Bug 4 (LOW) — invalid_feedback_id_shape on filename-stem inputs:
  The accepted shapes are now: "FB-001" / "FB-1" / "001" / "1" /
  "001-some-slug" / "01-some-slug". Same regex relaxation as Bug 3 —
  match leading digits, ignore the rest.

NEW (user request) — force_stage_complete now synthesizes elaboration:
  When every unit on a processed stage has terminal-advanced, the
  elaboration round-trip is moot for recovery. Synthesize
  stages/<stage>/elaboration.md with `recorded_at`, `verified_at`,
  `decompose_verified_at`, and `synthesized_by: force_complete` so the
  cursor walks past the elaborate phase without waiting for the
  verifier subagent. Returns elaborations_sealed: N in the result.
  Fires only inside the debug op (user-confirmed); the normal workflow
  engine still requires the verifier to seal.

Tests:
  test/debug-routes.test.mjs: +2 (legacy padding, filename-stem).
  test/debug-ops-atomicity.test.mjs: +1 (elaboration synth).
  Full suite: 1650 passed, 0 failed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 15, 2026

Claude finished @jwaldrip's task in 5m 55s —— View job


PR Review (re-review after ace000e)

Tasks:


Status: All blocking issues resolved. One stale JSDoc remains (non-blocking). ✓ Approve.

Commit ace000e ("fix(engine): address haiku-mcp-bug-report.md (4 bugs) + force_stage_complete elaboration synth") is clean. Four bug fixes + a user-requested feature, all well-targeted.


What ace000e fixed

Bug 1 (HIGH) — ensureOnStageBranch forced checkout off user's branch (git-worktree.ts:2482–2513):

The new short-circuit fires only when both guards pass:

  1. Current branch is NOT a haiku-managed branch for this intent (!current.startsWith("haiku/${slug}/")).
  2. Target stage branch is a strict ancestor of HEAD (isAncestor(targetBranch, "HEAD") && !isAncestor("HEAD", targetBranch)).

Verified isAncestor(a, b) at git-worktree.ts:1528 — it's git merge-base --is-ancestor a b, so the semantics are correct. Guard 2 correctly excludes the same-SHA case (freshly-forked stage has the same tip as intent main → isAncestor is symmetric there → headStrictlyAhead = false → short-circuit doesn't fire).

Bug 3/4 — mutateFeedback lookup rejected legacy 2-digit IDs and filename stems (debug-ops.ts:388–396):

The old anchored-prefix approach (/^(?:FB-)?(\d+)$/ → nn.padStart(3) + f.startsWith(nn)) was too strict. The replacement parses both sides to integers and compares numerically. This is more robust and explicitly lists the accepted shapes in the comment. The previous $ anchor was removed intentionally — but the old concern (silent matching of "FB-037-anything") is no longer relevant because the match result is parsed as an integer rather than used as a path fragment. Path traversal remains blocked: the regex requires leading digits, and found is used as a filename within an already-bounded directory.

New feature — force_stage_complete synthesizes elaboration.md (debug-ops.ts:169–211):

When every unit on a processed stage has terminal-advanced, the elaboration round-trip is moot for recovery. The synth:

  • Fires only for stagesProcessed (stages that passed both passes of unit signing — not partially-refused stages).
  • Uses matter.stringify() for the write path (correct — there's no parseFrontmatter write equivalent) and parseFrontmatter() for reading the existing file (consistent with the rest of the file).
  • Is idempotent: stamps only the missing verified_at/decompose_verified_at fields when elaboration.md already exists.
  • Returns elaborations_sealed: N in the result so the caller knows what was touched.
  • Clearly comments the recovery-only context ("IMPORTANT: this fires only inside the debug op").

Imports writeFileSync and matter re-added — both are now genuinely used. Import order is node built-ins → external packages → relative imports, which is Biome's expected order.


From review #7 — one observation moot, one still open

HTTP batch feedback_ids validation observation — now moot.

Review #7 flagged that the HTTP batch loop only checked typeof fid !== "string" || !fid, while the MCP path called validateSlugArgs({ feedback_id: fid }) per item. With the integer-parsing approach in ace000e, mutateFeedback now validates the input internally: the regex /^(?:FB-)?(\d+)/i requires leading digits, and path construction uses only the integer-compared found filename within an already-bounded directory. A crafted "../../etc" has no leading digits → idMatch is null → early return with invalid_feedback_id_shape, no path constructed. Adding isValidSlug per item would actually be wrong now — it would reject the valid filename-stem inputs ("001-some-slug") that this commit explicitly supports. Observation withdrawn.

Stale JSDoc in deriveFeedbackStatus at state-tools.ts:4696–4702 — still open.

This was flagged in review #7 and remains unfixed. The docstring's step 4 says:

*   4. `closed_at` set OR `closed_by` non-empty → "closed"   (any other resolved state)

But the implementation (correctly) returns "fixing" for closed_by without closed_at — the bug this was written to fix. The inline comment at line 4713–4717 is accurate; the JSDoc header is not. Non-blocking (no runtime impact), but misleading for future readers. Fix this →


Full issue history — all resolved ✓

Issue Fixed in
Biome lint failures (writeFileSync, import order, formatter) 57d3825
validateSlugArgs missing (path traversal on MCP args) 33ebd76
verifyIntentMutationAuth missing (R-01 bypass) 33ebd76
body.stage not validated with isValidSlug 33ebd76
"continuity" missing from intent approval roles 33ebd76
previewCursor unguarded by try/catch 33ebd76
Raw matter instead of parseFrontmatter in previewCursor 33ebd76
mutateFeedback regex missing $ anchor, dead code arm 33ebd76
Two-pass atomicity for forceStageComplete 3d978ce
Schema duplication (tool-defs.ts vs haiku_debug.ts) 3d978ce
Cursor GET endpoint missing try/catch 3d978ce
validateToolInput shape gate missing in haiku_debug.handle() 22fb8d9
close_open_feedback skipped closed_by-only FBs (bug) eeeb66c
deriveFeedbackStatus returned "closed" for closed_by alone (bug) eeeb66c
mutateFeedback rejected legacy 2-digit IDs + filename stems (bugs 3/4) ace000e
ensureOnStageBranch forced checkout off user's non-haiku branch (bug 1) ace000e

Looks good (full PR)

  • Trust boundary: SPA confirmation modal (HTTP path) + runPicker (MCP path). Both call the same debug-ops.ts functions. Agent cannot reach any mutation without explicit user action.
  • forceStageComplete two-pass atomicity is correct and tested — debug-ops-atomicity.test.mjs snapshot-compares file bytes before/after a failed call.
  • Schema (state/schemas/inputs/debug.ts) is the single source of truth; tool-defs.ts and haiku_debug.ts both import from it.
  • debug-routes.ts uses verifyIntentMutationAuth on the write endpoint and isValidSlug on body.stage.
  • run-next-loops-bounded.test.mjs static assertion is a strong regression guard for future loop additions.
  • SKILL.md "When NOT to use" section and "never auto-retry on cancellation" note are correct.
  • Full suite: 1650 passed, 0 failed (up from 1643 in the original PR, consistent with new test cases).

Ready to merge.

…rations[] field

iterations[] is one of the FSM-driven unit fields (unit.ts:62) the
agent normally cannot touch — the schema's propertyNames deny-list
rejects writes and the handler at state-tools.ts:7571 blocks them with
a clear error. Normal writes flow through haiku_unit_start /
haiku_unit_advance_hat / haiku_unit_reject_hat.

But force_stage_complete refuses to sign units whose iterations[] lacks
a terminal advance entry. Legacy intents (and partial-state units
where outputs landed but no iteration was recorded) have no way to
satisfy that precondition — there's no agent-facing tool to write
iterations[].

The new debug op closes the gap. Two modes:

- Auto (default): synthesize one { hat, result: "advance" } entry per
  hat in the stage's hats: sequence. The shape force_stage_complete
  needs to count the unit as terminal-advance.
- Explicit: caller passes iterations: [...] to write a specific
  sequence verbatim (still validated by the per-entry shape gate).

Wired through:
- Schema: HAIKU_DEBUG_INPUT_SCHEMA gains \`unit\` + \`iterations\` fields
- haiku_debug MCP tool: set_unit_iterations case with picker confirm
- HTTP route: POST /api/debug/intents/:intent/ops/set_unit_iterations
- SPA: SetUnitIterationsForm with auto/explicit toggle and JSON editor

Unit lookup tolerates the same shapes as findFeedbackFile:
- "unit-03-my-thing" exact
- "unit-03" prefix
- "03" / "3" / "unit-3" numeric

Tests: +2 in debug-routes.test.mjs (auto-synth + missing-args reject).
Full suite: 1652 passed, 0 failed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 16, 2026

Claude finished @jwaldrip's task in 3m 7s —— View job


Claude finished @jwaldrip's task ——


PR Review (re-review after c2bf55eset_unit_iterations op)

Tasks:

  • Read new commit changes (c2bf55e)
  • Verify security model on new op (slug guards, auth)
  • Review schema compliance (TypeBox/AJV pattern)
  • Check the remaining stale JSDoc from review Fix source error for claude marketplace #8
  • Post final summary

Status: Clean. One stale JSDoc still open (non-blocking, carried from review #8). ✓ Approve.

Commit c2bf55e adds set_unit_iterations — a recovery op that writes the FSM-protected iterations[] field agents cannot normally touch. The implementation is correct and consistent with the existing op pattern.


What the new commit adds

The gap it closes is real. force_stage_complete refuses to sign units whose iterations[] lacks a terminal advance entry. Legacy intents (and partial-state units where outputs landed without iteration recording) have no agent-accessible path to satisfy that precondition — haiku_unit_start / haiku_unit_advance_hat / haiku_unit_reject_hat all modify state going forward, not retroactively. This op fills the gap.

Auto-synthesis path is correct. When iterations is omitted, resolveStageHats(studio, stage) (already exported from studio.ts:96) drives the synthesized entries. One { hat, result: "advance", at } per hat — exactly the shape force_stage_complete uses to count a unit as terminal-advance.

Security model is consistent:

  • HTTP path: isValidSlug(stage) at line 271 and isValidSlug(unit) at line 275 — both present before dispatch. Same pattern as every other op in this file.
  • MCP path: validateSlugArgs at haiku_debug.ts:57–62 now includes unit: args.unit, which validateSlugArgs already handled (tools/define.ts:22 lists "unit" in its key set).
  • Both paths require verifyIntentMutationAuth (from the write endpoint added in 33ebd76).

Biome lint: clean — verified on the four modified files.

Tests: Two new cases in debug-routes.test.mjs:

  • Auto-synthesis path: writes a local studio fixture, calls the op, asserts ok: true, asserts iterations: and result: advance are on disk. Tests the golden path with real file I/O.
  • Missing-args rejection: { stage: "design" } without unit → 400 missing_stage_or_unit.

Minor observation (non-blocking)

additionalProperties: true on iteration item schema (debug.ts:97).

The .claude/rules/schema-definitions.md rule requires additionalProperties: false on every input object, with the only exception being "when the substance validation belongs to a dedicated validator that needs the raw object." The iteration items schema at debug.ts:97 uses additionalProperties: true. In practice this is harmless — the explicit dispatch path maps only hat, result, and at into a new object, so no extra fields reach disk — but it deviates from the project pattern without the documented exception applying. Suggest additionalProperties: false to match the rule.


Stale JSDoc — still open (carried from review #8, non-blocking)

deriveFeedbackStatus at state-tools.ts:4696–4702 remains incorrect.

The docstring says:

 *   4. `closed_at` set OR `closed_by` non-empty → "closed"   (any other resolved state)
 *   5. `iterations[]` non-empty                → "fixing"    (mid fix loop)

But the implementation (correctly, since eeeb76c) returns "fixing" for closed_by alone without closed_at (line 4748), and steps 2–4 all require closed_at (they're inside if (closedAtStr)). Step 4's "OR closed_by non-empty" clause is the exact bug that was fixed — it should now read "closed_at set (any other resolved state)" and a separate step 5 should document "closed_by" non-empty (without closed_at) → "fixing".

The inline comment at lines 4713–4717 is accurate; only the JSDoc header is stale. No runtime impact, but it's a trap for future readers. Fix this →


Full issue history — all resolved ✓

Issue Fixed in
Biome lint failures (writeFileSync, import order, formatter) 57d3825
validateSlugArgs missing (path traversal on MCP args) 33ebd76
verifyIntentMutationAuth missing (R-01 bypass) 33ebd76
body.stage not validated with isValidSlug 33ebd76
"continuity" missing from intent approval roles 33ebd76
previewCursor unguarded by try/catch 33ebd76
Raw matter instead of parseFrontmatter in previewCursor 33ebd76
mutateFeedback regex missing $ anchor, dead code arm 33ebd76
Two-pass atomicity for forceStageComplete 3d978ce
Schema duplication (tool-defs.ts vs haiku_debug.ts) 3d978ce
Cursor GET endpoint missing try/catch 3d978ce
validateToolInput shape gate missing in haiku_debug.handle() 22fb8d9
close_open_feedback skipped closed_by-only FBs (bug) eeeb66c
deriveFeedbackStatus returned "closed" for closed_by alone (bug) eeeb66c
mutateFeedback rejected legacy 2-digit IDs + filename stems (bugs 3/4) ace000e
ensureOnStageBranch forced checkout off user's non-haiku branch (bug 1) ace000e

Looks good (new commit)

  • resolveStageHats was already exported from studio.ts:96 — no new export needed.
  • validateSlugArgs at tools/define.ts:22 already listed "unit" in its key set — the new unit: args.unit field slots in correctly.
  • Unit lookup logic is correct: exact match → prefix match → numeric match (with $ anchor on the numeric regex). Handles "unit-03-slug", "unit-03", "03", "3" all correctly.
  • Auto-synthesis guard: no_hats_defined_for_stage returned when resolveStageHats returns empty — ops on invalid studio/stage combos fail cleanly.
  • Explicit path silently drops extra iteration fields (maps only hat, result, at) — no surprise fields on disk.
  • SPA SetUnitIterationsForm: auto/explicit toggle, JSON textarea for explicit mode, disabled={!stage || !unit} guard on Prepare button.

Ready to merge.

@jwaldrip jwaldrip merged commit b47dafe into main May 16, 2026
3 of 4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

1 participant