screenleon · screenleon · Apr 25, 2026 · Apr 25, 2026 · Apr 25, 2026
diff --git a/DECISIONS.md b/DECISIONS.md
@@ -4,6 +4,14 @@ Active architectural and behavioral decisions for Agent Native PM.
 
 When this file exceeds 50 entries or 30 KB, archive older entries to `DECISIONS_ARCHIVE.md`. The most recent archival pass was on 2026-04-22.
 
+## 2026-04-25: Phase 6c v5.1 — L0 safety + authoring lifecycle + LLM router (suggest-only) + activity visibility [agent:feature-planner]
+
+- **Context**: Phase 6b shipped role-dispatch end-to-end in code but the user-facing path is broken in three independent ways: (1) `execution_role` has no UI authoring surface, so the role_dispatch radio is permanently disabled (catch-22); (2) "auto-dispatch" is in name only — operators must still manually pick a role for every task; (3) connector activity during long task execution (backend-architect can run 90 min) is invisible to the frontend, making dogfood debugging impossible. Phase 5 §(g)/§(h) also flagged subprocess sandboxing as a Phase 6 blocker. The user explicitly rejected simple / phase-staged solutions: "一律不考慮簡單作法 我希望是完善的改動 而不是臨時性處理" (memory: `feedback_no_simple_approach`). Closing all four gaps in one phase is therefore in scope, even at ~15-day total cost.
+- **Decision**: Phase 6c ships as **5 separate PRs**, each independently reviewable and rollback-able, but coherent as a single capability slice. **PR-1 (done)**: role catalog skeleton (hand-maintained `[]Role{...}` with per-role `DefaultTimeoutSec`) + L0 safety boundary in connector subprocess invocation (wall-clock timeout via `cmd.Cancel = SIGTERM` + `cmd.WaitDelay = 5s` escalation, output size cap via `boundedWriter`, JSON schema minimum validation, 4 new error kinds: `role_not_found`/`dispatch_timeout`/`output_too_large`/`invalid_result_schema`). **PR-2**: full authoring lifecycle — migration 030 adds generic `actor_audit` table (subject_kind/subject_id/field/old_value/new_value/actor_kind/actor_id/rationale/confidence) which is the **single source of truth** for execution_role authoring metadata (no denormalised columns on `backlog_candidates` per critic round 3 #1; frontend reads via helper `LatestAuthoring(subject_kind, subject_id, field)` joining against latest audit row); PATCH endpoint accepts `execution_role`; apply API extended with `execution_role` payload; catalog enforcement at four entry points (PATCH, apply, claim-next-task; suggest gates at validation in PR-3); CandidateReviewPanel rewritten so role_dispatch radio is always enabled with inline `<select>`; CandidateRoleEditor inline popover on candidate cards; stale-role warning when previously-suggested role no longer in catalog. **PR-3 (suggest-only, B2)**: LLM router as advisory meta-agent — new `backend/internal/prompts/meta/dispatcher.md` (category=meta, version 1, default timeout 60s; lives under `meta/` subtree per critic #10, drift test walks both `roles/` and `meta/`); `dispatcher.Service.Suggest` reuses PR-1's `invokeBuiltinCLI` for safety; `POST /api/backlog-candidates/:id/suggest-role` returns RouterResult without persisting; **Apply API NOT extended with `role_dispatch_auto` mode** — operator confirms suggested role manually then applies via `mode=role_dispatch` (audit row `actor_kind="operator"` even when suggestion came from router; this avoids premature auto-apply before router quality is dogfood-validated, per user pick B2); 1 new error kind (`router_no_match`); router output validated against catalog (role_id must be known or "no_match", confidence ∈ [0,1], reasoning ≤ 1024 chars with control-char sanitization); migration 032 reserved as PR-3 placeholder (per critic #9). **PR-4**: connector activity tracking — migration 031 adds `current_activity_json/at` snapshot column on `local_connectors`; ActivityReporter on connector emits phase transitions via **enqueue-not-overwrite** queue (per critic #5 — phase transitions in rapid succession all reach subscriber; same-phase step changes coalesce in 500ms window); phase enum is `idle/claiming_run/planning/claiming_task/dispatching/submitting` (no `routing` — that arrives in 6d with auto-apply); `POST /api/connector/activity` lightweight ingest endpoint; in-memory Hub broadcasts to SSE subscribers via unbuffered channels (slow clients auto-dropped); `GET /api/connectors/:id/activity-stream` SSE with 30s keepalive + `X-Accel-Buffering: no` header (C1: SSE retained vs polling-only); `GET /api/connectors/:id/activity` polling fallback; project-level aggregate `GET /api/projects/:id/active-connectors`; frontend `useConnectorActivity` hook auto-degrades SSE→polling→stale; `ConnectorActivityBadge` 3 density variants. **Activity does NOT write to actor_audit** (per critic #8 — 5+ phase transitions per task would drown the human-meaningful authoring trail; only latest snapshot persists). **PR-5**: dogfood (8 deliberate-trigger steps including activity SSE observation) + `docs/operating-rules.md` "Role-dispatch + visibility model" section + DECISIONS.md final + **archival pass** moving 2026-04-22-and-older entries to `DECISIONS_ARCHIVE.md` (per critic #11 — file already past 30KB threshold).
+- **Alternatives considered**: (1) Defer authoring + router + activity to Phase 6d — rejected; the user explicitly stated dogfood today is broken without all three (see catch-22 analysis in plan v5 §1.1). (2) Single mega-PR — rejected; 15-day single PR is unreviewable; coherent slicing into 5 PRs preserves coherence while making each PR shippable in 1-5 days. (3) Apply-time-only authoring (no candidate edit + no audit) — rejected; user feedback explicitly requires comprehensive authoring (`feedback_no_simple_approach`); apply-time-only would force re-rework when Phase 6d's LLM planner pre-fills `execution_role` at candidate creation. (4) Polling-only activity (no SSE) — rejected at user pick C1; sub-second visibility is consistent with `feedback_no_simple_approach` even though 3s polling would technically suffice. (5) Sync `role_dispatch_auto` in PR-3 — **rejected at user pick B2 (critic round 3 #2)**; without dogfood data on router quality, auto-apply is premature optimization; PR-3 ships suggest-only and PR-6 (or 6d) lands auto-apply once PR-5 dogfood validates router accuracy. (6) Async `role_dispatch_auto` + webhook in 6c — deferred to 6d per user §5 Q3 answer (depends on auto-apply existing first). (7) Skip Role.Category field — rejected; without `category="meta"` filter the dispatcher prompt would surface in `/api/roles` and self-recommend, breaking the routing semantics. (8) `actor_audit` as candidate-specific table — rejected; designed generic from start (subject_kind discriminator) so PR-3 router-actor rows (when 6d auto-apply lands) and PR-4 system-actor rows reuse the same infrastructure. (9) Denormalised `execution_role_set_by/_at/_confidence` columns on `backlog_candidates` — **rejected at critic round 3 #1**; `actor_audit` is the single source of truth and frontend reads via JOIN helper to avoid drift between two writers. (10) Activity history written to `actor_audit` — **rejected at critic round 3 #8**; ~5 phase transitions per task dispatch would drown the human-meaningful authoring trail; activity only persists as latest snapshot, dedicated time-series store deferred to 6d if needed. (11) `dispatcher.md` placed at `prompts/dispatcher.md` siblng to `backlog.md` and `whatsnext.md` — **rejected at critic round 3 #10**; meta-prompts deserve their own subtree (`prompts/meta/`) for IA clarity and future expansion.
+- **Constraints introduced**: **(a) Role catalog**: `backend/internal/roles/catalog.go` is hand-maintained `[]Role{...}` (no codegen); `Role.Category` ∈ {"role", "meta"}; `roles.All()` returns full set; `/api/roles` filters category="role"; `TestCatalogMatchesPromptDir` walks both `prompts/roles/*.md` and `prompts/meta/*.md`; PR adding a new role MUST edit both files. **(b) execution_role lifecycle**: writes go through `BacklogCandidateStore.UpdateExecutionRole(ctx, id, role, actor)` which does single-transaction validate → SELECT old → UPDATE → INSERT actor_audit; concurrent PATCH/apply use `BEGIN IMMEDIATE` (existing SQLite pattern). `set_by` ∈ {"", "operator", "router"}; `confidence` only set when `set_by="router"`. **(c) Apply API**: `mode=role_dispatch` requires non-empty `execution_role` in catalog → else 400; `mode=manual` ignores `execution_role`. (`mode=role_dispatch_auto` is deferred to Phase 6d per user pick B2 — see Alternatives §5.) **(d) Server-side claim enforcement**: `MarkTaskRoleNotFound` does `dispatch_status: queued → failed` atomic transition (NOT `running → failed`) plus single-tx audit row; non-applicable when task already leased. **(e) L0 safety**: per-role timeouts in catalog (code-reviewer=15min, test-writer=20min, api-contract-writer=30min, ui-scaffolder=45min, db-schema-designer=45min, backend-architect=90min, dispatcher=60min); `ANPM_DISPATCH_TIMEOUT` env override hierarchy: env>0 → env value, env=0 → disabled, env<0/unset → catalog → 30min fallback. SIGTERM→5s→SIGKILL escalation via `cmd.Cancel`+`cmd.WaitDelay`; adversarial test uses real subprocess with `signal.Ignore`. Output cap default 5 MB via `ANPM_DISPATCH_OUTPUT_MAX`; 0=disabled. JSON schema minimum: must contain `files []`; optional fields type-checked. **(f) Router**: dispatcher prompt is in `prompts/meta/dispatcher.md` (NOT under `roles/`) with `category: meta`; output validated for role_id ∈ catalog, confidence ∈ [0,1], reasoning length and control-char sanitization; alternatives all validated against catalog; `min_confidence` default 0.7 (reserved for Phase 6d auto-apply path); router output never trusted to be a valid catalog entry — Validation is a hard gate not a soft check. **(g) Activity model**: phases are exhaustive enum `idle/claiming_run/planning/claiming_task/dispatching/submitting` (no `routing` in 6c — that arrives in 6d alongside auto-apply); reporter uses **enqueue-not-overwrite** so rapid phase transitions all reach subscriber, only same-phase step changes coalesce within 500ms; in-memory Hub uses unbuffered subscriber channels with non-blocking send (slow client auto-drops, reconnect picks up via initial state); SSE includes 30s keepalive comments + `X-Accel-Buffering: no`; per-user concurrent SSE connections capped at 3 (503 above); DB persists latest snapshot only (Activity does NOT write to actor_audit per critic round 3 #8); idle activities retained 5 min before purge. **(h) Audit invariants**: every `execution_role` change writes to `actor_audit` in same transaction; `actor_kind` ∈ {"user", "router", "system", "connector"}; `rationale` stores router confidence + reasoning, or system change reason; cascade-delete with subject row. **(i) Operational constraint**: L0 boundary is the ONLY safety enforcement until L1 ships — operators MUST NOT expose role_dispatch to non-operator task submitters or untrusted task content; this is documented verbatim in `docs/operating-rules.md` and is non-negotiable. L1 (process-level jail via firejail/namespaces) is evaluated when Phase 6d opens; L2 (container/VM full isolation) requires one of three triggers fired: multi-tenant submitters, untrusted external repos, or compliance requirements. **(j) Phase 6d/7 trigger conditions**: recorded in `docs/phase6c-plan.md` §9; opening either phase without a documented trigger having fired is a scope-creep violation. **(k) PR ordering**: PR-1 first (already implemented); PR-2/3/4 sequential to avoid rebase cost (each later PR consumes earlier-PR types); PR-5 last (dogfood requires all four prior PRs).
+- **Source**: `docs/phase6c-plan.md` v5.1 (post-critic-round-3, B2 + C1 拍板). Backed by dogfood-generated backlog candidates `bad629dc` (catalog SoT) and `fb040ce6` (safety boundary), both `approved` status as of 2026-04-25, plus design dialogues with the user that surfaced the catch-22, the LLM router request, the activity visibility requirement, and a critic round adversarially analyzing v5 that produced 14 findings (9 unilateral fixes adopted, B2 + C1 user-decided).
+
 ## 2026-04-25: Requirement discard, analysis filtering, and connector run-status badge [agent:application-implementer]
 
 - **Context**: Three interconnected UX improvements: (1) Requirements with no applied tasks are permanently deletable (not just archivable); (2) `source=analysis` / `source=system` requirements should not appear in the sidebar or affect the isEmpty check; (3) The connector status badge should reflect when a run is actively leased or queued.

diff --git a/backend/internal/connector/builtin_adapter.go b/backend/internal/connector/builtin_adapter.go
@@ -62,14 +62,26 @@ func ExecuteBuiltin(ctx context.Context, input ExecJSONInput) models.LocalConnec
 	}
 
 	// Run CLI.
-	output, runErr := invokeBuiltinCLI(ctx, agent, binary, model, prompt, timeoutSec)
+	output, truncated, runErr := invokeBuiltinCLI(ctx, agent, binary, model, prompt, timeoutSec)
+	// Precedence: when both runErr and truncation are set, runErr is
+	// more informative (likely a timeout that produced partial output).
+	// The truncation-only branch fires when the CLI exited normally
+	// but produced more than the cap.
 	if runErr != "" {
 		return models.LocalConnectorSubmitRunResultRequest{
 			Success:      false,
 			ErrorMessage: runErr,
 			CliInfo:      &models.CliUsageInfo{Agent: agent, Model: model, ModelSource: modelSource},
 		}
 	}
+	if truncated {
+		return models.LocalConnectorSubmitRunResultRequest{
+			Success:      false,
+			ErrorMessage: fmt.Sprintf("CLI stdout exceeded the dispatch output cap (%d bytes); raise ANPM_DISPATCH_OUTPUT_MAX or set 0 to disable", dispatchOutputMaxBytes()),
+			ErrorKind:    models.ErrorKindOutputTooLarge,
+			CliInfo:      &models.CliUsageInfo{Agent: agent, Model: model, ModelSource: modelSource},
+		}
+	}
 
 	// Strip ANSI (Codex PTY output may contain escape codes even after drain).
 	output = stripANSI(output)
@@ -565,11 +577,27 @@ func buildScopeSnippet(req *models.Requirement) string {
 	return strings.Join(parts, "\n")
 }
 
-// invokeBuiltinCLI runs the CLI and returns (output, errorMessage).
+// invokeBuiltinCLI runs the CLI and returns (output, truncated, errorMessage).
+//
+// Phase 6c L0 safety boundary:
+//   - timeoutSec <= 0 → no wall-clock limit (escape hatch); the caller
+//     resolved this from roles.TimeoutFor (env=0) or chose to disable.
+//   - timeoutSec > 0  → context.WithTimeout wrapped with SIGTERM →
+//     sigtermGracePeriod → SIGKILL escalation (applyDispatchKillEscalation).
+//   - stdout is wrapped in boundedWriter using dispatchOutputMaxBytes;
+//     when the cap is exceeded, output is truncated and the second
+//     return value is true (caller maps to ErrorKindOutputTooLarge).
+//
 // For Claude: uses exec.CommandContext with -p flag.
 // For Codex: uses creack/pty because Codex checks stdin.isTTY.
-func invokeBuiltinCLI(ctx context.Context, agent, binary, model, prompt string, timeoutSec int) (string, string) {
-	runCtx, cancel := context.WithTimeout(ctx, time.Duration(timeoutSec)*time.Second)
+func invokeBuiltinCLI(ctx context.Context, agent, binary, model, prompt string, timeoutSec int) (string, bool, string) {
+	var runCtx context.Context
+	var cancel context.CancelFunc
+	if timeoutSec > 0 {
+		runCtx, cancel = context.WithTimeout(ctx, time.Duration(timeoutSec)*time.Second)
+	} else {
+		runCtx, cancel = context.WithCancel(ctx)
+	}
 	defer cancel()
 
 	switch agent {
@@ -578,26 +606,28 @@ func invokeBuiltinCLI(ctx context.Context, agent, binary, model, prompt string,
 	case "codex":
 		return invokeCodexCLI(runCtx, binary, model, prompt, timeoutSec)
 	default:
-		return "", fmt.Sprintf("unsupported agent %q (expected 'claude' or 'codex')", agent)
+		return "", false, fmt.Sprintf("unsupported agent %q (expected 'claude' or 'codex')", agent)
 	}
 }
 
 // invokeClaudeCLI runs: claude -p <prompt> [--model <model>]
-func invokeClaudeCLI(ctx context.Context, binary, model, prompt string, timeoutSec int) (string, string) {
+func invokeClaudeCLI(ctx context.Context, binary, model, prompt string, timeoutSec int) (string, bool, string) {
 	args := []string{"-p", prompt}
 	if model != "" {
 		args = append(args, "--model", model)
 	}
 	cmd := exec.CommandContext(ctx, binary, args...)
 	cmd.Stdin = nil // disconnected
+	applyDispatchKillEscalation(cmd)
 
 	var stdout, stderr bytes.Buffer
-	cmd.Stdout = &stdout
+	bw := newBoundedWriter(&stdout, dispatchOutputMaxBytes())
+	cmd.Stdout = bw
 	cmd.Stderr = &stderr
 
 	if err := cmd.Run(); err != nil {
 		if ctx.Err() == context.DeadlineExceeded {
-			return "", fmt.Sprintf("claude CLI timed out after %ds", timeoutSec)
+			return "", bw.Truncated(), fmt.Sprintf("claude CLI timed out after %ds", timeoutSec)
 		}
 		detail := strings.TrimSpace(stderr.String())
 		if detail == "" {
@@ -609,33 +639,49 @@ func invokeClaudeCLI(ctx context.Context, binary, model, prompt string, timeoutS
 		if len(detail) > 400 {
 			detail = detail[:400]
 		}
-		return "", fmt.Sprintf("claude CLI failed: %s", detail)
+		return stdout.String(), bw.Truncated(), fmt.Sprintf("claude CLI failed: %s", detail)
 	}
-	return stdout.String(), ""
+	return stdout.String(), bw.Truncated(), ""
 }
 
 // invokeCodexCLI runs codex inside a PTY because Codex checks stdin.isTTY.
-func invokeCodexCLI(ctx context.Context, binary, model, prompt string, timeoutSec int) (string, string) {
+//
+// PTY-specific concurrency: the io.Copy that drains the PTY master is
+// run on a separate goroutine so we can Wait() on the process FIRST.
+// On a CLI that ignores SIGTERM, exec sends SIGKILL after WaitDelay;
+// once the process dies the kernel will eventually close its end of
+// the PTY which lets io.Copy return — but on some platforms that close
+// is delayed. Closing ptmx explicitly after Wait returns guarantees
+// the goroutine unblocks immediately, and the copyDone channel ensures
+// we capture all output before reading buf.
+func invokeCodexCLI(ctx context.Context, binary, model, prompt string, timeoutSec int) (string, bool, string) {
 	args := []string{prompt}
 	if model != "" {
 		args = append(args, "--model", model)
 	}
 	cmd := exec.CommandContext(ctx, binary, args...)
+	applyDispatchKillEscalation(cmd)
 
 	ptmx, err := pty.Start(cmd)
 	if err != nil {
-		return "", fmt.Sprintf("pty unavailable for codex: %v", err)
+		return "", false, fmt.Sprintf("pty unavailable for codex: %v", err)
 	}
-	defer func() { _ = ptmx.Close() }()
 
-	// Drain all output from the PTY master into a buffer.
 	var buf bytes.Buffer
-	_, _ = io.Copy(&buf, ptmx)
-
-	// Wait for the command to finish.
-	if waitErr := cmd.Wait(); waitErr != nil {
+	bw := newBoundedWriter(&buf, dispatchOutputMaxBytes())
+	copyDone := make(chan struct{})
+	go func() {
+		_, _ = io.Copy(bw, ptmx)
+		close(copyDone)
+	}()
+
+	waitErr := cmd.Wait()
+	_ = ptmx.Close() // unblock the copy goroutine if it has not already returned
+	<-copyDone       // ensure all output is captured before reading buf
+
+	if waitErr != nil {
 		if ctx.Err() == context.DeadlineExceeded {
-			return "", fmt.Sprintf("codex CLI timed out after %ds", timeoutSec)
+			return "", bw.Truncated(), fmt.Sprintf("codex CLI timed out after %ds", timeoutSec)
 		}
 		raw := stripANSI(buf.String())
 		raw = strings.ReplaceAll(raw, "\r\n", "\n")
@@ -644,13 +690,13 @@ func invokeCodexCLI(ctx context.Context, binary, model, prompt string, timeoutSe
 		if len(tail) > 600 {
 			tail = tail[len(tail)-600:]
 		}
-		return "", fmt.Sprintf("codex CLI failed (%v): %s", waitErr, tail)
+		return raw, bw.Truncated(), fmt.Sprintf("codex CLI failed (%v): %s", waitErr, tail)
 	}
 
 	raw := buf.String()
 	raw = strings.ReplaceAll(raw, "\r\n", "\n")
 	raw = strings.ReplaceAll(raw, "\r", "\n")
-	return raw, ""
+	return raw, bw.Truncated(), ""
 }
 
 // stripANSI removes ANSI escape codes from s.