From 89bfbcf4e7bab14bf47ed3db07b0c5c7fce33f95 Mon Sep 17 00:00:00 2001 From: Lien Chen Date: Sat, 25 Apr 2026 22:52:13 +0900 Subject: [PATCH 1/2] feat(phase6c-pr1): role catalog skeleton + L0 safety boundary MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Phase 6c PR-1 — implements the role catalog data structure and the L0 safety boundary around connector subprocess invocation. Closes two of the four gaps Phase 6c addresses; remaining PRs ship per the v5.1 plan in docs/phase6c-plan.md. ## What's in this PR - backend/internal/roles/ — hand-maintained Role catalog (6 entries) with per-role DefaultTimeoutSec (code-reviewer 15min through backend-architect 90min). TimeoutFor resolves env override → catalog → 30min fallback. ANPM_DISPATCH_TIMEOUT=0 disables. TestCatalogMatchesPromptDir is the SoT-drift detector. - backend/internal/connector/dispatch_safety.go — - boundedWriter (atomic.Int64 + atomic.Bool, max=0 disables) - applyDispatchKillEscalation: cmd.Cancel = SIGTERM, cmd.WaitDelay = 5s before SIGKILL (per Go 1.20+ contract; bare context.Cancel does not guarantee subprocess termination on SIGTERM-trapping CLIs) - validateExecutionResult: enforces files [] required, optional test_instructions/risks/followups type-checked - invokeBuiltinCLI signature changes from (string, string) to (string, bool, string) adding truncated flag. invokeClaudeCLI and invokeCodexCLI both apply boundedWriter + signal escalation. invokeCodexCLI uses io.Copy goroutine + ptmx.Close after Wait to avoid hangs on SIGTERM-ignoring Codex (risk-reviewer H1). - RunOnceTask uses roles.TimeoutFor(roleID) for per-role timeout; precedence is runErr-over-truncated when both are set (so timeout signals are not masked by output cap firing); routes failures to 4 new error_kinds: dispatch_timeout, output_too_large, invalid_result_schema, role_not_found (the last is reserved for PR-2 server-side claim-next-task enforcement). - Adversarial test matrix: 14 dispatch tests + 9 catalog tests, all green under -race. T-6c-C2-2 spawns a real subprocess via os.Args[0] with signal.Ignore(SIGTERM) to prove SIGKILL escalation fires within 1s timeout + 5s grace. TestMain uses a double-sentinel guard (ANPM_TEST_HELPER_GUARD=1 + ANPM_TEST_HELPER_MODE) so a developer with the mode env set in their shell does not get a silent zero-test exit. ## Plan v5.1 + DECISIONS context This PR also ships docs/phase6c-plan.md (5-PR scope, ~13 days total) and a Phase 6c entry in DECISIONS.md. Subsequent PRs: - PR-2: authoring lifecycle + actor_audit (SoT) + 4-point catalog enforcement - PR-3: LLM router suggest endpoint (advisory; auto-apply deferred to Phase 6d per dogfood signal) - PR-4: connector activity tracking via SSE - PR-5: dogfood + DECISIONS archival ## Reviews completed - make pre-pr: green twice (SQLite + PostgreSQL + frontend build) - critic subagent: 2 rounds, 4 Required + 2 nice-to-have all addressed; round-2 findings included Codex PTY io.Copy goroutine fix, atomic.Int64 boundedWriter, SIGTERM-ignore test slack 5s - /security-review: 0 findings ≥ confidence 8 - risk-reviewer: 2 HIGH (H1 Codex PTY hang, H2 boundedWriter race) + 1 MEDIUM (M1 timing fragility) all addressed; lower-severity findings logged as risks R22-R31 in plan §6 Co-Authored-By: Claude Opus 4.7 (1M context) --- DECISIONS.md | 8 + backend/internal/connector/builtin_adapter.go | 88 +- backend/internal/connector/dispatch_safety.go | 179 +++ .../connector/dispatch_safety_test.go | 389 ++++++ backend/internal/connector/probe.go | 2 +- backend/internal/connector/service.go | 73 +- backend/internal/models/requirement.go | 70 +- backend/internal/roles/catalog.go | 156 +++ backend/internal/roles/catalog_test.go | 268 ++++ docs/phase6c-plan.md | 1111 +++++++++++++++++ 10 files changed, 2287 insertions(+), 57 deletions(-) create mode 100644 backend/internal/connector/dispatch_safety.go create mode 100644 backend/internal/connector/dispatch_safety_test.go create mode 100644 backend/internal/roles/catalog.go create mode 100644 backend/internal/roles/catalog_test.go create mode 100644 docs/phase6c-plan.md diff --git a/DECISIONS.md b/DECISIONS.md index 317c258..f6ba582 100644 --- a/DECISIONS.md +++ b/DECISIONS.md @@ -4,6 +4,14 @@ Active architectural and behavioral decisions for Agent Native PM. When this file exceeds 50 entries or 30 KB, archive older entries to `DECISIONS_ARCHIVE.md`. The most recent archival pass was on 2026-04-22. +## 2026-04-25: Phase 6c v5.1 — L0 safety + authoring lifecycle + LLM router (suggest-only) + activity visibility [agent:feature-planner] + +- **Context**: Phase 6b shipped role-dispatch end-to-end in code but the user-facing path is broken in three independent ways: (1) `execution_role` has no UI authoring surface, so the role_dispatch radio is permanently disabled (catch-22); (2) "auto-dispatch" is in name only — operators must still manually pick a role for every task; (3) connector activity during long task execution (backend-architect can run 90 min) is invisible to the frontend, making dogfood debugging impossible. Phase 5 §(g)/§(h) also flagged subprocess sandboxing as a Phase 6 blocker. The user explicitly rejected simple / phase-staged solutions: "一律不考慮簡單作法 我希望是完善的改動 而不是臨時性處理" (memory: `feedback_no_simple_approach`). Closing all four gaps in one phase is therefore in scope, even at ~15-day total cost. +- **Decision**: Phase 6c ships as **5 separate PRs**, each independently reviewable and rollback-able, but coherent as a single capability slice. **PR-1 (done)**: role catalog skeleton (hand-maintained `[]Role{...}` with per-role `DefaultTimeoutSec`) + L0 safety boundary in connector subprocess invocation (wall-clock timeout via `cmd.Cancel = SIGTERM` + `cmd.WaitDelay = 5s` escalation, output size cap via `boundedWriter`, JSON schema minimum validation, 4 new error kinds: `role_not_found`/`dispatch_timeout`/`output_too_large`/`invalid_result_schema`). **PR-2**: full authoring lifecycle — migration 030 adds generic `actor_audit` table (subject_kind/subject_id/field/old_value/new_value/actor_kind/actor_id/rationale/confidence) which is the **single source of truth** for execution_role authoring metadata (no denormalised columns on `backlog_candidates` per critic round 3 #1; frontend reads via helper `LatestAuthoring(subject_kind, subject_id, field)` joining against latest audit row); PATCH endpoint accepts `execution_role`; apply API extended with `execution_role` payload; catalog enforcement at four entry points (PATCH, apply, claim-next-task; suggest gates at validation in PR-3); CandidateReviewPanel rewritten so role_dispatch radio is always enabled with inline `` 從 catalog 拉,inline edit popover) +2. Operator 可在 apply panel **at apply time** 設 / 改 execution_role(pre-fill 自 candidate latest audit row) +3. Apply payload 帶 `execution_role`;server 在 4 個進入點做 catalog enforcement(PATCH / suggest / apply / claim-next-task) +4. Stale role(candidate 既有 role 但已不在 catalog)顯示 inline warning + 預設清空 dropdown +5. 所有 execution_role 變更走 `actor_audit` table,actor_kind ∈ {user, router, system},含 rationale + timestamp。**Audit 是唯一的 set_by/at/confidence SoT**(critic #1 — 不在 candidate 上重複欄位;frontend 顯示時走 audit JOIN) +6. `MarkTaskRoleNotFound` 在 claim-next-task 時把 stale-role task `queued → failed` 原子轉移 + +### 2.2 LLM Router — Suggest-only(PR-3) + +7. 新 prompt `prompts/meta/dispatcher.md`(category=meta),輸出 `{role_id, confidence, reasoning, alternatives[]}`(critic #10 — 放 `meta/` 子目錄,不和 `roles/` 並列也不和 backlog/whatsnext 並列) +8. `POST /api/backlog-candidates/:id/suggest-role` endpoint:呼叫 router、回傳結果**不持久化** +9. Apply panel + Candidate card 都加 "💡 Suggest" 按鈕:呼叫 router → 預填 dropdown + tooltip 顯示 reasoning + alternatives +10. Router 呼叫重用 PR-1 的 invokeBuiltinCLI(含 timeout / output cap / signal escalation);server 端在 process 內呼叫(單機假設,文件化) +11. Router timeout 來自 catalog(dispatcher role default 60s) +12. 1 個新 error_kind:`router_no_match`(router 自己判斷沒匹配)+ 既有 PR-1 kinds 涵蓋其他失敗(output_too_large / dispatch_timeout / invalid_result_schema) +13. **Auto-apply mode(`mode=role_dispatch_auto`)延到 Phase 6d**(critic #2 / user 拍板 B2)— 待 PR-5 dogfood 收集 router 品質訊號後再決定 + +### 2.3 Activity Visibility(PR-4 完整) + +14. Connector 在每個 phase 邊界(idle / claiming_run / planning / claiming_task / dispatching / submitting)呼叫 `ActivityReporter.Report`。**Phase 變化用 enqueue 不是 overwrite**(critic #5)— 確保連續 phase 切換 `claiming_task → dispatching → submitting` 都會被推送,即使在 coalesce 視窗內。`routing` phase **延到 Phase 6d**(auto-apply 上線後才需要)。 +15. `POST /api/connector/activity` lightweight endpoint 接收上報;server-side activity hub 維護 in-memory state + DB snapshot 欄位(不寫 actor_audit — critic #8,避免 write storm) +16. `GET /api/connectors/:id/activity-stream` SSE 推送即時 activity 變化;polling fallback `GET /api/connectors/:id/activity`(C1 拍板:保留 SSE) +17. Frontend `useConnectorActivity` hook:SSE 為主、polling 為輔、reconnect 邏輯、stale 偵測 +18. `ConnectorActivityBadge` 3 種 density(compact / standard / full);整合進 PlanningTab、TasksTab、CandidateReviewPanel apply 後 watch +19. `GET /api/projects/:id/active-connectors` project-level aggregate + +### 2.4 Dogfood + Docs(PR-5 完整) + +20. `docs/phase6c-dogfood-notes.md`:7 個 dogfood 步驟(5 個原 v3 觸發新 error_kind + 2 個 v5.1 觀察 router suggest 與 activity badge 切換;auto-apply / PhaseRouting 預覽留 6d) +21. `docs/operating-rules.md` 新「Role-dispatch safety + visibility model」一節,含 L0 / L1 / L2 觸發條件 + activity model 約束 +22. DECISIONS.md 補完 Phase 6c 條目(涵蓋 v5 全部設計) + +--- + +## 3. Slice 計畫(5 PR) + +### 3.1 PR-1:Catalog skeleton + L0 safety boundary(**已實作完成**) + +詳見 v3 plan 內容(保留): +- `backend/internal/roles/catalog.go`(Role struct + 6 entries + DefaultTimeoutSec + IsKnown / ByID / TimeoutFor / All) +- `backend/internal/roles/catalog_test.go`(drift detector + 9 tests) +- `backend/internal/connector/dispatch_safety.go`(boundedWriter + signal escalation + validateExecutionResult) +- `backend/internal/connector/dispatch_safety_test.go`(11 dispatch + 4 unit + 1 timeout-truncation precedence test) +- `invokeBuiltinCLI` 簽名擴 `(string, bool, string)`(+ truncated) +- `RunOnceTask` 用 `roles.TimeoutFor` + truncation/runErr precedence + schema validation + classifyDispatchRunError +- 4 個新 error_kind(dispatch_timeout / output_too_large / invalid_result_schema / role_not_found) + +**Critic round 2 修正**: +- TestMain 雙 sentinel guard(避免 user shell env 誤觸) +- ExecuteBuiltin truncation 補 ErrorKindOutputTooLarge +- 移除 redundant resolveAgentFromBinary +- runErr-over-truncated precedence + 對應 test +- boundedWriter 改 atomic.Int64(H2 防禦) +- Codex PTY io.Copy goroutine + ptmx.Close 序列(H1 修正) +- SIGTERM-ignore test slack 5s(M1 防 CI flake) +- TimeoutFor whitespace env test(L8) + +**Status**: 待開 PR;plan v5 確認後一起開 PR-1。 + +--- + +### 3.2 PR-2:Authoring 完整化 + audit log + multi-point catalog enforcement(4.6 天) + +#### 3.2.1 Migration 030 + +```sql +-- 030_authoring_audit.sql + +-- 通用 actor_audit 表 — 是 execution_role 的 single source of truth +-- (critic #1:不在 candidate 上加重複欄位) +CREATE TABLE actor_audit ( + id TEXT PRIMARY KEY, + subject_kind TEXT NOT NULL, -- 'backlog_candidate' | 'task' | 'planning_run' | 'connector' + subject_id TEXT NOT NULL, + field TEXT NOT NULL, -- 'execution_role' | 'status' | 'po_decision' | ... + old_value TEXT, + new_value TEXT, + actor_kind TEXT NOT NULL, -- 'user' | 'router' | 'system' | 'connector' + -- 'router' is reserved for Phase 6d auto-apply; + -- NO writer in 6c (PR-3 suggest writes 'user' after operator confirms) + actor_id TEXT, -- user_id | router prompt version | system component name + rationale TEXT, -- router confidence + reasoning,or system reason + confidence REAL, -- 0.0-1.0;only set when actor_kind='router' + created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP +); +CREATE INDEX idx_actor_audit_subject ON actor_audit(subject_kind, subject_id, created_at DESC); +CREATE INDEX idx_actor_audit_subject_field ON actor_audit(subject_kind, subject_id, field, created_at DESC); +``` + +`backlog_candidates.execution_role` 既有欄位**保留**(v3 Phase 5 已加,是當前 task source 的 input)。但「誰設的、何時設、信心多少」一律從 `actor_audit` 查 — 不在 candidate row 上重複欄位。 + +**Helper 函式**:`backend/internal/audit/audit.go` 提供 `LatestAuthoring(subjectKind, subjectID, field)` 回傳最新一筆 audit row(with actor_kind/at/confidence/rationale)。Frontend `GET /api/backlog-candidates/:id` response 加 `execution_role_authoring` 欄位(透過此 helper 回填,非 column)。 + +#### 3.2.2 Backend changes + +**Store 層**: +```go +// backlog_candidate_store.go +func (s *Store) UpdateExecutionRole( + ctx context.Context, id, role string, actor ActorInfo, +) error +// 單一 transaction: +// 1. 驗 role 在 catalog(roles.IsKnown)— role="" 視為 clear,不需 catalog 檢查 +// 2. SELECT old_value(for audit) +// 3. UPDATE candidate.execution_role +// 4. INSERT actor_audit row(含 actor_kind/actor_id/rationale/confidence) +// 5. COMMIT + +// 既有 ApplyToTaskWithMode 簽名擴: +func (s *Store) ApplyToTaskWithMode( + id, executionMode, executionRole string, actor ActorInfo, +) (*ApplyResult, error) +// 內部: +// - mode=role_dispatch && role 空 → ErrApplyMissingRole +// - mode=role_dispatch && !roles.IsKnown(role) → ErrApplyUnknownRole +// - mode=manual → ignore role +// - 同 transaction 寫 candidate.execution_role (若有變) + audit + create task +``` + +**Handler 層**: +```go +// PATCH /api/backlog-candidates/:id 擴:accept execution_role 欄位 +type UpdateBacklogCandidateRequest struct { + POdecision *string `json:"po_decision,omitempty"` + ExecutionRole *string `json:"execution_role,omitempty"` // pointer = explicitly set/clear vs not-mentioned +} + +// POST /api/backlog-candidates/:id/apply 擴: +type ApplyBacklogCandidateRequest struct { + ExecutionMode string `json:"execution_mode"` + ExecutionRole string `json:"execution_role,omitempty"` +} +``` + +Validation: +- mode=`role_dispatch` + role empty → 400 `"execution_role required when execution_mode=role_dispatch"` +- mode=`role_dispatch` + role 不在 catalog → 400 with current catalog list +- mode=`manual` → ignore role +- mode=`role_dispatch_auto` → PR-3 處理 + +**`MarkTaskRoleNotFound` + claim-next-task enforcement**(從 v3 帶過來): +```go +func (s *TaskStore) MarkTaskRoleNotFound( + ctx, taskID, roleID string, +) error +// 條件 update:dispatch_status='queued' → 'failed' +// 同 transaction 寫 execution_result {success:false, error_kind:'role_not_found'} +// + actor_audit row(actor_kind='system') +// 若 task 已被 lease(status=running)→ 0 rows → ErrTaskNotInQueuedState +``` + +```go +// connector_dispatch.go ClaimNextTask 加: +roleID := parseRoleIDFromSource(task.Source) +if !roles.IsKnown(roleID) { + if err := store.MarkTaskRoleNotFound(ctx, task.ID, roleID); err != nil { + log.Printf("mark role_not_found failed: %v", err) + } + continue // 看下一個 task +} +``` + +**`GET /api/roles`**(公開): +```go +type RoleResponse struct { + ID string `json:"id"` + Title string `json:"title"` + Version int `json:"version"` + UseCase string `json:"use_case"` + DefaultTimeoutSec int `json:"default_timeout_sec"` + Category string `json:"category"` // "role" | "meta" +} + +func (h *Handler) ListRoles(w, r) { + roles := roles.All() + // filter category="role" — meta-roles (dispatcher) 不暴露給 apply panel + out := []RoleResponse{} + for _, r := range roles { + if r.Category == "role" { + out = append(out, toResponse(r)) + } + } + writeJSON(w, 200, out) +} +``` + +⚠️ 需要在 `roles/catalog.go` 加 `Role.Category` 欄位(之前 v3 沒有)— PR-2 順便加。dispatcher role(PR-3 加)會用 `Category: "meta"`。 + +#### 3.2.3 Frontend changes + +**新檔 `frontend/src/types/roles.ts`**: +```typescript +export const KNOWN_ROLE_IDS = [ + 'backend-architect', + 'ui-scaffolder', + 'db-schema-designer', + 'api-contract-writer', + 'test-writer', + 'code-reviewer', +] as const; +export type KnownRoleId = typeof KNOWN_ROLE_IDS[number]; + +export interface RoleInfo { + id: KnownRoleId; + title: string; + version: number; + use_case: string; + default_timeout_sec: number; + category: 'role' | 'meta'; +} +``` + +**新檔 `frontend/src/api/roles.ts`**: +```typescript +export async function listRoles(): Promise +export async function suggestRoleForCandidate(candidateID: string): Promise // PR-3 +``` + +**Drift test** `roles.test.ts`:fetch `/api/roles` → assert id 集合 = `KNOWN_ROLE_IDS`。 + +**CandidateReviewPanel 重寫 execution-mode UI**: +```tsx +// Radio 永遠 enabled(不看 candidate.execution_role) +const [chosenRole, setChosenRole] = useState(candidateInitialRole) +const candidateRole = selectedCandidate?.execution_role +const roleStaleWarning = candidateRole && !KNOWN_ROLE_IDS.includes(candidateRole) + + onSelectedExecutionModeChange('role_dispatch')} /> + +{selectedExecutionMode === 'role_dispatch' && ( + <> + + {roleStaleWarning && ( +
+ ⚠ Previously suggested role {candidateRole} is no longer in the catalog. +
+ )} + {/* Suggest 按鈕 在 PR-3 加 */} + +)} + +// Apply button disabled 條件加: +// (selectedExecutionMode === 'role_dispatch' && !chosenRole) +``` + +**新 component `CandidateRoleEditor.tsx`**(在 candidate card 上): +```tsx +
+ {candidate.execution_role ? ( + + [{role.title}] + + ) : ( + — no role set — + )} + + {/* popover with role `; CandidateRoleEditor inline popover on candidate cards; stale-role warning when previously-suggested role no longer in catalog. **PR-3 (suggest-only, B2)**: LLM router as advisory meta-agent — new `backend/internal/prompts/meta/dispatcher.md` (category=meta, version 1, default timeout 60s; lives under `meta/` subtree per critic #10, drift test walks both `roles/` and `meta/`); `dispatcher.Service.Suggest` reuses PR-1's `invokeBuiltinCLI` for safety; `POST /api/backlog-candidates/:id/suggest-role` returns RouterResult without persisting; **Apply API NOT extended with `role_dispatch_auto` mode** — operator confirms suggested role manually then applies via `mode=role_dispatch` (audit row `actor_kind="operator"` even when suggestion came from router; this avoids premature auto-apply before router quality is dogfood-validated, per user pick B2); 1 new error kind (`router_no_match`); router output validated against catalog (role_id must be known or "no_match", confidence ∈ [0,1], reasoning ≤ 1024 chars with control-char sanitization); migration 032 reserved as PR-3 placeholder (per critic #9). **PR-4**: connector activity tracking — migration 031 adds `current_activity_json/at` snapshot column on `local_connectors`; ActivityReporter on connector emits phase transitions via **enqueue-not-overwrite** queue (per critic #5 — phase transitions in rapid succession all reach subscriber; same-phase step changes coalesce in 500ms window); phase enum is `idle/claiming_run/planning/claiming_task/dispatching/submitting` (no `routing` — that arrives in 6d with auto-apply); `POST /api/connector/activity` lightweight ingest endpoint; in-memory Hub broadcasts to SSE subscribers via unbuffered channels (slow clients auto-dropped); `GET /api/connectors/:id/activity-stream` SSE with 30s keepalive + `X-Accel-Buffering: no` header (C1: SSE retained vs polling-only); `GET /api/connectors/:id/activity` polling fallback; project-level aggregate `GET /api/projects/:id/active-connectors`; frontend `useConnectorActivity` hook auto-degrades SSE→polling→stale; `ConnectorActivityBadge` 3 density variants. **Activity does NOT write to actor_audit** (per critic #8 — 5+ phase transitions per task would drown the human-meaningful authoring trail; only latest snapshot persists). **PR-5**: dogfood (8 deliberate-trigger steps including activity SSE observation) + `docs/operating-rules.md` "Role-dispatch + visibility model" section + DECISIONS.md final + **archival pass** moving 2026-04-22-and-older entries to `DECISIONS_ARCHIVE.md` (per critic #11 — file already past 30KB threshold). - **Alternatives considered**: (1) Defer authoring + router + activity to Phase 6d — rejected; the user explicitly stated dogfood today is broken without all three (see catch-22 analysis in plan v5 §1.1). (2) Single mega-PR — rejected; 15-day single PR is unreviewable; coherent slicing into 5 PRs preserves coherence while making each PR shippable in 1-5 days. (3) Apply-time-only authoring (no candidate edit + no audit) — rejected; user feedback explicitly requires comprehensive authoring (`feedback_no_simple_approach`); apply-time-only would force re-rework when Phase 6d's LLM planner pre-fills `execution_role` at candidate creation. (4) Polling-only activity (no SSE) — rejected at user pick C1; sub-second visibility is consistent with `feedback_no_simple_approach` even though 3s polling would technically suffice. (5) Sync `role_dispatch_auto` in PR-3 — **rejected at user pick B2 (critic round 3 #2)**; without dogfood data on router quality, auto-apply is premature optimization; PR-3 ships suggest-only and PR-6 (or 6d) lands auto-apply once PR-5 dogfood validates router accuracy. (6) Async `role_dispatch_auto` + webhook in 6c — deferred to 6d per user §5 Q3 answer (depends on auto-apply existing first). (7) Skip Role.Category field — rejected; without `category="meta"` filter the dispatcher prompt would surface in `/api/roles` and self-recommend, breaking the routing semantics. (8) `actor_audit` as candidate-specific table — rejected; designed generic from start (subject_kind discriminator) so PR-3 router-actor rows (when 6d auto-apply lands) and PR-4 system-actor rows reuse the same infrastructure. (9) Denormalised `execution_role_set_by/_at/_confidence` columns on `backlog_candidates` — **rejected at critic round 3 #1**; `actor_audit` is the single source of truth and frontend reads via JOIN helper to avoid drift between two writers. (10) Activity history written to `actor_audit` — **rejected at critic round 3 #8**; ~5 phase transitions per task dispatch would drown the human-meaningful authoring trail; activity only persists as latest snapshot, dedicated time-series store deferred to 6d if needed. (11) `dispatcher.md` placed at `prompts/dispatcher.md` siblng to `backlog.md` and `whatsnext.md` — **rejected at critic round 3 #10**; meta-prompts deserve their own subtree (`prompts/meta/`) for IA clarity and future expansion. -- **Constraints introduced**: **(a) Role catalog**: `backend/internal/roles/catalog.go` is hand-maintained `[]Role{...}` (no codegen); `Role.Category` ∈ {"role", "meta"}; `roles.All()` returns full set; `/api/roles` filters category="role"; `TestCatalogMatchesPromptDir` walks both `prompts/roles/*.md` and `prompts/dispatcher.md`; PR adding a new role MUST edit both files. **(b) execution_role lifecycle**: writes go through `BacklogCandidateStore.UpdateExecutionRole(ctx, id, role, actor)` which does single-transaction validate → SELECT old → UPDATE → INSERT actor_audit; concurrent PATCH/apply use `BEGIN IMMEDIATE` (existing SQLite pattern). `set_by` ∈ {"", "operator", "router"}; `confidence` only set when `set_by="router"`. **(c) Apply API**: `mode=role_dispatch` requires non-empty `execution_role` in catalog → else 400; `mode=role_dispatch_auto` calls dispatcher synchronously (6c) and returns 422 with router_decision payload when `confidence < min_confidence` or `role_id="no_match"`; `mode=manual` ignores `execution_role`. **(d) Server-side claim enforcement**: `MarkTaskRoleNotFound` does `dispatch_status: queued → failed` atomic transition (NOT `running → failed`) plus single-tx audit row; non-applicable when task already leased. **(e) L0 safety**: per-role timeouts in catalog (code-reviewer=15min, test-writer=20min, api-contract-writer=30min, ui-scaffolder=45min, db-schema-designer=45min, backend-architect=90min, dispatcher=60min); `ANPM_DISPATCH_TIMEOUT` env override hierarchy: env>0 → env value, env=0 → disabled, env<0/unset → catalog → 30min fallback. SIGTERM→5s→SIGKILL escalation via `cmd.Cancel`+`cmd.WaitDelay`; adversarial test uses real subprocess with `signal.Ignore`. Output cap default 5 MB via `ANPM_DISPATCH_OUTPUT_MAX`; 0=disabled. JSON schema minimum: must contain `files []`; optional fields type-checked. **(f) Router**: dispatcher prompt is in `prompts/dispatcher.md` (NOT under `roles/`) with `category: meta`; output validated for role_id ∈ catalog, confidence ∈ [0,1], reasoning length and control-char sanitization; alternatives all validated against catalog; `min_confidence` default 0.7 (operator-overridable per apply); router output never trusted to be a valid catalog entry — Validation is a hard gate not a soft check. **(g) Activity model**: phases are exhaustive enum (idle/claiming_run/planning/claiming_task/routing/dispatching/submitting); reporter coalesces step-only changes within 500ms but always sends phase transitions; in-memory Hub uses unbuffered subscriber channels with non-blocking send (slow client auto-drops, reconnect picks up via initial state); SSE includes 30s keepalive comments + `X-Accel-Buffering: no`; per-user concurrent SSE connections capped at 3 (503 above); DB persists latest snapshot only (history lives in actor_audit); idle activities retained 5 min before purge. **(h) Audit invariants**: every `execution_role` change writes to `actor_audit` in same transaction; `actor_kind` ∈ {"user", "router", "system", "connector"}; `rationale` stores router confidence + reasoning, or system change reason; cascade-delete with subject row. **(i) Operational constraint**: L0 boundary is the ONLY safety enforcement until L1 ships — operators MUST NOT expose role_dispatch to non-operator task submitters or untrusted task content; this is documented verbatim in `docs/operating-rules.md` and is non-negotiable. L1 (process-level jail via firejail/namespaces) is evaluated when Phase 6d opens; L2 (container/VM full isolation) requires one of three triggers fired: multi-tenant submitters, untrusted external repos, or compliance requirements. **(j) Phase 6d/7 trigger conditions**: recorded in `docs/phase6c-plan.md` §9; opening either phase without a documented trigger having fired is a scope-creep violation. **(k) PR ordering**: PR-1 first (already implemented); PR-2/3/4 sequential to avoid rebase cost (each later PR consumes earlier-PR types); PR-5 last (dogfood requires all four prior PRs). +- **Constraints introduced**: **(a) Role catalog**: `backend/internal/roles/catalog.go` is hand-maintained `[]Role{...}` (no codegen); `Role.Category` ∈ {"role", "meta"}; `roles.All()` returns full set; `/api/roles` filters category="role"; `TestCatalogMatchesPromptDir` walks both `prompts/roles/*.md` and `prompts/meta/*.md`; PR adding a new role MUST edit both files. **(b) execution_role lifecycle**: writes go through `BacklogCandidateStore.UpdateExecutionRole(ctx, id, role, actor)` which does single-transaction validate → SELECT old → UPDATE → INSERT actor_audit; concurrent PATCH/apply use `BEGIN IMMEDIATE` (existing SQLite pattern). `set_by` ∈ {"", "operator", "router"}; `confidence` only set when `set_by="router"`. **(c) Apply API**: `mode=role_dispatch` requires non-empty `execution_role` in catalog → else 400; `mode=manual` ignores `execution_role`. (`mode=role_dispatch_auto` is deferred to Phase 6d per user pick B2 — see Alternatives §5.) **(d) Server-side claim enforcement**: `MarkTaskRoleNotFound` does `dispatch_status: queued → failed` atomic transition (NOT `running → failed`) plus single-tx audit row; non-applicable when task already leased. **(e) L0 safety**: per-role timeouts in catalog (code-reviewer=15min, test-writer=20min, api-contract-writer=30min, ui-scaffolder=45min, db-schema-designer=45min, backend-architect=90min, dispatcher=60min); `ANPM_DISPATCH_TIMEOUT` env override hierarchy: env>0 → env value, env=0 → disabled, env<0/unset → catalog → 30min fallback. SIGTERM→5s→SIGKILL escalation via `cmd.Cancel`+`cmd.WaitDelay`; adversarial test uses real subprocess with `signal.Ignore`. Output cap default 5 MB via `ANPM_DISPATCH_OUTPUT_MAX`; 0=disabled. JSON schema minimum: must contain `files []`; optional fields type-checked. **(f) Router**: dispatcher prompt is in `prompts/meta/dispatcher.md` (NOT under `roles/`) with `category: meta`; output validated for role_id ∈ catalog, confidence ∈ [0,1], reasoning length and control-char sanitization; alternatives all validated against catalog; `min_confidence` default 0.7 (reserved for Phase 6d auto-apply path); router output never trusted to be a valid catalog entry — Validation is a hard gate not a soft check. **(g) Activity model**: phases are exhaustive enum `idle/claiming_run/planning/claiming_task/dispatching/submitting` (no `routing` in 6c — that arrives in 6d alongside auto-apply); reporter uses **enqueue-not-overwrite** so rapid phase transitions all reach subscriber, only same-phase step changes coalesce within 500ms; in-memory Hub uses unbuffered subscriber channels with non-blocking send (slow client auto-drops, reconnect picks up via initial state); SSE includes 30s keepalive comments + `X-Accel-Buffering: no`; per-user concurrent SSE connections capped at 3 (503 above); DB persists latest snapshot only (Activity does NOT write to actor_audit per critic round 3 #8); idle activities retained 5 min before purge. **(h) Audit invariants**: every `execution_role` change writes to `actor_audit` in same transaction; `actor_kind` ∈ {"user", "router", "system", "connector"}; `rationale` stores router confidence + reasoning, or system change reason; cascade-delete with subject row. **(i) Operational constraint**: L0 boundary is the ONLY safety enforcement until L1 ships — operators MUST NOT expose role_dispatch to non-operator task submitters or untrusted task content; this is documented verbatim in `docs/operating-rules.md` and is non-negotiable. L1 (process-level jail via firejail/namespaces) is evaluated when Phase 6d opens; L2 (container/VM full isolation) requires one of three triggers fired: multi-tenant submitters, untrusted external repos, or compliance requirements. **(j) Phase 6d/7 trigger conditions**: recorded in `docs/phase6c-plan.md` §9; opening either phase without a documented trigger having fired is a scope-creep violation. **(k) PR ordering**: PR-1 first (already implemented); PR-2/3/4 sequential to avoid rebase cost (each later PR consumes earlier-PR types); PR-5 last (dogfood requires all four prior PRs). - **Source**: `docs/phase6c-plan.md` v5.1 (post-critic-round-3, B2 + C1 拍板). Backed by dogfood-generated backlog candidates `bad629dc` (catalog SoT) and `fb040ce6` (safety boundary), both `approved` status as of 2026-04-25, plus design dialogues with the user that surfaced the catch-22, the LLM router request, the activity visibility requirement, and a critic round adversarially analyzing v5 that produced 14 findings (9 unilateral fixes adopted, B2 + C1 user-decided). ## 2026-04-25: Requirement discard, analysis filtering, and connector run-status badge [agent:application-implementer] diff --git a/backend/internal/connector/dispatch_safety_test.go b/backend/internal/connector/dispatch_safety_test.go index bd20f48..b59a592 100644 --- a/backend/internal/connector/dispatch_safety_test.go +++ b/backend/internal/connector/dispatch_safety_test.go @@ -5,7 +5,6 @@ import ( "context" "encoding/json" "fmt" - "io" "os" "os/signal" "strings" @@ -48,6 +47,18 @@ func runTestHelper(mode string) { // T-6c-C2-2: trap SIGTERM and ignore it; sleep until SIGKILL'd. signal.Ignore(syscall.SIGTERM) time.Sleep(10 * time.Minute) + case "ignore_sigterm_print_loop": + // T-6c-C2-15 (Copilot fix): trap SIGTERM, then continuously + // print stdout until SIGKILL'd. Triggers BOTH timeout AND + // boundedWriter truncation to verify runErr-over-truncated + // precedence in service.go RunOnceTask. + signal.Ignore(syscall.SIGTERM) + buf := bytes.Repeat([]byte("x"), 1024) + for { + if _, err := os.Stdout.Write(buf); err != nil { + return + } + } case "echo_args": // T-6c-C2-1: print the received -p prompt verbatim so the test // can verify shell metacharacters were NOT expanded. @@ -328,33 +339,35 @@ func TestInvokeBuiltinCLI_RaceFinishesBeforeTimeout(t *testing.T) { } func TestInvokeBuiltinCLI_TimeoutWithTruncationPrefersTimeout(t *testing.T) { - // Critic finding #4: when the CLI fills the output cap AND is - // killed by the timeout, the runErrMsg (timeout) carries more - // useful diagnostic info than truncation alone. The dispatch - // caller in service.go must prefer runErrMsg over truncated when - // both are set; this test pins the contract at invokeBuiltinCLI - // so any future refactor that swaps the precedence breaks here. + // Critic finding #4 + Copilot review #2: when the CLI fills the + // output cap AND is killed by the timeout, both `truncated=true` + // AND `runErrMsg!=""` are set. The dispatch caller in service.go + // must prefer runErrMsg (the timeout signal is more informative + // than the cap firing). This test now actually triggers BOTH + // conditions — earlier version used a sleep-only helper that + // printed nothing, so truncated stayed false and the test was + // theatrical. The new "ignore_sigterm_print_loop" helper traps + // SIGTERM and writes continuously, which trips the bounded + // writer well before SIGKILL escalation lands. if testing.Short() { t.Skip("subprocess test skipped in -short mode") } - // Use a tight 1-byte cap so even minimal output trips truncation. - // The helper sleeps forever ignoring SIGTERM, so timeout fires - // after 1s+5s = 6s. The 1-byte cap is irrelevant to whether the - // CLI is truncated (it doesn't print anything before being killed), - // but if a future implementation accidentally signals truncation - // preemptively, this combination would catch it. t.Setenv("ANPM_TEST_HELPER_GUARD", "1") - t.Setenv("ANPM_TEST_HELPER_MODE", "ignore_sigterm_sleep_forever") - t.Setenv("ANPM_DISPATCH_OUTPUT_MAX", "1") - _, _, errMsg := invokeBuiltinCLI(context.Background(), "claude", os.Args[0], "", "x", 1) + t.Setenv("ANPM_TEST_HELPER_MODE", "ignore_sigterm_print_loop") + t.Setenv("ANPM_DISPATCH_OUTPUT_MAX", "1024") // 1 KB — easy to trip + _, truncated, errMsg := invokeBuiltinCLI(context.Background(), "claude", os.Args[0], "", "x", 1) if errMsg == "" { t.Fatal("expected runErr (timeout); got empty") } + if !truncated { + t.Error("expected truncated=true; print loop should have exceeded 1 KB cap") + } if !strings.Contains(strings.ToLower(errMsg), "timed out") { t.Errorf("errMsg = %q, want substring 'timed out'", errMsg) } // Verify dispatch classifier picks dispatch_timeout (the Phase - // 6c-specific kind), not adapter_timeout. + // 6c-specific kind), not adapter_timeout, even when truncated is + // also set — that's the precedence rule under test. if got := classifyDispatchRunError(errMsg); got != "dispatch_timeout" { t.Errorf("classifyDispatchRunError = %q, want dispatch_timeout", got) } @@ -383,7 +396,3 @@ func TestClassifyDispatchRunError(t *testing.T) { } } -// Compile-time guard: `io.Discard` is used elsewhere; keep an unused -// reference here to avoid false-positive lint complaints if the imports -// shift around during refactors. -var _ io.Writer = io.Discard diff --git a/backend/internal/connector/service.go b/backend/internal/connector/service.go index 2cb6e7e..7c0a943 100644 --- a/backend/internal/connector/service.go +++ b/backend/internal/connector/service.go @@ -299,6 +299,12 @@ func (s *Service) RunOnceTask(ctx context.Context) (bool, error) { if len(snippet) > 240 { snippet = snippet[:240] } + // Normalize newlines so the error message renders cleanly in + // stderr logs and the task result panel — matches the + // builtin_adapter.go pattern for the same snippet shape. + snippet = strings.ReplaceAll(snippet, "\r\n", " ") + snippet = strings.ReplaceAll(snippet, "\n", " ") + snippet = strings.ReplaceAll(snippet, "\r", " ") errMsg := fmt.Sprintf("could not parse output as JSON: %v; first 240 chars: %s", extractErr, snippet) fmt.Fprintf(s.Stderr, "task %s: %s\n", task.ID, errMsg) if err := s.Client.SubmitTaskResult(ctx, task.ID, SubmitTaskResultRequest{