feat(phase6c-pr1): role catalog skeleton + L0 safety boundary by screenleon · Pull Request #26 · screenleon/agent-native-pm

screenleon · 2026-04-25T13:53:06Z

Summary

Phase 6c PR-1 — implements the role catalog data structure and the L0 safety boundary around connector subprocess invocation. This is the first of 5 PRs that comprise Phase 6c (full plan in docs/phase6c-plan.md v5.1).

What this PR ships

Role catalog (backend/internal/roles/): hand-maintained Role struct with 6 entries + per-role DefaultTimeoutSec (code-reviewer 15min through backend-architect 90min). TimeoutFor resolves env override → catalog → 30min fallback. ANPM_DISPATCH_TIMEOUT=0 disables. TestCatalogMatchesPromptDir is the SoT-drift detector.
L0 safety boundary (backend/internal/connector/dispatch_safety.go):
- boundedWriter with atomic.Int64 + atomic.Bool (max=0 disables)
- applyDispatchKillEscalation: cmd.Cancel = SIGTERM + cmd.WaitDelay = 5s → SIGKILL (Go 1.20+ contract; bare context.Cancel does not guarantee termination on SIGTERM-trapping CLIs)
- validateExecutionResult: enforces files [] required + optional fields type-checked
Codex PTY fix: io.Copy runs in a goroutine + ptmx.Close after cmd.Wait to avoid hangs on SIGTERM-ignoring Codex (risk-reviewer H1).
invokeBuiltinCLI signature change: (string, string) → (string, bool, string) adding truncated flag. All 3 call sites (probe, ExecuteBuiltin, RunOnceTask) updated.
RunOnceTask uses roles.TimeoutFor(roleID) for per-role timeout; precedence is runErr over truncated when both are set (so timeout signals are not masked by output cap firing).
4 new error_kinds: dispatch_timeout, output_too_large, invalid_result_schema, role_not_found (last is reserved for PR-2 server-side enforcement).

Test plan

make pre-pr green (SQLite + PostgreSQL + frontend build) — ran twice
go test -race green for connector + roles packages
T-6c-C2-2 SIGTERM-ignore subprocess test takes ~6.0s (= 1s timeout + 5s grace), proving SIGKILL escalation works
critic subagent review (2 rounds) — 4 Required findings + 2 nice-to-have all addressed
/security-review — 0 findings ≥ confidence 8
risk-reviewer — 2 HIGH (H1 Codex PTY, H2 atomic) + 1 MEDIUM (M1 timing) all addressed

Plan v5.1 context

This PR also ships docs/phase6c-plan.md v5.1 (5-PR scope, ~13 days total) and a Phase 6c entry in DECISIONS.md. Subsequent PRs (per v5.1):

PR	Scope	Estimated
PR-1 (this)	catalog + L0 safety	done
PR-2	authoring lifecycle + `actor_audit` (SoT) + 4-point catalog enforcement	4.4 days
PR-3	LLM router suggest endpoint (advisory only; auto-apply deferred to 6d per dogfood signal)	2.0 days
PR-4	connector activity tracking via SSE	5.3 days
PR-5	dogfood + docs + DECISIONS archival	1.2 days

🤖 Generated with Claude Code

Phase 6c PR-1 — implements the role catalog data structure and the L0 safety boundary around connector subprocess invocation. Closes two of the four gaps Phase 6c addresses; remaining PRs ship per the v5.1 plan in docs/phase6c-plan.md. ## What's in this PR - backend/internal/roles/ — hand-maintained Role catalog (6 entries) with per-role DefaultTimeoutSec (code-reviewer 15min through backend-architect 90min). TimeoutFor resolves env override → catalog → 30min fallback. ANPM_DISPATCH_TIMEOUT=0 disables. TestCatalogMatchesPromptDir is the SoT-drift detector. - backend/internal/connector/dispatch_safety.go — - boundedWriter (atomic.Int64 + atomic.Bool, max=0 disables) - applyDispatchKillEscalation: cmd.Cancel = SIGTERM, cmd.WaitDelay = 5s before SIGKILL (per Go 1.20+ contract; bare context.Cancel does not guarantee subprocess termination on SIGTERM-trapping CLIs) - validateExecutionResult: enforces files [] required, optional test_instructions/risks/followups type-checked - invokeBuiltinCLI signature changes from (string, string) to (string, bool, string) adding truncated flag. invokeClaudeCLI and invokeCodexCLI both apply boundedWriter + signal escalation. invokeCodexCLI uses io.Copy goroutine + ptmx.Close after Wait to avoid hangs on SIGTERM-ignoring Codex (risk-reviewer H1). - RunOnceTask uses roles.TimeoutFor(roleID) for per-role timeout; precedence is runErr-over-truncated when both are set (so timeout signals are not masked by output cap firing); routes failures to 4 new error_kinds: dispatch_timeout, output_too_large, invalid_result_schema, role_not_found (the last is reserved for PR-2 server-side claim-next-task enforcement). - Adversarial test matrix: 14 dispatch tests + 9 catalog tests, all green under -race. T-6c-C2-2 spawns a real subprocess via os.Args[0] with signal.Ignore(SIGTERM) to prove SIGKILL escalation fires within 1s timeout + 5s grace. TestMain uses a double-sentinel guard (ANPM_TEST_HELPER_GUARD=1 + ANPM_TEST_HELPER_MODE) so a developer with the mode env set in their shell does not get a silent zero-test exit. ## Plan v5.1 + DECISIONS context This PR also ships docs/phase6c-plan.md (5-PR scope, ~13 days total) and a Phase 6c entry in DECISIONS.md. Subsequent PRs: - PR-2: authoring lifecycle + actor_audit (SoT) + 4-point catalog enforcement - PR-3: LLM router suggest endpoint (advisory; auto-apply deferred to Phase 6d per dogfood signal) - PR-4: connector activity tracking via SSE - PR-5: dogfood + DECISIONS archival ## Reviews completed - make pre-pr: green twice (SQLite + PostgreSQL + frontend build) - critic subagent: 2 rounds, 4 Required + 2 nice-to-have all addressed; round-2 findings included Codex PTY io.Copy goroutine fix, atomic.Int64 boundedWriter, SIGTERM-ignore test slack 5s - /security-review: 0 findings ≥ confidence 8 - risk-reviewer: 2 HIGH (H1 Codex PTY hang, H2 boundedWriter race) + 1 MEDIUM (M1 timing fragility) all addressed; lower-severity findings logged as risks R22-R31 in plan §6 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

Implements Phase 6c PR-1 foundations by introducing a backend role catalog and adding an L0 safety boundary around connector subprocess execution (timeouts, output caps, schema checks), plus associated error kinds/remediations and planning/docs scaffolding for the broader Phase 6c slice plan.

Changes:

Added backend/internal/roles catalog with per-role default timeouts and a drift test against prompt frontmatter.
Hardened connector CLI invocation with SIGTERM→SIGKILL escalation, stdout output capping, and minimal JSON result schema validation; updated call sites for new (output, truncated, errMsg) signature.
Expanded backend error_kind allowlist/remediation catalog and added Phase 6c plan + DECISIONS entry.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
docs/phase6c-plan.md	Adds Phase 6c v5.1 multi-PR plan and design details (PR-1 through PR-5).
backend/internal/roles/catalog.go	Introduces canonical role catalog and timeout resolution (`ANPM_DISPATCH_TIMEOUT` override + fallback).
backend/internal/roles/catalog_test.go	Adds SoT drift detection vs `prompts/roles/*.md` and unit tests for catalog behaviors.
backend/internal/models/requirement.go	Adds dispatch safety boundary error_kinds to allowlist + remediations.
backend/internal/connector/service.go	Uses role-based timeout, handles output truncation, and validates execution result schema for task dispatch.
backend/internal/connector/probe.go	Updates probe path to new `invokeBuiltinCLI` return signature.
backend/internal/connector/dispatch_safety.go	Adds bounded stdout writer, output-max env parsing, SIGTERM→SIGKILL escalation wiring, and schema validator.
backend/internal/connector/builtin_adapter.go	Updates builtin CLI execution to enforce safety boundary + new signature and truncation handling.
backend/internal/connector/dispatch_safety_test.go	Adds unit + subprocess tests for output cap, timeout escalation, schema validation, and env parsing.
DECISIONS.md	Records Phase 6c v5.1 design decisions and PR slicing rationale.

Copilot · 2026-04-25T13:58:08Z

 		if err := s.Client.SubmitTaskResult(ctx, task.ID, SubmitTaskResultRequest{
 			Success:      false,
 			ErrorMessage: errMsg,
-			ErrorKind:    "unknown",
+			ErrorKind:    models.ErrorKindInvalidResultSchema,
+		}); err != nil {


The JSON-extraction failure path includes a raw output snippet in the error message, but it doesn't normalize newlines. In builtin_adapter.go the analogous code replaces \n with spaces before embedding the snippet; without that, task error messages/log lines can become multi-line and harder to read (and may render poorly in the UI). Consider applying the same newline (and possibly \r) normalization here before building errMsg.

Copilot · 2026-04-25T13:58:09Z

+func TestInvokeBuiltinCLI_TimeoutWithTruncationPrefersTimeout(t *testing.T) {
+	// Critic finding #4: when the CLI fills the output cap AND is
+	// killed by the timeout, the runErrMsg (timeout) carries more
+	// useful diagnostic info than truncation alone. The dispatch
+	// caller in service.go must prefer runErrMsg over truncated when
+	// both are set; this test pins the contract at invokeBuiltinCLI
+	// so any future refactor that swaps the precedence breaks here.
+	if testing.Short() {
+		t.Skip("subprocess test skipped in -short mode")
+	}
+	// Use a tight 1-byte cap so even minimal output trips truncation.
+	// The helper sleeps forever ignoring SIGTERM, so timeout fires
+	// after 1s+5s = 6s. The 1-byte cap is irrelevant to whether the
+	// CLI is truncated (it doesn't print anything before being killed),
+	// but if a future implementation accidentally signals truncation
+	// preemptively, this combination would catch it.
+	t.Setenv("ANPM_TEST_HELPER_GUARD", "1")
+	t.Setenv("ANPM_TEST_HELPER_MODE", "ignore_sigterm_sleep_forever")
+	t.Setenv("ANPM_DISPATCH_OUTPUT_MAX", "1")
+	_, _, errMsg := invokeBuiltinCLI(context.Background(), "claude", os.Args[0], "", "x", 1)
+	if errMsg == "" {
+		t.Fatal("expected runErr (timeout); got empty")
+	}
+	if !strings.Contains(strings.ToLower(errMsg), "timed out") {
+		t.Errorf("errMsg = %q, want substring 'timed out'", errMsg)
+	}
+	// Verify dispatch classifier picks dispatch_timeout (the Phase
+	// 6c-specific kind), not adapter_timeout.
+	if got := classifyDispatchRunError(errMsg); got != "dispatch_timeout" {
+		t.Errorf("classifyDispatchRunError = %q, want dispatch_timeout", got)
+	}


TestInvokeBuiltinCLI_TimeoutWithTruncationPrefersTimeout doesn't currently exercise the scenario it describes: the helper mode used (ignore_sigterm_sleep_forever) prints no stdout, so truncated should remain false and the test never verifies the "timeout + truncation" precedence. This makes the test name/comment misleading and leaves the intended edge case untested. Suggest adding a helper mode that continuously writes output while ignoring SIGTERM (so both timeout and output cap trigger), then assert truncated==true and errMsg is the timeout string (and/or add a focused test on the caller precedence in service.go).

Copilot · 2026-04-25T13:58:09Z

+// Compile-time guard: `io.Discard` is used elsewhere; keep an unused
+// reference here to avoid false-positive lint complaints if the imports
+// shift around during refactors.
+var _ io.Writer = io.Discard


The io import is only kept alive via the var _ io.Writer = io.Discard guard at the end of the file. This pattern adds noise and makes it harder to see real test logic; it's simpler to just remove the unused import (and this guard) and let gofmt/go test enforce import correctness.

Copilot · 2026-04-25T13:58:09Z

+- **Context**: Phase 6b shipped role-dispatch end-to-end in code but the user-facing path is broken in three independent ways: (1) `execution_role` has no UI authoring surface, so the role_dispatch radio is permanently disabled (catch-22); (2) "auto-dispatch" is in name only — operators must still manually pick a role for every task; (3) connector activity during long task execution (backend-architect can run 90 min) is invisible to the frontend, making dogfood debugging impossible. Phase 5 §(g)/§(h) also flagged subprocess sandboxing as a Phase 6 blocker. The user explicitly rejected simple / phase-staged solutions: "一律不考慮簡單作法 我希望是完善的改動 而不是臨時性處理" (memory: `feedback_no_simple_approach`). Closing all four gaps in one phase is therefore in scope, even at ~15-day total cost.
+- **Decision**: Phase 6c ships as **5 separate PRs**, each independently reviewable and rollback-able, but coherent as a single capability slice. **PR-1 (done)**: role catalog skeleton (hand-maintained `[]Role{...}` with per-role `DefaultTimeoutSec`) + L0 safety boundary in connector subprocess invocation (wall-clock timeout via `cmd.Cancel = SIGTERM` + `cmd.WaitDelay = 5s` escalation, output size cap via `boundedWriter`, JSON schema minimum validation, 4 new error kinds: `role_not_found`/`dispatch_timeout`/`output_too_large`/`invalid_result_schema`). **PR-2**: full authoring lifecycle — migration 030 adds generic `actor_audit` table (subject_kind/subject_id/field/old_value/new_value/actor_kind/actor_id/rationale/confidence) which is the **single source of truth** for execution_role authoring metadata (no denormalised columns on `backlog_candidates` per critic round 3 #1; frontend reads via helper `LatestAuthoring(subject_kind, subject_id, field)` joining against latest audit row); PATCH endpoint accepts `execution_role`; apply API extended with `execution_role` payload; catalog enforcement at four entry points (PATCH, apply, claim-next-task; suggest gates at validation in PR-3); CandidateReviewPanel rewritten so role_dispatch radio is always enabled with inline `<select>`; CandidateRoleEditor inline popover on candidate cards; stale-role warning when previously-suggested role no longer in catalog. **PR-3 (suggest-only, B2)**: LLM router as advisory meta-agent — new `backend/internal/prompts/meta/dispatcher.md` (category=meta, version 1, default timeout 60s; lives under `meta/` subtree per critic #10, drift test walks both `roles/` and `meta/`); `dispatcher.Service.Suggest` reuses PR-1's `invokeBuiltinCLI` for safety; `POST /api/backlog-candidates/:id/suggest-role` returns RouterResult without persisting; **Apply API NOT extended with `role_dispatch_auto` mode** — operator confirms suggested role manually then applies via `mode=role_dispatch` (audit row `actor_kind="operator"` even when suggestion came from router; this avoids premature auto-apply before router quality is dogfood-validated, per user pick B2); 1 new error kind (`router_no_match`); router output validated against catalog (role_id must be known or "no_match", confidence ∈ [0,1], reasoning ≤ 1024 chars with control-char sanitization); migration 032 reserved as PR-3 placeholder (per critic #9). **PR-4**: connector activity tracking — migration 031 adds `current_activity_json/at` snapshot column on `local_connectors`; ActivityReporter on connector emits phase transitions via **enqueue-not-overwrite** queue (per critic #5 — phase transitions in rapid succession all reach subscriber; same-phase step changes coalesce in 500ms window); phase enum is `idle/claiming_run/planning/claiming_task/dispatching/submitting` (no `routing` — that arrives in 6d with auto-apply); `POST /api/connector/activity` lightweight ingest endpoint; in-memory Hub broadcasts to SSE subscribers via unbuffered channels (slow clients auto-dropped); `GET /api/connectors/:id/activity-stream` SSE with 30s keepalive + `X-Accel-Buffering: no` header (C1: SSE retained vs polling-only); `GET /api/connectors/:id/activity` polling fallback; project-level aggregate `GET /api/projects/:id/active-connectors`; frontend `useConnectorActivity` hook auto-degrades SSE→polling→stale; `ConnectorActivityBadge` 3 density variants. **Activity does NOT write to actor_audit** (per critic #8 — 5+ phase transitions per task would drown the human-meaningful authoring trail; only latest snapshot persists). **PR-5**: dogfood (8 deliberate-trigger steps including activity SSE observation) + `docs/operating-rules.md` "Role-dispatch + visibility model" section + DECISIONS.md final + **archival pass** moving 2026-04-22-and-older entries to `DECISIONS_ARCHIVE.md` (per critic #11 — file already past 30KB threshold).
+- **Alternatives considered**: (1) Defer authoring + router + activity to Phase 6d — rejected; the user explicitly stated dogfood today is broken without all three (see catch-22 analysis in plan v5 §1.1). (2) Single mega-PR — rejected; 15-day single PR is unreviewable; coherent slicing into 5 PRs preserves coherence while making each PR shippable in 1-5 days. (3) Apply-time-only authoring (no candidate edit + no audit) — rejected; user feedback explicitly requires comprehensive authoring (`feedback_no_simple_approach`); apply-time-only would force re-rework when Phase 6d's LLM planner pre-fills `execution_role` at candidate creation. (4) Polling-only activity (no SSE) — rejected at user pick C1; sub-second visibility is consistent with `feedback_no_simple_approach` even though 3s polling would technically suffice. (5) Sync `role_dispatch_auto` in PR-3 — **rejected at user pick B2 (critic round 3 #2)**; without dogfood data on router quality, auto-apply is premature optimization; PR-3 ships suggest-only and PR-6 (or 6d) lands auto-apply once PR-5 dogfood validates router accuracy. (6) Async `role_dispatch_auto` + webhook in 6c — deferred to 6d per user §5 Q3 answer (depends on auto-apply existing first). (7) Skip Role.Category field — rejected; without `category="meta"` filter the dispatcher prompt would surface in `/api/roles` and self-recommend, breaking the routing semantics. (8) `actor_audit` as candidate-specific table — rejected; designed generic from start (subject_kind discriminator) so PR-3 router-actor rows (when 6d auto-apply lands) and PR-4 system-actor rows reuse the same infrastructure. (9) Denormalised `execution_role_set_by/_at/_confidence` columns on `backlog_candidates` — **rejected at critic round 3 #1**; `actor_audit` is the single source of truth and frontend reads via JOIN helper to avoid drift between two writers. (10) Activity history written to `actor_audit` — **rejected at critic round 3 #8**; ~5 phase transitions per task dispatch would drown the human-meaningful authoring trail; activity only persists as latest snapshot, dedicated time-series store deferred to 6d if needed. (11) `dispatcher.md` placed at `prompts/dispatcher.md` siblng to `backlog.md` and `whatsnext.md` — **rejected at critic round 3 #10**; meta-prompts deserve their own subtree (`prompts/meta/`) for IA clarity and future expansion.
+- **Constraints introduced**: **(a) Role catalog**: `backend/internal/roles/catalog.go` is hand-maintained `[]Role{...}` (no codegen); `Role.Category` ∈ {"role", "meta"}; `roles.All()` returns full set; `/api/roles` filters category="role"; `TestCatalogMatchesPromptDir` walks both `prompts/roles/*.md` and `prompts/dispatcher.md`; PR adding a new role MUST edit both files. **(b) execution_role lifecycle**: writes go through `BacklogCandidateStore.UpdateExecutionRole(ctx, id, role, actor)` which does single-transaction validate → SELECT old → UPDATE → INSERT actor_audit; concurrent PATCH/apply use `BEGIN IMMEDIATE` (existing SQLite pattern). `set_by` ∈ {"", "operator", "router"}; `confidence` only set when `set_by="router"`. **(c) Apply API**: `mode=role_dispatch` requires non-empty `execution_role` in catalog → else 400; `mode=role_dispatch_auto` calls dispatcher synchronously (6c) and returns 422 with router_decision payload when `confidence < min_confidence` or `role_id="no_match"`; `mode=manual` ignores `execution_role`. **(d) Server-side claim enforcement**: `MarkTaskRoleNotFound` does `dispatch_status: queued → failed` atomic transition (NOT `running → failed`) plus single-tx audit row; non-applicable when task already leased. **(e) L0 safety**: per-role timeouts in catalog (code-reviewer=15min, test-writer=20min, api-contract-writer=30min, ui-scaffolder=45min, db-schema-designer=45min, backend-architect=90min, dispatcher=60min); `ANPM_DISPATCH_TIMEOUT` env override hierarchy: env>0 → env value, env=0 → disabled, env<0/unset → catalog → 30min fallback. SIGTERM→5s→SIGKILL escalation via `cmd.Cancel`+`cmd.WaitDelay`; adversarial test uses real subprocess with `signal.Ignore`. Output cap default 5 MB via `ANPM_DISPATCH_OUTPUT_MAX`; 0=disabled. JSON schema minimum: must contain `files []`; optional fields type-checked. **(f) Router**: dispatcher prompt is in `prompts/dispatcher.md` (NOT under `roles/`) with `category: meta`; output validated for role_id ∈ catalog, confidence ∈ [0,1], reasoning length and control-char sanitization; alternatives all validated against catalog; `min_confidence` default 0.7 (operator-overridable per apply); router output never trusted to be a valid catalog entry — Validation is a hard gate not a soft check. **(g) Activity model**: phases are exhaustive enum (idle/claiming_run/planning/claiming_task/routing/dispatching/submitting); reporter coalesces step-only changes within 500ms but always sends phase transitions; in-memory Hub uses unbuffered subscriber channels with non-blocking send (slow client auto-drops, reconnect picks up via initial state); SSE includes 30s keepalive comments + `X-Accel-Buffering: no`; per-user concurrent SSE connections capped at 3 (503 above); DB persists latest snapshot only (history lives in actor_audit); idle activities retained 5 min before purge. **(h) Audit invariants**: every `execution_role` change writes to `actor_audit` in same transaction; `actor_kind` ∈ {"user", "router", "system", "connector"}; `rationale` stores router confidence + reasoning, or system change reason; cascade-delete with subject row. **(i) Operational constraint**: L0 boundary is the ONLY safety enforcement until L1 ships — operators MUST NOT expose role_dispatch to non-operator task submitters or untrusted task content; this is documented verbatim in `docs/operating-rules.md` and is non-negotiable. L1 (process-level jail via firejail/namespaces) is evaluated when Phase 6d opens; L2 (container/VM full isolation) requires one of three triggers fired: multi-tenant submitters, untrusted external repos, or compliance requirements. **(j) Phase 6d/7 trigger conditions**: recorded in `docs/phase6c-plan.md` §9; opening either phase without a documented trigger having fired is a scope-creep violation. **(k) PR ordering**: PR-1 first (already implemented); PR-2/3/4 sequential to avoid rebase cost (each later PR consumes earlier-PR types); PR-5 last (dogfood requires all four prior PRs).


This DECISIONS entry has a couple of internal inconsistencies with the plan text just above: it says the drift test walks prompts/dispatcher.md (but elsewhere in the same entry and the plan it’s prompts/meta/dispatcher.md), and it lists the activity phase enum as including routing even though PR-4 is described as having no routing phase in 6c. Suggest correcting these paths/enums here so DECISIONS remains an accurate source of truth.

Suggested change

- **Constraints introduced**: **(a) Role catalog**: `backend/internal/roles/catalog.go` is hand-maintained `[]Role{...}` (no codegen); `Role.Category` ∈ {"role", "meta"}; `roles.All()` returns full set; `/api/roles` filters category="role"; `TestCatalogMatchesPromptDir` walks both `prompts/roles/*.md` and `prompts/dispatcher.md`; PR adding a new role MUST edit both files. **(b) execution_role lifecycle**: writes go through `BacklogCandidateStore.UpdateExecutionRole(ctx, id, role, actor)` which does single-transaction validate → SELECT old → UPDATE → INSERT actor_audit; concurrent PATCH/apply use `BEGIN IMMEDIATE` (existing SQLite pattern). `set_by` ∈ {"", "operator", "router"}; `confidence` only set when `set_by="router"`. **(c) Apply API**: `mode=role_dispatch` requires non-empty `execution_role` in catalog → else 400; `mode=role_dispatch_auto` calls dispatcher synchronously (6c) and returns 422 with router_decision payload when `confidence < min_confidence` or `role_id="no_match"`; `mode=manual` ignores `execution_role`. **(d) Server-side claim enforcement**: `MarkTaskRoleNotFound` does `dispatch_status: queued → failed` atomic transition (NOT `running → failed`) plus single-tx audit row; non-applicable when task already leased. **(e) L0 safety**: per-role timeouts in catalog (code-reviewer=15min, test-writer=20min, api-contract-writer=30min, ui-scaffolder=45min, db-schema-designer=45min, backend-architect=90min, dispatcher=60min); `ANPM_DISPATCH_TIMEOUT` env override hierarchy: env>0 → env value, env=0 → disabled, env<0/unset → catalog → 30min fallback. SIGTERM→5s→SIGKILL escalation via `cmd.Cancel`+`cmd.WaitDelay`; adversarial test uses real subprocess with `signal.Ignore`. Output cap default 5 MB via `ANPM_DISPATCH_OUTPUT_MAX`; 0=disabled. JSON schema minimum: must contain `files []`; optional fields type-checked. **(f) Router**: dispatcher prompt is in `prompts/dispatcher.md` (NOT under `roles/`) with `category: meta`; output validated for role_id ∈ catalog, confidence ∈ [0,1], reasoning length and control-char sanitization; alternatives all validated against catalog; `min_confidence` default 0.7 (operator-overridable per apply); router output never trusted to be a valid catalog entry — Validation is a hard gate not a soft check. **(g) Activity model**: phases are exhaustive enum (idle/claiming_run/planning/claiming_task/routing/dispatching/submitting); reporter coalesces step-only changes within 500ms but always sends phase transitions; in-memory Hub uses unbuffered subscriber channels with non-blocking send (slow client auto-drops, reconnect picks up via initial state); SSE includes 30s keepalive comments + `X-Accel-Buffering: no`; per-user concurrent SSE connections capped at 3 (503 above); DB persists latest snapshot only (history lives in actor_audit); idle activities retained 5 min before purge. **(h) Audit invariants**: every `execution_role` change writes to `actor_audit` in same transaction; `actor_kind` ∈ {"user", "router", "system", "connector"}; `rationale` stores router confidence + reasoning, or system change reason; cascade-delete with subject row. **(i) Operational constraint**: L0 boundary is the ONLY safety enforcement until L1 ships — operators MUST NOT expose role_dispatch to non-operator task submitters or untrusted task content; this is documented verbatim in `docs/operating-rules.md` and is non-negotiable. L1 (process-level jail via firejail/namespaces) is evaluated when Phase 6d opens; L2 (container/VM full isolation) requires one of three triggers fired: multi-tenant submitters, untrusted external repos, or compliance requirements. **(j) Phase 6d/7 trigger conditions**: recorded in `docs/phase6c-plan.md` §9; opening either phase without a documented trigger having fired is a scope-creep violation. **(k) PR ordering**: PR-1 first (already implemented); PR-2/3/4 sequential to avoid rebase cost (each later PR consumes earlier-PR types); PR-5 last (dogfood requires all four prior PRs).

- **Constraints introduced**: **(a) Role catalog**: `backend/internal/roles/catalog.go` is hand-maintained `[]Role{...}` (no codegen); `Role.Category` ∈ {"role", "meta"}; `roles.All()` returns full set; `/api/roles` filters category="role"; `TestCatalogMatchesPromptDir` walks both `prompts/roles/*.md` and `prompts/meta/dispatcher.md`; PR adding a new role MUST edit both files. **(b) execution_role lifecycle**: writes go through `BacklogCandidateStore.UpdateExecutionRole(ctx, id, role, actor)` which does single-transaction validate → SELECT old → UPDATE → INSERT actor_audit; concurrent PATCH/apply use `BEGIN IMMEDIATE` (existing SQLite pattern). `set_by` ∈ {"", "operator", "router"}; `confidence` only set when `set_by="router"`. **(c) Apply API**: `mode=role_dispatch` requires non-empty `execution_role` in catalog → else 400; `mode=role_dispatch_auto` calls dispatcher synchronously (6c) and returns 422 with router_decision payload when `confidence < min_confidence` or `role_id="no_match"`; `mode=manual` ignores `execution_role`. **(d) Server-side claim enforcement**: `MarkTaskRoleNotFound` does `dispatch_status: queued → failed` atomic transition (NOT `running → failed`) plus single-tx audit row; non-applicable when task already leased. **(e) L0 safety**: per-role timeouts in catalog (code-reviewer=15min, test-writer=20min, api-contract-writer=30min, ui-scaffolder=45min, db-schema-designer=45min, backend-architect=90min, dispatcher=60min); `ANPM_DISPATCH_TIMEOUT` env override hierarchy: env>0 → env value, env=0 → disabled, env<0/unset → catalog → 30min fallback. SIGTERM→5s→SIGKILL escalation via `cmd.Cancel`+`cmd.WaitDelay`; adversarial test uses real subprocess with `signal.Ignore`. Output cap default 5 MB via `ANPM_DISPATCH_OUTPUT_MAX`; 0=disabled. JSON schema minimum: must contain `files []`; optional fields type-checked. **(f) Router**: dispatcher prompt is in `prompts/meta/dispatcher.md` (NOT under `roles/`) with `category: meta`; output validated for role_id ∈ catalog, confidence ∈ [0,1], reasoning length and control-char sanitization; alternatives all validated against catalog; `min_confidence` default 0.7 (operator-overridable per apply); router output never trusted to be a valid catalog entry — Validation is a hard gate not a soft check. **(g) Activity model**: phases are exhaustive enum (idle/claiming_run/planning/claiming_task/dispatching/submitting); reporter coalesces step-only changes within 500ms but always sends phase transitions; in-memory Hub uses unbuffered subscriber channels with non-blocking send (slow client auto-drops, reconnect picks up via initial state); SSE includes 30s keepalive comments + `X-Accel-Buffering: no`; per-user concurrent SSE connections capped at 3 (503 above); DB persists latest snapshot only (history lives in actor_audit); idle activities retained 5 min before purge. **(h) Audit invariants**: every `execution_role` change writes to `actor_audit` in same transaction; `actor_kind` ∈ {"user", "router", "system", "connector"}; `rationale` stores router confidence + reasoning, or system change reason; cascade-delete with subject row. **(i) Operational constraint**: L0 boundary is the ONLY safety enforcement until L1 ships — operators MUST NOT expose role_dispatch to non-operator task submitters or untrusted task content; this is documented verbatim in `docs/operating-rules.md` and is non-negotiable. L1 (process-level jail via firejail/namespaces) is evaluated when Phase 6d opens; L2 (container/VM full isolation) requires one of three triggers fired: multi-tenant submitters, untrusted external repos, or compliance requirements. **(j) Phase 6d/7 trigger conditions**: recorded in `docs/phase6c-plan.md` §9; opening either phase without a documented trigger having fired is a scope-creep violation. **(k) PR ordering**: PR-1 first (already implemented); PR-2/3/4 sequential to avoid rebase cost (each later PR consumes earlier-PR types); PR-5 last (dogfood requires all four prior PRs).

- service.go: normalize \r\n / \n / \r in JSON parse error snippet so task error messages render cleanly in stderr logs and result panel (matches existing builtin_adapter.go pattern) - dispatch_safety_test.go: rewrite TestInvokeBuiltinCLI_TimeoutWithTruncationPrefersTimeout to actually trigger BOTH conditions. Previous version used the ignore_sigterm_sleep_forever helper which prints nothing, so truncated stayed false and the test was theatrical. New ignore_sigterm_print_loop helper traps SIGTERM and writes continuously, tripping the bounded writer (1 KB cap) before SIGKILL escalation lands. Now asserts truncated=true AND timeout error together, then verifies the dispatch classifier picks dispatch_timeout (the precedence rule under test). - dispatch_safety_test.go: drop unused `io` import and the `var _ io.Writer = io.Discard` compile-time guard at end of file. The guard added noise without protecting anything real — gofmt/go test catch unused imports natively. - DECISIONS.md: fix three internal inconsistencies that contradicted the plan v5.1 above: (a) drift test path was `prompts/dispatcher.md`, corrected to `prompts/meta/*.md` (matches plan §3.3.2 and constraint (f)) (c) Apply API claimed `mode=role_dispatch_auto` exists in 6c; removed and replaced with explicit deferral-to-6d note (matches user pick B2 and Alternatives §5) (f) Router section had stale `prompts/dispatcher.md`, corrected to `prompts/meta/dispatcher.md` (g) Activity phase enum incorrectly listed `routing`; removed (matches plan §3.4.2 and §2.3 item 14 — `routing` is 6d) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings April 25, 2026 13:53

Copilot started reviewing on behalf of screenleon April 25, 2026 13:53 View session

Copilot AI reviewed Apr 25, 2026

View reviewed changes

screenleon merged commit c99248a into main Apr 25, 2026
4 checks passed

screenleon deleted the phase6c-c2-safety-boundary branch April 25, 2026 14:46

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(phase6c-pr1): role catalog skeleton + L0 safety boundary#26

feat(phase6c-pr1): role catalog skeleton + L0 safety boundary#26
screenleon merged 2 commits into
mainfrom
phase6c-c2-safety-boundary

screenleon commented Apr 25, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 25, 2026

Uh oh!

Copilot AI Apr 25, 2026

Uh oh!

Copilot AI Apr 25, 2026

Uh oh!

Copilot AI Apr 25, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

screenleon commented Apr 25, 2026

Summary

What this PR ships

Test plan

Plan v5.1 context

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants