feat(phase6c-pr1): role catalog skeleton + L0 safety boundary#26
Conversation
Phase 6c PR-1 — implements the role catalog data structure and the
L0 safety boundary around connector subprocess invocation. Closes
two of the four gaps Phase 6c addresses; remaining PRs ship per the
v5.1 plan in docs/phase6c-plan.md.
## What's in this PR
- backend/internal/roles/ — hand-maintained Role catalog (6 entries)
with per-role DefaultTimeoutSec (code-reviewer 15min through
backend-architect 90min). TimeoutFor resolves env override →
catalog → 30min fallback. ANPM_DISPATCH_TIMEOUT=0 disables.
TestCatalogMatchesPromptDir is the SoT-drift detector.
- backend/internal/connector/dispatch_safety.go —
- boundedWriter (atomic.Int64 + atomic.Bool, max=0 disables)
- applyDispatchKillEscalation: cmd.Cancel = SIGTERM, cmd.WaitDelay
= 5s before SIGKILL (per Go 1.20+ contract; bare context.Cancel
does not guarantee subprocess termination on SIGTERM-trapping
CLIs)
- validateExecutionResult: enforces files [] required, optional
test_instructions/risks/followups type-checked
- invokeBuiltinCLI signature changes from (string, string) to
(string, bool, string) adding truncated flag. invokeClaudeCLI and
invokeCodexCLI both apply boundedWriter + signal escalation.
invokeCodexCLI uses io.Copy goroutine + ptmx.Close after Wait to
avoid hangs on SIGTERM-ignoring Codex (risk-reviewer H1).
- RunOnceTask uses roles.TimeoutFor(roleID) for per-role timeout;
precedence is runErr-over-truncated when both are set (so timeout
signals are not masked by output cap firing); routes failures to
4 new error_kinds: dispatch_timeout, output_too_large,
invalid_result_schema, role_not_found (the last is reserved for
PR-2 server-side claim-next-task enforcement).
- Adversarial test matrix: 14 dispatch tests + 9 catalog tests, all
green under -race. T-6c-C2-2 spawns a real subprocess via
os.Args[0] with signal.Ignore(SIGTERM) to prove SIGKILL escalation
fires within 1s timeout + 5s grace. TestMain uses a double-sentinel
guard (ANPM_TEST_HELPER_GUARD=1 + ANPM_TEST_HELPER_MODE) so a
developer with the mode env set in their shell does not get a
silent zero-test exit.
## Plan v5.1 + DECISIONS context
This PR also ships docs/phase6c-plan.md (5-PR scope, ~13 days total)
and a Phase 6c entry in DECISIONS.md. Subsequent PRs:
- PR-2: authoring lifecycle + actor_audit (SoT) + 4-point catalog
enforcement
- PR-3: LLM router suggest endpoint (advisory; auto-apply deferred
to Phase 6d per dogfood signal)
- PR-4: connector activity tracking via SSE
- PR-5: dogfood + DECISIONS archival
## Reviews completed
- make pre-pr: green twice (SQLite + PostgreSQL + frontend build)
- critic subagent: 2 rounds, 4 Required + 2 nice-to-have all
addressed; round-2 findings included Codex PTY io.Copy goroutine
fix, atomic.Int64 boundedWriter, SIGTERM-ignore test slack 5s
- /security-review: 0 findings ≥ confidence 8
- risk-reviewer: 2 HIGH (H1 Codex PTY hang, H2 boundedWriter race)
+ 1 MEDIUM (M1 timing fragility) all addressed; lower-severity
findings logged as risks R22-R31 in plan §6
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Implements Phase 6c PR-1 foundations by introducing a backend role catalog and adding an L0 safety boundary around connector subprocess execution (timeouts, output caps, schema checks), plus associated error kinds/remediations and planning/docs scaffolding for the broader Phase 6c slice plan.
Changes:
- Added
backend/internal/rolescatalog with per-role default timeouts and a drift test against prompt frontmatter. - Hardened connector CLI invocation with SIGTERM→SIGKILL escalation, stdout output capping, and minimal JSON result schema validation; updated call sites for new
(output, truncated, errMsg)signature. - Expanded backend error_kind allowlist/remediation catalog and added Phase 6c plan + DECISIONS entry.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| docs/phase6c-plan.md | Adds Phase 6c v5.1 multi-PR plan and design details (PR-1 through PR-5). |
| backend/internal/roles/catalog.go | Introduces canonical role catalog and timeout resolution (ANPM_DISPATCH_TIMEOUT override + fallback). |
| backend/internal/roles/catalog_test.go | Adds SoT drift detection vs prompts/roles/*.md and unit tests for catalog behaviors. |
| backend/internal/models/requirement.go | Adds dispatch safety boundary error_kinds to allowlist + remediations. |
| backend/internal/connector/service.go | Uses role-based timeout, handles output truncation, and validates execution result schema for task dispatch. |
| backend/internal/connector/probe.go | Updates probe path to new invokeBuiltinCLI return signature. |
| backend/internal/connector/dispatch_safety.go | Adds bounded stdout writer, output-max env parsing, SIGTERM→SIGKILL escalation wiring, and schema validator. |
| backend/internal/connector/builtin_adapter.go | Updates builtin CLI execution to enforce safety boundary + new signature and truncation handling. |
| backend/internal/connector/dispatch_safety_test.go | Adds unit + subprocess tests for output cap, timeout escalation, schema validation, and env parsing. |
| DECISIONS.md | Records Phase 6c v5.1 design decisions and PR slicing rationale. |
| if err := s.Client.SubmitTaskResult(ctx, task.ID, SubmitTaskResultRequest{ | ||
| Success: false, | ||
| ErrorMessage: errMsg, | ||
| ErrorKind: "unknown", | ||
| ErrorKind: models.ErrorKindInvalidResultSchema, | ||
| }); err != nil { |
There was a problem hiding this comment.
The JSON-extraction failure path includes a raw output snippet in the error message, but it doesn't normalize newlines. In builtin_adapter.go the analogous code replaces \n with spaces before embedding the snippet; without that, task error messages/log lines can become multi-line and harder to read (and may render poorly in the UI). Consider applying the same newline (and possibly \r) normalization here before building errMsg.
| func TestInvokeBuiltinCLI_TimeoutWithTruncationPrefersTimeout(t *testing.T) { | ||
| // Critic finding #4: when the CLI fills the output cap AND is | ||
| // killed by the timeout, the runErrMsg (timeout) carries more | ||
| // useful diagnostic info than truncation alone. The dispatch | ||
| // caller in service.go must prefer runErrMsg over truncated when | ||
| // both are set; this test pins the contract at invokeBuiltinCLI | ||
| // so any future refactor that swaps the precedence breaks here. | ||
| if testing.Short() { | ||
| t.Skip("subprocess test skipped in -short mode") | ||
| } | ||
| // Use a tight 1-byte cap so even minimal output trips truncation. | ||
| // The helper sleeps forever ignoring SIGTERM, so timeout fires | ||
| // after 1s+5s = 6s. The 1-byte cap is irrelevant to whether the | ||
| // CLI is truncated (it doesn't print anything before being killed), | ||
| // but if a future implementation accidentally signals truncation | ||
| // preemptively, this combination would catch it. | ||
| t.Setenv("ANPM_TEST_HELPER_GUARD", "1") | ||
| t.Setenv("ANPM_TEST_HELPER_MODE", "ignore_sigterm_sleep_forever") | ||
| t.Setenv("ANPM_DISPATCH_OUTPUT_MAX", "1") | ||
| _, _, errMsg := invokeBuiltinCLI(context.Background(), "claude", os.Args[0], "", "x", 1) | ||
| if errMsg == "" { | ||
| t.Fatal("expected runErr (timeout); got empty") | ||
| } | ||
| if !strings.Contains(strings.ToLower(errMsg), "timed out") { | ||
| t.Errorf("errMsg = %q, want substring 'timed out'", errMsg) | ||
| } | ||
| // Verify dispatch classifier picks dispatch_timeout (the Phase | ||
| // 6c-specific kind), not adapter_timeout. | ||
| if got := classifyDispatchRunError(errMsg); got != "dispatch_timeout" { | ||
| t.Errorf("classifyDispatchRunError = %q, want dispatch_timeout", got) | ||
| } |
There was a problem hiding this comment.
TestInvokeBuiltinCLI_TimeoutWithTruncationPrefersTimeout doesn't currently exercise the scenario it describes: the helper mode used (ignore_sigterm_sleep_forever) prints no stdout, so truncated should remain false and the test never verifies the "timeout + truncation" precedence. This makes the test name/comment misleading and leaves the intended edge case untested. Suggest adding a helper mode that continuously writes output while ignoring SIGTERM (so both timeout and output cap trigger), then assert truncated==true and errMsg is the timeout string (and/or add a focused test on the caller precedence in service.go).
| // Compile-time guard: `io.Discard` is used elsewhere; keep an unused | ||
| // reference here to avoid false-positive lint complaints if the imports | ||
| // shift around during refactors. | ||
| var _ io.Writer = io.Discard |
There was a problem hiding this comment.
The io import is only kept alive via the var _ io.Writer = io.Discard guard at the end of the file. This pattern adds noise and makes it harder to see real test logic; it's simpler to just remove the unused import (and this guard) and let gofmt/go test enforce import correctness.
| - **Context**: Phase 6b shipped role-dispatch end-to-end in code but the user-facing path is broken in three independent ways: (1) `execution_role` has no UI authoring surface, so the role_dispatch radio is permanently disabled (catch-22); (2) "auto-dispatch" is in name only — operators must still manually pick a role for every task; (3) connector activity during long task execution (backend-architect can run 90 min) is invisible to the frontend, making dogfood debugging impossible. Phase 5 §(g)/§(h) also flagged subprocess sandboxing as a Phase 6 blocker. The user explicitly rejected simple / phase-staged solutions: "一律不考慮簡單作法 我希望是完善的改動 而不是臨時性處理" (memory: `feedback_no_simple_approach`). Closing all four gaps in one phase is therefore in scope, even at ~15-day total cost. | ||
| - **Decision**: Phase 6c ships as **5 separate PRs**, each independently reviewable and rollback-able, but coherent as a single capability slice. **PR-1 (done)**: role catalog skeleton (hand-maintained `[]Role{...}` with per-role `DefaultTimeoutSec`) + L0 safety boundary in connector subprocess invocation (wall-clock timeout via `cmd.Cancel = SIGTERM` + `cmd.WaitDelay = 5s` escalation, output size cap via `boundedWriter`, JSON schema minimum validation, 4 new error kinds: `role_not_found`/`dispatch_timeout`/`output_too_large`/`invalid_result_schema`). **PR-2**: full authoring lifecycle — migration 030 adds generic `actor_audit` table (subject_kind/subject_id/field/old_value/new_value/actor_kind/actor_id/rationale/confidence) which is the **single source of truth** for execution_role authoring metadata (no denormalised columns on `backlog_candidates` per critic round 3 #1; frontend reads via helper `LatestAuthoring(subject_kind, subject_id, field)` joining against latest audit row); PATCH endpoint accepts `execution_role`; apply API extended with `execution_role` payload; catalog enforcement at four entry points (PATCH, apply, claim-next-task; suggest gates at validation in PR-3); CandidateReviewPanel rewritten so role_dispatch radio is always enabled with inline `<select>`; CandidateRoleEditor inline popover on candidate cards; stale-role warning when previously-suggested role no longer in catalog. **PR-3 (suggest-only, B2)**: LLM router as advisory meta-agent — new `backend/internal/prompts/meta/dispatcher.md` (category=meta, version 1, default timeout 60s; lives under `meta/` subtree per critic #10, drift test walks both `roles/` and `meta/`); `dispatcher.Service.Suggest` reuses PR-1's `invokeBuiltinCLI` for safety; `POST /api/backlog-candidates/:id/suggest-role` returns RouterResult without persisting; **Apply API NOT extended with `role_dispatch_auto` mode** — operator confirms suggested role manually then applies via `mode=role_dispatch` (audit row `actor_kind="operator"` even when suggestion came from router; this avoids premature auto-apply before router quality is dogfood-validated, per user pick B2); 1 new error kind (`router_no_match`); router output validated against catalog (role_id must be known or "no_match", confidence ∈ [0,1], reasoning ≤ 1024 chars with control-char sanitization); migration 032 reserved as PR-3 placeholder (per critic #9). **PR-4**: connector activity tracking — migration 031 adds `current_activity_json/at` snapshot column on `local_connectors`; ActivityReporter on connector emits phase transitions via **enqueue-not-overwrite** queue (per critic #5 — phase transitions in rapid succession all reach subscriber; same-phase step changes coalesce in 500ms window); phase enum is `idle/claiming_run/planning/claiming_task/dispatching/submitting` (no `routing` — that arrives in 6d with auto-apply); `POST /api/connector/activity` lightweight ingest endpoint; in-memory Hub broadcasts to SSE subscribers via unbuffered channels (slow clients auto-dropped); `GET /api/connectors/:id/activity-stream` SSE with 30s keepalive + `X-Accel-Buffering: no` header (C1: SSE retained vs polling-only); `GET /api/connectors/:id/activity` polling fallback; project-level aggregate `GET /api/projects/:id/active-connectors`; frontend `useConnectorActivity` hook auto-degrades SSE→polling→stale; `ConnectorActivityBadge` 3 density variants. **Activity does NOT write to actor_audit** (per critic #8 — 5+ phase transitions per task would drown the human-meaningful authoring trail; only latest snapshot persists). **PR-5**: dogfood (8 deliberate-trigger steps including activity SSE observation) + `docs/operating-rules.md` "Role-dispatch + visibility model" section + DECISIONS.md final + **archival pass** moving 2026-04-22-and-older entries to `DECISIONS_ARCHIVE.md` (per critic #11 — file already past 30KB threshold). | ||
| - **Alternatives considered**: (1) Defer authoring + router + activity to Phase 6d — rejected; the user explicitly stated dogfood today is broken without all three (see catch-22 analysis in plan v5 §1.1). (2) Single mega-PR — rejected; 15-day single PR is unreviewable; coherent slicing into 5 PRs preserves coherence while making each PR shippable in 1-5 days. (3) Apply-time-only authoring (no candidate edit + no audit) — rejected; user feedback explicitly requires comprehensive authoring (`feedback_no_simple_approach`); apply-time-only would force re-rework when Phase 6d's LLM planner pre-fills `execution_role` at candidate creation. (4) Polling-only activity (no SSE) — rejected at user pick C1; sub-second visibility is consistent with `feedback_no_simple_approach` even though 3s polling would technically suffice. (5) Sync `role_dispatch_auto` in PR-3 — **rejected at user pick B2 (critic round 3 #2)**; without dogfood data on router quality, auto-apply is premature optimization; PR-3 ships suggest-only and PR-6 (or 6d) lands auto-apply once PR-5 dogfood validates router accuracy. (6) Async `role_dispatch_auto` + webhook in 6c — deferred to 6d per user §5 Q3 answer (depends on auto-apply existing first). (7) Skip Role.Category field — rejected; without `category="meta"` filter the dispatcher prompt would surface in `/api/roles` and self-recommend, breaking the routing semantics. (8) `actor_audit` as candidate-specific table — rejected; designed generic from start (subject_kind discriminator) so PR-3 router-actor rows (when 6d auto-apply lands) and PR-4 system-actor rows reuse the same infrastructure. (9) Denormalised `execution_role_set_by/_at/_confidence` columns on `backlog_candidates` — **rejected at critic round 3 #1**; `actor_audit` is the single source of truth and frontend reads via JOIN helper to avoid drift between two writers. (10) Activity history written to `actor_audit` — **rejected at critic round 3 #8**; ~5 phase transitions per task dispatch would drown the human-meaningful authoring trail; activity only persists as latest snapshot, dedicated time-series store deferred to 6d if needed. (11) `dispatcher.md` placed at `prompts/dispatcher.md` siblng to `backlog.md` and `whatsnext.md` — **rejected at critic round 3 #10**; meta-prompts deserve their own subtree (`prompts/meta/`) for IA clarity and future expansion. | ||
| - **Constraints introduced**: **(a) Role catalog**: `backend/internal/roles/catalog.go` is hand-maintained `[]Role{...}` (no codegen); `Role.Category` ∈ {"role", "meta"}; `roles.All()` returns full set; `/api/roles` filters category="role"; `TestCatalogMatchesPromptDir` walks both `prompts/roles/*.md` and `prompts/dispatcher.md`; PR adding a new role MUST edit both files. **(b) execution_role lifecycle**: writes go through `BacklogCandidateStore.UpdateExecutionRole(ctx, id, role, actor)` which does single-transaction validate → SELECT old → UPDATE → INSERT actor_audit; concurrent PATCH/apply use `BEGIN IMMEDIATE` (existing SQLite pattern). `set_by` ∈ {"", "operator", "router"}; `confidence` only set when `set_by="router"`. **(c) Apply API**: `mode=role_dispatch` requires non-empty `execution_role` in catalog → else 400; `mode=role_dispatch_auto` calls dispatcher synchronously (6c) and returns 422 with router_decision payload when `confidence < min_confidence` or `role_id="no_match"`; `mode=manual` ignores `execution_role`. **(d) Server-side claim enforcement**: `MarkTaskRoleNotFound` does `dispatch_status: queued → failed` atomic transition (NOT `running → failed`) plus single-tx audit row; non-applicable when task already leased. **(e) L0 safety**: per-role timeouts in catalog (code-reviewer=15min, test-writer=20min, api-contract-writer=30min, ui-scaffolder=45min, db-schema-designer=45min, backend-architect=90min, dispatcher=60min); `ANPM_DISPATCH_TIMEOUT` env override hierarchy: env>0 → env value, env=0 → disabled, env<0/unset → catalog → 30min fallback. SIGTERM→5s→SIGKILL escalation via `cmd.Cancel`+`cmd.WaitDelay`; adversarial test uses real subprocess with `signal.Ignore`. Output cap default 5 MB via `ANPM_DISPATCH_OUTPUT_MAX`; 0=disabled. JSON schema minimum: must contain `files []`; optional fields type-checked. **(f) Router**: dispatcher prompt is in `prompts/dispatcher.md` (NOT under `roles/`) with `category: meta`; output validated for role_id ∈ catalog, confidence ∈ [0,1], reasoning length and control-char sanitization; alternatives all validated against catalog; `min_confidence` default 0.7 (operator-overridable per apply); router output never trusted to be a valid catalog entry — Validation is a hard gate not a soft check. **(g) Activity model**: phases are exhaustive enum (idle/claiming_run/planning/claiming_task/routing/dispatching/submitting); reporter coalesces step-only changes within 500ms but always sends phase transitions; in-memory Hub uses unbuffered subscriber channels with non-blocking send (slow client auto-drops, reconnect picks up via initial state); SSE includes 30s keepalive comments + `X-Accel-Buffering: no`; per-user concurrent SSE connections capped at 3 (503 above); DB persists latest snapshot only (history lives in actor_audit); idle activities retained 5 min before purge. **(h) Audit invariants**: every `execution_role` change writes to `actor_audit` in same transaction; `actor_kind` ∈ {"user", "router", "system", "connector"}; `rationale` stores router confidence + reasoning, or system change reason; cascade-delete with subject row. **(i) Operational constraint**: L0 boundary is the ONLY safety enforcement until L1 ships — operators MUST NOT expose role_dispatch to non-operator task submitters or untrusted task content; this is documented verbatim in `docs/operating-rules.md` and is non-negotiable. L1 (process-level jail via firejail/namespaces) is evaluated when Phase 6d opens; L2 (container/VM full isolation) requires one of three triggers fired: multi-tenant submitters, untrusted external repos, or compliance requirements. **(j) Phase 6d/7 trigger conditions**: recorded in `docs/phase6c-plan.md` §9; opening either phase without a documented trigger having fired is a scope-creep violation. **(k) PR ordering**: PR-1 first (already implemented); PR-2/3/4 sequential to avoid rebase cost (each later PR consumes earlier-PR types); PR-5 last (dogfood requires all four prior PRs). |
There was a problem hiding this comment.
This DECISIONS entry has a couple of internal inconsistencies with the plan text just above: it says the drift test walks prompts/dispatcher.md (but elsewhere in the same entry and the plan it’s prompts/meta/dispatcher.md), and it lists the activity phase enum as including routing even though PR-4 is described as having no routing phase in 6c. Suggest correcting these paths/enums here so DECISIONS remains an accurate source of truth.
| - **Constraints introduced**: **(a) Role catalog**: `backend/internal/roles/catalog.go` is hand-maintained `[]Role{...}` (no codegen); `Role.Category` ∈ {"role", "meta"}; `roles.All()` returns full set; `/api/roles` filters category="role"; `TestCatalogMatchesPromptDir` walks both `prompts/roles/*.md` and `prompts/dispatcher.md`; PR adding a new role MUST edit both files. **(b) execution_role lifecycle**: writes go through `BacklogCandidateStore.UpdateExecutionRole(ctx, id, role, actor)` which does single-transaction validate → SELECT old → UPDATE → INSERT actor_audit; concurrent PATCH/apply use `BEGIN IMMEDIATE` (existing SQLite pattern). `set_by` ∈ {"", "operator", "router"}; `confidence` only set when `set_by="router"`. **(c) Apply API**: `mode=role_dispatch` requires non-empty `execution_role` in catalog → else 400; `mode=role_dispatch_auto` calls dispatcher synchronously (6c) and returns 422 with router_decision payload when `confidence < min_confidence` or `role_id="no_match"`; `mode=manual` ignores `execution_role`. **(d) Server-side claim enforcement**: `MarkTaskRoleNotFound` does `dispatch_status: queued → failed` atomic transition (NOT `running → failed`) plus single-tx audit row; non-applicable when task already leased. **(e) L0 safety**: per-role timeouts in catalog (code-reviewer=15min, test-writer=20min, api-contract-writer=30min, ui-scaffolder=45min, db-schema-designer=45min, backend-architect=90min, dispatcher=60min); `ANPM_DISPATCH_TIMEOUT` env override hierarchy: env>0 → env value, env=0 → disabled, env<0/unset → catalog → 30min fallback. SIGTERM→5s→SIGKILL escalation via `cmd.Cancel`+`cmd.WaitDelay`; adversarial test uses real subprocess with `signal.Ignore`. Output cap default 5 MB via `ANPM_DISPATCH_OUTPUT_MAX`; 0=disabled. JSON schema minimum: must contain `files []`; optional fields type-checked. **(f) Router**: dispatcher prompt is in `prompts/dispatcher.md` (NOT under `roles/`) with `category: meta`; output validated for role_id ∈ catalog, confidence ∈ [0,1], reasoning length and control-char sanitization; alternatives all validated against catalog; `min_confidence` default 0.7 (operator-overridable per apply); router output never trusted to be a valid catalog entry — Validation is a hard gate not a soft check. **(g) Activity model**: phases are exhaustive enum (idle/claiming_run/planning/claiming_task/routing/dispatching/submitting); reporter coalesces step-only changes within 500ms but always sends phase transitions; in-memory Hub uses unbuffered subscriber channels with non-blocking send (slow client auto-drops, reconnect picks up via initial state); SSE includes 30s keepalive comments + `X-Accel-Buffering: no`; per-user concurrent SSE connections capped at 3 (503 above); DB persists latest snapshot only (history lives in actor_audit); idle activities retained 5 min before purge. **(h) Audit invariants**: every `execution_role` change writes to `actor_audit` in same transaction; `actor_kind` ∈ {"user", "router", "system", "connector"}; `rationale` stores router confidence + reasoning, or system change reason; cascade-delete with subject row. **(i) Operational constraint**: L0 boundary is the ONLY safety enforcement until L1 ships — operators MUST NOT expose role_dispatch to non-operator task submitters or untrusted task content; this is documented verbatim in `docs/operating-rules.md` and is non-negotiable. L1 (process-level jail via firejail/namespaces) is evaluated when Phase 6d opens; L2 (container/VM full isolation) requires one of three triggers fired: multi-tenant submitters, untrusted external repos, or compliance requirements. **(j) Phase 6d/7 trigger conditions**: recorded in `docs/phase6c-plan.md` §9; opening either phase without a documented trigger having fired is a scope-creep violation. **(k) PR ordering**: PR-1 first (already implemented); PR-2/3/4 sequential to avoid rebase cost (each later PR consumes earlier-PR types); PR-5 last (dogfood requires all four prior PRs). | |
| - **Constraints introduced**: **(a) Role catalog**: `backend/internal/roles/catalog.go` is hand-maintained `[]Role{...}` (no codegen); `Role.Category` ∈ {"role", "meta"}; `roles.All()` returns full set; `/api/roles` filters category="role"; `TestCatalogMatchesPromptDir` walks both `prompts/roles/*.md` and `prompts/meta/dispatcher.md`; PR adding a new role MUST edit both files. **(b) execution_role lifecycle**: writes go through `BacklogCandidateStore.UpdateExecutionRole(ctx, id, role, actor)` which does single-transaction validate → SELECT old → UPDATE → INSERT actor_audit; concurrent PATCH/apply use `BEGIN IMMEDIATE` (existing SQLite pattern). `set_by` ∈ {"", "operator", "router"}; `confidence` only set when `set_by="router"`. **(c) Apply API**: `mode=role_dispatch` requires non-empty `execution_role` in catalog → else 400; `mode=role_dispatch_auto` calls dispatcher synchronously (6c) and returns 422 with router_decision payload when `confidence < min_confidence` or `role_id="no_match"`; `mode=manual` ignores `execution_role`. **(d) Server-side claim enforcement**: `MarkTaskRoleNotFound` does `dispatch_status: queued → failed` atomic transition (NOT `running → failed`) plus single-tx audit row; non-applicable when task already leased. **(e) L0 safety**: per-role timeouts in catalog (code-reviewer=15min, test-writer=20min, api-contract-writer=30min, ui-scaffolder=45min, db-schema-designer=45min, backend-architect=90min, dispatcher=60min); `ANPM_DISPATCH_TIMEOUT` env override hierarchy: env>0 → env value, env=0 → disabled, env<0/unset → catalog → 30min fallback. SIGTERM→5s→SIGKILL escalation via `cmd.Cancel`+`cmd.WaitDelay`; adversarial test uses real subprocess with `signal.Ignore`. Output cap default 5 MB via `ANPM_DISPATCH_OUTPUT_MAX`; 0=disabled. JSON schema minimum: must contain `files []`; optional fields type-checked. **(f) Router**: dispatcher prompt is in `prompts/meta/dispatcher.md` (NOT under `roles/`) with `category: meta`; output validated for role_id ∈ catalog, confidence ∈ [0,1], reasoning length and control-char sanitization; alternatives all validated against catalog; `min_confidence` default 0.7 (operator-overridable per apply); router output never trusted to be a valid catalog entry — Validation is a hard gate not a soft check. **(g) Activity model**: phases are exhaustive enum (idle/claiming_run/planning/claiming_task/dispatching/submitting); reporter coalesces step-only changes within 500ms but always sends phase transitions; in-memory Hub uses unbuffered subscriber channels with non-blocking send (slow client auto-drops, reconnect picks up via initial state); SSE includes 30s keepalive comments + `X-Accel-Buffering: no`; per-user concurrent SSE connections capped at 3 (503 above); DB persists latest snapshot only (history lives in actor_audit); idle activities retained 5 min before purge. **(h) Audit invariants**: every `execution_role` change writes to `actor_audit` in same transaction; `actor_kind` ∈ {"user", "router", "system", "connector"}; `rationale` stores router confidence + reasoning, or system change reason; cascade-delete with subject row. **(i) Operational constraint**: L0 boundary is the ONLY safety enforcement until L1 ships — operators MUST NOT expose role_dispatch to non-operator task submitters or untrusted task content; this is documented verbatim in `docs/operating-rules.md` and is non-negotiable. L1 (process-level jail via firejail/namespaces) is evaluated when Phase 6d opens; L2 (container/VM full isolation) requires one of three triggers fired: multi-tenant submitters, untrusted external repos, or compliance requirements. **(j) Phase 6d/7 trigger conditions**: recorded in `docs/phase6c-plan.md` §9; opening either phase without a documented trigger having fired is a scope-creep violation. **(k) PR ordering**: PR-1 first (already implemented); PR-2/3/4 sequential to avoid rebase cost (each later PR consumes earlier-PR types); PR-5 last (dogfood requires all four prior PRs). |
- service.go: normalize \r\n / \n / \r in JSON parse error snippet so
task error messages render cleanly in stderr logs and result panel
(matches existing builtin_adapter.go pattern)
- dispatch_safety_test.go: rewrite
TestInvokeBuiltinCLI_TimeoutWithTruncationPrefersTimeout to actually
trigger BOTH conditions. Previous version used the
ignore_sigterm_sleep_forever helper which prints nothing, so
truncated stayed false and the test was theatrical. New
ignore_sigterm_print_loop helper traps SIGTERM and writes
continuously, tripping the bounded writer (1 KB cap) before SIGKILL
escalation lands. Now asserts truncated=true AND timeout error
together, then verifies the dispatch classifier picks
dispatch_timeout (the precedence rule under test).
- dispatch_safety_test.go: drop unused `io` import and the `var _
io.Writer = io.Discard` compile-time guard at end of file. The
guard added noise without protecting anything real — gofmt/go test
catch unused imports natively.
- DECISIONS.md: fix three internal inconsistencies that contradicted
the plan v5.1 above:
(a) drift test path was `prompts/dispatcher.md`, corrected to
`prompts/meta/*.md` (matches plan §3.3.2 and constraint (f))
(c) Apply API claimed `mode=role_dispatch_auto` exists in 6c;
removed and replaced with explicit deferral-to-6d note (matches
user pick B2 and Alternatives §5)
(f) Router section had stale `prompts/dispatcher.md`, corrected to
`prompts/meta/dispatcher.md`
(g) Activity phase enum incorrectly listed `routing`; removed
(matches plan §3.4.2 and §2.3 item 14 — `routing` is 6d)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Phase 6c PR-1 — implements the role catalog data structure and the L0 safety boundary around connector subprocess invocation. This is the first of 5 PRs that comprise Phase 6c (full plan in
docs/phase6c-plan.mdv5.1).What this PR ships
Role catalog (
backend/internal/roles/): hand-maintainedRolestruct with 6 entries + per-roleDefaultTimeoutSec(code-reviewer 15min through backend-architect 90min).TimeoutForresolves env override → catalog → 30min fallback.ANPM_DISPATCH_TIMEOUT=0disables.TestCatalogMatchesPromptDiris the SoT-drift detector.L0 safety boundary (
backend/internal/connector/dispatch_safety.go):boundedWriterwithatomic.Int64+atomic.Bool(max=0disables)applyDispatchKillEscalation:cmd.Cancel = SIGTERM+cmd.WaitDelay = 5s→ SIGKILL (Go 1.20+ contract; barecontext.Canceldoes not guarantee termination on SIGTERM-trapping CLIs)validateExecutionResult: enforcesfiles []required + optional fields type-checkedCodex PTY fix:
io.Copyruns in a goroutine +ptmx.Closeaftercmd.Waitto avoid hangs on SIGTERM-ignoring Codex (risk-reviewer H1).invokeBuiltinCLIsignature change:(string, string) → (string, bool, string)addingtruncatedflag. All 3 call sites (probe, ExecuteBuiltin, RunOnceTask) updated.RunOnceTaskusesroles.TimeoutFor(roleID)for per-role timeout; precedence isrunErrovertruncatedwhen both are set (so timeout signals are not masked by output cap firing).4 new error_kinds:
dispatch_timeout,output_too_large,invalid_result_schema,role_not_found(last is reserved for PR-2 server-side enforcement).Test plan
make pre-prgreen (SQLite + PostgreSQL + frontend build) — ran twicego test -racegreen for connector + roles packages/security-review— 0 findings ≥ confidence 8Plan v5.1 context
This PR also ships
docs/phase6c-plan.mdv5.1 (5-PR scope, ~13 days total) and a Phase 6c entry inDECISIONS.md. Subsequent PRs (per v5.1):actor_audit(SoT) + 4-point catalog enforcement🤖 Generated with Claude Code