Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 8 additions & 0 deletions DECISIONS.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,14 @@ Active architectural and behavioral decisions for Agent Native PM.

When this file exceeds 50 entries or 30 KB, archive older entries to `DECISIONS_ARCHIVE.md`. The most recent archival pass was on 2026-04-22.

## 2026-04-25: Phase 6c v5.1 — L0 safety + authoring lifecycle + LLM router (suggest-only) + activity visibility [agent:feature-planner]

- **Context**: Phase 6b shipped role-dispatch end-to-end in code but the user-facing path is broken in three independent ways: (1) `execution_role` has no UI authoring surface, so the role_dispatch radio is permanently disabled (catch-22); (2) "auto-dispatch" is in name only — operators must still manually pick a role for every task; (3) connector activity during long task execution (backend-architect can run 90 min) is invisible to the frontend, making dogfood debugging impossible. Phase 5 §(g)/§(h) also flagged subprocess sandboxing as a Phase 6 blocker. The user explicitly rejected simple / phase-staged solutions: "一律不考慮簡單作法 我希望是完善的改動 而不是臨時性處理" (memory: `feedback_no_simple_approach`). Closing all four gaps in one phase is therefore in scope, even at ~15-day total cost.
- **Decision**: Phase 6c ships as **5 separate PRs**, each independently reviewable and rollback-able, but coherent as a single capability slice. **PR-1 (done)**: role catalog skeleton (hand-maintained `[]Role{...}` with per-role `DefaultTimeoutSec`) + L0 safety boundary in connector subprocess invocation (wall-clock timeout via `cmd.Cancel = SIGTERM` + `cmd.WaitDelay = 5s` escalation, output size cap via `boundedWriter`, JSON schema minimum validation, 4 new error kinds: `role_not_found`/`dispatch_timeout`/`output_too_large`/`invalid_result_schema`). **PR-2**: full authoring lifecycle — migration 030 adds generic `actor_audit` table (subject_kind/subject_id/field/old_value/new_value/actor_kind/actor_id/rationale/confidence) which is the **single source of truth** for execution_role authoring metadata (no denormalised columns on `backlog_candidates` per critic round 3 #1; frontend reads via helper `LatestAuthoring(subject_kind, subject_id, field)` joining against latest audit row); PATCH endpoint accepts `execution_role`; apply API extended with `execution_role` payload; catalog enforcement at four entry points (PATCH, apply, claim-next-task; suggest gates at validation in PR-3); CandidateReviewPanel rewritten so role_dispatch radio is always enabled with inline `<select>`; CandidateRoleEditor inline popover on candidate cards; stale-role warning when previously-suggested role no longer in catalog. **PR-3 (suggest-only, B2)**: LLM router as advisory meta-agent — new `backend/internal/prompts/meta/dispatcher.md` (category=meta, version 1, default timeout 60s; lives under `meta/` subtree per critic #10, drift test walks both `roles/` and `meta/`); `dispatcher.Service.Suggest` reuses PR-1's `invokeBuiltinCLI` for safety; `POST /api/backlog-candidates/:id/suggest-role` returns RouterResult without persisting; **Apply API NOT extended with `role_dispatch_auto` mode** — operator confirms suggested role manually then applies via `mode=role_dispatch` (audit row `actor_kind="operator"` even when suggestion came from router; this avoids premature auto-apply before router quality is dogfood-validated, per user pick B2); 1 new error kind (`router_no_match`); router output validated against catalog (role_id must be known or "no_match", confidence ∈ [0,1], reasoning ≤ 1024 chars with control-char sanitization); migration 032 reserved as PR-3 placeholder (per critic #9). **PR-4**: connector activity tracking — migration 031 adds `current_activity_json/at` snapshot column on `local_connectors`; ActivityReporter on connector emits phase transitions via **enqueue-not-overwrite** queue (per critic #5 — phase transitions in rapid succession all reach subscriber; same-phase step changes coalesce in 500ms window); phase enum is `idle/claiming_run/planning/claiming_task/dispatching/submitting` (no `routing` — that arrives in 6d with auto-apply); `POST /api/connector/activity` lightweight ingest endpoint; in-memory Hub broadcasts to SSE subscribers via unbuffered channels (slow clients auto-dropped); `GET /api/connectors/:id/activity-stream` SSE with 30s keepalive + `X-Accel-Buffering: no` header (C1: SSE retained vs polling-only); `GET /api/connectors/:id/activity` polling fallback; project-level aggregate `GET /api/projects/:id/active-connectors`; frontend `useConnectorActivity` hook auto-degrades SSE→polling→stale; `ConnectorActivityBadge` 3 density variants. **Activity does NOT write to actor_audit** (per critic #8 — 5+ phase transitions per task would drown the human-meaningful authoring trail; only latest snapshot persists). **PR-5**: dogfood (8 deliberate-trigger steps including activity SSE observation) + `docs/operating-rules.md` "Role-dispatch + visibility model" section + DECISIONS.md final + **archival pass** moving 2026-04-22-and-older entries to `DECISIONS_ARCHIVE.md` (per critic #11 — file already past 30KB threshold).
- **Alternatives considered**: (1) Defer authoring + router + activity to Phase 6d — rejected; the user explicitly stated dogfood today is broken without all three (see catch-22 analysis in plan v5 §1.1). (2) Single mega-PR — rejected; 15-day single PR is unreviewable; coherent slicing into 5 PRs preserves coherence while making each PR shippable in 1-5 days. (3) Apply-time-only authoring (no candidate edit + no audit) — rejected; user feedback explicitly requires comprehensive authoring (`feedback_no_simple_approach`); apply-time-only would force re-rework when Phase 6d's LLM planner pre-fills `execution_role` at candidate creation. (4) Polling-only activity (no SSE) — rejected at user pick C1; sub-second visibility is consistent with `feedback_no_simple_approach` even though 3s polling would technically suffice. (5) Sync `role_dispatch_auto` in PR-3 — **rejected at user pick B2 (critic round 3 #2)**; without dogfood data on router quality, auto-apply is premature optimization; PR-3 ships suggest-only and PR-6 (or 6d) lands auto-apply once PR-5 dogfood validates router accuracy. (6) Async `role_dispatch_auto` + webhook in 6c — deferred to 6d per user §5 Q3 answer (depends on auto-apply existing first). (7) Skip Role.Category field — rejected; without `category="meta"` filter the dispatcher prompt would surface in `/api/roles` and self-recommend, breaking the routing semantics. (8) `actor_audit` as candidate-specific table — rejected; designed generic from start (subject_kind discriminator) so PR-3 router-actor rows (when 6d auto-apply lands) and PR-4 system-actor rows reuse the same infrastructure. (9) Denormalised `execution_role_set_by/_at/_confidence` columns on `backlog_candidates` — **rejected at critic round 3 #1**; `actor_audit` is the single source of truth and frontend reads via JOIN helper to avoid drift between two writers. (10) Activity history written to `actor_audit` — **rejected at critic round 3 #8**; ~5 phase transitions per task dispatch would drown the human-meaningful authoring trail; activity only persists as latest snapshot, dedicated time-series store deferred to 6d if needed. (11) `dispatcher.md` placed at `prompts/dispatcher.md` siblng to `backlog.md` and `whatsnext.md` — **rejected at critic round 3 #10**; meta-prompts deserve their own subtree (`prompts/meta/`) for IA clarity and future expansion.
- **Constraints introduced**: **(a) Role catalog**: `backend/internal/roles/catalog.go` is hand-maintained `[]Role{...}` (no codegen); `Role.Category` ∈ {"role", "meta"}; `roles.All()` returns full set; `/api/roles` filters category="role"; `TestCatalogMatchesPromptDir` walks both `prompts/roles/*.md` and `prompts/meta/*.md`; PR adding a new role MUST edit both files. **(b) execution_role lifecycle**: writes go through `BacklogCandidateStore.UpdateExecutionRole(ctx, id, role, actor)` which does single-transaction validate → SELECT old → UPDATE → INSERT actor_audit; concurrent PATCH/apply use `BEGIN IMMEDIATE` (existing SQLite pattern). `set_by` ∈ {"", "operator", "router"}; `confidence` only set when `set_by="router"`. **(c) Apply API**: `mode=role_dispatch` requires non-empty `execution_role` in catalog → else 400; `mode=manual` ignores `execution_role`. (`mode=role_dispatch_auto` is deferred to Phase 6d per user pick B2 — see Alternatives §5.) **(d) Server-side claim enforcement**: `MarkTaskRoleNotFound` does `dispatch_status: queued → failed` atomic transition (NOT `running → failed`) plus single-tx audit row; non-applicable when task already leased. **(e) L0 safety**: per-role timeouts in catalog (code-reviewer=15min, test-writer=20min, api-contract-writer=30min, ui-scaffolder=45min, db-schema-designer=45min, backend-architect=90min, dispatcher=60min); `ANPM_DISPATCH_TIMEOUT` env override hierarchy: env>0 → env value, env=0 → disabled, env<0/unset → catalog → 30min fallback. SIGTERM→5s→SIGKILL escalation via `cmd.Cancel`+`cmd.WaitDelay`; adversarial test uses real subprocess with `signal.Ignore`. Output cap default 5 MB via `ANPM_DISPATCH_OUTPUT_MAX`; 0=disabled. JSON schema minimum: must contain `files []`; optional fields type-checked. **(f) Router**: dispatcher prompt is in `prompts/meta/dispatcher.md` (NOT under `roles/`) with `category: meta`; output validated for role_id ∈ catalog, confidence ∈ [0,1], reasoning length and control-char sanitization; alternatives all validated against catalog; `min_confidence` default 0.7 (reserved for Phase 6d auto-apply path); router output never trusted to be a valid catalog entry — Validation is a hard gate not a soft check. **(g) Activity model**: phases are exhaustive enum `idle/claiming_run/planning/claiming_task/dispatching/submitting` (no `routing` in 6c — that arrives in 6d alongside auto-apply); reporter uses **enqueue-not-overwrite** so rapid phase transitions all reach subscriber, only same-phase step changes coalesce within 500ms; in-memory Hub uses unbuffered subscriber channels with non-blocking send (slow client auto-drops, reconnect picks up via initial state); SSE includes 30s keepalive comments + `X-Accel-Buffering: no`; per-user concurrent SSE connections capped at 3 (503 above); DB persists latest snapshot only (Activity does NOT write to actor_audit per critic round 3 #8); idle activities retained 5 min before purge. **(h) Audit invariants**: every `execution_role` change writes to `actor_audit` in same transaction; `actor_kind` ∈ {"user", "router", "system", "connector"}; `rationale` stores router confidence + reasoning, or system change reason; cascade-delete with subject row. **(i) Operational constraint**: L0 boundary is the ONLY safety enforcement until L1 ships — operators MUST NOT expose role_dispatch to non-operator task submitters or untrusted task content; this is documented verbatim in `docs/operating-rules.md` and is non-negotiable. L1 (process-level jail via firejail/namespaces) is evaluated when Phase 6d opens; L2 (container/VM full isolation) requires one of three triggers fired: multi-tenant submitters, untrusted external repos, or compliance requirements. **(j) Phase 6d/7 trigger conditions**: recorded in `docs/phase6c-plan.md` §9; opening either phase without a documented trigger having fired is a scope-creep violation. **(k) PR ordering**: PR-1 first (already implemented); PR-2/3/4 sequential to avoid rebase cost (each later PR consumes earlier-PR types); PR-5 last (dogfood requires all four prior PRs).
- **Source**: `docs/phase6c-plan.md` v5.1 (post-critic-round-3, B2 + C1 拍板). Backed by dogfood-generated backlog candidates `bad629dc` (catalog SoT) and `fb040ce6` (safety boundary), both `approved` status as of 2026-04-25, plus design dialogues with the user that surfaced the catch-22, the LLM router request, the activity visibility requirement, and a critic round adversarially analyzing v5 that produced 14 findings (9 unilateral fixes adopted, B2 + C1 user-decided).

## 2026-04-25: Requirement discard, analysis filtering, and connector run-status badge [agent:application-implementer]

- **Context**: Three interconnected UX improvements: (1) Requirements with no applied tasks are permanently deletable (not just archivable); (2) `source=analysis` / `source=system` requirements should not appear in the sidebar or affect the isEmpty check; (3) The connector status badge should reflect when a run is actively leased or queued.
Expand Down
88 changes: 67 additions & 21 deletions backend/internal/connector/builtin_adapter.go
Original file line number Diff line number Diff line change
Expand Up @@ -62,14 +62,26 @@ func ExecuteBuiltin(ctx context.Context, input ExecJSONInput) models.LocalConnec
}

// Run CLI.
output, runErr := invokeBuiltinCLI(ctx, agent, binary, model, prompt, timeoutSec)
output, truncated, runErr := invokeBuiltinCLI(ctx, agent, binary, model, prompt, timeoutSec)
// Precedence: when both runErr and truncation are set, runErr is
// more informative (likely a timeout that produced partial output).
// The truncation-only branch fires when the CLI exited normally
// but produced more than the cap.
if runErr != "" {
return models.LocalConnectorSubmitRunResultRequest{
Success: false,
ErrorMessage: runErr,
CliInfo: &models.CliUsageInfo{Agent: agent, Model: model, ModelSource: modelSource},
}
}
if truncated {
return models.LocalConnectorSubmitRunResultRequest{
Success: false,
ErrorMessage: fmt.Sprintf("CLI stdout exceeded the dispatch output cap (%d bytes); raise ANPM_DISPATCH_OUTPUT_MAX or set 0 to disable", dispatchOutputMaxBytes()),
ErrorKind: models.ErrorKindOutputTooLarge,
CliInfo: &models.CliUsageInfo{Agent: agent, Model: model, ModelSource: modelSource},
}
}

// Strip ANSI (Codex PTY output may contain escape codes even after drain).
output = stripANSI(output)
Expand Down Expand Up @@ -565,11 +577,27 @@ func buildScopeSnippet(req *models.Requirement) string {
return strings.Join(parts, "\n")
}

// invokeBuiltinCLI runs the CLI and returns (output, errorMessage).
// invokeBuiltinCLI runs the CLI and returns (output, truncated, errorMessage).
//
// Phase 6c L0 safety boundary:
// - timeoutSec <= 0 → no wall-clock limit (escape hatch); the caller
// resolved this from roles.TimeoutFor (env=0) or chose to disable.
// - timeoutSec > 0 → context.WithTimeout wrapped with SIGTERM →
// sigtermGracePeriod → SIGKILL escalation (applyDispatchKillEscalation).
// - stdout is wrapped in boundedWriter using dispatchOutputMaxBytes;
// when the cap is exceeded, output is truncated and the second
// return value is true (caller maps to ErrorKindOutputTooLarge).
//
// For Claude: uses exec.CommandContext with -p flag.
// For Codex: uses creack/pty because Codex checks stdin.isTTY.
func invokeBuiltinCLI(ctx context.Context, agent, binary, model, prompt string, timeoutSec int) (string, string) {
runCtx, cancel := context.WithTimeout(ctx, time.Duration(timeoutSec)*time.Second)
func invokeBuiltinCLI(ctx context.Context, agent, binary, model, prompt string, timeoutSec int) (string, bool, string) {
var runCtx context.Context
var cancel context.CancelFunc
if timeoutSec > 0 {
runCtx, cancel = context.WithTimeout(ctx, time.Duration(timeoutSec)*time.Second)
} else {
runCtx, cancel = context.WithCancel(ctx)
}
defer cancel()

switch agent {
Expand All @@ -578,26 +606,28 @@ func invokeBuiltinCLI(ctx context.Context, agent, binary, model, prompt string,
case "codex":
return invokeCodexCLI(runCtx, binary, model, prompt, timeoutSec)
default:
return "", fmt.Sprintf("unsupported agent %q (expected 'claude' or 'codex')", agent)
return "", false, fmt.Sprintf("unsupported agent %q (expected 'claude' or 'codex')", agent)
}
}

// invokeClaudeCLI runs: claude -p <prompt> [--model <model>]
func invokeClaudeCLI(ctx context.Context, binary, model, prompt string, timeoutSec int) (string, string) {
func invokeClaudeCLI(ctx context.Context, binary, model, prompt string, timeoutSec int) (string, bool, string) {
args := []string{"-p", prompt}
if model != "" {
args = append(args, "--model", model)
}
cmd := exec.CommandContext(ctx, binary, args...)
cmd.Stdin = nil // disconnected
applyDispatchKillEscalation(cmd)

var stdout, stderr bytes.Buffer
cmd.Stdout = &stdout
bw := newBoundedWriter(&stdout, dispatchOutputMaxBytes())
cmd.Stdout = bw
cmd.Stderr = &stderr

if err := cmd.Run(); err != nil {
if ctx.Err() == context.DeadlineExceeded {
return "", fmt.Sprintf("claude CLI timed out after %ds", timeoutSec)
return "", bw.Truncated(), fmt.Sprintf("claude CLI timed out after %ds", timeoutSec)
}
detail := strings.TrimSpace(stderr.String())
if detail == "" {
Expand All @@ -609,33 +639,49 @@ func invokeClaudeCLI(ctx context.Context, binary, model, prompt string, timeoutS
if len(detail) > 400 {
detail = detail[:400]
}
return "", fmt.Sprintf("claude CLI failed: %s", detail)
return stdout.String(), bw.Truncated(), fmt.Sprintf("claude CLI failed: %s", detail)
}
return stdout.String(), ""
return stdout.String(), bw.Truncated(), ""
}

// invokeCodexCLI runs codex inside a PTY because Codex checks stdin.isTTY.
func invokeCodexCLI(ctx context.Context, binary, model, prompt string, timeoutSec int) (string, string) {
//
// PTY-specific concurrency: the io.Copy that drains the PTY master is
// run on a separate goroutine so we can Wait() on the process FIRST.
// On a CLI that ignores SIGTERM, exec sends SIGKILL after WaitDelay;
// once the process dies the kernel will eventually close its end of
// the PTY which lets io.Copy return — but on some platforms that close
// is delayed. Closing ptmx explicitly after Wait returns guarantees
// the goroutine unblocks immediately, and the copyDone channel ensures
// we capture all output before reading buf.
func invokeCodexCLI(ctx context.Context, binary, model, prompt string, timeoutSec int) (string, bool, string) {
args := []string{prompt}
if model != "" {
args = append(args, "--model", model)
}
cmd := exec.CommandContext(ctx, binary, args...)
applyDispatchKillEscalation(cmd)

ptmx, err := pty.Start(cmd)
if err != nil {
return "", fmt.Sprintf("pty unavailable for codex: %v", err)
return "", false, fmt.Sprintf("pty unavailable for codex: %v", err)
}
defer func() { _ = ptmx.Close() }()

// Drain all output from the PTY master into a buffer.
var buf bytes.Buffer
_, _ = io.Copy(&buf, ptmx)

// Wait for the command to finish.
if waitErr := cmd.Wait(); waitErr != nil {
bw := newBoundedWriter(&buf, dispatchOutputMaxBytes())
copyDone := make(chan struct{})
go func() {
_, _ = io.Copy(bw, ptmx)
close(copyDone)
}()

waitErr := cmd.Wait()
_ = ptmx.Close() // unblock the copy goroutine if it has not already returned
<-copyDone // ensure all output is captured before reading buf

if waitErr != nil {
if ctx.Err() == context.DeadlineExceeded {
return "", fmt.Sprintf("codex CLI timed out after %ds", timeoutSec)
return "", bw.Truncated(), fmt.Sprintf("codex CLI timed out after %ds", timeoutSec)
}
raw := stripANSI(buf.String())
raw = strings.ReplaceAll(raw, "\r\n", "\n")
Expand All @@ -644,13 +690,13 @@ func invokeCodexCLI(ctx context.Context, binary, model, prompt string, timeoutSe
if len(tail) > 600 {
tail = tail[len(tail)-600:]
}
return "", fmt.Sprintf("codex CLI failed (%v): %s", waitErr, tail)
return raw, bw.Truncated(), fmt.Sprintf("codex CLI failed (%v): %s", waitErr, tail)
}

raw := buf.String()
raw = strings.ReplaceAll(raw, "\r\n", "\n")
raw = strings.ReplaceAll(raw, "\r", "\n")
return raw, ""
return raw, bw.Truncated(), ""
}

// stripANSI removes ANSI escape codes from s.
Expand Down
Loading
Loading