Config defaults & environment fixes (Phase 0.9, items 1-3) by mattleaverton · Pull Request #77 · danshapiro/kilroy

mattleaverton · 2026-04-01T22:05:52Z

Summary

Auto-detection fills gaps in partial configs: loadOrBuildConfig() now runs provider auto-detection for all configs, not just zero-config. A config file that omits providers no longer disables auto-detection — config-file values still take precedence.
require_clean defaults to false: Kilroy creates its own worktree, so the parent repo's cleanliness is irrelevant. Dirty repos no longer block startup. Users can still opt in with require_clean: true.
KILROY_RUN_ID injected into tool nodes: Tool nodes now receive KILROY_RUN_ID, KILROY_LOGS_ROOT, KILROY_WORKTREE_DIR, and other stage runtime env vars, matching agent/codergen nodes. Fixes the .ai/runs/$KILROY_RUN_ID/ data-passing convention.

Test plan

TestToolGraph_PartialConfigAutoDetectsProviders — partial config + env key → provider detected, run succeeds
TestToolGraph_DirtyRepoSucceedsWithDefaultConfig — dirty repo, no require_clean in config → run succeeds
TestToolGraph_RunIDInjected — tool node echoes $KILROY_RUN_ID → matches expected run ID
TestRun_FailsWhenRepoIsDirty — updated to explicitly opt in to RequireClean: true
Full engine test suite passes
Manual end-to-end: built binary, ran partial config on dirty repo, confirmed auto-detection + env injection in stdout

🤖 Generated with Claude Code

When a user provides a --config file that doesn't mention providers, auto-detection was completely skipped, causing "missing provider backend" errors even with API keys in the environment. Now auto-detection always runs and fills in missing providers — config-file values still take precedence via ApplyDetectedProviders. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
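The precedence rule described above can be sketched as a simple map merge. This is an illustration only: `fillMissingProviders` is a hypothetical stand-in, and the real signature of `ApplyDetectedProviders` is not shown in this PR. The provider names and the `cli`/`api` backend values are assumed for the example.

```go
package main

import "fmt"

// fillMissingProviders is a hypothetical stand-in for the merge step the
// commit attributes to ApplyDetectedProviders. Keys are provider names,
// values are backend names; both are assumptions for this sketch.
func fillMissingProviders(configured, detected map[string]string) map[string]string {
	merged := make(map[string]string, len(configured)+len(detected))
	// Start from whatever auto-detection found in the environment.
	for name, backend := range detected {
		merged[name] = backend
	}
	// Copy config-file values last so they override detected ones.
	for name, backend := range configured {
		merged[name] = backend
	}
	return merged
}

func main() {
	configured := map[string]string{"anthropic": "cli"}
	detected := map[string]string{"anthropic": "api", "openai": "api"}

	merged := fillMissingProviders(configured, detected)
	fmt.Println(merged["anthropic"], merged["openai"]) // prints "cli api"
}
```

A partial config that names only one provider keeps its own setting for that provider and still picks up any others found in the environment.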

Kilroy creates its own worktree for each run, so the parent repo's cleanliness is irrelevant. The old default of true caused runs to fail before starting whenever the repo had untracked or modified files. Users who need the old behavior can set require_clean: true in config. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Tool nodes (shape=parallelogram with tool_command) were missing KILROY_RUN_ID, KILROY_LOGS_ROOT, KILROY_WORKTREE_DIR, and other stage runtime env vars that agent nodes receive. This broke the .ai/runs/$KILROY_RUN_ID/ data-passing convention between tool and agent nodes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
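A minimal sketch of the kind of environment injection this commit describes, assuming a POSIX `sh` is available. The helper `stageEnv` and the example values are hypothetical; only the three variable names come from the commit message, and the engine's actual code path is not shown here.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// stageEnv (hypothetical helper) appends the run-scoped variables named in
// the commit message to a copy of the base environment. os/exec uses the
// last value for a duplicated key, so appended entries win.
func stageEnv(base []string, runID, logsRoot, worktreeDir string) []string {
	env := append([]string{}, base...)
	return append(env,
		"KILROY_RUN_ID="+runID,
		"KILROY_LOGS_ROOT="+logsRoot,
		"KILROY_WORKTREE_DIR="+worktreeDir,
	)
}

func main() {
	// A tool_command-style shell snippet that relies on the run ID.
	cmd := exec.Command("sh", "-c", `echo "run data lives in .ai/runs/$KILROY_RUN_ID/"`)
	cmd.Env = stageEnv(os.Environ(), "run-123", "/tmp/logs", "/tmp/worktree")

	out, err := cmd.CombinedOutput()
	if err != nil {
		fmt.Println("tool command failed:", err)
		return
	}
	fmt.Print(string(out))
}
```

Without the injected variable, `$KILROY_RUN_ID` expands to an empty string in the shell and the path collapses to `.ai/runs//`, which is how the data-passing convention breaks.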

The previous commit changed the unconditional assignment from true to false, but it still overwrote any value the caller passed. Remove the unconditional assignment (bool zero value is already false) and fix the test that relies on require_clean=true to explicitly opt in. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
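The difference between the two states can be illustrated with a trimmed-down struct. `RunConfig` and both functions below are hypothetical names for this sketch, not the engine's real types. The point is only that an unconditional assignment discards an explicit opt-in, while relying on Go's `bool` zero value does not.

```go
package main

import "fmt"

// RunConfig is a hypothetical, trimmed-down config struct for illustration.
type RunConfig struct {
	RequireClean bool
}

// applyDefaultsBuggy mirrors the intermediate state: the assignment runs
// unconditionally, so a caller's RequireClean: true is silently discarded.
func applyDefaultsBuggy(cfg *RunConfig) {
	cfg.RequireClean = false
}

// applyDefaultsFixed leaves RequireClean untouched. An omitted field is
// already false (the bool zero value), and an explicit true survives.
func applyDefaultsFixed(cfg *RunConfig) {
	// Intentionally no assignment to cfg.RequireClean.
}

func main() {
	optIn := RunConfig{RequireClean: true}
	applyDefaultsBuggy(&optIn)
	fmt.Println("buggy, opted in:", optIn.RequireClean) // false

	optIn = RunConfig{RequireClean: true}
	applyDefaultsFixed(&optIn)
	fmt.Println("fixed, opted in:", optIn.RequireClean) // true

	var omitted RunConfig
	applyDefaultsFixed(&omitted)
	fmt.Println("fixed, omitted:", omitted.RequireClean) // false
}
```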

Complete, portable PR review workflow that reviews any GitHub PR: setup → build/test → collect diff → code review → holistic review → report. Verified end-to-end against danshapiro/kilroy PR #77: all nodes succeeded, review-report.md produced, output contract collected, run recorded in DB. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs(plan): add Phase 0.9 first-run friction fixes, input/output contracts From first real PR review workflow run: - Phase 0.9: config/auto-detect conflict, require_clean default, missing env vars in tool nodes, CLI headless warning, error message hints, worktree file-not-found context - Phase 3.7-3.9: run input contract (--input), output contract, node data passing conventions - Future work: iteration patterns (dynamic loops over collections) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * wip: PR review workflow graph (v2) Setup/build/test tool nodes + single review agent node. Experimental workflow for automated PR triage. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * wip: fix build-test script survival across branch switch Setup script now copies build-test.sh to .ai/ before gh pr checkout changes the branch and removes workflow files. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * wip: PR review workflow v3 — investigate/decide split - Raw git checkout (no gh pr checkout) with unique branch names for parallel safety - Separate investigate (exploratory, full tool access) and decide (directive, no tools) agents - Tighter output contract: next actions instead of follow-up tasks - Setup script preserves all workflow scripts to .ai/ before branch switch Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * wip: make PR review workflow repo-agnostic - Absolute path for setup script (works from any repo's worktree) - Remove Go-specific assumptions from investigate prompt - Agent discovers build system and runs appropriate checks - Add freshell run config Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs(plan): platform reframe — layered architecture for software ops platform Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * engine: export types and create L1/L2 package structure Create agents/ (L1) and workflows/ (L2) package 
directories. Split NewDefaultRegistry into NewCoreRegistry (L0-only) + NewDefaultRegistry (backward compat). Export engine types and methods needed by external handler packages: StatusSource, FallbackStatusPath, StageStatusContract, Truncate, WarnEngine, BuildFidelityPreamble, ClassifyAPIError, etc. Add Engine accessor methods: AppendProgress, CXDBPrompt, CXDBInterviewStarted/Completed/Timeout, LastResolvedFidelity, SetDefault. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * engine: wire layered handler registration from cmd/kilroy/ Add Registry field to RunOptions so cmd/kilroy/ can pass a pre-composed handler registry. cmd/kilroy/ now creates a layered registry: L0: engine.NewCoreRegistry() (start, exit, conditional, tool, parallel, fan_in) L1: agents.AgentHandler (codergen/default) L2: workflows.HumanGateHandler, workflows.ManagerLoopHandler Type aliases in agents/ and workflows/ establish the package structure and import direction. Implementations remain in engine/ until Phases 2-3. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * engine: add test proving tool-only graph works with L0-only registry TestCoreRegistry_ToolOnlyGraph demonstrates that a graph using only tool_command nodes executes successfully with NewCoreRegistry (no L1 agent handler or L2 workflow handlers registered). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * engine: rename codergen to agent throughout Rename Go types: CodergenBackend→AgentBackend, CodergenRouter→AgentRouter, SimulatedCodergenBackend→SimulatedAgentBackend, etc. Rename handler type string: "codergen"→"agent" in registry. Rename DOT attribute: agent_mode replaces codergen_mode (with fallback). Rename files: codergen_*.go → agent_*.go. Update comments, diagnostics, test names. The CodergenHandler type name is retained in engine/ as the implementation type (aliased as agents.AgentHandler). 
It will be renamed when the implementation moves to agents/ in Phase 2.1. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * style: gofmt formatting pass after rename Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * rundb: add SQLite run database with migration runner New package internal/attractor/rundb backed by modernc.org/sqlite (pure Go, no CGO). Global DB at ~/.local/state/kilroy/runs.db. Features: - Auto-applying numbered SQL migrations on DB open - WAL mode for concurrent reads - Schema: runs (with labels, inputs, timing), node_executions (with attempts), edge_decisions, provider_selections - Write ops: InsertRun, CompleteRun, InsertNodeStart, CompleteNode, InsertEdgeDecision, InsertProviderSelection - Read ops: GetRun, LatestRun, ListRuns (filter by status/labels/graph), GetNodeExecutions - Prune ops: PruneRuns (by date, graph, labels, or orphaned logs_root) - Cascade deletes: child records deleted with parent run - 16 tests covering all operations Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * engine: wire RunDB into engine lifecycle events Add RunDBWriter interface in engine/ (no rundb import needed). Engine records to RunDB at every lifecycle point: - run start (after worktree ready) - node start/complete (each execution) - edge selection decisions - run complete (success/failure) All RunDB calls are best-effort (warn on error, never block). cmd/kilroy/ opens global DB and passes via RunOptions.RunDB. Integration test proves tool graph produces correct DB entries. Also fix RunOptions propagation: RunDB, Registry, Labels now properly forwarded through bootstrapRunWithConfig overrides. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * cli: update runs list/prune/status --latest to use RunDB runs list: queries RunDB first, falls back to filesystem scan. Now shows duration column. runs prune: delegates to RunDB with filter support (before, graph, labels, orphans). 
status --latest: instant lookup via RunDB.LatestRun() instead of filesystem scan. All commands gracefully fall back to filesystem-based behavior when the RunDB is unavailable or empty. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * engine: add input contract system Structured inputs for graph runs via --input flag. Features: - LoadInputFile (YAML/JSON) and LoadInputString (inline JSON) - Graph declares required inputs via inputs="key1,key2" attribute - ValidateRequiredInputs rejects runs with missing declared inputs - Input values injected into context as input.* keys - Input values expanded in prompts as $input.key placeholders - Input values available as KILROY_INPUT_* env vars in tool_command nodes - Inputs recorded in RunDB Integration test proves tool_command nodes see KILROY_INPUT_* env vars. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * engine: add workspace abstraction Separates graph file location from execution location: - --workspace /path/to/dir sets the execution directory - --graph /path/to/graph.dot determines where prompt_file resolves - When --workspace is omitted, defaults to current behavior - GraphDir derived from --graph path for prompt_file resolution - Workspace flows through RunOptions → engine → worktree creation PrepareOptions gains GraphDir field that takes precedence over RepoPath for prompt_file resolution, enabling cross-repo workflows where graphs and scripts live separately from the target project. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * engine: add output contract Graphs declare expected output artifacts via outputs="file1,file2" attribute. 
After run completion, the engine: - Searches for declared outputs in the worktree - Copies found outputs to {logs_root}/outputs/ - Writes outputs.json manifest with found/missing status and file sizes - Emits warnings for missing declared outputs (not errors) - Records output collection in progress events Hooked into persistTerminalOutcome so outputs are collected on every run completion (success or failure). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * server: add /runs API endpoints and /pipelines backward compat Add canonical /runs endpoints per platform-reframe plan: - GET /runs: list runs from RunDB (with status/graph filters) - GET /runs/{id}/outputs: list collected output artifacts Existing /pipelines endpoints retained as backward-compat aliases. All /runs endpoints mirror /pipelines for submit, status, events, cancel, context, and questions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * cli: add --label flag, --older-than duration for prune Run lifecycle management additions: - --label KEY=VALUE flag on attractor run (stored in RunDB) - --older-than 7d duration-based prune filter (supports d/h/m units) - Labels passed through to RunDB via RunOptions Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * agents: add tmux session manager and invocation templates New packages for Layer 1 agent capabilities: agents/tmux/ — tmux session management: - Session creation with two-step pattern (shell → respawn-pane) - Input delivery with sanitization, chunking, and Enter verification - Output capture with NBSP normalization for prompt detection - Readiness/idle/exit detection via polling with busy indicators - Process tree cleanup (SIGTERM → grace → SIGKILL) - Socket isolation (kilroy-specific tmux socket) - Session environment variable storage for metadata - 11 integration tests against real tmux on isolated socket agents/templates/ — per-tool invocation templates: - Template struct with per-tool 
config (args, env, prompt prefix, busy indicators, startup dialogs, exit behavior) - Built-in templates: claude, codex, gemini, opencode - Template registry for name-based lookup Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * agents/tmux: add smoke tests for real CLI tools TestSmoke_Claude_PrintMode spawns Claude Code via tmux in --print mode, waits for exit, and captures output. Verified working: Claude returns KILROY_SMOKE_OK, session exits with status 0. Tests skip gracefully when API keys or CLI tools are unavailable. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * agents: add TmuxAgentHandler for CLI tool execution via tmux TmuxAgentHandler implements engine.Handler and orchestrates the full agent lifecycle via tmux sessions: 1. Resolve tool template from node attributes (agent_tool or llm_provider) 2. Build command and environment from template 3. Create tmux session on isolated socket 4. Handle startup dialogs (trust prompts, permission warnings) 5. Wait for completion (exit-based or idle-detection) 6. Capture output and build outcome 7. Clean up session and process tree Two agent handlers now available: - AgentHandler: existing subprocess/API backend (backward compat) - TmuxAgentHandler: tmux-based CLI sessions (new) Also adds SendKeys method to tmux.Manager for dialog interaction. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * agents: wire TmuxAgentHandler end-to-end and prove it works Wire TmuxAgentHandler into cmd/kilroy/ via --tmux flag. Add exit code detection via tmux #{pane_dead_status}. Three integration test scenarios verified against real Claude via tmux: 1. Simple agent task: Claude creates a file, tool node verifies it exists → status: success, KILROY_TMUX_TEST_PASS confirmed 2. Multi-node pipeline: Claude writes calc.sh, tool node executes it, conditional routes on result → 42 computed, routed to success exit 3. 
Failure routing: Agent succeeds, tool node intentionally fails (cat nonexistent file), conditional routes to fail exit → correct routing All three runs recorded in RunDB with timing. TmuxAgentHandler correctly: - Spawns Claude in tmux sessions on isolated socket - Passes prompt and environment variables - Captures output and exit codes - Detects failures via non-zero exit codes - Cleans up sessions and process trees Also adds 3 unit tests with fake agent scripts proving handler contract: success path, failure detection (exit code 1), and workdir file creation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs(plan): update Phase 3 with Fabro learnings and implementation notes Add event envelope canonicalization task (3.6), workflow.toml concept for packages, run retro idea, and testing emphasis notes from Phase 2 review. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * engine: add GitOps interface for version control abstraction Defines the GitOps interface that encapsulates all git operations the engine needs. When nil, the engine will operate in plain-directory mode. Added GitOps field to RunOptions and Engine struct. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * workflows: implement GitHook using gitutil GitHook implements the engine.GitOps interface, wrapping gitutil functions for worktree isolation, per-node commits, and branch management. This is the Layer 2 implementation that will replace direct gitutil calls in the engine. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * engine: replace all gitutil imports with GitOps interface The engine package now has zero direct gitutil imports. All git operations go through the GitOps interface, which is optional. When GitOps is nil, the engine operates in plain-directory mode: no worktrees, no commits, no branch management. 
Key changes: - engine.run() conditionally sets up git workspace via GitOps - checkpoint() only commits when GitOps is set - parallel_handlers use GitOps for branch workspace isolation, falling back to temp directory copy for no-git mode - resume uses GitOps for worktree recreation - config_defaults accepts GitOps parameter - cmd/kilroy/ creates GitHook and wires it through Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * cmd: auto-detect git mode and fix workspace git detection GitOps is now auto-detected: if the workspace (or cwd when no workspace is specified) is a git repo, git worktrees and commits are enabled. Otherwise, runs proceed in plain-directory mode. DefaultRunConfig now accepts an explicit repoPath parameter so --workspace correctly routes to the git repo. Tested against real binary: - Graph in git repo: worktree + commits created - Graph in plain dir: runs successfully without git - Graph with --workspace to git repo: worktree + commits created Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * engine: add AutoDetectGitOps for backward-compatible git detection Adds a pluggable AutoDetectGitOps factory function that eng.run() and bootstrapRunWithConfig call when GitOps is not explicitly set. This preserves backward compatibility: existing callers that set RepoPath to a git repo automatically get git worktree behavior. Fixes: - Branch engine now inherits GitOps from parent (parallel commits work) - TestRun_FailsWhenNotAGitRepo renamed to TestRun_SucceedsInNonGitDir (non-git dirs are now valid — the intended Phase 3.1 behavior) - cmd/kilroy/ registers AutoDetectGitOps at init time - Test TestMain registers testGitOps auto-detection Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * workflows: extract HumanGateHandler to Layer 2 HumanGateHandler in workflows/ is the real implementation of the wait.human handler, using exported CXDB event methods on Engine. 
Removed wait.human from NewCoreRegistry — it's now registered only by cmd/kilroy/ via the layered composition. Tests that build engines manually now explicitly register wait.human. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * workflows: add workflow package support A workflow package bundles a graph with scripts and prompts into a portable, self-contained directory. Supports --package flag, package materialization at .kilroy/package/ in workspace, and workflow.toml manifest with metadata, inputs, outputs, and default labels. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * workflows: add supervisor prototype with run health assessment The supervisor monitors runs by querying the RunDB and classifies health as: healthy, degraded, blocked, failed, or complete. Detection signals: - Blocked: same node failing 3+ times, or no progress for 10min - Degraded: nodes retrying - Failed: terminal run status Per-node timing visible via `kilroy attractor status --run <id>`. Tested against real multi-node graph run. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * engine: document CXDB integration boundary for future extraction CXDB is already effectively optional via nil-safe methods and the DisableCXDB flag. Full extraction (removing cxdb imports from engine/) is deferred — the nil-safe pattern provides zero overhead when disabled, and the extraction surface is large (30+ event methods, streaming, bootstrap, resume). Documents the extraction path for future work. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * engine: add canonical RunEvent envelope for all progress events All progress events now include a unique ID, UTC timestamp, run_id, and dot-notation event name. The RunEvent type provides the canonical envelope, with ToMap/FromMap bridging to the existing progress system. 
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * agents: fix codex and opencode tmux templates Codex: TUI mode doesn't exit after completion, so ExitsOnComplete was wrong. Switch to idle detection with correct prompt prefix (›) and busy indicators. Add StartupDialog to auto-dismiss the trust prompt. OpenCode: positional args are directory paths, not prompts. Use the `run` subcommand for headless execution. Set ExitsOnComplete since `opencode run` exits after completion. Found during Phase 1-3 integration verification. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add Phase 1-3 integration verification report Real end-to-end testing of 5 scenarios against the ./kilroy binary. Found and fixed 2 bugs in agent templates (codex trust prompt, opencode run subcommand). Documents what works, what's broken, and what needs follow-up. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: update verification report with codex success Codex retest with fixed template completed end-to-end: startup dialog dismissed, task completed, idle detection triggered, verify passed. All 6 scenarios now pass. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * workflows: add build-test workflow package (Phase 4.3) Pure Layer 0 workflow that detects build systems, runs build + test, and produces build-report.json. Supports Go, Cargo, npm, Make, CMake, Maven, and Gradle with optional command overrides via inputs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * workflows: add PR review workflow package (Phase 4.1) Complete, portable PR review workflow that reviews any GitHub PR: setup → build/test → collect diff → code review → holistic review → report. Verified end-to-end against danshapiro/kilroy PR #77: all nodes succeeded, review-report.md produced, output contract collected, run recorded in DB. 
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * server: add DB fallback for run detail, workflows endpoint, sort filter - GET /runs/{id} falls back to RunDB for completed runs, returns nodes, edges, provider selections, and DOT source - GET /workflows lists available workflow packages from workflows/ dir - GET /runs supports ?sort=newest|oldest|longest - GET /runs/{id}/context falls back to DB for completed runs - Add GetEdgeDecisions, GetProviderSelections, GetDotSource to RunDB Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * server: fix JSON casing, reconcile stale runs on startup - Add json tags to all RunDB read types (snake_case consistently) - Reconcile runs stuck in "running" for >2h as "interrupted" on server start - 3 zombie runs from testing correctly marked interrupted Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * server: add output download, provider recording, omitempty cleanup - GET /runs/{id}/outputs/{name} serves artifact content (md, json, txt) - Engine records provider/model selection to DB for agent nodes - Add omitempty to JSON tags so empty strings are omitted - Add GetEdgeDecisions, GetProviderSelections read methods to RunDB Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * server: add node turns endpoint, edge conditions, output download - GET /runs/{id}/nodes/{nodeId}/turns returns prompt, response, stdout, stderr, status, and timing for any node — works for both agent and tool nodes - Edge decisions now record the condition expression (e.g. "outcome=success") alongside the selection reason, via new DB migration - GET /runs/{id}/outputs/{name} serves artifact content directly Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * server: POST /runs accepts workflow packages, inputs, workspace POST /runs now supports two modes: 1. Legacy: dot_source + config_path (existing behavior) 2. 
Workflow package: { workflow: "pr-review", workspace: "/path", inputs: { pr_number: "42" }, labels: { team: "infra" }, tmux: true } Workflow name is resolved from workflows/ directory. Git integration auto-detected from workspace path. Config auto-built when not provided. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * server: fix run detail to use DB for completed runs in registry Completed runs in the in-memory registry returned sparse PipelineStatus format. Now falls through to DB for the rich response (nodes, edges, providers, DOT source) when the run is no longer actively running. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs(plan): file conventions, git diff tracking, and run observability Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * engine: add .kilroy/ file conventions for inter-node data exchange Creates .kilroy/ directory in workspace at run start with standard files: - INPUT.md: structured inputs as readable markdown - CONTEXT.md: accumulated run context, updated before each node - TASK.md: current node's task description - FEEDBACK.md: failure details written on retry - $KILROY_DATA_DIR env var points to .kilroy/ directory - .kilroy/ auto-added to .gitignore in git-managed workspaces Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * engine: add output contract enforcement at graph and node levels Graph-level: DeclaredOutputs/CollectOutputs collect files declared in graph outputs= attribute to logs_root/outputs/ after run completion. Implements CollectAndRecordOutputs that was previously a placeholder. Node-level: optional outputs= attribute on nodes, checked after execution. Missing files downgrade outcome to fail, write FEEDBACK.md, and trigger retry if retries remain. 
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * engine: add per-node git diff tracking - 003_node_diffs.sql migration for before/after SHA recording - DiffStat added to GitOps interface and gitutil - Diff/DiffFileList added to gitutil for full diff retrieval - Engine captures before SHA pre-execution, records diff after checkpoint - RunDB read methods: GetNodeDiffs (all), GetNodeDiff (single node+attempt) - RecordNodeDiff added to RunDBWriter interface Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * server: add diff API endpoint for per-node git diffs GET /runs/{id}/nodes/{nodeId}/diff returns summary from DB plus full unified diff and per-file stats from git. Supports ?attempt=N query param (defaults to latest). Returns 404 for nodes without diff data, degrades gracefully when worktree is unavailable. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * server: add file browser API for run logs and workspace GET /runs/{id}/files/{path} - browse and download from logs_root GET /runs/{id}/workspace/{path} - browse and download from worktree Directory listings return JSON with name, size, is_dir, modified_at. File downloads set content-type based on extension. Path traversal protection via filepath.Abs prefix checking. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: add DiffStat to testGitOps helper Implements the new GitOps.DiffStat method on the test helper so engine tests compile with the updated interface. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs(plan): canonical run log — single verbose activity log per run Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * engine: add RunLog type for canonical per-run activity logging Writes structured NDJSON events to {logs_root}/run.log with timestamp, level, source, node, event, message, and optional data fields. 
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * engine: wire RunLog lifecycle events at run/node/edge/checkpoint points Emits run.started, node.started, node.completed, edge.selected, context.updated, checkpoint.saved, and run.completed events to run.log alongside existing progress.ndjson calls. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * engine: stream tool stdout/stderr line-by-line through RunLog LineWriter tees command output to both the log file and RunLog events. Each newline-terminated line becomes a source=tool stdout/stderr event. Also emits exit code event when the tool process completes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * server: add GET /runs/{id}/log endpoint with filtering and SSE streaming Serves the canonical run.log with query params: ?node=, ?source=, ?event=, ?since=, ?tail= for filtered reads, and ?stream=true for live SSE tailing via polling. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * agents: add conversation log parsers for Claude, Codex, and OpenCode Each template gets a LogLocator that finds the CLI tool's conversation log, and a parser that extracts tool_call, tool_result, text, and thinking events. The tmux handler emits parsed events to RunLog after agent completion. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * engine: add git activity events to RunLog (worktree.created, commit) Emits worktree.created when the run worktree is set up, and commit events with diff stats when recordNodeDiff finds file changes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: provider recording timing and model passing to CLI tools Move rundbRecordProviderIfAgent to after executeWithRetry so provider/model attrs are populated. Pass llm_model from node attributes through to CLI tool --model flags. Record agent_tool as the backend in provider_selections. 
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * workflows: add multi-tool exercise for observability testing Exercises input/output contracts, .kilroy/ convention files, agent_tool routing across claude/codex/opencode, edge conditions, and run log events. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * agents: isolate CLI tool auth for headless operation Claude: add --bare flag to skip keychain/OAuth, rely purely on ANTHROPIC_API_KEY env var. Removes need for startup dialog handling. Codex: use exec subcommand with --full-auto --skip-git-repo-check. Write isolated auth.json under CODEX_HOME per session so codex uses the API key without touching ~/.codex/. OpenCode: add --format json --pure --dir flags. Inject provider config via OPENCODE_CONFIG_CONTENT env var for keyless config isolation. Add PrepareSession hook to Template for per-tool filesystem setup before tmux session creation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * engine: add --skip-preflight flag for tmux-managed sessions The CLI preflight probe uses the old subprocess invocation path which doesn't match tmux template auth isolation. Skip it when the caller knows the tools are configured. Also fix codex node to use o3-mini model (can't probe claude model on the openai provider). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: CLI tool model IDs and auth format from first exercise run Claude: normalize dots to dashes in model ID (claude-sonnet-4.6 → claude-sonnet-4-6) since Claude CLI uses dash format. Codex: auth.json auth_mode must be lowercase "apikey" not "ApiKey". OpenCode: model format is provider/model (anthropic/claude-sonnet-4-6), add prefix and normalize dots. Also fix SkipPreflight not propagating through RunOptions override copy. 
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: codex auth.json uses OPENAI_API_KEY field, not token

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* workflows: switch codex model to codex-mini-latest

o3-mini doesn't support codex's web_search_preview tool.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: codex template uses sandbox flag directly, gpt-5.4-nano model

--full-auto implied web_search_preview which most models reject. Use --sandbox workspace-write directly. Switch to gpt-5.4-nano which supports codex's tool set.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* agents: capture structured JSONL output for agent log parsing

Each CLI template now produces structured output (stream-json for claude, --json for codex, --format json for opencode). The handler redirects stdout to {stageDir}/agent_output.jsonl and parses it directly, with no more hunting through tool-specific log directories.

LogLocator remains as fallback for non-structured-output modes. Response text is extracted from the JSONL for response.md.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: disable codex web_search to support nano/mini models

Codex defaults web_search="cached" which sends web_search_preview tool on every request. Most small models reject it. Disable it since Kilroy agents don't need web search.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: rewrite codex and opencode log parsers for actual JSONL formats

Parsers were written speculatively. Now matched to real output:
- Codex: item.completed/started with agent_message, command_execution
- OpenCode: tool_use with nested part.state, text events

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: pr-review workflow for worktree and tmux agent execution

- Fix .git detection to use -e (file or dir) for worktree support
- Convert prompt_file to inline prompts with KILROY_STAGE_STATUS_PATH
- Add output contract declarations on agent nodes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* agents: real-time agent log streaming during execution

Add TailJSONL that watches agent_output.jsonl and emits events to RunLog as lines appear. Refactor parsers to expose per-line functions (ParseClaudeLine, ParseCodexLine, ParseOpenCodeLine) used by both the tailer and batch parsing.

The tailer starts when the tmux session is created and stops when the agent exits. Events flow through RunLog to the SSE endpoint in real time, so the UI sees agent tool calls as they happen rather than in a batch after completion.

Falls back to batch parsing when structured output isn't available.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: bump stall watchdog from agent log tailer

When structured output is redirected to a file, the tmux pane is empty and the stall watchdog sees no progress. The real-time tailer now calls TickStallWatchdog on each parsed event, keeping the watchdog alive during agent execution.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: record run failure in DB on engine error return

When eng.run() returns an error (stall watchdog, context cancellation, etc.), the run stayed as "running" in the DB forever. Now RunWithConfig records a fail status before returning the error. The existing ReconcileStaleRuns on server startup handles panics and unclean exits as a safety net.
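The per-line parser refactor described in the log-streaming commit above can be pictured with a small sketch: one function handles a single JSONL line, and batch parsing is just a loop over it, so a tailer could call the same function as lines arrive. The `event` struct, `parseLine`, and `parseBatch` are hypothetical names, and the only schema assumed is a top-level `type` field; the real Codex and OpenCode records carry more than this.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// event is a hypothetical, minimal view of one JSONL record.
// Only a top-level "type" field is assumed here.
type event struct {
	Type string `json:"type"`
}

// parseLine decodes a single JSONL line. It reports ok=false for blank,
// malformed, or untyped lines so callers can skip them.
func parseLine(line string) (event, bool) {
	line = strings.TrimSpace(line)
	if line == "" {
		return event{}, false
	}
	var ev event
	if err := json.Unmarshal([]byte(line), &ev); err != nil || ev.Type == "" {
		return event{}, false
	}
	return ev, true
}

// parseBatch applies parseLine to every line of a complete log.
// Note: bufio.Scanner's default limit is 64 KiB per line; very long
// records would need Scanner.Buffer to raise it.
func parseBatch(log string) []event {
	var out []event
	sc := bufio.NewScanner(strings.NewReader(log))
	for sc.Scan() {
		if ev, ok := parseLine(sc.Text()); ok {
			out = append(out, ev)
		}
	}
	return out
}

func main() {
	log := "{\"type\":\"item.started\"}\nnot json\n{\"type\":\"item.completed\"}\n"
	for _, ev := range parseBatch(log) {
		fmt.Println(ev.Type)
	}
}
```

Because malformed lines are skipped rather than treated as fatal, a partially written final line in a file that is still being appended to does not abort parsing.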
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* server: SQLite concurrent writes, invocation capture, prefix IDs, cancel fix

- SQLite: add busy_timeout(5000), synchronous(normal), SetMaxOpenConns(1) so concurrent detached runs don't silently fail to register in the DB
- Detach: forward --input, --workspace, --package, --tmux, --skip-preflight, --label flags to child process; resolve relative paths to absolute
- Invocation capture: record os.Args and run config in manifest.json and runs DB (new migration 004); expose in API response
- Prefix ID matching: GET /runs/{short-id} resolves to full run ID via DB prefix query and in-memory registry scan
- Cancel: fall back to PID-based SIGTERM for CLI-launched detached runs that aren't in the server's in-memory pipeline registry
- AGENTS.md: document backend:cli vs api, --tmux flag, correct run config

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* server: embed and serve dashboard UI at /ui/

Bundles the single-file Kilroy dashboard SPA (index.html + Graphviz WASM worker) into the binary via //go:embed so `kilroy attractor serve` exposes a working dashboard at http://localhost:9700/ui/ with no extra processes or CORS configuration.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* engine: loop primitive, per-iteration artifact capture, label filters

- Artifact capture: new node_execution_artifacts table (migration 005). At rundbRecordNodeComplete, ingest stage files (prompt, response, agent_output.jsonl, events.ndjson, status, stdout/stderr, tool_timing, etc.) as blobs keyed by node_execution_id. Each retry and each loop iteration gets its own DB row + captured artifacts, fixing retry history loss and enabling loop iteration history.
- handleGetNodeTurns now serves from DB first with filesystem fallback for legacy runs. Response includes source="db"|"filesystem" so the UI can tell.
- Loop primitive: new trapezium (loop.begin) and invtrapezium (loop.end) node shapes for multi-node loops, plus loop_count/loop_until_file/loop_until_file_contains/loop_max attributes on any node for single-node loops. Termination evaluated after each iteration; loop_max exceeded fails the run. Separate from existing loop_restart which only handles transient_infra failure restarts.
- Loop iterations tracked in Engine.loopIterations so each iteration gets a distinct attempt number in node_executions (currently on the loop-back target; body-node attempt numbering is a UX follow-up).
- Label filtering wired end to end: GET /runs?label=KEY=VALUE&limit=N and kilroy attractor runs list --label --status --graph --limit. Underlying DB filter already existed; just surfaced to API and CLI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* engine: per-iteration attempt numbering, script capture, loop UI

- activeLoopIteration tracks current iteration across an entire loop body so every node execution inside a multi-node loop records a distinct attempt number (previously only the jump target got incremented, body nodes all recorded attempt=1).
- captureReferencedScripts reads tool_invocation.json, tokenizes argv and command fields (handling bash -c "sh script.sh" pattern), and captures referenced script files as tool_script:<name> artifacts. The UI shows them alongside stdout/stderr in the Detail tab.
- New endpoint GET /runs/{id}/nodes/{nodeId}/attempts returns all attempt rows for a node. GET /runs/{id}/nodes/{nodeId}/turns now accepts ?attempt=N to load a specific iteration's captured artifacts.
- UI: sidebar shows iteration badges (↻1/5, ↻2/5, ...) when a node has multiple attempts. Detail tab shows "Iteration N of M" banner and passes n.attempt to the turns fetch so each iteration loads its own data. Command and captured scripts render in the Detail view.
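The loop termination rule from the loop-primitive commit above ("termination evaluated after each iteration; loop_max exceeded fails the run") might look something like the sketch below. The attribute semantics are assumptions, since the commit message only names them: `loop_count` is taken as a fixed iteration count, `loop_until_file` as "stop once this file exists", and `loop_until_file_contains` as a substring that must appear in that file. `loopAttrs`, `loopDone`, and `fileReader` are hypothetical names, and file access is injected so the sketch runs without touching disk.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// loopAttrs holds the single-node loop attributes named in the commit.
// The exact meaning of each field is an assumption of this sketch.
type loopAttrs struct {
	Count             int    // loop_count: stop after this many iterations (0 = unset)
	UntilFile         string // loop_until_file: stop once this file exists
	UntilFileContains string // loop_until_file_contains: substring required in UntilFile
	Max               int    // loop_max: hard cap; reaching it without finishing is an error
}

// fileReader returns a file's contents and whether it exists.
type fileReader func(path string) (string, bool)

var errLoopMax = errors.New("loop_max exceeded")

// loopDone is evaluated after each iteration; iteration counts completed
// iterations starting at 1. It returns an error when the cap is reached
// before any termination condition holds.
func loopDone(a loopAttrs, iteration int, read fileReader) (bool, error) {
	if a.Count > 0 && iteration >= a.Count {
		return true, nil
	}
	if a.UntilFile != "" {
		if content, ok := read(a.UntilFile); ok {
			if a.UntilFileContains == "" || strings.Contains(content, a.UntilFileContains) {
				return true, nil
			}
		}
	}
	if a.Max > 0 && iteration >= a.Max {
		return false, errLoopMax
	}
	return false, nil
}

func main() {
	files := map[string]string{} // in-memory stand-in for the workspace
	read := func(p string) (string, bool) { c, ok := files[p]; return c, ok }
	attrs := loopAttrs{UntilFile: "status.txt", UntilFileContains: "DONE", Max: 5}

	for i := 1; ; i++ {
		if i == 3 {
			files["status.txt"] = "result: DONE" // the loop body "finishes" on iteration 3
		}
		done, err := loopDone(attrs, i, read)
		fmt.Printf("iteration %d: done=%v err=%v\n", i, done, err)
		if done || err != nil {
			break
		}
	}
}
```

Checking the termination conditions before the cap means a loop that finishes on its last permitted iteration succeeds rather than failing the run.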
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* engine: concurrent split/join primitive

New process-flow-level primitive for running independent node chains in parallel in the shared workspace. Distinct from the existing parallel handler (shape=component) which is worktree-isolated and winner-takes-all for LLM code-gen branching.

- Shapes: pentagon → concurrent.split, cylinder → concurrent.join. Paired via concurrent_id attribute (defaults to the split node's ID).
- runConcurrentRegion dispatches each outgoing edge from the split as a goroutine running runBranchUntilJoin. All branches share the engine's context, DB writer, git worktree, and progress sink. Each node executes through the same rundbRecordNodeStart/executeWithRetry/CompleteNode/CaptureArtifacts sequence as the main loop.
- Fail-fast: first branch error cancels the parent context, siblings exit at their next cancellation checkpoint. Optional allow_partial=true attribute on the split disables fail-fast.
- Git commits: suppressed for non-sentinel nodes while concurrentDepth > 0. Concurrent region is treated as one atomic checkpoint unit.
- Rejects nested concurrent regions and loops inside concurrent regions as runtime errors. Graph validation rule can be added later.

Known follow-up: subprocess cancellation does not kill running child processes (sleep in a cancelled branch runs to completion). Branch goroutines see the cancelled context but the tool handler's exec doesn't propagate the kill. Separate lifecycle concern, not specific to the concurrent primitive.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* engine+validate: kill subprocess group on cancel, reject nested concurrent/loop

Subprocess cancellation:
- ToolHandler now runs commands in their own process group via setProcessGroupAttr (Setpgid=true) and sets cmd.Cancel to forceKillProcessGroup so context cancellation kills the entire process tree, not just the shell. Before this fix, a cancelled `bash -c "sleep 20"` left sleep as an orphan with the stdout pipe open, and cmd.Wait() blocked for the full 20s. Verified with a fail-fast concurrent test: total run time dropped from 20.5s to 1.1s.

Graph validation:
- lintConcurrentSplitMinBranches: concurrent_split requires ≥2 outgoing edges
- lintConcurrentSplitHasJoin: concurrent_split must have a paired concurrent_join
- lintNoNestedConcurrentRegions: concurrent regions cannot be nested
- lintNoLoopsInConcurrentRegions: loops cannot be nested in concurrent regions
- Pairs are matched by concurrent_id attribute (falling back to node ID)
- nodesBetween walks the graph from the split forward to the join to build the "inside the region" set

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* cli+workflows: quick-launch package + runs show command

Adds a minimal two-node workflow (stage + agent) for kicking off one-shot investigation runs with --input '{"prompt":...,"context_file":...}'. Three graph variants route to claude, codex, or gemini via the existing agent_tool/model_stylesheet mechanism.

Adds `kilroy attractor runs show <id-or-prefix>` with --json, --outputs, and --print <file> modes so an agentic caller can pull result.md (or any declared output) back out without poking at the logs directory by hand. runs list --json now carries worktree_dir, repo_path, run_branch, and logs_root too.

* skills: quick-launch skill + using-kilroy refresh

New quick-launch skill gives agents the exact invocation for firing a one-shot delegated run: --detach --tmux + --package + --label + --input and the follow-up runs list / runs show / runs show --print flow for checking status and pulling result.md back out. Structured after the trycycle subskill style: action-oriented steps, no theory.
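The process-group kill from the engine+validate commit above can be reproduced in isolation with the standard library. This is a Unix-only sketch under stated assumptions, not the PR's `setProcessGroupAttr` / `forceKillProcessGroup` code: `runWithGroupKill` is a hypothetical helper, and it uses `sh -c` with a context timeout standing in for engine cancellation.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"os/exec"
	"syscall"
	"time"
)

// runWithGroupKill runs script via "sh -c" in its own process group and
// kills the whole group when the timeout expires. It returns the elapsed
// wall time in milliseconds and the error from Run.
// Hypothetical helper; Unix-only because of Setpgid and syscall.Kill.
func runWithGroupKill(timeoutMillis int, script string) (int64, error) {
	ctx, cancel := context.WithTimeout(context.Background(),
		time.Duration(timeoutMillis)*time.Millisecond)
	defer cancel()

	cmd := exec.CommandContext(ctx, "sh", "-c", script)

	// A non-file writer makes Run wait until every holder of the stdout
	// pipe has exited, which is what let an orphaned child stall Wait.
	var out bytes.Buffer
	cmd.Stdout = &out

	// Put the shell (and therefore its children) in a new process group.
	cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}

	// On cancellation, signal the group (negative PID) instead of only
	// the shell process. Cancel runs only after the process has started.
	cmd.Cancel = func() error {
		return syscall.Kill(-cmd.Process.Pid, syscall.SIGKILL)
	}

	start := time.Now()
	err := cmd.Run()
	return time.Since(start).Milliseconds(), err
}

func main() {
	// The trailing echo forces sh to fork sleep as a child process.
	ms, err := runWithGroupKill(200, "sleep 20; echo done")
	fmt.Printf("elapsed=%dms err=%v\n", ms, err)
}
```

Without the `Setpgid` and `Cancel` lines, the default cancellation kills only `sh`, and `Run` can keep waiting on the pipe held open by `sleep`, which matches the 20s stall the commit describes.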
using-kilroy was missing several current flags (--package, --tmux, --label, --input, --workspace, --skip-cli-headless-warning) and had no coverage of the runs subcommand family, so those gaps are filled in alongside a pointer to quick-launch for the one-shot case.

* skills/quick-launch: slash command + stable paths

Adds skills/quick-launch/commands/kilroy-quick.md as the canonical slash command file, symlinked into ~/.claude/commands/ and ~/.codex/commands/ at install time. One source of truth, live-editable from the repo.

Updates SKILL.md to reference ~/.local/share/kilroy/workflows/quick-launch (installed as a symlink) instead of an <ABS_PATH> placeholder, and drops the --config requirement: kilroy auto-builds a default run config when cwd is a git repo and auto-detects installed provider CLIs. Verified with a bare git init + config-less launch.

* quick-launch: ergonomics pass + workspace fix + install script

Driven by feedback from testing /kilroy-quick in Claude. Five changes:

1. --prompt-file <path>: read a file verbatim into the "prompt" input key. Replaces hand-escaped multi-line JSON in --input. Strongly preferred for anything beyond a one-liner: no \n escapes, no quoting hazards.
2. Auto --no-cxdb when --config is absent. The zero-config default run config doesn't populate cxdb addresses, so requiring cxdb was just noise. Explicit --config with cxdb.binary_addr still enables it.
3. Auto-skip the interactive CLI-backend warning when stdin isn't a terminal. Uses mattn/go-isatty because a naive Mode&CharDevice check treats /dev/null as a TTY. Agent-driven invocations, CI, pipes, and the detach child all hit this path.
4. runs show --latest --label k=v and new runs wait subcommand. show returns the most recent matching run; wait polls the run DB until the target reaches a terminal state and exits 0/1/2 for success/fail/timeout. Both support the same id-or-prefix-or-latest target resolution.
5. launchDetached was starting the child with cmd.Dir=logs_root, so the detach child's cwd was the logs dir instead of the user's workspace; runs reported repo_path pointing at the logs dir and worktrees never saw the real files. Parent now forwards its own cwd to the child via --workspace when none was passed explicitly.

Quick-launch workflow package simplified to a single agent node. The previous stage.sh wrote .kilroy/TASK.md, but the engine rewrites that file before every node; contents got clobbered. Inputs now land in .kilroy/INPUT.md (written once at run start) and the agent reads from there directly.

scripts/install-skills.sh wires everything up idempotently: symlinks for the binary, the workflows dir, and the skills/commands into ~/.claude, ~/.agents (codex's native discovery path, not ~/.codex/skills), and ~/.config/opencode.

Also rebuilds the SKILL.md to document --prompt-file as the default path for non-trivial tasks, drops the --no-cxdb / --skip-cli-headless-warning mentions, and points to runs wait / runs show --latest for the check-status / retrieve-result flow.

* engine: isolate codex runs from user's shell profile

Kilroy's isolated codex home used to copy both auth.json and config.toml from the user's real ~/.codex/ into the kilroy-owned codex state dir. That leaked user-scoped settings (model_reasoning_effort, personality, model) into kilroy runs, so a setting that worked for the user's interactive codex sessions could silently break kilroy runs for specific models, notably `gpt-5-codex` rejecting the inherited `model_reasoning_effort = "xhigh"` upstream with a 400 during preflight probes.

Two fixes here:

1. Drop the config.toml copy entirely. Run configuration must come from kilroy and the .dot graph, not by accident from whatever the user has in ~/.codex/config.toml. If kilroy codex runs need specific settings, those belong in the graph or run.yaml.
2. When OPENAI_API_KEY is available in the parent env, write a fresh apikey auth.json into the isolated codex home instead of copying whatever auth.json the user has. This matches what tmux_handler.go + templates/codex.go already does for non-probe runs, so the probe stops diverging from the real run: both paths now force apikey mode when a key is present. When no OPENAI_API_KEY is set, kilroy still falls back to copying the user's auth.json (subscription auth). Probes under that path can't exercise apikey-only models like gpt-5-codex, but the rest of preflight still runs against something plausible.

Tests updated: the old assertions on config.toml existence are replaced with explicit "must-not-exist" checks, and a new test covers the apikey auth.json write path.

Verified end-to-end with a codex graph.codex.dot quick-launch run (39s, result.md correctly produced).

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
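The apikey auth.json decision described in the codex isolation commit can be sketched as below. This is an illustration under assumptions: `codexAuth` and `buildAuthJSON` are hypothetical names, and the struct carries only the two fields the commit messages mention (`auth_mode` set to lowercase "apikey", and the key under `OPENAI_API_KEY`); the real file may contain more.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// codexAuth mirrors only the two fields named in the commit messages.
// The real auth.json schema may include other fields (assumption).
type codexAuth struct {
	AuthMode string `json:"auth_mode"`
	APIKey   string `json:"OPENAI_API_KEY"`
}

// buildAuthJSON returns the payload for a fresh apikey auth.json when a key
// is present. ok=false tells the caller to fall back to copying the user's
// existing auth.json (subscription auth) instead.
func buildAuthJSON(apiKey string) (payload string, ok bool) {
	if apiKey == "" {
		return "", false
	}
	b, err := json.Marshal(codexAuth{AuthMode: "apikey", APIKey: apiKey})
	if err != nil {
		return "", false
	}
	return string(b), true
}

func main() {
	// "sk-example" is a placeholder, not a real key.
	if payload, ok := buildAuthJSON("sk-example"); ok {
		fmt.Println(payload)
	}
	if _, ok := buildAuthJSON(""); !ok {
		fmt.Println("no key: fall back to copying the user's auth.json")
	}
}
```

Routing both the preflight probe and the real run through one builder like this is one way to keep the two paths from diverging again.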

mattleaverton and others added 4 commits April 1, 2026 16:28

mattleaverton merged commit 424d612 into danshapiro:main Apr 2, 2026
1 check failed

mattleaverton merged 4 commits into danshapiro:main from mattleaverton:impl/config-defaults

1 participant
