Config defaults & environment fixes (Phase 0.9, items 1-3) #77
Merged
mattleaverton merged 4 commits on Apr 2, 2026
Previously, when a user provided a --config file that didn't mention providers, auto-detection was skipped entirely, causing "missing provider backend" errors even with API keys in the environment. Now auto-detection always runs and fills in missing providers; config-file values still take precedence via ApplyDetectedProviders.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
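The precedence rule described above can be sketched as a plain map merge. This is only an illustration: `ProviderConfig` and `applyDetected` are hypothetical names, and the real `ApplyDetectedProviders` signature is not shown in this PR.

```go
package main

import "fmt"

// ProviderConfig is a hypothetical stand-in for one provider entry.
type ProviderConfig struct {
	Backend string
}

// applyDetected returns the union of both maps. Detected providers fill
// gaps only; anything the config file sets explicitly is kept as-is.
func applyDetected(fromFile, detected map[string]ProviderConfig) map[string]ProviderConfig {
	merged := make(map[string]ProviderConfig, len(fromFile)+len(detected))
	for name, p := range detected {
		merged[name] = p
	}
	for name, p := range fromFile {
		merged[name] = p // config-file value wins over the detected one
	}
	return merged
}

func main() {
	fromFile := map[string]ProviderConfig{"openai": {Backend: "cli"}}
	detected := map[string]ProviderConfig{
		"openai":    {Backend: "api"},
		"anthropic": {Backend: "api"},
	}
	merged := applyDetected(fromFile, detected)
	fmt.Println(merged["openai"].Backend, merged["anthropic"].Backend)
}
```

With this shape, a config file that omits providers entirely still ends up with every detected provider.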
Kilroy creates its own worktree for each run, so the parent repo's cleanliness is irrelevant. The old default of true caused runs to fail before starting whenever the repo had untracked or modified files. Users who need the old behavior can set require_clean: true in config.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tool nodes (shape=parallelogram with tool_command) were missing KILROY_RUN_ID, KILROY_LOGS_ROOT, KILROY_WORKTREE_DIR, and other stage runtime env vars that agent nodes receive. This broke the .ai/runs/$KILROY_RUN_ID/ data-passing convention between tool and agent nodes.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
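A minimal sketch of what passing the stage runtime variables to a tool command might look like. The helper name `stageEnv` and the example values are assumptions; only the three variable names come from the commit message.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// stageEnv copies base and appends the run-scoped variables that the
// commit message says tool nodes were missing.
func stageEnv(base []string, runID, logsRoot, worktreeDir string) []string {
	env := append([]string{}, base...)
	return append(env,
		"KILROY_RUN_ID="+runID,
		"KILROY_LOGS_ROOT="+logsRoot,
		"KILROY_WORKTREE_DIR="+worktreeDir,
	)
}

func main() {
	// The tool command can now build the shared data path itself.
	cmd := exec.Command("sh", "-c", `echo ".ai/runs/$KILROY_RUN_ID/"`)
	cmd.Env = stageEnv(os.Environ(), "run-123", "/tmp/logs", "/tmp/worktree")
	out, err := cmd.Output()
	if err != nil {
		fmt.Println("tool command failed:", err)
		return
	}
	fmt.Print(string(out))
}
```

Appending after the inherited environment matters because `os/exec` uses the last value when a key appears more than once.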
The previous commit changed the unconditional assignment from true to false, but it still overwrote any value the caller passed. Remove the unconditional assignment (bool zero value is already false) and fix the test that relies on require_clean=true to explicitly opt in.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
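The difference between the two versions can be illustrated as follows. `RunConfig` and both function names are hypothetical; the sketch only shows why dropping the assignment is enough when the desired default equals the zero value.

```go
package main

import "fmt"

// RunConfig is a hypothetical config struct; RequireClean defaults to
// false simply because that is the zero value of bool.
type RunConfig struct {
	RequireClean bool
}

// applyDefaultsBuggy mirrors the intermediate commit: the default is
// right, but the caller's explicit opt-in is silently discarded.
func applyDefaultsBuggy(cfg *RunConfig) {
	cfg.RequireClean = false
}

// applyDefaults leaves RequireClean untouched, so an explicit true
// survives and an unset field stays false.
func applyDefaults(cfg *RunConfig) {
	// No assignment needed for RequireClean.
}

func main() {
	a := RunConfig{RequireClean: true}
	applyDefaultsBuggy(&a)

	b := RunConfig{RequireClean: true}
	applyDefaults(&b)

	fmt.Println("buggy:", a.RequireClean, "fixed:", b.RequireClean)
}
```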
mattleaverton added a commit to mattleaverton/kilroy that referenced this pull request on Apr 10, 2026
Complete, portable PR review workflow that reviews any GitHub PR: setup → build/test → collect diff → code review → holistic review → report. Verified end-to-end against danshapiro/kilroy PR #77: all nodes succeeded, review-report.md produced, output contract collected, run recorded in DB.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
mattleaverton added a commit that referenced this pull request on Apr 17, 2026
* docs(plan): add Phase 0.9 first-run friction fixes, input/output contracts
From first real PR review workflow run:
- Phase 0.9: config/auto-detect conflict, require_clean default, missing
env vars in tool nodes, CLI headless warning, error message hints,
worktree file-not-found context
- Phase 3.7-3.9: run input contract (--input), output contract, node
data passing conventions
- Future work: iteration patterns (dynamic loops over collections)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* wip: PR review workflow graph (v2)
Setup/build/test tool nodes + single review agent node.
Experimental workflow for automated PR triage.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* wip: fix build-test script survival across branch switch
Setup script now copies build-test.sh to .ai/ before gh pr checkout
changes the branch and removes workflow files.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* wip: PR review workflow v3 — investigate/decide split
- Raw git checkout (no gh pr checkout) with unique branch names for parallel safety
- Separate investigate (exploratory, full tool access) and decide (directive, no tools) agents
- Tighter output contract: next actions instead of follow-up tasks
- Setup script preserves all workflow scripts to .ai/ before branch switch
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* wip: make PR review workflow repo-agnostic
- Absolute path for setup script (works from any repo's worktree)
- Remove Go-specific assumptions from investigate prompt
- Agent discovers build system and runs appropriate checks
- Add freshell run config
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs(plan): platform reframe — layered architecture for software ops platform
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* engine: export types and create L1/L2 package structure
Create agents/ (L1) and workflows/ (L2) package directories.
Split NewDefaultRegistry into NewCoreRegistry (L0-only) + NewDefaultRegistry
(backward compat). Export engine types and methods needed by external handler
packages: StatusSource, FallbackStatusPath, StageStatusContract, Truncate,
WarnEngine, BuildFidelityPreamble, ClassifyAPIError, etc.
Add Engine accessor methods: AppendProgress, CXDBPrompt,
CXDBInterviewStarted/Completed/Timeout, LastResolvedFidelity, SetDefault.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* engine: wire layered handler registration from cmd/kilroy/
Add Registry field to RunOptions so cmd/kilroy/ can pass a pre-composed
handler registry. cmd/kilroy/ now creates a layered registry:
L0: engine.NewCoreRegistry() (start, exit, conditional, tool, parallel, fan_in)
L1: agents.AgentHandler (codergen/default)
L2: workflows.HumanGateHandler, workflows.ManagerLoopHandler
Type aliases in agents/ and workflows/ establish the package structure and
import direction. Implementations remain in engine/ until Phases 2-3.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* engine: add test proving tool-only graph works with L0-only registry
TestCoreRegistry_ToolOnlyGraph demonstrates that a graph using only
tool_command nodes executes successfully with NewCoreRegistry (no L1
agent handler or L2 workflow handlers registered).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* engine: rename codergen to agent throughout
Rename Go types: CodergenBackend→AgentBackend, CodergenRouter→AgentRouter,
SimulatedCodergenBackend→SimulatedAgentBackend, etc.
Rename handler type string: "codergen"→"agent" in registry.
Rename DOT attribute: agent_mode replaces codergen_mode (with fallback).
Rename files: codergen_*.go → agent_*.go.
Update comments, diagnostics, test names.
The CodergenHandler type name is retained in engine/ as the implementation
type (aliased as agents.AgentHandler). It will be renamed when the
implementation moves to agents/ in Phase 2.1.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* style: gofmt formatting pass after rename
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* rundb: add SQLite run database with migration runner
New package internal/attractor/rundb backed by modernc.org/sqlite (pure Go,
no CGO). Global DB at ~/.local/state/kilroy/runs.db.
Features:
- Auto-applying numbered SQL migrations on DB open
- WAL mode for concurrent reads
- Schema: runs (with labels, inputs, timing), node_executions (with attempts),
edge_decisions, provider_selections
- Write ops: InsertRun, CompleteRun, InsertNodeStart, CompleteNode,
InsertEdgeDecision, InsertProviderSelection
- Read ops: GetRun, LatestRun, ListRuns (filter by status/labels/graph),
GetNodeExecutions
- Prune ops: PruneRuns (by date, graph, labels, or orphaned logs_root)
- Cascade deletes: child records deleted with parent run
- 16 tests covering all operations
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* engine: wire RunDB into engine lifecycle events
Add RunDBWriter interface in engine/ (no rundb import needed).
Engine records to RunDB at every lifecycle point:
- run start (after worktree ready)
- node start/complete (each execution)
- edge selection decisions
- run complete (success/failure)
All RunDB calls are best-effort (warn on error, never block).
cmd/kilroy/ opens global DB and passes via RunOptions.RunDB.
Integration test proves tool graph produces correct DB entries.
Also fix RunOptions propagation: RunDB, Registry, Labels now properly
forwarded through bootstrapRunWithConfig overrides.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* cli: update runs list/prune/status --latest to use RunDB
runs list: queries RunDB first, falls back to filesystem scan.
Now shows duration column.
runs prune: delegates to RunDB with filter support (before, graph, labels, orphans).
status --latest: instant lookup via RunDB.LatestRun() instead of filesystem scan.
All commands gracefully fall back to filesystem-based behavior when
the RunDB is unavailable or empty.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* engine: add input contract system
Structured inputs for graph runs via --input flag.
Features:
- LoadInputFile (YAML/JSON) and LoadInputString (inline JSON)
- Graph declares required inputs via inputs="key1,key2" attribute
- ValidateRequiredInputs rejects runs with missing declared inputs
- Input values injected into context as input.* keys
- Input values expanded in prompts as $input.key placeholders
- Input values available as KILROY_INPUT_* env vars in tool_command nodes
- Inputs recorded in RunDB
Integration test proves tool_command nodes see KILROY_INPUT_* env vars.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
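A rough sketch of the validation and env-var steps listed above. The function names are hypothetical, and the key-to-variable mapping (upper-casing, replacing other characters with `_`) is an assumption; the commit only states the `KILROY_INPUT_*` prefix.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// parseDeclared splits a graph attribute such as inputs="key1,key2".
func parseDeclared(attr string) []string {
	var keys []string
	for _, k := range strings.Split(attr, ",") {
		if k = strings.TrimSpace(k); k != "" {
			keys = append(keys, k)
		}
	}
	return keys
}

// validateRequired reports every declared input that was not supplied.
func validateRequired(declared []string, inputs map[string]string) error {
	var missing []string
	for _, k := range declared {
		if _, ok := inputs[k]; !ok {
			missing = append(missing, k)
		}
	}
	if len(missing) > 0 {
		return fmt.Errorf("missing required inputs: %s", strings.Join(missing, ", "))
	}
	return nil
}

// inputEnv renders inputs as KILROY_INPUT_<KEY>=value entries.
// The key normalization here is an assumption, not documented behavior.
func inputEnv(inputs map[string]string) []string {
	env := make([]string, 0, len(inputs))
	for k, v := range inputs {
		name := strings.Map(func(r rune) rune {
			switch {
			case r >= 'a' && r <= 'z':
				return r - 'a' + 'A'
			case (r >= 'A' && r <= 'Z') || (r >= '0' && r <= '9'):
				return r
			default:
				return '_'
			}
		}, k)
		env = append(env, "KILROY_INPUT_"+name+"="+v)
	}
	sort.Strings(env) // stable order, since map iteration is random
	return env
}

func main() {
	declared := parseDeclared("pr_number, repo")
	inputs := map[string]string{"pr_number": "42"}
	fmt.Println(validateRequired(declared, inputs))
	fmt.Println(inputEnv(inputs))
}
```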
* engine: add workspace abstraction
Separates graph file location from execution location:
- --workspace /path/to/dir sets the execution directory
- --graph /path/to/graph.dot determines where prompt_file resolves
- When --workspace is omitted, defaults to current behavior
- GraphDir derived from --graph path for prompt_file resolution
- Workspace flows through RunOptions → engine → worktree creation
PrepareOptions gains GraphDir field that takes precedence over RepoPath
for prompt_file resolution, enabling cross-repo workflows where graphs
and scripts live separately from the target project.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* engine: add output contract
Graphs declare expected output artifacts via outputs="file1,file2" attribute.
After run completion, the engine:
- Searches for declared outputs in the worktree
- Copies found outputs to {logs_root}/outputs/
- Writes outputs.json manifest with found/missing status and file sizes
- Emits warnings for missing declared outputs (not errors)
- Records output collection in progress events
Hooked into persistTerminalOutcome so outputs are collected on every
run completion (success or failure).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* server: add /runs API endpoints and /pipelines backward compat
Add canonical /runs endpoints per platform-reframe plan:
- GET /runs: list runs from RunDB (with status/graph filters)
- GET /runs/{id}/outputs: list collected output artifacts
Existing /pipelines endpoints retained as backward-compat aliases.
All /runs endpoints mirror /pipelines for submit, status, events,
cancel, context, and questions.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* cli: add --label flag, --older-than duration for prune
Run lifecycle management additions:
- --label KEY=VALUE flag on attractor run (stored in RunDB)
- --older-than 7d duration-based prune filter (supports d/h/m units)
- Labels passed through to RunDB via RunOptions
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
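Go's `time.ParseDuration` has no day unit, so a value like `7d` needs custom parsing. A minimal sketch, assuming a single integer followed by one of the three units named above (`parseOlderThan` is a hypothetical name):

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// parseOlderThan accepts values such as "7d", "12h", or "30m".
func parseOlderThan(s string) (time.Duration, error) {
	if len(s) < 2 {
		return 0, fmt.Errorf("invalid duration %q", s)
	}
	n, err := strconv.Atoi(s[:len(s)-1])
	if err != nil || n < 0 {
		return 0, fmt.Errorf("invalid duration %q", s)
	}
	switch s[len(s)-1] {
	case 'd':
		return time.Duration(n) * 24 * time.Hour, nil
	case 'h':
		return time.Duration(n) * time.Hour, nil
	case 'm':
		return time.Duration(n) * time.Minute, nil
	}
	return 0, fmt.Errorf("invalid duration unit in %q (want d, h, or m)", s)
}

func main() {
	d, err := parseOlderThan("7d")
	fmt.Println(d, err)

	// The prune cutoff is then simply now minus the parsed duration.
	cutoff := time.Now().Add(-d)
	fmt.Println("prune runs started before", cutoff.Format(time.RFC3339))
}
```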
* agents: add tmux session manager and invocation templates
New packages for Layer 1 agent capabilities:
agents/tmux/ — tmux session management:
- Session creation with two-step pattern (shell → respawn-pane)
- Input delivery with sanitization, chunking, and Enter verification
- Output capture with NBSP normalization for prompt detection
- Readiness/idle/exit detection via polling with busy indicators
- Process tree cleanup (SIGTERM → grace → SIGKILL)
- Socket isolation (kilroy-specific tmux socket)
- Session environment variable storage for metadata
- 11 integration tests against real tmux on isolated socket
agents/templates/ — per-tool invocation templates:
- Template struct with per-tool config (args, env, prompt prefix,
busy indicators, startup dialogs, exit behavior)
- Built-in templates: claude, codex, gemini, opencode
- Template registry for name-based lookup
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* agents/tmux: add smoke tests for real CLI tools
TestSmoke_Claude_PrintMode spawns Claude Code via tmux in --print mode,
waits for exit, and captures output. Verified working: Claude returns
KILROY_SMOKE_OK, session exits with status 0.
Tests skip gracefully when API keys or CLI tools are unavailable.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* agents: add TmuxAgentHandler for CLI tool execution via tmux
TmuxAgentHandler implements engine.Handler and orchestrates the full
agent lifecycle via tmux sessions:
1. Resolve tool template from node attributes (agent_tool or llm_provider)
2. Build command and environment from template
3. Create tmux session on isolated socket
4. Handle startup dialogs (trust prompts, permission warnings)
5. Wait for completion (exit-based or idle-detection)
6. Capture output and build outcome
7. Clean up session and process tree
Two agent handlers now available:
- AgentHandler: existing subprocess/API backend (backward compat)
- TmuxAgentHandler: tmux-based CLI sessions (new)
Also adds SendKeys method to tmux.Manager for dialog interaction.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* agents: wire TmuxAgentHandler end-to-end and prove it works
Wire TmuxAgentHandler into cmd/kilroy/ via --tmux flag. Add exit code
detection via tmux #{pane_dead_status}. Three integration test scenarios
verified against real Claude via tmux:
1. Simple agent task: Claude creates a file, tool node verifies it exists
→ status: success, KILROY_TMUX_TEST_PASS confirmed
2. Multi-node pipeline: Claude writes calc.sh, tool node executes it,
conditional routes on result → 42 computed, routed to success exit
3. Failure routing: Agent succeeds, tool node intentionally fails (cat
nonexistent file), conditional routes to fail exit → correct routing
All three runs recorded in RunDB with timing. TmuxAgentHandler correctly:
- Spawns Claude in tmux sessions on isolated socket
- Passes prompt and environment variables
- Captures output and exit codes
- Detects failures via non-zero exit codes
- Cleans up sessions and process trees
Also adds 3 unit tests with fake agent scripts proving handler contract:
success path, failure detection (exit code 1), and workdir file creation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs(plan): update Phase 3 with Fabro learnings and implementation notes
Add event envelope canonicalization task (3.6), workflow.toml concept for
packages, run retro idea, and testing emphasis notes from Phase 2 review.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* engine: add GitOps interface for version control abstraction
Defines the GitOps interface that encapsulates all git operations the
engine needs. When nil, the engine will operate in plain-directory mode.
Added GitOps field to RunOptions and Engine struct.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* workflows: implement GitHook using gitutil
GitHook implements the engine.GitOps interface, wrapping gitutil
functions for worktree isolation, per-node commits, and branch
management. This is the Layer 2 implementation that will replace
direct gitutil calls in the engine.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* engine: replace all gitutil imports with GitOps interface
The engine package now has zero direct gitutil imports. All git
operations go through the GitOps interface, which is optional.
When GitOps is nil, the engine operates in plain-directory mode:
no worktrees, no commits, no branch management.
Key changes:
- engine.run() conditionally sets up git workspace via GitOps
- checkpoint() only commits when GitOps is set
- parallel_handlers use GitOps for branch workspace isolation,
falling back to temp directory copy for no-git mode
- resume uses GitOps for worktree recreation
- config_defaults accepts GitOps parameter
- cmd/kilroy/ creates GitHook and wires it through
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* cmd: auto-detect git mode and fix workspace git detection
GitOps is now auto-detected: if the workspace (or cwd when no
workspace is specified) is a git repo, git worktrees and commits
are enabled. Otherwise, runs proceed in plain-directory mode.
DefaultRunConfig now accepts an explicit repoPath parameter so
--workspace correctly routes to the git repo.
Tested against real binary:
- Graph in git repo: worktree + commits created
- Graph in plain dir: runs successfully without git
- Graph with --workspace to git repo: worktree + commits created
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* engine: add AutoDetectGitOps for backward-compatible git detection
Adds a pluggable AutoDetectGitOps factory function that eng.run() and
bootstrapRunWithConfig call when GitOps is not explicitly set. This
preserves backward compatibility: existing callers that set RepoPath
to a git repo automatically get git worktree behavior.
Fixes:
- Branch engine now inherits GitOps from parent (parallel commits work)
- TestRun_FailsWhenNotAGitRepo renamed to TestRun_SucceedsInNonGitDir
(non-git dirs are now valid — the intended Phase 3.1 behavior)
- cmd/kilroy/ registers AutoDetectGitOps at init time
- Test TestMain registers testGitOps auto-detection
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* workflows: extract HumanGateHandler to Layer 2
HumanGateHandler in workflows/ is the real implementation of the
wait.human handler, using exported CXDB event methods on Engine.
Removed wait.human from NewCoreRegistry — it's now registered
only by cmd/kilroy/ via the layered composition.
Tests that build engines manually now explicitly register wait.human.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* workflows: add workflow package support
A workflow package bundles a graph with scripts and prompts into a
portable, self-contained directory. Supports --package flag, package
materialization at .kilroy/package/ in workspace, and workflow.toml
manifest with metadata, inputs, outputs, and default labels.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* workflows: add supervisor prototype with run health assessment
The supervisor monitors runs by querying the RunDB and classifies
health as: healthy, degraded, blocked, failed, or complete.
Detection signals:
- Blocked: same node failing 3+ times, or no progress for 10min
- Degraded: nodes retrying
- Failed: terminal run status
Per-node timing visible via `kilroy attractor status --run <id>`.
Tested against real multi-node graph run.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* engine: document CXDB integration boundary for future extraction
CXDB is already effectively optional via nil-safe methods and the
DisableCXDB flag. Full extraction (removing cxdb imports from engine/)
is deferred — the nil-safe pattern provides zero overhead when disabled,
and the extraction surface is large (30+ event methods, streaming,
bootstrap, resume).
Documents the extraction path for future work.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* engine: add canonical RunEvent envelope for all progress events
All progress events now include a unique ID, UTC timestamp, run_id,
and dot-notation event name. The RunEvent type provides the canonical
envelope, with ToMap/FromMap bridging to the existing progress system.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
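One possible shape for such an envelope is sketched below. The field names, map keys (`id`, `ts`, `run_id`, `event`), and the random hex ID are assumptions; the actual `RunEvent` definition is not included in this log.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"time"
)

// RunEvent is a hypothetical envelope carrying the four fields the
// commit message names, plus free-form payload data.
type RunEvent struct {
	ID        string
	Timestamp time.Time
	RunID     string
	Event     string // dot-notation name, e.g. "node.started"
	Data      map[string]any
}

func newRunEvent(runID, event string, data map[string]any) (RunEvent, error) {
	buf := make([]byte, 8)
	if _, err := rand.Read(buf); err != nil {
		return RunEvent{}, err
	}
	return RunEvent{
		ID:        hex.EncodeToString(buf),
		Timestamp: time.Now().UTC(),
		RunID:     runID,
		Event:     event,
		Data:      data,
	}, nil
}

// ToMap flattens the envelope; payload keys never override envelope keys.
func (e RunEvent) ToMap() map[string]any {
	m := map[string]any{
		"id":     e.ID,
		"ts":     e.Timestamp.Format(time.RFC3339Nano),
		"run_id": e.RunID,
		"event":  e.Event,
	}
	for k, v := range e.Data {
		if _, reserved := m[k]; !reserved {
			m[k] = v
		}
	}
	return m
}

// FromMap rebuilds an envelope from a flattened progress map.
func FromMap(m map[string]any) (RunEvent, error) {
	tsText, _ := m["ts"].(string)
	ts, err := time.Parse(time.RFC3339Nano, tsText)
	if err != nil {
		return RunEvent{}, fmt.Errorf("bad timestamp: %w", err)
	}
	e := RunEvent{Timestamp: ts, Data: map[string]any{}}
	e.ID, _ = m["id"].(string)
	e.RunID, _ = m["run_id"].(string)
	e.Event, _ = m["event"].(string)
	for k, v := range m {
		switch k {
		case "id", "ts", "run_id", "event":
		default:
			e.Data[k] = v
		}
	}
	return e, nil
}

func main() {
	ev, err := newRunEvent("run-123", "node.started", map[string]any{"node": "build"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	back, err := FromMap(ev.ToMap())
	fmt.Println(back.Event, back.RunID, back.Data["node"], err)
}
```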
* agents: fix codex and opencode tmux templates
Codex: TUI mode doesn't exit after completion, so ExitsOnComplete was
wrong. Switch to idle detection with correct prompt prefix (›) and busy
indicators. Add StartupDialog to auto-dismiss the trust prompt.
OpenCode: positional args are directory paths, not prompts. Use the
`run` subcommand for headless execution. Set ExitsOnComplete since
`opencode run` exits after completion.
Found during Phase 1-3 integration verification.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: add Phase 1-3 integration verification report
Real end-to-end testing of 5 scenarios against the ./kilroy binary.
Found and fixed 2 bugs in agent templates (codex trust prompt,
opencode run subcommand). Documents what works, what's broken,
and what needs follow-up.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: update verification report with codex success
Codex retest with fixed template completed end-to-end: startup dialog
dismissed, task completed, idle detection triggered, verify passed.
All 6 scenarios now pass.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* workflows: add build-test workflow package (Phase 4.3)
Pure Layer 0 workflow that detects build systems, runs build + test,
and produces build-report.json. Supports Go, Cargo, npm, Make, CMake,
Maven, and Gradle with optional command overrides via inputs.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* workflows: add PR review workflow package (Phase 4.1)
Complete, portable PR review workflow that reviews any GitHub PR:
setup → build/test → collect diff → code review → holistic review → report.
Verified end-to-end against danshapiro/kilroy PR #77: all nodes succeeded,
review-report.md produced, output contract collected, run recorded in DB.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* server: add DB fallback for run detail, workflows endpoint, sort filter
- GET /runs/{id} falls back to RunDB for completed runs, returns nodes,
edges, provider selections, and DOT source
- GET /workflows lists available workflow packages from workflows/ dir
- GET /runs supports ?sort=newest|oldest|longest
- GET /runs/{id}/context falls back to DB for completed runs
- Add GetEdgeDecisions, GetProviderSelections, GetDotSource to RunDB
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* server: fix JSON casing, reconcile stale runs on startup
- Add json tags to all RunDB read types (snake_case consistently)
- Reconcile runs stuck in "running" for >2h as "interrupted" on server start
- 3 zombie runs from testing correctly marked interrupted
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* server: add output download, provider recording, omitempty cleanup
- GET /runs/{id}/outputs/{name} serves artifact content (md, json, txt)
- Engine records provider/model selection to DB for agent nodes
- Add omitempty to JSON tags so empty strings are omitted
- Add GetEdgeDecisions, GetProviderSelections read methods to RunDB
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* server: add node turns endpoint, edge conditions, output download
- GET /runs/{id}/nodes/{nodeId}/turns returns prompt, response, stdout,
stderr, status, and timing for any node — works for both agent and tool nodes
- Edge decisions now record the condition expression (e.g. "outcome=success")
alongside the selection reason, via new DB migration
- GET /runs/{id}/outputs/{name} serves artifact content directly
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* server: POST /runs accepts workflow packages, inputs, workspace
POST /runs now supports two modes:
1. Legacy: dot_source + config_path (existing behavior)
2. Workflow package: { workflow: "pr-review", workspace: "/path",
inputs: { pr_number: "42" }, labels: { team: "infra" }, tmux: true }
Workflow name is resolved from workflows/ directory. Git integration
auto-detected from workspace path. Config auto-built when not provided.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* server: fix run detail to use DB for completed runs in registry
Completed runs in the in-memory registry returned sparse PipelineStatus
format. Now falls through to DB for the rich response (nodes, edges,
providers, DOT source) when the run is no longer actively running.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs(plan): file conventions, git diff tracking, and run observability
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* engine: add .kilroy/ file conventions for inter-node data exchange
Creates .kilroy/ directory in workspace at run start with standard files:
- INPUT.md: structured inputs as readable markdown
- CONTEXT.md: accumulated run context, updated before each node
- TASK.md: current node's task description
- FEEDBACK.md: failure details written on retry
- $KILROY_DATA_DIR env var points to .kilroy/ directory
- .kilroy/ auto-added to .gitignore in git-managed workspaces
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* engine: add output contract enforcement at graph and node levels
Graph-level: DeclaredOutputs/CollectOutputs collect files declared in
graph outputs= attribute to logs_root/outputs/ after run completion.
Implements CollectAndRecordOutputs that was previously a placeholder.
Node-level: optional outputs= attribute on nodes, checked after execution.
Missing files downgrade outcome to fail, write FEEDBACK.md, and trigger
retry if retries remain.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* engine: add per-node git diff tracking
- 003_node_diffs.sql migration for before/after SHA recording
- DiffStat added to GitOps interface and gitutil
- Diff/DiffFileList added to gitutil for full diff retrieval
- Engine captures before SHA pre-execution, records diff after checkpoint
- RunDB read methods: GetNodeDiffs (all), GetNodeDiff (single node+attempt)
- RecordNodeDiff added to RunDBWriter interface
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* server: add diff API endpoint for per-node git diffs
GET /runs/{id}/nodes/{nodeId}/diff returns summary from DB plus
full unified diff and per-file stats from git. Supports ?attempt=N
query param (defaults to latest). Returns 404 for nodes without
diff data, degrades gracefully when worktree is unavailable.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* server: add file browser API for run logs and workspace
GET /runs/{id}/files/{path} - browse and download from logs_root
GET /runs/{id}/workspace/{path} - browse and download from worktree
Directory listings return JSON with name, size, is_dir, modified_at.
File downloads set content-type based on extension. Path traversal
protection via filepath.Abs prefix checking.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: add DiffStat to testGitOps helper
Implements the new GitOps.DiffStat method on the test helper
so engine tests compile with the updated interface.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs(plan): canonical run log — single verbose activity log per run
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* engine: add RunLog type for canonical per-run activity logging
Writes structured NDJSON events to {logs_root}/run.log with timestamp,
level, source, node, event, message, and optional data fields.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* engine: wire RunLog lifecycle events at run/node/edge/checkpoint points
Emits run.started, node.started, node.completed, edge.selected,
context.updated, checkpoint.saved, and run.completed events to run.log
alongside existing progress.ndjson calls.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* engine: stream tool stdout/stderr line-by-line through RunLog
LineWriter tees command output to both the log file and RunLog events.
Each newline-terminated line becomes a source=tool stdout/stderr event.
Also emits exit code event when the tool process completes.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* server: add GET /runs/{id}/log endpoint with filtering and SSE streaming
Serves the canonical run.log with query params: ?node=, ?source=,
?event=, ?since=, ?tail= for filtered reads, and ?stream=true for
live SSE tailing via polling.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* agents: add conversation log parsers for Claude, Codex, and OpenCode
Each template gets a LogLocator that finds the CLI tool's conversation
log, and a parser that extracts tool_call, tool_result, text, and
thinking events. The tmux handler emits parsed events to RunLog after
agent completion.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* engine: add git activity events to RunLog (worktree.created, commit)
Emits worktree.created when the run worktree is set up, and commit
events with diff stats when recordNodeDiff finds file changes.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: provider recording timing and model passing to CLI tools
Move rundbRecordProviderIfAgent to after executeWithRetry so provider/model
attrs are populated. Pass llm_model from node attributes through to CLI tool
--model flags. Record agent_tool as the backend in provider_selections.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* workflows: add multi-tool exercise for observability testing
Exercises input/output contracts, .kilroy/ convention files, agent_tool
routing across claude/codex/opencode, edge conditions, and run log events.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* agents: isolate CLI tool auth for headless operation
Claude: add --bare flag to skip keychain/OAuth, rely purely on
ANTHROPIC_API_KEY env var. Removes need for startup dialog handling.
Codex: use exec subcommand with --full-auto --skip-git-repo-check.
Write isolated auth.json under CODEX_HOME per session so codex uses
the API key without touching ~/.codex/.
OpenCode: add --format json --pure --dir flags. Inject provider config
via OPENCODE_CONFIG_CONTENT env var for keyless config isolation.
Add PrepareSession hook to Template for per-tool filesystem setup
before tmux session creation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* engine: add --skip-preflight flag for tmux-managed sessions
The CLI preflight probe uses the old subprocess invocation path which
doesn't match tmux template auth isolation. Skip it when the caller
knows the tools are configured.
Also fix codex node to use o3-mini model (can't probe claude model
on the openai provider).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: CLI tool model IDs and auth format from first exercise run
Claude: normalize dots to dashes in model ID (claude-sonnet-4.6 →
claude-sonnet-4-6) since Claude CLI uses dash format.
Codex: auth.json auth_mode must be lowercase "apikey" not "ApiKey".
OpenCode: model format is provider/model (anthropic/claude-sonnet-4-6),
add prefix and normalize dots.
Also fix SkipPreflight not propagating through RunOptions override copy.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
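The two model-ID fixes can be sketched as small string helpers. Both function names are hypothetical, and the sketch normalizes dots unconditionally; whether the real OpenCode template limits that to Claude models is not stated here.

```go
package main

import (
	"fmt"
	"strings"
)

// claudeModelID converts dotted versions to the dash format,
// e.g. claude-sonnet-4.6 becomes claude-sonnet-4-6.
func claudeModelID(model string) string {
	return strings.ReplaceAll(model, ".", "-")
}

// openCodeModelID produces the provider/model form, adding the
// provider prefix only when the ID does not already carry one.
func openCodeModelID(provider, model string) string {
	model = strings.ReplaceAll(model, ".", "-")
	if strings.Contains(model, "/") {
		return model
	}
	return provider + "/" + model
}

func main() {
	fmt.Println(claudeModelID("claude-sonnet-4.6"))
	fmt.Println(openCodeModelID("anthropic", "claude-sonnet-4.6"))
}
```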
* fix: codex auth.json uses OPENAI_API_KEY field, not token
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* workflows: switch codex model to codex-mini-latest
o3-mini doesn't support codex's web_search_preview tool.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: codex template uses sandbox flag directly, gpt-5.4-nano model
--full-auto implied web_search_preview which most models reject.
Use --sandbox workspace-write directly. Switch to gpt-5.4-nano
which supports codex's tool set.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* agents: capture structured JSONL output for agent log parsing
Each CLI template now produces structured output (stream-json for
claude, --json for codex, --format json for opencode). The handler
redirects stdout to {stageDir}/agent_output.jsonl and parses it
directly — no more hunting through tool-specific log directories.
LogLocator remains as fallback for non-structured-output modes.
Response text is extracted from the JSONL for response.md.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: disable codex web_search to support nano/mini models
Codex defaults to web_search="cached", which sends the web_search_preview
tool on every request. Most small models reject it. Disable it,
since Kilroy agents don't need web search.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: rewrite codex and opencode log parsers for actual JSONL formats
Parsers were written speculatively. Now matched to real output:
- Codex: item.completed/started with agent_message, command_execution
- OpenCode: tool_use with nested part.state, text events
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: pr-review workflow for worktree and tmux agent execution
- Fix .git detection to use -e (file or dir) for worktree support
- Convert prompt_file to inline prompts with KILROY_STAGE_STATUS_PATH
- Add output contract declarations on agent nodes
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* agents: real-time agent log streaming during execution
Add TailJSONL that watches agent_output.jsonl and emits events to
RunLog as lines appear. Refactor parsers to expose per-line functions
(ParseClaudeLine, ParseCodexLine, ParseOpenCodeLine) used by both
the tailer and batch parsing.
The tailer starts when the tmux session is created and stops when the
agent exits. Events flow through RunLog to the SSE endpoint in real
time, so the UI sees agent tool calls as they happen rather than in
a batch after completion.
Falls back to batch parsing when structured output isn't available.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: bump stall watchdog from agent log tailer
When structured output is redirected to a file, the tmux pane is
empty and the stall watchdog sees no progress. The real-time tailer
now calls TickStallWatchdog on each parsed event, keeping the
watchdog alive during agent execution.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: record run failure in DB on engine error return
When eng.run() returns an error (stall watchdog, context cancellation,
etc.), the run stayed as "running" in the DB forever. Now
RunWithConfig records a fail status before returning the error.
The existing ReconcileStaleRuns on server startup handles panics
and unclean exits as a safety net.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* server: SQLite concurrent writes, invocation capture, prefix IDs, cancel fix
- SQLite: add busy_timeout(5000), synchronous(normal), SetMaxOpenConns(1)
so concurrent detached runs don't silently fail to register in the DB
- Detach: forward --input, --workspace, --package, --tmux, --skip-preflight,
--label flags to child process; resolve relative paths to absolute
- Invocation capture: record os.Args and run config in manifest.json and
runs DB (new migration 004); expose in API response
- Prefix ID matching: GET /runs/{short-id} resolves to full run ID via
DB prefix query and in-memory registry scan
- Cancel: fall back to PID-based SIGTERM for CLI-launched detached runs
that aren't in the server's in-memory pipeline registry
- AGENTS.md: document backend:cli vs api, --tmux flag, correct run config
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
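Prefix ID matching, as described in the list above, might look roughly like the sketch below. It works on an in-memory slice rather than a DB prefix query, and resolveRunID and both error values are hypothetical names. How the real endpoint treats an ambiguous prefix is not stated in the commit; returning an error is an assumption.

```go
package main

import (
	"errors"
	"strings"
)

// Hypothetical sentinel errors; the real API's error handling is not
// described in the commit message.
var (
	errRunNotFound  = errors.New("no run matches prefix")
	errRunAmbiguous = errors.New("prefix matches more than one run")
)

// resolveRunID (hypothetical) expands a short ID to the full run ID.
// An exact match always wins; otherwise exactly one ID must share the prefix.
func resolveRunID(prefix string, ids []string) (string, error) {
	if prefix == "" {
		return "", errRunNotFound
	}
	match, count := "", 0
	for _, id := range ids {
		if id == prefix {
			return id, nil
		}
		if strings.HasPrefix(id, prefix) {
			match = id
			count++
		}
	}
	switch count {
	case 0:
		return "", errRunNotFound
	case 1:
		return match, nil
	default:
		return "", errRunAmbiguous
	}
}
```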
* server: embed and serve dashboard UI at /ui/
Bundles the single-file Kilroy dashboard SPA (index.html + Graphviz
WASM worker) into the binary via //go:embed so `kilroy attractor serve`
exposes a working dashboard at http://localhost:9700/ui/ with no extra
processes or CORS configuration.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* engine: loop primitive, per-iteration artifact capture, label filters
- Artifact capture: new node_execution_artifacts table (migration 005).
At rundbRecordNodeComplete, ingest stage files (prompt, response,
agent_output.jsonl, events.ndjson, status, stdout/stderr, tool_timing,
etc.) as blobs keyed by node_execution_id. Each retry and each loop
iteration gets its own DB row + captured artifacts, fixing retry
history loss and enabling loop iteration history.
- handleGetNodeTurns now serves from DB first with filesystem fallback
for legacy runs. Response includes source="db"|"filesystem" so the UI
can tell.
- Loop primitive: new trapezium (loop.begin) and invtrapezium (loop.end)
node shapes for multi-node loops, plus loop_count/loop_until_file/
loop_until_file_contains/loop_max attributes on any node for
single-node loops. Termination evaluated after each iteration;
loop_max exceeded fails the run. Separate from existing loop_restart
which only handles transient_infra failure restarts.
- Loop iterations tracked in Engine.loopIterations so each iteration
gets a distinct attempt number in node_executions (currently on the
loop-back target; body-node attempt numbering is a UX follow-up).
- Label filtering wired end to end: GET /runs?label=KEY=VALUE&limit=N
and kilroy attractor runs list --label --status --graph --limit.
Underlying DB filter already existed; just surfaced to API and CLI.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* engine: per-iteration attempt numbering, script capture, loop UI
- activeLoopIteration tracks current iteration across an entire loop body
so every node execution inside a multi-node loop records a distinct
attempt number (previously only the jump target got incremented, body
nodes all recorded attempt=1).
- captureReferencedScripts reads tool_invocation.json, tokenizes argv and
command fields (handling bash -c "sh script.sh" pattern), and captures
referenced script files as tool_script:<name> artifacts. The UI shows
them alongside stdout/stderr in the Detail tab.
- New endpoint GET /runs/{id}/nodes/{nodeId}/attempts returns all attempt
rows for a node. GET /runs/{id}/nodes/{nodeId}/turns now accepts
?attempt=N to load a specific iteration's captured artifacts.
- UI: sidebar shows iteration badges (↻1/5, ↻2/5, ...) when a node has
multiple attempts. Detail tab shows "Iteration N of M" banner and
passes n.attempt to the turns fetch so each iteration loads its own
data. Command and captured scripts render in the Detail view.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* engine: concurrent split/join primitive
New process-flow-level primitive for running independent node chains in
parallel in the shared workspace. Distinct from the existing parallel
handler (shape=component) which is worktree-isolated and winner-takes-all
for LLM code-gen branching.
- Shapes: pentagon → concurrent.split, cylinder → concurrent.join.
Paired via concurrent_id attribute (defaults to the split node's ID).
- runConcurrentRegion dispatches each outgoing edge from the split as a
goroutine running runBranchUntilJoin. All branches share the engine's
context, DB writer, git worktree, and progress sink. Each node executes
through the same rundbRecordNodeStart/executeWithRetry/CompleteNode/
CaptureArtifacts sequence as the main loop.
- Fail-fast: first branch error cancels the parent context, siblings exit
at their next cancellation checkpoint. Optional allow_partial=true
attribute on the split disables fail-fast.
- Git commits: suppressed for non-sentinel nodes while concurrentDepth > 0.
Concurrent region is treated as one atomic checkpoint unit.
- Rejects nested concurrent regions and loops inside concurrent regions
as runtime errors. Graph validation rule can be added later.
Known follow-up: subprocess cancellation does not kill running child
processes (sleep in a cancelled branch runs to completion) — branch
goroutines see the cancelled context but the tool handler's exec doesn't
propagate the kill. Separate lifecycle concern, not specific to the
concurrent primitive.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* engine+validate: kill subprocess group on cancel, reject nested concurrent/loop
Subprocess cancellation:
- ToolHandler now runs commands in their own process group via
setProcessGroupAttr (Setpgid=true) and sets cmd.Cancel to
forceKillProcessGroup so context cancellation kills the entire process
tree, not just the shell. Before this fix, a cancelled `bash -c "sleep 20"`
left sleep as an orphan with the stdout pipe open, and cmd.Wait() blocked
for the full 20s. Verified with a fail-fast concurrent test: total run
time dropped from 20.5s to 1.1s.
Graph validation:
- lintConcurrentSplitMinBranches: concurrent_split requires ≥2 outgoing edges
- lintConcurrentSplitHasJoin: concurrent_split must have a paired concurrent_join
- lintNoNestedConcurrentRegions: concurrent regions cannot be nested
- lintNoLoopsInConcurrentRegions: loops cannot be nested in concurrent regions
- Pairs are matched by concurrent_id attribute (falling back to node ID)
- nodesBetween walks the graph from the split forward to the join to build
the "inside the region" set
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* cli+workflows: quick-launch package + runs show command
Adds a minimal two-node workflow (stage + agent) for kicking off one-shot
investigation runs with --input '{"prompt":...,"context_file":...}'. Three
graph variants route to claude, codex, or gemini via the existing
agent_tool/model_stylesheet mechanism.
Adds `kilroy attractor runs show <id-or-prefix>` with --json, --outputs, and
--print <file> modes so an agentic caller can pull result.md (or any declared
output) back out without poking at the logs directory by hand. runs list
--json now carries worktree_dir, repo_path, run_branch, and logs_root too.
* skills: quick-launch skill + using-kilroy refresh
New quick-launch skill gives agents the exact invocation for firing a
one-shot delegated run: --detach --tmux + --package + --label + --input
and the follow-up runs list / runs show / runs show --print flow for
checking status and pulling result.md back out. Structured after the
trycycle subskill style: action-oriented steps, no theory.
using-kilroy was missing several current flags (--package, --tmux,
--label, --input, --workspace, --skip-cli-headless-warning) and had no
coverage of the runs subcommand family, so those gaps are filled in
alongside a pointer to quick-launch for the one-shot case.
* skills/quick-launch: slash command + stable paths
Adds skills/quick-launch/commands/kilroy-quick.md as the canonical slash
command file, symlinked into ~/.claude/commands/ and ~/.codex/commands/
at install time. One source of truth, live-editable from the repo.
Updates SKILL.md to reference ~/.local/share/kilroy/workflows/quick-launch
(installed as a symlink) instead of an <ABS_PATH> placeholder, and drops
the --config requirement — kilroy auto-builds a default run config when
cwd is a git repo and auto-detects installed provider CLIs. Verified with
a bare git init + config-less launch.
* quick-launch: ergonomics pass + workspace fix + install script
Driven by feedback from testing /kilroy-quick in Claude. Five changes:
1. --prompt-file <path>: read a file verbatim into the "prompt" input
key. Replaces hand-escaped multi-line JSON in --input. Strongly
preferred for anything beyond a one-liner — no \n escapes, no
quoting hazards.
2. Auto --no-cxdb when --config is absent. The zero-config default run
config doesn't populate cxdb addresses, so requiring cxdb was just
noise. Explicit --config with cxdb.binary_addr still enables it.
3. Auto-skip the interactive CLI-backend warning when stdin isn't a
terminal. Uses mattn/go-isatty because a naive Mode&CharDevice check
treats /dev/null as a TTY. Agent-driven invocations, CI, pipes, and
the detach child all hit this path.
4. runs show --latest --label k=v and new runs wait subcommand. show
returns the most recent matching run; wait polls the run DB until
the target reaches a terminal state and exits 0/1/2 for
success/fail/timeout. Both support the same id-or-prefix-or-latest
target resolution.
5. launchDetached was starting the child with cmd.Dir=logs_root, so
the detach child's cwd was the logs dir instead of the user's
workspace — runs reported repo_path pointing at the logs dir and
worktrees never saw the real files. Parent now forwards its own cwd
to the child via --workspace when none was passed explicitly.
Quick-launch workflow package simplified to a single agent node. The
previous stage.sh wrote .kilroy/TASK.md, but the engine rewrites that
file before every node; contents got clobbered. Inputs now land in
.kilroy/INPUT.md (written once at run start) and the agent reads from
there directly.
scripts/install-skills.sh wires everything up idempotently: symlinks
for the binary, the workflows dir, and the skills/commands into
~/.claude, ~/.agents (codex's native discovery path — not
~/.codex/skills), and ~/.config/opencode.
Also rebuilds the SKILL.md to document --prompt-file as the default
path for non-trivial tasks, drops the --no-cxdb / --skip-cli-headless-warning
mentions, and points to runs wait / runs show --latest for the
check-status / retrieve-result flow.
* engine: isolate codex runs from user's shell profile
Kilroy's isolated codex home used to copy both auth.json and config.toml
from the user's real ~/.codex/ into the kilroy-owned codex state dir. That
leaked user-scoped settings (model_reasoning_effort, personality, model)
into kilroy runs, so a setting that worked for the user's interactive
codex sessions could silently break kilroy runs for specific models —
notably `gpt-5-codex` rejecting the inherited `model_reasoning_effort =
"xhigh"` upstream with a 400 during preflight probes.
Two fixes here:
1. Drop the config.toml copy entirely. Run configuration must come from
kilroy and the .dot graph, not by accident from whatever the user has
in ~/.codex/config.toml. If kilroy codex runs need specific settings,
those belong in the graph or run.yaml.
2. When OPENAI_API_KEY is available in the parent env, write a fresh
apikey auth.json into the isolated codex home instead of copying
whatever auth.json the user has. This matches what tmux_handler.go +
templates/codex.go already does for non-probe runs, so the probe
stops diverging from the real run: both paths now force apikey mode
when a key is present.
When no OPENAI_API_KEY is set, kilroy still falls back to copying the
user's auth.json (subscription auth). Probes under that path can't
exercise apikey-only models like gpt-5-codex, but the rest of preflight
still runs against something plausible.
Tests updated: the old assertions on config.toml existence are replaced
with explicit "must-not-exist" checks, and a new test covers the apikey
auth.json write path. Verified end-to-end with a codex graph.codex.dot
quick-launch run (39s, result.md correctly produced).
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
- loadOrBuildConfig() now runs provider auto-detection for all configs, not just zero-config. A config file that omits providers no longer disables auto-detection; config-file values still take precedence.
- require_clean defaults to false: Kilroy creates its own worktree, so the parent repo's cleanliness is irrelevant. Dirty repos no longer block startup. Users can still opt in with require_clean: true.
- Tool nodes now receive KILROY_RUN_ID, KILROY_LOGS_ROOT, KILROY_WORKTREE_DIR, and other stage runtime env vars, matching agent/codergen nodes. Fixes the .ai/runs/$KILROY_RUN_ID/ data-passing convention.
Test plan
- TestToolGraph_PartialConfigAutoDetectsProviders: partial config + env key → provider detected, run succeeds
- TestToolGraph_DirtyRepoSucceedsWithDefaultConfig: dirty repo, no require_clean in config → run succeeds
- TestToolGraph_RunIDInjected: tool node echoes $KILROY_RUN_ID → matches expected run ID
- TestRun_FailsWhenRepoIsDirty: updated to explicitly opt in to RequireClean: true
🤖 Generated with Claude Code