Platform reframe: layered engine, tmux agents, workflow packages #81
Merged
mattleaverton merged 86 commits on Apr 17, 2026
…racts
From first real PR review workflow run:
- Phase 0.9: config/auto-detect conflict, require_clean default, missing env vars in tool nodes, CLI headless warning, error message hints, worktree file-not-found context
- Phase 3.7-3.9: run input contract (--input), output contract, node data passing conventions
- Future work: iteration patterns (dynamic loops over collections)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Setup/build/test tool nodes + single review agent node. Experimental workflow for automated PR triage. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Setup script now copies build-test.sh to .ai/ before gh pr checkout changes the branch and removes workflow files. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Raw git checkout (no gh pr checkout) with unique branch names for parallel safety
- Separate investigate (exploratory, full tool access) and decide (directive, no tools) agents
- Tighter output contract: next actions instead of follow-up tasks
- Setup script preserves all workflow scripts to .ai/ before branch switch
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Absolute path for setup script (works from any repo's worktree)
- Remove Go-specific assumptions from investigate prompt
- Agent discovers build system and runs appropriate checks
- Add freshell run config
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…platform Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Create agents/ (L1) and workflows/ (L2) package directories. Split NewDefaultRegistry into NewCoreRegistry (L0-only) + NewDefaultRegistry (backward compat). Export engine types and methods needed by external handler packages: StatusSource, FallbackStatusPath, StageStatusContract, Truncate, WarnEngine, BuildFidelityPreamble, ClassifyAPIError, etc. Add Engine accessor methods: AppendProgress, CXDBPrompt, CXDBInterviewStarted/Completed/Timeout, LastResolvedFidelity, SetDefault. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add Registry field to RunOptions so cmd/kilroy/ can pass a pre-composed handler registry.
cmd/kilroy/ now creates a layered registry:
- L0: engine.NewCoreRegistry() (start, exit, conditional, tool, parallel, fan_in)
- L1: agents.AgentHandler (codergen/default)
- L2: workflows.HumanGateHandler, workflows.ManagerLoopHandler
Type aliases in agents/ and workflows/ establish the package structure and import direction. Implementations remain in engine/ until Phases 2-3.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
TestCoreRegistry_ToolOnlyGraph demonstrates that a graph using only tool_command nodes executes successfully with NewCoreRegistry (no L1 agent handler or L2 workflow handlers registered). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Rename Go types: CodergenBackend→AgentBackend, CodergenRouter→AgentRouter, SimulatedCodergenBackend→SimulatedAgentBackend, etc.
Rename handler type string: "codergen"→"agent" in registry.
Rename DOT attribute: agent_mode replaces codergen_mode (with fallback).
Rename files: codergen_*.go → agent_*.go.
Update comments, diagnostics, test names.
The CodergenHandler type name is retained in engine/ as the implementation type (aliased as agents.AgentHandler). It will be renamed when the implementation moves to agents/ in Phase 2.1.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New package internal/attractor/rundb backed by modernc.org/sqlite (pure Go, no CGO). Global DB at ~/.local/state/kilroy/runs.db.
Features:
- Auto-applying numbered SQL migrations on DB open
- WAL mode for concurrent reads
- Schema: runs (with labels, inputs, timing), node_executions (with attempts), edge_decisions, provider_selections
- Write ops: InsertRun, CompleteRun, InsertNodeStart, CompleteNode, InsertEdgeDecision, InsertProviderSelection
- Read ops: GetRun, LatestRun, ListRuns (filter by status/labels/graph), GetNodeExecutions
- Prune ops: PruneRuns (by date, graph, labels, or orphaned logs_root)
- Cascade deletes: child records deleted with parent run
- 16 tests covering all operations
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
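Auto-applying numbered migrations comes down to picking, in order, the migrations newer than the version the DB already has. `Migration` and `pendingMigrations` are hypothetical names; running each pending migration in a transaction and recording its number (for example via SQLite's `PRAGMA user_version`) is an assumption about how the version is tracked.

```go
package main

import (
	"fmt"
	"sort"
)

// Migration is one numbered schema change (hypothetical type).
type Migration struct {
	Version int
	SQL     string
}

// pendingMigrations returns, in ascending order, the migrations newer than
// current. Duplicate version numbers are rejected so two files cannot claim
// the same slot.
func pendingMigrations(all []Migration, current int) ([]Migration, error) {
	sorted := append([]Migration(nil), all...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Version < sorted[j].Version })
	var out []Migration
	for i, m := range sorted {
		if i > 0 && sorted[i-1].Version == m.Version {
			return nil, fmt.Errorf("duplicate migration version %d", m.Version)
		}
		if m.Version > current {
			out = append(out, m)
		}
	}
	return out, nil
}
```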
Add RunDBWriter interface in engine/ (no rundb import needed).
Engine records to RunDB at every lifecycle point:
- run start (after worktree ready)
- node start/complete (each execution)
- edge selection decisions
- run complete (success/failure)
All RunDB calls are best-effort (warn on error, never block). cmd/kilroy/ opens global DB and passes via RunOptions.RunDB. Integration test proves tool graph produces correct DB entries.
Also fix RunOptions propagation: RunDB, Registry, Labels now properly forwarded through bootstrapRunWithConfig overrides.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- runs list: queries RunDB first, falls back to filesystem scan. Now shows duration column.
- runs prune: delegates to RunDB with filter support (before, graph, labels, orphans).
- status --latest: instant lookup via RunDB.LatestRun() instead of filesystem scan.
All commands gracefully fall back to filesystem-based behavior when the RunDB is unavailable or empty.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Structured inputs for graph runs via --input flag.
Features:
- LoadInputFile (YAML/JSON) and LoadInputString (inline JSON)
- Graph declares required inputs via inputs="key1,key2" attribute
- ValidateRequiredInputs rejects runs with missing declared inputs
- Input values injected into context as input.* keys
- Input values expanded in prompts as $input.key placeholders
- Input values available as KILROY_INPUT_* env vars in tool_command nodes
- Inputs recorded in RunDB
Integration test proves tool_command nodes see KILROY_INPUT_* env vars.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Separates graph file location from execution location:
- --workspace /path/to/dir sets the execution directory
- --graph /path/to/graph.dot determines where prompt_file resolves
- When --workspace is omitted, defaults to current behavior
- GraphDir derived from --graph path for prompt_file resolution
- Workspace flows through RunOptions → engine → worktree creation
PrepareOptions gains GraphDir field that takes precedence over RepoPath for prompt_file resolution, enabling cross-repo workflows where graphs and scripts live separately from the target project.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Graphs declare expected output artifacts via outputs="file1,file2" attribute.
After run completion, the engine:
- Searches for declared outputs in the worktree
- Copies found outputs to {logs_root}/outputs/
- Writes outputs.json manifest with found/missing status and file sizes
- Emits warnings for missing declared outputs (not errors)
- Records output collection in progress events
Hooked into persistTerminalOutcome so outputs are collected on every
run completion (success or failure).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add canonical /runs endpoints per platform-reframe plan:
- GET /runs: list runs from RunDB (with status/graph filters)
- GET /runs/{id}/outputs: list collected output artifacts
Existing /pipelines endpoints retained as backward-compat aliases.
All /runs endpoints mirror /pipelines for submit, status, events,
cancel, context, and questions.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Run lifecycle management additions:
- --label KEY=VALUE flag on attractor run (stored in RunDB)
- --older-than 7d duration-based prune filter (supports d/h/m units)
- Labels passed through to RunDB via RunOptions
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New packages for Layer 1 agent capabilities:
agents/tmux/ — tmux session management:
- Session creation with two-step pattern (shell → respawn-pane)
- Input delivery with sanitization, chunking, and Enter verification
- Output capture with NBSP normalization for prompt detection
- Readiness/idle/exit detection via polling with busy indicators
- Process tree cleanup (SIGTERM → grace → SIGKILL)
- Socket isolation (kilroy-specific tmux socket)
- Session environment variable storage for metadata
- 11 integration tests against real tmux on isolated socket
agents/templates/ — per-tool invocation templates:
- Template struct with per-tool config (args, env, prompt prefix,
busy indicators, startup dialogs, exit behavior)
- Built-in templates: claude, codex, gemini, opencode
- Template registry for name-based lookup
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
TestSmoke_Claude_PrintMode spawns Claude Code via tmux in --print mode, waits for exit, and captures output. Verified working: Claude returns KILROY_SMOKE_OK, session exits with status 0. Tests skip gracefully when API keys or CLI tools are unavailable. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
TmuxAgentHandler implements engine.Handler and orchestrates the full agent lifecycle via tmux sessions:
1. Resolve tool template from node attributes (agent_tool or llm_provider)
2. Build command and environment from template
3. Create tmux session on isolated socket
4. Handle startup dialogs (trust prompts, permission warnings)
5. Wait for completion (exit-based or idle-detection)
6. Capture output and build outcome
7. Clean up session and process tree
Two agent handlers now available:
- AgentHandler: existing subprocess/API backend (backward compat)
- TmuxAgentHandler: tmux-based CLI sessions (new)
Also adds SendKeys method to tmux.Manager for dialog interaction.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Wire TmuxAgentHandler into cmd/kilroy/ via --tmux flag. Add exit code
detection via tmux #{pane_dead_status}. Three integration test scenarios
verified against real Claude via tmux:
1. Simple agent task: Claude creates a file, tool node verifies it exists
→ status: success, KILROY_TMUX_TEST_PASS confirmed
2. Multi-node pipeline: Claude writes calc.sh, tool node executes it,
conditional routes on result → 42 computed, routed to success exit
3. Failure routing: Agent succeeds, tool node intentionally fails (cat
nonexistent file), conditional routes to fail exit → correct routing
All three runs recorded in RunDB with timing. TmuxAgentHandler correctly:
- Spawns Claude in tmux sessions on isolated socket
- Passes prompt and environment variables
- Captures output and exit codes
- Detects failures via non-zero exit codes
- Cleans up sessions and process trees
Also adds 3 unit tests with fake agent scripts proving handler contract:
success path, failure detection (exit code 1), and workdir file creation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add event envelope canonicalization task (3.6), workflow.toml concept for packages, run retro idea, and testing emphasis notes from Phase 2 review. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Defines the GitOps interface that encapsulates all git operations the engine needs. When nil, the engine will operate in plain-directory mode. Added GitOps field to RunOptions and Engine struct. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
GitHook implements the engine.GitOps interface, wrapping gitutil functions for worktree isolation, per-node commits, and branch management. This is the Layer 2 implementation that will replace direct gitutil calls in the engine. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The engine package now has zero direct gitutil imports. All git operations go through the GitOps interface, which is optional. When GitOps is nil, the engine operates in plain-directory mode: no worktrees, no commits, no branch management.
Key changes:
- engine.run() conditionally sets up git workspace via GitOps
- checkpoint() only commits when GitOps is set
- parallel_handlers use GitOps for branch workspace isolation, falling back to temp directory copy for no-git mode
- resume uses GitOps for worktree recreation
- config_defaults accepts GitOps parameter
- cmd/kilroy/ creates GitHook and wires it through
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
GitOps is now auto-detected: if the workspace (or cwd when no workspace is specified) is a git repo, git worktrees and commits are enabled. Otherwise, runs proceed in plain-directory mode.
DefaultRunConfig now accepts an explicit repoPath parameter so --workspace correctly routes to the git repo.
Tested against real binary:
- Graph in git repo: worktree + commits created
- Graph in plain dir: runs successfully without git
- Graph with --workspace to git repo: worktree + commits created
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds a pluggable AutoDetectGitOps factory function that eng.run() and bootstrapRunWithConfig call when GitOps is not explicitly set. This preserves backward compatibility: existing callers that set RepoPath to a git repo automatically get git worktree behavior.
Fixes:
- Branch engine now inherits GitOps from parent (parallel commits work)
- TestRun_FailsWhenNotAGitRepo renamed to TestRun_SucceedsInNonGitDir (non-git dirs are now valid, which is the intended Phase 3.1 behavior)
- cmd/kilroy/ registers AutoDetectGitOps at init time
- Test TestMain registers testGitOps auto-detection
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Serves the canonical run.log with query params: ?node=, ?source=, ?event=, ?since=, ?tail= for filtered reads, and ?stream=true for live SSE tailing via polling. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Each template gets a LogLocator that finds the CLI tool's conversation log, and a parser that extracts tool_call, tool_result, text, and thinking events. The tmux handler emits parsed events to RunLog after agent completion. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Emits worktree.created when the run worktree is set up, and commit events with diff stats when recordNodeDiff finds file changes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move rundbRecordProviderIfAgent to after executeWithRetry so provider/model attrs are populated. Pass llm_model from node attributes through to CLI tool --model flags. Record agent_tool as the backend in provider_selections. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Exercises input/output contracts, .kilroy/ convention files, agent_tool routing across claude/codex/opencode, edge conditions, and run log events. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Claude: add --bare flag to skip keychain/OAuth and rely purely on the ANTHROPIC_API_KEY env var. Removes the need for startup dialog handling.
Codex: use exec subcommand with --full-auto --skip-git-repo-check. Write an isolated auth.json under CODEX_HOME per session so codex uses the API key without touching ~/.codex/.
OpenCode: add --format json --pure --dir flags. Inject provider config via OPENCODE_CONFIG_CONTENT env var for keyless config isolation.
Add PrepareSession hook to Template for per-tool filesystem setup before tmux session creation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The CLI preflight probe uses the old subprocess invocation path which doesn't match tmux template auth isolation. Skip it when the caller knows the tools are configured. Also fix codex node to use o3-mini model (can't probe claude model on the openai provider). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Claude: normalize dots to dashes in model ID (claude-sonnet-4.6 → claude-sonnet-4-6), since the Claude CLI uses dash format.
Codex: auth.json auth_mode must be lowercase "apikey", not "ApiKey".
OpenCode: model format is provider/model (anthropic/claude-sonnet-4-6); add prefix and normalize dots.
Also fix SkipPreflight not propagating through RunOptions override copy.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
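The model-ID rewrites above are small string transforms. A sketch with hypothetical helper names; restricting dot normalization to the `anthropic` provider is an assumption made here so that IDs such as `gpt-5.4-nano` pass through unchanged.

```go
package main

import "strings"

// claudeModelID converts dotted versions to the dash form the Claude CLI
// expects, e.g. "claude-sonnet-4.6" -> "claude-sonnet-4-6".
func claudeModelID(model string) string {
	return strings.ReplaceAll(model, ".", "-")
}

// openCodeModelID returns the provider/model form OpenCode expects. An
// existing "provider/" prefix is kept; dots are normalized only for
// anthropic models (an assumption of this sketch).
func openCodeModelID(provider, model string) string {
	if p, m, ok := strings.Cut(model, "/"); ok {
		provider, model = p, m
	}
	if provider == "anthropic" {
		model = claudeModelID(model)
	}
	return provider + "/" + model
}
```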
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
o3-mini doesn't support codex's web_search_preview tool. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
--full-auto implied web_search_preview which most models reject. Use --sandbox workspace-write directly. Switch to gpt-5.4-nano which supports codex's tool set. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Each CLI template now produces structured output (stream-json for
claude, --json for codex, --format json for opencode). The handler
redirects stdout to {stageDir}/agent_output.jsonl and parses it
directly — no more hunting through tool-specific log directories.
LogLocator remains as fallback for non-structured-output modes.
Response text is extracted from the JSONL for response.md.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Codex defaults web_search="cached" which sends web_search_preview tool on every request. Most small models reject it. Disable it since Kilroy agents don't need web search. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Parsers were written speculatively. Now matched to real output:
- Codex: item.completed/started with agent_message, command_execution
- OpenCode: tool_use with nested part.state, text events
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix .git detection to use -e (file or dir) for worktree support
- Convert prompt_file to inline prompts with KILROY_STAGE_STATUS_PATH
- Add output contract declarations on agent nodes
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add TailJSONL that watches agent_output.jsonl and emits events to RunLog as lines appear. Refactor parsers to expose per-line functions (ParseClaudeLine, ParseCodexLine, ParseOpenCodeLine) used by both the tailer and batch parsing. The tailer starts when the tmux session is created and stops when the agent exits. Events flow through RunLog to the SSE endpoint in real time, so the UI sees agent tool calls as they happen rather than in a batch after completion. Falls back to batch parsing when structured output isn't available. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When structured output is redirected to a file, the tmux pane is empty and the stall watchdog sees no progress. The real-time tailer now calls TickStallWatchdog on each parsed event, keeping the watchdog alive during agent execution. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When eng.run() returns an error (stall watchdog, context cancellation, etc.), the run stayed as "running" in the DB forever. Now RunWithConfig records a fail status before returning the error. The existing ReconcileStaleRuns on server startup handles panics and unclean exits as a safety net. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…cel fix
- SQLite: add busy_timeout(5000), synchronous(normal), SetMaxOpenConns(1)
so concurrent detached runs don't silently fail to register in the DB
- Detach: forward --input, --workspace, --package, --tmux, --skip-preflight,
--label flags to child process; resolve relative paths to absolute
- Invocation capture: record os.Args and run config in manifest.json and
runs DB (new migration 004); expose in API response
- Prefix ID matching: GET /runs/{short-id} resolves to full run ID via
DB prefix query and in-memory registry scan
- Cancel: fall back to PID-based SIGTERM for CLI-launched detached runs
that aren't in the server's in-memory pipeline registry
- AGENTS.md: document backend:cli vs api, --tmux flag, correct run config
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Bundles the single-file Kilroy dashboard SPA (index.html + Graphviz WASM worker) into the binary via //go:embed so `kilroy attractor serve` exposes a working dashboard at http://localhost:9700/ui/ with no extra processes or CORS configuration. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Artifact capture: new node_execution_artifacts table (migration 005). At rundbRecordNodeComplete, ingest stage files (prompt, response, agent_output.jsonl, events.ndjson, status, stdout/stderr, tool_timing, etc.) as blobs keyed by node_execution_id. Each retry and each loop iteration gets its own DB row + captured artifacts, fixing retry history loss and enabling loop iteration history.
- handleGetNodeTurns now serves from DB first with filesystem fallback for legacy runs. Response includes source="db"|"filesystem" so the UI can tell.
- Loop primitive: new trapezium (loop.begin) and invtrapezium (loop.end) node shapes for multi-node loops, plus loop_count/loop_until_file/loop_until_file_contains/loop_max attributes on any node for single-node loops. Termination evaluated after each iteration; loop_max exceeded fails the run. Separate from existing loop_restart, which only handles transient_infra failure restarts.
- Loop iterations tracked in Engine.loopIterations so each iteration gets a distinct attempt number in node_executions (currently on the loop-back target; body-node attempt numbering is a UX follow-up).
- Label filtering wired end to end: GET /runs?label=KEY=VALUE&limit=N and kilroy attractor runs list --label --status --graph --limit. Underlying DB filter already existed; just surfaced to API and CLI.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- activeLoopIteration tracks current iteration across an entire loop body
so every node execution inside a multi-node loop records a distinct
attempt number (previously only the jump target got incremented, body
nodes all recorded attempt=1).
- captureReferencedScripts reads tool_invocation.json, tokenizes argv and
command fields (handling bash -c "sh script.sh" pattern), and captures
referenced script files as tool_script:<name> artifacts. The UI shows
them alongside stdout/stderr in the Detail tab.
- New endpoint GET /runs/{id}/nodes/{nodeId}/attempts returns all attempt
rows for a node. GET /runs/{id}/nodes/{nodeId}/turns now accepts
?attempt=N to load a specific iteration's captured artifacts.
- UI: sidebar shows iteration badges (↻1/5, ↻2/5, ...) when a node has
multiple attempts. Detail tab shows "Iteration N of M" banner and
passes n.attempt to the turns fetch so each iteration loads its own
data. Command and captured scripts render in the Detail view.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New process-flow-level primitive for running independent node chains in parallel in the shared workspace. Distinct from the existing parallel handler (shape=component), which is worktree-isolated and winner-takes-all for LLM code-gen branching.
- Shapes: pentagon → concurrent.split, cylinder → concurrent.join. Paired via concurrent_id attribute (defaults to the split node's ID).
- runConcurrentRegion dispatches each outgoing edge from the split as a goroutine running runBranchUntilJoin. All branches share the engine's context, DB writer, git worktree, and progress sink. Each node executes through the same rundbRecordNodeStart/executeWithRetry/CompleteNode/CaptureArtifacts sequence as the main loop.
- Fail-fast: first branch error cancels the parent context, siblings exit at their next cancellation checkpoint. Optional allow_partial=true attribute on the split disables fail-fast.
- Git commits: suppressed for non-sentinel nodes while concurrentDepth > 0. Concurrent region is treated as one atomic checkpoint unit.
- Rejects nested concurrent regions and loops inside concurrent regions as runtime errors. Graph validation rule can be added later.
Known follow-up: subprocess cancellation does not kill running child processes (sleep in a cancelled branch runs to completion); branch goroutines see the cancelled context but the tool handler's exec doesn't propagate the kill. Separate lifecycle concern, not specific to the concurrent primitive.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…rrent/loop
Subprocess cancellation:
- ToolHandler now runs commands in their own process group via setProcessGroupAttr (Setpgid=true) and sets cmd.Cancel to forceKillProcessGroup so context cancellation kills the entire process tree, not just the shell. Before this fix, a cancelled `bash -c "sleep 20"` left sleep as an orphan with the stdout pipe open, and cmd.Wait() blocked for the full 20s. Verified with a fail-fast concurrent test: total run time dropped from 20.5s to 1.1s.
Graph validation:
- lintConcurrentSplitMinBranches: concurrent_split requires ≥2 outgoing edges
- lintConcurrentSplitHasJoin: concurrent_split must have a paired concurrent_join
- lintNoNestedConcurrentRegions: concurrent regions cannot be nested
- lintNoLoopsInConcurrentRegions: loops cannot be nested in concurrent regions
- Pairs are matched by concurrent_id attribute (falling back to node ID)
- nodesBetween walks the graph from the split forward to the join to build the "inside the region" set
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds a minimal two-node workflow (stage + agent) for kicking off one-shot
investigation runs with --input '{"prompt":...,"context_file":...}'. Three
graph variants route to claude, codex, or gemini via the existing
agent_tool/model_stylesheet mechanism.
Adds `kilroy attractor runs show <id-or-prefix>` with --json, --outputs, and
--print <file> modes so an agentic caller can pull result.md (or any declared
output) back out without poking at the logs directory by hand. runs list
--json now carries worktree_dir, repo_path, run_branch, and logs_root too.
New quick-launch skill gives agents the exact invocation for firing a one-shot delegated run: --detach --tmux + --package + --label + --input and the follow-up runs list / runs show / runs show --print flow for checking status and pulling result.md back out. Structured after the trycycle subskill style: action-oriented steps, no theory. using-kilroy was missing several current flags (--package, --tmux, --label, --input, --workspace, --skip-cli-headless-warning) and had no coverage of the runs subcommand family, so those gaps are filled in alongside a pointer to quick-launch for the one-shot case.
Adds skills/quick-launch/commands/kilroy-quick.md as the canonical slash command file, symlinked into ~/.claude/commands/ and ~/.codex/commands/ at install time. One source of truth, live-editable from the repo. Updates SKILL.md to reference ~/.local/share/kilroy/workflows/quick-launch (installed as a symlink) instead of an <ABS_PATH> placeholder, and drops the --config requirement — kilroy auto-builds a default run config when cwd is a git repo and auto-detects installed provider CLIs. Verified with a bare git init + config-less launch.
Driven by feedback from testing /kilroy-quick in Claude. Five changes:
1. --prompt-file <path>: read a file verbatim into the "prompt" input key. Replaces hand-escaped multi-line JSON in --input. Strongly preferred for anything beyond a one-liner: no \n escapes, no quoting hazards.
2. Auto --no-cxdb when --config is absent. The zero-config default run config doesn't populate cxdb addresses, so requiring cxdb was just noise. Explicit --config with cxdb.binary_addr still enables it.
3. Auto-skip the interactive CLI-backend warning when stdin isn't a terminal. Uses mattn/go-isatty because a naive Mode&CharDevice check treats /dev/null as a TTY. Agent-driven invocations, CI, pipes, and the detach child all hit this path.
4. runs show --latest --label k=v and a new runs wait subcommand. show returns the most recent matching run; wait polls the run DB until the target reaches a terminal state and exits 0/1/2 for success/fail/timeout. Both support the same id-or-prefix-or-latest target resolution.
5. launchDetached was starting the child with cmd.Dir=logs_root, so the detach child's cwd was the logs dir instead of the user's workspace; runs reported repo_path pointing at the logs dir and worktrees never saw the real files. The parent now forwards its own cwd to the child via --workspace when none was passed explicitly.
Quick-launch workflow package simplified to a single agent node. The previous stage.sh wrote .kilroy/TASK.md, but the engine rewrites that file before every node, so its contents got clobbered. Inputs now land in .kilroy/INPUT.md (written once at run start) and the agent reads from there directly.
scripts/install-skills.sh wires everything up idempotently: symlinks for the binary, the workflows dir, and the skills/commands into ~/.claude, ~/.agents (codex's native discovery path, not ~/.codex/skills), and ~/.config/opencode.
Also rebuilds the SKILL.md to document --prompt-file as the default path for non-trivial tasks, drops the --no-cxdb / --skip-cli-headless-warning mentions, and points to runs wait / runs show --latest for the check-status / retrieve-result flow.
Kilroy's isolated codex home used to copy both auth.json and config.toml from the user's real ~/.codex/ into the kilroy-owned codex state dir. That leaked user-scoped settings (model_reasoning_effort, personality, model) into kilroy runs, so a setting that worked for the user's interactive codex sessions could silently break kilroy runs for specific models: notably, `gpt-5-codex` rejected the inherited `model_reasoning_effort = "xhigh"` upstream with a 400 during preflight probes.
Two fixes here:
1. Drop the config.toml copy entirely. Run configuration must come from kilroy and the .dot graph, not by accident from whatever the user has in ~/.codex/config.toml. If kilroy codex runs need specific settings, those belong in the graph or run.yaml.
2. When OPENAI_API_KEY is available in the parent env, write a fresh apikey auth.json into the isolated codex home instead of copying whatever auth.json the user has. This matches what tmux_handler.go + templates/codex.go already do for non-probe runs, so the probe stops diverging from the real run: both paths now force apikey mode when a key is present.
When no OPENAI_API_KEY is set, kilroy still falls back to copying the user's auth.json (subscription auth). Probes under that path can't exercise apikey-only models like gpt-5-codex, but the rest of preflight still runs against something plausible.
Tests updated: the old assertions on config.toml existence are replaced with explicit "must-not-exist" checks, and a new test covers the apikey auth.json write path. Verified end-to-end with a codex graph.codex.dot quick-launch run (39s, result.md correctly produced).
Summary
This PR reframes Kilroy as a layered platform: a core graph-execution engine (L0), an agent-execution layer that drives CLI tools via tmux (L1), and workflow packages that ship as self-contained directories with graphs, scripts, and manifests (L2). It adds the primitives and server infrastructure needed to run multi-agent workflows reliably from a headless process: explicit loop and concurrent split/join nodes, a SQLite run database, a per-run structured activity log, a REST API with SSE streaming, and an embedded dashboard UI. Three ready-to-use workflow packages ship with the branch: `quick-launch`, `pr-review`, and `build-test`.

This unblocks building agentic automation that can be launched from other agents (the `quick-launch` package is already used that way), observed in real time via the log API, and extended by adding new workflow packages without touching engine code.
What's new
Layered architecture (L0/L1/L2)
- The engine (`internal/attractor/engine/`) is now L0-only: graph execution, node dispatch, and lifecycle hooks, with no agent or git specifics inside.
- The L1 agent layer (`internal/attractor/agents/`) provides `TmuxAgentHandler`, which is registered as the `agent` handler at startup.
- The L2 workflow layer (`internal/attractor/workflows/`) provides `GitHook`, `HumanGateHandler`, and the package loader, which are registered at startup in `cmd/kilroy/main.go` via `newLayeredRegistry()`.
- A `GitOps` interface in the engine makes git mode fully optional: when `GitOps` is nil, the engine runs in plain-directory mode, enabling non-git workflows.
- `codergen` has been renamed to `agent` throughout (types, files, test names).

Workflow packages and CLI flags
- `--package <dir>`: loads a self-contained workflow from a directory containing `graph.dot`, an optional `workflow.toml`, and `scripts/` and `prompts/` subdirectories. Package scripts and prompts are materialized into `.kilroy/package/` inside the workspace before execution.
- `--workspace <dir>`: sets the execution directory independently of the source repo.
- `--input <path-or-json>`: passes structured inputs as a JSON/YAML file or inline JSON. Values are injected as `KILROY_INPUT_<KEY>` env vars and written to `.kilroy/INPUT.md`.
- `--prompt-file <path>`: stages a file as the node prompt.
- `--label KEY=VALUE`: attaches metadata to a run; labels are queryable via `runs list` and `runs show`.
- `--tmux`: enables the tmux agent handler for the run.
- `runs show`, `runs wait`: inspect and wait on a run by ID or prefix; `--latest` combined with `--label` picks the most recent matching run.

Engine primitives
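A sketch of how the loop termination attributes could combine. The semantics are assumptions: `loop_count` means an exact number of iterations, `loop_max` caps conditional loops, `loop_until_file_contains` is checked inside the `loop_until_file` path (empty means the file only has to exist), and `loop_while_outcome=fail` keeps iterating while the previous attempt failed. `LoopSpec`, `shouldContinue`, and `runLoop` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strings"
)

// LoopSpec mirrors the loop_* node attributes (hypothetical struct).
type LoopSpec struct {
	Count         int    // loop_count: exact number of iterations (0 = unset)
	Max           int    // loop_max: safety cap for conditional loops (0 = unset)
	UntilFile     string // loop_until_file: stop once this file exists...
	UntilContains string // loop_until_file_contains: ...and contains this text
	WhileOutcome  string // loop_while_outcome: continue while last outcome equals this
}

// validate rejects loops that could never terminate on their own.
func (s LoopSpec) validate() error {
	if s.Count <= 0 && s.Max <= 0 {
		return errors.New("loop needs loop_count or loop_max")
	}
	return nil
}

// shouldContinue decides whether to start another iteration after `done`
// iterations, where `last` is the outcome of the most recent one.
func shouldContinue(s LoopSpec, done int, last string, readFile func(string) ([]byte, bool)) bool {
	if s.Count > 0 {
		return done < s.Count
	}
	if done >= s.Max {
		return false
	}
	if s.UntilFile != "" {
		// strings.Contains(x, "") is true, so an empty UntilContains means "file exists".
		if data, ok := readFile(s.UntilFile); ok && strings.Contains(string(data), s.UntilContains) {
			return false
		}
	}
	if s.WhileOutcome != "" && done > 0 && last != s.WhileOutcome {
		return false
	}
	return true
}

// runLoop executes body with a fresh attempt number per iteration.
func runLoop(s LoopSpec, body func(attempt int) string, readFile func(string) ([]byte, bool)) (int, string, error) {
	if err := s.validate(); err != nil {
		return 0, "", err
	}
	done, last := 0, ""
	for shouldContinue(s, done, last, readFile) {
		done++
		last = body(done)
	}
	return done, last, nil
}

// osReadFile adapts os.ReadFile to the (data, ok) shape used above.
func osReadFile(path string) ([]byte, bool) {
	data, err := os.ReadFile(path)
	return data, err == nil
}

func main() {
	n, last, err := runLoop(LoopSpec{Count: 3}, func(attempt int) string {
		fmt.Println("attempt", attempt)
		return "success"
	}, osReadFile)
	fmt.Println(n, last, err)
}
```

Passing the attempt number into each iteration is what would let every pass be recorded as its own row, as described for loop nodes.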
- Loop nodes (`internal/attractor/engine/loop.go`): explicit single-node or multi-node iteration. Termination conditions: `loop_count`, `loop_max`, `loop_until_file`, `loop_until_file_contains`, `loop_while_outcome=fail`. Each iteration gets its own attempt number and DB row for observability. Nested loops are rejected at validation.
- Concurrent split/join (`internal/attractor/engine/concurrent.go`): runs independent node chains concurrently in the same workspace and waits for all branches. Configured via the `concurrent_id` node attribute; `allow_partial=true` permits branch failures. Nested concurrent scopes are rejected.
- Output contracts: nodes declare expected outputs via the `outputs=` attribute; the engine enforces their existence after the node completes and writes `.kilroy/FEEDBACK.md` on failure.
- Data-passing files (`internal/attractor/engine/kilroy_files.go`): the engine writes `INPUT.md`, `CONTEXT.md`, `TASK.md`, and `FEEDBACK.md` as standard inter-node data files.
- `KILROY_STAGE_STATUS_PATH` / `KILROY_STAGE_STATUS_FALLBACK_PATH` env vars let scripts signal completion status.

Agent execution via tmux
- `TmuxAgentHandler` (`internal/attractor/agents/tmux_handler.go`) spawns each agent node in a named tmux session on the `kilroy` socket. Sessions persist after completion for inspection; pane output is tailed in real time.
- The tmux session manager (`internal/attractor/agents/tmux/`) handles create, destroy, wait, and health checks. A process-group kill (`SIGTERM`, then `SIGKILL`) on context cancel means no processes are orphaned.
- Tool templates (`internal/attractor/agents/templates/`) define per-tool startup, env, and structured-output behavior for `claude`, `codex`, `opencode`, and `gemini`. Adding a new CLI tool requires only a new template, no handler code.
- Codex isolation: `auth.json` is placed into an isolated `CODEX_HOME` (written in apikey mode when `OPENAI_API_KEY` is set, otherwise copied from the user's `~/.codex/`); the user's `~/.codex/config.toml` is explicitly excluded so user-local settings (model, reasoning effort) cannot bleed into kilroy runs.
- Agent log parsing (`internal/attractor/agents/agentlog/`): structured `AgentEvent` extraction from Claude JSONL, Codex JSONL, and OpenCode JSONL. A live tailer reads the log during execution and emits events to the RunLog.

RunLog and observability
- `RunLog` (`internal/attractor/engine/runlog.go`): newline-delimited JSON activity log written to `{run_logs_root}/run.log`. Every lifecycle event (run start/end, node start/end, edge decision, checkpoint, git activity, tool stdout/stderr line, agent conversation event) is emitted as a timestamped `RunLogEvent`.
- `GET /runs/{id}/log`: HTTP endpoint that serves `run.log` with optional query filters (`?node=`, `?source=`, `?event=`, `?since=`, `?tail=N`) and SSE streaming (`?stream=true`) for live tailing without polling.
- Git events (such as `worktree.created` and `commit`) appear in the RunLog, so the full timeline is in one place.

Server and REST API
- `internal/server/` now covers a full run-management API:
  - `POST /runs`: accepts a workflow package, inputs, workspace, and labels to start a run.
  - `GET /runs`, `GET /runs/{id}`: run list and detail from the RunDB (no filesystem-only fallback for completed runs).
  - `GET /runs/{id}/log`: RunLog with SSE.
  - `GET /runs/{id}/nodes/{nodeId}/diff`: per-node git diff.
  - `GET /runs/{id}/files/{path...}`, `GET /runs/{id}/workspace/{path...}`: file browser.
  - `GET /workflows`: lists available packages.
  - `/pipelines/...` is kept as a backward-compatibility alias.
- Embedded dashboard at `/ui/` (`internal/server/ui/index.html`, `viz.js`, `viz-render.js`), compiled into the binary via `//go:embed`. No separate asset server is needed.
- Run ID prefix resolution: `runs show 01KPB` resolves to the unique matching run, same as the CLI.
- The RunDB (`internal/attractor/rundb/rundb.go`) supports safe concurrent server writes.

Workflows shipped
- `workflows/quick-launch/`: single-agent task runner. Accepts `prompt` and an optional `context_file` or `context`. Three graph variants: `graph.dot` (Claude), `graph.codex.dot` (Codex), `graph.gemini.dot` (Gemini). Ships with a kilroy slash-command skill and an install script.
- `workflows/pr-review/`: full PR review pipeline (checkout, build/test, per-file code review with Claude, holistic review, combined report). Accepts `pr_repo` and `pr_number` inputs; emits `review-report.md`.
- `workflows/build-test/`: build-and-test workflow for CI-style validation.
- `workflows/multi-tool-exercise/`: multi-agent graph used to validate concurrent tool execution and observability.

Reliability fixes
- Subprocesses are started in their own process group (`Setpgid: true`); on context cancellation the engine sends `SIGTERM` to the group, falling back to `SIGKILL` if needed (`internal/attractor/engine/process_group_unix.go`). This prevents orphaned child processes on run cancel.
- Validation rejects nested `concurrent` and `loop` scopes at graph load time.

Breaking changes
- `codergen` is renamed to `agent` throughout internal types and test file names. Any code outside this repo that imports `internal/attractor/engine` types named `Codergen*` will need to be updated.
- `--graph` remains the primary flag; `--package` is additive. No existing graph-based invocations change.
- The `/pipelines/` server API is retained as an alias, but the canonical path is `/runs/`.
- `internal/attractor/workflows/` now contains `HumanGateHandler` (moved out of the engine package). Callers that registered a `wait.human` handler directly into the engine registry will need to use the L2 package instead, or register `workflows.NewHumanGateHandler()` themselves.

Known gaps and follow-ups
- The supervisor (`internal/attractor/workflows/supervisor.go`) is present but not yet wired to any endpoint or scheduler.
- On Windows (`process_group_windows.go`), subprocess cleanup on cancel is not guaranteed.
- CXDB: `engine/cxdb_hook.go` and `engine/` document the CXDB integration boundary, but CXDB remains embedded in the engine; extraction to a separate service is a planned follow-up.
- The Gemini graph variant (`graph.gemini.dot`) exists, but the Gemini template is minimal and untested end-to-end.

Test plan
- `go build ./...` compiles cleanly.
- `go test ./...` passes (unit and integration tests; integration tests need external services and skip when those are unavailable).
- `kilroy attractor run --graph demo/tmux-agent-test/graph.dot --tmux` completes without error; check that the tmux session is cleaned up.
- `kilroy attractor run --package workflows/quick-launch --input '{"prompt":"say hello"}' --workspace /tmp/ql-test --tmux` produces a `result.md` in the workspace.
- `kilroy attractor runs list` shows the run; `kilroy attractor runs show --latest` prints its detail.
- `kilroy attractor serve` starts; visit `http://localhost:8080/ui/` and confirm the dashboard loads and lists the completed runs.
- `GET http://localhost:8080/runs/<id>/log` returns the NDJSON log; add `?stream=true` and confirm the connection stays open while a run is in progress.
- Run a graph with a `loop_count=3` node and verify that three distinct attempt rows appear in the `runs show --json` output.
- Confirm no kilroy tmux sessions are left behind (`tmux ls -L kilroy` should be empty).