From 2d6f5d5937cc90f733e0f3a2e6697ce6ad2c7183 Mon Sep 17 00:00:00 2001
From: Nillo
Date: Thu, 18 Jun 2026 22:10:00 +0200
Subject: [PATCH] docs(roadmap): add Warp remote autonomous worker section
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Document the planned Warp feature (recallable remote autonomous worker):
how the remote and local daemons coordinate over git as the cross-machine
transport (no socket), a plain-language explanation, and the technical
design — curated handoff artifact, CAS-ref lease + epoch fencing,
event-driven pickup/recall via post-receive/webhook with ls-remote
fallback, heartbeat/status ref, short-lived branch-scoped credential —
plus reuse map, build order, execution model, and verification.

Also commits the previously-staged "Planned Single-Terminal Pair Mode"
roadmap subsection already present in the working tree.

Co-Authored-By: Claude Opus 4.8
---
 docs/ROADMAP.md | 193 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 193 insertions(+)

diff --git a/docs/ROADMAP.md b/docs/ROADMAP.md
index 961425e..6dc6b5f 100644
--- a/docs/ROADMAP.md
+++ b/docs/ROADMAP.md
@@ -342,6 +342,57 @@ Known limits in the current implementation:
 
 ## Roadmap
 
+### Planned Single-Terminal Pair Mode
+
+The current `ctxrelay pair` command always launches a full live pair as two
+foreground provider sessions: one `ctxrelay claude` terminal and one
+`ctxrelay codex` terminal. `--no-tui` suppresses the ContextRelay control deck,
+but it does not make either provider headless. There is no implemented
+`ctxrelay pair --headless` mode today.
+
+A future single-terminal mode could reduce terminal sprawl by letting the user
+pick one interactive lead and run the other side without a visible terminal:
+
+```text
+ctxrelay pair --lead claude --headless codex
+ctxrelay pair --lead codex --headless claude
+```
+
+This should be treated as a new runtime mode, not a rename of backup agents.
+Backup agents are already headless, but they are one-shot, read-only helpers for
+second opinions. A headless pair peer would need a clearer contract:
+
+- **Interactive lead:** the selected foreground agent owns the visible
+  conversation and receives human input.
+- **Headless peer:** the other provider can respond to handoffs, deliberations,
+  and explicit review requests without opening a second terminal.
+- **Coordinator remains separate:** `ctxrelay coordinator claude|codex|human`
+  still decides git-write ownership; `--lead` only controls which provider is
+  foreground.
+- **No hidden auto-work by default:** the headless peer should act only when
+  routed a task, unless a separate autonomy mode is explicitly enabled.
+- **Ledger-first behavior:** every request, response, failure, timeout, and
+  lifecycle event must be recorded in the shared ledger like normal pair traffic.
+
+Open design questions before implementation:
+
+- Whether the headless peer should be persistent for the whole session or spawned
+  per routed task.
+- Whether Codex can provide a suitable long-running headless peer over
+  `codex exec` or needs a different app-server mode.
+- Whether Claude should run through `claude -p` per task or a persistent
+  non-interactive session with the ContextRelay plugin surface.
+- How cancellation, timeouts, cost reporting, and reconnect behavior should
+  appear in the TUI and `ctxrelay status`.
+- How much context the headless peer receives by default so it is useful without
+  silently inflating token usage.
+
+The conservative first milestone is a per-task headless peer, reusing the
+backup/runner subprocess shape but relaxing the "read-only second opinion"
+semantics only for explicit handoffs and deliberations. A persistent headless
+peer should wait until the per-task version proves useful and the lifecycle
+model is clear.
+
 ### Borrowed AgentBridge Lessons
 
 ContextRelay was originally forked from AgentBridge, now canonical at
@@ -415,6 +466,148 @@ Reintroducing Grok or another external reviewer should be a separate milestone:
 - General multi-agent collaboration before a simpler agent identity model exists.
 - Replacing Claude Code or Codex native auth, session, or model behavior.
 
+## Warp — Remote Autonomous Worker (Planned)
+
+Status: proposed, not implemented. Every surface below is planned and gated behind `CONTEXTRELAY_ALLOW_WARP=1`, default off. The design converged over three rounds of Claude↔Codex deliberation.
+
+Warp lets an active Claude + Codex job be handed to a remote, always-on machine that continues it autonomously. The originating machine can observe progress, "tune in", and recall control. The remote commits at checkpoints and pushes at milestones.
+
+Two facts about the current code shape the whole design:
+
+- The daemon binds loopback only (`127.0.0.1`, `src/daemon.ts`); there is no transport abstraction, peer concept, or remote auth.
+- All git is local today (`src/session/git-helpers.ts`); there is no `push`, `fetch`, `remote`, or `clone` anywhere.
+
+Consequence: **git is the cross-machine transport.** The local and remote daemons never open a network port to each other. Each stays loopback-only on its own machine, and they communicate asynchronously through a shared git remote — a namespaced work branch plus a heartbeat side ref. This avoids rearchitecting the daemon and is why the feature is buildable.
+
+### How The Remote And Local Work Together (Plain Language)
+
+Imagine it is late, you have two AI assistants working on your project, and the job is not done. Warp hands the whole thing to a second computer that stays on overnight, so you can pick it up in the morning.
+
+The trick that keeps it simple and safe: **your laptop and the remote computer never talk to each other directly.** There is no live phone line between them. Instead they pass work back and forth through a shared online folder — the same place your code already lives (a Git repository like GitHub). Think of a dead-drop or a shared notebook: your laptop writes "here is the job and everything done so far" into the folder, and the remote computer reads it from there.
+
+- **Handing off (you → remote).** You run one command. Your laptop packages the task plus all your in-progress work into a labelled envelope and drops it in the shared folder.
+- **The remote does the work.** The always-on computer picks up the envelope, sets up a clean copy of the project, and runs the same two AI assistants you were using — but now unattended. As it makes progress it saves checkpoints into the shared folder at sensible milestones, not after every tiny edit, so the history stays clean and reviewable.
+- **Checking in.** Any time, you run a status command. The remote leaves a small heartbeat note — "still working, on step 4, last saved 2 minutes ago, healthy" — so you can tell "making progress" apart from "stuck" and "crashed". A plain list of saved checkpoints cannot tell you that; the heartbeat can.
+- **Taking it back (recall).** When you are ready, you recall the job. If the remote is alive and cooperating, it finishes its current step, saves, and hands control back cleanly. If the remote has died or gone rogue, you do not ask it nicely — you change the locks: invalidate its access, cut off its ability to save anything more, and resume from the last good checkpoint. If the dead machine later wakes up and tries to save stale work, that work is quarantined and ignored.
+
+Guardrails in plain terms:
+
+- The remote can only ever save to a **special scratch branch** named after the job. It can never touch your main code, never overwrite history, never delete anything.
+- It uses a **temporary, narrow key** — not your real GitHub login — that you can revoke instantly. Like a hotel keycard that opens one room and expires.
+- It has **spending and effort limits**: a dollar cap, plus caps on how much it can save, how long it can run, and how many commands it can issue, so an unattended job cannot run away with your money or your machine.
+
+The one honest limitation: all of this assumes the remote is a computer you trust — your own always-on box. The safety system can control a *cooperative* machine, but it cannot fully control a machine that has been tampered with or is running buggy code while holding a key to your repository. So for now Warp is meant for *your own* second computer, not for hosting other people's jobs. Letting colleagues run their jobs on one shared machine is a much bigger problem (real walls between users, plus billing and abuse protection) and is deliberately left for later. This is consistent with the existing non-goal "Hosted multi-agent orchestration".
+
+### How The Remote Daemon Works (Technical)
+
+**Transport: git, not a socket.** Everything flows through one shared git remote both machines can push/fetch (same account for v1). The channel is a work branch namespace `refs/heads/contextrelay/warp/` (never `main`) plus a machine-written, non-semantic heartbeat/status side ref. Each daemon stays loopback-only for its own TUIs; they are peers only in that they read/write the same refs.
+
+**Local side — handing the job off (`ctxrelay warp prepare/push`, planned).**
+
+1. Build a curated, schema-versioned handoff artifact (not the raw ledger): job id, lease epoch (starts at 1), parent checkpoint, status, *requested* capabilities, and a human-readable summary. Sanitized and secret-scanned.
+2. Capture in-progress work with the existing dirty-capture path (`capturePrimaryDirtySnapshot()` → `git diff --binary` + untracked allowlist, with the symlink/containment guards in `src/session/worktree.ts`), plus a secret scan.
+3. Commit artifact + work onto `contextrelay/warp/` and push it (the new git op).
+4. Lease state: `LOCAL → WARPED`.
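The handoff steps above can be sketched as a typed artifact plus a validator. This is a minimal sketch under stated assumptions, not the planned implementation: `HandoffArtifact`, `validateHandoffArtifact`, the field names, the `schemaVersion` value, the slug rule for the job id, and the 40-hex check for the parent checkpoint are all hypothetical. Sanitizing and secret scanning are out of scope here.

```typescript
// Hypothetical artifact shape; field names are illustrative, not from the codebase.
interface HandoffArtifact {
  schemaVersion: 1;
  jobId: string;
  leaseEpoch: number; // starts at 1 on the first handoff
  parentCheckpoint: string; // commit SHA the remote resumes from
  status: string;
  requestedCapabilities: string[]; // requested only; the host grants locally
  summary: string;
}

type ValidationResult =
  | { ok: true; artifact: HandoffArtifact }
  | { ok: false; errors: string[] };

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function isNonEmptyString(value: unknown): value is string {
  return typeof value === "string" && value.trim().length > 0;
}

// Collects every problem instead of stopping at the first, so a rejected
// artifact can be reported in one pass.
function validateHandoffArtifact(input: unknown): ValidationResult {
  if (!isRecord(input)) {
    return { ok: false, errors: ["artifact must be an object"] };
  }
  const errors: string[] = [];
  const jobId = input["jobId"];
  const leaseEpoch = input["leaseEpoch"];
  const parentCheckpoint = input["parentCheckpoint"];
  const status = input["status"];
  const caps = input["requestedCapabilities"];
  const summary = input["summary"];

  if (input["schemaVersion"] !== 1) errors.push("unsupported schemaVersion");
  if (typeof jobId !== "string" || !/^[a-z0-9][a-z0-9-]*$/.test(jobId)) {
    errors.push("jobId must be a lowercase slug");
  }
  if (typeof leaseEpoch !== "number" || !Number.isInteger(leaseEpoch) || leaseEpoch < 1) {
    errors.push("leaseEpoch must be an integer >= 1");
  }
  if (typeof parentCheckpoint !== "string" || !/^[0-9a-f]{40}$/.test(parentCheckpoint)) {
    errors.push("parentCheckpoint must be a 40-character hex SHA");
  }
  if (!isNonEmptyString(status)) errors.push("status must be a non-empty string");
  if (!Array.isArray(caps) || !caps.every((c) => typeof c === "string")) {
    errors.push("requestedCapabilities must be an array of strings");
  }
  if (!isNonEmptyString(summary)) errors.push("summary must be a non-empty string");

  if (errors.length > 0) return { ok: false, errors };
  return {
    ok: true,
    artifact: {
      schemaVersion: 1,
      jobId: jobId as string,
      leaseEpoch: leaseEpoch as number,
      parentCheckpoint: parentCheckpoint as string,
      status: status as string,
      requestedCapabilities: caps as string[],
      summary: summary as string,
    },
  };
}
```

The artifact carries only *requested* capabilities, so a validator like this checks shape and nothing more. Deciding what to grant stays with the accepting host.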
+
+**Remote side — the warp-host daemon.**
+
+- Runs `ctxrelay` with a warp-host disposition behind `CONTEXTRELAY_ALLOW_WARP=1`. Still loopback-only; it does not accept inbound sockets from the originating machine.
+- Learns about the job **event-driven**: a `post-receive` hook (controlled server) or a GitHub webhook/Action fires the moment the job is pushed; `git ls-remote` pointer-polling is only a fallback. v1 = attended accept (`ctxrelay warp accept `); auto-accept on the trigger is layered on later. See *Transport Mechanics*.
+- On accept it checks out a deterministic worktree from the pushed branch, validates the artifact and secret-scans, **re-gates capabilities** (the artifact carries *requested*; the host *grants* locally — a job can never grant itself privilege), and mints a local lease at the same epoch (a compare-and-swap lease ref).
+- Runs the existing headless worker engine (`runBackupAgent` / `buildBackupCommand` in `src/backup/runner.ts`: `codex exec --json --sandbox` + `claude -p --output-format stream-json`, with the SIGTERM→SIGKILL timeout already present) in a decide→act→verify loop, unattended.
+- Under a new `warp:push` capability (distinct from act:write), it commits at daemon checkpoint boundaries (after a status/test capture + artifact validation), not on every file write; pushes at deterministic milestones (task-board lane completion); embeds the lease epoch + validated checkpoint artifact in every commit; and secret-scans before every commit and push. It uses a short-lived, branch-scoped credential (fine-grained PAT or GitHub App installation token scoped to one repo + the warp namespace), never a shared `gh login` and never a long-lived personal PAT.
+
+**Observing and coming back.**
+
+- `ctxrelay warp status` / `warp tail` (planned): the local side fetches the heartbeat ref and reads the artifact to show current task, last command class, last commit, health, and epoch. The commit log alone cannot distinguish "thinking", "hung", "blocked on auth", and "dead".
+- `ctxrelay warp recall` (planned):
+  - Cooperative: local CAS-sets the lease ref to `RECALLING`; the remote sees it (via its push trigger or an `ls-remote` check) at its next safe checkpoint, commits/pushes, releases the lease → local fetches/pulls. `WARPED → RECALLING → RETURNED`.
+  - Forced (timeout / no heartbeat / dead): local CAS-bumps the lease epoch (the fencing token), marks `REMOTE_DEAD` / `FORCED_RECLAIMED`, revokes the short-lived credential, and recovers from the last accepted pushed ref. A zombie remote that wakes at the stale epoch has its pushes rejected by the CAS check (and quarantined). Forced reclaim never means the dead remote commits and pushes.
+- Terminal states beyond `RETURNED`: `REMOTE_DEAD`, `FORCED_RECLAIMED`, `RETURN_FAILED`, `CONFLICT_QUARANTINED`.
+
+**Budgets (broader than dollars).** Reuse the fail-closed `budgetUsd` gate (`src/session/idle-write-spend.ts`) plus caps on max commits, push bytes, changed files, runtime, and command count. Recall is the kill switch; budget/timeout exhaustion is auto-stop.
+
+**The hard boundary.** Lease + fencing only control a *cooperative* daemon. A compromised or buggy job holding repo credentials can ignore recall, exfiltrate the token, rewrite its credential helper, spawn children, or keep pushing from elsewhere. The real guardrails are infrastructure: short-lived scoped credentials + revocation, sandbox/process isolation, and server-side branch protection. v1 autonomous Warp is safe only on a trusted always-on box owned by the operator, never as a host for other people's jobs (see Non-Goals).
+
+### Transport Mechanics (Git-Native, Not Polling-First)
+
+Warp does not use the naive "commit a notes file and poll the branch" model. It uses git's own primitives so coordination is **atomic**, **event-driven where possible**, and **enforced by the server** rather than by client discipline. Two mechanics are locked in for Phases 1–2.
+
+**Locked in #1 — CAS refs for the lease and epoch fencing.** The lease is a dedicated ref (`…/warp//lease`) whose value encodes the epoch. Every lease transition is a **compare-and-swap**: `git push --force-with-lease` from a client, or a server-side `git update-ref --stdin` transaction with an old-value check; multi-ref updates use `git push --atomic`. To write, the remote must CAS the lease from the epoch it holds; recall CAS-bumps the epoch. A stale or zombie writer's push is **rejected at the server because the ref already moved** — fencing is enforced, not a `lease.json` you have to trust.
+
+**Locked in #2 — hook/webhook triggering instead of polling.** Pickup and recall are event-driven:
+
+- On a git server the operator controls (a bare repo on the trusted box), `post-receive` launches/notifies the worker the instant a push lands, and `pre-receive`/`update` **rejects** a push that violates policy (wrong namespace, stale epoch, force-push, secret detected) before it is stored.
+- On GitHub the equivalents are **webhooks / Actions** (push event → trigger) and **branch-protection rules** (server-enforced no-force-push / no-delete / restricted pushers).
+- Polling is a **fallback only**, and when used it polls pointers with `git ls-remote` (ref→sha, no objects downloaded), never a full fetch.
+
+**Supporting primitives.**
+
+- **Dedicated ref namespaces** (not committed files) for control and heartbeat: `…/warp//control`, `…/warp//heartbeat`, each pointing at a tiny object — atomic pointer moves, no checkout, no `git branch` clutter. (Arbitrary `refs/warp/*` works on a server you control; on GitHub, fall back to a side-branch namespace under `refs/heads/contextrelay/warp//…` or to `git notes`.)
+- **`git notes`** (`refs/notes/*`) for metadata that arrives *after* a commit — the validated checkpoint artifact, a local "verified" verdict, the epoch stamp — without rewriting the commit SHA.
+- **Signed commits/tags** for provenance: the remote signs each checkpoint, the local signs the job offer, addressing "is this the authentic daemon?" without inventing a new auth layer.
+- **Annotated tags** as immutable milestone markers; **push options** (`git push -o key=value`) as push-time intent to server hooks (full on your own server, partial on GitHub).
+
+**Mechanism map.**
+
+| Need | Naive way | Git-native mechanism |
+|---|---|---|
+| Lease + fencing | a `lease.json` you trust | **CAS ref** (`--force-with-lease` / `update-ref` old-value check) — server rejects stale writers |
+| Pickup / recall trigger | poll the branch | **`post-receive` hook / webhook / Actions** → event-driven |
+| Fallback poll | `git fetch` the branch | **`git ls-remote`** (pointers only, no objects) |
+| Control / heartbeat channel | committed files | **dedicated refs** to tiny objects |
+| Post-hoc checkpoint metadata | rewrite the commit | **`git notes`** |
+| "Is this the real daemon?" | trust the name | **signed commits / tags** |
+| No force-push to `main` | client discipline | **branch protection / `pre-receive`** |
+
+**Architectural consequence.** Because v1's trusted box is the operator's own, that box can **host the bare repo itself** — then `post-receive` triggers the worker (no poll) and `pre-receive` enforces the epoch (no trust required), with no third party in the loop. GitHub remains an option, trading your own hooks for webhooks + branch protection + `--force-with-lease`.
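The compare-and-swap lease and epoch fencing described above can be modelled in memory to show why a zombie writer loses. This is a sketch only: `LeaseRef`, `pushCheckpoint`, and `forceRecall` are hypothetical names, and the class stands in for the server-side ref. A real implementation would rely on `git push --force-with-lease` or a `git update-ref --stdin` old-value check rather than this object.

```typescript
// Lease states named in the roadmap; only a subset is exercised here.
type LeaseState = "WARPED" | "RECALLING" | "RETURNED" | "FORCED_RECLAIMED";

interface Lease {
  epoch: number; // the fencing token
  state: LeaseState;
}

// In-memory stand-in for the lease ref held by the git server (hypothetical).
class LeaseRef {
  private current: Lease;

  constructor(initial: Lease) {
    this.current = { epoch: initial.epoch, state: initial.state };
  }

  read(): Lease {
    return { epoch: this.current.epoch, state: this.current.state };
  }

  // Compare-and-swap: the update lands only if the ref still holds `expected`.
  compareAndSwap(expected: Lease, next: Lease): boolean {
    if (this.current.epoch !== expected.epoch || this.current.state !== expected.state) {
      return false;
    }
    this.current = { epoch: next.epoch, state: next.state };
    return true;
  }
}

type PushOutcome = "accepted" | "rejected-stale-epoch";

// A remote checkpoint push is accepted only while the lease still carries
// the epoch the writer holds.
function pushCheckpoint(lease: LeaseRef, writerEpoch: number): PushOutcome {
  const seen = lease.read();
  if (seen.epoch !== writerEpoch) return "rejected-stale-epoch";
  // Re-assert the observed value through CAS so a concurrent recall wins the race.
  return lease.compareAndSwap(seen, seen) ? "accepted" : "rejected-stale-epoch";
}

// Forced recall bumps the epoch, which fences out every holder of the old one.
function forceRecall(lease: LeaseRef): Lease | undefined {
  const seen = lease.read();
  const next: Lease = { epoch: seen.epoch + 1, state: "FORCED_RECLAIMED" };
  return lease.compareAndSwap(seen, next) ? next : undefined;
}
```

In this model a worker holding epoch 1 can push until `forceRecall` moves the lease to epoch 2. After that, the same worker's `pushCheckpoint(lease, 1)` is rejected without any cooperation from the worker, which is the property the design relies on.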
+
+### Reuse Map (Do Not Rebuild)
+
+| Need | Reuse | Location |
+|---|---|---|
+| Dirty work capture onto a branch | `capturePrimaryDirtySnapshot()`, `createWriteWorktree(seedDirtyFromPrimary)`, `captureWriteDiff()` | `src/session/worktree.ts` |
+| Branch namespace pattern | `contextrelay/write/` prefix + guarded prune/sweep | `src/session/worktree.ts` |
+| Local git executor (fail-closed, stdin-fed) | `runGitStrict()` | `src/session/git-helpers.ts` |
+| Headless Claude + Codex worker | `buildBackupCommand()`, `buildBackupEnv()`, `runBackupAgent()` | `src/backup/runner.ts` |
+| Spend gate (fail-closed) | `evaluateWriteSpendGate()` / `budgetUsd` | `src/session/idle-write-spend.ts` |
+| Named per-session state + ports | `CONTEXTRELAY_ALLOW_NAMED_SESSIONS`, `runtime-sessions//` | `src/daemon.ts` |
+| Opt-in config gate pattern | `autonomy.{enabled,writableAction.{enabled,budgetUsd}}`, `activation.autoConnect` | `src/config-service.ts` |
+| Gate-status reporting (CLI) | `describeActModeGates()` rows | `src/session/idle-scanner-preflight.ts` |
+
+### What Is Genuinely New
+
+- `git push` / `git fetch` wrappers (extend `runGitStrict`), including `--force-with-lease` (CAS), `--atomic`, and `git ls-remote`; no network git exists today.
+- `warp:push` — a new gated privilege class, distinct from act:write. act:write is diff-capture-only and the worker prompt explicitly forbids git (`src/session/idle-write-dispatch.ts`: "Do NOT run git, do NOT commit, do NOT push…"). Warp is the first time an agent may commit and push autonomously, so it is gated separately.
+- Curated, schema-versioned handoff/checkpoint artifact (not the raw ledger) + validator + secret scanner.
+- Lease state machine with **CAS-enforced** epoch fencing + quarantine (the lease is a ref moved only by compare-and-swap).
+- Event-driven pickup/recall: `post-receive` hook (controlled server) or webhook/Actions + branch protection (GitHub); `git ls-remote` pointer-poll as fallback.
+- Dedicated control/heartbeat refs + `git notes` checkpoint metadata + signed checkpoints; `warp status` / `warp tail`.
+- Short-lived, branch-scoped push credential wiring (per session, not `gh login`).
+
+### Build Order (De-Risk: Prove Recall Before Autonomy)
+
+All phases behind `CONTEXTRELAY_ALLOW_WARP=1`, default off.
+
+1. **Foundation (local, no network).** Curated artifact (schema + sanitize + secret scan) + lease/epoch model **as a CAS ref** (compare-and-swap semantics); `git push`/`fetch` wrappers including `--force-with-lease`, `--atomic`, and `ls-remote`, all unit-tested against a **local bare repo** (CAS rejects a stale-epoch update; push wrapper rejects `main`/force/delete). Not yet wired to a real remote. Reuse the worktree dirty-capture. No credentials, no remote touched.
+2. **Attended warp + recall.** `warp prepare → push → accept` (attended resume, capabilities off until approved) → `warp recall` (cooperative + forced + CAS fencing/quarantine). Wire **event-driven triggering** — a `post-receive` hook on the local bare repo for the E2E (webhook + branch protection for GitHub) — with `ls-remote` poll as fallback; all lease transitions via CAS. Proves the lease + git + recall plumbing with zero autonomy. Needs a throwaway remote namespace + a scoped test token.
+3. **Autonomous worker.** Layer the `warp:push` capability + the headless worker loop (reuse `backup/runner`) onto the warp branch: checkpoint commits, milestone pushes, broad budgets, short-lived branch-scoped credential.
+4. **Observability.** Heartbeat/status ref + `warp status` / `warp tail`.
+
+Deferred / cut from v1: multi-user hoster mode + colleagues' projects (needs real isolation); live pairing; raw ledger import; literal "commit every file change"; auto-merge to main; long-lived user PATs; live bidirectional daemon channel.
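The Phase 1 requirement that the push wrapper rejects `main`, force, and delete can be sketched as a pure policy check that runs before any git command is built. This is an illustration under assumptions: `PushRequest`, `checkPushPolicy`, and the `deleteRef` flag are hypothetical, and the sketch treats plain `--force` as forbidden while leaving the `--force-with-lease` CAS path for the lease ref out of scope.

```typescript
// Namespace prefix taken from the roadmap; everything else here is hypothetical.
const WARP_PREFIX = "refs/heads/contextrelay/warp/";

interface PushRequest {
  ref: string; // fully qualified destination ref
  force: boolean; // plain --force, not the CAS-style --force-with-lease
  deleteRef: boolean; // true when the push would delete the destination ref
}

type PushDecision = { allowed: true } | { allowed: false; reason: string };

function deny(reason: string): PushDecision {
  return { allowed: false, reason };
}

// Fail closed: anything not explicitly inside the warp namespace is rejected.
function checkPushPolicy(req: PushRequest): PushDecision {
  if (req.deleteRef) return deny("ref deletion is not allowed");
  if (req.force) return deny("force-push is not allowed");
  if (!req.ref.startsWith(WARP_PREFIX)) {
    return deny("destination is outside the warp namespace");
  }
  const rest = req.ref.slice(WARP_PREFIX.length);
  const segments = rest.split("/");
  const malformed = segments.some((s) => s === "" || s === "." || s === "..");
  if (malformed) return deny("malformed ref under the warp namespace");
  return { allowed: true };
}
```

Keeping the check pure makes it easy to unit-test without a repository. The same rules would still need server-side enforcement (`pre-receive` or branch protection), since a client-side wrapper only constrains a cooperative worker.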
+
+### Execution Model (Codex Implements, Claude Verifies)
+
+Per project roles, Codex is implementer/executor and Claude is planner/reviewer/verifier and coordinator. Claude owns git writes. Per-phase loop: Claude writes a tight phase spec (files, interfaces, reuse points, acceptance criteria, security invariants) and hands it to Codex over ContextRelay; Codex implements + writes tests; Claude verifies the diff, runs `bun run typecheck` / `bun test src` / `bun run check`, and adversarially checks the containment invariants (warp namespace only, never `main`, no force-push; stale-epoch pushes rejected; secret scan rejects `.env`/ignored/secret files; artifact validation rejects malformed input; dirty-capture round-trips; no git credentials leaked into the worker env whitelist); Codex fixes; Claude re-verifies; Claude handles branch/commit/PR.
+
+Hard stops (human authority): no network remote, no credential creation, no merge, tag, release, or npm publish, and no change to coordinator/git policy without the human. Phase 1 completes entirely locally; Phases 2–4 need human-provided credentials and a test remote, so the loop pauses for the human after Phase 1.
+
+### Verification
+
+- Unit (Phase 1): artifact serialize/curate/validate; secret-scan rejects `.env`/secrets/ignored paths; lease-epoch fencing rejects a stale epoch; dirty-capture round-trips (extend `src/unit-test/worktree-write.test.ts` patterns); `git push`/`fetch` wrappers tested against a local bare repo (no network).
+- E2E (Phase 2+, single human, two checkouts + a throwaway remote namespace): `prepare → push → accept` (attended, then autonomous) → checkpoint/milestone pushes visible → `warp status` shows health → cooperative recall pulls the work → forced recall quarantines a simulated zombie push at a stale epoch.
+- Gates: `bun run check`, `bun test src` green before any handoff back to the coordinator.
+
 ## Release Documentation Rule
 
 Before publishing or tagging a release: