Harness implementation research: link daemon, agent pill, VM-as-runtime#3417
Conversation
🧪 BenchmarkShould we run the Virtual MCP strategy benchmark for this PR? React with 👍 to run the benchmark.
Benchmark will run on the next push after you react. |
Release OptionsSuggested: Patch ( React with an emoji to override the release type:
Current version:
|
There was a problem hiding this comment.
4 issues found across 238 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="apps/mesh/migrations/080-rename-runner-kind.ts">
<violation number="1" location="apps/mesh/migrations/080-rename-runner-kind.ts:37">
P1: Guard the vmMap aggregate with `COALESCE(..., '{}'::jsonb)`; otherwise rows with an empty `vmMap` can end up with `metadata = NULL` during migration.</violation>
</file>
<file name="apps/link/src/tunnel.ts">
<violation number="1" location="apps/link/src/tunnel.ts:51">
P1: Do not fall back to a hardcoded tunnel API key; require explicit configuration (input or env) instead.</violation>
</file>
<file name="apps/link/src/control-plane.ts">
<violation number="1" location="apps/link/src/control-plane.ts:77">
P2: Validate `handle` as a non-empty string (not just truthy) before calling `ensureSandbox`.</violation>
</file>
Note: This PR contains a large number of files. cubic only reviews up to 100 files per PR, so some files may not have been reviewed. cubic prioritizes the most important files to review.
On a pro plan you can use ultrareview for larger PRs.
Re-trigger cubic
| apiKey: | ||
| input.apiKey ?? | ||
| process.env.DECO_TUNNEL_SERVER_TOKEN ?? | ||
| LEGACY_TUNNEL_TOKEN, |
There was a problem hiding this comment.
P1: Do not fall back to a hardcoded tunnel API key; require explicit configuration (input or env) instead.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At apps/link/src/tunnel.ts, line 51:
<comment>Do not fall back to a hardcoded tunnel API key; require explicit configuration (input or env) instead.</comment>
<file context>
@@ -0,0 +1,85 @@
+ apiKey:
+ input.apiKey ??
+ process.env.DECO_TUNNEL_SERVER_TOKEN ??
+ LEGACY_TUNNEL_TOKEN,
+ });
+ await Promise.race([
</file context>
There was a problem hiding this comment.
Pre-existing shared key, not per-user. The file's header documents this explicitly ("The Warp server still uses a legacy shared key... Once Warp accepts OAuth tokens we can swap the source in one place"). The fallback was introduced in #3283 to fix a silent registration hang — this PR just extracts it into apps/mesh/src/link-daemon/tunnel.ts without changing the auth model. Removing it would break every laptop CLI client that hasn't manually set DECO_TUNNEL_SERVER_TOKEN. Tracked for replacement when Warp accepts OAuth bearers.
There was a problem hiding this comment.
Thanks for the feedback! I've saved this as a new learning to improve future reviews.
| v.metadata, | ||
| '{vmMap}', | ||
| ( | ||
| SELECT jsonb_object_agg( |
There was a problem hiding this comment.
P1: Guard the vmMap aggregate with COALESCE(..., '{}'::jsonb); otherwise rows with an empty vmMap can end up with metadata = NULL during migration.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At apps/mesh/migrations/080-rename-runner-kind.ts, line 37:
<comment>Guard the vmMap aggregate with `COALESCE(..., '{}'::jsonb)`; otherwise rows with an empty `vmMap` can end up with `metadata = NULL` during migration.</comment>
<file context>
@@ -0,0 +1,107 @@
+ v.metadata,
+ '{vmMap}',
+ (
+ SELECT jsonb_object_agg(
+ user_key,
+ (
</file context>
There was a problem hiding this comment.
Fixed in c8a0e5435. Wrapped both jsonb_object_agg calls (outer over vmMap and inner over user_map) in COALESCE(..., '{}'::jsonb) so rows whose vmMap is {} — or whose nested user_map is {} — keep an empty object instead of getting metadata = NULL via jsonb_set. Also tightened the WHERE to require jsonb_typeof(metadata->'vmMap') = 'object' so non-object vmMap values are skipped. Same guard applied to the down migration.
There was a problem hiding this comment.
Thanks for the feedback! I've saved this as a new learning to improve future reviews.
| headers: { "content-type": "application/json" }, | ||
| }); | ||
| } | ||
| if (!parsed.handle) { |
There was a problem hiding this comment.
P2: Validate handle as a non-empty string (not just truthy) before calling ensureSandbox.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At apps/link/src/control-plane.ts, line 77:
<comment>Validate `handle` as a non-empty string (not just truthy) before calling `ensureSandbox`.</comment>
<file context>
@@ -0,0 +1,104 @@
+ headers: { "content-type": "application/json" },
+ });
+ }
+ if (!parsed.handle) {
+ return new Response(JSON.stringify({ error: "missing_handle" }), {
+ status: 400,
</file context>
There was a problem hiding this comment.
Fixed. Changed the check to typeof handle === "string" && handle.length > 0 so non-string truthy bodies (e.g. arrays, objects) are also rejected with 400.
There was a problem hiding this comment.
Thanks for the feedback! I've saved this as a new learning to improve future reviews.
5633d48 to
d252d00
Compare
There was a problem hiding this comment.
1 issue found across 111 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="apps/mesh/migrations/080-rename-runner-kind.ts">
<violation number="1" location="apps/mesh/migrations/080-rename-runner-kind.ts:37">
P1: Guard the vmMap aggregate with `COALESCE(..., '{}'::jsonb)`; otherwise rows with an empty `vmMap` can end up with `metadata = NULL` during migration.</violation>
</file>
<file name="apps/link/src/tunnel.ts">
<violation number="1" location="apps/link/src/tunnel.ts:51">
P1: Do not fall back to a hardcoded tunnel API key; require explicit configuration (input or env) instead.</violation>
</file>
<file name="apps/link/src/control-plane.ts">
<violation number="1" location="apps/link/src/control-plane.ts:77">
P2: Validate `handle` as a non-empty string (not just truthy) before calling `ensureSandbox`.</violation>
</file>
Note: This PR contains a large number of files. cubic only reviews up to 100 files per PR, so some files may not have been reviewed. cubic prioritizes the most important files to review.
On a pro plan you can use ultrareview for larger PRs.
Tip: Review your code locally with the cubic CLI to iterate faster.
Re-trigger cubic
eb22f9e to
71ac220
Compare
…d runtime
Squashes 19 commits from tlgimenes/harness-impl-research.
Laptop link daemon: `deco link` boots a local Bun.serve, opens a Warp
tunnel to deco.host, registers with the cluster's /api/links to receive
a linkSecret, then exposes an HMAC-signed control plane for sandbox
lifecycle + reverse proxy. Capability probe surfaces claude-code, codex,
and decopilot-sandbox to the cluster. SIGINT/SIGTERM deregister cleanly.
Sandbox provider abstraction: RunnerKind → SandboxProviderKind across
type/storage column/helpers. New remote-user provider plugs the user's
link daemon in via ctx.linkRegistry; docker + agent-sandbox keep an
env-singleton fallback. vmMap re-keyed to (user, branch, providerKind)
with tolerant readers for legacy 2-level + runnerKind shapes; migration
086-fix-vm-map-rekey performs the rename that earlier migrations no-op'd
on a dropped table.
Chat UI: single Agent pill replaces Runner+Harness, eligibility-aware.
Clonable agents (Start Website + GitHub-imported) skip the no-AI-provider
empty state when the laptop link is online with a CLI harness, and the
two CLI options (Claude Code / Codex desktop) surface without a cloud
key — they run on laptop-stored credentials. Decopilot fallback for
agents with no GitHub repo. Connect-laptop empty-state tile + dialog.
cli-activate provider path removed (now on the laptop link); migration
087 purges sentinel keys.
Inlining refactor (vs prior research branch): apps/link → apps/mesh/src/
{cli/commands/link.ts, link-daemon/}, @decocms/harnesses → apps/mesh/src/
harnesses/, @decocms/link-protocol → apps/mesh/src/links/protocol/.
Drops three single-consumer packages, keeps one binary (deco), folds
session/tunnel/login plumbing into the existing CLI. packages/sandbox
reaches into apps/mesh via relative imports — documented inline as a
trade-off accepted per "abstractions to a minimum".
Migrations 082-087 are renumbered to start strictly after main's latest
applied (081-async-research-jobs-result-content) so prod's executed list
stays a clean prefix and the new migrations apply fresh on deploy.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Captures the design decisions from brainstorming: collapse the AgentPill (Decopilot/Claude Code/Codex) into the chat-input model trigger as a sectioned popover with green styling on local-CLI sections, mirror existing lock semantics, and fix the gap-collapse bug on the trigger button. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Q6 resolved: remove the cloud-Decopilot-on-local-sandbox option entirely (existing localStorage validation already migrates stale values to null). Adds the verified "what stays vs. what goes" inventory after exploring real call sites. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Nine bite-sized tasks: trim agent-options.ts, build getAgentSections (+ tests), AgentSection component, AgentModelPopover shell, rewrite AgentModelTrigger with green styling and gap fix, plumb new props through input.tsx, slim ThreadPills, delete AgentPill, and final type/lint/fmt/test verification. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ions Removes the cloud-Decopilot-on-local-sandbox option (decopilot-laptop) and computeAgentOptions, which gets replaced by getAgentSections in a following commit. Keeps AGENT_OPTION_PINS + pinsForOption / pinsToOption since chat-context.tsx still needs them. The localStorage load in chat-context already validates against AGENT_OPTION_PINS, so stale 'decopilot-laptop' values silently migrate to null.
Introduces AgentKind/AgentSection types and a pure getAgentSections function that returns the three sections (Decopilot, Claude Code, Codex) shown in the new merged chat-input popover. Mirrors the gates computeAgentOptions used (sans decopilot-laptop) and flags CLI sections with isLocal: true to drive the green styling. Preserves the existing getAgentModelSet helper used by the settings page. Renamed agent-models.ts to agent-models.tsx because the Decopilot tier entries now hold JSX icon nodes (Lightning01/Stars01/Atom01).
Renders one section in the new sectioned popover — header with the agent title (plus " · on this laptop" suffix for CLI agents), three tier rows with descriptions, and the existing "On" indicator. Local sections sit on a faint bg-success/5 band; disabled sections render opacity-40 + pointer-events-none + a small lock icon. Also adds Bun test infrastructure (bunfig preload + happy-dom + @testing-library/react + jest-dom matchers) so React component tests can render in a DOM and use toBeInTheDocument(). No prior precedent existed in the repo, so the setup file is new.
…age.json sort Code review flagged that the root bunfig.toml preloaded happy-dom for all workspace tests, replacing Node natives like fetch/Request/Response/URL in packages/* tests that depend on them. Switched to Option B: removed both bunfig.toml files (root and apps/mesh) and made apps/mesh/src/test/setup.ts an importable module with side effects. React component tests now `import "../../../../test/setup"` (or equivalent) at the top, guarded by GlobalRegistrator.isRegistered to avoid double-registration. This keeps happy-dom contained to component tests that opt in, while packages/* tests keep running against Node natives (804 pass / 0 fail vs the prior 470/31/15 baseline). Also reverts an unrelated @decocms/sandbox alphabetical re-sort that crept into the Task 3 commit -- restores its prior position between marked and mesh-plugin-workflows.
Composes AgentSection rows from a getAgentSections result. Handles lock semantics — when lockedAgent is set, only the matching section is interactive; the others render disabled. Row click fires onSelect with (kind, tier) and the locking is verified by tests.
The trigger now opens the new sectioned AgentModelPopover (Decopilot, Claude Code, Codex) instead of the old SimpleModeTierDropdown / CLI tier list split. Closed pill goes text-success + bg-success/10 when the active agent is a CLI variant, matching the 'Desktop connected' green elsewhere. Picks both agent (via setPendingAgentOption) and tier in a single click and fires the eager VM start for CLI agents when a branch is set. Also fixes the phantom 6px gap that appeared when the label collapses at narrow container widths. Also passes a concrete URL to the happy-dom GlobalRegistrator in the shared test setup so component tests whose transitive imports reach better-auth/react createAuthClient (which reads window.location at module init) no longer blow up on the default about:blank URL.
bun:test doesn't auto-call RTL's cleanup() the way Jest does, so DOM from one component test was leaking into the next file's render, e.g. a popover trigger from agent-model-trigger.test.tsx persisting into agent-section.test.tsx and breaking getByText queries. Add an afterEach in setup.ts that calls cleanup() to mirror the implicit Jest behavior. Because bun:test caches this setup module across every test file in a run, the top-level afterEach only registers in whichever test file imports setup first. To cover the remaining files we also install a Bun.plugin onLoad hook that prepends an afterEach -> cleanup() registration to any test source that already imports test/setup, so each file's own scope gets the hook regardless of load order.
The plugin onLoad transform that injected afterEach() into every test file was clever but surprising — future contributors would have no idea why test files appear to have shifted line numbers in stack traces. Replace it with an exported setupComponentTest() function that each component test calls at module top-level. Registration lands in the right scope without source rewrites or runtime magic.
The merged AgentModelTrigger needs both props so it can fire the eager VM start that ThreadPills used to own when the user picks a CLI agent. Plumbs them through the input mount without otherwise touching the composer layout.
The agent picker now lives inside the chat-input's AgentModelTrigger. ThreadPills shrinks back to a single BranchPill (still locked when the thread has messages). The eager VM-start logic moves into the merged AgentModelTrigger row click.
Replaced by the sectioned popover inside AgentModelTrigger. No remaining importers.
Two regressions surfaced during T9 verification when running the chat test directory as a whole: 1. happy-dom's GlobalRegistrator overwrites globalThis.TransformStream and WritableStream but leaves ReadableStream alone. Once a component test triggered registration, store-level tests that exercise ai-sdk's readUIMessageStream blew up with "readable should be ReadableStream" because stream.pipeThrough received a happy-dom TransformStream while the stream itself was the native ReadableStream. Snapshot the native classes before register() and restore them after. 2. The published @testing-library/jest-dom type augmentation targets the jest/vitest globals, so .toBeInTheDocument() flagged TS errors on bun:test's Matchers<T>. Declare the augmentation against "bun:test" so the matchers extended at runtime are visible to the type checker. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
AgentTierEntry.modelId is `string | null` (Decopilot tiers leave it null). The existing guard checked `!model` after `lookup[entry.modelId]`, which TypeScript correctly flagged because the index access itself rejects a null key. Add the explicit pre-guard so the type narrows cleanly and the runtime semantics stay identical. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…n with prod 082-secrets origin/main now has 082-secrets (PR #3425) at the position the earlier renumber commit (edb42f3) parked 082-thread-run-locally. Two migrations sharing the same 082- numeric prefix breaks Kysely's ordering guarantee (and the locally-run 082-thread-run-locally errors out with corrupted-migrations when the directory now sorts 082-secrets first). Shift all six branch migrations up by one slot: 082-thread-run-locally -> 083-thread-run-locally 083-drop-host-sandbox-rows -> 084-drop-host-sandbox-rows 084-rename-runner-kind -> 085-rename-runner-kind 085-thread-pins-and-vm-map-rekey -> 086-thread-pins-and-vm-map-rekey 086-fix-vm-map-rekey -> 087-fix-vm-map-rekey 087-purge-cli-activate-keys -> 088-purge-cli-activate-keys Updates the index.ts imports / migrations map and the two test files that self-imported their own migration module (086/087 test files). All 88 migrations apply cleanly in the test runner; 8/8 renamed tests pass. Local devs who already ran any of the old-numbered migrations on this branch must either reset their postgres data dir or rename the rows in kysely_migration to the new numbers before pulling. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nner-kind
Wrap inner and outer jsonb_object_agg calls in COALESCE(..., '{}'::jsonb) so
rows whose vmMap is an empty object — or whose nested user_map is empty —
don't end up with metadata = NULL after the rewrite (jsonb_set returns NULL
when any argument is NULL). Also restrict the UPDATE to rows where
metadata->'vmMap' is an object so non-object vmMap values are skipped.
Addresses PR review on #3417.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reject non-string truthy bodies (arrays, objects, numbers) with 400 instead of letting them flow through to ensureSandbox. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
71ac220 to
3ab12d0
Compare
…sandbox (#3434) The eager `ensureVm()` added to `POST /api/:org/decopilot/threads/:id/messages` in #3417 made the route fail when no sandbox provider could be provisioned — e.g. CI environments that don't run a link daemon and don't set STUDIO_SANDBOX_RUNNER (default `"remote-user"` ⇒ `provisionSandbox` throws "No link daemon registered for user ..."). The cross-pod attach scenario in `tests/multi-pod/scenarios/attach-cross-pod.test.ts` only drives the mock AI provider and never needs a real VM, yet POST `/messages` was 500-ing before it ever enqueued the run. `resolveDispatchTarget` only consumed `vm.sandboxProviderKind` from the returned entry, so refactor it to take the kind directly and drop the `ensureVm` call from the POST handler. The built-in tools layer already provisions the sandbox lazily on the first VM-tool invocation via `ensureHandle` in `apps/mesh/src/harnesses/decopilot/built-in-tools/index.ts`, so runs that never touch a VM-backed tool no longer pay the provisioning cost (or its hard prerequisites). Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
End-to-end implementation of harness execution via a local link daemon, with the UI shifted from Runner+Harness pills to a single Agent pill driven by the link's real capabilities.
X-Link-Secret, capability probe for claude-code / codex / sandbox, current-link surfaced to the UI viaLINK_CURRENT_GETanduseCurrentLink.vm-proxynow resolves the sandbox runner per-user viactx.linkRegistry(with env-singleton fallback for docker / agent-sandbox).runnerKind→sandboxProviderKindrename onconnections.metadata, with a tolerantnormalizeVmMapon the read path.X-Link-Secretinstead ofAuthorization: Bearer …so heartbeats don't trip Better Auth'sINVALID_API_KEYlogger; legacy well-known deprecation middleware pinned to its two specific paths.Test plan
bun test apps/mesh/src/links(heartbeat / shutdown via X-Link-Secret)bun test apps/mesh/migrations/082-fix-vm-map-rekey.test.tsbun test apps/mesh/src/storage/virtual(normalized vmMap on read)INVALID_API_KEYlog per heartbeat/api/vm-proxy/...as aremote-userdeployment, confirm 503 surfaces when no link daemon🤖 Generated with Claude Code
Summary by cubic
Runs harnesses on the user’s laptop via a new
deco linkdaemon with HMAC‑secured remote dispatch, replaces Runner+Harness pills with a capability‑aware Agent pill, and switches VM routing to aSandboxProviderKindwith per‑thread pins and a 3‑level vmMap. Migrations are renumbered to 083–088 to avoid collisions and ensure the vmMap data fix lands.New Features
deco linkdaemon (apps/mesh/src/link-daemon): registers at/api/links, heartbeats viaX-Link-Secret, serves an HMAC control plane with nonce replay protection, opens a Warp tunnel (or--no-tunnellocalhost), probes capabilities (claude-code,codex,decopilot-sandbox), and exposes status viaGET /api/links/me;bun run smoke:linkchecks link health.vmMapplus per‑thread pins (threads.sandbox_provider_kind/threads.harness_id), streams to the desktop daemon over its tunnel with HMAC signing, supports cancellation, and bridges SSE back to chat; returns 409 with clear reasons when the link is offline or missing a capability.LinkRegistry(NATS JetStream / In‑mem) backs/api/links(register/heartbeat/shutdown/status); wire protocol, HMAC, and fixtures live underapps/mesh/src/links/protocol; dev can auto‑spawn the daemon and mint a local session whenMESH_ALLOW_LOCALHOST_LINKS=1; CLI adds--local-sandbox-providerand--no-tunnel.claude-code-models.tsandcodex-models.ts; local‑CLI provider adapters and thecli-activatepath removed;@anthropic-ai/claude-agent-sdkmoved tooptionalDependencies.Migration
threads.run_locally; drophostsandbox rows; renamesandbox_runner_state.runner_kind→sandbox_provider_kind; add per‑thread pins (threads.sandbox_provider_kind,threads.harness_id); re‑keyconnections.metadata.vmMapfrom 2‑level to 3‑level and finish the missedrunnerKind→sandboxProviderKindrewrite with explicit casts and COALESCE; purgeai_provider_keyssentinels forclaude-code/codex.Written for commit 3ab12d0. Summary will update on new commits. Review in cubic