fix(decopilot): defer ensureVm() so POST /messages doesn't require a sandbox #3434

Merged
tlgimenes merged 1 commit into main from tlgimenes/debug-ci-job-77291089797 on May 22, 2026
@tlgimenes tlgimenes commented May 22, 2026

What is this contribution about?

Fixes the multi-pod CI failure in tests/multi-pod/scenarios/attach-cross-pod.test.ts, where POST /api/:org/decopilot/threads/:id/messages returned 500 "No link daemon registered for user ...". PR #3417 added an eager ensureVm() call to that handler. ensureVm() provisions a sandbox using the kind returned by resolveDefaultSandboxProviderKind, which defaults to "remote-user" when no link is registered and STUDIO_SANDBOX_RUNNER is unset (as in the multi-pod compose setup). provisionSandbox then throws because there is no link daemon.

resolveDispatchTarget only ever read vm.sandboxProviderKind off the returned entry, so this PR refactors it to take the kind directly and drops the eager ensureVm() call from the POST handler. The built-in tools layer already provisions the sandbox lazily on the first VM-tool invocation (ensureHandle in apps/mesh/src/harnesses/decopilot/built-in-tools/index.ts), so runs that never touch a VM-backed tool no longer pay the provisioning cost or depend on its hard prerequisites.
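
For reviewers who want the shape of the refactor at a glance, here is a minimal sketch of the signature change. Everything in it is an assumption made for illustration: the `VmEntry` shape, the `targetForKind` helper, the string return values, and the two function names are hypothetical and are not copied from the mesh source.

```typescript
// All names and shapes below are illustrative assumptions, not the mesh source.
type SandboxProviderKind = "remote-user" | "docker";

interface VmEntry {
  sandboxProviderKind: SandboxProviderKind;
}

// Hypothetical stand-in for whatever the real function derives from the kind.
function targetForKind(kind: SandboxProviderKind): string {
  return kind === "docker" ? "local" : "link";
}

// Before: the caller must provision a VM only to read one field off the entry.
async function resolveDispatchTargetFromVm(
  ensureVm: () => Promise<VmEntry>,
): Promise<string> {
  const vm = await ensureVm(); // rejects when the sandbox cannot be provisioned
  return targetForKind(vm.sandboxProviderKind);
}

// After: the kind is passed in directly, so no provisioning happens here.
function resolveDispatchTargetFromKind(kind: SandboxProviderKind): string {
  return targetForKind(kind);
}
```

In this sketch the second form cannot fail for provisioning reasons, because it never receives a provisioning callback in the first place.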

How to Test

  1. Re-run the Multi-Pod Tests workflow on this branch — attach-cross-pod.test.ts should pass.
  2. cd apps/mesh && bun test src/links/resolve-dispatch-target.test.ts src/api/routes/decopilot/routes.test.ts — all 18 tests pass.
  3. Open Decopilot locally with bun run dev, send a message that triggers bash/read/write — the VM is provisioned on first tool call (unchanged behavior).

Migration Notes

None — resolveDispatchTarget is internal to mesh and has one production caller.

Review Checklist

  • PR title is clear and descriptive
  • Changes are tested and working
  • Documentation is updated (if needed)
  • No breaking changes

Summary by cubic

Defers VM provisioning for Decopilot so POST /api/:org/decopilot/threads/:id/messages no longer requires a sandbox, fixing 500s in multi-pod CI without a link daemon. Dispatch target is now resolved from the sandbox provider kind, and the VM is created lazily on first tool use.

  • Bug Fixes

    • Prevent 500 "No link daemon registered…" by removing eager ensureVm() in POST /messages.
    • Multi-pod CI path works again; runs that never use VM-backed tools skip provisioning.
  • Refactors

    • resolveDispatchTarget now takes sandboxProviderKind instead of a VM entry and avoids provisioning.
    • Route handler resolves and persists the kind, and tests were updated to stub kind resolution instead of ensureVm.

Written for commit 8fa7d10.

…sandbox

The eager `ensureVm()` added to `POST /api/:org/decopilot/threads/:id/messages`
in #3417 made the route fail when no sandbox provider could be provisioned —
e.g. CI environments that don't run a link daemon and don't set
STUDIO_SANDBOX_RUNNER (default `"remote-user"` ⇒ `provisionSandbox` throws
"No link daemon registered for user ..."). The cross-pod attach scenario in
`tests/multi-pod/scenarios/attach-cross-pod.test.ts` only drives the mock
AI provider and never needs a real VM, yet POST `/messages` was 500-ing
before it ever enqueued the run.

`resolveDispatchTarget` only consumed `vm.sandboxProviderKind` from the
returned entry, so refactor it to take the kind directly and drop the
`ensureVm` call from the POST handler. The built-in tools layer already
provisions the sandbox lazily on the first VM-tool invocation via
`ensureHandle` in `apps/mesh/src/harnesses/decopilot/built-in-tools/index.ts`,
so runs that never touch a VM-backed tool no longer pay the provisioning
cost (or its hard prerequisites).
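
The lazy provisioning described above can be pictured roughly as follows. This is a sketch under stated assumptions rather than the code in `built-in-tools/index.ts`: `createLazySandbox`, the `SandboxHandle` shape, and the retry-after-failure behavior are all hypothetical; only the name `ensureHandle` comes from this PR.

```typescript
// Hypothetical handle type; the real sandbox handle is not shown in this PR.
interface SandboxHandle {
  id: string;
}

// Wraps a provisioning function so it runs at most once, on first demand.
function createLazySandbox(provision: () => Promise<SandboxHandle>) {
  let pending: Promise<SandboxHandle> | undefined;

  return function ensureHandle(): Promise<SandboxHandle> {
    if (pending === undefined) {
      pending = provision().catch((err: unknown) => {
        // Forget the failed attempt so a later tool call can retry.
        pending = undefined;
        throw err;
      });
    }
    return pending;
  };
}
```

A run that never calls `ensureHandle()` never invokes `provision`, which is the property the multi-pod scenario relies on.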

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions

🧪 Benchmark

Should we run the Virtual MCP strategy benchmark for this PR?

React with 👍 to run the benchmark.

| Reaction | Action |
| --- | --- |
| 👍 | Run quick benchmark (10 & 128 tools) |

Benchmark will run on the next push after you react.

@github-actions

Release Options

Suggested: Patch (2.339.11) — based on fix: prefix

React with an emoji to override the release type:

| Reaction | Type | Next Version |
| --- | --- | --- |
| 👍 | Prerelease | 2.339.11-alpha.1 |
| 🎉 | Patch | 2.339.11 |
| ❤️ | Minor | 2.340.0 |
| 🚀 | Major | 3.0.0 |

Current version: 2.339.10

Note: If multiple reactions exist, the smallest bump wins. If no reactions, the suggested bump is used (default: patch).

@cubic-dev-ai cubic-dev-ai Bot left a comment

No issues found across 4 files

@tlgimenes tlgimenes merged commit 91cbc97 into main May 22, 2026
16 of 17 checks passed
@tlgimenes tlgimenes deleted the tlgimenes/debug-ci-job-77291089797 branch May 22, 2026 00:20
tlgimenes added a commit that referenced this pull request May 22, 2026
…nk-gated (#3436)

Without an explicit runner the env defaults to "remote-user", and
resolveDispatchTarget then 409s with link_offline because CI doesn't run a
link daemon. PR #3434 removed the eager ensureVm() but left the dispatch
check in place, so attach-cross-pod still failed at POST /messages.
Pinning to "docker" makes resolveDispatchTarget short-circuit to the local
cluster default; no sandbox is actually provisioned because ensureVm is
lazy and the mock-ai scenario never emits tool calls.
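
To make the interaction between the runner pin and the dispatch check concrete, here is a simplified sketch. The function names, the `DispatchResult` shape, and the `linkOnline` flag are assumptions for illustration; the real kind resolution also takes link registration into account, which is omitted here.

```typescript
// Assumed shapes for illustration only; not copied from the mesh source.
type SandboxProviderKind = "remote-user" | "docker";

type DispatchResult =
  | { ok: true; target: "local-cluster" | "link" }
  | { ok: false; status: 409; code: "link_offline" };

// Reads the runner pin from the environment; falls back to "remote-user".
function resolveKind(env: Record<string, string | undefined>): SandboxProviderKind {
  return env["STUDIO_SANDBOX_RUNNER"] === "docker" ? "docker" : "remote-user";
}

// "docker" short-circuits to the local cluster; "remote-user" needs a link.
function resolveDispatch(kind: SandboxProviderKind, linkOnline: boolean): DispatchResult {
  if (kind === "docker") return { ok: true, target: "local-cluster" };
  if (!linkOnline) return { ok: false, status: 409, code: "link_offline" };
  return { ok: true, target: "link" };
}
```

With an empty environment and no link, this sketch yields the 409 `link_offline` result; with `STUDIO_SANDBOX_RUNNER` set to `"docker"` it resolves to the local cluster without consulting the link at all.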

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>