
fix(createSandbox): report token usage on reused sandboxes #683

Open
masone wants to merge 2 commits into mattpocock:main from masone:fix/warm-path-token-capture

masone commented May 21, 2026

The bug

When you reuse a sandbox across multiple runs:

```ts
const sandbox = await createSandbox({ branch, sandbox: docker(...) })
const result = await sandbox.run({ agent, prompt })
result.iterations[0].sessionFilePath // ❌ always undefined
// → no session file captured, so no token usage is reported
```

The one-shot run() reports token usage fine. Only the reused
createSandbox + sandbox.run() path comes back empty.

Why

After each run, the orchestrator copies the agent's session file out of the
sandbox to the host and reads token usage from it. That step only happens when
it has a handle to the sandbox's filesystem (bindMountHandle).

run() passes that handle through. createSandbox, however, builds its own internal
factory to reuse the container across runs, and that factory forwarded
hostWorktreePath, sandboxRepoPath and applyToHost but not the handle. So the
copy-out step was silently skipped on every run, and with no session file there
was nothing to read usage from.
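A minimal sketch of the gating described above. Only the field names hostWorktreePath, sandboxRepoPath, applyToHost and bindMountHandle come from this PR; the `RunContext` / `BindMountHandle` shapes, `captureSessionFile` and the two factory functions are hypothetical, and the real types in src/createSandbox.ts and the orchestrator may differ.

```ts
// Hypothetical shapes for illustration only.
interface BindMountHandle {
  // Copies a file from inside the sandbox to a path on the host.
  copyFileOut(sandboxPath: string, hostPath: string): Promise<void>;
}

interface RunContext {
  hostWorktreePath: string;
  sandboxRepoPath: string;
  applyToHost: () => Promise<void>;
  bindMountHandle?: BindMountHandle;
}

// Orchestrator-side capture step: it only runs when a handle is present.
// Without one it returns undefined and nothing signals that it was skipped.
async function captureSessionFile(
  ctx: RunContext,
  sandboxSessionPath: string,
): Promise<string | undefined> {
  if (!ctx.bindMountHandle) return undefined;
  const hostPath = `${ctx.hostWorktreePath}/session.jsonl`;
  await ctx.bindMountHandle.copyFileOut(sandboxSessionPath, hostPath);
  return hostPath;
}

// Pre-fix reuse factory: copies the paths and applyToHost, drops the handle.
function reuseContextBeforeFix(base: RunContext): RunContext {
  return {
    hostWorktreePath: base.hostWorktreePath,
    sandboxRepoPath: base.sandboxRepoPath,
    applyToHost: base.applyToHost,
  };
}

// Post-fix reuse factory: forwards the handle as well.
function reuseContextAfterFix(base: RunContext): RunContext {
  return { ...reuseContextBeforeFix(base), bindMountHandle: base.bindMountHandle };
}
```

With the pre-fix context, `captureSessionFile` returns undefined even though the base context had a handle. That is the same "silently skipped" behaviour reused sandboxes showed.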

The fix

Forward the bind-mount handle through the reuse factory, the same way run()
already does. The handle is narrowed to the bind-mount provider case (so an
isolated/no-sandbox handle is never mistaken for a bind-mount one). With the
handle present, the existing capture code runs and sessionFilePath / token
usage are populated on reused sandboxes too.
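The narrowing can be sketched with a discriminated union. The `SandboxHandle` union, its `kind` tags and `buildPerRunContext` are hypothetical names, not the PR's actual code. The point is that the handle reaches the per-run context only in the bind-mount case.

```ts
// Hypothetical provider-handle union; tag names and fields are assumptions.
interface BindMountHandle {
  copyFileOut(sandboxPath: string, hostPath: string): Promise<void>;
}

type SandboxHandle =
  | { kind: "bind-mount"; handle: BindMountHandle }
  | { kind: "isolated"; containerId: string }
  | { kind: "none" };

interface PerRunContext {
  hostWorktreePath: string;
  sandboxRepoPath: string;
  bindMountHandle?: BindMountHandle;
}

// Narrowing: only the bind-mount variant yields a handle worth forwarding.
function bindMountHandleOf(h: SandboxHandle): BindMountHandle | undefined {
  return h.kind === "bind-mount" ? h.handle : undefined;
}

// Post-fix reuse factory: the key is present only for bind-mount providers,
// so an isolated or no-sandbox handle can never reach the capture step.
function buildPerRunContext(
  h: SandboxHandle,
  hostWorktreePath: string,
  sandboxRepoPath: string,
): PerRunContext {
  const bindMountHandle = bindMountHandleOf(h);
  return {
    hostWorktreePath,
    sandboxRepoPath,
    ...(bindMountHandle ? { bindMountHandle } : {}),
  };
}
```

Omitting the key entirely, rather than setting it to undefined, keeps isolated and no-sandbox contexts identical to what the pre-fix factory produced. Only bind-mount behaviour changes.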

Changes

  • src/createSandbox.ts — carry the bind-mount handle into the handle context
    and forward it into the orchestrator's per-run context, narrowed to
    bind-mount providers.
  • src/createSandbox.test.ts — two regression tests (see below).
  • .changeset/ — patch bump.

Tests

Added two tests at the createSandbox level:

  • reused bind-mount sandbox captures the session + reports usage:
    sessionFilePath and usage are populated. Verified to fail on the
    pre-fix factory (expected undefined to be defined).
  • a non-bind-mount (isolated) provider does not capture: a sessionId is
    still extracted, but the handle isn't forwarded, so capture is skipped. This
    pins the narrowing.
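The two regression cases reduce to the checks below. They are sketched against a simplified, hypothetical `runIteration` rather than the real orchestrator or test runner, and the session IDs and paths are made up.

```ts
// Hypothetical stand-ins; none of these names come from the PR.
interface BindMountHandle {
  copyFileOut(sandboxPath: string, hostPath: string): Promise<void>;
}

interface IterationResult {
  sessionId: string;
  sessionFilePath?: string;
}

// Simplified iteration: the sessionId is known regardless of provider,
// but copying the session file out is gated on the handle.
async function runIteration(
  sessionId: string,
  bindMountHandle: BindMountHandle | undefined,
): Promise<IterationResult> {
  if (!bindMountHandle) return { sessionId };
  const hostPath = `/tmp/captured/${sessionId}.jsonl`;
  await bindMountHandle.copyFileOut(`/sandbox/${sessionId}.jsonl`, hostPath);
  return { sessionId, sessionFilePath: hostPath };
}

// Case 1: a reused bind-mount sandbox captures on every run.
async function reusedBindMountCaptures(): Promise<boolean> {
  const copied: string[] = [];
  const handle: BindMountHandle = {
    async copyFileOut(_from: string, to: string) {
      copied.push(to);
    },
  };
  // Same handle across two runs, mimicking two sandbox.run() calls.
  const first = await runIteration("s1", handle);
  const second = await runIteration("s2", handle);
  return (
    first.sessionFilePath !== undefined &&
    second.sessionFilePath !== undefined &&
    copied.length === 2
  );
}

// Case 2: an isolated provider still yields a sessionId but never copies.
async function isolatedSkipsCapture(): Promise<boolean> {
  const result = await runIteration("s3", undefined);
  return result.sessionId === "s3" && result.sessionFilePath === undefined;
}
```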

masone added 2 commits May 21, 2026 13:56
The reuse factory built by createSandbox passed only hostWorktreePath,
sandboxRepoPath and applyToHost into the orchestrator's withSandbox
context, omitting bindMountHandle. The orchestrator gates session capture
on bindMountHandle, so capture (and the token-usage parsing that depends
on the captured JSONL) was silently skipped on every sandbox.run() —
reused sandboxes reported zero token usage while one-shot run(), whose
primary factory forwards the handle, reported it correctly.

Forward the handle (narrowed to the bind-mount provider case) so capture
runs on reused sandboxes too, matching run().

Add two tests for the bindMountHandle forwarding fix:
- a reused bind-mount sandbox captures the session JSONL and reports
  token usage (sessionFilePath + usage populated). This fails on the
  pre-fix factory (sessionFilePath undefined).
- a non-bind-mount (isolated) provider does NOT capture: a sessionId is
  still extracted, but the handle is not forwarded, so capture is skipped.

The mock bind-mount handle's copyFileOut synthesizes the session JSONL,
avoiding the need for a writable in-sandbox projects dir; HOME is
redirected to a temp dir so capture never touches the real ~/.claude.
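Below is a sketch of the mock described in this commit. It assumes that assistant entries in the session JSONL carry a `message.usage` object with `input_tokens` / `output_tokens` fields. That is an assumption about the agent's format and is not confirmed here. `mockBindMountHandle` and `parseTokenUsage` are hypothetical names, and an in-memory map stands in for the host filesystem.

```ts
// Hypothetical names throughout; the JSONL shape is an assumption.
interface BindMountHandle {
  copyFileOut(sandboxPath: string, hostPath: string): Promise<void>;
}

interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
}

// Mock handle: instead of reading a real file inside the sandbox, it writes a
// synthesized session JSONL into an in-memory "host filesystem".
function mockBindMountHandle(hostFs: Map<string, string>): BindMountHandle {
  const lines = [
    { type: "user", message: { content: "hi" } },
    { type: "assistant", message: { usage: { input_tokens: 120, output_tokens: 30 } } },
    { type: "assistant", message: { usage: { input_tokens: 80, output_tokens: 20 } } },
  ];
  return {
    async copyFileOut(_sandboxPath: string, hostPath: string) {
      hostFs.set(hostPath, lines.map((l) => JSON.stringify(l)).join("\n"));
    },
  };
}

// Sums usage across every JSONL line that carries a message.usage object.
function parseTokenUsage(jsonl: string): TokenUsage {
  const total: TokenUsage = { inputTokens: 0, outputTokens: 0 };
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue;
    const entry = JSON.parse(line) as {
      message?: { usage?: { input_tokens?: number; output_tokens?: number } };
    };
    const usage = entry.message?.usage;
    if (!usage) continue;
    total.inputTokens += usage.input_tokens ?? 0;
    total.outputTokens += usage.output_tokens ?? 0;
  }
  return total;
}
```

Because the host side stays in memory, this sketch never touches a real home directory. The actual test gets the same isolation by redirecting HOME to a temp dir.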

vercel Bot commented May 21, 2026

@masone is attempting to deploy a commit to the Matt Pocock's projects Team on Vercel.

A member of the Team first needs to authorize it.

mattpocock closed this May 29, 2026
mattpocock reopened this May 29, 2026