feat(ai): claude-cli recipe for native gateway-based subagent dispatch by brettdavies · Pull Request #2277 · garrytan/gbrain

brettdavies · 2026-06-18T16:38:27Z

Closes #2276, closes #334

Summary

Operators with a Claude Max subscription can now route subagent dispatch to their existing OAuth session via the local claude binary, without an ANTHROPIC_API_KEY. After this PR, gbrain config set models.tier.subagent claude-cli:claude-sonnet-4-6 (or any per-task key like models.dream.patterns) dispatches through the gateway-loop the same way every other non-Anthropic recipe does. Per-call routing is preserved: a worker can serve claude-cli:* for one job and litellm:gpt-5.4 for the next, no env-var switch.

The fix

A native claude-cli recipe in the registry. The recipe + its ai-sdk LanguageModelV2 provider implementation let gateway.toolLoop() dispatch to the local claude binary directly, parse the JSON envelope, and surface parallel <use_tools> blocks as native ai-sdk tool-call content parts. End-to-end:

- Recipe (src/core/ai/recipes/claude-cli.ts): tier: 'native', chat-only (no embedding / no expansion). Models: claude-opus-4-7 / claude-sonnet-4-6 / claude-haiku-4-5-20251001. supports_tools: true, supports_subagent_loop: true, auth_env.required: [] (CLI owns auth via its OAuth session).
- Provider (src/core/ai/providers/claude-cli-language-model.ts): ai-sdk LanguageModelV2. doGenerate spawns claude --print --output-format json --model <X> --disable-slash-commands --system-prompt <gbrain prompt> from a dedicated tmpdir (skips local CLAUDE.md auto-discovery), parses the JSON envelope, and extracts <use_tools> blocks back into native tool-call content parts. Parallel tool calls in one block round-trip cleanly.
- Registry + gateway wiring (src/core/ai/recipes/index.ts, src/core/ai/types.ts, src/core/ai/gateway.ts): claude-cli joins the Implementation union; instantiateChat / instantiateExpansion / instantiateEmbedding gain case 'claude-cli': (chat returns the LanguageModelV2; expansion mirrors it for a future expansion touchpoint; embedding throws like native-anthropic does).
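The <use_tools> extraction step can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PR's code: the `TextPart` / `ToolCallPart` types are local stand-ins for the ai-sdk content parts, and `stripFence` is a hypothetical helper. The fallbacks (unterminated block, malformed JSON, missing id) follow the behaviour described in the commit messages.

```typescript
import { randomBytes } from "node:crypto";

// Local stand-ins for the ai-sdk content parts; the field names are an
// assumption for this sketch, not copied from the PR.
interface TextPart { type: "text"; text: string }
interface ToolCallPart {
  type: "tool-call";
  toolCallId: string;
  toolName: string;
  input: string; // stringified JSON
}
type ContentPart = TextPart | ToolCallPart;

const OPEN = "<use_tools>";
const CLOSE = "</use_tools>";
const FENCE = "`".repeat(3); // markdown code fence marker

// Hypothetical helper: drop an optional json code fence around the array.
function stripFence(body: string): string {
  if (!body.startsWith(FENCE)) return body;
  const inner = body.slice(FENCE.length).replace(/^json/i, "");
  const close = inner.lastIndexOf(FENCE);
  return (close === -1 ? inner : inner.slice(0, close)).trim();
}

export function extractToolCalls(raw: string): ContentPart[] {
  const textOnly: ContentPart[] = [{ type: "text", text: raw }];
  const start = raw.indexOf(OPEN);
  if (start === -1) return textOnly;
  const end = raw.indexOf(CLOSE, start + OPEN.length);
  if (end === -1) return textOnly; // unterminated block: fall back to text

  const body = stripFence(raw.slice(start + OPEN.length, end).trim());
  let parsed: unknown;
  try {
    parsed = JSON.parse(body);
  } catch {
    return textOnly; // malformed JSON: fall back to text
  }
  if (!Array.isArray(parsed) || parsed.length === 0) return textOnly;

  const calls: ToolCallPart[] = [];
  for (const entry of parsed) {
    if (typeof entry !== "object" || entry === null) return textOnly;
    const { id, name, input } = entry as Record<string, unknown>;
    if (typeof name !== "string") return textOnly;
    calls.push({
      type: "tool-call",
      // Synthesize an id when the model omits one.
      toolCallId:
        typeof id === "string" && id.length > 0
          ? id
          : `toolu_claude_cli_${randomBytes(6).toString("hex")}`,
      toolName: name,
      input: JSON.stringify(input ?? {}),
    });
  }
  // Keep any prose outside the block as a leading text part.
  const prose = (raw.slice(0, start) + raw.slice(end + CLOSE.length)).trim();
  return prose ? [{ type: "text", text: prose }, ...calls] : calls;
}
```

Because every entry of the array becomes its own tool-call part, several parallel calls in one block survive the round trip with distinct ids.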

Why a recipe instead of #334's MessagesClient adapter

The original #334 proposal injected a MessagesClient adapter at jobs.ts worker startup behind GBRAIN_USE_CLAUDE_CLI=1. That adapter only runs on the legacy Anthropic-direct path. Under agent.use_gateway_loop = true (the production default for every non-Anthropic recipe), dispatch flows through gateway.toolLoop() rather than through client.create() at subagent.ts:506, so the adapter never fires.

Per-call routing via a proper recipe is the architecture that extends correctly into gateway-loop world. It also gives the operator a clean per-call routing knob between OAuth-subscription dispatch and API-key dispatch on the same model name, in the same worker, without a process-level switch.
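To make the per-call routing idea concrete, here is a small sketch of resolving a `provider:model` string to a dispatcher on every call. All names (`parseModelString`, `route`, `Dispatcher`, the toy registry) are hypothetical and do not reflect gbrain's actual registry API. The sketch also throws on an unknown provider, whereas the reproduction steps describe gbrain warning and falling back to a default.

```typescript
interface ModelRef { provider: string; model: string }

// Split "<provider>:<model>" on the first colon only, so model ids that
// themselves contain colons stay intact.
export function parseModelString(value: string): ModelRef {
  const idx = value.indexOf(":");
  if (idx <= 0 || idx === value.length - 1) {
    throw new Error(`expected "<provider>:<model>", got "${value}"`);
  }
  return { provider: value.slice(0, idx), model: value.slice(idx + 1) };
}

type Dispatcher = (model: string, prompt: string) => Promise<string>;

// Resolve the dispatcher per call: no process-level switch is consulted.
export function route(registry: Map<string, Dispatcher>, modelString: string) {
  const { provider, model } = parseModelString(modelString);
  const dispatch = registry.get(provider);
  if (!dispatch) throw new Error(`unrecognized provider "${provider}"`);
  return { provider, model, dispatch };
}

// Toy registry standing in for real recipes.
const registry = new Map<string, Dispatcher>([
  ["claude-cli", async (model, prompt) => `[claude-cli/${model}] ${prompt}`],
  ["litellm", async (model, prompt) => `[litellm/${model}] ${prompt}`],
]);
```

With this shape, one worker can call `route(registry, "claude-cli:claude-sonnet-4-6")` for one job and `route(registry, "litellm:gpt-5.4")` for the next.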

Context isolation

The recipe spawns from a clean tmpdir + --disable-slash-commands + --system-prompt, which is the maximum suppression preserving OAuth/Max-subscription auth. --bare would also strip ~/.claude/CLAUDE.md but forces ANTHROPIC_API_KEY auth, defeating the whole point. The ~42K cached tokens from user-level instructions stay as an unavoidable trade-off on the subscription path.
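A rough sketch of the isolated spawn, assuming Node's `child_process` API. The flags are the ones named in this PR; the helper names, the tmpdir prefix, and the `bin` parameter are illustrative assumptions rather than the provider's actual code.

```typescript
import { spawn } from "node:child_process";
import { mkdtempSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// argv as described in the PR; --system-prompt replaces the default prompt.
export function buildClaudeArgs(model: string, systemPrompt: string): string[] {
  return [
    "--print",
    "--output-format", "json",
    "--model", model,
    "--disable-slash-commands",
    "--system-prompt", systemPrompt,
  ];
}

// A fresh empty directory, so local CLAUDE.md auto-discovery from cwd
// has nothing to find. The prefix is a made-up name for this sketch.
export function makeIsolatedCwd(): string {
  return mkdtempSync(join(tmpdir(), "gbrain-claude-cli-"));
}

// Spawn the CLI from the clean cwd. Passing an AbortSignal makes Node send
// killSignal (SIGTERM here) to the child when the signal aborts.
export function spawnClaude(
  bin: string,
  model: string,
  systemPrompt: string,
  signal?: AbortSignal,
) {
  return spawn(bin, buildClaudeArgs(model, systemPrompt), {
    cwd: makeIsolatedCwd(),
    stdio: ["pipe", "pipe", "pipe"],
    signal,
    killSignal: "SIGTERM",
  });
}
```

Note that `--bare` is deliberately absent, for the reason given above: it would force ANTHROPIC_API_KEY auth.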

Reproduction (before this PR)

1. Brain with Claude Max subscription and claude CLI logged in. No ANTHROPIC_API_KEY in env.
2. gbrain config set models.tier.subagent claude-cli:claude-sonnet-4-6.
3. gbrain models → warns "unrecognized provider" and falls back to TIER_DEFAULTS.subagent (anthropic:claude-sonnet-4-6).
4. Submit any subagent job → throws AIConfigError: Anthropic ... requires ANTHROPIC_API_KEY.

After this PR: step 3 resolves to the claude-cli recipe; step 4 dispatches through the local claude binary; the job runs against the OAuth session.

Tests

- test/claude-cli-recipe.test.ts: 16 tests, 46 assertions. Covers recipe registration; spawn args + tmpdir isolation; JSON envelope parse (text-only, tool-call, parallel tool-call, mixed); <use_tools> block extraction back to ai-sdk tool-call content parts; the "model offered tools but declines to use them" case, asserting a clean text-only return so gateway.toolLoop treats it as a final answer rather than wedging; auth-env declared as empty array.
- bun run typecheck clean. bun run verify green.
- Verified end-to-end on a production brain (Claude Max subscription, no ANTHROPIC_API_KEY): gbrain dream --phase patterns with models.dream.patterns = claude-cli:claude-sonnet-4-6 runs through parallel tool calls, persists turns, resumes cleanly across worker recycles.
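For readers unfamiliar with the envelope these tests exercise, the parse step might look like the sketch below. The `type` / `subtype` / `is_error` / `result` / `usage` keys come from the commit messages; the fields inside `usage` and the name `parseEnvelope` are assumptions made for illustration.

```typescript
// Fields inside usage are assumed; the PR only says token totals propagate.
interface CliUsage { input_tokens?: number; output_tokens?: number }

interface CliEnvelope {
  type?: string;
  subtype?: string;
  is_error?: boolean;
  result?: string;
  usage?: CliUsage;
}

export function parseEnvelope(stdout: string): { text: string; usage: CliUsage } {
  let parsed: unknown;
  try {
    parsed = JSON.parse(stdout);
  } catch {
    // Include a raw-output excerpt so the failure is diagnosable.
    throw new Error(`claude-cli emitted non-JSON output: ${stdout.slice(0, 200)}`);
  }
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("claude-cli envelope is not a JSON object");
  }
  const env = parsed as CliEnvelope;
  if (env.type !== "result" || env.is_error === true || env.subtype !== "success") {
    throw new Error(`claude-cli returned an error envelope (subtype: ${env.subtype ?? "unknown"})`);
  }
  return { text: env.result ?? "", usage: env.usage ?? {} };
}
```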

Coordination with the gateway-loop reliability fix

The gateway-loop tool-result-turn persistence fix (separate PR, separate issue; see fix(subagent): persist + reconcile tool-result turns in the gateway loop) is REQUIRED for this recipe to work under load. Without that fix, parallel tool calls dead-letter on any mid-conversation interruption regardless of which recipe is dispatching. Recommend landing that fix first; either order works for the actual code change (this PR's gateway.ts touches are additive only).

…use)

Closes garrytan#334 (partially: text-only baseline; tool use lands in the next commit on this branch).

Adds a MessagesClient adapter that shells out to `claude --print --output-format json --model <model>` instead of the Anthropic SDK. When `GBRAIN_USE_CLAUDE_CLI=1` is set, the subagent worker registers the adapter in place of the SDK client; the default path (Anthropic SDK with ANTHROPIC_API_KEY) is unchanged when the env var is unset or set to anything else.

The benefit is that Claude Max subscribers can run Minions subagents against their existing OAuth subscription, no ANTHROPIC_API_KEY needed.

New: src/core/minions/handlers/claude-cli-adapter.ts

- Implements the MessagesClient interface exported from subagent.ts.
- Strips provider prefixes (`anthropic:`, `litellm:`) from the model id because `claude --print` only accepts CLI-native aliases (`sonnet`, `opus`, `haiku`, or the bare `claude-*-N-M` form).
- Flattens the Anthropic messages array into a single text prompt for claude-cli stdin. Tool blocks (tool_use / tool_result) are stringified as placeholders so multi-turn conversations stay coherent in this baseline; native tool_use round-tripping is the follow-up commit.
- Spawns claude with stdio piped, captures stdout, parses the `{type:"result", subtype:"success", result, usage, ...}` JSON envelope, and returns it as a properly shaped Anthropic.Message with `stop_reason: 'end_turn'`.
- Token totals propagate from the claude usage block so the subagent handler's `ctx.updateTokens()` reports usable numbers.
- AbortSignal is wired through to SIGTERM the child so the subagent loop's cancellation path stays correct.

Modified: src/commands/jobs.ts (worker registration)

- Conditionally constructs a MessagesClient via the new adapter when GBRAIN_USE_CLAUDE_CLI=1.
- Passes it into makeSubagentHandler({ engine, client: subagentClient }).
- Logs `[minion worker] subagent routing via claude-cli (GBRAIN_USE_CLAUDE_CLI=1)` on startup so the env var status is operator-visible.

Limitations of this commit (addressed in the follow-up):

- Tool use is not yet supported. Tools in params.tools are ignored; the adapter returns a single text block with stop_reason='end_turn'.
- Token counts come from claude-cli's reporting and may not match the Anthropic API's accounting precisely (especially for cache tiers).

Original design from garrytan#334; this commit preserves that author's attribution. The follow-up commits on this branch carry the tool-use implementation.
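The prefix stripping and message flattening described in this commit can be pictured with the sketch below. The block shapes mirror Anthropic's text / tool_use / tool_result content blocks in simplified form; the placeholder format and the function names are assumptions, since the commit message does not show them.

```typescript
// Prefixes named in the commit message.
const PROVIDER_PREFIXES = ["anthropic:", "litellm:"] as const;

export function stripProviderPrefix(modelId: string): string {
  for (const prefix of PROVIDER_PREFIXES) {
    if (modelId.startsWith(prefix)) return modelId.slice(prefix.length);
  }
  return modelId;
}

// Simplified message shapes for this sketch only.
type Block =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown }
  | { type: "tool_result"; tool_use_id: string; content: string };

interface Message { role: "user" | "assistant"; content: string | Block[] }

// Tool blocks become textual placeholders (format is hypothetical) so the
// flattened transcript still reads coherently across turns.
function renderBlock(block: Block): string {
  switch (block.type) {
    case "text":
      return block.text;
    case "tool_use":
      return `[tool_use ${block.name} ${JSON.stringify(block.input)}]`;
    case "tool_result":
      return `[tool_result ${block.tool_use_id}: ${block.content}]`;
  }
}

export function flattenMessages(messages: Message[]): string {
  return messages
    .map((m) => {
      const body =
        typeof m.content === "string"
          ? m.content
          : m.content.map(renderBlock).join("\n");
      return `${m.role}: ${body}`;
    })
    .join("\n\n");
}
```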

…op of jarvisdoes baseline

Builds on the previous commit (jarvisdoes's garrytan#334 baseline) by adding three things the upstream issue called out as gaps or that surfaced during review:

1. Tool use support via system-prompt-instructed JSON emission.
2. Context isolation flags so claude-cli does not load operator-level CLAUDE.md, skills, and local project context into every subagent call.
3. Env var rename from GBRAIN_USE_CLAUDE_CLI=1 to GBRAIN_SUBAGENT_PROVIDER=claude-cli to match the existing GBRAIN_<noun>_<role>=<value> convention used by GBRAIN_CHAT_MODEL, GBRAIN_EMBEDDING_MODEL, GBRAIN_EXPANSION_MODEL.

## Tool use

The MessagesClient interface returns Anthropic.Message objects whose content array may include tool_use blocks. The subagent handler filters those blocks and dispatches each tool, so any backend that produces correctly shaped tool_use blocks gets the same loop behavior as the Anthropic SDK.

The adapter injects a system-prompt addendum describing the tool registry plus an emission protocol:

    <use_tools>
    [{"id": "...", "name": "...", "input": {...}}, ...]
    </use_tools>

After the response comes back, extractToolCalls() scans for the block, parses the JSON (tolerant of optional ```json fencing), and converts each entry into a tool_use content block. Multiple parallel tool calls in one turn are supported via the array shape; this is the exact case that breaks today on the codex-proxy / litellm GPT-5.x bridge where parallel tool-call response IDs get dropped.

Defensive fallbacks:

- Malformed JSON inside the block: drop to text-only, stop_reason='end_turn'.
- Unterminated <use_tools> (no close tag): drop to text-only.
- Model omits id field: adapter synthesizes a toolu_claude_cli_<rand> id.
- Empty response: still hand the subagent loop a well-formed content array so the .filter chain does not crash.

## Context isolation

claude-cli auto-discovers CLAUDE.md from cwd upward and injects the operator's skills + plugins + auto-memory into the default system prompt. On a real install that is ~42-65k tokens of contamination per subagent call, with both cost and behavioral consequences (the subagent picks up the operator's coding conventions, opinions, and preferences).

The maximum suppression that still preserves OAuth / Claude Max subscription auth is:

- Spawn from a dedicated clean cwd (tmpdir-based) so LOCAL CLAUDE.md auto-discovery has nothing to find. This removes about 13k tokens on a real gbrain install where CLAUDE.md is substantial.
- --disable-slash-commands so skill resolution does not pull in /skill-name handlers.
- --system-prompt <gbrain prompt> so the default system prompt is replaced rather than appended to.

The --bare flag would also strip user-level ~/.claude/CLAUDE.md but it forces ANTHROPIC_API_KEY auth, defeating the whole point of this adapter. The remaining ~42k cached tokens from user-level instructions are accepted as a cost-trivial trade-off because the Max subscription absorbs the per-call cost. Behavioral contamination is mitigated by gbrain's strong per-call system prompt overriding any operator-level drift.

## Env var rename

Surveyed all ~140 GBRAIN_* env vars in src/. The codebase uses three patterns: GBRAIN_NO_<feature> (negative toggles), GBRAIN_<noun>_<role>=<value> (routing keys), GBRAIN_ALLOW_<feature> (permissive toggles). GBRAIN_USE_* does not appear anywhere except jarvisdoes's original commit; it would introduce a fourth pattern.

GBRAIN_SUBAGENT_PROVIDER=claude-cli aligns with the routing-keys family and is value-extensible: adding codex-cli / meridian-proxy / etc. later means a new value, not a new env var. The scope ('SUBAGENT_*') is also unambiguous about which calls the toggle covers; GBRAIN_USE_CLAUDE_CLI was silent on whether it applied to all gbrain LLM calls or only the subagent path.

Unknown values are rejected with a fail-fast error message naming the two valid values rather than silently falling through to the default.

## Tests

New file: test/claude-cli-adapter.test.ts (12 tests, 33 assertions):

- Text-only round trip (single text block, usage propagation, end_turn).
- Provider prefix stripping ('anthropic:claude-sonnet-4-6' -> 'claude-sonnet-4-6').
- Single tool_use parsing.
- Multiple parallel tool calls in one block (the case that triggered the codex-proxy regression).
- Fenced JSON inside <use_tools> block.
- Model-omitted id gets synthesized to toolu_claude_cli_<rand>.
- Malformed JSON falls back to text.
- Unterminated block falls back to text.
- AbortSignal SIGTERMs the child.
- Error envelope rejected with informative message.
- Non-JSON output rejected with raw-output excerpt in the error.
- argv + cwd assertion: --disable-slash-commands + --system-prompt are present and cwd is the dedicated tmpdir.

Tests use a POSIX shell stub at GBRAIN_CLAUDE_CLI_BIN that emits a scripted --output-format json envelope, so the suite runs without claude-cli installed and without API credits.
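The system-prompt addendum from the "Tool use" section of this commit is not quoted in the message, so the sketch below only illustrates the idea: list the tool registry, then spell out the <use_tools> emission protocol. The wording of the instructions, the `ToolSpec` shape, and the function name are all assumptions.

```typescript
// Hypothetical tool description shape for this sketch.
interface ToolSpec {
  name: string;
  description: string;
  inputSchema: unknown; // e.g. a JSON Schema object
}

// Build the text appended to the system prompt when tools are offered.
export function renderToolProtocol(tools: ToolSpec[]): string {
  if (tools.length === 0) return ""; // no tools, no protocol instructions
  const registry = tools
    .map(
      (t) =>
        `- ${t.name}: ${t.description}\n  input schema: ${JSON.stringify(t.inputSchema)}`,
    )
    .join("\n");
  return [
    "You may call the following tools:",
    registry,
    "To call tools, emit exactly one block of this form:",
    "<use_tools>",
    '[{"id": "<unique id>", "name": "<tool name>", "input": { ... }}]',
    "</use_tools>",
    "Put several entries in the array to call tools in parallel.",
    "If no tool is needed, answer in plain text without the block.",
  ].join("\n");
}
```

Returning an empty string when no tools are offered keeps the text-only path free of protocol instructions.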

garrytan#334 baseline)

Replaces the MessagesClient adapter + GBRAIN_USE_CLAUDE_CLI=1 env-var gate from the previous commit on this branch with a proper gateway recipe. The recipe path gives per-call routing as a native capability: a model string like `claude-cli:claude-sonnet-4-6` lands here while a sibling `litellm:gpt-5.4` continues through the litellm-proxy / codex-proxy path in the same worker. No global env-var switch, no agent.use_gateway_loop bypass, no MessagesClient injection at jobs.ts worker startup.

The previous commit on this branch (jarvisdoes baseline) is preserved in the history for garrytan#334 authorship attribution. Its functional changes are backed out here because the recipe pattern is gbrain's established integration seam; introducing a parallel MessagesClient + env-var path would have created two routing mechanisms competing for the same job.

New: src/core/ai/recipes/claude-cli.ts

- Recipe declaration: id 'claude-cli', tier 'native', implementation 'claude-cli', chat-only (no embedding or expansion touchpoints).
- Models: claude-opus-4-7, claude-sonnet-4-6, claude-haiku-4-5-20251001.
- supports_tools and supports_subagent_loop both true.
- supports_prompt_cache false because the CLI handles caching internally and does not surface cache_control via the standard control plane.
- auth_env.required is the empty array because the CLI owns auth (OAuth session managed by `claude login`).
- Friendly aliases mirror the `anthropic` recipe: `sonnet`, `haiku`, `opus` and the same legacy-id rewrites for back-compat with stale config strings.

New: src/core/ai/providers/claude-cli-language-model.ts

- ClaudeCliLanguageModel class implementing the ai-sdk LanguageModelV2 interface.
- doGenerate: renders the ai-sdk prompt array into a system text + user text, injects the use_tools protocol instructions when tools are present, spawns `claude --print --output-format json --model <X> --disable-slash-commands --system-prompt <gbrain prompt>` from a dedicated tmpdir (contamination suppression: no local CLAUDE.md auto-discovery), parses the JSON envelope, extracts <use_tools> blocks, and returns ai-sdk-shaped LanguageModelV2Content (text + tool-call parts with stringified-JSON input matching the V2 contract).
- Tolerates fenced JSON inside use_tools blocks, malformed JSON (falls back to text), missing close tag (falls back to text), model-omitted ids (synthesizes toolu_claude_cli_<rand>).
- Parallel tool calls in one block round-trip cleanly: this is the case that drops IDs on the litellm + codex-proxy bridge today.
- AbortSignal SIGTERMs the child for proper cancellation.
- doStream throws not-supported (gateway.toolLoop is non-streaming).

Modified: src/core/ai/gateway.ts

- Adds case 'claude-cli' to instantiateChat (returns ClaudeCliLanguageModel).
- Adds case 'claude-cli' to instantiateExpansion (same wrapper, reserved for a future expansion touchpoint declaration).
- Adds case 'claude-cli' to instantiateEmbedding (throws, no embedding model, mirrors the native-anthropic path).
- Lazy require() at the call site keeps the gateway module load cheap for users who never use the claude-cli path.

Modified: src/core/ai/recipes/index.ts

- Registers `claudeCli` in the ALL[] array next to `anthropic`.

Modified: src/core/ai/types.ts

- Adds 'claude-cli' to the Implementation union so the gateway switch is exhaustive at compile time.

Reverted: src/commands/jobs.ts

- Drops the GBRAIN_USE_CLAUDE_CLI=1 env-var gate the prior commit added. Routing now happens at the gateway based on the model string.

Deleted: src/core/minions/handlers/claude-cli-adapter.ts

- The MessagesClient adapter is superseded by the recipe + LanguageModelV2 path. Two routing mechanisms competing for the same job would have forced users to reason about which one wins; the recipe is the single source of truth.

New file: test/claude-cli-recipe.test.ts (16 tests, 46 assertions):

- Recipe registration: getRecipe returns chat-only Recipe; aliases map short names (sonnet/haiku/opus) to canonical model ids.
- Text round trip: single text content block, usage propagation, stop finish reason.
- Provider prefix stripping.
- Single tool-call parsing.
- Multiple parallel tool calls in one block.
- Fenced JSON inside the block.
- Model-omitted id synthesizes toolu_claude_cli_<rand>.
- Malformed JSON falls back to text + stop reason.
- Unterminated block falls back to text + stop reason.
- Tools offered but model declines: returns text-only with stop reason so the gateway-loop treats it as a final answer rather than wedging for tool calls that never come.
- AbortSignal SIGTERMs the child.
- is_error envelope rejected.
- Non-JSON output rejected.
- doStream throws.
- argv + cwd assertion: --print, --disable-slash-commands, --system-prompt are present and cwd is the dedicated tmpdir.

Tests use a POSIX shell stub at GBRAIN_CLAUDE_CLI_BIN so the suite runs without claude-cli installed and without API credits.

End-to-end smoke verified against a real `claude --print --model haiku` invocation: model emitted `<use_tools>` block with toolu_add_001 + {"a":12,"b":30}, adapter parsed back into a `tool-call` content block, finishReason 'tool-calls'.
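The "tools offered but model declines" case comes down to how the finish reason is derived from the parsed content. A minimal sketch, assuming local `Part` types rather than the real ai-sdk ones, and a hypothetical function name:

```typescript
// Local stand-ins for the content parts returned to the gateway loop.
type Part =
  | { type: "text"; text: string }
  | { type: "tool-call"; toolCallId: string; toolName: string; input: string };

// 'tool-calls' tells the loop to dispatch tools and continue; 'stop' marks
// a final answer, so a text-only reply never leaves the loop waiting.
export function finishReasonFor(parts: Part[]): "tool-calls" | "stop" {
  return parts.some((p) => p.type === "tool-call") ? "tool-calls" : "stop";
}
```

Deriving the reason from what was actually parsed, rather than from whether tools were offered, is what lets the malformed-JSON and unterminated-block fallbacks also end as a plain final answer.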

jarvisdoes and others added 3 commits June 17, 2026 21:51

brettdavies mentioned this pull request Jun 18, 2026

feat(ai): claude-cli recipe for native gateway-based subagent dispatch — Claude Max subscription without ANTHROPIC_API_KEY #2276

Open

brettdavies wants to merge 3 commits into garrytan:master from brettdavies:feat/claude-cli-subagent-tool-use-reconcile
