feat(ai): claude-cli recipe for native gateway-based subagent dispatch #2277
Open
brettdavies wants to merge 3 commits into
…use)

Closes garrytan#334 (partially: text-only baseline; tool use lands in the next commit on this branch).

Adds a MessagesClient adapter that shells out to `claude --print --output-format json --model <model>` instead of the Anthropic SDK. When `GBRAIN_USE_CLAUDE_CLI=1` is set, the subagent worker registers the adapter in place of the SDK client; the default path (Anthropic SDK with ANTHROPIC_API_KEY) is unchanged when the env var is unset or set to anything else.

The benefit is that Claude Max subscribers can run Minions subagents against their existing OAuth subscription, no ANTHROPIC_API_KEY needed.

New: src/core/minions/handlers/claude-cli-adapter.ts
- Implements the MessagesClient interface exported from subagent.ts.
- Strips provider prefixes (`anthropic:`, `litellm:`) from the model id because `claude --print` only accepts CLI-native aliases (`sonnet`, `opus`, `haiku`, or the bare `claude-*-N-M` form).
- Flattens the Anthropic messages array into a single text prompt for claude-cli stdin. Tool blocks (tool_use / tool_result) are stringified as placeholders so multi-turn conversations stay coherent in this baseline; native tool_use round-tripping is the follow-up commit.
- Spawns claude with stdio piped, captures stdout, parses the `{type:"result", subtype:"success", result, usage, ...}` JSON envelope, and returns it as a properly shaped Anthropic.Message with `stop_reason: 'end_turn'`.
- Token totals propagate from the claude usage block so the subagent handler's `ctx.updateTokens()` reports usable numbers.
- AbortSignal is wired through to SIGTERM the child so the subagent loop's cancellation path stays correct.

Modified: src/commands/jobs.ts (worker registration)
- Conditionally constructs a MessagesClient via the new adapter when GBRAIN_USE_CLAUDE_CLI=1.
- Passes it into makeSubagentHandler({ engine, client: subagentClient }).
- Logs `[minion worker] subagent routing via claude-cli (GBRAIN_USE_CLAUDE_CLI=1)` on startup so the env var status is operator-visible.

Limitations of this commit (addressed in the follow-up):
- Tool use is not yet supported. Tools in params.tools are ignored; the adapter returns a single text block with stop_reason='end_turn'.
- Token counts come from claude-cli's reporting and may not match the Anthropic API's accounting precisely (especially for cache tiers).

Original design from garrytan#334; this commit preserves that author's attribution. The follow-up commits on this branch carry the tool-use implementation.
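The prefix stripping and envelope handling described in this commit can be sketched roughly as follows. This is a minimal illustration, not the adapter's actual code: the names `stripProviderPrefix`, `envelopeToMessage`, `CliEnvelope`, and `AdapterMessage` are hypothetical, and the `input_tokens` / `output_tokens` usage fields are assumed to follow the Anthropic usage shape.

```typescript
// Hypothetical shapes. Only `type`, `subtype`, `result`, and `usage` are named
// in the commit message; the remaining field names are assumptions.
interface CliEnvelope {
  type?: string;
  subtype?: string;
  is_error?: boolean;
  result?: string;
  usage?: { input_tokens?: number; output_tokens?: number };
}

interface AdapterMessage {
  role: "assistant";
  content: { type: "text"; text: string }[];
  stop_reason: "end_turn";
  usage: { input_tokens: number; output_tokens: number };
}

// Drop a leading `anthropic:` or `litellm:` so the CLI sees a native model id.
function stripProviderPrefix(model: string): string {
  return model.replace(/^(?:anthropic|litellm):/, "");
}

// Parse the CLI's stdout and reshape it into a text-only assistant message.
// Throws on non-JSON output and on anything that is not a success envelope.
function envelopeToMessage(stdout: string): AdapterMessage {
  let parsed: unknown;
  try {
    parsed = JSON.parse(stdout);
  } catch {
    throw new Error(`claude-cli returned non-JSON output: ${stdout.slice(0, 200)}`);
  }
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("claude-cli returned a JSON value that is not an envelope object");
  }
  const envelope = parsed as CliEnvelope;
  if (envelope.type !== "result" || envelope.subtype !== "success" || envelope.is_error === true) {
    throw new Error(`claude-cli returned an error envelope (subtype: ${envelope.subtype ?? "unknown"})`);
  }
  return {
    role: "assistant",
    content: [{ type: "text", text: envelope.result ?? "" }],
    stop_reason: "end_turn",
    usage: {
      input_tokens: envelope.usage?.input_tokens ?? 0,
      output_tokens: envelope.usage?.output_tokens ?? 0,
    },
  };
}
```

Missing usage fields default to zero here so that downstream token accounting always receives numbers; whether the real adapter does the same is not stated in the commit.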
…op of jarvisdoes baseline

Builds on the previous commit (jarvisdoes's garrytan#334 baseline) by adding three things the upstream issue called out as gaps or that surfaced during review:

1. Tool use support via system-prompt-instructed JSON emission.
2. Context isolation flags so claude-cli does not load operator-level CLAUDE.md, skills, and local project context into every subagent call.
3. Env var rename from GBRAIN_USE_CLAUDE_CLI=1 to GBRAIN_SUBAGENT_PROVIDER=claude-cli to match the existing GBRAIN_<noun>_<role>=<value> convention used by GBRAIN_CHAT_MODEL, GBRAIN_EMBEDDING_MODEL, GBRAIN_EXPANSION_MODEL.

## Tool use

The MessagesClient interface returns Anthropic.Message objects whose content array may include tool_use blocks. The subagent handler filters those blocks and dispatches each tool, so any backend that produces correctly shaped tool_use blocks gets the same loop behavior as the Anthropic SDK.

The adapter injects a system-prompt addendum describing the tool registry plus an emission protocol:

    <use_tools>
    [{"id": "...", "name": "...", "input": {...}}, ...]
    </use_tools>

After the response comes back, extractToolCalls() scans for the block, parses the JSON (tolerant of optional ```json fencing), and converts each entry into a tool_use content block. Multiple parallel tool calls in one turn are supported via the array shape; this is the exact case that breaks today on the codex-proxy / litellm GPT-5.x bridge where parallel tool-call response IDs get dropped.

Defensive fallbacks:
- Malformed JSON inside the block: drop to text-only, stop_reason='end_turn'.
- Unterminated <use_tools> (no close tag): drop to text-only.
- Model omits id field: adapter synthesizes a toolu_claude_cli_<rand> id.
- Empty response: still hand the subagent loop a well-formed content array so the .filter chain does not crash.
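The emission protocol and its fallbacks could look roughly like the sketch below. It is an illustration under assumptions rather than the branch's actual `extractToolCalls()`: the return shape, the helper `isRecord`, and the choice to treat an empty array or a nameless entry as text-only are all hypothetical.

```typescript
type TextBlock = { type: "text"; text: string };
type ToolUseBlock = { type: "tool_use"; id: string; name: string; input: Record<string, unknown> };
type ContentBlock = TextBlock | ToolUseBlock;
interface Extracted {
  content: ContentBlock[];
  stop_reason: "end_turn" | "tool_use";
}

const OPEN_TAG = "<use_tools>";
const CLOSE_TAG = "</use_tools>";

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function extractToolCalls(raw: string): Extracted {
  // Every failure mode degrades to one text block, so the caller always
  // receives a well-formed content array (even for an empty response).
  const textOnly = (): Extracted => ({
    content: [{ type: "text", text: raw }],
    stop_reason: "end_turn",
  });

  const open = raw.indexOf(OPEN_TAG);
  if (open === -1) return textOnly();
  const close = raw.indexOf(CLOSE_TAG, open + OPEN_TAG.length);
  if (close === -1) return textOnly(); // unterminated block

  // Tolerate an optional json code fence (three backticks) around the array.
  const body = raw
    .slice(open + OPEN_TAG.length, close)
    .trim()
    .replace(/^`{3}(?:json)?\s*/, "")
    .replace(/\s*`{3}$/, "");

  let parsed: unknown;
  try {
    parsed = JSON.parse(body);
  } catch {
    return textOnly(); // malformed JSON
  }
  if (!Array.isArray(parsed) || parsed.length === 0) return textOnly();

  const calls: ToolUseBlock[] = [];
  for (const entry of parsed) {
    if (!isRecord(entry) || typeof entry.name !== "string") return textOnly();
    // Synthesize an id when the model omitted one.
    const id =
      typeof entry.id === "string" && entry.id !== ""
        ? entry.id
        : `toolu_claude_cli_${Math.random().toString(36).slice(2, 10)}`;
    calls.push({ type: "tool_use", id, name: entry.name, input: isRecord(entry.input) ? entry.input : {} });
  }

  // Keep any prose the model wrote before the block as a leading text part.
  const preamble = raw.slice(0, open).trim();
  const content: ContentBlock[] = preamble ? [{ type: "text", text: preamble }, ...calls] : calls;
  return { content, stop_reason: "tool_use" };
}
```

Because the array shape carries every call in one block, two parallel calls come back as two `tool_use` entries with distinct ids, which is the case the commit message singles out.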
## Context isolation

claude-cli auto-discovers CLAUDE.md from cwd upward and injects the operator's skills + plugins + auto-memory into the default system prompt. On a real install that is ~42-65k tokens of contamination per subagent call, with both cost and behavioral consequences (the subagent picks up the operator's coding conventions, opinions, and preferences).

The maximum suppression that still preserves OAuth / Claude Max subscription auth is:
- Spawn from a dedicated clean cwd (tmpdir-based) so LOCAL CLAUDE.md auto-discovery has nothing to find. This saves about 13k tokens on a real gbrain install where CLAUDE.md is substantial.
- --disable-slash-commands so skill resolution does not pull in /skill-name handlers.
- --system-prompt <gbrain prompt> so the default system prompt is replaced rather than appended to.

The --bare flag would also strip user-level ~/.claude/CLAUDE.md but it forces ANTHROPIC_API_KEY auth, defeating the whole point of this adapter. The remaining ~42k cached tokens from user-level instructions are accepted as a cost-trivial trade-off because the Max subscription absorbs the per-call cost. Behavioral contamination is mitigated by gbrain's strong per-call system prompt overriding any operator-level drift.

## Env var rename

Surveyed all ~140 GBRAIN_* env vars in src/. The codebase uses three patterns: GBRAIN_NO_<feature> (negative toggles), GBRAIN_<noun>_<role>=<value> (routing keys), GBRAIN_ALLOW_<feature> (permissive toggles). GBRAIN_USE_* does not appear anywhere except jarvisdoes's original commit; it would introduce a fourth pattern.

GBRAIN_SUBAGENT_PROVIDER=claude-cli aligns with the routing-keys family and is value-extensible: adding codex-cli / meridian-proxy / etc. later means a new value, not a new env var. The scope ('SUBAGENT_*') is also unambiguous about which calls the toggle covers; GBRAIN_USE_CLAUDE_CLI was silent on whether it applied to all gbrain LLM calls or only the subagent path.
Unknown values are rejected with a fail-fast error message naming the two valid values rather than silently falling through to the default.

## Tests

New file: test/claude-cli-adapter.test.ts (12 tests, 33 assertions):
- Text-only round trip (single text block, usage propagation, end_turn).
- Provider prefix stripping ('anthropic:claude-sonnet-4-6' -> 'claude-sonnet-4-6').
- Single tool_use parsing.
- Multiple parallel tool calls in one block (the case that triggered the codex-proxy regression).
- Fenced JSON inside <use_tools> block.
- Model-omitted id gets synthesized to toolu_claude_cli_<rand>.
- Malformed JSON falls back to text.
- Unterminated block falls back to text.
- AbortSignal SIGTERMs the child.
- Error envelope rejected with informative message.
- Non-JSON output rejected with raw-output excerpt in the error.
- argv + cwd assertion: --disable-slash-commands + --system-prompt are present and cwd is the dedicated tmpdir.

Tests use a POSIX shell stub at GBRAIN_CLAUDE_CLI_BIN that emits a scripted --output-format json envelope, so the suite runs without claude-cli installed and without API credits.
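The fail-fast handling of GBRAIN_SUBAGENT_PROVIDER described in the env var rename section might be sketched like this. The commit names only `claude-cli` explicitly; the second valid value (`anthropic`), the function name `resolveSubagentProvider`, and the treatment of an empty string as unset are assumptions made for illustration.

```typescript
// ASSUMPTION: "anthropic" stands in for the unnamed second valid value.
const VALID_PROVIDERS = ["anthropic", "claude-cli"] as const;
type SubagentProvider = (typeof VALID_PROVIDERS)[number];

function resolveSubagentProvider(
  env: Record<string, string | undefined>,
  fallback: SubagentProvider = "anthropic",
): SubagentProvider {
  const raw = env.GBRAIN_SUBAGENT_PROVIDER;
  // Unset (or empty, by assumption) keeps the default path untouched.
  if (raw === undefined || raw === "") return fallback;
  if ((VALID_PROVIDERS as readonly string[]).includes(raw)) {
    return raw as SubagentProvider;
  }
  // Unknown values fail fast and name the valid set instead of silently
  // falling through to the default.
  throw new Error(
    `Invalid GBRAIN_SUBAGENT_PROVIDER="${raw}". Valid values: ${VALID_PROVIDERS.join(", ")}.`,
  );
}
```

Keeping the valid set in one array means a later value such as `codex-cli` would be a one-line addition, which matches the value-extensible argument made for the rename.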
garrytan#334 baseline)

Replaces the MessagesClient adapter + GBRAIN_USE_CLAUDE_CLI=1 env-var gate from the previous commit on this branch with a proper gateway recipe. The recipe path gives per-call routing as a native capability: a model string like `claude-cli:claude-sonnet-4-6` lands here while a sibling `litellm:gpt-5.4` continues through the litellm-proxy / codex-proxy path in the same worker. No global env-var switch, no agent.use_gateway_loop bypass, no MessagesClient injection at jobs.ts worker startup.

The previous commit on this branch (jarvisdoes baseline) is preserved in the history for garrytan#334 authorship attribution. Its functional changes are backed out here because the recipe pattern is gbrain's established integration seam; introducing a parallel MessagesClient + env-var path would have created two routing mechanisms competing for the same job.

New: src/core/ai/recipes/claude-cli.ts
- Recipe declaration: id 'claude-cli', tier 'native', implementation 'claude-cli', chat-only (no embedding or expansion touchpoints).
- Models: claude-opus-4-7, claude-sonnet-4-6, claude-haiku-4-5-20251001.
- supports_tools and supports_subagent_loop both true.
- supports_prompt_cache false because the CLI handles caching internally and does not surface cache_control via the standard control plane.
- auth_env.required is the empty array because the CLI owns auth (OAuth session managed by `claude login`).
- Friendly aliases mirror the `anthropic` recipe: `sonnet`, `haiku`, `opus` and the same legacy-id rewrites for back-compat with stale config strings.

New: src/core/ai/providers/claude-cli-language-model.ts
- ClaudeCliLanguageModel class implementing the ai-sdk LanguageModelV2 interface.
- doGenerate: renders the ai-sdk prompt array into a system text + user text, injects the use_tools protocol instructions when tools are present, spawns `claude --print --output-format json --model <X> --disable-slash-commands --system-prompt <gbrain prompt>` from a dedicated tmpdir (contamination suppression: no local CLAUDE.md auto-discovery), parses the JSON envelope, extracts <use_tools> blocks, and returns ai-sdk-shaped LanguageModelV2Content (text + tool-call parts with stringified-JSON input matching the V2 contract).
- Tolerates fenced JSON inside use_tools blocks, malformed JSON (falls back to text), missing close tag (falls back to text), model-omitted ids (synthesizes toolu_claude_cli_<rand>).
- Parallel tool calls in one block round-trip cleanly: this is the case that drops IDs on the litellm + codex-proxy bridge today.
- AbortSignal SIGTERMs the child for proper cancellation.
- doStream throws not-supported (gateway.toolLoop is non-streaming).

Modified: src/core/ai/gateway.ts
- Adds case 'claude-cli' to instantiateChat (returns ClaudeCliLanguageModel).
- Adds case 'claude-cli' to instantiateExpansion (same wrapper, reserved for a future expansion touchpoint declaration).
- Adds case 'claude-cli' to instantiateEmbedding (throws, no embedding model, mirrors the native-anthropic path).
- Lazy require() at the call site keeps the gateway module load cheap for users who never use the claude-cli path.

Modified: src/core/ai/recipes/index.ts
- Registers `claudeCli` in the ALL[] array next to `anthropic`.

Modified: src/core/ai/types.ts
- Adds 'claude-cli' to the Implementation union so the gateway switch is exhaustive at compile time.

Reverted: src/commands/jobs.ts
- Drops the GBRAIN_USE_CLAUDE_CLI=1 env-var gate the prior commit added. Routing now happens at the gateway based on the model string.

Deleted: src/core/minions/handlers/claude-cli-adapter.ts
- The MessagesClient adapter is superseded by the recipe + LanguageModelV2 path. Two routing mechanisms competing for the same job would have forced users to reason about which one wins; the recipe is the single source of truth.

New file: test/claude-cli-recipe.test.ts (16 tests, 46 assertions):
- Recipe registration: getRecipe returns chat-only Recipe; aliases map short names (sonnet/haiku/opus) to canonical model ids.
- Text round trip: single text content block, usage propagation, stop finish reason.
- Provider prefix stripping.
- Single tool-call parsing.
- Multiple parallel tool calls in one block.
- Fenced JSON inside the block.
- Model-omitted id synthesizes toolu_claude_cli_<rand>.
- Malformed JSON falls back to text + stop reason.
- Unterminated block falls back to text + stop reason.
- Tools offered but model declines: returns text-only with stop reason so the gateway-loop treats it as a final answer rather than wedging for tool calls that never come.
- AbortSignal SIGTERMs the child.
- is_error envelope rejected.
- Non-JSON output rejected.
- doStream throws.
- argv + cwd assertion: --print, --disable-slash-commands, --system-prompt are present and cwd is the dedicated tmpdir.

Tests use a POSIX shell stub at GBRAIN_CLAUDE_CLI_BIN so the suite runs without claude-cli installed and without API credits.

End-to-end smoke verified against a real `claude --print --model haiku` invocation: model emitted `<use_tools>` block with toolu_add_001 + {"a":12,"b":30}, adapter parsed back into a `tool-call` content block, finishReason 'tool-calls'.
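The argv + cwd contract that the last test asserts can be sketched as a small pure helper. This is an illustration only: `buildClaudeCliSpawnPlan`, the `SpawnPlan` shape, and the `gbrain-claude-cli-` directory prefix are hypothetical names, and falling back to `claude` when GBRAIN_CLAUDE_CLI_BIN is unset is an assumption about how the override works.

```typescript
import { mkdtempSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

interface SpawnPlan {
  bin: string;
  argv: string[];
  cwd: string;
}

function buildClaudeCliSpawnPlan(
  model: string,
  systemPrompt: string,
  env: Record<string, string | undefined> = process.env,
): SpawnPlan {
  // A fresh, empty directory gives local CLAUDE.md auto-discovery nothing to
  // find when the CLI walks upward from its cwd.
  const cwd = mkdtempSync(join(tmpdir(), "gbrain-claude-cli-"));
  return {
    // ASSUMPTION: the env var overrides the binary path, as the test stub implies.
    bin: env.GBRAIN_CLAUDE_CLI_BIN ?? "claude",
    argv: [
      "--print",
      "--output-format", "json",
      "--model", model,
      "--disable-slash-commands",
      "--system-prompt", systemPrompt,
    ],
    cwd,
  };
}
```

A caller would presumably hand this to `spawn(plan.bin, plan.argv, { cwd: plan.cwd, signal })` from `node:child_process`; with the `signal` option Node terminates the child on abort using the default kill signal, SIGTERM, which lines up with the cancellation behavior the commit describes.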
Closes #2276, closes #334
Summary
Operators with a Claude Max subscription can now route subagent dispatch to their existing OAuth session via the local `claude` binary, without an `ANTHROPIC_API_KEY`. After this PR, `gbrain config set models.tier.subagent claude-cli:claude-sonnet-4-6` (or any per-task key like `models.dream.patterns`) dispatches through the gateway-loop the same way every other non-Anthropic recipe does. Per-call routing is preserved: a worker can serve `claude-cli:*` for one job and `litellm:gpt-5.4` for the next, with no env-var switch.
The fix
A native `claude-cli` recipe in the registry. The recipe + its ai-sdk `LanguageModelV2` provider implementation let `gateway.toolLoop()` dispatch to the local `claude` binary directly, parse the JSON envelope, and surface parallel `<use_tools>` blocks as native ai-sdk `tool-call` content parts. End-to-end:
- `src/core/ai/recipes/claude-cli.ts`: `tier: 'native'`, chat-only (no embedding / no expansion). Models: `claude-opus-4-7` / `claude-sonnet-4-6` / `claude-haiku-4-5-20251001`. `supports_tools: true`, `supports_subagent_loop: true`, `auth_env.required: []` (CLI owns auth via its OAuth session).
- `src/core/ai/providers/claude-cli-language-model.ts`: ai-sdk `LanguageModelV2`. `doGenerate` spawns `claude --print --output-format json --model <X> --disable-slash-commands --system-prompt <gbrain prompt>` from a dedicated tmpdir (skips local `CLAUDE.md` auto-discovery), parses the JSON envelope, and extracts `<use_tools>` blocks back into native `tool-call` content parts. Parallel tool calls in one block round-trip cleanly.
- `src/core/ai/recipes/index.ts`, `src/core/ai/types.ts`, `src/core/ai/gateway.ts`: `claude-cli` joins the `Implementation` union; `instantiateChat` / `instantiateExpansion` / `instantiateEmbedding` gain `case 'claude-cli':` (chat returns the `LanguageModelV2`; expansion mirrors it for a future expansion touchpoint; embedding throws like `native-anthropic` does).
Why a recipe instead of #334's MessagesClient adapter
The original #334 proposal injected a `MessagesClient` adapter at jobs.ts worker startup behind `GBRAIN_USE_CLAUDE_CLI=1`. That adapter only runs on the legacy Anthropic-direct path. Under `agent.use_gateway_loop = true` (the production default for every non-Anthropic recipe), dispatch flows through `gateway.toolLoop()`, not `client.create()` at `subagent.ts:506`, so the adapter never fires.
Per-call routing via a proper recipe is the architecture that extends correctly into the gateway-loop world. It also gives the operator a clean per-call routing knob between OAuth-subscription dispatch and API-key dispatch on the same model name, in the same worker, without a process-level switch.
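As a rough illustration of what per-call routing on the model string means, the sketch below splits `<provider>:<model>` and looks the provider up in a registry. All names here (`splitModelString`, `dispatch`, `Dispatcher`) are hypothetical and do not reflect gbrain's gateway API, and throwing on an unknown provider is a simplification chosen for the example.

```typescript
interface ModelRoute {
  provider: string;
  model: string;
}

// Split "claude-cli:claude-sonnet-4-6" into its provider and model parts.
function splitModelString(modelString: string): ModelRoute {
  const sep = modelString.indexOf(":");
  if (sep <= 0 || sep === modelString.length - 1) {
    throw new Error(`expected "<provider>:<model>", got "${modelString}"`);
  }
  return { provider: modelString.slice(0, sep), model: modelString.slice(sep + 1) };
}

// Hypothetical stand-in for a recipe's chat implementation.
type Dispatcher = (model: string, prompt: string) => string;

// Route one call. Nothing here is process-global, so consecutive jobs in the
// same worker can land on different providers.
function dispatch(registry: Map<string, Dispatcher>, modelString: string, prompt: string): string {
  const { provider, model } = splitModelString(modelString);
  const handler = registry.get(provider);
  if (!handler) throw new Error(`unrecognized provider "${provider}"`);
  return handler(model, prompt);
}

const registry = new Map<string, Dispatcher>([
  ["claude-cli", (model, prompt) => `cli(${model}): ${prompt}`],
  ["litellm", (model, prompt) => `proxy(${model}): ${prompt}`],
]);

const firstJob = dispatch(registry, "claude-cli:claude-sonnet-4-6", "summarize");
const secondJob = dispatch(registry, "litellm:gpt-5.4", "summarize");
```

The point of the design is that the routing decision travels with each call's model string, so the same model name can go through the OAuth-subscription path or an API-key path depending only on its prefix.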
Context isolation
The recipe spawns from a clean tmpdir with `--disable-slash-commands` and `--system-prompt`, which is the maximum suppression that preserves OAuth/Max-subscription auth. `--bare` would also strip `~/.claude/CLAUDE.md` but forces `ANTHROPIC_API_KEY` auth, defeating the whole point. The ~42K cached tokens from user-level instructions stay as an unavoidable trade-off on the subscription path.
Reproduction (before this PR)
1. `claude` CLI logged in. No `ANTHROPIC_API_KEY` in env.
2. `gbrain config set models.tier.subagent claude-cli:claude-sonnet-4-6`.
3. `gbrain models` → warns "unrecognized provider" and falls back to `TIER_DEFAULTS.subagent` (`anthropic:claude-sonnet-4-6`).
4. `AIConfigError: Anthropic ... requires ANTHROPIC_API_KEY`.
After this PR: step 3 resolves to the `claude-cli` recipe; step 4 dispatches through the local `claude` binary; the job runs against the OAuth session.
Tests
- `test/claude-cli-recipe.test.ts`: 16 tests, 46 assertions. Covers recipe registration; spawn args + tmpdir isolation; JSON envelope parse (text-only, tool-call, parallel tool-call, mixed); `<use_tools>` block extraction back to ai-sdk `tool-call` content parts; the "model offered tools but declines to use them" case, asserting a clean text-only return so `gateway.toolLoop` treats it as a final answer rather than wedging; auth-env declared as empty array.
- `bun run typecheck` clean.
- `bun run verify` green.
- … `ANTHROPIC_API_KEY`): `gbrain dream --phase patterns` with `models.dream.patterns = claude-cli:claude-sonnet-4-6` runs through parallel tool calls, persists turns, resumes cleanly across worker recycles.
Coordination with the gateway-loop reliability fix
The gateway-loop tool-result-turn persistence fix (separate PR, separate issue; see "fix(subagent): persist + reconcile tool-result turns in the gateway loop") is REQUIRED for this recipe to work under load. Without that fix, parallel tool calls dead-letter on any mid-conversation interruption regardless of which recipe is dispatching. Recommend landing that fix first, although the two PRs can merge in either order because this PR's gateway.ts touches are additive only.