… walkthrough
New gateway/openapi.yaml (892 lines) — the first machine-readable
contract for the gateway's HTTP surface:
- POST /plan request body + full SSE event catalog
- GET /health liveness probe
- GET /.well-known/did.json self-published DID document
Covers every field on PlanRequest / AgentRequest / PeerAuth (discriminated
union on type: none|bearer|bearer_env|did_signed) / SkillRequest /
PlanPreferences, plus per-event schemas for session / plan / text.delta /
task.started / task.artifact / task.finished / final / error / done. Three
worked examples on /plan (minimal, single-agent, multi-agent with DID
signing + session continuation), one on /health, one on the DID document.
Recipes documented in the overview as internal (not on the API surface)
so readers understand the planner's behavior.
Distinct from the repo-root openapi.yaml, which describes the per-agent
Bindu Agent API (what a bindufy()-built agent exposes). The gateway spec
sits one layer up — the orchestrator, not the agent.
Validated with @redocly/cli lint — 0 errors, 13 benign "unused component"
warnings on the per-SSE-event schemas (OpenAPI 3.1 has no first-class
SSE modeling, so those schemas sit as reference docs for typed-client
generators rather than being $ref'd from a response body).
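Since the per-event schemas are not $ref'd from a response body, a client has to map SSE frames to event names itself. A minimal sketch of that mapping follows. The event names are the ones listed above; `parsePlanFrames` and `PlanFrame` are hypothetical names, and the sketch assumes every `data:` payload is JSON, which the spec may or may not guarantee.

```typescript
// Event names from the commit message. Payloads stay `unknown` because this
// sketch does not reproduce the per-event schemas from openapi.yaml.
const PLAN_EVENTS = [
  "session", "plan", "text.delta", "task.started", "task.artifact",
  "task.finished", "final", "error", "done",
] as const;
type PlanEventName = (typeof PLAN_EVENTS)[number];

interface PlanFrame {
  event: PlanEventName;
  data: unknown;
}

function isPlanEventName(name: string): name is PlanEventName {
  return (PLAN_EVENTS as readonly string[]).indexOf(name) !== -1;
}

// Parse a string of complete SSE frames (separated by blank lines).
// Assumption: each data payload is JSON; JSON.parse throws otherwise.
function parsePlanFrames(text: string): PlanFrame[] {
  const frames: PlanFrame[] = [];
  for (const block of text.split(/\r?\n\r?\n/)) {
    let event = "message"; // SSE default when no `event:` line is present
    const dataLines: string[] = [];
    for (const line of block.split(/\r?\n/)) {
      if (line === "" || line.charAt(0) === ":") continue; // blank or comment
      const colon = line.indexOf(":");
      const field = colon === -1 ? line : line.slice(0, colon);
      let value = colon === -1 ? "" : line.slice(colon + 1);
      if (value.charAt(0) === " ") value = value.slice(1); // strip one space
      if (field === "event") event = value;
      else if (field === "data") dataLines.push(value);
    }
    // Frames without data, or with an unknown event name, are skipped.
    if (dataLines.length === 0 || !isPlanEventName(event)) continue;
    frames.push({ event, data: JSON.parse(dataLines.join("\n")) });
  }
  return frames;
}
```

A typed client would then narrow on `frame.event` and validate `frame.data` against the matching schema from the spec.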
examples/gateway_test_fleet/README.md: three small additions so the
walkthrough reflects the new surface:
- New "teaching the planner a reusable pattern (recipes)" aside in
Part 5, pointing at the two seed recipes and the gateway README's
§Recipes section. Explains progressive disclosure in plain words.
- Glossary entries for Recipe and OpenAPI.
- Two new bullets under "What to look at next" — the openapi.yaml
for paste-into-Swagger-UI exploration, and the seed recipe for
the no-code extension path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…h call_ prefix
Two latent name-collision bugs surfaced during the recipes work:
1. A2A tool id collisions — silent last-write-wins
session/prompt.ts built its tool map as `toolMap[id] = ai`, so two
catalog entries producing the same normalized `call_<agent>_<skill>`
id silently collapsed; the later entry won and the earlier one
vanished without any indication. A caller thinking they were load-
balancing across two peers would see only one being called, with
no way to tell which.
Fix: new exported helper `findDuplicateToolIds(agents)` in the
planner, consulted in plan-route.ts right after `PlanRequest.parse`.
Returns all colliding groups, not just the first. Plan-route emits a
400 `invalid_request` with a detail listing every clashing pair —
the caller can fix their catalog instead of chasing a phantom
dispatch.
Catches all three flavors:
- two entries with same name + same skill id
- one agent with a duplicated skill id in its skills[]
- non-alphanumerics flattening to the same normalized id
(e.g., "foo.bar" and "foo_bar" both → "foo_bar")
2. Recipe names starting with "call_"
The `load_recipe({name})` tool description lists recipe names that
sit visually adjacent to A2A tool ids like `call_research_search` in
the planner's prompt. Technically different namespaces (recipe names
are parameters of one tool, tool ids are tools), but the visual
collision would confuse the LLM.
Fix: Zod `.refine` on Recipe.Info.name rejecting the `call_` prefix.
Fails at load time with a clear message; the gateway won't boot with
a misnamed recipe on disk.
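The collision check in item 1 can be sketched roughly as below. Only the name `findDuplicateToolIds` and the `call_<agent>_<skill>` id shape come from this commit; the `AgentRef`, `SkillRef`, and `Collision` types, the return shape, and the exact normalization rule (each character outside `[A-Za-z0-9_]` becomes `_`) are assumptions made for illustration.

```typescript
interface SkillRef { id: string }
interface AgentRef { name: string; skills: SkillRef[] }
interface Collision { toolId: string; sources: string[] }

// Assumed normalization: flatten every non-alphanumeric character to "_".
function normalizeToolId(agent: string, skill: string): string {
  const flat = (s: string) => s.replace(/[^A-Za-z0-9_]/g, "_");
  return `call_${flat(agent)}_${flat(skill)}`;
}

// Group every (agent, skill) pair by normalized id and report each group
// with more than one member, so the caller sees all clashes at once.
function findDuplicateToolIds(agents: AgentRef[]): Collision[] {
  const seen = new Map<string, string[]>();
  for (const agent of agents) {
    for (const skill of agent.skills) {
      const id = normalizeToolId(agent.name, skill.id);
      const sources = seen.get(id) ?? [];
      sources.push(`${agent.name}/${skill.id}`);
      seen.set(id, sources);
    }
  }
  const collisions: Collision[] = [];
  seen.forEach((sources, toolId) => {
    if (sources.length > 1) collisions.push({ toolId, sources });
  });
  return collisions;
}
```

An empty result means the catalog is safe to turn into a tool map; a non-empty one carries enough detail to build the 400 `invalid_request` message.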
Coverage:
tests/planner/tool-id-collision.test.ts (10 tests)
tests/recipe/loader.test.ts (+1 test for the prefix guard)
OpenAPI: /plan's 400 response gained a second example showing the
collision-detail payload so integrators writing against the spec see
both failure shapes.
Typecheck clean, 185/185 tests pass (was 174), redocly lint 0 errors.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
One document, six chapters, written for a reader with no prior AI-agent
knowledge. Runnable from a clean clone, ~45 min straight through.
Chapters:
1. Why a gateway exists — the problem, the idea, no code
2. Hello, gateway — install, configure, fire one /plan, read the SSE
line by line
3. Adding a second agent — the full fleet, a three-agent chain,
what the planner is actually doing
4. Teaching it a pattern (recipes) — author a minimal recipe,
progressive disclosure, bundled layouts, permission scoping
5. Giving it an identity (DID signing) — why prod needs it, the env
vars, auto vs manual modes, what changes on the wire
6. What's next — reference pointers, hands-on suggestions, the
production checklist
Intentionally consolidates content that was previously scattered across
gateway/README.md (architecture teaser, DID setup, recipes section) and
examples/gateway_test_fleet/README.md (the 8-part tutorial). Those two
docs will trim to reference role in the next commits so there's one
story to read, not three overlapping ones.
No code changes — 174/174 tests still pass trivially since nothing in
src/ or tests/ was touched.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Before: gateway/README.md was 381 lines mixing status + quickstart + a
5-step hands-on walkthrough + architecture narrative + DID setup + recipes
intro + troubleshooting. examples/gateway_test_fleet/README.md was 580
lines with its own parallel quickstart + 8-part tutorial that duplicated
a lot of the same ground.
After:
gateway/README.md → 242 lines. Operator reference only: env-var table,
routes table, recipe frontmatter reference, DID wire format +
failure-modes table, repo layout. Every walkthrough section moved to
docs/STORY.md.
fleet/README.md → 87 lines. Just describes the fleet: what scripts
exist, the 13-case matrix table, the most common errors. Points at
STORY.md for the guided experience.
The narrative lives once, in docs/STORY.md; the operator reference lives
once, in gateway/README.md; the fleet scaffolding is documented where it
lives. No content lost — everything that was pedagogical got absorbed
into STORY.md during Phase A.
Net: three docs went from ~1460 lines to ~1340; the 120-line reduction
is pure de-duplication.
Typecheck clean, 185/185 tests pass (no source changes).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… and openapi.yaml
gateway/plans/ was the phased roadmap + design-rationale directory
(PLAN.md + phase-0 through phase-5). Phases 1-2 shipped; phases 3-5 are
future work better tracked in GitHub issues than frozen markdown.
The three load-bearing bits of PLAN.md have new homes:
- API contract rationale → gateway/openapi.yaml §paths./plan + schemas
- Architecture narrative → gateway/docs/STORY.md ch 2 (+ sidebar)
- Fork & extract rationale → gateway/README.md §License + credits
Source references that still pointed at gateway/plans/* have been
repointed:
gateway/src/planner/index.ts:86 → openapi.yaml §PlanPreferences
gateway/src/api/plan-route.ts:30 → openapi.yaml §paths./plan
gateway/tests/planner/plan-request-schema.test.ts:13 → openapi.yaml
scripts/package.json → inline description
scripts/bindu-dryrun.ts → inline description of fixtures
Verified no remaining "plans/" references in the repo outside of
node_modules (grep clean).
Fleet logs/ and pids/ gitignore entries were already present at
.gitignore:222-223; nothing tracked, no change needed there.
Net: −2,729 lines of stale planning docs, +7 lines of source updates.
Typecheck clean, 185/185 tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The sample agents live under examples/gateway_test_fleet/, not examples/
directly. Two references (the "start one agent" command in Chapter 2
Step 5 and the "open any one" link in Chapter 3) pointed at the wrong
path. Fixed both.
Caught by a link-integrity sweep across STORY.md, gateway/README.md, and
fleet/README.md; all other links resolve.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Before: /health returned a four-field object — {ok, name, session,
supabase}. Enough for a container liveness check; too thin for
operators wanting to answer "what's running, what version, what
model?" at a glance.
After: full shape matching the per-agent Bindu health payload, adapted
for the coordinator role:
{
  "version": "0.1.0",
  "health": "healthy",              // healthy|degraded|unhealthy
  "runtime": {
    "storage_backend": "Supabase",
    "bus_backend": "EffectPubSub",
    "planner": {                    // NEW: what LLM drives the plan
      "model": "openrouter/anthropic/claude-sonnet-4.6",
      "provider": "openrouter",
      "model_id": "anthropic/claude-sonnet-4.6",
      "temperature": 0.3,
      "top_p": null,
      "max_steps": 10
    },
    "recipe_count": 2,
    "did_signing_enabled": true,
    "hydra_integrated": true
  },
  "application": {
    "name": "@bindu/gateway",
    "session_mode": "stateful",
    "gateway_did": "did:bindu:...", // renamed from agent-side "agent_did"
    "gateway_id": "f72ba681-...",   // renamed from "penguin_id"
    "author": "ops_at_example_com"
  },
  "system": {
    "node_version": "v22.22.1",
    "platform": "darwin",
    "architecture": "arm64",
    "environment": "development"
  },
  "status": "ok",
  "ready": true,
  "uptime_seconds": 2.4
}
Implementation:
- New src/api/health-route.ts with buildHealthHandler(identity, hydraIntegrated)
— captures bootTime, package version, planner-agent snapshot, recipe count
at init; reconstitutes the full payload per request with only a single
Date.now() call.
- src/server/index.ts shrunk to the app shell — all route wiring lives in
src/api/* and is mounted from src/index.ts, matching the plan-route
pattern.
- Helpers (splitModelId, deriveGatewayId, deriveAuthor) exported so they
can be unit-tested without spinning up the layer graph.
- Returns 200 regardless of health state — /health is informational, not
a gate. Operators that want a readiness gate read `ready` in the body.
- All in-memory: no Supabase ping, no outbound HTTP. Safe as a k8s
liveness probe.
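The "capture at init, one clock read per request" shape described above could look roughly like this. It is a sketch, not the code in src/api/health-route.ts: `buildHealthPayloadFn`, `HealthSnapshot`, the injected `now` parameter, and the one-decimal rounding are all hypothetical, and `status`/`ready` are hard-coded here where the real handler classifies them.

```typescript
// Values gathered once at boot (hypothetical subset of the real snapshot).
interface HealthSnapshot {
  version: string;
  recipeCount: number;
}

// `now` is injectable only so the sketch can be exercised with a fake clock.
function buildHealthPayloadFn(
  snapshot: HealthSnapshot,
  now: () => number = Date.now,
) {
  const bootTime = now(); // captured once, at init

  return function healthPayload() {
    // The only per-request work: one clock read and a subtraction.
    const uptimeSeconds = (now() - bootTime) / 1000;
    return {
      version: snapshot.version,
      runtime: { recipe_count: snapshot.recipeCount },
      status: "ok", // hard-coded in this sketch
      ready: true, // hard-coded in this sketch
      uptime_seconds: Math.round(uptimeSeconds * 10) / 10,
    };
  };
}
```

Because nothing in the returned function touches storage or the network, calling it from a liveness probe cannot block on a slow dependency.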
openapi.yaml: HealthResponse schema rewritten with four sub-schemas
(HealthRuntime, HealthPlanner, HealthApplication, HealthSystem), full
descriptions, nullable [string, "null"] unions for optional identity
fields, and a worked example matching a running fresh-boot gateway.
Redocly lint 0 errors.
gateway/README.md: §Quickstart /health curl replaced with the full
JSON example + pointer to the openapi schema.
docs/STORY.md Chapter 2: expanded from a one-line `ok:true` comment to
a walkthrough of the two most interesting fields (planner.model,
recipe_count) so a first-time reader sees why they matter.
Tests: 9 new cases in tests/api/health-route.test.ts covering
splitModelId's slash handling, deriveGatewayId for both did:bindu
and did:key, deriveAuthor falling back to null on non-Bindu DIDs and
malformed input.
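For readers without the source open, two of those helpers might behave as sketched below. The bodies are guesses reconstructed from the sample payload and the test descriptions: splitting on the first slash matches the `model` / `provider` / `model_id` example, and the five-segment `did:bindu:<author>:<name>:<id>` layout is inferred from the sample DIDs, not taken from a spec. `deriveGatewayId` is omitted because its did:key behavior is not described here.

```typescript
// Split "provider/rest/of/id" on the FIRST slash only, so model ids that
// themselves contain slashes stay intact. Assumption: no slash → null provider.
function splitModelId(model: string): { provider: string | null; model_id: string } {
  const slash = model.indexOf("/");
  if (slash <= 0) return { provider: null, model_id: model };
  return { provider: model.slice(0, slash), model_id: model.slice(slash + 1) };
}

// Return the author segment of a Bindu DID, or null for anything else.
// Assumed layout: did:bindu:<author>:<name>:<id>
function deriveAuthor(did: unknown): string | null {
  if (typeof did !== "string") return null;
  const parts = did.split(":");
  if (parts.length < 5 || parts[0] !== "did" || parts[1] !== "bindu") return null;
  return parts[2] !== "" ? parts[2] : null;
}
```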
Typecheck clean, 194/194 tests pass (was 185, +9 health-route).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pinning peer DIDs is the common path to a non-null agent_did on every
gateway SSE frame. Before: operator had to run a separate ad-hoc
script (or dig through logs) to collect the five DIDs, then reformat
them into shell exports by hand. Now: start_fleet.sh polls each
agent's /health for up to 5s, extracts application.agent_did, and
prints two blocks:
1. Agent DIDs table — human-readable name / port / DID
2. Shell exports — ready-to-paste export JOKE_DID=... etc.
The variable names strip the _agent suffix (JOKE_DID, not
JOKE_AGENT_DID) so they match the "name" field convention used in
/plan requests.
Polling is tolerant — if an agent's /health is slow to come up, the
row shows "(not ready — re-run or check logs/...)" instead of
failing the whole script. Safe on re-run: if agents are already
running, the "skip" path still reaches the DID block and reprints
everything.
Sample output:
Agent DIDs:
joke_agent :3773 did:bindu:gateway_test_fleet_at_getbindu_com:joke_agent:47191e40-...
math_agent :3775 did:bindu:gateway_test_fleet_at_getbindu_com:math_agent:99ed7402-...
...
Shell exports (paste to pin peer DIDs in /plan requests):
export JOKE_DID="did:bindu:gateway_test_fleet_at_getbindu_com:joke_agent:47191e40-..."
export MATH_DID="..."
...
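The naming rule is small enough to state as code. start_fleet.sh is a shell script, so the TypeScript below only illustrates the rule and is not a port of it; `didVarName`, `renderExports`, and `FleetAgent` are hypothetical names, and skipping agents whose DID was not collected is an assumption.

```typescript
interface FleetAgent {
  name: string;
  port: number;
  did: string | null; // null when /health did not answer in time
}

// joke_agent → JOKE_DID (strip the _agent suffix, upper-case, append _DID).
function didVarName(agentName: string): string {
  return agentName.replace(/_agent$/, "").toUpperCase() + "_DID";
}

// One export line per agent with a known DID (assumption: others are skipped).
function renderExports(agents: FleetAgent[]): string[] {
  return agents
    .filter((a): a is FleetAgent & { did: string } => a.did !== null)
    .map((a) => `export ${didVarName(a.name)}="${a.did}"`);
}
```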
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous commit printed the exports to stdout, so the operator
had to hand-copy lines back into their shell. Worse, executing
./start_fleet.sh runs the exports in a *child* process, so even if the
script invoked `export` directly the parent shell would never see the
variables. This is a common shell-semantics gotcha.
Fix: write the exports to a sibling `.fleet.env` file (regenerated on
every run), and tell the operator the one command that actually loads
them — `source <path>`. The sourcing-vs-executing distinction is real
and worth being explicit about in the output.
Why not just `source ./start_fleet.sh` instead of `./start_fleet.sh`?
Because the script has `set -e`, spawns background processes, and
exits with non-zero on port conflicts. Any of those failures would
kill the operator's interactive shell if they'd sourced. Writing a
small idempotent .env the operator sources explicitly is the standard
workaround (think: `aws configure export-credentials`,
`direnv allow`, etc.).
Also:
- `.fleet.env` added to .gitignore (sits alongside logs/ + pids/).
- The `.fleet.env` header explains where it came from and how to
use it, so an operator who finds it first doesn't guess.
- Regenerate with `>` so stale DIDs from a prior run can't linger
if UUIDs rotated.
- Self-describing message at the bottom of start_fleet.sh output
points at the file and gives the exact command.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Before: turning on `trust.verifyDID: true` on a peer in /plan would
flip `verified="yes"` on the <remote_content> envelope the planner
sees, but the structured counts (signed / verified / unsigned) stayed
internal to the gateway. Operators couldn't tell "all signatures
verified cleanly" apart from "no artifacts were signed at all" —
both surface as `verified="yes"` in the envelope because the gateway
treats absence as non-failure.
After: task.artifact and task.finished SSE frames carry a
`signatures` field when the peer call ran verification:
"signatures": { "ok": true, "signed": 2, "verified": 2, "unsigned": 0 }
Now a consumer can assert `signatures.signed > 0 && signatures.ok`
to mean "this response was cryptographically proven to come from the
pinned DID's private key." `signed === 0` reveals the agent isn't
signing its artifacts — actionable info hidden before.
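A consumer-side check built on that assertion might look like the sketch below. The `PlanSignatures` field names match the frame shown above; the `Verdict` labels and `classifySignatures` are hypothetical, and treating a missing or null field as "no result" is a defensive choice made for this example.

```typescript
interface PlanSignatures {
  ok: boolean;
  signed: number;
  verified: number;
  unsigned: number;
}

type Verdict = "no_result" | "failed" | "unsigned_only" | "proven";

function classifySignatures(s: PlanSignatures | null | undefined): Verdict {
  if (s == null) return "no_result"; // field absent or null on the frame
  if (!s.ok) return "failed"; // verification ran and did not pass
  if (s.signed === 0) return "unsigned_only"; // ok, but nothing was signed
  return "proven"; // signed > 0 && ok
}
```

Only `"proven"` corresponds to `signatures.signed > 0 && signatures.ok`; `"unsigned_only"` is the case that previously looked identical to success.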
Implementation:
- ToolCallEnd Bus event schema gained an optional `signatures` field
(Zod: { ok, signed, verified, unsigned } | null).
- session/prompt.ts wraps each tool's execute() so the returned
ExecuteResult.metadata lands in a per-call Map. The AI SDK's
`aiTool.execute` only accepts a string output, so metadata has to
ride out of band. Map is closure-scoped to prompt() and dies when
the call exits.
- The tool-result handler in prompt.ts reads from the Map and
publishes signatures with the ToolCallEnd event.
- plan-route.ts adds `signatures` to both task.artifact and
task.finished SSE payloads. Conditional — absent when the tool
didn't do verification (local tools, load_recipe, etc.), `null`
when verification was enabled but skipped (no pinnedDID, DID doc
unreachable), populated object when verification actually ran.
- openapi.yaml SSEEvent_TaskArtifact / SSEEvent_TaskFinished
reference a new PlanSignatures schema with full docs on how to
interpret every combination of ok/signed/verified/unsigned,
including the "vacuous yes" trap when signed === 0.
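The out-of-band Map from the second bullet can be illustrated as follows. This is a sketch of the general pattern, not the code in session/prompt.ts: `makeMetadataChannel`, the `output` field on `ExecuteResult`, and the idea that the caller can supply a per-call id are all assumptions.

```typescript
// Assumed result shape: a string for the model plus optional side metadata.
interface ExecuteResult {
  output: string;
  metadata?: Record<string, unknown>;
}
type Execute = (args: unknown) => Promise<ExecuteResult>;

// Create one channel per prompt() call so the Map dies with that call.
function makeMetadataChannel() {
  const byCall = new Map<string, Record<string, unknown>>();

  return {
    // Wrap a tool so the caller sees only the string output, while any
    // metadata is parked in the Map under the call id.
    wrap(execute: Execute) {
      return async (args: unknown, callId: string): Promise<string> => {
        const result = await execute(args);
        if (result.metadata) byCall.set(callId, result.metadata);
        return result.output;
      };
    },
    // Read and remove the metadata for one call (undefined if none was set).
    take(callId: string): Record<string, unknown> | undefined {
      const meta = byCall.get(callId);
      byCall.delete(callId);
      return meta;
    },
  };
}
```

The tool-result handler would call `take(callId)` and attach whatever it finds, such as `signatures`, to the event it publishes.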
Typecheck clean, 194/194 tests pass, redocly lint 0 errors.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Walkthrough
This pull request consolidates gateway documentation from phase-based planning files into a new narrative-driven STORY.md guide and a formal OpenAPI specification.
Sequence Diagram(s)
sequenceDiagram
participant Client
participant Gateway as Gateway Server
participant PlanRoute as /plan Handler
participant Planner as Planner
participant Agent as Agent
Client->>Gateway: POST /plan (request)
Note over PlanRoute: validate findDuplicateToolIds()
alt Collision Detected
PlanRoute-->>Client: 400 invalid_request (tool ID collisions)
else No Collisions
PlanRoute->>Planner: planRequest
Planner->>Agent: execute tool call
Agent-->>Planner: tool result + metadata
Note over Planner: extract signatures from metadata
Planner-->>PlanRoute: task.artifact (with signatures if present)
PlanRoute-->>Client: SSE stream (task.artifact, task.finished events)
end
sequenceDiagram
participant Client
participant Gateway as Gateway Server
participant HealthRoute as /health Handler
participant Config as Config Service
participant AgentService
participant RecipeService
Client->>Gateway: GET /health
HealthRoute->>Config: fetch config
HealthRoute->>AgentService: get agent count
HealthRoute->>RecipeService: get recipe count
Note over HealthRoute: compute uptime<br/>classify ready/health/status<br/>derive gateway ID and author
HealthRoute-->>Client: 200 JSON (version, runtime, application, system, ready)
Estimated code review effort: 4 (Complex), ~50 minutes
The review spans diverse areas: significant documentation replacement (1,884 lines removed, 1,200+ added), new feature implementations across three files (health endpoint, collision detection, fleet automation), logic modifications in session handling with metadata threading, and ten test files added/modified. The changes exhibit mixed complexity: some are straightforward deletions and doc updates, others require understanding of new planner validation, signature metadata flow, and health endpoint composition.
feat(gateway): verification hardening + docs consolidation (post-#487)
Summary by CodeRabbit
Release Notes
New Features
/health endpoint providing gateway status, configuration, and readiness checks
Improvements
Documentation