From 334066c5bcd68c903da3e73f037ff6702fd0fc99 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Raahul=20Dutta=20-=20=E0=A6=B0=E0=A6=BE=E0=A6=B9=E0=A7=81?= =?UTF-8?q?=E0=A6=B2=20=F0=9F=96=96?= Date: Mon, 20 Apr 2026 16:31:26 +0200 Subject: [PATCH 1/6] =?UTF-8?q?docs(gateway):=20add=20BUGS=5FAND=5FKNOWN?= =?UTF-8?q?=5FISSUES.md=20=E2=80=94=20things=20noticed=20during=20this=20s?= =?UTF-8?q?ession?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Living list of concrete, file-path-grounded defects and gaps. 20 entries across 5 categories (security/correctness, reliability/observability, recipes, fleet/DX, API surface, tests, docs). Severity-tagged so a reader can triage: * 3 high-severity items β€” vacuous signatures.ok=true, dead AgentCard fallback code, and the permission model's "ask" being silently equivalent to "allow". * 10 medium items β€” catalog mutation on resumed sessions, no request size limit, no observability, no hot reload for recipes, etc. * 7 low items β€” stale names, missing tests, docs gaps. Each entry has a file path + line number so "I'll fix one" is a concrete task, not a discovery project. Adds a short "How to add to this list" footer so future contributors know the format. No code changes. Meant to be the gateway's equivalent of the "known issues" sections most production projects keep but this one was missing. Co-Authored-By: Claude Opus 4.7 (1M context) --- gateway/docs/BUGS_AND_KNOWN_ISSUES.md | 424 ++++++++++++++++++++++++++ 1 file changed, 424 insertions(+) create mode 100644 gateway/docs/BUGS_AND_KNOWN_ISSUES.md diff --git a/gateway/docs/BUGS_AND_KNOWN_ISSUES.md b/gateway/docs/BUGS_AND_KNOWN_ISSUES.md new file mode 100644 index 00000000..ab75bf1d --- /dev/null +++ b/gateway/docs/BUGS_AND_KNOWN_ISSUES.md @@ -0,0 +1,424 @@ +# Bugs and Known Issues β€” Bindu Gateway + +Living list of things that are wrong, brittle, or missing. Everything here is +grounded in specific files and line numbers so fixing them is a concrete task, +not a discovery project. + +**Legend:** +- πŸ”΄ **high** β€” behavior is subtly wrong or security-adjacent; fix before + anyone depends on it for production guarantees. +- 🟠 **medium** β€” real gap, not urgent, will bite someone eventually. +- 🟑 **low** β€” DX wart, stale comment, or nice-to-have. + +**Last reviewed:** 2026-04-20, on branch `feat/gateway-recipes`. + +--- + +## Security and correctness + +### πŸ”΄ `signatures.ok === true` can be vacuously true + +**Where:** [`src/bindu/client/index.ts:218`](../src/bindu/client/index.ts:218) + +```ts +ok: signed === 0 ? true : signed === verified +``` + +If an agent returns artifacts with **no signatures at all**, `ok` is `true` +and the `` envelope gets stamped `verified="yes"` β€” which is +indistinguishable (to the planner LLM reading the envelope) from "every +signature checked out against the pinned DID." The `signatures` SSE field +now exposes the counts so operators can tell, but the envelope itself still +lies. + +**Fix idea:** make the envelope's `verified` attribute three-valued β€” +`verified`, `unverified-unsigned`, `unverified-failed` β€” and update the +planner's system prompt so it knows how to read each. + +--- + +### πŸ”΄ `peer.card` is never populated β€” AgentCard-based DID fallback is dead code + +**Where:** [`src/bindu/client/index.ts:196`](../src/bindu/client/index.ts:196) + +```ts +const did = peer.trust.pinnedDID ?? (peer.card ? getPeerDID(peer.card) : null) +``` + +The `peer.card` branch is permanently `null` β€” nothing in the codebase +fetches `/.well-known/agent.json` or sets `PeerDescriptor.card`. Grep +confirms: no writes to `peer.card =` anywhere. Either: + +1. Implement AgentCard fetch at first call per session (the "option C" + from the earlier design discussion), OR +2. Delete the fallback and mark `peer.card` as reserved for a future + feature β€” pretending the fallback exists is worse than owning up. + +The plan-route's [`findPinnedDID()`](../src/api/plan-route.ts:314) also +only reads `pinnedDID`, so `agent_did` in SSE is `null` whenever the +caller didn't pin. Users quickly trip on this. + +--- + +### 🟠 `pinnedDID` accepts any string β€” no validation that it looks like a DID + +**Where:** [`src/planner/index.ts:77-79`](../src/planner/index.ts:77) + +```ts +trust: z.object({ + verifyDID: z.boolean().optional(), + pinnedDID: z.string().optional(), +}).optional(), +``` + +A caller can send `pinnedDID: "hello"` and the gateway will dutifully echo +`"hello"` in every SSE `agent_did` frame and try to resolve it (fails +silently at resolver). This burned us once with literal `${RESEARCH_DID}` +from an un-interpolated Postman variable. + +**Fix:** refine the Zod schema with `.regex(/^did:[a-z0-9]+:/)` at minimum. +Better: reject unknown DID methods (keep a whitelist: `did:bindu`, +`did:key`). Give the caller a 400 with a clear "didn't look like a DID" +message at the API boundary, not a silent misconfiguration. + +--- + +### 🟠 Session continuation silently overwrites `agentCatalog` on every call + +**Where:** [`src/planner/index.ts:184`](../src/planner/index.ts:184) + +```ts +if (existing) { + yield* db.updateSessionCatalog(sessionID, request.agents) +} +``` + +A second `/plan` with the same `session_id` and a different (shorter? +empty?) `agents` list replaces the catalog in the DB. Prior turns in the +session's history still reference the old catalog's tool names, but the +planner going forward sees the new catalog. If the user's second call +has fewer agents, the planner can't "call back" the agents from turn 1 +even though history suggests they existed. + +**Fix options:** +- Reject catalog mutation on resumed sessions (strict β€” might frustrate + callers who want to extend mid-conversation). +- Append rather than replace (accumulate over session lifetime). +- Version the catalog per message so history correctly references the + catalog that existed at the time the message was written. + +Today's silent-replace is the worst of the three β€” it pretends nothing +changed while changing it. + +--- + +### 🟠 `Recipe.available(agent)` treats "ask" like "allow" + +**Where:** [`src/recipe/index.ts:207-212`](../src/recipe/index.ts:207) + +```ts +return recipes.filter( + (r) => permEvaluate(rs, { permission: "recipe", target: r.name, defaultAction: "allow" }) !== "deny", +) +``` + +The permission evaluator returns `allow | deny | ask`. The recipe loader +filters only `deny`. That means a recipe with `permission: recipe: { "x": "ask" }` +will be **shown to the agent and loadable** without any interactive check β€” +`ctx.ask` on Tool.Context is a no-op today (see next item). + +**Fix:** decide the semantics now. Either: +- Treat "ask" as "deny until a permission UI ships" (safer default), OR +- Document that "ask" is identical to "allow" until Phase 2 β€” and change + the evaluator's three-valued return to a two-valued one for recipes. + +Current state will bite the first operator who writes `"ask"` thinking +it's restrictive. + +--- + +### 🟠 `ctx.ask` permission hook is never wired + +**Where:** [`src/session/prompt.ts:390-405`](../src/session/prompt.ts:390) (the `wrapTool` function) + +The `ToolContext` interface in [`src/tool/tool.ts:26`](../src/tool/tool.ts:26) +declares `ask?` as optional, and `wrapTool` constructs the context without +setting it. [`src/tool/recipe.ts:130-133`](../src/tool/recipe.ts:130) guards +the call (`if (ctx.ask) { ... }`) so there's no crash β€” it's just silently +skipped on every call. + +This means recipes load unconditionally today, permission config +notwithstanding. It's a known Phase-2 gap but operators reading the +`permission.recipe:` docs might reasonably expect it to work. + +**Fix:** either (a) wire a real `ask` implementation that can at minimum +log denied loads for later audit, or (b) add a `WARN: ctx.ask not wired` +log line at boot so operators see it. + +--- + +## Reliability and observability + +### 🟠 No request size limit on `/plan` body + +**Where:** [`src/api/plan-route.ts:62-68`](../src/api/plan-route.ts:62) + +```ts +const body = await c.req.json() +request = PlanRequest.parse(body) +``` + +Hono doesn't cap the body by default. A caller could POST a 100 MB +`agents[]` array and the gateway will happily parse it, run through +Zod, and push it into Supabase. At minimum it'll thrash memory. + +**Fix:** add a `bodyLimit` middleware on the `/plan` route β€” probably +1 MB for now (generous for real catalogs, hard-stops runaway payloads). + +--- + +### 🟠 Supabase failures mid-stream surface as a generic `event: error` + +**Where:** [`src/api/plan-route.ts:181-187`](../src/api/plan-route.ts:181) + +```ts +try { + await Effect.runPromise(planner.runPlan(sessionCtx, request, { abort: ac.signal })) +} catch (e) { + await stream.writeSSE({ + event: "error", + data: JSON.stringify({ message: (e as Error).message }), + }) +} +``` + +If Supabase goes down mid-plan, the generic `Error.message` ends up in +the SSE. Useful for a developer tailing logs; unhelpful for a caller +trying to write retry logic β€” no error code, no structured category. + +**Fix:** classify errors into a small enum (`db_failed`, +`peer_failed`, `llm_failed`, `timeout`) at the boundary and emit both +the code and the message. + +--- + +### 🟠 No observability β€” structured logs, tracing, correlation IDs + +No OTel, no structured-log fields, no correlation ID threaded through +Bus events. Debugging a production incident today means grepping +`console.log` output against the wall clock. + +**Fix path:** +- Adopt a structured logger (`pino` is what OpenCode uses and we + already vendored Effect). +- Add a `request_id` to every `Bus.publish` so a grep can follow one + `/plan` end-to-end. +- Wrap `planner.runPlan` in an OTel span when the env var + `OTEL_EXPORTER_OTLP_ENDPOINT` is set. + +--- + +### 🟑 `/health` checks no downstream connectivity + +By design (see `src/api/health-route.ts` module docstring) `/health` +doesn't ping Supabase or OpenRouter. That's correct for liveness probes +but leaves **no readiness endpoint** that does check downstream state. +An operator wanting k8s readiness gating has nowhere to wire it. + +**Fix:** add `/ready` that *does* ping Supabase + fetches OpenRouter's +`/models` with a short timeout. Return 503 if either fails. Keep +`/health` cheap for liveness. + +--- + +## Recipes + +### 🟠 No hot reload β€” authoring loop requires a gateway restart + +**Where:** [`src/recipe/index.ts:160-195`](../src/recipe/index.ts:160) β€” the +layer reads the filesystem once at init. + +Write a recipe, save, run `/plan` β€” planner uses the old recipe list +(or none if it's your first). Tight authoring loop needs Ctrl-C + +`npm run dev` every time. + +**Fix idea:** `chokidar` watch on `recipes/` dir β†’ re-run +`loadRecipesDir` β†’ swap state on change. Careful with the +`InstanceState`-style caching; the Effect layer will need a +cache-invalidation hook. + +--- + +### 🟑 `load_recipe` emits `agent_did: null` in SSE β€” can confuse consumers + +**Where:** `src/api/plan-route.ts` β€” `findPinnedDID` returns `null` for +any tool whose name doesn't match a catalog `agents[].name`, which is +correct for `load_recipe` (not a peer call) but consumers that filter +`task.*` frames by DID won't find this one. It looks like a peer call +that failed DID pinning. + +**Fix:** add an explicit `tool_kind: "peer" | "local"` field on the +SSE frames so consumers can partition. Right now the `agent` field +sometimes means "catalog entry name" (peer calls) and sometimes means +"local tool id" (load_recipe) β€” overloading. + +--- + +### 🟑 Recipe bundled-file scan has no depth signal in output + +**Where:** [`src/tool/recipe.ts:75-102`](../src/tool/recipe.ts:75) + +The tool returns a flat list of files inside `` β€” +relative paths only, no hint that the scan bottomed out at 10 entries +or 2 levels deep. If a recipe has >10 files, the planner silently sees +a truncated list. + +**Fix:** include an explicit `truncated: true | false` signal in the +metadata returned by the tool, and emit a console warning when a +recipe dir has more than 10 files. + +--- + +## Fleet / DX + +### 🟑 `faq_agent.py` registers as `bindu_docs_agent` β€” confusing DID name + +**Where:** `examples/gateway_test_fleet/faq_agent.py` (Python side, not +gateway code) + +Observed during the DID-printing work: +``` +faq_agent :3778 did:bindu:gateway_test_fleet_at_getbindu_com:bindu_docs_agent:9327ab1d-... +``` + +The DID segment is `bindu_docs_agent` because that's what the Python +agent registered itself as. The filename and the conventional catalog +name say `faq`. First-time readers of STORY.md Chapter 3 will stumble +on the mismatch. + +**Fix:** rename the Python agent to register consistently as +`faq_agent` OR rename the Python file to `bindu_docs_agent.py`. Pick +one, fix in one place. + +--- + +### 🟑 `.fleet.env` can go stale if agents regenerate DIDs + +The script regenerates `.fleet.env` on every run via `>`, so this is +fine in practice. But if an agent's DID seed rotates (delete +`~/.bindu/`, restart), and a user has an old shell with +`$RESEARCH_DID` set from a sourced `.fleet.env`, their next `/plan` +will pin the old DID and signature verification will fail cryptically. + +**Fix:** add a comment to `.fleet.env` reminding users to re-source +after any fleet restart, and have `start_fleet.sh` print a line +saying "if your shell has old DIDs sourced, re-source `.fleet.env`". + +--- + +## API surface + +### 🟑 No explicit `Accept: text/event-stream` enforcement on `/plan` + +Clients that POST without the header still get SSE. Not strictly +wrong (Hono streams regardless) but the implicit contract is +"everyone wants SSE back," which is only true by convention. + +**Fix:** either document `Accept` as required in openapi.yaml (it's +not today) or enforce it with a 406 for anything that doesn't list +`text/event-stream` or `*/*`. + +--- + +### 🟑 OpenRouter is the only supported provider + +**Where:** [`src/provider/index.ts`](../src/provider/index.ts) + +Swapping to a different LLM provider is a code change, not config. +The `model` field accepts `openrouter/…` strings and everything +downstream assumes that prefix. + +**Fix:** make provider lookup factory-style β€” `openrouter/*` β†’ +OpenRouterProvider, `anthropic/*` β†’ AnthropicProvider, etc. Not +urgent since OpenRouter already proxies almost everything, but +worth doing before any customer asks for it. + +--- + +## Tests and coverage gaps + +### 🟑 No planner-layer integration test + +We skipped this in Phase 7 of the recipes work because mocking +Provider + Session + DB + Bus is heavy. The unit tests for the +loader and tool cover the contracts, and the `/plan` SSE handler +has its own filter test, but no single test walks a request +end-to-end through the planner with a mocked LLM. + +**Fix:** build a layer that replaces `Provider.Service` with a +fake that emits a canned `StreamEvent` sequence, then assert the +SSE output matches expectations. Would also let us test the +signature-surfacing work against synthetic tool results. + +--- + +### 🟑 No health-endpoint integration test + +[`tests/api/health-route.test.ts`](../tests/api/health-route.test.ts) +only tests the pure helpers (`splitModelId`, `deriveGatewayId`, +`deriveAuthor`). The actual handler construction + response shape +is covered only by manual curl. + +**Fix:** add a test that builds the real layer graph (minus +Supabase β€” mock it), invokes the handler against a stub Hono +context, and asserts the full response body. Would catch drift +between the openapi schema and the actual response. + +--- + +## Docs + +### 🟑 STORY.md Chapter 5 references `BINDU_GATEWAY_HYDRA_SCOPE` only via +the gateway README β€” no standalone explanation. + +A reader going through STORY.md linearly hits Chapter 5 and needs +to understand what scopes mean before the DID-signing setup makes +sense. The README covers it but STORY.md doesn't. + +**Fix:** one-paragraph sidebar in Chapter 5 explaining "OAuth +scopes are just labels we ask Hydra to stamp on tokens; peers +check they have `agent:read` + `agent:write` before accepting +a /message/send". Not technical, just context. + +--- + +### 🟑 openapi.yaml lists SSE event schemas that aren't `$ref`'d + +Current redocly lint reports 13 "unused component" warnings for +`SSEEvent_Session`, `_Plan`, `_TextDelta`, `_TaskStarted`, etc. +OpenAPI 3.1 has no native SSE modeling so those schemas sit as +reference docs rather than being spliced into a response body. + +**Fix options:** +- Accept the warnings as documented (low-effort, pragmatic). +- Use `oneOf` inside the response's `text/event-stream` schema + to enumerate every event shape. Stretches OpenAPI but at + least gets the schemas used. +- Publish a second spec file (AsyncAPI 2.x or 3.x) for the + SSE surface. AsyncAPI natively models event streams. + +--- + +## How to add to this list + +When you find a gotcha while working on the gateway, add an entry +here. Format: + +- **Title** with severity icon +- **Where** β€” file path + line number (click-through) +- **What** β€” observed behavior + one-line hypothesis of cause +- **Fix** β€” concrete direction, not a rewrite + +Keep entries factual and file-path-grounded. Speculative "would be +nice" wishes belong in GitHub issues; this file is for verifiable +defects and real gaps. From 2a3ff15765893c2c9570cdf3737b8f0bfefac8e0 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Raahul=20Dutta=20-=20=E0=A6=B0=E0=A6=BE=E0=A6=B9=E0=A7=81?= =?UTF-8?q?=E0=A6=B2=20=F0=9F=96=96?= Date: Mon, 20 Apr 2026 16:36:38 +0200 Subject: [PATCH 2/6] fix(docs): silence detect-secrets false positive on sample DID signature MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit CI's detect-secrets hook flagged docs/openapi.yaml:598 as a "Base64 High Entropy String." That line is a base58-encoded Ed25519 signature used as an API example β€” signatures are public by design (that's the whole point), not a secret. Adding the standard `# pragma: allowlist secret` inline comment tells the scanner to skip this specific line without loosening global rules. Matches the pattern the project CLAUDE.md Β§Recent Learnings documents for .env.example files. Fixes https://github.com/GetBindu/Bindu/actions/runs/24672201148 --- docs/openapi.yaml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/openapi.yaml b/docs/openapi.yaml index 7ab7ace4..68b47a65 100644 --- a/docs/openapi.yaml +++ b/docs/openapi.yaml @@ -595,7 +595,7 @@ paths: - kind: "text" text: "Paris." metadata: - did.message.signature: "2M1qbfLcyoQhSAfTzDghw15PMTfmv3jUoigk7KRuiowkEWZpU7aYLHTnqwamjEo4SxNskq15PZANNLuhJ7omzsxg" + did.message.signature: "2M1qbfLcyoQhSAfTzDghw15PMTfmv3jUoigk7KRuiowkEWZpU7aYLHTnqwamjEo4SxNskq15PZANNLuhJ7omzsxg" # pragma: allowlist secret artifact_id: "985b4f37-ee2e-48a4-bd6f-c66472e67b85" metadata: {} From b0c0b8a137ec754ffff535ad31078d7402f2a395 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Raahul=20Dutta=20-=20=E0=A6=B0=E0=A6=BE=E0=A6=B9=E0=A7=81?= =?UTF-8?q?=E0=A6=B2=20=F0=9F=96=96?= Date: Mon, 20 Apr 2026 16:40:40 +0200 Subject: [PATCH 3/6] =?UTF-8?q?fix(gateway):=20split=20'verified=3Dyes'=20?= =?UTF-8?q?into=20yes/unsigned=20=E2=80=94=20no=20more=20vacuous=20pass?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Addresses the first high-severity entry in BUGS_AND_KNOWN_ISSUES.md. Before: the envelope collapsed `signatures.ok` (a boolean) into a three-valued attribute β€” yes/no/unknown. When zero artifacts were signed, `ok` was vacuously true (nothing to fail) and the envelope read `verified="yes"` β€” indistinguishable to the planner LLM from a real cryptographic pass. After: four-valued `verified` attribute driven by a new pure helper computeVerifiedLabel(signatures): null β†’ "unknown" !ok β†’ "no" ok && signed === 0 β†’ "unsigned" (NEW) ok && signed > 0 && signed === verified β†’ "yes" The authoritative `signatures.ok` in the Bindu client stays unchanged β€” "no failures" is still a useful concept for the DB audit row and SSE signatures field. The label is just a richer projection for the planner's benefit. Implementation: - New exported VerifiedLabel type + computeVerifiedLabel() in planner - wrapRemoteContent takes the label directly (not a boolean) - Call site passes the full signatures object through the new helper Test: tests/planner/verified-label.test.ts β€” 7 cases pinning each branch, including the regression (vacuous-yes becomes "unsigned"). openapi.yaml: SSEEvent_TaskArtifact.content description updated to enumerate the four labels with semantics. BUGS_AND_KNOWN_ISSUES.md: entry marked RESOLVED with reference to the helper and the test. Typecheck clean, 201/201 tests pass, redocly lint 0 errors. Co-Authored-By: Claude Opus 4.7 (1M context) --- gateway/docs/BUGS_AND_KNOWN_ISSUES.md | 36 +++++++---- gateway/openapi.yaml | 14 ++++- gateway/src/planner/index.ts | 45 ++++++++++++-- gateway/tests/planner/verified-label.test.ts | 63 ++++++++++++++++++++ 4 files changed, 140 insertions(+), 18 deletions(-) create mode 100644 gateway/tests/planner/verified-label.test.ts diff --git a/gateway/docs/BUGS_AND_KNOWN_ISSUES.md b/gateway/docs/BUGS_AND_KNOWN_ISSUES.md index ab75bf1d..7f903c29 100644 --- a/gateway/docs/BUGS_AND_KNOWN_ISSUES.md +++ b/gateway/docs/BUGS_AND_KNOWN_ISSUES.md @@ -16,24 +16,34 @@ not a discovery project. ## Security and correctness -### πŸ”΄ `signatures.ok === true` can be vacuously true +### βœ… RESOLVED β€” `signatures.ok === true` was vacuously true -**Where:** [`src/bindu/client/index.ts:218`](../src/bindu/client/index.ts:218) +**Was at:** [`src/bindu/client/index.ts:218`](../src/bindu/client/index.ts:218) +(unchanged β€” that's the authoritative `ok` boolean, which still has +`signed === 0 ? true` semantics because "nothing to fail" isn't a failure). + +**Fixed by:** making the `` envelope label four-valued +instead of three-valued. See +[`computeVerifiedLabel()`](../src/planner/index.ts) β€” new pure helper +exported so the mapping is testable without the whole peer-call loop. -```ts -ok: signed === 0 ? true : signed === verified ``` +null β†’ "unknown" +!ok β†’ "no" +ok && signed === 0 β†’ "unsigned" <-- new +ok && signed > 0 && signed === verified β†’ "yes" +``` + +Before the fix, an agent that returned artifacts with no signatures +attached caused `signatures.ok` to be vacuously `true`, which the +envelope stamped as `verified="yes"` β€” indistinguishable from a real +cryptographic pass. The planner LLM trusted it equally. Post-fix the +`unsigned` label surfaces the ambiguity explicitly so the planner can +downweight unverified-hearsay responses in its reasoning. -If an agent returns artifacts with **no signatures at all**, `ok` is `true` -and the `` envelope gets stamped `verified="yes"` β€” which is -indistinguishable (to the planner LLM reading the envelope) from "every -signature checked out against the pinned DID." The `signatures` SSE field -now exposes the counts so operators can tell, but the envelope itself still -lies. +Coverage: [`tests/planner/verified-label.test.ts`](../tests/planner/verified-label.test.ts) β€” 7 cases pinning each branch including the regression (vacuous-yes becomes `"unsigned"`). -**Fix idea:** make the envelope's `verified` attribute three-valued β€” -`verified`, `unverified-unsigned`, `unverified-failed` β€” and update the -planner's system prompt so it knows how to read each. +Shipped on branch `feat/gateway-recipes` prior to merge. --- diff --git a/gateway/openapi.yaml b/gateway/openapi.yaml index c02b37ea..0a402862 100644 --- a/gateway/openapi.yaml +++ b/gateway/openapi.yaml @@ -972,9 +972,21 @@ components: content: type: string description: | - The peer's artifact text, wrapped in a `...` + The peer's artifact text, wrapped in a + `...` envelope. The planner treats this as untrusted data β€” clients should too. + + `verified` is four-valued: + - `yes` β†’ at least one signed artifact and all signed + verified against the pinned DID's public key. + Strongest guarantee. + - `no` β†’ at least one signed artifact failed + verification. Task is also marked `failed`. + - `unsigned` β†’ verification ran but no artifact carried a + signature. The body is unverified hearsay. + - `unknown` β†’ verification wasn't attempted (no `verifyDID`, + no `pinnedDID`, or DID doc unreachable). title: type: string description: Short display title, typically `@/`. diff --git a/gateway/src/planner/index.ts b/gateway/src/planner/index.ts index b58ce2ea..276af264 100644 --- a/gateway/src/planner/index.ts +++ b/gateway/src/planner/index.ts @@ -398,11 +398,15 @@ function buildSkillTool(peer: PeerDescriptor, skill: SkillRequest, deps: BuildTo }, }) - // 4. Wrap the output in an untrusted-content envelope for the planner + // 4. Wrap the output in an untrusted-content envelope for the planner. + // The `verified` attribute is four-valued (yes/no/unsigned/unknown) + // so the planner LLM can distinguish a cryptographic pass from + // "no signatures existed to check" β€” the latter used to appear + // as `verified="yes"` (vacuous) and quietly misled the model. const wrapped = wrapRemoteContent({ agentName: peer.name, did: peer.trust?.pinnedDID ?? null, - verified: outcome.signatures?.ok ?? null, + verified: computeVerifiedLabel(outcome.signatures ?? null), body: outputText, }) @@ -557,13 +561,46 @@ function extractOutputText(task: import("../bindu/protocol/types").Task): string return pieces.join("\n\n") } +/** + * Label the `verified` attribute on the envelope with + * enough fidelity that the planner LLM can distinguish the four real + * states verification can produce: + * + * - `"yes"` β€” at least one artifact carried a signature AND every + * signed artifact checked out against the pinned DID's + * public key. Strongest claim. + * - `"no"` β€” at least one signed artifact failed verification. + * Treat the body as definitely-tampered or + * wrong-provenance; the task is also marked `failed`. + * - `"unsigned"` β€” verification ran, but no artifact had a signature + * attached. Bodies came back; nothing was checked. + * Previously this collapsed into `"yes"` because the + * Bindu client's `signatures.ok` is vacuously true + * when `signed === 0` β€” misleading. Now it's + * explicit, so the planner can weight the source + * appropriately (e.g., treat as unverified hearsay). + * - `"unknown"` β€” verification was not attempted. Either + * `trust.verifyDID` was false, no pinned DID was + * supplied, or DID document resolution failed. + */ +export type VerifiedLabel = "yes" | "no" | "unsigned" | "unknown" + +export function computeVerifiedLabel( + signatures: { ok: boolean; signed: number; verified: number; unsigned: number } | null, +): VerifiedLabel { + if (signatures === null) return "unknown" + if (!signatures.ok) return "no" + if (signatures.signed === 0) return "unsigned" + return "yes" +} + function wrapRemoteContent(input: { agentName: string did: string | null - verified: boolean | null + verified: VerifiedLabel body: string }): string { - const verified = input.verified === null ? "unknown" : input.verified ? "yes" : "no" + const verified = input.verified const didAttr = input.did ? ` did="${escapeAttr(input.did)}"` : "" // Scrub nested envelope markers and common injection phrases so a // malicious peer can't forge a wrapper inside their own response. diff --git a/gateway/tests/planner/verified-label.test.ts b/gateway/tests/planner/verified-label.test.ts new file mode 100644 index 00000000..c3e98c0b --- /dev/null +++ b/gateway/tests/planner/verified-label.test.ts @@ -0,0 +1,63 @@ +import { describe, it, expect } from "vitest" +import { computeVerifiedLabel } from "../../src/planner" + +/** + * Regression test for the "vacuous verified=yes" bug. + * + * Before this fix, the envelope flattened any positive + * signatures.ok to verified="yes" β€” including the case where zero + * artifacts were signed (ok is vacuously true when there's nothing to + * fail). A planner LLM reading verified="yes" couldn't tell if a real + * cryptographic check happened or if the agent simply isn't signing. + * + * The fix splits the happy path: "yes" now means "signed > 0 && all + * verified" and a new "unsigned" label marks "verification ran but + * nothing was signed." See BUGS_AND_KNOWN_ISSUES.md Β§Security and + * correctness for the original issue. + */ + +describe("computeVerifiedLabel", () => { + it("returns 'unknown' when verification wasn't attempted (signatures null)", () => { + expect(computeVerifiedLabel(null)).toBe("unknown") + }) + + it("returns 'yes' when every signed artifact verified", () => { + expect( + computeVerifiedLabel({ ok: true, signed: 2, verified: 2, unsigned: 0 }), + ).toBe("yes") + }) + + it("returns 'yes' even when some artifacts were unsigned, as long as every signed one verified", () => { + expect( + computeVerifiedLabel({ ok: true, signed: 1, verified: 1, unsigned: 3 }), + ).toBe("yes") + }) + + it("returns 'unsigned' (not 'yes') when no artifacts carried signatures β€” prevents the vacuous pass", () => { + // This is the bug the fix addresses. Pre-fix: "yes". Post-fix: "unsigned". + expect( + computeVerifiedLabel({ ok: true, signed: 0, verified: 0, unsigned: 4 }), + ).toBe("unsigned") + }) + + it("returns 'unsigned' on an empty artifact list too", () => { + expect( + computeVerifiedLabel({ ok: true, signed: 0, verified: 0, unsigned: 0 }), + ).toBe("unsigned") + }) + + it("returns 'no' when at least one signed artifact failed verification", () => { + expect( + computeVerifiedLabel({ ok: false, signed: 2, verified: 1, unsigned: 0 }), + ).toBe("no") + }) + + it("returns 'no' even when signed === verified if ok is false (defensive against callers mutating counts)", () => { + // Shouldn't happen with the real Bindu client's output β€” ok follows + // deterministically from counts β€” but if any future caller constructs + // a signatures object by hand, `ok` is the authoritative field. + expect( + computeVerifiedLabel({ ok: false, signed: 1, verified: 1, unsigned: 0 }), + ).toBe("no") + }) +}) From d540ef2a53ad23192b9f4f11b67fe35a1e35394a Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Raahul=20Dutta=20-=20=E0=A6=B0=E0=A6=BE=E0=A6=B9=E0=A7=81?= =?UTF-8?q?=E0=A6=B2=20=F0=9F=96=96?= Date: Mon, 20 Apr 2026 16:44:44 +0200 Subject: [PATCH 4/6] =?UTF-8?q?feat(gateway):=20fetch=20AgentCard=20on=20f?= =?UTF-8?q?irst=20contact=20=E2=80=94=20activate=20DID=20fallback?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Resolves the second high-severity entry in BUGS_AND_KNOWN_ISSUES.md β€” the `peer.card` fallback at bindu/client/index.ts:196 was dead code because nothing in the codebase ever populated it. Signature verification required a pinnedDID; with none set, maybeVerifySignatures always short-circuited. New src/bindu/client/agent-card.ts β€” fetchAgentCard(peerUrl, opts): - GETs /.well-known/agent.json, Zod-parses against the AgentCard schema, caches per-process. - 2-second default timeout (short because it blocks the first call per peer; callers can override). - Every failure mode degrades to null: non-2xx, malformed JSON, schema mismatch, network error, abort. Failures are cached too so a flaky peer doesn't cost one outbound fetch per /plan. Wired into runCall at bindu/client/index.ts: fetches only when `trust.verifyDID: true` AND peer.card isn't already set. No cost on peers that don't care about verification. Mutation of input.peer.card is safe because PeerDescriptor is built fresh per catalog entry per request. End-to-end effect: `trust.verifyDID: true` WITHOUT `pinnedDID` is now a real feature. The gateway observes the peer's published DID, resolves its public key via the DID document, and verifies every artifact signature against it. Pinned wins over observed when both are set (pinning is a stronger security claim β€” the caller vouched for the identity, while observed just trusts what the peer published). Coverage: tests/bindu/agent-card-fetch.test.ts β€” 9 cases: - success path parses valid card - 404 returns null - malformed JSON returns null - schema mismatch returns null - network throw returns null - cache hit skips re-fetch (same reference returned) - negative cache skips re-fetch on known-bad peer - per-URL isolation β€” different peers fetched independently - timeout honored via internal AbortController BUGS_AND_KNOWN_ISSUES.md: original AgentCard entry marked RESOLVED with cross-refs to the helper and test. A new medium entry now tracks the remaining follow-up β€” SSE's agent_did field still only reflects pinned DIDs, not observed ones. That's plan-route.ts layer (findPinnedDID β†’ findAgentDID with observed fallback), scheduled next. Typecheck clean, 210/210 tests pass (+9 agent-card-fetch). Co-Authored-By: Claude Opus 4.7 (1M context) --- gateway/docs/BUGS_AND_KNOWN_ISSUES.md | 66 ++++++-- gateway/src/bindu/client/agent-card.ts | 88 +++++++++++ gateway/src/bindu/client/index.ts | 17 ++ gateway/tests/bindu/agent-card-fetch.test.ts | 155 +++++++++++++++++++ 4 files changed, 313 insertions(+), 13 deletions(-) create mode 100644 gateway/src/bindu/client/agent-card.ts create mode 100644 gateway/tests/bindu/agent-card-fetch.test.ts diff --git a/gateway/docs/BUGS_AND_KNOWN_ISSUES.md b/gateway/docs/BUGS_AND_KNOWN_ISSUES.md index 7f903c29..c38482a6 100644 --- a/gateway/docs/BUGS_AND_KNOWN_ISSUES.md +++ b/gateway/docs/BUGS_AND_KNOWN_ISSUES.md @@ -47,26 +47,66 @@ Shipped on branch `feat/gateway-recipes` prior to merge. --- -### πŸ”΄ `peer.card` is never populated β€” AgentCard-based DID fallback is dead code +### βœ… RESOLVED β€” `peer.card` is now populated via `fetchAgentCard` -**Where:** [`src/bindu/client/index.ts:196`](../src/bindu/client/index.ts:196) +**Was at:** [`src/bindu/client/index.ts:196`](../src/bindu/client/index.ts:196) (the fallback itself unchanged β€” it was correct, just starved of input). + +**Fixed by:** new [`src/bindu/client/agent-card.ts`](../src/bindu/client/agent-card.ts) β€” +`fetchAgentCard(peerUrl, opts)` that GETs `/.well-known/agent.json`, +Zod-parses it, and caches per-process. Wired into `runCall` at +[`bindu/client/index.ts:153-159`](../src/bindu/client/index.ts:153): ```ts -const did = peer.trust.pinnedDID ?? (peer.card ? getPeerDID(peer.card) : null) +if (!input.peer.card && input.peer.trust?.verifyDID) { + const card = await fetchAgentCard(input.peer.url, { signal: input.signal }) + if (card) input.peer.card = card +} ``` -The `peer.card` branch is permanently `null` β€” nothing in the codebase -fetches `/.well-known/agent.json` or sets `PeerDescriptor.card`. Grep -confirms: no writes to `peer.card =` anywhere. Either: +Now `trust.verifyDID: true` *without* a `pinnedDID` is a real feature: +the gateway observes the peer's published DID, resolves its public key +via the DID document, and verifies every artifact signature against it. +The pinned path still wins when both are set. + +Design choices: +- **Fetch only when `verifyDID: true`** β€” no network cost on peers that + don't care about verification. +- **2-second timeout** β€” short because it blocks the first call per + peer. Failures degrade to null (same behavior as before the fix). +- **Per-process cache includes negatives** β€” a flaky peer doesn't cost + one outbound fetch per `/plan`. +- **Mutation of `input.peer.card`** is safe because `PeerDescriptor` + is built fresh per catalog entry per request β€” no cross-session + leak. + +Coverage: [`tests/bindu/agent-card-fetch.test.ts`](../tests/bindu/agent-card-fetch.test.ts) +β€” 9 cases (success / 404 / malformed / schema-mismatch / network +failure / cache hit / negative cache / per-URL isolation / timeout). + +**Remaining follow-up β€” SSE `agent_did` still null without pinnedDID.** +`plan-route.ts`'s [`findPinnedDID()`](../src/api/plan-route.ts:314) only +reads `trust.pinnedDID`. The observed DID from `peer.card` isn't +surfaced in the SSE stream yet. Separate ticket; tracked below as +🟠 "SSE `agent_did` doesn't surface observed DIDs". + +--- + +### 🟠 SSE `agent_did` doesn't surface observed DIDs + +**Where:** [`src/api/plan-route.ts:314-316`](../src/api/plan-route.ts:314) β€” `findPinnedDID` only reads `trust.pinnedDID` from the request catalog. -1. Implement AgentCard fetch at first call per session (the "option C" - from the earlier design discussion), OR -2. Delete the fallback and mark `peer.card` as reserved for a future - feature β€” pretending the fallback exists is worse than owning up. +Consequence: even after `fetchAgentCard` populates `peer.card` inside the +Bindu client (see the resolved AgentCard-fallback entry above), the SSE +stream still emits `agent_did: null` unless the caller pinned a DID up +front. Signature verification works against the observed DID; the +display layer doesn't know about it. -The plan-route's [`findPinnedDID()`](../src/api/plan-route.ts:314) also -only reads `pinnedDID`, so `agent_did` in SSE is `null` whenever the -caller didn't pin. Users quickly trip on this. +**Fix:** rename to `findAgentDID(request, agentName)`, add an observed-DID +path (reads a per-plan cache populated after the first successful call +publishes a Bus event with the observed DID), and add an optional +`agent_did_source: "pinned" | "observed" | null` field on SSE frames so +consumers can tell the provenance. Detailed in the earlier design +discussion; ~80 LOC including tests. --- diff --git a/gateway/src/bindu/client/agent-card.ts b/gateway/src/bindu/client/agent-card.ts new file mode 100644 index 00000000..5877c021 --- /dev/null +++ b/gateway/src/bindu/client/agent-card.ts @@ -0,0 +1,88 @@ +import { AgentCard } from "../protocol/agent-card" + +/** + * Fetch and parse a peer's AgentCard from `/.well-known/agent.json`. + * + * Populated via this helper, the `peer.card` field on a PeerDescriptor + * activates the DID fallback in `maybeVerifySignatures` β€” when the + * caller sets `trust.verifyDID: true` but didn't pin a DID, the + * gateway can recover the peer's published DID from its AgentCard and + * verify artifacts against the corresponding public key. + * + * # Cache + * + * Per-process, keyed by peer URL. AgentCards are stable for the life + * of the peer process; a gateway restart is an acceptable boundary + * for picking up a rotated identity. Negative results (404, malformed + * JSON, timeout) are cached too so a flaky peer doesn't cost us one + * outbound fetch per /plan. + * + * # Timeout + * + * 2 seconds by default β€” the fetch blocks the first call to a new + * peer, so we keep it short. Callers can override via the opts. + * + * # Errors become `null` + * + * Every failure mode (network error, non-2xx, invalid JSON, schema + * mismatch, abort) returns `null` rather than throwing. The fallback + * in maybeVerifySignatures degrades gracefully β€” null peer.card just + * means the pinnedDID path is the only option, same behavior as before + * this module existed. Errors aren't "safety failures" here; they're + * "couldn't enrich." + */ + +const cache = new Map() + +export interface FetchAgentCardOptions { + readonly signal?: AbortSignal + readonly timeoutMs?: number +} + +const DEFAULT_TIMEOUT_MS = 2000 +const WELL_KNOWN_PATH = "/.well-known/agent.json" + +export async function fetchAgentCard( + peerUrl: string, + opts: FetchAgentCardOptions = {}, +): Promise { + if (cache.has(peerUrl)) return cache.get(peerUrl) ?? null + + const timeoutMs = opts.timeoutMs ?? DEFAULT_TIMEOUT_MS + const ac = new AbortController() + const timer = setTimeout(() => ac.abort(), timeoutMs) + const onUpstreamAbort = () => ac.abort() + opts.signal?.addEventListener("abort", onUpstreamAbort, { once: true }) + + try { + const target = new URL(WELL_KNOWN_PATH, peerUrl).toString() + const res = await fetch(target, { signal: ac.signal }) + if (!res.ok) { + cache.set(peerUrl, null) + return null + } + const json = (await res.json()) as unknown + const parsed = AgentCard.safeParse(json) + if (!parsed.success) { + cache.set(peerUrl, null) + return null + } + cache.set(peerUrl, parsed.data) + return parsed.data + } catch { + cache.set(peerUrl, null) + return null + } finally { + clearTimeout(timer) + opts.signal?.removeEventListener("abort", onUpstreamAbort) + } +} + +/** + * Reset the AgentCard cache. Only intended for use in tests β€” production + * callers have no reason to evict (a gateway restart picks up any + * identity rotation). Exported on a `__`-prefixed name to signal intent. + */ +export function __resetAgentCardCacheForTests(): void { + cache.clear() +} diff --git a/gateway/src/bindu/client/index.ts b/gateway/src/bindu/client/index.ts index f3d9d085..5c7f571d 100644 --- a/gateway/src/bindu/client/index.ts +++ b/gateway/src/bindu/client/index.ts @@ -10,6 +10,7 @@ import { verifyArtifact } from "../identity/verify" import { createResolver, primaryPublicKeyBase58 } from "../identity/resolve" import { getPeerDID } from "../protocol/identity" import type { AgentCard } from "../protocol/agent-card" +import { fetchAgentCard } from "./agent-card" /** * Public Bindu client β€” the thing the planner's tools invoke. @@ -148,6 +149,22 @@ async function runCall( identity: LocalIdentity | undefined, tokenProvider: TokenProvider | undefined, ): Promise { + // Populate peer.card from /.well-known/agent.json on first contact. + // Cached per process β€” subsequent calls to the same peer are free. + // This activates the AgentCard-based DID fallback in + // maybeVerifySignatures: when the caller enabled `trust.verifyDID` + // without pinning a DID, the gateway recovers the peer's published + // DID here and verifies against its public key. + // + // Runs concurrently-safe (cache handles races via last-write-wins, + // AgentCards are stable), non-blocking on failure (returns null, + // peer.card stays undefined, verification falls through to the + // pinnedDID-only path). + if (!input.peer.card && input.peer.trust?.verifyDID) { + const card = await fetchAgentCard(input.peer.url, { signal: input.signal }) + if (card) input.peer.card = card + } + const parts: Part[] = typeof input.input === "string" ? [{ kind: "text", text: input.input }] : input.input diff --git a/gateway/tests/bindu/agent-card-fetch.test.ts b/gateway/tests/bindu/agent-card-fetch.test.ts new file mode 100644 index 00000000..6e1072f5 --- /dev/null +++ b/gateway/tests/bindu/agent-card-fetch.test.ts @@ -0,0 +1,155 @@ +import { describe, it, expect, beforeEach, vi, afterEach } from "vitest" +import { fetchAgentCard, __resetAgentCardCacheForTests } from "../../src/bindu/client/agent-card" + +/** + * Coverage for the AgentCard fetch helper β€” the piece that activates + * maybeVerifySignatures' observed-DID fallback when the caller enabled + * `trust.verifyDID: true` without pinning a DID. + * + * Resolves the "peer.card is never populated" high-severity entry in + * BUGS_AND_KNOWN_ISSUES.md: before this module, the fallback at + * bindu/client/index.ts:196 was dead code because nothing ever set + * peer.card. These tests pin the four real outcomes (success, 404, + * malformed body, cache hit) and the degrade-to-null contract on + * every failure mode. + */ + +const VALID_CARD = { + id: "did:bindu:ops_at_example_com:research:7dc57d21-2c81-f6f5-c679-e51995f97e22", + name: "research", + description: "Web search and summarize.", + skills: [], + defaultInputModes: ["text/plain"], + defaultOutputModes: ["text/plain"], + capabilities: { + extensions: [ + { + uri: "did:bindu:ops_at_example_com:research:7dc57d21-2c81-f6f5-c679-e51995f97e22", + }, + ], + }, +} + +describe("fetchAgentCard", () => { + beforeEach(() => { + __resetAgentCardCacheForTests() + }) + + afterEach(() => { + vi.restoreAllMocks() + }) + + it("fetches and parses a valid AgentCard from /.well-known/agent.json", async () => { + const fetchSpy = vi.spyOn(global, "fetch").mockResolvedValue( + new Response(JSON.stringify(VALID_CARD), { + status: 200, + headers: { "content-type": "application/json" }, + }), + ) + + const result = await fetchAgentCard("http://localhost:3773") + expect(result).not.toBeNull() + expect(result?.name).toBe("research") + expect(fetchSpy).toHaveBeenCalledWith( + "http://localhost:3773/.well-known/agent.json", + expect.objectContaining({ signal: expect.anything() }), + ) + }) + + it("returns null on non-2xx without throwing", async () => { + vi.spyOn(global, "fetch").mockResolvedValue( + new Response("not found", { status: 404 }), + ) + + const result = await fetchAgentCard("http://localhost:3773") + expect(result).toBeNull() + }) + + it("returns null on malformed JSON body", async () => { + vi.spyOn(global, "fetch").mockResolvedValue( + new Response("not a json body, just a string", { + status: 200, + headers: { "content-type": "application/json" }, + }), + ) + + const result = await fetchAgentCard("http://localhost:3773") + expect(result).toBeNull() + }) + + it("returns null when the body doesn't match the AgentCard schema", async () => { + vi.spyOn(global, "fetch").mockResolvedValue( + new Response(JSON.stringify({ wrong: "shape" }), { + status: 200, + headers: { "content-type": "application/json" }, + }), + ) + + const result = await fetchAgentCard("http://localhost:3773") + expect(result).toBeNull() + }) + + it("returns null and caches the failure when fetch throws", async () => { + vi.spyOn(global, "fetch").mockRejectedValue(new Error("ECONNREFUSED")) + + const result = await fetchAgentCard("http://localhost:3773") + expect(result).toBeNull() + }) + + it("caches successful results β€” second call does not re-fetch", async () => { + const fetchSpy = vi.spyOn(global, "fetch").mockResolvedValue( + new Response(JSON.stringify(VALID_CARD), { + status: 200, + headers: { "content-type": "application/json" }, + }), + ) + + const first = await fetchAgentCard("http://localhost:3773") + const second = await fetchAgentCard("http://localhost:3773") + + expect(first).toBe(second) // same cached reference + expect(fetchSpy).toHaveBeenCalledTimes(1) + }) + + it("caches failures too β€” second call does not re-fetch a known-bad peer", async () => { + const fetchSpy = vi.spyOn(global, "fetch").mockResolvedValue( + new Response("", { status: 404 }), + ) + + await fetchAgentCard("http://localhost:3773") + await fetchAgentCard("http://localhost:3773") + + expect(fetchSpy).toHaveBeenCalledTimes(1) + }) + + it("caches per-URL β€” a different peer is fetched independently", async () => { + const fetchSpy = vi.spyOn(global, "fetch").mockResolvedValue( + new Response(JSON.stringify(VALID_CARD), { status: 200 }), + ) + + await fetchAgentCard("http://localhost:3773") + await fetchAgentCard("http://localhost:3775") + + expect(fetchSpy).toHaveBeenCalledTimes(2) + const urls = fetchSpy.mock.calls.map((c) => c[0]) + expect(urls).toContain("http://localhost:3773/.well-known/agent.json") + expect(urls).toContain("http://localhost:3775/.well-known/agent.json") + }) + + it("aborts on timeout", async () => { + // Simulate a fetch that never resolves β€” the helper's internal + // timeout should abort. We use a short timeout and expect null. + vi.spyOn(global, "fetch").mockImplementation( + (_url, init) => + new Promise((_resolve, reject) => { + const signal = (init as { signal?: AbortSignal } | undefined)?.signal + signal?.addEventListener("abort", () => reject(new Error("aborted")), { + once: true, + }) + }), + ) + + const result = await fetchAgentCard("http://localhost:3773", { timeoutMs: 50 }) + expect(result).toBeNull() + }) +}) From 8681d48531fd469adc33b6e25830641a4a08cc4c Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Raahul=20Dutta=20-=20=E0=A6=B0=E0=A6=BE=E0=A6=B9=E0=A7=81?= =?UTF-8?q?=E0=A6=B2=20=F0=9F=96=96?= Date: Mon, 20 Apr 2026 16:51:19 +0200 Subject: [PATCH 5/6] feat(gateway): surface observed DIDs in SSE with provenance MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Resolves the "SSE agent_did doesn't surface observed DIDs" follow-up from the AgentCard fetch work. Before: findPinnedDID read only trust.pinnedDID, so callers who didn't pin saw agent_did: null in every task.* frame even after the Bindu client had observed the peer's DID via AgentCard. Signature verification worked against the observed DID; the display layer didn't know about it. Changes: - New findAgentDID(request, observedByName, agentName) β†’ {did, source} with precedence pinned > observed > null. Replaces findPinnedDID. - New `agent_did_source: "pinned" | "observed" | null` field on every task.started / task.artifact / task.finished SSE frame so consumers can distinguish caller-vouched pinned DIDs from self-reported observed DIDs and apply their own trust policies per frame. - Upfront AgentCard discovery at /plan start via Promise.allSettled with a 2s TOTAL budget shared across the catalog (not 2s per peer). Results populate the observedByName map and hit the per-process fetchAgentCard cache, so the Bindu client's verification path gets them for free on subsequent calls. openapi.yaml: new shared AgentDIDSource schema with trust-policy docs referenced from all three task.* events. Each affected event's required fields list gains `agent_did_source`. Coverage: tests/api/find-agent-did.test.ts β€” 6 cases pinning precedence (pinned wins even when observed exists), per-agent independence, graceful null for agent names outside the catalog (load_recipe passes through this helper too). BUGS_AND_KNOWN_ISSUES.md: entry marked RESOLVED with cross-refs. Typecheck clean, 216/216 tests pass (+6 find-agent-did), redocly lint 0 errors. Co-Authored-By: Claude Opus 4.7 (1M context) --- gateway/docs/BUGS_AND_KNOWN_ISSUES.md | 53 +++++++++---- gateway/openapi.yaml | 50 +++++++++++- gateway/src/api/plan-route.ts | 92 ++++++++++++++++++---- gateway/tests/api/find-agent-did.test.ts | 98 ++++++++++++++++++++++++ 4 files changed, 256 insertions(+), 37 deletions(-) create mode 100644 gateway/tests/api/find-agent-did.test.ts diff --git a/gateway/docs/BUGS_AND_KNOWN_ISSUES.md b/gateway/docs/BUGS_AND_KNOWN_ISSUES.md index c38482a6..84de45f8 100644 --- a/gateway/docs/BUGS_AND_KNOWN_ISSUES.md +++ b/gateway/docs/BUGS_AND_KNOWN_ISSUES.md @@ -91,22 +91,43 @@ surfaced in the SSE stream yet. Separate ticket; tracked below as --- -### 🟠 SSE `agent_did` doesn't surface observed DIDs - -**Where:** [`src/api/plan-route.ts:314-316`](../src/api/plan-route.ts:314) β€” `findPinnedDID` only reads `trust.pinnedDID` from the request catalog. - -Consequence: even after `fetchAgentCard` populates `peer.card` inside the -Bindu client (see the resolved AgentCard-fallback entry above), the SSE -stream still emits `agent_did: null` unless the caller pinned a DID up -front. Signature verification works against the observed DID; the -display layer doesn't know about it. - -**Fix:** rename to `findAgentDID(request, agentName)`, add an observed-DID -path (reads a per-plan cache populated after the first successful call -publishes a Bus event with the observed DID), and add an optional -`agent_did_source: "pinned" | "observed" | null` field on SSE frames so -consumers can tell the provenance. Detailed in the earlier design -discussion; ~80 LOC including tests. +### βœ… RESOLVED β€” SSE `agent_did` now surfaces observed DIDs with provenance + +**Was at:** [`src/api/plan-route.ts:314-316`](../src/api/plan-route.ts:314) β€” the old `findPinnedDID` only read `trust.pinnedDID` from the catalog. + +**Fixed by:** `findPinnedDID` replaced with +[`findAgentDID(request, observedByName, agentName)`](../src/api/plan-route.ts) which returns an `{did, source}` +pair. Resolution precedence: + +``` +pinned trust.pinnedDID from /plan catalog strongest +observed peer.card DID from /.well-known/agent.json weaker +null neither path resolved no claim +``` + +Every `task.started` / `task.artifact` / `task.finished` SSE frame now +carries both `agent_did` and a new `agent_did_source` field with +values `"pinned" | "observed" | null` so consumers can apply the +right trust policy per frame. + +The observed map is populated upfront at `/plan` start: +`Promise.allSettled` over every catalog entry's +[`fetchAgentCard()`](../src/bindu/client/agent-card.ts) with a +**2-second total budget** shared across peers via a single +AbortController β€” even a fleet of 50 agents caps discovery at 2s +total, not 2s per peer. Results hit the per-process AgentCard cache +so the Bindu client's own verification path gets them for free on +subsequent calls. + +Coverage: [`tests/api/find-agent-did.test.ts`](../tests/api/find-agent-did.test.ts) +β€” 6 cases pinning precedence (pinned wins over observed), per-agent +independence, graceful null on unknown agent names (e.g., +`load_recipe` passes through the helper too), and map-vs-catalog +isolation. + +openapi: new shared `AgentDIDSource` schema referenced from all three +task.* SSE frames, with explicit docs on how consumers should treat +each value for trust decisions. --- diff --git a/gateway/openapi.yaml b/gateway/openapi.yaml index 0a402862..b30e7991 100644 --- a/gateway/openapi.yaml +++ b/gateway/openapi.yaml @@ -937,7 +937,7 @@ components: SSEEvent_TaskStarted: type: object - required: [task_id, agent, agent_did, skill, input] + required: [task_id, agent, agent_did, agent_did_source, skill, input] properties: task_id: type: string @@ -947,7 +947,15 @@ components: description: Display name of the peer agent (from `agents[].name`). agent_did: type: [string, "null"] - description: Pinned DID for the agent (from `agents[].trust.pinnedDID`), or null if not pinned. + description: | + The peer's DID, resolved with precedence pinned β†’ observed β†’ null: + (a) `trust.pinnedDID` from the /plan catalog if set; otherwise + (b) the DID the peer published at `/.well-known/agent.json`, + fetched by the gateway at plan-open time; otherwise + (c) `null` β€” cryptographic identity undeclared. + See `agent_did_source` for which path resolved it. + agent_did_source: + $ref: "#/components/schemas/AgentDIDSource" skill: type: string description: Skill id being invoked on the peer. @@ -961,7 +969,7 @@ components: SSEEvent_TaskArtifact: type: object - required: [task_id, agent, agent_did, content] + required: [task_id, agent, agent_did, agent_did_source, content] properties: task_id: type: string @@ -969,6 +977,9 @@ components: type: string agent_did: type: [string, "null"] + description: Same resolution rules as on `task.started` β€” pinned β†’ observed β†’ null. + agent_did_source: + $ref: "#/components/schemas/AgentDIDSource" content: type: string description: | @@ -1005,7 +1016,7 @@ components: SSEEvent_TaskFinished: type: object - required: [task_id, agent, agent_did, state] + required: [task_id, agent, agent_did, agent_did_source, state] properties: task_id: type: string @@ -1013,6 +1024,9 @@ components: type: string agent_did: type: [string, "null"] + description: Same resolution rules as on `task.started` β€” pinned β†’ observed β†’ null. + agent_did_source: + $ref: "#/components/schemas/AgentDIDSource" state: type: string enum: [completed, failed] @@ -1064,6 +1078,34 @@ components: description: Empty object. Last frame of every successful plan. additionalProperties: false + AgentDIDSource: + type: [string, "null"] + enum: [pinned, observed, null] + description: | + Provenance of the `agent_did` on the same SSE frame. Tells + consumers which of three paths resolved the DID, so they can + apply the right trust policy: + + - `"pinned"` β€” the caller declared `trust.pinnedDID` in the + /plan catalog. The caller vouched for this identity; the + gateway enforces it when `verifyDID: true` is also set. + Strongest claim a consumer can get out of this field. + - `"observed"` β€” the peer self-reported this DID in its + `/.well-known/agent.json` AgentCard, fetched by the gateway + at plan-open time. Weaker than pinned: an impostor standing + up a fake endpoint can advertise any DID they choose unless + signature verification is on. + - `null` β€” neither path resolved. Either the caller didn't pin + AND the AgentCard couldn't be fetched (no `.well-known`, + network failure, malformed), or the AgentCard had no DID in + either `id` or `capabilities.extensions[].uri`. + + Compliance-gated consumers should treat `"observed"` and + `null` identically unless they also see a `signatures.ok:true` + with `signed > 0` on the same or a following frame β€” that's + the cryptographic evidence that promotes an observed DID to a + verified one. + PlanSignatures: type: [object, "null"] description: | diff --git a/gateway/src/api/plan-route.ts b/gateway/src/api/plan-route.ts index a1cc61f8..9b26368b 100644 --- a/gateway/src/api/plan-route.ts +++ b/gateway/src/api/plan-route.ts @@ -12,6 +12,8 @@ import { import { Service as BusService, type Interface as BusInterface } from "../bus" import { Service as ConfigService, type Config } from "../config" import { PromptEvent } from "../session/prompt" +import { fetchAgentCard } from "../bindu/client/agent-card" +import { getPeerDID } from "../bindu/protocol/identity" import type { z } from "zod" /** @@ -92,6 +94,35 @@ async function handleRequest( ) } + // 2b. Pre-fetch each peer's AgentCard in parallel (total ≀2s budget) + // so we can surface observed DIDs in SSE even when the caller + // didn't pin one. Results are cached in fetchAgentCard's + // per-process Map β€” the Bindu client's downstream runCall will + // hit the same cache for free. Failures don't block: individual + // peer AgentCards default to "not observed", `agent_did` stays + // null for that peer. + const observedByName = new Map() + { + const discoveryBudget = 2000 + const ac = new AbortController() + const timer = setTimeout(() => ac.abort(), discoveryBudget) + try { + await Promise.allSettled( + request.agents.map(async (ag) => { + const card = await fetchAgentCard(ag.endpoint, { + signal: ac.signal, + timeoutMs: discoveryBudget, + }) + if (!card) return + const did = getPeerDID(card) + if (did) observedByName.set(ag.name, did) + }), + ) + } finally { + clearTimeout(timer) + } + } + // 3. Resolve session BEFORE opening SSE β€” required so subscribers can // filter events by sessionID. Any failure here returns plain JSON. let sessionCtx: SessionContext @@ -150,17 +181,18 @@ async function handleRequest( spawnReader(ac.signal, ownEvent(bus.subscribe(PromptEvent.ToolCallStart)), async (evt) => { const agentName = parseAgentFromTool(evt.properties.tool) + // Resolve the peer's DID with provenance. Pinned wins over + // observed (the caller vouched; observed is self-reported). + // Emitted on every task.* frame so SSE consumers can partition + // by `agent_did_source` if they only trust pinned claims. + const agentId = findAgentDID(request, observedByName, agentName) await stream.writeSSE({ event: "task.started", data: JSON.stringify({ task_id: evt.properties.callID, agent: agentName, - // Unique identifier for the peer. Agent names are - // operator-chosen and can collide across catalogs; DIDs - // are the stable cryptographic handle. Null when the - // request didn't pin a DID for this peer (auth.type="none" - // / "bearer" / "bearer_env" without trust.pinnedDID). - agent_did: findPinnedDID(request, agentName), + agent_did: agentId.did, + agent_did_source: agentId.source, skill: parseSkillFromTool(evt.properties.tool), input: evt.properties.input, }), @@ -178,12 +210,14 @@ async function handleRequest( evt.properties.signatures !== undefined ? { signatures: evt.properties.signatures } : {} + const agentId = findAgentDID(request, observedByName, agentName) await stream.writeSSE({ event: "task.artifact", data: JSON.stringify({ task_id: evt.properties.callID, agent: agentName, - agent_did: findPinnedDID(request, agentName), + agent_did: agentId.did, + agent_did_source: agentId.source, content: evt.properties.output, title: evt.properties.title, ...sigField, @@ -194,7 +228,8 @@ async function handleRequest( data: JSON.stringify({ task_id: evt.properties.callID, agent: agentName, - agent_did: findPinnedDID(request, agentName), + agent_did: agentId.did, + agent_did_source: agentId.source, state: evt.properties.error ? "failed" : "completed", ...(evt.properties.error ? { error: evt.properties.error } : {}), ...sigField, @@ -312,17 +347,40 @@ function parseSkillFromTool(toolId: string): string { return m?.[2] ?? "" } +/** Shape returned by findAgentDID β€” keeps DID + provenance together + * so every SSE frame can emit both without re-running the lookup. */ +export interface AgentDIDResolution { + readonly did: string | null + readonly source: "pinned" | "observed" | null +} + /** - * Resolve a peer's DID from the /plan request's agent catalog. + * Resolve a peer's DID with provenance, in precedence order: + * + * 1. ``trust.pinnedDID`` from the /plan request catalog β€” the caller + * explicitly declared which DID they expect. Strongest claim. + * 2. Observed DID from the peer's AgentCard (fetched upfront during + * /plan setup, keyed by agent name) β€” the peer self-reports this + * identity at /.well-known/agent.json. Weaker: an impostor can + * advertise any DID they like unless signature verification is on. + * 3. ``null`` β€” neither path resolved. Consumer can still identify + * the peer by name for display; cryptographic identity is + * unknown. * - * DIDs are optional in the API (callers can talk to ``auth.type="none"`` - * / ``"bearer"`` peers without ever pinning a DID), so this returns - * ``null`` when the catalog has no ``trust.pinnedDID`` for the named - * peer. SSE consumers treat ``null`` as "no cryptographic identity - * declared" β€” they can still identify the peer by name for display, - * just without the stable unique handle. + * The ``source`` field lets SSE consumers decide which guarantee they + * need. A consumer building an audit log of "calls made to peer X" + * might accept ``observed`` (human-readable correlation); a compliance + * gate might reject anything other than ``pinned``. */ -function findPinnedDID(request: PlanRequest, agentName: string): string | null { +export function findAgentDID( + request: PlanRequest, + observedByName: Map, + agentName: string, +): AgentDIDResolution { const entry = request.agents.find((a) => a.name === agentName) - return entry?.trust?.pinnedDID ?? null + const pinned = entry?.trust?.pinnedDID + if (pinned) return { did: pinned, source: "pinned" } + const observed = observedByName.get(agentName) + if (observed) return { did: observed, source: "observed" } + return { did: null, source: null } } diff --git a/gateway/tests/api/find-agent-did.test.ts b/gateway/tests/api/find-agent-did.test.ts new file mode 100644 index 00000000..68eaaef1 --- /dev/null +++ b/gateway/tests/api/find-agent-did.test.ts @@ -0,0 +1,98 @@ +import { describe, it, expect } from "vitest" +import { findAgentDID } from "../../src/api/plan-route" +import type { PlanRequest } from "../../src/planner" + +/** + * Precedence test for `findAgentDID` β€” the resolver that decides which + * DID (and which provenance label) lands on each task.* SSE frame. + * + * Covers the fix for the "SSE agent_did doesn't surface observed DIDs" + * bug (BUGS_AND_KNOWN_ISSUES.md). Before: the helper only read + * `trust.pinnedDID`, so any caller that didn't pin saw `agent_did: + * null` even after fetchAgentCard had observed the peer's real DID. + * After: observed DIDs are the fallback, with `source` distinguishing + * the two. + */ + +const DID_PINNED = "did:bindu:ops_at_example_com:research:pinned-11111111" +const DID_OBSERVED = "did:bindu:ops_at_example_com:research:observed-22222222" + +const mkRequest = (pinnedFor: Record = {}): PlanRequest => + ({ + question: "test", + agents: Object.entries(pinnedFor).map(([name, pinned]) => ({ + name, + endpoint: `http://${name}.local`, + skills: [], + ...(pinned ? { trust: { pinnedDID: pinned } } : {}), + })), + }) as PlanRequest + +describe("findAgentDID β€” DID precedence and provenance", () => { + it("returns {did: pinned, source: 'pinned'} when pinnedDID is set, even if observed also exists", () => { + const req = mkRequest({ research: DID_PINNED }) + const observed = new Map([["research", DID_OBSERVED]]) + + expect(findAgentDID(req, observed, "research")).toEqual({ + did: DID_PINNED, + source: "pinned", + }) + }) + + it("returns {did: observed, source: 'observed'} when pinnedDID is absent but observed exists", () => { + const req = mkRequest({ research: undefined }) + const observed = new Map([["research", DID_OBSERVED]]) + + expect(findAgentDID(req, observed, "research")).toEqual({ + did: DID_OBSERVED, + source: "observed", + }) + }) + + it("returns {did: null, source: null} when neither path resolves", () => { + const req = mkRequest({ research: undefined }) + const observed = new Map() + + expect(findAgentDID(req, observed, "research")).toEqual({ + did: null, + source: null, + }) + }) + + it("returns {did: null, source: null} when the agent name isn't in the catalog at all", () => { + // The planner's `load_recipe` tool passes through this helper with + // agentName = "load_recipe" β€” never in the catalog, should cleanly + // resolve to null/null rather than throwing. + const req = mkRequest({ research: DID_PINNED }) + const observed = new Map([["research", DID_OBSERVED]]) + + expect(findAgentDID(req, observed, "load_recipe")).toEqual({ + did: null, + source: null, + }) + }) + + it("pinned precedence is per-agent β€” one agent can be pinned while another is observed", () => { + const req = mkRequest({ research: DID_PINNED, math: undefined }) + const observed = new Map([ + ["research", DID_OBSERVED], + ["math", "did:bindu:ops_at_example_com:math:observed-3333"], + ]) + + const r = findAgentDID(req, observed, "research") + const m = findAgentDID(req, observed, "math") + + expect(r.source).toBe("pinned") + expect(r.did).toBe(DID_PINNED) + expect(m.source).toBe("observed") + expect(m.did).toBe("did:bindu:ops_at_example_com:math:observed-3333") + }) + + it("observed-DID map is treated as authoritative for observed claims β€” does not cross-contaminate across names", () => { + const req = mkRequest({ "a": undefined, "b": undefined }) + const observed = new Map([["a", DID_OBSERVED]]) + + expect(findAgentDID(req, observed, "a").did).toBe(DID_OBSERVED) + expect(findAgentDID(req, observed, "b").did).toBeNull() + }) +}) From aaeb4318b8984cd8d5e35c8098b74b0c3c3ae0d5 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Raahul=20Dutta=20-=20=E0=A6=B0=E0=A6=BE=E0=A6=B9=E0=A7=81?= =?UTF-8?q?=E0=A6=B2=20=F0=9F=96=96?= Date: Mon, 20 Apr 2026 17:00:53 +0200 Subject: [PATCH 6/6] =?UTF-8?q?docs:=20consolidate=20=E2=80=94=20bugs/know?= =?UTF-8?q?n-issues.md=20absorbs=20gateway=20issues,=20GATEWAY.md=20moves?= =?UTF-8?q?=20to=20repo-level=20docs/?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Matches the repo's existing doc layout: - /docs/ has topic-based guides per subsystem (AUTHENTICATION.md, PAYMENT.md, SCHEDULER.md, etc.) - /bugs/known-issues.md is the single user-facing known-issues registry; the file's own rules say entries are REMOVED on fix and new ones added in-place. This commit aligns gateway docs with that layout: /gateway/docs/STORY.md β†’ /docs/GATEWAY.md /gateway/docs/BUGS_AND_...md β†’ merged into /bugs/known-issues.md /gateway/docs/ β†’ deleted docs/GATEWAY.md: the end-to-end walkthrough. 8 internal links rewritten for the new location (one level up from where it was): ../openapi.yaml β†’ ../gateway/openapi.yaml ../README.md β†’ ../gateway/README.md ../agents/... β†’ ../gateway/agents/... ../recipes/... β†’ ../gateway/recipes/... ../../examples/... β†’ ../examples/... Link check across the four affected docs (docs/GATEWAY.md, gateway/README.md, fleet/README.md, bugs/known-issues.md) β€” all resolve. bugs/known-issues.md absorbs the unresolved work the gateway-local file was tracking. Net: - 15 new gateway entries (5 medium, 6 low, 5 nit) with file paths, symptoms, workarounds β€” all the items surfaced during the recipes/SSE/DID work that aren't yet fixed. - 2 existing entries narrowed: `tool-name-collisions-silent` was partially fixed in this session (collision rejection landed at plan-open time) and is now `parse-agent-from-tool-greedy-mismatch` tracking only the residual regex issue. `signature-verification-ok-when-unsigned` narrowed to `signature-verification-non-text-parts-unverified` β€” the envelope ambiguity was fixed by the four-valued `verified` label, `data`/`file` parts are still unverified. - Quick-index table + TOC counts updated accordingly. - Header's `_Last updated_` line reflects the 2026-04-20 session. gateway/README.md + examples/gateway_test_fleet/README.md: 6 total references to `docs/STORY.md` rewritten to point at the new `docs/GATEWAY.md` location. No code changes; 216/216 tests still pass. Co-Authored-By: Claude Opus 4.7 (1M context) --- bugs/known-issues.md | 315 +++++++++++++-- gateway/docs/STORY.md => docs/GATEWAY.md | 14 +- examples/gateway_test_fleet/README.md | 4 +- gateway/README.md | 8 +- gateway/docs/BUGS_AND_KNOWN_ISSUES.md | 495 ----------------------- 5 files changed, 297 insertions(+), 539 deletions(-) rename gateway/docs/STORY.md => docs/GATEWAY.md (98%) delete mode 100644 gateway/docs/BUGS_AND_KNOWN_ISSUES.md diff --git a/bugs/known-issues.md b/bugs/known-issues.md index f19b2015..799f8514 100644 --- a/bugs/known-issues.md +++ b/bugs/known-issues.md @@ -1,6 +1,6 @@ # Known Issues -_Last updated: 2026-04-19 β€” restructured into scannable tables and story-format entries for the high-severity items. `did-document-endpoint-returns-raw-dict` removed β€” see [`core/2026-04-19-did-document-endpoint-raw-dict.md`](./core/2026-04-19-did-document-endpoint-raw-dict.md)._ +_Last updated: 2026-04-20 β€” recipes/SSE/DID work on branch `feat/gateway-recipes` added 15 gateway entries (5 medium, 6 low, 5 nit) and narrowed two existing ones (`tool-name-collisions-silent` β†’ `parse-agent-from-tool-greedy-mismatch`, `signature-verification-ok-when-unsigned` β†’ `signature-verification-non-text-parts-unverified`) to the residual sub-bugs after partial resolutions landed. `did-document-endpoint-returns-raw-dict` removed β€” see [`core/2026-04-19-did-document-endpoint-raw-dict.md`](./core/2026-04-19-did-document-endpoint-raw-dict.md)._ This file is user-facing. It's the first thing a new contributor or operator reads to calibrate expectations: what Bindu doesn't do @@ -41,7 +41,7 @@ Issue referencing the slug (e.g. "Fixes `context-window-hardcoded`"). | Subsystem | High | Medium | Low | Nit | |---|---:|---:|---:|---:| -| [Gateway](#gateway) | 3 | 11 | 13 | 4 | +| [Gateway](#gateway) | 3 | 15 | 19 | 9 | | [Bindu Core (Python)](#bindu-core-python) | 4 | 7 | 2 | 0 | | [SDKs (TypeScript)](#sdks-typescript) | β€” | β€” | β€” | β€” | | [Frontend](#frontend) | β€” | β€” | β€” | β€” | @@ -59,9 +59,13 @@ Issue referencing the slug (e.g. "Fixes `context-window-hardcoded`"). | [`no-session-concurrency-guard`](#no-session-concurrency-guard) | high | Two `/plan` calls on the same session tangle their histories | | [`abort-signal-not-propagated-to-bindu-client`](#abort-signal-not-propagated-to-bindu-client) | medium | Client disconnect doesn't cancel in-flight peer polls | | [`permission-rules-not-enforced-for-tool-calls`](#permission-rules-not-enforced-for-tool-calls) | medium | Permission service exists but is never called for tool calls | -| [`tool-name-collisions-silent`](#tool-name-collisions-silent) | medium | `(agent, skill)` pairs can collide; second registration wins | +| [`parse-agent-from-tool-greedy-mismatch`](#parse-agent-from-tool-greedy-mismatch) | medium | SSE `agent` field wrong when agent names contain underscores | | [`agent-catalog-overwrite`](#agent-catalog-overwrite) | medium | Each `/plan` wholesale-overwrites the session's agent catalog | -| [`signature-verification-ok-when-unsigned`](#signature-verification-ok-when-unsigned) | medium (sec) | Unsigned peer responses indistinguishable from stripped signatures | +| [`signature-verification-non-text-parts-unverified`](#signature-verification-non-text-parts-unverified) | medium (sec) | `DataPart` / `FilePart` bypass signature verification | +| [`pinned-did-no-format-validation`](#pinned-did-no-format-validation) | medium | Any string is accepted as `pinnedDID`; no DID shape check | +| [`recipe-permission-ask-treated-as-allow`](#recipe-permission-ask-treated-as-allow) | medium | `permission: recipe: { "x": "ask" }` silently allows | +| [`ctx-ask-not-wired-for-recipes`](#ctx-ask-not-wired-for-recipes) | medium | Recipe permission hook is a no-op at runtime | +| [`supabase-error-surfaces-as-generic-sse-error`](#supabase-error-surfaces-as-generic-sse-error) | medium | Mid-stream DB failures emit untyped `event: error` | | [`did-resolver-no-key-id-selection`](#did-resolver-no-key-id-selection) | medium (sec) | Resolver picks the first public key; breaks during rotation | | [`list-messages-pagination-silent`](#list-messages-pagination-silent) | medium | Sessions >1000 messages silently truncate oldest | | [`tool-input-sent-as-textpart`](#tool-input-sent-as-textpart) | medium | Skills expecting `DataPart` receive `TextPart` with stringified JSON | @@ -86,10 +90,21 @@ Issue referencing the slug (e.g. "Fixes `context-window-hardcoded`"). | [`revert-doesnt-cancel-remote-tasks`](#revert-doesnt-cancel-remote-tasks) | low | `revertTo` doesn't cancel still-running peer tasks | | [`empty-agents-catalog-no-400`](#empty-agents-catalog-no-400) | low | `/plan` with empty `agents[]` runs and returns an LLM error | | [`no-migration-rollback`](#no-migration-rollback) | low | Migrations forward-only β€” no paired `down.sql` | +| [`recipe-no-hot-reload`](#recipe-no-hot-reload) | low | Recipe changes require a gateway restart | +| [`load-recipe-sse-frames-ambiguous`](#load-recipe-sse-frames-ambiguous) | low | `load_recipe` and peer calls use the same SSE `agent` field | +| [`recipe-file-enumeration-no-truncation-signal`](#recipe-file-enumeration-no-truncation-signal) | low | 10+ files per recipe silently truncated in tool output | +| [`faq-agent-did-name-mismatch`](#faq-agent-did-name-mismatch) | low | Fleet's `faq_agent.py` registers as `bindu_docs_agent` | +| [`fleet-env-can-desync-after-seed-rotation`](#fleet-env-can-desync-after-seed-rotation) | low | Stale `$*_DID` env vars survive agent seed rotation | +| [`provider-openrouter-hardcoded`](#provider-openrouter-hardcoded) | low | `model` field assumes OpenRouter prefix | | [`tasks-recorded-is-dead-state`](#tasks-recorded-is-dead-state) | nit | Unused `tasksRecorded` field in the planner | | [`map-finish-reason-pointless-ternary`](#map-finish-reason-pointless-ternary) | nit | Conditional type that evaluates to `any` either way | | [`db-effect-promise-swallows-errors`](#db-effect-promise-swallows-errors) | nit | `Effect.promise` silently swallows rejected promises | | [`test-coverage-gaps`](#test-coverage-gaps) | nit | Backlog list of missing test scenarios | +| [`accept-header-not-enforced-on-plan`](#accept-header-not-enforced-on-plan) | nit | `Accept: text/event-stream` not required | +| [`openapi-sse-schemas-unused`](#openapi-sse-schemas-unused) | nit | 13 redocly `no-unused-components` warnings on SSE schemas | +| [`no-planner-integration-test`](#no-planner-integration-test) | nit | Request β†’ SSE end-to-end test missing | +| [`no-health-handler-integration-test`](#no-health-handler-integration-test) | nit | `/health` only has pure-helper unit tests | +| [`story-chapter5-missing-oauth-scope-explanation`](#story-chapter5-missing-oauth-scope-explanation) | nit | DID chapter assumes reader knows OAuth scopes | ### High @@ -218,23 +233,22 @@ caller sends in the `/plan` request β€” only include agents the caller is allowed to invoke. **Tracking:** _(no issue yet)_ -### tool-name-collisions-silent +### parse-agent-from-tool-greedy-mismatch **Severity:** medium -**Summary:** `normalizeToolName` in +**Summary:** The collision half of this bug is now +**rejected** at plan-open time β€” `findDuplicateToolIds` in [`gateway/src/planner/index.ts`](../gateway/src/planner/index.ts) -replaces non-alphanumeric characters with `_` and truncates to 80 -chars. Distinct `(agent, skill)` pairs can map to the same tool id: -e.g. agent `research-v2` + skill `x` and agent `research_v2` + skill -`x` both become `call_research_v2_x`. The second registration -silently overwrites the first. A companion bug: `parseAgentFromTool` -uses a non-greedy regex `^call_(.+?)_(.+)$`, so an agent whose name -contains an underscore parses as `agent=first-segment`, -`skill=everything-else` β€” wrong for the SSE event the handler emits. -**Workaround:** Use globally-unique agent names in the catalog; -avoid both hyphens and underscores in the same naming scheme; avoid -underscores in agent names entirely if you rely on the `task.started` -SSE agent field being accurate. +returns a 400 when two `(agent, skill)` pairs normalize to the +same tool id. What remains: `parseAgentFromTool` in +[`gateway/src/api/plan-route.ts`](../gateway/src/api/plan-route.ts) +uses a non-greedy regex `^call_(.+?)_(.+)$`, so an agent whose +name contains an underscore (e.g. `research_v2`) emits SSE +`task.started` events with `agent=research`, `skill=v2_x` instead +of the intended `agent=research_v2`, `skill=x`. +**Workaround:** Avoid underscores in agent names in the catalog +you send on `/plan`. If you must, rely on `agent_did` / `task_id` +for correlation, not the SSE `agent` field. **Tracking:** _(no issue yet)_ ### agent-catalog-overwrite @@ -252,22 +266,21 @@ references its prior tool calls. Also exposed to a concurrency race even if individual agents are temporarily unavailable. **Tracking:** _(no issue yet)_ -### signature-verification-ok-when-unsigned +### signature-verification-non-text-parts-unverified **Severity:** medium (security-adjacent) -**Summary:** `verifyArtifact` in +**Summary:** The envelope-ambiguity half of this bug is **fixed** +β€” `` is now four-valued +(`yes | no | unsigned | unknown`), so the planner LLM can tell a +real cryptographic pass from "nothing was signed". What remains: +`verifyArtifact` in [`gateway/src/bindu/identity/verify.ts`](../gateway/src/bindu/identity/verify.ts) -returns `{ ok: true, signed: 0 }` when a peer sends text parts with -no signatures at all. An attacker stripping signatures from a -compromised peer cannot be distinguished from an unsigned peer. Also, -`file` and `data` parts are never verified at all regardless of -signing β€” a peer that moves payload into a `DataPart` bypasses -signature checks entirely. -**Workaround:** For trust-sensitive peers, set `trust.verifyDID: -true` AND explicitly check `outcome.signatures.signed > 0` in your -own code before trusting the response. Refuse peers that return -`data` or `file` parts for responses that must be verified. The -gateway does not enforce either today. +only inspects text parts. `file` and `data` parts are never +verified, so a peer moving payload into a `DataPart` bypasses +signature checks entirely even when `verifyDID: true` is set. +**Workaround:** Refuse peers that return `data` or `file` parts +for responses that must be verified. The gateway does not enforce +this constraint today. **Tracking:** _(no issue yet)_ ### did-resolver-no-key-id-selection @@ -679,6 +692,246 @@ Contributors tackling any of the fixable items above should add a test for it in the same PR. **Tracking:** _(no issue yet)_ + + + + +### pinned-did-no-format-validation + +**Severity:** medium +**Summary:** `PeerAuthRequest.trust.pinnedDID` in +[`gateway/src/planner/index.ts`](../gateway/src/planner/index.ts) +is `z.string().optional()` with no shape check. A caller can send +`pinnedDID: "hello"` or accidentally an un-interpolated template +literal like `"${RESEARCH_DID}"`, and the gateway will echo that +string in every SSE `agent_did` frame and try (vainly) to resolve +it for signature verification. Observed in practice when a Postman +user copy-pasted a bash-style variable reference into the request +body. +**Workaround:** Validate DID shape client-side before sending. At +minimum ensure the string starts with `did:bindu:` or `did:key:`. +**Tracking:** _(no issue yet)_ + +### recipe-permission-ask-treated-as-allow + +**Severity:** medium +**Summary:** +[`Recipe.available`](../gateway/src/recipe/index.ts) filters the +recipe list shown to an agent by excluding anything whose +`permission.recipe` resolves to `deny`. The evaluator is +three-valued (`allow | deny | ask`) β€” `ask` falls through and the +recipe is shown AND loadable. With the `ctx.ask` hook unwired (see +next entry), `"ask"` is silently identical to `"allow"`. +**Workaround:** Treat `"ask"` as if it were `"allow"` for recipes +today. If you need to restrict, use `"deny"`. +**Tracking:** _(no issue yet)_ + +### ctx-ask-not-wired-for-recipes + +**Severity:** medium +**Summary:** [`tool/recipe.ts`](../gateway/src/tool/recipe.ts) +calls `ctx.ask({permission: "recipe", target: name})` before +loading a recipe body, guarded by `if (ctx.ask)`. The `ctx.ask` +field is marked optional in `ToolContext` and +[`wrapTool`](../gateway/src/session/prompt.ts) never sets it. So +the gate is a permanent no-op β€” recipes load unconditionally +regardless of agent permission config. This is a separate concern +from `permission-rules-not-enforced-for-tool-calls` which targets +the broader tool-call gate. +**Workaround:** Don't rely on `permission.recipe` in agent configs +for production access control. Wait for Phase-2 permission UI. +**Tracking:** _(no issue yet)_ + +### supabase-error-surfaces-as-generic-sse-error + +**Severity:** medium +**Summary:** +[`plan-route.ts`](../gateway/src/api/plan-route.ts) wraps +`runPlan` in a catch that emits `event: error` with only +`{message: string}`. Callers can't programmatically distinguish a +Supabase outage from a peer crash from an LLM failure β€” all three +look identical on the wire. Operators debugging prod incidents +have to grep logs. +**Workaround:** Tail gateway logs out of band. Client-side, treat +any `event: error` as "retry with exponential backoff" regardless +of cause. +**Tracking:** _(no issue yet)_ + +### recipe-no-hot-reload + +**Severity:** low +**Summary:** +[`Recipe.layer`](../gateway/src/recipe/index.ts) reads +`gateway/recipes/` once at boot. Adding, editing, or deleting a +recipe markdown file has no effect until the gateway process +restarts. Authoring loop is Ctrl-C + `npm run dev` per change. +**Workaround:** Script `npm run dev` to auto-restart on +`recipes/**/*.md` changes using `nodemon` or `tsx watch` with a +wider include. Or edit + restart; the recipe layer init is cheap. +**Tracking:** _(no issue yet)_ + +### load-recipe-sse-frames-ambiguous + +**Severity:** low +**Summary:** When the planner calls the internal `load_recipe` +tool, the SSE stream emits `task.started` / `task.artifact` / +`task.finished` frames with `agent: "load_recipe"` and +`agent_did: null`. Consumers parsing `task.*` frames by peer +correlation have no crisp way to tell this apart from a peer call +that happens to have a null DID. The `agent_did_source: null` +field helps (peer calls from unpinned observed-failed peers also +have it null) but doesn't cleanly partition. +**Fix:** add an explicit `tool_kind: "peer" | "local"` field on +the task.* SSE frames. +**Workaround:** Filter on `agent === "load_recipe"` client-side. +**Tracking:** _(no issue yet)_ + +### recipe-file-enumeration-no-truncation-signal + +**Severity:** low +**Summary:** The +[`load_recipe`](../gateway/src/tool/recipe.ts) tool returns a +`` block listing sibling files inside a bundled +recipe's directory, capped at 10 entries. If a recipe directory +has more than 10 files, the excess silently disappears from the +list β€” the planner has no way to know it's seeing a sample, not +the full set. +**Fix:** include an explicit `truncated: true` marker in the tool +result's metadata, and emit a `console.warn` at boot when a recipe +directory exceeds the cap. +**Workaround:** Keep bundled recipe directories under 10 files. +**Tracking:** _(no issue yet)_ + +### faq-agent-did-name-mismatch + +**Severity:** low +**Summary:** The fleet demo's +[`examples/gateway_test_fleet/faq_agent.py`](../examples/gateway_test_fleet/faq_agent.py) +registers its DID as `bindu_docs_agent` but the filename and the +operator-facing port labels (3778) say `faq_agent`. First-time +readers following `docs/GATEWAY.md` Chapter 3 see the mismatch in +the SSE `agent_did` strings vs the catalog `agent` field and +wonder if something is wrong. Python-side concern, no gateway code +involved. +**Workaround:** The mismatch is cosmetic β€” signature verification +and routing both work. Ignore or pin `faq_agent` DID explicitly in +the catalog. +**Tracking:** _(no issue yet)_ + +### fleet-env-can-desync-after-seed-rotation + +**Severity:** low +**Summary:** +[`examples/gateway_test_fleet/start_fleet.sh`](../examples/gateway_test_fleet/start_fleet.sh) +writes fresh DIDs to `.fleet.env` on every run. But if an agent's +seed rotates (`rm -rf ~/.bindu`, restart) and a user has an old +shell with `$RESEARCH_DID` still sourced from a prior `.fleet.env`, +their next `/plan` pins a stale DID. Signature verification fails +with a cryptic mismatch error. +**Fix:** have `start_fleet.sh` print a "re-source `.fleet.env` if +you had one loaded previously" hint whenever any agent's DID +differs from the previous run's cache. +**Workaround:** Always `source .fleet.env` fresh after any fleet +restart. +**Tracking:** _(no issue yet)_ + +### provider-openrouter-hardcoded + +**Severity:** low +**Summary:** +[`gateway/src/provider/index.ts`](../gateway/src/provider/index.ts) +and downstream code assume the `openrouter/` model prefix. Adding +direct Anthropic or direct OpenAI support (without going through +OpenRouter's proxy) is a code change, not config. +**Workaround:** Use OpenRouter as the universal proxy; it supports +every major provider. Only reach for this when you need direct +provider features (e.g. Anthropic prompt caching, OpenAI +fine-tuned model access) that OpenRouter doesn't proxy. +**Tracking:** _(no issue yet)_ + +### accept-header-not-enforced-on-plan + +**Severity:** nit +**Summary:** `POST /plan` returns `text/event-stream` regardless +of whether the client sent `Accept: text/event-stream`. Clients +that forget the header still get SSE, which is convenient but +breaks strict content-negotiation semantics. Not currently +documented as required in +[`openapi.yaml`](../gateway/openapi.yaml) either. +**Fix:** either document the header as required and return 406 +when absent, or keep the permissive behavior and document it. +Currently we're in the worst middle ground. +**Workaround:** None needed; current behavior is operationally +fine. Consumers relying on strict 406 on wrong `Accept` won't get +it. +**Tracking:** _(no issue yet)_ + +### openapi-sse-schemas-unused + +**Severity:** nit +**Summary:** `redocly lint gateway/openapi.yaml` reports 13 +`no-unused-components` warnings on the `SSEEvent_*` schemas. +OpenAPI 3.1 has no native SSE modeling, so those schemas sit as +reference material rather than being `$ref`'d from a response +body. The warnings are expected given the format's limitations, +not indicative of drift. +**Fix options:** (a) accept β€” pragmatic, 0 errors, just warnings; +(b) use `oneOf` inside the `text/event-stream` response schema to +enumerate each event shape (stretches OpenAPI); (c) publish a +separate AsyncAPI 2.x/3.x spec for the SSE surface. +**Workaround:** None needed; warnings don't break consumers. +**Tracking:** _(no issue yet)_ + +### no-planner-integration-test + +**Severity:** nit +**Summary:** The unit tests cover every Gateway module in +isolation, but no single test walks a `/plan` request +end-to-end with a mocked LLM provider. The recipes feature, +signatures surfacing, and observed-DID resolution were all +verified manually against the real stack and via targeted unit +tests for their pure helpers. A regression in the cross-cutting +glue (tool β†’ Bus β†’ SSE JSON) would be caught only at integration +time. +**Fix:** mock `Provider.Service` with a fake emitting a canned +`StreamEvent` sequence; assert SSE output matches expectations. +**Workaround:** None β€” treat this as internal backlog. +**Tracking:** _(no issue yet)_ + +### no-health-handler-integration-test + +**Severity:** nit +**Summary:** +[`tests/api/health-route.test.ts`](../gateway/tests/api/health-route.test.ts) +covers only the pure helpers (`splitModelId`, `deriveGatewayId`, +`deriveAuthor`). The handler's full response shape β€” version, +planner-model nesting, runtime flags, uptime math β€” is verified +by manual curl, not by a test that builds the layer graph and +asserts the JSON. Drift between `openapi.yaml`'s `HealthResponse` +schema and the actual response would go unnoticed until someone +hand-checks. +**Fix:** build a minimal layer graph (mock Supabase) in a test, +invoke the handler against a stub Hono context, assert the body +matches the openapi schema. +**Workaround:** Manual curl against a running gateway. +**Tracking:** _(no issue yet)_ + +### story-chapter5-missing-oauth-scope-explanation + +**Severity:** nit +**Summary:** +[`docs/GATEWAY.md`](../docs/GATEWAY.md) Chapter 5 sets +`BINDU_GATEWAY_HYDRA_SCOPE` via env vars but never explains what +OAuth scopes are or why `agent:read` + `agent:write` are the +defaults. A reader walking the story linearly hits the config +step without context. +**Fix:** one-paragraph sidebar in Chapter 5 explaining "scopes +are labels we ask Hydra to stamp on tokens; peers check them +before accepting a `message/send`". +**Workaround:** Cross-ref to gateway/README.md Β§DID signing which +explains it. +**Tracking:** _(no issue yet)_ + --- ## Bindu Core (Python) diff --git a/gateway/docs/STORY.md b/docs/GATEWAY.md similarity index 98% rename from gateway/docs/STORY.md rename to docs/GATEWAY.md index 5f48c5d5..4db05b73 100644 --- a/gateway/docs/STORY.md +++ b/docs/GATEWAY.md @@ -393,7 +393,7 @@ Five agents now, each on its own port: | faq_agent | 3778 | Answers from a canned FAQ | Each is ~60 lines of Python. Open any one β€” say -[joke_agent.py](../../examples/gateway_test_fleet/joke_agent.py) β€” and you'll see +[joke_agent.py](../examples/gateway_test_fleet/joke_agent.py) β€” and you'll see a small configuration that wires a language model (`openai/gpt-4o-mini`) to a few lines of instructions ("tell jokes, refuse other requests"). Narrow scope on purpose so mistakes are visible. @@ -660,7 +660,7 @@ Where recipes shine: payment-required`, surface the payment URL to the user and STOP β€” do not retry" is a policy the planner wouldn't invent on its own. See the seed recipe at - [gateway/recipes/payment-required-flow/RECIPE.md](../recipes/payment-required-flow/RECIPE.md) + [gateway/recipes/payment-required-flow/RECIPE.md](../gateway/recipes/payment-required-flow/RECIPE.md) for a real example. - **Tenant-specific rules.** A recipe visible only to a certain agent can encode rules like "always include a disclaimer" or "always call @@ -929,20 +929,20 @@ skip. ### Reference material -- **[gateway/openapi.yaml](../openapi.yaml)** β€” the machine-readable +- **[gateway/openapi.yaml](../gateway/openapi.yaml)** β€” the machine-readable contract for `/plan`, `/health`, and `/.well-known/did.json`. Paste it into [Swagger UI](https://editor.swagger.io) or [Stoplight](https://stoplight.io) to click through every field, response, and example. This is the source of truth; this document is the prose. -- **[gateway/README.md](../README.md)** β€” the operator's reference: +- **[gateway/README.md](../gateway/README.md)** β€” the operator's reference: configuration knobs, environment variables, the `/health` payload, troubleshooting, and where vendored code came from (OpenCode). Short and targeted β€” most of the narrative moved into this story. -- **[gateway/agents/planner.md](../agents/planner.md)** β€” the planner +- **[gateway/agents/planner.md](../gateway/agents/planner.md)** β€” the planner LLM's system prompt. If the gateway is doing something you don't expect, start here. -- **[gateway/recipes/](../recipes)** β€” the two seed recipes +- **[gateway/recipes/](../gateway/recipes)** β€” the two seed recipes (`multi-agent-research`, `payment-required-flow`) plus whatever you authored in Chapter 4. Each one is a complete example. @@ -988,7 +988,7 @@ If you're moving this past localhost: ### When you're stuck - Gateway won't boot: re-read the env var section of - [gateway/README.md](../README.md). Partial DID or Hydra config fails + [gateway/README.md](../gateway/README.md). Partial DID or Hydra config fails fast with a message naming the missing var. - Planner never calls a tool: the descriptions you gave for `agents[].skills[].description` are probably too short or too vague. diff --git a/examples/gateway_test_fleet/README.md b/examples/gateway_test_fleet/README.md index 5f9a4f83..3428679c 100644 --- a/examples/gateway_test_fleet/README.md +++ b/examples/gateway_test_fleet/README.md @@ -4,7 +4,7 @@ A reproducible multi-agent setup for exercising the Bindu Gateway end-to-end. Fi ## If you're new here -**Don't start with this folder β€” start with [`gateway/docs/STORY.md`](../../gateway/docs/STORY.md).** That's the guided walkthrough; this fleet is what it uses under the hood. By Chapter 3 of STORY.md you'll have all five agents running via `start_fleet.sh` and a gateway driving them. +**Don't start with this folder β€” start with [`docs/GATEWAY.md`](../../docs/GATEWAY.md).** That's the guided walkthrough; this fleet is what it uses under the hood. By Chapter 3 of STORY.md you'll have all five agents running via `start_fleet.sh` and a gateway driving them. ## What's in here @@ -81,7 +81,7 @@ Each case writes its full SSE stream to `logs/.sse`. Open one end-to-end β€” ## Further reading -- [`gateway/docs/STORY.md`](../../gateway/docs/STORY.md) β€” the end-to-end story this fleet illustrates +- [`docs/GATEWAY.md`](../../docs/GATEWAY.md) β€” the end-to-end story this fleet illustrates - [`gateway/openapi.yaml`](../../gateway/openapi.yaml) β€” machine-readable API contract for the gateway - [`gateway/README.md`](../../gateway/README.md) β€” operator reference (env vars, /health, DID signing reference) - [`gateway/recipes/`](../../gateway/recipes/) β€” seed playbooks you can copy-edit as templates diff --git a/gateway/README.md b/gateway/README.md index 7691f473..cfb9492f 100644 --- a/gateway/README.md +++ b/gateway/README.md @@ -10,7 +10,7 @@ A task-first orchestrator that sits between an **external system** and one or mo ## New here? -**Read [`docs/STORY.md`](./docs/STORY.md) first.** It's a 45-minute end-to-end walkthrough that goes from a clean clone to running three chained agents, authoring a recipe, and turning on DID signing. Written for readers with no prior AI-agent knowledge. +**Read [`docs/GATEWAY.md`](../docs/GATEWAY.md) first.** It's a 45-minute end-to-end walkthrough that goes from a clean clone to running three chained agents, authoring a recipe, and turning on DID signing. Written for readers with no prior AI-agent knowledge. This README is the **operator's reference** β€” configuration, troubleshooting, and pointers into source. The narrative lives in STORY.md. @@ -73,7 +73,7 @@ Returns a detailed JSON payload describing the gateway process β€” version, plan } ``` -For a runnable multi-agent walkthrough, see [`docs/STORY.md`](./docs/STORY.md) Β§Chapter 2-3. +For a runnable multi-agent walkthrough, see [`docs/GATEWAY.md`](../docs/GATEWAY.md) Β§Chapter 2-3. --- @@ -125,7 +125,7 @@ Full request/response contract with examples: [`openapi.yaml`](./openapi.yaml). Recipes are markdown playbooks the planner lazy-loads when a task matches. Only metadata (`name` + `description`) sits in the system prompt; the full body is fetched on demand via the `load_recipe` tool. Pattern borrowed from [OpenCode Skills](https://opencode.ai/docs/skills/), renamed to avoid collision with A2A `SkillRequest` (an agent capability on the `/plan` request body). -**Author one in two minutes** β€” see [`docs/STORY.md`](./docs/STORY.md) Β§Chapter 4 for the walkthrough. The reference: +**Author one in two minutes** β€” see [`docs/GATEWAY.md`](../docs/GATEWAY.md) Β§Chapter 4 for the walkthrough. The reference: ### Layouts @@ -175,7 +175,7 @@ Default action is `allow` β€” an agent with no `recipe:` rules sees everything. For peers configured with `auth.type = "did_signed"`, the gateway signs each outbound A2A request with an Ed25519 identity. Peers verify against the gateway's public key (published at `/.well-known/did.json`) and reject mismatches. -**Full walkthrough** β€” [`docs/STORY.md`](./docs/STORY.md) Β§Chapter 5. The reference: +**Full walkthrough** β€” [`docs/GATEWAY.md`](../docs/GATEWAY.md) Β§Chapter 5. The reference: ### Two modes diff --git a/gateway/docs/BUGS_AND_KNOWN_ISSUES.md b/gateway/docs/BUGS_AND_KNOWN_ISSUES.md deleted file mode 100644 index 84de45f8..00000000 --- a/gateway/docs/BUGS_AND_KNOWN_ISSUES.md +++ /dev/null @@ -1,495 +0,0 @@ -# Bugs and Known Issues β€” Bindu Gateway - -Living list of things that are wrong, brittle, or missing. Everything here is -grounded in specific files and line numbers so fixing them is a concrete task, -not a discovery project. - -**Legend:** -- πŸ”΄ **high** β€” behavior is subtly wrong or security-adjacent; fix before - anyone depends on it for production guarantees. -- 🟠 **medium** β€” real gap, not urgent, will bite someone eventually. -- 🟑 **low** β€” DX wart, stale comment, or nice-to-have. - -**Last reviewed:** 2026-04-20, on branch `feat/gateway-recipes`. - ---- - -## Security and correctness - -### βœ… RESOLVED β€” `signatures.ok === true` was vacuously true - -**Was at:** [`src/bindu/client/index.ts:218`](../src/bindu/client/index.ts:218) -(unchanged β€” that's the authoritative `ok` boolean, which still has -`signed === 0 ? true` semantics because "nothing to fail" isn't a failure). - -**Fixed by:** making the `` envelope label four-valued -instead of three-valued. See -[`computeVerifiedLabel()`](../src/planner/index.ts) β€” new pure helper -exported so the mapping is testable without the whole peer-call loop. - -``` -null β†’ "unknown" -!ok β†’ "no" -ok && signed === 0 β†’ "unsigned" <-- new -ok && signed > 0 && signed === verified β†’ "yes" -``` - -Before the fix, an agent that returned artifacts with no signatures -attached caused `signatures.ok` to be vacuously `true`, which the -envelope stamped as `verified="yes"` β€” indistinguishable from a real -cryptographic pass. The planner LLM trusted it equally. Post-fix the -`unsigned` label surfaces the ambiguity explicitly so the planner can -downweight unverified-hearsay responses in its reasoning. - -Coverage: [`tests/planner/verified-label.test.ts`](../tests/planner/verified-label.test.ts) β€” 7 cases pinning each branch including the regression (vacuous-yes becomes `"unsigned"`). - -Shipped on branch `feat/gateway-recipes` prior to merge. - ---- - -### βœ… RESOLVED β€” `peer.card` is now populated via `fetchAgentCard` - -**Was at:** [`src/bindu/client/index.ts:196`](../src/bindu/client/index.ts:196) (the fallback itself unchanged β€” it was correct, just starved of input). - -**Fixed by:** new [`src/bindu/client/agent-card.ts`](../src/bindu/client/agent-card.ts) β€” -`fetchAgentCard(peerUrl, opts)` that GETs `/.well-known/agent.json`, -Zod-parses it, and caches per-process. Wired into `runCall` at -[`bindu/client/index.ts:153-159`](../src/bindu/client/index.ts:153): - -```ts -if (!input.peer.card && input.peer.trust?.verifyDID) { - const card = await fetchAgentCard(input.peer.url, { signal: input.signal }) - if (card) input.peer.card = card -} -``` - -Now `trust.verifyDID: true` *without* a `pinnedDID` is a real feature: -the gateway observes the peer's published DID, resolves its public key -via the DID document, and verifies every artifact signature against it. -The pinned path still wins when both are set. - -Design choices: -- **Fetch only when `verifyDID: true`** β€” no network cost on peers that - don't care about verification. -- **2-second timeout** β€” short because it blocks the first call per - peer. Failures degrade to null (same behavior as before the fix). -- **Per-process cache includes negatives** β€” a flaky peer doesn't cost - one outbound fetch per `/plan`. -- **Mutation of `input.peer.card`** is safe because `PeerDescriptor` - is built fresh per catalog entry per request β€” no cross-session - leak. - -Coverage: [`tests/bindu/agent-card-fetch.test.ts`](../tests/bindu/agent-card-fetch.test.ts) -β€” 9 cases (success / 404 / malformed / schema-mismatch / network -failure / cache hit / negative cache / per-URL isolation / timeout). - -**Remaining follow-up β€” SSE `agent_did` still null without pinnedDID.** -`plan-route.ts`'s [`findPinnedDID()`](../src/api/plan-route.ts:314) only -reads `trust.pinnedDID`. The observed DID from `peer.card` isn't -surfaced in the SSE stream yet. Separate ticket; tracked below as -🟠 "SSE `agent_did` doesn't surface observed DIDs". - ---- - -### βœ… RESOLVED β€” SSE `agent_did` now surfaces observed DIDs with provenance - -**Was at:** [`src/api/plan-route.ts:314-316`](../src/api/plan-route.ts:314) β€” the old `findPinnedDID` only read `trust.pinnedDID` from the catalog. - -**Fixed by:** `findPinnedDID` replaced with -[`findAgentDID(request, observedByName, agentName)`](../src/api/plan-route.ts) which returns an `{did, source}` -pair. Resolution precedence: - -``` -pinned trust.pinnedDID from /plan catalog strongest -observed peer.card DID from /.well-known/agent.json weaker -null neither path resolved no claim -``` - -Every `task.started` / `task.artifact` / `task.finished` SSE frame now -carries both `agent_did` and a new `agent_did_source` field with -values `"pinned" | "observed" | null` so consumers can apply the -right trust policy per frame. - -The observed map is populated upfront at `/plan` start: -`Promise.allSettled` over every catalog entry's -[`fetchAgentCard()`](../src/bindu/client/agent-card.ts) with a -**2-second total budget** shared across peers via a single -AbortController β€” even a fleet of 50 agents caps discovery at 2s -total, not 2s per peer. Results hit the per-process AgentCard cache -so the Bindu client's own verification path gets them for free on -subsequent calls. - -Coverage: [`tests/api/find-agent-did.test.ts`](../tests/api/find-agent-did.test.ts) -β€” 6 cases pinning precedence (pinned wins over observed), per-agent -independence, graceful null on unknown agent names (e.g., -`load_recipe` passes through the helper too), and map-vs-catalog -isolation. - -openapi: new shared `AgentDIDSource` schema referenced from all three -task.* SSE frames, with explicit docs on how consumers should treat -each value for trust decisions. - ---- - -### 🟠 `pinnedDID` accepts any string β€” no validation that it looks like a DID - -**Where:** [`src/planner/index.ts:77-79`](../src/planner/index.ts:77) - -```ts -trust: z.object({ - verifyDID: z.boolean().optional(), - pinnedDID: z.string().optional(), -}).optional(), -``` - -A caller can send `pinnedDID: "hello"` and the gateway will dutifully echo -`"hello"` in every SSE `agent_did` frame and try to resolve it (fails -silently at resolver). This burned us once with literal `${RESEARCH_DID}` -from an un-interpolated Postman variable. - -**Fix:** refine the Zod schema with `.regex(/^did:[a-z0-9]+:/)` at minimum. -Better: reject unknown DID methods (keep a whitelist: `did:bindu`, -`did:key`). Give the caller a 400 with a clear "didn't look like a DID" -message at the API boundary, not a silent misconfiguration. - ---- - -### 🟠 Session continuation silently overwrites `agentCatalog` on every call - -**Where:** [`src/planner/index.ts:184`](../src/planner/index.ts:184) - -```ts -if (existing) { - yield* db.updateSessionCatalog(sessionID, request.agents) -} -``` - -A second `/plan` with the same `session_id` and a different (shorter? -empty?) `agents` list replaces the catalog in the DB. Prior turns in the -session's history still reference the old catalog's tool names, but the -planner going forward sees the new catalog. If the user's second call -has fewer agents, the planner can't "call back" the agents from turn 1 -even though history suggests they existed. - -**Fix options:** -- Reject catalog mutation on resumed sessions (strict β€” might frustrate - callers who want to extend mid-conversation). -- Append rather than replace (accumulate over session lifetime). -- Version the catalog per message so history correctly references the - catalog that existed at the time the message was written. - -Today's silent-replace is the worst of the three β€” it pretends nothing -changed while changing it. - ---- - -### 🟠 `Recipe.available(agent)` treats "ask" like "allow" - -**Where:** [`src/recipe/index.ts:207-212`](../src/recipe/index.ts:207) - -```ts -return recipes.filter( - (r) => permEvaluate(rs, { permission: "recipe", target: r.name, defaultAction: "allow" }) !== "deny", -) -``` - -The permission evaluator returns `allow | deny | ask`. The recipe loader -filters only `deny`. That means a recipe with `permission: recipe: { "x": "ask" }` -will be **shown to the agent and loadable** without any interactive check β€” -`ctx.ask` on Tool.Context is a no-op today (see next item). - -**Fix:** decide the semantics now. Either: -- Treat "ask" as "deny until a permission UI ships" (safer default), OR -- Document that "ask" is identical to "allow" until Phase 2 β€” and change - the evaluator's three-valued return to a two-valued one for recipes. - -Current state will bite the first operator who writes `"ask"` thinking -it's restrictive. - ---- - -### 🟠 `ctx.ask` permission hook is never wired - -**Where:** [`src/session/prompt.ts:390-405`](../src/session/prompt.ts:390) (the `wrapTool` function) - -The `ToolContext` interface in [`src/tool/tool.ts:26`](../src/tool/tool.ts:26) -declares `ask?` as optional, and `wrapTool` constructs the context without -setting it. [`src/tool/recipe.ts:130-133`](../src/tool/recipe.ts:130) guards -the call (`if (ctx.ask) { ... }`) so there's no crash β€” it's just silently -skipped on every call. - -This means recipes load unconditionally today, permission config -notwithstanding. It's a known Phase-2 gap but operators reading the -`permission.recipe:` docs might reasonably expect it to work. - -**Fix:** either (a) wire a real `ask` implementation that can at minimum -log denied loads for later audit, or (b) add a `WARN: ctx.ask not wired` -log line at boot so operators see it. - ---- - -## Reliability and observability - -### 🟠 No request size limit on `/plan` body - -**Where:** [`src/api/plan-route.ts:62-68`](../src/api/plan-route.ts:62) - -```ts -const body = await c.req.json() -request = PlanRequest.parse(body) -``` - -Hono doesn't cap the body by default. A caller could POST a 100 MB -`agents[]` array and the gateway will happily parse it, run through -Zod, and push it into Supabase. At minimum it'll thrash memory. - -**Fix:** add a `bodyLimit` middleware on the `/plan` route β€” probably -1 MB for now (generous for real catalogs, hard-stops runaway payloads). - ---- - -### 🟠 Supabase failures mid-stream surface as a generic `event: error` - -**Where:** [`src/api/plan-route.ts:181-187`](../src/api/plan-route.ts:181) - -```ts -try { - await Effect.runPromise(planner.runPlan(sessionCtx, request, { abort: ac.signal })) -} catch (e) { - await stream.writeSSE({ - event: "error", - data: JSON.stringify({ message: (e as Error).message }), - }) -} -``` - -If Supabase goes down mid-plan, the generic `Error.message` ends up in -the SSE. Useful for a developer tailing logs; unhelpful for a caller -trying to write retry logic β€” no error code, no structured category. - -**Fix:** classify errors into a small enum (`db_failed`, -`peer_failed`, `llm_failed`, `timeout`) at the boundary and emit both -the code and the message. - ---- - -### 🟠 No observability β€” structured logs, tracing, correlation IDs - -No OTel, no structured-log fields, no correlation ID threaded through -Bus events. Debugging a production incident today means grepping -`console.log` output against the wall clock. - -**Fix path:** -- Adopt a structured logger (`pino` is what OpenCode uses and we - already vendored Effect). -- Add a `request_id` to every `Bus.publish` so a grep can follow one - `/plan` end-to-end. -- Wrap `planner.runPlan` in an OTel span when the env var - `OTEL_EXPORTER_OTLP_ENDPOINT` is set. - ---- - -### 🟑 `/health` checks no downstream connectivity - -By design (see `src/api/health-route.ts` module docstring) `/health` -doesn't ping Supabase or OpenRouter. That's correct for liveness probes -but leaves **no readiness endpoint** that does check downstream state. -An operator wanting k8s readiness gating has nowhere to wire it. - -**Fix:** add `/ready` that *does* ping Supabase + fetches OpenRouter's -`/models` with a short timeout. Return 503 if either fails. Keep -`/health` cheap for liveness. - ---- - -## Recipes - -### 🟠 No hot reload β€” authoring loop requires a gateway restart - -**Where:** [`src/recipe/index.ts:160-195`](../src/recipe/index.ts:160) β€” the -layer reads the filesystem once at init. - -Write a recipe, save, run `/plan` β€” planner uses the old recipe list -(or none if it's your first). Tight authoring loop needs Ctrl-C + -`npm run dev` every time. - -**Fix idea:** `chokidar` watch on `recipes/` dir β†’ re-run -`loadRecipesDir` β†’ swap state on change. Careful with the -`InstanceState`-style caching; the Effect layer will need a -cache-invalidation hook. - ---- - -### 🟑 `load_recipe` emits `agent_did: null` in SSE β€” can confuse consumers - -**Where:** `src/api/plan-route.ts` β€” `findPinnedDID` returns `null` for -any tool whose name doesn't match a catalog `agents[].name`, which is -correct for `load_recipe` (not a peer call) but consumers that filter -`task.*` frames by DID won't find this one. It looks like a peer call -that failed DID pinning. - -**Fix:** add an explicit `tool_kind: "peer" | "local"` field on the -SSE frames so consumers can partition. Right now the `agent` field -sometimes means "catalog entry name" (peer calls) and sometimes means -"local tool id" (load_recipe) β€” overloading. - ---- - -### 🟑 Recipe bundled-file scan has no depth signal in output - -**Where:** [`src/tool/recipe.ts:75-102`](../src/tool/recipe.ts:75) - -The tool returns a flat list of files inside `` β€” -relative paths only, no hint that the scan bottomed out at 10 entries -or 2 levels deep. If a recipe has >10 files, the planner silently sees -a truncated list. - -**Fix:** include an explicit `truncated: true | false` signal in the -metadata returned by the tool, and emit a console warning when a -recipe dir has more than 10 files. - ---- - -## Fleet / DX - -### 🟑 `faq_agent.py` registers as `bindu_docs_agent` β€” confusing DID name - -**Where:** `examples/gateway_test_fleet/faq_agent.py` (Python side, not -gateway code) - -Observed during the DID-printing work: -``` -faq_agent :3778 did:bindu:gateway_test_fleet_at_getbindu_com:bindu_docs_agent:9327ab1d-... -``` - -The DID segment is `bindu_docs_agent` because that's what the Python -agent registered itself as. The filename and the conventional catalog -name say `faq`. First-time readers of STORY.md Chapter 3 will stumble -on the mismatch. - -**Fix:** rename the Python agent to register consistently as -`faq_agent` OR rename the Python file to `bindu_docs_agent.py`. Pick -one, fix in one place. - ---- - -### 🟑 `.fleet.env` can go stale if agents regenerate DIDs - -The script regenerates `.fleet.env` on every run via `>`, so this is -fine in practice. But if an agent's DID seed rotates (delete -`~/.bindu/`, restart), and a user has an old shell with -`$RESEARCH_DID` set from a sourced `.fleet.env`, their next `/plan` -will pin the old DID and signature verification will fail cryptically. - -**Fix:** add a comment to `.fleet.env` reminding users to re-source -after any fleet restart, and have `start_fleet.sh` print a line -saying "if your shell has old DIDs sourced, re-source `.fleet.env`". - ---- - -## API surface - -### 🟑 No explicit `Accept: text/event-stream` enforcement on `/plan` - -Clients that POST without the header still get SSE. Not strictly -wrong (Hono streams regardless) but the implicit contract is -"everyone wants SSE back," which is only true by convention. - -**Fix:** either document `Accept` as required in openapi.yaml (it's -not today) or enforce it with a 406 for anything that doesn't list -`text/event-stream` or `*/*`. - ---- - -### 🟑 OpenRouter is the only supported provider - -**Where:** [`src/provider/index.ts`](../src/provider/index.ts) - -Swapping to a different LLM provider is a code change, not config. -The `model` field accepts `openrouter/…` strings and everything -downstream assumes that prefix. - -**Fix:** make provider lookup factory-style β€” `openrouter/*` β†’ -OpenRouterProvider, `anthropic/*` β†’ AnthropicProvider, etc. Not -urgent since OpenRouter already proxies almost everything, but -worth doing before any customer asks for it. - ---- - -## Tests and coverage gaps - -### 🟑 No planner-layer integration test - -We skipped this in Phase 7 of the recipes work because mocking -Provider + Session + DB + Bus is heavy. The unit tests for the -loader and tool cover the contracts, and the `/plan` SSE handler -has its own filter test, but no single test walks a request -end-to-end through the planner with a mocked LLM. - -**Fix:** build a layer that replaces `Provider.Service` with a -fake that emits a canned `StreamEvent` sequence, then assert the -SSE output matches expectations. Would also let us test the -signature-surfacing work against synthetic tool results. - ---- - -### 🟑 No health-endpoint integration test - -[`tests/api/health-route.test.ts`](../tests/api/health-route.test.ts) -only tests the pure helpers (`splitModelId`, `deriveGatewayId`, -`deriveAuthor`). The actual handler construction + response shape -is covered only by manual curl. - -**Fix:** add a test that builds the real layer graph (minus -Supabase β€” mock it), invokes the handler against a stub Hono -context, and asserts the full response body. Would catch drift -between the openapi schema and the actual response. - ---- - -## Docs - -### 🟑 STORY.md Chapter 5 references `BINDU_GATEWAY_HYDRA_SCOPE` only via -the gateway README β€” no standalone explanation. - -A reader going through STORY.md linearly hits Chapter 5 and needs -to understand what scopes mean before the DID-signing setup makes -sense. The README covers it but STORY.md doesn't. - -**Fix:** one-paragraph sidebar in Chapter 5 explaining "OAuth -scopes are just labels we ask Hydra to stamp on tokens; peers -check they have `agent:read` + `agent:write` before accepting -a /message/send". Not technical, just context. - ---- - -### 🟑 openapi.yaml lists SSE event schemas that aren't `$ref`'d - -Current redocly lint reports 13 "unused component" warnings for -`SSEEvent_Session`, `_Plan`, `_TextDelta`, `_TaskStarted`, etc. -OpenAPI 3.1 has no native SSE modeling so those schemas sit as -reference docs rather than being spliced into a response body. - -**Fix options:** -- Accept the warnings as documented (low-effort, pragmatic). -- Use `oneOf` inside the response's `text/event-stream` schema - to enumerate every event shape. Stretches OpenAPI but at - least gets the schemas used. -- Publish a second spec file (AsyncAPI 2.x or 3.x) for the - SSE surface. AsyncAPI natively models event streams. - ---- - -## How to add to this list - -When you find a gotcha while working on the gateway, add an entry -here. Format: - -- **Title** with severity icon -- **Where** β€” file path + line number (click-through) -- **What** β€” observed behavior + one-line hypothesis of cause -- **Fix** β€” concrete direction, not a rewrite - -Keep entries factual and file-path-grounded. Speculative "would be -nice" wishes belong in GitHub issues; this file is for verifiable -defects and real gaps.