diff --git a/bugs/known-issues.md b/bugs/known-issues.md index f19b2015..799f8514 100644 --- a/bugs/known-issues.md +++ b/bugs/known-issues.md @@ -1,6 +1,6 @@ # Known Issues -_Last updated: 2026-04-19 — restructured into scannable tables and story-format entries for the high-severity items. `did-document-endpoint-returns-raw-dict` removed — see [`core/2026-04-19-did-document-endpoint-raw-dict.md`](./core/2026-04-19-did-document-endpoint-raw-dict.md)._ +_Last updated: 2026-04-20 — recipes/SSE/DID work on branch `feat/gateway-recipes` added 15 gateway entries (5 medium, 6 low, 5 nit) and narrowed two existing ones (`tool-name-collisions-silent` → `parse-agent-from-tool-greedy-mismatch`, `signature-verification-ok-when-unsigned` → `signature-verification-non-text-parts-unverified`) to the residual sub-bugs after partial resolutions landed. `did-document-endpoint-returns-raw-dict` removed — see [`core/2026-04-19-did-document-endpoint-raw-dict.md`](./core/2026-04-19-did-document-endpoint-raw-dict.md)._ This file is user-facing. It's the first thing a new contributor or operator reads to calibrate expectations: what Bindu doesn't do @@ -41,7 +41,7 @@ Issue referencing the slug (e.g. "Fixes `context-window-hardcoded`"). | Subsystem | High | Medium | Low | Nit | |---|---:|---:|---:|---:| -| [Gateway](#gateway) | 3 | 11 | 13 | 4 | +| [Gateway](#gateway) | 3 | 15 | 19 | 9 | | [Bindu Core (Python)](#bindu-core-python) | 4 | 7 | 2 | 0 | | [SDKs (TypeScript)](#sdks-typescript) | — | — | — | — | | [Frontend](#frontend) | — | — | — | — | @@ -59,9 +59,13 @@ Issue referencing the slug (e.g. "Fixes `context-window-hardcoded`"). | [`no-session-concurrency-guard`](#no-session-concurrency-guard) | high | Two `/plan` calls on the same session tangle their histories | | [`abort-signal-not-propagated-to-bindu-client`](#abort-signal-not-propagated-to-bindu-client) | medium | Client disconnect doesn't cancel in-flight peer polls | | [`permission-rules-not-enforced-for-tool-calls`](#permission-rules-not-enforced-for-tool-calls) | medium | Permission service exists but is never called for tool calls | -| [`tool-name-collisions-silent`](#tool-name-collisions-silent) | medium | `(agent, skill)` pairs can collide; second registration wins | +| [`parse-agent-from-tool-greedy-mismatch`](#parse-agent-from-tool-greedy-mismatch) | medium | SSE `agent` field wrong when agent names contain underscores | | [`agent-catalog-overwrite`](#agent-catalog-overwrite) | medium | Each `/plan` wholesale-overwrites the session's agent catalog | -| [`signature-verification-ok-when-unsigned`](#signature-verification-ok-when-unsigned) | medium (sec) | Unsigned peer responses indistinguishable from stripped signatures | +| [`signature-verification-non-text-parts-unverified`](#signature-verification-non-text-parts-unverified) | medium (sec) | `DataPart` / `FilePart` bypass signature verification | +| [`pinned-did-no-format-validation`](#pinned-did-no-format-validation) | medium | Any string is accepted as `pinnedDID`; no DID shape check | +| [`recipe-permission-ask-treated-as-allow`](#recipe-permission-ask-treated-as-allow) | medium | `permission: recipe: { "x": "ask" }` silently allows | +| [`ctx-ask-not-wired-for-recipes`](#ctx-ask-not-wired-for-recipes) | medium | Recipe permission hook is a no-op at runtime | +| [`supabase-error-surfaces-as-generic-sse-error`](#supabase-error-surfaces-as-generic-sse-error) | medium | Mid-stream DB failures emit untyped `event: error` | | [`did-resolver-no-key-id-selection`](#did-resolver-no-key-id-selection) | medium (sec) | Resolver picks the first public key; breaks during rotation | | [`list-messages-pagination-silent`](#list-messages-pagination-silent) | medium | Sessions >1000 messages silently truncate oldest | | [`tool-input-sent-as-textpart`](#tool-input-sent-as-textpart) | medium | Skills expecting `DataPart` receive `TextPart` with stringified JSON | @@ -86,10 +90,21 @@ Issue referencing the slug (e.g. "Fixes `context-window-hardcoded`"). | [`revert-doesnt-cancel-remote-tasks`](#revert-doesnt-cancel-remote-tasks) | low | `revertTo` doesn't cancel still-running peer tasks | | [`empty-agents-catalog-no-400`](#empty-agents-catalog-no-400) | low | `/plan` with empty `agents[]` runs and returns an LLM error | | [`no-migration-rollback`](#no-migration-rollback) | low | Migrations forward-only — no paired `down.sql` | +| [`recipe-no-hot-reload`](#recipe-no-hot-reload) | low | Recipe changes require a gateway restart | +| [`load-recipe-sse-frames-ambiguous`](#load-recipe-sse-frames-ambiguous) | low | `load_recipe` and peer calls use the same SSE `agent` field | +| [`recipe-file-enumeration-no-truncation-signal`](#recipe-file-enumeration-no-truncation-signal) | low | 10+ files per recipe silently truncated in tool output | +| [`faq-agent-did-name-mismatch`](#faq-agent-did-name-mismatch) | low | Fleet's `faq_agent.py` registers as `bindu_docs_agent` | +| [`fleet-env-can-desync-after-seed-rotation`](#fleet-env-can-desync-after-seed-rotation) | low | Stale `$*_DID` env vars survive agent seed rotation | +| [`provider-openrouter-hardcoded`](#provider-openrouter-hardcoded) | low | `model` field assumes OpenRouter prefix | | [`tasks-recorded-is-dead-state`](#tasks-recorded-is-dead-state) | nit | Unused `tasksRecorded` field in the planner | | [`map-finish-reason-pointless-ternary`](#map-finish-reason-pointless-ternary) | nit | Conditional type that evaluates to `any` either way | | [`db-effect-promise-swallows-errors`](#db-effect-promise-swallows-errors) | nit | `Effect.promise` silently swallows rejected promises | | [`test-coverage-gaps`](#test-coverage-gaps) | nit | Backlog list of missing test scenarios | +| [`accept-header-not-enforced-on-plan`](#accept-header-not-enforced-on-plan) | nit | `Accept: text/event-stream` not required | +| [`openapi-sse-schemas-unused`](#openapi-sse-schemas-unused) | nit | 13 redocly `no-unused-components` warnings on SSE schemas | +| [`no-planner-integration-test`](#no-planner-integration-test) | nit | Request → SSE end-to-end test missing | +| [`no-health-handler-integration-test`](#no-health-handler-integration-test) | nit | `/health` only has pure-helper unit tests | +| [`story-chapter5-missing-oauth-scope-explanation`](#story-chapter5-missing-oauth-scope-explanation) | nit | DID chapter assumes reader knows OAuth scopes | ### High @@ -218,23 +233,22 @@ caller sends in the `/plan` request — only include agents the caller is allowed to invoke. **Tracking:** _(no issue yet)_ -### tool-name-collisions-silent +### parse-agent-from-tool-greedy-mismatch **Severity:** medium -**Summary:** `normalizeToolName` in +**Summary:** The collision half of this bug is now +**rejected** at plan-open time — `findDuplicateToolIds` in [`gateway/src/planner/index.ts`](../gateway/src/planner/index.ts) -replaces non-alphanumeric characters with `_` and truncates to 80 -chars. Distinct `(agent, skill)` pairs can map to the same tool id: -e.g. agent `research-v2` + skill `x` and agent `research_v2` + skill -`x` both become `call_research_v2_x`. The second registration -silently overwrites the first. A companion bug: `parseAgentFromTool` -uses a non-greedy regex `^call_(.+?)_(.+)$`, so an agent whose name -contains an underscore parses as `agent=first-segment`, -`skill=everything-else` — wrong for the SSE event the handler emits. -**Workaround:** Use globally-unique agent names in the catalog; -avoid both hyphens and underscores in the same naming scheme; avoid -underscores in agent names entirely if you rely on the `task.started` -SSE agent field being accurate. +returns a 400 when two `(agent, skill)` pairs normalize to the +same tool id. What remains: `parseAgentFromTool` in +[`gateway/src/api/plan-route.ts`](../gateway/src/api/plan-route.ts) +uses a non-greedy regex `^call_(.+?)_(.+)$`, so an agent whose +name contains an underscore (e.g. `research_v2`) emits SSE +`task.started` events with `agent=research`, `skill=v2_x` instead +of the intended `agent=research_v2`, `skill=x`. +**Workaround:** Avoid underscores in agent names in the catalog +you send on `/plan`. If you must, rely on `agent_did` / `task_id` +for correlation, not the SSE `agent` field. **Tracking:** _(no issue yet)_ ### agent-catalog-overwrite @@ -252,22 +266,21 @@ references its prior tool calls. Also exposed to a concurrency race even if individual agents are temporarily unavailable. **Tracking:** _(no issue yet)_ -### signature-verification-ok-when-unsigned +### signature-verification-non-text-parts-unverified **Severity:** medium (security-adjacent) -**Summary:** `verifyArtifact` in +**Summary:** The envelope-ambiguity half of this bug is **fixed** +— `` is now four-valued +(`yes | no | unsigned | unknown`), so the planner LLM can tell a +real cryptographic pass from "nothing was signed". What remains: +`verifyArtifact` in [`gateway/src/bindu/identity/verify.ts`](../gateway/src/bindu/identity/verify.ts) -returns `{ ok: true, signed: 0 }` when a peer sends text parts with -no signatures at all. An attacker stripping signatures from a -compromised peer cannot be distinguished from an unsigned peer. Also, -`file` and `data` parts are never verified at all regardless of -signing — a peer that moves payload into a `DataPart` bypasses -signature checks entirely. -**Workaround:** For trust-sensitive peers, set `trust.verifyDID: -true` AND explicitly check `outcome.signatures.signed > 0` in your -own code before trusting the response. Refuse peers that return -`data` or `file` parts for responses that must be verified. The -gateway does not enforce either today. +only inspects text parts. `file` and `data` parts are never +verified, so a peer moving payload into a `DataPart` bypasses +signature checks entirely even when `verifyDID: true` is set. +**Workaround:** Refuse peers that return `data` or `file` parts +for responses that must be verified. The gateway does not enforce +this constraint today. **Tracking:** _(no issue yet)_ ### did-resolver-no-key-id-selection @@ -679,6 +692,246 @@ Contributors tackling any of the fixable items above should add a test for it in the same PR. **Tracking:** _(no issue yet)_ + + + + +### pinned-did-no-format-validation + +**Severity:** medium +**Summary:** `PeerAuthRequest.trust.pinnedDID` in +[`gateway/src/planner/index.ts`](../gateway/src/planner/index.ts) +is `z.string().optional()` with no shape check. A caller can send +`pinnedDID: "hello"` or accidentally an un-interpolated template +literal like `"${RESEARCH_DID}"`, and the gateway will echo that +string in every SSE `agent_did` frame and try (vainly) to resolve +it for signature verification. Observed in practice when a Postman +user copy-pasted a bash-style variable reference into the request +body. +**Workaround:** Validate DID shape client-side before sending. At +minimum ensure the string starts with `did:bindu:` or `did:key:`. +**Tracking:** _(no issue yet)_ + +### recipe-permission-ask-treated-as-allow + +**Severity:** medium +**Summary:** +[`Recipe.available`](../gateway/src/recipe/index.ts) filters the +recipe list shown to an agent by excluding anything whose +`permission.recipe` resolves to `deny`. The evaluator is +three-valued (`allow | deny | ask`) — `ask` falls through and the +recipe is shown AND loadable. With the `ctx.ask` hook unwired (see +next entry), `"ask"` is silently identical to `"allow"`. +**Workaround:** Treat `"ask"` as if it were `"allow"` for recipes +today. If you need to restrict, use `"deny"`. +**Tracking:** _(no issue yet)_ + +### ctx-ask-not-wired-for-recipes + +**Severity:** medium +**Summary:** [`tool/recipe.ts`](../gateway/src/tool/recipe.ts) +calls `ctx.ask({permission: "recipe", target: name})` before +loading a recipe body, guarded by `if (ctx.ask)`. The `ctx.ask` +field is marked optional in `ToolContext` and +[`wrapTool`](../gateway/src/session/prompt.ts) never sets it. So +the gate is a permanent no-op — recipes load unconditionally +regardless of agent permission config. This is a separate concern +from `permission-rules-not-enforced-for-tool-calls` which targets +the broader tool-call gate. +**Workaround:** Don't rely on `permission.recipe` in agent configs +for production access control. Wait for Phase-2 permission UI. +**Tracking:** _(no issue yet)_ + +### supabase-error-surfaces-as-generic-sse-error + +**Severity:** medium +**Summary:** +[`plan-route.ts`](../gateway/src/api/plan-route.ts) wraps +`runPlan` in a catch that emits `event: error` with only +`{message: string}`. Callers can't programmatically distinguish a +Supabase outage from a peer crash from an LLM failure — all three +look identical on the wire. Operators debugging prod incidents +have to grep logs. +**Workaround:** Tail gateway logs out of band. Client-side, treat +any `event: error` as "retry with exponential backoff" regardless +of cause. +**Tracking:** _(no issue yet)_ + +### recipe-no-hot-reload + +**Severity:** low +**Summary:** +[`Recipe.layer`](../gateway/src/recipe/index.ts) reads +`gateway/recipes/` once at boot. Adding, editing, or deleting a +recipe markdown file has no effect until the gateway process +restarts. Authoring loop is Ctrl-C + `npm run dev` per change. +**Workaround:** Script `npm run dev` to auto-restart on +`recipes/**/*.md` changes using `nodemon` or `tsx watch` with a +wider include. Or edit + restart; the recipe layer init is cheap. +**Tracking:** _(no issue yet)_ + +### load-recipe-sse-frames-ambiguous + +**Severity:** low +**Summary:** When the planner calls the internal `load_recipe` +tool, the SSE stream emits `task.started` / `task.artifact` / +`task.finished` frames with `agent: "load_recipe"` and +`agent_did: null`. Consumers parsing `task.*` frames by peer +correlation have no crisp way to tell this apart from a peer call +that happens to have a null DID. The `agent_did_source: null` +field helps (peer calls from unpinned observed-failed peers also +have it null) but doesn't cleanly partition. +**Fix:** add an explicit `tool_kind: "peer" | "local"` field on +the task.* SSE frames. +**Workaround:** Filter on `agent === "load_recipe"` client-side. +**Tracking:** _(no issue yet)_ + +### recipe-file-enumeration-no-truncation-signal + +**Severity:** low +**Summary:** The +[`load_recipe`](../gateway/src/tool/recipe.ts) tool returns a +`` block listing sibling files inside a bundled +recipe's directory, capped at 10 entries. If a recipe directory +has more than 10 files, the excess silently disappears from the +list — the planner has no way to know it's seeing a sample, not +the full set. +**Fix:** include an explicit `truncated: true` marker in the tool +result's metadata, and emit a `console.warn` at boot when a recipe +directory exceeds the cap. +**Workaround:** Keep bundled recipe directories under 10 files. +**Tracking:** _(no issue yet)_ + +### faq-agent-did-name-mismatch + +**Severity:** low +**Summary:** The fleet demo's +[`examples/gateway_test_fleet/faq_agent.py`](../examples/gateway_test_fleet/faq_agent.py) +registers its DID as `bindu_docs_agent` but the filename and the +operator-facing port labels (3778) say `faq_agent`. First-time +readers following `docs/GATEWAY.md` Chapter 3 see the mismatch in +the SSE `agent_did` strings vs the catalog `agent` field and +wonder if something is wrong. Python-side concern, no gateway code +involved. +**Workaround:** The mismatch is cosmetic — signature verification +and routing both work. Ignore or pin `faq_agent` DID explicitly in +the catalog. +**Tracking:** _(no issue yet)_ + +### fleet-env-can-desync-after-seed-rotation + +**Severity:** low +**Summary:** +[`examples/gateway_test_fleet/start_fleet.sh`](../examples/gateway_test_fleet/start_fleet.sh) +writes fresh DIDs to `.fleet.env` on every run. But if an agent's +seed rotates (`rm -rf ~/.bindu`, restart) and a user has an old +shell with `$RESEARCH_DID` still sourced from a prior `.fleet.env`, +their next `/plan` pins a stale DID. Signature verification fails +with a cryptic mismatch error. +**Fix:** have `start_fleet.sh` print a "re-source `.fleet.env` if +you had one loaded previously" hint whenever any agent's DID +differs from the previous run's cache. +**Workaround:** Always `source .fleet.env` fresh after any fleet +restart. +**Tracking:** _(no issue yet)_ + +### provider-openrouter-hardcoded + +**Severity:** low +**Summary:** +[`gateway/src/provider/index.ts`](../gateway/src/provider/index.ts) +and downstream code assume the `openrouter/` model prefix. Adding +direct Anthropic or direct OpenAI support (without going through +OpenRouter's proxy) is a code change, not config. +**Workaround:** Use OpenRouter as the universal proxy; it supports +every major provider. Only reach for this when you need direct +provider features (e.g. Anthropic prompt caching, OpenAI +fine-tuned model access) that OpenRouter doesn't proxy. +**Tracking:** _(no issue yet)_ + +### accept-header-not-enforced-on-plan + +**Severity:** nit +**Summary:** `POST /plan` returns `text/event-stream` regardless +of whether the client sent `Accept: text/event-stream`. Clients +that forget the header still get SSE, which is convenient but +breaks strict content-negotiation semantics. Not currently +documented as required in +[`openapi.yaml`](../gateway/openapi.yaml) either. +**Fix:** either document the header as required and return 406 +when absent, or keep the permissive behavior and document it. +Currently we're in the worst middle ground. +**Workaround:** None needed; current behavior is operationally +fine. Consumers relying on strict 406 on wrong `Accept` won't get +it. +**Tracking:** _(no issue yet)_ + +### openapi-sse-schemas-unused + +**Severity:** nit +**Summary:** `redocly lint gateway/openapi.yaml` reports 13 +`no-unused-components` warnings on the `SSEEvent_*` schemas. +OpenAPI 3.1 has no native SSE modeling, so those schemas sit as +reference material rather than being `$ref`'d from a response +body. The warnings are expected given the format's limitations, +not indicative of drift. +**Fix options:** (a) accept — pragmatic, 0 errors, just warnings; +(b) use `oneOf` inside the `text/event-stream` response schema to +enumerate each event shape (stretches OpenAPI); (c) publish a +separate AsyncAPI 2.x/3.x spec for the SSE surface. +**Workaround:** None needed; warnings don't break consumers. +**Tracking:** _(no issue yet)_ + +### no-planner-integration-test + +**Severity:** nit +**Summary:** The unit tests cover every Gateway module in +isolation, but no single test walks a `/plan` request +end-to-end with a mocked LLM provider. The recipes feature, +signatures surfacing, and observed-DID resolution were all +verified manually against the real stack and via targeted unit +tests for their pure helpers. A regression in the cross-cutting +glue (tool → Bus → SSE JSON) would be caught only at integration +time. +**Fix:** mock `Provider.Service` with a fake emitting a canned +`StreamEvent` sequence; assert SSE output matches expectations. +**Workaround:** None — treat this as internal backlog. +**Tracking:** _(no issue yet)_ + +### no-health-handler-integration-test + +**Severity:** nit +**Summary:** +[`tests/api/health-route.test.ts`](../gateway/tests/api/health-route.test.ts) +covers only the pure helpers (`splitModelId`, `deriveGatewayId`, +`deriveAuthor`). The handler's full response shape — version, +planner-model nesting, runtime flags, uptime math — is verified +by manual curl, not by a test that builds the layer graph and +asserts the JSON. Drift between `openapi.yaml`'s `HealthResponse` +schema and the actual response would go unnoticed until someone +hand-checks. +**Fix:** build a minimal layer graph (mock Supabase) in a test, +invoke the handler against a stub Hono context, assert the body +matches the openapi schema. +**Workaround:** Manual curl against a running gateway. +**Tracking:** _(no issue yet)_ + +### story-chapter5-missing-oauth-scope-explanation + +**Severity:** nit +**Summary:** +[`docs/GATEWAY.md`](../docs/GATEWAY.md) Chapter 5 sets +`BINDU_GATEWAY_HYDRA_SCOPE` via env vars but never explains what +OAuth scopes are or why `agent:read` + `agent:write` are the +defaults. A reader walking the story linearly hits the config +step without context. +**Fix:** one-paragraph sidebar in Chapter 5 explaining "scopes +are labels we ask Hydra to stamp on tokens; peers check them +before accepting a `message/send`". +**Workaround:** Cross-ref to gateway/README.md §DID signing which +explains it. +**Tracking:** _(no issue yet)_ + --- ## Bindu Core (Python) diff --git a/gateway/docs/STORY.md b/docs/GATEWAY.md similarity index 98% rename from gateway/docs/STORY.md rename to docs/GATEWAY.md index 5f48c5d5..4db05b73 100644 --- a/gateway/docs/STORY.md +++ b/docs/GATEWAY.md @@ -393,7 +393,7 @@ Five agents now, each on its own port: | faq_agent | 3778 | Answers from a canned FAQ | Each is ~60 lines of Python. Open any one — say -[joke_agent.py](../../examples/gateway_test_fleet/joke_agent.py) — and you'll see +[joke_agent.py](../examples/gateway_test_fleet/joke_agent.py) — and you'll see a small configuration that wires a language model (`openai/gpt-4o-mini`) to a few lines of instructions ("tell jokes, refuse other requests"). Narrow scope on purpose so mistakes are visible. @@ -660,7 +660,7 @@ Where recipes shine: payment-required`, surface the payment URL to the user and STOP — do not retry" is a policy the planner wouldn't invent on its own. See the seed recipe at - [gateway/recipes/payment-required-flow/RECIPE.md](../recipes/payment-required-flow/RECIPE.md) + [gateway/recipes/payment-required-flow/RECIPE.md](../gateway/recipes/payment-required-flow/RECIPE.md) for a real example. - **Tenant-specific rules.** A recipe visible only to a certain agent can encode rules like "always include a disclaimer" or "always call @@ -929,20 +929,20 @@ skip. ### Reference material -- **[gateway/openapi.yaml](../openapi.yaml)** — the machine-readable +- **[gateway/openapi.yaml](../gateway/openapi.yaml)** — the machine-readable contract for `/plan`, `/health`, and `/.well-known/did.json`. Paste it into [Swagger UI](https://editor.swagger.io) or [Stoplight](https://stoplight.io) to click through every field, response, and example. This is the source of truth; this document is the prose. -- **[gateway/README.md](../README.md)** — the operator's reference: +- **[gateway/README.md](../gateway/README.md)** — the operator's reference: configuration knobs, environment variables, the `/health` payload, troubleshooting, and where vendored code came from (OpenCode). Short and targeted — most of the narrative moved into this story. -- **[gateway/agents/planner.md](../agents/planner.md)** — the planner +- **[gateway/agents/planner.md](../gateway/agents/planner.md)** — the planner LLM's system prompt. If the gateway is doing something you don't expect, start here. -- **[gateway/recipes/](../recipes)** — the two seed recipes +- **[gateway/recipes/](../gateway/recipes)** — the two seed recipes (`multi-agent-research`, `payment-required-flow`) plus whatever you authored in Chapter 4. Each one is a complete example. @@ -988,7 +988,7 @@ If you're moving this past localhost: ### When you're stuck - Gateway won't boot: re-read the env var section of - [gateway/README.md](../README.md). Partial DID or Hydra config fails + [gateway/README.md](../gateway/README.md). Partial DID or Hydra config fails fast with a message naming the missing var. - Planner never calls a tool: the descriptions you gave for `agents[].skills[].description` are probably too short or too vague. diff --git a/docs/openapi.yaml b/docs/openapi.yaml index 7ab7ace4..68b47a65 100644 --- a/docs/openapi.yaml +++ b/docs/openapi.yaml @@ -595,7 +595,7 @@ paths: - kind: "text" text: "Paris." metadata: - did.message.signature: "2M1qbfLcyoQhSAfTzDghw15PMTfmv3jUoigk7KRuiowkEWZpU7aYLHTnqwamjEo4SxNskq15PZANNLuhJ7omzsxg" + did.message.signature: "2M1qbfLcyoQhSAfTzDghw15PMTfmv3jUoigk7KRuiowkEWZpU7aYLHTnqwamjEo4SxNskq15PZANNLuhJ7omzsxg" # pragma: allowlist secret artifact_id: "985b4f37-ee2e-48a4-bd6f-c66472e67b85" metadata: {} diff --git a/examples/gateway_test_fleet/README.md b/examples/gateway_test_fleet/README.md index 5f9a4f83..3428679c 100644 --- a/examples/gateway_test_fleet/README.md +++ b/examples/gateway_test_fleet/README.md @@ -4,7 +4,7 @@ A reproducible multi-agent setup for exercising the Bindu Gateway end-to-end. Fi ## If you're new here -**Don't start with this folder — start with [`gateway/docs/STORY.md`](../../gateway/docs/STORY.md).** That's the guided walkthrough; this fleet is what it uses under the hood. By Chapter 3 of STORY.md you'll have all five agents running via `start_fleet.sh` and a gateway driving them. +**Don't start with this folder — start with [`docs/GATEWAY.md`](../../docs/GATEWAY.md).** That's the guided walkthrough; this fleet is what it uses under the hood. By Chapter 3 of STORY.md you'll have all five agents running via `start_fleet.sh` and a gateway driving them. ## What's in here @@ -81,7 +81,7 @@ Each case writes its full SSE stream to `logs/.sse`. Open one end-to-end — ## Further reading -- [`gateway/docs/STORY.md`](../../gateway/docs/STORY.md) — the end-to-end story this fleet illustrates +- [`docs/GATEWAY.md`](../../docs/GATEWAY.md) — the end-to-end story this fleet illustrates - [`gateway/openapi.yaml`](../../gateway/openapi.yaml) — machine-readable API contract for the gateway - [`gateway/README.md`](../../gateway/README.md) — operator reference (env vars, /health, DID signing reference) - [`gateway/recipes/`](../../gateway/recipes/) — seed playbooks you can copy-edit as templates diff --git a/gateway/README.md b/gateway/README.md index 7691f473..cfb9492f 100644 --- a/gateway/README.md +++ b/gateway/README.md @@ -10,7 +10,7 @@ A task-first orchestrator that sits between an **external system** and one or mo ## New here? -**Read [`docs/STORY.md`](./docs/STORY.md) first.** It's a 45-minute end-to-end walkthrough that goes from a clean clone to running three chained agents, authoring a recipe, and turning on DID signing. Written for readers with no prior AI-agent knowledge. +**Read [`docs/GATEWAY.md`](../docs/GATEWAY.md) first.** It's a 45-minute end-to-end walkthrough that goes from a clean clone to running three chained agents, authoring a recipe, and turning on DID signing. Written for readers with no prior AI-agent knowledge. This README is the **operator's reference** — configuration, troubleshooting, and pointers into source. The narrative lives in STORY.md. @@ -73,7 +73,7 @@ Returns a detailed JSON payload describing the gateway process — version, plan } ``` -For a runnable multi-agent walkthrough, see [`docs/STORY.md`](./docs/STORY.md) §Chapter 2-3. +For a runnable multi-agent walkthrough, see [`docs/GATEWAY.md`](../docs/GATEWAY.md) §Chapter 2-3. --- @@ -125,7 +125,7 @@ Full request/response contract with examples: [`openapi.yaml`](./openapi.yaml). Recipes are markdown playbooks the planner lazy-loads when a task matches. Only metadata (`name` + `description`) sits in the system prompt; the full body is fetched on demand via the `load_recipe` tool. Pattern borrowed from [OpenCode Skills](https://opencode.ai/docs/skills/), renamed to avoid collision with A2A `SkillRequest` (an agent capability on the `/plan` request body). -**Author one in two minutes** — see [`docs/STORY.md`](./docs/STORY.md) §Chapter 4 for the walkthrough. The reference: +**Author one in two minutes** — see [`docs/GATEWAY.md`](../docs/GATEWAY.md) §Chapter 4 for the walkthrough. The reference: ### Layouts @@ -175,7 +175,7 @@ Default action is `allow` — an agent with no `recipe:` rules sees everything. For peers configured with `auth.type = "did_signed"`, the gateway signs each outbound A2A request with an Ed25519 identity. Peers verify against the gateway's public key (published at `/.well-known/did.json`) and reject mismatches. -**Full walkthrough** — [`docs/STORY.md`](./docs/STORY.md) §Chapter 5. The reference: +**Full walkthrough** — [`docs/GATEWAY.md`](../docs/GATEWAY.md) §Chapter 5. The reference: ### Two modes diff --git a/gateway/openapi.yaml b/gateway/openapi.yaml index c02b37ea..b30e7991 100644 --- a/gateway/openapi.yaml +++ b/gateway/openapi.yaml @@ -937,7 +937,7 @@ components: SSEEvent_TaskStarted: type: object - required: [task_id, agent, agent_did, skill, input] + required: [task_id, agent, agent_did, agent_did_source, skill, input] properties: task_id: type: string @@ -947,7 +947,15 @@ components: description: Display name of the peer agent (from `agents[].name`). agent_did: type: [string, "null"] - description: Pinned DID for the agent (from `agents[].trust.pinnedDID`), or null if not pinned. + description: | + The peer's DID, resolved with precedence pinned → observed → null: + (a) `trust.pinnedDID` from the /plan catalog if set; otherwise + (b) the DID the peer published at `/.well-known/agent.json`, + fetched by the gateway at plan-open time; otherwise + (c) `null` — cryptographic identity undeclared. + See `agent_did_source` for which path resolved it. + agent_did_source: + $ref: "#/components/schemas/AgentDIDSource" skill: type: string description: Skill id being invoked on the peer. @@ -961,7 +969,7 @@ components: SSEEvent_TaskArtifact: type: object - required: [task_id, agent, agent_did, content] + required: [task_id, agent, agent_did, agent_did_source, content] properties: task_id: type: string @@ -969,12 +977,27 @@ components: type: string agent_did: type: [string, "null"] + description: Same resolution rules as on `task.started` — pinned → observed → null. + agent_did_source: + $ref: "#/components/schemas/AgentDIDSource" content: type: string description: | - The peer's artifact text, wrapped in a `...` + The peer's artifact text, wrapped in a + `...` envelope. The planner treats this as untrusted data — clients should too. + + `verified` is four-valued: + - `yes` → at least one signed artifact and all signed + verified against the pinned DID's public key. + Strongest guarantee. + - `no` → at least one signed artifact failed + verification. Task is also marked `failed`. + - `unsigned` → verification ran but no artifact carried a + signature. The body is unverified hearsay. + - `unknown` → verification wasn't attempted (no `verifyDID`, + no `pinnedDID`, or DID doc unreachable). title: type: string description: Short display title, typically `@/`. @@ -993,7 +1016,7 @@ components: SSEEvent_TaskFinished: type: object - required: [task_id, agent, agent_did, state] + required: [task_id, agent, agent_did, agent_did_source, state] properties: task_id: type: string @@ -1001,6 +1024,9 @@ components: type: string agent_did: type: [string, "null"] + description: Same resolution rules as on `task.started` — pinned → observed → null. + agent_did_source: + $ref: "#/components/schemas/AgentDIDSource" state: type: string enum: [completed, failed] @@ -1052,6 +1078,34 @@ components: description: Empty object. Last frame of every successful plan. additionalProperties: false + AgentDIDSource: + type: [string, "null"] + enum: [pinned, observed, null] + description: | + Provenance of the `agent_did` on the same SSE frame. Tells + consumers which of three paths resolved the DID, so they can + apply the right trust policy: + + - `"pinned"` — the caller declared `trust.pinnedDID` in the + /plan catalog. The caller vouched for this identity; the + gateway enforces it when `verifyDID: true` is also set. + Strongest claim a consumer can get out of this field. + - `"observed"` — the peer self-reported this DID in its + `/.well-known/agent.json` AgentCard, fetched by the gateway + at plan-open time. Weaker than pinned: an impostor standing + up a fake endpoint can advertise any DID they choose unless + signature verification is on. + - `null` — neither path resolved. Either the caller didn't pin + AND the AgentCard couldn't be fetched (no `.well-known`, + network failure, malformed), or the AgentCard had no DID in + either `id` or `capabilities.extensions[].uri`. + + Compliance-gated consumers should treat `"observed"` and + `null` identically unless they also see a `signatures.ok:true` + with `signed > 0` on the same or a following frame — that's + the cryptographic evidence that promotes an observed DID to a + verified one. + PlanSignatures: type: [object, "null"] description: | diff --git a/gateway/src/api/plan-route.ts b/gateway/src/api/plan-route.ts index a1cc61f8..9b26368b 100644 --- a/gateway/src/api/plan-route.ts +++ b/gateway/src/api/plan-route.ts @@ -12,6 +12,8 @@ import { import { Service as BusService, type Interface as BusInterface } from "../bus" import { Service as ConfigService, type Config } from "../config" import { PromptEvent } from "../session/prompt" +import { fetchAgentCard } from "../bindu/client/agent-card" +import { getPeerDID } from "../bindu/protocol/identity" import type { z } from "zod" /** @@ -92,6 +94,35 @@ async function handleRequest( ) } + // 2b. Pre-fetch each peer's AgentCard in parallel (total ≤2s budget) + // so we can surface observed DIDs in SSE even when the caller + // didn't pin one. Results are cached in fetchAgentCard's + // per-process Map — the Bindu client's downstream runCall will + // hit the same cache for free. Failures don't block: individual + // peer AgentCards default to "not observed", `agent_did` stays + // null for that peer. + const observedByName = new Map() + { + const discoveryBudget = 2000 + const ac = new AbortController() + const timer = setTimeout(() => ac.abort(), discoveryBudget) + try { + await Promise.allSettled( + request.agents.map(async (ag) => { + const card = await fetchAgentCard(ag.endpoint, { + signal: ac.signal, + timeoutMs: discoveryBudget, + }) + if (!card) return + const did = getPeerDID(card) + if (did) observedByName.set(ag.name, did) + }), + ) + } finally { + clearTimeout(timer) + } + } + // 3. Resolve session BEFORE opening SSE — required so subscribers can // filter events by sessionID. Any failure here returns plain JSON. let sessionCtx: SessionContext @@ -150,17 +181,18 @@ async function handleRequest( spawnReader(ac.signal, ownEvent(bus.subscribe(PromptEvent.ToolCallStart)), async (evt) => { const agentName = parseAgentFromTool(evt.properties.tool) + // Resolve the peer's DID with provenance. Pinned wins over + // observed (the caller vouched; observed is self-reported). + // Emitted on every task.* frame so SSE consumers can partition + // by `agent_did_source` if they only trust pinned claims. + const agentId = findAgentDID(request, observedByName, agentName) await stream.writeSSE({ event: "task.started", data: JSON.stringify({ task_id: evt.properties.callID, agent: agentName, - // Unique identifier for the peer. Agent names are - // operator-chosen and can collide across catalogs; DIDs - // are the stable cryptographic handle. Null when the - // request didn't pin a DID for this peer (auth.type="none" - // / "bearer" / "bearer_env" without trust.pinnedDID). - agent_did: findPinnedDID(request, agentName), + agent_did: agentId.did, + agent_did_source: agentId.source, skill: parseSkillFromTool(evt.properties.tool), input: evt.properties.input, }), @@ -178,12 +210,14 @@ async function handleRequest( evt.properties.signatures !== undefined ? { signatures: evt.properties.signatures } : {} + const agentId = findAgentDID(request, observedByName, agentName) await stream.writeSSE({ event: "task.artifact", data: JSON.stringify({ task_id: evt.properties.callID, agent: agentName, - agent_did: findPinnedDID(request, agentName), + agent_did: agentId.did, + agent_did_source: agentId.source, content: evt.properties.output, title: evt.properties.title, ...sigField, @@ -194,7 +228,8 @@ async function handleRequest( data: JSON.stringify({ task_id: evt.properties.callID, agent: agentName, - agent_did: findPinnedDID(request, agentName), + agent_did: agentId.did, + agent_did_source: agentId.source, state: evt.properties.error ? "failed" : "completed", ...(evt.properties.error ? { error: evt.properties.error } : {}), ...sigField, @@ -312,17 +347,40 @@ function parseSkillFromTool(toolId: string): string { return m?.[2] ?? "" } +/** Shape returned by findAgentDID — keeps DID + provenance together + * so every SSE frame can emit both without re-running the lookup. */ +export interface AgentDIDResolution { + readonly did: string | null + readonly source: "pinned" | "observed" | null +} + /** - * Resolve a peer's DID from the /plan request's agent catalog. + * Resolve a peer's DID with provenance, in precedence order: + * + * 1. ``trust.pinnedDID`` from the /plan request catalog — the caller + * explicitly declared which DID they expect. Strongest claim. + * 2. Observed DID from the peer's AgentCard (fetched upfront during + * /plan setup, keyed by agent name) — the peer self-reports this + * identity at /.well-known/agent.json. Weaker: an impostor can + * advertise any DID they like unless signature verification is on. + * 3. ``null`` — neither path resolved. Consumer can still identify + * the peer by name for display; cryptographic identity is + * unknown. * - * DIDs are optional in the API (callers can talk to ``auth.type="none"`` - * / ``"bearer"`` peers without ever pinning a DID), so this returns - * ``null`` when the catalog has no ``trust.pinnedDID`` for the named - * peer. SSE consumers treat ``null`` as "no cryptographic identity - * declared" — they can still identify the peer by name for display, - * just without the stable unique handle. + * The ``source`` field lets SSE consumers decide which guarantee they + * need. A consumer building an audit log of "calls made to peer X" + * might accept ``observed`` (human-readable correlation); a compliance + * gate might reject anything other than ``pinned``. */ -function findPinnedDID(request: PlanRequest, agentName: string): string | null { +export function findAgentDID( + request: PlanRequest, + observedByName: Map, + agentName: string, +): AgentDIDResolution { const entry = request.agents.find((a) => a.name === agentName) - return entry?.trust?.pinnedDID ?? null + const pinned = entry?.trust?.pinnedDID + if (pinned) return { did: pinned, source: "pinned" } + const observed = observedByName.get(agentName) + if (observed) return { did: observed, source: "observed" } + return { did: null, source: null } } diff --git a/gateway/src/bindu/client/agent-card.ts b/gateway/src/bindu/client/agent-card.ts new file mode 100644 index 00000000..5877c021 --- /dev/null +++ b/gateway/src/bindu/client/agent-card.ts @@ -0,0 +1,88 @@ +import { AgentCard } from "../protocol/agent-card" + +/** + * Fetch and parse a peer's AgentCard from `/.well-known/agent.json`. + * + * Populated via this helper, the `peer.card` field on a PeerDescriptor + * activates the DID fallback in `maybeVerifySignatures` — when the + * caller sets `trust.verifyDID: true` but didn't pin a DID, the + * gateway can recover the peer's published DID from its AgentCard and + * verify artifacts against the corresponding public key. + * + * # Cache + * + * Per-process, keyed by peer URL. AgentCards are stable for the life + * of the peer process; a gateway restart is an acceptable boundary + * for picking up a rotated identity. Negative results (404, malformed + * JSON, timeout) are cached too so a flaky peer doesn't cost us one + * outbound fetch per /plan. + * + * # Timeout + * + * 2 seconds by default — the fetch blocks the first call to a new + * peer, so we keep it short. Callers can override via the opts. + * + * # Errors become `null` + * + * Every failure mode (network error, non-2xx, invalid JSON, schema + * mismatch, abort) returns `null` rather than throwing. The fallback + * in maybeVerifySignatures degrades gracefully — null peer.card just + * means the pinnedDID path is the only option, same behavior as before + * this module existed. Errors aren't "safety failures" here; they're + * "couldn't enrich." + */ + +const cache = new Map() + +export interface FetchAgentCardOptions { + readonly signal?: AbortSignal + readonly timeoutMs?: number +} + +const DEFAULT_TIMEOUT_MS = 2000 +const WELL_KNOWN_PATH = "/.well-known/agent.json" + +export async function fetchAgentCard( + peerUrl: string, + opts: FetchAgentCardOptions = {}, +): Promise { + if (cache.has(peerUrl)) return cache.get(peerUrl) ?? null + + const timeoutMs = opts.timeoutMs ?? DEFAULT_TIMEOUT_MS + const ac = new AbortController() + const timer = setTimeout(() => ac.abort(), timeoutMs) + const onUpstreamAbort = () => ac.abort() + opts.signal?.addEventListener("abort", onUpstreamAbort, { once: true }) + + try { + const target = new URL(WELL_KNOWN_PATH, peerUrl).toString() + const res = await fetch(target, { signal: ac.signal }) + if (!res.ok) { + cache.set(peerUrl, null) + return null + } + const json = (await res.json()) as unknown + const parsed = AgentCard.safeParse(json) + if (!parsed.success) { + cache.set(peerUrl, null) + return null + } + cache.set(peerUrl, parsed.data) + return parsed.data + } catch { + cache.set(peerUrl, null) + return null + } finally { + clearTimeout(timer) + opts.signal?.removeEventListener("abort", onUpstreamAbort) + } +} + +/** + * Reset the AgentCard cache. Only intended for use in tests — production + * callers have no reason to evict (a gateway restart picks up any + * identity rotation). Exported on a `__`-prefixed name to signal intent. + */ +export function __resetAgentCardCacheForTests(): void { + cache.clear() +} diff --git a/gateway/src/bindu/client/index.ts b/gateway/src/bindu/client/index.ts index f3d9d085..5c7f571d 100644 --- a/gateway/src/bindu/client/index.ts +++ b/gateway/src/bindu/client/index.ts @@ -10,6 +10,7 @@ import { verifyArtifact } from "../identity/verify" import { createResolver, primaryPublicKeyBase58 } from "../identity/resolve" import { getPeerDID } from "../protocol/identity" import type { AgentCard } from "../protocol/agent-card" +import { fetchAgentCard } from "./agent-card" /** * Public Bindu client — the thing the planner's tools invoke. @@ -148,6 +149,22 @@ async function runCall( identity: LocalIdentity | undefined, tokenProvider: TokenProvider | undefined, ): Promise { + // Populate peer.card from /.well-known/agent.json on first contact. + // Cached per process — subsequent calls to the same peer are free. + // This activates the AgentCard-based DID fallback in + // maybeVerifySignatures: when the caller enabled `trust.verifyDID` + // without pinning a DID, the gateway recovers the peer's published + // DID here and verifies against its public key. + // + // Runs concurrently-safe (cache handles races via last-write-wins, + // AgentCards are stable), non-blocking on failure (returns null, + // peer.card stays undefined, verification falls through to the + // pinnedDID-only path). + if (!input.peer.card && input.peer.trust?.verifyDID) { + const card = await fetchAgentCard(input.peer.url, { signal: input.signal }) + if (card) input.peer.card = card + } + const parts: Part[] = typeof input.input === "string" ? [{ kind: "text", text: input.input }] : input.input diff --git a/gateway/src/planner/index.ts b/gateway/src/planner/index.ts index b58ce2ea..276af264 100644 --- a/gateway/src/planner/index.ts +++ b/gateway/src/planner/index.ts @@ -398,11 +398,15 @@ function buildSkillTool(peer: PeerDescriptor, skill: SkillRequest, deps: BuildTo }, }) - // 4. Wrap the output in an untrusted-content envelope for the planner + // 4. Wrap the output in an untrusted-content envelope for the planner. + // The `verified` attribute is four-valued (yes/no/unsigned/unknown) + // so the planner LLM can distinguish a cryptographic pass from + // "no signatures existed to check" — the latter used to appear + // as `verified="yes"` (vacuous) and quietly misled the model. const wrapped = wrapRemoteContent({ agentName: peer.name, did: peer.trust?.pinnedDID ?? null, - verified: outcome.signatures?.ok ?? null, + verified: computeVerifiedLabel(outcome.signatures ?? null), body: outputText, }) @@ -557,13 +561,46 @@ function extractOutputText(task: import("../bindu/protocol/types").Task): string return pieces.join("\n\n") } +/** + * Label the `verified` attribute on the envelope with + * enough fidelity that the planner LLM can distinguish the four real + * states verification can produce: + * + * - `"yes"` — at least one artifact carried a signature AND every + * signed artifact checked out against the pinned DID's + * public key. Strongest claim. + * - `"no"` — at least one signed artifact failed verification. + * Treat the body as definitely-tampered or + * wrong-provenance; the task is also marked `failed`. + * - `"unsigned"` — verification ran, but no artifact had a signature + * attached. Bodies came back; nothing was checked. + * Previously this collapsed into `"yes"` because the + * Bindu client's `signatures.ok` is vacuously true + * when `signed === 0` — misleading. Now it's + * explicit, so the planner can weight the source + * appropriately (e.g., treat as unverified hearsay). + * - `"unknown"` — verification was not attempted. Either + * `trust.verifyDID` was false, no pinned DID was + * supplied, or DID document resolution failed. + */ +export type VerifiedLabel = "yes" | "no" | "unsigned" | "unknown" + +export function computeVerifiedLabel( + signatures: { ok: boolean; signed: number; verified: number; unsigned: number } | null, +): VerifiedLabel { + if (signatures === null) return "unknown" + if (!signatures.ok) return "no" + if (signatures.signed === 0) return "unsigned" + return "yes" +} + function wrapRemoteContent(input: { agentName: string did: string | null - verified: boolean | null + verified: VerifiedLabel body: string }): string { - const verified = input.verified === null ? "unknown" : input.verified ? "yes" : "no" + const verified = input.verified const didAttr = input.did ? ` did="${escapeAttr(input.did)}"` : "" // Scrub nested envelope markers and common injection phrases so a // malicious peer can't forge a wrapper inside their own response. diff --git a/gateway/tests/api/find-agent-did.test.ts b/gateway/tests/api/find-agent-did.test.ts new file mode 100644 index 00000000..68eaaef1 --- /dev/null +++ b/gateway/tests/api/find-agent-did.test.ts @@ -0,0 +1,98 @@ +import { describe, it, expect } from "vitest" +import { findAgentDID } from "../../src/api/plan-route" +import type { PlanRequest } from "../../src/planner" + +/** + * Precedence test for `findAgentDID` — the resolver that decides which + * DID (and which provenance label) lands on each task.* SSE frame. + * + * Covers the fix for the "SSE agent_did doesn't surface observed DIDs" + * bug (BUGS_AND_KNOWN_ISSUES.md). Before: the helper only read + * `trust.pinnedDID`, so any caller that didn't pin saw `agent_did: + * null` even after fetchAgentCard had observed the peer's real DID. + * After: observed DIDs are the fallback, with `source` distinguishing + * the two. + */ + +const DID_PINNED = "did:bindu:ops_at_example_com:research:pinned-11111111" +const DID_OBSERVED = "did:bindu:ops_at_example_com:research:observed-22222222" + +const mkRequest = (pinnedFor: Record = {}): PlanRequest => + ({ + question: "test", + agents: Object.entries(pinnedFor).map(([name, pinned]) => ({ + name, + endpoint: `http://${name}.local`, + skills: [], + ...(pinned ? { trust: { pinnedDID: pinned } } : {}), + })), + }) as PlanRequest + +describe("findAgentDID — DID precedence and provenance", () => { + it("returns {did: pinned, source: 'pinned'} when pinnedDID is set, even if observed also exists", () => { + const req = mkRequest({ research: DID_PINNED }) + const observed = new Map([["research", DID_OBSERVED]]) + + expect(findAgentDID(req, observed, "research")).toEqual({ + did: DID_PINNED, + source: "pinned", + }) + }) + + it("returns {did: observed, source: 'observed'} when pinnedDID is absent but observed exists", () => { + const req = mkRequest({ research: undefined }) + const observed = new Map([["research", DID_OBSERVED]]) + + expect(findAgentDID(req, observed, "research")).toEqual({ + did: DID_OBSERVED, + source: "observed", + }) + }) + + it("returns {did: null, source: null} when neither path resolves", () => { + const req = mkRequest({ research: undefined }) + const observed = new Map() + + expect(findAgentDID(req, observed, "research")).toEqual({ + did: null, + source: null, + }) + }) + + it("returns {did: null, source: null} when the agent name isn't in the catalog at all", () => { + // The planner's `load_recipe` tool passes through this helper with + // agentName = "load_recipe" — never in the catalog, should cleanly + // resolve to null/null rather than throwing. + const req = mkRequest({ research: DID_PINNED }) + const observed = new Map([["research", DID_OBSERVED]]) + + expect(findAgentDID(req, observed, "load_recipe")).toEqual({ + did: null, + source: null, + }) + }) + + it("pinned precedence is per-agent — one agent can be pinned while another is observed", () => { + const req = mkRequest({ research: DID_PINNED, math: undefined }) + const observed = new Map([ + ["research", DID_OBSERVED], + ["math", "did:bindu:ops_at_example_com:math:observed-3333"], + ]) + + const r = findAgentDID(req, observed, "research") + const m = findAgentDID(req, observed, "math") + + expect(r.source).toBe("pinned") + expect(r.did).toBe(DID_PINNED) + expect(m.source).toBe("observed") + expect(m.did).toBe("did:bindu:ops_at_example_com:math:observed-3333") + }) + + it("observed-DID map is treated as authoritative for observed claims — does not cross-contaminate across names", () => { + const req = mkRequest({ "a": undefined, "b": undefined }) + const observed = new Map([["a", DID_OBSERVED]]) + + expect(findAgentDID(req, observed, "a").did).toBe(DID_OBSERVED) + expect(findAgentDID(req, observed, "b").did).toBeNull() + }) +}) diff --git a/gateway/tests/bindu/agent-card-fetch.test.ts b/gateway/tests/bindu/agent-card-fetch.test.ts new file mode 100644 index 00000000..6e1072f5 --- /dev/null +++ b/gateway/tests/bindu/agent-card-fetch.test.ts @@ -0,0 +1,155 @@ +import { describe, it, expect, beforeEach, vi, afterEach } from "vitest" +import { fetchAgentCard, __resetAgentCardCacheForTests } from "../../src/bindu/client/agent-card" + +/** + * Coverage for the AgentCard fetch helper — the piece that activates + * maybeVerifySignatures' observed-DID fallback when the caller enabled + * `trust.verifyDID: true` without pinning a DID. + * + * Resolves the "peer.card is never populated" high-severity entry in + * BUGS_AND_KNOWN_ISSUES.md: before this module, the fallback at + * bindu/client/index.ts:196 was dead code because nothing ever set + * peer.card. These tests pin the four real outcomes (success, 404, + * malformed body, cache hit) and the degrade-to-null contract on + * every failure mode. + */ + +const VALID_CARD = { + id: "did:bindu:ops_at_example_com:research:7dc57d21-2c81-f6f5-c679-e51995f97e22", + name: "research", + description: "Web search and summarize.", + skills: [], + defaultInputModes: ["text/plain"], + defaultOutputModes: ["text/plain"], + capabilities: { + extensions: [ + { + uri: "did:bindu:ops_at_example_com:research:7dc57d21-2c81-f6f5-c679-e51995f97e22", + }, + ], + }, +} + +describe("fetchAgentCard", () => { + beforeEach(() => { + __resetAgentCardCacheForTests() + }) + + afterEach(() => { + vi.restoreAllMocks() + }) + + it("fetches and parses a valid AgentCard from /.well-known/agent.json", async () => { + const fetchSpy = vi.spyOn(global, "fetch").mockResolvedValue( + new Response(JSON.stringify(VALID_CARD), { + status: 200, + headers: { "content-type": "application/json" }, + }), + ) + + const result = await fetchAgentCard("http://localhost:3773") + expect(result).not.toBeNull() + expect(result?.name).toBe("research") + expect(fetchSpy).toHaveBeenCalledWith( + "http://localhost:3773/.well-known/agent.json", + expect.objectContaining({ signal: expect.anything() }), + ) + }) + + it("returns null on non-2xx without throwing", async () => { + vi.spyOn(global, "fetch").mockResolvedValue( + new Response("not found", { status: 404 }), + ) + + const result = await fetchAgentCard("http://localhost:3773") + expect(result).toBeNull() + }) + + it("returns null on malformed JSON body", async () => { + vi.spyOn(global, "fetch").mockResolvedValue( + new Response("not a json body, just a string", { + status: 200, + headers: { "content-type": "application/json" }, + }), + ) + + const result = await fetchAgentCard("http://localhost:3773") + expect(result).toBeNull() + }) + + it("returns null when the body doesn't match the AgentCard schema", async () => { + vi.spyOn(global, "fetch").mockResolvedValue( + new Response(JSON.stringify({ wrong: "shape" }), { + status: 200, + headers: { "content-type": "application/json" }, + }), + ) + + const result = await fetchAgentCard("http://localhost:3773") + expect(result).toBeNull() + }) + + it("returns null and caches the failure when fetch throws", async () => { + vi.spyOn(global, "fetch").mockRejectedValue(new Error("ECONNREFUSED")) + + const result = await fetchAgentCard("http://localhost:3773") + expect(result).toBeNull() + }) + + it("caches successful results — second call does not re-fetch", async () => { + const fetchSpy = vi.spyOn(global, "fetch").mockResolvedValue( + new Response(JSON.stringify(VALID_CARD), { + status: 200, + headers: { "content-type": "application/json" }, + }), + ) + + const first = await fetchAgentCard("http://localhost:3773") + const second = await fetchAgentCard("http://localhost:3773") + + expect(first).toBe(second) // same cached reference + expect(fetchSpy).toHaveBeenCalledTimes(1) + }) + + it("caches failures too — second call does not re-fetch a known-bad peer", async () => { + const fetchSpy = vi.spyOn(global, "fetch").mockResolvedValue( + new Response("", { status: 404 }), + ) + + await fetchAgentCard("http://localhost:3773") + await fetchAgentCard("http://localhost:3773") + + expect(fetchSpy).toHaveBeenCalledTimes(1) + }) + + it("caches per-URL — a different peer is fetched independently", async () => { + const fetchSpy = vi.spyOn(global, "fetch").mockResolvedValue( + new Response(JSON.stringify(VALID_CARD), { status: 200 }), + ) + + await fetchAgentCard("http://localhost:3773") + await fetchAgentCard("http://localhost:3775") + + expect(fetchSpy).toHaveBeenCalledTimes(2) + const urls = fetchSpy.mock.calls.map((c) => c[0]) + expect(urls).toContain("http://localhost:3773/.well-known/agent.json") + expect(urls).toContain("http://localhost:3775/.well-known/agent.json") + }) + + it("aborts on timeout", async () => { + // Simulate a fetch that never resolves — the helper's internal + // timeout should abort. We use a short timeout and expect null. + vi.spyOn(global, "fetch").mockImplementation( + (_url, init) => + new Promise((_resolve, reject) => { + const signal = (init as { signal?: AbortSignal } | undefined)?.signal + signal?.addEventListener("abort", () => reject(new Error("aborted")), { + once: true, + }) + }), + ) + + const result = await fetchAgentCard("http://localhost:3773", { timeoutMs: 50 }) + expect(result).toBeNull() + }) +}) diff --git a/gateway/tests/planner/verified-label.test.ts b/gateway/tests/planner/verified-label.test.ts new file mode 100644 index 00000000..c3e98c0b --- /dev/null +++ b/gateway/tests/planner/verified-label.test.ts @@ -0,0 +1,63 @@ +import { describe, it, expect } from "vitest" +import { computeVerifiedLabel } from "../../src/planner" + +/** + * Regression test for the "vacuous verified=yes" bug. + * + * Before this fix, the envelope flattened any positive + * signatures.ok to verified="yes" — including the case where zero + * artifacts were signed (ok is vacuously true when there's nothing to + * fail). A planner LLM reading verified="yes" couldn't tell if a real + * cryptographic check happened or if the agent simply isn't signing. + * + * The fix splits the happy path: "yes" now means "signed > 0 && all + * verified" and a new "unsigned" label marks "verification ran but + * nothing was signed." See BUGS_AND_KNOWN_ISSUES.md §Security and + * correctness for the original issue. + */ + +describe("computeVerifiedLabel", () => { + it("returns 'unknown' when verification wasn't attempted (signatures null)", () => { + expect(computeVerifiedLabel(null)).toBe("unknown") + }) + + it("returns 'yes' when every signed artifact verified", () => { + expect( + computeVerifiedLabel({ ok: true, signed: 2, verified: 2, unsigned: 0 }), + ).toBe("yes") + }) + + it("returns 'yes' even when some artifacts were unsigned, as long as every signed one verified", () => { + expect( + computeVerifiedLabel({ ok: true, signed: 1, verified: 1, unsigned: 3 }), + ).toBe("yes") + }) + + it("returns 'unsigned' (not 'yes') when no artifacts carried signatures — prevents the vacuous pass", () => { + // This is the bug the fix addresses. Pre-fix: "yes". Post-fix: "unsigned". + expect( + computeVerifiedLabel({ ok: true, signed: 0, verified: 0, unsigned: 4 }), + ).toBe("unsigned") + }) + + it("returns 'unsigned' on an empty artifact list too", () => { + expect( + computeVerifiedLabel({ ok: true, signed: 0, verified: 0, unsigned: 0 }), + ).toBe("unsigned") + }) + + it("returns 'no' when at least one signed artifact failed verification", () => { + expect( + computeVerifiedLabel({ ok: false, signed: 2, verified: 1, unsigned: 0 }), + ).toBe("no") + }) + + it("returns 'no' even when signed === verified if ok is false (defensive against callers mutating counts)", () => { + // Shouldn't happen with the real Bindu client's output — ok follows + // deterministically from counts — but if any future caller constructs + // a signatures object by hand, `ok` is the authoritative field. + expect( + computeVerifiedLabel({ ok: false, signed: 1, verified: 1, unsigned: 0 }), + ).toBe("no") + }) +})