Merged
315 changes: 284 additions & 31 deletions bugs/known-issues.md

Large diffs are not rendered by default.

14 changes: 7 additions & 7 deletions gateway/docs/STORY.md → docs/GATEWAY.md
@@ -393,7 +393,7 @@ Five agents now, each on its own port:
| faq_agent | 3778 | Answers from a canned FAQ |

Each is ~60 lines of Python. Open any one - say
[joke_agent.py](../../examples/gateway_test_fleet/joke_agent.py) - and you'll see
[joke_agent.py](../examples/gateway_test_fleet/joke_agent.py) - and you'll see
a small configuration that wires a language model (`openai/gpt-4o-mini`)
to a few lines of instructions ("tell jokes, refuse other requests").
Narrow scope on purpose so mistakes are visible.
@@ -660,7 +660,7 @@ Where recipes shine:
payment-required`, surface the payment URL to the user and STOP - do
not retry" is a policy the planner wouldn't invent on its own. See the
seed recipe at
[gateway/recipes/payment-required-flow/RECIPE.md](../recipes/payment-required-flow/RECIPE.md)
[gateway/recipes/payment-required-flow/RECIPE.md](../gateway/recipes/payment-required-flow/RECIPE.md)
for a real example.
- **Tenant-specific rules.** A recipe visible only to a certain agent
can encode rules like "always include a disclaimer" or "always call
@@ -929,20 +929,20 @@ skip.

### Reference material

- **[gateway/openapi.yaml](../openapi.yaml)** - the machine-readable
- **[gateway/openapi.yaml](../gateway/openapi.yaml)** - the machine-readable
contract for `/plan`, `/health`, and `/.well-known/did.json`. Paste it
into [Swagger UI](https://editor.swagger.io) or
[Stoplight](https://stoplight.io) to click through every field,
response, and example. This is the source of truth; this document is
the prose.
- **[gateway/README.md](../README.md)** - the operator's reference:
- **[gateway/README.md](../gateway/README.md)** - the operator's reference:
configuration knobs, environment variables, the `/health` payload,
troubleshooting, and where vendored code came from (OpenCode). Short
and targeted - most of the narrative moved into this story.
- **[gateway/agents/planner.md](../agents/planner.md)** - the planner
- **[gateway/agents/planner.md](../gateway/agents/planner.md)** - the planner
LLM's system prompt. If the gateway is doing something you don't
expect, start here.
- **[gateway/recipes/](../recipes)** - the two seed recipes
- **[gateway/recipes/](../gateway/recipes)** - the two seed recipes
(`multi-agent-research`, `payment-required-flow`) plus whatever you
authored in Chapter 4. Each one is a complete example.

@@ -988,7 +988,7 @@ If you're moving this past localhost:
### When you're stuck

- Gateway won't boot: re-read the env var section of
[gateway/README.md](../README.md). Partial DID or Hydra config fails
[gateway/README.md](../gateway/README.md). Partial DID or Hydra config fails
fast with a message naming the missing var.
- Planner never calls a tool: the descriptions you gave for
`agents[].skills[].description` are probably too short or too vague.
2 changes: 1 addition & 1 deletion docs/openapi.yaml
@@ -595,7 +595,7 @@ paths:
- kind: "text"
text: "Paris."
metadata:
did.message.signature: "2M1qbfLcyoQhSAfTzDghw15PMTfmv3jUoigk7KRuiowkEWZpU7aYLHTnqwamjEo4SxNskq15PZANNLuhJ7omzsxg"
did.message.signature: "2M1qbfLcyoQhSAfTzDghw15PMTfmv3jUoigk7KRuiowkEWZpU7aYLHTnqwamjEo4SxNskq15PZANNLuhJ7omzsxg" # pragma: allowlist secret
artifact_id: "985b4f37-ee2e-48a4-bd6f-c66472e67b85"
metadata: {}

4 changes: 2 additions & 2 deletions examples/gateway_test_fleet/README.md
@@ -4,7 +4,7 @@ A reproducible multi-agent setup for exercising the Bindu Gateway end-to-end. Fi

## If you're new here

**Don't start with this folder - start with [`gateway/docs/STORY.md`](../../gateway/docs/STORY.md).** That's the guided walkthrough; this fleet is what it uses under the hood. By Chapter 3 of STORY.md you'll have all five agents running via `start_fleet.sh` and a gateway driving them.
**Don't start with this folder - start with [`docs/GATEWAY.md`](../../docs/GATEWAY.md).** That's the guided walkthrough; this fleet is what it uses under the hood. By Chapter 3 of STORY.md you'll have all five agents running via `start_fleet.sh` and a gateway driving them.

## What's in here

@@ -81,7 +81,7 @@ Each case writes its full SSE stream to `logs/<ID>.sse`. Open one end-to-end -

## Further reading

- [`gateway/docs/STORY.md`](../../gateway/docs/STORY.md) - the end-to-end story this fleet illustrates
- [`docs/GATEWAY.md`](../../docs/GATEWAY.md) - the end-to-end story this fleet illustrates
- [`gateway/openapi.yaml`](../../gateway/openapi.yaml) - machine-readable API contract for the gateway
- [`gateway/README.md`](../../gateway/README.md) - operator reference (env vars, /health, DID signing reference)
- [`gateway/recipes/`](../../gateway/recipes/) - seed playbooks you can copy-edit as templates
8 changes: 4 additions & 4 deletions gateway/README.md
@@ -10,7 +10,7 @@ A task-first orchestrator that sits between an **external system** and one or mo

## New here?

**Read [`docs/STORY.md`](./docs/STORY.md) first.** It's a 45-minute end-to-end walkthrough that goes from a clean clone to running three chained agents, authoring a recipe, and turning on DID signing. Written for readers with no prior AI-agent knowledge.
**Read [`docs/GATEWAY.md`](../docs/GATEWAY.md) first.** It's a 45-minute end-to-end walkthrough that goes from a clean clone to running three chained agents, authoring a recipe, and turning on DID signing. Written for readers with no prior AI-agent knowledge.

This README is the **operator's reference** - configuration, troubleshooting, and pointers into source. The narrative lives in STORY.md.

@@ -73,7 +73,7 @@ Returns a detailed JSON payload describing the gateway process - version, plan
}
```

For a runnable multi-agent walkthrough, see [`docs/STORY.md`](./docs/STORY.md) §Chapter 2-3.
For a runnable multi-agent walkthrough, see [`docs/GATEWAY.md`](../docs/GATEWAY.md) §Chapter 2-3.

---

@@ -125,7 +125,7 @@ Full request/response contract with examples: [`openapi.yaml`](./openapi.yaml).

Recipes are markdown playbooks the planner lazy-loads when a task matches. Only metadata (`name` + `description`) sits in the system prompt; the full body is fetched on demand via the `load_recipe` tool. Pattern borrowed from [OpenCode Skills](https://opencode.ai/docs/skills/), renamed to avoid collision with A2A `SkillRequest` (an agent capability on the `/plan` request body).

**Author one in two minutes** - see [`docs/STORY.md`](./docs/STORY.md) §Chapter 4 for the walkthrough. The reference:
**Author one in two minutes** - see [`docs/GATEWAY.md`](../docs/GATEWAY.md) §Chapter 4 for the walkthrough. The reference:

### Layouts

@@ -175,7 +175,7 @@ Default action is `allow` - an agent with no `recipe:` rules sees everything.

For peers configured with `auth.type = "did_signed"`, the gateway signs each outbound A2A request with an Ed25519 identity. Peers verify against the gateway's public key (published at `/.well-known/did.json`) and reject mismatches.

**Full walkthrough** - [`docs/STORY.md`](./docs/STORY.md) §Chapter 5. The reference:
**Full walkthrough** - [`docs/GATEWAY.md`](../docs/GATEWAY.md) §Chapter 5. The reference:

### Two modes

64 changes: 59 additions & 5 deletions gateway/openapi.yaml
@@ -937,7 +937,7 @@ components:

SSEEvent_TaskStarted:
type: object
required: [task_id, agent, agent_did, skill, input]
required: [task_id, agent, agent_did, agent_did_source, skill, input]
properties:
task_id:
type: string
@@ -947,7 +947,15 @@ components:
description: Display name of the peer agent (from `agents[].name`).
agent_did:
type: [string, "null"]
description: Pinned DID for the agent (from `agents[].trust.pinnedDID`), or null if not pinned.
description: |
The peer's DID, resolved with precedence pinned → observed → null:
(a) `trust.pinnedDID` from the /plan catalog if set; otherwise
(b) the DID the peer published at `/.well-known/agent.json`,
fetched by the gateway at plan-open time; otherwise
(c) `null` - cryptographic identity undeclared.
See `agent_did_source` for which path resolved it.
agent_did_source:
$ref: "#/components/schemas/AgentDIDSource"
skill:
type: string
description: Skill id being invoked on the peer.
@@ -961,20 +969,35 @@ components:

SSEEvent_TaskArtifact:
type: object
required: [task_id, agent, agent_did, content]
required: [task_id, agent, agent_did, agent_did_source, content]
properties:
task_id:
type: string
agent:
type: string
agent_did:
type: [string, "null"]
description: Same resolution rules as on `task.started` - pinned → observed → null.
agent_did_source:
$ref: "#/components/schemas/AgentDIDSource"
content:
type: string
description: |
The peer's artifact text, wrapped in a `<remote_content agent="..." did="..." verified="yes|no|unknown">...</remote_content>`
The peer's artifact text, wrapped in a
`<remote_content agent="..." did="..." verified="...">...</remote_content>`
envelope. The planner treats this as untrusted data - clients
should too.

`verified` is four-valued:
- `yes` → at least one signed artifact and all signed
verified against the pinned DID's public key.
Strongest guarantee.
- `no` → at least one signed artifact failed
verification. Task is also marked `failed`.
- `unsigned` → verification ran but no artifact carried a
signature. The body is unverified hearsay.
- `unknown` → verification wasn't attempted (no `verifyDID`,
no `pinnedDID`, or DID doc unreachable).
title:
type: string
description: Short display title, typically `@<agent>/<skill>`.
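
Reviewer note: a consumer reading `task.artifact` frames may want to pull the `verified` attribute out of the `<remote_content>` envelope before acting on the body. The sketch below is one way to do that, not code from this PR. `parseRemoteContent`, `toVerified`, and `hasVerifiedSignature` are hypothetical names, and the regular expression assumes the attribute order shown in the schema description with double-quoted values and no escaped quotes.

```typescript
// Hypothetical consumer-side helper; not part of the gateway source.
type Verified = "yes" | "no" | "unsigned" | "unknown"

interface RemoteContent {
  agent: string
  did: string
  verified: Verified
  body: string
}

// Assumption: attributes appear in the order shown in the schema
// description and are double-quoted with no escaped quotes inside.
const ENVELOPE =
  /^<remote_content agent="([^"]*)" did="([^"]*)" verified="([^"]*)">([\s\S]*)<\/remote_content>$/

function toVerified(raw: string): Verified {
  // Fail closed: anything outside the documented values becomes "unknown".
  return raw === "yes" || raw === "no" || raw === "unsigned" ? raw : "unknown"
}

function parseRemoteContent(content: string): RemoteContent | null {
  const m = ENVELOPE.exec(content.trim())
  if (!m) return null
  const [, agent = "", did = "", rawVerified = "", body = ""] = m
  return { agent, did, verified: toVerified(rawVerified), body }
}

// Only `yes` carries a cryptographic guarantee; the other three do not.
function hasVerifiedSignature(rc: RemoteContent): boolean {
  return rc.verified === "yes"
}
```

Mapping unrecognized values to `unknown` is a design choice of this sketch: a client that predates a future fifth value then treats it as unverified rather than trusted.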
@@ -993,14 +1016,17 @@ components:

SSEEvent_TaskFinished:
type: object
required: [task_id, agent, agent_did, state]
required: [task_id, agent, agent_did, agent_did_source, state]
properties:
task_id:
type: string
agent:
type: string
agent_did:
type: [string, "null"]
description: Same resolution rules as on `task.started` - pinned → observed → null.
agent_did_source:
$ref: "#/components/schemas/AgentDIDSource"
state:
type: string
enum: [completed, failed]
@@ -1052,6 +1078,34 @@ components:
description: Empty object. Last frame of every successful plan.
additionalProperties: false

AgentDIDSource:
type: [string, "null"]
enum: [pinned, observed, null]
description: |
Provenance of the `agent_did` on the same SSE frame. Tells
consumers which of three paths resolved the DID, so they can
apply the right trust policy:

- `"pinned"` - the caller declared `trust.pinnedDID` in the
/plan catalog. The caller vouched for this identity; the
gateway enforces it when `verifyDID: true` is also set.
Strongest claim a consumer can get out of this field.
- `"observed"` - the peer self-reported this DID in its
`/.well-known/agent.json` AgentCard, fetched by the gateway
at plan-open time. Weaker than pinned: an impostor standing
up a fake endpoint can advertise any DID they choose unless
signature verification is on.
- `null` - neither path resolved. Either the caller didn't pin
AND the AgentCard couldn't be fetched (no `.well-known`,
network failure, malformed), or the AgentCard had no DID in
either `id` or `capabilities.extensions[].uri`.

Compliance-gated consumers should treat `"observed"` and
`null` identically unless they also see a `signatures.ok:true`
with `signed > 0` on the same or a following frame - that's
the cryptographic evidence that promotes an observed DID to a
verified one.

PlanSignatures:
type: [object, "null"]
description: |
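
Reviewer note: the `AgentDIDSource` description implies a small trust policy on the consumer side. The sketch below is one possible reading, not code from this PR. `TaskFrame`, `TrustLevel`, and `classifyFrame` are hypothetical names, and the `signatures` shape is assumed from the `ok` and `signed` fields the description mentions. It only inspects the current frame, so a consumer that needs the "following frame" case would have to carry state across frames.

```typescript
// Hypothetical consumer-side policy; names and shapes are assumptions.
type AgentDIDSource = "pinned" | "observed" | null

interface TaskFrame {
  agent: string
  agent_did: string | null
  agent_did_source: AgentDIDSource
  // Assumed shape, based only on `signatures.ok` and `signed`
  // as referenced in the AgentDIDSource description.
  signatures?: { ok: boolean; signed: number } | null
}

type TrustLevel = "pinned" | "verified-observed" | "untrusted"

function classifyFrame(frame: TaskFrame): TrustLevel {
  // The caller vouched for this DID in the /plan catalog.
  if (frame.agent_did_source === "pinned") return "pinned"

  // An observed DID is promoted only by cryptographic evidence:
  // at least one signed artifact, and all signatures verified.
  const sig = frame.signatures
  if (frame.agent_did_source === "observed" && sig?.ok === true && sig.signed > 0) {
    return "verified-observed"
  }

  // Observed without signatures and null are treated identically.
  return "untrusted"
}
```

A compliance gate would accept only `"pinned"`, while an audit log could record all three levels alongside the agent name.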
92 changes: 75 additions & 17 deletions gateway/src/api/plan-route.ts
@@ -12,6 +12,8 @@ import {
import { Service as BusService, type Interface as BusInterface } from "../bus"
import { Service as ConfigService, type Config } from "../config"
import { PromptEvent } from "../session/prompt"
import { fetchAgentCard } from "../bindu/client/agent-card"
import { getPeerDID } from "../bindu/protocol/identity"
import type { z } from "zod"

/**
@@ -92,6 +94,35 @@ async function handleRequest(
)
}

// 2b. Pre-fetch each peer's AgentCard in parallel (total ≤2s budget)
// so we can surface observed DIDs in SSE even when the caller
// didn't pin one. Results are cached in fetchAgentCard's
// per-process Map - the Bindu client's downstream runCall will
// hit the same cache for free. Failures don't block: individual
// peer AgentCards default to "not observed", `agent_did` stays
// null for that peer.
const observedByName = new Map<string, string>()
{
const discoveryBudget = 2000
const ac = new AbortController()
const timer = setTimeout(() => ac.abort(), discoveryBudget)
try {
await Promise.allSettled(
request.agents.map(async (ag) => {
const card = await fetchAgentCard(ag.endpoint, {
signal: ac.signal,
timeoutMs: discoveryBudget,
})
if (!card) return
const did = getPeerDID(card)
if (did) observedByName.set(ag.name, did)
}),
)
} finally {
clearTimeout(timer)
}
}

// 3. Resolve session BEFORE opening SSE - required so subscribers can
// filter events by sessionID. Any failure here returns plain JSON.
let sessionCtx: SessionContext
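
Reviewer note: the pre-fetch step in this hunk shares one abort timer across all peers and uses `Promise.allSettled` so a failing or slow peer never blocks the others. The standalone sketch below shows that pattern in isolation. `discoverDIDs`, `CardFetcher`, and `fakeFetchCard` are hypothetical stand-ins for the real `fetchAgentCard` and `getPeerDID`, whose signatures are not reproduced here.

```typescript
// Hypothetical stand-ins; the real fetchAgentCard/getPeerDID may differ.
interface Peer { name: string; endpoint: string }
type Card = { did?: string } | null
type CardFetcher = (endpoint: string, signal: AbortSignal) => Promise<Card>

async function discoverDIDs(
  peers: Peer[],
  fetchCard: CardFetcher,
  budgetMs: number,
): Promise<Map<string, string>> {
  const observed = new Map<string, string>()
  const ac = new AbortController()
  // One timer for the whole batch: the budget is total, not per peer.
  const timer = setTimeout(() => ac.abort(), budgetMs)
  try {
    // allSettled never rejects, so one bad peer cannot fail the batch.
    await Promise.allSettled(
      peers.map(async (p) => {
        const card = await fetchCard(p.endpoint, ac.signal)
        if (card?.did) observed.set(p.name, card.did)
      }),
    )
  } finally {
    clearTimeout(timer)
  }
  return observed
}

// Fake fetcher for illustration: one fast peer, one failing peer,
// and one that hangs until the shared signal aborts it.
const fakeFetchCard: CardFetcher = (endpoint, signal) =>
  new Promise<Card>((resolve, reject) => {
    if (endpoint === "fast") return resolve({ did: "did:example:fast" })
    if (endpoint === "broken") return reject(new Error("connection refused"))
    signal.addEventListener("abort", () => reject(new Error("aborted")), { once: true })
  })
```

With the fake fetcher, only the fast peer ends up in the returned map; the broken and hanging peers are simply absent, which matches the "not observed" default described in the comment above.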
@@ -150,17 +181,18 @@ async function handleRequest(

spawnReader(ac.signal, ownEvent(bus.subscribe(PromptEvent.ToolCallStart)), async (evt) => {
const agentName = parseAgentFromTool(evt.properties.tool)
// Resolve the peer's DID with provenance. Pinned wins over
// observed (the caller vouched; observed is self-reported).
// Emitted on every task.* frame so SSE consumers can partition
// by `agent_did_source` if they only trust pinned claims.
const agentId = findAgentDID(request, observedByName, agentName)
await stream.writeSSE({
event: "task.started",
data: JSON.stringify({
task_id: evt.properties.callID,
agent: agentName,
// Unique identifier for the peer. Agent names are
// operator-chosen and can collide across catalogs; DIDs
// are the stable cryptographic handle. Null when the
// request didn't pin a DID for this peer (auth.type="none"
// / "bearer" / "bearer_env" without trust.pinnedDID).
agent_did: findPinnedDID(request, agentName),
agent_did: agentId.did,
agent_did_source: agentId.source,
skill: parseSkillFromTool(evt.properties.tool),
input: evt.properties.input,
}),
@@ -178,12 +210,14 @@ async function handleRequest(
evt.properties.signatures !== undefined
? { signatures: evt.properties.signatures }
: {}
const agentId = findAgentDID(request, observedByName, agentName)
await stream.writeSSE({
event: "task.artifact",
data: JSON.stringify({
task_id: evt.properties.callID,
agent: agentName,
agent_did: findPinnedDID(request, agentName),
agent_did: agentId.did,
agent_did_source: agentId.source,
content: evt.properties.output,
title: evt.properties.title,
...sigField,
@@ -194,7 +228,8 @@ async function handleRequest(
data: JSON.stringify({
task_id: evt.properties.callID,
agent: agentName,
agent_did: findPinnedDID(request, agentName),
agent_did: agentId.did,
agent_did_source: agentId.source,
state: evt.properties.error ? "failed" : "completed",
...(evt.properties.error ? { error: evt.properties.error } : {}),
...sigField,
@@ -312,17 +347,40 @@ function parseSkillFromTool(toolId: string): string {
return m?.[2] ?? ""
}

/** Shape returned by findAgentDID - keeps DID + provenance together
* so every SSE frame can emit both without re-running the lookup. */
export interface AgentDIDResolution {
readonly did: string | null
readonly source: "pinned" | "observed" | null
}

/**
* Resolve a peer's DID from the /plan request's agent catalog.
* Resolve a peer's DID with provenance, in precedence order:
*
* 1. ``trust.pinnedDID`` from the /plan request catalog - the caller
* explicitly declared which DID they expect. Strongest claim.
* 2. Observed DID from the peer's AgentCard (fetched upfront during
* /plan setup, keyed by agent name) - the peer self-reports this
* identity at /.well-known/agent.json. Weaker: an impostor can
* advertise any DID they like unless signature verification is on.
* 3. ``null`` - neither path resolved. Consumer can still identify
* the peer by name for display; cryptographic identity is
* unknown.
*
* DIDs are optional in the API (callers can talk to ``auth.type="none"``
* / ``"bearer"`` peers without ever pinning a DID), so this returns
* ``null`` when the catalog has no ``trust.pinnedDID`` for the named
* peer. SSE consumers treat ``null`` as "no cryptographic identity
* declared" β€” they can still identify the peer by name for display,
* just without the stable unique handle.
* The ``source`` field lets SSE consumers decide which guarantee they
* need. A consumer building an audit log of "calls made to peer X"
* might accept ``observed`` (human-readable correlation); a compliance
* gate might reject anything other than ``pinned``.
*/
function findPinnedDID(request: PlanRequest, agentName: string): string | null {
export function findAgentDID(
request: PlanRequest,
observedByName: Map<string, string>,
agentName: string,
): AgentDIDResolution {
const entry = request.agents.find((a) => a.name === agentName)
return entry?.trust?.pinnedDID ?? null
const pinned = entry?.trust?.pinnedDID
if (pinned) return { did: pinned, source: "pinned" }
const observed = observedByName.get(agentName)
if (observed) return { did: observed, source: "observed" }
return { did: null, source: null }
}
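
Reviewer note: the pinned → observed → null precedence is easy to exercise in isolation. The sketch below copies the body of `findAgentDID` from this hunk and swaps the real `PlanRequest` type for a trimmed, hypothetical `PlanRequestLike` so it runs standalone. The DIDs and the `unlisted_agent` name are made up for illustration.

```typescript
// Trimmed, hypothetical stand-in for the real PlanRequest type.
interface PlanRequestLike {
  agents: { name: string; trust?: { pinnedDID?: string } }[]
}

interface AgentDIDResolution {
  readonly did: string | null
  readonly source: "pinned" | "observed" | null
}

// Same logic as the function in the diff above.
function findAgentDID(
  request: PlanRequestLike,
  observedByName: Map<string, string>,
  agentName: string,
): AgentDIDResolution {
  const entry = request.agents.find((a) => a.name === agentName)
  const pinned = entry?.trust?.pinnedDID
  if (pinned) return { did: pinned, source: "pinned" }
  const observed = observedByName.get(agentName)
  if (observed) return { did: observed, source: "observed" }
  return { did: null, source: null }
}

const request: PlanRequestLike = {
  agents: [
    { name: "joke_agent", trust: { pinnedDID: "did:example:pinned" } },
    { name: "faq_agent" },
  ],
}
const observedByName = new Map<string, string>([
  ["joke_agent", "did:example:self-reported"],
  ["faq_agent", "did:example:faq"],
])

// Pinned wins even though the peer self-reported a different DID.
const pinnedCase = findAgentDID(request, observedByName, "joke_agent")
// No pin, so the self-reported DID is used and labeled as such.
const observedCase = findAgentDID(request, observedByName, "faq_agent")
// Neither pinned nor observed.
const unknownCase = findAgentDID(request, observedByName, "unlisted_agent")

console.log(pinnedCase.source, observedCase.source, unknownCase.source)
```

Note that a mismatch between the pinned and self-reported DID is not flagged by this function; it only reports which source won.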