Skip to content

feat: add node discovery protocol for split-origin nodes#373

Merged
peteski22 merged 14 commits into
mainfrom
feature/node-discovery-protocol
May 16, 2026
Merged

feat: add node discovery protocol for split-origin nodes#373
peteski22 merged 14 commits into
mainfrom
feature/node-discovery-protocol

Conversation

@peteski22
Copy link
Copy Markdown
Collaborator

@peteski22 peteski22 commented May 15, 2026

Summary

  • Introduces an optional discovery document at /.well-known/cq-node.json declaring a cq node's api_base_url and api_version. Absence of the document means defaults ({addr}/api/v1, v1), preserving today's same-origin behavior; presence lets split-origin deployments declare their topology without leaking internal hostnames into the CLI UX.
  • Adds the Go discovery library (sdk/go/discovery/) — a functional-options Resolver (WithCacheDir, WithHTTPClient, WithLogger) that probes the well-known endpoint and validates strictly (JSON-Schema-conformant document, url.Parse-validated api_base_url, rejected unknown fields, rejected unsupported document and API versions), caches results on disk under XDG with re-validation on every read. Threads the resolver through the SDK and CLI in place of hardcoded /api/v1 prefixes; pointing the CLI at a SPA host by mistake now surfaces a clear domain-terms error instead of invalid character '<' looking for beginning of value.
  • Adds cq.WithLogger(*slog.Logger) on the SDK Client and plumbs it through to the discovery resolver. The library defaults to slog.DiscardHandler so it writes nothing to stdout or stderr without explicit caller wiring (important for MCP-over-stdio where stdout carries JSONRPC). Cache-write failures surface as structured Warn records when a logger is wired; default callers stay silent.
  • Reference server (./server) explicitly returns 404 for the well-known path so the combined Docker image's SPA catch-all does not intercept it.

Test plan

  • make lint and make test green at repo root (schema, sdk/go, sdk/python, cli, server backend, frontend).
  • Unit tests cover the cache (mtime TTL, atomic writes, invalidation, schema-invalid entries treated as miss and removed), the Resolver (404 → defaults, valid doc → parsed, HTML → error, malformed JSON → error, unknown-field → error, document version mismatch → error, API version mismatch → error, hostless / non-http(s) api_base_url → error, 5xx retry, in-process and on-disk caching of success and 404 fallback, file cache disabled when WithCacheDir is not passed), and the options surface (each WithX rejects invalid inputs; defaults co-located in defaultOptions).
  • Logger contract: TestResolverIsSilentByDefault verifies no record reaches the discard handler even when the cache write fails; TestResolverLogsCachePutFailure verifies a structured Warn record surfaces with addr and err fields when a logger is wired.
  • Regression tests in SDK and CLI pin the existing /api/v1/<resource> URL shape on the default-when-404 path.
  • End-to-end against a live compose deployment: with the server returning 404, CLI falls back and caches {api_base_url: http://localhost:3000/api/v1, api_version: v1}; with the server temporarily publishing a real discovery doc, CLI uses the advertised api_base_url and caches the full document including node_name.

Notes for reviewers

  • The schema (schema/node_discovery.json) and operator docs (docs/node-discovery-protocol.md) land in the first two commits and have no consumer; the Go library lands next; SDK/CLI wiring and the server-side 404 land after that; later commits address review feedback (Version field on NodeInfo, DisallowUnknownFields probe decoding, url.Parse-based api_base_url validation, cache re-validation on read, best-effort cache-dir lookup, logger injection, functional options on discovery.New). Reviewing commit-by-commit may be easier than the full diff.
  • gojsonschema (the same validator mcpd uses) is added to schema/go.mod so the fixtures are validated in Go, not just via make validate.
  • A single discovery.DefaultCacheDir() (XDG-compliant) is shared between the SDK and the CLI; cache files live at ~/.cache/cq/discovery/<sha256(addr)>.json.
  • discovery.New(opts ...Option) follows the existing ClientOption pattern in sdk/go/options.go and mcpd's cache.Options precedent. defaultOptions() is the single source of truth for default values.
  • Two refactor(sdk): commits reorder files to match the repo's alphabetical-methods convention; surface area is unchanged.
  • Python SDK parity is a separate follow-up PR per the planning doc, not blocking this one.

peteski22 added 9 commits May 15, 2026 22:43
Introduces the JSON schema for /.well-known/cq-node.json, the optional
discovery document a cq node serves to declare its API base URL and
protocol version. Same-origin nodes (default ./server, localhost) need
not serve this document; clients fall back to {addr}/api/v1 and
api_version=v1 in its absence.

Adds three fixtures (minimal same-origin, split-origin, and an invalid
api_version for the negative test) plus Go tests that validate the
fixtures against the compiled schema via gojsonschema.
Documents /.well-known/cq-node.json: when and how to serve it, the
client behavior contract, and operator recipes for same-origin,
split-origin, and reverse-proxy deployments. References the JSON
schema added in the previous commit.
Introduces the sdk/go/discovery package. This commit adds the
protocol constants, the NodeInfo type, and a disk-backed cache for
discovery results. Cache entries are keyed by SHA-256 of the
user-supplied address and stored as one JSON file per address with
atomic write semantics; entries expire by file mtime against a
configurable TTL.

No resolver yet — that lands in the next commit.
Adds Resolver, the public face of the discovery package. Probes
{addr}/.well-known/cq-node.json, retries on 5xx, falls back to
defaults on 404, errors clearly on HTML responses (the SPA-vs-API
mistake) and on api_version mismatches. Results are memoized
in-process and on disk via the cache added in the previous commit.

The Resolver is not yet wired into the SDK or CLI — that change
threads it through the existing remote/HTTP clients and replaces
their hardcoded /api/v1 prefixes.
…nly docstrings

Methods on Resolver are now in alphabetical order within their
visibility band (Resolve, then cacheInMemory, fetchWithRetry, probe).
Package-level helpers stay alphabetical (defaultsFor, validate).

Docstrings rewritten to describe each unit's intent and contract
without referencing callers or downstream behavior. The for loop in
fetchWithRetry is modernized to range-over-int.

No behavior change; all tests pass.
cache methods now appear in alphabetical order (get, invalidate,
pathFor, put), matching the convention used in discovery.go and
elsewhere in sdk/go.
remoteClient now holds the user-supplied node address and a
Resolver instead of a precomputed baseURL with hardcoded /api/v1.
Each request resolves (cached) via the discovery protocol; the
404-fallback case preserves the previous behavior of hitting
{addr}/api/v1/<resource>.

client.go constructs a default discovery.Resolver rooted at an
XDG-compliant cache directory. SDK tests inject a static resolver
to keep the existing /api/v1/... path assertions intact, and a new
regression test exercises the real Resolver to pin the default-on-
404 contract.

Client and CLI integration test helpers wrap their httptest
handlers so the well-known discovery path returns 404, keeping the
existing fallback path under test without rewriting every fixture.
The auth client previously prepended a hardcoded /api/v1 prefix to
every request path. It now resolves the API base URL through the
discovery protocol; the 404-fallback case preserves the prior
behavior of hitting {addr}/api/v1/<resource>.

NewClient constructs a default discovery.Resolver rooted at an
XDG-compliant cache directory. Tests inject a static resolver so the
existing /api/v1/... path assertions continue to hold, and a new
regression test exercises the real Resolver to pin the default-on-404
contract.

To avoid duplicating the XDG cache-dir helper across the SDK and
CLI, the helper moves to discovery.DefaultCacheDir; both callers now
consume it from a single source.
The combined Docker image mounts a SPA catch-all that serves
index.html for any unmatched path. Without this explicit route, cq
clients probing /.well-known/cq-node.json see Content-Type: text/html
and treat the node as a SPA host rather than falling back to the
documented defaults.

A 404 is the correct answer here: the reference server does not know
its own public URL, so it cannot synthesize a valid discovery
document. Operators who need to advertise a custom api_base_url
should serve the document from a reverse proxy in front of this
server.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a node discovery protocol that lets CQ clients resolve a user-facing node address (--addr) into a concrete API base URL via an optional /.well-known/cq-node.json document, while preserving today’s default same-origin behavior when the document is absent (404). This is threaded through the Go SDK and CLI to replace hardcoded /api/v1 prefixes, and the reference server explicitly returns JSON 404 at the well-known path to avoid SPA catch-all HTML responses.

Changes:

  • Introduces discovery schema + fixtures + documentation for /.well-known/cq-node.json.
  • Adds a Go discovery resolver with in-process and on-disk caching, and wires it into the Go SDK remote client + CLI auth client.
  • Updates the reference backend to explicitly 404 (JSON) on the well-known path and adds coverage to ensure that behavior.

Reviewed changes

Copilot reviewed 27 out of 28 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
server/backend/tests/test_app.py Adds backend test asserting /.well-known/cq-node.json returns JSON 404.
server/backend/src/cq_server/app.py Registers an explicit FastAPI route returning 404 JSON for the well-known discovery path to avoid SPA catch-all.
sdk/go/remote.go Switches remote client URL construction from hardcoded /api/v1 to discovery-resolved api_base_url.
sdk/go/remote_test.go Updates remote tests to inject a resolver and adds wiring regression test for default-on-404 behavior.
sdk/go/discovery/types.go Introduces discovery protocol constants and NodeInfo type.
sdk/go/discovery/paths.go Adds XDG-compliant default cache directory helper.
sdk/go/discovery/discovery.go Implements discovery resolver with retry + caching + validation of resolved node info.
sdk/go/discovery/discovery_test.go Adds resolver unit tests (404 fallback, valid doc, HTML/malformed JSON errors, retries, caching, slash normalization).
sdk/go/discovery/cache.go Implements on-disk cache keyed by SHA-256 of address with TTL and atomic writes.
sdk/go/discovery/cache_test.go Adds cache unit tests (hit/miss, TTL, invalidation, temp file cleanup).
sdk/go/client.go Wires discovery cache dir + resolver into Go SDK client construction for remote mode.
sdk/go/client_test.go Wraps test servers to return 404 on well-known discovery path and isolates cache dir via XDG_CACHE_HOME.
schema/node_discovery.json Adds JSON Schema for the discovery document.
schema/node_discovery_test.go Adds Go-based schema fixture validation using gojsonschema.
schema/Makefile Extends fixture validation to include node discovery fixtures via check-jsonschema.
schema/go.mod Adds gojsonschema dependency for schema tests.
schema/go.sum Updates module sums for new schema validation dependencies.
schema/fixtures/node_discovery_split.json Adds split-origin discovery fixture.
schema/fixtures/node_discovery_minimal.json Adds minimal valid discovery fixture.
schema/fixtures/node_discovery_invalid_version.json Adds invalid fixture for schema negative testing.
schema/embed.go Embeds node_discovery.json into the schema Go package.
schema/embed_test.go Ensures embedded node discovery schema accessor returns non-empty JSON.
docs/node-discovery-protocol.md Documents the discovery protocol, schema, and client/operator behavior.
cli/internal/auth/platform_client.go Updates CLI auth client construction to use discovery resolver based on node address.
cli/internal/auth/http_client.go Refactors CLI auth HTTP client to build requests using discovery-resolved API base URL (removes hardcoded /api/v1).
cli/internal/auth/http_client_test.go Updates tests to bypass real discovery for path assertions and adds a wiring regression test for default-on-404 behavior.
cli/internal/auth/api_keys_test.go Switches tests to a static-resolver-backed client to keep /api/v1/... assertions stable without discovery I/O.
cli/cmd/cmd_test.go Wraps CLI command tests’ fake remotes to 404 discovery and isolates cache dir via XDG_CACHE_HOME.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread sdk/go/discovery/types.go
Comment thread sdk/go/discovery/discovery.go
Comment thread sdk/go/discovery/cache.go
Comment thread cli/internal/auth/platform_client.go
Addresses four review comments on the initial discovery package:

1. NodeInfo now carries the document schema Version. The resolver
   rejects documents whose Version does not match
   SupportedDiscoveryVersion (currently 1), so a future incompatible
   schema cannot be silently parsed with this client's assumptions.

2. Probe decoding uses json.Decoder with DisallowUnknownFields, which
   matches the JSON Schema's additionalProperties:false constraint and
   rejects documents with unexpected fields.

3. cache.get re-runs decodeNodeInfo and validate on every read.
   Stale-format, hand-edited, or otherwise schema-invalid cache files
   are treated as a miss and removed from disk so subsequent lookups
   are clean misses.

4. discovery.New treats an empty cacheDir as "disable the file cache"
   instead of constructing a cache that would Stat/read entries
   relative to the process's working directory. Resolve nil-checks the
   file cache so platform_client.go's best-effort cacheDir lookup
   degrades cleanly to in-process-only memoization.

defaultsFor now sets Version on the synthesized fallback NodeInfo so
the in-process memo and any cache write of the 404-fallback result
carry the same shape as a probe-resolved one.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 27 out of 28 changed files in this pull request and generated 2 comments.

Comment thread sdk/go/client.go Outdated
Comment thread sdk/go/discovery/discovery.go Outdated
Two follow-up review fixes on the discovery package:

1. sdk/go's NewClient now treats discovery.DefaultCacheDir failure as
   best-effort (matching the CLI). With New() supporting an empty
   cacheDir by disabling the file cache, a restricted or container
   environment without HOME or XDG_CACHE_HOME still constructs a
   working Client backed by in-process memoization only.

2. validate parses api_base_url via url.Parse and requires both a
   non-empty Scheme (http or https) and a non-empty Host. Previously
   a value of "https://" passed the HasPrefix check and produced
   garbage when callers ran url.JoinPath against it. The new check
   surfaces the malformed value at resolve time with an actionable
   message naming the missing piece.

New tests pin the two malformed-URL cases.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 27 out of 28 changed files in this pull request and generated 2 comments.

Comment thread sdk/go/remote.go
Comment thread sdk/go/discovery/discovery.go
…l options

Adds WithLogger(*slog.Logger) on the SDK Client and threads the logger
through to discovery.New as a functional option. The library defaults
to slog.DiscardHandler so it writes nothing to stdout or stderr
without explicit caller wiring; this matters especially for the MCP
server where stdout carries JSONRPC and stray writes would corrupt
the channel.

discovery.New(cacheDir, httpc, logger) becomes discovery.New(opts...)
matching the ClientOption pattern already in sdk/go/options.go and
mcpd's cache.Options precedent. The three options validate strictly:
WithCacheDir trims and requires a non-empty absolute path (relative
paths cause reads relative to the process cwd, the failure mode that
prompted the earlier "empty cacheDir" review). WithHTTPClient and
WithLogger reject nil. Absence of WithCacheDir is the disabled-cache
state; callers with an optional cache dir conditionally append the
option only when discovery.DefaultCacheDir succeeded.

defaultOptions is the single source of truth for default values.
WithX docstrings point at it instead of repeating literal defaults
that could drift. The struct field docs describe what each field is
for; the defaultOptions docstring explains the reasoning behind the
specific default values.

The previous _ = err pattern after disk cache writes becomes a
structured Warn record carrying the addr and underlying error.
Persistent cache-dir permission issues now surface to anyone who has
wired a logger; default callers stay silent.

Test coverage on the options surface: TestDefaultOptionsValues,
TestNewSkipsNilOption, plus per-option Accept/Reject tests for
empty, whitespace, relative path, nil client, nil logger. Resolver
tests TestResolverIsSilentByDefault and TestResolverLogsCachePutFailure
pin both halves of the silent-by-default contract by forcing a cache
write to fail (cache dir pre-occupied as a file).
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 31 out of 32 changed files in this pull request and generated 3 comments.

Comment thread sdk/go/discovery/discovery.go
Comment thread cli/internal/auth/platform_client.go
Comment thread sdk/go/discovery/discovery_test.go Outdated
The XDG Base Directory Specification requires implementations to
ignore non-absolute values of XDG_CACHE_HOME. Before this change,
DefaultCacheDir blindly joined whatever the environment variable
held, producing a relative path that survived its own error check
and was then handed to WithCacheDir. WithCacheDir's IsAbs validation
rejected it, discovery.New returned a (nil, err) pair, and the
CLI's NewClient discarded the error and returned a Client with a
nil resolver that would panic on the next Resolve call.

DefaultCacheDir now uses LookupEnv (to distinguish unset from
empty), trims whitespace, and errors when the resolved value is
non-empty and not absolute. The error propagates back through the
SDK and CLI's existing `if err == nil { append WithCacheDir }`
branches so they skip the disk-cache option and the Resolver runs
with in-process memoization only. The implementation mirrors mcpd's
internal/files.userSpecificDir; the doc comment references the spec.

paths_test.go pins the four cases: absolute XDG_CACHE_HOME succeeds,
relative errors, whitespace-only falls back to HOME, unset falls
back to HOME. The TestResolverIsSilentByDefault comment is also
refreshed; it referenced the pre-options-refactor positional API.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 32 out of 33 changed files in this pull request and generated 1 comment.

Comment thread sdk/go/discovery/discovery.go
json.Decoder.Decode reads exactly one JSON value and stops, silently
dropping any bytes that follow. That left a hole in the strict-parsing
contract that DisallowUnknownFields, the Version check, the api_base_url
url.Parse check, and the api_version check otherwise establish: a body
like `{"version":1,...} garbage` or two concatenated objects would
have been accepted with the trailing bytes ignored.

decodeNodeInfo now calls dec.More() after the Decode and returns
"parse body: unexpected trailing content" when anything remains. The
check covers both the wire path (probe) and the disk path (cache.get),
since both decode through the same helper.

Tests TestResolveErrorsOnTrailingContent and
TestCacheTreatsTrailingContentEntryAsMiss exercise both code paths
with a valid object followed by stray bytes; the cache test also
verifies the invalid entry is removed from disk so the next lookup is
a clean miss rather than a repeated reject-and-reprobe cycle.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants