feat: add node discovery protocol for split-origin nodes#373
Merged
Conversation
Introduces the JSON schema for /.well-known/cq-node.json, the optional
discovery document a cq node serves to declare its API base URL and
protocol version. Same-origin nodes (default ./server, localhost) need
not serve this document; clients fall back to {addr}/api/v1 and
api_version=v1 in its absence.
Adds three fixtures (minimal same-origin, split-origin, and an invalid
api_version for the negative test) plus Go tests that validate the
fixtures against the compiled schema via gojsonschema.
Documents /.well-known/cq-node.json: when and how to serve it, the client behavior contract, and operator recipes for same-origin, split-origin, and reverse-proxy deployments. References the JSON schema added in the previous commit.
Introduces the sdk/go/discovery package. This commit adds the protocol constants, the NodeInfo type, and a disk-backed cache for discovery results. Cache entries are keyed by SHA-256 of the user-supplied address and stored as one JSON file per address with atomic write semantics; entries expire by file mtime against a configurable TTL. No resolver yet — that lands in the next commit.
Adds Resolver, the public face of the discovery package. Probes
{addr}/.well-known/cq-node.json, retries on 5xx, falls back to
defaults on 404, errors clearly on HTML responses (the SPA-vs-API
mistake) and on api_version mismatches. Results are memoized
in-process and on disk via the cache added in the previous commit.
The Resolver is not yet wired into the SDK or CLI — that change
threads it through the existing remote/HTTP clients and replaces
their hardcoded /api/v1 prefixes.
…nly docstrings Methods on Resolver are now in alphabetical order within their visibility band (Resolve, then cacheInMemory, fetchWithRetry, probe). Package-level helpers stay alphabetical (defaultsFor, validate). Docstrings rewritten to describe each unit's intent and contract without referencing callers or downstream behavior. The for loop in fetchWithRetry is modernized to range-over-int. No behavior change; all tests pass.
cache methods now appear in alphabetical order (get, invalidate, pathFor, put), matching the convention used in discovery.go and elsewhere in sdk/go.
remoteClient now holds the user-supplied node address and a
Resolver instead of a precomputed baseURL with hardcoded /api/v1.
Each request resolves (cached) via the discovery protocol; the
404-fallback case preserves the previous behavior of hitting
{addr}/api/v1/<resource>.
client.go constructs a default discovery.Resolver rooted at an
XDG-compliant cache directory. SDK tests inject a static resolver
to keep the existing /api/v1/... path assertions intact, and a new
regression test exercises the real Resolver to pin the default-on-
404 contract.
Client and CLI integration test helpers wrap their httptest
handlers so the well-known discovery path returns 404, keeping the
existing fallback path under test without rewriting every fixture.
The auth client previously prepended a hardcoded /api/v1 prefix to
every request path. It now resolves the API base URL through the
discovery protocol; the 404-fallback case preserves the prior
behavior of hitting {addr}/api/v1/<resource>.
NewClient constructs a default discovery.Resolver rooted at an
XDG-compliant cache directory. Tests inject a static resolver so the
existing /api/v1/... path assertions continue to hold, and a new
regression test exercises the real Resolver to pin the default-on-404
contract.
To avoid duplicating the XDG cache-dir helper across the SDK and
CLI, the helper moves to discovery.DefaultCacheDir; both callers now
consume it from a single source.
The combined Docker image mounts a SPA catch-all that serves index.html for any unmatched path. Without this explicit route, cq clients probing /.well-known/cq-node.json see Content-Type: text/html and treat the node as a SPA host rather than falling back to the documented defaults. A 404 is the correct answer here: the reference server does not know its own public URL, so it cannot synthesize a valid discovery document. Operators who need to advertise a custom api_base_url should serve the document from a reverse proxy in front of this server.
Contributor
There was a problem hiding this comment.
Pull request overview
Adds a node discovery protocol that lets CQ clients resolve a user-facing node address (--addr) into a concrete API base URL via an optional /.well-known/cq-node.json document, while preserving today’s default same-origin behavior when the document is absent (404). This is threaded through the Go SDK and CLI to replace hardcoded /api/v1 prefixes, and the reference server explicitly returns JSON 404 at the well-known path to avoid SPA catch-all HTML responses.
Changes:
- Introduces discovery schema + fixtures + documentation for
/.well-known/cq-node.json. - Adds a Go discovery resolver with in-process and on-disk caching, and wires it into the Go SDK remote client + CLI auth client.
- Updates the reference backend to explicitly 404 (JSON) on the well-known path and adds coverage to ensure that behavior.
Reviewed changes
Copilot reviewed 27 out of 28 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| server/backend/tests/test_app.py | Adds backend test asserting /.well-known/cq-node.json returns JSON 404. |
| server/backend/src/cq_server/app.py | Registers an explicit FastAPI route returning 404 JSON for the well-known discovery path to avoid SPA catch-all. |
| sdk/go/remote.go | Switches remote client URL construction from hardcoded /api/v1 to discovery-resolved api_base_url. |
| sdk/go/remote_test.go | Updates remote tests to inject a resolver and adds wiring regression test for default-on-404 behavior. |
| sdk/go/discovery/types.go | Introduces discovery protocol constants and NodeInfo type. |
| sdk/go/discovery/paths.go | Adds XDG-compliant default cache directory helper. |
| sdk/go/discovery/discovery.go | Implements discovery resolver with retry + caching + validation of resolved node info. |
| sdk/go/discovery/discovery_test.go | Adds resolver unit tests (404 fallback, valid doc, HTML/malformed JSON errors, retries, caching, slash normalization). |
| sdk/go/discovery/cache.go | Implements on-disk cache keyed by SHA-256 of address with TTL and atomic writes. |
| sdk/go/discovery/cache_test.go | Adds cache unit tests (hit/miss, TTL, invalidation, temp file cleanup). |
| sdk/go/client.go | Wires discovery cache dir + resolver into Go SDK client construction for remote mode. |
| sdk/go/client_test.go | Wraps test servers to return 404 on well-known discovery path and isolates cache dir via XDG_CACHE_HOME. |
| schema/node_discovery.json | Adds JSON Schema for the discovery document. |
| schema/node_discovery_test.go | Adds Go-based schema fixture validation using gojsonschema. |
| schema/Makefile | Extends fixture validation to include node discovery fixtures via check-jsonschema. |
| schema/go.mod | Adds gojsonschema dependency for schema tests. |
| schema/go.sum | Updates module sums for new schema validation dependencies. |
| schema/fixtures/node_discovery_split.json | Adds split-origin discovery fixture. |
| schema/fixtures/node_discovery_minimal.json | Adds minimal valid discovery fixture. |
| schema/fixtures/node_discovery_invalid_version.json | Adds invalid fixture for schema negative testing. |
| schema/embed.go | Embeds node_discovery.json into the schema Go package. |
| schema/embed_test.go | Ensures embedded node discovery schema accessor returns non-empty JSON. |
| docs/node-discovery-protocol.md | Documents the discovery protocol, schema, and client/operator behavior. |
| cli/internal/auth/platform_client.go | Updates CLI auth client construction to use discovery resolver based on node address. |
| cli/internal/auth/http_client.go | Refactors CLI auth HTTP client to build requests using discovery-resolved API base URL (removes hardcoded /api/v1). |
| cli/internal/auth/http_client_test.go | Updates tests to bypass real discovery for path assertions and adds a wiring regression test for default-on-404 behavior. |
| cli/internal/auth/api_keys_test.go | Switches tests to a static-resolver-backed client to keep /api/v1/... assertions stable without discovery I/O. |
| cli/cmd/cmd_test.go | Wraps CLI command tests’ fake remotes to 404 discovery and isolates cache dir via XDG_CACHE_HOME. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Addresses four review comments on the initial discovery package: 1. NodeInfo now carries the document schema Version. The resolver rejects documents whose Version does not match SupportedDiscoveryVersion (currently 1), so a future incompatible schema cannot be silently parsed with this client's assumptions. 2. Probe decoding uses json.Decoder with DisallowUnknownFields, which matches the JSON Schema's additionalProperties:false constraint and rejects documents with unexpected fields. 3. cache.get re-runs decodeNodeInfo and validate on every read. Stale-format, hand-edited, or otherwise schema-invalid cache files are treated as a miss and removed from disk so subsequent lookups are clean misses. 4. discovery.New treats an empty cacheDir as "disable the file cache" instead of constructing a cache that would Stat/read entries relative to the process's working directory. Resolve nil-checks the file cache so platform_client.go's best-effort cacheDir lookup degrades cleanly to in-process-only memoization. defaultsFor now sets Version on the synthesized fallback NodeInfo so the in-process memo and any cache write of the 404-fallback result carry the same shape as a probe-resolved one.
Two follow-up review fixes on the discovery package: 1. sdk/go's NewClient now treats discovery.DefaultCacheDir failure as best-effort (matching the CLI). With New() supporting an empty cacheDir by disabling the file cache, a restricted or container environment without HOME or XDG_CACHE_HOME still constructs a working Client backed by in-process memoization only. 2. validate parses api_base_url via url.Parse and requires both a non-empty Scheme (http or https) and a non-empty Host. Previously a value of "https://" passed the HasPrefix check and produced garbage when callers ran url.JoinPath against it. The new check surfaces the malformed value at resolve time with an actionable message naming the missing piece. New tests pin the two malformed-URL cases.
5 tasks
…l options Adds WithLogger(*slog.Logger) on the SDK Client and threads the logger through to discovery.New as a functional option. The library defaults to slog.DiscardHandler so it writes nothing to stdout or stderr without explicit caller wiring; this matters especially for the MCP server where stdout carries JSONRPC and stray writes would corrupt the channel. discovery.New(cacheDir, httpc, logger) becomes discovery.New(opts...) matching the ClientOption pattern already in sdk/go/options.go and mcpd's cache.Options precedent. The three options validate strictly: WithCacheDir trims and requires a non-empty absolute path (relative paths cause reads relative to the process cwd, the failure mode that prompted the earlier "empty cacheDir" review). WithHTTPClient and WithLogger reject nil. Absence of WithCacheDir is the disabled-cache state; callers with an optional cache dir conditionally append the option only when discovery.DefaultCacheDir succeeded. defaultOptions is the single source of truth for default values. WithX docstrings point at it instead of repeating literal defaults that could drift. The struct field docs describe what each field is for; the defaultOptions docstring explains the reasoning behind the specific default values. The previous _ = err pattern after disk cache writes becomes a structured Warn record carrying the addr and underlying error. Persistent cache-dir permission issues now surface to anyone who has wired a logger; default callers stay silent. Test coverage on the options surface: TestDefaultOptionsValues, TestNewSkipsNilOption, plus per-option Accept/Reject tests for empty, whitespace, relative path, nil client, nil logger. Resolver tests TestResolverIsSilentByDefault and TestResolverLogsCachePutFailure pin both halves of the silent-by-default contract by forcing a cache write to fail (cache dir pre-occupied as a file).
The XDG Base Directory Specification requires implementations to
ignore non-absolute values of XDG_CACHE_HOME. Before this change,
DefaultCacheDir blindly joined whatever the environment variable
held, producing a relative path that survived its own error check
and was then handed to WithCacheDir. WithCacheDir's IsAbs validation
rejected it, discovery.New returned a (nil, err) pair, and the
CLI's NewClient discarded the error and returned a Client with a
nil resolver that would panic on the next Resolve call.
DefaultCacheDir now uses LookupEnv (to distinguish unset from
empty), trims whitespace, and errors when the resolved value is
non-empty and not absolute. The error propagates back through the
SDK and CLI's existing `if err == nil { append WithCacheDir }`
branches so they skip the disk-cache option and the Resolver runs
with in-process memoization only. The implementation mirrors mcpd's
internal/files.userSpecificDir; the doc comment references the spec.
paths_test.go pins the four cases: absolute XDG_CACHE_HOME succeeds,
relative errors, whitespace-only falls back to HOME, unset falls
back to HOME. The TestResolverIsSilentByDefault comment is also
refreshed; it referenced the pre-options-refactor positional API.
json.Decoder.Decode reads exactly one JSON value and stops, silently
dropping any bytes that follow. That left a hole in the strict-parsing
contract that DisallowUnknownFields, the Version check, the api_base_url
url.Parse check, and the api_version check otherwise establish: a body
like `{"version":1,...} garbage` or two concatenated objects would
have been accepted with the trailing bytes ignored.
decodeNodeInfo now calls dec.More() after the Decode and returns
"parse body: unexpected trailing content" when anything remains. The
check covers both the wire path (probe) and the disk path (cache.get),
since both decode through the same helper.
Tests TestResolveErrorsOnTrailingContent and
TestCacheTreatsTrailingContentEntryAsMiss exercise both code paths
with a valid object followed by stray bytes; the cache test also
verifies the invalid entry is removed from disk so the next lookup is
a clean miss rather than a repeated reject-and-reprobe cycle.
4 tasks
This was referenced May 18, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
/.well-known/cq-node.jsondeclaring a cq node'sapi_base_urlandapi_version. Absence of the document means defaults ({addr}/api/v1,v1), preserving today's same-origin behavior; presence lets split-origin deployments declare their topology without leaking internal hostnames into the CLI UX.sdk/go/discovery/) — a functional-optionsResolver(WithCacheDir,WithHTTPClient,WithLogger) that probes the well-known endpoint and validates strictly (JSON-Schema-conformant document,url.Parse-validatedapi_base_url, rejected unknown fields, rejected unsupported document and API versions), caches results on disk under XDG with re-validation on every read. Threads the resolver through the SDK and CLI in place of hardcoded/api/v1prefixes; pointing the CLI at a SPA host by mistake now surfaces a clear domain-terms error instead ofinvalid character '<' looking for beginning of value.cq.WithLogger(*slog.Logger)on the SDK Client and plumbs it through to the discovery resolver. The library defaults toslog.DiscardHandlerso it writes nothing to stdout or stderr without explicit caller wiring (important for MCP-over-stdio where stdout carries JSONRPC). Cache-write failures surface as structuredWarnrecords when a logger is wired; default callers stay silent../server) explicitly returns 404 for the well-known path so the combined Docker image's SPA catch-all does not intercept it.Test plan
make lintandmake testgreen at repo root (schema, sdk/go, sdk/python, cli, server backend, frontend).api_base_url→ error, 5xx retry, in-process and on-disk caching of success and 404 fallback, file cache disabled whenWithCacheDiris not passed), and the options surface (eachWithXrejects invalid inputs; defaults co-located indefaultOptions).TestResolverIsSilentByDefaultverifies no record reaches the discard handler even when the cache write fails;TestResolverLogsCachePutFailureverifies a structuredWarnrecord surfaces withaddranderrfields when a logger is wired./api/v1/<resource>URL shape on the default-when-404 path.{api_base_url: http://localhost:3000/api/v1, api_version: v1}; with the server temporarily publishing a real discovery doc, CLI uses the advertisedapi_base_urland caches the full document includingnode_name.Notes for reviewers
schema/node_discovery.json) and operator docs (docs/node-discovery-protocol.md) land in the first two commits and have no consumer; the Go library lands next; SDK/CLI wiring and the server-side 404 land after that; later commits address review feedback (Version field onNodeInfo,DisallowUnknownFieldsprobe decoding,url.Parse-basedapi_base_urlvalidation, cache re-validation on read, best-effort cache-dir lookup, logger injection, functional options ondiscovery.New). Reviewing commit-by-commit may be easier than the full diff.gojsonschema(the same validator mcpd uses) is added toschema/go.modso the fixtures are validated in Go, not just viamake validate.discovery.DefaultCacheDir()(XDG-compliant) is shared between the SDK and the CLI; cache files live at~/.cache/cq/discovery/<sha256(addr)>.json.discovery.New(opts ...Option)follows the existingClientOptionpattern insdk/go/options.goand mcpd'scache.Optionsprecedent.defaultOptions()is the single source of truth for default values.refactor(sdk):commits reorder files to match the repo's alphabetical-methods convention; surface area is unchanged.