Skip to content

docs: README rewrite + follow-up fixes from external review#29

Merged
babs merged 1 commit intomasterfrom
docs/readme-rewrite
Apr 28, 2026
Merged

docs: README rewrite + follow-up fixes from external review#29
babs merged 1 commit intomasterfrom
docs/readme-rewrite

Conversation

@babs
Copy link
Copy Markdown
Owner

@babs babs commented Apr 28, 2026

Summary

Two-part change:

1. README rewrite

The README's 39-row configuration mega-table was unscannable. Split into:

  • README (~370 lines): TL;DR runtime contract + Requirements at the top, the 6 required env vars, a "Production posture" callout summarising the safe-by-default guarantees, and pointers to the new reference doc.
  • docs/configuration.md (NEW): full env-var reference grouped by purpose — Required / Listeners & logging / Identity & authorization / Signing & rotation / Replay store / Rate limiting / Resource management / Production posture / Logging & observability — plus the alerting playbook and PromQL recipes that lived inline.

A new reader can land on the README, deploy in 5 minutes, and dig into the reference doc only when they need to. The architecture diagram, sealed-state TTL table, endpoint table, and supply-chain verification sections all stay in the README.

2. Follow-up fixes from a review pass

Doc drift caught by a fresh-eyes review:

  • docs/threat-model.md row 12 referenced token/seal.go and token/seal_test.go; actual paths are token/token.go + token/fuzz_test.go + token/token_test.go.
  • /readyz endpoint in the README table was flagged as if public; it lives on the metrics listener (port 9090). Fixed.
  • Per-replica rate-limit scope was implicit; now called out in both README (production-posture section) and docs/configuration.md (rate-limiting section). Operators sizing N replicas multiply the per-endpoint ceiling accordingly.

Contract fix:

  • MCP_TOOL_METRICS_MAX_CARDINALITY=0 is now accepted as the documented "disable the cap" sentinel. The consumer at metrics.ToolCardinality.ToolLabel already gates on MaxCardinality > 0; the only blocker was the config validator rejecting 0 with "must be a positive integer". Negative values still fail. New regression test TestLoad_ToolMetrics_ZeroDisablesCap pins the contract.

Supply-chain pins:

  • Dockerfile base images pinned by @sha256 digest (golang:1.26-alpine and gcr.io/distroless/static-debian13:nonroot). A base-image rebuild can no longer silently change the published binary's environment.
  • CI's govulncheck install pinned to v1.1.4 (was @latest).

README polish:

  • TL;DR uses <your-idp-secret> placeholder (copy-paste-safe) and a reachable Redis hostname.
  • TOKEN_SIGNING_SECRET row cross-references the key-rotation runbook.
  • OIDC_ALLOW_INSECURE_HTTP moved next to the other PROD_MODE-gated relaxation toggles.
  • MCP_PER_SUBJECT_CONCURRENCY split into a dedicated "Resource management" sub-section (it's a concurrency cap, not a rate limit).
  • Observability heading renamed for a clean GitHub anchor.

Items deferred to a follow-up PR

The same review flagged two items that need their own focused PR with regression tests:

  • TOKEN_SIGNING_SECRET entropy gate is structurally flawed. A 64-char openssl rand -hex 32 value is drawn from a 16-symbol alphabet — the expected number of distinct chars is ~15.75, so a real random secret can intermittently fail the < 16 distinct bytes check while a patterned 0123456789abcdef0123456789abcdef (exactly 16 distinct bytes) passes. False-positive AND false-negative on the same heuristic. Will replace with an obvious-pattern check + align docs on manifests/scripts/generate-signing-secret.sh.
  • Optional Redis-backed rate limiter for deployments that want a true cluster-wide ceiling rather than the documented N × per-replica math.

Test plan

  • CI green (test, lint, keycloak-e2e, fuzz-smoke, manifest-prod, govulncheck, build).
  • go test ./config/ -run "ToolMetrics" -v passes (incl. new ZeroDisablesCap regression).
  • Manual: render the README on github.com — TL;DR is the first thing after the diagram; cross-references resolve.

README split: the configuration mega-table moved into a new
docs/configuration.md grouped by purpose (required, listeners,
identity, signing, replay store, rate limits, resource management,
production posture, observability), with the alerting playbook +
PromQL recipes alongside it. The README itself drops to ~370 lines
and gains a TL;DR runtime contract + Requirements section near the
top so a new reader can deploy without scrolling.

Also addresses a batch of follow-up items from a review pass:

  - Doc-drift fixes:
    - threat-model row 12 referenced token/seal.go and
      token/seal_test.go; actual paths are token/token.go and
      token/fuzz_test.go + token/token_test.go.
    - README endpoint table now flags /readyz as living on the
      metrics listener (port 9090), not the public router —
      matches the actual main.go wiring.
    - Per-replica scope of the in-process rate limiter is now
      called out in both README (production-posture section) and
      docs/configuration.md (rate-limiting section), so an
      operator sizing N replicas multiplies the per-endpoint
      ceiling accordingly.

  - Code/contract fixes:
    - MCP_TOOL_METRICS_MAX_CARDINALITY=0 is now accepted as the
      documented "disable the cap" sentinel. The consumer at
      metrics.ToolCardinality.ToolLabel already gates on
      MaxCardinality > 0, so the only blocker was the config
      validator rejecting 0 with "must be a positive integer".
      Negative values still fail. New regression test
      TestLoad_ToolMetrics_ZeroDisablesCap pins the contract.

  - Supply-chain pins:
    - Dockerfile base images (golang:1.26-alpine and
      gcr.io/distroless/static-debian13:nonroot) pinned by
      sha256 digest so a base-image rebuild can't silently
      change the published binary's environment.
    - CI's govulncheck install pinned to v1.1.4 (was @latest).

  - README polish:
    - TL;DR uses an explicit <your-idp-secret> placeholder
      instead of a literal ellipsis (copy-paste-safe), and a
      reachable-Redis hostname is called out for readers running
      outside the demo compose network.
    - Configuration table cross-references the key-rotation
      runbook from the TOKEN_SIGNING_SECRET row.
    - OIDC_ALLOW_INSECURE_HTTP moved next to the other
      PROD_MODE-gated relaxation toggles in
      docs/configuration.md.
    - MCP_PER_SUBJECT_CONCURRENCY split into its own
      "Resource management" sub-section (it caps in-flight
      requests, not request rate).
    - Observability heading renamed for a clean GitHub anchor.
@babs babs merged commit 9dc12d1 into master Apr 28, 2026
7 checks passed
@babs babs deleted the docs/readme-rewrite branch April 28, 2026 00:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant