feat(replay): single-use claim on consent + callback state#22
Merged
Conversation
Closes the only documented security trade-off in specs.md
(consent_token replay) and removes IdP-fan-out + audit noise on
retried /callback. When a replay store is wired (REDIS_URL):
- GET /authorize seals a fresh JTI on every consent render and
a fresh SessionID on every silent-redirect session. Per-render
JTI keeps back-button safe (a re-render gets a new claim slot)
while still single-use within one render.
- POST /consent ClaimOnce on JTI before either approve or deny —
a captured consent_token cannot be replayed for either decision.
- GET /callback ClaimOnce on session.SessionID before the upstream
OIDC exchange — a replayed /callback URL never fans out to the
IdP token endpoint and never produces audit-log noise.
- Replay detected → 400 invalid_request with error_code
consent_replay / callback_state_replay, increments
mcp_auth_replay_detected_total{kind=consent|callback_state}.
- Store unreachable → 503 with error_code replay_store_unavailable,
increments mcp_auth_access_denied_total{reason="replay_store_unavailable"}
(reuses the existing /token counter so one alert rule covers
every claim site).
- nil store → stateless fallback (configured opt-out, mirrors /token).
- Empty JTI / SessionID on tokens sealed by older binaries falls
through to stateless behavior so an in-flight rollout doesn't
503 active users for the duration of consentTTL / sessionTTL.
- The /callback IdP-error redirect path deliberately does NOT
claim — the redirect is idempotent (no IdP fan-out, no token
issuance) and claiming would force re-auth on every legit
transient-error retry.
- All four replay_store_error log sites now carry an `op` field
(claim_authz_code / claim_refresh_family / claim_consent /
claim_callback_state) for site-level log correlation.
specs.md and README updated. 10 new handler tests cover replay
detection, deny-replayed, nil-store fallback, legacy empty
JTI/SID, store-error fail-closed, and JTI/SID population by
/authorize for both forks.
5 tasks
babs
added a commit
that referenced
this pull request
Apr 27, 2026
Closes T2.3 from misc/next-steps.md. Adds a configurable grace
window during which a refresh-rotation claim collision is treated
as a benign concurrent submit (parallel-tab refresh, slow-network
double-submit) and surfaces as 429 + error_code
refresh_concurrent_submit + Retry-After: 2 instead of revoking
the family.
Outside the window the strict pre-T2.3 behavior applies — every
collision still revokes the lineage (RFC 6749 §10.4 / OAuth 2.1
§6.1). REFRESH_RACE_GRACE_SEC tunes the window: default 2s,
clamped to [0, 10] at startup. The 10s ceiling is a security cap
(wider windows are statistically attacker-shaped); 0 disables
the grace and keeps the strict pre-T2.3 contract for operators
who prefer it.
Implementation:
- replay.Store.ClaimOrCheckFamily gains a graceWindow parameter
and a third return bool `racing`. Atomicity preserved: the
Lua script on Redis and the single-mutex section on
MemoryStore both classify {revoked, racing, claimed, fresh}
in one linearizable step, so the new branch cannot deadlock
or double-decide under concurrent load.
- The Lua script now stores epoch-ms timestamps (passed by Go
so the proxy is authoritative about its clock) as claim
values; legacy "1" placeholder values from pre-T2.3 binaries
fall through to the strict revoke path during a rolling
deploy. Documented in the script docstring + threat-model
row 5.
- MemoryStore gets a memoryEntry struct (expiresAt + setAt) so
the racing branch can compare against the original claim
time without polluting Mark semantics.
- 429 on /token is a deliberate deviation from RFC 6749 §5.2
(which defines 400/401 for token-endpoint errors). Most OAuth
client libraries treat 429+Retry-After as "back off and
retry", which is exactly the desired behavior. Documented in
specs.md.
- Tests: 2 handler-level (racing happy path + zero disables
grace), 2 per store (racing within grace + past-grace
revokes), 6 config (default + zero + custom + non-int +
negative + above-ceiling).
- All four replay_store_error log sites already carry the `op`
field from PR #22; this change preserves it.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Closes T1.2 + T2.1 from
misc/next-steps.md. Wires the existing replay store into two sealed-blob redemption sites that previously relied on TTL alone:POST /consent) — single-use claim on a per-renderjti. EachGET /authorizerender mints a freshjti, so back-button = re-consent (a new claim slot) rather than a stuck-state error.GET /callback) — single-use claim on a per-sessionsidbefore the upstream OIDC exchange. A replayed/callbackURL never fans out to the IdP token endpoint and never produces audit-log noise.Eliminates the only "deliberate trade-off" footnote in
specs.md(the consent-replay paragraph at the old line 337) and ships the audit-noise / IdP-fan-out fix the runbook called out for/callback.Behavior when the replay store is wired (
REDIS_URLset)POST /consentinvalid_request+error_code=consent_replay+mcp_auth_replay_detected_total{kind="consent"}++server_error+error_code=replay_store_unavailable+mcp_auth_access_denied_total{reason="replay_store_unavailable"}++GET /callbackinvalid_request+error_code=callback_state_replay+mcp_auth_replay_detected_total{kind="callback_state"}++Reuses the existing
mcp_auth_access_denied_total{reason="replay_store_unavailable"}counter from/tokenso one alert rule covers every claim site. All fourreplay_store_errorlog lines now carry anopfield (claim_authz_code/claim_refresh_family/claim_consent/claim_callback_state) for site-level correlation.Backwards compatibility
nilreplay store → stateless fallback (configured opt-out, matches/token).JTI/SessionIDon tokens sealed by older binaries → stateless fallback. Lets a rolling deploy avoid 503-ing active users for the duration ofconsentTTL(5 min) /sessionTTL(10 min).Deliberate non-claim
The
/callbackIdP-error redirect path (RFC 6749 §4.1.2.1) does NOT claim — the redirect is idempotent (no IdP fan-out, no token issuance) and claiming would burn the state on a transient IdP error and force re-auth on every legit retry. Documented inline inhandlers/callback.go.Test plan
test,lint,keycloak-e2e,fuzz-smoke,manifest-prod,govulncheck,build).handlers/single_use_replay_test.go:TestConsent_SingleUse_ReplayDetected— second POST → 400consent_replay, replay metric +1, approved counter does NOT double-count.TestConsent_SingleUse_ClaimsBeforeDeny— Deny is also one-shot.TestConsent_SingleUse_NilStoreFallback—nilstore keeps the prior stateless behavior.TestConsent_SingleUse_LegacyEmptyJTI— token with empty JTI succeeds (in-flight rollout).TestConsent_SingleUse_StoreErrorFailClosed— 503 +replay_store_unavailablemetric.TestCallback_SingleUse_ReplayDetected— second/callback→ 400callback_state_replay, replay metric +1.TestCallback_SingleUse_LegacyEmptySID— emptysiddoes NOT hit the replay branch.TestCallback_SingleUse_StoreErrorFailClosed— 503 + metric,verifyFuncnot called.TestAuthorize_RenderConsent_PopulatesJTI— GET/authorize(consent fork) seals a non-emptyjti.TestAuthorize_SilentRedirect_PopulatesSessionID— silent path seals a non-emptysid.-raceruns locally).Docs
specs.md—/consentparagraph reworked from "Why no replay-store integration" → "Single-use replay defense";/callbackstep list grew step 3 ("Single-use claim onsid"); subsequent steps renumbered.README.md—mcp_auth_replay_detected_total{kind}documentation extended withconsentandcallback_state.