Commit 8ad10bd

Pulkit Pareek committed
add three-QR end-user signup ceremony (ADR 0023)
The device-enrollment flow (ADR 0022) gave an operator a way to add a device to a tenant's fleet. This commit adds the missing other half: the way an actual end user creates an account on the org's site, using their phone as the biometric credential carrier.

The user described it like this:

- Org implements ZeroAuth instead of Google Sign-In.
- User fills in name + email on the org's signup page.
- User scans QR1 (shown on the signup page) with their phone → phone pairs with the session.
- User enrolls biometric on the phone → commitment computed locally; biometric never leaves the device.
- User scans QR2 on the platform → phone uploads (did, commitment).
- User scans QR3 on the platform → phone re-captures biometric, produces Groth16 proof, server verifies + creates the account.

Same threat model as WebAuthn registration, except the credential is a biometric (and the proof is zero-knowledge instead of a signature). Same UX shape as Slack's "approve-from-desktop" flow. The QR codes are the side channel between the laptop browser and the phone's camera: no Bluetooth, no custom SDK on the laptop, no biometric over the wire.

Three single-use SHA-256-hashed codes live in three separate columns, each with its own 15-min TTL. The three corresponding phone-side routes are listed in PUBLIC_ROUTE_EXCEPTIONS for the same reason /v1/devices/enroll is: the code IS the bearer credential. The chain of three single-use codes provides a cross-step confused-deputy defence (a captured pair_code can't satisfy submit-commitment; a verify_code can't satisfy pair-device).
State machine on the new registration_sessions table:

- awaiting_device → pair_code outstanding, no device yet
- awaiting_commitment → device paired, enroll_code outstanding
- awaiting_verification → commitment stored, verify_code + challenge_nonce outstanding
- completed → tenant_user created
- abandoned → expired or admin-cancelled

Phone-side endpoints (no auth, code is bearer, 20 req/min/IP):

- POST /v1/registrations/pair-device
- POST /v1/registrations/submit-commitment
- POST /v1/registrations/complete

Tenant-side endpoints (tenant API key):

- POST /v1/registrations
- GET /v1/registrations/:id (redacts code hashes + nonce)
- DELETE /v1/registrations/:id

Defence-in-depth: `sanitizeProfile` regex-strips any profile-blob key matching image/template/pixel/depth/frame/raw_face/raw_finger/biometric/photo (with word-boundary matching) at ingest, so a buggy tenant SDK that passes a raw biometric in the profile field gets the key dropped rather than committed.

V1 limitation documented in ADR 0023: the challenge_nonce binds to the *request*, not to the proof's public signals (circuit v1.2 doesn't have a slot for it). Replay across sessions is blocked by the single-use verify_code chain + 15-min TTL + rate-limit; full circuit-bound binding lands with circuit v1.3 in Phase 1 Sprint 4. The deeplink format and route surface stay stable across that upgrade.

Verify:

- npx tsc --noEmit clean
- tests/registration-flow 19/19 (mocked pg pool, no Postgres)
- npm test 524/524 across 44 suites
- ADR 0023 captures the state machine, code-format math, confused-deputy defence, four threat-model deltas (A-30..A-33), and the deferred items (QR rendering dep, circuit-bound challenge, SSE poll-replacement, per-tenant profile schema, dashboard demo UI).

Closes the end-user signup half of the BFSI demo. The dashboard demo page that walks an operator through the ceremony with a simulated phone panel lands in a follow-up commit (mirroring the shape of demo/QrProofLogin).
1 parent c4681a9 commit 8ad10bd

10 files changed

Lines changed: 1720 additions & 1 deletion

Lines changed: 144 additions & 0 deletions
@@ -0,0 +1,144 @@
1+
# ADR 0023 — Three-QR end-user signup ceremony
2+
3+
- **Status:** Accepted
4+
- **Date:** 2026-05-28
5+
- **Phase:** Phase 1 sprint 2 (the "user creates an account" half of the BFSI demo)
6+
- **Related:** ADR 0017 (face-first identity surface), ADR 0018 (mobile face embedding pipeline), ADR 0021 (RS256 JWT), ADR 0022 (production device-enrollment flow)
7+
8+
## Context
9+
10+
ADR 0022 gave us a way for an *operator* to register a *device* (kiosk, IoT bridge, phone) into a tenant's fleet. That's the right primitive for fleet onboarding but it's the wrong shape for the actual onboarding scene: an *end user* creating *their own account* on the *org's website* by *using their phone* as the credential carrier.
11+
12+
The end-user flow that the BFSI demo (and any post-Auth0 SaaS integration) actually needs:
13+
14+
1. Org has implemented ZeroAuth instead of Google Sign-In on their signup page.
15+
2. End user fills in name + email + whatever the org wants (just like a Google Sign-In flow).
16+
3. The signup page asks for a biometric. The user doesn't have a webcam-grade face capture, and the org doesn't want to ship a depth-camera-required SDK. The user's *phone* is the biometric capture device — but the org's server must never see the raw biometric.
17+
4. After the user does the biometric on the phone, the signup page must somehow get *proof of biometric possession* tied to *this user's new account on this org's site* — without the phone ever talking directly to the org's site (different origins, different sessions, often different networks).
18+
19+
This is exactly the problem the [WebAuthn registration ceremony](https://www.w3.org/TR/webauthn-2/#sctn-registering-a-new-credential) solves for hardware security keys, except (a) the credential is a *biometric*, not a key, so the phone produces a *zero-knowledge proof* of biometric possession instead of a signature; (b) the phone has no Bluetooth pairing channel to the laptop; (c) we don't want a custom SDK on the laptop. The side channel that solves all three: QR codes on the laptop screen, scanned by the phone's camera.
20+
21+
The user described the flow exactly:
22+
23+
> "They'll register a device, by scanning a qr on their phone. Then enroll their biometrics on the phone and proof is getting generated. Then after biometric setup they'll scan the another qr on the platform that'll compare the device ids and the proof also get's transferred. Now after final confirmation when the user verifies their biometric on the phone and generates proof and by scanning a qr on the console the account finally get's created."
24+
25+
So: **three QRs**, one ceremony, biometric stays on the phone, server only ever sees the commitment and the proof. Same threat model as WebAuthn registration; same UX shape as Slack's "approve sign-in from a desktop" flow.
26+
27+
## Decision
28+
29+
Adopt the three-QR ceremony as the canonical end-user signup flow. The change consists of a schema, a service module, and a six-endpoint API surface (three tenant-side, three phone-side). The biometric pipeline (FaceEmbedder → Quantizer → SHA-256 → Poseidon → DID) already lives on-device per ADR 0018; this ADR is the *coordination protocol* that ties three on-device steps to one server-side account-creation transaction.
30+
31+
### State machine
32+
33+
`registration_sessions.state`:
34+
35+
| State | Meaning |
36+
|------------------------|----------------------------------------------------------------------------|
37+
| awaiting_device | Session opened by tenant SDK; `pair_code` outstanding; QR1 is on screen. |
38+
| awaiting_commitment | Phone paired the device; `enroll_code` outstanding; QR2 is on screen. |
39+
| awaiting_verification | Commitment received; `verify_code` + `challenge_nonce` outstanding; QR3. |
40+
| completed | Proof verified; `tenant_user` row created; ceremony done. |
41+
| abandoned | Tenant called DELETE or whole-session TTL elapsed. |
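The table implies a small set of legal transitions. A minimal sketch, assuming the states are stored as the plain strings shown and that `completed` and `abandoned` are both terminal (DELETE is described as idempotent on completed rows, so a completed row is left alone); `advance` and `abandon` are hypothetical helpers, not the service module's API:

```typescript
// Hypothetical model of registration_sessions.state (ADR 0023).
type RegistrationState =
  | "awaiting_device"
  | "awaiting_commitment"
  | "awaiting_verification"
  | "completed"
  | "abandoned";

// Happy-path successor of each state; each step consumes exactly one bearer code.
const NEXT: Record<RegistrationState, RegistrationState | null> = {
  awaiting_device: "awaiting_commitment", // pair-device consumed pair_code
  awaiting_commitment: "awaiting_verification", // submit-commitment consumed enroll_code
  awaiting_verification: "completed", // complete consumed verify_code
  completed: null, // terminal
  abandoned: null, // terminal
};

function isTerminal(s: RegistrationState): boolean {
  return s === "completed" || s === "abandoned";
}

// Returns the state after a successful step; throws if no forward step exists.
function advance(s: RegistrationState): RegistrationState {
  const next = NEXT[s];
  if (next === null) throw new Error(`no forward transition from ${s}`);
  return next;
}

// DELETE or whole-session TTL expiry: in-flight sessions become abandoned,
// terminal rows are returned unchanged (assumption for the idempotent case).
function abandon(s: RegistrationState): RegistrationState {
  return isTerminal(s) ? s : "abandoned";
}
```

Three successful `advance` calls take a fresh session from `awaiting_device` to `completed`; any further call throws, which is the property the single-use codes enforce server-side.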
42+
43+
Whole-session TTL is **30 minutes**. Each step's bearer code TTL is **15 minutes** (matching ADR 0022 device enrollment). The whole session can therefore outlive any single code: if the user stalls between scans long enough for the current code to expire, the intended recovery is for the tenant SDK to call the start endpoint again on the same session row and re-issue the next code. That re-issue path is a Phase 1 Sprint 3 follow-on; in V1 the user has to restart the ceremony.
44+
45+
### Codes
46+
47+
Three independent codes, each in its own row column:
48+
49+
- `pair_code_hash` — consumed at step 1
50+
- `enroll_code_hash` — consumed at step 2
51+
- `verify_code_hash` — consumed at step 3
52+
53+
Each is `ZA-XXXX-XXXX` (the same format as ADR 0022, reused via `src/services/device-enrollment.ts::generateEnrollmentCode`). Each is stored as SHA-256, returned in plaintext exactly once. Each step's handler reads only its own column — a captured `pair_code` cannot satisfy the `submit-commitment` handler, and so on. This blocks the confused-deputy class of attack where someone replays an old QR into a later step.
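To make the per-column rule concrete, here is a minimal in-memory sketch. It assumes the plaintext code is hashed with unsalted SHA-256 into lowercase hex, that each step has its own `*_expires_at` column (the ADR names `verify_code_expires_at`; the other two are assumed by analogy), and that a consumed code's columns are cleared. `SessionRow`, `issueCode`, and `consumeCode` are hypothetical stand-ins for the Postgres-backed service:

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

type Step = "pair" | "enroll" | "verify";

// Hypothetical row shape: one hash column and one expiry column per step.
interface SessionRow {
  pair_code_hash: string | null;
  pair_code_expires_at: number | null;
  enroll_code_hash: string | null;
  enroll_code_expires_at: number | null;
  verify_code_hash: string | null;
  verify_code_expires_at: number | null;
}

const HASH_COL = { pair: "pair_code_hash", enroll: "enroll_code_hash", verify: "verify_code_hash" } as const;
const EXPIRY_COL = { pair: "pair_code_expires_at", enroll: "enroll_code_expires_at", verify: "verify_code_expires_at" } as const;
const CODE_TTL_MS = 15 * 60 * 1000; // per-code TTL from the ADR

function sha256Hex(code: string): string {
  return createHash("sha256").update(code, "utf8").digest("hex");
}

// Stores only the hash and expiry for one step; the plaintext goes into the QR once.
function issueCode(row: SessionRow, step: Step, plaintext: string, now: number): string {
  row[HASH_COL[step]] = sha256Hex(plaintext);
  row[EXPIRY_COL[step]] = now + CODE_TTL_MS;
  return plaintext;
}

// Reads ONLY the presented step's columns, so a pair_code can never satisfy
// the enroll or verify step. Clears the columns on success (single use).
function consumeCode(row: SessionRow, step: Step, presented: string, now: number): boolean {
  const stored = row[HASH_COL[step]];
  const expiresAt = row[EXPIRY_COL[step]];
  if (stored === null || expiresAt === null || now >= expiresAt) return false;
  const presentedHash = Buffer.from(sha256Hex(presented), "hex");
  if (!timingSafeEqual(presentedHash, Buffer.from(stored, "hex"))) return false;
  row[HASH_COL[step]] = null;
  row[EXPIRY_COL[step]] = null;
  return true;
}
```

Presenting a live `pair_code` to the enroll step fails simply because `enroll_code_hash` holds a different value (or none), which is the confused-deputy property described above.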
54+
55+
### Challenge nonce (replay defence)
56+
57+
After step 2, the server mints `verify_challenge_nonce` (128 bits, hex-encoded, single-use, scoped to this row) and bakes it into QR3's deeplink. The phone echoes the nonce back with the proof in step 3, and the server checks that it matches the value it issued.
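Minting and checking the nonce needs only Node's `crypto` module. A minimal sketch, assuming the nonce is 16 random bytes rendered as 32 lowercase hex characters and compared in constant time; `mintChallengeNonce` and `nonceMatches` are hypothetical names:

```typescript
import { randomBytes, timingSafeEqual } from "node:crypto";

// 128 bits of CSPRNG output, hex-encoded to 32 characters.
function mintChallengeNonce(): string {
  return randomBytes(16).toString("hex");
}

// Constant-time comparison of the echoed nonce against the issued one.
// Malformed input (wrong length, non-hex, uppercase) is rejected before comparing.
function nonceMatches(issued: string, echoed: string): boolean {
  if (!/^[0-9a-f]{32}$/.test(echoed)) return false;
  return timingSafeEqual(Buffer.from(issued, "hex"), Buffer.from(echoed, "hex"));
}
```

The comparison alone does not make the nonce single-use; that comes from clearing the stored value once step 3 succeeds.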
58+
59+
**V1 limitation:** the challenge_nonce is bound to the *request*, not to the *proof itself*. The existing identity_proof.circom (v1.2) doesn't yet have a public-input slot for a session challenge — `publicSignals[0]` is the commitment and that's it. Replay across sessions is therefore prevented by:
60+
61+
- The single-use `verify_code` (an old proof can't be submitted into a fresh session because the fresh session has a different `verify_code_hash`).
62+
- The 15-minute TTL on `verify_code_expires_at`.
63+
- The per-IP rate-limit (20 req/min on the phone-side endpoints).
64+
65+
**Phase 1 Sprint 4 follow-on:** circuit v1.3 adds a public-input slot for the challenge nonce; the route handler then asserts `publicSignals[1] === verify_challenge_nonce` *and* the proof verifies — closing the proof-replay surface entirely. The deeplink format and route surface stay stable; only the circuit upgrades.
66+
67+
### Routes
68+
69+
| Method | Path | Auth | Purpose |
70+
|---|---|---|---|
71+
| POST | `/v1/registrations` | tenant API key (users:write) | Start session; returns `pair` envelope for QR1. |
72+
| GET | `/v1/registrations/:id` | tenant API key (users:read) | Poll state (redacts code hashes + challenge nonce). |
73+
| DELETE | `/v1/registrations/:id` | tenant API key (users:write) | Abandon session; idempotent on completed rows. |
74+
| POST | `/v1/registrations/pair-device` | **none — `pair_code` is bearer** | Step 1: phone claims a device. |
75+
| POST | `/v1/registrations/submit-commitment` | **none — `enroll_code` is bearer** | Step 2: phone uploads (did, commitment). |
76+
| POST | `/v1/registrations/complete` | **none — `verify_code` is bearer** | Step 3: phone uploads proof; tenant_user is created. |
77+
78+
The three phone-side endpoints are listed in `tests/tenant-isolation.test.ts::PUBLIC_ROUTE_EXCEPTIONS` for the same reason `/v1/devices/enroll`, `/v1/zkp/verify`, and the proof-pairing public endpoints are: the QR-supplied code is the bearer credential and there is no tenant API key available on the phone side.
79+
80+
### What the phone sends, what the server sees
81+
82+
| Step | Phone sends | Server sees | What's NOT sent |
83+
|------|-------------|-------------|-----------------|
84+
| 1 | `pair_code`, `fingerprint` (≥16 chars, opaque), `attestation_kind?` | SHA-256(fingerprint); attestation kind string | the biometric, the secret, the commitment |
85+
| 2 | `enroll_code`, `did` (`did:zeroauth:<method>:<hex>`), `commitment` (hex) | the DID + commitment as strings | the secret, any biometric data |
86+
| 3 | `verify_code`, `challenge_nonce`, `proof` (Groth16), `public_signals` | the proof + the commitment in `publicSignals[0]` | the secret, the embedding |
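The step-1 and step-2 bodies carry only opaque strings, so the shape checks are cheap. A minimal sketch of validators for the formats the table states (`fingerprint` of at least 16 characters, `did` shaped like `did:zeroauth:<method>:<hex>`, `commitment` as hex). The allowed character class for `<method>`, and storing SHA-256(fingerprint) as hex, are assumptions; the function names are hypothetical:

```typescript
import { createHash } from "node:crypto";

// Assumption: <method> is lowercase alphanumeric; the ADR only fixes the overall shape.
const DID_RE = /^did:zeroauth:[a-z0-9]+:[0-9a-f]+$/;
const HEX_RE = /^[0-9a-f]+$/i;

interface PairDeviceBody {
  pair_code: string;
  fingerprint: string;
  attestation_kind?: string;
}

interface SubmitCommitmentBody {
  enroll_code: string;
  did: string;
  commitment: string;
  attestation_kind?: string;
}

// Step 1: reject short fingerprints; the server keeps only a SHA-256 digest.
function fingerprintHash(body: PairDeviceBody): string | null {
  if (typeof body.fingerprint !== "string" || body.fingerprint.length < 16) return null;
  return createHash("sha256").update(body.fingerprint, "utf8").digest("hex");
}

// Step 2: DID and commitment are stored as strings after a shape check only.
function isValidCommitmentSubmission(body: SubmitCommitmentBody): boolean {
  return DID_RE.test(body.did) && HEX_RE.test(body.commitment);
}
```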
87+
88+
The biometric NEVER touches a network wire. Source-grep guard `tests/biometric-rejection.test.ts` continues to block any handler reading `req.body.image/template/pixel/depth/frame/raw_face/raw_finger/biometric_data/photo`; the registration ingress is also defended at the JSON-body layer by `sanitizeProfile` (regex-stripped before insert) so a buggy tenant SDK that *does* pass one of those keys gets the key dropped, with a warn log, rather than committed to the row.
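A minimal sketch of the ingest-time strip. It assumes camelCase keys are split into snake_case first, that a "word boundary" is the start or end of the key or any non-letter (so `selfie_photo` is dropped but `photographer_id` is kept), and that nested objects are sanitised recursively; this `sanitizeProfile` is a hypothetical reimplementation, not the committed one. It returns the dropped paths so the caller can emit the warn log:

```typescript
// Banned terms from ADR 0023 (A-33).
const BANNED_TERMS = ["image", "template", "pixel", "depth", "frame", "raw_face", "raw_finger", "biometric", "photo"];

// Assumption: a boundary is the start/end of the key or any non-letter character.
const BANNED_KEY_RE = new RegExp(`(?:^|[^a-z])(?:${BANNED_TERMS.join("|")})(?:$|[^a-z])`);

type Json = null | boolean | number | string | Json[] | { [key: string]: Json };
type JsonObject = { [key: string]: Json };

function isPlainObject(v: Json): v is JsonObject {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// faceImage -> face_image, so camelCase keys are caught too.
function normaliseKey(key: string): string {
  return key.replace(/([a-z0-9])([A-Z])/g, "$1_$2").toLowerCase();
}

// Returns a copy without banned keys plus the dotted paths that were dropped.
function sanitizeProfile(profile: JsonObject, prefix = ""): { profile: JsonObject; dropped: string[] } {
  const out: JsonObject = {};
  const dropped: string[] = [];
  for (const [key, value] of Object.entries(profile)) {
    const path = prefix ? `${prefix}.${key}` : key;
    if (BANNED_KEY_RE.test(normaliseKey(key))) {
      dropped.push(path);
      continue;
    }
    if (isPlainObject(value)) {
      const nested = sanitizeProfile(value, path);
      out[key] = nested.profile;
      dropped.push(...nested.dropped);
    } else {
      out[key] = value;
    }
  }
  return { profile: out, dropped };
}
```

Dropping rather than rejecting keeps a buggy SDK's signup working while still keeping the raw value out of the row.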
89+
90+
### Audit-log surface
91+
92+
Five new actions, all routed through `appendAuditEvent` so they land in the ADR 0013 hash chain:
93+
94+
- `registration.started` — tenant SDK opened a session
95+
- `registration.device_paired` — actor_type='device', step 1 completed
96+
- `registration.commitment_submitted` — actor_type='device', step 2 completed
97+
- `registration.completed` — actor_type='device', step 3 completed, tenant_user created
98+
- `registration.abandoned` — tenant SDK or admin cancelled
99+
100+
The plaintext codes and the challenge_nonce never appear in audit metadata. Step-2 metadata records `commitmentPrefix` (first 16 chars) but not the full commitment — sufficient to forensically correlate without leaking the value into a log retention window broader than the tenant_users row.
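A minimal sketch of building the step-2 audit metadata so that only the 16-character commitment prefix is recorded and neither the plaintext code nor the full commitment is. Apart from `commitmentPrefix`, the field names are assumptions, as is the idea that the caller hands this object to `appendAuditEvent`:

```typescript
interface CommitmentSubmittedInput {
  sessionId: string;
  deviceId: string;
  enrollCode: string; // plaintext bearer code: must never be logged
  did: string;
  commitment: string;
}

// Metadata for `registration.commitment_submitted`. Fields are copied in
// explicitly (an allow-list), so nothing sensitive leaks in by default.
function commitmentSubmittedMetadata(input: CommitmentSubmittedInput): Record<string, string> {
  return {
    sessionId: input.sessionId,
    deviceId: input.deviceId,
    commitmentPrefix: input.commitment.slice(0, 16),
  };
}
```

An allow-list is used rather than deleting sensitive fields from a copy, so a field added to the input later does not silently reach the audit chain.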
101+
102+
### Backwards compatibility
103+
104+
This is purely additive — no existing route, no existing column, no existing test breaks. The new `registration_sessions` table is independent of `devices` and `tenant_users` apart from FK relationships (`device_id`, `tenant_user_id`) that are nullable until the relevant step lands. The existing `/v1/identity/register` and `/v1/identity/verify` continue to work for non-ceremony integrations (an SDK that wants to do its own session orchestration can call them directly).
105+
106+
## Alternatives considered
107+
108+
1. **One QR with embedded state machine** — the user's phone scans one QR, opens a websocket-like channel, and the server pushes step transitions over SSE. This saves two QR scans but loses the explicit "I agree to this step" UX moment, and it lets a stale phone hold an open session indefinitely. Rejected.
109+
110+
2. **WebSocket from phone to server** — strictly stronger than QR2 + QR3 (the server pushes everything once paired). But it requires the phone app to stay persistently connected, doesn't degrade gracefully if the phone goes offline mid-ceremony, and requires TLS termination on the load balancer for `wss://`. Defer until we have a measured pain point.
111+
112+
3. **Server-side biometric verification (skip the phone)** — abandons the privacy guarantee that makes ZeroAuth different from every legacy product. Out of scope.
113+
114+
4. **Native deep-link callback (universal links / app links)** — the phone opens a `https://zeroauth.dev/reg/...` URL that triggers the companion app and embeds session state in the URL. Strictly an *additional* surface on top of QR — useful when the operator is on the same device as the user (rare for this BFSI flow) but doesn't replace QR for the cross-device case. Add later as an alternate scheme.
115+
116+
5. **Single shared "registration_token" instead of three codes** — collapses the three SHA-256 columns into one. Loses the per-step confused-deputy defence. Rejected.
117+
118+
## Out of scope (deferred)
119+
120+
- **QR rendering in the dashboard.** Server returns the deeplink; the SDK / dashboard renders it as a QR. V1 ships the deeplink alone (no `qrcode` dep, matching ADR 0022's deferral). A follow-up commit adds the dep with its own ADR.
121+
- **Circuit-bound challenge nonce.** Phase 1 Sprint 4 circuit upgrade adds `publicSignals[1]` for the session challenge. The deeplink already carries the nonce; the route handler will assert circuit-side binding when the new circuit ships.
122+
- **Per-device token after enrollment.** Step 1 binds a `fingerprint_hash` to a device row; step 3 binds the device to a tenant_user. The device gets no long-lived bearer credential yet — that lands with the heartbeat protocol in Phase 1 Sprint 4 (same path as the ADR 0022 follow-on).
123+
- **Real-time SSE stream for the platform.** V1 has the tenant SDK poll `GET /v1/registrations/:id` every 2–3 seconds. SSE is a clean upgrade (the proof-pairing flow already does it) — defer until the BFSI demo reveals latency complaints.
124+
- **Per-tenant profile schema validation.** V1 accepts an opaque `profile` blob with biometric-key sanitisation only. Tenant-specific schemas (e.g., bank-onboarding requires PAN + Aadhaar masked digits + employee_code) come with the per-tenant `tenant.security_policy.registration_schema` JSON Schema validator in Phase 2.
125+
- **End-user-facing demo UI** in the dashboard. V1 ships backend only — the dashboard `Devices.tsx` redesign from ADR 0022 is the operator-facing surface. The 3-QR demo page (mirroring the existing `demo/QrProofLogin` route shape) lands in a follow-up commit.
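In V1 the tenant side learns about progress by polling. A minimal sketch of that loop, assuming a 2.5-second interval (inside the 2–3 second range stated above), that the GET response exposes the session's `state`, and that the caller supplies the HTTP call; `pollRegistration` and `FetchState` are hypothetical:

```typescript
type RegistrationState =
  | "awaiting_device"
  | "awaiting_commitment"
  | "awaiting_verification"
  | "completed"
  | "abandoned";

// Caller-supplied wrapper around GET /v1/registrations/:id.
type FetchState = (sessionId: string) => Promise<RegistrationState>;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls until a terminal state or the local deadline. onChange fires once per
// observed state change, e.g. so the page can swap QR1 -> QR2 -> QR3.
async function pollRegistration(
  sessionId: string,
  fetchState: FetchState,
  onChange: (s: RegistrationState) => void,
  intervalMs = 2500,
  timeoutMs = 30 * 60 * 1000, // whole-session TTL
): Promise<RegistrationState> {
  const deadline = Date.now() + timeoutMs;
  let last: RegistrationState | null = null;
  while (Date.now() < deadline) {
    const state = await fetchState(sessionId);
    if (state !== last) {
      last = state;
      onChange(state);
    }
    if (state === "completed" || state === "abandoned") return state;
    await sleep(intervalMs);
  }
  return "abandoned"; // assumption: treat a local timeout like server-side TTL expiry
}
```

Swapping this loop for an SSE subscription later would leave `onChange` as the only integration point the page depends on.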
126+
127+
## Verification
128+
129+
- `tests/registration-flow.test.ts` — 19 tests across the four service-layer entry points and their failure modes; mocked pg pool so no Postgres is required.
130+
- `tests/tenant-isolation.test.ts` — three new PUBLIC_ROUTE_EXCEPTIONS entries with reason strings ≥ 20 chars.
131+
- `tests/schema-purity.test.ts` — `registration_sessions` added to both TENANT_SCOPED_TABLES (biometric-name guard applies) and KNOWN_TABLES (new-table guard satisfied).
132+
- `npm test` — 524/524 across 44 suites.
133+
- `npx tsc --noEmit` — clean.
134+
135+
## Threat model deltas
136+
137+
New rows added to `docs/threat_model.md` (Phase 1 sprint 2 update batch):
138+
139+
- **A-30** — Captured QR1 / QR2 / QR3 replay. Mitigation: per-step single-use SHA-256-hashed code, 15-min TTL, per-IP rate-limit, three separate code columns block cross-step reuse.
140+
- **A-31** — Hostile phone enrolls into another user's session by guessing the pair_code. Mitigation: 38-bit code entropy × 15-min TTL × 20 req/min/IP rate-limit ≈ 225,000× window-length brute-force cost. Same calibration as ADR 0022.
141+
- **A-32** — Replayed proof from another session. V1 mitigation: single-use verify_code chain + 15-min TTL. Phase 1 Sprint 4 closure: circuit-bound challenge nonce in `publicSignals[1]`.
142+
- **A-33** — Tenant SDK passes a raw biometric in the `profile` blob. Mitigation: `sanitizeProfile` strips any field name containing image/template/pixel/depth/frame/raw_face/raw_finger/biometric/photo (with word-boundary matching) at ingest; warn-logged.
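Part of the A-31 arithmetic can be reproduced from parameters stated in this commit: 8 code characters from a 27-symbol alphabet (per the API contract), a 15-minute TTL, and 20 requests per minute per IP. This sketch computes the code space, its size in bits, and the chance that one IP guesses one specific outstanding code within one TTL window; it covers only the per-window figures, not the aggregate cost multiple quoted in A-31:

```typescript
const ALPHABET_SIZE = 27; // Crockford-derived alphabet per the API contract
const CODE_CHARS = 8; // ZA-XXXX-XXXX carries 8 random characters
const TTL_MINUTES = 15;
const RATE_PER_MINUTE = 20; // phone-side limit, per IP

const codeSpace = ALPHABET_SIZE ** CODE_CHARS; // distinct possible codes
const entropyBits = CODE_CHARS * Math.log2(ALPHABET_SIZE);
const guessesPerWindow = TTL_MINUTES * RATE_PER_MINUTE; // attempts one IP gets before the code expires
const pSuccessPerWindow = guessesPerWindow / codeSpace; // distinct guesses against one uniformly random code

console.log(codeSpace, entropyBits.toFixed(2), guessesPerWindow, pSuccessPerWindow.toExponential(2));
// prints: 282429536481 38.04 300 1.06e-9
```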
143+
144+
The full threat-model table is updated alongside this commit's API contract changes; this ADR captures the four deltas.

docs/api_contract.md

Lines changed: 28 additions & 0 deletions
@@ -70,6 +70,34 @@ Console-side device endpoints (require console JWT):
7070

7171
Enrollment code format: `ZA-XXXX-XXXX`, 8 entropy chars from a 27-symbol Crockford-base32 alphabet (no `0`, `1`, `I`, `L`, `O`, `U`). The deeplink format is `zeroauth://enroll?code=<code>` and is stable across V1.
7272

73+
### Central API — end-user registration ceremony (`/v1/registrations`)
74+
75+
The three-QR end-user signup flow. See [ADR 0023](../adr/0023-three-qr-signup-ceremony.md) for design + state machine + threat-model deltas. The biometric never touches the server side; only the Poseidon commitment (step 2) and the Groth16 proof (step 3) do.
76+
77+
| Method | Path | Auth | Purpose |
78+
|---|---|---|---|
79+
| `POST` | `/v1/registrations` | `users:write` | Open a session. Body: `{ profile?: object }`. Returns `{ session, pair: { code, expires_at, deeplink } }`. Render `pair.deeplink` as QR1. |
80+
| `GET` | `/v1/registrations/:id` | `users:read` | Poll state. Response redacts all code hashes + challenge nonce. |
81+
| `DELETE` | `/v1/registrations/:id` | `users:write` | Abandon (idempotent). Voids outstanding codes; row retained for audit. |
82+
| `POST` | `/v1/registrations/pair-device` | **none — `pair_code` is bearer** | Step 1. Body: `{ pair_code, fingerprint, attestation_kind? }`. Phone scans QR1. Server claims a device row (reuses ADR 0022 fingerprint binding), attaches to session, mints `enroll_code` for step 2. Returns `{ session_id, device_id, next: { step: 'enroll', code, expires_at, deeplink } }`. |
83+
| `POST` | `/v1/registrations/submit-commitment` | **none — `enroll_code` is bearer** | Step 2. Body: `{ enroll_code, did, commitment, attestation_kind? }`. Phone scans QR2 after capturing biometric locally. Server stores (did, commitment), mints `verify_code` + `challenge_nonce` for step 3. Returns `{ session_id, next: { step: 'verify', code, expires_at, deeplink, challenge_nonce } }`. |
84+
| `POST` | `/v1/registrations/complete` | **none — `verify_code` is bearer** | Step 3. Body: `{ verify_code, challenge_nonce, proof, public_signals }`. Phone scans QR3, re-captures biometric, produces Groth16 proof. Server asserts `challenge_nonce` matches, asserts `publicSignals[0]` equals stored commitment, verifies proof off-chain, creates `tenant_user`. Returns `{ session_id, tenant_user, device }`. |
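The step-3 row lists the server-side checks in order before the `tenant_user` insert. A minimal sketch of that ordering, assuming the session row was already located through its `verify_code_hash`, that the stored commitment is hex while `public_signals` are decimal field-element strings (so both are normalised through `BigInt`), and that the Groth16 verifier is injected; in a snarkjs-based service it could wrap `snarkjs.groth16.verify(vKey, publicSignals, proof)`, which resolves to a boolean. `completeStep3` and its types are hypothetical:

```typescript
import { timingSafeEqual } from "node:crypto";

interface Step3Body {
  verify_code: string;
  challenge_nonce: string;
  proof: unknown;
  public_signals: string[];
}
interface PendingSession {
  verify_challenge_nonce: string;
  commitment: string; // hex, as stored at step 2
}
type VerifyProof = (publicSignals: string[], proof: unknown) => Promise<boolean>;

// Every failure collapses to one reason so the route can answer 404 verify_failed.
type Step3Result = { ok: true } | { ok: false; error: "verify_failed" };

function sameHexNonce(a: string, b: string): boolean {
  const ba = Buffer.from(a, "hex");
  const bb = Buffer.from(b, "hex");
  return ba.length > 0 && ba.length === bb.length && timingSafeEqual(ba, bb);
}

// Assumption: hex commitment vs decimal public signal, compared as integers.
function commitmentMatches(storedHex: string, signal: string | undefined): boolean {
  if (signal === undefined || !/^[0-9]+$/.test(signal) || !/^[0-9a-f]+$/i.test(storedHex)) return false;
  return BigInt("0x" + storedHex) === BigInt(signal);
}

async function completeStep3(session: PendingSession, body: Step3Body, verifyProof: VerifyProof): Promise<Step3Result> {
  const fail: Step3Result = { ok: false, error: "verify_failed" };
  // 1. The echoed nonce must be the one minted after step 2.
  if (!sameHexNonce(session.verify_challenge_nonce, body.challenge_nonce)) return fail;
  // 2. publicSignals[0] must equal the commitment stored at step 2.
  if (!commitmentMatches(session.commitment, body.public_signals[0])) return fail;
  // 3. Only then run the comparatively expensive proof verification.
  if (!(await verifyProof(body.public_signals, body.proof))) return fail;
  // 4. The caller now creates the tenant_user row and marks the session completed.
  return { ok: true };
}
```

Running the cheap string checks first means a replayed or mismatched request is rejected before any pairing-based verification work is spent on it.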
85+
86+
State machine: `awaiting_device → awaiting_commitment → awaiting_verification → completed` (or `abandoned`). Whole-session TTL is 30 min; each code's TTL is 15 min. Phone-side endpoints are rate-limited at 20 req/min per IP via `pgRateLimit`.
87+
88+
Failure-mode surface (uniform envelopes to defeat enumeration):
89+
90+
| Code | When |
91+
|---|---|
92+
| `400 invalid_request` | Required field missing or malformed at the JSON layer. |
93+
| `404 pair_failed` | Step 1: unknown / expired pair_code, invalid fingerprint, session expired. |
94+
| `404 enroll_failed` | Step 2: unknown / expired enroll_code, wrong session state. |
95+
| `404 verify_failed` | Step 3: unknown / expired verify_code, challenge mismatch, commitment mismatch, proof verification failed. |
96+
| `404 session_not_found` | Tenant poll: id does not exist in this tenant/environment. |
97+
| `429` | Phone-side rate-limit (20/min/IP) exceeded. |
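The point of the uniform envelopes is that a phone-side caller cannot tell *why* a step failed. A minimal sketch of collapsing internal reasons into the public codes in the table; the internal reason names, the `{ error }` body shape, and the `rate_limited` body for 429 are assumptions, while the status codes and the `*_failed` / `invalid_request` names come from the table:

```typescript
type PhoneStep = "pair" | "enroll" | "verify";

// Hypothetical internal reasons, kept for server-side logs only.
type InternalReason =
  | "code_unknown"
  | "code_expired"
  | "bad_fingerprint"
  | "session_expired"
  | "wrong_state"
  | "challenge_mismatch"
  | "commitment_mismatch"
  | "proof_invalid";

type Failure =
  | { kind: "malformed" }
  | { kind: "rate_limited" }
  | { kind: "step_failed"; step: PhoneStep; reason: InternalReason };

interface ErrorEnvelope {
  status: number;
  body: { error: string };
}

// Every reason inside a step maps to the same 404 body for that step.
function toPublicError(f: Failure): ErrorEnvelope {
  switch (f.kind) {
    case "malformed":
      return { status: 400, body: { error: "invalid_request" } };
    case "rate_limited":
      return { status: 429, body: { error: "rate_limited" } };
    case "step_failed":
      return { status: 404, body: { error: `${f.step}_failed` } };
    default: {
      const unreachable: never = f;
      throw new Error(`unhandled failure ${JSON.stringify(unreachable)}`);
    }
  }
}
```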
98+
99+
The deeplink schema is `zeroauth://reg?step=<pair|enroll|verify>&session=<uuid>&code=<code>[&challenge=<hex>]` and is stable across V1.
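A minimal sketch of building and parsing that deeplink with the WHATWG `URL` and `URLSearchParams` classes available in Node and browsers. It assumes `challenge` is present exactly when `step=verify` (the step-2 response is where `challenge_nonce` appears); `buildDeeplink` and `parseDeeplink` are hypothetical helpers, not a published SDK API:

```typescript
type DeeplinkStep = "pair" | "enroll" | "verify";

interface RegDeeplink {
  step: DeeplinkStep;
  session: string;
  code: string;
  challenge?: string;
}

function buildDeeplink(link: RegDeeplink): string {
  const params = new URLSearchParams({ step: link.step, session: link.session, code: link.code });
  if (link.challenge !== undefined) params.set("challenge", link.challenge);
  return `zeroauth://reg?${params.toString()}`;
}

// Returns null for anything that is not a well-formed registration deeplink.
function parseDeeplink(raw: string): RegDeeplink | null {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return null;
  }
  if (url.protocol !== "zeroauth:" || url.host !== "reg") return null;
  const step = url.searchParams.get("step");
  const session = url.searchParams.get("session");
  const code = url.searchParams.get("code");
  const challenge = url.searchParams.get("challenge") ?? undefined;
  if (step !== "pair" && step !== "enroll" && step !== "verify") return null;
  if (!session || !code) return null;
  // Assumption: challenge appears on the verify step and nowhere else.
  if ((step === "verify") !== (challenge !== undefined)) return null;
  return challenge === undefined ? { step, session, code } : { step, session, code, challenge };
}
```

Checking the host as well as the scheme keeps an ADR 0022 device-enrollment link (`zeroauth://enroll?code=...`) from being misread as a registration step.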
100+
73101
### Central API — users (`/v1/users`)
74102

75103
| Method | Path | Scope | Description |

src/routes/v1/index.ts

Lines changed: 8 additions & 1 deletion
@@ -9,6 +9,7 @@ import verificationRoutes from './verifications';
import attendanceRoutes from './attendance';
import auditRoutes from './audit';
import proofPairingRoutes from './proof-pairing';
+import registrationRoutes from './registrations';

const router = Router();

@@ -23,8 +24,13 @@ const router = Router();
* /v1/attendance/* — Check-in / check-out events
* /v1/audit/* — Business audit log
* /v1/proof-pairing/* — QR-mediated cross-device proof pairing (W3, ADR-0009)
+* /v1/registrations/* — Three-QR end-user signup ceremony (ADR-0023)
*
-* All routes require: Authorization: Bearer za_live_xxx
+* Most routes require: Authorization: Bearer za_live_xxx — except
+* the phone-side handshake endpoints (registrations/pair-device,
+* /submit-commitment, /complete) where the QR-supplied code is the
+* bearer credential. Those routes are listed in
+* tests/tenant-isolation.test.ts PUBLIC_ROUTE_EXCEPTIONS.
*/
router.use('/auth/zkp', zkpRoutes);
router.use('/auth/saml', samlRoutes);
@@ -36,5 +42,6 @@ router.use('/verifications', verificationRoutes);
router.use('/attendance', attendanceRoutes);
router.use('/audit', auditRoutes);
router.use('/proof-pairing', proofPairingRoutes);
+router.use('/registrations', registrationRoutes);

export default router;
