feat(crypto): epoch key foundation — epoch keys, history chain, escrow, device seal, K_local (CAP-58) #235

Open
cvince wants to merge 5 commits into main from feat/cap-58-epoch-crypto


cvince commented Jun 10, 2026

Pure cryptographic foundation for the epoch key model — cryptographic kick revocation without UX change (CAP-58, design: monorepo docs/epoch-key-design.md).

Additive and inert. New modules only; no existing behavior changes; nothing is wired into a command flow yet. This PR is the reviewable, fully-tested crypto core that the flow integration (invite/redeem/kick/transport/run) builds on next.

Modules

  • crypto/epochCrypto.ts: generateEpochKey (CSPRNG, derived from nothing), deriveEpoch0 (the one M-derived migration epoch), deriveProjectKey(E_e, projectId, orgId), the backward history chain (org-wide + per-project, confined to the project), escrow blobs AES-GCM(E_e, HKDF(M,"escrow",e)), and snapshotAAD binding {orgId, projectId, epoch} (extends the AAD scheme of #233, "crypto: bind AAD on master-key AEAD wrapping (CAP-57)", to the data layer).
  • crypto/deviceKey.ts: per-user X25519 device keypair + box-style seal/open, built on native node:crypto only (no native addons, which respects the pkg-binary constraint). Fresh ephemeral key per seal; shared secret mixed with both public keys via HKDF.
  • crypto/localKeyRoot.ts: K_local, the device-local inner-wrap root that replaces the service-computable SHA256(userId:orgId), plus the two HKDF inner-key derivations (capy:inner:epoch, capy:inner:device).
  • config/globalConfig.ts: storage for K_local (local.key) and the double-wrapped device private key (device.enc) under orgs/<org>/users/<user>/, the recovery-equivalent area that capy logout never wipes (verified: logout only deletes the auth session file).
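For reviewers who want a concrete picture of the box-style seal/open described for crypto/deviceKey.ts, here is a minimal sketch using only node:crypto. It is not the PR's code: the function names `seal`/`open`, the HKDF info label, the use of AES-256-GCM as the AEAD, and the `ephPub | iv | tag | ciphertext` layout are all assumptions made for illustration.

```typescript
import {
  createCipheriv, createDecipheriv, createPublicKey, diffieHellman,
  generateKeyPairSync, hkdfSync, randomBytes, KeyObject,
} from "node:crypto";

// Hypothetical HKDF info label; the PR does not state the real one.
const INFO = "example:device-seal";

// SPKI DER encoding of an X25519 public key (always 44 bytes).
const pubBytes = (k: KeyObject): Buffer =>
  k.export({ type: "spki", format: "der" }) as Buffer;

// Mix the raw shared secret with both public keys via HKDF.
function sealKey(shared: Buffer, ephPub: Buffer, rcptPub: Buffer): Buffer {
  return Buffer.from(
    hkdfSync("sha256", shared, Buffer.concat([ephPub, rcptPub]), INFO, 32),
  );
}

export function seal(recipientPub: KeyObject, plaintext: Buffer): Buffer {
  const eph = generateKeyPairSync("x25519"); // fresh ephemeral key per seal
  const ephPub = pubBytes(eph.publicKey);
  const shared = diffieHellman({ privateKey: eph.privateKey, publicKey: recipientPub });
  const key = sealKey(shared, ephPub, pubBytes(recipientPub));
  const iv = randomBytes(12);
  const c = createCipheriv("aes-256-gcm", key, iv);
  const ct = Buffer.concat([c.update(plaintext), c.final()]);
  // Assumed layout: ephPub (44) | iv (12) | tag (16) | ciphertext
  return Buffer.concat([ephPub, iv, c.getAuthTag(), ct]);
}

export function open(recipientPriv: KeyObject, sealed: Buffer): Buffer {
  const ephPub = sealed.subarray(0, 44);
  const iv = sealed.subarray(44, 56);
  const tag = sealed.subarray(56, 72);
  const ct = sealed.subarray(72);
  const ephKey = createPublicKey({ key: ephPub, format: "der", type: "spki" });
  const shared = diffieHellman({ privateKey: recipientPriv, publicKey: ephKey });
  const key = sealKey(shared, ephPub, pubBytes(createPublicKey(recipientPriv)));
  const d = createDecipheriv("aes-256-gcm", key, iv);
  d.setAuthTag(tag);
  // final() throws if the blob was tampered with or sealed to another key.
  return Buffer.concat([d.update(ct), d.final()]);
}
```

Because the ephemeral public key travels inside the blob, the recipient needs only its own private key to open it; a different recipient derives a different AEAD key and fails authentication.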

Why K_local

Pre-implementation review found the original ADR-12 fix circular: the legacy inner-wrap key SHA256(userId:orgId) is computable by the service (which strips the KMS outer layer on every co-decrypt), and re-keying the inner layer with the device private key didn't help because that key was wrapped the same way. K_local is 32 CSPRNG bytes, per machine, never transmitted, never derivable from identifiers — so the inner layer is finally opaque to the service. Full rationale in the monorepo design doc §4 + ADR-12 (amended) and CAP-58.
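The difference between the two inner-wrap roots can be sketched as follows. The SHA256(userId:orgId) form and the two info labels come from this description; the function names, the empty HKDF salt, and the SHA-256 HKDF hash are assumptions, not the module's actual API.

```typescript
import { createHash, hkdfSync, randomBytes } from "node:crypto";

// Legacy inner-wrap key: a pure function of identifiers the service already
// knows, so anyone holding userId and orgId can recompute it.
export function legacyInnerKey(userId: string, orgId: string): Buffer {
  return createHash("sha256").update(`${userId}:${orgId}`).digest();
}

// K_local: 32 CSPRNG bytes minted on the machine and never transmitted.
export const mintLocalKey = (): Buffer => randomBytes(32);

// Inner keys derived from K_local. The labels are from the PR text; the
// empty salt and the hash choice are assumptions of this sketch.
export function innerKey(kLocal: Buffer, purpose: "epoch" | "device"): Buffer {
  return Buffer.from(
    hkdfSync("sha256", kLocal, Buffer.alloc(0), `capy:inner:${purpose}`, 32),
  );
}
```

The legacy key needs no secret input at all, while the K_local-derived keys are deterministic only for a holder of the same 32 random bytes.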

Tests (25, bun)

History walk E_3→E_2→E_1 + forward-walk-impossible; per-project chain confinement; escrow round-trip + epoch binding + wrong-M failure; device seal/open + tamper + wrong-recipient; K_local determinism/isolation; AAD binding. Plus the two canonical regression guards — inner key NOT derivable from public identifiers and service-view blindness — and a full kick-lifecycle composition that proves e2e scenarios 2 (kick blocks future) and 5 (owner break-glass) directly at the crypto layer.
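One way a backward-only history chain can work is sketched below: the link for epoch e stores E_{e-1} encrypted under a key derived from E_e, so a holder of E_3 can recover E_2 and then E_1, but a holder of E_1 learns nothing about later epochs. This construction, the names `makeLink`/`openLink`/`walkBack`, the HKDF label, and the blob layout are assumptions for illustration; the PR does not spell out the actual chain format.

```typescript
import { createCipheriv, createDecipheriv, hkdfSync, randomBytes } from "node:crypto";

// Hypothetical label for the chain-wrapping key.
const chainKey = (epochKey: Buffer): Buffer =>
  Buffer.from(hkdfSync("sha256", epochKey, Buffer.alloc(0), "example:history", 32));

// Link for epoch e: E_{e-1} encrypted under a key derived from E_e.
export function makeLink(current: Buffer, previous: Buffer, epoch: number): Buffer {
  const iv = randomBytes(12);
  const c = createCipheriv("aes-256-gcm", chainKey(current), iv);
  c.setAAD(Buffer.from(`epoch:${epoch}`)); // bind the link to its epoch number
  const ct = Buffer.concat([c.update(previous), c.final()]);
  return Buffer.concat([iv, c.getAuthTag(), ct]); // iv | tag | ciphertext
}

export function openLink(current: Buffer, epoch: number, link: Buffer): Buffer {
  const d = createDecipheriv("aes-256-gcm", chainKey(current), link.subarray(0, 12));
  d.setAAD(Buffer.from(`epoch:${epoch}`));
  d.setAuthTag(link.subarray(12, 28));
  return Buffer.concat([d.update(link.subarray(28)), d.final()]);
}

// Walk backward from epoch `from` for as long as links exist.
export function walkBack(
  key: Buffer,
  from: number,
  links: Map<number, Buffer>,
): Map<number, Buffer> {
  const keys = new Map<number, Buffer>([[from, key]]);
  for (let e = from; links.has(e); e--) {
    key = openLink(key, e, links.get(e)!);
    keys.set(e - 1, key);
  }
  return keys;
}
```

Since each link is encrypted under the newer key, there is no operation that takes an older epoch key to a newer one; the forward direction fails at the AEAD check.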

Full CLI suite green except 2 pre-existing runCommand deployed-mode failures present on main (unrelated). typecheck + build clean.

Not in this PR (next steps)

Command-flow wiring (invite wraps E_e; redeem/transport/recover mint K_local + device keypair; kick epoch-bump transaction; per-run transparent re-key; legacy→K_local migration; keep.json epoch tagging), and the version bump — these are UX-facing and depend on the monorepo service PR being deployed for live testing, so they land in a follow-up after hands-on QA.

⚠️ Do not merge — the user merges.

cvince added 5 commits June 9, 2026 18:40
…w, device seal, K_local (CAP-58)

Pure crypto layer for cryptographic kick revocation (docs/epoch-key-design.md).
Additive and inert: new modules, no existing behavior changes, not yet wired into
any command flow. The flow integration (invite/redeem/kick/transport/run) lands
separately and is tested against the deployed service.

- crypto/epochCrypto.ts: generateEpochKey (CSPRNG), deriveEpoch0 (M-derived
  migration epoch), deriveProjectKey(E_e, ...), backward history chain
  (org-wide + per-project, confined), escrow blobs (HKDF(M,"escrow",e)),
  snapshotAAD binding {orgId, projectId, epoch}
- crypto/deviceKey.ts: X25519 device keypair + box-style seal/open built on
  native node:crypto only (no native addons — pkg binary constraint)
- crypto/localKeyRoot.ts: K_local inner-wrap root (replaces the
  service-computable SHA256(userId:orgId)) + the two HKDF inner-key derivations
- config/globalConfig.ts: K_local + device-key storage under
  orgs/<org>/users/<user>/ (the logout-safe, recovery-equivalent area)
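The escrow blob can be read as "E_e encrypted with AES-GCM under a key derived from M and the epoch number". A minimal sketch under that reading follows; how "escrow" and e are actually encoded into HKDF is not stated here, so the info string, the empty salt, the helper names, and the blob layout are assumptions.

```typescript
import { createCipheriv, createDecipheriv, hkdfSync, randomBytes } from "node:crypto";

// Assumption: "escrow" and the epoch number are folded into the HKDF info.
const escrowKey = (m: Buffer, epoch: number): Buffer =>
  Buffer.from(hkdfSync("sha256", m, Buffer.alloc(0), `escrow:${epoch}`, 32));

// Encrypt the epoch key E_e so that any holder of M can recover it.
export function sealEscrow(m: Buffer, epoch: number, epochKey: Buffer): Buffer {
  const iv = randomBytes(12);
  const c = createCipheriv("aes-256-gcm", escrowKey(m, epoch), iv);
  const ct = Buffer.concat([c.update(epochKey), c.final()]);
  return Buffer.concat([iv, c.getAuthTag(), ct]); // iv | tag | ciphertext
}

// Throws if M is wrong or the blob is presented for a different epoch,
// because either change yields a different wrapping key.
export function openEscrow(m: Buffer, epoch: number, blob: Buffer): Buffer {
  const d = createDecipheriv("aes-256-gcm", escrowKey(m, epoch), blob.subarray(0, 12));
  d.setAuthTag(blob.subarray(12, 28));
  return Buffer.concat([d.update(blob.subarray(28)), d.final()]);
}
```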

Tests (bun, 25 cases): history walk + forward-walk-impossible, per-project
confinement, escrow round-trip + epoch binding, device seal/open + tamper,
K_local determinism, AAD binding, plus two regression guards — inner key NOT
derivable from public identifiers, and service-view blindness — and a full
kick-lifecycle composition proving e2e scenarios 2 (kick blocks future) and 5
(owner break-glass) at the crypto layer.
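The AAD-binding case can be illustrated with a short sketch: the snapshot ciphertext is authenticated against {orgId, projectId, epoch}, so it fails to decrypt when replayed under a different context. The JSON encoding of the AAD, the `encryptSnapshot`/`decryptSnapshot` helpers, and the blob layout are assumptions for illustration; only the name snapshotAAD and the three bound fields come from the commit message.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

export interface SnapshotContext {
  orgId: string;
  projectId: string;
  epoch: number;
}

// Assumed encoding: a JSON array in fixed field order.
export const snapshotAAD = (c: SnapshotContext): Buffer =>
  Buffer.from(JSON.stringify([c.orgId, c.projectId, c.epoch]));

export function encryptSnapshot(key: Buffer, ctx: SnapshotContext, pt: Buffer): Buffer {
  const iv = randomBytes(12);
  const c = createCipheriv("aes-256-gcm", key, iv);
  c.setAAD(snapshotAAD(ctx));
  const ct = Buffer.concat([c.update(pt), c.final()]);
  return Buffer.concat([iv, c.getAuthTag(), ct]); // iv | tag | ciphertext
}

// Throws unless the caller supplies the same context used at encryption.
export function decryptSnapshot(key: Buffer, ctx: SnapshotContext, blob: Buffer): Buffer {
  const d = createDecipheriv("aes-256-gcm", key, blob.subarray(0, 12));
  d.setAAD(snapshotAAD(ctx));
  d.setAuthTag(blob.subarray(12, 28));
  return Buffer.concat([d.update(blob.subarray(28)), d.final()]);
}
```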
… (CAP-58)

Retire the service-computable inner-wrap key in favor of K_local at every
key.enc unwrap/wrap site. M is still the data root here (no epoch machinery yet)
— this is the standalone "close the co-decrypt exposure" half of CAP-58.

- keyResolver: new unwrapMasterKey() shared by resolveProjectKey + invite +
  transport. Tries K_local first, falls back to legacy SHA256 (self-heals a
  split-brain), and transparently re-wraps onto a freshly-minted K_local.
  wrapAndSaveMasterKey now mints K_local and wraps under HKDF(K_local).
- inviteCommand / transportCommand: use unwrapMasterKey instead of the inline
  legacy-key unwrap, so an already-migrated key.enc resolves correctly. K_local
  never enters the invite/transport payload (those re-wrap M under their token).
- tests: keyResolver migration test (legacy blob -> K_local re-wrap, no longer
  opens with the legacy key); recoverKdf helper reads K_local to unwrap the
  blob recover now writes.
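The "try K_local, fall back to legacy, re-wrap" behavior described for unwrapMasterKey can be sketched as below. This is a simplified stand-in, not the keyResolver code: it ignores the KMS outer layer, models the on-disk files as an in-memory `LocalState` object, uses a made-up HKDF label, and reuses an existing K_local in the split-brain case. All of those are assumptions.

```typescript
import { createCipheriv, createDecipheriv, createHash, hkdfSync, randomBytes } from "node:crypto";

export const wrap = (key: Buffer, pt: Buffer): Buffer => {
  const iv = randomBytes(12);
  const c = createCipheriv("aes-256-gcm", key, iv);
  const ct = Buffer.concat([c.update(pt), c.final()]);
  return Buffer.concat([iv, c.getAuthTag(), ct]); // iv | tag | ciphertext
};

export const unwrap = (key: Buffer, blob: Buffer): Buffer => {
  const d = createDecipheriv("aes-256-gcm", key, blob.subarray(0, 12));
  d.setAuthTag(blob.subarray(12, 28));
  return Buffer.concat([d.update(blob.subarray(28)), d.final()]);
};

export const legacyKey = (userId: string, orgId: string): Buffer =>
  createHash("sha256").update(`${userId}:${orgId}`).digest();

// Hypothetical label; the commit only says "wraps under HKDF(K_local)".
export const localWrapKey = (kLocal: Buffer): Buffer =>
  Buffer.from(hkdfSync("sha256", kLocal, Buffer.alloc(0), "example:inner:master", 32));

// Hypothetical in-memory stand-in for local.key and key.enc.
export interface LocalState {
  kLocal?: Buffer;
  keyEnc: Buffer;
}

export function unwrapMasterKey(s: LocalState, userId: string, orgId: string): Buffer {
  if (s.kLocal) {
    try {
      return unwrap(localWrapKey(s.kLocal), s.keyEnc);
    } catch {
      // Split-brain: K_local exists but key.enc is still legacy-wrapped.
    }
  }
  const m = unwrap(legacyKey(userId, orgId), s.keyEnc); // throws if neither key opens
  if (!s.kLocal) s.kLocal = randomBytes(32); // mint K_local on first migration
  s.keyEnc = wrap(localWrapKey(s.kLocal), m); // transparent re-wrap
  return m;
}
```

After the first successful call on a legacy blob, the stored blob no longer opens with the identifier-derived key, which is the property the migration test checks.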

Validated end-to-end: full e2e harness (init/invite/redeem/sync/kick/transport/
recover) green except one pre-existing protected-branch role-propagation flake
that fails identically with and without these changes. Full CLI unit suite green.

…s (CAP-58)

Increment A of the epoch machinery — device-key plumbing. Additive; data
encryption is still M-derived (the E_e switch lands in the next increment).

- serviceClient: getEpoch, registerDevice, listDevices, stage/commitEpoch,
  getEpochHistory, getEpochEscrows, backfillEscrows, getSealedBlobs; listMembers
  now surfaces the per-user device_keys map.
- crypto/deviceManager.ts: ensureDeviceKey (mint X25519 keypair, double-wrap the
  private key under HKDF(K_local) + KMS, register the public key — idempotent,
  self-healing) and loadDevicePrivateKey (co-decrypt + K_local unwrap).
- redeem: registers this machine's device keypair after the key is saved
  (best-effort — never blocks redeem).

Tests: deviceManager round-trip (mint/load, a blob sealed to the registered
pubkey opens with the recovered key, idempotency). Full CLI unit suite green
(462). e2e green except the known pre-existing WorkOS kick-propagation flake
("kicked user cannot re-use invite code"), whose code path this increment does
not touch (device reg runs after the co-decrypt 403 that aborts that path).

No version bump (VERSION stays 0.6.1); release happens after e2e is green.

Per CAP-58's migration decision (confirmed by Vince): existing ciphertext is
encrypted under deriveProjectKey(M, …), so epoch 0 must use M directly for
legacy data to read without an O(data) re-encryption. deriveEpoch0 now returns
M (a copy), and the test asserts E_0 == M. Backward-compat is covered by a new
e2e scenario in the monorepo (legacy SHA256-wrapped key.enc migrates to K_local).
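The reasoning is that if E_0 equals M, then deriveProjectKey(E_0, …) equals deriveProjectKey(M, …) and legacy ciphertext stays readable without re-encryption. A minimal sketch of that property follows. The function names match the PR, but the bodies are assumptions: the real deriveProjectKey may use a different KDF, salt, or info encoding.

```typescript
import { hkdfSync, randomBytes } from "node:crypto";

// Epoch 0 is M itself, returned as a copy so callers can zero or mutate
// their buffer without touching the original.
export function deriveEpoch0(m: Buffer): Buffer {
  return Buffer.from(m);
}

// Hypothetical derivation: HKDF-SHA256 with the identifiers in the info string.
export function deriveProjectKey(epochKey: Buffer, projectId: string, orgId: string): Buffer {
  const info = `example:project:${orgId}:${projectId}`;
  return Buffer.from(hkdfSync("sha256", epochKey, Buffer.alloc(0), info, 32));
}

const m = randomBytes(32);
const e0 = deriveEpoch0(m);

// Same bytes, different buffer: the project key is unchanged at epoch 0.
console.log(e0.equals(m), e0 !== m);
console.log(deriveProjectKey(e0, "p1", "o1").equals(deriveProjectKey(m, "p1", "o1")));
```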

No version bump.

… (CAP-58)

The crypto + resolution plumbing for epoch-keyed data. Behaviorally a NO-OP
today: with no kick bumping the org past epoch 0, the current epoch key IS M,
so resolveProjectKey derives exactly as before. Validated: full unit suite
(462) + e2e green (only the known WorkOS-timing flake).

- crypto/epochManager.ts: getCurrentEpochKey (membership-gate via unwrapMasterKey
  first, then catch up), ensureCurrentEpoch (recover the current epoch key from
  M + escrow — works for any M-holder; kicked users can't fetch the post-kick
  escrow), refreshEpoch (device-sealed end-state path), bumpEpoch (the kick
  transaction — built + ready, not yet called).
- config/globalConfig.ts: epoch.enc storage (EpochKeyRecord, readLocalEpoch).
- keyResolver: KeyServiceOps gains optional getEpoch/getEpochEscrows;
  resolveProjectKey routes through getCurrentEpochKey (epoch 0 = M).
- 6 command ops builders pass the epoch methods.
- kickCommand: bump intentionally NOT wired — see NOTE. Activating it needs
  per-snapshot epoch tagging so data pushed under an older epoch stays readable
  after a bump (cross-epoch reads). That's the next increment.

No version bump.