Skip to content

chore: use local database state for role management#2916

Open
tgmendes wants to merge 13 commits into
mainfrom
use-local-tables-over-workos-fZao
Open

chore: use local database state for role management#2916
tgmendes wants to merge 13 commits into
mainfrom
use-local-tables-over-workos-fZao

Conversation

@tgmendes
Copy link
Copy Markdown
Contributor

@tgmendes tgmendes commented May 19, 2026

NOTE

I went through multiple rounds of reviews between me, claude and codex - so pretty confident about these changes.

Some things (Such as workOS outbox hardening) I'm deferring for later because otherwise this becomes an even large PR.

What

This PR finalizes the work around WorkOS sync by treating our database as source of truth. We are now using role URNs for principals (rather than slugs). Because this is a breaking change, we need to support the dual path for now (I'll work on a script to backfill so we can drop this multi support).

Key changes:

  1. I've extracted all role management access concerns from living inside impl.go to a dedicated manager. This makes it reusable (probably not going to be needed) but most of all separates the concerns between normal endpoint handling and actual role management (impl was becoming quite hard to follow)
  2. Any role read operation comes from database directly. In Authz we got rid of the caching as we now read from DB that has proper indexes for fast search
  3. Write operations write to database first and workOS last, using a retry mechanism.

Out of scope / follow-ups

  • Legacy role:<slug> backfill (AGE-1954): This PR introduces a dual-read/dual-delete for legacy slug-based principal URNs and canonical role:<kind>:<uuid> URNs. The backfill script that moves existing principal_grants rows from role:<slug> to the canonical URN is not yet written. Until then every grant read hits two principal keys. Owner: . Target: .
  • Cut-over migration: Once the backfill lands, remove RolePrincipals legacy fallback, drop the dual-delete in DeleteRoleGrants, and stop dual-reading in engine.PrepareContext and access.ListGrants.
  • Pagination on ListAccessMembers: Currently returns every member + a LATERAL role lookup per row in one query. Will not scale for enterprise orgs with thousands of members; pagination + total-count is a follow-up.

Durability of WorkOS sync

Local writes are committed first; WorkOS writes happen in a detached goroutine (context.WithoutCancel) with a bounded in-process retry (3 attempts, ≤300 ms backoff). This means:

  • If the pod crashes between tx.Commit and the goroutine completing, WorkOS will diverge from local state with no automatic reconciliation.
  • If WorkOS is down longer than ~1 second total, the sync is logged as failed and never retried.

This is acceptable as an interim step because the local DB is now the authoritative read source for role membership and grants, but it is not a long-term design. Follow-up: move WorkOS writes to an outbox table written inside the same transaction, drained by a Temporal workflow with backoff in the minutes/hours range (Temporal is already wired up in server/internal/background/activities/).

Read-path performance

The previous implementation cached role + grant lookups in-process. This PR removes that cache in favor of direct DB reads. Per RBAC-enforced request the authz engine now executes:

  1. users.GetUser (PK lookup)
  2. ListMemberRolePrincipalsByWorkosUser (assignment + LEFT JOIN org_roles + LEFT JOIN global_roles)
  3. GetPrincipalGrants (indexed lookup on (organization_id, principal_urn))

Indexes used:

  • organization_role_assignments_org_workos_user_role_key (partial unique) covers (2)
  • principal_grants_org_principal_scope_selector_key covers (3)

No load test has been run against this change. If p99 latency or PG QPS regresses, the planned mitigations are (in order of preference): (a) collapse (2) + (3) into a single query that joins assignments to grants by workos_user_id and returns grants directly, (b) short-TTL LRU keyed by (org, userID) invalidated on role mutation.

Bootstrap dependency

EnableRBAC calls SeedSystemRoleGrants, which resolves "admin"/"member" via GetActiveOrganizationRoleBySlug. These slugs live in global_roles, which is populated by WorkOS webhook events. An org cannot have RBAC enabled until the global roles have been synced from WorkOS. Today this is implicit and there's no preflight check; documenting it here so it's not surprising the next time we bring a new environment online. These roles exist in production today.

@tgmendes tgmendes requested a review from a team as a code owner May 19, 2026 11:05
Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Code Review

This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.

Tip: disable this comment in your organization's Code Review settings.

@vercel
Copy link
Copy Markdown

vercel Bot commented May 19, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
gram-docs-redirect Ready Ready Preview, Comment May 21, 2026 10:02am

Request Review

@changeset-bot
Copy link
Copy Markdown

changeset-bot Bot commented May 19, 2026

⚠️ No Changeset found

Latest commit: c9839ae

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

…-workos-fZao

# Conflicts:
#	server/internal/access/impl.go
#	server/internal/access/listusergrants_test.go
#	server/internal/access/queries.sql
#	server/internal/access/repo/queries.sql.go
#	server/internal/attr/conventions.go
#	server/internal/auth/setup_test.go
#	server/internal/authz/engine_test.go
#	server/internal/organizations/setup_test.go
#	server/internal/xmcp/setup_test.go
@github-actions github-actions Bot added the preview Spawn a preview environment label May 19, 2026
@speakeasybot
Copy link
Copy Markdown
Collaborator

speakeasybot commented May 19, 2026

🚀 Preview Environment (PR #2916)

Preview URL: https://pr-2916.dev.getgram.ai

Component Status Details Updated (UTC)
✅ Database Ready Existing database reused 2026-05-21 10:06:46.
✅ Images Available Container images ready 2026-05-21 10:06:29.

Gram Preview Bot

…-workos-fZao

# Conflicts:
#	server/internal/background/activities/process_workos_org_events.go
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

preview Spawn a preview environment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants