Skip to content

Tighten Admin Validations + IDOR Prevention#1641

Open
JSv4 wants to merge 3 commits into
mainfrom
claude/audit-auth0-security-fK9kK
Open

Tighten Admin Validations + IDOR Prevention#1641
JSv4 wants to merge 3 commits into
mainfrom
claude/audit-auth0-security-fK9kK

Conversation

@JSv4
Copy link
Copy Markdown
Collaborator

@JSv4 JSv4 commented May 13, 2026

Summary

Comprehensive security audit of Auth0 integration and analyzer callback handling. Twelve real fixes across three layers: defense-in-depth allowlist for is_superuser JWT elevation, IDOR-safe error messages in GraphQL mutations, and SHA-256 hashing of analyzer callback tokens to prevent credential leakage via database reads.

Key Changes

Auth0 Privilege Escalation Prevention

  • Defense-in-depth allowlist for is_superuser: Added AUTH0_SUPERUSER_SUB_ALLOWLIST environment variable. JWT claim sync now refuses to elevate is_superuser to True unless the user's Auth0 sub (Django username) appears in the allowlist, regardless of verified token claims. Blocks misconfigured tenants sourcing admin claims from user-writable user_metadata.
  • Reduced admin claims cache TTL: Changed ADMIN_CLAIMS_CACHE_TTL from 300 to 30 seconds for tighter privilege revocation SLA.
  • System check for empty allowlist: Added opencontractserver/users/checks.py to warn at startup when Auth0 is enabled but allowlist is empty.
  • Documentation hardening: Added security warnings in docs/configuration/authentication.md about sourcing claims from app_metadata (not user_metadata) and detailed allowlist setup instructions.

IDOR-Safe GraphQL Mutations

  • Unified error messages: AddRelationship, CreateMetadataColumn, and UpdateMetadataColumn mutations now return identical error messages for both "does not exist" and "no permission" cases, preventing enumeration attacks.
  • Annotation filtering via visible_to_user: Source and target annotations filtered through Annotation.objects.visible_to_user(user) with count validation to catch both missing and unauthorized IDs without echoing them back.
  • Corpus/Column visibility checks: Corpus and Column lookups now use visible_to_user() querysets; non-existent and inaccessible resources collapse into the same error branch.
  • Generic error logging: ORM and constraint exceptions logged server-side with logger.exception() but return generic "Error creating X" messages to clients.
  • Bad global ID handling: Unparseable global IDs treated as indistinguishable from not-found to keep IDOR surface flat.

Analyzer Callback Token Security

  • SHA-256 hashing of callback tokens: Replaced plaintext UUID callback_token field with callback_token_hash (CharField, max_length=64). Database never stores plaintext; a DB read alone cannot let an attacker forge analyzer callbacks.
  • Token rotation on submission: Analysis.rotate_callback_token() generates fresh secrets.token_urlsafe(32) plaintext on each submission, invalidating any in-flight callbacks bound to previous tokens.
  • Constant-time verification: Analysis.verify_callback_token(candidate) uses hmac.compare_digest() to prevent timing leaks.
  • Migration 0021: Backfills existing plaintext tokens as SHA-256 hashes so in-flight analyzers continue working; reverse migration is lossy (generates fresh UUIDs) as hashes cannot be recovered.
  • Redacted logging: Analyzer submission payloads logged with callback token redacted to avoid plaintext leakage in logs.

Supporting Changes

  • Updated opencontractserver/analyzer/views.py to call verify_callback_token() instead of direct string comparison.
  • Updated test fixtures in test_analyzers.py and test_security_hardening.py to mint plaintext tokens via rotate_callback_token() before submission.
  • Added AUTH0_CREATE_NEW_USERS environment variable (default True) to control auto-provisioning of new Auth0 users.
  • Documented User.email as informational (no unique=True constraint); User.username (Auth0 sub) is the only identity field.
  • Added JWKS stale-grace window (_JWKS_STALE_GRACE = 3600 seconds) to bound how long a compromised key

https://claude.ai/code/session_01DEQ8YPfGpzKHr1VviKjdpX

…ck token at-rest

End-to-end audit of every Auth0-touching authorization pathway. Twelve
verified findings shipped; agent-reported false positives (intentional
IDOR-safe SetCorpusVisibility, AnnotationImagesView, moderation mutations,
moderation_metrics, etc.) verified in source and dropped.

- F1: AUTH0_SUPERUSER_SUB_ALLOWLIST gates is_superuser elevation in
  sync_admin_claims_from_payload regardless of JWT claim, blocking the
  user_metadata-sourced privilege escalation path. New users.W001 system
  check warns on empty allowlist when USE_AUTH0=True.
- F2: ADMIN_CLAIMS_CACHE_TTL 300s -> 30s, tightening the privilege
  retention window after Auth0 demotion.
- F3+F4: AddRelationship now uses Annotation/Corpus visible_to_user with
  count comparison and a single unified not-found message; broad except
  no longer surfaces ORM text.
- F5+F6: CreateMetadataColumn and UpdateMetadataColumn use
  visible_to_user + unified not-found-or-no-permission message.
- F7: Datacell mutations catch DoesNotExist explicitly and stop
  echoing ORM exception text.
- F9: Bounded JWKS stale-cache fallback (1h grace past TTL); fails
  closed beyond the window so a key compromise + Auth0 outage cannot
  let the old key keep verifying tokens indefinitely.
- F10: Analysis.callback_token is now stored as SHA-256 hex
  (callback_token_hash). rotate_callback_token() returns a fresh
  plaintext at submit time; verify_callback_token() does the
  constant-time check on incoming requests. Migration backfills
  existing UUID tokens by hash so in-flight analyzers continue to
  authenticate. Submit log redacted.
- F8+F11: Documentation-only. Step 7 walkthrough for the allowlist,
  env-var rows for AUTH0_SUPERUSER_SUB_ALLOWLIST + AUTH0_CREATE_NEW_USERS,
  app_metadata-only requirement called out in danger callout, inline
  comment that User.email is informational (not an identity field).

Operator action required before deploy: populate
AUTH0_SUPERUSER_SUB_ALLOWLIST with the Auth0 sub of every user who must
retain Django is_superuser. Existing superusers whose subs are not in
the allowlist will be demoted within ~30s of their next API call.
@JSv4 JSv4 changed the title Security hardening: Auth0 permissioning audit & callback token hashing Tighten Admin Validations + IDOR Prevention May 13, 2026
Comment thread config/settings/base.py
# protects against misconfigured tenant Actions that source admin claims
# from user-writable ``user_metadata``. Existing superusers whose subs are
# not added here are demoted on next sync; populate this BEFORE deploy.
AUTH0_SUPERUSER_SUB_ALLOWLIST = env.list(
@claude
Copy link
Copy Markdown

claude Bot commented May 13, 2026

Code Review: Security Hardening — Auth0 Permissioning Audit & Callback Token Hashing

This is a well-scoped, defence-in-depth security PR. The changes are correct in intent and largely well-implemented. Below is a detailed breakdown.


Overall Assessment

Positive

  • All three attack surfaces (privilege escalation, IDOR enumeration, credential storage) are addressed coherently and consistently.
  • Constant-time comparison (hmac.compare_digest) is correctly used in verify_callback_token.
  • Migration is correct: forward path hashes existing tokens so in-flight analyzers continue working; reverse path is documented as intentionally lossy.
  • Unified error messages across IDOR-sensitive paths are applied consistently and the logic is clearly commented.
  • System check (users.W001) surfaces operator misconfiguration at startup rather than silently failing at runtime.
  • Redacted logging of callback tokens is a nice follow-through detail.

Issues & Suggestions

Medium — Document permission gap in AddRelationship (annotation_mutations.py)

The new code validates that source/target annotations are visible to the user and that the user has CREATE on the corpus. However document_pk is used in Relationship.objects.create(document_id=document_pk) without any visibility check on the document itself.

If a document belongs to a different corpus the user cannot see, they could still create a relationship referencing it by guessing the ID — as long as the corpus they have CREATE on accepts it. Recommend adding:

try:
    Document.objects.visible_to_user(user).get(pk=document_pk)
except Document.DoesNotExist:
    return AddRelationship(ok=False, relationship=None, message=not_found_msg)

Low — rotate_callback_token() requires caller to remember save() (models.py)

The method mutates self.callback_token_hash but defers save() to the caller. Both existing call-sites do this correctly, but it is an easy footgun for future callers. Consider always saving inside the method when the instance has a PK, or at minimum rename to generate_callback_token() to signal the object is in a dirty state:

def rotate_callback_token(self) -> str:
    plaintext = secrets.token_urlsafe(32)
    self.callback_token_hash = self._hash_callback_token(plaintext)
    if self.pk:
        self.save(update_fields=["callback_token_hash"])
    return plaintext

Low — System check not tagged with Tags.security (checks.py)

@register() without a tag works, but @register(Tags.security) would categorize it correctly in manage.py check --tag security and CI security-focused check runs:

from django.core.checks import Tags, Warning, register

@register(Tags.security)
def check_auth0_superuser_allowlist(...):

Low — N+1 queries in migration backfill (0021_hash_analysis_callback_token.py)

backfill_token_hashes issues one UPDATE per analysis row. For large tables this is slow in a migration window. A bulk_update approach in batches of ~500 would be more robust at scale. Minor for this table size, but worth improving.

Minor — logger.exception redundancy in broad-except blocks

In the new exception handlers, e.g.:

logger.exception("Error creating relationship: %s", e)

logger.exception already appends the full traceback including the exception message, so passing e as a format argument is redundant noise. Just:

logger.exception("Error creating relationship")

Minor — _JWKS_STALE_GRACE is not configurable via settings

_JWKS_STALE_GRACE = 3600 follows the same module-constant pattern as _JWKS_CACHE_TTL, which is fine. Environments with slower Auth0 failover might want to tune this, but a settings.JWKS_STALE_GRACE_SECONDS override is not required for this PR — just flagging for a potential follow-up.


Correctness Checks

  • count() != len(set(source_pks)) correctly handles duplicate IDs in the input without echoing them back. ✓
  • hmac.compare_digest operates on two hex strings of equal length (64 chars each), so the constant-time guarantee holds. ✓
  • _can_serve_stale is only ever called inside _jwks_cache_lock, satisfying its documented precondition. ✓
  • callback_token_hash CharField(max_length=64) is exactly right for SHA-256 hex output. ✓
  • ADMIN_CLAIMS_CACHE_TTL = 30 correctly propagates to the constant in opencontractserver/constants/auth.py and all doc references. ✓
  • The operator migration note in CHANGELOG ("populate AUTH0_SUPERUSER_SUB_ALLOWLIST BEFORE deploy") is present and prominent. ✓

Test Coverage

  • test_correct_token_accepted correctly replaces the old UUID-based test and exercises the rotate_callback_token → verify_callback_token round-trip. ✓
  • The existing test_same_error_for_missing_vs_wrong_token test covers the 403-equality assertion. ✓
  • Gap: No tests for the allowlist enforcement in sync_admin_claims_from_payload (the F1 fix). A test asserting is_superuser stays False when the sub is absent from the allowlist, and flips to True when added, would be valuable. The blocked-elevation warning log path is also untested.
  • Gap: No tests for the users.W001 system check itself. Django system checks are straightforward to unit-test with call_checks() or the check registry.

Summary

The core security fixes (F1–F10) are correctly implemented and well-motivated. The document-visibility gap in AddRelationship and the rotate_callback_token save footgun are the two items I would want resolved before merge. The test coverage gaps for the allowlist enforcement and the system check are lower priority but would increase confidence in the F1 fix long-term.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 13, 2026

Codecov Report

❌ Patch coverage is 54.54545% with 15 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
opencontractserver/analyzer/models.py 53.33% 7 Missing ⚠️
opencontractserver/users/checks.py 66.66% 4 Missing ⚠️
opencontractserver/utils/analyzer.py 0.00% 3 Missing ⚠️
opencontractserver/analyzer/views.py 0.00% 1 Missing ⚠️

📢 Thoughts on this report? Let us know!

@claude
Copy link
Copy Markdown

claude Bot commented May 13, 2026

Code Review: Tighten Admin Validations + IDOR Prevention

This is a well-structured, defense-in-depth security PR. The changes are clearly motivated, consistently applied, and come with good test coverage. Below is a thorough review.


Overview

The PR addresses three distinct security layers:

  1. Auth0 privilege escalation — allowlist gating for is_superuser JWT claim sync
  2. IDOR oracle elimination — unified error messages and visible_to_user() filtering in GraphQL mutations
  3. Callback token credential storage — SHA-256 hashing of Analysis.callback_token at rest

All three areas contain real security improvements and the changes are generally correct. The following notes cover bugs, edge cases, and minor style issues.


Findings

1. _can_serve_stale reads outside the lock — potential race condition

File: config/graphql_auth0_auth/utils.py

_can_serve_stale reads _jwks_cache["data"] and _jwks_cache["expires_at"] without holding _jwks_cache_lock. It is only called from inside the with _jwks_cache_lock: block in _get_cached_jwks, so this is correct in the current call graph — but the docstring says "Caller must already hold _jwks_cache_lock" in a way that's easy to miss. Consider either asserting the lock or making it a private helper with a more prominent guard comment. As written it is safe but fragile.

2. ADMIN_CLAIMS_CACHE_TTL = 30 may be too aggressive for high-traffic deployments

File: opencontractserver/constants/auth.py

Dropping from 300 s to 30 s means every JWT-authenticated API call triggers a cache miss every 30 seconds per user and fires the claim-sync ORM write path. For a deployment with many concurrent users this is a 10× increase in write pressure on the auth path. The security motivation is sound, but consider making this configurable via an env var (like AUTH0_ADMIN_CLAIMS_CACHE_TTL) so operators can tune the trade-off without a code change.

3. visible_to_user + explicit user_has_permission_for_obj double-check in AddRelationship

File: config/graphql/annotation_mutations.py, lines ~506-519

After filtering the corpus through Corpus.objects.visible_to_user(user), the code then calls user_has_permission_for_obj(user, corpus, PermissionTypes.CREATE, ...) separately. This is redundant: visible_to_user already filters for read visibility, and the explicit permission check correctly adds the CREATE gate — but it uses the already-fetched corpus object, which is fine. The only concern is that visible_to_user may return a corpus where the user has READ but not CREATE, and the subsequent check handles this correctly. This is actually the right pattern per the CLAUDE.md permissioning guide. No bug here — just noting it's intentional.

4. Migration 0021 reverse path uses save() instead of update()

File: opencontractserver/analyzer/migrations/0021_hash_analysis_callback_token.py, line ~42

for analysis in Analysis.objects.all():
    analysis.callback_token = uuid.uuid4()
    analysis.save(update_fields=["callback_token"])

save() in a migration RunPython block uses the historical model, which may trigger signals or other side effects. Prefer the same Analysis.objects.filter(pk=...).update(callback_token=uuid.uuid4()) pattern used in the forward path. However, since this is explicitly documented as a lossy/best-effort reverse and no production deployment should rely on it, this is low severity.

5. backfill_token_hashes iterates row-by-row — may be slow on large tables

File: opencontractserver/analyzer/migrations/0021_hash_analysis_callback_token.py, lines ~26-31

for pk, raw_token in Analysis.objects.values_list("id", "callback_token"):
    if raw_token is None:
        continue
    digest = hashlib.sha256(str(raw_token).encode("utf-8")).hexdigest()
    Analysis.objects.filter(pk=pk).update(callback_token_hash=digest)

This fires one UPDATE per row. For tables with thousands of Analysis rows this could be very slow. Since SHA-256 cannot be expressed in a single SQL expression portably, the per-row approach is acceptable — but consider batching with bulk_update or chunking the queryset (iterator(chunk_size=500)) to reduce memory pressure during migration.

6. verify_callback_token returns False for empty callback_token_hash — correct but undocumented edge case

File: opencontractserver/analyzer/models.py

The new rotate_callback_token doesn't call save(), and a newly-created Analysis has callback_token_hash = "". The verify_callback_token guard on empty hash is correct. However, the analyzer submission path in opencontractserver/utils/analyzer.py now correctly calls rotate_callback_token() + save() — but only at submission time. If run_analysis() is called and then the process crashes before save(), the hash field is updated in memory but not persisted. This is pre-existing fragility (the old UUID field had the same issue) and not introduced by this PR, just worth noting.

7. AUTH0_CREATE_NEW_USERS setting is referenced in docs but not in base.py

File: docs/configuration/authentication.md, config/settings/base.py

The documentation Step 7 and env-var table mention AUTH0_CREATE_NEW_USERS, but there's no corresponding env.bool("AUTH0_CREATE_NEW_USERS", default=True) in config/settings/base.py. If the Auth0 backend doesn't currently read this setting, the doc is aspirational rather than functional. Is this intended to be a future follow-up, or was the settings entry accidentally omitted?

8. System check check_auth0_superuser_allowlist fires even in test environments

File: opencontractserver/users/checks.py

The check fires whenever USE_AUTH0=True. In CI runs that set USE_AUTH0=True for Auth0-path tests, this will produce a users.W001 warning on every test run unless the allowlist is populated. Consider adding a settings.TESTING guard or using Django's silenced_system_checks in test settings. This is minor but could be surprising.

9. AddRelationshiprelationship_label_pk is no longer validated

File: config/graphql/annotation_mutations.py

The old code passed relationship_label_pk directly to Relationship.objects.create(relationship_label_id=...). If the relationship_label_id global ID is invalid or doesn't exist, the create() call will raise an IntegrityError or ValueError caught by the broad except Exception block and returned as a generic "Error creating relationship." message. This is acceptable from a security standpoint (no IDOR leak), but it's a slight UX regression compared to a cleaner validation. This is a reasonable trade-off given the security goals.

10. test_correct_token_accepted test name change is correct

File: opencontractserver/tests/test_security_hardening.py

The old test name test_correct_token_uuid_type_accepted was accurate for the old UUID field. The new name test_correct_token_accepted is correct. Good cleanup.


Positive Highlights

  • Unified error messages are applied consistently across AddRelationship, CreateMetadataColumn, UpdateMetadataColumn, and Datacell mutations. The pattern of defining not_found_msg at the top of each mutation and reusing it is clean and prevents accidental divergence.
  • hmac.compare_digest in verify_callback_token is exactly the right tool for constant-time comparison here.
  • secrets.token_urlsafe(32) gives 256 bits of entropy — stronger than the previous UUID v4 (122 bits). Good upgrade.
  • Token rotation on submit invalidates old tokens with no extra state. Elegant design.
  • Migration commentary is thorough and correctly warns operators about the deployment ordering requirement.
  • TestAddRelationshipDocumentIDOR covers both the attack path and the happy path in the same class. The use of assertIn("not found", data["message"].lower()) instead of an exact string match is appropriately loose.
  • CHANGELOG entry is detailed with file paths, fix numbering, and the explicit list of verified false positives. This level of audit transparency is excellent.
  • Operator action required callout in the PR description and changelog is critical and well-placed.

Summary

This is a high-quality security PR. The identified issues are mostly minor (race condition documentation, one missing settings entry, migration performance). The three core security improvements — allowlist gating for superuser elevation, IDOR-safe mutations, and hashed callback tokens — are all correctly implemented with tests.

Recommended action before merge:

  1. Clarify whether AUTH0_CREATE_NEW_USERS is fully wired up or a documentation placeholder (issue Bump mypy from 0.971 to 0.982 #7 above).
  2. Confirm CI test settings won't generate noisy users.W001 warnings (issue Bump crispy-bootstrap5 from 0.6 to 0.7 #8).
  3. Consider making ADMIN_CLAIMS_CACHE_TTL env-configurable (issue Bump traefik from v2.8.7 to v2.9.1 in /compose/production/traefik #2) if write pressure is a concern.

The deployment note about populating AUTH0_SUPERUSER_SUB_ALLOWLIST before deploy is critical and is already well-documented in both the PR description and docs — good.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants