Skip to content

fix(auth): don't cache transient org-membership check failures#604

Open
norrietaylor wants to merge 1 commit into
mainfrom
fix/mcp-org-membership-transient-cache
Open

fix(auth): don't cache transient org-membership check failures#604
norrietaylor wants to merge 1 commit into
mainfrom
fix/mcp-org-membership-transient-cache

Conversation

@norrietaylor

@norrietaylor norrietaylor commented Jun 6, 2026

Copy link
Copy Markdown
Owner

Root cause

OrgMembershipChecker._check_org (src/distillery/mcp/org_membership.py) caches the result of every membership lookup keyed by (username, org) for the configured TTL (default 1 hour). The underlying _fetch_membership / _fetch_public_member helpers fail closed — they return False not only for a definitive GitHub 404 (genuinely not a member) but also for transient conditions: a network error (httpx.HTTPError) or an unexpected status such as 5xx / rate-limit.

Because _check_org cached that False unconditionally, a momentary GitHub outage during a single request would lock a legitimate org member out for the full TTL — every subsequent request returned the cached denial even after GitHub recovered. Failing closed for the current request is correct; persisting that denial is not.

Fix

Introduce a private _IndeterminateMembershipError raised on the transient/unknown paths in _fetch_membership and _fetch_public_member (network error and unexpected status), distinct from the definitive 404 -> False. _check_org catches it, returns False for the current request (still fail-closed), and skips the cache write so the next request re-hits the GitHub API. Definitive results (204 -> True, 404 -> False) are cached exactly as before.

Surgical change confined to org_membership.py; no public API or signature changes.

Acceptance

  • New tests in tests/test_mcp_org_membership.py:
    • test_transient_error_is_not_cached — network error then recovery: first request denies (fail-closed), second request admits the member and re-hits the API.
    • test_unexpected_status_is_not_cached — same for a 503 from the members endpoint.
  • Both tests fail on main (cached False → second request returns False, API hit once) and pass with the fix.
  • Existing behavior preserved: test_negative_result_is_cached (real 404 still cached), test_network_error_fails_closed, test_unexpected_status_returns_false, and the full 45-test test_mcp_org_membership.py suite pass.
  • test_middleware.py, test_mcp_auth.py, test_mcp_http_transport.py all pass (130 total).
  • ruff check and mypy --strict clean on touched files.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes

    • Enhanced organization membership verification to properly distinguish between transient failures (network errors, rate limiting, temporary service errors) and permanent denials, preventing transient conditions from being inappropriately cached and blocking future access requests.
  • Tests

    • Added comprehensive test coverage ensuring organization membership checks correctly retry after transient errors instead of relying on cached failure states.

A transient GitHub outage (network error or unexpected 5xx) made the
org-membership check fail closed for the request — correct — but the
resulting False was cached for the full TTL (default 1 hour). A
legitimate org member would then be locked out for up to an hour even
after GitHub recovered, since every subsequent request returned the
cached denial.

Distinguish a definitive non-member (GitHub 404) from an indeterminate
result via a private _IndeterminateMembershipError raised on transient
errors. _check_org catches it, fails closed for the current request, and
skips caching so the next request re-hits the API.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jun 6, 2026

Copy link
Copy Markdown
Contributor

Too many files changed? Review this PR in Change Stack to see how the pieces fit before you dive in.

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro Plus

Run ID: e9c95409-edbf-49de-9098-b1125d276a51

📥 Commits

Reviewing files that changed from the base of the PR and between 5a437a2 and 3c67ca4.

📒 Files selected for processing (2)
  • src/distillery/mcp/org_membership.py
  • tests/test_mcp_org_membership.py

📝 Walkthrough

Walkthrough

The PR modifies membership-checking logic to prevent transient/indeterminate failures from being cached. A new _IndeterminateMembershipError exception signals network errors, 5xx responses, and unexpected statuses that should fail closed immediately but must not be stored in the TTL cache. The changes ensure subsequent requests re-check the upstream API instead of serving a failed denial from cache.

Changes

Transient membership failure caching prevention

Layer / File(s) Summary
Indeterminate membership error definition
src/distillery/mcp/org_membership.py
Introduces _IndeterminateMembershipError internal exception type to represent transient/unknown membership conditions.
Cache wrapper for indeterminate error handling
src/distillery/mcp/org_membership.py
_check_org wraps membership fetching with try/except to catch _IndeterminateMembershipError, fail closed to False, and skip cache write.
Membership fetch error handling
src/distillery/mcp/org_membership.py
_fetch_membership raises _IndeterminateMembershipError for unexpected HTTP statuses and httpx.HTTPError instead of returning False; docstring updated to describe the new behavior.
Public member fetch error handling
src/distillery/mcp/org_membership.py
_fetch_public_member raises _IndeterminateMembershipError for unexpected HTTP statuses and httpx.HTTPError instead of returning False; docstring updated to describe the new behavior.
Test coverage for uncached transient failures
tests/test_mcp_org_membership.py
Two new async tests verify that httpx.ConnectError and HTTP 503 responses are not cached; each asserts that a second request re-invokes the API instead of serving from cache.

Sequence Diagram

sequenceDiagram
  participant Client
  participant _check_org
  participant _fetch_membership
  participant MembershipCache
  Client->>_check_org: check_membership(org, user)
  _check_org->>_fetch_membership: fetch membership
  _fetch_membership-->>_check_org: _IndeterminateMembershipError (503/ConnectError)
  _check_org-->>Client: False (fail-closed)
  Note over MembershipCache: No cache write
  Client->>_check_org: check_membership(org, user)
  _check_org->>_fetch_membership: fetch membership (cache miss)
  _fetch_membership-->>_check_org: True/False (successful result)
  _check_org->>MembershipCache: cache successful result
  _check_org-->>Client: True/False
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Poem

🐰 A fetch that fails is just a blip—
No cache to keep that slip,
Retry next time, fresh and bright,
Transience handled right! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 62.50% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: preventing caching of transient org-membership check failures, which is the core fix described in the PR objectives.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch

Comment @coderabbitai help to get the list of available commands and usage tips.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant