Skip to content

fix(quota): eliminate TOCTOU race conditions in quota service#1608

Open
atul-upadhyay-7 wants to merge 1 commit into
nisshchayarathi:mainfrom
atul-upadhyay-7:fix/issue-1588
Open

fix(quota): eliminate TOCTOU race conditions in quota service#1608
atul-upadhyay-7 wants to merge 1 commit into
nisshchayarathi:mainfrom
atul-upadhyay-7:fix/issue-1588

Conversation

@atul-upadhyay-7
Copy link
Copy Markdown
Collaborator

@atul-upadhyay-7 atul-upadhyay-7 commented Jun 1, 2026

Closes #1588

Summary by CodeRabbit

  • New Features

    • Added webhook rate limiting with input validation and enforcement
    • Added AI quota reservation system with automatic max quota management
    • Added status monitoring for quotas and rate limits
    • Added automatic cleanup of expired rate limit records
  • Tests

    • Added comprehensive test suite covering quota and rate limiting validation, configuration, and error handling

@vercel
Copy link
Copy Markdown

vercel Bot commented Jun 1, 2026

@anshika1179 is attempting to deploy a commit to the Nisshchaya's projects Team on Vercel.

A member of the Team first needs to authorize it.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jun 1, 2026

Review Change Stack

📝 Walkthrough

Walkthrough

This PR expands QuotaService with new status interfaces, input validation, webhook rate-limit sanitization, and critically fixes a TOCTOU race condition in AI quota reservation using atomic updateMany. It adds monitoring/admin utilities and comprehensive test coverage with a security compliance checklist.

Changes

Quota and Rate-Limit Service Expansion

Layer / File(s) Summary
Status Interfaces and Webhook Rate Limit Setup
lib/services/quotaService.ts
Exports QuotaStatus and RateLimitStatus interfaces with all required fields (installationId, requestsUsed, maxRequests, windowStart/End, warningPosted, tokensConsumed, remainingRequests, utilizationPercent, isExpired, timeUntilResetMs, key, currentCount, limit, windowMs, remaining, isExceeded). Adds validation helpers and constants. Extends checkWebhookRateLimit to validate rate-limit parameters and sanitize keys.
Webhook Rate Limit Record Insertion and Error Handling
lib/services/quotaService.ts
Completes checkWebhookRateLimit by inserting limiter records for allowed requests. Handles Prisma unique-constraint race (P2002) by returning false (rate-limited). Other DB errors trigger logging and fail-open behavior (returns true).
Rate-Limit Monitoring and Cleanup
lib/services/quotaService.ts
Implements cleanupExpiredRateLimits() for batched periodic cleanup of expired limiter rows. Adds getRateLimitStatus() helper that counts active rows, computes remaining/utilization/exceeded flags, and returns safe defaults on DB errors.
AI Quota Reservation with Atomic TOCTOU Fix
lib/services/quotaService.ts
Fixes critical TOCTOU race condition in checkAndReserveQuota() by replacing vulnerable read-then-update with atomic updateMany guarded by requestsUsed < max. When update affects 0 rows, fetches quota to distinguish window expiry (resets and recounts) from true exhaustion (returns false). Validates installationId and retrieves maxRequests from getQuotaMax(). Enforces fail-closed error handling.
Quota Configuration and Admin Operations
lib/services/quotaService.ts
Implements getQuotaMax() with environment variable parsing and capping behavior. Adds getQuotaStatus() computing expiry, remaining, utilization, and timeUntilResetMs. Implements resetQuota() admin operation. Introduces getBulkQuotaStatus() that deduplicates input IDs, bulk-fetches aiQuota rows, and computes per-installation status with null fallback on failure.
Token Accounting and Warning Tracking
lib/services/quotaService.ts
Enhances recordTokenUsage() with input validation rejecting negative values and early-returning on zero, logging errors without throwing. Expands hasWarningBeenPosted() documentation while preserving behavior of returning true on DB errors to prevent duplicate warning spam.
Comprehensive Test Coverage and Security Checklist
lib/services/__tests__/quotaService.test.ts
Adds Scenario 5 test suite covering quota max capping/fallbacks, installationId/rate-limit parameter validation, key sanitization, invalid input handling for public methods, and status computation outputs with DB error scenarios. Appends security compliance verification checklist enumerating fail-open/fail-closed correctness, time/window calculations, sanitization, concurrent race handling, and state mutation safety.

Sequence Diagram

sequenceDiagram
  participant Client
  participant checkAndReserveQuota
  participant validateInstallationId
  participant getQuotaMax
  participant updateMany
  participant fetchQuota
  
  Client->>checkAndReserveQuota: installationId
  checkAndReserveQuota->>validateInstallationId: validate
  validateInstallationId-->>checkAndReserveQuota: valid/null
  checkAndReserveQuota->>getQuotaMax: get maxRequests
  getQuotaMax-->>checkAndReserveQuota: maxRequests
  checkAndReserveQuota->>updateMany: atomic increment where requestsUsed < max
  updateMany-->>checkAndReserveQuota: count affected rows
  alt count > 0 (reserved)
    checkAndReserveQuota-->>Client: true (allowed)
  else count == 0 (update failed)
    checkAndReserveQuota->>fetchQuota: fetch current quota
    fetchQuota-->>checkAndReserveQuota: quota or null
    alt quota null (record missing)
      checkAndReserveQuota->>updateMany: upsert and retry
      updateMany-->>checkAndReserveQuota: count
      checkAndReserveQuota-->>Client: count > 0
    else windowStart expired
      checkAndReserveQuota->>updateMany: reset window and increment
      updateMany-->>checkAndReserveQuota: count
      checkAndReserveQuota-->>Client: count > 0
    else true exhaustion
      checkAndReserveQuota-->>Client: false (denied)
    end
  else DB error
    checkAndReserveQuota-->>Client: false (fail-closed)
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

  • nisshchayarathi/gitverse-nextjs#1584: Both PRs modify checkWebhookRateLimit (P2002 handling, key sanitization) and checkAndReserveQuota (atomic updateMany fix for TOCTOU race), with overlapping Jest test coverage in quotaService.test.ts.

Suggested labels

bug, concurrency, db, critical, level:advanced, mentor:nisshchayarathi

Poem

🐰 Atomic hops through the quota gate,
Where TOCTOU races meet their fate,
updateMany holds the line so tight,
Concurrent dreams resolve to right,
With rate limits clean and warnings bright!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'fix(quota): eliminate TOCTOU race conditions in quota service' accurately summarizes the main change—addressing TOCTOU race conditions in the quota service as described in the linked issue.
Linked Issues check ✅ Passed The PR implements the suggested fix by replacing read-then-update with atomic conditional database updates using updateMany with requestsUsed < max guards, directly addressing the TOCTOU race condition documented in issue #1588.
Out of Scope Changes check ✅ Passed All changes are scoped to quota and rate-limit management: validations, sanitization, status monitoring, and atomic operations. No unrelated modifications detected outside the linked issue requirements.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions github-actions Bot added the gssoc:invalid GSSoC: Invalid contribution label Jun 1, 2026
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 1, 2026

⚠️ GSSoC Quality Check Failed — PR #1608

Hi @atul-upadhyay-7! 👋 Your PR has been flagged by our automated GSSoC quality check.

Issues found:

  • 📝 Empty or missing description — Your PR has no meaningful description. Please explain what you changed and why.

✅ How to fix this

  1. Read the issues listed above carefully
  2. Edit your PR title and description to address them
  3. Make sure your PR is linked to an open issue using closes #<issue-number>
  4. Make sure your changes are meaningful and solve a real problem

Once you've fixed these, a maintainer will review and remove the flag. If you believe this is a mistake, please comment below. 🙏

GSSoC'26 automation · Maintainer: @nisshchayarathi

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
lib/services/quotaService.ts (1)

336-351: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Race condition remains in window reset path.

When updateMany returns 0 rows and the window is expired, multiple concurrent requests can all enter this branch simultaneously. Each executes update({ data: { requestsUsed: 1, ... } }) and returns true. The result is that N concurrent requests all succeed while requestsUsed is only set to 1.

While the main TOCTOU fix (atomic updateMany with lt condition) is correct, this secondary race at window expiry allows quota bypass. The impact is limited to the moment of window transition (once per 24h typically), but could allow burst exceeding quota at that instant.

Consider using updateMany with a condition to ensure the window hasn't been reset by another request:

Proposed atomic window reset approach
         // Check if the window has expired and needs reset
         if (quota.quotaWindowEnd < now) {
-          // Reset the window: set requestsUsed to 1 (this request counts),
-          // clear token consumption, and start a new window.
-          await prisma.aiQuota.update({
-            where: { id: quota.id },
+          // Atomically reset only if window still expired (guards against concurrent resets).
+          const resetResult = await prisma.aiQuota.updateMany({
+            where: {
+              installationId,
+              quotaWindowEnd: { lt: now },
+            },
             data: {
               requestsUsed: 1,
               tokensConsumed: 0,
               quotaWindowStart: now,
               quotaWindowEnd: windowEnd,
               warningPosted: false,
               lastAnalysisAt: now,
             },
           });
-          return true;
+          // If reset succeeded, this request reserved. If 0 rows, another request
+          // already reset the window - retry the main updateMany path.
+          if (resetResult.count > 0) {
+            return true;
+          }
+          // Retry: window was reset by another request, try atomic increment again.
+          const retryResult = await prisma.aiQuota.updateMany({
+            where: {
+              installationId,
+              requestsUsed: { lt: defaultMaxAnalyses },
+            },
+            data: {
+              requestsUsed: { increment: 1 },
+              lastAnalysisAt: now,
+            },
+          });
+          return retryResult.count > 0;
         }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/services/quotaService.ts` around lines 336 - 351, The window-reset branch
that calls prisma.aiQuota.update({ where: { id: quota.id }, ... }) is racy
because multiple concurrent requests can each set requestsUsed to 1; replace
that single-row update with an atomic updateMany using the same lt condition as
the earlier attempt: call prisma.aiQuota.updateMany({ where: { id: quota.id,
quotaWindowEnd: { lt: now } }, data: { requestsUsed: 1, tokensConsumed: 0,
quotaWindowStart: now, quotaWindowEnd: windowEnd, warningPosted: false,
lastAnalysisAt: now } }) and only treat the reset as successful if the returned
count > 0 (otherwise fall through/retry); keep references to
quota.quotaWindowEnd, prisma.aiQuota.updateMany, and quota.id to locate and
implement the change.
🧹 Nitpick comments (2)
lib/services/__tests__/quotaService.test.ts (2)

261-277: ⚡ Quick win

Restore process.env.AI_QUOTA_PER_WINDOW in afterEach to prevent cross-test leakage.

Tests 5.1–5.3 set and delete the env var inline. If any assertion throws before the delete line runs, the value leaks into later tests — notably 5.18 (Line 372) which depends on the default max of 250 to expect remainingRequests === 240. Snapshot/restore in the existing afterEach makes isolation robust regardless of failures.

♻️ Suggested isolation
   afterEach(() => {
     jest.useRealTimers();
+    delete process.env.AI_QUOTA_PER_WINDOW;
   });

Then the inline delete process.env.AI_QUOTA_PER_WINDOW; calls in 5.1–5.3 can be dropped.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/services/__tests__/quotaService.test.ts` around lines 261 - 277, The
tests mutate process.env.AI_QUOTA_PER_WINDOW inline in the "Scenario
5.1/5.2/5.3" cases which can leak if an assertion throws; update the existing
test teardown to snapshot the original process.env.AI_QUOTA_PER_WINDOW at the
start of the suite and restore it in afterEach (so the environment is reset even
on failure), remove the inline delete process.env.AI_QUOTA_PER_WINDOW calls from
the three it blocks, and ensure QuotaService.getQuotaMax() tests still assert
expected values while relying on the afterEach restoration for isolation.

256-259: 💤 Low value

Trim the "compliance checklist" to what the suite actually verifies.

The block is titled a "SECURITY COMPLIANCE VERIFICATION CHECKLIST" but many enumerated items have no corresponding test — e.g. concurrency/thread-safety (Lines 405–406), BigInt overflow (Line 396), SQL-injection exclusion (Line 400), cleanup batch-size enforcement (Line 399), and lastAnalysisAt updates (Line 432). Presenting these as verified can give a false sense of assurance during an audit. The "GSSoC '26 authorization" references (Lines 258, 395) also don't map to any control here. Consider reducing this to items the tests genuinely cover, or relabel it as an explicit TODO/backlog of gaps.

Also applies to: 377-436

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/services/__tests__/quotaService.test.ts` around lines 256 - 259, The
test-suite comment block titled "SECURITY COMPLIANCE VERIFICATION CHECKLIST"
inside the describe("Scenario 5: Additional Quota Configuration & Exception
Validations") is overstating coverage; remove or shrink the checklist to only
items that have explicit tests, and convert remaining unsupported items (e.g.,
concurrency/thread-safety, BigInt overflow, SQL-injection exclusion, cleanup
batch-size enforcement, lastAnalysisAt updates, and any "GSSoC '26
authorization" references) into a TODO/backlog note or separate risk list so the
suite doesn't claim untested controls; update the comment text accordingly to
accurately reflect what the tests in quotaService.test.ts actually verify.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@lib/services/quotaService.ts`:
- Around line 336-351: The window-reset branch that calls
prisma.aiQuota.update({ where: { id: quota.id }, ... }) is racy because multiple
concurrent requests can each set requestsUsed to 1; replace that single-row
update with an atomic updateMany using the same lt condition as the earlier
attempt: call prisma.aiQuota.updateMany({ where: { id: quota.id, quotaWindowEnd:
{ lt: now } }, data: { requestsUsed: 1, tokensConsumed: 0, quotaWindowStart:
now, quotaWindowEnd: windowEnd, warningPosted: false, lastAnalysisAt: now } })
and only treat the reset as successful if the returned count > 0 (otherwise fall
through/retry); keep references to quota.quotaWindowEnd,
prisma.aiQuota.updateMany, and quota.id to locate and implement the change.

---

Nitpick comments:
In `@lib/services/__tests__/quotaService.test.ts`:
- Around line 261-277: The tests mutate process.env.AI_QUOTA_PER_WINDOW inline
in the "Scenario 5.1/5.2/5.3" cases which can leak if an assertion throws;
update the existing test teardown to snapshot the original
process.env.AI_QUOTA_PER_WINDOW at the start of the suite and restore it in
afterEach (so the environment is reset even on failure), remove the inline
delete process.env.AI_QUOTA_PER_WINDOW calls from the three it blocks, and
ensure QuotaService.getQuotaMax() tests still assert expected values while
relying on the afterEach restoration for isolation.
- Around line 256-259: The test-suite comment block titled "SECURITY COMPLIANCE
VERIFICATION CHECKLIST" inside the describe("Scenario 5: Additional Quota
Configuration & Exception Validations") is overstating coverage; remove or
shrink the checklist to only items that have explicit tests, and convert
remaining unsupported items (e.g., concurrency/thread-safety, BigInt overflow,
SQL-injection exclusion, cleanup batch-size enforcement, lastAnalysisAt updates,
and any "GSSoC '26 authorization" references) into a TODO/backlog note or
separate risk list so the suite doesn't claim untested controls; update the
comment text accordingly to accurately reflect what the tests in
quotaService.test.ts actually verify.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 5bd9427b-8d6d-470e-8505-70300c10c60a

📥 Commits

Reviewing files that changed from the base of the PR and between 63ecb90 and e71a774.

📒 Files selected for processing (2)
  • lib/services/__tests__/quotaService.test.ts
  • lib/services/quotaService.ts

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

gssoc:invalid GSSoC: Invalid contribution

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Critical: TOCTOU Race Condition in Quota Service Allows Concurrent Requests to Exceed AI Quota Limits

2 participants