Skip to content

chore: better rate limit implementation#36

Open
tlg723 wants to merge 3 commits into
mainfrom
chore/rate-limit
Open

chore: better rate limit implementation#36
tlg723 wants to merge 3 commits into
mainfrom
chore/rate-limit

Conversation

@tlg723
Copy link
Copy Markdown
Collaborator

@tlg723 tlg723 commented Apr 2, 2026

Summary

Added reset support and conditional increment behavior in security.py.
Updated login flow to:
Check lockout without incrementing before auth.
Increment only on failed password or failed TOTP.
Reset login failure tracking on successful login.
Changes are in auth.py.
Added a new unit test suite with 8 scenarios in test_rate_limit_failed_attempts.py

Why

Successful and unsuccessful login/register contribute to rate limit based on same IP
Improve UX

Change / Contract Impact

  • [❌] no behavior change
  • [✅] changes app behavior / expected flow
  • [✅] changes implementation or use-case contract
  • [❌] changes source of truth

If any box other than "no behavior change" applies, explain:

Phase 1: Add Reset Function
1. Add `reset_rate_limit()` function to server/core/security.py
   - Takes key (e.g., `"login:<client_ip>"`)
   - Resets `attempts` to 0 while preserving `locked_until` (if lockout is active)
   - Pattern: Single SQL UPDATE statement to avoid race conditions (consistent with existing upsert pattern)
   - Should only reset if lockout hasn't expired, else delete the record entirely

Phase 2: Modify Rate Limit Checking (Conditional Increment)
2. Modify `check_rate_limit()` function signature in server/core/security.py
   - Add new optional parameter: `is_failed_attempt: bool = True` (default to current behavior)
   - Update the upsert logic:
     - If `is_failed_attempt=True`: increment attempts (current behavior)
     - If `is_failed_attempt=False`: skip incrementing; only check if locked
   - This allows successful checks to verify lockout status without incrementing the counter

Phase 3: Wire Up to Login Endpoint
3. Update server/api/auth.py login handler:
   - Line 101-107 (rate limit check): Call `check_rate_limit()` with `is_failed_attempt=False` to check without incrementing
   - After failed login (validation fails or credentials incorrect): Call `check_rate_limit()` with `is_failed_attempt=True` to increment
   - After successful login (line 205-211): Call `reset_rate_limit()` to clear the counter
   - Structure: wrap auth checks + explicit failure increments, then reset on success

Scope

  • [✅] single bug / single scope
  • [❌] stacked / dependent on another issue or PR

If this PR is tied to a bug report, confirm:

  • [✅] issue scope is clean and matches this implementation
  • [❌] no mixed root causes remain in the issue body
  • [❌] outdated speculation / analysis has been removed

Verification

  • [✅] related tests were updated or added if behavior / contract changed
  • [✅] all related tests pass
  • [❌] test commands actually run are listed below
  • [❌] git diff --check

List any additional commands actually run:

<command output summary>

Merge Readiness

  • [❌] ready to merge
  • [✅] not merge-ready yet

A PR is not merge-ready if:

  • app behavior / contract changed but related tests were not updated
  • source of truth changed but was not made explicit
  • CI is still failing

Workflow / Protection Impact

  • [✅] no GitHub workflow or protection impact
  • [❌] changes GitHub workflows
  • [❌] changes labels or CODEOWNERS
  • [❌] changes branch protection assumptions

If any box other than "no impact" applies, explain:

<what changed and what reviewers should check>

Stacked PR Note

  • [✅] not stacked
  • [❌] stacked on another PR

If stacked, link the base PR and state merge order:

Base PR: #
Merge order:

@tlg723 tlg723 requested a review from mythic3011 as a code owner April 2, 2026 09:51
@tlg723 tlg723 added tests Adds or changes automated test coverage security Security-sensitive changes or review focus server Changes under the server application labels Apr 2, 2026
@github-actions github-actions Bot added the testing Changes testing code, test coverage, or verification workflows label Apr 2, 2026
@mythic3011 mythic3011 requested a review from Copilot April 2, 2026 09:55
@mythic3011 mythic3011 assigned mythic3011 and unassigned mythic3011 Apr 2, 2026
@mythic3011 mythic3011 self-assigned this Apr 2, 2026
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refines the server’s DB-backed rate limiting so that login rate limits are driven by failed attempts only (with an explicit reset on successful authentication), improving UX while preserving lockout behavior.

Changes:

  • Added reset_rate_limit() and extended check_rate_limit() with is_failed_attempt to support read-only lockout checks and explicit failure increments.
  • Updated /v1/auth/login flow to check lockout without incrementing, increment only on password/TOTP failures, and reset on success.
  • Added a unit test suite covering 8 failed-attempt-only and reset/lockout scenarios.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
server/core/security.py Adds reset support and conditional (failed-attempt-only) rate limit increment behavior.
server/api/auth.py Rewires login handler to separate lockout check, failure increments, and success reset.
tests/unit/test_rate_limit_failed_attempts.py New tests validating failed-attempt-only counting, reset semantics, and lockout preservation.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread server/api/auth.py Outdated
Comment on lines 232 to 236
# Successful authentication clears accumulated failure count for this IP.
await reset_rate_limit(rate_limit_key)

await db.commit()

Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reset_rate_limit() is invoked before db.commit(), and it uses a separate DB session/transaction. If the session insert/commit later fails, the rate-limit counter will still have been cleared. Consider moving the reset call to after a successful db.commit() (or executing the reset using the same db session/transaction) so the limiter state stays consistent with the login outcome and to reduce the chance of SQLite write-lock contention from nested sessions.

Suggested change
# Successful authentication clears accumulated failure count for this IP.
await reset_rate_limit(rate_limit_key)
await db.commit()
await db.commit()
# Successful authentication clears accumulated failure count for this IP.
await reset_rate_limit(rate_limit_key)

Copilot uses AI. Check for mistakes.
Comment thread server/core/security.py Outdated
Comment on lines +180 to +195
# Clean up non-locked (or expired lock) rows.
await db.execute(
delete(RateLimit).where(
and_(
RateLimit.key == key,
(RateLimit.locked_until.is_(None) | (RateLimit.locked_until <= now)),
)
)
)

# Keep active lockout but clear attempt counter.
await db.execute(
update(RateLimit)
.where(and_(RateLimit.key == key, RateLimit.locked_until > now))
.values(attempts=0)
)
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reset_rate_limit() performs two separate write statements (DELETE then UPDATE). That means the reset isn’t a single atomic statement as described in the PR plan, and on databases with MVCC (e.g., Postgres/MySQL) another transaction could interleave between the two operations. If you want this to be race-resistant, consider implementing it as one SQL statement (e.g., a CTE that conditionally UPDATEs active-lock rows and otherwise DELETEs) or otherwise ensuring atomicity under concurrent login attempts.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Owner

@mythic3011 mythic3011 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is not merge-ready.

It changes /v1/auth/login from a coarse per-IP request throttle into failed-attempt-only lockout behavior. In the new flow, successful logins do not consume the limiter, and a successful login resets the same shared IP-scoped key.

That creates two regressions:

  1. a valid account on the same IP can clear brute-force history for other usernames behind that IP
  2. valid credentials can be used to make unbounded login requests from one IP, removing the previous coarse endpoint throttle

Please restore a coarse per-IP request cap for /login and do not let a successful login clear the shared IP-wide failure bucket.

Not ready.

…successful login clear the shared IP-wide failure bucket.
Copy link
Copy Markdown
Owner

@mythic3011 mythic3011 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved as a regression fix.

This restores the coarse shared per-IP /login cap and stops successful login from clearing that shared bucket, which is the right immediate fix.

Follow-ups still needed:

  • define whether we also want per-account or username+IP failed-attempt throttling
  • add mixed success/failure regression coverage
  • clean up helper naming so the policy is explicit at the call site

Mergeable now, but not the final login throttling design.

@mythic3011
Copy link
Copy Markdown
Owner

mythic3011 commented Apr 3, 2026

@tlg723

I’d like to take ownership of this ticket so we can get it to a merge-ready state before the 8 Apr 2026 deadline.

I also think this should be marked as a co-work ticket in the report, not a single-owner item, because the final merged result will reflect both the original implementation attempt and the follow-up repair / alignment work.

Current review findings:

  1. [P1] login() now calls check_rate_limit(..., is_failed_attempt=...), but the limiter on this branch still does not support that contract. In the current PR state, /login will raise TypeError before authentication completes.

  2. [P1] The new tests do not collect because reset_rate_limit is imported but not implemented on this branch. That means the verification story is incomplete.

  3. [P1] The branch still locks in the old coarse shared-IP semantics: successful logins continue consuming the shared bucket, and the current regression test explicitly asserts 429 after repeated successful logins from the same IP. So the current code does not match the policy change described by the PR.

Verdict: not ready.

Before I proceed, I want to confirm:

  1. Can I take ownership of the final fix / merge path for this ticket?
  2. For the report, should this be recorded as a co-work ticket, and if so, what contribution split do you want us to state?

My scope will be narrow:

  • make the branch internally consistent
  • align the rate-limit helper API, login flow, and tests
  • land one explicit, defensible policy instead of two incomplete ones mixed together

Copy link
Copy Markdown
Owner

@mythic3011 mythic3011 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should keep the main helper direction here, not the current PR’s partial is_failed_attempt / reset_rate_limit model.

Right now the branch mixes two incompatible contracts:

  • /login calls the new helper shape
  • security.py still exposes the old one
  • tests reference reset_rate_limit, but it is not actually implemented
  • the resulting branch is both broken and policy-inconsistent

main is the safer base because it already preserves one coherent coarse shared per-IP abuse boundary and does not let successful login clear that shared state.

If we want a more refined login throttling model later, that should be a separate explicit follow-up, not a half-merged contract change in this PR.

@mythic3011
Copy link
Copy Markdown
Owner

@tlg723

Thanks for continuing to look into this. I agree there is still room to improve the rate-limit policy, and you’re welcome to keep working on it as a nice-to-have follow-up if you’d like to push it further.

At the same time, given the current timeline, I think we should be careful about priority. Compared with this refinement work, the report, testing evidence, and final video/demo preparation are more important for submission right now and will have a more direct impact on delivery and grading.

So my suggestion would be:

  • keep this as optional follow-up / polish work
  • focus first on the higher-priority submission tasks
  • return to this once the report/video side is in a safe state

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

security Security-sensitive changes or review focus server Changes under the server application testing Changes testing code, test coverage, or verification workflows tests Adds or changes automated test coverage

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants