Rate limit hits should pause the whole queue, not just the triggering task #3

Description

@mtibbits

Problem

When a task hits a rate limit, the queue marks it as RATE_LIMITED and moves on to the next queued task, which also hits the rate limit, and so on. Every queued task burns through its retries at the same time (retry_count climbs on all of them) while waiting for the rate limit window to reset.

With a 5-hour rate limit window and the built-in 5-minute retry cooldown, a single task needs ~60 retries (300 minutes / 5 minutes) to survive one window. But because all N queued tasks fail in parallel, the Nth task in the queue needs N×60 retries before it ever gets a real chance to run. A queue of 5 tasks would require the last task to have max_retries: 300, a value that's impossible to reason about without understanding the internals. Set it too low and tasks silently end up in failed/ with "max retries exceeded", even though nothing actually went wrong.
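To make the arithmetic above concrete, here is a small sketch of the N×60 estimate. The function name and its parameters are made up for illustration and are not part of the project; the defaults are the 5-hour window and 5-minute cooldown mentioned above.

```python
import math


def retries_to_survive(
    queue_position: int,
    window_minutes: float = 300,   # 5-hour rate limit window (from the issue text)
    cooldown_minutes: float = 5,   # built-in retry cooldown (from the issue text)
) -> int:
    """Estimate the max_retries a task at a 1-based queue position needs,
    using the N x 60 model described in this issue."""
    retries_per_window = math.ceil(window_minutes / cooldown_minutes)
    return queue_position * retries_per_window


# Print the estimate for a queue of 5 tasks.
for position in range(1, 6):
    print(f"task {position}: max_retries >= {retries_to_survive(position)}")
```

The point is only that the required value grows linearly with queue depth, which is exactly what users should not have to compute.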

Proposed fix 1: Pause the whole queue on rate limit

When any task returns a rate limit error, the queue should enter a global rate-limited state and not attempt any other task until the cooldown has elapsed. This means only the rate-limited task burns a retry — subsequent tasks stay QUEUED and untouched.

This reduces the retry requirement for every task to ~60 regardless of queue depth, and eliminates the N×60 scaling problem entirely.
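A minimal sketch of what a global pause could look like. Everything here is hypothetical (Task, Status, RateLimitError, PausingQueue, paused_until, and the injectable clock are illustration names, not the project's actual classes), and it assumes a retry is counted each time a task hits the rate limit.

```python
import time
from collections import deque
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable, Deque, Optional


class Status(Enum):
    QUEUED = auto()
    RATE_LIMITED = auto()
    DONE = auto()


class RateLimitError(Exception):
    """Hypothetical error raised by a task that hit the rate limit."""


@dataclass
class Task:
    name: str
    run: Callable[[], None]
    status: Status = Status.QUEUED
    retry_count: int = 0


class PausingQueue:
    def __init__(self, cooldown_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.tasks: Deque[Task] = deque()
        self.cooldown_seconds = cooldown_seconds
        self.clock = clock
        self.paused_until = 0.0  # global rate-limited state for the whole queue

    def add(self, task: Task) -> None:
        self.tasks.append(task)

    def step(self) -> Optional[Task]:
        """Attempt the task at the head of the queue.

        Returns the attempted task, or None if the queue is empty or paused.
        """
        if not self.tasks or self.clock() < self.paused_until:
            return None  # paused: no task is touched, so no retries are burned
        task = self.tasks[0]
        try:
            task.run()
        except RateLimitError:
            # Only the triggering task pays a retry; everything else stays QUEUED.
            task.status = Status.RATE_LIMITED
            task.retry_count += 1
            self.paused_until = self.clock() + self.cooldown_seconds
            return task
        task.status = Status.DONE
        self.tasks.popleft()
        return task
```

In this sketch the rate-limited task stays at the head of the queue and is the first one retried once the cooldown elapses. Whether it should keep its place or go to the back is a design choice this issue does not settle.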

Proposed fix 2: Support max_retries: -1 for unlimited retries

Even with fix 1, users shouldn't have to calculate how many retries are needed to survive a rate limit window. max_retries: -1 as a sentinel for unlimited retries would make the intent explicit and remove the guesswork entirely. The change is minimal — a one-line guard in can_retry() in models.py:

def can_retry(self) -> bool:
    # max_retries == -1 is the proposed sentinel for "unlimited retries".
    return (self.max_retries == -1 or self.retry_count < self.max_retries) and self.status in [
        PromptStatus.FAILED,
        PromptStatus.RATE_LIMITED,
    ]
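For anyone who wants to try the guard in isolation, here is a self-contained sketch. The Prompt dataclass, its defaults, and the enum values are stand-ins invented for this example; only can_retry(), max_retries, retry_count, and the FAILED / RATE_LIMITED statuses come from the snippet above.

```python
from dataclasses import dataclass
from enum import Enum


class PromptStatus(Enum):
    # Stand-in enum; the string values are assumptions.
    QUEUED = "queued"
    FAILED = "failed"
    RATE_LIMITED = "rate_limited"


@dataclass
class Prompt:
    # Stand-in for the real model class in models.py.
    max_retries: int = 3
    retry_count: int = 0
    status: PromptStatus = PromptStatus.QUEUED

    def can_retry(self) -> bool:
        # -1 means "unlimited": skip the retry_count comparison entirely.
        unlimited = self.max_retries == -1
        return (unlimited or self.retry_count < self.max_retries) and self.status in (
            PromptStatus.FAILED,
            PromptStatus.RATE_LIMITED,
        )


capped = Prompt(max_retries=3, retry_count=3, status=PromptStatus.RATE_LIMITED)
unlimited = Prompt(max_retries=-1, retry_count=10_000, status=PromptStatus.RATE_LIMITED)
print(capped.can_retry(), unlimited.can_retry())
```

Note that the status check still applies with -1, so a task that is merely QUEUED is never reported as retryable.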

Workaround

Setting max_retries to a large number (e.g. 999999) works today but requires users to reason about queue depth and rate limit windows to pick a safe value.
