Problem
When a task hits a rate limit, the queue marks it as RATE_LIMITED and moves on to the next queued task — which also hits the rate limit, and so on. All tasks burn through their retry_count simultaneously while waiting for the rate limit window to reset.
With a 5-hour rate limit window and the built-in 5-minute retry cooldown, a single task needs ~60 retries (300 minutes / 5 minutes) to survive one window. But because all N queued tasks fail in the same pass, the Nth task in the queue needs N×60 retries before it ever gets a real chance to run. A queue of 5 tasks would require the last task to have max_retries: 300, a value that's impossible to reason about without understanding the internals. Set it too low and tasks silently end up in failed/ with "max retries exceeded", even though nothing actually went wrong.
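To make the arithmetic above concrete, here is a small sketch that reproduces the estimate. The function names are hypothetical and not part of the project; the N×60 scaling is taken from the description above, not derived from the queue's source.

```python
import math


def retries_to_survive_window(window_hours: float, cooldown_minutes: float) -> int:
    """Retries one task burns while waiting out a single rate limit window."""
    return math.ceil(window_hours * 60 / cooldown_minutes)


def retries_for_queue_position(position: int,
                               window_hours: float = 5,
                               cooldown_minutes: float = 5) -> int:
    """Estimated max_retries for the task at a 1-based queue position.

    Assumes the N x 60 scaling described in this issue.
    """
    return position * retries_to_survive_window(window_hours, cooldown_minutes)


if __name__ == "__main__":
    print(retries_to_survive_window(5, 5))   # single task
    print(retries_for_queue_position(5))     # last task in a queue of 5
```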
Proposed fix 1: Pause the whole queue on rate limit
When any task returns a rate limit error, the queue should enter a global rate-limited state and not attempt any other task until the cooldown has elapsed. This means only the rate-limited task burns a retry — subsequent tasks stay QUEUED and untouched.
This reduces the retry requirement for every task to ~60 regardless of queue depth, and eliminates the N×60 scaling problem entirely.
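A minimal sketch of what the global pause could look like, under stated assumptions: Task, Status, RateLimitError, PausingQueue and run_once are hypothetical stand-ins, not the project's actual classes, and the 300-second cooldown mirrors the 5-minute retry cooldown mentioned above.

```python
import time
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List


class Status(Enum):
    QUEUED = "queued"
    RATE_LIMITED = "rate_limited"
    DONE = "done"


class RateLimitError(Exception):
    """Raised by the (hypothetical) executor when the API reports a rate limit."""


@dataclass
class Task:
    name: str
    status: Status = Status.QUEUED
    retry_count: int = 0


@dataclass
class PausingQueue:
    tasks: List[Task]
    cooldown_seconds: float = 300.0
    clock: Callable[[], float] = time.monotonic  # injectable for testing
    paused_until: float = 0.0

    def run_once(self, execute: Callable[[Task], None]) -> None:
        # While the global pause is active, touch no task at all.
        if self.clock() < self.paused_until:
            return
        for task in self.tasks:
            if task.status is Status.DONE:
                continue
            try:
                execute(task)
            except RateLimitError:
                # Only the task that actually hit the limit burns a retry.
                task.status = Status.RATE_LIMITED
                task.retry_count += 1
                self.paused_until = self.clock() + self.cooldown_seconds
                return  # later tasks stay QUEUED and untouched
            task.status = Status.DONE
```

With this shape, a pass that hits the limit on the first task leaves every later task at retry_count 0, so queue depth no longer affects how many retries a task needs.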
Proposed fix 2: Support max_retries: -1 for unlimited retries
Even with fix 1, users shouldn't have to calculate how many retries are needed to survive a rate limit window. max_retries: -1 as a sentinel for unlimited retries would make the intent explicit and remove the guesswork entirely. The change is minimal — a one-line guard in can_retry() in models.py:
def can_retry(self) -> bool:
    return (self.max_retries == -1 or self.retry_count < self.max_retries) and self.status in [
        PromptStatus.FAILED,
        PromptStatus.RATE_LIMITED,
    ]
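For illustration, the guard can be exercised in isolation. The Prompt dataclass and the PromptStatus enum below are simplified stand-ins for whatever models.py actually defines (the field names and the QUEUED member are assumptions); only the can_retry() body is taken from the proposal.

```python
from dataclasses import dataclass
from enum import Enum


class PromptStatus(Enum):
    # Stand-in enum; the real one in models.py may have more members.
    QUEUED = "queued"
    FAILED = "failed"
    RATE_LIMITED = "rate_limited"


@dataclass
class Prompt:
    # Stand-in model with only the fields can_retry() reads.
    max_retries: int = 3
    retry_count: int = 0
    status: PromptStatus = PromptStatus.QUEUED

    def can_retry(self) -> bool:
        # -1 is the proposed sentinel for "retry without limit".
        return (self.max_retries == -1 or self.retry_count < self.max_retries) and self.status in [
            PromptStatus.FAILED,
            PromptStatus.RATE_LIMITED,
        ]


if __name__ == "__main__":
    unlimited = Prompt(max_retries=-1, retry_count=10_000, status=PromptStatus.RATE_LIMITED)
    exhausted = Prompt(max_retries=3, retry_count=3, status=PromptStatus.FAILED)
    print(unlimited.can_retry(), exhausted.can_retry())
```

Note that the sentinel only lifts the count limit: a task whose status is neither FAILED nor RATE_LIMITED is still not retried.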
Workaround
Setting max_retries to a large number (e.g. 999999) works today but requires users to reason about queue depth and rate limit windows to pick a safe value.