[Deferred] Add configurable retry options #225
[Deferred] Add configurable retry options #225Digvijay-x1 wants to merge 1 commit intorage-rb:mainfrom
Conversation
rsamoilov
left a comment
There was a problem hiding this comment.
Hey @Digvijay-x1 , great to see you back.
The implementation doesn't seem to be working correctly. For example, I'd expect the following task to be retried MAX_ATTEMPTS times, as the raised exception hasn't been customised via retry_on:
class MyTask
include Rage::Deferred::Task
retry_on RuntimeError, attempts: 2
def perform
raise IOError
end
endInstead, the task is discarded after the very first run.
Have you considered a more low-level Sidekiq-like API instead of retry_on? Curious what would be the pros and cons of each approach.
|
TIL This behaviour will be outright wrong for |
|
Hi @rsamoilov,
The task is discarded after the very first run because Am I missing something, or is there a change in plan? Happy to adapt the code if needed :) |
I’m really sorry - that was very bad wording on my end. I’ve also never used Active Job, so I didn’t realise that
The behaviour that makes more sense for Have you explored the hooks Sidekiq provides? |
52813a4 to
a5e111b
Compare
|
I looked into Sidekiq's error handling hooks, they have I think sticking with
Let me know what you think! |
|
What if we go with a two-level approach? Level 1 is a config option for basic configuration: class SendWelcomeEmail
include Rage::Deferred::Task
max_retries 10
def perform(email)
# ...
end
endFor the most common use cases, developers just want to change the maximum number of retries. Here, the Level 2 is an overridable class method for advanced configuration: class ProcessPayment
include Rage::Deferred::Task
# return an Integer to retry in X seconds
# return super to use the default exponential backoff
# return false or nil to abort retries
def self.retry_interval(exception, attempt:)
case exception
when TemporaryNetworkError
10 # Retry in 10 seconds
when InvalidDataError
false # Do not retry
else
super # Default backoff strategy
end
end
def perform(payment_id)
# ...
end
endThe method allows users to set up custom rules, avoids DSLs, and is also easily testable. Plus, developers coming from Active Job won't be confused by What do you think? |
|
Oh, I get it now. This approach LGTM 👍 |
b068bb0 to
d9f113f
Compare
rsamoilov
left a comment
There was a problem hiding this comment.
Left several comments.
Really like what the API looks like.
lib/rage/deferred/task.rb
Outdated
| def __should_retry?(attempts, exception = nil) | ||
| interval = retry_interval(exception, attempt: attempts) | ||
| interval != false && !interval.nil? |
There was a problem hiding this comment.
Let's remove this method and rely on __next_retry_in as a single source of truth that determines both whether and when to retry. This will simplify the code and remove the double computation of retry_interval during retries.
lib/rage/deferred/task.rb
Outdated
| def __next_retry_in(attempts) | ||
| rand(BACKOFF_INTERVAL * 2**attempts.to_i) + 1 | ||
| def __next_retry_in(attempts, exception = nil) | ||
| interval = retry_interval(exception, attempt: attempts) | ||
| interval.is_a?(Integer) ? interval : rand(BACKOFF_INTERVAL * 2**attempts.to_i) + 1 |
There was a problem hiding this comment.
interval.is_a?(Integer)will skip floats. Let's check forNumericinstead- if
intervalis not falsey and notNumeric, let's log a warning
lib/rage/deferred/task.rb
Outdated
| # @private | ||
| def __next_retry_in(attempts) | ||
| rand(BACKOFF_INTERVAL * 2**attempts.to_i) + 1 | ||
| def __next_retry_in(attempts, exception = nil) |
There was a problem hiding this comment.
| def __next_retry_in(attempts, exception = nil) | |
| def __next_retry_in(attempts, exception) |
There was a problem hiding this comment.
Can you please cover the edge cases where retry_interval returns a non-boolean/non-integer/negative value?
| # # ... | ||
| # end | ||
| # end | ||
| def max_retries(count) |
There was a problem hiding this comment.
Right now, this actually defines attempts rather than retries. Let's ensure that max_retries 3 means the task will be executed up to 4 times.
d9f113f to
41b8f7c
Compare
|
Thank you, Roman, for reviewing this PR. I’ve made the requested changes. I added |
rsamoilov
left a comment
There was a problem hiding this comment.
Hi @Digvijay-x1
This looks almost perfect.
There're still several issues with the implementation though:
max_retriesdoesn't check the type of the argument passed to it, which can lead to exception during the task processing. Let's ensurecountresponds toto_i, and then convert it right away inmax_retries- the logic to calculate the default retry interval (
rand(BACKOFF_INTERVAL * 2**attempt) + 1) is duplicated inretry_intervaland__next_retry_in - if the user defines a custom
retry_interval, the retries become unbounded. For example, consider the following task:
class TestTask
include Rage::Deferred::Task
max_retries 2
def self.retry_interval(e, attempt:)
1
end
def perform
raise "test"
end
endWith the current implementation, this task will be retried indefinitely. There're arguments for this approach, e.g. overriding retry_interval means taking full control. Also, some tasks legitimately need unbounded retries.
However, in most cases the behaviour where forgetting to call super or track attempts yourself silently creates infinite retries can be very surprising. It could cause queue buildup and resource exhaustion in production from a simple oversight.
Let's update the implementation to ensure @__max_retries || MAX_ATTEMPTS is always taken into account. This way, the mental model becomes much more intuitive: max_retries = "how many times", retry_interval = "how long to wait (or abort early)".
41b8f7c to
1eae078
Compare
|
Hi Roman,
|
1eae078 to
877d6ec
Compare
…interval Add two class-level APIs for customizing retry behavior in deferred tasks: - max_retries: sets the maximum number of retries (max_retries 3 means the task will be executed up to 4 times total). Validates the argument via to_i at class-load time to fail fast on invalid types. - retry_interval: an overridable class method that receives the exception and attempt number, returning an interval in seconds (Numeric), or false/nil to abort retries Consolidate retry logic into __next_retry_in as the single source of truth for both whether and when to retry, removing the separate __should_retry? method. The max-retries cap is enforced in __next_retry_in so that custom retry_interval overrides cannot create unbounded retries. Extract __default_backoff as the single source for the exponential backoff formula, eliminating duplication between retry_interval and __next_retry_in. Validate retry_interval return values: accept any Numeric (Integer or Float), treat false/nil as abort, and log a warning for unexpected types while falling back to the default exponential backoff. Update __perform to return the exception object on failure instead of false, enabling retry_interval to make decisions based on exception type. Add comprehensive specs covering custom retry intervals, exception-based routing, input validation, bounded retries with custom intervals, and edge cases (Float, nil, false, 0, negative, String, Array).
877d6ec to
a53c533
Compare
[Deferred] Add configurable retry options with max_retries and retry_interval
max_retriesclass method for simple max attempt configurationretry_interval(exception, attempt:)as an overridable class method for advanced per-exception retry control:__should_retry?and__next_retry_into delegate to retry_interval__next_retry_inin the queue for per-exception delay decisionsFixes: #215