
[Deferred] Add configurable retry options #225

Open
Digvijay-x1 wants to merge 1 commit into rage-rb:main from Digvijay-x1:configurable-retry-options

Conversation

Contributor

@Digvijay-x1 Digvijay-x1 commented Feb 28, 2026

[Deferred] Add configurable retry options with max_retries and retry_interval

  • Add max_retries class method for simple max attempt configuration
  • Add retry_interval(exception, attempt:) as an overridable class method for advanced per-exception retry control:
    • Return an Integer to retry in N seconds
    • Return super for default exponential backoff
    • Return false/nil to abort retries
  • Update __should_retry? and __next_retry_in to delegate to retry_interval
  • Pass exception to __next_retry_in in the queue for per-exception delay decisions
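The delegation described in these bullets could be sketched with a self-contained mock like the one below. The constant value and method bodies are illustrative assumptions, not the gem's actual internals:

```ruby
BACKOFF_INTERVAL = 5 # assumed base; the real constant lives in Rage::Deferred

module TaskDefaults
  # Default strategy: exponential backoff with jitter.
  def retry_interval(_exception, attempt:)
    rand(BACKOFF_INTERVAL * 2**attempt) + 1
  end

  # Single entry point: decides both whether (nil = abort) and when to retry.
  def __next_retry_in(attempts, exception)
    interval = retry_interval(exception, attempt: attempts)
    return nil if interval == false || interval.nil?
    interval.is_a?(Integer) ? interval : rand(BACKOFF_INTERVAL * 2**attempts) + 1
  end
end

class MyTask
  extend TaskDefaults

  # Per-exception override: fixed delay for IOError, default backoff otherwise.
  def self.retry_interval(exception, attempt:)
    exception.is_a?(IOError) ? 10 : super
  end
end

MyTask.__next_retry_in(1, IOError.new)      # => 10
MyTask.__next_retry_in(1, RuntimeError.new) # => a random backoff Integer
```

Because `retry_interval` is an ordinary class method, overrides can call `super` to fall through to the default strategy.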

Fixes: #215

@Digvijay-x1 Digvijay-x1 marked this pull request as ready for review February 28, 2026 22:11
Member

@rsamoilov rsamoilov left a comment

Hey @Digvijay-x1 , great to see you back.

The implementation doesn't seem to be working correctly. For example, I'd expect the following task to be retried MAX_ATTEMPTS times, as the raised exception hasn't been customised via retry_on:

class MyTask
  include Rage::Deferred::Task
  retry_on RuntimeError, attempts: 2

  def perform
    raise IOError
  end
end

Instead, the task is discarded after the very first run.

Have you considered a more low-level Sidekiq-like API instead of retry_on? Curious what would be the pros and cons of each approach.

@rsamoilov
Member

TIL retry_on in Active Job works pretty much like you've implemented. And if an error is not in retry_on, it's bubbled up to the backend (e.g. Sidekiq)...

This behaviour will be outright wrong for Rage::Deferred. Let's start with defining the API.

@Digvijay-x1
Contributor Author

Digvijay-x1 commented Mar 6, 2026

Hi @rsamoilov,
The issue description says "Retryable exceptions - allow specifying a list of exception classes that should trigger a retry. All other exceptions would not be retried."

The implementation doesn't seem to be working correctly. For example, I'd expect the following task to be retried MAX_ATTEMPTS times, as the raised exception hasn't been customised via retry_on:

The task is discarded after the very first run because IOError is not a specified exception.

Am I missing something, or is there a change in plan? Happy to adapt the code if needed :)

@rsamoilov
Member

The issue description says "Retryable exceptions - allow specifying a list of exception classes that should trigger a retry. All other exceptions would not be retried."

I’m really sorry - that was very bad wording on my end.

I’ve also never used Active Job, so I didn’t realise that retry_on specifies an exclusive list of exceptions where everything else gets propagated to the backend.

Rage::Deferred is an execution layer itself, not an abstraction over a backend. When a task fails with an unexpected exception, there’s nowhere for it to propagate to - it either gets retried or it’s lost. So treating the exception list as exclusive would mean that unexpected errors silently fail.

The behaviour that makes more sense for Rage::Deferred would be: exceptions on the list get retried with the custom retry count, and all other exceptions fall back to the default retry behaviour. The only thing is that considering how retry_on works in Active Job, a different behaviour in Rage might be surprising.

Have you explored the hooks Sidekiq provides?

@Digvijay-x1 Digvijay-x1 force-pushed the configurable-retry-options branch from 52813a4 to a5e111b on March 8, 2026 07:41
@Digvijay-x1
Contributor Author

Digvijay-x1 commented Mar 8, 2026

I looked into Sidekiq's error handling hooks, they have sidekiq_options retry: N for per-job config, sidekiq_retry_in for custom backoff logic, and sidekiq_retries_exhausted for a death callback. It's a solid system, but most of it is built around having a dead set and a web UI for manual retries, which doesn't really seem to apply here.

I think sticking with retry_on but fixing the fallback behavior will be easier. I've updated __should_retry? so that:

  • Matching exceptions use the custom attempts count
  • Non-matching exceptions fall back to MAX_ATTEMPTS (5)
  • Default behavior stays the same
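A self-contained mock of the fallback just described might look as follows; `retry_on_map` and `should_retry?` are illustrative names, not the gem's internals:

```ruby
MAX_ATTEMPTS = 5 # assumed default attempt cap

# Matching exceptions use the custom attempts count from retry_on;
# everything else falls back to MAX_ATTEMPTS.
def should_retry?(exception, attempts, retry_on_map)
  _, custom_limit = retry_on_map.find { |klass, _| exception.is_a?(klass) }
  attempts < (custom_limit || MAX_ATTEMPTS)
end

should_retry?(IOError.new, 3, { RuntimeError => 2 })      # => true: falls back to MAX_ATTEMPTS
should_retry?(RuntimeError.new, 3, { RuntimeError => 2 }) # => false: custom limit of 2
```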

Let me know what you think!

@rsamoilov
Member

What if we go with a two-level approach?

Level 1 is a config option for basic configuration:

class SendWelcomeEmail
  include Rage::Deferred::Task

  max_retries 10

  def perform(email)
    # ...
  end
end

For the most common use cases, developers just want to change the maximum number of retries. Here, the max_retries call allows just that.

Level 2 is an overridable class method for advanced configuration:

class ProcessPayment
  include Rage::Deferred::Task

  # return an Integer to retry in X seconds
  # return super to use the default exponential backoff
  # return false or nil to abort retries
  def self.retry_interval(exception, attempt:)
    case exception
    when TemporaryNetworkError
      10 # Retry in 10 seconds
    when InvalidDataError
      false # Do not retry
    else
      super # Default backoff strategy
    end
  end

  def perform(payment_id)
    # ...
  end
end

The method allows users to set up custom rules, avoids DSLs, and is also easily testable. Plus, developers coming from Active Job won't be confused by retry_on behaving differently.

What do you think?

@Digvijay-x1
Contributor Author

Oh, I get it now. This approach LGTM 👍

@Digvijay-x1 Digvijay-x1 force-pushed the configurable-retry-options branch 2 times, most recently from b068bb0 to d9f113f on March 11, 2026 15:06
Member

@rsamoilov rsamoilov left a comment

Left several comments.

Really like what the API looks like.

Comment on lines +171 to +173
def __should_retry?(attempts, exception = nil)
  interval = retry_interval(exception, attempt: attempts)
  interval != false && !interval.nil?
Member

Let's remove this method and rely on __next_retry_in as a single source of truth that determines both whether and when to retry. This will simplify the code and remove the double computation of retry_interval during retries.

Comment on lines +126 to +179
def __next_retry_in(attempts)
  rand(BACKOFF_INTERVAL * 2**attempts.to_i) + 1
def __next_retry_in(attempts, exception = nil)
  interval = retry_interval(exception, attempt: attempts)
  interval.is_a?(Integer) ? interval : rand(BACKOFF_INTERVAL * 2**attempts.to_i) + 1
Member

  • interval.is_a?(Integer) will skip floats. Let's check for Numeric instead
  • if interval is not falsey and not Numeric, let's log a warning
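A sketch of the validation these two bullets suggest; the helper name and logger setup are assumptions for illustration:

```ruby
require "logger"

BACKOFF_INTERVAL = 5 # assumed base for the exponential backoff
LOGGER = Logger.new($stdout)

def coerce_interval(interval, attempt)
  return nil if interval == false || interval.nil?  # falsey => abort retries
  return interval if interval.is_a?(Numeric)        # Numeric admits Floats, unlike an Integer check
  LOGGER.warn("retry_interval returned #{interval.class}; using default backoff")
  rand(BACKOFF_INTERVAL * 2**attempt) + 1           # fall back to exponential backoff
end

coerce_interval(1.5, 0)   # => 1.5
coerce_interval(false, 0) # => nil
```

Checking `is_a?(Numeric)` keeps fractional delays like `1.5` valid, while unexpected types (a String, an Array) are logged and replaced with the default backoff rather than raising mid-queue.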

# @private
def __next_retry_in(attempts)
  rand(BACKOFF_INTERVAL * 2**attempts.to_i) + 1
def __next_retry_in(attempts, exception = nil)
Member

Suggested change
def __next_retry_in(attempts, exception = nil)
def __next_retry_in(attempts, exception)

Member

Can you please cover the edge cases where retry_interval returns a non-boolean/non-integer/negative value?

# # ...
# end
# end
def max_retries(count)
Member

Right now, this actually defines attempts rather than retries. Let's ensure that max_retries 3 means the task will be executed up to 4 times.
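In other words, under the requested semantics the retry count excludes the initial run; a one-line illustration:

```ruby
# With max_retries N, the task may run up to N + 1 times: 1 initial attempt + N retries.
def total_executions(max_retries)
  max_retries + 1
end

total_executions(3) # => 4
```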

@Digvijay-x1 Digvijay-x1 force-pushed the configurable-retry-options branch from d9f113f to 41b8f7c on March 14, 2026 09:36
@Digvijay-x1
Contributor Author

Thank you, Roman, for reviewing this PR.

I’ve made the requested changes. I added 0 as a valid retry interval (0 → immediate retry instead of aborting) and documented this behavior in the tests. The remaining updates have been included in the latest commit.

Member

@rsamoilov rsamoilov left a comment

Hi @Digvijay-x1

This looks almost perfect.

There're still several issues with the implementation though:

  • max_retries doesn't check the type of the argument passed to it, which can lead to an exception during task processing. Let's ensure count responds to to_i, and then convert it right away in max_retries
  • the logic to calculate the default retry interval (rand(BACKOFF_INTERVAL * 2**attempt) + 1) is duplicated in retry_interval and __next_retry_in
  • if the user defines a custom retry_interval, the retries become unbounded. For example, consider the following task:
class TestTask
  include Rage::Deferred::Task

  max_retries 2

  def self.retry_interval(e, attempt:)
    1
  end

  def perform
    raise "test"
  end
end

With the current implementation, this task will be retried indefinitely. There're arguments for this approach, e.g. overriding retry_interval means taking full control. Also, some tasks legitimately need unbounded retries.

However, in most cases the behaviour where forgetting to call super or track attempts yourself silently creates infinite retries can be very surprising. It could cause queue buildup and resource exhaustion in production from a simple oversight.

Let's update the implementation to ensure @__max_retries || MAX_ATTEMPTS is always taken into account. This way, the mental model becomes much more intuitive: max_retries = "how many times", retry_interval = "how long to wait (or abort early)".
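A minimal mock of that rule; the names and the exact comparison are assumptions, not the actual implementation:

```ruby
MAX_ATTEMPTS = 5 # assumed default cap

# The cap from max_retries (or the default) bounds retries unconditionally,
# even when a custom retry_interval keeps returning a delay.
def next_retry_in(attempts, custom_interval, max_retries: nil)
  return nil if attempts > (max_retries || MAX_ATTEMPTS)
  custom_interval
end

next_retry_in(2, 1, max_retries: 2) # => 1   (still within the cap)
next_retry_in(3, 1, max_retries: 2) # => nil (aborted despite the custom interval)
```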

@Digvijay-x1 Digvijay-x1 force-pushed the configurable-retry-options branch from 41b8f7c to 1eae078 on March 18, 2026 15:28
@Digvijay-x1
Contributor Author

Hi Roman,

  • I have added validation for max_retries argument
  • Removed the duplicated backoff formula (also dropped the .to_i in rand(BACKOFF_INTERVAL * 2**attempt) + 1, since Rage::Deferred::Context.inc_attempts always returns an Integer)
  • Moved the attempt-count check out of retry_interval into __next_retry_in

@Digvijay-x1 Digvijay-x1 force-pushed the configurable-retry-options branch from 1eae078 to 877d6ec on March 18, 2026 16:39
[Deferred] Add configurable retry options with max_retries and retry_interval

Add two class-level APIs for customizing retry behavior in deferred tasks:

- max_retries: sets the maximum number of retries (max_retries 3 means
  the task will be executed up to 4 times total). Validates the argument
  via to_i at class-load time to fail fast on invalid types.
- retry_interval: an overridable class method that receives the exception
  and attempt number, returning an interval in seconds (Numeric), or
  false/nil to abort retries

Consolidate retry logic into __next_retry_in as the single source of
truth for both whether and when to retry, removing the separate
__should_retry? method. The max-retries cap is enforced in __next_retry_in
so that custom retry_interval overrides cannot create unbounded retries.

Extract __default_backoff as the single source for the exponential backoff
formula, eliminating duplication between retry_interval and __next_retry_in.

Validate retry_interval return values: accept any Numeric (Integer or
Float), treat false/nil as abort, and log a warning for unexpected types
while falling back to the default exponential backoff.

Update __perform to return the exception object on failure instead of
false, enabling retry_interval to make decisions based on exception type.

Add comprehensive specs covering custom retry intervals, exception-based
routing, input validation, bounded retries with custom intervals, and
edge cases (Float, nil, false, 0, negative, String, Array).
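The consolidation the commit message describes could be sketched like this; the constant and method bodies are assumptions for illustration:

```ruby
BACKOFF_INTERVAL = 5 # assumed base for the exponential backoff

# Single home for the backoff formula, shared by both callers below.
def __default_backoff(attempt)
  rand(BACKOFF_INTERVAL * 2**attempt) + 1
end

# The overridable default delegates to the shared helper...
def retry_interval(_exception, attempt:)
  __default_backoff(attempt)
end

# ...and so does the queue-side fallback, so the formula lives in one place.
def __next_retry_in(attempts, exception)
  interval = retry_interval(exception, attempt: attempts)
  return nil if interval == false || interval.nil? # abort retries
  interval.is_a?(Numeric) ? interval : __default_backoff(attempts)
end
```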
@Digvijay-x1 Digvijay-x1 force-pushed the configurable-retry-options branch from 877d6ec to a53c533 on March 18, 2026 16:43


Development

Successfully merging this pull request may close these issues.

[Deferred] Configurable retry options

2 participants