feat(backoff): implement full-jitter in ExponentialBackoff jitter parameter#51
Merged
BlindMaster24 merged 1 commit intomainfrom Apr 24, 2026
Merged
Conversation
…ameter Previously `ExponentialBackoff::new(initial, max, factor, _jitter)` accepted the `jitter` parameter but ignored it entirely: the schedule was hard-coded to full jitter (`random_range(0..=cap)`) and callers had no way to trade safety for determinism. This wires the parameter end-to-end as an AWS-style jitter knob `j in [0, 1]`: cap_n = min(initial * factor^n, max) delay = cap_n * (1 - j) + rand(0, cap_n * j) with the extremes: * `j = 1.0` — full jitter: `delay = rand(0, cap_n)` (preserves pre-change default behaviour; safe thundering-herd avoidance). * `j = 0.0` — no jitter: `delay = cap_n` (deterministic; useful for tests and for tight SLAs where ordering matters). * `j = 0.5` — equal jitter: `delay in [cap_n/2, cap_n]`. Values outside `[0, 1]` (including NaN) are silently clamped via `f32::clamp`. A `jitter()` accessor is added so callers can read the clamped value. Also hardens the cap computation: * The previous code computed `cap` via `Duration::from_secs_f32(base.as_secs_f32() * factor.powf(n))`, which panics on overflow / NaN and drifts on small millisecond bases (e.g. 10ms * 2^2 was observed as 39ms instead of 40ms). * New code iterates integer `u128` millis with a f64 multiply and clamps to `max_millis` on each step. It exits early on `factor <= 1.0`, non-finite factor, and saturation. This also makes `initial > max` produce `delay <= max` (a real property- test failure before this change). Tests: * `crates/teamtalk/tests/backoff_jitter_tests.rs` adds 9 new integration tests covering jitter == 0 / 0.5 / 1.0 / out-of-range / NaN, exact cap progression, reset, and the default being full jitter. * Existing `utils_tests.rs` and `utils_property_tests.rs` pass unchanged. Local verification: * cargo fmt --all --check clean * cargo clippy --workspace --all-targets --all-features -- -D warnings clean * cargo test --workspace --all-features -> 296 passed / 0 failed (previously 287; +9 from the new backoff_jitter_tests.rs)
This was referenced Apr 23, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
ExponentialBackoff::new(initial, max, factor, _jitter)previouslyaccepted the
jitterparameter but ignored it entirely: theschedule was hard-coded to full jitter (
random_range(0..=cap)) andcallers had no way to trade safety for determinism. The parameter
is now wired end-to-end as an AWS-style jitter knob
j in [0, 1]:with the extremes:
j = 1.0— full jitter:delay = rand(0, cap_n)(preservespre-change default behaviour; safe thundering-herd avoidance).
j = 0.0— no jitter:delay = cap_n(deterministic; usefulfor tests and tight SLAs where ordering matters).
j = 0.5— equal jitter:delay in [cap_n/2, cap_n].Values outside
[0, 1](includingNaN) are silently clamped viaf32::clamp. Ajitter()accessor is added so callers can readthe clamped value.
Hardened cap computation (bug fixes uncovered while wiring jitter)
The previous code computed the cap via
Duration::from_secs_f32(base.as_secs_f32() * factor.powf(n)),which had two real problems:
factor = 2.0andn > ~30the float product overflows and
Duration::from_secs_f32panicswith "cannot convert float seconds to Duration". Tests that ran
next_delay()many times were susceptible.base = 10 msandfactor = 2.0, the schedule at attempt 2 was39 msinstead of40 msbecausef32round-trips throughas_secs_f32()lose a millisecond.initial > maxproduceddelay > max. The property testbackoff_respects_maxwithinitial_ms = 3, max_ms = 1exposed this: the cap was never clamped to
maxon attempt 0.The rewrite iterates integer
u128millis with anf64multiply,clamps to
max_millison each step (including on the very firstattempt), and exits early on
factor <= 1.0, non-finite factor,and saturation. This removes the panic path, fixes the
10 ms ×2^n = 40 ms precision issue, and enforces
delay <= maxfor every input.Tests
crates/teamtalk/tests/backoff_jitter_tests.rsadds 9 newintegration tests covering:
jitter == 0is deterministic and always returns the cap.jitter == 1spans most of the[0, cap]range over1 000 samples (loose bounds to stay non-flaky).
jitter == 0.5stays in[cap/2, cap].5.0,-0.5,NaN) is clamped.10, 20, 40, ...up tomax.reset()clearscurrent_delayandattempts.Defaultis full jitter.utils_tests.rsandutils_property_tests.rspassunchanged.
Local verification
cargo fmt --all --checkcleancargo clippy --workspace --all-targets --all-features -- -D warningscleancargo test --workspace --all-features— 296 passed / 0 failed(previously 287; +9 from
backoff_jitter_tests.rs).Review & Testing Checklist for Human
f32::clamp(NaN, 0, 1) -> 0.0on the current toolchain(Rust documents that
clampreturnsminwhen the value isNaN; the
jitter_nan_is_clamped_to_zerotest covers this).pre-change behaviour seen by all existing callers (the old
code hard-coded
rand(0..=cap)regardless of the unusedparameter; the new default picks
jitter = 1.0inDefault::default()).previous float-only cap: both are exact on typical
initial ∈ [1 ms, 10 s],factor ∈ [1.1, 2.0],max ∈ [10 ms, 10 min]schedules, and the new code removesthe panic path on overflow.
Notes
First P1 API improvement after the P0 structural refactor queue
(#44–#50). Branches from clean
main, independent of the otherPRs in flight, so can be merged in any order.
Next up:
StreamTypesnewtype (PR #53),SdkErrorCodemappingfor
Error::CommandFailed(PR #54),TimeoutKindenum forError::Timeout(PR #55),SecretString/zeroizefor passwords(PR #56), indexed
EventBus::dispatch/Router::dispatchviaHashMap(PR #57), then a big coverage-gap test-fill PR driven byscripts/audit_teamtalk_coverage.py.The pre-existing
semverCI gate is expected to remain red;release-plzhandles the eventual version bump.Link to Devin session: https://app.devin.ai/sessions/71fdd6196cb74723a2e277bb81993a9c
Requested by: @BlindMaster24