
feat(cc): SEARCH #3518

Open
omansfeld wants to merge 9 commits into mozilla:main from omansfeld:search

Conversation

Collaborator

@omansfeld omansfeld commented Mar 31, 2026

SEARCH 4.0 implementation.

This is still missing tests, which I will add later to this PR, but I thought it makes sense to get the implementation into review already while I work on that.

Update 2026-05-04: Have added the tests.

SEARCH specific metrics will be added in a follow-up PR.

I've marked divergences from the most recent draft-09, including the reasoning for them, in code comments prefaced with NOTE.

In my limited manual testing during development, SEARCH exits slow start before loss very reliably -- more so than HyStart++. Example output and a qvis graph from the neqo-client CLI to Cloudflare:

9.625 INFO stats for Client ...
  rx: 26034 drop 3 dup 0 saved 3
  tx: 35991 lost 8 lateack 0 ptoack 0 unackdrop 0
  cc:
    ce_loss 5 ce_ecn 0 ce_spurious 0
    final_cwnd Some(181065) ss_exit_cwnd Some(465764) ss_exit_reason Some(Heuristic)
[...]
[qvis graph]

cc @maryam-ataei (from the SEARCH team): FYI, this is the current implementation going into review.

@codecov

codecov Bot commented Mar 31, 2026

Codecov Report

❌ Patch coverage is 89.01734% with 19 lines in your changes missing coverage. Please review.
✅ Project coverage is 94.82%. Comparing base (f7d28ed) to head (77ef539).
⚠️ Report is 12 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3518      +/-   ##
==========================================
+ Coverage   94.56%   94.82%   +0.26%     
==========================================
  Files         128      115      -13     
  Lines       39816    38102    -1714     
  Branches    39816    38102    -1714     
==========================================
- Hits        37651    36131    -1520     
+ Misses       1328     1268      -60     
+ Partials      837      703     -134     
Flag Coverage Δ
freebsd 93.94% <89.01%> (+0.31%) ⬆️
linux 94.98% <89.01%> (+0.45%) ⬆️
macos 94.92% <89.01%> (+0.44%) ⬆️
windows 94.97% <89.01%> (+0.45%) ⬆️

Flags with carried forward coverage won't be shown.

Components Coverage Δ
neqo-common 98.61% <ø> (ø)
neqo-crypto ∅ <ø> (∅)
neqo-http3 93.92% <ø> (ø)
neqo-qpack 95.14% <ø> (ø)
neqo-transport 95.54% <89.01%> (-0.07%) ⬇️
neqo-udp 84.90% <ø> (ø)
mtu 86.61% <ø> (ø)

Collaborator Author

Only changes to function signatures on the SlowStart trait in this file.

Comment thread neqo-transport/src/cc/search.rs Outdated
@codspeed-hq

codspeed-hq Bot commented Apr 27, 2026

Merging this PR will not alter performance

✅ 24 untouched benchmarks
⏩ 27 skipped benchmarks [1]


Comparing omansfeld:search (77ef539) with main (8c18132)


Footnotes

  1. 27 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

Contributor

@github-actions github-actions Bot left a comment

Review: SEARCH slow-start implementation (draft-chung-ccwg-search-09)

Clean, well-structured implementation. The algorithm logic — bin-based delivery-rate tracking, interpolated lookback, and normalized-difference exit criterion — reads correctly against the draft. The SlowStart trait extension is minimal and the integration into ClassicCongestionController is straightforward.

Key observations:

  • Floating-point elimination: The earlier feedback from @larseggert about avoiding f64 has been addressed — all arithmetic now uses integer fixed-point with SCALE = 100. Nice.

  • Division-by-zero risk: The initialize guard only rejects Duration::ZERO RTTs, but the computed bin_duration can still be zero for very small non-zero RTTs (see inline comment). This creates panic paths in update_bins and on_packets_acked.

  • Drain-phase omission: The decision to skip the draft §3.2 drain-phase is well-reasoned given CUBIC's buffer-filling behavior. The TODO for telemetry capture of the drain-target is a good follow-up.

  • Tests: The PR description notes tests will be added separately. The SEARCH algorithm has several interesting edge cases worth covering: initialization on first ACK, bin skipping/reset for stale data, the prev_idx <= W early-return guard, compute_sent interpolation boundaries, and the exit threshold itself. Looking forward to seeing those.

  • Spec divergences are well-documented: The NOTE comments for divergences from draft-09 (reset mechanism, bit-shifting optimization, interpolation direction fix, drain-phase) are clear and cite the relevant draft sections — this is exactly the right approach for an implementation tracking an evolving draft.

No blocking issues found. The bin_duration == 0 panic path (inline) is the most important item to address.
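The normalized-difference exit criterion described above can be sketched in a few lines. This is a simplified illustration, not the actual neqo code: `SCALE` and the threshold (0.26 scaled to 26) follow the constants quoted in this thread, while the function name and `u64` types are assumptions.

```rust
// Hypothetical sketch of SEARCH's normalized-difference exit check.
const SCALE: u64 = 100; // fixed-point scale, per the integer-math refactor
const THRESH: u64 = 26; // 0.26 scaled by SCALE

/// Returns true when delivered bytes have flattened relative to what was
/// sent one RTT earlier, i.e. the network is near its bottleneck capacity.
fn should_exit(prev_sent: u64, curr_delv: u64) -> bool {
    if prev_sent == 0 {
        return false; // nothing sent in the lookback window
    }
    let diff = prev_sent.saturating_sub(curr_delv);
    // Bounded by SCALE since diff <= prev_sent.
    let norm_diff = diff * SCALE / prev_sent;
    norm_diff >= THRESH
}

fn main() {
    // Delivery keeps pace with sending: stay in slow start (norm_diff = 10).
    assert!(!should_exit(100_000, 90_000));
    // Delivery flattens well below what was sent: exit (norm_diff = 40).
    assert!(should_exit(100_000, 60_000));
    println!("ok");
}
```

Doing the multiplication in a wide-enough type before the division is what keeps the intermediate product from overflowing, which is the theme of the 32-bit discussion later in this thread.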

Comment thread neqo-transport/src/cc/search.rs Outdated
Comment thread neqo-transport/src/cc/search.rs Outdated
Comment thread neqo-transport/src/cc/search.rs
Comment thread neqo-transport/src/cc/search.rs Outdated
Comment thread neqo-transport/src/cc/search.rs
Comment thread neqo-transport/src/sender.rs
Contributor

Copilot AI left a comment

Pull request overview

Adds the SEARCH (Slow start Exit At Right CHokepoint) slow-start exit algorithm to neqo-transport congestion control, enabling heuristic slow-start exit based on delivery-rate flattening per the SEARCH draft.

Changes:

  • Introduces cc::Search and wires it into PacketSender for NewReno and Cubic.
  • Extends the SlowStart trait to accept per-ACK delivered bytes and now, and per-send packet bytes; updates HyStart/ClassicSlowStart and related tests accordingly.
  • Updates clippy doc identifier allowlist for “CHokepoint”.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
neqo-transport/src/sender.rs Wires SlowStart::Search into CC selection for NewReno/Cubic.
neqo-transport/src/cc/search.rs New SEARCH slow-start exit algorithm implementation.
neqo-transport/src/cc/mod.rs Registers and exports Search; adds SlowStart::Search enum variant.
neqo-transport/src/cc/classic_cc.rs Extends SlowStart trait API; plumbs new args through CC core.
neqo-transport/src/cc/hystart.rs Updates HyStart to new SlowStart trait signature.
neqo-transport/src/cc/classic_slow_start.rs Updates ClassicSlowStart to new SlowStart trait signature.
neqo-transport/src/cc/tests/hystart.rs Updates HyStart test suite to new SlowStart trait signature.
.clippy.toml Allows “CHokepoint” in rustdoc identifiers.

Comment thread neqo-transport/src/cc/classic_cc.rs
Comment thread neqo-transport/src/cc/search.rs Outdated
Comment thread neqo-transport/src/cc/search.rs Outdated
Comment thread neqo-transport/src/cc/search.rs Outdated
Comment thread neqo-transport/src/cc/tests/hystart.rs Outdated
Comment thread neqo-transport/src/cc/tests/hystart.rs Outdated
Comment on lines +19 to +26
/// Slow start Exit At Right CHokepoint (SEARCH).
///
/// Exits slow start when the delivery rate flattens, indicating the network is
/// near its bottleneck capacity.
///
/// <https://datatracker.ietf.org/doc/html/draft-chung-ccwg-search-09>
#[derive(Debug)]
pub struct Search {
Copilot AI Apr 27, 2026

SEARCH is a new slow-start exit algorithm and changes congestion-control behavior, but there are no unit/integration tests covering its exit conditions, resets (missed bins), and division-by-zero/edge cases. Please add tests alongside the existing cc test suites (e.g., similar to hystart.rs) before merging to prevent regressions.

Copilot generated this review using guidance from repository custom instructions.
Comment thread neqo-transport/src/cc/search.rs Outdated
Comment on lines +226 to +228
let bins_last_rtt = (rtt_nanos / bin_nanos) as usize;
let prev_idx = curr_idx.saturating_sub(bins_last_rtt);
let fraction = ((rtt_nanos % bin_nanos) * Self::SCALE as u128 / bin_nanos) as usize;
Contributor

Warning

These u128 as usize casts will be flagged by clippy::cast_possible_truncation (enabled via the workspace pedantic lint group). update_bins already handles the identical pattern with the safer usize::try_from(...).unwrap_or(usize::MAX) — these should be consistent.

For bins_last_rtt, unwrap_or(usize::MAX) is correct since an oversized value is caught by the EXTRA_BINS guard below. For fraction, the value is provably < SCALE (100) so truncation is impossible, but clippy can't see that.

Suggested change
let bins_last_rtt = (rtt_nanos / bin_nanos) as usize;
let prev_idx = curr_idx.saturating_sub(bins_last_rtt);
let fraction = ((rtt_nanos % bin_nanos) * Self::SCALE as u128 / bin_nanos) as usize;
let bins_last_rtt = usize::try_from(rtt_nanos / bin_nanos).unwrap_or(usize::MAX);
let prev_idx = curr_idx.saturating_sub(bins_last_rtt);
let fraction = usize::try_from((rtt_nanos % bin_nanos) * Self::SCALE as u128 / bin_nanos)
.unwrap_or(0);

Collaborator Author

What is our take in cases like this? Of course it is nice to guard against u128 as usize truncation in general, but in this case bin_nanos is derived from initial_rtt during initialization, so with realistic values rtt_nanos / bin_nanos will always fit into a usize; the values are only u128 because we use .as_nanos() for division accuracy.

Imo the readability here is worth more than the additional complexity of guarding against a truncation that won't ever happen in practice.

Comment thread neqo-transport/src/cc/search.rs
Contributor

@github-actions github-actions Bot left a comment

Clean implementation of SEARCH draft-09. The algorithm logic — bin-based delivery-rate tracking, interpolated lookback via compute_sent, and normalized-difference exit criterion — reads correctly against the spec. Constants (W=10, THRESH=0.26, WINDOW_SIZE_FACTOR=3.50) all match.

The intentional divergences are well-documented with rationale and spec citations:

  • Stale-data reset (§ missed bins): good addition for app-limited and flow-control-limited scenarios.
  • Drain-phase omission (§3.2): well-reasoned given CUBIC's buffer-filling behavior; the TODO for telemetry capture is a good follow-up.
  • Interpolation direction fix: correcting the draft's forward interpolation to backward — noted as discussed with the SEARCH team.

The SlowStart trait extension is minimal. Making on_packet_sent a default no-op is a good design choice — it keeps ClassicSlowStart simple while letting SEARCH and HyStart override only what they need. The new new_acked_bytes and now parameters are well-motivated by SEARCH's delivery-rate tracking needs.
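The trait shape described in the paragraph above can be illustrated with a minimal sketch. All names here are hypothetical stand-ins, not the actual neqo `SlowStart` trait; the point is the default no-op `on_packet_sent` and the extra per-ACK parameters.

```rust
use std::time::{Duration, Instant};

// Illustrative sketch of the extended trait (names are assumptions).
trait SlowStart {
    /// Per-ACK hook: delivery-rate trackers like SEARCH use
    /// `new_acked_bytes` and `now`; other implementations ignore them.
    fn on_packets_acked(&mut self, new_acked_bytes: usize, rtt: Duration, now: Instant);

    /// Default no-op: only implementations that need per-send byte
    /// accounting (e.g. SEARCH's sent bins) override this.
    fn on_packet_sent(&mut self, _bytes: usize, _now: Instant) {}
}

/// A classic slow start needs no per-send hook at all.
struct Classic {
    acked: usize,
}

impl SlowStart for Classic {
    fn on_packets_acked(&mut self, new_acked_bytes: usize, _rtt: Duration, _now: Instant) {
        self.acked += new_acked_bytes;
    }
    // `on_packet_sent` is inherited as the default no-op.
}

fn main() {
    let mut c = Classic { acked: 0 };
    c.on_packet_sent(1200, Instant::now()); // default body, nothing happens
    c.on_packets_acked(1200, Duration::from_millis(50), Instant::now());
    assert_eq!(c.acked, 1200);
    println!("ok");
}
```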

Two inline findings (neither blocking): a clippy::cast_possible_truncation lint issue with as usize casts, and a minor 32-bit arithmetic overflow concern in compute_sent.

The previous bot-review items (typos, test parameter confusion) appear to have been addressed in the latest commit.

Contributor

@github-actions github-actions Bot left a comment

Good implementation of SEARCH draft-09. The integer-math refactor from floating point is clean, the circular buffer indexing is correct (the prev_idx <= W guard ensures compute_sent never underflows on old - 1), and the spec divergences (no drain phase, optional missed-bin reset) are well-documented with rationale.

Architecture: The SlowStart trait extension (new_acked_bytes, now) is a reasonable approach — the new parameters are plumbed through cleanly and existing impls (ClassicSlowStart, HyStart) just ignore them. The on_packet_sent default body {} is a nice simplification vs. requiring every impl to provide a no-op.

Key items from this review:

  • 32-bit overflow in norm_diff computation (same class as the interpolation issue already discussed, but different location)
  • u128 as usize casts should use usize::try_from for consistency with update_bins
  • debug_assert!(false) gives no visibility in release builds — consider adding a qwarn!

Tests are still pending per the PR description — looking forward to seeing those cover the exit condition, missed-bin reset, and edge cases (zero/tiny RTT, bins_last_rtt = 0).

return None;
}
let diff = prev_sent.saturating_sub(curr_delv);
let norm_diff = diff * Self::SCALE / prev_sent;
Contributor

Warning

diff * Self::SCALE can overflow on 32-bit targets when diff > ~43 MB (same class of issue flagged for compute_sent, but at a different location). Since diff <= prev_sent the final result is bounded by SCALE, but the intermediate product overflows first.

Consider rewriting to avoid the widening multiplication:

Suggested change
let norm_diff = diff * Self::SCALE / prev_sent;
let norm_diff = diff / (prev_sent / Self::SCALE);

That loses a bit of precision (integer division twice), but avoids overflow entirely. Alternatively, widen to u64:

let norm_diff = (diff as u64 * Self::SCALE as u64 / prev_sent as u64) as usize;

Comment thread neqo-transport/src/cc/search.rs Outdated
Comment on lines +94 to +100
if self.bin_duration.is_zero() {
debug_assert!(
false,
"bin_duration must be non-zero for correctness and to guard against div by zero -- initial_rtt was zero or too small"
);
return;
}
Contributor

Caution

When initialize returns early (zero bin_duration), curr_idx remains None, so on_packets_acked will re-attempt initialization on every subsequent ACK. That's fine for recovery once a valid RTT arrives.

However, the early return leaves bin_end as None and bin_duration as Duration::ZERO — so in release builds (debug_assert is a no-op) we silently enter a state where SEARCH is permanently disabled for this connection if every latest_rtt() value remains below ~3ns (yielding a zero bin_duration after integer division). While that's only a theoretical concern, the debug_assert!(false, ...) pattern is unconventional — a qdebug! or qwarn! would give visibility in release builds too.

Suggested change
if self.bin_duration.is_zero() {
debug_assert!(
false,
"bin_duration must be non-zero for correctness and to guard against div by zero -- initial_rtt was zero or too small"
);
return;
}
if self.bin_duration.is_zero() {
debug_assert!(
false,
"bin_duration must be non-zero for correctness and to guard against div by zero -- initial_rtt was zero or too small"
);
qwarn!("SEARCH: bin_duration is zero (initial_rtt too small), skipping initialization");
return;
}
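The "~3ns" bound mentioned above can be made concrete with a tiny standalone sketch of how integer `Duration` arithmetic truncates a small RTT to a zero bin duration. The 3.50/10 factor follows the constants quoted in this review; the function name is illustrative, not neqo code.

```rust
use std::time::Duration;

// Hypothetical: bin_duration = initial_rtt * WINDOW_SIZE_FACTOR (3.50) /
// NUM_BINS (10), done in integer math as * 350 / 1000.
fn bin_duration(initial_rtt: Duration) -> Duration {
    initial_rtt * 350 / 1000
}

fn main() {
    // 2 ns * 350 = 700 ns; 700 / 1000 truncates to 0 ns -> zero bin_duration.
    assert!(bin_duration(Duration::from_nanos(2)).is_zero());
    // 3 ns * 350 = 1050 ns; 1050 / 1000 = 1 ns -> non-zero, SEARCH can run.
    assert!(!bin_duration(Duration::from_nanos(3)).is_zero());
    println!("ok");
}
```

This is why the `Duration::ZERO` guard alone is insufficient: a non-zero but sub-3ns RTT still produces a zero `bin_duration` after the integer division.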


// Initialize on first ACK.
if self.curr_idx.is_none() {
self.initialize(rtt, now);
Contributor

Note

Per draft-09 §3.2 step 5, the algorithm initializes bin_duration using the RTT from the first ACK (initial_rtt). This value is then fixed for the lifetime of the SEARCH instance (until reset()). If the first RTT sample is an outlier (e.g., inflated by server processing delay), the bin grid will be permanently mis-calibrated — bins that are too wide reduce SEARCH's temporal resolution, and bins that are too narrow increase the chance of hitting the prev_idx > EXTRA_BINS bail-out.

This follows the draft, so no change needed now, but it might be worth tracking as a potential improvement: re-initializing with smoothed_rtt after a few samples could make SEARCH more robust to noisy first RTTs.

Collaborator

This is a real concern - the first RTT sample is usually too high.

Collaborator Author

Good point, but I'm not sure it's worth the tradeoff.

Since bin_duration fixes the grid for the byte counters, re-initializing it would invalidate all samples collected so far, which would delay the point where SEARCH has enough data to actually check for the exit point. Using smoothed_rtt, we'd need to wait multiple RTTs until we have a good approximation.

On small bandwidth connections that might make us miss the detection window before loss occurs.

In contrast, too large a bin_duration from a high first RTT also delays the point at which enough bins have passed for a SEARCH check, but by less than being idle for multiple RTTs.

One option would be something like min(first_rtt, second_rtt). Then we miss just one RTT and if the second one is actually lower this might be better than having an inflated bin_duration.

I assume the first RTT is usually higher because of handshake overhead? By how much usually?
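The min(first_rtt, second_rtt) idea floated here could be sketched roughly as follows: defer fixing bin_duration until two RTT samples exist, then use the smaller one. Everything here is a hypothetical illustration (struct, field, and constant names are assumptions), not a proposal for the actual neqo code; the 3.50/10 factor follows the constants quoted in this review.

```rust
use std::time::Duration;

const WINDOW_SIZE_FACTOR_X100: u32 = 350; // 3.50 scaled by 100
const NUM_BINS: u32 = 10; // W = 10 bins per window

/// Hypothetical calibrator: waits for two RTT samples, then fixes the
/// bin grid from the smaller of the two.
struct BinCalibrator {
    samples: Vec<Duration>,
    bin_duration: Option<Duration>,
}

impl BinCalibrator {
    fn new() -> Self {
        Self { samples: Vec::new(), bin_duration: None }
    }

    fn on_rtt_sample(&mut self, rtt: Duration) {
        if self.bin_duration.is_some() {
            return; // grid already fixed
        }
        self.samples.push(rtt);
        if self.samples.len() == 2 {
            let base = self.samples.iter().min().copied().unwrap();
            // bin_duration = base * 3.50 / 10 in integer arithmetic.
            self.bin_duration = Some(base * WINDOW_SIZE_FACTOR_X100 / (NUM_BINS * 100));
        }
    }
}

fn main() {
    let mut c = BinCalibrator::new();
    c.on_rtt_sample(Duration::from_millis(300)); // inflated first sample
    c.on_rtt_sample(Duration::from_millis(100)); // second is back to normal
    // Grid derives from the 100 ms sample: 100 ms * 0.35 = 35 ms per bin.
    assert_eq!(c.bin_duration, Some(Duration::from_millis(35)));
    println!("ok");
}
```

The cost, as noted above, is one RTT of delay before the first bin can start filling.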

Collaborator

Can you re-bin existing samples if you detect the initial RTT was too high?

Collaborator Author

No, I don't think so.

SEARCH records snapshots of the cumulative acked/sent byte counters when crossing a bin_duration boundary. There is no way of knowing where those counters would have stood at the boundaries of a new bin_duration when changing it retrospectively.

I suggest we keep it as is per the spec for now to see how it performs, but keep the min(first_rtt, second_rtt) idea in the back of our heads as potential tuning. I will make a note to also add metrics for this in my follow-up SEARCH-metrics PR, so we can see how often and by how much the first RTT is inflated.

Is my intuition tracking with your experience that if the first RTT is inflated the second will most likely already be back to normal, i.e. min(first_rtt, second_rtt) would be enough of a mechanism to prevent the too high value?

Contributor

@github-actions github-actions Bot left a comment

Summary

Clean implementation of SEARCH 4.0 (draft-09) that integrates well via the existing SlowStart trait. The trait extension is nicely done — adding a default no-op for on_packet_sent avoids boilerplate in implementors that don't need it. Divergences from the spec (no drain phase, interpolation direction fix, optional reset on missed bins) are well-documented with rationale and spec links.

The circular buffer arithmetic is correct — I verified that NUM_ACKED_BINS (11) and NUM_SENT_BINS (26) are exactly sized to avoid aliasing given the access patterns in compute_delv and compute_sent, including the EXTRA_BINS guard at line 266.

The new tests cover initialization, bin advancement, skipped-bin propagation, reset/reinit, and the end-to-end exit condition. A few targeted edge-case tests would round out coverage (noted inline).

Most arithmetic concerns (32-bit overflow, bare as usize casts) were raised in prior review rounds — I've added concrete suggestions for the remaining instances. The debug_assert! + #[should_panic] interaction is a new finding that should be addressed before merge.

Comment on lines +82 to +88
#[test]
#[should_panic(
expected = "bin_duration must be non-zero for correctness and to guard against div by zero -- initial_rtt was zero or too small"
)]
fn initialization_with_zero_rtt() {
init_search(Duration::ZERO);
}
Contributor

Warning

#[should_panic] relies on the debug_assert!(false, ...) in initialize(), which is stripped in release builds. If tests are ever run with --release, this test will fail (no panic fires, then init_search panics at the bin_end assertion with a different message that doesn't match expected).

Guard the test with #[cfg(debug_assertions)] to make this explicit:

Suggested change
#[test]
#[should_panic(
expected = "bin_duration must be non-zero for correctness and to guard against div by zero -- initial_rtt was zero or too small"
)]
fn initialization_with_zero_rtt() {
init_search(Duration::ZERO);
}
#[test]
#[cfg(debug_assertions)]
#[should_panic(
expected = "bin_duration must be non-zero for correctness and to guard against div by zero -- initial_rtt was zero or too small"
)]
fn initialization_with_zero_rtt() {
init_search(Duration::ZERO);
}

Collaborator

Or just make it expect the bin_end assertion text.

Comment thread neqo-transport/src/cc/search.rs Outdated
// `0..[Self::SCALE]` for interpolation in `compute_sent`.
let rtt_nanos = rtt.as_nanos();
let bin_nanos = self.bin_duration.as_nanos();
let bins_last_rtt = (rtt_nanos / bin_nanos) as usize;
Contributor

Warning

Bare as usize casts from u128 here and on line 256 are inconsistent with the defensive usize::try_from(...).unwrap_or(usize::MAX) pattern used in update_bins (line 119). While bins_last_rtt is bounded in practice (RTT / bin_duration ≈ 2.86 for typical ratios), and fraction is always < 100, using the same style throughout would be more robust and consistent.

Suggested change
let bins_last_rtt = (rtt_nanos / bin_nanos) as usize;
let bins_last_rtt = usize::try_from(rtt_nanos / bin_nanos).unwrap_or(usize::MAX);

return None;
}
let diff = prev_sent.saturating_sub(curr_delv);
let norm_diff = diff * Self::SCALE / prev_sent;
Contributor

Warning

diff * Self::SCALE can overflow usize on 32-bit targets when diff exceeds ~42 MB (usize::MAX / 100). This is reachable during slow start with large BDPs. Use saturating_mul for consistency with the rest of the arithmetic in this file:

Suggested change
let norm_diff = diff * Self::SCALE / prev_sent;
let norm_diff = diff.saturating_mul(Self::SCALE) / prev_sent;

Collaborator

Would the saturating_mul limit growth for BDPs > ~42MB?

pn += 1;
bytes_this_round += bytes_this_round / 4;
}
}
Contributor

Tip

The exit-condition test is good. Consider adding a targeted unit test for the prev_sent == 0 early-return path (line 276 of search.rs) — e.g., trigger a SEARCH check window where no packets were sent in the lookback range. Also, a test verifying that compute_sent interpolation produces the correct weighted value for a known fraction would increase confidence in the interpolation math independently of the end-to-end exit test.

Comment on lines +94 to +100
if self.bin_duration.is_zero() {
debug_assert!(
false,
"bin_duration must be non-zero for correctness and to guard against div by zero -- initial_rtt was zero or too small"
);
return;
}
Contributor

Note

When initialize returns early here (release build, zero/tiny RTT), curr_idx stays None and SEARCH silently retries initialization on every subsequent ACK until a non-zero bin_duration results. This is safe but wasteful. Consider setting a flag to avoid the repeated retry, or logging a one-time warning so operators can diagnose why SEARCH never activates.

Suggested change
if self.bin_duration.is_zero() {
debug_assert!(
false,
"bin_duration must be non-zero for correctness and to guard against div by zero -- initial_rtt was zero or too small"
);
return;
}
if self.bin_duration.is_zero() {
debug_assert!(
false,
"bin_duration must be non-zero for correctness and to guard against div by zero -- initial_rtt was zero or too small"
);
// Mark as permanently disabled to avoid retrying on every ACK.
self.bin_end = Some(now);
return;
}

(With this, the now <= bin_end check on line 240 would early-return on subsequent ACKs without re-entering initialize. Alternative: use a dedicated disabled: bool field if you prefer clarity over reusing bin_end.)

@larseggert
Collaborator

@omansfeld there are a lot of QNS timeouts that other recent PRs such as #3586 (comment) do not have. That is suspicious.

- for byte/index calculations
- also adds a helper for `prev_idx` calculation
- fixes existing should_panic test
Contributor

@github-actions github-actions Bot left a comment

Clean implementation of SEARCH 4.0 (draft-09). The SlowStart trait extension is minimal and well-designed (default no-op for on_packet_sent, new parameters plumbed cleanly). Spec divergences (no drain phase, missed-bin reset, interpolation direction fix) are well-documented with rationale and citations.

The new test suite covers the core paths: initialization, bin advancement, skipped-bin propagation, reset/reinit, cumulative byte tracking, prev_idx/fraction calculation, and the end-to-end exit condition. The #[should_panic] test is correctly gated with #[cfg(debug_assertions)].

Main concern: the 32-bit arithmetic overflow in norm_diff and compute_sent — see inline comments for why saturating_mul (previously suggested) is not a safe fix and would cause false slow-start exits on 32-bit targets. Widening to u64 is the right approach.

return None;
}
let diff = prev_sent.saturating_sub(curr_delv);
let norm_diff = diff * Self::SCALE / prev_sent;
Contributor

Caution

Re @larseggert's open question on the prior thread: saturating_mul is not safe here. When diff * SCALE saturates to usize::MAX on 32-bit, norm_diff = usize::MAX / prev_sent yields a huge value (e.g., ~95 for a 43 MB prev_sent), far exceeding THRESH=26. This causes a false slow-start exit, not limited growth.

Since neqo targets 32-bit Android (Firefox ARMv7), widen to u64 instead — the result is bounded by SCALE (100), so the final truncation is always safe:

Suggested change
let norm_diff = diff * Self::SCALE / prev_sent;
let norm_diff = (diff as u64 * Self::SCALE as u64 / prev_sent as u64) as usize;

Comment on lines +192 to +206
const fn compute_sent(&self, old: usize, new: usize, fraction: usize) -> usize {
// NOTE: SEARCH draft-09 does forward interpolation here, i.e. `new + 1`/`old + 1`. That is
// a mistake in the draft and has been discussed with the SEARCH team. Subtracting
// is correct.
let mut sent = (self.sent_bins[(new - 1) % Self::NUM_SENT_BINS]
.saturating_sub(self.sent_bins[(old - 1) % Self::NUM_SENT_BINS]))
.saturating_mul(fraction);

sent = sent.saturating_add(
(self.sent_bins[new % Self::NUM_SENT_BINS]
.saturating_sub(self.sent_bins[old % Self::NUM_SENT_BINS]))
.saturating_mul(Self::SCALE - fraction),
);
sent / Self::SCALE
}
Contributor

Warning

Same class of 32-bit overflow as the norm_diff computation. compute_sent returns sent / Self::SCALE, and if the intermediate products saturated to usize::MAX, the result is usize::MAX / 100 ≈ 42.9M on 32-bit — which inflates prev_sent and can trigger a false slow-start exit downstream.

Consider widening the interpolation to u64 as well:

Suggested change
const fn compute_sent(&self, old: usize, new: usize, fraction: usize) -> usize {
// NOTE: SEARCH draft-09 does forward interpolation here, i.e. `new + 1`/`old + 1`. That is
// a mistake in the draft and has been discussed with the SEARCH team. Subtracting
// is correct.
let mut sent = (self.sent_bins[(new - 1) % Self::NUM_SENT_BINS]
.saturating_sub(self.sent_bins[(old - 1) % Self::NUM_SENT_BINS]))
.saturating_mul(fraction);
sent = sent.saturating_add(
(self.sent_bins[new % Self::NUM_SENT_BINS]
.saturating_sub(self.sent_bins[old % Self::NUM_SENT_BINS]))
.saturating_mul(Self::SCALE - fraction),
);
sent / Self::SCALE
}
const fn compute_sent(&self, old: usize, new: usize, fraction: usize) -> usize {
// NOTE: SEARCH draft-09 does forward interpolation here, i.e. `new + 1`/`old + 1`. That is
// a mistake in the draft and has been discussed with the SEARCH team. Subtracting
// is correct.
let a = (self.sent_bins[(new - 1) % Self::NUM_SENT_BINS]
.saturating_sub(self.sent_bins[(old - 1) % Self::NUM_SENT_BINS]))
as u64
* fraction as u64;
let b = (self.sent_bins[new % Self::NUM_SENT_BINS]
.saturating_sub(self.sent_bins[old % Self::NUM_SENT_BINS]))
as u64
* (Self::SCALE - fraction) as u64;
(a.saturating_add(b) / Self::SCALE as u64) as usize
}

pn += 1;
bytes_this_round += bytes_this_round / 4;
}
}
Contributor

Note

The curr_idx - prev_idx >= EXTRA_BINS guard on search.rs:294 (which handles excessive RTT inflation making the lookback impossible) has no test coverage. Consider adding a test that simulates significant RTT inflation (e.g., initial_rtt=100ms then rtt=600ms, giving bins_last_rtt ≈ 17 > EXTRA_BINS=15) and verifies SEARCH returns None rather than using stale data.
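The scenario suggested above can be sketched as a standalone check of the guard logic. Constants and the 100ms/600ms numbers follow the review text; the function name and shape are illustrative assumptions, not the actual neqo code.

```rust
use std::time::Duration;

// Per the review: lookbacks farther than EXTRA_BINS bins are impossible.
const EXTRA_BINS: usize = 15;

/// Returns None when the RTT-inflated lookback exceeds the buffered
/// history, mirroring the `curr_idx - prev_idx >= EXTRA_BINS` bail-out.
fn lookback_bins(rtt: Duration, bin_duration: Duration) -> Option<usize> {
    // Same defensive cast style as `update_bins` in the review discussion.
    let bins_last_rtt =
        usize::try_from(rtt.as_nanos() / bin_duration.as_nanos()).unwrap_or(usize::MAX);
    if bins_last_rtt >= EXTRA_BINS {
        return None; // stale data: skip this SEARCH check
    }
    Some(bins_last_rtt)
}

fn main() {
    // bin_duration = 35 ms, as derived from initial_rtt = 100 ms (factor 3.5/10).
    let bin = Duration::from_millis(35);
    // Normal RTT: 100 / 35 = 2 bins of lookback, well within the buffer.
    assert_eq!(lookback_bins(Duration::from_millis(100), bin), Some(2));
    // Inflated RTT: 600 / 35 = 17 >= 15, lookback impossible -> None.
    assert_eq!(lookback_bins(Duration::from_millis(600), bin), None);
    println!("ok");
}
```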

@github-actions
Contributor

github-actions Bot commented May 4, 2026

Benchmark results

Significant performance differences relative to 8c18132.

transfer/1-conn/1-100mb-resp (aka. Download): 💔 Performance has regressed by +2.9155%.
       time:   [206.48 ms 206.78 ms 207.12 ms]
       thrpt:  [482.82 MiB/s 483.61 MiB/s 484.32 MiB/s]
change:
       time:   [+2.6759% +2.9155% +3.1529%] (p = 0.00 < 0.05)
       thrpt:  [-3.0566% -2.8329% -2.6062%]
       Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high severe
transfer/1-conn/1-100mb-req (aka. Upload): 💔 Performance has regressed by +3.1347%.
       time:   [208.20 ms 208.61 ms 209.03 ms]
       thrpt:  [478.39 MiB/s 479.36 MiB/s 480.31 MiB/s]
change:
       time:   [+2.8432% +3.1347% +3.4175%] (p = 0.00 < 0.05)
       thrpt:  [-3.3046% -3.0394% -2.7646%]
       Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) low mild
1 (1.00%) high severe
streams/walltime/1000-streams/each-1000-bytes: 💚 Performance has improved by -1.4408%.
       time:   [44.793 ms 44.837 ms 44.882 ms]
       change: [-1.7250% -1.4408% -1.2257] (p = 0.00 < 0.05)
       Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severe
All results
transfer/1-conn/1-100mb-resp (aka. Download): 💔 Performance has regressed by +2.9155%.
       time:   [206.48 ms 206.78 ms 207.12 ms]
       thrpt:  [482.82 MiB/s 483.61 MiB/s 484.32 MiB/s]
change:
       time:   [+2.6759% +2.9155% +3.1529%] (p = 0.00 < 0.05)
       thrpt:  [-3.0566% -2.8329% -2.6062%]
       Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high severe
transfer/1-conn/10_000-parallel-1b-resp (aka. RPS): No change in performance detected.
       time:   [284.33 ms 286.17 ms 288.04 ms]
       thrpt:  [34.718 Kelem/s 34.944 Kelem/s 35.171 Kelem/s]
change:
       time:   [-1.4719% -0.5999% +0.2562%] (p = 0.19 > 0.05)
       thrpt:  [-0.2555% +0.6036% +1.4939%]
       No change in performance detected.
transfer/1-conn/1-1b-resp (aka. HPS): No change in performance detected.
       time:   [38.667 ms 38.825 ms 39.002 ms]
       thrpt:  [25.639   B/s 25.757   B/s 25.862   B/s]
change:
       time:   [-1.2984% -0.5996% +0.0926%] (p = 0.10 > 0.05)
       thrpt:  [-0.0925% +0.6032% +1.3155%]
       No change in performance detected.
Found 9 outliers among 100 measurements (9.00%)
2 (2.00%) high mild
7 (7.00%) high severe
transfer/1-conn/1-100mb-req (aka. Upload): 💔 Performance has regressed by +3.1347%.
       time:   [208.20 ms 208.61 ms 209.03 ms]
       thrpt:  [478.39 MiB/s 479.36 MiB/s 480.31 MiB/s]
change:
       time:   [+2.8432% +3.1347% +3.4175%] (p = 0.00 < 0.05)
       thrpt:  [-3.3046% -3.0394% -2.7646%]
       Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) low mild
1 (1.00%) high severe
streams/walltime/1-streams/each-1000-bytes: Change within noise threshold.
       time:   [592.80 µs 594.60 µs 596.72 µs]
       change: [+0.0155% +0.6357% +1.2321] (p = 0.04 < 0.05)
       Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
7 (7.00%) high severe
streams/walltime/1000-streams/each-1-bytes: Change within noise threshold.
       time:   [12.232 ms 12.253 ms 12.275 ms]
       change: [-0.8030% -0.5649% -0.3139] (p = 0.00 < 0.05)
       Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severe
streams/walltime/1000-streams/each-1000-bytes: 💚 Performance has improved by -1.4408%.
       time:   [44.793 ms 44.837 ms 44.882 ms]
       change: [-1.7250% -1.4408% -1.2257] (p = 0.00 < 0.05)
       Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severe
transfer/walltime/pacing-false/varying-seeds: Change within noise threshold.
       time:   [23.362 ms 23.378 ms 23.393 ms]
       change: [-0.8000% -0.6899% -0.5768] (p = 0.00 < 0.05)
       Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) low mild
1 (1.00%) high mild
transfer/walltime/pacing-true/varying-seeds: Change within noise threshold.
       time:   [23.755 ms 23.770 ms 23.785 ms]
       change: [+0.2200% +0.3117% +0.4038] (p = 0.00 < 0.05)
       Change within noise threshold.
transfer/walltime/pacing-false/same-seed: Change within noise threshold.
       time:   [23.770 ms 23.787 ms 23.804 ms]
       change: [+0.9409% +1.0425% +1.1427] (p = 0.00 < 0.05)
       Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild
transfer/walltime/pacing-true/same-seed: Change within noise threshold.
       time:   [24.048 ms 24.074 ms 24.109 ms]
       change: [+0.0495% +0.1853% +0.3413] (p = 0.01 < 0.05)
       Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
2 (2.00%) low mild
1 (1.00%) high mild
1 (1.00%) high severe

Download data for profiler.firefox.com or download performance comparison data.

@github-actions
Contributor

github-actions Bot commented May 4, 2026

Failed Interop Tests

QUIC Interop Runner, client vs. server, differences relative to main at faa1437.

neqo-pr as client / neqo-pr as server
neqo-pr vs. aioquic: BP BA
neqo-pr vs. go-x-net: BP BA
neqo-pr vs. haproxy: BP BA
neqo-pr vs. kwik: 🚀L1 C1 BP BA
neqo-pr vs. linuxquic: L1 BP BA CM
neqo-pr vs. lsquic: L1 C1 BP BA CM
neqo-pr vs. msquic: A L1 C1 BP BA
neqo-pr vs. mvfst: A BP BA
neqo-pr vs. neqo: A BP BA CM
neqo-pr vs. nginx: BP BA
neqo-pr vs. ngtcp2: E BP BA CM
neqo-pr vs. picoquic: A BP BA CM
neqo-pr vs. quic-go: A BP BA
neqo-pr vs. quic-zig: BP BA CM
neqo-pr vs. quiche: BP BA
neqo-pr vs. quinn: BP BA
neqo-pr vs. s2n-quic: BP BA CM
neqo-pr vs. tquic: S BP BA
neqo-pr vs. xquic: A 🚀C1 BP BA
aioquic vs. neqo-pr: 🚀L1 ⚠️C1 BP BA CM
go-x-net vs. neqo-pr: BP BA CM
kwik vs. neqo-pr: BP BA CM
linuxquic vs. neqo-pr: run cancelled after 20 min
lsquic vs. neqo-pr: ⚠️C1 BP BA CM
msquic vs. neqo-pr: BP BA CM
mvfst vs. neqo-pr: run cancelled after 20 min
neqo vs. neqo-pr: A BP BA CM
ngtcp2 vs. neqo-pr: E BP BA CM
openssl vs. neqo-pr: LR M A BP BA CM
quic-go vs. neqo-pr: BP BA CM
quic-zig vs. neqo-pr: ⚠️DC
quiche vs. neqo-pr: BP BA CM
quinn vs. neqo-pr: V2 CM
s2n-quic vs. neqo-pr: 🚀BP CM
tquic vs. neqo-pr: CM
xquic vs. neqo-pr: M CM
All results

Succeeded Interop Tests

QUIC Interop Runner, client vs. server

neqo-pr as client

neqo-pr as server

Unsupported Interop Tests

QUIC Interop Runner, client vs. server

neqo-pr as client

neqo-pr as server

@github-actions
Contributor

github-actions Bot commented May 4, 2026

Client/server transfer results

Performance differences relative to 8c18132.

Transfer of 33554432 bytes over loopback, min. 100 runs. All unit-less numbers are in milliseconds.

Client vs. server (params) Mean ± σ Min Max MiB/s ± σ Δ baseline Δ baseline
neqo-msquic-cubic 150.6 ± 4.3 144.6 177.0 212.4 ± 7.4 💔 1.4 0.9%
neqo-neqo-cubic 89.8 ± 2.3 85.5 95.3 356.3 ± 13.9 💚 -0.8 -0.9%
neqo-neqo-cubic-nopacing 88.8 ± 2.6 84.6 95.9 360.2 ± 12.3 💚 -0.7 -0.8%
quiche-neqo-cubic 171.8 ± 3.3 165.0 181.0 186.3 ± 9.7 💔 2.6 1.6%

Table above only shows statistically significant changes. See all results below.

All results

Transfer of 33554432 bytes over loopback, min. 100 runs. All unit-less numbers are in milliseconds.

Client vs. server (params) Mean ± σ Min Max MiB/s ± σ Δ baseline Δ baseline
google-google-nopacing 457.6 ± 1.6 454.8 464.3 69.9 ± 20.0
google-neqo-cubic 267.8 ± 2.9 264.5 287.4 119.5 ± 11.0 -0.4 -0.1%
msquic-msquic-nopacing 128.2 ± 21.9 110.5 202.7 249.6 ± 1.5
msquic-neqo-cubic 149.1 ± 26.4 118.8 330.0 214.6 ± 1.2 4.3 3.0%
neqo-google-cubic 769.5 ± 2.2 763.3 774.5 41.6 ± 14.5 -0.0 -0.0%
neqo-msquic-cubic 150.6 ± 4.3 144.6 177.0 212.4 ± 7.4 💔 1.4 0.9%
neqo-neqo-cubic 89.8 ± 2.3 85.5 95.3 356.3 ± 13.9 💚 -0.8 -0.9%
neqo-neqo-cubic-nopacing 88.8 ± 2.6 84.6 95.9 360.2 ± 12.3 💚 -0.7 -0.8%
neqo-neqo-newreno 90.6 ± 2.4 85.2 96.1 353.1 ± 13.3 -0.6 -0.6%
neqo-neqo-newreno-nopacing 89.6 ± 3.5 83.5 112.0 357.1 ± 9.1 -0.7 -0.8%
neqo-quiche-cubic 188.1 ± 2.3 183.9 195.4 170.1 ± 13.9 -0.4 -0.2%
neqo-s2n-cubic 215.1 ± 2.3 209.6 229.8 148.8 ± 13.9 -0.1 -0.1%
quiche-neqo-cubic 171.8 ± 3.3 165.0 181.0 186.3 ± 9.7 💔 2.6 1.6%
quiche-quiche-nopacing 138.9 ± 3.6 133.8 161.8 230.4 ± 8.9
s2n-neqo-cubic 213.8 ± 3.4 207.7 226.2 149.7 ± 9.4 -0.8 -0.4%
s2n-s2n-nopacing 292.5 ± 25.8 267.7 389.1 109.4 ± 1.2

Download data for profiler.firefox.com or download performance comparison data.

@omansfeld
Collaborator Author

The potential usize overflows on 32-bit targets when multiplying by Self::SCALE in compute_sent and when calculating norm_diff are a real concern. My flight to Toronto is almost over, so I'll look into that with a clear head again in the coming days.

Probably the easiest solution is to widen intermediate results to u64 when multiplying by Self::SCALE.


3 participants