
feat: Increase default UDP send buffer size to 1MB #3495

Open
larseggert wants to merge 1 commit into mozilla:main from larseggert:feat-1mb-sndbuf

Conversation

@larseggert
Collaborator

Copilot AI review requested due to automatic review settings March 20, 2026 11:22
Contributor

Copilot AI left a comment


Pull request overview

This PR updates the neqo-bin UDP socket setup to proactively increase the default per-socket UDP send buffer size to 1MB (matching Firefox behavior), improving performance headroom for higher-throughput scenarios.

Changes:

  • Set SO_SNDBUF to 1MB when the existing send buffer is below 1MB.
  • Add debug logging to report when the send buffer is changed vs. left unchanged (a sketch of this behavior follows below).
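
For context, the described change boils down to a conditional SO_SNDBUF bump at socket setup. A minimal sketch using socket2 directly (the PR itself appears to go through a quinn-udp-style `UdpSocketState` helper, as in the snippet quoted below; names here are illustrative, not the PR's exact code):

```rust
use std::net::UdpSocket;

use neqo_common::qdebug;
use socket2::SockRef;

const ONE_MB: usize = 1024 * 1024;

/// Sketch only: raise SO_SNDBUF to 1 MB when the OS default is smaller,
/// and log whether the buffer was changed or left alone.
fn maybe_increase_send_buffer(socket: &UdpSocket) -> std::io::Result<()> {
    let sock = SockRef::from(socket);
    let send_buf_before = sock.send_buffer_size()?;
    if send_buf_before < ONE_MB {
        sock.set_send_buffer_size(ONE_MB)?;
        let send_buf_after = sock.send_buffer_size()?;
        qdebug!("Increasing socket send buffer size from {send_buf_before} to {send_buf_after}");
    } else {
        qdebug!("Not increasing socket send buffer size of {send_buf_before}");
    }
    Ok(())
}
```

Note that on Linux the kernel doubles the value passed to SO_SNDBUF for bookkeeping, so the value read back may differ from the requested 1 MB.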

Comment thread neqo-bin/src/udp.rs
Comment on lines +39 to +42
qdebug!(
"Increasing socket send buffer size from {send_buf_before} to {ONE_MB}, now: {:?}",
state.send_buffer_size((&socket).into())
);

Copilot AI Mar 20, 2026


state.send_buffer_size((&socket).into()) returns a Result, but here it's logged with {:?} without ?, so the message will print Ok(..)/Err(..) and any error is silently ignored. Consider capturing send_buf_after using ? (or handling/logging the error explicitly) and logging the actual size to keep the log message accurate and error handling consistent with the earlier send_buf_before query.
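
A sketch of what the suggested fix could look like, assuming `send_buffer_size` returns an `io::Result<usize>` as it does for the earlier `send_buf_before` query (illustrative, not a confirmed change to the PR):

```rust
// Propagate a failure instead of printing Ok(..)/Err(..) via {:?}.
let send_buf_after = state.send_buffer_size((&socket).into())?;
qdebug!(
    "Increasing socket send buffer size from {send_buf_before} to {ONE_MB}, now: {send_buf_after}"
);
```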

@codecov

codecov Bot commented Mar 20, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 94.18%. Comparing base (5b4e850) to head (8bb596f).
⚠️ Report is 9 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3495      +/-   ##
==========================================
- Coverage   94.30%   94.18%   -0.12%     
==========================================
  Files         127      131       +4     
  Lines       38739    39069     +330     
  Branches    38739    39069     +330     
==========================================
+ Hits        36532    36799     +267     
- Misses       1369     1421      +52     
- Partials      838      849      +11     
| Flag | Coverage Δ |
| --- | --- |
| freebsd | 93.23% <ø> (-0.09%) ⬇️ |
| linux | 94.29% <ø> (-0.01%) ⬇️ |
| macos | 94.18% <ø> (ø) |
| windows | 94.29% <ø> (ø) |

Flags with carried forward coverage won't be shown.

| Components | Coverage Δ |
| --- | --- |
| neqo-common | 98.49% <ø> (ø) |
| neqo-crypto | 86.90% <ø> (ø) |
| neqo-http3 | 93.91% <ø> (ø) |
| neqo-qpack | 94.81% <ø> (ø) |
| neqo-transport | 95.23% <ø> (-0.02%) ⬇️ |
| neqo-udp | 82.55% <ø> (-0.43%) ⬇️ |
| mtu | 86.61% <ø> (ø) |

@github-actions
Contributor

Benchmark results

Significant performance differences relative to 5b4e850.

transfer/1-conn/1-100mb-resp (aka. Download)/mtu-1504: 💚 Performance has improved by -1.4682%.
       time:   [197.87 ms 198.32 ms 198.78 ms]
       thrpt:  [503.07 MiB/s 504.23 MiB/s 505.38 MiB/s]
change:
       time:   [-1.8496% -1.4682% -1.1042%] (p = 0.00 < 0.05)
       thrpt:  [+1.1165% +1.4901% +1.8845%]
       Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
transfer/1-conn/1-100mb-req (aka. Upload)/mtu-1504: 💚 Performance has improved by -1.4988%.
       time:   [201.27 ms 201.61 ms 201.95 ms]
       thrpt:  [495.16 MiB/s 496.01 MiB/s 496.85 MiB/s]
change:
       time:   [-1.8140% -1.4988% -1.2163%] (p = 0.00 < 0.05)
       thrpt:  [+1.2313% +1.5216% +1.8475%]
       Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild
All results
transfer/1-conn/1-100mb-resp (aka. Download)/mtu-1504: 💚 Performance has improved by -1.4682%.
       time:   [197.87 ms 198.32 ms 198.78 ms]
       thrpt:  [503.07 MiB/s 504.23 MiB/s 505.38 MiB/s]
change:
       time:   [-1.8496% -1.4682% -1.1042%] (p = 0.00 < 0.05)
       thrpt:  [+1.1165% +1.4901% +1.8845%]
       Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
transfer/1-conn/10_000-parallel-1b-resp (aka. RPS)/mtu-1504: No change in performance detected.
       time:   [284.31 ms 286.48 ms 288.64 ms]
       thrpt:  [34.645 Kelem/s 34.907 Kelem/s 35.173 Kelem/s]
change:
       time:   [-0.8882% +0.1486% +1.1667] (p = 0.78 > 0.05)
       thrpt:  [-1.1532% -0.1484% +0.8962]
       No change in performance detected.
transfer/1-conn/1-1b-resp (aka. HPS)/mtu-1504: No change in performance detected.
       time:   [38.557 ms 38.689 ms 38.840 ms]
       thrpt:  [25.747   B/s 25.847   B/s 25.936   B/s]
change:
       time:   [-0.9644% -0.3229% +0.2969] (p = 0.31 > 0.05)
       thrpt:  [-0.2960% +0.3240% +0.9738]
       No change in performance detected.
Found 5 outliers among 100 measurements (5.00%)
1 (1.00%) high mild
4 (4.00%) high severe
transfer/1-conn/1-100mb-req (aka. Upload)/mtu-1504: 💚 Performance has improved by -1.4988%.
       time:   [201.27 ms 201.61 ms 201.95 ms]
       thrpt:  [495.16 MiB/s 496.01 MiB/s 496.85 MiB/s]
change:
       time:   [-1.8140% -1.4988% -1.2163%] (p = 0.00 < 0.05)
       thrpt:  [+1.2313% +1.5216% +1.8475%]
       Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild
streams/walltime/1-streams/each-1000-bytes: No change in performance detected.
       time:   [585.24 µs 586.98 µs 589.06 µs]
       change: [-0.4552% +0.0162% +0.5025] (p = 0.94 > 0.05)
       No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
1 (1.00%) high mild
6 (6.00%) high severe
streams/walltime/1000-streams/each-1-bytes: No change in performance detected.
       time:   [12.330 ms 12.348 ms 12.365 ms]
       change: [-0.5767% -0.0159% +0.3522] (p = 0.96 > 0.05)
       No change in performance detected.
streams/walltime/1000-streams/each-1000-bytes: Change within noise threshold.
       time:   [45.334 ms 45.379 ms 45.423 ms]
       change: [+0.3568% +0.4952% +0.6280] (p = 0.00 < 0.05)
       Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
transfer/walltime/pacing-false/varying-seeds: No change in performance detected.
       time:   [77.788 ms 77.896 ms 78.049 ms]
       change: [-0.3611% -0.1790% +0.0298] (p = 0.07 > 0.05)
       No change in performance detected.
Found 4 outliers among 100 measurements (4.00%)
2 (2.00%) high mild
2 (2.00%) high severe
transfer/walltime/pacing-true/varying-seeds: Change within noise threshold.
       time:   [79.705 ms 79.873 ms 80.089 ms]
       change: [+0.1325% +0.3644% +0.6441] (p = 0.00 < 0.05)
       Change within noise threshold.
Found 9 outliers among 100 measurements (9.00%)
4 (4.00%) high mild
5 (5.00%) high severe
transfer/walltime/pacing-false/same-seed: Change within noise threshold.
       time:   [78.072 ms 78.192 ms 78.348 ms]
       change: [+0.0781% +0.3370% +0.5724] (p = 0.00 < 0.05)
       Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
4 (4.00%) high mild
4 (4.00%) high severe
transfer/walltime/pacing-true/same-seed: Change within noise threshold.
       time:   [79.591 ms 79.695 ms 79.852 ms]
       change: [-1.2790% -1.0413% -0.7934] (p = 0.00 < 0.05)
       Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
3 (3.00%) high mild
1 (1.00%) high severe

Download data for profiler.firefox.com or download performance comparison data.

@github-actions
Contributor

Failed Interop Tests

QUIC Interop Runner, client vs. server, differences relative to main at 5b4e850.

neqo-pr as client
neqo-pr vs. go-x-net: BP BA
neqo-pr vs. haproxy: BP BA
neqo-pr vs. kwik: S L1 C1 BP BA
neqo-pr vs. linuxquic: L1 🚀C1
neqo-pr vs. lsquic: run cancelled after 20 min
neqo-pr vs. msquic: A L1 C1
neqo-pr vs. mvfst: H DC LR M R Z 3 B U A L1 L2 C1 C2 6 BP BA
neqo-pr vs. neqo: Z A 🚀BP
neqo-pr vs. nginx: BP BA
neqo-pr vs. ngtcp2: CM
neqo-pr vs. picoquic: A
neqo-pr vs. quic-go: A
neqo-pr vs. quiche: BP BA
neqo-pr vs. s2n-quic: 🚀BP BA CM
neqo-pr vs. tquic: S BP BA
neqo-pr vs. xquic: A 🚀C1

neqo-pr as server
aioquic vs. neqo-pr: Z CM
go-x-net vs. neqo-pr: CM
kwik vs. neqo-pr: Z BP BA CM
linuxquic vs. neqo-pr: Z
lsquic vs. neqo-pr: Z 🚀C1 ⚠️BA
msquic vs. neqo-pr: Z ⚠️BA CM
mvfst vs. neqo-pr: Z A L1 C1 CM
neqo vs. neqo-pr: Z A
openssl vs. neqo-pr: LR M A 🚀BP CM
picoquic vs. neqo-pr: Z CM
quic-go vs. neqo-pr: CM
quiche vs. neqo-pr: run cancelled after 20 min
quinn vs. neqo-pr: Z 🚀L1 C1 V2 CM
s2n-quic vs. neqo-pr: 🚀BP CM
tquic vs. neqo-pr: Z CM
xquic vs. neqo-pr: M CM
All results

Succeeded Interop Tests

QUIC Interop Runner, client vs. server

neqo-pr as client

neqo-pr as server

Unsupported Interop Tests

QUIC Interop Runner, client vs. server

neqo-pr as client

neqo-pr as server

@github-actions
Contributor

Client/server transfer results

Performance differences relative to 5b4e850.

Transfer of 33554432 bytes over loopback, min. 100 runs. All unit-less numbers are in milliseconds.

| Client vs. server (params) | Mean ± σ | Min | Max | MiB/s ± σ | Δ baseline (ms) | Δ baseline (%) |
| --- | --- | --- | --- | --- | --- | --- |
| neqo-s2n-cubic | 217.4 ± 4.2 | 209.5 | 228.8 | 147.2 ± 7.6 | 💚 -1.7 | -0.8% |

Table above only shows statistically significant changes. See all results below.

All results

Transfer of 33554432 bytes over loopback, min. 100 runs. All unit-less numbers are in milliseconds.

| Client vs. server (params) | Mean ± σ | Min | Max | MiB/s ± σ | Δ baseline (ms) | Δ baseline (%) |
| --- | --- | --- | --- | --- | --- | --- |
| google-google-nopacing | 449.8 ± 3.3 | 444.3 | 464.9 | 71.1 ± 9.7 | | |
| google-neqo-cubic | 269.7 ± 4.8 | 261.3 | 278.7 | 118.7 ± 6.7 | -0.5 | -0.2% |
| msquic-msquic-nopacing | 186.0 ± 99.9 | 115.5 | 642.4 | 172.0 ± 0.3 | | |
| msquic-neqo-cubic | 186.4 ± 64.2 | 127.4 | 376.4 | 171.6 ± 0.5 | -7.9 | -4.1% |
| neqo-google-cubic | 747.0 ± 5.1 | 713.9 | 757.6 | 42.8 ± 6.3 | -0.4 | -0.0% |
| neqo-msquic-cubic | 153.0 ± 4.1 | 147.0 | 162.0 | 209.2 ± 7.8 | -0.9 | -0.6% |
| neqo-neqo-cubic | 96.8 ± 4.5 | 90.2 | 107.1 | 330.4 ± 7.1 | -0.3 | -0.3% |
| neqo-neqo-cubic-nopacing | 96.9 ± 4.5 | 87.5 | 108.1 | 330.3 ± 7.1 | 0.7 | 0.8% |
| neqo-neqo-newreno | 96.6 ± 4.4 | 87.6 | 106.5 | 331.4 ± 7.3 | -0.4 | -0.4% |
| neqo-neqo-newreno-nopacing | 96.9 ± 4.3 | 87.3 | 105.2 | 330.2 ± 7.4 | 0.6 | 0.7% |
| neqo-quiche-cubic | 191.2 ± 4.4 | 185.8 | 200.4 | 167.4 ± 7.3 | -0.1 | -0.0% |
| neqo-s2n-cubic | 217.4 ± 4.2 | 209.5 | 228.8 | 147.2 ± 7.6 | 💚 -1.7 | -0.8% |
| quiche-neqo-cubic | 178.4 ± 4.7 | 169.9 | 190.5 | 179.4 ± 6.8 | -0.3 | -0.2% |
| quiche-quiche-nopacing | 142.4 ± 4.6 | 134.0 | 157.5 | 224.8 ± 7.0 | | |
| s2n-neqo-cubic | 221.0 ± 3.9 | 213.8 | 229.7 | 144.8 ± 8.2 | -1.1 | -0.5% |
| s2n-s2n-nopacing | 293.4 ± 15.3 | 282.4 | 388.8 | 109.1 ± 2.1 | | |

Download data for profiler.firefox.com or download performance comparison data.

@mxinden
Member

mxinden commented Mar 22, 2026

Is there evidence that a larger send buffer improves performance? Intuitively, I cannot tell whether it is a performance improvement or just leads to buffer bloat at the source.

See past discussion in #2470 (comment).

@mxinden
Member

mxinden commented Mar 22, 2026

I also commented on Phabricator: https://phabricator.services.mozilla.com/D288864#10030860

@larseggert
Collaborator Author

Are you concerned 1MB is too much? The defaults are IIRC ~260KB on Linux and less on Mac, so I worry we may be buffer limited when uploading over reasonably fast paths.

@mxinden
Member

mxinden commented Mar 24, 2026

Yes, I am worried that 1 MB is too much. In which scenario do you see Firefox exceeding the 260KB OS buffer?

An example:

  • given a 50ms RTT connection
  • given a 260 KB OS UDP send buffer
  • with a single send per RTT we would thus be able to saturate a 40 Mbit/s link
  • more realistically, but still very conservative, with 10 sends per RTT we would be able to saturate a 400 Mbit/s link

Given that we pace all sends across 0.5 of the RTT, I don't see us benefiting from a larger OS buffer. The theoretical benefits of a larger buffer also come with drawbacks, i.e. buffer bloat, which I assume far outweigh those benefits.
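
For reference, the back-of-the-envelope arithmetic behind these numbers (a sketch using the figures quoted above; not measured):

```rust
/// Maximum send rate (Mbit/s) if the send buffer can be refilled
/// `sends_per_rtt` times per round trip. Illustrative only.
fn max_rate_mbit_s(sends_per_rtt: f64, buffer_bytes: f64, rtt_s: f64) -> f64 {
    sends_per_rtt * buffer_bytes * 8.0 / rtt_s / 1e6
}

fn main() {
    // 260 KB OS send buffer, 50 ms RTT:
    println!("{:.0} Mbit/s", max_rate_mbit_s(1.0, 260e3, 0.050)); // ~42 -> "40 Mbit/s"
    println!("{:.0} Mbit/s", max_rate_mbit_s(10.0, 260e3, 0.050)); // ~416 -> "400 Mbit/s"
    // Same RTT with the proposed 1 MB buffer and one send per RTT:
    println!("{:.0} Mbit/s", max_rate_mbit_s(1.0, 1024.0 * 1024.0, 0.050)); // ~168
}
```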

@martinthomson
Member

Careful, you almost make a good argument for a larger buffer. Lots of people are on gigabit links. I don't know how often we'd be able to service timers over 50ms, but I suspect it is more like 3 times than 10 in some cases (if timer granularity isn't increased beyond the typical).

We definitely need to fix that pacing speedup. It's doing us more harm than good right now.
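
Plugging those numbers into the same back-of-the-envelope formula (an illustration, not part of the original discussion): saturating a 1 Gbit/s link over a 50 ms RTT requires roughly

$$ \frac{1\,\text{Gbit/s} \times 50\,\text{ms}}{8 \times 3\ \text{sends per RTT}} \approx 2\,\text{MB per send}, $$

which is above both the ~260 KB OS default and the proposed 1 MB.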

