Skip to content

fix: Handle ENOBUFS in UDP send#3482

Open
larseggert wants to merge 5 commits intomozilla:mainfrom
larseggert:fix-enobufs
Open

fix: Handle ENOBUFS in UDP send#3482
larseggert wants to merge 5 commits intomozilla:mainfrom
larseggert:fix-enobufs

Conversation

@larseggert
Copy link
Copy Markdown
Collaborator

Based on a discussion with @ianswett yesterday, don't treat ENOBUFS as a fatal error when sending UDP packets.

Copilot AI review requested due to automatic review settings March 17, 2026 02:57
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Updates neqo-udp’s UDP send error handling to avoid treating ENOBUFS as a fatal send error, aligning send behavior with transient NIC/stack buffer exhaustion on some platforms.

Changes:

  • Treat ENOBUFS from UdpSocketState::try_send() as io::ErrorKind::WouldBlock instead of a hard error.
  • Add is_enobufs() helper (unix-gated) and update unit tests to account for the new predicate.

You can also share your feedback on Copilot code review. Take the survey.

Comment thread neqo-udp/src/lib.rs Outdated
Comment thread neqo-udp/src/lib.rs Outdated
Comment thread neqo-udp/src/lib.rs
Comment thread neqo-udp/src/lib.rs
@codecov
Copy link
Copy Markdown

codecov Bot commented Mar 17, 2026

Codecov Report

❌ Patch coverage is 85.18519% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 94.19%. Comparing base (53d49ba) to head (0c9c486).
⚠️ Report is 10 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3482      +/-   ##
==========================================
- Coverage   94.30%   94.19%   -0.11%     
==========================================
  Files         127      131       +4     
  Lines       38837    39178     +341     
  Branches    38837    39178     +341     
==========================================
+ Hits        36626    36905     +279     
- Misses       1376     1428      +52     
- Partials      835      845      +10     
Flag Coverage Δ
freebsd 93.20% <86.36%> (-0.12%) ⬇️
linux 94.26% <86.36%> (-0.01%) ⬇️
macos 94.20% <86.36%> (-0.01%) ⬇️
windows 94.24% <66.66%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
neqo-common 98.50% <ø> (ø)
neqo-crypto 86.90% <ø> (ø)
neqo-http3 93.92% <ø> (ø)
neqo-qpack 95.03% <ø> (ø)
neqo-transport 95.20% <ø> (ø)
neqo-udp 82.69% <85.18%> (-0.85%) ⬇️
mtu 86.61% <ø> (ø)

@larseggert larseggert force-pushed the fix-enobufs branch 3 times, most recently from 04de82b to 345f3cf Compare March 17, 2026 09:02
Based on a discussion with @ianswett yesterday, don't treat `ENOBUFS` as a fatal error when sending UDP packets.
Copy link
Copy Markdown
Member

@mxinden mxinden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer fixing this issue in quinn-udp, thus de-duplicating the logic across the various usages.

quinn-rs/quinn#2293

@larseggert
Copy link
Copy Markdown
Collaborator Author

No objections!

@larseggert
Copy link
Copy Markdown
Collaborator Author

@mxinden quinn-rs/quinn#2293 got closed - what now?

@mxinden
Copy link
Copy Markdown
Member

mxinden commented Apr 1, 2026

@larseggert let's try to get to the root of the discussion. I prepared quinn-rs/quinn#2594. Can you help out?

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 10, 2026

Claude finished @larseggert's task in 2m 3s —— View job


Review of PR #3482

  • Read CLAUDE.md and repo guidelines
  • Read the changed file and PR diff
  • Analyze existing review comments
  • Post review with inline comments

Summary

Reviewed the ENOBUFS handling changes. The approach is correct — callers already handle WouldBlock with wait-and-retry loops, making this a clean fit. Posted 2 inline suggestions:

  1. Comment/code scope mismatch — comment says "macOS/BSD" but the predicate fires on all unix + Windows
  2. Missing Windows test for WSAENOBUFS

Also addressed previous copilot review comments (original error loss is acceptable given qdebug! logging, EAGAIN coverage is present in the combined test, and ENOBUFS integration testing is impractical without socket mocking).

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

Maps ENOBUFS (and WSAENOBUFS on Windows) to WouldBlock during UDP send, so the existing retry loops in both client and server correctly wait for writability and retry rather than treating it as fatal. Also cleans up is_emsgsize by switching from map_or to is_some_and, removing the clippy suppression.

The approach is sound — I verified that callers (neqo-bin/src/client/mod.rs:484, neqo-bin/src/server/mod.rs:325) handle WouldBlock with a writable().await + retry loop, which is exactly the right recovery for a transient send queue exhaustion. The try_io wrapper in neqo-bin/src/udp.rs:93 also correctly propagates the WouldBlock through tokio's readiness machinery.

Notes

  • Comment/code scope mismatch: The inline comment says "On macOS/BSD" but the predicate fires on all unix + Windows. See inline suggestion.
  • Windows test gap: is_enobufs on Windows (WSAENOBUFS) has no test coverage. See inline suggestion.
  • Upstream dedup: Agree with @mxinden that this could live in quinn-udp to benefit all consumers. If quinn-rs/quinn#2594 lands, this local workaround can be removed. Worth tracking.

On previous review comments

  • @copilot re: losing the original error (line 90): Agreed in principle, but callers only inspect e.kind() and the original is captured by qdebug!. Not blocking.
  • @copilot re: missing send_inner integration test (line 91): Triggering ENOBUFS in a test environment is impractical without mocking the socket layer. The unit tests on the predicates are sufficient.
  • @copilot re: EAGAIN coverage dropped (line 384): This is addressed — the combined is_emsgsize_and_is_enobufs_are_disjoint test at line 377 asserts both predicates return false for EAGAIN.

Comment thread neqo-udp/src/lib.rs Outdated
Comment thread neqo-udp/src/lib.rs
#[cfg(not(any(unix, windows)))]
return false;
})
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note

Missing Windows test coverage for is_enobufs. The existing Windows tests cover is_emsgsize variants (WSAEMSGSIZE, WSAEINVAL, WSAEWOULDBLOCK) but don't verify WSAENOBUFS. Consider adding:

#[test]
#[cfg(windows)]
fn is_enobufs_true_for_wsaenobufs() {
    let err = io::Error::from_raw_os_error(WinSock::WSAENOBUFS.0);
    assert!(is_enobufs(&err));
    assert!(!is_emsgsize(&err));
}

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 10, 2026

Reviewing PR #3482

  • Read CLAUDE.md and repo guidelines
  • Read changed files and PR diff
  • Analyze changes and post review

View job run

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Lars Eggert <lars@eggert.org>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 10, 2026

Claude finished @larseggert's task in 2m 44s —— View job


Reviewing PR #3482

  • Read CLAUDE.md and repo guidelines
  • Read changed file and PR diff
  • Check existing review comments
  • Post review with inline comments

Summary

Reviewed the ENOBUFS → WouldBlock mapping. The approach integrates correctly with tokio's readiness model via try_io, and callers already handle WouldBlock with wait-and-retry loops. Posted a formal PR review with one inline suggestion for missing Windows WSAENOBUFS test coverage.

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approach is sound. ENOBUFS → WouldBlock integrates correctly with the callers: both client/mod.rs:484 and server/mod.rs:325 retry in a writable().await loop, and critically, the try_io wrapper in neqo-bin/src/udp.rs:93 ensures tokio clears the socket's readiness flag on WouldBlock, preventing a busy-loop.

The is_emsgsize cleanup (is_some_and replacing map_or + clippy suppression) and the combined disjointness test are nice improvements.

One inline comment for missing Windows test coverage. Previous reviewer comments about upstream dedup in quinn-udp (quinn-rs/quinn#2594) are worth tracking — if that lands, this local workaround can be removed.

Comment thread neqo-udp/src/lib.rs
Comment on lines 411 to 412

#[test]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note

Windows WSAENOBUFS coverage gap — the unix test is_emsgsize_and_is_enobufs_are_disjoint verifies both predicates for EMSGSIZE, ENOBUFS, and EAGAIN, but there's no Windows-side equivalent for WSAENOBUFS. The existing Windows tests only cover is_emsgsize variants.

Suggested change
#[test]
#[test]
#[cfg(windows)]
fn is_enobufs_true_for_wsaenobufs() {
let err = io::Error::from_raw_os_error(WinSock::WSAENOBUFS.0);
assert!(is_enobufs(&err));
assert!(!is_emsgsize(&err));
}
#[test]

@github-actions
Copy link
Copy Markdown
Contributor

Failed Interop Tests

QUIC Interop Runner, client vs. server, differences relative to main at 00e61ef.

neqo-pr as clientneqo-pr as server
neqo-pr vs. go-x-net: BP BA
neqo-pr vs. haproxy: BP BA
neqo-pr vs. kwik: S L1 C1 BP BA
neqo-pr vs. linuxquic: C1
neqo-pr vs. lsquic: baseline result missing
neqo-pr vs. msquic: A L1 C1
neqo-pr vs. mvfst: H DC LR M R Z 3 B U A L1 L2 C1 C2 6 BP BA
neqo-pr vs. neqo: Z A
neqo-pr vs. nginx: BP BA
neqo-pr vs. ngtcp2: CM
neqo-pr vs. picoquic: A
neqo-pr vs. quic-go: A
neqo-pr vs. quiche: ⚠️L1 BP BA
neqo-pr vs. s2n-quic: ⚠️BP BA CM
neqo-pr vs. tquic: S BP BA
neqo-pr vs. xquic: A 🚀L1 C1
aioquic vs. neqo-pr: Z CM
go-x-net vs. neqo-pr: CM
kwik vs. neqo-pr: Z BP BA CM
lsquic vs. neqo-pr: Z
msquic vs. neqo-pr: Z CM
mvfst vs. neqo-pr: Z A L1 C1 CM
neqo vs. neqo-pr: Z A
openssl vs. neqo-pr: LR M A CM
picoquic vs. neqo-pr: Z
quic-go vs. neqo-pr: CM
quiche vs. neqo-pr: Z CM
quinn vs. neqo-pr: Z V2 CM
s2n-quic vs. neqo-pr: CM
tquic vs. neqo-pr: Z CM
xquic vs. neqo-pr: M CM
All results

Succeeded Interop Tests

QUIC Interop Runner, client vs. server

neqo-pr as client

neqo-pr as server

Unsupported Interop Tests

QUIC Interop Runner, client vs. server

neqo-pr as client

neqo-pr as server

@github-actions
Copy link
Copy Markdown
Contributor

Benchmark results

Significant performance differences relative to 93f8004.

transfer/1-conn/1-100mb-req (aka. Upload)/mtu-1504: 💚 Performance has improved by -1.4275%.
       time:   [202.01 ms 202.39 ms 202.84 ms]
       thrpt:  [493.00 MiB/s 494.09 MiB/s 495.03 MiB/s]
change:
       time:   [-1.7257% -1.4275% -1.1276] (p = 0.00 < 0.05)
       thrpt:  [+1.1405% +1.4482% +1.7560]
       Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severe
All results
transfer/1-conn/1-100mb-resp (aka. Download)/mtu-1504: Change within noise threshold.
       time:   [200.06 ms 200.64 ms 201.31 ms]
       thrpt:  [496.74 MiB/s 498.41 MiB/s 499.84 MiB/s]
change:
       time:   [-1.3346% -0.9891% -0.6262] (p = 0.00 < 0.05)
       thrpt:  [+0.6302% +0.9990% +1.3526]
       Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
2 (2.00%) high mild
1 (1.00%) high severe
transfer/1-conn/10_000-parallel-1b-resp (aka. RPS)/mtu-1504: No change in performance detected.
       time:   [285.70 ms 287.35 ms 289.01 ms]
       thrpt:  [34.601 Kelem/s 34.801 Kelem/s 35.001 Kelem/s]
change:
       time:   [-0.1259% +0.7315% +1.5788] (p = 0.10 > 0.05)
       thrpt:  [-1.5543% -0.7262% +0.1260]
       No change in performance detected.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) low mild
1 (1.00%) high mild
transfer/1-conn/1-1b-resp (aka. HPS)/mtu-1504: No change in performance detected.
       time:   [38.789 ms 38.958 ms 39.144 ms]
       thrpt:  [25.547   B/s 25.669   B/s 25.780   B/s]
change:
       time:   [-0.4506% +0.2038% +0.8784] (p = 0.54 > 0.05)
       thrpt:  [-0.8707% -0.2034% +0.4527]
       No change in performance detected.
Found 10 outliers among 100 measurements (10.00%)
2 (2.00%) high mild
8 (8.00%) high severe
transfer/1-conn/1-100mb-req (aka. Upload)/mtu-1504: 💚 Performance has improved by -1.4275%.
       time:   [202.01 ms 202.39 ms 202.84 ms]
       thrpt:  [493.00 MiB/s 494.09 MiB/s 495.03 MiB/s]
change:
       time:   [-1.7257% -1.4275% -1.1276] (p = 0.00 < 0.05)
       thrpt:  [+1.1405% +1.4482% +1.7560]
       Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severe
streams/walltime/1-streams/each-1000-bytes: No change in performance detected.
       time:   [593.75 µs 596.54 µs 600.12 µs]
       change: [-0.5828% +0.0265% +0.7514] (p = 0.94 > 0.05)
       No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
8 (8.00%) high severe
streams/walltime/1000-streams/each-1-bytes: No change in performance detected.
       time:   [12.290 ms 12.307 ms 12.324 ms]
       change: [-0.7874% -0.2680% +0.0874] (p = 0.28 > 0.05)
       No change in performance detected.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
streams/walltime/1000-streams/each-1000-bytes: Change within noise threshold.
       time:   [44.401 ms 44.494 ms 44.637 ms]
       change: [-0.7751% -0.5245% -0.2330] (p = 0.00 < 0.05)
       Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high severe
transfer/walltime/pacing-false/varying-seeds: Change within noise threshold.
       time:   [81.114 ms 81.187 ms 81.259 ms]
       change: [+1.7971% +1.9600% +2.0968] (p = 0.00 < 0.05)
       Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) low severe
1 (1.00%) high mild
transfer/walltime/pacing-true/varying-seeds: No change in performance detected.
       time:   [81.256 ms 81.339 ms 81.445 ms]
       change: [-0.1638% +0.1040% +0.3346] (p = 0.43 > 0.05)
       No change in performance detected.
Found 3 outliers among 100 measurements (3.00%)
2 (2.00%) high mild
1 (1.00%) high severe
transfer/walltime/pacing-false/same-seed: Change within noise threshold.
       time:   [81.462 ms 81.553 ms 81.681 ms]
       change: [+2.0046% +2.2419% +2.4553] (p = 0.00 < 0.05)
       Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe
transfer/walltime/pacing-true/same-seed: No change in performance detected.
       time:   [81.173 ms 81.249 ms 81.336 ms]
       change: [-0.3887% -0.1776% -0.0128] (p = 0.07 > 0.05)
       No change in performance detected.
Found 3 outliers among 100 measurements (3.00%)
1 (1.00%) high mild
2 (2.00%) high severe

Download data for profiler.firefox.com or download performance comparison data.

@github-actions
Copy link
Copy Markdown
Contributor

Client/server transfer results

Performance differences relative to 93f8004.

Transfer of 33554432 bytes over loopback, min. 100 runs. All unit-less numbers are in milliseconds.

Client vs. server (params) Mean ± σ Min Max MiB/s ± σ Δ baseline Δ baseline
neqo-neqo-cubic 97.4 ± 4.2 88.9 106.2 328.5 ± 7.6 💚 -1.2 -1.2%
neqo-neqo-cubic-nopacing 96.0 ± 3.9 88.4 103.9 333.5 ± 8.2 💚 -2.3 -2.4%
neqo-quiche-cubic 190.1 ± 3.7 185.5 201.3 168.3 ± 8.6 💚 -1.4 -0.7%

Table above only shows statistically significant changes. See all results below.

All results

Transfer of 33554432 bytes over loopback, min. 100 runs. All unit-less numbers are in milliseconds.

Client vs. server (params) Mean ± σ Min Max MiB/s ± σ Δ baseline Δ baseline
google-google-nopacing 454.9 ± 3.9 447.7 467.2 70.4 ± 8.2
google-neqo-cubic 271.2 ± 4.9 261.3 284.8 118.0 ± 6.5 0.2 0.1%
msquic-msquic-nopacing 189.5 ± 89.6 120.5 528.6 168.8 ± 0.4
msquic-neqo-cubic 214.7 ± 91.9 132.8 550.3 149.1 ± 0.3 -10.4 -4.6%
neqo-google-cubic 758.6 ± 4.5 751.6 778.2 42.2 ± 7.1 0.9 0.1%
neqo-msquic-cubic 154.7 ± 4.4 147.2 163.3 206.9 ± 7.3 1.1 0.7%
neqo-neqo-cubic 97.4 ± 4.2 88.9 106.2 328.5 ± 7.6 💚 -1.2 -1.2%
neqo-neqo-cubic-nopacing 96.0 ± 3.9 88.4 103.9 333.5 ± 8.2 💚 -2.3 -2.4%
neqo-neqo-newreno 96.9 ± 3.9 88.1 104.2 330.1 ± 8.2 0.2 0.2%
neqo-neqo-newreno-nopacing 98.1 ± 4.6 87.4 112.1 326.2 ± 7.0 0.4 0.4%
neqo-quiche-cubic 190.1 ± 3.7 185.5 201.3 168.3 ± 8.6 💚 -1.4 -0.7%
neqo-s2n-cubic 220.7 ± 4.4 213.4 229.8 145.0 ± 7.3 0.3 0.1%
quiche-neqo-cubic 180.4 ± 4.4 168.2 188.3 177.4 ± 7.3 -0.0 -0.0%
quiche-quiche-nopacing 141.0 ± 4.8 134.3 155.1 226.9 ± 6.7
s2n-neqo-cubic 220.4 ± 4.5 211.3 230.3 145.2 ± 7.1 -0.2 -0.1%
s2n-s2n-nopacing 294.3 ± 14.9 280.2 392.3 108.7 ± 2.1

Download data for profiler.firefox.com or download performance comparison data.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants