feat: Trigger migration based on UDP send errors#3485
feat: Trigger migration based on UDP send errors#3485larseggert wants to merge 1 commit intomozilla:mainfrom
Conversation
ed2bf7e to
d02a1a3
Compare
There was a problem hiding this comment.
Pull request overview
This PR adds a mechanism for reacting to UDP send failures that indicate a path is no longer usable, by scheduling path re-probing and (on clients) immediately failing over to an alternate valid path when available.
Changes:
- Add
neqo_udp::is_network_error()to classify transient network-level send failures. - Add
Connection::on_path_unavailable()/Paths::on_path_unavailable()and new path state handling to trigger re-probing and client fallback behavior. - Wire send-error handling into
neqo-binclient and add transport-level migration tests covering fallback/re-probe/close behavior.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| neqo-udp/src/lib.rs | Adds is_network_error() helper and tests to classify network send errors. |
| neqo-transport/src/path.rs | Adds path/paths hooks (mark_failed_send, on_path_unavailable) to schedule probes and fallback to another path. |
| neqo-transport/src/connection/tests/migration.rs | Adds new tests for send-error-triggered fallback, probe exhaustion, and server behavior. |
| neqo-transport/src/connection/mod.rs | Exposes Connection::on_path_unavailable() to integrate send failures with path management and recovery. |
| neqo-http3/src/connection_client.rs | Plumbs on_path_unavailable() through the HTTP/3 client wrapper. |
| neqo-bin/src/server/mod.rs | Treats network send errors as non-fatal for the server runner send loop. |
| neqo-bin/src/client/mod.rs | Detects network send errors and notifies the QUIC engine via on_path_unavailable(). |
| neqo-bin/src/client/http3.rs | Implements the new client trait method by forwarding to the HTTP/3 client. |
| neqo-bin/src/client/http09.rs | Implements the new client trait method by forwarding to the transport connection. |
You can also share your feedback on Copilot code review. Take the survey.
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #3485 +/- ##
==========================================
- Coverage 94.30% 94.26% -0.04%
==========================================
Files 127 131 +4
Lines 38711 39252 +541
Branches 38711 39252 +541
==========================================
+ Hits 36505 37000 +495
- Misses 1365 1403 +38
- Partials 841 849 +8
Flags with carried forward coverage won't be shown. Click here to find out more.
|
cf653ed to
21d43d7
Compare
|
On a high level: I assume the main motivation here is Firefox on Android switching between cellular and wifi, is that correct @larseggert? In general, I am in favor of supporting QUIC path migration. Before we merge here, I would like to better understand the larger plan. As far as I can tell there are multiple missing pieces to achieve the larger goal of path migration in Firefox.
Before better understanding the larger project, I don't think merging here makes sense. It introduces complexity with no gain (today) in Firefox. |
When sending indicates that the current path may not be working, trigger migration to a new path.
21d43d7 to
59944ce
Compare
|
So this is mostly plumbing, which as you say would require neqo-glue changes to take effect in Gecko. I was thinking we can experiment with the demo client and server here? |
But then the first step would be providing new client addresses to Neqo, right? Otherwise there would never be any two paths to choose from in the first place. |
Benchmark resultsSignificant performance differences relative to 64a36e2. transfer/1-conn/1-100mb-resp (aka. Download)/mtu-1504: 💚 Performance has improved by -1.9440%. time: [198.59 ms 199.05 ms 199.61 ms]
thrpt: [500.99 MiB/s 502.39 MiB/s 503.56 MiB/s]
change:
time: [-2.2712% -1.9440% -1.6026] (p = 0.00 < 0.05)
thrpt: [+1.6287% +1.9825% +2.3240]
Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severeAll resultstransfer/1-conn/1-100mb-resp (aka. Download)/mtu-1504: 💚 Performance has improved by -1.9440%. time: [198.59 ms 199.05 ms 199.61 ms]
thrpt: [500.99 MiB/s 502.39 MiB/s 503.56 MiB/s]
change:
time: [-2.2712% -1.9440% -1.6026] (p = 0.00 < 0.05)
thrpt: [+1.6287% +1.9825% +2.3240]
Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severetransfer/1-conn/10_000-parallel-1b-resp (aka. RPS)/mtu-1504: Change within noise threshold. time: [281.24 ms 283.18 ms 285.11 ms]
thrpt: [35.074 Kelem/s 35.314 Kelem/s 35.556 Kelem/s]
change:
time: [-2.0875% -1.1559% -0.1608] (p = 0.02 < 0.05)
thrpt: [+0.1611% +1.1694% +2.1320]
Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mildtransfer/1-conn/1-1b-resp (aka. HPS)/mtu-1504: No change in performance detected. time: [38.478 ms 38.636 ms 38.814 ms]
thrpt: [25.764 B/s 25.883 B/s 25.989 B/s]
change:
time: [-0.1172% +0.4624% +1.0079] (p = 0.12 > 0.05)
thrpt: [-0.9978% -0.4602% +0.1174]
No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
1 (1.00%) high mild
6 (6.00%) high severetransfer/1-conn/1-100mb-req (aka. Upload)/mtu-1504: Change within noise threshold. time: [205.20 ms 205.55 ms 205.91 ms]
thrpt: [485.64 MiB/s 486.50 MiB/s 487.32 MiB/s]
change:
time: [+0.0127% +0.2523% +0.4882] (p = 0.05 < 0.05)
thrpt: [-0.4858% -0.2516% -0.0127]
Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mildstreams/walltime/1-streams/each-1000-bytes: No change in performance detected. time: [586.42 µs 588.29 µs 590.50 µs]
change: [-0.4090% +0.0993% +0.5818] (p = 0.72 > 0.05)
No change in performance detected.
Found 17 outliers among 100 measurements (17.00%)
11 (11.00%) high mild
6 (6.00%) high severestreams/walltime/1000-streams/each-1-bytes: No change in performance detected. time: [12.349 ms 12.368 ms 12.388 ms]
change: [-0.2192% +0.0221% +0.2441] (p = 0.86 > 0.05)
No change in performance detected.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mildstreams/walltime/1000-streams/each-1000-bytes: Change within noise threshold. time: [44.878 ms 44.963 ms 45.082 ms]
change: [+0.5644% +0.7884% +1.1189] (p = 0.00 < 0.05)
Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
2 (2.00%) high mild
1 (1.00%) high severetransfer/walltime/pacing-false/varying-seeds: No change in performance detected. time: [80.846 ms 80.909 ms 80.979 ms]
change: [-0.2376% -0.0333% +0.1200] (p = 0.75 > 0.05)
No change in performance detected.
Found 3 outliers among 100 measurements (3.00%)
2 (2.00%) high mild
1 (1.00%) high severetransfer/walltime/pacing-true/varying-seeds: Change within noise threshold. time: [82.336 ms 82.428 ms 82.542 ms]
change: [+0.5549% +0.7720% +0.9516] (p = 0.00 < 0.05)
Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
3 (3.00%) high mild
2 (2.00%) high severetransfer/walltime/pacing-false/same-seed: No change in performance detected. time: [80.732 ms 80.861 ms 81.018 ms]
change: [-0.2691% +0.0290% +0.3154] (p = 0.84 > 0.05)
No change in performance detected.
Found 9 outliers among 100 measurements (9.00%)
2 (2.00%) high mild
7 (7.00%) high severetransfer/walltime/pacing-true/same-seed: Change within noise threshold. time: [81.896 ms 81.977 ms 82.074 ms]
change: [-0.8414% -0.6985% -0.5529] (p = 0.00 < 0.05)
Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
1 (1.00%) low mild
2 (2.00%) high mild
1 (1.00%) high severeDownload data for |
Failed Interop TestsQUIC Interop Runner, client vs. server, differences relative to
All resultsSucceeded Interop TestsQUIC Interop Runner, client vs. server neqo-pr as client
neqo-pr as server
Unsupported Interop TestsQUIC Interop Runner, client vs. server neqo-pr as client
neqo-pr as server
|
Client/server transfer resultsPerformance differences relative to 64a36e2. Transfer of 33554432 bytes over loopback, min. 100 runs. All unit-less numbers are in milliseconds.
Table above only shows statistically significant changes. See all results below. All resultsTransfer of 33554432 bytes over loopback, min. 100 runs. All unit-less numbers are in milliseconds.
Download data for |
|
Yes, this needs more pieces before being useful. Would you prefer to expand the PR before landing it? |
Yes. I don't think at this point in time there is value in merging this pull request, only the risk of dead code on in |
When sending indicates that the current path may not be working, trigger migration to a new path.