chore: Stabilize benchmarks for better signal #3370

Draft
larseggert wants to merge 2 commits into mozilla:main from larseggert:chore-bench-stabilization

@larseggert
Collaborator

More runs, tighter intervals, joint config.
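
As a rough illustration of why more runs give tighter intervals: under a normal approximation, the half-width of a confidence interval for a mean shrinks with the square root of the sample count. The sketch below is not the benchmark harness's actual method (it may well use bootstrapping); the function name and the unit standard deviation are assumptions for illustration only.

```python
import math

def ci_half_width(stddev: float, n: int, z: float = 1.96) -> float:
    """Approximate half-width of a 95% confidence interval for a mean.

    Assumes independent, roughly normal measurements (a simplification).
    """
    return z * stddev / math.sqrt(n)

# Hypothetical noise level of 1.0 (arbitrary units) at several sample counts.
for n in (100, 500, 1000):
    print(n, round(ci_half_width(1.0, n), 4))
```

Quadrupling the number of measurements halves the interval width, so gains from extra runs taper off quickly.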

@codspeed-hq

codspeed-hq Bot commented Jan 28, 2026

Merging this PR will degrade performance by 4.53%

❌ 1 regressed benchmark
✅ 26 untouched benchmarks
🗄️ 5 archived benchmarks run [1]

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
| --- | --- | --- | --- | --- |
| WallTime | walltime/1-streams/each-1000-bytes | 1.4 ms | 1.4 ms | -4.53% |

Comparing larseggert:chore-bench-stabilization (ed64ad6) with main (6a84421)

Footnotes

  1. 5 benchmarks were run, but are now archived. If they were deleted in another branch, consider rebasing to remove them from the report. If they were instead added back, restore them in CodSpeed.

@codecov

codecov Bot commented Jan 28, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 94.22%. Comparing base (6a84421) to head (ed64ad6).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3370      +/-   ##
==========================================
- Coverage   94.33%   94.22%   -0.12%     
==========================================
  Files         125      129       +4     
  Lines       38392    38722     +330     
  Branches    38392    38722     +330     
==========================================
+ Hits        36216    36484     +268     
- Misses       1337     1389      +52     
- Partials      839      849      +10     
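
The totals in the diff above are consistent with coverage being hits divided by lines, where lines is the sum of hits, misses, and partials. A minimal check under that assumption (the helper name is hypothetical, not part of Codecov):

```python
def coverage(hits: int, misses: int, partials: int) -> float:
    """Coverage in percent, assuming lines = hits + misses + partials."""
    lines = hits + misses + partials
    return 100.0 * hits / lines

base = coverage(36216, 1337, 839)  # main (6a84421)
head = coverage(36484, 1389, 849)  # this PR (ed64ad6)
print(f"{base:.2f}% -> {head:.2f}%")  # prints "94.33% -> 94.22%"
```

The drop comes from the 330 added lines being covered at a lower rate (268 hits) than the existing code, not from previously covered lines losing coverage.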
| Flag | Coverage Δ |
| --- | --- |
| freebsd | 93.25% <ø> (-0.12%) ⬇️ |
| linux | 94.31% <ø> (-0.01%) ⬇️ |
| macos | 94.21% <ø> (-0.01%) ⬇️ |
| windows | 94.32% <ø> (-0.01%) ⬇️ |

Flags with carried forward coverage won't be shown.

| Components | Coverage Δ |
| --- | --- |
| neqo-common | 98.49% <ø> (ø) |
| neqo-crypto | 86.90% <ø> (ø) |
| neqo-http3 | 93.91% <ø> (ø) |
| neqo-qpack | 94.81% <ø> (ø) |
| neqo-transport | 95.30% <ø> (-0.02%) ⬇️ |
| neqo-udp | 82.90% <ø> (ø) |
| mtu | 86.61% <ø> (ø) |

larseggert force-pushed the chore-bench-stabilization branch from d38618d to e6a3222 on January 28, 2026 14:38
More runs, tighter intervals, joint config.
larseggert force-pushed the chore-bench-stabilization branch from e6a3222 to fbbcbbd on January 28, 2026 14:56
@github-actions
Contributor

github-actions Bot commented Mar 4, 2026

Failed Interop Tests

QUIC Interop Runner, client vs. server, differences relative to main at 6a84421.

neqo-pr as client:
neqo-pr vs. go-x-net: BP BA
neqo-pr vs. haproxy: BP BA
neqo-pr vs. kwik: BP BA
neqo-pr vs. lsquic: L1 C1
neqo-pr vs. msquic: A L1 C1
neqo-pr vs. mvfst: A L1 C1 BA
neqo-pr vs. neqo: Z A
neqo-pr vs. nginx: BP BA
neqo-pr vs. ngtcp2: CM
neqo-pr vs. picoquic: A
neqo-pr vs. quic-go: A
neqo-pr vs. quiche: BP BA
neqo-pr vs. s2n-quic: ⚠️BP BA CM
neqo-pr vs. tquic: S BP BA
neqo-pr vs. xquic: A 🚀L1 ⚠️C1

neqo-pr as server:
aioquic vs. neqo-pr: Z ⚠️C1 CM
go-x-net vs. neqo-pr: CM
kwik vs. neqo-pr: Z BP BA CM
lsquic vs. neqo-pr: Z ⚠️C1
msquic vs. neqo-pr: Z 🚀BP CM
mvfst vs. neqo-pr: Z A L1 C1 CM
neqo vs. neqo-pr: Z A
openssl vs. neqo-pr: LR M A CM
picoquic vs. neqo-pr: Z
quic-go vs. neqo-pr: 🚀Z BP CM
quiche vs. neqo-pr: Z CM
quinn vs. neqo-pr: Z 🚀C1 V2 CM
s2n-quic vs. neqo-pr: CM
tquic vs. neqo-pr: Z CM
xquic vs. neqo-pr: M CM
All results

Succeeded Interop Tests

QUIC Interop Runner, client vs. server

neqo-pr as client

neqo-pr as server

Unsupported Interop Tests

QUIC Interop Runner, client vs. server

neqo-pr as client

neqo-pr as server

@github-actions
Contributor

github-actions Bot commented Mar 4, 2026

Client/server transfer results

Performance differences relative to 6a84421.

Transfer of 33554432 bytes over loopback, min. 100 runs. All unit-less numbers are in milliseconds.

| Client vs. server (params) | Mean ± σ | Min | Max | MiB/s ± σ | Δ main (ms) | Δ main (%) |
| --- | --- | --- | --- | --- | --- | --- |
| neqo-google-cubic | 772.6 ± 4.1 | 765.6 | 785.5 | 41.4 ± 7.8 | 💔 1.3 | 0.2% |
| neqo-neqo-newreno-nopacing | 96.5 ± 4.2 | 87.8 | 104.6 | 331.5 ± 7.6 | 💚 -1.3 | -1.4% |
| neqo-s2n-cubic | 221.9 ± 4.8 | 212.5 | 231.6 | 144.2 ± 6.7 | 💔 3.6 | 1.7% |

Table above only shows statistically significant changes. See all results below.
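
The MiB/s column can be sanity-checked from the mean transfer time, since 33554432 bytes is exactly 32 MiB. The sketch below assumes throughput is simply payload size divided by mean time (the function name is hypothetical); the bot may instead average per-run throughput, so small rounding differences on other rows would not be surprising.

```python
TRANSFER_BYTES = 33_554_432  # from the report header; equals 32 MiB

def mib_per_s(mean_ms: float, size_bytes: int = TRANSFER_BYTES) -> float:
    """Throughput in MiB/s for a transfer that took mean_ms milliseconds."""
    mib = size_bytes / (1024 * 1024)
    return mib / (mean_ms / 1000.0)

# Mean times (ms) taken from the table above.
for name, mean_ms in [("neqo-google-cubic", 772.6), ("neqo-s2n-cubic", 221.9)]:
    print(f"{name}: {mib_per_s(mean_ms):.1f} MiB/s")
```

Both rows reproduce the reported 41.4 and 144.2 MiB/s.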

All results

Transfer of 33554432 bytes over loopback, min. 100 runs. All unit-less numbers are in milliseconds.

| Client vs. server (params) | Mean ± σ | Min | Max | MiB/s ± σ | Δ main (ms) | Δ main (%) |
| --- | --- | --- | --- | --- | --- | --- |
| google-google-nopacing | 463.4 ± 4.0 | 457.1 | 472.7 | 69.1 ± 8.0 | | |
| google-neqo-cubic | 275.1 ± 4.1 | 266.3 | 285.4 | 116.3 ± 7.8 | 0.1 | 0.0% |
| msquic-msquic-nopacing | 181.8 ± 58.2 | 137.6 | 457.3 | 176.0 ± 0.5 | | |
| msquic-neqo-cubic | 208.1 ± 57.5 | 147.4 | 433.2 | 153.8 ± 0.6 | 1.1 | 0.6% |
| neqo-google-cubic | 772.6 ± 4.1 | 765.6 | 785.5 | 41.4 ± 7.8 | 💔 1.3 | 0.2% |
| neqo-msquic-cubic | 159.8 ± 4.8 | 150.8 | 168.5 | 200.3 ± 6.7 | -0.3 | -0.2% |
| neqo-neqo-cubic | 96.7 ± 5.1 | 87.3 | 106.5 | 331.0 ± 6.3 | 1.3 | 1.3% |
| neqo-neqo-cubic-nopacing | 96.3 ± 4.9 | 85.7 | 106.2 | 332.1 ± 6.5 | 0.9 | 0.9% |
| neqo-neqo-newreno | 95.5 ± 4.5 | 89.1 | 107.1 | 334.9 ± 7.1 | -0.5 | -0.5% |
| neqo-neqo-newreno-nopacing | 96.5 ± 4.2 | 87.8 | 104.6 | 331.5 ± 7.6 | 💚 -1.3 | -1.4% |
| neqo-quiche-cubic | 191.4 ± 4.0 | 186.0 | 208.7 | 167.2 ± 8.0 | 0.8 | 0.4% |
| neqo-s2n-cubic | 221.9 ± 4.8 | 212.5 | 231.6 | 144.2 ± 6.7 | 💔 3.6 | 1.7% |
| quiche-neqo-cubic | 157.1 ± 8.7 | 145.3 | 201.7 | 203.7 ± 3.7 | -0.9 | -0.6% |
| quiche-quiche-nopacing | 145.8 ± 4.5 | 138.2 | 159.8 | 219.4 ± 7.1 | | |
| s2n-neqo-cubic | 175.4 ± 5.1 | 164.7 | 190.7 | 182.4 ± 6.3 | 0.5 | 0.3% |
| s2n-s2n-nopacing | 246.5 ± 23.2 | 231.5 | 349.8 | 129.8 ± 1.4 | | |

Download data for profiler.firefox.com or download performance comparison data.

@github-actions
Contributor

github-actions Bot commented Mar 4, 2026

Benchmark results

Significant performance differences relative to 6a84421.

decode 1048576 bytes, mask ff: 💚 Performance has improved by -2.9816%.
       time:   [1.1235 ms 1.1270 ms 1.1316 ms]
       change: [-3.8578% -2.9816% -2.3159] (p = 0.00 < 0.01)
       Performance has improved.
Found 108 outliers among 1000 measurements (10.80%)
14 (1.40%) low severe
56 (5.60%) low mild
2 (0.20%) high mild
36 (3.60%) high severe
coalesce_acked_from_zero 1000+1 entries: 💔 Performance has regressed by +2.4517%.
       time:   [97.448 ns 97.651 ns 97.896 ns]
       change: [+1.8323% +2.4517% +2.9603] (p = 0.00 < 0.01)
       Performance has regressed.
Found 21 outliers among 1000 measurements (2.10%)
4 (0.40%) high mild
17 (1.70%) high severe
All results
transfer/1-conn/1-100mb-resp (aka. Download)/mtu-1504: No change in performance detected.
       time:   [205.08 ms 205.29 ms 205.55 ms]
       thrpt:  [486.50 MiB/s 487.11 MiB/s 487.63 MiB/s]
change:
       time:   [-0.5469% -0.0096% +0.3899] (p = 0.91 > 0.01)
       thrpt:  [-0.3884% +0.0096% +0.5499]
       No change in performance detected.
Found 11 outliers among 500 measurements (2.20%)
5 (1.00%) high mild
6 (1.20%) high severe
transfer/1-conn/10_000-parallel-1b-resp (aka. RPS)/mtu-1504: No change in performance detected.
       time:   [282.77 ms 284.07 ms 285.39 ms]
       thrpt:  [35.040 Kelem/s 35.202 Kelem/s 35.365 Kelem/s]
change:
       time:   [-0.8402% +0.2483% +1.2901] (p = 0.55 > 0.01)
       thrpt:  [-1.2736% -0.2477% +0.8474]
       No change in performance detected.
Found 4 outliers among 500 measurements (0.80%)
4 (0.80%) high mild
transfer/1-conn/1-1b-resp (aka. HPS)/mtu-1504: No change in performance detected.
       time:   [38.697 ms 38.767 ms 38.842 ms]
       thrpt:  [25.746   B/s 25.795   B/s 25.842   B/s]
change:
       time:   [-0.6880% +0.0825% +0.7560] (p = 0.79 > 0.01)
       thrpt:  [-0.7503% -0.0825% +0.6928]
       No change in performance detected.
Found 50 outliers among 500 measurements (10.00%)
35 (7.00%) high mild
15 (3.00%) high severe
transfer/1-conn/1-100mb-req (aka. Upload)/mtu-1504: No change in performance detected.
       time:   [205.95 ms 206.21 ms 206.49 ms]
       thrpt:  [484.28 MiB/s 484.94 MiB/s 485.55 MiB/s]
change:
       time:   [-0.6683% -0.2787% +0.0435] (p = 0.02 > 0.01)
       thrpt:  [-0.0435% +0.2794% +0.6728]
       No change in performance detected.
Found 11 outliers among 500 measurements (2.20%)
7 (1.40%) high mild
4 (0.80%) high severe
decode 4096 bytes, mask ff: No change in performance detected.
       time:   [4.5000 µs 4.5093 µs 4.5195 µs]
       change: [-0.2014% +0.3905% +0.9139] (p = 0.24 > 0.01)
       No change in performance detected.
Found 286 outliers among 1000 measurements (28.60%)
62 (6.20%) low severe
134 (13.40%) low mild
16 (1.60%) high mild
74 (7.40%) high severe
decode 1048576 bytes, mask ff: 💚 Performance has improved by -2.9816%.
       time:   [1.1235 ms 1.1270 ms 1.1316 ms]
       change: [-3.8578% -2.9816% -2.3159] (p = 0.00 < 0.01)
       Performance has improved.
Found 108 outliers among 1000 measurements (10.80%)
14 (1.40%) low severe
56 (5.60%) low mild
2 (0.20%) high mild
36 (3.60%) high severe
decode 4096 bytes, mask 7f: No change in performance detected.
       time:   [5.7839 µs 5.7938 µs 5.8045 µs]
       change: [-1.9747% -0.6698% -0.0011] (p = 0.02 > 0.01)
       No change in performance detected.
Found 160 outliers among 1000 measurements (16.00%)
34 (3.40%) low severe
35 (3.50%) low mild
20 (2.00%) high mild
71 (7.10%) high severe
decode 1048576 bytes, mask 7f: Change within noise threshold.
       time:   [1.5029 ms 1.5061 ms 1.5097 ms]
       change: [+0.9570% +1.2533% +1.5946] (p = 0.00 < 0.01)
       Change within noise threshold.
Found 48 outliers among 1000 measurements (4.80%)
2 (0.20%) high mild
46 (4.60%) high severe
decode 4096 bytes, mask 3f: No change in performance detected.
       time:   [5.5322 µs 5.5419 µs 5.5526 µs]
       change: [-0.7270% -0.2134% +0.2451] (p = 0.12 > 0.01)
       No change in performance detected.
Found 149 outliers among 1000 measurements (14.90%)
32 (3.20%) low severe
31 (3.10%) low mild
14 (1.40%) high mild
72 (7.20%) high severe
decode 1048576 bytes, mask 3f: No change in performance detected.
       time:   [1.4147 ms 1.4200 ms 1.4277 ms]
       change: [-0.3108% +0.1470% +0.7135] (p = 0.84 > 0.01)
       No change in performance detected.
Found 56 outliers among 1000 measurements (5.60%)
5 (0.50%) high mild
51 (5.10%) high severe
streams/simulated/1-streams/each-1000-bytes: Change within noise threshold.
       time:   [129.68 ms 129.68 ms 129.68 ms]
       thrpt:  [7.5304 KiB/s 7.5305 KiB/s 7.5306 KiB/s]
change:
       time:   [-0.0097% -0.0060% -0.0023] (p = 0.00 < 0.01)
       thrpt:  [+0.0023% +0.0060% +0.0097]
       Change within noise threshold.
Found 9 outliers among 1000 measurements (0.90%)
1 (0.10%) low mild
8 (0.80%) high mild
streams/simulated/1000-streams/each-1-bytes: No change in performance detected.
       time:   [2.5363 s 2.5365 s 2.5367 s]
       thrpt:  [394.22   B/s 394.25   B/s 394.27   B/s]
change:
       time:   [-0.0186% -0.0011% +0.0165] (p = 0.87 > 0.01)
       thrpt:  [-0.0165% +0.0011% +0.0186]
       No change in performance detected.
streams/simulated/1000-streams/each-1000-bytes: No change in performance detected.
       time:   [6.5926 s 6.5980 s 6.6040 s]
       thrpt:  [147.87 KiB/s 148.01 KiB/s 148.13 KiB/s]
change:
       time:   [-0.0906% +0.1103% +0.2848] (p = 0.24 > 0.01)
       thrpt:  [-0.2840% -0.1102% +0.0906]
       No change in performance detected.
Found 28 outliers among 1000 measurements (2.80%)
28 (2.80%) high severe
streams/walltime/1-streams/each-1000-bytes: No change in performance detected.
       time:   [590.73 µs 591.66 µs 592.71 µs]
       change: [-1.2397% -0.5003% +0.1318] (p = 0.03 > 0.01)
       No change in performance detected.
Found 83 outliers among 500 measurements (16.60%)
70 (14.00%) high mild
13 (2.60%) high severe
streams/walltime/1000-streams/each-1-bytes: Change within noise threshold.
       time:   [12.334 ms 12.345 ms 12.357 ms]
       change: [+0.1549% +0.3833% +0.6070] (p = 0.00 < 0.01)
       Change within noise threshold.
Found 10 outliers among 500 measurements (2.00%)
6 (1.20%) high mild
4 (0.80%) high severe
streams/walltime/1000-streams/each-1000-bytes: Change within noise threshold.
       time:   [44.991 ms 45.037 ms 45.088 ms]
       change: [-0.3678% -0.2035% -0.0337] (p = 0.00 < 0.01)
       Change within noise threshold.
Found 28 outliers among 500 measurements (5.60%)
1 (0.20%) low mild
19 (3.80%) high mild
8 (1.60%) high severe
coalesce_acked_from_zero 1+1 entries: Change within noise threshold.
       time:   [91.291 ns 91.579 ns 91.966 ns]
       change: [-1.1685% -0.4940% +0.1547] (p = 0.00 < 0.01)
       Change within noise threshold.
Found 86 outliers among 1000 measurements (8.60%)
66 (6.60%) low mild
4 (0.40%) high mild
16 (1.60%) high severe
coalesce_acked_from_zero 3+1 entries: No change in performance detected.
       time:   [109.89 ns 110.43 ns 111.11 ns]
       change: [-0.4207% +0.0994% +0.6821] (p = 0.84 > 0.01)
       No change in performance detected.
Found 54 outliers among 1000 measurements (5.40%)
22 (2.20%) low mild
6 (0.60%) high mild
26 (2.60%) high severe
coalesce_acked_from_zero 10+1 entries: No change in performance detected.
       time:   [109.44 ns 110.01 ns 110.78 ns]
       change: [-2.7179% -0.6946% +0.4027] (p = 0.12 > 0.01)
       No change in performance detected.
Found 30 outliers among 1000 measurements (3.00%)
13 (1.30%) high mild
17 (1.70%) high severe
coalesce_acked_from_zero 1000+1 entries: 💔 Performance has regressed by +2.4517%.
       time:   [97.448 ns 97.651 ns 97.896 ns]
       change: [+1.8323% +2.4517% +2.9603] (p = 0.00 < 0.01)
       Performance has regressed.
Found 21 outliers among 1000 measurements (2.10%)
4 (0.40%) high mild
17 (1.70%) high severe
RxStreamOrderer::inbound_frame(): No change in performance detected.
       time:   [108.20 ms 108.25 ms 108.33 ms]
       change: [-0.4387% -0.0805% +0.1184] (p = 0.35 > 0.01)
       No change in performance detected.
Found 127 outliers among 1000 measurements (12.70%)
59 (5.90%) low mild
50 (5.00%) high mild
18 (1.80%) high severe
sent::Packets::take_ranges: No change in performance detected.
       time:   [4.3532 µs 4.3895 µs 4.4829 µs]
       change: [-2.1677% +0.8720% +4.3283] (p = 0.95 > 0.01)
       No change in performance detected.
Found 81 outliers among 1000 measurements (8.10%)
60 (6.00%) high mild
21 (2.10%) high severe
transfer/simulated/pacing-false/fixed-seed
       time:   [23.941 s 23.941 s 23.941 s]
       thrpt:  [171.09 KiB/s 171.09 KiB/s 171.09 KiB/s]
transfer/simulated/pacing-true/fixed-seed
       time:   [23.676 s 23.676 s 23.676 s]
       thrpt:  [173.01 KiB/s 173.01 KiB/s 173.01 KiB/s]
transfer/walltime/pacing-false/fixed-seed
       time:   [23.259 ms 23.277 ms 23.300 ms]
Found 5 outliers among 500 measurements (1.00%)
1 (0.20%) high mild
4 (0.80%) high severe
transfer/walltime/pacing-true/fixed-seed
       time:   [23.357 ms 23.373 ms 23.396 ms]
Found 17 outliers among 500 measurements (3.40%)
1 (0.20%) low mild
10 (2.00%) high mild
6 (1.20%) high severe
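
The four verdicts in the results above ("No change", "Change within noise threshold", "improved", "regressed") appear to follow from the p-value and the bounds of the change interval. The sketch below is one way to reproduce them. The 0.01 significance level is read off the "(p = ... < 0.01)" comparisons in the output, while the 1% noise threshold is an assumption that happens to be consistent with every row shown; the function name is hypothetical.

```python
def classify(lower: float, upper: float, p: float,
             significance: float = 0.01, noise: float = 1.0) -> str:
    """Classify a change from its interval bounds (in percent) and p-value."""
    if p >= significance:
        # Not statistically significant at the chosen level.
        return "No change in performance detected."
    if upper < -noise:
        # Whole interval is below the negative noise threshold: faster.
        return "Performance has improved."
    if lower > noise:
        # Whole interval is above the positive noise threshold: slower.
        return "Performance has regressed."
    # Significant, but the interval overlaps the noise band.
    return "Change within noise threshold."

# Rows taken from the report above.
print(classify(-3.8578, -2.3159, 0.00))  # decode 1048576 bytes, mask ff
print(classify(+1.8323, +2.9603, 0.00))  # coalesce_acked_from_zero 1000+1
print(classify(+0.9570, +1.5946, 0.00))  # decode 1048576 bytes, mask 7f
print(classify(-0.6683, +0.0435, 0.02))  # Upload
```

Under these assumptions, a statistically significant change is only flagged as improved or regressed when the entire interval clears the noise band, which is why the mask 7f result (lower bound +0.9570%) is reported as within noise despite p = 0.00.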

Download data for profiler.firefox.com or download performance comparison data.
