
fix(tests): stabilise e2e suite on macOS CI runners #50

Merged
mickvandijke merged 1 commit into main from
fix/e2e-macos-quote-flake
Apr 21, 2026

@grumbach
Contributor

Context

The E2E suite has been failing repeatedly on macos-latest with:

test_attack_corrupted_public_key panicked:
quotes should succeed: InsufficientPeers(
  "Got 6 quotes, need 7. Failures: [<peer>: timeout: Timeout waiting for quote from <peer>]"
)

The same root cause surfaces in test_chunk_exists, test_chunk_put_duplicate_skips_payment, test_payment_required_enforcement and a few others. The count is always 6 of 7 on macOS; Linux runners pass consistently.

Why only macOS

This looked odd at first: "it's slow networking" doesn't explain a platform split. It turns out to be a test-infrastructure problem, not a client bug:

1. Zero redundancy in the testnet. CLOSE_GROUP_SIZE is 7, and DEFAULT_NODE_COUNT was CLOSE_GROUP_SIZE + 1 = 8. Quote collection gates on quotes.len() >= CLOSE_GROUP_SIZE, i.e. all 7 of the remote peers (the client is the 8th) must answer. One slow peer = fail. The 2x over-query is a no-op because only 8 nodes exist, so there are no extra peers to ask.

2. macOS GitHub runners have roughly half the CPU throughput of Linux runners. They're nested-virt on Anka/VMware. Spawning 8 QUIC nodes in-process and handshaking to all of them simultaneously saturates the available CPU during the handshake burst. On Linux, all 8 handshakes finish comfortably under the 10s per-peer timeout. On macOS, one or two routinely don't: the test sees 6 successes and one timeout failure, and has no way to reach quorum.

Put together: the test had no slack at all and was running on the one platform that couldn't meet the timing. The Merkle suite was unaffected because it had already bumped its own config to 120s.
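
To make the zero-slack argument concrete, here is a minimal model of the quorum gate. It is a sketch under stated assumptions, not the client's actual code: only CLOSE_GROUP_SIZE and the quotes.len() >= CLOSE_GROUP_SIZE condition come from this PR; QuoteOutcome, reaches_quorum and simulate are hypothetical names, and the model assumes one node in the testnet hosts the client.

```rust
// Illustrative model only. CLOSE_GROUP_SIZE matches the PR description;
// every other name here is hypothetical.
const CLOSE_GROUP_SIZE: usize = 7;

/// Outcome of asking a single peer for a quote.
#[derive(Clone, Copy, Debug, PartialEq)]
enum QuoteOutcome {
    Ok,
    Timeout,
}

/// Mirrors the gate described above: the collection succeeds only if at
/// least CLOSE_GROUP_SIZE peers returned a quote.
fn reaches_quorum(outcomes: &[QuoteOutcome]) -> bool {
    let ok = outcomes
        .iter()
        .filter(|o| **o == QuoteOutcome::Ok)
        .count();
    ok >= CLOSE_GROUP_SIZE
}

/// Builds the per-peer outcomes for a testnet of `node_count` nodes, assuming
/// one node hosts the client and `slow` of the remaining peers time out.
fn simulate(node_count: usize, slow: usize) -> Vec<QuoteOutcome> {
    let remote = node_count.saturating_sub(1);
    (0..remote)
        .map(|i| if i < slow { QuoteOutcome::Timeout } else { QuoteOutcome::Ok })
        .collect()
}

fn main() {
    // Old setup: 8 nodes and one slow peer leaves 6 quotes, below quorum.
    println!("8 nodes, 1 slow peer: quorum = {}", reaches_quorum(&simulate(8, 1)));
    // Larger testnet: the same single slow peer no longer matters.
    println!("14 nodes, 1 slow peer: quorum = {}", reaches_quorum(&simulate(14, 1)));
}
```

In this model the 8-node case fails as soon as any single peer times out, which matches the "Got 6 quotes, need 7" error above.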

Fix

Two changes, both test-only — prod ClientConfig::default() is unchanged:

  1. DEFAULT_NODE_COUNT from CLOSE_GROUP_SIZE + 1 (8) to CLOSE_GROUP_SIZE * 2 (14). One full extra group of slack. With the client on one node, that leaves 13 remote peers for a quorum of 7, so up to 6 peers can be slow and the test still reaches quorum. Extra cost: ~1.2 s of spawn delay per test setup.

  2. New test_client_config() helper in tests/support/mod.rs with quote_timeout_secs = store_timeout_secs = 60. All e2e_* files except e2e_merkle.rs (which already had its own 120 s config) switch from ClientConfig::default() to this helper. 60 s is deliberately conservative — in the happy path everything completes in ~1 s, the extra budget only shows up on flakes.

Both constants carry doc comments explaining why they're different from production, since the numbers look surprising at a glance.
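
For readers without the diff open, the two changes might look roughly like the sketch below. This is an assumption-laden illustration, not the merged code: the ClientConfig struct here is a stand-in that models only the two timeout fields named above, the u64 field types and the TEST_TIMEOUT_SECS constant name are guesses, and the 10 s production default is taken from the commit message.

```rust
// Hypothetical stand-in for the real ClientConfig. Only the two timeout
// fields named in this PR are modelled; the u64 types are an assumption.
#[derive(Debug, Clone, PartialEq)]
pub struct ClientConfig {
    pub quote_timeout_secs: u64,
    pub store_timeout_secs: u64,
}

impl Default for ClientConfig {
    fn default() -> Self {
        // Production default of 10 s, as stated in the commit message.
        Self {
            quote_timeout_secs: 10,
            store_timeout_secs: 10,
        }
    }
}

pub const CLOSE_GROUP_SIZE: usize = 7;

/// Number of nodes in the loopback testnet.
///
/// Deliberately twice the close group: quote collection needs
/// CLOSE_GROUP_SIZE responses, so extra nodes give slow peers room to fail
/// without failing the whole test.
pub const DEFAULT_NODE_COUNT: usize = CLOSE_GROUP_SIZE * 2;

/// Timeout used by the e2e tests (hypothetical constant name).
///
/// Much larger than production because the whole network runs inside one
/// CI VM; the extra budget is only consumed when a run is slow.
pub const TEST_TIMEOUT_SECS: u64 = 60;

/// Test-only client configuration with generous quote/store timeouts.
pub fn test_client_config() -> ClientConfig {
    ClientConfig {
        quote_timeout_secs: TEST_TIMEOUT_SECS,
        store_timeout_secs: TEST_TIMEOUT_SECS,
    }
}

fn main() {
    let prod = ClientConfig::default();
    let test = test_client_config();
    println!("nodes: {DEFAULT_NODE_COUNT}");
    println!("prod: {prod:?}");
    println!("test: {test:?}");
}
```

If the real struct has more fields, a helper like this would presumably fill the rest with `..ClientConfig::default()` so that only the timeouts diverge from production.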

Test plan

  • cargo fmt --all --check: clean
  • cargo clippy --all-targets --all-features -- -D warnings: clean
  • cargo test -p ant-core --test e2e_chunk test_chunk_put_get_round_trip: passes locally (macOS)
  • cargo test -p ant-core --test e2e_security test_attack_corrupted_public_key: passes locally — this is the one that was flaking
  • cargo test -p ant-core --test e2e_payment test_payment_required_enforcement: passes locally — the other flake
  • Full CI matrix will run on this PR; expect macOS E2E to finally go green consistently

The recurring `InsufficientPeers("Got 6 quotes, need 7")` flake on
macos-latest runners was a test-infrastructure problem, not a client
bug. Two changes close it:

1. Bump `DEFAULT_NODE_COUNT` from `CLOSE_GROUP_SIZE + 1` (8) to
   `CLOSE_GROUP_SIZE * 2` (14). The old value left zero slack - quote
   collection needs `CLOSE_GROUP_SIZE` peers to respond, so a single
   slow peer failed the whole test. Doubling the group gives a full
   extra group of redundancy.

2. Add `test_client_config()` helper with 60s quote/store timeouts
   (prod default is 10s). E2E tests run a full P2P network in one
   CI VM; macOS GitHub runners are nested-virt with roughly half the
   CPU throughput of Linux runners, so the 8-node QUIC handshake
   burst routinely took >10s per peer under load. Linux runners
   finished in time; macOS did not. Prod defaults stay at 10s -
   this only affects the loopback MiniTestnet.

The merkle suite already used 120s for the same reason; this brings
the rest of the e2e suite into line at 60s.
@mickvandijke mickvandijke merged commit 0b104d1 into main Apr 21, 2026
12 checks passed
