fix(tests): stabilise e2e suite on macOS CI runners by grumbach · Pull Request #50 · WithAutonomi/ant-client

grumbach · 2026-04-21T03:32:30Z

Context

The E2E suite has been recurrently failing on macos-latest with:

test_attack_corrupted_public_key panicked:
quotes should succeed: InsufficientPeers(
  "Got 6 quotes, need 7. Failures: [<peer>: timeout: Timeout waiting for quote from <peer>]"
)

The same root cause surfaces in test_chunk_exists, test_chunk_put_duplicate_skips_payment, test_payment_required_enforcement and a few others. The count is always 6 of 7 on macOS; Linux runners pass consistently.

Why only macOS

This looked odd at first: "it's slow networking" doesn't explain a platform split. It turns out to be a test-infrastructure problem, not a client bug:

1. Zero redundancy in the testnet. CLOSE_GROUP_SIZE is 7, and DEFAULT_NODE_COUNT was CLOSE_GROUP_SIZE + 1 = 8. Quote collection gates on quotes.len() >= CLOSE_GROUP_SIZE, i.e. all 7 of the remote peers (the client is the 8th) must answer. One slow peer = fail. The 2x over-query is a no-op because only 8 nodes exist.

2. macOS GitHub runners have roughly half the CPU throughput of Linux runners. They're nested-virt on Anka/VMware. Spawning 8 QUIC nodes in-process and handshaking to all of them simultaneously saturates the available CPU during the handshake burst. On Linux, all 8 handshakes finish comfortably under the 10s per-peer timeout. On macOS, one or two routinely don't, so the test sees 6 successes, one timeout failure, and no way to reach quorum.

Put together: the test had no slack at all and was running on the one platform that couldn't meet the timing. Merkle was unaffected because it already bumped its own config to 120s.
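The quorum arithmetic above can be sketched in a few lines. This is a simplified model, not the client's actual quote-collection code: `peers_queried`, `reaches_quorum`, and `OVER_QUERY_FACTOR` are hypothetical names, and the only facts taken from the description are `CLOSE_GROUP_SIZE = 7`, the 2x over-query, and the `quotes.len() >= CLOSE_GROUP_SIZE` gate.

```rust
/// Quotes required before the client proceeds (value taken from the PR text).
const CLOSE_GROUP_SIZE: usize = 7;

/// Hypothetical name for the "2x over-query" mentioned above.
const OVER_QUERY_FACTOR: usize = 2;

/// The client wants to ask 2x the close group for quotes, but it can never
/// ask more peers than the testnet actually contains.
fn peers_queried(remote_peers: usize) -> usize {
    (CLOSE_GROUP_SIZE * OVER_QUERY_FACTOR).min(remote_peers)
}

/// Models the gate `quotes.len() >= CLOSE_GROUP_SIZE`, assuming every queried
/// peer either returns a quote or hits the per-peer timeout.
fn reaches_quorum(remote_peers: usize, timed_out: usize) -> bool {
    let quotes = peers_queried(remote_peers).saturating_sub(timed_out);
    quotes >= CLOSE_GROUP_SIZE
}
```

Under this model, `peers_queried(7)` is 7, so the over-query adds nothing when only 7 remote peers exist, and `reaches_quorum(7, 1)` is false, which matches the observed "Got 6 quotes, need 7". Any spare peer beyond the close group lets a single timeout be absorbed.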

Fix

Two changes, both test-only — prod ClientConfig::default() is unchanged:

1. DEFAULT_NODE_COUNT from CLOSE_GROUP_SIZE + 1 (8) to CLOSE_GROUP_SIZE * 2 (14). One full extra group of slack. Up to 7 peers can be slow and the test still reaches quorum. Extra cost: ~1.2 s of spawn delay per test setup.

2. New test_client_config() helper in tests/support/mod.rs with quote_timeout_secs = store_timeout_secs = 60. All e2e_* files except e2e_merkle.rs (which already had its own 120 s config) switch from ClientConfig::default() to this helper. 60 s is deliberately conservative: in the happy path everything completes in ~1 s, and the extra budget only shows up on flakes.

Both constants carry doc comments explaining why they're different from production, since the numbers look surprising at a glance.
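For readers without the diff open, the two changes might look roughly like the sketch below. It is an illustration under assumptions, not the merged code: the `ClientConfig` here is a stand-in that models only the two timeout fields named in this description, `TEST_TIMEOUT_SECS` is a hypothetical name, and the 10 s default is the production value quoted in the commit message.

```rust
/// Close group size stated in the PR description.
pub const CLOSE_GROUP_SIZE: usize = 7;

/// Test-only node count: two full close groups, so quote collection has
/// spare peers to fall back on when some are slow.
pub const DEFAULT_NODE_COUNT: usize = CLOSE_GROUP_SIZE * 2;

/// Hypothetical constant: the 60 s budget used for both timeouts in tests.
pub const TEST_TIMEOUT_SECS: u64 = 60;

/// Stand-in for the real `ClientConfig`; only the two fields named in the
/// PR are modelled here. The real struct may well have more.
#[derive(Debug, Clone, PartialEq)]
pub struct ClientConfig {
    pub quote_timeout_secs: u64,
    pub store_timeout_secs: u64,
}

impl Default for ClientConfig {
    fn default() -> Self {
        // 10 s is the production default quoted in the commit message.
        Self {
            quote_timeout_secs: 10,
            store_timeout_secs: 10,
        }
    }
}

/// Test-only configuration with generous timeouts for slow CI runners.
/// Production code keeps using `ClientConfig::default()`.
pub fn test_client_config() -> ClientConfig {
    ClientConfig {
        quote_timeout_secs: TEST_TIMEOUT_SECS,
        store_timeout_secs: TEST_TIMEOUT_SECS,
    }
}
```

Keeping the override in one helper means each e2e file changes a single call site, and the production default stays untouched.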

Test plan

cargo fmt --all --check: clean
cargo clippy --all-targets --all-features -- -D warnings: clean
cargo test -p ant-core --test e2e_chunk test_chunk_put_get_round_trip: passes locally (macOS)
cargo test -p ant-core --test e2e_security test_attack_corrupted_public_key: passes locally — this is the one that was flaking
cargo test -p ant-core --test e2e_payment test_payment_required_enforcement: passes locally — the other flake
Full CI matrix will run on this PR; expect macOS E2E to finally go green consistently

The recurring `InsufficientPeers("Got 6 quotes, need 7")` flake on macos-latest runners was a test-infrastructure problem, not a client bug. Two changes close it:

1. Bump `DEFAULT_NODE_COUNT` from `CLOSE_GROUP_SIZE + 1` (8) to `CLOSE_GROUP_SIZE * 2` (14). The old value left zero slack - quote collection needs `CLOSE_GROUP_SIZE` peers to respond, so a single slow peer failed the whole test. Doubling the group gives a full extra group of redundancy.

2. Add `test_client_config()` helper with 60s quote/store timeouts (prod default is 10s). E2E tests run a full P2P network in one CI VM; macOS GitHub runners are nested-virt and roughly half the CPU throughput of Linux runners, so the 8-node QUIC handshake burst routinely took >10s per peer under load. Linux runners finished in time; macOS did not.

Prod defaults stay at 10s - this only affects the loopback MiniTestnet. The merkle suite already used 120s for the same reason; this brings the rest of the e2e suite into line at 60s.

mickvandijke approved these changes Apr 21, 2026

mickvandijke merged commit 0b104d1 into main Apr 21, 2026
12 checks passed

mickvandijke merged 1 commit into main from fix/e2e-macos-quote-flake
