Skip to content

test: add p2p end-to-end tests#464

Open
therustmonk wants to merge 1 commit into
mainfrom
kd-may29-p2p
Open

test: add p2p end-to-end tests#464
therustmonk wants to merge 1 commit into
mainfrom
kd-may29-p2p

Conversation

@therustmonk
Copy link
Copy Markdown
Collaborator

The networking layer had almost no automated coverage. The existing frostp2p_integ_test uses a different path (Node::new_server + custom behaviour for FROST); the normal client path and the relay circuit hop were untested.

Two real-socket integration tests for the p2p layer:

  • direct_tcp_connection.rs — two real Pluto nodes connect directly over TCP and run the full production PlutoBehaviour: connect both ways, exchange identify (Received), and ping.
  • relay_circuit.rs — two clients connect only through an in-process relay (reservation + relayed dial), never directly.

Both log swarm errors at warn instead of swallowing them, so failures show the real cause, not a bare timeout.

@therustmonk
Copy link
Copy Markdown
Collaborator Author

@claude review

@claude
Copy link
Copy Markdown

claude Bot commented Jun 1, 2026

Claude finished @therustmonk's task in 4m 4s —— View job


Claude PR Review

  • Read PR diff and changed files
  • Cross-check with existing p2p test patterns (frostp2p_integ_test, Node::new / Node::new_server)
  • Analyze for correctness, flakiness, style, and Pluto conventions
  • Post review feedback

Overall: Solid additions. Both tests drive the full production PlutoBehaviour over real sockets, which is genuine new coverage relative to dkg::frostp2p_integ_test (which uses a custom test behaviour). The use of identify::Event::Received (not Sent) and ping::Event with result.is_ok() are the right correctness anchors, and note_swarm_event logging at warn is a real ergonomic win over silent timeouts. I did not run cargo test / cargo clippy here — cargo is not in my allowlist for this run — so the comments below are from code review only.


Findings

1. relay_circuit.rs only asserts ConnectionEstablished — not that the protocol stack actually works over the circuit

crates/p2p/tests/relay_circuit.rs:198-221 confirms that both ends see a ConnectionEstablished with the other peer's id, but never checks that identify / ping actually flow over the relayed connection. Compared to the direct test (which checks all three of connect / identify-Received / ping-OK), this leaves a gap: a circuit dial can succeed and then the upper protocols can silently fail to negotiate over the relay transport.

Suggest adding an Observed { connected, identified, pinged } pattern symmetric to direct_tcp_connection.rs:40-51, applied to both listener and dialer. That would actually exercise "the relay reservation and circuit hop" end-to-end rather than just "reservation + dial accepted". Fix this →

2. Helper duplication between the two test files

init_logs (direct_tcp_connection.rs:56-62relay_circuit.rs:43-49) and note_swarm_event (direct_tcp_connection.rs:67-86relay_circuit.rs:54-73) are near-identical — the relay version only adds a label field. Rust integration tests support a shared tests/common/mod.rs (added via mod common; in each test file); extracting these would prevent the two from drifting apart as more tests get added. Not blocking.

3. Spawned relay task is fire-and-forget

crates/p2p/tests/relay_circuit.rs:138-143 spawns the relay's event loop and never awaits it. relay_handle.abort() on :232 is also only reached on the happy path — if any of the intermediate timeout(...).await.context(...)? fail, the handle is dropped without abort. Since the tokio runtime tears down at test exit this is benign, but a JoinHandle panic inside the spawned loop would currently be invisible. Minor; an alternative is tokio::select!-ing the relay's events into the main drive loop instead of tokio::spawn, so panics propagate.

4. relay::Config literal is exhaustive — drift risk vs explicitness

relay_circuit.rs:88-98 enumerates every field. This is safe today, but means any new field added upstream in libp2p will need to be set manually here; switching to relay::Config { max_reservations: 16, ..Default::default() } would reduce churn. (Either choice is defensible — flagging as a small style call.)

5. Test seeds 1/2 reused across both files

generate_insecure_k1_key(1) / (2) appear in both files. Because Cargo compiles each tests/*.rs as a separate test binary, this is not a runtime collision today, but it does make grepping for "which test uses which keypair" a bit ambiguous. Could disambiguate (e.g., seeds 10/11 for direct, 20/21/22 for circuit). Stylistic only.

6. TEST_TIMEOUT differs between files (20s vs 30s) and stacks on itself

relay_circuit.rs calls timeout(TEST_TIMEOUT, ...) three times sequentially (:122, :174, :198), so the worst-case wall clock for that test is up to ~90s before failure. Loopback TCP should be far faster than this in practice, but on a heavily-loaded CI runner the cumulative budget might still be tight if multiple p2p tests run concurrently. Probably fine; flagging for awareness.

7. Direct TCP test is asymmetric in setup but symmetric in assertion — that's good, but worth a note

direct_tcp_connection.rs:196-199: only node A listens, only node B dials. The symmetry of the Observed checks then proves bidirectional protocol activity over that single connection — which is the realistic shape and the right test. (Not an issue; calling out that I checked.)

8. Nit — comment accuracy

direct_tcp_connection.rs:5-7 says "the composed PlutoBehaviour with its connection logger, gater, identify, ping, autonat and QUIC-upgrade sub-behaviours". quic_enabled defaults to false in the builder (behaviours/pluto.rs:109), so the QUIC-upgrade sub-behaviour is wired but inert in this test. Not wrong, but a reader might expect QUIC-upgrade activity. Could clarify or add a NodeType::QUIC variant as a follow-up.


Things checked and look correct

  • P2PConfig::default() has empty tcp_addrs, so the explicit listen_on("/ip4/127.0.0.1/tcp/0") is the only listen address — no surprise extra bind. (config.rs:60-80, p2p.rs:339-379)
  • filter_private_addrs: false is essential here; loopback would otherwise be filtered out of advertised addresses.
  • relay_node.add_external_address(relay_addr.clone()) (relay_circuit.rs:136) is required, otherwise the listener gets NoAddressesInReservation. The inline comment captures this — good.
  • Default gater is open (behaviours/pluto.rs:189), so unknown peers connect; tests don't need to register cluster peers for connectivity (only for the metrics/peer-store distinction).
  • peer_id == dialer_peer / peer_id == listener_peer checks correctly distinguish circuit-hop peer-ids from the relay peer-id.
  • The circuit multiaddr construction relay/p2p/<relay>/p2p-circuit/p2p/<listener> (relay_circuit.rs:146-188) is the right shape for a relayed dial.
    • branch: kd-may29-p2p

Copy link
Copy Markdown
Collaborator

@emlautarom1 emlautarom1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would remove the tracing setup and logging: in tests we should avoid all logs to not clutter the output. If a test fails better to do so with a clear panic/assert that will actually fail the test suite. It's hard to know what to do with a warning (should it fail? does it need attention? is this expected?)

Comment on lines +34 to +38
/// Returns true if `addr` is a relayed (`/p2p-circuit`) address.
fn is_circuit_addr(addr: &Multiaddr) -> bool {
addr.iter()
.any(|proto| matches!(proto, Protocol::P2pCircuit))
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: duplicate of

pub fn is_relay_addr(addr: &Multiaddr) -> bool {
addr.iter().any(|p| matches!(p, MaProtocol::P2pCircuit))
}

Comment on lines +43 to +49
fn init_logs() {
let filter = EnvFilter::try_from_default_env().unwrap_or_else(|_| EnvFilter::new("warn"));
let _ = tracing_subscriber::fmt()
.with_env_filter(filter)
.with_test_writer()
.try_init();
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we want to enable tracing in tests. It will generate a lot of noise that provides no value 99% of the time. If there is a regression then the test should fail loudly with an error so we can review it.

Comment on lines +51 to +54
/// Logs swarm events instead of silently dropping them: failures at `warn`, the
/// rest at `trace`. A rejected reservation or failed circuit dial then shows up
/// in the logs rather than as a bare timeout.
fn note_swarm_event<E: Debug>(label: &str, event: &SwarmEvent<E>) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, are any of these errors expected in any way? If they're not (they are all actual errors) then it should panic so we can get a better trace when debugging tests.


use std::{fmt::Debug, time::Duration};

use anyhow::{Context as _, ensure};
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think we can drop anyhow here and use expect/assert! instead for all scenarios

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants