tests: add initial patchbay tests #3986

Merged
Frando merged 35 commits into main from Frando/netsim
Apr 2, 2026

Conversation

@Frando
Member

@Frando Frando commented Feb 27, 2026

Description

Adds a first round of tests using our new network simulation framework, patchbay. Patchbay uses Linux network namespaces to create isolated network topologies with routers, NATs, and link impairment. This lets us test actual holepunching and NAT traversal in realistic conditions, without needing any external infrastructure.

The test suite covers establishing a direct path in different scenarios:

  • Holepunching: two devices behind destination-independent NATs connect via relay, then upgrade to direct.
  • Uplink switching: a device changes its network uplink (v4 and v6) mid-connection and recovers a direct path.
  • Interface changes: network interfaces are added or removed while connected.
  • Link outage recovery: a device goes offline and comes back, and the direct connection resumes.
  • Degraded links: increasing levels of impairment (latency, jitter, packet loss) are applied to either side of a connection, and holepunching still succeeds.

Some tests that iroh doesn't pass yet are included but marked #[ignore]. They should be un-ignored as we improve things.

Running the tests

On Linux, the tests run natively as long as user namespaces are supported:

cargo make patchbay
# expands to:
cargo nextest run -p iroh --features qlog --test patchbay --profile patchbay

# run a specific test with logs:
RUST_LOG=trace cargo make patchbay holepunch_simple --nocapture

Test output is saved in ./target/testdir-current/patchbay/<test-name>. Patchbay currently collects tracing logs for each device, qlog files (if the feature is enabled), and endpoint metrics.
There's also a browser UI for viewing timelines, topologies, and logs. The UI can also be used to compare different test runs.

cargo install --git https://github.com/n0-computer/patchbay patchbay-cli
patchbay serve --testdir --open

On macOS you'll need to run the patchbay tests in a VM or container. The patchbay CLI includes a tool to set up a container or QEMU VM for this.

# both commands default to the container backend on macOS and native on linux
patchbay test -p iroh --test patchbay
patchbay serve --testdir --open
# use --vm to force VM mode even on linux

See the patchbay docs and README for more details.

Notes and open questions

There's a draft PR with some more tests: #4065, but they need more thought and work to really cover what we want to test.

Breaking Changes

None.

Change checklist

  • Self-review.
  • Documentation updates following the style guide, if relevant.
  • Tests if relevant.

@github-actions

github-actions bot commented Feb 27, 2026

Documentation for this PR has been generated and is available at: https://n0-computer.github.io/iroh/pr/3986/docs/iroh/

Last updated: 2026-04-02T08:16:34Z

@github-actions

github-actions bot commented Feb 27, 2026

Netsim report & logs for this PR have been generated and are available at: LOGS
This report will remain available for 3 days.

Last updated for commit: 942c81f

@n0bot n0bot bot added this to iroh Feb 27, 2026
@github-project-automation github-project-automation bot moved this to 🚑 Needs Triage in iroh Feb 27, 2026
@dignifiedquire dignifiedquire moved this from 🚑 Needs Triage to 🏗 In progress in iroh Mar 3, 2026
@Frando Frando changed the title tests: add initial netsim tests tests: add initial patchbay tests Mar 5, 2026
@Frando Frando force-pushed the Frando/netsim branch 2 times, most recently from 83bcb4e to c8f6e3e Compare March 5, 2026 11:50
@n0-computer n0-computer deleted a comment from github-actions bot Mar 24, 2026
@github-actions

github-actions bot commented Mar 24, 2026

patchbay: success | results
Last updated: 2026-03-31T13:08:02Z · d10ebd5a6de175c306801e911f38f5ff7e8ad8fa

@Frando Frando marked this pull request as ready for review March 25, 2026 10:15
Ok(())
}

// ---
Contributor

this is becoming a long file, maybe split it up into multiple test files?

Collaborator

+1

Member Author

Removed several tests and moved them to #4065 (still a draft) so let's do this once we add those back.

@flub
Contributor

flub commented Mar 25, 2026

cargo install --git https://github.com/n0-computer/patchbay -p patchbay-runner
patchbay serve --testdir --open

Why the need to install a tool, can you not write a self-contained html file into the testdir?

Contributor

@flub flub left a comment

🔥

.run(
async move |_dev, _ep, conn| {
let mut paths = conn.paths();
assert!(paths.is_relay(), "connection started relayed");
Contributor

Why is this not flaky? This checks to see whether the selected path is a relay path, but this test does not itself control when holepunching starts so I would expect this to be flaky. If the runtime manages to delay this task enough then iroh could already have holepunched.

Member Author

Right, this could be flaky if the runtime stalls this task for longer than one RTT. I've never seen it flake though. Should we leave it in until it flakes? So far it has served me as a useful assert that the connection didn't start on IP, because if it did, a check for holepunching would be worthless.
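
One way to make this check independent of task scheduling, sketched under the assumption that the test can get a history of path changes recorded from the moment the connection is established: assert on the recorded sequence rather than on the instantaneous state. `PathKind` and `started_relayed_then_upgraded` are hypothetical names for illustration and are not part of iroh or patchbay.

```rust
/// Hypothetical stand-in for the kind of path a connection is using.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PathKind {
    Relay,
    Direct,
}

/// Returns true if the recorded history begins on a relay path and
/// contains a direct path at some later point.
fn started_relayed_then_upgraded(history: &[PathKind]) -> bool {
    let first_is_relay = history.first() == Some(&PathKind::Relay);
    let saw_direct = history.contains(&PathKind::Direct);
    first_is_relay && saw_direct
}

fn main() {
    // A connection that started relayed and was later holepunched.
    let upgraded = [PathKind::Relay, PathKind::Relay, PathKind::Direct];
    assert!(started_relayed_then_upgraded(&upgraded));

    // A connection that started on a direct path: a holepunch check
    // on this one would not prove anything.
    let started_direct = [PathKind::Direct];
    assert!(!started_relayed_then_upgraded(&started_direct));
}
```

Because the assertion looks at the whole history, it gives the same answer no matter how late the test task is scheduled, provided the history itself is complete.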

@matheus23
Member

The command in the PR description needs to be updated: The tests are now called patchbay, not netsim

@matheus23
Member

cargo install --git https://github.com/n0-computer/patchbay -p patchbay-runner
patchbay serve --testdir --open

This command from the PR description needs updating: It's called patchbay-cli, and you need to drop the -p, as that's not an allowed flag in cargo install.

@Frando
Member Author

Frando commented Mar 31, 2026

I removed the "Publish to patchbay server" part for now. We don't want to hold this up on deploying the service, and the utility is debatable because we can just rerun locally if it fails.

So from my side this is ready for merge now. We can always iterate and add more tests later.

Member

@matheus23 matheus23 left a comment

Just nits

Comment on lines +384 to +386
/// Both peers behind FullCone NAT (EIM+EIF with hairpin). The most permissive
/// NAT type — any external host can send to the mapped port. Holepunching
/// always succeeds on the first try.
Member

And then this repeats the claim, but this time about FullCone NAT?

Member Author

This test is among the ones I removed for the initial merge. I will review the tests and docs more and mark #4065 as ready for review once done.

Comment on lines +730 to +736
info!("holepunched, now killing link for 2s");

// Take the link down.
dev.link_down("eth0").await?;
tokio::time::sleep(Duration::from_secs(2)).await;
dev.link_up("eth0").await?;
info!("link restored, waiting for recovery");
Member

Might be interesting to increase the time the link is down here - perhaps on the order of 15s or so - so that it's above the path idle timeout.

But I'm just musing. This is great already - we can add tests later.

Member Author

Changed to 5s. Can adapt further if needed, or we add more tests with that.
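
To make the relationship between outage length and the path idle timeout explicit, a test could classify each outage duration against the timeout and cover both sides. This is only a sketch: `OutageKind`, `classify_outage`, and the 10s timeout are placeholders for illustration, and the 10s value is not iroh's actual setting.

```rust
use std::time::Duration;

/// Hypothetical classification of a link outage relative to the idle timeout.
#[derive(Debug, PartialEq, Eq)]
enum OutageKind {
    ShorterThanIdleTimeout,
    LongerThanIdleTimeout,
}

/// Compares an outage duration with an assumed path idle timeout.
fn classify_outage(outage: Duration, path_idle_timeout: Duration) -> OutageKind {
    if outage > path_idle_timeout {
        OutageKind::LongerThanIdleTimeout
    } else {
        OutageKind::ShorterThanIdleTimeout
    }
}

fn main() {
    // Placeholder value, NOT iroh's actual path idle timeout.
    let assumed_idle_timeout = Duration::from_secs(10);
    for secs in [2u64, 5, 15] {
        let kind = classify_outage(Duration::from_secs(secs), assumed_idle_timeout);
        println!("{secs}s outage: {kind:?}");
    }
}
```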

Comment on lines +759 to +761
/// Hotel WiFi: captive-portal firewall allows all outbound TCP but only UDP
/// port 53 (DNS). Similar to corporate firewall but less restrictive on TCP.
/// Relay via HTTPS should work, holepunching should not.
Member

Looking at the differences between RouterPreset::Hotel and RouterPreset::Corporate, the main difference seems to be that the hotel one is IPv4-only, whereas corporate is IPv4 + IPv6, and corporate only allows port 80/443, whereas the hotel one allows any port.

Tbqh, I don't think any of these things make much of a difference in terms of testing for iroh. In most cases I'd expect these tests to either both fail or none of them to fail, except maybe for some edge cases regarding IPv6.

Regardless, perhaps it's worth either adding more information to the test description, or linking the docs via something like "See also [RouterPreset::Hotel] and [RouterPreset::Corporate] for differences".

Member Author

👍 This test is among the ones I removed for the initial merge. I will review the tests and docs more and mark #4065 as ready for review once done.

Contributor

@flub flub left a comment

there are too many tests to review, i'm not sure how to tackle that. e.g. i can't say whether the way they are written makes sense, or whether any failure would be because of the test, patchbay or iroh.

So I've only looked at the first few so far.

#[tokio::test]
#[traced_test]
#[ignore = "stays relayed, holepunch times out (deadline elapsed)"]
async fn holepunch_home_nat_one_side() -> Result {
Contributor

it is strange this doesn't work. what mechanism did we figure to address these ignored tests?

Member Author

No clear mechanism yet. First step: Let's run them locally with --ignored and if someone has an ignored one passing, then do a PR to remove the ignore line and we see what CI says.
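
For reference, a minimal standalone sketch of how an ignored test with a reason string behaves. `holepunch_succeeds` is a hypothetical stand-in for a scenario that does not pass yet; the real tests are async and use the patchbay harness.

```rust
/// Hypothetical stand-in for a scenario check that currently fails.
fn holepunch_succeeds() -> bool {
    false
}

#[cfg(test)]
mod tests {
    use super::*;

    // Skipped by default; the reason string is shown in the test listing.
    #[test]
    #[ignore = "stays relayed, holepunch times out"]
    fn holepunch_home_nat_one_side() {
        assert!(holepunch_succeeds());
    }
}

fn main() {
    // The scenario is still failing, so the test above stays ignored.
    println!("holepunch succeeds: {}", holepunch_succeeds());
}
```

With the standard test harness, `cargo test -- --ignored` runs only the ignored tests and `cargo test -- --include-ignored` runs everything. cargo-nextest has its own `--run-ignored` option for the same purpose.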

@flub
Contributor

flub commented Mar 31, 2026

Oh, another thing I found was that investigating any failure requires me to set RUST_LOG=trace. Why not enable that by default for the logfiles created? It uses some more disk space, but I don't think that's a big deal. Any other downside?
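
A rough sketch of what that default could look like, assuming the log files get their own filter that falls back to `trace` when `RUST_LOG` is unset. `logfile_filter` is a hypothetical helper, not how patchbay currently configures logging.

```rust
use std::env;

/// Picks the filter directive for per-device log files:
/// an explicit, non-empty value wins, otherwise default to "trace".
fn logfile_filter(env_value: Option<&str>) -> String {
    match env_value {
        Some(v) if !v.trim().is_empty() => v.to_string(),
        _ => "trace".to_string(),
    }
}

fn main() {
    // Read RUST_LOG if present; unset or empty falls back to "trace".
    let from_env = env::var("RUST_LOG").ok();
    let filter = logfile_filter(from_env.as_deref());
    println!("log file filter: {filter}");
}
```

Keeping the fallback in one small function means console output can stay at its current level while only the files written to the testdir become verbose.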

@Frando
Member Author

Frando commented Mar 31, 2026

> there are too many tests to review, i'm not sure how to tackle that.

I could just remove some, and do another PR then to add more. Let me do that tomorrow.

Frando added 3 commits April 1, 2026 11:58
- make it more obvious who is client and who is server
- drop endpoint dead on the floor after both run functions completed
@Frando Frando mentioned this pull request Apr 1, 2026
11 tasks
@Frando Frando requested a review from flub April 1, 2026 10:59
@Frando
Member Author

Frando commented Apr 1, 2026

I moved a number of tests to #4065 to make it easier to review this PR.
Also did a round of cleanups, I think I addressed all feedback.

Contributor

@flub flub left a comment

Thanks, some nice improvements still.

I still didn't review the actual tests, but I think that's fine. They'll probably evolve anyway.

@Frando Frando merged commit 2ab1240 into main Apr 2, 2026
32 of 33 checks passed
@github-project-automation github-project-automation bot moved this from 🏗 In progress to ✅ Done in iroh Apr 2, 2026
@flub flub deleted the Frando/netsim branch April 2, 2026 10:25
Labels

None yet

Projects

Status: ✅ Done

5 participants