Skip to content

mesh_remote: add external connection support for ALPC meshes#3125

Open
jstarks wants to merge 6 commits intomicrosoft:mainfrom
jstarks:mesh-reconnect-windows
Open

mesh_remote: add external connection support for ALPC meshes#3125
jstarks wants to merge 6 commits intomicrosoft:mainfrom
jstarks:mesh-reconnect-windows

Conversation

@jstarks
Copy link
Copy Markdown
Member

@jstarks jstarks commented Mar 24, 2026

Enable already-running processes to join an ALPC mesh at runtime, which is needed for modify/attach scenarios where a control client connects to a running VMM process.

Enable already-running processes to join an ALPC mesh at runtime, which
is needed for openvmm-neo modify/attach scenarios where a control client
connects to a running VMM process.

The core problem is that ALPC mesh invitations today are only delivered
via inherited handles during fork/exec. There is no mechanism for an
external process to discover and connect to an existing mesh.

This adds two building blocks:

AlpcNode::new_named() creates a mesh whose Ob directory lives at a
well-known path (\BaseNamedObjects\mesh-<node-id>) instead of an
anonymous directory. This lets external processes open the directory by
name rather than requiring a duplicated handle.

AlpcMeshListener uses a named pipe for the invitation handshake. The
server listens on \\.\pipe\<name>, and when a client connects, it
creates a NamedInvitation (credentials + directory path) and sends it
over the pipe. The client deserializes the invitation and calls
AlpcNode::join_named() to enter the mesh. The named pipe's default DACL
restricts access to the current user, and the mesh secret provides ALPC
connection authentication.

AlpcMeshInviter is a Clone+Send handle extracted from AlpcNode that
delegates invitation creation to the node's internal task, so the
listener's accept loop doesn't need to hold a reference to the node.

Also hardens ALPC connection authentication: each mesh now has a 256-bit
secret generated from getrandom, sent as part of ALPC connection data
and validated with constant-time comparison on accept. Connections with
invalid secrets are rejected. PortConfig gains required_server_sid()
for optional SID-based mutual authentication on the ALPC connect path.
Copilot AI review requested due to automatic review settings March 24, 2026 23:40
@github-actions github-actions bot added the unsafe Related to unsafe code label Mar 24, 2026
@github-actions
Copy link
Copy Markdown

⚠️ Unsafe Code Detected

This PR modifies files containing unsafe Rust code. Extra scrutiny is required during review.

For more on why we check whole files, instead of just diffs, check out the Rustonomicon

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Enables Windows processes that are already running to join an existing mesh_remote ALPC mesh at runtime by introducing named-directory ALPC nodes and a named-pipe-based invitation listener, removing the previous “inherited handles only” limitation.

Changes:

  • Add a Windows named-pipe listener (AlpcMeshListener) to distribute ALPC named invitations to external processes.
  • Extend Windows ALPC mesh nodes with named-directory support plus serialized invitation credentials (address + mesh secret).
  • Add ALPC connection mutual-auth plumbing via an optional RequiredServerSid in pal’s ALPC connect path.

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
support/pal/src/windows/alpc.rs Adds SID validation/buffering and passes RequiredServerSid into NtAlpcConnectPortEx.
support/pal/Cargo.toml Adds thiserror for Windows-only PAL error types.
support/mesh/mesh_remote/src/lib.rs Exposes new Windows ALPC APIs (listener + new error/types) via re-exports.
support/mesh/mesh_remote/src/alpc_node.rs Adds named-directory nodes, invitation credentials structs, mesh secret auth, and inviter handle pattern.
support/mesh/mesh_remote/src/alpc_listener.rs New named-pipe listener/handshake implementation for runtime mesh joins.
support/mesh/mesh_remote/Cargo.toml Adds Windows-only deps for mesh secret auth (constant_time_eq, getrandom).
support/mesh/mesh_process/src/lib.rs Updates invitation propagation to use AlpcInvitationCredentials and the new AlpcInvitation::new(...) API.
Cargo.toml Adds workspace dependency version for constant_time_eq.
Cargo.lock Locks new crate additions (constant_time_eq, getrandom usage in mesh_remote).

@jstarks jstarks requested a review from Copilot March 25, 2026 00:14
@jstarks jstarks changed the title mesh_remote: add ALPC mesh listener and named-directory nodes mesh_remote: add external connection support for ALPC meshes Mar 25, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 9 changed files in this pull request and generated 2 comments.

Comment on lines +881 to +889
/// `expected_server_sid` is the SID of the expected ALPC server process.
/// When non-empty, all ALPC connections will use `RequiredServerSid` for
/// mutual auth. Pass an empty `Vec` to skip SID validation (e.g., when
/// filesystem permissions on the invitation socket are sufficient).
pub fn join_named(
driver: impl Driver + Spawn + Clone,
invitation: NamedInvitation,
expected_server_sid: Vec<u8>,
port: Port,
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The expected_server_sid parameter name is misleading: it’s applied to all outgoing ALPC connections (every connect_alpc call), not just the initial “server”. If this is intended to pin the SID for every mesh peer, consider renaming to something like expected_peer_sid / required_remote_sid; otherwise, consider scoping this SID check to only the leader node or carrying per-peer identity info in invitations.

Copilot uses AI. Check for mistakes.
Comment on lines +159 to +192
pub async fn join_by_socket(
driver: impl Driver + Spawn + Clone,
path: &Path,
port: Port,
) -> Result<Self, JoinBySocketError> {
let mut stream = PolledSocket::<UnixStream>::connect_unix(&driver, path)
.await
.map_err(JoinBySocketError::Connect)?;

// Read the framed invitation: [4 bytes LE: data_len][data_len bytes]
let mut len_buf = [0u8; 4];
stream
.read_exact(&mut len_buf)
.await
.map_err(JoinBySocketError::Connect)?;
let data_len = u32::from_le_bytes(len_buf) as usize;

const MAX_INVITATION_SIZE: usize = 64 * 1024;
if data_len > MAX_INVITATION_SIZE {
return Err(JoinBySocketError::InvitationTooLarge { len: data_len });
}

let mut data = vec![0u8; data_len];
stream
.read_exact(&mut data)
.await
.map_err(JoinBySocketError::Connect)?;
drop(stream);

let invitation: NamedInvitation =
mesh_protobuf::decode(&data).map_err(JoinBySocketError::Decode)?;

AlpcNode::join_named(driver, invitation, Vec::new(), port).map_err(JoinBySocketError::Join)
}
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

join_by_socket() hard-codes Vec::new() for expected_server_sid, making it impossible for callers to opt into RequiredServerSid mutual authentication when joining via the socket handshake. Consider adding an expected_server_sid (or similar) parameter to join_by_socket() (and possibly including/pinning server identity in the invitation) so callers can enforce peer identity when filesystem permissions alone aren’t sufficient.

Copilot uses AI. Check for mistakes.
jstarks added 3 commits March 25, 2026 15:56
The requirements for SID-based mutual auth on ALPC connections are not
clear enough to justify the current API surface. Remove the
expected_server_sid parameter from join_named and all internal functions
in the connect path (with_id, process_connects, connect_alpc,
process_one_connect). This can be re-added with a clearer design when
needed.
Copilot AI review requested due to automatic review settings March 27, 2026 23:26
@jstarks jstarks marked this pull request as ready for review March 27, 2026 23:26
@jstarks jstarks requested a review from a team as a code owner March 27, 2026 23:26
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 7 changed files in this pull request and generated 4 comments.

Comment on lines +168 to +185
// Read the length-prefixed invitation: [4 bytes LE length][data].
let mut len_buf = [0u8; 4];
stream
.read_exact(&mut len_buf)
.await
.map_err(JoinBySocketError::Connect)?;
let data_len = u32::from_le_bytes(len_buf) as usize;

const MAX_INVITATION_SIZE: usize = 64 * 1024;
if data_len > MAX_INVITATION_SIZE {
return Err(JoinBySocketError::InvitationTooLarge { len: data_len });
}

let mut data = vec![0u8; data_len];
stream
.read_exact(&mut data)
.await
.map_err(JoinBySocketError::Connect)?;
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JoinBySocketError::Connect is used for read failures as well (both length and payload reads). That produces a misleading "failed to connect" error when the connection succeeded but the peer closed early or sent a truncated invitation. Consider adding a dedicated read/IO variant (or renaming Connect to something like Io) and mapping these read_exact errors to it.

Copilot uses AI. Check for mistakes.
Comment on lines +4 to +12
//! Unix socket listener for distributing ALPC mesh invitations.
//!
//! This module provides [`AlpcMeshListener`] for servers and
//! [`AlpcNode::join_by_socket`] for clients. The Unix socket is used only for
//! the invitation handshake — mesh communication happens over ALPC.
//!
//! Security is enforced by the filesystem: the socket file inherits the
//! permissions of its parent directory, so placing it in a user-owned directory
//! prevents other users from connecting.
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The module-level docs describe this as a "Unix socket listener" and explain Unix filesystem permissions, but the entire module is #![cfg(windows)]. Clarify in the docs that this is using Windows AF_UNIX (Unix domain sockets on Windows) and describe the Windows ACL/permission expectations to avoid misleading security guidance.

Copilot uses AI. Check for mistakes.
}

/// Extract an [`AlpcMeshInviter`] handle. Cheap — clones the sender.
pub fn inviter(&self) -> AlpcMeshInviter {
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AlpcNode::inviter is pub but returns AlpcMeshInviter, which is declared pub(crate). This will trip the private_interfaces lint (and may fail the build if denied) and also exposes a non-public type in the public API. Either make AlpcMeshInviter public (and optionally re-export it) or reduce inviter() visibility to pub(crate) / pub(super) as appropriate.

Suggested change
pub fn inviter(&self) -> AlpcMeshInviter {
pub(crate) fn inviter(&self) -> AlpcMeshInviter {

Copilot uses AI. Check for mistakes.
// Generate a 256-bit mesh secret for ALPC connection authentication.
// Every connecting node must present this secret to be accepted.
let mut secret = [0u8; 32];
getrandom::fill(&mut secret).unwrap();
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getrandom::fill(&mut secret).unwrap() can panic during node creation. Since this is part of normal construction (AlpcNode::new/new_named), it should return an error instead (e.g., add a NewNodeError::RngFailure variant and map the getrandom error) to avoid crashing the process on RNG/entropy failures.

Suggested change
getrandom::fill(&mut secret).unwrap();
getrandom::fill(&mut secret).map_err(NewNodeError::RngFailure)?;

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

unsafe Related to unsafe code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants