mesh_remote: add external connection support for ALPC meshes#3125
mesh_remote: add external connection support for ALPC meshes#3125jstarks wants to merge 6 commits intomicrosoft:mainfrom
Conversation
Enable already-running processes to join an ALPC mesh at runtime, which is needed for openvmm-neo modify/attach scenarios where a control client connects to a running VMM process. The core problem is that ALPC mesh invitations today are only delivered via inherited handles during fork/exec. There is no mechanism for an external process to discover and connect to an existing mesh. This adds two building blocks: AlpcNode::new_named() creates a mesh whose Ob directory lives at a well-known path (\BaseNamedObjects\mesh-<node-id>) instead of an anonymous directory. This lets external processes open the directory by name rather than requiring a duplicated handle. AlpcMeshListener uses a named pipe for the invitation handshake. The server listens on \\.\pipe\<name>, and when a client connects, it creates a NamedInvitation (credentials + directory path) and sends it over the pipe. The client deserializes the invitation and calls AlpcNode::join_named() to enter the mesh. The named pipe's default DACL restricts access to the current user, and the mesh secret provides ALPC connection authentication. AlpcMeshInviter is a Clone+Send handle extracted from AlpcNode that delegates invitation creation to the node's internal task, so the listener's accept loop doesn't need to hold a reference to the node. Also hardens ALPC connection authentication: each mesh now has a 256-bit secret generated from getrandom, sent as part of ALPC connection data and validated with constant-time comparison on accept. Connections with invalid secrets are rejected. PortConfig gains required_server_sid() for optional SID-based mutual authentication on the ALPC connect path.
|
This PR modifies files containing For more on why we check whole files, instead of just diffs, check out the Rustonomicon |
There was a problem hiding this comment.
Pull request overview
Enables Windows processes that are already running to join an existing mesh_remote ALPC mesh at runtime by introducing named-directory ALPC nodes and a named-pipe-based invitation listener, removing the previous “inherited handles only” limitation.
Changes:
- Add a Windows named-pipe listener (
AlpcMeshListener) to distribute ALPC named invitations to external processes. - Extend Windows ALPC mesh nodes with named-directory support plus serialized invitation credentials (address + mesh secret).
- Add ALPC connection mutual-auth plumbing via an optional
RequiredServerSidinpal’s ALPC connect path.
Reviewed changes
Copilot reviewed 8 out of 9 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| support/pal/src/windows/alpc.rs | Adds SID validation/buffering and passes RequiredServerSid into NtAlpcConnectPortEx. |
| support/pal/Cargo.toml | Adds thiserror for Windows-only PAL error types. |
| support/mesh/mesh_remote/src/lib.rs | Exposes new Windows ALPC APIs (listener + new error/types) via re-exports. |
| support/mesh/mesh_remote/src/alpc_node.rs | Adds named-directory nodes, invitation credentials structs, mesh secret auth, and inviter handle pattern. |
| support/mesh/mesh_remote/src/alpc_listener.rs | New named-pipe listener/handshake implementation for runtime mesh joins. |
| support/mesh/mesh_remote/Cargo.toml | Adds Windows-only deps for mesh secret auth (constant_time_eq, getrandom). |
| support/mesh/mesh_process/src/lib.rs | Updates invitation propagation to use AlpcInvitationCredentials and the new AlpcInvitation::new(...) API. |
| Cargo.toml | Adds workspace dependency version for constant_time_eq. |
| Cargo.lock | Locks new crate additions (constant_time_eq, getrandom usage in mesh_remote). |
| /// `expected_server_sid` is the SID of the expected ALPC server process. | ||
| /// When non-empty, all ALPC connections will use `RequiredServerSid` for | ||
| /// mutual auth. Pass an empty `Vec` to skip SID validation (e.g., when | ||
| /// filesystem permissions on the invitation socket are sufficient). | ||
| pub fn join_named( | ||
| driver: impl Driver + Spawn + Clone, | ||
| invitation: NamedInvitation, | ||
| expected_server_sid: Vec<u8>, | ||
| port: Port, |
There was a problem hiding this comment.
The expected_server_sid parameter name is misleading: it’s applied to all outgoing ALPC connections (every connect_alpc call), not just the initial “server”. If this is intended to pin the SID for every mesh peer, consider renaming to something like expected_peer_sid / required_remote_sid; otherwise, consider scoping this SID check to only the leader node or carrying per-peer identity info in invitations.
| pub async fn join_by_socket( | ||
| driver: impl Driver + Spawn + Clone, | ||
| path: &Path, | ||
| port: Port, | ||
| ) -> Result<Self, JoinBySocketError> { | ||
| let mut stream = PolledSocket::<UnixStream>::connect_unix(&driver, path) | ||
| .await | ||
| .map_err(JoinBySocketError::Connect)?; | ||
|
|
||
| // Read the framed invitation: [4 bytes LE: data_len][data_len bytes] | ||
| let mut len_buf = [0u8; 4]; | ||
| stream | ||
| .read_exact(&mut len_buf) | ||
| .await | ||
| .map_err(JoinBySocketError::Connect)?; | ||
| let data_len = u32::from_le_bytes(len_buf) as usize; | ||
|
|
||
| const MAX_INVITATION_SIZE: usize = 64 * 1024; | ||
| if data_len > MAX_INVITATION_SIZE { | ||
| return Err(JoinBySocketError::InvitationTooLarge { len: data_len }); | ||
| } | ||
|
|
||
| let mut data = vec![0u8; data_len]; | ||
| stream | ||
| .read_exact(&mut data) | ||
| .await | ||
| .map_err(JoinBySocketError::Connect)?; | ||
| drop(stream); | ||
|
|
||
| let invitation: NamedInvitation = | ||
| mesh_protobuf::decode(&data).map_err(JoinBySocketError::Decode)?; | ||
|
|
||
| AlpcNode::join_named(driver, invitation, Vec::new(), port).map_err(JoinBySocketError::Join) | ||
| } |
There was a problem hiding this comment.
join_by_socket() hard-codes Vec::new() for expected_server_sid, making it impossible for callers to opt into RequiredServerSid mutual authentication when joining via the socket handshake. Consider adding an expected_server_sid (or similar) parameter to join_by_socket() (and possibly including/pinning server identity in the invitation) so callers can enforce peer identity when filesystem permissions alone aren’t sufficient.
The requirements for SID-based mutual auth on ALPC connections are not clear enough to justify the current API surface. Remove the expected_server_sid parameter from join_named and all internal functions in the connect path (with_id, process_connects, connect_alpc, process_one_connect). This can be re-added with a clearer design when needed.
| // Read the length-prefixed invitation: [4 bytes LE length][data]. | ||
| let mut len_buf = [0u8; 4]; | ||
| stream | ||
| .read_exact(&mut len_buf) | ||
| .await | ||
| .map_err(JoinBySocketError::Connect)?; | ||
| let data_len = u32::from_le_bytes(len_buf) as usize; | ||
|
|
||
| const MAX_INVITATION_SIZE: usize = 64 * 1024; | ||
| if data_len > MAX_INVITATION_SIZE { | ||
| return Err(JoinBySocketError::InvitationTooLarge { len: data_len }); | ||
| } | ||
|
|
||
| let mut data = vec![0u8; data_len]; | ||
| stream | ||
| .read_exact(&mut data) | ||
| .await | ||
| .map_err(JoinBySocketError::Connect)?; |
There was a problem hiding this comment.
JoinBySocketError::Connect is used for read failures as well (both length and payload reads). That produces a misleading "failed to connect" error when the connection succeeded but the peer closed early or sent a truncated invitation. Consider adding a dedicated read/IO variant (or renaming Connect to something like Io) and mapping these read_exact errors to it.
| //! Unix socket listener for distributing ALPC mesh invitations. | ||
| //! | ||
| //! This module provides [`AlpcMeshListener`] for servers and | ||
| //! [`AlpcNode::join_by_socket`] for clients. The Unix socket is used only for | ||
| //! the invitation handshake — mesh communication happens over ALPC. | ||
| //! | ||
| //! Security is enforced by the filesystem: the socket file inherits the | ||
| //! permissions of its parent directory, so placing it in a user-owned directory | ||
| //! prevents other users from connecting. |
There was a problem hiding this comment.
The module-level docs describe this as a "Unix socket listener" and explain Unix filesystem permissions, but the entire module is #![cfg(windows)]. Clarify in the docs that this is using Windows AF_UNIX (Unix domain sockets on Windows) and describe the Windows ACL/permission expectations to avoid misleading security guidance.
| } | ||
|
|
||
| /// Extract an [`AlpcMeshInviter`] handle. Cheap — clones the sender. | ||
| pub fn inviter(&self) -> AlpcMeshInviter { |
There was a problem hiding this comment.
AlpcNode::inviter is pub but returns AlpcMeshInviter, which is declared pub(crate). This will trip the private_interfaces lint (and may fail the build if denied) and also exposes a non-public type in the public API. Either make AlpcMeshInviter public (and optionally re-export it) or reduce inviter() visibility to pub(crate) / pub(super) as appropriate.
| pub fn inviter(&self) -> AlpcMeshInviter { | |
| pub(crate) fn inviter(&self) -> AlpcMeshInviter { |
| // Generate a 256-bit mesh secret for ALPC connection authentication. | ||
| // Every connecting node must present this secret to be accepted. | ||
| let mut secret = [0u8; 32]; | ||
| getrandom::fill(&mut secret).unwrap(); |
There was a problem hiding this comment.
getrandom::fill(&mut secret).unwrap() can panic during node creation. Since this is part of normal construction (AlpcNode::new/new_named), it should return an error instead (e.g., add a NewNodeError::RngFailure variant and map the getrandom error) to avoid crashing the process on RNG/entropy failures.
| getrandom::fill(&mut secret).unwrap(); | |
| getrandom::fill(&mut secret).map_err(NewNodeError::RngFailure)?; |
Enable already-running processes to join an ALPC mesh at runtime, which is needed for modify/attach scenarios where a control client connects to a running VMM process.