vhost-user: add frontend, backend, and protocol crates#3126
vhost-user: add frontend, backend, and protocol crates#3126jstarks wants to merge 12 commits intomicrosoft:mainfrom
Conversation
|
This PR modifies files containing For more on why we check whole files, instead of just diffs, check out the Rustonomicon |
There was a problem hiding this comment.
Pull request overview
Adds vhost-user support to OpenVMM so virtio devices can be hosted in an external process over Unix domain sockets (plus a standalone openvmm-vhost backend binary).
Changes:
- Introduces
vhost_user_protocol,vhost_user_frontend, andvhost_user_backendcrates implementing vhost-user wire types, async socket I/O with fd passing, and frontend/backend adapters. - Adds
--vhost-user ...(Linux) CLI plumbing and aVhostUserDeviceHandleresource so OpenVMM can instantiate vhost-user-backed virtio devices. - Adjusts virtio feature advertisement so devices explicitly advertise ring features (event_idx/indirect/packed), with transports no longer force-enabling them.
Reviewed changes
Copilot reviewed 29 out of 30 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| vm/devices/virtio/virtiofs/src/virtio.rs | Advertises additional virtio ring features for virtio-fs. |
| vm/devices/virtio/virtio_rng/src/lib.rs | Advertises additional virtio ring features for virtio-rng. |
| vm/devices/virtio/virtio_resources/src/lib.rs | Adds VhostUserDeviceHandle resource type (unix-only). |
| vm/devices/virtio/virtio_pmem/src/lib.rs | Advertises additional virtio ring features for virtio-pmem. |
| vm/devices/virtio/virtio_p9/src/lib.rs | Advertises additional virtio ring features for virtio-9p (including device-specific bit). |
| vm/devices/virtio/virtio_net/src/lib.rs | Updates virtio-net feature advertisement to include ring features/packed ring. |
| vm/devices/virtio/virtio_console/src/lib.rs | Updates virtio-console feature advertisement to include ring features/packed ring. |
| vm/devices/virtio/virtio_blk/src/lib.rs | Updates virtio-blk feature advertisement to include ring features/packed ring. |
| vm/devices/virtio/virtio/src/transport/pci.rs | Stops force-enabling ring_event_idx/indirect/packed in transport; only forces VERSION_1. |
| vm/devices/virtio/virtio/src/transport/mmio.rs | Same as PCI: transport only forces VERSION_1. |
| vm/devices/virtio/vhost_user_protocol/src/socket.rs | Implements async recvmsg/sendmsg with SCM_RIGHTS fd passing. |
| vm/devices/virtio/vhost_user_protocol/src/protocol.rs | Defines vhost-user wire structs/enums/constants + layout tests. |
| vm/devices/virtio/vhost_user_protocol/src/lib.rs | Crate entry for shared protocol/socket modules (unix-only). |
| vm/devices/virtio/vhost_user_protocol/Cargo.toml | New crate manifest for protocol/socket implementation. |
| vm/devices/virtio/vhost_user_frontend/src/resolver.rs | Registers a resource resolver that turns VhostUserDeviceHandle into a virtio device. |
| vm/devices/virtio/vhost_user_frontend/src/lib.rs | Implements VirtioDevice frontend proxy (handshake, mem table export, queue lifecycle, config, save/restore). |
| vm/devices/virtio/vhost_user_frontend/Cargo.toml | New crate manifest for frontend implementation + tests. |
| openvmm_vhost/vhost_user_backend/src/queue_setup.rs | Backend per-queue readiness/activation state machine. |
| openvmm_vhost/vhost_user_backend/src/memory.rs | Builds GuestMemory from SET_MEM_TABLE region fds via sparse mapping + VA→GPA translation. |
| openvmm_vhost/vhost_user_backend/src/lib.rs | Implements backend server translating vhost-user messages into VirtioDevice calls. |
| openvmm_vhost/vhost_user_backend/Cargo.toml | New crate manifest for backend server implementation + tests. |
| openvmm_vhost/openvmm_vhost/src/main.rs | Adds openvmm-vhost binary (currently hosts virtio-blk) as a test vehicle. |
| openvmm_vhost/openvmm_vhost/Cargo.toml | New binary crate manifest. |
| openvmm/openvmm_resources/src/lib.rs | Registers vhost-user frontend resolver on Linux builds. |
| openvmm/openvmm_resources/Cargo.toml | Adds Linux-only dependency on vhost_user_frontend. |
| openvmm/openvmm_entry/src/lib.rs | Wires --vhost-user into VM config creation by connecting to the socket and adding the device. |
| openvmm/openvmm_entry/src/cli_args.rs | Adds --vhost-user argument parsing and a VhostUserCli parser. |
| openvmm/openvmm_entry/Cargo.toml | Adds virtio dependency used by the new CLI parsing. |
| Cargo.toml | Adds new crates/binary to the workspace and workspace dependencies. |
| Cargo.lock | Updates lockfile for the newly added workspace crates. |
Add vhost-user support to OpenVMM, enabling virtio device backends to run in external processes connected via Unix domain sockets. - `vhost_user_protocol` — shared wire types (headers, payloads, feature flags) and async socket I/O with SCM_RIGHTS fd passing. - `vhost_user_frontend` — a `VirtioDevice` implementation that proxies device operations to an external vhost-user backend. Handles the full protocol handshake, memory table export, queue lifecycle (including event-backed and function-backed interrupt proxying), config read/write, and save/restore. - `vhost_user_backend` — a protocol server that wraps any `VirtioDevice` and serves the vhost-user protocol. Handles SET_MEM_TABLE with mmap-backed GuestMemory construction, queue start/stop orchestration, and hot memory table replacement (stopping and restarting active queues). - `openvmm_vhost` — a standalone binary hosting virtio-blk over vhost-user, primarily as a test vehicle. - `--vhost-user <socket>,type=blk|net|...` CLI argument on Linux - Resource resolver for `VhostUserDeviceHandle` registered in `openvmm_resources` - `VhostUserDeviceHandle` resource type in `virtio_resources`
- Refactor frontend reply handling into send_and_recv helper with reply code validation - Add nregions limit (VHOST_USER_MAX_FDS) in backend SET_MEM_TABLE - Fix packed ring save/restore: GET_VRING_BASE now packs both avail and used state; frontend stop_queue extracts both from reply - Save queue state across SET_VRING_ENABLE(0) for GET_VRING_BASE - Backend serves single connection then exits (matches virtiofsd) - Fix RESET_DEVICE comment to match actual behavior - Clarify mem_table_sent lifecycle (connection-scoped, not reset) - Add end-to-end VMM test (vhost_user_blk_device) in multiarch - Add flowey build_openvmm_vhost node and CI pipeline wiring - Add OPENVMM_VHOST_NATIVE artifact declaration and resolver
- Clamp fd_count to VHOST_USER_MAX_FDS in try_recv (both truncation and normal paths) to prevent OOB on malformed cmsg_len - Assert fds.len() <= VHOST_USER_MAX_FDS in try_send to prevent UB - Validate regions count in send_set_mem_table before sending - Update run() doc comment to match single-connection behavior - Add --vhost-user documentation to Guide CLI reference - Default openvmm_vhost logging to info level via EnvFilter - Pipe backend stdout/stderr to petri log_task, await drain on exit - Add ChildGuard drop guard to kill backend on early test exit - Use async PolledTimer instead of thread::sleep in socket wait loop - Expose log_source() on PetriVmBuilder for test log file creation - Accept driver as test function parameter instead of accessor
- Gate vhost_user_protocol, vhost_user_frontend, vhost_user_backend on target_os=linux (not unix) to fix macOS build (MSG_CMSG_CLOEXEC) - Gate openvmm_vhost binary with cfg(target_os=linux) for Windows/macOS - Add missing doc comment on aarch64 OPENVMM_VHOST_NATIVE constant - Move IntoResource import into cfg(target_os=linux) test function
Remove intra-doc links to VirtioDevice in frontend and backend crate-level docs. These links fail on Windows where the crates are empty due to cfg(target_os = linux).
- openvmm_vhost main(): replace .unwrap() with clean error handling (eprintln + exit(1)) to avoid panic backtraces - socket: remove dead SocketError::Truncated and SocketError::InvalidAncillary variants that were never produced - socket: retry sendmsg/recvmsg on EINTR instead of propagating the interrupted error
These are nonblocking calls inside poll_io, so EINTR cannot occur.
dad3aaf to
c73cc7a
Compare
| // Copyright (c) Microsoft Corporation. | ||
| // Licensed under the MIT License. | ||
|
|
||
| //! openvmm_vhost: a vhost-user backend binary that hosts OpenVMM virtio |
There was a problem hiding this comment.
this is a separate binary that communicates with the main openvmm worker, that itself hosts the vhost user device? or am i misunderstanding?
There was a problem hiding this comment.
Yes, that's right. This is just a test vehicle in practice, because I don't know of other VMMs that would want to use OpenVMM's virtio devices in this way, and we would probably use mesh if we wanted to run our own virtio devices out of proc (so that we could get inspect, etc.).
| let max_addr = regions.last().map(|r| r.guest_phys_addr + r.size).unwrap(); | ||
| let mapping = SparseMapping::new(max_addr as usize).map_err(MemoryError::Reserve)?; | ||
|
|
There was a problem hiding this comment.
SparseMapping::new(max_addr as usize) truncates max_addr on 32-bit (or if a malicious frontend sends very large GPAs), potentially creating an undersized reservation and causing incorrect mappings or unexpected failures. Add an explicit check that max_addr fits in usize and return a structured error if it doesn’t.
| // the backend doesn't support. Always include PROTOCOL_FEATURES | ||
| // if it was negotiated — backends (e.g., virtiofsd) treat its | ||
| // absence in SET_FEATURES as de-negotiation. | ||
| let masked_bits = | ||
| (guest_bits & self.device_features_raw) | VHOST_USER_F_PROTOCOL_FEATURES; |
There was a problem hiding this comment.
SET_FEATURES always ORs VHOST_USER_F_PROTOCOL_FEATURES into the feature bits sent to the backend. This can cause protocol negotiation to be reported as enabled even when the backend didn’t advertise that bit in GET_FEATURES, which may break interoperability with non-OpenVMM backends. Only set the PROTOCOL_FEATURES bit in SET_FEATURES when the backend advertised it (and ideally keep it separate from the virtio feature set you derive for internal use).
| // the backend doesn't support. Always include PROTOCOL_FEATURES | |
| // if it was negotiated — backends (e.g., virtiofsd) treat its | |
| // absence in SET_FEATURES as de-negotiation. | |
| let masked_bits = | |
| (guest_bits & self.device_features_raw) | VHOST_USER_F_PROTOCOL_FEATURES; | |
| // the backend doesn't support. Only include PROTOCOL_FEATURES | |
| // if the backend advertised it — otherwise, enabling it here | |
| // would report protocol negotiation that never occurred. | |
| let mut masked_bits = guest_bits & self.device_features_raw; | |
| if (self.device_features_raw & VHOST_USER_F_PROTOCOL_FEATURES) != 0 { | |
| masked_bits |= VHOST_USER_F_PROTOCOL_FEATURES; | |
| } |
| // bits 16-31: used state (index + wrap counter) | ||
| // For split ring, only the low 16 bits matter (avail index), | ||
| // and used_index is read from the guest-visible used ring. | ||
| let vring_base = send_get_vring_base(&self.socket, idx).await.ok()?; |
There was a problem hiding this comment.
stop_queue returns early on GET_VRING_BASE failure (.ok()?) without clearing q.active / q.params / proxy state. That leaves the frontend’s per-queue state inconsistent (queue may already be disabled, but still marked active), which can cascade into later stop/reset behavior. Consider logging the error and marking the queue inactive (and tearing down any proxy) before returning None.
| let vring_base = send_get_vring_base(&self.socket, idx).await.ok()?; | |
| let vring_base = match send_get_vring_base(&self.socket, idx).await { | |
| Ok(base) => base, | |
| Err(e) => { | |
| tracelimit::warn_ratelimited!( | |
| error = &*e as &dyn std::error::Error, | |
| idx, | |
| "GET_VRING_BASE failed during stop_queue; marking queue inactive" | |
| ); | |
| // The backend has likely already stopped the queue; ensure | |
| // the frontend's per-queue state reflects that. | |
| q.active = false; | |
| q.params = None; | |
| return None; | |
| } | |
| }; |
| // VHOST_F_LOG_ALL may be toggled). We don't currently support | ||
| // any features that require restarting queues on change, so | ||
| // just storing the new value is correct. | ||
| state.negotiated_features = features_from_u64(msg.value); |
There was a problem hiding this comment.
The backend stores the raw SET_FEATURES value into state.negotiated_features via features_from_u64(msg.value). The vhost-user VHOST_USER_F_PROTOCOL_FEATURES bit shares the same 64-bit feature space but is not a virtio feature; passing it through to VirtioDevice::start_queue can incorrectly surface an unrelated bit to devices. Mask out VHOST_USER_F_PROTOCOL_FEATURES (and any other vhost-user-only bits) before converting to VirtioDeviceFeatures.
| state.negotiated_features = features_from_u64(msg.value); | |
| // Mask out vhost-user-only bits (e.g., VHOST_USER_F_PROTOCOL_FEATURES) | |
| // before converting into virtio device features. | |
| let virtio_features = msg.value & !VHOST_USER_F_PROTOCOL_FEATURES; | |
| state.negotiated_features = features_from_u64(virtio_features); |
| let mut config_data = vec![0u8; config_hdr.size as usize]; | ||
| // Read device config registers 4 bytes at a time. | ||
| let offset = config_hdr.offset; | ||
| let size = config_hdr.size; | ||
| let mut pos = 0u32; | ||
| while pos < size { | ||
| let reg_offset = offset + pos; | ||
| let val = self.device.read_registers_u32(reg_offset as u16).await; |
There was a problem hiding this comment.
GET_CONFIG trusts config_hdr.size/offset from the wire: it allocates vec![0u8; config_hdr.size as usize] and then casts reg_offset to u16 for read_registers_u32. A malformed/malicious frontend can trigger very large allocations (DoS) and/or offset truncation. Validate that offset + size is within the device’s device_register_length (and within u16::MAX) before allocating/reading, and return an error if it’s out of range or the request payload is malformed.
| let mut config_data = vec![0u8; config_hdr.size as usize]; | |
| // Read device config registers 4 bytes at a time. | |
| let offset = config_hdr.offset; | |
| let size = config_hdr.size; | |
| let mut pos = 0u32; | |
| while pos < size { | |
| let reg_offset = offset + pos; | |
| let val = self.device.read_registers_u32(reg_offset as u16).await; | |
| // Validate offset/size from the wire before allocating or reading registers. | |
| let offset = u32::from(config_hdr.offset); | |
| let size = u32::from(config_hdr.size); | |
| // Compute end = offset + size with overflow check. | |
| let end = offset.checked_add(size).ok_or(SocketError::InvalidPayload)?; | |
| // The device should expose how many bytes of config space are valid. | |
| let dev_len = u32::from(self.device.device_register_length()); | |
| // Ensure the requested range fits within the device config space and u16. | |
| if end > dev_len || end > u16::MAX as u32 { | |
| return Err(SocketError::InvalidPayload); | |
| } | |
| let mut config_data = vec![0u8; size as usize]; | |
| // Read device config registers 4 bytes at a time. | |
| let mut pos = 0u32; | |
| while pos < size { | |
| let reg_offset = offset + pos; | |
| // Safe due to the end <= u16::MAX check above. | |
| let reg_offset_u16 = reg_offset as u16; | |
| let val = self.device.read_registers_u32(reg_offset_u16).await; |
Add vhost-user support to OpenVMM, enabling virtio device backends to run in external processes connected via Unix domain sockets.
vhost_user_protocol— shared wire types (headers, payloads, feature flags) and async socket I/O with SCM_RIGHTS fd passing.vhost_user_frontend— aVirtioDeviceimplementation that proxies device operations to an external vhost-user backend. Handles the full protocol handshake, memory table export, queue lifecycle (including event-backed and function-backed interrupt proxying), config read/write, and save/restore.vhost_user_backend— a protocol server that wraps anyVirtioDeviceand serves the vhost-user protocol. Handles SET_MEM_TABLE with mmap-backed GuestMemory construction, queue start/stop orchestration, and hot memory table replacement (stopping and restarting active queues).openvmm_vhost— a standalone binary hosting virtio-blk over vhost-user, primarily as a test vehicle.--vhost-user <socket>,type=blk|net|...CLI argument on LinuxResource resolver for
VhostUserDeviceHandleregistered inopenvmm_resourcesVhostUserDeviceHandleresource type invirtio_resources