nvme_driver: Free all resources when keepalive is off #3086
alandau wants to merge 3 commits into microsoft:main
Conversation
When NVMe keepalive was off, the driver didn't free all resources on shutdown. For example, the IoIssuers kept within Namespaces were still referencing DMA allocations, so these DMA regions were saved (if MANA keepalive was on). However, after servicing with keepalive off, the NVMe manager didn't restore these allocations, which failed validation ("unrestored allocations found").
This PR makes three changes:
1. Namespace now references IoIssuers via a Weak rather than an Arc, which doesn't prevent resources from being freed when the NVMe manager shuts down.
2. NVMe manager is shut down before saving when KA is off to free all resources (including DMA allocations) before save. It's still (partially) shut down after save when KA is on, as before.
3. Added a test that verifies servicing succeeds for all 4 combinations of (NVMe, MANA) keepalive.
Pull request overview
This PR fixes an NVMe servicing teardown/resource-lifetime issue when NVMe keepalive is disabled, ensuring DMA allocations don’t remain referenced and inadvertently get persisted via other keepalive paths (e.g., MANA).
Changes:
- Switch Namespace→IoIssuers references from Arc to Weak so namespaces don't keep IO/DMA-related resources alive past NVMe manager shutdown.
- Adjust Underhill servicing flow to shut down the NVMe manager before save when NVMe keepalive is disabled (freeing resources before any DMA-manager state is captured).
- Add a servicing test intended to cover all 4 combinations of (NVMe, MANA) keepalive.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| vmm_tests/vmm_tests/tests/tests/multiarch/openhcl_servicing.rs | Adds a new servicing test for NVMe/MANA keepalive combinations. |
| vm/devices/storage/disk_nvme/nvme_driver/src/namespace.rs | Replaces Arc<IoIssuers> with Weak<IoIssuers> and upgrades on demand when issuing IO. |
| vm/devices/storage/disk_nvme/nvme_driver/src/driver.rs | Updates IoIssuers::get to return Arc<Issuer> (clone) instead of &Issuer to support the new ownership model. |
| openhcl/underhill_core/src/dispatch/mod.rs | Shuts down NVMe manager before save when NVMe KA is off; retains post-save partial shutdown behavior when NVMe KA is on. |
```rust
// If keepalive is disabled, reset all user-mode NVMe devices before
// save to free all resources, including DMA allocations.
if !nvme_keepalive_enabled {
    shutdown_nvme_manager(
        &mut self.nvme_manager,
        nvme_keepalive_enabled,
        correlation_id,
    )
    .await;
}
```
shutdown_nvme_manager(&mut self.nvme_manager, ...) is invoked before self.save(...) when NVMe keepalive is disabled, and it take()s (drops) the NvmeManager. If self.save(...) later returns an error, the error path only calls resume_drivers() (which currently doesn’t recreate/restart nvme_manager), so the VM may resume running without NVMe devices available. Consider keeping the manager available on save failure (e.g., avoid take() until after a successful save, or add a non-consuming “reset/free resources for save” path, or explicitly reinitialize nvme_manager in the failure rollback).
@mattkur: Does this comment make sense? Can save fail (with the guest expected to continue running as if nothing happened)? If yes, I don't like any of the presented options since the NVMe manager's state is torn down by the time save failed, and recreating it might be a PITA.
Hypothetically, yes. But I don't know if we currently have an idempotent save. @jstarks: interested in your architectural perspective. (1) do we want save to be idempotent, and (2) do you think we're there?
Perhaps I'm understanding the word "idempotent" differently... In my understanding idempotent means a second run will not do anything differently from the first run (i.e. the result of run 1+2 is the same as the result of run 1). I think this is orthogonal to the issue at hand, which is: can save fail and are we expected to recover in this case? (unless I'm misunderstanding something)
You're right, and I blame weekend brain. I mean: do we expect save -> resume to work in openhcl (without a restore and reload)