igvm: Add CPUID 0xD sub-functions 11 and 12 for CET xstate #60
Open
souradeep100 wants to merge 1 commit into microsoft:main from
The SNP CPUID table template only includes CPUID 0xD (Extended State
Enumeration) sub-functions 0 through 8. On hosts that support CET
(Control-flow Enforcement Technology), CPUID 0xD:1 advertises CET_U
(bit 11) and CET_S (bit 12) in the XSS supervisor state mask (ECX).
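For reference, here is a minimal Python sketch that decodes those two bits from a CPUID 0xD:1 ECX value. The helper names and the sample value 0x1800 are illustrative assumptions. Reading the real register would require executing CPUID on the host.

```python
from typing import List

# Bit positions of the CET state components in IA32_XSS. These match the
# CPUID 0xD sub-function indices that describe each component.
XFEATURE_CET_U = 11
XFEATURE_CET_S = 12


def xss_supports(ecx: int, bit: int) -> bool:
    """Return True if a CPUID 0xD:1 ECX value (supported IA32_XSS bits
    31:0) advertises the state component at position `bit`."""
    return bool((ecx >> bit) & 1)


def cet_components(ecx: int) -> List[str]:
    """Name the CET state components advertised in ECX."""
    names = []
    if xss_supports(ecx, XFEATURE_CET_U):
        names.append("CET_U")
    if xss_supports(ecx, XFEATURE_CET_S):
        names.append("CET_S")
    return names


# Hypothetical ECX value with both CET bits set (0x800 | 0x1000).
print(cet_components(0x1800))  # ['CET_U', 'CET_S']
print(cet_components(0x0))     # []
```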
When the guest kernel processes CPUID 0xD:1 and finds CET xstate bits
enabled, it calls snp_cpuid_calc_xsave_size() to compute the total
xsave area size by iterating over all enabled xfeature sub-functions.
Since sub-functions 11 and 12 are missing from the SNP CPUID table,
the function cannot find entries for the enabled CET bits, so the
xfeatures_found != xfeatures_en check fails and it returns 0. The
kernel then calls sev_es_terminate(), which issues a VMGEXIT with
GHCB_MSR_TERM_REQ (0x100) and terminates the VM.
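The failure path can be sketched as a small model. The Python below is a simplified, hypothetical approximation of the check described above, covering only the compacted format. It is not the kernel source. It starts from a base of 0x240 bytes (the 512-byte legacy region plus the 64-byte XSAVE header), adds the EAX size of each enabled 0xD sub-function it finds, and returns 0 if any enabled bit at position 2 or above has no table entry. The table layout and component sizes are illustrative values, not what the PSP reports.

```python
from typing import Dict, Tuple

# Hypothetical CPUID table: (leaf, subfn) -> (eax, ebx, ecx, edx).
# For leaf 0xD with subfn >= 2, EAX is that component's size in bytes.
CpuidTable = Dict[Tuple[int, int], Tuple[int, int, int, int]]

LEGACY_AND_HEADER = 0x240  # 512-byte legacy region + 64-byte XSAVE header


def calc_xsave_size_compacted(table: CpuidTable, xfeatures_en: int) -> int:
    """Simplified model of the guest's compacted xsave size computation.

    Returns 0 when an enabled component (bit >= 2) has no 0xD entry.
    """
    size = LEGACY_AND_HEADER
    found = 0
    for (leaf, subfn), (eax, _ebx, _ecx, _edx) in table.items():
        if leaf != 0xD or not (1 < subfn < 64):
            continue
        if not ((xfeatures_en >> subfn) & 1):
            continue
        found |= 1 << subfn
        size += eax
    # Bits 0 (x87) and 1 (SSE) live in the legacy region, so mask them off.
    if found != (xfeatures_en & ~0b11):
        return 0
    return size


# Sub-functions 0-8 only, as in the original template (illustrative sizes).
before = {(0xD, n): (0, 0, 0, 0) for n in range(9)}
before[(0xD, 2)] = (256, 0, 0, 0)  # AVX

# Same table plus sub-functions 11 (CET_U) and 12 (CET_S).
after = dict(before)
after[(0xD, 11)] = (16, 0, 0, 0)
after[(0xD, 12)] = (24, 0, 0, 0)

# x87 | SSE | AVX enabled in XCR0, CET_U | CET_S enabled in XSS.
enabled = 0b111 | (1 << 11) | (1 << 12)

print(calc_xsave_size_compacted(before, enabled))  # 0 -> guest terminates
print(calc_xsave_size_compacted(after, enabled))   # 576 + 256 + 16 + 24 = 872
```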
Add sub-functions 11 (CET_U) and 12 (CET_S) to the CPUID 0xD entries
in the SNP CPUID table template. The PSP firmware populates the actual
EAX/EBX/ECX/EDX values at launch time from hardware. On hosts without
CET support, these entries are harmlessly zeroed out by the PSP.
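A template check along these lines could catch this kind of gap before launch. The following Python sketch is hypothetical: missing_xstate_subfns is not part of igvmgen. Given the 0xD sub-functions present in a template and a host's supported XCR0|XSS mask (as enumerated by CPUID 0xD:0 EDX:EAX and 0xD:1 EDX:ECX), it lists the sub-functions a guest might need but would not find.

```python
from typing import Iterable, List


def missing_xstate_subfns(template_subfns: Iterable[int],
                          supported_mask: int) -> List[int]:
    """Return 0xD sub-function indices (>= 2) whose bit is set in
    supported_mask but that have no entry in the template."""
    present = set(template_subfns)
    return [bit for bit in range(2, 64)
            if (supported_mask >> bit) & 1 and bit not in present]


# Original template: sub-functions 0-8.
old_template = list(range(9))
# Template after this change: 0-8 plus 11 (CET_U) and 12 (CET_S).
new_template = old_template + [11, 12]

# Hypothetical host mask: XCR0 bits 0-2 (x87, SSE, AVX) plus the XSS CET bits.
host_mask = 0b111 | (1 << 11) | (1 << 12)

print(missing_xstate_subfns(old_template, host_mask))  # [11, 12]
print(missing_xstate_subfns(new_template, host_mask))  # []
```

Because the PSP zeroes entries the hardware does not support, a check like this errs on the side of listing extra sub-functions rather than missing one.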
Testing:
Reproducer (on Azure DC16as_cc_v5 with CET, /dev/mshv):
igvmgen -kernel bzImage \
-append "console=ttyS0 root=/dev/vda1 rw" \
-boot_mode x64 -vtl 0 -svme 1 -encrypted_page 1 \
-pvalidate_opt 1 -o output.bin
cloud-hypervisor --cpus boot=1,nested=off --memory size=512M \
--disk path=osdisk.img path=cloudinit \
--igvm output.bin --platform sev_snp=on -v
Before fix (with Cloud Hypervisor GHCB CPUID handler patched):
Guest kernel panics during xsave size computation:
x86/fpu: misordered xstate at 576
sev_es_terminate() -> VMGEXIT 0x100 (GHCB_MSR_TERM_REQ)
After fix: VM boots successfully to login prompt.
Signed-off-by: Souradeep Chakrabarti <schakrabarti@microsoft.com>
Member:
@KenGordon ping for a review. Thanks.
Contributor:
Can you explain which generation has these bits, maybe linking to some AMD docs, and confirm that you have tested this on Milan and Genoa please?