2020-03-24 10:41:54 +01:00
|
|
|
// SPDX-License-Identifier: GPL-2.0-only
|
|
|
|
/*
|
|
|
|
* Kernel-based Virtual Machine driver for Linux
|
|
|
|
*
|
|
|
|
* AMD SVM-SEV support
|
|
|
|
*
|
|
|
|
* Copyright 2010 Red Hat, Inc. and/or its affiliates.
|
|
|
|
*/
|
KVM: x86: Unify pr_fmt to use module name for all KVM modules
Define pr_fmt using KBUILD_MODNAME for all KVM x86 code so that printks
use consistent formatting across common x86, Intel, and AMD code. In
addition to providing consistent print formatting, using KBUILD_MODNAME,
e.g. kvm_amd and kvm_intel, allows referencing SVM and VMX (and SEV and
SGX and ...) as technologies without generating weird messages, and
without causing naming conflicts with other kernel code, e.g. "SEV: ",
"tdx: ", "sgx: " etc.. are all used by the kernel for non-KVM subsystems.
Opportunistically move away from printk() for prints that need to be
modified anyways, e.g. to drop a manual "kvm: " prefix.
Opportunistically convert a few SGX WARNs that are similarly modified to
WARN_ONCE; in the very unlikely event that the WARNs fire, odds are good
that they would fire repeatedly and spam the kernel log without providing
unique information in each print.
Note, defining pr_fmt yields undesirable results for code that uses KVM's
printk wrappers, e.g. vcpu_unimpl(). But, that's a pre-existing problem
as SVM/kvm_amd already defines a pr_fmt, and thankfully use of KVM's
wrappers is relatively limited in KVM x86 code.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Message-Id: <20221130230934.1014142-35-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-30 23:09:18 +00:00
|
|
|
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
|
2020-03-24 10:41:54 +01:00
|
|
|
|
|
|
|
#include <linux/kvm_types.h>
|
|
|
|
#include <linux/kvm_host.h>
|
|
|
|
#include <linux/kernel.h>
|
|
|
|
#include <linux/highmem.h>
|
2023-03-10 15:19:45 -06:00
|
|
|
#include <linux/psp.h>
|
2020-03-24 10:41:54 +01:00
|
|
|
#include <linux/psp-sev.h>
|
2020-04-11 18:09:27 +02:00
|
|
|
#include <linux/pagemap.h>
|
2020-03-24 10:41:54 +01:00
|
|
|
#include <linux/swap.h>
|
2021-03-29 21:42:06 -07:00
|
|
|
#include <linux/misc_cgroup.h>
|
2020-12-10 11:09:40 -06:00
|
|
|
#include <linux/processor.h>
|
2020-12-10 11:09:48 -06:00
|
|
|
#include <linux/trace_events.h>
|
KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event
Version 2 of GHCB specification added support for the SNP Guest Request
Message NAE event. The event allows for an SEV-SNP guest to make
requests to the SEV-SNP firmware through the hypervisor using the
SNP_GUEST_REQUEST API defined in the SEV-SNP firmware specification.
This is used by guests primarily to request attestation reports from
firmware. There are other request types are available as well, but the
specifics of what guest requests are being made generally does not
affect how they are handled by the hypervisor, which only serves as a
proxy for the guest requests and firmware responses.
Implement handling for these events.
When an SNP Guest Request is issued, the guest will provide its own
request/response pages, which could in theory be passed along directly
to firmware. However, these pages would need special care:
- Both pages are from shared guest memory, so they need to be
protected from migration/etc. occurring while firmware reads/writes
to them. At a minimum, this requires elevating the ref counts and
potentially needing an explicit pinning of the memory. This places
additional restrictions on what type of memory backends userspace
can use for shared guest memory since there would be some reliance
on using refcounted pages.
- The response page needs to be switched to Firmware-owned state
before the firmware can write to it, which can lead to potential
host RMP #PFs if the guest is misbehaved and hands the host a
guest page that KVM is writing to for other reasons (e.g. virtio
buffers).
Both of these issues can be avoided completely by using
separately-allocated bounce pages for both the request/response pages
and passing those to firmware instead. So that's the approach taken
here.
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Co-developed-by: Alexey Kardashevskiy <aik@amd.com>
Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
[mdr: ensure FW command failures are indicated to guest, drop extended
request handling to be re-written as separate patch, massage commit]
Signed-off-by: Michael Roth <michael.roth@amd.com>
Message-ID: <20240701223148.3798365-2-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-07-01 17:31:46 -05:00
|
|
|
#include <uapi/linux/sev-guest.h>
|
2020-03-24 10:41:54 +01:00
|
|
|
|
2021-06-23 14:02:05 +02:00
|
|
|
#include <asm/pkru.h>
|
2020-12-15 12:44:07 -05:00
|
|
|
#include <asm/trapnr.h>
|
2021-10-15 03:16:31 +02:00
|
|
|
#include <asm/fpu/xcr.h>
|
2024-04-04 08:13:21 -04:00
|
|
|
#include <asm/fpu/xstate.h>
|
2023-06-15 16:37:54 +10:00
|
|
|
#include <asm/debugreg.h>
|
2025-04-30 22:42:41 -07:00
|
|
|
#include <asm/msr.h>
|
2024-05-01 03:51:55 -05:00
|
|
|
#include <asm/sev.h>
|
2020-12-15 12:44:07 -05:00
|
|
|
|
2022-08-03 22:49:57 +00:00
|
|
|
#include "mmu.h"
|
2020-03-24 10:41:54 +01:00
|
|
|
#include "x86.h"
|
|
|
|
#include "svm.h"
|
2020-12-30 16:27:00 -08:00
|
|
|
#include "svm_ops.h"
|
2020-12-10 11:09:47 -06:00
|
|
|
#include "cpuid.h"
|
2020-12-10 11:09:48 -06:00
|
|
|
#include "trace.h"
|
2020-03-24 10:41:54 +01:00
|
|
|
|
2024-05-01 02:10:48 -05:00
|
|
|
#define GHCB_VERSION_MAX 2ULL
|
|
|
|
#define GHCB_VERSION_DEFAULT 2ULL
|
2024-04-04 08:13:12 -04:00
|
|
|
#define GHCB_VERSION_MIN 1ULL
|
2021-03-29 21:42:06 -07:00
|
|
|
|
2024-05-01 03:52:02 -05:00
|
|
|
#define GHCB_HV_FT_SUPPORTED (GHCB_HV_FT_SNP | GHCB_HV_FT_SNP_AP_CREATION)
|
2024-05-01 02:10:46 -05:00
|
|
|
|
2021-04-21 19:11:14 -07:00
|
|
|
/* enable/disable SEV support */
|
2021-04-21 19:11:19 -07:00
|
|
|
static bool sev_enabled = true;
|
2021-04-21 19:11:17 -07:00
|
|
|
module_param_named(sev, sev_enabled, bool, 0444);
|
2021-04-21 19:11:14 -07:00
|
|
|
|
|
|
|
/* enable/disable SEV-ES support */
|
2021-04-21 19:11:19 -07:00
|
|
|
static bool sev_es_enabled = true;
|
2021-04-21 19:11:17 -07:00
|
|
|
module_param_named(sev_es, sev_es_enabled, bool, 0444);
|
2023-06-15 16:37:54 +10:00
|
|
|
|
2024-05-01 03:51:54 -05:00
|
|
|
/* enable/disable SEV-SNP support */
|
2024-05-01 03:52:07 -05:00
|
|
|
static bool sev_snp_enabled = true;
|
|
|
|
module_param_named(sev_snp, sev_snp_enabled, bool, 0444);
|
2024-05-01 03:51:54 -05:00
|
|
|
|
2023-06-15 16:37:54 +10:00
|
|
|
/* enable/disable SEV-ES DebugSwap support */
|
2024-04-04 08:13:23 -04:00
|
|
|
static bool sev_es_debug_swap_enabled = true;
|
2023-06-15 16:37:54 +10:00
|
|
|
module_param_named(debug_swap, sev_es_debug_swap_enabled, bool, 0444);
|
2024-04-04 08:13:15 -04:00
|
|
|
static u64 sev_supported_vmsa_features;
|
2021-04-21 19:11:14 -07:00
|
|
|
|
2024-05-01 02:10:45 -05:00
|
|
|
#define AP_RESET_HOLD_NONE 0
|
|
|
|
#define AP_RESET_HOLD_NAE_EVENT 1
|
|
|
|
#define AP_RESET_HOLD_MSR_PROTO 2
|
|
|
|
|
2024-05-01 03:51:55 -05:00
|
|
|
/* As defined by SEV-SNP Firmware ABI, under "Guest Policy". */
|
|
|
|
#define SNP_POLICY_MASK_API_MINOR GENMASK_ULL(7, 0)
|
|
|
|
#define SNP_POLICY_MASK_API_MAJOR GENMASK_ULL(15, 8)
|
|
|
|
#define SNP_POLICY_MASK_SMT BIT_ULL(16)
|
|
|
|
#define SNP_POLICY_MASK_RSVD_MBO BIT_ULL(17)
|
|
|
|
#define SNP_POLICY_MASK_DEBUG BIT_ULL(19)
|
|
|
|
#define SNP_POLICY_MASK_SINGLE_SOCKET BIT_ULL(20)
|
|
|
|
|
|
|
|
#define SNP_POLICY_MASK_VALID (SNP_POLICY_MASK_API_MINOR | \
|
|
|
|
SNP_POLICY_MASK_API_MAJOR | \
|
|
|
|
SNP_POLICY_MASK_SMT | \
|
|
|
|
SNP_POLICY_MASK_RSVD_MBO | \
|
|
|
|
SNP_POLICY_MASK_DEBUG | \
|
|
|
|
SNP_POLICY_MASK_SINGLE_SOCKET)
|
|
|
|
|
2024-05-01 03:51:57 -05:00
|
|
|
#define INITIAL_VMSA_GPA 0xFFFFFFFFF000
|
|
|
|
|
2020-12-10 11:09:49 -06:00
|
|
|
static u8 sev_enc_bit;
|
2020-03-24 10:41:54 +01:00
|
|
|
static DECLARE_RWSEM(sev_deactivate_lock);
|
|
|
|
static DEFINE_MUTEX(sev_bitmap_lock);
|
|
|
|
unsigned int max_sev_asid;
|
|
|
|
static unsigned int min_sev_asid;
|
2021-04-15 15:53:55 +00:00
|
|
|
static unsigned long sev_me_mask;
|
2021-08-02 11:09:03 -07:00
|
|
|
static unsigned int nr_asids;
|
2020-03-24 10:41:54 +01:00
|
|
|
static unsigned long *sev_asid_bitmap;
|
|
|
|
static unsigned long *sev_reclaim_asid_bitmap;
|
|
|
|
|
2024-05-01 03:51:55 -05:00
|
|
|
static int snp_decommission_context(struct kvm *kvm);
|
|
|
|
|
2020-03-24 10:41:54 +01:00
|
|
|
struct enc_region {
|
|
|
|
struct list_head list;
|
|
|
|
unsigned long npages;
|
|
|
|
struct page **pages;
|
|
|
|
unsigned long uaddr;
|
|
|
|
unsigned long size;
|
|
|
|
};
|
|
|
|
|
2021-04-21 19:11:25 -07:00
|
|
|
/* Called with the sev_bitmap_lock held, or on shutdown */
|
2024-01-31 15:56:07 -08:00
|
|
|
static int sev_flush_asids(unsigned int min_asid, unsigned int max_asid)
|
2020-03-24 10:41:54 +01:00
|
|
|
{
|
2024-01-31 15:56:07 -08:00
|
|
|
int ret, error = 0;
|
|
|
|
unsigned int asid;
|
2021-04-21 19:11:25 -07:00
|
|
|
|
|
|
|
/* Check if there are any ASIDs to reclaim before performing a flush */
|
2021-08-02 11:09:03 -07:00
|
|
|
asid = find_next_bit(sev_reclaim_asid_bitmap, nr_asids, min_asid);
|
|
|
|
if (asid > max_asid)
|
2021-04-21 19:11:25 -07:00
|
|
|
return -EBUSY;
|
2020-03-24 10:41:54 +01:00
|
|
|
|
|
|
|
/*
|
|
|
|
* DEACTIVATE will clear the WBINVD indicator causing DF_FLUSH to fail,
|
|
|
|
* so it must be guarded.
|
|
|
|
*/
|
|
|
|
down_write(&sev_deactivate_lock);
|
|
|
|
|
2025-05-22 16:37:31 -07:00
|
|
|
/* SNP firmware requires use of WBINVD for ASID recycling. */
|
2020-03-24 10:41:54 +01:00
|
|
|
wbinvd_on_all_cpus();
|
2024-05-01 03:51:55 -05:00
|
|
|
|
|
|
|
if (sev_snp_enabled)
|
|
|
|
ret = sev_do_cmd(SEV_CMD_SNP_DF_FLUSH, NULL, &error);
|
|
|
|
else
|
|
|
|
ret = sev_guest_df_flush(&error);
|
2020-03-24 10:41:54 +01:00
|
|
|
|
|
|
|
up_write(&sev_deactivate_lock);
|
|
|
|
|
|
|
|
if (ret)
|
2024-05-01 03:51:55 -05:00
|
|
|
pr_err("SEV%s: DF_FLUSH failed, ret=%d, error=%#x\n",
|
|
|
|
sev_snp_enabled ? "-SNP" : "", ret, error);
|
2020-03-24 10:41:54 +01:00
|
|
|
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2021-04-08 22:32:14 +00:00
|
|
|
static inline bool is_mirroring_enc_context(struct kvm *kvm)
|
|
|
|
{
|
2024-04-04 08:13:19 -04:00
|
|
|
return !!to_kvm_sev_info(kvm)->enc_context_owner;
|
2021-04-08 22:32:14 +00:00
|
|
|
}
|
|
|
|
|
2024-04-04 08:13:16 -04:00
|
|
|
static bool sev_vcpu_has_debug_swap(struct vcpu_svm *svm)
|
|
|
|
{
|
|
|
|
struct kvm_vcpu *vcpu = &svm->vcpu;
|
2025-01-23 11:21:40 +05:30
|
|
|
struct kvm_sev_info *sev = to_kvm_sev_info(vcpu->kvm);
|
2024-04-04 08:13:16 -04:00
|
|
|
|
|
|
|
return sev->vmsa_features & SVM_SEV_FEAT_DEBUG_SWAP;
|
|
|
|
}
|
|
|
|
|
2020-03-24 10:41:54 +01:00
|
|
|
/* Must be called with the sev_bitmap_lock held */
|
2024-01-31 15:56:07 -08:00
|
|
|
static bool __sev_recycle_asids(unsigned int min_asid, unsigned int max_asid)
|
2020-03-24 10:41:54 +01:00
|
|
|
{
|
2021-04-21 19:11:25 -07:00
|
|
|
if (sev_flush_asids(min_asid, max_asid))
|
2020-03-24 10:41:54 +01:00
|
|
|
return false;
|
|
|
|
|
2020-12-10 11:10:05 -06:00
|
|
|
/* The flush process will flush all reclaimable SEV and SEV-ES ASIDs */
|
2020-03-24 10:41:54 +01:00
|
|
|
bitmap_xor(sev_asid_bitmap, sev_asid_bitmap, sev_reclaim_asid_bitmap,
|
2021-08-02 11:09:03 -07:00
|
|
|
nr_asids);
|
|
|
|
bitmap_zero(sev_reclaim_asid_bitmap, nr_asids);
|
2020-03-24 10:41:54 +01:00
|
|
|
|
|
|
|
return true;
|
|
|
|
}
|
|
|
|
|
2021-11-11 10:02:26 -05:00
|
|
|
static int sev_misc_cg_try_charge(struct kvm_sev_info *sev)
|
|
|
|
{
|
|
|
|
enum misc_res_type type = sev->es_active ? MISC_CG_RES_SEV_ES : MISC_CG_RES_SEV;
|
|
|
|
return misc_cg_try_charge(type, sev->misc_cg, 1);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void sev_misc_cg_uncharge(struct kvm_sev_info *sev)
|
|
|
|
{
|
|
|
|
enum misc_res_type type = sev->es_active ? MISC_CG_RES_SEV_ES : MISC_CG_RES_SEV;
|
|
|
|
misc_cg_uncharge(type, sev->misc_cg, 1);
|
|
|
|
}
|
|
|
|
|
2020-12-10 11:10:05 -06:00
|
|
|
static int sev_asid_new(struct kvm_sev_info *sev)
|
2020-03-24 10:41:54 +01:00
|
|
|
{
|
2024-01-31 15:56:08 -08:00
|
|
|
/*
|
|
|
|
* SEV-enabled guests must use asid from min_sev_asid to max_sev_asid.
|
|
|
|
* SEV-ES-enabled guest can use from 1 to min_sev_asid - 1.
|
|
|
|
* Note: min ASID can end up larger than the max if basic SEV support is
|
|
|
|
* effectively disabled by disallowing use of ASIDs for SEV guests.
|
|
|
|
*/
|
|
|
|
unsigned int min_asid = sev->es_active ? 1 : min_sev_asid;
|
|
|
|
unsigned int max_asid = sev->es_active ? min_sev_asid - 1 : max_sev_asid;
|
|
|
|
unsigned int asid;
|
2020-03-24 10:41:54 +01:00
|
|
|
bool retry = true;
|
2024-01-31 15:56:07 -08:00
|
|
|
int ret;
|
2021-03-29 21:42:06 -07:00
|
|
|
|
2024-01-31 15:56:08 -08:00
|
|
|
if (min_asid > max_asid)
|
|
|
|
return -ENOTTY;
|
2021-03-29 21:42:06 -07:00
|
|
|
|
|
|
|
WARN_ON(sev->misc_cg);
|
|
|
|
sev->misc_cg = get_current_misc_cg();
|
2021-11-11 10:02:26 -05:00
|
|
|
ret = sev_misc_cg_try_charge(sev);
|
2021-03-29 21:42:06 -07:00
|
|
|
if (ret) {
|
|
|
|
put_misc_cg(sev->misc_cg);
|
|
|
|
sev->misc_cg = NULL;
|
|
|
|
return ret;
|
|
|
|
}
|
2020-03-24 10:41:54 +01:00
|
|
|
|
|
|
|
mutex_lock(&sev_bitmap_lock);
|
|
|
|
|
|
|
|
again:
|
2021-08-02 11:09:03 -07:00
|
|
|
asid = find_next_zero_bit(sev_asid_bitmap, max_asid + 1, min_asid);
|
|
|
|
if (asid > max_asid) {
|
2020-12-10 11:10:05 -06:00
|
|
|
if (retry && __sev_recycle_asids(min_asid, max_asid)) {
|
2020-03-24 10:41:54 +01:00
|
|
|
retry = false;
|
|
|
|
goto again;
|
|
|
|
}
|
|
|
|
mutex_unlock(&sev_bitmap_lock);
|
2021-03-29 21:42:06 -07:00
|
|
|
ret = -EBUSY;
|
|
|
|
goto e_uncharge;
|
2020-03-24 10:41:54 +01:00
|
|
|
}
|
|
|
|
|
2021-08-02 11:09:03 -07:00
|
|
|
__set_bit(asid, sev_asid_bitmap);
|
2020-03-24 10:41:54 +01:00
|
|
|
|
|
|
|
mutex_unlock(&sev_bitmap_lock);
|
|
|
|
|
2024-01-31 15:56:06 -08:00
|
|
|
sev->asid = asid;
|
|
|
|
return 0;
|
2021-03-29 21:42:06 -07:00
|
|
|
e_uncharge:
|
2021-11-11 10:02:26 -05:00
|
|
|
sev_misc_cg_uncharge(sev);
|
2021-03-29 21:42:06 -07:00
|
|
|
put_misc_cg(sev->misc_cg);
|
|
|
|
sev->misc_cg = NULL;
|
|
|
|
return ret;
|
2020-03-24 10:41:54 +01:00
|
|
|
}
|
|
|
|
|
2024-01-31 15:56:07 -08:00
|
|
|
static unsigned int sev_get_asid(struct kvm *kvm)
|
2020-03-24 10:41:54 +01:00
|
|
|
{
|
2025-01-23 11:21:40 +05:30
|
|
|
return to_kvm_sev_info(kvm)->asid;
|
2020-03-24 10:41:54 +01:00
|
|
|
}
|
|
|
|
|
2021-03-29 21:42:06 -07:00
|
|
|
static void sev_asid_free(struct kvm_sev_info *sev)
|
2020-03-24 10:41:54 +01:00
|
|
|
{
|
|
|
|
struct svm_cpu_data *sd;
|
2021-08-02 11:09:03 -07:00
|
|
|
int cpu;
|
2020-03-24 10:41:54 +01:00
|
|
|
|
|
|
|
mutex_lock(&sev_bitmap_lock);
|
|
|
|
|
2021-08-02 11:09:03 -07:00
|
|
|
__set_bit(sev->asid, sev_reclaim_asid_bitmap);
|
2020-03-24 10:41:54 +01:00
|
|
|
|
|
|
|
for_each_possible_cpu(cpu) {
|
2022-11-09 09:07:55 -05:00
|
|
|
sd = per_cpu_ptr(&svm_data, cpu);
|
KVM: SVM: Fix off-by-one indexing when nullifying last used SEV VMCB
Use the raw ASID, not ASID-1, when nullifying the last used VMCB when
freeing an SEV ASID. The consumer, pre_sev_run(), indexes the array by
the raw ASID, thus KVM could get a false negative when checking for a
different VMCB if KVM manages to reallocate the same ASID+VMCB combo for
a new VM.
Note, this cannot cause a functional issue _in the current code_, as
pre_sev_run() also checks which pCPU last did VMRUN for the vCPU, and
last_vmentry_cpu is initialized to -1 during vCPU creation, i.e. is
guaranteed to mismatch on the first VMRUN. However, prior to commit
8a14fe4f0c54 ("kvm: x86: Move last_cpu into kvm_vcpu_arch as
last_vmentry_cpu"), SVM tracked pCPU on its own and zero-initialized the
last_cpu variable. Thus it's theoretically possible that older versions
of KVM could miss a TLB flush if the first VMRUN is on pCPU0 and the ASID
and VMCB exactly match those of a prior VM.
Fixes: 70cd94e60c73 ("KVM: SVM: VMRUN should use associated ASID when SEV is enabled")
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-03 09:27:46 -07:00
|
|
|
sd->sev_vmcbs[sev->asid] = NULL;
|
2020-03-24 10:41:54 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
mutex_unlock(&sev_bitmap_lock);
|
2021-03-29 21:42:06 -07:00
|
|
|
|
2021-11-11 10:02:26 -05:00
|
|
|
sev_misc_cg_uncharge(sev);
|
2021-03-29 21:42:06 -07:00
|
|
|
put_misc_cg(sev->misc_cg);
|
|
|
|
sev->misc_cg = NULL;
|
2020-03-24 10:41:54 +01:00
|
|
|
}
|
|
|
|
|
2021-06-10 17:46:04 +00:00
|
|
|
static void sev_decommission(unsigned int handle)
|
2020-03-24 10:41:54 +01:00
|
|
|
{
|
2021-04-06 15:49:52 -07:00
|
|
|
struct sev_data_decommission decommission;
|
2021-06-10 17:46:04 +00:00
|
|
|
|
|
|
|
if (!handle)
|
|
|
|
return;
|
|
|
|
|
|
|
|
decommission.handle = handle;
|
|
|
|
sev_guest_decommission(&decommission, NULL);
|
|
|
|
}
|
|
|
|
|
2024-05-28 15:58:09 -05:00
|
|
|
/*
|
|
|
|
* Transition a page to hypervisor-owned/shared state in the RMP table. This
|
|
|
|
* should not fail under normal conditions, but leak the page should that
|
|
|
|
* happen since it will no longer be usable by the host due to RMP protections.
|
|
|
|
*/
|
|
|
|
static int kvm_rmp_make_shared(struct kvm *kvm, u64 pfn, enum pg_level level)
|
|
|
|
{
|
|
|
|
if (KVM_BUG_ON(rmp_make_shared(pfn, level), kvm)) {
|
|
|
|
snp_leak_pages(pfn, page_level_size(level) >> PAGE_SHIFT);
|
|
|
|
return -EIO;
|
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2024-05-01 03:51:56 -05:00
|
|
|
/*
|
|
|
|
* Certain page-states, such as Pre-Guest and Firmware pages (as documented
|
|
|
|
* in Chapter 5 of the SEV-SNP Firmware ABI under "Page States") cannot be
|
|
|
|
* directly transitioned back to normal/hypervisor-owned state via RMPUPDATE
|
|
|
|
* unless they are reclaimed first.
|
|
|
|
*
|
|
|
|
* Until they are reclaimed and subsequently transitioned via RMPUPDATE, they
|
|
|
|
* might not be usable by the host due to being set as immutable or still
|
|
|
|
* being associated with a guest ASID.
|
2024-05-28 15:58:09 -05:00
|
|
|
*
|
|
|
|
* Bug the VM and leak the page if reclaim fails, or if the RMP entry can't be
|
|
|
|
* converted back to shared, as the page is no longer usable due to RMP
|
|
|
|
* protections, and it's infeasible for the guest to continue on.
|
2024-05-01 03:51:56 -05:00
|
|
|
*/
|
2024-05-28 15:58:09 -05:00
|
|
|
static int snp_page_reclaim(struct kvm *kvm, u64 pfn)
|
2024-05-01 03:51:56 -05:00
|
|
|
{
|
|
|
|
struct sev_data_snp_page_reclaim data = {0};
|
2024-05-28 15:58:09 -05:00
|
|
|
int fw_err, rc;
|
2024-05-01 03:51:56 -05:00
|
|
|
|
|
|
|
data.paddr = __sme_set(pfn << PAGE_SHIFT);
|
2024-05-28 15:58:09 -05:00
|
|
|
rc = sev_do_cmd(SEV_CMD_SNP_PAGE_RECLAIM, &data, &fw_err);
|
|
|
|
if (KVM_BUG(rc, kvm, "Failed to reclaim PFN %llx, rc %d fw_err %d", pfn, rc, fw_err)) {
|
2024-05-01 03:51:56 -05:00
|
|
|
snp_leak_pages(pfn, 1);
|
2024-05-28 15:58:09 -05:00
|
|
|
return -EIO;
|
|
|
|
}
|
2024-05-01 03:51:56 -05:00
|
|
|
|
2024-05-28 15:58:09 -05:00
|
|
|
if (kvm_rmp_make_shared(kvm, pfn, PG_LEVEL_4K))
|
|
|
|
return -EIO;
|
2024-05-01 03:51:56 -05:00
|
|
|
|
|
|
|
return rc;
|
|
|
|
}
|
|
|
|
|
2021-06-10 17:46:04 +00:00
|
|
|
static void sev_unbind_asid(struct kvm *kvm, unsigned int handle)
|
|
|
|
{
|
2021-04-06 15:49:52 -07:00
|
|
|
struct sev_data_deactivate deactivate;
|
2020-03-24 10:41:54 +01:00
|
|
|
|
|
|
|
if (!handle)
|
|
|
|
return;
|
|
|
|
|
2021-04-06 15:49:52 -07:00
|
|
|
deactivate.handle = handle;
|
2020-03-24 10:41:54 +01:00
|
|
|
|
|
|
|
/* Guard DEACTIVATE against WBINVD/DF_FLUSH used in ASID recycling */
|
|
|
|
down_read(&sev_deactivate_lock);
|
2021-04-06 15:49:52 -07:00
|
|
|
sev_guest_deactivate(&deactivate, NULL);
|
2020-03-24 10:41:54 +01:00
|
|
|
up_read(&sev_deactivate_lock);
|
|
|
|
|
2021-06-10 17:46:04 +00:00
|
|
|
sev_decommission(handle);
|
2020-03-24 10:41:54 +01:00
|
|
|
}
|
|
|
|
|
KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event
Version 2 of GHCB specification added support for the SNP Guest Request
Message NAE event. The event allows for an SEV-SNP guest to make
requests to the SEV-SNP firmware through the hypervisor using the
SNP_GUEST_REQUEST API defined in the SEV-SNP firmware specification.
This is used by guests primarily to request attestation reports from
firmware. There are other request types are available as well, but the
specifics of what guest requests are being made generally does not
affect how they are handled by the hypervisor, which only serves as a
proxy for the guest requests and firmware responses.
Implement handling for these events.
When an SNP Guest Request is issued, the guest will provide its own
request/response pages, which could in theory be passed along directly
to firmware. However, these pages would need special care:
- Both pages are from shared guest memory, so they need to be
protected from migration/etc. occurring while firmware reads/writes
to them. At a minimum, this requires elevating the ref counts and
potentially needing an explicit pinning of the memory. This places
additional restrictions on what type of memory backends userspace
can use for shared guest memory since there would be some reliance
on using refcounted pages.
- The response page needs to be switched to Firmware-owned state
before the firmware can write to it, which can lead to potential
host RMP #PFs if the guest is misbehaved and hands the host a
guest page that KVM is writing to for other reasons (e.g. virtio
buffers).
Both of these issues can be avoided completely by using
separately-allocated bounce pages for both the request/response pages
and passing those to firmware instead. So that's the approach taken
here.
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Co-developed-by: Alexey Kardashevskiy <aik@amd.com>
Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
[mdr: ensure FW command failures are indicated to guest, drop extended
request handling to be re-written as separate patch, massage commit]
Signed-off-by: Michael Roth <michael.roth@amd.com>
Message-ID: <20240701223148.3798365-2-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-07-01 17:31:46 -05:00
|
|
|
/*
|
|
|
|
* This sets up bounce buffers/firmware pages to handle SNP Guest Request
|
|
|
|
* messages (e.g. attestation requests). See "SNP Guest Request" in the GHCB
|
|
|
|
* 2.0 specification for more details.
|
|
|
|
*
|
|
|
|
* Technically, when an SNP Guest Request is issued, the guest will provide its
|
|
|
|
* own request/response pages, which could in theory be passed along directly
|
|
|
|
* to firmware rather than using bounce pages. However, these pages would need
|
|
|
|
* special care:
|
|
|
|
*
|
|
|
|
* - Both pages are from shared guest memory, so they need to be protected
|
|
|
|
* from migration/etc. occurring while firmware reads/writes to them. At a
|
|
|
|
* minimum, this requires elevating the ref counts and potentially needing
|
|
|
|
* an explicit pinning of the memory. This places additional restrictions
|
|
|
|
* on what type of memory backends userspace can use for shared guest
|
|
|
|
* memory since there is some reliance on using refcounted pages.
|
|
|
|
*
|
|
|
|
* - The response page needs to be switched to Firmware-owned[1] state
|
|
|
|
* before the firmware can write to it, which can lead to potential
|
|
|
|
* host RMP #PFs if the guest is misbehaved and hands the host a
|
|
|
|
* guest page that KVM might write to for other reasons (e.g. virtio
|
|
|
|
* buffers/etc.).
|
|
|
|
*
|
|
|
|
* Both of these issues can be avoided completely by using separately-allocated
|
|
|
|
* bounce pages for both the request/response pages and passing those to
|
|
|
|
* firmware instead. So that's what is being set up here.
|
|
|
|
*
|
|
|
|
* Guest requests rely on message sequence numbers to ensure requests are
|
|
|
|
* issued to firmware in the order the guest issues them, so concurrent guest
|
|
|
|
* requests generally shouldn't happen. But a misbehaved guest could issue
|
|
|
|
* concurrent guest requests in theory, so a mutex is used to serialize
|
|
|
|
* access to the bounce buffers.
|
|
|
|
*
|
|
|
|
* [1] See the "Page States" section of the SEV-SNP Firmware ABI for more
|
|
|
|
* details on Firmware-owned pages, along with "RMP and VMPL Access Checks"
|
|
|
|
* in the APM for details on the related RMP restrictions.
|
|
|
|
*/
|
|
|
|
static int snp_guest_req_init(struct kvm *kvm)
|
|
|
|
{
|
|
|
|
struct kvm_sev_info *sev = to_kvm_sev_info(kvm);
|
|
|
|
struct page *req_page;
|
|
|
|
|
|
|
|
req_page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
|
|
|
|
if (!req_page)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
|
|
|
sev->guest_resp_buf = snp_alloc_firmware_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
|
|
|
|
if (!sev->guest_resp_buf) {
|
|
|
|
__free_page(req_page);
|
|
|
|
return -EIO;
|
|
|
|
}
|
|
|
|
|
|
|
|
sev->guest_req_buf = page_address(req_page);
|
|
|
|
mutex_init(&sev->guest_req_mutex);
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static void snp_guest_req_cleanup(struct kvm *kvm)
|
|
|
|
{
|
|
|
|
struct kvm_sev_info *sev = to_kvm_sev_info(kvm);
|
|
|
|
|
|
|
|
if (sev->guest_resp_buf)
|
|
|
|
snp_free_firmware_page(sev->guest_resp_buf);
|
|
|
|
|
|
|
|
if (sev->guest_req_buf)
|
|
|
|
__free_page(virt_to_page(sev->guest_req_buf));
|
|
|
|
|
|
|
|
sev->guest_req_buf = NULL;
|
|
|
|
sev->guest_resp_buf = NULL;
|
|
|
|
}
|
|
|
|
|
2024-04-04 08:13:22 -04:00
|
|
|
static int __sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp,
|
|
|
|
struct kvm_sev_init *data,
|
|
|
|
unsigned long vm_type)
|
2020-03-24 10:41:54 +01:00
|
|
|
{
|
2025-01-23 11:21:40 +05:30
|
|
|
struct kvm_sev_info *sev = to_kvm_sev_info(kvm);
|
crypto: ccp: Add support to initialize the AMD-SP for SEV-SNP
Before SNP VMs can be launched, the platform must be appropriately
configured and initialized via the SNP_INIT command.
During the execution of SNP_INIT command, the firmware configures
and enables SNP security policy enforcement in many system components.
Some system components write to regions of memory reserved by early
x86 firmware (e.g. UEFI). Other system components write to regions
provided by the operation system, hypervisor, or x86 firmware.
Such system components can only write to HV-fixed pages or Default
pages. They will error when attempting to write to pages in other page
states after SNP_INIT enables their SNP enforcement.
Starting in SNP firmware v1.52, the SNP_INIT_EX command takes a list of
system physical address ranges to convert into the HV-fixed page states
during the RMP initialization. If INIT_RMP is 1, hypervisors should
provide all system physical address ranges that the hypervisor will
never assign to a guest until the next RMP re-initialization.
For instance, the memory that UEFI reserves should be included in the
range list. This allows system components that occasionally write to
memory (e.g. logging to UEFI reserved regions) to not fail due to
RMP initialization and SNP enablement.
Note that SNP_INIT(_EX) must not be executed while non-SEV guests are
executing, otherwise it is possible that the system could reset or hang.
The psp_init_on_probe module parameter was added for SEV/SEV-ES support
and the init_ex_path module parameter to allow for time for the
necessary file system to be mounted/available.
SNP_INIT(_EX) does not use the file associated with init_ex_path. So, to
avoid running into issues where SNP_INIT(_EX) is called while there are
other running guests, issue it during module probe regardless of the
psp_init_on_probe setting, but maintain the previous deferrable handling
for SEV/SEV-ES initialization.
[ mdr: Squash in psp_init_on_probe changes from Tom, reduce
proliferation of 'probe' function parameter where possible.
bp: Fix 32-bit allmodconfig build. ]
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Co-developed-by: Jarkko Sakkinen <jarkko@profian.com>
Signed-off-by: Jarkko Sakkinen <jarkko@profian.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20240126041126.1927228-14-michael.roth@amd.com
2024-01-25 22:11:13 -06:00
|
|
|
struct sev_platform_init_args init_args = {0};
|
2024-04-04 08:13:22 -04:00
|
|
|
bool es_active = vm_type != KVM_X86_SEV_VM;
|
|
|
|
u64 valid_vmsa_features = es_active ? sev_supported_vmsa_features : 0;
|
2024-01-31 15:56:06 -08:00
|
|
|
int ret;
|
2020-03-24 10:41:54 +01:00
|
|
|
|
2021-03-30 20:19:36 -07:00
|
|
|
if (kvm->created_vcpus)
|
|
|
|
return -EINVAL;
|
|
|
|
|
2024-04-04 08:13:22 -04:00
|
|
|
if (data->flags)
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
if (data->vmsa_features & ~valid_vmsa_features)
|
2024-04-04 08:13:20 -04:00
|
|
|
return -EINVAL;
|
|
|
|
|
2024-05-01 02:10:48 -05:00
|
|
|
if (data->ghcb_version > GHCB_VERSION_MAX || (!es_active && data->ghcb_version))
|
|
|
|
return -EINVAL;
|
|
|
|
|
2020-03-24 10:41:54 +01:00
|
|
|
if (unlikely(sev->active))
|
2024-01-31 15:56:09 -08:00
|
|
|
return -EINVAL;
|
2020-03-24 10:41:54 +01:00
|
|
|
|
2021-11-09 21:50:58 +00:00
|
|
|
sev->active = true;
|
2024-04-04 08:13:22 -04:00
|
|
|
sev->es_active = es_active;
|
|
|
|
sev->vmsa_features = data->vmsa_features;
|
2024-05-01 02:10:48 -05:00
|
|
|
sev->ghcb_version = data->ghcb_version;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Currently KVM supports the full range of mandatory features defined
|
|
|
|
* by version 2 of the GHCB protocol, so default to that for SEV-ES
|
|
|
|
* guests created via KVM_SEV_INIT2.
|
|
|
|
*/
|
|
|
|
if (sev->es_active && !sev->ghcb_version)
|
|
|
|
sev->ghcb_version = GHCB_VERSION_DEFAULT;
|
2024-04-04 08:13:16 -04:00
|
|
|
|
2024-05-01 03:51:54 -05:00
|
|
|
if (vm_type == KVM_X86_SNP_VM)
|
|
|
|
sev->vmsa_features |= SVM_SEV_FEAT_SNP_ACTIVE;
|
|
|
|
|
2024-01-31 15:56:06 -08:00
|
|
|
ret = sev_asid_new(sev);
|
|
|
|
if (ret)
|
2021-04-22 02:39:48 -04:00
|
|
|
goto e_no_asid;
|
2020-03-24 10:41:54 +01:00
|
|
|
|
crypto: ccp: Add support to initialize the AMD-SP for SEV-SNP
Before SNP VMs can be launched, the platform must be appropriately
configured and initialized via the SNP_INIT command.
During the execution of SNP_INIT command, the firmware configures
and enables SNP security policy enforcement in many system components.
Some system components write to regions of memory reserved by early
x86 firmware (e.g. UEFI). Other system components write to regions
provided by the operation system, hypervisor, or x86 firmware.
Such system components can only write to HV-fixed pages or Default
pages. They will error when attempting to write to pages in other page
states after SNP_INIT enables their SNP enforcement.
Starting in SNP firmware v1.52, the SNP_INIT_EX command takes a list of
system physical address ranges to convert into the HV-fixed page states
during the RMP initialization. If INIT_RMP is 1, hypervisors should
provide all system physical address ranges that the hypervisor will
never assign to a guest until the next RMP re-initialization.
For instance, the memory that UEFI reserves should be included in the
range list. This allows system components that occasionally write to
memory (e.g. logging to UEFI reserved regions) to not fail due to
RMP initialization and SNP enablement.
Note that SNP_INIT(_EX) must not be executed while non-SEV guests are
executing, otherwise it is possible that the system could reset or hang.
The psp_init_on_probe module parameter was added for SEV/SEV-ES support
and the init_ex_path module parameter to allow for time for the
necessary file system to be mounted/available.
SNP_INIT(_EX) does not use the file associated with init_ex_path. So, to
avoid running into issues where SNP_INIT(_EX) is called while there are
other running guests, issue it during module probe regardless of the
psp_init_on_probe setting, but maintain the previous deferrable handling
for SEV/SEV-ES initialization.
[ mdr: Squash in psp_init_on_probe changes from Tom, reduce
proliferation of 'probe' function parameter where possible.
bp: Fix 32-bit allmodconfig build. ]
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Co-developed-by: Jarkko Sakkinen <jarkko@profian.com>
Signed-off-by: Jarkko Sakkinen <jarkko@profian.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20240126041126.1927228-14-michael.roth@amd.com
2024-01-25 22:11:13 -06:00
|
|
|
init_args.probe = false;
|
|
|
|
ret = sev_platform_init(&init_args);
|
2020-03-24 10:41:54 +01:00
|
|
|
if (ret)
|
KVM: SVM: Flush cache only on CPUs running SEV guest
On AMD CPUs without ensuring cache consistency, each memory page
reclamation in an SEV guest triggers a call to do WBNOINVD/WBINVD on all
CPUs, thereby affecting the performance of other programs on the host.
Typically, an AMD server may have 128 cores or more, while the SEV guest
might only utilize 8 of these cores. Meanwhile, host can use qemu-affinity
to bind these 8 vCPUs to specific physical CPUs.
Therefore, keeping a record of the physical core numbers each time a vCPU
runs can help avoid flushing the cache for all CPUs every time.
Take care to allocate the cpumask used to track which CPUs have run a
vCPU when copying or moving an "encryption context", as nothing guarantees
memory in a mirror VM is a strict subset of the ASID owner, and the
destination VM for intrahost migration needs to maintain it's own set of
CPUs. E.g. for intrahost migration, if a CPU was used for the source VM
but not the destination VM, then it can only have cached memory that was
accessible to the source VM. And a CPU that was run in the source is also
used by the destination is no different than a CPU that was run in the
destination only.
Note, KVM is guaranteed to do flush caches prior to sev_vm_destroy(),
thanks to kvm_arch_guest_memory_reclaimed for SEV and SEV-ES, and
kvm_arch_gmem_invalidate() for SEV-SNP. I.e. it's safe to free the
cpumask prior to unregistering encrypted regions and freeing the ASID.
Opportunistically clean up sev_vm_destroy()'s comment regarding what is
(implicitly, what isn't) skipped for mirror VMs.
Cc: Srikanth Aithal <sraithal@amd.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Zheyun Shen <szy0127@sjtu.edu.cn>
Link: https://lore.kernel.org/r/20250522233733.3176144-9-seanjc@google.com
Link: https://lore.kernel.org/all/935a82e3-f7ad-47d7-aaaf-f3d2b62ed768@amd.com
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-05-22 16:37:32 -07:00
|
|
|
goto e_free_asid;
|
|
|
|
|
|
|
|
if (!zalloc_cpumask_var(&sev->have_run_cpus, GFP_KERNEL_ACCOUNT)) {
|
|
|
|
ret = -ENOMEM;
|
|
|
|
goto e_free_asid;
|
|
|
|
}
|
2020-03-24 10:41:54 +01:00
|
|
|
|
KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event
Version 2 of GHCB specification added support for the SNP Guest Request
Message NAE event. The event allows for an SEV-SNP guest to make
requests to the SEV-SNP firmware through the hypervisor using the
SNP_GUEST_REQUEST API defined in the SEV-SNP firmware specification.
This is used by guests primarily to request attestation reports from
firmware. There are other request types are available as well, but the
specifics of what guest requests are being made generally does not
affect how they are handled by the hypervisor, which only serves as a
proxy for the guest requests and firmware responses.
Implement handling for these events.
When an SNP Guest Request is issued, the guest will provide its own
request/response pages, which could in theory be passed along directly
to firmware. However, these pages would need special care:
- Both pages are from shared guest memory, so they need to be
protected from migration/etc. occurring while firmware reads/writes
to them. At a minimum, this requires elevating the ref counts and
potentially needing an explicit pinning of the memory. This places
additional restrictions on what type of memory backends userspace
can use for shared guest memory since there would be some reliance
on using refcounted pages.
- The response page needs to be switched to Firmware-owned state
before the firmware can write to it, which can lead to potential
host RMP #PFs if the guest is misbehaved and hands the host a
guest page that KVM is writing to for other reasons (e.g. virtio
buffers).
Both of these issues can be avoided completely by using
separately-allocated bounce pages for both the request/response pages
and passing those to firmware instead. So that's the approach taken
here.
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Co-developed-by: Alexey Kardashevskiy <aik@amd.com>
Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
[mdr: ensure FW command failures are indicated to guest, drop extended
request handling to be re-written as separate patch, massage commit]
Signed-off-by: Michael Roth <michael.roth@amd.com>
Message-ID: <20240701223148.3798365-2-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-07-01 17:31:46 -05:00
|
|
|
/* This needs to happen after SEV/SNP firmware initialization. */
|
2024-10-31 13:32:14 -07:00
|
|
|
if (vm_type == KVM_X86_SNP_VM) {
|
|
|
|
ret = snp_guest_req_init(kvm);
|
|
|
|
if (ret)
|
|
|
|
goto e_free;
|
|
|
|
}
|
KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event
Version 2 of GHCB specification added support for the SNP Guest Request
Message NAE event. The event allows for an SEV-SNP guest to make
requests to the SEV-SNP firmware through the hypervisor using the
SNP_GUEST_REQUEST API defined in the SEV-SNP firmware specification.
This is used by guests primarily to request attestation reports from
firmware. There are other request types are available as well, but the
specifics of what guest requests are being made generally does not
affect how they are handled by the hypervisor, which only serves as a
proxy for the guest requests and firmware responses.
Implement handling for these events.
When an SNP Guest Request is issued, the guest will provide its own
request/response pages, which could in theory be passed along directly
to firmware. However, these pages would need special care:
- Both pages are from shared guest memory, so they need to be
protected from migration/etc. occurring while firmware reads/writes
to them. At a minimum, this requires elevating the ref counts and
potentially needing an explicit pinning of the memory. This places
additional restrictions on what type of memory backends userspace
can use for shared guest memory since there would be some reliance
on using refcounted pages.
- The response page needs to be switched to Firmware-owned state
before the firmware can write to it, which can lead to potential
host RMP #PFs if the guest is misbehaved and hands the host a
guest page that KVM is writing to for other reasons (e.g. virtio
buffers).
Both of these issues can be avoided completely by using
separately-allocated bounce pages for both the request/response pages
and passing those to firmware instead. So that's the approach taken
here.
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Co-developed-by: Alexey Kardashevskiy <aik@amd.com>
Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
[mdr: ensure FW command failures are indicated to guest, drop extended
request handling to be re-written as separate patch, massage commit]
Signed-off-by: Michael Roth <michael.roth@amd.com>
Message-ID: <20240701223148.3798365-2-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-07-01 17:31:46 -05:00
|
|
|
|
2020-03-24 10:41:54 +01:00
|
|
|
INIT_LIST_HEAD(&sev->regions_list);
|
2022-02-11 11:36:34 -08:00
|
|
|
INIT_LIST_HEAD(&sev->mirror_vms);
|
2024-04-04 08:13:20 -04:00
|
|
|
sev->need_init = false;
|
2020-03-24 10:41:54 +01:00
|
|
|
|
2022-04-08 08:37:10 -05:00
|
|
|
kvm_set_apicv_inhibit(kvm, APICV_INHIBIT_REASON_SEV);
|
|
|
|
|
2020-03-24 10:41:54 +01:00
|
|
|
return 0;
|
|
|
|
|
|
|
|
e_free:
|
KVM: SVM: Flush cache only on CPUs running SEV guest
On AMD CPUs without ensuring cache consistency, each memory page
reclamation in an SEV guest triggers a call to do WBNOINVD/WBINVD on all
CPUs, thereby affecting the performance of other programs on the host.
Typically, an AMD server may have 128 cores or more, while the SEV guest
might only utilize 8 of these cores. Meanwhile, host can use qemu-affinity
to bind these 8 vCPUs to specific physical CPUs.
Therefore, keeping a record of the physical core numbers each time a vCPU
runs can help avoid flushing the cache for all CPUs every time.
Take care to allocate the cpumask used to track which CPUs have run a
vCPU when copying or moving an "encryption context", as nothing guarantees
memory in a mirror VM is a strict subset of the ASID owner, and the
destination VM for intrahost migration needs to maintain it's own set of
CPUs. E.g. for intrahost migration, if a CPU was used for the source VM
but not the destination VM, then it can only have cached memory that was
accessible to the source VM. And a CPU that was run in the source is also
used by the destination is no different than a CPU that was run in the
destination only.
Note, KVM is guaranteed to do flush caches prior to sev_vm_destroy(),
thanks to kvm_arch_guest_memory_reclaimed for SEV and SEV-ES, and
kvm_arch_gmem_invalidate() for SEV-SNP. I.e. it's safe to free the
cpumask prior to unregistering encrypted regions and freeing the ASID.
Opportunistically clean up sev_vm_destroy()'s comment regarding what is
(implicitly, what isn't) skipped for mirror VMs.
Cc: Srikanth Aithal <sraithal@amd.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Zheyun Shen <szy0127@sjtu.edu.cn>
Link: https://lore.kernel.org/r/20250522233733.3176144-9-seanjc@google.com
Link: https://lore.kernel.org/all/935a82e3-f7ad-47d7-aaaf-f3d2b62ed768@amd.com
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-05-22 16:37:32 -07:00
|
|
|
free_cpumask_var(sev->have_run_cpus);
|
|
|
|
e_free_asid:
|
crypto: ccp: Add support to initialize the AMD-SP for SEV-SNP
Before SNP VMs can be launched, the platform must be appropriately
configured and initialized via the SNP_INIT command.
During the execution of SNP_INIT command, the firmware configures
and enables SNP security policy enforcement in many system components.
Some system components write to regions of memory reserved by early
x86 firmware (e.g. UEFI). Other system components write to regions
provided by the operation system, hypervisor, or x86 firmware.
Such system components can only write to HV-fixed pages or Default
pages. They will error when attempting to write to pages in other page
states after SNP_INIT enables their SNP enforcement.
Starting in SNP firmware v1.52, the SNP_INIT_EX command takes a list of
system physical address ranges to convert into the HV-fixed page states
during the RMP initialization. If INIT_RMP is 1, hypervisors should
provide all system physical address ranges that the hypervisor will
never assign to a guest until the next RMP re-initialization.
For instance, the memory that UEFI reserves should be included in the
range list. This allows system components that occasionally write to
memory (e.g. logging to UEFI reserved regions) to not fail due to
RMP initialization and SNP enablement.
Note that SNP_INIT(_EX) must not be executed while non-SEV guests are
executing, otherwise it is possible that the system could reset or hang.
The psp_init_on_probe module parameter was added for SEV/SEV-ES support
and the init_ex_path module parameter to allow for time for the
necessary file system to be mounted/available.
SNP_INIT(_EX) does not use the file associated with init_ex_path. So, to
avoid running into issues where SNP_INIT(_EX) is called while there are
other running guests, issue it during module probe regardless of the
psp_init_on_probe setting, but maintain the previous deferrable handling
for SEV/SEV-ES initialization.
[ mdr: Squash in psp_init_on_probe changes from Tom, reduce
proliferation of 'probe' function parameter where possible.
bp: Fix 32-bit allmodconfig build. ]
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Co-developed-by: Jarkko Sakkinen <jarkko@profian.com>
Signed-off-by: Jarkko Sakkinen <jarkko@profian.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20240126041126.1927228-14-michael.roth@amd.com
2024-01-25 22:11:13 -06:00
|
|
|
argp->error = init_args.error;
|
2021-03-29 21:42:06 -07:00
|
|
|
sev_asid_free(sev);
|
|
|
|
sev->asid = 0;
|
2021-04-22 02:39:48 -04:00
|
|
|
e_no_asid:
|
2024-04-04 08:13:16 -04:00
|
|
|
sev->vmsa_features = 0;
|
2021-04-22 02:39:48 -04:00
|
|
|
sev->es_active = false;
|
2021-11-09 21:50:58 +00:00
|
|
|
sev->active = false;
|
2020-03-24 10:41:54 +01:00
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2024-04-04 08:13:22 -04:00
|
|
|
static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
|
|
|
|
{
|
|
|
|
struct kvm_sev_init data = {
|
|
|
|
.vmsa_features = 0,
|
2024-05-01 02:10:48 -05:00
|
|
|
.ghcb_version = 0,
|
2024-04-04 08:13:22 -04:00
|
|
|
};
|
|
|
|
unsigned long vm_type;
|
|
|
|
|
|
|
|
if (kvm->arch.vm_type != KVM_X86_DEFAULT_VM)
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
vm_type = (argp->id == KVM_SEV_INIT ? KVM_X86_SEV_VM : KVM_X86_SEV_ES_VM);
|
2024-05-01 02:10:48 -05:00
|
|
|
|
|
|
|
/*
|
|
|
|
* KVM_SEV_ES_INIT has been deprecated by KVM_SEV_INIT2, so it will
|
|
|
|
* continue to only ever support the minimal GHCB protocol version.
|
|
|
|
*/
|
|
|
|
if (vm_type == KVM_X86_SEV_ES_VM)
|
|
|
|
data.ghcb_version = GHCB_VERSION_MIN;
|
|
|
|
|
2024-04-04 08:13:22 -04:00
|
|
|
return __sev_guest_init(kvm, argp, &data, vm_type);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int sev_guest_init2(struct kvm *kvm, struct kvm_sev_cmd *argp)
|
|
|
|
{
|
|
|
|
struct kvm_sev_init data;
|
|
|
|
|
2025-01-23 11:21:40 +05:30
|
|
|
if (!to_kvm_sev_info(kvm)->need_init)
|
2024-04-04 08:13:22 -04:00
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
if (kvm->arch.vm_type != KVM_X86_SEV_VM &&
|
2024-05-01 03:51:54 -05:00
|
|
|
kvm->arch.vm_type != KVM_X86_SEV_ES_VM &&
|
|
|
|
kvm->arch.vm_type != KVM_X86_SNP_VM)
|
2024-04-04 08:13:22 -04:00
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
if (copy_from_user(&data, u64_to_user_ptr(argp->data), sizeof(data)))
|
|
|
|
return -EFAULT;
|
|
|
|
|
|
|
|
return __sev_guest_init(kvm, argp, &data, kvm->arch.vm_type);
|
|
|
|
}
|
|
|
|
|
2020-03-24 10:41:54 +01:00
|
|
|
static int sev_bind_asid(struct kvm *kvm, unsigned int handle, int *error)
|
|
|
|
{
|
2024-01-31 15:56:07 -08:00
|
|
|
unsigned int asid = sev_get_asid(kvm);
|
2021-04-06 15:49:52 -07:00
|
|
|
struct sev_data_activate activate;
|
2020-03-24 10:41:54 +01:00
|
|
|
int ret;
|
|
|
|
|
|
|
|
/* activate ASID on the given handle */
|
2021-04-06 15:49:52 -07:00
|
|
|
activate.handle = handle;
|
|
|
|
activate.asid = asid;
|
|
|
|
ret = sev_guest_activate(&activate, error);
|
2020-03-24 10:41:54 +01:00
|
|
|
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int __sev_issue_cmd(int fd, int id, void *data, int *error)
|
|
|
|
{
|
2024-07-19 20:17:58 -04:00
|
|
|
CLASS(fd, f)(fd);
|
2020-03-24 10:41:54 +01:00
|
|
|
|
2024-07-19 20:17:58 -04:00
|
|
|
if (fd_empty(f))
|
2020-03-24 10:41:54 +01:00
|
|
|
return -EBADF;
|
|
|
|
|
2024-07-19 20:17:58 -04:00
|
|
|
return sev_issue_cmd_external_user(fd_file(f), id, data, error);
|
2020-03-24 10:41:54 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
static int sev_issue_cmd(struct kvm *kvm, int id, void *data, int *error)
|
|
|
|
{
|
2025-01-23 11:21:40 +05:30
|
|
|
struct kvm_sev_info *sev = to_kvm_sev_info(kvm);
|
2020-03-24 10:41:54 +01:00
|
|
|
|
|
|
|
return __sev_issue_cmd(sev->fd, id, data, error);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int sev_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
|
|
|
|
{
|
2025-01-23 11:21:40 +05:30
|
|
|
struct kvm_sev_info *sev = to_kvm_sev_info(kvm);
|
2021-04-06 15:49:52 -07:00
|
|
|
struct sev_data_launch_start start;
|
2020-03-24 10:41:54 +01:00
|
|
|
struct kvm_sev_launch_start params;
|
|
|
|
void *dh_blob, *session_blob;
|
|
|
|
int *error = &argp->error;
|
|
|
|
int ret;
|
|
|
|
|
|
|
|
if (!sev_guest(kvm))
|
|
|
|
return -ENOTTY;
|
|
|
|
|
2024-02-26 13:42:55 -05:00
|
|
|
if (copy_from_user(¶ms, u64_to_user_ptr(argp->data), sizeof(params)))
|
2020-03-24 10:41:54 +01:00
|
|
|
return -EFAULT;
|
|
|
|
|
2025-03-20 08:26:49 -05:00
|
|
|
sev->policy = params.policy;
|
|
|
|
|
2021-04-06 15:49:52 -07:00
|
|
|
memset(&start, 0, sizeof(start));
|
2020-03-24 10:41:54 +01:00
|
|
|
|
|
|
|
dh_blob = NULL;
|
|
|
|
if (params.dh_uaddr) {
|
|
|
|
dh_blob = psp_copy_user_blob(params.dh_uaddr, params.dh_len);
|
2021-04-06 15:49:52 -07:00
|
|
|
if (IS_ERR(dh_blob))
|
|
|
|
return PTR_ERR(dh_blob);
|
2020-03-24 10:41:54 +01:00
|
|
|
|
2021-04-06 15:49:52 -07:00
|
|
|
start.dh_cert_address = __sme_set(__pa(dh_blob));
|
|
|
|
start.dh_cert_len = params.dh_len;
|
2020-03-24 10:41:54 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
session_blob = NULL;
|
|
|
|
if (params.session_uaddr) {
|
|
|
|
session_blob = psp_copy_user_blob(params.session_uaddr, params.session_len);
|
|
|
|
if (IS_ERR(session_blob)) {
|
|
|
|
ret = PTR_ERR(session_blob);
|
|
|
|
goto e_free_dh;
|
|
|
|
}
|
|
|
|
|
2021-04-06 15:49:52 -07:00
|
|
|
start.session_address = __sme_set(__pa(session_blob));
|
|
|
|
start.session_len = params.session_len;
|
2020-03-24 10:41:54 +01:00
|
|
|
}
|
|
|
|
|
2021-04-06 15:49:52 -07:00
|
|
|
start.handle = params.handle;
|
|
|
|
start.policy = params.policy;
|
2020-03-24 10:41:54 +01:00
|
|
|
|
|
|
|
/* create memory encryption context */
|
2021-04-06 15:49:52 -07:00
|
|
|
ret = __sev_issue_cmd(argp->sev_fd, SEV_CMD_LAUNCH_START, &start, error);
|
2020-03-24 10:41:54 +01:00
|
|
|
if (ret)
|
|
|
|
goto e_free_session;
|
|
|
|
|
|
|
|
/* Bind ASID to this guest */
|
2021-04-06 15:49:52 -07:00
|
|
|
ret = sev_bind_asid(kvm, start.handle, error);
|
2021-06-10 17:46:04 +00:00
|
|
|
if (ret) {
|
|
|
|
sev_decommission(start.handle);
|
2020-03-24 10:41:54 +01:00
|
|
|
goto e_free_session;
|
2021-06-10 17:46:04 +00:00
|
|
|
}
|
2020-03-24 10:41:54 +01:00
|
|
|
|
|
|
|
/* return handle to userspace */
|
2021-04-06 15:49:52 -07:00
|
|
|
params.handle = start.handle;
|
2024-02-26 13:42:55 -05:00
|
|
|
if (copy_to_user(u64_to_user_ptr(argp->data), ¶ms, sizeof(params))) {
|
2021-04-06 15:49:52 -07:00
|
|
|
sev_unbind_asid(kvm, start.handle);
|
2020-03-24 10:41:54 +01:00
|
|
|
ret = -EFAULT;
|
|
|
|
goto e_free_session;
|
|
|
|
}
|
|
|
|
|
2021-04-06 15:49:52 -07:00
|
|
|
sev->handle = start.handle;
|
2020-03-24 10:41:54 +01:00
|
|
|
sev->fd = argp->sev_fd;
|
|
|
|
|
|
|
|
e_free_session:
|
|
|
|
kfree(session_blob);
|
|
|
|
e_free_dh:
|
|
|
|
kfree(dh_blob);
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
|
|
|
static struct page **sev_pin_memory(struct kvm *kvm, unsigned long uaddr,
|
|
|
|
unsigned long ulen, unsigned long *n,
|
2025-02-11 10:37:03 +08:00
|
|
|
unsigned int flags)
|
2020-03-24 10:41:54 +01:00
|
|
|
{
|
2025-01-23 11:21:40 +05:30
|
|
|
struct kvm_sev_info *sev = to_kvm_sev_info(kvm);
|
2020-05-25 23:22:06 -07:00
|
|
|
unsigned long npages, size;
|
|
|
|
int npinned;
|
2020-03-24 10:41:54 +01:00
|
|
|
unsigned long locked, lock_limit;
|
|
|
|
struct page **pages;
|
|
|
|
unsigned long first, last;
|
2020-07-14 17:23:51 +03:00
|
|
|
int ret;
|
2020-03-24 10:41:54 +01:00
|
|
|
|
2021-01-27 08:15:24 -08:00
|
|
|
lockdep_assert_held(&kvm->lock);
|
|
|
|
|
2020-03-24 10:41:54 +01:00
|
|
|
if (ulen == 0 || uaddr + ulen < uaddr)
|
2020-06-23 05:12:24 -04:00
|
|
|
return ERR_PTR(-EINVAL);
|
2020-03-24 10:41:54 +01:00
|
|
|
|
|
|
|
/* Calculate number of pages. */
|
|
|
|
first = (uaddr & PAGE_MASK) >> PAGE_SHIFT;
|
|
|
|
last = ((uaddr + ulen - 1) & PAGE_MASK) >> PAGE_SHIFT;
|
|
|
|
npages = (last - first + 1);
|
|
|
|
|
|
|
|
locked = sev->pages_locked + npages;
|
|
|
|
lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
|
|
|
|
if (locked > lock_limit && !capable(CAP_IPC_LOCK)) {
|
|
|
|
pr_err("SEV: %lu locked pages exceed the lock limit of %lu.\n", locked, lock_limit);
|
2020-06-23 05:12:24 -04:00
|
|
|
return ERR_PTR(-ENOMEM);
|
2020-03-24 10:41:54 +01:00
|
|
|
}
|
|
|
|
|
2020-05-25 23:22:06 -07:00
|
|
|
if (WARN_ON_ONCE(npages > INT_MAX))
|
2020-06-23 05:12:24 -04:00
|
|
|
return ERR_PTR(-EINVAL);
|
2020-05-25 23:22:06 -07:00
|
|
|
|
2020-03-24 10:41:54 +01:00
|
|
|
/* Avoid using vmalloc for smaller buffers. */
|
|
|
|
size = npages * sizeof(struct page *);
|
|
|
|
if (size > PAGE_SIZE)
|
2024-03-09 18:15:45 +01:00
|
|
|
pages = __vmalloc(size, GFP_KERNEL_ACCOUNT);
|
2020-03-24 10:41:54 +01:00
|
|
|
else
|
|
|
|
pages = kmalloc(size, GFP_KERNEL_ACCOUNT);
|
|
|
|
|
|
|
|
if (!pages)
|
2020-06-23 05:12:24 -04:00
|
|
|
return ERR_PTR(-ENOMEM);
|
2020-03-24 10:41:54 +01:00
|
|
|
|
|
|
|
/* Pin the user virtual address. */
|
2025-02-11 10:37:03 +08:00
|
|
|
npinned = pin_user_pages_fast(uaddr, npages, flags, pages);
|
2020-03-24 10:41:54 +01:00
|
|
|
if (npinned != npages) {
|
|
|
|
pr_err("SEV: Failure locking %lu pages.\n", npages);
|
2020-07-14 17:23:51 +03:00
|
|
|
ret = -ENOMEM;
|
2020-03-24 10:41:54 +01:00
|
|
|
goto err;
|
|
|
|
}
|
|
|
|
|
|
|
|
*n = npages;
|
|
|
|
sev->pages_locked = locked;
|
|
|
|
|
|
|
|
return pages;
|
|
|
|
|
|
|
|
err:
|
2020-07-14 17:23:51 +03:00
|
|
|
if (npinned > 0)
|
2020-05-25 23:22:07 -07:00
|
|
|
unpin_user_pages(pages, npinned);
|
2020-03-24 10:41:54 +01:00
|
|
|
|
|
|
|
kvfree(pages);
|
2020-07-14 17:23:51 +03:00
|
|
|
return ERR_PTR(ret);
|
2020-03-24 10:41:54 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
static void sev_unpin_memory(struct kvm *kvm, struct page **pages,
|
|
|
|
unsigned long npages)
|
|
|
|
{
|
2020-05-25 23:22:07 -07:00
|
|
|
unpin_user_pages(pages, npages);
|
2020-03-24 10:41:54 +01:00
|
|
|
kvfree(pages);
|
2025-01-23 11:21:40 +05:30
|
|
|
to_kvm_sev_info(kvm)->pages_locked -= npages;
|
2020-03-24 10:41:54 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
static void sev_clflush_pages(struct page *pages[], unsigned long npages)
|
|
|
|
{
|
|
|
|
uint8_t *page_virtual;
|
|
|
|
unsigned long i;
|
|
|
|
|
2020-09-17 21:20:38 +00:00
|
|
|
if (this_cpu_has(X86_FEATURE_SME_COHERENT) || npages == 0 ||
|
|
|
|
pages == NULL)
|
2020-03-24 10:41:54 +01:00
|
|
|
return;
|
|
|
|
|
|
|
|
for (i = 0; i < npages; i++) {
|
2022-09-28 17:27:48 +08:00
|
|
|
page_virtual = kmap_local_page(pages[i]);
|
2020-03-24 10:41:54 +01:00
|
|
|
clflush_cache_range(page_virtual, PAGE_SIZE);
|
2022-09-28 17:27:48 +08:00
|
|
|
kunmap_local(page_virtual);
|
2022-03-30 09:43:06 -07:00
|
|
|
cond_resched();
|
2020-03-24 10:41:54 +01:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
KVM: SVM: Flush cache only on CPUs running SEV guest
On AMD CPUs without ensuring cache consistency, each memory page
reclamation in an SEV guest triggers a call to do WBNOINVD/WBINVD on all
CPUs, thereby affecting the performance of other programs on the host.
Typically, an AMD server may have 128 cores or more, while the SEV guest
might only utilize 8 of these cores. Meanwhile, host can use qemu-affinity
to bind these 8 vCPUs to specific physical CPUs.
Therefore, keeping a record of the physical core numbers each time a vCPU
runs can help avoid flushing the cache for all CPUs every time.
Take care to allocate the cpumask used to track which CPUs have run a
vCPU when copying or moving an "encryption context", as nothing guarantees
memory in a mirror VM is a strict subset of the ASID owner, and the
destination VM for intrahost migration needs to maintain it's own set of
CPUs. E.g. for intrahost migration, if a CPU was used for the source VM
but not the destination VM, then it can only have cached memory that was
accessible to the source VM. And a CPU that was run in the source is also
used by the destination is no different than a CPU that was run in the
destination only.
Note, KVM is guaranteed to do flush caches prior to sev_vm_destroy(),
thanks to kvm_arch_guest_memory_reclaimed for SEV and SEV-ES, and
kvm_arch_gmem_invalidate() for SEV-SNP. I.e. it's safe to free the
cpumask prior to unregistering encrypted regions and freeing the ASID.
Opportunistically clean up sev_vm_destroy()'s comment regarding what is
(implicitly, what isn't) skipped for mirror VMs.
Cc: Srikanth Aithal <sraithal@amd.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Zheyun Shen <szy0127@sjtu.edu.cn>
Link: https://lore.kernel.org/r/20250522233733.3176144-9-seanjc@google.com
Link: https://lore.kernel.org/all/935a82e3-f7ad-47d7-aaaf-f3d2b62ed768@amd.com
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-05-22 16:37:32 -07:00
|
|
|
static void sev_writeback_caches(struct kvm *kvm)
|
2025-05-22 16:37:31 -07:00
|
|
|
{
|
KVM: SVM: Flush cache only on CPUs running SEV guest
On AMD CPUs without ensuring cache consistency, each memory page
reclamation in an SEV guest triggers a call to do WBNOINVD/WBINVD on all
CPUs, thereby affecting the performance of other programs on the host.
Typically, an AMD server may have 128 cores or more, while the SEV guest
might only utilize 8 of these cores. Meanwhile, host can use qemu-affinity
to bind these 8 vCPUs to specific physical CPUs.
Therefore, keeping a record of the physical core numbers each time a vCPU
runs can help avoid flushing the cache for all CPUs every time.
Take care to allocate the cpumask used to track which CPUs have run a
vCPU when copying or moving an "encryption context", as nothing guarantees
memory in a mirror VM is a strict subset of the ASID owner, and the
destination VM for intrahost migration needs to maintain it's own set of
CPUs. E.g. for intrahost migration, if a CPU was used for the source VM
but not the destination VM, then it can only have cached memory that was
accessible to the source VM. And a CPU that was run in the source is also
used by the destination is no different than a CPU that was run in the
destination only.
Note, KVM is guaranteed to do flush caches prior to sev_vm_destroy(),
thanks to kvm_arch_guest_memory_reclaimed for SEV and SEV-ES, and
kvm_arch_gmem_invalidate() for SEV-SNP. I.e. it's safe to free the
cpumask prior to unregistering encrypted regions and freeing the ASID.
Opportunistically clean up sev_vm_destroy()'s comment regarding what is
(implicitly, what isn't) skipped for mirror VMs.
Cc: Srikanth Aithal <sraithal@amd.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Zheyun Shen <szy0127@sjtu.edu.cn>
Link: https://lore.kernel.org/r/20250522233733.3176144-9-seanjc@google.com
Link: https://lore.kernel.org/all/935a82e3-f7ad-47d7-aaaf-f3d2b62ed768@amd.com
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-05-22 16:37:32 -07:00
|
|
|
/*
|
|
|
|
* Note, the caller is responsible for ensuring correctness if the mask
|
|
|
|
* can be modified, e.g. if a CPU could be doing VMRUN.
|
|
|
|
*/
|
|
|
|
if (cpumask_empty(to_kvm_sev_info(kvm)->have_run_cpus))
|
|
|
|
return;
|
|
|
|
|
2025-05-22 16:37:31 -07:00
|
|
|
/*
|
|
|
|
* Ensure that all dirty guest tagged cache entries are written back
|
|
|
|
* before releasing the pages back to the system for use. CLFLUSH will
|
|
|
|
* not do this without SME_COHERENT, and flushing many cache lines
|
|
|
|
* individually is slower than blasting WBINVD for large VMs, so issue
|
KVM: SVM: Flush cache only on CPUs running SEV guest
On AMD CPUs without ensuring cache consistency, each memory page
reclamation in an SEV guest triggers a call to do WBNOINVD/WBINVD on all
CPUs, thereby affecting the performance of other programs on the host.
Typically, an AMD server may have 128 cores or more, while the SEV guest
might only utilize 8 of these cores. Meanwhile, host can use qemu-affinity
to bind these 8 vCPUs to specific physical CPUs.
Therefore, keeping a record of the physical core numbers each time a vCPU
runs can help avoid flushing the cache for all CPUs every time.
Take care to allocate the cpumask used to track which CPUs have run a
vCPU when copying or moving an "encryption context", as nothing guarantees
memory in a mirror VM is a strict subset of the ASID owner, and the
destination VM for intrahost migration needs to maintain it's own set of
CPUs. E.g. for intrahost migration, if a CPU was used for the source VM
but not the destination VM, then it can only have cached memory that was
accessible to the source VM. And a CPU that was run in the source is also
used by the destination is no different than a CPU that was run in the
destination only.
Note, KVM is guaranteed to do flush caches prior to sev_vm_destroy(),
thanks to kvm_arch_guest_memory_reclaimed for SEV and SEV-ES, and
kvm_arch_gmem_invalidate() for SEV-SNP. I.e. it's safe to free the
cpumask prior to unregistering encrypted regions and freeing the ASID.
Opportunistically clean up sev_vm_destroy()'s comment regarding what is
(implicitly, what isn't) skipped for mirror VMs.
Cc: Srikanth Aithal <sraithal@amd.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Zheyun Shen <szy0127@sjtu.edu.cn>
Link: https://lore.kernel.org/r/20250522233733.3176144-9-seanjc@google.com
Link: https://lore.kernel.org/all/935a82e3-f7ad-47d7-aaaf-f3d2b62ed768@amd.com
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-05-22 16:37:32 -07:00
|
|
|
* WBNOINVD (or WBINVD if the "no invalidate" variant is unsupported)
|
|
|
|
* on CPUs that have done VMRUN, i.e. may have dirtied data using the
|
|
|
|
* VM's ASID.
|
|
|
|
*
|
|
|
|
* For simplicity, never remove CPUs from the bitmap. Ideally, KVM
|
|
|
|
* would clear the mask when flushing caches, but doing so requires
|
|
|
|
* serializing multiple calls and having responding CPUs (to the IPI)
|
|
|
|
* mark themselves as still running if they are running (or about to
|
|
|
|
* run) a vCPU for the VM.
|
2025-05-22 16:37:31 -07:00
|
|
|
*/
|
KVM: SVM: Flush cache only on CPUs running SEV guest
On AMD CPUs without ensuring cache consistency, each memory page
reclamation in an SEV guest triggers a call to do WBNOINVD/WBINVD on all
CPUs, thereby affecting the performance of other programs on the host.
Typically, an AMD server may have 128 cores or more, while the SEV guest
might only utilize 8 of these cores. Meanwhile, host can use qemu-affinity
to bind these 8 vCPUs to specific physical CPUs.
Therefore, keeping a record of the physical core numbers each time a vCPU
runs can help avoid flushing the cache for all CPUs every time.
Take care to allocate the cpumask used to track which CPUs have run a
vCPU when copying or moving an "encryption context", as nothing guarantees
memory in a mirror VM is a strict subset of the ASID owner, and the
destination VM for intrahost migration needs to maintain it's own set of
CPUs. E.g. for intrahost migration, if a CPU was used for the source VM
but not the destination VM, then it can only have cached memory that was
accessible to the source VM. And a CPU that was run in the source is also
used by the destination is no different than a CPU that was run in the
destination only.
Note, KVM is guaranteed to do flush caches prior to sev_vm_destroy(),
thanks to kvm_arch_guest_memory_reclaimed for SEV and SEV-ES, and
kvm_arch_gmem_invalidate() for SEV-SNP. I.e. it's safe to free the
cpumask prior to unregistering encrypted regions and freeing the ASID.
Opportunistically clean up sev_vm_destroy()'s comment regarding what is
(implicitly, what isn't) skipped for mirror VMs.
Cc: Srikanth Aithal <sraithal@amd.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Zheyun Shen <szy0127@sjtu.edu.cn>
Link: https://lore.kernel.org/r/20250522233733.3176144-9-seanjc@google.com
Link: https://lore.kernel.org/all/935a82e3-f7ad-47d7-aaaf-f3d2b62ed768@amd.com
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-05-22 16:37:32 -07:00
|
|
|
wbnoinvd_on_cpus_mask(to_kvm_sev_info(kvm)->have_run_cpus);
|
2025-05-22 16:37:31 -07:00
|
|
|
}
|
|
|
|
|
2020-03-24 10:41:54 +01:00
|
|
|
static unsigned long get_num_contig_pages(unsigned long idx,
|
|
|
|
struct page **inpages, unsigned long npages)
|
|
|
|
{
|
|
|
|
unsigned long paddr, next_paddr;
|
|
|
|
unsigned long i = idx + 1, pages = 1;
|
|
|
|
|
|
|
|
/* find the number of contiguous pages starting from idx */
|
|
|
|
paddr = __sme_page_pa(inpages[idx]);
|
|
|
|
while (i < npages) {
|
|
|
|
next_paddr = __sme_page_pa(inpages[i++]);
|
|
|
|
if ((paddr + PAGE_SIZE) == next_paddr) {
|
|
|
|
pages++;
|
|
|
|
paddr = next_paddr;
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
|
|
|
return pages;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int sev_launch_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp)
|
|
|
|
{
|
|
|
|
unsigned long vaddr, vaddr_end, next_vaddr, npages, pages, size, i;
|
|
|
|
struct kvm_sev_launch_update_data params;
|
2021-04-06 15:49:52 -07:00
|
|
|
struct sev_data_launch_update_data data;
|
2020-03-24 10:41:54 +01:00
|
|
|
struct page **inpages;
|
|
|
|
int ret;
|
|
|
|
|
|
|
|
if (!sev_guest(kvm))
|
|
|
|
return -ENOTTY;
|
|
|
|
|
2024-02-26 13:42:55 -05:00
|
|
|
if (copy_from_user(¶ms, u64_to_user_ptr(argp->data), sizeof(params)))
|
2020-03-24 10:41:54 +01:00
|
|
|
return -EFAULT;
|
|
|
|
|
|
|
|
vaddr = params.uaddr;
|
|
|
|
size = params.len;
|
|
|
|
vaddr_end = vaddr + size;
|
|
|
|
|
|
|
|
/* Lock the user memory. */
|
2025-02-11 10:37:03 +08:00
|
|
|
inpages = sev_pin_memory(kvm, vaddr, size, &npages, FOLL_WRITE);
|
2021-04-06 15:49:52 -07:00
|
|
|
if (IS_ERR(inpages))
|
|
|
|
return PTR_ERR(inpages);
|
2020-03-24 10:41:54 +01:00
|
|
|
|
|
|
|
/*
|
2020-09-23 13:01:33 -04:00
|
|
|
* Flush (on non-coherent CPUs) before LAUNCH_UPDATE encrypts pages in
|
|
|
|
* place; the cache may contain the data that was written unencrypted.
|
2020-03-24 10:41:54 +01:00
|
|
|
*/
|
|
|
|
sev_clflush_pages(inpages, npages);
|
|
|
|
|
2021-04-06 15:49:52 -07:00
|
|
|
data.reserved = 0;
|
2025-01-23 11:21:40 +05:30
|
|
|
data.handle = to_kvm_sev_info(kvm)->handle;
|
2021-04-06 15:49:52 -07:00
|
|
|
|
2020-03-24 10:41:54 +01:00
|
|
|
for (i = 0; vaddr < vaddr_end; vaddr = next_vaddr, i += pages) {
|
|
|
|
int offset, len;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* If the user buffer is not page-aligned, calculate the offset
|
|
|
|
* within the page.
|
|
|
|
*/
|
|
|
|
offset = vaddr & (PAGE_SIZE - 1);
|
|
|
|
|
|
|
|
/* Calculate the number of pages that can be encrypted in one go. */
|
|
|
|
pages = get_num_contig_pages(i, inpages, npages);
|
|
|
|
|
|
|
|
len = min_t(size_t, ((pages * PAGE_SIZE) - offset), size);
|
|
|
|
|
2021-04-06 15:49:52 -07:00
|
|
|
data.len = len;
|
|
|
|
data.address = __sme_page_pa(inpages[i]) + offset;
|
|
|
|
ret = sev_issue_cmd(kvm, SEV_CMD_LAUNCH_UPDATE_DATA, &data, &argp->error);
|
2020-03-24 10:41:54 +01:00
|
|
|
if (ret)
|
|
|
|
goto e_unpin;
|
|
|
|
|
|
|
|
size -= len;
|
|
|
|
next_vaddr = vaddr + len;
|
|
|
|
}
|
|
|
|
|
|
|
|
e_unpin:
|
|
|
|
/* content of memory is updated, mark pages dirty */
|
|
|
|
for (i = 0; i < npages; i++) {
|
|
|
|
set_page_dirty_lock(inpages[i]);
|
|
|
|
mark_page_accessed(inpages[i]);
|
|
|
|
}
|
|
|
|
/* unlock the user pages */
|
|
|
|
sev_unpin_memory(kvm, inpages, npages);
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2020-12-10 11:10:09 -06:00
|
|
|
static int sev_es_sync_vmsa(struct vcpu_svm *svm)
|
|
|
|
{
|
2024-04-04 08:13:16 -04:00
|
|
|
struct kvm_vcpu *vcpu = &svm->vcpu;
|
2025-01-23 11:21:40 +05:30
|
|
|
struct kvm_sev_info *sev = to_kvm_sev_info(vcpu->kvm);
|
2022-04-05 13:27:43 -05:00
|
|
|
struct sev_es_save_area *save = svm->sev_es.vmsa;
|
2024-04-04 08:13:21 -04:00
|
|
|
struct xregs_state *xsave;
|
|
|
|
const u8 *s;
|
|
|
|
u8 *d;
|
|
|
|
int i;
|
2020-12-10 11:10:09 -06:00
|
|
|
|
|
|
|
/* Check some debug related fields before encrypting the VMSA */
|
2022-04-05 13:27:43 -05:00
|
|
|
if (svm->vcpu.guest_debug || (svm->vmcb->save.dr7 & ~DR7_FIXED_1))
|
2020-12-10 11:10:09 -06:00
|
|
|
return -EINVAL;
|
|
|
|
|
2022-04-05 13:27:43 -05:00
|
|
|
/*
|
|
|
|
* SEV-ES will use a VMSA that is pointed to by the VMCB, not
|
|
|
|
* the traditional VMSA that is part of the VMCB. Copy the
|
|
|
|
* traditional VMSA as it has been built so far (in prep
|
|
|
|
* for LAUNCH_UPDATE_VMSA) to be the initial SEV-ES state.
|
|
|
|
*/
|
|
|
|
memcpy(save, &svm->vmcb->save, sizeof(svm->vmcb->save));
|
|
|
|
|
2020-12-10 11:10:09 -06:00
|
|
|
/* Sync registgers */
|
|
|
|
save->rax = svm->vcpu.arch.regs[VCPU_REGS_RAX];
|
|
|
|
save->rbx = svm->vcpu.arch.regs[VCPU_REGS_RBX];
|
|
|
|
save->rcx = svm->vcpu.arch.regs[VCPU_REGS_RCX];
|
|
|
|
save->rdx = svm->vcpu.arch.regs[VCPU_REGS_RDX];
|
|
|
|
save->rsp = svm->vcpu.arch.regs[VCPU_REGS_RSP];
|
|
|
|
save->rbp = svm->vcpu.arch.regs[VCPU_REGS_RBP];
|
|
|
|
save->rsi = svm->vcpu.arch.regs[VCPU_REGS_RSI];
|
|
|
|
save->rdi = svm->vcpu.arch.regs[VCPU_REGS_RDI];
|
2020-12-16 13:08:21 -05:00
|
|
|
#ifdef CONFIG_X86_64
|
2020-12-10 11:10:09 -06:00
|
|
|
save->r8 = svm->vcpu.arch.regs[VCPU_REGS_R8];
|
|
|
|
save->r9 = svm->vcpu.arch.regs[VCPU_REGS_R9];
|
|
|
|
save->r10 = svm->vcpu.arch.regs[VCPU_REGS_R10];
|
|
|
|
save->r11 = svm->vcpu.arch.regs[VCPU_REGS_R11];
|
|
|
|
save->r12 = svm->vcpu.arch.regs[VCPU_REGS_R12];
|
|
|
|
save->r13 = svm->vcpu.arch.regs[VCPU_REGS_R13];
|
|
|
|
save->r14 = svm->vcpu.arch.regs[VCPU_REGS_R14];
|
|
|
|
save->r15 = svm->vcpu.arch.regs[VCPU_REGS_R15];
|
2020-12-16 13:08:21 -05:00
|
|
|
#endif
|
2020-12-10 11:10:09 -06:00
|
|
|
save->rip = svm->vcpu.arch.regs[VCPU_REGS_RIP];
|
|
|
|
|
|
|
|
/* Sync some non-GPR registers before encrypting */
|
|
|
|
save->xcr0 = svm->vcpu.arch.xcr0;
|
|
|
|
save->pkru = svm->vcpu.arch.pkru;
|
|
|
|
save->xss = svm->vcpu.arch.ia32_xss;
|
2021-07-13 09:33:10 -07:00
|
|
|
save->dr6 = svm->vcpu.arch.dr6;
|
2020-12-10 11:10:09 -06:00
|
|
|
|
2024-04-04 08:13:16 -04:00
|
|
|
save->sev_features = sev->vmsa_features;
|
2023-06-15 16:37:54 +10:00
|
|
|
|
2024-04-04 08:13:21 -04:00
|
|
|
/*
|
|
|
|
* Skip FPU and AVX setup with KVM_SEV_ES_INIT to avoid
|
|
|
|
* breaking older measurements.
|
|
|
|
*/
|
|
|
|
if (vcpu->kvm->arch.vm_type != KVM_X86_DEFAULT_VM) {
|
|
|
|
xsave = &vcpu->arch.guest_fpu.fpstate->regs.xsave;
|
|
|
|
save->x87_dp = xsave->i387.rdp;
|
|
|
|
save->mxcsr = xsave->i387.mxcsr;
|
|
|
|
save->x87_ftw = xsave->i387.twd;
|
|
|
|
save->x87_fsw = xsave->i387.swd;
|
|
|
|
save->x87_fcw = xsave->i387.cwd;
|
|
|
|
save->x87_fop = xsave->i387.fop;
|
|
|
|
save->x87_ds = 0;
|
|
|
|
save->x87_cs = 0;
|
|
|
|
save->x87_rip = xsave->i387.rip;
|
|
|
|
|
|
|
|
for (i = 0; i < 8; i++) {
|
|
|
|
/*
|
|
|
|
* The format of the x87 save area is undocumented and
|
|
|
|
* definitely not what you would expect. It consists of
|
|
|
|
* an 8*8 bytes area with bytes 0-7, and an 8*2 bytes
|
|
|
|
* area with bytes 8-9 of each register.
|
|
|
|
*/
|
|
|
|
d = save->fpreg_x87 + i * 8;
|
|
|
|
s = ((u8 *)xsave->i387.st_space) + i * 16;
|
|
|
|
memcpy(d, s, 8);
|
|
|
|
save->fpreg_x87[64 + i * 2] = s[8];
|
|
|
|
save->fpreg_x87[64 + i * 2 + 1] = s[9];
|
|
|
|
}
|
|
|
|
memcpy(save->fpreg_xmm, xsave->i387.xmm_space, 256);
|
|
|
|
|
|
|
|
s = get_xsave_addr(xsave, XFEATURE_YMM);
|
|
|
|
if (s)
|
|
|
|
memcpy(save->fpreg_ymm, s, 256);
|
|
|
|
else
|
|
|
|
memset(save->fpreg_ymm, 0, 256);
|
|
|
|
}
|
|
|
|
|
2022-07-28 08:09:19 +03:00
|
|
|
pr_debug("Virtual Machine Save Area (VMSA):\n");
|
2022-11-04 07:22:20 -07:00
|
|
|
print_hex_dump_debug("", DUMP_PREFIX_NONE, 16, 1, save, sizeof(*save), false);
|
2022-07-28 08:09:19 +03:00
|
|
|
|
2020-12-10 11:10:09 -06:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2021-09-15 10:17:55 -07:00
|
|
|
static int __sev_launch_update_vmsa(struct kvm *kvm, struct kvm_vcpu *vcpu,
|
|
|
|
int *error)
|
2020-12-10 11:10:09 -06:00
|
|
|
{
|
2021-04-06 15:49:52 -07:00
|
|
|
struct sev_data_launch_update_vmsa vmsa;
|
2021-09-15 10:17:55 -07:00
|
|
|
struct vcpu_svm *svm = to_svm(vcpu);
|
|
|
|
int ret;
|
|
|
|
|
2023-06-15 16:37:52 +10:00
|
|
|
if (vcpu->guest_debug) {
|
|
|
|
pr_warn_once("KVM_SET_GUEST_DEBUG for SEV-ES guest is not supported");
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
|
2021-09-15 10:17:55 -07:00
|
|
|
/* Perform some pre-encryption checks against the VMSA */
|
|
|
|
ret = sev_es_sync_vmsa(svm);
|
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* The LAUNCH_UPDATE_VMSA command will perform in-place encryption of
|
|
|
|
* the VMSA memory content (i.e it will write the same memory region
|
|
|
|
* with the guest's key), so invalidate it first.
|
|
|
|
*/
|
2021-10-21 10:42:59 -07:00
|
|
|
clflush_cache_range(svm->sev_es.vmsa, PAGE_SIZE);
|
2021-09-15 10:17:55 -07:00
|
|
|
|
|
|
|
vmsa.reserved = 0;
|
2024-04-04 08:13:19 -04:00
|
|
|
vmsa.handle = to_kvm_sev_info(kvm)->handle;
|
2021-10-21 10:42:59 -07:00
|
|
|
vmsa.address = __sme_pa(svm->sev_es.vmsa);
|
2021-09-15 10:17:55 -07:00
|
|
|
vmsa.len = PAGE_SIZE;
|
2021-10-15 13:32:22 -04:00
|
|
|
ret = sev_issue_cmd(kvm, SEV_CMD_LAUNCH_UPDATE_VMSA, &vmsa, error);
|
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
|
2024-04-04 08:13:21 -04:00
|
|
|
/*
|
|
|
|
* SEV-ES guests maintain an encrypted version of their FPU
|
|
|
|
* state which is restored and saved on VMRUN and VMEXIT.
|
|
|
|
* Mark vcpu->arch.guest_fpu->fpstate as scratch so it won't
|
|
|
|
* do xsave/xrstor on it.
|
|
|
|
*/
|
|
|
|
fpstate_set_confidential(&vcpu->arch.guest_fpu);
|
2021-10-15 13:32:22 -04:00
|
|
|
vcpu->arch.guest_state_protected = true;
|
2024-05-31 04:46:44 +00:00
|
|
|
|
|
|
|
/*
|
|
|
|
* SEV-ES guest mandates LBR Virtualization to be _always_ ON. Enable it
|
|
|
|
* only after setting guest_state_protected because KVM_SET_MSRS allows
|
|
|
|
* dynamic toggling of LBRV (for performance reason) on write access to
|
|
|
|
* MSR_IA32_DEBUGCTLMSR when guest_state_protected is not set.
|
|
|
|
*/
|
|
|
|
svm_enable_lbrv(vcpu);
|
2021-10-15 13:32:22 -04:00
|
|
|
return 0;
|
2021-09-15 10:17:55 -07:00
|
|
|
}
|
|
|
|
|
|
|
|
static int sev_launch_update_vmsa(struct kvm *kvm, struct kvm_sev_cmd *argp)
|
|
|
|
{
|
2021-03-30 20:19:34 -07:00
|
|
|
struct kvm_vcpu *vcpu;
|
2021-11-16 16:04:02 +00:00
|
|
|
unsigned long i;
|
|
|
|
int ret;
|
2020-12-10 11:10:09 -06:00
|
|
|
|
|
|
|
if (!sev_es_guest(kvm))
|
|
|
|
return -ENOTTY;
|
|
|
|
|
2021-03-30 20:19:34 -07:00
|
|
|
kvm_for_each_vcpu(i, vcpu, kvm) {
|
2021-09-15 10:17:55 -07:00
|
|
|
ret = mutex_lock_killable(&vcpu->mutex);
|
2020-12-10 11:10:09 -06:00
|
|
|
if (ret)
|
2021-04-06 15:49:52 -07:00
|
|
|
return ret;
|
2020-12-10 11:10:09 -06:00
|
|
|
|
2021-09-15 10:17:55 -07:00
|
|
|
ret = __sev_launch_update_vmsa(kvm, vcpu, &argp->error);
|
2020-12-10 11:10:09 -06:00
|
|
|
|
2021-09-15 10:17:55 -07:00
|
|
|
mutex_unlock(&vcpu->mutex);
|
2020-12-10 11:10:09 -06:00
|
|
|
if (ret)
|
2021-04-06 15:49:52 -07:00
|
|
|
return ret;
|
2020-12-10 11:10:09 -06:00
|
|
|
}
|
|
|
|
|
2021-04-06 15:49:52 -07:00
|
|
|
return 0;
|
2020-12-10 11:10:09 -06:00
|
|
|
}
|
|
|
|
|
2020-03-24 10:41:54 +01:00
|
|
|
static int sev_launch_measure(struct kvm *kvm, struct kvm_sev_cmd *argp)
|
|
|
|
{
|
2024-02-26 13:42:55 -05:00
|
|
|
void __user *measure = u64_to_user_ptr(argp->data);
|
2021-04-06 15:49:52 -07:00
|
|
|
struct sev_data_launch_measure data;
|
2020-03-24 10:41:54 +01:00
|
|
|
struct kvm_sev_launch_measure params;
|
|
|
|
void __user *p = NULL;
|
|
|
|
void *blob = NULL;
|
|
|
|
int ret;
|
|
|
|
|
|
|
|
if (!sev_guest(kvm))
|
|
|
|
return -ENOTTY;
|
|
|
|
|
|
|
|
if (copy_from_user(¶ms, measure, sizeof(params)))
|
|
|
|
return -EFAULT;
|
|
|
|
|
2021-04-06 15:49:52 -07:00
|
|
|
memset(&data, 0, sizeof(data));
|
2020-03-24 10:41:54 +01:00
|
|
|
|
|
|
|
/* User wants to query the blob length */
|
|
|
|
if (!params.len)
|
|
|
|
goto cmd;
|
|
|
|
|
2024-02-26 13:42:55 -05:00
|
|
|
p = u64_to_user_ptr(params.uaddr);
|
2020-03-24 10:41:54 +01:00
|
|
|
if (p) {
|
2021-04-06 15:49:52 -07:00
|
|
|
if (params.len > SEV_FW_BLOB_MAX_SIZE)
|
|
|
|
return -EINVAL;
|
2020-03-24 10:41:54 +01:00
|
|
|
|
2022-05-16 15:43:10 +00:00
|
|
|
blob = kzalloc(params.len, GFP_KERNEL_ACCOUNT);
|
2020-03-24 10:41:54 +01:00
|
|
|
if (!blob)
|
2021-04-06 15:49:52 -07:00
|
|
|
return -ENOMEM;
|
2020-03-24 10:41:54 +01:00
|
|
|
|
2021-04-06 15:49:52 -07:00
|
|
|
data.address = __psp_pa(blob);
|
|
|
|
data.len = params.len;
|
2020-03-24 10:41:54 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
cmd:
|
2025-01-23 11:21:40 +05:30
|
|
|
data.handle = to_kvm_sev_info(kvm)->handle;
|
2021-04-06 15:49:52 -07:00
|
|
|
ret = sev_issue_cmd(kvm, SEV_CMD_LAUNCH_MEASURE, &data, &argp->error);
|
2020-03-24 10:41:54 +01:00
|
|
|
|
|
|
|
/*
|
|
|
|
* If we query the session length, FW responded with expected data.
|
|
|
|
*/
|
|
|
|
if (!params.len)
|
|
|
|
goto done;
|
|
|
|
|
|
|
|
if (ret)
|
|
|
|
goto e_free_blob;
|
|
|
|
|
|
|
|
if (blob) {
|
|
|
|
if (copy_to_user(p, blob, params.len))
|
|
|
|
ret = -EFAULT;
|
|
|
|
}
|
|
|
|
|
|
|
|
done:
|
2021-04-06 15:49:52 -07:00
|
|
|
params.len = data.len;
|
2020-03-24 10:41:54 +01:00
|
|
|
if (copy_to_user(measure, ¶ms, sizeof(params)))
|
|
|
|
ret = -EFAULT;
|
|
|
|
e_free_blob:
|
|
|
|
kfree(blob);
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int sev_launch_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
|
|
|
|
{
|
2021-04-06 15:49:52 -07:00
|
|
|
struct sev_data_launch_finish data;
|
2020-03-24 10:41:54 +01:00
|
|
|
|
|
|
|
if (!sev_guest(kvm))
|
|
|
|
return -ENOTTY;
|
|
|
|
|
2025-01-23 11:21:40 +05:30
|
|
|
data.handle = to_kvm_sev_info(kvm)->handle;
|
2021-04-06 15:49:52 -07:00
|
|
|
return sev_issue_cmd(kvm, SEV_CMD_LAUNCH_FINISH, &data, &argp->error);
|
2020-03-24 10:41:54 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
static int sev_guest_status(struct kvm *kvm, struct kvm_sev_cmd *argp)
|
|
|
|
{
|
|
|
|
struct kvm_sev_guest_status params;
|
2021-04-06 15:49:52 -07:00
|
|
|
struct sev_data_guest_status data;
|
2020-03-24 10:41:54 +01:00
|
|
|
int ret;
|
|
|
|
|
|
|
|
if (!sev_guest(kvm))
|
|
|
|
return -ENOTTY;
|
|
|
|
|
2021-04-06 15:49:52 -07:00
|
|
|
memset(&data, 0, sizeof(data));
|
2020-03-24 10:41:54 +01:00
|
|
|
|
2025-01-23 11:21:40 +05:30
|
|
|
data.handle = to_kvm_sev_info(kvm)->handle;
|
2021-04-06 15:49:52 -07:00
|
|
|
ret = sev_issue_cmd(kvm, SEV_CMD_GUEST_STATUS, &data, &argp->error);
|
2020-03-24 10:41:54 +01:00
|
|
|
if (ret)
|
2021-04-06 15:49:52 -07:00
|
|
|
return ret;
|
2020-03-24 10:41:54 +01:00
|
|
|
|
2021-04-06 15:49:52 -07:00
|
|
|
params.policy = data.policy;
|
|
|
|
params.state = data.state;
|
|
|
|
params.handle = data.handle;
|
2020-03-24 10:41:54 +01:00
|
|
|
|
2024-02-26 13:42:55 -05:00
|
|
|
if (copy_to_user(u64_to_user_ptr(argp->data), ¶ms, sizeof(params)))
|
2020-03-24 10:41:54 +01:00
|
|
|
ret = -EFAULT;
|
2021-04-06 15:49:52 -07:00
|
|
|
|
2020-03-24 10:41:54 +01:00
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int __sev_issue_dbg_cmd(struct kvm *kvm, unsigned long src,
|
|
|
|
unsigned long dst, int size,
|
|
|
|
int *error, bool enc)
|
|
|
|
{
|
2021-04-06 15:49:52 -07:00
|
|
|
struct sev_data_dbg data;
|
2020-03-24 10:41:54 +01:00
|
|
|
|
2021-04-06 15:49:52 -07:00
|
|
|
data.reserved = 0;
|
2025-01-23 11:21:40 +05:30
|
|
|
data.handle = to_kvm_sev_info(kvm)->handle;
|
2021-04-06 15:49:52 -07:00
|
|
|
data.dst_addr = dst;
|
|
|
|
data.src_addr = src;
|
|
|
|
data.len = size;
|
2020-03-24 10:41:54 +01:00
|
|
|
|
2021-04-06 15:49:52 -07:00
|
|
|
return sev_issue_cmd(kvm,
|
|
|
|
enc ? SEV_CMD_DBG_ENCRYPT : SEV_CMD_DBG_DECRYPT,
|
|
|
|
&data, error);
|
2020-03-24 10:41:54 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
static int __sev_dbg_decrypt(struct kvm *kvm, unsigned long src_paddr,
|
|
|
|
unsigned long dst_paddr, int sz, int *err)
|
|
|
|
{
|
|
|
|
int offset;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Its safe to read more than we are asked, caller should ensure that
|
|
|
|
* destination has enough space.
|
|
|
|
*/
|
|
|
|
offset = src_paddr & 15;
|
2020-11-10 22:42:05 +00:00
|
|
|
src_paddr = round_down(src_paddr, 16);
|
2020-03-24 10:41:54 +01:00
|
|
|
sz = round_up(sz + offset, 16);
|
|
|
|
|
|
|
|
return __sev_issue_dbg_cmd(kvm, src_paddr, dst_paddr, sz, err, false);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int __sev_dbg_decrypt_user(struct kvm *kvm, unsigned long paddr,
|
2021-05-06 16:15:42 -07:00
|
|
|
void __user *dst_uaddr,
|
2020-03-24 10:41:54 +01:00
|
|
|
unsigned long dst_paddr,
|
|
|
|
int size, int *err)
|
|
|
|
{
|
|
|
|
struct page *tpage = NULL;
|
|
|
|
int ret, offset;
|
|
|
|
|
|
|
|
/* if inputs are not 16-byte then use intermediate buffer */
|
|
|
|
if (!IS_ALIGNED(dst_paddr, 16) ||
|
|
|
|
!IS_ALIGNED(paddr, 16) ||
|
|
|
|
!IS_ALIGNED(size, 16)) {
|
2023-01-13 22:09:23 +00:00
|
|
|
tpage = (void *)alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
|
2020-03-24 10:41:54 +01:00
|
|
|
if (!tpage)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
|
|
|
dst_paddr = __sme_page_pa(tpage);
|
|
|
|
}
|
|
|
|
|
|
|
|
ret = __sev_dbg_decrypt(kvm, paddr, dst_paddr, size, err);
|
|
|
|
if (ret)
|
|
|
|
goto e_free;
|
|
|
|
|
|
|
|
if (tpage) {
|
|
|
|
offset = paddr & 15;
|
2021-05-06 16:15:42 -07:00
|
|
|
if (copy_to_user(dst_uaddr, page_address(tpage) + offset, size))
|
2020-03-24 10:41:54 +01:00
|
|
|
ret = -EFAULT;
|
|
|
|
}
|
|
|
|
|
|
|
|
e_free:
|
|
|
|
if (tpage)
|
|
|
|
__free_page(tpage);
|
|
|
|
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int __sev_dbg_encrypt_user(struct kvm *kvm, unsigned long paddr,
|
2021-05-06 16:15:42 -07:00
|
|
|
void __user *vaddr,
|
2020-03-24 10:41:54 +01:00
|
|
|
unsigned long dst_paddr,
|
2021-05-06 16:15:42 -07:00
|
|
|
void __user *dst_vaddr,
|
2020-03-24 10:41:54 +01:00
|
|
|
int size, int *error)
|
|
|
|
{
|
|
|
|
struct page *src_tpage = NULL;
|
|
|
|
struct page *dst_tpage = NULL;
|
|
|
|
int ret, len = size;
|
|
|
|
|
|
|
|
/* If source buffer is not aligned then use an intermediate buffer */
|
2021-05-06 16:15:42 -07:00
|
|
|
if (!IS_ALIGNED((unsigned long)vaddr, 16)) {
|
2022-06-23 17:18:58 +00:00
|
|
|
src_tpage = alloc_page(GFP_KERNEL_ACCOUNT);
|
2020-03-24 10:41:54 +01:00
|
|
|
if (!src_tpage)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
2021-05-06 16:15:42 -07:00
|
|
|
if (copy_from_user(page_address(src_tpage), vaddr, size)) {
|
2020-03-24 10:41:54 +01:00
|
|
|
__free_page(src_tpage);
|
|
|
|
return -EFAULT;
|
|
|
|
}
|
|
|
|
|
|
|
|
paddr = __sme_page_pa(src_tpage);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* If destination buffer or length is not aligned then do read-modify-write:
|
|
|
|
* - decrypt destination in an intermediate buffer
|
|
|
|
* - copy the source buffer in an intermediate buffer
|
|
|
|
* - use the intermediate buffer as source buffer
|
|
|
|
*/
|
2021-05-06 16:15:42 -07:00
|
|
|
if (!IS_ALIGNED((unsigned long)dst_vaddr, 16) || !IS_ALIGNED(size, 16)) {
|
2020-03-24 10:41:54 +01:00
|
|
|
int dst_offset;
|
|
|
|
|
2022-06-23 17:18:58 +00:00
|
|
|
dst_tpage = alloc_page(GFP_KERNEL_ACCOUNT);
|
2020-03-24 10:41:54 +01:00
|
|
|
if (!dst_tpage) {
|
|
|
|
ret = -ENOMEM;
|
|
|
|
goto e_free;
|
|
|
|
}
|
|
|
|
|
|
|
|
ret = __sev_dbg_decrypt(kvm, dst_paddr,
|
|
|
|
__sme_page_pa(dst_tpage), size, error);
|
|
|
|
if (ret)
|
|
|
|
goto e_free;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* If source is kernel buffer then use memcpy() otherwise
|
|
|
|
* copy_from_user().
|
|
|
|
*/
|
|
|
|
dst_offset = dst_paddr & 15;
|
|
|
|
|
|
|
|
if (src_tpage)
|
|
|
|
memcpy(page_address(dst_tpage) + dst_offset,
|
|
|
|
page_address(src_tpage), size);
|
|
|
|
else {
|
|
|
|
if (copy_from_user(page_address(dst_tpage) + dst_offset,
|
2021-05-06 16:15:42 -07:00
|
|
|
vaddr, size)) {
|
2020-03-24 10:41:54 +01:00
|
|
|
ret = -EFAULT;
|
|
|
|
goto e_free;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
paddr = __sme_page_pa(dst_tpage);
|
|
|
|
dst_paddr = round_down(dst_paddr, 16);
|
|
|
|
len = round_up(size, 16);
|
|
|
|
}
|
|
|
|
|
|
|
|
ret = __sev_issue_dbg_cmd(kvm, paddr, dst_paddr, len, error, true);
|
|
|
|
|
|
|
|
e_free:
|
|
|
|
if (src_tpage)
|
|
|
|
__free_page(src_tpage);
|
|
|
|
if (dst_tpage)
|
|
|
|
__free_page(dst_tpage);
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int sev_dbg_crypt(struct kvm *kvm, struct kvm_sev_cmd *argp, bool dec)
|
|
|
|
{
|
|
|
|
unsigned long vaddr, vaddr_end, next_vaddr;
|
|
|
|
unsigned long dst_vaddr;
|
|
|
|
struct page **src_p, **dst_p;
|
|
|
|
struct kvm_sev_dbg debug;
|
|
|
|
unsigned long n;
|
|
|
|
unsigned int size;
|
|
|
|
int ret;
|
|
|
|
|
|
|
|
if (!sev_guest(kvm))
|
|
|
|
return -ENOTTY;
|
|
|
|
|
2024-02-26 13:42:55 -05:00
|
|
|
if (copy_from_user(&debug, u64_to_user_ptr(argp->data), sizeof(debug)))
|
2020-03-24 10:41:54 +01:00
|
|
|
return -EFAULT;
|
|
|
|
|
|
|
|
if (!debug.len || debug.src_uaddr + debug.len < debug.src_uaddr)
|
|
|
|
return -EINVAL;
|
|
|
|
if (!debug.dst_uaddr)
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
vaddr = debug.src_uaddr;
|
|
|
|
size = debug.len;
|
|
|
|
vaddr_end = vaddr + size;
|
|
|
|
dst_vaddr = debug.dst_uaddr;
|
|
|
|
|
|
|
|
for (; vaddr < vaddr_end; vaddr = next_vaddr) {
|
|
|
|
int len, s_off, d_off;
|
|
|
|
|
|
|
|
/* lock userspace source and destination page */
|
|
|
|
src_p = sev_pin_memory(kvm, vaddr & PAGE_MASK, PAGE_SIZE, &n, 0);
|
2020-07-14 17:23:51 +03:00
|
|
|
if (IS_ERR(src_p))
|
|
|
|
return PTR_ERR(src_p);
|
2020-03-24 10:41:54 +01:00
|
|
|
|
2025-02-11 10:37:03 +08:00
|
|
|
dst_p = sev_pin_memory(kvm, dst_vaddr & PAGE_MASK, PAGE_SIZE, &n, FOLL_WRITE);
|
2020-07-14 17:23:51 +03:00
|
|
|
if (IS_ERR(dst_p)) {
|
2020-03-24 10:41:54 +01:00
|
|
|
sev_unpin_memory(kvm, src_p, n);
|
2020-07-14 17:23:51 +03:00
|
|
|
return PTR_ERR(dst_p);
|
2020-03-24 10:41:54 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
2020-09-23 13:01:33 -04:00
|
|
|
* Flush (on non-coherent CPUs) before DBG_{DE,EN}CRYPT read or modify
|
|
|
|
* the pages; flush the destination too so that future accesses do not
|
|
|
|
* see stale data.
|
2020-03-24 10:41:54 +01:00
|
|
|
*/
|
|
|
|
sev_clflush_pages(src_p, 1);
|
|
|
|
sev_clflush_pages(dst_p, 1);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Since user buffer may not be page aligned, calculate the
|
|
|
|
* offset within the page.
|
|
|
|
*/
|
|
|
|
s_off = vaddr & ~PAGE_MASK;
|
|
|
|
d_off = dst_vaddr & ~PAGE_MASK;
|
|
|
|
len = min_t(size_t, (PAGE_SIZE - s_off), size);
|
|
|
|
|
|
|
|
if (dec)
|
|
|
|
ret = __sev_dbg_decrypt_user(kvm,
|
|
|
|
__sme_page_pa(src_p[0]) + s_off,
|
2021-05-06 16:15:42 -07:00
|
|
|
(void __user *)dst_vaddr,
|
2020-03-24 10:41:54 +01:00
|
|
|
__sme_page_pa(dst_p[0]) + d_off,
|
|
|
|
len, &argp->error);
|
|
|
|
else
|
|
|
|
ret = __sev_dbg_encrypt_user(kvm,
|
|
|
|
__sme_page_pa(src_p[0]) + s_off,
|
2021-05-06 16:15:42 -07:00
|
|
|
(void __user *)vaddr,
|
2020-03-24 10:41:54 +01:00
|
|
|
__sme_page_pa(dst_p[0]) + d_off,
|
2021-05-06 16:15:42 -07:00
|
|
|
(void __user *)dst_vaddr,
|
2020-03-24 10:41:54 +01:00
|
|
|
len, &argp->error);
|
|
|
|
|
|
|
|
sev_unpin_memory(kvm, src_p, n);
|
|
|
|
sev_unpin_memory(kvm, dst_p, n);
|
|
|
|
|
|
|
|
if (ret)
|
|
|
|
goto err;
|
|
|
|
|
|
|
|
next_vaddr = vaddr + len;
|
|
|
|
dst_vaddr = dst_vaddr + len;
|
|
|
|
size -= len;
|
|
|
|
}
|
|
|
|
err:
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int sev_launch_secret(struct kvm *kvm, struct kvm_sev_cmd *argp)
|
|
|
|
{
|
2021-04-06 15:49:52 -07:00
|
|
|
struct sev_data_launch_secret data;
|
2020-03-24 10:41:54 +01:00
|
|
|
struct kvm_sev_launch_secret params;
|
|
|
|
struct page **pages;
|
|
|
|
void *blob, *hdr;
|
2020-08-07 17:37:46 -07:00
|
|
|
unsigned long n, i;
|
2020-03-24 10:41:54 +01:00
|
|
|
int ret, offset;
|
|
|
|
|
|
|
|
if (!sev_guest(kvm))
|
|
|
|
return -ENOTTY;
|
|
|
|
|
2024-02-26 13:42:55 -05:00
|
|
|
if (copy_from_user(¶ms, u64_to_user_ptr(argp->data), sizeof(params)))
|
2020-03-24 10:41:54 +01:00
|
|
|
return -EFAULT;
|
|
|
|
|
2025-02-11 10:37:03 +08:00
|
|
|
pages = sev_pin_memory(kvm, params.guest_uaddr, params.guest_len, &n, FOLL_WRITE);
|
2020-06-23 05:12:24 -04:00
|
|
|
if (IS_ERR(pages))
|
|
|
|
return PTR_ERR(pages);
|
2020-03-24 10:41:54 +01:00
|
|
|
|
2020-08-07 17:37:46 -07:00
|
|
|
/*
|
2020-09-23 13:01:33 -04:00
|
|
|
* Flush (on non-coherent CPUs) before LAUNCH_SECRET encrypts pages in
|
|
|
|
* place; the cache may contain the data that was written unencrypted.
|
2020-08-07 17:37:46 -07:00
|
|
|
*/
|
|
|
|
sev_clflush_pages(pages, n);
|
|
|
|
|
2020-03-24 10:41:54 +01:00
|
|
|
/*
|
|
|
|
* The secret must be copied into contiguous memory region, lets verify
|
|
|
|
* that userspace memory pages are contiguous before we issue command.
|
|
|
|
*/
|
|
|
|
if (get_num_contig_pages(0, pages, n) != n) {
|
|
|
|
ret = -EINVAL;
|
|
|
|
goto e_unpin_memory;
|
|
|
|
}
|
|
|
|
|
2021-04-06 15:49:52 -07:00
|
|
|
memset(&data, 0, sizeof(data));
|
2020-03-24 10:41:54 +01:00
|
|
|
|
|
|
|
offset = params.guest_uaddr & (PAGE_SIZE - 1);
|
2021-04-06 15:49:52 -07:00
|
|
|
data.guest_address = __sme_page_pa(pages[0]) + offset;
|
|
|
|
data.guest_len = params.guest_len;
|
2020-03-24 10:41:54 +01:00
|
|
|
|
|
|
|
blob = psp_copy_user_blob(params.trans_uaddr, params.trans_len);
|
|
|
|
if (IS_ERR(blob)) {
|
|
|
|
ret = PTR_ERR(blob);
|
2021-04-06 15:49:52 -07:00
|
|
|
goto e_unpin_memory;
|
2020-03-24 10:41:54 +01:00
|
|
|
}
|
|
|
|
|
2021-04-06 15:49:52 -07:00
|
|
|
data.trans_address = __psp_pa(blob);
|
|
|
|
data.trans_len = params.trans_len;
|
2020-03-24 10:41:54 +01:00
|
|
|
|
|
|
|
hdr = psp_copy_user_blob(params.hdr_uaddr, params.hdr_len);
|
|
|
|
if (IS_ERR(hdr)) {
|
|
|
|
ret = PTR_ERR(hdr);
|
|
|
|
goto e_free_blob;
|
|
|
|
}
|
2021-04-06 15:49:52 -07:00
|
|
|
data.hdr_address = __psp_pa(hdr);
|
|
|
|
data.hdr_len = params.hdr_len;
|
2020-03-24 10:41:54 +01:00
|
|
|
|
2025-01-23 11:21:40 +05:30
|
|
|
data.handle = to_kvm_sev_info(kvm)->handle;
|
2021-04-06 15:49:52 -07:00
|
|
|
ret = sev_issue_cmd(kvm, SEV_CMD_LAUNCH_UPDATE_SECRET, &data, &argp->error);
|
2020-03-24 10:41:54 +01:00
|
|
|
|
|
|
|
kfree(hdr);
|
|
|
|
|
|
|
|
e_free_blob:
|
|
|
|
kfree(blob);
|
|
|
|
e_unpin_memory:
|
2020-08-07 17:37:46 -07:00
|
|
|
/* content of memory is updated, mark pages dirty */
|
|
|
|
for (i = 0; i < n; i++) {
|
|
|
|
set_page_dirty_lock(pages[i]);
|
|
|
|
mark_page_accessed(pages[i]);
|
|
|
|
}
|
2020-03-24 10:41:54 +01:00
|
|
|
sev_unpin_memory(kvm, pages, n);
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2021-01-04 09:17:49 -06:00
|
|
|
static int sev_get_attestation_report(struct kvm *kvm, struct kvm_sev_cmd *argp)
|
|
|
|
{
|
2024-02-26 13:42:55 -05:00
|
|
|
void __user *report = u64_to_user_ptr(argp->data);
|
2021-04-06 15:49:52 -07:00
|
|
|
struct sev_data_attestation_report data;
|
2021-01-04 09:17:49 -06:00
|
|
|
struct kvm_sev_attestation_report params;
|
|
|
|
void __user *p;
|
|
|
|
void *blob = NULL;
|
|
|
|
int ret;
|
|
|
|
|
|
|
|
if (!sev_guest(kvm))
|
|
|
|
return -ENOTTY;
|
|
|
|
|
2024-02-26 13:42:55 -05:00
|
|
|
if (copy_from_user(¶ms, u64_to_user_ptr(argp->data), sizeof(params)))
|
2021-01-04 09:17:49 -06:00
|
|
|
return -EFAULT;
|
|
|
|
|
2021-04-06 15:49:52 -07:00
|
|
|
memset(&data, 0, sizeof(data));
|
2021-01-04 09:17:49 -06:00
|
|
|
|
|
|
|
/* User wants to query the blob length */
|
|
|
|
if (!params.len)
|
|
|
|
goto cmd;
|
|
|
|
|
2024-02-26 13:42:55 -05:00
|
|
|
p = u64_to_user_ptr(params.uaddr);
|
2021-01-04 09:17:49 -06:00
|
|
|
if (p) {
|
2021-04-06 15:49:52 -07:00
|
|
|
if (params.len > SEV_FW_BLOB_MAX_SIZE)
|
|
|
|
return -EINVAL;
|
2021-01-04 09:17:49 -06:00
|
|
|
|
2022-05-16 15:43:10 +00:00
|
|
|
blob = kzalloc(params.len, GFP_KERNEL_ACCOUNT);
|
2021-01-04 09:17:49 -06:00
|
|
|
if (!blob)
|
2021-04-06 15:49:52 -07:00
|
|
|
return -ENOMEM;
|
2021-01-04 09:17:49 -06:00
|
|
|
|
2021-04-06 15:49:52 -07:00
|
|
|
data.address = __psp_pa(blob);
|
|
|
|
data.len = params.len;
|
|
|
|
memcpy(data.mnonce, params.mnonce, sizeof(params.mnonce));
|
2021-01-04 09:17:49 -06:00
|
|
|
}
|
|
|
|
cmd:
|
2025-01-23 11:21:40 +05:30
|
|
|
data.handle = to_kvm_sev_info(kvm)->handle;
|
2021-04-06 15:49:52 -07:00
|
|
|
ret = sev_issue_cmd(kvm, SEV_CMD_ATTESTATION_REPORT, &data, &argp->error);
|
2021-01-04 09:17:49 -06:00
|
|
|
/*
|
|
|
|
* If we query the session length, FW responded with expected data.
|
|
|
|
*/
|
|
|
|
if (!params.len)
|
|
|
|
goto done;
|
|
|
|
|
|
|
|
if (ret)
|
|
|
|
goto e_free_blob;
|
|
|
|
|
|
|
|
if (blob) {
|
|
|
|
if (copy_to_user(p, blob, params.len))
|
|
|
|
ret = -EFAULT;
|
|
|
|
}
|
|
|
|
|
|
|
|
done:
|
2021-04-06 15:49:52 -07:00
|
|
|
params.len = data.len;
|
2021-01-04 09:17:49 -06:00
|
|
|
if (copy_to_user(report, ¶ms, sizeof(params)))
|
|
|
|
ret = -EFAULT;
|
|
|
|
e_free_blob:
|
|
|
|
kfree(blob);
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2021-04-15 15:53:14 +00:00
|
|
|
/* Userspace wants to query session length. */
|
|
|
|
static int
|
|
|
|
__sev_send_start_query_session_length(struct kvm *kvm, struct kvm_sev_cmd *argp,
|
|
|
|
struct kvm_sev_send_start *params)
|
|
|
|
{
|
2021-04-06 15:49:52 -07:00
|
|
|
struct sev_data_send_start data;
|
2021-04-15 15:53:14 +00:00
|
|
|
int ret;
|
|
|
|
|
2021-06-07 06:15:32 +00:00
|
|
|
memset(&data, 0, sizeof(data));
|
2025-01-23 11:21:40 +05:30
|
|
|
data.handle = to_kvm_sev_info(kvm)->handle;
|
2021-04-06 15:49:52 -07:00
|
|
|
ret = sev_issue_cmd(kvm, SEV_CMD_SEND_START, &data, &argp->error);
|
2021-04-15 15:53:14 +00:00
|
|
|
|
2021-04-06 15:49:52 -07:00
|
|
|
params->session_len = data.session_len;
|
2024-02-26 13:42:55 -05:00
|
|
|
if (copy_to_user(u64_to_user_ptr(argp->data), params,
|
2021-04-15 15:53:14 +00:00
|
|
|
sizeof(struct kvm_sev_send_start)))
|
|
|
|
ret = -EFAULT;
|
|
|
|
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int sev_send_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
|
|
|
|
{
|
2021-04-06 15:49:52 -07:00
|
|
|
struct sev_data_send_start data;
|
2021-04-15 15:53:14 +00:00
|
|
|
struct kvm_sev_send_start params;
|
|
|
|
void *amd_certs, *session_data;
|
|
|
|
void *pdh_cert, *plat_certs;
|
|
|
|
int ret;
|
|
|
|
|
|
|
|
if (!sev_guest(kvm))
|
|
|
|
return -ENOTTY;
|
|
|
|
|
2024-02-26 13:42:55 -05:00
|
|
|
if (copy_from_user(¶ms, u64_to_user_ptr(argp->data),
|
2021-04-15 15:53:14 +00:00
|
|
|
sizeof(struct kvm_sev_send_start)))
|
|
|
|
return -EFAULT;
|
|
|
|
|
|
|
|
/* if session_len is zero, userspace wants to query the session length */
|
|
|
|
if (!params.session_len)
|
|
|
|
return __sev_send_start_query_session_length(kvm, argp,
|
|
|
|
¶ms);
|
|
|
|
|
|
|
|
/* some sanity checks */
|
|
|
|
if (!params.pdh_cert_uaddr || !params.pdh_cert_len ||
|
|
|
|
!params.session_uaddr || params.session_len > SEV_FW_BLOB_MAX_SIZE)
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
/* allocate the memory to hold the session data blob */
|
2022-05-16 15:43:10 +00:00
|
|
|
session_data = kzalloc(params.session_len, GFP_KERNEL_ACCOUNT);
|
2021-04-15 15:53:14 +00:00
|
|
|
if (!session_data)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
|
|
|
/* copy the certificate blobs from userspace */
|
|
|
|
pdh_cert = psp_copy_user_blob(params.pdh_cert_uaddr,
|
|
|
|
params.pdh_cert_len);
|
|
|
|
if (IS_ERR(pdh_cert)) {
|
|
|
|
ret = PTR_ERR(pdh_cert);
|
|
|
|
goto e_free_session;
|
|
|
|
}
|
|
|
|
|
|
|
|
plat_certs = psp_copy_user_blob(params.plat_certs_uaddr,
|
|
|
|
params.plat_certs_len);
|
|
|
|
if (IS_ERR(plat_certs)) {
|
|
|
|
ret = PTR_ERR(plat_certs);
|
|
|
|
goto e_free_pdh;
|
|
|
|
}
|
|
|
|
|
|
|
|
amd_certs = psp_copy_user_blob(params.amd_certs_uaddr,
|
|
|
|
params.amd_certs_len);
|
|
|
|
if (IS_ERR(amd_certs)) {
|
|
|
|
ret = PTR_ERR(amd_certs);
|
|
|
|
goto e_free_plat_cert;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* populate the FW SEND_START field with system physical address */
|
2021-04-06 15:49:52 -07:00
|
|
|
memset(&data, 0, sizeof(data));
|
|
|
|
data.pdh_cert_address = __psp_pa(pdh_cert);
|
|
|
|
data.pdh_cert_len = params.pdh_cert_len;
|
|
|
|
data.plat_certs_address = __psp_pa(plat_certs);
|
|
|
|
data.plat_certs_len = params.plat_certs_len;
|
|
|
|
data.amd_certs_address = __psp_pa(amd_certs);
|
|
|
|
data.amd_certs_len = params.amd_certs_len;
|
|
|
|
data.session_address = __psp_pa(session_data);
|
|
|
|
data.session_len = params.session_len;
|
2025-01-23 11:21:40 +05:30
|
|
|
data.handle = to_kvm_sev_info(kvm)->handle;
|
2021-04-06 15:49:52 -07:00
|
|
|
|
|
|
|
ret = sev_issue_cmd(kvm, SEV_CMD_SEND_START, &data, &argp->error);
|
2021-04-15 15:53:14 +00:00
|
|
|
|
2024-02-26 13:42:55 -05:00
|
|
|
if (!ret && copy_to_user(u64_to_user_ptr(params.session_uaddr),
|
2021-04-15 15:53:14 +00:00
|
|
|
session_data, params.session_len)) {
|
|
|
|
ret = -EFAULT;
|
2021-04-06 15:49:52 -07:00
|
|
|
goto e_free_amd_cert;
|
2021-04-15 15:53:14 +00:00
|
|
|
}
|
|
|
|
|
2021-04-06 15:49:52 -07:00
|
|
|
params.policy = data.policy;
|
|
|
|
params.session_len = data.session_len;
|
2024-02-26 13:42:55 -05:00
|
|
|
if (copy_to_user(u64_to_user_ptr(argp->data), ¶ms,
|
2021-04-15 15:53:14 +00:00
|
|
|
sizeof(struct kvm_sev_send_start)))
|
|
|
|
ret = -EFAULT;
|
|
|
|
|
|
|
|
e_free_amd_cert:
|
|
|
|
kfree(amd_certs);
|
|
|
|
e_free_plat_cert:
|
|
|
|
kfree(plat_certs);
|
|
|
|
e_free_pdh:
|
|
|
|
kfree(pdh_cert);
|
|
|
|
e_free_session:
|
|
|
|
kfree(session_data);
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2021-04-15 15:53:55 +00:00
|
|
|
/* Userspace wants to query either header or trans length. */
|
|
|
|
static int
|
|
|
|
__sev_send_update_data_query_lengths(struct kvm *kvm, struct kvm_sev_cmd *argp,
|
|
|
|
struct kvm_sev_send_update_data *params)
|
|
|
|
{
|
2021-04-06 15:49:52 -07:00
|
|
|
struct sev_data_send_update_data data;
|
2021-04-15 15:53:55 +00:00
|
|
|
int ret;
|
|
|
|
|
2021-06-07 06:15:32 +00:00
|
|
|
memset(&data, 0, sizeof(data));
|
2025-01-23 11:21:40 +05:30
|
|
|
data.handle = to_kvm_sev_info(kvm)->handle;
|
2021-04-06 15:49:52 -07:00
|
|
|
ret = sev_issue_cmd(kvm, SEV_CMD_SEND_UPDATE_DATA, &data, &argp->error);
|
2021-04-15 15:53:55 +00:00
|
|
|
|
2021-04-06 15:49:52 -07:00
|
|
|
params->hdr_len = data.hdr_len;
|
|
|
|
params->trans_len = data.trans_len;
|
2021-04-15 15:53:55 +00:00
|
|
|
|
2024-02-26 13:42:55 -05:00
|
|
|
if (copy_to_user(u64_to_user_ptr(argp->data), params,
|
2021-04-15 15:53:55 +00:00
|
|
|
sizeof(struct kvm_sev_send_update_data)))
|
|
|
|
ret = -EFAULT;
|
|
|
|
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int sev_send_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp)
|
|
|
|
{
|
2021-04-06 15:49:52 -07:00
|
|
|
struct sev_data_send_update_data data;
|
2021-04-15 15:53:55 +00:00
|
|
|
struct kvm_sev_send_update_data params;
|
|
|
|
void *hdr, *trans_data;
|
|
|
|
struct page **guest_page;
|
|
|
|
unsigned long n;
|
|
|
|
int ret, offset;
|
|
|
|
|
|
|
|
if (!sev_guest(kvm))
|
|
|
|
return -ENOTTY;
|
|
|
|
|
2024-02-26 13:42:55 -05:00
|
|
|
if (copy_from_user(¶ms, u64_to_user_ptr(argp->data),
|
2021-04-15 15:53:55 +00:00
|
|
|
sizeof(struct kvm_sev_send_update_data)))
|
|
|
|
return -EFAULT;
|
|
|
|
|
|
|
|
/* userspace wants to query either header or trans length */
|
|
|
|
if (!params.trans_len || !params.hdr_len)
|
|
|
|
return __sev_send_update_data_query_lengths(kvm, argp, ¶ms);
|
|
|
|
|
|
|
|
if (!params.trans_uaddr || !params.guest_uaddr ||
|
|
|
|
!params.guest_len || !params.hdr_uaddr)
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
/* Check if we are crossing the page boundary */
|
|
|
|
offset = params.guest_uaddr & (PAGE_SIZE - 1);
|
2023-02-07 09:13:54 -08:00
|
|
|
if (params.guest_len > PAGE_SIZE || (params.guest_len + offset) > PAGE_SIZE)
|
2021-04-15 15:53:55 +00:00
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
/* Pin guest memory */
|
|
|
|
guest_page = sev_pin_memory(kvm, params.guest_uaddr & PAGE_MASK,
|
|
|
|
PAGE_SIZE, &n, 0);
|
2021-05-06 10:58:26 -07:00
|
|
|
if (IS_ERR(guest_page))
|
|
|
|
return PTR_ERR(guest_page);
|
2021-04-15 15:53:55 +00:00
|
|
|
|
|
|
|
/* allocate memory for header and transport buffer */
|
|
|
|
ret = -ENOMEM;
|
2025-04-28 14:30:13 +08:00
|
|
|
hdr = kzalloc(params.hdr_len, GFP_KERNEL);
|
2021-04-15 15:53:55 +00:00
|
|
|
if (!hdr)
|
|
|
|
goto e_unpin;
|
|
|
|
|
2025-04-28 14:30:13 +08:00
|
|
|
trans_data = kzalloc(params.trans_len, GFP_KERNEL);
|
2021-04-15 15:53:55 +00:00
|
|
|
if (!trans_data)
|
|
|
|
goto e_free_hdr;
|
|
|
|
|
2021-04-06 15:49:52 -07:00
|
|
|
memset(&data, 0, sizeof(data));
|
|
|
|
data.hdr_address = __psp_pa(hdr);
|
|
|
|
data.hdr_len = params.hdr_len;
|
|
|
|
data.trans_address = __psp_pa(trans_data);
|
|
|
|
data.trans_len = params.trans_len;
|
2021-04-15 15:53:55 +00:00
|
|
|
|
|
|
|
/* The SEND_UPDATE_DATA command requires C-bit to be always set. */
|
2021-04-06 15:49:52 -07:00
|
|
|
data.guest_address = (page_to_pfn(guest_page[0]) << PAGE_SHIFT) + offset;
|
|
|
|
data.guest_address |= sev_me_mask;
|
|
|
|
data.guest_len = params.guest_len;
|
2025-01-23 11:21:40 +05:30
|
|
|
data.handle = to_kvm_sev_info(kvm)->handle;
|
2021-04-15 15:53:55 +00:00
|
|
|
|
2021-04-06 15:49:52 -07:00
|
|
|
ret = sev_issue_cmd(kvm, SEV_CMD_SEND_UPDATE_DATA, &data, &argp->error);
|
2021-04-15 15:53:55 +00:00
|
|
|
|
|
|
|
if (ret)
|
2021-04-06 15:49:52 -07:00
|
|
|
goto e_free_trans_data;
|
2021-04-15 15:53:55 +00:00
|
|
|
|
|
|
|
/* copy transport buffer to user space */
|
2024-02-26 13:42:55 -05:00
|
|
|
if (copy_to_user(u64_to_user_ptr(params.trans_uaddr),
|
2021-04-15 15:53:55 +00:00
|
|
|
trans_data, params.trans_len)) {
|
|
|
|
ret = -EFAULT;
|
2021-04-06 15:49:52 -07:00
|
|
|
goto e_free_trans_data;
|
2021-04-15 15:53:55 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
/* Copy packet header to userspace. */
|
2024-02-26 13:42:55 -05:00
|
|
|
if (copy_to_user(u64_to_user_ptr(params.hdr_uaddr), hdr,
|
2021-05-06 10:58:25 -07:00
|
|
|
params.hdr_len))
|
|
|
|
ret = -EFAULT;
|
2021-04-15 15:53:55 +00:00
|
|
|
|
|
|
|
e_free_trans_data:
|
|
|
|
kfree(trans_data);
|
|
|
|
e_free_hdr:
|
|
|
|
kfree(hdr);
|
|
|
|
e_unpin:
|
|
|
|
sev_unpin_memory(kvm, guest_page, n);
|
|
|
|
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2021-04-15 15:54:15 +00:00
|
|
|
static int sev_send_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
|
|
|
|
{
|
2021-04-06 15:49:52 -07:00
|
|
|
struct sev_data_send_finish data;
|
2021-04-15 15:54:15 +00:00
|
|
|
|
|
|
|
if (!sev_guest(kvm))
|
|
|
|
return -ENOTTY;
|
|
|
|
|
2025-01-23 11:21:40 +05:30
|
|
|
data.handle = to_kvm_sev_info(kvm)->handle;
|
2021-04-06 15:49:52 -07:00
|
|
|
return sev_issue_cmd(kvm, SEV_CMD_SEND_FINISH, &data, &argp->error);
|
2021-04-15 15:54:15 +00:00
|
|
|
}
|
|
|
|
|
2021-04-20 05:01:20 -04:00
|
|
|
static int sev_send_cancel(struct kvm *kvm, struct kvm_sev_cmd *argp)
|
|
|
|
{
|
2021-04-06 15:49:52 -07:00
|
|
|
struct sev_data_send_cancel data;
|
2021-04-20 05:01:20 -04:00
|
|
|
|
|
|
|
if (!sev_guest(kvm))
|
|
|
|
return -ENOTTY;
|
|
|
|
|
2025-01-23 11:21:40 +05:30
|
|
|
data.handle = to_kvm_sev_info(kvm)->handle;
|
2021-04-06 15:49:52 -07:00
|
|
|
return sev_issue_cmd(kvm, SEV_CMD_SEND_CANCEL, &data, &argp->error);
|
2021-04-20 05:01:20 -04:00
|
|
|
}
|
|
|
|
|
2021-04-15 15:54:50 +00:00
|
|
|
static int sev_receive_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
|
|
|
|
{
|
2025-01-23 11:21:40 +05:30
|
|
|
struct kvm_sev_info *sev = to_kvm_sev_info(kvm);
|
2021-04-06 15:49:52 -07:00
|
|
|
struct sev_data_receive_start start;
|
2021-04-15 15:54:50 +00:00
|
|
|
struct kvm_sev_receive_start params;
|
|
|
|
int *error = &argp->error;
|
|
|
|
void *session_data;
|
|
|
|
void *pdh_data;
|
|
|
|
int ret;
|
|
|
|
|
|
|
|
if (!sev_guest(kvm))
|
|
|
|
return -ENOTTY;
|
|
|
|
|
|
|
|
/* Get parameter from the userspace */
|
2024-02-26 13:42:55 -05:00
|
|
|
if (copy_from_user(¶ms, u64_to_user_ptr(argp->data),
|
2021-04-15 15:54:50 +00:00
|
|
|
sizeof(struct kvm_sev_receive_start)))
|
|
|
|
return -EFAULT;
|
|
|
|
|
|
|
|
/* some sanity checks */
|
|
|
|
if (!params.pdh_uaddr || !params.pdh_len ||
|
|
|
|
!params.session_uaddr || !params.session_len)
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
pdh_data = psp_copy_user_blob(params.pdh_uaddr, params.pdh_len);
|
|
|
|
if (IS_ERR(pdh_data))
|
|
|
|
return PTR_ERR(pdh_data);
|
|
|
|
|
|
|
|
session_data = psp_copy_user_blob(params.session_uaddr,
|
|
|
|
params.session_len);
|
|
|
|
if (IS_ERR(session_data)) {
|
|
|
|
ret = PTR_ERR(session_data);
|
|
|
|
goto e_free_pdh;
|
|
|
|
}
|
|
|
|
|
2021-04-06 15:49:52 -07:00
|
|
|
memset(&start, 0, sizeof(start));
|
|
|
|
start.handle = params.handle;
|
|
|
|
start.policy = params.policy;
|
|
|
|
start.pdh_cert_address = __psp_pa(pdh_data);
|
|
|
|
start.pdh_cert_len = params.pdh_len;
|
|
|
|
start.session_address = __psp_pa(session_data);
|
|
|
|
start.session_len = params.session_len;
|
2021-04-15 15:54:50 +00:00
|
|
|
|
|
|
|
/* create memory encryption context */
|
2021-04-06 15:49:52 -07:00
|
|
|
ret = __sev_issue_cmd(argp->sev_fd, SEV_CMD_RECEIVE_START, &start,
|
2021-04-15 15:54:50 +00:00
|
|
|
error);
|
|
|
|
if (ret)
|
2021-04-06 15:49:52 -07:00
|
|
|
goto e_free_session;
|
2021-04-15 15:54:50 +00:00
|
|
|
|
|
|
|
/* Bind ASID to this guest */
|
2021-04-06 15:49:52 -07:00
|
|
|
ret = sev_bind_asid(kvm, start.handle, error);
|
2021-09-12 18:18:15 +00:00
|
|
|
if (ret) {
|
|
|
|
sev_decommission(start.handle);
|
2021-04-06 15:49:52 -07:00
|
|
|
goto e_free_session;
|
2021-09-12 18:18:15 +00:00
|
|
|
}
|
2021-04-15 15:54:50 +00:00
|
|
|
|
2021-04-06 15:49:52 -07:00
|
|
|
params.handle = start.handle;
|
2024-02-26 13:42:55 -05:00
|
|
|
if (copy_to_user(u64_to_user_ptr(argp->data),
|
2021-04-15 15:54:50 +00:00
|
|
|
¶ms, sizeof(struct kvm_sev_receive_start))) {
|
|
|
|
ret = -EFAULT;
|
2021-04-06 15:49:52 -07:00
|
|
|
sev_unbind_asid(kvm, start.handle);
|
|
|
|
goto e_free_session;
|
2021-04-15 15:54:50 +00:00
|
|
|
}
|
|
|
|
|
2021-04-06 15:49:52 -07:00
|
|
|
sev->handle = start.handle;
|
2021-04-15 15:54:50 +00:00
|
|
|
sev->fd = argp->sev_fd;
|
|
|
|
|
|
|
|
e_free_session:
|
|
|
|
kfree(session_data);
|
|
|
|
e_free_pdh:
|
|
|
|
kfree(pdh_data);
|
|
|
|
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2021-04-15 15:55:17 +00:00
|
|
|
static int sev_receive_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp)
|
|
|
|
{
|
|
|
|
struct kvm_sev_receive_update_data params;
|
2021-04-06 15:49:52 -07:00
|
|
|
struct sev_data_receive_update_data data;
|
2021-04-15 15:55:17 +00:00
|
|
|
void *hdr = NULL, *trans = NULL;
|
|
|
|
struct page **guest_page;
|
|
|
|
unsigned long n;
|
|
|
|
int ret, offset;
|
|
|
|
|
|
|
|
if (!sev_guest(kvm))
|
|
|
|
return -EINVAL;
|
|
|
|
|
2024-02-26 13:42:55 -05:00
|
|
|
if (copy_from_user(¶ms, u64_to_user_ptr(argp->data),
|
2021-04-15 15:55:17 +00:00
|
|
|
sizeof(struct kvm_sev_receive_update_data)))
|
|
|
|
return -EFAULT;
|
|
|
|
|
|
|
|
if (!params.hdr_uaddr || !params.hdr_len ||
|
|
|
|
!params.guest_uaddr || !params.guest_len ||
|
|
|
|
!params.trans_uaddr || !params.trans_len)
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
/* Check if we are crossing the page boundary */
|
|
|
|
offset = params.guest_uaddr & (PAGE_SIZE - 1);
|
2023-02-07 09:13:54 -08:00
|
|
|
if (params.guest_len > PAGE_SIZE || (params.guest_len + offset) > PAGE_SIZE)
|
2021-04-15 15:55:17 +00:00
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
hdr = psp_copy_user_blob(params.hdr_uaddr, params.hdr_len);
|
|
|
|
if (IS_ERR(hdr))
|
|
|
|
return PTR_ERR(hdr);
|
|
|
|
|
|
|
|
trans = psp_copy_user_blob(params.trans_uaddr, params.trans_len);
|
|
|
|
if (IS_ERR(trans)) {
|
|
|
|
ret = PTR_ERR(trans);
|
|
|
|
goto e_free_hdr;
|
|
|
|
}
|
|
|
|
|
2021-04-06 15:49:52 -07:00
|
|
|
memset(&data, 0, sizeof(data));
|
|
|
|
data.hdr_address = __psp_pa(hdr);
|
|
|
|
data.hdr_len = params.hdr_len;
|
|
|
|
data.trans_address = __psp_pa(trans);
|
|
|
|
data.trans_len = params.trans_len;
|
2021-04-15 15:55:17 +00:00
|
|
|
|
|
|
|
/* Pin guest memory */
|
|
|
|
guest_page = sev_pin_memory(kvm, params.guest_uaddr & PAGE_MASK,
|
2025-02-11 10:37:03 +08:00
|
|
|
PAGE_SIZE, &n, FOLL_WRITE);
|
2021-05-06 10:58:26 -07:00
|
|
|
if (IS_ERR(guest_page)) {
|
|
|
|
ret = PTR_ERR(guest_page);
|
2021-04-06 15:49:52 -07:00
|
|
|
goto e_free_trans;
|
2021-05-06 10:58:26 -07:00
|
|
|
}
|
2021-04-15 15:55:17 +00:00
|
|
|
|
2021-09-14 14:09:51 -07:00
|
|
|
/*
|
|
|
|
* Flush (on non-coherent CPUs) before RECEIVE_UPDATE_DATA, the PSP
|
|
|
|
* encrypts the written data with the guest's key, and the cache may
|
|
|
|
* contain dirty, unencrypted data.
|
|
|
|
*/
|
|
|
|
sev_clflush_pages(guest_page, n);
|
|
|
|
|
2021-04-15 15:55:17 +00:00
|
|
|
/* The RECEIVE_UPDATE_DATA command requires C-bit to be always set. */
|
2021-04-06 15:49:52 -07:00
|
|
|
data.guest_address = (page_to_pfn(guest_page[0]) << PAGE_SHIFT) + offset;
|
|
|
|
data.guest_address |= sev_me_mask;
|
|
|
|
data.guest_len = params.guest_len;
|
2025-01-23 11:21:40 +05:30
|
|
|
data.handle = to_kvm_sev_info(kvm)->handle;
|
2021-04-15 15:55:17 +00:00
|
|
|
|
2021-04-06 15:49:52 -07:00
|
|
|
ret = sev_issue_cmd(kvm, SEV_CMD_RECEIVE_UPDATE_DATA, &data,
|
2021-04-15 15:55:17 +00:00
|
|
|
&argp->error);
|
|
|
|
|
|
|
|
sev_unpin_memory(kvm, guest_page, n);
|
|
|
|
|
|
|
|
e_free_trans:
|
|
|
|
kfree(trans);
|
|
|
|
e_free_hdr:
|
|
|
|
kfree(hdr);
|
|
|
|
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2021-04-15 15:55:40 +00:00
|
|
|
static int sev_receive_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
|
|
|
|
{
|
2021-04-06 15:49:52 -07:00
|
|
|
struct sev_data_receive_finish data;
|
2021-04-15 15:55:40 +00:00
|
|
|
|
|
|
|
if (!sev_guest(kvm))
|
|
|
|
return -ENOTTY;
|
|
|
|
|
2025-01-23 11:21:40 +05:30
|
|
|
data.handle = to_kvm_sev_info(kvm)->handle;
|
2021-04-06 15:49:52 -07:00
|
|
|
return sev_issue_cmd(kvm, SEV_CMD_RECEIVE_FINISH, &data, &argp->error);
|
2021-04-15 15:55:40 +00:00
|
|
|
}
|
|
|
|
|
2021-11-09 21:51:01 +00:00
|
|
|
static bool is_cmd_allowed_from_mirror(u32 cmd_id)
|
2021-09-21 08:03:45 -07:00
|
|
|
{
|
|
|
|
/*
|
|
|
|
* Allow mirrors VM to call KVM_SEV_LAUNCH_UPDATE_VMSA to enable SEV-ES
|
|
|
|
* active mirror VMs. Also allow the debugging and status commands.
|
|
|
|
*/
|
|
|
|
if (cmd_id == KVM_SEV_LAUNCH_UPDATE_VMSA ||
|
|
|
|
cmd_id == KVM_SEV_GUEST_STATUS || cmd_id == KVM_SEV_DBG_DECRYPT ||
|
|
|
|
cmd_id == KVM_SEV_DBG_ENCRYPT)
|
|
|
|
return true;
|
|
|
|
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
2021-11-22 19:50:29 -05:00
|
|
|
static int sev_lock_two_vms(struct kvm *dst_kvm, struct kvm *src_kvm)
|
2021-10-21 10:43:00 -07:00
|
|
|
{
|
2025-01-23 11:21:40 +05:30
|
|
|
struct kvm_sev_info *dst_sev = to_kvm_sev_info(dst_kvm);
|
|
|
|
struct kvm_sev_info *src_sev = to_kvm_sev_info(src_kvm);
|
2021-11-22 19:50:36 -05:00
|
|
|
int r = -EBUSY;
|
2021-11-22 19:50:29 -05:00
|
|
|
|
|
|
|
if (dst_kvm == src_kvm)
|
|
|
|
return -EINVAL;
|
2021-10-21 10:43:00 -07:00
|
|
|
|
|
|
|
/*
|
2021-11-22 19:50:29 -05:00
|
|
|
* Bail if these VMs are already involved in a migration to avoid
|
|
|
|
* deadlock between two VMs trying to migrate to/from each other.
|
2021-10-21 10:43:00 -07:00
|
|
|
*/
|
2021-11-22 19:50:29 -05:00
|
|
|
if (atomic_cmpxchg_acquire(&dst_sev->migration_in_progress, 0, 1))
|
2021-10-21 10:43:00 -07:00
|
|
|
return -EBUSY;
|
|
|
|
|
2021-11-22 19:50:36 -05:00
|
|
|
if (atomic_cmpxchg_acquire(&src_sev->migration_in_progress, 0, 1))
|
|
|
|
goto release_dst;
|
2021-10-21 10:43:00 -07:00
|
|
|
|
2021-11-22 19:50:36 -05:00
|
|
|
r = -EINTR;
|
|
|
|
if (mutex_lock_killable(&dst_kvm->lock))
|
|
|
|
goto release_src;
|
2022-01-04 22:41:03 -08:00
|
|
|
if (mutex_lock_killable_nested(&src_kvm->lock, SINGLE_DEPTH_NESTING))
|
2021-11-22 19:50:36 -05:00
|
|
|
goto unlock_dst;
|
2021-10-21 10:43:00 -07:00
|
|
|
return 0;
|
2021-11-22 19:50:36 -05:00
|
|
|
|
|
|
|
unlock_dst:
|
|
|
|
mutex_unlock(&dst_kvm->lock);
|
|
|
|
release_src:
|
|
|
|
atomic_set_release(&src_sev->migration_in_progress, 0);
|
|
|
|
release_dst:
|
|
|
|
atomic_set_release(&dst_sev->migration_in_progress, 0);
|
|
|
|
return r;
|
2021-10-21 10:43:00 -07:00
|
|
|
}
|
|
|
|
|
2021-11-22 19:50:29 -05:00
|
|
|
static void sev_unlock_two_vms(struct kvm *dst_kvm, struct kvm *src_kvm)
|
2021-10-21 10:43:00 -07:00
|
|
|
{
|
2025-01-23 11:21:40 +05:30
|
|
|
struct kvm_sev_info *dst_sev = to_kvm_sev_info(dst_kvm);
|
|
|
|
struct kvm_sev_info *src_sev = to_kvm_sev_info(src_kvm);
|
2021-10-21 10:43:00 -07:00
|
|
|
|
2021-11-22 19:50:29 -05:00
|
|
|
mutex_unlock(&dst_kvm->lock);
|
|
|
|
mutex_unlock(&src_kvm->lock);
|
|
|
|
atomic_set_release(&dst_sev->migration_in_progress, 0);
|
|
|
|
atomic_set_release(&src_sev->migration_in_progress, 0);
|
2021-10-21 10:43:00 -07:00
|
|
|
}
|
|
|
|
|
2022-02-11 11:36:34 -08:00
|
|
|
static void sev_migrate_from(struct kvm *dst_kvm, struct kvm *src_kvm)
|
2021-10-21 10:43:00 -07:00
|
|
|
{
|
2025-01-23 11:21:40 +05:30
|
|
|
struct kvm_sev_info *dst = to_kvm_sev_info(dst_kvm);
|
|
|
|
struct kvm_sev_info *src = to_kvm_sev_info(src_kvm);
|
2022-06-23 10:34:06 -07:00
|
|
|
struct kvm_vcpu *dst_vcpu, *src_vcpu;
|
|
|
|
struct vcpu_svm *dst_svm, *src_svm;
|
2022-02-11 11:36:34 -08:00
|
|
|
struct kvm_sev_info *mirror;
|
2022-06-23 10:34:06 -07:00
|
|
|
unsigned long i;
|
2022-02-11 11:36:34 -08:00
|
|
|
|
2021-10-21 10:43:00 -07:00
|
|
|
dst->active = true;
|
|
|
|
dst->asid = src->asid;
|
|
|
|
dst->handle = src->handle;
|
|
|
|
dst->pages_locked = src->pages_locked;
|
2021-11-22 19:50:31 -05:00
|
|
|
dst->enc_context_owner = src->enc_context_owner;
|
2022-06-23 10:34:06 -07:00
|
|
|
dst->es_active = src->es_active;
|
2024-04-04 08:13:16 -04:00
|
|
|
dst->vmsa_features = src->vmsa_features;
|
2021-10-21 10:43:00 -07:00
|
|
|
|
|
|
|
src->asid = 0;
|
|
|
|
src->active = false;
|
|
|
|
src->handle = 0;
|
|
|
|
src->pages_locked = 0;
|
2021-11-22 19:50:31 -05:00
|
|
|
src->enc_context_owner = NULL;
|
2022-06-23 10:34:06 -07:00
|
|
|
src->es_active = false;
|
2021-10-21 10:43:00 -07:00
|
|
|
|
KVM: SEV: do not use list_replace_init on an empty list
list_replace_init cannot be used if the source is an empty list,
because "new->next->prev = new" will overwrite "old->next":
new old
prev = new, next = new prev = old, next = old
new->next = old->next prev = new, next = old prev = old, next = old
new->next->prev = new prev = new, next = old prev = old, next = new
new->prev = old->prev prev = old, next = old prev = old, next = old
new->next->prev = new prev = old, next = old prev = new, next = new
The desired outcome instead would be to leave both old and new the same
as they were (two empty circular lists). Use list_cut_before, which
already has the necessary check and is documented to discard the
previous contents of the list that will hold the result.
Fixes: b56639318bb2 ("KVM: SEV: Add support for SEV intra host migration")
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211123005036.2954379-5-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-22 19:50:28 -05:00
|
|
|
list_cut_before(&dst->regions_list, &src->regions_list, &src->regions_list);
|
2022-02-11 11:36:34 -08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* If this VM has mirrors, "transfer" each mirror's refcount of the
|
|
|
|
* source to the destination (this KVM). The caller holds a reference
|
|
|
|
* to the source, so there's no danger of use-after-free.
|
|
|
|
*/
|
|
|
|
list_cut_before(&dst->mirror_vms, &src->mirror_vms, &src->mirror_vms);
|
|
|
|
list_for_each_entry(mirror, &dst->mirror_vms, mirror_entry) {
|
|
|
|
kvm_get_kvm(dst_kvm);
|
|
|
|
kvm_put_kvm(src_kvm);
|
|
|
|
mirror->enc_context_owner = dst_kvm;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* If this VM is a mirror, remove the old mirror from the owners list
|
|
|
|
* and add the new mirror to the list.
|
|
|
|
*/
|
|
|
|
if (is_mirroring_enc_context(dst_kvm)) {
|
2025-01-23 11:21:40 +05:30
|
|
|
struct kvm_sev_info *owner_sev_info = to_kvm_sev_info(dst->enc_context_owner);
|
2022-02-11 11:36:34 -08:00
|
|
|
|
|
|
|
list_del(&src->mirror_entry);
|
|
|
|
list_add_tail(&dst->mirror_entry, &owner_sev_info->mirror_vms);
|
|
|
|
}
|
2021-10-21 10:43:00 -07:00
|
|
|
|
2022-06-23 10:34:06 -07:00
|
|
|
kvm_for_each_vcpu(i, dst_vcpu, dst_kvm) {
|
|
|
|
dst_svm = to_svm(dst_vcpu);
|
2021-10-21 10:43:01 -07:00
|
|
|
|
2022-06-23 10:34:06 -07:00
|
|
|
sev_init_vmcb(dst_svm);
|
2021-10-21 10:43:01 -07:00
|
|
|
|
2022-06-23 10:34:06 -07:00
|
|
|
if (!dst->es_active)
|
|
|
|
continue;
|
2021-10-21 10:43:01 -07:00
|
|
|
|
2022-06-23 10:34:06 -07:00
|
|
|
/*
|
|
|
|
* Note, the source is not required to have the same number of
|
|
|
|
* vCPUs as the destination when migrating a vanilla SEV VM.
|
|
|
|
*/
|
2023-08-24 19:23:56 -07:00
|
|
|
src_vcpu = kvm_get_vcpu(src_kvm, i);
|
2021-10-21 10:43:01 -07:00
|
|
|
src_svm = to_svm(src_vcpu);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Transfer VMSA and GHCB state to the destination. Nullify and
|
|
|
|
* clear source fields as appropriate, the state now belongs to
|
|
|
|
* the destination.
|
|
|
|
*/
|
|
|
|
memcpy(&dst_svm->sev_es, &src_svm->sev_es, sizeof(src_svm->sev_es));
|
|
|
|
dst_svm->vmcb->control.ghcb_gpa = src_svm->vmcb->control.ghcb_gpa;
|
|
|
|
dst_svm->vmcb->control.vmsa_pa = src_svm->vmcb->control.vmsa_pa;
|
|
|
|
dst_vcpu->arch.guest_state_protected = true;
|
|
|
|
|
|
|
|
memset(&src_svm->sev_es, 0, sizeof(src_svm->sev_es));
|
|
|
|
src_svm->vmcb->control.ghcb_gpa = INVALID_PAGE;
|
|
|
|
src_svm->vmcb->control.vmsa_pa = INVALID_PAGE;
|
|
|
|
src_vcpu->arch.guest_state_protected = false;
|
|
|
|
}
|
2022-06-23 10:34:06 -07:00
|
|
|
}
|
|
|
|
|
|
|
|
static int sev_check_source_vcpus(struct kvm *dst, struct kvm *src)
|
|
|
|
{
|
|
|
|
struct kvm_vcpu *src_vcpu;
|
|
|
|
unsigned long i;
|
|
|
|
|
KVM: SVM: Reject SEV{-ES} intra host migration if vCPU creation is in-flight
Reject migration of SEV{-ES} state if either the source or destination VM
is actively creating a vCPU, i.e. if kvm_vm_ioctl_create_vcpu() is in the
section between incrementing created_vcpus and online_vcpus. The bulk of
vCPU creation runs _outside_ of kvm->lock to allow creating multiple vCPUs
in parallel, and so sev_info.es_active can get toggled from false=>true in
the destination VM after (or during) svm_vcpu_create(), resulting in an
SEV{-ES} VM effectively having a non-SEV{-ES} vCPU.
The issue manifests most visibly as a crash when trying to free a vCPU's
NULL VMSA page in an SEV-ES VM, but any number of things can go wrong.
BUG: unable to handle page fault for address: ffffebde00000000
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: Oops: 0000 [#1] SMP KASAN NOPTI
CPU: 227 UID: 0 PID: 64063 Comm: syz.5.60023 Tainted: G U O 6.15.0-smp-DEV #2 NONE
Tainted: [U]=USER, [O]=OOT_MODULE
Hardware name: Google, Inc. Arcadia_IT_80/Arcadia_IT_80, BIOS 12.52.0-0 10/28/2024
RIP: 0010:constant_test_bit arch/x86/include/asm/bitops.h:206 [inline]
RIP: 0010:arch_test_bit arch/x86/include/asm/bitops.h:238 [inline]
RIP: 0010:_test_bit include/asm-generic/bitops/instrumented-non-atomic.h:142 [inline]
RIP: 0010:PageHead include/linux/page-flags.h:866 [inline]
RIP: 0010:___free_pages+0x3e/0x120 mm/page_alloc.c:5067
Code: <49> f7 06 40 00 00 00 75 05 45 31 ff eb 0c 66 90 4c 89 f0 4c 39 f0
RSP: 0018:ffff8984551978d0 EFLAGS: 00010246
RAX: 0000777f80000001 RBX: 0000000000000000 RCX: ffffffff918aeb98
RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffffebde00000000
RBP: 0000000000000000 R08: ffffebde00000007 R09: 1ffffd7bc0000000
R10: dffffc0000000000 R11: fffff97bc0000001 R12: dffffc0000000000
R13: ffff8983e19751a8 R14: ffffebde00000000 R15: 1ffffd7bc0000000
FS: 0000000000000000(0000) GS:ffff89ee661d3000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffebde00000000 CR3: 000000793ceaa000 CR4: 0000000000350ef0
DR0: 0000000000000000 DR1: 0000000000000b5f DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
sev_free_vcpu+0x413/0x630 arch/x86/kvm/svm/sev.c:3169
svm_vcpu_free+0x13a/0x2a0 arch/x86/kvm/svm/svm.c:1515
kvm_arch_vcpu_destroy+0x6a/0x1d0 arch/x86/kvm/x86.c:12396
kvm_vcpu_destroy virt/kvm/kvm_main.c:470 [inline]
kvm_destroy_vcpus+0xd1/0x300 virt/kvm/kvm_main.c:490
kvm_arch_destroy_vm+0x636/0x820 arch/x86/kvm/x86.c:12895
kvm_put_kvm+0xb8e/0xfb0 virt/kvm/kvm_main.c:1310
kvm_vm_release+0x48/0x60 virt/kvm/kvm_main.c:1369
__fput+0x3e4/0x9e0 fs/file_table.c:465
task_work_run+0x1a9/0x220 kernel/task_work.c:227
exit_task_work include/linux/task_work.h:40 [inline]
do_exit+0x7f0/0x25b0 kernel/exit.c:953
do_group_exit+0x203/0x2d0 kernel/exit.c:1102
get_signal+0x1357/0x1480 kernel/signal.c:3034
arch_do_signal_or_restart+0x40/0x690 arch/x86/kernel/signal.c:337
exit_to_user_mode_loop kernel/entry/common.c:111 [inline]
exit_to_user_mode_prepare include/linux/entry-common.h:329 [inline]
__syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline]
syscall_exit_to_user_mode+0x67/0xb0 kernel/entry/common.c:218
do_syscall_64+0x7c/0x150 arch/x86/entry/syscall_64.c:100
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f87a898e969
</TASK>
Modules linked in: gq(O)
gsmi: Log Shutdown Reason 0x03
CR2: ffffebde00000000
---[ end trace 0000000000000000 ]---
Deliberately don't check for a NULL VMSA when freeing the vCPU, as crashing
the host is likely desirable due to the VMSA being consumed by hardware.
E.g. if KVM manages to allow VMRUN on the vCPU, hardware may read/write a
bogus VMSA page. Accessing PFN 0 is "fine"-ish now that it's sequestered
away thanks to L1TF, but panicking in this scenario is preferable to
potentially running with corrupted state.
Reported-by: Alexander Potapenko <glider@google.com>
Tested-by: Alexander Potapenko <glider@google.com>
Fixes: 0b020f5af092 ("KVM: SEV: Add support for SEV-ES intra host migration")
Fixes: b56639318bb2 ("KVM: SEV: Add support for SEV intra host migration")
Cc: stable@vger.kernel.org
Cc: James Houghton <jthoughton@google.com>
Cc: Peter Gonda <pgonda@google.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Tested-by: Liam Merwick <liam.merwick@oracle.com>
Reviewed-by: James Houghton <jthoughton@google.com>
Link: https://lore.kernel.org/r/20250602224459.41505-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-06-02 15:44:58 -07:00
|
|
|
if (src->created_vcpus != atomic_read(&src->online_vcpus) ||
|
|
|
|
dst->created_vcpus != atomic_read(&dst->online_vcpus))
|
|
|
|
return -EBUSY;
|
|
|
|
|
2022-06-23 10:34:06 -07:00
|
|
|
if (!sev_es_guest(src))
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
if (atomic_read(&src->online_vcpus) != atomic_read(&dst->online_vcpus))
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
kvm_for_each_vcpu(i, src_vcpu, src) {
|
|
|
|
if (!src_vcpu->arch.guest_state_protected)
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
2021-10-21 10:43:01 -07:00
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2022-01-28 00:52:06 +00:00
|
|
|
int sev_vm_move_enc_context_from(struct kvm *kvm, unsigned int source_fd)
|
2021-10-21 10:43:00 -07:00
|
|
|
{
|
2025-01-23 11:21:40 +05:30
|
|
|
struct kvm_sev_info *dst_sev = to_kvm_sev_info(kvm);
|
2021-11-12 04:02:24 -05:00
|
|
|
struct kvm_sev_info *src_sev, *cg_cleanup_sev;
|
2024-07-19 20:17:58 -04:00
|
|
|
CLASS(fd, f)(source_fd);
|
2021-10-21 10:43:00 -07:00
|
|
|
struct kvm *source_kvm;
|
2021-11-12 04:02:24 -05:00
|
|
|
bool charged = false;
|
2021-10-21 10:43:00 -07:00
|
|
|
int ret;
|
|
|
|
|
2024-07-19 20:17:58 -04:00
|
|
|
if (fd_empty(f))
|
2022-05-14 19:45:22 -04:00
|
|
|
return -EBADF;
|
|
|
|
|
2024-07-19 20:17:58 -04:00
|
|
|
if (!file_is_kvm(fd_file(f)))
|
|
|
|
return -EBADF;
|
2021-10-21 10:43:00 -07:00
|
|
|
|
2024-05-31 14:12:01 -04:00
|
|
|
source_kvm = fd_file(f)->private_data;
|
2021-11-22 19:50:29 -05:00
|
|
|
ret = sev_lock_two_vms(kvm, source_kvm);
|
2021-10-21 10:43:00 -07:00
|
|
|
if (ret)
|
2024-07-19 20:17:58 -04:00
|
|
|
return ret;
|
2021-10-21 10:43:00 -07:00
|
|
|
|
2024-04-04 08:13:20 -04:00
|
|
|
if (kvm->arch.vm_type != source_kvm->arch.vm_type ||
|
|
|
|
sev_guest(kvm) || !sev_guest(source_kvm)) {
|
2021-10-21 10:43:00 -07:00
|
|
|
ret = -EINVAL;
|
2021-11-22 19:50:29 -05:00
|
|
|
goto out_unlock;
|
2021-10-21 10:43:00 -07:00
|
|
|
}
|
|
|
|
|
2025-01-23 11:21:40 +05:30
|
|
|
src_sev = to_kvm_sev_info(source_kvm);
|
2021-11-22 19:50:34 -05:00
|
|
|
|
2021-10-21 10:43:00 -07:00
|
|
|
dst_sev->misc_cg = get_current_misc_cg();
|
2021-11-12 04:02:24 -05:00
|
|
|
cg_cleanup_sev = dst_sev;
|
2021-10-21 10:43:00 -07:00
|
|
|
if (dst_sev->misc_cg != src_sev->misc_cg) {
|
|
|
|
ret = sev_misc_cg_try_charge(dst_sev);
|
|
|
|
if (ret)
|
2021-11-12 04:02:24 -05:00
|
|
|
goto out_dst_cgroup;
|
|
|
|
charged = true;
|
2021-10-21 10:43:00 -07:00
|
|
|
}
|
|
|
|
|
2025-05-12 14:04:05 -04:00
|
|
|
ret = kvm_lock_all_vcpus(kvm);
|
2021-10-21 10:43:00 -07:00
|
|
|
if (ret)
|
|
|
|
goto out_dst_cgroup;
|
2025-05-12 14:04:05 -04:00
|
|
|
ret = kvm_lock_all_vcpus(source_kvm);
|
2021-10-21 10:43:00 -07:00
|
|
|
if (ret)
|
|
|
|
goto out_dst_vcpu;
|
|
|
|
|
2022-06-23 10:34:06 -07:00
|
|
|
ret = sev_check_source_vcpus(kvm, source_kvm);
|
|
|
|
if (ret)
|
|
|
|
goto out_source_vcpu;
|
2022-02-11 11:36:34 -08:00
|
|
|
|
KVM: SVM: Flush cache only on CPUs running SEV guest
On AMD CPUs without ensuring cache consistency, each memory page
reclamation in an SEV guest triggers a call to do WBNOINVD/WBINVD on all
CPUs, thereby affecting the performance of other programs on the host.
Typically, an AMD server may have 128 cores or more, while the SEV guest
might only utilize 8 of these cores. Meanwhile, host can use qemu-affinity
to bind these 8 vCPUs to specific physical CPUs.
Therefore, keeping a record of the physical core numbers each time a vCPU
runs can help avoid flushing the cache for all CPUs every time.
Take care to allocate the cpumask used to track which CPUs have run a
vCPU when copying or moving an "encryption context", as nothing guarantees
memory in a mirror VM is a strict subset of the ASID owner, and the
destination VM for intrahost migration needs to maintain it's own set of
CPUs. E.g. for intrahost migration, if a CPU was used for the source VM
but not the destination VM, then it can only have cached memory that was
accessible to the source VM. And a CPU that was run in the source is also
used by the destination is no different than a CPU that was run in the
destination only.
Note, KVM is guaranteed to do flush caches prior to sev_vm_destroy(),
thanks to kvm_arch_guest_memory_reclaimed for SEV and SEV-ES, and
kvm_arch_gmem_invalidate() for SEV-SNP. I.e. it's safe to free the
cpumask prior to unregistering encrypted regions and freeing the ASID.
Opportunistically clean up sev_vm_destroy()'s comment regarding what is
(implicitly, what isn't) skipped for mirror VMs.
Cc: Srikanth Aithal <sraithal@amd.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Zheyun Shen <szy0127@sjtu.edu.cn>
Link: https://lore.kernel.org/r/20250522233733.3176144-9-seanjc@google.com
Link: https://lore.kernel.org/all/935a82e3-f7ad-47d7-aaaf-f3d2b62ed768@amd.com
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-05-22 16:37:32 -07:00
|
|
|
/*
|
|
|
|
* Allocate a new have_run_cpus for the destination, i.e. don't copy
|
|
|
|
* the set of CPUs from the source. If a CPU was used to run a vCPU in
|
|
|
|
* the source VM but is never used for the destination VM, then the CPU
|
|
|
|
* can only have cached memory that was accessible to the source VM.
|
|
|
|
*/
|
|
|
|
if (!zalloc_cpumask_var(&dst_sev->have_run_cpus, GFP_KERNEL_ACCOUNT)) {
|
|
|
|
ret = -ENOMEM;
|
|
|
|
goto out_source_vcpu;
|
|
|
|
}
|
|
|
|
|
2022-02-11 11:36:34 -08:00
|
|
|
sev_migrate_from(kvm, source_kvm);
|
2021-10-21 10:43:00 -07:00
|
|
|
kvm_vm_dead(source_kvm);
|
2021-11-12 04:02:24 -05:00
|
|
|
cg_cleanup_sev = src_sev;
|
2021-10-21 10:43:00 -07:00
|
|
|
ret = 0;
|
|
|
|
|
2021-10-21 10:43:01 -07:00
|
|
|
out_source_vcpu:
|
2025-05-12 14:04:05 -04:00
|
|
|
kvm_unlock_all_vcpus(source_kvm);
|
2021-10-21 10:43:00 -07:00
|
|
|
out_dst_vcpu:
|
2025-05-12 14:04:05 -04:00
|
|
|
kvm_unlock_all_vcpus(kvm);
|
2021-10-21 10:43:00 -07:00
|
|
|
out_dst_cgroup:
|
2021-11-12 04:02:24 -05:00
|
|
|
/* Operates on the source on success, on the destination on failure. */
|
|
|
|
if (charged)
|
|
|
|
sev_misc_cg_uncharge(cg_cleanup_sev);
|
|
|
|
put_misc_cg(cg_cleanup_sev->misc_cg);
|
|
|
|
cg_cleanup_sev->misc_cg = NULL;
|
2021-11-22 19:50:29 -05:00
|
|
|
out_unlock:
|
|
|
|
sev_unlock_two_vms(kvm, source_kvm);
|
2021-10-21 10:43:00 -07:00
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2024-04-04 08:13:15 -04:00
|
|
|
int sev_dev_get_attr(u32 group, u64 attr, u64 *val)
|
|
|
|
{
|
|
|
|
if (group != KVM_X86_GRP_SEV)
|
|
|
|
return -ENXIO;
|
|
|
|
|
|
|
|
switch (attr) {
|
|
|
|
case KVM_X86_SEV_VMSA_FEATURES:
|
|
|
|
*val = sev_supported_vmsa_features;
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
default:
|
|
|
|
return -ENXIO;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2024-05-01 03:51:55 -05:00
|
|
|
/*
|
|
|
|
* The guest context contains all the information, keys and metadata
|
|
|
|
* associated with the guest that the firmware tracks to implement SEV
|
|
|
|
* and SNP features. The firmware stores the guest context in hypervisor
|
|
|
|
* provide page via the SNP_GCTX_CREATE command.
|
|
|
|
*/
|
|
|
|
static void *snp_context_create(struct kvm *kvm, struct kvm_sev_cmd *argp)
|
|
|
|
{
|
|
|
|
struct sev_data_snp_addr data = {};
|
|
|
|
void *context;
|
|
|
|
int rc;
|
|
|
|
|
|
|
|
/* Allocate memory for context page */
|
|
|
|
context = snp_alloc_firmware_page(GFP_KERNEL_ACCOUNT);
|
|
|
|
if (!context)
|
|
|
|
return NULL;
|
|
|
|
|
|
|
|
data.address = __psp_pa(context);
|
|
|
|
rc = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_GCTX_CREATE, &data, &argp->error);
|
|
|
|
if (rc) {
|
|
|
|
pr_warn("Failed to create SEV-SNP context, rc %d fw_error %d",
|
|
|
|
rc, argp->error);
|
|
|
|
snp_free_firmware_page(context);
|
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
|
|
|
|
return context;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int snp_bind_asid(struct kvm *kvm, int *error)
|
|
|
|
{
|
2025-01-23 11:21:40 +05:30
|
|
|
struct kvm_sev_info *sev = to_kvm_sev_info(kvm);
|
2024-05-01 03:51:55 -05:00
|
|
|
struct sev_data_snp_activate data = {0};
|
|
|
|
|
|
|
|
data.gctx_paddr = __psp_pa(sev->snp_context);
|
|
|
|
data.asid = sev_get_asid(kvm);
|
|
|
|
return sev_issue_cmd(kvm, SEV_CMD_SNP_ACTIVATE, &data, error);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int snp_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
|
|
|
|
{
|
2025-01-23 11:21:40 +05:30
|
|
|
struct kvm_sev_info *sev = to_kvm_sev_info(kvm);
|
2024-05-01 03:51:55 -05:00
|
|
|
struct sev_data_snp_launch_start start = {0};
|
|
|
|
struct kvm_sev_snp_launch_start params;
|
|
|
|
int rc;
|
|
|
|
|
|
|
|
if (!sev_snp_guest(kvm))
|
|
|
|
return -ENOTTY;
|
|
|
|
|
|
|
|
if (copy_from_user(¶ms, u64_to_user_ptr(argp->data), sizeof(params)))
|
|
|
|
return -EFAULT;
|
|
|
|
|
|
|
|
/* Don't allow userspace to allocate memory for more than 1 SNP context. */
|
|
|
|
if (sev->snp_context)
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
if (params.flags)
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
if (params.policy & ~SNP_POLICY_MASK_VALID)
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
/* Check for policy bits that must be set */
|
2025-05-29 16:17:59 -05:00
|
|
|
if (!(params.policy & SNP_POLICY_MASK_RSVD_MBO))
|
2024-05-01 03:51:55 -05:00
|
|
|
return -EINVAL;
|
|
|
|
|
2025-03-20 08:26:49 -05:00
|
|
|
sev->policy = params.policy;
|
|
|
|
|
2024-11-05 01:05:48 +00:00
|
|
|
sev->snp_context = snp_context_create(kvm, argp);
|
|
|
|
if (!sev->snp_context)
|
|
|
|
return -ENOTTY;
|
|
|
|
|
2024-05-01 03:51:55 -05:00
|
|
|
start.gctx_paddr = __psp_pa(sev->snp_context);
|
|
|
|
start.policy = params.policy;
|
|
|
|
memcpy(start.gosvw, params.gosvw, sizeof(params.gosvw));
|
|
|
|
rc = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_START, &start, &argp->error);
|
|
|
|
if (rc) {
|
|
|
|
pr_debug("%s: SEV_CMD_SNP_LAUNCH_START firmware command failed, rc %d\n",
|
|
|
|
__func__, rc);
|
|
|
|
goto e_free_context;
|
|
|
|
}
|
|
|
|
|
|
|
|
sev->fd = argp->sev_fd;
|
|
|
|
rc = snp_bind_asid(kvm, &argp->error);
|
|
|
|
if (rc) {
|
|
|
|
pr_debug("%s: Failed to bind ASID to SEV-SNP context, rc %d\n",
|
|
|
|
__func__, rc);
|
|
|
|
goto e_free_context;
|
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
e_free_context:
|
|
|
|
snp_decommission_context(kvm);
|
|
|
|
|
|
|
|
return rc;
|
|
|
|
}
|
|
|
|
|
2024-05-01 03:51:56 -05:00
|
|
|
struct sev_gmem_populate_args {
|
|
|
|
__u8 type;
|
|
|
|
int sev_fd;
|
|
|
|
int fw_error;
|
|
|
|
};
|
|
|
|
|
|
|
|
static int sev_gmem_post_populate(struct kvm *kvm, gfn_t gfn_start, kvm_pfn_t pfn,
|
|
|
|
void __user *src, int order, void *opaque)
|
|
|
|
{
|
|
|
|
struct sev_gmem_populate_args *sev_populate_args = opaque;
|
2025-01-23 11:21:40 +05:30
|
|
|
struct kvm_sev_info *sev = to_kvm_sev_info(kvm);
|
2024-05-01 03:51:56 -05:00
|
|
|
int n_private = 0, ret, i;
|
|
|
|
int npages = (1 << order);
|
|
|
|
gfn_t gfn;
|
|
|
|
|
|
|
|
if (WARN_ON_ONCE(sev_populate_args->type != KVM_SEV_SNP_PAGE_TYPE_ZERO && !src))
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
for (gfn = gfn_start, i = 0; gfn < gfn_start + npages; gfn++, i++) {
|
|
|
|
struct sev_data_snp_launch_update fw_args = {0};
|
2024-06-12 14:50:38 +03:00
|
|
|
bool assigned = false;
|
2024-05-01 03:51:56 -05:00
|
|
|
int level;
|
|
|
|
|
|
|
|
ret = snp_lookup_rmpentry((u64)pfn + i, &assigned, &level);
|
|
|
|
if (ret || assigned) {
|
|
|
|
pr_debug("%s: Failed to ensure GFN 0x%llx RMP entry is initial shared state, ret: %d assigned: %d\n",
|
|
|
|
__func__, gfn, ret, assigned);
|
KVM: guest_memfd: move check for already-populated page to common code
Do not allow populating the same page twice with startup data. In the
case of SEV-SNP, for example, the firmware does not allow it anyway,
since the launch-update operation is only possible on pages that are
still shared in the RMP.
Even if it worked, kvm_gmem_populate()'s callback is meant to have side
effects such as updating launch measurements, and updating the same
page twice is unlikely to have the desired results.
Races between calls to the ioctl are not possible because
kvm_gmem_populate() holds slots_lock and the VM should not be running.
But again, even if this worked on other confidential computing technology,
it doesn't matter to guest_memfd.c whether this is something fishy
such as missing synchronization in userspace, or rather something
intentional. One of the racers wins, and the page is initialized by
either kvm_gmem_prepare_folio() or kvm_gmem_populate().
Anyway, out of paranoia, adjust sev_gmem_post_populate() anyway to use
the same errno that kvm_gmem_populate() is using.
Reviewed-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-07-11 18:27:52 -04:00
|
|
|
ret = ret ? -EINVAL : -EEXIST;
|
2024-05-01 03:51:56 -05:00
|
|
|
goto err;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (src) {
|
|
|
|
void *vaddr = kmap_local_pfn(pfn + i);
|
|
|
|
|
2024-06-12 14:50:39 +03:00
|
|
|
if (copy_from_user(vaddr, src + i * PAGE_SIZE, PAGE_SIZE)) {
|
|
|
|
ret = -EFAULT;
|
2024-05-01 03:51:56 -05:00
|
|
|
goto err;
|
2024-06-12 14:50:39 +03:00
|
|
|
}
|
2024-05-01 03:51:56 -05:00
|
|
|
kunmap_local(vaddr);
|
|
|
|
}
|
|
|
|
|
|
|
|
ret = rmp_make_private(pfn + i, gfn << PAGE_SHIFT, PG_LEVEL_4K,
|
|
|
|
sev_get_asid(kvm), true);
|
|
|
|
if (ret)
|
|
|
|
goto err;
|
|
|
|
|
|
|
|
n_private++;
|
|
|
|
|
|
|
|
fw_args.gctx_paddr = __psp_pa(sev->snp_context);
|
|
|
|
fw_args.address = __sme_set(pfn_to_hpa(pfn + i));
|
|
|
|
fw_args.page_size = PG_LEVEL_TO_RMP(PG_LEVEL_4K);
|
|
|
|
fw_args.page_type = sev_populate_args->type;
|
|
|
|
|
|
|
|
ret = __sev_issue_cmd(sev_populate_args->sev_fd, SEV_CMD_SNP_LAUNCH_UPDATE,
|
|
|
|
&fw_args, &sev_populate_args->fw_error);
|
|
|
|
if (ret)
|
|
|
|
goto fw_err;
|
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
fw_err:
|
|
|
|
/*
|
|
|
|
* If the firmware command failed handle the reclaim and cleanup of that
|
|
|
|
* PFN specially vs. prior pages which can be cleaned up below without
|
|
|
|
* needing to reclaim in advance.
|
|
|
|
*
|
|
|
|
* Additionally, when invalid CPUID function entries are detected,
|
|
|
|
* firmware writes the expected values into the page and leaves it
|
|
|
|
* unencrypted so it can be used for debugging and error-reporting.
|
|
|
|
*
|
|
|
|
* Copy this page back into the source buffer so userspace can use this
|
|
|
|
* information to provide information on which CPUID leaves/fields
|
|
|
|
* failed CPUID validation.
|
|
|
|
*/
|
2024-05-28 15:58:09 -05:00
|
|
|
if (!snp_page_reclaim(kvm, pfn + i) &&
|
2024-05-01 03:51:56 -05:00
|
|
|
sev_populate_args->type == KVM_SEV_SNP_PAGE_TYPE_CPUID &&
|
|
|
|
sev_populate_args->fw_error == SEV_RET_INVALID_PARAM) {
|
|
|
|
void *vaddr = kmap_local_pfn(pfn + i);
|
|
|
|
|
|
|
|
if (copy_to_user(src + i * PAGE_SIZE, vaddr, PAGE_SIZE))
|
|
|
|
pr_debug("Failed to write CPUID page back to userspace\n");
|
|
|
|
|
|
|
|
kunmap_local(vaddr);
|
|
|
|
}
|
|
|
|
|
|
|
|
/* pfn + i is hypervisor-owned now, so skip below cleanup for it. */
|
|
|
|
n_private--;
|
|
|
|
|
|
|
|
err:
|
|
|
|
pr_debug("%s: exiting with error ret %d (fw_error %d), restoring %d gmem PFNs to shared.\n",
|
|
|
|
__func__, ret, sev_populate_args->fw_error, n_private);
|
|
|
|
for (i = 0; i < n_private; i++)
|
2024-05-28 15:58:09 -05:00
|
|
|
kvm_rmp_make_shared(kvm, pfn + i, PG_LEVEL_4K);
|
2024-05-01 03:51:56 -05:00
|
|
|
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int snp_launch_update(struct kvm *kvm, struct kvm_sev_cmd *argp)
|
|
|
|
{
|
2025-01-23 11:21:40 +05:30
|
|
|
struct kvm_sev_info *sev = to_kvm_sev_info(kvm);
|
2024-05-01 03:51:56 -05:00
|
|
|
struct sev_gmem_populate_args sev_populate_args = {0};
|
|
|
|
struct kvm_sev_snp_launch_update params;
|
|
|
|
struct kvm_memory_slot *memslot;
|
|
|
|
long npages, count;
|
|
|
|
void __user *src;
|
|
|
|
int ret = 0;
|
|
|
|
|
|
|
|
if (!sev_snp_guest(kvm) || !sev->snp_context)
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
if (copy_from_user(¶ms, u64_to_user_ptr(argp->data), sizeof(params)))
|
|
|
|
return -EFAULT;
|
|
|
|
|
|
|
|
pr_debug("%s: GFN start 0x%llx length 0x%llx type %d flags %d\n", __func__,
|
|
|
|
params.gfn_start, params.len, params.type, params.flags);
|
|
|
|
|
|
|
|
if (!PAGE_ALIGNED(params.len) || params.flags ||
|
|
|
|
(params.type != KVM_SEV_SNP_PAGE_TYPE_NORMAL &&
|
|
|
|
params.type != KVM_SEV_SNP_PAGE_TYPE_ZERO &&
|
|
|
|
params.type != KVM_SEV_SNP_PAGE_TYPE_UNMEASURED &&
|
|
|
|
params.type != KVM_SEV_SNP_PAGE_TYPE_SECRETS &&
|
|
|
|
params.type != KVM_SEV_SNP_PAGE_TYPE_CPUID))
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
npages = params.len / PAGE_SIZE;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* For each GFN that's being prepared as part of the initial guest
|
|
|
|
* state, the following pre-conditions are verified:
|
|
|
|
*
|
|
|
|
* 1) The backing memslot is a valid private memslot.
|
|
|
|
* 2) The GFN has been set to private via KVM_SET_MEMORY_ATTRIBUTES
|
|
|
|
* beforehand.
|
|
|
|
* 3) The PFN of the guest_memfd has not already been set to private
|
|
|
|
* in the RMP table.
|
|
|
|
*
|
|
|
|
* The KVM MMU relies on kvm->mmu_invalidate_seq to retry nested page
|
|
|
|
* faults if there's a race between a fault and an attribute update via
|
|
|
|
* KVM_SET_MEMORY_ATTRIBUTES, and a similar approach could be utilized
|
|
|
|
* here. However, kvm->slots_lock guards against both this as well as
|
|
|
|
* concurrent memslot updates occurring while these checks are being
|
|
|
|
* performed, so use that here to make it easier to reason about the
|
|
|
|
* initial expected state and better guard against unexpected
|
|
|
|
* situations.
|
|
|
|
*/
|
|
|
|
mutex_lock(&kvm->slots_lock);
|
|
|
|
|
|
|
|
memslot = gfn_to_memslot(kvm, params.gfn_start);
|
|
|
|
if (!kvm_slot_can_be_private(memslot)) {
|
|
|
|
ret = -EINVAL;
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
|
|
|
sev_populate_args.sev_fd = argp->sev_fd;
|
|
|
|
sev_populate_args.type = params.type;
|
|
|
|
src = params.type == KVM_SEV_SNP_PAGE_TYPE_ZERO ? NULL : u64_to_user_ptr(params.uaddr);
|
|
|
|
|
|
|
|
count = kvm_gmem_populate(kvm, params.gfn_start, src, npages,
|
|
|
|
sev_gmem_post_populate, &sev_populate_args);
|
|
|
|
if (count < 0) {
|
|
|
|
argp->error = sev_populate_args.fw_error;
|
|
|
|
pr_debug("%s: kvm_gmem_populate failed, ret %ld (fw_error %d)\n",
|
|
|
|
__func__, count, argp->error);
|
|
|
|
ret = -EIO;
|
|
|
|
} else {
|
|
|
|
params.gfn_start += count;
|
|
|
|
params.len -= count * PAGE_SIZE;
|
|
|
|
if (params.type != KVM_SEV_SNP_PAGE_TYPE_ZERO)
|
|
|
|
params.uaddr += count * PAGE_SIZE;
|
|
|
|
|
|
|
|
ret = 0;
|
|
|
|
if (copy_to_user(u64_to_user_ptr(argp->data), ¶ms, sizeof(params)))
|
|
|
|
ret = -EFAULT;
|
|
|
|
}
|
|
|
|
|
|
|
|
out:
|
|
|
|
mutex_unlock(&kvm->slots_lock);
|
|
|
|
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2024-05-01 03:51:57 -05:00
|
|
|
static int snp_launch_update_vmsa(struct kvm *kvm, struct kvm_sev_cmd *argp)
|
|
|
|
{
|
2025-01-23 11:21:40 +05:30
|
|
|
struct kvm_sev_info *sev = to_kvm_sev_info(kvm);
|
2024-05-01 03:51:57 -05:00
|
|
|
struct sev_data_snp_launch_update data = {};
|
|
|
|
struct kvm_vcpu *vcpu;
|
|
|
|
unsigned long i;
|
|
|
|
int ret;
|
|
|
|
|
|
|
|
data.gctx_paddr = __psp_pa(sev->snp_context);
|
|
|
|
data.page_type = SNP_PAGE_TYPE_VMSA;
|
|
|
|
|
|
|
|
kvm_for_each_vcpu(i, vcpu, kvm) {
|
|
|
|
struct vcpu_svm *svm = to_svm(vcpu);
|
|
|
|
u64 pfn = __pa(svm->sev_es.vmsa) >> PAGE_SHIFT;
|
|
|
|
|
|
|
|
ret = sev_es_sync_vmsa(svm);
|
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
|
|
|
|
/* Transition the VMSA page to a firmware state. */
|
|
|
|
ret = rmp_make_private(pfn, INITIAL_VMSA_GPA, PG_LEVEL_4K, sev->asid, true);
|
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
|
|
|
|
/* Issue the SNP command to encrypt the VMSA */
|
|
|
|
data.address = __sme_pa(svm->sev_es.vmsa);
|
|
|
|
ret = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_UPDATE,
|
|
|
|
&data, &argp->error);
|
|
|
|
if (ret) {
|
2024-05-28 15:58:09 -05:00
|
|
|
snp_page_reclaim(kvm, pfn);
|
2024-05-01 03:51:57 -05:00
|
|
|
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
|
|
|
svm->vcpu.arch.guest_state_protected = true;
|
2024-06-05 11:48:10 +00:00
|
|
|
/*
|
|
|
|
* SEV-ES (and thus SNP) guest mandates LBR Virtualization to
|
|
|
|
* be _always_ ON. Enable it only after setting
|
|
|
|
* guest_state_protected because KVM_SET_MSRS allows dynamic
|
|
|
|
* toggling of LBRV (for performance reason) on write access to
|
|
|
|
* MSR_IA32_DEBUGCTLMSR when guest_state_protected is not set.
|
|
|
|
*/
|
|
|
|
svm_enable_lbrv(vcpu);
|
2024-05-01 03:51:57 -05:00
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int snp_launch_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
|
|
|
|
{
|
2025-01-23 11:21:40 +05:30
|
|
|
struct kvm_sev_info *sev = to_kvm_sev_info(kvm);
|
2024-05-01 03:51:57 -05:00
|
|
|
struct kvm_sev_snp_launch_finish params;
|
|
|
|
struct sev_data_snp_launch_finish *data;
|
|
|
|
void *id_block = NULL, *id_auth = NULL;
|
|
|
|
int ret;
|
|
|
|
|
|
|
|
if (!sev_snp_guest(kvm))
|
|
|
|
return -ENOTTY;
|
|
|
|
|
|
|
|
if (!sev->snp_context)
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
if (copy_from_user(¶ms, u64_to_user_ptr(argp->data), sizeof(params)))
|
|
|
|
return -EFAULT;
|
|
|
|
|
|
|
|
if (params.flags)
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
/* Measure all vCPUs using LAUNCH_UPDATE before finalizing the launch flow. */
|
|
|
|
ret = snp_launch_update_vmsa(kvm, argp);
|
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
|
|
|
|
data = kzalloc(sizeof(*data), GFP_KERNEL_ACCOUNT);
|
|
|
|
if (!data)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
|
|
|
if (params.id_block_en) {
|
|
|
|
id_block = psp_copy_user_blob(params.id_block_uaddr, KVM_SEV_SNP_ID_BLOCK_SIZE);
|
|
|
|
if (IS_ERR(id_block)) {
|
|
|
|
ret = PTR_ERR(id_block);
|
|
|
|
goto e_free;
|
|
|
|
}
|
|
|
|
|
|
|
|
data->id_block_en = 1;
|
|
|
|
data->id_block_paddr = __sme_pa(id_block);
|
|
|
|
|
|
|
|
id_auth = psp_copy_user_blob(params.id_auth_uaddr, KVM_SEV_SNP_ID_AUTH_SIZE);
|
|
|
|
if (IS_ERR(id_auth)) {
|
|
|
|
ret = PTR_ERR(id_auth);
|
|
|
|
goto e_free_id_block;
|
|
|
|
}
|
|
|
|
|
|
|
|
data->id_auth_paddr = __sme_pa(id_auth);
|
|
|
|
|
|
|
|
if (params.auth_key_en)
|
|
|
|
data->auth_key_en = 1;
|
|
|
|
}
|
|
|
|
|
|
|
|
data->vcek_disabled = params.vcek_disabled;
|
|
|
|
|
|
|
|
memcpy(data->host_data, params.host_data, KVM_SEV_SNP_FINISH_DATA_SIZE);
|
|
|
|
data->gctx_paddr = __psp_pa(sev->snp_context);
|
|
|
|
ret = sev_issue_cmd(kvm, SEV_CMD_SNP_LAUNCH_FINISH, data, &argp->error);
|
|
|
|
|
2024-07-17 13:04:48 -04:00
|
|
|
/*
|
|
|
|
* Now that there will be no more SNP_LAUNCH_UPDATE ioctls, private pages
|
|
|
|
* can be given to the guest simply by marking the RMP entry as private.
|
|
|
|
* This can happen on first access and also with KVM_PRE_FAULT_MEMORY.
|
|
|
|
*/
|
|
|
|
if (!ret)
|
|
|
|
kvm->arch.pre_fault_allowed = true;
|
|
|
|
|
2024-05-01 03:51:57 -05:00
|
|
|
kfree(id_auth);
|
|
|
|
|
|
|
|
e_free_id_block:
|
|
|
|
kfree(id_block);
|
|
|
|
|
|
|
|
e_free:
|
|
|
|
kfree(data);
|
|
|
|
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2022-01-28 00:52:06 +00:00
|
|
|
int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
|
2020-03-24 10:41:54 +01:00
|
|
|
{
|
|
|
|
struct kvm_sev_cmd sev_cmd;
|
|
|
|
int r;
|
|
|
|
|
2021-04-21 19:11:23 -07:00
|
|
|
if (!sev_enabled)
|
2020-03-24 10:41:54 +01:00
|
|
|
return -ENOTTY;
|
|
|
|
|
|
|
|
if (!argp)
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
if (copy_from_user(&sev_cmd, argp, sizeof(struct kvm_sev_cmd)))
|
|
|
|
return -EFAULT;
|
|
|
|
|
|
|
|
mutex_lock(&kvm->lock);
|
|
|
|
|
2021-09-21 08:03:45 -07:00
|
|
|
/* Only the enc_context_owner handles some memory enc operations. */
|
|
|
|
if (is_mirroring_enc_context(kvm) &&
|
2021-11-09 21:51:01 +00:00
|
|
|
!is_cmd_allowed_from_mirror(sev_cmd.id)) {
|
2021-04-08 22:32:14 +00:00
|
|
|
r = -EINVAL;
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
2024-05-01 03:51:55 -05:00
|
|
|
/*
|
|
|
|
* Once KVM_SEV_INIT2 initializes a KVM instance as an SNP guest, only
|
|
|
|
* allow the use of SNP-specific commands.
|
|
|
|
*/
|
|
|
|
if (sev_snp_guest(kvm) && sev_cmd.id < KVM_SEV_SNP_LAUNCH_START) {
|
|
|
|
r = -EPERM;
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
2020-03-24 10:41:54 +01:00
|
|
|
switch (sev_cmd.id) {
|
2021-03-30 20:19:35 -07:00
|
|
|
case KVM_SEV_ES_INIT:
|
2021-04-21 19:11:17 -07:00
|
|
|
if (!sev_es_enabled) {
|
2021-03-30 20:19:35 -07:00
|
|
|
r = -ENOTTY;
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
fallthrough;
|
2020-03-24 10:41:54 +01:00
|
|
|
case KVM_SEV_INIT:
|
|
|
|
r = sev_guest_init(kvm, &sev_cmd);
|
|
|
|
break;
|
2024-04-04 08:13:22 -04:00
|
|
|
case KVM_SEV_INIT2:
|
|
|
|
r = sev_guest_init2(kvm, &sev_cmd);
|
|
|
|
break;
|
2020-03-24 10:41:54 +01:00
|
|
|
case KVM_SEV_LAUNCH_START:
|
|
|
|
r = sev_launch_start(kvm, &sev_cmd);
|
|
|
|
break;
|
|
|
|
case KVM_SEV_LAUNCH_UPDATE_DATA:
|
|
|
|
r = sev_launch_update_data(kvm, &sev_cmd);
|
|
|
|
break;
|
2020-12-10 11:10:09 -06:00
|
|
|
case KVM_SEV_LAUNCH_UPDATE_VMSA:
|
|
|
|
r = sev_launch_update_vmsa(kvm, &sev_cmd);
|
|
|
|
break;
|
2020-03-24 10:41:54 +01:00
|
|
|
case KVM_SEV_LAUNCH_MEASURE:
|
|
|
|
r = sev_launch_measure(kvm, &sev_cmd);
|
|
|
|
break;
|
|
|
|
case KVM_SEV_LAUNCH_FINISH:
|
|
|
|
r = sev_launch_finish(kvm, &sev_cmd);
|
|
|
|
break;
|
|
|
|
case KVM_SEV_GUEST_STATUS:
|
|
|
|
r = sev_guest_status(kvm, &sev_cmd);
|
|
|
|
break;
|
|
|
|
case KVM_SEV_DBG_DECRYPT:
|
|
|
|
r = sev_dbg_crypt(kvm, &sev_cmd, true);
|
|
|
|
break;
|
|
|
|
case KVM_SEV_DBG_ENCRYPT:
|
|
|
|
r = sev_dbg_crypt(kvm, &sev_cmd, false);
|
|
|
|
break;
|
|
|
|
case KVM_SEV_LAUNCH_SECRET:
|
|
|
|
r = sev_launch_secret(kvm, &sev_cmd);
|
|
|
|
break;
|
2021-01-04 09:17:49 -06:00
|
|
|
case KVM_SEV_GET_ATTESTATION_REPORT:
|
|
|
|
r = sev_get_attestation_report(kvm, &sev_cmd);
|
|
|
|
break;
|
2021-04-15 15:53:14 +00:00
|
|
|
case KVM_SEV_SEND_START:
|
|
|
|
r = sev_send_start(kvm, &sev_cmd);
|
|
|
|
break;
|
2021-04-15 15:53:55 +00:00
|
|
|
case KVM_SEV_SEND_UPDATE_DATA:
|
|
|
|
r = sev_send_update_data(kvm, &sev_cmd);
|
|
|
|
break;
|
2021-04-15 15:54:15 +00:00
|
|
|
case KVM_SEV_SEND_FINISH:
|
|
|
|
r = sev_send_finish(kvm, &sev_cmd);
|
|
|
|
break;
|
2021-04-20 05:01:20 -04:00
|
|
|
case KVM_SEV_SEND_CANCEL:
|
|
|
|
r = sev_send_cancel(kvm, &sev_cmd);
|
|
|
|
break;
|
2021-04-15 15:54:50 +00:00
|
|
|
case KVM_SEV_RECEIVE_START:
|
|
|
|
r = sev_receive_start(kvm, &sev_cmd);
|
|
|
|
break;
|
2021-04-15 15:55:17 +00:00
|
|
|
case KVM_SEV_RECEIVE_UPDATE_DATA:
|
|
|
|
r = sev_receive_update_data(kvm, &sev_cmd);
|
|
|
|
break;
|
2021-04-15 15:55:40 +00:00
|
|
|
case KVM_SEV_RECEIVE_FINISH:
|
|
|
|
r = sev_receive_finish(kvm, &sev_cmd);
|
|
|
|
break;
|
2024-05-01 03:51:55 -05:00
|
|
|
case KVM_SEV_SNP_LAUNCH_START:
|
|
|
|
r = snp_launch_start(kvm, &sev_cmd);
|
|
|
|
break;
|
2024-05-01 03:51:56 -05:00
|
|
|
case KVM_SEV_SNP_LAUNCH_UPDATE:
|
|
|
|
r = snp_launch_update(kvm, &sev_cmd);
|
|
|
|
break;
|
2024-05-01 03:51:57 -05:00
|
|
|
case KVM_SEV_SNP_LAUNCH_FINISH:
|
|
|
|
r = snp_launch_finish(kvm, &sev_cmd);
|
|
|
|
break;
|
2020-03-24 10:41:54 +01:00
|
|
|
default:
|
|
|
|
r = -EINVAL;
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (copy_to_user(argp, &sev_cmd, sizeof(struct kvm_sev_cmd)))
|
|
|
|
r = -EFAULT;
|
|
|
|
|
|
|
|
out:
|
|
|
|
mutex_unlock(&kvm->lock);
|
|
|
|
return r;
|
|
|
|
}
|
|
|
|
|
2022-01-28 00:52:06 +00:00
|
|
|
int sev_mem_enc_register_region(struct kvm *kvm,
|
|
|
|
struct kvm_enc_region *range)
|
2020-03-24 10:41:54 +01:00
|
|
|
{
|
2025-01-23 11:21:40 +05:30
|
|
|
struct kvm_sev_info *sev = to_kvm_sev_info(kvm);
|
2020-03-24 10:41:54 +01:00
|
|
|
struct enc_region *region;
|
|
|
|
int ret = 0;
|
|
|
|
|
|
|
|
if (!sev_guest(kvm))
|
|
|
|
return -ENOTTY;
|
|
|
|
|
2021-04-08 22:32:14 +00:00
|
|
|
/* If kvm is mirroring encryption context it isn't responsible for it */
|
|
|
|
if (is_mirroring_enc_context(kvm))
|
|
|
|
return -EINVAL;
|
|
|
|
|
2020-03-24 10:41:54 +01:00
|
|
|
if (range->addr > ULONG_MAX || range->size > ULONG_MAX)
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
region = kzalloc(sizeof(*region), GFP_KERNEL_ACCOUNT);
|
|
|
|
if (!region)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
2021-01-27 08:15:24 -08:00
|
|
|
mutex_lock(&kvm->lock);
|
2025-02-11 10:37:03 +08:00
|
|
|
region->pages = sev_pin_memory(kvm, range->addr, range->size, ®ion->npages,
|
|
|
|
FOLL_WRITE | FOLL_LONGTERM);
|
2020-06-23 05:12:24 -04:00
|
|
|
if (IS_ERR(region->pages)) {
|
|
|
|
ret = PTR_ERR(region->pages);
|
2021-01-27 08:15:24 -08:00
|
|
|
mutex_unlock(&kvm->lock);
|
2020-03-24 10:41:54 +01:00
|
|
|
goto e_free;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* The guest may change the memory encryption attribute from C=0 -> C=1
|
|
|
|
* or vice versa for this memory range. Lets make sure caches are
|
|
|
|
* flushed to ensure that guest data gets written into memory with
|
2024-02-16 17:34:30 -08:00
|
|
|
* correct C-bit. Note, this must be done before dropping kvm->lock,
|
|
|
|
* as region and its array of pages can be freed by a different task
|
|
|
|
* once kvm->lock is released.
|
2020-03-24 10:41:54 +01:00
|
|
|
*/
|
|
|
|
sev_clflush_pages(region->pages, region->npages);
|
|
|
|
|
2024-02-16 17:34:30 -08:00
|
|
|
region->uaddr = range->addr;
|
|
|
|
region->size = range->size;
|
|
|
|
|
|
|
|
list_add_tail(®ion->list, &sev->regions_list);
|
|
|
|
mutex_unlock(&kvm->lock);
|
|
|
|
|
2020-03-24 10:41:54 +01:00
|
|
|
return ret;
|
|
|
|
|
|
|
|
e_free:
|
|
|
|
kfree(region);
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
|
|
|
static struct enc_region *
|
|
|
|
find_enc_region(struct kvm *kvm, struct kvm_enc_region *range)
|
|
|
|
{
|
2025-01-23 11:21:40 +05:30
|
|
|
struct kvm_sev_info *sev = to_kvm_sev_info(kvm);
|
2020-03-24 10:41:54 +01:00
|
|
|
struct list_head *head = &sev->regions_list;
|
|
|
|
struct enc_region *i;
|
|
|
|
|
|
|
|
list_for_each_entry(i, head, list) {
|
|
|
|
if (i->uaddr == range->addr &&
|
|
|
|
i->size == range->size)
|
|
|
|
return i;
|
|
|
|
}
|
|
|
|
|
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
|
|
|
|
static void __unregister_enc_region_locked(struct kvm *kvm,
|
|
|
|
struct enc_region *region)
|
|
|
|
{
|
|
|
|
sev_unpin_memory(kvm, region->pages, region->npages);
|
|
|
|
list_del(®ion->list);
|
|
|
|
kfree(region);
|
|
|
|
}
|
|
|
|
|
2022-01-28 00:52:06 +00:00
|
|
|
int sev_mem_enc_unregister_region(struct kvm *kvm,
|
|
|
|
struct kvm_enc_region *range)
|
2020-03-24 10:41:54 +01:00
|
|
|
{
|
|
|
|
struct enc_region *region;
|
|
|
|
int ret;
|
|
|
|
|
2021-04-08 22:32:14 +00:00
|
|
|
/* If kvm is mirroring encryption context it isn't responsible for it */
|
|
|
|
if (is_mirroring_enc_context(kvm))
|
|
|
|
return -EINVAL;
|
|
|
|
|
2020-03-24 10:41:54 +01:00
|
|
|
mutex_lock(&kvm->lock);
|
|
|
|
|
|
|
|
if (!sev_guest(kvm)) {
|
|
|
|
ret = -ENOTTY;
|
|
|
|
goto failed;
|
|
|
|
}
|
|
|
|
|
|
|
|
region = find_enc_region(kvm, range);
|
|
|
|
if (!region) {
|
|
|
|
ret = -EINVAL;
|
|
|
|
goto failed;
|
|
|
|
}
|
|
|
|
|
KVM: SVM: Flush cache only on CPUs running SEV guest
On AMD CPUs without ensuring cache consistency, each memory page
reclamation in an SEV guest triggers a call to do WBNOINVD/WBINVD on all
CPUs, thereby affecting the performance of other programs on the host.
Typically, an AMD server may have 128 cores or more, while the SEV guest
might only utilize 8 of these cores. Meanwhile, host can use qemu-affinity
to bind these 8 vCPUs to specific physical CPUs.
Therefore, keeping a record of the physical core numbers each time a vCPU
runs can help avoid flushing the cache for all CPUs every time.
Take care to allocate the cpumask used to track which CPUs have run a
vCPU when copying or moving an "encryption context", as nothing guarantees
memory in a mirror VM is a strict subset of the ASID owner, and the
destination VM for intrahost migration needs to maintain it's own set of
CPUs. E.g. for intrahost migration, if a CPU was used for the source VM
but not the destination VM, then it can only have cached memory that was
accessible to the source VM. And a CPU that was run in the source is also
used by the destination is no different than a CPU that was run in the
destination only.
Note, KVM is guaranteed to do flush caches prior to sev_vm_destroy(),
thanks to kvm_arch_guest_memory_reclaimed for SEV and SEV-ES, and
kvm_arch_gmem_invalidate() for SEV-SNP. I.e. it's safe to free the
cpumask prior to unregistering encrypted regions and freeing the ASID.
Opportunistically clean up sev_vm_destroy()'s comment regarding what is
(implicitly, what isn't) skipped for mirror VMs.
Cc: Srikanth Aithal <sraithal@amd.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Zheyun Shen <szy0127@sjtu.edu.cn>
Link: https://lore.kernel.org/r/20250522233733.3176144-9-seanjc@google.com
Link: https://lore.kernel.org/all/935a82e3-f7ad-47d7-aaaf-f3d2b62ed768@amd.com
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-05-22 16:37:32 -07:00
|
|
|
sev_writeback_caches(kvm);
|
2020-03-24 10:41:54 +01:00
|
|
|
|
|
|
|
__unregister_enc_region_locked(kvm, region);
|
|
|
|
|
|
|
|
mutex_unlock(&kvm->lock);
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
failed:
|
|
|
|
mutex_unlock(&kvm->lock);
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2022-01-28 00:52:06 +00:00
|
|
|
int sev_vm_copy_enc_context_from(struct kvm *kvm, unsigned int source_fd)
|
2021-04-08 22:32:14 +00:00
|
|
|
{
|
2024-07-19 20:17:58 -04:00
|
|
|
CLASS(fd, f)(source_fd);
|
2021-04-08 22:32:14 +00:00
|
|
|
struct kvm *source_kvm;
|
2021-11-22 19:50:33 -05:00
|
|
|
struct kvm_sev_info *source_sev, *mirror_sev;
|
2021-04-08 22:32:14 +00:00
|
|
|
int ret;
|
|
|
|
|
2024-07-19 20:17:58 -04:00
|
|
|
if (fd_empty(f))
|
2022-05-14 19:45:22 -04:00
|
|
|
return -EBADF;
|
|
|
|
|
2024-07-19 20:17:58 -04:00
|
|
|
if (!file_is_kvm(fd_file(f)))
|
|
|
|
return -EBADF;
|
2021-04-08 22:32:14 +00:00
|
|
|
|
2024-05-31 14:12:01 -04:00
|
|
|
source_kvm = fd_file(f)->private_data;
|
2021-11-22 19:50:33 -05:00
|
|
|
ret = sev_lock_two_vms(kvm, source_kvm);
|
|
|
|
if (ret)
|
2024-07-19 20:17:58 -04:00
|
|
|
return ret;
|
2021-04-08 22:32:14 +00:00
|
|
|
|
2021-11-22 19:50:33 -05:00
|
|
|
/*
|
|
|
|
* Mirrors of mirrors should work, but let's not get silly. Also
|
|
|
|
* disallow out-of-band SEV/SEV-ES init if the target is already an
|
|
|
|
* SEV guest, or if vCPUs have been created. KVM relies on vCPUs being
|
|
|
|
* created after SEV/SEV-ES initialization, e.g. to init intercepts.
|
|
|
|
*/
|
|
|
|
if (sev_guest(kvm) || !sev_guest(source_kvm) ||
|
|
|
|
is_mirroring_enc_context(source_kvm) || kvm->created_vcpus) {
|
2021-04-08 22:32:14 +00:00
|
|
|
ret = -EINVAL;
|
2021-11-22 19:50:33 -05:00
|
|
|
goto e_unlock;
|
2021-04-08 22:32:14 +00:00
|
|
|
}
|
|
|
|
|
KVM: SVM: Flush cache only on CPUs running SEV guest
On AMD CPUs without ensuring cache consistency, each memory page
reclamation in an SEV guest triggers a call to do WBNOINVD/WBINVD on all
CPUs, thereby affecting the performance of other programs on the host.
Typically, an AMD server may have 128 cores or more, while the SEV guest
might only utilize 8 of these cores. Meanwhile, host can use qemu-affinity
to bind these 8 vCPUs to specific physical CPUs.
Therefore, keeping a record of the physical core numbers each time a vCPU
runs can help avoid flushing the cache for all CPUs every time.
Take care to allocate the cpumask used to track which CPUs have run a
vCPU when copying or moving an "encryption context", as nothing guarantees
memory in a mirror VM is a strict subset of the ASID owner, and the
destination VM for intrahost migration needs to maintain it's own set of
CPUs. E.g. for intrahost migration, if a CPU was used for the source VM
but not the destination VM, then it can only have cached memory that was
accessible to the source VM. And a CPU that was run in the source is also
used by the destination is no different than a CPU that was run in the
destination only.
Note, KVM is guaranteed to do flush caches prior to sev_vm_destroy(),
thanks to kvm_arch_guest_memory_reclaimed for SEV and SEV-ES, and
kvm_arch_gmem_invalidate() for SEV-SNP. I.e. it's safe to free the
cpumask prior to unregistering encrypted regions and freeing the ASID.
Opportunistically clean up sev_vm_destroy()'s comment regarding what is
(implicitly, what isn't) skipped for mirror VMs.
Cc: Srikanth Aithal <sraithal@amd.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Zheyun Shen <szy0127@sjtu.edu.cn>
Link: https://lore.kernel.org/r/20250522233733.3176144-9-seanjc@google.com
Link: https://lore.kernel.org/all/935a82e3-f7ad-47d7-aaaf-f3d2b62ed768@amd.com
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-05-22 16:37:32 -07:00
|
|
|
mirror_sev = to_kvm_sev_info(kvm);
|
|
|
|
if (!zalloc_cpumask_var(&mirror_sev->have_run_cpus, GFP_KERNEL_ACCOUNT)) {
|
|
|
|
ret = -ENOMEM;
|
|
|
|
goto e_unlock;
|
|
|
|
}
|
|
|
|
|
2021-04-08 22:32:14 +00:00
|
|
|
/*
|
|
|
|
* The mirror kvm holds an enc_context_owner ref so its asid can't
|
|
|
|
* disappear until we're done with it
|
|
|
|
*/
|
2025-01-23 11:21:40 +05:30
|
|
|
source_sev = to_kvm_sev_info(source_kvm);
|
2021-04-08 22:32:14 +00:00
|
|
|
kvm_get_kvm(source_kvm);
|
2022-02-11 11:36:34 -08:00
|
|
|
list_add_tail(&mirror_sev->mirror_entry, &source_sev->mirror_vms);
|
2021-04-08 22:32:14 +00:00
|
|
|
|
|
|
|
/* Set enc_context_owner and copy its encryption context over */
|
|
|
|
mirror_sev->enc_context_owner = source_kvm;
|
|
|
|
mirror_sev->active = true;
|
2021-11-22 19:50:33 -05:00
|
|
|
mirror_sev->asid = source_sev->asid;
|
|
|
|
mirror_sev->fd = source_sev->fd;
|
|
|
|
mirror_sev->es_active = source_sev->es_active;
|
2024-04-04 08:13:20 -04:00
|
|
|
mirror_sev->need_init = false;
|
2021-11-22 19:50:33 -05:00
|
|
|
mirror_sev->handle = source_sev->handle;
|
2021-11-22 19:50:30 -05:00
|
|
|
INIT_LIST_HEAD(&mirror_sev->regions_list);
|
2022-02-11 11:36:34 -08:00
|
|
|
INIT_LIST_HEAD(&mirror_sev->mirror_vms);
|
2021-11-22 19:50:33 -05:00
|
|
|
ret = 0;
|
|
|
|
|
2021-09-21 08:03:44 -07:00
|
|
|
/*
|
|
|
|
* Do not copy ap_jump_table. Since the mirror does not share the same
|
|
|
|
* KVM contexts as the original, and they may have different
|
|
|
|
* memory-views.
|
|
|
|
*/
|
2021-04-08 22:32:14 +00:00
|
|
|
|
2021-11-22 19:50:33 -05:00
|
|
|
e_unlock:
|
|
|
|
sev_unlock_two_vms(kvm, source_kvm);
|
2021-04-08 22:32:14 +00:00
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2024-05-01 03:51:55 -05:00
|
|
|
static int snp_decommission_context(struct kvm *kvm)
|
|
|
|
{
|
2025-01-23 11:21:40 +05:30
|
|
|
struct kvm_sev_info *sev = to_kvm_sev_info(kvm);
|
2024-05-01 03:51:55 -05:00
|
|
|
struct sev_data_snp_addr data = {};
|
|
|
|
int ret;
|
|
|
|
|
|
|
|
/* If context is not created then do nothing */
|
|
|
|
if (!sev->snp_context)
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
/* Do the decommision, which will unbind the ASID from the SNP context */
|
|
|
|
data.address = __sme_pa(sev->snp_context);
|
|
|
|
down_write(&sev_deactivate_lock);
|
|
|
|
ret = sev_do_cmd(SEV_CMD_SNP_DECOMMISSION, &data, NULL);
|
|
|
|
up_write(&sev_deactivate_lock);
|
|
|
|
|
|
|
|
if (WARN_ONCE(ret, "Failed to release guest context, ret %d", ret))
|
|
|
|
return ret;
|
|
|
|
|
|
|
|
snp_free_firmware_page(sev->snp_context);
|
|
|
|
sev->snp_context = NULL;
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2020-03-24 10:41:54 +01:00
|
|
|
void sev_vm_destroy(struct kvm *kvm)
|
|
|
|
{
|
2025-01-23 11:21:40 +05:30
|
|
|
struct kvm_sev_info *sev = to_kvm_sev_info(kvm);
|
2020-03-24 10:41:54 +01:00
|
|
|
struct list_head *head = &sev->regions_list;
|
|
|
|
struct list_head *pos, *q;
|
|
|
|
|
|
|
|
if (!sev_guest(kvm))
|
|
|
|
return;
|
|
|
|
|
2022-02-11 11:36:34 -08:00
|
|
|
WARN_ON(!list_empty(&sev->mirror_vms));
|
|
|
|
|
KVM: SVM: Flush cache only on CPUs running SEV guest
On AMD CPUs without ensuring cache consistency, each memory page
reclamation in an SEV guest triggers a call to do WBNOINVD/WBINVD on all
CPUs, thereby affecting the performance of other programs on the host.
Typically, an AMD server may have 128 cores or more, while the SEV guest
might only utilize 8 of these cores. Meanwhile, host can use qemu-affinity
to bind these 8 vCPUs to specific physical CPUs.
Therefore, keeping a record of the physical core numbers each time a vCPU
runs can help avoid flushing the cache for all CPUs every time.
Take care to allocate the cpumask used to track which CPUs have run a
vCPU when copying or moving an "encryption context", as nothing guarantees
memory in a mirror VM is a strict subset of the ASID owner, and the
destination VM for intrahost migration needs to maintain it's own set of
CPUs. E.g. for intrahost migration, if a CPU was used for the source VM
but not the destination VM, then it can only have cached memory that was
accessible to the source VM. And a CPU that was run in the source is also
used by the destination is no different than a CPU that was run in the
destination only.
Note, KVM is guaranteed to do flush caches prior to sev_vm_destroy(),
thanks to kvm_arch_guest_memory_reclaimed for SEV and SEV-ES, and
kvm_arch_gmem_invalidate() for SEV-SNP. I.e. it's safe to free the
cpumask prior to unregistering encrypted regions and freeing the ASID.
Opportunistically clean up sev_vm_destroy()'s comment regarding what is
(implicitly, what isn't) skipped for mirror VMs.
Cc: Srikanth Aithal <sraithal@amd.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Zheyun Shen <szy0127@sjtu.edu.cn>
Link: https://lore.kernel.org/r/20250522233733.3176144-9-seanjc@google.com
Link: https://lore.kernel.org/all/935a82e3-f7ad-47d7-aaaf-f3d2b62ed768@amd.com
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-05-22 16:37:32 -07:00
|
|
|
free_cpumask_var(sev->have_run_cpus);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* If this is a mirror VM, remove it from the owner's list of a mirrors
|
|
|
|
* and skip ASID cleanup (the ASID is tied to the lifetime of the owner).
|
|
|
|
* Note, mirror VMs don't support registering encrypted regions.
|
|
|
|
*/
|
2021-04-08 22:32:14 +00:00
|
|
|
if (is_mirroring_enc_context(kvm)) {
|
2021-11-22 19:50:34 -05:00
|
|
|
struct kvm *owner_kvm = sev->enc_context_owner;
|
|
|
|
|
|
|
|
mutex_lock(&owner_kvm->lock);
|
2022-02-11 11:36:34 -08:00
|
|
|
list_del(&sev->mirror_entry);
|
2021-11-22 19:50:34 -05:00
|
|
|
mutex_unlock(&owner_kvm->lock);
|
|
|
|
kvm_put_kvm(owner_kvm);
|
2021-04-08 22:32:14 +00:00
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
2020-03-24 10:41:54 +01:00
|
|
|
|
|
|
|
/*
|
|
|
|
* if userspace was terminated before unregistering the memory regions
|
|
|
|
* then lets unpin all the registered memory.
|
|
|
|
*/
|
|
|
|
if (!list_empty(head)) {
|
|
|
|
list_for_each_safe(pos, q, head) {
|
|
|
|
__unregister_enc_region_locked(kvm,
|
|
|
|
list_entry(pos, struct enc_region, list));
|
2020-08-25 12:56:28 -07:00
|
|
|
cond_resched();
|
2020-03-24 10:41:54 +01:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2024-05-01 03:51:55 -05:00
|
|
|
if (sev_snp_guest(kvm)) {
|
KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event
Version 2 of GHCB specification added support for the SNP Guest Request
Message NAE event. The event allows for an SEV-SNP guest to make
requests to the SEV-SNP firmware through the hypervisor using the
SNP_GUEST_REQUEST API defined in the SEV-SNP firmware specification.
This is used by guests primarily to request attestation reports from
firmware. There are other request types are available as well, but the
specifics of what guest requests are being made generally does not
affect how they are handled by the hypervisor, which only serves as a
proxy for the guest requests and firmware responses.
Implement handling for these events.
When an SNP Guest Request is issued, the guest will provide its own
request/response pages, which could in theory be passed along directly
to firmware. However, these pages would need special care:
- Both pages are from shared guest memory, so they need to be
protected from migration/etc. occurring while firmware reads/writes
to them. At a minimum, this requires elevating the ref counts and
potentially needing an explicit pinning of the memory. This places
additional restrictions on what type of memory backends userspace
can use for shared guest memory since there would be some reliance
on using refcounted pages.
- The response page needs to be switched to Firmware-owned state
before the firmware can write to it, which can lead to potential
host RMP #PFs if the guest is misbehaved and hands the host a
guest page that KVM is writing to for other reasons (e.g. virtio
buffers).
Both of these issues can be avoided completely by using
separately-allocated bounce pages for both the request/response pages
and passing those to firmware instead. So that's the approach taken
here.
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Co-developed-by: Alexey Kardashevskiy <aik@amd.com>
Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
[mdr: ensure FW command failures are indicated to guest, drop extended
request handling to be re-written as separate patch, massage commit]
Signed-off-by: Michael Roth <michael.roth@amd.com>
Message-ID: <20240701223148.3798365-2-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-07-01 17:31:46 -05:00
|
|
|
snp_guest_req_cleanup(kvm);
|
|
|
|
|
2024-05-01 03:51:55 -05:00
|
|
|
/*
|
|
|
|
* Decomission handles unbinding of the ASID. If it fails for
|
|
|
|
* some unexpected reason, just leak the ASID.
|
|
|
|
*/
|
|
|
|
if (snp_decommission_context(kvm))
|
|
|
|
return;
|
|
|
|
} else {
|
|
|
|
sev_unbind_asid(kvm, sev->handle);
|
|
|
|
}
|
|
|
|
|
2021-03-29 21:42:06 -07:00
|
|
|
sev_asid_free(sev);
|
2020-03-24 10:41:54 +01:00
|
|
|
}
|
|
|
|
|
2021-04-21 19:11:15 -07:00
|
|
|
void __init sev_set_cpu_caps(void)
|
|
|
|
{
|
2024-04-04 08:13:20 -04:00
|
|
|
if (sev_enabled) {
|
2024-04-04 08:13:11 -04:00
|
|
|
kvm_cpu_cap_set(X86_FEATURE_SEV);
|
2024-04-04 08:13:20 -04:00
|
|
|
kvm_caps.supported_vm_types |= BIT(KVM_X86_SEV_VM);
|
|
|
|
}
|
|
|
|
if (sev_es_enabled) {
|
2024-04-04 08:13:11 -04:00
|
|
|
kvm_cpu_cap_set(X86_FEATURE_SEV_ES);
|
2024-04-04 08:13:20 -04:00
|
|
|
kvm_caps.supported_vm_types |= BIT(KVM_X86_SEV_ES_VM);
|
|
|
|
}
|
2024-05-01 03:51:54 -05:00
|
|
|
if (sev_snp_enabled) {
|
|
|
|
kvm_cpu_cap_set(X86_FEATURE_SEV_SNP);
|
|
|
|
kvm_caps.supported_vm_types |= BIT(KVM_X86_SNP_VM);
|
|
|
|
}
|
2021-04-21 19:11:15 -07:00
|
|
|
}
|
|
|
|
|
2025-05-12 22:16:34 +00:00
|
|
|
static bool is_sev_snp_initialized(void)
|
|
|
|
{
|
|
|
|
struct sev_user_data_snp_status *status;
|
|
|
|
struct sev_data_snp_addr buf;
|
|
|
|
bool initialized = false;
|
|
|
|
int ret, error = 0;
|
|
|
|
|
|
|
|
status = snp_alloc_firmware_page(GFP_KERNEL | __GFP_ZERO);
|
|
|
|
if (!status)
|
|
|
|
return false;
|
|
|
|
|
|
|
|
buf.address = __psp_pa(status);
|
|
|
|
ret = sev_do_cmd(SEV_CMD_SNP_PLATFORM_STATUS, &buf, &error);
|
|
|
|
if (ret) {
|
|
|
|
pr_err("SEV: SNP_PLATFORM_STATUS failed ret=%d, fw_error=%d (%#x)\n",
|
|
|
|
ret, error, error);
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
|
|
|
initialized = !!status->state;
|
|
|
|
|
|
|
|
out:
|
|
|
|
snp_free_firmware_page(status);
|
|
|
|
|
|
|
|
return initialized;
|
|
|
|
}
|
|
|
|
|
2020-12-10 11:09:38 -06:00
|
|
|
void __init sev_hardware_setup(void)
|
2020-03-24 10:41:54 +01:00
|
|
|
{
|
2021-03-29 21:42:06 -07:00
|
|
|
unsigned int eax, ebx, ecx, edx, sev_asid_count, sev_es_asid_count;
|
2025-03-24 21:15:31 +00:00
|
|
|
struct sev_platform_init_args init_args = {0};
|
2024-05-01 03:51:54 -05:00
|
|
|
bool sev_snp_supported = false;
|
2020-12-10 11:09:38 -06:00
|
|
|
bool sev_es_supported = false;
|
|
|
|
bool sev_supported = false;
|
|
|
|
|
2023-08-24 18:36:19 -07:00
|
|
|
if (!sev_enabled || !npt_enabled || !nrips)
|
2021-04-21 19:11:14 -07:00
|
|
|
goto out;
|
|
|
|
|
2022-01-20 01:07:14 +00:00
|
|
|
/*
|
|
|
|
* SEV must obviously be supported in hardware. Sanity check that the
|
|
|
|
* CPU supports decode assists, which is mandatory for SEV guests to
|
2023-10-18 12:36:17 -07:00
|
|
|
* support instruction emulation. Ditto for flushing by ASID, as SEV
|
|
|
|
* guests are bound to a single ASID, i.e. KVM can't rotate to a new
|
|
|
|
* ASID to effect a TLB flush.
|
2022-01-20 01:07:14 +00:00
|
|
|
*/
|
|
|
|
if (!boot_cpu_has(X86_FEATURE_SEV) ||
|
2023-10-18 12:36:17 -07:00
|
|
|
WARN_ON_ONCE(!boot_cpu_has(X86_FEATURE_DECODEASSISTS)) ||
|
|
|
|
WARN_ON_ONCE(!boot_cpu_has(X86_FEATURE_FLUSHBYASID)))
|
2020-12-10 11:09:38 -06:00
|
|
|
goto out;
|
|
|
|
|
2025-02-10 22:54:02 +00:00
|
|
|
/*
|
|
|
|
* The kernel's initcall infrastructure lacks the ability to express
|
|
|
|
* dependencies between initcalls, whereas the modules infrastructure
|
|
|
|
* automatically handles dependencies via symbol loading. Ensure the
|
|
|
|
* PSP SEV driver is initialized before proceeding if KVM is built-in,
|
|
|
|
* as the dependency isn't handled by the initcall infrastructure.
|
|
|
|
*/
|
|
|
|
if (IS_BUILTIN(CONFIG_KVM_AMD) && sev_module_init())
|
|
|
|
goto out;
|
|
|
|
|
2020-12-10 11:09:38 -06:00
|
|
|
/* Retrieve SEV CPUID information */
|
|
|
|
cpuid(0x8000001f, &eax, &ebx, &ecx, &edx);
|
|
|
|
|
2020-12-10 11:09:49 -06:00
|
|
|
/* Set encryption bit location for SEV-ES guests */
|
|
|
|
sev_enc_bit = ebx & 0x3f;
|
|
|
|
|
2020-03-24 10:41:54 +01:00
|
|
|
/* Maximum number of encrypted guests supported simultaneously */
|
2020-12-10 11:09:38 -06:00
|
|
|
max_sev_asid = ecx;
|
2021-04-21 19:11:21 -07:00
|
|
|
if (!max_sev_asid)
|
2020-12-10 11:09:38 -06:00
|
|
|
goto out;
|
2020-03-24 10:41:54 +01:00
|
|
|
|
|
|
|
/* Minimum ASID value that should be used for SEV guest */
|
2020-12-10 11:09:38 -06:00
|
|
|
min_sev_asid = edx;
|
2021-04-15 15:53:55 +00:00
|
|
|
sev_me_mask = 1UL << (ebx & 0x3f);
|
2020-03-24 10:41:54 +01:00
|
|
|
|
2021-08-02 11:09:03 -07:00
|
|
|
/*
|
|
|
|
* Initialize SEV ASID bitmaps. Allocate space for ASID 0 in the bitmap,
|
|
|
|
* even though it's never used, so that the bitmap is indexed by the
|
|
|
|
* actual ASID.
|
|
|
|
*/
|
|
|
|
nr_asids = max_sev_asid + 1;
|
|
|
|
sev_asid_bitmap = bitmap_zalloc(nr_asids, GFP_KERNEL);
|
2020-03-24 10:41:54 +01:00
|
|
|
if (!sev_asid_bitmap)
|
2020-12-10 11:09:38 -06:00
|
|
|
goto out;
|
2020-03-24 10:41:54 +01:00
|
|
|
|
2021-08-02 11:09:03 -07:00
|
|
|
sev_reclaim_asid_bitmap = bitmap_zalloc(nr_asids, GFP_KERNEL);
|
2021-04-21 19:11:12 -07:00
|
|
|
if (!sev_reclaim_asid_bitmap) {
|
|
|
|
bitmap_free(sev_asid_bitmap);
|
|
|
|
sev_asid_bitmap = NULL;
|
2020-12-10 11:09:38 -06:00
|
|
|
goto out;
|
2021-04-21 19:11:12 -07:00
|
|
|
}
|
2020-03-24 10:41:54 +01:00
|
|
|
|
2024-01-31 15:56:08 -08:00
|
|
|
if (min_sev_asid <= max_sev_asid) {
|
|
|
|
sev_asid_count = max_sev_asid - min_sev_asid + 1;
|
|
|
|
WARN_ON_ONCE(misc_cg_set_capacity(MISC_CG_RES_SEV, sev_asid_count));
|
|
|
|
}
|
2020-12-10 11:09:38 -06:00
|
|
|
sev_supported = true;
|
2020-03-24 10:41:54 +01:00
|
|
|
|
2020-12-10 11:09:38 -06:00
|
|
|
/* SEV-ES support requested? */
|
2021-04-21 19:11:17 -07:00
|
|
|
if (!sev_es_enabled)
|
2020-12-10 11:09:38 -06:00
|
|
|
goto out;
|
|
|
|
|
2022-08-03 22:49:57 +00:00
|
|
|
/*
|
|
|
|
* SEV-ES requires MMIO caching as KVM doesn't have access to the guest
|
|
|
|
* instruction stream, i.e. can't emulate in response to a #NPF and
|
|
|
|
* instead relies on #NPF(RSVD) being reflected into the guest as #VC
|
|
|
|
* (the guest can then do a #VMGEXIT to request MMIO emulation).
|
|
|
|
*/
|
|
|
|
if (!enable_mmio_caching)
|
|
|
|
goto out;
|
|
|
|
|
2020-12-10 11:09:38 -06:00
|
|
|
/* Does the CPU support SEV-ES? */
|
|
|
|
if (!boot_cpu_has(X86_FEATURE_SEV_ES))
|
|
|
|
goto out;
|
|
|
|
|
2024-05-31 04:46:43 +00:00
|
|
|
if (!lbrv) {
|
|
|
|
WARN_ONCE(!boot_cpu_has(X86_FEATURE_LBRV),
|
|
|
|
"LBRV must be present for SEV-ES support");
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
2020-12-10 11:09:38 -06:00
|
|
|
/* Has the system been allocated ASIDs for SEV-ES? */
|
|
|
|
if (min_sev_asid == 1)
|
|
|
|
goto out;
|
|
|
|
|
2021-03-29 21:42:06 -07:00
|
|
|
sev_es_asid_count = min_sev_asid - 1;
|
2023-06-06 17:44:49 -07:00
|
|
|
WARN_ON_ONCE(misc_cg_set_capacity(MISC_CG_RES_SEV_ES, sev_es_asid_count));
|
2020-12-10 11:09:38 -06:00
|
|
|
sev_es_supported = true;
|
2024-05-01 03:51:54 -05:00
|
|
|
sev_snp_supported = sev_snp_enabled && cc_platform_has(CC_ATTR_HOST_SEV_SNP);
|
2020-12-10 11:09:38 -06:00
|
|
|
|
|
|
|
out:
|
2025-05-12 22:16:34 +00:00
|
|
|
if (sev_enabled) {
|
|
|
|
init_args.probe = true;
|
|
|
|
if (sev_platform_init(&init_args))
|
|
|
|
sev_supported = sev_es_supported = sev_snp_supported = false;
|
|
|
|
else if (sev_snp_supported)
|
|
|
|
sev_snp_supported = is_sev_snp_initialized();
|
|
|
|
}
|
|
|
|
|
2023-05-22 18:12:48 +02:00
|
|
|
if (boot_cpu_has(X86_FEATURE_SEV))
|
|
|
|
pr_info("SEV %s (ASIDs %u - %u)\n",
|
2024-01-31 15:56:08 -08:00
|
|
|
sev_supported ? min_sev_asid <= max_sev_asid ? "enabled" :
|
|
|
|
"unusable" :
|
|
|
|
"disabled",
|
2023-05-22 18:12:48 +02:00
|
|
|
min_sev_asid, max_sev_asid);
|
|
|
|
if (boot_cpu_has(X86_FEATURE_SEV_ES))
|
|
|
|
pr_info("SEV-ES %s (ASIDs %u - %u)\n",
|
2024-12-27 10:44:51 +01:00
|
|
|
str_enabled_disabled(sev_es_supported),
|
2023-05-22 18:12:48 +02:00
|
|
|
min_sev_asid > 1 ? 1 : 0, min_sev_asid - 1);
|
2024-05-01 03:51:54 -05:00
|
|
|
if (boot_cpu_has(X86_FEATURE_SEV_SNP))
|
|
|
|
pr_info("SEV-SNP %s (ASIDs %u - %u)\n",
|
2024-12-27 10:44:51 +01:00
|
|
|
str_enabled_disabled(sev_snp_supported),
|
2024-05-01 03:51:54 -05:00
|
|
|
min_sev_asid > 1 ? 1 : 0, min_sev_asid - 1);
|
2023-05-22 18:12:48 +02:00
|
|
|
|
2021-04-21 19:11:17 -07:00
|
|
|
sev_enabled = sev_supported;
|
|
|
|
sev_es_enabled = sev_es_supported;
|
2024-05-01 03:51:54 -05:00
|
|
|
sev_snp_enabled = sev_snp_supported;
|
|
|
|
|
2023-06-15 16:37:54 +10:00
|
|
|
if (!sev_es_enabled || !cpu_feature_enabled(X86_FEATURE_DEBUG_SWAP) ||
|
|
|
|
!cpu_feature_enabled(X86_FEATURE_NO_NESTED_DATA_BP))
|
|
|
|
sev_es_debug_swap_enabled = false;
|
2024-04-04 08:13:15 -04:00
|
|
|
|
|
|
|
sev_supported_vmsa_features = 0;
|
|
|
|
if (sev_es_debug_swap_enabled)
|
|
|
|
sev_supported_vmsa_features |= SVM_SEV_FEAT_DEBUG_SWAP;
|
2020-03-24 10:41:54 +01:00
|
|
|
}
|
|
|
|
|
2022-01-28 00:52:07 +00:00
|
|
|
void sev_hardware_unsetup(void)
|
2020-03-24 10:41:54 +01:00
|
|
|
{
|
2021-04-21 19:11:23 -07:00
|
|
|
if (!sev_enabled)
|
2020-04-13 03:20:06 -04:00
|
|
|
return;
|
|
|
|
|
2021-04-21 19:11:25 -07:00
|
|
|
/* No need to take sev_bitmap_lock, all VMs have been destroyed. */
|
2021-08-02 11:09:03 -07:00
|
|
|
sev_flush_asids(1, max_sev_asid);
|
2021-04-21 19:11:25 -07:00
|
|
|
|
2020-03-24 10:41:54 +01:00
|
|
|
bitmap_free(sev_asid_bitmap);
|
|
|
|
bitmap_free(sev_reclaim_asid_bitmap);
|
2021-04-21 19:11:25 -07:00
|
|
|
|
2021-03-29 21:42:06 -07:00
|
|
|
misc_cg_set_capacity(MISC_CG_RES_SEV, 0);
|
|
|
|
misc_cg_set_capacity(MISC_CG_RES_SEV_ES, 0);
|
2025-03-24 21:15:31 +00:00
|
|
|
|
|
|
|
sev_platform_shutdown();
|
2020-03-24 10:41:54 +01:00
|
|
|
}
|
|
|
|
|
2021-04-21 19:11:22 -07:00
|
|
|
int sev_cpu_init(struct svm_cpu_data *sd)
|
|
|
|
{
|
2021-04-21 19:11:23 -07:00
|
|
|
if (!sev_enabled)
|
2021-04-21 19:11:22 -07:00
|
|
|
return 0;
|
|
|
|
|
2021-08-02 11:09:03 -07:00
|
|
|
sd->sev_vmcbs = kcalloc(nr_asids, sizeof(void *), GFP_KERNEL);
|
2021-04-21 19:11:22 -07:00
|
|
|
if (!sd->sev_vmcbs)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
|
|
|
return 0;
|
2020-03-24 10:41:54 +01:00
|
|
|
}
|
|
|
|
|
2020-12-10 11:09:40 -06:00
|
|
|
/*
|
|
|
|
* Pages used by hardware to hold guest encrypted state must be flushed before
|
|
|
|
* returning them to the system.
|
|
|
|
*/
|
KVM: SVM: Simplify and harden helper to flush SEV guest page(s)
Rework sev_flush_guest_memory() to explicitly handle only a single page,
and harden it to fall back to WBINVD if VM_PAGE_FLUSH fails. Per-page
flushing is currently used only to flush the VMSA, and in its current
form, the helper is completely broken with respect to flushing actual
guest memory, i.e. won't work correctly for an arbitrary memory range.
VM_PAGE_FLUSH takes a host virtual address, and is subject to normal page
walks, i.e. will fault if the address is not present in the host page
tables or does not have the correct permissions. Current AMD CPUs also
do not honor SMAP overrides (undocumented in kernel versions of the APM),
so passing in a userspace address is completely out of the question. In
other words, KVM would need to manually walk the host page tables to get
the pfn, ensure the pfn is stable, and then use the direct map to invoke
VM_PAGE_FLUSH. And the latter might not even work, e.g. if userspace is
particularly evil/clever and backs the guest with Secret Memory (which
unmaps memory from the direct map).
Signed-off-by: Sean Christopherson <seanjc@google.com>
Fixes: add5e2f04541 ("KVM: SVM: Add support for the SEV-ES VMSA")
Reported-by: Mingwei Zhang <mizhang@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Message-Id: <20220421031407.2516575-2-mizhang@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-21 03:14:05 +00:00
|
|
|
static void sev_flush_encrypted_page(struct kvm_vcpu *vcpu, void *va)
|
2020-12-10 11:09:40 -06:00
|
|
|
{
|
2024-01-31 15:56:07 -08:00
|
|
|
unsigned int asid = sev_get_asid(vcpu->kvm);
|
KVM: SVM: Simplify and harden helper to flush SEV guest page(s)
Rework sev_flush_guest_memory() to explicitly handle only a single page,
and harden it to fall back to WBINVD if VM_PAGE_FLUSH fails. Per-page
flushing is currently used only to flush the VMSA, and in its current
form, the helper is completely broken with respect to flushing actual
guest memory, i.e. won't work correctly for an arbitrary memory range.
VM_PAGE_FLUSH takes a host virtual address, and is subject to normal page
walks, i.e. will fault if the address is not present in the host page
tables or does not have the correct permissions. Current AMD CPUs also
do not honor SMAP overrides (undocumented in kernel versions of the APM),
so passing in a userspace address is completely out of the question. In
other words, KVM would need to manually walk the host page tables to get
the pfn, ensure the pfn is stable, and then use the direct map to invoke
VM_PAGE_FLUSH. And the latter might not even work, e.g. if userspace is
particularly evil/clever and backs the guest with Secret Memory (which
unmaps memory from the direct map).
Signed-off-by: Sean Christopherson <seanjc@google.com>
Fixes: add5e2f04541 ("KVM: SVM: Add support for the SEV-ES VMSA")
Reported-by: Mingwei Zhang <mizhang@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Message-Id: <20220421031407.2516575-2-mizhang@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-21 03:14:05 +00:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Note! The address must be a kernel address, as regular page walk
|
|
|
|
* checks are performed by VM_PAGE_FLUSH, i.e. operating on a user
|
|
|
|
* address is non-deterministic and unsafe. This function deliberately
|
|
|
|
* takes a pointer to deter passing in a user address.
|
|
|
|
*/
|
|
|
|
unsigned long addr = (unsigned long)va;
|
|
|
|
|
2020-12-10 11:09:40 -06:00
|
|
|
/*
|
2022-04-21 03:14:06 +00:00
|
|
|
* If CPU enforced cache coherency for encrypted mappings of the
|
|
|
|
* same physical page is supported, use CLFLUSHOPT instead. NOTE: cache
|
|
|
|
* flush is still needed in order to work properly with DMA devices.
|
2020-12-10 11:09:40 -06:00
|
|
|
*/
|
2022-04-21 03:14:06 +00:00
|
|
|
if (boot_cpu_has(X86_FEATURE_SME_COHERENT)) {
|
|
|
|
clflush_cache_range(va, PAGE_SIZE);
|
2020-12-10 11:09:40 -06:00
|
|
|
return;
|
2022-04-21 03:14:06 +00:00
|
|
|
}
|
2020-12-10 11:09:40 -06:00
|
|
|
|
|
|
|
/*
|
KVM: SVM: Simplify and harden helper to flush SEV guest page(s)
Rework sev_flush_guest_memory() to explicitly handle only a single page,
and harden it to fall back to WBINVD if VM_PAGE_FLUSH fails. Per-page
flushing is currently used only to flush the VMSA, and in its current
form, the helper is completely broken with respect to flushing actual
guest memory, i.e. won't work correctly for an arbitrary memory range.
VM_PAGE_FLUSH takes a host virtual address, and is subject to normal page
walks, i.e. will fault if the address is not present in the host page
tables or does not have the correct permissions. Current AMD CPUs also
do not honor SMAP overrides (undocumented in kernel versions of the APM),
so passing in a userspace address is completely out of the question. In
other words, KVM would need to manually walk the host page tables to get
the pfn, ensure the pfn is stable, and then use the direct map to invoke
VM_PAGE_FLUSH. And the latter might not even work, e.g. if userspace is
particularly evil/clever and backs the guest with Secret Memory (which
unmaps memory from the direct map).
Signed-off-by: Sean Christopherson <seanjc@google.com>
Fixes: add5e2f04541 ("KVM: SVM: Add support for the SEV-ES VMSA")
Reported-by: Mingwei Zhang <mizhang@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Message-Id: <20220421031407.2516575-2-mizhang@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-21 03:14:05 +00:00
|
|
|
* VM Page Flush takes a host virtual address and a guest ASID. Fall
|
2025-05-22 16:37:31 -07:00
|
|
|
* back to full writeback of caches if this faults so as not to make
|
|
|
|
* any problems worse by leaving stale encrypted data in the cache.
|
2020-12-10 11:09:40 -06:00
|
|
|
*/
|
2025-04-09 22:28:57 +02:00
|
|
|
if (WARN_ON_ONCE(wrmsrq_safe(MSR_AMD64_VM_PAGE_FLUSH, addr | asid)))
|
2025-05-22 16:37:31 -07:00
|
|
|
goto do_sev_writeback_caches;
|
2020-12-10 11:09:40 -06:00
|
|
|
|
KVM: SVM: Simplify and harden helper to flush SEV guest page(s)
Rework sev_flush_guest_memory() to explicitly handle only a single page,
and harden it to fall back to WBINVD if VM_PAGE_FLUSH fails. Per-page
flushing is currently used only to flush the VMSA, and in its current
form, the helper is completely broken with respect to flushing actual
guest memory, i.e. won't work correctly for an arbitrary memory range.
VM_PAGE_FLUSH takes a host virtual address, and is subject to normal page
walks, i.e. will fault if the address is not present in the host page
tables or does not have the correct permissions. Current AMD CPUs also
do not honor SMAP overrides (undocumented in kernel versions of the APM),
so passing in a userspace address is completely out of the question. In
other words, KVM would need to manually walk the host page tables to get
the pfn, ensure the pfn is stable, and then use the direct map to invoke
VM_PAGE_FLUSH. And the latter might not even work, e.g. if userspace is
particularly evil/clever and backs the guest with Secret Memory (which
unmaps memory from the direct map).
Signed-off-by: Sean Christopherson <seanjc@google.com>
Fixes: add5e2f04541 ("KVM: SVM: Add support for the SEV-ES VMSA")
Reported-by: Mingwei Zhang <mizhang@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Message-Id: <20220421031407.2516575-2-mizhang@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-21 03:14:05 +00:00
|
|
|
return;
|
2020-12-10 11:09:40 -06:00
|
|
|
|
2025-05-22 16:37:31 -07:00
|
|
|
do_sev_writeback_caches:
|
KVM: SVM: Flush cache only on CPUs running SEV guest
On AMD CPUs without ensuring cache consistency, each memory page
reclamation in an SEV guest triggers a call to do WBNOINVD/WBINVD on all
CPUs, thereby affecting the performance of other programs on the host.
Typically, an AMD server may have 128 cores or more, while the SEV guest
might only utilize 8 of these cores. Meanwhile, host can use qemu-affinity
to bind these 8 vCPUs to specific physical CPUs.
Therefore, keeping a record of the physical core numbers each time a vCPU
runs can help avoid flushing the cache for all CPUs every time.
Take care to allocate the cpumask used to track which CPUs have run a
vCPU when copying or moving an "encryption context", as nothing guarantees
memory in a mirror VM is a strict subset of the ASID owner, and the
destination VM for intrahost migration needs to maintain it's own set of
CPUs. E.g. for intrahost migration, if a CPU was used for the source VM
but not the destination VM, then it can only have cached memory that was
accessible to the source VM. And a CPU that was run in the source is also
used by the destination is no different than a CPU that was run in the
destination only.
Note, KVM is guaranteed to do flush caches prior to sev_vm_destroy(),
thanks to kvm_arch_guest_memory_reclaimed for SEV and SEV-ES, and
kvm_arch_gmem_invalidate() for SEV-SNP. I.e. it's safe to free the
cpumask prior to unregistering encrypted regions and freeing the ASID.
Opportunistically clean up sev_vm_destroy()'s comment regarding what is
(implicitly, what isn't) skipped for mirror VMs.
Cc: Srikanth Aithal <sraithal@amd.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Zheyun Shen <szy0127@sjtu.edu.cn>
Link: https://lore.kernel.org/r/20250522233733.3176144-9-seanjc@google.com
Link: https://lore.kernel.org/all/935a82e3-f7ad-47d7-aaaf-f3d2b62ed768@amd.com
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-05-22 16:37:32 -07:00
|
|
|
sev_writeback_caches(vcpu->kvm);
|
2020-12-10 11:09:40 -06:00
|
|
|
}
|
|
|
|
|
2022-04-21 03:14:07 +00:00
|
|
|
void sev_guest_memory_reclaimed(struct kvm *kvm)
|
|
|
|
{
|
2024-05-01 03:52:06 -05:00
|
|
|
/*
|
|
|
|
* With SNP+gmem, private/encrypted memory is unreachable via the
|
2025-05-22 16:37:31 -07:00
|
|
|
* hva-based mmu notifiers, i.e. these events are explicitly scoped to
|
|
|
|
* shared pages, where there's no need to flush caches.
|
2024-05-01 03:52:06 -05:00
|
|
|
*/
|
|
|
|
if (!sev_guest(kvm) || sev_snp_guest(kvm))
|
2022-04-21 03:14:07 +00:00
|
|
|
return;
|
|
|
|
|
KVM: SVM: Flush cache only on CPUs running SEV guest
On AMD CPUs without ensuring cache consistency, each memory page
reclamation in an SEV guest triggers a call to do WBNOINVD/WBINVD on all
CPUs, thereby affecting the performance of other programs on the host.
Typically, an AMD server may have 128 cores or more, while the SEV guest
might only utilize 8 of these cores. Meanwhile, host can use qemu-affinity
to bind these 8 vCPUs to specific physical CPUs.
Therefore, keeping a record of the physical core numbers each time a vCPU
runs can help avoid flushing the cache for all CPUs every time.
Take care to allocate the cpumask used to track which CPUs have run a
vCPU when copying or moving an "encryption context", as nothing guarantees
memory in a mirror VM is a strict subset of the ASID owner, and the
destination VM for intrahost migration needs to maintain it's own set of
CPUs. E.g. for intrahost migration, if a CPU was used for the source VM
but not the destination VM, then it can only have cached memory that was
accessible to the source VM. And a CPU that was run in the source is also
used by the destination is no different than a CPU that was run in the
destination only.
Note, KVM is guaranteed to do flush caches prior to sev_vm_destroy(),
thanks to kvm_arch_guest_memory_reclaimed for SEV and SEV-ES, and
kvm_arch_gmem_invalidate() for SEV-SNP. I.e. it's safe to free the
cpumask prior to unregistering encrypted regions and freeing the ASID.
Opportunistically clean up sev_vm_destroy()'s comment regarding what is
(implicitly, what isn't) skipped for mirror VMs.
Cc: Srikanth Aithal <sraithal@amd.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Zheyun Shen <szy0127@sjtu.edu.cn>
Link: https://lore.kernel.org/r/20250522233733.3176144-9-seanjc@google.com
Link: https://lore.kernel.org/all/935a82e3-f7ad-47d7-aaaf-f3d2b62ed768@amd.com
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-05-22 16:37:32 -07:00
|
|
|
sev_writeback_caches(kvm);
|
2022-04-21 03:14:07 +00:00
|
|
|
}
|
|
|
|
|
2020-12-10 11:09:40 -06:00
|
|
|
void sev_free_vcpu(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
|
|
|
struct vcpu_svm *svm;
|
|
|
|
|
|
|
|
if (!sev_es_guest(vcpu->kvm))
|
|
|
|
return;
|
|
|
|
|
|
|
|
svm = to_svm(vcpu);
|
|
|
|
|
2024-05-01 03:51:57 -05:00
|
|
|
/*
|
|
|
|
* If it's an SNP guest, then the VMSA was marked in the RMP table as
|
|
|
|
* a guest-owned page. Transition the page to hypervisor state before
|
|
|
|
* releasing it back to the system.
|
|
|
|
*/
|
|
|
|
if (sev_snp_guest(vcpu->kvm)) {
|
|
|
|
u64 pfn = __pa(svm->sev_es.vmsa) >> PAGE_SHIFT;
|
|
|
|
|
2024-05-28 15:58:09 -05:00
|
|
|
if (kvm_rmp_make_shared(vcpu->kvm, pfn, PG_LEVEL_4K))
|
2024-05-01 03:51:57 -05:00
|
|
|
goto skip_vmsa_free;
|
|
|
|
}
|
|
|
|
|
2020-12-10 11:09:40 -06:00
|
|
|
if (vcpu->arch.guest_state_protected)
|
KVM: SVM: Simplify and harden helper to flush SEV guest page(s)
Rework sev_flush_guest_memory() to explicitly handle only a single page,
and harden it to fall back to WBINVD if VM_PAGE_FLUSH fails. Per-page
flushing is currently used only to flush the VMSA, and in its current
form, the helper is completely broken with respect to flushing actual
guest memory, i.e. won't work correctly for an arbitrary memory range.
VM_PAGE_FLUSH takes a host virtual address, and is subject to normal page
walks, i.e. will fault if the address is not present in the host page
tables or does not have the correct permissions. Current AMD CPUs also
do not honor SMAP overrides (undocumented in kernel versions of the APM),
so passing in a userspace address is completely out of the question. In
other words, KVM would need to manually walk the host page tables to get
the pfn, ensure the pfn is stable, and then use the direct map to invoke
VM_PAGE_FLUSH. And the latter might not even work, e.g. if userspace is
particularly evil/clever and backs the guest with Secret Memory (which
unmaps memory from the direct map).
Signed-off-by: Sean Christopherson <seanjc@google.com>
Fixes: add5e2f04541 ("KVM: SVM: Add support for the SEV-ES VMSA")
Reported-by: Mingwei Zhang <mizhang@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Message-Id: <20220421031407.2516575-2-mizhang@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-21 03:14:05 +00:00
|
|
|
sev_flush_encrypted_page(vcpu, svm->sev_es.vmsa);
|
|
|
|
|
2021-10-21 10:42:59 -07:00
|
|
|
__free_page(virt_to_page(svm->sev_es.vmsa));
|
2020-12-10 11:09:53 -06:00
|
|
|
|
2024-05-01 03:51:57 -05:00
|
|
|
skip_vmsa_free:
|
2021-10-21 10:42:59 -07:00
|
|
|
if (svm->sev_es.ghcb_sa_free)
|
2021-11-09 22:23:50 +00:00
|
|
|
kvfree(svm->sev_es.ghcb_sa);
|
2020-12-10 11:09:40 -06:00
|
|
|
}
|
|
|
|
|
2025-04-28 13:55:31 -05:00
|
|
|
static u64 kvm_ghcb_get_sw_exit_code(struct vmcb_control_area *control)
|
|
|
|
{
|
|
|
|
return (((u64)control->exit_code_hi) << 32) | control->exit_code;
|
|
|
|
}
|
|
|
|
|
2020-12-10 11:09:47 -06:00
|
|
|
static void dump_ghcb(struct vcpu_svm *svm)
|
|
|
|
{
|
2025-04-28 13:55:31 -05:00
|
|
|
struct vmcb_control_area *control = &svm->vmcb->control;
|
2020-12-10 11:09:47 -06:00
|
|
|
unsigned int nbits;
|
|
|
|
|
|
|
|
/* Re-use the dump_invalid_vmcb module parameter */
|
|
|
|
if (!dump_invalid_vmcb) {
|
|
|
|
pr_warn_ratelimited("set kvm_amd.dump_invalid_vmcb=1 to dump internal KVM state.\n");
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
2025-04-28 13:55:31 -05:00
|
|
|
nbits = sizeof(svm->sev_es.valid_bitmap) * 8;
|
2020-12-10 11:09:47 -06:00
|
|
|
|
2025-04-28 13:55:31 -05:00
|
|
|
/*
|
|
|
|
* Print KVM's snapshot of the GHCB values that were (unsuccessfully)
|
|
|
|
* used to handle the exit. If the guest has since modified the GHCB
|
|
|
|
* itself, dumping the raw GHCB won't help debug why KVM was unable to
|
|
|
|
* handle the VMGEXIT that KVM observed.
|
|
|
|
*/
|
|
|
|
pr_err("GHCB (GPA=%016llx) snapshot:\n", svm->vmcb->control.ghcb_gpa);
|
2020-12-10 11:09:47 -06:00
|
|
|
pr_err("%-20s%016llx is_valid: %u\n", "sw_exit_code",
|
2025-04-28 13:55:31 -05:00
|
|
|
kvm_ghcb_get_sw_exit_code(control), kvm_ghcb_sw_exit_code_is_valid(svm));
|
2020-12-10 11:09:47 -06:00
|
|
|
pr_err("%-20s%016llx is_valid: %u\n", "sw_exit_info_1",
|
2025-04-28 13:55:31 -05:00
|
|
|
control->exit_info_1, kvm_ghcb_sw_exit_info_1_is_valid(svm));
|
2020-12-10 11:09:47 -06:00
|
|
|
pr_err("%-20s%016llx is_valid: %u\n", "sw_exit_info_2",
|
2025-04-28 13:55:31 -05:00
|
|
|
control->exit_info_2, kvm_ghcb_sw_exit_info_2_is_valid(svm));
|
2020-12-10 11:09:47 -06:00
|
|
|
pr_err("%-20s%016llx is_valid: %u\n", "sw_scratch",
|
2025-04-28 13:55:31 -05:00
|
|
|
svm->sev_es.sw_scratch, kvm_ghcb_sw_scratch_is_valid(svm));
|
|
|
|
pr_err("%-20s%*pb\n", "valid_bitmap", nbits, svm->sev_es.valid_bitmap);
|
2020-12-10 11:09:47 -06:00
|
|
|
}
|
|
|
|
|
|
|
|
static void sev_es_sync_to_ghcb(struct vcpu_svm *svm)
|
|
|
|
{
|
|
|
|
struct kvm_vcpu *vcpu = &svm->vcpu;
|
2021-10-21 10:42:59 -07:00
|
|
|
struct ghcb *ghcb = svm->sev_es.ghcb;
|
2020-12-10 11:09:47 -06:00
|
|
|
|
|
|
|
/*
|
|
|
|
* The GHCB protocol so far allows for the following data
|
|
|
|
* to be returned:
|
|
|
|
* GPRs RAX, RBX, RCX, RDX
|
|
|
|
*
|
2021-01-22 15:50:47 -08:00
|
|
|
* Copy their values, even if they may not have been written during the
|
|
|
|
* VM-Exit. It's the guest's responsibility to not consume random data.
|
2020-12-10 11:09:47 -06:00
|
|
|
*/
|
2021-01-22 15:50:47 -08:00
|
|
|
ghcb_set_rax(ghcb, vcpu->arch.regs[VCPU_REGS_RAX]);
|
|
|
|
ghcb_set_rbx(ghcb, vcpu->arch.regs[VCPU_REGS_RBX]);
|
|
|
|
ghcb_set_rcx(ghcb, vcpu->arch.regs[VCPU_REGS_RCX]);
|
|
|
|
ghcb_set_rdx(ghcb, vcpu->arch.regs[VCPU_REGS_RDX]);
|
2020-12-10 11:09:47 -06:00
|
|
|
}
|
|
|
|
|
|
|
|
static void sev_es_sync_from_ghcb(struct vcpu_svm *svm)
|
|
|
|
{
|
|
|
|
struct vmcb_control_area *control = &svm->vmcb->control;
|
|
|
|
struct kvm_vcpu *vcpu = &svm->vcpu;
|
2021-10-21 10:42:59 -07:00
|
|
|
struct ghcb *ghcb = svm->sev_es.ghcb;
|
2020-12-10 11:09:47 -06:00
|
|
|
u64 exit_code;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* The GHCB protocol so far allows for the following data
|
|
|
|
* to be supplied:
|
|
|
|
* GPRs RAX, RBX, RCX, RDX
|
|
|
|
* XCR0
|
|
|
|
* CPL
|
|
|
|
*
|
|
|
|
* VMMCALL allows the guest to provide extra registers. KVM also
|
|
|
|
* expects RSI for hypercalls, so include that, too.
|
|
|
|
*
|
|
|
|
* Copy their values to the appropriate location if supplied.
|
|
|
|
*/
|
|
|
|
memset(vcpu->arch.regs, 0, sizeof(vcpu->arch.regs));
|
|
|
|
|
2023-08-04 12:42:45 -04:00
|
|
|
BUILD_BUG_ON(sizeof(svm->sev_es.valid_bitmap) != sizeof(ghcb->save.valid_bitmap));
|
|
|
|
memcpy(&svm->sev_es.valid_bitmap, &ghcb->save.valid_bitmap, sizeof(ghcb->save.valid_bitmap));
|
2020-12-10 11:09:47 -06:00
|
|
|
|
2023-08-04 12:42:45 -04:00
|
|
|
vcpu->arch.regs[VCPU_REGS_RAX] = kvm_ghcb_get_rax_if_valid(svm, ghcb);
|
|
|
|
vcpu->arch.regs[VCPU_REGS_RBX] = kvm_ghcb_get_rbx_if_valid(svm, ghcb);
|
|
|
|
vcpu->arch.regs[VCPU_REGS_RCX] = kvm_ghcb_get_rcx_if_valid(svm, ghcb);
|
|
|
|
vcpu->arch.regs[VCPU_REGS_RDX] = kvm_ghcb_get_rdx_if_valid(svm, ghcb);
|
|
|
|
vcpu->arch.regs[VCPU_REGS_RSI] = kvm_ghcb_get_rsi_if_valid(svm, ghcb);
|
2020-12-10 11:09:47 -06:00
|
|
|
|
2023-08-04 12:42:45 -04:00
|
|
|
svm->vmcb->save.cpl = kvm_ghcb_get_cpl_if_valid(svm, ghcb);
|
|
|
|
|
|
|
|
if (kvm_ghcb_xcr0_is_valid(svm)) {
|
2020-12-10 11:09:47 -06:00
|
|
|
vcpu->arch.xcr0 = ghcb_get_xcr0(ghcb);
|
KVM: x86: Defer runtime updates of dynamic CPUID bits until CPUID emulation
Defer runtime CPUID updates until the next non-faulting CPUID emulation
or KVM_GET_CPUID2, which are the only paths in KVM that consume the
dynamic entries. Deferring the updates is especially beneficial to
nested VM-Enter/VM-Exit, as KVM will almost always detect multiple state
changes, not to mention the updates don't need to be realized while L2 is
active if CPUID is being intercepted by L1 (CPUID is a mandatory intercept
on Intel, but not AMD).
Deferring CPUID updates shaves several hundred cycles from nested VMX
roundtrips, as measured from L2 executing CPUID in a tight loop:
SKX 6850 => 6450
ICX 9000 => 8800
EMR 7900 => 7700
Alternatively, KVM could update only the CPUID leaves that are affected
by the state change, e.g. update XSAVE info only if XCR0 or XSS changes,
but that adds non-trivial complexity and doesn't solve the underlying
problem of nested transitions potentially changing both XCR0 and XSS, on
both nested VM-Enter and VM-Exit.
Skipping updates entirely if L2 is active and CPUID is being intercepted
by L1 could work for the common case. However, simply skipping updates if
L2 is active is *very* subtly dangerous and complex. Most KVM updates are
triggered by changes to the current vCPU state, which may be L2 state,
whereas performing updates only for L1 would requiring detecting changes
to L1 state. KVM would need to either track relevant L1 state, or defer
runtime CPUID updates until the next nested VM-Exit. The former is ugly
and complex, while the latter comes with similar dangers to deferring all
CPUID updates, and would only address the nested VM-Enter path.
To guard against using stale data, disallow querying dynamic CPUID feature
bits, i.e. features that KVM updates at runtime, via a compile-time
assertion in guest_cpu_cap_has(). Exempt MWAIT from the rule, as the
MISC_ENABLE_NO_MWAIT means that MWAIT is _conditionally_ a dynamic CPUID
feature.
Note, the rule could be enforced for MWAIT as well, e.g. by querying guest
CPUID in kvm_emulate_monitor_mwait, but there's no obvious advtantage to
doing so, and allowing MWAIT for guest_cpuid_has() opens up a different can
of worms. MONITOR/MWAIT can't be virtualized (for a reasonable definition),
and the nature of the MWAIT_NEVER_UD_FAULTS and MISC_ENABLE_NO_MWAIT quirks
means checking X86_FEATURE_MWAIT outside of kvm_emulate_monitor_mwait() is
wrong for other reasons.
Beyond the aforementioned feature bits, the only other dynamic CPUID
(sub)leaves are the XSAVE sizes, and similar to MWAIT, consuming those
CPUID entries in KVM is all but guaranteed to be a bug. The layout for an
actual XSAVE buffer depends on the format (compacted or not) and
potentially the features that are actually enabled. E.g. see the logic in
fpstate_clear_xstate_component() needed to poke into the guest's effective
XSAVE state to clear MPX state on INIT. KVM does consume
CPUID.0xD.0.{EAX,EDX} in kvm_check_cpuid() and cpuid_get_supported_xcr0(),
but not EBX, which is the only dynamic output register in the leaf.
Link: https://lore.kernel.org/r/20241211013302.1347853-6-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-10 17:33:02 -08:00
|
|
|
vcpu->arch.cpuid_dynamic_bits_dirty = true;
|
2020-12-10 11:09:47 -06:00
|
|
|
}
|
|
|
|
|
|
|
|
/* Copy the GHCB exit information into the VMCB fields */
|
|
|
|
exit_code = ghcb_get_sw_exit_code(ghcb);
|
|
|
|
control->exit_code = lower_32_bits(exit_code);
|
|
|
|
control->exit_code_hi = upper_32_bits(exit_code);
|
|
|
|
control->exit_info_1 = ghcb_get_sw_exit_info_1(ghcb);
|
|
|
|
control->exit_info_2 = ghcb_get_sw_exit_info_2(ghcb);
|
2023-08-04 12:42:45 -04:00
|
|
|
svm->sev_es.sw_scratch = kvm_ghcb_get_sw_scratch_if_valid(svm, ghcb);
|
2020-12-10 11:09:47 -06:00
|
|
|
|
|
|
|
/* Clear the valid entries fields */
|
|
|
|
memset(ghcb->save.valid_bitmap, 0, sizeof(ghcb->save.valid_bitmap));
|
|
|
|
}
|
|
|
|
|
KVM: SVM: Exit to userspace on ENOMEM/EFAULT GHCB errors
Exit to userspace if setup_vmgexit_scratch() fails due to OOM or because
copying data from guest (userspace) memory failed/faulted. The OOM
scenario is clearcut, it's userspace's decision as to whether it should
terminate the guest, free memory, etc...
As for -EFAULT, arguably, any guest issue is a violation of the guest's
contract with userspace, and thus userspace needs to decide how to
proceed. E.g. userspace defines what is RAM vs. MMIO and communicates
that directly to the guest, KVM is not involved in deciding what is/isn't
RAM nor in communicating that information to the guest. If the scratch
GPA doesn't resolve to a memslot, then the guest is not honoring the
memory configuration as defined by userspace.
And if userspace unmaps an hva for whatever reason, then exiting to
userspace with -EFAULT is absolutely the right thing to do. KVM's ABI
currently sucks and doesn't provide enough information to act on the
-EFAULT, but that will hopefully be remedied in the future as there are
multiple use cases, e.g. uffd and virtiofs truncation, that shouldn't
require any work in KVM beyond returning -EFAULT with a small amount of
metadata.
KVM could define its ABI such that failure to access the scratch area is
reflected into the guest, i.e. establish a contract with userspace, but
that's undesirable as it limits KVM's options in the future, e.g. in the
potential uffd case any failure on a uaccess needs to kick out to
userspace. KVM does have several cases where it reflects these errors
into the guest, e.g. kvm_pv_clock_pairing() and Hyper-V emulation, but
KVM would preferably "fix" those instead of propagating the falsehood
that any memory failure is the guest's fault.
Lastly, returning a boolean as an "error" for that a helper that isn't
named accordingly never works out well.
Fixes: ad5b353240c8 ("KVM: SVM: Do not terminate SEV-ES guests on GHCB validation failure")
Cc: Alper Gun <alpergun@google.com>
Cc: Peter Gonda <pgonda@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220225205209.3881130-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-25 20:52:09 +00:00
|
|
|
static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
|
2020-12-10 11:09:47 -06:00
|
|
|
{
|
2023-08-04 12:56:36 -04:00
|
|
|
struct vmcb_control_area *control = &svm->vmcb->control;
|
|
|
|
struct kvm_vcpu *vcpu = &svm->vcpu;
|
2021-12-02 12:52:05 -06:00
|
|
|
u64 exit_code;
|
|
|
|
u64 reason;
|
2020-12-10 11:09:47 -06:00
|
|
|
|
|
|
|
/*
|
2021-12-02 12:52:05 -06:00
|
|
|
* Retrieve the exit code now even though it may not be marked valid
|
2020-12-10 11:09:47 -06:00
|
|
|
* as it could help with debugging.
|
|
|
|
*/
|
2023-08-04 12:56:36 -04:00
|
|
|
exit_code = kvm_ghcb_get_sw_exit_code(control);
|
2020-12-10 11:09:47 -06:00
|
|
|
|
2021-12-02 12:52:05 -06:00
|
|
|
/* Only GHCB Usage code 0 is supported */
|
2023-08-04 13:01:43 -04:00
|
|
|
if (svm->sev_es.ghcb->ghcb_usage) {
|
2021-12-02 12:52:05 -06:00
|
|
|
reason = GHCB_ERR_INVALID_USAGE;
|
|
|
|
goto vmgexit_err;
|
|
|
|
}
|
|
|
|
|
|
|
|
reason = GHCB_ERR_MISSING_INPUT;
|
|
|
|
|
2023-08-04 12:42:45 -04:00
|
|
|
if (!kvm_ghcb_sw_exit_code_is_valid(svm) ||
|
|
|
|
!kvm_ghcb_sw_exit_info_1_is_valid(svm) ||
|
|
|
|
!kvm_ghcb_sw_exit_info_2_is_valid(svm))
|
2020-12-10 11:09:47 -06:00
|
|
|
goto vmgexit_err;
|
|
|
|
|
2023-08-04 12:56:36 -04:00
|
|
|
switch (exit_code) {
|
2020-12-10 11:09:47 -06:00
|
|
|
case SVM_EXIT_READ_DR7:
|
|
|
|
break;
|
|
|
|
case SVM_EXIT_WRITE_DR7:
|
2023-08-04 12:42:45 -04:00
|
|
|
if (!kvm_ghcb_rax_is_valid(svm))
|
2020-12-10 11:09:47 -06:00
|
|
|
goto vmgexit_err;
|
|
|
|
break;
|
|
|
|
case SVM_EXIT_RDTSC:
|
|
|
|
break;
|
|
|
|
case SVM_EXIT_RDPMC:
|
2023-08-04 12:42:45 -04:00
|
|
|
if (!kvm_ghcb_rcx_is_valid(svm))
|
2020-12-10 11:09:47 -06:00
|
|
|
goto vmgexit_err;
|
|
|
|
break;
|
|
|
|
case SVM_EXIT_CPUID:
|
2023-08-04 12:42:45 -04:00
|
|
|
if (!kvm_ghcb_rax_is_valid(svm) ||
|
|
|
|
!kvm_ghcb_rcx_is_valid(svm))
|
2020-12-10 11:09:47 -06:00
|
|
|
goto vmgexit_err;
|
2023-08-04 12:56:36 -04:00
|
|
|
if (vcpu->arch.regs[VCPU_REGS_RAX] == 0xd)
|
2023-08-04 12:42:45 -04:00
|
|
|
if (!kvm_ghcb_xcr0_is_valid(svm))
|
2020-12-10 11:09:47 -06:00
|
|
|
goto vmgexit_err;
|
|
|
|
break;
|
|
|
|
case SVM_EXIT_INVD:
|
|
|
|
break;
|
|
|
|
case SVM_EXIT_IOIO:
|
2023-08-04 12:56:36 -04:00
|
|
|
if (control->exit_info_1 & SVM_IOIO_STR_MASK) {
|
2023-08-04 12:42:45 -04:00
|
|
|
if (!kvm_ghcb_sw_scratch_is_valid(svm))
|
2020-12-10 11:09:47 -06:00
|
|
|
goto vmgexit_err;
|
2020-12-10 11:09:54 -06:00
|
|
|
} else {
|
2023-08-04 12:56:36 -04:00
|
|
|
if (!(control->exit_info_1 & SVM_IOIO_TYPE_MASK))
|
2023-08-04 12:42:45 -04:00
|
|
|
if (!kvm_ghcb_rax_is_valid(svm))
|
2020-12-10 11:09:54 -06:00
|
|
|
goto vmgexit_err;
|
|
|
|
}
|
2020-12-10 11:09:47 -06:00
|
|
|
break;
|
|
|
|
case SVM_EXIT_MSR:
|
2023-08-04 12:42:45 -04:00
|
|
|
if (!kvm_ghcb_rcx_is_valid(svm))
|
2020-12-10 11:09:47 -06:00
|
|
|
goto vmgexit_err;
|
2023-08-04 12:56:36 -04:00
|
|
|
if (control->exit_info_1) {
|
2023-08-04 12:42:45 -04:00
|
|
|
if (!kvm_ghcb_rax_is_valid(svm) ||
|
|
|
|
!kvm_ghcb_rdx_is_valid(svm))
|
2020-12-10 11:09:47 -06:00
|
|
|
goto vmgexit_err;
|
|
|
|
}
|
|
|
|
break;
|
|
|
|
case SVM_EXIT_VMMCALL:
|
2023-08-04 12:42:45 -04:00
|
|
|
if (!kvm_ghcb_rax_is_valid(svm) ||
|
|
|
|
!kvm_ghcb_cpl_is_valid(svm))
|
2020-12-10 11:09:47 -06:00
|
|
|
goto vmgexit_err;
|
|
|
|
break;
|
|
|
|
case SVM_EXIT_RDTSCP:
|
|
|
|
break;
|
|
|
|
case SVM_EXIT_WBINVD:
|
|
|
|
break;
|
|
|
|
case SVM_EXIT_MONITOR:
|
2023-08-04 12:42:45 -04:00
|
|
|
if (!kvm_ghcb_rax_is_valid(svm) ||
|
|
|
|
!kvm_ghcb_rcx_is_valid(svm) ||
|
|
|
|
!kvm_ghcb_rdx_is_valid(svm))
|
2020-12-10 11:09:47 -06:00
|
|
|
goto vmgexit_err;
|
|
|
|
break;
|
|
|
|
case SVM_EXIT_MWAIT:
|
2023-08-04 12:42:45 -04:00
|
|
|
if (!kvm_ghcb_rax_is_valid(svm) ||
|
|
|
|
!kvm_ghcb_rcx_is_valid(svm))
|
2020-12-10 11:09:47 -06:00
|
|
|
goto vmgexit_err;
|
|
|
|
break;
|
2020-12-10 11:09:53 -06:00
|
|
|
case SVM_VMGEXIT_MMIO_READ:
|
|
|
|
case SVM_VMGEXIT_MMIO_WRITE:
|
2023-08-04 12:42:45 -04:00
|
|
|
if (!kvm_ghcb_sw_scratch_is_valid(svm))
|
2020-12-10 11:09:53 -06:00
|
|
|
goto vmgexit_err;
|
|
|
|
break;
|
2024-05-01 03:52:02 -05:00
|
|
|
case SVM_VMGEXIT_AP_CREATION:
|
|
|
|
if (!sev_snp_guest(vcpu->kvm))
|
|
|
|
goto vmgexit_err;
|
|
|
|
if (lower_32_bits(control->exit_info_1) != SVM_VMGEXIT_AP_DESTROY)
|
|
|
|
if (!kvm_ghcb_rax_is_valid(svm))
|
|
|
|
goto vmgexit_err;
|
|
|
|
break;
|
2020-12-14 11:16:03 -05:00
|
|
|
case SVM_VMGEXIT_NMI_COMPLETE:
|
KVM: SVM: Add support for booting APs in an SEV-ES guest
Typically under KVM, an AP is booted using the INIT-SIPI-SIPI sequence,
where the guest vCPU register state is updated and then the vCPU is VMRUN
to begin execution of the AP. For an SEV-ES guest, this won't work because
the guest register state is encrypted.
Following the GHCB specification, the hypervisor must not alter the guest
register state, so KVM must track an AP/vCPU boot. Should the guest want
to park the AP, it must use the AP Reset Hold exit event in place of, for
example, a HLT loop.
First AP boot (first INIT-SIPI-SIPI sequence):
Execute the AP (vCPU) as it was initialized and measured by the SEV-ES
support. It is up to the guest to transfer control of the AP to the
proper location.
Subsequent AP boot:
KVM will expect to receive an AP Reset Hold exit event indicating that
the vCPU is being parked and will require an INIT-SIPI-SIPI sequence to
awaken it. When the AP Reset Hold exit event is received, KVM will place
the vCPU into a simulated HLT mode. Upon receiving the INIT-SIPI-SIPI
sequence, KVM will make the vCPU runnable. It is again up to the guest
to then transfer control of the AP to the proper location.
To differentiate between an actual HLT and an AP Reset Hold, a new MP
state is introduced, KVM_MP_STATE_AP_RESET_HOLD, which the vCPU is
placed in upon receiving the AP Reset Hold exit event. Additionally, to
communicate the AP Reset Hold exit event up to userspace (if needed), a
new exit reason is introduced, KVM_EXIT_AP_RESET_HOLD.
A new x86 ops function is introduced, vcpu_deliver_sipi_vector, in order
to accomplish AP booting. For VMX, vcpu_deliver_sipi_vector is set to the
original SIPI delivery function, kvm_vcpu_deliver_sipi_vector(). SVM adds
a new function that, for non SEV-ES guests, invokes the original SIPI
delivery function, kvm_vcpu_deliver_sipi_vector(), but for SEV-ES guests,
implements the logic above.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <e8fbebe8eb161ceaabdad7c01a5859a78b424d5e.1609791600.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-04 14:20:01 -06:00
|
|
|
case SVM_VMGEXIT_AP_HLT_LOOP:
|
2020-12-15 12:44:07 -05:00
|
|
|
case SVM_VMGEXIT_AP_JUMP_TABLE:
|
2020-12-10 11:09:47 -06:00
|
|
|
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
|
2024-05-01 02:10:46 -05:00
|
|
|
case SVM_VMGEXIT_HV_FEATURES:
|
2024-05-01 02:10:47 -05:00
|
|
|
case SVM_VMGEXIT_TERM_REQUEST:
|
2020-12-10 11:09:47 -06:00
|
|
|
break;
|
2024-05-01 03:52:00 -05:00
|
|
|
case SVM_VMGEXIT_PSC:
|
|
|
|
if (!sev_snp_guest(vcpu->kvm) || !kvm_ghcb_sw_scratch_is_valid(svm))
|
|
|
|
goto vmgexit_err;
|
|
|
|
break;
|
KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event
Version 2 of GHCB specification added support for the SNP Guest Request
Message NAE event. The event allows for an SEV-SNP guest to make
requests to the SEV-SNP firmware through the hypervisor using the
SNP_GUEST_REQUEST API defined in the SEV-SNP firmware specification.
This is used by guests primarily to request attestation reports from
firmware. There are other request types are available as well, but the
specifics of what guest requests are being made generally does not
affect how they are handled by the hypervisor, which only serves as a
proxy for the guest requests and firmware responses.
Implement handling for these events.
When an SNP Guest Request is issued, the guest will provide its own
request/response pages, which could in theory be passed along directly
to firmware. However, these pages would need special care:
- Both pages are from shared guest memory, so they need to be
protected from migration/etc. occurring while firmware reads/writes
to them. At a minimum, this requires elevating the ref counts and
potentially needing an explicit pinning of the memory. This places
additional restrictions on what type of memory backends userspace
can use for shared guest memory since there would be some reliance
on using refcounted pages.
- The response page needs to be switched to Firmware-owned state
before the firmware can write to it, which can lead to potential
host RMP #PFs if the guest is misbehaved and hands the host a
guest page that KVM is writing to for other reasons (e.g. virtio
buffers).
Both of these issues can be avoided completely by using
separately-allocated bounce pages for both the request/response pages
and passing those to firmware instead. So that's the approach taken
here.
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Co-developed-by: Alexey Kardashevskiy <aik@amd.com>
Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
[mdr: ensure FW command failures are indicated to guest, drop extended
request handling to be re-written as separate patch, massage commit]
Signed-off-by: Michael Roth <michael.roth@amd.com>
Message-ID: <20240701223148.3798365-2-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-07-01 17:31:46 -05:00
|
|
|
case SVM_VMGEXIT_GUEST_REQUEST:
|
2024-07-01 17:31:48 -05:00
|
|
|
case SVM_VMGEXIT_EXT_GUEST_REQUEST:
|
KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event
Version 2 of GHCB specification added support for the SNP Guest Request
Message NAE event. The event allows for an SEV-SNP guest to make
requests to the SEV-SNP firmware through the hypervisor using the
SNP_GUEST_REQUEST API defined in the SEV-SNP firmware specification.
This is used by guests primarily to request attestation reports from
firmware. There are other request types are available as well, but the
specifics of what guest requests are being made generally does not
affect how they are handled by the hypervisor, which only serves as a
proxy for the guest requests and firmware responses.
Implement handling for these events.
When an SNP Guest Request is issued, the guest will provide its own
request/response pages, which could in theory be passed along directly
to firmware. However, these pages would need special care:
- Both pages are from shared guest memory, so they need to be
protected from migration/etc. occurring while firmware reads/writes
to them. At a minimum, this requires elevating the ref counts and
potentially needing an explicit pinning of the memory. This places
additional restrictions on what type of memory backends userspace
can use for shared guest memory since there would be some reliance
on using refcounted pages.
- The response page needs to be switched to Firmware-owned state
before the firmware can write to it, which can lead to potential
host RMP #PFs if the guest is misbehaved and hands the host a
guest page that KVM is writing to for other reasons (e.g. virtio
buffers).
Both of these issues can be avoided completely by using
separately-allocated bounce pages for both the request/response pages
and passing those to firmware instead. So that's the approach taken
here.
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Co-developed-by: Alexey Kardashevskiy <aik@amd.com>
Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
[mdr: ensure FW command failures are indicated to guest, drop extended
request handling to be re-written as separate patch, massage commit]
Signed-off-by: Michael Roth <michael.roth@amd.com>
Message-ID: <20240701223148.3798365-2-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-07-01 17:31:46 -05:00
|
|
|
if (!sev_snp_guest(vcpu->kvm) ||
|
|
|
|
!PAGE_ALIGNED(control->exit_info_1) ||
|
|
|
|
!PAGE_ALIGNED(control->exit_info_2) ||
|
|
|
|
control->exit_info_1 == control->exit_info_2)
|
|
|
|
goto vmgexit_err;
|
|
|
|
break;
|
2020-12-10 11:09:47 -06:00
|
|
|
default:
|
2021-12-02 12:52:05 -06:00
|
|
|
reason = GHCB_ERR_INVALID_EVENT;
|
2020-12-10 11:09:47 -06:00
|
|
|
goto vmgexit_err;
|
|
|
|
}
|
|
|
|
|
KVM: SVM: Exit to userspace on ENOMEM/EFAULT GHCB errors
Exit to userspace if setup_vmgexit_scratch() fails due to OOM or because
copying data from guest (userspace) memory failed/faulted. The OOM
scenario is clearcut, it's userspace's decision as to whether it should
terminate the guest, free memory, etc...
As for -EFAULT, arguably, any guest issue is a violation of the guest's
contract with userspace, and thus userspace needs to decide how to
proceed. E.g. userspace defines what is RAM vs. MMIO and communicates
that directly to the guest, KVM is not involved in deciding what is/isn't
RAM nor in communicating that information to the guest. If the scratch
GPA doesn't resolve to a memslot, then the guest is not honoring the
memory configuration as defined by userspace.
And if userspace unmaps an hva for whatever reason, then exiting to
userspace with -EFAULT is absolutely the right thing to do. KVM's ABI
currently sucks and doesn't provide enough information to act on the
-EFAULT, but that will hopefully be remedied in the future as there are
multiple use cases, e.g. uffd and virtiofs truncation, that shouldn't
require any work in KVM beyond returning -EFAULT with a small amount of
metadata.
KVM could define its ABI such that failure to access the scratch area is
reflected into the guest, i.e. establish a contract with userspace, but
that's undesirable as it limits KVM's options in the future, e.g. in the
potential uffd case any failure on a uaccess needs to kick out to
userspace. KVM does have several cases where it reflects these errors
into the guest, e.g. kvm_pv_clock_pairing() and Hyper-V emulation, but
KVM would preferably "fix" those instead of propagating the falsehood
that any memory failure is the guest's fault.
Lastly, returning a boolean as an "error" for that a helper that isn't
named accordingly never works out well.
Fixes: ad5b353240c8 ("KVM: SVM: Do not terminate SEV-ES guests on GHCB validation failure")
Cc: Alper Gun <alpergun@google.com>
Cc: Peter Gonda <pgonda@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220225205209.3881130-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-25 20:52:09 +00:00
|
|
|
return 0;
|
2020-12-10 11:09:47 -06:00
|
|
|
|
|
|
|
vmgexit_err:
|
2021-12-02 12:52:05 -06:00
|
|
|
if (reason == GHCB_ERR_INVALID_USAGE) {
|
2020-12-10 11:09:47 -06:00
|
|
|
vcpu_unimpl(vcpu, "vmgexit: ghcb usage %#x is not valid\n",
|
2023-08-04 13:01:43 -04:00
|
|
|
svm->sev_es.ghcb->ghcb_usage);
|
2021-12-02 12:52:05 -06:00
|
|
|
} else if (reason == GHCB_ERR_INVALID_EVENT) {
|
|
|
|
vcpu_unimpl(vcpu, "vmgexit: exit code %#llx is not valid\n",
|
|
|
|
exit_code);
|
2020-12-10 11:09:47 -06:00
|
|
|
} else {
|
2021-12-02 12:52:05 -06:00
|
|
|
vcpu_unimpl(vcpu, "vmgexit: exit code %#llx input is not valid\n",
|
2020-12-10 11:09:47 -06:00
|
|
|
exit_code);
|
|
|
|
dump_ghcb(svm);
|
|
|
|
}
|
|
|
|
|
2025-02-25 21:39:37 +00:00
|
|
|
svm_vmgexit_bad_input(svm, reason);
|
2020-12-10 11:09:47 -06:00
|
|
|
|
KVM: SVM: Exit to userspace on ENOMEM/EFAULT GHCB errors
Exit to userspace if setup_vmgexit_scratch() fails due to OOM or because
copying data from guest (userspace) memory failed/faulted. The OOM
scenario is clearcut, it's userspace's decision as to whether it should
terminate the guest, free memory, etc...
As for -EFAULT, arguably, any guest issue is a violation of the guest's
contract with userspace, and thus userspace needs to decide how to
proceed. E.g. userspace defines what is RAM vs. MMIO and communicates
that directly to the guest, KVM is not involved in deciding what is/isn't
RAM nor in communicating that information to the guest. If the scratch
GPA doesn't resolve to a memslot, then the guest is not honoring the
memory configuration as defined by userspace.
And if userspace unmaps an hva for whatever reason, then exiting to
userspace with -EFAULT is absolutely the right thing to do. KVM's ABI
currently sucks and doesn't provide enough information to act on the
-EFAULT, but that will hopefully be remedied in the future as there are
multiple use cases, e.g. uffd and virtiofs truncation, that shouldn't
require any work in KVM beyond returning -EFAULT with a small amount of
metadata.
KVM could define its ABI such that failure to access the scratch area is
reflected into the guest, i.e. establish a contract with userspace, but
that's undesirable as it limits KVM's options in the future, e.g. in the
potential uffd case any failure on a uaccess needs to kick out to
userspace. KVM does have several cases where it reflects these errors
into the guest, e.g. kvm_pv_clock_pairing() and Hyper-V emulation, but
KVM would preferably "fix" those instead of propagating the falsehood
that any memory failure is the guest's fault.
Lastly, returning a boolean as an "error" for that a helper that isn't
named accordingly never works out well.
Fixes: ad5b353240c8 ("KVM: SVM: Do not terminate SEV-ES guests on GHCB validation failure")
Cc: Alper Gun <alpergun@google.com>
Cc: Peter Gonda <pgonda@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220225205209.3881130-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-25 20:52:09 +00:00
|
|
|
/* Resume the guest to "return" the error code. */
|
|
|
|
return 1;
|
2020-12-10 11:09:47 -06:00
|
|
|
}
|
|
|
|
|
2021-05-06 15:14:41 -05:00
|
|
|
void sev_es_unmap_ghcb(struct vcpu_svm *svm)
|
2020-12-10 11:09:47 -06:00
|
|
|
{
|
2024-05-01 02:10:45 -05:00
|
|
|
/* Clear any indication that the vCPU is in a type of AP Reset Hold */
|
|
|
|
svm->sev_es.ap_reset_hold_type = AP_RESET_HOLD_NONE;
|
|
|
|
|
2021-10-21 10:42:59 -07:00
|
|
|
if (!svm->sev_es.ghcb)
|
2020-12-10 11:09:47 -06:00
|
|
|
return;
|
|
|
|
|
2021-10-21 10:42:59 -07:00
|
|
|
if (svm->sev_es.ghcb_sa_free) {
|
2020-12-10 11:09:53 -06:00
|
|
|
/*
|
|
|
|
* The scratch area lives outside the GHCB, so there is a
|
|
|
|
* buffer that, depending on the operation performed, may
|
|
|
|
* need to be synced, then freed.
|
|
|
|
*/
|
2021-10-21 10:42:59 -07:00
|
|
|
if (svm->sev_es.ghcb_sa_sync) {
|
2020-12-10 11:09:53 -06:00
|
|
|
kvm_write_guest(svm->vcpu.kvm,
|
2023-08-04 12:42:45 -04:00
|
|
|
svm->sev_es.sw_scratch,
|
2021-10-21 10:42:59 -07:00
|
|
|
svm->sev_es.ghcb_sa,
|
|
|
|
svm->sev_es.ghcb_sa_len);
|
|
|
|
svm->sev_es.ghcb_sa_sync = false;
|
2020-12-10 11:09:53 -06:00
|
|
|
}
|
|
|
|
|
2021-11-09 22:23:50 +00:00
|
|
|
kvfree(svm->sev_es.ghcb_sa);
|
2021-10-21 10:42:59 -07:00
|
|
|
svm->sev_es.ghcb_sa = NULL;
|
|
|
|
svm->sev_es.ghcb_sa_free = false;
|
2020-12-10 11:09:53 -06:00
|
|
|
}
|
|
|
|
|
2021-10-21 10:42:59 -07:00
|
|
|
trace_kvm_vmgexit_exit(svm->vcpu.vcpu_id, svm->sev_es.ghcb);
|
2020-12-10 11:09:48 -06:00
|
|
|
|
2020-12-10 11:09:47 -06:00
|
|
|
sev_es_sync_to_ghcb(svm);
|
|
|
|
|
2024-10-10 11:23:35 -07:00
|
|
|
kvm_vcpu_unmap(&svm->vcpu, &svm->sev_es.ghcb_map);
|
2021-10-21 10:42:59 -07:00
|
|
|
svm->sev_es.ghcb = NULL;
|
2020-12-10 11:09:47 -06:00
|
|
|
}
|
|
|
|
|
2025-02-26 17:25:34 -08:00
|
|
|
int pre_sev_run(struct vcpu_svm *svm, int cpu)
|
2020-03-24 10:41:54 +01:00
|
|
|
{
|
2022-11-09 09:07:55 -05:00
|
|
|
struct svm_cpu_data *sd = per_cpu_ptr(&svm_data, cpu);
|
2025-02-26 17:25:34 -08:00
|
|
|
struct kvm *kvm = svm->vcpu.kvm;
|
|
|
|
unsigned int asid = sev_get_asid(kvm);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Reject KVM_RUN if userspace attempts to run the vCPU with an invalid
|
|
|
|
* VMSA, e.g. if userspace forces the vCPU to be RUNNABLE after an SNP
|
|
|
|
* AP Destroy event.
|
|
|
|
*/
|
|
|
|
if (sev_es_guest(kvm) && !VALID_PAGE(svm->vmcb->control.vmsa_pa))
|
|
|
|
return -EINVAL;
|
2020-03-24 10:41:54 +01:00
|
|
|
|
KVM: SVM: Flush cache only on CPUs running SEV guest
On AMD CPUs without ensuring cache consistency, each memory page
reclamation in an SEV guest triggers a call to do WBNOINVD/WBINVD on all
CPUs, thereby affecting the performance of other programs on the host.
Typically, an AMD server may have 128 cores or more, while the SEV guest
might only utilize 8 of these cores. Meanwhile, host can use qemu-affinity
to bind these 8 vCPUs to specific physical CPUs.
Therefore, keeping a record of the physical core numbers each time a vCPU
runs can help avoid flushing the cache for all CPUs every time.
Take care to allocate the cpumask used to track which CPUs have run a
vCPU when copying or moving an "encryption context", as nothing guarantees
memory in a mirror VM is a strict subset of the ASID owner, and the
destination VM for intrahost migration needs to maintain it's own set of
CPUs. E.g. for intrahost migration, if a CPU was used for the source VM
but not the destination VM, then it can only have cached memory that was
accessible to the source VM. And a CPU that was run in the source is also
used by the destination is no different than a CPU that was run in the
destination only.
Note, KVM is guaranteed to do flush caches prior to sev_vm_destroy(),
thanks to kvm_arch_guest_memory_reclaimed for SEV and SEV-ES, and
kvm_arch_gmem_invalidate() for SEV-SNP. I.e. it's safe to free the
cpumask prior to unregistering encrypted regions and freeing the ASID.
Opportunistically clean up sev_vm_destroy()'s comment regarding what is
(implicitly, what isn't) skipped for mirror VMs.
Cc: Srikanth Aithal <sraithal@amd.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Zheyun Shen <szy0127@sjtu.edu.cn>
Link: https://lore.kernel.org/r/20250522233733.3176144-9-seanjc@google.com
Link: https://lore.kernel.org/all/935a82e3-f7ad-47d7-aaaf-f3d2b62ed768@amd.com
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-05-22 16:37:32 -07:00
|
|
|
/*
|
|
|
|
* To optimize cache flushes when memory is reclaimed from an SEV VM,
|
|
|
|
* track physical CPUs that enter the guest for SEV VMs and thus can
|
|
|
|
* have encrypted, dirty data in the cache, and flush caches only for
|
|
|
|
* CPUs that have entered the guest.
|
|
|
|
*/
|
|
|
|
if (!cpumask_test_cpu(cpu, to_kvm_sev_info(kvm)->have_run_cpus))
|
|
|
|
cpumask_set_cpu(cpu, to_kvm_sev_info(kvm)->have_run_cpus);
|
|
|
|
|
2020-03-24 10:41:54 +01:00
|
|
|
/* Assign the asid allocated with this SEV guest */
|
2020-11-30 09:39:59 -05:00
|
|
|
svm->asid = asid;
|
2020-03-24 10:41:54 +01:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Flush guest TLB:
|
|
|
|
*
|
|
|
|
* 1) when different VMCB for the same ASID is to be run on the same host CPU.
|
|
|
|
* 2) or this VMCB was executed on different host CPU in previous VMRUNs.
|
|
|
|
*/
|
|
|
|
if (sd->sev_vmcbs[asid] == svm->vmcb &&
|
2020-06-03 16:56:22 -07:00
|
|
|
svm->vcpu.arch.last_vmentry_cpu == cpu)
|
2025-02-26 17:25:34 -08:00
|
|
|
return 0;
|
2020-03-24 10:41:54 +01:00
|
|
|
|
|
|
|
sd->sev_vmcbs[asid] = svm->vmcb;
|
|
|
|
svm->vmcb->control.tlb_ctl = TLB_CONTROL_FLUSH_ASID;
|
2020-06-25 10:03:23 +02:00
|
|
|
vmcb_mark_dirty(svm->vmcb, VMCB_ASID);
|
2025-02-26 17:25:34 -08:00
|
|
|
return 0;
|
2020-03-24 10:41:54 +01:00
|
|
|
}
|
2020-12-10 11:09:47 -06:00
|
|
|
|
2020-12-10 11:09:53 -06:00
|
|
|
#define GHCB_SCRATCH_AREA_LIMIT (16ULL * PAGE_SIZE)
|
KVM: SVM: Exit to userspace on ENOMEM/EFAULT GHCB errors
Exit to userspace if setup_vmgexit_scratch() fails due to OOM or because
copying data from guest (userspace) memory failed/faulted. The OOM
scenario is clearcut, it's userspace's decision as to whether it should
terminate the guest, free memory, etc...
As for -EFAULT, arguably, any guest issue is a violation of the guest's
contract with userspace, and thus userspace needs to decide how to
proceed. E.g. userspace defines what is RAM vs. MMIO and communicates
that directly to the guest, KVM is not involved in deciding what is/isn't
RAM nor in communicating that information to the guest. If the scratch
GPA doesn't resolve to a memslot, then the guest is not honoring the
memory configuration as defined by userspace.
And if userspace unmaps an hva for whatever reason, then exiting to
userspace with -EFAULT is absolutely the right thing to do. KVM's ABI
currently sucks and doesn't provide enough information to act on the
-EFAULT, but that will hopefully be remedied in the future as there are
multiple use cases, e.g. uffd and virtiofs truncation, that shouldn't
require any work in KVM beyond returning -EFAULT with a small amount of
metadata.
KVM could define its ABI such that failure to access the scratch area is
reflected into the guest, i.e. establish a contract with userspace, but
that's undesirable as it limits KVM's options in the future, e.g. in the
potential uffd case any failure on a uaccess needs to kick out to
userspace. KVM does have several cases where it reflects these errors
into the guest, e.g. kvm_pv_clock_pairing() and Hyper-V emulation, but
KVM would preferably "fix" those instead of propagating the falsehood
that any memory failure is the guest's fault.
Lastly, returning a boolean as an "error" for that a helper that isn't
named accordingly never works out well.
Fixes: ad5b353240c8 ("KVM: SVM: Do not terminate SEV-ES guests on GHCB validation failure")
Cc: Alper Gun <alpergun@google.com>
Cc: Peter Gonda <pgonda@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220225205209.3881130-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-25 20:52:09 +00:00
|
|
|
static int setup_vmgexit_scratch(struct vcpu_svm *svm, bool sync, u64 len)
|
2020-12-10 11:09:53 -06:00
|
|
|
{
|
|
|
|
struct vmcb_control_area *control = &svm->vmcb->control;
|
|
|
|
u64 ghcb_scratch_beg, ghcb_scratch_end;
|
|
|
|
u64 scratch_gpa_beg, scratch_gpa_end;
|
|
|
|
void *scratch_va;
|
|
|
|
|
2023-08-04 12:42:45 -04:00
|
|
|
scratch_gpa_beg = svm->sev_es.sw_scratch;
|
2020-12-10 11:09:53 -06:00
|
|
|
if (!scratch_gpa_beg) {
|
|
|
|
pr_err("vmgexit: scratch gpa not provided\n");
|
2021-12-02 12:52:05 -06:00
|
|
|
goto e_scratch;
|
2020-12-10 11:09:53 -06:00
|
|
|
}
|
|
|
|
|
|
|
|
scratch_gpa_end = scratch_gpa_beg + len;
|
|
|
|
if (scratch_gpa_end < scratch_gpa_beg) {
|
|
|
|
pr_err("vmgexit: scratch length (%#llx) not valid for scratch address (%#llx)\n",
|
|
|
|
len, scratch_gpa_beg);
|
2021-12-02 12:52:05 -06:00
|
|
|
goto e_scratch;
|
2020-12-10 11:09:53 -06:00
|
|
|
}
|
|
|
|
|
|
|
|
if ((scratch_gpa_beg & PAGE_MASK) == control->ghcb_gpa) {
|
|
|
|
/* Scratch area begins within GHCB */
|
|
|
|
ghcb_scratch_beg = control->ghcb_gpa +
|
|
|
|
offsetof(struct ghcb, shared_buffer);
|
|
|
|
ghcb_scratch_end = control->ghcb_gpa +
|
2022-10-24 11:44:48 -05:00
|
|
|
offsetof(struct ghcb, reserved_0xff0);
|
2020-12-10 11:09:53 -06:00
|
|
|
|
|
|
|
/*
|
|
|
|
* If the scratch area begins within the GHCB, it must be
|
|
|
|
* completely contained in the GHCB shared buffer area.
|
|
|
|
*/
|
|
|
|
if (scratch_gpa_beg < ghcb_scratch_beg ||
|
|
|
|
scratch_gpa_end > ghcb_scratch_end) {
|
|
|
|
pr_err("vmgexit: scratch area is outside of GHCB shared buffer area (%#llx - %#llx)\n",
|
|
|
|
scratch_gpa_beg, scratch_gpa_end);
|
2021-12-02 12:52:05 -06:00
|
|
|
goto e_scratch;
|
2020-12-10 11:09:53 -06:00
|
|
|
}
|
|
|
|
|
2021-10-21 10:42:59 -07:00
|
|
|
scratch_va = (void *)svm->sev_es.ghcb;
|
2020-12-10 11:09:53 -06:00
|
|
|
scratch_va += (scratch_gpa_beg - control->ghcb_gpa);
|
|
|
|
} else {
|
|
|
|
/*
|
|
|
|
* The guest memory must be read into a kernel buffer, so
|
|
|
|
* limit the size
|
|
|
|
*/
|
|
|
|
if (len > GHCB_SCRATCH_AREA_LIMIT) {
|
|
|
|
pr_err("vmgexit: scratch area exceeds KVM limits (%#llx requested, %#llx limit)\n",
|
|
|
|
len, GHCB_SCRATCH_AREA_LIMIT);
|
2021-12-02 12:52:05 -06:00
|
|
|
goto e_scratch;
|
2020-12-10 11:09:53 -06:00
|
|
|
}
|
2021-11-09 22:23:50 +00:00
|
|
|
scratch_va = kvzalloc(len, GFP_KERNEL_ACCOUNT);
|
2020-12-10 11:09:53 -06:00
|
|
|
if (!scratch_va)
|
KVM: SVM: Exit to userspace on ENOMEM/EFAULT GHCB errors
Exit to userspace if setup_vmgexit_scratch() fails due to OOM or because
copying data from guest (userspace) memory failed/faulted. The OOM
scenario is clearcut, it's userspace's decision as to whether it should
terminate the guest, free memory, etc...
As for -EFAULT, arguably, any guest issue is a violation of the guest's
contract with userspace, and thus userspace needs to decide how to
proceed. E.g. userspace defines what is RAM vs. MMIO and communicates
that directly to the guest, KVM is not involved in deciding what is/isn't
RAM nor in communicating that information to the guest. If the scratch
GPA doesn't resolve to a memslot, then the guest is not honoring the
memory configuration as defined by userspace.
And if userspace unmaps an hva for whatever reason, then exiting to
userspace with -EFAULT is absolutely the right thing to do. KVM's ABI
currently sucks and doesn't provide enough information to act on the
-EFAULT, but that will hopefully be remedied in the future as there are
multiple use cases, e.g. uffd and virtiofs truncation, that shouldn't
require any work in KVM beyond returning -EFAULT with a small amount of
metadata.
KVM could define its ABI such that failure to access the scratch area is
reflected into the guest, i.e. establish a contract with userspace, but
that's undesirable as it limits KVM's options in the future, e.g. in the
potential uffd case any failure on a uaccess needs to kick out to
userspace. KVM does have several cases where it reflects these errors
into the guest, e.g. kvm_pv_clock_pairing() and Hyper-V emulation, but
KVM would preferably "fix" those instead of propagating the falsehood
that any memory failure is the guest's fault.
Lastly, returning a boolean as an "error" for that a helper that isn't
named accordingly never works out well.
Fixes: ad5b353240c8 ("KVM: SVM: Do not terminate SEV-ES guests on GHCB validation failure")
Cc: Alper Gun <alpergun@google.com>
Cc: Peter Gonda <pgonda@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220225205209.3881130-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-25 20:52:09 +00:00
|
|
|
return -ENOMEM;
|
2020-12-10 11:09:53 -06:00
|
|
|
|
|
|
|
if (kvm_read_guest(svm->vcpu.kvm, scratch_gpa_beg, scratch_va, len)) {
|
|
|
|
/* Unable to copy scratch area from guest */
|
|
|
|
pr_err("vmgexit: kvm_read_guest for scratch area failed\n");
|
|
|
|
|
2021-11-09 22:23:50 +00:00
|
|
|
kvfree(scratch_va);
|
KVM: SVM: Exit to userspace on ENOMEM/EFAULT GHCB errors
Exit to userspace if setup_vmgexit_scratch() fails due to OOM or because
copying data from guest (userspace) memory failed/faulted. The OOM
scenario is clearcut, it's userspace's decision as to whether it should
terminate the guest, free memory, etc...
As for -EFAULT, arguably, any guest issue is a violation of the guest's
contract with userspace, and thus userspace needs to decide how to
proceed. E.g. userspace defines what is RAM vs. MMIO and communicates
that directly to the guest, KVM is not involved in deciding what is/isn't
RAM nor in communicating that information to the guest. If the scratch
GPA doesn't resolve to a memslot, then the guest is not honoring the
memory configuration as defined by userspace.
And if userspace unmaps an hva for whatever reason, then exiting to
userspace with -EFAULT is absolutely the right thing to do. KVM's ABI
currently sucks and doesn't provide enough information to act on the
-EFAULT, but that will hopefully be remedied in the future as there are
multiple use cases, e.g. uffd and virtiofs truncation, that shouldn't
require any work in KVM beyond returning -EFAULT with a small amount of
metadata.
KVM could define its ABI such that failure to access the scratch area is
reflected into the guest, i.e. establish a contract with userspace, but
that's undesirable as it limits KVM's options in the future, e.g. in the
potential uffd case any failure on a uaccess needs to kick out to
userspace. KVM does have several cases where it reflects these errors
into the guest, e.g. kvm_pv_clock_pairing() and Hyper-V emulation, but
KVM would preferably "fix" those instead of propagating the falsehood
that any memory failure is the guest's fault.
Lastly, returning a boolean as an "error" for that a helper that isn't
named accordingly never works out well.
Fixes: ad5b353240c8 ("KVM: SVM: Do not terminate SEV-ES guests on GHCB validation failure")
Cc: Alper Gun <alpergun@google.com>
Cc: Peter Gonda <pgonda@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220225205209.3881130-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-25 20:52:09 +00:00
|
|
|
return -EFAULT;
|
2020-12-10 11:09:53 -06:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* The scratch area is outside the GHCB. The operation will
|
|
|
|
* dictate whether the buffer needs to be synced before running
|
|
|
|
* the vCPU next time (i.e. a read was requested so the data
|
|
|
|
* must be written back to the guest memory).
|
|
|
|
*/
|
2021-10-21 10:42:59 -07:00
|
|
|
svm->sev_es.ghcb_sa_sync = sync;
|
|
|
|
svm->sev_es.ghcb_sa_free = true;
|
2020-12-10 11:09:53 -06:00
|
|
|
}
|
|
|
|
|
2021-10-21 10:42:59 -07:00
|
|
|
svm->sev_es.ghcb_sa = scratch_va;
|
|
|
|
svm->sev_es.ghcb_sa_len = len;
|
2020-12-10 11:09:53 -06:00
|
|
|
|
KVM: SVM: Exit to userspace on ENOMEM/EFAULT GHCB errors
Exit to userspace if setup_vmgexit_scratch() fails due to OOM or because
copying data from guest (userspace) memory failed/faulted. The OOM
scenario is clearcut, it's userspace's decision as to whether it should
terminate the guest, free memory, etc...
As for -EFAULT, arguably, any guest issue is a violation of the guest's
contract with userspace, and thus userspace needs to decide how to
proceed. E.g. userspace defines what is RAM vs. MMIO and communicates
that directly to the guest, KVM is not involved in deciding what is/isn't
RAM nor in communicating that information to the guest. If the scratch
GPA doesn't resolve to a memslot, then the guest is not honoring the
memory configuration as defined by userspace.
And if userspace unmaps an hva for whatever reason, then exiting to
userspace with -EFAULT is absolutely the right thing to do. KVM's ABI
currently sucks and doesn't provide enough information to act on the
-EFAULT, but that will hopefully be remedied in the future as there are
multiple use cases, e.g. uffd and virtiofs truncation, that shouldn't
require any work in KVM beyond returning -EFAULT with a small amount of
metadata.
KVM could define its ABI such that failure to access the scratch area is
reflected into the guest, i.e. establish a contract with userspace, but
that's undesirable as it limits KVM's options in the future, e.g. in the
potential uffd case any failure on a uaccess needs to kick out to
userspace. KVM does have several cases where it reflects these errors
into the guest, e.g. kvm_pv_clock_pairing() and Hyper-V emulation, but
KVM would preferably "fix" those instead of propagating the falsehood
that any memory failure is the guest's fault.
Lastly, returning a boolean as an "error" for that a helper that isn't
named accordingly never works out well.
Fixes: ad5b353240c8 ("KVM: SVM: Do not terminate SEV-ES guests on GHCB validation failure")
Cc: Alper Gun <alpergun@google.com>
Cc: Peter Gonda <pgonda@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220225205209.3881130-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-25 20:52:09 +00:00
|
|
|
return 0;
|
2021-12-02 12:52:05 -06:00
|
|
|
|
|
|
|
e_scratch:
|
2025-02-25 21:39:37 +00:00
|
|
|
svm_vmgexit_bad_input(svm, GHCB_ERR_INVALID_SCRATCH_AREA);
|
2021-12-02 12:52:05 -06:00
|
|
|
|
KVM: SVM: Exit to userspace on ENOMEM/EFAULT GHCB errors
Exit to userspace if setup_vmgexit_scratch() fails due to OOM or because
copying data from guest (userspace) memory failed/faulted. The OOM
scenario is clearcut, it's userspace's decision as to whether it should
terminate the guest, free memory, etc...
As for -EFAULT, arguably, any guest issue is a violation of the guest's
contract with userspace, and thus userspace needs to decide how to
proceed. E.g. userspace defines what is RAM vs. MMIO and communicates
that directly to the guest, KVM is not involved in deciding what is/isn't
RAM nor in communicating that information to the guest. If the scratch
GPA doesn't resolve to a memslot, then the guest is not honoring the
memory configuration as defined by userspace.
And if userspace unmaps an hva for whatever reason, then exiting to
userspace with -EFAULT is absolutely the right thing to do. KVM's ABI
currently sucks and doesn't provide enough information to act on the
-EFAULT, but that will hopefully be remedied in the future as there are
multiple use cases, e.g. uffd and virtiofs truncation, that shouldn't
require any work in KVM beyond returning -EFAULT with a small amount of
metadata.
KVM could define its ABI such that failure to access the scratch area is
reflected into the guest, i.e. establish a contract with userspace, but
that's undesirable as it limits KVM's options in the future, e.g. in the
potential uffd case any failure on a uaccess needs to kick out to
userspace. KVM does have several cases where it reflects these errors
into the guest, e.g. kvm_pv_clock_pairing() and Hyper-V emulation, but
KVM would preferably "fix" those instead of propagating the falsehood
that any memory failure is the guest's fault.
Lastly, returning a boolean as an "error" for that a helper that isn't
named accordingly never works out well.
Fixes: ad5b353240c8 ("KVM: SVM: Do not terminate SEV-ES guests on GHCB validation failure")
Cc: Alper Gun <alpergun@google.com>
Cc: Peter Gonda <pgonda@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220225205209.3881130-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-25 20:52:09 +00:00
|
|
|
return 1;
|
2020-12-10 11:09:53 -06:00
|
|
|
}
|
|
|
|
|
2020-12-10 11:09:50 -06:00
|
|
|
static void set_ghcb_msr_bits(struct vcpu_svm *svm, u64 value, u64 mask,
|
|
|
|
unsigned int pos)
|
|
|
|
{
|
|
|
|
svm->vmcb->control.ghcb_gpa &= ~(mask << pos);
|
|
|
|
svm->vmcb->control.ghcb_gpa |= (value & mask) << pos;
|
|
|
|
}
|
|
|
|
|
|
|
|
static u64 get_ghcb_msr_bits(struct vcpu_svm *svm, u64 mask, unsigned int pos)
|
|
|
|
{
|
|
|
|
return (svm->vmcb->control.ghcb_gpa >> pos) & mask;
|
|
|
|
}
|
|
|
|
|
2020-12-10 11:09:49 -06:00
|
|
|
static void set_ghcb_msr(struct vcpu_svm *svm, u64 value)
|
|
|
|
{
|
|
|
|
svm->vmcb->control.ghcb_gpa = value;
|
|
|
|
}
|
|
|
|
|
2024-05-01 03:52:01 -05:00
|
|
|
static int snp_rmptable_psmash(kvm_pfn_t pfn)
|
2020-12-10 11:09:47 -06:00
|
|
|
{
|
2024-05-01 03:52:01 -05:00
|
|
|
int ret;
|
|
|
|
|
|
|
|
pfn = pfn & ~(KVM_PAGES_PER_HPAGE(PG_LEVEL_2M) - 1);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* PSMASH_FAIL_INUSE indicates another processor is modifying the
|
|
|
|
* entry, so retry until that's no longer the case.
|
|
|
|
*/
|
|
|
|
do {
|
|
|
|
ret = psmash(pfn);
|
|
|
|
} while (ret == PSMASH_FAIL_INUSE);
|
|
|
|
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2024-05-01 03:51:59 -05:00
|
|
|
static int snp_complete_psc_msr(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
|
|
|
struct vcpu_svm *svm = to_svm(vcpu);
|
|
|
|
|
|
|
|
if (vcpu->run->hypercall.ret)
|
|
|
|
set_ghcb_msr(svm, GHCB_MSR_PSC_RESP_ERROR);
|
|
|
|
else
|
|
|
|
set_ghcb_msr(svm, GHCB_MSR_PSC_RESP);
|
|
|
|
|
|
|
|
return 1; /* resume guest */
|
|
|
|
}
|
|
|
|
|
|
|
|
static int snp_begin_psc_msr(struct vcpu_svm *svm, u64 ghcb_msr)
|
|
|
|
{
|
|
|
|
u64 gpa = gfn_to_gpa(GHCB_MSR_PSC_REQ_TO_GFN(ghcb_msr));
|
|
|
|
u8 op = GHCB_MSR_PSC_REQ_TO_OP(ghcb_msr);
|
|
|
|
struct kvm_vcpu *vcpu = &svm->vcpu;
|
|
|
|
|
|
|
|
if (op != SNP_PAGE_STATE_PRIVATE && op != SNP_PAGE_STATE_SHARED) {
|
|
|
|
set_ghcb_msr(svm, GHCB_MSR_PSC_RESP_ERROR);
|
|
|
|
return 1; /* resume guest */
|
|
|
|
}
|
|
|
|
|
2024-11-27 16:43:40 -08:00
|
|
|
if (!user_exit_on_hypercall(vcpu->kvm, KVM_HC_MAP_GPA_RANGE)) {
|
2024-05-01 03:51:59 -05:00
|
|
|
set_ghcb_msr(svm, GHCB_MSR_PSC_RESP_ERROR);
|
|
|
|
return 1; /* resume guest */
|
|
|
|
}
|
|
|
|
|
|
|
|
vcpu->run->exit_reason = KVM_EXIT_HYPERCALL;
|
|
|
|
vcpu->run->hypercall.nr = KVM_HC_MAP_GPA_RANGE;
|
2024-12-13 14:36:25 -05:00
|
|
|
/*
|
|
|
|
* In principle this should have been -KVM_ENOSYS, but userspace (QEMU <=9.2)
|
|
|
|
* assumed that vcpu->run->hypercall.ret is never changed by KVM and thus that
|
|
|
|
* it was always zero on KVM_EXIT_HYPERCALL. Since KVM is now overwriting
|
|
|
|
* vcpu->run->hypercall.ret, ensuring that it is zero to not break QEMU.
|
|
|
|
*/
|
|
|
|
vcpu->run->hypercall.ret = 0;
|
2024-05-01 03:51:59 -05:00
|
|
|
vcpu->run->hypercall.args[0] = gpa;
|
|
|
|
vcpu->run->hypercall.args[1] = 1;
|
|
|
|
vcpu->run->hypercall.args[2] = (op == SNP_PAGE_STATE_PRIVATE)
|
|
|
|
? KVM_MAP_GPA_RANGE_ENCRYPTED
|
|
|
|
: KVM_MAP_GPA_RANGE_DECRYPTED;
|
|
|
|
vcpu->run->hypercall.args[2] |= KVM_MAP_GPA_RANGE_PAGE_SZ_4K;
|
|
|
|
|
|
|
|
vcpu->arch.complete_userspace_io = snp_complete_psc_msr;
|
|
|
|
|
|
|
|
return 0; /* forward request to userspace */
|
|
|
|
}
|
|
|
|
|
2024-05-01 03:52:00 -05:00
|
|
|
struct psc_buffer {
|
|
|
|
struct psc_hdr hdr;
|
|
|
|
struct psc_entry entries[];
|
|
|
|
} __packed;
|
|
|
|
|
|
|
|
static int snp_begin_psc(struct vcpu_svm *svm, struct psc_buffer *psc);
|
|
|
|
|
|
|
|
static void snp_complete_psc(struct vcpu_svm *svm, u64 psc_ret)
|
|
|
|
{
|
|
|
|
svm->sev_es.psc_inflight = 0;
|
|
|
|
svm->sev_es.psc_idx = 0;
|
|
|
|
svm->sev_es.psc_2m = false;
|
2025-02-25 21:39:37 +00:00
|
|
|
|
|
|
|
/*
|
|
|
|
* PSC requests always get a "no action" response in SW_EXITINFO1, with
|
|
|
|
* a PSC-specific return code in SW_EXITINFO2 that provides the "real"
|
|
|
|
* return code. E.g. if the PSC request was interrupted, the need to
|
|
|
|
* retry is communicated via SW_EXITINFO2, not SW_EXITINFO1.
|
|
|
|
*/
|
|
|
|
svm_vmgexit_no_action(svm, psc_ret);
|
2024-05-01 03:52:00 -05:00
|
|
|
}
|
|
|
|
|
|
|
|
static void __snp_complete_one_psc(struct vcpu_svm *svm)
|
|
|
|
{
|
|
|
|
struct psc_buffer *psc = svm->sev_es.ghcb_sa;
|
|
|
|
struct psc_entry *entries = psc->entries;
|
|
|
|
struct psc_hdr *hdr = &psc->hdr;
|
|
|
|
__u16 idx;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Everything in-flight has been processed successfully. Update the
|
|
|
|
* corresponding entries in the guest's PSC buffer and zero out the
|
|
|
|
* count of in-flight PSC entries.
|
|
|
|
*/
|
|
|
|
for (idx = svm->sev_es.psc_idx; svm->sev_es.psc_inflight;
|
|
|
|
svm->sev_es.psc_inflight--, idx++) {
|
|
|
|
struct psc_entry *entry = &entries[idx];
|
|
|
|
|
|
|
|
entry->cur_page = entry->pagesize ? 512 : 1;
|
|
|
|
}
|
|
|
|
|
|
|
|
hdr->cur_entry = idx;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int snp_complete_one_psc(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
|
|
|
struct vcpu_svm *svm = to_svm(vcpu);
|
|
|
|
struct psc_buffer *psc = svm->sev_es.ghcb_sa;
|
|
|
|
|
|
|
|
if (vcpu->run->hypercall.ret) {
|
|
|
|
snp_complete_psc(svm, VMGEXIT_PSC_ERROR_GENERIC);
|
|
|
|
return 1; /* resume guest */
|
|
|
|
}
|
|
|
|
|
|
|
|
__snp_complete_one_psc(svm);
|
|
|
|
|
|
|
|
/* Handle the next range (if any). */
|
|
|
|
return snp_begin_psc(svm, psc);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int snp_begin_psc(struct vcpu_svm *svm, struct psc_buffer *psc)
|
|
|
|
{
|
|
|
|
struct psc_entry *entries = psc->entries;
|
|
|
|
struct kvm_vcpu *vcpu = &svm->vcpu;
|
|
|
|
struct psc_hdr *hdr = &psc->hdr;
|
|
|
|
struct psc_entry entry_start;
|
|
|
|
u16 idx, idx_start, idx_end;
|
|
|
|
int npages;
|
|
|
|
bool huge;
|
|
|
|
u64 gfn;
|
|
|
|
|
2024-11-27 16:43:40 -08:00
|
|
|
if (!user_exit_on_hypercall(vcpu->kvm, KVM_HC_MAP_GPA_RANGE)) {
|
2024-05-01 03:52:00 -05:00
|
|
|
snp_complete_psc(svm, VMGEXIT_PSC_ERROR_GENERIC);
|
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
|
|
|
|
next_range:
|
|
|
|
/* There should be no other PSCs in-flight at this point. */
|
|
|
|
if (WARN_ON_ONCE(svm->sev_es.psc_inflight)) {
|
|
|
|
snp_complete_psc(svm, VMGEXIT_PSC_ERROR_GENERIC);
|
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* The PSC descriptor buffer can be modified by a misbehaved guest after
|
|
|
|
* validation, so take care to only use validated copies of values used
|
|
|
|
* for things like array indexing.
|
|
|
|
*/
|
|
|
|
idx_start = hdr->cur_entry;
|
|
|
|
idx_end = hdr->end_entry;
|
|
|
|
|
|
|
|
if (idx_end >= VMGEXIT_PSC_MAX_COUNT) {
|
|
|
|
snp_complete_psc(svm, VMGEXIT_PSC_ERROR_INVALID_HDR);
|
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Find the start of the next range which needs processing. */
|
|
|
|
for (idx = idx_start; idx <= idx_end; idx++, hdr->cur_entry++) {
|
|
|
|
entry_start = entries[idx];
|
|
|
|
|
|
|
|
gfn = entry_start.gfn;
|
|
|
|
huge = entry_start.pagesize;
|
|
|
|
npages = huge ? 512 : 1;
|
|
|
|
|
|
|
|
if (entry_start.cur_page > npages || !IS_ALIGNED(gfn, npages)) {
|
|
|
|
snp_complete_psc(svm, VMGEXIT_PSC_ERROR_INVALID_ENTRY);
|
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (entry_start.cur_page) {
|
|
|
|
/*
|
|
|
|
* If this is a partially-completed 2M range, force 4K handling
|
|
|
|
* for the remaining pages since they're effectively split at
|
|
|
|
* this point. Subsequent code should ensure this doesn't get
|
|
|
|
* combined with adjacent PSC entries where 2M handling is still
|
|
|
|
* possible.
|
|
|
|
*/
|
|
|
|
npages -= entry_start.cur_page;
|
|
|
|
gfn += entry_start.cur_page;
|
|
|
|
huge = false;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (npages)
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (idx > idx_end) {
|
|
|
|
/* Nothing more to process. */
|
|
|
|
snp_complete_psc(svm, 0);
|
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
|
|
|
|
svm->sev_es.psc_2m = huge;
|
|
|
|
svm->sev_es.psc_idx = idx;
|
|
|
|
svm->sev_es.psc_inflight = 1;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Find all subsequent PSC entries that contain adjacent GPA
|
|
|
|
* ranges/operations and can be combined into a single
|
|
|
|
* KVM_HC_MAP_GPA_RANGE exit.
|
|
|
|
*/
|
|
|
|
while (++idx <= idx_end) {
|
|
|
|
struct psc_entry entry = entries[idx];
|
|
|
|
|
|
|
|
if (entry.operation != entry_start.operation ||
|
|
|
|
entry.gfn != entry_start.gfn + npages ||
|
|
|
|
entry.cur_page || !!entry.pagesize != huge)
|
|
|
|
break;
|
|
|
|
|
|
|
|
svm->sev_es.psc_inflight++;
|
|
|
|
npages += huge ? 512 : 1;
|
|
|
|
}
|
|
|
|
|
|
|
|
switch (entry_start.operation) {
|
|
|
|
case VMGEXIT_PSC_OP_PRIVATE:
|
|
|
|
case VMGEXIT_PSC_OP_SHARED:
|
|
|
|
vcpu->run->exit_reason = KVM_EXIT_HYPERCALL;
|
|
|
|
vcpu->run->hypercall.nr = KVM_HC_MAP_GPA_RANGE;
|
2024-12-13 14:36:25 -05:00
|
|
|
/*
|
|
|
|
* In principle this should have been -KVM_ENOSYS, but userspace (QEMU <=9.2)
|
|
|
|
* assumed that vcpu->run->hypercall.ret is never changed by KVM and thus that
|
|
|
|
* it was always zero on KVM_EXIT_HYPERCALL. Since KVM is now overwriting
|
|
|
|
* vcpu->run->hypercall.ret, ensuring that it is zero to not break QEMU.
|
|
|
|
*/
|
|
|
|
vcpu->run->hypercall.ret = 0;
|
2024-05-01 03:52:00 -05:00
|
|
|
vcpu->run->hypercall.args[0] = gfn_to_gpa(gfn);
|
|
|
|
vcpu->run->hypercall.args[1] = npages;
|
|
|
|
vcpu->run->hypercall.args[2] = entry_start.operation == VMGEXIT_PSC_OP_PRIVATE
|
|
|
|
? KVM_MAP_GPA_RANGE_ENCRYPTED
|
|
|
|
: KVM_MAP_GPA_RANGE_DECRYPTED;
|
|
|
|
vcpu->run->hypercall.args[2] |= entry_start.pagesize
|
|
|
|
? KVM_MAP_GPA_RANGE_PAGE_SZ_2M
|
|
|
|
: KVM_MAP_GPA_RANGE_PAGE_SZ_4K;
|
|
|
|
vcpu->arch.complete_userspace_io = snp_complete_one_psc;
|
|
|
|
return 0; /* forward request to userspace */
|
|
|
|
default:
|
|
|
|
/*
|
|
|
|
* Only shared/private PSC operations are currently supported, so if the
|
|
|
|
* entire range consists of unsupported operations (e.g. SMASH/UNSMASH),
|
|
|
|
* then consider the entire range completed and avoid exiting to
|
|
|
|
* userspace. In theory snp_complete_psc() can always be called directly
|
|
|
|
* at this point to complete the current range and start the next one,
|
|
|
|
* but that could lead to unexpected levels of recursion.
|
|
|
|
*/
|
|
|
|
__snp_complete_one_psc(svm);
|
|
|
|
goto next_range;
|
|
|
|
}
|
|
|
|
|
2024-11-28 10:39:02 +01:00
|
|
|
BUG();
|
2024-05-01 03:52:00 -05:00
|
|
|
}
|
|
|
|
|
2025-02-26 17:25:40 -08:00
|
|
|
/*
|
|
|
|
* Invoked as part of svm_vcpu_reset() processing of an init event.
|
|
|
|
*/
|
|
|
|
void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu)
|
2024-05-01 03:52:02 -05:00
|
|
|
{
|
|
|
|
struct vcpu_svm *svm = to_svm(vcpu);
|
2025-02-26 17:25:40 -08:00
|
|
|
struct kvm_memory_slot *slot;
|
|
|
|
struct page *page;
|
|
|
|
kvm_pfn_t pfn;
|
|
|
|
gfn_t gfn;
|
2024-05-01 03:52:02 -05:00
|
|
|
|
2025-02-26 17:25:40 -08:00
|
|
|
if (!sev_snp_guest(vcpu->kvm))
|
|
|
|
return;
|
|
|
|
|
|
|
|
guard(mutex)(&svm->sev_es.snp_vmsa_mutex);
|
|
|
|
|
|
|
|
if (!svm->sev_es.snp_ap_waiting_for_reset)
|
|
|
|
return;
|
|
|
|
|
|
|
|
svm->sev_es.snp_ap_waiting_for_reset = false;
|
2024-05-01 03:52:02 -05:00
|
|
|
|
|
|
|
/* Mark the vCPU as offline and not runnable */
|
|
|
|
vcpu->arch.pv.pv_unhalted = false;
|
2025-01-13 12:01:43 -08:00
|
|
|
kvm_set_mp_state(vcpu, KVM_MP_STATE_HALTED);
|
2024-05-01 03:52:02 -05:00
|
|
|
|
|
|
|
/* Clear use of the VMSA */
|
|
|
|
svm->vmcb->control.vmsa_pa = INVALID_PAGE;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* When replacing the VMSA during SEV-SNP AP creation,
|
|
|
|
* mark the VMCB dirty so that full state is always reloaded.
|
|
|
|
*/
|
|
|
|
vmcb_mark_all_dirty(svm->vmcb);
|
|
|
|
|
2025-02-26 17:25:40 -08:00
|
|
|
if (!VALID_PAGE(svm->sev_es.snp_vmsa_gpa))
|
|
|
|
return;
|
2024-05-01 03:52:02 -05:00
|
|
|
|
2025-02-26 17:25:40 -08:00
|
|
|
gfn = gpa_to_gfn(svm->sev_es.snp_vmsa_gpa);
|
2025-02-26 17:25:41 -08:00
|
|
|
svm->sev_es.snp_vmsa_gpa = INVALID_PAGE;
|
2024-05-01 03:52:02 -05:00
|
|
|
|
2025-02-26 17:25:40 -08:00
|
|
|
slot = gfn_to_memslot(vcpu->kvm, gfn);
|
|
|
|
if (!slot)
|
2024-05-01 03:52:02 -05:00
|
|
|
return;
|
|
|
|
|
2025-02-26 17:25:40 -08:00
|
|
|
/*
|
|
|
|
* The new VMSA will be private memory guest memory, so retrieve the
|
|
|
|
* PFN from the gmem backend.
|
|
|
|
*/
|
|
|
|
if (kvm_gmem_get_pfn(vcpu->kvm, slot, gfn, &pfn, &page, NULL))
|
2024-05-01 03:52:02 -05:00
|
|
|
return;
|
|
|
|
|
2025-02-26 17:25:40 -08:00
|
|
|
/*
|
|
|
|
* From this point forward, the VMSA will always be a guest-mapped page
|
|
|
|
* rather than the initial one allocated by KVM in svm->sev_es.vmsa. In
|
|
|
|
* theory, svm->sev_es.vmsa could be free'd and cleaned up here, but
|
2025-05-22 16:37:31 -07:00
|
|
|
* that involves cleanups like flushing caches, which would ideally be
|
|
|
|
* handled during teardown rather than guest boot. Deferring that also
|
|
|
|
* allows the existing logic for SEV-ES VMSAs to be re-used with
|
2025-02-26 17:25:40 -08:00
|
|
|
* minimal SNP-specific changes.
|
|
|
|
*/
|
|
|
|
svm->sev_es.snp_has_guest_vmsa = true;
|
2024-05-01 03:52:02 -05:00
|
|
|
|
2025-02-26 17:25:40 -08:00
|
|
|
/* Use the new VMSA */
|
|
|
|
svm->vmcb->control.vmsa_pa = pfn_to_hpa(pfn);
|
2024-05-01 03:52:02 -05:00
|
|
|
|
2025-02-26 17:25:40 -08:00
|
|
|
/* Mark the vCPU as runnable */
|
2025-03-19 09:10:44 -04:00
|
|
|
kvm_set_mp_state(vcpu, KVM_MP_STATE_RUNNABLE);
|
2024-05-01 03:52:02 -05:00
|
|
|
|
2025-02-26 17:25:40 -08:00
|
|
|
/*
|
|
|
|
* gmem pages aren't currently migratable, but if this ever changes
|
|
|
|
* then care should be taken to ensure svm->sev_es.vmsa is pinned
|
|
|
|
* through some other means.
|
|
|
|
*/
|
|
|
|
kvm_release_page_clean(page);
|
2024-05-01 03:52:02 -05:00
|
|
|
}
|
|
|
|
|
|
|
|
static int sev_snp_ap_creation(struct vcpu_svm *svm)
|
|
|
|
{
|
2025-02-26 17:25:36 -08:00
|
|
|
struct kvm_sev_info *sev = to_kvm_sev_info(svm->vcpu.kvm);
|
2024-05-01 03:52:02 -05:00
|
|
|
struct kvm_vcpu *vcpu = &svm->vcpu;
|
|
|
|
struct kvm_vcpu *target_vcpu;
|
|
|
|
struct vcpu_svm *target_svm;
|
|
|
|
unsigned int request;
|
|
|
|
unsigned int apic_id;
|
|
|
|
|
|
|
|
request = lower_32_bits(svm->vmcb->control.exit_info_1);
|
|
|
|
apic_id = upper_32_bits(svm->vmcb->control.exit_info_1);
|
|
|
|
|
|
|
|
/* Validate the APIC ID */
|
|
|
|
target_vcpu = kvm_get_vcpu_by_id(vcpu->kvm, apic_id);
|
|
|
|
if (!target_vcpu) {
|
|
|
|
vcpu_unimpl(vcpu, "vmgexit: invalid AP APIC ID [%#x] from guest\n",
|
|
|
|
apic_id);
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
|
|
|
|
target_svm = to_svm(target_vcpu);
|
|
|
|
|
2025-02-26 17:25:38 -08:00
|
|
|
guard(mutex)(&target_svm->sev_es.snp_vmsa_mutex);
|
2024-05-01 03:52:02 -05:00
|
|
|
|
|
|
|
switch (request) {
|
|
|
|
case SVM_VMGEXIT_AP_CREATE_ON_INIT:
|
|
|
|
case SVM_VMGEXIT_AP_CREATE:
|
2025-02-26 17:25:36 -08:00
|
|
|
if (vcpu->arch.regs[VCPU_REGS_RAX] != sev->vmsa_features) {
|
|
|
|
vcpu_unimpl(vcpu, "vmgexit: mismatched AP sev_features [%#lx] != [%#llx] from guest\n",
|
|
|
|
vcpu->arch.regs[VCPU_REGS_RAX], sev->vmsa_features);
|
2025-02-26 17:25:38 -08:00
|
|
|
return -EINVAL;
|
2025-02-26 17:25:36 -08:00
|
|
|
}
|
|
|
|
|
2024-05-01 03:52:02 -05:00
|
|
|
if (!page_address_valid(vcpu, svm->vmcb->control.exit_info_2)) {
|
|
|
|
vcpu_unimpl(vcpu, "vmgexit: invalid AP VMSA address [%#llx] from guest\n",
|
|
|
|
svm->vmcb->control.exit_info_2);
|
2025-02-26 17:25:38 -08:00
|
|
|
return -EINVAL;
|
2024-05-01 03:52:02 -05:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Malicious guest can RMPADJUST a large page into VMSA which
|
|
|
|
* will hit the SNP erratum where the CPU will incorrectly signal
|
|
|
|
* an RMP violation #PF if a hugepage collides with the RMP entry
|
|
|
|
* of VMSA page, reject the AP CREATE request if VMSA address from
|
|
|
|
* guest is 2M aligned.
|
|
|
|
*/
|
|
|
|
if (IS_ALIGNED(svm->vmcb->control.exit_info_2, PMD_SIZE)) {
|
|
|
|
vcpu_unimpl(vcpu,
|
|
|
|
"vmgexit: AP VMSA address [%llx] from guest is unsafe as it is 2M aligned\n",
|
|
|
|
svm->vmcb->control.exit_info_2);
|
2025-02-26 17:25:38 -08:00
|
|
|
return -EINVAL;
|
2024-05-01 03:52:02 -05:00
|
|
|
}
|
|
|
|
|
|
|
|
target_svm->sev_es.snp_vmsa_gpa = svm->vmcb->control.exit_info_2;
|
|
|
|
break;
|
|
|
|
case SVM_VMGEXIT_AP_DESTROY:
|
2025-02-26 17:25:35 -08:00
|
|
|
target_svm->sev_es.snp_vmsa_gpa = INVALID_PAGE;
|
2024-05-01 03:52:02 -05:00
|
|
|
break;
|
|
|
|
default:
|
|
|
|
vcpu_unimpl(vcpu, "vmgexit: invalid AP creation request [%#x] from guest\n",
|
|
|
|
request);
|
2025-02-26 17:25:38 -08:00
|
|
|
return -EINVAL;
|
2024-05-01 03:52:02 -05:00
|
|
|
}
|
|
|
|
|
2025-02-26 17:25:35 -08:00
|
|
|
target_svm->sev_es.snp_ap_waiting_for_reset = true;
|
|
|
|
|
2025-02-26 17:25:37 -08:00
|
|
|
/*
|
|
|
|
* Unless Creation is deferred until INIT, signal the vCPU to update
|
|
|
|
* its state.
|
|
|
|
*/
|
2025-03-27 12:39:56 -05:00
|
|
|
if (request != SVM_VMGEXIT_AP_CREATE_ON_INIT)
|
|
|
|
kvm_make_request_and_kick(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, target_vcpu);
|
2024-05-01 03:52:02 -05:00
|
|
|
|
2025-02-26 17:25:38 -08:00
|
|
|
return 0;
|
2024-05-01 03:52:02 -05:00
|
|
|
}
|
|
|
|
|
KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event
Version 2 of GHCB specification added support for the SNP Guest Request
Message NAE event. The event allows for an SEV-SNP guest to make
requests to the SEV-SNP firmware through the hypervisor using the
SNP_GUEST_REQUEST API defined in the SEV-SNP firmware specification.
This is used by guests primarily to request attestation reports from
firmware. There are other request types are available as well, but the
specifics of what guest requests are being made generally does not
affect how they are handled by the hypervisor, which only serves as a
proxy for the guest requests and firmware responses.
Implement handling for these events.
When an SNP Guest Request is issued, the guest will provide its own
request/response pages, which could in theory be passed along directly
to firmware. However, these pages would need special care:
- Both pages are from shared guest memory, so they need to be
protected from migration/etc. occurring while firmware reads/writes
to them. At a minimum, this requires elevating the ref counts and
potentially needing an explicit pinning of the memory. This places
additional restrictions on what type of memory backends userspace
can use for shared guest memory since there would be some reliance
on using refcounted pages.
- The response page needs to be switched to Firmware-owned state
before the firmware can write to it, which can lead to potential
host RMP #PFs if the guest is misbehaved and hands the host a
guest page that KVM is writing to for other reasons (e.g. virtio
buffers).
Both of these issues can be avoided completely by using
separately-allocated bounce pages for both the request/response pages
and passing those to firmware instead. So that's the approach taken
here.
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Co-developed-by: Alexey Kardashevskiy <aik@amd.com>
Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
[mdr: ensure FW command failures are indicated to guest, drop extended
request handling to be re-written as separate patch, massage commit]
Signed-off-by: Michael Roth <michael.roth@amd.com>
Message-ID: <20240701223148.3798365-2-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-07-01 17:31:46 -05:00
|
|
|
static int snp_handle_guest_req(struct vcpu_svm *svm, gpa_t req_gpa, gpa_t resp_gpa)
|
|
|
|
{
|
|
|
|
struct sev_data_snp_guest_request data = {0};
|
|
|
|
struct kvm *kvm = svm->vcpu.kvm;
|
|
|
|
struct kvm_sev_info *sev = to_kvm_sev_info(kvm);
|
|
|
|
sev_ret_code fw_err = 0;
|
|
|
|
int ret;
|
|
|
|
|
|
|
|
if (!sev_snp_guest(kvm))
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
mutex_lock(&sev->guest_req_mutex);
|
|
|
|
|
|
|
|
if (kvm_read_guest(kvm, req_gpa, sev->guest_req_buf, PAGE_SIZE)) {
|
|
|
|
ret = -EIO;
|
|
|
|
goto out_unlock;
|
|
|
|
}
|
|
|
|
|
|
|
|
data.gctx_paddr = __psp_pa(sev->snp_context);
|
|
|
|
data.req_paddr = __psp_pa(sev->guest_req_buf);
|
|
|
|
data.res_paddr = __psp_pa(sev->guest_resp_buf);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Firmware failures are propagated on to guest, but any other failure
|
|
|
|
* condition along the way should be reported to userspace. E.g. if
|
|
|
|
* the PSP is dead and commands are timing out.
|
|
|
|
*/
|
|
|
|
ret = sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &data, &fw_err);
|
|
|
|
if (ret && !fw_err)
|
|
|
|
goto out_unlock;
|
|
|
|
|
|
|
|
if (kvm_write_guest(kvm, resp_gpa, sev->guest_resp_buf, PAGE_SIZE)) {
|
|
|
|
ret = -EIO;
|
|
|
|
goto out_unlock;
|
|
|
|
}
|
|
|
|
|
2025-02-25 21:39:37 +00:00
|
|
|
/* No action is requested *from KVM* if there was a firmware error. */
|
|
|
|
svm_vmgexit_no_action(svm, SNP_GUEST_ERR(0, fw_err));
|
KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event
Version 2 of GHCB specification added support for the SNP Guest Request
Message NAE event. The event allows for an SEV-SNP guest to make
requests to the SEV-SNP firmware through the hypervisor using the
SNP_GUEST_REQUEST API defined in the SEV-SNP firmware specification.
This is used by guests primarily to request attestation reports from
firmware. There are other request types are available as well, but the
specifics of what guest requests are being made generally does not
affect how they are handled by the hypervisor, which only serves as a
proxy for the guest requests and firmware responses.
Implement handling for these events.
When an SNP Guest Request is issued, the guest will provide its own
request/response pages, which could in theory be passed along directly
to firmware. However, these pages would need special care:
- Both pages are from shared guest memory, so they need to be
protected from migration/etc. occurring while firmware reads/writes
to them. At a minimum, this requires elevating the ref counts and
potentially needing an explicit pinning of the memory. This places
additional restrictions on what type of memory backends userspace
can use for shared guest memory since there would be some reliance
on using refcounted pages.
- The response page needs to be switched to Firmware-owned state
before the firmware can write to it, which can lead to potential
host RMP #PFs if the guest is misbehaved and hands the host a
guest page that KVM is writing to for other reasons (e.g. virtio
buffers).
Both of these issues can be avoided completely by using
separately-allocated bounce pages for both the request/response pages
and passing those to firmware instead. So that's the approach taken
here.
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Co-developed-by: Alexey Kardashevskiy <aik@amd.com>
Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
[mdr: ensure FW command failures are indicated to guest, drop extended
request handling to be re-written as separate patch, massage commit]
Signed-off-by: Michael Roth <michael.roth@amd.com>
Message-ID: <20240701223148.3798365-2-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-07-01 17:31:46 -05:00
|
|
|
|
|
|
|
ret = 1; /* resume guest */
|
|
|
|
|
|
|
|
out_unlock:
|
|
|
|
mutex_unlock(&sev->guest_req_mutex);
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2024-07-01 17:31:48 -05:00
|
|
|
static int snp_handle_ext_guest_req(struct vcpu_svm *svm, gpa_t req_gpa, gpa_t resp_gpa)
|
|
|
|
{
|
|
|
|
struct kvm *kvm = svm->vcpu.kvm;
|
|
|
|
u8 msg_type;
|
|
|
|
|
|
|
|
if (!sev_snp_guest(kvm))
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
if (kvm_read_guest(kvm, req_gpa + offsetof(struct snp_guest_msg_hdr, msg_type),
|
|
|
|
&msg_type, 1))
|
|
|
|
return -EIO;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* As per GHCB spec, requests of type MSG_REPORT_REQ also allow for
|
|
|
|
* additional certificate data to be provided alongside the attestation
|
|
|
|
* report via the guest-provided data pages indicated by RAX/RBX. The
|
|
|
|
* certificate data is optional and requires additional KVM enablement
|
|
|
|
* to provide an interface for userspace to provide it, but KVM still
|
|
|
|
* needs to be able to handle extended guest requests either way. So
|
|
|
|
* provide a stub implementation that will always return an empty
|
|
|
|
* certificate table in the guest-provided data pages.
|
|
|
|
*/
|
|
|
|
if (msg_type == SNP_MSG_REPORT_REQ) {
|
|
|
|
struct kvm_vcpu *vcpu = &svm->vcpu;
|
|
|
|
u64 data_npages;
|
|
|
|
gpa_t data_gpa;
|
|
|
|
|
|
|
|
if (!kvm_ghcb_rax_is_valid(svm) || !kvm_ghcb_rbx_is_valid(svm))
|
|
|
|
goto request_invalid;
|
|
|
|
|
|
|
|
data_gpa = vcpu->arch.regs[VCPU_REGS_RAX];
|
|
|
|
data_npages = vcpu->arch.regs[VCPU_REGS_RBX];
|
|
|
|
|
|
|
|
if (!PAGE_ALIGNED(data_gpa))
|
|
|
|
goto request_invalid;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* As per GHCB spec (see "SNP Extended Guest Request"), the
|
|
|
|
* certificate table is terminated by 24-bytes of zeroes.
|
|
|
|
*/
|
|
|
|
if (data_npages && kvm_clear_guest(kvm, data_gpa, 24))
|
|
|
|
return -EIO;
|
|
|
|
}
|
|
|
|
|
|
|
|
return snp_handle_guest_req(svm, req_gpa, resp_gpa);
|
|
|
|
|
|
|
|
request_invalid:
|
2025-02-25 21:39:37 +00:00
|
|
|
svm_vmgexit_bad_input(svm, GHCB_ERR_INVALID_INPUT);
|
2024-07-01 17:31:48 -05:00
|
|
|
return 1; /* resume guest */
|
|
|
|
}
|
|
|
|
|
2020-12-10 11:09:47 -06:00
|
|
|
static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
|
|
|
|
{
|
2020-12-10 11:09:49 -06:00
|
|
|
struct vmcb_control_area *control = &svm->vmcb->control;
|
2020-12-10 11:09:50 -06:00
|
|
|
struct kvm_vcpu *vcpu = &svm->vcpu;
|
2025-01-23 11:21:40 +05:30
|
|
|
struct kvm_sev_info *sev = to_kvm_sev_info(vcpu->kvm);
|
2020-12-10 11:09:49 -06:00
|
|
|
u64 ghcb_info;
|
2020-12-10 11:09:50 -06:00
|
|
|
int ret = 1;
|
2020-12-10 11:09:49 -06:00
|
|
|
|
|
|
|
ghcb_info = control->ghcb_gpa & GHCB_MSR_INFO_MASK;
|
|
|
|
|
2020-12-10 11:09:52 -06:00
|
|
|
trace_kvm_vmgexit_msr_protocol_enter(svm->vcpu.vcpu_id,
|
|
|
|
control->ghcb_gpa);
|
|
|
|
|
2020-12-10 11:09:49 -06:00
|
|
|
switch (ghcb_info) {
|
|
|
|
case GHCB_MSR_SEV_INFO_REQ:
|
2024-05-01 02:10:48 -05:00
|
|
|
set_ghcb_msr(svm, GHCB_MSR_SEV_INFO((__u64)sev->ghcb_version,
|
2020-12-10 11:09:49 -06:00
|
|
|
GHCB_VERSION_MIN,
|
|
|
|
sev_enc_bit));
|
|
|
|
break;
|
2020-12-10 11:09:50 -06:00
|
|
|
case GHCB_MSR_CPUID_REQ: {
|
|
|
|
u64 cpuid_fn, cpuid_reg, cpuid_value;
|
|
|
|
|
|
|
|
cpuid_fn = get_ghcb_msr_bits(svm,
|
|
|
|
GHCB_MSR_CPUID_FUNC_MASK,
|
|
|
|
GHCB_MSR_CPUID_FUNC_POS);
|
|
|
|
|
|
|
|
/* Initialize the registers needed by the CPUID intercept */
|
|
|
|
vcpu->arch.regs[VCPU_REGS_RAX] = cpuid_fn;
|
|
|
|
vcpu->arch.regs[VCPU_REGS_RCX] = 0;
|
|
|
|
|
2021-03-02 14:40:39 -05:00
|
|
|
ret = svm_invoke_exit_handler(vcpu, SVM_EXIT_CPUID);
|
2020-12-10 11:09:50 -06:00
|
|
|
if (!ret) {
|
2021-12-02 12:52:05 -06:00
|
|
|
/* Error, keep GHCB MSR value as-is */
|
2020-12-10 11:09:50 -06:00
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
|
|
|
cpuid_reg = get_ghcb_msr_bits(svm,
|
|
|
|
GHCB_MSR_CPUID_REG_MASK,
|
|
|
|
GHCB_MSR_CPUID_REG_POS);
|
|
|
|
if (cpuid_reg == 0)
|
|
|
|
cpuid_value = vcpu->arch.regs[VCPU_REGS_RAX];
|
|
|
|
else if (cpuid_reg == 1)
|
|
|
|
cpuid_value = vcpu->arch.regs[VCPU_REGS_RBX];
|
|
|
|
else if (cpuid_reg == 2)
|
|
|
|
cpuid_value = vcpu->arch.regs[VCPU_REGS_RCX];
|
|
|
|
else
|
|
|
|
cpuid_value = vcpu->arch.regs[VCPU_REGS_RDX];
|
|
|
|
|
|
|
|
set_ghcb_msr_bits(svm, cpuid_value,
|
|
|
|
GHCB_MSR_CPUID_VALUE_MASK,
|
|
|
|
GHCB_MSR_CPUID_VALUE_POS);
|
|
|
|
|
|
|
|
set_ghcb_msr_bits(svm, GHCB_MSR_CPUID_RESP,
|
|
|
|
GHCB_MSR_INFO_MASK,
|
|
|
|
GHCB_MSR_INFO_POS);
|
|
|
|
break;
|
|
|
|
}
|
2024-05-01 02:10:45 -05:00
|
|
|
case GHCB_MSR_AP_RESET_HOLD_REQ:
|
|
|
|
svm->sev_es.ap_reset_hold_type = AP_RESET_HOLD_MSR_PROTO;
|
|
|
|
ret = kvm_emulate_ap_reset_hold(&svm->vcpu);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Preset the result to a non-SIPI return and then only set
|
|
|
|
* the result to non-zero when delivering a SIPI.
|
|
|
|
*/
|
|
|
|
set_ghcb_msr_bits(svm, 0,
|
|
|
|
GHCB_MSR_AP_RESET_HOLD_RESULT_MASK,
|
|
|
|
GHCB_MSR_AP_RESET_HOLD_RESULT_POS);
|
|
|
|
|
|
|
|
set_ghcb_msr_bits(svm, GHCB_MSR_AP_RESET_HOLD_RESP,
|
|
|
|
GHCB_MSR_INFO_MASK,
|
|
|
|
GHCB_MSR_INFO_POS);
|
|
|
|
break;
|
2024-05-01 02:10:46 -05:00
|
|
|
case GHCB_MSR_HV_FT_REQ:
|
|
|
|
set_ghcb_msr_bits(svm, GHCB_HV_FT_SUPPORTED,
|
|
|
|
GHCB_MSR_HV_FT_MASK, GHCB_MSR_HV_FT_POS);
|
|
|
|
set_ghcb_msr_bits(svm, GHCB_MSR_HV_FT_RESP,
|
|
|
|
GHCB_MSR_INFO_MASK, GHCB_MSR_INFO_POS);
|
|
|
|
break;
|
2024-05-01 03:51:58 -05:00
|
|
|
case GHCB_MSR_PREF_GPA_REQ:
|
|
|
|
if (!sev_snp_guest(vcpu->kvm))
|
|
|
|
goto out_terminate;
|
|
|
|
|
|
|
|
set_ghcb_msr_bits(svm, GHCB_MSR_PREF_GPA_NONE, GHCB_MSR_GPA_VALUE_MASK,
|
|
|
|
GHCB_MSR_GPA_VALUE_POS);
|
|
|
|
set_ghcb_msr_bits(svm, GHCB_MSR_PREF_GPA_RESP, GHCB_MSR_INFO_MASK,
|
|
|
|
GHCB_MSR_INFO_POS);
|
|
|
|
break;
|
|
|
|
case GHCB_MSR_REG_GPA_REQ: {
|
|
|
|
u64 gfn;
|
|
|
|
|
|
|
|
if (!sev_snp_guest(vcpu->kvm))
|
|
|
|
goto out_terminate;
|
|
|
|
|
|
|
|
gfn = get_ghcb_msr_bits(svm, GHCB_MSR_GPA_VALUE_MASK,
|
|
|
|
GHCB_MSR_GPA_VALUE_POS);
|
|
|
|
|
|
|
|
svm->sev_es.ghcb_registered_gpa = gfn_to_gpa(gfn);
|
|
|
|
|
|
|
|
set_ghcb_msr_bits(svm, gfn, GHCB_MSR_GPA_VALUE_MASK,
|
|
|
|
GHCB_MSR_GPA_VALUE_POS);
|
|
|
|
set_ghcb_msr_bits(svm, GHCB_MSR_REG_GPA_RESP, GHCB_MSR_INFO_MASK,
|
|
|
|
GHCB_MSR_INFO_POS);
|
|
|
|
break;
|
|
|
|
}
|
2024-05-01 03:51:59 -05:00
|
|
|
case GHCB_MSR_PSC_REQ:
|
|
|
|
if (!sev_snp_guest(vcpu->kvm))
|
|
|
|
goto out_terminate;
|
|
|
|
|
|
|
|
ret = snp_begin_psc_msr(svm, control->ghcb_gpa);
|
|
|
|
break;
|
2020-12-10 11:09:51 -06:00
|
|
|
case GHCB_MSR_TERM_REQ: {
|
|
|
|
u64 reason_set, reason_code;
|
|
|
|
|
|
|
|
reason_set = get_ghcb_msr_bits(svm,
|
|
|
|
GHCB_MSR_TERM_REASON_SET_MASK,
|
|
|
|
GHCB_MSR_TERM_REASON_SET_POS);
|
|
|
|
reason_code = get_ghcb_msr_bits(svm,
|
|
|
|
GHCB_MSR_TERM_REASON_MASK,
|
|
|
|
GHCB_MSR_TERM_REASON_POS);
|
|
|
|
pr_info("SEV-ES guest requested termination: %#llx:%#llx\n",
|
|
|
|
reason_set, reason_code);
|
2021-12-02 12:52:05 -06:00
|
|
|
|
2024-05-01 03:51:58 -05:00
|
|
|
goto out_terminate;
|
2020-12-10 11:09:51 -06:00
|
|
|
}
|
2020-12-10 11:09:49 -06:00
|
|
|
default:
|
2021-12-02 12:52:05 -06:00
|
|
|
/* Error, keep GHCB MSR value as-is */
|
|
|
|
break;
|
2020-12-10 11:09:49 -06:00
|
|
|
}
|
|
|
|
|
2020-12-10 11:09:52 -06:00
|
|
|
trace_kvm_vmgexit_msr_protocol_exit(svm->vcpu.vcpu_id,
|
|
|
|
control->ghcb_gpa, ret);
|
|
|
|
|
2020-12-10 11:09:50 -06:00
|
|
|
return ret;
|
2024-05-01 03:51:58 -05:00
|
|
|
|
|
|
|
out_terminate:
|
|
|
|
vcpu->run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
|
|
|
|
vcpu->run->system_event.type = KVM_SYSTEM_EVENT_SEV_TERM;
|
|
|
|
vcpu->run->system_event.ndata = 1;
|
|
|
|
vcpu->run->system_event.data[0] = control->ghcb_gpa;
|
|
|
|
|
|
|
|
return 0;
|
2020-12-10 11:09:47 -06:00
|
|
|
}
|
|
|
|
|
2021-03-02 14:40:39 -05:00
|
|
|
int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
|
2020-12-10 11:09:47 -06:00
|
|
|
{
|
2021-03-02 14:40:39 -05:00
|
|
|
struct vcpu_svm *svm = to_svm(vcpu);
|
2020-12-10 11:09:47 -06:00
|
|
|
struct vmcb_control_area *control = &svm->vmcb->control;
|
|
|
|
u64 ghcb_gpa, exit_code;
|
|
|
|
int ret;
|
|
|
|
|
|
|
|
/* Validate the GHCB */
|
|
|
|
ghcb_gpa = control->ghcb_gpa;
|
|
|
|
if (ghcb_gpa & GHCB_MSR_INFO_MASK)
|
|
|
|
return sev_handle_vmgexit_msr_protocol(svm);
|
|
|
|
|
|
|
|
if (!ghcb_gpa) {
|
2021-03-02 14:40:39 -05:00
|
|
|
vcpu_unimpl(vcpu, "vmgexit: GHCB gpa is not set\n");
|
2021-12-02 12:52:05 -06:00
|
|
|
|
|
|
|
/* Without a GHCB, just return right back to the guest */
|
|
|
|
return 1;
|
2020-12-10 11:09:47 -06:00
|
|
|
}
|
|
|
|
|
2021-10-21 10:42:59 -07:00
|
|
|
if (kvm_vcpu_map(vcpu, ghcb_gpa >> PAGE_SHIFT, &svm->sev_es.ghcb_map)) {
|
2020-12-10 11:09:47 -06:00
|
|
|
/* Unable to map GHCB from guest */
|
2021-03-02 14:40:39 -05:00
|
|
|
vcpu_unimpl(vcpu, "vmgexit: error mapping GHCB [%#llx] from guest\n",
|
2020-12-10 11:09:47 -06:00
|
|
|
ghcb_gpa);
|
2021-12-02 12:52:05 -06:00
|
|
|
|
|
|
|
/* Without a GHCB, just return right back to the guest */
|
|
|
|
return 1;
|
2020-12-10 11:09:47 -06:00
|
|
|
}
|
|
|
|
|
2021-10-21 10:42:59 -07:00
|
|
|
svm->sev_es.ghcb = svm->sev_es.ghcb_map.hva;
|
2020-12-10 11:09:47 -06:00
|
|
|
|
2023-08-04 13:01:43 -04:00
|
|
|
trace_kvm_vmgexit_enter(vcpu->vcpu_id, svm->sev_es.ghcb);
|
2020-12-10 11:09:48 -06:00
|
|
|
|
2023-08-04 12:42:45 -04:00
|
|
|
sev_es_sync_from_ghcb(svm);
|
2024-05-01 03:51:58 -05:00
|
|
|
|
|
|
|
/* SEV-SNP guest requires that the GHCB GPA must be registered */
|
|
|
|
if (sev_snp_guest(svm->vcpu.kvm) && !ghcb_gpa_is_registered(svm, ghcb_gpa)) {
|
|
|
|
vcpu_unimpl(&svm->vcpu, "vmgexit: GHCB GPA [%#llx] is not registered.\n", ghcb_gpa);
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
|
KVM: SVM: Exit to userspace on ENOMEM/EFAULT GHCB errors
Exit to userspace if setup_vmgexit_scratch() fails due to OOM or because
copying data from guest (userspace) memory failed/faulted. The OOM
scenario is clearcut, it's userspace's decision as to whether it should
terminate the guest, free memory, etc...
As for -EFAULT, arguably, any guest issue is a violation of the guest's
contract with userspace, and thus userspace needs to decide how to
proceed. E.g. userspace defines what is RAM vs. MMIO and communicates
that directly to the guest, KVM is not involved in deciding what is/isn't
RAM nor in communicating that information to the guest. If the scratch
GPA doesn't resolve to a memslot, then the guest is not honoring the
memory configuration as defined by userspace.
And if userspace unmaps an hva for whatever reason, then exiting to
userspace with -EFAULT is absolutely the right thing to do. KVM's ABI
currently sucks and doesn't provide enough information to act on the
-EFAULT, but that will hopefully be remedied in the future as there are
multiple use cases, e.g. uffd and virtiofs truncation, that shouldn't
require any work in KVM beyond returning -EFAULT with a small amount of
metadata.
KVM could define its ABI such that failure to access the scratch area is
reflected into the guest, i.e. establish a contract with userspace, but
that's undesirable as it limits KVM's options in the future, e.g. in the
potential uffd case any failure on a uaccess needs to kick out to
userspace. KVM does have several cases where it reflects these errors
into the guest, e.g. kvm_pv_clock_pairing() and Hyper-V emulation, but
KVM would preferably "fix" those instead of propagating the falsehood
that any memory failure is the guest's fault.
Lastly, returning a boolean as an "error" for that a helper that isn't
named accordingly never works out well.
Fixes: ad5b353240c8 ("KVM: SVM: Do not terminate SEV-ES guests on GHCB validation failure")
Cc: Alper Gun <alpergun@google.com>
Cc: Peter Gonda <pgonda@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220225205209.3881130-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-25 20:52:09 +00:00
|
|
|
ret = sev_es_validate_vmgexit(svm);
|
|
|
|
if (ret)
|
|
|
|
return ret;
|
2020-12-10 11:09:47 -06:00
|
|
|
|
2025-02-25 21:39:37 +00:00
|
|
|
svm_vmgexit_success(svm, 0);
|
2020-12-10 11:09:47 -06:00
|
|
|
|
2023-08-04 12:56:36 -04:00
|
|
|
exit_code = kvm_ghcb_get_sw_exit_code(control);
|
2020-12-10 11:09:47 -06:00
|
|
|
switch (exit_code) {
|
2020-12-10 11:09:53 -06:00
|
|
|
case SVM_VMGEXIT_MMIO_READ:
|
KVM: SVM: Exit to userspace on ENOMEM/EFAULT GHCB errors
Exit to userspace if setup_vmgexit_scratch() fails due to OOM or because
copying data from guest (userspace) memory failed/faulted. The OOM
scenario is clearcut, it's userspace's decision as to whether it should
terminate the guest, free memory, etc...
As for -EFAULT, arguably, any guest issue is a violation of the guest's
contract with userspace, and thus userspace needs to decide how to
proceed. E.g. userspace defines what is RAM vs. MMIO and communicates
that directly to the guest, KVM is not involved in deciding what is/isn't
RAM nor in communicating that information to the guest. If the scratch
GPA doesn't resolve to a memslot, then the guest is not honoring the
memory configuration as defined by userspace.
And if userspace unmaps an hva for whatever reason, then exiting to
userspace with -EFAULT is absolutely the right thing to do. KVM's ABI
currently sucks and doesn't provide enough information to act on the
-EFAULT, but that will hopefully be remedied in the future as there are
multiple use cases, e.g. uffd and virtiofs truncation, that shouldn't
require any work in KVM beyond returning -EFAULT with a small amount of
metadata.
KVM could define its ABI such that failure to access the scratch area is
reflected into the guest, i.e. establish a contract with userspace, but
that's undesirable as it limits KVM's options in the future, e.g. in the
potential uffd case any failure on a uaccess needs to kick out to
userspace. KVM does have several cases where it reflects these errors
into the guest, e.g. kvm_pv_clock_pairing() and Hyper-V emulation, but
KVM would preferably "fix" those instead of propagating the falsehood
that any memory failure is the guest's fault.
Lastly, returning a boolean as an "error" for that a helper that isn't
named accordingly never works out well.
Fixes: ad5b353240c8 ("KVM: SVM: Do not terminate SEV-ES guests on GHCB validation failure")
Cc: Alper Gun <alpergun@google.com>
Cc: Peter Gonda <pgonda@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220225205209.3881130-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-25 20:52:09 +00:00
|
|
|
ret = setup_vmgexit_scratch(svm, true, control->exit_info_2);
|
|
|
|
if (ret)
|
2020-12-10 11:09:53 -06:00
|
|
|
break;
|
|
|
|
|
2021-03-02 14:40:39 -05:00
|
|
|
ret = kvm_sev_es_mmio_read(vcpu,
|
2020-12-10 11:09:53 -06:00
|
|
|
control->exit_info_1,
|
|
|
|
control->exit_info_2,
|
2021-10-21 10:42:59 -07:00
|
|
|
svm->sev_es.ghcb_sa);
|
2020-12-10 11:09:53 -06:00
|
|
|
break;
|
|
|
|
case SVM_VMGEXIT_MMIO_WRITE:
|
KVM: SVM: Exit to userspace on ENOMEM/EFAULT GHCB errors
Exit to userspace if setup_vmgexit_scratch() fails due to OOM or because
copying data from guest (userspace) memory failed/faulted. The OOM
scenario is clearcut, it's userspace's decision as to whether it should
terminate the guest, free memory, etc...
As for -EFAULT, arguably, any guest issue is a violation of the guest's
contract with userspace, and thus userspace needs to decide how to
proceed. E.g. userspace defines what is RAM vs. MMIO and communicates
that directly to the guest, KVM is not involved in deciding what is/isn't
RAM nor in communicating that information to the guest. If the scratch
GPA doesn't resolve to a memslot, then the guest is not honoring the
memory configuration as defined by userspace.
And if userspace unmaps an hva for whatever reason, then exiting to
userspace with -EFAULT is absolutely the right thing to do. KVM's ABI
currently sucks and doesn't provide enough information to act on the
-EFAULT, but that will hopefully be remedied in the future as there are
multiple use cases, e.g. uffd and virtiofs truncation, that shouldn't
require any work in KVM beyond returning -EFAULT with a small amount of
metadata.
KVM could define its ABI such that failure to access the scratch area is
reflected into the guest, i.e. establish a contract with userspace, but
that's undesirable as it limits KVM's options in the future, e.g. in the
potential uffd case any failure on a uaccess needs to kick out to
userspace. KVM does have several cases where it reflects these errors
into the guest, e.g. kvm_pv_clock_pairing() and Hyper-V emulation, but
KVM would preferably "fix" those instead of propagating the falsehood
that any memory failure is the guest's fault.
Lastly, returning a boolean as an "error" for that a helper that isn't
named accordingly never works out well.
Fixes: ad5b353240c8 ("KVM: SVM: Do not terminate SEV-ES guests on GHCB validation failure")
Cc: Alper Gun <alpergun@google.com>
Cc: Peter Gonda <pgonda@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220225205209.3881130-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-25 20:52:09 +00:00
|
|
|
ret = setup_vmgexit_scratch(svm, false, control->exit_info_2);
|
|
|
|
if (ret)
|
2020-12-10 11:09:53 -06:00
|
|
|
break;
|
|
|
|
|
2021-03-02 14:40:39 -05:00
|
|
|
ret = kvm_sev_es_mmio_write(vcpu,
|
2020-12-10 11:09:53 -06:00
|
|
|
control->exit_info_1,
|
|
|
|
control->exit_info_2,
|
2021-10-21 10:42:59 -07:00
|
|
|
svm->sev_es.ghcb_sa);
|
2020-12-10 11:09:53 -06:00
|
|
|
break;
|
2020-12-14 11:16:03 -05:00
|
|
|
case SVM_VMGEXIT_NMI_COMPLETE:
|
2023-06-15 16:37:56 +10:00
|
|
|
++vcpu->stat.nmi_window_exits;
|
|
|
|
svm->nmi_masked = false;
|
|
|
|
kvm_make_request(KVM_REQ_EVENT, vcpu);
|
|
|
|
ret = 1;
|
2020-12-14 11:16:03 -05:00
|
|
|
break;
|
KVM: SVM: Add support for booting APs in an SEV-ES guest
Typically under KVM, an AP is booted using the INIT-SIPI-SIPI sequence,
where the guest vCPU register state is updated and then the vCPU is VMRUN
to begin execution of the AP. For an SEV-ES guest, this won't work because
the guest register state is encrypted.
Following the GHCB specification, the hypervisor must not alter the guest
register state, so KVM must track an AP/vCPU boot. Should the guest want
to park the AP, it must use the AP Reset Hold exit event in place of, for
example, a HLT loop.
First AP boot (first INIT-SIPI-SIPI sequence):
Execute the AP (vCPU) as it was initialized and measured by the SEV-ES
support. It is up to the guest to transfer control of the AP to the
proper location.
Subsequent AP boot:
KVM will expect to receive an AP Reset Hold exit event indicating that
the vCPU is being parked and will require an INIT-SIPI-SIPI sequence to
awaken it. When the AP Reset Hold exit event is received, KVM will place
the vCPU into a simulated HLT mode. Upon receiving the INIT-SIPI-SIPI
sequence, KVM will make the vCPU runnable. It is again up to the guest
to then transfer control of the AP to the proper location.
To differentiate between an actual HLT and an AP Reset Hold, a new MP
state is introduced, KVM_MP_STATE_AP_RESET_HOLD, which the vCPU is
placed in upon receiving the AP Reset Hold exit event. Additionally, to
communicate the AP Reset Hold exit event up to userspace (if needed), a
new exit reason is introduced, KVM_EXIT_AP_RESET_HOLD.
A new x86 ops function is introduced, vcpu_deliver_sipi_vector, in order
to accomplish AP booting. For VMX, vcpu_deliver_sipi_vector is set to the
original SIPI delivery function, kvm_vcpu_deliver_sipi_vector(). SVM adds
a new function that, for non SEV-ES guests, invokes the original SIPI
delivery function, kvm_vcpu_deliver_sipi_vector(), but for SEV-ES guests,
implements the logic above.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <e8fbebe8eb161ceaabdad7c01a5859a78b424d5e.1609791600.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-04 14:20:01 -06:00
|
|
|
case SVM_VMGEXIT_AP_HLT_LOOP:
|
2024-05-01 02:10:45 -05:00
|
|
|
svm->sev_es.ap_reset_hold_type = AP_RESET_HOLD_NAE_EVENT;
|
2021-03-02 14:40:39 -05:00
|
|
|
ret = kvm_emulate_ap_reset_hold(vcpu);
|
KVM: SVM: Add support for booting APs in an SEV-ES guest
Typically under KVM, an AP is booted using the INIT-SIPI-SIPI sequence,
where the guest vCPU register state is updated and then the vCPU is VMRUN
to begin execution of the AP. For an SEV-ES guest, this won't work because
the guest register state is encrypted.
Following the GHCB specification, the hypervisor must not alter the guest
register state, so KVM must track an AP/vCPU boot. Should the guest want
to park the AP, it must use the AP Reset Hold exit event in place of, for
example, a HLT loop.
First AP boot (first INIT-SIPI-SIPI sequence):
Execute the AP (vCPU) as it was initialized and measured by the SEV-ES
support. It is up to the guest to transfer control of the AP to the
proper location.
Subsequent AP boot:
KVM will expect to receive an AP Reset Hold exit event indicating that
the vCPU is being parked and will require an INIT-SIPI-SIPI sequence to
awaken it. When the AP Reset Hold exit event is received, KVM will place
the vCPU into a simulated HLT mode. Upon receiving the INIT-SIPI-SIPI
sequence, KVM will make the vCPU runnable. It is again up to the guest
to then transfer control of the AP to the proper location.
To differentiate between an actual HLT and an AP Reset Hold, a new MP
state is introduced, KVM_MP_STATE_AP_RESET_HOLD, which the vCPU is
placed in upon receiving the AP Reset Hold exit event. Additionally, to
communicate the AP Reset Hold exit event up to userspace (if needed), a
new exit reason is introduced, KVM_EXIT_AP_RESET_HOLD.
A new x86 ops function is introduced, vcpu_deliver_sipi_vector, in order
to accomplish AP booting. For VMX, vcpu_deliver_sipi_vector is set to the
original SIPI delivery function, kvm_vcpu_deliver_sipi_vector(). SVM adds
a new function that, for non SEV-ES guests, invokes the original SIPI
delivery function, kvm_vcpu_deliver_sipi_vector(), but for SEV-ES guests,
implements the logic above.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <e8fbebe8eb161ceaabdad7c01a5859a78b424d5e.1609791600.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-04 14:20:01 -06:00
|
|
|
break;
|
2020-12-15 12:44:07 -05:00
|
|
|
case SVM_VMGEXIT_AP_JUMP_TABLE: {
|
2025-01-23 11:21:40 +05:30
|
|
|
struct kvm_sev_info *sev = to_kvm_sev_info(vcpu->kvm);
|
2020-12-15 12:44:07 -05:00
|
|
|
|
|
|
|
switch (control->exit_info_1) {
|
|
|
|
case 0:
|
|
|
|
/* Set AP jump table address */
|
|
|
|
sev->ap_jump_table = control->exit_info_2;
|
|
|
|
break;
|
|
|
|
case 1:
|
|
|
|
/* Get AP jump table address */
|
2025-02-25 21:39:37 +00:00
|
|
|
svm_vmgexit_success(svm, sev->ap_jump_table);
|
2020-12-15 12:44:07 -05:00
|
|
|
break;
|
|
|
|
default:
|
|
|
|
pr_err("svm: vmgexit: unsupported AP jump table request - exit_info_1=%#llx\n",
|
|
|
|
control->exit_info_1);
|
2025-02-25 21:39:37 +00:00
|
|
|
svm_vmgexit_bad_input(svm, GHCB_ERR_INVALID_INPUT);
|
2020-12-15 12:44:07 -05:00
|
|
|
}
|
|
|
|
|
KVM: SVM: Exit to userspace on ENOMEM/EFAULT GHCB errors
Exit to userspace if setup_vmgexit_scratch() fails due to OOM or because
copying data from guest (userspace) memory failed/faulted. The OOM
scenario is clearcut, it's userspace's decision as to whether it should
terminate the guest, free memory, etc...
As for -EFAULT, arguably, any guest issue is a violation of the guest's
contract with userspace, and thus userspace needs to decide how to
proceed. E.g. userspace defines what is RAM vs. MMIO and communicates
that directly to the guest, KVM is not involved in deciding what is/isn't
RAM nor in communicating that information to the guest. If the scratch
GPA doesn't resolve to a memslot, then the guest is not honoring the
memory configuration as defined by userspace.
And if userspace unmaps an hva for whatever reason, then exiting to
userspace with -EFAULT is absolutely the right thing to do. KVM's ABI
currently sucks and doesn't provide enough information to act on the
-EFAULT, but that will hopefully be remedied in the future as there are
multiple use cases, e.g. uffd and virtiofs truncation, that shouldn't
require any work in KVM beyond returning -EFAULT with a small amount of
metadata.
KVM could define its ABI such that failure to access the scratch area is
reflected into the guest, i.e. establish a contract with userspace, but
that's undesirable as it limits KVM's options in the future, e.g. in the
potential uffd case any failure on a uaccess needs to kick out to
userspace. KVM does have several cases where it reflects these errors
into the guest, e.g. kvm_pv_clock_pairing() and Hyper-V emulation, but
KVM would preferably "fix" those instead of propagating the falsehood
that any memory failure is the guest's fault.
Lastly, returning a boolean as an "error" for that a helper that isn't
named accordingly never works out well.
Fixes: ad5b353240c8 ("KVM: SVM: Do not terminate SEV-ES guests on GHCB validation failure")
Cc: Alper Gun <alpergun@google.com>
Cc: Peter Gonda <pgonda@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220225205209.3881130-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-25 20:52:09 +00:00
|
|
|
ret = 1;
|
2020-12-15 12:44:07 -05:00
|
|
|
break;
|
|
|
|
}
|
2024-05-01 02:10:46 -05:00
|
|
|
case SVM_VMGEXIT_HV_FEATURES:
|
2025-02-25 21:39:37 +00:00
|
|
|
svm_vmgexit_success(svm, GHCB_HV_FT_SUPPORTED);
|
2024-05-01 02:10:46 -05:00
|
|
|
ret = 1;
|
|
|
|
break;
|
2024-05-01 02:10:47 -05:00
|
|
|
case SVM_VMGEXIT_TERM_REQUEST:
|
|
|
|
pr_info("SEV-ES guest requested termination: reason %#llx info %#llx\n",
|
|
|
|
control->exit_info_1, control->exit_info_2);
|
|
|
|
vcpu->run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
|
|
|
|
vcpu->run->system_event.type = KVM_SYSTEM_EVENT_SEV_TERM;
|
|
|
|
vcpu->run->system_event.ndata = 1;
|
|
|
|
vcpu->run->system_event.data[0] = control->ghcb_gpa;
|
|
|
|
break;
|
2024-05-01 03:52:00 -05:00
|
|
|
case SVM_VMGEXIT_PSC:
|
|
|
|
ret = setup_vmgexit_scratch(svm, true, control->exit_info_2);
|
|
|
|
if (ret)
|
|
|
|
break;
|
|
|
|
|
|
|
|
ret = snp_begin_psc(svm, svm->sev_es.ghcb_sa);
|
|
|
|
break;
|
2024-05-01 03:52:02 -05:00
|
|
|
case SVM_VMGEXIT_AP_CREATION:
|
|
|
|
ret = sev_snp_ap_creation(svm);
|
|
|
|
if (ret) {
|
2025-02-25 21:39:37 +00:00
|
|
|
svm_vmgexit_bad_input(svm, GHCB_ERR_INVALID_INPUT);
|
2024-05-01 03:52:02 -05:00
|
|
|
}
|
|
|
|
|
|
|
|
ret = 1;
|
|
|
|
break;
|
KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event
Version 2 of GHCB specification added support for the SNP Guest Request
Message NAE event. The event allows for an SEV-SNP guest to make
requests to the SEV-SNP firmware through the hypervisor using the
SNP_GUEST_REQUEST API defined in the SEV-SNP firmware specification.
This is used by guests primarily to request attestation reports from
firmware. There are other request types are available as well, but the
specifics of what guest requests are being made generally does not
affect how they are handled by the hypervisor, which only serves as a
proxy for the guest requests and firmware responses.
Implement handling for these events.
When an SNP Guest Request is issued, the guest will provide its own
request/response pages, which could in theory be passed along directly
to firmware. However, these pages would need special care:
- Both pages are from shared guest memory, so they need to be
protected from migration/etc. occurring while firmware reads/writes
to them. At a minimum, this requires elevating the ref counts and
potentially needing an explicit pinning of the memory. This places
additional restrictions on what type of memory backends userspace
can use for shared guest memory since there would be some reliance
on using refcounted pages.
- The response page needs to be switched to Firmware-owned state
before the firmware can write to it, which can lead to potential
host RMP #PFs if the guest is misbehaved and hands the host a
guest page that KVM is writing to for other reasons (e.g. virtio
buffers).
Both of these issues can be avoided completely by using
separately-allocated bounce pages for both the request/response pages
and passing those to firmware instead. So that's the approach taken
here.
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Co-developed-by: Alexey Kardashevskiy <aik@amd.com>
Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
[mdr: ensure FW command failures are indicated to guest, drop extended
request handling to be re-written as separate patch, massage commit]
Signed-off-by: Michael Roth <michael.roth@amd.com>
Message-ID: <20240701223148.3798365-2-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-07-01 17:31:46 -05:00
|
|
|
case SVM_VMGEXIT_GUEST_REQUEST:
|
|
|
|
ret = snp_handle_guest_req(svm, control->exit_info_1, control->exit_info_2);
|
|
|
|
break;
|
2024-07-01 17:31:48 -05:00
|
|
|
case SVM_VMGEXIT_EXT_GUEST_REQUEST:
|
|
|
|
ret = snp_handle_ext_guest_req(svm, control->exit_info_1, control->exit_info_2);
|
|
|
|
break;
|
2020-12-10 11:09:47 -06:00
|
|
|
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
|
2021-03-02 14:40:39 -05:00
|
|
|
vcpu_unimpl(vcpu,
|
2020-12-10 11:09:47 -06:00
|
|
|
"vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
|
|
|
|
control->exit_info_1, control->exit_info_2);
|
2021-11-09 22:23:49 +00:00
|
|
|
ret = -EINVAL;
|
2020-12-10 11:09:47 -06:00
|
|
|
break;
|
|
|
|
default:
|
2021-03-02 14:40:39 -05:00
|
|
|
ret = svm_invoke_exit_handler(vcpu, exit_code);
|
2020-12-10 11:09:47 -06:00
|
|
|
}
|
|
|
|
|
|
|
|
return ret;
|
|
|
|
}
|
2020-12-10 11:09:54 -06:00
|
|
|
|
|
|
|
int sev_es_string_io(struct vcpu_svm *svm, int size, unsigned int port, int in)
|
|
|
|
{
|
2021-10-25 12:14:31 -04:00
|
|
|
int count;
|
|
|
|
int bytes;
|
KVM: SVM: Exit to userspace on ENOMEM/EFAULT GHCB errors
Exit to userspace if setup_vmgexit_scratch() fails due to OOM or because
copying data from guest (userspace) memory failed/faulted. The OOM
scenario is clearcut, it's userspace's decision as to whether it should
terminate the guest, free memory, etc...
As for -EFAULT, arguably, any guest issue is a violation of the guest's
contract with userspace, and thus userspace needs to decide how to
proceed. E.g. userspace defines what is RAM vs. MMIO and communicates
that directly to the guest, KVM is not involved in deciding what is/isn't
RAM nor in communicating that information to the guest. If the scratch
GPA doesn't resolve to a memslot, then the guest is not honoring the
memory configuration as defined by userspace.
And if userspace unmaps an hva for whatever reason, then exiting to
userspace with -EFAULT is absolutely the right thing to do. KVM's ABI
currently sucks and doesn't provide enough information to act on the
-EFAULT, but that will hopefully be remedied in the future as there are
multiple use cases, e.g. uffd and virtiofs truncation, that shouldn't
require any work in KVM beyond returning -EFAULT with a small amount of
metadata.
KVM could define its ABI such that failure to access the scratch area is
reflected into the guest, i.e. establish a contract with userspace, but
that's undesirable as it limits KVM's options in the future, e.g. in the
potential uffd case any failure on a uaccess needs to kick out to
userspace. KVM does have several cases where it reflects these errors
into the guest, e.g. kvm_pv_clock_pairing() and Hyper-V emulation, but
KVM would preferably "fix" those instead of propagating the falsehood
that any memory failure is the guest's fault.
Lastly, returning a boolean as an "error" for that a helper that isn't
named accordingly never works out well.
Fixes: ad5b353240c8 ("KVM: SVM: Do not terminate SEV-ES guests on GHCB validation failure")
Cc: Alper Gun <alpergun@google.com>
Cc: Peter Gonda <pgonda@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220225205209.3881130-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-25 20:52:09 +00:00
|
|
|
int r;
|
2021-10-25 12:14:31 -04:00
|
|
|
|
|
|
|
if (svm->vmcb->control.exit_info_2 > INT_MAX)
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
count = svm->vmcb->control.exit_info_2;
|
|
|
|
if (unlikely(check_mul_overflow(count, size, &bytes)))
|
|
|
|
return -EINVAL;
|
|
|
|
|
KVM: SVM: Exit to userspace on ENOMEM/EFAULT GHCB errors
Exit to userspace if setup_vmgexit_scratch() fails due to OOM or because
copying data from guest (userspace) memory failed/faulted. The OOM
scenario is clearcut, it's userspace's decision as to whether it should
terminate the guest, free memory, etc...
As for -EFAULT, arguably, any guest issue is a violation of the guest's
contract with userspace, and thus userspace needs to decide how to
proceed. E.g. userspace defines what is RAM vs. MMIO and communicates
that directly to the guest, KVM is not involved in deciding what is/isn't
RAM nor in communicating that information to the guest. If the scratch
GPA doesn't resolve to a memslot, then the guest is not honoring the
memory configuration as defined by userspace.
And if userspace unmaps an hva for whatever reason, then exiting to
userspace with -EFAULT is absolutely the right thing to do. KVM's ABI
currently sucks and doesn't provide enough information to act on the
-EFAULT, but that will hopefully be remedied in the future as there are
multiple use cases, e.g. uffd and virtiofs truncation, that shouldn't
require any work in KVM beyond returning -EFAULT with a small amount of
metadata.
KVM could define its ABI such that failure to access the scratch area is
reflected into the guest, i.e. establish a contract with userspace, but
that's undesirable as it limits KVM's options in the future, e.g. in the
potential uffd case any failure on a uaccess needs to kick out to
userspace. KVM does have several cases where it reflects these errors
into the guest, e.g. kvm_pv_clock_pairing() and Hyper-V emulation, but
KVM would preferably "fix" those instead of propagating the falsehood
that any memory failure is the guest's fault.
Lastly, returning a boolean as an "error" for that a helper that isn't
named accordingly never works out well.
Fixes: ad5b353240c8 ("KVM: SVM: Do not terminate SEV-ES guests on GHCB validation failure")
Cc: Alper Gun <alpergun@google.com>
Cc: Peter Gonda <pgonda@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220225205209.3881130-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-25 20:52:09 +00:00
|
|
|
r = setup_vmgexit_scratch(svm, in, bytes);
|
|
|
|
if (r)
|
|
|
|
return r;
|
2020-12-10 11:09:54 -06:00
|
|
|
|
2021-10-21 10:42:59 -07:00
|
|
|
return kvm_sev_es_string_io(&svm->vcpu, size, port, svm->sev_es.ghcb_sa,
|
2021-11-11 10:52:26 -05:00
|
|
|
count, in);
|
2020-12-10 11:09:54 -06:00
|
|
|
}
|
2020-12-10 11:10:06 -06:00
|
|
|
|
2025-06-10 15:57:24 -07:00
|
|
|
void sev_es_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
|
KVM: SVM: Fix TSC_AUX virtualization setup
The checks for virtualizing TSC_AUX occur during the vCPU reset processing
path. However, at the time of initial vCPU reset processing, when the vCPU
is first created, not all of the guest CPUID information has been set. In
this case the RDTSCP and RDPID feature support for the guest is not in
place and so TSC_AUX virtualization is not established.
This continues for each vCPU created for the guest. On the first boot of
an AP, vCPU reset processing is executed as a result of an APIC INIT
event, this time with all of the guest CPUID information set, resulting
in TSC_AUX virtualization being enabled, but only for the APs. The BSP
always sees a TSC_AUX value of 0 which probably went unnoticed because,
at least for Linux, the BSP TSC_AUX value is 0.
Move the TSC_AUX virtualization enablement out of the init_vmcb() path and
into the vcpu_after_set_cpuid() path to allow for proper initialization of
the support after the guest CPUID information has been set.
With the TSC_AUX virtualization support now in the vcpu_set_after_cpuid()
path, the intercepts must be either cleared or set based on the guest
CPUID input.
Fixes: 296d5a17e793 ("KVM: SEV-ES: Use V_TSC_AUX if available instead of RDTSC/MSR_TSC_AUX intercepts")
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <4137fbcb9008951ab5f0befa74a0399d2cce809a.1694811272.git.thomas.lendacky@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-09-15 15:54:30 -05:00
|
|
|
{
|
2025-06-10 15:57:24 -07:00
|
|
|
/* Clear intercepts on MSRs that are context switched by hardware. */
|
|
|
|
svm_disable_intercept_for_msr(vcpu, MSR_AMD64_SEV_ES_GHCB, MSR_TYPE_RW);
|
|
|
|
svm_disable_intercept_for_msr(vcpu, MSR_EFER, MSR_TYPE_RW);
|
|
|
|
svm_disable_intercept_for_msr(vcpu, MSR_IA32_CR_PAT, MSR_TYPE_RW);
|
KVM: SVM: Fix TSC_AUX virtualization setup
The checks for virtualizing TSC_AUX occur during the vCPU reset processing
path. However, at the time of initial vCPU reset processing, when the vCPU
is first created, not all of the guest CPUID information has been set. In
this case the RDTSCP and RDPID feature support for the guest is not in
place and so TSC_AUX virtualization is not established.
This continues for each vCPU created for the guest. On the first boot of
an AP, vCPU reset processing is executed as a result of an APIC INIT
event, this time with all of the guest CPUID information set, resulting
in TSC_AUX virtualization being enabled, but only for the APs. The BSP
always sees a TSC_AUX value of 0 which probably went unnoticed because,
at least for Linux, the BSP TSC_AUX value is 0.
Move the TSC_AUX virtualization enablement out of the init_vmcb() path and
into the vcpu_after_set_cpuid() path to allow for proper initialization of
the support after the guest CPUID information has been set.
With the TSC_AUX virtualization support now in the vcpu_set_after_cpuid()
path, the intercepts must be either cleared or set based on the guest
CPUID input.
Fixes: 296d5a17e793 ("KVM: SEV-ES: Use V_TSC_AUX if available instead of RDTSC/MSR_TSC_AUX intercepts")
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <4137fbcb9008951ab5f0befa74a0399d2cce809a.1694811272.git.thomas.lendacky@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-09-15 15:54:30 -05:00
|
|
|
|
KVM: SVM: Implement and adopt VMX style MSR intercepts APIs
Add and use SVM MSR interception APIs (in most paths) to match VMX's
APIs and nomenclature. Specifically, add SVM variants of:
vmx_disable_intercept_for_msr(vcpu, msr, type)
vmx_enable_intercept_for_msr(vcpu, msr, type)
vmx_set_intercept_for_msr(vcpu, msr, type, intercept)
to eventually replace SVM's single helper:
set_msr_interception(vcpu, msrpm, msr, allow_read, allow_write)
which is awkward to use (in all cases, KVM either applies the same logic
for both reads and writes, or intercepts one of read or write), and is
unintuitive due to using '0' to indicate interception should be *set*.
Keep the guts of the old API for the moment to avoid churning the MSR
filter code, as that mess will be overhauled in the near future. Leave
behind a temporary comment to call out that the shadow bitmaps have
inverted polarity relative to the bitmaps consumed by hardware.
No functional change intended.
Reviewed-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Link: https://lore.kernel.org/r/20250610225737.156318-15-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-06-10 15:57:19 -07:00
|
|
|
if (boot_cpu_has(X86_FEATURE_V_TSC_AUX))
|
|
|
|
svm_set_intercept_for_msr(vcpu, MSR_TSC_AUX, MSR_TYPE_RW,
|
|
|
|
!guest_cpu_cap_has(vcpu, X86_FEATURE_RDTSCP) &&
|
|
|
|
!guest_cpu_cap_has(vcpu, X86_FEATURE_RDPID));
|
KVM: SEV: Do not intercept accesses to MSR_IA32_XSS for SEV-ES guests
When intercepts are enabled for MSR_IA32_XSS, the host will swap in/out
the guest-defined values while context-switching to/from guest mode.
However, in the case of SEV-ES, vcpu->arch.guest_state_protected is set,
so the guest-defined value is effectively ignored when switching to
guest mode with the understanding that the VMSA will handle swapping
in/out this register state.
However, SVM is still configured to intercept these accesses for SEV-ES
guests, so the values in the initial MSR_IA32_XSS are effectively
read-only, and a guest will experience undefined behavior if it actually
tries to write to this MSR. Fortunately, only CET/shadowstack makes use
of this register on SEV-ES-capable systems currently, which isn't yet
widely used, but this may become more of an issue in the future.
Additionally, enabling intercepts of MSR_IA32_XSS results in #VC
exceptions in the guest in certain paths that can lead to unexpected #VC
nesting levels. One example is SEV-SNP guests when handling #VC
exceptions for CPUID instructions involving leaf 0xD, subleaf 0x1, since
they will access MSR_IA32_XSS as part of servicing the CPUID #VC, then
generate another #VC when accessing MSR_IA32_XSS, which can lead to
guest crashes if an NMI occurs at that point in time. Running perf on a
guest while it is issuing such a sequence is one example where these can
be problematic.
Address this by disabling intercepts of MSR_IA32_XSS for SEV-ES guests
if the host/guest configuration allows it. If the host/guest
configuration doesn't allow for MSR_IA32_XSS, leave it intercepted so
that it can be caught by the existing checks in
kvm_{set,get}_msr_common() if the guest still attempts to access it.
Fixes: 376c6d285017 ("KVM: SVM: Provide support for SEV-ES vCPU creation/loading")
Cc: Alexey Kardashevskiy <aik@amd.com>
Suggested-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Message-Id: <20231016132819.1002933-4-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-10-16 08:27:32 -05:00
|
|
|
|
|
|
|
/*
|
|
|
|
* For SEV-ES, accesses to MSR_IA32_XSS should not be intercepted if
|
|
|
|
* the host/guest supports its use.
|
|
|
|
*
|
2024-11-27 17:34:06 -08:00
|
|
|
* KVM treats the guest as being capable of using XSAVES even if XSAVES
|
|
|
|
* isn't enabled in guest CPUID as there is no intercept for XSAVES,
|
|
|
|
* i.e. the guest can use XSAVES/XRSTOR to read/write XSS if XSAVE is
|
|
|
|
* exposed to the guest and XSAVES is supported in hardware. Condition
|
|
|
|
* full XSS passthrough on the guest being able to use XSAVES *and*
|
|
|
|
* XSAVES being exposed to the guest so that KVM can at least honor
|
|
|
|
* guest CPUID for RDMSR and WRMSR.
|
KVM: SEV: Do not intercept accesses to MSR_IA32_XSS for SEV-ES guests
When intercepts are enabled for MSR_IA32_XSS, the host will swap in/out
the guest-defined values while context-switching to/from guest mode.
However, in the case of SEV-ES, vcpu->arch.guest_state_protected is set,
so the guest-defined value is effectively ignored when switching to
guest mode with the understanding that the VMSA will handle swapping
in/out this register state.
However, SVM is still configured to intercept these accesses for SEV-ES
guests, so the values in the initial MSR_IA32_XSS are effectively
read-only, and a guest will experience undefined behavior if it actually
tries to write to this MSR. Fortunately, only CET/shadowstack makes use
of this register on SEV-ES-capable systems currently, which isn't yet
widely used, but this may become more of an issue in the future.
Additionally, enabling intercepts of MSR_IA32_XSS results in #VC
exceptions in the guest in certain paths that can lead to unexpected #VC
nesting levels. One example is SEV-SNP guests when handling #VC
exceptions for CPUID instructions involving leaf 0xD, subleaf 0x1, since
they will access MSR_IA32_XSS as part of servicing the CPUID #VC, then
generate another #VC when accessing MSR_IA32_XSS, which can lead to
guest crashes if an NMI occurs at that point in time. Running perf on a
guest while it is issuing such a sequence is one example where these can
be problematic.
Address this by disabling intercepts of MSR_IA32_XSS for SEV-ES guests
if the host/guest configuration allows it. If the host/guest
configuration doesn't allow for MSR_IA32_XSS, leave it intercepted so
that it can be caught by the existing checks in
kvm_{set,get}_msr_common() if the guest still attempts to access it.
Fixes: 376c6d285017 ("KVM: SVM: Provide support for SEV-ES vCPU creation/loading")
Cc: Alexey Kardashevskiy <aik@amd.com>
Suggested-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Message-Id: <20231016132819.1002933-4-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-10-16 08:27:32 -05:00
|
|
|
*/
|
2025-06-12 01:19:47 -07:00
|
|
|
svm_set_intercept_for_msr(vcpu, MSR_IA32_XSS, MSR_TYPE_RW,
|
|
|
|
!guest_cpu_cap_has(vcpu, X86_FEATURE_XSAVES) ||
|
|
|
|
!guest_cpuid_has(vcpu, X86_FEATURE_XSAVES));
|
KVM: SVM: Fix TSC_AUX virtualization setup
The checks for virtualizing TSC_AUX occur during the vCPU reset processing
path. However, at the time of initial vCPU reset processing, when the vCPU
is first created, not all of the guest CPUID information has been set. In
this case the RDTSCP and RDPID feature support for the guest is not in
place and so TSC_AUX virtualization is not established.
This continues for each vCPU created for the guest. On the first boot of
an AP, vCPU reset processing is executed as a result of an APIC INIT
event, this time with all of the guest CPUID information set, resulting
in TSC_AUX virtualization being enabled, but only for the APs. The BSP
always sees a TSC_AUX value of 0 which probably went unnoticed because,
at least for Linux, the BSP TSC_AUX value is 0.
Move the TSC_AUX virtualization enablement out of the init_vmcb() path and
into the vcpu_after_set_cpuid() path to allow for proper initialization of
the support after the guest CPUID information has been set.
With the TSC_AUX virtualization support now in the vcpu_set_after_cpuid()
path, the intercepts must be either cleared or set based on the guest
CPUID input.
Fixes: 296d5a17e793 ("KVM: SEV-ES: Use V_TSC_AUX if available instead of RDTSC/MSR_TSC_AUX intercepts")
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <4137fbcb9008951ab5f0befa74a0399d2cce809a.1694811272.git.thomas.lendacky@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-09-15 15:54:30 -05:00
|
|
|
}
|
|
|
|
|
|
|
|
void sev_vcpu_after_set_cpuid(struct vcpu_svm *svm)
|
|
|
|
{
|
|
|
|
struct kvm_vcpu *vcpu = &svm->vcpu;
|
|
|
|
struct kvm_cpuid_entry2 *best;
|
|
|
|
|
|
|
|
/* For sev guests, the memory encryption bit is not reserved in CR3. */
|
|
|
|
best = kvm_find_cpuid_entry(vcpu, 0x8000001F);
|
|
|
|
if (best)
|
|
|
|
vcpu->arch.reserved_gpa_bits &= ~(1UL << (best->ebx & 0x3f));
|
|
|
|
}
|
|
|
|
|
2022-06-23 10:34:06 -07:00
|
|
|
static void sev_es_init_vmcb(struct vcpu_svm *svm)
|
2020-12-10 11:10:06 -06:00
|
|
|
{
|
2025-03-10 15:16:03 -05:00
|
|
|
struct kvm_sev_info *sev = to_kvm_sev_info(svm->vcpu.kvm);
|
2023-06-15 16:37:53 +10:00
|
|
|
struct vmcb *vmcb = svm->vmcb01.ptr;
|
2020-12-10 11:10:06 -06:00
|
|
|
|
|
|
|
svm->vmcb->control.nested_ctl |= SVM_NESTED_CTL_SEV_ES_ENABLE;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* An SEV-ES guest requires a VMSA area that is a separate from the
|
|
|
|
* VMCB page. Do not include the encryption mask on the VMSA physical
|
2023-08-24 19:23:57 -07:00
|
|
|
* address since hardware will access it using the guest key. Note,
|
|
|
|
* the VMSA will be NULL if this vCPU is the destination for intrahost
|
|
|
|
* migration, and will be copied later.
|
2020-12-10 11:10:06 -06:00
|
|
|
*/
|
2025-06-02 15:44:59 -07:00
|
|
|
if (!svm->sev_es.snp_has_guest_vmsa) {
|
|
|
|
if (svm->sev_es.vmsa)
|
|
|
|
svm->vmcb->control.vmsa_pa = __pa(svm->sev_es.vmsa);
|
|
|
|
else
|
|
|
|
svm->vmcb->control.vmsa_pa = INVALID_PAGE;
|
|
|
|
}
|
2020-12-10 11:10:06 -06:00
|
|
|
|
2025-03-10 15:16:03 -05:00
|
|
|
if (cpu_feature_enabled(X86_FEATURE_ALLOWED_SEV_FEATURES))
|
|
|
|
svm->vmcb->control.allowed_sev_features = sev->vmsa_features |
|
|
|
|
VMCB_ALLOWED_SEV_FEATURES_VALID;
|
|
|
|
|
2020-12-10 11:10:06 -06:00
|
|
|
/* Can't intercept CR register access, HV can't modify CR registers */
|
|
|
|
svm_clr_intercept(svm, INTERCEPT_CR0_READ);
|
|
|
|
svm_clr_intercept(svm, INTERCEPT_CR4_READ);
|
|
|
|
svm_clr_intercept(svm, INTERCEPT_CR8_READ);
|
|
|
|
svm_clr_intercept(svm, INTERCEPT_CR0_WRITE);
|
|
|
|
svm_clr_intercept(svm, INTERCEPT_CR4_WRITE);
|
|
|
|
svm_clr_intercept(svm, INTERCEPT_CR8_WRITE);
|
|
|
|
|
|
|
|
svm_clr_intercept(svm, INTERCEPT_SELECTIVE_CR0);
|
|
|
|
|
|
|
|
/* Track EFER/CR register changes */
|
|
|
|
svm_set_intercept(svm, TRAP_EFER_WRITE);
|
|
|
|
svm_set_intercept(svm, TRAP_CR0_WRITE);
|
|
|
|
svm_set_intercept(svm, TRAP_CR4_WRITE);
|
|
|
|
svm_set_intercept(svm, TRAP_CR8_WRITE);
|
|
|
|
|
2023-06-15 16:37:53 +10:00
|
|
|
vmcb->control.intercepts[INTERCEPT_DR] = 0;
|
2024-04-04 08:13:16 -04:00
|
|
|
if (!sev_vcpu_has_debug_swap(svm)) {
|
2023-06-15 16:37:54 +10:00
|
|
|
vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_READ);
|
|
|
|
vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_WRITE);
|
|
|
|
recalc_intercepts(svm);
|
2023-06-15 16:37:55 +10:00
|
|
|
} else {
|
|
|
|
/*
|
|
|
|
* Disable #DB intercept iff DebugSwap is enabled. KVM doesn't
|
|
|
|
* allow debugging SEV-ES guests, and enables DebugSwap iff
|
|
|
|
* NO_NESTED_DATA_BP is supported, so there's no reason to
|
|
|
|
* intercept #DB when DebugSwap is enabled. For simplicity
|
|
|
|
* with respect to guest debug, intercept #DB for other VMs
|
|
|
|
* even if NO_NESTED_DATA_BP is supported, i.e. even if the
|
|
|
|
* guest can't DoS the CPU with infinite #DB vectoring.
|
|
|
|
*/
|
|
|
|
clr_exception_intercept(svm, DB_VECTOR);
|
2023-06-15 16:37:54 +10:00
|
|
|
}
|
2020-12-10 11:10:06 -06:00
|
|
|
|
|
|
|
/* Can't intercept XSETBV, HV can't modify XCR0 directly */
|
|
|
|
svm_clr_intercept(svm, INTERCEPT_XSETBV);
|
|
|
|
}
|
|
|
|
|
2022-06-23 10:34:06 -07:00
|
|
|
void sev_init_vmcb(struct vcpu_svm *svm)
|
|
|
|
{
|
|
|
|
svm->vmcb->control.nested_ctl |= SVM_NESTED_CTL_SEV_ENABLE;
|
|
|
|
clr_exception_intercept(svm, UD_VECTOR);
|
|
|
|
|
2023-06-15 16:37:50 +10:00
|
|
|
/*
|
|
|
|
* Don't intercept #GP for SEV guests, e.g. for the VMware backdoor, as
|
|
|
|
* KVM can't decrypt guest memory to decode the faulting instruction.
|
|
|
|
*/
|
|
|
|
clr_exception_intercept(svm, GP_VECTOR);
|
|
|
|
|
2022-06-23 10:34:06 -07:00
|
|
|
if (sev_es_guest(svm->vcpu.kvm))
|
|
|
|
sev_es_init_vmcb(svm);
|
|
|
|
}
|
|
|
|
|
2021-09-20 17:03:02 -07:00
|
|
|
void sev_es_vcpu_reset(struct vcpu_svm *svm)
|
2020-12-10 11:10:06 -06:00
|
|
|
{
|
2024-05-01 02:10:48 -05:00
|
|
|
struct kvm_vcpu *vcpu = &svm->vcpu;
|
2025-01-23 11:21:40 +05:30
|
|
|
struct kvm_sev_info *sev = to_kvm_sev_info(vcpu->kvm);
|
2024-05-01 02:10:48 -05:00
|
|
|
|
2020-12-10 11:10:06 -06:00
|
|
|
/*
|
2021-09-20 17:03:02 -07:00
|
|
|
* Set the GHCB MSR value as per the GHCB specification when emulating
|
|
|
|
* vCPU RESET for an SEV-ES guest.
|
2020-12-10 11:10:06 -06:00
|
|
|
*/
|
2024-05-01 02:10:48 -05:00
|
|
|
set_ghcb_msr(svm, GHCB_MSR_SEV_INFO((__u64)sev->ghcb_version,
|
2020-12-10 11:10:06 -06:00
|
|
|
GHCB_VERSION_MIN,
|
|
|
|
sev_enc_bit));
|
2024-05-01 03:52:02 -05:00
|
|
|
|
|
|
|
mutex_init(&svm->sev_es.snp_vmsa_mutex);
|
2020-12-10 11:10:06 -06:00
|
|
|
}
|
2020-12-10 11:10:07 -06:00
|
|
|
|
2024-04-04 08:13:16 -04:00
|
|
|
void sev_es_prepare_switch_to_guest(struct vcpu_svm *svm, struct sev_es_save_area *hostsa)
|
2020-12-10 11:10:07 -06:00
|
|
|
{
|
KVM: SVM: Save host DR masks on CPUs with DebugSwap
When running SEV-SNP guests on a CPU that supports DebugSwap, always save
the host's DR0..DR3 mask MSR values irrespective of whether or not
DebugSwap is enabled, to ensure the host values aren't clobbered by the
CPU. And for now, also save DR0..DR3, even though doing so isn't
necessary (see below).
SVM_VMGEXIT_AP_CREATE is deeply flawed in that it allows the *guest* to
create a VMSA with guest-controlled SEV_FEATURES. A well behaved guest
can inform the hypervisor, i.e. KVM, of its "requested" features, but on
CPUs without ALLOWED_SEV_FEATURES support, nothing prevents the guest from
lying about which SEV features are being enabled (or not!).
If a misbehaving guest enables DebugSwap in a secondary vCPU's VMSA, the
CPU will load the DR0..DR3 mask MSRs on #VMEXIT, i.e. will clobber the
MSRs with '0' if KVM doesn't save its desired value.
Note, DR0..DR3 themselves are "ok", as DR7 is reset on #VMEXIT, and KVM
restores all DRs in common x86 code as needed via hw_breakpoint_restore().
I.e. there is no risk of host DR0..DR3 being clobbered (when it matters).
However, there is a flaw in the opposite direction; because the guest can
lie about enabling DebugSwap, i.e. can *disable* DebugSwap without KVM's
knowledge, KVM must not rely on the CPU to restore DRs. Defer fixing
that wart, as it's more of a documentation issue than a bug in the code.
Note, KVM added support for DebugSwap on commit d1f85fbe836e ("KVM: SEV:
Enable data breakpoints in SEV-ES"), but that is not an appropriate Fixes,
as the underlying flaw exists in hardware, not in KVM. I.e. all kernels
that support SEV-SNP need to be patched, not just kernels with KVM's full
support for DebugSwap (ignoring that DebugSwap support landed first).
Opportunistically fix an incorrect statement in the comment; on CPUs
without DebugSwap, the CPU does NOT save or load debug registers, i.e.
Fixes: e366f92ea99e ("KVM: SEV: Support SEV-SNP AP Creation NAE event")
Cc: stable@vger.kernel.org
Cc: Naveen N Rao <naveen@kernel.org>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Alexey Kardashevskiy <aik@amd.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20250227012541.3234589-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-02-26 17:25:32 -08:00
|
|
|
struct kvm *kvm = svm->vcpu.kvm;
|
|
|
|
|
2020-12-10 11:10:07 -06:00
|
|
|
/*
|
2023-06-15 16:37:51 +10:00
|
|
|
* All host state for SEV-ES guests is categorized into three swap types
|
|
|
|
* based on how it is handled by hardware during a world switch:
|
|
|
|
*
|
|
|
|
* A: VMRUN: Host state saved in host save area
|
|
|
|
* VMEXIT: Host state loaded from host save area
|
|
|
|
*
|
|
|
|
* B: VMRUN: Host state _NOT_ saved in host save area
|
|
|
|
* VMEXIT: Host state loaded from host save area
|
|
|
|
*
|
|
|
|
* C: VMRUN: Host state _NOT_ saved in host save area
|
|
|
|
* VMEXIT: Host state initialized to default(reset) values
|
|
|
|
*
|
|
|
|
* Manually save type-B state, i.e. state that is loaded by VMEXIT but
|
|
|
|
* isn't saved by VMRUN, that isn't already saved by VMSAVE (performed
|
|
|
|
* by common SVM code).
|
2020-12-10 11:10:07 -06:00
|
|
|
*/
|
2024-04-23 15:15:19 -07:00
|
|
|
hostsa->xcr0 = kvm_host.xcr0;
|
2020-12-10 11:10:07 -06:00
|
|
|
hostsa->pkru = read_pkru();
|
2024-04-23 15:15:18 -07:00
|
|
|
hostsa->xss = kvm_host.xss;
|
2023-06-15 16:37:54 +10:00
|
|
|
|
|
|
|
/*
|
|
|
|
* If DebugSwap is enabled, debug registers are loaded but NOT saved by
|
KVM: SVM: Save host DR masks on CPUs with DebugSwap
When running SEV-SNP guests on a CPU that supports DebugSwap, always save
the host's DR0..DR3 mask MSR values irrespective of whether or not
DebugSwap is enabled, to ensure the host values aren't clobbered by the
CPU. And for now, also save DR0..DR3, even though doing so isn't
necessary (see below).
SVM_VMGEXIT_AP_CREATE is deeply flawed in that it allows the *guest* to
create a VMSA with guest-controlled SEV_FEATURES. A well behaved guest
can inform the hypervisor, i.e. KVM, of its "requested" features, but on
CPUs without ALLOWED_SEV_FEATURES support, nothing prevents the guest from
lying about which SEV features are being enabled (or not!).
If a misbehaving guest enables DebugSwap in a secondary vCPU's VMSA, the
CPU will load the DR0..DR3 mask MSRs on #VMEXIT, i.e. will clobber the
MSRs with '0' if KVM doesn't save its desired value.
Note, DR0..DR3 themselves are "ok", as DR7 is reset on #VMEXIT, and KVM
restores all DRs in common x86 code as needed via hw_breakpoint_restore().
I.e. there is no risk of host DR0..DR3 being clobbered (when it matters).
However, there is a flaw in the opposite direction; because the guest can
lie about enabling DebugSwap, i.e. can *disable* DebugSwap without KVM's
knowledge, KVM must not rely on the CPU to restore DRs. Defer fixing
that wart, as it's more of a documentation issue than a bug in the code.
Note, KVM added support for DebugSwap on commit d1f85fbe836e ("KVM: SEV:
Enable data breakpoints in SEV-ES"), but that is not an appropriate Fixes,
as the underlying flaw exists in hardware, not in KVM. I.e. all kernels
that support SEV-SNP need to be patched, not just kernels with KVM's full
support for DebugSwap (ignoring that DebugSwap support landed first).
Opportunistically fix an incorrect statement in the comment; on CPUs
without DebugSwap, the CPU does NOT save or load debug registers, i.e.
Fixes: e366f92ea99e ("KVM: SEV: Support SEV-SNP AP Creation NAE event")
Cc: stable@vger.kernel.org
Cc: Naveen N Rao <naveen@kernel.org>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Alexey Kardashevskiy <aik@amd.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20250227012541.3234589-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-02-26 17:25:32 -08:00
|
|
|
* the CPU (Type-B). If DebugSwap is disabled/unsupported, the CPU does
|
2025-02-26 17:25:33 -08:00
|
|
|
* not save or load debug registers. Sadly, KVM can't prevent SNP
|
|
|
|
* guests from lying about DebugSwap on secondary vCPUs, i.e. the
|
|
|
|
* SEV_FEATURES provided at "AP Create" isn't guaranteed to match what
|
|
|
|
* the guest has actually enabled (or not!) in the VMSA.
|
|
|
|
*
|
|
|
|
* If DebugSwap is *possible*, save the masks so that they're restored
|
|
|
|
* if the guest enables DebugSwap. But for the DRs themselves, do NOT
|
|
|
|
* rely on the CPU to restore the host values; KVM will restore them as
|
|
|
|
* needed in common code, via hw_breakpoint_restore(). Note, KVM does
|
|
|
|
* NOT support virtualizing Breakpoint Extensions, i.e. the mask MSRs
|
|
|
|
* don't need to be restored per se, KVM just needs to ensure they are
|
|
|
|
* loaded with the correct values *if* the CPU writes the MSRs.
|
2023-06-15 16:37:54 +10:00
|
|
|
*/
|
KVM: SVM: Save host DR masks on CPUs with DebugSwap
When running SEV-SNP guests on a CPU that supports DebugSwap, always save
the host's DR0..DR3 mask MSR values irrespective of whether or not
DebugSwap is enabled, to ensure the host values aren't clobbered by the
CPU. And for now, also save DR0..DR3, even though doing so isn't
necessary (see below).
SVM_VMGEXIT_AP_CREATE is deeply flawed in that it allows the *guest* to
create a VMSA with guest-controlled SEV_FEATURES. A well behaved guest
can inform the hypervisor, i.e. KVM, of its "requested" features, but on
CPUs without ALLOWED_SEV_FEATURES support, nothing prevents the guest from
lying about which SEV features are being enabled (or not!).
If a misbehaving guest enables DebugSwap in a secondary vCPU's VMSA, the
CPU will load the DR0..DR3 mask MSRs on #VMEXIT, i.e. will clobber the
MSRs with '0' if KVM doesn't save its desired value.
Note, DR0..DR3 themselves are "ok", as DR7 is reset on #VMEXIT, and KVM
restores all DRs in common x86 code as needed via hw_breakpoint_restore().
I.e. there is no risk of host DR0..DR3 being clobbered (when it matters).
However, there is a flaw in the opposite direction; because the guest can
lie about enabling DebugSwap, i.e. can *disable* DebugSwap without KVM's
knowledge, KVM must not rely on the CPU to restore DRs. Defer fixing
that wart, as it's more of a documentation issue than a bug in the code.
Note, KVM added support for DebugSwap on commit d1f85fbe836e ("KVM: SEV:
Enable data breakpoints in SEV-ES"), but that is not an appropriate Fixes,
as the underlying flaw exists in hardware, not in KVM. I.e. all kernels
that support SEV-SNP need to be patched, not just kernels with KVM's full
support for DebugSwap (ignoring that DebugSwap support landed first).
Opportunistically fix an incorrect statement in the comment; on CPUs
without DebugSwap, the CPU does NOT save or load debug registers, i.e.
Fixes: e366f92ea99e ("KVM: SEV: Support SEV-SNP AP Creation NAE event")
Cc: stable@vger.kernel.org
Cc: Naveen N Rao <naveen@kernel.org>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Alexey Kardashevskiy <aik@amd.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20250227012541.3234589-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-02-26 17:25:32 -08:00
|
|
|
if (sev_vcpu_has_debug_swap(svm) ||
|
|
|
|
(sev_snp_guest(kvm) && cpu_feature_enabled(X86_FEATURE_DEBUG_SWAP))) {
|
2023-06-15 16:37:54 +10:00
|
|
|
hostsa->dr0_addr_mask = amd_get_dr_addr_mask(0);
|
|
|
|
hostsa->dr1_addr_mask = amd_get_dr_addr_mask(1);
|
|
|
|
hostsa->dr2_addr_mask = amd_get_dr_addr_mask(2);
|
|
|
|
hostsa->dr3_addr_mask = amd_get_dr_addr_mask(3);
|
|
|
|
}
|
2020-12-10 11:10:07 -06:00
|
|
|
}
|
|
|
|
|
KVM: SVM: Add support for booting APs in an SEV-ES guest
Typically under KVM, an AP is booted using the INIT-SIPI-SIPI sequence,
where the guest vCPU register state is updated and then the vCPU is VMRUN
to begin execution of the AP. For an SEV-ES guest, this won't work because
the guest register state is encrypted.
Following the GHCB specification, the hypervisor must not alter the guest
register state, so KVM must track an AP/vCPU boot. Should the guest want
to park the AP, it must use the AP Reset Hold exit event in place of, for
example, a HLT loop.
First AP boot (first INIT-SIPI-SIPI sequence):
Execute the AP (vCPU) as it was initialized and measured by the SEV-ES
support. It is up to the guest to transfer control of the AP to the
proper location.
Subsequent AP boot:
KVM will expect to receive an AP Reset Hold exit event indicating that
the vCPU is being parked and will require an INIT-SIPI-SIPI sequence to
awaken it. When the AP Reset Hold exit event is received, KVM will place
the vCPU into a simulated HLT mode. Upon receiving the INIT-SIPI-SIPI
sequence, KVM will make the vCPU runnable. It is again up to the guest
to then transfer control of the AP to the proper location.
To differentiate between an actual HLT and an AP Reset Hold, a new MP
state is introduced, KVM_MP_STATE_AP_RESET_HOLD, which the vCPU is
placed in upon receiving the AP Reset Hold exit event. Additionally, to
communicate the AP Reset Hold exit event up to userspace (if needed), a
new exit reason is introduced, KVM_EXIT_AP_RESET_HOLD.
A new x86 ops function is introduced, vcpu_deliver_sipi_vector, in order
to accomplish AP booting. For VMX, vcpu_deliver_sipi_vector is set to the
original SIPI delivery function, kvm_vcpu_deliver_sipi_vector(). SVM adds
a new function that, for non SEV-ES guests, invokes the original SIPI
delivery function, kvm_vcpu_deliver_sipi_vector(), but for SEV-ES guests,
implements the logic above.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <e8fbebe8eb161ceaabdad7c01a5859a78b424d5e.1609791600.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-04 14:20:01 -06:00
|
|
|
void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
|
|
|
|
{
|
|
|
|
struct vcpu_svm *svm = to_svm(vcpu);
|
|
|
|
|
|
|
|
/* First SIPI: Use the values as initially set by the VMM */
|
2021-10-21 10:42:59 -07:00
|
|
|
if (!svm->sev_es.received_first_sipi) {
|
|
|
|
svm->sev_es.received_first_sipi = true;
|
KVM: SVM: Add support for booting APs in an SEV-ES guest
Typically under KVM, an AP is booted using the INIT-SIPI-SIPI sequence,
where the guest vCPU register state is updated and then the vCPU is VMRUN
to begin execution of the AP. For an SEV-ES guest, this won't work because
the guest register state is encrypted.
Following the GHCB specification, the hypervisor must not alter the guest
register state, so KVM must track an AP/vCPU boot. Should the guest want
to park the AP, it must use the AP Reset Hold exit event in place of, for
example, a HLT loop.
First AP boot (first INIT-SIPI-SIPI sequence):
Execute the AP (vCPU) as it was initialized and measured by the SEV-ES
support. It is up to the guest to transfer control of the AP to the
proper location.
Subsequent AP boot:
KVM will expect to receive an AP Reset Hold exit event indicating that
the vCPU is being parked and will require an INIT-SIPI-SIPI sequence to
awaken it. When the AP Reset Hold exit event is received, KVM will place
the vCPU into a simulated HLT mode. Upon receiving the INIT-SIPI-SIPI
sequence, KVM will make the vCPU runnable. It is again up to the guest
to then transfer control of the AP to the proper location.
To differentiate between an actual HLT and an AP Reset Hold, a new MP
state is introduced, KVM_MP_STATE_AP_RESET_HOLD, which the vCPU is
placed in upon receiving the AP Reset Hold exit event. Additionally, to
communicate the AP Reset Hold exit event up to userspace (if needed), a
new exit reason is introduced, KVM_EXIT_AP_RESET_HOLD.
A new x86 ops function is introduced, vcpu_deliver_sipi_vector, in order
to accomplish AP booting. For VMX, vcpu_deliver_sipi_vector is set to the
original SIPI delivery function, kvm_vcpu_deliver_sipi_vector(). SVM adds
a new function that, for non SEV-ES guests, invokes the original SIPI
delivery function, kvm_vcpu_deliver_sipi_vector(), but for SEV-ES guests,
implements the logic above.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <e8fbebe8eb161ceaabdad7c01a5859a78b424d5e.1609791600.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-04 14:20:01 -06:00
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
2024-05-01 02:10:45 -05:00
|
|
|
/* Subsequent SIPI */
|
|
|
|
switch (svm->sev_es.ap_reset_hold_type) {
|
|
|
|
case AP_RESET_HOLD_NAE_EVENT:
|
|
|
|
/*
|
|
|
|
* Return from an AP Reset Hold VMGEXIT, where the guest will
|
|
|
|
* set the CS and RIP. Set SW_EXIT_INFO_2 to a non-zero value.
|
|
|
|
*/
|
2025-02-25 21:39:37 +00:00
|
|
|
svm_vmgexit_success(svm, 1);
|
2024-05-01 02:10:45 -05:00
|
|
|
break;
|
|
|
|
case AP_RESET_HOLD_MSR_PROTO:
|
|
|
|
/*
|
|
|
|
* Return from an AP Reset Hold VMGEXIT, where the guest will
|
|
|
|
* set the CS and RIP. Set GHCB data field to a non-zero value.
|
|
|
|
*/
|
|
|
|
set_ghcb_msr_bits(svm, 1,
|
|
|
|
GHCB_MSR_AP_RESET_HOLD_RESULT_MASK,
|
|
|
|
GHCB_MSR_AP_RESET_HOLD_RESULT_POS);
|
2021-04-09 09:38:42 -05:00
|
|
|
|
2024-05-01 02:10:45 -05:00
|
|
|
set_ghcb_msr_bits(svm, GHCB_MSR_AP_RESET_HOLD_RESP,
|
|
|
|
GHCB_MSR_INFO_MASK,
|
|
|
|
GHCB_MSR_INFO_POS);
|
|
|
|
break;
|
|
|
|
default:
|
|
|
|
break;
|
|
|
|
}
|
KVM: SVM: Add support for booting APs in an SEV-ES guest
Typically under KVM, an AP is booted using the INIT-SIPI-SIPI sequence,
where the guest vCPU register state is updated and then the vCPU is VMRUN
to begin execution of the AP. For an SEV-ES guest, this won't work because
the guest register state is encrypted.
Following the GHCB specification, the hypervisor must not alter the guest
register state, so KVM must track an AP/vCPU boot. Should the guest want
to park the AP, it must use the AP Reset Hold exit event in place of, for
example, a HLT loop.
First AP boot (first INIT-SIPI-SIPI sequence):
Execute the AP (vCPU) as it was initialized and measured by the SEV-ES
support. It is up to the guest to transfer control of the AP to the
proper location.
Subsequent AP boot:
KVM will expect to receive an AP Reset Hold exit event indicating that
the vCPU is being parked and will require an INIT-SIPI-SIPI sequence to
awaken it. When the AP Reset Hold exit event is received, KVM will place
the vCPU into a simulated HLT mode. Upon receiving the INIT-SIPI-SIPI
sequence, KVM will make the vCPU runnable. It is again up to the guest
to then transfer control of the AP to the proper location.
To differentiate between an actual HLT and an AP Reset Hold, a new MP
state is introduced, KVM_MP_STATE_AP_RESET_HOLD, which the vCPU is
placed in upon receiving the AP Reset Hold exit event. Additionally, to
communicate the AP Reset Hold exit event up to userspace (if needed), a
new exit reason is introduced, KVM_EXIT_AP_RESET_HOLD.
A new x86 ops function is introduced, vcpu_deliver_sipi_vector, in order
to accomplish AP booting. For VMX, vcpu_deliver_sipi_vector is set to the
original SIPI delivery function, kvm_vcpu_deliver_sipi_vector(). SVM adds
a new function that, for non SEV-ES guests, invokes the original SIPI
delivery function, kvm_vcpu_deliver_sipi_vector(), but for SEV-ES guests,
implements the logic above.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <e8fbebe8eb161ceaabdad7c01a5859a78b424d5e.1609791600.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-04 14:20:01 -06:00
|
|
|
}
|
KVM: SEV: Make AVIC backing, VMSA and VMCB memory allocation SNP safe
Implement a workaround for an SNP erratum where the CPU will incorrectly
signal an RMP violation #PF if a hugepage (2MB or 1GB) collides with the
RMP entry of a VMCB, VMSA or AVIC backing page.
When SEV-SNP is globally enabled, the CPU marks the VMCB, VMSA, and AVIC
backing pages as "in-use" via a reserved bit in the corresponding RMP
entry after a successful VMRUN. This is done for _all_ VMs, not just
SNP-Active VMs.
If the hypervisor accesses an in-use page through a writable
translation, the CPU will throw an RMP violation #PF. On early SNP
hardware, if an in-use page is 2MB-aligned and software accesses any
part of the associated 2MB region with a hugepage, the CPU will
incorrectly treat the entire 2MB region as in-use and signal a an RMP
violation #PF.
To avoid this, the recommendation is to not use a 2MB-aligned page for
the VMCB, VMSA or AVIC pages. Add a generic allocator that will ensure
that the page returned is not 2MB-aligned and is safe to be used when
SEV-SNP is enabled. Also implement similar handling for the VMCB/VMSA
pages of nested guests.
[ mdr: Squash in nested guest handling from Ashish, commit msg fixups. ]
Reported-by: Alper Gun <alpergun@google.com> # for nested VMSA case
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Co-developed-by: Marc Orr <marcorr@google.com>
Signed-off-by: Marc Orr <marcorr@google.com>
Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20240126041126.1927228-22-michael.roth@amd.com
2024-01-25 22:11:21 -06:00
|
|
|
|
2024-05-20 20:08:58 +08:00
|
|
|
struct page *snp_safe_alloc_page_node(int node, gfp_t gfp)
|
KVM: SEV: Make AVIC backing, VMSA and VMCB memory allocation SNP safe
Implement a workaround for an SNP erratum where the CPU will incorrectly
signal an RMP violation #PF if a hugepage (2MB or 1GB) collides with the
RMP entry of a VMCB, VMSA or AVIC backing page.
When SEV-SNP is globally enabled, the CPU marks the VMCB, VMSA, and AVIC
backing pages as "in-use" via a reserved bit in the corresponding RMP
entry after a successful VMRUN. This is done for _all_ VMs, not just
SNP-Active VMs.
If the hypervisor accesses an in-use page through a writable
translation, the CPU will throw an RMP violation #PF. On early SNP
hardware, if an in-use page is 2MB-aligned and software accesses any
part of the associated 2MB region with a hugepage, the CPU will
incorrectly treat the entire 2MB region as in-use and signal a an RMP
violation #PF.
To avoid this, the recommendation is to not use a 2MB-aligned page for
the VMCB, VMSA or AVIC pages. Add a generic allocator that will ensure
that the page returned is not 2MB-aligned and is safe to be used when
SEV-SNP is enabled. Also implement similar handling for the VMCB/VMSA
pages of nested guests.
[ mdr: Squash in nested guest handling from Ashish, commit msg fixups. ]
Reported-by: Alper Gun <alpergun@google.com> # for nested VMSA case
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Co-developed-by: Marc Orr <marcorr@google.com>
Signed-off-by: Marc Orr <marcorr@google.com>
Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20240126041126.1927228-22-michael.roth@amd.com
2024-01-25 22:11:21 -06:00
|
|
|
{
|
|
|
|
unsigned long pfn;
|
|
|
|
struct page *p;
|
|
|
|
|
2024-03-27 16:43:17 +01:00
|
|
|
if (!cc_platform_has(CC_ATTR_HOST_SEV_SNP))
|
2024-05-20 20:08:58 +08:00
|
|
|
return alloc_pages_node(node, gfp | __GFP_ZERO, 0);
|
KVM: SEV: Make AVIC backing, VMSA and VMCB memory allocation SNP safe
Implement a workaround for an SNP erratum where the CPU will incorrectly
signal an RMP violation #PF if a hugepage (2MB or 1GB) collides with the
RMP entry of a VMCB, VMSA or AVIC backing page.
When SEV-SNP is globally enabled, the CPU marks the VMCB, VMSA, and AVIC
backing pages as "in-use" via a reserved bit in the corresponding RMP
entry after a successful VMRUN. This is done for _all_ VMs, not just
SNP-Active VMs.
If the hypervisor accesses an in-use page through a writable
translation, the CPU will throw an RMP violation #PF. On early SNP
hardware, if an in-use page is 2MB-aligned and software accesses any
part of the associated 2MB region with a hugepage, the CPU will
incorrectly treat the entire 2MB region as in-use and signal a an RMP
violation #PF.
To avoid this, the recommendation is to not use a 2MB-aligned page for
the VMCB, VMSA or AVIC pages. Add a generic allocator that will ensure
that the page returned is not 2MB-aligned and is safe to be used when
SEV-SNP is enabled. Also implement similar handling for the VMCB/VMSA
pages of nested guests.
[ mdr: Squash in nested guest handling from Ashish, commit msg fixups. ]
Reported-by: Alper Gun <alpergun@google.com> # for nested VMSA case
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Co-developed-by: Marc Orr <marcorr@google.com>
Signed-off-by: Marc Orr <marcorr@google.com>
Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20240126041126.1927228-22-michael.roth@amd.com
2024-01-25 22:11:21 -06:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Allocate an SNP-safe page to workaround the SNP erratum where
|
|
|
|
* the CPU will incorrectly signal an RMP violation #PF if a
|
|
|
|
* hugepage (2MB or 1GB) collides with the RMP entry of a
|
|
|
|
* 2MB-aligned VMCB, VMSA, or AVIC backing page.
|
|
|
|
*
|
|
|
|
* Allocate one extra page, choose a page which is not
|
|
|
|
* 2MB-aligned, and free the other.
|
|
|
|
*/
|
2024-05-20 20:08:58 +08:00
|
|
|
p = alloc_pages_node(node, gfp | __GFP_ZERO, 1);
|
KVM: SEV: Make AVIC backing, VMSA and VMCB memory allocation SNP safe
Implement a workaround for an SNP erratum where the CPU will incorrectly
signal an RMP violation #PF if a hugepage (2MB or 1GB) collides with the
RMP entry of a VMCB, VMSA or AVIC backing page.
When SEV-SNP is globally enabled, the CPU marks the VMCB, VMSA, and AVIC
backing pages as "in-use" via a reserved bit in the corresponding RMP
entry after a successful VMRUN. This is done for _all_ VMs, not just
SNP-Active VMs.
If the hypervisor accesses an in-use page through a writable
translation, the CPU will throw an RMP violation #PF. On early SNP
hardware, if an in-use page is 2MB-aligned and software accesses any
part of the associated 2MB region with a hugepage, the CPU will
incorrectly treat the entire 2MB region as in-use and signal a an RMP
violation #PF.
To avoid this, the recommendation is to not use a 2MB-aligned page for
the VMCB, VMSA or AVIC pages. Add a generic allocator that will ensure
that the page returned is not 2MB-aligned and is safe to be used when
SEV-SNP is enabled. Also implement similar handling for the VMCB/VMSA
pages of nested guests.
[ mdr: Squash in nested guest handling from Ashish, commit msg fixups. ]
Reported-by: Alper Gun <alpergun@google.com> # for nested VMSA case
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Co-developed-by: Marc Orr <marcorr@google.com>
Signed-off-by: Marc Orr <marcorr@google.com>
Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20240126041126.1927228-22-michael.roth@amd.com
2024-01-25 22:11:21 -06:00
|
|
|
if (!p)
|
|
|
|
return NULL;
|
|
|
|
|
|
|
|
split_page(p, 1);
|
|
|
|
|
|
|
|
pfn = page_to_pfn(p);
|
|
|
|
if (IS_ALIGNED(pfn, PTRS_PER_PMD))
|
|
|
|
__free_page(p++);
|
|
|
|
else
|
|
|
|
__free_page(p + 1);
|
|
|
|
|
|
|
|
return p;
|
|
|
|
}
|
2024-05-01 03:52:01 -05:00
|
|
|
|
|
|
|
void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code)
|
|
|
|
{
|
|
|
|
struct kvm_memory_slot *slot;
|
|
|
|
struct kvm *kvm = vcpu->kvm;
|
|
|
|
int order, rmp_level, ret;
|
2024-10-10 11:23:48 -07:00
|
|
|
struct page *page;
|
2024-05-01 03:52:01 -05:00
|
|
|
bool assigned;
|
|
|
|
kvm_pfn_t pfn;
|
|
|
|
gfn_t gfn;
|
|
|
|
|
|
|
|
gfn = gpa >> PAGE_SHIFT;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* The only time RMP faults occur for shared pages is when the guest is
|
|
|
|
* triggering an RMP fault for an implicit page-state change from
|
|
|
|
* shared->private. Implicit page-state changes are forwarded to
|
|
|
|
* userspace via KVM_EXIT_MEMORY_FAULT events, however, so RMP faults
|
|
|
|
* for shared pages should not end up here.
|
|
|
|
*/
|
|
|
|
if (!kvm_mem_is_private(kvm, gfn)) {
|
|
|
|
pr_warn_ratelimited("SEV: Unexpected RMP fault for non-private GPA 0x%llx\n",
|
|
|
|
gpa);
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
slot = gfn_to_memslot(kvm, gfn);
|
|
|
|
if (!kvm_slot_can_be_private(slot)) {
|
|
|
|
pr_warn_ratelimited("SEV: Unexpected RMP fault, non-private slot for GPA 0x%llx\n",
|
|
|
|
gpa);
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
2024-10-10 11:23:48 -07:00
|
|
|
ret = kvm_gmem_get_pfn(kvm, slot, gfn, &pfn, &page, &order);
|
2024-05-01 03:52:01 -05:00
|
|
|
if (ret) {
|
|
|
|
pr_warn_ratelimited("SEV: Unexpected RMP fault, no backing page for private GPA 0x%llx\n",
|
|
|
|
gpa);
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
ret = snp_lookup_rmpentry(pfn, &assigned, &rmp_level);
|
|
|
|
if (ret || !assigned) {
|
|
|
|
pr_warn_ratelimited("SEV: Unexpected RMP fault, no assigned RMP entry found for GPA 0x%llx PFN 0x%llx error %d\n",
|
|
|
|
gpa, pfn, ret);
|
|
|
|
goto out_no_trace;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* There are 2 cases where a PSMASH may be needed to resolve an #NPF
|
|
|
|
* with PFERR_GUEST_RMP_BIT set:
|
|
|
|
*
|
|
|
|
* 1) RMPADJUST/PVALIDATE can trigger an #NPF with PFERR_GUEST_SIZEM
|
|
|
|
* bit set if the guest issues them with a smaller granularity than
|
|
|
|
* what is indicated by the page-size bit in the 2MB RMP entry for
|
|
|
|
* the PFN that backs the GPA.
|
|
|
|
*
|
|
|
|
* 2) Guest access via NPT can trigger an #NPF if the NPT mapping is
|
|
|
|
* smaller than what is indicated by the 2MB RMP entry for the PFN
|
|
|
|
* that backs the GPA.
|
|
|
|
*
|
|
|
|
* In both these cases, the corresponding 2M RMP entry needs to
|
|
|
|
* be PSMASH'd to 512 4K RMP entries. If the RMP entry is already
|
|
|
|
* split into 4K RMP entries, then this is likely a spurious case which
|
|
|
|
* can occur when there are concurrent accesses by the guest to a 2MB
|
|
|
|
* GPA range that is backed by a 2MB-aligned PFN who's RMP entry is in
|
|
|
|
* the process of being PMASH'd into 4K entries. These cases should
|
|
|
|
* resolve automatically on subsequent accesses, so just ignore them
|
|
|
|
* here.
|
|
|
|
*/
|
|
|
|
if (rmp_level == PG_LEVEL_4K)
|
|
|
|
goto out;
|
|
|
|
|
|
|
|
ret = snp_rmptable_psmash(pfn);
|
|
|
|
if (ret) {
|
|
|
|
/*
|
|
|
|
* Look it up again. If it's 4K now then the PSMASH may have
|
|
|
|
* raced with another process and the issue has already resolved
|
|
|
|
* itself.
|
|
|
|
*/
|
|
|
|
if (!snp_lookup_rmpentry(pfn, &assigned, &rmp_level) &&
|
|
|
|
assigned && rmp_level == PG_LEVEL_4K)
|
|
|
|
goto out;
|
|
|
|
|
|
|
|
pr_warn_ratelimited("SEV: Unable to split RMP entry for GPA 0x%llx PFN 0x%llx ret %d\n",
|
|
|
|
gpa, pfn, ret);
|
|
|
|
}
|
|
|
|
|
|
|
|
kvm_zap_gfn_range(kvm, gfn, gfn + PTRS_PER_PMD);
|
|
|
|
out:
|
|
|
|
trace_kvm_rmp_fault(vcpu, gpa, pfn, error_code, rmp_level, ret);
|
|
|
|
out_no_trace:
|
2024-10-10 11:23:48 -07:00
|
|
|
kvm_release_page_unused(page);
|
2024-05-01 03:52:01 -05:00
|
|
|
}
|
2024-05-01 03:52:03 -05:00
|
|
|
|
|
|
|
static bool is_pfn_range_shared(kvm_pfn_t start, kvm_pfn_t end)
|
|
|
|
{
|
|
|
|
kvm_pfn_t pfn = start;
|
|
|
|
|
|
|
|
while (pfn < end) {
|
|
|
|
int ret, rmp_level;
|
|
|
|
bool assigned;
|
|
|
|
|
|
|
|
ret = snp_lookup_rmpentry(pfn, &assigned, &rmp_level);
|
|
|
|
if (ret) {
|
|
|
|
pr_warn_ratelimited("SEV: Failed to retrieve RMP entry: PFN 0x%llx GFN start 0x%llx GFN end 0x%llx RMP level %d error %d\n",
|
|
|
|
pfn, start, end, rmp_level, ret);
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (assigned) {
|
|
|
|
pr_debug("%s: overlap detected, PFN 0x%llx start 0x%llx end 0x%llx RMP level %d\n",
|
|
|
|
__func__, pfn, start, end, rmp_level);
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
|
|
|
pfn++;
|
|
|
|
}
|
|
|
|
|
|
|
|
return true;
|
|
|
|
}
|
|
|
|
|
|
|
|
static u8 max_level_for_order(int order)
|
|
|
|
{
|
|
|
|
if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_2M))
|
|
|
|
return PG_LEVEL_2M;
|
|
|
|
|
|
|
|
return PG_LEVEL_4K;
|
|
|
|
}
|
|
|
|
|
|
|
|
static bool is_large_rmp_possible(struct kvm *kvm, kvm_pfn_t pfn, int order)
|
|
|
|
{
|
|
|
|
kvm_pfn_t pfn_aligned = ALIGN_DOWN(pfn, PTRS_PER_PMD);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* If this is a large folio, and the entire 2M range containing the
|
|
|
|
* PFN is currently shared, then the entire 2M-aligned range can be
|
|
|
|
* set to private via a single 2M RMP entry.
|
|
|
|
*/
|
|
|
|
if (max_level_for_order(order) > PG_LEVEL_4K &&
|
|
|
|
is_pfn_range_shared(pfn_aligned, pfn_aligned + PTRS_PER_PMD))
|
|
|
|
return true;
|
|
|
|
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
|
|
|
int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order)
|
|
|
|
{
|
2025-01-23 11:21:40 +05:30
|
|
|
struct kvm_sev_info *sev = to_kvm_sev_info(kvm);
|
2024-05-01 03:52:03 -05:00
|
|
|
kvm_pfn_t pfn_aligned;
|
|
|
|
gfn_t gfn_aligned;
|
|
|
|
int level, rc;
|
|
|
|
bool assigned;
|
|
|
|
|
|
|
|
if (!sev_snp_guest(kvm))
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
rc = snp_lookup_rmpentry(pfn, &assigned, &level);
|
|
|
|
if (rc) {
|
|
|
|
pr_err_ratelimited("SEV: Failed to look up RMP entry: GFN %llx PFN %llx error %d\n",
|
|
|
|
gfn, pfn, rc);
|
|
|
|
return -ENOENT;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (assigned) {
|
|
|
|
pr_debug("%s: already assigned: gfn %llx pfn %llx max_order %d level %d\n",
|
|
|
|
__func__, gfn, pfn, max_order, level);
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (is_large_rmp_possible(kvm, pfn, max_order)) {
|
|
|
|
level = PG_LEVEL_2M;
|
|
|
|
pfn_aligned = ALIGN_DOWN(pfn, PTRS_PER_PMD);
|
|
|
|
gfn_aligned = ALIGN_DOWN(gfn, PTRS_PER_PMD);
|
|
|
|
} else {
|
|
|
|
level = PG_LEVEL_4K;
|
|
|
|
pfn_aligned = pfn;
|
|
|
|
gfn_aligned = gfn;
|
|
|
|
}
|
|
|
|
|
|
|
|
rc = rmp_make_private(pfn_aligned, gfn_to_gpa(gfn_aligned), level, sev->asid, false);
|
|
|
|
if (rc) {
|
|
|
|
pr_err_ratelimited("SEV: Failed to update RMP entry: GFN %llx PFN %llx level %d error %d\n",
|
|
|
|
gfn, pfn, level, rc);
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
|
|
|
|
pr_debug("%s: updated: gfn %llx pfn %llx pfn_aligned %llx max_order %d level %d\n",
|
|
|
|
__func__, gfn, pfn, pfn_aligned, max_order, level);
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
2024-05-01 03:52:04 -05:00
|
|
|
|
|
|
|
void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end)
|
|
|
|
{
|
|
|
|
kvm_pfn_t pfn;
|
|
|
|
|
2024-06-03 12:37:26 -04:00
|
|
|
if (!cc_platform_has(CC_ATTR_HOST_SEV_SNP))
|
|
|
|
return;
|
|
|
|
|
2024-05-01 03:52:04 -05:00
|
|
|
pr_debug("%s: PFN start 0x%llx PFN end 0x%llx\n", __func__, start, end);
|
|
|
|
|
|
|
|
for (pfn = start; pfn < end;) {
|
|
|
|
bool use_2m_update = false;
|
|
|
|
int rc, rmp_level;
|
|
|
|
bool assigned;
|
|
|
|
|
|
|
|
rc = snp_lookup_rmpentry(pfn, &assigned, &rmp_level);
|
2024-06-03 12:37:26 -04:00
|
|
|
if (rc || !assigned)
|
2024-05-01 03:52:04 -05:00
|
|
|
goto next_pfn;
|
|
|
|
|
|
|
|
use_2m_update = IS_ALIGNED(pfn, PTRS_PER_PMD) &&
|
|
|
|
end >= (pfn + PTRS_PER_PMD) &&
|
|
|
|
rmp_level > PG_LEVEL_4K;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* If an unaligned PFN corresponds to a 2M region assigned as a
|
|
|
|
* large page in the RMP table, PSMASH the region into individual
|
|
|
|
* 4K RMP entries before attempting to convert a 4K sub-page.
|
|
|
|
*/
|
|
|
|
if (!use_2m_update && rmp_level > PG_LEVEL_4K) {
|
|
|
|
/*
|
|
|
|
* This shouldn't fail, but if it does, report it, but
|
|
|
|
* still try to update RMP entry to shared and pray this
|
|
|
|
* was a spurious error that can be addressed later.
|
|
|
|
*/
|
|
|
|
rc = snp_rmptable_psmash(pfn);
|
|
|
|
WARN_ONCE(rc, "SEV: Failed to PSMASH RMP entry for PFN 0x%llx error %d\n",
|
|
|
|
pfn, rc);
|
|
|
|
}
|
|
|
|
|
|
|
|
rc = rmp_make_shared(pfn, use_2m_update ? PG_LEVEL_2M : PG_LEVEL_4K);
|
|
|
|
if (WARN_ONCE(rc, "SEV: Failed to update RMP entry for PFN 0x%llx error %d\n",
|
|
|
|
pfn, rc))
|
|
|
|
goto next_pfn;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* SEV-ES avoids host/guest cache coherency issues through
|
2025-05-22 16:37:31 -07:00
|
|
|
* WBNOINVD hooks issued via MMU notifiers during run-time, and
|
2024-05-01 03:52:04 -05:00
|
|
|
* KVM's VM destroy path at shutdown. Those MMU notifier events
|
|
|
|
* don't cover gmem since there is no requirement to map pages
|
|
|
|
* to a HVA in order to use them for a running guest. While the
|
|
|
|
* shutdown path would still likely cover things for SNP guests,
|
|
|
|
* userspace may also free gmem pages during run-time via
|
|
|
|
* hole-punching operations on the guest_memfd, so flush the
|
|
|
|
* cache entries for these pages before free'ing them back to
|
|
|
|
* the host.
|
|
|
|
*/
|
|
|
|
clflush_cache_range(__va(pfn_to_hpa(pfn)),
|
|
|
|
use_2m_update ? PMD_SIZE : PAGE_SIZE);
|
|
|
|
next_pfn:
|
|
|
|
pfn += use_2m_update ? PTRS_PER_PMD : 1;
|
|
|
|
cond_resched();
|
|
|
|
}
|
|
|
|
}
|
2024-05-01 03:52:05 -05:00
|
|
|
|
|
|
|
int sev_private_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn)
|
|
|
|
{
|
|
|
|
int level, rc;
|
|
|
|
bool assigned;
|
|
|
|
|
|
|
|
if (!sev_snp_guest(kvm))
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
rc = snp_lookup_rmpentry(pfn, &assigned, &level);
|
|
|
|
if (rc || !assigned)
|
|
|
|
return PG_LEVEL_4K;
|
|
|
|
|
|
|
|
return level;
|
|
|
|
}
|
2025-03-20 08:26:49 -05:00
|
|
|
|
|
|
|
struct vmcb_save_area *sev_decrypt_vmsa(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
|
|
|
struct vcpu_svm *svm = to_svm(vcpu);
|
|
|
|
struct vmcb_save_area *vmsa;
|
|
|
|
struct kvm_sev_info *sev;
|
|
|
|
int error = 0;
|
|
|
|
int ret;
|
|
|
|
|
|
|
|
if (!sev_es_guest(vcpu->kvm))
|
|
|
|
return NULL;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* If the VMSA has not yet been encrypted, return a pointer to the
|
|
|
|
* current un-encrypted VMSA.
|
|
|
|
*/
|
|
|
|
if (!vcpu->arch.guest_state_protected)
|
|
|
|
return (struct vmcb_save_area *)svm->sev_es.vmsa;
|
|
|
|
|
|
|
|
sev = to_kvm_sev_info(vcpu->kvm);
|
|
|
|
|
|
|
|
/* Check if the SEV policy allows debugging */
|
|
|
|
if (sev_snp_guest(vcpu->kvm)) {
|
|
|
|
if (!(sev->policy & SNP_POLICY_DEBUG))
|
|
|
|
return NULL;
|
|
|
|
} else {
|
|
|
|
if (sev->policy & SEV_POLICY_NODBG)
|
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (sev_snp_guest(vcpu->kvm)) {
|
|
|
|
struct sev_data_snp_dbg dbg = {0};
|
|
|
|
|
|
|
|
vmsa = snp_alloc_firmware_page(__GFP_ZERO);
|
|
|
|
if (!vmsa)
|
|
|
|
return NULL;
|
|
|
|
|
|
|
|
dbg.gctx_paddr = __psp_pa(sev->snp_context);
|
|
|
|
dbg.src_addr = svm->vmcb->control.vmsa_pa;
|
|
|
|
dbg.dst_addr = __psp_pa(vmsa);
|
|
|
|
|
|
|
|
ret = sev_do_cmd(SEV_CMD_SNP_DBG_DECRYPT, &dbg, &error);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Return the target page to a hypervisor page no matter what.
|
|
|
|
* If this fails, the page can't be used, so leak it and don't
|
|
|
|
* try to use it.
|
|
|
|
*/
|
|
|
|
if (snp_page_reclaim(vcpu->kvm, PHYS_PFN(__pa(vmsa))))
|
|
|
|
return NULL;
|
|
|
|
|
|
|
|
if (ret) {
|
|
|
|
pr_err("SEV: SNP_DBG_DECRYPT failed ret=%d, fw_error=%d (%#x)\n",
|
|
|
|
ret, error, error);
|
|
|
|
free_page((unsigned long)vmsa);
|
|
|
|
|
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
} else {
|
|
|
|
struct sev_data_dbg dbg = {0};
|
|
|
|
struct page *vmsa_page;
|
|
|
|
|
|
|
|
vmsa_page = alloc_page(GFP_KERNEL);
|
|
|
|
if (!vmsa_page)
|
|
|
|
return NULL;
|
|
|
|
|
|
|
|
vmsa = page_address(vmsa_page);
|
|
|
|
|
|
|
|
dbg.handle = sev->handle;
|
|
|
|
dbg.src_addr = svm->vmcb->control.vmsa_pa;
|
|
|
|
dbg.dst_addr = __psp_pa(vmsa);
|
|
|
|
dbg.len = PAGE_SIZE;
|
|
|
|
|
|
|
|
ret = sev_do_cmd(SEV_CMD_DBG_DECRYPT, &dbg, &error);
|
|
|
|
if (ret) {
|
|
|
|
pr_err("SEV: SEV_CMD_DBG_DECRYPT failed ret=%d, fw_error=%d (0x%x)\n",
|
|
|
|
ret, error, error);
|
|
|
|
__free_page(vmsa_page);
|
|
|
|
|
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
return vmsa;
|
|
|
|
}
|
|
|
|
|
|
|
|
void sev_free_decrypted_vmsa(struct kvm_vcpu *vcpu, struct vmcb_save_area *vmsa)
|
|
|
|
{
|
|
|
|
/* If the VMSA has not yet been encrypted, nothing was allocated */
|
|
|
|
if (!vcpu->arch.guest_state_protected || !vmsa)
|
|
|
|
return;
|
|
|
|
|
|
|
|
free_page((unsigned long)vmsa);
|
|
|
|
}
|