// SPDX-License-Identifier: GPL-2.0
/*
 * ucall support. A ucall is a "hypercall to userspace".
 *
 * Copyright (C) 2018, Red Hat, Inc.
 */
#include "kvm_util.h"
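/*
 * PIO port that ucall_arch_do_ucall() accesses to force an exit to host
 * userspace (KVM_EXIT_IO).
 */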
#define UCALL_PIO_PORT ((uint16_t)0x1000)
void ucall_arch_do_ucall(vm_vaddr_t uc)
{
	/*
	 * FIXME: Revert this hack (the entire commit that added it) once nVMX
	 * preserves L2 GPRs across a nested VM-Exit. If a ucall from L2, e.g.
	 * to do a GUEST_SYNC(), lands the vCPU in L1, any and all GPRs can be
	 * clobbered by L1. Save and restore non-volatile GPRs (clobbering RBP
	 * in particular is problematic) along with RDX and RDI (which are
	 * inputs), and clobber volatile GPRs. *sigh*
	 */
#define HORRIFIC_L2_UCALL_CLOBBER_HACK	\
	"rcx", "rsi", "r8", "r9", "r10", "r11"

	asm volatile("push %%rbp\n\t"
		     "push %%r15\n\t"
		     "push %%r14\n\t"
		     "push %%r13\n\t"
		     "push %%r12\n\t"
		     "push %%rbx\n\t"
		     "push %%rdx\n\t"
		     "push %%rdi\n\t"
		     "in %[port], %%al\n\t"
		     "pop %%rdi\n\t"
		     "pop %%rdx\n\t"
		     "pop %%rbx\n\t"
		     "pop %%r12\n\t"
		     "pop %%r13\n\t"
		     "pop %%r14\n\t"
		     "pop %%r15\n\t"
		     "pop %%rbp\n\t"
		     : : [port] "d" (UCALL_PIO_PORT), "D" (uc) : "rax", "memory",
		       HORRIFIC_L2_UCALL_CLOBBER_HACK);
}
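
/*
 * Illustrative guest-side flow (a sketch, not code in this file): guests are
 * assumed to reach ucall_arch_do_ucall() via the common ucall() helper and
 * its wrappers, e.g. GUEST_SYNC()/GUEST_DONE() from the shared ucall code:
 *
 *	static void guest_code(void)
 *	{
 *		GUEST_SYNC(1);	// ucall(UCALL_SYNC, ...) -> ucall_arch_do_ucall()
 *		GUEST_DONE();	// ucall(UCALL_DONE, ...)
 *	}
 *
 * The common code fills a struct ucall and passes its guest address as @uc;
 * the "in" above then forces a KVM_EXIT_IO on UCALL_PIO_PORT so that host
 * userspace can recover the pointer via ucall_arch_get_ucall() below.
 */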
void *ucall_arch_get_ucall(struct kvm_vcpu *vcpu)
{
	struct kvm_run *run = vcpu->run;

	if (run->exit_reason == KVM_EXIT_IO && run->io.port == UCALL_PIO_PORT) {
		struct kvm_regs regs;

		vcpu_regs_get(vcpu, &regs);
		return (void *)regs.rdi;
	}
	return NULL;
}
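
/*
 * Illustrative host-side flow (a sketch; get_ucall(), vcpu_run() and the
 * UCALL_* handling below come from the common selftest code, not this file):
 * tests normally consume ucalls via get_ucall(), which calls
 * ucall_arch_get_ucall() above and copies the struct ucall out of guest
 * memory:
 *
 *	struct ucall uc;
 *
 *	vcpu_run(vcpu);
 *	switch (get_ucall(vcpu, &uc)) {
 *	case UCALL_SYNC:
 *		// uc.args[1] is assumed to carry the stage from GUEST_SYNC()
 *		break;
 *	case UCALL_ABORT:
 *		REPORT_GUEST_ASSERT(uc);
 *		break;
 *	case UCALL_DONE:
 *		break;
 *	}
 */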