calls properly with the goal of implementing EFI variable store in the
SVSM - a component which is trusted by the guest, vs in the firmware, which
is not
- Allow the kernel to handle #VC exceptions from EFI runtime services
properly when running as a SNP guest
- Rework and cleanup the SNP guest request issue glue code a bit
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmiH13gACgkQEsHwGGHe
VUr88A//fIbR7eaz7QRiHq32S57NOpyOAciYrGsBrWSo1BLSFcrelYG4RTnzaKzR
ACVr2yALoeoZooH7gtPgjt7554xWJHA9DR9Ln2YPGd5a2Np8fknY0Uu1MGFVIorC
4z2u1EATlsB0I/nCh/LboryVxFN4C+qRKRk7iJ7wibdJ15zguc0T/P5lU8gY1eB8
0NZ2e0T8QnpjIc8cx/XSYXDXIwvOJ5rX36Xm5/g6A/vPubLy1UO0hkBDGfVh+2WG
dt8T+szidtqru8RQ522jW/3R/ct8iZa0U8Cp9QDdwwcQC3jBvo/xyIv5K4ueDEEI
J0KfcIKn5zbDeQbBHMw5a9XvPshwHKQIUjY83JfSsviZ1yVseQEQHeJOE6mDn2Mj
QeCWuqtwMaEoElhNX5xhe9p60KID8VoBJqB+bb1bgbN8sPeYoHc8f9p13XJaU1Mo
hV0dwlpFwCaxCZgWdtxDVji9mmvzaUT4O1QEO88AdfhDNMa+b/T5L0dJb1gnZaUY
rQ6ePImHh9nXRtJncfK3UsGmSE6HPc4O7dyV83IAcniGTgQycIYlOIUzkUbpF+wJ
advBv5Zhx2xCOBDI1ucpHNWXCCe99YVE5GeaLjq6DMLgD0HdGnXqrCw4kOluZDBQ
Xoy07x1XANQVLk0xQ5Bf1MsOiztbCZG9Rvb2dN9lCA06W5v+0MA=
=KRgO
-----END PGP SIGNATURE-----
Merge tag 'x86_sev_for_v6.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 SEV updates from Borislav Petkov:
- Map the SNP calling area pages too so that OVMF EFI fw can issue SVSM
calls properly with the goal of implementing EFI variable store in
the SVSM - a component which is trusted by the guest, vs in the
firmware, which is not
- Allow the kernel to handle #VC exceptions from EFI runtime services
properly when running as a SNP guest
- Rework and cleanup the SNP guest request issue glue code a bit
* tag 'x86_sev_for_v6.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/sev: Let sev_es_efi_map_ghcbs() map the CA pages too
x86/sev/vc: Fix EFI runtime instruction emulation
x86/sev: Drop unnecessary parameter in snp_issue_guest_request()
x86/sev: Document requirement for linear mapping of guest request buffers
x86/sev: Allocate request in TSC_INFO_REQ on stack
virt: sev-guest: Contain snp_guest_request_ioctl in sev-guest
Forcibly disable KCSAN for the sev-nmi.c source file, which only
contains functions annotated as 'noinstr' but is emitted with calls to
KCSAN instrumentation nonetheless. E.g.,
vmlinux.o: error: objtool: __sev_es_nmi_complete+0x58: call to __kcsan_check_access() leaves .noinstr.text section
make[2]: *** [/usr/local/google/home/ardb/linux/scripts/Makefile.vmlinux_o:72: vmlinux.o] Error 1
Fixes: a3cbbb4717 ("x86/boot: Move SEV startup code into startup/")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/20250714073402.4107091-2-ardb+git@google.com
When using Secure TSC, the GUEST_TSC_FREQ MSR reports a frequency based on
the nominal P0 frequency, which deviates slightly (typically ~0.2%) from
the actual mean TSC frequency due to clocking parameters.
Over extended VM uptime, this discrepancy accumulates, causing clock skew
between the hypervisor and a SEV-SNP VM, leading to early timer interrupts as
perceived by the guest.
The guest kernel relies on the reported nominal frequency for TSC-based
timekeeping, while the actual frequency set during SNP_LAUNCH_START may
differ. This mismatch results in inaccurate time calculations, causing the
guest to perceive hrtimers as firing earlier than expected.
Utilize the TSC_FACTOR from the SEV firmware's secrets page (see "Secrets
Page Format" in the SNP Firmware ABI Specification) to calculate the mean
TSC frequency, ensuring accurate timekeeping and mitigating clock skew in
SEV-SNP VMs.
Use early_ioremap_encrypted() to map the secrets page as
ioremap_encrypted() uses kmalloc() which is not available during early TSC
initialization and causes a panic.
[ bp: Drop the silly dummy var:
https://lore.kernel.org/r/20250630192726.GBaGLlHl84xIopx4Pt@fat_crate.local ]
Fixes: 73bbf3b0fb ("x86/tsc: Init the TSC for Secure TSC guests")
Signed-off-by: Nikunj A Dadhania <nikunj@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/20250630081858.485187-1-nikunj@amd.com
OVMF EFI firmware needs access to the CA page to do SVSM protocol calls. For
example, when the SVSM implements an EFI variable store, such calls will be
necessary.
So add that to sev_es_efi_map_ghcbs() and also rename the function to reflect
the additional job it is doing now.
[ bp: Massage. ]
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20250626114014.373748-4-kraxel@redhat.com
In case efi_mm is active go use the userspace instruction decoder which
supports fetching instructions from active_mm. This is needed to make
instruction emulation work for EFI runtime code, so it can use CPUID and
RDMSR.
EFI runtime code uses the CPUID instruction to gather information about
the environment it is running in, such as SEV being enabled or not, and
choose (if needed) the SEV code path for ioport access.
EFI runtime code uses the RDMSR instruction to get the location of the
CAA page (see SVSM spec, section 4.2 - "Post Boot").
The big picture behind this is that the kernel needs to be able to
properly handle #VC exceptions that come from EFI runtime services.
Since EFI runtime services have a special page table mapping for the EFI
virtual address space, the efi_mm context must be used when decoding
instructions during #VC handling.
[ bp: Massage. ]
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com>
Link: https://lore.kernel.org/20250626114014.373748-2-kraxel@redhat.com
Commit
3e385c0d6c ("virt: sev-guest: Move SNP Guest Request data pages handling under snp_cmd_mutex")
moved @input from snp_msg_desc to snp_guest_req which is passed to
snp_issue_guest_request().
Drop the extra parameter.
No functional change intended.
Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Dionna Glaze <dionnaglaze@google.com>
Link: https://lore.kernel.org/20250611040842.2667262-5-aik@amd.com
The Guest Request supports 3 types of messages now, the largest is the
extended variant of MSG_REPORT_REQ: sizeof(snp_ext_report_req)==112. These
used to be allocated on stack and then moved to the SNP guest platform device
(snp_guest_dev) for the reason explained in
db10cb9b57 ("virt: sevguest: Fix passing a stack buffer as a scatterlist target"):
aesgcm_encrypt() and aesgcm_decrypt() are used for guest messages and might
potentially use a crypto accelerator which requires DMA buffers to be in the
linear mapping.
Add a comment, warn and return an error when the buffers are not in linear
mapping.
Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Dionna Glaze <dionnaglaze@google.com>
Link: https://lore.kernel.org/20250611040842.2667262-4-aik@amd.com
SNP Guest Request uses only exitinfo2 which is a return value from GHCB, has
meaning beyond ioctl and therefore belongs to struct snp_guest_req.
Move exitinfo2 there and remove snp_guest_request_ioctl from the SEV platform
code.
No functional change intended.
Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Dionna Glaze <dionnaglaze@google.com>
Link: https://lore.kernel.org/20250611040842.2667262-2-aik@amd.com
-----BEGIN PGP SIGNATURE-----
iQFHBAABCgAxFiEEIbPD0id6easf0xsudhRwX5BBoF4FAmg+jmETHHdlaS5saXVA
a2VybmVsLm9yZwAKCRB2FHBfkEGgXiWaCACjYSQcCXW2nnZuWUnGMJq8HD5XGBAH
tNYzOyp2Y4bXEJzfmbHv8UpJynGr3IFKybCnhm0uAQZCmiR5k4CfMvjPQXcJu9LK
7yUI/dTGrRGG7f3NClWK2vXg7ATqzRGiPuPDk2lDcP04aQQWaUMDYe5SXIgcqKyZ
cm2OVHapHGbQ7wA+xXGQcUBb6VJ5+BrQUVOqaEQyl4LURvjaQcn7rVDS0SmEi8gq
42+KDVd8uWYos5dT57HIq9UI5og3PeTvAvHsx26eX8JWNqwXLgvxRH83kstK+GWY
uG3sOm5yRbJvErLpJHnyBOlXDFNw2EBeLC1VyhdJXBR8RabgI+H/mrY3
=4bTC
-----END PGP SIGNATURE-----
Merge tag 'hyperv-next-signed-20250602' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull hyperv updates from Wei Liu:
- Support for Virtual Trust Level (VTL) on arm64 (Roman Kisel)
- Fixes for Hyper-V UIO driver (Long Li)
- Fixes for Hyper-V PCI driver (Michael Kelley)
- Select CONFIG_SYSFB for Hyper-V guests (Michael Kelley)
- Documentation updates for Hyper-V VMBus (Michael Kelley)
- Enhance logging for hv_kvp_daemon (Shradha Gupta)
* tag 'hyperv-next-signed-20250602' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: (23 commits)
Drivers: hv: Always select CONFIG_SYSFB for Hyper-V guests
Drivers: hv: vmbus: Add comments about races with "channels" sysfs dir
Documentation: hyperv: Update VMBus doc with new features and info
PCI: hv: Remove unnecessary flex array in struct pci_packet
Drivers: hv: Remove hv_alloc/free_* helpers
Drivers: hv: Use kzalloc for panic page allocation
uio_hv_generic: Align ring size to system page
uio_hv_generic: Use correct size for interrupt and monitor pages
Drivers: hv: Allocate interrupt and monitor pages aligned to system page boundary
arch/x86: Provide the CPU number in the wakeup AP callback
x86/hyperv: Fix APIC ID and VP index confusion in hv_snp_boot_ap()
PCI: hv: Get vPCI MSI IRQ domain from DeviceTree
ACPI: irq: Introduce acpi_get_gsi_dispatcher()
Drivers: hv: vmbus: Introduce hv_get_vmbus_root_device()
Drivers: hv: vmbus: Get the IRQ number from DeviceTree
dt-bindings: microsoft,vmbus: Add interrupt and DMA coherence properties
arm64, x86: hyperv: Report the VTL the system boots in
arm64: hyperv: Initialize the Virtual Trust Level field
Drivers: hv: Provide arch-neutral implementation of get_vtl()
Drivers: hv: Enable VTL mode for arm64
...
- Add a general sysfs scheme for publishing "Measurement" values
provided by the architecture's TEE Security Manager. Use it to publish
TDX "Runtime Measurement Registers" ("RTMRs") that either maintain a
hash of stored values (similar to a TPM PCR) or provide statically
provisioned data. These measurements are validated by a relying party.
- Reorganize the drivers/virt/coco/ directory for "host" and "guest"
shared infrastructure.
- Fix a configfs-tsm-report unregister bug
- With CONFIG_TSM_MEASUREMENTS joining CONFIG_TSM_REPORTS and in
anticipation of more shared "TSM" infrastructure arriving, rename the
maintainer entry to "TRUSTED SECURITY MODULE (TSM) INFRASTRUCTURE".
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQSbo+XnGs+rwLz9XGXfioYZHlFsZwUCaDj38gAKCRDfioYZHlFs
Z3EKAQC2K7RgoufBlLv4C79W8IGiUirKKQvtY9aiC7s/W8R4UwEApwV5gXQx2ImN
cEIIkAkVI2h9wJ9LHxyr3R5XfZPBGgA=
=2fTp
-----END PGP SIGNATURE-----
Merge tag 'tsm-for-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/devsec/tsm
Pull trusted security manager (TSM) updates from Dan Williams:
- Add a general sysfs scheme for publishing "Measurement" values
provided by the architecture's TEE Security Manager. Use it to
publish TDX "Runtime Measurement Registers" ("RTMRs") that either
maintain a hash of stored values (similar to a TPM PCR) or provide
statically provisioned data. These measurements are validated by a
relying party.
- Reorganize the drivers/virt/coco/ directory for "host" and "guest"
shared infrastructure.
- Fix a configfs-tsm-report unregister bug
- With CONFIG_TSM_MEASUREMENTS joining CONFIG_TSM_REPORTS and in
anticipation of more shared "TSM" infrastructure arriving, rename the
maintainer entry to "TRUSTED SECURITY MODULE (TSM) INFRASTRUCTURE".
* tag 'tsm-for-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/devsec/tsm:
tsm-mr: Fix init breakage after bin_attrs constification by scoping non-const pointers to init phase
sample/tsm-mr: Fix missing static for sample_report
virt: tdx-guest: Transition to scoped_cond_guard for mutex operations
virt: tdx-guest: Refactor and streamline TDREPORT generation
virt: tdx-guest: Expose TDX MRs as sysfs attributes
x86/tdx: tdx_mcall_get_report0: Return -EBUSY on TDCALL_OPERAND_BUSY error
x86/tdx: Add tdx_mcall_extend_rtmr() interface
tsm-mr: Add tsm-mr sample code
tsm-mr: Add TVM Measurement Register support
configfs-tsm-report: Fix NULL dereference of tsm_ops
coco/guest: Move shared guest CC infrastructure to drivers/virt/coco/guest/
configfs-tsm: Namespace TSM report symbols
device emulated by a Secure VM Service Module (SVSM) - a helper module of sorts
which runs at a different privilege level in the SEV-SNP VM stack.
The intent being that a TPM device is emulated by a trusted entity and not by
the untrusted host which is the default assumption in the confidential
computing scenarios.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmg0vOoACgkQEsHwGGHe
VUo44hAAqro0GkugpytbwYnNSnT2q0C7SoYghvKXmrGJiv/hn/5Q+cYh0AfIRsR0
hVRymTuyGSODjqUSOycMcTMbpQMVryjc2X0rK4MAWs9PKyIaLJVZh7vW497i00q5
Nl0kE+HzjZ44kF8udIBsYKS1qFfNyn11eE3xbFVVcHBCee0n775aOXWBup2d+G7u
+BnskBbDuV3umlce6oDrmhmtKF6PfFMSx5E4YTCrXBDEPYwJemxbdw+N6AXY6cme
mSJe8zwqrz+FX8653O6j9DsZgniIY/XhZd3tJ8QECY+DBRW1ldUt+k1zKeAeeVhc
oqLJGtyVd3cmfx+ZhvLh1VIMHpihkAfzl6gZoNTvUP9m3aGKz44xbL3aiA5UDdDo
DpQek8fInKA0iyWg6SHU+phRuvCXVXorcmIegSdYj3hzOc29AyEkcgfeIwzcbAE+
fuWO9SlGFqa/872d8z1AtSISTB6gWh0KqLaphdfmYmkZljNpKC0bs40788ymWOg8
iZPUj1ffP+Jy8/CGiOSmM5sq2Msy3D1JoIwciaIDo80OqsYDEvp7cwrp+w82T0yH
clQu+WZNCFYr6OLgrUq7pr0WDa8h8a5ifMaPjpaYl6V4TvDiAo97t5/yWJXTInzS
1HbckXH4BF4spgNDjX4wbDHhfoRC7ac4IqhGOL5ngI2vIWLijXo=
=tbNN
-----END PGP SIGNATURE-----
Merge tag 'x86_sev_for_v6.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull AMD SEV update from Borislav Petkov:
"Add a virtual TPM driver glue which allows a guest kernel to talk to a
TPM device emulated by a Secure VM Service Module (SVSM) - a helper
module of sorts which runs at a different privilege level in the
SEV-SNP VM stack.
The intent being that a TPM device is emulated by a trusted entity and
not by the untrusted host which is the default assumption in the
confidential computing scenarios"
* tag 'x86_sev_for_v6.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/sev: Register tpm-svsm platform device
tpm: Add SNP SVSM vTPM driver
svsm: Add header with SVSM_VTPM_CMD helpers
x86/sev: Add SVSM vTPM probe/send_command functions
When starting APs, confidential guests and paravisor guests
need to know the CPU number, and the pattern of using the linear
search has emerged in several places. With N processors that leads
to the O(N^2) time complexity.
Provide the CPU number in the AP wake up callback so that one can
get the CPU number in constant time.
Suggested-by: Michael Kelley <mhklinux@outlook.com>
Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/20250507182227.7421-3-romank@linux.microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Message-ID: <20250507182227.7421-3-romank@linux.microsoft.com>
-----BEGIN PGP SIGNATURE-----
iQFSBAABCgA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmgqSbkeHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGr6sH/1ICAvlin1GuxffE
ISVNz3xhXQpXG2k8yl9r0umpdCfPQbGrxm30vZyuIDNutY/FuMvkIqfu+Z1NnLg0
GidZW015LtXrp7/puKtTnUD5CPSjdETMXig+Q7c1PrxkkmHwz8sBbbm173AIDbDB
t7wwqSEUQh2AIDouGwN+DXB+6bR2FoOXb/k/njmtappIwR3rBc2f1HQJnP095rKO
5AKw1c9DMv5Wq2cEdBOCP48e4CFZEIN1ycW0nvtjpnOmcPOJjLoEothRbntQolqF
udtj5UeTGdAJqmjigv7KHmlrmFNe+GqBq4+beHl5MRxhBaT2uGGaM9jCJiSxT3Jx
sHyYYr8=
=Ddma
-----END PGP SIGNATURE-----
Merge tag 'v6.15-rc7' into x86/core, to pick up fixes
Pick up build fixes from upstream to make this tree more testable.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The main CPUID header <asm/cpuid.h> was originally a storefront for the
headers:
<asm/cpuid/api.h>
<asm/cpuid/leaf_0x2_api.h>
Now that the latter CPUID(0x2) header has been merged into the former,
there is no practical difference between <asm/cpuid.h> and
<asm/cpuid/api.h>.
Migrate all users to the <asm/cpuid/api.h> header, in preparation of
the removal of <asm/cpuid.h>.
Don't remove <asm/cpuid.h> just yet, in case some new code in -next
started using it.
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: x86-cpuid@lists.linux.dev
Link: https://lore.kernel.org/r/20250508150240.172915-3-darwi@linutronix.de
When shared pages are being converted to private during kdump, additional
checks are performed. They include handling the case of a GHCB page being
contained within a huge page.
Currently, this check incorrectly skips a page just below the GHCB page from
being transitioned back to private during kdump preparation.
This skipped page causes a 0x404 #VC exception when it is accessed later while
dumping guest memory for vmcore generation.
Correct the range to be checked for GHCB contained in a huge page. Also,
ensure that the skipped huge page containing the GHCB page is transitioned
back to private by applying the correct address mask later when changing GHCBs
to private at end of kdump preparation.
[ bp: Massage commit message. ]
Fixes: 3074152e56 ("x86/sev: Convert shared memory back to private on kexec")
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Tested-by: Srikanth Aithal <sraithal@amd.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/20250506183529.289549-1-Ashish.Kalra@amd.com
When kdump is running makedumpfile to generate vmcore and dump SNP guest
memory it touches the VMSA page of the vCPU executing kdump.
It then results in unrecoverable #NPF/RMP faults as the VMSA page is
marked busy/in-use when the vCPU is running and subsequently a causes
guest softlockup/hang.
Additionally, other APs may be halted in guest mode and their VMSA pages
are marked busy and touching these VMSA pages during guest memory dump
will also cause #NPF.
Issue AP_DESTROY GHCB calls on other APs to ensure they are kicked out
of guest mode and then clear the VMSA bit on their VMSA pages.
If the vCPU running kdump is an AP, mark it's VMSA page as offline to
ensure that makedumpfile excludes that page while dumping guest memory.
Fixes: 3074152e56 ("x86/sev: Convert shared memory back to private on kexec")
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Tested-by: Srikanth Aithal <sraithal@amd.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/20250428214151.155464-1-Ashish.Kalra@amd.com
Return `-EBUSY` from tdx_mcall_get_report0() when `TDG.MR.REPORT` returns
`TDCALL_OPERAND_BUSY`. This enables the caller to retry obtaining a
TDREPORT later if another VCPU is extending an RTMR concurrently.
Signed-off-by: Cedric Xing <cedric.xing@intel.com>
Acked-by: Dionna Amalie Glaze <dionnaglaze@google.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://patch.msgid.link/20250506-tdx-rtmr-v6-4-ac6ff5e9d58a@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The TDX guest exposes one MRTD (Build-time Measurement Register) and four
RTMR (Run-time Measurement Register) registers to record the build and boot
measurements of a virtual machine (VM). These registers are similar to PCR
(Platform Configuration Register) registers in the TPM (Trusted Platform
Module) space. This measurement data is used to implement security features
like attestation and trusted boot.
To facilitate updating the RTMR registers, the TDX module provides support
for the `TDG.MR.RTMR.EXTEND` TDCALL which can be used to securely extend
the RTMR registers.
Add helper function to update RTMR registers. It will be used by the TDX
guest driver in enabling RTMR extension support.
Co-developed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Cedric Xing <cedric.xing@intel.com>
Acked-by: Dionna Amalie Glaze <dionnaglaze@google.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://patch.msgid.link/20250506-tdx-rtmr-v6-3-ac6ff5e9d58a@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Most of the SEV support code used to reside in a single C source file
that was included in two places: the core kernel, and the decompressor.
The code that is actually shared with the decompressor was moved into a
separate, shared source file under startup/, on the basis that the
decompressor also executes from the early 1:1 mapping of memory.
However, while the elaborate #VC handling and instruction decoding that
it involves is also performed by the decompressor, it does not actually
occur in the core kernel at early boot, and therefore, does not need to
be part of the confined early startup code.
So split off the #VC handling code and move it back into arch/x86/coco
where it came from, into another C source file that is included from
both the decompressor and the core kernel.
Code movement only - no functional change intended.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Dionna Amalie Glaze <dionnaglaze@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kevin Loughlin <kevinloughlin@google.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: linux-efi@vger.kernel.org
Link: https://lore.kernel.org/r/20250504095230.2932860-31-ardb+git@google.com
Add aliases for all the data objects that the startup code references -
this is needed so that this code can be moved into its own confined area
where it can only access symbols that have a __pi_ prefix.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Dionna Amalie Glaze <dionnaglaze@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kevin Loughlin <kevinloughlin@google.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: linux-efi@vger.kernel.org
Link: https://lore.kernel.org/r/20250504095230.2932860-39-ardb+git@google.com
__rdmsr() is the lowest level MSR write API, with native_rdmsr()
and native_rdmsrq() serving as higher-level wrappers around it.
#define native_rdmsr(msr, val1, val2) \
do { \
u64 __val = __rdmsr((msr)); \
(void)((val1) = (u32)__val); \
(void)((val2) = (u32)(__val >> 32)); \
} while (0)
static __always_inline u64 native_rdmsrq(u32 msr)
{
return __rdmsr(msr);
}
However, __rdmsr() continues to be utilized in various locations.
MSR APIs are designed for different scenarios, such as native or
pvops, with or without trace, and safe or non-safe. Unfortunately,
the current MSR API names do not adequately reflect these factors,
making it challenging to select the most appropriate API for
various situations.
To pave the way for improving MSR API names, convert __rdmsr()
uses to native_rdmsrq() to ensure consistent usage. Later, these
APIs can be renamed to better reflect their implications, such as
native or pvops, with or without trace, and safe or non-safe.
No functional change intended.
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Uros Bizjak <ubizjak@gmail.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lore.kernel.org/r/20250427092027.1598740-10-xin@zytor.com
For historic reasons there are some TSC-related functions in the
<asm/msr.h> header, even though there's an <asm/tsc.h> header.
To facilitate the relocation of rdtsc{,_ordered}() from <asm/msr.h>
to <asm/tsc.h> and to eventually eliminate the inclusion of
<asm/msr.h> in <asm/tsc.h>, add an explicit <asm/msr.h> dependency
to the source files that reference definitions from <asm/msr.h>.
[ mingo: Clarified the changelog. ]
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Uros Bizjak <ubizjak@gmail.com>
Link: https://lore.kernel.org/r/20250501054241.1245648-1-xin@zytor.com
This commits breaks SNP guests:
234cf67fc3 ("x86/sev: Split off startup code from core code")
The SNP guest boots, but no longer has access to the VMPCK keys needed
to communicate with the ASP, which is used, for example, to obtain an
attestation report.
The secrets_pa value is defined as static in both startup.c and
core.c. It is set by a function in startup.c and so when used in
core.c its value will be 0.
Share it again and add the sev_ prefix to put it into the global
SEV symbols namespace.
[ mingo: Renamed to sev_secrets_pa ]
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Dionna Amalie Glaze <dionnaglaze@google.com>
Cc: Kevin Loughlin <kevinloughlin@google.com>
Link: https://lore.kernel.org/r/cf878810-81ed-3017-52c6-ce6aa41b5f01@amd.com
Move the SEV startup code into arch/x86/boot/startup/, where it will
reside along with other code that executes extremely early, and
therefore needs to be built in a special manner.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Dionna Amalie Glaze <dionnaglaze@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kevin Loughlin <kevinloughlin@google.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20250418141253.2601348-12-ardb+git@google.com
Disentangle the SEV core code and the SEV code that is called during
early boot. The latter piece will be moved into startup/ in a subsequent
patch.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Dionna Amalie Glaze <dionnaglaze@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kevin Loughlin <kevinloughlin@google.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20250418141253.2601348-11-ardb+git@google.com
GCC may ignore the __no_sanitize_address function attribute when
inlining, resulting in KASAN instrumentation in code tagged as
'noinstr'.
Move the SEV NMI handling code, which is noinstr, into a separate source
file so KASAN can be disabled on the whole file without losing coverage
of other SEV core code, once the startup code is split off from it too.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Dionna Amalie Glaze <dionnaglaze@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kevin Loughlin <kevinloughlin@google.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20250418141253.2601348-10-ardb+git@google.com
Prepare for splitting off parts of the SEV core.c source file into a
file that carries code that must tolerate being called from the early
1:1 mapping. This will allow special build-time handling of thise code,
to ensure that it gets generated in a way that is compatible with the
early execution context.
So create a de-facto internal SEV API and put the definitions into
sev-internal.h. No attempt is made to allow this header file to be
included in arbitrary other sources - this is explicitly not the intent.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Dionna Amalie Glaze <dionnaglaze@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kevin Loughlin <kevinloughlin@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: linux-efi@vger.kernel.org
Link: https://lore.kernel.org/r/20250410134117.3713574-20-ardb+git@google.com
RIP_REL_REF() is used in non-PIC C code that is called very early,
before the kernel virtual mapping is up, which is the mapping that the
linker expects. It is currently used in two different ways:
- to refer to the value of a global variable, including as an lvalue in
assignments;
- to take the address of a global variable via the mapping that the code
currently executes at.
The former case is only needed in non-PIC code, as PIC code will never
use absolute symbol references when the address of the symbol is not
being used. But taking the address of a variable in PIC code may still
require extra care, as a stack allocated struct assignment may be
emitted as a memcpy() from a statically allocated copy in .rodata.
For instance, this
void startup_64_setup_gdt_idt(void)
{
struct desc_ptr startup_gdt_descr = {
.address = (__force unsigned long)gdt_page.gdt,
.size = GDT_SIZE - 1,
};
may result in an absolute symbol reference in PIC code, even though the
struct is allocated on the stack and populated at runtime.
To address this case, make rip_rel_ptr() accessible in PIC code, and
update any existing uses where the address of a global variable is
taken using RIP_REL_REF.
Once all code of this nature has been moved into arch/x86/boot/startup
and built with -fPIC, RIP_REL_REF() can be retired, and only
rip_rel_ptr() will remain.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Dionna Amalie Glaze <dionnaglaze@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kevin Loughlin <kevinloughlin@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: linux-efi@vger.kernel.org
Link: https://lore.kernel.org/r/20250410134117.3713574-14-ardb+git@google.com
SNP platform can provide a vTPM device emulated by SVSM.
The "tpm-svsm" device can be handled by the platform driver registered by the
x86/sev core code.
Register the platform device only when SVSM is available and it supports vTPM
commands as checked by snp_svsm_vtpm_probe().
[ bp: Massage commit message. ]
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Link: https://lore.kernel.org/r/20250410135118.133240-5-sgarzare@redhat.com
Add two new functions to probe and send commands to the SVSM vTPM. They
leverage the two calls defined by the AMD SVSM specification [1] for the vTPM
protocol: SVSM_VTPM_QUERY and SVSM_VTPM_CMD.
Expose snp_svsm_vtpm_send_command() to be used by a TPM driver.
[1] "Secure VM Service Module for SEV-SNP Guests"
Publication # 58019 Revision: 1.00
[ bp: Some doc touchups. ]
Co-developed-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Co-developed-by: Claudio Carvalho <cclaudio@linux.ibm.com>
Signed-off-by: Claudio Carvalho <cclaudio@linux.ibm.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Link: https://lore.kernel.org/r/20250403100943.120738-2-sgarzare@redhat.com
Direct HLT instruction execution causes #VEs for TDX VMs which is routed
to hypervisor via TDCALL. safe_halt() routines execute HLT in STI-shadow
so IRQs need to remain disabled until the TDCALL to ensure that pending
IRQs are correctly treated as wake events.
Emit warning and fail emulation if IRQs are enabled during HLT #VE handling
to avoid running into scenarios where IRQ wake events are lost resulting in
indefinite HLT execution times.
Signed-off-by: Vishal Annapurve <vannapurve@google.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Tested-by: Ryan Afranji <afranji@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20250228014416.3925664-4-vannapurve@google.com
Direct HLT instruction execution causes #VEs for TDX VMs which is routed
to hypervisor via TDCALL. If HLT is executed in STI-shadow, resulting #VE
handler will enable interrupts before TDCALL is routed to hypervisor
leading to missed wakeup events, as current TDX spec doesn't expose
interruptibility state information to allow #VE handler to selectively
enable interrupts.
Commit bfe6ed0c67 ("x86/tdx: Add HLT support for TDX guests")
prevented the idle routines from executing HLT instruction in STI-shadow.
But it missed the paravirt routine which can be reached via this path
as an example:
kvm_wait() =>
safe_halt() =>
raw_safe_halt() =>
arch_safe_halt() =>
irq.safe_halt() =>
pv_native_safe_halt()
To reliably handle arch_safe_halt() for TDX VMs, introduce explicit
dependency on CONFIG_PARAVIRT and override paravirt halt()/safe_halt()
routines with TDX-safe versions that execute direct TDCALL and needed
interrupt flag updates. Executing direct TDCALL brings in additional
benefit of avoiding HLT related #VEs altogether.
As tested by Ryan Afranji:
"Tested with the specjbb2015 benchmark. It has heavy lock contention which leads
to many halt calls. TDX VMs suffered a poor score before this patchset.
Verified the major performance improvement with this patchset applied."
Fixes: bfe6ed0c67 ("x86/tdx: Add HLT support for TDX guests")
Signed-off-by: Vishal Annapurve <vannapurve@google.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Tested-by: Ryan Afranji <afranji@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20250228014416.3925664-3-vannapurve@google.com
- The biggest change is the new option to automatically fail
the build on objtool warnings: CONFIG_OBJTOOL_WERROR.
While there are no currently known unfixed false positives
left, such an expansion in the severity of objtool warnings
inevitably creates a risk of build failures, so it's disabled by
default and depends on !COMPILE_TEST, so it shouldn't be enabled
on allyesconfig/allmodconfig builds and won't be forced on people
who just accept build-time defaults in 'make oldconfig'.
While the option is strongly recommended, only people who enable
it explicitly should see it.
(Josh Poimboeuf)
- Disable branch profiling in noinstr code with a broad
brush that includes all of arch/x86/ and kernel/sched/. (Josh Poimboeuf)
- Create backup object files on objtool errors and print exact
objtool arguments to make failure analysis easier (Josh Poimboeuf)
- Improve noreturn handling (Josh Poimboeuf)
- Improve rodata handling (Tiezhu Yang)
- Support jump tables, switch tables and goto tables on LoongArch (Tiezhu Yang)
- Misc cleanups and fixes (Josh Poimboeuf, David Engraf, Ingo Molnar)
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmfefkARHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1inlRAAvd9Hom2qh9Iu+KYYF58vsg9zsxWZA6I1
blouKI4SUA8Xjuw6Nihx+emaPaMW1boGSLTNsFzrCa3S1+4UHTTp/Y8snZWJ/Mc/
Peg52N6u/LIcoQM+vNJYRtd9y4wabX87vl0qTxte0kB0Neps3/yQvxtUa2K1srXp
8nwHK+PdzNsgPuIrIiNc9ymsPvbqFHmVIRRVNVKX4BlPJi2kJs9kx43kszweQR/X
kW/bs1315m1HS5i02K0Zs/XdOZHLsk9ERu+aviBJV1txrgZIukATIqbODiI+3RZX
0oa3KxfzEVFN2k3OukrezV2INzETkN+oOSTAZIUOqwSVe+8rdQVBSdYT6svYn/yy
aS8Bi5Mm1nfizTU+cRrzU7FxWCmwsxm9r4fHPTV8Owjxg0uoGk/E/qlvERuR2rpA
p2tHMo1lp2Yo+VZBZPfm5KDHFG4tSGhF9eav2bqSI7/Kf5AWxRl8kBs5iLrcxsXh
4qk3FalnuM7A+1McAUcBJAvM897Yie0s2G83ZipyYyA6U3LSBhBMWh9FlIiAjuIh
YnX6IFkW9tVzVZpJFEGQn+2Ewl5Y2Go5bokKk03vkWCZCgg+hEUsVh6Cnm1ocZpO
Ll3/UF4i8XjjyuAuHDn6mzyHIgch2xRN02v7dJSb09+O9b8vIoPoSqbWSLJUOBqf
r6UesXDG8mY=
=iswJ
-----END PGP SIGNATURE-----
Merge tag 'objtool-core-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool updates from Ingo Molnar:
- The biggest change is the new option to automatically fail the build
on objtool warnings: CONFIG_OBJTOOL_WERROR.
While there are no currently known unfixed false positives left, such
an expansion in the severity of objtool warnings inevitably creates a
risk of build failures, so it's disabled by default and depends on
!COMPILE_TEST, so it shouldn't be enabled on
allyesconfig/allmodconfig builds and won't be forced on people who
just accept build-time defaults in 'make oldconfig'.
While the option is strongly recommended, only people who enable it
explicitly should see it.
(Josh Poimboeuf)
- Disable branch profiling in noinstr code with a broad brush that
includes all of arch/x86/ and kernel/sched/. (Josh Poimboeuf)
- Create backup object files on objtool errors and print exact objtool
arguments to make failure analysis easier (Josh Poimboeuf)
- Improve noreturn handling (Josh Poimboeuf)
- Improve rodata handling (Tiezhu Yang)
- Support jump tables, switch tables and goto tables on LoongArch
(Tiezhu Yang)
- Misc cleanups and fixes (Josh Poimboeuf, David Engraf, Ingo Molnar)
* tag 'objtool-core-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits)
tracing: Disable branch profiling in noinstr code
objtool: Use O_CREAT with explicit mode mask
objtool: Add CONFIG_OBJTOOL_WERROR
objtool: Create backup on error and print args
objtool: Change "warning:" to "error:" for --Werror
objtool: Add --Werror option
objtool: Add --output option
objtool: Upgrade "Linked object detected" warning to error
objtool: Consolidate option validation
objtool: Remove --unret dependency on --rethunk
objtool: Increase per-function WARN_FUNC() rate limit
objtool: Update documentation
objtool: Improve __noreturn annotation warning
objtool: Fix error handling inconsistencies in check()
x86/traps: Make exc_double_fault() consistently noreturn
LoongArch: Enable jump table for objtool
objtool/LoongArch: Add support for goto table
objtool/LoongArch: Add support for switch table
objtool: Handle PC relative relocation type
objtool: Handle different entry size of rodata
...
- loadpin: remove unsupported MODULE_COMPRESS_NONE (Arulpandiyan Vadivel)
- samples/check-exec: Fix script name (Mickaël Salaün)
- yama: remove needless locking in yama_task_prctl() (Oleg Nesterov)
- lib/string_choices: Sort by function name (R Sundar)
- hardening: Allow default HARDENED_USERCOPY to be set at compile time
(Mel Gorman)
- uaccess: Split out compile-time checks into ucopysize.h
- kbuild: clang: Support building UM with SUBARCH=i386
- x86: Enable i386 FORTIFY_SOURCE on Clang 16+
- ubsan/overflow: Rework integer overflow sanitizer option
- Add missing __nonstring annotations for callers of memtostr*()/strtomem*()
- Add __must_be_noncstr() and have memtostr*()/strtomem*() check for it
- Introduce __nonstring_array for silencing future GCC 15 warnings
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRSPkdeREjth1dHnSE2KwveOeQkuwUCZ9hGrgAKCRA2KwveOeQk
u1WvAQC3ZxFu3b0Omfmht2pPqCltf2UOQNvUx3egjoeXpUaNSgD+Lxr/T4xksy7E
jHh7rCYDkruOWs3DHA5JjRQcf0BBLQo=
=FTQp
-----END PGP SIGNATURE-----
Merge tag 'hardening-v6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening updates from Kees Cook:
"As usual, it's scattered changes all over. Patches touching things
outside of our traditional areas in the tree have been Acked by
maintainers or were trivial changes:
- loadpin: remove unsupported MODULE_COMPRESS_NONE (Arulpandiyan
Vadivel)
- samples/check-exec: Fix script name (Mickaël Salaün)
- yama: remove needless locking in yama_task_prctl() (Oleg Nesterov)
- lib/string_choices: Sort by function name (R Sundar)
- hardening: Allow default HARDENED_USERCOPY to be set at compile
time (Mel Gorman)
- uaccess: Split out compile-time checks into ucopysize.h
- kbuild: clang: Support building UM with SUBARCH=i386
- x86: Enable i386 FORTIFY_SOURCE on Clang 16+
- ubsan/overflow: Rework integer overflow sanitizer option
- Add missing __nonstring annotations for callers of
memtostr*()/strtomem*()
- Add __must_be_noncstr() and have memtostr*()/strtomem*() check for
it
- Introduce __nonstring_array for silencing future GCC 15 warnings"
* tag 'hardening-v6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (26 commits)
compiler_types: Introduce __nonstring_array
hardening: Enable i386 FORTIFY_SOURCE on Clang 16+
x86/build: Remove -ffreestanding on i386 with GCC
ubsan/overflow: Enable ignorelist parsing and add type filter
ubsan/overflow: Enable pattern exclusions
ubsan/overflow: Rework integer overflow sanitizer option to turn on everything
samples/check-exec: Fix script name
yama: don't abuse rcu_read_lock/get_task_struct in yama_task_prctl()
kbuild: clang: Support building UM with SUBARCH=i386
loadpin: remove MODULE_COMPRESS_NONE as it is no longer supported
lib/string_choices: Rearrange functions in sorted order
string.h: Validate memtostr*()/strtomem*() arguments more carefully
compiler.h: Introduce __must_be_noncstr()
nilfs2: Mark on-disk strings as nonstring
uapi: stddef.h: Introduce __kernel_nonstring
x86/tdx: Mark message.bytes as nonstring
string: kunit: Mark nonstring test strings as __nonstring
scsi: qla2xxx: Mark device strings as nonstring
scsi: mpt3sas: Mark device strings as nonstring
scsi: mpi3mr: Mark device strings as nonstring
...
CONFIG_TRACE_BRANCH_PROFILING inserts a call to ftrace_likely_update()
for each use of likely() or unlikely(). That breaks noinstr rules if
the affected function is annotated as noinstr.
Disable branch profiling for files with noinstr functions. In addition
to some individual files, this also includes the entire arch/x86
subtree, as well as the kernel/entry, drivers/cpuidle, and drivers/idle
directories, all of which are noinstr-heavy.
Due to the nature of how sched binaries are built by combining multiple
.c files into one, branch profiling is disabled more broadly across the
sched code than would otherwise be needed.
This fixes many warnings like the following:
vmlinux.o: warning: objtool: do_syscall_64+0x40: call to ftrace_likely_update() leaves .noinstr.text section
vmlinux.o: warning: objtool: __rdgsbase_inactive+0x33: call to ftrace_likely_update() leaves .noinstr.text section
vmlinux.o: warning: objtool: handle_bug.isra.0+0x198: call to ftrace_likely_update() leaves .noinstr.text section
...
Reported-by: Ingo Molnar <mingo@kernel.org>
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/fb94fc9303d48a5ed370498f54500cc4c338eb6d.1742586676.git.jpoimboe@kernel.org
No need for an 'else' statement after a 'return'.
[ mingo: Clarified the changelog ]
Signed-off-by: Peng Hao <flyingpeng@tencent.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: linux-kernel@vger.kernel.org
Compared to the SNP Guest Request, the "Extended" version adds data pages for
receiving certificates. If not enough pages provided, the HV can report to the
VM how much is needed so the VM can reallocate and repeat.
Commit
ae596615d9 ("virt: sev-guest: Reduce the scope of SNP command mutex")
moved handling of the allocated/desired pages number out of scope of said
mutex and create a possibility for a race (multiple instances trying to
trigger Extended request in a VM) as there is just one instance of
snp_msg_desc per /dev/sev-guest and no locking other than snp_cmd_mutex.
Fix the issue by moving the data blob/size and the GHCB input struct
(snp_req_data) into snp_guest_req which is allocated on stack now and accessed
by the GHCB caller under that mutex.
Stop allocating SEV_FW_BLOB_MAX_SIZE in snp_msg_alloc() as only one of four
callers needs it. Free the received blob in get_ext_report() right after it is
copied to the userspace. Possible future users of snp_send_guest_request() are
likely to have different ideas about the buffer size anyways.
Fixes: ae596615d9 ("virt: sev-guest: Reduce the scope of SNP command mutex")
Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Nikunj A Dadhania <nikunj@amd.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20250307013700.437505-3-aik@amd.com
In preparation for strtomem*() checking that its destination is a
nonstring, rename and annotate message.bytes accordingly.
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Kees Cook <kees@kernel.org>
When retpolines and IBT are both disabled, the compiler is free to use
jump tables to optimize switch instructions. However, these are emitted
by Clang as absolute references into .rodata:
jmp *-0x7dfffe90(,%r9,8)
R_X86_64_32S .rodata+0x170
Given that this code will execute before that address in .rodata has even
been mapped, it is guaranteed to crash a SEV-SNP guest in a way that is
difficult to diagnose.
So disable jump tables when building this code. It would be better if we
could attach this annotation to the __head macro but this appears to be
impossible.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Tested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20250127114334.1045857-6-ardb+git@google.com
indivudual patches which are described in their changelogs.
- "Allocate and free frozen pages" from Matthew Wilcox reorganizes the
page allocator so we end up with the ability to allocate and free
zero-refcount pages. So that callers (ie, slab) can avoid a refcount
inc & dec.
- "Support large folios for tmpfs" from Baolin Wang teaches tmpfs to use
large folios other than PMD-sized ones.
- "Fix mm/rodata_test" from Petr Tesarik performs some maintenance and
fixes for this small built-in kernel selftest.
- "mas_anode_descend() related cleanup" from Wei Yang tidies up part of
the mapletree code.
- "mm: fix format issues and param types" from Keren Sun implements a
few minor code cleanups.
- "simplify split calculation" from Wei Yang provides a few fixes and a
test for the mapletree code.
- "mm/vma: make more mmap logic userland testable" from Lorenzo Stoakes
continues the work of moving vma-related code into the (relatively) new
mm/vma.c.
- "mm/page_alloc: gfp flags cleanups for alloc_contig_*()" from David
Hildenbrand cleans up and rationalizes handling of gfp flags in the page
allocator.
- "readahead: Reintroduce fix for improper RA window sizing" from Jan
Kara is a second attempt at fixing a readahead window sizing issue. It
should reduce the amount of unnecessary reading.
- "synchronously scan and reclaim empty user PTE pages" from Qi Zheng
addresses an issue where "huge" amounts of pte pagetables are
accumulated
(https://lore.kernel.org/lkml/cover.1718267194.git.zhengqi.arch@bytedance.com/).
Qi's series addresses this windup by synchronously freeing PTE memory
within the context of madvise(MADV_DONTNEED).
- "selftest/mm: Remove warnings found by adding compiler flags" from
Muhammad Usama Anjum fixes some build warnings in the selftests code
when optional compiler warnings are enabled.
- "mm: don't use __GFP_HARDWALL when migrating remote pages" from David
Hildenbrand tightens the allocator's observance of __GFP_HARDWALL.
- "pkeys kselftests improvements" from Kevin Brodsky implements various
fixes and cleanups in the MM selftests code, mainly pertaining to the
pkeys tests.
- "mm/damon: add sample modules" from SeongJae Park enhances DAMON to
estimate application working set size.
- "memcg/hugetlb: Rework memcg hugetlb charging" from Joshua Hahn
provides some cleanups to memcg's hugetlb charging logic.
- "mm/swap_cgroup: remove global swap cgroup lock" from Kairui Song
removes the global swap cgroup lock. A speedup of 10% for a tmpfs-based
kernel build was demonstrated.
- "zram: split page type read/write handling" from Sergey Senozhatsky
has several fixes and cleaups for zram in the area of zram_write_page().
A watchdog softlockup warning was eliminated.
- "move pagetable_*_dtor() to __tlb_remove_table()" from Kevin Brodsky
cleans up the pagetable destructor implementations. A rare
use-after-free race is fixed.
- "mm/debug: introduce and use VM_WARN_ON_VMG()" from Lorenzo Stoakes
simplifies and cleans up the debugging code in the VMA merging logic.
- "Account page tables at all levels" from Kevin Brodsky cleans up and
regularizes the pagetable ctor/dtor handling. This results in
improvements in accounting accuracy.
- "mm/damon: replace most damon_callback usages in sysfs with new core
functions" from SeongJae Park cleans up and generalizes DAMON's sysfs
file interface logic.
- "mm/damon: enable page level properties based monitoring" from
SeongJae Park increases the amount of information which is presented in
response to DAMOS actions.
- "mm/damon: remove DAMON debugfs interface" from SeongJae Park removes
DAMON's long-deprecated debugfs interfaces. Thus the migration to sysfs
is completed.
- "mm/hugetlb: Refactor hugetlb allocation resv accounting" from Peter
Xu cleans up and generalizes the hugetlb reservation accounting.
- "mm: alloc_pages_bulk: small API refactor" from Luiz Capitulino
removes a never-used feature of the alloc_pages_bulk() interface.
- "mm/damon: extend DAMOS filters for inclusion" from SeongJae Park
extends DAMOS filters to support not only exclusion (rejecting), but
also inclusion (allowing) behavior.
- "Add zpdesc memory descriptor for zswap.zpool" from Alex Shi
"introduces a new memory descriptor for zswap.zpool that currently
overlaps with struct page for now. This is part of the effort to reduce
the size of struct page and to enable dynamic allocation of memory
descriptors."
- "mm, swap: rework of swap allocator locks" from Kairui Song redoes and
simplifies the swap allocator locking. A speedup of 400% was
demonstrated for one workload. As was a 35% reduction for kernel build
time with swap-on-zram.
- "mm: update mips to use do_mmap(), make mmap_region() internal" from
Lorenzo Stoakes reworks MIPS's use of mmap_region() so that
mmap_region() can be made MM-internal.
- "mm/mglru: performance optimizations" from Yu Zhao fixes a few MGLRU
regressions and otherwise improves MGLRU performance.
- "Docs/mm/damon: add tuning guide and misc updates" from SeongJae Park
updates DAMON documentation.
- "Cleanup for memfd_create()" from Isaac Manjarres does that thing.
- "mm: hugetlb+THP folio and migration cleanups" from David Hildenbrand
provides various cleanups in the areas of hugetlb folios, THP folios and
migration.
- "Uncached buffered IO" from Jens Axboe implements the new
RWF_DONTCACHE flag which provides synchronous dropbehind for pagecache
reading and writing. To permite userspace to address issues with
massive buildup of useless pagecache when reading/writing fast devices.
- "selftests/mm: virtual_address_range: Reduce memory" from Thomas
Weißschuh fixes and optimizes some of the MM selftests.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZ5a+cwAKCRDdBJ7gKXxA
jtoyAP9R58oaOKPJuTizEKKXvh/RpMyD6sYcz/uPpnf+cKTZxQEAqfVznfWlw/Lz
uC3KRZYhmd5YrxU4o+qjbzp9XWX/xAE=
=Ib2s
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2025-01-26-14-59' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
"The various patchsets are summarized below. Plus of course many
indivudual patches which are described in their changelogs.
- "Allocate and free frozen pages" from Matthew Wilcox reorganizes
the page allocator so we end up with the ability to allocate and
free zero-refcount pages. So that callers (ie, slab) can avoid a
refcount inc & dec
- "Support large folios for tmpfs" from Baolin Wang teaches tmpfs to
use large folios other than PMD-sized ones
- "Fix mm/rodata_test" from Petr Tesarik performs some maintenance
and fixes for this small built-in kernel selftest
- "mas_anode_descend() related cleanup" from Wei Yang tidies up part
of the mapletree code
- "mm: fix format issues and param types" from Keren Sun implements a
few minor code cleanups
- "simplify split calculation" from Wei Yang provides a few fixes and
a test for the mapletree code
- "mm/vma: make more mmap logic userland testable" from Lorenzo
Stoakes continues the work of moving vma-related code into the
(relatively) new mm/vma.c
- "mm/page_alloc: gfp flags cleanups for alloc_contig_*()" from David
Hildenbrand cleans up and rationalizes handling of gfp flags in the
page allocator
- "readahead: Reintroduce fix for improper RA window sizing" from Jan
Kara is a second attempt at fixing a readahead window sizing issue.
It should reduce the amount of unnecessary reading
- "synchronously scan and reclaim empty user PTE pages" from Qi Zheng
addresses an issue where "huge" amounts of pte pagetables are
accumulated:
https://lore.kernel.org/lkml/cover.1718267194.git.zhengqi.arch@bytedance.com/
Qi's series addresses this windup by synchronously freeing PTE
memory within the context of madvise(MADV_DONTNEED)
- "selftest/mm: Remove warnings found by adding compiler flags" from
Muhammad Usama Anjum fixes some build warnings in the selftests
code when optional compiler warnings are enabled
- "mm: don't use __GFP_HARDWALL when migrating remote pages" from
David Hildenbrand tightens the allocator's observance of
__GFP_HARDWALL
- "pkeys kselftests improvements" from Kevin Brodsky implements
various fixes and cleanups in the MM selftests code, mainly
pertaining to the pkeys tests
- "mm/damon: add sample modules" from SeongJae Park enhances DAMON to
estimate application working set size
- "memcg/hugetlb: Rework memcg hugetlb charging" from Joshua Hahn
provides some cleanups to memcg's hugetlb charging logic
- "mm/swap_cgroup: remove global swap cgroup lock" from Kairui Song
removes the global swap cgroup lock. A speedup of 10% for a
tmpfs-based kernel build was demonstrated
- "zram: split page type read/write handling" from Sergey Senozhatsky
has several fixes and cleaups for zram in the area of
zram_write_page(). A watchdog softlockup warning was eliminated
- "move pagetable_*_dtor() to __tlb_remove_table()" from Kevin
Brodsky cleans up the pagetable destructor implementations. A rare
use-after-free race is fixed
- "mm/debug: introduce and use VM_WARN_ON_VMG()" from Lorenzo Stoakes
simplifies and cleans up the debugging code in the VMA merging
logic
- "Account page tables at all levels" from Kevin Brodsky cleans up
and regularizes the pagetable ctor/dtor handling. This results in
improvements in accounting accuracy
- "mm/damon: replace most damon_callback usages in sysfs with new
core functions" from SeongJae Park cleans up and generalizes
DAMON's sysfs file interface logic
- "mm/damon: enable page level properties based monitoring" from
SeongJae Park increases the amount of information which is
presented in response to DAMOS actions
- "mm/damon: remove DAMON debugfs interface" from SeongJae Park
removes DAMON's long-deprecated debugfs interfaces. Thus the
migration to sysfs is completed
- "mm/hugetlb: Refactor hugetlb allocation resv accounting" from
Peter Xu cleans up and generalizes the hugetlb reservation
accounting
- "mm: alloc_pages_bulk: small API refactor" from Luiz Capitulino
removes a never-used feature of the alloc_pages_bulk() interface
- "mm/damon: extend DAMOS filters for inclusion" from SeongJae Park
extends DAMOS filters to support not only exclusion (rejecting),
but also inclusion (allowing) behavior
- "Add zpdesc memory descriptor for zswap.zpool" from Alex Shi
introduces a new memory descriptor for zswap.zpool that currently
overlaps with struct page for now. This is part of the effort to
reduce the size of struct page and to enable dynamic allocation of
memory descriptors
- "mm, swap: rework of swap allocator locks" from Kairui Song redoes
and simplifies the swap allocator locking. A speedup of 400% was
demonstrated for one workload. As was a 35% reduction for kernel
build time with swap-on-zram
- "mm: update mips to use do_mmap(), make mmap_region() internal"
from Lorenzo Stoakes reworks MIPS's use of mmap_region() so that
mmap_region() can be made MM-internal
- "mm/mglru: performance optimizations" from Yu Zhao fixes a few
MGLRU regressions and otherwise improves MGLRU performance
- "Docs/mm/damon: add tuning guide and misc updates" from SeongJae
Park updates DAMON documentation
- "Cleanup for memfd_create()" from Isaac Manjarres does that thing
- "mm: hugetlb+THP folio and migration cleanups" from David
Hildenbrand provides various cleanups in the areas of hugetlb
folios, THP folios and migration
- "Uncached buffered IO" from Jens Axboe implements the new
RWF_DONTCACHE flag which provides synchronous dropbehind for
pagecache reading and writing. To permite userspace to address
issues with massive buildup of useless pagecache when
reading/writing fast devices
- "selftests/mm: virtual_address_range: Reduce memory" from Thomas
Weißschuh fixes and optimizes some of the MM selftests"
* tag 'mm-stable-2025-01-26-14-59' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (321 commits)
mm/compaction: fix UBSAN shift-out-of-bounds warning
s390/mm: add missing ctor/dtor on page table upgrade
kasan: sw_tags: use str_on_off() helper in kasan_init_sw_tags()
tools: add VM_WARN_ON_VMG definition
mm/damon/core: use str_high_low() helper in damos_wmark_wait_us()
seqlock: add missing parameter documentation for raw_seqcount_try_begin()
mm/page-writeback: consolidate wb_thresh bumping logic into __wb_calc_thresh
mm/page_alloc: remove the incorrect and misleading comment
zram: remove zcomp_stream_put() from write_incompressible_page()
mm: separate move/undo parts from migrate_pages_batch()
mm/kfence: use str_write_read() helper in get_access_type()
selftests/mm/mkdirty: fix memory leak in test_uffdio_copy()
kasan: hw_tags: Use str_on_off() helper in kasan_init_hw_tags()
selftests/mm: virtual_address_range: avoid reading from VM_IO mappings
selftests/mm: vm_util: split up /proc/self/smaps parsing
selftests/mm: virtual_address_range: unmap chunks after validation
selftests/mm: virtual_address_range: mmap() without PROT_WRITE
selftests/memfd/memfd_test: fix possible NULL pointer dereference
mm: add FGP_DONTCACHE folio creation flag
mm: call filemap_fdatawrite_range_kick() after IOCB_DONTCACHE issue
...
Before SLUB initialization, various subsystems used memblock_alloc to
allocate memory. In most cases, when memory allocation fails, an
immediate panic is required. To simplify this behavior and reduce
repetitive checks, introduce `memblock_alloc_or_panic`. This function
ensures that memory allocation failures result in a panic automatically,
improving code readability and consistency across subsystems that require
this behavior.
[guoweikang.kernel@gmail.com: arch/s390: save_area_alloc default failure behavior changed to panic]
Link: https://lkml.kernel.org/r/20250109033136.2845676-1-guoweikang.kernel@gmail.com
Link: https://lore.kernel.org/lkml/Z2fknmnNtiZbCc7x@kernel.org/
Link: https://lkml.kernel.org/r/20250102072528.650926-1-guoweikang.kernel@gmail.com
Signed-off-by: Guo Weikang <guoweikang.kernel@gmail.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> [s390]
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* Use new TDX module features for exception suppression and RBP
clobbering
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmeQO2IACgkQaDWVMHDJ
krA/UQ/5Abo250b0p6dyez2EhDiKXVJ3v30iriiZvGT+mKDyxQQFvLYE+QDUfHz7
rKjJLEaHKt0wnHkSmBX05RpSieek2oS2ySpfIDxXW2tMjsap5a5WqcvFyOW6NcKy
uwim2tRX6/54kr76smGNdGEeLPV7GL/FTAoub8pt/TCfL/88m+qEsQOvYg7RNvrV
j9lOyL/Tt791E855cTgb1qHHFBu/I9siUWU2RTBKUPo2ILYAlSPS/xRiDe8W5fzS
kPQxtwa5RSvhLBsz4lVbABJFZ1DmvY5TqxS0MMKwh/Dfm0jBXeod7alvMXrfhddF
ikeBVKPuoB3u044rIGpoPqk4nM6ICowqI+H36D/4/bzXfz4zxRlIoPS5WdzS6ICm
23pS1MsoAUOYfWfJTgRtNvKJ7JKEhXgAZFVDWjgLF6kVVIwD0lu550fVTd0OIhaA
mko4+RmaXx+OfumZV3HZsHauV39rCQOT4Ay5jNhPho73BFExOzK3m1fmIjf2nsKu
8Pk+2bhPQV7LvBOgc42BKVMusP/IW4O3dyzS8NnD2YO25gw+xd1oUNAb3LFGg9OJ
vJXLuxlcHe6PmH9TdSZ+Lwj31vpbFtGTX5AOAzVbaIyfIJTuDUmTY5BX9eTW0/N7
iLiNaY713mapkDBbKqWBsiBi1SEVuDZyrixujUiMMrGUwXRCAH4=
=nzuM
-----END PGP SIGNATURE-----
Merge tag 'x86_tdx_for_6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 TDX updates from Dave Hansen:
"Intel Trust Domain updates.
The existing TDX code needs a _bit_ of metadata from the TDX module.
But KVM is going to need a bunch more very shortly. Rework the
interface with the TDX module to be more consistent and handle the new
higher volume.
The TDX module has added a few new features. The first is a promise
not to clobber RBP under any circumstances. Basically the kernel now
will refuse to use any modules that don't have this promise. Second,
enable the new "REDUCE_VE" feature. This ensures that the TDX module
will not send some silly virtualization exceptions that the guest had
no good way to handle anyway.
- Centralize global metadata infrastructure
- Use new TDX module features for exception suppression and RBP
clobbering"
* tag 'x86_tdx_for_6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/virt/tdx: Require the module to assert it has the NO_RBP_MOD mitigation
x86/virt/tdx: Switch to use auto-generated global metadata reading code
x86/virt/tdx: Use dedicated struct members for PAMT entry sizes
x86/virt/tdx: Use auto-generated code to read global metadata
x86/virt/tdx: Start to track all global metadata in one structure
x86/virt/tdx: Rename 'struct tdx_tdmr_sysinfo' to reflect the spec better
x86/tdx: Dump attributes and TD_CTLS on boot
x86/tdx: Disable unnecessary virtualization exceptions
- A large and involved preparatory series to pave the way to add exception
handling for relocate_kernel - which will be a debugging facility that
has aided in the field to debug an exceptionally hard to debug early boot bug.
Plus assorted cleanups and fixes that were discovered along the way,
by David Woodhouse:
- Clean up and document register use in relocate_kernel_64.S
- Use named labels in swap_pages in relocate_kernel_64.S
- Only swap pages for ::preserve_context mode
- Allocate PGD for x86_64 transition page tables separately
- Copy control page into place in machine_kexec_prepare()
- Invoke copy of relocate_kernel() instead of the original
- Move relocate_kernel to kernel .data section
- Add data section to relocate_kernel
- Drop page_list argument from relocate_kernel()
- Eliminate writes through kernel mapping of relocate_kernel page
- Clean up register usage in relocate_kernel()
- Mark relocate_kernel page as ROX instead of RWX
- Disable global pages before writing to control page
- Ensure preserve_context flag is set on return to kernel
- Use correct swap page in swap_pages function
- Fix stack and handling of re-entry point for ::preserve_context
- Mark machine_kexec() with __nocfi
- Cope with relocate_kernel() not being at the start of the page
- Use typedef for relocate_kernel_fn function prototype
- Fix location of relocate_kernel with -ffunction-sections (fix by Nathan Chancellor)
- A series to remove the last remaining absolute symbol references from
.head.text, and enforce this at build time, by Ard Biesheuvel:
- Avoid WARN()s and panic()s in early boot code
- Don't hang but terminate on failure to remap SVSM CA
- Determine VA/PA offset before entering C code
- Avoid intentional absolute symbol references in .head.text
- Disable UBSAN in early boot code
- Move ENTRY_TEXT to the start of the image
- Move .head.text into its own output section
- Reject absolute references in .head.text
- Which build-time enforcement uncovered a handful of bugs of essentially
non-working code, and a wrokaround for a toolchain bug, fixed by
Ard Biesheuvel as well:
- Fix spurious undefined reference when CONFIG_X86_5LEVEL=n, on GCC-12
- Disable UBSAN on SEV code that may execute very early
- Disable ftrace branch profiling in SEV startup code
- And miscellaneous cleanups:
- kexec_core: Add and update comments regarding the KEXEC_JUMP flow (Rafael J. Wysocki)
- x86/sysfs: Constify 'struct bin_attribute' (Thomas Weißschuh)
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmeQDmURHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1inwRAAjD5QR/Yu7Yiv2nM/ncUwAItsFkv9Jk4Y
HPGz9qNJoZxKxuZVj9bfQhWDe3g6VLnlDYgatht9BsyP5b12qZrUe+yp/TOH54Z3
wPD+U/jun4jiSr7oJkJC+bFn+a/tL39pB8Y6m+jblacgVglleO3SH5fBWNE1UbIV
e2iiNxi0bfuHy3wquegnKaMyF1e7YLw1p5laGSwwk21g5FjT7cLQOqC0/9u8u9xX
Ha+iaod7JOcjiQOqIt/MV57ldWEFCrUhQozRV3tK5Ptf5aoGFpisgQoRoduWUtFz
UbHiHhv6zE4DOIUzaAbJjYfR1Z/LCviwON97XJgeOOkJaULF7yFCfhGxKSyQoMIh
qZtlBs4VsGl2/dOl+iW6xKwgRiNundTzSQtt5D/xuFz5LnDxe/SrlZnYp8lOPP8R
w9V2b/fC0YxmUzEW6EDhBqvfuScKiNWoic47qvYfZPaWyg1ESpvWTIh6AKB5ThUR
upgJQdA4HW+y5C57uHW40TSe3xEeqM3+Slk0jxLElP7/yTul5r7jrjq2EkwaAv/j
6/0LsMSr33r9fVFeMP1qLXPUaipcqTWWTpeeTr8NBGUcvOKzw5SltEG4NihzCyhF
3/UMQhcQ6KE3iFMPlRu4hV7ZV4gErZmLoRwh9Uk28f2Xx8T95uoV8KTg1/sRZRTo
uQLeRxYnyrw=
=vGWS
-----END PGP SIGNATURE-----
Merge tag 'x86-boot-2025-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 boot updates from Ingo Molnar:
- A large and involved preparatory series to pave the way to add
exception handling for relocate_kernel - which will be a debugging
facility that has aided in the field to debug an exceptionally hard
to debug early boot bug. Plus assorted cleanups and fixes that were
discovered along the way, by David Woodhouse:
- Clean up and document register use in relocate_kernel_64.S
- Use named labels in swap_pages in relocate_kernel_64.S
- Only swap pages for ::preserve_context mode
- Allocate PGD for x86_64 transition page tables separately
- Copy control page into place in machine_kexec_prepare()
- Invoke copy of relocate_kernel() instead of the original
- Move relocate_kernel to kernel .data section
- Add data section to relocate_kernel
- Drop page_list argument from relocate_kernel()
- Eliminate writes through kernel mapping of relocate_kernel page
- Clean up register usage in relocate_kernel()
- Mark relocate_kernel page as ROX instead of RWX
- Disable global pages before writing to control page
- Ensure preserve_context flag is set on return to kernel
- Use correct swap page in swap_pages function
- Fix stack and handling of re-entry point for ::preserve_context
- Mark machine_kexec() with __nocfi
- Cope with relocate_kernel() not being at the start of the page
- Use typedef for relocate_kernel_fn function prototype
- Fix location of relocate_kernel with -ffunction-sections (fix by Nathan Chancellor)
- A series to remove the last remaining absolute symbol references from
.head.text, and enforce this at build time, by Ard Biesheuvel:
- Avoid WARN()s and panic()s in early boot code
- Don't hang but terminate on failure to remap SVSM CA
- Determine VA/PA offset before entering C code
- Avoid intentional absolute symbol references in .head.text
- Disable UBSAN in early boot code
- Move ENTRY_TEXT to the start of the image
- Move .head.text into its own output section
- Reject absolute references in .head.text
- The above build-time enforcement uncovered a handful of bugs of
essentially non-working code, and a wrokaround for a toolchain bug,
fixed by Ard Biesheuvel as well:
- Fix spurious undefined reference when CONFIG_X86_5LEVEL=n, on GCC-12
- Disable UBSAN on SEV code that may execute very early
- Disable ftrace branch profiling in SEV startup code
- And miscellaneous cleanups:
- kexec_core: Add and update comments regarding the KEXEC_JUMP flow (Rafael J. Wysocki)
- x86/sysfs: Constify 'struct bin_attribute' (Thomas Weißschuh)"
* tag 'x86-boot-2025-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits)
x86/sev: Disable ftrace branch profiling in SEV startup code
x86/kexec: Use typedef for relocate_kernel_fn function prototype
x86/kexec: Cope with relocate_kernel() not being at the start of the page
kexec_core: Add and update comments regarding the KEXEC_JUMP flow
x86/kexec: Mark machine_kexec() with __nocfi
x86/kexec: Fix location of relocate_kernel with -ffunction-sections
x86/kexec: Fix stack and handling of re-entry point for ::preserve_context
x86/kexec: Use correct swap page in swap_pages function
x86/kexec: Ensure preserve_context flag is set on return to kernel
x86/kexec: Disable global pages before writing to control page
x86/sev: Don't hang but terminate on failure to remap SVSM CA
x86/sev: Disable UBSAN on SEV code that may execute very early
x86/boot/64: Fix spurious undefined reference when CONFIG_X86_5LEVEL=n, on GCC-12
x86/sysfs: Constify 'struct bin_attribute'
x86/kexec: Mark relocate_kernel page as ROX instead of RWX
x86/kexec: Clean up register usage in relocate_kernel()
x86/kexec: Eliminate writes through kernel mapping of relocate_kernel page
x86/kexec: Drop page_list argument from relocate_kernel()
x86/kexec: Add data section to relocate_kernel
x86/kexec: Move relocate_kernel to kernel .data section
...
Ftrace branch profiling inserts absolute references to its metadata at
call sites, and this implies that this kind of instrumentation cannot be
used while executing from the 1:1 mapping of memory.
Therefore, disable ftrace branch profiling in the SEV startup routines,
by disabling it for the entire SEV core source file.
Closes: https://lore.kernel.org/oe-kbuild-all/202501072244.zZrx9864-lkp@intel.com/
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20250107151826.820147-2-ardb+git@google.com
Use the GUEST_TSC_FREQ MSR to discover the TSC frequency instead of
relying on kvm-clock based frequency calibration. Override both CPU and
TSC frequency calibration callbacks with securetsc_get_tsc_khz(). Since
the difference between CPU base and TSC frequency does not apply in this
case, the same callback is being used.
[ bp: Carve out from
https://lore.kernel.org/r/20250106124633.1418972-11-nikunj@amd.com ]
Signed-off-by: Nikunj A Dadhania <nikunj@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20250106124633.1418972-11-nikunj@amd.com
The hypervisor should not be intercepting RDTSC/RDTSCP when Secure TSC is
enabled. A #VC exception will be generated if the RDTSC/RDTSCP instructions
are being intercepted. If this should occur and Secure TSC is enabled,
guest execution should be terminated as the guest cannot rely on the TSC
value provided by the hypervisor.
Signed-off-by: Nikunj A Dadhania <nikunj@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Tested-by: Peter Gonda <pgonda@google.com>
Link: https://lore.kernel.org/r/20250106124633.1418972-9-nikunj@amd.com