Miscellaneous fixes:

- Fix CPU topology related regression that limited
    Xen PV guests to a single CPU
 
  - Fix ancient e820__register_nosave_regions() bugs that
    were causing problems with kexec's artificial memory
    maps
 
  - Fix an S4 hibernation crash caused by two missing ENDBR's that
    were mistakenly removed in a recent commit
 
  - Fix a resctrl serialization bug
 
  - Fix early_printk documentation and comments
 
  - Fix RSB bugs, combined with preparatory updates to better
    match the code to vendor recommendations.
 
  - Add RSB mitigation document
 
  - Fix/update documentation
 
  - Fix the erratum_1386_microcode[] table to be NULL terminated
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmf4Na0RHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1iy0hAAw03t9IGCgFEbzFkm2jRvoR/kUBnh7Q+B
 E1LLYjlYws0TLcxTFIkc3slI2dt0LE6YN6kHT4gzJmE2Rp7G3oKR9xGwW/soJEuv
 +hTZ4ueY8TY2mEOwKUkY7xetBDI/e6iXqMnrXIVz1xIDwW3wyQ31jT+A7LzW7Gxn
 CKKymIJQfH9eDJwakiTjrmsJRy2cmah5ajFmhrlt1bLDV1Ykts595HTZNFBnsDJq
 mGxUwKZi0h9h6JZgLSZJQtUu2Pv3WmI/6DlkPG3cNZJIIfS7sMPj1LpQVTKMPQ19
 zGzkHGAv6tgp7gIxse1MFoLiKEsAPR/iAL++o2PeyQkynXpVb0g6d6fvicGK/OAe
 xWR4rf/LVluvvwRam9bYaIkDkahbT/uLe/dp99YEqclfBGSsHY1C8jhPiuVyOQQK
 w5AS1D5LSqXVTxu1XWCVTAhfR5nPS+O5q2hEs4O8tEdWNeOQSeExOZ8z2lqyqeoG
 VifCuQqcPbCja0msBWX9eEY/M/ie3AcasrfgD49Xj7oTBQOMXO70YeENM1fVzcko
 NQFY8RqA+N/EmTaWJvJ8o88ZIvTKqosyTYOvQIq9ZJS7DeeVtPZ+wgJahiZbBKT7
 4KSjLOO3ZvosrgafS35I4v5+zU0GO6B7rgWUKALFsSy52FgXk0ip4RpO6DPCsmRD
 8GEpn0X19xM=
 =1DWX
 -----END PGP SIGNATURE-----

Merge tag 'x86-urgent-2025-04-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull misc x86 fixes from Ingo Molnar:

 - Fix CPU topology related regression that limited Xen PV guests to a
   single CPU

 - Fix ancient e820__register_nosave_regions() bugs that were causing
   problems with kexec's artificial memory maps

 - Fix an S4 hibernation crash caused by two missing ENDBR's that were
   mistakenly removed in a recent commit

 - Fix a resctrl serialization bug

 - Fix early_printk documentation and comments

 - Fix RSB bugs, combined with preparatory updates to better match the
   code to vendor recommendations.

 - Add RSB mitigation document

 - Fix/update documentation

 - Fix the erratum_1386_microcode[] table to be NULL terminated

* tag 'x86-urgent-2025-04-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/ibt: Fix hibernate
  x86/cpu: Avoid running off the end of an AMD erratum table
  Documentation/x86: Zap the subsection letters
  Documentation/x86: Update the naming of CPU features for /proc/cpuinfo
  x86/bugs: Add RSB mitigation document
  x86/bugs: Don't fill RSB on context switch with eIBRS
  x86/bugs: Don't fill RSB on VMEXIT with eIBRS+retpoline
  x86/bugs: Fix RSB clearing in indirect_branch_prediction_barrier()
  x86/bugs: Use SBPB in write_ibpb() if applicable
  x86/bugs: Rename entry_ibpb() to write_ibpb()
  x86/early_printk: Use 'mmio32' for consistency, fix comments
  x86/resctrl: Fix rdtgroup_mkdir()'s unlocked use of kernfs_node::name
  x86/e820: Fix handling of subpage regions when calculating nosave ranges in e820__register_nosave_regions()
  x86/acpi: Don't limit CPUs to 1 for Xen PV guests due to disabled ACPI
This commit is contained in:
Linus Torvalds 2025-04-10 15:20:10 -07:00
commit 3c9de67dd3
14 changed files with 408 additions and 160 deletions

View file

@ -22,3 +22,4 @@ are configurable at compile, boot or run time.
srso
gather_data_sampling
reg-file-data-sampling
rsb

View file

@ -0,0 +1,268 @@
.. SPDX-License-Identifier: GPL-2.0
=======================
RSB-related mitigations
=======================
.. warning::
Please keep this document up-to-date, otherwise you will be
volunteered to update it and convert it to a very long comment in
bugs.c!
Since 2018 there have been many Spectre CVEs related to the Return Stack
Buffer (RSB) (sometimes referred to as the Return Address Stack (RAS) or
Return Address Predictor (RAP) on AMD).
Information about these CVEs and how to mitigate them is scattered
amongst a myriad of microarchitecture-specific documents.
This document attempts to consolidate all the relevant information in
once place and clarify the reasoning behind the current RSB-related
mitigations. It's meant to be as concise as possible, focused only on
the current kernel mitigations: what are the RSB-related attack vectors
and how are they currently being mitigated?
It's *not* meant to describe how the RSB mechanism operates or how the
exploits work. More details about those can be found in the references
below.
Rather, this is basically a glorified comment, but too long to actually
be one. So when the next CVE comes along, a kernel developer can
quickly refer to this as a refresher to see what we're actually doing
and why.
At a high level, there are two classes of RSB attacks: RSB poisoning
(Intel and AMD) and RSB underflow (Intel only). They must each be
considered individually for each attack vector (and microarchitecture
where applicable).
----
RSB poisoning (Intel and AMD)
=============================
SpectreRSB
~~~~~~~~~~
RSB poisoning is a technique used by SpectreRSB [#spectre-rsb]_ where
an attacker poisons an RSB entry to cause a victim's return instruction
to speculate to an attacker-controlled address. This can happen when
there are unbalanced CALLs/RETs after a context switch or VMEXIT.
* All attack vectors can potentially be mitigated by flushing out any
poisoned RSB entries using an RSB filling sequence
[#intel-rsb-filling]_ [#amd-rsb-filling]_ when transitioning between
untrusted and trusted domains. But this has a performance impact and
should be avoided whenever possible.
.. DANGER::
**FIXME**: Currently we're flushing 32 entries. However, some CPU
models have more than 32 entries. The loop count needs to be
increased for those. More detailed information is needed about RSB
sizes.
* On context switch, the user->user mitigation requires ensuring the
RSB gets filled or cleared whenever IBPB gets written [#cond-ibpb]_
during a context switch:
* AMD:
On Zen 4+, IBPB (or SBPB [#amd-sbpb]_ if used) clears the RSB.
This is indicated by IBPB_RET in CPUID [#amd-ibpb-rsb]_.
On Zen < 4, the RSB filling sequence [#amd-rsb-filling]_ must be
always be done in addition to IBPB [#amd-ibpb-no-rsb]_. This is
indicated by X86_BUG_IBPB_NO_RET.
* Intel:
IBPB always clears the RSB:
"Software that executed before the IBPB command cannot control
the predicted targets of indirect branches executed after the
command on the same logical processor. The term indirect branch
in this context includes near return instructions, so these
predicted targets may come from the RSB." [#intel-ibpb-rsb]_
* On context switch, user->kernel attacks are prevented by SMEP. User
space can only insert user space addresses into the RSB. Even
non-canonical addresses can't be inserted due to the page gap at the
end of the user canonical address space reserved by TASK_SIZE_MAX.
A SMEP #PF at instruction fetch prevents the kernel from speculatively
executing user space.
* AMD:
"Finally, branches that are predicted as 'ret' instructions get
their predicted targets from the Return Address Predictor (RAP).
AMD recommends software use a RAP stuffing sequence (mitigation
V2-3 in [2]) and/or Supervisor Mode Execution Protection (SMEP)
to ensure that the addresses in the RAP are safe for
speculation. Collectively, we refer to these mitigations as "RAP
Protection"." [#amd-smep-rsb]_
* Intel:
"On processors with enhanced IBRS, an RSB overwrite sequence may
not suffice to prevent the predicted target of a near return
from using an RSB entry created in a less privileged predictor
mode. Software can prevent this by enabling SMEP (for
transitions from user mode to supervisor mode) and by having
IA32_SPEC_CTRL.IBRS set during VM exits." [#intel-smep-rsb]_
* On VMEXIT, guest->host attacks are mitigated by eIBRS (and PBRSB
mitigation if needed):
* AMD:
"When Automatic IBRS is enabled, the internal return address
stack used for return address predictions is cleared on VMEXIT."
[#amd-eibrs-vmexit]_
* Intel:
"On processors with enhanced IBRS, an RSB overwrite sequence may
not suffice to prevent the predicted target of a near return
from using an RSB entry created in a less privileged predictor
mode. Software can prevent this by enabling SMEP (for
transitions from user mode to supervisor mode) and by having
IA32_SPEC_CTRL.IBRS set during VM exits. Processors with
enhanced IBRS still support the usage model where IBRS is set
only in the OS/VMM for OSes that enable SMEP. To do this, such
processors will ensure that guest behavior cannot control the
RSB after a VM exit once IBRS is set, even if IBRS was not set
at the time of the VM exit." [#intel-eibrs-vmexit]_
Note that some Intel CPUs are susceptible to Post-barrier Return
Stack Buffer Predictions (PBRSB) [#intel-pbrsb]_, where the last
CALL from the guest can be used to predict the first unbalanced RET.
In this case the PBRSB mitigation is needed in addition to eIBRS.
AMD RETBleed / SRSO / Branch Type Confusion
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
On AMD, poisoned RSB entries can also be created by the AMD RETBleed
variant [#retbleed-paper]_ [#amd-btc]_ or by Speculative Return Stack
Overflow [#amd-srso]_ (Inception [#inception-paper]_). The kernel
protects itself by replacing every RET in the kernel with a branch to a
single safe RET.
----
RSB underflow (Intel only)
==========================
RSB Alternate (RSBA) ("Intel Retbleed")
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Some Intel Skylake-generation CPUs are susceptible to the Intel variant
of RETBleed [#retbleed-paper]_ (Return Stack Buffer Underflow
[#intel-rsbu]_). If a RET is executed when the RSB buffer is empty due
to mismatched CALLs/RETs or returning from a deep call stack, the branch
predictor can fall back to using the Branch Target Buffer (BTB). If a
user forces a BTB collision then the RET can speculatively branch to a
user-controlled address.
* Note that RSB filling doesn't fully mitigate this issue. If there
are enough unbalanced RETs, the RSB may still underflow and fall back
to using a poisoned BTB entry.
* On context switch, user->user underflow attacks are mitigated by the
conditional IBPB [#cond-ibpb]_ on context switch which effectively
clears the BTB:
* "The indirect branch predictor barrier (IBPB) is an indirect branch
control mechanism that establishes a barrier, preventing software
that executed before the barrier from controlling the predicted
targets of indirect branches executed after the barrier on the same
logical processor." [#intel-ibpb-btb]_
* On context switch and VMEXIT, user->kernel and guest->host RSB
underflows are mitigated by IBRS or eIBRS:
* "Enabling IBRS (including enhanced IBRS) will mitigate the "RSBU"
attack demonstrated by the researchers. As previously documented,
Intel recommends the use of enhanced IBRS, where supported. This
includes any processor that enumerates RRSBA but not RRSBA_DIS_S."
[#intel-rsbu]_
However, note that eIBRS and IBRS do not mitigate intra-mode attacks.
Like RRSBA below, this is mitigated by clearing the BHB on kernel
entry.
As an alternative to classic IBRS, call depth tracking (combined with
retpolines) can be used to track kernel returns and fill the RSB when
it gets close to being empty.
Restricted RSB Alternate (RRSBA)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Some newer Intel CPUs have Restricted RSB Alternate (RRSBA) behavior,
which, similar to RSBA described above, also falls back to using the BTB
on RSB underflow. The only difference is that the predicted targets are
restricted to the current domain when eIBRS is enabled:
* "Restricted RSB Alternate (RRSBA) behavior allows alternate branch
predictors to be used by near RET instructions when the RSB is
empty. When eIBRS is enabled, the predicted targets of these
alternate predictors are restricted to those belonging to the
indirect branch predictor entries of the current prediction domain.
[#intel-eibrs-rrsba]_
When a CPU with RRSBA is vulnerable to Branch History Injection
[#bhi-paper]_ [#intel-bhi]_, an RSB underflow could be used for an
intra-mode BTI attack. This is mitigated by clearing the BHB on
kernel entry.
However if the kernel uses retpolines instead of eIBRS, it needs to
disable RRSBA:
* "Where software is using retpoline as a mitigation for BHI or
intra-mode BTI, and the processor both enumerates RRSBA and
enumerates RRSBA_DIS controls, it should disable this behavior."
[#intel-retpoline-rrsba]_
----
References
==========
.. [#spectre-rsb] `Spectre Returns! Speculation Attacks using the Return Stack Buffer <https://arxiv.org/pdf/1807.07940.pdf>`_
.. [#intel-rsb-filling] "Empty RSB Mitigation on Skylake-generation" in `Retpoline: A Branch Target Injection Mitigation <https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/retpoline-branch-target-injection-mitigation.html#inpage-nav-5-1>`_
.. [#amd-rsb-filling] "Mitigation V2-3" in `Software Techniques for Managing Speculation <https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/software-techniques-for-managing-speculation.pdf>`_
.. [#cond-ibpb] Whether IBPB is written depends on whether the prev and/or next task is protected from Spectre attacks. It typically requires opting in per task or system-wide. For more details see the documentation for the ``spectre_v2_user`` cmdline option in Documentation/admin-guide/kernel-parameters.txt.
.. [#amd-sbpb] IBPB without flushing of branch type predictions. Only exists for AMD.
.. [#amd-ibpb-rsb] "Function 8000_0008h -- Processor Capacity Parameters and Extended Feature Identification" in `AMD64 Architecture Programmer's Manual Volume 3: General-Purpose and System Instructions <https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/24594.pdf>`_. SBPB behaves the same way according to `this email <https://lore.kernel.org/5175b163a3736ca5fd01cedf406735636c99a>`_.
.. [#amd-ibpb-no-rsb] `Spectre Attacks: Exploiting Speculative Execution <https://comsec.ethz.ch/wp-content/files/ibpb_sp25.pdf>`_
.. [#intel-ibpb-rsb] "Introduction" in `Post-barrier Return Stack Buffer Predictions / CVE-2022-26373 / INTEL-SA-00706 <https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/post-barrier-return-stack-buffer-predictions.html>`_
.. [#amd-smep-rsb] "Existing Mitigations" in `Technical Guidance for Mitigating Branch Type Confusion <https://www.amd.com/content/dam/amd/en/documents/resources/technical-guidance-for-mitigating-branch-type-confusion.pdf>`_
.. [#intel-smep-rsb] "Enhanced IBRS" in `Indirect Branch Restricted Speculation <https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/indirect-branch-restricted-speculation.html>`_
.. [#amd-eibrs-vmexit] "Extended Feature Enable Register (EFER)" in `AMD64 Architecture Programmer's Manual Volume 2: System Programming <https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/24593.pdf>`_
.. [#intel-eibrs-vmexit] "Enhanced IBRS" in `Indirect Branch Restricted Speculation <https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/indirect-branch-restricted-speculation.html>`_
.. [#intel-pbrsb] `Post-barrier Return Stack Buffer Predictions / CVE-2022-26373 / INTEL-SA-00706 <https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/post-barrier-return-stack-buffer-predictions.html>`_
.. [#retbleed-paper] `RETBleed: Arbitrary Speculative Code Execution with Return Instruction <https://comsec.ethz.ch/wp-content/files/retbleed_sec22.pdf>`_
.. [#amd-btc] `Technical Guidance for Mitigating Branch Type Confusion <https://www.amd.com/content/dam/amd/en/documents/resources/technical-guidance-for-mitigating-branch-type-confusion.pdf>`_
.. [#amd-srso] `Technical Update Regarding Speculative Return Stack Overflow <https://www.amd.com/content/dam/amd/en/documents/corporate/cr/speculative-return-stack-overflow-whitepaper.pdf>`_
.. [#inception-paper] `Inception: Exposing New Attack Surfaces with Training in Transient Execution <https://comsec.ethz.ch/wp-content/files/inception_sec23.pdf>`_
.. [#intel-rsbu] `Return Stack Buffer Underflow / Return Stack Buffer Underflow / CVE-2022-29901, CVE-2022-28693 / INTEL-SA-00702 <https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/return-stack-buffer-underflow.html>`_
.. [#intel-ibpb-btb] `Indirect Branch Predictor Barrier' <https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/indirect-branch-predictor-barrier.html>`_
.. [#intel-eibrs-rrsba] "Guidance for RSBU" in `Return Stack Buffer Underflow / Return Stack Buffer Underflow / CVE-2022-29901, CVE-2022-28693 / INTEL-SA-00702 <https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/return-stack-buffer-underflow.html>`_
.. [#bhi-paper] `Branch History Injection: On the Effectiveness of Hardware Mitigations Against Cross-Privilege Spectre-v2 Attacks <http://download.vusec.net/papers/bhi-spectre-bhb_sec22.pdf>`_
.. [#intel-bhi] `Branch History Injection and Intra-mode Branch Target Injection / CVE-2022-0001, CVE-2022-0002 / INTEL-SA-00598 <https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/branch-history-injection.html>`_
.. [#intel-retpoline-rrsba] "Retpoline" in `Branch History Injection and Intra-mode Branch Target Injection / CVE-2022-0001, CVE-2022-0002 / INTEL-SA-00598 <https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/branch-history-injection.html>`_

View file

@ -1407,18 +1407,15 @@
earlyprintk=serial[,0x...[,baudrate]]
earlyprintk=ttySn[,baudrate]
earlyprintk=dbgp[debugController#]
earlyprintk=mmio32,membase[,{nocfg|baudrate}]
earlyprintk=pciserial[,force],bus:device.function[,{nocfg|baudrate}]
earlyprintk=xdbc[xhciController#]
earlyprintk=bios
earlyprintk=mmio,membase[,{nocfg|baudrate}]
earlyprintk is useful when the kernel crashes before
the normal console is initialized. It is not enabled by
default because it has some cosmetic problems.
Only 32-bit memory addresses are supported for "mmio"
and "pciserial" devices.
Use "nocfg" to skip UART configuration, assume
BIOS/firmware has configured UART correctly.

View file

@ -79,8 +79,9 @@ feature flags.
How are feature flags created?
==============================
a: Feature flags can be derived from the contents of CPUID leaves.
------------------------------------------------------------------
Feature flags can be derived from the contents of CPUID leaves
--------------------------------------------------------------
These feature definitions are organized mirroring the layout of CPUID
leaves and grouped in words with offsets as mapped in enum cpuid_leafs
in cpufeatures.h (see arch/x86/include/asm/cpufeatures.h for details).
@ -89,8 +90,9 @@ cpufeatures.h, and if it is detected at run time, the flags will be
displayed accordingly in /proc/cpuinfo. For example, the flag "avx2"
comes from X86_FEATURE_AVX2 in cpufeatures.h.
b: Flags can be from scattered CPUID-based features.
----------------------------------------------------
Flags can be from scattered CPUID-based features
------------------------------------------------
Hardware features enumerated in sparsely populated CPUID leaves get
software-defined values. Still, CPUID needs to be queried to determine
if a given feature is present. This is done in init_scattered_cpuid_features().
@ -104,8 +106,9 @@ has only one feature and would waste 31 bits of space in the x86_capability[]
array. Since there is a struct cpuinfo_x86 for each possible CPU, the wasted
memory is not trivial.
c: Flags can be created synthetically under certain conditions for hardware features.
-------------------------------------------------------------------------------------
Flags can be created synthetically under certain conditions for hardware features
---------------------------------------------------------------------------------
Examples of conditions include whether certain features are present in
MSR_IA32_CORE_CAPS or specific CPU models are identified. If the needed
conditions are met, the features are enabled by the set_cpu_cap or
@ -114,8 +117,8 @@ the feature X86_FEATURE_SPLIT_LOCK_DETECT will be enabled and
"split_lock_detect" will be displayed. The flag "ring3mwait" will be
displayed only when running on INTEL_XEON_PHI_[KNL|KNM] processors.
d: Flags can represent purely software features.
------------------------------------------------
Flags can represent purely software features
--------------------------------------------
These flags do not represent hardware features. Instead, they represent a
software feature implemented in the kernel. For example, Kernel Page Table
Isolation is purely software feature and its feature flag X86_FEATURE_PTI is
@ -130,14 +133,18 @@ x86_cap/bug_flags[] arrays in kernel/cpu/capflags.c. The names in the
resulting x86_cap/bug_flags[] are used to populate /proc/cpuinfo. The naming
of flags in the x86_cap/bug_flags[] are as follows:
a: The name of the flag is from the string in X86_FEATURE_<name> by default.
----------------------------------------------------------------------------
By default, the flag <name> in /proc/cpuinfo is extracted from the respective
X86_FEATURE_<name> in cpufeatures.h. For example, the flag "avx2" is from
X86_FEATURE_AVX2.
Flags do not appear by default in /proc/cpuinfo
-----------------------------------------------
Feature flags are omitted by default from /proc/cpuinfo as it does not make
sense for the feature to be exposed to userspace in most cases. For example,
X86_FEATURE_ALWAYS is defined in cpufeatures.h but that flag is an internal
kernel feature used in the alternative runtime patching functionality. So the
flag does not appear in /proc/cpuinfo.
Specify a flag name if absolutely needed
----------------------------------------
b: The naming can be overridden.
--------------------------------
If the comment on the line for the #define X86_FEATURE_* starts with a
double-quote character (""), the string inside the double-quote characters
will be the name of the flags. For example, the flag "sse4_1" comes from
@ -148,36 +155,31 @@ needed. For instance, /proc/cpuinfo is a userspace interface and must remain
constant. If, for some reason, the naming of X86_FEATURE_<name> changes, one
shall override the new naming with the name already used in /proc/cpuinfo.
c: The naming override can be "", which means it will not appear in /proc/cpuinfo.
----------------------------------------------------------------------------------
The feature shall be omitted from /proc/cpuinfo if it does not make sense for
the feature to be exposed to userspace. For example, X86_FEATURE_ALWAYS is
defined in cpufeatures.h but that flag is an internal kernel feature used
in the alternative runtime patching functionality. So, its name is overridden
with "". Its flag will not appear in /proc/cpuinfo.
Flags are missing when one or more of these happen
==================================================
a: The hardware does not enumerate support for it.
--------------------------------------------------
The hardware does not enumerate support for it
----------------------------------------------
For example, when a new kernel is running on old hardware or the feature is
not enabled by boot firmware. Even if the hardware is new, there might be a
problem enabling the feature at run time, the flag will not be displayed.
b: The kernel does not know about the flag.
-------------------------------------------
The kernel does not know about the flag
---------------------------------------
For example, when an old kernel is running on new hardware.
c: The kernel disabled support for it at compile-time.
------------------------------------------------------
The kernel disabled support for it at compile-time
--------------------------------------------------
For example, if 5-level-paging is not enabled when building (i.e.,
CONFIG_X86_5LEVEL is not selected) the flag "la57" will not show up [#f1]_.
Even though the feature will still be detected via CPUID, the kernel disables
it by clearing via setup_clear_cpu_cap(X86_FEATURE_LA57).
d: The feature is disabled at boot-time.
----------------------------------------
The feature is disabled at boot-time
------------------------------------
A feature can be disabled either using a command-line parameter or because
it failed to be enabled. The command-line parameter clearcpuid= can be used
to disable features using the feature number as defined in
@ -190,8 +192,9 @@ disable specific features. The list of parameters includes, but is not limited
to, nofsgsbase, nosgx, noxsave, etc. 5-level paging can also be disabled using
"no5lvl".
e: The feature was known to be non-functional.
----------------------------------------------
The feature was known to be non-functional
------------------------------------------
The feature was known to be non-functional because a dependency was
missing at runtime. For example, AVX flags will not show up if XSAVE feature
is disabled since they depend on XSAVE feature. Another example would be broken

View file

@ -17,19 +17,20 @@
.pushsection .noinstr.text, "ax"
SYM_FUNC_START(entry_ibpb)
/* Clobbers AX, CX, DX */
SYM_FUNC_START(write_ibpb)
ANNOTATE_NOENDBR
movl $MSR_IA32_PRED_CMD, %ecx
movl $PRED_CMD_IBPB, %eax
movl _ASM_RIP(x86_pred_cmd), %eax
xorl %edx, %edx
wrmsr
/* Make sure IBPB clears return stack preductions too. */
FILL_RETURN_BUFFER %rax, RSB_CLEAR_LOOPS, X86_BUG_IBPB_NO_RET
RET
SYM_FUNC_END(entry_ibpb)
SYM_FUNC_END(write_ibpb)
/* For KVM */
EXPORT_SYMBOL_GPL(entry_ibpb);
EXPORT_SYMBOL_GPL(write_ibpb);
.popsection

View file

@ -269,7 +269,7 @@
* typically has NO_MELTDOWN).
*
* While retbleed_untrain_ret() doesn't clobber anything but requires stack,
* entry_ibpb() will clobber AX, CX, DX.
* write_ibpb() will clobber AX, CX, DX.
*
* As such, this must be placed after every *SWITCH_TO_KERNEL_CR3 at a point
* where we have a stack but before any RET instruction.
@ -279,7 +279,7 @@
VALIDATE_UNRET_END
CALL_UNTRAIN_RET
ALTERNATIVE_2 "", \
"call entry_ibpb", \ibpb_feature, \
"call write_ibpb", \ibpb_feature, \
__stringify(\call_depth_insns), X86_FEATURE_CALL_DEPTH
#endif
.endm
@ -368,7 +368,7 @@ extern void srso_return_thunk(void);
extern void srso_alias_return_thunk(void);
extern void entry_untrain_ret(void);
extern void entry_ibpb(void);
extern void write_ibpb(void);
#ifdef CONFIG_X86_64
extern void clear_bhb_loop(void);
@ -514,11 +514,11 @@ void alternative_msr_write(unsigned int msr, u64 val, unsigned int feature)
: "memory");
}
extern u64 x86_pred_cmd;
static inline void indirect_branch_prediction_barrier(void)
{
alternative_msr_write(MSR_IA32_PRED_CMD, x86_pred_cmd, X86_FEATURE_IBPB);
asm_inline volatile(ALTERNATIVE("", "call write_ibpb", X86_FEATURE_IBPB)
: ASM_CALL_CONSTRAINT
:: "rax", "rcx", "rdx", "memory");
}
/* The Intel SPEC CTRL MSR base value cache */

View file

@ -23,6 +23,8 @@
#include <linux/serial_core.h>
#include <linux/pgtable.h>
#include <xen/xen.h>
#include <asm/e820/api.h>
#include <asm/irqdomain.h>
#include <asm/pci_x86.h>
@ -1729,6 +1731,15 @@ int __init acpi_mps_check(void)
{
#if defined(CONFIG_X86_LOCAL_APIC) && !defined(CONFIG_X86_MPPARSE)
/* mptable code is not built-in*/
/*
* Xen disables ACPI in PV DomU guests but it still emulates APIC and
* supports SMP. Returning early here ensures that APIC is not disabled
* unnecessarily and the guest is not limited to a single vCPU.
*/
if (xen_pv_domain() && !xen_initial_domain())
return 0;
if (acpi_disabled || acpi_noirq) {
pr_warn("MPS support code is not built-in, using acpi=off or acpi=noirq or pci=noacpi may have problem\n");
return 1;

View file

@ -805,6 +805,7 @@ static void init_amd_bd(struct cpuinfo_x86 *c)
static const struct x86_cpu_id erratum_1386_microcode[] = {
X86_MATCH_VFM_STEPS(VFM_MAKE(X86_VENDOR_AMD, 0x17, 0x01), 0x2, 0x2, 0x0800126e),
X86_MATCH_VFM_STEPS(VFM_MAKE(X86_VENDOR_AMD, 0x17, 0x31), 0x0, 0x0, 0x08301052),
{}
};
static void fix_erratum_1386(struct cpuinfo_x86 *c)

View file

@ -59,7 +59,6 @@ DEFINE_PER_CPU(u64, x86_spec_ctrl_current);
EXPORT_PER_CPU_SYMBOL_GPL(x86_spec_ctrl_current);
u64 x86_pred_cmd __ro_after_init = PRED_CMD_IBPB;
EXPORT_SYMBOL_GPL(x86_pred_cmd);
static u64 __ro_after_init x86_arch_cap_msr;
@ -1142,7 +1141,7 @@ do_cmd_auto:
setup_clear_cpu_cap(X86_FEATURE_RETHUNK);
/*
* There is no need for RSB filling: entry_ibpb() ensures
* There is no need for RSB filling: write_ibpb() ensures
* all predictions, including the RSB, are invalidated,
* regardless of IBPB implementation.
*/
@ -1592,51 +1591,54 @@ static void __init spec_ctrl_disable_kernel_rrsba(void)
rrsba_disabled = true;
}
static void __init spectre_v2_determine_rsb_fill_type_at_vmexit(enum spectre_v2_mitigation mode)
static void __init spectre_v2_select_rsb_mitigation(enum spectre_v2_mitigation mode)
{
/*
* Similar to context switches, there are two types of RSB attacks
* after VM exit:
* WARNING! There are many subtleties to consider when changing *any*
* code related to RSB-related mitigations. Before doing so, carefully
* read the following document, and update if necessary:
*
* 1) RSB underflow
* Documentation/admin-guide/hw-vuln/rsb.rst
*
* 2) Poisoned RSB entry
* In an overly simplified nutshell:
*
* When retpoline is enabled, both are mitigated by filling/clearing
* the RSB.
* - User->user RSB attacks are conditionally mitigated during
* context switches by cond_mitigation -> write_ibpb().
*
* When IBRS is enabled, while #1 would be mitigated by the IBRS branch
* prediction isolation protections, RSB still needs to be cleared
* because of #2. Note that SMEP provides no protection here, unlike
* user-space-poisoned RSB entries.
* - User->kernel and guest->host attacks are mitigated by eIBRS or
* RSB filling.
*
* eIBRS should protect against RSB poisoning, but if the EIBRS_PBRSB
* bug is present then a LITE version of RSB protection is required,
* just a single call needs to retire before a RET is executed.
* Though, depending on config, note that other alternative
* mitigations may end up getting used instead, e.g., IBPB on
* entry/vmexit, call depth tracking, or return thunks.
*/
switch (mode) {
case SPECTRE_V2_NONE:
return;
break;
case SPECTRE_V2_EIBRS_LFENCE:
case SPECTRE_V2_EIBRS:
if (boot_cpu_has_bug(X86_BUG_EIBRS_PBRSB)) {
setup_force_cpu_cap(X86_FEATURE_RSB_VMEXIT_LITE);
pr_info("Spectre v2 / PBRSB-eIBRS: Retire a single CALL on VMEXIT\n");
}
return;
case SPECTRE_V2_EIBRS_LFENCE:
case SPECTRE_V2_EIBRS_RETPOLINE:
if (boot_cpu_has_bug(X86_BUG_EIBRS_PBRSB)) {
pr_info("Spectre v2 / PBRSB-eIBRS: Retire a single CALL on VMEXIT\n");
setup_force_cpu_cap(X86_FEATURE_RSB_VMEXIT_LITE);
}
break;
case SPECTRE_V2_RETPOLINE:
case SPECTRE_V2_LFENCE:
case SPECTRE_V2_IBRS:
pr_info("Spectre v2 / SpectreRSB: Filling RSB on context switch and VMEXIT\n");
setup_force_cpu_cap(X86_FEATURE_RSB_CTXSW);
setup_force_cpu_cap(X86_FEATURE_RSB_VMEXIT);
pr_info("Spectre v2 / SpectreRSB : Filling RSB on VMEXIT\n");
return;
}
break;
pr_warn_once("Unknown Spectre v2 mode, disabling RSB mitigation at VM exit");
dump_stack();
default:
pr_warn_once("Unknown Spectre v2 mode, disabling RSB mitigation\n");
dump_stack();
break;
}
}
/*
@ -1830,48 +1832,7 @@ static void __init spectre_v2_select_mitigation(void)
spectre_v2_enabled = mode;
pr_info("%s\n", spectre_v2_strings[mode]);
/*
* If Spectre v2 protection has been enabled, fill the RSB during a
* context switch. In general there are two types of RSB attacks
* across context switches, for which the CALLs/RETs may be unbalanced.
*
* 1) RSB underflow
*
* Some Intel parts have "bottomless RSB". When the RSB is empty,
* speculated return targets may come from the branch predictor,
* which could have a user-poisoned BTB or BHB entry.
*
* AMD has it even worse: *all* returns are speculated from the BTB,
* regardless of the state of the RSB.
*
* When IBRS or eIBRS is enabled, the "user -> kernel" attack
* scenario is mitigated by the IBRS branch prediction isolation
* properties, so the RSB buffer filling wouldn't be necessary to
* protect against this type of attack.
*
* The "user -> user" attack scenario is mitigated by RSB filling.
*
* 2) Poisoned RSB entry
*
* If the 'next' in-kernel return stack is shorter than 'prev',
* 'next' could be tricked into speculating with a user-poisoned RSB
* entry.
*
* The "user -> kernel" attack scenario is mitigated by SMEP and
* eIBRS.
*
* The "user -> user" scenario, also known as SpectreBHB, requires
* RSB clearing.
*
* So to mitigate all cases, unconditionally fill RSB on context
* switches.
*
* FIXME: Is this pointless for retbleed-affected AMD?
*/
setup_force_cpu_cap(X86_FEATURE_RSB_CTXSW);
pr_info("Spectre v2 / SpectreRSB mitigation: Filling RSB on context switch\n");
spectre_v2_determine_rsb_fill_type_at_vmexit(mode);
spectre_v2_select_rsb_mitigation(mode);
/*
* Retpoline protects the kernel, but doesn't protect firmware. IBRS
@ -2676,7 +2637,7 @@ static void __init srso_select_mitigation(void)
setup_clear_cpu_cap(X86_FEATURE_RETHUNK);
/*
* There is no need for RSB filling: entry_ibpb() ensures
* There is no need for RSB filling: write_ibpb() ensures
* all predictions, including the RSB, are invalidated,
* regardless of IBPB implementation.
*/
@ -2701,7 +2662,7 @@ ibpb_on_vmexit:
srso_mitigation = SRSO_MITIGATION_IBPB_ON_VMEXIT;
/*
* There is no need for RSB filling: entry_ibpb() ensures
* There is no need for RSB filling: write_ibpb() ensures
* all predictions, including the RSB, are invalidated,
* regardless of IBPB implementation.
*/

View file

@ -3553,6 +3553,22 @@ static void mkdir_rdt_prepare_rmid_free(struct rdtgroup *rgrp)
free_rmid(rgrp->closid, rgrp->mon.rmid);
}
/*
* We allow creating mon groups only with in a directory called "mon_groups"
* which is present in every ctrl_mon group. Check if this is a valid
* "mon_groups" directory.
*
* 1. The directory should be named "mon_groups".
* 2. The mon group itself should "not" be named "mon_groups".
* This makes sure "mon_groups" directory always has a ctrl_mon group
* as parent.
*/
static bool is_mon_groups(struct kernfs_node *kn, const char *name)
{
return (!strcmp(rdt_kn_name(kn), "mon_groups") &&
strcmp(name, "mon_groups"));
}
static int mkdir_rdt_prepare(struct kernfs_node *parent_kn,
const char *name, umode_t mode,
enum rdt_group_type rtype, struct rdtgroup **r)
@ -3568,6 +3584,15 @@ static int mkdir_rdt_prepare(struct kernfs_node *parent_kn,
goto out_unlock;
}
/*
* Check that the parent directory for a monitor group is a "mon_groups"
* directory.
*/
if (rtype == RDTMON_GROUP && !is_mon_groups(parent_kn, name)) {
ret = -EPERM;
goto out_unlock;
}
if (rtype == RDTMON_GROUP &&
(prdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP ||
prdtgrp->mode == RDT_MODE_PSEUDO_LOCKED)) {
@ -3751,22 +3776,6 @@ out_unlock:
return ret;
}
/*
* We allow creating mon groups only with in a directory called "mon_groups"
* which is present in every ctrl_mon group. Check if this is a valid
* "mon_groups" directory.
*
* 1. The directory should be named "mon_groups".
* 2. The mon group itself should "not" be named "mon_groups".
* This makes sure "mon_groups" directory always has a ctrl_mon group
* as parent.
*/
static bool is_mon_groups(struct kernfs_node *kn, const char *name)
{
return (!strcmp(rdt_kn_name(kn), "mon_groups") &&
strcmp(name, "mon_groups"));
}
static int rdtgroup_mkdir(struct kernfs_node *parent_kn, const char *name,
umode_t mode)
{
@ -3782,11 +3791,8 @@ static int rdtgroup_mkdir(struct kernfs_node *parent_kn, const char *name,
if (resctrl_arch_alloc_capable() && parent_kn == rdtgroup_default.kn)
return rdtgroup_mkdir_ctrl_mon(parent_kn, name, mode);
/*
* If RDT monitoring is supported and the parent directory is a valid
* "mon_groups" directory, add a monitoring subdirectory.
*/
if (resctrl_arch_mon_capable() && is_mon_groups(parent_kn, name))
/* Else, attempt to add a monitoring subdirectory. */
if (resctrl_arch_mon_capable())
return rdtgroup_mkdir_mon(parent_kn, name, mode);
return -EPERM;

View file

@ -753,22 +753,21 @@ void __init e820__memory_setup_extended(u64 phys_addr, u32 data_len)
void __init e820__register_nosave_regions(unsigned long limit_pfn)
{
int i;
unsigned long pfn = 0;
u64 last_addr = 0;
for (i = 0; i < e820_table->nr_entries; i++) {
struct e820_entry *entry = &e820_table->entries[i];
if (pfn < PFN_UP(entry->addr))
register_nosave_region(pfn, PFN_UP(entry->addr));
pfn = PFN_DOWN(entry->addr + entry->size);
if (entry->type != E820_TYPE_RAM)
register_nosave_region(PFN_UP(entry->addr), pfn);
continue;
if (pfn >= limit_pfn)
break;
if (last_addr < entry->addr)
register_nosave_region(PFN_DOWN(last_addr), PFN_UP(entry->addr));
last_addr = entry->addr + entry->size;
}
register_nosave_region(PFN_DOWN(last_addr), limit_pfn);
}
#ifdef CONFIG_ACPI

View file

@ -389,10 +389,10 @@ static int __init setup_early_printk(char *buf)
keep = (strstr(buf, "keep") != NULL);
while (*buf != '\0') {
if (!strncmp(buf, "mmio", 4)) {
early_mmio_serial_init(buf + 4);
if (!strncmp(buf, "mmio32", 6)) {
buf += 6;
early_mmio_serial_init(buf);
early_console_register(&early_serial_console, keep);
buf += 4;
}
if (!strncmp(buf, "serial", 6)) {
buf += 6;
@ -407,9 +407,9 @@ static int __init setup_early_printk(char *buf)
}
#ifdef CONFIG_PCI
if (!strncmp(buf, "pciserial", 9)) {
early_pci_serial_init(buf + 9);
buf += 9; /* Keep from match the above "pciserial" */
early_pci_serial_init(buf);
early_console_register(&early_serial_console, keep);
buf += 9; /* Keep from match the above "serial" */
}
#endif
if (!strncmp(buf, "vga", 3) &&

View file

@ -667,9 +667,9 @@ static void cond_mitigation(struct task_struct *next)
prev_mm = this_cpu_read(cpu_tlbstate.last_user_mm_spec);
/*
* Avoid user/user BTB poisoning by flushing the branch predictor
* when switching between processes. This stops one process from
* doing Spectre-v2 attacks on another.
* Avoid user->user BTB/RSB poisoning by flushing them when switching
* between processes. This stops one process from doing Spectre-v2
* attacks on another.
*
* Both, the conditional and the always IBPB mode use the mm
* pointer to avoid the IBPB when switching between tasks of the

View file

@ -26,7 +26,7 @@
/* code below belongs to the image kernel */
.align PAGE_SIZE
SYM_FUNC_START(restore_registers)
ANNOTATE_NOENDBR
ENDBR
/* go back to the original page tables */
movq %r9, %cr3
@ -120,7 +120,7 @@ SYM_FUNC_END(restore_image)
/* code below has been relocated to a safe page */
SYM_FUNC_START(core_restore_code)
ANNOTATE_NOENDBR
ENDBR
/* switch to temporary page tables */
movq %rax, %cr3
/* flush TLB */