mirror of
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
synced 2025-04-13 09:59:31 +00:00
Merge tag 'x86-urgent-2025-04-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull misc x86 fixes from Ingo Molnar:

 - Fix a CPU topology related regression that limited Xen PV guests to
   a single CPU

 - Fix ancient e820__register_nosave_regions() bugs that were causing
   problems with kexec's artificial memory maps

 - Fix an S4 hibernation crash caused by two missing ENDBRs that were
   mistakenly removed in a recent commit

 - Fix a resctrl serialization bug

 - Fix early_printk documentation and comments

 - Fix RSB bugs, combined with preparatory updates to better match the
   code to vendor recommendations

 - Add an RSB mitigation document

 - Fix/update documentation

 - Fix the erratum_1386_microcode[] table to be NULL terminated

Signed-off-by: Ingo Molnar <mingo@kernel.org>

-----BEGIN PGP SIGNATURE-----

iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmf4Na0RHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1iy0hAAw03t9IGCgFEbzFkm2jRvoR/kUBnh7Q+B
E1LLYjlYws0TLcxTFIkc3slI2dt0LE6YN6kHT4gzJmE2Rp7G3oKR9xGwW/soJEuv
+hTZ4ueY8TY2mEOwKUkY7xetBDI/e6iXqMnrXIVz1xIDwW3wyQ31jT+A7LzW7Gxn
CKKymIJQfH9eDJwakiTjrmsJRy2cmah5ajFmhrlt1bLDV1Ykts595HTZNFBnsDJq
mGxUwKZi0h9h6JZgLSZJQtUu2Pv3WmI/6DlkPG3cNZJIIfS7sMPj1LpQVTKMPQ19
zGzkHGAv6tgp7gIxse1MFoLiKEsAPR/iAL++o2PeyQkynXpVb0g6d6fvicGK/OAe
xWR4rf/LVluvvwRam9bYaIkDkahbT/uLe/dp99YEqclfBGSsHY1C8jhPiuVyOQQK
w5AS1D5LSqXVTxu1XWCVTAhfR5nPS+O5q2hEs4O8tEdWNeOQSeExOZ8z2lqyqeoG
VifCuQqcPbCja0msBWX9eEY/M/ie3AcasrfgD49Xj7oTBQOMXO70YeENM1fVzcko
NQFY8RqA+N/EmTaWJvJ8o88ZIvTKqosyTYOvQIq9ZJS7DeeVtPZ+wgJahiZbBKT7
4KSjLOO3ZvosrgafS35I4v5+zU0GO6B7rgWUKALFsSy52FgXk0ip4RpO6DPCsmRD
8GEpn0X19xM=
=1DWX
-----END PGP SIGNATURE-----

* tag 'x86-urgent-2025-04-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/ibt: Fix hibernate
  x86/cpu: Avoid running off the end of an AMD erratum table
  Documentation/x86: Zap the subsection letters
  Documentation/x86: Update the naming of CPU features for /proc/cpuinfo
  x86/bugs: Add RSB mitigation document
  x86/bugs: Don't fill RSB on context switch with eIBRS
  x86/bugs: Don't fill RSB on VMEXIT with eIBRS+retpoline
  x86/bugs: Fix RSB clearing in indirect_branch_prediction_barrier()
  x86/bugs: Use SBPB in write_ibpb() if applicable
  x86/bugs: Rename entry_ibpb() to write_ibpb()
  x86/early_printk: Use 'mmio32' for consistency, fix comments
  x86/resctrl: Fix rdtgroup_mkdir()'s unlocked use of kernfs_node::name
  x86/e820: Fix handling of subpage regions when calculating nosave ranges in e820__register_nosave_regions()
  x86/acpi: Don't limit CPUs to 1 for Xen PV guests due to disabled ACPI
This commit is contained in: commit 3c9de67dd3

14 changed files with 408 additions and 160 deletions
Documentation/admin-guide/hw-vuln/index.rst

@@ -22,3 +22,4 @@ are configurable at compile, boot or run time.
    srso
    gather_data_sampling
    reg-file-data-sampling
+   rsb
Documentation/admin-guide/hw-vuln/rsb.rst (new file, 268 lines)

@@ -0,0 +1,268 @@
.. SPDX-License-Identifier: GPL-2.0

=======================
RSB-related mitigations
=======================

.. warning::
   Please keep this document up-to-date, otherwise you will be
   volunteered to update it and convert it to a very long comment in
   bugs.c!

Since 2018 there have been many Spectre CVEs related to the Return Stack
Buffer (RSB) (sometimes referred to as the Return Address Stack (RAS) or
Return Address Predictor (RAP) on AMD).

Information about these CVEs and how to mitigate them is scattered
amongst a myriad of microarchitecture-specific documents.

This document attempts to consolidate all the relevant information in
one place and clarify the reasoning behind the current RSB-related
mitigations.  It's meant to be as concise as possible, focused only on
the current kernel mitigations: what are the RSB-related attack vectors
and how are they currently being mitigated?

It's *not* meant to describe how the RSB mechanism operates or how the
exploits work.  More details about those can be found in the references
below.

Rather, this is basically a glorified comment, but too long to actually
be one.  So when the next CVE comes along, a kernel developer can
quickly refer to this as a refresher to see what we're actually doing
and why.

At a high level, there are two classes of RSB attacks: RSB poisoning
(Intel and AMD) and RSB underflow (Intel only).  They must each be
considered individually for each attack vector (and microarchitecture
where applicable).

----

RSB poisoning (Intel and AMD)
=============================

SpectreRSB
~~~~~~~~~~

RSB poisoning is a technique used by SpectreRSB [#spectre-rsb]_ where
an attacker poisons an RSB entry to cause a victim's return instruction
to speculate to an attacker-controlled address.  This can happen when
there are unbalanced CALLs/RETs after a context switch or VMEXIT.
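The poisoning primitive described above can be illustrated with a toy software model of the RSB as a bounded LIFO: a RET consumes whatever entry the most recent CALL pushed, so code that executes one more CALL than RET leaves a stale, attacker-chosen entry behind for the next unbalanced RET. This is purely illustrative (the real RSB is hidden microarchitectural state, not software-visible):

```python
# Toy model of RSB poisoning -- illustration only; the real RSB is a
# hidden microarchitectural predictor, not software-visible state.

class ToyRSB:
    def __init__(self, size=32):
        self.size = size
        self.entries = []          # top of the list = next predicted target

    def call(self, return_addr):
        if len(self.entries) == self.size:
            self.entries.pop(0)    # oldest entry is displaced on overflow
        self.entries.append(return_addr)

    def ret(self):
        # Prediction: pop the top entry; None models an underflow.
        return self.entries.pop() if self.entries else None

def poisoning_demo():
    rsb = ToyRSB()
    rsb.call(0x400000)             # balanced CALL...
    assert rsb.ret() == 0x400000   # ...consumed by its matching RET
    rsb.call(0xdeadbeef)           # unbalanced CALL: entry left behind
    # Context switch with no RSB clearing: the victim's first unbalanced
    # RET speculates to the attacker-controlled address.
    return rsb.ret()
```

In the model, `poisoning_demo()` yields the attacker's address, which is exactly what the filling/clearing mitigations below are meant to prevent.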
* All attack vectors can potentially be mitigated by flushing out any
  poisoned RSB entries using an RSB filling sequence
  [#intel-rsb-filling]_ [#amd-rsb-filling]_ when transitioning between
  untrusted and trusted domains.  But this has a performance impact and
  should be avoided whenever possible.

  .. DANGER::
     **FIXME**: Currently we're flushing 32 entries.  However, some CPU
     models have more than 32 entries.  The loop count needs to be
     increased for those.  More detailed information is needed about RSB
     sizes.

* On context switch, the user->user mitigation requires ensuring the
  RSB gets filled or cleared whenever IBPB gets written [#cond-ibpb]_
  during a context switch:

  * AMD:
    On Zen 4+, IBPB (or SBPB [#amd-sbpb]_ if used) clears the RSB.
    This is indicated by IBPB_RET in CPUID [#amd-ibpb-rsb]_.

    On Zen < 4, the RSB filling sequence [#amd-rsb-filling]_ must
    always be done in addition to IBPB [#amd-ibpb-no-rsb]_.  This is
    indicated by X86_BUG_IBPB_NO_RET.

  * Intel:
    IBPB always clears the RSB:

    "Software that executed before the IBPB command cannot control
    the predicted targets of indirect branches executed after the
    command on the same logical processor.  The term indirect branch
    in this context includes near return instructions, so these
    predicted targets may come from the RSB." [#intel-ibpb-rsb]_

* On context switch, user->kernel attacks are prevented by SMEP.  User
  space can only insert user space addresses into the RSB.  Even
  non-canonical addresses can't be inserted due to the page gap at the
  end of the user canonical address space reserved by TASK_SIZE_MAX.
  A SMEP #PF at instruction fetch prevents the kernel from speculatively
  executing user space.

  * AMD:
    "Finally, branches that are predicted as 'ret' instructions get
    their predicted targets from the Return Address Predictor (RAP).
    AMD recommends software use a RAP stuffing sequence (mitigation
    V2-3 in [2]) and/or Supervisor Mode Execution Protection (SMEP)
    to ensure that the addresses in the RAP are safe for
    speculation.  Collectively, we refer to these mitigations as "RAP
    Protection"." [#amd-smep-rsb]_

  * Intel:
    "On processors with enhanced IBRS, an RSB overwrite sequence may
    not suffice to prevent the predicted target of a near return
    from using an RSB entry created in a less privileged predictor
    mode.  Software can prevent this by enabling SMEP (for
    transitions from user mode to supervisor mode) and by having
    IA32_SPEC_CTRL.IBRS set during VM exits." [#intel-smep-rsb]_

* On VMEXIT, guest->host attacks are mitigated by eIBRS (and PBRSB
  mitigation if needed):

  * AMD:
    "When Automatic IBRS is enabled, the internal return address
    stack used for return address predictions is cleared on VMEXIT."
    [#amd-eibrs-vmexit]_

  * Intel:
    "On processors with enhanced IBRS, an RSB overwrite sequence may
    not suffice to prevent the predicted target of a near return
    from using an RSB entry created in a less privileged predictor
    mode.  Software can prevent this by enabling SMEP (for
    transitions from user mode to supervisor mode) and by having
    IA32_SPEC_CTRL.IBRS set during VM exits.  Processors with
    enhanced IBRS still support the usage model where IBRS is set
    only in the OS/VMM for OSes that enable SMEP.  To do this, such
    processors will ensure that guest behavior cannot control the
    RSB after a VM exit once IBRS is set, even if IBRS was not set
    at the time of the VM exit." [#intel-eibrs-vmexit]_

  Note that some Intel CPUs are susceptible to Post-barrier Return
  Stack Buffer Predictions (PBRSB) [#intel-pbrsb]_, where the last
  CALL from the guest can be used to predict the first unbalanced RET.
  In this case the PBRSB mitigation is needed in addition to eIBRS.
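The RSB filling sequence referenced in the bullets above works by executing a loop of CALLs whose return addresses all land on a harmless speculation trap, displacing every potentially poisoned entry. Sketched against a toy software model (illustrative only; the real mitigations are the vendor-documented asm loops, and note the FIXME above about CPUs whose RSB holds more than 32 entries):

```python
# Toy model of the 32-entry RSB filling ("stuffing") sequence.
# Illustration only: the real sequence is an asm loop of CALLs whose
# return addresses all point at a benign speculation trap.

RSB_SIZE = 32
TRAP = 0x0            # stands in for the benign speculation-trap address

def rsb_fill(rsb):
    """Push RSB_SIZE benign entries, displacing any poisoned ones."""
    for _ in range(RSB_SIZE):
        if len(rsb) == RSB_SIZE:
            rsb.pop(0)            # oldest entry falls out
        rsb.append(TRAP)
    return rsb

def fill_demo():
    poisoned = [0xdeadbeef] * RSB_SIZE   # worst case: fully poisoned
    filled = rsb_fill(poisoned)
    # After the sequence, no RET can consume a poisoned entry.
    return all(entry == TRAP for entry in filled)
```

The model also makes the FIXME concrete: if the hardware RSB had, say, 48 entries, a 32-iteration loop would leave 16 poisoned entries reachable once the benign ones are consumed.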
AMD RETBleed / SRSO / Branch Type Confusion
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

On AMD, poisoned RSB entries can also be created by the AMD RETBleed
variant [#retbleed-paper]_ [#amd-btc]_ or by Speculative Return Stack
Overflow [#amd-srso]_ (Inception [#inception-paper]_).  The kernel
protects itself by replacing every RET in the kernel with a branch to a
single safe RET.

----
RSB underflow (Intel only)
==========================

RSB Alternate (RSBA) ("Intel Retbleed")
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Some Intel Skylake-generation CPUs are susceptible to the Intel variant
of RETBleed [#retbleed-paper]_ (Return Stack Buffer Underflow
[#intel-rsbu]_).  If a RET is executed when the RSB buffer is empty due
to mismatched CALLs/RETs or returning from a deep call stack, the branch
predictor can fall back to using the Branch Target Buffer (BTB).  If a
user forces a BTB collision then the RET can speculatively branch to a
user-controlled address.

* Note that RSB filling doesn't fully mitigate this issue.  If there
  are enough unbalanced RETs, the RSB may still underflow and fall back
  to using a poisoned BTB entry.

* On context switch, user->user underflow attacks are mitigated by the
  conditional IBPB [#cond-ibpb]_ on context switch which effectively
  clears the BTB:

  * "The indirect branch predictor barrier (IBPB) is an indirect branch
    control mechanism that establishes a barrier, preventing software
    that executed before the barrier from controlling the predicted
    targets of indirect branches executed after the barrier on the same
    logical processor." [#intel-ibpb-btb]_

* On context switch and VMEXIT, user->kernel and guest->host RSB
  underflows are mitigated by IBRS or eIBRS:

  * "Enabling IBRS (including enhanced IBRS) will mitigate the "RSBU"
    attack demonstrated by the researchers.  As previously documented,
    Intel recommends the use of enhanced IBRS, where supported.  This
    includes any processor that enumerates RRSBA but not RRSBA_DIS_S."
    [#intel-rsbu]_

  However, note that eIBRS and IBRS do not mitigate intra-mode attacks.
  Like RRSBA below, this is mitigated by clearing the BHB on kernel
  entry.

  As an alternative to classic IBRS, call depth tracking (combined with
  retpolines) can be used to track kernel returns and fill the RSB when
  it gets close to being empty.
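Call depth tracking, mentioned above as an alternative to classic IBRS, can be modeled as a counter: CALLs increment an approximate count of valid RSB entries, RETs decrement it, and when the count reaches zero the RSB is stuffed before the RET can underflow into the BTB. A simplified model (illustrative only; the kernel's real implementation patches per-CPU accounting into function entry and return paths):

```python
# Simplified model of call depth tracking: refill the RSB before a RET
# can underflow.  Names and sizes here are illustrative, not the
# kernel's actual data structures.

RSB_SIZE = 16          # hypothetical RSB size for the model

class DepthTracker:
    def __init__(self):
        self.depth = 0          # approximate count of valid RSB entries
        self.refills = 0        # how many stuffing sequences were run

    def on_call(self):
        self.depth = min(self.depth + 1, RSB_SIZE)

    def on_return(self):
        if self.depth == 0:
            # About to underflow: stuff the RSB instead of letting the
            # predictor fall back to the (possibly poisoned) BTB.
            self.refills += 1
            self.depth = RSB_SIZE
        self.depth -= 1

def depth_demo():
    t = DepthTracker()
    for _ in range(4):          # shallow call stack...
        t.on_call()
    for _ in range(20):         # ...then many unbalanced returns
        t.on_return()
    return t.refills            # refilled exactly once, never underflowed
```

The design tradeoff is the same as in the text: tracking costs a little on every call/return, but the expensive stuffing sequence runs only when the RSB is actually close to empty.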
Restricted RSB Alternate (RRSBA)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Some newer Intel CPUs have Restricted RSB Alternate (RRSBA) behavior,
which, similar to RSBA described above, also falls back to using the BTB
on RSB underflow.  The only difference is that the predicted targets are
restricted to the current domain when eIBRS is enabled:

* "Restricted RSB Alternate (RRSBA) behavior allows alternate branch
  predictors to be used by near RET instructions when the RSB is
  empty.  When eIBRS is enabled, the predicted targets of these
  alternate predictors are restricted to those belonging to the
  indirect branch predictor entries of the current prediction domain."
  [#intel-eibrs-rrsba]_

When a CPU with RRSBA is vulnerable to Branch History Injection
[#bhi-paper]_ [#intel-bhi]_, an RSB underflow could be used for an
intra-mode BTI attack.  This is mitigated by clearing the BHB on
kernel entry.

However, if the kernel uses retpolines instead of eIBRS, it needs to
disable RRSBA:

* "Where software is using retpoline as a mitigation for BHI or
  intra-mode BTI, and the processor both enumerates RRSBA and
  enumerates RRSBA_DIS controls, it should disable this behavior."
  [#intel-retpoline-rrsba]_

----

References
==========
.. [#spectre-rsb] `Spectre Returns! Speculation Attacks using the Return Stack Buffer <https://arxiv.org/pdf/1807.07940.pdf>`_

.. [#intel-rsb-filling] "Empty RSB Mitigation on Skylake-generation" in `Retpoline: A Branch Target Injection Mitigation <https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/retpoline-branch-target-injection-mitigation.html#inpage-nav-5-1>`_

.. [#amd-rsb-filling] "Mitigation V2-3" in `Software Techniques for Managing Speculation <https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/software-techniques-for-managing-speculation.pdf>`_

.. [#cond-ibpb] Whether IBPB is written depends on whether the prev and/or next task is protected from Spectre attacks.  It typically requires opting in per task or system-wide.  For more details see the documentation for the ``spectre_v2_user`` cmdline option in Documentation/admin-guide/kernel-parameters.txt.

.. [#amd-sbpb] IBPB without flushing of branch type predictions.  Only exists for AMD.

.. [#amd-ibpb-rsb] "Function 8000_0008h -- Processor Capacity Parameters and Extended Feature Identification" in `AMD64 Architecture Programmer's Manual Volume 3: General-Purpose and System Instructions <https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/24594.pdf>`_.  SBPB behaves the same way according to `this email <https://lore.kernel.org/5175b163a3736ca5fd01cedf406735636c99a>`_.

.. [#amd-ibpb-no-rsb] `Spectre Attacks: Exploiting Speculative Execution <https://comsec.ethz.ch/wp-content/files/ibpb_sp25.pdf>`_

.. [#intel-ibpb-rsb] "Introduction" in `Post-barrier Return Stack Buffer Predictions / CVE-2022-26373 / INTEL-SA-00706 <https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/post-barrier-return-stack-buffer-predictions.html>`_

.. [#amd-smep-rsb] "Existing Mitigations" in `Technical Guidance for Mitigating Branch Type Confusion <https://www.amd.com/content/dam/amd/en/documents/resources/technical-guidance-for-mitigating-branch-type-confusion.pdf>`_

.. [#intel-smep-rsb] "Enhanced IBRS" in `Indirect Branch Restricted Speculation <https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/indirect-branch-restricted-speculation.html>`_

.. [#amd-eibrs-vmexit] "Extended Feature Enable Register (EFER)" in `AMD64 Architecture Programmer's Manual Volume 2: System Programming <https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/24593.pdf>`_

.. [#intel-eibrs-vmexit] "Enhanced IBRS" in `Indirect Branch Restricted Speculation <https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/indirect-branch-restricted-speculation.html>`_

.. [#intel-pbrsb] `Post-barrier Return Stack Buffer Predictions / CVE-2022-26373 / INTEL-SA-00706 <https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/post-barrier-return-stack-buffer-predictions.html>`_

.. [#retbleed-paper] `RETBleed: Arbitrary Speculative Code Execution with Return Instructions <https://comsec.ethz.ch/wp-content/files/retbleed_sec22.pdf>`_

.. [#amd-btc] `Technical Guidance for Mitigating Branch Type Confusion <https://www.amd.com/content/dam/amd/en/documents/resources/technical-guidance-for-mitigating-branch-type-confusion.pdf>`_

.. [#amd-srso] `Technical Update Regarding Speculative Return Stack Overflow <https://www.amd.com/content/dam/amd/en/documents/corporate/cr/speculative-return-stack-overflow-whitepaper.pdf>`_

.. [#inception-paper] `Inception: Exposing New Attack Surfaces with Training in Transient Execution <https://comsec.ethz.ch/wp-content/files/inception_sec23.pdf>`_

.. [#intel-rsbu] `Return Stack Buffer Underflow / Return Stack Buffer Underflow / CVE-2022-29901, CVE-2022-28693 / INTEL-SA-00702 <https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/return-stack-buffer-underflow.html>`_

.. [#intel-ibpb-btb] `Indirect Branch Predictor Barrier <https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/indirect-branch-predictor-barrier.html>`_

.. [#intel-eibrs-rrsba] "Guidance for RSBU" in `Return Stack Buffer Underflow / Return Stack Buffer Underflow / CVE-2022-29901, CVE-2022-28693 / INTEL-SA-00702 <https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/return-stack-buffer-underflow.html>`_

.. [#bhi-paper] `Branch History Injection: On the Effectiveness of Hardware Mitigations Against Cross-Privilege Spectre-v2 Attacks <http://download.vusec.net/papers/bhi-spectre-bhb_sec22.pdf>`_

.. [#intel-bhi] `Branch History Injection and Intra-mode Branch Target Injection / CVE-2022-0001, CVE-2022-0002 / INTEL-SA-00598 <https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/branch-history-injection.html>`_

.. [#intel-retpoline-rrsba] "Retpoline" in `Branch History Injection and Intra-mode Branch Target Injection / CVE-2022-0001, CVE-2022-0002 / INTEL-SA-00598 <https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/branch-history-injection.html>`_
Documentation/admin-guide/kernel-parameters.txt

@@ -1407,18 +1407,15 @@
 	earlyprintk=serial[,0x...[,baudrate]]
 	earlyprintk=ttySn[,baudrate]
 	earlyprintk=dbgp[debugController#]
+	earlyprintk=mmio32,membase[,{nocfg|baudrate}]
 	earlyprintk=pciserial[,force],bus:device.function[,{nocfg|baudrate}]
 	earlyprintk=xdbc[xhciController#]
 	earlyprintk=bios
-	earlyprintk=mmio,membase[,{nocfg|baudrate}]

 	earlyprintk is useful when the kernel crashes before
 	the normal console is initialized. It is not enabled by
 	default because it has some cosmetic problems.

-	Only 32-bit memory addresses are supported for "mmio"
-	and "pciserial" devices.
-
 	Use "nocfg" to skip UART configuration, assume
 	BIOS/firmware has configured UART correctly.
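As a concrete illustration of the renamed option, a boot command line using the ``mmio32`` form might look like the following (the MMIO base address and PCI device address are placeholders; they must match the platform's actual UART):

```
earlyprintk=mmio32,0xfedc0000,115200
earlyprintk=pciserial,00:16.3,nocfg
```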
Documentation/arch/x86/cpuinfo.rst

@@ -79,8 +79,9 @@ feature flags.
 How are feature flags created?
 ==============================

-a: Feature flags can be derived from the contents of CPUID leaves.
-------------------------------------------------------------------
+Feature flags can be derived from the contents of CPUID leaves
+--------------------------------------------------------------

 These feature definitions are organized mirroring the layout of CPUID
 leaves and grouped in words with offsets as mapped in enum cpuid_leafs
 in cpufeatures.h (see arch/x86/include/asm/cpufeatures.h for details).
@@ -89,8 +90,9 @@ cpufeatures.h, and if it is detected at run time, the flags will be
 displayed accordingly in /proc/cpuinfo. For example, the flag "avx2"
 comes from X86_FEATURE_AVX2 in cpufeatures.h.

-b: Flags can be from scattered CPUID-based features.
-----------------------------------------------------
+Flags can be from scattered CPUID-based features
+------------------------------------------------

 Hardware features enumerated in sparsely populated CPUID leaves get
 software-defined values. Still, CPUID needs to be queried to determine
 if a given feature is present. This is done in init_scattered_cpuid_features().
@@ -104,8 +106,9 @@ has only one feature and would waste 31 bits of space in the x86_capability[]
 array. Since there is a struct cpuinfo_x86 for each possible CPU, the wasted
 memory is not trivial.

-c: Flags can be created synthetically under certain conditions for hardware features.
--------------------------------------------------------------------------------------
+Flags can be created synthetically under certain conditions for hardware features
+---------------------------------------------------------------------------------

 Examples of conditions include whether certain features are present in
 MSR_IA32_CORE_CAPS or specific CPU models are identified. If the needed
 conditions are met, the features are enabled by the set_cpu_cap or
@@ -114,8 +117,8 @@ the feature X86_FEATURE_SPLIT_LOCK_DETECT will be enabled and
 "split_lock_detect" will be displayed. The flag "ring3mwait" will be
 displayed only when running on INTEL_XEON_PHI_[KNL|KNM] processors.

-d: Flags can represent purely software features.
-------------------------------------------------
+Flags can represent purely software features
+--------------------------------------------
 These flags do not represent hardware features. Instead, they represent a
 software feature implemented in the kernel. For example, Kernel Page Table
 Isolation is purely software feature and its feature flag X86_FEATURE_PTI is
@@ -130,14 +133,18 @@ x86_cap/bug_flags[] arrays in kernel/cpu/capflags.c. The names in the
 resulting x86_cap/bug_flags[] are used to populate /proc/cpuinfo. The naming
 of flags in the x86_cap/bug_flags[] are as follows:

-a: The name of the flag is from the string in X86_FEATURE_<name> by default.
-----------------------------------------------------------------------------
-By default, the flag <name> in /proc/cpuinfo is extracted from the respective
-X86_FEATURE_<name> in cpufeatures.h. For example, the flag "avx2" is from
-X86_FEATURE_AVX2.
+Flags do not appear by default in /proc/cpuinfo
+-----------------------------------------------
+
+Feature flags are omitted by default from /proc/cpuinfo as it does not make
+sense for the feature to be exposed to userspace in most cases. For example,
+X86_FEATURE_ALWAYS is defined in cpufeatures.h but that flag is an internal
+kernel feature used in the alternative runtime patching functionality. So the
+flag does not appear in /proc/cpuinfo.
+
+Specify a flag name if absolutely needed
+----------------------------------------

-b: The naming can be overridden.
---------------------------------
 If the comment on the line for the #define X86_FEATURE_* starts with a
 double-quote character (""), the string inside the double-quote characters
 will be the name of the flags. For example, the flag "sse4_1" comes from
@@ -148,36 +155,31 @@ needed. For instance, /proc/cpuinfo is a userspace interface and must remain
 constant. If, for some reason, the naming of X86_FEATURE_<name> changes, one
 shall override the new naming with the name already used in /proc/cpuinfo.

-c: The naming override can be "", which means it will not appear in /proc/cpuinfo.
-----------------------------------------------------------------------------------
-The feature shall be omitted from /proc/cpuinfo if it does not make sense for
-the feature to be exposed to userspace. For example, X86_FEATURE_ALWAYS is
-defined in cpufeatures.h but that flag is an internal kernel feature used
-in the alternative runtime patching functionality. So, its name is overridden
-with "". Its flag will not appear in /proc/cpuinfo.
-
 Flags are missing when one or more of these happen
 ==================================================

-a: The hardware does not enumerate support for it.
---------------------------------------------------
+The hardware does not enumerate support for it
+----------------------------------------------

 For example, when a new kernel is running on old hardware or the feature is
 not enabled by boot firmware. Even if the hardware is new, there might be a
 problem enabling the feature at run time, the flag will not be displayed.

-b: The kernel does not know about the flag.
--------------------------------------------
+The kernel does not know about the flag
+---------------------------------------

 For example, when an old kernel is running on new hardware.

-c: The kernel disabled support for it at compile-time.
-------------------------------------------------------
+The kernel disabled support for it at compile-time
+--------------------------------------------------

 For example, if 5-level-paging is not enabled when building (i.e.,
 CONFIG_X86_5LEVEL is not selected) the flag "la57" will not show up [#f1]_.
 Even though the feature will still be detected via CPUID, the kernel disables
 it by clearing via setup_clear_cpu_cap(X86_FEATURE_LA57).

-d: The feature is disabled at boot-time.
-----------------------------------------
+The feature is disabled at boot-time
+------------------------------------
 A feature can be disabled either using a command-line parameter or because
 it failed to be enabled. The command-line parameter clearcpuid= can be used
 to disable features using the feature number as defined in
@@ -190,8 +192,9 @@ disable specific features. The list of parameters includes, but is not limited
 to, nofsgsbase, nosgx, noxsave, etc. 5-level paging can also be disabled using
 "no5lvl".

-e: The feature was known to be non-functional.
-----------------------------------------------
+The feature was known to be non-functional
+------------------------------------------

 The feature was known to be non-functional because a dependency was
 missing at runtime. For example, AVX flags will not show up if XSAVE feature
 is disabled since they depend on XSAVE feature. Another example would be broken
@@ -17,19 +17,20 @@
 
 .pushsection .noinstr.text, "ax"
 
-SYM_FUNC_START(entry_ibpb)
+/* Clobbers AX, CX, DX */
+SYM_FUNC_START(write_ibpb)
 	ANNOTATE_NOENDBR
 	movl	$MSR_IA32_PRED_CMD, %ecx
-	movl	$PRED_CMD_IBPB, %eax
+	movl	_ASM_RIP(x86_pred_cmd), %eax
 	xorl	%edx, %edx
 	wrmsr
 
 	/* Make sure IBPB clears return stack preductions too. */
 	FILL_RETURN_BUFFER %rax, RSB_CLEAR_LOOPS, X86_BUG_IBPB_NO_RET
 	RET
-SYM_FUNC_END(entry_ibpb)
+SYM_FUNC_END(write_ibpb)
 /* For KVM */
-EXPORT_SYMBOL_GPL(entry_ibpb);
+EXPORT_SYMBOL_GPL(write_ibpb);
 
 .popsection

@@ -269,7 +269,7 @@
  * typically has NO_MELTDOWN).
  *
  * While retbleed_untrain_ret() doesn't clobber anything but requires stack,
- * entry_ibpb() will clobber AX, CX, DX.
+ * write_ibpb() will clobber AX, CX, DX.
  *
  * As such, this must be placed after every *SWITCH_TO_KERNEL_CR3 at a point
  * where we have a stack but before any RET instruction.

@@ -279,7 +279,7 @@
 	VALIDATE_UNRET_END
 	CALL_UNTRAIN_RET
 	ALTERNATIVE_2 "", \
-		      "call entry_ibpb", \ibpb_feature, \
+		      "call write_ibpb", \ibpb_feature, \
 		     __stringify(\call_depth_insns), X86_FEATURE_CALL_DEPTH
 #endif
 .endm

@@ -368,7 +368,7 @@ extern void srso_return_thunk(void);
 extern void srso_alias_return_thunk(void);
 
 extern void entry_untrain_ret(void);
-extern void entry_ibpb(void);
+extern void write_ibpb(void);
 
 #ifdef CONFIG_X86_64
 extern void clear_bhb_loop(void);

@@ -514,11 +514,11 @@ void alternative_msr_write(unsigned int msr, u64 val, unsigned int feature)
 		: "memory");
 }
 
-extern u64 x86_pred_cmd;
-
 static inline void indirect_branch_prediction_barrier(void)
 {
-	alternative_msr_write(MSR_IA32_PRED_CMD, x86_pred_cmd, X86_FEATURE_IBPB);
+	asm_inline volatile(ALTERNATIVE("", "call write_ibpb", X86_FEATURE_IBPB)
+			    : ASM_CALL_CONSTRAINT
+			    :: "rax", "rcx", "rdx", "memory");
 }
 
 /* The Intel SPEC CTRL MSR base value cache */

@@ -23,6 +23,8 @@
 #include <linux/serial_core.h>
 #include <linux/pgtable.h>
 
+#include <xen/xen.h>
+
 #include <asm/e820/api.h>
 #include <asm/irqdomain.h>
 #include <asm/pci_x86.h>

@@ -1729,6 +1731,15 @@ int __init acpi_mps_check(void)
 {
 #if defined(CONFIG_X86_LOCAL_APIC) && !defined(CONFIG_X86_MPPARSE)
 	/* mptable code is not built-in*/
+
+	/*
+	 * Xen disables ACPI in PV DomU guests but it still emulates APIC and
+	 * supports SMP. Returning early here ensures that APIC is not disabled
+	 * unnecessarily and the guest is not limited to a single vCPU.
+	 */
+	if (xen_pv_domain() && !xen_initial_domain())
+		return 0;
+
 	if (acpi_disabled || acpi_noirq) {
 		pr_warn("MPS support code is not built-in, using acpi=off or acpi=noirq or pci=noacpi may have problem\n");
 		return 1;

@@ -805,6 +805,7 @@ static void init_amd_bd(struct cpuinfo_x86 *c)
 static const struct x86_cpu_id erratum_1386_microcode[] = {
 	X86_MATCH_VFM_STEPS(VFM_MAKE(X86_VENDOR_AMD, 0x17, 0x01), 0x2, 0x2, 0x0800126e),
 	X86_MATCH_VFM_STEPS(VFM_MAKE(X86_VENDOR_AMD, 0x17, 0x31), 0x0, 0x0, 0x08301052),
+	{}
 };
 
 static void fix_erratum_1386(struct cpuinfo_x86 *c)

@@ -59,7 +59,6 @@ DEFINE_PER_CPU(u64, x86_spec_ctrl_current);
 EXPORT_PER_CPU_SYMBOL_GPL(x86_spec_ctrl_current);
 
 u64 x86_pred_cmd __ro_after_init = PRED_CMD_IBPB;
-EXPORT_SYMBOL_GPL(x86_pred_cmd);
 
 static u64 __ro_after_init x86_arch_cap_msr;

@@ -1142,7 +1141,7 @@ do_cmd_auto:
 		setup_clear_cpu_cap(X86_FEATURE_RETHUNK);
 
 		/*
-		 * There is no need for RSB filling: entry_ibpb() ensures
+		 * There is no need for RSB filling: write_ibpb() ensures
 		 * all predictions, including the RSB, are invalidated,
 		 * regardless of IBPB implementation.
 		 */

@@ -1592,51 +1591,54 @@ static void __init spec_ctrl_disable_kernel_rrsba(void)
 	rrsba_disabled = true;
 }
 
-static void __init spectre_v2_determine_rsb_fill_type_at_vmexit(enum spectre_v2_mitigation mode)
+static void __init spectre_v2_select_rsb_mitigation(enum spectre_v2_mitigation mode)
 {
 	/*
-	 * Similar to context switches, there are two types of RSB attacks
-	 * after VM exit:
+	 * WARNING! There are many subtleties to consider when changing *any*
+	 * code related to RSB-related mitigations. Before doing so, carefully
+	 * read the following document, and update if necessary:
 	 *
-	 * 1) RSB underflow
+	 * Documentation/admin-guide/hw-vuln/rsb.rst
 	 *
-	 * 2) Poisoned RSB entry
+	 * In an overly simplified nutshell:
 	 *
-	 * When retpoline is enabled, both are mitigated by filling/clearing
-	 * the RSB.
+	 *   - User->user RSB attacks are conditionally mitigated during
+	 *     context switches by cond_mitigation -> write_ibpb().
 	 *
-	 * When IBRS is enabled, while #1 would be mitigated by the IBRS branch
-	 * prediction isolation protections, RSB still needs to be cleared
-	 * because of #2. Note that SMEP provides no protection here, unlike
-	 * user-space-poisoned RSB entries.
+	 *   - User->kernel and guest->host attacks are mitigated by eIBRS or
+	 *     RSB filling.
 	 *
-	 * eIBRS should protect against RSB poisoning, but if the EIBRS_PBRSB
-	 * bug is present then a LITE version of RSB protection is required,
-	 * just a single call needs to retire before a RET is executed.
+	 *     Though, depending on config, note that other alternative
+	 *     mitigations may end up getting used instead, e.g., IBPB on
+	 *     entry/vmexit, call depth tracking, or return thunks.
 	 */
 
 	switch (mode) {
 	case SPECTRE_V2_NONE:
-		return;
+		break;
 
-	case SPECTRE_V2_EIBRS_LFENCE:
 	case SPECTRE_V2_EIBRS:
-		if (boot_cpu_has_bug(X86_BUG_EIBRS_PBRSB)) {
-			setup_force_cpu_cap(X86_FEATURE_RSB_VMEXIT_LITE);
-			pr_info("Spectre v2 / PBRSB-eIBRS: Retire a single CALL on VMEXIT\n");
-		}
-		return;
-
+	case SPECTRE_V2_EIBRS_LFENCE:
 	case SPECTRE_V2_EIBRS_RETPOLINE:
+		if (boot_cpu_has_bug(X86_BUG_EIBRS_PBRSB)) {
+			pr_info("Spectre v2 / PBRSB-eIBRS: Retire a single CALL on VMEXIT\n");
+			setup_force_cpu_cap(X86_FEATURE_RSB_VMEXIT_LITE);
+		}
+		break;
+
 	case SPECTRE_V2_RETPOLINE:
 	case SPECTRE_V2_LFENCE:
 	case SPECTRE_V2_IBRS:
+		pr_info("Spectre v2 / SpectreRSB: Filling RSB on context switch and VMEXIT\n");
+		setup_force_cpu_cap(X86_FEATURE_RSB_CTXSW);
 		setup_force_cpu_cap(X86_FEATURE_RSB_VMEXIT);
-		pr_info("Spectre v2 / SpectreRSB : Filling RSB on VMEXIT\n");
-		return;
-	}
-
-	pr_warn_once("Unknown Spectre v2 mode, disabling RSB mitigation at VM exit");
-	dump_stack();
+		break;
+
+	default:
+		pr_warn_once("Unknown Spectre v2 mode, disabling RSB mitigation\n");
+		dump_stack();
+		break;
+	}
 }
 
 /*

@@ -1830,48 +1832,7 @@ static void __init spectre_v2_select_mitigation(void)
 	spectre_v2_enabled = mode;
 	pr_info("%s\n", spectre_v2_strings[mode]);
 
-	/*
-	 * If Spectre v2 protection has been enabled, fill the RSB during a
-	 * context switch. In general there are two types of RSB attacks
-	 * across context switches, for which the CALLs/RETs may be unbalanced.
-	 *
-	 * 1) RSB underflow
-	 *
-	 *    Some Intel parts have "bottomless RSB". When the RSB is empty,
-	 *    speculated return targets may come from the branch predictor,
-	 *    which could have a user-poisoned BTB or BHB entry.
-	 *
-	 *    AMD has it even worse: *all* returns are speculated from the BTB,
-	 *    regardless of the state of the RSB.
-	 *
-	 *    When IBRS or eIBRS is enabled, the "user -> kernel" attack
-	 *    scenario is mitigated by the IBRS branch prediction isolation
-	 *    properties, so the RSB buffer filling wouldn't be necessary to
-	 *    protect against this type of attack.
-	 *
-	 *    The "user -> user" attack scenario is mitigated by RSB filling.
-	 *
-	 * 2) Poisoned RSB entry
-	 *
-	 *    If the 'next' in-kernel return stack is shorter than 'prev',
-	 *    'next' could be tricked into speculating with a user-poisoned RSB
-	 *    entry.
-	 *
-	 *    The "user -> kernel" attack scenario is mitigated by SMEP and
-	 *    eIBRS.
-	 *
-	 *    The "user -> user" scenario, also known as SpectreBHB, requires
-	 *    RSB clearing.
-	 *
-	 * So to mitigate all cases, unconditionally fill RSB on context
-	 * switches.
-	 *
-	 * FIXME: Is this pointless for retbleed-affected AMD?
-	 */
-	setup_force_cpu_cap(X86_FEATURE_RSB_CTXSW);
-	pr_info("Spectre v2 / SpectreRSB mitigation: Filling RSB on context switch\n");
-
-	spectre_v2_determine_rsb_fill_type_at_vmexit(mode);
+	spectre_v2_select_rsb_mitigation(mode);
 
 	/*
 	 * Retpoline protects the kernel, but doesn't protect firmware. IBRS

@@ -2676,7 +2637,7 @@ static void __init srso_select_mitigation(void)
 			setup_clear_cpu_cap(X86_FEATURE_RETHUNK);
 
 			/*
-			 * There is no need for RSB filling: entry_ibpb() ensures
+			 * There is no need for RSB filling: write_ibpb() ensures
 			 * all predictions, including the RSB, are invalidated,
 			 * regardless of IBPB implementation.
 			 */

@@ -2701,7 +2662,7 @@ ibpb_on_vmexit:
 		srso_mitigation = SRSO_MITIGATION_IBPB_ON_VMEXIT;
 
 		/*
-		 * There is no need for RSB filling: entry_ibpb() ensures
+		 * There is no need for RSB filling: write_ibpb() ensures
 		 * all predictions, including the RSB, are invalidated,
 		 * regardless of IBPB implementation.
 		 */

@@ -3553,6 +3553,22 @@ static void mkdir_rdt_prepare_rmid_free(struct rdtgroup *rgrp)
 	free_rmid(rgrp->closid, rgrp->mon.rmid);
 }
 
+/*
+ * We allow creating mon groups only with in a directory called "mon_groups"
+ * which is present in every ctrl_mon group. Check if this is a valid
+ * "mon_groups" directory.
+ *
+ * 1. The directory should be named "mon_groups".
+ * 2. The mon group itself should "not" be named "mon_groups".
+ *   This makes sure "mon_groups" directory always has a ctrl_mon group
+ *   as parent.
+ */
+static bool is_mon_groups(struct kernfs_node *kn, const char *name)
+{
+	return (!strcmp(rdt_kn_name(kn), "mon_groups") &&
+		strcmp(name, "mon_groups"));
+}
+
 static int mkdir_rdt_prepare(struct kernfs_node *parent_kn,
 			     const char *name, umode_t mode,
 			     enum rdt_group_type rtype, struct rdtgroup **r)

@@ -3568,6 +3584,15 @@ static int mkdir_rdt_prepare(struct kernfs_node *parent_kn,
 		goto out_unlock;
 	}
 
+	/*
+	 * Check that the parent directory for a monitor group is a "mon_groups"
+	 * directory.
+	 */
+	if (rtype == RDTMON_GROUP && !is_mon_groups(parent_kn, name)) {
+		ret = -EPERM;
+		goto out_unlock;
+	}
+
 	if (rtype == RDTMON_GROUP &&
 	    (prdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP ||
 	     prdtgrp->mode == RDT_MODE_PSEUDO_LOCKED)) {

@@ -3751,22 +3776,6 @@ out_unlock:
 	return ret;
 }
 
-/*
- * We allow creating mon groups only with in a directory called "mon_groups"
- * which is present in every ctrl_mon group. Check if this is a valid
- * "mon_groups" directory.
- *
- * 1. The directory should be named "mon_groups".
- * 2. The mon group itself should "not" be named "mon_groups".
- *   This makes sure "mon_groups" directory always has a ctrl_mon group
- *   as parent.
- */
-static bool is_mon_groups(struct kernfs_node *kn, const char *name)
-{
-	return (!strcmp(rdt_kn_name(kn), "mon_groups") &&
-		strcmp(name, "mon_groups"));
-}
-
 static int rdtgroup_mkdir(struct kernfs_node *parent_kn, const char *name,
 			  umode_t mode)
 {

@@ -3782,11 +3791,8 @@ static int rdtgroup_mkdir(struct kernfs_node *parent_kn, const char *name,
 	if (resctrl_arch_alloc_capable() && parent_kn == rdtgroup_default.kn)
 		return rdtgroup_mkdir_ctrl_mon(parent_kn, name, mode);
 
-	/*
-	 * If RDT monitoring is supported and the parent directory is a valid
-	 * "mon_groups" directory, add a monitoring subdirectory.
-	 */
-	if (resctrl_arch_mon_capable() && is_mon_groups(parent_kn, name))
+	/* Else, attempt to add a monitoring subdirectory. */
+	if (resctrl_arch_mon_capable())
 		return rdtgroup_mkdir_mon(parent_kn, name, mode);
 
 	return -EPERM;

@@ -753,22 +753,21 @@ void __init e820__memory_setup_extended(u64 phys_addr, u32 data_len)
 void __init e820__register_nosave_regions(unsigned long limit_pfn)
 {
 	int i;
-	unsigned long pfn = 0;
+	u64 last_addr = 0;
 
 	for (i = 0; i < e820_table->nr_entries; i++) {
 		struct e820_entry *entry = &e820_table->entries[i];
 
-		if (pfn < PFN_UP(entry->addr))
-			register_nosave_region(pfn, PFN_UP(entry->addr));
-
-		pfn = PFN_DOWN(entry->addr + entry->size);
-
 		if (entry->type != E820_TYPE_RAM)
-			register_nosave_region(PFN_UP(entry->addr), pfn);
+			continue;
 
-		if (pfn >= limit_pfn)
-			break;
+		if (last_addr < entry->addr)
+			register_nosave_region(PFN_DOWN(last_addr), PFN_UP(entry->addr));
+
+		last_addr = entry->addr + entry->size;
 	}
+
+	register_nosave_region(PFN_DOWN(last_addr), limit_pfn);
 }
 
 #ifdef CONFIG_ACPI

@@ -389,10 +389,10 @@ static int __init setup_early_printk(char *buf)
 	keep = (strstr(buf, "keep") != NULL);
 
 	while (*buf != '\0') {
-		if (!strncmp(buf, "mmio", 4)) {
-			early_mmio_serial_init(buf + 4);
+		if (!strncmp(buf, "mmio32", 6)) {
+			buf += 6;
+			early_mmio_serial_init(buf);
 			early_console_register(&early_serial_console, keep);
-			buf += 4;
 		}
 		if (!strncmp(buf, "serial", 6)) {
 			buf += 6;

@@ -407,9 +407,9 @@ static int __init setup_early_printk(char *buf)
 		}
 #ifdef CONFIG_PCI
 		if (!strncmp(buf, "pciserial", 9)) {
-			early_pci_serial_init(buf + 9);
+			buf += 9; /* Keep from match the above "pciserial" */
+			early_pci_serial_init(buf);
 			early_console_register(&early_serial_console, keep);
-			buf += 9; /* Keep from match the above "serial" */
 		}
 #endif
 		if (!strncmp(buf, "vga", 3) &&

@@ -667,9 +667,9 @@ static void cond_mitigation(struct task_struct *next)
 	prev_mm = this_cpu_read(cpu_tlbstate.last_user_mm_spec);
 
 	/*
-	 * Avoid user/user BTB poisoning by flushing the branch predictor
-	 * when switching between processes. This stops one process from
-	 * doing Spectre-v2 attacks on another.
+	 * Avoid user->user BTB/RSB poisoning by flushing them when switching
+	 * between processes. This stops one process from doing Spectre-v2
+	 * attacks on another.
 	 *
 	 * Both, the conditional and the always IBPB mode use the mm
 	 * pointer to avoid the IBPB when switching between tasks of the

@@ -26,7 +26,7 @@
 	/* code below belongs to the image kernel */
 	.align PAGE_SIZE
 SYM_FUNC_START(restore_registers)
-	ANNOTATE_NOENDBR
+	ENDBR
 	/* go back to the original page tables */
 	movq	%r9, %cr3

@@ -120,7 +120,7 @@ SYM_FUNC_END(restore_image)
 
 /* code below has been relocated to a safe page */
 SYM_FUNC_START(core_restore_code)
-	ANNOTATE_NOENDBR
+	ENDBR
 	/* switch to temporary page tables */
 	movq	%rax, %cr3
 	/* flush TLB */