Commit graph

14571 commits

Author SHA1 Message Date
Reinette Chatre
7604df6e16 x86/intel_rdt: Support flexible data to parsing callbacks
Each resource is associated with a configurable callback that should be
used to parse the information provided for the particular resource from
user space. In addition to the resource and domain pointers this callback
is provided with just the character buffer being parsed.

In support of flexible parsing the callback is modified to support a void
pointer as argument. This enables resources that need more data than just
the user provided data to pass its required data to the callback without
affecting the signatures for the callbacks of all the other resources.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: vikas.shivappa@linux.intel.com
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/34baacfced4d787d994ec7015e249e6c7e619053.1529706536.git.reinette.chatre@intel.com
2018-06-23 13:03:43 +02:00
Reinette Chatre
9af4c0a6dc x86/intel_rdt: Making CBM name and type more explicit
cbm_validate() receives a pointer to the variable that will be initialized
with a validated capacity bitmask. The pointer points to a variable of type
unsigned long that is immediately assigned to a variable of type u32 by the
caller on return from cbm_validate().

Let cbm_validate() initialize a variable of type u32 directly.

At this time also change tha variable name "data" within parse_cbm() to a
name more reflective of the content: "cbm_val". This frees up the generic
"data" to be used later when it is indeed used for a collection of input.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: vikas.shivappa@linux.intel.com
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/5e29cf0209ea2deac9beacd35cbe5239a50959fb.1529706536.git.reinette.chatre@intel.com
2018-06-23 13:03:43 +02:00
Reinette Chatre
49f7b4efa1 x86/intel_rdt: Enable setting of exclusive mode
The new "mode" file now accepts "exclusive" that means that the
allocations of this resource group cannot be shared.

Enable users to modify a resource group's mode to "exclusive". To
succeed it is required that there is no overlap between resource group's
current schemata and that of all the other active resource groups as
well as cache regions potentially used by other hardware entities.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: vikas.shivappa@linux.intel.com
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/83642cbba3c8c21db7fa6bb36fe7d385d3b275f2.1529706536.git.reinette.chatre@intel.com
2018-06-23 13:03:42 +02:00
Reinette Chatre
414dd2b473 x86/intel_rdt: Introduce new "exclusive" mode
At the moment all allocations are shareable. There is no way for a user to
designate that an allocation associated with a resource group cannot be
shared by another.

Introduce the new mode "exclusive". When a resource group is marked as such
it implies that no overlap is allowed between its allocation and that of
another resource group.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: vikas.shivappa@linux.intel.com
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/f6d24672a4280fe3b24cd2da9b5f50214439c1af.1529706536.git.reinette.chatre@intel.com
2018-06-23 13:03:42 +02:00
Reinette Chatre
95f0b77efa x86/intel_rdt: Initialize new resource group with sane defaults
Currently when a new resource group is created its allocations would be
those that belonged to the resource group to which its closid belonged
previously.

That is, we can encounter a case like:
mkdir newgroup
cat newgroup/schemata
L2:0=ff;1=ff
echo 'L2:0=0xf0;1=0xf0' > newgroup/schemata
cat newgroup/schemata
L2:0=0xf0;1=0xf0
rmdir newgroup
mkdir newnewgroup
cat newnewgroup/schemata
L2:0=0xf0;1=0xf0

When the new group is created it would be reasonable to expect its
allocations to be initialized with all regions that it can possibly use.
At this time these regions would be all that are shareable by other
resource groups as well as regions that are not currently used.
If the available cache region is found to be non-contiguous the
available region is adjusted to enforce validity.

When a new resource group is created the hardware is initialized with
these new default allocations.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: vikas.shivappa@linux.intel.com
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/c468ed79340b63024111978e01430bb9589d85c0.1529706536.git.reinette.chatre@intel.com
2018-06-23 13:03:42 +02:00
Reinette Chatre
024d15be38 x86/intel_rdt: Make useful functions available internally
In support of the work done to enable resource groups to have different
modes some static functions need to be available for sharing amongst
all RDT components.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: vikas.shivappa@linux.intel.com
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/2af8fd6e937ae4fbdaa52dee1123823cb4993176.1529706536.git.reinette.chatre@intel.com
2018-06-23 13:03:42 +02:00
Reinette Chatre
0b9aa65626 x86/intel_rdt: Introduce test to determine if closid is in use
During CAT feature discovery the capacity bitmasks (CBMs) associated
with all the classes of service are initialized to all ones, even if the
class of service is not in use. Introduce a test that can be used to
determine if a class of service is in use. This test enables code
interested in parsing the CBMs to know if its values are meaningful or
can be ignored.

Temporarily mark the function as unused to silence compile warnings
until it is used.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: vikas.shivappa@linux.intel.com
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/798f8d89cd9b12df492d48c14bdc8ee3b39b1c6f.1529706536.git.reinette.chatre@intel.com
2018-06-23 13:03:41 +02:00
Reinette Chatre
d48d7a57f7 x86/intel_rdt: Introduce resource group's mode resctrl file
A new resctrl file "mode" associated with each resource group is
introduced. This file will display the resource group's current mode and an
administrator can also use it to modify the resource group's mode.

Only shareable mode is currently supported.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: vikas.shivappa@linux.intel.com
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/20ab78fda26a8c8d98e18ec555f6a1f728948972.1529706536.git.reinette.chatre@intel.com
2018-06-23 13:03:41 +02:00
Reinette Chatre
472ef09b40 x86/intel_rdt: Associate mode with each RDT resource group
Each RDT resource group is associated with a mode that will reflect
the level of sharing of its allocations. The default, shareable, will be
associated with each resource group on creation since it is zero and
resource groups are created with kzalloc. The managing of the mode of a
resource group will follow. The default resource group always remain
though so ensure that it is reset to the default mode when the resctrl
filesystem is unmounted.

Also introduce a utility that can be used to determine the mode of a
resource group when it is searched for based on its class of service.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: vikas.shivappa@linux.intel.com
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/797e4e1de4e4fcdf5b5e0039354d6a28079e2015.1529706536.git.reinette.chatre@intel.com
2018-06-23 13:03:41 +02:00
Reinette Chatre
eb956a636f x86/intel_rdt: Introduce RDT resource group mode
At this time there are no constraints on how bitmasks represented by
schemata can be associated with closids represented by resource groups.  A
bitmask of one class of service can without any objections overlap with the
bitmask of another class of service.

The concept of "mode" is introduced in preparation for support of control
over whether cache regions can be shared between classes of service. At
this time the only mode reflects the current cache allocations where all
can potentially be shared.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: vikas.shivappa@linux.intel.com
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/87e88275597fbfa03ea9d41c1186bf012c831c01.1529706536.git.reinette.chatre@intel.com
2018-06-23 13:03:40 +02:00
Reinette Chatre
32206ab365 x86/intel_rdt: Provide pseudo-locking hooks within rdt_mount
Stephen Rothwell reported that the Cache Pseudo-Locking enabling and the
kernfs support for mounting with fs_context are conflicting.

In preparation for a conflict-free merge between the two repos some no-op
hooks are created within the RDT mount function being changed by the two
features. The goal is for this commit to be placed on a minimal no-rebase
branch to be consumed by both features.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Suggested-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: vikas.shivappa@linux.intel.com
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Cc: David Howells <dhowells@redhat.com>
Link: https://lkml.kernel.org/r/410697ead08978bd12111c0afc4ce9e7bd71a5fe.1529706536.git.reinette.chatre@intel.com
2018-06-23 12:53:19 +02:00
Suravee Suthikulpanit
964d978433 x86/CPU/AMD: Fix LLC ID bit-shift calculation
The current logic incorrectly calculates the LLC ID from the APIC ID.

Unless specified otherwise, the LLC ID should be calculated by removing
the Core and Thread ID bits from the least significant end of the APIC
ID. For more info, see "ApicId Enumeration Requirements" in any Fam17h
PPR document.

[ bp: Improve commit message. ]

Fixes: 68091ee7ac ("Calculate last level cache ID from number of sharing threads")
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1528915390-30533-1-git-send-email-suravee.suthikulpanit@amd.com
2018-06-22 21:21:49 +02:00
Thomas Gleixner
7731b8bc94 Merge branch 'linus' into x86/urgent
Required to queue a dependent fix.
2018-06-22 21:20:35 +02:00
Will Deacon
784e0300fe rseq: Avoid infinite recursion when delivering SIGSEGV
When delivering a signal to a task that is using rseq, we call into
__rseq_handle_notify_resume() so that the registers pushed in the
sigframe are updated to reflect the state of the restartable sequence
(for example, ensuring that the signal returns to the abort handler if
necessary).

However, if the rseq management fails due to an unrecoverable fault when
accessing userspace or certain combinations of RSEQ_CS_* flags, then we
will attempt to deliver a SIGSEGV. This has the potential for infinite
recursion if the rseq code continuously fails on signal delivery.

Avoid this problem by using force_sigsegv() instead of force_sig(), which
is explicitly designed to reset the SEGV handler to SIG_DFL in the case
of a recursive fault. In doing so, remove rseq_signal_deliver() from the
internal rseq API and have an optional struct ksignal * parameter to
rseq_handle_notify_resume() instead.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: peterz@infradead.org
Cc: paulmck@linux.vnet.ibm.com
Cc: boqun.feng@gmail.com
Link: https://lkml.kernel.org/r/1529664307-983-1-git-send-email-will.deacon@arm.com
2018-06-22 19:04:22 +02:00
Borislav Petkov
7ce2f0393e x86/CPU/AMD: Move TOPOEXT reenablement before reading smp_num_siblings
The TOPOEXT reenablement is a workaround for broken BIOSen which didn't
enable the CPUID bit. amd_get_topology_early(), however, relies on
that bit being set so that it can read out the CPUID leaf and set
smp_num_siblings properly.

Move the reenablement up to early_init_amd(). While at it, simplify
amd_get_topology_early().

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2018-06-22 14:52:13 +02:00
Zhenzhong Duan
0218c76626 x86/microcode/intel: Fix memleak in save_microcode_patch()
Free useless ucode_patch entry when it's replaced.

[ bp: Drop the memfree_patch() two-liner. ]

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Srinivas REDDY Eeda <srinivas.eeda@oracle.com>
Link: http://lkml.kernel.org/r/888102f0-fd22-459d-b090-a1bd8a00cb2b@default
2018-06-22 14:42:59 +02:00
Borislav Petkov
d5c84ef202 x86/mce: Cleanup __mc_scan_banks()
Correct comments, improve readability, simplify.

No functional changes.

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20180622095428.626-7-bp@alien8.de
2018-06-22 14:37:23 +02:00
Borislav Petkov
f35565e398 x86/mce: Carve out bank scanning code
Carve out the scan loop into a separate function.

No functional changes.

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20180622095428.626-6-bp@alien8.de
2018-06-22 14:37:23 +02:00
Borislav Petkov
45deca7d96 x86/mce: Remove !banks check
If we don't have MCA banks, we won't see machine checks anyway. Drop the
check.

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20180622095428.626-5-bp@alien8.de
2018-06-22 14:37:22 +02:00
Borislav Petkov
d3d6923cd1 x86/mce: Carve out the crashing_cpu check
Carve out the rendezvous handler timeout avoidance check into a separate
function in order to simplify the #MC handler.

No functional changes.

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20180622095428.626-4-bp@alien8.de
2018-06-22 14:37:22 +02:00
Arnd Bergmann
bc39f01020 x86/mce: Always use 64-bit timestamps
The machine check timestamp uses get_seconds(), which returns an
'unsigned long' number that might overflow on 32-bit architectures (in
the distant future) and is therefore deprecated.

The normal replacement would be ktime_get_real_seconds(), but that needs
to use a sequence lock that might cause a deadlock if the MCE happens at
just the wrong moment. The __ktime_get_real_seconds() skips that lock
and is safer here, but has a miniscule risk of returning the wrong time
when we read it on a 32-bit architecture at the same time as updating
the epoch, i.e. from before y2106 overflow time to after, or vice versa.

This seems to be an acceptable risk in this particular case, and is the
same thing we do in kdb.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: y2038@lists.linaro.org
Link: http://lkml.kernel.org/r/20180618100759.1921750-1-arnd@arndb.de
2018-06-22 14:37:22 +02:00
Tony Luck
40c36e2741 x86/mce: Fix incorrect "Machine check from unknown source" message
Some injection testing resulted in the following console log:

  mce: [Hardware Error]: CPU 22: Machine Check Exception: f Bank 1: bd80000000100134
  mce: [Hardware Error]: RIP 10:<ffffffffc05292dd> {pmem_do_bvec+0x11d/0x330 [nd_pmem]}
  mce: [Hardware Error]: TSC c51a63035d52 ADDR 3234bc4000 MISC 88
  mce: [Hardware Error]: PROCESSOR 0:50654 TIME 1526502199 SOCKET 0 APIC 38 microcode 2000043
  mce: [Hardware Error]: Run the above through 'mcelog --ascii'
  Kernel panic - not syncing: Machine check from unknown source

This confused everybody because the first line quite clearly shows
that we found a logged error in "Bank 1", while the last line says
"unknown source".

The problem is that the Linux code doesn't do the right thing
for a local machine check that results in a fatal error.

It turns out that we know very early in the handler whether the
machine check is fatal. The call to mce_no_way_out() has checked
all the banks for the CPU that took the local machine check. If
it says we must crash, we can do so right away with the right
messages.

We do scan all the banks again. This means that we might initially
not see a problem, but during the second scan find something fatal.
If this happens we print a slightly different message (so I can
see if it actually every happens).

[ bp: Remove unneeded severity assignment. ]

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: stable@vger.kernel.org # 4.2
Link: http://lkml.kernel.org/r/52e049a497e86fd0b71c529651def8871c804df0.1527283897.git.tony.luck@intel.com
2018-06-22 14:35:50 +02:00
Borislav Petkov
1f74c8a647 x86/mce: Do not overwrite MCi_STATUS in mce_no_way_out()
mce_no_way_out() does a quick check during #MC to see whether some of
the MCEs logged would require the kernel to panic immediately. And it
passes a struct mce where MCi_STATUS gets written.

However, after having saved a valid status value, the next iteration
of the loop which goes over the MCA banks on the CPU, overwrites the
valid status value because we're using struct mce as storage instead of
a temporary variable.

Which leads to MCE records with an empty status value:

  mce: [Hardware Error]: CPU 0: Machine Check Exception: 6 Bank 0: 0000000000000000
  mce: [Hardware Error]: RIP 10:<ffffffffbd42fbd7> {trigger_mce+0x7/0x10}

In order to prevent the loss of the status register value, return
immediately when severity is a panic one so that we can panic
immediately with the first fatal MCE logged. This is also the intention
of this function and not to noodle over the banks while a fatal MCE is
already logged.

Tony: read the rest of the MCA bank to populate the struct mce fully.

Suggested-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20180622095428.626-8-bp@alien8.de
2018-06-22 14:35:50 +02:00
Masami Hiramatsu
0ea063306e kprobes/x86: Fix %p uses in error messages
Remove all %p uses in error messages in kprobes/x86.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David Howells <dhowells@redhat.com>
Cc: David S . Miller <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Jon Medhurst <tixy@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Tobin C . Harding <me@tobin.cc>
Cc: Will Deacon <will.deacon@arm.com>
Cc: acme@kernel.org
Cc: akpm@linux-foundation.org
Cc: brueckner@linux.vnet.ibm.com
Cc: linux-arch@vger.kernel.org
Cc: rostedt@goodmis.org
Cc: schwidefsky@de.ibm.com
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/lkml/152491902310.9916.13355297638917767319.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-21 17:33:42 +02:00
Oleg Nesterov
90718e32e1 uprobes/x86: Remove incorrect WARN_ON() in uprobe_init_insn()
insn_get_length() has the side-effect of processing the entire instruction
but only if it was decoded successfully, otherwise insn_complete() can fail
and in this case we need to just return an error without warning.

Reported-by: syzbot+30d675e3ca03c1c351e7@syzkaller.appspotmail.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: syzkaller-bugs@googlegroups.com
Link: https://lkml.kernel.org/lkml/20180518162739.GA5559@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-21 17:11:02 +02:00
Josh Poimboeuf
d31a580266 x86/unwind/orc: Detect the end of the stack
The existing UNWIND_HINT_EMPTY annotations happen to be good indicators
of where entry code calls into C code for the first time.  So also use
them to mark the end of the stack for the ORC unwinder.

Use that information to set unwind->error if the ORC unwinder doesn't
unwind all the way to the end.  This will be needed for enabling
HAVE_RELIABLE_STACKTRACE for the ORC unwinder so we can use it with the
livepatch consistency model.

Thanks to Jiri Slaby for teaching the ORCs about the unwind hints.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/lkml/20180518064713.26440-5-jslaby@suse.cz
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-21 16:34:56 +02:00
Jiri Slaby
0c414367c0 x86/stacktrace: Do not fail for ORC with regs on stack
save_stack_trace_reliable now returns "non reliable" when there are
kernel pt_regs on stack. This means an interrupt or exception happened
somewhere down the route. It is a problem for the frame pointer
unwinder, because the frame might not have been set up yet when the irq
happened, so the unwinder might fail to unwind from the interrupted
function.

With ORC, this is not a problem, as ORC has out-of-band data. We can
find ORC data even for the IP in the interrupted function and always
unwind one level up reliably.

So lift the check to apply only when CONFIG_FRAME_POINTER=y is enabled.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/lkml/20180518064713.26440-4-jslaby@suse.cz
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-21 16:34:56 +02:00
Jiri Slaby
441ccc3580 x86/stacktrace: Clarify the reliable success paths
Make clear which path is for user tasks and for kthreads and idle
tasks. This will allow easier plug-in of the ORC unwinder in the next
patches.

Note that we added a check for unwind error to the top of the loop, so
that an error is returned also for user tasks (the 'goto success' would
skip the check after the loop otherwise).

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/lkml/20180518064713.26440-3-jslaby@suse.cz
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-21 16:34:55 +02:00
Jiri Slaby
17426923b0 x86/stacktrace: Remove STACKTRACE_DUMP_ONCE
The stack unwinding can sometimes fail yet. Especially with the
generated debug info. So do not yell at users -- live patching (the only
user of this interface) will inform the user about the failure
gracefully.

And given this was the only user of the macro, remove the macro proper
too.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/lkml/20180518064713.26440-2-jslaby@suse.cz
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-21 16:34:55 +02:00
Jiri Slaby
0797a8d0d7 x86/stacktrace: Do not unwind after user regs
Josh pointed out, that there is no way a frame can be after user regs.
So remove the last unwind and the check.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/lkml/20180518064713.26440-1-jslaby@suse.cz
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-21 16:34:55 +02:00
mike.travis@hpe.com
d7609f4210 x86/platform/UV: Add kernel parameter to set memory block size
Add a kernel parameter that allows setting UV memory block size.  This
is to provide an adjustment for new forms of PMEM and other DIMM memory
that might require alignment restrictions other than scanning the global
address table for the required minimum alignment.  The value set will be
further adjusted by both the GAM range table scan as well as restrictions
imposed by set_memory_block_size_order().

Signed-off-by: Mike Travis <mike.travis@hpe.com>
Reviewed-by: Andrew Banman <andrew.banman@hpe.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russ Anderson <russ.anderson@hpe.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dan.j.williams@intel.com
Cc: jgross@suse.com
Cc: kirill.shutemov@linux.intel.com
Cc: mhocko@suse.com
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/lkml/20180524201711.854849120@stormcage.americas.sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-21 16:14:46 +02:00
mike.travis@hpe.com
bbbd2b51a2 x86/platform/UV: Use new set memory block size function
Add a call to the new function to "adjust" the current fixed UV memory
block size of 2GB so it can be changed to a different physical boundary.
This accommodates changes in the Intel BIOS, and therefore UV BIOS,
which now can align boundaries different than the previous UV standard
of 2GB.  It also flags any UV Global Address boundaries from BIOS that
cause a change in the mem block size (boundary).

The current boundary of 2GB has been used on UV since the first system
release in 2009 with Linux 2.6 and has worked fine.  But the new NVDIMM
persistent memory modules (PMEM), along with the Intel BIOS changes to
support these modules caused the memory block size boundary to be set
to a lower limit.  Intel only guarantees that this minimum boundary at
64MB though the current Linux limit is 128MB.

Note that the default remains 2GB if no changes occur.

Signed-off-by: Mike Travis <mike.travis@hpe.com>
Reviewed-by: Andrew Banman <andrew.banman@hpe.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russ Anderson <russ.anderson@hpe.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dan.j.williams@intel.com
Cc: jgross@suse.com
Cc: kirill.shutemov@linux.intel.com
Cc: mhocko@suse.com
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/lkml/20180524201711.732785782@stormcage.americas.sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-21 16:14:45 +02:00
Thomas Gleixner
2207def700 x86/apic: Ignore secondary threads if nosmt=force
nosmt on the kernel command line merely prevents the onlining of the
secondary SMT siblings.

nosmt=force makes the APIC detection code ignore the secondary SMT siblings
completely, so they even do not show up as possible CPUs. That reduces the
amount of memory allocations for per cpu variables and saves other
resources from being allocated too large.

This is not fully equivalent to disabling SMT in the BIOS because the low
level SMT enabling in the BIOS can result in partitioning of resources
between the siblings, which is not undone by just ignoring them. Some CPUs
can use the full resources when their sibling is not onlined, but this is
depending on the CPU family and model and it's not well documented whether
this applies to all partitioned resources. That means depending on the
workload disabling SMT in the BIOS might result in better performance.

Linus analysis of the Intel manual:

  The intel optimization manual is not very clear on what the partitioning
  rules are.

  I find:

    "In general, the buffers for staging instructions between major pipe
     stages  are partitioned. These buffers include µop queues after the
     execution trace cache, the queues after the register rename stage, the
     reorder buffer which stages instructions for retirement, and the load
     and store buffers.

     In the case of load and store buffers, partitioning also provided an
     easier implementation to maintain memory ordering for each logical
     processor and detect memory ordering violations"

  but some of that partitioning may be relaxed if the HT thread is "not
  active":

    "In Intel microarchitecture code name Sandy Bridge, the micro-op queue
     is statically partitioned to provide 28 entries for each logical
     processor,  irrespective of software executing in single thread or
     multiple threads. If one logical processor is not active in Intel
     microarchitecture code name Ivy Bridge, then a single thread executing
     on that processor  core can use the 56 entries in the micro-op queue"

  but I do not know what "not active" means, and how dynamic it is. Some of
  that partitioning may be entirely static and depend on the early BIOS
  disabling of HT, and even if we park the cores, the resources will just be
  wasted.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
2018-06-21 14:21:00 +02:00
Thomas Gleixner
1e1d7e25fd x86/cpu/AMD: Evaluate smp_num_siblings early
To support force disabling of SMT it's required to know the number of
thread siblings early. amd_get_topology() cannot be called before the APIC
driver is selected, so split out the part which initializes
smp_num_siblings and invoke it from amd_early_init().

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@kernel.org>
2018-06-21 14:21:00 +02:00
Borislav Petkov
119bff8a9c x86/CPU/AMD: Do not check CPUID max ext level before parsing SMP info
Old code used to check whether CPUID ext max level is >= 0x80000008 because
that last leaf contains the number of cores of the physical CPU.  The three
functions called there now do not depend on that leaf anymore so the check
can go.

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@kernel.org>
2018-06-21 14:20:59 +02:00
Thomas Gleixner
1910ad5624 x86/cpu/intel: Evaluate smp_num_siblings early
Make use of the new early detection function to initialize smp_num_siblings
on the boot cpu before the MP-Table or ACPI/MADT scan happens. That's
required for force disabling SMT.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
2018-06-21 14:20:59 +02:00
Thomas Gleixner
95f3d39ccf x86/cpu/topology: Provide detect_extended_topology_early()
To support force disabling of SMT it's required to know the number of
thread siblings early. detect_extended_topology() cannot be called before
the APIC driver is selected, so split out the part which initializes
smp_num_siblings.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
2018-06-21 14:20:59 +02:00
Thomas Gleixner
545401f444 x86/cpu/common: Provide detect_ht_early()
To support force disabling of SMT it's required to know the number of
thread siblings early. detect_ht() cannot be called before the APIC driver
is selected, so split out the part which initializes smp_num_siblings.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
2018-06-21 14:20:59 +02:00
Thomas Gleixner
44ca36de56 x86/cpu/AMD: Remove the pointless detect_ht() call
Real 32bit AMD CPUs do not have SMT and the only value of the call was to
reach the magic printout which got removed.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
2018-06-21 14:20:58 +02:00
Thomas Gleixner
55e6d279ab x86/cpu: Remove the pointless CPU printout
The value of this printout is dubious at best and there is no point in
having it in two different places along with convoluted ways to reach it.

Remove it completely.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
2018-06-21 14:20:58 +02:00
Thomas Gleixner
f048c399e0 x86/topology: Provide topology_smt_supported()
Provide information whether SMT is supoorted by the CPUs. Preparatory patch
for SMT control mechanism.

Suggested-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@kernel.org>
2018-06-21 14:20:57 +02:00
Thomas Gleixner
6a4d2657e0 x86/smp: Provide topology_is_primary_thread()
If the CPU is supporting SMT then the primary thread can be found by
checking the lower APIC ID bits for zero. smp_num_siblings is used to build
the mask for the APIC ID bits which need to be taken into account.

This uses the MPTABLE or ACPI/MADT supplied APIC ID, which can be different
than the initial APIC ID in CPUID. But according to AMD the lower bits have
to be consistent. Intel gave a tentative confirmation as well.

Preparatory patch to support disabling SMT at boot/runtime.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
2018-06-21 14:20:57 +02:00
Jiri Kosina
6cb2b08ff9 x86/pti: Don't report XenPV as vulnerable
Xen PV domain kernel is not by design affected by meltdown as it's
enforcing split CR3 itself. Let's not report such systems as "Vulnerable"
in sysfs (we're also already forcing PTI to off in X86_HYPER_XEN_PV cases);
the security of the system ultimately depends on presence of mitigation in
the Hypervisor, which can't be easily detected from DomU; let's report
that.

Reported-and-tested-by: Mike Latimer <mlatimer@suse.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Juergen Gross <jgross@suse.com>
Cc: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/nycvar.YFH.7.76.1806180959080.6203@cbobk.fhfr.pm
[ Merge the user-visible string into a single line. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-21 14:14:52 +02:00
Masami Hiramatsu
2bbda764d7 kprobes/x86: Do not disable preempt on int3 path
Since int3 and debug exception(for singlestep) are run with
IRQ disabled and while running single stepping we drop IF
from regs->flags, that path must not be preemptible. So we
can remove the preempt disable/enable calls from that path.

Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: linux-arch@vger.kernel.org
Cc: linux-doc@vger.kernel.org
Link: https://lore.kernel.org/lkml/152942497779.15209.2879580696589868291.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-21 12:33:20 +02:00
Masami Hiramatsu
cce188bd58 bpf/error-inject/kprobes: Clear current_kprobe and enable preempt in kprobe
Clear current_kprobe and enable preemption in kprobe
even if pre_handler returns !0.

This simplifies function override using kprobes.

Jprobe used to require to keep the preemption disabled and
keep current_kprobe until it returned to original function
entry. For this reason kprobe_int3_handler() and similar
arch dependent kprobe handers checks pre_handler result
and exit without enabling preemption if the result is !0.

After removing the jprobe, Kprobes does not need to
keep preempt disabled even if user handler returns !0
anymore.

But since the function override handler in error-inject
and bpf is also returns !0 if it overrides a function,
to balancing the preempt count, it enables preemption
and reset current kprobe by itself.

That is a bad design that is very buggy. This fixes
such unbalanced preempt-count and current_kprobes setting
in kprobes, bpf and error-inject.

Note: for powerpc and x86, this removes all preempt_disable
from kprobe_ftrace_handler because ftrace callbacks are
called under preempt disabled.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: Josef Bacik <jbacik@fb.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: linux-arch@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-ia64@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: linux-s390@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Cc: linux-snps-arc@lists.infradead.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: sparclinux@vger.kernel.org
Link: https://lore.kernel.org/lkml/152942494574.15209.12323837825873032258.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-21 12:33:19 +02:00
Masami Hiramatsu
e704e34cd0 kprobes/x86: Don't call the ->break_handler() in x86 kprobes
Don't call the ->break_handler() and remove break_handler
related code from x86 since that was only used by jprobe
which got removed.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: linux-arch@vger.kernel.org
Link: https://lore.kernel.org/lkml/152942465549.15209.15889693025972771135.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-21 12:33:13 +02:00
Masami Hiramatsu
80006dbee6 kprobes/x86: Remove jprobe implementation
Remove arch dependent setjump/longjump functions
and unused fields in kprobe_ctlblk for jprobes
from arch/x86.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: linux-arch@vger.kernel.org
Link: https://lore.kernel.org/lkml/152942433578.15209.14034551799624757792.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-21 12:33:05 +02:00
Konrad Rzeszutek Wilk
56563f53d3 x86/bugs: Move the l1tf function and define pr_fmt properly
The pr_warn in l1tf_select_mitigation would have used the prior pr_fmt
which was defined as "Spectre V2 : ".

Move the function to be past SSBD and also define the pr_fmt.

Fixes: 17dbca1193 ("x86/speculation/l1tf: Add sysfs reporting for l1tf")
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2018-06-21 08:38:34 +02:00
Andi Kleen
17dbca1193 x86/speculation/l1tf: Add sysfs reporting for l1tf
L1TF core kernel workarounds are cheap and normally always enabled, However
they still should be reported in sysfs if the system is vulnerable or
mitigated. Add the necessary CPU feature/bug bits.

- Extend the existing checks for Meltdowns to determine if the system is
  vulnerable. All CPUs which are not vulnerable to Meltdown are also not
  vulnerable to L1TF

- Check for 32bit non PAE and emit a warning as there is no practical way
  for mitigation due to the limited physical address bits

- If the system has more than MAX_PA/2 physical memory the invert page
  workarounds don't protect the system against the L1TF attack anymore,
  because an inverted physical address will also point to valid
  memory. Print a warning in this case and report that the system is
  vulnerable.

Add a function which returns the PFN limit for the L1TF mitigation, which
will be used in follow up patches for sanity and range checks.

[ tglx: Renamed the CPU feature bit to L1TF_PTEINV ]

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
2018-06-20 19:10:00 +02:00
Andi Kleen
10a70416e1 x86/speculation/l1tf: Make sure the first page is always reserved
The L1TF workaround doesn't make any attempt to mitigate speculate accesses
to the first physical page for zeroed PTEs. Normally it only contains some
data from the early real mode BIOS.

It's not entirely clear that the first page is reserved in all
configurations, so add an extra reservation call to make sure it is really
reserved. In most configurations (e.g.  with the standard reservations)
it's likely a nop.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
2018-06-20 19:10:00 +02:00