// SPDX-License-Identifier: GPL-2.0-or-later
/*
 * TLB flush routines for radix kernels.
 *
 * Copyright 2015-2016, Aneesh Kumar K.V, IBM Corporation.
 */

#include <linux/mm.h>
#include <linux/hugetlb.h>
#include <linux/memblock.h>
#include <linux/mmu_context.h>
#include <linux/sched/mm.h>
#include <linux/debugfs.h>

#include <asm/ppc-opcode.h>
#include <asm/tlb.h>
#include <asm/tlbflush.h>
#include <asm/trace.h>
#include <asm/cputhreads.h>
#include <asm/plpar_wrappers.h>

#include "internal.h"

/*
 * tlbiel instruction for radix, set invalidation
 * i.e., r=1 and is=01 or is=10 or is=11
 */
static __always_inline void tlbiel_radix_set_isa300(unsigned int set, unsigned int is,
                                        unsigned int pid,
                                        unsigned int ric, unsigned int prs)
{
        unsigned long rb;
        unsigned long rs;

        rb = (set << PPC_BITLSHIFT(51)) | (is << PPC_BITLSHIFT(53));
        rs = ((unsigned long)pid << PPC_BITLSHIFT(31));

        asm volatile(PPC_TLBIEL(%0, %1, %2, %3, 1)
                     : : "r"(rb), "r"(rs), "i"(ric), "i"(prs)
                     : "memory");
}
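
/*
 * Register packing used by the tlbie/tlbiel helpers in this file (a summary
 * of the shifts below, IBM bit numbering with bit 0 as the MSB):
 *
 *   RB: the effective page number occupies bits 0-51
 *       (va & ~PPC_BITMASK(52, 63)); the IS field sits in bits 52-53
 *       (PPC_BIT(53) encodes IS=1, PPC_BIT(52) encodes IS=2); the TLB set
 *       number is shifted so it ends at bit 51, and the AP (page size)
 *       code so it ends at bit 58.
 *   RS: the PID is placed in the upper word (pid << PPC_BITLSHIFT(31));
 *       the partition-scoped forms pass the LPID in the low word (rs = lpid).
 *
 * prs selects process (1) versus partition (0) scope, and r = 1 selects the
 * radix format, matching the "r=1" note in the comment above.
 */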

static void tlbiel_all_isa300(unsigned int num_sets, unsigned int is)
{
        unsigned int set;

        asm volatile("ptesync": : :"memory");

        /*
         * Flush the first set of the TLB, and the entire Page Walk Cache
         * and partition table entries. Then flush the remaining sets of the
         * TLB.
         */

        if (early_cpu_has_feature(CPU_FTR_HVMODE)) {
                /* MSR[HV] should flush partition scope translations first. */
                tlbiel_radix_set_isa300(0, is, 0, RIC_FLUSH_ALL, 0);

                if (!early_cpu_has_feature(CPU_FTR_ARCH_31)) {
                        for (set = 1; set < num_sets; set++)
                                tlbiel_radix_set_isa300(set, is, 0,
                                                        RIC_FLUSH_TLB, 0);
                }
        }

        /* Flush process scoped entries. */
        tlbiel_radix_set_isa300(0, is, 0, RIC_FLUSH_ALL, 1);

        if (!early_cpu_has_feature(CPU_FTR_ARCH_31)) {
                for (set = 1; set < num_sets; set++)
                        tlbiel_radix_set_isa300(set, is, 0, RIC_FLUSH_TLB, 1);
        }

        ppc_after_tlbiel_barrier();
}
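
/*
 * radix__tlbiel_all() flushes the entire TLB on the local CPU. It is the
 * heavyweight "clean everything" path, intended for situations such as
 * early boot (stale entries left behind by e.g. kexec) and machine check
 * recovery, rather than for normal address space management.
 */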

void radix__tlbiel_all(unsigned int action)
{
        unsigned int is;

        switch (action) {
        case TLB_INVAL_SCOPE_GLOBAL:
                is = 3;
                break;
        case TLB_INVAL_SCOPE_LPID:
                is = 2;
                break;
        default:
                BUG();
        }

        if (early_cpu_has_feature(CPU_FTR_ARCH_300))
                tlbiel_all_isa300(POWER9_TLB_SETS_RADIX, is);
        else
                WARN(1, "%s called on pre-POWER9 CPU\n", __func__);

        asm volatile(PPC_ISA_3_0_INVALIDATE_ERAT "; isync" : : :"memory");
}

static __always_inline void __tlbiel_pid(unsigned long pid, int set,
                                unsigned long ric)
{
        unsigned long rb,rs,prs,r;

        rb = PPC_BIT(53); /* IS = 1 */
        rb |= set << PPC_BITLSHIFT(51);
        rs = ((unsigned long)pid) << PPC_BITLSHIFT(31);
        prs = 1; /* process scoped */
        r = 1; /* radix format */

        asm volatile(PPC_TLBIEL(%0, %4, %3, %2, %1)
                     : : "r"(rb), "i"(r), "i"(prs), "i"(ric), "r"(rs) : "memory");
        trace_tlbie(0, 1, rb, rs, ric, prs, r);
}

static __always_inline void __tlbie_pid(unsigned long pid, unsigned long ric)
{
        unsigned long rb,rs,prs,r;

        rb = PPC_BIT(53); /* IS = 1 */
        rs = pid << PPC_BITLSHIFT(31);
        prs = 1; /* process scoped */
        r = 1; /* radix format */

        asm volatile(PPC_TLBIE_5(%0, %4, %3, %2, %1)
                     : : "r"(rb), "i"(r), "i"(prs), "i"(ric), "r"(rs) : "memory");
        trace_tlbie(0, 0, rb, rs, ric, prs, r);
}

static __always_inline void __tlbie_lpid(unsigned long lpid, unsigned long ric)
{
        unsigned long rb,rs,prs,r;

        rb = PPC_BIT(52); /* IS = 2 */
        rs = lpid;
        prs = 0; /* partition scoped */
        r = 1; /* radix format */

        asm volatile(PPC_TLBIE_5(%0, %4, %3, %2, %1)
                     : : "r"(rb), "i"(r), "i"(prs), "i"(ric), "r"(rs) : "memory");
        trace_tlbie(lpid, 0, rb, rs, ric, prs, r);
}

static __always_inline void __tlbie_lpid_guest(unsigned long lpid, unsigned long ric)
{
        unsigned long rb,rs,prs,r;

        rb = PPC_BIT(52); /* IS = 2 */
        rs = lpid;
        prs = 1; /* process scoped */
        r = 1; /* radix format */

        asm volatile(PPC_TLBIE_5(%0, %4, %3, %2, %1)
                     : : "r"(rb), "i"(r), "i"(prs), "i"(ric), "r"(rs) : "memory");
        trace_tlbie(lpid, 0, rb, rs, ric, prs, r);
}
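
/*
 * Naming convention for the low-level helpers above and below: __tlbiel_*
 * issue the local (tlbiel) form of the invalidation, while __tlbie_* issue
 * the broadcast (tlbie) form; the *_pid variants invalidate by process ID,
 * the *_lpid variants by partition ID, and the *_va variants target a
 * single effective address. Callers supply the surrounding ptesync and
 * eieio; tlbsync; ptesync sequences.
 */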

static __always_inline void __tlbiel_va(unsigned long va, unsigned long pid,
                                        unsigned long ap, unsigned long ric)
{
        unsigned long rb,rs,prs,r;

        rb = va & ~(PPC_BITMASK(52, 63));
        rb |= ap << PPC_BITLSHIFT(58);
        rs = pid << PPC_BITLSHIFT(31);
        prs = 1; /* process scoped */
        r = 1; /* radix format */

        asm volatile(PPC_TLBIEL(%0, %4, %3, %2, %1)
                     : : "r"(rb), "i"(r), "i"(prs), "i"(ric), "r"(rs) : "memory");
        trace_tlbie(0, 1, rb, rs, ric, prs, r);
}

static __always_inline void __tlbie_va(unsigned long va, unsigned long pid,
                                       unsigned long ap, unsigned long ric)
{
        unsigned long rb,rs,prs,r;

        rb = va & ~(PPC_BITMASK(52, 63));
        rb |= ap << PPC_BITLSHIFT(58);
        rs = pid << PPC_BITLSHIFT(31);
        prs = 1; /* process scoped */
        r = 1; /* radix format */

        asm volatile(PPC_TLBIE_5(%0, %4, %3, %2, %1)
                     : : "r"(rb), "i"(r), "i"(prs), "i"(ric), "r"(rs) : "memory");
        trace_tlbie(0, 0, rb, rs, ric, prs, r);
}

static __always_inline void __tlbie_lpid_va(unsigned long va, unsigned long lpid,
                                            unsigned long ap, unsigned long ric)
{
        unsigned long rb,rs,prs,r;

        rb = va & ~(PPC_BITMASK(52, 63));
        rb |= ap << PPC_BITLSHIFT(58);
        rs = lpid;
        prs = 0; /* partition scoped */
        r = 1; /* radix format */

        asm volatile(PPC_TLBIE_5(%0, %4, %3, %2, %1)
                     : : "r"(rb), "i"(r), "i"(prs), "i"(ric), "r"(rs) : "memory");
        trace_tlbie(lpid, 0, rb, rs, ric, prs, r);
}
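
/*
 * The fixup_tlbie_* helpers below appear to work around POWER9 tlbie
 * errata: with CPU_FTR_P9_TLBIE_ERAT_BUG an extra flush using PID/LPID 0
 * is issued first, and with CPU_FTR_P9_TLBIE_STQ_BUG the original flush is
 * simply repeated after a ptesync. Callers run one of these after the
 * primary tlbie and before the final eieio; tlbsync; ptesync.
 */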

static inline void fixup_tlbie_va(unsigned long va, unsigned long pid,
                                  unsigned long ap)
{
        if (cpu_has_feature(CPU_FTR_P9_TLBIE_ERAT_BUG)) {
                asm volatile("ptesync": : :"memory");
                __tlbie_va(va, 0, ap, RIC_FLUSH_TLB);
        }

        if (cpu_has_feature(CPU_FTR_P9_TLBIE_STQ_BUG)) {
                asm volatile("ptesync": : :"memory");
                __tlbie_va(va, pid, ap, RIC_FLUSH_TLB);
        }
}

static inline void fixup_tlbie_va_range(unsigned long va, unsigned long pid,
                                        unsigned long ap)
{
        if (cpu_has_feature(CPU_FTR_P9_TLBIE_ERAT_BUG)) {
                asm volatile("ptesync": : :"memory");
                __tlbie_pid(0, RIC_FLUSH_TLB);
        }

        if (cpu_has_feature(CPU_FTR_P9_TLBIE_STQ_BUG)) {
                asm volatile("ptesync": : :"memory");
                __tlbie_va(va, pid, ap, RIC_FLUSH_TLB);
        }
}

static inline void fixup_tlbie_pid(unsigned long pid)
{
        /*
         * We can use any address for the invalidation, pick one which is
         * probably unused as an optimisation.
         */
        unsigned long va = ((1UL << 52) - 1);

        if (cpu_has_feature(CPU_FTR_P9_TLBIE_ERAT_BUG)) {
                asm volatile("ptesync": : :"memory");
                __tlbie_pid(0, RIC_FLUSH_TLB);
        }

        if (cpu_has_feature(CPU_FTR_P9_TLBIE_STQ_BUG)) {
                asm volatile("ptesync": : :"memory");
                __tlbie_va(va, pid, mmu_get_ap(MMU_PAGE_64K), RIC_FLUSH_TLB);
        }
}

static inline void fixup_tlbie_lpid_va(unsigned long va, unsigned long lpid,
                                       unsigned long ap)
{
        if (cpu_has_feature(CPU_FTR_P9_TLBIE_ERAT_BUG)) {
                asm volatile("ptesync": : :"memory");
                __tlbie_lpid_va(va, 0, ap, RIC_FLUSH_TLB);
        }

        if (cpu_has_feature(CPU_FTR_P9_TLBIE_STQ_BUG)) {
                asm volatile("ptesync": : :"memory");
                __tlbie_lpid_va(va, lpid, ap, RIC_FLUSH_TLB);
        }
}

static inline void fixup_tlbie_lpid(unsigned long lpid)
{
        /*
         * We can use any address for the invalidation, pick one which is
         * probably unused as an optimisation.
         */
        unsigned long va = ((1UL << 52) - 1);

        if (cpu_has_feature(CPU_FTR_P9_TLBIE_ERAT_BUG)) {
                asm volatile("ptesync": : :"memory");
                __tlbie_lpid(0, RIC_FLUSH_TLB);
        }

        if (cpu_has_feature(CPU_FTR_P9_TLBIE_STQ_BUG)) {
                asm volatile("ptesync": : :"memory");
                __tlbie_lpid_va(va, lpid, mmu_get_ap(MMU_PAGE_64K), RIC_FLUSH_TLB);
        }
}
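
/*
 * The _tlbiel_* and _tlbie_* wrappers below add the required ordering
 * around the raw instructions: a ptesync beforehand so that earlier PTE
 * updates are ordered before the invalidation, then ppc_after_tlbiel_barrier()
 * for the local tlbiel forms, or eieio; tlbsync; ptesync to wait for
 * completion of the broadcast tlbie forms.
 */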

/*
 * We use 128 sets in radix mode and 256 sets in hash (HPT) mode.
 */
static inline void _tlbiel_pid(unsigned long pid, unsigned long ric)
{
        int set;

        asm volatile("ptesync": : :"memory");

        switch (ric) {
        case RIC_FLUSH_PWC:
                /* For PWC, only one flush is needed */
                __tlbiel_pid(pid, 0, RIC_FLUSH_PWC);
                ppc_after_tlbiel_barrier();
                return;
        case RIC_FLUSH_TLB:
                __tlbiel_pid(pid, 0, RIC_FLUSH_TLB);
                break;
        case RIC_FLUSH_ALL:
        default:
                /*
                 * Flush the first set of the TLB, and if we're doing a
                 * RIC_FLUSH_ALL, also flush the entire Page Walk Cache.
                 */
                __tlbiel_pid(pid, 0, RIC_FLUSH_ALL);
        }

        if (!cpu_has_feature(CPU_FTR_ARCH_31)) {
                /* For the remaining sets, just flush the TLB */
                for (set = 1; set < POWER9_TLB_SETS_RADIX; set++)
                        __tlbiel_pid(pid, set, RIC_FLUSH_TLB);
        }

        ppc_after_tlbiel_barrier();
        asm volatile(PPC_RADIX_INVALIDATE_ERAT_USER "; isync" : : :"memory");
}

static inline void _tlbie_pid(unsigned long pid, unsigned long ric)
{
        asm volatile("ptesync": : :"memory");

        /*
         * Work around the fact that the "ric" argument to __tlbie_pid
         * must be a compile-time constant to match the "i" constraint
         * in the asm statement.
         */
        switch (ric) {
        case RIC_FLUSH_TLB:
                __tlbie_pid(pid, RIC_FLUSH_TLB);
                fixup_tlbie_pid(pid);
                break;
        case RIC_FLUSH_PWC:
                __tlbie_pid(pid, RIC_FLUSH_PWC);
                break;
        case RIC_FLUSH_ALL:
        default:
                __tlbie_pid(pid, RIC_FLUSH_ALL);
                fixup_tlbie_pid(pid);
        }
        asm volatile("eieio; tlbsync; ptesync": : :"memory");
}

struct tlbiel_pid {
        unsigned long pid;
        unsigned long ric;
};

static void do_tlbiel_pid(void *info)
{
        struct tlbiel_pid *t = info;

        if (t->ric == RIC_FLUSH_TLB)
                _tlbiel_pid(t->pid, RIC_FLUSH_TLB);
        else if (t->ric == RIC_FLUSH_PWC)
                _tlbiel_pid(t->pid, RIC_FLUSH_PWC);
        else
                _tlbiel_pid(t->pid, RIC_FLUSH_ALL);
}

static inline void _tlbiel_pid_multicast(struct mm_struct *mm,
                                unsigned long pid, unsigned long ric)
{
        struct cpumask *cpus = mm_cpumask(mm);
        struct tlbiel_pid t = { .pid = pid, .ric = ric };

        on_each_cpu_mask(cpus, do_tlbiel_pid, &t, 1);
        /*
         * Always want the CPU translations to be invalidated with tlbiel in
         * these paths, so while coprocessors must use tlbie, we can not
         * optimise away the tlbiel component.
         */
        if (atomic_read(&mm->context.copros) > 0)
                _tlbie_pid(pid, RIC_FLUSH_ALL);
}
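
/*
 * The *_multicast variants in this file push the flush out with an IPI:
 * on_each_cpu_mask() runs the local tlbiel handler on every CPU in the
 * mm_cpumask (including the current CPU), which avoids broadcasting tlbie
 * on the fabric. A broadcast tlbie is still issued when coprocessors are
 * attached to the context, since they cannot observe tlbiel.
 */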

static inline void _tlbie_lpid(unsigned long lpid, unsigned long ric)
{
        asm volatile("ptesync": : :"memory");

        /*
         * Work around the fact that the "ric" argument to __tlbie_lpid
         * must be a compile-time constant to match the "i" constraint
         * in the asm statement.
         */
        switch (ric) {
        case RIC_FLUSH_TLB:
                __tlbie_lpid(lpid, RIC_FLUSH_TLB);
                fixup_tlbie_lpid(lpid);
                break;
        case RIC_FLUSH_PWC:
                __tlbie_lpid(lpid, RIC_FLUSH_PWC);
                break;
        case RIC_FLUSH_ALL:
        default:
                __tlbie_lpid(lpid, RIC_FLUSH_ALL);
                fixup_tlbie_lpid(lpid);
        }
        asm volatile("eieio; tlbsync; ptesync": : :"memory");
}

static __always_inline void _tlbie_lpid_guest(unsigned long lpid, unsigned long ric)
{
        /*
         * Work around the fact that the "ric" argument to __tlbie_lpid_guest
         * must be a compile-time constant to match the "i" constraint
         * in the asm statement.
         */
        switch (ric) {
        case RIC_FLUSH_TLB:
                __tlbie_lpid_guest(lpid, RIC_FLUSH_TLB);
                break;
        case RIC_FLUSH_PWC:
                __tlbie_lpid_guest(lpid, RIC_FLUSH_PWC);
                break;
        case RIC_FLUSH_ALL:
        default:
                __tlbie_lpid_guest(lpid, RIC_FLUSH_ALL);
        }
        fixup_tlbie_lpid(lpid);
        asm volatile("eieio; tlbsync; ptesync": : :"memory");
}

static inline void __tlbiel_va_range(unsigned long start, unsigned long end,
                                     unsigned long pid, unsigned long page_size,
                                     unsigned long psize)
{
        unsigned long addr;
        unsigned long ap = mmu_get_ap(psize);

        for (addr = start; addr < end; addr += page_size)
                __tlbiel_va(addr, pid, ap, RIC_FLUSH_TLB);
}

static __always_inline void _tlbiel_va(unsigned long va, unsigned long pid,
                                       unsigned long psize, unsigned long ric)
{
        unsigned long ap = mmu_get_ap(psize);

        asm volatile("ptesync": : :"memory");
        __tlbiel_va(va, pid, ap, ric);
        ppc_after_tlbiel_barrier();
}

static inline void _tlbiel_va_range(unsigned long start, unsigned long end,
                                    unsigned long pid, unsigned long page_size,
                                    unsigned long psize, bool also_pwc)
{
        asm volatile("ptesync": : :"memory");
        if (also_pwc)
                __tlbiel_pid(pid, 0, RIC_FLUSH_PWC);
        __tlbiel_va_range(start, end, pid, page_size, psize);
        ppc_after_tlbiel_barrier();
}

static inline void __tlbie_va_range(unsigned long start, unsigned long end,
                                    unsigned long pid, unsigned long page_size,
                                    unsigned long psize)
{
        unsigned long addr;
        unsigned long ap = mmu_get_ap(psize);

        for (addr = start; addr < end; addr += page_size)
                __tlbie_va(addr, pid, ap, RIC_FLUSH_TLB);

        fixup_tlbie_va_range(addr - page_size, pid, ap);
}
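
/*
 * Note that __tlbie_va_range() runs the errata fixup only once, for the
 * last address of the batch (addr - page_size), rather than after every
 * tlbie in the loop, on the assumption that one extra flush per batch is
 * sufficient before the caller issues tlbsync.
 */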

static __always_inline void _tlbie_va(unsigned long va, unsigned long pid,
                                      unsigned long psize, unsigned long ric)
{
        unsigned long ap = mmu_get_ap(psize);

        asm volatile("ptesync": : :"memory");
        __tlbie_va(va, pid, ap, ric);
        fixup_tlbie_va(va, pid, ap);
        asm volatile("eieio; tlbsync; ptesync": : :"memory");
}

struct tlbiel_va {
        unsigned long pid;
        unsigned long va;
        unsigned long psize;
        unsigned long ric;
};

static void do_tlbiel_va(void *info)
{
        struct tlbiel_va *t = info;

        if (t->ric == RIC_FLUSH_TLB)
                _tlbiel_va(t->va, t->pid, t->psize, RIC_FLUSH_TLB);
        else if (t->ric == RIC_FLUSH_PWC)
                _tlbiel_va(t->va, t->pid, t->psize, RIC_FLUSH_PWC);
        else
                _tlbiel_va(t->va, t->pid, t->psize, RIC_FLUSH_ALL);
}

static inline void _tlbiel_va_multicast(struct mm_struct *mm,
                                unsigned long va, unsigned long pid,
                                unsigned long psize, unsigned long ric)
{
        struct cpumask *cpus = mm_cpumask(mm);
        struct tlbiel_va t = { .va = va, .pid = pid, .psize = psize, .ric = ric };

        on_each_cpu_mask(cpus, do_tlbiel_va, &t, 1);
        if (atomic_read(&mm->context.copros) > 0)
                _tlbie_va(va, pid, psize, RIC_FLUSH_TLB);
}
|
|
|
|
|
|
|
|
struct tlbiel_va_range {
|
|
|
|
unsigned long pid;
|
|
|
|
unsigned long start;
|
|
|
|
unsigned long end;
|
|
|
|
unsigned long page_size;
|
|
|
|
unsigned long psize;
|
|
|
|
bool also_pwc;
|
|
|
|
};
|
|
|
|
|
|
|
|
static void do_tlbiel_va_range(void *info)
|
|
|
|
{
|
|
|
|
struct tlbiel_va_range *t = info;
|
|
|
|
|
|
|
|
_tlbiel_va_range(t->start, t->end, t->pid, t->page_size,
|
|
|
|
t->psize, t->also_pwc);
|
|
|
|
}
|
|
|
|
|
2019-05-21 22:13:24 +09:00
|
|
|
static __always_inline void _tlbie_lpid_va(unsigned long va, unsigned long lpid,
|
2018-05-09 12:20:18 +10:00
|
|
|
unsigned long psize, unsigned long ric)
|
|
|
|
{
|
|
|
|
unsigned long ap = mmu_get_ap(psize);
|
|
|
|
|
|
|
|
asm volatile("ptesync": : :"memory");
|
|
|
|
__tlbie_lpid_va(va, lpid, ap, ric);
|
2019-09-24 09:22:53 +05:30
|
|
|
fixup_tlbie_lpid_va(va, lpid, ap);
|
2018-05-09 12:20:18 +10:00
|
|
|
asm volatile("eieio; tlbsync; ptesync": : :"memory");
|
|
|
|
}
|
|
|
|
|
2017-11-07 18:53:06 +11:00
|
|
|
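/*
 * Broadcast invalidation of an EA range, optionally flushing the page
 * walk cache for the PID first when also_pwc is set.
 */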
static inline void _tlbie_va_range(unsigned long start, unsigned long end,
|
|
|
|
unsigned long pid, unsigned long page_size,
|
2017-11-07 18:53:09 +11:00
|
|
|
unsigned long psize, bool also_pwc)
|
2017-11-07 18:53:06 +11:00
|
|
|
{
|
|
|
|
asm volatile("ptesync": : :"memory");
|
2017-11-07 18:53:09 +11:00
|
|
|
if (also_pwc)
|
|
|
|
__tlbie_pid(pid, RIC_FLUSH_PWC);
|
powerpc/64s/radix: Optimize flush_tlb_range
Currently for radix, flush_tlb_range flushes the entire PID, because
the Linux mm code does not tell us about page size here for THP vs
regular pages. This is quite sub-optimal for small mremap / mprotect
/ change_protection.
So implement va range flushes with two flush passes, one for each
page size (regular and THP). The second flush has an order of magnitude
fewer tlbie instructions than the first, so it is a relatively small
additional cost.
There is still room for improvement here with some changes to generic
APIs, particularly if there are mostly THP pages to be invalidated,
the small page flushes could be reduced.
Time to mprotect 1 page of memory (after mmap, touch):
vanilla 2.9us 1.8us
patched 1.2us 1.6us
Time to mprotect 30 pages of memory (after mmap, touch):
vanilla 8.2us 7.2us
patched 6.9us 17.9us
Time to mprotect 34 pages of memory (after mmap, touch):
vanilla 9.1us 8.0us
patched 9.0us 8.0us
34 pages is the point at which the invalidation switches from va
to entire PID, which tlbie can do in a single instruction. This is
why in the case of 30 pages, the new code runs slower for this test.
This is a deliberate tradeoff already present in the unmap and THP
promotion code; the idea is that the benefit of avoiding a flush of the
entire TLB for this PID on all threads in the system outweighs the cost
of the extra per-page flushes.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-07 18:53:07 +11:00
|
|
|
__tlbie_va_range(start, end, pid, page_size, psize);
|
2017-11-07 18:53:06 +11:00
|
|
|
asm volatile("eieio; tlbsync; ptesync": : :"memory");
|
|
|
|
}
|
2017-11-07 18:53:05 +11:00
|
|
|
|
2019-09-03 01:29:31 +10:00
|
|
|
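/*
 * Flush an EA range on all CPUs in mm_cpumask using IPIs + tlbiel,
 * issuing an additional tlbie when coprocessors are attached to the mm.
 */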
static inline void _tlbiel_va_range_multicast(struct mm_struct *mm,
|
|
|
|
unsigned long start, unsigned long end,
|
|
|
|
unsigned long pid, unsigned long page_size,
|
|
|
|
unsigned long psize, bool also_pwc)
|
|
|
|
{
|
|
|
|
struct cpumask *cpus = mm_cpumask(mm);
|
|
|
|
struct tlbiel_va_range t = { .start = start, .end = end,
|
|
|
|
.pid = pid, .page_size = page_size,
|
|
|
|
.psize = psize, .also_pwc = also_pwc };
|
|
|
|
|
|
|
|
on_each_cpu_mask(cpus, do_tlbiel_va_range, &t, 1);
|
|
|
|
if (atomic_read(&mm->context.copros) > 0)
|
|
|
|
_tlbie_va_range(start, end, pid, page_size, psize, also_pwc);
|
|
|
|
}
|
|
|
|
|
2016-04-29 23:26:05 +10:00
|
|
|
/*
|
|
|
|
* Base TLB flushing operations:
|
|
|
|
*
|
|
|
|
* - flush_tlb_mm(mm) flushes the specified mm context TLB's
|
|
|
|
* - flush_tlb_page(vma, vmaddr) flushes one page
|
|
|
|
* - flush_tlb_range(vma, start, end) flushes a range of pages
|
|
|
|
* - flush_tlb_kernel_range(start, end) flushes kernel pages
|
|
|
|
*
|
|
|
|
* - local_* variants of page and mm only apply to the current
|
|
|
|
* processor
|
|
|
|
*/
|
|
|
|
void radix__local_flush_tlb_mm(struct mm_struct *mm)
|
|
|
|
{
|
2023-02-03 21:17:17 +10:00
|
|
|
unsigned long pid = mm->context.id;
|
|
|
|
|
|
|
|
if (WARN_ON_ONCE(pid == MMU_NO_CONTEXT))
|
|
|
|
return;
|
2016-04-29 23:26:05 +10:00
|
|
|
|
|
|
|
preempt_disable();
|
2023-02-03 21:17:17 +10:00
|
|
|
_tlbiel_pid(pid, RIC_FLUSH_TLB);
|
2016-04-29 23:26:05 +10:00
|
|
|
preempt_enable();
|
|
|
|
}
|
|
|
|
EXPORT_SYMBOL(radix__local_flush_tlb_mm);
|
|
|
|
|
2017-07-19 14:49:05 +10:00
|
|
|
#ifndef CONFIG_SMP
|
2017-09-03 20:15:12 +02:00
|
|
|
void radix__local_flush_all_mm(struct mm_struct *mm)
|
2016-06-08 19:55:51 +05:30
|
|
|
{
|
2023-02-03 21:17:17 +10:00
|
|
|
unsigned long pid = mm->context.id;
|
|
|
|
|
|
|
|
if (WARN_ON_ONCE(pid == MMU_NO_CONTEXT))
|
|
|
|
return;
|
2016-06-08 19:55:51 +05:30
|
|
|
|
|
|
|
preempt_disable();
|
2023-02-03 21:17:17 +10:00
|
|
|
_tlbiel_pid(pid, RIC_FLUSH_ALL);
|
2016-06-08 19:55:51 +05:30
|
|
|
preempt_enable();
|
|
|
|
}
|
2017-09-03 20:15:12 +02:00
|
|
|
EXPORT_SYMBOL(radix__local_flush_all_mm);
|
2020-03-02 11:04:10 +10:00
|
|
|
|
|
|
|
static void __flush_all_mm(struct mm_struct *mm, bool fullmm)
|
|
|
|
{
|
|
|
|
radix__local_flush_all_mm(mm);
|
|
|
|
}
|
2017-07-19 14:49:05 +10:00
|
|
|
#endif /* CONFIG_SMP */
|
2016-06-08 19:55:51 +05:30
|
|
|
|
2016-07-13 15:06:41 +05:30
|
|
|
void radix__local_flush_tlb_page_psize(struct mm_struct *mm, unsigned long vmaddr,
|
2016-07-13 15:06:42 +05:30
|
|
|
int psize)
|
2016-04-29 23:26:05 +10:00
|
|
|
{
|
2023-02-03 21:17:17 +10:00
|
|
|
unsigned long pid = mm->context.id;
|
|
|
|
|
|
|
|
if (WARN_ON_ONCE(pid == MMU_NO_CONTEXT))
|
|
|
|
return;
|
2016-04-29 23:26:05 +10:00
|
|
|
|
|
|
|
preempt_disable();
|
2023-02-03 21:17:17 +10:00
|
|
|
_tlbiel_va(vmaddr, pid, psize, RIC_FLUSH_TLB);
|
2016-04-29 23:26:05 +10:00
|
|
|
preempt_enable();
|
|
|
|
}
|
|
|
|
|
|
|
|
void radix__local_flush_tlb_page(struct vm_area_struct *vma, unsigned long vmaddr)
|
|
|
|
{
|
2016-04-29 23:26:25 +10:00
|
|
|
#ifdef CONFIG_HUGETLB_PAGE
|
|
|
|
/* need the return fix for nohash.c */
|
2017-10-16 12:41:00 +05:30
|
|
|
if (is_vm_hugetlb_page(vma))
|
|
|
|
return radix__local_flush_hugetlb_page(vma, vmaddr);
|
2016-04-29 23:26:25 +10:00
|
|
|
#endif
|
2017-10-16 12:41:00 +05:30
|
|
|
radix__local_flush_tlb_page_psize(vma->vm_mm, vmaddr, mmu_virtual_psize);
|
2016-04-29 23:26:05 +10:00
|
|
|
}
|
|
|
|
EXPORT_SYMBOL(radix__local_flush_tlb_page);
|
|
|
|
|
2018-03-23 09:29:06 +11:00
|
|
|
static bool mm_needs_flush_escalation(struct mm_struct *mm)
|
|
|
|
{
|
|
|
|
/*
|
2022-05-25 12:23:56 +10:00
|
|
|
* The P9 nest MMU has issues with the page walk cache caching PTEs
|
|
|
|
* and not flushing them when RIC = 0 for a PID/LPID invalidate.
|
|
|
|
*
|
|
|
|
* This may have been fixed in shipping firmware (by disabling PWC
|
|
|
|
* or preventing it from caching PTEs), but until that is confirmed,
|
|
|
|
* this workaround is required - escalate all RIC=0 IS=1/2/3 flushes
|
|
|
|
* to RIC=2.
|
|
|
|
*
|
|
|
|
* POWER10 (and P9P) does not have this problem.
|
2018-03-23 09:29:06 +11:00
|
|
|
*/
|
2022-05-25 12:23:56 +10:00
|
|
|
if (cpu_has_feature(CPU_FTR_ARCH_31))
|
|
|
|
return false;
|
2018-06-01 20:01:21 +10:00
|
|
|
if (atomic_read(&mm->context.copros) > 0)
|
|
|
|
return true;
|
|
|
|
return false;
|
2018-03-23 09:29:06 +11:00
|
|
|
}
|
|
|
|
|
2020-12-17 23:47:30 +10:00
|
|
|
/*
|
|
|
|
* If always_flush is true, then flush even if this CPU can't be removed
|
|
|
|
* from mm_cpumask.
|
|
|
|
*/
|
|
|
|
void exit_lazy_flush_tlb(struct mm_struct *mm, bool always_flush)
|
2018-06-01 20:01:21 +10:00
|
|
|
{
|
|
|
|
unsigned long pid = mm->context.id;
|
2020-12-17 23:47:25 +10:00
|
|
|
int cpu = smp_processor_id();
|
2018-06-01 20:01:21 +10:00
|
|
|
|
2020-09-14 14:52:19 +10:00
|
|
|
/*
|
|
|
|
* A kthread could have done a mmget_not_zero() after the flushing CPU
|
2020-12-17 23:47:25 +10:00
|
|
|
* checked mm_cpumask, and be in the process of kthread_use_mm when
|
|
|
|
* interrupted here. In that case, current->mm will be set to mm,
|
|
|
|
* because kthread_use_mm() setting ->mm and switching to the mm is
|
|
|
|
* done with interrupts off.
|
2020-09-14 14:52:19 +10:00
|
|
|
*/
|
2018-06-01 20:01:21 +10:00
|
|
|
if (current->mm == mm)
|
2020-12-17 23:47:30 +10:00
|
|
|
goto out;
|
2018-06-01 20:01:21 +10:00
|
|
|
|
|
|
|
if (current->active_mm == mm) {
|
2023-06-07 10:56:00 +10:00
|
|
|
unsigned long flags;
|
|
|
|
|
2020-09-14 14:52:19 +10:00
|
|
|
WARN_ON_ONCE(current->mm != NULL);
|
2023-06-07 10:56:00 +10:00
|
|
|
/*
|
|
|
|
* It is a kernel thread and is using mm as the lazy tlb, so
|
|
|
|
* switch it to init_mm. This is not always called from IPI
|
|
|
|
* (e.g., flush_type_needed), so must disable irqs.
|
|
|
|
*/
|
|
|
|
local_irq_save(flags);
|
2023-02-03 17:18:34 +10:00
|
|
|
mmgrab_lazy_tlb(&init_mm);
|
2018-06-01 20:01:21 +10:00
|
|
|
current->active_mm = &init_mm;
|
2020-09-14 14:52:19 +10:00
|
|
|
switch_mm_irqs_off(mm, &init_mm, current);
|
2023-02-03 17:18:34 +10:00
|
|
|
mmdrop_lazy_tlb(mm);
|
2023-06-07 10:56:00 +10:00
|
|
|
local_irq_restore(flags);
|
2018-06-01 20:01:21 +10:00
|
|
|
}
|
2020-09-14 14:52:19 +10:00
|
|
|
|
2020-12-17 23:47:25 +10:00
|
|
|
/*
|
2020-12-17 23:47:28 +10:00
|
|
|
* This IPI may be initiated from any source including those not
|
|
|
|
* running the mm, so there may be a racing IPI that comes after
|
|
|
|
* this one which finds the cpumask already clear. Check and avoid
|
|
|
|
* underflowing the active_cpus count in that case. The race should
|
|
|
|
* not otherwise be a problem, but the TLB must be flushed because
|
|
|
|
* that's what the caller expects.
|
2020-12-17 23:47:25 +10:00
|
|
|
*/
|
|
|
|
if (cpumask_test_cpu(cpu, mm_cpumask(mm))) {
|
2023-05-24 16:08:19 +10:00
|
|
|
dec_mm_active_cpus(mm);
|
2020-12-17 23:47:25 +10:00
|
|
|
cpumask_clear_cpu(cpu, mm_cpumask(mm));
|
2020-12-17 23:47:30 +10:00
|
|
|
always_flush = true;
|
2020-12-17 23:47:25 +10:00
|
|
|
}
|
2020-09-14 14:52:19 +10:00
|
|
|
|
2020-12-17 23:47:30 +10:00
|
|
|
out:
|
|
|
|
if (always_flush)
|
|
|
|
_tlbiel_pid(pid, RIC_FLUSH_ALL);
|
2018-06-01 20:01:21 +10:00
|
|
|
}
|
|
|
|
|
2020-12-17 23:47:29 +10:00
|
|
|
#ifdef CONFIG_SMP
|
|
|
|
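/* IPI handler: drop this CPU's lazy use of mm and flush it. */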
static void do_exit_flush_lazy_tlb(void *arg)
|
|
|
|
{
|
|
|
|
struct mm_struct *mm = arg;
|
2020-12-17 23:47:30 +10:00
|
|
|
exit_lazy_flush_tlb(mm, true);
|
2020-12-17 23:47:29 +10:00
|
|
|
}
|
|
|
|
|
2018-06-01 20:01:21 +10:00
|
|
|
static void exit_flush_lazy_tlbs(struct mm_struct *mm)
|
|
|
|
{
|
|
|
|
/*
|
|
|
|
* Would be nice if this was async so it could be run in
|
|
|
|
* parallel with our local flush, but generic code does not
|
|
|
|
* give a good API for it. Could extend the generic code or
|
|
|
|
* make a special powerpc IPI for flushing TLBs.
|
|
|
|
* For now it's not too performance critical.
|
|
|
|
*/
|
|
|
|
smp_call_function_many(mm_cpumask(mm), do_exit_flush_lazy_tlb,
|
|
|
|
(void *)mm, 1);
|
|
|
|
}
|
2020-12-17 23:47:29 +10:00
|
|
|
|
2020-12-17 23:47:26 +10:00
|
|
|
#else /* CONFIG_SMP */
|
|
|
|
static inline void exit_flush_lazy_tlbs(struct mm_struct *mm) { }
|
|
|
|
#endif /* CONFIG_SMP */
|
|
|
|
|
2020-12-17 23:47:29 +10:00
|
|
|
static DEFINE_PER_CPU(unsigned int, mm_cpumask_trim_clock);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Interval between flushes at which we send out IPIs to check whether the
|
|
|
|
* mm_cpumask can be trimmed for the case where it's not a single-threaded
|
|
|
|
* process flushing its own mm. The intent is to reduce the cost of later
|
|
|
|
* flushes. Don't want this to be so low that it adds noticeable cost to TLB
|
|
|
|
* flushing, or so high that it doesn't help reduce global TLBIEs.
|
|
|
|
*/
|
|
|
|
static unsigned long tlb_mm_cpumask_trim_timer = 1073;
|
|
|
|
|
|
|
|
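/*
 * Per-CPU counter that returns true once every tlb_mm_cpumask_trim_timer
 * calls, used to rate-limit attempts to trim mm_cpumask.
 */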
static bool tick_and_test_trim_clock(void)
|
|
|
|
{
|
|
|
|
if (__this_cpu_inc_return(mm_cpumask_trim_clock) ==
|
|
|
|
tlb_mm_cpumask_trim_timer) {
|
|
|
|
__this_cpu_write(mm_cpumask_trim_clock, 0);
|
|
|
|
return true;
|
|
|
|
}
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
2020-12-17 23:47:26 +10:00
|
|
|
enum tlb_flush_type {
|
powerpc/64s/radix: Check for no TLB flush required
If there are no CPUs in mm_cpumask, no TLB flush is required at all.
This patch adds a check for this case.
Currently it's not tested for; in fact, mm_is_thread_local() returns
false if the current CPU is not in mm_cpumask, so it's treated as a
global flush.
This can come up in some cases like exec failure before the new mm has
ever been switched to. This patch reduces TLBIE instructions required
to build a kernel from about 120,000 to 45,000. Another situation it
could help is page reclaim, KSM, THP, etc. (i.e., async operations
external to the process) where the process is sleeping and has all TLBs
flushed out of all CPUs.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201217134731.488135-4-npiggin@gmail.com
2020-12-17 23:47:27 +10:00
|
|
|
FLUSH_TYPE_NONE,
|
2020-12-17 23:47:26 +10:00
|
|
|
FLUSH_TYPE_LOCAL,
|
|
|
|
FLUSH_TYPE_GLOBAL,
|
|
|
|
};
|
|
|
|
|
|
|
|
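/*
 * Decide whether this flush can be skipped entirely (no CPUs in
 * mm_cpumask), done locally with tlbiel, or must be global. May also
 * opportunistically trim mm_cpumask via exit_lazy_flush_tlb IPIs.
 */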
static enum tlb_flush_type flush_type_needed(struct mm_struct *mm, bool fullmm)
|
|
|
|
{
|
2020-12-17 23:47:27 +10:00
|
|
|
int active_cpus = atomic_read(&mm->context.active_cpus);
|
|
|
|
int cpu = smp_processor_id();
|
|
|
|
|
|
|
|
if (active_cpus == 0)
|
|
|
|
return FLUSH_TYPE_NONE;
|
2020-12-17 23:47:29 +10:00
|
|
|
if (active_cpus == 1 && cpumask_test_cpu(cpu, mm_cpumask(mm))) {
|
|
|
|
if (current->mm != mm) {
|
|
|
|
/*
|
|
|
|
* Asynchronous flush sources may trim down to nothing
|
|
|
|
* if the process is not running, so occasionally try
|
|
|
|
* to trim.
|
|
|
|
*/
|
|
|
|
if (tick_and_test_trim_clock()) {
|
2020-12-17 23:47:30 +10:00
|
|
|
exit_lazy_flush_tlb(mm, true);
|
2020-12-17 23:47:29 +10:00
|
|
|
return FLUSH_TYPE_NONE;
|
|
|
|
}
|
|
|
|
}
|
2020-12-17 23:47:26 +10:00
|
|
|
return FLUSH_TYPE_LOCAL;
|
2020-12-17 23:47:29 +10:00
|
|
|
}
|
2020-12-17 23:47:26 +10:00
|
|
|
|
|
|
|
/* Coprocessors require TLBIE to invalidate nMMU. */
|
|
|
|
if (atomic_read(&mm->context.copros) > 0)
|
|
|
|
return FLUSH_TYPE_GLOBAL;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* In the fullmm case there's no point doing the exit_flush_lazy_tlbs
|
|
|
|
* because the mm is being taken down anyway, and a TLBIE tends to
|
|
|
|
* be faster than an IPI+TLBIEL.
|
|
|
|
*/
|
|
|
|
if (fullmm)
|
|
|
|
return FLUSH_TYPE_GLOBAL;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* If we are running the only thread of a single-threaded process,
|
|
|
|
* then we should almost always be able to trim off the rest of the
|
|
|
|
* CPU mask (except in the case of use_mm() races), so always try
|
|
|
|
* trimming the mask.
|
|
|
|
*/
|
|
|
|
if (atomic_read(&mm->mm_users) <= 1 && current->mm == mm) {
|
|
|
|
exit_flush_lazy_tlbs(mm);
|
|
|
|
/*
|
|
|
|
* use_mm() race could prevent IPIs from being able to clear
|
|
|
|
* the cpumask here, however those users are established
|
|
|
|
* after our first check (and so after the PTEs are removed),
|
|
|
|
* and the TLB still gets flushed by the IPI, so this CPU
|
|
|
|
* will only require a local flush.
|
|
|
|
*/
|
|
|
|
return FLUSH_TYPE_LOCAL;
|
|
|
|
}
|
|
|
|
|
2020-12-17 23:47:29 +10:00
|
|
|
/*
|
|
|
|
* Occasionally try to trim down the cpumask. It's possible this can
|
|
|
|
* bring the mask to zero, which results in no flush.
|
|
|
|
*/
|
|
|
|
if (tick_and_test_trim_clock()) {
|
|
|
|
exit_flush_lazy_tlbs(mm);
|
|
|
|
if (current->mm == mm)
|
|
|
|
return FLUSH_TYPE_LOCAL;
|
|
|
|
if (cpumask_test_cpu(cpu, mm_cpumask(mm)))
|
2020-12-17 23:47:30 +10:00
|
|
|
exit_lazy_flush_tlb(mm, true);
|
2020-12-17 23:47:29 +10:00
|
|
|
return FLUSH_TYPE_NONE;
|
|
|
|
}
|
|
|
|
|
2020-12-17 23:47:26 +10:00
|
|
|
return FLUSH_TYPE_GLOBAL;
|
|
|
|
}
|
2018-06-01 20:01:21 +10:00
|
|
|
|
2020-12-17 23:47:26 +10:00
|
|
|
#ifdef CONFIG_SMP
|
2016-04-29 23:26:05 +10:00
|
|
|
void radix__flush_tlb_mm(struct mm_struct *mm)
|
|
|
|
{
|
2016-06-02 15:14:48 +05:30
|
|
|
unsigned long pid;
|
2020-12-17 23:47:26 +10:00
|
|
|
enum tlb_flush_type type;
|
2016-04-29 23:26:05 +10:00
|
|
|
|
|
|
|
pid = mm->context.id;
|
2023-02-03 21:17:17 +10:00
|
|
|
if (WARN_ON_ONCE(pid == MMU_NO_CONTEXT))
|
2017-10-24 23:06:53 +10:00
|
|
|
return;
|
2016-04-29 23:26:05 +10:00
|
|
|
|
2017-10-24 23:06:53 +10:00
|
|
|
preempt_disable();
|
2018-06-01 20:01:20 +10:00
|
|
|
/*
|
2020-12-17 23:47:26 +10:00
|
|
|
* Order loads of mm_cpumask (in flush_type_needed) vs previous
|
|
|
|
* stores to clear ptes before the invalidate. See barrier in
|
|
|
|
* switch_mm_irqs_off
|
2018-06-01 20:01:20 +10:00
|
|
|
*/
|
|
|
|
smp_mb();
|
2020-12-17 23:47:26 +10:00
|
|
|
type = flush_type_needed(mm, false);
|
2020-12-17 23:47:27 +10:00
|
|
|
if (type == FLUSH_TYPE_LOCAL) {
|
|
|
|
_tlbiel_pid(pid, RIC_FLUSH_TLB);
|
|
|
|
} else if (type == FLUSH_TYPE_GLOBAL) {
|
2020-07-03 11:06:08 +05:30
|
|
|
if (!mmu_has_feature(MMU_FTR_GTSE)) {
|
|
|
|
unsigned long tgt = H_RPTI_TARGET_CMMU;
|
|
|
|
|
|
|
|
if (atomic_read(&mm->context.copros) > 0)
|
|
|
|
tgt |= H_RPTI_TARGET_NMMU;
|
|
|
|
pseries_rpt_invalidate(pid, tgt, H_RPTI_TYPE_TLB,
|
|
|
|
H_RPTI_PAGE_ALL, 0, -1UL);
|
|
|
|
} else if (cputlb_use_tlbie()) {
|
2019-09-03 01:29:31 +10:00
|
|
|
if (mm_needs_flush_escalation(mm))
|
|
|
|
_tlbie_pid(pid, RIC_FLUSH_ALL);
|
|
|
|
else
|
|
|
|
_tlbie_pid(pid, RIC_FLUSH_TLB);
|
|
|
|
} else {
|
|
|
|
_tlbiel_pid_multicast(mm, pid, RIC_FLUSH_TLB);
|
|
|
|
}
|
2018-06-01 20:01:21 +10:00
|
|
|
}
|
2016-04-29 23:26:05 +10:00
|
|
|
preempt_enable();
|
2023-07-25 23:42:07 +10:00
|
|
|
mmu_notifier_arch_invalidate_secondary_tlbs(mm, 0, -1UL);
|
2016-04-29 23:26:05 +10:00
|
|
|
}
|
|
|
|
EXPORT_SYMBOL(radix__flush_tlb_mm);
|
|
|
|
|
2018-06-01 20:01:21 +10:00
|
|
|
static void __flush_all_mm(struct mm_struct *mm, bool fullmm)
|
2016-06-08 19:55:51 +05:30
|
|
|
{
|
|
|
|
unsigned long pid;
|
2020-12-17 23:47:26 +10:00
|
|
|
enum tlb_flush_type type;
|
2016-06-08 19:55:51 +05:30
|
|
|
|
|
|
|
pid = mm->context.id;
|
2023-02-03 21:17:17 +10:00
|
|
|
if (WARN_ON_ONCE(pid == MMU_NO_CONTEXT))
|
2017-10-24 23:06:53 +10:00
|
|
|
return;
|
2016-06-08 19:55:51 +05:30
|
|
|
|
2017-10-24 23:06:53 +10:00
|
|
|
preempt_disable();
|
2018-06-01 20:01:20 +10:00
|
|
|
smp_mb(); /* see radix__flush_tlb_mm */
|
2020-12-17 23:47:26 +10:00
|
|
|
type = flush_type_needed(mm, fullmm);
|
2020-12-17 23:47:27 +10:00
|
|
|
if (type == FLUSH_TYPE_LOCAL) {
|
|
|
|
_tlbiel_pid(pid, RIC_FLUSH_ALL);
|
|
|
|
} else if (type == FLUSH_TYPE_GLOBAL) {
|
2020-07-03 11:06:08 +05:30
|
|
|
if (!mmu_has_feature(MMU_FTR_GTSE)) {
|
|
|
|
unsigned long tgt = H_RPTI_TARGET_CMMU;
|
|
|
|
unsigned long type = H_RPTI_TYPE_TLB | H_RPTI_TYPE_PWC |
|
|
|
|
H_RPTI_TYPE_PRT;
|
|
|
|
|
|
|
|
if (atomic_read(&mm->context.copros) > 0)
|
|
|
|
tgt |= H_RPTI_TARGET_NMMU;
|
|
|
|
pseries_rpt_invalidate(pid, tgt, type,
|
|
|
|
H_RPTI_PAGE_ALL, 0, -1UL);
|
|
|
|
} else if (cputlb_use_tlbie())
|
2019-09-03 01:29:31 +10:00
|
|
|
_tlbie_pid(pid, RIC_FLUSH_ALL);
|
|
|
|
else
|
|
|
|
_tlbiel_pid_multicast(mm, pid, RIC_FLUSH_ALL);
|
2018-06-01 20:01:21 +10:00
|
|
|
}
|
2016-06-08 19:55:51 +05:30
|
|
|
preempt_enable();
|
2023-07-25 23:42:07 +10:00
|
|
|
mmu_notifier_arch_invalidate_secondary_tlbs(mm, 0, -1UL);
|
2016-06-08 19:55:51 +05:30
|
|
|
}
|
2019-10-24 13:28:00 +05:30
|
|
|
|
2018-06-01 20:01:21 +10:00
|
|
|
void radix__flush_all_mm(struct mm_struct *mm)
|
|
|
|
{
|
|
|
|
__flush_all_mm(mm, false);
|
|
|
|
}
|
2017-09-03 20:15:12 +02:00
|
|
|
EXPORT_SYMBOL(radix__flush_all_mm);
|
2017-07-19 14:49:05 +10:00
|
|
|
|
2016-07-13 15:06:41 +05:30
|
|
|
void radix__flush_tlb_page_psize(struct mm_struct *mm, unsigned long vmaddr,
|
2016-07-13 15:06:42 +05:30
|
|
|
int psize)
|
2016-04-29 23:26:05 +10:00
|
|
|
{
|
2016-06-02 15:14:48 +05:30
|
|
|
unsigned long pid;
|
2020-12-17 23:47:26 +10:00
|
|
|
enum tlb_flush_type type;
|
2016-04-29 23:26:05 +10:00
|
|
|
|
2017-10-16 12:41:00 +05:30
|
|
|
pid = mm->context.id;
|
2023-02-03 21:17:17 +10:00
|
|
|
if (WARN_ON_ONCE(pid == MMU_NO_CONTEXT))
|
2017-10-24 23:06:53 +10:00
|
|
|
return;
|
|
|
|
|
|
|
|
preempt_disable();
|
2018-06-01 20:01:20 +10:00
|
|
|
smp_mb(); /* see radix__flush_tlb_mm */
|
2020-12-17 23:47:26 +10:00
|
|
|
type = flush_type_needed(mm, false);
|
2020-12-17 23:47:27 +10:00
|
|
|
if (type == FLUSH_TYPE_LOCAL) {
|
|
|
|
_tlbiel_va(vmaddr, pid, psize, RIC_FLUSH_TLB);
|
|
|
|
} else if (type == FLUSH_TYPE_GLOBAL) {
|
2020-07-03 11:06:08 +05:30
|
|
|
if (!mmu_has_feature(MMU_FTR_GTSE)) {
|
|
|
|
unsigned long tgt, pg_sizes, size;
|
|
|
|
|
|
|
|
tgt = H_RPTI_TARGET_CMMU;
|
|
|
|
pg_sizes = psize_to_rpti_pgsize(psize);
|
|
|
|
size = 1UL << mmu_psize_to_shift(psize);
|
|
|
|
|
|
|
|
if (atomic_read(&mm->context.copros) > 0)
|
|
|
|
tgt |= H_RPTI_TARGET_NMMU;
|
|
|
|
pseries_rpt_invalidate(pid, tgt, H_RPTI_TYPE_TLB,
|
|
|
|
pg_sizes, vmaddr,
|
|
|
|
vmaddr + size);
|
|
|
|
} else if (cputlb_use_tlbie())
|
2019-09-03 01:29:31 +10:00
|
|
|
_tlbie_va(vmaddr, pid, psize, RIC_FLUSH_TLB);
|
|
|
|
else
|
|
|
|
_tlbiel_va_multicast(mm, vmaddr, pid, psize, RIC_FLUSH_TLB);
|
2018-06-01 20:01:21 +10:00
|
|
|
}
|
2016-04-29 23:26:05 +10:00
|
|
|
preempt_enable();
|
|
|
|
}
|
|
|
|
|
|
|
|
void radix__flush_tlb_page(struct vm_area_struct *vma, unsigned long vmaddr)
|
|
|
|
{
|
2016-04-29 23:26:25 +10:00
|
|
|
#ifdef CONFIG_HUGETLB_PAGE
|
2017-10-16 12:41:00 +05:30
|
|
|
if (is_vm_hugetlb_page(vma))
|
|
|
|
return radix__flush_hugetlb_page(vma, vmaddr);
|
2016-04-29 23:26:25 +10:00
|
|
|
#endif
|
2017-10-16 12:41:00 +05:30
|
|
|
radix__flush_tlb_page_psize(vma->vm_mm, vmaddr, mmu_virtual_psize);
|
2016-04-29 23:26:05 +10:00
|
|
|
}
|
|
|
|
EXPORT_SYMBOL(radix__flush_tlb_page);
|
|
|
|
|
|
|
|
#endif /* CONFIG_SMP */
|
|
|
|
|
2019-09-03 01:29:31 +10:00
|
|
|
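/* IPI handler: tlbiel flush of all translations for PID 0 on this CPU. */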
static void do_tlbiel_kernel(void *info)
|
|
|
|
{
|
|
|
|
_tlbiel_pid(0, RIC_FLUSH_ALL);
|
|
|
|
}
|
|
|
|
|
|
|
|
static inline void _tlbiel_kernel_broadcast(void)
|
|
|
|
{
|
|
|
|
on_each_cpu(do_tlbiel_kernel, NULL, 1);
|
|
|
|
if (tlbie_capable) {
|
|
|
|
/*
|
|
|
|
* Coherent accelerators don't refcount kernel memory mappings,
|
|
|
|
* so have to always issue a tlbie for them. This is quite a
|
|
|
|
* slow path anyway.
|
|
|
|
*/
|
|
|
|
_tlbie_pid(0, RIC_FLUSH_ALL);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2019-06-20 14:12:01 +10:00
|
|
|
/*
|
|
|
|
* If kernel TLBIs ever become local rather than global, then
|
|
|
|
* drivers/misc/ocxl/link.c:ocxl_link_add_pe will need some work, as it
|
|
|
|
* assumes kernel TLBIs are global.
|
|
|
|
*/
|
2016-04-29 23:26:05 +10:00
|
|
|
void radix__flush_tlb_kernel_range(unsigned long start, unsigned long end)
|
|
|
|
{
|
2020-07-03 11:06:08 +05:30
|
|
|
if (!mmu_has_feature(MMU_FTR_GTSE)) {
|
|
|
|
unsigned long tgt = H_RPTI_TARGET_CMMU | H_RPTI_TARGET_NMMU;
|
|
|
|
unsigned long type = H_RPTI_TYPE_TLB | H_RPTI_TYPE_PWC |
|
|
|
|
H_RPTI_TYPE_PRT;
|
|
|
|
|
|
|
|
pseries_rpt_invalidate(0, tgt, type, H_RPTI_PAGE_ALL,
|
|
|
|
start, end);
|
|
|
|
} else if (cputlb_use_tlbie())
|
2019-09-03 01:29:31 +10:00
|
|
|
_tlbie_pid(0, RIC_FLUSH_ALL);
|
|
|
|
else
|
|
|
|
_tlbiel_kernel_broadcast();
|
2016-04-29 23:26:05 +10:00
|
|
|
}
|
|
|
|
EXPORT_SYMBOL(radix__flush_tlb_kernel_range);
|
|
|
|
|
2023-02-03 21:17:18 +10:00
|
|
|
/*
|
|
|
|
* Doesn't appear to be used anywhere. Remove.
|
|
|
|
*/
|
2017-11-07 18:53:07 +11:00
|
|
|
#define TLB_FLUSH_ALL -1UL
|
|
|
|
|
2016-04-29 23:26:05 +10:00
|
|
|
/*
|
2017-11-07 18:53:07 +11:00
|
|
|
* Number of pages above which we invalidate the entire PID rather than
|
|
|
|
* flush individual pages, for local and global flushes respectively.
|
|
|
|
*
|
|
|
|
* tlbie goes out to the interconnect and individual ops are more costly.
|
|
|
|
* It also does not iterate over sets like the local tlbiel variant when
|
|
|
|
* invalidating a full PID, so it has a far lower threshold to change from
|
|
|
|
* individual page flushes to full-pid flushes.
|
2016-04-29 23:26:05 +10:00
|
|
|
*/
|
2021-08-12 18:58:30 +05:30
|
|
|
static u32 tlb_single_page_flush_ceiling __read_mostly = 33;
|
|
|
|
static u32 tlb_local_single_page_flush_ceiling __read_mostly = POWER9_TLB_SETS_RADIX * 2;
|
2017-11-07 18:53:07 +11:00
|
|
|
|
2018-06-15 11:38:37 +10:00
|
|
|
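/*
 * Common helper for range flushes: choose between per-page invalidation
 * and a full PID flush based on the number of pages and the ceilings
 * defined above.
 */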
static inline void __radix__flush_tlb_range(struct mm_struct *mm,
|
2019-10-24 13:27:59 +05:30
|
|
|
unsigned long start, unsigned long end)
|
2016-04-29 23:26:05 +10:00
|
|
|
{
|
2017-11-07 18:53:07 +11:00
|
|
|
unsigned long pid;
|
|
|
|
unsigned int page_shift = mmu_psize_defs[mmu_virtual_psize].shift;
|
|
|
|
unsigned long page_size = 1UL << page_shift;
|
|
|
|
unsigned long nr_pages = (end - start) >> page_shift;
|
2021-07-07 18:10:21 -07:00
|
|
|
bool flush_pid, flush_pwc = false;
|
2020-12-17 23:47:26 +10:00
|
|
|
enum tlb_flush_type type;
|
2017-07-19 14:49:05 +10:00
|
|
|
|
2017-11-07 18:53:07 +11:00
|
|
|
pid = mm->context.id;
|
2023-02-03 21:17:17 +10:00
|
|
|
if (WARN_ON_ONCE(pid == MMU_NO_CONTEXT))
|
2017-11-07 18:53:07 +11:00
|
|
|
return;
|
|
|
|
|
2023-02-03 21:17:18 +10:00
|
|
|
WARN_ON_ONCE(end == TLB_FLUSH_ALL);
|
|
|
|
|
2017-11-07 18:53:07 +11:00
|
|
|
preempt_disable();
|
2018-06-01 20:01:20 +10:00
|
|
|
smp_mb(); /* see radix__flush_tlb_mm */
|
2023-02-03 21:17:18 +10:00
|
|
|
type = flush_type_needed(mm, false);
|
powerpc/64s/radix: Check for no TLB flush required
If there are no CPUs in mm_cpumask, no TLB flush is required at all.
This patch adds a check for this case.
Currently it's not tested for; in fact, mm_is_thread_local() returns
false if the current CPU is not in mm_cpumask, so it's treated as a
global flush.
This can come up in some cases like exec failure before the new mm has
ever been switched to. This patch reduces TLBIE instructions required
to build a kernel from about 120,000 to 45,000. Another situation it
could help is page reclaim, KSM, THP, etc. (i.e., async operations
external to the process) where the process is sleeping and has all TLBs
flushed out of all CPUs.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201217134731.488135-4-npiggin@gmail.com
2020-12-17 23:47:27 +10:00
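To make the three-way outcome above concrete, here is a deliberately simplified sketch of the decision that flush_type_needed() feeds into the code below. The helper name is hypothetical and this is not the real flush_type_needed() implementation (which lives earlier in this file and handles additional cases); it only restates the idea described in the commit message above.
/*
 * Simplified sketch (not the real flush_type_needed()):
 *  - no CPUs in mm_cpumask -> FLUSH_TYPE_NONE   (nothing to do)
 *  - only the current CPU  -> FLUSH_TYPE_LOCAL  (tlbiel suffices)
 *  - anything else         -> FLUSH_TYPE_GLOBAL (tlbie or IPIs)
 */
static enum tlb_flush_type example_flush_type(struct mm_struct *mm)
{
	if (cpumask_empty(mm_cpumask(mm)))
		return FLUSH_TYPE_NONE;
	if (cpumask_equal(mm_cpumask(mm), cpumask_of(smp_processor_id())))
		return FLUSH_TYPE_LOCAL;
	return FLUSH_TYPE_GLOBAL;
}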
|
|
|
if (type == FLUSH_TYPE_NONE)
|
|
|
|
goto out;
|
2020-12-17 23:47:26 +10:00
|
|
|
|
2023-02-03 21:17:18 +10:00
|
|
|
if (type == FLUSH_TYPE_GLOBAL)
|
2020-12-17 23:47:26 +10:00
|
|
|
flush_pid = nr_pages > tlb_single_page_flush_ceiling;
|
|
|
|
else
|
|
|
|
flush_pid = nr_pages > tlb_local_single_page_flush_ceiling;
|
2021-07-07 18:10:21 -07:00
|
|
|
/*
|
|
|
|
* A full PID flush already does the PWC flush. If it is not a full PID
|
|
|
|
* flush, check whether the range covers more than a PMD and force a PWC
|
|
|
|
* flush; mremap() depends on this behaviour.
|
|
|
|
*/
|
|
|
|
if (!flush_pid && (end - start) >= PMD_SIZE)
|
|
|
|
flush_pwc = true;
|
2017-11-07 18:53:07 +11:00
|
|
|
|
2020-12-17 23:47:26 +10:00
|
|
|
if (!mmu_has_feature(MMU_FTR_GTSE) && type == FLUSH_TYPE_GLOBAL) {
|
2021-07-07 18:10:21 -07:00
|
|
|
unsigned long type = H_RPTI_TYPE_TLB;
|
2020-07-03 11:06:08 +05:30
|
|
|
unsigned long tgt = H_RPTI_TARGET_CMMU;
|
|
|
|
unsigned long pg_sizes = psize_to_rpti_pgsize(mmu_virtual_psize);
|
|
|
|
|
|
|
|
if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
|
|
|
|
pg_sizes |= psize_to_rpti_pgsize(MMU_PAGE_2M);
|
|
|
|
if (atomic_read(&mm->context.copros) > 0)
|
|
|
|
tgt |= H_RPTI_TARGET_NMMU;
|
2021-07-07 18:10:21 -07:00
|
|
|
if (flush_pwc)
|
|
|
|
type |= H_RPTI_TYPE_PWC;
|
|
|
|
pseries_rpt_invalidate(pid, tgt, type, pg_sizes, start, end);
|
2020-12-17 23:47:26 +10:00
|
|
|
} else if (flush_pid) {
|
2021-07-07 18:10:21 -07:00
|
|
|
/*
|
|
|
|
* We are now flushing a range larger than PMD size, so force a RIC_FLUSH_ALL
|
|
|
|
*/
|
2020-12-17 23:47:26 +10:00
|
|
|
if (type == FLUSH_TYPE_LOCAL) {
|
2021-07-07 18:10:21 -07:00
|
|
|
_tlbiel_pid(pid, RIC_FLUSH_ALL);
|
2018-03-23 09:29:06 +11:00
|
|
|
} else {
|
2019-09-03 01:29:31 +10:00
|
|
|
if (cputlb_use_tlbie()) {
|
2021-07-07 18:10:21 -07:00
|
|
|
_tlbie_pid(pid, RIC_FLUSH_ALL);
|
2019-09-03 01:29:31 +10:00
|
|
|
} else {
|
2021-07-07 18:10:21 -07:00
|
|
|
_tlbiel_pid_multicast(mm, pid, RIC_FLUSH_ALL);
|
2019-09-03 01:29:31 +10:00
|
|
|
}
|
2018-03-23 09:29:06 +11:00
|
|
|
}
|
2017-11-07 18:53:07 +11:00
|
|
|
} else {
|
2022-08-10 13:43:18 +02:00
|
|
|
bool hflush;
|
2017-11-07 18:53:07 +11:00
|
|
|
unsigned long hstart, hend;
|
|
|
|
|
2022-08-10 13:43:18 +02:00
|
|
|
hstart = (start + PMD_SIZE - 1) & PMD_MASK;
|
|
|
|
hend = end & PMD_MASK;
|
|
|
|
hflush = IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) && hstart < hend;
|
2017-11-07 18:53:07 +11:00
|
|
|
|
2020-12-17 23:47:26 +10:00
|
|
|
if (type == FLUSH_TYPE_LOCAL) {
|
2019-09-03 01:29:31 +10:00
|
|
|
asm volatile("ptesync": : :"memory");
|
2021-07-07 18:10:21 -07:00
|
|
|
if (flush_pwc)
|
|
|
|
/* For PWC, only one flush is needed */
|
|
|
|
__tlbiel_pid(pid, 0, RIC_FLUSH_PWC);
|
2017-11-07 18:53:07 +11:00
|
|
|
__tlbiel_va_range(start, end, pid, page_size, mmu_virtual_psize);
|
|
|
|
if (hflush)
|
|
|
|
__tlbiel_va_range(hstart, hend, pid,
|
2018-06-15 11:38:37 +10:00
|
|
|
PMD_SIZE, MMU_PAGE_2M);
|
2020-09-16 13:02:34 +10:00
|
|
|
ppc_after_tlbiel_barrier();
|
2019-09-03 01:29:31 +10:00
|
|
|
} else if (cputlb_use_tlbie()) {
|
|
|
|
asm volatile("ptesync": : :"memory");
|
2021-07-07 18:10:21 -07:00
|
|
|
if (flush_pwc)
|
|
|
|
__tlbie_pid(pid, RIC_FLUSH_PWC);
|
2017-11-07 18:53:07 +11:00
|
|
|
__tlbie_va_range(start, end, pid, page_size, mmu_virtual_psize);
|
|
|
|
if (hflush)
|
|
|
|
__tlbie_va_range(hstart, hend, pid,
|
2018-06-15 11:38:37 +10:00
|
|
|
PMD_SIZE, MMU_PAGE_2M);
|
2017-11-07 18:53:07 +11:00
|
|
|
asm volatile("eieio; tlbsync; ptesync": : :"memory");
|
2019-09-03 01:29:31 +10:00
|
|
|
} else {
|
|
|
|
_tlbiel_va_range_multicast(mm,
|
2021-07-07 18:10:21 -07:00
|
|
|
start, end, pid, page_size, mmu_virtual_psize, flush_pwc);
|
2019-09-03 01:29:31 +10:00
|
|
|
if (hflush)
|
|
|
|
_tlbiel_va_range_multicast(mm,
|
2021-07-07 18:10:21 -07:00
|
|
|
hstart, hend, pid, PMD_SIZE, MMU_PAGE_2M, flush_pwc);
|
2017-11-07 18:53:07 +11:00
|
|
|
}
|
|
|
|
}
|
2020-12-17 23:47:27 +10:00
|
|
|
out:
|
2017-11-07 18:53:07 +11:00
|
|
|
preempt_enable();
|
2023-07-25 23:42:07 +10:00
|
|
|
mmu_notifier_arch_invalidate_secondary_tlbs(mm, start, end);
|
2016-04-29 23:26:05 +10:00
|
|
|
}
|
2018-06-15 11:38:37 +10:00
|
|
|
|
|
|
|
void radix__flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
|
|
|
|
unsigned long end)
|
|
|
|
|
|
|
|
{
|
|
|
|
#ifdef CONFIG_HUGETLB_PAGE
|
|
|
|
if (is_vm_hugetlb_page(vma))
|
|
|
|
return radix__flush_hugetlb_tlb_range(vma, start, end);
|
|
|
|
#endif
|
|
|
|
|
2019-10-24 13:27:59 +05:30
|
|
|
__radix__flush_tlb_range(vma->vm_mm, start, end);
|
2018-06-15 11:38:37 +10:00
|
|
|
}
|
2016-04-29 23:26:05 +10:00
|
|
|
EXPORT_SYMBOL(radix__flush_tlb_range);
|
|
|
|
|
2016-07-13 15:05:29 +05:30
|
|
|
static int radix_get_mmu_psize(int page_size)
|
|
|
|
{
|
|
|
|
int psize;
|
|
|
|
|
|
|
|
if (page_size == (1UL << mmu_psize_defs[mmu_virtual_psize].shift))
|
|
|
|
psize = mmu_virtual_psize;
|
|
|
|
else if (page_size == (1UL << mmu_psize_defs[MMU_PAGE_2M].shift))
|
|
|
|
psize = MMU_PAGE_2M;
|
|
|
|
else if (page_size == (1UL << mmu_psize_defs[MMU_PAGE_1G].shift))
|
|
|
|
psize = MMU_PAGE_1G;
|
|
|
|
else
|
|
|
|
return -1;
|
|
|
|
return psize;
|
|
|
|
}
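The helper above maps a page size in bytes to the MMU page-size index used by the flush routines, returning -1 for anything other than the base page size, 2M, or 1G. A caller that cannot guarantee a supported size needs to check for the -1 case; an illustrative fragment follows (SZ_2M from linux/sizes.h and the error handling are assumptions for the example):
	int psize = radix_get_mmu_psize(SZ_2M);	/* -> MMU_PAGE_2M */

	if (psize == -1)
		return -EINVAL;	/* hypothetical handling; not taken for 2M */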
|
2016-04-29 23:26:05 +10:00
|
|
|
|
2018-05-09 12:20:18 +10:00
|
|
|
/*
|
|
|
|
* Flush partition scoped LPID address translation for all CPUs.
|
|
|
|
*/
|
|
|
|
void radix__flush_tlb_lpid_page(unsigned int lpid,
|
|
|
|
unsigned long addr,
|
|
|
|
unsigned long page_size)
|
|
|
|
{
|
|
|
|
int psize = radix_get_mmu_psize(page_size);
|
|
|
|
|
|
|
|
_tlbie_lpid_va(addr, lpid, psize, RIC_FLUSH_TLB);
|
|
|
|
}
|
|
|
|
EXPORT_SYMBOL_GPL(radix__flush_tlb_lpid_page);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Flush partition scoped PWC from LPID for all CPUs.
|
|
|
|
*/
|
|
|
|
void radix__flush_pwc_lpid(unsigned int lpid)
|
|
|
|
{
|
|
|
|
_tlbie_lpid(lpid, RIC_FLUSH_PWC);
|
|
|
|
}
|
|
|
|
EXPORT_SYMBOL_GPL(radix__flush_pwc_lpid);
|
|
|
|
|
2018-10-08 16:31:07 +11:00
|
|
|
/*
|
|
|
|
* Flush partition scoped translations from LPID (=LPIDR)
|
|
|
|
*/
|
2019-09-03 01:29:27 +10:00
|
|
|
void radix__flush_all_lpid(unsigned int lpid)
|
2018-10-08 16:31:07 +11:00
|
|
|
{
|
|
|
|
_tlbie_lpid(lpid, RIC_FLUSH_ALL);
|
|
|
|
}
|
2019-09-03 01:29:27 +10:00
|
|
|
EXPORT_SYMBOL_GPL(radix__flush_all_lpid);
|
2018-10-08 16:31:07 +11:00
|
|
|
|
2018-05-09 12:20:18 +10:00
|
|
|
/*
|
2019-09-03 01:29:27 +10:00
|
|
|
* Flush process scoped translations from LPID (=LPIDR)
|
2018-05-09 12:20:18 +10:00
|
|
|
*/
|
2019-09-03 01:29:27 +10:00
|
|
|
void radix__flush_all_lpid_guest(unsigned int lpid)
|
2018-05-09 12:20:18 +10:00
|
|
|
{
|
2019-09-03 01:29:27 +10:00
|
|
|
_tlbie_lpid_guest(lpid, RIC_FLUSH_ALL);
|
2018-05-09 12:20:18 +10:00
|
|
|
}
|
|
|
|
|
2016-04-29 23:26:05 +10:00
|
|
|
void radix__tlb_flush(struct mmu_gather *tlb)
|
|
|
|
{
|
2016-07-13 15:06:35 +05:30
|
|
|
int psize = 0;
|
2016-04-29 23:26:05 +10:00
|
|
|
struct mm_struct *mm = tlb->mm;
|
2016-07-13 15:06:35 +05:30
|
|
|
int page_size = tlb->page_size;
|
2018-06-15 11:38:37 +10:00
|
|
|
unsigned long start = tlb->start;
|
|
|
|
unsigned long end = tlb->end;
|
2016-07-13 15:06:35 +05:30
|
|
|
|
|
|
|
/*
|
|
|
|
* if page size is not something we understand, do a full mm flush
|
2017-10-24 23:06:54 +10:00
|
|
|
*
|
|
|
|
* A "fullmm" flush must always do a flush_all_mm (RIC=2) flush
|
|
|
|
* that flushes the process table entry cache upon process teardown.
|
|
|
|
* See the comment for radix in arch_exit_mmap().
|
2016-07-13 15:06:35 +05:30
|
|
|
*/
|
2023-02-03 21:17:16 +10:00
|
|
|
if (tlb->fullmm) {
|
2023-05-24 16:08:21 +10:00
|
|
|
if (IS_ENABLED(CONFIG_MMU_LAZY_TLB_SHOOTDOWN)) {
|
|
|
|
/*
|
|
|
|
* Shootdown based lazy tlb mm refcounting means we
|
|
|
|
* have to IPI everyone in the mm_cpumask anyway soon
|
|
|
|
* when the mm goes away, so might as well do it as
|
|
|
|
* part of the final flush now.
|
|
|
|
*
|
|
|
|
* If lazy shootdown was improved to reduce IPIs (e.g.,
|
|
|
|
* by batching), then it may end up being better to use
|
|
|
|
* tlbies here instead.
|
|
|
|
*/
|
|
|
|
preempt_disable();
|
|
|
|
|
|
|
|
smp_mb(); /* see radix__flush_tlb_mm */
|
|
|
|
exit_flush_lazy_tlbs(mm);
|
powerpc/64s/radix: Don't warn on copros in radix__tlb_flush()
Sachin reported a warning when running the inject-ra-err selftest:
# selftests: powerpc/mce: inject-ra-err
Disabling lock debugging due to kernel taint
MCE: CPU19: machine check (Severe) Real address Load/Store (foreign/control memory) [Not recovered]
MCE: CPU19: PID: 5254 Comm: inject-ra-err NIP: [0000000010000e48]
MCE: CPU19: Initiator CPU
MCE: CPU19: Unknown
------------[ cut here ]------------
WARNING: CPU: 19 PID: 5254 at arch/powerpc/mm/book3s64/radix_tlb.c:1221 radix__tlb_flush+0x160/0x180
CPU: 19 PID: 5254 Comm: inject-ra-err Kdump: loaded Tainted: G M E 6.6.0-rc3-00055-g9ed22ae6be81 #4
Hardware name: IBM,9080-HEX POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1030.20 (NH1030_058) hv:phyp pSeries
...
NIP radix__tlb_flush+0x160/0x180
LR radix__tlb_flush+0x104/0x180
Call Trace:
radix__tlb_flush+0xf4/0x180 (unreliable)
tlb_finish_mmu+0x15c/0x1e0
exit_mmap+0x1a0/0x510
__mmput+0x60/0x1e0
exit_mm+0xdc/0x170
do_exit+0x2bc/0x5a0
do_group_exit+0x4c/0xc0
sys_exit_group+0x28/0x30
system_call_exception+0x138/0x330
system_call_vectored_common+0x15c/0x2ec
And bisected it to commit e43c0a0c3c28 ("powerpc/64s/radix: combine
final TLB flush and lazy tlb mm shootdown IPIs"), which added a warning
in radix__tlb_flush() if mm->context.copros is still elevated.
However it's possible for the copros count to be elevated if a process
exits without first closing file descriptors that are associated with a
copro, e.g. VAS.
If the process exits with a VAS file still open, the release callback
is queued up for exit_task_work() via:
exit_files()
put_files_struct()
close_files()
filp_close()
fput()
And called via:
exit_task_work()
____fput()
__fput()
file->f_op->release(inode, file)
coproc_release()
vas_user_win_ops->close_win()
vas_deallocate_window()
mm_context_remove_vas_window()
mm_context_remove_copro()
But that is after exit_mm() has been called from do_exit() and triggered
the warning.
Fix it by dropping the warning, and always calling __flush_all_mm().
In the normal case of no copros, that will result in a call to
_tlbiel_pid(mm->context.id, RIC_FLUSH_ALL) just as the current code
does.
If the copros count is elevated then it will cause a global flush, which
should flush translations from any copros. Note that the process table
entry was cleared in arch_exit_mmap(), so copros should not be able to
fetch any new translations.
Fixes: e43c0a0c3c28 ("powerpc/64s/radix: combine final TLB flush and lazy tlb mm shootdown IPIs")
Reported-by: Sachin Sant <sachinp@linux.ibm.com>
Closes: https://lore.kernel.org/all/A8E52547-4BF1-47CE-8AEA-BC5A9D7E3567@linux.ibm.com/
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Tested-by: Sachin Sant <sachinp@linux.ibm.com>
Link: https://msgid.link/20231017121527.1574104-1-mpe@ellerman.id.au
2023-10-17 23:15:27 +11:00
|
|
|
__flush_all_mm(mm, true);
|
2023-05-24 16:08:21 +10:00
|
|
|
|
|
|
|
preempt_enable();
|
|
|
|
} else {
|
|
|
|
__flush_all_mm(mm, true);
|
|
|
|
}
|
|
|
|
|
2017-11-07 18:53:09 +11:00
|
|
|
} else if ((psize = radix_get_mmu_psize(page_size)) == -1) {
|
2019-10-24 13:28:00 +05:30
|
|
|
if (!tlb->freed_tables)
|
2017-11-07 18:53:09 +11:00
|
|
|
radix__flush_tlb_mm(mm);
|
|
|
|
else
|
|
|
|
radix__flush_all_mm(mm);
|
|
|
|
} else {
|
2019-10-24 13:28:00 +05:30
|
|
|
if (!tlb->freed_tables)
|
2017-11-07 18:53:09 +11:00
|
|
|
radix__flush_tlb_range_psize(mm, start, end, psize);
|
|
|
|
else
|
|
|
|
radix__flush_tlb_pwc_range_psize(mm, start, end, psize);
|
|
|
|
}
|
2016-07-13 15:06:35 +05:30
|
|
|
}
|
|
|
|
|
2021-06-10 14:06:39 +05:30
|
|
|
static void __radix__flush_tlb_range_psize(struct mm_struct *mm,
|
2017-11-07 18:53:09 +11:00
|
|
|
unsigned long start, unsigned long end,
|
|
|
|
int psize, bool also_pwc)
|
2016-07-13 15:06:35 +05:30
|
|
|
{
|
|
|
|
unsigned long pid;
|
2017-11-07 18:53:07 +11:00
|
|
|
unsigned int page_shift = mmu_psize_defs[psize].shift;
|
|
|
|
unsigned long page_size = 1UL << page_shift;
|
|
|
|
unsigned long nr_pages = (end - start) >> page_shift;
|
2020-12-17 23:47:26 +10:00
|
|
|
bool flush_pid;
|
|
|
|
enum tlb_flush_type type;
|
2016-07-13 15:06:35 +05:30
|
|
|
|
2017-10-16 12:41:00 +05:30
|
|
|
pid = mm->context.id;
|
2023-02-03 21:17:17 +10:00
|
|
|
if (WARN_ON_ONCE(pid == MMU_NO_CONTEXT))
|
2017-10-24 23:06:53 +10:00
|
|
|
return;
|
2016-07-13 15:06:35 +05:30
|
|
|
|
2023-02-03 21:17:18 +10:00
|
|
|
WARN_ON_ONCE(end == TLB_FLUSH_ALL);
|
2020-12-17 23:47:26 +10:00
|
|
|
|
2017-10-24 23:06:53 +10:00
|
|
|
preempt_disable();
|
2018-06-01 20:01:20 +10:00
|
|
|
smp_mb(); /* see radix__flush_tlb_mm */
|
2023-02-03 21:17:18 +10:00
|
|
|
type = flush_type_needed(mm, false);
|
2020-12-17 23:47:27 +10:00
|
|
|
if (type == FLUSH_TYPE_NONE)
|
|
|
|
goto out;
|
2017-11-07 18:53:07 +11:00
|
|
|
|
2023-02-03 21:17:18 +10:00
|
|
|
if (type == FLUSH_TYPE_GLOBAL)
|
2020-12-17 23:47:26 +10:00
|
|
|
flush_pid = nr_pages > tlb_single_page_flush_ceiling;
|
|
|
|
else
|
|
|
|
flush_pid = nr_pages > tlb_local_single_page_flush_ceiling;
|
|
|
|
|
|
|
|
if (!mmu_has_feature(MMU_FTR_GTSE) && type == FLUSH_TYPE_GLOBAL) {
|
2020-07-03 11:06:08 +05:30
|
|
|
unsigned long tgt = H_RPTI_TARGET_CMMU;
|
|
|
|
unsigned long type = H_RPTI_TYPE_TLB;
|
|
|
|
unsigned long pg_sizes = psize_to_rpti_pgsize(psize);
|
|
|
|
|
|
|
|
if (also_pwc)
|
|
|
|
type |= H_RPTI_TYPE_PWC;
|
|
|
|
if (atomic_read(&mm->context.copros) > 0)
|
|
|
|
tgt |= H_RPTI_TARGET_NMMU;
|
|
|
|
pseries_rpt_invalidate(pid, tgt, type, pg_sizes, start, end);
|
2020-12-17 23:47:26 +10:00
|
|
|
} else if (flush_pid) {
|
|
|
|
if (type == FLUSH_TYPE_LOCAL) {
|
2017-11-07 18:53:09 +11:00
|
|
|
_tlbiel_pid(pid, also_pwc ? RIC_FLUSH_ALL : RIC_FLUSH_TLB);
|
2018-06-01 20:01:21 +10:00
|
|
|
} else {
|
2019-09-03 01:29:31 +10:00
|
|
|
if (cputlb_use_tlbie()) {
|
|
|
|
if (mm_needs_flush_escalation(mm))
|
|
|
|
also_pwc = true;
|
|
|
|
|
|
|
|
_tlbie_pid(pid,
|
|
|
|
also_pwc ? RIC_FLUSH_ALL : RIC_FLUSH_TLB);
|
|
|
|
} else {
|
|
|
|
_tlbiel_pid_multicast(mm, pid,
|
|
|
|
also_pwc ? RIC_FLUSH_ALL : RIC_FLUSH_TLB);
|
|
|
|
}
|
2018-06-01 20:01:21 +10:00
|
|
|
|
|
|
|
}
|
2017-10-24 23:06:53 +10:00
|
|
|
} else {
|
2020-12-17 23:47:26 +10:00
|
|
|
if (type == FLUSH_TYPE_LOCAL)
|
2017-11-07 18:53:09 +11:00
|
|
|
_tlbiel_va_range(start, end, pid, page_size, psize, also_pwc);
|
2019-09-03 01:29:31 +10:00
|
|
|
else if (cputlb_use_tlbie())
|
2017-11-07 18:53:09 +11:00
|
|
|
_tlbie_va_range(start, end, pid, page_size, psize, also_pwc);
|
2019-09-03 01:29:31 +10:00
|
|
|
else
|
|
|
|
_tlbiel_va_range_multicast(mm,
|
|
|
|
start, end, pid, page_size, psize, also_pwc);
|
2016-07-13 15:06:35 +05:30
|
|
|
}
|
2020-12-17 23:47:27 +10:00
|
|
|
out:
|
2016-07-13 15:06:35 +05:30
|
|
|
preempt_enable();
|
2023-07-25 23:42:07 +10:00
|
|
|
mmu_notifier_arch_invalidate_secondary_tlbs(mm, start, end);
|
2016-04-29 23:26:05 +10:00
|
|
|
}
|
2016-07-13 15:05:29 +05:30
|
|
|
|
2017-11-07 18:53:09 +11:00
|
|
|
void radix__flush_tlb_range_psize(struct mm_struct *mm, unsigned long start,
|
|
|
|
unsigned long end, int psize)
|
|
|
|
{
|
|
|
|
return __radix__flush_tlb_range_psize(mm, start, end, psize, false);
|
|
|
|
}
|
|
|
|
|
2021-07-07 18:10:21 -07:00
|
|
|
void radix__flush_tlb_pwc_range_psize(struct mm_struct *mm, unsigned long start,
|
|
|
|
unsigned long end, int psize)
|
2017-11-07 18:53:09 +11:00
|
|
|
{
|
|
|
|
__radix__flush_tlb_range_psize(mm, start, end, psize, true);
|
|
|
|
}
|
|
|
|
|
2017-07-19 14:49:06 +10:00
|
|
|
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
|
|
|
|
void radix__flush_tlb_collapsed_pmd(struct mm_struct *mm, unsigned long addr)
|
|
|
|
{
|
|
|
|
unsigned long pid, end;
|
2020-12-17 23:47:26 +10:00
|
|
|
enum tlb_flush_type type;
|
2017-07-19 14:49:06 +10:00
|
|
|
|
2017-10-16 12:41:00 +05:30
|
|
|
pid = mm->context.id;
|
2023-02-03 21:17:17 +10:00
|
|
|
if (WARN_ON_ONCE(pid == MMU_NO_CONTEXT))
|
2017-10-24 23:06:53 +10:00
|
|
|
return;
|
2017-07-19 14:49:06 +10:00
|
|
|
|
|
|
|
/* 4k page size, just blow the world */
|
|
|
|
if (PAGE_SIZE == 0x1000) {
|
|
|
|
radix__flush_all_mm(mm);
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
2017-11-07 18:53:07 +11:00
|
|
|
end = addr + HPAGE_PMD_SIZE;
|
|
|
|
|
|
|
|
/* Otherwise first do the PWC, then iterate the pages. */
|
2017-10-24 23:06:53 +10:00
|
|
|
preempt_disable();
|
2018-06-01 20:01:20 +10:00
|
|
|
smp_mb(); /* see radix__flush_tlb_mm */
|
2020-12-17 23:47:26 +10:00
|
|
|
type = flush_type_needed(mm, false);
|
2020-12-17 23:47:27 +10:00
|
|
|
if (type == FLUSH_TYPE_LOCAL) {
|
|
|
|
_tlbiel_va_range(addr, end, pid, PAGE_SIZE, mmu_virtual_psize, true);
|
|
|
|
} else if (type == FLUSH_TYPE_GLOBAL) {
|
2020-07-03 11:06:08 +05:30
|
|
|
if (!mmu_has_feature(MMU_FTR_GTSE)) {
|
|
|
|
unsigned long tgt, type, pg_sizes;
|
|
|
|
|
|
|
|
tgt = H_RPTI_TARGET_CMMU;
|
|
|
|
type = H_RPTI_TYPE_TLB | H_RPTI_TYPE_PWC |
|
|
|
|
H_RPTI_TYPE_PRT;
|
|
|
|
pg_sizes = psize_to_rpti_pgsize(mmu_virtual_psize);
|
|
|
|
|
|
|
|
if (atomic_read(&mm->context.copros) > 0)
|
|
|
|
tgt |= H_RPTI_TARGET_NMMU;
|
|
|
|
pseries_rpt_invalidate(pid, tgt, type, pg_sizes,
|
|
|
|
addr, end);
|
|
|
|
} else if (cputlb_use_tlbie())
|
2019-09-03 01:29:31 +10:00
|
|
|
_tlbie_va_range(addr, end, pid, PAGE_SIZE, mmu_virtual_psize, true);
|
|
|
|
else
|
|
|
|
_tlbiel_va_range_multicast(mm,
|
|
|
|
addr, end, pid, PAGE_SIZE, mmu_virtual_psize, true);
|
2017-11-07 18:53:07 +11:00
|
|
|
}
|
2017-11-07 18:53:05 +11:00
|
|
|
|
2017-07-19 14:49:06 +10:00
|
|
|
preempt_enable();
|
|
|
|
}
|
|
|
|
#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
|
|
|
|
|
2016-07-13 15:06:40 +05:30
|
|
|
void radix__flush_pmd_tlb_range(struct vm_area_struct *vma,
|
|
|
|
unsigned long start, unsigned long end)
|
|
|
|
{
|
|
|
|
radix__flush_tlb_range_psize(vma->vm_mm, start, end, MMU_PAGE_2M);
|
|
|
|
}
|
|
|
|
EXPORT_SYMBOL(radix__flush_pmd_tlb_range);
|
2016-08-23 16:27:48 +05:30
|
|
|
|
2023-07-25 00:37:55 +05:30
|
|
|
void radix__flush_pud_tlb_range(struct vm_area_struct *vma,
|
|
|
|
unsigned long start, unsigned long end)
|
|
|
|
{
|
|
|
|
radix__flush_tlb_range_psize(vma->vm_mm, start, end, MMU_PAGE_1G);
|
|
|
|
}
|
|
|
|
EXPORT_SYMBOL(radix__flush_pud_tlb_range);
|
|
|
|
|
2016-08-23 16:27:48 +05:30
|
|
|
void radix__flush_tlb_all(void)
|
|
|
|
{
|
|
|
|
unsigned long rb, prs, r, rs;
|
|
|
|
unsigned long ric = RIC_FLUSH_ALL;
|
|
|
|
|
|
|
|
rb = 0x3 << PPC_BITLSHIFT(53); /* IS = 3 */
|
|
|
|
prs = 0; /* partition scoped */
|
2018-02-01 16:07:25 +11:00
|
|
|
r = 1; /* radix format */
|
2016-08-23 16:27:48 +05:30
|
|
|
rs = 1 & ((1UL << 32) - 1); /* any LPID value to flush guest mappings */
|
|
|
|
|
|
|
|
asm volatile("ptesync": : :"memory");
|
|
|
|
/*
|
|
|
|
* now flush guest entries by passing PRS = 1 and LPID != 0
|
|
|
|
*/
|
|
|
|
asm volatile(PPC_TLBIE_5(%0, %4, %3, %2, %1)
|
|
|
|
: : "r"(rb), "i"(r), "i"(1), "i"(ric), "r"(rs) : "memory");
|
|
|
|
/*
|
|
|
|
* now flush host entries by passing PRS = 0 and LPID == 0
|
|
|
|
*/
|
|
|
|
asm volatile(PPC_TLBIE_5(%0, %4, %3, %2, %1)
|
|
|
|
: : "r"(rb), "i"(r), "i"(prs), "i"(ric), "r"(0) : "memory");
|
|
|
|
asm volatile("eieio; tlbsync; ptesync": : :"memory");
|
|
|
|
}
|
KVM: PPC: Book3S HV: Add support for H_RPT_INVALIDATE
H_RPT_INVALIDATE does two types of TLB invalidations:
1. Process-scoped invalidations for guests when LPCR[GTSE]=0.
This is currently not used in KVM as GTSE is not usually
disabled in KVM.
2. Partition-scoped invalidations that an L1 hypervisor does on
behalf of an L2 guest. This is currently handled
by the H_TLB_INVALIDATE hcall, and this new hcall replaces that old one.
This commit enables process-scoped invalidations for L1 guests.
Support for process-scoped and partition-scoped invalidations
from/for nested guests will be added separately.
Process-scoped tlbie invalidations from L1 and nested guests
need the RS register for the TLBIE instruction to contain both PID and
LPID. This patch introduces primitives that execute the tlbie
instruction with both PID and LPID set, in preparation for the
H_RPT_INVALIDATE hcall.
A description of H_RPT_INVALIDATE follows:
int64 /* H_Success: Return code on successful completion */
/* H_Busy - repeat the call with the same */
/* H_Parameter, H_P2, H_P3, H_P4, H_P5 : Invalid
parameters */
hcall(const uint64 H_RPT_INVALIDATE, /* Invalidate RPT
translation
lookaside information */
uint64 id, /* PID/LPID to invalidate */
uint64 target, /* Invalidation target */
uint64 type, /* Type of lookaside information */
uint64 pg_sizes, /* Page sizes */
uint64 start, /* Start of Effective Address (EA)
range (inclusive) */
uint64 end) /* End of EA range (exclusive) */
Invalidation targets (target)
-----------------------------
Core MMU 0x01 /* All virtual processors in the
partition */
Core local MMU 0x02 /* Current virtual processor */
Nest MMU 0x04 /* All nest/accelerator agents
in use by the partition */
A combination of the above can be specified,
except core and core local.
Type of translation to invalidate (type)
---------------------------------------
NESTED 0x0001 /* invalidate nested guest partition-scope */
TLB 0x0002 /* Invalidate TLB */
PWC 0x0004 /* Invalidate Page Walk Cache */
PRT 0x0008 /* Invalidate caching of Process Table
Entries if NESTED is clear */
PAT 0x0008 /* Invalidate caching of Partition Table
Entries if NESTED is set */
A combination of the above can be specified.
Page size mask (pages)
----------------------
4K 0x01
64K 0x02
2M 0x04
1G 0x08
All sizes (-1UL)
A combination of the above can be specified.
All page sizes can be selected with -1.
Semantics: Invalidate radix tree lookaside information
matching the parameters given.
* Return H_P2, H_P3 or H_P4 if target, type, or pageSizes parameters
are different from the defined values.
* Return H_PARAMETER if NESTED is set and pid is not a valid nested
LPID allocated to this partition
* Return H_P5 if (start, end) doesn't form a valid range. Start and
end should be a valid Quadrant address and end > start.
* Return H_NotSupported if the partition is not running in radix
translation mode.
* May invalidate more translation information than requested.
* If start = 0 and end = -1, set the range to cover all valid
addresses. Else start and end should be aligned to 4kB (lower 11
bits clear).
* If NESTED is clear, then invalidate process scoped lookaside
information. Else pid specifies a nested LPID, and the invalidation
is performed on nested guest partition table and nested guest
partition scope real addresses.
* If pid = 0 and NESTED is clear, then valid addresses are quadrant 3
and quadrant 0 spaces, Else valid addresses are quadrant 0.
* Pages which are fully covered by the range are to be invalidated.
Those which are partially covered are considered outside
invalidation range, which allows a caller to optimally invalidate
ranges that may contain mixed page sizes.
* Return H_SUCCESS on success.
Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210621085003.904767-4-bharata@linux.ibm.com
2021-06-21 14:20:00 +05:30
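To connect the hcall description above with the code in this file, here is a minimal sketch of how a process-scoped TLB and PWC invalidation of a VA range composes its arguments before calling pseries_rpt_invalidate(). It mirrors the existing callers earlier in the file; the function name is illustrative only, and the GTSE and flush-type checks that the real callers perform first are omitted.
/*
 * Illustrative sketch: process-scoped TLB + PWC invalidation of a VA
 * range via H_RPT_INVALIDATE, targeting the core MMU and, when copros
 * are attached, the nest MMU as well.
 */
static void example_rpt_invalidate_range(struct mm_struct *mm, unsigned long pid,
					 unsigned long start, unsigned long end)
{
	unsigned long tgt = H_RPTI_TARGET_CMMU;
	unsigned long type = H_RPTI_TYPE_TLB | H_RPTI_TYPE_PWC;
	unsigned long pg_sizes = psize_to_rpti_pgsize(mmu_virtual_psize);

	if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
		pg_sizes |= psize_to_rpti_pgsize(MMU_PAGE_2M);
	if (atomic_read(&mm->context.copros) > 0)
		tgt |= H_RPTI_TARGET_NMMU;

	pseries_rpt_invalidate(pid, tgt, type, pg_sizes, start, end);
}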
|
|
|
|
|
|
|
#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
|
powerpc/radix: Move some functions into #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
With skiboot_defconfig, Clang reports:
CC arch/powerpc/mm/book3s64/radix_tlb.o
arch/powerpc/mm/book3s64/radix_tlb.c:419:20: error: unused function '_tlbie_pid_lpid' [-Werror,-Wunused-function]
static inline void _tlbie_pid_lpid(unsigned long pid, unsigned long lpid,
^
arch/powerpc/mm/book3s64/radix_tlb.c:663:20: error: unused function '_tlbie_va_range_lpid' [-Werror,-Wunused-function]
static inline void _tlbie_va_range_lpid(unsigned long start, unsigned long end,
^
This is because those functions are only called from functions
enclosed in an #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE block.
Move the following functions inside that #ifdef:
* __tlbie_pid_lpid(unsigned long pid,
* __tlbie_va_lpid(unsigned long va, unsigned long pid,
* fixup_tlbie_pid_lpid(unsigned long pid, unsigned long lpid)
* _tlbie_pid_lpid(unsigned long pid, unsigned long lpid,
* fixup_tlbie_va_range_lpid(unsigned long va,
* __tlbie_va_range_lpid(unsigned long start, unsigned long end,
* _tlbie_va_range_lpid(unsigned long start, unsigned long end,
Fixes: f0c6fbbb9050 ("KVM: PPC: Book3S HV: Add support for H_RPT_INVALIDATE")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202307260802.Mjr99P5O-lkp@intel.com/
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/3d72efd39f986ee939d068af69fdce28bd600766.1691568093.git.christophe.leroy@csgroup.eu
2023-08-09 10:01:43 +02:00
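As a standalone illustration of the pattern this commit applies (hypothetical names, not from the kernel tree): a static helper that is only referenced from config-gated code must itself live under the same #ifdef, otherwise -Wunused-function (promoted to an error by -Werror) trips when the option is disabled.

#ifdef CONFIG_SOME_OPTION
static inline void helper_only_used_when_enabled(void)
{
        /* ... */
}

void entry_point_only_built_when_enabled(void)
{
        helper_only_used_when_enabled();
}
#endif /* CONFIG_SOME_OPTION */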
static __always_inline void __tlbie_pid_lpid(unsigned long pid,
                                             unsigned long lpid,
                                             unsigned long ric)
{
        unsigned long rb, rs, prs, r;

        rb = PPC_BIT(53); /* IS = 1 */
        rs = (pid << PPC_BITLSHIFT(31)) | (lpid & ~(PPC_BITMASK(0, 31)));
        prs = 1; /* process scoped */
        r = 1;   /* radix format */

        asm volatile(PPC_TLBIE_5(%0, %4, %3, %2, %1)
                     : : "r"(rb), "i"(r), "i"(prs), "i"(ric), "r"(rs) : "memory");
        trace_tlbie(0, 0, rb, rs, ric, prs, r);
}
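A worked example of the RS encoding built above may help; this is a sketch that assumes the standard IBM-bit helpers (PPC_BITLSHIFT(31) == 32, and PPC_BITMASK(0, 31) covering the upper 32 bits).

/*
 * Example: pid = 0x10, lpid = 0x2
 *   rs = (0x10UL << 32) | (0x2UL & 0xffffffffUL) = 0x0000001000000002
 * i.e. the PID lands in RS[0:31] and the LPID in RS[32:63] (IBM numbering),
 * the layout a process-scoped tlbie with LPID expects.
 */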
static __always_inline void __tlbie_va_lpid(unsigned long va, unsigned long pid,
                                            unsigned long lpid,
                                            unsigned long ap, unsigned long ric)
{
        unsigned long rb, rs, prs, r;

        rb = va & ~(PPC_BITMASK(52, 63));
        rb |= ap << PPC_BITLSHIFT(58);
        rs = (pid << PPC_BITLSHIFT(31)) | (lpid & ~(PPC_BITMASK(0, 31)));
        prs = 1; /* process scoped */
        r = 1;   /* radix format */

        asm volatile(PPC_TLBIE_5(%0, %4, %3, %2, %1)
                     : : "r"(rb), "i"(r), "i"(prs), "i"(ric), "r"(rs) : "memory");
        trace_tlbie(0, 0, rb, rs, ric, prs, r);
}
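Similarly, a sketch of the RB encoding above, assuming the same helpers (so ~PPC_BITMASK(52, 63) clears the low 12 bits and PPC_BITLSHIFT(58) == 5) and a radix AP value of 5 for 64K pages:

/*
 * Example: va = 0x00007fff00001000, ap = mmu_get_ap(MMU_PAGE_64K) = 5
 *   rb = (va & ~0xfffUL) | (5UL << 5) = 0x00007fff000010a0
 * EA[0:51] carries the page address, RB[56:58] carries the AP (page size).
 */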
static inline void fixup_tlbie_pid_lpid(unsigned long pid, unsigned long lpid)
{
        /*
         * We can use any address for the invalidation, pick one which is
         * probably unused as an optimisation.
         */
        unsigned long va = ((1UL << 52) - 1);

        if (cpu_has_feature(CPU_FTR_P9_TLBIE_ERAT_BUG)) {
                asm volatile("ptesync" : : : "memory");
                __tlbie_pid_lpid(0, lpid, RIC_FLUSH_TLB);
        }

        if (cpu_has_feature(CPU_FTR_P9_TLBIE_STQ_BUG)) {
                asm volatile("ptesync" : : : "memory");
                __tlbie_va_lpid(va, pid, lpid, mmu_get_ap(MMU_PAGE_64K),
                                RIC_FLUSH_TLB);
        }
}
static inline void _tlbie_pid_lpid(unsigned long pid, unsigned long lpid,
                                   unsigned long ric)
{
        asm volatile("ptesync" : : : "memory");

        /*
         * Workaround the fact that the "ric" argument to __tlbie_pid
         * must be a compile-time constant to match the "i" constraint
         * in the asm statement.
         */
        switch (ric) {
        case RIC_FLUSH_TLB:
                __tlbie_pid_lpid(pid, lpid, RIC_FLUSH_TLB);
                fixup_tlbie_pid_lpid(pid, lpid);
                break;
        case RIC_FLUSH_PWC:
                __tlbie_pid_lpid(pid, lpid, RIC_FLUSH_PWC);
                break;
        case RIC_FLUSH_ALL:
        default:
                __tlbie_pid_lpid(pid, lpid, RIC_FLUSH_ALL);
                fixup_tlbie_pid_lpid(pid, lpid);
        }
        asm volatile("eieio; tlbsync; ptesync" : : : "memory");
}
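The switch above exists only to funnel a runtime value into a compile-time immediate; a minimal standalone sketch of the same pattern (hypothetical macro and values, not kernel code):

#define EMIT_IMM(k)     asm volatile("# immediate %0" : : "i"(k))

static void emit(int k)
{
        /* Each case hands the asm a literal, satisfying the "i" constraint */
        switch (k) {
        case 0:
                EMIT_IMM(0);
                break;
        case 1:
                EMIT_IMM(1);
                break;
        default:
                EMIT_IMM(2);
                break;
        }
}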
static inline void fixup_tlbie_va_range_lpid(unsigned long va,
                                             unsigned long pid,
                                             unsigned long lpid,
                                             unsigned long ap)
{
        if (cpu_has_feature(CPU_FTR_P9_TLBIE_ERAT_BUG)) {
                asm volatile("ptesync" : : : "memory");
                __tlbie_pid_lpid(0, lpid, RIC_FLUSH_TLB);
        }

        if (cpu_has_feature(CPU_FTR_P9_TLBIE_STQ_BUG)) {
                asm volatile("ptesync" : : : "memory");
                __tlbie_va_lpid(va, pid, lpid, ap, RIC_FLUSH_TLB);
        }
}
static inline void __tlbie_va_range_lpid(unsigned long start, unsigned long end,
                                         unsigned long pid, unsigned long lpid,
                                         unsigned long page_size,
                                         unsigned long psize)
{
        unsigned long addr;
        unsigned long ap = mmu_get_ap(psize);

        for (addr = start; addr < end; addr += page_size)
                __tlbie_va_lpid(addr, pid, lpid, ap, RIC_FLUSH_TLB);

        fixup_tlbie_va_range_lpid(addr - page_size, pid, lpid, ap);
}
static inline void _tlbie_va_range_lpid(unsigned long start, unsigned long end,
                                        unsigned long pid, unsigned long lpid,
                                        unsigned long page_size,
                                        unsigned long psize, bool also_pwc)
{
        asm volatile("ptesync" : : : "memory");
        if (also_pwc)
                __tlbie_pid_lpid(pid, lpid, RIC_FLUSH_PWC);
        __tlbie_va_range_lpid(start, end, pid, lpid, page_size, psize);
        asm volatile("eieio; tlbsync; ptesync" : : : "memory");
}
KVM: PPC: Book3S HV: Add support for H_RPT_INVALIDATE
H_RPT_INVALIDATE does two types of TLB invalidations:
1. Process-scoped invalidations for guests when LPCR[GTSE]=0.
This is currently not used in KVM as GTSE is not usually
disabled in KVM.
2. Partition-scoped invalidations that an L1 hypervisor does on
behalf of an L2 guest. This is currently handled by the
H_TLB_INVALIDATE hcall, which this new hcall replaces.
This commit enables process-scoped invalidations for L1 guests.
Support for process-scoped and partition-scoped invalidations
from/for nested guests will be added separately.
Process-scoped tlbie invalidations from L1 and nested guests
need the RS register of the TLBIE instruction to contain both
the PID and the LPID. This patch introduces primitives that
execute the tlbie instruction with both PID and LPID set, in
preparation for the H_RPT_INVALIDATE hcall.
A description of H_RPT_INVALIDATE follows:
int64   /* H_Success: Return code on successful completion */
        /* H_Busy - repeat the call with the same parameters */
        /* H_Parameter, H_P2, H_P3, H_P4, H_P5: Invalid parameters */
hcall(const uint64 H_RPT_INVALIDATE, /* Invalidate RPT
                                        translation
                                        lookaside information */
      uint64 id,       /* PID/LPID to invalidate */
      uint64 target,   /* Invalidation target */
      uint64 type,     /* Type of lookaside information */
      uint64 pg_sizes, /* Page sizes */
      uint64 start,    /* Start of Effective Address (EA)
                          range (inclusive) */
      uint64 end)      /* End of EA range (exclusive) */
Invalidation targets (target)
-----------------------------
Core MMU 0x01 /* All virtual processors in the
partition */
Core local MMU 0x02 /* Current virtual processor */
Nest MMU 0x04 /* All nest/accelerator agents
in use by the partition */
A combination of the above can be specified,
except core and core local.
Type of translation to invalidate (type)
---------------------------------------
NESTED 0x0001 /* invalidate nested guest partition-scope */
TLB 0x0002 /* Invalidate TLB */
PWC 0x0004 /* Invalidate Page Walk Cache */
PRT 0x0008 /* Invalidate caching of Process Table
Entries if NESTED is clear */
PAT 0x0008 /* Invalidate caching of Partition Table
Entries if NESTED is set */
A combination of the above can be specified.
Page size mask (pages)
----------------------
4K 0x01
64K 0x02
2M 0x04
1G 0x08
All sizes (-1UL)
A combination of the above can be specified.
All page sizes can be selected with -1.
Semantics: Invalidate radix tree lookaside information
matching the parameters given.
* Return H_P2, H_P3 or H_P4 if target, type, or pageSizes parameters
are different from the defined values.
* Return H_PARAMETER if NESTED is set and pid is not a valid nested
LPID allocated to this partition
* Return H_P5 if (start, end) doesn't form a valid range. Start and
end should be a valid Quadrant address and end > start.
* Return H_NotSupported if the partition is not running in radix
translation mode.
* May invalidate more translation information than requested.
* If start = 0 and end = -1, set the range to cover all valid
addresses. Otherwise, start and end should be aligned to 4 kB
(lower 12 bits clear).
* If NESTED is clear, then invalidate process scoped lookaside
information. Else pid specifies a nested LPID, and the invalidation
is performed on nested guest partition table and nested guest
partition scope real addresses.
* If pid = 0 and NESTED is clear, then valid addresses are quadrant 3
and quadrant 0 spaces; otherwise valid addresses are quadrant 0.
* Pages which are fully covered by the range are to be invalidated.
Those which are partially covered are considered outside
invalidation range, which allows a caller to optimally invalidate
ranges that may contain mixed page sizes.
* Return H_SUCCESS on success.
Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210621085003.904767-4-bharata@linux.ibm.com
2021-06-21 14:20:00 +05:30
/*
 * Performs process-scoped invalidations for a given LPID
 * as part of H_RPT_INVALIDATE hcall.
 */
void do_h_rpt_invalidate_prt(unsigned long pid, unsigned long lpid,
                             unsigned long type, unsigned long pg_sizes,
                             unsigned long start, unsigned long end)
{
        unsigned long psize, nr_pages;
        struct mmu_psize_def *def;
        bool flush_pid;

        /*
         * A H_RPTI_TYPE_ALL request implies RIC=3, hence
         * do a single IS=1 based flush.
         */
        if ((type & H_RPTI_TYPE_ALL) == H_RPTI_TYPE_ALL) {
                _tlbie_pid_lpid(pid, lpid, RIC_FLUSH_ALL);
                return;
        }

        if (type & H_RPTI_TYPE_PWC)
                _tlbie_pid_lpid(pid, lpid, RIC_FLUSH_PWC);

        /* Full PID flush */
        if (start == 0 && end == -1)
                return _tlbie_pid_lpid(pid, lpid, RIC_FLUSH_TLB);

        /* Do range invalidation for all the valid page sizes */
        for (psize = 0; psize < MMU_PAGE_COUNT; psize++) {
                def = &mmu_psize_defs[psize];
                if (!(pg_sizes & def->h_rpt_pgsize))
                        continue;

                nr_pages = (end - start) >> def->shift;
                flush_pid = nr_pages > tlb_single_page_flush_ceiling;

                /*
                 * If the number of pages spanning the range is above
                 * the ceiling, convert the request into a full PID flush.
                 * And since PID flush takes out all the page sizes, there
                 * is no need to consider remaining page sizes.
                 */
                if (flush_pid) {
                        _tlbie_pid_lpid(pid, lpid, RIC_FLUSH_TLB);
                        return;
                }
                _tlbie_va_range_lpid(start, end, pid, lpid,
                                     (1UL << def->shift), psize, false);
        }
}
EXPORT_SYMBOL_GPL(do_h_rpt_invalidate_prt);

#endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
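A hedged usage sketch of the exported helper (not taken from the KVM handler that actually consumes it): flush the TLB for every page size of one guest process, given its PID and the guest LPID. H_RPTI_TYPE_TLB and H_RPTI_PAGE_ALL are assumed to carry the values from the hcall description above (0x0002 and -1UL).

static void example_flush_guest_process(unsigned long pid, unsigned long lpid)
{
        /* start = 0, end = -1UL selects the full effective-address range */
        do_h_rpt_invalidate_prt(pid, lpid, H_RPTI_TYPE_TLB, H_RPTI_PAGE_ALL,
                                0, -1UL);
}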
static int __init create_tlb_single_page_flush_ceiling(void)
{
        debugfs_create_u32("tlb_single_page_flush_ceiling", 0600,
                           arch_debugfs_dir, &tlb_single_page_flush_ceiling);
        debugfs_create_u32("tlb_local_single_page_flush_ceiling", 0600,
                           arch_debugfs_dir, &tlb_local_single_page_flush_ceiling);
        return 0;
}
late_initcall(create_tlb_single_page_flush_ceiling);
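Assuming debugfs is mounted in the usual place and arch_debugfs_dir maps to the powerpc directory there, the ceiling consulted by do_h_rpt_invalidate_prt() and the other range-flush paths becomes tunable at runtime via /sys/kernel/debug/powerpc/tlb_single_page_flush_ceiling (with tlb_local_single_page_flush_ceiling as its local counterpart).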