linux/arch/x86/kernel/cpu
Peter Newman fe1f071438 x86/resctrl: Fix task CLOSID/RMID update race
When the user moves a running task to a new rdtgroup using the tasks
file interface or by deleting its rdtgroup, the resulting change in
CLOSID/RMID must be immediately propagated to the PQR_ASSOC MSR on the
CPU(s) running the task(s).

x86 allows loads to be reordered with prior stores. If the CPU hoists
the task_curr() check ahead of the stores in the CLOSID/RMID update and
the task starts running between the two, the task runs with the old
CLOSID/RMID until it is switched again, because __rdtgroup_move_task()
failed to determine that the task needs to be interrupted to obtain the
new CLOSID/RMID.

Refer to the diagram below:

CPU 0                                   CPU 1
-----                                   -----
__rdtgroup_move_task():
  curr <- t1->cpu->rq->curr
                                        __schedule():
                                          rq->curr <- t1
                                        resctrl_sched_in():
                                          t1->{closid,rmid} -> {1,1}
  t1->{closid,rmid} <- {2,2}
  if (curr == t1) // false
   IPI(t1->cpu)

A similar race impacts rdt_move_group_tasks(), which updates tasks in a
deleted rdtgroup.

In both cases, use smp_mb() to order the task_struct::{closid,rmid}
stores before the loads in task_curr().  In particular, in the
rdt_move_group_tasks() case, simply execute an smp_mb() on every
iteration with a matching task.
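
The ordering can be sketched in userspace C11, with
atomic_thread_fence(memory_order_seq_cst) standing in for smp_mb().
This is a minimal model under stated assumptions, not the kernel code:
struct model_task, struct model_rq, model_move_task() and
model_sched_in() are hypothetical names, and the fence on the switch-in
side is a property of this model only, not a statement about how
__schedule() is ordered.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for task_struct and the runqueue. */
struct model_task {
    atomic_uint closid;
    atomic_uint rmid;
};

struct model_rq {
    _Atomic(struct model_task *) curr;  /* task running on this CPU */
};

/*
 * Updater side: publish the new IDs, then decide whether the CPU must be
 * interrupted. Returns true when an IPI would be sent.
 */
static bool model_move_task(struct model_rq *rq, struct model_task *t,
                            unsigned int closid, unsigned int rmid)
{
    assert(rq != NULL && t != NULL);

    atomic_store_explicit(&t->closid, closid, memory_order_relaxed);
    atomic_store_explicit(&t->rmid, rmid, memory_order_relaxed);

    /*
     * Stand-in for smp_mb(): a full fence, so the stores above cannot be
     * reordered after the load of rq->curr below.
     */
    atomic_thread_fence(memory_order_seq_cst);

    return atomic_load_explicit(&rq->curr, memory_order_relaxed) == t;
}

/*
 * Switch-in side: make the task current, then read the IDs it will run
 * with. The fence here is an assumption of the model.
 */
static void model_sched_in(struct model_rq *rq, struct model_task *t,
                           unsigned int *closid, unsigned int *rmid)
{
    assert(rq != NULL && t != NULL);

    atomic_store_explicit(&rq->curr, t, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    *closid = atomic_load_explicit(&t->closid, memory_order_relaxed);
    *rmid = atomic_load_explicit(&t->rmid, memory_order_relaxed);
}
```

With a full fence on both sides, at least one of them observes the
other's store: either the updater sees the task as current and sends
the IPI, or the switch-in path reads the new IDs.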

It is possible to use a single smp_mb() in rdt_move_group_tasks(), but
this would require two passes and a means of remembering which
task_structs were updated in the first pass. However, the benchmark
results below show that the simple approach costs too little to justify
implementing the two-pass approach.
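
The trade-off between the two approaches can be illustrated with a
single-threaded userspace model. Everything here is hypothetical
(struct model_task, move_simple(), move_two_pass(), the running flag
standing in for task_curr(), and the 64-entry limit), and
atomic_thread_fence(memory_order_seq_cst) again stands in for smp_mb().
It is a sketch of the bookkeeping involved, not the kernel
implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_TASKS 64

/* Hypothetical stand-in for the task_struct fields involved. */
struct model_task {
    atomic_uint closid;   /* doubles as the group key in this model */
    atomic_bool running;  /* stand-in for what task_curr() reports */
    unsigned int cpu;     /* CPU the task is on, must be < 64 */
};

/* Simple approach: one full fence per matching task. */
static unsigned long long move_simple(struct model_task *tasks, size_t n,
                                      unsigned int from, unsigned int to,
                                      unsigned int *fences)
{
    unsigned long long cpu_mask = 0;

    for (size_t i = 0; i < n; i++) {
        if (atomic_load_explicit(&tasks[i].closid, memory_order_relaxed) != from)
            continue;
        atomic_store_explicit(&tasks[i].closid, to, memory_order_relaxed);
        atomic_thread_fence(memory_order_seq_cst);  /* stand-in for smp_mb() */
        (*fences)++;
        if (atomic_load_explicit(&tasks[i].running, memory_order_relaxed))
            cpu_mask |= 1ULL << tasks[i].cpu;
    }
    return cpu_mask;
}

/* Two-pass approach: remember updated tasks, fence once, then check. */
static unsigned long long move_two_pass(struct model_task *tasks, size_t n,
                                        unsigned int from, unsigned int to,
                                        unsigned int *fences)
{
    bool updated[MODEL_MAX_TASKS] = { false };  /* the extra bookkeeping */
    unsigned long long cpu_mask = 0;

    assert(n <= MODEL_MAX_TASKS);

    /* Pass 1: perform all stores and record which tasks were touched. */
    for (size_t i = 0; i < n; i++) {
        if (atomic_load_explicit(&tasks[i].closid, memory_order_relaxed) != from)
            continue;
        atomic_store_explicit(&tasks[i].closid, to, memory_order_relaxed);
        updated[i] = true;
    }

    atomic_thread_fence(memory_order_seq_cst);  /* single stand-in smp_mb() */
    (*fences)++;

    /* Pass 2: only now check which updated tasks are running. */
    for (size_t i = 0; i < n; i++) {
        if (updated[i] &&
            atomic_load_explicit(&tasks[i].running, memory_order_relaxed))
            cpu_mask |= 1ULL << tasks[i].cpu;
    }
    return cpu_mask;
}
```

Both functions return the same mask of CPUs to interrupt. The two-pass
variant trades the per-task fences for a second walk over the tasks and
storage for the set of updated tasks.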

Times below were collected using `perf stat` to measure the time to
remove a group containing a 1600-task, parallel workload.

CPU: Intel(R) Xeon(R) Platinum P-8136 CPU @ 2.00GHz (112 threads)

  # mkdir /sys/fs/resctrl/test
  # echo $$ > /sys/fs/resctrl/test/tasks
  # perf bench sched messaging -g 40 -l 100000

task-clock time ranges collected using:

  # perf stat rmdir /sys/fs/resctrl/test

Baseline:                     1.54 - 1.60 ms
smp_mb() every matching task: 1.57 - 1.67 ms

  [ bp: Massage commit message. ]

Fixes: ae28d1aae4 ("x86/resctrl: Use an IPI instead of task_work_add() to update PQR_ASSOC MSR")
Fixes: 0efc89be94 ("x86/intel_rdt: Update task closid immediately on CPU in rmdir and unmount")
Signed-off-by: Peter Newman <peternewman@google.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Babu Moger <babu.moger@amd.com>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/20221220161123.432120-1-peternewman@google.com
2023-01-10 19:47:30 +01:00
..
mce x86/mce: Use severity table to handle uncorrected errors in kernel 2022-10-31 17:01:19 +01:00
microcode - Add support for multiple testing sequences to the Intel In-Field Scan 2022-12-13 15:05:29 -08:00
mtrr x86/mtrr: Make message for disabled MTRRs more descriptive 2022-12-05 11:08:25 +01:00
resctrl x86/resctrl: Fix task CLOSID/RMID update race 2023-01-10 19:47:30 +01:00
sgx MM patches for 6.2-rc1. 2022-12-13 19:29:45 -08:00
.gitignore
acrn.c x86/acrn: Set up timekeeping 2022-08-04 11:11:59 +02:00
amd.c - Split MTRR and PAT init code to accommodate at least Xen PV and TDX 2022-12-13 14:56:56 -08:00
aperfmperf.c
bugs.c x86/bugs: Flush IBP in ib_prctl_set() 2023-01-04 11:25:32 +01:00
cacheinfo.c x86/cacheinfo: Switch cache_ap_init() to hotplug callback 2022-11-10 13:12:45 +01:00
centaur.c
common.c - Add the call depth tracking mitigation for Retbleed which has 2022-12-14 15:03:00 -08:00
cpu.h
cpuid-deps.c KVM/VMX: Allow exposing EDECCSSA user leaf function to KVM guest 2022-11-04 15:33:56 -07:00
cyrix.c x86/cyrix: include header linux/isa-dma.h 2022-07-26 14:03:12 -05:00
feat_ctl.c x86/cpu: Include the header of init_ia32_feat_ctl()'s prototype 2022-09-26 17:06:27 +02:00
hygon.c - Split MTRR and PAT init code to accommodate at least Xen PV and TDX 2022-12-13 14:56:56 -08:00
hypervisor.c
intel.c - Add support for multiple testing sequences to the Intel In-Field Scan 2022-12-13 15:05:29 -08:00
intel_epb.c x86/intel_epb: Set Alder Lake N and Raptor Lake P normal EPB 2022-11-03 11:31:01 -07:00
intel_pconfig.c
Makefile x86/cpu: Re-enable stackprotector 2022-10-17 16:40:56 +02:00
match.c
mkcapflags.sh
mshyperv.c iommu/hyper-v: Allow hyperv irq remapping without x2apic 2022-11-28 16:48:20 +00:00
perfctr-watchdog.c
powerflags.c
proc.c
rdrand.c x86/rdrand: Remove "nordrand" flag in favor of "random.trust_cpu" 2022-07-18 15:04:04 +02:00
scattered.c KVM/VMX: Allow exposing EDECCSSA user leaf function to KVM guest 2022-11-04 15:33:56 -07:00
topology.c x86/topology: Fix duplicated core ID within a package 2022-10-17 11:58:52 -07:00
transmeta.c
tsx.c x86/tsx: Add a feature bit for TSX control MSR support 2022-11-21 14:08:20 +01:00
umc.c
umwait.c
vmware.c
vortex.c
zhaoxin.c