linux/arch/x86/kernel/cpu/resctrl/ctrlmondata.c

598 lines
14 KiB
C
Raw Normal View History

// SPDX-License-Identifier: GPL-2.0-only
/*
* Resource Director Technology(RDT)
* - Cache Allocation code.
*
* Copyright (C) 2016 Intel Corporation
*
* Authors:
* Fenghua Yu <fenghua.yu@intel.com>
* Tony Luck <tony.luck@intel.com>
*
* More information about RDT be found in the Intel (R) x86 Architecture
* Software Developer Manual June 2016, volume 3, section 17.17.
*/
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
x86/intel_rdt: Ensure a CPU remains online for the region's pseudo-locking sequence The user triggers the creation of a pseudo-locked region when writing the requested schemata to the schemata resctrl file. The pseudo-locking of a region is required to be done on a CPU that is associated with the cache on which the pseudo-locked region will reside. In order to run the locking code on a specific CPU, the needed CPU has to be selected and ensured to remain online during the entire locking sequence. At this time, the cpu_hotplug_lock is not taken during the pseudo-lock region creation and it is thus possible for a CPU to be selected to run the pseudo-locking code and then that CPU to go offline before the thread is able to run on it. Fix this by ensuring that the cpu_hotplug_lock is taken while the CPU on which code has to run needs to be controlled. Since the cpu_hotplug_lock is always taken before rdtgroup_mutex the lock order is maintained. Fixes: e0bdfe8e36f3 ("x86/intel_rdt: Support creation/removal of pseudo-locked region") Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: gavin.hindman@intel.com Cc: jithu.joseph@intel.com Cc: stable <stable@vger.kernel.org> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/b7b17432a80f95a1fa21a1698ba643014f58ad31.1544476425.git.reinette.chatre@intel.com
2018-12-10 13:21:54 -08:00
#include <linux/cpu.h>
#include <linux/kernfs.h>
#include <linux/seq_file.h>
#include <linux/slab.h>
x86/resctrl: Queue mon_event_read() instead of sending an IPI Intel is blessed with an abundance of monitors, one per RMID, that can be read from any CPU in the domain. MPAMs monitors reside in the MMIO MSC, the number implemented is up to the manufacturer. This means when there are fewer monitors than needed, they need to be allocated and freed. MPAM's CSU monitors are used to back the 'llc_occupancy' monitor file. The CSU counter is allowed to return 'not ready' for a small number of micro-seconds after programming. To allow one CSU hardware monitor to be used for multiple control or monitor groups, the CPU accessing the monitor needs to be able to block when configuring and reading the counter. Worse, the domain may be broken up into slices, and the MMIO accesses for each slice may need performing from different CPUs. These two details mean MPAMs monitor code needs to be able to sleep, and IPI another CPU in the domain to read from a resource that has been sliced. mon_event_read() already invokes mon_event_count() via IPI, which means this isn't possible. On systems using nohz-full, some CPUs need to be interrupted to run kernel work as they otherwise stay in user-space running realtime workloads. Interrupting these CPUs should be avoided, and scheduling work on them may never complete. Change mon_event_read() to pick a housekeeping CPU, (one that is not using nohz_full) and schedule mon_event_count() and wait. If all the CPUs in a domain are using nohz-full, then an IPI is used as the fallback. This function is only used in response to a user-space filesystem request (not the timing sensitive overflow code). This allows MPAM to hide the slice behaviour from resctrl, and to keep the monitor-allocation in monitor.c. When the IPI fallback is used on machines where MPAM needs to make an access on multiple CPUs, the counter read will always fail. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by: Peter Newman <peternewman@google.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Babu Moger <babu.moger@amd.com> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Babu Moger <babu.moger@amd.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64 Link: https://lore.kernel.org/r/20240213184438.16675-14-james.morse@arm.com Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2024-02-13 18:44:27 +00:00
#include <linux/tick.h>
x86/resctrl: Rename and move rdt files to a separate directory New generation of AMD processors add support for RDT (or QOS) features. Together, these features will be called RESCTRL. With more than one vendors supporting these features, it seems more appropriate to rename these files. Create a new directory with the name 'resctrl' and move all the intel_rdt files to the new directory. This way all the resctrl related code resides inside one directory. [ bp: Add SPDX identifier to the Makefile ] Suggested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: "Chang S. Bae" <chang.seok.bae@intel.com> Cc: David Miller <davem@davemloft.net> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Dmitry Safonov <dima@arista.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kate Stewart <kstewart@linuxfoundation.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: <linux-doc@vger.kernel.org> Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Philippe Ombredanne <pombredanne@nexb.com> Cc: Pu Wen <puwen@hygon.cn> Cc: <qianyue.zj@alibaba-inc.com> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Reinette Chatre <reinette.chatre@intel.com> Cc: Rian Hunter <rian@alum.mit.edu> Cc: Sherry Hurwitz <sherry.hurwitz@amd.com> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Lendacky <Thomas.Lendacky@amd.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: <xiaochen.shen@intel.com> Link: https://lkml.kernel.org/r/20181121202811.4492-2-babu.moger@amd.com
2018-11-21 20:28:25 +00:00
#include "internal.h"
/*
* Check whether MBA bandwidth percentage value is correct. The value is
* checked against the minimum and max bandwidth values specified by the
* hardware. The allocated bandwidth percentage is rounded to the next
* control step available on the hardware.
*/
static bool bw_validate(char *buf, unsigned long *data, struct rdt_resource *r)
{
unsigned long bw;
int ret;
/*
* Only linear delay values is supported for current Intel SKUs.
*/
if (!r->membw.delay_linear && r->membw.arch_needs_linear) {
rdt_last_cmd_puts("No support for non-linear MB domains\n");
return false;
}
ret = kstrtoul(buf, 10, &bw);
if (ret) {
rdt_last_cmd_printf("Non-decimal digit in MB value %s\n", buf);
return false;
}
if ((bw < r->membw.min_bw || bw > r->default_ctrl) &&
!is_mba_sc(r)) {
rdt_last_cmd_printf("MB value %ld out of range [%d,%d]\n", bw,
r->membw.min_bw, r->default_ctrl);
return false;
}
*data = roundup(bw, (unsigned long)r->membw.bw_gran);
return true;
}
int parse_bw(struct rdt_parse_data *data, struct resctrl_schema *s,
struct rdt_domain *d)
{
struct resctrl_staged_config *cfg;
x86/resctrl: Switch over to the resctrl mbps_val list Updates to resctrl's software controller follow the same path as other configuration updates, but they don't modify the hardware state. rdtgroup_schemata_write() uses parse_line() and the resource's parse_ctrlval() function to stage the configuration. resctrl_arch_update_domains() then updates the mbps_val[] array instead, and resctrl_arch_update_domains() skips the rdt_ctrl_update() call that would update hardware. This complicates the interface between resctrl's filesystem parts and architecture specific code. It should be possible for mba_sc to be completely implemented by the filesystem parts of resctrl. This would allow it to work on a second architecture with no additional code. resctrl_arch_update_domains() using the mbps_val[] array prevents this. Change parse_bw() to write the configuration value directly to the mbps_val[] array in the domain structure. Change rdtgroup_schemata_write() to skip the call to resctrl_arch_update_domains(), meaning all the mba_sc specific code in resctrl_arch_update_domains() can be removed. On the read-side, show_doms() and update_mba_bw() are changed to read the mbps_val[] array from the domain structure. With this, resctrl_arch_get_config() no longer needs to consider mba_sc resources. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jamie Iles <quic_jiles@quicinc.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Xin Hao <xhao@linux.alibaba.com> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by: Cristian Marussi <cristian.marussi@arm.com> Link: https://lore.kernel.org/r/20220902154829.30399-10-james.morse@arm.com
2022-09-02 15:48:17 +00:00
u32 closid = data->rdtgrp->closid;
struct rdt_resource *r = s->res;
unsigned long bw_val;
cfg = &d->staged_config[s->conf_type];
if (cfg->have_new_ctrl) {
x86/resctrl: Fixup the user-visible strings Fix the messages in rdt_last_cmd_printf() and rdt_last_cmd_puts() to make them more meaningful and consistent. [ bp: s/cpu/CPU/; s/mem\W/memory ] Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: "Chang S. Bae" <chang.seok.bae@intel.com> Cc: David Miller <davem@davemloft.net> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Dmitry Safonov <dima@arista.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kate Stewart <kstewart@linuxfoundation.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: <linux-doc@vger.kernel.org> Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Philippe Ombredanne <pombredanne@nexb.com> Cc: Pu Wen <puwen@hygon.cn> Cc: <qianyue.zj@alibaba-inc.com> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Reinette Chatre <reinette.chatre@intel.com> Cc: Rian Hunter <rian@alum.mit.edu> Cc: Sherry Hurwitz <sherry.hurwitz@amd.com> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Lendacky <Thomas.Lendacky@amd.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: <xiaochen.shen@intel.com> Link: https://lkml.kernel.org/r/20181121202811.4492-11-babu.moger@amd.com
2018-11-21 20:28:43 +00:00
rdt_last_cmd_printf("Duplicate domain %d\n", d->id);
return -EINVAL;
}
if (!bw_validate(data->buf, &bw_val, r))
return -EINVAL;
x86/resctrl: Switch over to the resctrl mbps_val list Updates to resctrl's software controller follow the same path as other configuration updates, but they don't modify the hardware state. rdtgroup_schemata_write() uses parse_line() and the resource's parse_ctrlval() function to stage the configuration. resctrl_arch_update_domains() then updates the mbps_val[] array instead, and resctrl_arch_update_domains() skips the rdt_ctrl_update() call that would update hardware. This complicates the interface between resctrl's filesystem parts and architecture specific code. It should be possible for mba_sc to be completely implemented by the filesystem parts of resctrl. This would allow it to work on a second architecture with no additional code. resctrl_arch_update_domains() using the mbps_val[] array prevents this. Change parse_bw() to write the configuration value directly to the mbps_val[] array in the domain structure. Change rdtgroup_schemata_write() to skip the call to resctrl_arch_update_domains(), meaning all the mba_sc specific code in resctrl_arch_update_domains() can be removed. On the read-side, show_doms() and update_mba_bw() are changed to read the mbps_val[] array from the domain structure. With this, resctrl_arch_get_config() no longer needs to consider mba_sc resources. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jamie Iles <quic_jiles@quicinc.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Xin Hao <xhao@linux.alibaba.com> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by: Cristian Marussi <cristian.marussi@arm.com> Link: https://lore.kernel.org/r/20220902154829.30399-10-james.morse@arm.com
2022-09-02 15:48:17 +00:00
if (is_mba_sc(r)) {
d->mbps_val[closid] = bw_val;
return 0;
}
cfg->new_ctrl = bw_val;
cfg->have_new_ctrl = true;
return 0;
}
/*
* Check whether a cache bit mask is valid.
* On Intel CPUs, non-contiguous 1s value support is indicated by CPUID:
* - CPUID.0x10.1:ECX[3]: L3 non-contiguous 1s value supported if 1
* - CPUID.0x10.2:ECX[3]: L2 non-contiguous 1s value supported if 1
*
* Haswell does not support a non-contiguous 1s value and additionally
* requires at least two bits set.
* AMD allows non-contiguous bitmasks.
*/
static bool cbm_validate(char *buf, u32 *data, struct rdt_resource *r)
{
unsigned long first_bit, zero_bit, val;
unsigned int cbm_len = r->cache.cbm_len;
int ret;
ret = kstrtoul(buf, 16, &val);
if (ret) {
x86/resctrl: Fixup the user-visible strings Fix the messages in rdt_last_cmd_printf() and rdt_last_cmd_puts() to make them more meaningful and consistent. [ bp: s/cpu/CPU/; s/mem\W/memory ] Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: "Chang S. Bae" <chang.seok.bae@intel.com> Cc: David Miller <davem@davemloft.net> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Dmitry Safonov <dima@arista.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kate Stewart <kstewart@linuxfoundation.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: <linux-doc@vger.kernel.org> Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Philippe Ombredanne <pombredanne@nexb.com> Cc: Pu Wen <puwen@hygon.cn> Cc: <qianyue.zj@alibaba-inc.com> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Reinette Chatre <reinette.chatre@intel.com> Cc: Rian Hunter <rian@alum.mit.edu> Cc: Sherry Hurwitz <sherry.hurwitz@amd.com> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Lendacky <Thomas.Lendacky@amd.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: <xiaochen.shen@intel.com> Link: https://lkml.kernel.org/r/20181121202811.4492-11-babu.moger@amd.com
2018-11-21 20:28:43 +00:00
rdt_last_cmd_printf("Non-hex character in the mask %s\n", buf);
return false;
}
if ((r->cache.min_cbm_bits > 0 && val == 0) || val > r->default_ctrl) {
x86/resctrl: Fixup the user-visible strings Fix the messages in rdt_last_cmd_printf() and rdt_last_cmd_puts() to make them more meaningful and consistent. [ bp: s/cpu/CPU/; s/mem\W/memory ] Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: "Chang S. Bae" <chang.seok.bae@intel.com> Cc: David Miller <davem@davemloft.net> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Dmitry Safonov <dima@arista.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kate Stewart <kstewart@linuxfoundation.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: <linux-doc@vger.kernel.org> Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Philippe Ombredanne <pombredanne@nexb.com> Cc: Pu Wen <puwen@hygon.cn> Cc: <qianyue.zj@alibaba-inc.com> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Reinette Chatre <reinette.chatre@intel.com> Cc: Rian Hunter <rian@alum.mit.edu> Cc: Sherry Hurwitz <sherry.hurwitz@amd.com> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Lendacky <Thomas.Lendacky@amd.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: <xiaochen.shen@intel.com> Link: https://lkml.kernel.org/r/20181121202811.4492-11-babu.moger@amd.com
2018-11-21 20:28:43 +00:00
rdt_last_cmd_puts("Mask out of range\n");
return false;
}
first_bit = find_first_bit(&val, cbm_len);
zero_bit = find_next_zero_bit(&val, cbm_len, first_bit);
/* Are non-contiguous bitmasks allowed? */
if (!r->cache.arch_has_sparse_bitmasks &&
(find_next_bit(&val, cbm_len, zero_bit) < cbm_len)) {
x86/resctrl: Fixup the user-visible strings Fix the messages in rdt_last_cmd_printf() and rdt_last_cmd_puts() to make them more meaningful and consistent. [ bp: s/cpu/CPU/; s/mem\W/memory ] Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: "Chang S. Bae" <chang.seok.bae@intel.com> Cc: David Miller <davem@davemloft.net> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Dmitry Safonov <dima@arista.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kate Stewart <kstewart@linuxfoundation.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: <linux-doc@vger.kernel.org> Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Philippe Ombredanne <pombredanne@nexb.com> Cc: Pu Wen <puwen@hygon.cn> Cc: <qianyue.zj@alibaba-inc.com> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Reinette Chatre <reinette.chatre@intel.com> Cc: Rian Hunter <rian@alum.mit.edu> Cc: Sherry Hurwitz <sherry.hurwitz@amd.com> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Lendacky <Thomas.Lendacky@amd.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: <xiaochen.shen@intel.com> Link: https://lkml.kernel.org/r/20181121202811.4492-11-babu.moger@amd.com
2018-11-21 20:28:43 +00:00
rdt_last_cmd_printf("The mask %lx has non-consecutive 1-bits\n", val);
return false;
}
if ((zero_bit - first_bit) < r->cache.min_cbm_bits) {
x86/resctrl: Fixup the user-visible strings Fix the messages in rdt_last_cmd_printf() and rdt_last_cmd_puts() to make them more meaningful and consistent. [ bp: s/cpu/CPU/; s/mem\W/memory ] Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: "Chang S. Bae" <chang.seok.bae@intel.com> Cc: David Miller <davem@davemloft.net> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Dmitry Safonov <dima@arista.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kate Stewart <kstewart@linuxfoundation.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: <linux-doc@vger.kernel.org> Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Philippe Ombredanne <pombredanne@nexb.com> Cc: Pu Wen <puwen@hygon.cn> Cc: <qianyue.zj@alibaba-inc.com> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Reinette Chatre <reinette.chatre@intel.com> Cc: Rian Hunter <rian@alum.mit.edu> Cc: Sherry Hurwitz <sherry.hurwitz@amd.com> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Lendacky <Thomas.Lendacky@amd.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: <xiaochen.shen@intel.com> Link: https://lkml.kernel.org/r/20181121202811.4492-11-babu.moger@amd.com
2018-11-21 20:28:43 +00:00
rdt_last_cmd_printf("Need at least %d bits in the mask\n",
r->cache.min_cbm_bits);
return false;
}
*data = val;
return true;
}
/*
* Read one cache bit mask (hex). Check that it is valid for the current
* resource type.
*/
int parse_cbm(struct rdt_parse_data *data, struct resctrl_schema *s,
struct rdt_domain *d)
{
struct rdtgroup *rdtgrp = data->rdtgrp;
struct resctrl_staged_config *cfg;
struct rdt_resource *r = s->res;
u32 cbm_val;
cfg = &d->staged_config[s->conf_type];
if (cfg->have_new_ctrl) {
x86/resctrl: Fixup the user-visible strings Fix the messages in rdt_last_cmd_printf() and rdt_last_cmd_puts() to make them more meaningful and consistent. [ bp: s/cpu/CPU/; s/mem\W/memory ] Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: "Chang S. Bae" <chang.seok.bae@intel.com> Cc: David Miller <davem@davemloft.net> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Dmitry Safonov <dima@arista.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kate Stewart <kstewart@linuxfoundation.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: <linux-doc@vger.kernel.org> Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Philippe Ombredanne <pombredanne@nexb.com> Cc: Pu Wen <puwen@hygon.cn> Cc: <qianyue.zj@alibaba-inc.com> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Reinette Chatre <reinette.chatre@intel.com> Cc: Rian Hunter <rian@alum.mit.edu> Cc: Sherry Hurwitz <sherry.hurwitz@amd.com> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Lendacky <Thomas.Lendacky@amd.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: <xiaochen.shen@intel.com> Link: https://lkml.kernel.org/r/20181121202811.4492-11-babu.moger@amd.com
2018-11-21 20:28:43 +00:00
rdt_last_cmd_printf("Duplicate domain %d\n", d->id);
return -EINVAL;
}
/*
* Cannot set up more than one pseudo-locked region in a cache
* hierarchy.
*/
if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP &&
rdtgroup_pseudo_locked_in_hierarchy(d)) {
x86/resctrl: Use rdt_last_cmd_puts() where possible The last_cmd_status sequence buffer contains user-visible messages (accessed via /sys/fs/resctrl/info/last_cmd_status) that detail any errors encountered while interacting with the resctrl filesystem. rdt_last_cmd_printf() and rdt_last_cmd_puts() are the two calls available to respectively print a string with format specifiers or a simple one (which contains no format specifiers) to the last_cmd_status buffer. A few occurrences exist where rdt_last_cmd_printf() is used to print a simple string. Doing so does not result in incorrect result or incorrect behavior, but rdt_last_cmd_puts() is the function intended to be used in these cases, as it is faster and it doesn't need to do the vsnprintf() formatting. Fix these occurrences to use rdt_last_cmd_puts() instead. While doing so, fix two typos that were recently introduced into two of these simple strings. [ bp: massage commit message and correct typos. ] Fixes: 723f1a0dd8e2 ("x86/resctrl: Fixup the user-visible strings") Fixes: e0bdfe8e36f3 ("x86/intel_rdt: Support creation/removal of pseudo-locked region") Fixes: 9ab9aa15c309 ("x86/intel_rdt: Ensure requested schemata respects mode") Fixes: d48d7a57f718 ("x86/intel_rdt: Introduce resource group's mode resctrl file") Fixes: dfe9674b04ff ("x86/intel_rdt: Enable entering of pseudo-locksetup mode") Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: babu.moger@amd.com Cc: jithu.joseph@intel.com Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/f48e46a016d6a5c79f13de8faeca382052189e2e.1543346009.git.reinette.chatre@intel.com
2018-11-27 11:19:36 -08:00
rdt_last_cmd_puts("Pseudo-locked region in hierarchy\n");
return -EINVAL;
}
if (!cbm_validate(data->buf, &cbm_val, r))
return -EINVAL;
if ((rdtgrp->mode == RDT_MODE_EXCLUSIVE ||
rdtgrp->mode == RDT_MODE_SHAREABLE) &&
rdtgroup_cbm_overlaps_pseudo_locked(d, cbm_val)) {
x86/resctrl: Use rdt_last_cmd_puts() where possible The last_cmd_status sequence buffer contains user-visible messages (accessed via /sys/fs/resctrl/info/last_cmd_status) that detail any errors encountered while interacting with the resctrl filesystem. rdt_last_cmd_printf() and rdt_last_cmd_puts() are the two calls available to respectively print a string with format specifiers or a simple one (which contains no format specifiers) to the last_cmd_status buffer. A few occurrences exist where rdt_last_cmd_printf() is used to print a simple string. Doing so does not result in incorrect result or incorrect behavior, but rdt_last_cmd_puts() is the function intended to be used in these cases, as it is faster and it doesn't need to do the vsnprintf() formatting. Fix these occurrences to use rdt_last_cmd_puts() instead. While doing so, fix two typos that were recently introduced into two of these simple strings. [ bp: massage commit message and correct typos. ] Fixes: 723f1a0dd8e2 ("x86/resctrl: Fixup the user-visible strings") Fixes: e0bdfe8e36f3 ("x86/intel_rdt: Support creation/removal of pseudo-locked region") Fixes: 9ab9aa15c309 ("x86/intel_rdt: Ensure requested schemata respects mode") Fixes: d48d7a57f718 ("x86/intel_rdt: Introduce resource group's mode resctrl file") Fixes: dfe9674b04ff ("x86/intel_rdt: Enable entering of pseudo-locksetup mode") Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: babu.moger@amd.com Cc: jithu.joseph@intel.com Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/f48e46a016d6a5c79f13de8faeca382052189e2e.1543346009.git.reinette.chatre@intel.com
2018-11-27 11:19:36 -08:00
rdt_last_cmd_puts("CBM overlaps with pseudo-locked region\n");
return -EINVAL;
}
/*
* The CBM may not overlap with the CBM of another closid if
* either is exclusive.
*/
if (rdtgroup_cbm_overlaps(s, d, cbm_val, rdtgrp->closid, true)) {
x86/resctrl: Use rdt_last_cmd_puts() where possible The last_cmd_status sequence buffer contains user-visible messages (accessed via /sys/fs/resctrl/info/last_cmd_status) that detail any errors encountered while interacting with the resctrl filesystem. rdt_last_cmd_printf() and rdt_last_cmd_puts() are the two calls available to respectively print a string with format specifiers or a simple one (which contains no format specifiers) to the last_cmd_status buffer. A few occurrences exist where rdt_last_cmd_printf() is used to print a simple string. Doing so does not result in incorrect result or incorrect behavior, but rdt_last_cmd_puts() is the function intended to be used in these cases, as it is faster and it doesn't need to do the vsnprintf() formatting. Fix these occurrences to use rdt_last_cmd_puts() instead. While doing so, fix two typos that were recently introduced into two of these simple strings. [ bp: massage commit message and correct typos. ] Fixes: 723f1a0dd8e2 ("x86/resctrl: Fixup the user-visible strings") Fixes: e0bdfe8e36f3 ("x86/intel_rdt: Support creation/removal of pseudo-locked region") Fixes: 9ab9aa15c309 ("x86/intel_rdt: Ensure requested schemata respects mode") Fixes: d48d7a57f718 ("x86/intel_rdt: Introduce resource group's mode resctrl file") Fixes: dfe9674b04ff ("x86/intel_rdt: Enable entering of pseudo-locksetup mode") Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: babu.moger@amd.com Cc: jithu.joseph@intel.com Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/f48e46a016d6a5c79f13de8faeca382052189e2e.1543346009.git.reinette.chatre@intel.com
2018-11-27 11:19:36 -08:00
rdt_last_cmd_puts("Overlaps with exclusive group\n");
return -EINVAL;
}
if (rdtgroup_cbm_overlaps(s, d, cbm_val, rdtgrp->closid, false)) {
if (rdtgrp->mode == RDT_MODE_EXCLUSIVE ||
rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP) {
x86/resctrl: Use rdt_last_cmd_puts() where possible The last_cmd_status sequence buffer contains user-visible messages (accessed via /sys/fs/resctrl/info/last_cmd_status) that detail any errors encountered while interacting with the resctrl filesystem. rdt_last_cmd_printf() and rdt_last_cmd_puts() are the two calls available to respectively print a string with format specifiers or a simple one (which contains no format specifiers) to the last_cmd_status buffer. A few occurrences exist where rdt_last_cmd_printf() is used to print a simple string. Doing so does not result in incorrect result or incorrect behavior, but rdt_last_cmd_puts() is the function intended to be used in these cases, as it is faster and it doesn't need to do the vsnprintf() formatting. Fix these occurrences to use rdt_last_cmd_puts() instead. While doing so, fix two typos that were recently introduced into two of these simple strings. [ bp: massage commit message and correct typos. ] Fixes: 723f1a0dd8e2 ("x86/resctrl: Fixup the user-visible strings") Fixes: e0bdfe8e36f3 ("x86/intel_rdt: Support creation/removal of pseudo-locked region") Fixes: 9ab9aa15c309 ("x86/intel_rdt: Ensure requested schemata respects mode") Fixes: d48d7a57f718 ("x86/intel_rdt: Introduce resource group's mode resctrl file") Fixes: dfe9674b04ff ("x86/intel_rdt: Enable entering of pseudo-locksetup mode") Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: babu.moger@amd.com Cc: jithu.joseph@intel.com Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/f48e46a016d6a5c79f13de8faeca382052189e2e.1543346009.git.reinette.chatre@intel.com
2018-11-27 11:19:36 -08:00
rdt_last_cmd_puts("Overlaps with other group\n");
return -EINVAL;
}
}
cfg->new_ctrl = cbm_val;
cfg->have_new_ctrl = true;
return 0;
}
/*
* For each domain in this resource we expect to find a series of:
* id=mask
* separated by ";". The "id" is in decimal, and must match one of
* the "id"s for this resource.
*/
static int parse_line(char *line, struct resctrl_schema *s,
struct rdtgroup *rdtgrp)
{
enum resctrl_conf_type t = s->conf_type;
struct resctrl_staged_config *cfg;
struct rdt_resource *r = s->res;
struct rdt_parse_data data;
char *dom = NULL, *id;
struct rdt_domain *d;
unsigned long dom_id;
x86/resctrl: Separate arch and fs resctrl locks resctrl has one mutex that is taken by the architecture-specific code, and the filesystem parts. The two interact via cpuhp, where the architecture code updates the domain list. Filesystem handlers that walk the domains list should not run concurrently with the cpuhp callback modifying the list. Exposing a lock from the filesystem code means the interface is not cleanly defined, and creates the possibility of cross-architecture lock ordering headaches. The interaction only exists so that certain filesystem paths are serialised against CPU hotplug. The CPU hotplug code already has a mechanism to do this using cpus_read_lock(). MPAM's monitors have an overflow interrupt, so it needs to be possible to walk the domains list in irq context. RCU is ideal for this, but some paths need to be able to sleep to allocate memory. Because resctrl_{on,off}line_cpu() take the rdtgroup_mutex as part of a cpuhp callback, cpus_read_lock() must always be taken first. rdtgroup_schemata_write() already does this. Most of the filesystem code's domain list walkers are currently protected by the rdtgroup_mutex taken in rdtgroup_kn_lock_live(). The exceptions are rdt_bit_usage_show() and the mon_config helpers which take the lock directly. Make the domain list protected by RCU. An architecture-specific lock prevents concurrent writers. rdt_bit_usage_show() could walk the domain list using RCU, but to keep all the filesystem operations the same, this is changed to call cpus_read_lock(). The mon_config helpers send multiple IPIs, take the cpus_read_lock() in these cases. The other filesystem list walkers need to be able to sleep. Add cpus_read_lock() to rdtgroup_kn_lock_live() so that the cpuhp callbacks can't be invoked when file system operations are occurring. Add lockdep_assert_cpus_held() in the cases where the rdtgroup_kn_lock_live() call isn't obvious. Resctrl's domain online/offline calls now need to take the rdtgroup_mutex themselves. [ bp: Fold in a build fix: https://lore.kernel.org/r/87zfvwieli.ffs@tglx ] Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Babu Moger <babu.moger@amd.com> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Babu Moger <babu.moger@amd.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64 Link: https://lore.kernel.org/r/20240213184438.16675-25-james.morse@arm.com Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2024-02-13 18:44:38 +00:00
/* Walking r->domains, ensure it can't race with cpuhp */
lockdep_assert_cpus_held();
if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP &&
(r->rid == RDT_RESOURCE_MBA || r->rid == RDT_RESOURCE_SMBA)) {
rdt_last_cmd_puts("Cannot pseudo-lock MBA resource\n");
return -EINVAL;
}
next:
if (!line || line[0] == '\0')
return 0;
dom = strsep(&line, ";");
id = strsep(&dom, "=");
if (!dom || kstrtoul(id, 10, &dom_id)) {
rdt_last_cmd_puts("Missing '=' or non-numeric domain\n");
return -EINVAL;
}
dom = strim(dom);
list_for_each_entry(d, &r->domains, list) {
if (d->id == dom_id) {
data.buf = dom;
data.rdtgrp = rdtgrp;
if (r->parse_ctrlval(&data, s, d))
return -EINVAL;
if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP) {
cfg = &d->staged_config[t];
/*
* In pseudo-locking setup mode and just
* parsed a valid CBM that should be
* pseudo-locked. Only one locked region per
* resource group and domain so just do
* the required initialization for single
* region and return.
*/
rdtgrp->plr->s = s;
rdtgrp->plr->d = d;
rdtgrp->plr->cbm = cfg->new_ctrl;
d->plr = rdtgrp->plr;
return 0;
}
goto next;
}
}
return -EINVAL;
}
static u32 get_config_index(u32 closid, enum resctrl_conf_type type)
{
switch (type) {
default:
case CDP_NONE:
return closid;
case CDP_CODE:
return closid * 2 + 1;
case CDP_DATA:
return closid * 2;
}
}
int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_domain *d,
u32 closid, enum resctrl_conf_type t, u32 cfg_val)
{
struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
u32 idx = get_config_index(closid, t);
struct msr_param msr_param;
if (!cpumask_test_cpu(smp_processor_id(), &d->cpu_mask))
return -EINVAL;
hw_dom->ctrl_val[idx] = cfg_val;
msr_param.res = r;
msr_param.dom = d;
msr_param.low = idx;
msr_param.high = idx + 1;
hw_res->msr_update(&msr_param);
return 0;
}
int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid)
{
struct resctrl_staged_config *cfg;
struct rdt_hw_domain *hw_dom;
struct msr_param msr_param;
enum resctrl_conf_type t;
struct rdt_domain *d;
u32 idx;
x86/resctrl: Separate arch and fs resctrl locks resctrl has one mutex that is taken by the architecture-specific code, and the filesystem parts. The two interact via cpuhp, where the architecture code updates the domain list. Filesystem handlers that walk the domains list should not run concurrently with the cpuhp callback modifying the list. Exposing a lock from the filesystem code means the interface is not cleanly defined, and creates the possibility of cross-architecture lock ordering headaches. The interaction only exists so that certain filesystem paths are serialised against CPU hotplug. The CPU hotplug code already has a mechanism to do this using cpus_read_lock(). MPAM's monitors have an overflow interrupt, so it needs to be possible to walk the domains list in irq context. RCU is ideal for this, but some paths need to be able to sleep to allocate memory. Because resctrl_{on,off}line_cpu() take the rdtgroup_mutex as part of a cpuhp callback, cpus_read_lock() must always be taken first. rdtgroup_schemata_write() already does this. Most of the filesystem code's domain list walkers are currently protected by the rdtgroup_mutex taken in rdtgroup_kn_lock_live(). The exceptions are rdt_bit_usage_show() and the mon_config helpers which take the lock directly. Make the domain list protected by RCU. An architecture-specific lock prevents concurrent writers. rdt_bit_usage_show() could walk the domain list using RCU, but to keep all the filesystem operations the same, this is changed to call cpus_read_lock(). The mon_config helpers send multiple IPIs, take the cpus_read_lock() in these cases. The other filesystem list walkers need to be able to sleep. Add cpus_read_lock() to rdtgroup_kn_lock_live() so that the cpuhp callbacks can't be invoked when file system operations are occurring. Add lockdep_assert_cpus_held() in the cases where the rdtgroup_kn_lock_live() call isn't obvious. Resctrl's domain online/offline calls now need to take the rdtgroup_mutex themselves. [ bp: Fold in a build fix: https://lore.kernel.org/r/87zfvwieli.ffs@tglx ] Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Babu Moger <babu.moger@amd.com> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Babu Moger <babu.moger@amd.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64 Link: https://lore.kernel.org/r/20240213184438.16675-25-james.morse@arm.com Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2024-02-13 18:44:38 +00:00
/* Walking r->domains, ensure it can't race with cpuhp */
lockdep_assert_cpus_held();
list_for_each_entry(d, &r->domains, list) {
hw_dom = resctrl_to_arch_dom(d);
msr_param.res = NULL;
for (t = 0; t < CDP_NUM_TYPES; t++) {
cfg = &hw_dom->d_resctrl.staged_config[t];
if (!cfg->have_new_ctrl)
continue;
idx = get_config_index(closid, t);
if (cfg->new_ctrl == hw_dom->ctrl_val[idx])
continue;
hw_dom->ctrl_val[idx] = cfg->new_ctrl;
if (!msr_param.res) {
msr_param.low = idx;
msr_param.high = msr_param.low + 1;
msr_param.res = r;
msr_param.dom = d;
} else {
msr_param.low = min(msr_param.low, idx);
msr_param.high = max(msr_param.high, idx + 1);
}
}
if (msr_param.res)
smp_call_function_any(&d->cpu_mask, rdt_ctrl_update, &msr_param, 1);
}
return 0;
}
static int rdtgroup_parse_resource(char *resname, char *tok,
struct rdtgroup *rdtgrp)
{
struct resctrl_schema *s;
list_for_each_entry(s, &resctrl_schema_all, list) {
if (!strcmp(resname, s->name) && rdtgrp->closid < s->num_closid)
return parse_line(tok, s, rdtgrp);
}
x86/resctrl: Fixup the user-visible strings Fix the messages in rdt_last_cmd_printf() and rdt_last_cmd_puts() to make them more meaningful and consistent. [ bp: s/cpu/CPU/; s/mem\W/memory ] Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: "Chang S. Bae" <chang.seok.bae@intel.com> Cc: David Miller <davem@davemloft.net> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Dmitry Safonov <dima@arista.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kate Stewart <kstewart@linuxfoundation.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: <linux-doc@vger.kernel.org> Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Philippe Ombredanne <pombredanne@nexb.com> Cc: Pu Wen <puwen@hygon.cn> Cc: <qianyue.zj@alibaba-inc.com> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Reinette Chatre <reinette.chatre@intel.com> Cc: Rian Hunter <rian@alum.mit.edu> Cc: Sherry Hurwitz <sherry.hurwitz@amd.com> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Lendacky <Thomas.Lendacky@amd.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: <xiaochen.shen@intel.com> Link: https://lkml.kernel.org/r/20181121202811.4492-11-babu.moger@amd.com
2018-11-21 20:28:43 +00:00
rdt_last_cmd_printf("Unknown or unsupported resource name '%s'\n", resname);
return -EINVAL;
}
ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
char *buf, size_t nbytes, loff_t off)
{
struct resctrl_schema *s;
struct rdtgroup *rdtgrp;
struct rdt_resource *r;
char *tok, *resname;
int ret = 0;
/* Valid input requires a trailing newline */
if (nbytes == 0 || buf[nbytes - 1] != '\n')
return -EINVAL;
buf[nbytes - 1] = '\0';
rdtgrp = rdtgroup_kn_lock_live(of->kn);
if (!rdtgrp) {
rdtgroup_kn_unlock(of->kn);
return -ENOENT;
}
rdt_last_cmd_clear();
/*
* No changes to pseudo-locked region allowed. It has to be removed
* and re-created instead.
*/
if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKED) {
ret = -EINVAL;
x86/resctrl: Fixup the user-visible strings Fix the messages in rdt_last_cmd_printf() and rdt_last_cmd_puts() to make them more meaningful and consistent. [ bp: s/cpu/CPU/; s/mem\W/memory ] Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: "Chang S. Bae" <chang.seok.bae@intel.com> Cc: David Miller <davem@davemloft.net> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Dmitry Safonov <dima@arista.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kate Stewart <kstewart@linuxfoundation.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: <linux-doc@vger.kernel.org> Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Philippe Ombredanne <pombredanne@nexb.com> Cc: Pu Wen <puwen@hygon.cn> Cc: <qianyue.zj@alibaba-inc.com> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Reinette Chatre <reinette.chatre@intel.com> Cc: Rian Hunter <rian@alum.mit.edu> Cc: Sherry Hurwitz <sherry.hurwitz@amd.com> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Lendacky <Thomas.Lendacky@amd.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: <xiaochen.shen@intel.com> Link: https://lkml.kernel.org/r/20181121202811.4492-11-babu.moger@amd.com
2018-11-21 20:28:43 +00:00
rdt_last_cmd_puts("Resource group is pseudo-locked\n");
goto out;
}
x86/resctrl: Clear staged_config[] before and after it is used As a temporary storage, staged_config[] in rdt_domain should be cleared before and after it is used. The stale value in staged_config[] could cause an MSR access error. Here is a reproducer on a system with 16 usable CLOSIDs for a 15-way L3 Cache (MBA should be disabled if the number of CLOSIDs for MB is less than 16.) : mount -t resctrl resctrl -o cdp /sys/fs/resctrl mkdir /sys/fs/resctrl/p{1..7} umount /sys/fs/resctrl/ mount -t resctrl resctrl /sys/fs/resctrl mkdir /sys/fs/resctrl/p{1..8} An error occurs when creating resource group named p8: unchecked MSR access error: WRMSR to 0xca0 (tried to write 0x00000000000007ff) at rIP: 0xffffffff82249142 (cat_wrmsr+0x32/0x60) Call Trace: <IRQ> __flush_smp_call_function_queue+0x11d/0x170 __sysvec_call_function+0x24/0xd0 sysvec_call_function+0x89/0xc0 </IRQ> <TASK> asm_sysvec_call_function+0x16/0x20 When creating a new resource control group, hardware will be configured by the following process: rdtgroup_mkdir() rdtgroup_mkdir_ctrl_mon() rdtgroup_init_alloc() resctrl_arch_update_domains() resctrl_arch_update_domains() iterates and updates all resctrl_conf_type whose have_new_ctrl is true. Since staged_config[] holds the same values as when CDP was enabled, it will continue to update the CDP_CODE and CDP_DATA configurations. When group p8 is created, get_config_index() called in resctrl_arch_update_domains() will return 16 and 17 as the CLOSIDs for CDP_CODE and CDP_DATA, which will be translated to an invalid register - 0xca0 in this scenario. Fix it by clearing staged_config[] before and after it is used. [reinette: re-order commit tags] Fixes: 75408e43509e ("x86/resctrl: Allow different CODE/DATA configurations to be staged") Suggested-by: Xin Hao <xhao@linux.alibaba.com> Signed-off-by: Shawn Wang <shawnwang@linux.alibaba.com> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Tested-by: Reinette Chatre <reinette.chatre@intel.com> Cc:stable@vger.kernel.org Link: https://lore.kernel.org/all/2fad13f49fbe89687fc40e9a5a61f23a28d1507a.1673988935.git.reinette.chatre%40intel.com
2023-01-17 13:14:50 -08:00
rdt_staged_configs_clear();
while ((tok = strsep(&buf, "\n")) != NULL) {
resname = strim(strsep(&tok, ":"));
if (!tok) {
rdt_last_cmd_puts("Missing ':'\n");
ret = -EINVAL;
goto out;
}
x86/intel_rdt: Fix a silent failure when writing zero value schemata Writing an invalid schemata with no domain values (e.g., "(L3|MB):"), results in a silent failure, i.e. the last_cmd_status returns OK, Check for an empty value and set the result string with a proper error message and return -EINVAL. Before the fix: # mkdir /sys/fs/resctrl/p1 # echo "L3:" > /sys/fs/resctrl/p1/schemata (silent failure) # cat /sys/fs/resctrl/info/last_cmd_status ok # echo "MB:" > /sys/fs/resctrl/p1/schemata (silent failure) # cat /sys/fs/resctrl/info/last_cmd_status ok After the fix: # mkdir /sys/fs/resctrl/p1 # echo "L3:" > /sys/fs/resctrl/p1/schemata -bash: echo: write error: Invalid argument # cat /sys/fs/resctrl/info/last_cmd_status Missing 'L3' value # echo "MB:" > /sys/fs/resctrl/p1/schemata -bash: echo: write error: Invalid argument # cat /sys/fs/resctrl/info/last_cmd_status Missing 'MB' value [ Tony: This is an unintended side effect of the patch earlier to allow the user to just write the value they want to change. While allowing user to specify less than all of the values, it also allows an empty value. ] Fixes: c4026b7b95a4 ("x86/intel_rdt: Implement "update" mode when writing schemata file") Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Link: https://lkml.kernel.org/r/20171110191624.20280-1-tony.luck@intel.com
2017-11-10 11:16:24 -08:00
if (tok[0] == '\0') {
rdt_last_cmd_printf("Missing '%s' value\n", resname);
ret = -EINVAL;
goto out;
}
ret = rdtgroup_parse_resource(resname, tok, rdtgrp);
if (ret)
goto out;
}
list_for_each_entry(s, &resctrl_schema_all, list) {
r = s->res;
x86/resctrl: Switch over to the resctrl mbps_val list Updates to resctrl's software controller follow the same path as other configuration updates, but they don't modify the hardware state. rdtgroup_schemata_write() uses parse_line() and the resource's parse_ctrlval() function to stage the configuration. resctrl_arch_update_domains() then updates the mbps_val[] array instead, and resctrl_arch_update_domains() skips the rdt_ctrl_update() call that would update hardware. This complicates the interface between resctrl's filesystem parts and architecture specific code. It should be possible for mba_sc to be completely implemented by the filesystem parts of resctrl. This would allow it to work on a second architecture with no additional code. resctrl_arch_update_domains() using the mbps_val[] array prevents this. Change parse_bw() to write the configuration value directly to the mbps_val[] array in the domain structure. Change rdtgroup_schemata_write() to skip the call to resctrl_arch_update_domains(), meaning all the mba_sc specific code in resctrl_arch_update_domains() can be removed. On the read-side, show_doms() and update_mba_bw() are changed to read the mbps_val[] array from the domain structure. With this, resctrl_arch_get_config() no longer needs to consider mba_sc resources. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jamie Iles <quic_jiles@quicinc.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Xin Hao <xhao@linux.alibaba.com> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by: Cristian Marussi <cristian.marussi@arm.com> Link: https://lore.kernel.org/r/20220902154829.30399-10-james.morse@arm.com
2022-09-02 15:48:17 +00:00
/*
* Writes to mba_sc resources update the software controller,
* not the control MSR.
*/
if (is_mba_sc(r))
continue;
ret = resctrl_arch_update_domains(r, rdtgrp->closid);
if (ret)
goto out;
}
if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP) {
/*
* If pseudo-locking fails we keep the resource group in
* mode RDT_MODE_PSEUDO_LOCKSETUP with its class of service
* active and updated for just the domain the pseudo-locked
* region was requested for.
*/
ret = rdtgroup_pseudo_lock_create(rdtgrp);
}
out:
x86/resctrl: Clear staged_config[] before and after it is used As a temporary storage, staged_config[] in rdt_domain should be cleared before and after it is used. The stale value in staged_config[] could cause an MSR access error. Here is a reproducer on a system with 16 usable CLOSIDs for a 15-way L3 Cache (MBA should be disabled if the number of CLOSIDs for MB is less than 16.) : mount -t resctrl resctrl -o cdp /sys/fs/resctrl mkdir /sys/fs/resctrl/p{1..7} umount /sys/fs/resctrl/ mount -t resctrl resctrl /sys/fs/resctrl mkdir /sys/fs/resctrl/p{1..8} An error occurs when creating resource group named p8: unchecked MSR access error: WRMSR to 0xca0 (tried to write 0x00000000000007ff) at rIP: 0xffffffff82249142 (cat_wrmsr+0x32/0x60) Call Trace: <IRQ> __flush_smp_call_function_queue+0x11d/0x170 __sysvec_call_function+0x24/0xd0 sysvec_call_function+0x89/0xc0 </IRQ> <TASK> asm_sysvec_call_function+0x16/0x20 When creating a new resource control group, hardware will be configured by the following process: rdtgroup_mkdir() rdtgroup_mkdir_ctrl_mon() rdtgroup_init_alloc() resctrl_arch_update_domains() resctrl_arch_update_domains() iterates and updates all resctrl_conf_type whose have_new_ctrl is true. Since staged_config[] holds the same values as when CDP was enabled, it will continue to update the CDP_CODE and CDP_DATA configurations. When group p8 is created, get_config_index() called in resctrl_arch_update_domains() will return 16 and 17 as the CLOSIDs for CDP_CODE and CDP_DATA, which will be translated to an invalid register - 0xca0 in this scenario. Fix it by clearing staged_config[] before and after it is used. [reinette: re-order commit tags] Fixes: 75408e43509e ("x86/resctrl: Allow different CODE/DATA configurations to be staged") Suggested-by: Xin Hao <xhao@linux.alibaba.com> Signed-off-by: Shawn Wang <shawnwang@linux.alibaba.com> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Tested-by: Reinette Chatre <reinette.chatre@intel.com> Cc:stable@vger.kernel.org Link: https://lore.kernel.org/all/2fad13f49fbe89687fc40e9a5a61f23a28d1507a.1673988935.git.reinette.chatre%40intel.com
2023-01-17 13:14:50 -08:00
rdt_staged_configs_clear();
rdtgroup_kn_unlock(of->kn);
return ret ?: nbytes;
}
u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d,
u32 closid, enum resctrl_conf_type type)
2021-07-28 17:06:29 +00:00
{
struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
u32 idx = get_config_index(closid, type);
2021-07-28 17:06:29 +00:00
x86/resctrl: Switch over to the resctrl mbps_val list Updates to resctrl's software controller follow the same path as other configuration updates, but they don't modify the hardware state. rdtgroup_schemata_write() uses parse_line() and the resource's parse_ctrlval() function to stage the configuration. resctrl_arch_update_domains() then updates the mbps_val[] array instead, and resctrl_arch_update_domains() skips the rdt_ctrl_update() call that would update hardware. This complicates the interface between resctrl's filesystem parts and architecture specific code. It should be possible for mba_sc to be completely implemented by the filesystem parts of resctrl. This would allow it to work on a second architecture with no additional code. resctrl_arch_update_domains() using the mbps_val[] array prevents this. Change parse_bw() to write the configuration value directly to the mbps_val[] array in the domain structure. Change rdtgroup_schemata_write() to skip the call to resctrl_arch_update_domains(), meaning all the mba_sc specific code in resctrl_arch_update_domains() can be removed. On the read-side, show_doms() and update_mba_bw() are changed to read the mbps_val[] array from the domain structure. With this, resctrl_arch_get_config() no longer needs to consider mba_sc resources. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jamie Iles <quic_jiles@quicinc.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Xin Hao <xhao@linux.alibaba.com> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by: Cristian Marussi <cristian.marussi@arm.com> Link: https://lore.kernel.org/r/20220902154829.30399-10-james.morse@arm.com
2022-09-02 15:48:17 +00:00
return hw_dom->ctrl_val[idx];
2021-07-28 17:06:29 +00:00
}
static void show_doms(struct seq_file *s, struct resctrl_schema *schema, int closid)
{
struct rdt_resource *r = schema->res;
struct rdt_domain *dom;
bool sep = false;
u32 ctrl_val;
x86/resctrl: Separate arch and fs resctrl locks resctrl has one mutex that is taken by the architecture-specific code, and the filesystem parts. The two interact via cpuhp, where the architecture code updates the domain list. Filesystem handlers that walk the domains list should not run concurrently with the cpuhp callback modifying the list. Exposing a lock from the filesystem code means the interface is not cleanly defined, and creates the possibility of cross-architecture lock ordering headaches. The interaction only exists so that certain filesystem paths are serialised against CPU hotplug. The CPU hotplug code already has a mechanism to do this using cpus_read_lock(). MPAM's monitors have an overflow interrupt, so it needs to be possible to walk the domains list in irq context. RCU is ideal for this, but some paths need to be able to sleep to allocate memory. Because resctrl_{on,off}line_cpu() take the rdtgroup_mutex as part of a cpuhp callback, cpus_read_lock() must always be taken first. rdtgroup_schemata_write() already does this. Most of the filesystem code's domain list walkers are currently protected by the rdtgroup_mutex taken in rdtgroup_kn_lock_live(). The exceptions are rdt_bit_usage_show() and the mon_config helpers which take the lock directly. Make the domain list protected by RCU. An architecture-specific lock prevents concurrent writers. rdt_bit_usage_show() could walk the domain list using RCU, but to keep all the filesystem operations the same, this is changed to call cpus_read_lock(). The mon_config helpers send multiple IPIs, take the cpus_read_lock() in these cases. The other filesystem list walkers need to be able to sleep. Add cpus_read_lock() to rdtgroup_kn_lock_live() so that the cpuhp callbacks can't be invoked when file system operations are occurring. Add lockdep_assert_cpus_held() in the cases where the rdtgroup_kn_lock_live() call isn't obvious. Resctrl's domain online/offline calls now need to take the rdtgroup_mutex themselves. [ bp: Fold in a build fix: https://lore.kernel.org/r/87zfvwieli.ffs@tglx ] Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Babu Moger <babu.moger@amd.com> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Babu Moger <babu.moger@amd.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64 Link: https://lore.kernel.org/r/20240213184438.16675-25-james.morse@arm.com Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2024-02-13 18:44:38 +00:00
/* Walking r->domains, ensure it can't race with cpuhp */
lockdep_assert_cpus_held();
seq_printf(s, "%*s:", max_name_width, schema->name);
list_for_each_entry(dom, &r->domains, list) {
if (sep)
seq_puts(s, ";");
x86/resctrl: Switch over to the resctrl mbps_val list Updates to resctrl's software controller follow the same path as other configuration updates, but they don't modify the hardware state. rdtgroup_schemata_write() uses parse_line() and the resource's parse_ctrlval() function to stage the configuration. resctrl_arch_update_domains() then updates the mbps_val[] array instead, and resctrl_arch_update_domains() skips the rdt_ctrl_update() call that would update hardware. This complicates the interface between resctrl's filesystem parts and architecture specific code. It should be possible for mba_sc to be completely implemented by the filesystem parts of resctrl. This would allow it to work on a second architecture with no additional code. resctrl_arch_update_domains() using the mbps_val[] array prevents this. Change parse_bw() to write the configuration value directly to the mbps_val[] array in the domain structure. Change rdtgroup_schemata_write() to skip the call to resctrl_arch_update_domains(), meaning all the mba_sc specific code in resctrl_arch_update_domains() can be removed. On the read-side, show_doms() and update_mba_bw() are changed to read the mbps_val[] array from the domain structure. With this, resctrl_arch_get_config() no longer needs to consider mba_sc resources. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jamie Iles <quic_jiles@quicinc.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Xin Hao <xhao@linux.alibaba.com> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by: Cristian Marussi <cristian.marussi@arm.com> Link: https://lore.kernel.org/r/20220902154829.30399-10-james.morse@arm.com
2022-09-02 15:48:17 +00:00
if (is_mba_sc(r))
ctrl_val = dom->mbps_val[closid];
else
ctrl_val = resctrl_arch_get_config(r, dom, closid,
schema->conf_type);
seq_printf(s, r->format_str, dom->id, max_data_width,
ctrl_val);
sep = true;
}
seq_puts(s, "\n");
}
int rdtgroup_schemata_show(struct kernfs_open_file *of,
struct seq_file *s, void *v)
{
struct resctrl_schema *schema;
struct rdtgroup *rdtgrp;
int ret = 0;
u32 closid;
rdtgrp = rdtgroup_kn_lock_live(of->kn);
if (rdtgrp) {
if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP) {
list_for_each_entry(schema, &resctrl_schema_all, list) {
seq_printf(s, "%s:uninitialized\n", schema->name);
}
} else if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKED) {
if (!rdtgrp->plr->d) {
rdt_last_cmd_clear();
rdt_last_cmd_puts("Cache domain offline\n");
ret = -ENODEV;
} else {
seq_printf(s, "%s:%d=%x\n",
rdtgrp->plr->s->res->name,
rdtgrp->plr->d->id,
rdtgrp->plr->cbm);
}
} else {
closid = rdtgrp->closid;
list_for_each_entry(schema, &resctrl_schema_all, list) {
if (closid < schema->num_closid)
show_doms(s, schema, closid);
}
}
} else {
ret = -ENOENT;
}
rdtgroup_kn_unlock(of->kn);
return ret;
}
x86/resctrl: Queue mon_event_read() instead of sending an IPI Intel is blessed with an abundance of monitors, one per RMID, that can be read from any CPU in the domain. MPAMs monitors reside in the MMIO MSC, the number implemented is up to the manufacturer. This means when there are fewer monitors than needed, they need to be allocated and freed. MPAM's CSU monitors are used to back the 'llc_occupancy' monitor file. The CSU counter is allowed to return 'not ready' for a small number of micro-seconds after programming. To allow one CSU hardware monitor to be used for multiple control or monitor groups, the CPU accessing the monitor needs to be able to block when configuring and reading the counter. Worse, the domain may be broken up into slices, and the MMIO accesses for each slice may need performing from different CPUs. These two details mean MPAMs monitor code needs to be able to sleep, and IPI another CPU in the domain to read from a resource that has been sliced. mon_event_read() already invokes mon_event_count() via IPI, which means this isn't possible. On systems using nohz-full, some CPUs need to be interrupted to run kernel work as they otherwise stay in user-space running realtime workloads. Interrupting these CPUs should be avoided, and scheduling work on them may never complete. Change mon_event_read() to pick a housekeeping CPU, (one that is not using nohz_full) and schedule mon_event_count() and wait. If all the CPUs in a domain are using nohz-full, then an IPI is used as the fallback. This function is only used in response to a user-space filesystem request (not the timing sensitive overflow code). This allows MPAM to hide the slice behaviour from resctrl, and to keep the monitor-allocation in monitor.c. When the IPI fallback is used on machines where MPAM needs to make an access on multiple CPUs, the counter read will always fail. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by: Peter Newman <peternewman@google.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Babu Moger <babu.moger@amd.com> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Babu Moger <babu.moger@amd.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64 Link: https://lore.kernel.org/r/20240213184438.16675-14-james.morse@arm.com Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2024-02-13 18:44:27 +00:00
static int smp_mon_event_count(void *arg)
{
mon_event_count(arg);
return 0;
}
void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
struct rdt_domain *d, struct rdtgroup *rdtgrp,
int evtid, int first)
{
x86/resctrl: Queue mon_event_read() instead of sending an IPI Intel is blessed with an abundance of monitors, one per RMID, that can be read from any CPU in the domain. MPAMs monitors reside in the MMIO MSC, the number implemented is up to the manufacturer. This means when there are fewer monitors than needed, they need to be allocated and freed. MPAM's CSU monitors are used to back the 'llc_occupancy' monitor file. The CSU counter is allowed to return 'not ready' for a small number of micro-seconds after programming. To allow one CSU hardware monitor to be used for multiple control or monitor groups, the CPU accessing the monitor needs to be able to block when configuring and reading the counter. Worse, the domain may be broken up into slices, and the MMIO accesses for each slice may need performing from different CPUs. These two details mean MPAMs monitor code needs to be able to sleep, and IPI another CPU in the domain to read from a resource that has been sliced. mon_event_read() already invokes mon_event_count() via IPI, which means this isn't possible. On systems using nohz-full, some CPUs need to be interrupted to run kernel work as they otherwise stay in user-space running realtime workloads. Interrupting these CPUs should be avoided, and scheduling work on them may never complete. Change mon_event_read() to pick a housekeeping CPU, (one that is not using nohz_full) and schedule mon_event_count() and wait. If all the CPUs in a domain are using nohz-full, then an IPI is used as the fallback. This function is only used in response to a user-space filesystem request (not the timing sensitive overflow code). This allows MPAM to hide the slice behaviour from resctrl, and to keep the monitor-allocation in monitor.c. When the IPI fallback is used on machines where MPAM needs to make an access on multiple CPUs, the counter read will always fail. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by: Peter Newman <peternewman@google.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Babu Moger <babu.moger@amd.com> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Babu Moger <babu.moger@amd.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64 Link: https://lore.kernel.org/r/20240213184438.16675-14-james.morse@arm.com Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2024-02-13 18:44:27 +00:00
int cpu;
x86/resctrl: Separate arch and fs resctrl locks resctrl has one mutex that is taken by the architecture-specific code, and the filesystem parts. The two interact via cpuhp, where the architecture code updates the domain list. Filesystem handlers that walk the domains list should not run concurrently with the cpuhp callback modifying the list. Exposing a lock from the filesystem code means the interface is not cleanly defined, and creates the possibility of cross-architecture lock ordering headaches. The interaction only exists so that certain filesystem paths are serialised against CPU hotplug. The CPU hotplug code already has a mechanism to do this using cpus_read_lock(). MPAM's monitors have an overflow interrupt, so it needs to be possible to walk the domains list in irq context. RCU is ideal for this, but some paths need to be able to sleep to allocate memory. Because resctrl_{on,off}line_cpu() take the rdtgroup_mutex as part of a cpuhp callback, cpus_read_lock() must always be taken first. rdtgroup_schemata_write() already does this. Most of the filesystem code's domain list walkers are currently protected by the rdtgroup_mutex taken in rdtgroup_kn_lock_live(). The exceptions are rdt_bit_usage_show() and the mon_config helpers which take the lock directly. Make the domain list protected by RCU. An architecture-specific lock prevents concurrent writers. rdt_bit_usage_show() could walk the domain list using RCU, but to keep all the filesystem operations the same, this is changed to call cpus_read_lock(). The mon_config helpers send multiple IPIs, take the cpus_read_lock() in these cases. The other filesystem list walkers need to be able to sleep. Add cpus_read_lock() to rdtgroup_kn_lock_live() so that the cpuhp callbacks can't be invoked when file system operations are occurring. Add lockdep_assert_cpus_held() in the cases where the rdtgroup_kn_lock_live() call isn't obvious. Resctrl's domain online/offline calls now need to take the rdtgroup_mutex themselves. [ bp: Fold in a build fix: https://lore.kernel.org/r/87zfvwieli.ffs@tglx ] Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Babu Moger <babu.moger@amd.com> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Babu Moger <babu.moger@amd.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64 Link: https://lore.kernel.org/r/20240213184438.16675-25-james.morse@arm.com Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2024-02-13 18:44:38 +00:00
/* When picking a CPU from cpu_mask, ensure it can't race with cpuhp */
lockdep_assert_cpus_held();
/*
x86/resctrl: Queue mon_event_read() instead of sending an IPI Intel is blessed with an abundance of monitors, one per RMID, that can be read from any CPU in the domain. MPAMs monitors reside in the MMIO MSC, the number implemented is up to the manufacturer. This means when there are fewer monitors than needed, they need to be allocated and freed. MPAM's CSU monitors are used to back the 'llc_occupancy' monitor file. The CSU counter is allowed to return 'not ready' for a small number of micro-seconds after programming. To allow one CSU hardware monitor to be used for multiple control or monitor groups, the CPU accessing the monitor needs to be able to block when configuring and reading the counter. Worse, the domain may be broken up into slices, and the MMIO accesses for each slice may need performing from different CPUs. These two details mean MPAMs monitor code needs to be able to sleep, and IPI another CPU in the domain to read from a resource that has been sliced. mon_event_read() already invokes mon_event_count() via IPI, which means this isn't possible. On systems using nohz-full, some CPUs need to be interrupted to run kernel work as they otherwise stay in user-space running realtime workloads. Interrupting these CPUs should be avoided, and scheduling work on them may never complete. Change mon_event_read() to pick a housekeeping CPU, (one that is not using nohz_full) and schedule mon_event_count() and wait. If all the CPUs in a domain are using nohz-full, then an IPI is used as the fallback. This function is only used in response to a user-space filesystem request (not the timing sensitive overflow code). This allows MPAM to hide the slice behaviour from resctrl, and to keep the monitor-allocation in monitor.c. When the IPI fallback is used on machines where MPAM needs to make an access on multiple CPUs, the counter read will always fail. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by: Peter Newman <peternewman@google.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Babu Moger <babu.moger@amd.com> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Babu Moger <babu.moger@amd.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64 Link: https://lore.kernel.org/r/20240213184438.16675-14-james.morse@arm.com Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2024-02-13 18:44:27 +00:00
* Setup the parameters to pass to mon_event_count() to read the data.
*/
rr->rgrp = rdtgrp;
rr->evtid = evtid;
rr->r = r;
rr->d = d;
rr->val = 0;
rr->first = first;
rr->arch_mon_ctx = resctrl_arch_mon_ctx_alloc(r, evtid);
if (IS_ERR(rr->arch_mon_ctx)) {
rr->err = -EINVAL;
return;
}
cpu = cpumask_any_housekeeping(&d->cpu_mask, RESCTRL_PICK_ANY_CPU);
x86/resctrl: Queue mon_event_read() instead of sending an IPI Intel is blessed with an abundance of monitors, one per RMID, that can be read from any CPU in the domain. MPAMs monitors reside in the MMIO MSC, the number implemented is up to the manufacturer. This means when there are fewer monitors than needed, they need to be allocated and freed. MPAM's CSU monitors are used to back the 'llc_occupancy' monitor file. The CSU counter is allowed to return 'not ready' for a small number of micro-seconds after programming. To allow one CSU hardware monitor to be used for multiple control or monitor groups, the CPU accessing the monitor needs to be able to block when configuring and reading the counter. Worse, the domain may be broken up into slices, and the MMIO accesses for each slice may need performing from different CPUs. These two details mean MPAMs monitor code needs to be able to sleep, and IPI another CPU in the domain to read from a resource that has been sliced. mon_event_read() already invokes mon_event_count() via IPI, which means this isn't possible. On systems using nohz-full, some CPUs need to be interrupted to run kernel work as they otherwise stay in user-space running realtime workloads. Interrupting these CPUs should be avoided, and scheduling work on them may never complete. Change mon_event_read() to pick a housekeeping CPU, (one that is not using nohz_full) and schedule mon_event_count() and wait. If all the CPUs in a domain are using nohz-full, then an IPI is used as the fallback. This function is only used in response to a user-space filesystem request (not the timing sensitive overflow code). This allows MPAM to hide the slice behaviour from resctrl, and to keep the monitor-allocation in monitor.c. When the IPI fallback is used on machines where MPAM needs to make an access on multiple CPUs, the counter read will always fail. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by: Peter Newman <peternewman@google.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Babu Moger <babu.moger@amd.com> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Babu Moger <babu.moger@amd.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64 Link: https://lore.kernel.org/r/20240213184438.16675-14-james.morse@arm.com Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2024-02-13 18:44:27 +00:00
/*
* cpumask_any_housekeeping() prefers housekeeping CPUs, but
* are all the CPUs nohz_full? If yes, pick a CPU to IPI.
* MPAM's resctrl_arch_rmid_read() is unable to read the
* counters on some platforms if its called in IRQ context.
*/
if (tick_nohz_full_cpu(cpu))
smp_call_function_any(&d->cpu_mask, mon_event_count, rr, 1);
else
smp_call_on_cpu(cpu, smp_mon_event_count, rr, false);
resctrl_arch_mon_ctx_free(r, evtid, rr->arch_mon_ctx);
}
int rdtgroup_mondata_show(struct seq_file *m, void *arg)
{
struct kernfs_open_file *of = m->private;
u32 resid, evtid, domid;
struct rdtgroup *rdtgrp;
struct rdt_resource *r;
union mon_data_bits md;
struct rdt_domain *d;
struct rmid_read rr;
int ret = 0;
rdtgrp = rdtgroup_kn_lock_live(of->kn);
if (!rdtgrp) {
ret = -ENOENT;
goto out;
}
md.priv = of->kn->priv;
resid = md.u.rid;
domid = md.u.domid;
evtid = md.u.evtid;
r = &rdt_resources_all[resid].r_resctrl;
d = rdt_find_domain(r, domid, NULL);
if (IS_ERR_OR_NULL(d)) {
ret = -ENOENT;
goto out;
}
mon_event_read(&rr, r, d, rdtgrp, evtid, false);
if (rr.err == -EIO)
seq_puts(m, "Error\n");
else if (rr.err == -EINVAL)
seq_puts(m, "Unavailable\n");
else
seq_printf(m, "%llu\n", rr.val);
out:
rdtgroup_kn_unlock(of->kn);
return ret;
}