linux/arch/powerpc/include/asm/rtas.h

566 lines
23 KiB
C
Raw Normal View History

/* SPDX-License-Identifier: GPL-2.0-or-later */
#ifndef _POWERPC_RTAS_H
#define _POWERPC_RTAS_H
#ifdef __KERNEL__
powerpc/rtas: Facilitate high-level call sequences On RTAS platforms there is a general restriction that the OS must not enter RTAS on more than one CPU at a time. This low-level serialization requirement is satisfied by holding a spin lock (rtas_lock) across most RTAS function invocations. However, some pseries RTAS functions require multiple successive calls to complete a logical operation. Beginning a new call sequence for such a function may disrupt any other sequences of that function already in progress. Safe and reliable use of these functions effectively requires higher-level serialization beyond what is already done at the level of RTAS entry and exit. Where a sequence-based RTAS function is invoked only through sys_rtas(), with no in-kernel users, there is no issue as far as the kernel is concerned. User space is responsible for appropriately serializing its call sequences. (Whether user space code actually takes measures to prevent sequence interleaving is another matter.) Examples of such functions currently include ibm,platform-dump and ibm,get-vpd. But where a sequence-based RTAS function has both user space and in-kernel uesrs, there is a hazard. Even if the in-kernel call sites of such a function serialize their sequences correctly, a user of sys_rtas() can invoke the same function at any time, potentially disrupting a sequence in progress. So in order to prevent disruption of kernel-based RTAS call sequences, they must serialize not only with themselves but also with sys_rtas() users, somehow. Preferably without adding more function-specific hacks to sys_rtas(). This is a prerequisite for adding an in-kernel call sequence of ibm,get-vpd, which is in a change to follow. Note that it has never been feasible for the kernel to prevent sys_rtas()-based sequences from being disrupted because control returns to user space on every call. sys_rtas()-based users of these functions have always been, and continue to be, responsible for coordinating their call sequences with other users, even those which may invoke the RTAS functions through less direct means than sys_rtas(). This is an unavoidable consequence of exposing sequence-based RTAS functions through sys_rtas(). * Add an optional mutex member to struct rtas_function. * Statically define a mutex for each RTAS function with known call sequence serialization requirements, and assign its address to the .lock member of the corresponding function table entry, along with justifying commentary. * In sys_rtas(), if the table entry for the RTAS function being called has a populated lock member, acquire it before taking rtas_lock and entering RTAS. * Kernel-based RTAS call sequences are expected to access the appropriate mutex explicitly by name. For example, a user of the ibm,activate-firmware RTAS function would do: int token = rtas_function_token(RTAS_FN_IBM_ACTIVATE_FIRMWARE); int fwrc; mutex_lock(&rtas_ibm_activate_firmware_lock); do { fwrc = rtas_call(token, 0, 1, NULL); } while (rtas_busy_delay(fwrc)); mutex_unlock(&rtas_ibm_activate_firmware_lock); There should be no perceivable change introduced here except that concurrent callers of the same RTAS function via sys_rtas() may block on a mutex instead of spinning on rtas_lock. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20231212-papr-sys_rtas-vs-lockdown-v6-6-e9eafd0c8c6c@linux.ibm.com
2023-12-12 11:01:53 -06:00
#include <linux/mutex.h>
#include <linux/spinlock.h>
#include <asm/page.h>
#include <asm/rtas-types.h>
#include <linux/time.h>
#include <linux/cpumask.h>
/*
* Definitions for talking to the RTAS on CHRP machines.
*
* Copyright (C) 2001 Peter Bergner
* Copyright (C) 2001 PPC 64 Team, IBM Corp
*/
powerpc/rtas: improve function information lookups The core RTAS support code and its clients perform two types of lookup for RTAS firmware function information. First, mapping a known function name to a token. The typical use case invokes rtas_token() to retrieve the token value to pass to rtas_call(). rtas_token() relies on of_get_property(), which performs a linear search of the /rtas node's property list under a lock with IRQs disabled. Second, and less common: given a token value, looking up some information about the function. The primary example is the sys_rtas filter path, which linearly scans a small table to match the token to a rtas_filter struct. Another use case to come is RTAS entry/exit tracepoints, which will require efficient lookup of function names from token values. Currently there is no general API for this. We need something much like the existing rtas_filters table, but more general and organized to facilitate efficient lookups. Introduce: * A new rtas_function type, aggregating function name, token, and filter. Other function characteristics could be added in the future. * An array of rtas_function, where each element corresponds to a known RTAS function. All information in the table is static save the token values, which are derived from the device tree at boot. The array is sorted by function name to allow binary search. * A named constant for each known RTAS function, used to index the function array. These also will be used in a client-facing API to be added later. * An xarray that maps valid tokens to rtas_function objects. Fold the existing rtas_filter table into the new rtas_function array, with the appropriate adjustments to block_rtas_call(). Remove now-redundant fields from struct rtas_filter. Preserve the function of the CONFIG_CPU_BIG_ENDIAN guard in the current filter table by introducing a per-function flag that is set for the function entries related to pseries LPAR migration. These have never had working users via sys_rtas on ppc64le; see commit de0f7349a0dd ("powerpc/rtas: prevent suspend-related sys_rtas use on LE"). Convert rtas_token() to use a lockless binary search on the function table. Fall back to the old behavior for lookups against names that are not known to be RTAS functions, but issue a warning. rtas_token() is for function names; it is not a general facility for accessing arbitrary properties of the /rtas node. All known misuses of rtas_token() have been converted to more appropriate of_ APIs in preceding changes. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230125-b4-powerpc-rtas-queue-v3-8-26929c8cce78@linux.ibm.com
2023-02-10 12:41:56 -06:00
enum rtas_function_index {
RTAS_FNIDX__CHECK_EXCEPTION,
RTAS_FNIDX__DISPLAY_CHARACTER,
RTAS_FNIDX__EVENT_SCAN,
RTAS_FNIDX__FREEZE_TIME_BASE,
RTAS_FNIDX__GET_POWER_LEVEL,
RTAS_FNIDX__GET_SENSOR_STATE,
RTAS_FNIDX__GET_TERM_CHAR,
RTAS_FNIDX__GET_TIME_OF_DAY,
RTAS_FNIDX__IBM_ACTIVATE_FIRMWARE,
RTAS_FNIDX__IBM_CBE_START_PTCAL,
RTAS_FNIDX__IBM_CBE_STOP_PTCAL,
RTAS_FNIDX__IBM_CHANGE_MSI,
RTAS_FNIDX__IBM_CLOSE_ERRINJCT,
RTAS_FNIDX__IBM_CONFIGURE_BRIDGE,
RTAS_FNIDX__IBM_CONFIGURE_CONNECTOR,
RTAS_FNIDX__IBM_CONFIGURE_KERNEL_DUMP,
RTAS_FNIDX__IBM_CONFIGURE_PE,
RTAS_FNIDX__IBM_CREATE_PE_DMA_WINDOW,
RTAS_FNIDX__IBM_DISPLAY_MESSAGE,
RTAS_FNIDX__IBM_ERRINJCT,
RTAS_FNIDX__IBM_EXTI2C,
RTAS_FNIDX__IBM_GET_CONFIG_ADDR_INFO,
RTAS_FNIDX__IBM_GET_CONFIG_ADDR_INFO2,
RTAS_FNIDX__IBM_GET_DYNAMIC_SENSOR_STATE,
RTAS_FNIDX__IBM_GET_INDICES,
RTAS_FNIDX__IBM_GET_RIO_TOPOLOGY,
RTAS_FNIDX__IBM_GET_SYSTEM_PARAMETER,
RTAS_FNIDX__IBM_GET_VPD,
RTAS_FNIDX__IBM_GET_XIVE,
RTAS_FNIDX__IBM_INT_OFF,
RTAS_FNIDX__IBM_INT_ON,
RTAS_FNIDX__IBM_IO_QUIESCE_ACK,
RTAS_FNIDX__IBM_LPAR_PERFTOOLS,
RTAS_FNIDX__IBM_MANAGE_FLASH_IMAGE,
RTAS_FNIDX__IBM_MANAGE_STORAGE_PRESERVATION,
RTAS_FNIDX__IBM_NMI_INTERLOCK,
RTAS_FNIDX__IBM_NMI_REGISTER,
RTAS_FNIDX__IBM_OPEN_ERRINJCT,
RTAS_FNIDX__IBM_OPEN_SRIOV_ALLOW_UNFREEZE,
RTAS_FNIDX__IBM_OPEN_SRIOV_MAP_PE_NUMBER,
RTAS_FNIDX__IBM_OS_TERM,
RTAS_FNIDX__IBM_PARTNER_CONTROL,
RTAS_FNIDX__IBM_PHYSICAL_ATTESTATION,
RTAS_FNIDX__IBM_PLATFORM_DUMP,
RTAS_FNIDX__IBM_POWER_OFF_UPS,
RTAS_FNIDX__IBM_QUERY_INTERRUPT_SOURCE_NUMBER,
RTAS_FNIDX__IBM_QUERY_PE_DMA_WINDOW,
RTAS_FNIDX__IBM_READ_PCI_CONFIG,
RTAS_FNIDX__IBM_READ_SLOT_RESET_STATE,
RTAS_FNIDX__IBM_READ_SLOT_RESET_STATE2,
RTAS_FNIDX__IBM_REMOVE_PE_DMA_WINDOW,
powerpc/rtas: use correct function name for resetting TCE tables The PAPR spec spells the function name as "ibm,reset-pe-dma-windows" but in practice firmware uses the singular form: "ibm,reset-pe-dma-window" in the device tree. Since we have the wrong spelling in the RTAS function table, reverse lookups (token -> name) fail and warn: unexpected failed lookup for token 86 WARNING: CPU: 1 PID: 545 at arch/powerpc/kernel/rtas.c:659 __do_enter_rtas_trace+0x2a4/0x2b4 CPU: 1 PID: 545 Comm: systemd-udevd Not tainted 6.8.0-rc4 #30 Hardware name: IBM,9105-22A POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1060.00 (NL1060_028) hv:phyp pSeries NIP [c0000000000417f0] __do_enter_rtas_trace+0x2a4/0x2b4 LR [c0000000000417ec] __do_enter_rtas_trace+0x2a0/0x2b4 Call Trace: __do_enter_rtas_trace+0x2a0/0x2b4 (unreliable) rtas_call+0x1f8/0x3e0 enable_ddw.constprop.0+0x4d0/0xc84 dma_iommu_dma_supported+0xe8/0x24c dma_set_mask+0x5c/0xd8 mlx5_pci_init.constprop.0+0xf0/0x46c [mlx5_core] probe_one+0xfc/0x32c [mlx5_core] local_pci_probe+0x68/0x12c pci_call_probe+0x68/0x1ec pci_device_probe+0xbc/0x1a8 really_probe+0x104/0x570 __driver_probe_device+0xb8/0x224 driver_probe_device+0x54/0x130 __driver_attach+0x158/0x2b0 bus_for_each_dev+0xa8/0x120 driver_attach+0x34/0x48 bus_add_driver+0x174/0x304 driver_register+0x8c/0x1c4 __pci_register_driver+0x68/0x7c mlx5_init+0xb8/0x118 [mlx5_core] do_one_initcall+0x60/0x388 do_init_module+0x7c/0x2a4 init_module_from_file+0xb4/0x108 idempotent_init_module+0x184/0x34c sys_finit_module+0x90/0x114 And oopses are possible when lockdep is enabled or the RTAS tracepoints are active, since those paths dereference the result of the lookup. Use the correct spelling to match firmware's behavior, adjusting the related constants to match. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Fixes: 8252b88294d2 ("powerpc/rtas: improve function information lookups") Reported-by: Gaurav Batra <gbatra@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240222-rtas-fix-ibm-reset-pe-dma-window-v1-1-7aaf235ac63c@linux.ibm.com
2024-02-22 16:19:14 -06:00
RTAS_FNIDX__IBM_RESET_PE_DMA_WINDOW,
powerpc/rtas: improve function information lookups The core RTAS support code and its clients perform two types of lookup for RTAS firmware function information. First, mapping a known function name to a token. The typical use case invokes rtas_token() to retrieve the token value to pass to rtas_call(). rtas_token() relies on of_get_property(), which performs a linear search of the /rtas node's property list under a lock with IRQs disabled. Second, and less common: given a token value, looking up some information about the function. The primary example is the sys_rtas filter path, which linearly scans a small table to match the token to a rtas_filter struct. Another use case to come is RTAS entry/exit tracepoints, which will require efficient lookup of function names from token values. Currently there is no general API for this. We need something much like the existing rtas_filters table, but more general and organized to facilitate efficient lookups. Introduce: * A new rtas_function type, aggregating function name, token, and filter. Other function characteristics could be added in the future. * An array of rtas_function, where each element corresponds to a known RTAS function. All information in the table is static save the token values, which are derived from the device tree at boot. The array is sorted by function name to allow binary search. * A named constant for each known RTAS function, used to index the function array. These also will be used in a client-facing API to be added later. * An xarray that maps valid tokens to rtas_function objects. Fold the existing rtas_filter table into the new rtas_function array, with the appropriate adjustments to block_rtas_call(). Remove now-redundant fields from struct rtas_filter. Preserve the function of the CONFIG_CPU_BIG_ENDIAN guard in the current filter table by introducing a per-function flag that is set for the function entries related to pseries LPAR migration. These have never had working users via sys_rtas on ppc64le; see commit de0f7349a0dd ("powerpc/rtas: prevent suspend-related sys_rtas use on LE"). Convert rtas_token() to use a lockless binary search on the function table. Fall back to the old behavior for lookups against names that are not known to be RTAS functions, but issue a warning. rtas_token() is for function names; it is not a general facility for accessing arbitrary properties of the /rtas node. All known misuses of rtas_token() have been converted to more appropriate of_ APIs in preceding changes. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230125-b4-powerpc-rtas-queue-v3-8-26929c8cce78@linux.ibm.com
2023-02-10 12:41:56 -06:00
RTAS_FNIDX__IBM_SCAN_LOG_DUMP,
RTAS_FNIDX__IBM_SET_DYNAMIC_INDICATOR,
RTAS_FNIDX__IBM_SET_EEH_OPTION,
RTAS_FNIDX__IBM_SET_SLOT_RESET,
RTAS_FNIDX__IBM_SET_SYSTEM_PARAMETER,
RTAS_FNIDX__IBM_SET_XIVE,
RTAS_FNIDX__IBM_SLOT_ERROR_DETAIL,
RTAS_FNIDX__IBM_SUSPEND_ME,
RTAS_FNIDX__IBM_TUNE_DMA_PARMS,
RTAS_FNIDX__IBM_UPDATE_FLASH_64_AND_REBOOT,
RTAS_FNIDX__IBM_UPDATE_NODES,
RTAS_FNIDX__IBM_UPDATE_PROPERTIES,
RTAS_FNIDX__IBM_VALIDATE_FLASH_IMAGE,
RTAS_FNIDX__IBM_WRITE_PCI_CONFIG,
RTAS_FNIDX__NVRAM_FETCH,
RTAS_FNIDX__NVRAM_STORE,
RTAS_FNIDX__POWER_OFF,
RTAS_FNIDX__PUT_TERM_CHAR,
RTAS_FNIDX__QUERY_CPU_STOPPED_STATE,
RTAS_FNIDX__READ_PCI_CONFIG,
RTAS_FNIDX__RTAS_LAST_ERROR,
RTAS_FNIDX__SET_INDICATOR,
RTAS_FNIDX__SET_POWER_LEVEL,
RTAS_FNIDX__SET_TIME_FOR_POWER_ON,
RTAS_FNIDX__SET_TIME_OF_DAY,
RTAS_FNIDX__START_CPU,
RTAS_FNIDX__STOP_SELF,
RTAS_FNIDX__SYSTEM_REBOOT,
RTAS_FNIDX__THAW_TIME_BASE,
RTAS_FNIDX__WRITE_PCI_CONFIG,
};
/*
* Opaque handle for client code to refer to RTAS functions. All valid
* function handles are build-time constants prefixed with RTAS_FN_.
*/
typedef struct {
const enum rtas_function_index index;
} rtas_fn_handle_t;
#define rtas_fn_handle(x_) ((const rtas_fn_handle_t) { .index = x_, })
#define RTAS_FN_CHECK_EXCEPTION rtas_fn_handle(RTAS_FNIDX__CHECK_EXCEPTION)
#define RTAS_FN_DISPLAY_CHARACTER rtas_fn_handle(RTAS_FNIDX__DISPLAY_CHARACTER)
#define RTAS_FN_EVENT_SCAN rtas_fn_handle(RTAS_FNIDX__EVENT_SCAN)
#define RTAS_FN_FREEZE_TIME_BASE rtas_fn_handle(RTAS_FNIDX__FREEZE_TIME_BASE)
#define RTAS_FN_GET_POWER_LEVEL rtas_fn_handle(RTAS_FNIDX__GET_POWER_LEVEL)
#define RTAS_FN_GET_SENSOR_STATE rtas_fn_handle(RTAS_FNIDX__GET_SENSOR_STATE)
#define RTAS_FN_GET_TERM_CHAR rtas_fn_handle(RTAS_FNIDX__GET_TERM_CHAR)
#define RTAS_FN_GET_TIME_OF_DAY rtas_fn_handle(RTAS_FNIDX__GET_TIME_OF_DAY)
#define RTAS_FN_IBM_ACTIVATE_FIRMWARE rtas_fn_handle(RTAS_FNIDX__IBM_ACTIVATE_FIRMWARE)
#define RTAS_FN_IBM_CBE_START_PTCAL rtas_fn_handle(RTAS_FNIDX__IBM_CBE_START_PTCAL)
#define RTAS_FN_IBM_CBE_STOP_PTCAL rtas_fn_handle(RTAS_FNIDX__IBM_CBE_STOP_PTCAL)
#define RTAS_FN_IBM_CHANGE_MSI rtas_fn_handle(RTAS_FNIDX__IBM_CHANGE_MSI)
#define RTAS_FN_IBM_CLOSE_ERRINJCT rtas_fn_handle(RTAS_FNIDX__IBM_CLOSE_ERRINJCT)
#define RTAS_FN_IBM_CONFIGURE_BRIDGE rtas_fn_handle(RTAS_FNIDX__IBM_CONFIGURE_BRIDGE)
#define RTAS_FN_IBM_CONFIGURE_CONNECTOR rtas_fn_handle(RTAS_FNIDX__IBM_CONFIGURE_CONNECTOR)
#define RTAS_FN_IBM_CONFIGURE_KERNEL_DUMP rtas_fn_handle(RTAS_FNIDX__IBM_CONFIGURE_KERNEL_DUMP)
#define RTAS_FN_IBM_CONFIGURE_PE rtas_fn_handle(RTAS_FNIDX__IBM_CONFIGURE_PE)
#define RTAS_FN_IBM_CREATE_PE_DMA_WINDOW rtas_fn_handle(RTAS_FNIDX__IBM_CREATE_PE_DMA_WINDOW)
#define RTAS_FN_IBM_DISPLAY_MESSAGE rtas_fn_handle(RTAS_FNIDX__IBM_DISPLAY_MESSAGE)
#define RTAS_FN_IBM_ERRINJCT rtas_fn_handle(RTAS_FNIDX__IBM_ERRINJCT)
#define RTAS_FN_IBM_EXTI2C rtas_fn_handle(RTAS_FNIDX__IBM_EXTI2C)
#define RTAS_FN_IBM_GET_CONFIG_ADDR_INFO rtas_fn_handle(RTAS_FNIDX__IBM_GET_CONFIG_ADDR_INFO)
#define RTAS_FN_IBM_GET_CONFIG_ADDR_INFO2 rtas_fn_handle(RTAS_FNIDX__IBM_GET_CONFIG_ADDR_INFO2)
#define RTAS_FN_IBM_GET_DYNAMIC_SENSOR_STATE rtas_fn_handle(RTAS_FNIDX__IBM_GET_DYNAMIC_SENSOR_STATE)
#define RTAS_FN_IBM_GET_INDICES rtas_fn_handle(RTAS_FNIDX__IBM_GET_INDICES)
#define RTAS_FN_IBM_GET_RIO_TOPOLOGY rtas_fn_handle(RTAS_FNIDX__IBM_GET_RIO_TOPOLOGY)
#define RTAS_FN_IBM_GET_SYSTEM_PARAMETER rtas_fn_handle(RTAS_FNIDX__IBM_GET_SYSTEM_PARAMETER)
#define RTAS_FN_IBM_GET_VPD rtas_fn_handle(RTAS_FNIDX__IBM_GET_VPD)
#define RTAS_FN_IBM_GET_XIVE rtas_fn_handle(RTAS_FNIDX__IBM_GET_XIVE)
#define RTAS_FN_IBM_INT_OFF rtas_fn_handle(RTAS_FNIDX__IBM_INT_OFF)
#define RTAS_FN_IBM_INT_ON rtas_fn_handle(RTAS_FNIDX__IBM_INT_ON)
#define RTAS_FN_IBM_IO_QUIESCE_ACK rtas_fn_handle(RTAS_FNIDX__IBM_IO_QUIESCE_ACK)
#define RTAS_FN_IBM_LPAR_PERFTOOLS rtas_fn_handle(RTAS_FNIDX__IBM_LPAR_PERFTOOLS)
#define RTAS_FN_IBM_MANAGE_FLASH_IMAGE rtas_fn_handle(RTAS_FNIDX__IBM_MANAGE_FLASH_IMAGE)
#define RTAS_FN_IBM_MANAGE_STORAGE_PRESERVATION rtas_fn_handle(RTAS_FNIDX__IBM_MANAGE_STORAGE_PRESERVATION)
#define RTAS_FN_IBM_NMI_INTERLOCK rtas_fn_handle(RTAS_FNIDX__IBM_NMI_INTERLOCK)
#define RTAS_FN_IBM_NMI_REGISTER rtas_fn_handle(RTAS_FNIDX__IBM_NMI_REGISTER)
#define RTAS_FN_IBM_OPEN_ERRINJCT rtas_fn_handle(RTAS_FNIDX__IBM_OPEN_ERRINJCT)
#define RTAS_FN_IBM_OPEN_SRIOV_ALLOW_UNFREEZE rtas_fn_handle(RTAS_FNIDX__IBM_OPEN_SRIOV_ALLOW_UNFREEZE)
#define RTAS_FN_IBM_OPEN_SRIOV_MAP_PE_NUMBER rtas_fn_handle(RTAS_FNIDX__IBM_OPEN_SRIOV_MAP_PE_NUMBER)
#define RTAS_FN_IBM_OS_TERM rtas_fn_handle(RTAS_FNIDX__IBM_OS_TERM)
#define RTAS_FN_IBM_PARTNER_CONTROL rtas_fn_handle(RTAS_FNIDX__IBM_PARTNER_CONTROL)
#define RTAS_FN_IBM_PHYSICAL_ATTESTATION rtas_fn_handle(RTAS_FNIDX__IBM_PHYSICAL_ATTESTATION)
#define RTAS_FN_IBM_PLATFORM_DUMP rtas_fn_handle(RTAS_FNIDX__IBM_PLATFORM_DUMP)
#define RTAS_FN_IBM_POWER_OFF_UPS rtas_fn_handle(RTAS_FNIDX__IBM_POWER_OFF_UPS)
#define RTAS_FN_IBM_QUERY_INTERRUPT_SOURCE_NUMBER rtas_fn_handle(RTAS_FNIDX__IBM_QUERY_INTERRUPT_SOURCE_NUMBER)
#define RTAS_FN_IBM_QUERY_PE_DMA_WINDOW rtas_fn_handle(RTAS_FNIDX__IBM_QUERY_PE_DMA_WINDOW)
#define RTAS_FN_IBM_READ_PCI_CONFIG rtas_fn_handle(RTAS_FNIDX__IBM_READ_PCI_CONFIG)
#define RTAS_FN_IBM_READ_SLOT_RESET_STATE rtas_fn_handle(RTAS_FNIDX__IBM_READ_SLOT_RESET_STATE)
#define RTAS_FN_IBM_READ_SLOT_RESET_STATE2 rtas_fn_handle(RTAS_FNIDX__IBM_READ_SLOT_RESET_STATE2)
#define RTAS_FN_IBM_REMOVE_PE_DMA_WINDOW rtas_fn_handle(RTAS_FNIDX__IBM_REMOVE_PE_DMA_WINDOW)
powerpc/rtas: use correct function name for resetting TCE tables The PAPR spec spells the function name as "ibm,reset-pe-dma-windows" but in practice firmware uses the singular form: "ibm,reset-pe-dma-window" in the device tree. Since we have the wrong spelling in the RTAS function table, reverse lookups (token -> name) fail and warn: unexpected failed lookup for token 86 WARNING: CPU: 1 PID: 545 at arch/powerpc/kernel/rtas.c:659 __do_enter_rtas_trace+0x2a4/0x2b4 CPU: 1 PID: 545 Comm: systemd-udevd Not tainted 6.8.0-rc4 #30 Hardware name: IBM,9105-22A POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1060.00 (NL1060_028) hv:phyp pSeries NIP [c0000000000417f0] __do_enter_rtas_trace+0x2a4/0x2b4 LR [c0000000000417ec] __do_enter_rtas_trace+0x2a0/0x2b4 Call Trace: __do_enter_rtas_trace+0x2a0/0x2b4 (unreliable) rtas_call+0x1f8/0x3e0 enable_ddw.constprop.0+0x4d0/0xc84 dma_iommu_dma_supported+0xe8/0x24c dma_set_mask+0x5c/0xd8 mlx5_pci_init.constprop.0+0xf0/0x46c [mlx5_core] probe_one+0xfc/0x32c [mlx5_core] local_pci_probe+0x68/0x12c pci_call_probe+0x68/0x1ec pci_device_probe+0xbc/0x1a8 really_probe+0x104/0x570 __driver_probe_device+0xb8/0x224 driver_probe_device+0x54/0x130 __driver_attach+0x158/0x2b0 bus_for_each_dev+0xa8/0x120 driver_attach+0x34/0x48 bus_add_driver+0x174/0x304 driver_register+0x8c/0x1c4 __pci_register_driver+0x68/0x7c mlx5_init+0xb8/0x118 [mlx5_core] do_one_initcall+0x60/0x388 do_init_module+0x7c/0x2a4 init_module_from_file+0xb4/0x108 idempotent_init_module+0x184/0x34c sys_finit_module+0x90/0x114 And oopses are possible when lockdep is enabled or the RTAS tracepoints are active, since those paths dereference the result of the lookup. Use the correct spelling to match firmware's behavior, adjusting the related constants to match. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Fixes: 8252b88294d2 ("powerpc/rtas: improve function information lookups") Reported-by: Gaurav Batra <gbatra@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240222-rtas-fix-ibm-reset-pe-dma-window-v1-1-7aaf235ac63c@linux.ibm.com
2024-02-22 16:19:14 -06:00
#define RTAS_FN_IBM_RESET_PE_DMA_WINDOW rtas_fn_handle(RTAS_FNIDX__IBM_RESET_PE_DMA_WINDOW)
#define RTAS_FN_IBM_SCAN_LOG_DUMP rtas_fn_handle(RTAS_FNIDX__IBM_SCAN_LOG_DUMP)
#define RTAS_FN_IBM_SET_DYNAMIC_INDICATOR rtas_fn_handle(RTAS_FNIDX__IBM_SET_DYNAMIC_INDICATOR)
#define RTAS_FN_IBM_SET_EEH_OPTION rtas_fn_handle(RTAS_FNIDX__IBM_SET_EEH_OPTION)
#define RTAS_FN_IBM_SET_SLOT_RESET rtas_fn_handle(RTAS_FNIDX__IBM_SET_SLOT_RESET)
#define RTAS_FN_IBM_SET_SYSTEM_PARAMETER rtas_fn_handle(RTAS_FNIDX__IBM_SET_SYSTEM_PARAMETER)
#define RTAS_FN_IBM_SET_XIVE rtas_fn_handle(RTAS_FNIDX__IBM_SET_XIVE)
#define RTAS_FN_IBM_SLOT_ERROR_DETAIL rtas_fn_handle(RTAS_FNIDX__IBM_SLOT_ERROR_DETAIL)
#define RTAS_FN_IBM_SUSPEND_ME rtas_fn_handle(RTAS_FNIDX__IBM_SUSPEND_ME)
#define RTAS_FN_IBM_TUNE_DMA_PARMS rtas_fn_handle(RTAS_FNIDX__IBM_TUNE_DMA_PARMS)
#define RTAS_FN_IBM_UPDATE_FLASH_64_AND_REBOOT rtas_fn_handle(RTAS_FNIDX__IBM_UPDATE_FLASH_64_AND_REBOOT)
#define RTAS_FN_IBM_UPDATE_NODES rtas_fn_handle(RTAS_FNIDX__IBM_UPDATE_NODES)
#define RTAS_FN_IBM_UPDATE_PROPERTIES rtas_fn_handle(RTAS_FNIDX__IBM_UPDATE_PROPERTIES)
#define RTAS_FN_IBM_VALIDATE_FLASH_IMAGE rtas_fn_handle(RTAS_FNIDX__IBM_VALIDATE_FLASH_IMAGE)
#define RTAS_FN_IBM_WRITE_PCI_CONFIG rtas_fn_handle(RTAS_FNIDX__IBM_WRITE_PCI_CONFIG)
#define RTAS_FN_NVRAM_FETCH rtas_fn_handle(RTAS_FNIDX__NVRAM_FETCH)
#define RTAS_FN_NVRAM_STORE rtas_fn_handle(RTAS_FNIDX__NVRAM_STORE)
#define RTAS_FN_POWER_OFF rtas_fn_handle(RTAS_FNIDX__POWER_OFF)
#define RTAS_FN_PUT_TERM_CHAR rtas_fn_handle(RTAS_FNIDX__PUT_TERM_CHAR)
#define RTAS_FN_QUERY_CPU_STOPPED_STATE rtas_fn_handle(RTAS_FNIDX__QUERY_CPU_STOPPED_STATE)
#define RTAS_FN_READ_PCI_CONFIG rtas_fn_handle(RTAS_FNIDX__READ_PCI_CONFIG)
#define RTAS_FN_RTAS_LAST_ERROR rtas_fn_handle(RTAS_FNIDX__RTAS_LAST_ERROR)
#define RTAS_FN_SET_INDICATOR rtas_fn_handle(RTAS_FNIDX__SET_INDICATOR)
#define RTAS_FN_SET_POWER_LEVEL rtas_fn_handle(RTAS_FNIDX__SET_POWER_LEVEL)
#define RTAS_FN_SET_TIME_FOR_POWER_ON rtas_fn_handle(RTAS_FNIDX__SET_TIME_FOR_POWER_ON)
#define RTAS_FN_SET_TIME_OF_DAY rtas_fn_handle(RTAS_FNIDX__SET_TIME_OF_DAY)
#define RTAS_FN_START_CPU rtas_fn_handle(RTAS_FNIDX__START_CPU)
#define RTAS_FN_STOP_SELF rtas_fn_handle(RTAS_FNIDX__STOP_SELF)
#define RTAS_FN_SYSTEM_REBOOT rtas_fn_handle(RTAS_FNIDX__SYSTEM_REBOOT)
#define RTAS_FN_THAW_TIME_BASE rtas_fn_handle(RTAS_FNIDX__THAW_TIME_BASE)
#define RTAS_FN_WRITE_PCI_CONFIG rtas_fn_handle(RTAS_FNIDX__WRITE_PCI_CONFIG)
#define RTAS_UNKNOWN_SERVICE (-1)
#define RTAS_INSTANTIATE_MAX (1ULL<<30) /* Don't instantiate rtas at/above this value */
/* Memory set aside for sys_rtas to use with calls that need a work area. */
#define RTAS_USER_REGION_SIZE (64 * 1024)
/*
* Common RTAS function return values, derived from the table "RTAS
* Status Word Values" in PAPR+ v2.13 7.2.8: "Return Codes". If a
* function can return a value in this table then generally it has the
* meaning listed here. More extended commentary in the documentation
* for rtas_call().
*
* RTAS functions may use negative and positive numbers not in this
* set for function-specific error and success conditions,
* respectively.
*/
#define RTAS_SUCCESS 0 /* Success. */
#define RTAS_HARDWARE_ERROR -1 /* Hardware or other unspecified error. */
#define RTAS_BUSY -2 /* Retry immediately. */
#define RTAS_INVALID_PARAMETER -3 /* Invalid indicator/domain/sensor etc. */
#define RTAS_UNEXPECTED_STATE_CHANGE -7 /* Seems limited to EEH and slot reset. */
#define RTAS_EXTENDED_DELAY_MIN 9900 /* Retry after delaying for ~1ms. */
#define RTAS_EXTENDED_DELAY_MAX 9905 /* Retry after delaying for ~100s. */
#define RTAS_ML_ISOLATION_ERROR -9000 /* Multi-level isolation error. */
/* statuses specific to ibm,suspend-me */
#define RTAS_SUSPEND_ABORTED 9000 /* Suspension aborted */
#define RTAS_NOT_SUSPENDABLE -9004 /* Partition not suspendable */
#define RTAS_THREADS_ACTIVE -9005 /* Multiple processor threads active */
#define RTAS_OUTSTANDING_COPROC -9006 /* Outstanding coprocessor operations */
/* RTAS event classes */
#define RTAS_INTERNAL_ERROR 0x80000000 /* set bit 0 */
#define RTAS_EPOW_WARNING 0x40000000 /* set bit 1 */
#define RTAS_HOTPLUG_EVENTS 0x10000000 /* set bit 3 */
#define RTAS_IO_EVENTS 0x08000000 /* set bit 4 */
#define RTAS_EVENT_SCAN_ALL_EVENTS 0xffffffff
/* RTAS event severity */
#define RTAS_SEVERITY_FATAL 0x5
#define RTAS_SEVERITY_ERROR 0x4
#define RTAS_SEVERITY_ERROR_SYNC 0x3
#define RTAS_SEVERITY_WARNING 0x2
#define RTAS_SEVERITY_EVENT 0x1
#define RTAS_SEVERITY_NO_ERROR 0x0
/* RTAS event disposition */
#define RTAS_DISP_FULLY_RECOVERED 0x0
#define RTAS_DISP_LIMITED_RECOVERY 0x1
#define RTAS_DISP_NOT_RECOVERED 0x2
/* RTAS event initiator */
#define RTAS_INITIATOR_UNKNOWN 0x0
#define RTAS_INITIATOR_CPU 0x1
#define RTAS_INITIATOR_PCI 0x2
#define RTAS_INITIATOR_ISA 0x3
#define RTAS_INITIATOR_MEMORY 0x4
#define RTAS_INITIATOR_POWERMGM 0x5
/* RTAS event target */
#define RTAS_TARGET_UNKNOWN 0x0
#define RTAS_TARGET_CPU 0x1
#define RTAS_TARGET_PCI 0x2
#define RTAS_TARGET_ISA 0x3
#define RTAS_TARGET_MEMORY 0x4
#define RTAS_TARGET_POWERMGM 0x5
/* RTAS event type */
#define RTAS_TYPE_RETRY 0x01
#define RTAS_TYPE_TCE_ERR 0x02
#define RTAS_TYPE_INTERN_DEV_FAIL 0x03
#define RTAS_TYPE_TIMEOUT 0x04
#define RTAS_TYPE_DATA_PARITY 0x05
#define RTAS_TYPE_ADDR_PARITY 0x06
#define RTAS_TYPE_CACHE_PARITY 0x07
#define RTAS_TYPE_ADDR_INVALID 0x08
#define RTAS_TYPE_ECC_UNCORR 0x09
#define RTAS_TYPE_ECC_CORR 0x0a
#define RTAS_TYPE_EPOW 0x40
#define RTAS_TYPE_PLATFORM 0xE0
#define RTAS_TYPE_IO 0xE1
#define RTAS_TYPE_INFO 0xE2
#define RTAS_TYPE_DEALLOC 0xE3
#define RTAS_TYPE_DUMP 0xE4
#define RTAS_TYPE_HOTPLUG 0xE5
/* I don't add PowerMGM events right now, this is a different topic */
#define RTAS_TYPE_PMGM_POWER_SW_ON 0x60
#define RTAS_TYPE_PMGM_POWER_SW_OFF 0x61
#define RTAS_TYPE_PMGM_LID_OPEN 0x62
#define RTAS_TYPE_PMGM_LID_CLOSE 0x63
#define RTAS_TYPE_PMGM_SLEEP_BTN 0x64
#define RTAS_TYPE_PMGM_WAKE_BTN 0x65
#define RTAS_TYPE_PMGM_BATTERY_WARN 0x66
#define RTAS_TYPE_PMGM_BATTERY_CRIT 0x67
#define RTAS_TYPE_PMGM_SWITCH_TO_BAT 0x68
#define RTAS_TYPE_PMGM_SWITCH_TO_AC 0x69
#define RTAS_TYPE_PMGM_KBD_OR_MOUSE 0x6a
#define RTAS_TYPE_PMGM_ENCLOS_OPEN 0x6b
#define RTAS_TYPE_PMGM_ENCLOS_CLOSED 0x6c
#define RTAS_TYPE_PMGM_RING_INDICATE 0x6d
#define RTAS_TYPE_PMGM_LAN_ATTENTION 0x6e
#define RTAS_TYPE_PMGM_TIME_ALARM 0x6f
#define RTAS_TYPE_PMGM_CONFIG_CHANGE 0x70
#define RTAS_TYPE_PMGM_SERVICE_PROC 0x71
powerpc/pseries: Add PRRN RTAS event handler A PRRN event is signaled via the RTAS event-scan mechanism, which returns a Hot Plug Event message "fixed part" indicating "Platform Resource Reassignment". In response to the Hot Plug Event message, we must call ibm,update-nodes to determine which resources were reassigned and then ibm,update-properties to obtain the new affinity information about those resources. The PRRN event-scan RTAS message contains only the "fixed part" with the "Type" field set to the value 160 and no Extended Event Log. The four-byte Extended Event Log Length field is re-purposed (since no Extended Event Log message is included) to pass the "scope" parameter that causes the ibm,update-nodes to return the nodes affected by the specific resource reassignment. This patch adds a handler for RTAS events. The function pseries_devicetree_update() (from mobility.c) is used to make the ibm,update-nodes/ibm,update-properties RTAS calls. Updating the NUMA maps (handled by a subsequent patch) will require significant processing, so pseries_devicetree_update() is called from an asynchronous workqueue to allow event processing to continue. PRRN RTAS events on pseries systems are rare events that have to be initiated from the HMC console for the system by an IBM tech. This allows us to assume that these events are widely spaced. Additionally, all work on the queue is flushed before handling any new work to ensure we only have one event in flight being handled at a time. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-24 05:51:33 +00:00
/* Platform Resource Reassignment Notification */
#define RTAS_TYPE_PRRN 0xA0
/* RTAS check-exception vector offset */
#define RTAS_VECTOR_EXTERNAL_INTERRUPT 0x500
static inline uint8_t rtas_error_severity(const struct rtas_error_log *elog)
{
return (elog->byte1 & 0xE0) >> 5;
}
static inline uint8_t rtas_error_disposition(const struct rtas_error_log *elog)
{
return (elog->byte1 & 0x18) >> 3;
}
static inline
void rtas_set_disposition_recovered(struct rtas_error_log *elog)
{
elog->byte1 &= ~0x18;
elog->byte1 |= (RTAS_DISP_FULLY_RECOVERED << 3);
}
static inline uint8_t rtas_error_extended(const struct rtas_error_log *elog)
{
return (elog->byte1 & 0x04) >> 2;
}
static inline uint8_t rtas_error_initiator(const struct rtas_error_log *elog)
{
return (elog->byte2 & 0xf0) >> 4;
}
#define rtas_error_type(x) ((x)->byte3)
static inline
uint32_t rtas_error_extended_log_length(const struct rtas_error_log *elog)
{
return be32_to_cpu(elog->extended_log_length);
}
#define RTAS_V6EXT_LOG_FORMAT_EVENT_LOG 14
#define RTAS_V6EXT_COMPANY_ID_IBM (('I' << 24) | ('B' << 16) | ('M' << 8))
static
inline uint8_t rtas_ext_event_log_format(struct rtas_ext_event_log_v6 *ext_log)
{
return ext_log->byte2 & 0x0F;
}
static
inline uint32_t rtas_ext_event_company_id(struct rtas_ext_event_log_v6 *ext_log)
{
return be32_to_cpu(ext_log->company_id);
}
/* pSeries event log format */
/* Two bytes ASCII section IDs */
#define PSERIES_ELOG_SECT_ID_PRIV_HDR (('P' << 8) | 'H')
#define PSERIES_ELOG_SECT_ID_USER_HDR (('U' << 8) | 'H')
#define PSERIES_ELOG_SECT_ID_PRIMARY_SRC (('P' << 8) | 'S')
#define PSERIES_ELOG_SECT_ID_EXTENDED_UH (('E' << 8) | 'H')
#define PSERIES_ELOG_SECT_ID_FAILING_MTMS (('M' << 8) | 'T')
#define PSERIES_ELOG_SECT_ID_SECONDARY_SRC (('S' << 8) | 'S')
#define PSERIES_ELOG_SECT_ID_DUMP_LOCATOR (('D' << 8) | 'H')
#define PSERIES_ELOG_SECT_ID_FW_ERROR (('S' << 8) | 'W')
#define PSERIES_ELOG_SECT_ID_IMPACT_PART_ID (('L' << 8) | 'P')
#define PSERIES_ELOG_SECT_ID_LOGIC_RESOURCE_ID (('L' << 8) | 'R')
#define PSERIES_ELOG_SECT_ID_HMC_ID (('H' << 8) | 'M')
#define PSERIES_ELOG_SECT_ID_EPOW (('E' << 8) | 'P')
#define PSERIES_ELOG_SECT_ID_IO_EVENT (('I' << 8) | 'E')
#define PSERIES_ELOG_SECT_ID_MANUFACT_INFO (('M' << 8) | 'I')
#define PSERIES_ELOG_SECT_ID_CALL_HOME (('C' << 8) | 'H')
#define PSERIES_ELOG_SECT_ID_USER_DEF (('U' << 8) | 'D')
#define PSERIES_ELOG_SECT_ID_HOTPLUG (('H' << 8) | 'P')
#define PSERIES_ELOG_SECT_ID_MCE (('M' << 8) | 'C')
static
inline uint16_t pseries_errorlog_id(struct pseries_errorlog *sect)
{
return be16_to_cpu(sect->id);
}
static
inline uint16_t pseries_errorlog_length(struct pseries_errorlog *sect)
{
return be16_to_cpu(sect->length);
}
#define PSERIES_HP_ELOG_RESOURCE_CPU 1
#define PSERIES_HP_ELOG_RESOURCE_MEM 2
#define PSERIES_HP_ELOG_RESOURCE_SLOT 3
#define PSERIES_HP_ELOG_RESOURCE_PHB 4
#define PSERIES_HP_ELOG_RESOURCE_PMEM 6
#define PSERIES_HP_ELOG_ACTION_ADD 1
#define PSERIES_HP_ELOG_ACTION_REMOVE 2
#define PSERIES_HP_ELOG_ID_DRC_NAME 1
#define PSERIES_HP_ELOG_ID_DRC_INDEX 2
#define PSERIES_HP_ELOG_ID_DRC_COUNT 3
#define PSERIES_HP_ELOG_ID_DRC_IC 4
struct pseries_errorlog *get_pseries_errorlog(struct rtas_error_log *log,
uint16_t section_id);
/*
* This can be set by the rtas_flash module so that it can get called
* as the absolutely last thing before the kernel terminates.
*/
extern void (*rtas_flash_term_hook)(int);
extern struct rtas_t rtas;
s32 rtas_function_token(const rtas_fn_handle_t handle);
static inline bool rtas_function_implemented(const rtas_fn_handle_t handle)
{
return rtas_function_token(handle) != RTAS_UNKNOWN_SERVICE;
}
int rtas_token(const char *service);
int rtas_call(int token, int nargs, int nret, int *outputs, ...);
void rtas_call_unlocked(struct rtas_args *args, int token, int nargs,
int nret, ...);
void __noreturn rtas_restart(char *cmd);
void rtas_power_off(void);
void __noreturn rtas_halt(void);
void rtas_os_term(char *str);
void rtas_activate_firmware(void);
int rtas_get_sensor(int sensor, int index, int *state);
int rtas_get_sensor_fast(int sensor, int index, int *state);
int rtas_get_power_level(int powerdomain, int *level);
int rtas_set_power_level(int powerdomain, int level, int *setlevel);
bool rtas_indicator_present(int token, int *maxindex);
int rtas_set_indicator(int indicator, int index, int new_value);
int rtas_set_indicator_fast(int indicator, int index, int new_value);
void rtas_progress(char *s, unsigned short hex);
int rtas_ibm_suspend_me(int *fw_status);
int rtas_error_rc(int rtas_rc);
struct rtc_time;
time64_t rtas_get_boot_time(void);
void rtas_get_rtc_time(struct rtc_time *rtc_time);
int rtas_set_rtc_time(struct rtc_time *rtc_time);
unsigned int rtas_busy_delay_time(int status);
powerpc/rtas: rtas_busy_delay() improvements Generally RTAS cannot block, and in PAPR it is required to return control to the OS within a few tens of microseconds. In order to support operations which may take longer to complete, many RTAS primitives can return intermediate -2 ("busy") or 990x ("extended delay") values, which indicate that the OS should reattempt the same call with the same arguments at some point in the future. Current versions of PAPR are less than clear about this, but the intended meanings of these values in more detail are: RTAS_BUSY (-2): RTAS has suspended a potentially long-running operation in order to meet its latency obligation and give the OS the opportunity to perform other work. RTAS can resume making progress as soon as the OS reattempts the call. RTAS_EXTENDED_DELAY_{MIN...MAX} (9900-9905): RTAS must wait for an external event to occur or for internal contention to resolve before it can complete the requested operation. The value encodes a non-binding hint as to roughly how long the OS should wait before calling again, but the OS is allowed to reattempt the call sooner or even immediately. Linux of course must take its own CPU scheduling obligations into account when handling these statuses; e.g. a task which receives an RTAS_BUSY status should check whether to reschedule before it attempts the RTAS call again to avoid starving other tasks. rtas_busy_delay() is a helper function that "consumes" a busy or extended delay status. Common usage: int rc; do { rc = rtas_call(rtas_token("some-function"), ...); } while (rtas_busy_delay(rc)); /* convert rc to Linux error value, etc */ If rc is a busy or extended delay status, the caller can rely on rtas_busy_delay() to perform an appropriate sleep or reschedule and return nonzero. Other statuses are handled normally by the caller. The current implementation of rtas_busy_delay() both oversleeps and overuses the CPU: * It performs msleep() for all 990x and even when no delay is suggested (-2), but this is understood to actually sleep for two jiffies minimum in practice (20ms with HZ=100). 9900 (1ms) and 9901 (10ms) appear to be the most common extended delay statuses, and the oversleeping measurably lengthens DLPAR operations, which perform many RTAS calls. * It does not sleep on 990x unless need_resched() is true, causing code like the loop above to needlessly retry, wasting CPU time. Alter the logic to align better with the intended meanings: * When passed RTAS_BUSY, perform cond_resched() and return without sleeping. The caller should reattempt immediately * Always sleep when passed an extended delay status, using usleep_range() for precise shorter sleeps. Limit the sleep time to one second even though there are higher architected values. Change rtas_busy_delay()'s return type to bool to better reflect its usage, and add kernel-doc. rtas_busy_delay_time() is unchanged, even though it "incorrectly" returns 1 for RTAS_BUSY. There are users of that API with open-coded delay loops in sensitive contexts that will have to be taken on an individual basis. Brief results for addition and removal of 5GB memory on a small P9 PowerVM partition follow. Load was generated with stress-ng --cpu N. For add, elapsed time is greatly reduced without significant change in the number of RTAS calls or time spent on CPU. For remove, elapsed time is modestly reduced, with significant reductions in RTAS calls and time spent on CPU. With no competing workload (- before, + after): Performance counter stats for 'bash -c echo "memory add count 20" > /sys/kernel/dlpar' (10 runs): - 1,935 probe:rtas_call # 0.003 M/sec ( +- 0.22% ) - 609.99 msec task-clock # 0.183 CPUs utilized ( +- 0.19% ) + 1,956 probe:rtas_call # 0.003 M/sec ( +- 0.17% ) + 618.56 msec task-clock # 0.278 CPUs utilized ( +- 0.11% ) - 3.3322 +- 0.0670 seconds time elapsed ( +- 2.01% ) + 2.2222 +- 0.0416 seconds time elapsed ( +- 1.87% ) Performance counter stats for 'bash -c echo "memory remove count 20" > /sys/kernel/dlpar' (10 runs): - 6,224 probe:rtas_call # 0.008 M/sec ( +- 2.57% ) - 750.36 msec task-clock # 0.190 CPUs utilized ( +- 2.01% ) + 843 probe:rtas_call # 0.003 M/sec ( +- 0.12% ) + 250.66 msec task-clock # 0.068 CPUs utilized ( +- 0.17% ) - 3.9394 +- 0.0890 seconds time elapsed ( +- 2.26% ) + 3.678 +- 0.113 seconds time elapsed ( +- 3.07% ) With all CPUs 100% busy (- before, + after): Performance counter stats for 'bash -c echo "memory add count 20" > /sys/kernel/dlpar' (10 runs): - 2,979 probe:rtas_call # 0.003 M/sec ( +- 0.12% ) - 1,096.62 msec task-clock # 0.105 CPUs utilized ( +- 0.10% ) + 2,981 probe:rtas_call # 0.003 M/sec ( +- 0.22% ) + 1,095.26 msec task-clock # 0.154 CPUs utilized ( +- 0.21% ) - 10.476 +- 0.104 seconds time elapsed ( +- 1.00% ) + 7.1124 +- 0.0865 seconds time elapsed ( +- 1.22% ) Performance counter stats for 'bash -c echo "memory remove count 20" > /sys/kernel/dlpar' (10 runs): - 2,702 probe:rtas_call # 0.004 M/sec ( +- 4.00% ) - 722.71 msec task-clock # 0.067 CPUs utilized ( +- 2.41% ) + 1,246 probe:rtas_call # 0.003 M/sec ( +- 0.25% ) + 487.73 msec task-clock # 0.049 CPUs utilized ( +- 0.20% ) - 10.829 +- 0.163 seconds time elapsed ( +- 1.51% ) + 9.9887 +- 0.0866 seconds time elapsed ( +- 0.87% ) Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211117060259.957178-2-nathanl@linux.ibm.com
2021-11-17 00:02:58 -06:00
bool rtas_busy_delay(int status);
int early_init_dt_scan_rtas(unsigned long node, const char *uname, int depth, void *data);
void pSeries_log_error(char *buf, unsigned int err_type, int fatal);
#ifdef CONFIG_PPC_PSERIES
extern time64_t last_rtas_event;
int clobbering_unread_rtas_event(void);
int rtas_syscall_dispatch_ibm_suspend_me(u64 handle);
#else
static inline int clobbering_unread_rtas_event(void) { return 0; }
static inline int rtas_syscall_dispatch_ibm_suspend_me(u64 handle)
{
return -EINVAL;
}
#endif
#ifdef CONFIG_PPC_RTAS_DAEMON
void rtas_cancel_event_scan(void);
#else
static inline void rtas_cancel_event_scan(void) { }
#endif
/* Error types logged. */
#define ERR_FLAG_ALREADY_LOGGED 0x0
#define ERR_FLAG_BOOT 0x1 /* log was pulled from NVRAM on boot */
#define ERR_TYPE_RTAS_LOG 0x2 /* from rtas event-scan */
#define ERR_TYPE_KERNEL_PANIC 0x4 /* from die()/panic() */
#define ERR_TYPE_KERNEL_PANIC_GZ 0x8 /* ditto, compressed */
/* All the types and not flags */
#define ERR_TYPE_MASK \
(ERR_TYPE_RTAS_LOG | ERR_TYPE_KERNEL_PANIC | ERR_TYPE_KERNEL_PANIC_GZ)
#define RTAS_DEBUG KERN_DEBUG "RTAS: "
#define RTAS_ERROR_LOG_MAX 2048
/*
* Return the firmware-specified size of the error log buffer
* for all rtas calls that require an error buffer argument.
* This includes 'check-exception' and 'rtas-last-error'.
*/
int rtas_get_error_log_max(void);
/* Event Scan Parameters */
#define EVENT_SCAN_ALL_EVENTS 0xf0000000
#define SURVEILLANCE_TOKEN 9000
#define LOG_NUMBER 64 /* must be a power of two */
#define LOG_NUMBER_MASK (LOG_NUMBER-1)
/* Some RTAS ops require a data buffer and that buffer must be < 4G.
* Rather than having a memory allocator, just use this buffer
* (get the lock first), make the RTAS call. Copy the data instead
* of holding the buffer for long.
*/
#define RTAS_DATA_BUF_SIZE 4096
extern spinlock_t rtas_data_buf_lock;
extern char rtas_data_buf[RTAS_DATA_BUF_SIZE];
/* RMO buffer reserved for user-space RTAS use */
extern unsigned long rtas_rmo_buf;
powerpc/rtas: Facilitate high-level call sequences On RTAS platforms there is a general restriction that the OS must not enter RTAS on more than one CPU at a time. This low-level serialization requirement is satisfied by holding a spin lock (rtas_lock) across most RTAS function invocations. However, some pseries RTAS functions require multiple successive calls to complete a logical operation. Beginning a new call sequence for such a function may disrupt any other sequences of that function already in progress. Safe and reliable use of these functions effectively requires higher-level serialization beyond what is already done at the level of RTAS entry and exit. Where a sequence-based RTAS function is invoked only through sys_rtas(), with no in-kernel users, there is no issue as far as the kernel is concerned. User space is responsible for appropriately serializing its call sequences. (Whether user space code actually takes measures to prevent sequence interleaving is another matter.) Examples of such functions currently include ibm,platform-dump and ibm,get-vpd. But where a sequence-based RTAS function has both user space and in-kernel uesrs, there is a hazard. Even if the in-kernel call sites of such a function serialize their sequences correctly, a user of sys_rtas() can invoke the same function at any time, potentially disrupting a sequence in progress. So in order to prevent disruption of kernel-based RTAS call sequences, they must serialize not only with themselves but also with sys_rtas() users, somehow. Preferably without adding more function-specific hacks to sys_rtas(). This is a prerequisite for adding an in-kernel call sequence of ibm,get-vpd, which is in a change to follow. Note that it has never been feasible for the kernel to prevent sys_rtas()-based sequences from being disrupted because control returns to user space on every call. sys_rtas()-based users of these functions have always been, and continue to be, responsible for coordinating their call sequences with other users, even those which may invoke the RTAS functions through less direct means than sys_rtas(). This is an unavoidable consequence of exposing sequence-based RTAS functions through sys_rtas(). * Add an optional mutex member to struct rtas_function. * Statically define a mutex for each RTAS function with known call sequence serialization requirements, and assign its address to the .lock member of the corresponding function table entry, along with justifying commentary. * In sys_rtas(), if the table entry for the RTAS function being called has a populated lock member, acquire it before taking rtas_lock and entering RTAS. * Kernel-based RTAS call sequences are expected to access the appropriate mutex explicitly by name. For example, a user of the ibm,activate-firmware RTAS function would do: int token = rtas_function_token(RTAS_FN_IBM_ACTIVATE_FIRMWARE); int fwrc; mutex_lock(&rtas_ibm_activate_firmware_lock); do { fwrc = rtas_call(token, 0, 1, NULL); } while (rtas_busy_delay(fwrc)); mutex_unlock(&rtas_ibm_activate_firmware_lock); There should be no perceivable change introduced here except that concurrent callers of the same RTAS function via sys_rtas() may block on a mutex instead of spinning on rtas_lock. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20231212-papr-sys_rtas-vs-lockdown-v6-6-e9eafd0c8c6c@linux.ibm.com
2023-12-12 11:01:53 -06:00
extern struct mutex rtas_ibm_get_vpd_lock;
#define GLOBAL_INTERRUPT_QUEUE 9005
/**
* rtas_config_addr - Format a busno, devfn and reg for RTAS.
* @busno: The bus number.
* @devfn: The device and function number as encoded by PCI_DEVFN().
* @reg: The register number.
*
* This function encodes the given busno, devfn and register number as
* required for RTAS calls that take a "config_addr" parameter.
* See PAPR requirement 7.3.4-1 for more info.
*/
static inline u32 rtas_config_addr(int busno, int devfn, int reg)
{
return ((reg & 0xf00) << 20) | ((busno & 0xff) << 16) |
(devfn << 8) | (reg & 0xff);
}
void rtas_give_timebase(void);
void rtas_take_timebase(void);
#ifdef CONFIG_PPC_RTAS
static inline int page_is_rtas_user_buf(unsigned long pfn)
{
unsigned long paddr = (pfn << PAGE_SHIFT);
if (paddr >= rtas_rmo_buf && paddr < (rtas_rmo_buf + RTAS_USER_REGION_SIZE))
return 1;
return 0;
}
/* Not the best place to put pSeries_coalesce_init, will be fixed when we
* move some of the rtas suspend-me stuff to pseries */
void pSeries_coalesce_init(void);
void rtas_initialize(void);
#else
static inline int page_is_rtas_user_buf(unsigned long pfn) { return 0;}
static inline void pSeries_coalesce_init(void) { }
static inline void rtas_initialize(void) { }
#endif
#ifdef CONFIG_HV_PERF_CTRS
void read_24x7_sys_info(void);
#else
static inline void read_24x7_sys_info(void) { }
#endif
#endif /* __KERNEL__ */
#endif /* _POWERPC_RTAS_H */