linux/drivers/pci/pcie/bwctrl.c

330 lines
8.8 KiB
C
Raw Permalink Normal View History

PCI/bwctrl: Re-add BW notification portdrv as PCIe BW controller This mostly reverts the commit b4c7d2076b4e ("PCI/LINK: Remove bandwidth notification"). An upcoming commit extends this driver building PCIe bandwidth controller on top of it. PCIe bandwidth notifications were first added in the commit e8303bb7a75c ("PCI/LINK: Report degraded links via link bandwidth notification") but later had to be removed. The significant changes compared with the old bandwidth notification driver include: 1) Don't print the notifications into kernel log, just keep the Link Speed cached in struct pci_bus updated. While somewhat unfortunate, the log spam was the source of complaints that eventually lead to the removal of the bandwidth notifications driver (see the links below for further information). 2) Besides the Link Bandwidth Management Interrupt, also enable Link Autonomous Bandwidth Interrupt to cover the other source of bandwidth changes. 3) Handle Link Speed updates robustly. Refresh the cached Link Speed when enabling Bandwidth Notification Interrupts, and solve the race between Link Speed read and LBMS/LABS update in pcie_bwnotif_irq_thread(). 4) Use concurrency safe LNKCTL RMW operations. 5) The driver is now called PCIe bwctrl (bandwidth controller) instead of just bandwidth notifications because of increased scope and functionality within the driver. 6) Coexist with the Target Link Speed quirk in pcie_failed_link_retrain(). Provide LBMS counting API for it. 7) Tweaks to variable/functions names for consistency and length reasons. Bandwidth Notifications enable the cur_bus_speed in the struct pci_bus to keep track PCIe Link Speed changes. [bhelgaas: This is based on previous work by Alexandru Gagniuc <mr.nuke.me@gmail.com>; see e8303bb7a75c ("PCI/LINK: Report degraded links via link bandwidth notification")] Link: https://lore.kernel.org/r/20241018144755.7875-7-ilpo.jarvinen@linux.intel.com Link: https://lore.kernel.org/all/20190429185611.121751-1-helgaas@kernel.org/ Link: https://lore.kernel.org/linux-pci/20190501142942.26972-1-keith.busch@intel.com/ Link: https://lore.kernel.org/linux-pci/20200115221008.GA191037@google.com/ Suggested-by: Lukas Wunner <lukas@wunner.de> # Building bwctrl on top of bwnotif Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> [bhelgaas: squash fix to drop IRQF_ONESHOT and convert to hardirq handler: https://lore.kernel.org/r/20241115165717.15233-1-ilpo.jarvinen@linux.intel.com] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Stefan Wahren <wahrenst@gmx.net> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-10-18 17:47:52 +03:00
// SPDX-License-Identifier: GPL-2.0+
/*
* PCIe bandwidth controller
*
* Author: Alexandru Gagniuc <mr.nuke.me@gmail.com>
*
* Copyright (C) 2019 Dell Inc
* Copyright (C) 2023-2024 Intel Corporation
*
PCI/bwctrl: Add pcie_set_target_speed() to set PCIe Link Speed Currently, PCIe Link Speeds are adjusted by custom code rather than in a common function provided in PCI core. The PCIe bandwidth controller (bwctrl) introduces an in-kernel API, pcie_set_target_speed(), to set PCIe Link Speed. Convert Target Speed quirk to use the new API. The Target Speed quirk runs very early when bwctrl is not yet probed for a Port and can also run later when bwctrl is already setup for the Port, which requires the per port mutex (set_speed_mutex) to be only taken if the bwctrl setup is already complete. The new API is also intended to be used in an upcoming commit that adds a thermal cooling device to throttle PCIe bandwidth when thermal thresholds are reached. The PCIe bandwidth control procedure is as follows. The highest speed supported by the Port and the PCIe device which is not higher than the requested speed is selected and written into the Target Link Speed in the Link Control 2 Register. Then bandwidth controller retrains the PCIe Link. Bandwidth Notifications enable the cur_bus_speed in the struct pci_bus to keep track PCIe Link Speed changes. While Bandwidth Notifications should also be generated when bandwidth controller alters the PCIe Link Speed, a few platforms do not deliver LMBS interrupt after Link Training as expected. Thus, after changing the Link Speed, bandwidth controller makes additional read for the Link Status Register to ensure cur_bus_speed is consistent with the new PCIe Link Speed. Link: https://lore.kernel.org/r/20241018144755.7875-8-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> [bhelgaas: squash devm_mutex_init() error checking from https://lore.kernel.org/r/20241030163139.2111689-1-andriy.shevchenko@linux.intel.com, drop export of pcie_set_target_speed()] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-10-18 17:47:53 +03:00
* The PCIe bandwidth controller provides a way to alter PCIe Link Speeds
* and notify the operating system when the Link Width or Speed changes. The
* notification capability is required for all Root Ports and Downstream
* Ports supporting Link Width wider than x1 and/or multiple Link Speeds.
*
PCI/bwctrl: Re-add BW notification portdrv as PCIe BW controller This mostly reverts the commit b4c7d2076b4e ("PCI/LINK: Remove bandwidth notification"). An upcoming commit extends this driver building PCIe bandwidth controller on top of it. PCIe bandwidth notifications were first added in the commit e8303bb7a75c ("PCI/LINK: Report degraded links via link bandwidth notification") but later had to be removed. The significant changes compared with the old bandwidth notification driver include: 1) Don't print the notifications into kernel log, just keep the Link Speed cached in struct pci_bus updated. While somewhat unfortunate, the log spam was the source of complaints that eventually lead to the removal of the bandwidth notifications driver (see the links below for further information). 2) Besides the Link Bandwidth Management Interrupt, also enable Link Autonomous Bandwidth Interrupt to cover the other source of bandwidth changes. 3) Handle Link Speed updates robustly. Refresh the cached Link Speed when enabling Bandwidth Notification Interrupts, and solve the race between Link Speed read and LBMS/LABS update in pcie_bwnotif_irq_thread(). 4) Use concurrency safe LNKCTL RMW operations. 5) The driver is now called PCIe bwctrl (bandwidth controller) instead of just bandwidth notifications because of increased scope and functionality within the driver. 6) Coexist with the Target Link Speed quirk in pcie_failed_link_retrain(). Provide LBMS counting API for it. 7) Tweaks to variable/functions names for consistency and length reasons. Bandwidth Notifications enable the cur_bus_speed in the struct pci_bus to keep track PCIe Link Speed changes. [bhelgaas: This is based on previous work by Alexandru Gagniuc <mr.nuke.me@gmail.com>; see e8303bb7a75c ("PCI/LINK: Report degraded links via link bandwidth notification")] Link: https://lore.kernel.org/r/20241018144755.7875-7-ilpo.jarvinen@linux.intel.com Link: https://lore.kernel.org/all/20190429185611.121751-1-helgaas@kernel.org/ Link: https://lore.kernel.org/linux-pci/20190501142942.26972-1-keith.busch@intel.com/ Link: https://lore.kernel.org/linux-pci/20200115221008.GA191037@google.com/ Suggested-by: Lukas Wunner <lukas@wunner.de> # Building bwctrl on top of bwnotif Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> [bhelgaas: squash fix to drop IRQF_ONESHOT and convert to hardirq handler: https://lore.kernel.org/r/20241115165717.15233-1-ilpo.jarvinen@linux.intel.com] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Stefan Wahren <wahrenst@gmx.net> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-10-18 17:47:52 +03:00
* This service port driver hooks into the Bandwidth Notification interrupt
* watching for changes or links becoming degraded in operation. It updates
* the cached Current Link Speed that is exposed to user space through sysfs.
*/
#define dev_fmt(fmt) "bwctrl: " fmt
#include <linux/atomic.h>
PCI/bwctrl: Add pcie_set_target_speed() to set PCIe Link Speed Currently, PCIe Link Speeds are adjusted by custom code rather than in a common function provided in PCI core. The PCIe bandwidth controller (bwctrl) introduces an in-kernel API, pcie_set_target_speed(), to set PCIe Link Speed. Convert Target Speed quirk to use the new API. The Target Speed quirk runs very early when bwctrl is not yet probed for a Port and can also run later when bwctrl is already setup for the Port, which requires the per port mutex (set_speed_mutex) to be only taken if the bwctrl setup is already complete. The new API is also intended to be used in an upcoming commit that adds a thermal cooling device to throttle PCIe bandwidth when thermal thresholds are reached. The PCIe bandwidth control procedure is as follows. The highest speed supported by the Port and the PCIe device which is not higher than the requested speed is selected and written into the Target Link Speed in the Link Control 2 Register. Then bandwidth controller retrains the PCIe Link. Bandwidth Notifications enable the cur_bus_speed in the struct pci_bus to keep track PCIe Link Speed changes. While Bandwidth Notifications should also be generated when bandwidth controller alters the PCIe Link Speed, a few platforms do not deliver LMBS interrupt after Link Training as expected. Thus, after changing the Link Speed, bandwidth controller makes additional read for the Link Status Register to ensure cur_bus_speed is consistent with the new PCIe Link Speed. Link: https://lore.kernel.org/r/20241018144755.7875-8-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> [bhelgaas: squash devm_mutex_init() error checking from https://lore.kernel.org/r/20241030163139.2111689-1-andriy.shevchenko@linux.intel.com, drop export of pcie_set_target_speed()] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-10-18 17:47:53 +03:00
#include <linux/bitops.h>
#include <linux/bits.h>
PCI/bwctrl: Re-add BW notification portdrv as PCIe BW controller This mostly reverts the commit b4c7d2076b4e ("PCI/LINK: Remove bandwidth notification"). An upcoming commit extends this driver building PCIe bandwidth controller on top of it. PCIe bandwidth notifications were first added in the commit e8303bb7a75c ("PCI/LINK: Report degraded links via link bandwidth notification") but later had to be removed. The significant changes compared with the old bandwidth notification driver include: 1) Don't print the notifications into kernel log, just keep the Link Speed cached in struct pci_bus updated. While somewhat unfortunate, the log spam was the source of complaints that eventually lead to the removal of the bandwidth notifications driver (see the links below for further information). 2) Besides the Link Bandwidth Management Interrupt, also enable Link Autonomous Bandwidth Interrupt to cover the other source of bandwidth changes. 3) Handle Link Speed updates robustly. Refresh the cached Link Speed when enabling Bandwidth Notification Interrupts, and solve the race between Link Speed read and LBMS/LABS update in pcie_bwnotif_irq_thread(). 4) Use concurrency safe LNKCTL RMW operations. 5) The driver is now called PCIe bwctrl (bandwidth controller) instead of just bandwidth notifications because of increased scope and functionality within the driver. 6) Coexist with the Target Link Speed quirk in pcie_failed_link_retrain(). Provide LBMS counting API for it. 7) Tweaks to variable/functions names for consistency and length reasons. Bandwidth Notifications enable the cur_bus_speed in the struct pci_bus to keep track PCIe Link Speed changes. [bhelgaas: This is based on previous work by Alexandru Gagniuc <mr.nuke.me@gmail.com>; see e8303bb7a75c ("PCI/LINK: Report degraded links via link bandwidth notification")] Link: https://lore.kernel.org/r/20241018144755.7875-7-ilpo.jarvinen@linux.intel.com Link: https://lore.kernel.org/all/20190429185611.121751-1-helgaas@kernel.org/ Link: https://lore.kernel.org/linux-pci/20190501142942.26972-1-keith.busch@intel.com/ Link: https://lore.kernel.org/linux-pci/20200115221008.GA191037@google.com/ Suggested-by: Lukas Wunner <lukas@wunner.de> # Building bwctrl on top of bwnotif Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> [bhelgaas: squash fix to drop IRQF_ONESHOT and convert to hardirq handler: https://lore.kernel.org/r/20241115165717.15233-1-ilpo.jarvinen@linux.intel.com] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Stefan Wahren <wahrenst@gmx.net> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-10-18 17:47:52 +03:00
#include <linux/cleanup.h>
#include <linux/errno.h>
#include <linux/interrupt.h>
PCI/bwctrl: Add pcie_set_target_speed() to set PCIe Link Speed Currently, PCIe Link Speeds are adjusted by custom code rather than in a common function provided in PCI core. The PCIe bandwidth controller (bwctrl) introduces an in-kernel API, pcie_set_target_speed(), to set PCIe Link Speed. Convert Target Speed quirk to use the new API. The Target Speed quirk runs very early when bwctrl is not yet probed for a Port and can also run later when bwctrl is already setup for the Port, which requires the per port mutex (set_speed_mutex) to be only taken if the bwctrl setup is already complete. The new API is also intended to be used in an upcoming commit that adds a thermal cooling device to throttle PCIe bandwidth when thermal thresholds are reached. The PCIe bandwidth control procedure is as follows. The highest speed supported by the Port and the PCIe device which is not higher than the requested speed is selected and written into the Target Link Speed in the Link Control 2 Register. Then bandwidth controller retrains the PCIe Link. Bandwidth Notifications enable the cur_bus_speed in the struct pci_bus to keep track PCIe Link Speed changes. While Bandwidth Notifications should also be generated when bandwidth controller alters the PCIe Link Speed, a few platforms do not deliver LMBS interrupt after Link Training as expected. Thus, after changing the Link Speed, bandwidth controller makes additional read for the Link Status Register to ensure cur_bus_speed is consistent with the new PCIe Link Speed. Link: https://lore.kernel.org/r/20241018144755.7875-8-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> [bhelgaas: squash devm_mutex_init() error checking from https://lore.kernel.org/r/20241030163139.2111689-1-andriy.shevchenko@linux.intel.com, drop export of pcie_set_target_speed()] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-10-18 17:47:53 +03:00
#include <linux/mutex.h>
PCI/bwctrl: Re-add BW notification portdrv as PCIe BW controller This mostly reverts the commit b4c7d2076b4e ("PCI/LINK: Remove bandwidth notification"). An upcoming commit extends this driver building PCIe bandwidth controller on top of it. PCIe bandwidth notifications were first added in the commit e8303bb7a75c ("PCI/LINK: Report degraded links via link bandwidth notification") but later had to be removed. The significant changes compared with the old bandwidth notification driver include: 1) Don't print the notifications into kernel log, just keep the Link Speed cached in struct pci_bus updated. While somewhat unfortunate, the log spam was the source of complaints that eventually lead to the removal of the bandwidth notifications driver (see the links below for further information). 2) Besides the Link Bandwidth Management Interrupt, also enable Link Autonomous Bandwidth Interrupt to cover the other source of bandwidth changes. 3) Handle Link Speed updates robustly. Refresh the cached Link Speed when enabling Bandwidth Notification Interrupts, and solve the race between Link Speed read and LBMS/LABS update in pcie_bwnotif_irq_thread(). 4) Use concurrency safe LNKCTL RMW operations. 5) The driver is now called PCIe bwctrl (bandwidth controller) instead of just bandwidth notifications because of increased scope and functionality within the driver. 6) Coexist with the Target Link Speed quirk in pcie_failed_link_retrain(). Provide LBMS counting API for it. 7) Tweaks to variable/functions names for consistency and length reasons. Bandwidth Notifications enable the cur_bus_speed in the struct pci_bus to keep track PCIe Link Speed changes. [bhelgaas: This is based on previous work by Alexandru Gagniuc <mr.nuke.me@gmail.com>; see e8303bb7a75c ("PCI/LINK: Report degraded links via link bandwidth notification")] Link: https://lore.kernel.org/r/20241018144755.7875-7-ilpo.jarvinen@linux.intel.com Link: https://lore.kernel.org/all/20190429185611.121751-1-helgaas@kernel.org/ Link: https://lore.kernel.org/linux-pci/20190501142942.26972-1-keith.busch@intel.com/ Link: https://lore.kernel.org/linux-pci/20200115221008.GA191037@google.com/ Suggested-by: Lukas Wunner <lukas@wunner.de> # Building bwctrl on top of bwnotif Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> [bhelgaas: squash fix to drop IRQF_ONESHOT and convert to hardirq handler: https://lore.kernel.org/r/20241115165717.15233-1-ilpo.jarvinen@linux.intel.com] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Stefan Wahren <wahrenst@gmx.net> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-10-18 17:47:52 +03:00
#include <linux/pci.h>
#include <linux/pci-bwctrl.h>
PCI/bwctrl: Re-add BW notification portdrv as PCIe BW controller This mostly reverts the commit b4c7d2076b4e ("PCI/LINK: Remove bandwidth notification"). An upcoming commit extends this driver building PCIe bandwidth controller on top of it. PCIe bandwidth notifications were first added in the commit e8303bb7a75c ("PCI/LINK: Report degraded links via link bandwidth notification") but later had to be removed. The significant changes compared with the old bandwidth notification driver include: 1) Don't print the notifications into kernel log, just keep the Link Speed cached in struct pci_bus updated. While somewhat unfortunate, the log spam was the source of complaints that eventually lead to the removal of the bandwidth notifications driver (see the links below for further information). 2) Besides the Link Bandwidth Management Interrupt, also enable Link Autonomous Bandwidth Interrupt to cover the other source of bandwidth changes. 3) Handle Link Speed updates robustly. Refresh the cached Link Speed when enabling Bandwidth Notification Interrupts, and solve the race between Link Speed read and LBMS/LABS update in pcie_bwnotif_irq_thread(). 4) Use concurrency safe LNKCTL RMW operations. 5) The driver is now called PCIe bwctrl (bandwidth controller) instead of just bandwidth notifications because of increased scope and functionality within the driver. 6) Coexist with the Target Link Speed quirk in pcie_failed_link_retrain(). Provide LBMS counting API for it. 7) Tweaks to variable/functions names for consistency and length reasons. Bandwidth Notifications enable the cur_bus_speed in the struct pci_bus to keep track PCIe Link Speed changes. [bhelgaas: This is based on previous work by Alexandru Gagniuc <mr.nuke.me@gmail.com>; see e8303bb7a75c ("PCI/LINK: Report degraded links via link bandwidth notification")] Link: https://lore.kernel.org/r/20241018144755.7875-7-ilpo.jarvinen@linux.intel.com Link: https://lore.kernel.org/all/20190429185611.121751-1-helgaas@kernel.org/ Link: https://lore.kernel.org/linux-pci/20190501142942.26972-1-keith.busch@intel.com/ Link: https://lore.kernel.org/linux-pci/20200115221008.GA191037@google.com/ Suggested-by: Lukas Wunner <lukas@wunner.de> # Building bwctrl on top of bwnotif Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> [bhelgaas: squash fix to drop IRQF_ONESHOT and convert to hardirq handler: https://lore.kernel.org/r/20241115165717.15233-1-ilpo.jarvinen@linux.intel.com] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Stefan Wahren <wahrenst@gmx.net> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-10-18 17:47:52 +03:00
#include <linux/rwsem.h>
#include <linux/slab.h>
#include <linux/types.h>
#include "../pci.h"
#include "portdrv.h"
/**
* struct pcie_bwctrl_data - PCIe bandwidth controller
PCI/bwctrl: Add pcie_set_target_speed() to set PCIe Link Speed Currently, PCIe Link Speeds are adjusted by custom code rather than in a common function provided in PCI core. The PCIe bandwidth controller (bwctrl) introduces an in-kernel API, pcie_set_target_speed(), to set PCIe Link Speed. Convert Target Speed quirk to use the new API. The Target Speed quirk runs very early when bwctrl is not yet probed for a Port and can also run later when bwctrl is already setup for the Port, which requires the per port mutex (set_speed_mutex) to be only taken if the bwctrl setup is already complete. The new API is also intended to be used in an upcoming commit that adds a thermal cooling device to throttle PCIe bandwidth when thermal thresholds are reached. The PCIe bandwidth control procedure is as follows. The highest speed supported by the Port and the PCIe device which is not higher than the requested speed is selected and written into the Target Link Speed in the Link Control 2 Register. Then bandwidth controller retrains the PCIe Link. Bandwidth Notifications enable the cur_bus_speed in the struct pci_bus to keep track PCIe Link Speed changes. While Bandwidth Notifications should also be generated when bandwidth controller alters the PCIe Link Speed, a few platforms do not deliver LMBS interrupt after Link Training as expected. Thus, after changing the Link Speed, bandwidth controller makes additional read for the Link Status Register to ensure cur_bus_speed is consistent with the new PCIe Link Speed. Link: https://lore.kernel.org/r/20241018144755.7875-8-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> [bhelgaas: squash devm_mutex_init() error checking from https://lore.kernel.org/r/20241030163139.2111689-1-andriy.shevchenko@linux.intel.com, drop export of pcie_set_target_speed()] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-10-18 17:47:53 +03:00
* @set_speed_mutex: Serializes link speed changes
* @cdev: Thermal cooling device associated with the port
PCI/bwctrl: Re-add BW notification portdrv as PCIe BW controller This mostly reverts the commit b4c7d2076b4e ("PCI/LINK: Remove bandwidth notification"). An upcoming commit extends this driver building PCIe bandwidth controller on top of it. PCIe bandwidth notifications were first added in the commit e8303bb7a75c ("PCI/LINK: Report degraded links via link bandwidth notification") but later had to be removed. The significant changes compared with the old bandwidth notification driver include: 1) Don't print the notifications into kernel log, just keep the Link Speed cached in struct pci_bus updated. While somewhat unfortunate, the log spam was the source of complaints that eventually lead to the removal of the bandwidth notifications driver (see the links below for further information). 2) Besides the Link Bandwidth Management Interrupt, also enable Link Autonomous Bandwidth Interrupt to cover the other source of bandwidth changes. 3) Handle Link Speed updates robustly. Refresh the cached Link Speed when enabling Bandwidth Notification Interrupts, and solve the race between Link Speed read and LBMS/LABS update in pcie_bwnotif_irq_thread(). 4) Use concurrency safe LNKCTL RMW operations. 5) The driver is now called PCIe bwctrl (bandwidth controller) instead of just bandwidth notifications because of increased scope and functionality within the driver. 6) Coexist with the Target Link Speed quirk in pcie_failed_link_retrain(). Provide LBMS counting API for it. 7) Tweaks to variable/functions names for consistency and length reasons. Bandwidth Notifications enable the cur_bus_speed in the struct pci_bus to keep track PCIe Link Speed changes. [bhelgaas: This is based on previous work by Alexandru Gagniuc <mr.nuke.me@gmail.com>; see e8303bb7a75c ("PCI/LINK: Report degraded links via link bandwidth notification")] Link: https://lore.kernel.org/r/20241018144755.7875-7-ilpo.jarvinen@linux.intel.com Link: https://lore.kernel.org/all/20190429185611.121751-1-helgaas@kernel.org/ Link: https://lore.kernel.org/linux-pci/20190501142942.26972-1-keith.busch@intel.com/ Link: https://lore.kernel.org/linux-pci/20200115221008.GA191037@google.com/ Suggested-by: Lukas Wunner <lukas@wunner.de> # Building bwctrl on top of bwnotif Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> [bhelgaas: squash fix to drop IRQF_ONESHOT and convert to hardirq handler: https://lore.kernel.org/r/20241115165717.15233-1-ilpo.jarvinen@linux.intel.com] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Stefan Wahren <wahrenst@gmx.net> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-10-18 17:47:52 +03:00
*/
struct pcie_bwctrl_data {
PCI/bwctrl: Add pcie_set_target_speed() to set PCIe Link Speed Currently, PCIe Link Speeds are adjusted by custom code rather than in a common function provided in PCI core. The PCIe bandwidth controller (bwctrl) introduces an in-kernel API, pcie_set_target_speed(), to set PCIe Link Speed. Convert Target Speed quirk to use the new API. The Target Speed quirk runs very early when bwctrl is not yet probed for a Port and can also run later when bwctrl is already setup for the Port, which requires the per port mutex (set_speed_mutex) to be only taken if the bwctrl setup is already complete. The new API is also intended to be used in an upcoming commit that adds a thermal cooling device to throttle PCIe bandwidth when thermal thresholds are reached. The PCIe bandwidth control procedure is as follows. The highest speed supported by the Port and the PCIe device which is not higher than the requested speed is selected and written into the Target Link Speed in the Link Control 2 Register. Then bandwidth controller retrains the PCIe Link. Bandwidth Notifications enable the cur_bus_speed in the struct pci_bus to keep track PCIe Link Speed changes. While Bandwidth Notifications should also be generated when bandwidth controller alters the PCIe Link Speed, a few platforms do not deliver LMBS interrupt after Link Training as expected. Thus, after changing the Link Speed, bandwidth controller makes additional read for the Link Status Register to ensure cur_bus_speed is consistent with the new PCIe Link Speed. Link: https://lore.kernel.org/r/20241018144755.7875-8-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> [bhelgaas: squash devm_mutex_init() error checking from https://lore.kernel.org/r/20241030163139.2111689-1-andriy.shevchenko@linux.intel.com, drop export of pcie_set_target_speed()] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-10-18 17:47:53 +03:00
struct mutex set_speed_mutex;
struct thermal_cooling_device *cdev;
PCI/bwctrl: Re-add BW notification portdrv as PCIe BW controller This mostly reverts the commit b4c7d2076b4e ("PCI/LINK: Remove bandwidth notification"). An upcoming commit extends this driver building PCIe bandwidth controller on top of it. PCIe bandwidth notifications were first added in the commit e8303bb7a75c ("PCI/LINK: Report degraded links via link bandwidth notification") but later had to be removed. The significant changes compared with the old bandwidth notification driver include: 1) Don't print the notifications into kernel log, just keep the Link Speed cached in struct pci_bus updated. While somewhat unfortunate, the log spam was the source of complaints that eventually lead to the removal of the bandwidth notifications driver (see the links below for further information). 2) Besides the Link Bandwidth Management Interrupt, also enable Link Autonomous Bandwidth Interrupt to cover the other source of bandwidth changes. 3) Handle Link Speed updates robustly. Refresh the cached Link Speed when enabling Bandwidth Notification Interrupts, and solve the race between Link Speed read and LBMS/LABS update in pcie_bwnotif_irq_thread(). 4) Use concurrency safe LNKCTL RMW operations. 5) The driver is now called PCIe bwctrl (bandwidth controller) instead of just bandwidth notifications because of increased scope and functionality within the driver. 6) Coexist with the Target Link Speed quirk in pcie_failed_link_retrain(). Provide LBMS counting API for it. 7) Tweaks to variable/functions names for consistency and length reasons. Bandwidth Notifications enable the cur_bus_speed in the struct pci_bus to keep track PCIe Link Speed changes. [bhelgaas: This is based on previous work by Alexandru Gagniuc <mr.nuke.me@gmail.com>; see e8303bb7a75c ("PCI/LINK: Report degraded links via link bandwidth notification")] Link: https://lore.kernel.org/r/20241018144755.7875-7-ilpo.jarvinen@linux.intel.com Link: https://lore.kernel.org/all/20190429185611.121751-1-helgaas@kernel.org/ Link: https://lore.kernel.org/linux-pci/20190501142942.26972-1-keith.busch@intel.com/ Link: https://lore.kernel.org/linux-pci/20200115221008.GA191037@google.com/ Suggested-by: Lukas Wunner <lukas@wunner.de> # Building bwctrl on top of bwnotif Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> [bhelgaas: squash fix to drop IRQF_ONESHOT and convert to hardirq handler: https://lore.kernel.org/r/20241115165717.15233-1-ilpo.jarvinen@linux.intel.com] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Stefan Wahren <wahrenst@gmx.net> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-10-18 17:47:52 +03:00
};
PCI/bwctrl: Replace lbms_count with PCI_LINK_LBMS_SEEN flag PCIe BW controller counted LBMS assertions for the purposes of the Target Speed quirk (pcie_failed_link_retrain()). It was also a plan to expose the LBMS count through sysfs to allow better diagnosing link related issues. Lukas Wunner suggested, however, that adding a trace event would be better for diagnostics purposes, leaving only pcie_failed_link_retrain() as a user of the lbms_count. The logic in pcie_failed_link_retrain() does not require keeping count of LBMS assertions, so replace lbms_count with a simple flag in pci_dev's priv_flags. The reduced complexity allows removing pcie_bwctrl_lbms_rwsem. Since pcie_failed_link_retrain() runs before bwctrl is probed during boot, the LBMS in Link Status register still has to be checked by the quirk. The priv_flags numbering is not continuous because hotplug code added a few flags to fill numbers 4-5 (hotplug and bwctrl changes are routed through in different branches). Suggested-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> [bhelgaas: commit log] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> [kwilczynski: squashed a fix to resolve build failures from https://lore.kernel.org/all/20250508090036.1528-1-ilpo.jarvinen@linux.intel.com] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Lukas Wunner <lukas@wunner.de> Link: https://patch.msgid.link/20250422115548.1483-1-ilpo.jarvinen@linux.intel.com
2025-04-22 14:55:47 +03:00
/* Prevent port removal during Link Speed changes. */
PCI/bwctrl: Add pcie_set_target_speed() to set PCIe Link Speed Currently, PCIe Link Speeds are adjusted by custom code rather than in a common function provided in PCI core. The PCIe bandwidth controller (bwctrl) introduces an in-kernel API, pcie_set_target_speed(), to set PCIe Link Speed. Convert Target Speed quirk to use the new API. The Target Speed quirk runs very early when bwctrl is not yet probed for a Port and can also run later when bwctrl is already setup for the Port, which requires the per port mutex (set_speed_mutex) to be only taken if the bwctrl setup is already complete. The new API is also intended to be used in an upcoming commit that adds a thermal cooling device to throttle PCIe bandwidth when thermal thresholds are reached. The PCIe bandwidth control procedure is as follows. The highest speed supported by the Port and the PCIe device which is not higher than the requested speed is selected and written into the Target Link Speed in the Link Control 2 Register. Then bandwidth controller retrains the PCIe Link. Bandwidth Notifications enable the cur_bus_speed in the struct pci_bus to keep track PCIe Link Speed changes. While Bandwidth Notifications should also be generated when bandwidth controller alters the PCIe Link Speed, a few platforms do not deliver LMBS interrupt after Link Training as expected. Thus, after changing the Link Speed, bandwidth controller makes additional read for the Link Status Register to ensure cur_bus_speed is consistent with the new PCIe Link Speed. Link: https://lore.kernel.org/r/20241018144755.7875-8-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> [bhelgaas: squash devm_mutex_init() error checking from https://lore.kernel.org/r/20241030163139.2111689-1-andriy.shevchenko@linux.intel.com, drop export of pcie_set_target_speed()] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-10-18 17:47:53 +03:00
static DECLARE_RWSEM(pcie_bwctrl_setspeed_rwsem);
static bool pcie_valid_speed(enum pci_bus_speed speed)
{
return (speed >= PCIE_SPEED_2_5GT) && (speed <= PCIE_SPEED_64_0GT);
}
static u16 pci_bus_speed2lnkctl2(enum pci_bus_speed speed)
{
static const u8 speed_conv[] = {
[PCIE_SPEED_2_5GT] = PCI_EXP_LNKCTL2_TLS_2_5GT,
[PCIE_SPEED_5_0GT] = PCI_EXP_LNKCTL2_TLS_5_0GT,
[PCIE_SPEED_8_0GT] = PCI_EXP_LNKCTL2_TLS_8_0GT,
[PCIE_SPEED_16_0GT] = PCI_EXP_LNKCTL2_TLS_16_0GT,
[PCIE_SPEED_32_0GT] = PCI_EXP_LNKCTL2_TLS_32_0GT,
[PCIE_SPEED_64_0GT] = PCI_EXP_LNKCTL2_TLS_64_0GT,
};
if (WARN_ON_ONCE(!pcie_valid_speed(speed)))
return 0;
return speed_conv[speed];
}
static inline u16 pcie_supported_speeds2target_speed(u8 supported_speeds)
{
return __fls(supported_speeds);
}
/**
* pcie_bwctrl_select_speed - Select Target Link Speed
* @port: PCIe Port
* @speed_req: Requested PCIe Link Speed
*
* Select Target Link Speed by take into account Supported Link Speeds of
* both the Root Port and the Endpoint.
*
* Return: Target Link Speed (1=2.5GT/s, 2=5GT/s, 3=8GT/s, etc.)
*/
static u16 pcie_bwctrl_select_speed(struct pci_dev *port, enum pci_bus_speed speed_req)
{
struct pci_bus *bus = port->subordinate;
u8 desired_speeds, supported_speeds;
struct pci_dev *dev;
desired_speeds = GENMASK(pci_bus_speed2lnkctl2(speed_req),
__fls(PCI_EXP_LNKCAP2_SLS_2_5GB));
supported_speeds = port->supported_speeds;
if (bus) {
down_read(&pci_bus_sem);
dev = list_first_entry_or_null(&bus->devices, struct pci_dev, bus_list);
if (dev)
supported_speeds &= dev->supported_speeds;
up_read(&pci_bus_sem);
}
if (!supported_speeds)
supported_speeds = PCI_EXP_LNKCAP2_SLS_2_5GB;
PCI/bwctrl: Add pcie_set_target_speed() to set PCIe Link Speed Currently, PCIe Link Speeds are adjusted by custom code rather than in a common function provided in PCI core. The PCIe bandwidth controller (bwctrl) introduces an in-kernel API, pcie_set_target_speed(), to set PCIe Link Speed. Convert Target Speed quirk to use the new API. The Target Speed quirk runs very early when bwctrl is not yet probed for a Port and can also run later when bwctrl is already setup for the Port, which requires the per port mutex (set_speed_mutex) to be only taken if the bwctrl setup is already complete. The new API is also intended to be used in an upcoming commit that adds a thermal cooling device to throttle PCIe bandwidth when thermal thresholds are reached. The PCIe bandwidth control procedure is as follows. The highest speed supported by the Port and the PCIe device which is not higher than the requested speed is selected and written into the Target Link Speed in the Link Control 2 Register. Then bandwidth controller retrains the PCIe Link. Bandwidth Notifications enable the cur_bus_speed in the struct pci_bus to keep track PCIe Link Speed changes. While Bandwidth Notifications should also be generated when bandwidth controller alters the PCIe Link Speed, a few platforms do not deliver LMBS interrupt after Link Training as expected. Thus, after changing the Link Speed, bandwidth controller makes additional read for the Link Status Register to ensure cur_bus_speed is consistent with the new PCIe Link Speed. Link: https://lore.kernel.org/r/20241018144755.7875-8-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> [bhelgaas: squash devm_mutex_init() error checking from https://lore.kernel.org/r/20241030163139.2111689-1-andriy.shevchenko@linux.intel.com, drop export of pcie_set_target_speed()] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-10-18 17:47:53 +03:00
return pcie_supported_speeds2target_speed(supported_speeds & desired_speeds);
}
static int pcie_bwctrl_change_speed(struct pci_dev *port, u16 target_speed, bool use_lt)
{
int ret;
ret = pcie_capability_clear_and_set_word(port, PCI_EXP_LNKCTL2,
PCI_EXP_LNKCTL2_TLS, target_speed);
if (ret != PCIBIOS_SUCCESSFUL)
return pcibios_err_to_errno(ret);
PCI: Update Link Speed after retraining PCIe Link Retraining can alter Link Speed. pcie_retrain_link() that performs the Link Training is called from bwctrl and ASPM driver. While bwctrl listens for Link Bandwidth Management Status (LBMS) to pick up changes in Link Speed, there is a race between pcie_reset_lbms() clearing LBMS after the Link Training and pcie_bwnotif_irq() reading the Link Status register. If LBMS is already cleared when the irq handler reads the register, the interrupt handler will return early with IRQ_NONE and won't update the Link Speed. When Link Speed update originates from bwctrl, pcie_bwctrl_change_speed() ensures Link Speed is updated after the retraining. ASPM driver, however, calls pcie_retrain_link() but does not update the Link Speed after retraining which can result in stale Link Speed. Also, it is possible to have ASPM support with CONFIG_PCIEPORTBUS=n in which case bwctrl will not be built in (and thus won't update the Link Speed at all). To ensure Link Speed is not left stale after Link Training, move the call to pcie_update_link_speed() from pcie_bwctrl_change_speed() into pcie_retrain_link(). Suggested-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Lukas Wunner <lukas@wunner.de> Link: https://lore.kernel.org/linux-pci/aBCjpfyYmlkJ12AZ@wunner.de Link: https://lore.kernel.org/r/20250514132821.15705-1-ilpo.jarvinen@linux.intel.com
2025-05-14 16:28:21 +03:00
return pcie_retrain_link(port, use_lt);
PCI/bwctrl: Add pcie_set_target_speed() to set PCIe Link Speed Currently, PCIe Link Speeds are adjusted by custom code rather than in a common function provided in PCI core. The PCIe bandwidth controller (bwctrl) introduces an in-kernel API, pcie_set_target_speed(), to set PCIe Link Speed. Convert Target Speed quirk to use the new API. The Target Speed quirk runs very early when bwctrl is not yet probed for a Port and can also run later when bwctrl is already setup for the Port, which requires the per port mutex (set_speed_mutex) to be only taken if the bwctrl setup is already complete. The new API is also intended to be used in an upcoming commit that adds a thermal cooling device to throttle PCIe bandwidth when thermal thresholds are reached. The PCIe bandwidth control procedure is as follows. The highest speed supported by the Port and the PCIe device which is not higher than the requested speed is selected and written into the Target Link Speed in the Link Control 2 Register. Then bandwidth controller retrains the PCIe Link. Bandwidth Notifications enable the cur_bus_speed in the struct pci_bus to keep track PCIe Link Speed changes. While Bandwidth Notifications should also be generated when bandwidth controller alters the PCIe Link Speed, a few platforms do not deliver LMBS interrupt after Link Training as expected. Thus, after changing the Link Speed, bandwidth controller makes additional read for the Link Status Register to ensure cur_bus_speed is consistent with the new PCIe Link Speed. Link: https://lore.kernel.org/r/20241018144755.7875-8-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> [bhelgaas: squash devm_mutex_init() error checking from https://lore.kernel.org/r/20241030163139.2111689-1-andriy.shevchenko@linux.intel.com, drop export of pcie_set_target_speed()] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-10-18 17:47:53 +03:00
}
/**
* pcie_set_target_speed - Set downstream Link Speed for PCIe Port
* @port: PCIe Port
* @speed_req: Requested PCIe Link Speed
* @use_lt: Wait for the LT or DLLLA bit to detect the end of link training
*
* Attempt to set PCIe Port Link Speed to @speed_req. @speed_req may be
* adjusted downwards to the best speed supported by both the Port and PCIe
* Device underneath it.
*
* Return:
* * 0 - on success
* * -EINVAL - @speed_req is not a PCIe Link Speed
* * -ENODEV - @port is not controllable
* * -ETIMEDOUT - changing Link Speed took too long
* * -EAGAIN - Link Speed was changed but @speed_req was not achieved
*/
int pcie_set_target_speed(struct pci_dev *port, enum pci_bus_speed speed_req,
bool use_lt)
{
struct pci_bus *bus = port->subordinate;
u16 target_speed;
int ret;
if (WARN_ON_ONCE(!pcie_valid_speed(speed_req)))
return -EINVAL;
if (bus && bus->cur_bus_speed == speed_req)
return 0;
target_speed = pcie_bwctrl_select_speed(port, speed_req);
scoped_guard(rwsem_read, &pcie_bwctrl_setspeed_rwsem) {
struct pcie_bwctrl_data *data = port->link_bwctrl;
/*
* port->link_bwctrl is NULL during initial scan when called
* e.g. from the Target Speed quirk.
*/
if (data)
mutex_lock(&data->set_speed_mutex);
ret = pcie_bwctrl_change_speed(port, target_speed, use_lt);
if (data)
mutex_unlock(&data->set_speed_mutex);
}
/*
* Despite setting higher speed into the Target Link Speed, empty
* bus won't train to 5GT+ speeds.
*/
if (!ret && bus && bus->cur_bus_speed != speed_req &&
!list_empty(&bus->devices))
ret = -EAGAIN;
return ret;
}
PCI/bwctrl: Re-add BW notification portdrv as PCIe BW controller This mostly reverts the commit b4c7d2076b4e ("PCI/LINK: Remove bandwidth notification"). An upcoming commit extends this driver building PCIe bandwidth controller on top of it. PCIe bandwidth notifications were first added in the commit e8303bb7a75c ("PCI/LINK: Report degraded links via link bandwidth notification") but later had to be removed. The significant changes compared with the old bandwidth notification driver include: 1) Don't print the notifications into kernel log, just keep the Link Speed cached in struct pci_bus updated. While somewhat unfortunate, the log spam was the source of complaints that eventually lead to the removal of the bandwidth notifications driver (see the links below for further information). 2) Besides the Link Bandwidth Management Interrupt, also enable Link Autonomous Bandwidth Interrupt to cover the other source of bandwidth changes. 3) Handle Link Speed updates robustly. Refresh the cached Link Speed when enabling Bandwidth Notification Interrupts, and solve the race between Link Speed read and LBMS/LABS update in pcie_bwnotif_irq_thread(). 4) Use concurrency safe LNKCTL RMW operations. 5) The driver is now called PCIe bwctrl (bandwidth controller) instead of just bandwidth notifications because of increased scope and functionality within the driver. 6) Coexist with the Target Link Speed quirk in pcie_failed_link_retrain(). Provide LBMS counting API for it. 7) Tweaks to variable/functions names for consistency and length reasons. Bandwidth Notifications enable the cur_bus_speed in the struct pci_bus to keep track PCIe Link Speed changes. [bhelgaas: This is based on previous work by Alexandru Gagniuc <mr.nuke.me@gmail.com>; see e8303bb7a75c ("PCI/LINK: Report degraded links via link bandwidth notification")] Link: https://lore.kernel.org/r/20241018144755.7875-7-ilpo.jarvinen@linux.intel.com Link: https://lore.kernel.org/all/20190429185611.121751-1-helgaas@kernel.org/ Link: https://lore.kernel.org/linux-pci/20190501142942.26972-1-keith.busch@intel.com/ Link: https://lore.kernel.org/linux-pci/20200115221008.GA191037@google.com/ Suggested-by: Lukas Wunner <lukas@wunner.de> # Building bwctrl on top of bwnotif Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> [bhelgaas: squash fix to drop IRQF_ONESHOT and convert to hardirq handler: https://lore.kernel.org/r/20241115165717.15233-1-ilpo.jarvinen@linux.intel.com] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Stefan Wahren <wahrenst@gmx.net> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-10-18 17:47:52 +03:00
static void pcie_bwnotif_enable(struct pcie_device *srv)
{
struct pci_dev *port = srv->port;
u16 link_status;
int ret;
PCI/bwctrl: Replace lbms_count with PCI_LINK_LBMS_SEEN flag PCIe BW controller counted LBMS assertions for the purposes of the Target Speed quirk (pcie_failed_link_retrain()). It was also a plan to expose the LBMS count through sysfs to allow better diagnosing link related issues. Lukas Wunner suggested, however, that adding a trace event would be better for diagnostics purposes, leaving only pcie_failed_link_retrain() as a user of the lbms_count. The logic in pcie_failed_link_retrain() does not require keeping count of LBMS assertions, so replace lbms_count with a simple flag in pci_dev's priv_flags. The reduced complexity allows removing pcie_bwctrl_lbms_rwsem. Since pcie_failed_link_retrain() runs before bwctrl is probed during boot, the LBMS in Link Status register still has to be checked by the quirk. The priv_flags numbering is not continuous because hotplug code added a few flags to fill numbers 4-5 (hotplug and bwctrl changes are routed through in different branches). Suggested-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> [bhelgaas: commit log] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> [kwilczynski: squashed a fix to resolve build failures from https://lore.kernel.org/all/20250508090036.1528-1-ilpo.jarvinen@linux.intel.com] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Lukas Wunner <lukas@wunner.de> Link: https://patch.msgid.link/20250422115548.1483-1-ilpo.jarvinen@linux.intel.com
2025-04-22 14:55:47 +03:00
/* Note if LBMS has been seen so far */
PCI/bwctrl: Re-add BW notification portdrv as PCIe BW controller This mostly reverts the commit b4c7d2076b4e ("PCI/LINK: Remove bandwidth notification"). An upcoming commit extends this driver building PCIe bandwidth controller on top of it. PCIe bandwidth notifications were first added in the commit e8303bb7a75c ("PCI/LINK: Report degraded links via link bandwidth notification") but later had to be removed. The significant changes compared with the old bandwidth notification driver include: 1) Don't print the notifications into kernel log, just keep the Link Speed cached in struct pci_bus updated. While somewhat unfortunate, the log spam was the source of complaints that eventually lead to the removal of the bandwidth notifications driver (see the links below for further information). 2) Besides the Link Bandwidth Management Interrupt, also enable Link Autonomous Bandwidth Interrupt to cover the other source of bandwidth changes. 3) Handle Link Speed updates robustly. Refresh the cached Link Speed when enabling Bandwidth Notification Interrupts, and solve the race between Link Speed read and LBMS/LABS update in pcie_bwnotif_irq_thread(). 4) Use concurrency safe LNKCTL RMW operations. 5) The driver is now called PCIe bwctrl (bandwidth controller) instead of just bandwidth notifications because of increased scope and functionality within the driver. 6) Coexist with the Target Link Speed quirk in pcie_failed_link_retrain(). Provide LBMS counting API for it. 7) Tweaks to variable/functions names for consistency and length reasons. Bandwidth Notifications enable the cur_bus_speed in the struct pci_bus to keep track PCIe Link Speed changes. [bhelgaas: This is based on previous work by Alexandru Gagniuc <mr.nuke.me@gmail.com>; see e8303bb7a75c ("PCI/LINK: Report degraded links via link bandwidth notification")] Link: https://lore.kernel.org/r/20241018144755.7875-7-ilpo.jarvinen@linux.intel.com Link: https://lore.kernel.org/all/20190429185611.121751-1-helgaas@kernel.org/ Link: https://lore.kernel.org/linux-pci/20190501142942.26972-1-keith.busch@intel.com/ Link: https://lore.kernel.org/linux-pci/20200115221008.GA191037@google.com/ Suggested-by: Lukas Wunner <lukas@wunner.de> # Building bwctrl on top of bwnotif Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> [bhelgaas: squash fix to drop IRQF_ONESHOT and convert to hardirq handler: https://lore.kernel.org/r/20241115165717.15233-1-ilpo.jarvinen@linux.intel.com] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Stefan Wahren <wahrenst@gmx.net> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-10-18 17:47:52 +03:00
ret = pcie_capability_read_word(port, PCI_EXP_LNKSTA, &link_status);
if (ret == PCIBIOS_SUCCESSFUL && link_status & PCI_EXP_LNKSTA_LBMS)
PCI/bwctrl: Replace lbms_count with PCI_LINK_LBMS_SEEN flag PCIe BW controller counted LBMS assertions for the purposes of the Target Speed quirk (pcie_failed_link_retrain()). It was also a plan to expose the LBMS count through sysfs to allow better diagnosing link related issues. Lukas Wunner suggested, however, that adding a trace event would be better for diagnostics purposes, leaving only pcie_failed_link_retrain() as a user of the lbms_count. The logic in pcie_failed_link_retrain() does not require keeping count of LBMS assertions, so replace lbms_count with a simple flag in pci_dev's priv_flags. The reduced complexity allows removing pcie_bwctrl_lbms_rwsem. Since pcie_failed_link_retrain() runs before bwctrl is probed during boot, the LBMS in Link Status register still has to be checked by the quirk. The priv_flags numbering is not continuous because hotplug code added a few flags to fill numbers 4-5 (hotplug and bwctrl changes are routed through in different branches). Suggested-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> [bhelgaas: commit log] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> [kwilczynski: squashed a fix to resolve build failures from https://lore.kernel.org/all/20250508090036.1528-1-ilpo.jarvinen@linux.intel.com] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Lukas Wunner <lukas@wunner.de> Link: https://patch.msgid.link/20250422115548.1483-1-ilpo.jarvinen@linux.intel.com
2025-04-22 14:55:47 +03:00
set_bit(PCI_LINK_LBMS_SEEN, &port->priv_flags);
PCI/bwctrl: Re-add BW notification portdrv as PCIe BW controller This mostly reverts the commit b4c7d2076b4e ("PCI/LINK: Remove bandwidth notification"). An upcoming commit extends this driver building PCIe bandwidth controller on top of it. PCIe bandwidth notifications were first added in the commit e8303bb7a75c ("PCI/LINK: Report degraded links via link bandwidth notification") but later had to be removed. The significant changes compared with the old bandwidth notification driver include: 1) Don't print the notifications into kernel log, just keep the Link Speed cached in struct pci_bus updated. While somewhat unfortunate, the log spam was the source of complaints that eventually lead to the removal of the bandwidth notifications driver (see the links below for further information). 2) Besides the Link Bandwidth Management Interrupt, also enable Link Autonomous Bandwidth Interrupt to cover the other source of bandwidth changes. 3) Handle Link Speed updates robustly. Refresh the cached Link Speed when enabling Bandwidth Notification Interrupts, and solve the race between Link Speed read and LBMS/LABS update in pcie_bwnotif_irq_thread(). 4) Use concurrency safe LNKCTL RMW operations. 5) The driver is now called PCIe bwctrl (bandwidth controller) instead of just bandwidth notifications because of increased scope and functionality within the driver. 6) Coexist with the Target Link Speed quirk in pcie_failed_link_retrain(). Provide LBMS counting API for it. 7) Tweaks to variable/functions names for consistency and length reasons. Bandwidth Notifications enable the cur_bus_speed in the struct pci_bus to keep track PCIe Link Speed changes. [bhelgaas: This is based on previous work by Alexandru Gagniuc <mr.nuke.me@gmail.com>; see e8303bb7a75c ("PCI/LINK: Report degraded links via link bandwidth notification")] Link: https://lore.kernel.org/r/20241018144755.7875-7-ilpo.jarvinen@linux.intel.com Link: https://lore.kernel.org/all/20190429185611.121751-1-helgaas@kernel.org/ Link: https://lore.kernel.org/linux-pci/20190501142942.26972-1-keith.busch@intel.com/ Link: https://lore.kernel.org/linux-pci/20200115221008.GA191037@google.com/ Suggested-by: Lukas Wunner <lukas@wunner.de> # Building bwctrl on top of bwnotif Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> [bhelgaas: squash fix to drop IRQF_ONESHOT and convert to hardirq handler: https://lore.kernel.org/r/20241115165717.15233-1-ilpo.jarvinen@linux.intel.com] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Stefan Wahren <wahrenst@gmx.net> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-10-18 17:47:52 +03:00
pcie_capability_set_word(port, PCI_EXP_LNKCTL,
PCI_EXP_LNKCTL_LBMIE | PCI_EXP_LNKCTL_LABIE);
pcie_capability_write_word(port, PCI_EXP_LNKSTA,
PCI_EXP_LNKSTA_LBMS | PCI_EXP_LNKSTA_LABS);
/*
* Update after enabling notifications & clearing status bits ensures
* link speed is up to date.
*/
pcie_update_link_speed(port->subordinate);
}
static void pcie_bwnotif_disable(struct pci_dev *port)
{
pcie_capability_clear_word(port, PCI_EXP_LNKCTL,
PCI_EXP_LNKCTL_LBMIE | PCI_EXP_LNKCTL_LABIE);
}
static irqreturn_t pcie_bwnotif_irq(int irq, void *context)
{
struct pcie_device *srv = context;
struct pci_dev *port = srv->port;
u16 link_status, events;
int ret;
ret = pcie_capability_read_word(port, PCI_EXP_LNKSTA, &link_status);
if (ret != PCIBIOS_SUCCESSFUL)
return IRQ_NONE;
events = link_status & (PCI_EXP_LNKSTA_LBMS | PCI_EXP_LNKSTA_LABS);
if (!events)
return IRQ_NONE;
if (events & PCI_EXP_LNKSTA_LBMS)
PCI/bwctrl: Replace lbms_count with PCI_LINK_LBMS_SEEN flag PCIe BW controller counted LBMS assertions for the purposes of the Target Speed quirk (pcie_failed_link_retrain()). It was also a plan to expose the LBMS count through sysfs to allow better diagnosing link related issues. Lukas Wunner suggested, however, that adding a trace event would be better for diagnostics purposes, leaving only pcie_failed_link_retrain() as a user of the lbms_count. The logic in pcie_failed_link_retrain() does not require keeping count of LBMS assertions, so replace lbms_count with a simple flag in pci_dev's priv_flags. The reduced complexity allows removing pcie_bwctrl_lbms_rwsem. Since pcie_failed_link_retrain() runs before bwctrl is probed during boot, the LBMS in Link Status register still has to be checked by the quirk. The priv_flags numbering is not continuous because hotplug code added a few flags to fill numbers 4-5 (hotplug and bwctrl changes are routed through in different branches). Suggested-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> [bhelgaas: commit log] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> [kwilczynski: squashed a fix to resolve build failures from https://lore.kernel.org/all/20250508090036.1528-1-ilpo.jarvinen@linux.intel.com] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Lukas Wunner <lukas@wunner.de> Link: https://patch.msgid.link/20250422115548.1483-1-ilpo.jarvinen@linux.intel.com
2025-04-22 14:55:47 +03:00
set_bit(PCI_LINK_LBMS_SEEN, &port->priv_flags);
PCI/bwctrl: Re-add BW notification portdrv as PCIe BW controller This mostly reverts the commit b4c7d2076b4e ("PCI/LINK: Remove bandwidth notification"). An upcoming commit extends this driver building PCIe bandwidth controller on top of it. PCIe bandwidth notifications were first added in the commit e8303bb7a75c ("PCI/LINK: Report degraded links via link bandwidth notification") but later had to be removed. The significant changes compared with the old bandwidth notification driver include: 1) Don't print the notifications into kernel log, just keep the Link Speed cached in struct pci_bus updated. While somewhat unfortunate, the log spam was the source of complaints that eventually lead to the removal of the bandwidth notifications driver (see the links below for further information). 2) Besides the Link Bandwidth Management Interrupt, also enable Link Autonomous Bandwidth Interrupt to cover the other source of bandwidth changes. 3) Handle Link Speed updates robustly. Refresh the cached Link Speed when enabling Bandwidth Notification Interrupts, and solve the race between Link Speed read and LBMS/LABS update in pcie_bwnotif_irq_thread(). 4) Use concurrency safe LNKCTL RMW operations. 5) The driver is now called PCIe bwctrl (bandwidth controller) instead of just bandwidth notifications because of increased scope and functionality within the driver. 6) Coexist with the Target Link Speed quirk in pcie_failed_link_retrain(). Provide LBMS counting API for it. 7) Tweaks to variable/functions names for consistency and length reasons. Bandwidth Notifications enable the cur_bus_speed in the struct pci_bus to keep track PCIe Link Speed changes. [bhelgaas: This is based on previous work by Alexandru Gagniuc <mr.nuke.me@gmail.com>; see e8303bb7a75c ("PCI/LINK: Report degraded links via link bandwidth notification")] Link: https://lore.kernel.org/r/20241018144755.7875-7-ilpo.jarvinen@linux.intel.com Link: https://lore.kernel.org/all/20190429185611.121751-1-helgaas@kernel.org/ Link: https://lore.kernel.org/linux-pci/20190501142942.26972-1-keith.busch@intel.com/ Link: https://lore.kernel.org/linux-pci/20200115221008.GA191037@google.com/ Suggested-by: Lukas Wunner <lukas@wunner.de> # Building bwctrl on top of bwnotif Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> [bhelgaas: squash fix to drop IRQF_ONESHOT and convert to hardirq handler: https://lore.kernel.org/r/20241115165717.15233-1-ilpo.jarvinen@linux.intel.com] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Stefan Wahren <wahrenst@gmx.net> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-10-18 17:47:52 +03:00
pcie_capability_write_word(port, PCI_EXP_LNKSTA, events);
/*
* Interrupts will not be triggered from any further Link Speed
* change until LBMS is cleared by the write. Therefore, re-read the
* speed (inside pcie_update_link_speed()) after LBMS has been
* cleared to avoid missing link speed changes.
*/
pcie_update_link_speed(port->subordinate);
return IRQ_HANDLED;
}
PCI/bwctrl: Replace lbms_count with PCI_LINK_LBMS_SEEN flag PCIe BW controller counted LBMS assertions for the purposes of the Target Speed quirk (pcie_failed_link_retrain()). It was also a plan to expose the LBMS count through sysfs to allow better diagnosing link related issues. Lukas Wunner suggested, however, that adding a trace event would be better for diagnostics purposes, leaving only pcie_failed_link_retrain() as a user of the lbms_count. The logic in pcie_failed_link_retrain() does not require keeping count of LBMS assertions, so replace lbms_count with a simple flag in pci_dev's priv_flags. The reduced complexity allows removing pcie_bwctrl_lbms_rwsem. Since pcie_failed_link_retrain() runs before bwctrl is probed during boot, the LBMS in Link Status register still has to be checked by the quirk. The priv_flags numbering is not continuous because hotplug code added a few flags to fill numbers 4-5 (hotplug and bwctrl changes are routed through in different branches). Suggested-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> [bhelgaas: commit log] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> [kwilczynski: squashed a fix to resolve build failures from https://lore.kernel.org/all/20250508090036.1528-1-ilpo.jarvinen@linux.intel.com] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Lukas Wunner <lukas@wunner.de> Link: https://patch.msgid.link/20250422115548.1483-1-ilpo.jarvinen@linux.intel.com
2025-04-22 14:55:47 +03:00
void pcie_reset_lbms(struct pci_dev *port)
PCI/bwctrl: Re-add BW notification portdrv as PCIe BW controller This mostly reverts the commit b4c7d2076b4e ("PCI/LINK: Remove bandwidth notification"). An upcoming commit extends this driver building PCIe bandwidth controller on top of it. PCIe bandwidth notifications were first added in the commit e8303bb7a75c ("PCI/LINK: Report degraded links via link bandwidth notification") but later had to be removed. The significant changes compared with the old bandwidth notification driver include: 1) Don't print the notifications into kernel log, just keep the Link Speed cached in struct pci_bus updated. While somewhat unfortunate, the log spam was the source of complaints that eventually lead to the removal of the bandwidth notifications driver (see the links below for further information). 2) Besides the Link Bandwidth Management Interrupt, also enable Link Autonomous Bandwidth Interrupt to cover the other source of bandwidth changes. 3) Handle Link Speed updates robustly. Refresh the cached Link Speed when enabling Bandwidth Notification Interrupts, and solve the race between Link Speed read and LBMS/LABS update in pcie_bwnotif_irq_thread(). 4) Use concurrency safe LNKCTL RMW operations. 5) The driver is now called PCIe bwctrl (bandwidth controller) instead of just bandwidth notifications because of increased scope and functionality within the driver. 6) Coexist with the Target Link Speed quirk in pcie_failed_link_retrain(). Provide LBMS counting API for it. 7) Tweaks to variable/functions names for consistency and length reasons. Bandwidth Notifications enable the cur_bus_speed in the struct pci_bus to keep track PCIe Link Speed changes. [bhelgaas: This is based on previous work by Alexandru Gagniuc <mr.nuke.me@gmail.com>; see e8303bb7a75c ("PCI/LINK: Report degraded links via link bandwidth notification")] Link: https://lore.kernel.org/r/20241018144755.7875-7-ilpo.jarvinen@linux.intel.com Link: https://lore.kernel.org/all/20190429185611.121751-1-helgaas@kernel.org/ Link: https://lore.kernel.org/linux-pci/20190501142942.26972-1-keith.busch@intel.com/ Link: https://lore.kernel.org/linux-pci/20200115221008.GA191037@google.com/ Suggested-by: Lukas Wunner <lukas@wunner.de> # Building bwctrl on top of bwnotif Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> [bhelgaas: squash fix to drop IRQF_ONESHOT and convert to hardirq handler: https://lore.kernel.org/r/20241115165717.15233-1-ilpo.jarvinen@linux.intel.com] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Stefan Wahren <wahrenst@gmx.net> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-10-18 17:47:52 +03:00
{
PCI/bwctrl: Replace lbms_count with PCI_LINK_LBMS_SEEN flag PCIe BW controller counted LBMS assertions for the purposes of the Target Speed quirk (pcie_failed_link_retrain()). It was also a plan to expose the LBMS count through sysfs to allow better diagnosing link related issues. Lukas Wunner suggested, however, that adding a trace event would be better for diagnostics purposes, leaving only pcie_failed_link_retrain() as a user of the lbms_count. The logic in pcie_failed_link_retrain() does not require keeping count of LBMS assertions, so replace lbms_count with a simple flag in pci_dev's priv_flags. The reduced complexity allows removing pcie_bwctrl_lbms_rwsem. Since pcie_failed_link_retrain() runs before bwctrl is probed during boot, the LBMS in Link Status register still has to be checked by the quirk. The priv_flags numbering is not continuous because hotplug code added a few flags to fill numbers 4-5 (hotplug and bwctrl changes are routed through in different branches). Suggested-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> [bhelgaas: commit log] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> [kwilczynski: squashed a fix to resolve build failures from https://lore.kernel.org/all/20250508090036.1528-1-ilpo.jarvinen@linux.intel.com] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Lukas Wunner <lukas@wunner.de> Link: https://patch.msgid.link/20250422115548.1483-1-ilpo.jarvinen@linux.intel.com
2025-04-22 14:55:47 +03:00
clear_bit(PCI_LINK_LBMS_SEEN, &port->priv_flags);
pcie_capability_write_word(port, PCI_EXP_LNKSTA, PCI_EXP_LNKSTA_LBMS);
PCI/bwctrl: Re-add BW notification portdrv as PCIe BW controller This mostly reverts the commit b4c7d2076b4e ("PCI/LINK: Remove bandwidth notification"). An upcoming commit extends this driver building PCIe bandwidth controller on top of it. PCIe bandwidth notifications were first added in the commit e8303bb7a75c ("PCI/LINK: Report degraded links via link bandwidth notification") but later had to be removed. The significant changes compared with the old bandwidth notification driver include: 1) Don't print the notifications into kernel log, just keep the Link Speed cached in struct pci_bus updated. While somewhat unfortunate, the log spam was the source of complaints that eventually lead to the removal of the bandwidth notifications driver (see the links below for further information). 2) Besides the Link Bandwidth Management Interrupt, also enable Link Autonomous Bandwidth Interrupt to cover the other source of bandwidth changes. 3) Handle Link Speed updates robustly. Refresh the cached Link Speed when enabling Bandwidth Notification Interrupts, and solve the race between Link Speed read and LBMS/LABS update in pcie_bwnotif_irq_thread(). 4) Use concurrency safe LNKCTL RMW operations. 5) The driver is now called PCIe bwctrl (bandwidth controller) instead of just bandwidth notifications because of increased scope and functionality within the driver. 6) Coexist with the Target Link Speed quirk in pcie_failed_link_retrain(). Provide LBMS counting API for it. 7) Tweaks to variable/functions names for consistency and length reasons. Bandwidth Notifications enable the cur_bus_speed in the struct pci_bus to keep track PCIe Link Speed changes. [bhelgaas: This is based on previous work by Alexandru Gagniuc <mr.nuke.me@gmail.com>; see e8303bb7a75c ("PCI/LINK: Report degraded links via link bandwidth notification")] Link: https://lore.kernel.org/r/20241018144755.7875-7-ilpo.jarvinen@linux.intel.com Link: https://lore.kernel.org/all/20190429185611.121751-1-helgaas@kernel.org/ Link: https://lore.kernel.org/linux-pci/20190501142942.26972-1-keith.busch@intel.com/ Link: https://lore.kernel.org/linux-pci/20200115221008.GA191037@google.com/ Suggested-by: Lukas Wunner <lukas@wunner.de> # Building bwctrl on top of bwnotif Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> [bhelgaas: squash fix to drop IRQF_ONESHOT and convert to hardirq handler: https://lore.kernel.org/r/20241115165717.15233-1-ilpo.jarvinen@linux.intel.com] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Stefan Wahren <wahrenst@gmx.net> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-10-18 17:47:52 +03:00
}
static int pcie_bwnotif_probe(struct pcie_device *srv)
{
struct pci_dev *port = srv->port;
int ret;
PCI/bwctrl: Fix NULL pointer dereference on bus number exhaustion When BIOS neglects to assign bus numbers to PCI bridges, the kernel attempts to correct that during PCI device enumeration. If it runs out of bus numbers, no pci_bus is allocated and the "subordinate" pointer in the bridge's pci_dev remains NULL. The PCIe bandwidth controller erroneously does not check for a NULL subordinate pointer and dereferences it on probe. Bandwidth control of unusable devices below the bridge is of questionable utility, so simply error out instead. This mirrors what PCIe hotplug does since commit 62e4492c3063 ("PCI: Prevent NULL dereference during pciehp probe"). The PCI core emits a message with KERN_INFO severity if it has run out of bus numbers. PCIe hotplug emits an additional message with KERN_ERR severity to inform the user that hotplug functionality is disabled at the bridge. A similar message for bandwidth control does not seem merited, given that its only purpose so far is to expose an up-to-date link speed in sysfs and throttle the link speed on certain laptops with limited Thermal Design Power. So error out silently. User-visible messages: pci 0000:16:02.0: bridge configuration invalid ([bus 00-00]), reconfiguring [...] pci_bus 0000:45: busn_res: [bus 45-74] end is updated to 74 pci 0000:16:02.0: devices behind bridge are unusable because [bus 45-74] cannot be assigned for them [...] pcieport 0000:16:02.0: pciehp: Hotplug bridge without secondary bus, ignoring [...] BUG: kernel NULL pointer dereference RIP: pcie_update_link_speed pcie_bwnotif_enable pcie_bwnotif_probe pcie_port_probe_service really_probe Fixes: 665745f27487 ("PCI/bwctrl: Re-add BW notification portdrv as PCIe BW controller") Reported-by: Wouter Bijlsma <wouter@wouterbijlsma.nl> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219906 Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Tested-by: Wouter Bijlsma <wouter@wouterbijlsma.nl> Cc: stable@vger.kernel.org # v6.13+ Link: https://lore.kernel.org/r/3b6c8d973aedc48860640a9d75d20528336f1f3c.1742669372.git.lukas@wunner.de
2025-03-22 19:52:08 +01:00
/* Can happen if we run out of bus numbers during enumeration. */
if (!port->subordinate)
return -ENODEV;
PCI/bwctrl: Re-add BW notification portdrv as PCIe BW controller This mostly reverts the commit b4c7d2076b4e ("PCI/LINK: Remove bandwidth notification"). An upcoming commit extends this driver building PCIe bandwidth controller on top of it. PCIe bandwidth notifications were first added in the commit e8303bb7a75c ("PCI/LINK: Report degraded links via link bandwidth notification") but later had to be removed. The significant changes compared with the old bandwidth notification driver include: 1) Don't print the notifications into kernel log, just keep the Link Speed cached in struct pci_bus updated. While somewhat unfortunate, the log spam was the source of complaints that eventually lead to the removal of the bandwidth notifications driver (see the links below for further information). 2) Besides the Link Bandwidth Management Interrupt, also enable Link Autonomous Bandwidth Interrupt to cover the other source of bandwidth changes. 3) Handle Link Speed updates robustly. Refresh the cached Link Speed when enabling Bandwidth Notification Interrupts, and solve the race between Link Speed read and LBMS/LABS update in pcie_bwnotif_irq_thread(). 4) Use concurrency safe LNKCTL RMW operations. 5) The driver is now called PCIe bwctrl (bandwidth controller) instead of just bandwidth notifications because of increased scope and functionality within the driver. 6) Coexist with the Target Link Speed quirk in pcie_failed_link_retrain(). Provide LBMS counting API for it. 7) Tweaks to variable/functions names for consistency and length reasons. Bandwidth Notifications enable the cur_bus_speed in the struct pci_bus to keep track PCIe Link Speed changes. [bhelgaas: This is based on previous work by Alexandru Gagniuc <mr.nuke.me@gmail.com>; see e8303bb7a75c ("PCI/LINK: Report degraded links via link bandwidth notification")] Link: https://lore.kernel.org/r/20241018144755.7875-7-ilpo.jarvinen@linux.intel.com Link: https://lore.kernel.org/all/20190429185611.121751-1-helgaas@kernel.org/ Link: https://lore.kernel.org/linux-pci/20190501142942.26972-1-keith.busch@intel.com/ Link: https://lore.kernel.org/linux-pci/20200115221008.GA191037@google.com/ Suggested-by: Lukas Wunner <lukas@wunner.de> # Building bwctrl on top of bwnotif Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> [bhelgaas: squash fix to drop IRQF_ONESHOT and convert to hardirq handler: https://lore.kernel.org/r/20241115165717.15233-1-ilpo.jarvinen@linux.intel.com] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Stefan Wahren <wahrenst@gmx.net> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-10-18 17:47:52 +03:00
struct pcie_bwctrl_data *data = devm_kzalloc(&srv->device,
sizeof(*data), GFP_KERNEL);
if (!data)
return -ENOMEM;
PCI/bwctrl: Add pcie_set_target_speed() to set PCIe Link Speed Currently, PCIe Link Speeds are adjusted by custom code rather than in a common function provided in PCI core. The PCIe bandwidth controller (bwctrl) introduces an in-kernel API, pcie_set_target_speed(), to set PCIe Link Speed. Convert Target Speed quirk to use the new API. The Target Speed quirk runs very early when bwctrl is not yet probed for a Port and can also run later when bwctrl is already setup for the Port, which requires the per port mutex (set_speed_mutex) to be only taken if the bwctrl setup is already complete. The new API is also intended to be used in an upcoming commit that adds a thermal cooling device to throttle PCIe bandwidth when thermal thresholds are reached. The PCIe bandwidth control procedure is as follows. The highest speed supported by the Port and the PCIe device which is not higher than the requested speed is selected and written into the Target Link Speed in the Link Control 2 Register. Then bandwidth controller retrains the PCIe Link. Bandwidth Notifications enable the cur_bus_speed in the struct pci_bus to keep track PCIe Link Speed changes. While Bandwidth Notifications should also be generated when bandwidth controller alters the PCIe Link Speed, a few platforms do not deliver LMBS interrupt after Link Training as expected. Thus, after changing the Link Speed, bandwidth controller makes additional read for the Link Status Register to ensure cur_bus_speed is consistent with the new PCIe Link Speed. Link: https://lore.kernel.org/r/20241018144755.7875-8-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> [bhelgaas: squash devm_mutex_init() error checking from https://lore.kernel.org/r/20241030163139.2111689-1-andriy.shevchenko@linux.intel.com, drop export of pcie_set_target_speed()] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-10-18 17:47:53 +03:00
ret = devm_mutex_init(&srv->device, &data->set_speed_mutex);
if (ret)
return ret;
scoped_guard(rwsem_write, &pcie_bwctrl_setspeed_rwsem) {
PCI/bwctrl: Replace lbms_count with PCI_LINK_LBMS_SEEN flag PCIe BW controller counted LBMS assertions for the purposes of the Target Speed quirk (pcie_failed_link_retrain()). It was also a plan to expose the LBMS count through sysfs to allow better diagnosing link related issues. Lukas Wunner suggested, however, that adding a trace event would be better for diagnostics purposes, leaving only pcie_failed_link_retrain() as a user of the lbms_count. The logic in pcie_failed_link_retrain() does not require keeping count of LBMS assertions, so replace lbms_count with a simple flag in pci_dev's priv_flags. The reduced complexity allows removing pcie_bwctrl_lbms_rwsem. Since pcie_failed_link_retrain() runs before bwctrl is probed during boot, the LBMS in Link Status register still has to be checked by the quirk. The priv_flags numbering is not continuous because hotplug code added a few flags to fill numbers 4-5 (hotplug and bwctrl changes are routed through in different branches). Suggested-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> [bhelgaas: commit log] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> [kwilczynski: squashed a fix to resolve build failures from https://lore.kernel.org/all/20250508090036.1528-1-ilpo.jarvinen@linux.intel.com] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Lukas Wunner <lukas@wunner.de> Link: https://patch.msgid.link/20250422115548.1483-1-ilpo.jarvinen@linux.intel.com
2025-04-22 14:55:47 +03:00
port->link_bwctrl = data;
PCI/bwctrl: Fix NULL pointer deref on unbind and bind The interrupt handler for bandwidth notifications, pcie_bwnotif_irq(), dereferences a "data" pointer. On unbind, that pointer is set to NULL by pcie_bwnotif_remove(). However the interrupt handler may still be invoked afterwards and will dereference that NULL pointer. That's because the interrupt is requested using a devm_*() helper and the driver core releases devm_*() resources *after* calling ->remove(). pcie_bwnotif_remove() does clear the Link Bandwidth Management Interrupt Enable and Link Autonomous Bandwidth Interrupt Enable bits in the Link Control Register, but that won't prevent execution of pcie_bwnotif_irq(): The interrupt for bandwidth notifications may be shared with AER, DPC, PME, and hotplug. So pcie_bwnotif_irq() may be executed as long as the interrupt is requested. There's a similar race on bind: pcie_bwnotif_probe() requests the interrupt when the "data" pointer still points to NULL. A NULL pointer deref may thus likewise occur if AER, DPC, PME or hotplug raise an interrupt in-between the bandwidth controller's call to devm_request_irq() and assignment of the "data" pointer. Drop the devm_*() usage and reorder requesting of the interrupt to fix the issue. While at it, drop a stray but harmless no_free_ptr() invocation when assigning the "data" pointer in pcie_bwnotif_probe(). Ilpo points out that the locking on unbind and bind needs to be symmetric, so move the call to pcie_bwnotif_disable() inside the critical section protected by pcie_bwctrl_setspeed_rwsem and pcie_bwctrl_lbms_rwsem. Evert reports a hang on shutdown of an ASUS ROG Strix SCAR 17 G733PYV. The issue is no longer reproducible with the present commit. Evert found that attaching a USB-C monitor prevented the hang. The machine contains an ASMedia USB 3.2 controller below a hotplug-capable Root Port. So one possible explanation is that the controller gets hot-removed on shutdown unless something is connected. And the ensuing hotplug interrupt occurs exactly when the bandwidth controller is unregistering. The precise cause could not be determined because the screen had already turned black when the hang occurred. Link: https://lore.kernel.org/r/ae2b02c9cfbefff475b6e132b0aa962aaccbd7b2.1736162539.git.lukas@wunner.de Fixes: 665745f27487 ("PCI/bwctrl: Re-add BW notification portdrv as PCIe BW controller") Reported-by: Evert Vorster <evorster@gmail.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219629 Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Evert Vorster <evorster@gmail.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-01-06 12:26:35 +01:00
PCI/bwctrl: Replace lbms_count with PCI_LINK_LBMS_SEEN flag PCIe BW controller counted LBMS assertions for the purposes of the Target Speed quirk (pcie_failed_link_retrain()). It was also a plan to expose the LBMS count through sysfs to allow better diagnosing link related issues. Lukas Wunner suggested, however, that adding a trace event would be better for diagnostics purposes, leaving only pcie_failed_link_retrain() as a user of the lbms_count. The logic in pcie_failed_link_retrain() does not require keeping count of LBMS assertions, so replace lbms_count with a simple flag in pci_dev's priv_flags. The reduced complexity allows removing pcie_bwctrl_lbms_rwsem. Since pcie_failed_link_retrain() runs before bwctrl is probed during boot, the LBMS in Link Status register still has to be checked by the quirk. The priv_flags numbering is not continuous because hotplug code added a few flags to fill numbers 4-5 (hotplug and bwctrl changes are routed through in different branches). Suggested-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> [bhelgaas: commit log] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> [kwilczynski: squashed a fix to resolve build failures from https://lore.kernel.org/all/20250508090036.1528-1-ilpo.jarvinen@linux.intel.com] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Lukas Wunner <lukas@wunner.de> Link: https://patch.msgid.link/20250422115548.1483-1-ilpo.jarvinen@linux.intel.com
2025-04-22 14:55:47 +03:00
ret = request_irq(srv->irq, pcie_bwnotif_irq,
IRQF_SHARED, "PCIe bwctrl", srv);
if (ret) {
port->link_bwctrl = NULL;
return ret;
PCI/bwctrl: Add pcie_set_target_speed() to set PCIe Link Speed Currently, PCIe Link Speeds are adjusted by custom code rather than in a common function provided in PCI core. The PCIe bandwidth controller (bwctrl) introduces an in-kernel API, pcie_set_target_speed(), to set PCIe Link Speed. Convert Target Speed quirk to use the new API. The Target Speed quirk runs very early when bwctrl is not yet probed for a Port and can also run later when bwctrl is already setup for the Port, which requires the per port mutex (set_speed_mutex) to be only taken if the bwctrl setup is already complete. The new API is also intended to be used in an upcoming commit that adds a thermal cooling device to throttle PCIe bandwidth when thermal thresholds are reached. The PCIe bandwidth control procedure is as follows. The highest speed supported by the Port and the PCIe device which is not higher than the requested speed is selected and written into the Target Link Speed in the Link Control 2 Register. Then bandwidth controller retrains the PCIe Link. Bandwidth Notifications enable the cur_bus_speed in the struct pci_bus to keep track PCIe Link Speed changes. While Bandwidth Notifications should also be generated when bandwidth controller alters the PCIe Link Speed, a few platforms do not deliver LMBS interrupt after Link Training as expected. Thus, after changing the Link Speed, bandwidth controller makes additional read for the Link Status Register to ensure cur_bus_speed is consistent with the new PCIe Link Speed. Link: https://lore.kernel.org/r/20241018144755.7875-8-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> [bhelgaas: squash devm_mutex_init() error checking from https://lore.kernel.org/r/20241030163139.2111689-1-andriy.shevchenko@linux.intel.com, drop export of pcie_set_target_speed()] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-10-18 17:47:53 +03:00
}
PCI/bwctrl: Replace lbms_count with PCI_LINK_LBMS_SEEN flag PCIe BW controller counted LBMS assertions for the purposes of the Target Speed quirk (pcie_failed_link_retrain()). It was also a plan to expose the LBMS count through sysfs to allow better diagnosing link related issues. Lukas Wunner suggested, however, that adding a trace event would be better for diagnostics purposes, leaving only pcie_failed_link_retrain() as a user of the lbms_count. The logic in pcie_failed_link_retrain() does not require keeping count of LBMS assertions, so replace lbms_count with a simple flag in pci_dev's priv_flags. The reduced complexity allows removing pcie_bwctrl_lbms_rwsem. Since pcie_failed_link_retrain() runs before bwctrl is probed during boot, the LBMS in Link Status register still has to be checked by the quirk. The priv_flags numbering is not continuous because hotplug code added a few flags to fill numbers 4-5 (hotplug and bwctrl changes are routed through in different branches). Suggested-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> [bhelgaas: commit log] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> [kwilczynski: squashed a fix to resolve build failures from https://lore.kernel.org/all/20250508090036.1528-1-ilpo.jarvinen@linux.intel.com] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Lukas Wunner <lukas@wunner.de> Link: https://patch.msgid.link/20250422115548.1483-1-ilpo.jarvinen@linux.intel.com
2025-04-22 14:55:47 +03:00
pcie_bwnotif_enable(srv);
PCI/bwctrl: Re-add BW notification portdrv as PCIe BW controller This mostly reverts the commit b4c7d2076b4e ("PCI/LINK: Remove bandwidth notification"). An upcoming commit extends this driver building PCIe bandwidth controller on top of it. PCIe bandwidth notifications were first added in the commit e8303bb7a75c ("PCI/LINK: Report degraded links via link bandwidth notification") but later had to be removed. The significant changes compared with the old bandwidth notification driver include: 1) Don't print the notifications into kernel log, just keep the Link Speed cached in struct pci_bus updated. While somewhat unfortunate, the log spam was the source of complaints that eventually lead to the removal of the bandwidth notifications driver (see the links below for further information). 2) Besides the Link Bandwidth Management Interrupt, also enable Link Autonomous Bandwidth Interrupt to cover the other source of bandwidth changes. 3) Handle Link Speed updates robustly. Refresh the cached Link Speed when enabling Bandwidth Notification Interrupts, and solve the race between Link Speed read and LBMS/LABS update in pcie_bwnotif_irq_thread(). 4) Use concurrency safe LNKCTL RMW operations. 5) The driver is now called PCIe bwctrl (bandwidth controller) instead of just bandwidth notifications because of increased scope and functionality within the driver. 6) Coexist with the Target Link Speed quirk in pcie_failed_link_retrain(). Provide LBMS counting API for it. 7) Tweaks to variable/functions names for consistency and length reasons. Bandwidth Notifications enable the cur_bus_speed in the struct pci_bus to keep track PCIe Link Speed changes. [bhelgaas: This is based on previous work by Alexandru Gagniuc <mr.nuke.me@gmail.com>; see e8303bb7a75c ("PCI/LINK: Report degraded links via link bandwidth notification")] Link: https://lore.kernel.org/r/20241018144755.7875-7-ilpo.jarvinen@linux.intel.com Link: https://lore.kernel.org/all/20190429185611.121751-1-helgaas@kernel.org/ Link: https://lore.kernel.org/linux-pci/20190501142942.26972-1-keith.busch@intel.com/ Link: https://lore.kernel.org/linux-pci/20200115221008.GA191037@google.com/ Suggested-by: Lukas Wunner <lukas@wunner.de> # Building bwctrl on top of bwnotif Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> [bhelgaas: squash fix to drop IRQF_ONESHOT and convert to hardirq handler: https://lore.kernel.org/r/20241115165717.15233-1-ilpo.jarvinen@linux.intel.com] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Stefan Wahren <wahrenst@gmx.net> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-10-18 17:47:52 +03:00
}
pci_dbg(port, "enabled with IRQ %d\n", srv->irq);
/* Don't fail on errors. Don't leave IS_ERR() "pointer" into ->cdev */
port->link_bwctrl->cdev = pcie_cooling_device_register(port);
if (IS_ERR(port->link_bwctrl->cdev))
port->link_bwctrl->cdev = NULL;
PCI/bwctrl: Re-add BW notification portdrv as PCIe BW controller This mostly reverts the commit b4c7d2076b4e ("PCI/LINK: Remove bandwidth notification"). An upcoming commit extends this driver building PCIe bandwidth controller on top of it. PCIe bandwidth notifications were first added in the commit e8303bb7a75c ("PCI/LINK: Report degraded links via link bandwidth notification") but later had to be removed. The significant changes compared with the old bandwidth notification driver include: 1) Don't print the notifications into kernel log, just keep the Link Speed cached in struct pci_bus updated. While somewhat unfortunate, the log spam was the source of complaints that eventually lead to the removal of the bandwidth notifications driver (see the links below for further information). 2) Besides the Link Bandwidth Management Interrupt, also enable Link Autonomous Bandwidth Interrupt to cover the other source of bandwidth changes. 3) Handle Link Speed updates robustly. Refresh the cached Link Speed when enabling Bandwidth Notification Interrupts, and solve the race between Link Speed read and LBMS/LABS update in pcie_bwnotif_irq_thread(). 4) Use concurrency safe LNKCTL RMW operations. 5) The driver is now called PCIe bwctrl (bandwidth controller) instead of just bandwidth notifications because of increased scope and functionality within the driver. 6) Coexist with the Target Link Speed quirk in pcie_failed_link_retrain(). Provide LBMS counting API for it. 7) Tweaks to variable/functions names for consistency and length reasons. Bandwidth Notifications enable the cur_bus_speed in the struct pci_bus to keep track PCIe Link Speed changes. [bhelgaas: This is based on previous work by Alexandru Gagniuc <mr.nuke.me@gmail.com>; see e8303bb7a75c ("PCI/LINK: Report degraded links via link bandwidth notification")] Link: https://lore.kernel.org/r/20241018144755.7875-7-ilpo.jarvinen@linux.intel.com Link: https://lore.kernel.org/all/20190429185611.121751-1-helgaas@kernel.org/ Link: https://lore.kernel.org/linux-pci/20190501142942.26972-1-keith.busch@intel.com/ Link: https://lore.kernel.org/linux-pci/20200115221008.GA191037@google.com/ Suggested-by: Lukas Wunner <lukas@wunner.de> # Building bwctrl on top of bwnotif Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> [bhelgaas: squash fix to drop IRQF_ONESHOT and convert to hardirq handler: https://lore.kernel.org/r/20241115165717.15233-1-ilpo.jarvinen@linux.intel.com] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Stefan Wahren <wahrenst@gmx.net> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-10-18 17:47:52 +03:00
return 0;
}
static void pcie_bwnotif_remove(struct pcie_device *srv)
{
struct pcie_bwctrl_data *data = srv->port->link_bwctrl;
pcie_cooling_device_unregister(data->cdev);
PCI/bwctrl: Fix NULL pointer deref on unbind and bind The interrupt handler for bandwidth notifications, pcie_bwnotif_irq(), dereferences a "data" pointer. On unbind, that pointer is set to NULL by pcie_bwnotif_remove(). However the interrupt handler may still be invoked afterwards and will dereference that NULL pointer. That's because the interrupt is requested using a devm_*() helper and the driver core releases devm_*() resources *after* calling ->remove(). pcie_bwnotif_remove() does clear the Link Bandwidth Management Interrupt Enable and Link Autonomous Bandwidth Interrupt Enable bits in the Link Control Register, but that won't prevent execution of pcie_bwnotif_irq(): The interrupt for bandwidth notifications may be shared with AER, DPC, PME, and hotplug. So pcie_bwnotif_irq() may be executed as long as the interrupt is requested. There's a similar race on bind: pcie_bwnotif_probe() requests the interrupt when the "data" pointer still points to NULL. A NULL pointer deref may thus likewise occur if AER, DPC, PME or hotplug raise an interrupt in-between the bandwidth controller's call to devm_request_irq() and assignment of the "data" pointer. Drop the devm_*() usage and reorder requesting of the interrupt to fix the issue. While at it, drop a stray but harmless no_free_ptr() invocation when assigning the "data" pointer in pcie_bwnotif_probe(). Ilpo points out that the locking on unbind and bind needs to be symmetric, so move the call to pcie_bwnotif_disable() inside the critical section protected by pcie_bwctrl_setspeed_rwsem and pcie_bwctrl_lbms_rwsem. Evert reports a hang on shutdown of an ASUS ROG Strix SCAR 17 G733PYV. The issue is no longer reproducible with the present commit. Evert found that attaching a USB-C monitor prevented the hang. The machine contains an ASMedia USB 3.2 controller below a hotplug-capable Root Port. So one possible explanation is that the controller gets hot-removed on shutdown unless something is connected. And the ensuing hotplug interrupt occurs exactly when the bandwidth controller is unregistering. The precise cause could not be determined because the screen had already turned black when the hang occurred. Link: https://lore.kernel.org/r/ae2b02c9cfbefff475b6e132b0aa962aaccbd7b2.1736162539.git.lukas@wunner.de Fixes: 665745f27487 ("PCI/bwctrl: Re-add BW notification portdrv as PCIe BW controller") Reported-by: Evert Vorster <evorster@gmail.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219629 Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Evert Vorster <evorster@gmail.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-01-06 12:26:35 +01:00
scoped_guard(rwsem_write, &pcie_bwctrl_setspeed_rwsem) {
PCI/bwctrl: Replace lbms_count with PCI_LINK_LBMS_SEEN flag PCIe BW controller counted LBMS assertions for the purposes of the Target Speed quirk (pcie_failed_link_retrain()). It was also a plan to expose the LBMS count through sysfs to allow better diagnosing link related issues. Lukas Wunner suggested, however, that adding a trace event would be better for diagnostics purposes, leaving only pcie_failed_link_retrain() as a user of the lbms_count. The logic in pcie_failed_link_retrain() does not require keeping count of LBMS assertions, so replace lbms_count with a simple flag in pci_dev's priv_flags. The reduced complexity allows removing pcie_bwctrl_lbms_rwsem. Since pcie_failed_link_retrain() runs before bwctrl is probed during boot, the LBMS in Link Status register still has to be checked by the quirk. The priv_flags numbering is not continuous because hotplug code added a few flags to fill numbers 4-5 (hotplug and bwctrl changes are routed through in different branches). Suggested-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> [bhelgaas: commit log] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> [kwilczynski: squashed a fix to resolve build failures from https://lore.kernel.org/all/20250508090036.1528-1-ilpo.jarvinen@linux.intel.com] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Lukas Wunner <lukas@wunner.de> Link: https://patch.msgid.link/20250422115548.1483-1-ilpo.jarvinen@linux.intel.com
2025-04-22 14:55:47 +03:00
pcie_bwnotif_disable(srv->port);
PCI/bwctrl: Fix NULL pointer deref on unbind and bind The interrupt handler for bandwidth notifications, pcie_bwnotif_irq(), dereferences a "data" pointer. On unbind, that pointer is set to NULL by pcie_bwnotif_remove(). However the interrupt handler may still be invoked afterwards and will dereference that NULL pointer. That's because the interrupt is requested using a devm_*() helper and the driver core releases devm_*() resources *after* calling ->remove(). pcie_bwnotif_remove() does clear the Link Bandwidth Management Interrupt Enable and Link Autonomous Bandwidth Interrupt Enable bits in the Link Control Register, but that won't prevent execution of pcie_bwnotif_irq(): The interrupt for bandwidth notifications may be shared with AER, DPC, PME, and hotplug. So pcie_bwnotif_irq() may be executed as long as the interrupt is requested. There's a similar race on bind: pcie_bwnotif_probe() requests the interrupt when the "data" pointer still points to NULL. A NULL pointer deref may thus likewise occur if AER, DPC, PME or hotplug raise an interrupt in-between the bandwidth controller's call to devm_request_irq() and assignment of the "data" pointer. Drop the devm_*() usage and reorder requesting of the interrupt to fix the issue. While at it, drop a stray but harmless no_free_ptr() invocation when assigning the "data" pointer in pcie_bwnotif_probe(). Ilpo points out that the locking on unbind and bind needs to be symmetric, so move the call to pcie_bwnotif_disable() inside the critical section protected by pcie_bwctrl_setspeed_rwsem and pcie_bwctrl_lbms_rwsem. Evert reports a hang on shutdown of an ASUS ROG Strix SCAR 17 G733PYV. The issue is no longer reproducible with the present commit. Evert found that attaching a USB-C monitor prevented the hang. The machine contains an ASMedia USB 3.2 controller below a hotplug-capable Root Port. So one possible explanation is that the controller gets hot-removed on shutdown unless something is connected. And the ensuing hotplug interrupt occurs exactly when the bandwidth controller is unregistering. The precise cause could not be determined because the screen had already turned black when the hang occurred. Link: https://lore.kernel.org/r/ae2b02c9cfbefff475b6e132b0aa962aaccbd7b2.1736162539.git.lukas@wunner.de Fixes: 665745f27487 ("PCI/bwctrl: Re-add BW notification portdrv as PCIe BW controller") Reported-by: Evert Vorster <evorster@gmail.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219629 Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Evert Vorster <evorster@gmail.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-01-06 12:26:35 +01:00
PCI/bwctrl: Replace lbms_count with PCI_LINK_LBMS_SEEN flag PCIe BW controller counted LBMS assertions for the purposes of the Target Speed quirk (pcie_failed_link_retrain()). It was also a plan to expose the LBMS count through sysfs to allow better diagnosing link related issues. Lukas Wunner suggested, however, that adding a trace event would be better for diagnostics purposes, leaving only pcie_failed_link_retrain() as a user of the lbms_count. The logic in pcie_failed_link_retrain() does not require keeping count of LBMS assertions, so replace lbms_count with a simple flag in pci_dev's priv_flags. The reduced complexity allows removing pcie_bwctrl_lbms_rwsem. Since pcie_failed_link_retrain() runs before bwctrl is probed during boot, the LBMS in Link Status register still has to be checked by the quirk. The priv_flags numbering is not continuous because hotplug code added a few flags to fill numbers 4-5 (hotplug and bwctrl changes are routed through in different branches). Suggested-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> [bhelgaas: commit log] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> [kwilczynski: squashed a fix to resolve build failures from https://lore.kernel.org/all/20250508090036.1528-1-ilpo.jarvinen@linux.intel.com] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Lukas Wunner <lukas@wunner.de> Link: https://patch.msgid.link/20250422115548.1483-1-ilpo.jarvinen@linux.intel.com
2025-04-22 14:55:47 +03:00
free_irq(srv->irq, srv);
PCI/bwctrl: Add pcie_set_target_speed() to set PCIe Link Speed Currently, PCIe Link Speeds are adjusted by custom code rather than in a common function provided in PCI core. The PCIe bandwidth controller (bwctrl) introduces an in-kernel API, pcie_set_target_speed(), to set PCIe Link Speed. Convert Target Speed quirk to use the new API. The Target Speed quirk runs very early when bwctrl is not yet probed for a Port and can also run later when bwctrl is already setup for the Port, which requires the per port mutex (set_speed_mutex) to be only taken if the bwctrl setup is already complete. The new API is also intended to be used in an upcoming commit that adds a thermal cooling device to throttle PCIe bandwidth when thermal thresholds are reached. The PCIe bandwidth control procedure is as follows. The highest speed supported by the Port and the PCIe device which is not higher than the requested speed is selected and written into the Target Link Speed in the Link Control 2 Register. Then bandwidth controller retrains the PCIe Link. Bandwidth Notifications enable the cur_bus_speed in the struct pci_bus to keep track PCIe Link Speed changes. While Bandwidth Notifications should also be generated when bandwidth controller alters the PCIe Link Speed, a few platforms do not deliver LMBS interrupt after Link Training as expected. Thus, after changing the Link Speed, bandwidth controller makes additional read for the Link Status Register to ensure cur_bus_speed is consistent with the new PCIe Link Speed. Link: https://lore.kernel.org/r/20241018144755.7875-8-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> [bhelgaas: squash devm_mutex_init() error checking from https://lore.kernel.org/r/20241030163139.2111689-1-andriy.shevchenko@linux.intel.com, drop export of pcie_set_target_speed()] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-10-18 17:47:53 +03:00
PCI/bwctrl: Replace lbms_count with PCI_LINK_LBMS_SEEN flag PCIe BW controller counted LBMS assertions for the purposes of the Target Speed quirk (pcie_failed_link_retrain()). It was also a plan to expose the LBMS count through sysfs to allow better diagnosing link related issues. Lukas Wunner suggested, however, that adding a trace event would be better for diagnostics purposes, leaving only pcie_failed_link_retrain() as a user of the lbms_count. The logic in pcie_failed_link_retrain() does not require keeping count of LBMS assertions, so replace lbms_count with a simple flag in pci_dev's priv_flags. The reduced complexity allows removing pcie_bwctrl_lbms_rwsem. Since pcie_failed_link_retrain() runs before bwctrl is probed during boot, the LBMS in Link Status register still has to be checked by the quirk. The priv_flags numbering is not continuous because hotplug code added a few flags to fill numbers 4-5 (hotplug and bwctrl changes are routed through in different branches). Suggested-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> [bhelgaas: commit log] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> [kwilczynski: squashed a fix to resolve build failures from https://lore.kernel.org/all/20250508090036.1528-1-ilpo.jarvinen@linux.intel.com] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Lukas Wunner <lukas@wunner.de> Link: https://patch.msgid.link/20250422115548.1483-1-ilpo.jarvinen@linux.intel.com
2025-04-22 14:55:47 +03:00
srv->port->link_bwctrl = NULL;
PCI/bwctrl: Fix NULL pointer deref on unbind and bind The interrupt handler for bandwidth notifications, pcie_bwnotif_irq(), dereferences a "data" pointer. On unbind, that pointer is set to NULL by pcie_bwnotif_remove(). However the interrupt handler may still be invoked afterwards and will dereference that NULL pointer. That's because the interrupt is requested using a devm_*() helper and the driver core releases devm_*() resources *after* calling ->remove(). pcie_bwnotif_remove() does clear the Link Bandwidth Management Interrupt Enable and Link Autonomous Bandwidth Interrupt Enable bits in the Link Control Register, but that won't prevent execution of pcie_bwnotif_irq(): The interrupt for bandwidth notifications may be shared with AER, DPC, PME, and hotplug. So pcie_bwnotif_irq() may be executed as long as the interrupt is requested. There's a similar race on bind: pcie_bwnotif_probe() requests the interrupt when the "data" pointer still points to NULL. A NULL pointer deref may thus likewise occur if AER, DPC, PME or hotplug raise an interrupt in-between the bandwidth controller's call to devm_request_irq() and assignment of the "data" pointer. Drop the devm_*() usage and reorder requesting of the interrupt to fix the issue. While at it, drop a stray but harmless no_free_ptr() invocation when assigning the "data" pointer in pcie_bwnotif_probe(). Ilpo points out that the locking on unbind and bind needs to be symmetric, so move the call to pcie_bwnotif_disable() inside the critical section protected by pcie_bwctrl_setspeed_rwsem and pcie_bwctrl_lbms_rwsem. Evert reports a hang on shutdown of an ASUS ROG Strix SCAR 17 G733PYV. The issue is no longer reproducible with the present commit. Evert found that attaching a USB-C monitor prevented the hang. The machine contains an ASMedia USB 3.2 controller below a hotplug-capable Root Port. So one possible explanation is that the controller gets hot-removed on shutdown unless something is connected. And the ensuing hotplug interrupt occurs exactly when the bandwidth controller is unregistering. The precise cause could not be determined because the screen had already turned black when the hang occurred. Link: https://lore.kernel.org/r/ae2b02c9cfbefff475b6e132b0aa962aaccbd7b2.1736162539.git.lukas@wunner.de Fixes: 665745f27487 ("PCI/bwctrl: Re-add BW notification portdrv as PCIe BW controller") Reported-by: Evert Vorster <evorster@gmail.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219629 Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Evert Vorster <evorster@gmail.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-01-06 12:26:35 +01:00
}
PCI/bwctrl: Re-add BW notification portdrv as PCIe BW controller This mostly reverts the commit b4c7d2076b4e ("PCI/LINK: Remove bandwidth notification"). An upcoming commit extends this driver building PCIe bandwidth controller on top of it. PCIe bandwidth notifications were first added in the commit e8303bb7a75c ("PCI/LINK: Report degraded links via link bandwidth notification") but later had to be removed. The significant changes compared with the old bandwidth notification driver include: 1) Don't print the notifications into kernel log, just keep the Link Speed cached in struct pci_bus updated. While somewhat unfortunate, the log spam was the source of complaints that eventually lead to the removal of the bandwidth notifications driver (see the links below for further information). 2) Besides the Link Bandwidth Management Interrupt, also enable Link Autonomous Bandwidth Interrupt to cover the other source of bandwidth changes. 3) Handle Link Speed updates robustly. Refresh the cached Link Speed when enabling Bandwidth Notification Interrupts, and solve the race between Link Speed read and LBMS/LABS update in pcie_bwnotif_irq_thread(). 4) Use concurrency safe LNKCTL RMW operations. 5) The driver is now called PCIe bwctrl (bandwidth controller) instead of just bandwidth notifications because of increased scope and functionality within the driver. 6) Coexist with the Target Link Speed quirk in pcie_failed_link_retrain(). Provide LBMS counting API for it. 7) Tweaks to variable/functions names for consistency and length reasons. Bandwidth Notifications enable the cur_bus_speed in the struct pci_bus to keep track PCIe Link Speed changes. [bhelgaas: This is based on previous work by Alexandru Gagniuc <mr.nuke.me@gmail.com>; see e8303bb7a75c ("PCI/LINK: Report degraded links via link bandwidth notification")] Link: https://lore.kernel.org/r/20241018144755.7875-7-ilpo.jarvinen@linux.intel.com Link: https://lore.kernel.org/all/20190429185611.121751-1-helgaas@kernel.org/ Link: https://lore.kernel.org/linux-pci/20190501142942.26972-1-keith.busch@intel.com/ Link: https://lore.kernel.org/linux-pci/20200115221008.GA191037@google.com/ Suggested-by: Lukas Wunner <lukas@wunner.de> # Building bwctrl on top of bwnotif Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> [bhelgaas: squash fix to drop IRQF_ONESHOT and convert to hardirq handler: https://lore.kernel.org/r/20241115165717.15233-1-ilpo.jarvinen@linux.intel.com] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Stefan Wahren <wahrenst@gmx.net> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-10-18 17:47:52 +03:00
}
static int pcie_bwnotif_suspend(struct pcie_device *srv)
{
pcie_bwnotif_disable(srv->port);
return 0;
}
static int pcie_bwnotif_resume(struct pcie_device *srv)
{
pcie_bwnotif_enable(srv);
return 0;
}
static struct pcie_port_service_driver pcie_bwctrl_driver = {
.name = "pcie_bwctrl",
.port_type = PCIE_ANY_PORT,
.service = PCIE_PORT_SERVICE_BWCTRL,
.probe = pcie_bwnotif_probe,
.suspend = pcie_bwnotif_suspend,
.resume = pcie_bwnotif_resume,
.remove = pcie_bwnotif_remove,
};
int __init pcie_bwctrl_init(void)
{
return pcie_port_service_register(&pcie_bwctrl_driver);
}