From 400123bd0107175e92a9780b97f7a5934eb0a991 Mon Sep 17 00:00:00 2001 From: Andrej Picej Date: Thu, 29 May 2025 07:36:53 +0200 Subject: [PATCH 001/289] dt-bindings: drm/bridge: ti-sn65dsi83: drop $ref to fix lvds-vod* warnings The kernel test robot reported a warning related to the use of "$ref" type definitions for custom endpoint properties - "ti,lvds-vod-swing-clock-microvolt" and - "ti,lvds-vod-swing-data-microvolt". Using "$ref" with "uint32-array" is not correctly handled in this context. Removing "$ref" and relying solely on "maxItems: 2" enforces the intended requirement of specifying exactly two values, without triggering a schema validation warning. Reported-by: kernel test robot Closes: https://lore.kernel.org/oe-kbuild-all/202505021937.efnQPPqx-lkp@intel.com/ Signed-off-by: Andrej Picej Link: https://lore.kernel.org/r/20250529053654.1754926-1-andrej.picej@norik.com Signed-off-by: Rob Herring (Arm) --- .../devicetree/bindings/display/bridge/ti,sn65dsi83.yaml | 4 ---- 1 file changed, 4 deletions(-) diff --git a/Documentation/devicetree/bindings/display/bridge/ti,sn65dsi83.yaml b/Documentation/devicetree/bindings/display/bridge/ti,sn65dsi83.yaml index 9b5f3f3eab19..e69b6343a8eb 100644 --- a/Documentation/devicetree/bindings/display/bridge/ti,sn65dsi83.yaml +++ b/Documentation/devicetree/bindings/display/bridge/ti,sn65dsi83.yaml @@ -118,15 +118,11 @@ $defs: ti,lvds-vod-swing-clock-microvolt: description: LVDS diferential output voltage for clock lanes in microvolts. - $ref: /schemas/types.yaml#/definitions/uint32-array - minItems: 2 maxItems: 2 ti,lvds-vod-swing-data-microvolt: description: LVDS diferential output voltage for data lanes in microvolts. - $ref: /schemas/types.yaml#/definitions/uint32-array - minItems: 2 maxItems: 2 allOf: From 009c3a4bc41e855fd76f92727f9fbae4e5917d7f Mon Sep 17 00:00:00 2001 From: Avri Altman Date: Mon, 26 May 2025 14:44:45 +0300 Subject: [PATCH 002/289] mmc: core: sd: Apply BROKEN_SD_DISCARD quirk earlier Move the BROKEN_SD_DISCARD quirk for certain SanDisk SD cards from the `mmc_blk_fixups[]` to `mmc_sd_fixups[]`. This ensures the quirk is applied earlier in the device initialization process, aligning with the reasoning in [1]. Applying the quirk sooner prevents the kernel from incorrectly enabling discard support on affected cards during initial setup. [1] https://lore.kernel.org/all/20240820230631.GA436523@sony.com Fixes: 07d2872bf4c8 ("mmc: core: Add SD card quirk for broken discard") Signed-off-by: Avri Altman Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20250526114445.675548-1-avri.altman@sandisk.com Signed-off-by: Ulf Hansson --- drivers/mmc/core/quirks.h | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/drivers/mmc/core/quirks.h b/drivers/mmc/core/quirks.h index 7f893bafaa60..c417ed34c057 100644 --- a/drivers/mmc/core/quirks.h +++ b/drivers/mmc/core/quirks.h @@ -44,6 +44,12 @@ static const struct mmc_fixup __maybe_unused mmc_sd_fixups[] = { 0, -1ull, SDIO_ANY_ID, SDIO_ANY_ID, add_quirk_sd, MMC_QUIRK_NO_UHS_DDR50_TUNING, EXT_CSD_REV_ANY), + /* + * Some SD cards reports discard support while they don't + */ + MMC_FIXUP(CID_NAME_ANY, CID_MANFID_SANDISK_SD, 0x5344, add_quirk_sd, + MMC_QUIRK_BROKEN_SD_DISCARD), + END_FIXUP }; @@ -147,12 +153,6 @@ static const struct mmc_fixup __maybe_unused mmc_blk_fixups[] = { MMC_FIXUP("M62704", CID_MANFID_KINGSTON, 0x0100, add_quirk_mmc, MMC_QUIRK_TRIM_BROKEN), - /* - * Some SD cards reports discard support while they don't - */ - MMC_FIXUP(CID_NAME_ANY, CID_MANFID_SANDISK_SD, 0x5344, add_quirk_sd, - MMC_QUIRK_BROKEN_SD_DISCARD), - END_FIXUP }; From 3358b836d4369ad47823c26834b3613778fe75b2 Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Tue, 27 May 2025 08:55:01 +0300 Subject: [PATCH 003/289] mmc: sdhci-of-k1: Fix error code in probe() If spacemit_sdhci_get_clocks() fails, then propagate the error code. Don't return success. Fixes: e5502d15b0f3 ("mmc: sdhci-of-k1: add support for SpacemiT K1 SoC") Signed-off-by: Dan Carpenter Reviewed-by: Yixun Lan Acked-by: Adrian Hunter Link: https://lore.kernel.org/r/aDVTtQdXVtRhxOrb@stanley.mountain Signed-off-by: Ulf Hansson --- drivers/mmc/host/sdhci-of-k1.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/mmc/host/sdhci-of-k1.c b/drivers/mmc/host/sdhci-of-k1.c index 6880d3e9ab62..2e5da7c5834c 100644 --- a/drivers/mmc/host/sdhci-of-k1.c +++ b/drivers/mmc/host/sdhci-of-k1.c @@ -276,7 +276,8 @@ static int spacemit_sdhci_probe(struct platform_device *pdev) host->mmc->caps |= MMC_CAP_NEED_RSP_BUSY; - if (spacemit_sdhci_get_clocks(dev, pltfm_host)) + ret = spacemit_sdhci_get_clocks(dev, pltfm_host); + if (ret) goto err_pltfm; ret = sdhci_add_host(host); From 539d80575b810c7a5987c7ac8915e3bc99c03695 Mon Sep 17 00:00:00 2001 From: "Masami Hiramatsu (Google)" Date: Thu, 5 Jun 2025 10:07:38 +0900 Subject: [PATCH 004/289] mtk-sd: Fix a pagefault in dma_unmap_sg() for not prepared data When swiotlb buffer is full, the dma_map_sg() returns 0 to msdc_prepare_data(), but it does not check it and sets the MSDC_PREPARE_FLAG. swiotlb_tbl_map_single() /* prints "swiotlb buffer is full" */ <-swiotlb_map() <-dma_direct_map_page() <-dma_direct_map_sg() <-__dma_map_sg_attrs() <-dma_map_sg_attrs() <-dma_map_sg() /* returns 0 (pages mapped) */ <-msdc_prepare_data() Then, the msdc_unprepare_data() checks MSDC_PREPARE_FLAG and calls dma_unmap_sg() with unmapped pages. It causes a page fault. To fix this problem, Do not set MSDC_PREPARE_FLAG if dma_map_sg() fails because this is not prepared. Fixes: 208489032bdd ("mmc: mediatek: Add Mediatek MMC driver") Signed-off-by: Masami Hiramatsu (Google) Tested-by: Sergey Senozhatsky Reviewed-by: AngeloGioacchino Del Regno Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/174908565814.4056588.769599127120955383.stgit@mhiramat.tok.corp.google.com Signed-off-by: Ulf Hansson --- drivers/mmc/host/mtk-sd.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/mmc/host/mtk-sd.c b/drivers/mmc/host/mtk-sd.c index 31eb90536bce..b1d1586cf1fc 100644 --- a/drivers/mmc/host/mtk-sd.c +++ b/drivers/mmc/host/mtk-sd.c @@ -846,9 +846,10 @@ static inline void msdc_dma_setup(struct msdc_host *host, struct msdc_dma *dma, static void msdc_prepare_data(struct msdc_host *host, struct mmc_data *data) { if (!(data->host_cookie & MSDC_PREPARE_FLAG)) { - data->host_cookie |= MSDC_PREPARE_FLAG; data->sg_count = dma_map_sg(host->dev, data->sg, data->sg_len, mmc_get_dma_dir(data)); + if (data->sg_count) + data->host_cookie |= MSDC_PREPARE_FLAG; } } From d53fd59707c402d03ec740b4e90a2ddcb5312006 Mon Sep 17 00:00:00 2001 From: "Rob Herring (Arm)" Date: Wed, 7 May 2025 16:59:02 -0500 Subject: [PATCH 005/289] dt-bindings: soc: fsl,ls1028a-reset: Drop extra "/" in $id The $id value has a double "//". Drop it. Fixes: 9ca5a7d9d2e0 ("dt-bindings: soc: fsl: Add fsl,ls1028a-reset for reset syscon node") Reviewed-by: Krzysztof Kozlowski Link: https://lore.kernel.org/r/20250507215903.2748698-1-robh@kernel.org Signed-off-by: Rob Herring (Arm) --- .../devicetree/bindings/soc/fsl/fsl,ls1028a-reset.yaml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/devicetree/bindings/soc/fsl/fsl,ls1028a-reset.yaml b/Documentation/devicetree/bindings/soc/fsl/fsl,ls1028a-reset.yaml index 31295be91013..558a8c7aab9c 100644 --- a/Documentation/devicetree/bindings/soc/fsl/fsl,ls1028a-reset.yaml +++ b/Documentation/devicetree/bindings/soc/fsl/fsl,ls1028a-reset.yaml @@ -1,7 +1,7 @@ # SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) %YAML 1.2 --- -$id: http://devicetree.org/schemas//soc/fsl/fsl,ls1028a-reset.yaml# +$id: http://devicetree.org/schemas/soc/fsl/fsl,ls1028a-reset.yaml# $schema: http://devicetree.org/meta-schemas/core.yaml# title: Freescale Layerscape Reset Registers Module From 87b42c114cdda76c8ad3002f2096699ad5146cb3 Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Wed, 28 May 2025 11:11:41 +0300 Subject: [PATCH 006/289] cxl: fix return value in cxlctl_validate_set_features() The cxlctl_validate_set_features() function is type bool. It's supposed to return true for valid requests and false for invalid. However, this error path returns ERR_PTR(-EINVAL) which is true when it was intended to return false. The incorrect return will result in kernel failing to prevent a incorrect op_size passed in from userspace to be detected. [ dj: Add user impact to commit log ] Fixes: f76e0bbc8bc3 ("cxl: Update prototype of function get_support_feature_info()") Signed-off-by: Dan Carpenter Reviewed-by: Ira Weiny Link: https://patch.msgid.link/aDbFPSCujpJLY1if@stanley.mountain Signed-off-by: Dave Jiang --- drivers/cxl/core/features.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/cxl/core/features.c b/drivers/cxl/core/features.c index 6f2eae1eb126..7c750599ea69 100644 --- a/drivers/cxl/core/features.c +++ b/drivers/cxl/core/features.c @@ -544,7 +544,7 @@ static bool cxlctl_validate_set_features(struct cxl_features_state *cxlfs, u32 flags; if (rpc_in->op_size < sizeof(uuid_t)) - return ERR_PTR(-EINVAL); + return false; feat = cxl_feature_info(cxlfs, &rpc_in->set_feat_in.uuid); if (IS_ERR(feat)) From 5ae416c5b1e2e816aee7b3fc8347adf70afabb4c Mon Sep 17 00:00:00 2001 From: Qasim Ijaz Date: Fri, 6 Jun 2025 19:49:57 +0100 Subject: [PATCH 007/289] HID: wacom: fix memory leak on kobject creation failure During wacom_initialize_remotes() a fifo buffer is allocated with kfifo_alloc() and later a cleanup action is registered during devm_add_action_or_reset() to clean it up. However if the code fails to create a kobject and register it with sysfs the code simply returns -ENOMEM before the cleanup action is registered leading to a memory leak. Fix this by ensuring the fifo is freed when the kobject creation and registration process fails. Fixes: 83e6b40e2de6 ("HID: wacom: EKR: have the wacom resources dynamically allocated") Reviewed-by: Ping Cheng Cc: stable@vger.kernel.org Signed-off-by: Qasim Ijaz Signed-off-by: Jiri Kosina --- drivers/hid/wacom_sys.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/hid/wacom_sys.c b/drivers/hid/wacom_sys.c index eaf099b2efdb..ec5282bc69d6 100644 --- a/drivers/hid/wacom_sys.c +++ b/drivers/hid/wacom_sys.c @@ -2048,8 +2048,10 @@ static int wacom_initialize_remotes(struct wacom *wacom) remote->remote_dir = kobject_create_and_add("wacom_remote", &wacom->hdev->dev.kobj); - if (!remote->remote_dir) + if (!remote->remote_dir) { + kfifo_free(&remote->remote_fifo); return -ENOMEM; + } error = sysfs_create_files(remote->remote_dir, remote_unpair_attrs); From 1a19ae437ca5d5c7d9ec2678946fb339b1c706bf Mon Sep 17 00:00:00 2001 From: Qasim Ijaz Date: Fri, 6 Jun 2025 19:49:58 +0100 Subject: [PATCH 008/289] HID: wacom: fix memory leak on sysfs attribute creation failure When sysfs_create_files() fails during wacom_initialize_remotes() the fifo buffer is not freed leading to a memory leak. Fix this by calling kfifo_free() before returning. Fixes: 83e6b40e2de6 ("HID: wacom: EKR: have the wacom resources dynamically allocated") Reviewed-by: Ping Cheng Cc: stable@vger.kernel.org Signed-off-by: Qasim Ijaz Signed-off-by: Jiri Kosina --- drivers/hid/wacom_sys.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/hid/wacom_sys.c b/drivers/hid/wacom_sys.c index ec5282bc69d6..58cbd43a37e9 100644 --- a/drivers/hid/wacom_sys.c +++ b/drivers/hid/wacom_sys.c @@ -2058,6 +2058,7 @@ static int wacom_initialize_remotes(struct wacom *wacom) if (error) { hid_err(wacom->hdev, "cannot create sysfs group err: %d\n", error); + kfifo_free(&remote->remote_fifo); return error; } From 85a720f4337f0ddf1603c8b75a8f1ffbbe022ef9 Mon Sep 17 00:00:00 2001 From: Qasim Ijaz Date: Fri, 6 Jun 2025 19:49:59 +0100 Subject: [PATCH 009/289] HID: wacom: fix kobject reference count leak When sysfs_create_files() fails in wacom_initialize_remotes() the error is returned and the cleanup action will not have been registered yet. As a result the kobject???s refcount is never dropped, so the kobject can never be freed leading to a reference leak. Fix this by calling kobject_put() before returning. Fixes: 83e6b40e2de6 ("HID: wacom: EKR: have the wacom resources dynamically allocated") Acked-by: Ping Cheng Cc: stable@vger.kernel.org Signed-off-by: Qasim Ijaz Signed-off-by: Jiri Kosina --- drivers/hid/wacom_sys.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/hid/wacom_sys.c b/drivers/hid/wacom_sys.c index 58cbd43a37e9..1257131b1e34 100644 --- a/drivers/hid/wacom_sys.c +++ b/drivers/hid/wacom_sys.c @@ -2059,6 +2059,7 @@ static int wacom_initialize_remotes(struct wacom *wacom) hid_err(wacom->hdev, "cannot create sysfs group err: %d\n", error); kfifo_free(&remote->remote_fifo); + kobject_put(remote->remote_dir); return error; } From 4a0381080397e77792a5168069f174d3e56175ff Mon Sep 17 00:00:00 2001 From: "Daniel J. Ogorchock" Date: Tue, 13 May 2025 03:47:00 -0400 Subject: [PATCH 010/289] HID: nintendo: avoid bluetooth suspend/resume stalls Ensure we don't stall or panic the kernel when using bluetooth-connected controllers. This was reported as an issue on android devices using kernel 6.6 due to the resume hook which had been added for usb joycons. First, set a new state value to JOYCON_CTLR_STATE_SUSPENDED in a newly-added nintendo_hid_suspend. This makes sure we will not stall out the kernel waiting for input reports during led classdev suspend. The stalls could happen if connectivity is unreliable or lost to the controller prior to suspend. Second, since we lose connectivity during suspend, do not try joycon_init() for bluetooth controllers in the nintendo_hid_resume path. Tested via multiple suspend/resume flows when using the controller both in USB and bluetooth modes. Signed-off-by: Daniel J. Ogorchock Reviewed-by: Silvan Jegen Signed-off-by: Jiri Kosina --- drivers/hid/hid-nintendo.c | 38 ++++++++++++++++++++++++++++++++++++-- 1 file changed, 36 insertions(+), 2 deletions(-) diff --git a/drivers/hid/hid-nintendo.c b/drivers/hid/hid-nintendo.c index 839d5bcd72b1..fb4985988615 100644 --- a/drivers/hid/hid-nintendo.c +++ b/drivers/hid/hid-nintendo.c @@ -308,6 +308,7 @@ enum joycon_ctlr_state { JOYCON_CTLR_STATE_INIT, JOYCON_CTLR_STATE_READ, JOYCON_CTLR_STATE_REMOVED, + JOYCON_CTLR_STATE_SUSPENDED, }; /* Controller type received as part of device info */ @@ -2750,14 +2751,46 @@ static void nintendo_hid_remove(struct hid_device *hdev) static int nintendo_hid_resume(struct hid_device *hdev) { - int ret = joycon_init(hdev); + struct joycon_ctlr *ctlr = hid_get_drvdata(hdev); + int ret; + hid_dbg(hdev, "resume\n"); + if (!joycon_using_usb(ctlr)) { + hid_dbg(hdev, "no-op resume for bt ctlr\n"); + ctlr->ctlr_state = JOYCON_CTLR_STATE_READ; + return 0; + } + + ret = joycon_init(hdev); if (ret) - hid_err(hdev, "Failed to restore controller after resume"); + hid_err(hdev, + "Failed to restore controller after resume: %d\n", + ret); + else + ctlr->ctlr_state = JOYCON_CTLR_STATE_READ; return ret; } +static int nintendo_hid_suspend(struct hid_device *hdev, pm_message_t message) +{ + struct joycon_ctlr *ctlr = hid_get_drvdata(hdev); + + hid_dbg(hdev, "suspend: %d\n", message.event); + /* + * Avoid any blocking loops in suspend/resume transitions. + * + * joycon_enforce_subcmd_rate() can result in repeated retries if for + * whatever reason the controller stops providing input reports. + * + * This has been observed with bluetooth controllers which lose + * connectivity prior to suspend (but not long enough to result in + * complete disconnection). + */ + ctlr->ctlr_state = JOYCON_CTLR_STATE_SUSPENDED; + return 0; +} + #endif static const struct hid_device_id nintendo_hid_devices[] = { @@ -2796,6 +2829,7 @@ static struct hid_driver nintendo_hid_driver = { #ifdef CONFIG_PM .resume = nintendo_hid_resume, + .suspend = nintendo_hid_suspend, #endif }; static int __init nintendo_init(void) From 73f3a7415d93cf418c7625d03bce72da84344406 Mon Sep 17 00:00:00 2001 From: Even Xu Date: Wed, 14 May 2025 14:26:38 +0800 Subject: [PATCH 011/289] HID: Intel-thc-hid: Intel-quicki2c: Enhance QuickI2C reset flow During customer board enabling, it was found: some touch devices prepared reset response, but either forgot sending interrupt or THC missed reset interrupt because of timing issue. THC QuickI2C driver depends on interrupt to read reset response, in this case, it will cause driver waiting timeout. This patch enhances the flow by adding manually reset response reading after waiting for reset interrupt timeout. Signed-off-by: Even Xu Tested-by: Chong Han Fixes: 66b59bfce6d9 ("HID: intel-thc-hid: intel-quicki2c: Complete THC QuickI2C driver") Signed-off-by: Jiri Kosina --- .../intel-quicki2c/quicki2c-protocol.c | 26 ++++++++++++++++++- 1 file changed, 25 insertions(+), 1 deletion(-) diff --git a/drivers/hid/intel-thc-hid/intel-quicki2c/quicki2c-protocol.c b/drivers/hid/intel-thc-hid/intel-quicki2c/quicki2c-protocol.c index f493df0d5dc4..a63f8c833252 100644 --- a/drivers/hid/intel-thc-hid/intel-quicki2c/quicki2c-protocol.c +++ b/drivers/hid/intel-thc-hid/intel-quicki2c/quicki2c-protocol.c @@ -4,6 +4,7 @@ #include #include #include +#include #include "intel-thc-dev.h" #include "intel-thc-dma.h" @@ -200,6 +201,9 @@ int quicki2c_set_report(struct quicki2c_device *qcdev, u8 report_type, int quicki2c_reset(struct quicki2c_device *qcdev) { + u16 input_reg = le16_to_cpu(qcdev->dev_desc.input_reg); + size_t read_len = HIDI2C_LENGTH_LEN; + u32 prd_len = read_len; int ret; qcdev->reset_ack = false; @@ -213,12 +217,32 @@ int quicki2c_reset(struct quicki2c_device *qcdev) ret = wait_event_interruptible_timeout(qcdev->reset_ack_wq, qcdev->reset_ack, HIDI2C_RESET_TIMEOUT * HZ); - if (ret <= 0 || !qcdev->reset_ack) { + if (qcdev->reset_ack) + return 0; + + /* + * Manually read reset response if it wasn't received, in case reset interrupt + * was missed by touch device or THC hardware. + */ + ret = thc_tic_pio_read(qcdev->thc_hw, input_reg, read_len, &prd_len, + (u32 *)qcdev->input_buf); + if (ret) { + dev_err_once(qcdev->dev, "Read Reset Response failed, ret %d\n", ret); + return ret; + } + + /* + * Check response packet length, it's first 16 bits of packet. + * If response packet length is zero, it's reset response, otherwise not. + */ + if (get_unaligned_le16(qcdev->input_buf)) { dev_err_once(qcdev->dev, "Wait reset response timed out ret:%d timeout:%ds\n", ret, HIDI2C_RESET_TIMEOUT); return -ETIMEDOUT; } + qcdev->reset_ack = true; + return 0; } From 54bae4c17c11688339eb73a04fd24203bb6e7494 Mon Sep 17 00:00:00 2001 From: "Chia-Lin Kao (AceLan)" Date: Tue, 6 May 2025 13:50:15 +0800 Subject: [PATCH 012/289] HID: quirks: Add quirk for 2 Chicony Electronics HP 5MP Cameras The Chicony Electronics HP 5MP Cameras (USB ID 04F2:B824 & 04F2:B82C) report a HID sensor interface that is not actually implemented. Attempting to access this non-functional sensor via iio_info causes system hangs as runtime PM tries to wake up an unresponsive sensor. Add these 2 devices to the HID ignore list since the sensor interface is non-functional by design and should not be exposed to userspace. Signed-off-by: Chia-Lin Kao (AceLan) Signed-off-by: Jiri Kosina --- drivers/hid/hid-ids.h | 2 ++ drivers/hid/hid-quirks.c | 2 ++ 2 files changed, 4 insertions(+) diff --git a/drivers/hid/hid-ids.h b/drivers/hid/hid-ids.h index e3fb4e2fe911..f1e00641e26e 100644 --- a/drivers/hid/hid-ids.h +++ b/drivers/hid/hid-ids.h @@ -312,6 +312,8 @@ #define USB_DEVICE_ID_ASUS_AK1D 0x1125 #define USB_DEVICE_ID_CHICONY_TOSHIBA_WT10A 0x1408 #define USB_DEVICE_ID_CHICONY_ACER_SWITCH12 0x1421 +#define USB_DEVICE_ID_CHICONY_HP_5MP_CAMERA 0xb824 +#define USB_DEVICE_ID_CHICONY_HP_5MP_CAMERA2 0xb82c #define USB_VENDOR_ID_CHUNGHWAT 0x2247 #define USB_DEVICE_ID_CHUNGHWAT_MULTITOUCH 0x0001 diff --git a/drivers/hid/hid-quirks.c b/drivers/hid/hid-quirks.c index 7fefeb413ec3..96f3c712acc8 100644 --- a/drivers/hid/hid-quirks.c +++ b/drivers/hid/hid-quirks.c @@ -757,6 +757,8 @@ static const struct hid_device_id hid_ignore_list[] = { { HID_USB_DEVICE(USB_VENDOR_ID_AVERMEDIA, USB_DEVICE_ID_AVER_FM_MR800) }, { HID_USB_DEVICE(USB_VENDOR_ID_AXENTIA, USB_DEVICE_ID_AXENTIA_FM_RADIO) }, { HID_USB_DEVICE(USB_VENDOR_ID_BERKSHIRE, USB_DEVICE_ID_BERKSHIRE_PCWD) }, + { HID_USB_DEVICE(USB_VENDOR_ID_CHICONY, USB_DEVICE_ID_CHICONY_HP_5MP_CAMERA) }, + { HID_USB_DEVICE(USB_VENDOR_ID_CHICONY, USB_DEVICE_ID_CHICONY_HP_5MP_CAMERA2) }, { HID_USB_DEVICE(USB_VENDOR_ID_CIDC, 0x0103) }, { HID_USB_DEVICE(USB_VENDOR_ID_CYGNAL, USB_DEVICE_ID_CYGNAL_RADIO_SI470X) }, { HID_USB_DEVICE(USB_VENDOR_ID_CYGNAL, USB_DEVICE_ID_CYGNAL_RADIO_SI4713) }, From fa10d4515817274a50af510d5d283d3c7fffc1ae Mon Sep 17 00:00:00 2001 From: Mario Limonciello Date: Fri, 23 May 2025 11:10:07 -0500 Subject: [PATCH 013/289] HID: input: lower message severity of 'No inputs registered, leaving' to debug Plugging in a "Blue snowball" microphone always shows the error 'No inputs registered, leaving', but the device functions as intended. When a HID device is started using the function hid_hw_start() and the argument HID_CONNECT_DEFAULT it will try all various hid connect requests. Not all devices will create an input device and so the message is needlessly noisy. Decrease it to debug instead. [jkosina@suse.com: edit shortlog] Signed-off-by: Mario Limonciello Signed-off-by: Jiri Kosina --- drivers/hid/hid-input.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/hid/hid-input.c b/drivers/hid/hid-input.c index 9d80635a91eb..ff1784b5c2a4 100644 --- a/drivers/hid/hid-input.c +++ b/drivers/hid/hid-input.c @@ -2343,7 +2343,7 @@ int hidinput_connect(struct hid_device *hid, unsigned int force) } if (list_empty(&hid->inputs)) { - hid_err(hid, "No inputs registered, leaving\n"); + hid_dbg(hid, "No inputs registered, leaving\n"); goto out_unwind; } From 1a8953f4f7746c6a515989774fe03047c522c613 Mon Sep 17 00:00:00 2001 From: Zhang Heng Date: Thu, 5 Jun 2025 15:29:59 +0800 Subject: [PATCH 014/289] HID: Add IGNORE quirk for SMARTLINKTECHNOLOGY MARTLINKTECHNOLOGY is a microphone device, when the HID interface in an audio device is requested to get specific report id, the following error may occur. [ 562.939373] usb 1-1.4.1.2: new full-speed USB device number 21 using xhci_hcd [ 563.104908] usb 1-1.4.1.2: New USB device found, idVendor=4c4a, idProduct=4155, bcdDevice= 1.00 [ 563.104910] usb 1-1.4.1.2: New USB device strings: Mfr=1, Product=2, SerialNumber=3 [ 563.104911] usb 1-1.4.1.2: Product: USB Composite Device [ 563.104912] usb 1-1.4.1.2: Manufacturer: SmartlinkTechnology [ 563.104913] usb 1-1.4.1.2: SerialNumber: 20201111000001 [ 563.229499] input: SmartlinkTechnology USB Composite Device as /devices/pci0000:00/0000:00:07.1/0000:04:00.3/usb1/1-1/1-1.4/1-1.4.1/1-1.4.1.2/1-1.4.1.2:1.2/0003:4C4A:4155.000F/input/input35 [ 563.291505] hid-generic 0003:4C4A:4155.000F: input,hidraw2: USB HID v2.01 Keyboard [SmartlinkTechnology USB Composite Device] on usb-0000:04:00.3-1.4.1.2/input2 [ 563.291557] usbhid 1-1.4.1.2:1.3: couldn't find an input interrupt endpoint [ 568.506654] usb 1-1.4.1.2: 1:1: usb_set_interface failed (-110) [ 573.626656] usb 1-1.4.1.2: 1:1: usb_set_interface failed (-110) [ 578.746657] usb 1-1.4.1.2: 1:1: usb_set_interface failed (-110) [ 583.866655] usb 1-1.4.1.2: 1:1: usb_set_interface failed (-110) [ 588.986657] usb 1-1.4.1.2: 1:1: usb_set_interface failed (-110) Ignore HID interface. The device is working properly. Signed-off-by: Zhang Heng Signed-off-by: Jiri Kosina --- drivers/hid/hid-ids.h | 3 +++ drivers/hid/hid-quirks.c | 1 + 2 files changed, 4 insertions(+) diff --git a/drivers/hid/hid-ids.h b/drivers/hid/hid-ids.h index f1e00641e26e..2d3769405ec3 100644 --- a/drivers/hid/hid-ids.h +++ b/drivers/hid/hid-ids.h @@ -1527,4 +1527,7 @@ #define USB_VENDOR_ID_SIGNOTEC 0x2133 #define USB_DEVICE_ID_SIGNOTEC_VIEWSONIC_PD1011 0x0018 +#define USB_VENDOR_ID_SMARTLINKTECHNOLOGY 0x4c4a +#define USB_DEVICE_ID_SMARTLINKTECHNOLOGY_4155 0x4155 + #endif diff --git a/drivers/hid/hid-quirks.c b/drivers/hid/hid-quirks.c index 96f3c712acc8..31508da93ba2 100644 --- a/drivers/hid/hid-quirks.c +++ b/drivers/hid/hid-quirks.c @@ -906,6 +906,7 @@ static const struct hid_device_id hid_ignore_list[] = { #endif { HID_USB_DEVICE(USB_VENDOR_ID_YEALINK, USB_DEVICE_ID_YEALINK_P1K_P4K_B2K) }, { HID_USB_DEVICE(USB_VENDOR_ID_QUANTA, USB_DEVICE_ID_QUANTA_HP_5MP_CAMERA_5473) }, + { HID_USB_DEVICE(USB_VENDOR_ID_SMARTLINKTECHNOLOGY, USB_DEVICE_ID_SMARTLINKTECHNOLOGY_4155) }, { } }; From 9327e3ee5b077c4ab4495a09b67624f670ed88b6 Mon Sep 17 00:00:00 2001 From: Iusico Maxim Date: Thu, 5 Jun 2025 19:55:50 +0200 Subject: [PATCH 015/289] HID: lenovo: Restrict F7/9/11 mode to compact keyboards only Commit 2f2bd7cbd1d1 ("hid: lenovo: Resend all settings on reset_resume for compact keyboards") introduced a regression for ThinkPad TrackPoint Keyboard II by removing the conditional check for enabling F7/9/11 mode needed for compact keyboards only. As a result, the non-compact keyboards can no longer toggle Fn-lock via Fn+Esc, although it can be controlled via sysfs knob that directly sends raw commands. This patch restores the previous conditional check without any additions. Cc: stable@vger.kernel.org Fixes: 2f2bd7cbd1d1 ("hid: lenovo: Resend all settings on reset_resume for compact keyboards") Signed-off-by: Iusico Maxim Signed-off-by: Jiri Kosina --- drivers/hid/hid-lenovo.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/drivers/hid/hid-lenovo.c b/drivers/hid/hid-lenovo.c index af29ba840522..a3c23a72316a 100644 --- a/drivers/hid/hid-lenovo.c +++ b/drivers/hid/hid-lenovo.c @@ -548,11 +548,14 @@ static void lenovo_features_set_cptkbd(struct hid_device *hdev) /* * Tell the keyboard a driver understands it, and turn F7, F9, F11 into - * regular keys + * regular keys (Compact only) */ - ret = lenovo_send_cmd_cptkbd(hdev, 0x01, 0x03); - if (ret) - hid_warn(hdev, "Failed to switch F7/9/11 mode: %d\n", ret); + if (hdev->product == USB_DEVICE_ID_LENOVO_CUSBKBD || + hdev->product == USB_DEVICE_ID_LENOVO_CBTKBD) { + ret = lenovo_send_cmd_cptkbd(hdev, 0x01, 0x03); + if (ret) + hid_warn(hdev, "Failed to switch F7/9/11 mode: %d\n", ret); + } /* Switch middle button to native mode */ ret = lenovo_send_cmd_cptkbd(hdev, 0x09, 0x01); From 0e97f5b6a0808fa2ea865280708511c817d4bca3 Mon Sep 17 00:00:00 2001 From: Zhang Lixu Date: Tue, 10 Jun 2025 10:01:31 +0800 Subject: [PATCH 016/289] hid: intel-ish-hid: Use PCI_DEVICE_DATA() macro for ISH device table Replace the usage of PCI_VDEVICE() with driver_data assignment in the ISH PCI device table with the PCI_DEVICE_DATA() macro. This improves code readability. Signed-off-by: Zhang Lixu Acked-by: Srinivas Pandruvada Signed-off-by: Jiri Kosina --- drivers/hid/intel-ish-hid/ipc/pci-ish.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/hid/intel-ish-hid/ipc/pci-ish.c b/drivers/hid/intel-ish-hid/ipc/pci-ish.c index ff0fc8010072..0db41ed74a14 100644 --- a/drivers/hid/intel-ish-hid/ipc/pci-ish.c +++ b/drivers/hid/intel-ish-hid/ipc/pci-ish.c @@ -67,9 +67,9 @@ static const struct pci_device_id ish_pci_tbl[] = { {PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_ISH_MTL_P)}, {PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_ISH_ARL_H)}, {PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_ISH_ARL_S)}, - {PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_ISH_LNL_M), .driver_data = ISHTP_DRIVER_DATA_LNL_M}, - {PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_ISH_PTL_H), .driver_data = ISHTP_DRIVER_DATA_PTL}, - {PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_ISH_PTL_P), .driver_data = ISHTP_DRIVER_DATA_PTL}, + {PCI_DEVICE_DATA(INTEL, ISH_LNL_M, ISHTP_DRIVER_DATA_LNL_M)}, + {PCI_DEVICE_DATA(INTEL, ISH_PTL_H, ISHTP_DRIVER_DATA_PTL)}, + {PCI_DEVICE_DATA(INTEL, ISH_PTL_P, ISHTP_DRIVER_DATA_PTL)}, {} }; MODULE_DEVICE_TABLE(pci, ish_pci_tbl); From 5cdb49a680b45f467e9d915c0e74756bc0c67c57 Mon Sep 17 00:00:00 2001 From: Zhang Lixu Date: Tue, 10 Jun 2025 10:01:32 +0800 Subject: [PATCH 017/289] HID: intel-ish-hid: ipc: Add Wildcat Lake PCI device ID Add device ID of Wildcat Lake into ishtp support list. Signed-off-by: Zhang Lixu Acked-by: Srinivas Pandruvada Signed-off-by: Jiri Kosina --- drivers/hid/intel-ish-hid/ipc/hw-ish.h | 1 + drivers/hid/intel-ish-hid/ipc/pci-ish.c | 6 ++++++ 2 files changed, 7 insertions(+) diff --git a/drivers/hid/intel-ish-hid/ipc/hw-ish.h b/drivers/hid/intel-ish-hid/ipc/hw-ish.h index 07e90d51f073..fa5d68c36313 100644 --- a/drivers/hid/intel-ish-hid/ipc/hw-ish.h +++ b/drivers/hid/intel-ish-hid/ipc/hw-ish.h @@ -38,6 +38,7 @@ #define PCI_DEVICE_ID_INTEL_ISH_LNL_M 0xA845 #define PCI_DEVICE_ID_INTEL_ISH_PTL_H 0xE345 #define PCI_DEVICE_ID_INTEL_ISH_PTL_P 0xE445 +#define PCI_DEVICE_ID_INTEL_ISH_WCL 0x4D45 #define REVISION_ID_CHT_A0 0x6 #define REVISION_ID_CHT_Ax_SI 0x0 diff --git a/drivers/hid/intel-ish-hid/ipc/pci-ish.c b/drivers/hid/intel-ish-hid/ipc/pci-ish.c index 0db41ed74a14..c57483224db6 100644 --- a/drivers/hid/intel-ish-hid/ipc/pci-ish.c +++ b/drivers/hid/intel-ish-hid/ipc/pci-ish.c @@ -27,10 +27,12 @@ enum ishtp_driver_data_index { ISHTP_DRIVER_DATA_NONE, ISHTP_DRIVER_DATA_LNL_M, ISHTP_DRIVER_DATA_PTL, + ISHTP_DRIVER_DATA_WCL, }; #define ISH_FW_GEN_LNL_M "lnlm" #define ISH_FW_GEN_PTL "ptl" +#define ISH_FW_GEN_WCL "wcl" #define ISH_FIRMWARE_PATH(gen) "intel/ish/ish_" gen ".bin" #define ISH_FIRMWARE_PATH_ALL "intel/ish/ish_*.bin" @@ -42,6 +44,9 @@ static struct ishtp_driver_data ishtp_driver_data[] = { [ISHTP_DRIVER_DATA_PTL] = { .fw_generation = ISH_FW_GEN_PTL, }, + [ISHTP_DRIVER_DATA_WCL] = { + .fw_generation = ISH_FW_GEN_WCL, + }, }; static const struct pci_device_id ish_pci_tbl[] = { @@ -70,6 +75,7 @@ static const struct pci_device_id ish_pci_tbl[] = { {PCI_DEVICE_DATA(INTEL, ISH_LNL_M, ISHTP_DRIVER_DATA_LNL_M)}, {PCI_DEVICE_DATA(INTEL, ISH_PTL_H, ISHTP_DRIVER_DATA_PTL)}, {PCI_DEVICE_DATA(INTEL, ISH_PTL_P, ISHTP_DRIVER_DATA_PTL)}, + {PCI_DEVICE_DATA(INTEL, ISH_WCL, ISHTP_DRIVER_DATA_WCL)}, {} }; MODULE_DEVICE_TABLE(pci, ish_pci_tbl); From e0eb1b6b0cd29ca7793c501d5960fd36ba11f110 Mon Sep 17 00:00:00 2001 From: Fangrui Song Date: Mon, 2 Jun 2025 20:48:44 -0700 Subject: [PATCH 018/289] riscv: vdso: Exclude .rodata from the PT_DYNAMIC segment .rodata is implicitly included in the PT_DYNAMIC segment due to inheriting the segment of the preceding .dynamic section (in both GNU ld and LLD). When the .rodata section's size is not a multiple of 16 bytes on riscv64, llvm-readelf will report a "PT_DYNAMIC dynamic table is invalid" warning. Note: in the presence of the .dynamic section, GNU readelf and llvm-readelf's -d option decodes the dynamic section using the section. This issue arose after commit 8f8c1ff879fab60f80f3a7aec3000f47e5b03ba9 ("riscv: vdso.lds.S: remove hardcoded 0x800 .text start addr"), which placed .rodata directly after .dynamic by removing .eh_frame. This patch resolves the implicit inclusion into PT_DYNAMIC by explicitly specifying the :text output section phdr. Reported-by: Nathan Chancellor Closes: https://github.com/ClangBuiltLinux/linux/issues/2093 Signed-off-by: Fangrui Song Tested-by: Nathan Chancellor Link: https://lore.kernel.org/r/20250602-riscv-vdso-v1-1-0620cf63cff0@maskray.me Signed-off-by: Palmer Dabbelt --- arch/riscv/kernel/vdso/vdso.lds.S | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/riscv/kernel/vdso/vdso.lds.S b/arch/riscv/kernel/vdso/vdso.lds.S index 7c15b0f4ee3b..c29ef12a63bb 100644 --- a/arch/riscv/kernel/vdso/vdso.lds.S +++ b/arch/riscv/kernel/vdso/vdso.lds.S @@ -30,7 +30,7 @@ SECTIONS *(.data .data.* .gnu.linkonce.d.*) *(.dynbss) *(.bss .bss.* .gnu.linkonce.b.*) - } + } :text .note : { *(.note.*) } :text :note From fdc9be90929081ed5a9658583aa5f3121d5e8365 Mon Sep 17 00:00:00 2001 From: Li Ming Date: Tue, 3 Jun 2025 18:43:13 +0800 Subject: [PATCH 019/289] cxl/edac: Fix the min_scrub_cycle of a region miscalculation When trying to update the scrub_cycle value of a cxl region, which means updating the scrub_cycle value of each memdev under a cxl region. cxl driver needs to guarantee the new scrub_cycle value is greater than the min_scrub_cycle value of a memdev, otherwise the updating operation will fail(Per Table 8-223 in CXL r3.2 section 8.2.10.9.11.1). Current implementation logic of getting the min_scrub_cycle value of a cxl region is that getting the min_scrub_cycle value of each memdevs under the cxl region, then using the minimum min_scrub_cycle value as the region's min_scrub_cycle. Checking if the new scrub_cycle value is greater than this value. If yes, updating the new scrub_cycle value to each memdevs. The issue is that the new scrub_cycle value is possibly greater than the minimum min_scrub_cycle value of all memdevs but less than the maximum min_scrub_cycle value of all memdevs if memdevs have a different min_scrub_cycle value. The updating operation will always fail on these memdevs which have a greater min_scrub_cycle than the new scrub_cycle. The correct implementation logic is to get the maximum value of these memdevs' min_scrub_cycle, check if the new scrub_cycle value is greater than the value. If yes, the new scrub_cycle value is fit for the region. The change also impacts the result of cxl_patrol_scrub_get_min_scrub_cycle(), the interface returned the minimum min_scrub_cycle value among all memdevs under the region before the change. The interface will return the maximum min_scrub_cycle value among all memdevs under the region with the change. Signed-off-by: Li Ming Reviewed-by: Jonathan Cameron Reviewed-by: Dave Jiang Reviewed-by: Shiju Jose Reviewed-by: Davidlohr Bueso Link: https://patch.msgid.link/20250603104314.25569-1-ming.li@zohomail.com Signed-off-by: Dave Jiang --- drivers/cxl/core/edac.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/cxl/core/edac.c b/drivers/cxl/core/edac.c index 2cbc664e5d62..0ef245d0bd9f 100644 --- a/drivers/cxl/core/edac.c +++ b/drivers/cxl/core/edac.c @@ -103,10 +103,10 @@ static int cxl_scrub_get_attrbs(struct cxl_patrol_scrub_context *cxl_ps_ctx, u8 *cap, u16 *cycle, u8 *flags, u8 *min_cycle) { struct cxl_mailbox *cxl_mbox; - u8 min_scrub_cycle = U8_MAX; struct cxl_region_params *p; struct cxl_memdev *cxlmd; struct cxl_region *cxlr; + u8 min_scrub_cycle = 0; int i, ret; if (!cxl_ps_ctx->cxlr) { @@ -133,8 +133,12 @@ static int cxl_scrub_get_attrbs(struct cxl_patrol_scrub_context *cxl_ps_ctx, if (ret) return ret; + /* + * The min_scrub_cycle of a region is the max of minimum scrub + * cycles supported by memdevs that back the region. + */ if (min_cycle) - min_scrub_cycle = min(*min_cycle, min_scrub_cycle); + min_scrub_cycle = max(*min_cycle, min_scrub_cycle); } if (min_cycle) From 85cc50bfcb8b08c9304925b66cd2bc83c1c765bf Mon Sep 17 00:00:00 2001 From: Li Ming Date: Tue, 3 Jun 2025 18:43:14 +0800 Subject: [PATCH 020/289] cxl/Documentation: Add more description about min/max scrub cycle user can configurare scrub cycle for a region or a memory device via sysfs interface. Currently, these interfaces have not enough description for the return value. So adding return value description to these interfaces. Suggested-by: Alison Schofield Signed-off-by: Shiju Jose Signed-off-by: Li Ming Reviewed-by: Jonathan Cameron Reviewed-by: Dave Jiang Reviewed-by: Davidlohr Bueso Link: https://patch.msgid.link/20250603104314.25569-2-ming.li@zohomail.com Signed-off-by: Dave Jiang --- Documentation/ABI/testing/sysfs-edac-scrub | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) diff --git a/Documentation/ABI/testing/sysfs-edac-scrub b/Documentation/ABI/testing/sysfs-edac-scrub index c43be90deab4..ab6014743da5 100644 --- a/Documentation/ABI/testing/sysfs-edac-scrub +++ b/Documentation/ABI/testing/sysfs-edac-scrub @@ -49,6 +49,12 @@ Description: (RO) Supported minimum scrub cycle duration in seconds by the memory scrubber. + Device-based scrub: returns the minimum scrub cycle + supported by the memory device. + + Region-based scrub: returns the max of minimum scrub cycles + supported by individual memory devices that back the region. + What: /sys/bus/edac/devices//scrubX/max_cycle_duration Date: March 2025 KernelVersion: 6.15 @@ -57,6 +63,16 @@ Description: (RO) Supported maximum scrub cycle duration in seconds by the memory scrubber. + Device-based scrub: returns the maximum scrub cycle supported + by the memory device. + + Region-based scrub: returns the min of maximum scrub cycles + supported by individual memory devices that back the region. + + If the memory device does not provide maximum scrub cycle + information, return the maximum supported value of the scrub + cycle field. + What: /sys/bus/edac/devices//scrubX/current_cycle_duration Date: March 2025 KernelVersion: 6.15 From f3054152c12e2eed1e72704aff47b0ea58229584 Mon Sep 17 00:00:00 2001 From: Thomas Zeitlhofer Date: Mon, 19 May 2025 10:54:46 +0200 Subject: [PATCH 021/289] HID: wacom: fix crash in wacom_aes_battery_handler() Commit fd2a9b29dc9c ("HID: wacom: Remove AES power_supply after extended inactivity") introduced wacom_aes_battery_handler() which is scheduled as a delayed work (aes_battery_work). In wacom_remove(), aes_battery_work is not canceled. Consequently, if the device is removed while aes_battery_work is still pending, then hard crashes or "Oops: general protection fault..." are experienced when wacom_aes_battery_handler() is finally called. E.g., this happens with built-in USB devices after resume from hibernate when aes_battery_work was still pending at the time of hibernation. So, take care to cancel aes_battery_work in wacom_remove(). Fixes: fd2a9b29dc9c ("HID: wacom: Remove AES power_supply after extended inactivity") Signed-off-by: Thomas Zeitlhofer Acked-by: Ping Cheng Signed-off-by: Jiri Kosina --- drivers/hid/wacom_sys.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/hid/wacom_sys.c b/drivers/hid/wacom_sys.c index 1257131b1e34..9a57504e51a1 100644 --- a/drivers/hid/wacom_sys.c +++ b/drivers/hid/wacom_sys.c @@ -2905,6 +2905,7 @@ static void wacom_remove(struct hid_device *hdev) hid_hw_stop(hdev); cancel_delayed_work_sync(&wacom->init_work); + cancel_delayed_work_sync(&wacom->aes_battery_work); cancel_work_sync(&wacom->wireless_work); cancel_work_sync(&wacom->battery_work); cancel_work_sync(&wacom->remote_work); From 8d90d9872edae7e78c3a12b98e239bfaa66f3639 Mon Sep 17 00:00:00 2001 From: Charles Mirabile Date: Fri, 30 May 2025 17:14:22 -0400 Subject: [PATCH 022/289] riscv: fix runtime constant support for nommu kernels the `__runtime_fixup_32` function does not handle the case where `val` is zero correctly (as might occur when patching a nommu kernel and referring to a physical address below the 4GiB boundary whose upper 32 bits are all zero) because nothing in the existing logic prevents the code from taking the `else` branch of both nop-checks and emitting two `nop` instructions. This leaves random garbage in the register that is supposed to receive the upper 32 bits of the pointer instead of zero that when combined with the value for the lower 32 bits yields an invalid pointer and causes a kernel panic when that pointer is eventually accessed. The author clearly considered the fact that if the `lui` is converted into a `nop` that the second instruction needs to be adjusted to become an `li` instead of an `addi`, hence introducing the `addi_insn_mask` variable, but didn't follow that logic through fully to the case where the `else` branch executes. To fix it just adjust the logic to ensure that the second `else` branch is not taken if the first instruction will be patched to a `nop`. Fixes: a44fb5722199 ("riscv: Add runtime constant support") Signed-off-by: Charles Mirabile Reviewed-by: Charlie Jenkins Tested-by: Charlie Jenkins Link: https://lore.kernel.org/r/20250530211422.784415-2-cmirabil@redhat.com Signed-off-by: Palmer Dabbelt --- arch/riscv/include/asm/runtime-const.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/riscv/include/asm/runtime-const.h b/arch/riscv/include/asm/runtime-const.h index 451fd76b8811..d766e2b9e6df 100644 --- a/arch/riscv/include/asm/runtime-const.h +++ b/arch/riscv/include/asm/runtime-const.h @@ -206,7 +206,7 @@ static inline void __runtime_fixup_32(__le16 *lui_parcel, __le16 *addi_parcel, u addi_insn_mask &= 0x07fff; } - if (lower_immediate & 0x00000fff) { + if (lower_immediate & 0x00000fff || lui_insn == RISCV_INSN_NOP4) { /* replace upper 12 bits of addi with lower 12 bits of val */ addi_insn &= addi_insn_mask; addi_insn |= (lower_immediate & 0x00000fff) << 20; From ed2a6ff0234e21cd5e56b63d8bff80120bbe5f15 Mon Sep 17 00:00:00 2001 From: "Rob Herring (Arm)" Date: Thu, 1 May 2025 20:14:26 -0500 Subject: [PATCH 023/289] dt-bindings: serial: Convert altr,juart-1.0 to DT schema Convert the Altera JTAG UART binding to DT schema. The "ALTR,juart-1.0" compatible has long been deprecated, so drop it. Signed-off-by: Rob Herring (Arm) --- .../bindings/serial/altera_jtaguart.txt | 5 ----- .../bindings/serial/altr,juart-1.0.yaml | 19 +++++++++++++++++++ 2 files changed, 19 insertions(+), 5 deletions(-) delete mode 100644 Documentation/devicetree/bindings/serial/altera_jtaguart.txt create mode 100644 Documentation/devicetree/bindings/serial/altr,juart-1.0.yaml diff --git a/Documentation/devicetree/bindings/serial/altera_jtaguart.txt b/Documentation/devicetree/bindings/serial/altera_jtaguart.txt deleted file mode 100644 index 55a901051e8f..000000000000 --- a/Documentation/devicetree/bindings/serial/altera_jtaguart.txt +++ /dev/null @@ -1,5 +0,0 @@ -Altera JTAG UART - -Required properties: -- compatible : should be "ALTR,juart-1.0" -- compatible : should be "altr,juart-1.0" diff --git a/Documentation/devicetree/bindings/serial/altr,juart-1.0.yaml b/Documentation/devicetree/bindings/serial/altr,juart-1.0.yaml new file mode 100644 index 000000000000..02e20fa591da --- /dev/null +++ b/Documentation/devicetree/bindings/serial/altr,juart-1.0.yaml @@ -0,0 +1,19 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/serial/altr,juart-1.0.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Altera JTAG UART + +maintainers: + - Dinh Nguyen + +properties: + compatible: + const: altr,juart-1.0 + +required: + - compatible + +additionalProperties: false From f75794b6077ec729f57de9a1ad24f14d288a68bb Mon Sep 17 00:00:00 2001 From: "Rob Herring (Arm)" Date: Thu, 1 May 2025 20:15:40 -0500 Subject: [PATCH 024/289] dt-bindings: serial: Convert altr,uart-1.0 to DT schema Convert the Altera JTAG UART binding to DT schema. The "ALTR,uart-1.0" compatible has long been deprecated, so drop it. Signed-off-by: Rob Herring (Arm) --- .../bindings/serial/altera_uart.txt | 8 ------ .../bindings/serial/altr,uart-1.0.yaml | 25 +++++++++++++++++++ 2 files changed, 25 insertions(+), 8 deletions(-) delete mode 100644 Documentation/devicetree/bindings/serial/altera_uart.txt create mode 100644 Documentation/devicetree/bindings/serial/altr,uart-1.0.yaml diff --git a/Documentation/devicetree/bindings/serial/altera_uart.txt b/Documentation/devicetree/bindings/serial/altera_uart.txt deleted file mode 100644 index 81bf7ffb1a81..000000000000 --- a/Documentation/devicetree/bindings/serial/altera_uart.txt +++ /dev/null @@ -1,8 +0,0 @@ -Altera UART - -Required properties: -- compatible : should be "ALTR,uart-1.0" -- compatible : should be "altr,uart-1.0" - -Optional properties: -- clock-frequency : frequency of the clock input to the UART diff --git a/Documentation/devicetree/bindings/serial/altr,uart-1.0.yaml b/Documentation/devicetree/bindings/serial/altr,uart-1.0.yaml new file mode 100644 index 000000000000..72d4972e1e22 --- /dev/null +++ b/Documentation/devicetree/bindings/serial/altr,uart-1.0.yaml @@ -0,0 +1,25 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/serial/altr,uart-1.0.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Altera UART + +maintainers: + - Dinh Nguyen + +allOf: + - $ref: /schemas/serial/serial.yaml# + +properties: + compatible: + const: altr,uart-1.0 + + clock-frequency: + description: Frequency of the clock input to the UART. + +required: + - compatible + +unevaluatedProperties: false From b26852daaa83f535109253d114426d1fa674155d Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Tue, 10 Jun 2025 11:28:42 +0200 Subject: [PATCH 025/289] RDMA/mlx5: reduce stack usage in mlx5_ib_ufile_hw_cleanup This function has an array of eight mlx5_async_cmd structures, which often fits on the stack, but depending on the configuration can end up blowing the stack frame warning limit: drivers/infiniband/hw/mlx5/devx.c:2670:6: error: stack frame size (1392) exceeds limit (1280) in 'mlx5_ib_ufile_hw_cleanup' [-Werror,-Wframe-larger-than] Change this to a dynamic allocation instead. While a kmalloc() can theoretically fail, a GFP_KERNEL allocation under a page will block until memory has been freed up, so in the worst case, this only adds extra time in an already constrained environment. Fixes: 7c891a4dbcc1 ("RDMA/mlx5: Add implementation for ufile_hw_cleanup device operation") Signed-off-by: Arnd Bergmann Link: https://patch.msgid.link/20250610092846.2642535-1-arnd@kernel.org Signed-off-by: Leon Romanovsky --- drivers/infiniband/hw/mlx5/devx.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/infiniband/hw/mlx5/devx.c b/drivers/infiniband/hw/mlx5/devx.c index 2479da8620ca..bceae1c1f980 100644 --- a/drivers/infiniband/hw/mlx5/devx.c +++ b/drivers/infiniband/hw/mlx5/devx.c @@ -2669,7 +2669,7 @@ static void devx_wait_async_destroy(struct mlx5_async_cmd *cmd) void mlx5_ib_ufile_hw_cleanup(struct ib_uverbs_file *ufile) { - struct mlx5_async_cmd async_cmd[MAX_ASYNC_CMDS]; + struct mlx5_async_cmd *async_cmd; struct ib_ucontext *ucontext = ufile->ucontext; struct ib_device *device = ucontext->device; struct mlx5_ib_dev *dev = to_mdev(device); @@ -2678,6 +2678,10 @@ void mlx5_ib_ufile_hw_cleanup(struct ib_uverbs_file *ufile) int head = 0; int tail = 0; + async_cmd = kcalloc(MAX_ASYNC_CMDS, sizeof(*async_cmd), GFP_KERNEL); + if (!async_cmd) + return; + list_for_each_entry(uobject, &ufile->uobjects, list) { WARN_ON(uverbs_try_lock_object(uobject, UVERBS_LOOKUP_WRITE)); @@ -2713,6 +2717,8 @@ void mlx5_ib_ufile_hw_cleanup(struct ib_uverbs_file *ufile) devx_wait_async_destroy(&async_cmd[head % MAX_ASYNC_CMDS]); head++; } + + kfree(async_cmd); } static ssize_t devx_async_cmd_event_read(struct file *filp, char __user *buf, From 4262bd0d9cc704ea1365ac00afc1272400c2cbef Mon Sep 17 00:00:00 2001 From: Han Gao Date: Fri, 23 May 2025 18:25:56 +0800 Subject: [PATCH 026/289] riscv: vector: Fix context save/restore with xtheadvector Previously only v0-v7 were correctly saved/restored, and the context of v8-v31 are damanged. Correctly save/restore v8-v31 to avoid breaking userspace. Fixes: d863910eabaf ("riscv: vector: Support xtheadvector save/restore") Cc: stable@vger.kernel.org Signed-off-by: Han Gao Tested-by: Xiongchuan Tan Reviewed-by: Charlie Jenkins Reviewed-by: Yanteng Si Reviewed-by: Andy Chiu Link: https://lore.kernel.org/r/9b9eb2337f3d5336ce813721f8ebea51e0b2b553.1747994822.git.rabenda.cn@gmail.com Signed-off-by: Alexandre Ghiti Signed-off-by: Palmer Dabbelt --- arch/riscv/include/asm/vector.h | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/arch/riscv/include/asm/vector.h b/arch/riscv/include/asm/vector.h index 45c9b426fcc5..b61786d43c20 100644 --- a/arch/riscv/include/asm/vector.h +++ b/arch/riscv/include/asm/vector.h @@ -205,11 +205,11 @@ static inline void __riscv_v_vstate_save(struct __riscv_v_ext_state *save_to, THEAD_VSETVLI_T4X0E8M8D1 THEAD_VSB_V_V0T0 "add t0, t0, t4\n\t" - THEAD_VSB_V_V0T0 + THEAD_VSB_V_V8T0 "add t0, t0, t4\n\t" - THEAD_VSB_V_V0T0 + THEAD_VSB_V_V16T0 "add t0, t0, t4\n\t" - THEAD_VSB_V_V0T0 + THEAD_VSB_V_V24T0 : : "r" (datap) : "memory", "t0", "t4"); } else { asm volatile ( @@ -241,11 +241,11 @@ static inline void __riscv_v_vstate_restore(struct __riscv_v_ext_state *restore_ THEAD_VSETVLI_T4X0E8M8D1 THEAD_VLB_V_V0T0 "add t0, t0, t4\n\t" - THEAD_VLB_V_V0T0 + THEAD_VLB_V_V8T0 "add t0, t0, t4\n\t" - THEAD_VLB_V_V0T0 + THEAD_VLB_V_V16T0 "add t0, t0, t4\n\t" - THEAD_VLB_V_V0T0 + THEAD_VLB_V_V24T0 : : "r" (datap) : "memory", "t0", "t4"); } else { asm volatile ( From 2b9518684f8558cd61a7c608cc03a27822cf7b03 Mon Sep 17 00:00:00 2001 From: Xi Ruoyao Date: Fri, 6 Jun 2025 17:24:44 +0800 Subject: [PATCH 027/289] RISC-V: vDSO: Correct inline assembly constraints in the getrandom syscall wrapper MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit As recently pointed out by Thomas, if a register is forced for two different register variables, among them one is used as "+" (both input and output) and another is only used as input, Clang would treat the conflicting input parameters as undefined behaviour and optimize away the argument assignment. Instead use "=r" (only output) for the output parameter and "r" (only input) for the input parameter. While the example from the GCC documentation uses "0" for the input parameter, this is not necessary as confirmed by the GCC developers and "r" matches what the other architectures' vDSO implementations are using. [ alex: Update log to match v2 (Thomas) ] Link: https://lore.kernel.org/all/20250603-loongarch-vdso-syscall-v1-1-6d12d6dfbdd0@linutronix.de/ Link: https://gcc.gnu.org/onlinedocs/gcc-15.1.0/gcc/Local-Register-Variables.html Link: https://gcc.gnu.org/pipermail/gcc-help/2025-June/144266.html Cc: Thomas Weißschuh Cc: Nathan Chancellor Signed-off-by: Xi Ruoyao Reviewed-by: Thomas Weißschuh Fixes: ee0d03053e70 ("RISC-V: vDSO: Wire up getrandom() vDSO") Link: https://lore.kernel.org/r/20250606092443.73650-2-xry111@xry111.site Signed-off-by: Alexandre Ghiti Signed-off-by: Palmer Dabbelt --- arch/riscv/include/asm/vdso/getrandom.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/riscv/include/asm/vdso/getrandom.h b/arch/riscv/include/asm/vdso/getrandom.h index 8dc92441702a..c6d66895c1f5 100644 --- a/arch/riscv/include/asm/vdso/getrandom.h +++ b/arch/riscv/include/asm/vdso/getrandom.h @@ -18,7 +18,7 @@ static __always_inline ssize_t getrandom_syscall(void *_buffer, size_t _len, uns register unsigned int flags asm("a2") = _flags; asm volatile ("ecall\n" - : "+r" (ret) + : "=r" (ret) : "r" (nr), "r" (buffer), "r" (len), "r" (flags) : "memory"); From bc75552b80e6683b2def5a0459433607ea4788f5 Mon Sep 17 00:00:00 2001 From: Chunyan Zhang Date: Tue, 10 Jun 2025 18:12:32 +0800 Subject: [PATCH 028/289] raid6: riscv: Fix NULL pointer dereference caused by a missing clobber When running the raid6 user-space test program on RISC-V QEMU, there's a segmentation fault which seems caused by accessing a NULL pointer, which is the pointer variable p/q in raid6_rvv*_gen/xor_syndrome_real(), p/q should have been equal to dptr[x], but when I use GDB command to see its value, which was 0x10 like below: " Program received signal SIGSEGV, Segmentation fault. 0x0000000000011062 in raid6_rvv2_xor_syndrome_real (disks=, start=0, stop=, bytes=4096, ptrs=) at rvv.c:386 (gdb) p p $1 = (u8 *) 0x10 " The issue was found to be related with: 1) Compile optimization There's no segmentation fault if compiling the raid6test program with the optimization flag -O0. 2) The RISC-V vector command vsetvli If not used t0 as the first parameter in vsetvli, there's no segmentation fault either. This patch selects the 2nd solution to fix the issue. [Palmer: The actual issue here is a missing clobber in the vsetvli code. It's a little tricky: we've already probed for VLENB so we don't need to look at the output register, we just need to have an X register in the instruction as that's the form required to actually set VL. Thus we clobber a register, and without describing that we end up breaking compilers.] Fixes: 6093faaf9593 ("raid6: Add RISC-V SIMD syndrome and recovery calculations") Signed-off-by: Chunyan Zhang Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20250610101234.1100660-3-zhangchunyan@iscas.ac.cn Signed-off-by: Palmer Dabbelt --- lib/raid6/rvv.c | 48 ++++++++++++++++++++++++++++-------------------- 1 file changed, 28 insertions(+), 20 deletions(-) diff --git a/lib/raid6/rvv.c b/lib/raid6/rvv.c index f0887344b274..7d82efa5b14f 100644 --- a/lib/raid6/rvv.c +++ b/lib/raid6/rvv.c @@ -26,9 +26,9 @@ static int rvv_has_vector(void) static void raid6_rvv1_gen_syndrome_real(int disks, unsigned long bytes, void **ptrs) { u8 **dptr = (u8 **)ptrs; - unsigned long d; - int z, z0; u8 *p, *q; + unsigned long vl, d; + int z, z0; z0 = disks - 3; /* Highest data disk */ p = dptr[z0 + 1]; /* XOR parity */ @@ -36,8 +36,9 @@ static void raid6_rvv1_gen_syndrome_real(int disks, unsigned long bytes, void ** asm volatile (".option push\n" ".option arch,+v\n" - "vsetvli t0, x0, e8, m1, ta, ma\n" + "vsetvli %0, x0, e8, m1, ta, ma\n" ".option pop\n" + : "=&r" (vl) ); /* v0:wp0, v1:wq0, v2:wd0/w20, v3:w10 */ @@ -99,7 +100,7 @@ static void raid6_rvv1_xor_syndrome_real(int disks, int start, int stop, { u8 **dptr = (u8 **)ptrs; u8 *p, *q; - unsigned long d; + unsigned long vl, d; int z, z0; z0 = stop; /* P/Q right side optimization */ @@ -108,8 +109,9 @@ static void raid6_rvv1_xor_syndrome_real(int disks, int start, int stop, asm volatile (".option push\n" ".option arch,+v\n" - "vsetvli t0, x0, e8, m1, ta, ma\n" + "vsetvli %0, x0, e8, m1, ta, ma\n" ".option pop\n" + : "=&r" (vl) ); /* v0:wp0, v1:wq0, v2:wd0/w20, v3:w10 */ @@ -195,9 +197,9 @@ static void raid6_rvv1_xor_syndrome_real(int disks, int start, int stop, static void raid6_rvv2_gen_syndrome_real(int disks, unsigned long bytes, void **ptrs) { u8 **dptr = (u8 **)ptrs; - unsigned long d; - int z, z0; u8 *p, *q; + unsigned long vl, d; + int z, z0; z0 = disks - 3; /* Highest data disk */ p = dptr[z0 + 1]; /* XOR parity */ @@ -205,8 +207,9 @@ static void raid6_rvv2_gen_syndrome_real(int disks, unsigned long bytes, void ** asm volatile (".option push\n" ".option arch,+v\n" - "vsetvli t0, x0, e8, m1, ta, ma\n" + "vsetvli %0, x0, e8, m1, ta, ma\n" ".option pop\n" + : "=&r" (vl) ); /* @@ -287,7 +290,7 @@ static void raid6_rvv2_xor_syndrome_real(int disks, int start, int stop, { u8 **dptr = (u8 **)ptrs; u8 *p, *q; - unsigned long d; + unsigned long vl, d; int z, z0; z0 = stop; /* P/Q right side optimization */ @@ -296,8 +299,9 @@ static void raid6_rvv2_xor_syndrome_real(int disks, int start, int stop, asm volatile (".option push\n" ".option arch,+v\n" - "vsetvli t0, x0, e8, m1, ta, ma\n" + "vsetvli %0, x0, e8, m1, ta, ma\n" ".option pop\n" + : "=&r" (vl) ); /* @@ -413,9 +417,9 @@ static void raid6_rvv2_xor_syndrome_real(int disks, int start, int stop, static void raid6_rvv4_gen_syndrome_real(int disks, unsigned long bytes, void **ptrs) { u8 **dptr = (u8 **)ptrs; - unsigned long d; - int z, z0; u8 *p, *q; + unsigned long vl, d; + int z, z0; z0 = disks - 3; /* Highest data disk */ p = dptr[z0 + 1]; /* XOR parity */ @@ -423,8 +427,9 @@ static void raid6_rvv4_gen_syndrome_real(int disks, unsigned long bytes, void ** asm volatile (".option push\n" ".option arch,+v\n" - "vsetvli t0, x0, e8, m1, ta, ma\n" + "vsetvli %0, x0, e8, m1, ta, ma\n" ".option pop\n" + : "=&r" (vl) ); /* @@ -539,7 +544,7 @@ static void raid6_rvv4_xor_syndrome_real(int disks, int start, int stop, { u8 **dptr = (u8 **)ptrs; u8 *p, *q; - unsigned long d; + unsigned long vl, d; int z, z0; z0 = stop; /* P/Q right side optimization */ @@ -548,8 +553,9 @@ static void raid6_rvv4_xor_syndrome_real(int disks, int start, int stop, asm volatile (".option push\n" ".option arch,+v\n" - "vsetvli t0, x0, e8, m1, ta, ma\n" + "vsetvli %0, x0, e8, m1, ta, ma\n" ".option pop\n" + : "=&r" (vl) ); /* @@ -721,9 +727,9 @@ static void raid6_rvv4_xor_syndrome_real(int disks, int start, int stop, static void raid6_rvv8_gen_syndrome_real(int disks, unsigned long bytes, void **ptrs) { u8 **dptr = (u8 **)ptrs; - unsigned long d; - int z, z0; u8 *p, *q; + unsigned long vl, d; + int z, z0; z0 = disks - 3; /* Highest data disk */ p = dptr[z0 + 1]; /* XOR parity */ @@ -731,8 +737,9 @@ static void raid6_rvv8_gen_syndrome_real(int disks, unsigned long bytes, void ** asm volatile (".option push\n" ".option arch,+v\n" - "vsetvli t0, x0, e8, m1, ta, ma\n" + "vsetvli %0, x0, e8, m1, ta, ma\n" ".option pop\n" + : "=&r" (vl) ); /* @@ -915,7 +922,7 @@ static void raid6_rvv8_xor_syndrome_real(int disks, int start, int stop, { u8 **dptr = (u8 **)ptrs; u8 *p, *q; - unsigned long d; + unsigned long vl, d; int z, z0; z0 = stop; /* P/Q right side optimization */ @@ -924,8 +931,9 @@ static void raid6_rvv8_xor_syndrome_real(int disks, int start, int stop, asm volatile (".option push\n" ".option arch,+v\n" - "vsetvli t0, x0, e8, m1, ta, ma\n" + "vsetvli %0, x0, e8, m1, ta, ma\n" ".option pop\n" + : "=&r" (vl) ); /* From 2aa5801ada29948ce510fc8b1e3b3ec8162423e2 Mon Sep 17 00:00:00 2001 From: Palmer Dabbelt Date: Tue, 10 Jun 2025 14:30:58 -0700 Subject: [PATCH 029/289] RISC-V: uaccess: Wrap the get_user_8 uaccess macro I must have lost this rebasing things during the merge window, I know I got it at some point but it's not here now. Without this I get warnings along the lines of include/linux/fs.h:3975:15: warning: label followed by a declaration is a C23 extension [-Wc23-extensions] 3975 | if (unlikely(get_user(c, path))) | ^ arch/riscv/include/asm/uaccess.h:274:3: note: expanded from macro 'get_user' 274 | __get_user((x), __p) : \ | ^ arch/riscv/include/asm/uaccess.h:244:2: note: expanded from macro '__get_user' 244 | __get_user_error(__gu_val, __gu_ptr, __gu_err); \ | ^ arch/riscv/include/asm/uaccess.h:207:2: note: expanded from macro '__get_user_error' 207 | __ge LD [M] net/802/psnap.ko t_user_nocheck(x, ptr, __gu_failed); \ | ^ arch/riscv/include/asm/uaccess.h:196:3: note: expanded from macro '__get_user_nocheck' 196 | __get_user_8((x), __gu_ptr, label); \ | ^ arch/riscv/include/asm/uaccess.h:130:2: note: expanded from macro '__get_user_8' 130 | u32 __user *__ptr = (u32 __user *)(ptr); \ | ^ Link: https://lore.kernel.org/r/20250610213058.24852-1-palmer@dabbelt.com Reviewed-by: Alexandre Ghiti Cc: stable@vger.kernel.org Fixes: f6bff7827a48 ("riscv: uaccess: use 'asm_goto_output' for get_user()") Signed-off-by: Palmer Dabbelt --- arch/riscv/include/asm/uaccess.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/riscv/include/asm/uaccess.h b/arch/riscv/include/asm/uaccess.h index d472da4450e6..525e50db24f7 100644 --- a/arch/riscv/include/asm/uaccess.h +++ b/arch/riscv/include/asm/uaccess.h @@ -127,6 +127,7 @@ do { \ #ifdef CONFIG_CC_HAS_ASM_GOTO_OUTPUT #define __get_user_8(x, ptr, label) \ +do { \ u32 __user *__ptr = (u32 __user *)(ptr); \ u32 __lo, __hi; \ asm_goto_output( \ @@ -141,7 +142,7 @@ do { \ : : label); \ (x) = (__typeof__(x))((__typeof__((x) - (x)))( \ (((u64)__hi << 32) | __lo))); \ - +} while (0) #else /* !CONFIG_CC_HAS_ASM_GOTO_OUTPUT */ #define __get_user_8(x, ptr, label) \ do { \ From a403fe6c0b17f472e01246eb350f5eef105243ac Mon Sep 17 00:00:00 2001 From: Li Ming Date: Fri, 13 Jun 2025 09:16:48 +0800 Subject: [PATCH 030/289] cxl/edac: Fix potential memory leak issues In cxl_store_rec_gen_media() and cxl_store_rec_dram(), use kmemdup() to duplicate a cxl gen_media/dram event to store the event in a xarray by xa_store(). The cxl gen_media/dram event allocated by kmemdup() should be freed in the case that the xa_store() fails. Fixes: 0b5ccb0de1e2 ("cxl/edac: Support for finding memory operation attributes from the current boot") Signed-off-by: Li Ming Tested-by: Shiju Jose Reviewed-by: Shiju Jose Reviewed-by: Ira Weiny Reviewed-by: Jonathan Cameron Link: https://patch.msgid.link/20250613011648.102840-1-ming.li@zohomail.com Signed-off-by: Dave Jiang --- drivers/cxl/core/edac.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/cxl/core/edac.c b/drivers/cxl/core/edac.c index 0ef245d0bd9f..d725ee954199 100644 --- a/drivers/cxl/core/edac.c +++ b/drivers/cxl/core/edac.c @@ -1103,8 +1103,10 @@ int cxl_store_rec_gen_media(struct cxl_memdev *cxlmd, union cxl_event *evt) old_rec = xa_store(&array_rec->rec_gen_media, le64_to_cpu(rec->media_hdr.phys_addr), rec, GFP_KERNEL); - if (xa_is_err(old_rec)) + if (xa_is_err(old_rec)) { + kfree(rec); return xa_err(old_rec); + } kfree(old_rec); @@ -1131,8 +1133,10 @@ int cxl_store_rec_dram(struct cxl_memdev *cxlmd, union cxl_event *evt) old_rec = xa_store(&array_rec->rec_dram, le64_to_cpu(rec->media_hdr.phys_addr), rec, GFP_KERNEL); - if (xa_is_err(old_rec)) + if (xa_is_err(old_rec)) { + kfree(rec); return xa_err(old_rec); + } kfree(old_rec); From 3c70ec71abdaf4e4fa48cd8fdfbbd864d78235a8 Mon Sep 17 00:00:00 2001 From: Dan Williams Date: Thu, 12 Jun 2025 12:20:43 -0700 Subject: [PATCH 031/289] cxl/ras: Fix CPER handler device confusion By inspection, cxl_cper_handle_prot_err() is making a series of fragile assumptions that can lead to crashes: 1/ It assumes that endpoints identified in the record are a CXL-type-3 device, nothing guarantees that. 2/ It assumes that the device is bound to the cxl_pci driver, nothing guarantees that. 3/ Minor, it holds the device lock over the switch-port tracing for no reason as the trace is 100% generated from data in the record. Correct those by checking that the PCIe endpoint parents a cxl_memdev before assuming the format of the driver data, and move the lock to where it is required. Consequently this also makes the implementation ready for CXL accelerators that are not bound to cxl_pci. Fixes: 36f257e3b0ba ("acpi/ghes, cxl/pci: Process CXL CPER Protocol Errors") Cc: Terry Bowman Cc: Li Ming Cc: Alison Schofield Cc: Ira Weiny Cc: Tony Luck Reviewed-by: Smita Koralahalli Reviewed-by: Dave Jiang Signed-off-by: Dan Williams Reviewed-by: Jonathan Cameron Reviewed-by: Li Ming Link: https://patch.msgid.link/20250612192043.2254617-1-dan.j.williams@intel.com Signed-off-by: Dave Jiang --- drivers/cxl/core/ras.c | 47 ++++++++++++++++++++++++------------------ 1 file changed, 27 insertions(+), 20 deletions(-) diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c index 485a831695c7..2731ba3a0799 100644 --- a/drivers/cxl/core/ras.c +++ b/drivers/cxl/core/ras.c @@ -31,40 +31,38 @@ static void cxl_cper_trace_uncorr_port_prot_err(struct pci_dev *pdev, ras_cap.header_log); } -static void cxl_cper_trace_corr_prot_err(struct pci_dev *pdev, - struct cxl_ras_capability_regs ras_cap) +static void cxl_cper_trace_corr_prot_err(struct cxl_memdev *cxlmd, + struct cxl_ras_capability_regs ras_cap) { u32 status = ras_cap.cor_status & ~ras_cap.cor_mask; - struct cxl_dev_state *cxlds; - cxlds = pci_get_drvdata(pdev); - if (!cxlds) - return; - - trace_cxl_aer_correctable_error(cxlds->cxlmd, status); + trace_cxl_aer_correctable_error(cxlmd, status); } -static void cxl_cper_trace_uncorr_prot_err(struct pci_dev *pdev, - struct cxl_ras_capability_regs ras_cap) +static void +cxl_cper_trace_uncorr_prot_err(struct cxl_memdev *cxlmd, + struct cxl_ras_capability_regs ras_cap) { u32 status = ras_cap.uncor_status & ~ras_cap.uncor_mask; - struct cxl_dev_state *cxlds; u32 fe; - cxlds = pci_get_drvdata(pdev); - if (!cxlds) - return; - if (hweight32(status) > 1) fe = BIT(FIELD_GET(CXL_RAS_CAP_CONTROL_FE_MASK, ras_cap.cap_control)); else fe = status; - trace_cxl_aer_uncorrectable_error(cxlds->cxlmd, status, fe, + trace_cxl_aer_uncorrectable_error(cxlmd, status, fe, ras_cap.header_log); } +static int match_memdev_by_parent(struct device *dev, const void *uport) +{ + if (is_cxl_memdev(dev) && dev->parent == uport) + return 1; + return 0; +} + static void cxl_cper_handle_prot_err(struct cxl_cper_prot_err_work_data *data) { unsigned int devfn = PCI_DEVFN(data->prot_err.agent_addr.device, @@ -73,13 +71,12 @@ static void cxl_cper_handle_prot_err(struct cxl_cper_prot_err_work_data *data) pci_get_domain_bus_and_slot(data->prot_err.agent_addr.segment, data->prot_err.agent_addr.bus, devfn); + struct cxl_memdev *cxlmd; int port_type; if (!pdev) return; - guard(device)(&pdev->dev); - port_type = pci_pcie_type(pdev); if (port_type == PCI_EXP_TYPE_ROOT_PORT || port_type == PCI_EXP_TYPE_DOWNSTREAM || @@ -92,10 +89,20 @@ static void cxl_cper_handle_prot_err(struct cxl_cper_prot_err_work_data *data) return; } + guard(device)(&pdev->dev); + if (!pdev->dev.driver) + return; + + struct device *mem_dev __free(put_device) = bus_find_device( + &cxl_bus_type, NULL, pdev, match_memdev_by_parent); + if (!mem_dev) + return; + + cxlmd = to_cxl_memdev(mem_dev); if (data->severity == AER_CORRECTABLE) - cxl_cper_trace_corr_prot_err(pdev, data->ras_cap); + cxl_cper_trace_corr_prot_err(cxlmd, data->ras_cap); else - cxl_cper_trace_uncorr_prot_err(pdev, data->ras_cap); + cxl_cper_trace_uncorr_prot_err(cxlmd, data->ras_cap); } static void cxl_cper_prot_err_work_fn(struct work_struct *work) From 0f4dd2ce352d38c7ecf1b3821c908816eb6376a7 Mon Sep 17 00:00:00 2001 From: Kent Overstreet Date: Thu, 12 Jun 2025 18:26:48 -0400 Subject: [PATCH 032/289] bcachefs: trace_extent_trim_atomic Add a tracepoint for when we insert only part of an extent, due to too many overwrites. Signed-off-by: Kent Overstreet --- fs/bcachefs/extent_update.c | 13 ++++++++++++- fs/bcachefs/trace.h | 5 +++++ 2 files changed, 17 insertions(+), 1 deletion(-) diff --git a/fs/bcachefs/extent_update.c b/fs/bcachefs/extent_update.c index b899ee75f5b9..e76e58a568bf 100644 --- a/fs/bcachefs/extent_update.c +++ b/fs/bcachefs/extent_update.c @@ -139,6 +139,17 @@ int bch2_extent_trim_atomic(struct btree_trans *trans, if (ret) return ret; - bch2_cut_back(end, k); + /* tracepoint */ + + if (bpos_lt(end, k->k.p)) { + if (trace_extent_trim_atomic_enabled()) { + CLASS(printbuf, buf)(); + bch2_bpos_to_text(&buf, end); + prt_newline(&buf); + bch2_bkey_val_to_text(&buf, trans->c, bkey_i_to_s_c(k)); + trace_extent_trim_atomic(trans->c, buf.buf); + } + bch2_cut_back(end, k); + } return 0; } diff --git a/fs/bcachefs/trace.h b/fs/bcachefs/trace.h index dc09532796af..41efebdd06ef 100644 --- a/fs/bcachefs/trace.h +++ b/fs/bcachefs/trace.h @@ -1490,6 +1490,11 @@ DEFINE_EVENT(fs_str, io_move_evacuate_bucket, TP_ARGS(c, str) ); +DEFINE_EVENT(fs_str, extent_trim_atomic, + TP_PROTO(struct bch_fs *c, const char *str), + TP_ARGS(c, str) +); + #ifdef CONFIG_BCACHEFS_PATH_TRACEPOINTS TRACE_EVENT(update_by_path, From 3bd6f8aeae3d3f8121cbae5a8650a46622aa4e07 Mon Sep 17 00:00:00 2001 From: Kent Overstreet Date: Thu, 12 Jun 2025 18:27:37 -0400 Subject: [PATCH 033/289] bcachefs: btree iter tracepoints We've been seeing some livelock-ish behavior in the index update part of the main write path, and while we've got low level btree path tracepoints, we've been lacking high level btree iterator tracepoints. Signed-off-by: Kent Overstreet --- fs/bcachefs/btree_iter.c | 67 +++++++++++++++++++++++++++++++++++++--- fs/bcachefs/trace.h | 20 ++++++++++++ 2 files changed, 82 insertions(+), 5 deletions(-) diff --git a/fs/bcachefs/btree_iter.c b/fs/bcachefs/btree_iter.c index b78403376c07..b586ecf2fdfa 100644 --- a/fs/bcachefs/btree_iter.c +++ b/fs/bcachefs/btree_iter.c @@ -2326,6 +2326,20 @@ static struct bkey_s_c __bch2_btree_iter_peek(struct btree_trans *trans, struct } bch2_btree_iter_verify(trans, iter); + + if (trace___btree_iter_peek_enabled()) { + CLASS(printbuf, buf)(); + + int ret = bkey_err(k); + if (ret) + prt_str(&buf, bch2_err_str(ret)); + else if (k.k) + bch2_bkey_val_to_text(&buf, trans->c, k); + else + prt_str(&buf, "(null)"); + trace___btree_iter_peek(trans->c, buf.buf); + } + return k; } @@ -2484,6 +2498,19 @@ out_no_locked: bch2_btree_iter_verify_entry_exit(iter); + if (trace_btree_iter_peek_max_enabled()) { + CLASS(printbuf, buf)(); + + int ret = bkey_err(k); + if (ret) + prt_str(&buf, bch2_err_str(ret)); + else if (k.k) + bch2_bkey_val_to_text(&buf, trans->c, k); + else + prt_str(&buf, "(null)"); + trace_btree_iter_peek_max(trans->c, buf.buf); + } + return k; end: bch2_btree_iter_set_pos(trans, iter, end); @@ -2724,6 +2751,19 @@ out_no_locked: bch2_btree_iter_verify_entry_exit(iter); bch2_btree_iter_verify(trans, iter); + + if (trace_btree_iter_peek_prev_min_enabled()) { + CLASS(printbuf, buf)(); + + int ret = bkey_err(k); + if (ret) + prt_str(&buf, bch2_err_str(ret)); + else if (k.k) + bch2_bkey_val_to_text(&buf, trans->c, k); + else + prt_str(&buf, "(null)"); + trace_btree_iter_peek_prev_min(trans->c, buf.buf); + } return k; end: bch2_btree_iter_set_pos(trans, iter, end); @@ -2767,8 +2807,10 @@ struct bkey_s_c bch2_btree_iter_peek_slot(struct btree_trans *trans, struct btre /* extents can't span inode numbers: */ if ((iter->flags & BTREE_ITER_is_extents) && unlikely(iter->pos.offset == KEY_OFFSET_MAX)) { - if (iter->pos.inode == KEY_INODE_MAX) - return bkey_s_c_null; + if (iter->pos.inode == KEY_INODE_MAX) { + k = bkey_s_c_null; + goto out2; + } bch2_btree_iter_set_pos(trans, iter, bpos_nosnap_successor(iter->pos)); } @@ -2785,8 +2827,10 @@ struct bkey_s_c bch2_btree_iter_peek_slot(struct btree_trans *trans, struct btre } struct btree_path *path = btree_iter_path(trans, iter); - if (unlikely(!btree_path_node(path, path->level))) - return bkey_s_c_null; + if (unlikely(!btree_path_node(path, path->level))) { + k = bkey_s_c_null; + goto out2; + } btree_path_set_should_be_locked(trans, path); @@ -2879,7 +2923,20 @@ out: bch2_btree_iter_verify(trans, iter); ret = bch2_btree_iter_verify_ret(trans, iter, k); if (unlikely(ret)) - return bkey_s_c_err(ret); + k = bkey_s_c_err(ret); +out2: + if (trace_btree_iter_peek_slot_enabled()) { + CLASS(printbuf, buf)(); + + int ret = bkey_err(k); + if (ret) + prt_str(&buf, bch2_err_str(ret)); + else if (k.k) + bch2_bkey_val_to_text(&buf, trans->c, k); + else + prt_str(&buf, "(null)"); + trace_btree_iter_peek_slot(trans->c, buf.buf); + } return k; } diff --git a/fs/bcachefs/trace.h b/fs/bcachefs/trace.h index 41efebdd06ef..e759c9ff3965 100644 --- a/fs/bcachefs/trace.h +++ b/fs/bcachefs/trace.h @@ -1495,6 +1495,26 @@ DEFINE_EVENT(fs_str, extent_trim_atomic, TP_ARGS(c, str) ); +DEFINE_EVENT(fs_str, btree_iter_peek_slot, + TP_PROTO(struct bch_fs *c, const char *str), + TP_ARGS(c, str) +); + +DEFINE_EVENT(fs_str, __btree_iter_peek, + TP_PROTO(struct bch_fs *c, const char *str), + TP_ARGS(c, str) +); + +DEFINE_EVENT(fs_str, btree_iter_peek_max, + TP_PROTO(struct bch_fs *c, const char *str), + TP_ARGS(c, str) +); + +DEFINE_EVENT(fs_str, btree_iter_peek_prev_min, + TP_PROTO(struct bch_fs *c, const char *str), + TP_ARGS(c, str) +); + #ifdef CONFIG_BCACHEFS_PATH_TRACEPOINTS TRACE_EVENT(update_by_path, From 9b9a3270092bf8030dbe21ce90b2d0c8d98d33c7 Mon Sep 17 00:00:00 2001 From: Alan Huang Date: Fri, 13 Jun 2025 21:19:50 +0800 Subject: [PATCH 034/289] bcachefs: Don't allocate new memory when mempool is exhausted Allocating new memory when mempool is exhausted is too complicated, just return ENOMEM is fine. memcpy is not needed, since there might be pointers point to the old memory, that's the bug. Signed-off-by: Alan Huang Signed-off-by: Kent Overstreet --- fs/bcachefs/btree_iter.c | 29 ++++------------------------- 1 file changed, 4 insertions(+), 25 deletions(-) diff --git a/fs/bcachefs/btree_iter.c b/fs/bcachefs/btree_iter.c index b586ecf2fdfa..55de2e474705 100644 --- a/fs/bcachefs/btree_iter.c +++ b/fs/bcachefs/btree_iter.c @@ -3217,28 +3217,8 @@ void *__bch2_trans_kmalloc(struct btree_trans *trans, size_t size, unsigned long } if (trans->used_mempool) { - if (trans->mem_bytes >= new_bytes) - goto out_change_top; - - /* No more space from mempool item, need malloc new one */ - new_mem = kmalloc(new_bytes, GFP_NOWAIT|__GFP_NOWARN); - if (unlikely(!new_mem)) { - bch2_trans_unlock(trans); - - new_mem = kmalloc(new_bytes, GFP_KERNEL); - if (!new_mem) - return ERR_PTR(-BCH_ERR_ENOMEM_trans_kmalloc); - - ret = bch2_trans_relock(trans); - if (ret) { - kfree(new_mem); - return ERR_PTR(ret); - } - } - memcpy(new_mem, trans->mem, trans->mem_top); - trans->used_mempool = false; - mempool_free(trans->mem, &c->btree_trans_mem_pool); - goto out_new_mem; + EBUG_ON(trans->mem_bytes >= new_bytes); + return ERR_PTR(-BCH_ERR_ENOMEM_trans_kmalloc); } new_mem = krealloc(trans->mem, new_bytes, GFP_NOWAIT|__GFP_NOWARN); @@ -3249,7 +3229,6 @@ void *__bch2_trans_kmalloc(struct btree_trans *trans, size_t size, unsigned long if (!new_mem && new_bytes <= BTREE_TRANS_MEM_MAX) { new_mem = mempool_alloc(&c->btree_trans_mem_pool, GFP_KERNEL); new_bytes = BTREE_TRANS_MEM_MAX; - memcpy(new_mem, trans->mem, trans->mem_top); trans->used_mempool = true; kfree(trans->mem); } @@ -3264,7 +3243,7 @@ void *__bch2_trans_kmalloc(struct btree_trans *trans, size_t size, unsigned long if (ret) return ERR_PTR(ret); } -out_new_mem: + trans->mem = new_mem; trans->mem_bytes = new_bytes; @@ -3273,7 +3252,7 @@ out_new_mem: return ERR_PTR(btree_trans_restart_ip(trans, BCH_ERR_transaction_restart_mem_realloced, _RET_IP_)); } -out_change_top: + bch2_trans_kmalloc_trace(trans, size, ip); p = trans->mem + trans->mem_top; From 9b54efe66c9b44e7446e8a81a058c014cd43661d Mon Sep 17 00:00:00 2001 From: Alan Huang Date: Fri, 13 Jun 2025 22:54:59 +0800 Subject: [PATCH 035/289] bcachefs: Fix alloc_req use after free Now the alloc_req is allocated from the bump allocator, if there is reallocation, the memory of alloc_req would be frees, fix by delaying the reallocation to transaction restart, it has to restart anyway. Reported-by: syzbot+2887a13a5c387e616a68@syzkaller.appspotmail.com Signed-off-by: Alan Huang Signed-off-by: Kent Overstreet --- fs/bcachefs/btree_iter.c | 48 +++++++++++++++++++++++++++------------ fs/bcachefs/btree_types.h | 1 + 2 files changed, 35 insertions(+), 14 deletions(-) diff --git a/fs/bcachefs/btree_iter.c b/fs/bcachefs/btree_iter.c index 55de2e474705..e2747c57ba2a 100644 --- a/fs/bcachefs/btree_iter.c +++ b/fs/bcachefs/btree_iter.c @@ -3216,25 +3216,32 @@ void *__bch2_trans_kmalloc(struct btree_trans *trans, size_t size, unsigned long mutex_unlock(&s->lock); } - if (trans->used_mempool) { + if (trans->used_mempool || new_bytes > BTREE_TRANS_MEM_MAX) { EBUG_ON(trans->mem_bytes >= new_bytes); return ERR_PTR(-BCH_ERR_ENOMEM_trans_kmalloc); } - new_mem = krealloc(trans->mem, new_bytes, GFP_NOWAIT|__GFP_NOWARN); + if (old_bytes) { + trans->realloc_bytes_required = new_bytes; + trace_and_count(c, trans_restart_mem_realloced, trans, _RET_IP_, new_bytes); + return ERR_PTR(btree_trans_restart_ip(trans, + BCH_ERR_transaction_restart_mem_realloced, _RET_IP_)); + } + + EBUG_ON(trans->mem); + + new_mem = kmalloc(new_bytes, GFP_NOWAIT|__GFP_NOWARN); if (unlikely(!new_mem)) { bch2_trans_unlock(trans); - new_mem = krealloc(trans->mem, new_bytes, GFP_KERNEL); + new_mem = kmalloc(new_bytes, GFP_KERNEL); if (!new_mem && new_bytes <= BTREE_TRANS_MEM_MAX) { new_mem = mempool_alloc(&c->btree_trans_mem_pool, GFP_KERNEL); new_bytes = BTREE_TRANS_MEM_MAX; trans->used_mempool = true; - kfree(trans->mem); } - if (!new_mem) - return ERR_PTR(-BCH_ERR_ENOMEM_trans_kmalloc); + EBUG_ON(!new_mem); trans->mem = new_mem; trans->mem_bytes = new_bytes; @@ -3247,14 +3254,6 @@ void *__bch2_trans_kmalloc(struct btree_trans *trans, size_t size, unsigned long trans->mem = new_mem; trans->mem_bytes = new_bytes; - if (old_bytes) { - trace_and_count(c, trans_restart_mem_realloced, trans, _RET_IP_, new_bytes); - return ERR_PTR(btree_trans_restart_ip(trans, - BCH_ERR_transaction_restart_mem_realloced, _RET_IP_)); - } - - bch2_trans_kmalloc_trace(trans, size, ip); - p = trans->mem + trans->mem_top; trans->mem_top += size; memset(p, 0, size); @@ -3315,6 +3314,27 @@ u32 bch2_trans_begin(struct btree_trans *trans) trans->restart_count++; trans->mem_top = 0; + if (trans->restarted == BCH_ERR_transaction_restart_mem_realloced) { + EBUG_ON(!trans->mem || !trans->mem_bytes); + unsigned new_bytes = trans->realloc_bytes_required; + void *new_mem = krealloc(trans->mem, new_bytes, GFP_NOWAIT|__GFP_NOWARN); + if (unlikely(!new_mem)) { + bch2_trans_unlock(trans); + new_mem = krealloc(trans->mem, new_bytes, GFP_KERNEL); + + EBUG_ON(new_bytes > BTREE_TRANS_MEM_MAX); + + if (!new_mem) { + new_mem = mempool_alloc(&trans->c->btree_trans_mem_pool, GFP_KERNEL); + new_bytes = BTREE_TRANS_MEM_MAX; + trans->used_mempool = true; + kfree(trans->mem); + } + } + trans->mem = new_mem; + trans->mem_bytes = new_bytes; + } + trans_for_each_path(trans, path, i) { path->should_be_locked = false; diff --git a/fs/bcachefs/btree_types.h b/fs/bcachefs/btree_types.h index 3aa4a602bd02..112170fd9c8f 100644 --- a/fs/bcachefs/btree_types.h +++ b/fs/bcachefs/btree_types.h @@ -497,6 +497,7 @@ struct btree_trans { void *mem; unsigned mem_top; unsigned mem_bytes; + unsigned realloc_bytes_required; #ifdef CONFIG_BCACHEFS_TRANS_KMALLOC_TRACE darray_trans_kmalloc_trace trans_kmalloc_trace; #endif From e31144f8cbe041a572ca64d9261ceeb647ccf557 Mon Sep 17 00:00:00 2001 From: Alan Huang Date: Fri, 13 Jun 2025 03:01:58 +0800 Subject: [PATCH 036/289] bcachefs: Add missing EBUG_ON Just like the EBUG_ON in bch2_journal_add_entry(). Signed-off-by: Alan Huang Signed-off-by: Kent Overstreet --- fs/bcachefs/btree_trans_commit.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/bcachefs/btree_trans_commit.c b/fs/bcachefs/btree_trans_commit.c index d9710801e3ee..55952143f0d3 100644 --- a/fs/bcachefs/btree_trans_commit.c +++ b/fs/bcachefs/btree_trans_commit.c @@ -757,6 +757,8 @@ bch2_trans_commit_write_locked(struct btree_trans *trans, unsigned flags, btree_trans_journal_entries_start(trans), trans->journal_entries.u64s); + EBUG_ON(trans->journal_res.u64s < trans->journal_entries.u64s); + trans->journal_res.offset += trans->journal_entries.u64s; trans->journal_res.u64s -= trans->journal_entries.u64s; From 0dc8eaebed99ea323ab3be7cf1438277f990189d Mon Sep 17 00:00:00 2001 From: Alan Huang Date: Fri, 13 Jun 2025 03:01:59 +0800 Subject: [PATCH 037/289] bcachefs: Delay calculation of trans->journal_u64s When there is commit error that need split btree leaf, fsck might change the value of trans->journal_entries.u64s, when retry commit, the value of trans->journal_u64s would be incorrect, which will lead to trans->journal_res.u64s underflow, and then out of bounds write will occur: [ 464.496970][T11969] Call trace: [ 464.496973][T11969] show_stack+0x3c/0x88 (C) [ 464.496995][T11969] dump_stack_lvl+0xf8/0x178 [ 464.497014][T11969] dump_stack+0x20/0x30 [ 464.497031][T11969] __bch2_trans_log_str+0x344/0x350 [ 464.497048][T11969] bch2_trans_log_str+0x3c/0x60 [ 464.497065][T11969] __bch2_fsck_err+0x11bc/0x1390 [ 464.497083][T11969] bch2_check_discard_freespace_key+0xad4/0x10d0 [ 464.497100][T11969] bch2_bucket_alloc_freelist+0x99c/0x1130 [ 464.497117][T11969] bch2_bucket_alloc_trans+0x79c/0xcb8 [ 464.497133][T11969] bch2_bucket_alloc_set_trans+0x378/0xc20 [ 464.497151][T11969] __open_bucket_add_buckets+0x7fc/0x1c00 [ 464.497168][T11969] open_bucket_add_buckets+0x184/0x3a8 [ 464.497185][T11969] bch2_alloc_sectors_start_trans+0xa04/0x1da0 [ 464.497203][T11969] bch2_btree_reserve_get+0x6e0/0xef0 [ 464.497220][T11969] bch2_btree_update_start+0x1618/0x2600 [ 464.497239][T11969] bch2_btree_split_leaf+0xcc/0x730 [ 464.497258][T11969] bch2_trans_commit_error+0x22c/0xc30 [ 464.497276][T11969] __bch2_trans_commit+0x207c/0x4e30 [ 464.497292][T11969] bch2_journal_replay+0x9e0/0x1420 [ 464.497305][T11969] __bch2_run_recovery_passes+0x458/0xf98 [ 464.497318][T11969] bch2_run_recovery_passes+0x280/0x478 [ 464.497331][T11969] bch2_fs_recovery+0x24f0/0x3a28 [ 464.497344][T11969] bch2_fs_start+0xb80/0x1248 [ 464.497358][T11969] bch2_fs_get_tree+0xe94/0x1708 [ 464.497377][T11969] vfs_get_tree+0x84/0x2d0 Signed-off-by: Alan Huang Signed-off-by: Kent Overstreet --- fs/bcachefs/btree_trans_commit.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/fs/bcachefs/btree_trans_commit.c b/fs/bcachefs/btree_trans_commit.c index 55952143f0d3..61107f2310ab 100644 --- a/fs/bcachefs/btree_trans_commit.c +++ b/fs/bcachefs/btree_trans_commit.c @@ -1005,6 +1005,7 @@ int __bch2_trans_commit(struct btree_trans *trans, unsigned flags) { struct btree_insert_entry *errored_at = NULL; struct bch_fs *c = trans->c; + unsigned journal_u64s = 0; int ret = 0; bch2_trans_verify_not_unlocked_or_in_restart(trans); @@ -1033,10 +1034,10 @@ int __bch2_trans_commit(struct btree_trans *trans, unsigned flags) EBUG_ON(test_bit(BCH_FS_clean_shutdown, &c->flags)); - trans->journal_u64s = trans->journal_entries.u64s + jset_u64s(trans->accounting.u64s); + journal_u64s = jset_u64s(trans->accounting.u64s); trans->journal_transaction_names = READ_ONCE(c->opts.journal_transaction_names); if (trans->journal_transaction_names) - trans->journal_u64s += jset_u64s(JSET_ENTRY_LOG_U64s); + journal_u64s += jset_u64s(JSET_ENTRY_LOG_U64s); trans_for_each_update(trans, i) { struct btree_path *path = trans->paths + i->path; @@ -1056,11 +1057,11 @@ int __bch2_trans_commit(struct btree_trans *trans, unsigned flags) continue; /* we're going to journal the key being updated: */ - trans->journal_u64s += jset_u64s(i->k->k.u64s); + journal_u64s += jset_u64s(i->k->k.u64s); /* and we're also going to log the overwrite: */ if (trans->journal_transaction_names) - trans->journal_u64s += jset_u64s(i->old_k.u64s); + journal_u64s += jset_u64s(i->old_k.u64s); } if (trans->extra_disk_res) { @@ -1078,6 +1079,8 @@ retry: memset(&trans->journal_res, 0, sizeof(trans->journal_res)); memset(&trans->fs_usage_delta, 0, sizeof(trans->fs_usage_delta)); + trans->journal_u64s = journal_u64s + trans->journal_entries.u64s; + ret = do_bch2_trans_commit(trans, flags, &errored_at, _RET_IP_); /* make sure we didn't drop or screw up locks: */ From 0e62fca2a6dbfcedaab4919d7ad2044f20fdf889 Mon Sep 17 00:00:00 2001 From: Kent Overstreet Date: Fri, 13 Jun 2025 14:53:42 -0400 Subject: [PATCH 038/289] bcachefs: Fix bch2_journal_keys_peek_prev_min() this code is rarely invoked, so - we had a few bugs left from basing it off of bch2_journal_keys_peek_max()... Signed-off-by: Kent Overstreet --- fs/bcachefs/btree_journal_iter.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/bcachefs/btree_journal_iter.c b/fs/bcachefs/btree_journal_iter.c index cf7398751644..de996c848e43 100644 --- a/fs/bcachefs/btree_journal_iter.c +++ b/fs/bcachefs/btree_journal_iter.c @@ -141,8 +141,8 @@ search: if (!*idx) *idx = __bch2_journal_key_search(keys, btree_id, level, pos); - while (*idx && - __journal_key_cmp(btree_id, level, end_pos, idx_to_key(keys, *idx - 1)) <= 0) { + while (*idx < keys->nr && + __journal_key_cmp(btree_id, level, end_pos, idx_to_key(keys, *idx - 1)) >= 0) { (*idx)++; iters++; if (iters == 10) { From 425da82c63e3ac06b2f8780879e83463e88cbf9f Mon Sep 17 00:00:00 2001 From: Kent Overstreet Date: Fri, 13 Jun 2025 14:57:34 -0400 Subject: [PATCH 039/289] bcachefs: btree_iter: fix updates, journal overlay We need to start searching from search_key - _not_ path->pos, which will point to the key we found in the btree Signed-off-by: Kent Overstreet --- fs/bcachefs/btree_iter.c | 29 +++++++++++++++++------------ 1 file changed, 17 insertions(+), 12 deletions(-) diff --git a/fs/bcachefs/btree_iter.c b/fs/bcachefs/btree_iter.c index e2747c57ba2a..061603999ae5 100644 --- a/fs/bcachefs/btree_iter.c +++ b/fs/bcachefs/btree_iter.c @@ -2076,14 +2076,14 @@ inline bool bch2_btree_iter_rewind(struct btree_trans *trans, struct btree_iter static noinline void bch2_btree_trans_peek_prev_updates(struct btree_trans *trans, struct btree_iter *iter, - struct bkey_s_c *k) + struct bpos search_key, struct bkey_s_c *k) { struct bpos end = path_l(btree_iter_path(trans, iter))->b->data->min_key; trans_for_each_update(trans, i) if (!i->key_cache_already_flushed && i->btree_id == iter->btree_id && - bpos_le(i->k->k.p, iter->pos) && + bpos_le(i->k->k.p, search_key) && bpos_ge(i->k->k.p, k->k ? k->k->p : end)) { iter->k = i->k->k; *k = bkey_i_to_s_c(i->k); @@ -2092,6 +2092,7 @@ void bch2_btree_trans_peek_prev_updates(struct btree_trans *trans, struct btree_ static noinline void bch2_btree_trans_peek_updates(struct btree_trans *trans, struct btree_iter *iter, + struct bpos search_key, struct bkey_s_c *k) { struct btree_path *path = btree_iter_path(trans, iter); @@ -2100,7 +2101,7 @@ void bch2_btree_trans_peek_updates(struct btree_trans *trans, struct btree_iter trans_for_each_update(trans, i) if (!i->key_cache_already_flushed && i->btree_id == iter->btree_id && - bpos_ge(i->k->k.p, path->pos) && + bpos_ge(i->k->k.p, search_key) && bpos_le(i->k->k.p, k->k ? k->k->p : end)) { iter->k = i->k->k; *k = bkey_i_to_s_c(i->k); @@ -2122,13 +2123,14 @@ void bch2_btree_trans_peek_slot_updates(struct btree_trans *trans, struct btree_ static struct bkey_i *bch2_btree_journal_peek(struct btree_trans *trans, struct btree_iter *iter, + struct bpos search_pos, struct bpos end_pos) { struct btree_path *path = btree_iter_path(trans, iter); return bch2_journal_keys_peek_max(trans->c, iter->btree_id, path->level, - path->pos, + search_pos, end_pos, &iter->journal_idx); } @@ -2138,7 +2140,7 @@ struct bkey_s_c btree_trans_peek_slot_journal(struct btree_trans *trans, struct btree_iter *iter) { struct btree_path *path = btree_iter_path(trans, iter); - struct bkey_i *k = bch2_btree_journal_peek(trans, iter, path->pos); + struct bkey_i *k = bch2_btree_journal_peek(trans, iter, path->pos, path->pos); if (k) { iter->k = k->k; @@ -2151,11 +2153,12 @@ struct bkey_s_c btree_trans_peek_slot_journal(struct btree_trans *trans, static noinline void btree_trans_peek_journal(struct btree_trans *trans, struct btree_iter *iter, + struct bpos search_key, struct bkey_s_c *k) { struct btree_path *path = btree_iter_path(trans, iter); struct bkey_i *next_journal = - bch2_btree_journal_peek(trans, iter, + bch2_btree_journal_peek(trans, iter, search_key, k->k ? k->k->p : path_l(path)->b->key.k.p); if (next_journal) { iter->k = next_journal->k; @@ -2165,13 +2168,14 @@ void btree_trans_peek_journal(struct btree_trans *trans, static struct bkey_i *bch2_btree_journal_peek_prev(struct btree_trans *trans, struct btree_iter *iter, + struct bpos search_key, struct bpos end_pos) { struct btree_path *path = btree_iter_path(trans, iter); return bch2_journal_keys_peek_prev_min(trans->c, iter->btree_id, path->level, - path->pos, + search_key, end_pos, &iter->journal_idx); } @@ -2179,11 +2183,12 @@ static struct bkey_i *bch2_btree_journal_peek_prev(struct btree_trans *trans, static noinline void btree_trans_peek_prev_journal(struct btree_trans *trans, struct btree_iter *iter, + struct bpos search_key, struct bkey_s_c *k) { struct btree_path *path = btree_iter_path(trans, iter); struct bkey_i *next_journal = - bch2_btree_journal_peek_prev(trans, iter, + bch2_btree_journal_peek_prev(trans, iter, search_key, k->k ? k->k->p : path_l(path)->b->key.k.p); if (next_journal) { @@ -2292,11 +2297,11 @@ static struct bkey_s_c __bch2_btree_iter_peek(struct btree_trans *trans, struct } if (unlikely(iter->flags & BTREE_ITER_with_journal)) - btree_trans_peek_journal(trans, iter, &k); + btree_trans_peek_journal(trans, iter, search_key, &k); if (unlikely((iter->flags & BTREE_ITER_with_updates) && trans->nr_updates)) - bch2_btree_trans_peek_updates(trans, iter, &k); + bch2_btree_trans_peek_updates(trans, iter, search_key, &k); if (k.k && bkey_deleted(k.k)) { /* @@ -2584,11 +2589,11 @@ static struct bkey_s_c __bch2_btree_iter_peek_prev(struct btree_trans *trans, st } if (unlikely(iter->flags & BTREE_ITER_with_journal)) - btree_trans_peek_prev_journal(trans, iter, &k); + btree_trans_peek_prev_journal(trans, iter, search_key, &k); if (unlikely((iter->flags & BTREE_ITER_with_updates) && trans->nr_updates)) - bch2_btree_trans_peek_prev_updates(trans, iter, &k); + bch2_btree_trans_peek_prev_updates(trans, iter, search_key, &k); if (likely(k.k && !bkey_deleted(k.k))) { break; From f2ed0892732d470abde7a1af360bd670dc8a68c6 Mon Sep 17 00:00:00 2001 From: Kent Overstreet Date: Fri, 13 Jun 2025 15:17:37 -0400 Subject: [PATCH 040/289] bcachefs: better __bch2_snapshot_is_ancestor() assert Previously, we weren't checking the result of the skiplist walk, just the is_ancestor bitmap. Signed-off-by: Kent Overstreet --- fs/bcachefs/snapshot.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/fs/bcachefs/snapshot.c b/fs/bcachefs/snapshot.c index 23a332d76b32..38aeaa128d27 100644 --- a/fs/bcachefs/snapshot.c +++ b/fs/bcachefs/snapshot.c @@ -135,7 +135,9 @@ static bool test_ancestor_bitmap(struct snapshot_table *t, u32 id, u32 ancestor) bool __bch2_snapshot_is_ancestor(struct bch_fs *c, u32 id, u32 ancestor) { - bool ret; +#ifdef CONFIG_BCACHEFS_DEBUG + u32 orig_id = id; +#endif guard(rcu)(); struct snapshot_table *t = rcu_dereference(c->snapshots); @@ -147,11 +149,11 @@ bool __bch2_snapshot_is_ancestor(struct bch_fs *c, u32 id, u32 ancestor) while (id && id < ancestor - IS_ANCESTOR_BITMAP) id = get_ancestor_below(t, id, ancestor); - ret = id && id < ancestor + bool ret = id && id < ancestor ? test_ancestor_bitmap(t, id, ancestor) : id == ancestor; - EBUG_ON(ret != __bch2_snapshot_is_ancestor_early(t, id, ancestor)); + EBUG_ON(ret != __bch2_snapshot_is_ancestor_early(t, orig_id, ancestor)); return ret; } From 2ba562cc04dc22d01750e80d0638cc1d91fdd95c Mon Sep 17 00:00:00 2001 From: Kent Overstreet Date: Fri, 13 Jun 2025 15:15:09 -0400 Subject: [PATCH 041/289] bcachefs: pass last_seq into fs_journal_start() Prep work for journal rewind, where the seq we're replaying from may be different than the last journal entry's last_seq. Signed-off-by: Kent Overstreet --- fs/bcachefs/journal.c | 18 ++++++------------ fs/bcachefs/journal.h | 2 +- fs/bcachefs/recovery.c | 4 ++-- 3 files changed, 9 insertions(+), 15 deletions(-) diff --git a/fs/bcachefs/journal.c b/fs/bcachefs/journal.c index dda802a656cf..df71af0013ba 100644 --- a/fs/bcachefs/journal.c +++ b/fs/bcachefs/journal.c @@ -1474,14 +1474,13 @@ void bch2_fs_journal_stop(struct journal *j) clear_bit(JOURNAL_running, &j->flags); } -int bch2_fs_journal_start(struct journal *j, u64 cur_seq) +int bch2_fs_journal_start(struct journal *j, u64 last_seq, u64 cur_seq) { struct bch_fs *c = container_of(j, struct bch_fs, journal); struct journal_entry_pin_list *p; struct journal_replay *i, **_i; struct genradix_iter iter; bool had_entries = false; - u64 last_seq = cur_seq, nr, seq; /* * @@ -1495,17 +1494,11 @@ int bch2_fs_journal_start(struct journal *j, u64 cur_seq) return -EINVAL; } - genradix_for_each_reverse(&c->journal_entries, iter, _i) { - i = *_i; + /* Clean filesystem? */ + if (!last_seq) + last_seq = cur_seq; - if (journal_replay_ignore(i)) - continue; - - last_seq = le64_to_cpu(i->j.last_seq); - break; - } - - nr = cur_seq - last_seq; + u64 nr = cur_seq - last_seq; /* * Extra fudge factor, in case we crashed when the journal pin fifo was @@ -1532,6 +1525,7 @@ int bch2_fs_journal_start(struct journal *j, u64 cur_seq) j->pin.back = cur_seq; atomic64_set(&j->seq, cur_seq - 1); + u64 seq; fifo_for_each_entry_ptr(p, &j->pin, seq) journal_pin_list_init(p, 1); diff --git a/fs/bcachefs/journal.h b/fs/bcachefs/journal.h index 83734fe4331f..977907038d98 100644 --- a/fs/bcachefs/journal.h +++ b/fs/bcachefs/journal.h @@ -453,7 +453,7 @@ int bch2_fs_journal_alloc(struct bch_fs *); void bch2_dev_journal_stop(struct journal *, struct bch_dev *); void bch2_fs_journal_stop(struct journal *); -int bch2_fs_journal_start(struct journal *, u64); +int bch2_fs_journal_start(struct journal *, u64, u64); void bch2_journal_set_replay_done(struct journal *); void bch2_dev_journal_exit(struct bch_dev *); diff --git a/fs/bcachefs/recovery.c b/fs/bcachefs/recovery.c index 0b21fa6ff062..6aef8b101820 100644 --- a/fs/bcachefs/recovery.c +++ b/fs/bcachefs/recovery.c @@ -965,7 +965,7 @@ use_clean: ret = bch2_journal_log_msg(c, "starting journal at entry %llu, replaying %llu-%llu", journal_seq, last_seq, blacklist_seq - 1) ?: - bch2_fs_journal_start(&c->journal, journal_seq); + bch2_fs_journal_start(&c->journal, last_seq, journal_seq); if (ret) goto err; @@ -1181,7 +1181,7 @@ int bch2_fs_initialize(struct bch_fs *c) * journal_res_get() will crash if called before this has * set up the journal.pin FIFO and journal.cur pointer: */ - ret = bch2_fs_journal_start(&c->journal, 1); + ret = bch2_fs_journal_start(&c->journal, 1, 1); if (ret) goto err; From c1ccd43b357e157d78c899e61764fc83b4d1dbaa Mon Sep 17 00:00:00 2001 From: Kent Overstreet Date: Fri, 13 Jun 2025 17:47:07 -0400 Subject: [PATCH 042/289] bcachefs: Fix "now allowing incompatible features" message Check against version_incompat_allowed, not version_incompat. Signed-off-by: Kent Overstreet --- fs/bcachefs/recovery.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/bcachefs/recovery.c b/fs/bcachefs/recovery.c index 6aef8b101820..820249e9c5ea 100644 --- a/fs/bcachefs/recovery.c +++ b/fs/bcachefs/recovery.c @@ -692,7 +692,7 @@ static bool check_version_upgrade(struct bch_fs *c) ret = true; } - if (new_version > c->sb.version_incompat && + if (new_version > c->sb.version_incompat_allowed && c->opts.version_upgrade == BCH_VERSION_UPGRADE_incompatible) { struct printbuf buf = PRINTBUF; From c27e5782d957c7242e9cb6405a395b89e8e1d573 Mon Sep 17 00:00:00 2001 From: Kent Overstreet Date: Fri, 13 Jun 2025 18:27:33 -0400 Subject: [PATCH 043/289] bcachefs: Fix snapshot_key_missing_inode_snapshot repair When the inode was a whiteout, we were inserting a new whiteout at the wrong (old) snapshot. Signed-off-by: Kent Overstreet --- fs/bcachefs/fsck.c | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-) diff --git a/fs/bcachefs/fsck.c b/fs/bcachefs/fsck.c index 68ed69a255e1..b80a56e19b40 100644 --- a/fs/bcachefs/fsck.c +++ b/fs/bcachefs/fsck.c @@ -903,17 +903,15 @@ lookup_inode_for_snapshot(struct btree_trans *trans, struct inode_walker *w, str w->last_pos.inode, k.k->p.snapshot, i->inode.bi_snapshot, (bch2_bkey_val_to_text(&buf, c, k), buf.buf))) { - struct bch_inode_unpacked new = i->inode; - struct bkey_i whiteout; - - new.bi_snapshot = k.k->p.snapshot; - if (!i->whiteout) { + struct bch_inode_unpacked new = i->inode; + new.bi_snapshot = k.k->p.snapshot; ret = __bch2_fsck_write_inode(trans, &new); } else { + struct bkey_i whiteout; bkey_init(&whiteout.k); whiteout.k.type = KEY_TYPE_whiteout; - whiteout.k.p = SPOS(0, i->inode.bi_inum, i->inode.bi_snapshot); + whiteout.k.p = SPOS(0, i->inode.bi_inum, k.k->p.snapshot); ret = bch2_btree_insert_nonextent(trans, BTREE_ID_inodes, &whiteout, BTREE_UPDATE_internal_snapshot_node); From b17d7bdb128c50025fc3eb7a9e57b3c7caa4a5ac Mon Sep 17 00:00:00 2001 From: Kent Overstreet Date: Fri, 13 Jun 2025 18:49:54 -0400 Subject: [PATCH 044/289] bcachefs: fsck: fix add_inode() the inode btree uses the offset field for the inum, not the inode field. Signed-off-by: Kent Overstreet --- fs/bcachefs/fsck.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/bcachefs/fsck.c b/fs/bcachefs/fsck.c index b80a56e19b40..28ca07c0e029 100644 --- a/fs/bcachefs/fsck.c +++ b/fs/bcachefs/fsck.c @@ -806,7 +806,7 @@ static int add_inode(struct bch_fs *c, struct inode_walker *w, if (!n->whiteout) { return bch2_inode_unpack(inode, &n->inode); } else { - n->inode.bi_inum = inode.k->p.inode; + n->inode.bi_inum = inode.k->p.offset; n->inode.bi_snapshot = inode.k->p.snapshot; return 0; } From 191334400d8031df367eb74b6ab49bc163f6f821 Mon Sep 17 00:00:00 2001 From: Kent Overstreet Date: Fri, 13 Jun 2025 18:58:57 -0400 Subject: [PATCH 045/289] bcachefs: fsck: fix extent past end of inode repair Fix the case where we're deleting in a different snapshot and need to emit a whiteout - that requires a regular BTREE_ITER_filter_snapshots iterator. Also, only delete the part of the extent that extents past i_size. Signed-off-by: Kent Overstreet --- fs/bcachefs/fsck.c | 31 ++++++++++++++++++++++++++----- 1 file changed, 26 insertions(+), 5 deletions(-) diff --git a/fs/bcachefs/fsck.c b/fs/bcachefs/fsck.c index 28ca07c0e029..48810a8e5d4b 100644 --- a/fs/bcachefs/fsck.c +++ b/fs/bcachefs/fsck.c @@ -1820,18 +1820,39 @@ static int check_extent(struct btree_trans *trans, struct btree_iter *iter, !key_visible_in_snapshot(c, s, i->inode.bi_snapshot, k.k->p.snapshot)) continue; - if (fsck_err_on(k.k->p.offset > round_up(i->inode.bi_size, block_bytes(c)) >> 9 && + u64 last_block = round_up(i->inode.bi_size, block_bytes(c)) >> 9; + + if (fsck_err_on(k.k->p.offset > last_block && !bkey_extent_is_reservation(k), trans, extent_past_end_of_inode, "extent type past end of inode %llu:%u, i_size %llu\n%s", i->inode.bi_inum, i->inode.bi_snapshot, i->inode.bi_size, (bch2_bkey_val_to_text(&buf, c, k), buf.buf))) { - struct btree_iter iter2; + struct bkey_i *whiteout = bch2_trans_kmalloc(trans, sizeof(*whiteout)); + ret = PTR_ERR_OR_ZERO(whiteout); + if (ret) + goto err; - bch2_trans_copy_iter(trans, &iter2, iter); - bch2_btree_iter_set_snapshot(trans, &iter2, i->inode.bi_snapshot); + bkey_init(&whiteout->k); + whiteout->k.p = SPOS(k.k->p.inode, + last_block, + i->inode.bi_snapshot); + bch2_key_resize(&whiteout->k, + min(KEY_SIZE_MAX & (~0 << c->block_bits), + U64_MAX - whiteout->k.p.offset)); + + + /* + * Need a normal (not BTREE_ITER_all_snapshots) + * iterator, if we're deleting in a different + * snapshot and need to emit a whiteout + */ + struct btree_iter iter2; + bch2_trans_iter_init(trans, &iter2, BTREE_ID_extents, + bkey_start_pos(&whiteout->k), + BTREE_ITER_intent); ret = bch2_btree_iter_traverse(trans, &iter2) ?: - bch2_btree_delete_at(trans, &iter2, + bch2_trans_update(trans, &iter2, whiteout, BTREE_UPDATE_internal_snapshot_node); bch2_trans_iter_exit(trans, &iter2); if (ret) From b0f77d301eb2b4e1fc816f33ade8519ae7f894f4 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Thu, 5 Jun 2025 08:16:27 +0200 Subject: [PATCH 046/289] xfs: check for shutdown before going to sleep in xfs_select_zone Ensure the file system hasn't been shut down before waiting for a free zone to become available, because that won't happen on a shut down file system. Without this processes can occasionally get stuck in the allocator wait loop when racing with a file system shutdown. This sporadically happens when running generic/388 or generic/475. Fixes: 4e4d52075577 ("xfs: add the zoned space allocator") Reported-by: Shinichiro Kawasaki Signed-off-by: Christoph Hellwig Reviewed-by: Hans Holmberg Tested-by: Shin'ichiro Kawasaki Signed-off-by: Carlos Maiolino --- fs/xfs/xfs_zone_alloc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/xfs/xfs_zone_alloc.c b/fs/xfs/xfs_zone_alloc.c index 80add26c0111..0de6f64b3169 100644 --- a/fs/xfs/xfs_zone_alloc.c +++ b/fs/xfs/xfs_zone_alloc.c @@ -727,7 +727,7 @@ xfs_select_zone( for (;;) { prepare_to_wait(&zi->zi_zone_wait, &wait, TASK_UNINTERRUPTIBLE); oz = xfs_select_zone_nowait(mp, write_hint, pack_tight); - if (oz) + if (oz || xfs_is_shutdown(mp)) break; schedule(); } From a593c89ac5a417605b165cbc9768b3663ab4d8ad Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Thu, 5 Jun 2025 08:16:28 +0200 Subject: [PATCH 047/289] xfs: remove NULL pointer checks in xfs_mru_cache_insert Remove the check for a NULL mru or mru->list in xfs_mru_cache_insert as this API misused lead to a direct NULL pointer dereference on first use and is not user triggerable. As a smatch run by Dan points out with the recent cleanup it would otherwise try to free the object we just determined to be NULL for this impossible to reach case. Fixes: 70b95cb86513 ("xfs: free the item in xfs_mru_cache_insert on failure") Reported-by: Dan Carpenter Signed-off-by: Christoph Hellwig Reviewed-by: Hans Holmberg Signed-off-by: Carlos Maiolino --- fs/xfs/xfs_mru_cache.c | 4 ---- 1 file changed, 4 deletions(-) diff --git a/fs/xfs/xfs_mru_cache.c b/fs/xfs/xfs_mru_cache.c index 08443ceec329..c95401de8397 100644 --- a/fs/xfs/xfs_mru_cache.c +++ b/fs/xfs/xfs_mru_cache.c @@ -425,10 +425,6 @@ xfs_mru_cache_insert( { int error = -EINVAL; - ASSERT(mru && mru->lists); - if (!mru || !mru->lists) - goto out_free; - error = -ENOMEM; if (radix_tree_preload(GFP_KERNEL)) goto out_free; From df3b7e2b56d271f93e2d1f395c13235a1a277639 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Thu, 5 Jun 2025 08:16:29 +0200 Subject: [PATCH 048/289] xfs: use xfs_readonly_buftarg in xfs_remount_rw Use xfs_readonly_buftarg instead of open coding it. Signed-off-by: Christoph Hellwig Reviewed-by: Hans Holmberg Reviewed-by: John Garry Signed-off-by: Carlos Maiolino --- fs/xfs/xfs_super.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c index 0bc4b5489078..bb0a82635a77 100644 --- a/fs/xfs/xfs_super.c +++ b/fs/xfs/xfs_super.c @@ -2020,14 +2020,13 @@ xfs_remount_rw( int error; if (mp->m_logdev_targp && mp->m_logdev_targp != mp->m_ddev_targp && - bdev_read_only(mp->m_logdev_targp->bt_bdev)) { + xfs_readonly_buftarg(mp->m_logdev_targp)) { xfs_warn(mp, "ro->rw transition prohibited by read-only logdev"); return -EACCES; } - if (mp->m_rtdev_targp && - bdev_read_only(mp->m_rtdev_targp->bt_bdev)) { + if (mp->m_rtdev_targp && xfs_readonly_buftarg(mp->m_rtdev_targp)) { xfs_warn(mp, "ro->rw transition prohibited by read-only rtdev"); return -EACCES; From 0989dfa61f438150c4f1110604ba0787856fe8b0 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Thu, 5 Jun 2025 08:16:30 +0200 Subject: [PATCH 049/289] xfs: move xfs_submit_zoned_bio a bit Commit f3e2e53823b9 ("xfs: add inode to zone caching for data placement") add the new code right between xfs_submit_zoned_bio and xfs_zone_alloc_and_submit which implement the main zoned write path. Move xfs_submit_zoned_bio down to keep it together again. Signed-off-by: Christoph Hellwig Reviewed-by: Carlos Maiolino Reviewed-by: Hans Holmberg Signed-off-by: Carlos Maiolino --- fs/xfs/xfs_zone_alloc.c | 40 ++++++++++++++++++++-------------------- 1 file changed, 20 insertions(+), 20 deletions(-) diff --git a/fs/xfs/xfs_zone_alloc.c b/fs/xfs/xfs_zone_alloc.c index 0de6f64b3169..01315ed75502 100644 --- a/fs/xfs/xfs_zone_alloc.c +++ b/fs/xfs/xfs_zone_alloc.c @@ -777,26 +777,6 @@ xfs_mark_rtg_boundary( ioend->io_flags |= IOMAP_IOEND_BOUNDARY; } -static void -xfs_submit_zoned_bio( - struct iomap_ioend *ioend, - struct xfs_open_zone *oz, - bool is_seq) -{ - ioend->io_bio.bi_iter.bi_sector = ioend->io_sector; - ioend->io_private = oz; - atomic_inc(&oz->oz_ref); /* for xfs_zoned_end_io */ - - if (is_seq) { - ioend->io_bio.bi_opf &= ~REQ_OP_WRITE; - ioend->io_bio.bi_opf |= REQ_OP_ZONE_APPEND; - } else { - xfs_mark_rtg_boundary(ioend); - } - - submit_bio(&ioend->io_bio); -} - /* * Cache the last zone written to for an inode so that it is considered first * for subsequent writes. @@ -891,6 +871,26 @@ xfs_zone_cache_create_association( xfs_mru_cache_insert(mp->m_zone_cache, ip->i_ino, &item->mru); } +static void +xfs_submit_zoned_bio( + struct iomap_ioend *ioend, + struct xfs_open_zone *oz, + bool is_seq) +{ + ioend->io_bio.bi_iter.bi_sector = ioend->io_sector; + ioend->io_private = oz; + atomic_inc(&oz->oz_ref); /* for xfs_zoned_end_io */ + + if (is_seq) { + ioend->io_bio.bi_opf &= ~REQ_OP_WRITE; + ioend->io_bio.bi_opf |= REQ_OP_ZONE_APPEND; + } else { + xfs_mark_rtg_boundary(ioend); + } + + submit_bio(&ioend->io_bio); +} + void xfs_zone_alloc_and_submit( struct iomap_ioend *ioend, From 19fa6e493a933c095f164230d3393d504216e052 Mon Sep 17 00:00:00 2001 From: Markus Elfring Date: Tue, 10 Jun 2025 14:50:07 +0200 Subject: [PATCH 050/289] xfs: Improve error handling in xfs_mru_cache_create() Simplify error handling in this function implementation. * Delete unnecessary pointer checks and variable assignments. * Omit a redundant function call. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring Reviewed-by: Darrick J. Wong Signed-off-by: Carlos Maiolino --- fs/xfs/xfs_mru_cache.c | 15 ++++----------- 1 file changed, 4 insertions(+), 11 deletions(-) diff --git a/fs/xfs/xfs_mru_cache.c b/fs/xfs/xfs_mru_cache.c index c95401de8397..866c71d9fbae 100644 --- a/fs/xfs/xfs_mru_cache.c +++ b/fs/xfs/xfs_mru_cache.c @@ -320,7 +320,7 @@ xfs_mru_cache_create( xfs_mru_cache_free_func_t free_func) { struct xfs_mru_cache *mru = NULL; - int err = 0, grp; + int grp; unsigned int grp_time; if (mrup) @@ -341,8 +341,8 @@ xfs_mru_cache_create( mru->lists = kzalloc(mru->grp_count * sizeof(*mru->lists), GFP_KERNEL | __GFP_NOFAIL); if (!mru->lists) { - err = -ENOMEM; - goto exit; + kfree(mru); + return -ENOMEM; } for (grp = 0; grp < mru->grp_count; grp++) @@ -361,14 +361,7 @@ xfs_mru_cache_create( mru->free_func = free_func; mru->data = data; *mrup = mru; - -exit: - if (err && mru && mru->lists) - kfree(mru->lists); - if (err && mru) - kfree(mru); - - return err; + return 0; } /* From db44d088a5ab030b741a3adf2e7b181a8a6dcfbe Mon Sep 17 00:00:00 2001 From: "Darrick J. Wong" Date: Thu, 12 Jun 2025 10:51:12 -0700 Subject: [PATCH 051/289] xfs: actually use the xfs_growfs_check_rtgeom tracepoint We created a new tracepoint but forgot to put it in. Fix that. Cc: rostedt@goodmis.org Cc: stable@vger.kernel.org # v6.14 Fixes: 59a57acbce282d ("xfs: check that the rtrmapbt maxlevels doesn't increase when growing fs") Signed-off-by: Darrick J. Wong Reviewed-by: Carlos Maiolino Reported-by: Steven Rostedt Closes: https://lore.kernel.org/all/20250612131021.114e6ec8@batman.local.home/ Signed-off-by: Carlos Maiolino --- fs/xfs/xfs_rtalloc.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c index 6484c596ecea..736eb0924573 100644 --- a/fs/xfs/xfs_rtalloc.c +++ b/fs/xfs/xfs_rtalloc.c @@ -1259,6 +1259,8 @@ xfs_growfs_check_rtgeom( kfree(nmp); + trace_xfs_growfs_check_rtgeom(mp, min_logfsbs); + if (min_logfsbs > mp->m_sb.sb_logblocks) return -EINVAL; From 7360ee47599af91a1d5f4e74d635d9408a54e489 Mon Sep 17 00:00:00 2001 From: Fedor Pchelkin Date: Wed, 11 Jun 2025 22:20:10 +0300 Subject: [PATCH 052/289] s390/pkey: Prevent overflow in size calculation for memdup_user() Number of apqn target list entries contained in 'nr_apqns' variable is determined by userspace via an ioctl call so the result of the product in calculation of size passed to memdup_user() may overflow. In this case the actual size of the allocated area and the value describing it won't be in sync leading to various types of unpredictable behaviour later. Use a proper memdup_array_user() helper which returns an error if an overflow is detected. Note that it is different from when nr_apqns is initially zero - that case is considered valid and should be handled in subsequent pkey_handler implementations. Found by Linux Verification Center (linuxtesting.org). Fixes: f2bbc96e7cfa ("s390/pkey: add CCA AES cipher key support") Cc: stable@vger.kernel.org Signed-off-by: Fedor Pchelkin Reviewed-by: Holger Dengler Reviewed-by: Heiko Carstens Link: https://lore.kernel.org/r/20250611192011.206057-1-pchelkin@ispras.ru Signed-off-by: Alexander Gordeev --- drivers/s390/crypto/pkey_api.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/s390/crypto/pkey_api.c b/drivers/s390/crypto/pkey_api.c index cef60770f68b..b3fcdcae379e 100644 --- a/drivers/s390/crypto/pkey_api.c +++ b/drivers/s390/crypto/pkey_api.c @@ -86,7 +86,7 @@ static void *_copy_apqns_from_user(void __user *uapqns, size_t nr_apqns) if (!uapqns || nr_apqns == 0) return NULL; - return memdup_user(uapqns, nr_apqns * sizeof(struct pkey_apqn)); + return memdup_array_user(uapqns, nr_apqns, sizeof(struct pkey_apqn)); } static int pkey_ioctl_genseck(struct pkey_genseck __user *ugs) From 17c3395e25f7db23fa5758c257a372d410d16cfd Mon Sep 17 00:00:00 2001 From: Kent Overstreet Date: Sat, 7 Jun 2025 19:16:12 -0400 Subject: [PATCH 053/289] bcachefs: opts.journal_rewind Add a mount option for rewinding the journal, bringing the entire filesystem to where it was at a previous point in time. This is for extreme disaster recovery scenarios - it's not intended as an undelete operation. The option takes a journal sequence number; the desired sequence number can be determined with 'bcachefs list_journal' Caveats: - The 'journal_transaction_names' option must have been enabled (it's on by default). The option controls emitting of extra debug info in the journal, so we can see what individual transactions were doing; It also enables journalling of keys being overwritten, which is what we rely on here. - A full fsck run will be automatically triggered since alloc info will be inconsistent. Only leaf node updates to non-alloc btrees are rewound, since rewinding interior btree updates isn't possible or desirable. - We can't do anything about data that was deleted and overwritten. Lots of metadata updates after the point in time we're rewinding to shouldn't cause a problem, since we segragate data and metadata allocations (this is in order to make repair by btree node scan practical on larger filesystems; there's a small 64-bit per device bitmap in the superblock of device ranges with btree nodes, and we try to keep this small). However, having discards enabled will cause problems, since buckets are discarded as soon as they become empty (this is why we don't implement fstrim: we don't need it). Hopefully, this feature will be a one-off thing that's never used again: this was implemented for recovering from the "vfs i_nlink 0 -> subvol deletion" bug, and that bug was unusually disastrous and additional safeguards have since been implemented. But if it does turn out that we need this more in the future, I'll have to implement an option so that empty buckets aren't discarded immediately - lagging by perhaps 1% of device capacity. Signed-off-by: Kent Overstreet --- fs/bcachefs/btree_journal_iter.c | 56 +++++++++++++++++--------- fs/bcachefs/btree_journal_iter_types.h | 5 ++- fs/bcachefs/journal_io.c | 21 ++++++++-- fs/bcachefs/opts.h | 5 +++ fs/bcachefs/recovery.c | 5 +++ 5 files changed, 67 insertions(+), 25 deletions(-) diff --git a/fs/bcachefs/btree_journal_iter.c b/fs/bcachefs/btree_journal_iter.c index de996c848e43..a41fabd06332 100644 --- a/fs/bcachefs/btree_journal_iter.c +++ b/fs/bcachefs/btree_journal_iter.c @@ -641,10 +641,11 @@ static int journal_sort_key_cmp(const void *_l, const void *_r) { const struct journal_key *l = _l; const struct journal_key *r = _r; + int rewind = l->rewind && r->rewind ? -1 : 1; return journal_key_cmp(l, r) ?: - cmp_int(l->journal_seq, r->journal_seq) ?: - cmp_int(l->journal_offset, r->journal_offset); + ((cmp_int(l->journal_seq, r->journal_seq) ?: + cmp_int(l->journal_offset, r->journal_offset)) * rewind); } void bch2_journal_keys_put(struct bch_fs *c) @@ -713,6 +714,8 @@ int bch2_journal_keys_sort(struct bch_fs *c) struct journal_keys *keys = &c->journal_keys; size_t nr_read = 0; + u64 rewind_seq = c->opts.journal_rewind ?: U64_MAX; + genradix_for_each(&c->journal_entries, iter, _i) { i = *_i; @@ -721,28 +724,43 @@ int bch2_journal_keys_sort(struct bch_fs *c) cond_resched(); - for_each_jset_key(k, entry, &i->j) { - struct journal_key n = (struct journal_key) { - .btree_id = entry->btree_id, - .level = entry->level, - .k = k, - .journal_seq = le64_to_cpu(i->j.seq), - .journal_offset = k->_data - i->j._data, - }; + vstruct_for_each(&i->j, entry) { + bool rewind = !entry->level && + !btree_id_is_alloc(entry->btree_id) && + le64_to_cpu(i->j.seq) >= rewind_seq; - if (darray_push(keys, n)) { - __journal_keys_sort(keys); + if (entry->type != (rewind + ? BCH_JSET_ENTRY_overwrite + : BCH_JSET_ENTRY_btree_keys)) + continue; - if (keys->nr * 8 > keys->size * 7) { - bch_err(c, "Too many journal keys for slowpath; have %zu compacted, buf size %zu, processed %zu keys at seq %llu", - keys->nr, keys->size, nr_read, le64_to_cpu(i->j.seq)); - return bch_err_throw(c, ENOMEM_journal_keys_sort); + if (!rewind && le64_to_cpu(i->j.seq) < c->journal_replay_seq_start) + continue; + + jset_entry_for_each_key(entry, k) { + struct journal_key n = (struct journal_key) { + .btree_id = entry->btree_id, + .level = entry->level, + .rewind = rewind, + .k = k, + .journal_seq = le64_to_cpu(i->j.seq), + .journal_offset = k->_data - i->j._data, + }; + + if (darray_push(keys, n)) { + __journal_keys_sort(keys); + + if (keys->nr * 8 > keys->size * 7) { + bch_err(c, "Too many journal keys for slowpath; have %zu compacted, buf size %zu, processed %zu keys at seq %llu", + keys->nr, keys->size, nr_read, le64_to_cpu(i->j.seq)); + return bch_err_throw(c, ENOMEM_journal_keys_sort); + } + + BUG_ON(darray_push(keys, n)); } - BUG_ON(darray_push(keys, n)); + nr_read++; } - - nr_read++; } } diff --git a/fs/bcachefs/btree_journal_iter_types.h b/fs/bcachefs/btree_journal_iter_types.h index 8b773823704f..86aacb254fb2 100644 --- a/fs/bcachefs/btree_journal_iter_types.h +++ b/fs/bcachefs/btree_journal_iter_types.h @@ -11,8 +11,9 @@ struct journal_key { u32 journal_offset; enum btree_id btree_id:8; unsigned level:8; - bool allocated; - bool overwritten; + bool allocated:1; + bool overwritten:1; + bool rewind:1; struct journal_key_range_overwritten __rcu * overwritten_range; struct bkey_i *k; diff --git a/fs/bcachefs/journal_io.c b/fs/bcachefs/journal_io.c index 0b15d71a8d2d..afbf12e8f0c5 100644 --- a/fs/bcachefs/journal_io.c +++ b/fs/bcachefs/journal_io.c @@ -160,6 +160,9 @@ static int journal_entry_add(struct bch_fs *c, struct bch_dev *ca, struct printbuf buf = PRINTBUF; int ret = JOURNAL_ENTRY_ADD_OK; + if (last_seq && c->opts.journal_rewind) + last_seq = min(last_seq, c->opts.journal_rewind); + if (!c->journal.oldest_seq_found_ondisk || le64_to_cpu(j->seq) < c->journal.oldest_seq_found_ondisk) c->journal.oldest_seq_found_ondisk = le64_to_cpu(j->seq); @@ -1430,11 +1433,21 @@ int bch2_journal_read(struct bch_fs *c, printbuf_reset(&buf); prt_printf(&buf, "journal read done, replaying entries %llu-%llu", *last_seq, *blacklist_seq - 1); + + /* + * Drop blacklisted entries and entries older than last_seq (or start of + * journal rewind: + */ + u64 drop_before = *last_seq; + if (c->opts.journal_rewind) { + drop_before = min(drop_before, c->opts.journal_rewind); + prt_printf(&buf, " (rewinding from %llu)", c->opts.journal_rewind); + } + + *last_seq = drop_before; if (*start_seq != *blacklist_seq) prt_printf(&buf, " (unflushed %llu-%llu)", *blacklist_seq, *start_seq - 1); bch_info(c, "%s", buf.buf); - - /* Drop blacklisted entries and entries older than last_seq: */ genradix_for_each(&c->journal_entries, radix_iter, _i) { i = *_i; @@ -1442,7 +1455,7 @@ int bch2_journal_read(struct bch_fs *c, continue; seq = le64_to_cpu(i->j.seq); - if (seq < *last_seq) { + if (seq < drop_before) { journal_replay_free(c, i, false); continue; } @@ -1455,7 +1468,7 @@ int bch2_journal_read(struct bch_fs *c, } } - ret = bch2_journal_check_for_missing(c, *last_seq, *blacklist_seq - 1); + ret = bch2_journal_check_for_missing(c, drop_before, *blacklist_seq - 1); if (ret) goto err; diff --git a/fs/bcachefs/opts.h b/fs/bcachefs/opts.h index 2a02606254b3..b0a76bd6d6f5 100644 --- a/fs/bcachefs/opts.h +++ b/fs/bcachefs/opts.h @@ -379,6 +379,11 @@ enum fsck_err_opts { OPT_BOOL(), \ BCH2_NO_SB_OPT, false, \ NULL, "Exit recovery immediately prior to journal replay")\ + x(journal_rewind, u64, \ + OPT_FS|OPT_MOUNT, \ + OPT_UINT(0, U64_MAX), \ + BCH2_NO_SB_OPT, 0, \ + NULL, "Rewind journal") \ x(recovery_passes, u64, \ OPT_FS|OPT_MOUNT, \ OPT_BITFIELD(bch2_recovery_passes), \ diff --git a/fs/bcachefs/recovery.c b/fs/bcachefs/recovery.c index 820249e9c5ea..37f2cc1ec2f8 100644 --- a/fs/bcachefs/recovery.c +++ b/fs/bcachefs/recovery.c @@ -757,6 +757,11 @@ int bch2_fs_recovery(struct bch_fs *c) if (c->opts.nochanges) c->opts.read_only = true; + if (c->opts.journal_rewind) { + bch_info(c, "rewinding journal, fsck required"); + c->opts.fsck = true; + } + mutex_lock(&c->sb_lock); struct bch_sb_field_ext *ext = bch2_sb_field_get(c->disk_sb.sb, ext); bool write_sb = false; From 10dfe4926de30b550913409d107005278ab47911 Mon Sep 17 00:00:00 2001 From: Kent Overstreet Date: Fri, 13 Jun 2025 20:06:33 -0400 Subject: [PATCH 054/289] bcachefs: Kill unused tracepoints Dead code cleanup. Link: https://lore.kernel.org/linux-bcachefs/20250612224059.39fddd07@batman.local.home/ Signed-off-by: Kent Overstreet --- fs/bcachefs/btree_trans_commit.c | 5 +- fs/bcachefs/btree_write_buffer.c | 3 + fs/bcachefs/errcode.h | 5 -- fs/bcachefs/trace.h | 100 +------------------------------ 4 files changed, 9 insertions(+), 104 deletions(-) diff --git a/fs/bcachefs/btree_trans_commit.c b/fs/bcachefs/btree_trans_commit.c index 61107f2310ab..639ef75b3dbd 100644 --- a/fs/bcachefs/btree_trans_commit.c +++ b/fs/bcachefs/btree_trans_commit.c @@ -595,12 +595,13 @@ bch2_trans_commit_write_locked(struct btree_trans *trans, unsigned flags, int ret = 0; bch2_trans_verify_not_unlocked_or_in_restart(trans); - +#if 0 + /* todo: bring back dynamic fault injection */ if (race_fault()) { trace_and_count(c, trans_restart_fault_inject, trans, trace_ip); return btree_trans_restart(trans, BCH_ERR_transaction_restart_fault_inject); } - +#endif /* * Check if the insert will fit in the leaf node with the write lock * held, otherwise another thread could write the node changing the diff --git a/fs/bcachefs/btree_write_buffer.c b/fs/bcachefs/btree_write_buffer.c index 90b21e61d2b6..21b5c03d1822 100644 --- a/fs/bcachefs/btree_write_buffer.c +++ b/fs/bcachefs/btree_write_buffer.c @@ -676,6 +676,9 @@ int bch2_btree_write_buffer_maybe_flush(struct btree_trans *trans, goto err; bch2_bkey_buf_copy(last_flushed, c, tmp.k); + + /* can we avoid the unconditional restart? */ + trace_and_count(c, trans_restart_write_buffer_flush, trans, _RET_IP_); ret = bch_err_throw(c, transaction_restart_write_buffer_flush); } err: diff --git a/fs/bcachefs/errcode.h b/fs/bcachefs/errcode.h index ac3264134a15..86a842f1e88e 100644 --- a/fs/bcachefs/errcode.h +++ b/fs/bcachefs/errcode.h @@ -137,7 +137,6 @@ x(BCH_ERR_transaction_restart, transaction_restart_relock) \ x(BCH_ERR_transaction_restart, transaction_restart_relock_path) \ x(BCH_ERR_transaction_restart, transaction_restart_relock_path_intent) \ - x(BCH_ERR_transaction_restart, transaction_restart_relock_after_fill) \ x(BCH_ERR_transaction_restart, transaction_restart_too_many_iters) \ x(BCH_ERR_transaction_restart, transaction_restart_lock_node_reused) \ x(BCH_ERR_transaction_restart, transaction_restart_fill_relock) \ @@ -148,11 +147,8 @@ x(BCH_ERR_transaction_restart, transaction_restart_would_deadlock_write)\ x(BCH_ERR_transaction_restart, transaction_restart_deadlock_recursion_limit)\ x(BCH_ERR_transaction_restart, transaction_restart_upgrade) \ - x(BCH_ERR_transaction_restart, transaction_restart_key_cache_upgrade) \ x(BCH_ERR_transaction_restart, transaction_restart_key_cache_fill) \ x(BCH_ERR_transaction_restart, transaction_restart_key_cache_raced) \ - x(BCH_ERR_transaction_restart, transaction_restart_key_cache_realloced)\ - x(BCH_ERR_transaction_restart, transaction_restart_journal_preres_get) \ x(BCH_ERR_transaction_restart, transaction_restart_split_race) \ x(BCH_ERR_transaction_restart, transaction_restart_write_buffer_flush) \ x(BCH_ERR_transaction_restart, transaction_restart_nested) \ @@ -241,7 +237,6 @@ x(BCH_ERR_journal_res_blocked, journal_buf_enomem) \ x(BCH_ERR_journal_res_blocked, journal_stuck) \ x(BCH_ERR_journal_res_blocked, journal_retry_open) \ - x(BCH_ERR_journal_res_blocked, journal_preres_get_blocked) \ x(BCH_ERR_journal_res_blocked, bucket_alloc_blocked) \ x(BCH_ERR_journal_res_blocked, stripe_alloc_blocked) \ x(BCH_ERR_invalid, invalid_sb) \ diff --git a/fs/bcachefs/trace.h b/fs/bcachefs/trace.h index e759c9ff3965..9c5a9c551f03 100644 --- a/fs/bcachefs/trace.h +++ b/fs/bcachefs/trace.h @@ -1080,34 +1080,14 @@ TRACE_EVENT(trans_blocked_journal_reclaim, __entry->must_wait) ); -TRACE_EVENT(trans_restart_journal_preres_get, - TP_PROTO(struct btree_trans *trans, - unsigned long caller_ip, - unsigned flags), - TP_ARGS(trans, caller_ip, flags), - - TP_STRUCT__entry( - __array(char, trans_fn, 32 ) - __field(unsigned long, caller_ip ) - __field(unsigned, flags ) - ), - - TP_fast_assign( - strscpy(__entry->trans_fn, trans->fn, sizeof(__entry->trans_fn)); - __entry->caller_ip = caller_ip; - __entry->flags = flags; - ), - - TP_printk("%s %pS %x", __entry->trans_fn, - (void *) __entry->caller_ip, - __entry->flags) -); - +#if 0 +/* todo: bring back dynamic fault injection */ DEFINE_EVENT(transaction_event, trans_restart_fault_inject, TP_PROTO(struct btree_trans *trans, unsigned long caller_ip), TP_ARGS(trans, caller_ip) ); +#endif DEFINE_EVENT(transaction_event, trans_traverse_all, TP_PROTO(struct btree_trans *trans, @@ -1195,19 +1175,6 @@ DEFINE_EVENT(transaction_restart_iter, trans_restart_relock_parent_for_fill, TP_ARGS(trans, caller_ip, path) ); -DEFINE_EVENT(transaction_restart_iter, trans_restart_relock_after_fill, - TP_PROTO(struct btree_trans *trans, - unsigned long caller_ip, - struct btree_path *path), - TP_ARGS(trans, caller_ip, path) -); - -DEFINE_EVENT(transaction_event, trans_restart_key_cache_upgrade, - TP_PROTO(struct btree_trans *trans, - unsigned long caller_ip), - TP_ARGS(trans, caller_ip) -); - DEFINE_EVENT(transaction_restart_iter, trans_restart_relock_key_cache_fill, TP_PROTO(struct btree_trans *trans, unsigned long caller_ip, @@ -1229,13 +1196,6 @@ DEFINE_EVENT(transaction_restart_iter, trans_restart_relock_path_intent, TP_ARGS(trans, caller_ip, path) ); -DEFINE_EVENT(transaction_restart_iter, trans_restart_traverse, - TP_PROTO(struct btree_trans *trans, - unsigned long caller_ip, - struct btree_path *path), - TP_ARGS(trans, caller_ip, path) -); - DEFINE_EVENT(transaction_restart_iter, trans_restart_memory_allocation_failure, TP_PROTO(struct btree_trans *trans, unsigned long caller_ip, @@ -1294,44 +1254,6 @@ TRACE_EVENT(trans_restart_mem_realloced, __entry->bytes) ); -TRACE_EVENT(trans_restart_key_cache_key_realloced, - TP_PROTO(struct btree_trans *trans, - unsigned long caller_ip, - struct btree_path *path, - unsigned old_u64s, - unsigned new_u64s), - TP_ARGS(trans, caller_ip, path, old_u64s, new_u64s), - - TP_STRUCT__entry( - __array(char, trans_fn, 32 ) - __field(unsigned long, caller_ip ) - __field(enum btree_id, btree_id ) - TRACE_BPOS_entries(pos) - __field(u32, old_u64s ) - __field(u32, new_u64s ) - ), - - TP_fast_assign( - strscpy(__entry->trans_fn, trans->fn, sizeof(__entry->trans_fn)); - __entry->caller_ip = caller_ip; - - __entry->btree_id = path->btree_id; - TRACE_BPOS_assign(pos, path->pos); - __entry->old_u64s = old_u64s; - __entry->new_u64s = new_u64s; - ), - - TP_printk("%s %pS btree %s pos %llu:%llu:%u old_u64s %u new_u64s %u", - __entry->trans_fn, - (void *) __entry->caller_ip, - bch2_btree_id_str(__entry->btree_id), - __entry->pos_inode, - __entry->pos_offset, - __entry->pos_snapshot, - __entry->old_u64s, - __entry->new_u64s) -); - DEFINE_EVENT(transaction_event, trans_restart_write_buffer_flush, TP_PROTO(struct btree_trans *trans, unsigned long caller_ip), @@ -1927,21 +1849,6 @@ TRACE_EVENT(btree_path_free, __entry->dup_locked) ); -TRACE_EVENT(btree_path_free_trans_begin, - TP_PROTO(btree_path_idx_t path), - TP_ARGS(path), - - TP_STRUCT__entry( - __field(btree_path_idx_t, idx ) - ), - - TP_fast_assign( - __entry->idx = path; - ), - - TP_printk(" path %3u", __entry->idx) -); - #else /* CONFIG_BCACHEFS_PATH_TRACEPOINTS */ #ifndef _TRACE_BCACHEFS_H @@ -1959,7 +1866,6 @@ static inline void trace_btree_path_traverse_start(struct btree_trans *trans, st static inline void trace_btree_path_traverse_end(struct btree_trans *trans, struct btree_path *path) {} static inline void trace_btree_path_set_pos(struct btree_trans *trans, struct btree_path *path, struct bpos *new_pos) {} static inline void trace_btree_path_free(struct btree_trans *trans, btree_path_idx_t path, struct btree_path *dup) {} -static inline void trace_btree_path_free_trans_begin(btree_path_idx_t path) {} #endif #endif /* CONFIG_BCACHEFS_PATH_TRACEPOINTS */ From 7c9cef5f8bf10a803fd0937ea071a93778f1108a Mon Sep 17 00:00:00 2001 From: Kent Overstreet Date: Fri, 13 Jun 2025 18:01:25 -0400 Subject: [PATCH 055/289] bcachefs: mark more errors autofix Signed-off-by: Kent Overstreet --- fs/bcachefs/sb-errors_format.h | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/fs/bcachefs/sb-errors_format.h b/fs/bcachefs/sb-errors_format.h index d06e73884871..56971ebe0f78 100644 --- a/fs/bcachefs/sb-errors_format.h +++ b/fs/bcachefs/sb-errors_format.h @@ -217,7 +217,7 @@ enum bch_fsck_flags { x(inode_str_hash_invalid, 194, 0) \ x(inode_v3_fields_start_bad, 195, 0) \ x(inode_snapshot_mismatch, 196, 0) \ - x(snapshot_key_missing_inode_snapshot, 314, 0) \ + x(snapshot_key_missing_inode_snapshot, 314, FSCK_AUTOFIX) \ x(inode_unlinked_but_clean, 197, 0) \ x(inode_unlinked_but_nlink_nonzero, 198, 0) \ x(inode_unlinked_and_not_open, 281, 0) \ @@ -253,18 +253,18 @@ enum bch_fsck_flags { x(extent_overlapping, 215, 0) \ x(key_in_missing_inode, 216, 0) \ x(key_in_wrong_inode_type, 217, 0) \ - x(extent_past_end_of_inode, 218, 0) \ + x(extent_past_end_of_inode, 218, FSCK_AUTOFIX) \ x(dirent_empty_name, 219, 0) \ x(dirent_val_too_big, 220, 0) \ x(dirent_name_too_long, 221, 0) \ x(dirent_name_embedded_nul, 222, 0) \ x(dirent_name_dot_or_dotdot, 223, 0) \ x(dirent_name_has_slash, 224, 0) \ - x(dirent_d_type_wrong, 225, 0) \ + x(dirent_d_type_wrong, 225, FSCK_AUTOFIX) \ x(inode_bi_parent_wrong, 226, 0) \ x(dirent_in_missing_dir_inode, 227, 0) \ x(dirent_in_non_dir_inode, 228, 0) \ - x(dirent_to_missing_inode, 229, 0) \ + x(dirent_to_missing_inode, 229, FSCK_AUTOFIX) \ x(dirent_to_overwritten_inode, 302, 0) \ x(dirent_to_missing_subvol, 230, 0) \ x(dirent_to_itself, 231, 0) \ From d89a34b14df5c205de698c23c3950b2b947cdb97 Mon Sep 17 00:00:00 2001 From: Alan Huang Date: Sat, 14 Jun 2025 17:18:07 +0800 Subject: [PATCH 056/289] bcachefs: Move bset size check before csum check In syzbot's crash, the bset's u64s is larger than the btree node. Reported-by: syzbot+bfaeaa8e26281970158d@syzkaller.appspotmail.com Signed-off-by: Alan Huang Signed-off-by: Kent Overstreet --- fs/bcachefs/btree_io.c | 42 ++++++++++++++++++++++-------------------- 1 file changed, 22 insertions(+), 20 deletions(-) diff --git a/fs/bcachefs/btree_io.c b/fs/bcachefs/btree_io.c index d8f3c4c65e90..005d5c94edd0 100644 --- a/fs/bcachefs/btree_io.c +++ b/fs/bcachefs/btree_io.c @@ -723,12 +723,11 @@ void bch2_btree_node_drop_keys_outside_node(struct btree *b) static int validate_bset(struct bch_fs *c, struct bch_dev *ca, struct btree *b, struct bset *i, - unsigned offset, unsigned sectors, int write, + unsigned offset, int write, struct bch_io_failures *failed, struct printbuf *err_msg) { unsigned version = le16_to_cpu(i->version); - unsigned ptr_written = btree_ptr_sectors_written(bkey_i_to_s_c(&b->key)); struct printbuf buf1 = PRINTBUF; struct printbuf buf2 = PRINTBUF; int ret = 0; @@ -778,15 +777,6 @@ static int validate_bset(struct bch_fs *c, struct bch_dev *ca, btree_node_unsupported_version, "BSET_SEPARATE_WHITEOUTS no longer supported"); - if (!write && - btree_err_on(offset + sectors > (ptr_written ?: btree_sectors(c)), - -BCH_ERR_btree_node_read_err_fixable, - c, ca, b, i, NULL, - bset_past_end_of_btree_node, - "bset past end of btree node (offset %u len %u but written %zu)", - offset, sectors, ptr_written ?: btree_sectors(c))) - i->u64s = 0; - btree_err_on(offset && !i->u64s, -BCH_ERR_btree_node_read_err_fixable, c, ca, b, i, NULL, @@ -1151,6 +1141,14 @@ int bch2_btree_node_read_done(struct bch_fs *c, struct bch_dev *ca, "unknown checksum type %llu", BSET_CSUM_TYPE(i)); if (first) { + sectors = vstruct_sectors(b->data, c->block_bits); + if (btree_err_on(b->written + sectors > (ptr_written ?: btree_sectors(c)), + -BCH_ERR_btree_node_read_err_fixable, + c, ca, b, i, NULL, + bset_past_end_of_btree_node, + "bset past end of btree node (offset %u len %u but written %zu)", + b->written, sectors, ptr_written ?: btree_sectors(c))) + i->u64s = 0; if (good_csum_type) { struct bch_csum csum = csum_vstruct(c, BSET_CSUM_TYPE(i), nonce, b->data); bool csum_bad = bch2_crc_cmp(b->data->csum, csum); @@ -1178,9 +1176,15 @@ int bch2_btree_node_read_done(struct bch_fs *c, struct bch_dev *ca, c, NULL, b, NULL, NULL, btree_node_unsupported_version, "btree node does not have NEW_EXTENT_OVERWRITE set"); - - sectors = vstruct_sectors(b->data, c->block_bits); } else { + sectors = vstruct_sectors(bne, c->block_bits); + if (btree_err_on(b->written + sectors > (ptr_written ?: btree_sectors(c)), + -BCH_ERR_btree_node_read_err_fixable, + c, ca, b, i, NULL, + bset_past_end_of_btree_node, + "bset past end of btree node (offset %u len %u but written %zu)", + b->written, sectors, ptr_written ?: btree_sectors(c))) + i->u64s = 0; if (good_csum_type) { struct bch_csum csum = csum_vstruct(c, BSET_CSUM_TYPE(i), nonce, bne); bool csum_bad = bch2_crc_cmp(bne->csum, csum); @@ -1201,14 +1205,12 @@ int bch2_btree_node_read_done(struct bch_fs *c, struct bch_dev *ca, "decrypting btree node: %s", bch2_err_str(ret))) goto fsck_err; } - - sectors = vstruct_sectors(bne, c->block_bits); } b->version_ondisk = min(b->version_ondisk, le16_to_cpu(i->version)); - ret = validate_bset(c, ca, b, i, b->written, sectors, READ, failed, err_msg); + ret = validate_bset(c, ca, b, i, b->written, READ, failed, err_msg); if (ret) goto fsck_err; @@ -2267,7 +2269,7 @@ static void btree_node_write_endio(struct bio *bio) } static int validate_bset_for_write(struct bch_fs *c, struct btree *b, - struct bset *i, unsigned sectors) + struct bset *i) { int ret = bch2_bkey_validate(c, bkey_i_to_s_c(&b->key), (struct bkey_validate_context) { @@ -2282,7 +2284,7 @@ static int validate_bset_for_write(struct bch_fs *c, struct btree *b, } ret = validate_bset_keys(c, b, i, WRITE, NULL, NULL) ?: - validate_bset(c, NULL, b, i, b->written, sectors, WRITE, NULL, NULL); + validate_bset(c, NULL, b, i, b->written, WRITE, NULL, NULL); if (ret) { bch2_inconsistent_error(c); dump_stack(); @@ -2475,7 +2477,7 @@ do_write: /* if we're going to be encrypting, check metadata validity first: */ if (validate_before_checksum && - validate_bset_for_write(c, b, i, sectors_to_write)) + validate_bset_for_write(c, b, i)) goto err; ret = bset_encrypt(c, i, b->written << 9); @@ -2492,7 +2494,7 @@ do_write: /* if we're not encrypting, check metadata after checksumming: */ if (!validate_before_checksum && - validate_bset_for_write(c, b, i, sectors_to_write)) + validate_bset_for_write(c, b, i)) goto err; /* From 56be92c63f02e0f6fd855075acb1471ea1c68539 Mon Sep 17 00:00:00 2001 From: Alan Huang Date: Sun, 15 Jun 2025 13:41:22 +0800 Subject: [PATCH 057/289] bcachefs: Fix pool->alloc NULL pointer dereference btree_interior_update_pool has not been initialized before the filesystem becomes read-write, thus mempool_alloc in bch2_btree_update_start will trigger pool->alloc NULL pointer dereference in mempool_alloc_noprof Reported-by: syzbot+2f3859bd28f20fa682e6@syzkaller.appspotmail.com Signed-off-by: Alan Huang Signed-off-by: Kent Overstreet --- fs/bcachefs/bcachefs.h | 3 ++- fs/bcachefs/chardev.c | 29 ++++++++++++++++++++++------- 2 files changed, 24 insertions(+), 8 deletions(-) diff --git a/fs/bcachefs/bcachefs.h b/fs/bcachefs/bcachefs.h index 5a1cede2febf..8043943cdf6a 100644 --- a/fs/bcachefs/bcachefs.h +++ b/fs/bcachefs/bcachefs.h @@ -767,7 +767,8 @@ struct btree_trans_buf { x(sysfs) \ x(btree_write_buffer) \ x(btree_node_scrub) \ - x(async_recovery_passes) + x(async_recovery_passes) \ + x(ioctl_data) enum bch_write_ref { #define x(n) BCH_WRITE_REF_##n, diff --git a/fs/bcachefs/chardev.c b/fs/bcachefs/chardev.c index fde3c2380e28..5ea89aa2b0c4 100644 --- a/fs/bcachefs/chardev.c +++ b/fs/bcachefs/chardev.c @@ -319,6 +319,7 @@ static int bch2_data_thread(void *arg) ctx->stats.ret = BCH_IOCTL_DATA_EVENT_RET_done; ctx->stats.data_type = (int) DATA_PROGRESS_DATA_TYPE_done; } + enumerated_ref_put(&ctx->c->writes, BCH_WRITE_REF_ioctl_data); return 0; } @@ -378,15 +379,24 @@ static long bch2_ioctl_data(struct bch_fs *c, struct bch_data_ctx *ctx; int ret; - if (!capable(CAP_SYS_ADMIN)) - return -EPERM; + if (!enumerated_ref_tryget(&c->writes, BCH_WRITE_REF_ioctl_data)) + return -EROFS; - if (arg.op >= BCH_DATA_OP_NR || arg.flags) - return -EINVAL; + if (!capable(CAP_SYS_ADMIN)) { + ret = -EPERM; + goto put_ref; + } + + if (arg.op >= BCH_DATA_OP_NR || arg.flags) { + ret = -EINVAL; + goto put_ref; + } ctx = kzalloc(sizeof(*ctx), GFP_KERNEL); - if (!ctx) - return -ENOMEM; + if (!ctx) { + ret = -ENOMEM; + goto put_ref; + } ctx->c = c; ctx->arg = arg; @@ -395,7 +405,12 @@ static long bch2_ioctl_data(struct bch_fs *c, &bcachefs_data_ops, bch2_data_thread); if (ret < 0) - kfree(ctx); + goto cleanup; + return ret; +cleanup: + kfree(ctx); +put_ref: + enumerated_ref_put(&c->writes, BCH_WRITE_REF_ioctl_data); return ret; } From 03208bd06a61bc2ebc423b6485cbcffecd37af36 Mon Sep 17 00:00:00 2001 From: Bharadwaj Raju Date: Sun, 15 Jun 2025 22:15:38 +0530 Subject: [PATCH 058/289] bcachefs: don't return fsck_fix for unfixable node errors in __btree_err After cd3cdb1ef706 ("Single err message for btree node reads"), all errors caused __btree_err to return -BCH_ERR_fsck_fix no matter what the actual error type was if the recovery pass was scanning for btree nodes. This lead to the code continuing despite things like bad node formats when they earlier would have caused a jump to fsck_err, because btree_err only jumps when the return from __btree_err does not match fsck_fix. Ultimately this lead to undefined behavior by attempting to unpack a key based on an invalid format. Make only errors of type -BCH_ERR_btree_node_read_err_fixable cause __btree_err to return -BCH_ERR_fsck_fix when scanning for btree nodes. Reported-by: syzbot+cfd994b9cdf00446fd54@syzkaller.appspotmail.com Fixes: cd3cdb1ef706 ("bcachefs: Single err message for btree node reads") Signed-off-by: Bharadwaj Raju Signed-off-by: Kent Overstreet --- fs/bcachefs/btree_io.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/fs/bcachefs/btree_io.c b/fs/bcachefs/btree_io.c index 005d5c94edd0..dca5530a2f49 100644 --- a/fs/bcachefs/btree_io.c +++ b/fs/bcachefs/btree_io.c @@ -557,7 +557,9 @@ static int __btree_err(int ret, const char *fmt, ...) { if (c->recovery.curr_pass == BCH_RECOVERY_PASS_scan_for_btree_nodes) - return bch_err_throw(c, fsck_fix); + return ret == -BCH_ERR_btree_node_read_err_fixable + ? bch_err_throw(c, fsck_fix) + : ret; bool have_retry = false; int ret2; From f2a701fd94f161bdca7537284ba218c20181451e Mon Sep 17 00:00:00 2001 From: Kent Overstreet Date: Sun, 15 Jun 2025 12:09:24 -0400 Subject: [PATCH 059/289] bcachefs: fsck: Improve check_key_has_inode() Print out more info when we find a key (extent, dirent, xattr) for a missing inode - was there a good inode in an older snapshot, full(ish) list of keys for that missing inode, so we can make better decisions on how to repair. If it looks like it should've been deleted, autofix it. If we ever hit the non-autofix cases, we'll want to write more repair code (possibly reconstituting the inode). Signed-off-by: Kent Overstreet --- fs/bcachefs/fsck.c | 108 ++++++++++++++++++++++++++------- fs/bcachefs/sb-errors_format.h | 2 +- 2 files changed, 88 insertions(+), 22 deletions(-) diff --git a/fs/bcachefs/fsck.c b/fs/bcachefs/fsck.c index 48810a8e5d4b..f25d33fb3e23 100644 --- a/fs/bcachefs/fsck.c +++ b/fs/bcachefs/fsck.c @@ -1436,6 +1436,7 @@ static int check_key_has_inode(struct btree_trans *trans, { struct bch_fs *c = trans->c; struct printbuf buf = PRINTBUF; + struct btree_iter iter2 = {}; int ret = PTR_ERR_OR_ZERO(i); if (ret) return ret; @@ -1445,40 +1446,105 @@ static int check_key_has_inode(struct btree_trans *trans, bool have_inode = i && !i->whiteout; - if (!have_inode && (c->sb.btrees_lost_data & BIT_ULL(BTREE_ID_inodes))) { - ret = reconstruct_inode(trans, iter->btree_id, k.k->p.snapshot, k.k->p.inode) ?: - bch2_trans_commit(trans, NULL, NULL, BCH_TRANS_COMMIT_no_enospc); - if (ret) - goto err; + if (!have_inode && (c->sb.btrees_lost_data & BIT_ULL(BTREE_ID_inodes))) + goto reconstruct; - inode->last_pos.inode--; - ret = bch_err_throw(c, transaction_restart_nested); - goto err; + if (have_inode && btree_matches_i_mode(iter->btree_id, i->inode.bi_mode)) + goto out; + + prt_printf(&buf, ", "); + + bool have_old_inode = false; + darray_for_each(inode->inodes, i2) + if (!i2->whiteout && + bch2_snapshot_is_ancestor(c, k.k->p.snapshot, i2->inode.bi_snapshot) && + btree_matches_i_mode(iter->btree_id, i2->inode.bi_mode)) { + prt_printf(&buf, "but found good inode in older snapshot\n"); + bch2_inode_unpacked_to_text(&buf, &i2->inode); + prt_newline(&buf); + have_old_inode = true; + break; + } + + struct bkey_s_c k2; + unsigned nr_keys = 0; + + prt_printf(&buf, "found keys:\n"); + + for_each_btree_key_max_norestart(trans, iter2, iter->btree_id, + SPOS(k.k->p.inode, 0, k.k->p.snapshot), + POS(k.k->p.inode, U64_MAX), + 0, k2, ret) { + nr_keys++; + if (nr_keys <= 10) { + bch2_bkey_val_to_text(&buf, c, k2); + prt_newline(&buf); + } + if (nr_keys >= 100) + break; } - if (fsck_err_on(!have_inode, - trans, key_in_missing_inode, - "key in missing inode:\n%s", - (printbuf_reset(&buf), - bch2_bkey_val_to_text(&buf, c, k), buf.buf))) - goto delete; + if (ret) + goto err; - if (fsck_err_on(have_inode && !btree_matches_i_mode(iter->btree_id, i->inode.bi_mode), - trans, key_in_wrong_inode_type, - "key for wrong inode mode %o:\n%s", - i->inode.bi_mode, - (printbuf_reset(&buf), - bch2_bkey_val_to_text(&buf, c, k), buf.buf))) - goto delete; + if (nr_keys > 100) + prt_printf(&buf, "found > %u keys for this missing inode\n", nr_keys); + else if (nr_keys > 10) + prt_printf(&buf, "found %u keys for this missing inode\n", nr_keys); + + if (!have_inode) { + if (fsck_err_on(!have_inode, + trans, key_in_missing_inode, + "key in missing inode%s", buf.buf)) { + /* + * Maybe a deletion that raced with data move, or something + * weird like that? But if we know the inode was deleted, or + * it's just a few keys, we can safely delete them. + * + * If it's many keys, we should probably recreate the inode + */ + if (have_old_inode || nr_keys <= 2) + goto delete; + else + goto reconstruct; + } + } else { + /* + * not autofix, this one would be a giant wtf - bit error in the + * inode corrupting i_mode? + * + * may want to try repairing inode instead of deleting + */ + if (fsck_err_on(!btree_matches_i_mode(iter->btree_id, i->inode.bi_mode), + trans, key_in_wrong_inode_type, + "key for wrong inode mode %o%s", + i->inode.bi_mode, buf.buf)) + goto delete; + } out: err: fsck_err: + bch2_trans_iter_exit(trans, &iter2); printbuf_exit(&buf); bch_err_fn(c, ret); return ret; delete: + /* + * XXX: print out more info + * count up extents for this inode, check if we have different inode in + * an older snapshot version, perhaps decide if we want to reconstitute + */ ret = bch2_btree_delete_at(trans, iter, BTREE_UPDATE_internal_snapshot_node); goto out; +reconstruct: + ret = reconstruct_inode(trans, iter->btree_id, k.k->p.snapshot, k.k->p.inode) ?: + bch2_trans_commit(trans, NULL, NULL, BCH_TRANS_COMMIT_no_enospc); + if (ret) + goto err; + + inode->last_pos.inode--; + ret = bch_err_throw(c, transaction_restart_nested); + goto out; } static int check_i_sectors_notnested(struct btree_trans *trans, struct inode_walker *w) diff --git a/fs/bcachefs/sb-errors_format.h b/fs/bcachefs/sb-errors_format.h index 56971ebe0f78..f1aa40542a0e 100644 --- a/fs/bcachefs/sb-errors_format.h +++ b/fs/bcachefs/sb-errors_format.h @@ -251,7 +251,7 @@ enum bch_fsck_flags { x(deleted_inode_not_unlinked, 214, FSCK_AUTOFIX) \ x(deleted_inode_has_child_snapshots, 288, FSCK_AUTOFIX) \ x(extent_overlapping, 215, 0) \ - x(key_in_missing_inode, 216, 0) \ + x(key_in_missing_inode, 216, FSCK_AUTOFIX) \ x(key_in_wrong_inode_type, 217, 0) \ x(extent_past_end_of_inode, 218, FSCK_AUTOFIX) \ x(dirent_empty_name, 219, 0) \ From 1cddad0fcbccc7056036f2a83184a3ca5c62f7d3 Mon Sep 17 00:00:00 2001 From: Kent Overstreet Date: Sun, 15 Jun 2025 16:43:34 -0400 Subject: [PATCH 060/289] bcachefs: Call bch2_fs_init_rw() early if we'll be going rw kthread creation checks for pending signals, which is _very_ annoying if we have to do a long recovery and don't go rw until we've done significant work. Check if we'll be going rw and pre-allocate kthreads/workqueues. Signed-off-by: Kent Overstreet --- fs/bcachefs/recovery.c | 10 ++++++++++ fs/bcachefs/recovery_passes.c | 6 +----- fs/bcachefs/recovery_passes.h | 9 +++++++++ fs/bcachefs/super.c | 13 +++++++++++-- fs/bcachefs/super.h | 1 + 5 files changed, 32 insertions(+), 7 deletions(-) diff --git a/fs/bcachefs/recovery.c b/fs/bcachefs/recovery.c index 37f2cc1ec2f8..fa5d1ef5bea6 100644 --- a/fs/bcachefs/recovery.c +++ b/fs/bcachefs/recovery.c @@ -762,6 +762,16 @@ int bch2_fs_recovery(struct bch_fs *c) c->opts.fsck = true; } + if (go_rw_in_recovery(c)) { + /* + * start workqueues/kworkers early - kthread creation checks for + * pending signals, which is _very_ annoying + */ + ret = bch2_fs_init_rw(c); + if (ret) + goto err; + } + mutex_lock(&c->sb_lock); struct bch_sb_field_ext *ext = bch2_sb_field_get(c->disk_sb.sb, ext); bool write_sb = false; diff --git a/fs/bcachefs/recovery_passes.c b/fs/bcachefs/recovery_passes.c index 35ac0d64d73a..c2c18c0a5429 100644 --- a/fs/bcachefs/recovery_passes.c +++ b/fs/bcachefs/recovery_passes.c @@ -217,11 +217,7 @@ static int bch2_set_may_go_rw(struct bch_fs *c) set_bit(BCH_FS_may_go_rw, &c->flags); - if (keys->nr || - !c->opts.read_only || - !c->sb.clean || - c->opts.recovery_passes || - (c->opts.fsck && !(c->sb.features & BIT_ULL(BCH_FEATURE_no_alloc_info)))) { + if (go_rw_in_recovery(c)) { if (c->sb.features & BIT_ULL(BCH_FEATURE_no_alloc_info)) { bch_info(c, "mounting a filesystem with no alloc info read-write; will recreate"); bch2_reconstruct_alloc(c); diff --git a/fs/bcachefs/recovery_passes.h b/fs/bcachefs/recovery_passes.h index 260571c7105e..2117f0ce1922 100644 --- a/fs/bcachefs/recovery_passes.h +++ b/fs/bcachefs/recovery_passes.h @@ -17,6 +17,15 @@ enum bch_run_recovery_pass_flags { RUN_RECOVERY_PASS_ratelimit = BIT(1), }; +static inline bool go_rw_in_recovery(struct bch_fs *c) +{ + return (c->journal_keys.nr || + !c->opts.read_only || + !c->sb.clean || + c->opts.recovery_passes || + (c->opts.fsck && !(c->sb.features & BIT_ULL(BCH_FEATURE_no_alloc_info)))); +} + int bch2_run_print_explicit_recovery_pass(struct bch_fs *, enum bch_recovery_pass); int __bch2_run_explicit_recovery_pass(struct bch_fs *, struct printbuf *, diff --git a/fs/bcachefs/super.c b/fs/bcachefs/super.c index a5b97c9c5163..69c097ff54e7 100644 --- a/fs/bcachefs/super.c +++ b/fs/bcachefs/super.c @@ -210,7 +210,6 @@ static int bch2_dev_alloc(struct bch_fs *, unsigned); static int bch2_dev_sysfs_online(struct bch_fs *, struct bch_dev *); static void bch2_dev_io_ref_stop(struct bch_dev *, int); static void __bch2_dev_read_only(struct bch_fs *, struct bch_dev *); -static int bch2_fs_init_rw(struct bch_fs *); struct bch_fs *bch2_dev_to_fs(dev_t dev) { @@ -794,7 +793,7 @@ err: return ret; } -static int bch2_fs_init_rw(struct bch_fs *c) +int bch2_fs_init_rw(struct bch_fs *c) { if (test_bit(BCH_FS_rw_init_done, &c->flags)) return 0; @@ -1015,6 +1014,16 @@ static struct bch_fs *bch2_fs_alloc(struct bch_sb *sb, struct bch_opts *opts, if (ret) goto err; + if (go_rw_in_recovery(c)) { + /* + * start workqueues/kworkers early - kthread creation checks for + * pending signals, which is _very_ annoying + */ + ret = bch2_fs_init_rw(c); + if (ret) + goto err; + } + #ifdef CONFIG_UNICODE /* Default encoding until we can potentially have more as an option. */ c->cf_encoding = utf8_load(BCH_FS_DEFAULT_UTF8_ENCODING); diff --git a/fs/bcachefs/super.h b/fs/bcachefs/super.h index dc52f06cb2b9..e90bab9afe78 100644 --- a/fs/bcachefs/super.h +++ b/fs/bcachefs/super.h @@ -46,6 +46,7 @@ void __bch2_fs_stop(struct bch_fs *); void bch2_fs_free(struct bch_fs *); void bch2_fs_stop(struct bch_fs *); +int bch2_fs_init_rw(struct bch_fs *); int bch2_fs_start(struct bch_fs *); struct bch_fs *bch2_fs_open(darray_const_str *, struct bch_opts *); From 9ba6930ef8e0e00102716f4627896e0be6358d7b Mon Sep 17 00:00:00 2001 From: Kent Overstreet Date: Mon, 16 Jun 2025 14:00:40 -0400 Subject: [PATCH 061/289] bcachefs: Fix __bch2_inum_to_path() when crossing subvol boundaries The bch2_subvolume_get_snapshot() call needs to happen before the dirent lookup - the dirent is in the parent subvolume. Also, check for loops. Signed-off-by: Kent Overstreet --- fs/bcachefs/namei.c | 27 ++++++++++++++++++++++----- 1 file changed, 22 insertions(+), 5 deletions(-) diff --git a/fs/bcachefs/namei.c b/fs/bcachefs/namei.c index 779c22eb3979..7d1068aa998f 100644 --- a/fs/bcachefs/namei.c +++ b/fs/bcachefs/namei.c @@ -625,14 +625,26 @@ static int __bch2_inum_to_path(struct btree_trans *trans, { unsigned orig_pos = path->pos; int ret = 0; + DARRAY(subvol_inum) inums = {}; + + if (!snapshot) { + ret = bch2_subvolume_get_snapshot(trans, subvol, &snapshot); + if (ret) + goto disconnected; + } while (true) { - if (!snapshot) { - ret = bch2_subvolume_get_snapshot(trans, subvol, &snapshot); - if (ret) - goto disconnected; + subvol_inum n = (subvol_inum) { subvol ?: snapshot, inum }; + + if (darray_find_p(inums, i, i->subvol == n.subvol && i->inum == n.inum)) { + prt_str_reversed(path, "(loop)"); + break; } + ret = darray_push(&inums, n); + if (ret) + goto err; + struct bch_inode_unpacked inode; ret = bch2_inode_find_by_inum_snapshot(trans, inum, snapshot, &inode, 0); if (ret) @@ -650,7 +662,9 @@ static int __bch2_inum_to_path(struct btree_trans *trans, inum = inode.bi_dir; if (inode.bi_parent_subvol) { subvol = inode.bi_parent_subvol; - snapshot = 0; + ret = bch2_subvolume_get_snapshot(trans, inode.bi_parent_subvol, &snapshot); + if (ret) + goto disconnected; } struct btree_iter d_iter; @@ -662,6 +676,7 @@ static int __bch2_inum_to_path(struct btree_trans *trans, goto disconnected; struct qstr dirent_name = bch2_dirent_get_name(d); + prt_bytes_reversed(path, dirent_name.name, dirent_name.len); prt_char(path, '/'); @@ -677,8 +692,10 @@ out: goto err; reverse_bytes(path->buf + orig_pos, path->pos - orig_pos); + darray_exit(&inums); return 0; err: + darray_exit(&inums); return ret; disconnected: if (bch2_err_matches(ret, BCH_ERR_transaction_restart)) From 7029cc4d13453499a88f512720d26c1a0c4e957b Mon Sep 17 00:00:00 2001 From: Kent Overstreet Date: Sat, 7 Jun 2025 19:33:59 -0400 Subject: [PATCH 062/289] bcachefs: fsck: Print path when we find a subvol loop Signed-off-by: Kent Overstreet --- fs/bcachefs/fsck.c | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/fs/bcachefs/fsck.c b/fs/bcachefs/fsck.c index f25d33fb3e23..0d87aa9b4f23 100644 --- a/fs/bcachefs/fsck.c +++ b/fs/bcachefs/fsck.c @@ -2578,6 +2578,11 @@ static int check_subvol_path(struct btree_trans *trans, struct btree_iter *iter, if (k.k->type != KEY_TYPE_subvolume) return 0; + subvol_inum start = { + .subvol = k.k->p.offset, + .inum = le64_to_cpu(bkey_s_c_to_subvolume(k).v->inode), + }; + while (k.k->p.offset != BCACHEFS_ROOT_SUBVOL) { ret = darray_push(&subvol_path, k.k->p.offset); if (ret) @@ -2596,11 +2601,11 @@ static int check_subvol_path(struct btree_trans *trans, struct btree_iter *iter, if (darray_u32_has(&subvol_path, parent)) { printbuf_reset(&buf); - prt_printf(&buf, "subvolume loop:\n"); + prt_printf(&buf, "subvolume loop: "); - darray_for_each_reverse(subvol_path, i) - prt_printf(&buf, "%u ", *i); - prt_printf(&buf, "%u", parent); + ret = bch2_inum_to_path(trans, start, &buf); + if (ret) + goto err; if (fsck_err(trans, subvol_loop, "%s", buf.buf)) ret = reattach_subvol(trans, s); From c1ca07a4dd1a1ab17ed651729c37b04af9f75ee8 Mon Sep 17 00:00:00 2001 From: Kent Overstreet Date: Mon, 16 Jun 2025 14:15:28 -0400 Subject: [PATCH 063/289] bcachefs: fsck: Fix remove_backpointer() for subvol roots The dirent will be in a different snapshot if the inode is a subvolume root. Signed-off-by: Kent Overstreet --- fs/bcachefs/fsck.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/fs/bcachefs/fsck.c b/fs/bcachefs/fsck.c index 0d87aa9b4f23..a3f73e7a2113 100644 --- a/fs/bcachefs/fsck.c +++ b/fs/bcachefs/fsck.c @@ -493,10 +493,18 @@ static int remove_backpointer(struct btree_trans *trans, if (!inode->bi_dir) return 0; + u32 snapshot = inode->bi_snapshot; + + if (inode->bi_parent_subvol) { + int ret = bch2_subvolume_get_snapshot(trans, inode->bi_parent_subvol, &snapshot); + if (ret) + return ret; + } + struct bch_fs *c = trans->c; struct btree_iter iter; struct bkey_s_c_dirent d = dirent_get_by_pos(trans, &iter, - SPOS(inode->bi_dir, inode->bi_dir_offset, inode->bi_snapshot)); + SPOS(inode->bi_dir, inode->bi_dir_offset, snapshot)); int ret = bkey_err(d) ?: dirent_points_to_inode(c, d, inode) ?: bch2_fsck_remove_dirent(trans, d.k->p); From 9fb09ace59b2beab312ec225630ce87ddbec6d79 Mon Sep 17 00:00:00 2001 From: Kent Overstreet Date: Mon, 16 Jun 2025 14:41:27 -0400 Subject: [PATCH 064/289] bcachefs: fsck: Fix reattach_inode() for subvol roots bch_subvolume.fs_path_parent needs to be updated as well, it should match inode.bi_parent_subvol. Signed-off-by: Kent Overstreet --- fs/bcachefs/fsck.c | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/fs/bcachefs/fsck.c b/fs/bcachefs/fsck.c index a3f73e7a2113..6e64af4ccc36 100644 --- a/fs/bcachefs/fsck.c +++ b/fs/bcachefs/fsck.c @@ -372,6 +372,18 @@ static int reattach_inode(struct btree_trans *trans, struct bch_inode_unpacked * if (inode->bi_subvol) { inode->bi_parent_subvol = BCACHEFS_ROOT_SUBVOL; + struct btree_iter subvol_iter; + struct bkey_i_subvolume *subvol = + bch2_bkey_get_mut_typed(trans, &subvol_iter, + BTREE_ID_subvolumes, POS(0, inode->bi_subvol), + 0, subvolume); + ret = PTR_ERR_OR_ZERO(subvol); + if (ret) + return ret; + + subvol->v.fs_path_parent = BCACHEFS_ROOT_SUBVOL; + bch2_trans_iter_exit(trans, &subvol_iter); + u64 root_inum; ret = subvol_lookup(trans, inode->bi_parent_subvol, &dirent_snapshot, &root_inum); From 3e5ceaa5bfd71ae94d736426f8a27391023182ff Mon Sep 17 00:00:00 2001 From: Kent Overstreet Date: Mon, 16 Jun 2025 16:13:02 -0400 Subject: [PATCH 065/289] bcachefs: fsck: check_directory_structure runs in reverse order When we find a directory connectivity problem, we should do the repair in the oldest snapshot that has the issue - so that we don't end up duplicating work or making a real mess of things. Oldest snapshot IDs have the highest integer value, so - just walk inodes in reverse order. Signed-off-by: Kent Overstreet --- fs/bcachefs/fsck.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/bcachefs/fsck.c b/fs/bcachefs/fsck.c index 6e64af4ccc36..7414a16af990 100644 --- a/fs/bcachefs/fsck.c +++ b/fs/bcachefs/fsck.c @@ -2841,7 +2841,7 @@ fsck_err: int bch2_check_directory_structure(struct bch_fs *c) { int ret = bch2_trans_run(c, - for_each_btree_key_commit(trans, iter, BTREE_ID_inodes, POS_MIN, + for_each_btree_key_reverse_commit(trans, iter, BTREE_ID_inodes, POS_MIN, BTREE_ITER_intent| BTREE_ITER_prefetch| BTREE_ITER_all_snapshots, k, From 8d6ac82361df55d223186d8cd2ce884310ee5e6b Mon Sep 17 00:00:00 2001 From: Kent Overstreet Date: Mon, 16 Jun 2025 16:14:59 -0400 Subject: [PATCH 066/289] bcachefs: fsck: additional diagnostics for reattach_inode() Log the inode's new path. Signed-off-by: Kent Overstreet --- fs/bcachefs/fsck.c | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/fs/bcachefs/fsck.c b/fs/bcachefs/fsck.c index 7414a16af990..f2c8e904acab 100644 --- a/fs/bcachefs/fsck.c +++ b/fs/bcachefs/fsck.c @@ -399,6 +399,8 @@ static int reattach_inode(struct btree_trans *trans, struct bch_inode_unpacked * if (ret) return ret; + bch_verbose(c, "got lostfound inum %llu", lostfound.bi_inum); + lostfound.bi_nlink += S_ISDIR(inode->bi_mode); /* ensure lost+found inode is also present in inode snapshot */ @@ -435,6 +437,16 @@ static int reattach_inode(struct btree_trans *trans, struct bch_inode_unpacked * if (ret) return ret; + { + CLASS(printbuf, buf)(); + ret = bch2_inum_snapshot_to_path(trans, inode->bi_inum, + inode->bi_snapshot, NULL, &buf); + if (ret) + return ret; + + bch_info(c, "reattached at %s", buf.buf); + } + /* * Fix up inodes in child snapshots: if they should also be reattached * update the backpointer field, if they should not be we need to emit From 583ba52a40dd1f9328e2626056f0797d273817e7 Mon Sep 17 00:00:00 2001 From: Kent Overstreet Date: Mon, 16 Jun 2025 16:15:41 -0400 Subject: [PATCH 067/289] bcachefs: fsck: check_subdir_count logs path We can easily go from inode number -> path now, which makes for more useful log messages. Signed-off-by: Kent Overstreet --- fs/bcachefs/fsck.c | 24 ++++++++++++++++-------- 1 file changed, 16 insertions(+), 8 deletions(-) diff --git a/fs/bcachefs/fsck.c b/fs/bcachefs/fsck.c index f2c8e904acab..96b9ea58705f 100644 --- a/fs/bcachefs/fsck.c +++ b/fs/bcachefs/fsck.c @@ -2066,14 +2066,22 @@ static int check_subdir_count_notnested(struct btree_trans *trans, struct inode_ continue; } - if (fsck_err_on(i->inode.bi_nlink != i->count, - trans, inode_dir_wrong_nlink, - "directory %llu:%u with wrong i_nlink: got %u, should be %llu", - w->last_pos.inode, i->inode.bi_snapshot, i->inode.bi_nlink, i->count)) { - i->inode.bi_nlink = i->count; - ret = bch2_fsck_write_inode(trans, &i->inode); - if (ret) - break; + if (i->inode.bi_nlink != i->count) { + CLASS(printbuf, buf)(); + + lockrestart_do(trans, + bch2_inum_snapshot_to_path(trans, w->last_pos.inode, + i->inode.bi_snapshot, NULL, &buf)); + + if (fsck_err_on(i->inode.bi_nlink != i->count, + trans, inode_dir_wrong_nlink, + "directory with wrong i_nlink: got %u, should be %llu\n%s", + i->inode.bi_nlink, i->count, buf.buf)) { + i->inode.bi_nlink = i->count; + ret = bch2_fsck_write_inode(trans, &i->inode); + if (ret) + break; + } } } fsck_err: From 495ba899d5a163d4cd50627ab48e3edd80dfa3a4 Mon Sep 17 00:00:00 2001 From: Kent Overstreet Date: Mon, 16 Jun 2025 16:16:38 -0400 Subject: [PATCH 068/289] bcachefs: fsck: Fix check_path_loop() + snapshots A path exists in a particular snapshot: we should do the pathwalk in the snapshot ID of the inode we started from, _not_ change snapshot ID as we walk inodes and dirents. Signed-off-by: Kent Overstreet --- fs/bcachefs/fsck.c | 62 +++++++++++++++++++--------------------------- 1 file changed, 26 insertions(+), 36 deletions(-) diff --git a/fs/bcachefs/fsck.c b/fs/bcachefs/fsck.c index 96b9ea58705f..73cff24598ed 100644 --- a/fs/bcachefs/fsck.c +++ b/fs/bcachefs/fsck.c @@ -2689,19 +2689,13 @@ int bch2_check_subvolume_structure(struct bch_fs *c) return ret; } -struct pathbuf_entry { - u64 inum; - u32 snapshot; -}; - -typedef DARRAY(struct pathbuf_entry) pathbuf; - -static int bch2_bi_depth_renumber_one(struct btree_trans *trans, struct pathbuf_entry *p, +static int bch2_bi_depth_renumber_one(struct btree_trans *trans, + u64 inum, u32 snapshot, u32 new_depth) { struct btree_iter iter; struct bkey_s_c k = bch2_bkey_get_iter(trans, &iter, BTREE_ID_inodes, - SPOS(0, p->inum, p->snapshot), 0); + SPOS(0, inum, snapshot), 0); struct bch_inode_unpacked inode; int ret = bkey_err(k) ?: @@ -2720,14 +2714,15 @@ err: return ret; } -static int bch2_bi_depth_renumber(struct btree_trans *trans, pathbuf *path, u32 new_bi_depth) +static int bch2_bi_depth_renumber(struct btree_trans *trans, darray_u64 *path, + u32 snapshot, u32 new_bi_depth) { u32 restart_count = trans->restart_count; int ret = 0; darray_for_each_reverse(*path, i) { ret = nested_lockrestart_do(trans, - bch2_bi_depth_renumber_one(trans, i, new_bi_depth)); + bch2_bi_depth_renumber_one(trans, *i, snapshot, new_bi_depth)); bch_err_fn(trans->c, ret); if (ret) break; @@ -2738,26 +2733,19 @@ static int bch2_bi_depth_renumber(struct btree_trans *trans, pathbuf *path, u32 return ret ?: trans_was_restarted(trans, restart_count); } -static bool path_is_dup(pathbuf *p, u64 inum, u32 snapshot) -{ - darray_for_each(*p, i) - if (i->inum == inum && - i->snapshot == snapshot) - return true; - return false; -} - static int check_path_loop(struct btree_trans *trans, struct bkey_s_c inode_k) { struct bch_fs *c = trans->c; struct btree_iter inode_iter = {}; - pathbuf path = {}; + darray_u64 path = {}; struct printbuf buf = PRINTBUF; u32 snapshot = inode_k.k->p.snapshot; bool redo_bi_depth = false; u32 min_bi_depth = U32_MAX; int ret = 0; + struct bpos start = inode_k.k->p; + struct bch_inode_unpacked inode; ret = bch2_inode_unpack(inode_k, &inode); if (ret) @@ -2766,9 +2754,9 @@ static int check_path_loop(struct btree_trans *trans, struct bkey_s_c inode_k) while (!inode.bi_subvol) { struct btree_iter dirent_iter; struct bkey_s_c_dirent d; - u32 parent_snapshot = snapshot; - d = inode_get_dirent(trans, &dirent_iter, &inode, &parent_snapshot); + d = dirent_get_by_pos(trans, &dirent_iter, + SPOS(inode.bi_dir, inode.bi_dir_offset, snapshot)); ret = bkey_err(d.s_c); if (ret && !bch2_err_matches(ret, ENOENT)) goto out; @@ -2786,15 +2774,10 @@ static int check_path_loop(struct btree_trans *trans, struct bkey_s_c inode_k) bch2_trans_iter_exit(trans, &dirent_iter); - ret = darray_push(&path, ((struct pathbuf_entry) { - .inum = inode.bi_inum, - .snapshot = snapshot, - })); + ret = darray_push(&path, inode.bi_inum); if (ret) return ret; - snapshot = parent_snapshot; - bch2_trans_iter_exit(trans, &inode_iter); inode_k = bch2_bkey_get_iter(trans, &inode_iter, BTREE_ID_inodes, SPOS(0, inode.bi_dir, snapshot), 0); @@ -2816,15 +2799,22 @@ static int check_path_loop(struct btree_trans *trans, struct bkey_s_c inode_k) break; inode = parent_inode; - snapshot = inode_k.k->p.snapshot; redo_bi_depth = true; - if (path_is_dup(&path, inode.bi_inum, snapshot)) { + if (darray_find(path, inode.bi_inum)) { printbuf_reset(&buf); - prt_printf(&buf, "directory structure loop:\n"); - darray_for_each_reverse(path, i) - prt_printf(&buf, "%llu:%u ", i->inum, i->snapshot); - prt_printf(&buf, "%llu:%u", inode.bi_inum, snapshot); + prt_printf(&buf, "directory structure loop in snapshot %u: ", + snapshot); + + ret = bch2_inum_snapshot_to_path(trans, start.offset, start.snapshot, NULL, &buf); + if (ret) + goto out; + + if (c->opts.verbose) { + prt_newline(&buf); + darray_for_each(path, i) + prt_printf(&buf, "%llu ", *i); + } if (fsck_err(trans, dir_loop, "%s", buf.buf)) { ret = remove_backpointer(trans, &inode); @@ -2844,7 +2834,7 @@ static int check_path_loop(struct btree_trans *trans, struct bkey_s_c inode_k) min_bi_depth = 0; if (redo_bi_depth) - ret = bch2_bi_depth_renumber(trans, &path, min_bi_depth); + ret = bch2_bi_depth_renumber(trans, &path, snapshot, min_bi_depth); out: fsck_err: bch2_trans_iter_exit(trans, &inode_iter); From 6c4897caefc8e4c6f2bf08f8317b955672667909 Mon Sep 17 00:00:00 2001 From: Kent Overstreet Date: Mon, 16 Jun 2025 20:08:23 -0400 Subject: [PATCH 069/289] bcachefs: Fix bch2_read_bio_to_text() We can only pass negative error codes to bch2_err_str(); if it's a positive integer it's not an error and we trip an assert. Signed-off-by: Kent Overstreet --- fs/bcachefs/io_read.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/fs/bcachefs/io_read.c b/fs/bcachefs/io_read.c index 04bbdcf58e40..cd184b219a65 100644 --- a/fs/bcachefs/io_read.c +++ b/fs/bcachefs/io_read.c @@ -1491,7 +1491,12 @@ void bch2_read_bio_to_text(struct printbuf *out, struct bch_read_bio *rbio) prt_printf(out, "have_ioref:\t%u\n", rbio->have_ioref); prt_printf(out, "narrow_crcs:\t%u\n", rbio->narrow_crcs); prt_printf(out, "context:\t%u\n", rbio->context); - prt_printf(out, "ret:\t%s\n", bch2_err_str(rbio->ret)); + + int ret = READ_ONCE(rbio->ret); + if (ret < 0) + prt_printf(out, "ret:\t%s\n", bch2_err_str(ret)); + else + prt_printf(out, "ret:\t%i\n", ret); prt_printf(out, "flags:\t"); bch2_prt_bitflags(out, bch2_read_bio_flags, rbio->flags); From e1f0e1a45a40f45ed615abf938cbdfeb7793a9ee Mon Sep 17 00:00:00 2001 From: Kent Overstreet Date: Tue, 17 Jun 2025 10:23:53 -0400 Subject: [PATCH 070/289] bcachefs: Fix restart handling in btree_node_scrub_work() btree node scrub was sometimes failing to rewrite nodes with errors; bch2_btree_node_rewrite() can return a transaction restart and we weren't checking - the lockrestart_do() needs to wrap the entire operation. And there's a better helper it should've been using, bch2_btree_node_rewrite_key(), which makes all this more convenient. Signed-off-by: Kent Overstreet --- fs/bcachefs/btree_io.c | 28 ++++++---------------------- fs/bcachefs/btree_update_interior.c | 11 +++++------ fs/bcachefs/btree_update_interior.h | 3 +++ 3 files changed, 14 insertions(+), 28 deletions(-) diff --git a/fs/bcachefs/btree_io.c b/fs/bcachefs/btree_io.c index dca5530a2f49..08b22bddd747 100644 --- a/fs/bcachefs/btree_io.c +++ b/fs/bcachefs/btree_io.c @@ -1986,28 +1986,12 @@ static void btree_node_scrub_work(struct work_struct *work) prt_newline(&err); if (!btree_node_scrub_check(c, scrub->buf, scrub->written, &err)) { - struct btree_trans *trans = bch2_trans_get(c); - - struct btree_iter iter; - bch2_trans_node_iter_init(trans, &iter, scrub->btree, - scrub->key.k->k.p, 0, scrub->level - 1, 0); - - struct btree *b; - int ret = lockrestart_do(trans, - PTR_ERR_OR_ZERO(b = bch2_btree_iter_peek_node(trans, &iter))); - if (ret) - goto err; - - if (bkey_i_to_btree_ptr_v2(&b->key)->v.seq == scrub->seq) { - bch_err(c, "error validating btree node during scrub on %s at btree %s", - scrub->ca->name, err.buf); - - ret = bch2_btree_node_rewrite(trans, &iter, b, 0, 0); - } -err: - bch2_trans_iter_exit(trans, &iter); - bch2_trans_begin(trans); - bch2_trans_put(trans); + int ret = bch2_trans_do(c, + bch2_btree_node_rewrite_key(trans, scrub->btree, scrub->level - 1, + scrub->key.k, 0)); + if (!bch2_err_matches(ret, ENOENT) && + !bch2_err_matches(ret, EROFS)) + bch_err_fn_ratelimited(c, ret); } printbuf_exit(&err); diff --git a/fs/bcachefs/btree_update_interior.c b/fs/bcachefs/btree_update_interior.c index e77584607f0d..7bf1bd6a6e92 100644 --- a/fs/bcachefs/btree_update_interior.c +++ b/fs/bcachefs/btree_update_interior.c @@ -2293,9 +2293,9 @@ err: goto out; } -static int bch2_btree_node_rewrite_key(struct btree_trans *trans, - enum btree_id btree, unsigned level, - struct bkey_i *k, unsigned flags) +int bch2_btree_node_rewrite_key(struct btree_trans *trans, + enum btree_id btree, unsigned level, + struct bkey_i *k, unsigned flags) { struct btree_iter iter; bch2_trans_node_iter_init(trans, &iter, @@ -2367,9 +2367,8 @@ static void async_btree_node_rewrite_work(struct work_struct *work) int ret = bch2_trans_do(c, bch2_btree_node_rewrite_key(trans, a->btree_id, a->level, a->key.k, 0)); - if (ret != -ENOENT && - !bch2_err_matches(ret, EROFS) && - ret != -BCH_ERR_journal_shutdown) + if (!bch2_err_matches(ret, ENOENT) && + !bch2_err_matches(ret, EROFS)) bch_err_fn_ratelimited(c, ret); spin_lock(&c->btree_node_rewrites_lock); diff --git a/fs/bcachefs/btree_update_interior.h b/fs/bcachefs/btree_update_interior.h index b649c36c3fbb..ac04e45a8515 100644 --- a/fs/bcachefs/btree_update_interior.h +++ b/fs/bcachefs/btree_update_interior.h @@ -176,6 +176,9 @@ static inline int bch2_foreground_maybe_merge(struct btree_trans *trans, int bch2_btree_node_rewrite(struct btree_trans *, struct btree_iter *, struct btree *, unsigned, unsigned); +int bch2_btree_node_rewrite_key(struct btree_trans *, + enum btree_id, unsigned, + struct bkey_i *, unsigned); int bch2_btree_node_rewrite_pos(struct btree_trans *, enum btree_id, unsigned, struct bpos, unsigned, unsigned); From 7f8073cfb04a97842fe891ca50dad60afd1e3121 Mon Sep 17 00:00:00 2001 From: Heiko Carstens Date: Fri, 13 Jun 2025 17:53:04 +0200 Subject: [PATCH 071/289] s390/ptrace: Fix pointer dereferencing in regs_get_kernel_stack_nth() The recent change which added READ_ONCE_NOCHECK() to read the nth entry from the kernel stack incorrectly dropped dereferencing of the stack pointer in order to read the requested entry. In result the address of the entry is returned instead of its content. Dereference the pointer again to fix this. Reported-by: Will Deacon Closes: https://lore.kernel.org/r/20250612163331.GA13384@willie-the-truck Fixes: d93a855c31b7 ("s390/ptrace: Avoid KASAN false positives in regs_get_kernel_stack_nth()") Cc: stable@vger.kernel.org Reviewed-by: Alexander Gordeev Signed-off-by: Heiko Carstens Signed-off-by: Alexander Gordeev --- arch/s390/include/asm/ptrace.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/s390/include/asm/ptrace.h b/arch/s390/include/asm/ptrace.h index 62c0ab4a4b9d..0905fa99a31e 100644 --- a/arch/s390/include/asm/ptrace.h +++ b/arch/s390/include/asm/ptrace.h @@ -265,7 +265,7 @@ static __always_inline unsigned long regs_get_kernel_stack_nth(struct pt_regs *r addr = kernel_stack_pointer(regs) + n * sizeof(long); if (!regs_within_kernel_stack(regs, addr)) return 0; - return READ_ONCE_NOCHECK(addr); + return READ_ONCE_NOCHECK(*(unsigned long *)addr); } /** From 2c6b640ea08bff1a192bf87fa45246ff1e40767c Mon Sep 17 00:00:00 2001 From: Or Har-Toov Date: Mon, 16 Jun 2025 11:17:01 +0300 Subject: [PATCH 072/289] RDMA/mlx5: Fix unsafe xarray access in implicit ODP handling __xa_store() and __xa_erase() were used without holding the proper lock, which led to a lockdep warning due to unsafe RCU usage. This patch replaces them with xa_store() and xa_erase(), which perform the necessary locking internally. ============================= WARNING: suspicious RCPU usage 6.14.0-rc7_for_upstream_debug_2025_03_18_15_01 #1 Not tainted ----------------------------- ./include/linux/xarray.h:1211 suspicious rcu_dereference_protected() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 3 locks held by kworker/u136:0/219: at: process_one_work+0xbe4/0x15f0 process_one_work+0x75c/0x15f0 pagefault_mr+0x9a5/0x1390 [mlx5_ib] stack backtrace: CPU: 14 UID: 0 PID: 219 Comm: kworker/u136:0 Not tainted 6.14.0-rc7_for_upstream_debug_2025_03_18_15_01 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 Workqueue: mlx5_ib_page_fault mlx5_ib_eqe_pf_action [mlx5_ib] Call Trace: dump_stack_lvl+0xa8/0xc0 lockdep_rcu_suspicious+0x1e6/0x260 xas_create+0xb8a/0xee0 xas_store+0x73/0x14c0 __xa_store+0x13c/0x220 ? xa_store_range+0x390/0x390 ? spin_bug+0x1d0/0x1d0 pagefault_mr+0xcb5/0x1390 [mlx5_ib] ? _raw_spin_unlock+0x1f/0x30 mlx5_ib_eqe_pf_action+0x3be/0x2620 [mlx5_ib] ? lockdep_hardirqs_on_prepare+0x400/0x400 ? mlx5_ib_invalidate_range+0xcb0/0xcb0 [mlx5_ib] process_one_work+0x7db/0x15f0 ? pwq_dec_nr_in_flight+0xda0/0xda0 ? assign_work+0x168/0x240 worker_thread+0x57d/0xcd0 ? rescuer_thread+0xc40/0xc40 kthread+0x3b3/0x800 ? kthread_is_per_cpu+0xb0/0xb0 ? lock_downgrade+0x680/0x680 ? do_raw_spin_lock+0x12d/0x270 ? spin_bug+0x1d0/0x1d0 ? finish_task_switch.isra.0+0x284/0x9e0 ? lockdep_hardirqs_on_prepare+0x284/0x400 ? kthread_is_per_cpu+0xb0/0xb0 ret_from_fork+0x2d/0x70 ? kthread_is_per_cpu+0xb0/0xb0 ret_from_fork_asm+0x11/0x20 Fixes: d3d930411ce3 ("RDMA/mlx5: Fix implicit ODP use after free") Link: https://patch.msgid.link/r/a85ddd16f45c8cb2bc0a188c2b0fcedfce975eb8.1750061791.git.leon@kernel.org Signed-off-by: Or Har-Toov Reviewed-by: Patrisious Haddad Signed-off-by: Leon Romanovsky Signed-off-by: Jason Gunthorpe --- drivers/infiniband/hw/mlx5/odp.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/infiniband/hw/mlx5/odp.c b/drivers/infiniband/hw/mlx5/odp.c index eaa2f9f5f3a9..f6abd64f07f7 100644 --- a/drivers/infiniband/hw/mlx5/odp.c +++ b/drivers/infiniband/hw/mlx5/odp.c @@ -259,8 +259,8 @@ static void destroy_unused_implicit_child_mr(struct mlx5_ib_mr *mr) } if (MLX5_CAP_ODP(mr_to_mdev(mr)->mdev, mem_page_fault)) - __xa_erase(&mr_to_mdev(mr)->odp_mkeys, - mlx5_base_mkey(mr->mmkey.key)); + xa_erase(&mr_to_mdev(mr)->odp_mkeys, + mlx5_base_mkey(mr->mmkey.key)); xa_unlock(&imr->implicit_children); /* Freeing a MR is a sleeping operation, so bounce to a work queue */ @@ -532,8 +532,8 @@ static struct mlx5_ib_mr *implicit_get_child_mr(struct mlx5_ib_mr *imr, } if (MLX5_CAP_ODP(dev->mdev, mem_page_fault)) { - ret = __xa_store(&dev->odp_mkeys, mlx5_base_mkey(mr->mmkey.key), - &mr->mmkey, GFP_KERNEL); + ret = xa_store(&dev->odp_mkeys, mlx5_base_mkey(mr->mmkey.key), + &mr->mmkey, GFP_KERNEL); if (xa_is_err(ret)) { ret = ERR_PTR(xa_err(ret)); __xa_erase(&imr->implicit_children, idx); From 333e4d79316c9ed5877d7aac8b8ed22efc74e96d Mon Sep 17 00:00:00 2001 From: Maor Gottlieb Date: Mon, 16 Jun 2025 11:26:21 +0300 Subject: [PATCH 073/289] RDMA/core: Rate limit GID cache warning messages The GID cache warning messages can flood the kernel log when there are multiple failed attempts to add GIDs. This can happen when creating many virtual interfaces without having enough space for their GIDs in the GID table. Change pr_warn to pr_warn_ratelimited to prevent log flooding while still maintaining visibility of the issue. Link: https://patch.msgid.link/r/fd45ed4a1078e743f498b234c3ae816610ba1b18.1750062357.git.leon@kernel.org Signed-off-by: Maor Gottlieb Signed-off-by: Leon Romanovsky Signed-off-by: Jason Gunthorpe --- drivers/infiniband/core/cache.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c index 9979a351577f..81cf3c902e81 100644 --- a/drivers/infiniband/core/cache.c +++ b/drivers/infiniband/core/cache.c @@ -582,8 +582,8 @@ static int __ib_cache_gid_add(struct ib_device *ib_dev, u32 port, out_unlock: mutex_unlock(&table->lock); if (ret) - pr_warn("%s: unable to add gid %pI6 error=%d\n", - __func__, gid->raw, ret); + pr_warn_ratelimited("%s: unable to add gid %pI6 error=%d\n", + __func__, gid->raw, ret); return ret; } From 8edab8a72d67742f87e9dc2e2b0cdfddda5dc29a Mon Sep 17 00:00:00 2001 From: Mark Zhang Date: Tue, 17 Jun 2025 11:13:55 +0300 Subject: [PATCH 074/289] RDMA/mlx5: Initialize obj_event->obj_sub_list before xa_insert The obj_event may be loaded immediately after inserted, then if the list_head is not initialized then we may get a poisonous pointer. This fixes the crash below: mlx5_core 0000:03:00.0: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0 enhanced) mlx5_core.sf mlx5_core.sf.4: firmware version: 32.38.3056 mlx5_core 0000:03:00.0 en3f0pf0sf2002: renamed from eth0 mlx5_core.sf mlx5_core.sf.4: Rate limit: 127 rates are supported, range: 0Mbps to 195312Mbps IPv6: ADDRCONF(NETDEV_CHANGE): en3f0pf0sf2002: link becomes ready Unable to handle kernel NULL pointer dereference at virtual address 0000000000000060 Mem abort info: ESR = 0x96000006 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 Data abort info: ISV = 0, ISS = 0x00000006 CM = 0, WnR = 0 user pgtable: 4k pages, 48-bit VAs, pgdp=00000007760fb000 [0000000000000060] pgd=000000076f6d7003, p4d=000000076f6d7003, pud=0000000777841003, pmd=0000000000000000 Internal error: Oops: 96000006 [#1] SMP Modules linked in: ipmb_host(OE) act_mirred(E) cls_flower(E) sch_ingress(E) mptcp_diag(E) udp_diag(E) raw_diag(E) unix_diag(E) tcp_diag(E) inet_diag(E) binfmt_misc(E) bonding(OE) rdma_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) isofs(E) cdrom(E) mst_pciconf(OE) ib_umad(OE) mlx5_ib(OE) ipmb_dev_int(OE) mlx5_core(OE) kpatch_15237886(OEK) mlxdevm(OE) auxiliary(OE) ib_uverbs(OE) ib_core(OE) psample(E) mlxfw(OE) tls(E) sunrpc(E) vfat(E) fat(E) crct10dif_ce(E) ghash_ce(E) sha1_ce(E) sbsa_gwdt(E) virtio_console(E) ext4(E) mbcache(E) jbd2(E) xfs(E) libcrc32c(E) mmc_block(E) virtio_net(E) net_failover(E) failover(E) sha2_ce(E) sha256_arm64(E) nvme(OE) nvme_core(OE) gpio_mlxbf3(OE) mlx_compat(OE) mlxbf_pmc(OE) i2c_mlxbf(OE) sdhci_of_dwcmshc(OE) pinctrl_mlxbf3(OE) mlxbf_pka(OE) gpio_generic(E) i2c_core(E) mmc_core(E) mlxbf_gige(OE) vitesse(E) pwr_mlxbf(OE) mlxbf_tmfifo(OE) micrel(E) mlxbf_bootctl(OE) virtio_ring(E) virtio(E) ipmi_devintf(E) ipmi_msghandler(E) [last unloaded: mst_pci] CPU: 11 PID: 20913 Comm: rte-worker-11 Kdump: loaded Tainted: G OE K 5.10.134-13.1.an8.aarch64 #1 Hardware name: https://www.mellanox.com BlueField-3 SmartNIC Main Card/BlueField-3 SmartNIC Main Card, BIOS 4.2.2.12968 Oct 26 2023 pstate: a0400089 (NzCv daIf +PAN -UAO -TCO BTYPE=--) pc : dispatch_event_fd+0x68/0x300 [mlx5_ib] lr : devx_event_notifier+0xcc/0x228 [mlx5_ib] sp : ffff80001005bcf0 x29: ffff80001005bcf0 x28: 0000000000000001 x27: ffff244e0740a1d8 x26: ffff244e0740a1d0 x25: ffffda56beff5ae0 x24: ffffda56bf911618 x23: ffff244e0596a480 x22: ffff244e0596a480 x21: ffff244d8312ad90 x20: ffff244e0596a480 x19: fffffffffffffff0 x18: 0000000000000000 x17: 0000000000000000 x16: ffffda56be66d620 x15: 0000000000000000 x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000 x11: 0000000000000040 x10: ffffda56bfcafb50 x9 : ffffda5655c25f2c x8 : 0000000000000010 x7 : 0000000000000000 x6 : ffff24545a2e24b8 x5 : 0000000000000003 x4 : ffff80001005bd28 x3 : 0000000000000000 x2 : 0000000000000000 x1 : ffff244e0596a480 x0 : ffff244d8312ad90 Call trace: dispatch_event_fd+0x68/0x300 [mlx5_ib] devx_event_notifier+0xcc/0x228 [mlx5_ib] atomic_notifier_call_chain+0x58/0x80 mlx5_eq_async_int+0x148/0x2b0 [mlx5_core] atomic_notifier_call_chain+0x58/0x80 irq_int_handler+0x20/0x30 [mlx5_core] __handle_irq_event_percpu+0x60/0x220 handle_irq_event_percpu+0x3c/0x90 handle_irq_event+0x58/0x158 handle_fasteoi_irq+0xfc/0x188 generic_handle_irq+0x34/0x48 ... Fixes: 759738537142 ("IB/mlx5: Enable subscription for device events over DEVX") Link: https://patch.msgid.link/r/3ce7f20e0d1a03dc7de6e57494ec4b8eaf1f05c2.1750147949.git.leon@kernel.org Signed-off-by: Mark Zhang Signed-off-by: Leon Romanovsky Signed-off-by: Jason Gunthorpe --- drivers/infiniband/hw/mlx5/devx.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/infiniband/hw/mlx5/devx.c b/drivers/infiniband/hw/mlx5/devx.c index bceae1c1f980..843dcd312242 100644 --- a/drivers/infiniband/hw/mlx5/devx.c +++ b/drivers/infiniband/hw/mlx5/devx.c @@ -1958,6 +1958,7 @@ subscribe_event_xa_alloc(struct mlx5_devx_event_table *devx_event_table, /* Level1 is valid for future use, no need to free */ return -ENOMEM; + INIT_LIST_HEAD(&obj_event->obj_sub_list); err = xa_insert(&event->object_ids, key_level2, obj_event, @@ -1966,7 +1967,6 @@ subscribe_event_xa_alloc(struct mlx5_devx_event_table *devx_event_table, kfree(obj_event); return err; } - INIT_LIST_HEAD(&obj_event->obj_sub_list); } return 0; From bbc3a0b17a890aa19bddd0f9b08e8af488f1ec94 Mon Sep 17 00:00:00 2001 From: Kent Overstreet Date: Tue, 17 Jun 2025 13:08:35 -0400 Subject: [PATCH 075/289] bcachefs: fsck: Fix check_directory_structure when no check_dirents check_directory_structure runs after check_dirents, so it expects that it won't see any inodes with missing backpointers - normally. But online fsck can't run check_dirents yet, or the user might only be running a specific pass, so we need to be careful that this isn't an error. If an inode is unreachable, that's handled by a separate pass. Also, add a new 'bch2_inode_has_backpointer()' helper, since we were doing this inconsistently. Signed-off-by: Kent Overstreet --- fs/bcachefs/fsck.c | 18 +++++++++++++----- fs/bcachefs/inode.h | 5 +++++ fs/bcachefs/namei.c | 3 +-- 3 files changed, 19 insertions(+), 7 deletions(-) diff --git a/fs/bcachefs/fsck.c b/fs/bcachefs/fsck.c index 73cff24598ed..57ddc20a5cce 100644 --- a/fs/bcachefs/fsck.c +++ b/fs/bcachefs/fsck.c @@ -327,7 +327,8 @@ static inline bool inode_should_reattach(struct bch_inode_unpacked *inode) (inode->bi_flags & BCH_INODE_has_child_snapshot)) return false; - return !inode->bi_dir && !(inode->bi_flags & BCH_INODE_unlinked); + return !bch2_inode_has_backpointer(inode) && + !(inode->bi_flags & BCH_INODE_unlinked); } static int maybe_delete_dirent(struct btree_trans *trans, struct bpos d_pos, u32 snapshot) @@ -514,7 +515,7 @@ static struct bkey_s_c_dirent dirent_get_by_pos(struct btree_trans *trans, static int remove_backpointer(struct btree_trans *trans, struct bch_inode_unpacked *inode) { - if (!inode->bi_dir) + if (!bch2_inode_has_backpointer(inode)) return 0; u32 snapshot = inode->bi_snapshot; @@ -1165,13 +1166,14 @@ static int check_inode(struct btree_trans *trans, if (ret) goto err; - if (u.bi_dir || u.bi_dir_offset) { + if (bch2_inode_has_backpointer(&u)) { ret = check_inode_dirent_inode(trans, &u, &do_update); if (ret) goto err; } - if (fsck_err_on(u.bi_dir && (u.bi_flags & BCH_INODE_unlinked), + if (fsck_err_on(bch2_inode_has_backpointer(&u) && + (u.bi_flags & BCH_INODE_unlinked), trans, inode_unlinked_but_has_dirent, "inode unlinked but has dirent\n%s", (printbuf_reset(&buf), @@ -2751,7 +2753,13 @@ static int check_path_loop(struct btree_trans *trans, struct bkey_s_c inode_k) if (ret) return ret; - while (!inode.bi_subvol) { + /* + * If we're running full fsck, check_dirents() will have already ran, + * and we shouldn't see any missing backpointers here - otherwise that's + * handled separately, by check_unreachable_inodes + */ + while (!inode.bi_subvol && + bch2_inode_has_backpointer(&inode)) { struct btree_iter dirent_iter; struct bkey_s_c_dirent d; diff --git a/fs/bcachefs/inode.h b/fs/bcachefs/inode.h index 82cec2836cbd..b8ec3e628d90 100644 --- a/fs/bcachefs/inode.h +++ b/fs/bcachefs/inode.h @@ -254,6 +254,11 @@ static inline bool bch2_inode_casefold(struct bch_fs *c, const struct bch_inode_ : c->opts.casefold; } +static inline bool bch2_inode_has_backpointer(const struct bch_inode_unpacked *bi) +{ + return bi->bi_dir || bi->bi_dir_offset; +} + /* i_nlink: */ static inline unsigned nlink_bias(umode_t mode) diff --git a/fs/bcachefs/namei.c b/fs/bcachefs/namei.c index 7d1068aa998f..c3f87c59922d 100644 --- a/fs/bcachefs/namei.c +++ b/fs/bcachefs/namei.c @@ -734,8 +734,7 @@ static int bch2_check_dirent_inode_dirent(struct btree_trans *trans, if (inode_points_to_dirent(target, d)) return 0; - if (!target->bi_dir && - !target->bi_dir_offset) { + if (!bch2_inode_has_backpointer(target)) { fsck_err_on(S_ISDIR(target->bi_mode), trans, inode_dir_missing_backpointer, "directory with missing backpointer\n%s", From 3f890768dab1f97ff9bd7ebb76f4c52309401501 Mon Sep 17 00:00:00 2001 From: Kent Overstreet Date: Tue, 17 Jun 2025 16:41:43 -0400 Subject: [PATCH 076/289] bcachefs: fsck: fix unhandled restart in topology repair Signed-off-by: Kent Overstreet --- fs/bcachefs/btree_gc.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/fs/bcachefs/btree_gc.c b/fs/bcachefs/btree_gc.c index e92cf3928c63..697c6ecc3a65 100644 --- a/fs/bcachefs/btree_gc.c +++ b/fs/bcachefs/btree_gc.c @@ -503,8 +503,14 @@ again: prt_newline(&buf); bch2_bkey_val_to_text(&buf, c, bkey_i_to_s_c(&b->key)); + /* + * XXX: we're not passing the trans object here because we're not set up + * to handle a transaction restart - this code needs to be rewritten + * when we start doing online topology repair + */ + bch2_trans_unlock_long(trans); if (mustfix_fsck_err_on(!have_child, - trans, btree_node_topology_interior_node_empty, + c, btree_node_topology_interior_node_empty, "empty interior btree node at %s", buf.buf)) ret = DROP_THIS_NODE; err: From 1df310860aa5b4d97b3cc83a1a8dee599071b72d Mon Sep 17 00:00:00 2001 From: Kent Overstreet Date: Tue, 17 Jun 2025 16:49:40 -0400 Subject: [PATCH 077/289] bcachefs: fsck: Fix oops in key_visible_in_snapshot() The normal fsck code doesn't call key_visible_in_snapshot() with an empty list of snapshot IDs seen (the current snapshot ID will always be on the list), but str_hash_repair_key() -> bch2_get_snapshot_overwrites() can, and that's totally fine as long as we check for it. Signed-off-by: Kent Overstreet --- fs/bcachefs/fsck.c | 13 ++----------- 1 file changed, 2 insertions(+), 11 deletions(-) diff --git a/fs/bcachefs/fsck.c b/fs/bcachefs/fsck.c index 57ddc20a5cce..9920f1affc5b 100644 --- a/fs/bcachefs/fsck.c +++ b/fs/bcachefs/fsck.c @@ -728,14 +728,8 @@ static int snapshots_seen_update(struct bch_fs *c, struct snapshots_seen *s, static bool key_visible_in_snapshot(struct bch_fs *c, struct snapshots_seen *seen, u32 id, u32 ancestor) { - ssize_t i; - EBUG_ON(id > ancestor); - /* @ancestor should be the snapshot most recently added to @seen */ - EBUG_ON(ancestor != seen->pos.snapshot); - EBUG_ON(ancestor != darray_last(seen->ids)); - if (id == ancestor) return true; @@ -751,11 +745,8 @@ static bool key_visible_in_snapshot(struct bch_fs *c, struct snapshots_seen *see * numerically, since snapshot ID lists are kept sorted, so if we find * an id that's an ancestor of @id we're done: */ - - for (i = seen->ids.nr - 2; - i >= 0 && seen->ids.data[i] >= id; - --i) - if (bch2_snapshot_is_ancestor(c, id, seen->ids.data[i])) + darray_for_each_reverse(seen->ids, i) + if (*i != ancestor && bch2_snapshot_is_ancestor(c, id, *i)) return false; return true; From 88bd771191f7f20d6295700d1746f576419e3d1f Mon Sep 17 00:00:00 2001 From: Kent Overstreet Date: Tue, 17 Jun 2025 20:24:12 -0400 Subject: [PATCH 078/289] bcachefs: fix spurious error in read_btree_roots() Signed-off-by: Kent Overstreet --- fs/bcachefs/recovery.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/bcachefs/recovery.c b/fs/bcachefs/recovery.c index fa5d1ef5bea6..e0d824471bff 100644 --- a/fs/bcachefs/recovery.c +++ b/fs/bcachefs/recovery.c @@ -607,6 +607,7 @@ static int read_btree_roots(struct bch_fs *c) buf.buf, bch2_err_str(ret))) { if (btree_id_is_alloc(i)) r->error = 0; + ret = 0; } } From 434635987fb7d544dd134f25c922413e19b02112 Mon Sep 17 00:00:00 2001 From: Kent Overstreet Date: Tue, 17 Jun 2025 20:22:32 -0400 Subject: [PATCH 079/289] bcachefs: Fix missing newlines before ero Signed-off-by: Kent Overstreet --- fs/bcachefs/data_update.c | 1 + fs/bcachefs/journal_io.c | 5 +++-- fs/bcachefs/recovery.c | 2 +- 3 files changed, 5 insertions(+), 3 deletions(-) diff --git a/fs/bcachefs/data_update.c b/fs/bcachefs/data_update.c index 5f1174348974..e848e210a9bf 100644 --- a/fs/bcachefs/data_update.c +++ b/fs/bcachefs/data_update.c @@ -249,6 +249,7 @@ static int data_update_invalid_bkey(struct data_update *m, bch2_bkey_val_to_text(&buf, c, k); prt_str(&buf, "\nnew: "); bch2_bkey_val_to_text(&buf, c, bkey_i_to_s_c(insert)); + prt_newline(&buf); bch2_fs_emergency_read_only2(c, &buf); diff --git a/fs/bcachefs/journal_io.c b/fs/bcachefs/journal_io.c index afbf12e8f0c5..dd3f3434c1b0 100644 --- a/fs/bcachefs/journal_io.c +++ b/fs/bcachefs/journal_io.c @@ -1716,9 +1716,10 @@ static CLOSURE_CALLBACK(journal_write_done) bch2_log_msg_start(c, &buf); if (err == -BCH_ERR_journal_write_err) - prt_printf(&buf, "unable to write journal to sufficient devices"); + prt_printf(&buf, "unable to write journal to sufficient devices\n"); else - prt_printf(&buf, "journal write error marking replicas: %s", bch2_err_str(err)); + prt_printf(&buf, "journal write error marking replicas: %s\n", + bch2_err_str(err)); bch2_fs_emergency_read_only2(c, &buf); diff --git a/fs/bcachefs/recovery.c b/fs/bcachefs/recovery.c index e0d824471bff..d0b7e3a36a54 100644 --- a/fs/bcachefs/recovery.c +++ b/fs/bcachefs/recovery.c @@ -1142,7 +1142,7 @@ fsck_err: struct printbuf buf = PRINTBUF; bch2_log_msg_start(c, &buf); - prt_printf(&buf, "error in recovery: %s", bch2_err_str(ret)); + prt_printf(&buf, "error in recovery: %s\n", bch2_err_str(ret)); bch2_fs_emergency_read_only2(c, &buf); bch2_print_str(c, KERN_ERR, buf.buf); From d9d79e4f7dc935fea96dbf3de524404c08d08b03 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Tue, 20 May 2025 17:40:43 +0200 Subject: [PATCH 080/289] mfd: Fix building without CONFIG_OF Using the of_fwnode_handle() means that local 'node' variables are unused whenever CONFIG_OF is disabled for compile testing: drivers/mfd/88pm860x-core.c: In function 'device_irq_init': drivers/mfd/88pm860x-core.c:576:29: error: unused variable 'node' [-Werror=unused-variable] 576 | struct device_node *node = i2c->dev.of_node; | ^~~~ drivers/mfd/max8925-core.c: In function 'max8925_irq_init': drivers/mfd/max8925-core.c:659:29: error: unused variable 'node' [-Werror=unused-variable] 659 | struct device_node *node = chip->dev->of_node; | ^~~~ drivers/mfd/twl4030-irq.c: In function 'twl4030_init_irq': drivers/mfd/twl4030-irq.c:679:46: error: unused variable 'node' [-Werror=unused-variable] 679 | struct device_node *node = dev->of_node; | ^~~~ Replace these with the corresponding dev_fwnode() lookups that keep the code simpler in addition to avoiding the warnings. Fixes: e3d44f11da04 ("mfd: Switch to irq_domain_create_*()") Signed-off-by: Arnd Bergmann Reviewed-by: Jiri Slaby Link: https://lore.kernel.org/r/20250520154106.2019525-1-arnd@kernel.org Signed-off-by: Lee Jones --- drivers/mfd/88pm860x-core.c | 3 +-- drivers/mfd/max8925-core.c | 6 +++--- drivers/mfd/twl4030-irq.c | 3 +-- 3 files changed, 5 insertions(+), 7 deletions(-) diff --git a/drivers/mfd/88pm860x-core.c b/drivers/mfd/88pm860x-core.c index 488e346047c1..77230fbe07be 100644 --- a/drivers/mfd/88pm860x-core.c +++ b/drivers/mfd/88pm860x-core.c @@ -573,7 +573,6 @@ static int device_irq_init(struct pm860x_chip *chip, unsigned long flags = IRQF_TRIGGER_FALLING | IRQF_ONESHOT; int data, mask, ret = -EINVAL; int nr_irqs, irq_base = -1; - struct device_node *node = i2c->dev.of_node; mask = PM8607_B0_MISC1_INV_INT | PM8607_B0_MISC1_INT_CLEAR | PM8607_B0_MISC1_INT_MASK; @@ -624,7 +623,7 @@ static int device_irq_init(struct pm860x_chip *chip, ret = -EBUSY; goto out; } - irq_domain_create_legacy(of_fwnode_handle(node), nr_irqs, chip->irq_base, 0, + irq_domain_create_legacy(dev_fwnode(&i2c->dev), nr_irqs, chip->irq_base, 0, &pm860x_irq_domain_ops, chip); chip->core_irq = i2c->irq; if (!chip->core_irq) diff --git a/drivers/mfd/max8925-core.c b/drivers/mfd/max8925-core.c index 78b16c67a5fc..25377dcce60e 100644 --- a/drivers/mfd/max8925-core.c +++ b/drivers/mfd/max8925-core.c @@ -656,7 +656,6 @@ static int max8925_irq_init(struct max8925_chip *chip, int irq, { unsigned long flags = IRQF_TRIGGER_FALLING | IRQF_ONESHOT; int ret; - struct device_node *node = chip->dev->of_node; /* clear all interrupts */ max8925_reg_read(chip->i2c, MAX8925_CHG_IRQ1); @@ -682,8 +681,9 @@ static int max8925_irq_init(struct max8925_chip *chip, int irq, return -EBUSY; } - irq_domain_create_legacy(of_fwnode_handle(node), MAX8925_NR_IRQS, chip->irq_base, 0, - &max8925_irq_domain_ops, chip); + irq_domain_create_legacy(dev_fwnode(chip->dev), MAX8925_NR_IRQS, + chip->irq_base, 0, &max8925_irq_domain_ops, + chip); /* request irq handler for pmic main irq*/ chip->core_irq = irq; diff --git a/drivers/mfd/twl4030-irq.c b/drivers/mfd/twl4030-irq.c index 232c2bfe8c18..d3ab40651307 100644 --- a/drivers/mfd/twl4030-irq.c +++ b/drivers/mfd/twl4030-irq.c @@ -676,7 +676,6 @@ int twl4030_init_irq(struct device *dev, int irq_num) static struct irq_chip twl4030_irq_chip; int status, i; int irq_base, irq_end, nr_irqs; - struct device_node *node = dev->of_node; /* * TWL core and pwr interrupts must be contiguous because @@ -691,7 +690,7 @@ int twl4030_init_irq(struct device *dev, int irq_num) return irq_base; } - irq_domain_create_legacy(of_fwnode_handle(node), nr_irqs, irq_base, 0, + irq_domain_create_legacy(dev_fwnode(dev), nr_irqs, irq_base, 0, &irq_domain_simple_ops, NULL); irq_end = irq_base + TWL4030_CORE_NR_IRQS; From f5de469990f19569627ea0dd56536ff5a13beaa3 Mon Sep 17 00:00:00 2001 From: "Masami Hiramatsu (Google)" Date: Thu, 12 Jun 2025 20:26:10 +0900 Subject: [PATCH 081/289] mtk-sd: Prevent memory corruption from DMA map failure If msdc_prepare_data() fails to map the DMA region, the request is not prepared for data receiving, but msdc_start_data() proceeds the DMA with previous setting. Since this will lead a memory corruption, we have to stop the request operation soon after the msdc_prepare_data() fails to prepare it. Signed-off-by: Masami Hiramatsu (Google) Fixes: 208489032bdd ("mmc: mediatek: Add Mediatek MMC driver") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/174972756982.3337526.6755001617701603082.stgit@mhiramat.tok.corp.google.com Signed-off-by: Ulf Hansson --- drivers/mmc/host/mtk-sd.c | 17 ++++++++++++++++- 1 file changed, 16 insertions(+), 1 deletion(-) diff --git a/drivers/mmc/host/mtk-sd.c b/drivers/mmc/host/mtk-sd.c index b1d1586cf1fc..b12cfb9a5e5f 100644 --- a/drivers/mmc/host/mtk-sd.c +++ b/drivers/mmc/host/mtk-sd.c @@ -853,6 +853,11 @@ static void msdc_prepare_data(struct msdc_host *host, struct mmc_data *data) } } +static bool msdc_data_prepared(struct mmc_data *data) +{ + return data->host_cookie & MSDC_PREPARE_FLAG; +} + static void msdc_unprepare_data(struct msdc_host *host, struct mmc_data *data) { if (data->host_cookie & MSDC_ASYNC_FLAG) @@ -1484,8 +1489,18 @@ static void msdc_ops_request(struct mmc_host *mmc, struct mmc_request *mrq) WARN_ON(!host->hsq_en && host->mrq); host->mrq = mrq; - if (mrq->data) + if (mrq->data) { msdc_prepare_data(host, mrq->data); + if (!msdc_data_prepared(mrq->data)) { + /* + * Failed to prepare DMA area, fail fast before + * starting any commands. + */ + mrq->cmd->error = -ENOSPC; + mmc_request_done(mmc_from_priv(host), mrq); + return; + } + } /* if SBC is required, we have HW option and SW option. * if HW option is enabled, and SBC does not have "special" flags, From ff78538e07fa284ce08cbbcb0730daa91ed16722 Mon Sep 17 00:00:00 2001 From: Nicolas Pitre Date: Tue, 10 Jun 2025 21:41:44 -0400 Subject: [PATCH 082/289] vt: add missing notification when switching back to text mode Programs using poll() on /dev/vcsa to be notified when VT changes occur were missing one case: the switch from gfx to text mode. Signed-off-by: Nicolas Pitre Link: https://lore.kernel.org/r/9o5ro928-0pp4-05rq-70p4-ro385n21n723@onlyvoer.pbz Signed-off-by: Greg Kroah-Hartman --- drivers/tty/vt/vt.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c index ed39d9cb4432..62049ceb34de 100644 --- a/drivers/tty/vt/vt.c +++ b/drivers/tty/vt/vt.c @@ -4650,6 +4650,7 @@ void do_unblank_screen(int leaving_gfx) set_palette(vc); set_cursor(vc); vt_event_post(VT_EVENT_UNBLANK, vc->vc_num, vc->vc_num); + notify_update(vc); } EXPORT_SYMBOL(do_unblank_screen); From 747b52413effe958ed57cf6d7bef80c34e1185f3 Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Tue, 10 Jun 2025 19:02:29 -0700 Subject: [PATCH 083/289] vt: fix kernel-doc warnings in ucs_get_fallback() Use the correct function parameter name in ucs_get_fallback() to prevent kernel-doc warnings: Warning: drivers/tty/vt/ucs.c:218 function parameter 'cp' not described in 'ucs_get_fallback' Warning: drivers/tty/vt/ucs.c:218 Excess function parameter 'base' description in 'ucs_get_fallback' Fixes: fe26933cf1e1 ("vt: add ucs_get_fallback()") Signed-off-by: Randy Dunlap Cc: Nicolas Pitre Cc: Greg Kroah-Hartman Cc: Jiri Slaby Cc: linux-serial@vger.kernel.org Reviewed-by: Nicolas Pitre . Link: https://lore.kernel.org/r/20250611020229.2650595-1-rdunlap@infradead.org Signed-off-by: Greg Kroah-Hartman --- drivers/tty/vt/ucs.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/tty/vt/ucs.c b/drivers/tty/vt/ucs.c index 6ead622b7713..03877485dfb7 100644 --- a/drivers/tty/vt/ucs.c +++ b/drivers/tty/vt/ucs.c @@ -206,7 +206,7 @@ static int ucs_page_entry_cmp(const void *key, const void *element) /** * ucs_get_fallback() - Get a substitution for the provided Unicode character - * @base: Base Unicode code point (UCS-4) + * @cp: Unicode code point (UCS-4) * * Get a simpler fallback character for the provided Unicode character. * This is used for terminal display when corresponding glyph is unavailable. From d36f0e9a0002f04f4d6dd9be908d58fe5bd3a279 Mon Sep 17 00:00:00 2001 From: Aidan Stewart Date: Tue, 17 Jun 2025 10:48:19 -0600 Subject: [PATCH 084/289] serial: core: restore of_node information in sysfs Since in v6.8-rc1, the of_node symlink under tty devices is missing. This breaks any udev rules relying on this information. Link the of_node information in the serial controller device with the parent defined in the device tree. This will also apply to the serial device which takes the serial controller as a parent device. Fixes: b286f4e87e32 ("serial: core: Move tty and serdev to be children of serial core port device") Cc: stable@vger.kernel.org Signed-off-by: Aidan Stewart Link: https://lore.kernel.org/r/20250617164819.13912-1-astewart@tektelic.com Signed-off-by: Greg Kroah-Hartman --- drivers/tty/serial/serial_base_bus.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/tty/serial/serial_base_bus.c b/drivers/tty/serial/serial_base_bus.c index 5d1677f1b651..cb3b127b06b6 100644 --- a/drivers/tty/serial/serial_base_bus.c +++ b/drivers/tty/serial/serial_base_bus.c @@ -72,6 +72,7 @@ static int serial_base_device_init(struct uart_port *port, dev->parent = parent_dev; dev->bus = &serial_base_bus_type; dev->release = release; + device_set_of_node_from_dev(dev, parent_dev); if (!serial_base_initialized) { dev_dbg(port->dev, "uart_add_one_port() called before arch_initcall()?\n"); From a55bc4ffc06d8c965a7d6f0a01ed0ed41380df28 Mon Sep 17 00:00:00 2001 From: Nathan Chancellor Date: Mon, 9 Jun 2025 14:13:14 -0700 Subject: [PATCH 085/289] staging: rtl8723bs: Avoid memset() in aes_cipher() and aes_decipher() After commit 6f110a5e4f99 ("Disable SLUB_TINY for build testing"), which causes CONFIG_KASAN to be enabled in allmodconfig again, arm64 allmodconfig builds with older versions of clang (15 through 17) show an instance of -Wframe-larger-than (which breaks the build with CONFIG_WERROR=y): drivers/staging/rtl8723bs/core/rtw_security.c:1287:5: error: stack frame size (2208) exceeds limit (2048) in 'rtw_aes_decrypt' [-Werror,-Wframe-larger-than] 1287 | u32 rtw_aes_decrypt(struct adapter *padapter, u8 *precvframe) | ^ This comes from aes_decipher() being inlined in rtw_aes_decrypt(). Running the same build with CONFIG_FRAME_WARN=128 shows aes_cipher() also uses a decent amount of stack, just under the limit of 2048: drivers/staging/rtl8723bs/core/rtw_security.c:864:19: warning: stack frame size (1952) exceeds limit (128) in 'aes_cipher' [-Wframe-larger-than] 864 | static signed int aes_cipher(u8 *key, uint hdrlen, | ^ -Rpass-analysis=stack-frame-layout only shows one large structure on the stack, which is the ctx variable inlined from aes128k128d(). A good number of the other variables come from the additional checks of fortified string routines, which are present in memset(), which both aes_cipher() and aes_decipher() use to initialize some temporary buffers. In this case, since the size is known at compile time, these additional checks should not result in any code generation changes but allmodconfig has several sanitizers enabled, which may make it harder for the compiler to eliminate the compile time checks and the variables that come about from them. The memset() calls are just initializing these buffers to zero, so use '= {}' instead, which is used all over the kernel and does the exact same thing as memset() without the fortify checks, which drops the stack usage of these functions by a few hundred kilobytes. drivers/staging/rtl8723bs/core/rtw_security.c:864:19: warning: stack frame size (1584) exceeds limit (128) in 'aes_cipher' [-Wframe-larger-than] 864 | static signed int aes_cipher(u8 *key, uint hdrlen, | ^ drivers/staging/rtl8723bs/core/rtw_security.c:1271:5: warning: stack frame size (1456) exceeds limit (128) in 'rtw_aes_decrypt' [-Wframe-larger-than] 1271 | u32 rtw_aes_decrypt(struct adapter *padapter, u8 *precvframe) | ^ Cc: stable@vger.kernel.org Fixes: 554c0a3abf21 ("staging: Add rtl8723bs sdio wifi driver") Signed-off-by: Nathan Chancellor Reviewed-by: Dan Carpenter Link: https://lore.kernel.org/r/20250609-rtl8723bs-fix-clang-arm64-wflt-v1-1-e2accba43def@kernel.org Signed-off-by: Greg Kroah-Hartman --- drivers/staging/rtl8723bs/core/rtw_security.c | 44 ++++++------------- 1 file changed, 14 insertions(+), 30 deletions(-) diff --git a/drivers/staging/rtl8723bs/core/rtw_security.c b/drivers/staging/rtl8723bs/core/rtw_security.c index 1e9eff01b1aa..e9f382c280d9 100644 --- a/drivers/staging/rtl8723bs/core/rtw_security.c +++ b/drivers/staging/rtl8723bs/core/rtw_security.c @@ -868,29 +868,21 @@ static signed int aes_cipher(u8 *key, uint hdrlen, num_blocks, payload_index; u8 pn_vector[6]; - u8 mic_iv[16]; - u8 mic_header1[16]; - u8 mic_header2[16]; - u8 ctr_preload[16]; + u8 mic_iv[16] = {}; + u8 mic_header1[16] = {}; + u8 mic_header2[16] = {}; + u8 ctr_preload[16] = {}; /* Intermediate Buffers */ - u8 chain_buffer[16]; - u8 aes_out[16]; - u8 padded_buffer[16]; + u8 chain_buffer[16] = {}; + u8 aes_out[16] = {}; + u8 padded_buffer[16] = {}; u8 mic[8]; uint frtype = GetFrameType(pframe); uint frsubtype = GetFrameSubType(pframe); frsubtype = frsubtype>>4; - memset((void *)mic_iv, 0, 16); - memset((void *)mic_header1, 0, 16); - memset((void *)mic_header2, 0, 16); - memset((void *)ctr_preload, 0, 16); - memset((void *)chain_buffer, 0, 16); - memset((void *)aes_out, 0, 16); - memset((void *)padded_buffer, 0, 16); - if ((hdrlen == WLAN_HDR_A3_LEN) || (hdrlen == WLAN_HDR_A3_QOS_LEN)) a4_exists = 0; else @@ -1080,15 +1072,15 @@ static signed int aes_decipher(u8 *key, uint hdrlen, num_blocks, payload_index; signed int res = _SUCCESS; u8 pn_vector[6]; - u8 mic_iv[16]; - u8 mic_header1[16]; - u8 mic_header2[16]; - u8 ctr_preload[16]; + u8 mic_iv[16] = {}; + u8 mic_header1[16] = {}; + u8 mic_header2[16] = {}; + u8 ctr_preload[16] = {}; /* Intermediate Buffers */ - u8 chain_buffer[16]; - u8 aes_out[16]; - u8 padded_buffer[16]; + u8 chain_buffer[16] = {}; + u8 aes_out[16] = {}; + u8 padded_buffer[16] = {}; u8 mic[8]; uint frtype = GetFrameType(pframe); @@ -1096,14 +1088,6 @@ static signed int aes_decipher(u8 *key, uint hdrlen, frsubtype = frsubtype>>4; - memset((void *)mic_iv, 0, 16); - memset((void *)mic_header1, 0, 16); - memset((void *)mic_header2, 0, 16); - memset((void *)ctr_preload, 0, 16); - memset((void *)chain_buffer, 0, 16); - memset((void *)aes_out, 0, 16); - memset((void *)padded_buffer, 0, 16); - /* start to decrypt the payload */ num_blocks = (plen-8) / 16; /* plen including LLC, payload_length and mic) */ From b2348fe6c81cc92c7d0bd8a7d9241f8de0fa72b7 Mon Sep 17 00:00:00 2001 From: Kent Overstreet Date: Thu, 19 Jun 2025 12:25:41 -0400 Subject: [PATCH 086/289] bcachefs: Fix *__bch2_trans_subbuf_alloc() error path Don't change buf->size on error - this would usually be a transaction restart, but it could also be -ENOMEM - when we've exceeded the bump allocator max). Fixes: 247abee6ae6d ("bcachefs: btree_trans_subbuf") Signed-off-by: Kent Overstreet --- fs/bcachefs/btree_update.c | 16 +++++++++++----- 1 file changed, 11 insertions(+), 5 deletions(-) diff --git a/fs/bcachefs/btree_update.c b/fs/bcachefs/btree_update.c index e97e78c10f49..ee657b9f4b96 100644 --- a/fs/bcachefs/btree_update.c +++ b/fs/bcachefs/btree_update.c @@ -549,20 +549,26 @@ void *__bch2_trans_subbuf_alloc(struct btree_trans *trans, unsigned u64s) { unsigned new_top = buf->u64s + u64s; - unsigned old_size = buf->size; + unsigned new_size = buf->size; - if (new_top > buf->size) - buf->size = roundup_pow_of_two(new_top); + BUG_ON(roundup_pow_of_two(new_top) > U16_MAX); - void *n = bch2_trans_kmalloc_nomemzero(trans, buf->size * sizeof(u64)); + if (new_top > new_size) + new_size = roundup_pow_of_two(new_top); + + void *n = bch2_trans_kmalloc_nomemzero(trans, new_size * sizeof(u64)); if (IS_ERR(n)) return n; + unsigned offset = (u64 *) n - (u64 *) trans->mem; + BUG_ON(offset > U16_MAX); + if (buf->u64s) memcpy(n, btree_trans_subbuf_base(trans, buf), - old_size * sizeof(u64)); + buf->size * sizeof(u64)); buf->base = (u64 *) n - (u64 *) trans->mem; + buf->size = new_size; void *p = btree_trans_subbuf_top(trans, buf); buf->u64s = new_top; From 32a01cd4334175a0ae42b3c3d7f37fd26468ecc3 Mon Sep 17 00:00:00 2001 From: Kent Overstreet Date: Thu, 19 Jun 2025 12:50:06 -0400 Subject: [PATCH 087/289] bcachefs: Don't log fsck err in the journal if doing repair elsewhere This fixes exceeding the bump allocator limit when the allocator finds many buckets that need repair - they're repaired asynchronously, which means that every error logged a message in the bump allocator, without committing. Signed-off-by: Kent Overstreet --- fs/bcachefs/alloc_background.c | 13 +++++++++---- fs/bcachefs/btree_iter.c | 4 ++++ fs/bcachefs/error.c | 4 +++- fs/bcachefs/sb-errors_format.h | 7 ++++--- 4 files changed, 20 insertions(+), 8 deletions(-) diff --git a/fs/bcachefs/alloc_background.c b/fs/bcachefs/alloc_background.c index b228a5a64479..66de46318620 100644 --- a/fs/bcachefs/alloc_background.c +++ b/fs/bcachefs/alloc_background.c @@ -1406,6 +1406,9 @@ int bch2_check_discard_freespace_key(struct btree_trans *trans, struct btree_ite : BCH_DATA_free; struct printbuf buf = PRINTBUF; + unsigned fsck_flags = (async_repair ? FSCK_ERR_NO_LOG : 0)| + FSCK_CAN_FIX|FSCK_CAN_IGNORE; + struct bpos bucket = iter->pos; bucket.offset &= ~(~0ULL << 56); u64 genbits = iter->pos.offset & (~0ULL << 56); @@ -1419,9 +1422,10 @@ int bch2_check_discard_freespace_key(struct btree_trans *trans, struct btree_ite return ret; if (!bch2_dev_bucket_exists(c, bucket)) { - if (fsck_err(trans, need_discard_freespace_key_to_invalid_dev_bucket, - "entry in %s btree for nonexistant dev:bucket %llu:%llu", - bch2_btree_id_str(iter->btree_id), bucket.inode, bucket.offset)) + if (__fsck_err(trans, fsck_flags, + need_discard_freespace_key_to_invalid_dev_bucket, + "entry in %s btree for nonexistant dev:bucket %llu:%llu", + bch2_btree_id_str(iter->btree_id), bucket.inode, bucket.offset)) goto delete; ret = 1; goto out; @@ -1433,7 +1437,8 @@ int bch2_check_discard_freespace_key(struct btree_trans *trans, struct btree_ite if (a->data_type != state || (state == BCH_DATA_free && genbits != alloc_freespace_genbits(*a))) { - if (fsck_err(trans, need_discard_freespace_key_bad, + if (__fsck_err(trans, fsck_flags, + need_discard_freespace_key_bad, "%s\nincorrectly set at %s:%llu:%llu:0 (free %u, genbits %llu should be %llu)", (bch2_bkey_val_to_text(&buf, c, alloc_k), buf.buf), bch2_btree_id_str(iter->btree_id), diff --git a/fs/bcachefs/btree_iter.c b/fs/bcachefs/btree_iter.c index 061603999ae5..352f9cd2634f 100644 --- a/fs/bcachefs/btree_iter.c +++ b/fs/bcachefs/btree_iter.c @@ -3194,6 +3194,10 @@ void *__bch2_trans_kmalloc(struct btree_trans *trans, size_t size, unsigned long if (WARN_ON_ONCE(new_bytes > BTREE_TRANS_MEM_MAX)) { #ifdef CONFIG_BCACHEFS_TRANS_KMALLOC_TRACE struct printbuf buf = PRINTBUF; + bch2_log_msg_start(c, &buf); + prt_printf(&buf, "bump allocator exceeded BTREE_TRANS_MEM_MAX (%u)\n", + BTREE_TRANS_MEM_MAX); + bch2_trans_kmalloc_trace_to_text(&buf, &trans->trans_kmalloc_trace); bch2_print_str(c, KERN_ERR, buf.buf); printbuf_exit(&buf); diff --git a/fs/bcachefs/error.c b/fs/bcachefs/error.c index a8ec6aae5738..b2a6c041e165 100644 --- a/fs/bcachefs/error.c +++ b/fs/bcachefs/error.c @@ -621,7 +621,9 @@ print: if (s) s->ret = ret; - if (trans) + if (trans && + !(flags & FSCK_ERR_NO_LOG) && + ret == -BCH_ERR_fsck_fix) ret = bch2_trans_log_str(trans, bch2_sb_error_strs[err]) ?: ret; err_unlock: mutex_unlock(&c->fsck_error_msgs_lock); diff --git a/fs/bcachefs/sb-errors_format.h b/fs/bcachefs/sb-errors_format.h index f1aa40542a0e..041887aad63a 100644 --- a/fs/bcachefs/sb-errors_format.h +++ b/fs/bcachefs/sb-errors_format.h @@ -3,9 +3,10 @@ #define _BCACHEFS_SB_ERRORS_FORMAT_H enum bch_fsck_flags { - FSCK_CAN_FIX = 1 << 0, - FSCK_CAN_IGNORE = 1 << 1, - FSCK_AUTOFIX = 1 << 2, + FSCK_CAN_FIX = BIT(0), + FSCK_CAN_IGNORE = BIT(1), + FSCK_AUTOFIX = BIT(2), + FSCK_ERR_NO_LOG = BIT(3), }; #define BCH_SB_ERRS() \ From b2e2bed119809a5ca384241e0631f04c6142ae08 Mon Sep 17 00:00:00 2001 From: Kent Overstreet Date: Thu, 19 Jun 2025 14:32:57 -0400 Subject: [PATCH 088/289] bcachefs: Add missing key type checks to check_snapshot_exists() For now we only have one key type in these btrees, but forward compatibility means we do have to check. Reported-by: syzbot+b4cb4a6988aced0cec4b@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet --- fs/bcachefs/snapshot.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/fs/bcachefs/snapshot.c b/fs/bcachefs/snapshot.c index 38aeaa128d27..4c43d2a2c1f5 100644 --- a/fs/bcachefs/snapshot.c +++ b/fs/bcachefs/snapshot.c @@ -871,7 +871,8 @@ static int check_snapshot_exists(struct btree_trans *trans, u32 id) for_each_btree_key_norestart(trans, iter, BTREE_ID_snapshot_trees, POS_MIN, 0, k, ret) { - if (le32_to_cpu(bkey_s_c_to_snapshot_tree(k).v->root_snapshot) == id) { + if (k.k->type == KEY_TYPE_snapshot_tree && + le32_to_cpu(bkey_s_c_to_snapshot_tree(k).v->root_snapshot) == id) { tree_id = k.k->p.offset; break; } @@ -899,7 +900,8 @@ static int check_snapshot_exists(struct btree_trans *trans, u32 id) for_each_btree_key_norestart(trans, iter, BTREE_ID_subvolumes, POS_MIN, 0, k, ret) { - if (le32_to_cpu(bkey_s_c_to_subvolume(k).v->snapshot) == id) { + if (k.k->type == KEY_TYPE_subvolume && + le32_to_cpu(bkey_s_c_to_subvolume(k).v->snapshot) == id) { snapshot->v.subvol = cpu_to_le32(k.k->p.offset); SET_BCH_SNAPSHOT_SUBVOL(&snapshot->v, true); break; From 4540e41e753a7d69ecd3f5bad51fe620205c3a18 Mon Sep 17 00:00:00 2001 From: Qasim Ijaz Date: Sun, 15 Jun 2025 23:59:41 +0100 Subject: [PATCH 089/289] HID: appletb-kbd: fix "appletb_backlight" backlight device reference counting During appletb_kbd_probe, probe attempts to get the backlight device by name. When this happens backlight_device_get_by_name looks for a device in the backlight class which has name "appletb_backlight" and upon finding a match it increments the reference count for the device and returns it to the caller. However this reference is never released leading to a reference leak. Fix this by decrementing the backlight device reference count on removal via put_device and on probe failure. Fixes: 93a0fc489481 ("HID: hid-appletb-kbd: add support for automatic brightness control while using the touchbar") Cc: stable@vger.kernel.org Signed-off-by: Qasim Ijaz Reviewed-by: Aditya Garg Signed-off-by: Jiri Kosina --- drivers/hid/hid-appletb-kbd.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/drivers/hid/hid-appletb-kbd.c b/drivers/hid/hid-appletb-kbd.c index ef51b2c06872..e06567886e50 100644 --- a/drivers/hid/hid-appletb-kbd.c +++ b/drivers/hid/hid-appletb-kbd.c @@ -438,6 +438,8 @@ static int appletb_kbd_probe(struct hid_device *hdev, const struct hid_device_id return 0; close_hw: + if (kbd->backlight_dev) + put_device(&kbd->backlight_dev->dev); hid_hw_close(hdev); stop_hw: hid_hw_stop(hdev); @@ -453,6 +455,9 @@ static void appletb_kbd_remove(struct hid_device *hdev) input_unregister_handler(&kbd->inp_handler); timer_delete_sync(&kbd->inactivity_timer); + if (kbd->backlight_dev) + put_device(&kbd->backlight_dev->dev); + hid_hw_close(hdev); hid_hw_stop(hdev); } From a8905238c3bbe13db90065ed74682418f23830c3 Mon Sep 17 00:00:00 2001 From: Akira Inoue Date: Thu, 12 Jun 2025 13:34:38 +0900 Subject: [PATCH 090/289] HID: lenovo: Add support for ThinkPad X1 Tablet Thin Keyboard Gen2 Add "Thinkpad X1 Tablet Gen 2 Keyboard" PID to hid-lenovo driver to fix trackpoint not working issue. Signed-off-by: Akira Inoue Signed-off-by: Jiri Kosina --- drivers/hid/hid-ids.h | 1 + drivers/hid/hid-lenovo.c | 8 ++++++++ drivers/hid/hid-multitouch.c | 8 +++++++- 3 files changed, 16 insertions(+), 1 deletion(-) diff --git a/drivers/hid/hid-ids.h b/drivers/hid/hid-ids.h index 2d3769405ec3..c6468568aea1 100644 --- a/drivers/hid/hid-ids.h +++ b/drivers/hid/hid-ids.h @@ -821,6 +821,7 @@ #define USB_DEVICE_ID_LENOVO_TPPRODOCK 0x6067 #define USB_DEVICE_ID_LENOVO_X1_COVER 0x6085 #define USB_DEVICE_ID_LENOVO_X1_TAB 0x60a3 +#define USB_DEVICE_ID_LENOVO_X1_TAB2 0x60a4 #define USB_DEVICE_ID_LENOVO_X1_TAB3 0x60b5 #define USB_DEVICE_ID_LENOVO_X12_TAB 0x60fe #define USB_DEVICE_ID_LENOVO_X12_TAB2 0x61ae diff --git a/drivers/hid/hid-lenovo.c b/drivers/hid/hid-lenovo.c index a3c23a72316a..b3121fa7a72d 100644 --- a/drivers/hid/hid-lenovo.c +++ b/drivers/hid/hid-lenovo.c @@ -492,6 +492,7 @@ static int lenovo_input_mapping(struct hid_device *hdev, case USB_DEVICE_ID_LENOVO_X12_TAB: case USB_DEVICE_ID_LENOVO_X12_TAB2: case USB_DEVICE_ID_LENOVO_X1_TAB: + case USB_DEVICE_ID_LENOVO_X1_TAB2: case USB_DEVICE_ID_LENOVO_X1_TAB3: return lenovo_input_mapping_x1_tab_kbd(hdev, hi, field, usage, bit, max); default: @@ -608,6 +609,7 @@ static ssize_t attr_fn_lock_store(struct device *dev, case USB_DEVICE_ID_LENOVO_X12_TAB2: case USB_DEVICE_ID_LENOVO_TP10UBKBD: case USB_DEVICE_ID_LENOVO_X1_TAB: + case USB_DEVICE_ID_LENOVO_X1_TAB2: case USB_DEVICE_ID_LENOVO_X1_TAB3: ret = lenovo_led_set_tp10ubkbd(hdev, TP10UBKBD_FN_LOCK_LED, value); if (ret) @@ -864,6 +866,7 @@ static int lenovo_event(struct hid_device *hdev, struct hid_field *field, case USB_DEVICE_ID_LENOVO_X12_TAB2: case USB_DEVICE_ID_LENOVO_TP10UBKBD: case USB_DEVICE_ID_LENOVO_X1_TAB: + case USB_DEVICE_ID_LENOVO_X1_TAB2: case USB_DEVICE_ID_LENOVO_X1_TAB3: return lenovo_event_tp10ubkbd(hdev, field, usage, value); default: @@ -1147,6 +1150,7 @@ static int lenovo_led_brightness_set(struct led_classdev *led_cdev, case USB_DEVICE_ID_LENOVO_X12_TAB2: case USB_DEVICE_ID_LENOVO_TP10UBKBD: case USB_DEVICE_ID_LENOVO_X1_TAB: + case USB_DEVICE_ID_LENOVO_X1_TAB2: case USB_DEVICE_ID_LENOVO_X1_TAB3: ret = lenovo_led_set_tp10ubkbd(hdev, tp10ubkbd_led[led_nr], value); break; @@ -1387,6 +1391,7 @@ static int lenovo_probe(struct hid_device *hdev, case USB_DEVICE_ID_LENOVO_X12_TAB2: case USB_DEVICE_ID_LENOVO_TP10UBKBD: case USB_DEVICE_ID_LENOVO_X1_TAB: + case USB_DEVICE_ID_LENOVO_X1_TAB2: case USB_DEVICE_ID_LENOVO_X1_TAB3: ret = lenovo_probe_tp10ubkbd(hdev); break; @@ -1476,6 +1481,7 @@ static void lenovo_remove(struct hid_device *hdev) case USB_DEVICE_ID_LENOVO_X12_TAB2: case USB_DEVICE_ID_LENOVO_TP10UBKBD: case USB_DEVICE_ID_LENOVO_X1_TAB: + case USB_DEVICE_ID_LENOVO_X1_TAB2: case USB_DEVICE_ID_LENOVO_X1_TAB3: lenovo_remove_tp10ubkbd(hdev); break; @@ -1526,6 +1532,8 @@ static const struct hid_device_id lenovo_devices[] = { */ { HID_DEVICE(BUS_USB, HID_GROUP_GENERIC, USB_VENDOR_ID_LENOVO, USB_DEVICE_ID_LENOVO_X1_TAB) }, + { HID_DEVICE(BUS_USB, HID_GROUP_GENERIC, + USB_VENDOR_ID_LENOVO, USB_DEVICE_ID_LENOVO_X1_TAB2) }, { HID_DEVICE(BUS_USB, HID_GROUP_GENERIC, USB_VENDOR_ID_LENOVO, USB_DEVICE_ID_LENOVO_X1_TAB3) }, { HID_DEVICE(BUS_USB, HID_GROUP_GENERIC, diff --git a/drivers/hid/hid-multitouch.c b/drivers/hid/hid-multitouch.c index ded0fef7d8c7..24aa6e7e6fdd 100644 --- a/drivers/hid/hid-multitouch.c +++ b/drivers/hid/hid-multitouch.c @@ -2132,12 +2132,18 @@ static const struct hid_device_id mt_devices[] = { HID_DEVICE(BUS_I2C, HID_GROUP_GENERIC, USB_VENDOR_ID_LG, I2C_DEVICE_ID_LG_7010) }, - /* Lenovo X1 TAB Gen 2 */ + /* Lenovo X1 TAB Gen 1 */ { .driver_data = MT_CLS_WIN_8_FORCE_MULTI_INPUT, HID_DEVICE(BUS_USB, HID_GROUP_MULTITOUCH_WIN_8, USB_VENDOR_ID_LENOVO, USB_DEVICE_ID_LENOVO_X1_TAB) }, + /* Lenovo X1 TAB Gen 2 */ + { .driver_data = MT_CLS_WIN_8_FORCE_MULTI_INPUT, + HID_DEVICE(BUS_USB, HID_GROUP_MULTITOUCH_WIN_8, + USB_VENDOR_ID_LENOVO, + USB_DEVICE_ID_LENOVO_X1_TAB2) }, + /* Lenovo X1 TAB Gen 3 */ { .driver_data = MT_CLS_WIN_8_FORCE_MULTI_INPUT, HID_DEVICE(BUS_USB, HID_GROUP_MULTITOUCH_WIN_8, From 850f0e2433cdd38f36d80a4c1ab59f82029bef74 Mon Sep 17 00:00:00 2001 From: Drew Fustini Date: Wed, 18 Jun 2025 20:54:57 -0700 Subject: [PATCH 091/289] MAINTAINERS: Update Drew Fustini's email address Switch from personal domain to kernel.org address. Signed-off-by: Drew Fustini Link: https://lore.kernel.org/r/20250619035457.331065-1-fustini@kernel.org Signed-off-by: Palmer Dabbelt --- .mailmap | 1 + MAINTAINERS | 2 +- 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/.mailmap b/.mailmap index c8a21d72241f..1ce0dd52dfaf 100644 --- a/.mailmap +++ b/.mailmap @@ -222,6 +222,7 @@ Dmitry Safonov <0x7f454c46@gmail.com> Dmitry Safonov <0x7f454c46@gmail.com> Domen Puncer Douglas Gilbert +Drew Fustini Ed L. Cashin Elliot Berman Enric Balletbo i Serra diff --git a/MAINTAINERS b/MAINTAINERS index a92290fffa16..dcb1142d10b6 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -21374,7 +21374,7 @@ N: spacemit K: spacemit RISC-V THEAD SoC SUPPORT -M: Drew Fustini +M: Drew Fustini M: Guo Ren M: Fu Wei L: linux-riscv@lists.infradead.org From 64f7548aad63d2fbca2eeb6eb33361c218ebd5a5 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Fri, 20 Jun 2025 21:19:40 +0200 Subject: [PATCH 092/289] lib/crypto: sha256: Mark sha256_choose_blocks as __always_inline When the compiler chooses to not inline sha256_choose_blocks() in the purgatory code, it fails to link against the missing CPU specific version: x86_64-linux-ld: arch/x86/purgatory/purgatory.ro: in function `sha256_choose_blocks.part.0': sha256.c:(.text+0x6a6): undefined reference to `irq_fpu_usable' sha256.c:(.text+0x6c7): undefined reference to `sha256_blocks_arch' sha256.c:(.text+0x6cc): undefined reference to `sha256_blocks_simd' Mark this function as __always_inline to prevent this, same as sha256_finup(). Fixes: 5b90a779bc54 ("crypto: lib/sha256 - Add helpers for block-based shash") Signed-off-by: Arnd Bergmann Link: https://lore.kernel.org/r/20250620191952.1867578-1-arnd@kernel.org Signed-off-by: Eric Biggers --- include/crypto/internal/sha2.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/crypto/internal/sha2.h b/include/crypto/internal/sha2.h index b9bccd3ff57f..21a27fd5e198 100644 --- a/include/crypto/internal/sha2.h +++ b/include/crypto/internal/sha2.h @@ -25,7 +25,7 @@ void sha256_blocks_arch(u32 state[SHA256_STATE_WORDS], void sha256_blocks_simd(u32 state[SHA256_STATE_WORDS], const u8 *data, size_t nblocks); -static inline void sha256_choose_blocks( +static __always_inline void sha256_choose_blocks( u32 state[SHA256_STATE_WORDS], const u8 *data, size_t nblocks, bool force_generic, bool force_simd) { From bb378314ceee4d181e26bfe180deca852ae80c5c Mon Sep 17 00:00:00 2001 From: Kent Overstreet Date: Thu, 19 Jun 2025 16:51:32 -0400 Subject: [PATCH 093/289] bcachefs: Add missing bch2_err_class() to fileattr_set() Make sure we return a standard error code. Signed-off-by: Kent Overstreet --- fs/bcachefs/fs.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/bcachefs/fs.c b/fs/bcachefs/fs.c index 3063a8ddc2df..db24a76563f8 100644 --- a/fs/bcachefs/fs.c +++ b/fs/bcachefs/fs.c @@ -1732,7 +1732,8 @@ static int bch2_fileattr_set(struct mnt_idmap *idmap, bch2_write_inode(c, inode, fssetxattr_inode_update_fn, &s, ATTR_CTIME); mutex_unlock(&inode->ei_update_lock); - return ret; + + return bch2_err_class(ret); } static const struct file_operations bch_file_operations = { From abcb6bd4be19d935795611c022d7d19ab69363b0 Mon Sep 17 00:00:00 2001 From: Kent Overstreet Date: Thu, 19 Jun 2025 17:06:43 -0400 Subject: [PATCH 094/289] bcachefs: fix spurious error_throw Signed-off-by: Kent Overstreet --- fs/bcachefs/backpointers.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/bcachefs/backpointers.c b/fs/bcachefs/backpointers.c index e76809e71858..77d93beb3c8f 100644 --- a/fs/bcachefs/backpointers.c +++ b/fs/bcachefs/backpointers.c @@ -353,7 +353,7 @@ static struct bkey_s_c __bch2_backpointer_get_key(struct btree_trans *trans, return ret ? bkey_s_c_err(ret) : bkey_s_c_null; } else { struct btree *b = __bch2_backpointer_get_node(trans, bp, iter, last_flushed, commit); - if (b == ERR_PTR(bch_err_throw(c, backpointer_to_overwritten_btree_node))) + if (b == ERR_PTR(-BCH_ERR_backpointer_to_overwritten_btree_node)) return bkey_s_c_null; if (IS_ERR_OR_NULL(b)) return ((struct bkey_s_c) { .k = ERR_CAST(b) }); From 72c0d9cb0fc48b6d382c3b5b6f702108a612cfdb Mon Sep 17 00:00:00 2001 From: Kent Overstreet Date: Thu, 19 Jun 2025 23:06:01 -0400 Subject: [PATCH 095/289] bcachefs: Fix range in bch2_lookup_indirect_extent() error path Before calling bch2_indirect_extent_missing_error(), we have to calculate the missing range, which is the intersection of the reflink pointer and the non-indirect-extent we found. The calculation didn't take into account that the returned extent may span the iter position, leading to an infinite loop when we (unnecessarily) resized the extent we were returning to one that didn't extend past the offset we were looking up. Signed-off-by: Kent Overstreet --- fs/bcachefs/reflink.c | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/fs/bcachefs/reflink.c b/fs/bcachefs/reflink.c index a535abd44df3..92b90cfe622b 100644 --- a/fs/bcachefs/reflink.c +++ b/fs/bcachefs/reflink.c @@ -64,6 +64,9 @@ void bch2_reflink_p_to_text(struct printbuf *out, struct bch_fs *c, REFLINK_P_IDX(p.v), le32_to_cpu(p.v->front_pad), le32_to_cpu(p.v->back_pad)); + + if (REFLINK_P_ERROR(p.v)) + prt_str(out, " error"); } bool bch2_reflink_p_merge(struct bch_fs *c, struct bkey_s _l, struct bkey_s_c _r) @@ -269,13 +272,12 @@ struct bkey_s_c bch2_lookup_indirect_extent(struct btree_trans *trans, return k; if (unlikely(!bkey_extent_is_reflink_data(k.k))) { - unsigned size = min((u64) k.k->size, - REFLINK_P_IDX(p.v) + p.k->size + le32_to_cpu(p.v->back_pad) - - reflink_offset); - bch2_key_resize(&iter->k, size); + u64 missing_end = min(k.k->p.offset, + REFLINK_P_IDX(p.v) + p.k->size + le32_to_cpu(p.v->back_pad)); + BUG_ON(reflink_offset == missing_end); int ret = bch2_indirect_extent_missing_error(trans, p, reflink_offset, - k.k->p.offset, should_commit); + missing_end, should_commit); if (ret) { bch2_trans_iter_exit(trans, iter); return bkey_s_c_err(ret); From e41687b511d5e5437db5d2151e23c115dba30411 Mon Sep 17 00:00:00 2001 From: Tim Crawford Date: Fri, 20 Jun 2025 14:43:29 -0600 Subject: [PATCH 096/289] ALSA: hda/realtek: Add quirks for some Clevo laptops Add audio quirks to fix speaker output and headset detection on the following Clevo models: - V350ENC - V350WNPQ - V540TU - X560WNR - X580WNS Signed-off-by: Tim Crawford Link: https://patch.msgid.link/20250620204329.35878-1-tcrawford@system76.com Signed-off-by: Takashi Iwai --- sound/pci/hda/patch_realtek.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c index 2e1618494c20..2149dcd34bc6 100644 --- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -2656,6 +2656,7 @@ static const struct hda_quirk alc882_fixup_tbl[] = { SND_PCI_QUIRK(0x147b, 0x107a, "Abit AW9D-MAX", ALC882_FIXUP_ABIT_AW9D_MAX), SND_PCI_QUIRK(0x1558, 0x3702, "Clevo X370SN[VW]", ALC1220_FIXUP_CLEVO_PB51ED_PINS), SND_PCI_QUIRK(0x1558, 0x50d3, "Clevo PC50[ER][CDF]", ALC1220_FIXUP_CLEVO_PB51ED_PINS), + SND_PCI_QUIRK(0x1558, 0x5802, "Clevo X58[05]WN[RST]", ALC1220_FIXUP_CLEVO_PB51ED_PINS), SND_PCI_QUIRK(0x1558, 0x65d1, "Clevo PB51[ER][CDF]", ALC1220_FIXUP_CLEVO_PB51ED_PINS), SND_PCI_QUIRK(0x1558, 0x65d2, "Clevo PB51R[CDF]", ALC1220_FIXUP_CLEVO_PB51ED_PINS), SND_PCI_QUIRK(0x1558, 0x65e1, "Clevo PB51[ED][DF]", ALC1220_FIXUP_CLEVO_PB51ED_PINS), @@ -11135,6 +11136,8 @@ static const struct hda_quirk alc269_fixup_tbl[] = { SND_PCI_QUIRK(0x1558, 0x14a1, "Clevo L141MU", ALC293_FIXUP_SYSTEM76_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x1558, 0x2624, "Clevo L240TU", ALC256_FIXUP_SYSTEM76_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x1558, 0x28c1, "Clevo V370VND", ALC2XX_FIXUP_HEADSET_MIC), + SND_PCI_QUIRK(0x1558, 0x35a1, "Clevo V3[56]0EN[CDE]", ALC256_FIXUP_SYSTEM76_MIC_NO_PRESENCE), + SND_PCI_QUIRK(0x1558, 0x35b1, "Clevo V3[57]0WN[MNP]Q", ALC256_FIXUP_SYSTEM76_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x1558, 0x4018, "Clevo NV40M[BE]", ALC293_FIXUP_SYSTEM76_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x1558, 0x4019, "Clevo NV40MZ", ALC293_FIXUP_SYSTEM76_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x1558, 0x4020, "Clevo NV40MB", ALC293_FIXUP_SYSTEM76_MIC_NO_PRESENCE), @@ -11162,6 +11165,7 @@ static const struct hda_quirk alc269_fixup_tbl[] = { SND_PCI_QUIRK(0x1558, 0x51b1, "Clevo NS50AU", ALC256_FIXUP_SYSTEM76_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x1558, 0x51b3, "Clevo NS70AU", ALC256_FIXUP_SYSTEM76_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x1558, 0x5630, "Clevo NP50RNJS", ALC256_FIXUP_SYSTEM76_MIC_NO_PRESENCE), + SND_PCI_QUIRK(0x1558, 0x5700, "Clevo X560WN[RST]", ALC256_FIXUP_SYSTEM76_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x1558, 0x70a1, "Clevo NB70T[HJK]", ALC293_FIXUP_SYSTEM76_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x1558, 0x70b3, "Clevo NK70SB", ALC293_FIXUP_SYSTEM76_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x1558, 0x70f2, "Clevo NH79EPY", ALC293_FIXUP_SYSTEM76_MIC_NO_PRESENCE), @@ -11201,6 +11205,7 @@ static const struct hda_quirk alc269_fixup_tbl[] = { SND_PCI_QUIRK(0x1558, 0xa650, "Clevo NP[567]0SN[CD]", ALC256_FIXUP_SYSTEM76_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x1558, 0xa671, "Clevo NP70SN[CDE]", ALC256_FIXUP_SYSTEM76_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x1558, 0xa741, "Clevo V54x_6x_TNE", ALC245_FIXUP_CLEVO_NOISY_MIC), + SND_PCI_QUIRK(0x1558, 0xa743, "Clevo V54x_6x_TU", ALC245_FIXUP_CLEVO_NOISY_MIC), SND_PCI_QUIRK(0x1558, 0xa763, "Clevo V54x_6x_TU", ALC245_FIXUP_CLEVO_NOISY_MIC), SND_PCI_QUIRK(0x1558, 0xb018, "Clevo NP50D[BE]", ALC293_FIXUP_SYSTEM76_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x1558, 0xb019, "Clevo NH77D[BE]Q", ALC293_FIXUP_SYSTEM76_MIC_NO_PRESENCE), From 68cc9d3c8e44afe90e43cbbd2960da15c2f31e23 Mon Sep 17 00:00:00 2001 From: Yasmin Fitzgerald Date: Sat, 21 Jun 2025 01:36:14 -0400 Subject: [PATCH 097/289] ALSA: hda/realtek - Enable mute LED on HP Pavilion Laptop 15-eg100 The HP Pavilion Laptop 15-eg100 has Realtek HDA codec ALC287. It needs the ALC287_FIXUP_HP_GPIO_LED quirk to enable the mute LED. Signed-off-by: Yasmin Fitzgerald Link: https://patch.msgid.link/20250621053832.52950-1-sunoflife1.git@gmail.com Signed-off-by: Takashi Iwai --- sound/pci/hda/patch_realtek.c | 1 + 1 file changed, 1 insertion(+) diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c index 2149dcd34bc6..e144ebb08841 100644 --- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -10738,6 +10738,7 @@ static const struct hda_quirk alc269_fixup_tbl[] = { SND_PCI_QUIRK(0x103c, 0x8975, "HP EliteBook x360 840 Aero G9", ALC245_FIXUP_CS35L41_SPI_2_HP_GPIO_LED), SND_PCI_QUIRK(0x103c, 0x897d, "HP mt440 Mobile Thin Client U74", ALC236_FIXUP_HP_GPIO_LED), SND_PCI_QUIRK(0x103c, 0x8981, "HP Elite Dragonfly G3", ALC245_FIXUP_CS35L41_SPI_4), + SND_PCI_QUIRK(0x103c, 0x898a, "HP Pavilion 15-eg100", ALC287_FIXUP_HP_GPIO_LED), SND_PCI_QUIRK(0x103c, 0x898e, "HP EliteBook 835 G9", ALC287_FIXUP_CS35L41_I2C_2), SND_PCI_QUIRK(0x103c, 0x898f, "HP EliteBook 835 G9", ALC287_FIXUP_CS35L41_I2C_2), SND_PCI_QUIRK(0x103c, 0x8991, "HP EliteBook 845 G9", ALC287_FIXUP_CS35L41_I2C_2_HP_GPIO_LED), From 999fb9d51f939ee23cbb9313ae558d29d6987804 Mon Sep 17 00:00:00 2001 From: Luca Weiss Date: Tue, 17 Jun 2025 14:20:12 +0200 Subject: [PATCH 098/289] ASoC: qcom: sm8250: Fix possibly undefined reference With CONFIG_SND_SOC_SM8250=y and CONFIG_SND_SOC_QCOM_OFFLOAD_UTILS=m selected in kconfig, the build will fail due to trying to link against a symbol only found in the module. aarch64-linux-gnu-ld: sound/soc/qcom/sm8250.o: in function `sm8250_snd_exit': sound/soc/qcom/sm8250.c:52:(.text+0x210): undefined reference to `qcom_snd_usb_offload_jack_remove' Fix this by declaring the dependency that forces CONFIG_SND_SOC_SM8250=m when CONFIG_SND_SOC_QCOM_OFFLOAD_UTILS is =m. Reported-by: Matthew Croughan Fixes: 1b8d0d87b934 ("ASoC: qcom: qdsp6: Add headphone jack for offload connection status") Signed-off-by: Luca Weiss Link: https://patch.msgid.link/20250617-snd-sm8250-dep-fix-v1-1-879af8906ec4@fairphone.com Signed-off-by: Mark Brown --- sound/soc/qcom/Kconfig | 1 + 1 file changed, 1 insertion(+) diff --git a/sound/soc/qcom/Kconfig b/sound/soc/qcom/Kconfig index e86b4a03dd61..3d9ba13ee1e5 100644 --- a/sound/soc/qcom/Kconfig +++ b/sound/soc/qcom/Kconfig @@ -186,6 +186,7 @@ config SND_SOC_SM8250 tristate "SoC Machine driver for SM8250 boards" depends on QCOM_APR && SOUNDWIRE depends on COMMON_CLK + depends on SND_SOC_QCOM_OFFLOAD_UTILS || !SND_SOC_QCOM_OFFLOAD_UTILS select SND_SOC_QDSP6 select SND_SOC_QCOM_COMMON select SND_SOC_QCOM_SDW From 7186b81807b4a08f8bf834b6bdc72d6ed8ba1587 Mon Sep 17 00:00:00 2001 From: Yuzuru10 Date: Sun, 22 Jun 2025 22:58:00 +0000 Subject: [PATCH 099/289] ASoC: amd: yc: add quirk for Acer Nitro ANV15-41 internal mic This patch adds DMI-based quirk for the Acer Nitro ANV15-41, allowing the internal microphone to be detected correctly on machines with "RB" as board vendor. Signed-off-by: Yuzuru Link: https://patch.msgid.link/20250622225754.20856-1-yuzuru_10@proton.me Signed-off-by: Mark Brown --- sound/soc/amd/yc/acp6x-mach.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/sound/soc/amd/yc/acp6x-mach.c b/sound/soc/amd/yc/acp6x-mach.c index 98022e5fd428..499f9f7c76ee 100644 --- a/sound/soc/amd/yc/acp6x-mach.c +++ b/sound/soc/amd/yc/acp6x-mach.c @@ -353,6 +353,13 @@ static const struct dmi_system_id yc_acp_quirk_table[] = { DMI_MATCH(DMI_PRODUCT_NAME, "83Q3"), } }, + { + .driver_data = &acp6x_card, + .matches = { + DMI_MATCH(DMI_BOARD_VENDOR, "RB"), + DMI_MATCH(DMI_PRODUCT_NAME, "Nitro ANV15-41"), + } + }, { .driver_data = &acp6x_card, .matches = { From bf39286adc5e10ce3e32eb86ad316ae56f3b52a0 Mon Sep 17 00:00:00 2001 From: Oliver Schramm Date: Sun, 22 Jun 2025 00:30:01 +0200 Subject: [PATCH 100/289] ASoC: amd: yc: Add DMI quirk for Lenovo IdeaPad Slim 5 15 It's smaller brother has already received the patch to enable the microphone, now add it too to the DMI quirk table. Cc: stable@vger.kernel.org Signed-off-by: Oliver Schramm Link: https://patch.msgid.link/20250621223000.11817-2-oliver.schramm97@gmail.com Signed-off-by: Mark Brown --- sound/soc/amd/yc/acp6x-mach.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/sound/soc/amd/yc/acp6x-mach.c b/sound/soc/amd/yc/acp6x-mach.c index 499f9f7c76ee..97e340140d0c 100644 --- a/sound/soc/amd/yc/acp6x-mach.c +++ b/sound/soc/amd/yc/acp6x-mach.c @@ -367,6 +367,13 @@ static const struct dmi_system_id yc_acp_quirk_table[] = { DMI_MATCH(DMI_PRODUCT_NAME, "83J2"), } }, + { + .driver_data = &acp6x_card, + .matches = { + DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"), + DMI_MATCH(DMI_PRODUCT_NAME, "83J3"), + } + }, { .driver_data = &acp6x_card, .matches = { From fb721b2c35b1829b8ecf62e3adb41cf30260316a Mon Sep 17 00:00:00 2001 From: Louis Chauvet Date: Tue, 29 Apr 2025 10:36:23 +0200 Subject: [PATCH 101/289] drm: writeback: Fix drm_writeback_connector_cleanup signature MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The drm_writeback_connector_cleanup have the signature: static void drm_writeback_connector_cleanup( struct drm_device *dev, struct drm_writeback_connector *wb_connector) But it is stored and used as a drmres_release_t typedef void (*drmres_release_t)(struct drm_device *dev, void *res); While the current code is valid and does not produce any warning, the CFI runtime check (CONFIG_CFI_CLANG) can fail because the function signature is not the same as drmres_release_t. In order to fix this, change the function signature to match what is expected by drmres_release_t. Fixes: 1914ba2b91ea ("drm: writeback: Create drmm variants for drm_writeback_connector initialization") Suggested-by: Mark Yacoub Reviewed-by: Maíra Canal Link: https://lore.kernel.org/r/20250429-drm-fix-writeback-cleanup-v2-1-548ff3a4e284@bootlin.com Signed-off-by: Louis Chauvet --- drivers/gpu/drm/drm_writeback.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/drm_writeback.c b/drivers/gpu/drm/drm_writeback.c index edbeab88ff2b..d983ee85cf13 100644 --- a/drivers/gpu/drm/drm_writeback.c +++ b/drivers/gpu/drm/drm_writeback.c @@ -343,17 +343,18 @@ EXPORT_SYMBOL(drm_writeback_connector_init_with_encoder); /** * drm_writeback_connector_cleanup - Cleanup the writeback connector * @dev: DRM device - * @wb_connector: Pointer to the writeback connector to clean up + * @data: Pointer to the writeback connector to clean up * * This will decrement the reference counter of blobs and destroy properties. It * will also clean the remaining jobs in this writeback connector. Caution: This helper will not * clean up the attached encoder and the drm_connector. */ static void drm_writeback_connector_cleanup(struct drm_device *dev, - struct drm_writeback_connector *wb_connector) + void *data) { unsigned long flags; struct drm_writeback_job *pos, *n; + struct drm_writeback_connector *wb_connector = data; delete_writeback_properties(dev); drm_property_blob_put(wb_connector->pixel_formats_blob_ptr); @@ -405,7 +406,7 @@ int drmm_writeback_connector_init(struct drm_device *dev, if (ret) return ret; - ret = drmm_add_action_or_reset(dev, (void *)drm_writeback_connector_cleanup, + ret = drmm_add_action_or_reset(dev, drm_writeback_connector_cleanup, wb_connector); if (ret) return ret; From 20d71750cc72e80859d52548cf5c2a7513983b0d Mon Sep 17 00:00:00 2001 From: Herbert Xu Date: Fri, 20 Jun 2025 20:15:03 +0800 Subject: [PATCH 102/289] crypto: wp512 - Use API partial block handling Use the Crypto API partial block handling. Signed-off-by: Herbert Xu Tested-by: Milan Broz Signed-off-by: Herbert Xu --- crypto/wp512.c | 125 +++++++++++++++++++------------------------------ 1 file changed, 47 insertions(+), 78 deletions(-) diff --git a/crypto/wp512.c b/crypto/wp512.c index 41f13d490333..229b189a7988 100644 --- a/crypto/wp512.c +++ b/crypto/wp512.c @@ -21,10 +21,10 @@ */ #include #include +#include #include -#include -#include -#include +#include +#include #define WP512_DIGEST_SIZE 64 #define WP384_DIGEST_SIZE 48 @@ -37,9 +37,6 @@ struct wp512_ctx { u8 bitLength[WP512_LENGTHBYTES]; - u8 buffer[WP512_BLOCK_SIZE]; - int bufferBits; - int bufferPos; u64 hash[WP512_DIGEST_SIZE/8]; }; @@ -779,16 +776,16 @@ static const u64 rc[WHIRLPOOL_ROUNDS] = { * The core Whirlpool transform. */ -static __no_kmsan_checks void wp512_process_buffer(struct wp512_ctx *wctx) { +static __no_kmsan_checks void wp512_process_buffer(struct wp512_ctx *wctx, + const u8 *buffer) { int i, r; u64 K[8]; /* the round key */ u64 block[8]; /* mu(buffer) */ u64 state[8]; /* the cipher state */ u64 L[8]; - const __be64 *buffer = (const __be64 *)wctx->buffer; for (i = 0; i < 8; i++) - block[i] = be64_to_cpu(buffer[i]); + block[i] = get_unaligned_be64(buffer + i * 8); state[0] = block[0] ^ (K[0] = wctx->hash[0]); state[1] = block[1] ^ (K[1] = wctx->hash[1]); @@ -991,8 +988,6 @@ static int wp512_init(struct shash_desc *desc) { int i; memset(wctx->bitLength, 0, 32); - wctx->bufferBits = wctx->bufferPos = 0; - wctx->buffer[0] = 0; for (i = 0; i < 8; i++) { wctx->hash[i] = 0L; } @@ -1000,84 +995,54 @@ static int wp512_init(struct shash_desc *desc) { return 0; } -static int wp512_update(struct shash_desc *desc, const u8 *source, - unsigned int len) +static void wp512_add_length(u8 *bitLength, u64 value) { - struct wp512_ctx *wctx = shash_desc_ctx(desc); - int sourcePos = 0; - unsigned int bits_len = len * 8; // convert to number of bits - int sourceGap = (8 - ((int)bits_len & 7)) & 7; - int bufferRem = wctx->bufferBits & 7; + u32 carry; int i; - u32 b, carry; - u8 *buffer = wctx->buffer; - u8 *bitLength = wctx->bitLength; - int bufferBits = wctx->bufferBits; - int bufferPos = wctx->bufferPos; - u64 value = bits_len; for (i = 31, carry = 0; i >= 0 && (carry != 0 || value != 0ULL); i--) { carry += bitLength[i] + ((u32)value & 0xff); bitLength[i] = (u8)carry; carry >>= 8; value >>= 8; } - while (bits_len > 8) { - b = ((source[sourcePos] << sourceGap) & 0xff) | - ((source[sourcePos + 1] & 0xff) >> (8 - sourceGap)); - buffer[bufferPos++] |= (u8)(b >> bufferRem); - bufferBits += 8 - bufferRem; - if (bufferBits == WP512_BLOCK_SIZE * 8) { - wp512_process_buffer(wctx); - bufferBits = bufferPos = 0; - } - buffer[bufferPos] = b << (8 - bufferRem); - bufferBits += bufferRem; - bits_len -= 8; - sourcePos++; - } - if (bits_len > 0) { - b = (source[sourcePos] << sourceGap) & 0xff; - buffer[bufferPos] |= b >> bufferRem; - } else { - b = 0; - } - if (bufferRem + bits_len < 8) { - bufferBits += bits_len; - } else { - bufferPos++; - bufferBits += 8 - bufferRem; - bits_len -= 8 - bufferRem; - if (bufferBits == WP512_BLOCK_SIZE * 8) { - wp512_process_buffer(wctx); - bufferBits = bufferPos = 0; - } - buffer[bufferPos] = b << (8 - bufferRem); - bufferBits += (int)bits_len; - } - - wctx->bufferBits = bufferBits; - wctx->bufferPos = bufferPos; - - return 0; } -static int wp512_final(struct shash_desc *desc, u8 *out) +static int wp512_update(struct shash_desc *desc, const u8 *source, + unsigned int len) +{ + struct wp512_ctx *wctx = shash_desc_ctx(desc); + unsigned int remain = len % WP512_BLOCK_SIZE; + u64 bits_len = (len - remain) * 8ull; + u8 *bitLength = wctx->bitLength; + + wp512_add_length(bitLength, bits_len); + do { + wp512_process_buffer(wctx, source); + source += WP512_BLOCK_SIZE; + bits_len -= WP512_BLOCK_SIZE * 8; + } while (bits_len); + + return remain; +} + +static int wp512_finup(struct shash_desc *desc, const u8 *src, + unsigned int bufferPos, u8 *out) { struct wp512_ctx *wctx = shash_desc_ctx(desc); int i; - u8 *buffer = wctx->buffer; u8 *bitLength = wctx->bitLength; - int bufferBits = wctx->bufferBits; - int bufferPos = wctx->bufferPos; __be64 *digest = (__be64 *)out; + u8 buffer[WP512_BLOCK_SIZE]; - buffer[bufferPos] |= 0x80U >> (bufferBits & 7); + wp512_add_length(bitLength, bufferPos * 8); + memcpy(buffer, src, bufferPos); + buffer[bufferPos] = 0x80U; bufferPos++; if (bufferPos > WP512_BLOCK_SIZE - WP512_LENGTHBYTES) { if (bufferPos < WP512_BLOCK_SIZE) memset(&buffer[bufferPos], 0, WP512_BLOCK_SIZE - bufferPos); - wp512_process_buffer(wctx); + wp512_process_buffer(wctx, buffer); bufferPos = 0; } if (bufferPos < WP512_BLOCK_SIZE - WP512_LENGTHBYTES) @@ -1086,31 +1051,32 @@ static int wp512_final(struct shash_desc *desc, u8 *out) bufferPos = WP512_BLOCK_SIZE - WP512_LENGTHBYTES; memcpy(&buffer[WP512_BLOCK_SIZE - WP512_LENGTHBYTES], bitLength, WP512_LENGTHBYTES); - wp512_process_buffer(wctx); + wp512_process_buffer(wctx, buffer); + memzero_explicit(buffer, sizeof(buffer)); for (i = 0; i < WP512_DIGEST_SIZE/8; i++) digest[i] = cpu_to_be64(wctx->hash[i]); - wctx->bufferBits = bufferBits; - wctx->bufferPos = bufferPos; return 0; } -static int wp384_final(struct shash_desc *desc, u8 *out) +static int wp384_finup(struct shash_desc *desc, const u8 *src, + unsigned int len, u8 *out) { u8 D[64]; - wp512_final(desc, D); + wp512_finup(desc, src, len, D); memcpy(out, D, WP384_DIGEST_SIZE); memzero_explicit(D, WP512_DIGEST_SIZE); return 0; } -static int wp256_final(struct shash_desc *desc, u8 *out) +static int wp256_finup(struct shash_desc *desc, const u8 *src, + unsigned int len, u8 *out) { u8 D[64]; - wp512_final(desc, D); + wp512_finup(desc, src, len, D); memcpy(out, D, WP256_DIGEST_SIZE); memzero_explicit(D, WP512_DIGEST_SIZE); @@ -1121,11 +1087,12 @@ static struct shash_alg wp_algs[3] = { { .digestsize = WP512_DIGEST_SIZE, .init = wp512_init, .update = wp512_update, - .final = wp512_final, + .finup = wp512_finup, .descsize = sizeof(struct wp512_ctx), .base = { .cra_name = "wp512", .cra_driver_name = "wp512-generic", + .cra_flags = CRYPTO_AHASH_ALG_BLOCK_ONLY, .cra_blocksize = WP512_BLOCK_SIZE, .cra_module = THIS_MODULE, } @@ -1133,11 +1100,12 @@ static struct shash_alg wp_algs[3] = { { .digestsize = WP384_DIGEST_SIZE, .init = wp512_init, .update = wp512_update, - .final = wp384_final, + .finup = wp384_finup, .descsize = sizeof(struct wp512_ctx), .base = { .cra_name = "wp384", .cra_driver_name = "wp384-generic", + .cra_flags = CRYPTO_AHASH_ALG_BLOCK_ONLY, .cra_blocksize = WP512_BLOCK_SIZE, .cra_module = THIS_MODULE, } @@ -1145,11 +1113,12 @@ static struct shash_alg wp_algs[3] = { { .digestsize = WP256_DIGEST_SIZE, .init = wp512_init, .update = wp512_update, - .final = wp256_final, + .finup = wp256_finup, .descsize = sizeof(struct wp512_ctx), .base = { .cra_name = "wp256", .cra_driver_name = "wp256-generic", + .cra_flags = CRYPTO_AHASH_ALG_BLOCK_ONLY, .cra_blocksize = WP512_BLOCK_SIZE, .cra_module = THIS_MODULE, } From 9205999e9f13a07cb29d5a8836c25afdca186007 Mon Sep 17 00:00:00 2001 From: Ankit Nautiyal Date: Wed, 18 Jun 2025 18:39:50 +0530 Subject: [PATCH 103/289] drm/i915/snps_hdmi_pll: Fix 64-bit divisor truncation by using div64_u64 DIV_ROUND_CLOSEST_ULL uses do_div(), which expects a 32-bit divisor. When passing a 64-bit constant like CURVE2_MULTIPLIER, the value is silently truncated to u32, potentially leading to incorrect results on large divisors. Replace DIV_ROUND_CLOSEST_ULL with DIV64_U64_ROUND_CLOSEST which correctly handles full 64-bit division. v2: Use DIV64_U64_ROUND_CLOSEST instead of div64_u64 macro. (Jani) Fixes: 5947642004bf ("drm/i915/display: Add support for SNPS PHY HDMI PLL algorithm for DG2") Reported-by: Vas Novikov Closes: https://lore.kernel.org/all/8d7c7958-9558-4c8a-a81a-e9310f2d8852@gmail.com/ Cc: Ankit Nautiyal Cc: Suraj Kandpal Cc: Jani Nikula Cc: Vas Novikov Cc: stable@vger.kernel.org # v6.15+ Reviewed-by: Jani Nikula Signed-off-by: Ankit Nautiyal Link: https://lore.kernel.org/r/20250618130951.1596587-2-ankit.k.nautiyal@intel.com (cherry picked from commit b300a175a11e6a934d728317dc39787723cc7917) Signed-off-by: Joonas Lahtinen --- drivers/gpu/drm/i915/display/intel_snps_hdmi_pll.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/display/intel_snps_hdmi_pll.c b/drivers/gpu/drm/i915/display/intel_snps_hdmi_pll.c index 74bb3bedf30f..5111bdc3075b 100644 --- a/drivers/gpu/drm/i915/display/intel_snps_hdmi_pll.c +++ b/drivers/gpu/drm/i915/display/intel_snps_hdmi_pll.c @@ -103,8 +103,8 @@ static void get_ana_cp_int_prop(u64 vco_clk, DIV_ROUND_DOWN_ULL(curve_1_interpolated, CURVE0_MULTIPLIER))); ana_cp_int_temp = - DIV_ROUND_CLOSEST_ULL(DIV_ROUND_DOWN_ULL(adjusted_vco_clk1, curve_2_scaled1), - CURVE2_MULTIPLIER); + DIV64_U64_ROUND_CLOSEST(DIV_ROUND_DOWN_ULL(adjusted_vco_clk1, curve_2_scaled1), + CURVE2_MULTIPLIER); *ana_cp_int = max(1, min(ana_cp_int_temp, 127)); From a24cc6ce1933eade12aa2b9859de0fcd2dac2c06 Mon Sep 17 00:00:00 2001 From: Sebastian Andrzej Siewior Date: Mon, 23 Jun 2025 10:34:08 +0200 Subject: [PATCH 104/289] futex: Initialize futex_phash_new during fork(). During a hash resize operation the new private hash is stored in mm_struct::futex_phash_new if the current hash can not be immediately replaced. The new hash must not be copied during fork() into the new task. Doing so will lead to a double-free of the memory by the two tasks. Initialize the mm_struct::futex_phash_new during fork(). Closes: https://lore.kernel.org/all/aFBQ8CBKmRzEqIfS@mozart.vkv.me/ Fixes: bd54df5ea7cad ("futex: Allow to resize the private local hash") Reported-by: Calvin Owens Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Peter Zijlstra (Intel) Tested-by: Calvin Owens Link: https://lkml.kernel.org/r/20250623083408.jTiJiC6_@linutronix.de --- include/linux/futex.h | 1 + 1 file changed, 1 insertion(+) diff --git a/include/linux/futex.h b/include/linux/futex.h index 005b040c4791..b37193653e6b 100644 --- a/include/linux/futex.h +++ b/include/linux/futex.h @@ -89,6 +89,7 @@ void futex_hash_free(struct mm_struct *mm); static inline void futex_mm_init(struct mm_struct *mm) { RCU_INIT_POINTER(mm->futex_phash, NULL); + mm->futex_phash_new = NULL; mutex_init(&mm->futex_hash_lock); } From a3ef3c2da675a8a564c8bea1a511cdd0a2a9aa49 Mon Sep 17 00:00:00 2001 From: Imre Deak Date: Thu, 5 Jun 2025 11:28:46 +0300 Subject: [PATCH 105/289] drm/dp: Change AUX DPCD probe address from DPCD_REV to LANE0_1_STATUS MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Reading DPCD registers has side-effects in general. In particular accessing registers outside of the link training register range (0x102-0x106, 0x202-0x207, 0x200c-0x200f, 0x2216) is explicitly forbidden by the DP v2.1 Standard, see 3.6.5.1 DPTX AUX Transaction Handling Mandates 3.6.7.4 128b/132b DP Link Layer LTTPR Link Training Mandates Based on my tests, accessing the DPCD_REV register during the link training of an UHBR TBT DP tunnel sink leads to link training failures. Solve the above by using the DP_LANE0_1_STATUS (0x202) register for the DPCD register access quirk. Cc: Cc: Ville Syrjälä Cc: Jani Nikula Acked-by: Jani Nikula Signed-off-by: Imre Deak Link: https://lore.kernel.org/r/20250605082850.65136-2-imre.deak@intel.com (cherry picked from commit a40c5d727b8111b5db424a1e43e14a1dcce1e77f) Signed-off-by: Joonas Lahtinen --- drivers/gpu/drm/display/drm_dp_helper.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/display/drm_dp_helper.c b/drivers/gpu/drm/display/drm_dp_helper.c index f2a6559a2710..dc622c78db9d 100644 --- a/drivers/gpu/drm/display/drm_dp_helper.c +++ b/drivers/gpu/drm/display/drm_dp_helper.c @@ -725,7 +725,7 @@ ssize_t drm_dp_dpcd_read(struct drm_dp_aux *aux, unsigned int offset, * monitor doesn't power down exactly after the throw away read. */ if (!aux->is_remote) { - ret = drm_dp_dpcd_probe(aux, DP_DPCD_REV); + ret = drm_dp_dpcd_probe(aux, DP_LANE0_1_STATUS); if (ret < 0) return ret; } From dc6458ed95e40146699f9c523e34cb13ff127170 Mon Sep 17 00:00:00 2001 From: Vijendar Mukunda Date: Mon, 23 Jun 2025 14:14:55 +0530 Subject: [PATCH 106/289] ASoC: amd: ps: fix for soundwire failures during hibernation exit sequence During the hibernate entry sequence, ACP registers will be reset to default values and acp ip will be completely powered off including acp SoundWire pads. During resume sequence, if acp SoundWire pad keeper enable register is not restored along with pad pulldown control register value, then SoundWire manager links won't be powered on correctly results in peripheral register access failures and completely audio function is broken. Add code to store the acp SoundWire pad keeper enable register and acp pad pulldown ctrl register values before entering into suspend state and restore the register values during resume sequence based on condition check for acp SoundWire pad keeper enable register for ACP6.3, ACP7.0 & ACP7.1 platforms. Fixes: 491628388005 ("ASoC: amd: ps: add callback functions for acp pci driver pm ops") Signed-off-by: Vijendar Mukunda Link: https://patch.msgid.link/20250623084630.3100279-1-Vijendar.Mukunda@amd.com Signed-off-by: Mark Brown --- sound/soc/amd/ps/acp63.h | 4 ++++ sound/soc/amd/ps/ps-common.c | 18 ++++++++++++++++++ 2 files changed, 22 insertions(+) diff --git a/sound/soc/amd/ps/acp63.h b/sound/soc/amd/ps/acp63.h index 85feae45c44c..d7c994e26e4d 100644 --- a/sound/soc/amd/ps/acp63.h +++ b/sound/soc/amd/ps/acp63.h @@ -334,6 +334,8 @@ struct acp_hw_ops { * @addr: pci ioremap address * @reg_range: ACP reigister range * @acp_rev: ACP PCI revision id + * @acp_sw_pad_keeper_en: store acp SoundWire pad keeper enable register value + * @acp_pad_pulldown_ctrl: store acp pad pulldown control register value * @acp63_sdw0-dma_intr_stat: DMA interrupt status array for ACP6.3 platform SoundWire * manager-SW0 instance * @acp63_sdw_dma_intr_stat: DMA interrupt status array for ACP6.3 platform SoundWire @@ -367,6 +369,8 @@ struct acp63_dev_data { u32 addr; u32 reg_range; u32 acp_rev; + u32 acp_sw_pad_keeper_en; + u32 acp_pad_pulldown_ctrl; u16 acp63_sdw0_dma_intr_stat[ACP63_SDW0_DMA_MAX_STREAMS]; u16 acp63_sdw1_dma_intr_stat[ACP63_SDW1_DMA_MAX_STREAMS]; u16 acp70_sdw0_dma_intr_stat[ACP70_SDW0_DMA_MAX_STREAMS]; diff --git a/sound/soc/amd/ps/ps-common.c b/sound/soc/amd/ps/ps-common.c index 1c89fb5fe1da..7b4966b75dc6 100644 --- a/sound/soc/amd/ps/ps-common.c +++ b/sound/soc/amd/ps/ps-common.c @@ -160,6 +160,8 @@ static int __maybe_unused snd_acp63_suspend(struct device *dev) adata = dev_get_drvdata(dev); if (adata->is_sdw_dev) { + adata->acp_sw_pad_keeper_en = readl(adata->acp63_base + ACP_SW0_PAD_KEEPER_EN); + adata->acp_pad_pulldown_ctrl = readl(adata->acp63_base + ACP_PAD_PULLDOWN_CTRL); adata->sdw_en_stat = check_acp_sdw_enable_status(adata); if (adata->sdw_en_stat) { writel(1, adata->acp63_base + ACP_ZSC_DSP_CTRL); @@ -197,6 +199,7 @@ static int __maybe_unused snd_acp63_runtime_resume(struct device *dev) static int __maybe_unused snd_acp63_resume(struct device *dev) { struct acp63_dev_data *adata; + u32 acp_sw_pad_keeper_en; int ret; adata = dev_get_drvdata(dev); @@ -209,6 +212,12 @@ static int __maybe_unused snd_acp63_resume(struct device *dev) if (ret) dev_err(dev, "ACP init failed\n"); + acp_sw_pad_keeper_en = readl(adata->acp63_base + ACP_SW0_PAD_KEEPER_EN); + dev_dbg(dev, "ACP_SW0_PAD_KEEPER_EN:0x%x\n", acp_sw_pad_keeper_en); + if (!acp_sw_pad_keeper_en) { + writel(adata->acp_sw_pad_keeper_en, adata->acp63_base + ACP_SW0_PAD_KEEPER_EN); + writel(adata->acp_pad_pulldown_ctrl, adata->acp63_base + ACP_PAD_PULLDOWN_CTRL); + } return ret; } @@ -408,6 +417,8 @@ static int __maybe_unused snd_acp70_suspend(struct device *dev) adata = dev_get_drvdata(dev); if (adata->is_sdw_dev) { + adata->acp_sw_pad_keeper_en = readl(adata->acp63_base + ACP_SW0_PAD_KEEPER_EN); + adata->acp_pad_pulldown_ctrl = readl(adata->acp63_base + ACP_PAD_PULLDOWN_CTRL); adata->sdw_en_stat = check_acp_sdw_enable_status(adata); if (adata->sdw_en_stat) { writel(1, adata->acp63_base + ACP_ZSC_DSP_CTRL); @@ -445,6 +456,7 @@ static int __maybe_unused snd_acp70_runtime_resume(struct device *dev) static int __maybe_unused snd_acp70_resume(struct device *dev) { struct acp63_dev_data *adata; + u32 acp_sw_pad_keeper_en; int ret; adata = dev_get_drvdata(dev); @@ -459,6 +471,12 @@ static int __maybe_unused snd_acp70_resume(struct device *dev) if (ret) dev_err(dev, "ACP init failed\n"); + acp_sw_pad_keeper_en = readl(adata->acp63_base + ACP_SW0_PAD_KEEPER_EN); + dev_dbg(dev, "ACP_SW0_PAD_KEEPER_EN:0x%x\n", acp_sw_pad_keeper_en); + if (!acp_sw_pad_keeper_en) { + writel(adata->acp_sw_pad_keeper_en, adata->acp63_base + ACP_SW0_PAD_KEEPER_EN); + writel(adata->acp_pad_pulldown_ctrl, adata->acp63_base + ACP_PAD_PULLDOWN_CTRL); + } return ret; } From ff8abbd248c1f52df0c321690b88454b13ff54b2 Mon Sep 17 00:00:00 2001 From: Paulo Alcantara Date: Sun, 22 Jun 2025 14:13:40 -0300 Subject: [PATCH 107/289] smb: client: fix regression with native SMB symlinks Some users and customers reported that their backup/copy tools started to fail when the directory being copied contained symlink targets that the client couldn't parse - even when those symlinks weren't followed. Fix this by allowing lstat(2) and readlink(2) to succeed even when the client can't resolve the symlink target, restoring old behavior. Cc: linux-cifs@vger.kernel.org Cc: stable@vger.kernel.org Reported-by: Remy Monsen Closes: https://lore.kernel.org/r/CAN+tdP7y=jqw3pBndZAGjQv0ObFq8Q=+PUDHgB36HdEz9QA6FQ@mail.gmail.com Reported-by: Pierguido Lambri Fixes: 12b466eb52d9 ("cifs: Fix creating and resolving absolute NT-style symlinks") Signed-off-by: Paulo Alcantara (Red Hat) Signed-off-by: Steve French --- fs/smb/client/reparse.c | 20 ++++---------------- 1 file changed, 4 insertions(+), 16 deletions(-) diff --git a/fs/smb/client/reparse.c b/fs/smb/client/reparse.c index 511611206dab..1c40e42e4d89 100644 --- a/fs/smb/client/reparse.c +++ b/fs/smb/client/reparse.c @@ -875,15 +875,8 @@ globalroot: abs_path += sizeof("\\DosDevices\\")-1; else if (strstarts(abs_path, "\\GLOBAL??\\")) abs_path += sizeof("\\GLOBAL??\\")-1; - else { - /* Unhandled absolute symlink, points outside of DOS/Win32 */ - cifs_dbg(VFS, - "absolute symlink '%s' cannot be converted from NT format " - "because points to unknown target\n", - smb_target); - rc = -EIO; - goto out; - } + else + goto out_unhandled_target; /* Sometimes path separator after \?? is double backslash */ if (abs_path[0] == '\\') @@ -910,13 +903,7 @@ globalroot: abs_path++; abs_path[0] = drive_letter; } else { - /* Unhandled absolute symlink. Report an error. */ - cifs_dbg(VFS, - "absolute symlink '%s' cannot be converted from NT format " - "because points to unknown target\n", - smb_target); - rc = -EIO; - goto out; + goto out_unhandled_target; } abs_path_len = strlen(abs_path)+1; @@ -966,6 +953,7 @@ globalroot: * These paths have same format as Linux symlinks, so no * conversion is needed. */ +out_unhandled_target: linux_target = smb_target; smb_target = NULL; } From 88a80066af1617fab444776135d840467414beb6 Mon Sep 17 00:00:00 2001 From: Fengnan Chang Date: Mon, 23 Jun 2025 19:02:18 +0800 Subject: [PATCH 108/289] io_uring: make fallocate be hashed work Like ftruncate and write, fallocate operations on the same file cannot be executed in parallel, so it is better to make fallocate be hashed work. Signed-off-by: Fengnan Chang Link: https://lore.kernel.org/r/20250623110218.61490-1-changfengnan@bytedance.com Signed-off-by: Jens Axboe --- io_uring/opdef.c | 1 + 1 file changed, 1 insertion(+) diff --git a/io_uring/opdef.c b/io_uring/opdef.c index 6e0882b051f9..6de6229207a8 100644 --- a/io_uring/opdef.c +++ b/io_uring/opdef.c @@ -216,6 +216,7 @@ const struct io_issue_def io_issue_defs[] = { }, [IORING_OP_FALLOCATE] = { .needs_file = 1, + .hash_reg_file = 1, .prep = io_fallocate_prep, .issue = io_fallocate, }, From 9c19b3315cef8b62d2c037616a66b4446b966f6d Mon Sep 17 00:00:00 2001 From: Nikhil Jha Date: Wed, 11 Jun 2025 15:46:39 -0400 Subject: [PATCH 109/289] sunrpc: fix loop in gss seqno cache There was a silly bug in the initial implementation where a loop variable was not incremented. This commit increments the loop variable. This bug is somewhat tricky to catch because it can only happen on loops of two or more. If it is hit, it locks up a kernel thread in an infinite loop. Signed-off-by: Nikhil Jha Tested-by: Nikhil Jha Fixes: 08d6ee6d8a10 ("sunrpc: implement rfc2203 rpcsec_gss seqnum cache") Reviewed-by: Benjamin Coddington Signed-off-by: Anna Schumaker --- net/sunrpc/auth_gss/auth_gss.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/sunrpc/auth_gss/auth_gss.c b/net/sunrpc/auth_gss/auth_gss.c index 0fa244f16876..7b943fbafcc3 100644 --- a/net/sunrpc/auth_gss/auth_gss.c +++ b/net/sunrpc/auth_gss/auth_gss.c @@ -1724,7 +1724,7 @@ gss_validate(struct rpc_task *task, struct xdr_stream *xdr) maj_stat = gss_validate_seqno_mic(ctx, task->tk_rqstp->rq_seqnos[0], seq, p, len); /* RFC 2203 5.3.3.1 - compute the checksum of each sequence number in the cache */ while (unlikely(maj_stat == GSS_S_BAD_SIG && i < task->tk_rqstp->rq_seqno_count)) - maj_stat = gss_validate_seqno_mic(ctx, task->tk_rqstp->rq_seqnos[i], seq, p, len); + maj_stat = gss_validate_seqno_mic(ctx, task->tk_rqstp->rq_seqnos[i++], seq, p, len); if (maj_stat == GSS_S_CONTEXT_EXPIRED) clear_bit(RPCAUTH_CRED_UPTODATE, &cred->cr_flags); if (maj_stat) From e8d6f3ab59468e230f3253efe5cb63efa35289f7 Mon Sep 17 00:00:00 2001 From: Kuniyuki Iwashima Date: Thu, 12 Jun 2025 14:52:50 -0700 Subject: [PATCH 110/289] nfs: Clean up /proc/net/rpc/nfs when nfs_fs_proc_net_init() fails. syzbot reported a warning below [1] following a fault injection in nfs_fs_proc_net_init(). [0] When nfs_fs_proc_net_init() fails, /proc/net/rpc/nfs is not removed. Later, rpc_proc_exit() tries to remove /proc/net/rpc, and the warning is logged as the directory is not empty. Let's handle the error of nfs_fs_proc_net_init() properly. [0]: FAULT_INJECTION: forcing a failure. name failslab, interval 1, probability 0, space 0, times 0 CPU: 1 UID: 0 PID: 6120 Comm: syz.2.27 Not tainted 6.16.0-rc1-syzkaller-00010-g2c4a1f3fe03e #0 PREEMPT(full) Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/07/2025 Call Trace: dump_stack_lvl (lib/dump_stack.c:123) should_fail_ex (lib/fault-inject.c:73 lib/fault-inject.c:174) should_failslab (mm/failslab.c:46) kmem_cache_alloc_noprof (mm/slub.c:4178 mm/slub.c:4204) __proc_create (fs/proc/generic.c:427) proc_create_reg (fs/proc/generic.c:554) proc_create_net_data (fs/proc/proc_net.c:120) nfs_fs_proc_net_init (fs/nfs/client.c:1409) nfs_net_init (fs/nfs/inode.c:2600) ops_init (net/core/net_namespace.c:138) setup_net (net/core/net_namespace.c:443) copy_net_ns (net/core/net_namespace.c:576) create_new_namespaces (kernel/nsproxy.c:110) unshare_nsproxy_namespaces (kernel/nsproxy.c:218 (discriminator 4)) ksys_unshare (kernel/fork.c:3123) __x64_sys_unshare (kernel/fork.c:3190) do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130) [1]: remove_proc_entry: removing non-empty directory 'net/rpc', leaking at least 'nfs' WARNING: CPU: 1 PID: 6120 at fs/proc/generic.c:727 remove_proc_entry+0x45e/0x530 fs/proc/generic.c:727 Modules linked in: CPU: 1 UID: 0 PID: 6120 Comm: syz.2.27 Not tainted 6.16.0-rc1-syzkaller-00010-g2c4a1f3fe03e #0 PREEMPT(full) Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/07/2025 RIP: 0010:remove_proc_entry+0x45e/0x530 fs/proc/generic.c:727 Code: 3c 02 00 0f 85 85 00 00 00 48 8b 93 d8 00 00 00 4d 89 f0 4c 89 e9 48 c7 c6 40 ba a2 8b 48 c7 c7 60 b9 a2 8b e8 33 81 1d ff 90 <0f> 0b 90 90 e9 5f fe ff ff e8 04 69 5e ff 90 48 b8 00 00 00 00 00 RSP: 0018:ffffc90003637b08 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffff88805f534140 RCX: ffffffff817a92c8 RDX: ffff88807da99e00 RSI: ffffffff817a92d5 RDI: 0000000000000001 RBP: ffff888033431ac0 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000001 R12: ffff888033431a00 R13: ffff888033431ae4 R14: ffff888033184724 R15: dffffc0000000000 FS: 0000555580328500(0000) GS:ffff888124a62000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f71733743e0 CR3: 000000007f618000 CR4: 00000000003526f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: sunrpc_exit_net+0x46/0x90 net/sunrpc/sunrpc_syms.c:76 ops_exit_list net/core/net_namespace.c:200 [inline] ops_undo_list+0x2eb/0xab0 net/core/net_namespace.c:253 setup_net+0x2e1/0x510 net/core/net_namespace.c:457 copy_net_ns+0x2a6/0x5f0 net/core/net_namespace.c:574 create_new_namespaces+0x3ea/0xa90 kernel/nsproxy.c:110 unshare_nsproxy_namespaces+0xc0/0x1f0 kernel/nsproxy.c:218 ksys_unshare+0x45b/0xa40 kernel/fork.c:3121 __do_sys_unshare kernel/fork.c:3192 [inline] __se_sys_unshare kernel/fork.c:3190 [inline] __x64_sys_unshare+0x31/0x40 kernel/fork.c:3190 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xcd/0x490 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7fa1a6b8e929 Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007fff3a090368 EFLAGS: 00000246 ORIG_RAX: 0000000000000110 RAX: ffffffffffffffda RBX: 00007fa1a6db5fa0 RCX: 00007fa1a6b8e929 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000040000080 RBP: 00007fa1a6c10b39 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 00007fa1a6db5fa0 R14: 00007fa1a6db5fa0 R15: 0000000000000001 Fixes: d47151b79e32 ("nfs: expose /proc/net/sunrpc/nfs in net namespaces") Reported-by: syzbot+a4cc4ac22daa4a71b87c@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=a4cc4ac22daa4a71b87c Tested-by: syzbot+a4cc4ac22daa4a71b87c@syzkaller.appspotmail.com Signed-off-by: Kuniyuki Iwashima Signed-off-by: Anna Schumaker --- fs/nfs/inode.c | 17 ++++++++++++++--- 1 file changed, 14 insertions(+), 3 deletions(-) diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c index 8ab7868807a7..a2fa6bc4d74e 100644 --- a/fs/nfs/inode.c +++ b/fs/nfs/inode.c @@ -2589,15 +2589,26 @@ EXPORT_SYMBOL_GPL(nfs_net_id); static int nfs_net_init(struct net *net) { struct nfs_net *nn = net_generic(net, nfs_net_id); + int err; nfs_clients_init(net); if (!rpc_proc_register(net, &nn->rpcstats)) { - nfs_clients_exit(net); - return -ENOMEM; + err = -ENOMEM; + goto err_proc_rpc; } - return nfs_fs_proc_net_init(net); + err = nfs_fs_proc_net_init(net); + if (err) + goto err_proc_nfs; + + return 0; + +err_proc_nfs: + rpc_proc_unregister(net, "nfs"); +err_proc_rpc: + nfs_clients_exit(net); + return err; } static void nfs_net_exit(struct net *net) From c01776287414ca43412d1319d2877cbad65444ac Mon Sep 17 00:00:00 2001 From: Benjamin Coddington Date: Thu, 19 Jun 2025 11:02:21 -0400 Subject: [PATCH 111/289] NFSv4/pNFS: Fix a race to wake on NFS_LAYOUT_DRAIN We found a few different systems hung up in writeback waiting on the same page lock, and one task waiting on the NFS_LAYOUT_DRAIN bit in pnfs_update_layout(), however the pnfs_layout_hdr's plh_outstanding count was zero. It seems most likely that this is another race between the waiter and waker similar to commit ed0172af5d6f ("SUNRPC: Fix a race to wake a sync task"). Fix it up by applying the advised barrier. Fixes: 880265c77ac4 ("pNFS: Avoid a live lock condition in pnfs_update_layout()") Signed-off-by: Benjamin Coddington Signed-off-by: Anna Schumaker --- fs/nfs/pnfs.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c index 3adb7d0dbec7..1a7ec68bde15 100644 --- a/fs/nfs/pnfs.c +++ b/fs/nfs/pnfs.c @@ -2059,8 +2059,10 @@ static void nfs_layoutget_begin(struct pnfs_layout_hdr *lo) static void nfs_layoutget_end(struct pnfs_layout_hdr *lo) { if (atomic_dec_and_test(&lo->plh_outstanding) && - test_and_clear_bit(NFS_LAYOUT_DRAIN, &lo->plh_flags)) + test_and_clear_bit(NFS_LAYOUT_DRAIN, &lo->plh_flags)) { + smp_mb__after_atomic(); wake_up_bit(&lo->plh_flags, NFS_LAYOUT_DRAIN); + } } static bool pnfs_is_first_layoutget(struct pnfs_layout_hdr *lo) From 9a07ca9a4015f8f71e2b594ee76ac55483babd89 Mon Sep 17 00:00:00 2001 From: Chris Chiu Date: Mon, 23 Jun 2025 14:30:23 +0800 Subject: [PATCH 112/289] ALSA: hda/realtek: fix mute/micmute LEDs for HP EliteBook 6 G1a HP EliteBook 6 G1a laptops use ALC236 codec and need the fixup ALC236_FIXUP_HP_MUTE_LED_MICMUTE_VREF to make the mic/micmute LEDs work. Signed-off-by: Chris Chiu Link: https://patch.msgid.link/20250623063023.374920-1-chris.chiu@canonical.com Signed-off-by: Takashi Iwai --- sound/pci/hda/patch_realtek.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c index e144ebb08841..166730f00e5e 100644 --- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -10909,7 +10909,9 @@ static const struct hda_quirk alc269_fixup_tbl[] = { SND_PCI_QUIRK(0x103c, 0x8def, "HP EliteBook 660 G12", ALC236_FIXUP_HP_GPIO_LED), SND_PCI_QUIRK(0x103c, 0x8df0, "HP EliteBook 630 G12", ALC236_FIXUP_HP_GPIO_LED), SND_PCI_QUIRK(0x103c, 0x8df1, "HP EliteBook 630 G12", ALC236_FIXUP_HP_GPIO_LED), + SND_PCI_QUIRK(0x103c, 0x8dfb, "HP EliteBook 6 G1a 14", ALC236_FIXUP_HP_MUTE_LED_MICMUTE_VREF), SND_PCI_QUIRK(0x103c, 0x8dfc, "HP EliteBook 645 G12", ALC236_FIXUP_HP_GPIO_LED), + SND_PCI_QUIRK(0x103c, 0x8dfd, "HP EliteBook 6 G1a 16", ALC236_FIXUP_HP_MUTE_LED_MICMUTE_VREF), SND_PCI_QUIRK(0x103c, 0x8dfe, "HP EliteBook 665 G12", ALC236_FIXUP_HP_GPIO_LED), SND_PCI_QUIRK(0x103c, 0x8e11, "HP Trekker", ALC287_FIXUP_CS35L41_I2C_2), SND_PCI_QUIRK(0x103c, 0x8e12, "HP Trekker", ALC287_FIXUP_CS35L41_I2C_2), From fb4e2a6e8f28a3c0ad382e363aeb9cd822007b8a Mon Sep 17 00:00:00 2001 From: Youngjun Lee Date: Mon, 23 Jun 2025 20:05:25 +0900 Subject: [PATCH 113/289] ALSA: usb-audio: Fix out-of-bounds read in snd_usb_get_audioformat_uac3() In snd_usb_get_audioformat_uac3(), the length value returned from snd_usb_ctl_msg() is used directly for memory allocation without validation. This length is controlled by the USB device. The allocated buffer is cast to a uac3_cluster_header_descriptor and its fields are accessed without verifying that the buffer is large enough. If the device returns a smaller than expected length, this leads to an out-of-bounds read. Add a length check to ensure the buffer is large enough for uac3_cluster_header_descriptor. Signed-off-by: Youngjun Lee Fixes: 9a2fe9b801f5 ("ALSA: usb: initial USB Audio Device Class 3.0 support") Link: https://patch.msgid.link/20250623-uac3-oob-fix-v1-1-527303eaf40a@samsung.com Signed-off-by: Takashi Iwai --- sound/usb/stream.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/sound/usb/stream.c b/sound/usb/stream.c index c1ea8844a46f..aa91d63749f2 100644 --- a/sound/usb/stream.c +++ b/sound/usb/stream.c @@ -987,6 +987,8 @@ snd_usb_get_audioformat_uac3(struct snd_usb_audio *chip, * and request Cluster Descriptor */ wLength = le16_to_cpu(hc_header.wLength); + if (wLength < sizeof(cluster)) + return NULL; cluster = kzalloc(wLength, GFP_KERNEL); if (!cluster) return ERR_PTR(-ENOMEM); From 41c66461cb2e8d3934a5395f27e572ebe63696b4 Mon Sep 17 00:00:00 2001 From: Takashi Iwai Date: Mon, 23 Jun 2025 17:18:39 +0200 Subject: [PATCH 114/289] ALSA: hda/realtek: Add mic-mute LED setup for ASUS UM5606 ASUS UM5606* models use the quirk to set up the bass speakers, but it missed the mic-mute LED configuration. Other similar models have the AMD ACP dmic, and the mic-mute is set up for that, but those models don't have AMD ACP but rather built-in mics of Realtek codec, hence the Realtek driver should set it up, instead. Link: https://bugzilla.kernel.org/show_bug.cgi?id=220125 Link: https://patch.msgid.link/20250623151841.28810-1-tiwai@suse.de Signed-off-by: Takashi Iwai --- sound/pci/hda/patch_realtek.c | 1 + 1 file changed, 1 insertion(+) diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c index 166730f00e5e..784644a2b51d 100644 --- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -6610,6 +6610,7 @@ static void alc294_fixup_bass_speaker_15(struct hda_codec *codec, if (action == HDA_FIXUP_ACT_PRE_PROBE) { static const hda_nid_t conn[] = { 0x02, 0x03 }; snd_hda_override_conn_list(codec, 0x15, ARRAY_SIZE(conn), conn); + snd_hda_gen_add_micmute_led_cdev(codec, NULL); } } From 5aa326a6a2ff0a7e7c6e11777045e66704c2d5e4 Mon Sep 17 00:00:00 2001 From: Manivannan Sadhasivam Date: Sun, 8 Jun 2025 09:03:05 +0530 Subject: [PATCH 115/289] PCI/PTM: Build debugfs code only if CONFIG_DEBUG_FS is enabled Otherwise, the following build error will happen for CONFIG_DEBUG_FS=n && CONFIG_PCIE_PTM=y: drivers/pci/pcie/ptm.c:498:25: error: redefinition of 'pcie_ptm_create_debugfs' 498 | struct pci_ptm_debugfs *pcie_ptm_create_debugfs(struct device *dev, void *pdata, | ^ ./include/linux/pci.h:1915:2: note: previous definition is here 1915 | *pcie_ptm_create_debugfs(struct device *dev, void *pdata, | ^ drivers/pci/pcie/ptm.c:546:6: error: redefinition of 'pcie_ptm_destroy_debugfs' 546 | void pcie_ptm_destroy_debugfs(struct pci_ptm_debugfs *ptm_debugfs) | ^ ./include/linux/pci.h:1918:1: note: previous definition is here 1918 | pcie_ptm_destroy_debugfs(struct pci_ptm_debugfs *ptm_debugfs) { } | Fixes: 132833405e61 ("PCI: Add debugfs support for exposing PTM context") Reported-by: Eric Biggers Closes: https://lore.kernel.org/linux-pci/20250607025506.GA16607@sol Signed-off-by: Manivannan Sadhasivam Signed-off-by: Bjorn Helgaas Tested-by: Eric Biggers Reviewed-by: Kuppuswamy Sathyanarayanan Link: https://patch.msgid.link/20250608033305.15214-1-manivannan.sadhasivam@linaro.org --- drivers/pci/pcie/ptm.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/pci/pcie/ptm.c b/drivers/pci/pcie/ptm.c index ee5f615a9023..4bd73f038ffb 100644 --- a/drivers/pci/pcie/ptm.c +++ b/drivers/pci/pcie/ptm.c @@ -254,6 +254,7 @@ bool pcie_ptm_enabled(struct pci_dev *dev) } EXPORT_SYMBOL(pcie_ptm_enabled); +#if IS_ENABLED(CONFIG_DEBUG_FS) static ssize_t context_update_write(struct file *file, const char __user *ubuf, size_t count, loff_t *ppos) { @@ -552,3 +553,4 @@ void pcie_ptm_destroy_debugfs(struct pci_ptm_debugfs *ptm_debugfs) debugfs_remove_recursive(ptm_debugfs->debugfs); } EXPORT_SYMBOL_GPL(pcie_ptm_destroy_debugfs); +#endif From 2b8be57fa0c88ac824a906f29c04d728f9f6047a Mon Sep 17 00:00:00 2001 From: Zhe Qiao Date: Thu, 19 Jun 2025 15:26:08 +0800 Subject: [PATCH 116/289] Revert "PCI/ACPI: Fix allocated memory release on error in pci_acpi_scan_root()" This reverts commit 631b2af2f357 ("PCI/ACPI: Fix allocated memory release on error in pci_acpi_scan_root()"). The reverted patch causes the 'ri->cfg' and 'root_ops' resources to be released multiple times. When acpi_pci_root_create() fails, these resources have already been released internally by the __acpi_pci_root_release_info() function. Releasing them again in pci_acpi_scan_root() leads to incorrect behavior and potential memory issues. We plan to resolve the issue using a more appropriate fix. Reported-by: Dan Carpenter Closes: https://lore.kernel.org/all/aEmdnuw715btq7Q5@stanley.mountain/ Signed-off-by: Zhe Qiao Acked-by: Dan Carpenter Link: https://patch.msgid.link/20250619072608.2075475-1-qiaozhe@iscas.ac.cn Signed-off-by: Rafael J. Wysocki --- drivers/pci/pci-acpi.c | 23 ++++++++++------------- 1 file changed, 10 insertions(+), 13 deletions(-) diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c index b78e0e417324..af370628e583 100644 --- a/drivers/pci/pci-acpi.c +++ b/drivers/pci/pci-acpi.c @@ -1676,19 +1676,24 @@ struct pci_bus *pci_acpi_scan_root(struct acpi_pci_root *root) return NULL; root_ops = kzalloc(sizeof(*root_ops), GFP_KERNEL); - if (!root_ops) - goto free_ri; + if (!root_ops) { + kfree(ri); + return NULL; + } ri->cfg = pci_acpi_setup_ecam_mapping(root); - if (!ri->cfg) - goto free_root_ops; + if (!ri->cfg) { + kfree(ri); + kfree(root_ops); + return NULL; + } root_ops->release_info = pci_acpi_generic_release_info; root_ops->prepare_resources = pci_acpi_root_prepare_resources; root_ops->pci_ops = (struct pci_ops *)&ri->cfg->ops->pci_ops; bus = acpi_pci_root_create(root, root_ops, &ri->common, ri->cfg); if (!bus) - goto free_cfg; + return NULL; /* If we must preserve the resource configuration, claim now */ host = pci_find_host_bridge(bus); @@ -1705,14 +1710,6 @@ struct pci_bus *pci_acpi_scan_root(struct acpi_pci_root *root) pcie_bus_configure_settings(child); return bus; - -free_cfg: - pci_ecam_free(ri->cfg); -free_root_ops: - kfree(root_ops); -free_ri: - kfree(ri); - return NULL; } void pcibios_add_bus(struct pci_bus *bus) From 002cc0ee90e6172ef40ae95c248bf6ce67fb3f3f Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Andr=C3=A9=20Draszik?= Date: Wed, 9 Apr 2025 21:37:46 +0100 Subject: [PATCH 117/289] rtc: s5m: cache device type during probe MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit platform_get_device_id() is called mulitple times during probe to retrieve the device type. This makes the code harder to read than necessary. Just get the type once, which also trims the lengths of the lines involved. Signed-off-by: André Draszik Link: https://lore.kernel.org/r/20250409-s2mpg10-v4-25-d66d5f39b6bf@linaro.org Signed-off-by: Alexandre Belloni --- drivers/rtc/rtc-s5m.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/drivers/rtc/rtc-s5m.c b/drivers/rtc/rtc-s5m.c index db5c9b641277..c7636738b797 100644 --- a/drivers/rtc/rtc-s5m.c +++ b/drivers/rtc/rtc-s5m.c @@ -637,6 +637,8 @@ static int s5m8767_rtc_init_reg(struct s5m_rtc_info *info) static int s5m_rtc_probe(struct platform_device *pdev) { struct sec_pmic_dev *s5m87xx = dev_get_drvdata(pdev->dev.parent); + enum sec_device_type device_type = + platform_get_device_id(pdev)->driver_data; struct s5m_rtc_info *info; struct i2c_client *i2c; const struct regmap_config *regmap_cfg; @@ -646,7 +648,7 @@ static int s5m_rtc_probe(struct platform_device *pdev) if (!info) return -ENOMEM; - switch (platform_get_device_id(pdev)->driver_data) { + switch (device_type) { case S2MPS15X: regmap_cfg = &s2mps14_rtc_regmap_config; info->regs = &s2mps15_rtc_regs; @@ -669,8 +671,8 @@ static int s5m_rtc_probe(struct platform_device *pdev) break; default: return dev_err_probe(&pdev->dev, -ENODEV, - "Device type %lu is not supported by RTC driver\n", - platform_get_device_id(pdev)->driver_data); + "Device type %d is not supported by RTC driver\n", + device_type); } i2c = devm_i2c_new_dummy_device(&pdev->dev, s5m87xx->i2c->adapter, @@ -686,7 +688,7 @@ static int s5m_rtc_probe(struct platform_device *pdev) info->dev = &pdev->dev; info->s5m87xx = s5m87xx; - info->device_type = platform_get_device_id(pdev)->driver_data; + info->device_type = device_type; if (s5m87xx->irq_data) { info->irq = regmap_irq_get_virq(s5m87xx->irq_data, alarm_irq); From a57743bf009e788ada3607ac6b74ae9ebbd4ed74 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Andr=C3=A9=20Draszik?= Date: Wed, 9 Apr 2025 21:37:47 +0100 Subject: [PATCH 118/289] rtc: s5m: prepare for external regmap MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The Samsung S2MPG10 PMIC is not connected via I2C as this driver assumes, hence this driver's current approach of creating an I2C-based regmap doesn't work for it, and this driver should use the regmap provided by the parent (core) driver instead for that PMIC. To prepare this driver for s2mpg support, restructure the code to only create a regmap if one isn't provided by the parent. No functional changes, since the parent doesn't provide a regmap for any of the PMICs currently supported by this driver. Having this change separate will simply make the addition of S2MPG10 support more self-contained, without additional restructuring. Reviewed-by: Krzysztof Kozlowski Signed-off-by: André Draszik Link: https://lore.kernel.org/r/20250409-s2mpg10-v4-26-d66d5f39b6bf@linaro.org Signed-off-by: Alexandre Belloni --- drivers/rtc/rtc-s5m.c | 81 ++++++++++++++++++++++++------------------- 1 file changed, 45 insertions(+), 36 deletions(-) diff --git a/drivers/rtc/rtc-s5m.c b/drivers/rtc/rtc-s5m.c index c7636738b797..f8abcdee8611 100644 --- a/drivers/rtc/rtc-s5m.c +++ b/drivers/rtc/rtc-s5m.c @@ -640,52 +640,61 @@ static int s5m_rtc_probe(struct platform_device *pdev) enum sec_device_type device_type = platform_get_device_id(pdev)->driver_data; struct s5m_rtc_info *info; - struct i2c_client *i2c; - const struct regmap_config *regmap_cfg; int ret, alarm_irq; info = devm_kzalloc(&pdev->dev, sizeof(*info), GFP_KERNEL); if (!info) return -ENOMEM; - switch (device_type) { - case S2MPS15X: - regmap_cfg = &s2mps14_rtc_regmap_config; - info->regs = &s2mps15_rtc_regs; - alarm_irq = S2MPS14_IRQ_RTCA0; - break; - case S2MPS14X: - regmap_cfg = &s2mps14_rtc_regmap_config; - info->regs = &s2mps14_rtc_regs; - alarm_irq = S2MPS14_IRQ_RTCA0; - break; - case S2MPS13X: - regmap_cfg = &s2mps14_rtc_regmap_config; - info->regs = &s2mps13_rtc_regs; - alarm_irq = S2MPS14_IRQ_RTCA0; - break; - case S5M8767X: - regmap_cfg = &s5m_rtc_regmap_config; - info->regs = &s5m_rtc_regs; - alarm_irq = S5M8767_IRQ_RTCA1; - break; - default: + info->regmap = dev_get_regmap(pdev->dev.parent, "rtc"); + if (!info->regmap) { + const struct regmap_config *regmap_cfg; + struct i2c_client *i2c; + + switch (device_type) { + case S2MPS15X: + regmap_cfg = &s2mps14_rtc_regmap_config; + info->regs = &s2mps15_rtc_regs; + alarm_irq = S2MPS14_IRQ_RTCA0; + break; + case S2MPS14X: + regmap_cfg = &s2mps14_rtc_regmap_config; + info->regs = &s2mps14_rtc_regs; + alarm_irq = S2MPS14_IRQ_RTCA0; + break; + case S2MPS13X: + regmap_cfg = &s2mps14_rtc_regmap_config; + info->regs = &s2mps13_rtc_regs; + alarm_irq = S2MPS14_IRQ_RTCA0; + break; + case S5M8767X: + regmap_cfg = &s5m_rtc_regmap_config; + info->regs = &s5m_rtc_regs; + alarm_irq = S5M8767_IRQ_RTCA1; + break; + default: + return dev_err_probe(&pdev->dev, -ENODEV, + "Unsupported device type %d\n", + device_type); + } + + i2c = devm_i2c_new_dummy_device(&pdev->dev, + s5m87xx->i2c->adapter, + RTC_I2C_ADDR); + if (IS_ERR(i2c)) + return dev_err_probe(&pdev->dev, PTR_ERR(i2c), + "Failed to allocate I2C\n"); + + info->regmap = devm_regmap_init_i2c(i2c, regmap_cfg); + if (IS_ERR(info->regmap)) + return dev_err_probe(&pdev->dev, PTR_ERR(info->regmap), + "Failed to allocate regmap\n"); + } else { return dev_err_probe(&pdev->dev, -ENODEV, - "Device type %d is not supported by RTC driver\n", + "Unsupported device type %d\n", device_type); } - i2c = devm_i2c_new_dummy_device(&pdev->dev, s5m87xx->i2c->adapter, - RTC_I2C_ADDR); - if (IS_ERR(i2c)) - return dev_err_probe(&pdev->dev, PTR_ERR(i2c), - "Failed to allocate I2C for RTC\n"); - - info->regmap = devm_regmap_init_i2c(i2c, regmap_cfg); - if (IS_ERR(info->regmap)) - return dev_err_probe(&pdev->dev, PTR_ERR(info->regmap), - "Failed to allocate RTC register map\n"); - info->dev = &pdev->dev; info->s5m87xx = s5m87xx; info->device_type = device_type; From e64180846e7e4397938bbd0acf0640836f0f8051 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Andr=C3=A9=20Draszik?= Date: Wed, 9 Apr 2025 21:37:48 +0100 Subject: [PATCH 119/289] rtc: s5m: add support for S2MPG10 RTC MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add support for Samsung's S2MPG10 PMIC RTC, which is similar to the existing PMIC RTCs supported by this driver. S2MPG10 doesn't use I2C, so we expect the core driver to have created a regmap for us. Additionally, it can be used for doing a cold-reset. If requested to do so (via DT), S2MPG10 is programmed with a watchdog configuration that will perform a full power cycle upon watchdog expiry. Reviewed-by: Krzysztof Kozlowski Signed-off-by: André Draszik Link: https://lore.kernel.org/r/20250409-s2mpg10-v4-27-d66d5f39b6bf@linaro.org Signed-off-by: Alexandre Belloni --- drivers/rtc/rtc-s5m.c | 60 +++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 60 insertions(+) diff --git a/drivers/rtc/rtc-s5m.c b/drivers/rtc/rtc-s5m.c index f8abcdee8611..c6394faaee86 100644 --- a/drivers/rtc/rtc-s5m.c +++ b/drivers/rtc/rtc-s5m.c @@ -10,6 +10,7 @@ #include #include #include +#include #include #include #include @@ -53,6 +54,7 @@ enum { * Device | Write time | Read time | Write alarm * ================================================= * S5M8767 | UDR + TIME | | UDR + * S2MPG10 | WUDR | RUDR | AUDR * S2MPS11/14 | WUDR | RUDR | WUDR + RUDR * S2MPS13 | WUDR | RUDR | WUDR + AUDR * S2MPS15 | WUDR | RUDR | AUDR @@ -99,6 +101,20 @@ static const struct s5m_rtc_reg_config s5m_rtc_regs = { .write_alarm_udr_mask = S5M_RTC_UDR_MASK, }; +/* Register map for S2MPG10 */ +static const struct s5m_rtc_reg_config s2mpg10_rtc_regs = { + .regs_count = 7, + .time = S2MPG10_RTC_SEC, + .ctrl = S2MPG10_RTC_CTRL, + .alarm0 = S2MPG10_RTC_A0SEC, + .alarm1 = S2MPG10_RTC_A1SEC, + .udr_update = S2MPG10_RTC_UPDATE, + .autoclear_udr_mask = S2MPS15_RTC_WUDR_MASK | S2MPS15_RTC_AUDR_MASK, + .read_time_udr_mask = S2MPS_RTC_RUDR_MASK, + .write_time_udr_mask = S2MPS15_RTC_WUDR_MASK, + .write_alarm_udr_mask = S2MPS15_RTC_AUDR_MASK, +}; + /* Register map for S2MPS13 */ static const struct s5m_rtc_reg_config s2mps13_rtc_regs = { .regs_count = 7, @@ -238,6 +254,7 @@ static int s5m_check_peding_alarm_interrupt(struct s5m_rtc_info *info, ret = regmap_read(info->regmap, S5M_RTC_STATUS, &val); val &= S5M_ALARM0_STATUS; break; + case S2MPG10: case S2MPS15X: case S2MPS14X: case S2MPS13X: @@ -300,6 +317,7 @@ static int s5m8767_rtc_set_alarm_reg(struct s5m_rtc_info *info) case S5M8767X: data &= ~S5M_RTC_TIME_EN_MASK; break; + case S2MPG10: case S2MPS15X: case S2MPS14X: case S2MPS13X: @@ -351,6 +369,7 @@ static int s5m_rtc_read_time(struct device *dev, struct rtc_time *tm) switch (info->device_type) { case S5M8767X: + case S2MPG10: case S2MPS15X: case S2MPS14X: case S2MPS13X: @@ -374,6 +393,7 @@ static int s5m_rtc_set_time(struct device *dev, struct rtc_time *tm) switch (info->device_type) { case S5M8767X: + case S2MPG10: case S2MPS15X: case S2MPS14X: case S2MPS13X: @@ -411,6 +431,7 @@ static int s5m_rtc_read_alarm(struct device *dev, struct rtc_wkalrm *alrm) switch (info->device_type) { case S5M8767X: + case S2MPG10: case S2MPS15X: case S2MPS14X: case S2MPS13X: @@ -449,6 +470,7 @@ static int s5m_rtc_stop_alarm(struct s5m_rtc_info *info) switch (info->device_type) { case S5M8767X: + case S2MPG10: case S2MPS15X: case S2MPS14X: case S2MPS13X: @@ -487,6 +509,7 @@ static int s5m_rtc_start_alarm(struct s5m_rtc_info *info) switch (info->device_type) { case S5M8767X: + case S2MPG10: case S2MPS15X: case S2MPS14X: case S2MPS13X: @@ -524,6 +547,7 @@ static int s5m_rtc_set_alarm(struct device *dev, struct rtc_wkalrm *alrm) switch (info->device_type) { case S5M8767X: + case S2MPG10: case S2MPS15X: case S2MPS14X: case S2MPS13X: @@ -604,6 +628,7 @@ static int s5m8767_rtc_init_reg(struct s5m_rtc_info *info) ret = regmap_raw_write(info->regmap, S5M_ALARM0_CONF, data, 2); break; + case S2MPG10: case S2MPS15X: case S2MPS14X: case S2MPS13X: @@ -634,6 +659,25 @@ static int s5m8767_rtc_init_reg(struct s5m_rtc_info *info) return ret; } +static int s5m_rtc_restart_s2mpg10(struct sys_off_data *data) +{ + struct s5m_rtc_info *info = data->cb_data; + int ret; + + if (data->mode != REBOOT_COLD && data->mode != REBOOT_HARD) + return NOTIFY_DONE; + + /* + * Arm watchdog with maximum timeout (2 seconds), and perform full reset + * on expiry. + */ + ret = regmap_set_bits(info->regmap, S2MPG10_RTC_WTSR, + (S2MPG10_WTSR_COLDTIMER | S2MPG10_WTSR_COLDRST + | S2MPG10_WTSR_WTSRT | S2MPG10_WTSR_WTSR_EN)); + + return ret ? NOTIFY_BAD : NOTIFY_DONE; +} + static int s5m_rtc_probe(struct platform_device *pdev) { struct sec_pmic_dev *s5m87xx = dev_get_drvdata(pdev->dev.parent); @@ -689,6 +733,9 @@ static int s5m_rtc_probe(struct platform_device *pdev) if (IS_ERR(info->regmap)) return dev_err_probe(&pdev->dev, PTR_ERR(info->regmap), "Failed to allocate regmap\n"); + } else if (device_type == S2MPG10) { + info->regs = &s2mpg10_rtc_regs; + alarm_irq = S2MPG10_IRQ_RTCA0; } else { return dev_err_probe(&pdev->dev, -ENODEV, "Unsupported device type %d\n", @@ -735,6 +782,18 @@ static int s5m_rtc_probe(struct platform_device *pdev) device_init_wakeup(&pdev->dev, true); } + if (of_device_is_system_power_controller(pdev->dev.parent->of_node) && + info->device_type == S2MPG10) { + ret = devm_register_sys_off_handler(&pdev->dev, + SYS_OFF_MODE_RESTART, + SYS_OFF_PRIO_HIGH + 1, + s5m_rtc_restart_s2mpg10, + info); + if (ret) + return dev_err_probe(&pdev->dev, ret, + "Failed to register restart handler\n"); + } + return devm_rtc_register_device(info->rtc_dev); } @@ -766,6 +825,7 @@ static SIMPLE_DEV_PM_OPS(s5m_rtc_pm_ops, s5m_rtc_suspend, s5m_rtc_resume); static const struct platform_device_id s5m_rtc_id[] = { { "s5m-rtc", S5M8767X }, + { "s2mpg10-rtc", S2MPG10 }, { "s2mps13-rtc", S2MPS13X }, { "s2mps14-rtc", S2MPS14X }, { "s2mps15-rtc", S2MPS15X }, From 972a3b47f6e191a0c5afcc45f4de74bf3d043a75 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Andr=C3=A9=20Draszik?= Date: Wed, 9 Apr 2025 21:37:49 +0100 Subject: [PATCH 120/289] rtc: s5m: fix a typo: peding -> pending MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Fix this minor typo, and adjust the a related incorrect alignment to avoid a checkpatch error. Reviewed-by: Krzysztof Kozlowski Signed-off-by: André Draszik Link: https://lore.kernel.org/r/20250409-s2mpg10-v4-28-d66d5f39b6bf@linaro.org Signed-off-by: Alexandre Belloni --- drivers/rtc/rtc-s5m.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/rtc/rtc-s5m.c b/drivers/rtc/rtc-s5m.c index c6394faaee86..095b090ec394 100644 --- a/drivers/rtc/rtc-s5m.c +++ b/drivers/rtc/rtc-s5m.c @@ -243,8 +243,8 @@ static int s5m8767_wait_for_udr_update(struct s5m_rtc_info *info) return ret; } -static int s5m_check_peding_alarm_interrupt(struct s5m_rtc_info *info, - struct rtc_wkalrm *alarm) +static int s5m_check_pending_alarm_interrupt(struct s5m_rtc_info *info, + struct rtc_wkalrm *alarm) { int ret; unsigned int val; @@ -451,7 +451,7 @@ static int s5m_rtc_read_alarm(struct device *dev, struct rtc_wkalrm *alrm) dev_dbg(dev, "%s: %ptR(%d)\n", __func__, &alrm->time, alrm->time.tm_wday); - return s5m_check_peding_alarm_interrupt(info, alrm); + return s5m_check_pending_alarm_interrupt(info, alrm); } static int s5m_rtc_stop_alarm(struct s5m_rtc_info *info) From 1dd609587414f8b2844e551d1fe0505f12871992 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Andr=C3=A9=20Draszik?= Date: Wed, 9 Apr 2025 21:37:50 +0100 Subject: [PATCH 121/289] rtc: s5m: switch to devm_device_init_wakeup MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit To release memory allocated by device_init_wakeup(true), drivers have to call device_init_wakeup(false) in error paths and unbind. Switch to the new devres managed version devm_device_init_wakeup() to plug this memleak. Reviewed-by: Krzysztof Kozlowski Signed-off-by: André Draszik Link: https://lore.kernel.org/r/20250409-s2mpg10-v4-29-d66d5f39b6bf@linaro.org Signed-off-by: Alexandre Belloni --- drivers/rtc/rtc-s5m.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/rtc/rtc-s5m.c b/drivers/rtc/rtc-s5m.c index 095b090ec394..f4caed953efd 100644 --- a/drivers/rtc/rtc-s5m.c +++ b/drivers/rtc/rtc-s5m.c @@ -779,7 +779,11 @@ static int s5m_rtc_probe(struct platform_device *pdev) return dev_err_probe(&pdev->dev, ret, "Failed to request alarm IRQ %d\n", info->irq); - device_init_wakeup(&pdev->dev, true); + + ret = devm_device_init_wakeup(&pdev->dev); + if (ret < 0) + return dev_err_probe(&pdev->dev, ret, + "Failed to init wakeup\n"); } if (of_device_is_system_power_controller(pdev->dev.parent->of_node) && From f5adb1fa04d08731a13db48afafd1a5f4384d3c9 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Andr=C3=A9=20Draszik?= Date: Wed, 9 Apr 2025 21:37:51 +0100 Subject: [PATCH 122/289] rtc: s5m: replace regmap_update_bits with regmap_clear/set_bits MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The regmap_clear_bits() and regmap_set_bits() helper macros state the intention a bit more obviously. Use those. Reviewed-by: Krzysztof Kozlowski Signed-off-by: André Draszik Link: https://lore.kernel.org/r/20250409-s2mpg10-v4-30-d66d5f39b6bf@linaro.org Signed-off-by: Alexandre Belloni --- drivers/rtc/rtc-s5m.c | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-) diff --git a/drivers/rtc/rtc-s5m.c b/drivers/rtc/rtc-s5m.c index f4caed953efd..27115523b8c2 100644 --- a/drivers/rtc/rtc-s5m.c +++ b/drivers/rtc/rtc-s5m.c @@ -338,8 +338,8 @@ static int s5m8767_rtc_set_alarm_reg(struct s5m_rtc_info *info) /* On S2MPS13 the AUDR is not auto-cleared */ if (info->device_type == S2MPS13X) - regmap_update_bits(info->regmap, info->regs->udr_update, - S2MPS13_RTC_AUDR_MASK, 0); + regmap_clear_bits(info->regmap, info->regs->udr_update, + S2MPS13_RTC_AUDR_MASK); return ret; } @@ -351,10 +351,8 @@ static int s5m_rtc_read_time(struct device *dev, struct rtc_time *tm) int ret; if (info->regs->read_time_udr_mask) { - ret = regmap_update_bits(info->regmap, - info->regs->udr_update, - info->regs->read_time_udr_mask, - info->regs->read_time_udr_mask); + ret = regmap_set_bits(info->regmap, info->regs->udr_update, + info->regs->read_time_udr_mask); if (ret) { dev_err(dev, "Failed to prepare registers for time reading: %d\n", From b1248da008362323f75e8b84874586e9ea4c0b31 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Andr=C3=A9=20Draszik?= Date: Wed, 9 Apr 2025 21:37:52 +0100 Subject: [PATCH 123/289] rtc: s5m: replace open-coded read/modify/write registers with regmap helpers MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Instead of the open-coded read/modify/write sequence, we can simply use the regmap helpers regmap_set_bits() and regmap_update_bits() respectively. This makes the code easier to read, and avoids extra work in case the underlying bus supports updating bits via struct regmap_bus::reg_update_bits() directly (which is the case for S2MPG10 on gs101 where this driver communicates via ACPM). Signed-off-by: André Draszik Link: https://lore.kernel.org/r/20250409-s2mpg10-v4-31-d66d5f39b6bf@linaro.org Signed-off-by: Alexandre Belloni --- drivers/rtc/rtc-s5m.c | 28 +++++++--------------------- 1 file changed, 7 insertions(+), 21 deletions(-) diff --git a/drivers/rtc/rtc-s5m.c b/drivers/rtc/rtc-s5m.c index 27115523b8c2..a7220b4d0e8d 100644 --- a/drivers/rtc/rtc-s5m.c +++ b/drivers/rtc/rtc-s5m.c @@ -279,17 +279,9 @@ static int s5m_check_pending_alarm_interrupt(struct s5m_rtc_info *info, static int s5m8767_rtc_set_time_reg(struct s5m_rtc_info *info) { int ret; - unsigned int data; - ret = regmap_read(info->regmap, info->regs->udr_update, &data); - if (ret < 0) { - dev_err(info->dev, "failed to read update reg(%d)\n", ret); - return ret; - } - - data |= info->regs->write_time_udr_mask; - - ret = regmap_write(info->regmap, info->regs->udr_update, data); + ret = regmap_set_bits(info->regmap, info->regs->udr_update, + info->regs->write_time_udr_mask); if (ret < 0) { dev_err(info->dev, "failed to write update reg(%d)\n", ret); return ret; @@ -303,19 +295,12 @@ static int s5m8767_rtc_set_time_reg(struct s5m_rtc_info *info) static int s5m8767_rtc_set_alarm_reg(struct s5m_rtc_info *info) { int ret; - unsigned int data; + unsigned int udr_mask; - ret = regmap_read(info->regmap, info->regs->udr_update, &data); - if (ret < 0) { - dev_err(info->dev, "%s: fail to read update reg(%d)\n", - __func__, ret); - return ret; - } - - data |= info->regs->write_alarm_udr_mask; + udr_mask = info->regs->write_alarm_udr_mask; switch (info->device_type) { case S5M8767X: - data &= ~S5M_RTC_TIME_EN_MASK; + udr_mask |= S5M_RTC_TIME_EN_MASK; break; case S2MPG10: case S2MPS15X: @@ -327,7 +312,8 @@ static int s5m8767_rtc_set_alarm_reg(struct s5m_rtc_info *info) return -EINVAL; } - ret = regmap_write(info->regmap, info->regs->udr_update, data); + ret = regmap_update_bits(info->regmap, info->regs->udr_update, + udr_mask, info->regs->write_alarm_udr_mask); if (ret < 0) { dev_err(info->dev, "%s: fail to write update reg(%d)\n", __func__, ret); From 10357824151262636fda879845f8b64553541106 Mon Sep 17 00:00:00 2001 From: Chaoyi Chen Date: Fri, 20 Jun 2025 09:16:16 +0800 Subject: [PATCH 124/289] drm/bridge-connector: Fix bridge in drm_connector_hdmi_audio_init() The bridge used in drm_connector_hdmi_audio_init() does not correctly point to the required audio bridge, which lead to incorrect audio configuration input. Fixes: 231adeda9f67 ("drm/bridge-connector: hook DisplayPort audio support") Signed-off-by: Chaoyi Chen Reviewed-by: Dmitry Baryshkov Tested-by: Stephan Gerhold Link: https://lore.kernel.org/r/20250620011616.118-1-kernel@airkyi.com Signed-off-by: Dmitry Baryshkov --- drivers/gpu/drm/display/drm_bridge_connector.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/display/drm_bridge_connector.c b/drivers/gpu/drm/display/drm_bridge_connector.c index 7d2e499ea5de..262e93e07a28 100644 --- a/drivers/gpu/drm/display/drm_bridge_connector.c +++ b/drivers/gpu/drm/display/drm_bridge_connector.c @@ -708,11 +708,14 @@ struct drm_connector *drm_bridge_connector_init(struct drm_device *drm, if (bridge_connector->bridge_hdmi_audio || bridge_connector->bridge_dp_audio) { struct device *dev; + struct drm_bridge *bridge; if (bridge_connector->bridge_hdmi_audio) - dev = bridge_connector->bridge_hdmi_audio->hdmi_audio_dev; + bridge = bridge_connector->bridge_hdmi_audio; else - dev = bridge_connector->bridge_dp_audio->hdmi_audio_dev; + bridge = bridge_connector->bridge_dp_audio; + + dev = bridge->hdmi_audio_dev; ret = drm_connector_hdmi_audio_init(connector, dev, &drm_bridge_connector_hdmi_audio_funcs, From 00a39d8652ff9088de07a6fe6e9e1893452fe0dd Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Mateusz=20Jo=C5=84czyk?= Date: Sat, 7 Jun 2025 23:06:08 +0200 Subject: [PATCH 125/289] rtc: cmos: use spin_lock_irqsave in cmos_interrupt MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit cmos_interrupt() can be called in a non-interrupt context, such as in an ACPI event handler (which runs in an interrupt thread). Therefore, usage of spin_lock(&rtc_lock) is insecure. Use spin_lock_irqsave() / spin_unlock_irqrestore() instead. Before a misguided commit 6950d046eb6e ("rtc: cmos: Replace spin_lock_irqsave with spin_lock in hard IRQ") the cmos_interrupt() function used spin_lock_irqsave(). That commit changed it to spin_lock() and broke locking, which was partially fixed in commit 13be2efc390a ("rtc: cmos: Disable irq around direct invocation of cmos_interrupt()") That second commit did not take account of the ACPI fixed event handler pathway, however. It introduced local_irq_disable() workarounds in cmos_check_wkalrm(), which can cause problems on PREEMPT_RT kernels and are now unnecessary. Add an explicit comment so that this change will not be reverted by mistake. Cc: stable@vger.kernel.org Fixes: 6950d046eb6e ("rtc: cmos: Replace spin_lock_irqsave with spin_lock in hard IRQ") Signed-off-by: Mateusz Jończyk Reviewed-by: Sebastian Andrzej Siewior Tested-by: Chris Bainbridge Reported-by: Chris Bainbridge Closes: https://lore.kernel.org/all/aDtJ92foPUYmGheF@debian.local/ Link: https://lore.kernel.org/r/20250607210608.14835-1-mat.jonczyk@o2.pl Signed-off-by: Alexandre Belloni --- drivers/rtc/rtc-cmos.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/drivers/rtc/rtc-cmos.c b/drivers/rtc/rtc-cmos.c index 8172869bd3d7..0743c6acd6e2 100644 --- a/drivers/rtc/rtc-cmos.c +++ b/drivers/rtc/rtc-cmos.c @@ -692,8 +692,12 @@ static irqreturn_t cmos_interrupt(int irq, void *p) { u8 irqstat; u8 rtc_control; + unsigned long flags; - spin_lock(&rtc_lock); + /* We cannot use spin_lock() here, as cmos_interrupt() is also called + * in a non-irq context. + */ + spin_lock_irqsave(&rtc_lock, flags); /* When the HPET interrupt handler calls us, the interrupt * status is passed as arg1 instead of the irq number. But @@ -727,7 +731,7 @@ static irqreturn_t cmos_interrupt(int irq, void *p) hpet_mask_rtc_irq_bit(RTC_AIE); CMOS_READ(RTC_INTR_FLAGS); } - spin_unlock(&rtc_lock); + spin_unlock_irqrestore(&rtc_lock, flags); if (is_intr(irqstat)) { rtc_update_irq(p, 1, irqstat); @@ -1295,9 +1299,7 @@ static void cmos_check_wkalrm(struct device *dev) * ACK the rtc irq here */ if (t_now >= cmos->alarm_expires && cmos_use_acpi_alarm()) { - local_irq_disable(); cmos_interrupt(0, (void *)cmos->rtc); - local_irq_enable(); return; } From 2f73c62d4e13df67380ff6faca39eec2bf08dd93 Mon Sep 17 00:00:00 2001 From: Nam Cao Date: Fri, 20 Jun 2025 13:09:39 +0200 Subject: [PATCH 126/289] Revert "riscv: misaligned: fix sleeping function called during misaligned access handling" MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This reverts commit 61a74ad25462 ("riscv: misaligned: fix sleeping function called during misaligned access handling"). The commit addresses a sleeping in atomic context problem, but it is not the correct fix as explained by Clément: "Using nofault would lead to failure to read from user memory that is paged out for instance. This is not really acceptable, we should handle user misaligned access even at an address that would generate a page fault." This bug has been properly fixed by commit 453805f0a28f ("riscv: misaligned: enable IRQs while handling misaligned accesses"). Revert this improper fix. Link: https://lore.kernel.org/linux-riscv/b779beed-e44e-4a5e-9551-4647682b0d21@rivosinc.com/ Signed-off-by: Nam Cao Cc: stable@vger.kernel.org Reviewed-by: Clément Léger Reviewed-by: Alexandre Ghiti Fixes: 61a74ad25462 ("riscv: misaligned: fix sleeping function called during misaligned access handling") Link: https://lore.kernel.org/r/20250620110939.1642735-1-namcao@linutronix.de Signed-off-by: Palmer Dabbelt --- arch/riscv/kernel/traps_misaligned.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/riscv/kernel/traps_misaligned.c b/arch/riscv/kernel/traps_misaligned.c index dd8e4af6583f..93043924fe6c 100644 --- a/arch/riscv/kernel/traps_misaligned.c +++ b/arch/riscv/kernel/traps_misaligned.c @@ -454,7 +454,7 @@ static int handle_scalar_misaligned_load(struct pt_regs *regs) val.data_u64 = 0; if (user_mode(regs)) { - if (copy_from_user_nofault(&val, (u8 __user *)addr, len)) + if (copy_from_user(&val, (u8 __user *)addr, len)) return -1; } else { memcpy(&val, (u8 *)addr, len); @@ -555,7 +555,7 @@ static int handle_scalar_misaligned_store(struct pt_regs *regs) return -EOPNOTSUPP; if (user_mode(regs)) { - if (copy_to_user_nofault((u8 __user *)addr, &val, len)) + if (copy_to_user((u8 __user *)addr, &val, len)) return -1; } else { memcpy((u8 *)addr, &val, len); From b0843f836126c980fb31923563c86ec52e8b9514 Mon Sep 17 00:00:00 2001 From: Alexandre Ghiti Date: Fri, 20 Jun 2025 12:08:11 +0000 Subject: [PATCH 127/289] riscv: Fix sparse warning in vendor_extensions/sifive.c sparse reports the following warning: arch/riscv/kernel/vendor_extensions/sifive.c:11:33: sparse: sparse: symbol 'riscv_isa_vendor_ext_sifive' was not declared. Should it be static? So as this struct is only used in this file, make it static. Fixes: 2d147d77ae6e ("riscv: Add SiFive xsfvqmaccdod and xsfvqmaccqoq vendor extensions") Reported-by: kernel test robot Closes: https://lore.kernel.org/oe-kbuild-all/202505072100.TZlEp8h1-lkp@intel.com/ Signed-off-by: Alexandre Ghiti Link: https://lore.kernel.org/r/20250620-dev-alex-fix_sparse_sifive_v1-v1-1-efa3a6f93846@rivosinc.com Signed-off-by: Palmer Dabbelt --- arch/riscv/kernel/vendor_extensions/sifive.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/riscv/kernel/vendor_extensions/sifive.c b/arch/riscv/kernel/vendor_extensions/sifive.c index 1411337dc1e6..8fcf67e8c07f 100644 --- a/arch/riscv/kernel/vendor_extensions/sifive.c +++ b/arch/riscv/kernel/vendor_extensions/sifive.c @@ -8,7 +8,7 @@ #include /* All SiFive vendor extensions supported in Linux */ -const struct riscv_isa_ext_data riscv_isa_vendor_ext_sifive[] = { +static const struct riscv_isa_ext_data riscv_isa_vendor_ext_sifive[] = { __RISCV_ISA_EXT_DATA(xsfvfnrclipxfqf, RISCV_ISA_VENDOR_EXT_XSFVFNRCLIPXFQF), __RISCV_ISA_EXT_DATA(xsfvfwmaccqqq, RISCV_ISA_VENDOR_EXT_XSFVFWMACCQQQ), __RISCV_ISA_EXT_DATA(xsfvqmaccdod, RISCV_ISA_VENDOR_EXT_XSFVQMACCDOD), From 890ba5be6335dbbbc99af14ea007befb5f83f174 Mon Sep 17 00:00:00 2001 From: Nam Cao Date: Thu, 19 Jun 2025 17:58:58 +0200 Subject: [PATCH 128/289] Revert "riscv: Define TASK_SIZE_MAX for __access_ok()" This reverts commit ad5643cf2f69 ("riscv: Define TASK_SIZE_MAX for __access_ok()"). This commit changes TASK_SIZE_MAX to be LONG_MAX to optimize access_ok(), because the previous TASK_SIZE_MAX (default to TASK_SIZE) requires some computation. The reasoning was that all user addresses are less than LONG_MAX, and all kernel addresses are greater than LONG_MAX. Therefore access_ok() can filter kernel addresses. Addresses between TASK_SIZE and LONG_MAX are not valid user addresses, but access_ok() let them pass. That was thought to be okay, because they are not valid addresses at hardware level. Unfortunately, one case is missed: get_user_pages_fast() happily accepts addresses between TASK_SIZE and LONG_MAX. futex(), for instance, uses get_user_pages_fast(). This causes the problem reported by Robert [1]. Therefore, revert this commit. TASK_SIZE_MAX is changed to the default: TASK_SIZE. This unfortunately reduces performance, because TASK_SIZE is more expensive to compute compared to LONG_MAX. But correctness first, we can think about optimization later, if required. Reported-by: Closes: https://lore.kernel.org/linux-riscv/77605.1750245028@localhost/ Signed-off-by: Nam Cao Cc: stable@vger.kernel.org Reviewed-by: Alexandre Ghiti Fixes: ad5643cf2f69 ("riscv: Define TASK_SIZE_MAX for __access_ok()") Link: https://lore.kernel.org/r/20250619155858.1249789-1-namcao@linutronix.de Signed-off-by: Palmer Dabbelt --- arch/riscv/include/asm/pgtable.h | 1 - 1 file changed, 1 deletion(-) diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h index a11816bbf9e7..521a995741cd 100644 --- a/arch/riscv/include/asm/pgtable.h +++ b/arch/riscv/include/asm/pgtable.h @@ -1075,7 +1075,6 @@ static inline pte_t pte_swp_clear_exclusive(pte_t pte) */ #ifdef CONFIG_64BIT #define TASK_SIZE_64 (PGDIR_SIZE * PTRS_PER_PGD / 2) -#define TASK_SIZE_MAX LONG_MAX #ifdef CONFIG_COMPAT #define TASK_SIZE_32 (_AC(0x80000000, UL) - PAGE_SIZE) From c5136add3f9b4c23b8bbe5f4d722c95d4cfb936e Mon Sep 17 00:00:00 2001 From: Klara Modin Date: Tue, 17 Jun 2025 14:58:47 +0200 Subject: [PATCH 129/289] riscv: export boot_cpu_hartid The mailbox controller driver for the Microchip Inter-processor Communication can be built as a module. It uses cpuid_to_hartid_map and commit 4783ce32b080 ("riscv: export __cpuid_to_hartid_map") enables that to work for SMP. However, cpuid_to_hartid_map uses boot_cpu_hartid on non-SMP kernels and this driver can be useful in such configurations[1]. Export boot_cpu_hartid so the driver can be built as a module on non-SMP kernels as well. Link: https://lore.kernel.org/lkml/20250617-confess-reimburse-876101e099cb@spud/ [1] Cc: stable@vger.kernel.org Fixes: e4b1d67e7141 ("mailbox: add Microchip IPC support") Signed-off-by: Klara Modin Acked-by: Conor Dooley Link: https://lore.kernel.org/r/20250617125847.23829-1-klarasmodin@gmail.com Signed-off-by: Palmer Dabbelt --- arch/riscv/kernel/setup.c | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c index f7c9a1caa83e..14888e5ea19a 100644 --- a/arch/riscv/kernel/setup.c +++ b/arch/riscv/kernel/setup.c @@ -50,6 +50,7 @@ atomic_t hart_lottery __section(".sdata") #endif ; unsigned long boot_cpu_hartid; +EXPORT_SYMBOL_GPL(boot_cpu_hartid); /* * Place kernel memory regions on the resource tree so that From b272f42547d85356b035e46273ddaf2aa4e161b8 Mon Sep 17 00:00:00 2001 From: Harshit Mogalapalli Date: Mon, 23 Jun 2025 07:26:27 -0700 Subject: [PATCH 130/289] ALSA: qc_audio_offload: Fix missing error code in prepare_qmi_response() When snd_soc_usb_find_priv_data() fails, return failure instead of success. While we are at it also use direct returns at first few error paths where there is no additional cleanup needed. Reported-by: Dan Carpenter Closes: https://lore.kernel.org/all/Z_40qL4JnyjR4j0O@stanley.mountain/ Fixes: 326bbc348298 ("ALSA: usb-audio: qcom: Introduce QC USB SND offloading support") Signed-off-by: Harshit Mogalapalli Link: https://patch.msgid.link/20250623142639.2938056-1-harshit.m.mogalapalli@oracle.com Signed-off-by: Takashi Iwai --- sound/usb/qcom/qc_audio_offload.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/sound/usb/qcom/qc_audio_offload.c b/sound/usb/qcom/qc_audio_offload.c index 5bc27c82e0af..797afd4561bd 100644 --- a/sound/usb/qcom/qc_audio_offload.c +++ b/sound/usb/qcom/qc_audio_offload.c @@ -1360,20 +1360,21 @@ static int prepare_qmi_response(struct snd_usb_substream *subs, if (!uadev[card_num].ctrl_intf) { dev_err(&subs->dev->dev, "audio ctrl intf info not cached\n"); - ret = -ENODEV; - goto err; + return -ENODEV; } ret = uaudio_populate_uac_desc(subs, resp); if (ret < 0) - goto err; + return ret; resp->slot_id = subs->dev->slot_id; resp->slot_id_valid = 1; data = snd_soc_usb_find_priv_data(uaudio_qdev->auxdev->dev.parent); - if (!data) - goto err; + if (!data) { + dev_err(&subs->dev->dev, "No private data found\n"); + return -ENODEV; + } uaudio_qdev->data = data; @@ -1382,7 +1383,7 @@ static int prepare_qmi_response(struct snd_usb_substream *subs, &resp->xhci_mem_info.tr_data, &resp->std_as_data_ep_desc); if (ret < 0) - goto err; + return ret; resp->std_as_data_ep_desc_valid = 1; @@ -1500,7 +1501,6 @@ drop_data_ep: xhci_sideband_remove_endpoint(uadev[card_num].sb, usb_pipe_endpoint(subs->dev, subs->data_endpoint->pipe)); -err: return ret; } From 14633da0f416fdbb6844d1b295cdc828b666e273 Mon Sep 17 00:00:00 2001 From: Victor Shih Date: Fri, 6 Jun 2025 19:01:19 +0800 Subject: [PATCH 131/289] mmc: core: Adjust some error messages for SD UHS-II cards Adjust some error messages to debug mode to avoid causing misunderstanding it is an error. Signed-off-by: Victor Shih Acked-by: Adrian Hunter Fixes: 9a9f7e13952b ("mmc: core: Support UHS-II card control and access") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20250606110121.96314-2-victorshihgli@gmail.com Signed-off-by: Ulf Hansson --- drivers/mmc/core/sd_uhs2.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/mmc/core/sd_uhs2.c b/drivers/mmc/core/sd_uhs2.c index 1c31d0dfa961..de17d1611290 100644 --- a/drivers/mmc/core/sd_uhs2.c +++ b/drivers/mmc/core/sd_uhs2.c @@ -91,8 +91,8 @@ static int sd_uhs2_phy_init(struct mmc_host *host) err = host->ops->uhs2_control(host, UHS2_PHY_INIT); if (err) { - pr_err("%s: failed to initial phy for UHS-II!\n", - mmc_hostname(host)); + pr_debug("%s: failed to initial phy for UHS-II!\n", + mmc_hostname(host)); } return err; From 2881ba9af073faa8ee7408a8d1e0575e50eb3f6c Mon Sep 17 00:00:00 2001 From: Victor Shih Date: Fri, 6 Jun 2025 19:01:20 +0800 Subject: [PATCH 132/289] mmc: sdhci: Add a helper function for dump register in dynamic debug mode Add a helper function for dump register in dynamic debug mode. Signed-off-by: Victor Shih Acked-by: Adrian Hunter Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20250606110121.96314-3-victorshihgli@gmail.com Signed-off-by: Ulf Hansson --- drivers/mmc/host/sdhci.h | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) diff --git a/drivers/mmc/host/sdhci.h b/drivers/mmc/host/sdhci.h index f9d65dd0f2b2..70ada1857a4c 100644 --- a/drivers/mmc/host/sdhci.h +++ b/drivers/mmc/host/sdhci.h @@ -900,4 +900,20 @@ void sdhci_switch_external_dma(struct sdhci_host *host, bool en); void sdhci_set_data_timeout_irq(struct sdhci_host *host, bool enable); void __sdhci_set_timeout(struct sdhci_host *host, struct mmc_command *cmd); +#if defined(CONFIG_DYNAMIC_DEBUG) || \ + (defined(CONFIG_DYNAMIC_DEBUG_CORE) && defined(DYNAMIC_DEBUG_MODULE)) +#define SDHCI_DBG_ANYWAY 0 +#elif defined(DEBUG) +#define SDHCI_DBG_ANYWAY 1 +#else +#define SDHCI_DBG_ANYWAY 0 +#endif + +#define sdhci_dbg_dumpregs(host, fmt) \ +do { \ + DEFINE_DYNAMIC_DEBUG_METADATA(descriptor, fmt); \ + if (DYNAMIC_DEBUG_BRANCH(descriptor) || SDHCI_DBG_ANYWAY) \ + sdhci_dumpregs(host); \ +} while (0) + #endif /* __SDHCI_HW_H */ From 49b14db035135341f6cf4de7af6ac2cbc8ad29b6 Mon Sep 17 00:00:00 2001 From: Victor Shih Date: Fri, 6 Jun 2025 19:01:21 +0800 Subject: [PATCH 133/289] mmc: sdhci-uhs2: Adjust some error messages and register dump for SD UHS-II card Adjust some error messages to debug mode and register dump to dynamic debug mode to avoid causing misunderstanding it is an error. Signed-off-by: Victor Shih Acked-by: Adrian Hunter Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20250606110121.96314-4-victorshihgli@gmail.com Signed-off-by: Ulf Hansson --- drivers/mmc/host/sdhci-uhs2.c | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/drivers/mmc/host/sdhci-uhs2.c b/drivers/mmc/host/sdhci-uhs2.c index c53b64d50c0d..0efeb9d0c376 100644 --- a/drivers/mmc/host/sdhci-uhs2.c +++ b/drivers/mmc/host/sdhci-uhs2.c @@ -99,8 +99,8 @@ void sdhci_uhs2_reset(struct sdhci_host *host, u16 mask) /* hw clears the bit when it's done */ if (read_poll_timeout_atomic(sdhci_readw, val, !(val & mask), 10, UHS2_RESET_TIMEOUT_100MS, true, host, SDHCI_UHS2_SW_RESET)) { - pr_warn("%s: %s: Reset 0x%x never completed. %s: clean reset bit.\n", __func__, - mmc_hostname(host->mmc), (int)mask, mmc_hostname(host->mmc)); + pr_debug("%s: %s: Reset 0x%x never completed. %s: clean reset bit.\n", __func__, + mmc_hostname(host->mmc), (int)mask, mmc_hostname(host->mmc)); sdhci_writeb(host, 0, SDHCI_UHS2_SW_RESET); return; } @@ -335,8 +335,8 @@ static int sdhci_uhs2_interface_detect(struct sdhci_host *host) if (read_poll_timeout(sdhci_readl, val, (val & SDHCI_UHS2_IF_DETECT), 100, UHS2_INTERFACE_DETECT_TIMEOUT_100MS, true, host, SDHCI_PRESENT_STATE)) { - pr_warn("%s: not detect UHS2 interface in 100ms.\n", mmc_hostname(host->mmc)); - sdhci_dumpregs(host); + pr_debug("%s: not detect UHS2 interface in 100ms.\n", mmc_hostname(host->mmc)); + sdhci_dbg_dumpregs(host, "UHS2 interface detect timeout in 100ms"); return -EIO; } @@ -345,8 +345,8 @@ static int sdhci_uhs2_interface_detect(struct sdhci_host *host) if (read_poll_timeout(sdhci_readl, val, (val & SDHCI_UHS2_LANE_SYNC), 100, UHS2_LANE_SYNC_TIMEOUT_150MS, true, host, SDHCI_PRESENT_STATE)) { - pr_warn("%s: UHS2 Lane sync fail in 150ms.\n", mmc_hostname(host->mmc)); - sdhci_dumpregs(host); + pr_debug("%s: UHS2 Lane sync fail in 150ms.\n", mmc_hostname(host->mmc)); + sdhci_dbg_dumpregs(host, "UHS2 Lane sync fail in 150ms"); return -EIO; } @@ -417,12 +417,12 @@ static int sdhci_uhs2_do_detect_init(struct mmc_host *mmc) host->ops->uhs2_pre_detect_init(host); if (sdhci_uhs2_interface_detect(host)) { - pr_warn("%s: cannot detect UHS2 interface.\n", mmc_hostname(host->mmc)); + pr_debug("%s: cannot detect UHS2 interface.\n", mmc_hostname(host->mmc)); return -EIO; } if (sdhci_uhs2_init(host)) { - pr_warn("%s: UHS2 init fail.\n", mmc_hostname(host->mmc)); + pr_debug("%s: UHS2 init fail.\n", mmc_hostname(host->mmc)); return -EIO; } @@ -504,8 +504,8 @@ static int sdhci_uhs2_check_dormant(struct sdhci_host *host) if (read_poll_timeout(sdhci_readl, val, (val & SDHCI_UHS2_IN_DORMANT_STATE), 100, UHS2_CHECK_DORMANT_TIMEOUT_100MS, true, host, SDHCI_PRESENT_STATE)) { - pr_warn("%s: UHS2 IN_DORMANT fail in 100ms.\n", mmc_hostname(host->mmc)); - sdhci_dumpregs(host); + pr_debug("%s: UHS2 IN_DORMANT fail in 100ms.\n", mmc_hostname(host->mmc)); + sdhci_dbg_dumpregs(host, "UHS2 IN_DORMANT fail in 100ms"); return -EIO; } return 0; From dcc3bcfc5b50c625b475dcc25d167b6b947a6637 Mon Sep 17 00:00:00 2001 From: Ulf Hansson Date: Tue, 24 Jun 2025 13:09:32 +0200 Subject: [PATCH 134/289] Revert "mmc: sdhci: Disable SD card clock before changing parameters" It has turned out the trying to strictly conform to the SDHCI specification is causing problems. Let's revert and start over. This reverts commit fb3bbc46c94f261b6156ee863c1b06c84cf157dc. Cc: Erick Shepherd Cc: stable@vger.kernel.org Fixes: fb3bbc46c94f ("mmc: sdhci: Disable SD card clock before changing parameters") Suggested-by: Adrian Hunter Reported-by: Jonathan Liu Reported-by: Salvatore Bonaccorso Closes: https://bugs.debian.org/1108065 Acked-by: Adrian Hunter Signed-off-by: Ulf Hansson Link: https://lore.kernel.org/r/20250624110932.176925-1-ulf.hansson@linaro.org --- drivers/mmc/host/sdhci.c | 9 ++------- 1 file changed, 2 insertions(+), 7 deletions(-) diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c index f008167d1863..e116f2db34d5 100644 --- a/drivers/mmc/host/sdhci.c +++ b/drivers/mmc/host/sdhci.c @@ -2065,15 +2065,10 @@ void sdhci_set_clock(struct sdhci_host *host, unsigned int clock) host->mmc->actual_clock = 0; - clk = sdhci_readw(host, SDHCI_CLOCK_CONTROL); - if (clk & SDHCI_CLOCK_CARD_EN) - sdhci_writew(host, clk & ~SDHCI_CLOCK_CARD_EN, - SDHCI_CLOCK_CONTROL); + sdhci_writew(host, 0, SDHCI_CLOCK_CONTROL); - if (clock == 0) { - sdhci_writew(host, 0, SDHCI_CLOCK_CONTROL); + if (clock == 0) return; - } clk = sdhci_calc_clk(host, clock, &host->mmc->actual_clock); sdhci_enable_clk(host, clk); From ff21a6ec0f27c126db0a86d96751bd6e5d1d9874 Mon Sep 17 00:00:00 2001 From: Jack Yu Date: Tue, 24 Jun 2025 02:59:28 +0000 Subject: [PATCH 135/289] ASoC: rt721-sdca: fix boost gain calculation error Fix the boost gain calculation error in rt721_sdca_set_gain_get. This patch is specific for "FU33 Boost Volume". Signed-off-by: Jack Yu Link: https://patch.msgid.link/1b18fcde41c64d6fa85451d523c0434a@realtek.com Signed-off-by: Mark Brown --- sound/soc/codecs/rt721-sdca.c | 23 +++++++++++++++++++---- 1 file changed, 19 insertions(+), 4 deletions(-) diff --git a/sound/soc/codecs/rt721-sdca.c b/sound/soc/codecs/rt721-sdca.c index 1c9f32e405cf..ba080957e933 100644 --- a/sound/soc/codecs/rt721-sdca.c +++ b/sound/soc/codecs/rt721-sdca.c @@ -430,6 +430,7 @@ static int rt721_sdca_set_gain_get(struct snd_kcontrol *kcontrol, unsigned int read_l, read_r, ctl_l = 0, ctl_r = 0; unsigned int adc_vol_flag = 0; const unsigned int interval_offset = 0xc0; + const unsigned int tendA = 0x200; const unsigned int tendB = 0xa00; if (strstr(ucontrol->id.name, "FU1E Capture Volume") || @@ -439,9 +440,16 @@ static int rt721_sdca_set_gain_get(struct snd_kcontrol *kcontrol, regmap_read(rt721->mbq_regmap, mc->reg, &read_l); regmap_read(rt721->mbq_regmap, mc->rreg, &read_r); - if (mc->shift == 8) /* boost gain */ + if (mc->shift == 8) { + /* boost gain */ ctl_l = read_l / tendB; - else { + } else if (mc->shift == 1) { + /* FU33 boost gain */ + if (read_l == 0x8000 || read_l == 0xfe00) + ctl_l = 0; + else + ctl_l = read_l / tendA + 1; + } else { if (adc_vol_flag) ctl_l = mc->max - (((0x1e00 - read_l) & 0xffff) / interval_offset); else @@ -449,9 +457,16 @@ static int rt721_sdca_set_gain_get(struct snd_kcontrol *kcontrol, } if (read_l != read_r) { - if (mc->shift == 8) /* boost gain */ + if (mc->shift == 8) { + /* boost gain */ ctl_r = read_r / tendB; - else { /* ADC/DAC gain */ + } else if (mc->shift == 1) { + /* FU33 boost gain */ + if (read_r == 0x8000 || read_r == 0xfe00) + ctl_r = 0; + else + ctl_r = read_r / tendA + 1; + } else { /* ADC/DAC gain */ if (adc_vol_flag) ctl_r = mc->max - (((0x1e00 - read_r) & 0xffff) / interval_offset); else From fa78e9b606a472495ef5b6b3d8b45c37f7727f9d Mon Sep 17 00:00:00 2001 From: Elena Popa Date: Fri, 30 May 2025 13:40:00 +0300 Subject: [PATCH 136/289] rtc: pcf2127: fix SPI command byte for PCF2131 PCF2131 was not responding to read/write operations using SPI. PCF2131 has a different command byte definition, compared to PCF2127/29. Added the new command byte definition when PCF2131 is detected. Fixes: afc505bf9039 ("rtc: pcf2127: add support for PCF2131 RTC") Cc: stable@vger.kernel.org Signed-off-by: Elena Popa Acked-by: Hugo Villeneuve Link: https://lore.kernel.org/r/20250530104001.957977-1-elena.popa@nxp.com Signed-off-by: Alexandre Belloni --- drivers/rtc/rtc-pcf2127.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/drivers/rtc/rtc-pcf2127.c b/drivers/rtc/rtc-pcf2127.c index 31c7dca8f469..2c7917bc2a31 100644 --- a/drivers/rtc/rtc-pcf2127.c +++ b/drivers/rtc/rtc-pcf2127.c @@ -1538,6 +1538,11 @@ static int pcf2127_spi_probe(struct spi_device *spi) variant = &pcf21xx_cfg[type]; } + if (variant->type == PCF2131) { + config.read_flag_mask = 0x0; + config.write_flag_mask = 0x0; + } + config.max_register = variant->max_register, regmap = devm_regmap_init_spi(spi, &config); From 08d82d0cad51c2b1d454fe41ea1ff96ade676961 Mon Sep 17 00:00:00 2001 From: Hugo Villeneuve Date: Thu, 29 May 2025 16:29:22 -0400 Subject: [PATCH 137/289] rtc: pcf2127: add missing semicolon after statement Replace comma with semicolon at the end of the statement when setting config.max_register. Fixes: fd28ceb4603f ("rtc: pcf2127: add variant-specific configuration structure") Cc: stable@vger.kernel.org Cc: Elena Popa Signed-off-by: Hugo Villeneuve Link: https://lore.kernel.org/r/20250529202923.1552560-1-hugo@hugovil.com Signed-off-by: Alexandre Belloni --- drivers/rtc/rtc-pcf2127.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/rtc/rtc-pcf2127.c b/drivers/rtc/rtc-pcf2127.c index 2c7917bc2a31..2e1ac0c42e93 100644 --- a/drivers/rtc/rtc-pcf2127.c +++ b/drivers/rtc/rtc-pcf2127.c @@ -1543,7 +1543,7 @@ static int pcf2127_spi_probe(struct spi_device *spi) config.write_flag_mask = 0x0; } - config.max_register = variant->max_register, + config.max_register = variant->max_register; regmap = devm_regmap_init_spi(spi, &config); if (IS_ERR(regmap)) { From f23c52aafb1675ab1d1f46914556d8e29cbbf7b3 Mon Sep 17 00:00:00 2001 From: Fabio Estevam Date: Thu, 19 Jun 2025 08:46:17 -0300 Subject: [PATCH 138/289] serial: imx: Restore original RXTL for console to fix data loss Commit 7a637784d517 ("serial: imx: reduce RX interrupt frequency") introduced a regression on the i.MX6UL EVK board. The issue can be reproduced with the following steps: - Open vi on the board. - Paste a text file (~150 characters). - Save the file, then repeat the process. - Compare the sha256sum of the saved files. The checksums do not match due to missing characters or entire lines. Fix this by restoring the RXTL value to 1 when the UART is used as a console. This ensures timely RX interrupts and reliable data reception in console mode. With this change, pasted content is saved correctly, and checksums are always consistent. Cc: stable Fixes: 7a637784d517 ("serial: imx: reduce RX interrupt frequency") Signed-off-by: Fabio Estevam Reviewed-by: Stefan Wahren Link: https://lore.kernel.org/r/20250619114617.2791939-1-festevam@gmail.com Signed-off-by: Greg Kroah-Hartman --- drivers/tty/serial/imx.c | 17 ++++++++++++----- 1 file changed, 12 insertions(+), 5 deletions(-) diff --git a/drivers/tty/serial/imx.c b/drivers/tty/serial/imx.c index bd02ee898f5d..500dfc009d03 100644 --- a/drivers/tty/serial/imx.c +++ b/drivers/tty/serial/imx.c @@ -235,6 +235,7 @@ struct imx_port { enum imx_tx_state tx_state; struct hrtimer trigger_start_tx; struct hrtimer trigger_stop_tx; + unsigned int rxtl; }; struct imx_port_ucrs { @@ -1339,6 +1340,7 @@ static void imx_uart_clear_rx_errors(struct imx_port *sport) #define TXTL_DEFAULT 8 #define RXTL_DEFAULT 8 /* 8 characters or aging timer */ +#define RXTL_CONSOLE_DEFAULT 1 #define TXTL_DMA 8 /* DMA burst setting */ #define RXTL_DMA 9 /* DMA burst setting */ @@ -1457,7 +1459,7 @@ static void imx_uart_disable_dma(struct imx_port *sport) ucr1 &= ~(UCR1_RXDMAEN | UCR1_TXDMAEN | UCR1_ATDMAEN); imx_uart_writel(sport, ucr1, UCR1); - imx_uart_setup_ufcr(sport, TXTL_DEFAULT, RXTL_DEFAULT); + imx_uart_setup_ufcr(sport, TXTL_DEFAULT, sport->rxtl); sport->dma_is_enabled = 0; } @@ -1482,7 +1484,12 @@ static int imx_uart_startup(struct uart_port *port) return retval; } - imx_uart_setup_ufcr(sport, TXTL_DEFAULT, RXTL_DEFAULT); + if (uart_console(&sport->port)) + sport->rxtl = RXTL_CONSOLE_DEFAULT; + else + sport->rxtl = RXTL_DEFAULT; + + imx_uart_setup_ufcr(sport, TXTL_DEFAULT, sport->rxtl); /* disable the DREN bit (Data Ready interrupt enable) before * requesting IRQs @@ -1948,7 +1955,7 @@ static int imx_uart_poll_init(struct uart_port *port) if (retval) clk_disable_unprepare(sport->clk_ipg); - imx_uart_setup_ufcr(sport, TXTL_DEFAULT, RXTL_DEFAULT); + imx_uart_setup_ufcr(sport, TXTL_DEFAULT, sport->rxtl); uart_port_lock_irqsave(&sport->port, &flags); @@ -2040,7 +2047,7 @@ static int imx_uart_rs485_config(struct uart_port *port, struct ktermios *termio /* If the receiver trigger is 0, set it to a default value */ ufcr = imx_uart_readl(sport, UFCR); if ((ufcr & UFCR_RXTL_MASK) == 0) - imx_uart_setup_ufcr(sport, TXTL_DEFAULT, RXTL_DEFAULT); + imx_uart_setup_ufcr(sport, TXTL_DEFAULT, sport->rxtl); imx_uart_start_rx(port); } @@ -2302,7 +2309,7 @@ imx_uart_console_setup(struct console *co, char *options) else imx_uart_console_get_options(sport, &baud, &parity, &bits); - imx_uart_setup_ufcr(sport, TXTL_DEFAULT, RXTL_DEFAULT); + imx_uart_setup_ufcr(sport, TXTL_DEFAULT, sport->rxtl); retval = uart_set_options(&sport->port, co, baud, parity, bits, flow); From 09812134071b3941fb81def30b61ed36d3a5dfb5 Mon Sep 17 00:00:00 2001 From: Yao Zi Date: Mon, 23 Jun 2025 09:34:45 +0000 Subject: [PATCH 139/289] dt-bindings: serial: 8250: Make clocks and clock-frequency exclusive The 8250 binding before converting to json-schema states, - clock-frequency : the input clock frequency for the UART or - clocks phandle to refer to the clk used as per Documentation/devicetree for clock-related properties, where "or" indicates these properties shouldn't exist at the same time. Additionally, the behavior of Linux's driver is strange when both clocks and clock-frequency are specified: it ignores clocks and obtains the frequency from clock-frequency, left the specified clocks unclaimed. It may even be disabled, which is undesired most of the time. But "anyOf" doesn't prevent these two properties from coexisting, as it considers the object valid as long as there's at LEAST one match. Let's switch to "oneOf" and disallows the other property if one exists, precisely matching the original binding and avoiding future confusion on the driver's behavior. Fixes: e69f5dc623f9 ("dt-bindings: serial: Convert 8250 to json-schema") Cc: stable Signed-off-by: Yao Zi Reviewed-by: Conor Dooley Link: https://lore.kernel.org/r/20250623093445.62327-1-ziyao@disroot.org Signed-off-by: Greg Kroah-Hartman --- Documentation/devicetree/bindings/serial/8250.yaml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/devicetree/bindings/serial/8250.yaml b/Documentation/devicetree/bindings/serial/8250.yaml index 33d2016b6509..c6bc27709bf7 100644 --- a/Documentation/devicetree/bindings/serial/8250.yaml +++ b/Documentation/devicetree/bindings/serial/8250.yaml @@ -45,7 +45,7 @@ allOf: - ns16550 - ns16550a then: - anyOf: + oneOf: - required: [ clock-frequency ] - required: [ clocks ] From 0043ec26d827ddb1e85bd9786693152aa6f55d16 Mon Sep 17 00:00:00 2001 From: Srinivasan Shanmugam Date: Thu, 12 Jun 2025 20:11:14 +0530 Subject: [PATCH 140/289] drm/amdgpu/gfx9: Add Cleaner Shader Support for GFX9.x GPUs MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Enable the cleaner shader for other GFX9.x series of GPUs to provide data isolation between GPU workloads. The cleaner shader is responsible for clearing the Local Data Store (LDS), Vector General Purpose Registers (VGPRs), and Scalar General Purpose Registers (SGPRs), which helps prevent data leakage and ensures accurate computation results. This update extends cleaner shader support to GFX9.x GPUs, previously available for GFX9.4.2. It enhances security by clearing GPU memory between processes and maintains a consistent GPU state across KGD and KFD workloads. Cc: Manu Rastogi Cc: Christian König Cc: Alex Deucher Signed-off-by: Srinivasan Shanmugam Acked-by: Alex Deucher Signed-off-by: Alex Deucher (cherry picked from commit 99808926d0ea6234a89e35240a7cb088368de9e1) --- drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c index d377a7c57d5e..ad9be3656653 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c @@ -2235,6 +2235,25 @@ static int gfx_v9_0_sw_init(struct amdgpu_ip_block *ip_block) } switch (amdgpu_ip_version(adev, GC_HWIP, 0)) { + case IP_VERSION(9, 0, 1): + case IP_VERSION(9, 2, 1): + case IP_VERSION(9, 4, 0): + case IP_VERSION(9, 2, 2): + case IP_VERSION(9, 1, 0): + case IP_VERSION(9, 3, 0): + adev->gfx.cleaner_shader_ptr = gfx_9_4_2_cleaner_shader_hex; + adev->gfx.cleaner_shader_size = sizeof(gfx_9_4_2_cleaner_shader_hex); + if (adev->gfx.me_fw_version >= 167 && + adev->gfx.pfp_fw_version >= 196 && + adev->gfx.mec_fw_version >= 474) { + adev->gfx.enable_cleaner_shader = true; + r = amdgpu_gfx_cleaner_shader_sw_init(adev, adev->gfx.cleaner_shader_size); + if (r) { + adev->gfx.enable_cleaner_shader = false; + dev_err(adev->dev, "Failed to initialize cleaner shader\n"); + } + } + break; case IP_VERSION(9, 4, 2): adev->gfx.cleaner_shader_ptr = gfx_9_4_2_cleaner_shader_hex; adev->gfx.cleaner_shader_size = sizeof(gfx_9_4_2_cleaner_shader_hex); From 99579c55c3d6132a5236926652c0a72a526b809d Mon Sep 17 00:00:00 2001 From: Alex Deucher Date: Tue, 20 May 2025 10:02:14 -0400 Subject: [PATCH 141/289] drm/amdgpu/mes: add compatibility checks for set_hw_resource_1 Seems some older MES firmware versions do not properly support this packet. Add back some the compatibility checks. v2: switch to fw version check (Shaoyun) Fixes: f81cd793119e ("drm/amd/amdgpu: Fix MES init sequence") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4295 Cc: Shaoyun Liu Reviewed-by: shaoyun.liu Signed-off-by: Alex Deucher (cherry picked from commit 0180e0a5dd5c6ff118043ee42dbbbddaf881f283) Cc: stable@vger.kernel.org --- drivers/gpu/drm/amd/amdgpu/mes_v11_0.c | 10 ++++++---- drivers/gpu/drm/amd/amdgpu/mes_v12_0.c | 3 ++- 2 files changed, 8 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c index c9eba537de09..28eb846280dd 100644 --- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c +++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c @@ -1630,10 +1630,12 @@ static int mes_v11_0_hw_init(struct amdgpu_ip_block *ip_block) if (r) goto failure; - r = mes_v11_0_set_hw_resources_1(&adev->mes); - if (r) { - DRM_ERROR("failed mes_v11_0_set_hw_resources_1, r=%d\n", r); - goto failure; + if ((adev->mes.sched_version & AMDGPU_MES_VERSION_MASK) >= 0x50) { + r = mes_v11_0_set_hw_resources_1(&adev->mes); + if (r) { + DRM_ERROR("failed mes_v11_0_set_hw_resources_1, r=%d\n", r); + goto failure; + } } r = mes_v11_0_query_sched_status(&adev->mes); diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c index b4f17332d466..6b222630f3fa 100644 --- a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c +++ b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c @@ -1742,7 +1742,8 @@ static int mes_v12_0_hw_init(struct amdgpu_ip_block *ip_block) if (r) goto failure; - mes_v12_0_set_hw_resources_1(&adev->mes, AMDGPU_MES_SCHED_PIPE); + if ((adev->mes.sched_version & AMDGPU_MES_VERSION_MASK) >= 0x4b) + mes_v12_0_set_hw_resources_1(&adev->mes, AMDGPU_MES_SCHED_PIPE); mes_v12_0_init_aggregated_doorbell(&adev->mes); From 73eab78721f7b85216f1ca8c7b732f13213b5b32 Mon Sep 17 00:00:00 2001 From: Mario Limonciello Date: Tue, 17 Jun 2025 13:30:52 -0500 Subject: [PATCH 142/289] drm/amd: Adjust output for discovery error handling commit 017fbb6690c2 ("drm/amdgpu/discovery: check ip_discovery fw file available") added support for reading an amdgpu IP discovery bin file for some specific products. If it's not found then it will fallback to hardcoded values. However if it's not found there is also a lot of noise about missing files and errors. Adjust the error handling to decrease most messages to DEBUG and to show users less about missing files. Reviewed-by: Lijo Lazar Reported-by: Marcus Seyfarth Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4312 Tested-by: Marcus Seyfarth Fixes: 017fbb6690c2 ("drm/amdgpu/discovery: check ip_discovery fw file available") Acked-by: Alex Deucher Link: https://lore.kernel.org/r/20250617183052.1692059-1-superm1@kernel.org Signed-off-by: Mario Limonciello Signed-off-by: Alex Deucher (cherry picked from commit 49f1f9f6c3c9febf8ba93f94a8d9c8d03e1ea0a1) --- drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 28 +++++++++---------- 1 file changed, 13 insertions(+), 15 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c index a0e9bf9b2710..81b3443c8d7f 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c @@ -321,10 +321,12 @@ static int amdgpu_discovery_read_binary_from_file(struct amdgpu_device *adev, const struct firmware *fw; int r; - r = request_firmware(&fw, fw_name, adev->dev); + r = firmware_request_nowarn(&fw, fw_name, adev->dev); if (r) { - dev_err(adev->dev, "can't load firmware \"%s\"\n", - fw_name); + if (amdgpu_discovery == 2) + dev_err(adev->dev, "can't load firmware \"%s\"\n", fw_name); + else + drm_info(&adev->ddev, "Optional firmware \"%s\" was not found\n", fw_name); return r; } @@ -459,16 +461,12 @@ static int amdgpu_discovery_init(struct amdgpu_device *adev) /* Read from file if it is the preferred option */ fw_name = amdgpu_discovery_get_fw_name(adev); if (fw_name != NULL) { - dev_info(adev->dev, "use ip discovery information from file"); + drm_dbg(&adev->ddev, "use ip discovery information from file"); r = amdgpu_discovery_read_binary_from_file(adev, adev->mman.discovery_bin, fw_name); - - if (r) { - dev_err(adev->dev, "failed to read ip discovery binary from file\n"); - r = -EINVAL; + if (r) goto out; - } - } else { + drm_dbg(&adev->ddev, "use ip discovery information from memory"); r = amdgpu_discovery_read_binary_from_mem( adev, adev->mman.discovery_bin); if (r) @@ -1338,10 +1336,8 @@ static int amdgpu_discovery_reg_base_init(struct amdgpu_device *adev) int r; r = amdgpu_discovery_init(adev); - if (r) { - DRM_ERROR("amdgpu_discovery_init failed\n"); + if (r) return r; - } wafl_ver = 0; adev->gfx.xcc_mask = 0; @@ -2579,8 +2575,10 @@ int amdgpu_discovery_set_ip_blocks(struct amdgpu_device *adev) break; default: r = amdgpu_discovery_reg_base_init(adev); - if (r) - return -EINVAL; + if (r) { + drm_err(&adev->ddev, "discovery failed: %d\n", r); + return r; + } amdgpu_discovery_harvest_ip(adev); amdgpu_discovery_get_gfx_info(adev); From 899dec4e885f839da2065803e21b7ab003a5c609 Mon Sep 17 00:00:00 2001 From: Alex Deucher Date: Thu, 19 Jun 2025 17:56:29 -0400 Subject: [PATCH 143/289] drm/amdgpu/sdma6: add ucode version checks for userq support SDMA 6.0.0 version 24 SDMA 6.0.2 version 21 SDMA 6.0.3 version 25 Reviewed-by: Jesse Zhang Signed-off-by: Alex Deucher (cherry picked from commit e8cca30d8b34f1c4101c237914c53068d4a55e73) --- drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c | 19 ++++++++++++++++--- 1 file changed, 16 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c index 5a70ae17be04..a9bdf8d61d6c 100644 --- a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c @@ -1374,9 +1374,22 @@ static int sdma_v6_0_sw_init(struct amdgpu_ip_block *ip_block) else DRM_ERROR("Failed to allocated memory for SDMA IP Dump\n"); - /* add firmware version checks here */ - if (0 && !adev->sdma.disable_uq) - adev->userq_funcs[AMDGPU_HW_IP_DMA] = &userq_mes_funcs; + switch (amdgpu_ip_version(adev, SDMA0_HWIP, 0)) { + case IP_VERSION(6, 0, 0): + if ((adev->sdma.instance[0].fw_version >= 24) && !adev->sdma.disable_uq) + adev->userq_funcs[AMDGPU_HW_IP_DMA] = &userq_mes_funcs; + break; + case IP_VERSION(6, 0, 2): + if ((adev->sdma.instance[0].fw_version >= 21) && !adev->sdma.disable_uq) + adev->userq_funcs[AMDGPU_HW_IP_DMA] = &userq_mes_funcs; + break; + case IP_VERSION(6, 0, 3): + if ((adev->sdma.instance[0].fw_version >= 25) && !adev->sdma.disable_uq) + adev->userq_funcs[AMDGPU_HW_IP_DMA] = &userq_mes_funcs; + break; + default: + break; + } r = amdgpu_sdma_sysfs_reset_mask_init(adev); if (r) From 31135cc99c40247bec924dcdcd74a58e866c52d8 Mon Sep 17 00:00:00 2001 From: Alex Deucher Date: Fri, 20 Jun 2025 11:39:22 -0400 Subject: [PATCH 144/289] drm/amdgpu/sdma7: add ucode version checks for userq support SDMA 7.0.0/1: 7836028 Reviewed-by: Jesse Zhang Signed-off-by: Alex Deucher (cherry picked from commit 8c011408ed842dfccdd50a90a9cf6bccdb85cc0e) --- drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c index ad47d0bdf777..86903eccbd4e 100644 --- a/drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c @@ -1349,9 +1349,15 @@ static int sdma_v7_0_sw_init(struct amdgpu_ip_block *ip_block) else DRM_ERROR("Failed to allocated memory for SDMA IP Dump\n"); - /* add firmware version checks here */ - if (0 && !adev->sdma.disable_uq) - adev->userq_funcs[AMDGPU_HW_IP_DMA] = &userq_mes_funcs; + switch (amdgpu_ip_version(adev, SDMA0_HWIP, 0)) { + case IP_VERSION(7, 0, 0): + case IP_VERSION(7, 0, 1): + if ((adev->sdma.instance[0].fw_version >= 7836028) && !adev->sdma.disable_uq) + adev->userq_funcs[AMDGPU_HW_IP_DMA] = &userq_mes_funcs; + break; + default: + break; + } return r; } From 66abb996999de0d440a02583a6e70c2c24deab45 Mon Sep 17 00:00:00 2001 From: Mario Limonciello Date: Mon, 23 Jun 2025 12:11:13 -0500 Subject: [PATCH 145/289] drm/amd/display: Fix AMDGPU_MAX_BL_LEVEL value [Why] commit 16dc8bc27c2a ("drm/amd/display: Export full brightness range to userspace") adjusted the brightness range to scale to larger values, but missed updating AMDGPU_MAX_BL_LEVEL which is needed to make sure that scaling works properly with custom brightness curves. [How] As the change for max brightness of 0xFFFF only applies to devices supporting DC, use existing DC define MAX_BACKLIGHT_LEVEL. Fixes: 16dc8bc27c2a ("drm/amd/display: Export full brightness range to userspace") Acked-by: Alex Deucher Link: https://lore.kernel.org/r/20250623171114.1156451-1-mario.limonciello@amd.com Signed-off-by: Mario Limonciello Signed-off-by: Alex Deucher (cherry picked from commit 5b852044eb0d3e1f1c946d32e05fcb068e0a20a0) Cc: stable@vger.kernel.org --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c index bc4cd11bfc79..0b8ac9edc070 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -4718,16 +4718,16 @@ static int get_brightness_range(const struct amdgpu_dm_backlight_caps *caps, return 1; } -/* Rescale from [min..max] to [0..AMDGPU_MAX_BL_LEVEL] */ +/* Rescale from [min..max] to [0..MAX_BACKLIGHT_LEVEL] */ static inline u32 scale_input_to_fw(int min, int max, u64 input) { - return DIV_ROUND_CLOSEST_ULL(input * AMDGPU_MAX_BL_LEVEL, max - min); + return DIV_ROUND_CLOSEST_ULL(input * MAX_BACKLIGHT_LEVEL, max - min); } -/* Rescale from [0..AMDGPU_MAX_BL_LEVEL] to [min..max] */ +/* Rescale from [0..MAX_BACKLIGHT_LEVEL] to [min..max] */ static inline u32 scale_fw_to_input(int min, int max, u64 input) { - return min + DIV_ROUND_CLOSEST_ULL(input * (max - min), AMDGPU_MAX_BL_LEVEL); + return min + DIV_ROUND_CLOSEST_ULL(input * (max - min), MAX_BACKLIGHT_LEVEL); } static void convert_custom_brightness(const struct amdgpu_dm_backlight_caps *caps, @@ -4947,7 +4947,7 @@ amdgpu_dm_register_backlight_device(struct amdgpu_dm_connector *aconnector) drm_dbg(drm, "Backlight caps: min: %d, max: %d, ac %d, dc %d\n", min, max, caps->ac_level, caps->dc_level); } else - props.brightness = props.max_brightness = AMDGPU_MAX_BL_LEVEL; + props.brightness = props.max_brightness = MAX_BACKLIGHT_LEVEL; if (caps->data_points && !(amdgpu_dc_debug_mask & DC_DISABLE_CUSTOM_BRIGHTNESS_CURVE)) drm_info(drm, "Using custom brightness curve\n"); From 6847b3b6e84ef37451c074e6a8db3fbd250c8dbf Mon Sep 17 00:00:00 2001 From: Takashi Iwai Date: Mon, 16 Jun 2025 18:08:41 +0200 Subject: [PATCH 146/289] drm/amd/display: Add sanity checks for drm_edid_raw() When EDID is retrieved via drm_edid_raw(), it doesn't guarantee to return proper EDID bytes the caller wants: it may be either NULL (that leads to an Oops) or with too long bytes over the fixed size raw_edid array (that may lead to memory corruption). The latter was reported actually when connected with a bad adapter. Add sanity checks for drm_edid_raw() to address the above corner cases, and return EDID_BAD_INPUT accordingly. Fixes: 48edb2a4256e ("drm/amd/display: switch amdgpu_dm_connector to use struct drm_edid") Link: https://bugzilla.suse.com/show_bug.cgi?id=1236415 Signed-off-by: Takashi Iwai Signed-off-by: Alex Deucher (cherry picked from commit 648d3f4d209725d51900d6a3ed46b7b600140cdf) Cc: stable@vger.kernel.org --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c index d4395b92fb85..9e3e51a2dc49 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c @@ -1029,6 +1029,10 @@ enum dc_edid_status dm_helpers_read_local_edid( return EDID_NO_RESPONSE; edid = drm_edid_raw(drm_edid); // FIXME: Get rid of drm_edid_raw() + if (!edid || + edid->extensions >= sizeof(sink->dc_edid.raw_edid) / EDID_LENGTH) + return EDID_BAD_INPUT; + sink->dc_edid.length = EDID_LENGTH * (edid->extensions + 1); memmove(sink->dc_edid.raw_edid, (uint8_t *)edid, sink->dc_edid.length); From 6c038b58a2dc5a008c7e7a1297f5aaa4deaaaa7e Mon Sep 17 00:00:00 2001 From: Tamura Dai Date: Mon, 16 Jun 2025 08:55:48 +0900 Subject: [PATCH 147/289] ASoC: SOF: Intel: hda: Use devm_kstrdup() to avoid memleak. sof_pdata->tplg_filename can have address allocated by kstrdup() and can be overwritten. Memory leak was detected with kmemleak: unreferenced object 0xffff88812391ff60 (size 16): comm "kworker/4:1", pid 161, jiffies 4294802931 hex dump (first 16 bytes): 73 6f 66 2d 68 64 61 2d 67 65 6e 65 72 69 63 00 sof-hda-generic. backtrace (crc 4bf1675c): __kmalloc_node_track_caller_noprof+0x49c/0x6b0 kstrdup+0x46/0xc0 hda_machine_select.cold+0x1de/0x12cf [snd_sof_intel_hda_generic] sof_init_environment+0x16f/0xb50 [snd_sof] sof_probe_continue+0x45/0x7c0 [snd_sof] sof_probe_work+0x1e/0x40 [snd_sof] process_one_work+0x894/0x14b0 worker_thread+0x5e5/0xfb0 kthread+0x39d/0x760 ret_from_fork+0x31/0x70 ret_from_fork_asm+0x1a/0x30 Signed-off-by: Tamura Dai Link: https://patch.msgid.link/20250615235548.8591-1-kirinode0@gmail.com Signed-off-by: Mark Brown --- sound/soc/sof/intel/hda.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/sound/soc/sof/intel/hda.c b/sound/soc/sof/intel/hda.c index bdfe388da198..3b47191ea7a5 100644 --- a/sound/soc/sof/intel/hda.c +++ b/sound/soc/sof/intel/hda.c @@ -1257,11 +1257,11 @@ static int check_tplg_quirk_mask(struct snd_soc_acpi_mach *mach) return 0; } -static char *remove_file_ext(const char *tplg_filename) +static char *remove_file_ext(struct device *dev, const char *tplg_filename) { char *filename, *tmp; - filename = kstrdup(tplg_filename, GFP_KERNEL); + filename = devm_kstrdup(dev, tplg_filename, GFP_KERNEL); if (!filename) return NULL; @@ -1345,7 +1345,7 @@ struct snd_soc_acpi_mach *hda_machine_select(struct snd_sof_dev *sdev) */ if (!sof_pdata->tplg_filename) { /* remove file extension if it exists */ - tplg_filename = remove_file_ext(mach->sof_tplg_filename); + tplg_filename = remove_file_ext(sdev->dev, mach->sof_tplg_filename); if (!tplg_filename) return NULL; From 62207293479e6c03ef498a70f2914c51f4d31d2c Mon Sep 17 00:00:00 2001 From: Haoxiang Li Date: Fri, 16 May 2025 15:16:55 +0300 Subject: [PATCH 148/289] drm/xe/display: Add check for alloc_ordered_workqueue() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add check for the return value of alloc_ordered_workqueue() in xe_display_create() to catch potential exception. Fixes: 44e694958b95 ("drm/xe/display: Implement display support") Cc: stable@vger.kernel.org Signed-off-by: Haoxiang Li Reviewed-by: Matthew Auld Link: https://lore.kernel.org/r/4ee1b0e5d1626ce1dde2e82af05c2edaed50c3aa.1747397638.git.jani.nikula@intel.com Signed-off-by: Jani Nikula (cherry picked from commit 5b62d63395d5b7d4094e7cd380bccae4b25415cb) Signed-off-by: Thomas Hellström --- drivers/gpu/drm/xe/display/xe_display.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/xe/display/xe_display.c b/drivers/gpu/drm/xe/display/xe_display.c index 68f064f33d4b..9f4ade25787a 100644 --- a/drivers/gpu/drm/xe/display/xe_display.c +++ b/drivers/gpu/drm/xe/display/xe_display.c @@ -104,6 +104,8 @@ int xe_display_create(struct xe_device *xe) spin_lock_init(&xe->display.fb_tracking.lock); xe->display.hotplug.dp_wq = alloc_ordered_workqueue("xe-dp", 0); + if (!xe->display.hotplug.dp_wq) + return -ENOMEM; return drmm_add_action_or_reset(&xe->drm, display_destroy, NULL); } From 9127a69c7193ad47047ff968a2de9161d5c93d37 Mon Sep 17 00:00:00 2001 From: Karthik Poosa Date: Tue, 17 Jun 2025 17:30:30 +0530 Subject: [PATCH 149/289] drm/xe/hwmon: Fix xe_hwmon_power_max_write MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Prevent other bits of mailbox power limit from being overwritten with 0. This issue was due to a missing read and modify of current power limit, before setting a requested mailbox power limit, which is added in this patch. v2: - Improve commit message. (Anshuman) v3: - Rebase. - Rephrase commit message. (Riana) - Add read-modify-write variant of xe_hwmon_pcode_write_power_limit() i.e. xe_hwmon_pcode_rmw_power_limit(). (Badal) - Use xe_hwmon_pcode_rmw_power_limit() to set mailbox power limits. - Remove xe_hwmon_pcode_write_power_limit() as all mailbox power limits writes use xe_hwmon_pcode_rmw_power_limit() only. v4: - Use PWR_LIM in place of (PWR_LIM_EN | PWR_LIM_VAL) wherever applicable. (Riana) Fixes: 25a2aa779fc3 ("drm/xe/hwmon: Add support to manage power limits though mailbox") Reviewed-by: Riana Tauro Signed-off-by: Karthik Poosa Reviewed-by: Badal Nilawar Link: https://lore.kernel.org/r/20250617120030.612819-1-karthik.poosa@intel.com Signed-off-by: Rodrigo Vivi (cherry picked from commit 8aa7306631f088881759398972d503757cf0c901) Signed-off-by: Thomas Hellström --- drivers/gpu/drm/xe/regs/xe_mchbar_regs.h | 1 + drivers/gpu/drm/xe/xe_hwmon.c | 34 +++++++++++------------- 2 files changed, 16 insertions(+), 19 deletions(-) diff --git a/drivers/gpu/drm/xe/regs/xe_mchbar_regs.h b/drivers/gpu/drm/xe/regs/xe_mchbar_regs.h index 5394a1373a6b..ef2bf984723f 100644 --- a/drivers/gpu/drm/xe/regs/xe_mchbar_regs.h +++ b/drivers/gpu/drm/xe/regs/xe_mchbar_regs.h @@ -40,6 +40,7 @@ #define PCU_CR_PACKAGE_RAPL_LIMIT XE_REG(MCHBAR_MIRROR_BASE_SNB + 0x59a0) #define PWR_LIM_VAL REG_GENMASK(14, 0) #define PWR_LIM_EN REG_BIT(15) +#define PWR_LIM REG_GENMASK(15, 0) #define PWR_LIM_TIME REG_GENMASK(23, 17) #define PWR_LIM_TIME_X REG_GENMASK(23, 22) #define PWR_LIM_TIME_Y REG_GENMASK(21, 17) diff --git a/drivers/gpu/drm/xe/xe_hwmon.c b/drivers/gpu/drm/xe/xe_hwmon.c index 74f31639b37f..f008e8049700 100644 --- a/drivers/gpu/drm/xe/xe_hwmon.c +++ b/drivers/gpu/drm/xe/xe_hwmon.c @@ -159,8 +159,8 @@ static int xe_hwmon_pcode_read_power_limit(const struct xe_hwmon *hwmon, u32 att return ret; } -static int xe_hwmon_pcode_write_power_limit(const struct xe_hwmon *hwmon, u32 attr, u8 channel, - u32 uval) +static int xe_hwmon_pcode_rmw_power_limit(const struct xe_hwmon *hwmon, u32 attr, u8 channel, + u32 clr, u32 set) { struct xe_tile *root_tile = xe_device_get_root_tile(hwmon->xe); u32 val0, val1; @@ -179,7 +179,7 @@ static int xe_hwmon_pcode_write_power_limit(const struct xe_hwmon *hwmon, u32 at channel, val0, val1, ret); if (attr == PL1_HWMON_ATTR) - val0 = uval; + val0 = (val0 & ~clr) | set; else return -EIO; @@ -339,7 +339,7 @@ static int xe_hwmon_power_max_write(struct xe_hwmon *hwmon, u32 attr, int channe if (hwmon->xe->info.has_mbx_power_limits) { drm_dbg(&hwmon->xe->drm, "disabling %s on channel %d\n", PWR_ATTR_TO_STR(attr), channel); - xe_hwmon_pcode_write_power_limit(hwmon, attr, channel, 0); + xe_hwmon_pcode_rmw_power_limit(hwmon, attr, channel, PWR_LIM_EN, 0); xe_hwmon_pcode_read_power_limit(hwmon, attr, channel, ®_val); } else { reg_val = xe_mmio_rmw32(mmio, rapl_limit, PWR_LIM_EN, 0); @@ -370,10 +370,9 @@ static int xe_hwmon_power_max_write(struct xe_hwmon *hwmon, u32 attr, int channe } if (hwmon->xe->info.has_mbx_power_limits) - ret = xe_hwmon_pcode_write_power_limit(hwmon, attr, channel, reg_val); + ret = xe_hwmon_pcode_rmw_power_limit(hwmon, attr, channel, PWR_LIM, reg_val); else - reg_val = xe_mmio_rmw32(mmio, rapl_limit, PWR_LIM_EN | PWR_LIM_VAL, - reg_val); + reg_val = xe_mmio_rmw32(mmio, rapl_limit, PWR_LIM, reg_val); unlock: mutex_unlock(&hwmon->hwmon_lock); return ret; @@ -563,14 +562,11 @@ xe_hwmon_power_max_interval_store(struct device *dev, struct device_attribute *a mutex_lock(&hwmon->hwmon_lock); - if (hwmon->xe->info.has_mbx_power_limits) { - ret = xe_hwmon_pcode_read_power_limit(hwmon, power_attr, channel, (u32 *)&r); - r = (r & ~PWR_LIM_TIME) | rxy; - xe_hwmon_pcode_write_power_limit(hwmon, power_attr, channel, r); - } else { + if (hwmon->xe->info.has_mbx_power_limits) + xe_hwmon_pcode_rmw_power_limit(hwmon, power_attr, channel, PWR_LIM_TIME, rxy); + else r = xe_mmio_rmw32(mmio, xe_hwmon_get_reg(hwmon, REG_PKG_RAPL_LIMIT, channel), PWR_LIM_TIME, rxy); - } mutex_unlock(&hwmon->hwmon_lock); @@ -1138,12 +1134,12 @@ xe_hwmon_get_preregistration_info(struct xe_hwmon *hwmon) } else { drm_info(&hwmon->xe->drm, "Using mailbox commands for power limits\n"); /* Write default limits to read from pcode from now on. */ - xe_hwmon_pcode_write_power_limit(hwmon, PL1_HWMON_ATTR, - CHANNEL_CARD, - hwmon->pl1_on_boot[CHANNEL_CARD]); - xe_hwmon_pcode_write_power_limit(hwmon, PL1_HWMON_ATTR, - CHANNEL_PKG, - hwmon->pl1_on_boot[CHANNEL_PKG]); + xe_hwmon_pcode_rmw_power_limit(hwmon, PL1_HWMON_ATTR, + CHANNEL_CARD, PWR_LIM | PWR_LIM_TIME, + hwmon->pl1_on_boot[CHANNEL_CARD]); + xe_hwmon_pcode_rmw_power_limit(hwmon, PL1_HWMON_ATTR, + CHANNEL_PKG, PWR_LIM | PWR_LIM_TIME, + hwmon->pl1_on_boot[CHANNEL_PKG]); hwmon->scl_shift_power = PWR_UNIT; hwmon->scl_shift_energy = ENERGY_UNIT; hwmon->scl_shift_time = TIME_UNIT; From 818625570558cd91082c9bafd6f2b59b73241a69 Mon Sep 17 00:00:00 2001 From: Nicolin Chen Date: Tue, 24 Jun 2025 11:00:45 -0700 Subject: [PATCH 150/289] iommufd/selftest: Fix iommufd_dirty_tracking with large hugepage sizes The hugepage test cases of iommufd_dirty_tracking have the 64MB and 128MB coverages. Both of them are smaller than the default hugepage size 512MB, when CONFIG_PAGE_SIZE_64KB=y. However, these test cases have a variant of using huge pages, which would mmap(MAP_HUGETLB) using these smaller sizes than the system hugepag size. This results in the kernel aligning up the smaller size to 512MB. If a memory was located between the upper 64/128MB size boundary and the hugepage 512MB boundary, it would get wiped out: https://lore.kernel.org/all/aEoUhPYIAizTLADq@nvidia.com/ Given that this aligning up behavior is well documented, we have no choice but to allocate a hugepage aligned size to avoid this unintended wipe out. Instead of relying on the kernel's internal force alignment, pass the same size to posix_memalign() and map(). Also, fix the FIXTURE_TEARDOWN() misusing munmap() to free the memory from posix_memalign(), as munmap() doesn't destroy the allocator meta data. So, call free() instead. Fixes: a9af47e382a4 ("iommufd/selftest: Test IOMMU_HWPT_GET_DIRTY_BITMAP") Link: https://patch.msgid.link/r/1ea8609ae6d523fdd4d8efb179ddee79c8582cb6.1750787928.git.nicolinc@nvidia.com Cc: stable@vger.kernel.org Suggested-by: Jason Gunthorpe Signed-off-by: Nicolin Chen Signed-off-by: Jason Gunthorpe --- tools/testing/selftests/iommu/iommufd.c | 30 +++++++++++++++++-------- 1 file changed, 21 insertions(+), 9 deletions(-) diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c index 1a8e85afe9aa..e7eb11a94034 100644 --- a/tools/testing/selftests/iommu/iommufd.c +++ b/tools/testing/selftests/iommu/iommufd.c @@ -2008,6 +2008,7 @@ FIXTURE_VARIANT(iommufd_dirty_tracking) FIXTURE_SETUP(iommufd_dirty_tracking) { + size_t mmap_buffer_size; unsigned long size; int mmap_flags; void *vrc; @@ -2022,22 +2023,33 @@ FIXTURE_SETUP(iommufd_dirty_tracking) self->fd = open("/dev/iommu", O_RDWR); ASSERT_NE(-1, self->fd); - rc = posix_memalign(&self->buffer, HUGEPAGE_SIZE, variant->buffer_size); - if (rc || !self->buffer) { - SKIP(return, "Skipping buffer_size=%lu due to errno=%d", - variant->buffer_size, rc); - } - mmap_flags = MAP_SHARED | MAP_ANONYMOUS | MAP_FIXED; + mmap_buffer_size = variant->buffer_size; if (variant->hugepages) { /* * MAP_POPULATE will cause the kernel to fail mmap if THPs are * not available. */ mmap_flags |= MAP_HUGETLB | MAP_POPULATE; + + /* + * Allocation must be aligned to the HUGEPAGE_SIZE, because the + * following mmap() will automatically align the length to be a + * multiple of the underlying huge page size. Failing to do the + * same at this allocation will result in a memory overwrite by + * the mmap(). + */ + if (mmap_buffer_size < HUGEPAGE_SIZE) + mmap_buffer_size = HUGEPAGE_SIZE; + } + + rc = posix_memalign(&self->buffer, HUGEPAGE_SIZE, mmap_buffer_size); + if (rc || !self->buffer) { + SKIP(return, "Skipping buffer_size=%lu due to errno=%d", + mmap_buffer_size, rc); } assert((uintptr_t)self->buffer % HUGEPAGE_SIZE == 0); - vrc = mmap(self->buffer, variant->buffer_size, PROT_READ | PROT_WRITE, + vrc = mmap(self->buffer, mmap_buffer_size, PROT_READ | PROT_WRITE, mmap_flags, -1, 0); assert(vrc == self->buffer); @@ -2066,8 +2078,8 @@ FIXTURE_SETUP(iommufd_dirty_tracking) FIXTURE_TEARDOWN(iommufd_dirty_tracking) { - munmap(self->buffer, variant->buffer_size); - munmap(self->bitmap, DIV_ROUND_UP(self->bitmap_size, BITS_PER_BYTE)); + free(self->buffer); + free(self->bitmap); teardown_iommufd(self->fd, _metadata); } From 4b75e3babb85238912f50bbe0647bae08242a9b6 Mon Sep 17 00:00:00 2001 From: Nicolin Chen Date: Tue, 24 Jun 2025 11:00:46 -0700 Subject: [PATCH 151/289] iommufd/selftest: Add missing close(mfd) in memfd_mmap() Do not forget to close mfd in the error paths, since none of the callers would close it when ASSERT_NE(MAP_FAILED, buf) fails. Fixes: 0bcceb1f51c7 ("iommufd: Selftest coverage for IOMMU_IOAS_MAP_FILE") Link: https://patch.msgid.link/r/a363a69dbf453d4bc1bde276f3b16778620488e1.1750787928.git.nicolinc@nvidia.com Cc: stable@vger.kernel.org Reviewed-by: Jason Gunthorpe Signed-off-by: Nicolin Chen Signed-off-by: Jason Gunthorpe --- tools/testing/selftests/iommu/iommufd_utils.h | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h index 72f6636e5d90..6e967b58acfd 100644 --- a/tools/testing/selftests/iommu/iommufd_utils.h +++ b/tools/testing/selftests/iommu/iommufd_utils.h @@ -60,13 +60,18 @@ static inline void *memfd_mmap(size_t length, int prot, int flags, int *mfd_p) { int mfd_flags = (flags & MAP_HUGETLB) ? MFD_HUGETLB : 0; int mfd = memfd_create("buffer", mfd_flags); + void *buf = MAP_FAILED; if (mfd <= 0) return MAP_FAILED; if (ftruncate(mfd, length)) - return MAP_FAILED; + goto out; *mfd_p = mfd; - return mmap(0, length, prot, flags, mfd, 0); + buf = mmap(0, length, prot, flags, mfd, 0); +out: + if (buf == MAP_FAILED) + close(mfd); + return buf; } /* From a9bf67ee170514b17541038c60bb94cb2cf5732f Mon Sep 17 00:00:00 2001 From: Nicolin Chen Date: Tue, 24 Jun 2025 11:00:47 -0700 Subject: [PATCH 152/289] iommufd/selftest: Add asserts testing global mfd The mfd and mfd_buffer will be used in the tests directly without an extra check. Test them in setup_sizes() to ensure they are safe to use. Fixes: 0bcceb1f51c7 ("iommufd: Selftest coverage for IOMMU_IOAS_MAP_FILE") Link: https://patch.msgid.link/r/94bdc11d2b6d5db337b1361c5e5fce0ed494bb40.1750787928.git.nicolinc@nvidia.com Cc: stable@vger.kernel.org Reviewed-by: Jason Gunthorpe Signed-off-by: Nicolin Chen Signed-off-by: Jason Gunthorpe --- tools/testing/selftests/iommu/iommufd.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c index e7eb11a94034..e61218c0537f 100644 --- a/tools/testing/selftests/iommu/iommufd.c +++ b/tools/testing/selftests/iommu/iommufd.c @@ -54,6 +54,8 @@ static __attribute__((constructor)) void setup_sizes(void) mfd_buffer = memfd_mmap(BUFFER_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, &mfd); + assert(mfd_buffer != MAP_FAILED); + assert(mfd > 0); } FIXTURE(iommufd) From 9a96876e3c6578031fa5dc5dde7759d383b2fb75 Mon Sep 17 00:00:00 2001 From: Nicolin Chen Date: Tue, 24 Jun 2025 11:00:48 -0700 Subject: [PATCH 153/289] iommufd/selftest: Fix build warnings due to uninitialized mfd MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Commit 869c788909b9 ("selftests: harness: Stop using setjmp()/longjmp()") changed the harness structure. For some unknown reason, two build warnings occur to the iommufd selftest: iommufd.c: In function ‘wrapper_iommufd_mock_domain_all_aligns’: iommufd.c:1807:17: warning: ‘mfd’ may be used uninitialized in this function 1807 | close(mfd); | ^~~~~~~~~~ iommufd.c:1767:13: note: ‘mfd’ was declared here 1767 | int mfd; | ^~~ iommufd.c: In function ‘wrapper_iommufd_mock_domain_all_aligns_copy’: iommufd.c:1870:17: warning: ‘mfd’ may be used uninitialized in this function 1870 | close(mfd); | ^~~~~~~~~~ iommufd.c:1819:13: note: ‘mfd’ was declared here 1819 | int mfd; | ^~~ All the mfd have been used in the variant->file path only, so it's likely a false alarm. FWIW, the commit mentioned above does not cause this, yet it might affect gcc in a certain way that resulted in the warnings. It is also found that ading a dummy setjmp (which doesn't make sense) could mute the warnings: https://lore.kernel.org/all/aEi8DV+ReF3v3Rlf@nvidia.com/ The job of this selftest is to catch kernel bug, while such warnings will unlikely disrupt its role. Mute the warning by force initializing the mfd and add an ASSERT_GT(). Link: https://patch.msgid.link/r/6951d85d5cd34cbf22abab7714542654e63ecc44.1750787928.git.nicolinc@nvidia.com Reviewed-by: Jason Gunthorpe Signed-off-by: Nicolin Chen Signed-off-by: Jason Gunthorpe --- tools/testing/selftests/iommu/iommufd.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c index e61218c0537f..1926ef6b40ab 100644 --- a/tools/testing/selftests/iommu/iommufd.c +++ b/tools/testing/selftests/iommu/iommufd.c @@ -1748,13 +1748,15 @@ TEST_F(iommufd_mock_domain, all_aligns) unsigned int end; uint8_t *buf; int prot = PROT_READ | PROT_WRITE; - int mfd; + int mfd = -1; if (variant->file) buf = memfd_mmap(buf_size, prot, MAP_SHARED, &mfd); else buf = mmap(0, buf_size, prot, self->mmap_flags, -1, 0); ASSERT_NE(MAP_FAILED, buf); + if (variant->file) + ASSERT_GT(mfd, 0); check_refs(buf, buf_size, 0); /* @@ -1800,13 +1802,15 @@ TEST_F(iommufd_mock_domain, all_aligns_copy) unsigned int end; uint8_t *buf; int prot = PROT_READ | PROT_WRITE; - int mfd; + int mfd = -1; if (variant->file) buf = memfd_mmap(buf_size, prot, MAP_SHARED, &mfd); else buf = mmap(0, buf_size, prot, self->mmap_flags, -1, 0); ASSERT_NE(MAP_FAILED, buf); + if (variant->file) + ASSERT_GT(mfd, 0); check_refs(buf, buf_size, 0); /* From 5c4acbc8ce9025fe6e318966af8d3c42ffc6e9ca Mon Sep 17 00:00:00 2001 From: Alan Huang Date: Wed, 25 Jun 2025 03:10:27 +0800 Subject: [PATCH 154/289] bcachefs: Don't unlock the trans if ret doesn't match BCH_ERR_operation_blocked Reported-by: syzbot+d540192e763531d307ff@syzkaller.appspotmail.com Signed-off-by: Alan Huang Signed-off-by: Kent Overstreet --- fs/bcachefs/btree_update_interior.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/fs/bcachefs/btree_update_interior.c b/fs/bcachefs/btree_update_interior.c index 7bf1bd6a6e92..553059b33bfd 100644 --- a/fs/bcachefs/btree_update_interior.c +++ b/fs/bcachefs/btree_update_interior.c @@ -1287,10 +1287,11 @@ bch2_btree_update_start(struct btree_trans *trans, struct btree_path *path, do { ret = bch2_btree_reserve_get(trans, as, nr_nodes, target, flags, &cl); - + if (!bch2_err_matches(ret, BCH_ERR_operation_blocked)) + break; bch2_trans_unlock(trans); bch2_wait_on_allocator(c, &cl); - } while (bch2_err_matches(ret, BCH_ERR_operation_blocked)); + } while (1); } if (ret) { From 865ad1dbf13244b349e3da2731c1188476ad17b8 Mon Sep 17 00:00:00 2001 From: Kent Overstreet Date: Mon, 23 Jun 2025 18:42:42 -0400 Subject: [PATCH 155/289] bcachefs: Check for bad write buffer key when moving from journal Signed-off-by: Kent Overstreet --- fs/bcachefs/btree_update.h | 5 ++--- fs/bcachefs/btree_write_buffer.c | 5 ++--- fs/bcachefs/btree_write_buffer.h | 6 ++++++ 3 files changed, 10 insertions(+), 6 deletions(-) diff --git a/fs/bcachefs/btree_update.h b/fs/bcachefs/btree_update.h index 9feef1dc4de5..0b98ab959719 100644 --- a/fs/bcachefs/btree_update.h +++ b/fs/bcachefs/btree_update.h @@ -170,8 +170,7 @@ bch2_trans_jset_entry_alloc(struct btree_trans *trans, unsigned u64s) int bch2_btree_insert_clone_trans(struct btree_trans *, enum btree_id, struct bkey_i *); -int bch2_btree_write_buffer_insert_err(struct btree_trans *, - enum btree_id, struct bkey_i *); +int bch2_btree_write_buffer_insert_err(struct bch_fs *, enum btree_id, struct bkey_i *); static inline int __must_check bch2_trans_update_buffered(struct btree_trans *trans, enum btree_id btree, @@ -182,7 +181,7 @@ static inline int __must_check bch2_trans_update_buffered(struct btree_trans *tr EBUG_ON(k->k.u64s > BTREE_WRITE_BUFERED_U64s_MAX); if (unlikely(!btree_type_uses_write_buffer(btree))) { - int ret = bch2_btree_write_buffer_insert_err(trans, btree, k); + int ret = bch2_btree_write_buffer_insert_err(trans->c, btree, k); dump_stack(); return ret; } diff --git a/fs/bcachefs/btree_write_buffer.c b/fs/bcachefs/btree_write_buffer.c index 21b5c03d1822..4b095235a0d2 100644 --- a/fs/bcachefs/btree_write_buffer.c +++ b/fs/bcachefs/btree_write_buffer.c @@ -267,10 +267,9 @@ out: BUG_ON(wb->sorted.size < wb->flushing.keys.nr); } -int bch2_btree_write_buffer_insert_err(struct btree_trans *trans, +int bch2_btree_write_buffer_insert_err(struct bch_fs *c, enum btree_id btree, struct bkey_i *k) { - struct bch_fs *c = trans->c; struct printbuf buf = PRINTBUF; prt_printf(&buf, "attempting to do write buffer update on non wb btree="); @@ -332,7 +331,7 @@ static int bch2_btree_write_buffer_flush_locked(struct btree_trans *trans) struct btree_write_buffered_key *k = &wb->flushing.keys.data[i->idx]; if (unlikely(!btree_type_uses_write_buffer(k->btree))) { - ret = bch2_btree_write_buffer_insert_err(trans, k->btree, &k->k); + ret = bch2_btree_write_buffer_insert_err(trans->c, k->btree, &k->k); goto err; } diff --git a/fs/bcachefs/btree_write_buffer.h b/fs/bcachefs/btree_write_buffer.h index 05f56fd1eed0..c351d21aca0b 100644 --- a/fs/bcachefs/btree_write_buffer.h +++ b/fs/bcachefs/btree_write_buffer.h @@ -89,6 +89,12 @@ static inline int bch2_journal_key_to_wb(struct bch_fs *c, struct journal_keys_to_wb *dst, enum btree_id btree, struct bkey_i *k) { + if (unlikely(!btree_type_uses_write_buffer(btree))) { + int ret = bch2_btree_write_buffer_insert_err(c, btree, k); + dump_stack(); + return ret; + } + EBUG_ON(!dst->seq); return k->k.type == KEY_TYPE_accounting From 5f465c148c61e876b6d6eacd8e8e365f2d47758f Mon Sep 17 00:00:00 2001 From: "Xin Li (Intel)" Date: Fri, 20 Jun 2025 16:15:03 -0700 Subject: [PATCH 156/289] x86/traps: Initialize DR6 by writing its architectural reset value Initialize DR6 by writing its architectural reset value to avoid incorrectly zeroing DR6 to clear DR6.BLD at boot time, which leads to a false bus lock detected warning. The Intel SDM says: 1) Certain debug exceptions may clear bits 0-3 of DR6. 2) BLD induced #DB clears DR6.BLD and any other debug exception doesn't modify DR6.BLD. 3) RTM induced #DB clears DR6.RTM and any other debug exception sets DR6.RTM. To avoid confusion in identifying debug exceptions, debug handlers should set DR6.BLD and DR6.RTM, and clear other DR6 bits before returning. The DR6 architectural reset value 0xFFFF0FF0, already defined as macro DR6_RESERVED, satisfies these requirements, so just use it to reinitialize DR6 whenever needed. Since clear_all_debug_regs() no longer zeros all debug registers, rename it to initialize_debug_regs() to better reflect its current behavior. Since debug_read_clear_dr6() no longer clears DR6, rename it to debug_read_reset_dr6() to better reflect its current behavior. Fixes: ebb1064e7c2e9 ("x86/traps: Handle #DB for bus lock") Reported-by: Sohil Mehta Suggested-by: H. Peter Anvin (Intel) Signed-off-by: Xin Li (Intel) Signed-off-by: Dave Hansen Reviewed-by: H. Peter Anvin (Intel) Reviewed-by: Sohil Mehta Acked-by: Peter Zijlstra (Intel) Tested-by: Sohil Mehta Link: https://lore.kernel.org/lkml/06e68373-a92b-472e-8fd9-ba548119770c@intel.com/ Cc:stable@vger.kernel.org Link: https://lore.kernel.org/all/20250620231504.2676902-2-xin%40zytor.com --- arch/x86/include/uapi/asm/debugreg.h | 21 ++++++++++++++++- arch/x86/kernel/cpu/common.c | 24 ++++++++------------ arch/x86/kernel/traps.c | 34 +++++++++++++++++----------- 3 files changed, 51 insertions(+), 28 deletions(-) diff --git a/arch/x86/include/uapi/asm/debugreg.h b/arch/x86/include/uapi/asm/debugreg.h index 0007ba077c0c..41da492dfb01 100644 --- a/arch/x86/include/uapi/asm/debugreg.h +++ b/arch/x86/include/uapi/asm/debugreg.h @@ -15,7 +15,26 @@ which debugging register was responsible for the trap. The other bits are either reserved or not of interest to us. */ -/* Define reserved bits in DR6 which are always set to 1 */ +/* + * Define bits in DR6 which are set to 1 by default. + * + * This is also the DR6 architectural value following Power-up, Reset or INIT. + * + * Note, with the introduction of Bus Lock Detection (BLD) and Restricted + * Transactional Memory (RTM), the DR6 register has been modified: + * + * 1) BLD flag (bit 11) is no longer reserved to 1 if the CPU supports + * Bus Lock Detection. The assertion of a bus lock could clear it. + * + * 2) RTM flag (bit 16) is no longer reserved to 1 if the CPU supports + * restricted transactional memory. #DB occurred inside an RTM region + * could clear it. + * + * Apparently, DR6.BLD and DR6.RTM are active low bits. + * + * As a result, DR6_RESERVED is an incorrect name now, but it is kept for + * compatibility. + */ #define DR6_RESERVED (0xFFFF0FF0) #define DR_TRAP0 (0x1) /* db0 */ diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index 8feb8fd2957a..0f6c280a94f0 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -2243,20 +2243,16 @@ EXPORT_PER_CPU_SYMBOL(__stack_chk_guard); #endif #endif -/* - * Clear all 6 debug registers: - */ -static void clear_all_debug_regs(void) +static void initialize_debug_regs(void) { - int i; - - for (i = 0; i < 8; i++) { - /* Ignore db4, db5 */ - if ((i == 4) || (i == 5)) - continue; - - set_debugreg(0, i); - } + /* Control register first -- to make sure everything is disabled. */ + set_debugreg(0, 7); + set_debugreg(DR6_RESERVED, 6); + /* dr5 and dr4 don't exist */ + set_debugreg(0, 3); + set_debugreg(0, 2); + set_debugreg(0, 1); + set_debugreg(0, 0); } #ifdef CONFIG_KGDB @@ -2417,7 +2413,7 @@ void cpu_init(void) load_mm_ldt(&init_mm); - clear_all_debug_regs(); + initialize_debug_regs(); dbg_restore_debug_regs(); doublefault_init_cpu_tss(); diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c index c5c897a86418..36354b470590 100644 --- a/arch/x86/kernel/traps.c +++ b/arch/x86/kernel/traps.c @@ -1022,24 +1022,32 @@ static bool is_sysenter_singlestep(struct pt_regs *regs) #endif } -static __always_inline unsigned long debug_read_clear_dr6(void) +static __always_inline unsigned long debug_read_reset_dr6(void) { unsigned long dr6; + get_debugreg(dr6, 6); + dr6 ^= DR6_RESERVED; /* Flip to positive polarity */ + /* * The Intel SDM says: * - * Certain debug exceptions may clear bits 0-3. The remaining - * contents of the DR6 register are never cleared by the - * processor. To avoid confusion in identifying debug - * exceptions, debug handlers should clear the register before - * returning to the interrupted task. + * Certain debug exceptions may clear bits 0-3 of DR6. * - * Keep it simple: clear DR6 immediately. + * BLD induced #DB clears DR6.BLD and any other debug + * exception doesn't modify DR6.BLD. + * + * RTM induced #DB clears DR6.RTM and any other debug + * exception sets DR6.RTM. + * + * To avoid confusion in identifying debug exceptions, + * debug handlers should set DR6.BLD and DR6.RTM, and + * clear other DR6 bits before returning. + * + * Keep it simple: write DR6 with its architectural reset + * value 0xFFFF0FF0, defined as DR6_RESERVED, immediately. */ - get_debugreg(dr6, 6); set_debugreg(DR6_RESERVED, 6); - dr6 ^= DR6_RESERVED; /* Flip to positive polarity */ return dr6; } @@ -1239,13 +1247,13 @@ out: /* IST stack entry */ DEFINE_IDTENTRY_DEBUG(exc_debug) { - exc_debug_kernel(regs, debug_read_clear_dr6()); + exc_debug_kernel(regs, debug_read_reset_dr6()); } /* User entry, runs on regular task stack */ DEFINE_IDTENTRY_DEBUG_USER(exc_debug) { - exc_debug_user(regs, debug_read_clear_dr6()); + exc_debug_user(regs, debug_read_reset_dr6()); } #ifdef CONFIG_X86_FRED @@ -1264,7 +1272,7 @@ DEFINE_FREDENTRY_DEBUG(exc_debug) { /* * FRED #DB stores DR6 on the stack in the format which - * debug_read_clear_dr6() returns for the IDT entry points. + * debug_read_reset_dr6() returns for the IDT entry points. */ unsigned long dr6 = fred_event_data(regs); @@ -1279,7 +1287,7 @@ DEFINE_FREDENTRY_DEBUG(exc_debug) /* 32 bit does not have separate entry points. */ DEFINE_IDTENTRY_RAW(exc_debug) { - unsigned long dr6 = debug_read_clear_dr6(); + unsigned long dr6 = debug_read_reset_dr6(); if (user_mode(regs)) exc_debug_user(regs, dr6); From fa7d0f83c5c4223a01598876352473cb3d3bd4d7 Mon Sep 17 00:00:00 2001 From: "Xin Li (Intel)" Date: Fri, 20 Jun 2025 16:15:04 -0700 Subject: [PATCH 157/289] x86/traps: Initialize DR7 by writing its architectural reset value Initialize DR7 by writing its architectural reset value to always set bit 10, which is reserved to '1', when "clearing" DR7 so as not to trigger unanticipated behavior if said bit is ever unreserved, e.g. as a feature enabling flag with inverted polarity. Signed-off-by: Xin Li (Intel) Signed-off-by: Dave Hansen Reviewed-by: H. Peter Anvin (Intel) Reviewed-by: Sohil Mehta Acked-by: Peter Zijlstra (Intel) Acked-by: Sean Christopherson Tested-by: Sohil Mehta Cc:stable@vger.kernel.org Link: https://lore.kernel.org/all/20250620231504.2676902-3-xin%40zytor.com --- arch/x86/include/asm/debugreg.h | 19 +++++++++++++++---- arch/x86/include/asm/kvm_host.h | 2 +- arch/x86/kernel/cpu/common.c | 2 +- arch/x86/kernel/kgdb.c | 2 +- arch/x86/kernel/process_32.c | 2 +- arch/x86/kernel/process_64.c | 2 +- arch/x86/kvm/x86.c | 4 ++-- 7 files changed, 22 insertions(+), 11 deletions(-) diff --git a/arch/x86/include/asm/debugreg.h b/arch/x86/include/asm/debugreg.h index 363110e6b2e3..a2c1f2d24b64 100644 --- a/arch/x86/include/asm/debugreg.h +++ b/arch/x86/include/asm/debugreg.h @@ -9,6 +9,14 @@ #include #include +/* + * Define bits that are always set to 1 in DR7, only bit 10 is + * architecturally reserved to '1'. + * + * This is also the init/reset value for DR7. + */ +#define DR7_FIXED_1 0x00000400 + DECLARE_PER_CPU(unsigned long, cpu_dr7); #ifndef CONFIG_PARAVIRT_XXL @@ -100,8 +108,8 @@ static __always_inline void native_set_debugreg(int regno, unsigned long value) static inline void hw_breakpoint_disable(void) { - /* Zero the control register for HW Breakpoint */ - set_debugreg(0UL, 7); + /* Reset the control register for HW Breakpoint */ + set_debugreg(DR7_FIXED_1, 7); /* Zero-out the individual HW breakpoint address registers */ set_debugreg(0UL, 0); @@ -125,9 +133,12 @@ static __always_inline unsigned long local_db_save(void) return 0; get_debugreg(dr7, 7); - dr7 &= ~0x400; /* architecturally set bit */ + + /* Architecturally set bit */ + dr7 &= ~DR7_FIXED_1; if (dr7) - set_debugreg(0, 7); + set_debugreg(DR7_FIXED_1, 7); + /* * Ensure the compiler doesn't lower the above statements into * the critical section; disabling breakpoints late would not diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index b4a391929cdb..639d9bcee842 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -31,6 +31,7 @@ #include #include +#include #include #include #include @@ -249,7 +250,6 @@ enum x86_intercept_stage; #define DR7_BP_EN_MASK 0x000000ff #define DR7_GE (1 << 9) #define DR7_GD (1 << 13) -#define DR7_FIXED_1 0x00000400 #define DR7_VOLATILE 0xffff2bff #define KVM_GUESTDBG_VALID_MASK \ diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index 0f6c280a94f0..27125e009847 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -2246,7 +2246,7 @@ EXPORT_PER_CPU_SYMBOL(__stack_chk_guard); static void initialize_debug_regs(void) { /* Control register first -- to make sure everything is disabled. */ - set_debugreg(0, 7); + set_debugreg(DR7_FIXED_1, 7); set_debugreg(DR6_RESERVED, 6); /* dr5 and dr4 don't exist */ set_debugreg(0, 3); diff --git a/arch/x86/kernel/kgdb.c b/arch/x86/kernel/kgdb.c index 102641fd2172..8b1a9733d13e 100644 --- a/arch/x86/kernel/kgdb.c +++ b/arch/x86/kernel/kgdb.c @@ -385,7 +385,7 @@ static void kgdb_disable_hw_debug(struct pt_regs *regs) struct perf_event *bp; /* Disable hardware debugging while we are in kgdb: */ - set_debugreg(0UL, 7); + set_debugreg(DR7_FIXED_1, 7); for (i = 0; i < HBP_NUM; i++) { if (!breakinfo[i].enabled) continue; diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c index a10e180cbf23..3ef15c2f152f 100644 --- a/arch/x86/kernel/process_32.c +++ b/arch/x86/kernel/process_32.c @@ -93,7 +93,7 @@ void __show_regs(struct pt_regs *regs, enum show_regs_mode mode, /* Only print out debug registers if they are in their non-default state. */ if ((d0 == 0) && (d1 == 0) && (d2 == 0) && (d3 == 0) && - (d6 == DR6_RESERVED) && (d7 == 0x400)) + (d6 == DR6_RESERVED) && (d7 == DR7_FIXED_1)) return; printk("%sDR0: %08lx DR1: %08lx DR2: %08lx DR3: %08lx\n", diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c index 8d6cf25127aa..b972bf72fb8b 100644 --- a/arch/x86/kernel/process_64.c +++ b/arch/x86/kernel/process_64.c @@ -133,7 +133,7 @@ void __show_regs(struct pt_regs *regs, enum show_regs_mode mode, /* Only print out debug registers if they are in their non-default state. */ if (!((d0 == 0) && (d1 == 0) && (d2 == 0) && (d3 == 0) && - (d6 == DR6_RESERVED) && (d7 == 0x400))) { + (d6 == DR6_RESERVED) && (d7 == DR7_FIXED_1))) { printk("%sDR0: %016lx DR1: %016lx DR2: %016lx\n", log_lvl, d0, d1, d2); printk("%sDR3: %016lx DR6: %016lx DR7: %016lx\n", diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index b58a74c1722d..a9d992d5652f 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -11035,7 +11035,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) if (unlikely(vcpu->arch.switch_db_regs && !(vcpu->arch.switch_db_regs & KVM_DEBUGREG_AUTO_SWITCH))) { - set_debugreg(0, 7); + set_debugreg(DR7_FIXED_1, 7); set_debugreg(vcpu->arch.eff_db[0], 0); set_debugreg(vcpu->arch.eff_db[1], 1); set_debugreg(vcpu->arch.eff_db[2], 2); @@ -11044,7 +11044,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) if (unlikely(vcpu->arch.switch_db_regs & KVM_DEBUGREG_WONT_EXIT)) kvm_x86_call(set_dr6)(vcpu, vcpu->arch.dr6); } else if (unlikely(hw_breakpoint_active())) { - set_debugreg(0, 7); + set_debugreg(DR7_FIXED_1, 7); } vcpu->arch.host_debugctl = get_debugctlmsr(); From f5109c201cf2bc304a05ecc40c2aabb119b27833 Mon Sep 17 00:00:00 2001 From: Kent Overstreet Date: Tue, 24 Jun 2025 17:53:00 -0400 Subject: [PATCH 158/289] bcachefs: Use wait_on_allocator() when allocating journal wait_on_allocator() emits debug info when we hang trying to allocate. Signed-off-by: Kent Overstreet --- fs/bcachefs/journal.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/bcachefs/journal.c b/fs/bcachefs/journal.c index df71af0013ba..f22b05e02c1e 100644 --- a/fs/bcachefs/journal.c +++ b/fs/bcachefs/journal.c @@ -1283,7 +1283,7 @@ static int bch2_set_nr_journal_buckets_loop(struct bch_fs *c, struct bch_dev *ca ret = 0; /* wait and retry */ bch2_disk_reservation_put(c, &disk_res); - closure_sync(&cl); + bch2_wait_on_allocator(c, &cl); } return ret; From 1f8aede70d491a1d5867f575ca44c86fe2e335ae Mon Sep 17 00:00:00 2001 From: Kent Overstreet Date: Tue, 24 Jun 2025 16:04:32 -0400 Subject: [PATCH 159/289] bcachefs: fix bch2_journal_keys_peek_prev_min() underflow Signed-off-by: Kent Overstreet --- fs/bcachefs/btree_journal_iter.c | 20 +++++++++++++++----- 1 file changed, 15 insertions(+), 5 deletions(-) diff --git a/fs/bcachefs/btree_journal_iter.c b/fs/bcachefs/btree_journal_iter.c index a41fabd06332..ea839560a136 100644 --- a/fs/bcachefs/btree_journal_iter.c +++ b/fs/bcachefs/btree_journal_iter.c @@ -137,12 +137,15 @@ struct bkey_i *bch2_journal_keys_peek_prev_min(struct bch_fs *c, enum btree_id b struct journal_key *k; BUG_ON(*idx > keys->nr); + + if (!keys->nr) + return NULL; search: if (!*idx) *idx = __bch2_journal_key_search(keys, btree_id, level, pos); while (*idx < keys->nr && - __journal_key_cmp(btree_id, level, end_pos, idx_to_key(keys, *idx - 1)) >= 0) { + __journal_key_cmp(btree_id, level, end_pos, idx_to_key(keys, *idx)) >= 0) { (*idx)++; iters++; if (iters == 10) { @@ -151,18 +154,23 @@ search: } } + if (*idx == keys->nr) + --(*idx); + struct bkey_i *ret = NULL; rcu_read_lock(); /* for overwritten_ranges */ - while ((k = *idx < keys->nr ? idx_to_key(keys, *idx) : NULL)) { + while (true) { + k = idx_to_key(keys, *idx); if (__journal_key_cmp(btree_id, level, end_pos, k) > 0) break; if (k->overwritten) { if (k->overwritten_range) - *idx = rcu_dereference(k->overwritten_range)->start - 1; - else - *idx -= 1; + *idx = rcu_dereference(k->overwritten_range)->start; + if (!*idx) + break; + --(*idx); continue; } @@ -171,6 +179,8 @@ search: break; } + if (!*idx) + break; --(*idx); iters++; if (iters == 10) { From 524346e9d79f63a6e5aaa645140da3d1ec7a8a0f Mon Sep 17 00:00:00 2001 From: Ming Lei Date: Wed, 25 Jun 2025 10:25:54 +0800 Subject: [PATCH 160/289] ublk: build batch from IOs in same io_ring_ctx and io task ublk_queue_cmd_list() dispatches the whole batch list by scheduling task work via the tail request's io_uring_cmd, this way is fine even though more than one io_ring_ctx are involved for this batch since it is just one running context. However, the task work handler ublk_cmd_list_tw_cb() takes `issue_flags` of tail uring_cmd's io_ring_ctx for completing all commands. This way is wrong if any uring_cmd is issued from different io_ring_ctx. Fixes it by always building batch IOs from same io_ring_ctx and io task because ublk_dispatch_req() does validate task context, and IO needs to be aborted in case of running from fallback task work context. For typical per-queue or per-io daemon implementation, this way shouldn't make difference from performance viewpoint, because single io_ring_ctx is taken in each daemon for normal use case. Fixes: d796cea7b9f3 ("ublk: implement ->queue_rqs()") Signed-off-by: Ming Lei Link: https://lore.kernel.org/r/20250625022554.883571-1-ming.lei@redhat.com Signed-off-by: Jens Axboe --- drivers/block/ublk_drv.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c index d36f44f5ee80..90c02ced392a 100644 --- a/drivers/block/ublk_drv.c +++ b/drivers/block/ublk_drv.c @@ -1416,6 +1416,14 @@ static blk_status_t ublk_queue_rq(struct blk_mq_hw_ctx *hctx, return BLK_STS_OK; } +static inline bool ublk_belong_to_same_batch(const struct ublk_io *io, + const struct ublk_io *io2) +{ + return (io_uring_cmd_ctx_handle(io->cmd) == + io_uring_cmd_ctx_handle(io2->cmd)) && + (io->task == io2->task); +} + static void ublk_queue_rqs(struct rq_list *rqlist) { struct rq_list requeue_list = { }; @@ -1427,7 +1435,8 @@ static void ublk_queue_rqs(struct rq_list *rqlist) struct ublk_queue *this_q = req->mq_hctx->driver_data; struct ublk_io *this_io = &this_q->ios[req->tag]; - if (io && io->task != this_io->task && !rq_list_empty(&submit_list)) + if (io && !ublk_belong_to_same_batch(io, this_io) && + !rq_list_empty(&submit_list)) ublk_queue_cmd_list(io, &submit_list); io = this_io; From 5223372e672eaa576c892984b438875426d8ff2b Mon Sep 17 00:00:00 2001 From: Ming Lei Date: Mon, 23 Jun 2025 09:19:27 +0800 Subject: [PATCH 161/289] selftests: ublk: don't take same backing file for more than one ublk devices Don't use same backing file for more than one ublk devices, and avoid concurrent write on same file from more ublk disks. Fixes: 8ccebc19ee3d ("selftests: ublk: support UBLK_F_AUTO_BUF_REG") Signed-off-by: Ming Lei Link: https://lore.kernel.org/r/20250623011934.741788-3-ming.lei@redhat.com Signed-off-by: Jens Axboe --- tools/testing/selftests/ublk/test_stress_03.sh | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/ublk/test_stress_03.sh b/tools/testing/selftests/ublk/test_stress_03.sh index 6eef282d569f..3ed4c9b2d8c0 100755 --- a/tools/testing/selftests/ublk/test_stress_03.sh +++ b/tools/testing/selftests/ublk/test_stress_03.sh @@ -32,22 +32,23 @@ _create_backfile 2 128M ublk_io_and_remove 8G -t null -q 4 -z & ublk_io_and_remove 256M -t loop -q 4 -z "${UBLK_BACKFILES[0]}" & ublk_io_and_remove 256M -t stripe -q 4 -z "${UBLK_BACKFILES[1]}" "${UBLK_BACKFILES[2]}" & +wait if _have_feature "AUTO_BUF_REG"; then ublk_io_and_remove 8G -t null -q 4 --auto_zc & ublk_io_and_remove 256M -t loop -q 4 --auto_zc "${UBLK_BACKFILES[0]}" & ublk_io_and_remove 256M -t stripe -q 4 --auto_zc "${UBLK_BACKFILES[1]}" "${UBLK_BACKFILES[2]}" & ublk_io_and_remove 8G -t null -q 4 -z --auto_zc --auto_zc_fallback & + wait fi -wait if _have_feature "PER_IO_DAEMON"; then ublk_io_and_remove 8G -t null -q 4 --auto_zc --nthreads 8 --per_io_tasks & ublk_io_and_remove 256M -t loop -q 4 --auto_zc --nthreads 8 --per_io_tasks "${UBLK_BACKFILES[0]}" & ublk_io_and_remove 256M -t stripe -q 4 --auto_zc --nthreads 8 --per_io_tasks "${UBLK_BACKFILES[1]}" "${UBLK_BACKFILES[2]}" & ublk_io_and_remove 8G -t null -q 4 -z --auto_zc --auto_zc_fallback --nthreads 8 --per_io_tasks & + wait fi -wait _cleanup_test "stress" _show_result $TID $ERR_CODE From 67caa528ae08cd05e485c0ea6aea0baaf6579b06 Mon Sep 17 00:00:00 2001 From: Caleb Sander Mateos Date: Sat, 21 Jun 2025 10:28:41 -0600 Subject: [PATCH 162/289] ublk: fix narrowing warnings in UAPI header When a C++ file compiled with -Wc++11-narrowing includes the UAPI header linux/ublk_cmd.h, ublk_sqe_addr_to_auto_buf_reg()'s assignments of u64 values to u8, u16, and u32 fields result in compiler warnings. Add explicit casts to the intended types to avoid these warnings. Drop the unnecessary bitmasks. Reported-by: Uday Shankar Signed-off-by: Caleb Sander Mateos Fixes: 99c1e4eb6a3f ("ublk: register buffer to local io_uring with provided buf index via UBLK_F_AUTO_BUF_REG") Reviewed-by: Ming Lei Link: https://lore.kernel.org/r/20250621162842.337452-1-csander@purestorage.com Signed-off-by: Jens Axboe --- include/uapi/linux/ublk_cmd.h | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/include/uapi/linux/ublk_cmd.h b/include/uapi/linux/ublk_cmd.h index 77d9d6af46da..c062109cb686 100644 --- a/include/uapi/linux/ublk_cmd.h +++ b/include/uapi/linux/ublk_cmd.h @@ -450,10 +450,10 @@ static inline struct ublk_auto_buf_reg ublk_sqe_addr_to_auto_buf_reg( __u64 sqe_addr) { struct ublk_auto_buf_reg reg = { - .index = sqe_addr & 0xffff, - .flags = (sqe_addr >> 16) & 0xff, - .reserved0 = (sqe_addr >> 24) & 0xff, - .reserved1 = sqe_addr >> 32, + .index = (__u16)sqe_addr, + .flags = (__u8)(sqe_addr >> 16), + .reserved0 = (__u8)(sqe_addr >> 24), + .reserved1 = (__u32)(sqe_addr >> 32), }; return reg; From 81b4d1a1d03301dcca8af5c58eded9e535f1f6ed Mon Sep 17 00:00:00 2001 From: Caleb Sander Mateos Date: Sat, 21 Jun 2025 11:10:14 -0600 Subject: [PATCH 163/289] ublk: update UBLK_F_SUPPORT_ZERO_COPY comment in UAPI header UBLK_F_SUPPORT_ZERO_COPY has a very old comment describing the initial idea for how zero-copy would be implemented. The actual implementation added in commit 1f6540e2aabb ("ublk: zc register/unregister bvec") uses io_uring registered buffers rather than shared memory mapping. Remove the inaccurate remarks about mapping ublk request memory into the ublk server's address space and requiring 4K block size. Replace them with a description of the current zero-copy mechanism. Signed-off-by: Caleb Sander Mateos Reviewed-by: Ming Lei Link: https://lore.kernel.org/r/20250621171015.354932-1-csander@purestorage.com Signed-off-by: Jens Axboe --- include/uapi/linux/ublk_cmd.h | 24 ++++++++++++++++++++++-- 1 file changed, 22 insertions(+), 2 deletions(-) diff --git a/include/uapi/linux/ublk_cmd.h b/include/uapi/linux/ublk_cmd.h index c062109cb686..c9751bdfd937 100644 --- a/include/uapi/linux/ublk_cmd.h +++ b/include/uapi/linux/ublk_cmd.h @@ -135,8 +135,28 @@ #define UBLKSRV_IO_BUF_TOTAL_SIZE (1ULL << UBLKSRV_IO_BUF_TOTAL_BITS) /* - * zero copy requires 4k block size, and can remap ublk driver's io - * request into ublksrv's vm space + * ublk server can register data buffers for incoming I/O requests with a sparse + * io_uring buffer table. The request buffer can then be used as the data buffer + * for io_uring operations via the fixed buffer index. + * Note that the ublk server can never directly access the request data memory. + * + * To use this feature, the ublk server must first register a sparse buffer + * table on an io_uring instance. + * When an incoming ublk request is received, the ublk server submits a + * UBLK_U_IO_REGISTER_IO_BUF command to that io_uring instance. The + * ublksrv_io_cmd's q_id and tag specify the request whose buffer to register + * and addr is the index in the io_uring's buffer table to install the buffer. + * SQEs can now be submitted to the io_uring to read/write the request's buffer + * by enabling fixed buffers (e.g. using IORING_OP_{READ,WRITE}_FIXED or + * IORING_URING_CMD_FIXED) and passing the registered buffer index in buf_index. + * Once the last io_uring operation using the request's buffer has completed, + * the ublk server submits a UBLK_U_IO_UNREGISTER_IO_BUF command with q_id, tag, + * and addr again specifying the request buffer to unregister. + * The ublk request is completed when its buffer is unregistered from all + * io_uring instances and the ublk server issues UBLK_U_IO_COMMIT_AND_FETCH_REQ. + * + * Not available for UBLK_F_UNPRIVILEGED_DEV, as a ublk server can leak + * uninitialized kernel memory by not reading into the full request buffer. */ #define UBLK_F_SUPPORT_ZERO_COPY (1ULL << 0) From 4c8a951787ffc4b61a547db9866196104971b5fd Mon Sep 17 00:00:00 2001 From: Ming Lei Date: Tue, 24 Jun 2025 18:41:21 +0800 Subject: [PATCH 164/289] ublk: setup ublk_io correctly in case of ublk_get_data() failure If ublk_get_data() fails, -EIOCBQUEUED is returned and the current command becomes ASYNC. And the only reason is that mapping data can't move on, because of no enough pages or pending signal, then the current ublk request has to be requeued. Once the request need to be requeued, we have to setup `ublk_io` correctly, including io->cmd and flags, otherwise the request may not be forwarded to ublk server successfully. Fixes: 9810362a57cb ("ublk: don't call ublk_dispatch_req() for NEED_GET_DATA") Reported-by: Changhui Zhong Closes: https://lore.kernel.org/linux-block/CAGVVp+VN9QcpHUz_0nasFf5q9i1gi8H8j-G-6mkBoqa3TyjRHA@mail.gmail.com/ Signed-off-by: Ming Lei Tested-by: Changhui Zhong Link: https://lore.kernel.org/r/20250624104121.859519-1-ming.lei@redhat.com Signed-off-by: Jens Axboe --- drivers/block/ublk_drv.c | 35 +++++++++++++++++++++++++---------- 1 file changed, 25 insertions(+), 10 deletions(-) diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c index 90c02ced392a..d441d3259edb 100644 --- a/drivers/block/ublk_drv.c +++ b/drivers/block/ublk_drv.c @@ -1148,8 +1148,8 @@ exit: blk_mq_end_request(req, res); } -static void ublk_complete_io_cmd(struct ublk_io *io, struct request *req, - int res, unsigned issue_flags) +static struct io_uring_cmd *__ublk_prep_compl_io_cmd(struct ublk_io *io, + struct request *req) { /* read cmd first because req will overwrite it */ struct io_uring_cmd *cmd = io->cmd; @@ -1164,6 +1164,13 @@ static void ublk_complete_io_cmd(struct ublk_io *io, struct request *req, io->flags &= ~UBLK_IO_FLAG_ACTIVE; io->req = req; + return cmd; +} + +static void ublk_complete_io_cmd(struct ublk_io *io, struct request *req, + int res, unsigned issue_flags) +{ + struct io_uring_cmd *cmd = __ublk_prep_compl_io_cmd(io, req); /* tell ublksrv one io request is coming */ io_uring_cmd_done(cmd, res, 0, issue_flags); @@ -2157,10 +2164,9 @@ static int ublk_commit_and_fetch(const struct ublk_queue *ubq, return 0; } -static bool ublk_get_data(const struct ublk_queue *ubq, struct ublk_io *io) +static bool ublk_get_data(const struct ublk_queue *ubq, struct ublk_io *io, + struct request *req) { - struct request *req = io->req; - /* * We have handled UBLK_IO_NEED_GET_DATA command, * so clear UBLK_IO_FLAG_NEED_GET_DATA now and just @@ -2187,6 +2193,7 @@ static int __ublk_ch_uring_cmd(struct io_uring_cmd *cmd, u32 cmd_op = cmd->cmd_op; unsigned tag = ub_cmd->tag; int ret = -EINVAL; + struct request *req; pr_devel("%s: received: cmd op %d queue %d tag %d result %d\n", __func__, cmd->cmd_op, ub_cmd->q_id, tag, @@ -2245,11 +2252,19 @@ static int __ublk_ch_uring_cmd(struct io_uring_cmd *cmd, goto out; break; case UBLK_IO_NEED_GET_DATA: - io->addr = ub_cmd->addr; - if (!ublk_get_data(ubq, io)) - return -EIOCBQUEUED; - - return UBLK_IO_RES_OK; + /* + * ublk_get_data() may fail and fallback to requeue, so keep + * uring_cmd active first and prepare for handling new requeued + * request + */ + req = io->req; + ublk_fill_io_cmd(io, cmd, ub_cmd->addr); + io->flags &= ~UBLK_IO_FLAG_OWNED_BY_SRV; + if (likely(ublk_get_data(ubq, io, req))) { + __ublk_prep_compl_io_cmd(io, req); + return UBLK_IO_RES_OK; + } + break; default: goto out; } From 5afb4bf9fc62d828647647ec31745083637132e4 Mon Sep 17 00:00:00 2001 From: Pavel Begunkov Date: Tue, 24 Jun 2025 14:40:33 +0100 Subject: [PATCH 165/289] io_uring/rsrc: fix folio unpinning syzbot complains about an unmapping failure: [ 108.070381][ T14] kernel BUG at mm/gup.c:71! [ 108.070502][ T14] Internal error: Oops - BUG: 00000000f2000800 [#1] SMP [ 108.123672][ T14] Hardware name: QEMU KVM Virtual Machine, BIOS edk2-20250221-8.fc42 02/21/2025 [ 108.127458][ T14] Workqueue: iou_exit io_ring_exit_work [ 108.174205][ T14] Call trace: [ 108.175649][ T14] sanity_check_pinned_pages+0x7cc/0x7d0 (P) [ 108.178138][ T14] unpin_user_page+0x80/0x10c [ 108.180189][ T14] io_release_ubuf+0x84/0xf8 [ 108.182196][ T14] io_free_rsrc_node+0x250/0x57c [ 108.184345][ T14] io_rsrc_data_free+0x148/0x298 [ 108.186493][ T14] io_sqe_buffers_unregister+0x84/0xa0 [ 108.188991][ T14] io_ring_ctx_free+0x48/0x480 [ 108.191057][ T14] io_ring_exit_work+0x764/0x7d8 [ 108.193207][ T14] process_one_work+0x7e8/0x155c [ 108.195431][ T14] worker_thread+0x958/0xed8 [ 108.197561][ T14] kthread+0x5fc/0x75c [ 108.199362][ T14] ret_from_fork+0x10/0x20 We can pin a tail page of a folio, but then io_uring will try to unpin the head page of the folio. While it should be fine in terms of keeping the page actually alive, mm folks say it's wrong and triggers a debug warning. Use unpin_user_folio() instead of unpin_user_page*. Cc: stable@vger.kernel.org Debugged-by: David Hildenbrand Reported-by: syzbot+1d335893772467199ab6@syzkaller.appspotmail.com Closes: https://lkml.kernel.org/r/683f1551.050a0220.55ceb.0017.GAE@google.com Fixes: a8edbb424b139 ("io_uring/rsrc: enable multi-hugepage buffer coalescing") Signed-off-by: Pavel Begunkov Link: https://lore.kernel.org/io-uring/a28b0f87339ac2acf14a645dad1e95bbcbf18acd.1750771718.git.asml.silence@gmail.com/ [axboe: adapt to current tree, massage commit message] Signed-off-by: Jens Axboe --- io_uring/rsrc.c | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c index d724602697e7..0c09e38784c9 100644 --- a/io_uring/rsrc.c +++ b/io_uring/rsrc.c @@ -112,8 +112,11 @@ static void io_release_ubuf(void *priv) struct io_mapped_ubuf *imu = priv; unsigned int i; - for (i = 0; i < imu->nr_bvecs; i++) - unpin_user_page(imu->bvec[i].bv_page); + for (i = 0; i < imu->nr_bvecs; i++) { + struct folio *folio = page_folio(imu->bvec[i].bv_page); + + unpin_user_folio(folio, 1); + } } static struct io_mapped_ubuf *io_alloc_imu(struct io_ring_ctx *ctx, @@ -840,8 +843,10 @@ done: if (ret) { if (imu) io_free_imu(ctx, imu); - if (pages) - unpin_user_pages(pages, nr_pages); + if (pages) { + for (i = 0; i < nr_pages; i++) + unpin_user_folio(page_folio(pages[i]), 1); + } io_cache_free(&ctx->node_cache, node); node = ERR_PTR(ret); } From 3a3c6d61577dbb23c09df3e21f6f9eda1ecd634b Mon Sep 17 00:00:00 2001 From: Pavel Begunkov Date: Tue, 24 Jun 2025 14:40:34 +0100 Subject: [PATCH 166/289] io_uring/rsrc: don't rely on user vaddr alignment There is no guaranteed alignment for user pointers, however the calculation of an offset of the first page into a folio after coalescing uses some weird bit mask logic, get rid of it. Cc: stable@vger.kernel.org Reported-by: David Hildenbrand Fixes: a8edbb424b139 ("io_uring/rsrc: enable multi-hugepage buffer coalescing") Signed-off-by: Pavel Begunkov Link: https://lore.kernel.org/io-uring/e387b4c78b33f231105a601d84eefd8301f57954.1750771718.git.asml.silence@gmail.com/ Signed-off-by: Jens Axboe --- io_uring/rsrc.c | 7 ++++++- io_uring/rsrc.h | 1 + 2 files changed, 7 insertions(+), 1 deletion(-) diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c index 0c09e38784c9..afc67530f912 100644 --- a/io_uring/rsrc.c +++ b/io_uring/rsrc.c @@ -734,6 +734,7 @@ bool io_check_coalesce_buffer(struct page **page_array, int nr_pages, data->nr_pages_mid = folio_nr_pages(folio); data->folio_shift = folio_shift(folio); + data->first_folio_page_idx = folio_page_idx(folio, page_array[0]); /* * Check if pages are contiguous inside a folio, and all folios have @@ -827,7 +828,11 @@ static struct io_rsrc_node *io_sqe_buffer_register(struct io_ring_ctx *ctx, if (coalesced) imu->folio_shift = data.folio_shift; refcount_set(&imu->refs, 1); - off = (unsigned long) iov->iov_base & ((1UL << imu->folio_shift) - 1); + + off = (unsigned long)iov->iov_base & ~PAGE_MASK; + if (coalesced) + off += data.first_folio_page_idx << PAGE_SHIFT; + node->buf = imu; ret = 0; diff --git a/io_uring/rsrc.h b/io_uring/rsrc.h index 0d2138f16322..25e7e998dcfd 100644 --- a/io_uring/rsrc.h +++ b/io_uring/rsrc.h @@ -49,6 +49,7 @@ struct io_imu_folio_data { unsigned int nr_pages_mid; unsigned int folio_shift; unsigned int nr_folios; + unsigned long first_folio_page_idx; }; bool io_rsrc_cache_init(struct io_ring_ctx *ctx); From e1d7727b73a1f78035316ac35ee184d477059f0b Mon Sep 17 00:00:00 2001 From: Pavel Begunkov Date: Tue, 24 Jun 2025 14:40:35 +0100 Subject: [PATCH 167/289] io_uring: don't assume uaddr alignment in io_vec_fill_bvec There is no guaranteed alignment for user pointers. Don't use mask trickery and adjust the offset by bv_offset. Cc: stable@vger.kernel.org Reported-by: David Hildenbrand Fixes: 9ef4cbbcb4ac3 ("io_uring: add infra for importing vectored reg buffers") Signed-off-by: Pavel Begunkov Link: https://lore.kernel.org/io-uring/19530391f5c361a026ac9b401ff8e123bde55d98.1750771718.git.asml.silence@gmail.com/ Signed-off-by: Jens Axboe --- io_uring/rsrc.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c index afc67530f912..f2b31fb68992 100644 --- a/io_uring/rsrc.c +++ b/io_uring/rsrc.c @@ -1339,7 +1339,6 @@ static int io_vec_fill_bvec(int ddir, struct iov_iter *iter, { unsigned long folio_size = 1 << imu->folio_shift; unsigned long folio_mask = folio_size - 1; - u64 folio_addr = imu->ubuf & ~folio_mask; struct bio_vec *res_bvec = vec->bvec; size_t total_len = 0; unsigned bvec_idx = 0; @@ -1361,8 +1360,13 @@ static int io_vec_fill_bvec(int ddir, struct iov_iter *iter, if (unlikely(check_add_overflow(total_len, iov_len, &total_len))) return -EOVERFLOW; - /* by using folio address it also accounts for bvec offset */ - offset = buf_addr - folio_addr; + offset = buf_addr - imu->ubuf; + /* + * Only the first bvec can have non zero bv_offset, account it + * here and work with full folios below. + */ + offset += imu->bvec[0].bv_offset; + src_bvec = imu->bvec + (offset >> imu->folio_shift); offset &= folio_mask; From 5e9571750c4e53d16727a04159455c693d7b31cb Mon Sep 17 00:00:00 2001 From: Pei Xiao Date: Tue, 24 Jun 2025 17:00:47 +0800 Subject: [PATCH 168/289] ALSA: usb: qcom: fix NULL pointer dereference in qmi_stop_session The find_substream() call may return NULL, but the error path dereferenced 'subs' unconditionally via dev_err(&subs->dev->dev, ...), causing a NULL pointer dereference when subs is NULL. Fix by switching to &uadev[idx].udev->dev which is always valid in this context. Signed-off-by: Pei Xiao Link: https://patch.msgid.link/86ac2939273ac853535049e60391c09d7688714e.1750755508.git.xiaopei01@kylinos.cn Signed-off-by: Takashi Iwai --- sound/usb/qcom/qc_audio_offload.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sound/usb/qcom/qc_audio_offload.c b/sound/usb/qcom/qc_audio_offload.c index 797afd4561bd..3543b5a53592 100644 --- a/sound/usb/qcom/qc_audio_offload.c +++ b/sound/usb/qcom/qc_audio_offload.c @@ -759,7 +759,7 @@ static void qmi_stop_session(void) subs = find_substream(pcm_card_num, info->pcm_dev_num, info->direction); if (!subs || !chip || atomic_read(&chip->shutdown)) { - dev_err(&subs->dev->dev, + dev_err(&uadev[idx].udev->dev, "no sub for c#%u dev#%u dir%u\n", info->pcm_card_num, info->pcm_dev_num, From d02b2103a08b6d6908f1d3d8e8783d3f342555ac Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Fri, 20 Jun 2025 13:18:18 +0200 Subject: [PATCH 169/289] drm/i915: fix build error some more An earlier patch fixed a build failure with clang, but I still see the same problem with some configurations using gcc: drivers/gpu/drm/i915/i915_pmu.c: In function 'config_mask': include/linux/compiler_types.h:568:38: error: call to '__compiletime_assert_462' declared with attribute error: BUILD_BUG_ON failed: bit > BITS_PER_TYPE(typeof_member(struct i915_pmu, enable)) - 1 drivers/gpu/drm/i915/i915_pmu.c:116:3: note: in expansion of macro 'BUILD_BUG_ON' 116 | BUILD_BUG_ON(bit > As I understand it, the problem is that the function is not always fully inlined, but the __builtin_constant_p() can still evaluate the argument as being constant. Marking it as __always_inline so far works for me in all configurations. Fixes: a7137b1825b5 ("drm/i915/pmu: Fix build error with GCOV and AutoFDO enabled") Fixes: a644fde77ff7 ("drm/i915/pmu: Change bitmask of enabled events to u32") Reviewed-by: Rodrigo Vivi Signed-off-by: Arnd Bergmann Link: https://lore.kernel.org/r/20250620111824.3395007-1-arnd@kernel.org Signed-off-by: Rodrigo Vivi (cherry picked from commit ef69f9dd1cd7301cdf04ba326ed28152a3affcf6) Signed-off-by: Joonas Lahtinen --- drivers/gpu/drm/i915/i915_pmu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c index 990bfaba3ce4..5bc696bfbb0f 100644 --- a/drivers/gpu/drm/i915/i915_pmu.c +++ b/drivers/gpu/drm/i915/i915_pmu.c @@ -108,7 +108,7 @@ static unsigned int config_bit(const u64 config) return other_bit(config); } -static u32 config_mask(const u64 config) +static __always_inline u32 config_mask(const u64 config) { unsigned int bit = config_bit(config); From 2ed25aa7f7711f508b6120e336f05cd9d49943c0 Mon Sep 17 00:00:00 2001 From: Or Har-Toov Date: Mon, 16 Jun 2025 11:14:09 +0300 Subject: [PATCH 170/289] IB/mlx5: Fix potential deadlock in MR deregistration MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The issue arises when kzalloc() is invoked while holding umem_mutex or any other lock acquired under umem_mutex. This is problematic because kzalloc() can trigger fs_reclaim_aqcuire(), which may, in turn, invoke mmu_notifier_invalidate_range_start(). This function can lead to mlx5_ib_invalidate_range(), which attempts to acquire umem_mutex again, resulting in a deadlock. The problematic flow: CPU0 | CPU1 ---------------------------------------|------------------------------------------------ mlx5_ib_dereg_mr() | → revoke_mr() | → mutex_lock(&umem_odp->umem_mutex) | | mlx5_mkey_cache_init() | → mutex_lock(&dev->cache.rb_lock) | → mlx5r_cache_create_ent_locked() | → kzalloc(GFP_KERNEL) | → fs_reclaim() | → mmu_notifier_invalidate_range_start() | → mlx5_ib_invalidate_range() | → mutex_lock(&umem_odp->umem_mutex) → cache_ent_find_and_store() | → mutex_lock(&dev->cache.rb_lock) | Additionally, when kzalloc() is called from within cache_ent_find_and_store(), we encounter the same deadlock due to re-acquisition of umem_mutex. Solve by releasing umem_mutex in dereg_mr() after umr_revoke_mr() and before acquiring rb_lock. This ensures that we don't hold umem_mutex while performing memory allocations that could trigger the reclaim path. This change prevents the deadlock by ensuring proper lock ordering and avoiding holding locks during memory allocation operations that could trigger the reclaim path. The following lockdep warning demonstrates the deadlock: python3/20557 is trying to acquire lock: ffff888387542128 (&umem_odp->umem_mutex){+.+.}-{4:4}, at: mlx5_ib_invalidate_range+0x5b/0x550 [mlx5_ib] but task is already holding lock: ffffffff82f6b840 (mmu_notifier_invalidate_range_start){+.+.}-{0:0}, at: unmap_vmas+0x7b/0x1a0 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #3 (mmu_notifier_invalidate_range_start){+.+.}-{0:0}: fs_reclaim_acquire+0x60/0xd0 mem_cgroup_css_alloc+0x6f/0x9b0 cgroup_init_subsys+0xa4/0x240 cgroup_init+0x1c8/0x510 start_kernel+0x747/0x760 x86_64_start_reservations+0x25/0x30 x86_64_start_kernel+0x73/0x80 common_startup_64+0x129/0x138 -> #2 (fs_reclaim){+.+.}-{0:0}: fs_reclaim_acquire+0x91/0xd0 __kmalloc_cache_noprof+0x4d/0x4c0 mlx5r_cache_create_ent_locked+0x75/0x620 [mlx5_ib] mlx5_mkey_cache_init+0x186/0x360 [mlx5_ib] mlx5_ib_stage_post_ib_reg_umr_init+0x3c/0x60 [mlx5_ib] __mlx5_ib_add+0x4b/0x190 [mlx5_ib] mlx5r_probe+0xd9/0x320 [mlx5_ib] auxiliary_bus_probe+0x42/0x70 really_probe+0xdb/0x360 __driver_probe_device+0x8f/0x130 driver_probe_device+0x1f/0xb0 __driver_attach+0xd4/0x1f0 bus_for_each_dev+0x79/0xd0 bus_add_driver+0xf0/0x200 driver_register+0x6e/0xc0 __auxiliary_driver_register+0x6a/0xc0 do_one_initcall+0x5e/0x390 do_init_module+0x88/0x240 init_module_from_file+0x85/0xc0 idempotent_init_module+0x104/0x300 __x64_sys_finit_module+0x68/0xc0 do_syscall_64+0x6d/0x140 entry_SYSCALL_64_after_hwframe+0x4b/0x53 -> #1 (&dev->cache.rb_lock){+.+.}-{4:4}: __mutex_lock+0x98/0xf10 __mlx5_ib_dereg_mr+0x6f2/0x890 [mlx5_ib] mlx5_ib_dereg_mr+0x21/0x110 [mlx5_ib] ib_dereg_mr_user+0x85/0x1f0 [ib_core] uverbs_free_mr+0x19/0x30 [ib_uverbs] destroy_hw_idr_uobject+0x21/0x80 [ib_uverbs] uverbs_destroy_uobject+0x60/0x3d0 [ib_uverbs] uobj_destroy+0x57/0xa0 [ib_uverbs] ib_uverbs_cmd_verbs+0x4d5/0x1210 [ib_uverbs] ib_uverbs_ioctl+0x129/0x230 [ib_uverbs] __x64_sys_ioctl+0x596/0xaa0 do_syscall_64+0x6d/0x140 entry_SYSCALL_64_after_hwframe+0x4b/0x53 -> #0 (&umem_odp->umem_mutex){+.+.}-{4:4}: __lock_acquire+0x1826/0x2f00 lock_acquire+0xd3/0x2e0 __mutex_lock+0x98/0xf10 mlx5_ib_invalidate_range+0x5b/0x550 [mlx5_ib] __mmu_notifier_invalidate_range_start+0x18e/0x1f0 unmap_vmas+0x182/0x1a0 exit_mmap+0xf3/0x4a0 mmput+0x3a/0x100 do_exit+0x2b9/0xa90 do_group_exit+0x32/0xa0 get_signal+0xc32/0xcb0 arch_do_signal_or_restart+0x29/0x1d0 syscall_exit_to_user_mode+0x105/0x1d0 do_syscall_64+0x79/0x140 entry_SYSCALL_64_after_hwframe+0x4b/0x53 Chain exists of: &dev->cache.rb_lock --> mmu_notifier_invalidate_range_start --> &umem_odp->umem_mutex Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&umem_odp->umem_mutex); lock(mmu_notifier_invalidate_range_start); lock(&umem_odp->umem_mutex); lock(&dev->cache.rb_lock); *** DEADLOCK *** Fixes: abb604a1a9c8 ("RDMA/mlx5: Fix a race for an ODP MR which leads to CQE with error") Signed-off-by: Or Har-Toov Reviewed-by: Michael Guralnik Link: https://patch.msgid.link/3c8f225a8a9fade647d19b014df1172544643e4a.1750061612.git.leon@kernel.org Signed-off-by: Leon Romanovsky --- drivers/infiniband/hw/mlx5/mr.c | 61 +++++++++++++++++++++++++-------- 1 file changed, 47 insertions(+), 14 deletions(-) diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c index 57f9bc2a4a3a..bd35e75d9ce5 100644 --- a/drivers/infiniband/hw/mlx5/mr.c +++ b/drivers/infiniband/hw/mlx5/mr.c @@ -2027,23 +2027,50 @@ void mlx5_ib_revoke_data_direct_mrs(struct mlx5_ib_dev *dev) } } -static int mlx5_revoke_mr(struct mlx5_ib_mr *mr) +static int mlx5_umr_revoke_mr_with_lock(struct mlx5_ib_mr *mr) { - struct mlx5_ib_dev *dev = to_mdev(mr->ibmr.device); - struct mlx5_cache_ent *ent = mr->mmkey.cache_ent; - bool is_odp = is_odp_mr(mr); bool is_odp_dma_buf = is_dmabuf_mr(mr) && - !to_ib_umem_dmabuf(mr->umem)->pinned; - bool from_cache = !!ent; - int ret = 0; + !to_ib_umem_dmabuf(mr->umem)->pinned; + bool is_odp = is_odp_mr(mr); + int ret; if (is_odp) mutex_lock(&to_ib_umem_odp(mr->umem)->umem_mutex); if (is_odp_dma_buf) - dma_resv_lock(to_ib_umem_dmabuf(mr->umem)->attach->dmabuf->resv, NULL); + dma_resv_lock(to_ib_umem_dmabuf(mr->umem)->attach->dmabuf->resv, + NULL); - if (mr->mmkey.cacheable && !mlx5r_umr_revoke_mr(mr) && !cache_ent_find_and_store(dev, mr)) { + ret = mlx5r_umr_revoke_mr(mr); + + if (is_odp) { + if (!ret) + to_ib_umem_odp(mr->umem)->private = NULL; + mutex_unlock(&to_ib_umem_odp(mr->umem)->umem_mutex); + } + + if (is_odp_dma_buf) { + if (!ret) + to_ib_umem_dmabuf(mr->umem)->private = NULL; + dma_resv_unlock( + to_ib_umem_dmabuf(mr->umem)->attach->dmabuf->resv); + } + + return ret; +} + +static int mlx5r_handle_mkey_cleanup(struct mlx5_ib_mr *mr) +{ + bool is_odp_dma_buf = is_dmabuf_mr(mr) && + !to_ib_umem_dmabuf(mr->umem)->pinned; + struct mlx5_ib_dev *dev = to_mdev(mr->ibmr.device); + struct mlx5_cache_ent *ent = mr->mmkey.cache_ent; + bool is_odp = is_odp_mr(mr); + bool from_cache = !!ent; + int ret; + + if (mr->mmkey.cacheable && !mlx5_umr_revoke_mr_with_lock(mr) && + !cache_ent_find_and_store(dev, mr)) { ent = mr->mmkey.cache_ent; /* upon storing to a clean temp entry - schedule its cleanup */ spin_lock_irq(&ent->mkeys_queue.lock); @@ -2055,7 +2082,7 @@ static int mlx5_revoke_mr(struct mlx5_ib_mr *mr) ent->tmp_cleanup_scheduled = true; } spin_unlock_irq(&ent->mkeys_queue.lock); - goto out; + return 0; } if (ent) { @@ -2064,8 +2091,14 @@ static int mlx5_revoke_mr(struct mlx5_ib_mr *mr) mr->mmkey.cache_ent = NULL; spin_unlock_irq(&ent->mkeys_queue.lock); } + + if (is_odp) + mutex_lock(&to_ib_umem_odp(mr->umem)->umem_mutex); + + if (is_odp_dma_buf) + dma_resv_lock(to_ib_umem_dmabuf(mr->umem)->attach->dmabuf->resv, + NULL); ret = destroy_mkey(dev, mr); -out: if (is_odp) { if (!ret) to_ib_umem_odp(mr->umem)->private = NULL; @@ -2075,9 +2108,9 @@ out: if (is_odp_dma_buf) { if (!ret) to_ib_umem_dmabuf(mr->umem)->private = NULL; - dma_resv_unlock(to_ib_umem_dmabuf(mr->umem)->attach->dmabuf->resv); + dma_resv_unlock( + to_ib_umem_dmabuf(mr->umem)->attach->dmabuf->resv); } - return ret; } @@ -2126,7 +2159,7 @@ static int __mlx5_ib_dereg_mr(struct ib_mr *ibmr) } /* Stop DMA */ - rc = mlx5_revoke_mr(mr); + rc = mlx5r_handle_mkey_cleanup(mr); if (rc) return rc; From 3f5f6321f129ad5a30aa03c99c196b4612be68a8 Mon Sep 17 00:00:00 2001 From: Or Har-Toov Date: Mon, 16 Jun 2025 11:16:03 +0300 Subject: [PATCH 171/289] IB/core: Annotate umem_mutex acquisition under fs_reclaim for lockdep MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Following the fix in the previous commit ("IB/mlx5: Fix potential deadlock in MR deregistration"), teach lockdep explicitly about the locking order between fs_reclaim and umem_mutex. The previous commit resolved a potential deadlock scenario where kzalloc(GFP_KERNEL) was called while holding umem_mutex, which could lead to reclaim and eventually invoke the MMU notifier (mlx5_ib_invalidate_range()), causing a recursive acquisition of umem_mutex. To prevent such issues from reoccurring unnoticed in future code changes, add a lockdep annotation in ib_init_umem_odp() that simulates taking umem_mutex inside a reclaim context. This makes lockdep aware of this locking dependency and ensures that future violations—such as calling kzalloc() or any memory allocator that may enter reclaim while holding umem_mutex—will immediately raise a lockdep warning. Signed-off-by: Or Har-Toov Reviewed-by: Michael Guralnik Link: https://patch.msgid.link/9d31b9d8fe1db648a9f47cec3df6b8463319dee5.1750061698.git.leon@kernel.org Signed-off-by: Leon Romanovsky --- drivers/infiniband/core/umem_odp.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/umem_odp.c index c752ae9fad6c..b1c44ec1a3f3 100644 --- a/drivers/infiniband/core/umem_odp.c +++ b/drivers/infiniband/core/umem_odp.c @@ -76,6 +76,17 @@ static int ib_init_umem_odp(struct ib_umem_odp *umem_odp, end = ALIGN(end, page_size); if (unlikely(end < page_size)) return -EOVERFLOW; + /* + * The mmu notifier can be called within reclaim contexts and takes the + * umem_mutex. This is rare to trigger in testing, teach lockdep about + * it. + */ + if (IS_ENABLED(CONFIG_LOCKDEP)) { + fs_reclaim_acquire(GFP_KERNEL); + mutex_lock(&umem_odp->umem_mutex); + mutex_unlock(&umem_odp->umem_mutex); + fs_reclaim_release(GFP_KERNEL); + } nr_entries = (end - start) >> PAGE_SHIFT; if (!(nr_entries * PAGE_SIZE / page_size)) From 3cc1dbfddf88dc5ecce0a75185061403b1f7352d Mon Sep 17 00:00:00 2001 From: Patrisious Haddad Date: Mon, 16 Jun 2025 12:14:52 +0300 Subject: [PATCH 172/289] RDMA/mlx5: Fix HW counters query for non-representor devices To get the device HW counters, a non-representor switchdev device should use the mlx5_ib_query_q_counters() function and query all of the available counters. While a representor device in switchdev mode should use the mlx5_ib_query_q_counters_vport() function and query only the Q_Counters without the PPCNT counters and congestion control counters, since they aren't relevant for a representor device. Currently a non-representor switchdev device skips querying the PPCNT counters and congestion control counters, leaving them unupdated. Fix that by properly querying those counters for non-representor devices. Fixes: d22467a71ebe ("RDMA/mlx5: Expand switchdev Q-counters to expose representor statistics") Signed-off-by: Patrisious Haddad Reviewed-by: Maher Sanalla Link: https://patch.msgid.link/56bf8af4ca8c58e3fb9f7e47b1dca2009eeeed81.1750064969.git.leon@kernel.org Signed-off-by: Leon Romanovsky --- drivers/infiniband/hw/mlx5/counters.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/infiniband/hw/mlx5/counters.c b/drivers/infiniband/hw/mlx5/counters.c index b847084dcd99..943e9eb2ad20 100644 --- a/drivers/infiniband/hw/mlx5/counters.c +++ b/drivers/infiniband/hw/mlx5/counters.c @@ -398,7 +398,7 @@ static int do_get_hw_stats(struct ib_device *ibdev, return ret; /* We don't expose device counters over Vports */ - if (is_mdev_switchdev_mode(dev->mdev) && port_num != 0) + if (is_mdev_switchdev_mode(dev->mdev) && dev->is_rep && port_num != 0) goto done; if (MLX5_CAP_PCAM_FEATURE(dev->mdev, rx_icrc_encapsulated_counter)) { From acd245b1e33fc4b9d0f2e3372021d632f7ee0652 Mon Sep 17 00:00:00 2001 From: Patrisious Haddad Date: Mon, 16 Jun 2025 12:14:53 +0300 Subject: [PATCH 173/289] RDMA/mlx5: Fix CC counters query for MPV In case, CC counters are querying for the second port use the correct core device for the query instead of always using the master core device. Fixes: aac4492ef23a ("IB/mlx5: Update counter implementation for dual port RoCE") Signed-off-by: Patrisious Haddad Reviewed-by: Michael Guralnik Link: https://patch.msgid.link/9cace74dcf106116118bebfa9146d40d4166c6b0.1750064969.git.leon@kernel.org Signed-off-by: Leon Romanovsky --- drivers/infiniband/hw/mlx5/counters.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/infiniband/hw/mlx5/counters.c b/drivers/infiniband/hw/mlx5/counters.c index 943e9eb2ad20..a506fafd2b15 100644 --- a/drivers/infiniband/hw/mlx5/counters.c +++ b/drivers/infiniband/hw/mlx5/counters.c @@ -418,7 +418,7 @@ static int do_get_hw_stats(struct ib_device *ibdev, */ goto done; } - ret = mlx5_lag_query_cong_counters(dev->mdev, + ret = mlx5_lag_query_cong_counters(mdev, stats->value + cnts->num_q_counters, cnts->num_cong_counters, From a9a9e68954f29b1e197663f76289db4879fd51bb Mon Sep 17 00:00:00 2001 From: Patrisious Haddad Date: Mon, 16 Jun 2025 12:14:54 +0300 Subject: [PATCH 174/289] RDMA/mlx5: Fix vport loopback for MPV device Always enable vport loopback for both MPV devices on driver start. Previously in some cases related to MPV RoCE, packets weren't correctly executing loopback check at vport in FW, since it was disabled. Due to complexity of identifying such cases for MPV always enable vport loopback for both GVMIs when binding the slave to the master port. Fixes: 0042f9e458a5 ("RDMA/mlx5: Enable vport loopback when user context or QP mandate") Signed-off-by: Patrisious Haddad Reviewed-by: Mark Bloch Link: https://patch.msgid.link/d4298f5ebb2197459e9e7221c51ecd6a34699847.1750064969.git.leon@kernel.org Signed-off-by: Leon Romanovsky --- drivers/infiniband/hw/mlx5/main.c | 33 +++++++++++++++++++++++++++++++ 1 file changed, 33 insertions(+) diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c index ce7610740412..df6557ddbdfc 100644 --- a/drivers/infiniband/hw/mlx5/main.c +++ b/drivers/infiniband/hw/mlx5/main.c @@ -1791,6 +1791,33 @@ static void deallocate_uars(struct mlx5_ib_dev *dev, context->devx_uid); } +static int mlx5_ib_enable_lb_mp(struct mlx5_core_dev *master, + struct mlx5_core_dev *slave) +{ + int err; + + err = mlx5_nic_vport_update_local_lb(master, true); + if (err) + return err; + + err = mlx5_nic_vport_update_local_lb(slave, true); + if (err) + goto out; + + return 0; + +out: + mlx5_nic_vport_update_local_lb(master, false); + return err; +} + +static void mlx5_ib_disable_lb_mp(struct mlx5_core_dev *master, + struct mlx5_core_dev *slave) +{ + mlx5_nic_vport_update_local_lb(slave, false); + mlx5_nic_vport_update_local_lb(master, false); +} + int mlx5_ib_enable_lb(struct mlx5_ib_dev *dev, bool td, bool qp) { int err = 0; @@ -3495,6 +3522,8 @@ static void mlx5_ib_unbind_slave_port(struct mlx5_ib_dev *ibdev, lockdep_assert_held(&mlx5_ib_multiport_mutex); + mlx5_ib_disable_lb_mp(ibdev->mdev, mpi->mdev); + mlx5_core_mp_event_replay(ibdev->mdev, MLX5_DRIVER_EVENT_AFFILIATION_REMOVED, NULL); @@ -3590,6 +3619,10 @@ static bool mlx5_ib_bind_slave_port(struct mlx5_ib_dev *ibdev, MLX5_DRIVER_EVENT_AFFILIATION_DONE, &key); + err = mlx5_ib_enable_lb_mp(ibdev->mdev, mpi->mdev); + if (err) + goto unbind; + return true; unbind: From ec54c0a20709ed6e56f40a8d59eee725c31a916b Mon Sep 17 00:00:00 2001 From: Sergey Senozhatsky Date: Wed, 25 Jun 2025 14:20:37 +0900 Subject: [PATCH 175/289] mtk-sd: reset host->mrq on prepare_data() error Do not leave host with dangling ->mrq pointer if we hit the msdc_prepare_data() error out path. Signed-off-by: Sergey Senozhatsky Reviewed-by: Masami Hiramatsu (Google) Fixes: f5de469990f1 ("mtk-sd: Prevent memory corruption from DMA map failure") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20250625052106.584905-1-senozhatsky@chromium.org Signed-off-by: Ulf Hansson --- drivers/mmc/host/mtk-sd.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/mmc/host/mtk-sd.c b/drivers/mmc/host/mtk-sd.c index b12cfb9a5e5f..d7020e06dd55 100644 --- a/drivers/mmc/host/mtk-sd.c +++ b/drivers/mmc/host/mtk-sd.c @@ -1492,6 +1492,7 @@ static void msdc_ops_request(struct mmc_host *mmc, struct mmc_request *mrq) if (mrq->data) { msdc_prepare_data(host, mrq->data); if (!msdc_data_prepared(mrq->data)) { + host->mrq = NULL; /* * Failed to prepare DMA area, fail fast before * starting any commands. From 3e0809b1664b9dc650d9dbca9a2d3ac690d4f661 Mon Sep 17 00:00:00 2001 From: Niklas Cassel Date: Tue, 24 Jun 2025 09:40:30 +0200 Subject: [PATCH 176/289] ata: ahci: Use correct DMI identifier for ASUSPRO-D840SA LPM quirk ASUS store the board name in DMI_PRODUCT_NAME rather than DMI_PRODUCT_VERSION. (Apparently it is only Lenovo that stores the model-name in DMI_PRODUCT_VERSION.) Use the correct DMI identifier, DMI_PRODUCT_NAME, to match the ASUSPRO-D840SA board, such that the quirk actually gets applied. Cc: stable@vger.kernel.org Reported-by: Andy Yang Tested-by: Andy Yang Closes: https://lore.kernel.org/linux-ide/aFb3wXAwJSSJUB7o@ryzen/ Fixes: b5acc3628898 ("ata: ahci: Disallow LPM for ASUSPRO-D840SA motherboard") Reviewed-by: Hans de Goede Reviewed-by: Damien Le Moal Link: https://lore.kernel.org/r/20250624074029.963028-2-cassel@kernel.org Signed-off-by: Niklas Cassel --- drivers/ata/ahci.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c index e5e5c2e81d09..aa93b0ecbbc6 100644 --- a/drivers/ata/ahci.c +++ b/drivers/ata/ahci.c @@ -1450,7 +1450,7 @@ static bool ahci_broken_lpm(struct pci_dev *pdev) { .matches = { DMI_MATCH(DMI_SYS_VENDOR, "ASUSTeK COMPUTER INC."), - DMI_MATCH(DMI_PRODUCT_VERSION, "ASUSPRO D840MB_M840SA"), + DMI_MATCH(DMI_PRODUCT_NAME, "ASUSPRO D840MB_M840SA"), }, /* 320 is broken, there is no known good version. */ }, From 7cac633a42a7b3c8146eb1db76fb80dc652998de Mon Sep 17 00:00:00 2001 From: Penglei Jiang Date: Wed, 25 Jun 2025 03:27:03 -0700 Subject: [PATCH 177/289] io_uring: fix resource leak in io_import_dmabuf() Replace the return statement with setting ret = -EINVAL and jumping to the err label to ensure resources are released via io_release_dmabuf. Fixes: a5c98e942457 ("io_uring/zcrx: dmabuf backed zerocopy receive") Signed-off-by: Penglei Jiang Link: https://lore.kernel.org/r/20250625102703.68336-1-superman.xpt@gmail.com Signed-off-by: Jens Axboe --- io_uring/zcrx.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c index 21c816c3bfe0..ade4da9c4e31 100644 --- a/io_uring/zcrx.c +++ b/io_uring/zcrx.c @@ -106,8 +106,10 @@ static int io_import_dmabuf(struct io_zcrx_ifq *ifq, for_each_sgtable_dma_sg(mem->sgt, sg, i) total_size += sg_dma_len(sg); - if (total_size < off + len) - return -EINVAL; + if (total_size < off + len) { + ret = -EINVAL; + goto err; + } mem->dmabuf_offset = off; mem->size = len; From a3f3040657417aeadb9622c629d4a0c2693a0f93 Mon Sep 17 00:00:00 2001 From: Avadhut Naik Date: Thu, 29 May 2025 20:50:04 +0000 Subject: [PATCH 178/289] EDAC/amd64: Fix size calculation for Non-Power-of-Two DIMMs MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Each Chip-Select (CS) of a Unified Memory Controller (UMC) on AMD Zen-based SOCs has an Address Mask and a Secondary Address Mask register associated with it. The amd64_edac module logs DIMM sizes on a per-UMC per-CS granularity during init using these two registers. Currently, the module primarily considers only the Address Mask register for computing DIMM sizes. The Secondary Address Mask register is only considered for odd CS. Additionally, if it has been considered, the Address Mask register is ignored altogether for that CS. For power-of-two DIMMs i.e. DIMMs whose total capacity is a power of two (32GB, 64GB, etc), this is not an issue since only the Address Mask register is used. For non-power-of-two DIMMs i.e., DIMMs whose total capacity is not a power of two (48GB, 96GB, etc), however, the Secondary Address Mask register is used in conjunction with the Address Mask register. However, since the module only considers either of the two registers for a CS, the size computed by the module is incorrect. The Secondary Address Mask register is not considered for even CS, and the Address Mask register is not considered for odd CS. Introduce a new helper function so that both Address Mask and Secondary Address Mask registers are considered, when valid, for computing DIMM sizes. Furthermore, also rename some variables for greater clarity. Fixes: 81f5090db843 ("EDAC/amd64: Support asymmetric dual-rank DIMMs") Closes: https://lore.kernel.org/dbec22b6-00f2-498b-b70d-ab6f8a5ec87e@natrix.lt Reported-by: Žilvinas Žaltiena Signed-off-by: Avadhut Naik Signed-off-by: Borislav Petkov (AMD) Reviewed-by: Yazen Ghannam Tested-by: Žilvinas Žaltiena Cc: stable@vger.kernel.org Link: https://lore.kernel.org/20250529205013.403450-1-avadhut.naik@amd.com --- drivers/edac/amd64_edac.c | 57 ++++++++++++++++++++++++--------------- 1 file changed, 36 insertions(+), 21 deletions(-) diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c index b681c0663203..07f1e9dc1ca7 100644 --- a/drivers/edac/amd64_edac.c +++ b/drivers/edac/amd64_edac.c @@ -1209,7 +1209,9 @@ static int umc_get_cs_mode(int dimm, u8 ctrl, struct amd64_pvt *pvt) if (csrow_enabled(2 * dimm + 1, ctrl, pvt)) cs_mode |= CS_ODD_PRIMARY; - /* Asymmetric dual-rank DIMM support. */ + if (csrow_sec_enabled(2 * dimm, ctrl, pvt)) + cs_mode |= CS_EVEN_SECONDARY; + if (csrow_sec_enabled(2 * dimm + 1, ctrl, pvt)) cs_mode |= CS_ODD_SECONDARY; @@ -1230,12 +1232,13 @@ static int umc_get_cs_mode(int dimm, u8 ctrl, struct amd64_pvt *pvt) return cs_mode; } -static int __addr_mask_to_cs_size(u32 addr_mask_orig, unsigned int cs_mode, - int csrow_nr, int dimm) +static int calculate_cs_size(u32 mask, unsigned int cs_mode) { - u32 msb, weight, num_zero_bits; - u32 addr_mask_deinterleaved; - int size = 0; + int msb, weight, num_zero_bits; + u32 deinterleaved_mask; + + if (!mask) + return 0; /* * The number of zero bits in the mask is equal to the number of bits @@ -1248,19 +1251,30 @@ static int __addr_mask_to_cs_size(u32 addr_mask_orig, unsigned int cs_mode, * without swapping with the most significant bit. This can be handled * by keeping the MSB where it is and ignoring the single zero bit. */ - msb = fls(addr_mask_orig) - 1; - weight = hweight_long(addr_mask_orig); + msb = fls(mask) - 1; + weight = hweight_long(mask); num_zero_bits = msb - weight - !!(cs_mode & CS_3R_INTERLEAVE); /* Take the number of zero bits off from the top of the mask. */ - addr_mask_deinterleaved = GENMASK_ULL(msb - num_zero_bits, 1); + deinterleaved_mask = GENMASK(msb - num_zero_bits, 1); + edac_dbg(1, " Deinterleaved AddrMask: 0x%x\n", deinterleaved_mask); + + return (deinterleaved_mask >> 2) + 1; +} + +static int __addr_mask_to_cs_size(u32 addr_mask, u32 addr_mask_sec, + unsigned int cs_mode, int csrow_nr, int dimm) +{ + int size; edac_dbg(1, "CS%d DIMM%d AddrMasks:\n", csrow_nr, dimm); - edac_dbg(1, " Original AddrMask: 0x%x\n", addr_mask_orig); - edac_dbg(1, " Deinterleaved AddrMask: 0x%x\n", addr_mask_deinterleaved); + edac_dbg(1, " Primary AddrMask: 0x%x\n", addr_mask); /* Register [31:1] = Address [39:9]. Size is in kBs here. */ - size = (addr_mask_deinterleaved >> 2) + 1; + size = calculate_cs_size(addr_mask, cs_mode); + + edac_dbg(1, " Secondary AddrMask: 0x%x\n", addr_mask_sec); + size += calculate_cs_size(addr_mask_sec, cs_mode); /* Return size in MBs. */ return size >> 10; @@ -1269,8 +1283,8 @@ static int __addr_mask_to_cs_size(u32 addr_mask_orig, unsigned int cs_mode, static int umc_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc, unsigned int cs_mode, int csrow_nr) { + u32 addr_mask = 0, addr_mask_sec = 0; int cs_mask_nr = csrow_nr; - u32 addr_mask_orig; int dimm, size = 0; /* No Chip Selects are enabled. */ @@ -1308,13 +1322,13 @@ static int umc_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc, if (!pvt->flags.zn_regs_v2) cs_mask_nr >>= 1; - /* Asymmetric dual-rank DIMM support. */ - if ((csrow_nr & 1) && (cs_mode & CS_ODD_SECONDARY)) - addr_mask_orig = pvt->csels[umc].csmasks_sec[cs_mask_nr]; - else - addr_mask_orig = pvt->csels[umc].csmasks[cs_mask_nr]; + if (cs_mode & (CS_EVEN_PRIMARY | CS_ODD_PRIMARY)) + addr_mask = pvt->csels[umc].csmasks[cs_mask_nr]; - return __addr_mask_to_cs_size(addr_mask_orig, cs_mode, csrow_nr, dimm); + if (cs_mode & (CS_EVEN_SECONDARY | CS_ODD_SECONDARY)) + addr_mask_sec = pvt->csels[umc].csmasks_sec[cs_mask_nr]; + + return __addr_mask_to_cs_size(addr_mask, addr_mask_sec, cs_mode, csrow_nr, dimm); } static void umc_debug_display_dimm_sizes(struct amd64_pvt *pvt, u8 ctrl) @@ -3512,9 +3526,10 @@ static void gpu_get_err_info(struct mce *m, struct err_info *err) static int gpu_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc, unsigned int cs_mode, int csrow_nr) { - u32 addr_mask_orig = pvt->csels[umc].csmasks[csrow_nr]; + u32 addr_mask = pvt->csels[umc].csmasks[csrow_nr]; + u32 addr_mask_sec = pvt->csels[umc].csmasks_sec[csrow_nr]; - return __addr_mask_to_cs_size(addr_mask_orig, cs_mode, csrow_nr, csrow_nr >> 1); + return __addr_mask_to_cs_size(addr_mask, addr_mask_sec, cs_mode, csrow_nr, csrow_nr >> 1); } static void gpu_debug_display_dimm_sizes(struct amd64_pvt *pvt, u8 ctrl) From 55e8ff842051b1150461d7595d8f1d033c69d66b Mon Sep 17 00:00:00 2001 From: Jayesh Choudhary Date: Tue, 24 Jun 2025 10:18:35 +0530 Subject: [PATCH 179/289] drm/bridge: ti-sn65dsi86: Add HPD for DisplayPort connector type By default, HPD was disabled on SN65DSI86 bridge. When the driver was added (commit "a095f15c00e27"), the HPD_DISABLE bit was set in pre-enable call which was moved to other function calls subsequently. Later on, commit "c312b0df3b13" added detect utility for DP mode. But with HPD_DISABLE bit set, all the HPD events are disabled[0] and the debounced state always return 1 (always connected state). Set HPD_DISABLE bit conditionally based on display sink's connector type. Since the HPD_STATE is reflected correctly only after waiting for debounce time (~100-400ms) and adding this delay in detect() is not feasible owing to the performace impact (glitches and frame drop), remove runtime calls in detect() and add hpd_enable()/disable() bridge hooks with runtime calls, to detect hpd properly without any delay. [0]: (Pg. 32) Fixes: c312b0df3b13 ("drm/bridge: ti-sn65dsi86: Implement bridge connector operations for DP") Cc: Max Krummenacher Reviewed-by: Douglas Anderson Tested-by: Ernest Van Hoecke Signed-off-by: Jayesh Choudhary Signed-off-by: Douglas Anderson Link: https://lore.kernel.org/r/20250624044835.165708-1-j-choudhary@ti.com --- drivers/gpu/drm/bridge/ti-sn65dsi86.c | 69 +++++++++++++++++++++++---- 1 file changed, 60 insertions(+), 9 deletions(-) diff --git a/drivers/gpu/drm/bridge/ti-sn65dsi86.c b/drivers/gpu/drm/bridge/ti-sn65dsi86.c index 60224f476e1d..de9c23537465 100644 --- a/drivers/gpu/drm/bridge/ti-sn65dsi86.c +++ b/drivers/gpu/drm/bridge/ti-sn65dsi86.c @@ -348,12 +348,18 @@ static void ti_sn65dsi86_enable_comms(struct ti_sn65dsi86 *pdata, * 200 ms. We'll assume that the panel driver will have the hardcoded * delay in its prepare and always disable HPD. * - * If HPD somehow makes sense on some future panel we'll have to - * change this to be conditional on someone specifying that HPD should - * be used. + * For DisplayPort bridge type, we need HPD. So we use the bridge type + * to conditionally disable HPD. + * NOTE: The bridge type is set in ti_sn_bridge_probe() but enable_comms() + * can be called before. So for DisplayPort, HPD will be enabled once + * bridge type is set. We are using bridge type instead of "no-hpd" + * property because it is not used properly in devicetree description + * and hence is unreliable. */ - regmap_update_bits(pdata->regmap, SN_HPD_DISABLE_REG, HPD_DISABLE, - HPD_DISABLE); + + if (pdata->bridge.type != DRM_MODE_CONNECTOR_DisplayPort) + regmap_update_bits(pdata->regmap, SN_HPD_DISABLE_REG, HPD_DISABLE, + HPD_DISABLE); pdata->comms_enabled = true; @@ -1195,9 +1201,14 @@ static enum drm_connector_status ti_sn_bridge_detect(struct drm_bridge *bridge) struct ti_sn65dsi86 *pdata = bridge_to_ti_sn65dsi86(bridge); int val = 0; - pm_runtime_get_sync(pdata->dev); + /* + * Runtime reference is grabbed in ti_sn_bridge_hpd_enable() + * as the chip won't report HPD just after being powered on. + * HPD_DEBOUNCED_STATE reflects correct state only after the + * debounce time (~100-400 ms). + */ + regmap_read(pdata->regmap, SN_HPD_DISABLE_REG, &val); - pm_runtime_put_autosuspend(pdata->dev); return val & HPD_DEBOUNCED_STATE ? connector_status_connected : connector_status_disconnected; @@ -1220,6 +1231,26 @@ static void ti_sn65dsi86_debugfs_init(struct drm_bridge *bridge, struct dentry * debugfs_create_file("status", 0600, debugfs, pdata, &status_fops); } +static void ti_sn_bridge_hpd_enable(struct drm_bridge *bridge) +{ + struct ti_sn65dsi86 *pdata = bridge_to_ti_sn65dsi86(bridge); + + /* + * Device needs to be powered on before reading the HPD state + * for reliable hpd detection in ti_sn_bridge_detect() due to + * the high debounce time. + */ + + pm_runtime_get_sync(pdata->dev); +} + +static void ti_sn_bridge_hpd_disable(struct drm_bridge *bridge) +{ + struct ti_sn65dsi86 *pdata = bridge_to_ti_sn65dsi86(bridge); + + pm_runtime_put_autosuspend(pdata->dev); +} + static const struct drm_bridge_funcs ti_sn_bridge_funcs = { .attach = ti_sn_bridge_attach, .detach = ti_sn_bridge_detach, @@ -1234,6 +1265,8 @@ static const struct drm_bridge_funcs ti_sn_bridge_funcs = { .atomic_duplicate_state = drm_atomic_helper_bridge_duplicate_state, .atomic_destroy_state = drm_atomic_helper_bridge_destroy_state, .debugfs_init = ti_sn65dsi86_debugfs_init, + .hpd_enable = ti_sn_bridge_hpd_enable, + .hpd_disable = ti_sn_bridge_hpd_disable, }; static void ti_sn_bridge_parse_lanes(struct ti_sn65dsi86 *pdata, @@ -1321,8 +1354,26 @@ static int ti_sn_bridge_probe(struct auxiliary_device *adev, pdata->bridge.type = pdata->next_bridge->type == DRM_MODE_CONNECTOR_DisplayPort ? DRM_MODE_CONNECTOR_DisplayPort : DRM_MODE_CONNECTOR_eDP; - if (pdata->bridge.type == DRM_MODE_CONNECTOR_DisplayPort) - pdata->bridge.ops = DRM_BRIDGE_OP_EDID | DRM_BRIDGE_OP_DETECT; + if (pdata->bridge.type == DRM_MODE_CONNECTOR_DisplayPort) { + pdata->bridge.ops = DRM_BRIDGE_OP_EDID | DRM_BRIDGE_OP_DETECT | + DRM_BRIDGE_OP_HPD; + /* + * If comms were already enabled they would have been enabled + * with the wrong value of HPD_DISABLE. Update it now. Comms + * could be enabled if anyone is holding a pm_runtime reference + * (like if a GPIO is in use). Note that in most cases nobody + * is doing AUX channel xfers before the bridge is added so + * HPD doesn't _really_ matter then. The only exception is in + * the eDP case where the panel wants to read the EDID before + * the bridge is added. We always consistently have HPD disabled + * for eDP. + */ + mutex_lock(&pdata->comms_mutex); + if (pdata->comms_enabled) + regmap_update_bits(pdata->regmap, SN_HPD_DISABLE_REG, + HPD_DISABLE, 0); + mutex_unlock(&pdata->comms_mutex); + }; drm_bridge_add(&pdata->bridge); From 1944f6ab4967db7ad8d4db527dceae8c77de76e9 Mon Sep 17 00:00:00 2001 From: Stefan Metzmacher Date: Wed, 25 Jun 2025 10:16:38 +0200 Subject: [PATCH 180/289] smb: client: let smbd_post_send_iter() respect the peers max_send_size and transmit all data We should not send smbdirect_data_transfer messages larger than the negotiated max_send_size, typically 1364 bytes, which means 24 bytes of the smbdirect_data_transfer header + 1340 payload bytes. This happened when doing an SMB2 write with more than 1340 bytes (which is done inline as it's below rdma_readwrite_threshold). It means the peer resets the connection. When testing between cifs.ko and ksmbd.ko something like this is logged: client: CIFS: VFS: RDMA transport re-established siw: got TERMINATE. layer 1, type 2, code 2 siw: got TERMINATE. layer 1, type 2, code 2 siw: got TERMINATE. layer 1, type 2, code 2 siw: got TERMINATE. layer 1, type 2, code 2 siw: got TERMINATE. layer 1, type 2, code 2 siw: got TERMINATE. layer 1, type 2, code 2 siw: got TERMINATE. layer 1, type 2, code 2 siw: got TERMINATE. layer 1, type 2, code 2 siw: got TERMINATE. layer 1, type 2, code 2 CIFS: VFS: \\carina Send error in SessSetup = -11 smb2_reconnect: 12 callbacks suppressed CIFS: VFS: reconnect tcon failed rc = -11 CIFS: VFS: reconnect tcon failed rc = -11 CIFS: VFS: reconnect tcon failed rc = -11 CIFS: VFS: SMB: Zero rsize calculated, using minimum value 65536 and: CIFS: VFS: RDMA transport re-established siw: got TERMINATE. layer 1, type 2, code 2 CIFS: VFS: smbd_recv:1894 disconnected siw: got TERMINATE. layer 1, type 2, code 2 The ksmbd dmesg is showing things like: smb_direct: Recv error. status='local length error (1)' opcode=128 smb_direct: disconnected smb_direct: Recv error. status='local length error (1)' opcode=128 ksmbd: smb_direct: disconnected ksmbd: sock_read failed: -107 As smbd_post_send_iter() limits the transmitted number of bytes we need loop over it in order to transmit the whole iter. Reviewed-by: David Howells Tested-by: David Howells Tested-by: Meetakshi Setiya Cc: Tom Talpey Cc: linux-cifs@vger.kernel.org Cc: # sp->max_send_size should be info->max_send_size in backports Fixes: 3d78fe73fa12 ("cifs: Build the RDMA SGE list directly from an iterator") Signed-off-by: Stefan Metzmacher Signed-off-by: Steve French --- fs/smb/client/smbdirect.c | 31 +++++++++++++++++++++++++++---- 1 file changed, 27 insertions(+), 4 deletions(-) diff --git a/fs/smb/client/smbdirect.c b/fs/smb/client/smbdirect.c index cbc85bca006f..a976bcf61226 100644 --- a/fs/smb/client/smbdirect.c +++ b/fs/smb/client/smbdirect.c @@ -907,8 +907,10 @@ wait_send_queue: .local_dma_lkey = sc->ib.pd->local_dma_lkey, .direction = DMA_TO_DEVICE, }; + size_t payload_len = umin(*_remaining_data_length, + sp->max_send_size - sizeof(*packet)); - rc = smb_extract_iter_to_rdma(iter, *_remaining_data_length, + rc = smb_extract_iter_to_rdma(iter, payload_len, &extract); if (rc < 0) goto err_dma; @@ -1013,6 +1015,27 @@ static int smbd_post_send_empty(struct smbd_connection *info) return smbd_post_send_iter(info, NULL, &remaining_data_length); } +static int smbd_post_send_full_iter(struct smbd_connection *info, + struct iov_iter *iter, + int *_remaining_data_length) +{ + int rc = 0; + + /* + * smbd_post_send_iter() respects the + * negotiated max_send_size, so we need to + * loop until the full iter is posted + */ + + while (iov_iter_count(iter) > 0) { + rc = smbd_post_send_iter(info, iter, _remaining_data_length); + if (rc < 0) + break; + } + + return rc; +} + /* * Post a receive request to the transport * The remote peer can only send data when a receive request is posted @@ -2032,14 +2055,14 @@ int smbd_send(struct TCP_Server_Info *server, klen += rqst->rq_iov[i].iov_len; iov_iter_kvec(&iter, ITER_SOURCE, rqst->rq_iov, rqst->rq_nvec, klen); - rc = smbd_post_send_iter(info, &iter, &remaining_data_length); + rc = smbd_post_send_full_iter(info, &iter, &remaining_data_length); if (rc < 0) break; if (iov_iter_count(&rqst->rq_iter) > 0) { /* And then the data pages if there are any */ - rc = smbd_post_send_iter(info, &rqst->rq_iter, - &remaining_data_length); + rc = smbd_post_send_full_iter(info, &rqst->rq_iter, + &remaining_data_length); if (rc < 0) break; } From 9a709b7e98e6fa51600b5f2d24c5068efa6d39de Mon Sep 17 00:00:00 2001 From: Jens Axboe Date: Wed, 25 Jun 2025 10:17:06 -0600 Subject: [PATCH 181/289] io_uring/net: mark iov as dynamically allocated even for single segments A bigger array of vecs could've been allocated, but io_ring_buffers_peek() still decided to cap the mapped range depending on how much data was available. Hence don't rely on the segment count to know if the request should be marked as needing cleanup, always check upfront if the iov array is different than the fast_iov array. Fixes: 26ec15e4b0c1 ("io_uring/kbuf: don't truncate end buffer for multiple buffer peeks") Signed-off-by: Jens Axboe --- io_uring/net.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/io_uring/net.c b/io_uring/net.c index 9550d4c8f866..5c1e8c4ba468 100644 --- a/io_uring/net.c +++ b/io_uring/net.c @@ -1077,6 +1077,12 @@ static int io_recv_buf_select(struct io_kiocb *req, struct io_async_msghdr *kmsg if (unlikely(ret < 0)) return ret; + if (arg.iovs != &kmsg->fast_iov && arg.iovs != kmsg->vec.iovec) { + kmsg->vec.nr = ret; + kmsg->vec.iovec = arg.iovs; + req->flags |= REQ_F_NEED_CLEANUP; + } + /* special case 1 vec, can be a fast path */ if (ret == 1) { sr->buf = arg.iovs[0].iov_base; @@ -1085,11 +1091,6 @@ static int io_recv_buf_select(struct io_kiocb *req, struct io_async_msghdr *kmsg } iov_iter_init(&kmsg->msg.msg_iter, ITER_DEST, arg.iovs, ret, arg.out_len); - if (arg.iovs != &kmsg->fast_iov && arg.iovs != kmsg->vec.iovec) { - kmsg->vec.nr = ret; - kmsg->vec.iovec = arg.iovs; - req->flags |= REQ_F_NEED_CLEANUP; - } } else { void __user *buf; From e97f9540ce001503a4539f337da742c1dfa7d86a Mon Sep 17 00:00:00 2001 From: Stefan Metzmacher Date: Wed, 25 Jun 2025 10:13:04 +0200 Subject: [PATCH 182/289] smb: client: remove \t from TP_printk statements The generate '[FAILED TO PARSE]' strings in trace-cmd report output like this: rm-5298 [001] 6084.533748493: smb3_exit_err: [FAILED TO PARSE] xid=972 func_name=cifs_rmdir rc=-39 rm-5298 [001] 6084.533959234: smb3_enter: [FAILED TO PARSE] xid=973 func_name=cifs_closedir rm-5298 [001] 6084.533967630: smb3_close_enter: [FAILED TO PARSE] xid=973 fid=94489281833 tid=1 sesid=96758029877361 rm-5298 [001] 6084.534004008: smb3_cmd_enter: [FAILED TO PARSE] tid=1 sesid=96758029877361 cmd=6 mid=566 rm-5298 [001] 6084.552248232: smb3_cmd_done: [FAILED TO PARSE] tid=1 sesid=96758029877361 cmd=6 mid=566 rm-5298 [001] 6084.552280542: smb3_close_done: [FAILED TO PARSE] xid=973 fid=94489281833 tid=1 sesid=96758029877361 rm-5298 [001] 6084.552316034: smb3_exit_done: [FAILED TO PARSE] xid=973 func_name=cifs_closedir Cc: stable@vger.kernel.org Signed-off-by: Stefan Metzmacher Signed-off-by: Steve French --- fs/smb/client/trace.h | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/fs/smb/client/trace.h b/fs/smb/client/trace.h index 52bcb55d9952..93e5b2bb9f28 100644 --- a/fs/smb/client/trace.h +++ b/fs/smb/client/trace.h @@ -140,7 +140,7 @@ DECLARE_EVENT_CLASS(smb3_rw_err_class, __entry->len = len; __entry->rc = rc; ), - TP_printk("\tR=%08x[%x] xid=%u sid=0x%llx tid=0x%x fid=0x%llx offset=0x%llx len=0x%x rc=%d", + TP_printk("R=%08x[%x] xid=%u sid=0x%llx tid=0x%x fid=0x%llx offset=0x%llx len=0x%x rc=%d", __entry->rreq_debug_id, __entry->rreq_debug_index, __entry->xid, __entry->sesid, __entry->tid, __entry->fid, __entry->offset, __entry->len, __entry->rc) @@ -190,7 +190,7 @@ DECLARE_EVENT_CLASS(smb3_other_err_class, __entry->len = len; __entry->rc = rc; ), - TP_printk("\txid=%u sid=0x%llx tid=0x%x fid=0x%llx offset=0x%llx len=0x%x rc=%d", + TP_printk("xid=%u sid=0x%llx tid=0x%x fid=0x%llx offset=0x%llx len=0x%x rc=%d", __entry->xid, __entry->sesid, __entry->tid, __entry->fid, __entry->offset, __entry->len, __entry->rc) ) @@ -247,7 +247,7 @@ DECLARE_EVENT_CLASS(smb3_copy_range_err_class, __entry->len = len; __entry->rc = rc; ), - TP_printk("\txid=%u sid=0x%llx tid=0x%x source fid=0x%llx source offset=0x%llx target fid=0x%llx target offset=0x%llx len=0x%x rc=%d", + TP_printk("xid=%u sid=0x%llx tid=0x%x source fid=0x%llx source offset=0x%llx target fid=0x%llx target offset=0x%llx len=0x%x rc=%d", __entry->xid, __entry->sesid, __entry->tid, __entry->target_fid, __entry->src_offset, __entry->target_fid, __entry->target_offset, __entry->len, __entry->rc) ) @@ -298,7 +298,7 @@ DECLARE_EVENT_CLASS(smb3_copy_range_done_class, __entry->target_offset = target_offset; __entry->len = len; ), - TP_printk("\txid=%u sid=0x%llx tid=0x%x source fid=0x%llx source offset=0x%llx target fid=0x%llx target offset=0x%llx len=0x%x", + TP_printk("xid=%u sid=0x%llx tid=0x%x source fid=0x%llx source offset=0x%llx target fid=0x%llx target offset=0x%llx len=0x%x", __entry->xid, __entry->sesid, __entry->tid, __entry->target_fid, __entry->src_offset, __entry->target_fid, __entry->target_offset, __entry->len) ) @@ -482,7 +482,7 @@ DECLARE_EVENT_CLASS(smb3_fd_class, __entry->tid = tid; __entry->sesid = sesid; ), - TP_printk("\txid=%u sid=0x%llx tid=0x%x fid=0x%llx", + TP_printk("xid=%u sid=0x%llx tid=0x%x fid=0x%llx", __entry->xid, __entry->sesid, __entry->tid, __entry->fid) ) @@ -521,7 +521,7 @@ DECLARE_EVENT_CLASS(smb3_fd_err_class, __entry->sesid = sesid; __entry->rc = rc; ), - TP_printk("\txid=%u sid=0x%llx tid=0x%x fid=0x%llx rc=%d", + TP_printk("xid=%u sid=0x%llx tid=0x%x fid=0x%llx rc=%d", __entry->xid, __entry->sesid, __entry->tid, __entry->fid, __entry->rc) ) @@ -794,7 +794,7 @@ DECLARE_EVENT_CLASS(smb3_cmd_err_class, __entry->status = status; __entry->rc = rc; ), - TP_printk("\tsid=0x%llx tid=0x%x cmd=%u mid=%llu status=0x%x rc=%d", + TP_printk("sid=0x%llx tid=0x%x cmd=%u mid=%llu status=0x%x rc=%d", __entry->sesid, __entry->tid, __entry->cmd, __entry->mid, __entry->status, __entry->rc) ) @@ -829,7 +829,7 @@ DECLARE_EVENT_CLASS(smb3_cmd_done_class, __entry->cmd = cmd; __entry->mid = mid; ), - TP_printk("\tsid=0x%llx tid=0x%x cmd=%u mid=%llu", + TP_printk("sid=0x%llx tid=0x%x cmd=%u mid=%llu", __entry->sesid, __entry->tid, __entry->cmd, __entry->mid) ) @@ -867,7 +867,7 @@ DECLARE_EVENT_CLASS(smb3_mid_class, __entry->when_sent = when_sent; __entry->when_received = when_received; ), - TP_printk("\tcmd=%u mid=%llu pid=%u, when_sent=%lu when_rcv=%lu", + TP_printk("cmd=%u mid=%llu pid=%u, when_sent=%lu when_rcv=%lu", __entry->cmd, __entry->mid, __entry->pid, __entry->when_sent, __entry->when_received) ) @@ -898,7 +898,7 @@ DECLARE_EVENT_CLASS(smb3_exit_err_class, __assign_str(func_name); __entry->rc = rc; ), - TP_printk("\t%s: xid=%u rc=%d", + TP_printk("%s: xid=%u rc=%d", __get_str(func_name), __entry->xid, __entry->rc) ) @@ -924,7 +924,7 @@ DECLARE_EVENT_CLASS(smb3_sync_err_class, __entry->ino = ino; __entry->rc = rc; ), - TP_printk("\tino=%lu rc=%d", + TP_printk("ino=%lu rc=%d", __entry->ino, __entry->rc) ) @@ -950,7 +950,7 @@ DECLARE_EVENT_CLASS(smb3_enter_exit_class, __entry->xid = xid; __assign_str(func_name); ), - TP_printk("\t%s: xid=%u", + TP_printk("%s: xid=%u", __get_str(func_name), __entry->xid) ) From 0a46f60a9fe16f5596b6b4b3ee1a483ea7854136 Mon Sep 17 00:00:00 2001 From: Li Ming Date: Fri, 20 Jun 2025 13:29:24 +0800 Subject: [PATCH 183/289] cxl/edac: Fix using wrong repair type to check dram event record cxl_find_rec_dram() is used to find a DRAM event record based on the inputted attributes. Different repair_type of the inputted attributes will check the DRAM event record in different ways. When EDAC driver is performing a memory rank sparing, it should use CXL_RANK_SPARING rather than CXL_BANK_SPARING as repair_type for DRAM event record checking. Fixes: 588ca944c277 ("cxl/edac: Add CXL memory device memory sparing control feature") Signed-off-by: Li Ming Reviewed-by: Shiju Jose Reviewed-by: Jonathan Cameron Reviewed-by: Ira Weiny Reviewed-by: Fan Ni Link: https://patch.msgid.link/20250620052924.138892-1-ming.li@zohomail.com Signed-off-by: Dave Jiang --- drivers/cxl/core/edac.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/cxl/core/edac.c b/drivers/cxl/core/edac.c index d725ee954199..623aaa4439c4 100644 --- a/drivers/cxl/core/edac.c +++ b/drivers/cxl/core/edac.c @@ -1323,7 +1323,7 @@ cxl_mem_get_rec_dram(struct cxl_memdev *cxlmd, attrbs.bank = ctx->bank; break; case EDAC_REPAIR_RANK_SPARING: - attrbs.repair_type = CXL_BANK_SPARING; + attrbs.repair_type = CXL_RANK_SPARING; break; default: return NULL; From a5d0b9e32745277644cda8d7d334e7080bd339bf Mon Sep 17 00:00:00 2001 From: Lukasz Kucharczyk Date: Tue, 20 May 2025 14:22:52 +0200 Subject: [PATCH 184/289] i2c: imx: fix emulated smbus block read Acknowledge the byte count submitted by the target. When I2C_SMBUS_BLOCK_DATA read operation is executed by i2c_smbus_xfer_emulated(), the length of the second (read) message is set to 1. Length of the block is supposed to be obtained from the target by the underlying bus driver. The i2c_imx_isr_read() function should emit the acknowledge on i2c bus after reading the first byte (i.e., byte count) while processing such message (as defined in Section 6.5.7 of System Management Bus Specification [1]). Without this acknowledge, the target does not submit subsequent bytes and the controller only reads 0xff's. In addition, store the length of block data obtained from the target in the buffer provided by i2c_smbus_xfer_emulated() - otherwise the first byte of actual data is erroneously interpreted as length of the data block. [1] https://smbus.org/specs/SMBus_3_3_20240512.pdf Fixes: 5f5c2d4579ca ("i2c: imx: prevent rescheduling in non dma mode") Signed-off-by: Lukasz Kucharczyk Cc: # v6.13+ Acked-by: Oleksij Rempel Reviewed-by: Stefan Eichenberger Reviewed-by: Carlos Song Signed-off-by: Andi Shyti Link: https://lore.kernel.org/r/20250520122252.1475403-1-lukasz.kucharczyk@leica-geosystems.com --- drivers/i2c/busses/i2c-imx.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/i2c/busses/i2c-imx.c b/drivers/i2c/busses/i2c-imx.c index e5732b0557fb..205cc132fdec 100644 --- a/drivers/i2c/busses/i2c-imx.c +++ b/drivers/i2c/busses/i2c-imx.c @@ -1008,7 +1008,7 @@ static inline int i2c_imx_isr_read(struct imx_i2c_struct *i2c_imx) /* setup bus to read data */ temp = imx_i2c_read_reg(i2c_imx, IMX_I2C_I2CR); temp &= ~I2CR_MTX; - if (i2c_imx->msg->len - 1) + if ((i2c_imx->msg->len - 1) || (i2c_imx->msg->flags & I2C_M_RECV_LEN)) temp &= ~I2CR_TXAK; imx_i2c_write_reg(temp, i2c_imx, IMX_I2C_I2CR); @@ -1063,6 +1063,7 @@ static inline void i2c_imx_isr_read_block_data_len(struct imx_i2c_struct *i2c_im wake_up(&i2c_imx->queue); } i2c_imx->msg->len += len; + i2c_imx->msg->buf[i2c_imx->msg_buf_idx++] = len; } static irqreturn_t i2c_imx_master_isr(struct imx_i2c_struct *i2c_imx, unsigned int status) From 56ad91c1aa9c18064348edf69308080b03c9dc48 Mon Sep 17 00:00:00 2001 From: Wolfram Sang Date: Thu, 22 May 2025 08:42:35 +0200 Subject: [PATCH 185/289] i2c: robotfuzz-osif: disable zero-length read messages This driver passes the length of an i2c_msg directly to usb_control_msg(). If the message is now a read and of length 0, it violates the USB protocol and a warning will be printed. Enable the I2C_AQ_NO_ZERO_LEN_READ quirk for this adapter thus forbidding 0-length read messages altogether. Fixes: 83e53a8f120f ("i2c: Add bus driver for for OSIF USB i2c device.") Signed-off-by: Wolfram Sang Cc: # v3.14+ Signed-off-by: Andi Shyti Link: https://lore.kernel.org/r/20250522064234.3721-2-wsa+renesas@sang-engineering.com --- drivers/i2c/busses/i2c-robotfuzz-osif.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/drivers/i2c/busses/i2c-robotfuzz-osif.c b/drivers/i2c/busses/i2c-robotfuzz-osif.c index 80d45079b763..e0a76fb5bc31 100644 --- a/drivers/i2c/busses/i2c-robotfuzz-osif.c +++ b/drivers/i2c/busses/i2c-robotfuzz-osif.c @@ -111,6 +111,11 @@ static u32 osif_func(struct i2c_adapter *adapter) return I2C_FUNC_I2C | I2C_FUNC_SMBUS_EMUL; } +/* prevent invalid 0-length usb_control_msg */ +static const struct i2c_adapter_quirks osif_quirks = { + .flags = I2C_AQ_NO_ZERO_LEN_READ, +}; + static const struct i2c_algorithm osif_algorithm = { .xfer = osif_xfer, .functionality = osif_func, @@ -143,6 +148,7 @@ static int osif_probe(struct usb_interface *interface, priv->adapter.owner = THIS_MODULE; priv->adapter.class = I2C_CLASS_HWMON; + priv->adapter.quirks = &osif_quirks; priv->adapter.algo = &osif_algorithm; priv->adapter.algo_data = priv; snprintf(priv->adapter.name, sizeof(priv->adapter.name), From cbdb25ccf7566eee0c2b945e35cb98baf9ed0aa6 Mon Sep 17 00:00:00 2001 From: Wolfram Sang Date: Thu, 22 May 2025 08:43:49 +0200 Subject: [PATCH 186/289] i2c: tiny-usb: disable zero-length read messages This driver passes the length of an i2c_msg directly to usb_control_msg(). If the message is now a read and of length 0, it violates the USB protocol and a warning will be printed. Enable the I2C_AQ_NO_ZERO_LEN_READ quirk for this adapter thus forbidding 0-length read messages altogether. Fixes: e8c76eed2ecd ("i2c: New i2c-tiny-usb bus driver") Signed-off-by: Wolfram Sang Cc: # v2.6.22+ Signed-off-by: Andi Shyti Link: https://lore.kernel.org/r/20250522064349.3823-2-wsa+renesas@sang-engineering.com --- drivers/i2c/busses/i2c-tiny-usb.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/drivers/i2c/busses/i2c-tiny-usb.c b/drivers/i2c/busses/i2c-tiny-usb.c index a18eab0992a1..57dfe5f1a7d9 100644 --- a/drivers/i2c/busses/i2c-tiny-usb.c +++ b/drivers/i2c/busses/i2c-tiny-usb.c @@ -139,6 +139,11 @@ out: return ret; } +/* prevent invalid 0-length usb_control_msg */ +static const struct i2c_adapter_quirks usb_quirks = { + .flags = I2C_AQ_NO_ZERO_LEN_READ, +}; + /* This is the actual algorithm we define */ static const struct i2c_algorithm usb_algorithm = { .xfer = usb_xfer, @@ -247,6 +252,7 @@ static int i2c_tiny_usb_probe(struct usb_interface *interface, /* setup i2c adapter description */ dev->adapter.owner = THIS_MODULE; dev->adapter.class = I2C_CLASS_HWMON; + dev->adapter.quirks = &usb_quirks; dev->adapter.algo = &usb_algorithm; dev->adapter.algo_data = dev; snprintf(dev->adapter.name, sizeof(dev->adapter.name), From 942e1aece13e4007656e41d3182c0adf05cf08b8 Mon Sep 17 00:00:00 2001 From: Pratap Nirujogi Date: Mon, 9 Jun 2025 11:53:55 -0400 Subject: [PATCH 187/289] i2c: designware: Initialize adapter name only when not set Check if the adapter name is already set in the driver prior to initializing with generic name in i2c_dw_probe_master(). This check allows to retain the unique adapter name driver has initialized, which platform driver can use to distinguish it from other i2c designware adapters. Tested-by: Randy Dunlap Signed-off-by: Pratap Nirujogi Signed-off-by: Andi Shyti Link: https://lore.kernel.org/r/20250609155601.1477055-2-pratap.nirujogi@amd.com --- drivers/i2c/busses/i2c-designware-master.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/i2c/busses/i2c-designware-master.c b/drivers/i2c/busses/i2c-designware-master.c index c5394229b77f..9d7d9e47564a 100644 --- a/drivers/i2c/busses/i2c-designware-master.c +++ b/drivers/i2c/busses/i2c-designware-master.c @@ -1042,8 +1042,9 @@ int i2c_dw_probe_master(struct dw_i2c_dev *dev) if (ret) return ret; - snprintf(adap->name, sizeof(adap->name), - "Synopsys DesignWare I2C adapter"); + if (!adap->name[0]) + scnprintf(adap->name, sizeof(adap->name), + "Synopsys DesignWare I2C adapter"); adap->retries = 3; adap->algo = &i2c_dw_algo; adap->quirks = &i2c_dw_quirks; From c8dc579169738a3546f57ecb38e62d3872a3cc04 Mon Sep 17 00:00:00 2001 From: Pratap Nirujogi Date: Mon, 9 Jun 2025 11:53:56 -0400 Subject: [PATCH 188/289] i2c: amd-isp: Initialize unique adapter name Initialize unique name for amdisp i2c adapter, which is used in the platform driver to detect the matching adapter for i2c_client creation. Add definition of amdisp i2c adapter name in a new header file (include/linux/soc/amd/isp4_misc.h) as it is referred in different driver modules. Tested-by: Randy Dunlap Signed-off-by: Pratap Nirujogi Signed-off-by: Andi Shyti Link: https://lore.kernel.org/r/20250609155601.1477055-3-pratap.nirujogi@amd.com --- MAINTAINERS | 1 + drivers/i2c/busses/i2c-designware-amdisp.c | 2 ++ include/linux/soc/amd/isp4_misc.h | 12 ++++++++++++ 3 files changed, 15 insertions(+) create mode 100644 include/linux/soc/amd/isp4_misc.h diff --git a/MAINTAINERS b/MAINTAINERS index c3f7fbd0d67a..8719f097aae3 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -24063,6 +24063,7 @@ M: Bin Du L: linux-i2c@vger.kernel.org S: Maintained F: drivers/i2c/busses/i2c-designware-amdisp.c +F: include/linux/soc/amd/isp4_misc.h SYNOPSYS DESIGNWARE MMC/SD/SDIO DRIVER M: Jaehoon Chung diff --git a/drivers/i2c/busses/i2c-designware-amdisp.c b/drivers/i2c/busses/i2c-designware-amdisp.c index ad6f08338124..450793d5f839 100644 --- a/drivers/i2c/busses/i2c-designware-amdisp.c +++ b/drivers/i2c/busses/i2c-designware-amdisp.c @@ -8,6 +8,7 @@ #include #include #include +#include #include "i2c-designware-core.h" @@ -62,6 +63,7 @@ static int amd_isp_dw_i2c_plat_probe(struct platform_device *pdev) adap = &isp_i2c_dev->adapter; adap->owner = THIS_MODULE; + scnprintf(adap->name, sizeof(adap->name), AMDISP_I2C_ADAP_NAME); ACPI_COMPANION_SET(&adap->dev, ACPI_COMPANION(&pdev->dev)); adap->dev.of_node = pdev->dev.of_node; /* use dynamically allocated adapter id */ diff --git a/include/linux/soc/amd/isp4_misc.h b/include/linux/soc/amd/isp4_misc.h new file mode 100644 index 000000000000..6738796986a7 --- /dev/null +++ b/include/linux/soc/amd/isp4_misc.h @@ -0,0 +1,12 @@ +// SPDX-License-Identifier: GPL-2.0+ + +/* + * Copyright (C) 2025 Advanced Micro Devices, Inc. + */ + +#ifndef __SOC_ISP4_MISC_H +#define __SOC_ISP4_MISC_H + +#define AMDISP_I2C_ADAP_NAME "AMDISP DesignWare I2C adapter" + +#endif From 577c1e0ef351e41aba764816232a9feb7a9b3969 Mon Sep 17 00:00:00 2001 From: Pratap Nirujogi Date: Mon, 9 Jun 2025 11:53:57 -0400 Subject: [PATCH 189/289] platform/x86: Use i2c adapter name to fix build errors MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Use adapater->name inplace of adapter->owner->name to fix build issues when CONFIG_MODULES is not defined. Fixes: 90b85567e457 ("platform/x86: Add AMD ISP platform config for OV05C10") Reported-by: Randy Dunlap Link: https://lore.kernel.org/all/04577a46-9add-420c-b181-29bad582026d@infradead.org Tested-by: Randy Dunlap Signed-off-by: Pratap Nirujogi Requires: 942e1aece13e ("i2c: designware: Initialize adapter name only when not set" Requires: c8dc57916973 ("i2c: amd-isp: Initialize unique adapter name") Acked-by: Ilpo Järvinen Signed-off-by: Andi Shyti Link: https://lore.kernel.org/r/20250609155601.1477055-4-pratap.nirujogi@amd.com --- drivers/platform/x86/amd/amd_isp4.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/platform/x86/amd/amd_isp4.c b/drivers/platform/x86/amd/amd_isp4.c index 0cc01441bcbb..9f291aeb35f1 100644 --- a/drivers/platform/x86/amd/amd_isp4.c +++ b/drivers/platform/x86/amd/amd_isp4.c @@ -11,6 +11,7 @@ #include #include #include +#include #include #include #include @@ -151,7 +152,7 @@ MODULE_DEVICE_TABLE(acpi, amdisp_sensor_ids); static inline bool is_isp_i2c_adapter(struct i2c_adapter *adap) { - return !strcmp(adap->owner->name, "i2c_designware_amdisp"); + return !strcmp(adap->name, AMDISP_I2C_ADAP_NAME); } static void instantiate_isp_i2c_client(struct amdisp_platform *isp4_platform, From 666c23af755dccca8c25b5d5200ca28153c69a05 Mon Sep 17 00:00:00 2001 From: Christophe JAILLET Date: Sat, 14 Jun 2025 16:59:26 +0200 Subject: [PATCH 190/289] i2c: omap: Fix an error handling path in omap_i2c_probe() If an error occurs after calling mux_state_select(), mux_state_deselect() should be called as already done in the remove function. Fixes: b6ef830c60b6 ("i2c: omap: Add support for setting mux") Signed-off-by: Christophe JAILLET Cc: # v6.15+ Signed-off-by: Andi Shyti Link: https://lore.kernel.org/r/998542981b6d2435c057dd8b9fe71743927babab.1749913149.git.christophe.jaillet@wanadoo.fr --- drivers/i2c/busses/i2c-omap.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/i2c/busses/i2c-omap.c b/drivers/i2c/busses/i2c-omap.c index f1cc26ac5b80..8b01df3cc8e9 100644 --- a/drivers/i2c/busses/i2c-omap.c +++ b/drivers/i2c/busses/i2c-omap.c @@ -1461,13 +1461,13 @@ omap_i2c_probe(struct platform_device *pdev) if (IS_ERR(mux_state)) { r = PTR_ERR(mux_state); dev_dbg(&pdev->dev, "failed to get I2C mux: %d\n", r); - goto err_disable_pm; + goto err_put_pm; } omap->mux_state = mux_state; r = mux_state_select(omap->mux_state); if (r) { dev_err(&pdev->dev, "failed to select I2C mux: %d\n", r); - goto err_disable_pm; + goto err_put_pm; } } @@ -1515,6 +1515,9 @@ omap_i2c_probe(struct platform_device *pdev) err_unuse_clocks: omap_i2c_write_reg(omap, OMAP_I2C_CON_REG, 0); + if (omap->mux_state) + mux_state_deselect(omap->mux_state); +err_put_pm: pm_runtime_dont_use_autosuspend(omap->dev); pm_runtime_put_sync(omap->dev); err_disable_pm: From 4a5e85f4eb8fd18b1266342d100e4f0849544ca0 Mon Sep 17 00:00:00 2001 From: David Hildenbrand Date: Tue, 17 Jun 2025 16:35:32 +0200 Subject: [PATCH 191/289] fs/proc/task_mmu: fix PAGE_IS_PFNZERO detection for the huge zero folio is_zero_pfn() does not work for the huge zero folio. Fix it by using is_huge_zero_pmd(). This can cause the PAGEMAP_SCAN ioctl against /proc/pid/pagemap to present pages as PAGE_IS_PRESENT rather than as PAGE_IS_PFNZERO. Found by code inspection. Link: https://lkml.kernel.org/r/20250617143532.2375383-1-david@redhat.com Fixes: 52526ca7fdb9 ("fs/proc/task_mmu: implement IOCTL to get and optionally clear info about PTEs") Signed-off-by: David Hildenbrand Cc: Muhammad Usama Anjum Cc: Signed-off-by: Andrew Morton --- fs/proc/task_mmu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index 27972c0749e7..4be91eb6ea5c 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -2182,7 +2182,7 @@ static unsigned long pagemap_thp_category(struct pagemap_scan_private *p, categories |= PAGE_IS_FILE; } - if (is_zero_pfn(pmd_pfn(pmd))) + if (is_huge_zero_pmd(pmd)) categories |= PAGE_IS_PFNZERO; if (pmd_soft_dirty(pmd)) categories |= PAGE_IS_SOFT_DIRTY; From 86bc5752a9173589271439911b33b4d952b0a91f Mon Sep 17 00:00:00 2001 From: Michal Hocko Date: Tue, 17 Jun 2025 10:58:19 +0200 Subject: [PATCH 192/289] mm: add OOM killer maintainer structure Add MAINTAINERS info for the oom-killer. [akpm@linux-foundation.org: fix mhocko email address (SeongJae), add files (Lorenzo)] [akpm@linux-foundation.org: fix ordering] Link: https://lkml.kernel.org/r/20250617085819.355838-1-mhocko@kernel.org Signed-off-by: Michal Hocko Acked-by: David Rientjes Acked-by: Shakeel Butt Acked-by: Lorenzo Stoakes Acked-by: SeongJae Park Acked-by: Vlastimil Babka Cc: Michal Hocko Signed-off-by: Andrew Morton --- MAINTAINERS | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index efb51ee92683..689b9a20b6df 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -15848,6 +15848,17 @@ F: mm/numa.c F: mm/numa_emulation.c F: mm/numa_memblks.c +MEMORY MANAGEMENT - OOM KILLER +M: Michal Hocko +R: David Rientjes +R: Shakeel Butt +L: linux-mm@kvack.org +S: Maintained +F: include/linux/oom.h +F: include/trace/events/oom.h +F: include/uapi/linux/oom.h +F: mm/oom_kill.c + MEMORY MANAGEMENT - PAGE ALLOCATOR M: Andrew Morton M: Vlastimil Babka From 71aa17fd981be2ef683bf83326128c505abe070c Mon Sep 17 00:00:00 2001 From: "Mike Rapoport (Microsoft)" Date: Wed, 18 Jun 2025 19:47:52 +0300 Subject: [PATCH 193/289] MAINTAINERS: add tree entry to mm init block Link: https://lkml.kernel.org/r/aFLubPfiO5hqfhCe@kernel.org Signed-off-by: Mike Rapoport (Microsoft) Cc: Lorenzo Stoakes Signed-off-by: Andrew Morton --- MAINTAINERS | 2 ++ 1 file changed, 2 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index 689b9a20b6df..fc6cc0b2e9a3 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -15676,6 +15676,8 @@ MEMBLOCK AND MEMORY MANAGEMENT INITIALIZATION M: Mike Rapoport L: linux-mm@kvack.org S: Maintained +T: git git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock.git for-next +T: git git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock.git fixes F: Documentation/core-api/boot-time-mm.rst F: Documentation/core-api/kho/bindings/memblock/* F: include/linux/memblock.h From 3746351e876b42f5221c2f85ba60965b38494ff2 Mon Sep 17 00:00:00 2001 From: Lorenzo Stoakes Date: Wed, 18 Jun 2025 11:59:53 +0100 Subject: [PATCH 194/289] MAINTAINERS: add missing files to mm page alloc section There are a number of files within memory management which appear to be most suitably placed within the page allocation section of MAINTAINERS and are otherwise unassigned, so place these there. Link: https://lkml.kernel.org/r/20250618105953.67630-1-lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes Acked-by: Vlastimil Babka Acked-by: Brendan Jackman Acked-by: Zi Yan Acked-by: David Hildenbrand Cc: Johannes Weiner Cc: Michal Hocko Cc: Suren Baghdasaryan Signed-off-by: Andrew Morton --- MAINTAINERS | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index fc6cc0b2e9a3..4e131bb97894 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -15875,8 +15875,17 @@ F: include/linux/compaction.h F: include/linux/gfp.h F: include/linux/page-isolation.h F: mm/compaction.c +F: mm/debug_page_alloc.c +F: mm/fail_page_alloc.c F: mm/page_alloc.c +F: mm/page_ext.c +F: mm/page_frag_cache.c F: mm/page_isolation.c +F: mm/page_owner.c +F: mm/page_poison.c +F: mm/page_reporting.c +F: mm/show_mem.c +F: mm/shuffle.c MEMORY MANAGEMENT - RECLAIM M: Andrew Morton From 344ef45b03336e7f74658814f66483b5417c9cf1 Mon Sep 17 00:00:00 2001 From: Ge Yang Date: Tue, 27 May 2025 11:36:50 +0800 Subject: [PATCH 195/289] mm/hugetlb: remove unnecessary holding of hugetlb_lock In isolate_or_dissolve_huge_folio(), after acquiring the hugetlb_lock, it is only for the purpose of obtaining the correct hstate, which is then passed to alloc_and_dissolve_hugetlb_folio(). alloc_and_dissolve_hugetlb_folio() itself also acquires the hugetlb_lock. We can have alloc_and_dissolve_hugetlb_folio() obtain the hstate by itself, so that isolate_or_dissolve_huge_folio() no longer needs to acquire the hugetlb_lock. In addition, we keep the folio_test_hugetlb() check within isolate_or_dissolve_huge_folio(). By doing so, we can avoid disrupting the normal path by vainly holding the hugetlb_lock. replace_free_hugepage_folios() has the same issue, and we should address it as well. Addresses a possible performance problem which was added by the hotfix 113ed54ad276 ("mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios"). Link: https://lkml.kernel.org/r/1748317010-16272-1-git-send-email-yangge1116@126.com Fixes: 113ed54ad276 ("mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios") Signed-off-by: Ge Yang Suggested-by: Oscar Salvador Reviewed-by: Muchun Song Cc: Baolin Wang Cc: Barry Song <21cnbao@gmail.com> Cc: David Hildenbrand Cc: Signed-off-by: Andrew Morton --- mm/hugetlb.c | 54 +++++++++++++++++----------------------------------- 1 file changed, 17 insertions(+), 37 deletions(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 8746ed2fec13..9dc95eac558c 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -2787,20 +2787,24 @@ void restore_reserve_on_error(struct hstate *h, struct vm_area_struct *vma, /* * alloc_and_dissolve_hugetlb_folio - Allocate a new folio and dissolve * the old one - * @h: struct hstate old page belongs to * @old_folio: Old folio to dissolve * @list: List to isolate the page in case we need to * Returns 0 on success, otherwise negated error. */ -static int alloc_and_dissolve_hugetlb_folio(struct hstate *h, - struct folio *old_folio, struct list_head *list) +static int alloc_and_dissolve_hugetlb_folio(struct folio *old_folio, + struct list_head *list) { - gfp_t gfp_mask = htlb_alloc_mask(h) | __GFP_THISNODE; + gfp_t gfp_mask; + struct hstate *h; int nid = folio_nid(old_folio); struct folio *new_folio = NULL; int ret = 0; retry: + /* + * The old_folio might have been dissolved from under our feet, so make sure + * to carefully check the state under the lock. + */ spin_lock_irq(&hugetlb_lock); if (!folio_test_hugetlb(old_folio)) { /* @@ -2829,8 +2833,10 @@ retry: cond_resched(); goto retry; } else { + h = folio_hstate(old_folio); if (!new_folio) { spin_unlock_irq(&hugetlb_lock); + gfp_mask = htlb_alloc_mask(h) | __GFP_THISNODE; new_folio = alloc_buddy_hugetlb_folio(h, gfp_mask, nid, NULL, NULL); if (!new_folio) @@ -2874,35 +2880,24 @@ free_new: int isolate_or_dissolve_huge_folio(struct folio *folio, struct list_head *list) { - struct hstate *h; int ret = -EBUSY; - /* - * The page might have been dissolved from under our feet, so make sure - * to carefully check the state under the lock. - * Return success when racing as if we dissolved the page ourselves. - */ - spin_lock_irq(&hugetlb_lock); - if (folio_test_hugetlb(folio)) { - h = folio_hstate(folio); - } else { - spin_unlock_irq(&hugetlb_lock); + /* Not to disrupt normal path by vainly holding hugetlb_lock */ + if (!folio_test_hugetlb(folio)) return 0; - } - spin_unlock_irq(&hugetlb_lock); /* * Fence off gigantic pages as there is a cyclic dependency between * alloc_contig_range and them. Return -ENOMEM as this has the effect * of bailing out right away without further retrying. */ - if (hstate_is_gigantic(h)) + if (folio_order(folio) > MAX_PAGE_ORDER) return -ENOMEM; if (folio_ref_count(folio) && folio_isolate_hugetlb(folio, list)) ret = 0; else if (!folio_ref_count(folio)) - ret = alloc_and_dissolve_hugetlb_folio(h, folio, list); + ret = alloc_and_dissolve_hugetlb_folio(folio, list); return ret; } @@ -2916,7 +2911,6 @@ int isolate_or_dissolve_huge_folio(struct folio *folio, struct list_head *list) */ int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn) { - struct hstate *h; struct folio *folio; int ret = 0; @@ -2925,23 +2919,9 @@ int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn) while (start_pfn < end_pfn) { folio = pfn_folio(start_pfn); - /* - * The folio might have been dissolved from under our feet, so make sure - * to carefully check the state under the lock. - */ - spin_lock_irq(&hugetlb_lock); - if (folio_test_hugetlb(folio)) { - h = folio_hstate(folio); - } else { - spin_unlock_irq(&hugetlb_lock); - start_pfn++; - continue; - } - spin_unlock_irq(&hugetlb_lock); - - if (!folio_ref_count(folio)) { - ret = alloc_and_dissolve_hugetlb_folio(h, folio, - &isolate_list); + /* Not to disrupt normal path by vainly holding hugetlb_lock */ + if (folio_test_hugetlb(folio) && !folio_ref_count(folio)) { + ret = alloc_and_dissolve_hugetlb_folio(folio, &isolate_list); if (ret) break; From df831e97739405ecbaddb85516bc7d4d1c933d6b Mon Sep 17 00:00:00 2001 From: Yu Kuai Date: Thu, 19 Jun 2025 21:26:55 +0800 Subject: [PATCH 196/289] lib/group_cpus: fix NULL pointer dereference from group_cpus_evenly() While testing null_blk with configfs, echo 0 > poll_queues will trigger following panic: BUG: kernel NULL pointer dereference, address: 0000000000000010 Oops: Oops: 0000 [#1] SMP NOPTI CPU: 27 UID: 0 PID: 920 Comm: bash Not tainted 6.15.0-02023-gadbdb95c8696-dirty #1238 PREEMPT(undef) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.1-2.fc37 04/01/2014 RIP: 0010:__bitmap_or+0x48/0x70 Call Trace: __group_cpus_evenly+0x822/0x8c0 group_cpus_evenly+0x2d9/0x490 blk_mq_map_queues+0x1e/0x110 null_map_queues+0xc9/0x170 [null_blk] blk_mq_update_queue_map+0xdb/0x160 blk_mq_update_nr_hw_queues+0x22b/0x560 nullb_update_nr_hw_queues+0x71/0xf0 [null_blk] nullb_device_poll_queues_store+0xa4/0x130 [null_blk] configfs_write_iter+0x109/0x1d0 vfs_write+0x26e/0x6f0 ksys_write+0x79/0x180 __x64_sys_write+0x1d/0x30 x64_sys_call+0x45c4/0x45f0 do_syscall_64+0xa5/0x240 entry_SYSCALL_64_after_hwframe+0x76/0x7e Root cause is that numgrps is set to 0, and ZERO_SIZE_PTR is returned from kcalloc(), and later ZERO_SIZE_PTR will be deferenced. Fix the problem by checking numgrps first in group_cpus_evenly(), and return NULL directly if numgrps is zero. [yukuai3@huawei.com: also fix the non-SMP version] Link: https://lkml.kernel.org/r/20250620010958.1265984-1-yukuai1@huaweicloud.com Link: https://lkml.kernel.org/r/20250619132655.3318883-1-yukuai1@huaweicloud.com Fixes: 6a6dcae8f486 ("blk-mq: Build default queue map via group_cpus_evenly()") Signed-off-by: Yu Kuai Reviewed-by: Ming Lei Reviewed-by: Jens Axboe Cc: ErKun Yang Cc: John Garry Cc: Thomas Gleinxer Cc: "zhangyi (F)" Cc: Signed-off-by: Andrew Morton --- lib/group_cpus.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/lib/group_cpus.c b/lib/group_cpus.c index ee272c4cefcc..18d43a406114 100644 --- a/lib/group_cpus.c +++ b/lib/group_cpus.c @@ -352,6 +352,9 @@ struct cpumask *group_cpus_evenly(unsigned int numgrps) int ret = -ENOMEM; struct cpumask *masks = NULL; + if (numgrps == 0) + return NULL; + if (!zalloc_cpumask_var(&nmsk, GFP_KERNEL)) return NULL; @@ -426,8 +429,12 @@ struct cpumask *group_cpus_evenly(unsigned int numgrps) #else /* CONFIG_SMP */ struct cpumask *group_cpus_evenly(unsigned int numgrps) { - struct cpumask *masks = kcalloc(numgrps, sizeof(*masks), GFP_KERNEL); + struct cpumask *masks; + if (numgrps == 0) + return NULL; + + masks = kcalloc(numgrps, sizeof(*masks), GFP_KERNEL); if (!masks) return NULL; From f5769359c5b241978e6933672bb78b3adc36aa18 Mon Sep 17 00:00:00 2001 From: Hao Ge Date: Fri, 20 Jun 2025 02:31:54 +0800 Subject: [PATCH 197/289] mm/alloc_tag: fix the kmemleak false positive issue in the allocation of the percpu variable tag->counters When loading a module, as long as the module has memory allocation operations, kmemleak produces a false positive report that resembles the following: unreferenced object (percpu) 0x7dfd232a1650 (size 16): comm "modprobe", pid 1301, jiffies 4294940249 hex dump (first 16 bytes on cpu 2): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace (crc 0): kmemleak_alloc_percpu+0xb4/0xd0 pcpu_alloc_noprof+0x700/0x1098 load_module+0xd4/0x348 codetag_module_init+0x20c/0x450 codetag_load_module+0x70/0xb8 load_module+0xef8/0x1608 init_module_from_file+0xec/0x158 idempotent_init_module+0x354/0x608 __arm64_sys_finit_module+0xbc/0x150 invoke_syscall+0xd4/0x258 el0_svc_common.constprop.0+0xb4/0x240 do_el0_svc+0x48/0x68 el0_svc+0x40/0xf8 el0t_64_sync_handler+0x10c/0x138 el0t_64_sync+0x1ac/0x1b0 This is because the module can only indirectly reference alloc_tag_counters through the alloc_tag section, which misleads kmemleak. However, we don't have a kmemleak ignore interface for percpu allocations yet. So let's create one and invoke it for tag->counters. [gehao@kylinos.cn: fix build error when CONFIG_DEBUG_KMEMLEAK=n, s/igonore/ignore/] Link: https://lkml.kernel.org/r/20250620093102.2416767-1-hao.ge@linux.dev Link: https://lkml.kernel.org/r/20250619183154.2122608-1-hao.ge@linux.dev Fixes: 12ca42c23775 ("alloc_tag: allocate percpu counters for module tags dynamically") Signed-off-by: Hao Ge Reviewed-by: Catalin Marinas Acked-by: Suren Baghdasaryan [lib/alloc_tag.c] Cc: Kent Overstreet Signed-off-by: Andrew Morton --- include/linux/kmemleak.h | 4 ++++ lib/alloc_tag.c | 8 +++++++- mm/kmemleak.c | 14 ++++++++++++++ 3 files changed, 25 insertions(+), 1 deletion(-) diff --git a/include/linux/kmemleak.h b/include/linux/kmemleak.h index 93a73c076d16..fbd424b2abb1 100644 --- a/include/linux/kmemleak.h +++ b/include/linux/kmemleak.h @@ -28,6 +28,7 @@ extern void kmemleak_update_trace(const void *ptr) __ref; extern void kmemleak_not_leak(const void *ptr) __ref; extern void kmemleak_transient_leak(const void *ptr) __ref; extern void kmemleak_ignore(const void *ptr) __ref; +extern void kmemleak_ignore_percpu(const void __percpu *ptr) __ref; extern void kmemleak_scan_area(const void *ptr, size_t size, gfp_t gfp) __ref; extern void kmemleak_no_scan(const void *ptr) __ref; extern void kmemleak_alloc_phys(phys_addr_t phys, size_t size, @@ -97,6 +98,9 @@ static inline void kmemleak_not_leak(const void *ptr) static inline void kmemleak_transient_leak(const void *ptr) { } +static inline void kmemleak_ignore_percpu(const void __percpu *ptr) +{ +} static inline void kmemleak_ignore(const void *ptr) { } diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c index d48b80f3f007..3a74d63a959e 100644 --- a/lib/alloc_tag.c +++ b/lib/alloc_tag.c @@ -10,6 +10,7 @@ #include #include #include +#include #define ALLOCINFO_FILE_NAME "allocinfo" #define MODULE_ALLOC_TAG_VMAP_SIZE (100000UL * sizeof(struct alloc_tag)) @@ -632,8 +633,13 @@ static int load_module(struct module *mod, struct codetag *start, struct codetag mod->name); return -ENOMEM; } - } + /* + * Avoid a kmemleak false positive. The pointer to the counters is stored + * in the alloc_tag section of the module and cannot be directly accessed. + */ + kmemleak_ignore_percpu(tag->counters); + } return 0; } diff --git a/mm/kmemleak.c b/mm/kmemleak.c index da9cee34ee1b..8d588e685311 100644 --- a/mm/kmemleak.c +++ b/mm/kmemleak.c @@ -1246,6 +1246,20 @@ void __ref kmemleak_transient_leak(const void *ptr) } EXPORT_SYMBOL(kmemleak_transient_leak); +/** + * kmemleak_ignore_percpu - similar to kmemleak_ignore but taking a percpu + * address argument + * @ptr: percpu address of the object + */ +void __ref kmemleak_ignore_percpu(const void __percpu *ptr) +{ + pr_debug("%s(0x%px)\n", __func__, ptr); + + if (kmemleak_enabled && ptr && !IS_ERR_PCPU(ptr)) + make_black_object((unsigned long)ptr, OBJECT_PERCPU); +} +EXPORT_SYMBOL_GPL(kmemleak_ignore_percpu); + /** * kmemleak_ignore - ignore an allocated object * @ptr: pointer to beginning of the object From 4f489fe6afb395dbc79840efa3c05440b760d883 Mon Sep 17 00:00:00 2001 From: SeongJae Park Date: Thu, 19 Jun 2025 11:36:07 -0700 Subject: [PATCH 198/289] mm/damon/sysfs-schemes: free old damon_sysfs_scheme_filter->memcg_path on write memcg_path_store() assigns a newly allocated memory buffer to filter->memcg_path, without deallocating the previously allocated and assigned memory buffer. As a result, users can leak kernel memory by continuously writing a data to memcg_path DAMOS sysfs file. Fix the leak by deallocating the previously set memory buffer. Link: https://lkml.kernel.org/r/20250619183608.6647-2-sj@kernel.org Fixes: 7ee161f18b5d ("mm/damon/sysfs-schemes: implement filter directory") Signed-off-by: SeongJae Park Cc: Shuah Khan Cc: [6.3.x] Signed-off-by: Andrew Morton --- mm/damon/sysfs-schemes.c | 1 + 1 file changed, 1 insertion(+) diff --git a/mm/damon/sysfs-schemes.c b/mm/damon/sysfs-schemes.c index 0f6c9e1fec0b..30ae7518ffbf 100644 --- a/mm/damon/sysfs-schemes.c +++ b/mm/damon/sysfs-schemes.c @@ -472,6 +472,7 @@ static ssize_t memcg_path_store(struct kobject *kobj, return -ENOMEM; strscpy(path, buf, count + 1); + kfree(filter->memcg_path); filter->memcg_path = path; return count; } From 79300ac805b672a84b64d80d4cbc374d83411599 Mon Sep 17 00:00:00 2001 From: Florian Fainelli Date: Thu, 19 Jun 2025 15:51:05 -0700 Subject: [PATCH 199/289] scripts/gdb: fix dentry_name() lookup The "d_iname" member was replaced with "d_shortname.string" in the commit referenced in the Fixes tag. This prevented the GDB script "lx-mount" command to properly function: (gdb) lx-mounts mount super_block devname pathname fstype options 0xff11000002d21180 0xff11000002d24800 rootfs / rootfs rw 0 0 0xff11000002e18a80 0xff11000003713000 /dev/root / ext4 rw,relatime 0 0 Python Exception : There is no member named d_iname. Error occurred in Python: There is no member named d_iname. Link: https://lkml.kernel.org/r/20250619225105.320729-1-florian.fainelli@broadcom.com Fixes: 58cf9c383c5c ("dcache: back inline names with a struct-wrapped array of unsigned long") Signed-off-by: Florian Fainelli Cc: Al Viro Cc: Jan Kara Cc: Jan Kiszka Cc: Jeff Layton Cc: Kieran Bingham Cc: Signed-off-by: Andrew Morton --- scripts/gdb/linux/vfs.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/scripts/gdb/linux/vfs.py b/scripts/gdb/linux/vfs.py index c77b9ce75f6d..b5fbb18ccb77 100644 --- a/scripts/gdb/linux/vfs.py +++ b/scripts/gdb/linux/vfs.py @@ -22,7 +22,7 @@ def dentry_name(d): if parent == d or parent == 0: return "" p = dentry_name(d['d_parent']) + "/" - return p + d['d_iname'].string() + return p + d['d_shortname']['string'].string() class DentryName(gdb.Function): """Return string of the full path of a dentry. From befd9a71d859ea625eaa84dae1b243efb3df3eca Mon Sep 17 00:00:00 2001 From: Haiyue Wang Date: Sun, 22 Jun 2025 01:13:51 +0800 Subject: [PATCH 200/289] fuse: fix runtime warning on truncate_folio_batch_exceptionals() The WARN_ON_ONCE is introduced on truncate_folio_batch_exceptionals() to capture whether the filesystem has removed all DAX entries or not. And the fix has been applied on the filesystem xfs and ext4 by the commit 0e2f80afcfa6 ("fs/dax: ensure all pages are idle prior to filesystem unmount"). Apply the missed fix on filesystem fuse to fix the runtime warning: [ 2.011450] ------------[ cut here ]------------ [ 2.011873] WARNING: CPU: 0 PID: 145 at mm/truncate.c:89 truncate_folio_batch_exceptionals+0x272/0x2b0 [ 2.012468] Modules linked in: [ 2.012718] CPU: 0 UID: 1000 PID: 145 Comm: weston Not tainted 6.16.0-rc2-WSL2-STABLE #2 PREEMPT(undef) [ 2.013292] RIP: 0010:truncate_folio_batch_exceptionals+0x272/0x2b0 [ 2.013704] Code: 48 63 d0 41 29 c5 48 8d 1c d5 00 00 00 00 4e 8d 6c 2a 01 49 c1 e5 03 eb 09 48 83 c3 08 49 39 dd 74 83 41 f6 44 1c 08 01 74 ef <0f> 0b 49 8b 34 1e 48 89 ef e8 10 a2 17 00 eb df 48 8b 7d 00 e8 35 [ 2.014845] RSP: 0018:ffffa47ec33f3b10 EFLAGS: 00010202 [ 2.015279] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 [ 2.015884] RDX: 0000000000000000 RSI: ffffa47ec33f3ca0 RDI: ffff98aa44f3fa80 [ 2.016377] RBP: ffff98aa44f3fbf0 R08: ffffa47ec33f3ba8 R09: 0000000000000000 [ 2.016942] R10: 0000000000000001 R11: 0000000000000000 R12: ffffa47ec33f3ca0 [ 2.017437] R13: 0000000000000008 R14: ffffa47ec33f3ba8 R15: 0000000000000000 [ 2.017972] FS: 000079ce006afa40(0000) GS:ffff98aade441000(0000) knlGS:0000000000000000 [ 2.018510] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 2.018987] CR2: 000079ce03e74000 CR3: 000000010784f006 CR4: 0000000000372eb0 [ 2.019518] Call Trace: [ 2.019729] [ 2.019901] truncate_inode_pages_range+0xd8/0x400 [ 2.020280] ? timerqueue_add+0x66/0xb0 [ 2.020574] ? get_nohz_timer_target+0x2a/0x140 [ 2.020904] ? timerqueue_add+0x66/0xb0 [ 2.021231] ? timerqueue_del+0x2e/0x50 [ 2.021646] ? __remove_hrtimer+0x39/0x90 [ 2.022017] ? srso_alias_untrain_ret+0x1/0x10 [ 2.022497] ? psi_group_change+0x136/0x350 [ 2.023046] ? _raw_spin_unlock+0xe/0x30 [ 2.023514] ? finish_task_switch.isra.0+0x8d/0x280 [ 2.024068] ? __schedule+0x532/0xbd0 [ 2.024551] fuse_evict_inode+0x29/0x190 [ 2.025131] evict+0x100/0x270 [ 2.025641] ? _atomic_dec_and_lock+0x39/0x50 [ 2.026316] ? __pfx_generic_delete_inode+0x10/0x10 [ 2.026843] __dentry_kill+0x71/0x180 [ 2.027335] dput+0xeb/0x1b0 [ 2.027725] __fput+0x136/0x2b0 [ 2.028054] __x64_sys_close+0x3d/0x80 [ 2.028469] do_syscall_64+0x6d/0x1b0 [ 2.028832] ? clear_bhb_loop+0x30/0x80 [ 2.029182] ? clear_bhb_loop+0x30/0x80 [ 2.029533] ? clear_bhb_loop+0x30/0x80 [ 2.029902] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 2.030423] RIP: 0033:0x79ce03d0d067 [ 2.030820] Code: b8 ff ff ff ff e9 3e ff ff ff 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 41 c3 48 83 ec 18 89 7c 24 0c e8 c3 a7 f8 ff [ 2.032354] RSP: 002b:00007ffef0498948 EFLAGS: 00000246 ORIG_RAX: 0000000000000003 [ 2.032939] RAX: ffffffffffffffda RBX: 00007ffef0498960 RCX: 000079ce03d0d067 [ 2.033612] RDX: 0000000000000003 RSI: 0000000000001000 RDI: 000000000000000d [ 2.034289] RBP: 00007ffef0498a30 R08: 000000000000000d R09: 0000000000000000 [ 2.034944] R10: 00007ffef0498978 R11: 0000000000000246 R12: 0000000000000001 [ 2.035610] R13: 00007ffef0498960 R14: 000079ce03e09ce0 R15: 0000000000000003 [ 2.036301] [ 2.036532] ---[ end trace 0000000000000000 ]--- Link: https://lkml.kernel.org/r/20250621171507.3770-1-haiyuewa@163.com Fixes: bde708f1a65d ("fs/dax: always remove DAX page-cache entries when breaking layouts") Signed-off-by: Haiyue Wang Cc: Alistair Popple Cc: Dan Williams Cc: Miklos Szeredi Cc: Signed-off-by: Andrew Morton --- fs/fuse/inode.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index bfe8d8af46f3..9572bdef49ee 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -9,6 +9,7 @@ #include "fuse_i.h" #include "dev_uring_i.h" +#include #include #include #include @@ -162,6 +163,9 @@ static void fuse_evict_inode(struct inode *inode) /* Will write inode on close/munmap and in all other dirtiers */ WARN_ON(inode->i_state & I_DIRTY_INODE); + if (FUSE_IS_DAX(inode)) + dax_break_layout_final(inode); + truncate_inode_pages_final(&inode->i_data); clear_inode(inode); if (inode->i_sb->s_flags & SB_ACTIVE) { From b160a5cc6ac761f341e58e48ae3f03a04d310677 Mon Sep 17 00:00:00 2001 From: Zijun Hu Date: Fri, 20 Jun 2025 19:53:53 +0800 Subject: [PATCH 201/289] mailmap: add entries for Zijun Hu Map my old qualcomm email addresses: Zijun Hu Zijun Hu To the current one: Zijun Hu Link: https://lkml.kernel.org/r/20250620-my_mailmap-v1-1-11ea3db8ba1e@oss.qualcomm.com Signed-off-by: Zijun Hu Cc: Hans verkuil Signed-off-by: Andrew Morton --- .mailmap | 2 ++ 1 file changed, 2 insertions(+) diff --git a/.mailmap b/.mailmap index d57531dab08b..85f13b941f08 100644 --- a/.mailmap +++ b/.mailmap @@ -830,3 +830,5 @@ Yosry Ahmed Yusuke Goda Zack Rusin Zhu Yanjun +Zijun Hu +Zijun Hu From c9e8efa0b384c43670852f955567b31a5eb4ac2b Mon Sep 17 00:00:00 2001 From: Zijun Hu Date: Fri, 20 Jun 2025 19:53:54 +0800 Subject: [PATCH 202/289] mailmap: correct name for a historical account of Zijun Hu Correct the name for from 'zijun_hu' to 'Zijun Hu'. Link: https://lkml.kernel.org/r/20250620-my_mailmap-v1-2-11ea3db8ba1e@oss.qualcomm.com Signed-off-by: Zijun Hu Cc: Hans verkuil Signed-off-by: Andrew Morton --- .mailmap | 1 + 1 file changed, 1 insertion(+) diff --git a/.mailmap b/.mailmap index 85f13b941f08..a5534ff9c1d7 100644 --- a/.mailmap +++ b/.mailmap @@ -832,3 +832,4 @@ Zack Rusin Zhu Yanjun Zijun Hu Zijun Hu +Zijun Hu From b6f5e748587063d4ad3294d07ca29886851d9cd7 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Fri, 20 Jun 2025 13:21:22 +0200 Subject: [PATCH 203/289] crashdump: add CONFIG_KEYS dependency The dm_crypt code fails to build without CONFIG_KEYS: kernel/crash_dump_dm_crypt.c: In function 'restore_dm_crypt_keys_to_thread_keyring': kernel/crash_dump_dm_crypt.c:105:9: error: unknown type name 'key_ref_t'; did you mean 'key_ref_put'? There is a mix of 'select KEYS' and 'depends on KEYS' in Kconfig, so there is no single obvious solution here, but generally using 'depends on' makes more sense and is less likely to cause dependency loops. Link: https://lkml.kernel.org/r/20250620112140.3396316-1-arnd@kernel.org Fixes: 62f17d9df692 ("crash_dump: retrieve dm crypt keys in kdump kernel") Signed-off-by: Arnd Bergmann Cc: Alexander Graf Cc: Baoquan He Cc: Coiby Xu Cc: Dave Vasilevsky Cc: Eric Biggers Cc: Michael Ellerman Signed-off-by: Andrew Morton --- kernel/Kconfig.kexec | 1 + 1 file changed, 1 insertion(+) diff --git a/kernel/Kconfig.kexec b/kernel/Kconfig.kexec index e64ce21f9a80..2ee603a98813 100644 --- a/kernel/Kconfig.kexec +++ b/kernel/Kconfig.kexec @@ -134,6 +134,7 @@ config CRASH_DM_CRYPT depends on KEXEC_FILE depends on CRASH_DUMP depends on DM_CRYPT + depends on KEYS help With this option enabled, user space can intereact with /sys/kernel/config/crash_dm_crypt_keys to make the dm crypt keys From 7c942f87cc0be5699b1ce434d369eccd8b5321d4 Mon Sep 17 00:00:00 2001 From: Dev Jain Date: Fri, 20 Jun 2025 16:41:50 +0530 Subject: [PATCH 204/289] selftests/mm: fix validate_addr() helper validate_addr() checks whether the address returned by mmap() lies in the low or high VA space, according to whether a high addr hint was passed or not. The fix commit mentioned below changed the code in such a way that this function will always return failure when passed high_addr == 1; addr will be >= HIGH_ADDR_MARK always, we will fall down to "if (addr > HIGH_ADDR_MARK)" and return failure. Fix this. Link: https://lkml.kernel.org/r/20250620111150.50344-1-dev.jain@arm.com Fixes: d1d86ce28d0f ("selftests/mm: virtual_address_range: conform to TAP format output") Signed-off-by: Dev Jain Reviewed-by: Donet Tom Acked-by: David Hildenbrand Cc: Anshuman Khandual Cc: Lorenzo Stoakes Cc: Ryan Roberts Cc: Shuah Khan Signed-off-by: Andrew Morton --- tools/testing/selftests/mm/virtual_address_range.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c index b380e102b22f..169dbd692bf5 100644 --- a/tools/testing/selftests/mm/virtual_address_range.c +++ b/tools/testing/selftests/mm/virtual_address_range.c @@ -77,8 +77,11 @@ static void validate_addr(char *ptr, int high_addr) { unsigned long addr = (unsigned long) ptr; - if (high_addr && addr < HIGH_ADDR_MARK) - ksft_exit_fail_msg("Bad address %lx\n", addr); + if (high_addr) { + if (addr < HIGH_ADDR_MARK) + ksft_exit_fail_msg("Bad address %lx\n", addr); + return; + } if (addr > HIGH_ADDR_MARK) ksft_exit_fail_msg("Bad address %lx\n", addr); From 02d67850aecaf2306d849a3c804a125bb4f9c5f6 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Duje=20Mihanovi=C4=87?= Date: Fri, 20 Jun 2025 11:25:35 +0200 Subject: [PATCH 205/289] =?UTF-8?q?mailmap:=20update=20Duje=20Mihanovi?= =?UTF-8?q?=C4=87's=20email=20address?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit I'm switching to a new mail address, so map my old one to it. Link: https://lkml.kernel.org/r/20250620-mailmap-v1-1-a6b4b72dbd07@dujemihanovic.xyz Signed-off-by: Duje Mihanović Cc: Karel Balej Signed-off-by: Andrew Morton --- .mailmap | 1 + 1 file changed, 1 insertion(+) diff --git a/.mailmap b/.mailmap index a5534ff9c1d7..bd18a7c9d2ce 100644 --- a/.mailmap +++ b/.mailmap @@ -223,6 +223,7 @@ Dmitry Safonov <0x7f454c46@gmail.com> Dmitry Safonov <0x7f454c46@gmail.com> Domen Puncer Douglas Gilbert + Ed L. Cashin Elliot Berman Enric Balletbo i Serra From c0cb210a87fcdda3c25f43b5a64420e6b07d3f53 Mon Sep 17 00:00:00 2001 From: Lorenzo Stoakes Date: Wed, 25 Jun 2025 10:52:31 +0100 Subject: [PATCH 206/289] MAINTAINERS: add Lorenzo as THP co-maintainer I am doing a great deal of review and getting ever more involved in THP with intent to do more so in future also, so add myself as co-maintainer to help David with workload. Link: https://lkml.kernel.org/r/20250625095231.42874-1-lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes Acked-by: David Hildenbrand Acked-by: Baolin Wang Acked-by: Dev Jain Acked-by: Zi Yan Acked-by: Oscar Salvador Cc: Barry Song Cc: Liam Howlett Cc: Mariano Pache Cc: Ryan Roberts Signed-off-by: Andrew Morton --- MAINTAINERS | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index 4e131bb97894..d4fa7884b4a8 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -15945,9 +15945,9 @@ F: mm/swapfile.c MEMORY MANAGEMENT - THP (TRANSPARENT HUGE PAGE) M: Andrew Morton M: David Hildenbrand +M: Lorenzo Stoakes R: Zi Yan R: Baolin Wang -R: Lorenzo Stoakes R: Liam R. Howlett R: Nico Pache R: Ryan Roberts From 1dcea07810c8bb4aba1bfed0bbe015b2d708a903 Mon Sep 17 00:00:00 2001 From: Kent Overstreet Date: Wed, 25 Jun 2025 12:17:51 -0400 Subject: [PATCH 207/289] bcachefs: btree_root_unreadable_and_scan_found_nothing should not be autofix Autofix is specified in btree_gc.c if it's not an important btree. Signed-off-by: Kent Overstreet --- fs/bcachefs/sb-errors_format.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/bcachefs/sb-errors_format.h b/fs/bcachefs/sb-errors_format.h index 041887aad63a..0641fb634bd4 100644 --- a/fs/bcachefs/sb-errors_format.h +++ b/fs/bcachefs/sb-errors_format.h @@ -301,7 +301,7 @@ enum bch_fsck_flags { x(btree_node_bkey_bad_u64s, 260, 0) \ x(btree_node_topology_empty_interior_node, 261, 0) \ x(btree_ptr_v2_min_key_bad, 262, 0) \ - x(btree_root_unreadable_and_scan_found_nothing, 263, FSCK_AUTOFIX) \ + x(btree_root_unreadable_and_scan_found_nothing, 263, 0) \ x(snapshot_node_missing, 264, FSCK_AUTOFIX) \ x(dup_backpointer_to_bad_csum_extent, 265, 0) \ x(btree_bitmap_not_marked, 266, FSCK_AUTOFIX) \ From 3e72acb78b73ccffeaf929c039dc5a0a7a147535 Mon Sep 17 00:00:00 2001 From: Kent Overstreet Date: Wed, 25 Jun 2025 12:45:11 -0400 Subject: [PATCH 208/289] bcachefs: Ensure btree node scan runs before checking for scanned nodes Previously, calling bch2_btree_has_scanned_nodes() when btree node scan hadn't actually run would erroniously return false - causing us to think a btree was entirely gone. This fixes a 6.16 regression from moving the scheduling of btree node scan out of bch2_btree_lost_data() (fixing the bug where we'd schedule it persistently in the superblock) and only scheduling it when check_toploogy() is asking for scanned btree nodes. Signed-off-by: Kent Overstreet --- fs/bcachefs/btree_gc.c | 29 ++++++++++++++++++----------- fs/bcachefs/btree_node_scan.c | 6 +++++- fs/bcachefs/btree_node_scan.h | 2 +- 3 files changed, 24 insertions(+), 13 deletions(-) diff --git a/fs/bcachefs/btree_gc.c b/fs/bcachefs/btree_gc.c index 697c6ecc3a65..bac108e93823 100644 --- a/fs/bcachefs/btree_gc.c +++ b/fs/bcachefs/btree_gc.c @@ -534,32 +534,39 @@ fsck_err: return ret; } -static int bch2_check_root(struct btree_trans *trans, enum btree_id i, +static int bch2_check_root(struct btree_trans *trans, enum btree_id btree, bool *reconstructed_root) { struct bch_fs *c = trans->c; - struct btree_root *r = bch2_btree_id_root(c, i); + struct btree_root *r = bch2_btree_id_root(c, btree); struct printbuf buf = PRINTBUF; int ret = 0; - bch2_btree_id_to_text(&buf, i); + bch2_btree_id_to_text(&buf, btree); if (r->error) { bch_info(c, "btree root %s unreadable, must recover from scan", buf.buf); - r->alive = false; - r->error = 0; + ret = bch2_btree_has_scanned_nodes(c, btree); + if (ret < 0) + goto err; - if (!bch2_btree_has_scanned_nodes(c, i)) { + if (!ret) { __fsck_err(trans, - FSCK_CAN_FIX|(!btree_id_important(i) ? FSCK_AUTOFIX : 0), + FSCK_CAN_FIX|(!btree_id_important(btree) ? FSCK_AUTOFIX : 0), btree_root_unreadable_and_scan_found_nothing, "no nodes found for btree %s, continue?", buf.buf); - bch2_btree_root_alloc_fake_trans(trans, i, 0); + + r->alive = false; + r->error = 0; + bch2_btree_root_alloc_fake_trans(trans, btree, 0); } else { - bch2_btree_root_alloc_fake_trans(trans, i, 1); - bch2_shoot_down_journal_keys(c, i, 1, BTREE_MAX_DEPTH, POS_MIN, SPOS_MAX); - ret = bch2_get_scanned_nodes(c, i, 0, POS_MIN, SPOS_MAX); + r->alive = false; + r->error = 0; + bch2_btree_root_alloc_fake_trans(trans, btree, 1); + + bch2_shoot_down_journal_keys(c, btree, 1, BTREE_MAX_DEPTH, POS_MIN, SPOS_MAX); + ret = bch2_get_scanned_nodes(c, btree, 0, POS_MIN, SPOS_MAX); if (ret) goto err; } diff --git a/fs/bcachefs/btree_node_scan.c b/fs/bcachefs/btree_node_scan.c index a35847734a60..23d8c62ea4b6 100644 --- a/fs/bcachefs/btree_node_scan.c +++ b/fs/bcachefs/btree_node_scan.c @@ -521,8 +521,12 @@ bool bch2_btree_node_is_stale(struct bch_fs *c, struct btree *b) return false; } -bool bch2_btree_has_scanned_nodes(struct bch_fs *c, enum btree_id btree) +int bch2_btree_has_scanned_nodes(struct bch_fs *c, enum btree_id btree) { + int ret = bch2_run_print_explicit_recovery_pass(c, BCH_RECOVERY_PASS_scan_for_btree_nodes); + if (ret) + return ret; + struct found_btree_node search = { .btree_id = btree, .level = 0, diff --git a/fs/bcachefs/btree_node_scan.h b/fs/bcachefs/btree_node_scan.h index 08687b209787..66e6f9ed19d0 100644 --- a/fs/bcachefs/btree_node_scan.h +++ b/fs/bcachefs/btree_node_scan.h @@ -4,7 +4,7 @@ int bch2_scan_for_btree_nodes(struct bch_fs *); bool bch2_btree_node_is_stale(struct bch_fs *, struct btree *); -bool bch2_btree_has_scanned_nodes(struct bch_fs *, enum btree_id); +int bch2_btree_has_scanned_nodes(struct bch_fs *, enum btree_id); int bch2_get_scanned_nodes(struct bch_fs *, enum btree_id, unsigned, struct bpos, struct bpos); void bch2_find_btree_nodes_exit(struct find_btree_nodes *); From 64b6a788bd96a0cdc11073a2d1f85413b078c1f2 Mon Sep 17 00:00:00 2001 From: Kent Overstreet Date: Wed, 25 Jun 2025 00:48:14 -0400 Subject: [PATCH 209/289] bcachefs: Ensure we rewind to run recovery passes Fix a 6.16 regression from the recovery pass rework, which introduced a bug where calling bch2_run_explicit_recovery_pass() would only return the error code to rewind recovery for the first call that scheduled that recovery pass. If the error code from the first call was swallowed (because it was called by an asynchronous codepath), subsequent calls would go "ok, this pass is already marked as needing to run" and return 0. Fixing this ensures that check_topology bails out to run btree_node_scan before doing any repair. Signed-off-by: Kent Overstreet --- fs/bcachefs/recovery_passes.c | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/fs/bcachefs/recovery_passes.c b/fs/bcachefs/recovery_passes.c index c2c18c0a5429..c09ed2dd4639 100644 --- a/fs/bcachefs/recovery_passes.c +++ b/fs/bcachefs/recovery_passes.c @@ -313,6 +313,9 @@ static bool recovery_pass_needs_set(struct bch_fs *c, */ bool in_recovery = test_bit(BCH_FS_in_recovery, &c->flags); bool persistent = !in_recovery || !(*flags & RUN_RECOVERY_PASS_nopersistent); + bool rewind = in_recovery && + r->curr_pass > pass && + !(r->passes_complete & BIT_ULL(pass)); if (persistent ? !(c->sb.recovery_passes_required & BIT_ULL(pass)) @@ -323,6 +326,9 @@ static bool recovery_pass_needs_set(struct bch_fs *c, (r->passes_ratelimiting & BIT_ULL(pass))) return true; + if (rewind) + return true; + return false; } @@ -337,7 +343,6 @@ int __bch2_run_explicit_recovery_pass(struct bch_fs *c, struct bch_fs_recovery *r = &c->recovery; int ret = 0; - lockdep_assert_held(&c->sb_lock); bch2_printbuf_make_room(out, 1024); @@ -408,10 +413,8 @@ int bch2_run_explicit_recovery_pass(struct bch_fs *c, { int ret = 0; - scoped_guard(mutex, &c->sb_lock) { - if (!recovery_pass_needs_set(c, pass, &flags)) - return 0; - + if (recovery_pass_needs_set(c, pass, &flags)) { + guard(mutex)(&c->sb_lock); ret = __bch2_run_explicit_recovery_pass(c, out, pass, flags); bch2_write_super(c); } From ef6fac0f9e5d0695cee1d820c727fe753eca52d5 Mon Sep 17 00:00:00 2001 From: Kent Overstreet Date: Wed, 25 Jun 2025 14:55:59 -0400 Subject: [PATCH 210/289] bcachefs: Plumb correct ip to trans_relock_fail tracepoint Signed-off-by: Kent Overstreet --- fs/bcachefs/btree_locking.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/fs/bcachefs/btree_locking.c b/fs/bcachefs/btree_locking.c index 91a51aef82f1..bed2b4b6ffb9 100644 --- a/fs/bcachefs/btree_locking.c +++ b/fs/bcachefs/btree_locking.c @@ -771,7 +771,7 @@ static inline void __bch2_trans_unlock(struct btree_trans *trans) } static noinline __cold void bch2_trans_relock_fail(struct btree_trans *trans, struct btree_path *path, - struct get_locks_fail *f, bool trace) + struct get_locks_fail *f, bool trace, ulong ip) { if (!trace) goto out; @@ -796,7 +796,7 @@ static noinline __cold void bch2_trans_relock_fail(struct btree_trans *trans, st prt_printf(&buf, " total locked %u.%u.%u", c.n[0], c.n[1], c.n[2]); } - trace_trans_restart_relock(trans, _RET_IP_, buf.buf); + trace_trans_restart_relock(trans, ip, buf.buf); printbuf_exit(&buf); } @@ -806,7 +806,7 @@ out: bch2_trans_verify_locks(trans); } -static inline int __bch2_trans_relock(struct btree_trans *trans, bool trace) +static inline int __bch2_trans_relock(struct btree_trans *trans, bool trace, ulong ip) { bch2_trans_verify_locks(trans); @@ -825,7 +825,7 @@ static inline int __bch2_trans_relock(struct btree_trans *trans, bool trace) if (path->should_be_locked && (ret = btree_path_get_locks(trans, path, false, &f, BCH_ERR_transaction_restart_relock))) { - bch2_trans_relock_fail(trans, path, &f, trace); + bch2_trans_relock_fail(trans, path, &f, trace, ip); return ret; } } @@ -838,12 +838,12 @@ out: int bch2_trans_relock(struct btree_trans *trans) { - return __bch2_trans_relock(trans, true); + return __bch2_trans_relock(trans, true, _RET_IP_); } int bch2_trans_relock_notrace(struct btree_trans *trans) { - return __bch2_trans_relock(trans, false); + return __bch2_trans_relock(trans, false, _RET_IP_); } void bch2_trans_unlock(struct btree_trans *trans) From 7ab6847a03229e73bb7c58ca397630f699e79b53 Mon Sep 17 00:00:00 2001 From: Salvatore Bonaccorso Date: Wed, 25 Jun 2025 20:41:28 +0200 Subject: [PATCH 211/289] ALSA: hda/realtek: Fix built-in mic on ASUS VivoBook X507UAR The built-in mic of ASUS VivoBook X507UAR is broken recently by the fix of the pin sort. The fixup ALC256_FIXUP_ASUS_MIC_NO_PRESENCE is working for addressing the regression, too. Fixes: 3b4309546b48 ("ALSA: hda: Fix headset detection failure due to unstable sort") Reported-by: Igor Tamara Closes: https://bugs.debian.org/1108069 Signed-off-by: Salvatore Bonaccorso Link: https://lore.kernel.org/CADdHDco7_o=4h_epjEAb92Dj-vUz_PoTC2-W9g5ncT2E0NzfeQ@mail.gmail.com Signed-off-by: Takashi Iwai --- sound/pci/hda/patch_realtek.c | 1 + 1 file changed, 1 insertion(+) diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c index 784644a2b51d..5d6d01ecfee2 100644 --- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -11031,6 +11031,7 @@ static const struct hda_quirk alc269_fixup_tbl[] = { SND_PCI_QUIRK(0x1043, 0x1df3, "ASUS UM5606WA", ALC294_FIXUP_BASS_SPEAKER_15), SND_PCI_QUIRK(0x1043, 0x1264, "ASUS UM5606KA", ALC294_FIXUP_BASS_SPEAKER_15), SND_PCI_QUIRK(0x1043, 0x1e02, "ASUS UX3402ZA", ALC245_FIXUP_CS35L41_SPI_2), + SND_PCI_QUIRK(0x1043, 0x1e10, "ASUS VivoBook X507UAR", ALC256_FIXUP_ASUS_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x1043, 0x1e11, "ASUS Zephyrus G15", ALC289_FIXUP_ASUS_GA502), SND_PCI_QUIRK(0x1043, 0x1e12, "ASUS UM3402", ALC287_FIXUP_CS35L41_I2C_2), SND_PCI_QUIRK(0x1043, 0x1e1f, "ASUS Vivobook 15 X1504VAP", ALC2XX_FIXUP_HEADSET_MIC), From 1476b218327b89bbb64c14619a2d34f0c320f2c3 Mon Sep 17 00:00:00 2001 From: Leo Yan Date: Wed, 25 Jun 2025 18:07:37 +0100 Subject: [PATCH 212/289] perf/aux: Fix pending disable flow when the AUX ring buffer overruns If an AUX event overruns, the event core layer intends to disable the event by setting the 'pending_disable' flag. Unfortunately, the event is not actually disabled afterwards. In commit: ca6c21327c6a ("perf: Fix missing SIGTRAPs") the 'pending_disable' flag was changed to a boolean. However, the AUX event code was not updated accordingly. The flag ends up holding a CPU number. If this number is zero, the flag is taken as false and the IRQ work is never triggered. Later, with commit: 2b84def990d3 ("perf: Split __perf_pending_irq() out of perf_pending_irq()") a new IRQ work 'pending_disable_irq' was introduced to handle event disabling. The AUX event path was not updated to kick off the work queue. To fix this bug, when an AUX ring buffer overrun is detected, call perf_event_disable_inatomic() to initiate the pending disable flow. Also update the outdated comment for setting the flag, to reflect the boolean values (0 or 1). Fixes: 2b84def990d3 ("perf: Split __perf_pending_irq() out of perf_pending_irq()") Fixes: ca6c21327c6a ("perf: Fix missing SIGTRAPs") Signed-off-by: Leo Yan Signed-off-by: Ingo Molnar Reviewed-by: James Clark Reviewed-by: Yeoreum Yun Cc: Adrian Hunter Cc: Alexander Shishkin Cc: Arnaldo Carvalho de Melo Cc: Ian Rogers Cc: Jiri Olsa Cc: Liang Kan Cc: Marco Elver Cc: Mark Rutland Cc: Namhyung Kim Cc: Peter Zijlstra Cc: Sebastian Andrzej Siewior Cc: linux-perf-users@vger.kernel.org Link: https://lore.kernel.org/r/20250625170737.2918295-1-leo.yan@arm.com --- kernel/events/core.c | 6 +++--- kernel/events/ring_buffer.c | 4 ++-- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/kernel/events/core.c b/kernel/events/core.c index 1f746469fda5..7281230044d0 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -7251,15 +7251,15 @@ static void __perf_pending_disable(struct perf_event *event) * CPU-A CPU-B * * perf_event_disable_inatomic() - * @pending_disable = CPU-A; + * @pending_disable = 1; * irq_work_queue(); * * sched-out - * @pending_disable = -1; + * @pending_disable = 0; * * sched-in * perf_event_disable_inatomic() - * @pending_disable = CPU-B; + * @pending_disable = 1; * irq_work_queue(); // FAILS * * irq_work_run() diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c index d2aef87c7e9f..aa9a759e824f 100644 --- a/kernel/events/ring_buffer.c +++ b/kernel/events/ring_buffer.c @@ -441,7 +441,7 @@ void *perf_aux_output_begin(struct perf_output_handle *handle, * store that will be enabled on successful return */ if (!handle->size) { /* A, matches D */ - event->pending_disable = smp_processor_id(); + perf_event_disable_inatomic(handle->event); perf_output_wakeup(handle); WRITE_ONCE(rb->aux_nest, 0); goto err_put; @@ -526,7 +526,7 @@ void perf_aux_output_end(struct perf_output_handle *handle, unsigned long size) if (wakeup) { if (handle->aux_flags & PERF_AUX_FLAG_TRUNCATED) - handle->event->pending_disable = smp_processor_id(); + perf_event_disable_inatomic(handle->event); perf_output_wakeup(handle); } From dd2c18548964ae7ad48d208a765d909cd35448a1 Mon Sep 17 00:00:00 2001 From: Keith Busch Date: Wed, 11 Jun 2025 06:50:47 +0200 Subject: [PATCH 213/289] nvme: reset delayed remove_work after reconnect The remove_work will proceed with permanently disconnecting on the initial final path failure if the head shows no paths after the delay. If a new path connects while the remove_work is pending, and if that new path happens to disconnect before that remove_work executes, the delayed removal should reset based on the most recent path disconnect time, but queue_delayed_work() won't do anything if the work is already pending. Attempt to cancel the delayed work when a new path connects, and use mod_delayed_work() in case the remove_work remains pending anyway. Signed-off-by: Keith Busch Reviewed-by: Nilay Shroff Signed-off-by: Christoph Hellwig --- drivers/nvme/host/core.c | 4 ++++ drivers/nvme/host/multipath.c | 2 +- 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index 92697f98c601..724f5732786c 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -4036,6 +4036,10 @@ static int nvme_init_ns_head(struct nvme_ns *ns, struct nvme_ns_info *info) list_add_tail_rcu(&ns->siblings, &head->list); ns->head = head; mutex_unlock(&ctrl->subsys->lock); + +#ifdef CONFIG_NVME_MULTIPATH + cancel_delayed_work(&head->remove_work); +#endif return 0; out_put_ns_head: diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c index 140079ff86e6..1062467595f3 100644 --- a/drivers/nvme/host/multipath.c +++ b/drivers/nvme/host/multipath.c @@ -1311,7 +1311,7 @@ void nvme_mpath_remove_disk(struct nvme_ns_head *head) */ if (!try_module_get(THIS_MODULE)) goto out; - queue_delayed_work(nvme_wq, &head->remove_work, + mod_delayed_work(nvme_wq, &head->remove_work, head->delayed_removal_secs * HZ); } else { list_del_init(&head->entry); From b2e607fecac15e07f50269c080e2e71b5049dfa2 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Wed, 11 Jun 2025 07:09:21 +0200 Subject: [PATCH 214/289] nvme: refactor the atomic write unit detection Move all the code out of nvme_update_disk_info into the helper, and rename the helper to have a somewhat less clumsy name. Signed-off-by: Christoph Hellwig Reviewed-by: Luis Chamberlain Reviewed-by: John Garry --- drivers/nvme/host/core.c | 72 +++++++++++++++++++++------------------- 1 file changed, 38 insertions(+), 34 deletions(-) diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index 724f5732786c..520fb5f1e214 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -2015,21 +2015,51 @@ static void nvme_configure_metadata(struct nvme_ctrl *ctrl, } -static void nvme_update_atomic_write_disk_info(struct nvme_ns *ns, - struct nvme_id_ns *id, struct queue_limits *lim, - u32 bs, u32 atomic_bs) +static u32 nvme_configure_atomic_write(struct nvme_ns *ns, + struct nvme_id_ns *id, struct queue_limits *lim, u32 bs) { - unsigned int boundary = 0; + u32 atomic_bs, boundary = 0; - if (id->nsfeat & NVME_NS_FEAT_ATOMICS && id->nawupf) { - if (le16_to_cpu(id->nabspf)) + /* + * We do not support an offset for the atomic boundaries. + */ + if (id->nabo) + return bs; + + if ((id->nsfeat & NVME_NS_FEAT_ATOMICS) && id->nawupf) { + /* + * Use the per-namespace atomic write unit when available. + */ + atomic_bs = (1 + le16_to_cpu(id->nawupf)) * bs; + if (id->nabspf) boundary = (le16_to_cpu(id->nabspf) + 1) * bs; + } else { + /* + * Use the controller wide atomic write unit. This sucks + * because the limit is defined in terms of logical blocks while + * namespaces can have different formats, and because there is + * no clear language in the specification prohibiting different + * values for different controllers in the subsystem. + */ + atomic_bs = (1 + ns->ctrl->awupf) * bs; } + + if (!ns->ctrl->subsys->atomic_bs) { + ns->ctrl->subsys->atomic_bs = atomic_bs; + } else if (ns->ctrl->subsys->atomic_bs != atomic_bs) { + dev_err_ratelimited(ns->ctrl->device, + "%s: Inconsistent Atomic Write Size, Namespace will not be added: Subsystem=%d bytes, Controller/Namespace=%d bytes\n", + ns->disk ? ns->disk->disk_name : "?", + ns->ctrl->subsys->atomic_bs, + atomic_bs); + } + lim->atomic_write_hw_max = atomic_bs; lim->atomic_write_hw_boundary = boundary; lim->atomic_write_hw_unit_min = bs; lim->atomic_write_hw_unit_max = rounddown_pow_of_two(atomic_bs); lim->features |= BLK_FEAT_ATOMIC_WRITES; + return atomic_bs; } static u32 nvme_max_drv_segments(struct nvme_ctrl *ctrl) @@ -2067,34 +2097,8 @@ static bool nvme_update_disk_info(struct nvme_ns *ns, struct nvme_id_ns *id, valid = false; } - atomic_bs = phys_bs = bs; - if (id->nabo == 0) { - /* - * Bit 1 indicates whether NAWUPF is defined for this namespace - * and whether it should be used instead of AWUPF. If NAWUPF == - * 0 then AWUPF must be used instead. - */ - if (id->nsfeat & NVME_NS_FEAT_ATOMICS && id->nawupf) - atomic_bs = (1 + le16_to_cpu(id->nawupf)) * bs; - else - atomic_bs = (1 + ns->ctrl->awupf) * bs; - - /* - * Set subsystem atomic bs. - */ - if (ns->ctrl->subsys->atomic_bs) { - if (atomic_bs != ns->ctrl->subsys->atomic_bs) { - dev_err_ratelimited(ns->ctrl->device, - "%s: Inconsistent Atomic Write Size, Namespace will not be added: Subsystem=%d bytes, Controller/Namespace=%d bytes\n", - ns->disk ? ns->disk->disk_name : "?", - ns->ctrl->subsys->atomic_bs, - atomic_bs); - } - } else - ns->ctrl->subsys->atomic_bs = atomic_bs; - - nvme_update_atomic_write_disk_info(ns, id, lim, bs, atomic_bs); - } + phys_bs = bs; + atomic_bs = nvme_configure_atomic_write(ns, id, lim, bs); if (id->nsfeat & NVME_NS_FEAT_IO_OPT) { /* NPWG = Namespace Preferred Write Granularity */ From f46d273449ba65afd53f3dd8fe0182c9df877e08 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Wed, 11 Jun 2025 06:54:56 +0200 Subject: [PATCH 215/289] nvme: fix atomic write size validation Don't mix the namespace and controller values, and validate the per-controller limit when probing the controller. This avoid spurious failures for controllers with namespaces that have different namespaces with different logical block sizes, or report the per-namespace values only for some namespaces. It also fixes a missing queue_limits_cancel_update in an error path by removing that error path. Fixes: 8695f060a029 ("nvme: all namespaces in a subsystem must adhere to a common atomic write size") Reported-by: Yi Zhang Signed-off-by: Christoph Hellwig Reviewed-by: Luis Chamberlain Reviewed-by: John Garry Tested-by: Yi Zhang --- drivers/nvme/host/core.c | 33 +++++++++++---------------------- drivers/nvme/host/nvme.h | 3 +-- 2 files changed, 12 insertions(+), 24 deletions(-) diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index 520fb5f1e214..e533d791955d 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -2041,17 +2041,7 @@ static u32 nvme_configure_atomic_write(struct nvme_ns *ns, * no clear language in the specification prohibiting different * values for different controllers in the subsystem. */ - atomic_bs = (1 + ns->ctrl->awupf) * bs; - } - - if (!ns->ctrl->subsys->atomic_bs) { - ns->ctrl->subsys->atomic_bs = atomic_bs; - } else if (ns->ctrl->subsys->atomic_bs != atomic_bs) { - dev_err_ratelimited(ns->ctrl->device, - "%s: Inconsistent Atomic Write Size, Namespace will not be added: Subsystem=%d bytes, Controller/Namespace=%d bytes\n", - ns->disk ? ns->disk->disk_name : "?", - ns->ctrl->subsys->atomic_bs, - atomic_bs); + atomic_bs = (1 + ns->ctrl->subsys->awupf) * bs; } lim->atomic_write_hw_max = atomic_bs; @@ -2386,16 +2376,6 @@ static int nvme_update_ns_info_block(struct nvme_ns *ns, if (!nvme_update_disk_info(ns, id, &lim)) capacity = 0; - /* - * Validate the max atomic write size fits within the subsystem's - * atomic write capabilities. - */ - if (lim.atomic_write_hw_max > ns->ctrl->subsys->atomic_bs) { - blk_mq_unfreeze_queue(ns->disk->queue, memflags); - ret = -ENXIO; - goto out; - } - nvme_config_discard(ns, &lim); if (IS_ENABLED(CONFIG_BLK_DEV_ZONED) && ns->head->ids.csi == NVME_CSI_ZNS) @@ -3219,6 +3199,7 @@ static int nvme_init_subsystem(struct nvme_ctrl *ctrl, struct nvme_id_ctrl *id) memcpy(subsys->model, id->mn, sizeof(subsys->model)); subsys->vendor_id = le16_to_cpu(id->vid); subsys->cmic = id->cmic; + subsys->awupf = le16_to_cpu(id->awupf); /* Versions prior to 1.4 don't necessarily report a valid type */ if (id->cntrltype == NVME_CTRL_DISC || @@ -3556,6 +3537,15 @@ static int nvme_init_identify(struct nvme_ctrl *ctrl) if (ret) goto out_free; } + + if (le16_to_cpu(id->awupf) != ctrl->subsys->awupf) { + dev_err_ratelimited(ctrl->device, + "inconsistent AWUPF, controller not added (%u/%u).\n", + le16_to_cpu(id->awupf), ctrl->subsys->awupf); + ret = -EINVAL; + goto out_free; + } + memcpy(ctrl->subsys->firmware_rev, id->fr, sizeof(ctrl->subsys->firmware_rev)); @@ -3651,7 +3641,6 @@ static int nvme_init_identify(struct nvme_ctrl *ctrl) dev_pm_qos_expose_latency_tolerance(ctrl->device); else if (!ctrl->apst_enabled && prev_apst_enabled) dev_pm_qos_hide_latency_tolerance(ctrl->device); - ctrl->awupf = le16_to_cpu(id->awupf); out_free: kfree(id); return ret; diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h index a468cdc5b5cb..7df2ea21851f 100644 --- a/drivers/nvme/host/nvme.h +++ b/drivers/nvme/host/nvme.h @@ -410,7 +410,6 @@ struct nvme_ctrl { enum nvme_ctrl_type cntrltype; enum nvme_dctype dctype; - u16 awupf; /* 0's based value. */ }; static inline enum nvme_ctrl_state nvme_ctrl_state(struct nvme_ctrl *ctrl) @@ -443,11 +442,11 @@ struct nvme_subsystem { u8 cmic; enum nvme_subsys_type subtype; u16 vendor_id; + u16 awupf; /* 0's based value. */ struct ida ns_ida; #ifdef CONFIG_NVME_MULTIPATH enum nvme_iopolicy iopolicy; #endif - u32 atomic_bs; }; /* From f0ef0b02af381418973a952867052b12194f9c16 Mon Sep 17 00:00:00 2001 From: Thomas Huth Date: Thu, 26 Jun 2025 20:07:10 +0800 Subject: [PATCH 216/289] LoongArch: Replace __ASSEMBLY__ with __ASSEMBLER__ in headers While the GCC and Clang compilers already define __ASSEMBLER__ automatically when compiling assembler code, __ASSEMBLY__ is a macro that only gets defined by the Makefiles in the kernel. This is bad since macros starting with two underscores are names that are reserved by the C language. It can also be very confusing for the developers when switching between userspace and kernelspace coding, or when dealing with uapi headers that rather should use __ASSEMBLER__ instead. So let's now standardize on the __ASSEMBLER__ macro that is provided by the compilers. This is almost a completely mechanical patch (done with a simple "sed -i" statement), with one comment tweaked manually in the arch/loongarch/include/asm/cpu.h file (it was missing the trailing underscores). Signed-off-by: Thomas Huth Signed-off-by: Huacai Chen --- arch/loongarch/include/asm/addrspace.h | 8 ++++---- arch/loongarch/include/asm/alternative-asm.h | 4 ++-- arch/loongarch/include/asm/alternative.h | 4 ++-- arch/loongarch/include/asm/asm-extable.h | 6 +++--- arch/loongarch/include/asm/asm.h | 8 ++++---- arch/loongarch/include/asm/cpu.h | 4 ++-- arch/loongarch/include/asm/ftrace.h | 4 ++-- arch/loongarch/include/asm/gpr-num.h | 6 +++--- arch/loongarch/include/asm/irqflags.h | 4 ++-- arch/loongarch/include/asm/jump_label.h | 4 ++-- arch/loongarch/include/asm/kasan.h | 2 +- arch/loongarch/include/asm/loongarch.h | 16 ++++++++-------- arch/loongarch/include/asm/orc_types.h | 4 ++-- arch/loongarch/include/asm/page.h | 4 ++-- arch/loongarch/include/asm/pgtable-bits.h | 4 ++-- arch/loongarch/include/asm/pgtable.h | 4 ++-- arch/loongarch/include/asm/prefetch.h | 2 +- arch/loongarch/include/asm/thread_info.h | 4 ++-- arch/loongarch/include/asm/types.h | 2 +- arch/loongarch/include/asm/unwind_hints.h | 6 +++--- arch/loongarch/include/asm/vdso/arch_data.h | 4 ++-- arch/loongarch/include/asm/vdso/getrandom.h | 4 ++-- arch/loongarch/include/asm/vdso/gettimeofday.h | 4 ++-- arch/loongarch/include/asm/vdso/processor.h | 4 ++-- arch/loongarch/include/asm/vdso/vdso.h | 4 ++-- arch/loongarch/include/asm/vdso/vsyscall.h | 4 ++-- tools/arch/loongarch/include/asm/orc_types.h | 4 ++-- 27 files changed, 64 insertions(+), 64 deletions(-) diff --git a/arch/loongarch/include/asm/addrspace.h b/arch/loongarch/include/asm/addrspace.h index fe198b473f84..e739dbc6329d 100644 --- a/arch/loongarch/include/asm/addrspace.h +++ b/arch/loongarch/include/asm/addrspace.h @@ -18,12 +18,12 @@ /* * This gives the physical RAM offset. */ -#ifndef __ASSEMBLY__ +#ifndef __ASSEMBLER__ #ifndef PHYS_OFFSET #define PHYS_OFFSET _UL(0) #endif extern unsigned long vm_map_base; -#endif /* __ASSEMBLY__ */ +#endif /* __ASSEMBLER__ */ #ifndef IO_BASE #define IO_BASE CSR_DMW0_BASE @@ -66,7 +66,7 @@ extern unsigned long vm_map_base; #define FIXADDR_TOP ((unsigned long)(long)(int)0xfffe0000) #endif -#ifdef __ASSEMBLY__ +#ifdef __ASSEMBLER__ #define _ATYPE_ #define _ATYPE32_ #define _ATYPE64_ @@ -85,7 +85,7 @@ extern unsigned long vm_map_base; /* * 32/64-bit LoongArch address spaces */ -#ifdef __ASSEMBLY__ +#ifdef __ASSEMBLER__ #define _ACAST32_ #define _ACAST64_ #else diff --git a/arch/loongarch/include/asm/alternative-asm.h b/arch/loongarch/include/asm/alternative-asm.h index ff3d10ac393f..7dc29bd9b2f0 100644 --- a/arch/loongarch/include/asm/alternative-asm.h +++ b/arch/loongarch/include/asm/alternative-asm.h @@ -2,7 +2,7 @@ #ifndef _ASM_ALTERNATIVE_ASM_H #define _ASM_ALTERNATIVE_ASM_H -#ifdef __ASSEMBLY__ +#ifdef __ASSEMBLER__ #include @@ -77,6 +77,6 @@ .previous .endm -#endif /* __ASSEMBLY__ */ +#endif /* __ASSEMBLER__ */ #endif /* _ASM_ALTERNATIVE_ASM_H */ diff --git a/arch/loongarch/include/asm/alternative.h b/arch/loongarch/include/asm/alternative.h index cee7b29785ab..b5bae21fb3c8 100644 --- a/arch/loongarch/include/asm/alternative.h +++ b/arch/loongarch/include/asm/alternative.h @@ -2,7 +2,7 @@ #ifndef _ASM_ALTERNATIVE_H #define _ASM_ALTERNATIVE_H -#ifndef __ASSEMBLY__ +#ifndef __ASSEMBLER__ #include #include @@ -106,6 +106,6 @@ extern void apply_alternatives(struct alt_instr *start, struct alt_instr *end); #define alternative_2(oldinstr, newinstr1, feature1, newinstr2, feature2) \ (asm volatile(ALTERNATIVE_2(oldinstr, newinstr1, feature1, newinstr2, feature2) ::: "memory")) -#endif /* __ASSEMBLY__ */ +#endif /* __ASSEMBLER__ */ #endif /* _ASM_ALTERNATIVE_H */ diff --git a/arch/loongarch/include/asm/asm-extable.h b/arch/loongarch/include/asm/asm-extable.h index df05005f2b80..d60bdf2e6377 100644 --- a/arch/loongarch/include/asm/asm-extable.h +++ b/arch/loongarch/include/asm/asm-extable.h @@ -7,7 +7,7 @@ #define EX_TYPE_UACCESS_ERR_ZERO 2 #define EX_TYPE_BPF 3 -#ifdef __ASSEMBLY__ +#ifdef __ASSEMBLER__ #define __ASM_EXTABLE_RAW(insn, fixup, type, data) \ .pushsection __ex_table, "a"; \ @@ -22,7 +22,7 @@ __ASM_EXTABLE_RAW(\insn, \fixup, EX_TYPE_FIXUP, 0) .endm -#else /* __ASSEMBLY__ */ +#else /* __ASSEMBLER__ */ #include #include @@ -60,6 +60,6 @@ #define _ASM_EXTABLE_UACCESS_ERR(insn, fixup, err) \ _ASM_EXTABLE_UACCESS_ERR_ZERO(insn, fixup, err, zero) -#endif /* __ASSEMBLY__ */ +#endif /* __ASSEMBLER__ */ #endif /* __ASM_ASM_EXTABLE_H */ diff --git a/arch/loongarch/include/asm/asm.h b/arch/loongarch/include/asm/asm.h index f591b3245def..f018d26fc995 100644 --- a/arch/loongarch/include/asm/asm.h +++ b/arch/loongarch/include/asm/asm.h @@ -110,7 +110,7 @@ #define LONG_SRA srai.w #define LONG_SRAV sra.w -#ifdef __ASSEMBLY__ +#ifdef __ASSEMBLER__ #define LONG .word #endif #define LONGSIZE 4 @@ -131,7 +131,7 @@ #define LONG_SRA srai.d #define LONG_SRAV sra.d -#ifdef __ASSEMBLY__ +#ifdef __ASSEMBLER__ #define LONG .dword #endif #define LONGSIZE 8 @@ -158,7 +158,7 @@ #define PTR_SCALESHIFT 2 -#ifdef __ASSEMBLY__ +#ifdef __ASSEMBLER__ #define PTR .word #endif #define PTRSIZE 4 @@ -181,7 +181,7 @@ #define PTR_SCALESHIFT 3 -#ifdef __ASSEMBLY__ +#ifdef __ASSEMBLER__ #define PTR .dword #endif #define PTRSIZE 8 diff --git a/arch/loongarch/include/asm/cpu.h b/arch/loongarch/include/asm/cpu.h index 98cf4d7b4b0a..dfb982fe8701 100644 --- a/arch/loongarch/include/asm/cpu.h +++ b/arch/loongarch/include/asm/cpu.h @@ -46,7 +46,7 @@ #define PRID_PRODUCT_MASK 0x0fff -#if !defined(__ASSEMBLY__) +#if !defined(__ASSEMBLER__) enum cpu_type_enum { CPU_UNKNOWN, @@ -55,7 +55,7 @@ enum cpu_type_enum { CPU_LAST }; -#endif /* !__ASSEMBLY */ +#endif /* !__ASSEMBLER__ */ /* * ISA Level encodings diff --git a/arch/loongarch/include/asm/ftrace.h b/arch/loongarch/include/asm/ftrace.h index 6e0a99763a9a..f4caaf764f9e 100644 --- a/arch/loongarch/include/asm/ftrace.h +++ b/arch/loongarch/include/asm/ftrace.h @@ -14,7 +14,7 @@ #define MCOUNT_INSN_SIZE 4 /* sizeof mcount call */ -#ifndef __ASSEMBLY__ +#ifndef __ASSEMBLER__ #ifndef CONFIG_DYNAMIC_FTRACE @@ -84,7 +84,7 @@ __arch_ftrace_set_direct_caller(struct pt_regs *regs, unsigned long addr) #endif -#endif /* __ASSEMBLY__ */ +#endif /* __ASSEMBLER__ */ #endif /* CONFIG_FUNCTION_TRACER */ diff --git a/arch/loongarch/include/asm/gpr-num.h b/arch/loongarch/include/asm/gpr-num.h index 996038da806d..af95b941f48b 100644 --- a/arch/loongarch/include/asm/gpr-num.h +++ b/arch/loongarch/include/asm/gpr-num.h @@ -2,7 +2,7 @@ #ifndef __ASM_GPR_NUM_H #define __ASM_GPR_NUM_H -#ifdef __ASSEMBLY__ +#ifdef __ASSEMBLER__ .equ .L__gpr_num_zero, 0 .irp num,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31 @@ -25,7 +25,7 @@ .equ .L__gpr_num_$s\num, 23 + \num .endr -#else /* __ASSEMBLY__ */ +#else /* __ASSEMBLER__ */ #define __DEFINE_ASM_GPR_NUMS \ " .equ .L__gpr_num_zero, 0\n" \ @@ -47,6 +47,6 @@ " .equ .L__gpr_num_$s\\num, 23 + \\num\n" \ " .endr\n" \ -#endif /* __ASSEMBLY__ */ +#endif /* __ASSEMBLER__ */ #endif /* __ASM_GPR_NUM_H */ diff --git a/arch/loongarch/include/asm/irqflags.h b/arch/loongarch/include/asm/irqflags.h index 003172b8406b..620163628a7f 100644 --- a/arch/loongarch/include/asm/irqflags.h +++ b/arch/loongarch/include/asm/irqflags.h @@ -5,7 +5,7 @@ #ifndef _ASM_IRQFLAGS_H #define _ASM_IRQFLAGS_H -#ifndef __ASSEMBLY__ +#ifndef __ASSEMBLER__ #include #include @@ -80,6 +80,6 @@ static inline int arch_irqs_disabled(void) return arch_irqs_disabled_flags(arch_local_save_flags()); } -#endif /* #ifndef __ASSEMBLY__ */ +#endif /* #ifndef __ASSEMBLER__ */ #endif /* _ASM_IRQFLAGS_H */ diff --git a/arch/loongarch/include/asm/jump_label.h b/arch/loongarch/include/asm/jump_label.h index 8a924bd69d19..4000c7603d8e 100644 --- a/arch/loongarch/include/asm/jump_label.h +++ b/arch/loongarch/include/asm/jump_label.h @@ -7,7 +7,7 @@ #ifndef __ASM_JUMP_LABEL_H #define __ASM_JUMP_LABEL_H -#ifndef __ASSEMBLY__ +#ifndef __ASSEMBLER__ #include @@ -50,5 +50,5 @@ l_yes: return true; } -#endif /* __ASSEMBLY__ */ +#endif /* __ASSEMBLER__ */ #endif /* __ASM_JUMP_LABEL_H */ diff --git a/arch/loongarch/include/asm/kasan.h b/arch/loongarch/include/asm/kasan.h index 7f52bd31b9d4..62f139a9c87d 100644 --- a/arch/loongarch/include/asm/kasan.h +++ b/arch/loongarch/include/asm/kasan.h @@ -2,7 +2,7 @@ #ifndef __ASM_KASAN_H #define __ASM_KASAN_H -#ifndef __ASSEMBLY__ +#ifndef __ASSEMBLER__ #include #include diff --git a/arch/loongarch/include/asm/loongarch.h b/arch/loongarch/include/asm/loongarch.h index d84dac88a584..a0994d226eff 100644 --- a/arch/loongarch/include/asm/loongarch.h +++ b/arch/loongarch/include/asm/loongarch.h @@ -9,15 +9,15 @@ #include #include -#ifndef __ASSEMBLY__ +#ifndef __ASSEMBLER__ #include /* CPUCFG */ #define read_cpucfg(reg) __cpucfg(reg) -#endif /* !__ASSEMBLY__ */ +#endif /* !__ASSEMBLER__ */ -#ifdef __ASSEMBLY__ +#ifdef __ASSEMBLER__ /* LoongArch Registers */ #define REG_ZERO 0x0 @@ -53,7 +53,7 @@ #define REG_S7 0x1e #define REG_S8 0x1f -#endif /* __ASSEMBLY__ */ +#endif /* __ASSEMBLER__ */ /* Bit fields for CPUCFG registers */ #define LOONGARCH_CPUCFG0 0x0 @@ -171,7 +171,7 @@ * SW emulation for KVM hypervirsor, see arch/loongarch/include/uapi/asm/kvm_para.h */ -#ifndef __ASSEMBLY__ +#ifndef __ASSEMBLER__ /* CSR */ #define csr_read32(reg) __csrrd_w(reg) @@ -187,7 +187,7 @@ #define iocsr_write32(val, reg) __iocsrwr_w(val, reg) #define iocsr_write64(val, reg) __iocsrwr_d(val, reg) -#endif /* !__ASSEMBLY__ */ +#endif /* !__ASSEMBLER__ */ /* CSR register number */ @@ -1195,7 +1195,7 @@ #define LOONGARCH_IOCSR_EXTIOI_ROUTE_BASE 0x1c00 #define IOCSR_EXTIOI_VECTOR_NUM 256 -#ifndef __ASSEMBLY__ +#ifndef __ASSEMBLER__ static __always_inline u64 drdtime(void) { @@ -1357,7 +1357,7 @@ __BUILD_CSR_OP(tlbidx) #define clear_csr_estat(val) \ csr_xchg32(~(val), val, LOONGARCH_CSR_ESTAT) -#endif /* __ASSEMBLY__ */ +#endif /* __ASSEMBLER__ */ /* Generic EntryLo bit definitions */ #define ENTRYLO_V (_ULCAST_(1) << 0) diff --git a/arch/loongarch/include/asm/orc_types.h b/arch/loongarch/include/asm/orc_types.h index caf1f71a1057..d5fa98d1d177 100644 --- a/arch/loongarch/include/asm/orc_types.h +++ b/arch/loongarch/include/asm/orc_types.h @@ -34,7 +34,7 @@ #define ORC_TYPE_REGS 3 #define ORC_TYPE_REGS_PARTIAL 4 -#ifndef __ASSEMBLY__ +#ifndef __ASSEMBLER__ /* * This struct is more or less a vastly simplified version of the DWARF Call * Frame Information standard. It contains only the necessary parts of DWARF @@ -53,6 +53,6 @@ struct orc_entry { unsigned int type:3; unsigned int signal:1; }; -#endif /* __ASSEMBLY__ */ +#endif /* __ASSEMBLER__ */ #endif /* _ORC_TYPES_H */ diff --git a/arch/loongarch/include/asm/page.h b/arch/loongarch/include/asm/page.h index 7368f12b7cb1..a3aaf34fba16 100644 --- a/arch/loongarch/include/asm/page.h +++ b/arch/loongarch/include/asm/page.h @@ -15,7 +15,7 @@ #define HPAGE_MASK (~(HPAGE_SIZE - 1)) #define HUGETLB_PAGE_ORDER (HPAGE_SHIFT - PAGE_SHIFT) -#ifndef __ASSEMBLY__ +#ifndef __ASSEMBLER__ #include #include @@ -110,6 +110,6 @@ extern int __virt_addr_valid(volatile void *kaddr); #include #include -#endif /* !__ASSEMBLY__ */ +#endif /* !__ASSEMBLER__ */ #endif /* _ASM_PAGE_H */ diff --git a/arch/loongarch/include/asm/pgtable-bits.h b/arch/loongarch/include/asm/pgtable-bits.h index 45bfc65a0c9f..7bbfb04a54cc 100644 --- a/arch/loongarch/include/asm/pgtable-bits.h +++ b/arch/loongarch/include/asm/pgtable-bits.h @@ -92,7 +92,7 @@ #define PAGE_KERNEL_WUC __pgprot(_PAGE_PRESENT | __READABLE | __WRITEABLE | \ _PAGE_GLOBAL | _PAGE_KERN | _CACHE_WUC) -#ifndef __ASSEMBLY__ +#ifndef __ASSEMBLER__ #define _PAGE_IOREMAP pgprot_val(PAGE_KERNEL_SUC) @@ -127,6 +127,6 @@ static inline pgprot_t pgprot_writecombine(pgprot_t _prot) return __pgprot(prot); } -#endif /* !__ASSEMBLY__ */ +#endif /* !__ASSEMBLER__ */ #endif /* _ASM_PGTABLE_BITS_H */ diff --git a/arch/loongarch/include/asm/pgtable.h b/arch/loongarch/include/asm/pgtable.h index b30185302c07..f2aeff544cee 100644 --- a/arch/loongarch/include/asm/pgtable.h +++ b/arch/loongarch/include/asm/pgtable.h @@ -55,7 +55,7 @@ #define USER_PTRS_PER_PGD ((TASK_SIZE64 / PGDIR_SIZE)?(TASK_SIZE64 / PGDIR_SIZE):1) -#ifndef __ASSEMBLY__ +#ifndef __ASSEMBLER__ #include #include @@ -618,6 +618,6 @@ static inline long pmd_protnone(pmd_t pmd) #define HAVE_ARCH_UNMAPPED_AREA #define HAVE_ARCH_UNMAPPED_AREA_TOPDOWN -#endif /* !__ASSEMBLY__ */ +#endif /* !__ASSEMBLER__ */ #endif /* _ASM_PGTABLE_H */ diff --git a/arch/loongarch/include/asm/prefetch.h b/arch/loongarch/include/asm/prefetch.h index 1672262a5e2e..0b168cdaae9a 100644 --- a/arch/loongarch/include/asm/prefetch.h +++ b/arch/loongarch/include/asm/prefetch.h @@ -8,7 +8,7 @@ #define Pref_Load 0 #define Pref_Store 8 -#ifdef __ASSEMBLY__ +#ifdef __ASSEMBLER__ .macro __pref hint addr #ifdef CONFIG_CPU_HAS_PREFETCH diff --git a/arch/loongarch/include/asm/thread_info.h b/arch/loongarch/include/asm/thread_info.h index 4f5a9441754e..9dfa2ef00816 100644 --- a/arch/loongarch/include/asm/thread_info.h +++ b/arch/loongarch/include/asm/thread_info.h @@ -10,7 +10,7 @@ #ifdef __KERNEL__ -#ifndef __ASSEMBLY__ +#ifndef __ASSEMBLER__ #include @@ -53,7 +53,7 @@ static inline struct thread_info *current_thread_info(void) register unsigned long current_stack_pointer __asm__("$sp"); -#endif /* !__ASSEMBLY__ */ +#endif /* !__ASSEMBLER__ */ /* thread information allocation */ #define THREAD_SIZE SZ_16K diff --git a/arch/loongarch/include/asm/types.h b/arch/loongarch/include/asm/types.h index baf15a0dcf8b..0edd731f3d6a 100644 --- a/arch/loongarch/include/asm/types.h +++ b/arch/loongarch/include/asm/types.h @@ -8,7 +8,7 @@ #include #include -#ifdef __ASSEMBLY__ +#ifdef __ASSEMBLER__ #define _ULCAST_ #define _U64CAST_ #else diff --git a/arch/loongarch/include/asm/unwind_hints.h b/arch/loongarch/include/asm/unwind_hints.h index 2c68bc72736c..16c7f7e465a0 100644 --- a/arch/loongarch/include/asm/unwind_hints.h +++ b/arch/loongarch/include/asm/unwind_hints.h @@ -5,7 +5,7 @@ #include #include -#ifdef __ASSEMBLY__ +#ifdef __ASSEMBLER__ .macro UNWIND_HINT_UNDEFINED UNWIND_HINT type=UNWIND_HINT_TYPE_UNDEFINED @@ -23,7 +23,7 @@ UNWIND_HINT sp_reg=ORC_REG_SP type=UNWIND_HINT_TYPE_CALL .endm -#else /* !__ASSEMBLY__ */ +#else /* !__ASSEMBLER__ */ #define UNWIND_HINT_SAVE \ UNWIND_HINT(UNWIND_HINT_TYPE_SAVE, 0, 0, 0) @@ -31,6 +31,6 @@ #define UNWIND_HINT_RESTORE \ UNWIND_HINT(UNWIND_HINT_TYPE_RESTORE, 0, 0, 0) -#endif /* !__ASSEMBLY__ */ +#endif /* !__ASSEMBLER__ */ #endif /* _ASM_LOONGARCH_UNWIND_HINTS_H */ diff --git a/arch/loongarch/include/asm/vdso/arch_data.h b/arch/loongarch/include/asm/vdso/arch_data.h index 322d0a5f1c84..395ec223bcbe 100644 --- a/arch/loongarch/include/asm/vdso/arch_data.h +++ b/arch/loongarch/include/asm/vdso/arch_data.h @@ -7,7 +7,7 @@ #ifndef _VDSO_ARCH_DATA_H #define _VDSO_ARCH_DATA_H -#ifndef __ASSEMBLY__ +#ifndef __ASSEMBLER__ #include #include @@ -20,6 +20,6 @@ struct vdso_arch_data { struct vdso_pcpu_data pdata[NR_CPUS]; }; -#endif /* __ASSEMBLY__ */ +#endif /* __ASSEMBLER__ */ #endif diff --git a/arch/loongarch/include/asm/vdso/getrandom.h b/arch/loongarch/include/asm/vdso/getrandom.h index a81724b69f29..2ff05003c6e7 100644 --- a/arch/loongarch/include/asm/vdso/getrandom.h +++ b/arch/loongarch/include/asm/vdso/getrandom.h @@ -5,7 +5,7 @@ #ifndef __ASM_VDSO_GETRANDOM_H #define __ASM_VDSO_GETRANDOM_H -#ifndef __ASSEMBLY__ +#ifndef __ASSEMBLER__ #include #include @@ -28,6 +28,6 @@ static __always_inline ssize_t getrandom_syscall(void *_buffer, size_t _len, uns return ret; } -#endif /* !__ASSEMBLY__ */ +#endif /* !__ASSEMBLER__ */ #endif /* __ASM_VDSO_GETRANDOM_H */ diff --git a/arch/loongarch/include/asm/vdso/gettimeofday.h b/arch/loongarch/include/asm/vdso/gettimeofday.h index f15503e3336c..dcafabca9bb6 100644 --- a/arch/loongarch/include/asm/vdso/gettimeofday.h +++ b/arch/loongarch/include/asm/vdso/gettimeofday.h @@ -7,7 +7,7 @@ #ifndef __ASM_VDSO_GETTIMEOFDAY_H #define __ASM_VDSO_GETTIMEOFDAY_H -#ifndef __ASSEMBLY__ +#ifndef __ASSEMBLER__ #include #include @@ -89,6 +89,6 @@ static inline bool loongarch_vdso_hres_capable(void) } #define __arch_vdso_hres_capable loongarch_vdso_hres_capable -#endif /* !__ASSEMBLY__ */ +#endif /* !__ASSEMBLER__ */ #endif /* __ASM_VDSO_GETTIMEOFDAY_H */ diff --git a/arch/loongarch/include/asm/vdso/processor.h b/arch/loongarch/include/asm/vdso/processor.h index ef5770b343a0..1e255373b0b8 100644 --- a/arch/loongarch/include/asm/vdso/processor.h +++ b/arch/loongarch/include/asm/vdso/processor.h @@ -5,10 +5,10 @@ #ifndef __ASM_VDSO_PROCESSOR_H #define __ASM_VDSO_PROCESSOR_H -#ifndef __ASSEMBLY__ +#ifndef __ASSEMBLER__ #define cpu_relax() barrier() -#endif /* __ASSEMBLY__ */ +#endif /* __ASSEMBLER__ */ #endif /* __ASM_VDSO_PROCESSOR_H */ diff --git a/arch/loongarch/include/asm/vdso/vdso.h b/arch/loongarch/include/asm/vdso/vdso.h index 50c65fb29daf..04bd2d452876 100644 --- a/arch/loongarch/include/asm/vdso/vdso.h +++ b/arch/loongarch/include/asm/vdso/vdso.h @@ -7,7 +7,7 @@ #ifndef _ASM_VDSO_VDSO_H #define _ASM_VDSO_VDSO_H -#ifndef __ASSEMBLY__ +#ifndef __ASSEMBLER__ #include #include @@ -16,6 +16,6 @@ #define VVAR_SIZE (VDSO_NR_PAGES << PAGE_SHIFT) -#endif /* __ASSEMBLY__ */ +#endif /* __ASSEMBLER__ */ #endif diff --git a/arch/loongarch/include/asm/vdso/vsyscall.h b/arch/loongarch/include/asm/vdso/vsyscall.h index 1140b54b4bc8..558eb9dfda52 100644 --- a/arch/loongarch/include/asm/vdso/vsyscall.h +++ b/arch/loongarch/include/asm/vdso/vsyscall.h @@ -2,13 +2,13 @@ #ifndef __ASM_VDSO_VSYSCALL_H #define __ASM_VDSO_VSYSCALL_H -#ifndef __ASSEMBLY__ +#ifndef __ASSEMBLER__ #include /* The asm-generic header needs to be included after the definitions above */ #include -#endif /* !__ASSEMBLY__ */ +#endif /* !__ASSEMBLER__ */ #endif /* __ASM_VDSO_VSYSCALL_H */ diff --git a/tools/arch/loongarch/include/asm/orc_types.h b/tools/arch/loongarch/include/asm/orc_types.h index caf1f71a1057..d5fa98d1d177 100644 --- a/tools/arch/loongarch/include/asm/orc_types.h +++ b/tools/arch/loongarch/include/asm/orc_types.h @@ -34,7 +34,7 @@ #define ORC_TYPE_REGS 3 #define ORC_TYPE_REGS_PARTIAL 4 -#ifndef __ASSEMBLY__ +#ifndef __ASSEMBLER__ /* * This struct is more or less a vastly simplified version of the DWARF Call * Frame Information standard. It contains only the necessary parts of DWARF @@ -53,6 +53,6 @@ struct orc_entry { unsigned int type:3; unsigned int signal:1; }; -#endif /* __ASSEMBLY__ */ +#endif /* __ASSEMBLER__ */ #endif /* _ORC_TYPES_H */ From 7d69294b8a8d0e19600797711ed3bc047fead1c1 Mon Sep 17 00:00:00 2001 From: Huacai Chen Date: Thu, 26 Jun 2025 20:07:18 +0800 Subject: [PATCH 217/289] LoongArch: Fix build warnings about export.h After commit a934a57a42f64a4 ("scripts/misc-check: check missing #include when W=1") and 7d95680d64ac8e836c ("scripts/misc-check: check unnecessary #include when W=1"), we get some build warnings with W=1: arch/loongarch/kernel/acpi.c: warning: EXPORT_SYMBOL() is used, but #include is missing arch/loongarch/kernel/alternative.c: warning: EXPORT_SYMBOL() is used, but #include is missing arch/loongarch/kernel/kfpu.c: warning: EXPORT_SYMBOL() is used, but #include is missing arch/loongarch/kernel/traps.c: warning: EXPORT_SYMBOL() is used, but #include is missing arch/loongarch/kernel/unwind_guess.c: warning: EXPORT_SYMBOL() is used, but #include is missing arch/loongarch/kernel/unwind_orc.c: warning: EXPORT_SYMBOL() is used, but #include is missing arch/loongarch/kernel/unwind_prologue.c: warning: EXPORT_SYMBOL() is used, but #include is missing arch/loongarch/lib/crc32-loongarch.c: warning: EXPORT_SYMBOL() is used, but #include is missing arch/loongarch/lib/csum.c: warning: EXPORT_SYMBOL() is used, but #include is missing arch/loongarch/kernel/elf.c: warning: EXPORT_SYMBOL() is not used, but #include is present arch/loongarch/kernel/paravirt.c: warning: EXPORT_SYMBOL() is not used, but #include is present arch/loongarch/pci/pci.c: warning: EXPORT_SYMBOL() is not used, but #include is present So fix these build warnings for LoongArch. Reviewed-by: Masahiro Yamada Signed-off-by: Huacai Chen --- arch/loongarch/kernel/acpi.c | 1 + arch/loongarch/kernel/alternative.c | 1 + arch/loongarch/kernel/elf.c | 1 - arch/loongarch/kernel/kfpu.c | 1 + arch/loongarch/kernel/paravirt.c | 1 - arch/loongarch/kernel/traps.c | 1 + arch/loongarch/kernel/unwind_guess.c | 1 + arch/loongarch/kernel/unwind_orc.c | 3 ++- arch/loongarch/kernel/unwind_prologue.c | 1 + arch/loongarch/lib/crc32-loongarch.c | 1 + arch/loongarch/lib/csum.c | 1 + arch/loongarch/pci/pci.c | 1 - 12 files changed, 10 insertions(+), 4 deletions(-) diff --git a/arch/loongarch/kernel/acpi.c b/arch/loongarch/kernel/acpi.c index a54cd6fd3796..1367ca759468 100644 --- a/arch/loongarch/kernel/acpi.c +++ b/arch/loongarch/kernel/acpi.c @@ -10,6 +10,7 @@ #include #include #include +#include #include #include #include diff --git a/arch/loongarch/kernel/alternative.c b/arch/loongarch/kernel/alternative.c index 4ad13847e962..0e0c766df1e3 100644 --- a/arch/loongarch/kernel/alternative.c +++ b/arch/loongarch/kernel/alternative.c @@ -1,4 +1,5 @@ // SPDX-License-Identifier: GPL-2.0-only +#include #include #include #include diff --git a/arch/loongarch/kernel/elf.c b/arch/loongarch/kernel/elf.c index 0fa81ced28dc..3d98c6aa00db 100644 --- a/arch/loongarch/kernel/elf.c +++ b/arch/loongarch/kernel/elf.c @@ -6,7 +6,6 @@ #include #include -#include #include #include diff --git a/arch/loongarch/kernel/kfpu.c b/arch/loongarch/kernel/kfpu.c index 4c476904227f..141b49bd989c 100644 --- a/arch/loongarch/kernel/kfpu.c +++ b/arch/loongarch/kernel/kfpu.c @@ -4,6 +4,7 @@ */ #include +#include #include #include #include diff --git a/arch/loongarch/kernel/paravirt.c b/arch/loongarch/kernel/paravirt.c index e5a39bbad078..b1b51f920b23 100644 --- a/arch/loongarch/kernel/paravirt.c +++ b/arch/loongarch/kernel/paravirt.c @@ -1,5 +1,4 @@ // SPDX-License-Identifier: GPL-2.0 -#include #include #include #include diff --git a/arch/loongarch/kernel/traps.c b/arch/loongarch/kernel/traps.c index 47fc2de6d150..3d9be6ca7ec5 100644 --- a/arch/loongarch/kernel/traps.c +++ b/arch/loongarch/kernel/traps.c @@ -13,6 +13,7 @@ #include #include #include +#include #include #include #include diff --git a/arch/loongarch/kernel/unwind_guess.c b/arch/loongarch/kernel/unwind_guess.c index 98379b7d4147..08d7951b2f60 100644 --- a/arch/loongarch/kernel/unwind_guess.c +++ b/arch/loongarch/kernel/unwind_guess.c @@ -3,6 +3,7 @@ * Copyright (C) 2022 Loongson Technology Corporation Limited */ #include +#include unsigned long unwind_get_return_address(struct unwind_state *state) { diff --git a/arch/loongarch/kernel/unwind_orc.c b/arch/loongarch/kernel/unwind_orc.c index d623935a7547..0005be49b056 100644 --- a/arch/loongarch/kernel/unwind_orc.c +++ b/arch/loongarch/kernel/unwind_orc.c @@ -1,6 +1,7 @@ // SPDX-License-Identifier: GPL-2.0-only -#include +#include #include +#include #include #include #include diff --git a/arch/loongarch/kernel/unwind_prologue.c b/arch/loongarch/kernel/unwind_prologue.c index 929ae240280a..729e775bd40d 100644 --- a/arch/loongarch/kernel/unwind_prologue.c +++ b/arch/loongarch/kernel/unwind_prologue.c @@ -3,6 +3,7 @@ * Copyright (C) 2022 Loongson Technology Corporation Limited */ #include +#include #include #include diff --git a/arch/loongarch/lib/crc32-loongarch.c b/arch/loongarch/lib/crc32-loongarch.c index b37cd8537b45..db22c2ec55e2 100644 --- a/arch/loongarch/lib/crc32-loongarch.c +++ b/arch/loongarch/lib/crc32-loongarch.c @@ -11,6 +11,7 @@ #include #include +#include #include #include diff --git a/arch/loongarch/lib/csum.c b/arch/loongarch/lib/csum.c index df309ae4045d..bcc9d01d8c41 100644 --- a/arch/loongarch/lib/csum.c +++ b/arch/loongarch/lib/csum.c @@ -2,6 +2,7 @@ // Copyright (C) 2019-2020 Arm Ltd. #include +#include #include #include diff --git a/arch/loongarch/pci/pci.c b/arch/loongarch/pci/pci.c index 2726639150bc..5bc9627a6cf9 100644 --- a/arch/loongarch/pci/pci.c +++ b/arch/loongarch/pci/pci.c @@ -3,7 +3,6 @@ * Copyright (C) 2020-2022 Loongson Technology Corporation Limited */ #include -#include #include #include #include From 39503fc84b4ea94f2bedca481de5e225e0df729d Mon Sep 17 00:00:00 2001 From: Ming Wang Date: Thu, 26 Jun 2025 20:07:18 +0800 Subject: [PATCH 218/289] LoongArch: Reserve the EFI memory map region The EFI memory map at 'boot_memmap' is crucial for kdump to understand the primary kernel's memory layout. This memory region, typically part of EFI Boot Services (BS) data, can be overwritten after ExitBootServices if not explicitly preserved by the kernel. This commit addresses this by: 1. Calling memblock_reserve() to reserve the entire physical region occupied by the EFI memory map (header + descriptors). This prevents the primary kernel from reallocating and corrupting this area. 2. Setting the EFI_PRESERVE_BS_REGIONS flag in efi.flags. This indicates that efforts have been made to preserve critical BS code/data regions which can be useful for other kernel subsystems or debugging. These changes ensure the original EFI memory map data remains intact, improving kdump reliability and potentially aiding other EFI-related functionalities that might rely on preserved BS code/data. Signed-off-by: Ming Wang Signed-off-by: Huacai Chen --- arch/loongarch/kernel/efi.c | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/arch/loongarch/kernel/efi.c b/arch/loongarch/kernel/efi.c index de21e72759ee..860a3bc030e0 100644 --- a/arch/loongarch/kernel/efi.c +++ b/arch/loongarch/kernel/efi.c @@ -144,6 +144,18 @@ void __init efi_init(void) if (efi_memmap_init_early(&data) < 0) panic("Unable to map EFI memory map.\n"); + /* + * Reserve the physical memory region occupied by the EFI + * memory map table (header + descriptors). This is crucial + * for kdump, as the kdump kernel relies on this original + * memmap passed by the bootloader. Without reservation, + * this region could be overwritten by the primary kernel. + * Also, set the EFI_PRESERVE_BS_REGIONS flag to indicate that + * critical boot services code/data regions like this are preserved. + */ + memblock_reserve((phys_addr_t)boot_memmap, sizeof(*tbl) + data.size); + set_bit(EFI_PRESERVE_BS_REGIONS, &efi.flags); + early_memunmap(tbl, sizeof(*tbl)); } From a0137c9048252ea965a0436895c77d1c917dfe2a Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Thu, 26 Jun 2025 20:07:18 +0800 Subject: [PATCH 219/289] LoongArch: Handle KCOV __init vs inline mismatches When the KCOV is enabled all functions get instrumented, unless the __no_sanitize_coverage attribute is used. To prepare for __no_sanitize_coverage being applied to __init functions, we have to handle differences in how GCC's inline optimizations get resolved. For LoongArch this exposed several places where __init annotations were missing but ended up being "accidentally correct". So fix these cases. Signed-off-by: Kees Cook Signed-off-by: Huacai Chen --- arch/loongarch/include/asm/smp.h | 2 +- arch/loongarch/kernel/time.c | 2 +- arch/loongarch/mm/ioremap.c | 4 ++-- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/arch/loongarch/include/asm/smp.h b/arch/loongarch/include/asm/smp.h index ad0bd234a0f1..3a47f52959a8 100644 --- a/arch/loongarch/include/asm/smp.h +++ b/arch/loongarch/include/asm/smp.h @@ -39,7 +39,7 @@ int loongson_cpu_disable(void); void loongson_cpu_die(unsigned int cpu); #endif -static inline void plat_smp_setup(void) +static inline void __init plat_smp_setup(void) { loongson_smp_setup(); } diff --git a/arch/loongarch/kernel/time.c b/arch/loongarch/kernel/time.c index bc75a3a69fc8..367906b10f81 100644 --- a/arch/loongarch/kernel/time.c +++ b/arch/loongarch/kernel/time.c @@ -102,7 +102,7 @@ static int constant_timer_next_event(unsigned long delta, struct clock_event_dev return 0; } -static unsigned long __init get_loops_per_jiffy(void) +static unsigned long get_loops_per_jiffy(void) { unsigned long lpj = (unsigned long)const_clock_freq; diff --git a/arch/loongarch/mm/ioremap.c b/arch/loongarch/mm/ioremap.c index 70ca73019811..df949a3d0f34 100644 --- a/arch/loongarch/mm/ioremap.c +++ b/arch/loongarch/mm/ioremap.c @@ -16,12 +16,12 @@ void __init early_iounmap(void __iomem *addr, unsigned long size) } -void *early_memremap_ro(resource_size_t phys_addr, unsigned long size) +void * __init early_memremap_ro(resource_size_t phys_addr, unsigned long size) { return early_memremap(phys_addr, size); } -void *early_memremap_prot(resource_size_t phys_addr, unsigned long size, +void * __init early_memremap_prot(resource_size_t phys_addr, unsigned long size, unsigned long prot_val) { return early_memremap(phys_addr, size); From 080e8d2ecdfde588897aa8a87a8884061f4dbbbb Mon Sep 17 00:00:00 2001 From: Bibo Mao Date: Thu, 26 Jun 2025 20:07:27 +0800 Subject: [PATCH 220/289] LoongArch: KVM: Avoid overflow with array index The variable index is modified and reused as array index when modify register EIOINTC_ENABLE. There will be array index overflow problem. Cc: stable@vger.kernel.org Fixes: 3956a52bc05b ("LoongArch: KVM: Add EIOINTC read and write functions") Signed-off-by: Bibo Mao Signed-off-by: Huacai Chen --- arch/loongarch/kvm/intc/eiointc.c | 17 +++++++---------- 1 file changed, 7 insertions(+), 10 deletions(-) diff --git a/arch/loongarch/kvm/intc/eiointc.c b/arch/loongarch/kvm/intc/eiointc.c index f39929d7bf8a..9c47456b805c 100644 --- a/arch/loongarch/kvm/intc/eiointc.c +++ b/arch/loongarch/kvm/intc/eiointc.c @@ -436,17 +436,16 @@ static int loongarch_eiointc_writew(struct kvm_vcpu *vcpu, break; case EIOINTC_ENABLE_START ... EIOINTC_ENABLE_END: index = (offset - EIOINTC_ENABLE_START) >> 1; - old_data = s->enable.reg_u32[index]; + old_data = s->enable.reg_u16[index]; s->enable.reg_u16[index] = data; /* * 1: enable irq. * update irq when isr is set. */ data = s->enable.reg_u16[index] & ~old_data & s->isr.reg_u16[index]; - index = index << 1; for (i = 0; i < sizeof(data); i++) { u8 mask = (data >> (i * 8)) & 0xff; - eiointc_enable_irq(vcpu, s, index + i, mask, 1); + eiointc_enable_irq(vcpu, s, index * 2 + i, mask, 1); } /* * 0: disable irq. @@ -455,7 +454,7 @@ static int loongarch_eiointc_writew(struct kvm_vcpu *vcpu, data = ~s->enable.reg_u16[index] & old_data & s->isr.reg_u16[index]; for (i = 0; i < sizeof(data); i++) { u8 mask = (data >> (i * 8)) & 0xff; - eiointc_enable_irq(vcpu, s, index, mask, 0); + eiointc_enable_irq(vcpu, s, index * 2 + i, mask, 0); } break; case EIOINTC_BOUNCE_START ... EIOINTC_BOUNCE_END: @@ -529,10 +528,9 @@ static int loongarch_eiointc_writel(struct kvm_vcpu *vcpu, * update irq when isr is set. */ data = s->enable.reg_u32[index] & ~old_data & s->isr.reg_u32[index]; - index = index << 2; for (i = 0; i < sizeof(data); i++) { u8 mask = (data >> (i * 8)) & 0xff; - eiointc_enable_irq(vcpu, s, index + i, mask, 1); + eiointc_enable_irq(vcpu, s, index * 4 + i, mask, 1); } /* * 0: disable irq. @@ -541,7 +539,7 @@ static int loongarch_eiointc_writel(struct kvm_vcpu *vcpu, data = ~s->enable.reg_u32[index] & old_data & s->isr.reg_u32[index]; for (i = 0; i < sizeof(data); i++) { u8 mask = (data >> (i * 8)) & 0xff; - eiointc_enable_irq(vcpu, s, index, mask, 0); + eiointc_enable_irq(vcpu, s, index * 4 + i, mask, 0); } break; case EIOINTC_BOUNCE_START ... EIOINTC_BOUNCE_END: @@ -615,10 +613,9 @@ static int loongarch_eiointc_writeq(struct kvm_vcpu *vcpu, * update irq when isr is set. */ data = s->enable.reg_u64[index] & ~old_data & s->isr.reg_u64[index]; - index = index << 3; for (i = 0; i < sizeof(data); i++) { u8 mask = (data >> (i * 8)) & 0xff; - eiointc_enable_irq(vcpu, s, index + i, mask, 1); + eiointc_enable_irq(vcpu, s, index * 8 + i, mask, 1); } /* * 0: disable irq. @@ -627,7 +624,7 @@ static int loongarch_eiointc_writeq(struct kvm_vcpu *vcpu, data = ~s->enable.reg_u64[index] & old_data & s->isr.reg_u64[index]; for (i = 0; i < sizeof(data); i++) { u8 mask = (data >> (i * 8)) & 0xff; - eiointc_enable_irq(vcpu, s, index, mask, 0); + eiointc_enable_irq(vcpu, s, index * 8 + i, mask, 0); } break; case EIOINTC_BOUNCE_START ... EIOINTC_BOUNCE_END: From a4b1b51ae132ac199412028a2df7b6c267888190 Mon Sep 17 00:00:00 2001 From: Maarten Lankhorst Date: Fri, 6 Jun 2025 11:45:47 +0100 Subject: [PATCH 221/289] drm/xe: Move DSB l2 flush to a more sensible place MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Flushing l2 is only needed after all data has been written. Fixes: 01570b446939 ("drm/xe/bmg: implement Wa_16023588340") Signed-off-by: Maarten Lankhorst Cc: Matthew Auld Cc: stable@vger.kernel.org # v6.12+ Reviewed-by: Matthew Auld Signed-off-by: Matthew Auld Reviewed-by: Lucas De Marchi Reviewed-by: Ville Syrjälä Link: https://lore.kernel.org/r/20250606104546.1996818-3-matthew.auld@intel.com Signed-off-by: Lucas De Marchi (cherry picked from commit 0dd2dd0182bc444a62652e89d08c7f0e4fde15ba) Signed-off-by: Thomas Hellström --- drivers/gpu/drm/xe/display/xe_dsb_buffer.c | 11 ++++------- 1 file changed, 4 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/xe/display/xe_dsb_buffer.c b/drivers/gpu/drm/xe/display/xe_dsb_buffer.c index f95375451e2f..9f941fc2e36b 100644 --- a/drivers/gpu/drm/xe/display/xe_dsb_buffer.c +++ b/drivers/gpu/drm/xe/display/xe_dsb_buffer.c @@ -17,10 +17,7 @@ u32 intel_dsb_buffer_ggtt_offset(struct intel_dsb_buffer *dsb_buf) void intel_dsb_buffer_write(struct intel_dsb_buffer *dsb_buf, u32 idx, u32 val) { - struct xe_device *xe = dsb_buf->vma->bo->tile->xe; - iosys_map_wr(&dsb_buf->vma->bo->vmap, idx * 4, u32, val); - xe_device_l2_flush(xe); } u32 intel_dsb_buffer_read(struct intel_dsb_buffer *dsb_buf, u32 idx) @@ -30,12 +27,9 @@ u32 intel_dsb_buffer_read(struct intel_dsb_buffer *dsb_buf, u32 idx) void intel_dsb_buffer_memset(struct intel_dsb_buffer *dsb_buf, u32 idx, u32 val, size_t size) { - struct xe_device *xe = dsb_buf->vma->bo->tile->xe; - WARN_ON(idx > (dsb_buf->buf_size - size) / sizeof(*dsb_buf->cmd_buf)); iosys_map_memset(&dsb_buf->vma->bo->vmap, idx * 4, val, size); - xe_device_l2_flush(xe); } bool intel_dsb_buffer_create(struct intel_crtc *crtc, struct intel_dsb_buffer *dsb_buf, size_t size) @@ -74,9 +68,12 @@ void intel_dsb_buffer_cleanup(struct intel_dsb_buffer *dsb_buf) void intel_dsb_buffer_flush_map(struct intel_dsb_buffer *dsb_buf) { + struct xe_device *xe = dsb_buf->vma->bo->tile->xe; + /* * The memory barrier here is to ensure coherency of DSB vs MMIO, * both for weak ordering archs and discrete cards. */ - xe_device_wmb(dsb_buf->vma->bo->tile->xe); + xe_device_wmb(xe); + xe_device_l2_flush(xe); } From f16873f42a06b620669d48a4b5c3f888cb3653a1 Mon Sep 17 00:00:00 2001 From: Matthew Auld Date: Fri, 6 Jun 2025 11:45:48 +0100 Subject: [PATCH 222/289] drm/xe: move DPT l2 flush to a more sensible place MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Only need the flush for DPT host updates here. Normal GGTT updates don't need special flush. Fixes: 01570b446939 ("drm/xe/bmg: implement Wa_16023588340") Signed-off-by: Matthew Auld Cc: Maarten Lankhorst Cc: stable@vger.kernel.org # v6.12+ Reviewed-by: Ville Syrjälä Reviewed-by: Lucas De Marchi Link: https://lore.kernel.org/r/20250606104546.1996818-4-matthew.auld@intel.com Signed-off-by: Lucas De Marchi (cherry picked from commit 35db1da40c8cfd7511dc42f342a133601eb45449) Signed-off-by: Thomas Hellström --- drivers/gpu/drm/xe/display/xe_fb_pin.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/xe/display/xe_fb_pin.c b/drivers/gpu/drm/xe/display/xe_fb_pin.c index d918ae1c8061..55259969480b 100644 --- a/drivers/gpu/drm/xe/display/xe_fb_pin.c +++ b/drivers/gpu/drm/xe/display/xe_fb_pin.c @@ -164,6 +164,9 @@ static int __xe_pin_fb_vma_dpt(const struct intel_framebuffer *fb, vma->dpt = dpt; vma->node = dpt->ggtt_node[tile0->id]; + + /* Ensure DPT writes are flushed */ + xe_device_l2_flush(xe); return 0; } @@ -333,8 +336,6 @@ static struct i915_vma *__xe_pin_fb_vma(const struct intel_framebuffer *fb, if (ret) goto err_unpin; - /* Ensure DPT writes are flushed */ - xe_device_l2_flush(xe); return vma; err_unpin: From ad40098da5c3b43114d860a5b5740e7204158534 Mon Sep 17 00:00:00 2001 From: Michal Wajdeczko Date: Fri, 13 Jun 2025 00:09:37 +0200 Subject: [PATCH 223/289] drm/xe/guc: Explicitly exit CT safe mode on unwind MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit During driver probe we might be briefly using CT safe mode, which is based on a delayed work, but usually we are able to stop this once we have IRQ fully operational. However, if we abort the probe quite early then during unwind we might try to destroy the workqueue while there is still a pending delayed work that attempts to restart itself which triggers a WARN. This was recently observed during unsuccessful VF initialization: [ ] xe 0000:00:02.1: probe with driver xe failed with error -62 [ ] ------------[ cut here ]------------ [ ] workqueue: cannot queue safe_mode_worker_func [xe] on wq xe-g2h-wq [ ] WARNING: CPU: 9 PID: 0 at kernel/workqueue.c:2257 __queue_work+0x287/0x710 [ ] RIP: 0010:__queue_work+0x287/0x710 [ ] Call Trace: [ ] delayed_work_timer_fn+0x19/0x30 [ ] call_timer_fn+0xa1/0x2a0 Exit the CT safe mode on unwind to avoid that warning. Fixes: 09b286950f29 ("drm/xe/guc: Allow CTB G2H processing without G2H IRQ") Signed-off-by: Michal Wajdeczko Cc: Matthew Brost Reviewed-by: Matthew Brost Link: https://lore.kernel.org/r/20250612220937.857-3-michal.wajdeczko@intel.com (cherry picked from commit 2ddbb73ec20b98e70a5200cb85deade22ccea2ec) Signed-off-by: Thomas Hellström --- drivers/gpu/drm/xe/xe_guc_ct.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c index d0ac48d8f4f7..bbcbb348256f 100644 --- a/drivers/gpu/drm/xe/xe_guc_ct.c +++ b/drivers/gpu/drm/xe/xe_guc_ct.c @@ -34,6 +34,11 @@ #include "xe_pm.h" #include "xe_trace_guc.h" +static void receive_g2h(struct xe_guc_ct *ct); +static void g2h_worker_func(struct work_struct *w); +static void safe_mode_worker_func(struct work_struct *w); +static void ct_exit_safe_mode(struct xe_guc_ct *ct); + #if IS_ENABLED(CONFIG_DRM_XE_DEBUG) enum { /* Internal states, not error conditions */ @@ -186,14 +191,11 @@ static void guc_ct_fini(struct drm_device *drm, void *arg) { struct xe_guc_ct *ct = arg; + ct_exit_safe_mode(ct); destroy_workqueue(ct->g2h_wq); xa_destroy(&ct->fence_lookup); } -static void receive_g2h(struct xe_guc_ct *ct); -static void g2h_worker_func(struct work_struct *w); -static void safe_mode_worker_func(struct work_struct *w); - static void primelockdep(struct xe_guc_ct *ct) { if (!IS_ENABLED(CONFIG_LOCKDEP)) From af2b588abe006bd55ddd358c4c3b87523349c475 Mon Sep 17 00:00:00 2001 From: Michal Wajdeczko Date: Fri, 13 Jun 2025 00:09:36 +0200 Subject: [PATCH 224/289] drm/xe: Process deferred GGTT node removals on device unwind MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit While we are indirectly draining our dedicated workqueue ggtt->wq that we use to complete asynchronous removal of some GGTT nodes, this happends as part of the managed-drm unwinding (ggtt_fini_early), which could be later then manage-device unwinding, where we could already unmap our MMIO/GMS mapping (mmio_fini). This was recently observed during unsuccessful VF initialization: [ ] xe 0000:00:02.1: probe with driver xe failed with error -62 [ ] xe 0000:00:02.1: DEVRES REL ffff88811e747340 __xe_bo_unpin_map_no_vm (16 bytes) [ ] xe 0000:00:02.1: DEVRES REL ffff88811e747540 __xe_bo_unpin_map_no_vm (16 bytes) [ ] xe 0000:00:02.1: DEVRES REL ffff88811e747240 __xe_bo_unpin_map_no_vm (16 bytes) [ ] xe 0000:00:02.1: DEVRES REL ffff88811e747040 tiles_fini (16 bytes) [ ] xe 0000:00:02.1: DEVRES REL ffff88811e746840 mmio_fini (16 bytes) [ ] xe 0000:00:02.1: DEVRES REL ffff88811e747f40 xe_bo_pinned_fini (16 bytes) [ ] xe 0000:00:02.1: DEVRES REL ffff88811e746b40 devm_drm_dev_init_release (16 bytes) [ ] xe 0000:00:02.1: [drm:drm_managed_release] drmres release begin [ ] xe 0000:00:02.1: [drm:drm_managed_release] REL ffff88810ef81640 __fini_relay (8 bytes) [ ] xe 0000:00:02.1: [drm:drm_managed_release] REL ffff88810ef80d40 guc_ct_fini (8 bytes) [ ] xe 0000:00:02.1: [drm:drm_managed_release] REL ffff88810ef80040 __drmm_mutex_release (8 bytes) [ ] xe 0000:00:02.1: [drm:drm_managed_release] REL ffff88810ef80140 ggtt_fini_early (8 bytes) and this was leading to: [ ] BUG: unable to handle page fault for address: ffffc900058162a0 [ ] #PF: supervisor write access in kernel mode [ ] #PF: error_code(0x0002) - not-present page [ ] Oops: Oops: 0002 [#1] SMP NOPTI [ ] Tainted: [W]=WARN [ ] Workqueue: xe-ggtt-wq ggtt_node_remove_work_func [xe] [ ] RIP: 0010:xe_ggtt_set_pte+0x6d/0x350 [xe] [ ] Call Trace: [ ] [ ] xe_ggtt_clear+0xb0/0x270 [xe] [ ] ggtt_node_remove+0xbb/0x120 [xe] [ ] ggtt_node_remove_work_func+0x30/0x50 [xe] [ ] process_one_work+0x22b/0x6f0 [ ] worker_thread+0x1e8/0x3d Add managed-device action that will explicitly drain the workqueue with all pending node removals prior to releasing MMIO/GSM mapping. Fixes: 919bb54e989c ("drm/xe: Fix missing runtime outer protection for ggtt_remove_node") Signed-off-by: Michal Wajdeczko Cc: Rodrigo Vivi Cc: Lucas De Marchi Reviewed-by: Rodrigo Vivi Link: https://lore.kernel.org/r/20250612220937.857-2-michal.wajdeczko@intel.com (cherry picked from commit 89d2835c3680ab1938e22ad81b1c9f8c686bd391) Signed-off-by: Thomas Hellström --- drivers/gpu/drm/xe/xe_ggtt.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/drivers/gpu/drm/xe/xe_ggtt.c b/drivers/gpu/drm/xe/xe_ggtt.c index 7062115909f2..2c799958c1e4 100644 --- a/drivers/gpu/drm/xe/xe_ggtt.c +++ b/drivers/gpu/drm/xe/xe_ggtt.c @@ -201,6 +201,13 @@ static const struct xe_ggtt_pt_ops xelpg_pt_wa_ops = { .ggtt_set_pte = xe_ggtt_set_pte_and_flush, }; +static void dev_fini_ggtt(void *arg) +{ + struct xe_ggtt *ggtt = arg; + + drain_workqueue(ggtt->wq); +} + /** * xe_ggtt_init_early - Early GGTT initialization * @ggtt: the &xe_ggtt to be initialized @@ -257,6 +264,10 @@ int xe_ggtt_init_early(struct xe_ggtt *ggtt) if (err) return err; + err = devm_add_action_or_reset(xe->drm.dev, dev_fini_ggtt, ggtt); + if (err) + return err; + if (IS_SRIOV_VF(xe)) { err = xe_gt_sriov_vf_prepare_ggtt(xe_tile_get_gt(ggtt->tile, 0)); if (err) From 969127bf0783a4ac0c8a27e633a9e8ea1738583f Mon Sep 17 00:00:00 2001 From: Ronnie Sahlberg Date: Thu, 26 Jun 2025 12:20:45 +1000 Subject: [PATCH 225/289] ublk: sanity check add_dev input for underflow Add additional checks that queue depth and number of queues are non-zero. Signed-off-by: Ronnie Sahlberg Reviewed-by: Ming Lei Link: https://lore.kernel.org/r/20250626022046.235018-1-ronniesahlberg@gmail.com Signed-off-by: Jens Axboe --- drivers/block/ublk_drv.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c index d441d3259edb..c3e3c3b65a6d 100644 --- a/drivers/block/ublk_drv.c +++ b/drivers/block/ublk_drv.c @@ -2849,7 +2849,8 @@ static int ublk_ctrl_add_dev(const struct ublksrv_ctrl_cmd *header) if (copy_from_user(&info, argp, sizeof(info))) return -EFAULT; - if (info.queue_depth > UBLK_MAX_QUEUE_DEPTH || info.nr_hw_queues > UBLK_MAX_NR_QUEUES) + if (info.queue_depth > UBLK_MAX_QUEUE_DEPTH || !info.queue_depth || + info.nr_hw_queues > UBLK_MAX_NR_QUEUES || !info.nr_hw_queues) return -EINVAL; if (capable(CAP_SYS_ADMIN)) From c007062188d8e402c294117db53a24b2bed2b83f Mon Sep 17 00:00:00 2001 From: Yu Kuai Date: Thu, 26 Jun 2025 19:57:43 +0800 Subject: [PATCH 226/289] block: fix false warning in bdev_count_inflight_rw() While bdev_count_inflight is interating all cpus, if some IOs are issued from traversed cpu and then completed from the cpu that is not traversed yet: cpu0 cpu1 bdev_count_inflight //for_each_possible_cpu // cpu0 is 0 infliht += 0 // issue a io blk_account_io_start // cpu0 inflight ++ cpu2 // the io is done blk_account_io_done // cpu2 inflight -- // cpu 1 is 0 inflight += 0 // cpu2 is -1 inflight += -1 ... In this case, the total inflight will be -1, causing lots of false warning. Fix the problem by removing the warning. Noted there is still a valid warning for nvme-mpath(From Yi) that is not fixed yet. Fixes: f5482ee5edb9 ("block: WARN if bdev inflight counter is negative") Reported-by: Yi Zhang Closes: https://lore.kernel.org/linux-block/aFtUXy-lct0WxY2w@mozart.vkv.me/T/#mae89155a5006463d0a21a4a2c35ae0034b26a339 Reported-and-tested-by: Calvin Owens Closes: https://lore.kernel.org/linux-block/aFtUXy-lct0WxY2w@mozart.vkv.me/T/#m1d935a00070bf95055d0ac84e6075158b08acaef Reported-by: Dave Chinner Closes: https://lore.kernel.org/linux-block/aFuypjqCXo9-5_En@dread.disaster.area/ Signed-off-by: Yu Kuai Link: https://lore.kernel.org/r/20250626115743.1641443-1-yukuai3@huawei.com Reviewed-by: Christoph Hellwig Signed-off-by: Jens Axboe --- block/genhd.c | 26 +++++++++++++++----------- 1 file changed, 15 insertions(+), 11 deletions(-) diff --git a/block/genhd.c b/block/genhd.c index 8171a6bc3210..c26733f6324b 100644 --- a/block/genhd.c +++ b/block/genhd.c @@ -128,23 +128,27 @@ static void part_stat_read_all(struct block_device *part, static void bdev_count_inflight_rw(struct block_device *part, unsigned int inflight[2], bool mq_driver) { + int write = 0; + int read = 0; int cpu; if (mq_driver) { blk_mq_in_driver_rw(part, inflight); - } else { - for_each_possible_cpu(cpu) { - inflight[READ] += part_stat_local_read_cpu( - part, in_flight[READ], cpu); - inflight[WRITE] += part_stat_local_read_cpu( - part, in_flight[WRITE], cpu); - } + return; } - if (WARN_ON_ONCE((int)inflight[READ] < 0)) - inflight[READ] = 0; - if (WARN_ON_ONCE((int)inflight[WRITE] < 0)) - inflight[WRITE] = 0; + for_each_possible_cpu(cpu) { + read += part_stat_local_read_cpu(part, in_flight[READ], cpu); + write += part_stat_local_read_cpu(part, in_flight[WRITE], cpu); + } + + /* + * While iterating all CPUs, some IOs may be issued from a CPU already + * traversed and complete on a CPU that has not yet been traversed, + * causing the inflight number to be negative. + */ + inflight[READ] = read > 0 ? read : 0; + inflight[WRITE] = write > 0 ? write : 0; } /** From 711741f94ac3cf9f4e3aa73aa171e76d188c0819 Mon Sep 17 00:00:00 2001 From: Paulo Alcantara Date: Wed, 25 Jun 2025 12:22:38 -0300 Subject: [PATCH 227/289] smb: client: fix potential deadlock when reconnecting channels Fix cifs_signal_cifsd_for_reconnect() to take the correct lock order and prevent the following deadlock from happening ====================================================== WARNING: possible circular locking dependency detected 6.16.0-rc3-build2+ #1301 Tainted: G S W ------------------------------------------------------ cifsd/6055 is trying to acquire lock: ffff88810ad56038 (&tcp_ses->srv_lock){+.+.}-{3:3}, at: cifs_signal_cifsd_for_reconnect+0x134/0x200 but task is already holding lock: ffff888119c64330 (&ret_buf->chan_lock){+.+.}-{3:3}, at: cifs_signal_cifsd_for_reconnect+0xcf/0x200 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (&ret_buf->chan_lock){+.+.}-{3:3}: validate_chain+0x1cf/0x270 __lock_acquire+0x60e/0x780 lock_acquire.part.0+0xb4/0x1f0 _raw_spin_lock+0x2f/0x40 cifs_setup_session+0x81/0x4b0 cifs_get_smb_ses+0x771/0x900 cifs_mount_get_session+0x7e/0x170 cifs_mount+0x92/0x2d0 cifs_smb3_do_mount+0x161/0x460 smb3_get_tree+0x55/0x90 vfs_get_tree+0x46/0x180 do_new_mount+0x1b0/0x2e0 path_mount+0x6ee/0x740 do_mount+0x98/0xe0 __do_sys_mount+0x148/0x180 do_syscall_64+0xa4/0x260 entry_SYSCALL_64_after_hwframe+0x76/0x7e -> #1 (&ret_buf->ses_lock){+.+.}-{3:3}: validate_chain+0x1cf/0x270 __lock_acquire+0x60e/0x780 lock_acquire.part.0+0xb4/0x1f0 _raw_spin_lock+0x2f/0x40 cifs_match_super+0x101/0x320 sget+0xab/0x270 cifs_smb3_do_mount+0x1e0/0x460 smb3_get_tree+0x55/0x90 vfs_get_tree+0x46/0x180 do_new_mount+0x1b0/0x2e0 path_mount+0x6ee/0x740 do_mount+0x98/0xe0 __do_sys_mount+0x148/0x180 do_syscall_64+0xa4/0x260 entry_SYSCALL_64_after_hwframe+0x76/0x7e -> #0 (&tcp_ses->srv_lock){+.+.}-{3:3}: check_noncircular+0x95/0xc0 check_prev_add+0x115/0x2f0 validate_chain+0x1cf/0x270 __lock_acquire+0x60e/0x780 lock_acquire.part.0+0xb4/0x1f0 _raw_spin_lock+0x2f/0x40 cifs_signal_cifsd_for_reconnect+0x134/0x200 __cifs_reconnect+0x8f/0x500 cifs_handle_standard+0x112/0x280 cifs_demultiplex_thread+0x64d/0xbc0 kthread+0x2f7/0x310 ret_from_fork+0x2a/0x230 ret_from_fork_asm+0x1a/0x30 other info that might help us debug this: Chain exists of: &tcp_ses->srv_lock --> &ret_buf->ses_lock --> &ret_buf->chan_lock Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&ret_buf->chan_lock); lock(&ret_buf->ses_lock); lock(&ret_buf->chan_lock); lock(&tcp_ses->srv_lock); *** DEADLOCK *** 3 locks held by cifsd/6055: #0: ffffffff857de398 (&cifs_tcp_ses_lock){+.+.}-{3:3}, at: cifs_signal_cifsd_for_reconnect+0x7b/0x200 #1: ffff888119c64060 (&ret_buf->ses_lock){+.+.}-{3:3}, at: cifs_signal_cifsd_for_reconnect+0x9c/0x200 #2: ffff888119c64330 (&ret_buf->chan_lock){+.+.}-{3:3}, at: cifs_signal_cifsd_for_reconnect+0xcf/0x200 Cc: linux-cifs@vger.kernel.org Reported-by: David Howells Fixes: d7d7a66aacd6 ("cifs: avoid use of global locks for high contention data") Reviewed-by: David Howells Tested-by: David Howells Signed-off-by: Paulo Alcantara (Red Hat) Signed-off-by: David Howells Signed-off-by: Steve French --- fs/smb/client/cifsglob.h | 1 + fs/smb/client/connect.c | 58 +++++++++++++++++++++++++--------------- 2 files changed, 37 insertions(+), 22 deletions(-) diff --git a/fs/smb/client/cifsglob.h b/fs/smb/client/cifsglob.h index 45e94e18f4d5..318a8405d475 100644 --- a/fs/smb/client/cifsglob.h +++ b/fs/smb/client/cifsglob.h @@ -709,6 +709,7 @@ inc_rfc1001_len(void *buf, int count) struct TCP_Server_Info { struct list_head tcp_ses_list; struct list_head smb_ses_list; + struct list_head rlist; /* reconnect list */ spinlock_t srv_lock; /* protect anything here that is not protected */ __u64 conn_id; /* connection identifier (useful for debugging) */ int srv_count; /* reference counter */ diff --git a/fs/smb/client/connect.c b/fs/smb/client/connect.c index c48869c29e15..685c65dcb8c4 100644 --- a/fs/smb/client/connect.c +++ b/fs/smb/client/connect.c @@ -124,6 +124,14 @@ static void smb2_query_server_interfaces(struct work_struct *work) (SMB_INTERFACE_POLL_INTERVAL * HZ)); } +#define set_need_reco(server) \ +do { \ + spin_lock(&server->srv_lock); \ + if (server->tcpStatus != CifsExiting) \ + server->tcpStatus = CifsNeedReconnect; \ + spin_unlock(&server->srv_lock); \ +} while (0) + /* * Update the tcpStatus for the server. * This is used to signal the cifsd thread to call cifs_reconnect @@ -137,39 +145,45 @@ void cifs_signal_cifsd_for_reconnect(struct TCP_Server_Info *server, bool all_channels) { - struct TCP_Server_Info *pserver; + struct TCP_Server_Info *nserver; struct cifs_ses *ses; + LIST_HEAD(reco); int i; - /* If server is a channel, select the primary channel */ - pserver = SERVER_IS_CHAN(server) ? server->primary_server : server; - /* if we need to signal just this channel */ if (!all_channels) { - spin_lock(&server->srv_lock); - if (server->tcpStatus != CifsExiting) - server->tcpStatus = CifsNeedReconnect; - spin_unlock(&server->srv_lock); + set_need_reco(server); return; } - spin_lock(&cifs_tcp_ses_lock); - list_for_each_entry(ses, &pserver->smb_ses_list, smb_ses_list) { - if (cifs_ses_exiting(ses)) - continue; - spin_lock(&ses->chan_lock); - for (i = 0; i < ses->chan_count; i++) { - if (!ses->chans[i].server) + if (SERVER_IS_CHAN(server)) + server = server->primary_server; + scoped_guard(spinlock, &cifs_tcp_ses_lock) { + set_need_reco(server); + list_for_each_entry(ses, &server->smb_ses_list, smb_ses_list) { + spin_lock(&ses->ses_lock); + if (ses->ses_status == SES_EXITING) { + spin_unlock(&ses->ses_lock); continue; - - spin_lock(&ses->chans[i].server->srv_lock); - if (ses->chans[i].server->tcpStatus != CifsExiting) - ses->chans[i].server->tcpStatus = CifsNeedReconnect; - spin_unlock(&ses->chans[i].server->srv_lock); + } + spin_lock(&ses->chan_lock); + for (i = 1; i < ses->chan_count; i++) { + nserver = ses->chans[i].server; + if (!nserver) + continue; + nserver->srv_count++; + list_add(&nserver->rlist, &reco); + } + spin_unlock(&ses->chan_lock); + spin_unlock(&ses->ses_lock); } - spin_unlock(&ses->chan_lock); } - spin_unlock(&cifs_tcp_ses_lock); + + list_for_each_entry_safe(server, nserver, &reco, rlist) { + list_del_init(&server->rlist); + set_need_reco(server); + cifs_put_tcp_session(server, 0); + } } /* From 43e7e284fc77b710d899569360ea46fa3374ae22 Mon Sep 17 00:00:00 2001 From: David Howells Date: Wed, 25 Jun 2025 14:15:04 +0100 Subject: [PATCH 228/289] cifs: Fix the smbd_response slab to allow usercopy The handling of received data in the smbdirect client code involves using copy_to_iter() to copy data from the smbd_reponse struct's packet trailer to a folioq buffer provided by netfslib that encapsulates a chunk of pagecache. If, however, CONFIG_HARDENED_USERCOPY=y, this will result in the checks then performed in copy_to_iter() oopsing with something like the following: CIFS: Attempting to mount //172.31.9.1/test CIFS: VFS: RDMA transport established usercopy: Kernel memory exposure attempt detected from SLUB object 'smbd_response_0000000091e24ea1' (offset 81, size 63)! ------------[ cut here ]------------ kernel BUG at mm/usercopy.c:102! ... RIP: 0010:usercopy_abort+0x6c/0x80 ... Call Trace: __check_heap_object+0xe3/0x120 __check_object_size+0x4dc/0x6d0 smbd_recv+0x77f/0xfe0 [cifs] cifs_readv_from_socket+0x276/0x8f0 [cifs] cifs_read_from_socket+0xcd/0x120 [cifs] cifs_demultiplex_thread+0x7e9/0x2d50 [cifs] kthread+0x396/0x830 ret_from_fork+0x2b8/0x3b0 ret_from_fork_asm+0x1a/0x30 The problem is that the smbd_response slab's packet field isn't marked as being permitted for usercopy. Fix this by passing parameters to kmem_slab_create() to indicate that copy_to_iter() is permitted from the packet region of the smbd_response slab objects, less the header space. Fixes: ee4cdf7ba857 ("netfs: Speed up buffered reading") Reported-by: Stefan Metzmacher Link: https://lore.kernel.org/r/acb7f612-df26-4e2a-a35d-7cd040f513e1@samba.org/ Signed-off-by: David Howells Reviewed-by: Stefan Metzmacher Tested-by: Stefan Metzmacher cc: Paulo Alcantara cc: linux-cifs@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Steve French --- fs/smb/client/smbdirect.c | 18 +++++++++++++----- 1 file changed, 13 insertions(+), 5 deletions(-) diff --git a/fs/smb/client/smbdirect.c b/fs/smb/client/smbdirect.c index a976bcf61226..0a9fd6c399f6 100644 --- a/fs/smb/client/smbdirect.c +++ b/fs/smb/client/smbdirect.c @@ -1475,6 +1475,9 @@ static int allocate_caches_and_workqueue(struct smbd_connection *info) char name[MAX_NAME_LEN]; int rc; + if (WARN_ON_ONCE(sp->max_recv_size < sizeof(struct smbdirect_data_transfer))) + return -ENOMEM; + scnprintf(name, MAX_NAME_LEN, "smbd_request_%p", info); info->request_cache = kmem_cache_create( @@ -1492,12 +1495,17 @@ static int allocate_caches_and_workqueue(struct smbd_connection *info) goto out1; scnprintf(name, MAX_NAME_LEN, "smbd_response_%p", info); + + struct kmem_cache_args response_args = { + .align = __alignof__(struct smbd_response), + .useroffset = (offsetof(struct smbd_response, packet) + + sizeof(struct smbdirect_data_transfer)), + .usersize = sp->max_recv_size - sizeof(struct smbdirect_data_transfer), + }; info->response_cache = - kmem_cache_create( - name, - sizeof(struct smbd_response) + - sp->max_recv_size, - 0, SLAB_HWCACHE_ALIGN, NULL); + kmem_cache_create(name, + sizeof(struct smbd_response) + sp->max_recv_size, + &response_args, SLAB_HWCACHE_ALIGN); if (!info->response_cache) goto out2; From 263debecb4aa7cec0a86487e6f409814f6194a21 Mon Sep 17 00:00:00 2001 From: David Howells Date: Wed, 2 Apr 2025 20:27:26 +0100 Subject: [PATCH 229/289] cifs: Fix reading into an ITER_FOLIOQ from the smbdirect code When performing a file read from RDMA, smbd_recv() prints an "Invalid msg type 4" error and fails the I/O. This is due to the switch-statement there not handling the ITER_FOLIOQ handed down from netfslib. Fix this by collapsing smbd_recv_buf() and smbd_recv_page() into smbd_recv() and just using copy_to_iter() instead of memcpy(). This future-proofs the function too, in case more ITER_* types are added. Fixes: ee4cdf7ba857 ("netfs: Speed up buffered reading") Reported-by: Stefan Metzmacher Signed-off-by: David Howells cc: Tom Talpey cc: Paulo Alcantara (Red Hat) cc: Matthew Wilcox cc: linux-cifs@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Steve French --- fs/smb/client/smbdirect.c | 112 ++++++-------------------------------- 1 file changed, 17 insertions(+), 95 deletions(-) diff --git a/fs/smb/client/smbdirect.c b/fs/smb/client/smbdirect.c index 0a9fd6c399f6..754e94a0e07f 100644 --- a/fs/smb/client/smbdirect.c +++ b/fs/smb/client/smbdirect.c @@ -1778,35 +1778,39 @@ try_again: } /* - * Receive data from receive reassembly queue + * Receive data from the transport's receive reassembly queue * All the incoming data packets are placed in reassembly queue - * buf: the buffer to read data into + * iter: the buffer to read data into * size: the length of data to read * return value: actual data read - * Note: this implementation copies the data from reassebmly queue to receive + * + * Note: this implementation copies the data from reassembly queue to receive * buffers used by upper layer. This is not the optimal code path. A better way * to do it is to not have upper layer allocate its receive buffers but rather * borrow the buffer from reassembly queue, and return it after data is * consumed. But this will require more changes to upper layer code, and also * need to consider packet boundaries while they still being reassembled. */ -static int smbd_recv_buf(struct smbd_connection *info, char *buf, - unsigned int size) +int smbd_recv(struct smbd_connection *info, struct msghdr *msg) { struct smbdirect_socket *sc = &info->socket; struct smbd_response *response; struct smbdirect_data_transfer *data_transfer; + size_t size = iov_iter_count(&msg->msg_iter); int to_copy, to_read, data_read, offset; u32 data_length, remaining_data_length, data_offset; int rc; + if (WARN_ON_ONCE(iov_iter_rw(&msg->msg_iter) == WRITE)) + return -EINVAL; /* It's a bug in upper layer to get there */ + again: /* * No need to hold the reassembly queue lock all the time as we are * the only one reading from the front of the queue. The transport * may add more entries to the back of the queue at the same time */ - log_read(INFO, "size=%d info->reassembly_data_length=%d\n", size, + log_read(INFO, "size=%zd info->reassembly_data_length=%d\n", size, info->reassembly_data_length); if (info->reassembly_data_length >= size) { int queue_length; @@ -1844,7 +1848,10 @@ again: if (response->first_segment && size == 4) { unsigned int rfc1002_len = data_length + remaining_data_length; - *((__be32 *)buf) = cpu_to_be32(rfc1002_len); + __be32 rfc1002_hdr = cpu_to_be32(rfc1002_len); + if (copy_to_iter(&rfc1002_hdr, sizeof(rfc1002_hdr), + &msg->msg_iter) != sizeof(rfc1002_hdr)) + return -EFAULT; data_read = 4; response->first_segment = false; log_read(INFO, "returning rfc1002 length %d\n", @@ -1853,10 +1860,9 @@ again: } to_copy = min_t(int, data_length - offset, to_read); - memcpy( - buf + data_read, - (char *)data_transfer + data_offset + offset, - to_copy); + if (copy_to_iter((char *)data_transfer + data_offset + offset, + to_copy, &msg->msg_iter) != to_copy) + return -EFAULT; /* move on to the next buffer? */ if (to_copy == data_length - offset) { @@ -1921,90 +1927,6 @@ read_rfc1002_done: goto again; } -/* - * Receive a page from receive reassembly queue - * page: the page to read data into - * to_read: the length of data to read - * return value: actual data read - */ -static int smbd_recv_page(struct smbd_connection *info, - struct page *page, unsigned int page_offset, - unsigned int to_read) -{ - struct smbdirect_socket *sc = &info->socket; - int ret; - char *to_address; - void *page_address; - - /* make sure we have the page ready for read */ - ret = wait_event_interruptible( - info->wait_reassembly_queue, - info->reassembly_data_length >= to_read || - sc->status != SMBDIRECT_SOCKET_CONNECTED); - if (ret) - return ret; - - /* now we can read from reassembly queue and not sleep */ - page_address = kmap_atomic(page); - to_address = (char *) page_address + page_offset; - - log_read(INFO, "reading from page=%p address=%p to_read=%d\n", - page, to_address, to_read); - - ret = smbd_recv_buf(info, to_address, to_read); - kunmap_atomic(page_address); - - return ret; -} - -/* - * Receive data from transport - * msg: a msghdr point to the buffer, can be ITER_KVEC or ITER_BVEC - * return: total bytes read, or 0. SMB Direct will not do partial read. - */ -int smbd_recv(struct smbd_connection *info, struct msghdr *msg) -{ - char *buf; - struct page *page; - unsigned int to_read, page_offset; - int rc; - - if (iov_iter_rw(&msg->msg_iter) == WRITE) { - /* It's a bug in upper layer to get there */ - cifs_dbg(VFS, "Invalid msg iter dir %u\n", - iov_iter_rw(&msg->msg_iter)); - rc = -EINVAL; - goto out; - } - - switch (iov_iter_type(&msg->msg_iter)) { - case ITER_KVEC: - buf = msg->msg_iter.kvec->iov_base; - to_read = msg->msg_iter.kvec->iov_len; - rc = smbd_recv_buf(info, buf, to_read); - break; - - case ITER_BVEC: - page = msg->msg_iter.bvec->bv_page; - page_offset = msg->msg_iter.bvec->bv_offset; - to_read = msg->msg_iter.bvec->bv_len; - rc = smbd_recv_page(info, page, page_offset, to_read); - break; - - default: - /* It's a bug in upper layer to get there */ - cifs_dbg(VFS, "Invalid msg type %d\n", - iov_iter_type(&msg->msg_iter)); - rc = -EINVAL; - } - -out: - /* SMBDirect will read it all or nothing */ - if (rc > 0) - msg->msg_iter.count = 0; - return rc; -} - /* * Send data to transport * Each rqst is transported as a SMBDirect payload From 38074de35b015df5623f524d6f2b49a0cd395c40 Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Thu, 19 Jun 2025 15:16:11 -0400 Subject: [PATCH 230/289] NFSv4/flexfiles: Fix handling of NFS level errors in I/O Allow the flexfiles error handling to recognise NFS level errors (as opposed to RPC level errors) and handle them separately. The main motivator is the NFSERR_PERM errors that get returned if the NFS client connects to the data server through a port number that is lower than 1024. In that case, the client should disconnect and retry a READ on a different data server, or it should retry a WRITE after reconnecting. Reviewed-by: Tigran Mkrtchyan Fixes: d67ae825a59d ("pnfs/flexfiles: Add the FlexFile Layout Driver") Signed-off-by: Trond Myklebust Signed-off-by: Anna Schumaker --- fs/nfs/flexfilelayout/flexfilelayout.c | 118 ++++++++++++++++++------- 1 file changed, 84 insertions(+), 34 deletions(-) diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c index df4807460596..4bea008dbebd 100644 --- a/fs/nfs/flexfilelayout/flexfilelayout.c +++ b/fs/nfs/flexfilelayout/flexfilelayout.c @@ -1105,6 +1105,7 @@ static void ff_layout_reset_read(struct nfs_pgio_header *hdr) } static int ff_layout_async_handle_error_v4(struct rpc_task *task, + u32 op_status, struct nfs4_state *state, struct nfs_client *clp, struct pnfs_layout_segment *lseg, @@ -1115,34 +1116,42 @@ static int ff_layout_async_handle_error_v4(struct rpc_task *task, struct nfs4_deviceid_node *devid = FF_LAYOUT_DEVID_NODE(lseg, idx); struct nfs4_slot_table *tbl = &clp->cl_session->fc_slot_table; - switch (task->tk_status) { - case -NFS4ERR_BADSESSION: - case -NFS4ERR_BADSLOT: - case -NFS4ERR_BAD_HIGH_SLOT: - case -NFS4ERR_DEADSESSION: - case -NFS4ERR_CONN_NOT_BOUND_TO_SESSION: - case -NFS4ERR_SEQ_FALSE_RETRY: - case -NFS4ERR_SEQ_MISORDERED: + switch (op_status) { + case NFS4_OK: + case NFS4ERR_NXIO: + break; + case NFSERR_PERM: + if (!task->tk_xprt) + break; + xprt_force_disconnect(task->tk_xprt); + goto out_retry; + case NFS4ERR_BADSESSION: + case NFS4ERR_BADSLOT: + case NFS4ERR_BAD_HIGH_SLOT: + case NFS4ERR_DEADSESSION: + case NFS4ERR_CONN_NOT_BOUND_TO_SESSION: + case NFS4ERR_SEQ_FALSE_RETRY: + case NFS4ERR_SEQ_MISORDERED: dprintk("%s ERROR %d, Reset session. Exchangeid " "flags 0x%x\n", __func__, task->tk_status, clp->cl_exchange_flags); nfs4_schedule_session_recovery(clp->cl_session, task->tk_status); - break; - case -NFS4ERR_DELAY: + goto out_retry; + case NFS4ERR_DELAY: nfs_inc_stats(lseg->pls_layout->plh_inode, NFSIOS_DELAY); fallthrough; - case -NFS4ERR_GRACE: + case NFS4ERR_GRACE: rpc_delay(task, FF_LAYOUT_POLL_RETRY_MAX); - break; - case -NFS4ERR_RETRY_UNCACHED_REP: - break; + goto out_retry; + case NFS4ERR_RETRY_UNCACHED_REP: + goto out_retry; /* Invalidate Layout errors */ - case -NFS4ERR_PNFS_NO_LAYOUT: - case -ESTALE: /* mapped NFS4ERR_STALE */ - case -EBADHANDLE: /* mapped NFS4ERR_BADHANDLE */ - case -EISDIR: /* mapped NFS4ERR_ISDIR */ - case -NFS4ERR_FHEXPIRED: - case -NFS4ERR_WRONG_TYPE: + case NFS4ERR_PNFS_NO_LAYOUT: + case NFS4ERR_STALE: + case NFS4ERR_BADHANDLE: + case NFS4ERR_ISDIR: + case NFS4ERR_FHEXPIRED: + case NFS4ERR_WRONG_TYPE: dprintk("%s Invalid layout error %d\n", __func__, task->tk_status); /* @@ -1155,6 +1164,11 @@ static int ff_layout_async_handle_error_v4(struct rpc_task *task, pnfs_destroy_layout(NFS_I(inode)); rpc_wake_up(&tbl->slot_tbl_waitq); goto reset; + default: + break; + } + + switch (task->tk_status) { /* RPC connection errors */ case -ENETDOWN: case -ENETUNREACH: @@ -1174,27 +1188,56 @@ static int ff_layout_async_handle_error_v4(struct rpc_task *task, nfs4_delete_deviceid(devid->ld, devid->nfs_client, &devid->deviceid); rpc_wake_up(&tbl->slot_tbl_waitq); - fallthrough; + break; default: - if (ff_layout_avoid_mds_available_ds(lseg)) - return -NFS4ERR_RESET_TO_PNFS; -reset: - dprintk("%s Retry through MDS. Error %d\n", __func__, - task->tk_status); - return -NFS4ERR_RESET_TO_MDS; + break; } + + if (ff_layout_avoid_mds_available_ds(lseg)) + return -NFS4ERR_RESET_TO_PNFS; +reset: + dprintk("%s Retry through MDS. Error %d\n", __func__, + task->tk_status); + return -NFS4ERR_RESET_TO_MDS; + +out_retry: task->tk_status = 0; return -EAGAIN; } /* Retry all errors through either pNFS or MDS except for -EJUKEBOX */ static int ff_layout_async_handle_error_v3(struct rpc_task *task, + u32 op_status, struct nfs_client *clp, struct pnfs_layout_segment *lseg, u32 idx) { struct nfs4_deviceid_node *devid = FF_LAYOUT_DEVID_NODE(lseg, idx); + switch (op_status) { + case NFS_OK: + case NFSERR_NXIO: + break; + case NFSERR_PERM: + if (!task->tk_xprt) + break; + xprt_force_disconnect(task->tk_xprt); + goto out_retry; + case NFSERR_ACCES: + case NFSERR_BADHANDLE: + case NFSERR_FBIG: + case NFSERR_IO: + case NFSERR_NOSPC: + case NFSERR_ROFS: + case NFSERR_STALE: + goto out_reset_to_pnfs; + case NFSERR_JUKEBOX: + nfs_inc_stats(lseg->pls_layout->plh_inode, NFSIOS_DELAY); + goto out_retry; + default: + break; + } + switch (task->tk_status) { /* File access problems. Don't mark the device as unavailable */ case -EACCES: @@ -1218,6 +1261,7 @@ static int ff_layout_async_handle_error_v3(struct rpc_task *task, nfs4_delete_deviceid(devid->ld, devid->nfs_client, &devid->deviceid); } +out_reset_to_pnfs: /* FIXME: Need to prevent infinite looping here. */ return -NFS4ERR_RESET_TO_PNFS; out_retry: @@ -1228,6 +1272,7 @@ out_retry: } static int ff_layout_async_handle_error(struct rpc_task *task, + u32 op_status, struct nfs4_state *state, struct nfs_client *clp, struct pnfs_layout_segment *lseg, @@ -1246,10 +1291,11 @@ static int ff_layout_async_handle_error(struct rpc_task *task, switch (vers) { case 3: - return ff_layout_async_handle_error_v3(task, clp, lseg, idx); - case 4: - return ff_layout_async_handle_error_v4(task, state, clp, + return ff_layout_async_handle_error_v3(task, op_status, clp, lseg, idx); + case 4: + return ff_layout_async_handle_error_v4(task, op_status, state, + clp, lseg, idx); default: /* should never happen */ WARN_ON_ONCE(1); @@ -1302,6 +1348,7 @@ static void ff_layout_io_track_ds_error(struct pnfs_layout_segment *lseg, switch (status) { case NFS4ERR_DELAY: case NFS4ERR_GRACE: + case NFS4ERR_PERM: break; case NFS4ERR_NXIO: ff_layout_mark_ds_unreachable(lseg, idx); @@ -1334,7 +1381,8 @@ static int ff_layout_read_done_cb(struct rpc_task *task, trace_ff_layout_read_error(hdr, task->tk_status); } - err = ff_layout_async_handle_error(task, hdr->args.context->state, + err = ff_layout_async_handle_error(task, hdr->res.op_status, + hdr->args.context->state, hdr->ds_clp, hdr->lseg, hdr->pgio_mirror_idx); @@ -1507,7 +1555,8 @@ static int ff_layout_write_done_cb(struct rpc_task *task, trace_ff_layout_write_error(hdr, task->tk_status); } - err = ff_layout_async_handle_error(task, hdr->args.context->state, + err = ff_layout_async_handle_error(task, hdr->res.op_status, + hdr->args.context->state, hdr->ds_clp, hdr->lseg, hdr->pgio_mirror_idx); @@ -1556,8 +1605,9 @@ static int ff_layout_commit_done_cb(struct rpc_task *task, trace_ff_layout_commit_error(data, task->tk_status); } - err = ff_layout_async_handle_error(task, NULL, data->ds_clp, - data->lseg, data->ds_commit_index); + err = ff_layout_async_handle_error(task, data->res.op_status, + NULL, data->ds_clp, data->lseg, + data->ds_commit_index); trace_nfs4_pnfs_commit_ds(data, err); switch (err) { From 178b8ff66ff827c41b4fa105e9aabb99a0b5c537 Mon Sep 17 00:00:00 2001 From: Jens Axboe Date: Thu, 26 Jun 2025 12:17:48 -0600 Subject: [PATCH 231/289] io_uring/kbuf: flag partial buffer mappings A previous commit aborted mapping more for a non-incremental ring for bundle peeking, but depending on where in the process this peeking happened, it would not necessarily prevent a retry by the user. That can create gaps in the received/read data. Add struct buf_sel_arg->partial_map, which can pass this information back. The networking side can then map that to internal state and use it to gate retry as well. Since this necessitates a new flag, change io_sr_msg->retry to a retry_flags member, and store both the retry and partial map condition in there. Cc: stable@vger.kernel.org Fixes: 26ec15e4b0c1 ("io_uring/kbuf: don't truncate end buffer for multiple buffer peeks") Signed-off-by: Jens Axboe --- io_uring/kbuf.c | 1 + io_uring/kbuf.h | 3 ++- io_uring/net.c | 23 +++++++++++++++-------- 3 files changed, 18 insertions(+), 9 deletions(-) diff --git a/io_uring/kbuf.c b/io_uring/kbuf.c index ce95e3af44a9..f2d2cc319faa 100644 --- a/io_uring/kbuf.c +++ b/io_uring/kbuf.c @@ -271,6 +271,7 @@ static int io_ring_buffers_peek(struct io_kiocb *req, struct buf_sel_arg *arg, if (len > arg->max_len) { len = arg->max_len; if (!(bl->flags & IOBL_INC)) { + arg->partial_map = 1; if (iov != arg->iovs) break; buf->len = len; diff --git a/io_uring/kbuf.h b/io_uring/kbuf.h index 5d83c7adc739..723d0361898e 100644 --- a/io_uring/kbuf.h +++ b/io_uring/kbuf.h @@ -58,7 +58,8 @@ struct buf_sel_arg { size_t max_len; unsigned short nr_iovs; unsigned short mode; - unsigned buf_group; + unsigned short buf_group; + unsigned short partial_map; }; void __user *io_buffer_select(struct io_kiocb *req, size_t *len, diff --git a/io_uring/net.c b/io_uring/net.c index 5c1e8c4ba468..43a43522f406 100644 --- a/io_uring/net.c +++ b/io_uring/net.c @@ -75,12 +75,17 @@ struct io_sr_msg { u16 flags; /* initialised and used only by !msg send variants */ u16 buf_group; - bool retry; + unsigned short retry_flags; void __user *msg_control; /* used only for send zerocopy */ struct io_kiocb *notif; }; +enum sr_retry_flags { + IO_SR_MSG_RETRY = 1, + IO_SR_MSG_PARTIAL_MAP = 2, +}; + /* * Number of times we'll try and do receives if there's more data. If we * exceed this limit, then add us to the back of the queue and retry from @@ -187,7 +192,7 @@ static inline void io_mshot_prep_retry(struct io_kiocb *req, req->flags &= ~REQ_F_BL_EMPTY; sr->done_io = 0; - sr->retry = false; + sr->retry_flags = 0; sr->len = 0; /* get from the provided buffer */ } @@ -397,7 +402,7 @@ int io_sendmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg); sr->done_io = 0; - sr->retry = false; + sr->retry_flags = 0; sr->len = READ_ONCE(sqe->len); sr->flags = READ_ONCE(sqe->ioprio); if (sr->flags & ~SENDMSG_FLAGS) @@ -751,7 +756,7 @@ int io_recvmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg); sr->done_io = 0; - sr->retry = false; + sr->retry_flags = 0; if (unlikely(sqe->file_index || sqe->addr2)) return -EINVAL; @@ -823,7 +828,7 @@ static inline bool io_recv_finish(struct io_kiocb *req, int *ret, cflags |= io_put_kbufs(req, this_ret, io_bundle_nbufs(kmsg, this_ret), issue_flags); - if (sr->retry) + if (sr->retry_flags & IO_SR_MSG_RETRY) cflags = req->cqe.flags | (cflags & CQE_F_MASK); /* bundle with no more immediate buffers, we're done */ if (req->flags & REQ_F_BL_EMPTY) @@ -832,12 +837,12 @@ static inline bool io_recv_finish(struct io_kiocb *req, int *ret, * If more is available AND it was a full transfer, retry and * append to this one */ - if (!sr->retry && kmsg->msg.msg_inq > 1 && this_ret > 0 && + if (!sr->retry_flags && kmsg->msg.msg_inq > 1 && this_ret > 0 && !iov_iter_count(&kmsg->msg.msg_iter)) { req->cqe.flags = cflags & ~CQE_F_MASK; sr->len = kmsg->msg.msg_inq; sr->done_io += this_ret; - sr->retry = true; + sr->retry_flags |= IO_SR_MSG_RETRY; return false; } } else { @@ -1082,6 +1087,8 @@ static int io_recv_buf_select(struct io_kiocb *req, struct io_async_msghdr *kmsg kmsg->vec.iovec = arg.iovs; req->flags |= REQ_F_NEED_CLEANUP; } + if (arg.partial_map) + sr->retry_flags |= IO_SR_MSG_PARTIAL_MAP; /* special case 1 vec, can be a fast path */ if (ret == 1) { @@ -1276,7 +1283,7 @@ int io_send_zc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) int ret; zc->done_io = 0; - zc->retry = false; + zc->retry_flags = 0; if (unlikely(READ_ONCE(sqe->__pad2[0]) || READ_ONCE(sqe->addr3))) return -EINVAL; From 9159c5e733cfa35ec863fa81960a3e7435f831fb Mon Sep 17 00:00:00 2001 From: Bibo Mao Date: Fri, 27 Jun 2025 18:27:44 +0800 Subject: [PATCH 232/289] LoongArch: KVM: Add address alignment check for IOCSR emulation IOCSR instruction supports 1/2/4/8 bytes access, the address should be naturally aligned with its access size. Here address alignment check is added in the EIOINTC kernel emulation. Cc: stable@vger.kernel.org Fixes: 3956a52bc05b ("LoongArch: KVM: Add EIOINTC read and write functions") Signed-off-by: Bibo Mao Signed-off-by: Huacai Chen --- arch/loongarch/kvm/intc/eiointc.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/arch/loongarch/kvm/intc/eiointc.c b/arch/loongarch/kvm/intc/eiointc.c index 9c47456b805c..236cbf979167 100644 --- a/arch/loongarch/kvm/intc/eiointc.c +++ b/arch/loongarch/kvm/intc/eiointc.c @@ -305,6 +305,11 @@ static int kvm_eiointc_read(struct kvm_vcpu *vcpu, return -EINVAL; } + if (addr & (len - 1)) { + kvm_err("%s: eiointc not aligned addr %llx len %d\n", __func__, addr, len); + return -EINVAL; + } + vcpu->kvm->stat.eiointc_read_exits++; spin_lock_irqsave(&eiointc->lock, flags); switch (len) { @@ -676,6 +681,11 @@ static int kvm_eiointc_write(struct kvm_vcpu *vcpu, return -EINVAL; } + if (addr & (len - 1)) { + kvm_err("%s: eiointc not aligned addr %llx len %d\n", __func__, addr, len); + return -EINVAL; + } + vcpu->kvm->stat.eiointc_write_exits++; spin_lock_irqsave(&eiointc->lock, flags); switch (len) { From c34bbc2c990700ba07b271fc7c8113b0bc3e4093 Mon Sep 17 00:00:00 2001 From: Bibo Mao Date: Fri, 27 Jun 2025 18:27:44 +0800 Subject: [PATCH 233/289] LoongArch: KVM: Fix interrupt route update with EIOINTC With function eiointc_update_sw_coremap(), there is forced assignment like val = *(u64 *)pvalue. Parameter pvalue may be pointer to char type or others, there is problem with forced assignment with u64 type. Here the detailed value is passed rather address pointer. Cc: stable@vger.kernel.org Fixes: 3956a52bc05b ("LoongArch: KVM: Add EIOINTC read and write functions") Signed-off-by: Bibo Mao Signed-off-by: Huacai Chen --- arch/loongarch/kvm/intc/eiointc.c | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/arch/loongarch/kvm/intc/eiointc.c b/arch/loongarch/kvm/intc/eiointc.c index 236cbf979167..8d3f48e2a7f0 100644 --- a/arch/loongarch/kvm/intc/eiointc.c +++ b/arch/loongarch/kvm/intc/eiointc.c @@ -66,10 +66,9 @@ static void eiointc_update_irq(struct loongarch_eiointc *s, int irq, int level) } static inline void eiointc_update_sw_coremap(struct loongarch_eiointc *s, - int irq, void *pvalue, u32 len, bool notify) + int irq, u64 val, u32 len, bool notify) { int i, cpu; - u64 val = *(u64 *)pvalue; for (i = 0; i < len; i++) { cpu = val & 0xff; @@ -403,7 +402,7 @@ static int loongarch_eiointc_writeb(struct kvm_vcpu *vcpu, irq = offset - EIOINTC_COREMAP_START; index = irq; s->coremap.reg_u8[index] = data; - eiointc_update_sw_coremap(s, irq, (void *)&data, sizeof(data), true); + eiointc_update_sw_coremap(s, irq, data, sizeof(data), true); break; default: ret = -EINVAL; @@ -488,7 +487,7 @@ static int loongarch_eiointc_writew(struct kvm_vcpu *vcpu, irq = offset - EIOINTC_COREMAP_START; index = irq >> 1; s->coremap.reg_u16[index] = data; - eiointc_update_sw_coremap(s, irq, (void *)&data, sizeof(data), true); + eiointc_update_sw_coremap(s, irq, data, sizeof(data), true); break; default: ret = -EINVAL; @@ -573,7 +572,7 @@ static int loongarch_eiointc_writel(struct kvm_vcpu *vcpu, irq = offset - EIOINTC_COREMAP_START; index = irq >> 2; s->coremap.reg_u32[index] = data; - eiointc_update_sw_coremap(s, irq, (void *)&data, sizeof(data), true); + eiointc_update_sw_coremap(s, irq, data, sizeof(data), true); break; default: ret = -EINVAL; @@ -658,7 +657,7 @@ static int loongarch_eiointc_writeq(struct kvm_vcpu *vcpu, irq = offset - EIOINTC_COREMAP_START; index = irq >> 3; s->coremap.reg_u64[index] = data; - eiointc_update_sw_coremap(s, irq, (void *)&data, sizeof(data), true); + eiointc_update_sw_coremap(s, irq, data, sizeof(data), true); break; default: ret = -EINVAL; @@ -816,7 +815,7 @@ static int kvm_eiointc_ctrl_access(struct kvm_device *dev, for (i = 0; i < (EIOINTC_IRQS / 4); i++) { start_irq = i * 4; eiointc_update_sw_coremap(s, start_irq, - (void *)&s->coremap.reg_u32[i], sizeof(u32), false); + s->coremap.reg_u32[i], sizeof(u32), false); } break; default: From 45515c643d0abb75c2cc760a6bc6b235eadafd66 Mon Sep 17 00:00:00 2001 From: Bibo Mao Date: Fri, 27 Jun 2025 18:27:44 +0800 Subject: [PATCH 234/289] LoongArch: KVM: Check interrupt route from physical CPU With EIOINTC interrupt controller, physical CPU ID is set for irq route. However the function kvm_get_vcpu() is used to get destination vCPU when delivering irq. With API kvm_get_vcpu(), the logical CPU ID is used. With API kvm_get_vcpu_by_cpuid(), vCPU ID can be searched from physical CPU ID. Cc: stable@vger.kernel.org Fixes: 3956a52bc05b ("LoongArch: KVM: Add EIOINTC read and write functions") Signed-off-by: Bibo Mao Signed-off-by: Huacai Chen --- arch/loongarch/kvm/intc/eiointc.c | 24 ++++++++++++++++++------ 1 file changed, 18 insertions(+), 6 deletions(-) diff --git a/arch/loongarch/kvm/intc/eiointc.c b/arch/loongarch/kvm/intc/eiointc.c index 8d3f48e2a7f0..644fb7785c07 100644 --- a/arch/loongarch/kvm/intc/eiointc.c +++ b/arch/loongarch/kvm/intc/eiointc.c @@ -9,7 +9,8 @@ static void eiointc_set_sw_coreisr(struct loongarch_eiointc *s) { - int ipnum, cpu, irq_index, irq_mask, irq; + int ipnum, cpu, cpuid, irq_index, irq_mask, irq; + struct kvm_vcpu *vcpu; for (irq = 0; irq < EIOINTC_IRQS; irq++) { ipnum = s->ipmap.reg_u8[irq / 32]; @@ -20,7 +21,12 @@ static void eiointc_set_sw_coreisr(struct loongarch_eiointc *s) irq_index = irq / 32; irq_mask = BIT(irq & 0x1f); - cpu = s->coremap.reg_u8[irq]; + cpuid = s->coremap.reg_u8[irq]; + vcpu = kvm_get_vcpu_by_cpuid(s->kvm, cpuid); + if (!vcpu) + continue; + + cpu = vcpu->vcpu_id; if (!!(s->coreisr.reg_u32[cpu][irq_index] & irq_mask)) set_bit(irq, s->sw_coreisr[cpu][ipnum]); else @@ -68,17 +74,23 @@ static void eiointc_update_irq(struct loongarch_eiointc *s, int irq, int level) static inline void eiointc_update_sw_coremap(struct loongarch_eiointc *s, int irq, u64 val, u32 len, bool notify) { - int i, cpu; + int i, cpu, cpuid; + struct kvm_vcpu *vcpu; for (i = 0; i < len; i++) { - cpu = val & 0xff; + cpuid = val & 0xff; val = val >> 8; if (!(s->status & BIT(EIOINTC_ENABLE_CPU_ENCODE))) { - cpu = ffs(cpu) - 1; - cpu = (cpu >= 4) ? 0 : cpu; + cpuid = ffs(cpuid) - 1; + cpuid = (cpuid >= 4) ? 0 : cpuid; } + vcpu = kvm_get_vcpu_by_cpuid(s->kvm, cpuid); + if (!vcpu) + continue; + + cpu = vcpu->vcpu_id; if (s->sw_coremap[irq + i] == cpu) continue; From cc8d5b209e09d3b52bca1ffe00045876842d96ae Mon Sep 17 00:00:00 2001 From: Bibo Mao Date: Fri, 27 Jun 2025 18:27:44 +0800 Subject: [PATCH 235/289] LoongArch: KVM: Check validity of "num_cpu" from user space The maximum supported cpu number is EIOINTC_ROUTE_MAX_VCPUS about irqchip EIOINTC, here add validation about cpu number to avoid array pointer overflow. Cc: stable@vger.kernel.org Fixes: 1ad7efa552fd ("LoongArch: KVM: Add EIOINTC user mode read and write functions") Signed-off-by: Bibo Mao Signed-off-by: Huacai Chen --- arch/loongarch/kvm/intc/eiointc.c | 19 ++++++++++++++----- 1 file changed, 14 insertions(+), 5 deletions(-) diff --git a/arch/loongarch/kvm/intc/eiointc.c b/arch/loongarch/kvm/intc/eiointc.c index 644fb7785c07..056a75f7d090 100644 --- a/arch/loongarch/kvm/intc/eiointc.c +++ b/arch/loongarch/kvm/intc/eiointc.c @@ -805,7 +805,7 @@ static int kvm_eiointc_ctrl_access(struct kvm_device *dev, int ret = 0; unsigned long flags; unsigned long type = (unsigned long)attr->attr; - u32 i, start_irq; + u32 i, start_irq, val; void __user *data; struct loongarch_eiointc *s = dev->kvm->arch.eiointc; @@ -813,8 +813,14 @@ static int kvm_eiointc_ctrl_access(struct kvm_device *dev, spin_lock_irqsave(&s->lock, flags); switch (type) { case KVM_DEV_LOONGARCH_EXTIOI_CTRL_INIT_NUM_CPU: - if (copy_from_user(&s->num_cpu, data, 4)) + if (copy_from_user(&val, data, 4)) ret = -EFAULT; + else { + if (val >= EIOINTC_ROUTE_MAX_VCPUS) + ret = -EINVAL; + else + s->num_cpu = val; + } break; case KVM_DEV_LOONGARCH_EXTIOI_CTRL_INIT_FEATURE: if (copy_from_user(&s->features, data, 4)) @@ -842,7 +848,7 @@ static int kvm_eiointc_regs_access(struct kvm_device *dev, struct kvm_device_attr *attr, bool is_write) { - int addr, cpuid, offset, ret = 0; + int addr, cpu, offset, ret = 0; unsigned long flags; void *p = NULL; void __user *data; @@ -850,7 +856,7 @@ static int kvm_eiointc_regs_access(struct kvm_device *dev, s = dev->kvm->arch.eiointc; addr = attr->attr; - cpuid = addr >> 16; + cpu = addr >> 16; addr &= 0xffff; data = (void __user *)attr->addr; switch (addr) { @@ -875,8 +881,11 @@ static int kvm_eiointc_regs_access(struct kvm_device *dev, p = &s->isr.reg_u32[offset]; break; case EIOINTC_COREISR_START ... EIOINTC_COREISR_END: + if (cpu >= s->num_cpu) + return -EINVAL; + offset = (addr - EIOINTC_COREISR_START) / 4; - p = &s->coreisr.reg_u32[cpuid][offset]; + p = &s->coreisr.reg_u32[cpu][offset]; break; case EIOINTC_COREMAP_START ... EIOINTC_COREMAP_END: offset = (addr - EIOINTC_COREMAP_START) / 4; From 955853cf83657faa58572ef3f08b44f0f88885c1 Mon Sep 17 00:00:00 2001 From: Bibo Mao Date: Fri, 27 Jun 2025 18:27:44 +0800 Subject: [PATCH 236/289] LoongArch: KVM: Disable updating of "num_cpu" and "feature" Property "num_cpu" and "feature" are read-only once eiointc is created, which are set with KVM_DEV_LOONGARCH_EXTIOI_GRP_CTRL attr group before device creation. Attr group KVM_DEV_LOONGARCH_EXTIOI_GRP_SW_STATUS is to update register and software state for migration and reset usage, property "num_cpu" and "feature" can not be update again if it is created already. Here discard write operation with property "num_cpu" and "feature" in attr group KVM_DEV_LOONGARCH_EXTIOI_GRP_CTRL. Cc: stable@vger.kernel.org Fixes: 1ad7efa552fd ("LoongArch: KVM: Add EIOINTC user mode read and write functions") Signed-off-by: Bibo Mao Signed-off-by: Huacai Chen --- arch/loongarch/kvm/intc/eiointc.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/arch/loongarch/kvm/intc/eiointc.c b/arch/loongarch/kvm/intc/eiointc.c index 056a75f7d090..a75f865d6fb9 100644 --- a/arch/loongarch/kvm/intc/eiointc.c +++ b/arch/loongarch/kvm/intc/eiointc.c @@ -926,9 +926,15 @@ static int kvm_eiointc_sw_status_access(struct kvm_device *dev, data = (void __user *)attr->addr; switch (addr) { case KVM_DEV_LOONGARCH_EXTIOI_SW_STATUS_NUM_CPU: + if (is_write) + return ret; + p = &s->num_cpu; break; case KVM_DEV_LOONGARCH_EXTIOI_SW_STATUS_FEATURE: + if (is_write) + return ret; + p = &s->features; break; case KVM_DEV_LOONGARCH_EXTIOI_SW_STATUS_STATE: From f40213cd93e608ee78b5e25db042c42ec07139fe Mon Sep 17 00:00:00 2001 From: Johannes Berg Date: Fri, 6 Jun 2025 09:56:52 +0200 Subject: [PATCH 237/289] i2c: scx200_acb: depends on HAS_IOPORT It already depends on X86_32, but that's also set for ARCH=um. Recent changes made UML no longer have IO port access since it's not needed, but this driver uses it. Build it only for HAS_IOPORT. This is pretty much the same as depending on X86, but on the off-chance that HAS_IOPORT will ever be optional on x86 HAS_IOPORT is the real prerequisite. Signed-off-by: Johannes Berg Signed-off-by: Wolfram Sang --- drivers/i2c/busses/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/i2c/busses/Kconfig b/drivers/i2c/busses/Kconfig index 48c5ab832009..0a4ecccd1851 100644 --- a/drivers/i2c/busses/Kconfig +++ b/drivers/i2c/busses/Kconfig @@ -1530,7 +1530,7 @@ config I2C_XGENE_SLIMPRO config SCx200_ACB tristate "Geode ACCESS.bus support" - depends on X86_32 && PCI + depends on X86_32 && PCI && HAS_IOPORT help Enable the use of the ACCESS.bus controllers on the Geode SCx200 and SC1100 processors and the CS5535 and CS5536 Geode companion devices. From 09234a632be42573d9743ac5ff6773622d233ad0 Mon Sep 17 00:00:00 2001 From: Dave Chinner Date: Thu, 26 Jun 2025 08:48:54 +1000 Subject: [PATCH 238/289] xfs: xfs_ifree_cluster vs xfs_iflush_shutdown_abort deadlock Lock order of xfs_ifree_cluster() is cluster buffer -> try ILOCK -> IFLUSHING, except for the last inode in the cluster that is triggering the free. In that case, the lock order is ILOCK -> cluster buffer -> IFLUSHING. xfs_iflush_cluster() uses cluster buffer -> try ILOCK -> IFLUSHING, so this can safely run concurrently with xfs_ifree_cluster(). xfs_inode_item_precommit() uses ILOCK -> cluster buffer, but this cannot race with xfs_ifree_cluster() so being in a different order will not trigger a deadlock. xfs_reclaim_inode() during a filesystem shutdown uses ILOCK -> IFLUSHING -> cluster buffer via xfs_iflush_shutdown_abort(), and this deadlocks against xfs_ifree_cluster() like so: sysrq: Show Blocked State task:kworker/10:37 state:D stack:12560 pid:276182 tgid:276182 ppid:2 flags:0x00004000 Workqueue: xfs-inodegc/dm-3 xfs_inodegc_worker Call Trace: __schedule+0x650/0xb10 schedule+0x6d/0xf0 schedule_timeout+0x8b/0x180 schedule_timeout_uninterruptible+0x1e/0x30 xfs_ifree+0x326/0x730 xfs_inactive_ifree+0xcb/0x230 xfs_inactive+0x2c8/0x380 xfs_inodegc_worker+0xaa/0x180 process_scheduled_works+0x1d4/0x400 worker_thread+0x234/0x2e0 kthread+0x147/0x170 ret_from_fork+0x3e/0x50 ret_from_fork_asm+0x1a/0x30 task:fsync-tester state:D stack:12160 pid:2255943 tgid:2255943 ppid:3988702 flags:0x00004006 Call Trace: __schedule+0x650/0xb10 schedule+0x6d/0xf0 schedule_timeout+0x31/0x180 __down_common+0xbe/0x1f0 __down+0x1d/0x30 down+0x48/0x50 xfs_buf_lock+0x3d/0xe0 xfs_iflush_shutdown_abort+0x51/0x1e0 xfs_icwalk_ag+0x386/0x690 xfs_reclaim_inodes_nr+0x114/0x160 xfs_fs_free_cached_objects+0x19/0x20 super_cache_scan+0x17b/0x1a0 do_shrink_slab+0x180/0x350 shrink_slab+0xf8/0x430 drop_slab+0x97/0xf0 drop_caches_sysctl_handler+0x59/0xc0 proc_sys_call_handler+0x189/0x280 proc_sys_write+0x13/0x20 vfs_write+0x33d/0x3f0 ksys_write+0x7c/0xf0 __x64_sys_write+0x1b/0x30 x64_sys_call+0x271d/0x2ee0 do_syscall_64+0x68/0x130 entry_SYSCALL_64_after_hwframe+0x76/0x7e We can't change the lock order of xfs_ifree_cluster() - XFS_ISTALE and XFS_IFLUSHING are serialised through to journal IO completion by the cluster buffer lock being held. There's quite a few asserts in the code that check that XFS_ISTALE does not occur out of sync with buffer locking (e.g. in xfs_iflush_cluster). There's also a dependency on the inode log item being removed from the buffer before XFS_IFLUSHING is cleared, also with asserts that trigger on this. Further, we don't have a requirement for the inode to be locked when completing or aborting inode flushing because all the inode state updates are serialised by holding the cluster buffer lock across the IO to completion. We can't check for XFS_IRECLAIM in xfs_ifree_mark_inode_stale() and skip the inode, because there is no guarantee that the inode will be reclaimed. Hence it *must* be marked XFS_ISTALE regardless of whether reclaim is preparing to free that inode. Similarly, we can't check for IFLUSHING before locking the inode because that would result in dirty inodes not being marked with ISTALE in the event of racing with XFS_IRECLAIM. Hence we have to address this issue from the xfs_reclaim_inode() side. It is clear that we cannot hold the inode locked here when calling xfs_iflush_shutdown_abort() because it is the inode->buffer lock order that causes the deadlock against xfs_ifree_cluster(). Hence we need to drop the ILOCK before aborting the inode in the shutdown case. Once we've aborted the inode, we can grab the ILOCK again and then immediately reclaim it as it is now guaranteed to be clean. Note that dropping the ILOCK in xfs_reclaim_inode() means that it can now be locked by xfs_ifree_mark_inode_stale() and seen whilst in this state. This is safe because we have left the XFS_IFLUSHING flag on the inode and so xfs_ifree_mark_inode_stale() will simply set XFS_ISTALE and move to the next inode. An ASSERT check in this path needs to be tweaked to take into account this new shutdown interaction. Signed-off-by: Dave Chinner Reviewed-by: Christoph Hellwig Reviewed-by: Carlos Maiolino Signed-off-by: Carlos Maiolino --- fs/xfs/xfs_icache.c | 8 ++++++++ fs/xfs/xfs_inode.c | 2 +- 2 files changed, 9 insertions(+), 1 deletion(-) diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c index 726e29b837e6..bbc2f2973dcc 100644 --- a/fs/xfs/xfs_icache.c +++ b/fs/xfs/xfs_icache.c @@ -979,7 +979,15 @@ xfs_reclaim_inode( */ if (xlog_is_shutdown(ip->i_mount->m_log)) { xfs_iunpin_wait(ip); + /* + * Avoid a ABBA deadlock on the inode cluster buffer vs + * concurrent xfs_ifree_cluster() trying to mark the inode + * stale. We don't need the inode locked to run the flush abort + * code, but the flush abort needs to lock the cluster buffer. + */ + xfs_iunlock(ip, XFS_ILOCK_EXCL); xfs_iflush_shutdown_abort(ip); + xfs_ilock(ip, XFS_ILOCK_EXCL); goto reclaim; } if (xfs_ipincount(ip)) diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index ee3e0f284287..761a996a857c 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -1635,7 +1635,7 @@ retry: iip = ip->i_itemp; if (__xfs_iflags_test(ip, XFS_IFLUSHING)) { ASSERT(!list_empty(&iip->ili_item.li_bio_list)); - ASSERT(iip->ili_last_fields); + ASSERT(iip->ili_last_fields || xlog_is_shutdown(mp->m_log)); goto out_iunlock; } From db6a2274162de615ff74b927d38942fe3134d298 Mon Sep 17 00:00:00 2001 From: Dave Chinner Date: Thu, 26 Jun 2025 08:48:55 +1000 Subject: [PATCH 239/289] xfs: catch stale AGF/AGF metadata There is a race condition that can trigger in dmflakey fstests that can result in asserts in xfs_ialloc_read_agi() and xfs_alloc_read_agf() firing. The asserts look like this: XFS: Assertion failed: pag->pagf_freeblks == be32_to_cpu(agf->agf_freeblks), file: fs/xfs/libxfs/xfs_alloc.c, line: 3440 ..... Call Trace: xfs_alloc_read_agf+0x2ad/0x3a0 xfs_alloc_fix_freelist+0x280/0x720 xfs_alloc_vextent_prepare_ag+0x42/0x120 xfs_alloc_vextent_iterate_ags+0x67/0x260 xfs_alloc_vextent_start_ag+0xe4/0x1c0 xfs_bmapi_allocate+0x6fe/0xc90 xfs_bmapi_convert_delalloc+0x338/0x560 xfs_map_blocks+0x354/0x580 iomap_writepages+0x52b/0xa70 xfs_vm_writepages+0xd7/0x100 do_writepages+0xe1/0x2c0 __writeback_single_inode+0x44/0x340 writeback_sb_inodes+0x2d0/0x570 __writeback_inodes_wb+0x9c/0xf0 wb_writeback+0x139/0x2d0 wb_workfn+0x23e/0x4c0 process_scheduled_works+0x1d4/0x400 worker_thread+0x234/0x2e0 kthread+0x147/0x170 ret_from_fork+0x3e/0x50 ret_from_fork_asm+0x1a/0x30 I've seen the AGI variant from scrub running on the filesysetm after unmount failed due to systemd interference: XFS: Assertion failed: pag->pagi_freecount == be32_to_cpu(agi->agi_freecount) || xfs_is_shutdown(pag->pag_mount), file: fs/xfs/libxfs/xfs_ialloc.c, line: 2804 ..... Call Trace: xfs_ialloc_read_agi+0xee/0x150 xchk_perag_drain_and_lock+0x7d/0x240 xchk_ag_init+0x34/0x90 xchk_inode_xref+0x7b/0x220 xchk_inode+0x14d/0x180 xfs_scrub_metadata+0x2e2/0x510 xfs_ioc_scrub_metadata+0x62/0xb0 xfs_file_ioctl+0x446/0xbf0 __se_sys_ioctl+0x6f/0xc0 __x64_sys_ioctl+0x1d/0x30 x64_sys_call+0x1879/0x2ee0 do_syscall_64+0x68/0x130 ? exc_page_fault+0x62/0xc0 entry_SYSCALL_64_after_hwframe+0x76/0x7e Essentially, it is the same problem. When _flakey_drop_and_remount() loads the drop-writes table, it makes all writes silently fail. Writes are reported to the fs as completed successfully, but they are not issued to the backing store. The filesystem sees the successful write completion and marks the metadata buffer clean and removes it from the AIL. If this happens at the same time as memory pressure is occuring, the now-clean AGF and/or AGI buffers can be reclaimed from memory. Shortly afterwards, but before _flakey_drop_and_remount() runs unmount, background writeback is kicked and it tries to allocate blocks for the dirty pages in memory. This then tries to access the AGF buffer we just turfed out of memory. It's not found, so it gets read in from disk. This is all fine, except for the fact that the last writeback of the AGF did not actually reach disk. The AGF on disk is stale compared to the in-memory state held by the perag, and so they don't match and the assert fires. Then other operations on that inode hang because the task was killed whilst holding inode locks. e.g: Workqueue: xfs-conv/dm-12 xfs_end_io Call Trace: __schedule+0x650/0xb10 schedule+0x6d/0xf0 schedule_preempt_disabled+0x15/0x30 rwsem_down_write_slowpath+0x31a/0x5f0 down_write+0x43/0x60 xfs_ilock+0x1a8/0x210 xfs_trans_alloc_inode+0x9c/0x240 xfs_iomap_write_unwritten+0xe3/0x300 xfs_end_ioend+0x90/0x130 xfs_end_io+0xce/0x100 process_scheduled_works+0x1d4/0x400 worker_thread+0x234/0x2e0 kthread+0x147/0x170 ret_from_fork+0x3e/0x50 ret_from_fork_asm+0x1a/0x30 and it's all down hill from there. Memory pressure is one way to trigger this, another is to run "echo 3 > /proc/sys/vm/drop_caches" randomly while tests are running. Regardless of how it is triggered, this effectively takes down the system once umount hangs because it's holding a sb->s_umount lock exclusive and now every sync(1) call gets stuck on it. Fix this by replacing the asserts with a corruption detection check and a shutdown. Signed-off-by: Dave Chinner Reviewed-by: Carlos Maiolino Signed-off-by: Carlos Maiolino --- fs/xfs/libxfs/xfs_alloc.c | 41 ++++++++++++++++++++++++++++++-------- fs/xfs/libxfs/xfs_ialloc.c | 31 ++++++++++++++++++++++++---- 2 files changed, 60 insertions(+), 12 deletions(-) diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c index 7839efe050bf..000cc7f4a3ce 100644 --- a/fs/xfs/libxfs/xfs_alloc.c +++ b/fs/xfs/libxfs/xfs_alloc.c @@ -3444,16 +3444,41 @@ xfs_alloc_read_agf( set_bit(XFS_AGSTATE_AGF_INIT, &pag->pag_opstate); } + #ifdef DEBUG - else if (!xfs_is_shutdown(mp)) { - ASSERT(pag->pagf_freeblks == be32_to_cpu(agf->agf_freeblks)); - ASSERT(pag->pagf_btreeblks == be32_to_cpu(agf->agf_btreeblks)); - ASSERT(pag->pagf_flcount == be32_to_cpu(agf->agf_flcount)); - ASSERT(pag->pagf_longest == be32_to_cpu(agf->agf_longest)); - ASSERT(pag->pagf_bno_level == be32_to_cpu(agf->agf_bno_level)); - ASSERT(pag->pagf_cnt_level == be32_to_cpu(agf->agf_cnt_level)); + /* + * It's possible for the AGF to be out of sync if the block device is + * silently dropping writes. This can happen in fstests with dmflakey + * enabled, which allows the buffer to be cleaned and reclaimed by + * memory pressure and then re-read from disk here. We will get a + * stale version of the AGF from disk, and nothing good can happen from + * here. Hence if we detect this situation, immediately shut down the + * filesystem. + * + * This can also happen if we are already in the middle of a forced + * shutdown, so don't bother checking if we are already shut down. + */ + if (!xfs_is_shutdown(pag_mount(pag))) { + bool ok = true; + + ok &= pag->pagf_freeblks == be32_to_cpu(agf->agf_freeblks); + ok &= pag->pagf_freeblks == be32_to_cpu(agf->agf_freeblks); + ok &= pag->pagf_btreeblks == be32_to_cpu(agf->agf_btreeblks); + ok &= pag->pagf_flcount == be32_to_cpu(agf->agf_flcount); + ok &= pag->pagf_longest == be32_to_cpu(agf->agf_longest); + ok &= pag->pagf_bno_level == be32_to_cpu(agf->agf_bno_level); + ok &= pag->pagf_cnt_level == be32_to_cpu(agf->agf_cnt_level); + + if (XFS_IS_CORRUPT(pag_mount(pag), !ok)) { + xfs_ag_mark_sick(pag, XFS_SICK_AG_AGF); + xfs_trans_brelse(tp, agfbp); + xfs_force_shutdown(pag_mount(pag), + SHUTDOWN_CORRUPT_ONDISK); + return -EFSCORRUPTED; + } } -#endif +#endif /* DEBUG */ + if (agfbpp) *agfbpp = agfbp; else diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c index 0c47b5c6ca7d..750111634d9f 100644 --- a/fs/xfs/libxfs/xfs_ialloc.c +++ b/fs/xfs/libxfs/xfs_ialloc.c @@ -2801,12 +2801,35 @@ xfs_ialloc_read_agi( set_bit(XFS_AGSTATE_AGI_INIT, &pag->pag_opstate); } +#ifdef DEBUG /* - * It's possible for these to be out of sync if - * we are in the middle of a forced shutdown. + * It's possible for the AGF to be out of sync if the block device is + * silently dropping writes. This can happen in fstests with dmflakey + * enabled, which allows the buffer to be cleaned and reclaimed by + * memory pressure and then re-read from disk here. We will get a + * stale version of the AGF from disk, and nothing good can happen from + * here. Hence if we detect this situation, immediately shut down the + * filesystem. + * + * This can also happen if we are already in the middle of a forced + * shutdown, so don't bother checking if we are already shut down. */ - ASSERT(pag->pagi_freecount == be32_to_cpu(agi->agi_freecount) || - xfs_is_shutdown(pag_mount(pag))); + if (!xfs_is_shutdown(pag_mount(pag))) { + bool ok = true; + + ok &= pag->pagi_freecount == be32_to_cpu(agi->agi_freecount); + ok &= pag->pagi_count == be32_to_cpu(agi->agi_count); + + if (XFS_IS_CORRUPT(pag_mount(pag), !ok)) { + xfs_ag_mark_sick(pag, XFS_SICK_AG_AGI); + xfs_trans_brelse(tp, agibp); + xfs_force_shutdown(pag_mount(pag), + SHUTDOWN_CORRUPT_ONDISK); + return -EFSCORRUPTED; + } + } +#endif /* DEBUG */ + if (agibpp) *agibpp = agibp; else From d62016b1a2df24c8608fe83cd3ae8090412881b3 Mon Sep 17 00:00:00 2001 From: Dave Chinner Date: Thu, 26 Jun 2025 08:48:56 +1000 Subject: [PATCH 240/289] xfs: avoid dquot buffer pin deadlock On shutdown when quotas are enabled, the shutdown can deadlock trying to unpin the dquot buffer buf_log_item like so: [ 3319.483590] task:kworker/20:0H state:D stack:14360 pid:1962230 tgid:1962230 ppid:2 task_flags:0x4208060 flags:0x00004000 [ 3319.493966] Workqueue: xfs-log/dm-6 xlog_ioend_work [ 3319.498458] Call Trace: [ 3319.500800] [ 3319.502809] __schedule+0x699/0xb70 [ 3319.512672] schedule+0x64/0xd0 [ 3319.515573] schedule_timeout+0x30/0xf0 [ 3319.528125] __down_common+0xc3/0x200 [ 3319.531488] __down+0x1d/0x30 [ 3319.534186] down+0x48/0x50 [ 3319.540501] xfs_buf_lock+0x3d/0xe0 [ 3319.543609] xfs_buf_item_unpin+0x85/0x1b0 [ 3319.547248] xlog_cil_committed+0x289/0x570 [ 3319.571411] xlog_cil_process_committed+0x6d/0x90 [ 3319.575590] xlog_state_shutdown_callbacks+0x52/0x110 [ 3319.580017] xlog_force_shutdown+0x169/0x1a0 [ 3319.583780] xlog_ioend_work+0x7c/0xb0 [ 3319.587049] process_scheduled_works+0x1d6/0x400 [ 3319.591127] worker_thread+0x202/0x2e0 [ 3319.594452] kthread+0x20c/0x240 The CIL push has seen the deadlock, so it has aborted the push and is running CIL checkpoint completion to abort all the items in the checkpoint. This calls ->iop_unpin(remove = true) to clean up the log items in the checkpoint. When a buffer log item is unpined like this, it needs to lock the buffer to run io completion to correctly fail the buffer and run all the required completions to fail attached log items as well. In this case, the attempt to lock the buffer on unpin is hanging because the buffer is already locked. I suspected a leaked XFS_BLI_HOLD state because of XFS_BLI_STALE handling changes I was testing, so I went looking for pin events on HOLD buffers and unpin events on locked buffer. That isolated this one buffer with these two events: xfs_buf_item_pin: dev 251:6 daddr 0xa910 bbcount 0x2 hold 2 pincount 0 lock 0 flags DONE|KMEM recur 0 refcount 1 bliflags HOLD|DIRTY|LOGGED liflags DIRTY .... xfs_buf_item_unpin: dev 251:6 daddr 0xa910 bbcount 0x2 hold 4 pincount 1 lock 0 flags DONE|KMEM recur 0 refcount 1 bliflags DIRTY liflags ABORTED Firstly, bbcount = 0x2, which means it is not a single sector structure. That rules out every xfs_trans_bhold() case except one: dquot buffers. Then hung task dumping gave this trace: [ 3197.312078] task:fsync-tester state:D stack:12080 pid:2051125 tgid:2051125 ppid:1643233 task_flags:0x400000 flags:0x00004002 [ 3197.323007] Call Trace: [ 3197.325581] [ 3197.327727] __schedule+0x699/0xb70 [ 3197.334582] schedule+0x64/0xd0 [ 3197.337672] schedule_timeout+0x30/0xf0 [ 3197.350139] wait_for_completion+0xbd/0x180 [ 3197.354235] __flush_workqueue+0xef/0x4e0 [ 3197.362229] xlog_cil_force_seq+0xa0/0x300 [ 3197.374447] xfs_log_force+0x77/0x230 [ 3197.378015] xfs_qm_dqunpin_wait+0x49/0xf0 [ 3197.382010] xfs_qm_dqflush+0x55/0x460 [ 3197.385663] xfs_qm_dquot_isolate+0x29e/0x4d0 [ 3197.389977] __list_lru_walk_one+0x141/0x220 [ 3197.398867] list_lru_walk_one+0x10/0x20 [ 3197.402713] xfs_qm_shrink_scan+0x6a/0x100 [ 3197.406699] do_shrink_slab+0x18a/0x350 [ 3197.410512] shrink_slab+0xf7/0x430 [ 3197.413967] drop_slab+0x97/0xf0 [ 3197.417121] drop_caches_sysctl_handler+0x59/0xc0 [ 3197.421654] proc_sys_call_handler+0x18b/0x280 [ 3197.426050] proc_sys_write+0x13/0x20 [ 3197.429750] vfs_write+0x2b8/0x3e0 [ 3197.438532] ksys_write+0x7e/0xf0 [ 3197.441742] __x64_sys_write+0x1b/0x30 [ 3197.445363] x64_sys_call+0x2c72/0x2f60 [ 3197.449044] do_syscall_64+0x6c/0x140 [ 3197.456341] entry_SYSCALL_64_after_hwframe+0x76/0x7e Yup, another test run by check-parallel is running drop_caches concurrently and the dquot shrinker for the hung filesystem is running. That's trying to flush a dirty dquot from reclaim context, and it waiting on a log force to complete. xfs_qm_dqflush is called with the dquot buffer held locked, and so we've called xfs_log_force() with that buffer locked. Now the log force is waiting for a workqueue flush to complete, and that workqueue flush is waiting of CIL checkpoint processing to finish. The CIL checkpoint processing is aborting all the log items it has, and that requires locking aborted buffers to cancel them. Now, normally this isn't a problem if we are issuing a log force to unpin an object, because the ->iop_unpin() method wakes pin waiters first. That results in the pin waiter finishing off whatever it was doing, dropping the lock and then xfs_buf_item_unpin() can lock the buffer and fail it. However, xfs_qm_dqflush() is waiting on the -dquot- unpin event, not the dquot buffer unpin event, and so it never gets woken and so does not drop the buffer lock. Inodes do not have this problem, as they can only be written from one spot (->iop_push) whilst dquots can be written from multiple places (memory reclaim, ->iop_push, xfs_dq_dqpurge, and quotacheck). The reason that the dquot buffer has an attached buffer log item is that it has been recently allocated. Initialisation of the dquot buffer logs the buffer directly, thereby pinning it in memory. We then modify the dquot in a separate operation, and have memory reclaim racing with a shutdown and we trigger this deadlock. check-parallel reproduces this reliably on 1kB FSB filesystems with quota enabled because it does all of these things concurrently without having to explicitly write tests to exercise these corner case conditions. xfs_qm_dquot_logitem_push() doesn't have this deadlock because it checks if the dquot is pinned before locking the dquot buffer and skipping it if it is pinned. This means the xfs_qm_dqunpin_wait() log force in xfs_qm_dqflush() never triggers and we unlock the buffer safely allowing a concurrent shutdown to fail the buffer appropriately. xfs_qm_dqpurge() could have this problem as it is called from quotacheck and we might have allocated dquot buffers when recording the quota updates. This can be fixed by calling xfs_qm_dqunpin_wait() before we lock the dquot buffer. Because we hold the dquot locked, nothing will be able to add to the pin count between the unpin_wait and the dqflush callout, so this now makes xfs_qm_dqpurge() safe against this race. xfs_qm_dquot_isolate() can also be fixed this same way but, quite frankly, we shouldn't be doing IO in memory reclaim context. If the dquot is pinned or dirty, simply rotate it and let memory reclaim come back to it later, same as we do for inodes. This then gets rid of the nasty issue in xfs_qm_flush_one() where quotacheck writeback races with memory reclaim flushing the dquots. We can lift xfs_qm_dqunpin_wait() up into this code, then get rid of the "can't get the dqflush lock" buffer write to cycle the dqlfush lock and enable it to be flushed again. checking if the dquot is pinned and returning -EAGAIN so that the dquot walk will revisit the dquot again later. Finally, with xfs_qm_dqunpin_wait() lifted into all the callers, we can remove it from the xfs_qm_dqflush() code. Signed-off-by: Dave Chinner Reviewed-by: Carlos Maiolino Signed-off-by: Carlos Maiolino --- fs/xfs/xfs_buf.c | 38 -------------------- fs/xfs/xfs_buf.h | 1 - fs/xfs/xfs_dquot.c | 4 +-- fs/xfs/xfs_qm.c | 86 ++++++++++------------------------------------ fs/xfs/xfs_trace.h | 1 - 5 files changed, 20 insertions(+), 110 deletions(-) diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c index 8af83bd161f9..ba5bd6031ece 100644 --- a/fs/xfs/xfs_buf.c +++ b/fs/xfs/xfs_buf.c @@ -2082,44 +2082,6 @@ xfs_buf_delwri_submit( return error; } -/* - * Push a single buffer on a delwri queue. - * - * The purpose of this function is to submit a single buffer of a delwri queue - * and return with the buffer still on the original queue. - * - * The buffer locking and queue management logic between _delwri_pushbuf() and - * _delwri_queue() guarantee that the buffer cannot be queued to another list - * before returning. - */ -int -xfs_buf_delwri_pushbuf( - struct xfs_buf *bp, - struct list_head *buffer_list) -{ - int error; - - ASSERT(bp->b_flags & _XBF_DELWRI_Q); - - trace_xfs_buf_delwri_pushbuf(bp, _RET_IP_); - - xfs_buf_lock(bp); - bp->b_flags &= ~(_XBF_DELWRI_Q | XBF_ASYNC); - bp->b_flags |= XBF_WRITE; - xfs_buf_submit(bp); - - /* - * The buffer is now locked, under I/O but still on the original delwri - * queue. Wait for I/O completion, restore the DELWRI_Q flag and - * return with the buffer unlocked and still on the original queue. - */ - error = xfs_buf_iowait(bp); - bp->b_flags |= _XBF_DELWRI_Q; - xfs_buf_unlock(bp); - - return error; -} - void xfs_buf_set_ref(struct xfs_buf *bp, int lru_ref) { /* diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h index 9d2ab567cf81..15fc56948346 100644 --- a/fs/xfs/xfs_buf.h +++ b/fs/xfs/xfs_buf.h @@ -326,7 +326,6 @@ extern bool xfs_buf_delwri_queue(struct xfs_buf *, struct list_head *); void xfs_buf_delwri_queue_here(struct xfs_buf *bp, struct list_head *bl); extern int xfs_buf_delwri_submit(struct list_head *); extern int xfs_buf_delwri_submit_nowait(struct list_head *); -extern int xfs_buf_delwri_pushbuf(struct xfs_buf *, struct list_head *); static inline xfs_daddr_t xfs_buf_daddr(struct xfs_buf *bp) { diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c index b4e32f0860b7..0bd8022e47b4 100644 --- a/fs/xfs/xfs_dquot.c +++ b/fs/xfs/xfs_dquot.c @@ -1398,11 +1398,9 @@ xfs_qm_dqflush( ASSERT(XFS_DQ_IS_LOCKED(dqp)); ASSERT(!completion_done(&dqp->q_flush)); + ASSERT(atomic_read(&dqp->q_pincount) == 0); trace_xfs_dqflush(dqp); - - xfs_qm_dqunpin_wait(dqp); - fa = xfs_qm_dqflush_check(dqp); if (fa) { xfs_alert(mp, "corrupt dquot ID 0x%x in memory at %pS", diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c index 417439b58785..fa135ac26471 100644 --- a/fs/xfs/xfs_qm.c +++ b/fs/xfs/xfs_qm.c @@ -134,6 +134,7 @@ xfs_qm_dqpurge( dqp->q_flags |= XFS_DQFLAG_FREEING; + xfs_qm_dqunpin_wait(dqp); xfs_dqflock(dqp); /* @@ -465,6 +466,7 @@ xfs_qm_dquot_isolate( struct xfs_dquot *dqp = container_of(item, struct xfs_dquot, q_lru); struct xfs_qm_isolate *isol = arg; + enum lru_status ret = LRU_SKIP; if (!xfs_dqlock_nowait(dqp)) goto out_miss_busy; @@ -477,6 +479,16 @@ xfs_qm_dquot_isolate( if (dqp->q_flags & XFS_DQFLAG_FREEING) goto out_miss_unlock; + /* + * If the dquot is pinned or dirty, rotate it to the end of the LRU to + * give some time for it to be cleaned before we try to isolate it + * again. + */ + ret = LRU_ROTATE; + if (XFS_DQ_IS_DIRTY(dqp) || atomic_read(&dqp->q_pincount) > 0) { + goto out_miss_unlock; + } + /* * This dquot has acquired a reference in the meantime remove it from * the freelist and try again. @@ -492,41 +504,14 @@ xfs_qm_dquot_isolate( } /* - * If the dquot is dirty, flush it. If it's already being flushed, just - * skip it so there is time for the IO to complete before we try to - * reclaim it again on the next LRU pass. + * The dquot may still be under IO, in which case the flush lock will be + * held. If we can't get the flush lock now, just skip over the dquot as + * if it was dirty. */ if (!xfs_dqflock_nowait(dqp)) goto out_miss_unlock; - if (XFS_DQ_IS_DIRTY(dqp)) { - struct xfs_buf *bp = NULL; - int error; - - trace_xfs_dqreclaim_dirty(dqp); - - /* we have to drop the LRU lock to flush the dquot */ - spin_unlock(&lru->lock); - - error = xfs_dquot_use_attached_buf(dqp, &bp); - if (!bp || error == -EAGAIN) { - xfs_dqfunlock(dqp); - goto out_unlock_dirty; - } - - /* - * dqflush completes dqflock on error, and the delwri ioend - * does it on success. - */ - error = xfs_qm_dqflush(dqp, bp); - if (error) - goto out_unlock_dirty; - - xfs_buf_delwri_queue(bp, &isol->buffers); - xfs_buf_relse(bp); - goto out_unlock_dirty; - } - + ASSERT(!XFS_DQ_IS_DIRTY(dqp)); xfs_dquot_detach_buf(dqp); xfs_dqfunlock(dqp); @@ -548,13 +533,7 @@ out_miss_unlock: out_miss_busy: trace_xfs_dqreclaim_busy(dqp); XFS_STATS_INC(dqp->q_mount, xs_qm_dqreclaim_misses); - return LRU_SKIP; - -out_unlock_dirty: - trace_xfs_dqreclaim_busy(dqp); - XFS_STATS_INC(dqp->q_mount, xs_qm_dqreclaim_misses); - xfs_dqunlock(dqp); - return LRU_RETRY; + return ret; } static unsigned long @@ -1486,7 +1465,6 @@ xfs_qm_flush_one( struct xfs_dquot *dqp, void *data) { - struct xfs_mount *mp = dqp->q_mount; struct list_head *buffer_list = data; struct xfs_buf *bp = NULL; int error = 0; @@ -1497,34 +1475,8 @@ xfs_qm_flush_one( if (!XFS_DQ_IS_DIRTY(dqp)) goto out_unlock; - /* - * The only way the dquot is already flush locked by the time quotacheck - * gets here is if reclaim flushed it before the dqadjust walk dirtied - * it for the final time. Quotacheck collects all dquot bufs in the - * local delwri queue before dquots are dirtied, so reclaim can't have - * possibly queued it for I/O. The only way out is to push the buffer to - * cycle the flush lock. - */ - if (!xfs_dqflock_nowait(dqp)) { - /* buf is pinned in-core by delwri list */ - error = xfs_buf_incore(mp->m_ddev_targp, dqp->q_blkno, - mp->m_quotainfo->qi_dqchunklen, 0, &bp); - if (error) - goto out_unlock; - - if (!(bp->b_flags & _XBF_DELWRI_Q)) { - error = -EAGAIN; - xfs_buf_relse(bp); - goto out_unlock; - } - xfs_buf_unlock(bp); - - xfs_buf_delwri_pushbuf(bp, buffer_list); - xfs_buf_rele(bp); - - error = -EAGAIN; - goto out_unlock; - } + xfs_qm_dqunpin_wait(dqp); + xfs_dqflock(dqp); error = xfs_dquot_use_attached_buf(dqp, &bp); if (error) diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h index 01d284a1c759..9f0d6bc966b7 100644 --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -778,7 +778,6 @@ DEFINE_BUF_EVENT(xfs_buf_iowait_done); DEFINE_BUF_EVENT(xfs_buf_delwri_queue); DEFINE_BUF_EVENT(xfs_buf_delwri_queued); DEFINE_BUF_EVENT(xfs_buf_delwri_split); -DEFINE_BUF_EVENT(xfs_buf_delwri_pushbuf); DEFINE_BUF_EVENT(xfs_buf_get_uncached); DEFINE_BUF_EVENT(xfs_buf_item_relse); DEFINE_BUF_EVENT(xfs_buf_iodone_async); From fc48627b9c22f4d18651ca72ba171952d7a26004 Mon Sep 17 00:00:00 2001 From: Dave Chinner Date: Thu, 26 Jun 2025 08:48:57 +1000 Subject: [PATCH 241/289] xfs: add tracepoints for stale pinned inode state debug I needed more insight into how stale inodes were getting stuck on the AIL after a forced shutdown when running fsstress. These are the tracepoints I added for that purpose. Signed-off-by: Dave Chinner Reviewed-by: Carlos Maiolino Signed-off-by: Carlos Maiolino --- fs/xfs/xfs_inode_item.c | 5 ++++- fs/xfs/xfs_log_cil.c | 4 +++- fs/xfs/xfs_trace.h | 9 ++++++++- fs/xfs/xfs_trans.c | 4 +++- 4 files changed, 18 insertions(+), 4 deletions(-) diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c index c6cb0b6b9e46..285e27ff89e2 100644 --- a/fs/xfs/xfs_inode_item.c +++ b/fs/xfs/xfs_inode_item.c @@ -758,11 +758,14 @@ xfs_inode_item_push( * completed and items removed from the AIL before the next push * attempt. */ + trace_xfs_inode_push_stale(ip, _RET_IP_); return XFS_ITEM_PINNED; } - if (xfs_ipincount(ip) > 0 || xfs_buf_ispinned(bp)) + if (xfs_ipincount(ip) > 0 || xfs_buf_ispinned(bp)) { + trace_xfs_inode_push_pinned(ip, _RET_IP_); return XFS_ITEM_PINNED; + } if (xfs_iflags_test(ip, XFS_IFLUSHING)) return XFS_ITEM_FLUSHING; diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c index f66d2d430e4f..a80cb6b9969a 100644 --- a/fs/xfs/xfs_log_cil.c +++ b/fs/xfs/xfs_log_cil.c @@ -793,8 +793,10 @@ xlog_cil_ail_insert( struct xfs_log_item *lip = lv->lv_item; xfs_lsn_t item_lsn; - if (aborted) + if (aborted) { + trace_xlog_ail_insert_abort(lip); set_bit(XFS_LI_ABORTED, &lip->li_flags); + } if (lip->li_ops->flags & XFS_ITEM_RELEASE_WHEN_COMMITTED) { lip->li_ops->iop_release(lip); diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h index 9f0d6bc966b7..ba45d801df1c 100644 --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -1146,6 +1146,7 @@ DECLARE_EVENT_CLASS(xfs_iref_class, __field(xfs_ino_t, ino) __field(int, count) __field(int, pincount) + __field(unsigned long, iflags) __field(unsigned long, caller_ip) ), TP_fast_assign( @@ -1153,13 +1154,15 @@ DECLARE_EVENT_CLASS(xfs_iref_class, __entry->ino = ip->i_ino; __entry->count = atomic_read(&VFS_I(ip)->i_count); __entry->pincount = atomic_read(&ip->i_pincount); + __entry->iflags = ip->i_flags; __entry->caller_ip = caller_ip; ), - TP_printk("dev %d:%d ino 0x%llx count %d pincount %d caller %pS", + TP_printk("dev %d:%d ino 0x%llx count %d pincount %d iflags 0x%lx caller %pS", MAJOR(__entry->dev), MINOR(__entry->dev), __entry->ino, __entry->count, __entry->pincount, + __entry->iflags, (char *)__entry->caller_ip) ) @@ -1249,6 +1252,8 @@ DEFINE_IREF_EVENT(xfs_irele); DEFINE_IREF_EVENT(xfs_inode_pin); DEFINE_IREF_EVENT(xfs_inode_unpin); DEFINE_IREF_EVENT(xfs_inode_unpin_nowait); +DEFINE_IREF_EVENT(xfs_inode_push_pinned); +DEFINE_IREF_EVENT(xfs_inode_push_stale); DECLARE_EVENT_CLASS(xfs_namespace_class, TP_PROTO(struct xfs_inode *dp, const struct xfs_name *name), @@ -1653,6 +1658,8 @@ DEFINE_LOG_ITEM_EVENT(xfs_ail_flushing); DEFINE_LOG_ITEM_EVENT(xfs_cil_whiteout_mark); DEFINE_LOG_ITEM_EVENT(xfs_cil_whiteout_skip); DEFINE_LOG_ITEM_EVENT(xfs_cil_whiteout_unpin); +DEFINE_LOG_ITEM_EVENT(xlog_ail_insert_abort); +DEFINE_LOG_ITEM_EVENT(xfs_trans_free_abort); DECLARE_EVENT_CLASS(xfs_ail_class, TP_PROTO(struct xfs_log_item *lip, xfs_lsn_t old_lsn, xfs_lsn_t new_lsn), diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c index c6657072361a..b4a07af513ba 100644 --- a/fs/xfs/xfs_trans.c +++ b/fs/xfs/xfs_trans.c @@ -742,8 +742,10 @@ xfs_trans_free_items( list_for_each_entry_safe(lip, next, &tp->t_items, li_trans) { xfs_trans_del_item(lip); - if (abort) + if (abort) { + trace_xfs_trans_free_abort(lip); set_bit(XFS_LI_ABORTED, &lip->li_flags); + } if (lip->li_ops->iop_release) lip->li_ops->iop_release(lip); } From d2fe5c4c8d25999862d615f616aea7befdd62799 Mon Sep 17 00:00:00 2001 From: Dave Chinner Date: Thu, 26 Jun 2025 08:48:58 +1000 Subject: [PATCH 242/289] xfs: rearrange code in xfs_buf_item.c The code to initialise, release and free items is all the way down the bottom of the file. Upcoming fixes need to these functions earlier in the file, so move them to the top. There is one code change in this move - the parameter to xfs_buf_item_relse() is changed from the xfs_buf to the xfs_buf_log_item - the thing that the function is releasing. Signed-off-by: Dave Chinner Reviewed-by: Carlos Maiolino Signed-off-by: Carlos Maiolino --- fs/xfs/xfs_buf_item.c | 116 +++++++++++++++++++++--------------------- fs/xfs/xfs_buf_item.h | 1 - 2 files changed, 58 insertions(+), 59 deletions(-) diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c index 90139e0f3271..3e3c0f65a25c 100644 --- a/fs/xfs/xfs_buf_item.c +++ b/fs/xfs/xfs_buf_item.c @@ -32,6 +32,61 @@ static inline struct xfs_buf_log_item *BUF_ITEM(struct xfs_log_item *lip) return container_of(lip, struct xfs_buf_log_item, bli_item); } +static void +xfs_buf_item_get_format( + struct xfs_buf_log_item *bip, + int count) +{ + ASSERT(bip->bli_formats == NULL); + bip->bli_format_count = count; + + if (count == 1) { + bip->bli_formats = &bip->__bli_format; + return; + } + + bip->bli_formats = kzalloc(count * sizeof(struct xfs_buf_log_format), + GFP_KERNEL | __GFP_NOFAIL); +} + +static void +xfs_buf_item_free_format( + struct xfs_buf_log_item *bip) +{ + if (bip->bli_formats != &bip->__bli_format) { + kfree(bip->bli_formats); + bip->bli_formats = NULL; + } +} + +static void +xfs_buf_item_free( + struct xfs_buf_log_item *bip) +{ + xfs_buf_item_free_format(bip); + kvfree(bip->bli_item.li_lv_shadow); + kmem_cache_free(xfs_buf_item_cache, bip); +} + +/* + * xfs_buf_item_relse() is called when the buf log item is no longer needed. + */ +static void +xfs_buf_item_relse( + struct xfs_buf_log_item *bip) +{ + struct xfs_buf *bp = bip->bli_buf; + + trace_xfs_buf_item_relse(bp, _RET_IP_); + + ASSERT(!test_bit(XFS_LI_IN_AIL, &bip->bli_item.li_flags)); + ASSERT(atomic_read(&bip->bli_refcount) == 0); + + bp->b_log_item = NULL; + xfs_buf_rele(bp); + xfs_buf_item_free(bip); +} + /* Is this log iovec plausibly large enough to contain the buffer log format? */ bool xfs_buf_log_check_iovec( @@ -468,7 +523,7 @@ xfs_buf_item_unpin( ASSERT(list_empty(&bp->b_li_list)); } else { xfs_trans_ail_delete(lip, SHUTDOWN_LOG_IO_ERROR); - xfs_buf_item_relse(bp); + xfs_buf_item_relse(bip); ASSERT(bp->b_log_item == NULL); } xfs_buf_relse(bp); @@ -578,7 +633,7 @@ xfs_buf_item_put( */ if (aborted) xfs_trans_ail_delete(lip, 0); - xfs_buf_item_relse(bip->bli_buf); + xfs_buf_item_relse(bip); return true; } @@ -729,33 +784,6 @@ static const struct xfs_item_ops xfs_buf_item_ops = { .iop_push = xfs_buf_item_push, }; -STATIC void -xfs_buf_item_get_format( - struct xfs_buf_log_item *bip, - int count) -{ - ASSERT(bip->bli_formats == NULL); - bip->bli_format_count = count; - - if (count == 1) { - bip->bli_formats = &bip->__bli_format; - return; - } - - bip->bli_formats = kzalloc(count * sizeof(struct xfs_buf_log_format), - GFP_KERNEL | __GFP_NOFAIL); -} - -STATIC void -xfs_buf_item_free_format( - struct xfs_buf_log_item *bip) -{ - if (bip->bli_formats != &bip->__bli_format) { - kfree(bip->bli_formats); - bip->bli_formats = NULL; - } -} - /* * Allocate a new buf log item to go with the given buffer. * Set the buffer's b_log_item field to point to the new @@ -976,34 +1004,6 @@ xfs_buf_item_dirty_format( return false; } -STATIC void -xfs_buf_item_free( - struct xfs_buf_log_item *bip) -{ - xfs_buf_item_free_format(bip); - kvfree(bip->bli_item.li_lv_shadow); - kmem_cache_free(xfs_buf_item_cache, bip); -} - -/* - * xfs_buf_item_relse() is called when the buf log item is no longer needed. - */ -void -xfs_buf_item_relse( - struct xfs_buf *bp) -{ - struct xfs_buf_log_item *bip = bp->b_log_item; - - trace_xfs_buf_item_relse(bp, _RET_IP_); - ASSERT(!test_bit(XFS_LI_IN_AIL, &bip->bli_item.li_flags)); - - if (atomic_read(&bip->bli_refcount)) - return; - bp->b_log_item = NULL; - xfs_buf_rele(bp); - xfs_buf_item_free(bip); -} - void xfs_buf_item_done( struct xfs_buf *bp) @@ -1023,5 +1023,5 @@ xfs_buf_item_done( xfs_trans_ail_delete(&bp->b_log_item->bli_item, (bp->b_flags & _XBF_LOGRECOVERY) ? 0 : SHUTDOWN_CORRUPT_INCORE); - xfs_buf_item_relse(bp); + xfs_buf_item_relse(bp->b_log_item); } diff --git a/fs/xfs/xfs_buf_item.h b/fs/xfs/xfs_buf_item.h index e10e324cd245..50dd79b59cf5 100644 --- a/fs/xfs/xfs_buf_item.h +++ b/fs/xfs/xfs_buf_item.h @@ -49,7 +49,6 @@ struct xfs_buf_log_item { int xfs_buf_item_init(struct xfs_buf *, struct xfs_mount *); void xfs_buf_item_done(struct xfs_buf *bp); -void xfs_buf_item_relse(struct xfs_buf *); bool xfs_buf_item_put(struct xfs_buf_log_item *); void xfs_buf_item_log(struct xfs_buf_log_item *, uint, uint); bool xfs_buf_item_dirty_format(struct xfs_buf_log_item *); From 816c330b605c3f4813c0dc0ab5af5cce17ff06b3 Mon Sep 17 00:00:00 2001 From: Dave Chinner Date: Thu, 26 Jun 2025 08:48:59 +1000 Subject: [PATCH 243/289] xfs: factor out stale buffer item completion The stale buffer item completion handling is currently only done from BLI unpinning. We need to perform this function from where-ever the last reference to the BLI is dropped, so first we need to factor this code out into a helper. Signed-off-by: Dave Chinner Reviewed-by: Carlos Maiolino Signed-off-by: Carlos Maiolino --- fs/xfs/xfs_buf_item.c | 60 ++++++++++++++++++++++++++----------------- 1 file changed, 37 insertions(+), 23 deletions(-) diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c index 3e3c0f65a25c..c95826863c82 100644 --- a/fs/xfs/xfs_buf_item.c +++ b/fs/xfs/xfs_buf_item.c @@ -444,6 +444,42 @@ xfs_buf_item_pin( atomic_inc(&bip->bli_buf->b_pin_count); } +/* + * For a stale BLI, process all the necessary completions that must be + * performed when the final BLI reference goes away. The buffer will be + * referenced and locked here - we return to the caller with the buffer still + * referenced and locked for them to finalise processing of the buffer. + */ +static void +xfs_buf_item_finish_stale( + struct xfs_buf_log_item *bip) +{ + struct xfs_buf *bp = bip->bli_buf; + struct xfs_log_item *lip = &bip->bli_item; + + ASSERT(bip->bli_flags & XFS_BLI_STALE); + ASSERT(xfs_buf_islocked(bp)); + ASSERT(bp->b_flags & XBF_STALE); + ASSERT(bip->__bli_format.blf_flags & XFS_BLF_CANCEL); + ASSERT(list_empty(&lip->li_trans)); + ASSERT(!bp->b_transp); + + if (bip->bli_flags & XFS_BLI_STALE_INODE) { + xfs_buf_item_done(bp); + xfs_buf_inode_iodone(bp); + ASSERT(list_empty(&bp->b_li_list)); + return; + } + + /* + * We may or may not be on the AIL here, xfs_trans_ail_delete() will do + * the right thing regardless of the situation in which we are called. + */ + xfs_trans_ail_delete(lip, SHUTDOWN_LOG_IO_ERROR); + xfs_buf_item_relse(bip); + ASSERT(bp->b_log_item == NULL); +} + /* * This is called to unpin the buffer associated with the buf log item which was * previously pinned with a call to xfs_buf_item_pin(). We enter this function @@ -493,13 +529,6 @@ xfs_buf_item_unpin( } if (stale) { - ASSERT(bip->bli_flags & XFS_BLI_STALE); - ASSERT(xfs_buf_islocked(bp)); - ASSERT(bp->b_flags & XBF_STALE); - ASSERT(bip->__bli_format.blf_flags & XFS_BLF_CANCEL); - ASSERT(list_empty(&lip->li_trans)); - ASSERT(!bp->b_transp); - trace_xfs_buf_item_unpin_stale(bip); /* @@ -510,22 +539,7 @@ xfs_buf_item_unpin( * processing is complete. */ xfs_buf_rele(bp); - - /* - * If we get called here because of an IO error, we may or may - * not have the item on the AIL. xfs_trans_ail_delete() will - * take care of that situation. xfs_trans_ail_delete() drops - * the AIL lock. - */ - if (bip->bli_flags & XFS_BLI_STALE_INODE) { - xfs_buf_item_done(bp); - xfs_buf_inode_iodone(bp); - ASSERT(list_empty(&bp->b_li_list)); - } else { - xfs_trans_ail_delete(lip, SHUTDOWN_LOG_IO_ERROR); - xfs_buf_item_relse(bip); - ASSERT(bp->b_log_item == NULL); - } + xfs_buf_item_finish_stale(bip); xfs_buf_relse(bp); return; } From 7b5f775be14ac1532c049022feadcfe44769566d Mon Sep 17 00:00:00 2001 From: Dave Chinner Date: Thu, 26 Jun 2025 08:49:00 +1000 Subject: [PATCH 244/289] xfs: fix unmount hang with unflushable inodes stuck in the AIL Unmount of a shutdown filesystem can hang with stale inode cluster buffers in the AIL like so: [95964.140623] Call Trace: [95964.144641] __schedule+0x699/0xb70 [95964.154003] schedule+0x64/0xd0 [95964.156851] xfs_ail_push_all_sync+0x9b/0xf0 [95964.164816] xfs_unmount_flush_inodes+0x41/0x70 [95964.168698] xfs_unmountfs+0x7f/0x170 [95964.171846] xfs_fs_put_super+0x3b/0x90 [95964.175216] generic_shutdown_super+0x77/0x160 [95964.178060] kill_block_super+0x1b/0x40 [95964.180553] xfs_kill_sb+0x12/0x30 [95964.182796] deactivate_locked_super+0x38/0x100 [95964.185735] deactivate_super+0x41/0x50 [95964.188245] cleanup_mnt+0x9f/0x160 [95964.190519] __cleanup_mnt+0x12/0x20 [95964.192899] task_work_run+0x89/0xb0 [95964.195221] resume_user_mode_work+0x4f/0x60 [95964.197931] syscall_exit_to_user_mode+0x76/0xb0 [95964.201003] do_syscall_64+0x74/0x130 $ pstree -N mnt |grep umount |-check-parallel---nsexec---run_test.sh---753---umount It always seems to be generic/753 that triggers this, and repeating a quick group test run triggers it every 10-15 iterations. Hence it generally triggers once up every 30-40 minutes of test time. just running generic/753 by itself or concurrently with a limited group of tests doesn't reproduce this issue at all. Tracing on a hung system shows the AIL repeating every 50ms a log force followed by an attempt to push pinned, aborted inodes from the AIL (trimmed for brevity): xfs_log_force: lsn 0x1c caller xfsaild+0x18e xfs_log_force: lsn 0x0 caller xlog_cil_flush+0xbd xfs_log_force: lsn 0x1c caller xfs_log_force+0x77 xfs_ail_pinned: lip 0xffff88826014afa0 lsn 1/37472 type XFS_LI_INODE flags IN_AIL|ABORTED xfs_ail_pinned: lip 0xffff88814000a708 lsn 1/37472 type XFS_LI_INODE flags IN_AIL|ABORTED xfs_ail_pinned: lip 0xffff88810b850c80 lsn 1/37472 type XFS_LI_INODE flags IN_AIL|ABORTED xfs_ail_pinned: lip 0xffff88810b850af0 lsn 1/37472 type XFS_LI_INODE flags IN_AIL|ABORTED xfs_ail_pinned: lip 0xffff888165cf0a28 lsn 1/37472 type XFS_LI_INODE flags IN_AIL|ABORTED xfs_ail_pinned: lip 0xffff88810b850bb8 lsn 1/37472 type XFS_LI_INODE flags IN_AIL|ABORTED .... The inode log items are marked as aborted, which means that either: a) a transaction commit has occurred, seen an error or shutdown, and called xfs_trans_free_items() to abort the items. This should happen before any pinning of log items occurs. or b) a dirty transaction has been cancelled. This should also happen before any pinning of log items occurs. or c) AIL insertion at journal IO completion is marked as aborted. In this case, the log item is pinned by the CIL until journal IO completes and hence needs to be unpinned. This is then done after the ->iop_committed() callback is run, so the pin count should be balanced correctly. Yet none of these seemed to be occurring. Further tracing indicated this: d) Shutdown during CIL pushing resulting in log item completion being called from checkpoint abort processing. Items are unpinned and released without serialisation against each other, journal IO completion or transaction commit completion. In this case, we may still have a transaction commit in flight that holds a reference to a xfs_buf_log_item (BLI) after CIL insertion. e.g. a synchronous transaction will flush the CIL before the transaction is torn down. The concurrent CIL push then aborts insertion it and drops the commit/AIL reference to the BLI. This can leave the transaction commit context with the last reference to the BLI which is dropped here: xfs_trans_free_items() ->iop_release xfs_buf_item_release xfs_buf_item_put if (XFS_LI_ABORTED) xfs_trans_ail_delete xfs_buf_item_relse() Unlike the journal completion ->iop_unpin path, this path does not run stale buffer completion process when it drops the last reference, hence leaving the stale inodes attached to the buffer sitting the AIL. There are no other references to those inodes, so there is no other mechanism to remove them from the AIL. Hence unmount hangs. The buffer lock context for stale buffers is passed to the last BLI reference. This is normally the last BLI unpin on journal IO completion. The unpin then processes the stale buffer completion and releases the buffer lock. However, if the final unpin from journal IO completion (or CIL push abort) does not hold the last reference to the BLI, there -must- still be a transaction context that references the BLI, and so that context must perform the stale buffer completion processing before the buffer is unlocked and the BLI torn down. The fix for this is to rework the xfs_buf_item_relse() path to run stale buffer completion processing if it drops the last reference to the BLI. We still hold the buffer locked, so the buffer owner and lock context is the same as if we passed the BLI and buffer to the ->iop_unpin() context to finish stale process on journal commit. However, we have to be careful here. In a shutdown state, we can be freeing dirty BLIs from xfs_buf_item_put() via xfs_trans_brelse() and xfs_trans_bdetach(). The existing code handles this case by considering shutdown state as "aborted", but in doing so largely masks the failure to clean up stale BLI state from the xfs_buf_item_relse() path. i.e regardless of the shutdown state and whether the item is in the AIL, we must finish the stale buffer cleanup if we are are dropping the last BLI reference from the ->iop_relse path in transaction commit context. Signed-off-by: Dave Chinner Reviewed-by: Carlos Maiolino Signed-off-by: Carlos Maiolino --- fs/xfs/xfs_buf_item.c | 123 +++++++++++++++++++++++++++++------------- fs/xfs/xfs_buf_item.h | 2 +- 2 files changed, 87 insertions(+), 38 deletions(-) diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c index c95826863c82..7fc54725c5f6 100644 --- a/fs/xfs/xfs_buf_item.c +++ b/fs/xfs/xfs_buf_item.c @@ -612,43 +612,42 @@ xfs_buf_item_push( * Drop the buffer log item refcount and take appropriate action. This helper * determines whether the bli must be freed or not, since a decrement to zero * does not necessarily mean the bli is unused. - * - * Return true if the bli is freed, false otherwise. */ -bool +void xfs_buf_item_put( struct xfs_buf_log_item *bip) { - struct xfs_log_item *lip = &bip->bli_item; - bool aborted; - bool dirty; + + ASSERT(xfs_buf_islocked(bip->bli_buf)); /* drop the bli ref and return if it wasn't the last one */ if (!atomic_dec_and_test(&bip->bli_refcount)) - return false; + return; + + /* If the BLI is in the AIL, then it is still dirty and in use */ + if (test_bit(XFS_LI_IN_AIL, &bip->bli_item.li_flags)) { + ASSERT(bip->bli_flags & XFS_BLI_DIRTY); + return; + } /* - * We dropped the last ref and must free the item if clean or aborted. - * If the bli is dirty and non-aborted, the buffer was clean in the - * transaction but still awaiting writeback from previous changes. In - * that case, the bli is freed on buffer writeback completion. + * In shutdown conditions, we can be asked to free a dirty BLI that + * isn't in the AIL. This can occur due to a checkpoint aborting a BLI + * instead of inserting it into the AIL at checkpoint IO completion. If + * there's another bli reference (e.g. a btree cursor holds a clean + * reference) and it is released via xfs_trans_brelse(), we can get here + * with that aborted, dirty BLI. In this case, it is safe to free the + * dirty BLI immediately, as it is not in the AIL and there are no + * other references to it. + * + * We should never get here with a stale BLI via that path as + * xfs_trans_brelse() specifically holds onto stale buffers rather than + * releasing them. */ - aborted = test_bit(XFS_LI_ABORTED, &lip->li_flags) || - xlog_is_shutdown(lip->li_log); - dirty = bip->bli_flags & XFS_BLI_DIRTY; - if (dirty && !aborted) - return false; - - /* - * The bli is aborted or clean. An aborted item may be in the AIL - * regardless of dirty state. For example, consider an aborted - * transaction that invalidated a dirty bli and cleared the dirty - * state. - */ - if (aborted) - xfs_trans_ail_delete(lip, 0); + ASSERT(!(bip->bli_flags & XFS_BLI_DIRTY) || + test_bit(XFS_LI_ABORTED, &bip->bli_item.li_flags)); + ASSERT(!(bip->bli_flags & XFS_BLI_STALE)); xfs_buf_item_relse(bip); - return true; } /* @@ -669,6 +668,15 @@ xfs_buf_item_put( * if necessary but do not unlock the buffer. This is for support of * xfs_trans_bhold(). Make sure the XFS_BLI_HOLD field is cleared if we don't * free the item. + * + * If the XFS_BLI_STALE flag is set, the last reference to the BLI *must* + * perform a completion abort of any objects attached to the buffer for IO + * tracking purposes. This generally only happens in shutdown situations, + * normally xfs_buf_item_unpin() will drop the last BLI reference and perform + * completion processing. However, because transaction completion can race with + * checkpoint completion during a shutdown, this release context may end up + * being the last active reference to the BLI and so needs to perform this + * cleanup. */ STATIC void xfs_buf_item_release( @@ -676,18 +684,19 @@ xfs_buf_item_release( { struct xfs_buf_log_item *bip = BUF_ITEM(lip); struct xfs_buf *bp = bip->bli_buf; - bool released; bool hold = bip->bli_flags & XFS_BLI_HOLD; bool stale = bip->bli_flags & XFS_BLI_STALE; -#if defined(DEBUG) || defined(XFS_WARN) - bool ordered = bip->bli_flags & XFS_BLI_ORDERED; - bool dirty = bip->bli_flags & XFS_BLI_DIRTY; bool aborted = test_bit(XFS_LI_ABORTED, &lip->li_flags); + bool dirty = bip->bli_flags & XFS_BLI_DIRTY; +#if defined(DEBUG) || defined(XFS_WARN) + bool ordered = bip->bli_flags & XFS_BLI_ORDERED; #endif trace_xfs_buf_item_release(bip); + ASSERT(xfs_buf_islocked(bp)); + /* * The bli dirty state should match whether the blf has logged segments * except for ordered buffers, where only the bli should be dirty. @@ -703,16 +712,56 @@ xfs_buf_item_release( bp->b_transp = NULL; bip->bli_flags &= ~(XFS_BLI_LOGGED | XFS_BLI_HOLD | XFS_BLI_ORDERED); + /* If there are other references, then we have nothing to do. */ + if (!atomic_dec_and_test(&bip->bli_refcount)) + goto out_release; + /* - * Unref the item and unlock the buffer unless held or stale. Stale - * buffers remain locked until final unpin unless the bli is freed by - * the unref call. The latter implies shutdown because buffer - * invalidation dirties the bli and transaction. + * Stale buffer completion frees the BLI, unlocks and releases the + * buffer. Neither the BLI or buffer are safe to reference after this + * call, so there's nothing more we need to do here. + * + * If we get here with a stale buffer and references to the BLI remain, + * we must not unlock the buffer as the last BLI reference owns lock + * context, not us. */ - released = xfs_buf_item_put(bip); - if (hold || (stale && !released)) + if (stale) { + xfs_buf_item_finish_stale(bip); + xfs_buf_relse(bp); + ASSERT(!hold); + return; + } + + /* + * Dirty or clean, aborted items are done and need to be removed from + * the AIL and released. This frees the BLI, but leaves the buffer + * locked and referenced. + */ + if (aborted || xlog_is_shutdown(lip->li_log)) { + ASSERT(list_empty(&bip->bli_buf->b_li_list)); + xfs_buf_item_done(bp); + goto out_release; + } + + /* + * Clean, unreferenced BLIs can be immediately freed, leaving the buffer + * locked and referenced. + * + * Dirty, unreferenced BLIs *must* be in the AIL awaiting writeback. + */ + if (!dirty) + xfs_buf_item_relse(bip); + else + ASSERT(test_bit(XFS_LI_IN_AIL, &lip->li_flags)); + + /* Not safe to reference the BLI from here */ +out_release: + /* + * If we get here with a stale buffer, we must not unlock the + * buffer as the last BLI reference owns lock context, not us. + */ + if (stale || hold) return; - ASSERT(!stale || aborted); xfs_buf_relse(bp); } diff --git a/fs/xfs/xfs_buf_item.h b/fs/xfs/xfs_buf_item.h index 50dd79b59cf5..416890b84f8c 100644 --- a/fs/xfs/xfs_buf_item.h +++ b/fs/xfs/xfs_buf_item.h @@ -49,7 +49,7 @@ struct xfs_buf_log_item { int xfs_buf_item_init(struct xfs_buf *, struct xfs_mount *); void xfs_buf_item_done(struct xfs_buf *bp); -bool xfs_buf_item_put(struct xfs_buf_log_item *); +void xfs_buf_item_put(struct xfs_buf_log_item *bip); void xfs_buf_item_log(struct xfs_buf_log_item *, uint, uint); bool xfs_buf_item_dirty_format(struct xfs_buf_log_item *); void xfs_buf_inode_iodone(struct xfs_buf *); From 1f029b4e30a602db33dedee5ac676e9236ad193c Mon Sep 17 00:00:00 2001 From: Yang Li Date: Thu, 19 Jun 2025 11:01:07 +0800 Subject: [PATCH 245/289] Bluetooth: Prevent unintended pause by checking if advertising is active When PA Create Sync is enabled, advertising resumes unexpectedly. Therefore, it's necessary to check whether advertising is currently active before attempting to pause it. < HCI Command: LE Add Device To... (0x08|0x0011) plen 7 #1345 [hci0] 48.306205 Address type: Random (0x01) Address: 4F:84:84:5F:88:17 (Resolvable) Identity type: Random (0x01) Identity: FC:5B:8C:F7:5D:FB (Static) < HCI Command: LE Set Address Re.. (0x08|0x002d) plen 1 #1347 [hci0] 48.308023 Address resolution: Enabled (0x01) ... < HCI Command: LE Set Extended A.. (0x08|0x0039) plen 6 #1349 [hci0] 48.309650 Extended advertising: Enabled (0x01) Number of sets: 1 (0x01) Entry 0 Handle: 0x01 Duration: 0 ms (0x00) Max ext adv events: 0 ... < HCI Command: LE Periodic Adve.. (0x08|0x0044) plen 14 #1355 [hci0] 48.314575 Options: 0x0000 Use advertising SID, Advertiser Address Type and address Reporting initially enabled SID: 0x02 Adv address type: Random (0x01) Adv address: 4F:84:84:5F:88:17 (Resolvable) Identity type: Random (0x01) Identity: FC:5B:8C:F7:5D:FB (Static) Skip: 0x0000 Sync timeout: 20000 msec (0x07d0) Sync CTE type: 0x0000 Fixes: ad383c2c65a5 ("Bluetooth: hci_sync: Enable advertising when LL privacy is enabled") Signed-off-by: Yang Li Signed-off-by: Luiz Augusto von Dentz --- net/bluetooth/hci_sync.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/net/bluetooth/hci_sync.c b/net/bluetooth/hci_sync.c index 6687f2a4d1eb..42d3696227af 100644 --- a/net/bluetooth/hci_sync.c +++ b/net/bluetooth/hci_sync.c @@ -2481,6 +2481,10 @@ static int hci_pause_advertising_sync(struct hci_dev *hdev) int err; int old_state; + /* If controller is not advertising we are done. */ + if (!hci_dev_test_flag(hdev, HCI_LE_ADV)) + return 0; + /* If already been paused there is nothing to do. */ if (hdev->advertising_paused) return 0; From 46c0d947b64ac8efcf89dd754213dab5d1bd00aa Mon Sep 17 00:00:00 2001 From: Christian Eggers Date: Wed, 25 Jun 2025 15:09:29 +0200 Subject: [PATCH 246/289] Bluetooth: hci_sync: revert some mesh modifications This reverts minor parts of the changes made in commit b338d91703fa ("Bluetooth: Implement support for Mesh"). It looks like these changes were only made for development purposes but shouldn't have been part of the commit. Fixes: b338d91703fa ("Bluetooth: Implement support for Mesh") Cc: stable@vger.kernel.org Signed-off-by: Christian Eggers Signed-off-by: Luiz Augusto von Dentz --- net/bluetooth/hci_sync.c | 16 ++++------------ 1 file changed, 4 insertions(+), 12 deletions(-) diff --git a/net/bluetooth/hci_sync.c b/net/bluetooth/hci_sync.c index 42d3696227af..106862d47964 100644 --- a/net/bluetooth/hci_sync.c +++ b/net/bluetooth/hci_sync.c @@ -1970,13 +1970,10 @@ static int hci_clear_adv_sets_sync(struct hci_dev *hdev, struct sock *sk) static int hci_clear_adv_sync(struct hci_dev *hdev, struct sock *sk, bool force) { struct adv_info *adv, *n; - int err = 0; if (ext_adv_capable(hdev)) /* Remove all existing sets */ - err = hci_clear_adv_sets_sync(hdev, sk); - if (ext_adv_capable(hdev)) - return err; + return hci_clear_adv_sets_sync(hdev, sk); /* This is safe as long as there is no command send while the lock is * held. @@ -2004,13 +2001,11 @@ static int hci_clear_adv_sync(struct hci_dev *hdev, struct sock *sk, bool force) static int hci_remove_adv_sync(struct hci_dev *hdev, u8 instance, struct sock *sk) { - int err = 0; + int err; /* If we use extended advertising, instance has to be removed first. */ if (ext_adv_capable(hdev)) - err = hci_remove_ext_adv_instance_sync(hdev, instance, sk); - if (ext_adv_capable(hdev)) - return err; + return hci_remove_ext_adv_instance_sync(hdev, instance, sk); /* This is safe as long as there is no command send while the lock is * held. @@ -2109,16 +2104,13 @@ int hci_read_tx_power_sync(struct hci_dev *hdev, __le16 handle, u8 type) int hci_disable_advertising_sync(struct hci_dev *hdev) { u8 enable = 0x00; - int err = 0; /* If controller is not advertising we are done. */ if (!hci_dev_test_flag(hdev, HCI_LE_ADV)) return 0; if (ext_adv_capable(hdev)) - err = hci_disable_ext_adv_instance_sync(hdev, 0x00); - if (ext_adv_capable(hdev)) - return err; + return hci_disable_ext_adv_instance_sync(hdev, 0x00); return __hci_cmd_sync_status(hdev, HCI_OP_LE_SET_ADV_ENABLE, sizeof(enable), &enable, HCI_CMD_TIMEOUT); From e5af67a870f738bb8a4594b6c60c2caf4c87a3c9 Mon Sep 17 00:00:00 2001 From: Christian Eggers Date: Wed, 25 Jun 2025 15:09:30 +0200 Subject: [PATCH 247/289] Bluetooth: MGMT: set_mesh: update LE scan interval and window According to the message of commit b338d91703fa ("Bluetooth: Implement support for Mesh"), MGMT_OP_SET_MESH_RECEIVER should set the passive scan parameters. Currently the scan interval and window parameters are silently ignored, although user space (bluetooth-meshd) expects that they can be used [1] [1] https://git.kernel.org/pub/scm/bluetooth/bluez.git/tree/mesh/mesh-io-mgmt.c#n344 Fixes: b338d91703fa ("Bluetooth: Implement support for Mesh") Cc: stable@vger.kernel.org Signed-off-by: Christian Eggers Signed-off-by: Luiz Augusto von Dentz --- net/bluetooth/mgmt.c | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) diff --git a/net/bluetooth/mgmt.c b/net/bluetooth/mgmt.c index d540f7b4f75f..5d0f772c7a99 100644 --- a/net/bluetooth/mgmt.c +++ b/net/bluetooth/mgmt.c @@ -2153,6 +2153,9 @@ static int set_mesh_sync(struct hci_dev *hdev, void *data) else hci_dev_clear_flag(hdev, HCI_MESH); + hdev->le_scan_interval = __le16_to_cpu(cp->period); + hdev->le_scan_window = __le16_to_cpu(cp->window); + len -= sizeof(*cp); /* If filters don't fit, forward all adv pkts */ @@ -2167,6 +2170,7 @@ static int set_mesh(struct sock *sk, struct hci_dev *hdev, void *data, u16 len) { struct mgmt_cp_set_mesh *cp = data; struct mgmt_pending_cmd *cmd; + __u16 period, window; int err = 0; bt_dev_dbg(hdev, "sock %p", sk); @@ -2180,6 +2184,23 @@ static int set_mesh(struct sock *sk, struct hci_dev *hdev, void *data, u16 len) return mgmt_cmd_status(sk, hdev->id, MGMT_OP_SET_MESH_RECEIVER, MGMT_STATUS_INVALID_PARAMS); + /* Keep allowed ranges in sync with set_scan_params() */ + period = __le16_to_cpu(cp->period); + + if (period < 0x0004 || period > 0x4000) + return mgmt_cmd_status(sk, hdev->id, MGMT_OP_SET_MESH_RECEIVER, + MGMT_STATUS_INVALID_PARAMS); + + window = __le16_to_cpu(cp->window); + + if (window < 0x0004 || window > 0x4000) + return mgmt_cmd_status(sk, hdev->id, MGMT_OP_SET_MESH_RECEIVER, + MGMT_STATUS_INVALID_PARAMS); + + if (window > period) + return mgmt_cmd_status(sk, hdev->id, MGMT_OP_SET_MESH_RECEIVER, + MGMT_STATUS_INVALID_PARAMS); + hci_dev_lock(hdev); cmd = mgmt_pending_add(sk, MGMT_OP_SET_MESH_RECEIVER, hdev, data, len); @@ -6432,6 +6453,7 @@ static int set_scan_params(struct sock *sk, struct hci_dev *hdev, return mgmt_cmd_status(sk, hdev->id, MGMT_OP_SET_SCAN_PARAMS, MGMT_STATUS_NOT_SUPPORTED); + /* Keep allowed ranges in sync with set_mesh() */ interval = __le16_to_cpu(cp->interval); if (interval < 0x0004 || interval > 0x4000) From f3cb5676e5c11c896ba647ee309a993e73531588 Mon Sep 17 00:00:00 2001 From: Christian Eggers Date: Wed, 25 Jun 2025 15:09:31 +0200 Subject: [PATCH 248/289] Bluetooth: MGMT: mesh_send: check instances prior disabling advertising The unconditional call of hci_disable_advertising_sync() in mesh_send_done_sync() also disables other LE advertisings (non mesh related). I am not sure whether this call is required at all, but checking the adv_instances list (like done at other places) seems to solve the problem. Fixes: b338d91703fa ("Bluetooth: Implement support for Mesh") Cc: stable@vger.kernel.org Signed-off-by: Christian Eggers Signed-off-by: Luiz Augusto von Dentz --- net/bluetooth/mgmt.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/bluetooth/mgmt.c b/net/bluetooth/mgmt.c index 5d0f772c7a99..1485b455ade4 100644 --- a/net/bluetooth/mgmt.c +++ b/net/bluetooth/mgmt.c @@ -1080,7 +1080,8 @@ static int mesh_send_done_sync(struct hci_dev *hdev, void *data) struct mgmt_mesh_tx *mesh_tx; hci_dev_clear_flag(hdev, HCI_MESH_SENDING); - hci_disable_advertising_sync(hdev); + if (list_empty(&hdev->adv_instances)) + hci_disable_advertising_sync(hdev); mesh_tx = mgmt_mesh_next(hdev, NULL); if (mesh_tx) From 89fb8acc38852116d38d721ad394aad7f2871670 Mon Sep 17 00:00:00 2001 From: Christian Eggers Date: Fri, 27 Jun 2025 09:05:08 +0200 Subject: [PATCH 249/289] Bluetooth: HCI: Set extended advertising data synchronously Currently, for controllers with extended advertising, the advertising data is set in the asynchronous response handler for extended adverstising params. As most advertising settings are performed in a synchronous context, the (asynchronous) setting of the advertising data is done too late (after enabling the advertising). Move setting of adverstising data from asynchronous response handler into synchronous context to fix ordering of HCI commands. Signed-off-by: Christian Eggers Fixes: a0fb3726ba55 ("Bluetooth: Use Set ext adv/scan rsp data if controller supports") Cc: stable@vger.kernel.org v2: https://lore.kernel.org/linux-bluetooth/20250626115209.17839-1-ceggers@arri.de/ Signed-off-by: Luiz Augusto von Dentz --- net/bluetooth/hci_event.c | 36 ------- net/bluetooth/hci_sync.c | 207 ++++++++++++++++++++++++-------------- 2 files changed, 130 insertions(+), 113 deletions(-) diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c index 66052d6aaa1d..4d5ace9d245d 100644 --- a/net/bluetooth/hci_event.c +++ b/net/bluetooth/hci_event.c @@ -2150,40 +2150,6 @@ static u8 hci_cc_set_adv_param(struct hci_dev *hdev, void *data, return rp->status; } -static u8 hci_cc_set_ext_adv_param(struct hci_dev *hdev, void *data, - struct sk_buff *skb) -{ - struct hci_rp_le_set_ext_adv_params *rp = data; - struct hci_cp_le_set_ext_adv_params *cp; - struct adv_info *adv_instance; - - bt_dev_dbg(hdev, "status 0x%2.2x", rp->status); - - if (rp->status) - return rp->status; - - cp = hci_sent_cmd_data(hdev, HCI_OP_LE_SET_EXT_ADV_PARAMS); - if (!cp) - return rp->status; - - hci_dev_lock(hdev); - hdev->adv_addr_type = cp->own_addr_type; - if (!cp->handle) { - /* Store in hdev for instance 0 */ - hdev->adv_tx_power = rp->tx_power; - } else { - adv_instance = hci_find_adv_instance(hdev, cp->handle); - if (adv_instance) - adv_instance->tx_power = rp->tx_power; - } - /* Update adv data as tx power is known now */ - hci_update_adv_data(hdev, cp->handle); - - hci_dev_unlock(hdev); - - return rp->status; -} - static u8 hci_cc_read_rssi(struct hci_dev *hdev, void *data, struct sk_buff *skb) { @@ -4164,8 +4130,6 @@ static const struct hci_cc { HCI_CC(HCI_OP_LE_READ_NUM_SUPPORTED_ADV_SETS, hci_cc_le_read_num_adv_sets, sizeof(struct hci_rp_le_read_num_supported_adv_sets)), - HCI_CC(HCI_OP_LE_SET_EXT_ADV_PARAMS, hci_cc_set_ext_adv_param, - sizeof(struct hci_rp_le_set_ext_adv_params)), HCI_CC_STATUS(HCI_OP_LE_SET_EXT_ADV_ENABLE, hci_cc_le_set_ext_adv_enable), HCI_CC_STATUS(HCI_OP_LE_SET_ADV_SET_RAND_ADDR, diff --git a/net/bluetooth/hci_sync.c b/net/bluetooth/hci_sync.c index 106862d47964..77b3691f3423 100644 --- a/net/bluetooth/hci_sync.c +++ b/net/bluetooth/hci_sync.c @@ -1205,9 +1205,126 @@ static int hci_set_adv_set_random_addr_sync(struct hci_dev *hdev, u8 instance, sizeof(cp), &cp, HCI_CMD_TIMEOUT); } +static int +hci_set_ext_adv_params_sync(struct hci_dev *hdev, struct adv_info *adv, + const struct hci_cp_le_set_ext_adv_params *cp, + struct hci_rp_le_set_ext_adv_params *rp) +{ + struct sk_buff *skb; + + skb = __hci_cmd_sync(hdev, HCI_OP_LE_SET_EXT_ADV_PARAMS, sizeof(*cp), + cp, HCI_CMD_TIMEOUT); + + /* If command return a status event, skb will be set to -ENODATA */ + if (skb == ERR_PTR(-ENODATA)) + return 0; + + if (IS_ERR(skb)) { + bt_dev_err(hdev, "Opcode 0x%4.4x failed: %ld", + HCI_OP_LE_SET_EXT_ADV_PARAMS, PTR_ERR(skb)); + return PTR_ERR(skb); + } + + if (skb->len != sizeof(*rp)) { + bt_dev_err(hdev, "Invalid response length for 0x%4.4x: %u", + HCI_OP_LE_SET_EXT_ADV_PARAMS, skb->len); + kfree_skb(skb); + return -EIO; + } + + memcpy(rp, skb->data, sizeof(*rp)); + kfree_skb(skb); + + if (!rp->status) { + hdev->adv_addr_type = cp->own_addr_type; + if (!cp->handle) { + /* Store in hdev for instance 0 */ + hdev->adv_tx_power = rp->tx_power; + } else if (adv) { + adv->tx_power = rp->tx_power; + } + } + + return rp->status; +} + +static int hci_set_ext_adv_data_sync(struct hci_dev *hdev, u8 instance) +{ + DEFINE_FLEX(struct hci_cp_le_set_ext_adv_data, pdu, data, length, + HCI_MAX_EXT_AD_LENGTH); + u8 len; + struct adv_info *adv = NULL; + int err; + + if (instance) { + adv = hci_find_adv_instance(hdev, instance); + if (!adv || !adv->adv_data_changed) + return 0; + } + + len = eir_create_adv_data(hdev, instance, pdu->data, + HCI_MAX_EXT_AD_LENGTH); + + pdu->length = len; + pdu->handle = adv ? adv->handle : instance; + pdu->operation = LE_SET_ADV_DATA_OP_COMPLETE; + pdu->frag_pref = LE_SET_ADV_DATA_NO_FRAG; + + err = __hci_cmd_sync_status(hdev, HCI_OP_LE_SET_EXT_ADV_DATA, + struct_size(pdu, data, len), pdu, + HCI_CMD_TIMEOUT); + if (err) + return err; + + /* Update data if the command succeed */ + if (adv) { + adv->adv_data_changed = false; + } else { + memcpy(hdev->adv_data, pdu->data, len); + hdev->adv_data_len = len; + } + + return 0; +} + +static int hci_set_adv_data_sync(struct hci_dev *hdev, u8 instance) +{ + struct hci_cp_le_set_adv_data cp; + u8 len; + + memset(&cp, 0, sizeof(cp)); + + len = eir_create_adv_data(hdev, instance, cp.data, sizeof(cp.data)); + + /* There's nothing to do if the data hasn't changed */ + if (hdev->adv_data_len == len && + memcmp(cp.data, hdev->adv_data, len) == 0) + return 0; + + memcpy(hdev->adv_data, cp.data, sizeof(cp.data)); + hdev->adv_data_len = len; + + cp.length = len; + + return __hci_cmd_sync_status(hdev, HCI_OP_LE_SET_ADV_DATA, + sizeof(cp), &cp, HCI_CMD_TIMEOUT); +} + +int hci_update_adv_data_sync(struct hci_dev *hdev, u8 instance) +{ + if (!hci_dev_test_flag(hdev, HCI_LE_ENABLED)) + return 0; + + if (ext_adv_capable(hdev)) + return hci_set_ext_adv_data_sync(hdev, instance); + + return hci_set_adv_data_sync(hdev, instance); +} + int hci_setup_ext_adv_instance_sync(struct hci_dev *hdev, u8 instance) { struct hci_cp_le_set_ext_adv_params cp; + struct hci_rp_le_set_ext_adv_params rp; bool connectable; u32 flags; bdaddr_t random_addr; @@ -1316,8 +1433,12 @@ int hci_setup_ext_adv_instance_sync(struct hci_dev *hdev, u8 instance) cp.secondary_phy = HCI_ADV_PHY_1M; } - err = __hci_cmd_sync_status(hdev, HCI_OP_LE_SET_EXT_ADV_PARAMS, - sizeof(cp), &cp, HCI_CMD_TIMEOUT); + err = hci_set_ext_adv_params_sync(hdev, adv, &cp, &rp); + if (err) + return err; + + /* Update adv data as tx power is known now */ + err = hci_set_ext_adv_data_sync(hdev, cp.handle); if (err) return err; @@ -1822,79 +1943,6 @@ int hci_le_terminate_big_sync(struct hci_dev *hdev, u8 handle, u8 reason) sizeof(cp), &cp, HCI_CMD_TIMEOUT); } -static int hci_set_ext_adv_data_sync(struct hci_dev *hdev, u8 instance) -{ - DEFINE_FLEX(struct hci_cp_le_set_ext_adv_data, pdu, data, length, - HCI_MAX_EXT_AD_LENGTH); - u8 len; - struct adv_info *adv = NULL; - int err; - - if (instance) { - adv = hci_find_adv_instance(hdev, instance); - if (!adv || !adv->adv_data_changed) - return 0; - } - - len = eir_create_adv_data(hdev, instance, pdu->data, - HCI_MAX_EXT_AD_LENGTH); - - pdu->length = len; - pdu->handle = adv ? adv->handle : instance; - pdu->operation = LE_SET_ADV_DATA_OP_COMPLETE; - pdu->frag_pref = LE_SET_ADV_DATA_NO_FRAG; - - err = __hci_cmd_sync_status(hdev, HCI_OP_LE_SET_EXT_ADV_DATA, - struct_size(pdu, data, len), pdu, - HCI_CMD_TIMEOUT); - if (err) - return err; - - /* Update data if the command succeed */ - if (adv) { - adv->adv_data_changed = false; - } else { - memcpy(hdev->adv_data, pdu->data, len); - hdev->adv_data_len = len; - } - - return 0; -} - -static int hci_set_adv_data_sync(struct hci_dev *hdev, u8 instance) -{ - struct hci_cp_le_set_adv_data cp; - u8 len; - - memset(&cp, 0, sizeof(cp)); - - len = eir_create_adv_data(hdev, instance, cp.data, sizeof(cp.data)); - - /* There's nothing to do if the data hasn't changed */ - if (hdev->adv_data_len == len && - memcmp(cp.data, hdev->adv_data, len) == 0) - return 0; - - memcpy(hdev->adv_data, cp.data, sizeof(cp.data)); - hdev->adv_data_len = len; - - cp.length = len; - - return __hci_cmd_sync_status(hdev, HCI_OP_LE_SET_ADV_DATA, - sizeof(cp), &cp, HCI_CMD_TIMEOUT); -} - -int hci_update_adv_data_sync(struct hci_dev *hdev, u8 instance) -{ - if (!hci_dev_test_flag(hdev, HCI_LE_ENABLED)) - return 0; - - if (ext_adv_capable(hdev)) - return hci_set_ext_adv_data_sync(hdev, instance); - - return hci_set_adv_data_sync(hdev, instance); -} - int hci_schedule_adv_instance_sync(struct hci_dev *hdev, u8 instance, bool force) { @@ -6273,6 +6321,7 @@ static int hci_le_ext_directed_advertising_sync(struct hci_dev *hdev, struct hci_conn *conn) { struct hci_cp_le_set_ext_adv_params cp; + struct hci_rp_le_set_ext_adv_params rp; int err; bdaddr_t random_addr; u8 own_addr_type; @@ -6314,8 +6363,12 @@ static int hci_le_ext_directed_advertising_sync(struct hci_dev *hdev, if (err) return err; - err = __hci_cmd_sync_status(hdev, HCI_OP_LE_SET_EXT_ADV_PARAMS, - sizeof(cp), &cp, HCI_CMD_TIMEOUT); + err = hci_set_ext_adv_params_sync(hdev, NULL, &cp, &rp); + if (err) + return err; + + /* Update adv data as tx power is known now */ + err = hci_set_ext_adv_data_sync(hdev, cp.handle); if (err) return err; From 6921d1e07cb5eddec830801087b419194fde0803 Mon Sep 17 00:00:00 2001 From: Edward Adam Davis Date: Tue, 24 Jun 2025 14:38:46 +0800 Subject: [PATCH 250/289] tracing: Fix filter logic error If the processing of the tr->events loop fails, the filter that has been added to filter_head will be released twice in free_filter_list(&head->rcu) and __free_filter(filter). After adding the filter of tr->events, add the filter to the filter_head process to avoid triggering uaf. Link: https://lore.kernel.org/tencent_4EF87A626D702F816CD0951CE956EC32CD0A@qq.com Fixes: a9d0aab5eb33 ("tracing: Fix regression of filter waiting a long time on RCU synchronization") Reported-by: syzbot+daba72c4af9915e9c894@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=daba72c4af9915e9c894 Tested-by: syzbot+daba72c4af9915e9c894@syzkaller.appspotmail.com Acked-by: Masami Hiramatsu (Google) Signed-off-by: Edward Adam Davis Signed-off-by: Steven Rostedt (Google) --- kernel/trace/trace_events_filter.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c index 08141f105c95..3885aadc434d 100644 --- a/kernel/trace/trace_events_filter.c +++ b/kernel/trace/trace_events_filter.c @@ -1436,13 +1436,6 @@ static void filter_free_subsystem_filters(struct trace_subsystem_dir *dir, INIT_LIST_HEAD(&head->list); - item = kmalloc(sizeof(*item), GFP_KERNEL); - if (!item) - goto free_now; - - item->filter = filter; - list_add_tail(&item->list, &head->list); - list_for_each_entry(file, &tr->events, list) { if (file->system != dir) continue; @@ -1454,6 +1447,13 @@ static void filter_free_subsystem_filters(struct trace_subsystem_dir *dir, event_clear_filter(file); } + item = kmalloc(sizeof(*item), GFP_KERNEL); + if (!item) + goto free_now; + + item->filter = filter; + list_add_tail(&item->list, &head->list); + delay_free_filter(head); return; free_now: From 8550821a153558d49dffacbc1dc98ac9d3eed2fa Mon Sep 17 00:00:00 2001 From: Jan Karcher Date: Thu, 26 Jun 2025 07:16:53 +0200 Subject: [PATCH 251/289] MAINTAINERS: update smc section Due to changes of my responsibilities within IBM i can no longer act as maintainer for smc. As a result of the co-operation with Alibaba over the last years we decided to, once more, give them more responsibility for smc by appointing D. Wythe and Dust Li as maintainers as well. Within IBM Sidraya Jayagond and Mahanta Jambigi are going to take over the maintainership for smc. Signed-off-by: Jan Karcher Reviewed-by: Wenjia Zhang Link: https://patch.msgid.link/20250626051653.4259-1-jaka@linux.ibm.com Signed-off-by: Jakub Kicinski --- MAINTAINERS | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/MAINTAINERS b/MAINTAINERS index efb51ee92683..bb7e5f8c4455 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -22564,9 +22564,11 @@ S: Maintained F: drivers/misc/sgi-xp/ SHARED MEMORY COMMUNICATIONS (SMC) SOCKETS +M: D. Wythe +M: Dust Li +M: Sidraya Jayagond M: Wenjia Zhang -M: Jan Karcher -R: D. Wythe +R: Mahanta Jambigi R: Tony Lu R: Wen Gu L: linux-rdma@vger.kernel.org From 6e457732c8a4431952a5cc075215268ce021dc0f Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Thu, 26 Jun 2025 11:20:55 -0700 Subject: [PATCH 252/289] docs: netdev: correct the heading level for co-posting selftests "Co-posting selftests" belongs in the "netdev patch review" section, same as "co-posting changes to user space components". It was erroneously added as its own section. Reviewed-by: Bagas Sanjaya Link: https://patch.msgid.link/20250626182055.4161905-1-kuba@kernel.org Signed-off-by: Jakub Kicinski --- Documentation/process/maintainer-netdev.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/process/maintainer-netdev.rst b/Documentation/process/maintainer-netdev.rst index 1ac62dc3a66f..e1755610b4bc 100644 --- a/Documentation/process/maintainer-netdev.rst +++ b/Documentation/process/maintainer-netdev.rst @@ -312,7 +312,7 @@ Posting as one thread is discouraged because it confuses patchwork (as of patchwork 2.2.2). Co-posting selftests --------------------- +~~~~~~~~~~~~~~~~~~~~ Selftests should be part of the same series as the code changes. Specifically for fixes both code change and related test should go into From ba2f83eecd2b36b12f92e5edd8bdc0509c7cd44e Mon Sep 17 00:00:00 2001 From: Ulrich Weber Date: Thu, 26 Jun 2025 16:56:18 +0200 Subject: [PATCH 253/289] doc: tls: socket needs to be established to enable ulp To enable TLS ulp socket needs to be in established state. This was added in commit d91c3e17f75f ("net/tls: Only attach to sockets in ESTABLISHED state"), in 2018. Signed-off-by: Ulrich Weber Link: https://patch.msgid.link/20250626145618.15464-1-ulrich.weber@gmail.com Signed-off-by: Jakub Kicinski --- Documentation/networking/tls.rst | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/Documentation/networking/tls.rst b/Documentation/networking/tls.rst index c7904a1bc167..36cc7afc2527 100644 --- a/Documentation/networking/tls.rst +++ b/Documentation/networking/tls.rst @@ -16,11 +16,13 @@ User interface Creating a TLS connection ------------------------- -First create a new TCP socket and set the TLS ULP. +First create a new TCP socket and once the connection is established set the +TLS ULP. .. code-block:: c sock = socket(AF_INET, SOCK_STREAM, 0); + connect(sock, addr, addrlen); setsockopt(sock, SOL_TCP, TCP_ULP, "tls", sizeof("tls")); Setting the TLS ULP allows us to set/get TLS socket options. Currently From d72411d20905180cdc452c553be17481b24463d2 Mon Sep 17 00:00:00 2001 From: Thomas Fourier Date: Wed, 25 Jun 2025 16:16:24 +0200 Subject: [PATCH 254/289] ethernet: atl1: Add missing DMA mapping error checks and count errors The `dma_map_XXX()` functions can fail and must be checked using `dma_mapping_error()`. This patch adds proper error handling for all DMA mapping calls. In `atl1_alloc_rx_buffers()`, if DMA mapping fails, the buffer is deallocated and marked accordingly. In `atl1_tx_map()`, previously mapped buffers are unmapped and the packet is dropped on failure. If `atl1_xmit_frame()` drops the packet, increment the tx_error counter. Fixes: f3cc28c79760 ("Add Attansic L1 ethernet driver.") Signed-off-by: Thomas Fourier Link: https://patch.msgid.link/20250625141629.114984-2-fourier.thomas@gmail.com Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/atheros/atlx/atl1.c | 79 +++++++++++++++++------- 1 file changed, 57 insertions(+), 22 deletions(-) diff --git a/drivers/net/ethernet/atheros/atlx/atl1.c b/drivers/net/ethernet/atheros/atlx/atl1.c index cfdb546a09e7..98a4d089270e 100644 --- a/drivers/net/ethernet/atheros/atlx/atl1.c +++ b/drivers/net/ethernet/atheros/atlx/atl1.c @@ -1861,14 +1861,21 @@ static u16 atl1_alloc_rx_buffers(struct atl1_adapter *adapter) break; } - buffer_info->alloced = 1; - buffer_info->skb = skb; - buffer_info->length = (u16) adapter->rx_buffer_len; page = virt_to_page(skb->data); offset = offset_in_page(skb->data); buffer_info->dma = dma_map_page(&pdev->dev, page, offset, adapter->rx_buffer_len, DMA_FROM_DEVICE); + if (dma_mapping_error(&pdev->dev, buffer_info->dma)) { + kfree_skb(skb); + adapter->soft_stats.rx_dropped++; + break; + } + + buffer_info->alloced = 1; + buffer_info->skb = skb; + buffer_info->length = (u16)adapter->rx_buffer_len; + rfd_desc->buffer_addr = cpu_to_le64(buffer_info->dma); rfd_desc->buf_len = cpu_to_le16(adapter->rx_buffer_len); rfd_desc->coalese = 0; @@ -2183,8 +2190,8 @@ static int atl1_tx_csum(struct atl1_adapter *adapter, struct sk_buff *skb, return 0; } -static void atl1_tx_map(struct atl1_adapter *adapter, struct sk_buff *skb, - struct tx_packet_desc *ptpd) +static bool atl1_tx_map(struct atl1_adapter *adapter, struct sk_buff *skb, + struct tx_packet_desc *ptpd) { struct atl1_tpd_ring *tpd_ring = &adapter->tpd_ring; struct atl1_buffer *buffer_info; @@ -2194,6 +2201,7 @@ static void atl1_tx_map(struct atl1_adapter *adapter, struct sk_buff *skb, unsigned int nr_frags; unsigned int f; int retval; + u16 first_mapped; u16 next_to_use; u16 data_len; u8 hdr_len; @@ -2201,6 +2209,7 @@ static void atl1_tx_map(struct atl1_adapter *adapter, struct sk_buff *skb, buf_len -= skb->data_len; nr_frags = skb_shinfo(skb)->nr_frags; next_to_use = atomic_read(&tpd_ring->next_to_use); + first_mapped = next_to_use; buffer_info = &tpd_ring->buffer_info[next_to_use]; BUG_ON(buffer_info->skb); /* put skb in last TPD */ @@ -2216,6 +2225,8 @@ static void atl1_tx_map(struct atl1_adapter *adapter, struct sk_buff *skb, buffer_info->dma = dma_map_page(&adapter->pdev->dev, page, offset, hdr_len, DMA_TO_DEVICE); + if (dma_mapping_error(&adapter->pdev->dev, buffer_info->dma)) + goto dma_err; if (++next_to_use == tpd_ring->count) next_to_use = 0; @@ -2242,6 +2253,9 @@ static void atl1_tx_map(struct atl1_adapter *adapter, struct sk_buff *skb, page, offset, buffer_info->length, DMA_TO_DEVICE); + if (dma_mapping_error(&adapter->pdev->dev, + buffer_info->dma)) + goto dma_err; if (++next_to_use == tpd_ring->count) next_to_use = 0; } @@ -2254,6 +2268,8 @@ static void atl1_tx_map(struct atl1_adapter *adapter, struct sk_buff *skb, buffer_info->dma = dma_map_page(&adapter->pdev->dev, page, offset, buf_len, DMA_TO_DEVICE); + if (dma_mapping_error(&adapter->pdev->dev, buffer_info->dma)) + goto dma_err; if (++next_to_use == tpd_ring->count) next_to_use = 0; } @@ -2277,6 +2293,9 @@ static void atl1_tx_map(struct atl1_adapter *adapter, struct sk_buff *skb, buffer_info->dma = skb_frag_dma_map(&adapter->pdev->dev, frag, i * ATL1_MAX_TX_BUF_LEN, buffer_info->length, DMA_TO_DEVICE); + if (dma_mapping_error(&adapter->pdev->dev, + buffer_info->dma)) + goto dma_err; if (++next_to_use == tpd_ring->count) next_to_use = 0; @@ -2285,6 +2304,22 @@ static void atl1_tx_map(struct atl1_adapter *adapter, struct sk_buff *skb, /* last tpd's buffer-info */ buffer_info->skb = skb; + + return true; + + dma_err: + while (first_mapped != next_to_use) { + buffer_info = &tpd_ring->buffer_info[first_mapped]; + dma_unmap_page(&adapter->pdev->dev, + buffer_info->dma, + buffer_info->length, + DMA_TO_DEVICE); + buffer_info->dma = 0; + + if (++first_mapped == tpd_ring->count) + first_mapped = 0; + } + return false; } static void atl1_tx_queue(struct atl1_adapter *adapter, u16 count, @@ -2355,10 +2390,8 @@ static netdev_tx_t atl1_xmit_frame(struct sk_buff *skb, len = skb_headlen(skb); - if (unlikely(skb->len <= 0)) { - dev_kfree_skb_any(skb); - return NETDEV_TX_OK; - } + if (unlikely(skb->len <= 0)) + goto drop_packet; nr_frags = skb_shinfo(skb)->nr_frags; for (f = 0; f < nr_frags; f++) { @@ -2371,10 +2404,9 @@ static netdev_tx_t atl1_xmit_frame(struct sk_buff *skb, if (mss) { if (skb->protocol == htons(ETH_P_IP)) { proto_hdr_len = skb_tcp_all_headers(skb); - if (unlikely(proto_hdr_len > len)) { - dev_kfree_skb_any(skb); - return NETDEV_TX_OK; - } + if (unlikely(proto_hdr_len > len)) + goto drop_packet; + /* need additional TPD ? */ if (proto_hdr_len != len) count += (len - proto_hdr_len + @@ -2406,23 +2438,26 @@ static netdev_tx_t atl1_xmit_frame(struct sk_buff *skb, } tso = atl1_tso(adapter, skb, ptpd); - if (tso < 0) { - dev_kfree_skb_any(skb); - return NETDEV_TX_OK; - } + if (tso < 0) + goto drop_packet; if (!tso) { ret_val = atl1_tx_csum(adapter, skb, ptpd); - if (ret_val < 0) { - dev_kfree_skb_any(skb); - return NETDEV_TX_OK; - } + if (ret_val < 0) + goto drop_packet; } - atl1_tx_map(adapter, skb, ptpd); + if (!atl1_tx_map(adapter, skb, ptpd)) + goto drop_packet; + atl1_tx_queue(adapter, count, ptpd); atl1_update_mailbox(adapter); return NETDEV_TX_OK; + +drop_packet: + adapter->soft_stats.tx_errors++; + dev_kfree_skb_any(skb); + return NETDEV_TX_OK; } static int atl1_rings_clean(struct napi_struct *napi, int budget) From 2def09ead4ad5907988b655d1e1454003aaf8297 Mon Sep 17 00:00:00 2001 From: Fushuai Wang Date: Thu, 26 Jun 2025 21:30:03 +0800 Subject: [PATCH 255/289] dpaa2-eth: fix xdp_rxq_info leak The driver registered xdp_rxq_info structures via xdp_rxq_info_reg() but failed to properly unregister them in error paths and during removal. Fixes: d678be1dc1ec ("dpaa2-eth: add XDP_REDIRECT support") Signed-off-by: Fushuai Wang Reviewed-by: Simon Horman Reviewed-by: Ioana Ciornei Link: https://patch.msgid.link/20250626133003.80136-1-wangfushuai@baidu.com Signed-off-by: Jakub Kicinski --- .../net/ethernet/freescale/dpaa2/dpaa2-eth.c | 26 +++++++++++++++++-- 1 file changed, 24 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c index 2ec2c3dab250..b82f121cadad 100644 --- a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c +++ b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c @@ -3939,6 +3939,7 @@ static int dpaa2_eth_setup_rx_flow(struct dpaa2_eth_priv *priv, MEM_TYPE_PAGE_ORDER0, NULL); if (err) { dev_err(dev, "xdp_rxq_info_reg_mem_model failed\n"); + xdp_rxq_info_unreg(&fq->channel->xdp_rxq); return err; } @@ -4432,17 +4433,25 @@ static int dpaa2_eth_bind_dpni(struct dpaa2_eth_priv *priv) return -EINVAL; } if (err) - return err; + goto out; } err = dpni_get_qdid(priv->mc_io, 0, priv->mc_token, DPNI_QUEUE_TX, &priv->tx_qdid); if (err) { dev_err(dev, "dpni_get_qdid() failed\n"); - return err; + goto out; } return 0; + +out: + while (i--) { + if (priv->fq[i].type == DPAA2_RX_FQ && + xdp_rxq_info_is_reg(&priv->fq[i].channel->xdp_rxq)) + xdp_rxq_info_unreg(&priv->fq[i].channel->xdp_rxq); + } + return err; } /* Allocate rings for storing incoming frame descriptors */ @@ -4825,6 +4834,17 @@ static void dpaa2_eth_del_ch_napi(struct dpaa2_eth_priv *priv) } } +static void dpaa2_eth_free_rx_xdp_rxq(struct dpaa2_eth_priv *priv) +{ + int i; + + for (i = 0; i < priv->num_fqs; i++) { + if (priv->fq[i].type == DPAA2_RX_FQ && + xdp_rxq_info_is_reg(&priv->fq[i].channel->xdp_rxq)) + xdp_rxq_info_unreg(&priv->fq[i].channel->xdp_rxq); + } +} + static int dpaa2_eth_probe(struct fsl_mc_device *dpni_dev) { struct device *dev; @@ -5028,6 +5048,7 @@ err_alloc_percpu_extras: free_percpu(priv->percpu_stats); err_alloc_percpu_stats: dpaa2_eth_del_ch_napi(priv); + dpaa2_eth_free_rx_xdp_rxq(priv); err_bind: dpaa2_eth_free_dpbps(priv); err_dpbp_setup: @@ -5080,6 +5101,7 @@ static void dpaa2_eth_remove(struct fsl_mc_device *ls_dev) free_percpu(priv->percpu_extras); dpaa2_eth_del_ch_napi(priv); + dpaa2_eth_free_rx_xdp_rxq(priv); dpaa2_eth_free_dpbps(priv); dpaa2_eth_free_dpio(priv); dpaa2_eth_free_dpni(priv); From 45537926dd2aaa9190ac0fac5a0fbeefcadfea95 Mon Sep 17 00:00:00 2001 From: Niklas Schnelle Date: Wed, 25 Jun 2025 11:28:28 +0200 Subject: [PATCH 256/289] s390/pci: Fix stale function handles in error handling The error event information for PCI error events contains a function handle for the respective function. This handle is generally captured at the time the error event was recorded. Due to delays in processing or cascading issues, it may happen that during firmware recovery multiple events are generated. When processing these events in order Linux may already have recovered an affected function making the event information stale. Fix this by doing an unconditional CLP List PCI function retrieving the current function handle with the zdev->state_lock held and ignoring the event if its function handle is stale. Cc: stable@vger.kernel.org Fixes: 4cdf2f4e24ff ("s390/pci: implement minimal PCI error recovery") Reviewed-by: Julian Ruess Reviewed-by: Gerd Bayer Reviewed-by: Farhan Ali Signed-off-by: Niklas Schnelle Signed-off-by: Alexander Gordeev --- arch/s390/pci/pci_event.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/arch/s390/pci/pci_event.c b/arch/s390/pci/pci_event.c index 2fbee3887d13..82ee2578279a 100644 --- a/arch/s390/pci/pci_event.c +++ b/arch/s390/pci/pci_event.c @@ -273,6 +273,8 @@ static void __zpci_event_error(struct zpci_ccdf_err *ccdf) struct zpci_dev *zdev = get_zdev_by_fid(ccdf->fid); struct pci_dev *pdev = NULL; pci_ers_result_t ers_res; + u32 fh = 0; + int rc; zpci_dbg(3, "err fid:%x, fh:%x, pec:%x\n", ccdf->fid, ccdf->fh, ccdf->pec); @@ -281,6 +283,15 @@ static void __zpci_event_error(struct zpci_ccdf_err *ccdf) if (zdev) { mutex_lock(&zdev->state_lock); + rc = clp_refresh_fh(zdev->fid, &fh); + if (rc) + goto no_pdev; + if (!fh || ccdf->fh != fh) { + /* Ignore events with stale handles */ + zpci_dbg(3, "err fid:%x, fh:%x (stale %x)\n", + ccdf->fid, fh, ccdf->fh); + goto no_pdev; + } zpci_update_fh(zdev, ccdf->fh); if (zdev->zbus->bus) pdev = pci_get_slot(zdev->zbus->bus, zdev->devfn); From b97a7972b1f4f81417840b9a2ab0c19722b577d5 Mon Sep 17 00:00:00 2001 From: Niklas Schnelle Date: Wed, 25 Jun 2025 11:28:29 +0200 Subject: [PATCH 257/289] s390/pci: Do not try re-enabling load/store if device is disabled If a device is disabled unblocking load/store on its own is not useful as a full re-enable of the function is necessary anyway. Note that SCLP Write Event Data Action Qualifier 0 (Reset) leaves the device disabled and triggers this case unless the driver already requests a reset. Cc: stable@vger.kernel.org Fixes: 4cdf2f4e24ff ("s390/pci: implement minimal PCI error recovery") Reviewed-by: Farhan Ali Signed-off-by: Niklas Schnelle Signed-off-by: Alexander Gordeev --- arch/s390/pci/pci_event.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/arch/s390/pci/pci_event.c b/arch/s390/pci/pci_event.c index 82ee2578279a..6c8922ad70f3 100644 --- a/arch/s390/pci/pci_event.c +++ b/arch/s390/pci/pci_event.c @@ -106,6 +106,10 @@ static pci_ers_result_t zpci_event_do_error_state_clear(struct pci_dev *pdev, struct zpci_dev *zdev = to_zpci(pdev); int rc; + /* The underlying device may have been disabled by the event */ + if (!zdev_enabled(zdev)) + return PCI_ERS_RESULT_NEED_RESET; + pr_info("%s: Unblocking device access for examination\n", pci_name(pdev)); rc = zpci_reset_load_store_blocked(zdev); if (rc) { From 62355f1f87b8c7f8785a8dd3cd5ca6e5b513566a Mon Sep 17 00:00:00 2001 From: Niklas Schnelle Date: Wed, 25 Jun 2025 11:28:30 +0200 Subject: [PATCH 258/289] s390/pci: Allow automatic recovery with minimal driver support According to Documentation/PCI/pci-error-recovery.rst only the error_detected() callback in the err_handler struct is mandatory for a driver to support error recovery. So far s390's error recovery chose a stricter approach also requiring slot_reset() and resume(). Relax this requirement and only require error_detected(). If a callback is not implemented EEH and AER treat this as PCI_ERS_RESULT_NONE. This return value is otherwise used by drivers abstaining from their vote on how to proceed with recovery and currently also not supported by s390's recovery code. So to support missing callbacks in-line with other implementors of the recovery flow, also handle PCI_ERS_RESULT_NONE. Since s390 only does per PCI function recovery and does not do voting, treat PCI_ERS_RESULT_NONE optimistically and proceed through recovery unless other failures prevent this. Reviewed-by: Farhan Ali Reviewed-by: Julian Ruess Signed-off-by: Niklas Schnelle Signed-off-by: Alexander Gordeev --- arch/s390/pci/pci_event.c | 44 ++++++++++++++++++++++++++------------- 1 file changed, 29 insertions(+), 15 deletions(-) diff --git a/arch/s390/pci/pci_event.c b/arch/s390/pci/pci_event.c index 6c8922ad70f3..d930416d4c90 100644 --- a/arch/s390/pci/pci_event.c +++ b/arch/s390/pci/pci_event.c @@ -54,6 +54,7 @@ static inline bool ers_result_indicates_abort(pci_ers_result_t ers_res) case PCI_ERS_RESULT_CAN_RECOVER: case PCI_ERS_RESULT_RECOVERED: case PCI_ERS_RESULT_NEED_RESET: + case PCI_ERS_RESULT_NONE: return false; default: return true; @@ -78,10 +79,6 @@ static bool is_driver_supported(struct pci_driver *driver) return false; if (!driver->err_handler->error_detected) return false; - if (!driver->err_handler->slot_reset) - return false; - if (!driver->err_handler->resume) - return false; return true; } @@ -118,16 +115,18 @@ static pci_ers_result_t zpci_event_do_error_state_clear(struct pci_dev *pdev, return PCI_ERS_RESULT_NEED_RESET; } - if (driver->err_handler->mmio_enabled) { + if (driver->err_handler->mmio_enabled) ers_res = driver->err_handler->mmio_enabled(pdev); - if (ers_result_indicates_abort(ers_res)) { - pr_info("%s: Automatic recovery failed after MMIO re-enable\n", - pci_name(pdev)); - return ers_res; - } else if (ers_res == PCI_ERS_RESULT_NEED_RESET) { - pr_debug("%s: Driver needs reset to recover\n", pci_name(pdev)); - return ers_res; - } + else + ers_res = PCI_ERS_RESULT_NONE; + + if (ers_result_indicates_abort(ers_res)) { + pr_info("%s: Automatic recovery failed after MMIO re-enable\n", + pci_name(pdev)); + return ers_res; + } else if (ers_res == PCI_ERS_RESULT_NEED_RESET) { + pr_debug("%s: Driver needs reset to recover\n", pci_name(pdev)); + return ers_res; } pr_debug("%s: Unblocking DMA\n", pci_name(pdev)); @@ -154,7 +153,12 @@ static pci_ers_result_t zpci_event_do_reset(struct pci_dev *pdev, return ers_res; } pdev->error_state = pci_channel_io_normal; - ers_res = driver->err_handler->slot_reset(pdev); + + if (driver->err_handler->slot_reset) + ers_res = driver->err_handler->slot_reset(pdev); + else + ers_res = PCI_ERS_RESULT_NONE; + if (ers_result_indicates_abort(ers_res)) { pr_info("%s: Automatic recovery failed after slot reset\n", pci_name(pdev)); return ers_res; @@ -218,7 +222,7 @@ static pci_ers_result_t zpci_event_attempt_error_recovery(struct pci_dev *pdev) goto out_unlock; } - if (ers_res == PCI_ERS_RESULT_CAN_RECOVER) { + if (ers_res != PCI_ERS_RESULT_NEED_RESET) { ers_res = zpci_event_do_error_state_clear(pdev, driver); if (ers_result_indicates_abort(ers_res)) { status_str = "failed (abort on MMIO enable)"; @@ -229,6 +233,16 @@ static pci_ers_result_t zpci_event_attempt_error_recovery(struct pci_dev *pdev) if (ers_res == PCI_ERS_RESULT_NEED_RESET) ers_res = zpci_event_do_reset(pdev, driver); + /* + * ers_res can be PCI_ERS_RESULT_NONE either because the driver + * decided to return it, indicating that it abstains from voting + * on how to recover, or because it didn't implement the callback. + * Both cases assume, that if there is nothing else causing a + * disconnect, we recovered successfully. + */ + if (ers_res == PCI_ERS_RESULT_NONE) + ers_res = PCI_ERS_RESULT_RECOVERED; + if (ers_res != PCI_ERS_RESULT_RECOVERED) { pr_err("%s: Automatic recovery failed; operator intervention is required\n", pci_name(pdev)); From d0b3b7b22dfa1f4b515fd3a295b3fd958f9e81af Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Sun, 29 Jun 2025 13:09:04 -0700 Subject: [PATCH 259/289] Linux 6.16-rc4 --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index f884dfe10246..1c9ea229809f 100644 --- a/Makefile +++ b/Makefile @@ -2,7 +2,7 @@ VERSION = 6 PATCHLEVEL = 16 SUBLEVEL = 0 -EXTRAVERSION = -rc3 +EXTRAVERSION = -rc4 NAME = Baby Opossum Posse # *DOCUMENTATION* From 6f11adcc6f36ffd8f33dbdf5f5ce073368975bc3 Mon Sep 17 00:00:00 2001 From: Jens Axboe Date: Sun, 29 Jun 2025 16:48:28 -0600 Subject: [PATCH 260/289] io_uring: gate REQ_F_ISREG on !S_ANON_INODE as well io_uring marks a request as dealing with a regular file on S_ISREG. This drives things like retries on short reads or writes, which is generally not expected on a regular file (or bdev). Applications tend to not expect that, so io_uring tries hard to ensure it doesn't deliver short IO on regular files. However, a recent commit added S_IFREG to anonymous inodes. When io_uring is used to read from various things that are backed by anon inodes, like eventfd, timerfd, etc, then it'll now all of a sudden wait for more data when rather than deliver what was read or written in a single operation. This breaks applications that issue reads on anon inodes, if they ask for more data than a single read delivers. Add a check for !S_ANON_INODE as well before setting REQ_F_ISREG to prevent that. Cc: Christian Brauner Cc: stable@vger.kernel.org Link: https://github.com/ghostty-org/ghostty/discussions/7720 Fixes: cfd86ef7e8e7 ("anon_inode: use a proper mode internally") Signed-off-by: Jens Axboe --- io_uring/io_uring.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c index 5111ec040c53..73648d26a622 100644 --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -1666,11 +1666,12 @@ static void io_iopoll_req_issued(struct io_kiocb *req, unsigned int issue_flags) io_req_flags_t io_file_get_flags(struct file *file) { + struct inode *inode = file_inode(file); io_req_flags_t res = 0; BUILD_BUG_ON(REQ_F_ISREG_BIT != REQ_F_SUPPORT_NOWAIT_BIT + 1); - if (S_ISREG(file_inode(file)->i_mode)) + if (S_ISREG(inode->i_mode) && !(inode->i_flags & S_ANON_INODE)) res |= REQ_F_ISREG; if ((file->f_flags & O_NONBLOCK) || (file->f_mode & FMODE_NOWAIT)) res |= REQ_F_SUPPORT_NOWAIT; From 9e9b46672b1daac814b384286c21fb8332a87392 Mon Sep 17 00:00:00 2001 From: Youling Tang Date: Mon, 30 Jun 2025 09:11:48 +0800 Subject: [PATCH 261/289] xfs: add FALLOC_FL_ALLOCATE_RANGE to supported flags mask Add FALLOC_FL_ALLOCATE_RANGE to the set of supported fallocate flags in XFS_FALLOC_FL_SUPPORTED. This change improves code clarity and maintains by explicitly showing this flag in the supported flags mask. Note that since FALLOC_FL_ALLOCATE_RANGE is defined as 0x00, this addition has no functional modifications. Reviewed-by: Carlos Maiolino Signed-off-by: Youling Tang Reviewed-by: Christoph Hellwig Signed-off-by: Carlos Maiolino --- fs/xfs/xfs_file.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c index 48254a72071b..0b41b18debf3 100644 --- a/fs/xfs/xfs_file.c +++ b/fs/xfs/xfs_file.c @@ -1335,9 +1335,10 @@ xfs_falloc_allocate_range( } #define XFS_FALLOC_FL_SUPPORTED \ - (FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE | \ - FALLOC_FL_COLLAPSE_RANGE | FALLOC_FL_ZERO_RANGE | \ - FALLOC_FL_INSERT_RANGE | FALLOC_FL_UNSHARE_RANGE) + (FALLOC_FL_ALLOCATE_RANGE | FALLOC_FL_KEEP_SIZE | \ + FALLOC_FL_PUNCH_HOLE | FALLOC_FL_COLLAPSE_RANGE | \ + FALLOC_FL_ZERO_RANGE | FALLOC_FL_INSERT_RANGE | \ + FALLOC_FL_UNSHARE_RANGE) STATIC long __xfs_file_fallocate( From 60f7f4afaf6d09c27971f30f5ab69a3aab78b28f Mon Sep 17 00:00:00 2001 From: Mark Bloch Date: Thu, 26 Jun 2025 18:42:52 -0700 Subject: [PATCH 262/289] MAINTAINERS: Add myself as mlx5 core and mlx5e co-maintainer I have been working on mlx5 related code for several years, contributing features, code reviews, and occasional maintainer tasks when needed. This patch makes my maintainer role official. Signed-off-by: Mark Bloch Signed-off-by: Saeed Mahameed Link: https://patch.msgid.link/20250627014252.1262592-1-saeed@kernel.org Signed-off-by: Jakub Kicinski --- MAINTAINERS | 2 ++ 1 file changed, 2 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index bb7e5f8c4455..5f499ab23d6e 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -15550,6 +15550,7 @@ F: drivers/net/ethernet/mellanox/mlx4/en_* MELLANOX ETHERNET DRIVER (mlx5e) M: Saeed Mahameed M: Tariq Toukan +M: Mark Bloch L: netdev@vger.kernel.org S: Maintained W: https://www.nvidia.com/networking/ @@ -15619,6 +15620,7 @@ MELLANOX MLX5 core VPI driver M: Saeed Mahameed M: Leon Romanovsky M: Tariq Toukan +M: Mark Bloch L: netdev@vger.kernel.org L: linux-rdma@vger.kernel.org S: Maintained From e39ed71c7a26e8e94c637e222bc373b511ca127f Mon Sep 17 00:00:00 2001 From: Jiawen Wu Date: Thu, 26 Jun 2025 16:51:53 +0800 Subject: [PATCH 263/289] net: txgbe: fix the issue of TX failure There is a occasional problem that ping is failed between AML devices. That is because the manual enablement of the security Tx path on the hardware is missing, no matter what its previous state was. Fixes: 6f8b4c01a8cd ("net: txgbe: Implement PHYLINK for AML 25G/10G devices") Signed-off-by: Jiawen Wu Reviewed-by: Simon Horman Link: https://patch.msgid.link/5BDFB14C57D1C42A+20250626085153.86122-1-jiawenwu@trustnetic.com Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/wangxun/txgbe/txgbe_aml.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/ethernet/wangxun/txgbe/txgbe_aml.c b/drivers/net/ethernet/wangxun/txgbe/txgbe_aml.c index 7dbcf41750c1..dc87ccad9652 100644 --- a/drivers/net/ethernet/wangxun/txgbe/txgbe_aml.c +++ b/drivers/net/ethernet/wangxun/txgbe/txgbe_aml.c @@ -294,6 +294,7 @@ static void txgbe_mac_link_up_aml(struct phylink_config *config, wx_fc_enable(wx, tx_pause, rx_pause); txgbe_reconfig_mac(wx); + txgbe_enable_sec_tx_path(wx); txcfg = rd32(wx, TXGBE_AML_MAC_TX_CFG); txcfg &= ~TXGBE_AML_MAC_TX_CFG_SPEED_MASK; From 6c7ffc9af7186ed79403a3ffee9a1e5199fc7450 Mon Sep 17 00:00:00 2001 From: Oleksij Rempel Date: Fri, 27 Jun 2025 07:13:46 +0200 Subject: [PATCH 264/289] net: usb: lan78xx: fix WARN in __netif_napi_del_locked on disconnect Remove redundant netif_napi_del() call from disconnect path. A WARN may be triggered in __netif_napi_del_locked() during USB device disconnect: WARNING: CPU: 0 PID: 11 at net/core/dev.c:7417 __netif_napi_del_locked+0x2b4/0x350 This happens because netif_napi_del() is called in the disconnect path while NAPI is still enabled. However, it is not necessary to call netif_napi_del() explicitly, since unregister_netdev() will handle NAPI teardown automatically and safely. Removing the redundant call avoids triggering the warning. Full trace: lan78xx 1-1:1.0 enu1: Failed to read register index 0x000000c4. ret = -ENODEV lan78xx 1-1:1.0 enu1: Failed to set MAC down with error -ENODEV lan78xx 1-1:1.0 enu1: Link is Down lan78xx 1-1:1.0 enu1: Failed to read register index 0x00000120. ret = -ENODEV ------------[ cut here ]------------ WARNING: CPU: 0 PID: 11 at net/core/dev.c:7417 __netif_napi_del_locked+0x2b4/0x350 Modules linked in: flexcan can_dev fuse CPU: 0 UID: 0 PID: 11 Comm: kworker/0:1 Not tainted 6.16.0-rc2-00624-ge926949dab03 #9 PREEMPT Hardware name: SKOV IMX8MP CPU revC - bd500 (DT) Workqueue: usb_hub_wq hub_event pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : __netif_napi_del_locked+0x2b4/0x350 lr : __netif_napi_del_locked+0x7c/0x350 sp : ffffffc085b673c0 x29: ffffffc085b673c0 x28: ffffff800b7f2000 x27: ffffff800b7f20d8 x26: ffffff80110bcf58 x25: ffffff80110bd978 x24: 1ffffff0022179eb x23: ffffff80110bc000 x22: ffffff800b7f5000 x21: ffffff80110bc000 x20: ffffff80110bcf38 x19: ffffff80110bcf28 x18: dfffffc000000000 x17: ffffffc081578940 x16: ffffffc08284cee0 x15: 0000000000000028 x14: 0000000000000006 x13: 0000000000040000 x12: ffffffb0022179e8 x11: 1ffffff0022179e7 x10: ffffffb0022179e7 x9 : dfffffc000000000 x8 : 0000004ffdde8619 x7 : ffffff80110bcf3f x6 : 0000000000000001 x5 : ffffff80110bcf38 x4 : ffffff80110bcf38 x3 : 0000000000000000 x2 : 0000000000000000 x1 : 1ffffff0022179e7 x0 : 0000000000000000 Call trace: __netif_napi_del_locked+0x2b4/0x350 (P) lan78xx_disconnect+0xf4/0x360 usb_unbind_interface+0x158/0x718 device_remove+0x100/0x150 device_release_driver_internal+0x308/0x478 device_release_driver+0x1c/0x30 bus_remove_device+0x1a8/0x368 device_del+0x2e0/0x7b0 usb_disable_device+0x244/0x540 usb_disconnect+0x220/0x758 hub_event+0x105c/0x35e0 process_one_work+0x760/0x17b0 worker_thread+0x768/0xce8 kthread+0x3bc/0x690 ret_from_fork+0x10/0x20 irq event stamp: 211604 hardirqs last enabled at (211603): [] _raw_spin_unlock_irqrestore+0x84/0x98 hardirqs last disabled at (211604): [] el1_dbg+0x24/0x80 softirqs last enabled at (211296): [] handle_softirqs+0x820/0xbc8 softirqs last disabled at (210993): [] __do_softirq+0x18/0x20 ---[ end trace 0000000000000000 ]--- lan78xx 1-1:1.0 enu1: failed to kill vid 0081/0 Fixes: ec4c7e12396b ("lan78xx: Introduce NAPI polling support") Suggested-by: Jakub Kicinski Signed-off-by: Oleksij Rempel Link: https://patch.msgid.link/20250627051346.276029-1-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski --- drivers/net/usb/lan78xx.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c index f53e255116ea..e3ca6e91efe1 100644 --- a/drivers/net/usb/lan78xx.c +++ b/drivers/net/usb/lan78xx.c @@ -4567,8 +4567,6 @@ static void lan78xx_disconnect(struct usb_interface *intf) if (!dev) return; - netif_napi_del(&dev->napi); - udev = interface_to_usbdev(intf); net = dev->net; From 3b2c45cb1b508014cee59cbabb1fc936f2f2dd3f Mon Sep 17 00:00:00 2001 From: Lukas Bulwahn Date: Fri, 27 Jun 2025 15:44:53 +0200 Subject: [PATCH 265/289] MAINTAINERS: adjust file entry after renaming rzv2h-gbeth dtb Commit d53320aeef18 ("dt-bindings: net: Rename renesas,r9a09g057-gbeth.yaml") renames the net devicetree binding renesas,r9a09g057-gbeth.yaml to renesas,rzv2h-gbeth.yaml, but misses to adjust the file entry in the RENESAS RZ/V2H(P) DWMAC GBETH GLUE LAYER DRIVER section in MAINTAINERS. Adjust the file entry after this file renaming. Signed-off-by: Lukas Bulwahn Reviewed-by: Geert Uytterhoeven Reviewed-by: Simon Horman Reviewed-by: Lad Prabhakar Link: https://patch.msgid.link/20250627134453.51780-1-lukas.bulwahn@redhat.com Signed-off-by: Jakub Kicinski --- MAINTAINERS | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index 5f499ab23d6e..84c9e2dbe3f1 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -21178,7 +21178,7 @@ M: Lad Prabhakar L: netdev@vger.kernel.org L: linux-renesas-soc@vger.kernel.org S: Maintained -F: Documentation/devicetree/bindings/net/renesas,r9a09g057-gbeth.yaml +F: Documentation/devicetree/bindings/net/renesas,rzv2h-gbeth.yaml F: drivers/net/ethernet/stmicro/stmmac/dwmac-renesas-gbeth.c RENESAS RZ/V2H(P) USB2PHY PORT RESET DRIVER From f77bf1ebf8ff6301ccdbc346f7b52db928f9cbf8 Mon Sep 17 00:00:00 2001 From: Michal Swiatkowski Date: Thu, 22 May 2025 10:52:06 +0200 Subject: [PATCH 266/289] idpf: return 0 size for RSS key if not supported Returning -EOPNOTSUPP from function returning u32 is leading to cast and invalid size value as a result. -EOPNOTSUPP as a size probably will lead to allocation fail. Command: ethtool -x eth0 It is visible on all devices that don't have RSS caps set. [ 136.615917] Call Trace: [ 136.615921] [ 136.615927] ? __warn+0x89/0x130 [ 136.615942] ? __alloc_frozen_pages_noprof+0x322/0x330 [ 136.615953] ? report_bug+0x164/0x190 [ 136.615968] ? handle_bug+0x58/0x90 [ 136.615979] ? exc_invalid_op+0x17/0x70 [ 136.615987] ? asm_exc_invalid_op+0x1a/0x20 [ 136.616001] ? rss_prepare_get.constprop.0+0xb9/0x170 [ 136.616016] ? __alloc_frozen_pages_noprof+0x322/0x330 [ 136.616028] __alloc_pages_noprof+0xe/0x20 [ 136.616038] ___kmalloc_large_node+0x80/0x110 [ 136.616072] __kmalloc_large_node_noprof+0x1d/0xa0 [ 136.616081] __kmalloc_noprof+0x32c/0x4c0 [ 136.616098] ? rss_prepare_get.constprop.0+0xb9/0x170 [ 136.616105] rss_prepare_get.constprop.0+0xb9/0x170 [ 136.616114] ethnl_default_doit+0x107/0x3d0 [ 136.616131] genl_family_rcv_msg_doit+0x100/0x160 [ 136.616147] genl_rcv_msg+0x1b8/0x2c0 [ 136.616156] ? __pfx_ethnl_default_doit+0x10/0x10 [ 136.616168] ? __pfx_genl_rcv_msg+0x10/0x10 [ 136.616176] netlink_rcv_skb+0x58/0x110 [ 136.616186] genl_rcv+0x28/0x40 [ 136.616195] netlink_unicast+0x19b/0x290 [ 136.616206] netlink_sendmsg+0x222/0x490 [ 136.616215] __sys_sendto+0x1fd/0x210 [ 136.616233] __x64_sys_sendto+0x24/0x30 [ 136.616242] do_syscall_64+0x82/0x160 [ 136.616252] ? __sys_recvmsg+0x83/0xe0 [ 136.616265] ? syscall_exit_to_user_mode+0x10/0x210 [ 136.616275] ? do_syscall_64+0x8e/0x160 [ 136.616282] ? __count_memcg_events+0xa1/0x130 [ 136.616295] ? count_memcg_events.constprop.0+0x1a/0x30 [ 136.616306] ? handle_mm_fault+0xae/0x2d0 [ 136.616319] ? do_user_addr_fault+0x379/0x670 [ 136.616328] ? clear_bhb_loop+0x45/0xa0 [ 136.616340] ? clear_bhb_loop+0x45/0xa0 [ 136.616349] ? clear_bhb_loop+0x45/0xa0 [ 136.616359] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 136.616369] RIP: 0033:0x7fd30ba7b047 [ 136.616376] Code: 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 80 3d bd d5 0c 00 00 41 89 ca 74 10 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 71 c3 55 48 83 ec 30 44 89 4c 24 2c 4c 89 44 [ 136.616381] RSP: 002b:00007ffde1796d68 EFLAGS: 00000202 ORIG_RAX: 000000000000002c [ 136.616388] RAX: ffffffffffffffda RBX: 000055d7bd89f2a0 RCX: 00007fd30ba7b047 [ 136.616392] RDX: 0000000000000028 RSI: 000055d7bd89f3b0 RDI: 0000000000000003 [ 136.616396] RBP: 00007ffde1796e10 R08: 00007fd30bb4e200 R09: 000000000000000c [ 136.616399] R10: 0000000000000000 R11: 0000000000000202 R12: 000055d7bd89f340 [ 136.616403] R13: 000055d7bd89f3b0 R14: 000055d78943f200 R15: 0000000000000000 Fixes: 02cbfba1add5 ("idpf: add ethtool callbacks") Reviewed-by: Ahmed Zaki Signed-off-by: Michal Swiatkowski Reviewed-by: Simon Horman Tested-by: Samuel Salin Signed-off-by: Tony Nguyen --- drivers/net/ethernet/intel/idpf/idpf_ethtool.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/intel/idpf/idpf_ethtool.c b/drivers/net/ethernet/intel/idpf/idpf_ethtool.c index 9bdb309b668e..eaf7a2606faa 100644 --- a/drivers/net/ethernet/intel/idpf/idpf_ethtool.c +++ b/drivers/net/ethernet/intel/idpf/idpf_ethtool.c @@ -47,7 +47,7 @@ static u32 idpf_get_rxfh_key_size(struct net_device *netdev) struct idpf_vport_user_config_data *user_config; if (!idpf_is_cap_ena_all(np->adapter, IDPF_RSS_CAPS, IDPF_CAP_RSS)) - return -EOPNOTSUPP; + return 0; user_config = &np->adapter->vport_config[np->vport_idx]->user_config; @@ -66,7 +66,7 @@ static u32 idpf_get_rxfh_indir_size(struct net_device *netdev) struct idpf_vport_user_config_data *user_config; if (!idpf_is_cap_ena_all(np->adapter, IDPF_RSS_CAPS, IDPF_CAP_RSS)) - return -EOPNOTSUPP; + return 0; user_config = &np->adapter->vport_config[np->vport_idx]->user_config; From b2beb5bb2cd90d7939e470ed4da468683f41baa3 Mon Sep 17 00:00:00 2001 From: Ahmed Zaki Date: Fri, 23 May 2025 14:55:37 -0600 Subject: [PATCH 267/289] idpf: convert control queue mutex to a spinlock With VIRTCHNL2_CAP_MACFILTER enabled, the following warning is generated on module load: [ 324.701677] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:578 [ 324.701684] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1582, name: NetworkManager [ 324.701689] preempt_count: 201, expected: 0 [ 324.701693] RCU nest depth: 0, expected: 0 [ 324.701697] 2 locks held by NetworkManager/1582: [ 324.701702] #0: ffffffff9f7be770 (rtnl_mutex){....}-{3:3}, at: rtnl_newlink+0x791/0x21e0 [ 324.701730] #1: ff1100216c380368 (_xmit_ETHER){....}-{2:2}, at: __dev_open+0x3f0/0x870 [ 324.701749] Preemption disabled at: [ 324.701752] [] __dev_open+0x3dd/0x870 [ 324.701765] CPU: 30 UID: 0 PID: 1582 Comm: NetworkManager Not tainted 6.15.0-rc5+ #2 PREEMPT(voluntary) [ 324.701771] Hardware name: Intel Corporation M50FCP2SBSTD/M50FCP2SBSTD, BIOS SE5C741.86B.01.01.0001.2211140926 11/14/2022 [ 324.701774] Call Trace: [ 324.701777] [ 324.701779] dump_stack_lvl+0x5d/0x80 [ 324.701788] ? __dev_open+0x3dd/0x870 [ 324.701793] __might_resched.cold+0x1ef/0x23d <..> [ 324.701818] __mutex_lock+0x113/0x1b80 <..> [ 324.701917] idpf_ctlq_clean_sq+0xad/0x4b0 [idpf] [ 324.701935] ? kasan_save_track+0x14/0x30 [ 324.701941] idpf_mb_clean+0x143/0x380 [idpf] <..> [ 324.701991] idpf_send_mb_msg+0x111/0x720 [idpf] [ 324.702009] idpf_vc_xn_exec+0x4cc/0x990 [idpf] [ 324.702021] ? rcu_is_watching+0x12/0xc0 [ 324.702035] idpf_add_del_mac_filters+0x3ed/0xb50 [idpf] <..> [ 324.702122] __hw_addr_sync_dev+0x1cf/0x300 [ 324.702126] ? find_held_lock+0x32/0x90 [ 324.702134] idpf_set_rx_mode+0x317/0x390 [idpf] [ 324.702152] __dev_open+0x3f8/0x870 [ 324.702159] ? __pfx___dev_open+0x10/0x10 [ 324.702174] __dev_change_flags+0x443/0x650 <..> [ 324.702208] netif_change_flags+0x80/0x160 [ 324.702218] do_setlink.isra.0+0x16a0/0x3960 <..> [ 324.702349] rtnl_newlink+0x12fd/0x21e0 The sequence is as follows: rtnl_newlink()-> __dev_change_flags()-> __dev_open()-> dev_set_rx_mode() - > # disables BH and grabs "dev->addr_list_lock" idpf_set_rx_mode() -> # proceed only if VIRTCHNL2_CAP_MACFILTER is ON __dev_uc_sync() -> idpf_add_mac_filter -> idpf_add_del_mac_filters -> idpf_send_mb_msg() -> idpf_mb_clean() -> idpf_ctlq_clean_sq() # mutex_lock(cq_lock) Fix by converting cq_lock to a spinlock. All operations under the new lock are safe except freeing the DMA memory, which may use vunmap(). Fix by requesting a contiguous physical memory for the DMA mapping. Fixes: a251eee62133 ("idpf: add SRIOV support and other ndo_ops") Reviewed-by: Aleksandr Loktionov Signed-off-by: Ahmed Zaki Reviewed-by: Simon Horman Tested-by: Samuel Salin Signed-off-by: Tony Nguyen --- .../net/ethernet/intel/idpf/idpf_controlq.c | 23 +++++++++---------- .../ethernet/intel/idpf/idpf_controlq_api.h | 2 +- drivers/net/ethernet/intel/idpf/idpf_lib.c | 12 ++++++---- 3 files changed, 20 insertions(+), 17 deletions(-) diff --git a/drivers/net/ethernet/intel/idpf/idpf_controlq.c b/drivers/net/ethernet/intel/idpf/idpf_controlq.c index b28991dd1870..48b8e184f3db 100644 --- a/drivers/net/ethernet/intel/idpf/idpf_controlq.c +++ b/drivers/net/ethernet/intel/idpf/idpf_controlq.c @@ -96,7 +96,7 @@ static void idpf_ctlq_init_rxq_bufs(struct idpf_ctlq_info *cq) */ static void idpf_ctlq_shutdown(struct idpf_hw *hw, struct idpf_ctlq_info *cq) { - mutex_lock(&cq->cq_lock); + spin_lock(&cq->cq_lock); /* free ring buffers and the ring itself */ idpf_ctlq_dealloc_ring_res(hw, cq); @@ -104,8 +104,7 @@ static void idpf_ctlq_shutdown(struct idpf_hw *hw, struct idpf_ctlq_info *cq) /* Set ring_size to 0 to indicate uninitialized queue */ cq->ring_size = 0; - mutex_unlock(&cq->cq_lock); - mutex_destroy(&cq->cq_lock); + spin_unlock(&cq->cq_lock); } /** @@ -173,7 +172,7 @@ int idpf_ctlq_add(struct idpf_hw *hw, idpf_ctlq_init_regs(hw, cq, is_rxq); - mutex_init(&cq->cq_lock); + spin_lock_init(&cq->cq_lock); list_add(&cq->cq_list, &hw->cq_list_head); @@ -272,7 +271,7 @@ int idpf_ctlq_send(struct idpf_hw *hw, struct idpf_ctlq_info *cq, int err = 0; int i; - mutex_lock(&cq->cq_lock); + spin_lock(&cq->cq_lock); /* Ensure there are enough descriptors to send all messages */ num_desc_avail = IDPF_CTLQ_DESC_UNUSED(cq); @@ -332,7 +331,7 @@ int idpf_ctlq_send(struct idpf_hw *hw, struct idpf_ctlq_info *cq, wr32(hw, cq->reg.tail, cq->next_to_use); err_unlock: - mutex_unlock(&cq->cq_lock); + spin_unlock(&cq->cq_lock); return err; } @@ -364,7 +363,7 @@ int idpf_ctlq_clean_sq(struct idpf_ctlq_info *cq, u16 *clean_count, if (*clean_count > cq->ring_size) return -EBADR; - mutex_lock(&cq->cq_lock); + spin_lock(&cq->cq_lock); ntc = cq->next_to_clean; @@ -397,7 +396,7 @@ int idpf_ctlq_clean_sq(struct idpf_ctlq_info *cq, u16 *clean_count, cq->next_to_clean = ntc; - mutex_unlock(&cq->cq_lock); + spin_unlock(&cq->cq_lock); /* Return number of descriptors actually cleaned */ *clean_count = i; @@ -435,7 +434,7 @@ int idpf_ctlq_post_rx_buffs(struct idpf_hw *hw, struct idpf_ctlq_info *cq, if (*buff_count > 0) buffs_avail = true; - mutex_lock(&cq->cq_lock); + spin_lock(&cq->cq_lock); if (tbp >= cq->ring_size) tbp = 0; @@ -524,7 +523,7 @@ post_buffs_out: wr32(hw, cq->reg.tail, cq->next_to_post); } - mutex_unlock(&cq->cq_lock); + spin_unlock(&cq->cq_lock); /* return the number of buffers that were not posted */ *buff_count = *buff_count - i; @@ -552,7 +551,7 @@ int idpf_ctlq_recv(struct idpf_ctlq_info *cq, u16 *num_q_msg, u16 i; /* take the lock before we start messing with the ring */ - mutex_lock(&cq->cq_lock); + spin_lock(&cq->cq_lock); ntc = cq->next_to_clean; @@ -614,7 +613,7 @@ int idpf_ctlq_recv(struct idpf_ctlq_info *cq, u16 *num_q_msg, cq->next_to_clean = ntc; - mutex_unlock(&cq->cq_lock); + spin_unlock(&cq->cq_lock); *num_q_msg = i; if (*num_q_msg == 0) diff --git a/drivers/net/ethernet/intel/idpf/idpf_controlq_api.h b/drivers/net/ethernet/intel/idpf/idpf_controlq_api.h index 9642494a67d8..3414c5f9a831 100644 --- a/drivers/net/ethernet/intel/idpf/idpf_controlq_api.h +++ b/drivers/net/ethernet/intel/idpf/idpf_controlq_api.h @@ -99,7 +99,7 @@ struct idpf_ctlq_info { enum idpf_ctlq_type cq_type; int q_id; - struct mutex cq_lock; /* control queue lock */ + spinlock_t cq_lock; /* control queue lock */ /* used for interrupt processing */ u16 next_to_use; u16 next_to_clean; diff --git a/drivers/net/ethernet/intel/idpf/idpf_lib.c b/drivers/net/ethernet/intel/idpf/idpf_lib.c index 4eb20ec2accb..80382ff4a5fa 100644 --- a/drivers/net/ethernet/intel/idpf/idpf_lib.c +++ b/drivers/net/ethernet/intel/idpf/idpf_lib.c @@ -2314,8 +2314,12 @@ void *idpf_alloc_dma_mem(struct idpf_hw *hw, struct idpf_dma_mem *mem, u64 size) struct idpf_adapter *adapter = hw->back; size_t sz = ALIGN(size, 4096); - mem->va = dma_alloc_coherent(&adapter->pdev->dev, sz, - &mem->pa, GFP_KERNEL); + /* The control queue resources are freed under a spinlock, contiguous + * pages will avoid IOMMU remapping and the use vmap (and vunmap in + * dma_free_*() path. + */ + mem->va = dma_alloc_attrs(&adapter->pdev->dev, sz, &mem->pa, + GFP_KERNEL, DMA_ATTR_FORCE_CONTIGUOUS); mem->size = sz; return mem->va; @@ -2330,8 +2334,8 @@ void idpf_free_dma_mem(struct idpf_hw *hw, struct idpf_dma_mem *mem) { struct idpf_adapter *adapter = hw->back; - dma_free_coherent(&adapter->pdev->dev, mem->size, - mem->va, mem->pa); + dma_free_attrs(&adapter->pdev->dev, mem->size, + mem->va, mem->pa, DMA_ATTR_FORCE_CONTIGUOUS); mem->size = 0; mem->va = NULL; mem->pa = 0; From 0325143b59c6c6d79987afc57d2456e7a20d13b7 Mon Sep 17 00:00:00 2001 From: Vitaly Lifshits Date: Wed, 11 Jun 2025 15:52:54 +0300 Subject: [PATCH 268/289] igc: disable L1.2 PCI-E link substate to avoid performance issue I226 devices advertise support for the PCI-E link L1.2 substate. However, due to a hardware limitation, the exit latency from this low-power state is longer than the packet buffer can tolerate under high traffic conditions. This can lead to packet loss and degraded performance. To mitigate this, disable the L1.2 substate. The increased power draw between L1.1 and L1.2 is insignificant. Fixes: 43546211738e ("igc: Add new device ID's") Link: https://lore.kernel.org/intel-wired-lan/15248b4f-3271-42dd-8e35-02bfc92b25e1@intel.com Signed-off-by: Vitaly Lifshits Reviewed-by: Aleksandr Loktionov Tested-by: Mor Bar-Gabay Signed-off-by: Tony Nguyen --- drivers/net/ethernet/intel/igc/igc_main.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c index 686793c539f2..031c332f66c4 100644 --- a/drivers/net/ethernet/intel/igc/igc_main.c +++ b/drivers/net/ethernet/intel/igc/igc_main.c @@ -7115,6 +7115,10 @@ static int igc_probe(struct pci_dev *pdev, adapter->port_num = hw->bus.func; adapter->msg_enable = netif_msg_init(debug, DEFAULT_MSG_ENABLE); + /* Disable ASPM L1.2 on I226 devices to avoid packet loss */ + if (igc_is_device_id_i226(hw)) + pci_disable_link_state(pdev, PCIE_LINK_STATE_L1_2); + err = pci_save_state(pdev); if (err) goto err_ioremap; @@ -7500,6 +7504,9 @@ static int __igc_resume(struct device *dev, bool rpm) pci_enable_wake(pdev, PCI_D3hot, 0); pci_enable_wake(pdev, PCI_D3cold, 0); + if (igc_is_device_id_i226(hw)) + pci_disable_link_state(pdev, PCIE_LINK_STATE_L1_2); + if (igc_init_interrupt_scheme(adapter, true)) { netdev_err(netdev, "Unable to allocate memory for queues\n"); return -ENOMEM; @@ -7625,6 +7632,9 @@ static pci_ers_result_t igc_io_slot_reset(struct pci_dev *pdev) pci_enable_wake(pdev, PCI_D3hot, 0); pci_enable_wake(pdev, PCI_D3cold, 0); + if (igc_is_device_id_i226(hw)) + pci_disable_link_state_locked(pdev, PCIE_LINK_STATE_L1_2); + /* In case of PCI error, adapter loses its HW address * so we should re-assign it here. */ From e6ed134a4ef592fe1fd0cafac9683813b3c8f3e8 Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Mon, 30 Jun 2025 14:36:40 -0500 Subject: [PATCH 269/289] lib: test_objagg: Set error message in check_expect_hints_stats() Smatch complains that the error message isn't set in the caller: lib/test_objagg.c:923 test_hints_case2() error: uninitialized symbol 'errmsg'. This static checker warning only showed up after a recent refactoring but the bug dates back to when the code was originally added. This likely doesn't affect anything in real life. Reported-by: kernel test robot Closes: https://lore.kernel.org/r/202506281403.DsuyHFTZ-lkp@intel.com/ Fixes: 0a020d416d0a ("lib: introduce initial implementation of object aggregation manager") Signed-off-by: Dan Carpenter Reviewed-by: Ido Schimmel Reviewed-by: Simon Horman Link: https://patch.msgid.link/8548f423-2e3b-4bb7-b816-5041de2762aa@sabinyo.mountain Signed-off-by: Jakub Kicinski --- lib/test_objagg.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/lib/test_objagg.c b/lib/test_objagg.c index d34df4306b87..222b39fc2629 100644 --- a/lib/test_objagg.c +++ b/lib/test_objagg.c @@ -899,8 +899,10 @@ static int check_expect_hints_stats(struct objagg_hints *objagg_hints, int err; stats = objagg_hints_stats_get(objagg_hints); - if (IS_ERR(stats)) + if (IS_ERR(stats)) { + *errmsg = "objagg_hints_stats_get() failed."; return PTR_ERR(stats); + } err = __check_expect_stats(stats, expect_stats, errmsg); objagg_stats_put(stats); return err; From 42fd432fe6d320323215ebdf4de4d0d7e56e6792 Mon Sep 17 00:00:00 2001 From: Raju Rangoju Date: Tue, 1 Jul 2025 00:56:36 +0530 Subject: [PATCH 270/289] amd-xgbe: align CL37 AN sequence as per databook Update the Clause 37 Auto-Negotiation implementation to properly align with the PCS hardware specifications: - Fix incorrect bit settings in Link Status and Link Duplex fields - Implement missing sequence steps 2 and 7 These changes ensure CL37 auto-negotiation protocol follows the exact sequence patterns as specified in the hardware databook. Fixes: 1bf40ada6290 ("amd-xgbe: Add support for clause 37 auto-negotiation") Signed-off-by: Raju Rangoju Link: https://patch.msgid.link/20250630192636.3838291-1-Raju.Rangoju@amd.com Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/amd/xgbe/xgbe-common.h | 2 ++ drivers/net/ethernet/amd/xgbe/xgbe-mdio.c | 9 +++++++++ drivers/net/ethernet/amd/xgbe/xgbe.h | 4 ++-- 3 files changed, 13 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-common.h b/drivers/net/ethernet/amd/xgbe/xgbe-common.h index e1296cbf4ff3..9316de4126cf 100644 --- a/drivers/net/ethernet/amd/xgbe/xgbe-common.h +++ b/drivers/net/ethernet/amd/xgbe/xgbe-common.h @@ -1269,6 +1269,8 @@ #define MDIO_VEND2_CTRL1_SS13 BIT(13) #endif +#define XGBE_VEND2_MAC_AUTO_SW BIT(9) + /* MDIO mask values */ #define XGBE_AN_CL73_INT_CMPLT BIT(0) #define XGBE_AN_CL73_INC_LINK BIT(1) diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-mdio.c b/drivers/net/ethernet/amd/xgbe/xgbe-mdio.c index 71449edbb76d..fb5b7eceb73f 100644 --- a/drivers/net/ethernet/amd/xgbe/xgbe-mdio.c +++ b/drivers/net/ethernet/amd/xgbe/xgbe-mdio.c @@ -266,6 +266,10 @@ static void xgbe_an37_set(struct xgbe_prv_data *pdata, bool enable, reg |= MDIO_VEND2_CTRL1_AN_RESTART; XMDIO_WRITE(pdata, MDIO_MMD_VEND2, MDIO_CTRL1, reg); + + reg = XMDIO_READ(pdata, MDIO_MMD_VEND2, MDIO_PCS_DIG_CTRL); + reg |= XGBE_VEND2_MAC_AUTO_SW; + XMDIO_WRITE(pdata, MDIO_MMD_VEND2, MDIO_PCS_DIG_CTRL, reg); } static void xgbe_an37_restart(struct xgbe_prv_data *pdata) @@ -894,6 +898,11 @@ static void xgbe_an37_init(struct xgbe_prv_data *pdata) netif_dbg(pdata, link, pdata->netdev, "CL37 AN (%s) initialized\n", (pdata->an_mode == XGBE_AN_MODE_CL37) ? "BaseX" : "SGMII"); + + reg = XMDIO_READ(pdata, MDIO_MMD_AN, MDIO_CTRL1); + reg &= ~MDIO_AN_CTRL1_ENABLE; + XMDIO_WRITE(pdata, MDIO_MMD_AN, MDIO_CTRL1, reg); + } static void xgbe_an73_init(struct xgbe_prv_data *pdata) diff --git a/drivers/net/ethernet/amd/xgbe/xgbe.h b/drivers/net/ethernet/amd/xgbe/xgbe.h index 6359bb87dc13..057379cd43ba 100644 --- a/drivers/net/ethernet/amd/xgbe/xgbe.h +++ b/drivers/net/ethernet/amd/xgbe/xgbe.h @@ -183,12 +183,12 @@ #define XGBE_LINK_TIMEOUT 5 #define XGBE_KR_TRAINING_WAIT_ITER 50 -#define XGBE_SGMII_AN_LINK_STATUS BIT(1) +#define XGBE_SGMII_AN_LINK_DUPLEX BIT(1) #define XGBE_SGMII_AN_LINK_SPEED (BIT(2) | BIT(3)) #define XGBE_SGMII_AN_LINK_SPEED_10 0x00 #define XGBE_SGMII_AN_LINK_SPEED_100 0x04 #define XGBE_SGMII_AN_LINK_SPEED_1000 0x08 -#define XGBE_SGMII_AN_LINK_DUPLEX BIT(4) +#define XGBE_SGMII_AN_LINK_STATUS BIT(4) /* ECC correctable error notification window (seconds) */ #define XGBE_ECC_LIMIT 60 From aaf2b2480375099c022a82023e1cd772bf1c6a5d Mon Sep 17 00:00:00 2001 From: Alok Tiwari Date: Sat, 28 Jun 2025 07:56:05 -0700 Subject: [PATCH 271/289] enic: fix incorrect MTU comparison in enic_change_mtu() The comparison in enic_change_mtu() incorrectly used the current netdev->mtu instead of the new new_mtu value when warning about an MTU exceeding the port MTU. This could suppress valid warnings or issue incorrect ones. Fix the condition and log to properly reflect the new_mtu. Fixes: ab123fe071c9 ("enic: handle mtu change for vf properly") Signed-off-by: Alok Tiwari Acked-by: John Daley Reviewed-by: Simon Horman Link: https://patch.msgid.link/20250628145612.476096-1-alok.a.tiwari@oracle.com Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/cisco/enic/enic_main.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/cisco/enic/enic_main.c b/drivers/net/ethernet/cisco/enic/enic_main.c index 773f5ad972a2..6bc8dfdb3d4b 100644 --- a/drivers/net/ethernet/cisco/enic/enic_main.c +++ b/drivers/net/ethernet/cisco/enic/enic_main.c @@ -1864,10 +1864,10 @@ static int enic_change_mtu(struct net_device *netdev, int new_mtu) if (enic_is_dynamic(enic) || enic_is_sriov_vf(enic)) return -EOPNOTSUPP; - if (netdev->mtu > enic->port_mtu) + if (new_mtu > enic->port_mtu) netdev_warn(netdev, "interface MTU (%d) set higher than port MTU (%d)\n", - netdev->mtu, enic->port_mtu); + new_mtu, enic->port_mtu); return _enic_change_mtu(netdev, new_mtu); } From 34a500caf48c47d5171f4aa1f237da39b07c6157 Mon Sep 17 00:00:00 2001 From: Kohei Enju Date: Sun, 29 Jun 2025 12:06:31 +0900 Subject: [PATCH 272/289] rose: fix dangling neighbour pointers in rose_rt_device_down() There are two bugs in rose_rt_device_down() that can cause use-after-free: 1. The loop bound `t->count` is modified within the loop, which can cause the loop to terminate early and miss some entries. 2. When removing an entry from the neighbour array, the subsequent entries are moved up to fill the gap, but the loop index `i` is still incremented, causing the next entry to be skipped. For example, if a node has three neighbours (A, A, B) with count=3 and A is being removed, the second A is not checked. i=0: (A, A, B) -> (A, B) with count=2 ^ checked i=1: (A, B) -> (A, B) with count=2 ^ checked (B, not A!) i=2: (doesn't occur because i < count is false) This leaves the second A in the array with count=2, but the rose_neigh structure has been freed. Code that accesses these entries assumes that the first `count` entries are valid pointers, causing a use-after-free when it accesses the dangling pointer. Fix both issues by iterating over the array in reverse order with a fixed loop bound. This ensures that all entries are examined and that the removal of an entry doesn't affect subsequent iterations. Reported-by: syzbot+e04e2c007ba2c80476cb@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=e04e2c007ba2c80476cb Tested-by: syzbot+e04e2c007ba2c80476cb@syzkaller.appspotmail.com Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kohei Enju Reviewed-by: Simon Horman Link: https://patch.msgid.link/20250629030833.6680-1-enjuk@amazon.com Signed-off-by: Jakub Kicinski --- net/rose/rose_route.c | 15 ++++----------- 1 file changed, 4 insertions(+), 11 deletions(-) diff --git a/net/rose/rose_route.c b/net/rose/rose_route.c index 2dd6bd3a3011..b72bf8a08d48 100644 --- a/net/rose/rose_route.c +++ b/net/rose/rose_route.c @@ -497,22 +497,15 @@ void rose_rt_device_down(struct net_device *dev) t = rose_node; rose_node = rose_node->next; - for (i = 0; i < t->count; i++) { + for (i = t->count - 1; i >= 0; i--) { if (t->neighbour[i] != s) continue; t->count--; - switch (i) { - case 0: - t->neighbour[0] = t->neighbour[1]; - fallthrough; - case 1: - t->neighbour[1] = t->neighbour[2]; - break; - case 2: - break; - } + memmove(&t->neighbour[i], &t->neighbour[i + 1], + sizeof(t->neighbour[0]) * + (t->count - i)); } if (t->count <= 0) From 561aa0e22b70a5e7246b73d62a824b3aef3fc375 Mon Sep 17 00:00:00 2001 From: Thomas Fourier Date: Mon, 30 Jun 2025 10:36:43 +0200 Subject: [PATCH 273/289] nui: Fix dma_mapping_error() check dma_map_XXX() functions return values DMA_MAPPING_ERROR as error values which is often ~0. The error value should be tested with dma_mapping_error(). This patch creates a new function in niu_ops to test if the mapping failed. The test is fixed in niu_rbr_add_page(), added in niu_start_xmit() and the successfully mapped pages are unmaped upon error. Fixes: ec2deec1f352 ("niu: Fix to check for dma mapping errors.") Signed-off-by: Thomas Fourier Reviewed-by: Simon Horman Signed-off-by: David S. Miller --- drivers/net/ethernet/sun/niu.c | 31 ++++++++++++++++++++++++++++++- drivers/net/ethernet/sun/niu.h | 4 ++++ 2 files changed, 34 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/sun/niu.c b/drivers/net/ethernet/sun/niu.c index ddca8fc7883e..26119d02a94d 100644 --- a/drivers/net/ethernet/sun/niu.c +++ b/drivers/net/ethernet/sun/niu.c @@ -3336,7 +3336,7 @@ static int niu_rbr_add_page(struct niu *np, struct rx_ring_info *rp, addr = np->ops->map_page(np->device, page, 0, PAGE_SIZE, DMA_FROM_DEVICE); - if (!addr) { + if (np->ops->mapping_error(np->device, addr)) { __free_page(page); return -ENOMEM; } @@ -6676,6 +6676,8 @@ static netdev_tx_t niu_start_xmit(struct sk_buff *skb, len = skb_headlen(skb); mapping = np->ops->map_single(np->device, skb->data, len, DMA_TO_DEVICE); + if (np->ops->mapping_error(np->device, mapping)) + goto out_drop; prod = rp->prod; @@ -6717,6 +6719,8 @@ static netdev_tx_t niu_start_xmit(struct sk_buff *skb, mapping = np->ops->map_page(np->device, skb_frag_page(frag), skb_frag_off(frag), len, DMA_TO_DEVICE); + if (np->ops->mapping_error(np->device, mapping)) + goto out_unmap; rp->tx_buffs[prod].skb = NULL; rp->tx_buffs[prod].mapping = mapping; @@ -6741,6 +6745,19 @@ static netdev_tx_t niu_start_xmit(struct sk_buff *skb, out: return NETDEV_TX_OK; +out_unmap: + while (i--) { + const skb_frag_t *frag; + + prod = PREVIOUS_TX(rp, prod); + frag = &skb_shinfo(skb)->frags[i]; + np->ops->unmap_page(np->device, rp->tx_buffs[prod].mapping, + skb_frag_size(frag), DMA_TO_DEVICE); + } + + np->ops->unmap_single(np->device, rp->tx_buffs[rp->prod].mapping, + skb_headlen(skb), DMA_TO_DEVICE); + out_drop: rp->tx_errors++; kfree_skb(skb); @@ -9644,6 +9661,11 @@ static void niu_pci_unmap_single(struct device *dev, u64 dma_address, dma_unmap_single(dev, dma_address, size, direction); } +static int niu_pci_mapping_error(struct device *dev, u64 addr) +{ + return dma_mapping_error(dev, addr); +} + static const struct niu_ops niu_pci_ops = { .alloc_coherent = niu_pci_alloc_coherent, .free_coherent = niu_pci_free_coherent, @@ -9651,6 +9673,7 @@ static const struct niu_ops niu_pci_ops = { .unmap_page = niu_pci_unmap_page, .map_single = niu_pci_map_single, .unmap_single = niu_pci_unmap_single, + .mapping_error = niu_pci_mapping_error, }; static void niu_driver_version(void) @@ -10019,6 +10042,11 @@ static void niu_phys_unmap_single(struct device *dev, u64 dma_address, /* Nothing to do. */ } +static int niu_phys_mapping_error(struct device *dev, u64 dma_address) +{ + return false; +} + static const struct niu_ops niu_phys_ops = { .alloc_coherent = niu_phys_alloc_coherent, .free_coherent = niu_phys_free_coherent, @@ -10026,6 +10054,7 @@ static const struct niu_ops niu_phys_ops = { .unmap_page = niu_phys_unmap_page, .map_single = niu_phys_map_single, .unmap_single = niu_phys_unmap_single, + .mapping_error = niu_phys_mapping_error, }; static int niu_of_probe(struct platform_device *op) diff --git a/drivers/net/ethernet/sun/niu.h b/drivers/net/ethernet/sun/niu.h index 04c215f91fc0..0b169c08b0f2 100644 --- a/drivers/net/ethernet/sun/niu.h +++ b/drivers/net/ethernet/sun/niu.h @@ -2879,6 +2879,9 @@ struct tx_ring_info { #define NEXT_TX(tp, index) \ (((index) + 1) < (tp)->pending ? ((index) + 1) : 0) +#define PREVIOUS_TX(tp, index) \ + (((index) - 1) >= 0 ? ((index) - 1) : (((tp)->pending) - 1)) + static inline u32 niu_tx_avail(struct tx_ring_info *tp) { return (tp->pending - @@ -3140,6 +3143,7 @@ struct niu_ops { enum dma_data_direction direction); void (*unmap_single)(struct device *dev, u64 dma_address, size_t size, enum dma_data_direction direction); + int (*mapping_error)(struct device *dev, u64 dma_address); }; struct niu_link_config { From 103406b38c600fec1fe375a77b27d87e314aea09 Mon Sep 17 00:00:00 2001 From: Lion Ackermann Date: Mon, 30 Jun 2025 15:27:30 +0200 Subject: [PATCH 274/289] net/sched: Always pass notifications when child class becomes empty Certain classful qdiscs may invoke their classes' dequeue handler on an enqueue operation. This may unexpectedly empty the child qdisc and thus make an in-flight class passive via qlen_notify(). Most qdiscs do not expect such behaviour at this point in time and may re-activate the class eventually anyways which will lead to a use-after-free. The referenced fix commit attempted to fix this behavior for the HFSC case by moving the backlog accounting around, though this turned out to be incomplete since the parent's parent may run into the issue too. The following reproducer demonstrates this use-after-free: tc qdisc add dev lo root handle 1: drr tc filter add dev lo parent 1: basic classid 1:1 tc class add dev lo parent 1: classid 1:1 drr tc qdisc add dev lo parent 1:1 handle 2: hfsc def 1 tc class add dev lo parent 2: classid 2:1 hfsc rt m1 8 d 1 m2 0 tc qdisc add dev lo parent 2:1 handle 3: netem tc qdisc add dev lo parent 3:1 handle 4: blackhole echo 1 | socat -u STDIN UDP4-DATAGRAM:127.0.0.1:8888 tc class delete dev lo classid 1:1 echo 1 | socat -u STDIN UDP4-DATAGRAM:127.0.0.1:8888 Since backlog accounting issues leading to a use-after-frees on stale class pointers is a recurring pattern at this point, this patch takes a different approach. Instead of trying to fix the accounting, the patch ensures that qdisc_tree_reduce_backlog always calls qlen_notify when the child qdisc is empty. This solves the problem because deletion of qdiscs always involves a call to qdisc_reset() and / or qdisc_purge_queue() which ultimately resets its qlen to 0 thus causing the following qdisc_tree_reduce_backlog() to report to the parent. Note that this may call qlen_notify on passive classes multiple times. This is not a problem after the recent patch series that made all the classful qdiscs qlen_notify() handlers idempotent. Fixes: 3f981138109f ("sch_hfsc: Fix qlen accounting bug when using peek in hfsc_enqueue()") Signed-off-by: Lion Ackermann Reviewed-by: Jamal Hadi Salim Acked-by: Cong Wang Acked-by: Jamal Hadi Salim Link: https://patch.msgid.link/d912cbd7-193b-4269-9857-525bee8bbb6a@gmail.com Signed-off-by: Jakub Kicinski --- net/sched/sch_api.c | 19 +++++-------------- 1 file changed, 5 insertions(+), 14 deletions(-) diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c index c5e3673aadbe..d8a33486c511 100644 --- a/net/sched/sch_api.c +++ b/net/sched/sch_api.c @@ -780,15 +780,12 @@ static u32 qdisc_alloc_handle(struct net_device *dev) void qdisc_tree_reduce_backlog(struct Qdisc *sch, int n, int len) { - bool qdisc_is_offloaded = sch->flags & TCQ_F_OFFLOADED; const struct Qdisc_class_ops *cops; unsigned long cl; u32 parentid; bool notify; int drops; - if (n == 0 && len == 0) - return; drops = max_t(int, n, 0); rcu_read_lock(); while ((parentid = sch->parent)) { @@ -797,17 +794,8 @@ void qdisc_tree_reduce_backlog(struct Qdisc *sch, int n, int len) if (sch->flags & TCQ_F_NOPARENT) break; - /* Notify parent qdisc only if child qdisc becomes empty. - * - * If child was empty even before update then backlog - * counter is screwed and we skip notification because - * parent class is already passive. - * - * If the original child was offloaded then it is allowed - * to be seem as empty, so the parent is notified anyway. - */ - notify = !sch->q.qlen && !WARN_ON_ONCE(!n && - !qdisc_is_offloaded); + /* Notify parent qdisc only if child qdisc becomes empty. */ + notify = !sch->q.qlen; /* TODO: perform the search on a per txq basis */ sch = qdisc_lookup_rcu(qdisc_dev(sch), TC_H_MAJ(parentid)); if (sch == NULL) { @@ -816,6 +804,9 @@ void qdisc_tree_reduce_backlog(struct Qdisc *sch, int n, int len) } cops = sch->ops->cl_ops; if (notify && cops->qlen_notify) { + /* Note that qlen_notify must be idempotent as it may get called + * multiple times. + */ cl = cops->find(sch, parentid); cops->qlen_notify(sch, cl); } From 16ceda2ef683a50cd0783006c0504e1931cd8879 Mon Sep 17 00:00:00 2001 From: Raju Rangoju Date: Tue, 1 Jul 2025 12:20:16 +0530 Subject: [PATCH 275/289] amd-xgbe: do not double read link status The link status is latched low so that momentary link drops can be detected. Always double-reading the status defeats this design feature. Only double read if link was already down This prevents unnecessary duplicate readings of the link status. Fixes: 4f3b20bfbb75 ("amd-xgbe: add support for rx-adaptation") Signed-off-by: Raju Rangoju Reviewed-by: Simon Horman Link: https://patch.msgid.link/20250701065016.4140707-1-Raju.Rangoju@amd.com Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/amd/xgbe/xgbe-mdio.c | 4 ++++ drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c | 24 +++++++++++++-------- 2 files changed, 19 insertions(+), 9 deletions(-) diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-mdio.c b/drivers/net/ethernet/amd/xgbe/xgbe-mdio.c index fb5b7eceb73f..1a37ec45e650 100644 --- a/drivers/net/ethernet/amd/xgbe/xgbe-mdio.c +++ b/drivers/net/ethernet/amd/xgbe/xgbe-mdio.c @@ -1304,6 +1304,10 @@ static void xgbe_phy_status(struct xgbe_prv_data *pdata) pdata->phy.link = pdata->phy_if.phy_impl.link_status(pdata, &an_restart); + /* bail out if the link status register read fails */ + if (pdata->phy.link < 0) + return; + if (an_restart) { xgbe_phy_config_aneg(pdata); goto adjust_link; diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c b/drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c index 7a4dfa4e19c7..23c39e92e783 100644 --- a/drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c +++ b/drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c @@ -2746,8 +2746,7 @@ static bool xgbe_phy_valid_speed(struct xgbe_prv_data *pdata, int speed) static int xgbe_phy_link_status(struct xgbe_prv_data *pdata, int *an_restart) { struct xgbe_phy_data *phy_data = pdata->phy_data; - unsigned int reg; - int ret; + int reg, ret; *an_restart = 0; @@ -2781,11 +2780,20 @@ static int xgbe_phy_link_status(struct xgbe_prv_data *pdata, int *an_restart) return 0; } - /* Link status is latched low, so read once to clear - * and then read again to get current state + reg = XMDIO_READ(pdata, MDIO_MMD_PCS, MDIO_STAT1); + if (reg < 0) + return reg; + + /* Link status is latched low so that momentary link drops + * can be detected. If link was already down read again + * to get the latest state. */ - reg = XMDIO_READ(pdata, MDIO_MMD_PCS, MDIO_STAT1); - reg = XMDIO_READ(pdata, MDIO_MMD_PCS, MDIO_STAT1); + + if (!pdata->phy.link && !(reg & MDIO_STAT1_LSTATUS)) { + reg = XMDIO_READ(pdata, MDIO_MMD_PCS, MDIO_STAT1); + if (reg < 0) + return reg; + } if (pdata->en_rx_adap) { /* if the link is available and adaptation is done, @@ -2804,9 +2812,7 @@ static int xgbe_phy_link_status(struct xgbe_prv_data *pdata, int *an_restart) xgbe_phy_set_mode(pdata, phy_data->cur_mode); } - /* check again for the link and adaptation status */ - reg = XMDIO_READ(pdata, MDIO_MMD_PCS, MDIO_STAT1); - if ((reg & MDIO_STAT1_LSTATUS) && pdata->rx_adapt_done) + if (pdata->rx_adapt_done) return 1; } else if (reg & MDIO_STAT1_LSTATUS) return 1; From 5186ff7e1d0e26aaef998ba18b31c79c28d1441f Mon Sep 17 00:00:00 2001 From: Jiawen Wu Date: Tue, 1 Jul 2025 15:06:25 +0800 Subject: [PATCH 276/289] net: libwx: fix the incorrect display of the queue number When setting "ethtool -L eth0 combined 1", the number of RX/TX queue is changed to be 1. RSS is disabled at this moment, and the indices of FDIR have not be changed in wx_set_rss_queues(). So the combined count still shows the previous value. This issue was introduced when supporting FDIR. Fix it for those devices that support FDIR. Fixes: 34744a7749b3 ("net: txgbe: add FDIR info to ethtool ops") Cc: stable@vger.kernel.org Signed-off-by: Jiawen Wu Reviewed-by: Simon Horman Link: https://patch.msgid.link/A5C8FE56D6C04608+20250701070625.73680-1-jiawenwu@trustnetic.com Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/wangxun/libwx/wx_lib.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/ethernet/wangxun/libwx/wx_lib.c b/drivers/net/ethernet/wangxun/libwx/wx_lib.c index c57cc4f27249..59840ba9c1fe 100644 --- a/drivers/net/ethernet/wangxun/libwx/wx_lib.c +++ b/drivers/net/ethernet/wangxun/libwx/wx_lib.c @@ -1705,6 +1705,7 @@ static void wx_set_rss_queues(struct wx *wx) clear_bit(WX_FLAG_FDIR_HASH, wx->flags); + wx->ring_feature[RING_F_FDIR].indices = 1; /* Use Flow Director in addition to RSS to ensure the best * distribution of flows across cores, even when an FDIR flow * isn't matched. From c2a2ff6b4db55647575260bf2227b0e09d46addb Mon Sep 17 00:00:00 2001 From: Antoine Tenart Date: Tue, 1 Jul 2025 09:49:34 +0200 Subject: [PATCH 277/289] net: ipv4: fix stat increase when udp early demux drops the packet udp_v4_early_demux now returns drop reasons as it either returns 0 or ip_mc_validate_source, which returns itself a drop reason. However its use was not converted in ip_rcv_finish_core and the drop reason is ignored, leading to potentially skipping increasing LINUX_MIB_IPRPFILTER if the drop reason is SKB_DROP_REASON_IP_RPFILTER. This is a fix and we're not converting udp_v4_early_demux to explicitly return a drop reason to ease backports; this can be done as a follow-up. Fixes: d46f827016d8 ("net: ip: make ip_mc_validate_source() return drop reason") Cc: Menglong Dong Reported-by: Sabrina Dubroca Signed-off-by: Antoine Tenart Reviewed-by: Sabrina Dubroca Link: https://patch.msgid.link/20250701074935.144134-1-atenart@kernel.org Signed-off-by: Jakub Kicinski --- net/ipv4/ip_input.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/net/ipv4/ip_input.c b/net/ipv4/ip_input.c index 30a5e9460d00..5a49eb99e5c4 100644 --- a/net/ipv4/ip_input.c +++ b/net/ipv4/ip_input.c @@ -319,8 +319,8 @@ static int ip_rcv_finish_core(struct net *net, const struct sk_buff *hint) { const struct iphdr *iph = ip_hdr(skb); - int err, drop_reason; struct rtable *rt; + int drop_reason; if (ip_can_use_hint(skb, iph, hint)) { drop_reason = ip_route_use_hint(skb, iph->daddr, iph->saddr, @@ -345,9 +345,10 @@ static int ip_rcv_finish_core(struct net *net, break; case IPPROTO_UDP: if (READ_ONCE(net->ipv4.sysctl_udp_early_demux)) { - err = udp_v4_early_demux(skb); - if (unlikely(err)) + drop_reason = udp_v4_early_demux(skb); + if (unlikely(drop_reason)) goto drop_error; + drop_reason = SKB_DROP_REASON_NOT_SPECIFIED; /* must reload iph, skb->head might have changed */ iph = ip_hdr(skb); From 315dbdd7cdf6aa533829774caaf4d25f1fd20e73 Mon Sep 17 00:00:00 2001 From: Bui Quang Minh Date: Mon, 30 Jun 2025 21:42:10 +0700 Subject: [PATCH 278/289] virtio-net: ensure the received length does not exceed allocated size In xdp_linearize_page, when reading the following buffers from the ring, we forget to check the received length with the true allocate size. This can lead to an out-of-bound read. This commit adds that missing check. Cc: Fixes: 4941d472bf95 ("virtio-net: do not reset during XDP set") Signed-off-by: Bui Quang Minh Acked-by: Jason Wang Link: https://patch.msgid.link/20250630144212.48471-2-minhquangbui99@gmail.com Signed-off-by: Paolo Abeni --- drivers/net/virtio_net.c | 38 ++++++++++++++++++++++++++++++++++---- 1 file changed, 34 insertions(+), 4 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index e53ba600605a..31661bcb3932 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -778,6 +778,26 @@ static unsigned int mergeable_ctx_to_truesize(void *mrg_ctx) return (unsigned long)mrg_ctx & ((1 << MRG_CTX_HEADER_SHIFT) - 1); } +static int check_mergeable_len(struct net_device *dev, void *mrg_ctx, + unsigned int len) +{ + unsigned int headroom, tailroom, room, truesize; + + truesize = mergeable_ctx_to_truesize(mrg_ctx); + headroom = mergeable_ctx_to_headroom(mrg_ctx); + tailroom = headroom ? sizeof(struct skb_shared_info) : 0; + room = SKB_DATA_ALIGN(headroom + tailroom); + + if (len > truesize - room) { + pr_debug("%s: rx error: len %u exceeds truesize %lu\n", + dev->name, len, (unsigned long)(truesize - room)); + DEV_STATS_INC(dev, rx_length_errors); + return -1; + } + + return 0; +} + static struct sk_buff *virtnet_build_skb(void *buf, unsigned int buflen, unsigned int headroom, unsigned int len) @@ -1797,7 +1817,8 @@ static unsigned int virtnet_get_headroom(struct virtnet_info *vi) * across multiple buffers (num_buf > 1), and we make sure buffers * have enough headroom. */ -static struct page *xdp_linearize_page(struct receive_queue *rq, +static struct page *xdp_linearize_page(struct net_device *dev, + struct receive_queue *rq, int *num_buf, struct page *p, int offset, @@ -1817,18 +1838,27 @@ static struct page *xdp_linearize_page(struct receive_queue *rq, memcpy(page_address(page) + page_off, page_address(p) + offset, *len); page_off += *len; + /* Only mergeable mode can go inside this while loop. In small mode, + * *num_buf == 1, so it cannot go inside. + */ while (--*num_buf) { unsigned int buflen; void *buf; + void *ctx; int off; - buf = virtnet_rq_get_buf(rq, &buflen, NULL); + buf = virtnet_rq_get_buf(rq, &buflen, &ctx); if (unlikely(!buf)) goto err_buf; p = virt_to_head_page(buf); off = buf - page_address(p); + if (check_mergeable_len(dev, ctx, buflen)) { + put_page(p); + goto err_buf; + } + /* guard against a misconfigured or uncooperative backend that * is sending packet larger than the MTU. */ @@ -1917,7 +1947,7 @@ static struct sk_buff *receive_small_xdp(struct net_device *dev, headroom = vi->hdr_len + header_offset; buflen = SKB_DATA_ALIGN(GOOD_PACKET_LEN + headroom) + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); - xdp_page = xdp_linearize_page(rq, &num_buf, page, + xdp_page = xdp_linearize_page(dev, rq, &num_buf, page, offset, header_offset, &tlen); if (!xdp_page) @@ -2252,7 +2282,7 @@ static void *mergeable_xdp_get_buf(struct virtnet_info *vi, */ if (!xdp_prog->aux->xdp_has_frags) { /* linearize data for XDP */ - xdp_page = xdp_linearize_page(rq, num_buf, + xdp_page = xdp_linearize_page(vi->dev, rq, num_buf, *page, offset, XDP_PACKET_HEADROOM, len); From 4be2193b3393dca33504793fe7586fed547abb5d Mon Sep 17 00:00:00 2001 From: Bui Quang Minh Date: Mon, 30 Jun 2025 21:42:11 +0700 Subject: [PATCH 279/289] virtio-net: remove redundant truesize check with PAGE_SIZE The truesize is guaranteed not to exceed PAGE_SIZE in get_mergeable_buf_len(). It is saved in mergeable context, which is not changeable by the host side, so the check in receive path is quite redundant. Acked-by: Jason Wang Signed-off-by: Bui Quang Minh Link: https://patch.msgid.link/20250630144212.48471-3-minhquangbui99@gmail.com Signed-off-by: Paolo Abeni --- drivers/net/virtio_net.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 31661bcb3932..535a4534c27f 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -2157,9 +2157,9 @@ static int virtnet_build_xdp_buff_mrg(struct net_device *dev, { struct virtio_net_hdr_mrg_rxbuf *hdr = buf; unsigned int headroom, tailroom, room; - unsigned int truesize, cur_frag_size; struct skb_shared_info *shinfo; unsigned int xdp_frags_truesz = 0; + unsigned int truesize; struct page *page; skb_frag_t *frag; int offset; @@ -2207,9 +2207,8 @@ static int virtnet_build_xdp_buff_mrg(struct net_device *dev, tailroom = headroom ? sizeof(struct skb_shared_info) : 0; room = SKB_DATA_ALIGN(headroom + tailroom); - cur_frag_size = truesize; - xdp_frags_truesz += cur_frag_size; - if (unlikely(len > truesize - room || cur_frag_size > PAGE_SIZE)) { + xdp_frags_truesz += truesize; + if (unlikely(len > truesize - room)) { put_page(page); pr_debug("%s: rx error: len %u exceeds truesize %lu\n", dev->name, len, (unsigned long)(truesize - room)); From 7d4a119e45828e643baedea2d2ac736804bc85ee Mon Sep 17 00:00:00 2001 From: Bui Quang Minh Date: Mon, 30 Jun 2025 21:42:12 +0700 Subject: [PATCH 280/289] virtio-net: use the check_mergeable_len helper Replace the current repeated code to check received length in mergeable mode with the new check_mergeable_len helper. Signed-off-by: Bui Quang Minh Acked-by: Jason Wang Link: https://patch.msgid.link/20250630144212.48471-4-minhquangbui99@gmail.com Signed-off-by: Paolo Abeni --- drivers/net/virtio_net.c | 34 +++++++--------------------------- 1 file changed, 7 insertions(+), 27 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 535a4534c27f..ecd3f46deb5d 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -2156,7 +2156,6 @@ static int virtnet_build_xdp_buff_mrg(struct net_device *dev, struct virtnet_rq_stats *stats) { struct virtio_net_hdr_mrg_rxbuf *hdr = buf; - unsigned int headroom, tailroom, room; struct skb_shared_info *shinfo; unsigned int xdp_frags_truesz = 0; unsigned int truesize; @@ -2202,20 +2201,14 @@ static int virtnet_build_xdp_buff_mrg(struct net_device *dev, page = virt_to_head_page(buf); offset = buf - page_address(page); - truesize = mergeable_ctx_to_truesize(ctx); - headroom = mergeable_ctx_to_headroom(ctx); - tailroom = headroom ? sizeof(struct skb_shared_info) : 0; - room = SKB_DATA_ALIGN(headroom + tailroom); - - xdp_frags_truesz += truesize; - if (unlikely(len > truesize - room)) { + if (check_mergeable_len(dev, ctx, len)) { put_page(page); - pr_debug("%s: rx error: len %u exceeds truesize %lu\n", - dev->name, len, (unsigned long)(truesize - room)); - DEV_STATS_INC(dev, rx_length_errors); goto err; } + truesize = mergeable_ctx_to_truesize(ctx); + xdp_frags_truesz += truesize; + frag = &shinfo->frags[shinfo->nr_frags++]; skb_frag_fill_page_desc(frag, page, offset, len); if (page_is_pfmemalloc(page)) @@ -2429,18 +2422,12 @@ static struct sk_buff *receive_mergeable(struct net_device *dev, struct sk_buff *head_skb, *curr_skb; unsigned int truesize = mergeable_ctx_to_truesize(ctx); unsigned int headroom = mergeable_ctx_to_headroom(ctx); - unsigned int tailroom = headroom ? sizeof(struct skb_shared_info) : 0; - unsigned int room = SKB_DATA_ALIGN(headroom + tailroom); head_skb = NULL; u64_stats_add(&stats->bytes, len - vi->hdr_len); - if (unlikely(len > truesize - room)) { - pr_debug("%s: rx error: len %u exceeds truesize %lu\n", - dev->name, len, (unsigned long)(truesize - room)); - DEV_STATS_INC(dev, rx_length_errors); + if (check_mergeable_len(dev, ctx, len)) goto err_skb; - } if (unlikely(vi->xdp_enabled)) { struct bpf_prog *xdp_prog; @@ -2475,17 +2462,10 @@ static struct sk_buff *receive_mergeable(struct net_device *dev, u64_stats_add(&stats->bytes, len); page = virt_to_head_page(buf); - truesize = mergeable_ctx_to_truesize(ctx); - headroom = mergeable_ctx_to_headroom(ctx); - tailroom = headroom ? sizeof(struct skb_shared_info) : 0; - room = SKB_DATA_ALIGN(headroom + tailroom); - if (unlikely(len > truesize - room)) { - pr_debug("%s: rx error: len %u exceeds truesize %lu\n", - dev->name, len, (unsigned long)(truesize - room)); - DEV_STATS_INC(dev, rx_length_errors); + if (check_mergeable_len(dev, ctx, len)) goto err_skb; - } + truesize = mergeable_ctx_to_truesize(ctx); curr_skb = virtnet_skb_append_frag(head_skb, curr_skb, page, buf, len, truesize); if (!curr_skb) From 5177373c31318c3c6a190383bfd232e6cf565c36 Mon Sep 17 00:00:00 2001 From: Bui Quang Minh Date: Mon, 30 Jun 2025 22:13:14 +0700 Subject: [PATCH 281/289] virtio-net: xsk: rx: fix the frame's length check When calling buf_to_xdp, the len argument is the frame data's length without virtio header's length (vi->hdr_len). We check that len with xsk_pool_get_rx_frame_size() + vi->hdr_len to ensure the provided len does not larger than the allocated chunk size. The additional vi->hdr_len is because in virtnet_add_recvbuf_xsk, we use part of XDP_PACKET_HEADROOM for virtio header and ask the vhost to start placing data from hard_start + XDP_PACKET_HEADROOM - vi->hdr_len not hard_start + XDP_PACKET_HEADROOM But the first buffer has virtio_header, so the maximum frame's length in the first buffer can only be xsk_pool_get_rx_frame_size() not xsk_pool_get_rx_frame_size() + vi->hdr_len like in the current check. This commit adds an additional argument to buf_to_xdp differentiate between the first buffer and other ones to correctly calculate the maximum frame's length. Cc: stable@vger.kernel.org Reviewed-by: Xuan Zhuo Fixes: a4e7ba702701 ("virtio_net: xsk: rx: support recv small mode") Signed-off-by: Bui Quang Minh Link: https://patch.msgid.link/20250630151315.86722-2-minhquangbui99@gmail.com Signed-off-by: Paolo Abeni --- drivers/net/virtio_net.c | 22 ++++++++++++++++++---- 1 file changed, 18 insertions(+), 4 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index ecd3f46deb5d..50ff9a309ddc 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -1147,15 +1147,29 @@ static void check_sq_full_and_disable(struct virtnet_info *vi, } } +/* Note that @len is the length of received data without virtio header */ static struct xdp_buff *buf_to_xdp(struct virtnet_info *vi, - struct receive_queue *rq, void *buf, u32 len) + struct receive_queue *rq, void *buf, + u32 len, bool first_buf) { struct xdp_buff *xdp; u32 bufsize; xdp = (struct xdp_buff *)buf; - bufsize = xsk_pool_get_rx_frame_size(rq->xsk_pool) + vi->hdr_len; + /* In virtnet_add_recvbuf_xsk, we use part of XDP_PACKET_HEADROOM for + * virtio header and ask the vhost to fill data from + * hard_start + XDP_PACKET_HEADROOM - vi->hdr_len + * The first buffer has virtio header so the remaining region for frame + * data is + * xsk_pool_get_rx_frame_size() + * While other buffers than the first one do not have virtio header, so + * the maximum frame data's length can be + * xsk_pool_get_rx_frame_size() + vi->hdr_len + */ + bufsize = xsk_pool_get_rx_frame_size(rq->xsk_pool); + if (!first_buf) + bufsize += vi->hdr_len; if (unlikely(len > bufsize)) { pr_debug("%s: rx error: len %u exceeds truesize %u\n", @@ -1280,7 +1294,7 @@ static int xsk_append_merge_buffer(struct virtnet_info *vi, u64_stats_add(&stats->bytes, len); - xdp = buf_to_xdp(vi, rq, buf, len); + xdp = buf_to_xdp(vi, rq, buf, len, false); if (!xdp) goto err; @@ -1378,7 +1392,7 @@ static void virtnet_receive_xsk_buf(struct virtnet_info *vi, struct receive_queu u64_stats_add(&stats->bytes, len); - xdp = buf_to_xdp(vi, rq, buf, len); + xdp = buf_to_xdp(vi, rq, buf, len, true); if (!xdp) return; From 45ebc7e6c125ce93d2ddf82cd5bea20121bb0258 Mon Sep 17 00:00:00 2001 From: Laurent Vivier Date: Wed, 21 May 2025 11:22:34 +0200 Subject: [PATCH 282/289] virtio_ring: Fix error reporting in virtqueue_resize The virtqueue_resize() function was not correctly propagating error codes from its internal resize helper functions, specifically virtqueue_resize_packet() and virtqueue_resize_split(). If these helpers returned an error, but the subsequent call to virtqueue_enable_after_reset() succeeded, the original error from the resize operation would be masked. Consequently, virtqueue_resize() could incorrectly report success to its caller despite an underlying resize failure. This change restores the original code behavior: if (vdev->config->enable_vq_after_reset(_vq)) return -EBUSY; return err; Fix: commit ad48d53b5b3f ("virtio_ring: separate the logic of reset/enable from virtqueue_resize") Cc: xuanzhuo@linux.alibaba.com Signed-off-by: Laurent Vivier Acked-by: Jason Wang Link: https://patch.msgid.link/20250521092236.661410-2-lvivier@redhat.com Tested-by: Lei Yang Acked-by: Michael S. Tsirkin Signed-off-by: Paolo Abeni --- drivers/virtio/virtio_ring.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c index b784aab66867..4397392bfef0 100644 --- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -2797,7 +2797,7 @@ int virtqueue_resize(struct virtqueue *_vq, u32 num, void (*recycle_done)(struct virtqueue *vq)) { struct vring_virtqueue *vq = to_vvq(_vq); - int err; + int err, err_reset; if (num > vq->vq.num_max) return -E2BIG; @@ -2819,7 +2819,11 @@ int virtqueue_resize(struct virtqueue *_vq, u32 num, else err = virtqueue_resize_split(_vq, num); - return virtqueue_enable_after_reset(_vq); + err_reset = virtqueue_enable_after_reset(_vq); + if (err_reset) + return err_reset; + + return err; } EXPORT_SYMBOL_GPL(virtqueue_resize); From bd2948d2581ebd31745c1b7094a470513789555f Mon Sep 17 00:00:00 2001 From: Laurent Vivier Date: Wed, 21 May 2025 11:22:35 +0200 Subject: [PATCH 283/289] virtio_net: Cleanup '2+MAX_SKB_FRAGS' Improve consistency by using everywhere it is needed 'MAX_SKB_FRAGS + 2' rather than '2+MAX_SKB_FRAGS' or '2 + MAX_SKB_FRAGS'. No functional change. Signed-off-by: Laurent Vivier Reviewed-by: Xuan Zhuo Acked-by: Jason Wang Link: https://patch.msgid.link/20250521092236.661410-3-lvivier@redhat.com Tested-by: Lei Yang Acked-by: Michael S. Tsirkin Signed-off-by: Paolo Abeni --- drivers/net/virtio_net.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 50ff9a309ddc..031f82275316 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -1104,7 +1104,7 @@ static bool tx_may_stop(struct virtnet_info *vi, * Since most packets only take 1 or 2 ring slots, stopping the queue * early means 16 slots are typically wasted. */ - if (sq->vq->num_free < 2+MAX_SKB_FRAGS) { + if (sq->vq->num_free < MAX_SKB_FRAGS + 2) { struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum); netif_tx_stop_queue(txq); @@ -1136,7 +1136,7 @@ static void check_sq_full_and_disable(struct virtnet_info *vi, } else if (unlikely(!virtqueue_enable_cb_delayed(sq->vq))) { /* More just got used, free them then recheck. */ free_old_xmit(sq, txq, false); - if (sq->vq->num_free >= 2+MAX_SKB_FRAGS) { + if (sq->vq->num_free >= MAX_SKB_FRAGS + 2) { netif_start_subqueue(dev, qnum); u64_stats_update_begin(&sq->stats.syncp); u64_stats_inc(&sq->stats.wake); @@ -3021,7 +3021,7 @@ static void virtnet_poll_cleantx(struct receive_queue *rq, int budget) free_old_xmit(sq, txq, !!budget); } while (unlikely(!virtqueue_enable_cb_delayed(sq->vq))); - if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS) { + if (sq->vq->num_free >= MAX_SKB_FRAGS + 2) { if (netif_tx_queue_stopped(txq)) { u64_stats_update_begin(&sq->stats.syncp); u64_stats_inc(&sq->stats.wake); @@ -3218,7 +3218,7 @@ static int virtnet_poll_tx(struct napi_struct *napi, int budget) else free_old_xmit(sq, txq, !!budget); - if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS) { + if (sq->vq->num_free >= MAX_SKB_FRAGS + 2) { if (netif_tx_queue_stopped(txq)) { u64_stats_update_begin(&sq->stats.syncp); u64_stats_inc(&sq->stats.wake); From 24b2f5df86aaebbe7bac40304eaf5a146c02367c Mon Sep 17 00:00:00 2001 From: Laurent Vivier Date: Wed, 21 May 2025 11:22:36 +0200 Subject: [PATCH 284/289] virtio_net: Enforce minimum TX ring size for reliability The `tx_may_stop()` logic stops TX queues if free descriptors (`sq->vq->num_free`) fall below the threshold of (`MAX_SKB_FRAGS` + 2). If the total ring size (`ring_num`) is not strictly greater than this value, queues can become persistently stopped or stop after minimal use, severely degrading performance. A single sk_buff transmission typically requires descriptors for: - The virtio_net_hdr (1 descriptor) - The sk_buff's linear data (head) (1 descriptor) - Paged fragments (up to MAX_SKB_FRAGS descriptors) This patch enforces that the TX ring size ('ring_num') must be strictly greater than (MAX_SKB_FRAGS + 2). This ensures that the ring is always large enough to hold at least one maximally-fragmented packet plus at least one additional slot. Reported-by: Lei Yang Signed-off-by: Laurent Vivier Reviewed-by: Xuan Zhuo Acked-by: Jason Wang Link: https://patch.msgid.link/20250521092236.661410-4-lvivier@redhat.com Tested-by: Lei Yang Acked-by: Michael S. Tsirkin Signed-off-by: Paolo Abeni --- drivers/net/virtio_net.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 031f82275316..5d674eb9a0f2 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -3504,6 +3504,12 @@ static int virtnet_tx_resize(struct virtnet_info *vi, struct send_queue *sq, { int qindex, err; + if (ring_num <= MAX_SKB_FRAGS + 2) { + netdev_err(vi->dev, "tx size (%d) cannot be smaller than %d\n", + ring_num, MAX_SKB_FRAGS + 2); + return -EINVAL; + } + qindex = sq - vi->sq; virtnet_tx_pause(vi, sq); From cc9f7f65cd2f31150b10e6956f1f0882e1bbae49 Mon Sep 17 00:00:00 2001 From: Jiawen Wu Date: Tue, 1 Jul 2025 14:30:28 +0800 Subject: [PATCH 285/289] net: txgbe: request MISC IRQ in ndo_open Move the creating of irq_domain for MISC IRQ from .probe to .ndo_open, and free it in .ndo_stop, to maintain consistency with the queue IRQs. This it for subsequent adjustments to the IRQ vectors. Fixes: aefd013624a1 ("net: txgbe: use irq_domain for interrupt controller") Cc: stable@vger.kernel.org Signed-off-by: Jiawen Wu Reviewed-by: Michal Swiatkowski Link: https://patch.msgid.link/20250701063030.59340-2-jiawenwu@trustnetic.com Signed-off-by: Paolo Abeni --- .../net/ethernet/wangxun/txgbe/txgbe_irq.c | 2 +- .../net/ethernet/wangxun/txgbe/txgbe_main.c | 22 +++++++++---------- 2 files changed, 11 insertions(+), 13 deletions(-) diff --git a/drivers/net/ethernet/wangxun/txgbe/txgbe_irq.c b/drivers/net/ethernet/wangxun/txgbe/txgbe_irq.c index 20b9a28bcb55..dc468053bdf8 100644 --- a/drivers/net/ethernet/wangxun/txgbe/txgbe_irq.c +++ b/drivers/net/ethernet/wangxun/txgbe/txgbe_irq.c @@ -78,7 +78,6 @@ free_queue_irqs: free_irq(wx->msix_q_entries[vector].vector, wx->q_vector[vector]); } - wx_reset_interrupt_capability(wx); return err; } @@ -211,6 +210,7 @@ void txgbe_free_misc_irq(struct txgbe *txgbe) free_irq(txgbe->link_irq, txgbe); free_irq(txgbe->misc.irq, txgbe); txgbe_del_irq_domain(txgbe); + txgbe->wx->misc_irq_domain = false; } int txgbe_setup_misc_irq(struct txgbe *txgbe) diff --git a/drivers/net/ethernet/wangxun/txgbe/txgbe_main.c b/drivers/net/ethernet/wangxun/txgbe/txgbe_main.c index f3d2778b8e35..a5867f3c93fc 100644 --- a/drivers/net/ethernet/wangxun/txgbe/txgbe_main.c +++ b/drivers/net/ethernet/wangxun/txgbe/txgbe_main.c @@ -458,10 +458,14 @@ static int txgbe_open(struct net_device *netdev) wx_configure(wx); - err = txgbe_request_queue_irqs(wx); + err = txgbe_setup_misc_irq(wx->priv); if (err) goto err_free_resources; + err = txgbe_request_queue_irqs(wx); + if (err) + goto err_free_misc_irq; + /* Notify the stack of the actual queue counts. */ err = netif_set_real_num_tx_queues(netdev, wx->num_tx_queues); if (err) @@ -479,6 +483,9 @@ static int txgbe_open(struct net_device *netdev) err_free_irq: wx_free_irq(wx); +err_free_misc_irq: + txgbe_free_misc_irq(wx->priv); + wx_reset_interrupt_capability(wx); err_free_resources: wx_free_resources(wx); err_reset: @@ -519,6 +526,7 @@ static int txgbe_close(struct net_device *netdev) wx_ptp_stop(wx); txgbe_down(wx); wx_free_irq(wx); + txgbe_free_misc_irq(wx->priv); wx_free_resources(wx); txgbe_fdir_filter_exit(wx); wx_control_hw(wx, false); @@ -564,7 +572,6 @@ static void txgbe_shutdown(struct pci_dev *pdev) int txgbe_setup_tc(struct net_device *dev, u8 tc) { struct wx *wx = netdev_priv(dev); - struct txgbe *txgbe = wx->priv; /* Hardware has to reinitialize queues and interrupts to * match packet buffer alignment. Unfortunately, the @@ -575,7 +582,6 @@ int txgbe_setup_tc(struct net_device *dev, u8 tc) else txgbe_reset(wx); - txgbe_free_misc_irq(txgbe); wx_clear_interrupt_scheme(wx); if (tc) @@ -584,7 +590,6 @@ int txgbe_setup_tc(struct net_device *dev, u8 tc) netdev_reset_tc(dev); wx_init_interrupt_scheme(wx); - txgbe_setup_misc_irq(txgbe); if (netif_running(dev)) txgbe_open(dev); @@ -882,13 +887,9 @@ static int txgbe_probe(struct pci_dev *pdev, txgbe_init_fdir(txgbe); - err = txgbe_setup_misc_irq(txgbe); - if (err) - goto err_release_hw; - err = txgbe_init_phy(txgbe); if (err) - goto err_free_misc_irq; + goto err_release_hw; err = register_netdev(netdev); if (err) @@ -916,8 +917,6 @@ static int txgbe_probe(struct pci_dev *pdev, err_remove_phy: txgbe_remove_phy(txgbe); -err_free_misc_irq: - txgbe_free_misc_irq(txgbe); err_release_hw: wx_clear_interrupt_scheme(wx); wx_control_hw(wx, false); @@ -957,7 +956,6 @@ static void txgbe_remove(struct pci_dev *pdev) unregister_netdev(netdev); txgbe_remove_phy(txgbe); - txgbe_free_misc_irq(txgbe); wx_free_isb_resources(wx); pci_release_selected_regions(pdev, From e37546ad1f9b2c777d3a21d7e50ce265ee3dece8 Mon Sep 17 00:00:00 2001 From: Jiawen Wu Date: Tue, 1 Jul 2025 14:30:29 +0800 Subject: [PATCH 286/289] net: wangxun: revert the adjustment of the IRQ vector sequence Due to hardware limitations of NGBE, queue IRQs can only be requested on vector 0 to 7. When the number of queues is set to the maximum 8, the PCI IRQ vectors are allocated from 0 to 8. The vector 0 is used by MISC interrupt, and althrough the vector 8 is used by queue interrupt, it is unable to receive packets. This will cause some packets to be dropped when RSS is enabled and they are assigned to queue 8. So revert the adjustment of the MISC IRQ location, to make it be the last one in IRQ vectors. Fixes: 937d46ecc5f9 ("net: wangxun: add ethtool_ops for channel number") Cc: stable@vger.kernel.org Signed-off-by: Jiawen Wu Reviewed-by: Larysa Zaremba Link: https://patch.msgid.link/20250701063030.59340-3-jiawenwu@trustnetic.com Signed-off-by: Paolo Abeni --- drivers/net/ethernet/wangxun/libwx/wx_lib.c | 17 ++++++++--------- drivers/net/ethernet/wangxun/libwx/wx_type.h | 2 +- drivers/net/ethernet/wangxun/ngbe/ngbe_main.c | 2 +- drivers/net/ethernet/wangxun/ngbe/ngbe_type.h | 2 +- drivers/net/ethernet/wangxun/txgbe/txgbe_irq.c | 6 +++--- drivers/net/ethernet/wangxun/txgbe/txgbe_type.h | 4 ++-- 6 files changed, 16 insertions(+), 17 deletions(-) diff --git a/drivers/net/ethernet/wangxun/libwx/wx_lib.c b/drivers/net/ethernet/wangxun/libwx/wx_lib.c index 59840ba9c1fe..835d60bd5fbc 100644 --- a/drivers/net/ethernet/wangxun/libwx/wx_lib.c +++ b/drivers/net/ethernet/wangxun/libwx/wx_lib.c @@ -1747,7 +1747,7 @@ static void wx_set_num_queues(struct wx *wx) */ static int wx_acquire_msix_vectors(struct wx *wx) { - struct irq_affinity affd = { .pre_vectors = 1 }; + struct irq_affinity affd = { .post_vectors = 1 }; int nvecs, i; /* We start by asking for one vector per queue pair */ @@ -1784,16 +1784,17 @@ static int wx_acquire_msix_vectors(struct wx *wx) return nvecs; } - wx->msix_entry->entry = 0; - wx->msix_entry->vector = pci_irq_vector(wx->pdev, 0); nvecs -= 1; for (i = 0; i < nvecs; i++) { wx->msix_q_entries[i].entry = i; - wx->msix_q_entries[i].vector = pci_irq_vector(wx->pdev, i + 1); + wx->msix_q_entries[i].vector = pci_irq_vector(wx->pdev, i); } wx->num_q_vectors = nvecs; + wx->msix_entry->entry = nvecs; + wx->msix_entry->vector = pci_irq_vector(wx->pdev, nvecs); + return 0; } @@ -2300,8 +2301,6 @@ static void wx_set_ivar(struct wx *wx, s8 direction, wr32(wx, WX_PX_MISC_IVAR, ivar); } else { /* tx or rx causes */ - if (!(wx->mac.type == wx_mac_em && wx->num_vfs == 7)) - msix_vector += 1; /* offset for queue vectors */ msix_vector |= WX_PX_IVAR_ALLOC_VAL; index = ((16 * (queue & 1)) + (8 * direction)); ivar = rd32(wx, WX_PX_IVAR(queue >> 1)); @@ -2340,7 +2339,7 @@ void wx_write_eitr(struct wx_q_vector *q_vector) itr_reg |= WX_PX_ITR_CNT_WDIS; - wr32(wx, WX_PX_ITR(v_idx + 1), itr_reg); + wr32(wx, WX_PX_ITR(v_idx), itr_reg); } /** @@ -2393,9 +2392,9 @@ void wx_configure_vectors(struct wx *wx) wx_write_eitr(q_vector); } - wx_set_ivar(wx, -1, 0, 0); + wx_set_ivar(wx, -1, 0, v_idx); if (pdev->msix_enabled) - wr32(wx, WX_PX_ITR(0), 1950); + wr32(wx, WX_PX_ITR(v_idx), 1950); } EXPORT_SYMBOL(wx_configure_vectors); diff --git a/drivers/net/ethernet/wangxun/libwx/wx_type.h b/drivers/net/ethernet/wangxun/libwx/wx_type.h index 7730c9fc3e02..d392394791b3 100644 --- a/drivers/net/ethernet/wangxun/libwx/wx_type.h +++ b/drivers/net/ethernet/wangxun/libwx/wx_type.h @@ -1343,7 +1343,7 @@ struct wx { }; #define WX_INTR_ALL (~0ULL) -#define WX_INTR_Q(i) BIT((i) + 1) +#define WX_INTR_Q(i) BIT((i)) /* register operations */ #define wr32(a, reg, value) writel((value), ((a)->hw_addr + (reg))) diff --git a/drivers/net/ethernet/wangxun/ngbe/ngbe_main.c b/drivers/net/ethernet/wangxun/ngbe/ngbe_main.c index b5022c49dc5e..68415a7ef12f 100644 --- a/drivers/net/ethernet/wangxun/ngbe/ngbe_main.c +++ b/drivers/net/ethernet/wangxun/ngbe/ngbe_main.c @@ -161,7 +161,7 @@ static void ngbe_irq_enable(struct wx *wx, bool queues) if (queues) wx_intr_enable(wx, NGBE_INTR_ALL); else - wx_intr_enable(wx, NGBE_INTR_MISC); + wx_intr_enable(wx, NGBE_INTR_MISC(wx)); } /** diff --git a/drivers/net/ethernet/wangxun/ngbe/ngbe_type.h b/drivers/net/ethernet/wangxun/ngbe/ngbe_type.h index bb74263f0498..6eca6de475f7 100644 --- a/drivers/net/ethernet/wangxun/ngbe/ngbe_type.h +++ b/drivers/net/ethernet/wangxun/ngbe/ngbe_type.h @@ -87,7 +87,7 @@ #define NGBE_PX_MISC_IC_TIMESYNC BIT(11) /* time sync */ #define NGBE_INTR_ALL 0x1FF -#define NGBE_INTR_MISC BIT(0) +#define NGBE_INTR_MISC(A) BIT((A)->num_q_vectors) #define NGBE_PHY_CONFIG(reg_offset) (0x14000 + ((reg_offset) * 4)) #define NGBE_CFG_LAN_SPEED 0x14440 diff --git a/drivers/net/ethernet/wangxun/txgbe/txgbe_irq.c b/drivers/net/ethernet/wangxun/txgbe/txgbe_irq.c index dc468053bdf8..3885283681ec 100644 --- a/drivers/net/ethernet/wangxun/txgbe/txgbe_irq.c +++ b/drivers/net/ethernet/wangxun/txgbe/txgbe_irq.c @@ -31,7 +31,7 @@ void txgbe_irq_enable(struct wx *wx, bool queues) wr32(wx, WX_PX_MISC_IEN, misc_ien); /* unmask interrupt */ - wx_intr_enable(wx, TXGBE_INTR_MISC); + wx_intr_enable(wx, TXGBE_INTR_MISC(wx)); if (queues) wx_intr_enable(wx, TXGBE_INTR_QALL(wx)); } @@ -131,7 +131,7 @@ static irqreturn_t txgbe_misc_irq_handle(int irq, void *data) txgbe->eicr = eicr; if (eicr & TXGBE_PX_MISC_IC_VF_MBOX) { wx_msg_task(txgbe->wx); - wx_intr_enable(wx, TXGBE_INTR_MISC); + wx_intr_enable(wx, TXGBE_INTR_MISC(wx)); } return IRQ_WAKE_THREAD; } @@ -183,7 +183,7 @@ static irqreturn_t txgbe_misc_irq_thread_fn(int irq, void *data) nhandled++; } - wx_intr_enable(wx, TXGBE_INTR_MISC); + wx_intr_enable(wx, TXGBE_INTR_MISC(wx)); return (nhandled > 0 ? IRQ_HANDLED : IRQ_NONE); } diff --git a/drivers/net/ethernet/wangxun/txgbe/txgbe_type.h b/drivers/net/ethernet/wangxun/txgbe/txgbe_type.h index 42ec815159e8..41915d7dd372 100644 --- a/drivers/net/ethernet/wangxun/txgbe/txgbe_type.h +++ b/drivers/net/ethernet/wangxun/txgbe/txgbe_type.h @@ -302,8 +302,8 @@ struct txgbe_fdir_filter { #define TXGBE_DEFAULT_RX_WORK 128 #endif -#define TXGBE_INTR_MISC BIT(0) -#define TXGBE_INTR_QALL(A) GENMASK((A)->num_q_vectors, 1) +#define TXGBE_INTR_MISC(A) BIT((A)->num_q_vectors) +#define TXGBE_INTR_QALL(A) (TXGBE_INTR_MISC(A) - 1) #define TXGBE_MAX_EITR GENMASK(11, 3) From 4174c0c331a2aa3322d3b3be532808deb041b37d Mon Sep 17 00:00:00 2001 From: Jiawen Wu Date: Tue, 1 Jul 2025 14:30:30 +0800 Subject: [PATCH 287/289] net: ngbe: specify IRQ vector when the number of VFs is 7 For NGBE devices, the queue number is limited to be 1 when SRIOV is enabled. In this case, IRQ vector[0] is used for MISC and vector[1] is used for queue, based on the previous patches. But for the hardware design, the IRQ vector[1] must be allocated for use by the VF[6] when the number of VFs is 7. So the IRQ vector[0] should be shared for PF MISC and QUEUE interrupts. +-----------+----------------------+ | Vector | Assigned To | +-----------+----------------------+ | Vector 0 | PF MISC and QUEUE | | Vector 1 | VF 6 | | Vector 2 | VF 5 | | Vector 3 | VF 4 | | Vector 4 | VF 3 | | Vector 5 | VF 2 | | Vector 6 | VF 1 | | Vector 7 | VF 0 | +-----------+----------------------+ Minimize code modifications, only adjust the IRQ vector number for this case. Fixes: 877253d2cbf2 ("net: ngbe: add sriov function support") Signed-off-by: Jiawen Wu Reviewed-by: Larysa Zaremba Link: https://patch.msgid.link/20250701063030.59340-4-jiawenwu@trustnetic.com Signed-off-by: Paolo Abeni --- drivers/net/ethernet/wangxun/libwx/wx_lib.c | 9 +++++++++ drivers/net/ethernet/wangxun/libwx/wx_sriov.c | 4 ++++ drivers/net/ethernet/wangxun/libwx/wx_type.h | 1 + drivers/net/ethernet/wangxun/ngbe/ngbe_main.c | 2 +- drivers/net/ethernet/wangxun/ngbe/ngbe_type.h | 2 +- 5 files changed, 16 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/wangxun/libwx/wx_lib.c b/drivers/net/ethernet/wangxun/libwx/wx_lib.c index 835d60bd5fbc..55e252789db3 100644 --- a/drivers/net/ethernet/wangxun/libwx/wx_lib.c +++ b/drivers/net/ethernet/wangxun/libwx/wx_lib.c @@ -1795,6 +1795,13 @@ static int wx_acquire_msix_vectors(struct wx *wx) wx->msix_entry->entry = nvecs; wx->msix_entry->vector = pci_irq_vector(wx->pdev, nvecs); + if (test_bit(WX_FLAG_IRQ_VECTOR_SHARED, wx->flags)) { + wx->msix_entry->entry = 0; + wx->msix_entry->vector = pci_irq_vector(wx->pdev, 0); + wx->msix_q_entries[0].entry = 0; + wx->msix_q_entries[0].vector = pci_irq_vector(wx->pdev, 1); + } + return 0; } @@ -2293,6 +2300,8 @@ static void wx_set_ivar(struct wx *wx, s8 direction, if (direction == -1) { /* other causes */ + if (test_bit(WX_FLAG_IRQ_VECTOR_SHARED, wx->flags)) + msix_vector = 0; msix_vector |= WX_PX_IVAR_ALLOC_VAL; index = 0; ivar = rd32(wx, WX_PX_MISC_IVAR); diff --git a/drivers/net/ethernet/wangxun/libwx/wx_sriov.c b/drivers/net/ethernet/wangxun/libwx/wx_sriov.c index e8656d9d733b..c82ae137756c 100644 --- a/drivers/net/ethernet/wangxun/libwx/wx_sriov.c +++ b/drivers/net/ethernet/wangxun/libwx/wx_sriov.c @@ -64,6 +64,7 @@ static void wx_sriov_clear_data(struct wx *wx) wr32m(wx, WX_PSR_VM_CTL, WX_PSR_VM_CTL_POOL_MASK, 0); wx->ring_feature[RING_F_VMDQ].offset = 0; + clear_bit(WX_FLAG_IRQ_VECTOR_SHARED, wx->flags); clear_bit(WX_FLAG_SRIOV_ENABLED, wx->flags); /* Disable VMDq flag so device will be set in NM mode */ if (wx->ring_feature[RING_F_VMDQ].limit == 1) @@ -78,6 +79,9 @@ static int __wx_enable_sriov(struct wx *wx, u8 num_vfs) set_bit(WX_FLAG_SRIOV_ENABLED, wx->flags); dev_info(&wx->pdev->dev, "SR-IOV enabled with %d VFs\n", num_vfs); + if (num_vfs == 7 && wx->mac.type == wx_mac_em) + set_bit(WX_FLAG_IRQ_VECTOR_SHARED, wx->flags); + /* Enable VMDq flag so device will be set in VM mode */ set_bit(WX_FLAG_VMDQ_ENABLED, wx->flags); if (!wx->ring_feature[RING_F_VMDQ].limit) diff --git a/drivers/net/ethernet/wangxun/libwx/wx_type.h b/drivers/net/ethernet/wangxun/libwx/wx_type.h index d392394791b3..c363379126c0 100644 --- a/drivers/net/ethernet/wangxun/libwx/wx_type.h +++ b/drivers/net/ethernet/wangxun/libwx/wx_type.h @@ -1191,6 +1191,7 @@ enum wx_pf_flags { WX_FLAG_VMDQ_ENABLED, WX_FLAG_VLAN_PROMISC, WX_FLAG_SRIOV_ENABLED, + WX_FLAG_IRQ_VECTOR_SHARED, WX_FLAG_FDIR_CAPABLE, WX_FLAG_FDIR_HASH, WX_FLAG_FDIR_PERFECT, diff --git a/drivers/net/ethernet/wangxun/ngbe/ngbe_main.c b/drivers/net/ethernet/wangxun/ngbe/ngbe_main.c index 68415a7ef12f..e0fc897b0a58 100644 --- a/drivers/net/ethernet/wangxun/ngbe/ngbe_main.c +++ b/drivers/net/ethernet/wangxun/ngbe/ngbe_main.c @@ -286,7 +286,7 @@ static int ngbe_request_msix_irqs(struct wx *wx) * for queue. But when num_vfs == 7, vector[1] is assigned to vf6. * Misc and queue should reuse interrupt vector[0]. */ - if (wx->num_vfs == 7) + if (test_bit(WX_FLAG_IRQ_VECTOR_SHARED, wx->flags)) err = request_irq(wx->msix_entry->vector, ngbe_misc_and_queue, 0, netdev->name, wx); else diff --git a/drivers/net/ethernet/wangxun/ngbe/ngbe_type.h b/drivers/net/ethernet/wangxun/ngbe/ngbe_type.h index 6eca6de475f7..3b2ca7f47e33 100644 --- a/drivers/net/ethernet/wangxun/ngbe/ngbe_type.h +++ b/drivers/net/ethernet/wangxun/ngbe/ngbe_type.h @@ -87,7 +87,7 @@ #define NGBE_PX_MISC_IC_TIMESYNC BIT(11) /* time sync */ #define NGBE_INTR_ALL 0x1FF -#define NGBE_INTR_MISC(A) BIT((A)->num_q_vectors) +#define NGBE_INTR_MISC(A) BIT((A)->msix_entry->entry) #define NGBE_PHY_CONFIG(reg_offset) (0x14000 + ((reg_offset) * 4)) #define NGBE_CFG_LAN_SPEED 0x14440 From f030713e5abf67d0a88864c8855f809c763af954 Mon Sep 17 00:00:00 2001 From: Krzysztof Kozlowski Date: Tue, 1 Jul 2025 08:36:22 +0200 Subject: [PATCH 288/289] dt-bindings: net: sophgo,sg2044-dwmac: Drop status from the example Examples should be complete and should not have a 'status' property, especially a disabled one because this disables the dt_binding_check of the example against the schema. Dropping 'status' property shows missing other properties - phy-mode and phy-handle. Fixes: 114508a89ddc ("dt-bindings: net: Add support for Sophgo SG2044 dwmac") Cc: Signed-off-by: Krzysztof Kozlowski Reviewed-by: Alexander Sverdlin Reviewed-by: Chen Wang Link: https://patch.msgid.link/20250701063621.23808-2-krzysztof.kozlowski@linaro.org Signed-off-by: Paolo Abeni --- Documentation/devicetree/bindings/net/sophgo,sg2044-dwmac.yaml | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/Documentation/devicetree/bindings/net/sophgo,sg2044-dwmac.yaml b/Documentation/devicetree/bindings/net/sophgo,sg2044-dwmac.yaml index 4dd2dc9c678b..8afbd9ebd73f 100644 --- a/Documentation/devicetree/bindings/net/sophgo,sg2044-dwmac.yaml +++ b/Documentation/devicetree/bindings/net/sophgo,sg2044-dwmac.yaml @@ -80,6 +80,8 @@ examples: interrupt-parent = <&intc>; interrupts = <296 IRQ_TYPE_LEVEL_HIGH>; interrupt-names = "macirq"; + phy-handle = <&phy0>; + phy-mode = "rgmii-id"; resets = <&rst 30>; reset-names = "stmmaceth"; snps,multicast-filter-bins = <0>; @@ -91,7 +93,6 @@ examples: snps,mtl-rx-config = <&gmac0_mtl_rx_setup>; snps,mtl-tx-config = <&gmac0_mtl_tx_setup>; snps,axi-config = <&gmac0_stmmac_axi_setup>; - status = "disabled"; gmac0_mtl_rx_setup: rx-queues-config { snps,rx-queues-to-use = <8>; From 223e2288f4b8c262a864e2c03964ffac91744cd5 Mon Sep 17 00:00:00 2001 From: HarshaVardhana S A Date: Tue, 1 Jul 2025 14:22:54 +0200 Subject: [PATCH 289/289] vsock/vmci: Clear the vmci transport packet properly when initializing it In vmci_transport_packet_init memset the vmci_transport_packet before populating the fields to avoid any uninitialised data being left in the structure. Cc: Bryan Tan Cc: Vishnu Dasa Cc: Broadcom internal kernel review list Cc: Stefano Garzarella Cc: "David S. Miller" Cc: Eric Dumazet Cc: Jakub Kicinski Cc: Paolo Abeni Cc: Simon Horman Cc: virtualization@lists.linux.dev Cc: netdev@vger.kernel.org Cc: stable Signed-off-by: HarshaVardhana S A Signed-off-by: Greg Kroah-Hartman Fixes: d021c344051a ("VSOCK: Introduce VM Sockets") Acked-by: Stefano Garzarella Link: https://patch.msgid.link/20250701122254.2397440-1-gregkh@linuxfoundation.org Signed-off-by: Paolo Abeni --- net/vmw_vsock/vmci_transport.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c index b370070194fa..7eccd6708d66 100644 --- a/net/vmw_vsock/vmci_transport.c +++ b/net/vmw_vsock/vmci_transport.c @@ -119,6 +119,8 @@ vmci_transport_packet_init(struct vmci_transport_packet *pkt, u16 proto, struct vmci_handle handle) { + memset(pkt, 0, sizeof(*pkt)); + /* We register the stream control handler as an any cid handle so we * must always send from a source address of VMADDR_CID_ANY */ @@ -131,8 +133,6 @@ vmci_transport_packet_init(struct vmci_transport_packet *pkt, pkt->type = type; pkt->src_port = src->svm_port; pkt->dst_port = dst->svm_port; - memset(&pkt->proto, 0, sizeof(pkt->proto)); - memset(&pkt->_reserved2, 0, sizeof(pkt->_reserved2)); switch (pkt->type) { case VMCI_TRANSPORT_PACKET_TYPE_INVALID: