Core & protocols
----------------
- Wrap datapath globals into net_aligned_data, to avoid false sharing.
- Preserve MSG_ZEROCOPY in forwarding (e.g. out of a container).
- Add SO_INQ and SCM_INQ support to AF_UNIX.
- Add SIOCINQ support to AF_VSOCK.
- Add TCP_MAXSEG sockopt to MPTCP.
- Add IPv6 force_forwarding sysctl to enable forwarding per interface.
- Make TCP validation of whether packet fully fits in the receive
window and the rcv_buf more strict. With increased use of HW
aggregation a single "packet" can be multiple 100s of kB.
- Add MSG_MORE flag to optimize large TCP transmissions via sockmap,
improves latency up to 33% for sockmap users.
- Convert TCP send queue handling from tasklet to BH workque.
- Improve BPF iteration over TCP sockets to see each socket exactly once.
- Remove obsolete and unused TCP RFC3517/RFC6675 loss recovery code.
- Support enabling kernel threads for NAPI processing on per-NAPI
instance basis rather than a whole device. Fully stop the kernel NAPI
thread when threaded NAPI gets disabled. Previously thread would stick
around until ifdown due to tricky synchronization.
- Allow multicast routing to take effect on locally-generated packets.
- Add output interface argument for End.X in segment routing.
- MCTP: add support for gateway routing, improve bind() handling.
- Don't require rtnl_lock when fetching an IPv6 neighbor over Netlink.
- Add a new neighbor flag ("extern_valid"), which cedes refresh
responsibilities to userspace. This is needed for EVPN multi-homing
where a neighbor entry for a multi-homed host needs to be synced
across all the VTEPs among which the host is multi-homed.
- Support NUD_PERMANENT for proxy neighbor entries.
- Add a new queuing discipline for IETF RFC9332 DualQ Coupled AQM.
- Add sequence numbers to netconsole messages. Unregister netconsole's
console when all net targets are removed. Code refactoring.
Add a number of selftests.
- Align IPSec inbound SA lookup to RFC 4301. Only SPI and protocol
should be used for an inbound SA lookup.
- Support inspecting ref_tracker state via DebugFS.
- Don't force bonding advertisement frames tx to ~333 ms boundaries.
Add broadcast_neighbor option to send ARP/ND on all bonded links.
- Allow providing upcall pid for the 'execute' command in openvswitch.
- Remove DCCP support from Netfilter's conntrack.
- Disallow multiple packet duplications in the queuing layer.
- Prevent use of deprecated iptables code on PREEMPT_RT.
Driver API
----------
- Support RSS and hashing configuration over ethtool Netlink.
- Add dedicated ethtool callbacks for getting and setting hashing fields.
- Add support for power budget evaluation strategy in PSE /
Power-over-Ethernet. Generate Netlink events for overcurrent etc.
- Support DPLL phase offset monitoring across all device inputs.
Support providing clock reference and SYNC over separate DPLL
inputs.
- Support traffic classes in devlink rate API for bandwidth management.
- Remove rtnl_lock dependency from UDP tunnel port configuration.
Device drivers
--------------
- Add a new Broadcom driver for 800G Ethernet (bnge).
- Add a standalone driver for Microchip ZL3073x DPLL.
- Remove IBM's NETIUCV device driver.
- Ethernet high-speed NICs:
- Broadcom (bnxt):
- support zero-copy Tx of DMABUF memory
- take page size into account for page pool recycling rings
- Intel (100G, ice, idpf):
- idpf: XDP and AF_XDP support preparations
- idpf: add flow steering
- add link_down_events statistic
- clean up the TSPLL code
- preparations for live VM migration
- nVidia/Mellanox:
- support zero-copy Rx/Tx interfaces (DMABUF and io_uring)
- optimize context memory usage for matchers
- expose serial numbers in devlink info
- support PCIe congestion metrics
- Meta (fbnic):
- add 25G, 50G, and 100G link modes to phylink
- support dumping FW logs
- Marvell/Cavium:
- support for CN20K generation of the Octeon chips
- Amazon:
- add HW clock (without timestamping, just hypervisor time access)
- Ethernet virtual:
- VirtIO net:
- support segmentation of UDP-tunnel-encapsulated packets
- Google (gve):
- support packet timestamping and clock synchronization
- Microsoft vNIC:
- add handler for device-originated servicing events
- allow dynamic MSI-X vector allocation
- support Tx bandwidth clamping
- Ethernet NICs consumer, and embedded:
- AMD:
- amd-xgbe: hardware timestamping and PTP clock support
- Broadcom integrated MACs (bcmgenet, bcmasp):
- use napi_complete_done() return value to support NAPI polling
- add support for re-starting auto-negotiation
- Broadcom switches (b53):
- support BCM5325 switches
- add bcm63xx EPHY power control
- Synopsys (stmmac):
- lots of code refactoring and cleanups
- TI:
- icssg-prueth: read firmware-names from device tree
- icssg: PRP offload support
- Microchip:
- lan78xx: convert to PHYLINK for improved PHY and MAC management
- ksz: add KSZ8463 switch support
- Intel:
- support similar queue priority scheme in multi-queue and
time-sensitive networking (taprio)
- support packet pre-emption in both
- RealTek (r8169):
- enable EEE at 5Gbps on RTL8126
- Airoha:
- add PPPoE offload support
- MDIO bus controller for Airoha AN7583
- Ethernet PHYs:
- support for the IPQ5018 internal GE PHY
- micrel KSZ9477 switch-integrated PHYs:
- add MDI/MDI-X control support
- add RX error counters
- add cable test support
- add Signal Quality Indicator (SQI) reporting
- dp83tg720: improve reset handling and reduce link recovery time
- support bcm54811 (and its MII-Lite interface type)
- air_en8811h: support resume/suspend
- support PHY counters for QCA807x and QCA808x
- support WoL for QCA807x
- CAN drivers:
- rcar_canfd: support for Transceiver Delay Compensation
- kvaser: report FW versions via devlink dev info
- WiFi:
- extended regulatory info support (6 GHz)
- add statistics and beacon monitor for Multi-Link Operation (MLO)
- support S1G aggregation, improve S1G support
- add Radio Measurement action fields
- support per-radio RTS threshold
- some work around how FIPS affects wifi, which was wrong (RC4 is used
by TKIP, not only WEP)
- improvements for unsolicited probe response handling
- WiFi drivers:
- RealTek (rtw88):
- IBSS mode for SDIO devices
- RealTek (rtw89):
- BT coexistence for MLO/WiFi7
- concurrent station + P2P support
- support for USB devices RTL8851BU/RTL8852BU
- Intel (iwlwifi):
- use embedded PNVM in (to be released) FW images to fix
compatibility issues
- many cleanups (unused FW APIs, PCIe code, WoWLAN)
- some FIPS interoperability
- MediaTek (mt76):
- firmware recovery improvements
- more MLO work
- Qualcomm/Atheros (ath12k):
- fix scan on multi-radio devices
- more EHT/Wi-Fi 7 features
- encapsulation/decapsulation offload
- Broadcom (brcm80211):
- support SDIO 43751 device
- Bluetooth:
- hci_event: add support for handling LE BIG Sync Lost event
- ISO: add socket option to report packet seqnum via CMSG
- ISO: support SCM_TIMESTAMPING for ISO TS
- Bluetooth drivers:
- intel_pcie: support Function Level Reset
- nxpuart: add support for 4M baudrate
- nxpuart: implement powerup sequence, reset, FW dump, and FW loading
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmiFgLgACgkQMUZtbf5S
IrvafxAAnQRwYBoIG+piCILx6z5pRvBGHkmEQ4AQgSCFuq2eO3ubwMFIqEybfma1
5+QFjUZAV3OgGgKRBS2KGWxtSzdiF+/JGV1VOIN67sX3Mm0a2QgjA4n5CgKL0FPr
o6BEzjX5XwG1zvGcBNQ5BZ19xUUKjoZQgTtnea8sZ57Fsp5RtRgmYRqoewNvNk/n
uImh0NFsDVb0UeOpSzC34VD9l1dJvLGdui4zJAjno/vpvmT1DkXjoK419J/r52SS
X+5WgsfJ6DkjHqVN1tIhhK34yWqBOcwGFZJgEnWHMkFIl2FqRfFKMHyqtfLlVnLA
mnIpSyz8Sq2AHtx0TlgZ3At/Ri8p5+yYJgHOXcDKyABa8y8Zf4wrycmr6cV9JLuL
z54nLEVnJuvfDVDVJjsLYdJXyhMpZFq6+uAItdxKaw8Ugp/QqG4QtoRj+XIHz4ZW
z6OohkCiCzTwEISFK+pSTxPS30eOxq43kCspcvuLiwCCStJBRkRb5GdZA4dm7LA+
1Od4ADAkHjyrFtBqTyyC2scX8UJ33DlAIpAYyIeS6w9Cj9EXxtp1z33IAAAZ03MW
jJwIaJuc8bK2fWKMmiG7ucIXjPo4t//KiWlpkwwqLhPbjZgfDAcxq1AC2TLoqHBL
y4EOgKpHDCMAghSyiFIAn2JprGcEt8dp+11B0JRXIn4Pm/eYDH8=
=lqbe
-----END PGP SIGNATURE-----
Merge tag 'net-next-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core & protocols:
- Wrap datapath globals into net_aligned_data, to avoid false sharing
- Preserve MSG_ZEROCOPY in forwarding (e.g. out of a container)
- Add SO_INQ and SCM_INQ support to AF_UNIX
- Add SIOCINQ support to AF_VSOCK
- Add TCP_MAXSEG sockopt to MPTCP
- Add IPv6 force_forwarding sysctl to enable forwarding per interface
- Make TCP validation of whether packet fully fits in the receive
window and the rcv_buf more strict. With increased use of HW
aggregation a single "packet" can be multiple 100s of kB
- Add MSG_MORE flag to optimize large TCP transmissions via sockmap,
improves latency up to 33% for sockmap users
- Convert TCP send queue handling from tasklet to BH workque
- Improve BPF iteration over TCP sockets to see each socket exactly
once
- Remove obsolete and unused TCP RFC3517/RFC6675 loss recovery code
- Support enabling kernel threads for NAPI processing on per-NAPI
instance basis rather than a whole device. Fully stop the kernel
NAPI thread when threaded NAPI gets disabled. Previously thread
would stick around until ifdown due to tricky synchronization
- Allow multicast routing to take effect on locally-generated packets
- Add output interface argument for End.X in segment routing
- MCTP: add support for gateway routing, improve bind() handling
- Don't require rtnl_lock when fetching an IPv6 neighbor over Netlink
- Add a new neighbor flag ("extern_valid"), which cedes refresh
responsibilities to userspace. This is needed for EVPN multi-homing
where a neighbor entry for a multi-homed host needs to be synced
across all the VTEPs among which the host is multi-homed
- Support NUD_PERMANENT for proxy neighbor entries
- Add a new queuing discipline for IETF RFC9332 DualQ Coupled AQM
- Add sequence numbers to netconsole messages. Unregister
netconsole's console when all net targets are removed. Code
refactoring. Add a number of selftests
- Align IPSec inbound SA lookup to RFC 4301. Only SPI and protocol
should be used for an inbound SA lookup
- Support inspecting ref_tracker state via DebugFS
- Don't force bonding advertisement frames tx to ~333 ms boundaries.
Add broadcast_neighbor option to send ARP/ND on all bonded links
- Allow providing upcall pid for the 'execute' command in openvswitch
- Remove DCCP support from Netfilter's conntrack
- Disallow multiple packet duplications in the queuing layer
- Prevent use of deprecated iptables code on PREEMPT_RT
Driver API:
- Support RSS and hashing configuration over ethtool Netlink
- Add dedicated ethtool callbacks for getting and setting hashing
fields
- Add support for power budget evaluation strategy in PSE /
Power-over-Ethernet. Generate Netlink events for overcurrent etc
- Support DPLL phase offset monitoring across all device inputs.
Support providing clock reference and SYNC over separate DPLL
inputs
- Support traffic classes in devlink rate API for bandwidth
management
- Remove rtnl_lock dependency from UDP tunnel port configuration
Device drivers:
- Add a new Broadcom driver for 800G Ethernet (bnge)
- Add a standalone driver for Microchip ZL3073x DPLL
- Remove IBM's NETIUCV device driver
- Ethernet high-speed NICs:
- Broadcom (bnxt):
- support zero-copy Tx of DMABUF memory
- take page size into account for page pool recycling rings
- Intel (100G, ice, idpf):
- idpf: XDP and AF_XDP support preparations
- idpf: add flow steering
- add link_down_events statistic
- clean up the TSPLL code
- preparations for live VM migration
- nVidia/Mellanox:
- support zero-copy Rx/Tx interfaces (DMABUF and io_uring)
- optimize context memory usage for matchers
- expose serial numbers in devlink info
- support PCIe congestion metrics
- Meta (fbnic):
- add 25G, 50G, and 100G link modes to phylink
- support dumping FW logs
- Marvell/Cavium:
- support for CN20K generation of the Octeon chips
- Amazon:
- add HW clock (without timestamping, just hypervisor time access)
- Ethernet virtual:
- VirtIO net:
- support segmentation of UDP-tunnel-encapsulated packets
- Google (gve):
- support packet timestamping and clock synchronization
- Microsoft vNIC:
- add handler for device-originated servicing events
- allow dynamic MSI-X vector allocation
- support Tx bandwidth clamping
- Ethernet NICs consumer, and embedded:
- AMD:
- amd-xgbe: hardware timestamping and PTP clock support
- Broadcom integrated MACs (bcmgenet, bcmasp):
- use napi_complete_done() return value to support NAPI polling
- add support for re-starting auto-negotiation
- Broadcom switches (b53):
- support BCM5325 switches
- add bcm63xx EPHY power control
- Synopsys (stmmac):
- lots of code refactoring and cleanups
- TI:
- icssg-prueth: read firmware-names from device tree
- icssg: PRP offload support
- Microchip:
- lan78xx: convert to PHYLINK for improved PHY and MAC management
- ksz: add KSZ8463 switch support
- Intel:
- support similar queue priority scheme in multi-queue and
time-sensitive networking (taprio)
- support packet pre-emption in both
- RealTek (r8169):
- enable EEE at 5Gbps on RTL8126
- Airoha:
- add PPPoE offload support
- MDIO bus controller for Airoha AN7583
- Ethernet PHYs:
- support for the IPQ5018 internal GE PHY
- micrel KSZ9477 switch-integrated PHYs:
- add MDI/MDI-X control support
- add RX error counters
- add cable test support
- add Signal Quality Indicator (SQI) reporting
- dp83tg720: improve reset handling and reduce link recovery time
- support bcm54811 (and its MII-Lite interface type)
- air_en8811h: support resume/suspend
- support PHY counters for QCA807x and QCA808x
- support WoL for QCA807x
- CAN drivers:
- rcar_canfd: support for Transceiver Delay Compensation
- kvaser: report FW versions via devlink dev info
- WiFi:
- extended regulatory info support (6 GHz)
- add statistics and beacon monitor for Multi-Link Operation (MLO)
- support S1G aggregation, improve S1G support
- add Radio Measurement action fields
- support per-radio RTS threshold
- some work around how FIPS affects wifi, which was wrong (RC4 is
used by TKIP, not only WEP)
- improvements for unsolicited probe response handling
- WiFi drivers:
- RealTek (rtw88):
- IBSS mode for SDIO devices
- RealTek (rtw89):
- BT coexistence for MLO/WiFi7
- concurrent station + P2P support
- support for USB devices RTL8851BU/RTL8852BU
- Intel (iwlwifi):
- use embedded PNVM in (to be released) FW images to fix
compatibility issues
- many cleanups (unused FW APIs, PCIe code, WoWLAN)
- some FIPS interoperability
- MediaTek (mt76):
- firmware recovery improvements
- more MLO work
- Qualcomm/Atheros (ath12k):
- fix scan on multi-radio devices
- more EHT/Wi-Fi 7 features
- encapsulation/decapsulation offload
- Broadcom (brcm80211):
- support SDIO 43751 device
- Bluetooth:
- hci_event: add support for handling LE BIG Sync Lost event
- ISO: add socket option to report packet seqnum via CMSG
- ISO: support SCM_TIMESTAMPING for ISO TS
- Bluetooth drivers:
- intel_pcie: support Function Level Reset
- nxpuart: add support for 4M baudrate
- nxpuart: implement powerup sequence, reset, FW dump, and FW loading"
* tag 'net-next-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1742 commits)
dpll: zl3073x: Fix build failure
selftests: bpf: fix legacy netfilter options
ipv6: annotate data-races around rt->fib6_nsiblings
ipv6: fix possible infinite loop in fib6_info_uses_dev()
ipv6: prevent infinite loop in rt6_nlmsg_size()
ipv6: add a retry logic in net6_rt_notify()
vrf: Drop existing dst reference in vrf_ip6_input_dst
net/sched: taprio: align entry index attr validation with mqprio
net: fsl_pq_mdio: use dev_err_probe
selftests: rtnetlink.sh: remove esp4_offload after test
vsock: remove unnecessary null check in vsock_getname()
igb: xsk: solve negative overflow of nb_pkts in zerocopy mode
stmmac: xsk: fix negative overflow of budget in zerocopy mode
dt-bindings: ieee802154: Convert at86rf230.txt yaml format
net: dsa: microchip: Disable PTP function of KSZ8463
net: dsa: microchip: Setup fiber ports for KSZ8463
net: dsa: microchip: Write switch MAC address differently for KSZ8463
net: dsa: microchip: Use different registers for KSZ8463
net: dsa: microchip: Add KSZ8463 switch support to KSZ DSA driver
dt-bindings: net: dsa: microchip: Add KSZ8463 switch support
...
The initial idea of making them const was correct as they were seperate
instances. When they got embedded into larger data structures, which are
even modified by the callback this got moot. The only reason why this went
unnoticed is that the required container_of() casts the const attribute
forcefully away.
Stop pretending that it is const.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmiGkt0THHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYodb/D/47tAOcL6qyTxh+E5fDj4h5M2aVQYp+
oLiBY7ejroObCAiMo6v/ajaGNEXvLXEtratFBQv1o7n12rxsKnv/XNdIbAzvMwKm
3vNx9anQ0THFnWiUKCAf/mZBc/Z9uXnjBXY0TeaFLuj4DuUMqGGs/ga32UIvEwW+
VXS592NIHYoK87VA/AIOK8AGX9VesYoxUqZmfmYn5NuJwsJDSgGaQpoH2fBVhU6p
B7NRnVLYtFYpeAvL0ihFuOP03HZes5hM2wmqNtoCJJF9e39Eg9DXhsaUw8AIa1fu
UCxm5sfNpMR+QN2AizbI5deHZJbFwYPy/cFrZ0AiwCfZt6NATjO7bQjvu/23yPFn
klby8GizV0Q7qmiLi45EdPdid5oiBm/lNy6ICP9fN/XlHZFxAsqE/OGx626P+LPs
xQK1MpHNjbkCK//X49hxQX6a0BCqSEAG4GOuEEqRxTeD9SZVzIQbcQ6CKOuBqPqc
hc5o5N3suzdmq5sujQD0IiL9R960WfzsSgbmdQm+njhRz+LjAx6T419MlyePiu/5
hf8pZjl/SHn1O6RpPfJ501U+tbo1auEnUjXMGcg12dx7KE4w9s7fboFQLXruuVRr
UNeVXeMRxZqXSAB1HyKQMI8lZ7nHF6FZUA+xLMusCwIYi7YU0h7I3BxDzxpZV62l
zJgslDp7ZqSopw==
=C0ua
-----END PGP SIGNATURE-----
Merge tag 'timers-cleanups-2025-07-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer cleanups from Thomas Gleixner:
"A treewide cleanup of struct cycle_counter const annotations.
The initial idea of making them const was correct as they were
seperate instances. When they got embedded into larger data
structures, which are even modified by the callback this got moot. The
only reason why this went unnoticed is that the required
container_of() casts the const attribute forcefully away.
Stop pretending that it is const"
* tag 'timers-cleanups-2025-07-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
time/timecounter: Fix the lie that struct cyclecounter is const
Fix typos in comments and error messages.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: David Arinzon <darinzon@amazon.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250723201528.2908218-1-helgaas@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
On ASP versions v2.x we need to program the TX map vector register to
properly exercise end-to-end flow control, otherwise the TX engine can
either lock-up, or cause the hardware calculated checksum to be
wrong/corrupted when multiple back to back packets are being submitted
for transmission. This register defaults to 0, which means no flow
control being applied.
Fixes: e9f31435ee ("net: bcmasp: Add support for asp-v3.0")
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20250718212242.3447751-1-florian.fainelli@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Wire-up ethtool_ops::nway_reset to phy_ethtool_nway_reset in order to
support re-starting auto-negotiation.
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250717180915.2611890-1-florian.fainelli@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This moves bnxt_hsi.h contents to a common location so it can be
properly referenced by bnxt_en, bnxt_re, and bnge.
Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20250714170202.39688-1-gospo@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Cross-merge networking fixes after downstream PR (net-6.16-rc6-2).
No conflicts.
Adjacent changes:
drivers/net/wireless/mediatek/mt76/mt7925/mcu.c
c701574c54 ("wifi: mt76: mt7925: fix invalid array index in ssid assignment during hw scan")
b3a431fe2e ("wifi: mt76: mt7925: fix off by one in mt7925_mcu_hw_scan()")
drivers/net/wireless/mediatek/mt76/mt7996/mac.c
62da647a2b ("wifi: mt76: mt7996: Add MLO support to mt7996_tx_check_aggr()")
dc66a129ad ("wifi: mt76: add a wrapper for wcid access with validation")
drivers/net/wireless/mediatek/mt76/mt7996/main.c
3dd6f67c66 ("wifi: mt76: Move RCU section in mt7996_mcu_add_rate_ctrl()")
8989d8e90f ("wifi: mt76: mt7996: Do not set wcid.sta to 1 in mt7996_mac_sta_event()")
net/mac80211/cfg.c
58fcb1b428 ("wifi: mac80211: reject VHT opmode for unsupported channel widths")
037dc18ac3 ("wifi: mac80211: add support for storing station S1G capabilities")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
bnxt_fill_drv_seg_record() calls bnxt_dbg_hwrm_log_buffer_flush()
to flush the FW trace buffer. This needs to be done before we
call bnxt_copy_ctx_mem() to copy the trace data.
Without this fix, the coredump may not contain all the FW
traces.
Fixes: 3c2179e663 ("bnxt_en: Add FW trace coredump segments to the coredump")
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Shruti Parab <shruti.parab@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250710213938.1959625-3-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In bnxt_ets_validate(), the code incorrectly loops over all possible
traffic classes to check and add the ETS settings. Fix it to loop
over the configured traffic classes only.
The unconfigured traffic classes will default to TSA_ETS with 0
bandwidth. Looping over these unconfigured traffic classes may
cause the validation to fail and trigger this error message:
"rejecting ETS config starving a TC\n"
The .ieee_setets() will then fail.
Fixes: 7df4ae9fe8 ("bnxt_en: Implement DCBNL to support host-based DCBX.")
Reviewed-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Shravya KN <shravya.k-n@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250710213938.1959625-2-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add a network device with netdev features enabled.
Some features are enabled based on the capabilities
advertised by the firmware. Add the skeleton of minimal
netdev operations. Additionally, initialize the parameters
for rings (TX/RX/Completion).
Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Reviewed-by: Bhargava Chenna Marreddy <bhargava.marreddy@broadcom.com>
Reviewed-by: Rajashekar Hudumula <rajashekar.hudumula@broadcom.com>
Link: https://patch.msgid.link/20250701143511.280702-11-vikas.gupta@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Query resources from the firmware and, based on the
availability of resources, initialize the default
settings. The default settings include:
1. Rings and other resource reservations with the
firmware. This ensures that resources are reserved
before network and auxiliary devices are created.
2. Mapping the BAR, which helps access doorbells since
its size is known after querying the firmware.
3. Retrieving the TCs and hardware CoS queue mappings.
Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Reviewed-by: Bhargava Chenna Marreddy <bhargava.marreddy@broadcom.com>
Reviewed-by: Rajashekar Hudumula <rajashekar.hudumula@broadcom.com>
Link: https://patch.msgid.link/20250701143511.280702-10-vikas.gupta@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add irq allocation functions. This will help
to allocate IRQs to both netdev and RoCE aux devices.
Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Reviewed-by: Bhargava Chenna Marreddy <bhargava.marreddy@broadcom.com>
Reviewed-by: Rajashekar Hudumula <rajashekar.hudumula@broadcom.com>
Link: https://patch.msgid.link/20250701143511.280702-9-vikas.gupta@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Get the resources and capabilities from the firmware.
Add functions to manage the resources with the firmware.
These functions will help netdev reserve the resources
with the firmware before registering the device in future
patches. The resources and their information, such as
the maximum available and reserved, are part of the members
present in the bnge_hw_resc struct.
The bnge_reserve_rings() function also populates
the RSS table entries once the RX rings are reserved with
the firmware.
Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Reviewed-by: Bhargava Chenna Marreddy <bhargava.marreddy@broadcom.com>
Reviewed-by: Rajashekar Hudumula <rajashekar.hudumula@broadcom.com>
Link: https://patch.msgid.link/20250701143511.280702-8-vikas.gupta@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Backing store or context memory on the host helps the
device to manage rings, stats and other resources.
Context memory is allocated with the help of ring
alloc/free functions.
Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Reviewed-by: Bhargava Chenna Marreddy <bhargava.marreddy@broadcom.com>
Reviewed-by: Rajashekar Hudumula <rajashekar.hudumula@broadcom.com>
Link: https://patch.msgid.link/20250701143511.280702-7-vikas.gupta@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add ring allocation/free mechanism which help
to allocate rings (TX/RX/Completion) and backing
stores memory on the host for the device.
Future patches will use these functions.
Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Reviewed-by: Bhargava Chenna Marreddy <bhargava.marreddy@broadcom.com>
Reviewed-by: Rajashekar Hudumula <rajashekar.hudumula@broadcom.com>
Link: https://patch.msgid.link/20250701143511.280702-6-vikas.gupta@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Query firmware with the help of basic firmware commands and
cache the capabilities. With the help of basic commands
start the initialization process of the driver with the
firmware.
Since basic information is available from the firmware,
register with devlink.
Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Reviewed-by: Bhargava Chenna Marreddy <bhargava.marreddy@broadcom.com>
Reviewed-by: Rajashekar Hudumula <rajashekar.hudumula@broadcom.com>
Link: https://patch.msgid.link/20250701143511.280702-5-vikas.gupta@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add support to communicate with the firmware.
Future patches will use these functions to send the
messages to the firmware.
Functions support allocating request/response buffers
to send a particular command. Each command has certain
timeout value to which the driver waits for response from
the firmware. In error case, commands may be either timed
out waiting on response from the firmware or may return
a specific error code.
Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Reviewed-by: Bhargava Chenna Marreddy <bhargava.marreddy@broadcom.com>
Reviewed-by: Rajashekar Hudumula <rajashekar.hudumula@broadcom.com>
Link: https://patch.msgid.link/20250701143511.280702-4-vikas.gupta@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Allocate a base device and devlink interface with minimal
devlink ops.
Add dsn and board related information.
Map PCIe BAR (bar0), which helps to communicate with the
firmware.
Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Reviewed-by: Bhargava Chenna Marreddy <bhargava.marreddy@broadcom.com>
Reviewed-by: Rajashekar Hudumula <rajashekar.hudumula@broadcom.com>
Link: https://patch.msgid.link/20250701143511.280702-3-vikas.gupta@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
I received a kernel-test-bot report[1] that shows the
[-Wunused-but-set-variable] warning. Since the previous commit I made, as
the 'Fixes' tag shows, gives users an option to turn on and off the
CONFIG_RFS_ACCEL, the issue then can be discovered and reproduced with
GCC specifically.
Like Simon and Jakub suggested, use fewer #ifdefs which leads to fewer
bugs.
[1]
All warnings (new ones prefixed by >>):
drivers/net/ethernet/broadcom/bnxt/bnxt.c: In function 'bnxt_request_irq':
>> drivers/net/ethernet/broadcom/bnxt/bnxt.c:10703:9: warning: variable 'j' set but not used [-Wunused-but-set-variable]
10703 | int i, j, rc = 0;
| ^
Fixes: 9b6a30febd ("net: allow rps/rfs related configs to be switched")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202506282102.x1tXt0qz-lkp@intel.com/
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
In both the read callback for struct cyclecounter, and in struct
timecounter, struct cyclecounter is declared as a const pointer.
Unfortunatly, a number of users of this pointer treat it as a non-const
pointer as it is burried in a larger structure that is heavily modified by
the callback function when accessed. This lie had been hidden by the fact
that container_of() "casts away" a const attribute of a pointer without any
compiler warning happening at all.
Fix this all up by removing the const attribute in the needed places so
that everyone can see that the structure really isn't const, but can,
and is, modified by the users of it.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/2025070124-backyard-hurt-783a@gregkh
The Rx rings are filled with Rx buffers. Which are supposed to fit
packet headers (or MTU if HW-GRO is disabled). The aggregation buffers
are filled with "device pages". Adjust the sizes of the page pool
recycling ring appropriately, based on ratio of the size of the
buffer on given ring vs system page size. Otherwise on a system
with 64kB pages we end up with >700MB of memory sitting in every
single page pool cache.
Correct the size calculation for the head_pool. Since the buffers
there are always small I'm pretty sure I meant to cap the size
at 1k, rather than make it the lowest possible size. With 64k pages
1k cache with a 1k ring is 64x larger than we need.
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250626165441.4125047-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Correct spelling as flagged by codespell.
Signed-off-by: Simon Horman <horms@kernel.org>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use netmem_dma_*() helpers and declare netmem_tx to support netmem TX.
By this change, all bnxt devices will support the netmem TX.
Unreadable skbs are not going to be handled by the TX push logic.
So, it checks whether a skb is readable or not before the TX push logic.
netmem TX can be tested with ncdevmem.c
Acked-by: Mina Almasry <almasrymina@google.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250619144058.147051-1-ap420073@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This reverts commit 325eb217e4.
udp_tunnel infra doesn't need RTNL, should be safe to get back
to only netdev instance lock.
Cc: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Stanislav Fomichev <stfomichev@gmail.com>
Link: https://patch.msgid.link/20250616162117.287806-7-stfomichev@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Drivers that are using ops lock and don't depend on RTNL lock
still need to manage it because udp_tunnel's RTNL dependency.
Introduce new udp_tunnel_nic_lock and use it instead of
rtnl_lock. Drop non-UDP_TUNNEL_NIC_INFO_MAY_SLEEP mode from
udp_tunnel infra (udp_tunnel_nic_device_sync_work needs to
grab udp_tunnel_nic_lock mutex and might sleep).
Cover more places in v4:
- netlink
- udp_tunnel_notify_add_rx_port (ndo_open)
- triggers udp_tunnel_nic_device_sync_work
- udp_tunnel_notify_del_rx_port (ndo_stop)
- triggers udp_tunnel_nic_device_sync_work
- udp_tunnel_get_rx_info (__netdev_update_features)
- triggers NETDEV_UDP_TUNNEL_PUSH_INFO
- udp_tunnel_drop_rx_info (__netdev_update_features)
- triggers NETDEV_UDP_TUNNEL_DROP_INFO
- udp_tunnel_nic_reset_ntf (ndo_open)
- notifiers
- udp_tunnel_nic_netdevice_event, depending on the event:
- triggers NETDEV_UDP_TUNNEL_PUSH_INFO
- triggers NETDEV_UDP_TUNNEL_DROP_INFO
- ethnl_tunnel_info_reply_size
- udp_tunnel_nic_set_port_priv (two intel drivers)
Cc: Michael Chan <michael.chan@broadcom.com>
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Stanislav Fomichev <stfomichev@gmail.com>
Link: https://patch.msgid.link/20250616162117.287806-4-stfomichev@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Migrate to new callbacks added by commit 9bb00786fc ("net: ethtool:
add dedicated callbacks for getting and setting rxfh fields").
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250617014555.434790-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Migrate to new callbacks added by commit 9bb00786fc ("net: ethtool:
add dedicated callbacks for getting and setting rxfh fields").
The driver has no other RXNFC functionality so the SET callback can
be now removed.
Reviewed-by: Subbaraya Sundeep <sbhatta@marvell.com>
Link: https://patch.msgid.link/20250617014555.434790-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The disable sequence in bcmgenet_phy_power_set() is updated to
match the inverse sequence and timing (and spacing) of the
enable sequence. This ensures that LEDs driven by the GENET IP
are disabled when the GPHY is powered down.
Signed-off-by: Doug Berger <opendmb@gmail.com>
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250614025817.3808354-1-florian.fainelli@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Improved wording and grammar in several comments for clarity.
"the must belongs" -> "it must belong"
"mininum" -> "minimum"
"fileds" -> "fields"
Replaced return -1 with -EINVAL in hwrm_ring_alloc_send_msg()
to return a proper error code.
These changes enhance code readability and consistent error handling.
Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250615154051.1365631-1-alok.a.tiwari@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The commit under the Fixes tag below which updates the VNICs' RSS
and MRU during .ndo_queue_start(), needs to be extended to cover any
non-default RSS contexts which have their own VNICs. Without this
step, packets that are destined to a non-default RSS context may be
dropped after .ndo_queue_start().
We further optimize this scheme by updating the VNIC only if the
RX ring being restarted is in the RSS table of the VNIC. Updating
the VNIC (in particular setting the MRU to 0) will momentarily stop
all traffic to all rings in the RSS table. Any VNIC that has the
RX ring excluded from the RSS table can skip this step and avoid the
traffic disruption.
Note that this scheme is just an improvement. A VNIC with multiple
rings in the RSS table will still see traffic disruptions to all rings
in the RSS table when one of the rings is being restarted. We are
working on a FW scheme that will improve upon this further.
Fixes: 5ac066b7b0 ("bnxt_en: Fix queue start to update vnic RSS table")
Reported-by: David Wei <dw@davidwei.uk>
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250613231841.377988-4-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add a new helper function that will configure MRU and RSS table
of a VNIC. This will be useful when we configure both on a VNIC
when resetting an RX ring. This function will be used again in
the next bug fix patch where we have to reconfigure VNICs for RSS
contexts.
Suggested-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: David Wei <dw@davidwei.uk>
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250613231841.377988-3-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Before the commit under the Fixes tag below, bnxt_ulp_stop() and
bnxt_ulp_start() were always invoked in pairs. After that commit,
the new bnxt_ulp_restart() can be invoked after bnxt_ulp_stop()
has been called. This may result in the RoCE driver's aux driver
.suspend() method being invoked twice. The 2nd bnxt_re_suspend()
call will crash when it dereferences a NULL pointer:
(NULL ib_device): Handle device suspend call
BUG: kernel NULL pointer dereference, address: 0000000000000b78
PGD 0 P4D 0
Oops: Oops: 0000 [#1] SMP PTI
CPU: 20 UID: 0 PID: 181 Comm: kworker/u96:5 Tainted: G S 6.15.0-rc1 #4 PREEMPT(voluntary)
Tainted: [S]=CPU_OUT_OF_SPEC
Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.4.3 01/17/2017
Workqueue: bnxt_pf_wq bnxt_sp_task [bnxt_en]
RIP: 0010:bnxt_re_suspend+0x45/0x1f0 [bnxt_re]
Code: 8b 05 a7 3c 5b f5 48 89 44 24 18 31 c0 49 8b 5c 24 08 4d 8b 2c 24 e8 ea 06 0a f4 48 c7 c6 04 60 52 c0 48 89 df e8 1b ce f9 ff <48> 8b 83 78 0b 00 00 48 8b 80 38 03 00 00 a8 40 0f 85 b5 00 00 00
RSP: 0018:ffffa2e84084fd88 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001
RDX: 0000000000000000 RSI: ffffffffb4b6b934 RDI: 00000000ffffffff
RBP: ffffa1760954c9c0 R08: 0000000000000000 R09: c0000000ffffdfff
R10: 0000000000000001 R11: ffffa2e84084fb50 R12: ffffa176031ef070
R13: ffffa17609775000 R14: ffffa17603adc180 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffffa17daa397000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000b78 CR3: 00000004aaa30003 CR4: 00000000003706f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
bnxt_ulp_stop+0x69/0x90 [bnxt_en]
bnxt_sp_task+0x678/0x920 [bnxt_en]
? __schedule+0x514/0xf50
process_scheduled_works+0x9d/0x400
worker_thread+0x11c/0x260
? __pfx_worker_thread+0x10/0x10
kthread+0xfe/0x1e0
? __pfx_kthread+0x10/0x10
ret_from_fork+0x2b/0x40
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
Check the BNXT_EN_FLAG_ULP_STOPPED flag and do not proceed if the flag
is already set. This will preserve the original symmetrical
bnxt_ulp_stop() and bnxt_ulp_start().
Also, inside bnxt_ulp_start(), clear the BNXT_EN_FLAG_ULP_STOPPED
flag after taking the mutex to avoid any race condition. And for
symmetry, only proceed in bnxt_ulp_start() if the
BNXT_EN_FLAG_ULP_STOPPED is set.
Fixes: 3c163f35bd ("bnxt_en: Optimize recovery path ULP locking in the driver")
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Co-developed-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250613231841.377988-2-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Make use of the return value from napi_complete_done(). This allows
users to use the gro_flush_timeout and napi_defer_hard_irqs sysfs
attributes for configuring software interrupt coalescing.
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Justin Chen <justin.chen@broadcom.com>
Reviewed-by: Joe Damato <joe@dama.to>
Link: https://patch.msgid.link/20250611212730.252342-2-florian.fainelli@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Make use of the return value from napi_complete_done(). This allows users to
use the gro_flush_timeout and napi_defer_hard_irqs sysfs attributes for
configuring software interrupt coalescing.
Signed-off-by: Zak Kemble <zakkemble@gmail.com>
Tested-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20250610220403.935-2-zakkemble@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Move this API to the canonical timer_*() namespace.
[ tglx: Redone against pre rc1 ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/aB2X0jCKQO56WdMt@gmail.com
-----BEGIN PGP SIGNATURE-----
iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmhAa9EUHGJoZWxnYWFz
QGdvb2dsZS5jb20ACgkQWYigwDrT+vyA3w//aX8d73z/xVxkYLMN/6XQA5fdmd4d
Dv4n0Pjf0WCMKbsgRCdXEYLvcHV8VhH5iCR/b2UsFm9LjxSIRuqE5XosY3bNhrHn
xVKEh2prq2XZOibWrFkJ+RZ0FF7Ogq1Uy5gUBbBHbE1q1byZzrOALaF3FWGaDIZQ
6QLLAFtd3UtqOOUu8J8P9N15uFR8gunyfuM9U7TLMcy4B8txk6T6m/9xAWtRURuJ
I6WN8lO+g8Nl2mL9m27+wyWiVT3tKqoMwp8rVtym/L5JQOmHycYhn0WQAr2dPCMs
Xbgmoeei0je7mZvk5btpt68NAKQ3ZnCVkxbbINBkUxAjI0dbI6h37EhW18ShYVUk
CCo4fmaFtwP8qNN9tSvDN8vZdGB44fN5tIz4lmGzKk5gt+oV50RC/APrzC+PJBQ0
+2SdDVKj71Gr2H1VnI6uLB7oQ+tp7TOdhg+DGV4bdc6QFnsM+BpKWRq5f1UQcau/
XVDmorM/2t6z0DNktAv3NFwSodUjk1loWESr/pRBH1AqAWZTK98PWIg97XYsal59
zbJ3dLrnCqUNozeVgjtZo1LWD2FZaVTvhq2NY7D+QPpnMGhFUhHxNliZUXiQa1q4
boI2hEFdu3IQP/OC2a1zGJyMRLU43d5rhZ1U5xQSVtM0c3lgCY7rn/t26LymQVPA
SYdg2jBcnhe6gXo=
=eWJw
-----END PGP SIGNATURE-----
Merge tag 'pci-v6.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull pci updates from Bjorn Helgaas:
"Enumeration:
- Print the actual delay time in pci_bridge_wait_for_secondary_bus()
instead of assuming it was 1000ms (Wilfred Mallawa)
- Revert 'iommu/amd: Prevent binding other PCI drivers to IOMMU PCI
devices', which broke resume from system sleep on AMD platforms and
has been fixed by other commits (Lukas Wunner)
Resource management:
- Remove mtip32xx use of pcim_iounmap_regions(), which is deprecated
and unnecessary (Philipp Stanner)
- Remove pcim_iounmap_regions() and pcim_request_region_exclusive()
and related flags since all uses have been removed (Philipp
Stanner)
- Rework devres 'request' functions so they are no longer 'hybrid',
i.e., their behavior no longer depends on whether
pcim_enable_device or pci_enable_device() was used, and remove
related code (Philipp Stanner)
- Warn (not BUG()) about failure to assign optional resources (Ilpo
Järvinen)
Error handling:
- Log the DPC Error Source ID only when it's actually valid (when
ERR_FATAL or ERR_NONFATAL was received from a downstream device)
and decode into bus/device/function (Bjorn Helgaas)
- Determine AER log level once and save it so all related messages
use the same level (Karolina Stolarek)
- Use KERN_WARNING, not KERN_ERR, when logging PCIe Correctable
Errors (Karolina Stolarek)
- Ratelimit PCIe Correctable and Non-Fatal error logging, with sysfs
controls on interval and burst count, to avoid flooding logs and
RCU stall warnings (Jon Pan-Doh)
Power management:
- Increment PM usage counter when probing reset methods so we don't
try to read config space of a powered-off device (Alex Williamson)
- Set all devices to D0 during enumeration to ensure ACPI opregion is
connected via _REG (Mario Limonciello)
Power control:
- Rename pwrctrl Kconfig symbols from 'PWRCTL' to 'PWRCTRL' to match
the filename paths. Retain old deprecated symbols for
compatibility, except for the pwrctrl slot driver
(PCI_PWRCTRL_SLOT) (Johan Hovold)
- When unregistering pwrctrl, cancel outstanding rescan work before
cleaning up data structures to avoid use-after-free issues (Brian
Norris)
Bandwidth control:
- Simplify link bandwidth controller by replacing the count of Link
Bandwidth Management Status (LBMS) events with a PCI_LINK_LBMS_SEEN
flag (Ilpo Järvinen)
- Update the Link Speed after retraining, since the Link Speed may
have changed (Ilpo Järvinen)
PCIe native device hotplug:
- Ignore Presence Detect Changed caused by DPC.
pciehp already ignores Link Down/Up events caused by DPC, but on
slots using in-band presence detect, DPC causes a spurious Presence
Detect Changed event (Lukas Wunner)
- Ignore Link Down/Up caused by Secondary Bus Reset.
On hotplug ports using in-band presence detect, the reset causes a
Presence Detect Changed event, which mistakenly caused teardown and
re-enumeration of the device. Drivers may need to annotate code
that resets their device (Lukas Wunner)
Virtualization:
- Add an ACS quirk for Loongson Root Ports that don't advertise ACS
but don't allow peer-to-peer transactions between Root Ports; the
quirk allows each Root Port to be in a separate IOMMU group (Huacai
Chen)
Endpoint framework:
- For fixed-size BARs, retain both the actual size and the possibly
larger size allocated to accommodate iATU alignment requirements
(Jerome Brunet)
- Simplify ctrl/SPAD space allocation and avoid allocating more space
than needed (Jerome Brunet)
- Correct MSI-X PBA offset calculations for DesignWare and Cadence
endpoint controllers (Niklas Cassel)
- Align the return value (number of interrupts) encoding for
pci_epc_get_msi()/pci_epc_ops::get_msi() and
pci_epc_get_msix()/pci_epc_ops::get_msix() (Niklas Cassel)
- Align the nr_irqs parameter encoding for
pci_epc_set_msi()/pci_epc_ops::set_msi() and
pci_epc_set_msix()/pci_epc_ops::set_msix() (Niklas Cassel)
Common host controller library:
- Convert pci-host-common to a library so platforms that don't need
native host controller drivers don't need to include these helper
functions (Manivannan Sadhasivam)
Apple PCIe controller driver:
- Extract ECAM bridge creation helper from pci_host_common_probe() to
separate driver-specific things like MSI from PCI things (Marc
Zyngier)
- Dynamically allocate RID-to_SID bitmap to prepare for SoCs with
varying capabilities (Marc Zyngier)
- Skip ports disabled in DT when setting up ports (Janne Grunau)
- Add t6020 compatible string (Alyssa Rosenzweig)
- Add T602x PCIe support (Hector Martin)
- Directly set/clear INTx mask bits because T602x dropped the
accessors that could do this without locking (Marc Zyngier)
- Move port PHY registers to their own reg items to accommodate
T602x, which moves them around; retain default offsets for existing
DTs that lack phy%d entries with the reg offsets (Hector Martin)
- Stop polling for core refclk, which doesn't work on T602x and the
bootloader has already done anyway (Hector Martin)
- Use gpiod_set_value_cansleep() when asserting PERST# in probe
because we're allowed to sleep there (Hector Martin)
Cadence PCIe controller driver:
- Drop a runtime PM 'put' to resolve a runtime atomic count underflow
(Hans Zhang)
- Make the cadence core buildable as a module (Kishon Vijay Abraham I)
- Add cdns_pcie_host_disable() and cdns_pcie_ep_disable() for use by
loadable drivers when they are removed (Siddharth Vadapalli)
Freescale i.MX6 PCIe controller driver:
- Apply link training workaround only on IMX6Q, IMX6SX, IMX6SP
(Richard Zhu)
- Remove redundant dw_pcie_wait_for_link() from
imx_pcie_start_link(); since the DWC core does this, imx6 only
needs it when retraining for a faster link speed (Richard Zhu)
- Toggle i.MX95 core reset to align with PHY powerup (Richard Zhu)
- Set SYS_AUX_PWR_DET to work around i.MX95 ERR051624 erratum: in
some cases, the controller can't exit 'L23 Ready' through Beacon or
PERST# deassertion (Richard Zhu)
- Clear GEN3_ZRXDC_NONCOMPL to work around i.MX95 ERR051586 erratum:
controller can't meet 2.5 GT/s ZRX-DC timing when operating at 8
GT/s, causing timeouts in L1 (Richard Zhu)
- Wait for i.MX95 PLL lock before enabling controller (Richard Zhu)
- Save/restore i.MX95 LUT for suspend/resume (Richard Zhu)
Mobiveil PCIe controller driver:
- Return bool (not int) for link-up check in
mobiveil_pab_ops.link_up() and layerscape-gen4, mobiveil (Hans
Zhang)
NVIDIA Tegra194 PCIe controller driver:
- Create debugfs directory for 'aspm_state_cnt' only when
CONFIG_PCIEASPM is enabled, since there are no other entries (Hans
Zhang)
Qualcomm PCIe controller driver:
- Add OF support for parsing DT 'eq-presets-<N>gts' property for lane
equalization presets (Krishna Chaitanya Chundru)
- Read Maximum Link Width from the Link Capabilities register if DT
lacks 'num-lanes' property (Krishna Chaitanya Chundru)
- Add Physical Layer 64 GT/s Capability ID and register offsets for
8, 32, and 64 GT/s lane equalization registers (Krishna Chaitanya
Chundru)
- Add generic dwc support for configuring lane equalization presets
(Krishna Chaitanya Chundru)
- Add DT and driver support for PCIe on IPQ5018 SoC (Nitheesh Sekar)
Renesas R-Car PCIe controller driver:
- Describe endpoint BAR 4 as being fixed size (Jerome Brunet)
- Document how to obtain R-Car V4H (r8a779g0) controller firmware
(Yoshihiro Shimoda)
Rockchip PCIe controller driver:
- Reorder rockchip_pci_core_rsts because
reset_control_bulk_deassert() deasserts in reverse order, to fix a
link training regression (Jensen Huang)
- Mark RK3399 as being capable of raising INTx interrupts (Niklas
Cassel)
Rockchip DesignWare PCIe controller driver:
- Check only PCIE_LINKUP, not LTSSM status, to determine whether the
link is up (Shawn Lin)
- Increase N_FTS (used in L0s->L0 transitions) and enable ASPM L0s
for Root Complex and Endpoint modes (Shawn Lin)
- Hide the broken ATS Capability in rockchip_pcie_ep_init() instead
of rockchip_pcie_ep_pre_init() so it stays hidden after PERST#
resets non-sticky registers (Shawn Lin)
- Call phy_power_off() before phy_exit() in rockchip_pcie_phy_deinit()
(Diederik de Haas)
Synopsys DesignWare PCIe controller driver:
- Set PORT_LOGIC_LINK_WIDTH to one lane to make initial link training
more robust; this will not affect the intended link width if all
lanes are functional (Wenbin Yao)
- Return bool (not int) for link-up check in dw_pcie_ops.link_up()
and armada8k, dra7xx, dw-rockchip, exynos, histb, keembay,
keystone, kirin, meson, qcom, qcom-ep, rcar_gen4, spear13xx,
tegra194, uniphier, visconti (Hans Zhang)
- Add debugfs support for exposing DWC device-specific PTM context
(Manivannan Sadhasivam)
TI J721E PCIe driver:
- Make j721e buildable as a loadable and removable module (Siddharth
Vadapalli)
- Fix j721e host/endpoint dependencies that result in link failures
in some configs (Arnd Bergmann)
Device tree bindings:
- Add qcom DT binding for 'global' interrupt (PCIe controller and
link-specific events) for ipq8074, ipq8074-gen3, ipq6018, sa8775p,
sc7280, sc8180x sdm845, sm8150, sm8250, sm8350 (Manivannan
Sadhasivam)
- Add qcom DT binding for 8 MSI SPI interrupts for msm8998, ipq8074,
ipq8074-gen3, ipq6018 (Manivannan Sadhasivam)
- Add dw rockchip DT binding for rk3576 and rk3562 (Kever Yang)
- Correct indentation and style of examples in brcm,stb-pcie,
cdns,cdns-pcie-ep, intel,keembay-pcie-ep, intel,keembay-pcie,
microchip,pcie-host, rcar-pci-ep, rcar-pci-host, xilinx-versal-cpm
(Krzysztof Kozlowski)
- Convert Marvell EBU (dove, kirkwood, armada-370, armada-xp) and
armada8k from text to schema DT bindings (Rob Herring)
- Remove obsolete .txt DT bindings for content that has been moved to
schemas (Rob Herring)
- Add qcom DT binding for MHI registers in IPQ5332, IPQ6018, IPQ8074
and IPQ9574 (Varadarajan Narayanan)
- Convert v3,v360epc-pci from text to DT schema binding (Rob Herring)
- Change microchip,pcie-host DT binding to be 'dma-noncoherent' since
PolarFire may be configured that way (Conor Dooley)
Miscellaneous:
- Drop 'pci' suffix from intel_mid_pci.c filename to match similar
files (Andy Shevchenko)
- All platforms with PCI have an MMU, so add PCI Kconfig dependency
on MMU to simplify build testing and avoid inadvertent build
regressions (Arnd Bergmann)
- Update Krzysztof Wilczyński's email address in MAINTAINERS
(Krzysztof Wilczyński)
- Update Manivannan Sadhasivam's email address in MAINTAINERS
(Manivannan Sadhasivam)"
* tag 'pci-v6.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (147 commits)
MAINTAINERS: Update Manivannan Sadhasivam email address
PCI: j721e: Fix host/endpoint dependencies
PCI: j721e: Add support to build as a loadable module
PCI: cadence-ep: Introduce cdns_pcie_ep_disable() helper for cleanup
PCI: cadence-host: Introduce cdns_pcie_host_disable() helper for cleanup
PCI: cadence: Add support to build pcie-cadence library as a kernel module
MAINTAINERS: Update Krzysztof Wilczyński email address
PCI: Remove unnecessary linesplit in __pci_setup_bridge()
PCI: WARN (not BUG()) when we fail to assign optional resources
PCI: Remove unused pci_printk()
PCI: qcom: Replace PERST# sleep time with proper macro
PCI: dw-rockchip: Replace PERST# sleep time with proper macro
PCI: host-common: Convert to library for host controller drivers
PCI/ERR: Remove misleading TODO regarding kernel panic
PCI: cadence: Remove duplicate message code definitions
PCI: endpoint: Align pci_epc_set_msix(), pci_epc_ops::set_msix() nr_irqs encoding
PCI: endpoint: Align pci_epc_set_msi(), pci_epc_ops::set_msi() nr_irqs encoding
PCI: endpoint: Align pci_epc_get_msix(), pci_epc_ops::get_msix() return value encoding
PCI: endpoint: Align pci_epc_get_msi(), pci_epc_ops::get_msi() return value encoding
PCI: cadence-ep: Correct PBA offset in .set_msix() callback
...
Cross-merge networking fixes after downstream PR (net-6.15-rc8).
Conflicts:
80f2ab46c2 ("irdma: free iwdev->rf after removing MSI-X")
4bcc063939 ("ice, irdma: fix an off by one in error handling code")
c24a65b6a2 ("iidc/ice/irdma: Update IDC to support multiple consumers")
https://lore.kernel.org/20250513130630.280ee6c5@canb.auug.org.au
No extra adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When xdp is attached or detached, dev->ndo_bpf() is called by
do_setlink(), and it acquires netdev_lock() if needed.
Unlike other drivers, the bnxt driver is protected by netdev_lock while
xdp is attached/detached because it sets dev->request_ops_lock to true.
So, the bnxt_xdp(), that is callback of ->ndo_bpf should not acquire
netdev_lock().
But the xdp_features_{set | clear}_redirect_target() was changed to
acquire netdev_lock() internally.
It causes a deadlock.
To fix this problem, bnxt driver should use
xdp_features_{set | clear}_redirect_target_locked() instead.
Splat looks like:
============================================
WARNING: possible recursive locking detected
6.15.0-rc6+ #1 Not tainted
--------------------------------------------
bpftool/1745 is trying to acquire lock:
ffff888131b85038 (&dev->lock){+.+.}-{4:4}, at: xdp_features_set_redirect_target+0x1f/0x80
but task is already holding lock:
ffff888131b85038 (&dev->lock){+.+.}-{4:4}, at: do_setlink.constprop.0+0x24e/0x35d0
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&dev->lock);
lock(&dev->lock);
*** DEADLOCK ***
May be due to missing lock nesting notation
3 locks held by bpftool/1745:
#0: ffffffffa56131c8 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_setlink+0x1fe/0x570
#1: ffffffffaafa75a0 (&net->rtnl_mutex){+.+.}-{4:4}, at: rtnl_setlink+0x236/0x570
#2: ffff888131b85038 (&dev->lock){+.+.}-{4:4}, at: do_setlink.constprop.0+0x24e/0x35d0
stack backtrace:
CPU: 1 UID: 0 PID: 1745 Comm: bpftool Not tainted 6.15.0-rc6+ #1 PREEMPT(undef)
Hardware name: ASUS System Product Name/PRIME Z690-P D4, BIOS 0603 11/01/2021
Call Trace:
<TASK>
dump_stack_lvl+0x7a/0xd0
print_deadlock_bug+0x294/0x3d0
__lock_acquire+0x153b/0x28f0
lock_acquire+0x184/0x340
? xdp_features_set_redirect_target+0x1f/0x80
__mutex_lock+0x1ac/0x18a0
? xdp_features_set_redirect_target+0x1f/0x80
? xdp_features_set_redirect_target+0x1f/0x80
? __pfx_bnxt_rx_page_skb+0x10/0x10 [bnxt_en
? __pfx___mutex_lock+0x10/0x10
? __pfx_netdev_update_features+0x10/0x10
? bnxt_set_rx_skb_mode+0x284/0x540 [bnxt_en
? __pfx_bnxt_set_rx_skb_mode+0x10/0x10 [bnxt_en
? xdp_features_set_redirect_target+0x1f/0x80
xdp_features_set_redirect_target+0x1f/0x80
bnxt_xdp+0x34e/0x730 [bnxt_en 11cbcce8fa11cff1dddd7ef358d6219e4ca9add3]
dev_xdp_install+0x3f4/0x830
? __pfx_bnxt_xdp+0x10/0x10 [bnxt_en 11cbcce8fa11cff1dddd7ef358d6219e4ca9add3]
? __pfx_dev_xdp_install+0x10/0x10
dev_xdp_attach+0x560/0xf70
dev_change_xdp_fd+0x22d/0x280
do_setlink.constprop.0+0x2989/0x35d0
? __pfx_do_setlink.constprop.0+0x10/0x10
? lock_acquire+0x184/0x340
? find_held_lock+0x32/0x90
? rtnl_setlink+0x236/0x570
? rcu_is_watching+0x11/0xb0
? trace_contention_end+0xdc/0x120
? __mutex_lock+0x946/0x18a0
? __pfx___mutex_lock+0x10/0x10
? __lock_acquire+0xa95/0x28f0
? rcu_is_watching+0x11/0xb0
? rcu_is_watching+0x11/0xb0
? cap_capable+0x172/0x350
rtnl_setlink+0x2cd/0x570
Fixes: 03df156dd3 ("xdp: double protect netdev->xdp_flags with netdev->lock")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250520071155.2462843-1-ap420073@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
netdev_lock is already held when calling bnxt_ulp_irq_stop() and
bnxt_ulp_irq_restart(). When converting rtnl_lock to netdev_lock,
the original code was rtnl_dereference() to indicate that rtnl_lock
was already held. rcu_dereference_protected() is the correct
conversion after replacing rtnl_lock with netdev_lock.
Add a new helper netdev_lock_dereference() similar to
rtnl_dereference().
Fixes: 004b500801 ("eth: bnxt: remove most dependencies on RTNL")
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250519204130.3097027-2-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>