Slave function read the following capabilities from the wrong offset:
1. log_mc_entry_sz
2. fs_log_entry_sz
3. log_mc_hash_sz
Fix that by adjusting these capabilities offset to match firmware
layout.
Due to the wrong offset read, the following issues might occur:
1+2. Negative value reported at max_mcast_qp_attach.
3. Driver to init FW with multicast hash size of zero.
Fixes: a40ded6043 ("net/mlx4_core: Add masking for a few queries on HCA caps")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Eran Ben Elisha <eranbe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20201118081922.553-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl+0KZ4ACgkQSD+KveBX
+j5DDAgAip1LGfOUq7RISh1OWQ9zLl2KT/mmbdSioObGgjKunU4lMGZAPrLB0bpe
RH1RodslY+1bepcz7VQ/QxbL0EU605SZn08xLZIQAYYXT0Sar+hhg7h7vhD0iSA1
dwDF/XLQYHf8snc3x/tu6/zp9BzfzRfuiieYAbC1ri97Iz9uoYO+QcRI7OyqgXWf
LmK9/s6WzgaC0DKg8XXEQGp7F1MbZfJph6tThZnF62NY9Rrkut6AEah85p1QaqF7
9WLjqOtJwo893fzbN+WCXJTYo2UX9oU18nUSvSoYid2LEofiwY3PcT8R0gKSPZto
cVcjm1oGsotbonBnPlR2eMZq8LdPDQ==
=MKi5
-----END PGP SIGNATURE-----
Merge tag 'mlx5-fixes-2020-11-17' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5 fixes 2020-11-17
This series introduces some fixes to mlx5 driver.
* tag 'mlx5-fixes-2020-11-17' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5: fix error return code in mlx5e_tc_nic_init()
net/mlx5: E-Switch, Fail mlx5_esw_modify_vport_rate if qos disabled
net/mlx5: Disable QoS when min_rates on all VFs are zero
net/mlx5: Clear bw_share upon VF disable
net/mlx5: Add handling of port type in rule deletion
net/mlx5e: Fix check if netdev is bond slave
net/mlx5e: Fix IPsec packet drop by mlx5e_tc_update_skb
net/mlx5e: Set IPsec WAs only in IP's non checksum partial case.
net/mlx5e: Fix refcount leak on kTLS RX resync
====================
Link: https://lore.kernel.org/r/20201117195702.386113-1-saeedm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The driver sends Ethernet Management Datagram (EMAD) packets to the
device for configuration purposes and waits for up to 200ms for a reply.
A request is retried up to 5 times.
When the system is under heavy load, replies are not always processed in
time and EMAD transactions fail.
Make the process more robust to such delays by using exponential
backoff. First wait for up to 200ms, then retransmit and wait for up to
400ms and so on.
Fixes: caf7297e7a ("mlxsw: core: Introduce support for asynchronous EMAD register access")
Reported-by: Denis Yulevich <denisyu@nvidia.com>
Tested-by: Denis Yulevich <denisyu@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The commit cited below moved firmware flashing functionality from
mlxsw_spectrum to mlxsw_core, but did not adjust the Kconfig
dependencies. This makes it possible to have mlxsw_core as built-in and
mlxfw as a module. The mlxfw code is therefore not reachable from
mlxsw_core and firmware flashing fails:
# devlink dev flash pci/0000:01:00.0 file mellanox/mlxsw_spectrum-13.2008.1310.mfa2
devlink answers: Operation not supported
Fix by having mlxsw_core select mlxfw.
Fixes: b79cb787ac ("mlxsw: Move fw flashing code into core.c")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reported-by: Vadim Pasternak <vadimp@nvidia.com>
Tested-by: Vadim Pasternak <vadimp@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fix to return a negative error code from the error handling
case instead of 0, as done elsewhere in this function.
Fixes: aedd133d17 ("net/mlx5e: Support CT offload for tc nic flows")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Avoid calling mlx5_esw_modify_vport_rate() if qos is not enabled and
avoid unnecessary syndrome messages from firmware.
Fixes: fcb64c0f56 ("net/mlx5: E-Switch, add ingress rate support")
Signed-off-by: Eli Cohen <elic@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Currently when QoS is enabled for VF and any min_rate is configured,
the driver sets bw_share value to at least 1 and doesn’t allow to set
it to 0 to make minimal rate unlimited. It means there is always a
minimal rate configured for every VF, even if user tries to remove it.
In order to make QoS disable possible, check whether all vports have
configured min_rate = 0. If this is true, set their bw_share to 0 to
disable min_rate limitations.
Fixes: c9497c9890 ("net/mlx5: Add support for setting VF min rate")
Signed-off-by: Vladyslav Tarasiuk <vladyslavt@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Currently, if user disables VFs with some min and max rates configured,
they are cleared. But QoS data is not cleared and restored upon next VF
enable placing limits on minimal rate for given VF, when user expects
none.
To match cleared vport->info struct with QoS-related min and max rates
upon VF disable, clear vport->qos struct too.
Fixes: 556b9d16d3 ("net/mlx5: Clear VF's configuration on disabling SRIOV")
Signed-off-by: Vladyslav Tarasiuk <vladyslavt@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Bond events handler uses bond_slave_get_rtnl to check if net device
is bond slave. bond_slave_get_rtnl return the rcu rx_handler pointer
from the netdev which exists for bond slaves but also exists for
devices that are attached to linux bridge so using it as indication
for bond slave is wrong.
Fix by using netif_is_lag_port instead.
Fixes: 7e51891a23 ("net/mlx5e: Use netdev events to set/del egress acl forward-to-vport rule")
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Ariel Levkovich <lariel@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Both TC and IPsec crypto offload use metadata_regB to store
private information. Since TC does not use bit 31 of regB, IPsec
will use bit 31 as the IPsec packet marker. The IPsec's regB usage
is changed to:
Bit31: IPsec marker
Bit30-24: IPsec syndrome
Bit23-0: IPsec obj id
Fixes: b2ac7541e3 ("net/mlx5e: IPsec: Add Connect-X IPsec Rx data path offload")
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Ariel Levkovich <lariel@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The IP's checksum partial still requires L4 csum flag on Ethernet WQE.
Make the IPsec WAs only for the IP's non checksum partial case
(for example icmd packet)
Fixes: 5be019040c ("net/mlx5e: IPsec: Add Connect-X IPsec Tx data path offload")
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Alaa Hleihel <alaa@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
On resync, the driver calls inet_lookup_established
(__inet6_lookup_established) that increases sk_refcnt of the socket. To
decrease it, the driver set skb->destructor to sock_edemux. However, it
didn't work well, because the TCP stack also sets this destructor for
early demux, and the refcount gets decreased only once, while increased
two times (in mlx5e and in the TCP stack). It leads to a socket leak, a
TLS context leak, which in the end leads to calling tls_dev_del twice:
on socket close and on driver unload, which in turn leads to a crash.
This commit fixes the refcount leak by calling sock_gen_put right away
after using the socket, thus fixing all the subsequent issues.
Fixes: 0419d8c9d8 ("net/mlx5e: kTLS, Add kTLS RX resync support")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
rq->xdp_prog is RCU-protected and should be accessed only with
rcu_access_pointer for the NULL check in mlx5e_poll_rx_cq.
rq->xdp_prog may change on the fly only from one non-NULL value to
another non-NULL value, so the checks in mlx5e_xdp_handle and
mlx5e_poll_rx_cq will have the same result during one NAPI cycle,
meaning that no additional synchronization is needed.
Fixes: fe45386a20 ("net/mlx5e: Use RCU to protect rq->xdp_prog")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
During driver reload, perform firmware tear-down which results in
firmware losing the configured VXLAN ports. These ports are still
available in the driver's database. Fix this by cleaning up driver's
VXLAN database in the nic unload flow, before firmware tear-down. With
that, minimize mlx5_vxlan_destroy() to remove only what was added in
mlx5_vxlan_create() and warn on leftover UDP ports.
Fixes: 18a2b7f969 ("net/mlx5: convert to new udp_tunnel infrastructure")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
When E-switch vport is disabled, querying its hardware address is
unsupported.
Avoid setting extack error log message in such case.
Fixes: f099fde16d ("net/mlx5: E-switch, Support querying port function mac address")
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
When a rule is duplicated, the refcount of the rule is increased so only
the second deletion of the rule should cause destruction of the FTE.
Currently, the FTE will be destroyed in the first deletion of rule since
the modify_mask will be 0.
Fix it and call to destroy FTE only if all the rules (FTE's children)
have been removed.
Fixes: 718ce4d601 ("net/mlx5: Consolidate update FTE for all removal changes")
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
async_icosq_lock may be taken from softirq and non-softirq contexts. It
requires protection with spin_lock_bh, otherwise a softirq may be
triggered in the middle of the critical section, and it may deadlock if
it tries to take the same lock. This patch fixes such a scenario by
using spin_lock_bh to disable softirqs on that CPU while inside the
critical section.
Fixes: 8d94b590f1 ("net/mlx5e: Turn XSK ICOSQ into a general asynchronous one")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
In functions mlx5e_route_lookup_ipv{4|6}() route_dev can be arbitrary net
device and not necessary mlx5 eswitch port representor. As such, in order
to ensure that route_dev is not destroyed concurrent the code needs either
explicitly take reference to the device before releasing reference to
rtable instance or ensure that caller holds rtnl lock. First approach is
chosen as a fix since rtnl lock dependency was intentionally removed from
mlx5 TC layer.
To prevent unprotected usage of route_dev in encap code take a reference to
the device before releasing rt. Don't save direct pointer to the device in
mlx5_encap_entry structure and use ifindex instead. Modify users of
route_dev pointer to properly obtain the net device instance from its
ifindex.
Fixes: 61086f3910 ("net/mlx5e: Protect encap hash table with mutex")
Fixes: 6707f74be8 ("net/mlx5e: Update hw flows when encap source mac changed")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Modify header actions are allocated during parse tc actions and only
freed during the flow creation, however, on error flow the allocated
memory is wrongly unfreed.
Fix this by calling dealloc_mod_hdr_actions in __mlx5e_add_fdb_flow
and mlx5e_add_nic_flow error flow.
Fixes: d7e75a325c ("net/mlx5e: Add offloading of E-Switch TC pedit (header re-write) actions")
Fixes: 2f4fe4cab0 ("net/mlx5e: Add offloading of NIC TC pedit (header re-write) actions")
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Current release regressions:
- r8169: fix forced threading conflicting with other shared
interrupts; we tried to fix the use of raise_softirq_irqoff
from an IRQ handler on RT by forcing hard irqs, but this
driver shares legacy PCI IRQs so drop the _irqoff() instead
- tipc: fix memory leak caused by a recent syzbot report fix
to tipc_buf_append()
Current release - bugs in new features:
- devlink: Unlock on error in dumpit() and fix some error codes
- net/smc: fix null pointer dereference in smc_listen_decline()
Previous release - regressions:
- tcp: Prevent low rmem stalls with SO_RCVLOWAT.
- net: protect tcf_block_unbind with block lock
- ibmveth: Fix use of ibmveth in a bridge; the self-imposed filtering
to only send legal frames to the hypervisor was too strict
- net: hns3: Clear the CMDQ registers before unmapping BAR region;
incorrect cleanup order was leading to a crash
- bnxt_en - handful of fixes to fixes:
- Send HWRM_FUNC_RESET fw command unconditionally, even
if there are PCIe errors being reported
- Check abort error state in bnxt_open_nic().
- Invoke cancel_delayed_work_sync() for PFs also.
- Fix regression in workqueue cleanup logic in bnxt_remove_one().
- mlxsw: Only advertise link modes supported by both driver
and device, after removal of 56G support from the driver
56G was not cleared from advertised modes
- net/smc: fix suppressed return code
Previous release - always broken:
- netem: fix zero division in tabledist, caused by integer overflow
- bnxt_en: Re-write PCI BARs after PCI fatal error.
- cxgb4: set up filter action after rewrites
- net: ipa: command payloads already mapped
Misc:
- s390/ism: fix incorrect system EID, it's okay to change since
it was added in current release
- vsock: use ns_capable_noaudit() on socket create to suppress
false positive audit messages
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAl+bGTcACgkQMUZtbf5S
IrtMvxAAldlA7x22atOHJ2HMTqUGK3rlIQYgxlWJbfDnA7Ui4rZTDa/K0VkuS4ey
rfaBf37XLDmzZkHgYvXG1qV2kB0MrXQqF7jJn+BNlAuM1kIsURt85Y2FxVu/+x6X
wWtBgg/D77VXpeMimGcp8wBg5xFlUDdTezo+tInSuY9ahi1dUQx3ZSBTgqz3a5Vn
wUwD7U0wkBEHkZFeLE6u0tdN9wY8IHH6cbMfzfnPxxIv6VVUOcQcvbomc+reEPhH
vxeCHg7tK3yxbe9cPEbuwVDpoapB8Y627rv08Njhfuxx6Yysp/OOvUNRIBeD/7Gi
TiZc6RMQ9XZ9QoGueaxFVSFIGRpRIQiO/gh+O5lWVX8dGsIjlKnw2E8gWmSS48YP
cMAez0Fe+CJ2S2QNFbGVyJJX6xOl5h6kQaf88OiEhudpEUgyz156MNVwbJnE4fYk
8GONCIea1hNjLQ1VUfcQEYdxChWVeAoUEZIFcK2YKA+1w9Ris6hV21j/aUxYXQRt
RGOALFUtCRIEX28ZW8eEyXgp1EdUvp7qcIK5YZEF6YHWlRxQ8LkU6qhD7Mm2oqkE
fydoMDz9TEBaWqFtpgQmZH76JYqd7btCsR2YPwnlKmcKQ3tEKtW0NKt1QH/DKcvm
nmDA6A+52XSbar1sRlVPnr3IGfodqGQ3A35sVFS8jkcmMvDRlbk=
=reLi
-----END PGP SIGNATURE-----
Merge tag 'net-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Current release regressions:
- r8169: fix forced threading conflicting with other shared
interrupts; we tried to fix the use of raise_softirq_irqoff from an
IRQ handler on RT by forcing hard irqs, but this driver shares
legacy PCI IRQs so drop the _irqoff() instead
- tipc: fix memory leak caused by a recent syzbot report fix to
tipc_buf_append()
Current release - bugs in new features:
- devlink: Unlock on error in dumpit() and fix some error codes
- net/smc: fix null pointer dereference in smc_listen_decline()
Previous release - regressions:
- tcp: Prevent low rmem stalls with SO_RCVLOWAT.
- net: protect tcf_block_unbind with block lock
- ibmveth: Fix use of ibmveth in a bridge; the self-imposed filtering
to only send legal frames to the hypervisor was too strict
- net: hns3: Clear the CMDQ registers before unmapping BAR region;
incorrect cleanup order was leading to a crash
- bnxt_en - handful of fixes to fixes:
- Send HWRM_FUNC_RESET fw command unconditionally, even if there
are PCIe errors being reported
- Check abort error state in bnxt_open_nic().
- Invoke cancel_delayed_work_sync() for PFs also.
- Fix regression in workqueue cleanup logic in bnxt_remove_one().
- mlxsw: Only advertise link modes supported by both driver and
device, after removal of 56G support from the driver 56G was not
cleared from advertised modes
- net/smc: fix suppressed return code
Previous release - always broken:
- netem: fix zero division in tabledist, caused by integer overflow
- bnxt_en: Re-write PCI BARs after PCI fatal error.
- cxgb4: set up filter action after rewrites
- net: ipa: command payloads already mapped
Misc:
- s390/ism: fix incorrect system EID, it's okay to change since it
was added in current release
- vsock: use ns_capable_noaudit() on socket create to suppress false
positive audit messages"
* tag 'net-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (36 commits)
r8169: fix issue with forced threading in combination with shared interrupts
netem: fix zero division in tabledist
ibmvnic: fix ibmvnic_set_mac
mptcp: add missing memory scheduling in the rx path
tipc: fix memory leak caused by tipc_buf_append()
gtp: fix an use-before-init in gtp_newlink()
net: protect tcf_block_unbind with block lock
ibmveth: Fix use of ibmveth in a bridge.
net/sched: act_mpls: Add softdep on mpls_gso.ko
ravb: Fix bit fields checking in ravb_hwtstamp_get()
devlink: Unlock on error in dumpit()
devlink: Fix some error codes
chelsio/chtls: fix memory leaks in CPL handlers
chelsio/chtls: fix deadlock issue
net: hns3: Clear the CMDQ registers before unmapping BAR region
bnxt_en: Send HWRM_FUNC_RESET fw command unconditionally.
bnxt_en: Check abort error state in bnxt_open_nic().
bnxt_en: Re-write PCI BARs after PCI fatal error.
bnxt_en: Invoke cancel_delayed_work_sync() for PFs also.
bnxt_en: Fix regression in workqueue cleanup logic in bnxt_remove_one().
...
Each EMAD transaction stores the skb used to issue the EMAD request
('trans->tx_skb') so that the request could be retried in case of a
timeout. The skb can be freed when a corresponding response is received
or as part of the retry logic (e.g., failed retransmit, exceeded maximum
number of retries).
The two tasks (i.e., response processing and retransmits) are
synchronized by the atomic 'trans->active' field which ensures that
responses to inactive transactions are ignored.
In case of a failed retransmit the transaction is finished and all of
its resources are freed. However, the current code does not mark it as
inactive. Syzkaller was able to hit a race condition in which a
concurrent response is processed while the transaction's resources are
being freed, resulting in a use-after-free [1].
Fix the issue by making sure to mark the transaction as inactive after a
failed retransmit and free its resources only if a concurrent task did
not already do that.
[1]
BUG: KASAN: use-after-free in consume_skb+0x30/0x370
net/core/skbuff.c:833
Read of size 4 at addr ffff88804f570494 by task syz-executor.0/1004
CPU: 0 PID: 1004 Comm: syz-executor.0 Not tainted 5.8.0-rc7+ #68
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0xf6/0x16e lib/dump_stack.c:118
print_address_description.constprop.0+0x1c/0x250
mm/kasan/report.c:383
__kasan_report mm/kasan/report.c:513 [inline]
kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530
check_memory_region_inline mm/kasan/generic.c:186 [inline]
check_memory_region+0x14e/0x1b0 mm/kasan/generic.c:192
instrument_atomic_read include/linux/instrumented.h:56 [inline]
atomic_read include/asm-generic/atomic-instrumented.h:27 [inline]
refcount_read include/linux/refcount.h:147 [inline]
skb_unref include/linux/skbuff.h:1044 [inline]
consume_skb+0x30/0x370 net/core/skbuff.c:833
mlxsw_emad_trans_finish+0x64/0x1c0 drivers/net/ethernet/mellanox/mlxsw/core.c:592
mlxsw_emad_process_response drivers/net/ethernet/mellanox/mlxsw/core.c:651 [inline]
mlxsw_emad_rx_listener_func+0x5c9/0xac0 drivers/net/ethernet/mellanox/mlxsw/core.c:672
mlxsw_core_skb_receive+0x4df/0x770 drivers/net/ethernet/mellanox/mlxsw/core.c:2063
mlxsw_pci_cqe_rdq_handle drivers/net/ethernet/mellanox/mlxsw/pci.c:595 [inline]
mlxsw_pci_cq_tasklet+0x12a6/0x2520 drivers/net/ethernet/mellanox/mlxsw/pci.c:651
tasklet_action_common.isra.0+0x13f/0x3e0 kernel/softirq.c:550
__do_softirq+0x223/0x964 kernel/softirq.c:292
asm_call_on_stack+0x12/0x20 arch/x86/entry/entry_64.S:711
Allocated by task 1006:
save_stack+0x1b/0x40 mm/kasan/common.c:48
set_track mm/kasan/common.c:56 [inline]
__kasan_kmalloc mm/kasan/common.c:494 [inline]
__kasan_kmalloc.constprop.0+0xc2/0xd0 mm/kasan/common.c:467
slab_post_alloc_hook mm/slab.h:586 [inline]
slab_alloc_node mm/slub.c:2824 [inline]
slab_alloc mm/slub.c:2832 [inline]
kmem_cache_alloc+0xcd/0x2e0 mm/slub.c:2837
__build_skb+0x21/0x60 net/core/skbuff.c:311
__netdev_alloc_skb+0x1e2/0x360 net/core/skbuff.c:464
netdev_alloc_skb include/linux/skbuff.h:2810 [inline]
mlxsw_emad_alloc drivers/net/ethernet/mellanox/mlxsw/core.c:756 [inline]
mlxsw_emad_reg_access drivers/net/ethernet/mellanox/mlxsw/core.c:787 [inline]
mlxsw_core_reg_access_emad+0x1ab/0x1420 drivers/net/ethernet/mellanox/mlxsw/core.c:1817
mlxsw_reg_trans_query+0x39/0x50 drivers/net/ethernet/mellanox/mlxsw/core.c:1831
mlxsw_sp_sb_pm_occ_clear drivers/net/ethernet/mellanox/mlxsw/spectrum_buffers.c:260 [inline]
mlxsw_sp_sb_occ_max_clear+0xbff/0x10a0 drivers/net/ethernet/mellanox/mlxsw/spectrum_buffers.c:1365
mlxsw_devlink_sb_occ_max_clear+0x76/0xb0 drivers/net/ethernet/mellanox/mlxsw/core.c:1037
devlink_nl_cmd_sb_occ_max_clear_doit+0x1ec/0x280 net/core/devlink.c:1765
genl_family_rcv_msg_doit net/netlink/genetlink.c:669 [inline]
genl_family_rcv_msg net/netlink/genetlink.c:714 [inline]
genl_rcv_msg+0x617/0x980 net/netlink/genetlink.c:731
netlink_rcv_skb+0x152/0x440 net/netlink/af_netlink.c:2470
genl_rcv+0x24/0x40 net/netlink/genetlink.c:742
netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline]
netlink_unicast+0x53a/0x750 net/netlink/af_netlink.c:1330
netlink_sendmsg+0x850/0xd90 net/netlink/af_netlink.c:1919
sock_sendmsg_nosec net/socket.c:651 [inline]
sock_sendmsg+0x150/0x190 net/socket.c:671
____sys_sendmsg+0x6d8/0x840 net/socket.c:2359
___sys_sendmsg+0xff/0x170 net/socket.c:2413
__sys_sendmsg+0xe5/0x1b0 net/socket.c:2446
do_syscall_64+0x56/0xa0 arch/x86/entry/common.c:384
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Freed by task 73:
save_stack+0x1b/0x40 mm/kasan/common.c:48
set_track mm/kasan/common.c:56 [inline]
kasan_set_free_info mm/kasan/common.c:316 [inline]
__kasan_slab_free+0x12c/0x170 mm/kasan/common.c:455
slab_free_hook mm/slub.c:1474 [inline]
slab_free_freelist_hook mm/slub.c:1507 [inline]
slab_free mm/slub.c:3072 [inline]
kmem_cache_free+0xbe/0x380 mm/slub.c:3088
kfree_skbmem net/core/skbuff.c:622 [inline]
kfree_skbmem+0xef/0x1b0 net/core/skbuff.c:616
__kfree_skb net/core/skbuff.c:679 [inline]
consume_skb net/core/skbuff.c:837 [inline]
consume_skb+0xe1/0x370 net/core/skbuff.c:831
mlxsw_emad_trans_finish+0x64/0x1c0 drivers/net/ethernet/mellanox/mlxsw/core.c:592
mlxsw_emad_transmit_retry.isra.0+0x9d/0xc0 drivers/net/ethernet/mellanox/mlxsw/core.c:613
mlxsw_emad_trans_timeout_work+0x43/0x50 drivers/net/ethernet/mellanox/mlxsw/core.c:625
process_one_work+0xa3e/0x17a0 kernel/workqueue.c:2269
worker_thread+0x9e/0x1050 kernel/workqueue.c:2415
kthread+0x355/0x470 kernel/kthread.c:291
ret_from_fork+0x22/0x30 arch/x86/entry/entry_64.S:293
The buggy address belongs to the object at ffff88804f5703c0
which belongs to the cache skbuff_head_cache of size 224
The buggy address is located 212 bytes inside of
224-byte region [ffff88804f5703c0, ffff88804f5704a0)
The buggy address belongs to the page:
page:ffffea00013d5c00 refcount:1 mapcount:0 mapping:0000000000000000
index:0x0
flags: 0x100000000000200(slab)
raw: 0100000000000200 dead000000000100 dead000000000122 ffff88806c625400
raw: 0000000000000000 00000000000c000c 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff88804f570380: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
ffff88804f570400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88804f570480: fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc
^
ffff88804f570500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff88804f570580: 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc
Fixes: caf7297e7a ("mlxsw: core: Introduce support for asynchronous EMAD register access")
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
During port creation the driver instructs the device to advertise all
the supported link modes queried from the device.
Since cited commit not all the link modes supported by the device are
supported by the driver. This can result in the device negotiating a
link mode that is not recognized by the driver causing ethtool to show
an unsupported speed:
$ ethtool swp1
...
Speed: Unknown!
This is especially problematic when the netdev is enslaved to a bond, as
the bond driver uses unknown speed as an indication that the link is
down:
[13048.900895] net_ratelimit: 86 callbacks suppressed
[13048.900902] t_bond0: (slave swp52): failed to get link speed/duplex
[13048.912160] t_bond0: (slave swp49): failed to get link speed/duplex
Fix this by making sure that only link modes that are supported by both
the device and the driver are advertised.
Fixes: b97cd89126 ("mlxsw: Remove 56G speed support")
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When a mlx5 core devlink instance is reloaded in different net namespace,
its associated IB device is deleted and recreated.
Example sequence is:
$ ip netns add foo
$ devlink dev reload pci/0000:00:08.0 netns foo
$ ip netns del foo
mlx5 IB device needs to attach and detach the netdevice to it through the
netdev notifier chain during load and unload sequence. A below call graph
of the unload flow.
cleanup_net()
down_read(&pernet_ops_rwsem); <- first sem acquired
ops_pre_exit_list()
pre_exit()
devlink_pernet_pre_exit()
devlink_reload()
mlx5_devlink_reload_down()
mlx5_unload_one()
[...]
mlx5_ib_remove()
mlx5_ib_unbind_slave_port()
mlx5_remove_netdev_notifier()
unregister_netdevice_notifier()
down_write(&pernet_ops_rwsem);<- recurrsive lock
Hence, when net namespace is deleted, mlx5 reload results in deadlock.
When deadlock occurs, devlink mutex is also held. This not only deadlocks
the mlx5 device under reload, but all the processes which attempt to
access unrelated devlink devices are deadlocked.
Hence, fix this by mlx5 ib driver to register for per net netdev notifier
instead of global one, which operats on the net namespace without holding
the pernet_ops_rwsem.
Fixes: 4383cfcc65 ("net/mlx5: Add devlink reload")
Link: https://lore.kernel.org/r/20201026134359.23150-1-parav@nvidia.com
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The typical set of driver updates across the subsystem:
- Driver minor changes and bug fixes for mlx5, efa, rxe, vmw_pvrdma, hns,
usnic, qib, qedr, cxgb4, hns, bnxt_re
- Various rtrs fixes and updates
- Bug fix for mlx4 CM emulation for virtualization scenarios where MRA
wasn't working right
- Use tracepoints instead of pr_debug in the CM code
- Scrub the locking in ucma and cma to close more syzkaller bugs
- Use tasklet_setup in the subsystem
- Revert the idea that 'destroy' operations are not allowed to fail at
the driver level. This proved unworkable from a HW perspective.
- Revise how the umem API works so drivers make fewer mistakes using it
- XRC support for qedr
- Convert uverbs objects RWQ and MW to new the allocation scheme
- Large queue entry sizes for hns
- Use hmm_range_fault() for mlx5 On Demand Paging
- uverbs APIs to inspect the GID table instead of sysfs
- Move some of the RDMA code for building large page SGLs into
lib/scatterlist
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAl+J37MACgkQOG33FX4g
mxrKfRAAnIecwdE8df0yvVU5k0Eg6qVjMy9MMHq4va9m7g6GpUcNNI0nIlOASxH2
l+9vnUQS3ebgsPeECaDYzEr0hh/u53+xw2g4WV5ts/hE8KkQ6erruXb9kasCe8yi
5QWJ9K36T3c03Cd3EeH6JVtytAxuH42ombfo9BkFLPVyfG/R2tsAzvm5pVi73lxk
46wtU1Bqi4tsLhyCbifn1huNFGbHp08OIBPAIKPUKCA+iBRPaWS+Dpi+93h3g3Bp
oJwDhL9CBCGcHM+rKWLzek3Dy87FnQn7R1wmTpUFwkK+4AH3U/XazivhX035w1vL
YJyhakVU0kosHlX9hJTNKDHJGkt0YEV2mS8dxAuqilFBtdnrVszb5/MirvlzC310
/b5xCPSEusv9UVZV0G4zbySVNA9knZ4YaRiR3VDVMLKl/pJgTOwEiHIIx+vs3ejk
p8GRWa1SjXw5LfZEQcq39J689ljt6xjCTonyuBSv7vSQq5v8pjBxvHxiAe2FIa2a
ZyZeSCYoSh0SwJQukO2VO7aprhHP3TcCJ/987+X03LQ8tV2VWPktHqm62YCaDcOl
fgiQuQdPivRjDDkJgMfDWDGKfZeHoWLKl5XsJhWByt0lablVrsvc+8ylUl1UI7gI
16hWB/Qtlhfwg10VdApn+aOFpIS+s5P4XIp8ik57MZO+VeJzpmE=
=LKpl
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma updates from Jason Gunthorpe:
"A usual cycle for RDMA with a typical mix of driver and core subsystem
updates:
- Driver minor changes and bug fixes for mlx5, efa, rxe, vmw_pvrdma,
hns, usnic, qib, qedr, cxgb4, hns, bnxt_re
- Various rtrs fixes and updates
- Bug fix for mlx4 CM emulation for virtualization scenarios where
MRA wasn't working right
- Use tracepoints instead of pr_debug in the CM code
- Scrub the locking in ucma and cma to close more syzkaller bugs
- Use tasklet_setup in the subsystem
- Revert the idea that 'destroy' operations are not allowed to fail
at the driver level. This proved unworkable from a HW perspective.
- Revise how the umem API works so drivers make fewer mistakes using
it
- XRC support for qedr
- Convert uverbs objects RWQ and MW to new the allocation scheme
- Large queue entry sizes for hns
- Use hmm_range_fault() for mlx5 On Demand Paging
- uverbs APIs to inspect the GID table instead of sysfs
- Move some of the RDMA code for building large page SGLs into
lib/scatterlist"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (191 commits)
RDMA/ucma: Fix use after free in destroy id flow
RDMA/rxe: Handle skb_clone() failure in rxe_recv.c
RDMA/rxe: Move the definitions for rxe_av.network_type to uAPI
RDMA: Explicitly pass in the dma_device to ib_register_device
lib/scatterlist: Do not limit max_segment to PAGE_ALIGNED values
IB/mlx4: Convert rej_tmout radix-tree to XArray
RDMA/rxe: Fix bug rejecting all multicast packets
RDMA/rxe: Fix skb lifetime in rxe_rcv_mcast_pkt()
RDMA/rxe: Remove duplicate entries in struct rxe_mr
IB/hfi,rdmavt,qib,opa_vnic: Update MAINTAINERS
IB/rdmavt: Fix sizeof mismatch
MAINTAINERS: CISCO VIC LOW LATENCY NIC DRIVER
RDMA/bnxt_re: Fix sizeof mismatch for allocation of pbl_tbl.
RDMA/bnxt_re: Use rdma_umem_for_each_dma_block()
RDMA/umem: Move to allocate SG table from pages
lib/scatterlist: Add support in dynamic allocation of SG table from pages
tools/testing/scatterlist: Show errors in human readable form
tools/testing/scatterlist: Rejuvenate bit-rotten test
RDMA/ipoib: Set rtnl_link_ops for ipoib interfaces
RDMA/uverbs: Expose the new GID query API to user space
...
From Maor Gottlieb says:
====================
This series extends __sg_alloc_table_from_pages to allow chaining of new
pages to an already initialized SG table.
This allows for drivers to utilize the optimization of merging contiguous
pages without a need to pre allocate all the pages and hold them in a very
large temporary buffer prior to the call to SG table initialization.
The last patch changes the Infiniband core to use the new API. It removes
duplicate functionality from the code and benefits from the optimization
of allocating dynamic SG table from pages.
In huge pages system of 2MB page size, without this change, the SG table
would contain x512 SG entries.
====================
* branch 'dynamic_sg':
RDMA/umem: Move to allocate SG table from pages
lib/scatterlist: Add support in dynamic allocation of SG table from pages
tools/testing/scatterlist: Show errors in human readable form
tools/testing/scatterlist: Rejuvenate bit-rotten test
Add redirect_neigh() BPF packet redirect helper, allowing to limit stack
traversal in common container configs and improving TCP back-pressure.
Daniel reports ~10Gbps => ~15Gbps single stream TCP performance gain.
Expand netlink policy support and improve policy export to user space.
(Ge)netlink core performs request validation according to declared
policies. Expand the expressiveness of those policies (min/max length
and bitmasks). Allow dumping policies for particular commands.
This is used for feature discovery by user space (instead of kernel
version parsing or trial and error).
Support IGMPv3/MLDv2 multicast listener discovery protocols in bridge.
Allow more than 255 IPv4 multicast interfaces.
Add support for Type of Service (ToS) reflection in SYN/SYN-ACK
packets of TCPv6.
In Multi-patch TCP (MPTCP) support concurrent transmission of data
on multiple subflows in a load balancing scenario. Enhance advertising
addresses via the RM_ADDR/ADD_ADDR options.
Support SMC-Dv2 version of SMC, which enables multi-subnet deployments.
Allow more calls to same peer in RxRPC.
Support two new Controller Area Network (CAN) protocols -
CAN-FD and ISO 15765-2:2016.
Add xfrm/IPsec compat layer, solving the 32bit user space on 64bit
kernel problem.
Add TC actions for implementing MPLS L2 VPNs.
Improve nexthop code - e.g. handle various corner cases when nexthop
objects are removed from groups better, skip unnecessary notifications
and make it easier to offload nexthops into HW by converting
to a blocking notifier.
Support adding and consuming TCP header options by BPF programs,
opening the doors for easy experimental and deployment-specific
TCP option use.
Reorganize TCP congestion control (CC) initialization to simplify life
of TCP CC implemented in BPF.
Add support for shipping BPF programs with the kernel and loading them
early on boot via the User Mode Driver mechanism, hence reusing all the
user space infra we have.
Support sleepable BPF programs, initially targeting LSM and tracing.
Add bpf_d_path() helper for returning full path for given 'struct path'.
Make bpf_tail_call compatible with bpf-to-bpf calls.
Allow BPF programs to call map_update_elem on sockmaps.
Add BPF Type Format (BTF) support for type and enum discovery, as
well as support for using BTF within the kernel itself (current use
is for pretty printing structures).
Support listing and getting information about bpf_links via the bpf
syscall.
Enhance kernel interfaces around NIC firmware update. Allow specifying
overwrite mask to control if settings etc. are reset during update;
report expected max time operation may take to users; support firmware
activation without machine reboot incl. limits of how much impact
reset may have (e.g. dropping link or not).
Extend ethtool configuration interface to report IEEE-standard
counters, to limit the need for per-vendor logic in user space.
Adopt or extend devlink use for debug, monitoring, fw update
in many drivers (dsa loop, ice, ionic, sja1105, qed, mlxsw,
mv88e6xxx, dpaa2-eth).
In mlxsw expose critical and emergency SFP module temperature alarms.
Refactor port buffer handling to make the defaults more suitable and
support setting these values explicitly via the DCBNL interface.
Add XDP support for Intel's igb driver.
Support offloading TC flower classification and filtering rules to
mscc_ocelot switches.
Add PTP support for Marvell Octeontx2 and PP2.2 hardware, as well as
fixed interval period pulse generator and one-step timestamping in
dpaa-eth.
Add support for various auth offloads in WiFi APs, e.g. SAE (WPA3)
offload.
Add Lynx PHY/PCS MDIO module, and convert various drivers which have
this HW to use it. Convert mvpp2 to split PCS.
Support Marvell Prestera 98DX3255 24-port switch ASICs, as well as
7-port Mediatek MT7531 IP.
Add initial support for QCA6390 and IPQ6018 in ath11k WiFi driver,
and wcn3680 support in wcn36xx.
Improve performance for packets which don't require much offloads
on recent Mellanox NICs by 20% by making multiple packets share
a descriptor entry.
Move chelsio inline crypto drivers (for TLS and IPsec) from the crypto
subtree to drivers/net. Move MDIO drivers out of the phy directory.
Clean up a lot of W=1 warnings, reportedly the actively developed
subsections of networking drivers should now build W=1 warning free.
Make sure drivers don't use in_interrupt() to dynamically adapt their
code. Convert tasklets to use new tasklet_setup API (sadly this
conversion is not yet complete).
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAl+ItRwACgkQMUZtbf5S
IrtTMg//UxpdR/MirT1DatBU0K/UGAZY82hV7F/UC8tPgjfHZeHvWlDFxfi3YP81
PtPKbhRZ7DhwBXefUp6nY3UdvjftrJK2lJm8prJUPSsZRye8Wlcb7y65q7/P2y2U
Efucyopg6RUrmrM0DUsIGYGJgylQLHnMYUl/keCsD4t5Bp4ksyi9R2t5eitGoWzh
r3QGdbSa0AuWx4iu0i+tqp6Tj0ekMBMXLVb35dtU1t0joj2KTNEnSgABN3prOa8E
iWYf2erOau68Ogp3yU3miCy0ZU4p/7qGHTtzbcp677692P/ekak6+zmfHLT9/Pjy
2Stq2z6GoKuVxdktr91D9pA3jxG4LxSJmr0TImcGnXbvkMP3Ez3g9RrpV5fn8j6F
mZCH8TKZAoD5aJrAJAMkhZmLYE1pvDa7KolSk8WogXrbCnTEb5Nv8FHTS1Qnk3yl
wSKXuvutFVNLMEHCnWQLtODbTST9DI/aOi6EctPpuOA/ZyL1v3pl+gfp37S+LUTe
owMnT/7TdvKaTD0+gIyU53M6rAWTtr5YyRQorX9awIu/4Ha0F0gYD7BJZQUGtegp
HzKt59NiSrFdbSH7UdyemdBF4LuCgIhS7rgfeoUXMXmuPHq7eHXyHZt5dzPPa/xP
81P0MAvdpFVwg8ij2yp2sHS7sISIRKq17fd1tIewUabxQbjXqPc=
=bc1U
-----END PGP SIGNATURE-----
Merge tag 'net-next-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
- Add redirect_neigh() BPF packet redirect helper, allowing to limit
stack traversal in common container configs and improving TCP
back-pressure.
Daniel reports ~10Gbps => ~15Gbps single stream TCP performance gain.
- Expand netlink policy support and improve policy export to user
space. (Ge)netlink core performs request validation according to
declared policies. Expand the expressiveness of those policies
(min/max length and bitmasks). Allow dumping policies for particular
commands. This is used for feature discovery by user space (instead
of kernel version parsing or trial and error).
- Support IGMPv3/MLDv2 multicast listener discovery protocols in
bridge.
- Allow more than 255 IPv4 multicast interfaces.
- Add support for Type of Service (ToS) reflection in SYN/SYN-ACK
packets of TCPv6.
- In Multi-patch TCP (MPTCP) support concurrent transmission of data on
multiple subflows in a load balancing scenario. Enhance advertising
addresses via the RM_ADDR/ADD_ADDR options.
- Support SMC-Dv2 version of SMC, which enables multi-subnet
deployments.
- Allow more calls to same peer in RxRPC.
- Support two new Controller Area Network (CAN) protocols - CAN-FD and
ISO 15765-2:2016.
- Add xfrm/IPsec compat layer, solving the 32bit user space on 64bit
kernel problem.
- Add TC actions for implementing MPLS L2 VPNs.
- Improve nexthop code - e.g. handle various corner cases when nexthop
objects are removed from groups better, skip unnecessary
notifications and make it easier to offload nexthops into HW by
converting to a blocking notifier.
- Support adding and consuming TCP header options by BPF programs,
opening the doors for easy experimental and deployment-specific TCP
option use.
- Reorganize TCP congestion control (CC) initialization to simplify
life of TCP CC implemented in BPF.
- Add support for shipping BPF programs with the kernel and loading
them early on boot via the User Mode Driver mechanism, hence reusing
all the user space infra we have.
- Support sleepable BPF programs, initially targeting LSM and tracing.
- Add bpf_d_path() helper for returning full path for given 'struct
path'.
- Make bpf_tail_call compatible with bpf-to-bpf calls.
- Allow BPF programs to call map_update_elem on sockmaps.
- Add BPF Type Format (BTF) support for type and enum discovery, as
well as support for using BTF within the kernel itself (current use
is for pretty printing structures).
- Support listing and getting information about bpf_links via the bpf
syscall.
- Enhance kernel interfaces around NIC firmware update. Allow
specifying overwrite mask to control if settings etc. are reset
during update; report expected max time operation may take to users;
support firmware activation without machine reboot incl. limits of
how much impact reset may have (e.g. dropping link or not).
- Extend ethtool configuration interface to report IEEE-standard
counters, to limit the need for per-vendor logic in user space.
- Adopt or extend devlink use for debug, monitoring, fw update in many
drivers (dsa loop, ice, ionic, sja1105, qed, mlxsw, mv88e6xxx,
dpaa2-eth).
- In mlxsw expose critical and emergency SFP module temperature alarms.
Refactor port buffer handling to make the defaults more suitable and
support setting these values explicitly via the DCBNL interface.
- Add XDP support for Intel's igb driver.
- Support offloading TC flower classification and filtering rules to
mscc_ocelot switches.
- Add PTP support for Marvell Octeontx2 and PP2.2 hardware, as well as
fixed interval period pulse generator and one-step timestamping in
dpaa-eth.
- Add support for various auth offloads in WiFi APs, e.g. SAE (WPA3)
offload.
- Add Lynx PHY/PCS MDIO module, and convert various drivers which have
this HW to use it. Convert mvpp2 to split PCS.
- Support Marvell Prestera 98DX3255 24-port switch ASICs, as well as
7-port Mediatek MT7531 IP.
- Add initial support for QCA6390 and IPQ6018 in ath11k WiFi driver,
and wcn3680 support in wcn36xx.
- Improve performance for packets which don't require much offloads on
recent Mellanox NICs by 20% by making multiple packets share a
descriptor entry.
- Move chelsio inline crypto drivers (for TLS and IPsec) from the
crypto subtree to drivers/net. Move MDIO drivers out of the phy
directory.
- Clean up a lot of W=1 warnings, reportedly the actively developed
subsections of networking drivers should now build W=1 warning free.
- Make sure drivers don't use in_interrupt() to dynamically adapt their
code. Convert tasklets to use new tasklet_setup API (sadly this
conversion is not yet complete).
* tag 'net-next-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2583 commits)
Revert "bpfilter: Fix build error with CONFIG_BPFILTER_UMH"
net, sockmap: Don't call bpf_prog_put() on NULL pointer
bpf, selftest: Fix flaky tcp_hdr_options test when adding addr to lo
bpf, sockmap: Add locking annotations to iterator
netfilter: nftables: allow re-computing sctp CRC-32C in 'payload' statements
net: fix pos incrementment in ipv6_route_seq_next
net/smc: fix invalid return code in smcd_new_buf_create()
net/smc: fix valid DMBE buffer sizes
net/smc: fix use-after-free of delayed events
bpfilter: Fix build error with CONFIG_BPFILTER_UMH
cxgb4/ch_ipsec: Replace the module name to ch_ipsec from chcr
net: sched: Fix suspicious RCU usage while accessing tcf_tunnel_info
bpf: Fix register equivalence tracking.
rxrpc: Fix loss of final ack on shutdown
rxrpc: Fix bundle counting for exclusive connections
netfilter: restore NF_INET_NUMHOOKS
ibmveth: Identify ingress large send packets.
ibmveth: Switch order of ibmveth_helper calls.
cxgb4: handle 4-tuple PEDIT to NAT mode translation
selftests: Add VRF route leaking tests
...
Minor conflicts in net/mptcp/protocol.h and
tools/testing/selftests/net/Makefile.
In both cases code was added on both sides in the same place
so just keep both.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Here are some SPDX-specific changes for 5.10-rc1.
They include:
- driver fixes to make spdxcheck.pl work properly
- add GFDL licenses as "deprecated" but required due to some of
our documentation using them
- add Zlib license as "deprecated" but required because we have
code with this license in the tree.
- convert some drivers to have SPDX identifiers that previously
didn't have them.
All have been in linux-next for a very long time with no reported
issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCX4c6oA8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+yl35ACg2i+pP5CBExSzQUtA1Tx/UD2CVNMAoIAQChwj
SHZurDuyHkEiCdB+5n1u
=C9qR
-----END PGP SIGNATURE-----
Merge tag 'spdx-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx
Pull SPDX updates from Greg KH:
"Here are some SPDX-specific changes for 5.10-rc1.
They include:
- driver fixes to make spdxcheck.pl work properly
- add GFDL licenses as "deprecated" but required due to some of our
documentation using them
- add Zlib license as "deprecated" but required because we have code
with this license in the tree.
- convert some drivers to have SPDX identifiers that previously
didn't have them.
All have been in linux-next for a very long time with no reported
issues"
* tag 'spdx-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx:
scripts/spdxcheck.py: handle license identifiers in XML comments
net/mlx5: IPsec: make spdxcheck.py happy
LICENSES/deprecated: add Zlib license text
LICENSE: add GFDL deprecated licenses
net/qla3xxx: Convert to SPDX license identifiers
net/qlge: Convert to SPDX license identifiers
net/qlcnic: Convert to SPDX license identifiers
scsi/qla2xxx: Convert to SPDX license identifiers
scsi/qla4xxx: Convert to SPDX license identifiers
Add new FTE in TX IPsec FT per IPsec state. It has the
same matching criteria as the RX steering rule.
The IPsec FT is created/destroyed when the first/last rule
is added/deleted respectively.
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Boris Pismenny <borisp@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Add new namespace that represents the NIC TX domain.
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Raed Salem <raeds@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Currently the error exit path err_free kfree's attr. In the case where
flow and parse_attr failed to be allocated this return path will free
the uninitialized pointer attr, which is not correct. In the other
case where attr fails to allocate attr does not need to be freed. So
in both error exits via err_free attr should not be freed, so remove
it.
Addresses-Coverity: ("Uninitialized pointer read")
Fixes: ff7ea04ad5 ("net/mlx5e: Fix potential null pointer dereference")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
netcons calls napi_poll with a budget of 0 to transmit packets.
Handle this by:
- skipping RX processing
- do not try to recycle TX packets to the RX cache
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add support for devlink reload action fw_activate with reload limit
no_reset which does firmware live patching, updating the firmware image
without reset, no downtime and no configuration lose. The driver checks
if the firmware is capable of handling the pending firmware changes as a
live patch. If it is then it triggers firmware live patching flow.
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Firmware live patch event notifies the driver that the firmware was just
updated using live patch. In such case the driver should not reload or
re-initiate entities, part to updating the firmware version and
re-initiate the firmware tracer which can be updated by live patch with
new strings database to help debugging an issue.
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The enable_remote_dev_reset devlink param flags that the host admin
allows resets by other hosts. In case it is cleared mlx5 host PF driver
will send NACK on pci sync for firmware update reset request and the
command will fail.
By default enable_remote_dev_reset parameter is true, so pci sync for
firmware update reset is enabled.
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add support for devlink reload action fw_activate. To activate firmware
image the mlx5 driver resets the firmware and reloads it from flash. If
a new image was stored on flash it will be loaded. Once this reload
command is executed the driver initiates fw sync reset flow, where the
firmware synchronizes all PFs on coming reset and driver reload.
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If firmware sends sync_reset_abort to driver the driver should clear the
reset requested mode as reset is not expected any more.
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
On sync_reset_now event the driver does reload and PCI link toggle to
activate firmware upgrade reset. When the firmware sends this event it
syncs the event on all PFs, so all PFs will do PCI link toggle at once.
To do PCI link toggle, the driver ensures that no other device ID under
the same bridge by checking that all the PF functions under the same PCI
bridge have same device ID. If no other device it uses PCI bridge link
control to turn link down and up.
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Once the driver gets sync_reset_request from firmware it prepares for the
coming reset and sends acknowledge.
After getting this event the driver expects device reset, either it will
trigger PCI reset on sync_reset_now event or such PCI reset will be
triggered by another PF of the same device. So it moves to reset
requested mode and if it gets PCI reset triggered by the other PF it
detect the reset and reloads.
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Set capability to notify the firmware that this host driver is capable
of handling pci sync for firmware update events.
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add functions to query and set the MFRL reset options supported by
firmware.
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add reload limit to demand restrictions on reload actions.
Reload limits supported:
no_reset: No reset allowed, no down time allowed, no link flap and no
configuration is lost.
By default reload limit is unspecified and so no constraints on reload
actions are required.
Some combinations of action and limit are invalid. For example, driver
can not reinitialize its entities without any downtime.
The no_reset reload limit will have usecase in this patchset to
implement restricted fw_activate on mlx5.
Have the uapi parameter of reload limit ready for future support of
multiselection.
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add devlink reload action to allow the user to request a specific reload
action. The action parameter is optional, if not specified then devlink
driver re-init action is used (backward compatible).
Note that when required to do firmware activation some drivers may need
to reload the driver. On the other hand some drivers may need to reset
the firmware to reinitialize the driver entities. Therefore, the devlink
reload command returns the actions which were actually performed.
Reload actions supported are:
driver_reinit: driver entities re-initialization, applying devlink-param
and devlink-resource values.
fw_activate: firmware activate.
command examples:
$devlink dev reload pci/0000:82:00.0 action driver_reinit
reload_actions_performed:
driver_reinit
$devlink dev reload pci/0000:82:00.0 action fw_activate
reload_actions_performed:
driver_reinit fw_activate
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rejecting non-native endian BTF overlapped with the addition
of support for it.
The rest were more simple overlapping changes, except the
renesas ravb binding update, which had to follow a file
move as well as a YAML conversion.
Signed-off-by: David S. Miller <davem@davemloft.net>
When interface is attached while in promiscuous mode and with VLAN
filtering turned off, both configurations are not respected and VLAN
filtering is performed.
There are 2 flows which add the any-vid rules during interface attach:
VLAN creation table and set rx mode. Each is relaying on the other to
add any-vid rules, eventually non of them does.
Fix this by adding any-vid rules on VLAN creation regardless of
promiscuous mode.
Fixes: 9df30601c8 ("net/mlx5e: Restore vlan filter after seamless reset")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Prior to this patch unloading an interface in promiscuous mode with RX
VLAN filtering feature turned off - resulted in a warning. This is due
to a wrong condition in the VLAN rules cleanup flow, which left the
any-vid rules in the VLAN steering table. These rules prevented
destroying the flow group and the flow table.
The any-vid rules are removed in 2 flows, but none of them remove it in
case both promiscuous is set and VLAN filtering is off. Fix the issue by
changing the condition of the VLAN table cleanup flow to clean also in
case of promiscuous mode.
mlx5_core 0000:00:08.0: mlx5_destroy_flow_group:2123:(pid 28729): Flow group 20 wasn't destroyed, refcount > 1
mlx5_core 0000:00:08.0: mlx5_destroy_flow_group:2123:(pid 28729): Flow group 19 wasn't destroyed, refcount > 1
mlx5_core 0000:00:08.0: mlx5_destroy_flow_table:2112:(pid 28729): Flow table 262149 wasn't destroyed, refcount > 1
...
...
------------[ cut here ]------------
FW pages counter is 11560 after reclaiming all pages
WARNING: CPU: 1 PID: 28729 at
drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c:660
mlx5_reclaim_startup_pages+0x178/0x230 [mlx5_core]
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
Call Trace:
mlx5_function_teardown+0x2f/0x90 [mlx5_core]
mlx5_unload_one+0x71/0x110 [mlx5_core]
remove_one+0x44/0x80 [mlx5_core]
pci_device_remove+0x3e/0xc0
device_release_driver_internal+0xfb/0x1c0
device_release_driver+0x12/0x20
pci_stop_bus_device+0x68/0x90
pci_stop_and_remove_bus_device+0x12/0x20
hv_eject_device_work+0x6f/0x170 [pci_hyperv]
? __schedule+0x349/0x790
process_one_work+0x206/0x400
worker_thread+0x34/0x3f0
? process_one_work+0x400/0x400
kthread+0x126/0x140
? kthread_park+0x90/0x90
ret_from_fork+0x22/0x30
---[ end trace 6283bde8d26170dc ]---
Fixes: 9df30601c8 ("net/mlx5e: Restore vlan filter after seamless reset")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>