Currently, ethernet drivers are using devlink_port_type_eth_set() and
devlink_port_type_clear() to set devlink port type and link to related
netdev.
Instead of calling them directly, let the driver use
SET_NETDEV_DEVLINK_PORT macro to assign devlink_port pointer and let
devlink to track it. Note the devlink port pointer is static during
the time netdevice is registered.
In devlink code, use per-namespace netdev notifier to track
the netdevices with devlink_port assigned and change the internal
devlink_port type and related type pointer accordingly.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Follow-up patch is going to introduce a netdevice notifier event
processing which is called with RTNL mutex held. Processing of this will
eventually lead to call to port_notity() and port_fill() which currently
takes RTNL mutex internally. So as a temporary solution, propagate a
bool indicating if the mutex is already held. This will go away in one
of the follow-up patches.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
As __devlink_port_type_set() is going to be called directly from netdevice
notifier event handle in one of the follow-up patches, move the
port_type_netdev_checks() call there.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
As __devlink_port_type_set() is going to be called directly from netdevice
notifier event handle in one of the follow-up patches, move the
port_type_warn_schedule() call there.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Instead of storing type_dev as a void pointer, convert it to union and
use it to store either struct net_device or struct ib_device pointer.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Now that the 32bit UP oddity is gone and 32bit uses always a sequence
count, there is no need for the fetch_irq() variants anymore.
Convert to the regular interface.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Lifetime of some of the devlink objects, like regions, is currently
forced to be different for devlink instance and devlink port instance
(per-port regions). The reason is that for devlink ports, the internal
structures initialization happens only after devlink_port_register() is
called.
To resolve this inconsistency, introduce new set of helpers to allow
driver to initialize devlink pointer and region list before
devlink_register() is called. That allows port regions to be created
before devlink port registration and destroyed after devlink
port unregistration.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Instead of relying on devlink pointer not being initialized, introduce
an extra flag to indicate if devlink port is registered. This is needed
as later on devlink pointer is going to be initialized even in case
devlink port is not registered yet.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Instead of checking devlink_port->devlink pointer for not being NULL
which indicates that devlink port is registered, put this check to new
pair of helpers similar to what we have for devlink and use them in
other functions.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This reverts commit 6005a8aece.
The assertion was intentionally removed in commit 043b8413e8 ("net:
devlink: remove redundant rtnl lock assert") and, contrary what is
described in the commit message, the comment reflects that: "Caller must
hold RTNL mutex or reference to dev...".
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Tested-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20220829121324.3980376-1-vladbu@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Follow-up the removal of unused internal api of port params made by
commit 42ded61aa7 ("devlink: Delete not used port parameters APIs")
and stub the commands and add extack message to tell the user what is
going on.
If later on port params are needed, could be easily re-introduced,
but until then it is a dead code.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20220826082730.1399735-1-jiri@resnulli.us
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Devlink with its global attr policy has a lot of attribute
presence check, use the new ext ack reporting when they are
missing.
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
We had historically not checked that genlmsghdr.reserved
is 0 on input which prevents us from using those precious
bytes in the future.
One use case would be to extend the cmd field, which is
currently just 8 bits wide and 256 is not a lot of commands
for some core families.
To make sure that new families do the right thing by default
put the onus of opting out of validation on existing families.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Paul Moore <paul@paul-moore.com> (NetLabel)
Signed-off-by: David S. Miller <davem@davemloft.net>
Similar to devlink_compat_phys_port_name_get(), make sure that
devlink_compat_switch_id_get() is called with RTNL lock held. Comment
already says so, so put this in code as well.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20220825112923.1359194-1-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Limit the acceptance of component name passed to cmd_flash_update() to
match one of the versions returned by info_get(), marked by version type.
This makes things clearer and enforces 1:1 mapping between exposed
version and accepted flash component.
Check VERSION_TYPE_COMPONENT version type during cmd_flash_update()
execution by calling info_get() with different "req" context.
That causes info_get() to lookup the component name instead of
filling-up the netlink message.
Remove "UPDATE_COMPONENT" flag which becomes used.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Whenever the driver is called by his info_get() op, it may put multiple
version names and values to the netlink message. Extend by additional
helper devlink_info_version_running/stored_put_ext() that allows to
specify a version type that indicates when particular version name
represents a flash component.
This is going to be used in follow-up patch calling info_get() during
flash update command checking if version with this the version type
exists.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
After a failed devlink reload, devlink parameters are still registered,
which means user space can set and get their values. In the case of the
mlxsw "acl_region_rehash_interval" parameter, these operations will
trigger a use-after-free [1].
Fix this by rejecting set and get operations while in the failed state.
Return the "-EOPNOTSUPP" error code which does not abort the parameters
dump, but instead causes it to skip over the problematic parameter.
Another possible fix is to perform these checks in the mlxsw parameter
callbacks, but other drivers might be affected by the same problem and I
am not aware of scenarios where these stricter checks will cause a
regression.
[1]
mlxsw_spectrum3 0000:00:10.0: Port 125: Failed to register netdev
mlxsw_spectrum3 0000:00:10.0: Failed to create ports
==================================================================
BUG: KASAN: use-after-free in mlxsw_sp_acl_tcam_vregion_rehash_intrvl_get+0xbd/0xd0 drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.c:904
Read of size 4 at addr ffff8880099dcfd8 by task kworker/u4:4/777
CPU: 1 PID: 777 Comm: kworker/u4:4 Not tainted 5.19.0-rc7-custom-126601-gfe26f28c586d #1
Hardware name: QEMU MSN4700, BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Workqueue: netns cleanup_net
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x92/0xbd lib/dump_stack.c:106
print_address_description mm/kasan/report.c:313 [inline]
print_report.cold+0x5e/0x5cf mm/kasan/report.c:429
kasan_report+0xb9/0xf0 mm/kasan/report.c:491
__asan_report_load4_noabort+0x14/0x20 mm/kasan/report_generic.c:306
mlxsw_sp_acl_tcam_vregion_rehash_intrvl_get+0xbd/0xd0 drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.c:904
mlxsw_sp_acl_region_rehash_intrvl_get+0x49/0x60 drivers/net/ethernet/mellanox/mlxsw/spectrum_acl.c:1106
mlxsw_sp_params_acl_region_rehash_intrvl_get+0x33/0x80 drivers/net/ethernet/mellanox/mlxsw/spectrum.c:3854
devlink_param_get net/core/devlink.c:4981 [inline]
devlink_nl_param_fill+0x238/0x12d0 net/core/devlink.c:5089
devlink_param_notify+0xe5/0x230 net/core/devlink.c:5168
devlink_ns_change_notify net/core/devlink.c:4417 [inline]
devlink_ns_change_notify net/core/devlink.c:4396 [inline]
devlink_reload+0x15f/0x700 net/core/devlink.c:4507
devlink_pernet_pre_exit+0x112/0x1d0 net/core/devlink.c:12272
ops_pre_exit_list net/core/net_namespace.c:152 [inline]
cleanup_net+0x494/0xc00 net/core/net_namespace.c:582
process_one_work+0x9fc/0x1710 kernel/workqueue.c:2289
worker_thread+0x675/0x10b0 kernel/workqueue.c:2436
kthread+0x30c/0x3d0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>
The buggy address belongs to the physical page:
page:ffffea0000267700 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x99dc
flags: 0x100000000000000(node=0|zone=1)
raw: 0100000000000000 0000000000000000 dead000000000122 0000000000000000
raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff8880099dce80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
ffff8880099dcf00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>ffff8880099dcf80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
^
ffff8880099dd000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
ffff8880099dd080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
==================================================================
Fixes: 98bbf70c1c ("mlxsw: spectrum: add "acl_region_rehash_interval" devlink param")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 2dec18ad82 forgets to call mutex_unlock() before the function
returns in the error path:
New smatch warnings:
net/core/devlink.c:6392 devlink_nl_cmd_region_new() warn: inconsistent \
returns '®ion->snapshot_lock'.
Make sure we call mutex_unlock() in this error path.
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: 2dec18ad82 ("net: devlink: remove region snapshots list dependency on devlink->lock")
Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20220801115742.1309329-1-ammar.faizi@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
As the devlink_mutex was removed and all devlink instances are protected
individually by devlink->lock mutex, allow the netlink ops to run
in parallel and therefore allow user to execute commands on multiple
devlink instances simultaneously.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
All accesses to devlink structure from userspace and drivers are locked
with devlink->lock instance mutex. Also, devlinks xa_array iteration is
taken care of by iteration helpers taking devlink reference.
Therefore, remove devlink_mutex as it is no longer needed.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Convert reload command to behave the same way as the rest of the
commands and let if be called with devlink->lock held. Remove the
temporary devl_lock taking from drivers. As the DEVLINK_NL_FLAG_NO_LOCK
flag is no longer used, remove it alongside.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add new mark called "unregistering" to be set at the beginning of
devlink_unregister() function. Check this mark during devlinks
iteration in order to prevent getting a reference of devlink which is
being currently unregistered.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Let the core take the devlink instance lock around health callbacks and
remove the now redundant locking in the drivers.
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
After mlx4 driver is converted to do locked reload,
devlink_region_snapshot_create() may be called from both locked and
unlocked context.
Note that in mlx4 region snapshots could be created on any command
failure. That can happen in any flow that involves commands to FW,
which means most of the driver flows.
So resolve this by removing dependency on devlink->lock for region
snapshots list consistency and introduce new mutex to ensure it.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
After mlx4 driver is converted to do locked reload, functions to get/put
regions snapshot ID may be called from both locked and unlocked context.
So resolve this by removing dependency on devlink->lock for region
snapshot ID tracking by using internal xa_lock() to maintain
shapshot_ids xa_array consistency.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add a framework for running selftests.
Framework exposes devlink commands and test suite(s) to the user
to execute and query the supported tests by the driver.
Below are new entries in devlink_nl_ops
devlink_nl_cmd_selftests_show_doit/dumpit: To query the supported
selftests by the drivers.
devlink_nl_cmd_selftests_run: To execute selftests. Users can
provide a test mask for executing group tests or standalone tests.
Documentation/networking/devlink/ path is already part of MAINTAINERS &
the new files come under this path. Hence no update needed to the
MAINTAINERS
Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The net_eq() check is already performed inside
devlinks_xa_for_each_registered_get() helper, so remove the redundant
appearance.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20220727055912.568391-1-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
For the purpose of exposing device info and allow flash update which is
going to be implemented in follow-up patches, introduce a possibility
for a line card to expose relation to nested devlink entity. The nested
devlink entity represents the line card.
Example:
$ devlink lc show pci/0000:01:00.0 lc 1
pci/0000:01:00.0:
lc 1 state active type 16x100G nested_devlink auxiliary/mlxsw_core.lc.0
supported_types:
16x100G
$ devlink dev show auxiliary/mlxsw_core.lc.0
auxiliary/mlxsw_core.lc.0
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Benefit from having devlinks iterator helper
devlinks_xa_for_each_registered_get() and move the net pointer
check inside.
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Remove dependency on devlink_mutex during devlinks xarray iteration.
The reason is that devlink_register/unregister() functions taking
devlink_mutex would deadlock during devlink reload operation of devlink
instance which registers/unregisters nested devlink instances.
The devlinks xarray consistency is ensured internally by xarray.
There is a reference taken when working with devlink using
devlink_try_get(). But there is no guarantee that devlink pointer
picked during xarray iteration is not freed before devlink_try_get()
is called.
Make sure that devlink_try_get() works with valid pointer.
Achieve it by:
1) Splitting devlink_put() so the completion is sent only
after grace period. Completion unblocks the devlink_unregister()
routine, which is followed-up by devlink_free()
2) During devlinks xa_array iteration, get devlink pointer from xa_array
holding RCU read lock and taking reference using devlink_try_get()
before unlock.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Remove locked versions of functions that are no longer used by anyone.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Prepare for devlink reload being called with devlink->lock held and
convert the netdevsim driver to use unlocked devlink API during init and
fini flows. Take devl_lock() in reload_down() and reload_up() ops in the
meantime before reload cmd is converted to take the lock itself.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add unlocked variants of devlink_region_create/destroy() functions
to be used in drivers called-in with devlink->lock held.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add unlocked variants of devlink_dpipe*() functions to be used
in drivers called-in with devlink->lock held.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add unlocked variants of devlink_sb*() functions to be used
in drivers called-in with devlink->lock held.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add unlocked variants of devlink_resource*() functions to be used
in drivers called-in with devlink->lock held.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add unlocked variants of devl_trap*() functions to be used in drivers
called-in with devlink->lock held.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Return directly without intermediate value store at the end of
devlink_port_new_notify() function.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fix the typo in a name of devlink_port_new_notifiy() function.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The return value is not used, so change the return value type to void.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
As far as the lock helpers exist as the drivers need to work with the
devlink->lock mutex, use the helpers internally in devlink.c in order to
be consistent.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
To be unified with the rest of the code, the unlocked version (devl_*)
of function should have the same description in documentation as the
locked one. Add the missing documentation. Also, add "Context"
annotation for the locked versions where it is missing.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Let the core take the devlink instance lock around port_new and port_del
callbacks and remove the now redundant locking in the only driver that
currently use them.
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The previous patch removed the last usage of the functions
devlink_rate_leaf_create() and devlink_rate_nodes_destroy(). Thus,
remove these function from devlink API.
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The previous patch removed the last usage of the function
devlink_rate_nodes_destroy(). Thus, remove this function from devlink
API.
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
As explained in commit 316580b69d ("u64_stats: provide u64_stats_t type")
we should use u64_stats_t and related accessors to avoid load/store tearing.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Allow the driver to provide per line card info get op to fill-up info,
similar to the "devlink dev info".
Example:
$ devlink lc info pci/0000:01:00.0 lc 8
pci/0000:01:00.0:
lc 8
versions:
fixed:
hw.revision 0
running:
ini.version 4
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>