Implement removing additional RSS contexts via Netlink.
Technically it'd be possible to shoehorn the delete operation
into ethnl_request_ops-compatible handler. The code ends
up longer than open coded version, and I think we'll need
a custom way of sending notifications at some stage (if we
allow tying the context lifetime to the netlink socket, in
the future).
Link: https://patch.msgid.link/20250717234343.2328602-8-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Support creating contexts via Netlink. Setting flow hashing
fields on the new context is not supported at this stage,
it can be added later.
An empty indirection table is not supported. This is a carry
over from the IOCTL interface where empty indirection table
meant delete. We can repurpose empty indirection table in
Netlink but for now to avoid confusion reject it using the
policy.
Support letting user choose the ID for the new context. This was
not possible in IOCTL since the context ID field for the create
action had to be set to the ETH_RXFH_CONTEXT_ALLOC magic value.
Link: https://patch.msgid.link/20250717234343.2328602-7-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Similarly to previous change, factor out populating the response.
We will use this after the context was allocated to send a notification
so this time factor out from the additional context handling, rather
than context 0 handling (for request context didn't exist, for response
it does).
Reviewed-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20250717234343.2328602-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
To ease the code reuse for RSS_CREATE we'll want to prepare
struct rss_reply_data for the new context. Unfortunately
we can't depend on the exiting scaffolding because the context
doesn't exist (ctx=NULL) when we start preparing. Factor out
the portion of the context 0 handling responsible for allocation
of request memory, so that we can call it directly.
Link: https://patch.msgid.link/20250717234343.2328602-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In anticipation for CREATE and DELETE notifications - explicitly
pass the notification type to ethtool_rss_notify(), when calling
from the IOCTL code.
Reviewed-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20250717234343.2328602-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Supporting per-RSS context configuration of hashing fields but
not the hashing algorithm would complicate the code a lot.
We'd need to cross check the config against all RSS contexts.
None of the drivers need this today, so explicitly prevent
new drivers with such skewed capabilities from registering.
If such driver appears it will need to first adjust the checks
in the core.
Link: https://patch.msgid.link/20250717234343.2328602-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add support for ETHTOOL_SRXFH (setting hashing fields) in RSS_SET.
The tricky part is dealing with symmetric hashing. In netlink user
can change the hashing fields and symmetric hash in one request,
in IOCTL the two used to be set via different uAPI requests.
Since fields and hash function config are still separate driver
callbacks - changes to the two are not atomic. Keep things simple
and validate the settings against both pre- and post- change ones.
Meaning that we will reject the config request if user tries
to correct the flow fields and set input_xfrm in one request,
or disables input_xfrm and makes flow fields non-symmetric.
We can adjust it later if there's a real need. Starting simple feels
right, and potentially partially applying the settings isn't nice,
either.
Reviewed-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20250716000331.1378807-11-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Support configuring symmetric hashing via Netlink.
We have the flow field config prepared as part of SET handling,
so scan it for conflicts instead of querying the driver again.
Reviewed-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20250716000331.1378807-10-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Support setting RSS hashing key via ethtool Netlink.
Use the Netlink policy to make sure user doesn't pass
an empty key, "resetting" the key is not a thing.
Reviewed-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20250716000331.1378807-7-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Support setting RSS hash function / algo via ethtool Netlink.
Like IOCTL we don't validate that the function is within the
range known to the kernel. The drivers do a pretty good job
validating the inputs, and the IDs are technically "dynamically
queried" rather than part of uAPI.
Only change should be that in Netlink we don't support user
explicitly passing ETH_RSS_HASH_NO_CHANGE (0), if no change
is requested the attribute should be absent.
The ETH_RSS_HASH_NO_CHANGE is retained in driver-facing
API for consistency (not that I see a strong reason for it).
Reviewed-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20250716000331.1378807-6-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add initial support for RSS_SET, for now only operations on
the indirection table are supported.
Unlike the ioctl don't check if at least one parameter is
being changed. This is how other ethtool-nl ops behave,
so pick the ethtool-nl consistency vs copying ioctl behavior.
There are two special cases here:
1) resetting the table to defaults;
2) support for tables of different size.
For (1) I use an empty Netlink attribute (array of size 0).
(2) may require some background. AFAICT a lot of modern devices
allow allocating RSS tables of different sizes. mlx5 can upsize
its tables, bnxt has some "table size calculation", and Intel
folks asked about RSS table sizing in context of resource allocation
in the past. The ethtool IOCTL API has a concept of table size,
but right now the user is expected to provide a table exactly
the size the device requests. Some drivers may change the table
size at runtime (in response to queue count changes) but the
user is not in control of this. What's not great is that all
RSS contexts share the same table size. For example a device
with 128 queues enabled, 16 RSS contexts 8 queues in each will
likely have 256 entry tables for each of the 16 contexts,
while 32 would be more than enough given each context only has
8 queues. To address this the Netlink API should avoid enforcing
table size at the uAPI level, and should allow the user to express
the min table size they expect.
To fully solve (2) we will need more driver plumbing but
at the uAPI level this patch allows the user to specify
a table size smaller than what the device advertises. The device
table size must be a multiple of the user requested table size.
We then replicate the user-provided table to fill the full device
size table. This addresses the "allow the user to express the min
table size" objective, while not enforcing any fixed size.
From Netlink perspective .get_rxfh_indir_size() is now de facto
the "max" table size supported by the device.
We may choose to support table replication in ethtool, too,
when we actually plumb this thru the device APIs.
Initially I was considering moving full pattern generation
to the kernel (which queues to use, at which frequency and
what min sequence length). I don't think this complexity
would buy us much and most if not all devices have pow-2
table sizes, which simplifies the replication a lot.
Reviewed-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20250716000331.1378807-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The requirement of ->get_rxfh_fields() in ethtool_set_rxfh() is there to
verify that we have no conflict of input_xfrm with the RSS fields
options, there is no point in doing it if input_xfrm is not
supported/requested.
This is under the assumption that a driver that supports input_xfrm will
also support ->get_rxfh_fields(), so add a WARN_ON() to
ethtool_check_ops() to verify it, and remove the op NULL check.
This fixes the following error in mlx4_en, which doesn't support
getting/setting RXFH fields.
$ ethtool --set-rxfh-indir eth2 hfunc xor
Cannot set RX flow hash configuration: Operation not supported
Fixes: 72792461c8 ("net: ethtool: don't mux RXFH via rxnfc callbacks")
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20250715140754.489677-1-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Implement ETHTOOL_GRXFH over Netlink. The number of flow types is
reasonable (around 20) so report all of them at once for simplicity.
Do not maintain the flow ID mapping with ioctl at the uAPI level.
This gives us a chance to clean up the confusion that come from
RxNFC vs RxFH (flow direction vs hashing) in the ioctl.
Try to align with the names used in ethtool CLI, they seem to have
stood the test of time just fine. One annoyance is that we still
call L4 ports the weird names, but I guess they also apply to IPSec
(where they cover the SPI) so it is what it is.
$ ynl --family ethtool --dump rss-get
{
"header": {
"dev-index": 1,
"dev-name": "enp1s0"
},
"hfunc": 1,
"hkey": b"...",
"indir": [0, 1, ...],
"flow-hash": {
"ether": {"l2da"},
"ah-esp4": {"ip-src", "ip-dst"},
"ah-esp6": {"ip-src", "ip-dst"},
"ah4": {"ip-src", "ip-dst"},
"ah6": {"ip-src", "ip-dst"},
"esp4": {"ip-src", "ip-dst"},
"esp6": {"ip-src", "ip-dst"},
"ip4": {"ip-src", "ip-dst"},
"ip6": {"ip-src", "ip-dst"},
"sctp4": {"ip-src", "ip-dst"},
"sctp6": {"ip-src", "ip-dst"},
"udp4": {"ip-src", "ip-dst"},
"udp6": {"ip-src", "ip-dst"}
"tcp4": {"l4-b-0-1", "l4-b-2-3", "ip-src", "ip-dst"},
"tcp6": {"l4-b-0-1", "l4-b-2-3", "ip-src", "ip-dst"},
},
}
Link: https://patch.msgid.link/20250708220640.2738464-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Looks like some drivers (ena, enetc, fbnic.. there's probably more)
consider ETHER_FLOW to be legitimate target for flow hashing.
I'm not sure how intentional that is from the uAPI perspective
vs just an effect of ethtool IOCTL doing minimal input validation.
But Netlink will do strict validation, so we need to decide whether
we allow this use case or not. I don't see a strong reason against
it, and rejecting it would potentially regress a number of drivers.
So update the comments and flow_type_hashable().
Link: https://patch.msgid.link/20250708220640.2738464-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
After commit 040cef30b5 ("net: ethtool: move get_rxfh callback
under the rss_lock") we're expected to take rss_lock around get.
Switch dump to using the new prep helper and move the locking into it.
Link: https://patch.msgid.link/20250708220640.2738464-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Now that we don't have the compat code we can reduce the indent
a little. No functional changes.
Reviewed-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://patch.msgid.link/20250707184115.2285277-6-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
All drivers are now converted to dedicated _rxfh_context ops.
Remove the use of >set_rxfh() to manage additional contexts.
Reviewed-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://patch.msgid.link/20250707184115.2285277-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ido spotted that I made a mistake in commit under Fixes,
ethnl_default_parse() may acquire a dev reference even when it returns
an error. This may have been driven by the code structure in dumps
(which unconditionally release dev before handling errors), but it's
too much of a trap. Functions should undo what they did before returning
an error, rather than expecting caller to clean up.
Rather than fixing ethnl_default_set_doit() directly make
ethnl_default_parse() clean up errors.
Reported-by: Ido Schimmel <idosch@idosch.org>
Link: https://lore.kernel.org/aGEPszpq9eojNF4Y@shredder
Fixes: 963781bdfe ("net: ethtool: call .parse_request for SET handlers")
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250630154053.1074664-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We already call get_rxfh under the rss_lock when we read back
context state after changes. Let's be consistent and always
hold the lock. The existing callers are all under rtnl_lock
so this should make no difference in practice, but it makes
the locking rules far less confusing IMHO. Any RSS callback
and any access to the RSS XArray should hold the lock.
Link: https://patch.msgid.link/20250626202848.104457-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Netlink code will want to perform the RSS_SET operation atomically
under the rss_lock. sfc wants to hold the rss_lock in rxfh_fields_get,
which makes that difficult. Lets move the locking up to the core
so that for all driver-facing callbacks rss_lock is taken consistently
by the core.
Link: https://patch.msgid.link/20250626202848.104457-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Always take the rss_lock in ethtool_set_rxfh(). We will want to
make a similar change in ethtool_set_rxfh_fields() and some
drivers lock that callback regardless of rss context ID being set.
Having some callbacks locked unconditionally and some only if
context ID is set would be very confusing.
ethtool handling is under rtnl_lock, so rss_lock is very unlikely
to ever be congested.
Link: https://patch.msgid.link/20250626202848.104457-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We now reuse .parse_request() from GET on SET, so we need to make sure
that the policies for both cover the attributes used for .parse_request().
genetlink will only allocate space in info->attrs for ARRAY_SIZE(policy).
Reported-by: syzbot+430f9f76633641a62217@syzkaller.appspotmail.com
Fixes: 963781bdfe ("net: ethtool: call .parse_request for SET handlers")
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250626233926.199801-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In preparation for RSS_SET handling in ethnl introduce Netlink
notifications for RSS. Only cover modifications, not creation
and not removal of a context, because the latter may deserve
a different notification type. We should cross that bridge
when we add the support for context add / remove via Netlink.
Link: https://patch.msgid.link/20250623231720.3124717-7-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Copy information parsed for SET with .req_parse to NTF handling
and therefore the GET-equivalent that it ends up executing.
This way if the SET was on a sub-object (like RSS context)
the notification will also be appropriately scoped.
Also copy the phy_index, Maxime suggests this will help PLCA
commands generate accurate notifications as well.
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20250623231720.3124717-6-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
ethtool_notify() takes a const void *data argument, which presumably
was intended to pass information from the call site to the subcommand
handler. This argument currently has no users.
Expecting the data to be subcommand-specific has two complications.
Complication #1 is that its not plumbed thru any of the standardized
callbacks. It gets propagated to ethnl_default_notify() where it
remains unused. Coming from the ethnl_default_set_doit() side we pass
in NULL, because how could we have a command specific attribute in
a generic handler.
Complication #2 is that we expect the ethtool_notify() callers to
know what attribute type to pass in. Again, the data pointer is
untyped.
RSS will need to pass the context ID to the notifications.
I think it's a better design if the "subcommand" exports its own
typed interface and constructs the appropriate argument struct
(which will be req_info). Remove the unused data argument from
ethtool_notify() but retain it in a new internal helper which
subcommands can use to build a typed interface.
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20250623231720.3124717-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In preparation for using req_info to carry parameters between SET
and NTF - call .parse_request during ethnl_default_set_doit().
The main question here is whether .parse_request is intended to be
GET-specific. Originally the SET handling was delegated to each subcommand
directly - ethnl_default_set_doit() and .set callbacks in ethnl_request_ops
did not exist. Looking at existing users does not shed much light, all
of the following subcommands use .parse_request but have no SET handler
(and no NTF):
net/ethtool/eeprom.c
net/ethtool/rss.c
net/ethtool/stats.c
net/ethtool/strset.c
net/ethtool/tsinfo.c
There's only one which does have a SET:
net/ethtool/pause.c
where .parse_request handling is used to select which statistics to query.
Not relevant for SET but also harmless.
Going back to RSS (which doesn't have SET today) .parse_request parses
the rss_context ID. Using the req_info struct to pass the context ID
from SET to NTF will be very useful.
Switch to ethnl_default_parse(), effectively adding the .parse_request
for SET handlers.
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20250623231720.3124717-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In preparation for using req_info to carry parameters between
SET and NTF allocate a full request info struct. Since the size
depends on the subcommand we need to allocate it on the heap.
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20250623231720.3124717-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
All drivers have been converted. Stop using the rxnfc fallbacks
for Rx Flow Hashing configuration.
Joe pointed out in earlier review that in ethtool_set_rxfh()
we need both .get_rxnfc and .get_rxfh_fields, because we need
both the ring count and flow hashing (because we call
ethtool_check_flow_types()). IOW the existing check added
for transitioning drivers was buggy.
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250618203823.1336156-11-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch expands the status information provided by ethtool for PSE c33
with current port priority and max port priority. It also adds a call to
pse_ethtool_set_prio() to configure the PSE port priority.
Signed-off-by: Kory Maincent (Dent Project) <kory.maincent@bootlin.com>
Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20250617-feature_poe_port_prio-v14-8-78a1a645e2ee@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Report the index of the newly introduced PSE power domain to the user,
enabling improved management of the power budget for PSE devices.
Signed-off-by: Kory Maincent (Dent Project) <kory.maincent@bootlin.com>
Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20250617-feature_poe_port_prio-v14-5-78a1a645e2ee@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add support for devm_pse_irq_helper() to register PSE interrupts and report
events such as over-current or over-temperature conditions. This follows a
similar approach to the regulator API but also sends notifications using a
dedicated PSE ethtool netlink socket.
Signed-off-by: Kory Maincent (Dent Project) <kory.maincent@bootlin.com>
Link: https://patch.msgid.link/20250617-feature_poe_port_prio-v14-2-78a1a645e2ee@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We mux multiple calls to the drivers via the .get_nfc and .set_nfc
callbacks. This is slightly inconvenient to the drivers as they
have to de-mux them back. It will also be awkward for netlink code
to construct struct ethtool_rxnfc when it wants to get info about
RX Flow Hash, from the RSS module.
Add dedicated driver callbacks. Create struct ethtool_rxfh_fields
which contains only data relevant to RXFH. Maintain the names of
the fields to avoid having to heavily modify the drivers.
For now support both callbacks, once all drivers are converted
ethtool_*et_rxfh_fields() will stop using the rxnfc callbacks.
Link: https://patch.msgid.link/20250611145949.2674086-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
RX Flow Hashing supports using different configuration for different
RSS contexts. Only two drivers seem to support it. Make sure we
uniformly error out for drivers which don't.
Reviewed-by: Joe Damato <joe@dama.to>
Link: https://patch.msgid.link/20250611145949.2674086-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Now that the handles have been separated - remove the RX Flow Hash
handling from rxnfc functions and vice versa.
Reviewed-by: Joe Damato <joe@dama.to>
Link: https://patch.msgid.link/20250611145949.2674086-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
RX Flow Hash configuration uses the same argument structure
as flow filters. This is probably why ethtool IOCTL handles
them together. The more checks we add the more convoluted
this code is getting (as some of the checks apply only
to flow filters and others only to the hashing).
Copy the code to separate the handling. This is an exact
copy, the next change will remove unnecessary handling.
Reviewed-by: Joe Damato <joe@dama.to>
Link: https://patch.msgid.link/20250611145949.2674086-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Current release - regressions:
- af_unix: allow passing cred for embryo without SO_PASSCRED/SO_PASSPIDFD
Current release - new code bugs:
- eth: airoha: correct enable mask for RX queues 16-31
- veth: prevent NULL pointer dereference in veth_xdp_rcv when peer
disappears under traffic
- ipv6: move fib6_config_validate() to ip6_route_add(), prevent invalid
routes
Previous releases - regressions:
- phy: phy_caps: don't skip better duplex match on non-exact match
- dsa: b53: fix untagged traffic sent via cpu tagged with VID 0
- Revert "wifi: mwifiex: Fix HT40 bandwidth issue.", it caused transient
packet loss, exact reason not fully understood, yet
Previous releases - always broken:
- net: clear the dst when BPF is changing skb protocol (IPv4 <> IPv6)
- sched: sfq: fix a potential crash on gso_skb handling
- Bluetooth: intel: improve rx buffer posting to avoid causing issues
in the firmware
- eth: intel: i40e: make reset handling robust against multiple requests
- eth: mlx5: ensure FW pages are always allocated on the local NUMA
node, even when device is configure to 'serve' another node
- wifi: ath12k: fix GCC_GCC_PCIE_HOT_RST definition for WCN7850,
prevent kernel crashes
- wifi: ath11k: avoid burning CPU in ath11k_debugfs_fw_stats_request()
for 3 sec if fw_stats_done is not set
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmhK/3IACgkQMUZtbf5S
IruE5A//RdwiBW/pqoMIiRKLA3HZeUA/beYOl4DwVf8WFQNUIqdboeAi6k4yFrS+
SykKN0s1z8fW45lA46iFv3sR0QKYGln/v/cANsqojYqKBD3PF42dRifFlEAIz2M5
fnXK1VHPJOFK/OBOyKiiW3R6mFv+v9epZM8BKED77vFy7osDV2zkObePeE8/34B7
yVAr6JNTpB5Ex4ziG+e/6tFF6IX9RJLBl4fkRRynLDSsb1NFuy39LxPsxRQPxnzo
tlfHfxEFl5qDNGondUoSxmp38HoO6MRofWp1d1GZoBbTXi0gXV26I5WaaBHBqPkm
jZ7AtIMQq2+JuEg0y4dFFRehZLwLEMuhvlbacbIOKNBngVIsploBzvbG3ntWuUa4
Z5VFayQXumsHB5g7+vEFK6vCPaIpatKt419JsFXogNvVmmQzghALFlSymm/WbyGL
Bj3R448xGDJw+2zDAXSH/nMMHkRaQd2Ptj2czvJ0Y7Fj8bxJgH0okaHOBrk9RQTQ
bdUGCiMY84p6WI7rKDkFyyohMxppdYsY8A9hSdGgpqvu7dZi5yGmzz1Sp9+uSfSF
Lj61am4LSvRsIuTP5cdqmTBn3mZS5R49hvJsFddgXRhF+Y9gB7LSm0sypZbuOEKD
m9ijKcNETglzer0iMCwAVrIbDHGjqqHS74DkRzsuPsQ8kaCjsno=
=0mtm
-----END PGP SIGNATURE-----
Merge tag 'net-6.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bluetooth and wireless.
Current release - regressions:
- af_unix: allow passing cred for embryo without SO_PASSCRED/SO_PASSPIDFD
Current release - new code bugs:
- eth: airoha: correct enable mask for RX queues 16-31
- veth: prevent NULL pointer dereference in veth_xdp_rcv when peer
disappears under traffic
- ipv6: move fib6_config_validate() to ip6_route_add(), prevent
invalid routes
Previous releases - regressions:
- phy: phy_caps: don't skip better duplex match on non-exact match
- dsa: b53: fix untagged traffic sent via cpu tagged with VID 0
- Revert "wifi: mwifiex: Fix HT40 bandwidth issue.", it caused
transient packet loss, exact reason not fully understood, yet
Previous releases - always broken:
- net: clear the dst when BPF is changing skb protocol (IPv4 <> IPv6)
- sched: sfq: fix a potential crash on gso_skb handling
- Bluetooth: intel: improve rx buffer posting to avoid causing issues
in the firmware
- eth: intel: i40e: make reset handling robust against multiple
requests
- eth: mlx5: ensure FW pages are always allocated on the local NUMA
node, even when device is configure to 'serve' another node
- wifi: ath12k: fix GCC_GCC_PCIE_HOT_RST definition for WCN7850,
prevent kernel crashes
- wifi: ath11k: avoid burning CPU in ath11k_debugfs_fw_stats_request()
for 3 sec if fw_stats_done is not set"
* tag 'net-6.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (70 commits)
selftests: drv-net: rss_ctx: Add test for ntuple rules targeting default RSS context
net: ethtool: Don't check if RSS context exists in case of context 0
af_unix: Allow passing cred for embryo without SO_PASSCRED/SO_PASSPIDFD.
ipv6: Move fib6_config_validate() to ip6_route_add().
net: drv: netdevsim: don't napi_complete() from netpoll
net/mlx5: HWS, Add error checking to hws_bwc_rule_complex_hash_node_get()
veth: prevent NULL pointer dereference in veth_xdp_rcv
net_sched: remove qdisc_tree_flush_backlog()
net_sched: ets: fix a race in ets_qdisc_change()
net_sched: tbf: fix a race in tbf_change()
net_sched: red: fix a race in __red_change()
net_sched: prio: fix a race in prio_tune()
net_sched: sch_sfq: reject invalid perturb period
net: phy: phy_caps: Don't skip better duplex macth on non-exact match
MAINTAINERS: Update Kuniyuki Iwashima's email address.
selftests: net: add test case for NAT46 looping back dst
net: clear the dst when changing skb protocol
net/mlx5e: Fix number of lanes to UNKNOWN when using data_rate_oper
net/mlx5e: Fix leak of Geneve TLV option object
net/mlx5: HWS, make sure the uplink is the last destination
...
Context 0 (default context) always exists, there is no need to check
whether it exists or not when adding a flow steering rule.
The existing check fails when creating a flow steering rule for context
0 as it is not stored in the rss_ctx xarray.
For example:
$ ethtool --config-ntuple eth2 flow-type tcp4 dst-ip 194.237.147.23 dst-port 19983 context 0 loc 618
rmgr: Cannot insert RX class rule: Invalid argument
Cannot insert classification rule
An example usecase for this could be:
- A high-priority rule (loc 0) directing specific port traffic to
context 0.
- A low-priority rule (loc 1) directing all other TCP traffic to context
1.
This is a user-visible regression that was caught in our testing
environment, it was not reported by a user yet.
Fixes: de7f7582df ("net: ethtool: prevent flow steering to RSS contexts which don't exist")
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Nimrod Oren <noren@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://patch.msgid.link/20250612071958.1696361-2-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Move this API to the canonical timer_*() namespace.
[ tglx: Redone against pre rc1 ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/aB2X0jCKQO56WdMt@gmail.com
Multi-PTP source support within a network topology has been merged,
but the hardware timestamp source is not yet exposed to users.
Currently, users only see the PTP index, which does not indicate
whether the timestamp comes from a PHY or a MAC.
Add support for reporting the hwtstamp source using a
hwtstamp-source field, alongside hwtstamp-phyindex, to describe
the origin of the hardware timestamp.
Remove HWTSTAMP_SOURCE_UNSPEC enum value as it is not used at all.
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://patch.msgid.link/20250519-feature_ptp_source-v4-1-5d10e19a0265@bootlin.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Symmetric RSS hash requires that:
* No other fields besides IP src/dst and/or L4 src/dst are set
* If src is set, dst must also be set
This restriction was only enforced when RXNFC was configured after
symmetric hash was enabled. In the opposite order of operations (RXNFC
then symmetric enablement) the check was not performed.
Perform the sanity check on set_rxfh as well, by iterating over all flow
types hash fields and making sure they are all symmetric.
Introduce a function that returns whether a flow type is hashable (not
spec only) and needs to be iterated over. To make sure that no one
forgets to update the list of hashable flow types when adding new flow
types, a static assert is added to draw the developer's attention.
The conversion of uapi #defines to enum is not ideal, but as Jakub
mentioned [1], we have precedent for that.
[1] https://lore.kernel.org/netdev/20250324073509.6571ade3@kernel.org/
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250508103034.885536-1-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Now that we have an infrastructure in ethnl for perphy DUMPs, we can get
rid of the custom ->doit and ->dumpit to deal with PHY listing commands.
As most of the code was custom, this basically means re-writing how we
deal with PHY listing.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20250502085242.248645-3-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
ethnl commands that target a phy_device need a DUMP implementation that
will fill the reply for every PHY behind a netdev. We therefore need to
iterate over the dev->topo to list them.
When multiple PHYs are behind the same netdev, it's also useful to
perform DUMP with a filter on a given netdev, to get the capability of
every PHY.
Implement dedicated genl ->start(), ->dumpit() and ->done() operations
for PHY-targetting command, allowing filtered dumps and using a dump
context that keep track of the PHY iteration for multi-message dump.
PSE-PD and PLCA are converted to this new set of ops along the way.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20250502085242.248645-2-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When the link partner goes down, "ethtool --show-mm" still displays
"Verification status: SUCCEEDED," reflecting a previous state that is
no longer valid.
Reset the verification status to ensure it reflects the current state.
Reviewed-by: Furong Xu <0x1207@gmail.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
It appears that stmmac is not the only hardware which requires a
software-driven verification state machine for the MAC Merge layer.
While on the one hand it's good to encourage hardware implementations,
on the other hand it's quite difficult to tolerate multiple drivers
implementing independently fairly non-trivial logic.
Extract the hardware-independent logic from stmmac into library code and
put it in ethtool. Name the state structure "mmsv" for MAC Merge
Software Verification. Let this expose an operations structure for
executing the hardware stuff: sync hardware with the tx_active boolean
(result of verification process), enable/disable the pMAC, send mPackets,
notify library of external events (reception of mPackets), as well as
link state changes.
Note that it is assumed that the external events are received in hardirq
context. If they are not, it is probably a good idea to disable hardirqs
when calling ethtool_mmsv_event_handle(), because the library does not
do so.
Also, the MM software verification process has no business with the
tx_min_frag_size, that is all the driver's to handle.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Co-developed-by: Choong Yong Liang <yong.liang.choong@linux.intel.com>
Signed-off-by: Choong Yong Liang <yong.liang.choong@linux.intel.com>
Tested-by: Choong Yong Liang <yong.liang.choong@linux.intel.com>
Tested-by: Furong Xu <0x1207@gmail.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The 'read_write_len_ext' field in 'struct ethtool_cmis_cdb_cmd_args'
stores the maximum number of bytes that can be read from or written to
the Local Payload (LPL) page in a single multi-byte access.
Cited commit started overwriting this field with the maximum number of
bytes that can be read from or written to the Extended Payload (LPL)
pages in a single multi-byte access. Transceiver modules that support
auto paging can advertise a number larger than 255 which is problematic
as 'read_write_len_ext' is a 'u8', resulting in the number getting
truncated and firmware flashing failing [1].
Fix by ignoring the maximum EPL access size as the kernel does not
currently support auto paging (even if the transceiver module does) and
will not try to read / write more than 128 bytes at once.
[1]
Transceiver module firmware flashing started for device enp177s0np0
Transceiver module firmware flashing in progress for device enp177s0np0
Progress: 0%
Transceiver module firmware flashing encountered an error for device enp177s0np0
Status message: Write FW block EPL command failed, LPL length is longer
than CDB read write length extension allows.
Fixes: 9a3b0d078b ("net: ethtool: Add support for writing firmware blocks using EPL payload")
Reported-by: Damodharam Ammepalli <damodharam.ammepalli@broadcom.com>
Closes: https://lore.kernel.org/netdev/20250402183123.321036-3-michael.chan@broadcom.com/
Tested-by: Damodharam Ammepalli <damodharam.ammepalli@broadcom.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Damodharam Ammepalli <damodharam.ammepalli@broadcom.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/20250409112440.365672-1-idosch@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
There's a consistent pattern where the .cleanup_data() callback is
called when .prepare_data() fails, when it should really be called to
clean after a successful .prepare_data() as per the documentation.
Rewrite the error-handling paths to make sure we don't cleanup
un-prepared data.
Fixes: c781ff12a2 ("ethtool: Allow network drivers to dump arbitrary EEPROM data")
Reviewed-by: Kory Maincent <kory.maincent@bootlin.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20250407130511.75621-1-maxime.chevallier@bootlin.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>