Commit graph

1573 commits

Author SHA1 Message Date
Michael Chan
b098dc5a33 bnxt_en: Add bnxt_setup_ctxm_pg_tbls() helper function
In bnxt_alloc_ctx_mem(), the logic to set up the context memory entries
and to allocate the context memory tables is done repetitively.  Add
a helper function to simplify the code.

The setup of the Fast Path TQM entries relies on some information from
the Slow Path TQM entries.  Copy the SP_TQM entries to the FP_TQM
entries to simplify the logic.

Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20231120234405.194542-7-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-21 17:32:48 -08:00
Michael Chan
2ad67aea11 bnxt_en: Use the pg_info field in bnxt_ctx_mem_type struct
Use the newly added pg_info field in bnxt_ctx_mem_type struct and
remove the standalone page info structures in bnxt_ctx_mem_info.
This now completes the reorganization of the context memory
structures to work better with the new and more flexible firmware
interface for newer chips.

Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20231120234405.194542-6-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-21 17:32:48 -08:00
Michael Chan
035c576159 bnxt_en: Add page info to struct bnxt_ctx_mem_type
This will further improve the organization of the bnxt_ctx_mem_info
structure by moving the standalone page info structures into the
bnxt_ctx_mem_type array.  Add the allocation and free logic first and
the next patch will migrate to use the new infrastructure.

Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20231120234405.194542-5-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-21 17:32:48 -08:00
Michael Chan
76087d997a bnxt_en: Restructure context memory data structures
The current code uses a flat bnxt_ctx_mem_info structure to store 8
types of context memory for the NIC.  All the context memory types
are very similar and have similar parameters.  They can all share a
common structure to improve the organization.  Also, new firmware
interface will provide a new API to retrieve each type of context
memory by calling the API repeatedly.

This patch reorganizes the bnxt_ctx_mem_info structure to fit better
with the new firmware interface.  It will also work with the legacy
firmware interface.  The flat fields in bnxt_ctx_mem_info are replaced
by the bnxt_ctx_mem_type array.  The bnxt_mem_init array info will no
longer be needed.

Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20231120234405.194542-4-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-21 17:32:48 -08:00
Michael Chan
e50dc4c220 bnxt_en: Free bp->ctx inside bnxt_free_ctx_mem()
We always free bp->ctx right after calling bnxt_free_ctx_mem(), so just
free it at the end of that function to make things simpler.

Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20231120234405.194542-3-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-21 17:32:47 -08:00
Michael Chan
aa8460bacf bnxt_en: The caller of bnxt_alloc_ctx_mem() should always free bp->ctx
bnxt_alloc_ctx_mem() calls bnxt_hwrm_func_backing_store_qcaps() to
allocate the memory for bp->ctx.  Initialize bp->ctx with the allocated
memory and let the caller free it during unwind.  The unwind logic is
already there, we just need to always set bp->ctx to the allocated
memory so the caller will always free it.  This simplifies the logic
and makes it easier to expand on the backing store logic.

Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20231120234405.194542-2-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-21 17:32:47 -08:00
Michael Chan
c1056a59ae bnxt_en: Optimize xmit_more TX path
Now that we use the cumulative consumer index scheme for TX completion,
we don't need to have one TX completion per TX packet in the xmit_more
code path.  Set the TX_BD_FLAGS_NO_CMPL flag if xmit_more is true.
Fallback to one interrupt per packet if the ring is filled beyond
bp->tx_wake_thresh.

Also, move the wmb() to bnxt_txr_db_kick().  When xmit_more is true,
we'll skip the bnxt_txr_db_kick() call and there is no need to call
wmb() to sync. the TX BD data.

Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-15 10:07:40 +00:00
Michael Chan
ba09801779 bnxt_en: Use existing MSIX vectors for all mqprio TX rings
We can now fully support sharing the same MSIX for all mqprio TX rings
belonging to the same ethtool channel with the new infrastructure:

1. Allocate the proper entries for cp_ring_arr in struct bnxt_cp_ring_info
to support the additional TX rings.

2. Populate the tx_ring array in struct bnxt_napi for all TX rings
sharing the same NAPI.

3. bnxt_num_tx_to_cp() returns the proper NQ/completion rings to support
the TX rings in the input.

4. Adjust bnxt_get_num_ring_stats() for the reduced number of ring
counters with the new scheme.

Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-15 10:07:40 +00:00
Michael Chan
f07b58801b bnxt_en: Add macros related to TC and TX rings
Add 3 macros that handle to conversions between TC numbers and TX
ring numbers.  These will help to clarify the existing logic and the
new logic in the next patch.

Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-15 10:07:40 +00:00
Michael Chan
f5b29c6afe bnxt_en: Add helper to get the number of CP rings required for TX rings
Up until now, each TX ring always requires a completion ring/NQ/MSIX.
bnxt_trim_rings() and the assignment of bp->cp_nr_rings always make
this assumption.  This will no longer be true in the next patches, so
we refactor and add helper functions to determine the proper relationship
between TX rings and the required completion ring/NQ/MSIX.  This patch
does not change the 1:1 relationship yet.

Note that on P5 chips, each RX and TX ring still requires a completion
ring.  Only the number of NQs has been reduced.  We should no longer call
bnxt_trim_rings() to adjust the RX and TX rings on P5 chips.  Replace with
simple logic to check that RX + TX < CP and adjust accordingly.

bnxt_check_rings() should call _bnxt_get_max_rings() to get the raw
number of rings instead of bnxt_get_max_rings().  If we are about to
create TCs, bnxt_get_max_rings() would not be able to calculate the max
rings correctly.

Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-15 10:07:40 +00:00
Michael Chan
0589a1ed4d bnxt_en: Support up to 8 TX rings per MSIX
For each mqprio TC, we allocate a set of TX rings to map to the new
hardware CoS queue.  Expand the tx_ring pointer in struct bnxt_napi
to an array of 8 to support up to 8 TX rings, one for each TC.
Only array entry 0 is used at this time.  The rest of the array
entries will be used in later patches.

Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-15 10:07:40 +00:00
Michael Chan
877edb3473 bnxt_en: Refactor bnxt_hwrm_set_coal()
Add 2 helper functions to set coalescing for each RX and TX rings.  This
will make it easier to expand the number of TX rings per MSIX in the
next patches.

Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-15 10:07:40 +00:00
Michael Chan
5a3c585fa8 bnxt_en: New encoding for the TX opaque field
In order to support multiple TX rings on the same MSIX, we'll use the
upper byte of the TX opaque field to store the ring index in the new
tx_napi_idx field.  This tx_napi_idx field is currently always 0 until
more infrastructure is added in later patches.

Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-15 10:07:39 +00:00
Michael Chan
ebf72319ce bnxt_en: Refactor bnxt_tx_int()
bnxt_tx_int() processes the only one TX ring from the bnxt_napi pointer.
To prepare for more TX rings associated with the bnxt_napi structure,
add a new __bnxt_tx_int() function that takes the bnxt_tx_ring_info
pointer to process that one TX ring.  No functional change.

Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-15 10:07:39 +00:00
Michael Chan
9c0b06de6f bnxt_en: Remove BNXT_RX_HDL and BNXT_TX_HDL
These 2 constants were used for the one RX and one TX completion
ring pointer in the cpr->cp_ring_arr fixed array.  Now that we've
changed to allocating the array for the exact number of entries to
support more TX rings, we no longer use these constants.

The array index as well as the type of completion ring (RX/TX) are
now encoded in the handle for the completion ring.  This will allow
us to locate the completion ring during NAPI for any number of
completion rings sharing the same MSIX.  In the following patches,
we'll be adding support for more TX rings associated with the same
MSIX vector.

Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-15 10:07:39 +00:00
Michael Chan
7845b8dfc7 bnxt_en: Add completion ring pointer in TX and RX ring structures
From the TX or RX ring structure, we need to find the corresponding
completion ring during initialization.  On P5 chips, we use the MSIX/napi
entry to locate the completion ring because there is only one RX/TX
ring per MSIX.  To allow multiple TX rings for each MSIX, we need
to add a direct pointer from the TX ring and RX ring structures.
This also simplifies the existing logic.

Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-15 10:07:39 +00:00
Michael Chan
d1eec61410 bnxt_en: Restructure cp_ring_arr in struct bnxt_cp_ring_info
The cp_ring_arr is currently a fixed array of 2 pointers for the
TX and RX completion rings.  These pointers are allocated during
ring initialization.  Currntly, we support up to 2 completion rings
for each MSIX.  In order to support more completion rings, we change
this fixed array to a pointer and allocate the required entries
during ring initialization.  This patch keeps the current scheme of
allocating only 2 entries when needed.  Later patches will expand
and allocate more entries when required.

Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-15 10:07:39 +00:00
Michael Chan
7f0a168b04 bnxt_en: Add completion ring pointer in TX and RX ring structures
From the TX or RX ring structure, we need to find the corresponding
completion ring during initialization.  On P5 chips, we use the MSIX/napi
entry to locate the completion ring because there is only one RX/TX
ring per MSIX.  To allow multiple TX rings for each MSIX, we need
to add a direct pointer from the TX ring and RX ring structures.
This also simplifies the existing logic.

Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-15 10:07:39 +00:00
Michael Chan
34eec1f29a bnxt_en: Put the TX producer information in the TX BD opaque field
Currently, the opaque field in the TX BD is only used for debugging.
The TX completion logic relies on getting one TX completion for each
packet and they always complete in order.

Improve this scheme by putting the producer information (ring index plus
number of BDs for the packet) in the opaque field.  This way, we can
handle TX completion processing by looking at the last TX completion
instead of counting the number of completions.

Since we no longer need to count the exact number of completions, we can
optimize xmit_more by disabling TX completion when the xmit_more
condition is true.  This will be done in later patches.

This patch is only initializing the opaque field in the TX BD and is
not changing the driver's TX completion logic yet.

Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-15 10:07:39 +00:00
Michael Chan
9cfe8cf502 bnxt_en: Fix 2 stray ethtool -S counters
The recent firmware interface change has added 2 counters in struct
rx_port_stats_ext. This caused 2 stray ethtool counters to be
displayed.

Since new counters are added from time to time, fix it so that the
ethtool logic will only display up to the maximum known counters.
These 2 counters are not used by production firmware yet.

Fixes: 754fbf604f ("bnxt_en: Update firmware interface to 1.10.2.171")
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20231026013231.53271-1-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-26 19:43:38 -07:00
Yunsheng Lin
09d96ee567 page_pool: remove PP_FLAG_PAGE_FRAG
PP_FLAG_PAGE_FRAG is not really needed after pp_frag_count
handling is unified and page_pool_alloc_frag() is supported
in 32-bit arch with 64-bit DMA, so remove it.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
CC: Lorenzo Bianconi <lorenzo@kernel.org>
CC: Alexander Duyck <alexander.duyck@gmail.com>
CC: Liang Chen <liangchen.linux@gmail.com>
CC: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://lore.kernel.org/r/20231020095952.11055-3-linyunsheng@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-23 19:14:48 -07:00
Edwin Peer
5d4e1bf606 bnxt_en: extend media types to supported and autoneg modes
The current driver code does not accurately report the supported and
advertised link modes.  It basically always assumes the media type
is copper for any particular speed.  Utilize the recently added link
mode mappings to accurately report fully qualified ethtool link modes for
advertised and supported speeds.

If the media type is known, we will report the supported link modes for
that media only.  If the media is not known, we will report all possible
supported link modes.  The user can now specify any supported link modes
(including NRZ and PAM4) to advertise for autoneg.  It used to only accept
copper NRZ modes.

Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-22 11:41:46 +01:00
Edwin Peer
64d20aea6e bnxt_en: convert to linkmode_set_bit() API
Barring the BNXT_FW_TO_ETHTOOL speed macros, which will be removed
in the next patch, update code to use the newer API.

No functional change.

Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-22 11:41:46 +01:00
Michael Chan
5802e30317 bnxt_en: Refactor NRZ/PAM4 link speed related logic
Refactor some NRZ/PAM4 link speed related logic into helper functions.
The NRZ and PAM4 link parameters are stored in separate structure fields.
The driver logic has to check whether it is in NRZ or PAM4 mode and
then use the appropriate field.

Refactor this logic into helper functions for better readability.

Reviewed-by: Damodharam Ammepalli <damodharam.ammepalli@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-22 11:41:46 +01:00
Edwin Peer
94c89e73d3 bnxt_en: refactor speed independent ethtool modes
A future patch in this series will change the algorithm used to
determine ethtool speed and media modes. Extract the handling of
the unrelated pause, autoneg modes into an independent function.
Also separate FEC handling out of bnxt_fw_to_ethtool_*_spds().

No functional change.

Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-22 11:41:46 +01:00
Edwin Peer
d6263677bb bnxt_en: support lane configuration via ethtool
Recent kernels support changing the number of link lanes via ethtool.
This is useful for determining the appropriate signal mode to use when
a given link speed can be achieved using different lane configurations.

Accept the ethtool lanes parameter when configuring forced speed.  If
there is no lanes parameter, select a default.

Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-22 11:41:46 +01:00
Edwin Peer
ecdad2a692 bnxt_en: add infrastructure to lookup ethtool link mode
Add infrastructure to look up the enum ethtool_link_mode_bit_indices
from link information provided by the firmware.  The link speed,
signal mode, and media type returned by firmware will be used to
look up the ethtool link mode.

The immediate benefit is that once the link mode is determined, we can
now use ethtool_params_from_link_mode() to fill the basic ethtool
parameters including the number of lanes.  Lanes will be fully
supported in the next patch.

Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-22 11:41:46 +01:00
Kalesh AP
fd78ec3fbc bnxt_en: Fix invoking hwmon_notify_event
FW sends the async event to the driver when the device temperature goes
above or below the threshold values.  Only notify hwmon if the
temperature is increasing to the next alert level, not when it is
decreasing.

Cc: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Kashyap Desai <kashyap.desai@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-22 11:41:46 +01:00
Kalesh AP
55862094a9 bnxt_en: Do not call sleeping hwmon_notify_event() from NAPI
Defer hwmon_notify_event() to bnxt_sp_task() workqueue because
hwmon_notify_event() can try to acquire a mutex shown in the stack trace
below.  Modify bnxt_event_error_report() to return true if we need to
schedule bnxt_sp_task() to notify hwmon.

  __schedule+0x68/0x520
  hwmon_notify_event+0xe8/0x114
  schedule+0x60/0xe0
  schedule_preempt_disabled+0x28/0x40
  __mutex_lock.constprop.0+0x534/0x550
  __mutex_lock_slowpath+0x18/0x20
  mutex_lock+0x5c/0x70
  kobject_uevent_env+0x2f4/0x3d0
  kobject_uevent+0x10/0x20
  hwmon_notify_event+0x94/0x114
  bnxt_hwmon_notify_event+0x40/0x70 [bnxt_en]
  bnxt_event_error_report+0x260/0x290 [bnxt_en]
  bnxt_async_event_process.isra.0+0x250/0x850 [bnxt_en]
  bnxt_hwrm_handler.isra.0+0xc8/0x120 [bnxt_en]
  bnxt_poll_p5+0x150/0x350 [bnxt_en]
  __napi_poll+0x3c/0x210
  net_rx_action+0x308/0x3b0
  __do_softirq+0x120/0x3e0

Cc: Guenter Roeck <linux@roeck-us.net>
Fixes: a19b480145 ("bnxt_en: Event handler for Thermal event")
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-22 11:41:45 +01:00
Przemek Kitszel
074e1b4221 bnxt_en: devlink health: use retained error fmsg API
Drop unneeded error checking.

devlink_fmsg_*() family of functions is now retaining errors,
so there is no need to check for them after each call.

Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 11:34:49 +01:00
Jakub Kicinski
73b24e7ce8 eth: bnxt: fix backward compatibility with older devices
Recent FW interface update bumped the size of struct hwrm_func_cfg_input
above 128B which is the max some devices support.

Probe on Stratus (BCM957452) with FW 20.8.3.11 fails with:

   bnxt_en ...: Unable to reserve tx rings
   bnxt_en ...: 2nd rings reservation failed.
   bnxt_en ...: Not enough rings available.

Once probe is fixed other errors pop up:

   bnxt_en ...: Failed to set async event completion ring.

This is because __hwrm_send() rejects requests larger than
bp->hwrm_max_ext_req_len with -E2BIG. Since the driver doesn't
actually access any of the new fields, yet, trim the length.
It should be safe.

Similar workaround exists for backing_store_cfg_input.
Although that one mins() to a constant of 256, not 128
we'll effectively use here. Michael explains: "the backing
store cfg command is supported by relatively newer firmware
that will accept 256 bytes at least."

To make debugging easier in the future add a warning
for oversized requests.

Fixes: 754fbf604f ("bnxt_en: Update firmware interface to 1.10.2.171")
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20231016171640.1481493-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-17 17:50:55 -07:00
Jakub Kicinski
c27153682e Revert "bnxt_en: Support QOS and TPID settings for the SRIOV VLAN"
This reverts commit e76d44fe72.

We no longer accept drivers extending their use of the legacy
SR-IOV configuration APIs. Users should move to bridge offload.

Link: https://lore.kernel.org/r/20231004112243.41cb6351@kernel.org/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-04 11:22:57 -07:00
Vikas Gupta
cbdbf0aa41 bnxt_en: Update VNIC resource calculation for VFs
Newer versions of firmware will pre-reserve 1 VNIC for every possible
PF and VF function.  Update the driver logic to take this into account
when assigning VNICs to the VFs.  These pre-reserved VNICs for the
inactive VFs should be subtracted from the global pool before
assigning them to the active VFs.

Not doing so may cause discrepancies that ultimately may cause some VFs to
have insufficient VNICs to support features such as aRFS.

Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-04 11:23:01 +01:00
Sreekanth Reddy
e76d44fe72 bnxt_en: Support QOS and TPID settings for the SRIOV VLAN
Add these missing settings in the .ndo_set_vf_vlan() method.
Older firmware does not support the TPID setting so check for
proper support.

Remove the unused BNXT_VF_QOS flag.

Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-04 11:23:01 +01:00
Kalesh AP
a19b480145 bnxt_en: Event handler for Thermal event
Newer FW will send a new async event when it detects that
the chip's temperature has crossed the configured threshold value.
The driver will now notify hwmon and will log a warning message.

Link: https://lore.kernel.org/netdev/20230815045658.80494-13-michael.chan@broadcom.com/
Cc: Jean Delvare <jdelvare@suse.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: linux-hwmon@vger.kernel.org
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-04 11:23:01 +01:00
Kalesh AP
3d9cf96206 bnxt_en: Use non-standard attribute to expose shutdown temperature
Implement the sysfs attributes directly in the driver for
shutdown threshold temperature and pass an extra attribute group
to the hwmon core when registering the hwmon device.

Link: https://lore.kernel.org/netdev/20230815045658.80494-12-michael.chan@broadcom.com/
 Cc: Jean Delvare <jdelvare@suse.com>
 Cc: Guenter Roeck <linux@roeck-us.net>
 Cc: linux-hwmon@vger.kernel.org
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-04 11:23:01 +01:00
Kalesh AP
cd13244f19 bnxt_en: Expose threshold temperatures through hwmon
HWRM_TEMP_MONITOR_QUERY response now indicates various
threshold temperatures. Expose these threshold temperatures
through the hwmon sysfs using this mapping:

hwmon_temp_max : bp->warn_thresh_temp
hwmon_temp_crit : bp->crit_thresh_temp
hwmon_temp_emergency : bp->fatal_thresh_temp

hwmon_temp_max_alarm : temp >= bp->warn_thresh_temp
hwmon_temp_crit_alarm : temp >= bp->crit_thresh_temp
hwmon_temp_emergency_alarm : temp >= bp->fatal_thresh_temp

Link: https://lore.kernel.org/netdev/20230815045658.80494-12-michael.chan@broadcom.com/
Cc: Jean Delvare <jdelvare@suse.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: linux-hwmon@vger.kernel.org
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-04 11:23:01 +01:00
Kalesh AP
847da8b117 bnxt_en: Modify the driver to use hwmon_device_register_with_info
The use of hwmon_device_register_with_groups() is deprecated.
Modified the driver to use hwmon_device_register_with_info().

Driver currently exports only temp1_input through hwmon sysfs
interface. But FW has been modified to report more threshold
temperatures and driver want to report them through the
hwmon interface.

Cc: Jean Delvare <jdelvare@suse.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: linux-hwmon@vger.kernel.org
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-04 11:23:01 +01:00
Kalesh AP
a47f3b3992 bnxt_en: Move hwmon functions into a dedicated file
This is in preparation for upcoming patches in the series.
Driver has to expose more threshold temperatures through the
hwmon sysfs interface. More code will be added and do not
want to overload bnxt.c.

Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Cc: Jean Delvare <jdelvare@suse.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: linux-hwmon@vger.kernel.org
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-04 11:23:01 +01:00
Kalesh AP
6ad71984aa bnxt_en: Enhance hwmon temperature reporting
Driver currently does hwmon device register and unregister
in open and close() respectively. As a result, user will not
be able to query hwmon temperature when interface is in
ifdown state.

Enhance it by moving the hwmon register/unregister to the
probe/remove functions.

Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-04 11:23:01 +01:00
Michael Chan
754fbf604f bnxt_en: Update firmware interface to 1.10.2.171
The main changes are the additional thermal thresholds in
hwrm_temp_monitor_query_output and the new async event to
report thermal errors.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-04 11:23:01 +01:00
Sebastian Andrzej Siewior
edc0140cc3 bnxt_en: Flush XDP for bnxt_poll_nitroa0()'s NAPI
bnxt_poll_nitroa0() invokes bnxt_rx_pkt() which can run a XDP program
which in turn can return XDP_REDIRECT. bnxt_rx_pkt() is also used by
__bnxt_poll_work() which flushes (xdp_do_flush()) the packets after each
round. bnxt_poll_nitroa0() lacks this feature.
xdp_do_flush() should be invoked before leaving the NAPI callback.

Invoke xdp_do_flush() after a redirect in bnxt_poll_nitroa0() NAPI.

Cc: Michael Chan <michael.chan@broadcom.com>
Fixes: f18c2b77b2 ("bnxt_en: optimized XDP_REDIRECT support")
Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-21 09:01:41 +02:00
Linus Torvalds
f7e97ce269 v6.6 merge window RDMA pull request
Many small changes across the subystem, some highlights:
 
 - Usual driver cleanups in qedr, siw, erdma, hfi1, mlx4/5, irdma, mthca,
   hns, and bnxt_re
 
 - siw now works over tunnel and other netdevs with a MAC address by
   removing assumptions about a MAC/GID from the connection manager
 
 - "Doorbell Pacing" for bnxt_re - this is a best effort scheme to allow
   userspace to slow down the doorbell rings if the HW gets full
 
 - irdma egress VLAN priority, better QP/WQ sizing
 
 - rxe bug fixes in queue draining and srq resizing
 
 - Support more ethernet speed options in the core layer
 
 - DMABUF support for bnxt_re
 
 - Multi-stage MTT support for erdma to allow much bigger MR registrations
 
 - A irdma fix with a CVE that came in too late to go to -rc, missing
   bounds checking for 0 length MRs
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCZPEqkAAKCRCFwuHvBreF
 YZrNAPoCBfU+VjCKNr2yqF7s52os5ZdBV7Uuh4txHcXWW9H7GAD/f19i2u62fzNu
 C27jj4cztemMBb8mgwyxPw/wLg7NLwY=
 =pC6k
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "Many small changes across the subystem, some highlights:

   - Usual driver cleanups in qedr, siw, erdma, hfi1, mlx4/5, irdma,
     mthca, hns, and bnxt_re

   - siw now works over tunnel and other netdevs with a MAC address by
     removing assumptions about a MAC/GID from the connection manager

   - "Doorbell Pacing" for bnxt_re - this is a best effort scheme to
     allow userspace to slow down the doorbell rings if the HW gets full

   - irdma egress VLAN priority, better QP/WQ sizing

   - rxe bug fixes in queue draining and srq resizing

   - Support more ethernet speed options in the core layer

   - DMABUF support for bnxt_re

   - Multi-stage MTT support for erdma to allow much bigger MR
     registrations

   - A irdma fix with a CVE that came in too late to go to -rc, missing
     bounds checking for 0 length MRs"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (87 commits)
  IB/hfi1: Reduce printing of errors during driver shut down
  RDMA/hfi1: Move user SDMA system memory pinning code to its own file
  RDMA/hfi1: Use list_for_each_entry() helper
  RDMA/mlx5: Fix trailing */ formatting in block comment
  RDMA/rxe: Fix redundant break statement in switch-case.
  RDMA/efa: Fix wrong resources deallocation order
  RDMA/siw: Call llist_reverse_order in siw_run_sq
  RDMA/siw: Correct wrong debug message
  RDMA/siw: Balance the reference of cep->kref in the error path
  Revert "IB/isert: Fix incorrect release of isert connection"
  RDMA/bnxt_re: Fix kernel doc errors
  RDMA/irdma: Prevent zero-length STAG registration
  RDMA/erdma: Implement hierarchical MTT
  RDMA/erdma: Refactor the storage structure of MTT entries
  RDMA/erdma: Renaming variable names and field names of struct erdma_mem
  RDMA/hns: Support hns HW stats
  RDMA/hns: Dump whole QP/CQ/MR resource in raw
  RDMA/irdma: Add missing kernel-doc in irdma_setup_umode_qp()
  RDMA/mlx4: Copy union directly
  RDMA/irdma: Drop unused kernel push code
  ...
2023-09-01 16:49:33 -07:00
Jakub Kicinski
e3b3a87967 bnxt: use the NAPI skb allocation cache
All callers of build_skb() (*) in bnxt are in NAPI context.
The budget checking is somewhat convoluted because in the shared
completion queue cases Rx packets are discarded by netpoll
by forcing an error (E). But that happens before skb allocation.
Only a call chain starting at __bnxt_poll_work() can lead to
an skb allocation and it checks budget (b).

* bnxt_rx_multi_page_skb
* bnxt_rx_skb
  ` bp->rx_skb_func
  * bnxt_tpa_end
    ` bnxt_rx_pkt
      E bnxt_force_rx_discard
      E bnxt_poll_nitroa0
      b __bnxt_poll_work

Use napi_build_skb() to take advantage of the skb cache.
In iperf tests with HW-GRO enabled it barely makes a difference
but in cases where HW-GRO is not as effective (or disabled) it
can give even a >10% boost (20.7Gbps -> 23.1Gbps).

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-23 11:45:54 +01:00
Michael Chan
8becd1961c bnxt_en: Add tx_resets ring counter
Add a new tx_resets ring counter.  This counter will be saved as
tx_total_resets across any reset.  Since we currently do a full reset
in bnxt_sched_reset_txr(), the per ring counter will always be cleared
during reset.  Only the tx_total_resets count will be meaningful and we
only display this under ethtool -S.

Link: https://lore.kernel.org/netdev/CACKFLimD-bKmJ1tGZOLYRjWzEwxkri-Mw7iFme1x2Dr0twdCeg@mail.gmail.com/
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20230817231911.165035-7-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-18 19:13:59 -07:00
Michael Chan
a080b47a04 bnxt_en: Display the ring error counters under ethtool -S
The existing driver displays the sum of 4 ring counters under ethtool -S.
These counters are in the array bnxt_sw_func_stats.  These counters are
summed at the time of ethtool -S and will be lost when the device is reset.

Replace these counters with the new total ring error counters added in the
last patch.  These new counters are saved before reset.  ethtool -S will
now display the sum of the saved counters plus the current counters.

Link: https://lore.kernel.org/netdev/CACKFLimD-bKmJ1tGZOLYRjWzEwxkri-Mw7iFme1x2Dr0twdCeg@mail.gmail.com/
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20230817231911.165035-6-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-18 19:13:59 -07:00
Michael Chan
4c70dbe3c0 bnxt_en: Save ring error counters across reset
Currently, the ring counters are stored in the per ring datastructure.
During reset, all the rings are freed together with the associated
datastructures.  As a result, all the ring error counters will be reset
to zero.

Add logic to keep track of the total error counts of all the rings
and save them before reset (including ifdown).  The next patch will
display these total ring error counters under ethtool -S.

Link: https://lore.kernel.org/netdev/CACKFLimD-bKmJ1tGZOLYRjWzEwxkri-Mw7iFme1x2Dr0twdCeg@mail.gmail.com/
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20230817231911.165035-5-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-18 19:13:58 -07:00
Michael Chan
d38c19b13b bnxt_en: Increment rx_resets counter in bnxt_disable_napi()
If we are doing a complete reset with irq_re_init set to true in
bnxt_close_nic(), all the ring structures will be freed.  New
structures will be allocated in bnxt_open_nic().  The current code
increments rx_resets counter in bnxt_enable_napi() if bnapi->in_reset
is true.  In a complete reset, bnapi->in_reset will never be true
since the structure is just allocated.

Increment the rx_resets counter in bnxt_disable_napi() instead.  This
will allow us to save all the ring error counters including the
rx_resets counters in the next patch.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20230817231911.165035-4-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-18 19:13:58 -07:00
Somnath Kotur
578fcfd26e bnxt_en: Let the page pool manage the DMA mapping
Use the page pool's ability to maintain DMA mappings for us.
This avoids re-mapping of the recycled pages.

Link: https://lore.kernel.org/netdev/20230728231829.235716-4-michael.chan@broadcom.com/
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20230817231911.165035-3-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-18 19:13:58 -07:00
Somnath Kotur
86b05508f7 bnxt_en: Use the unified RX page pool buffers for XDP and non-XDP
Convert to use the page pool buffers for the aggregation ring when
running in non-XDP mode.  This simplifies the driver and we benefit
from the recycling of pages.  Adjust the page pool size to account
for the aggregation ring size.

Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20230817231911.165035-2-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-18 19:13:58 -07:00