linux/net/xdp
Jason Xing 45e359be1c net: xsk: introduce XDP_MAX_TX_SKB_BUDGET setsockopt
This patch provides a setsockopt method to let applications leverage to
adjust how many descs to be handled at most in one send syscall. It
mitigates the situation where the default value (32) that is too small
leads to higher frequency of triggering send syscall.

Considering the prosperity/complexity the applications have, there is no
absolutely ideal suggestion fitting all cases. So keep 32 as its default
value like before.

The patch does the following things:
- Add XDP_MAX_TX_SKB_BUDGET socket option.
- Set max_tx_budget to 32 by default in the initialization phase as a
  per-socket granular control.
- Set the range of max_tx_budget as [32, xs->tx->nentries].

The idea behind this comes out of real workloads in production. We use a
user-level stack with xsk support to accelerate sending packets and
minimize triggering syscalls. When the packets are aggregated, it's not
hard to hit the upper bound (namely, 32). The moment user-space stack
fetches the -EAGAIN error number passed from sendto(), it will loop to try
again until all the expected descs from tx ring are sent out to the driver.
Enlarging the XDP_MAX_TX_SKB_BUDGET value contributes to less frequency of
sendto() and higher throughput/PPS.

Here is what I did in production, along with some numbers as follows:
For one application I saw lately, I suggested using 128 as max_tx_budget
because I saw two limitations without changing any default configuration:
1) XDP_MAX_TX_SKB_BUDGET, 2) socket sndbuf which is 212992 decided by
net.core.wmem_default. As to XDP_MAX_TX_SKB_BUDGET, the scenario behind
this was I counted how many descs are transmitted to the driver at one
time of sendto() based on [1] patch and then I calculated the
possibility of hitting the upper bound. Finally I chose 128 as a
suitable value because 1) it covers most of the cases, 2) a higher
number would not bring evident results. After twisting the parameters,
a stable improvement of around 4% for both PPS and throughput and less
resources consumption were found to be observed by strace -c -p xxx:
1) %time was decreased by 7.8%
2) error counter was decreased from 18367 to 572

[1]: https://lore.kernel.org/all/20250619093641.70700-1-kerneljasonxing@gmail.com/

Signed-off-by: Jason Xing <kernelxing@tencent.com>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://patch.msgid.link/20250704160138.48677-1-kerneljasonxing@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-10 14:48:29 +02:00
..
Kconfig
Makefile
xdp_umem.c xsk: Require XDP_UMEM_TX_METADATA_LEN to actuate tx_metadata_len 2024-07-25 11:57:27 +02:00
xdp_umem.h
xsk.c net: xsk: introduce XDP_MAX_TX_SKB_BUDGET setsockopt 2025-07-10 14:48:29 +02:00
xsk.h xdp: Add proper __rcu annotations to redirect map entries 2021-06-24 19:41:15 +02:00
xsk_buff_pool.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2025-05-01 15:11:38 -07:00
xsk_diag.c net: remove sock_i_uid() 2025-06-23 17:04:03 -07:00
xsk_queue.c xdp: Fix zero-size allocation warning in xskq_create() 2023-10-09 16:13:29 +02:00
xsk_queue.h xsk: Carry a copy of xdp_zc_max_segs within xsk_buff_pool 2024-10-14 17:23:30 +02:00
xskmap.c xsk: fix OOB map writes when deleting elements 2024-11-25 14:25:48 -08:00