linux

mirror of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2025-08-05 16:54:27 +00:00

Author	SHA1	Message	Date
Vladimir Oltean	cf52bd238b	net: enetc: add support for MAC Merge statistics counters Add PF driver support for the following: - Viewing the standardized MAC Merge layer counters. - Viewing the standardized Ethernet MAC and RMON counters associated with the pMAC. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20230206094531.444988-2-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-07 20:13:55 -08:00
Vladimir Oltean	c7b9e80869	net: enetc: add support for MAC Merge layer Add PF driver support for viewing and changing the MAC Merge sublayer parameters, and seeing the verification state machine's current state. The verification handshake with the link partner is driven by hardware. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20230206094531.444988-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-07 20:13:55 -08:00
Jakub Kicinski	cc74ca303a	Merge branch 'sched-cpumask-improve-on-cpumask_local_spread-locality' Yury Norov says: ==================== sched: cpumask: improve on cpumask_local_spread() locality cpumask_local_spread() currently checks local node for presence of i'th CPU, and then if it finds nothing makes a flat search among all non-local CPUs. We can do it better by checking CPUs per NUMA hops. This has significant performance implications on NUMA machines, for example when using NUMA-aware allocated memory together with NUMA-aware IRQ affinity hints. Performance tests from patch 8 of this series for mellanox network driver show: TCP multi-stream, using 16 iperf3 instances pinned to 16 cores (with aRFS on). Active cores: 64,65,72,73,80,81,88,89,96,97,104,105,112,113,120,121 +-------------------------+-----------+------------------+------------------+ \| \| BW (Gbps) \| TX side CPU util \| RX side CPU util \| +-------------------------+-----------+------------------+------------------+ \| Baseline \| 52.3 \| 6.4 % \| 17.9 % \| +-------------------------+-----------+------------------+------------------+ \| Applied on TX side only \| 52.6 \| 5.2 % \| 18.5 % \| +-------------------------+-----------+------------------+------------------+ \| Applied on RX side only \| 94.9 \| 11.9 % \| 27.2 % \| +-------------------------+-----------+------------------+------------------+ \| Applied on both sides \| 95.1 \| 8.4 % \| 27.3 % \| +-------------------------+-----------+------------------+------------------+ Bottleneck in RX side is released, reached linerate (~1.8x speedup). ~30% less cpu util on TX. ==================== Link: https://lore.kernel.org/r/20230121042436.2661843-1-yury.norov@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-07 18:20:03 -08:00
Yury Norov	2ac4980c57	lib/cpumask: update comment for cpumask_local_spread() Now that we have an iterator-based alternative for a very common case of using cpumask_local_spread for all cpus in a row, it's worth to mention that in comment to cpumask_local_spread(). Signed-off-by: Yury Norov <yury.norov@gmail.com> Reviewed-by: Valentin Schneider <vschneid@redhat.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-07 18:20:00 -08:00
Tariq Toukan	2acda57736	net/mlx5e: Improve remote NUMA preferences used for the IRQ affinity hints In the IRQ affinity hints, replace the binary NUMA preference (local / remote) with the improved for_each_numa_hop_cpu() API that minds the actual distances, so that remote NUMAs with short distance are preferred over farther ones. This has significant performance implications when using NUMA-aware allocated memory (follow [1] and derivatives for example). [1] drivers/net/ethernet/mellanox/mlx5/core/en_main.c :: mlx5e_open_channel() int cpu = cpumask_first(mlx5_comp_irq_get_affinity_mask(priv->mdev, ix)); Performance tests: TCP multi-stream, using 16 iperf3 instances pinned to 16 cores (with aRFS on). Active cores: 64,65,72,73,80,81,88,89,96,97,104,105,112,113,120,121 +-------------------------+-----------+------------------+------------------+ \| \| BW (Gbps) \| TX side CPU util \| RX side CPU util \| +-------------------------+-----------+------------------+------------------+ \| Baseline \| 52.3 \| 6.4 % \| 17.9 % \| +-------------------------+-----------+------------------+------------------+ \| Applied on TX side only \| 52.6 \| 5.2 % \| 18.5 % \| +-------------------------+-----------+------------------+------------------+ \| Applied on RX side only \| 94.9 \| 11.9 % \| 27.2 % \| +-------------------------+-----------+------------------+------------------+ \| Applied on both sides \| 95.1 \| 8.4 % \| 27.3 % \| +-------------------------+-----------+------------------+------------------+ Bottleneck in RX side is released, reached linerate (~1.8x speedup). ~30% less cpu util on TX. * CPU util on active cores only. Setups details (similar for both sides): NIC: ConnectX6-DX dual port, 100 Gbps each. Single port used in the tests. $ lscpu Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian CPU(s): 256 On-line CPU(s) list: 0-255 Thread(s) per core: 2 Core(s) per socket: 64 Socket(s): 2 NUMA node(s): 16 Vendor ID: AuthenticAMD CPU family: 25 Model: 1 Model name: AMD EPYC 7763 64-Core Processor Stepping: 1 CPU MHz: 2594.804 BogoMIPS: 4890.73 Virtualization: AMD-V L1d cache: 32K L1i cache: 32K L2 cache: 512K L3 cache: 32768K NUMA node0 CPU(s): 0-7,128-135 NUMA node1 CPU(s): 8-15,136-143 NUMA node2 CPU(s): 16-23,144-151 NUMA node3 CPU(s): 24-31,152-159 NUMA node4 CPU(s): 32-39,160-167 NUMA node5 CPU(s): 40-47,168-175 NUMA node6 CPU(s): 48-55,176-183 NUMA node7 CPU(s): 56-63,184-191 NUMA node8 CPU(s): 64-71,192-199 NUMA node9 CPU(s): 72-79,200-207 NUMA node10 CPU(s): 80-87,208-215 NUMA node11 CPU(s): 88-95,216-223 NUMA node12 CPU(s): 96-103,224-231 NUMA node13 CPU(s): 104-111,232-239 NUMA node14 CPU(s): 112-119,240-247 NUMA node15 CPU(s): 120-127,248-255 .. $ numactl -H .. node distances: node 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 0: 10 11 11 11 12 12 12 12 32 32 32 32 32 32 32 32 1: 11 10 11 11 12 12 12 12 32 32 32 32 32 32 32 32 2: 11 11 10 11 12 12 12 12 32 32 32 32 32 32 32 32 3: 11 11 11 10 12 12 12 12 32 32 32 32 32 32 32 32 4: 12 12 12 12 10 11 11 11 32 32 32 32 32 32 32 32 5: 12 12 12 12 11 10 11 11 32 32 32 32 32 32 32 32 6: 12 12 12 12 11 11 10 11 32 32 32 32 32 32 32 32 7: 12 12 12 12 11 11 11 10 32 32 32 32 32 32 32 32 8: 32 32 32 32 32 32 32 32 10 11 11 11 12 12 12 12 9: 32 32 32 32 32 32 32 32 11 10 11 11 12 12 12 12 10: 32 32 32 32 32 32 32 32 11 11 10 11 12 12 12 12 11: 32 32 32 32 32 32 32 32 11 11 11 10 12 12 12 12 12: 32 32 32 32 32 32 32 32 12 12 12 12 10 11 11 11 13: 32 32 32 32 32 32 32 32 12 12 12 12 11 10 11 11 14: 32 32 32 32 32 32 32 32 12 12 12 12 11 11 10 11 15: 32 32 32 32 32 32 32 32 12 12 12 12 11 11 11 10 $ cat /sys/class/net/ens5f0/device/numa_node 14 Affinity hints (127 IRQs): Before: 331: 00000000,00000000,00000000,00000000,00010000,00000000,00000000,00000000 332: 00000000,00000000,00000000,00000000,00020000,00000000,00000000,00000000 333: 00000000,00000000,00000000,00000000,00040000,00000000,00000000,00000000 334: 00000000,00000000,00000000,00000000,00080000,00000000,00000000,00000000 335: 00000000,00000000,00000000,00000000,00100000,00000000,00000000,00000000 336: 00000000,00000000,00000000,00000000,00200000,00000000,00000000,00000000 337: 00000000,00000000,00000000,00000000,00400000,00000000,00000000,00000000 338: 00000000,00000000,00000000,00000000,00800000,00000000,00000000,00000000 339: 00010000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 340: 00020000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 341: 00040000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 342: 00080000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 343: 00100000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 344: 00200000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 345: 00400000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 346: 00800000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 347: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001 348: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000002 349: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000004 350: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000008 351: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000010 352: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000020 353: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000040 354: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000080 355: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000100 356: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000200 357: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000400 358: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000800 359: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00001000 360: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00002000 361: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00004000 362: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00008000 363: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00010000 364: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00020000 365: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00040000 366: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00080000 367: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00100000 368: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00200000 369: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00400000 370: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00800000 371: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,01000000 372: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,02000000 373: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,04000000 374: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,08000000 375: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,10000000 376: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,20000000 377: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,40000000 378: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,80000000 379: 00000000,00000000,00000000,00000000,00000000,00000000,00000001,00000000 380: 00000000,00000000,00000000,00000000,00000000,00000000,00000002,00000000 381: 00000000,00000000,00000000,00000000,00000000,00000000,00000004,00000000 382: 00000000,00000000,00000000,00000000,00000000,00000000,00000008,00000000 383: 00000000,00000000,00000000,00000000,00000000,00000000,00000010,00000000 384: 00000000,00000000,00000000,00000000,00000000,00000000,00000020,00000000 385: 00000000,00000000,00000000,00000000,00000000,00000000,00000040,00000000 386: 00000000,00000000,00000000,00000000,00000000,00000000,00000080,00000000 387: 00000000,00000000,00000000,00000000,00000000,00000000,00000100,00000000 388: 00000000,00000000,00000000,00000000,00000000,00000000,00000200,00000000 389: 00000000,00000000,00000000,00000000,00000000,00000000,00000400,00000000 390: 00000000,00000000,00000000,00000000,00000000,00000000,00000800,00000000 391: 00000000,00000000,00000000,00000000,00000000,00000000,00001000,00000000 392: 00000000,00000000,00000000,00000000,00000000,00000000,00002000,00000000 393: 00000000,00000000,00000000,00000000,00000000,00000000,00004000,00000000 394: 00000000,00000000,00000000,00000000,00000000,00000000,00008000,00000000 395: 00000000,00000000,00000000,00000000,00000000,00000000,00010000,00000000 396: 00000000,00000000,00000000,00000000,00000000,00000000,00020000,00000000 397: 00000000,00000000,00000000,00000000,00000000,00000000,00040000,00000000 398: 00000000,00000000,00000000,00000000,00000000,00000000,00080000,00000000 399: 00000000,00000000,00000000,00000000,00000000,00000000,00100000,00000000 400: 00000000,00000000,00000000,00000000,00000000,00000000,00200000,00000000 401: 00000000,00000000,00000000,00000000,00000000,00000000,00400000,00000000 402: 00000000,00000000,00000000,00000000,00000000,00000000,00800000,00000000 403: 00000000,00000000,00000000,00000000,00000000,00000000,01000000,00000000 404: 00000000,00000000,00000000,00000000,00000000,00000000,02000000,00000000 405: 00000000,00000000,00000000,00000000,00000000,00000000,04000000,00000000 406: 00000000,00000000,00000000,00000000,00000000,00000000,08000000,00000000 407: 00000000,00000000,00000000,00000000,00000000,00000000,10000000,00000000 408: 00000000,00000000,00000000,00000000,00000000,00000000,20000000,00000000 409: 00000000,00000000,00000000,00000000,00000000,00000000,40000000,00000000 410: 00000000,00000000,00000000,00000000,00000000,00000000,80000000,00000000 411: 00000000,00000000,00000000,00000000,00000000,00000001,00000000,00000000 412: 00000000,00000000,00000000,00000000,00000000,00000002,00000000,00000000 413: 00000000,00000000,00000000,00000000,00000000,00000004,00000000,00000000 414: 00000000,00000000,00000000,00000000,00000000,00000008,00000000,00000000 415: 00000000,00000000,00000000,00000000,00000000,00000010,00000000,00000000 416: 00000000,00000000,00000000,00000000,00000000,00000020,00000000,00000000 417: 00000000,00000000,00000000,00000000,00000000,00000040,00000000,00000000 418: 00000000,00000000,00000000,00000000,00000000,00000080,00000000,00000000 419: 00000000,00000000,00000000,00000000,00000000,00000100,00000000,00000000 420: 00000000,00000000,00000000,00000000,00000000,00000200,00000000,00000000 421: 00000000,00000000,00000000,00000000,00000000,00000400,00000000,00000000 422: 00000000,00000000,00000000,00000000,00000000,00000800,00000000,00000000 423: 00000000,00000000,00000000,00000000,00000000,00001000,00000000,00000000 424: 00000000,00000000,00000000,00000000,00000000,00002000,00000000,00000000 425: 00000000,00000000,00000000,00000000,00000000,00004000,00000000,00000000 426: 00000000,00000000,00000000,00000000,00000000,00008000,00000000,00000000 427: 00000000,00000000,00000000,00000000,00000000,00010000,00000000,00000000 428: 00000000,00000000,00000000,00000000,00000000,00020000,00000000,00000000 429: 00000000,00000000,00000000,00000000,00000000,00040000,00000000,00000000 430: 00000000,00000000,00000000,00000000,00000000,00080000,00000000,00000000 431: 00000000,00000000,00000000,00000000,00000000,00100000,00000000,00000000 432: 00000000,00000000,00000000,00000000,00000000,00200000,00000000,00000000 433: 00000000,00000000,00000000,00000000,00000000,00400000,00000000,00000000 434: 00000000,00000000,00000000,00000000,00000000,00800000,00000000,00000000 435: 00000000,00000000,00000000,00000000,00000000,01000000,00000000,00000000 436: 00000000,00000000,00000000,00000000,00000000,02000000,00000000,00000000 437: 00000000,00000000,00000000,00000000,00000000,04000000,00000000,00000000 438: 00000000,00000000,00000000,00000000,00000000,08000000,00000000,00000000 439: 00000000,00000000,00000000,00000000,00000000,10000000,00000000,00000000 440: 00000000,00000000,00000000,00000000,00000000,20000000,00000000,00000000 441: 00000000,00000000,00000000,00000000,00000000,40000000,00000000,00000000 442: 00000000,00000000,00000000,00000000,00000000,80000000,00000000,00000000 443: 00000000,00000000,00000000,00000000,00000001,00000000,00000000,00000000 444: 00000000,00000000,00000000,00000000,00000002,00000000,00000000,00000000 445: 00000000,00000000,00000000,00000000,00000004,00000000,00000000,00000000 446: 00000000,00000000,00000000,00000000,00000008,00000000,00000000,00000000 447: 00000000,00000000,00000000,00000000,00000010,00000000,00000000,00000000 448: 00000000,00000000,00000000,00000000,00000020,00000000,00000000,00000000 449: 00000000,00000000,00000000,00000000,00000040,00000000,00000000,00000000 450: 00000000,00000000,00000000,00000000,00000080,00000000,00000000,00000000 451: 00000000,00000000,00000000,00000000,00000100,00000000,00000000,00000000 452: 00000000,00000000,00000000,00000000,00000200,00000000,00000000,00000000 453: 00000000,00000000,00000000,00000000,00000400,00000000,00000000,00000000 454: 00000000,00000000,00000000,00000000,00000800,00000000,00000000,00000000 455: 00000000,00000000,00000000,00000000,00001000,00000000,00000000,00000000 456: 00000000,00000000,00000000,00000000,00002000,00000000,00000000,00000000 457: 00000000,00000000,00000000,00000000,00004000,00000000,00000000,00000000 After: 331: 00000000,00000000,00000000,00000000,00010000,00000000,00000000,00000000 332: 00000000,00000000,00000000,00000000,00020000,00000000,00000000,00000000 333: 00000000,00000000,00000000,00000000,00040000,00000000,00000000,00000000 334: 00000000,00000000,00000000,00000000,00080000,00000000,00000000,00000000 335: 00000000,00000000,00000000,00000000,00100000,00000000,00000000,00000000 336: 00000000,00000000,00000000,00000000,00200000,00000000,00000000,00000000 337: 00000000,00000000,00000000,00000000,00400000,00000000,00000000,00000000 338: 00000000,00000000,00000000,00000000,00800000,00000000,00000000,00000000 339: 00010000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 340: 00020000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 341: 00040000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 342: 00080000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 343: 00100000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 344: 00200000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 345: 00400000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 346: 00800000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 347: 00000000,00000000,00000000,00000000,00000001,00000000,00000000,00000000 348: 00000000,00000000,00000000,00000000,00000002,00000000,00000000,00000000 349: 00000000,00000000,00000000,00000000,00000004,00000000,00000000,00000000 350: 00000000,00000000,00000000,00000000,00000008,00000000,00000000,00000000 351: 00000000,00000000,00000000,00000000,00000010,00000000,00000000,00000000 352: 00000000,00000000,00000000,00000000,00000020,00000000,00000000,00000000 353: 00000000,00000000,00000000,00000000,00000040,00000000,00000000,00000000 354: 00000000,00000000,00000000,00000000,00000080,00000000,00000000,00000000 355: 00000000,00000000,00000000,00000000,00000100,00000000,00000000,00000000 356: 00000000,00000000,00000000,00000000,00000200,00000000,00000000,00000000 357: 00000000,00000000,00000000,00000000,00000400,00000000,00000000,00000000 358: 00000000,00000000,00000000,00000000,00000800,00000000,00000000,00000000 359: 00000000,00000000,00000000,00000000,00001000,00000000,00000000,00000000 360: 00000000,00000000,00000000,00000000,00002000,00000000,00000000,00000000 361: 00000000,00000000,00000000,00000000,00004000,00000000,00000000,00000000 362: 00000000,00000000,00000000,00000000,00008000,00000000,00000000,00000000 363: 00000000,00000000,00000000,00000000,01000000,00000000,00000000,00000000 364: 00000000,00000000,00000000,00000000,02000000,00000000,00000000,00000000 365: 00000000,00000000,00000000,00000000,04000000,00000000,00000000,00000000 366: 00000000,00000000,00000000,00000000,08000000,00000000,00000000,00000000 367: 00000000,00000000,00000000,00000000,10000000,00000000,00000000,00000000 368: 00000000,00000000,00000000,00000000,20000000,00000000,00000000,00000000 369: 00000000,00000000,00000000,00000000,40000000,00000000,00000000,00000000 370: 00000000,00000000,00000000,00000000,80000000,00000000,00000000,00000000 371: 00000001,00000000,00000000,00000000,00000000,00000000,00000000,00000000 372: 00000002,00000000,00000000,00000000,00000000,00000000,00000000,00000000 373: 00000004,00000000,00000000,00000000,00000000,00000000,00000000,00000000 374: 00000008,00000000,00000000,00000000,00000000,00000000,00000000,00000000 375: 00000010,00000000,00000000,00000000,00000000,00000000,00000000,00000000 376: 00000020,00000000,00000000,00000000,00000000,00000000,00000000,00000000 377: 00000040,00000000,00000000,00000000,00000000,00000000,00000000,00000000 378: 00000080,00000000,00000000,00000000,00000000,00000000,00000000,00000000 379: 00000100,00000000,00000000,00000000,00000000,00000000,00000000,00000000 380: 00000200,00000000,00000000,00000000,00000000,00000000,00000000,00000000 381: 00000400,00000000,00000000,00000000,00000000,00000000,00000000,00000000 382: 00000800,00000000,00000000,00000000,00000000,00000000,00000000,00000000 383: 00001000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 384: 00002000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 385: 00004000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 386: 00008000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 387: 01000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 388: 02000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 389: 04000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 390: 08000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 391: 10000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 392: 20000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 393: 40000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 394: 80000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 395: 00000000,00000000,00000000,00000000,00000000,00000001,00000000,00000000 396: 00000000,00000000,00000000,00000000,00000000,00000002,00000000,00000000 397: 00000000,00000000,00000000,00000000,00000000,00000004,00000000,00000000 398: 00000000,00000000,00000000,00000000,00000000,00000008,00000000,00000000 399: 00000000,00000000,00000000,00000000,00000000,00000010,00000000,00000000 400: 00000000,00000000,00000000,00000000,00000000,00000020,00000000,00000000 401: 00000000,00000000,00000000,00000000,00000000,00000040,00000000,00000000 402: 00000000,00000000,00000000,00000000,00000000,00000080,00000000,00000000 403: 00000000,00000000,00000000,00000000,00000000,00000100,00000000,00000000 404: 00000000,00000000,00000000,00000000,00000000,00000200,00000000,00000000 405: 00000000,00000000,00000000,00000000,00000000,00000400,00000000,00000000 406: 00000000,00000000,00000000,00000000,00000000,00000800,00000000,00000000 407: 00000000,00000000,00000000,00000000,00000000,00001000,00000000,00000000 408: 00000000,00000000,00000000,00000000,00000000,00002000,00000000,00000000 409: 00000000,00000000,00000000,00000000,00000000,00004000,00000000,00000000 410: 00000000,00000000,00000000,00000000,00000000,00008000,00000000,00000000 411: 00000000,00000000,00000000,00000000,00000000,00010000,00000000,00000000 412: 00000000,00000000,00000000,00000000,00000000,00020000,00000000,00000000 413: 00000000,00000000,00000000,00000000,00000000,00040000,00000000,00000000 414: 00000000,00000000,00000000,00000000,00000000,00080000,00000000,00000000 415: 00000000,00000000,00000000,00000000,00000000,00100000,00000000,00000000 416: 00000000,00000000,00000000,00000000,00000000,00200000,00000000,00000000 417: 00000000,00000000,00000000,00000000,00000000,00400000,00000000,00000000 418: 00000000,00000000,00000000,00000000,00000000,00800000,00000000,00000000 419: 00000000,00000000,00000000,00000000,00000000,01000000,00000000,00000000 420: 00000000,00000000,00000000,00000000,00000000,02000000,00000000,00000000 421: 00000000,00000000,00000000,00000000,00000000,04000000,00000000,00000000 422: 00000000,00000000,00000000,00000000,00000000,08000000,00000000,00000000 423: 00000000,00000000,00000000,00000000,00000000,10000000,00000000,00000000 424: 00000000,00000000,00000000,00000000,00000000,20000000,00000000,00000000 425: 00000000,00000000,00000000,00000000,00000000,40000000,00000000,00000000 426: 00000000,00000000,00000000,00000000,00000000,80000000,00000000,00000000 427: 00000000,00000001,00000000,00000000,00000000,00000000,00000000,00000000 428: 00000000,00000002,00000000,00000000,00000000,00000000,00000000,00000000 429: 00000000,00000004,00000000,00000000,00000000,00000000,00000000,00000000 430: 00000000,00000008,00000000,00000000,00000000,00000000,00000000,00000000 431: 00000000,00000010,00000000,00000000,00000000,00000000,00000000,00000000 432: 00000000,00000020,00000000,00000000,00000000,00000000,00000000,00000000 433: 00000000,00000040,00000000,00000000,00000000,00000000,00000000,00000000 434: 00000000,00000080,00000000,00000000,00000000,00000000,00000000,00000000 435: 00000000,00000100,00000000,00000000,00000000,00000000,00000000,00000000 436: 00000000,00000200,00000000,00000000,00000000,00000000,00000000,00000000 437: 00000000,00000400,00000000,00000000,00000000,00000000,00000000,00000000 438: 00000000,00000800,00000000,00000000,00000000,00000000,00000000,00000000 439: 00000000,00001000,00000000,00000000,00000000,00000000,00000000,00000000 440: 00000000,00002000,00000000,00000000,00000000,00000000,00000000,00000000 441: 00000000,00004000,00000000,00000000,00000000,00000000,00000000,00000000 442: 00000000,00008000,00000000,00000000,00000000,00000000,00000000,00000000 443: 00000000,00010000,00000000,00000000,00000000,00000000,00000000,00000000 444: 00000000,00020000,00000000,00000000,00000000,00000000,00000000,00000000 445: 00000000,00040000,00000000,00000000,00000000,00000000,00000000,00000000 446: 00000000,00080000,00000000,00000000,00000000,00000000,00000000,00000000 447: 00000000,00100000,00000000,00000000,00000000,00000000,00000000,00000000 448: 00000000,00200000,00000000,00000000,00000000,00000000,00000000,00000000 449: 00000000,00400000,00000000,00000000,00000000,00000000,00000000,00000000 450: 00000000,00800000,00000000,00000000,00000000,00000000,00000000,00000000 451: 00000000,01000000,00000000,00000000,00000000,00000000,00000000,00000000 452: 00000000,02000000,00000000,00000000,00000000,00000000,00000000,00000000 453: 00000000,04000000,00000000,00000000,00000000,00000000,00000000,00000000 454: 00000000,08000000,00000000,00000000,00000000,00000000,00000000,00000000 455: 00000000,10000000,00000000,00000000,00000000,00000000,00000000,00000000 456: 00000000,20000000,00000000,00000000,00000000,00000000,00000000,00000000 457: 00000000,40000000,00000000,00000000,00000000,00000000,00000000,00000000 Signed-off-by: Tariq Toukan <tariqt@nvidia.com> [Tweaked API use] Suggested-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Valentin Schneider <vschneid@redhat.com> Reviewed-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-07 18:20:00 -08:00
Valentin Schneider	06ac01721f	sched/topology: Introduce for_each_numa_hop_mask() The recently introduced sched_numa_hop_mask() exposes cpumasks of CPUs reachable within a given distance budget, wrap the logic for iterating over all (distance, mask) values inside an iterator macro. Signed-off-by: Valentin Schneider <vschneid@redhat.com> Reviewed-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-07 18:20:00 -08:00
Valentin Schneider	9feae65845	sched/topology: Introduce sched_numa_hop_mask() Tariq has pointed out that drivers allocating IRQ vectors would benefit from having smarter NUMA-awareness - cpumask_local_spread() only knows about the local node and everything outside is in the same bucket. sched_domains_numa_masks is pretty much what we want to hand out (a cpumask of CPUs reachable within a given distance budget), introduce sched_numa_hop_mask() to export those cpumasks. Link: http://lore.kernel.org/r/20220728191203.4055-1-tariqt@nvidia.com Signed-off-by: Valentin Schneider <vschneid@redhat.com> Reviewed-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-07 18:20:00 -08:00
Yury Norov	b1beed72b8	lib/cpumask: reorganize cpumask_local_spread() logic Now after moving all NUMA logic into sched_numa_find_nth_cpu(), else-branch of cpumask_local_spread() is just a function call, and we can simplify logic by using ternary operator. While here, replace BUG() with WARN_ON(). Signed-off-by: Yury Norov <yury.norov@gmail.com> Acked-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Peter Lafreniere <peter@n8pjl.ca> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-07 18:20:00 -08:00
Yury Norov	406d394abf	cpumask: improve on cpumask_local_spread() locality Switch cpumask_local_spread() to use newly added sched_numa_find_nth_cpu(), which takes into account distances to each node in the system. For the following NUMA configuration: root@debian:~# numactl -H available: 4 nodes (0-3) node 0 cpus: 0 1 2 3 node 0 size: 3869 MB node 0 free: 3740 MB node 1 cpus: 4 5 node 1 size: 1969 MB node 1 free: 1937 MB node 2 cpus: 6 7 node 2 size: 1967 MB node 2 free: 1873 MB node 3 cpus: 8 9 10 11 12 13 14 15 node 3 size: 7842 MB node 3 free: 7723 MB node distances: node 0 1 2 3 0: 10 50 30 70 1: 50 10 70 30 2: 30 70 10 50 3: 70 30 50 10 The new cpumask_local_spread() traverses cpus for each node like this: node 0: 0 1 2 3 6 7 4 5 8 9 10 11 12 13 14 15 node 1: 4 5 8 9 10 11 12 13 14 15 0 1 2 3 6 7 node 2: 6 7 0 1 2 3 8 9 10 11 12 13 14 15 4 5 node 3: 8 9 10 11 12 13 14 15 4 5 6 7 0 1 2 3 Signed-off-by: Yury Norov <yury.norov@gmail.com> Acked-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Peter Lafreniere <peter@n8pjl.ca> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-07 18:20:00 -08:00
Yury Norov	cd7f55359c	sched: add sched_numa_find_nth_cpu() The function finds Nth set CPU in a given cpumask starting from a given node. Leveraging the fact that each hop in sched_domains_numa_masks includes the same or greater number of CPUs than the previous one, we can use binary search on hops instead of linear walk, which makes the overall complexity of O(log n) in terms of number of cpumask_weight() calls. Signed-off-by: Yury Norov <yury.norov@gmail.com> Acked-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Peter Lafreniere <peter@n8pjl.ca> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-07 18:20:00 -08:00
Yury Norov	62f4386e56	cpumask: introduce cpumask_nth_and_andnot Introduce cpumask_nth_and_andnot() based on find_nth_and_andnot_bit(). It's used in the following patch to traverse cpumasks without storing intermediate result in temporary cpumask. Signed-off-by: Yury Norov <yury.norov@gmail.com> Acked-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Peter Lafreniere <peter@n8pjl.ca> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-07 18:20:00 -08:00
Yury Norov	4324511780	lib/find: introduce find_nth_and_andnot_bit In the following patches the function is used to implement in-place bitmaps traversing without storing intermediate result in temporary bitmaps. Signed-off-by: Yury Norov <yury.norov@gmail.com> Acked-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Peter Lafreniere <peter@n8pjl.ca> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-07 18:20:00 -08:00
Jakub Kicinski	383d9f87a0	Merge branch 'net-core-use-a-dedicated-kmem_cache-for-skb-head-allocs' Eric Dumazet says: ==================== net: core: use a dedicated kmem_cache for skb head allocs Our profile data show that using kmalloc(non_const_size)/kfree(ptr) has a certain cost, because kfree(ptr) has to pull a 'struct page' in cpu caches. Using a dedicated kmem_cache for TCP skb->head allocations makes a difference, both in cpu cycles and memory savings. This kmem_cache could also be used for GRO skb allocations, this is left as a future exercise. ==================== Link: https://lore.kernel.org/r/20230206173103.2617121-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-07 11:00:03 -08:00
Eric Dumazet	bf9f1baa27	net: add dedicated kmem_cache for typical/small skb->head Recent removal of ksize() in alloc_skb() increased performance because we no longer read the associated struct page. We have an equivalent cost at kfree_skb() time. kfree(skb->head) has to access a struct page, often cold in cpu caches to get the owning struct kmem_cache. Considering that many allocations are small (at least for TCP ones) we can have our own kmem_cache to avoid the cache line miss. This also saves memory because these small heads are no longer padded to 1024 bytes. CONFIG_SLUB=y $ grep skbuff_small_head /proc/slabinfo skbuff_small_head 2907 2907 640 51 8 : tunables 0 0 0 : slabdata 57 57 0 CONFIG_SLAB=y $ grep skbuff_small_head /proc/slabinfo skbuff_small_head 607 624 640 6 1 : tunables 54 27 8 : slabdata 104 104 5 Notes: - After Kees Cook patches and this one, we might be able to revert commit `dbae2b0628` ("net: skb: introduce and use a single page frag cache") because GRO_MAX_HEAD is also small. - This patch is a NOP for CONFIG_SLOB=y builds. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-07 10:59:58 -08:00
Eric Dumazet	5c0e820cbb	net: factorize code in kmalloc_reserve() All kmalloc_reserve() callers have to make the same computation, we can factorize them, to prepare following patch in the series. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-07 10:59:55 -08:00
Eric Dumazet	65998d2bf8	net: remove osize variable in __alloc_skb() This is a cleanup patch, to prepare following change. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-07 10:59:52 -08:00
Eric Dumazet	115f1a5c42	net: add SKB_HEAD_ALIGN() helper We have many places using this expression: SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) Use of SKB_HEAD_ALIGN() will allow to clean them. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-07 10:59:48 -08:00
Paolo Abeni	61d731e653	linux-can-next-for-6.3-20230206 -----BEGIN PGP SIGNATURE----- iQFHBAABCgAxFiEEDs2BvajyNKlf9TJQvlAcSiqKBOgFAmPg+r4THG1rbEBwZW5n dXRyb25peC5kZQAKCRC+UBxKKooE6A44B/0c8jjyLikVE2fEBKkvPk8iRlFKap48 140HNR+1TuVBBPCJTe2Gq3l8fDfFUgUVnFfVDjAQq6pMWufLkz9jVr5uGN7O94v+ tFCjip8NLRTa6ibgQlGrFXl31X5vqT+jV6V4l29wFgbwgOMRrod7wLDo+bHTtuuY 46Sf4GgIazYjxEeZBBsvjnAbraFq0twZFcEz9SyeOxqjR+9+71X4Y3x0kgQuz72N CrvRng0yFLqbHOBE60iaDqbjuNnwBGTH476pwX25Kj9gVyIH20T45h8tV4j6z2Sq A42cHKDlWZRWUiWQjGOHHT4igUf//Sfv8WDaFjqobO/DDY+nBA75EPAv =hWLT -----END PGP SIGNATURE----- Merge tag 'linux-can-next-for-6.3-20230206' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next Marc Kleine-Budde says: ==================== pull-request: can-next 2023-02-06 this is a pull request of 47 patches for net-next/master. The first two patch is by Oliver Hartkopp. One adds missing error checking to the CAN_GW protocol, the other adds a missing CAN address family check to the CAN ISO TP protocol. Thomas Kopp contributes a performance optimization to the mcp251xfd driver. The next 11 patches are by Geert Uytterhoeven and add support for R-Car V4H systems to the rcar_canfd driver. Stephane Grosjean and Lukas Magel contribute 8 patches to the peak_usb driver, which add support for configurable CAN channel ID. The last 17 patches are by me and target the CAN bit timing configuration. The bit timing is cleaned up, error messages are improved and forwarded to user space via NL_SET_ERR_MSG_FMT() instead of netdev_err(), and the SJW handling is updated, including the definition of a new default value that will benefit CAN-FD controllers, by increasing their oscillator tolerance. * tag 'linux-can-next-for-6.3-20230206' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next: (47 commits) can: bittiming: can_validate_bitrate(): report error via netlink can: bittiming: can_calc_bittiming(): convert from netdev_err() to NL_SET_ERR_MSG_FMT() can: bittiming: can_calc_bittiming(): clean up SJW handling can: bittiming: can_sjw_set_default(): use Phase Seg2 / 2 as default for SJW can: bittiming: can_sjw_check(): check that SJW is not longer than either Phase Buffer Segment can: bittiming: can_sjw_check(): report error via netlink and harmonize error value can: bittiming: can_fixup_bittiming(): report error via netlink and harmonize error value can: bittiming: factor out can_sjw_set_default() and can_sjw_check() can: bittiming: can_changelink() pass extack down callstack can: netlink: can_changelink(): convert from netdev_err() to NL_SET_ERR_MSG_FMT() can: netlink: can_validate(): validate sample point for CAN and CAN-FD can: dev: register_candev(): bail out if both fixed bit rates and bit timing constants are provided can: dev: register_candev(): ensure that bittiming const are valid can: bittiming: can_get_bittiming(): use direct return and remove unneeded else can: bittiming: can_fixup_bittiming(): set effective tq can: bittiming: can_fixup_bittiming(): use CAN_SYNC_SEG instead of 1 can: bittiming(): replace open coded variants of can_bit_time() can: peak_usb: Reorder include directives alphabetically can: peak_usb: align CAN channel ID format in log with sysfs attribute can: peak_usb: export PCAN CAN channel ID as sysfs device attribute ... ==================== Link: https://lore.kernel.org/r/20230206131620.2758724-1-mkl@pengutronix.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-02-07 15:54:09 +01:00
Vladimir Oltean	ca8e4cbff6	ethtool: mm: fix get_mm() return code not propagating to user space If ops->get_mm() returns a non-zero error code, we goto out_complete, but there, we return 0. Fix that to propagate the "ret" variable to the caller. If ops->get_mm() succeeds, it will always return 0. Fixes: `2b30f8291a` ("net: ethtool: add support for MAC Merge layer") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230206094932.446379-1-vladimir.oltean@nxp.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-02-07 15:39:06 +01:00
Eddy Tao	15ea59a0e9	net: openvswitch: reduce cpu_used_mask memory Use actual CPU number instead of hardcoded value to decide the size of 'cpu_used_mask' in 'struct sw_flow'. Below is the reason. 'struct cpumask cpu_used_mask' is embedded in struct sw_flow. Its size is hardcoded to CONFIG_NR_CPUS bits, which can be 8192 by default, it costs memory and slows down ovs_flow_alloc. To address this: Redefine cpu_used_mask to pointer. Append cpumask_size() bytes after 'stat' to hold cpumask. Initialization cpu_used_mask right after stats_last_writer. APIs like cpumask_next and cpumask_set_cpu never access bits beyond cpu count, cpumask_size() bytes of memory is enough. Signed-off-by: Eddy Tao <taoyuan_eddy@hotmail.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Link: https://lore.kernel.org/r/OS3P286MB229570CCED618B20355D227AF5D59@OS3P286MB2295.JPNP286.PROD.OUTLOOK.COM Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-06 22:36:29 -08:00
Arnd Bergmann	bbe6418663	amd-xgbe: fix mismatched prototype The forward declaration was introduced with a prototype that does not match the function definition: drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c:2166:13: error: conflicting types for 'xgbe_phy_perform_ratechange' due to enum/integer mismatch; have 'void(struct xgbe_prv_data , enum xgbe_mb_cmd, enum xgbe_mb_subcmd)' [-Werror=enum-int-mismatch] 2166 \| static void xgbe_phy_perform_ratechange(struct xgbe_prv_data pdata, \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c:391:13: note: previous declaration of 'xgbe_phy_perform_ratechange' with type 'void(struct xgbe_prv_data , unsigned int, unsigned int)' 391 \| static void xgbe_phy_perform_ratechange(struct xgbe_prv_data pdata, \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~ Ideally there should not be any forward declarations here, which would make it easier to show that there is no unbounded recursion. I tried fixing this but could not figure out how to avoid the recursive call. As a hotfix, address only the broken prototype to fix the build problem instead. Fixes: `4f3b20bfbb` ("amd-xgbe: add support for rx-adaptation") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com> Link: https://lore.kernel.org/r/20230203121553.2871598-1-arnd@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-06 22:33:18 -08:00
Colin Foster	b1ca2f1b04	net: mscc: ocelot: un-export unused regmap symbols There are no external users of the vsc7514__regmap[] symbols or vsc7514_vcap_ functions. They were exported in commit `32ecd22ba6` ("net: mscc: ocelot: split register definitions to a separate file") with the intention of being used, but the actual structure used in commit `2efaca411c` ("net: mscc: ocelot: expose vsc7514_regmap definition") ended up being all that was needed. Bury these unnecessary symbols. Signed-off-by: Colin Foster <colin.foster@in-advantage.com> Suggested-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20230204182056.25502-1-colin.foster@in-advantage.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-06 22:33:15 -08:00
Jakub Kicinski	9ac543c06f	Merge branch 'aux-bus-v11' of https://github.com/ajitkhaparde1/linux Ajit Khaparde says: ==================== bnxt: Add Auxiliary driver support Add auxiliary device driver for Broadcom devices. The bnxt_en driver will register and initialize an aux device if RDMA is enabled in the underlying device. The bnxt_re driver will then probe and initialize the RoCE interfaces with the infiniband stack. We got rid of the bnxt_en_ops which the bnxt_re driver used to communicate with bnxt_en. Similarly We have tried to clean up most of the bnxt_ulp_ops. In most of the cases we used the functions and entry points provided by the auxiliary bus driver framework. And now these are the minimal functions needed to support the functionality. We will try to work on getting rid of the remaining if we find any other viable option in future. * 'aux-bus-v11' of https://github.com/ajitkhaparde1/linux: bnxt_en: Remove runtime interrupt vector allocation RDMA/bnxt_re: Remove the sriov config callback bnxt_en: Remove struct bnxt access from RoCE driver bnxt_en: Use auxiliary bus calls over proprietary calls bnxt_en: Use direct API instead of indirection bnxt_en: Remove usage of ulp_id RDMA/bnxt_re: Use auxiliary driver interface bnxt_en: Add auxiliary driver support ==================== Link: https://lore.kernel.org/r/20230202033809.3989-1-ajit.khaparde@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-06 22:25:48 -08:00
Marc Kleine-Budde	3dafbe5cc1	Merge patch series "can: bittiming: cleanups and rework SJW handling" Marc Kleine-Budde <mkl@pengutronix.de> says: several people noticed that on modern CAN controllers with wide bit timing registers the default SJW of 1 can result in unstable or no synchronization to the CAN network. See Patch 14/17 for details. During review of v1 Vincent pointed out that the original code and the series doesn't always check user provided bit timing parameters, sometimes silently limits them and the return error values are not consistent. This series first cleans up some code in bittiming.c, replacing open-coded variants by macros or functions (Patches 1, 2). Patch 3 adds the missing assignment of the effective TQ if the interface is configured with low level timing parameters. Patch 4 is another code cleanup. Patches 5, 6 check the bit timing parameter during interface registration. Patch 7 adds a validation of the sample point. The patches 8-13 convert the error messages from netdev_err() to NL_SET_ERR_MSG_FMT, factor out the SJW handling from can_fixup_bittiming(), add checking and error messages for the individual limits and harmonize the error return values. Patch 14 changes the default SJW value from 1 to min(Phase Seg1, Phase Seg2 / 2). Patch 15 switches can_calc_bittiming() to use the new SJW handling. Patch 16 converts can_calc_bittiming() to NL_SET_ERR_MSG_FMT(). And patch 16 adds a NL_SET_ERR_MSG_FMT() error message to can_validate_bitrate(). v1: https://lore.kernel.org/all/20220907103845.3929288-1-mkl@pengutronix.de Link: https://lore.kernel.org/all/20230202110854.2318594-1-mkl@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2023-02-06 14:02:37 +01:00
Marc Kleine-Budde	6d7934719f	can: bittiming: can_validate_bitrate(): report error via netlink Report an error to user space via netlink if the requested bit rate is not supported by the device. Link: https://lore.kernel.org/all/20230202110854.2318594-18-mkl@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2023-02-06 13:57:27 +01:00
Marc Kleine-Budde	06742086a3	can: bittiming: can_calc_bittiming(): convert from netdev_err() to NL_SET_ERR_MSG_FMT() Replace the netdev_err() by NL_SET_ERR_MSG_FMT() to better inform the user about the problem. While there, use %u to print unsigned values and improve error message a bit. In case of an error, return -EINVAL instead of -EDOM, this corresponds better to the actual meaning of the error value. Link: https://lore.kernel.org/all/20230202110854.2318594-17-mkl@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2023-02-06 13:57:27 +01:00
Marc Kleine-Budde	c7650728a7	can: bittiming: can_calc_bittiming(): clean up SJW handling In the current code, if the user configures a bitrate, a default SJW value of 1 is used. If the user configures both a bitrate and a SJW value, can_calc_bittiming() silently limits the SJW value to SJW max and TSEG2. We came to the conclusion that if the user provided an invalid SJW value, it's best to bail out and inform the user [1]. [1] https://lore.kernel.org/all/CAMZ6RqKqhmTgUZiwe5uqUjBDnhhC2iOjZ791+Y845btJYwVDKg@mail.gmail.com Further the ISO 11898-1:2015 standard mandates that "SJW shall be less than or equal to the minimum of these two items: Phase_Seg1 and Phase_Seg2." [2] The current code is missing that check. [2] https://lore.kernel.org/all/BL3PR11MB64844E3FC13C55433CDD0B3DFB449@BL3PR11MB6484.namprd11.prod.outlook.com The previous patches introduced 1) can_sjw_set_default() - sets a default value for SJW if unset 2) can_sjw_check() - implements a SJW check against SJW max, Phase Seg1 and Phase Seg2. In the error case this function reports the error to user space via netlink. Replace both the open-coded SJW default setting and the open-coded and insufficient checks of SJW with the helper functions can_sjw_set_default() and can_sjw_check(). Link: https://lore.kernel.org/all/20230202110854.2318594-16-mkl@pengutronix.de Link: https://lore.kernel.org/all/CAMZ6RqKqhmTgUZiwe5uqUjBDnhhC2iOjZ791+Y845btJYwVDKg@mail.gmail.com Link: https://lore.kernel.org/all/BL3PR11MB64844E3FC13C55433CDD0B3DFB449@BL3PR11MB6484.namprd11.prod.outlook.com Suggested-by: Thomas Kopp <Thomas.Kopp@microchip.com> Suggested-by: Vincent Mailhol <vincent.mailhol@gmail.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2023-02-06 13:57:27 +01:00
Marc Kleine-Budde	80bcf5ec99	can: bittiming: can_sjw_set_default(): use Phase Seg2 / 2 as default for SJW "The (Re-)Synchronization Jump Width (SJW) defines how far a resynchronization may move the Sample Point inside the limits defined by the Phase Buffer Segments to compensate for edge phase errors." [1] In other words, this means that the SJW parameter controls the tolerance of the CAN controller to frequency errors compared to other CAN controllers. If the user space does not provide an SJW parameter, the kernel chooses a default value of 1. This has proven to be a good default value for classic CAN controllers, but no longer for modern CAN-FD controllers. In the past there were CAN controllers like the sja1000 with a rather limited range of bit timing parameters. For the standard bit rates this results in the following bit timing parameters: \| Bit timing parameters for sja1000 with 8.000000 MHz ref clock \| _----+--------------=> tseg1: 1 … 16 \| / / _---------=> tseg2: 1 … 8 \| \| \| / _-----=> sjw: 1 … 4 \| \| \| \| / _-=> brp: 1 … 64 (inc: 1) \| \| \| \| \| / \| nominal \| \| \| \| \| real Bitrt nom real SampP \| Bitrate TQ[ns] PrS PhS1 PhS2 SJW BRP Bitrate Error SampP SampP Error BTR0 BTR1 \| 1000000 125 2 3 2 1 1 1000000 0.0% 75.0% 75.0% 0.0% 0x00 0x14 \| 800000 125 3 4 2 1 1 800000 0.0% 80.0% 80.0% 0.0% 0x00 0x16 \| 666666 125 4 4 3 1 1 666666 0.0% 80.0% 75.0% 6.2% 0x00 0x27 \| 500000 125 6 7 2 1 1 500000 0.0% 87.5% 87.5% 0.0% 0x00 0x1c \| 250000 250 6 7 2 1 2 250000 0.0% 87.5% 87.5% 0.0% 0x01 0x1c \| 125000 500 6 7 2 1 4 125000 0.0% 87.5% 87.5% 0.0% 0x03 0x1c \| 100000 625 6 7 2 1 5 100000 0.0% 87.5% 87.5% 0.0% 0x04 0x1c \| 83333 750 6 7 2 1 6 83333 0.0% 87.5% 87.5% 0.0% 0x05 0x1c \| 50000 1250 6 7 2 1 10 50000 0.0% 87.5% 87.5% 0.0% 0x09 0x1c \| 33333 1875 6 7 2 1 15 33333 0.0% 87.5% 87.5% 0.0% 0x0e 0x1c \| 20000 3125 6 7 2 1 25 20000 0.0% 87.5% 87.5% 0.0% 0x18 0x1c \| 10000 6250 6 7 2 1 50 10000 0.0% 87.5% 87.5% 0.0% 0x31 0x1c The attentive reader will notice that the SJW is 1 in most cases, while the Seg2 phase is 2. Both values are given in TQ units, which in turn is a duration in nanoseconds. For example the 500 kbit/s configuration: \| nominal real Bitrt nom real SampP \| Bitrate TQ[ns] PrS PhS1 PhS2 SJW BRP Bitrate Error SampP SampP Error BTR0 BTR1 \| 500000 125 6 7 2 1 1 500000 0.0% 87.5% 87.5% 0.0% 0x00 0x1c the TQ is 125ns, the Phase Seg2 is "2" (== 250ns), the SJW is "1" (== 125 ns). Looking at a more modern CAN controller like a mcp2518fd, it has wider bit timing registers. \| Bit timing parameters for mcp251xfd with 40.000000 MHz ref clock \| _----+--------------=> tseg1: 2 … 256 \| / / _---------=> tseg2: 1 … 128 \| \| \| / _-----=> sjw: 1 … 128 \| \| \| \| / _-=> brp: 1 … 256 (inc: 1) \| \| \| \| \| / \| nominal \| \| \| \| \| real Bitrt nom real SampP \| Bitrate TQ[ns] PrS PhS1 PhS2 SJW BRP Bitrate Error SampP SampP Error NBTCFG \| 500000 25 34 35 10 1 1 500000 0.0% 87.5% 87.5% 0.0% 0x00440900 The TQ is 25ns, the Phase Seg 2 is "10" (== 250ns), the SJW is "1" (== 25ns). Since the kernel chooses a default SJW of 1 regardless of the TQ, this leads to a much smaller SJW and thus much smaller tolerances to frequency errors. To maintain the same oscillator tolerances on controllers with wide bit timing registers, select a default SJW value of Phase Seg2 / 2 unless Phase Seg 1 is less. This results in the following bit timing parameters: \| Bit timing parameters for mcp251xfd with 40.000000 MHz ref clock \| _----+--------------=> tseg1: 2 … 256 \| / / _---------=> tseg2: 1 … 128 \| \| \| / _-----=> sjw: 1 … 128 \| \| \| \| / _-=> brp: 1 … 256 (inc: 1) \| \| \| \| \| / \| nominal \| \| \| \| \| real Bitrt nom real SampP \| Bitrate TQ[ns] PrS PhS1 PhS2 SJW BRP Bitrate Error SampP SampP Error NBTCFG \| 500000 25 34 35 10 5 1 500000 0.0% 87.5% 87.5% 0.0% 0x00440904 The TQ is 25ns, the Phase Seg 2 is "10" (== 250ns), the SJW is "5" (== 125ns). Which is the same as on the sja1000 controller. [1] http://web.archive.org/http://www.oertel-halle.de/files/cia99paper.pdf Link: https://lore.kernel.org/all/20230202110854.2318594-15-mkl@pengutronix.de Cc: Mark Bath <mark@baggywrinkle.co.uk> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2023-02-06 13:57:27 +01:00
Marc Kleine-Budde	b5a3d0864e	can: bittiming: can_sjw_check(): check that SJW is not longer than either Phase Buffer Segment According to "The Configuration of the CAN Bit Timing" [1] the SJW "may not be longer than either Phase Buffer Segment". Check SJW against length of both Phase buffers. In case the SJW is greater, report an error via netlink to user space and bail out. [1] http://web.archive.org/http://www.oertel-halle.de/files/cia99paper.pdf Link: https://lore.kernel.org/all/20230202110854.2318594-14-mkl@pengutronix.de Suggested-by: Vincent Mailhol <vincent.mailhol@gmail.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2023-02-06 13:57:27 +01:00
Marc Kleine-Budde	0c017f0910	can: bittiming: can_sjw_check(): report error via netlink and harmonize error value If the user space has supplied an invalid SJW value (greater than the maximum SJW value), report -EINVAL instead of -ERANGE, this better matches the actual meaning of the error value. Additionally report an error message via netlink to the user space. Link: https://lore.kernel.org/all/20230202110854.2318594-13-mkl@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2023-02-06 13:57:26 +01:00
Marc Kleine-Budde	de82d6185b	can: bittiming: can_fixup_bittiming(): report error via netlink and harmonize error value Check each bit timing parameter first individually against their limits and report a meaningful error message via netlink to the user space. In case of an error, return -EINVAL instead of -ERANGE, this corresponds better to the actual meaning of the error value. Link: https://lore.kernel.org/all/20230202110854.2318594-12-mkl@pengutronix.de Suggested-by: Vincent Mailhol <vincent.mailhol@gmail.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2023-02-06 13:57:26 +01:00
Marc Kleine-Budde	5988bf737d	can: bittiming: factor out can_sjw_set_default() and can_sjw_check() Factor out the functionality of assigning a SJW default value into can_sjw_set_default() and the checking the SJW limits into can_sjw_check(). This functions will be improved and called from a different function in the following patches. Link: https://lore.kernel.org/all/20230202110854.2318594-11-mkl@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2023-02-06 13:57:26 +01:00
Marc Kleine-Budde	286c0e09e8	can: bittiming: can_changelink() pass extack down callstack This is a preparation patch. In order to pass warning/error messages during netlink calls back to user space, pass the extack struct down the callstack of can_changelink(), the actual error messages will be added in the following ptaches. Link: https://lore.kernel.org/all/20230202110854.2318594-10-mkl@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2023-02-06 13:57:26 +01:00
Marc Kleine-Budde	1494d27f64	can: netlink: can_changelink(): convert from netdev_err() to NL_SET_ERR_MSG_FMT() Since commit `51c352bdbc` ("netlink: add support for formatted extack messages") formatted extack messages are supported to inform the user space or warnings/errors during netlink calls. Replace the netdev_err() by NL_SET_ERR_MSG_FMT() to better inform the user about the problem. While there, use %u to print unsigned values and improve error message a bit. Link: https://lore.kernel.org/all/20230202110854.2318594-9-mkl@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2023-02-06 13:57:26 +01:00
Marc Kleine-Budde	73335cfab7	can: netlink: can_validate(): validate sample point for CAN and CAN-FD The sample point is a value in tenths of a percent. Meaningful values are between 0 and 1000. Invalid values are rejected and an error message is returned to user space via netlink. Link: https://lore.kernel.org/all/20230202110854.2318594-8-mkl@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2023-02-06 13:57:26 +01:00
Marc Kleine-Budde	a3db542410	can: dev: register_candev(): bail out if both fixed bit rates and bit timing constants are provided The CAN driver framework supports either fixed bit rates or bit timing constants. Bail out during driver registration if both are given. Link: https://lore.kernel.org/all/20230202110854.2318594-7-mkl@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2023-02-06 13:57:26 +01:00
Marc Kleine-Budde	d58ac89d0d	can: dev: register_candev(): ensure that bittiming const are valid Implement the function can_bittiming_const_valid() to check the validity of the specified bit timing constant. Call this function from register_candev() to check the bit timing constants during the registration of the CAN interface. Link: https://lore.kernel.org/all/20230202110854.2318594-6-mkl@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2023-02-06 13:57:26 +01:00
Marc Kleine-Budde	8e0a0b32c4	can: bittiming: can_get_bittiming(): use direct return and remove unneeded else Clean up the code flow a bit, don't assign err variable but directly return. Remove the unneeded else, too. Link: https://lore.kernel.org/all/20230202110854.2318594-5-mkl@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2023-02-06 13:57:25 +01:00
Marc Kleine-Budde	52375446f2	can: bittiming: can_fixup_bittiming(): set effective tq The can_fixup_bittiming() function is used to validate the user-supplied low-level bit timing parameters and calculate the bitrate prescaler (brp) from the requested time quanta (tq) and the CAN clock of the controller. can_fixup_bittiming() selects the best matching integer bit rate prescaler, which may result in a different time quantum than the value specified by the user. Calculate the resulting time quantum and assign it so that the user sees the effective time quantum. Link: https://lore.kernel.org/all/20230202110854.2318594-4-mkl@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2023-02-06 13:57:25 +01:00
Marc Kleine-Budde	9cf670dbe6	can: bittiming: can_fixup_bittiming(): use CAN_SYNC_SEG instead of 1 Commit `1c47fa6b31` ("can: dev: add a helper function to calculate the duration of one bit") made the constant CAN_SYNC_SEG available in a header file. The magic number 1 in can_fixup_bittiming() represents the width of the sync segment, replace it by CAN_SYNC_SEG to make the code more readable. Link: https://lore.kernel.org/all/20230202110854.2318594-3-mkl@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2023-02-06 13:57:25 +01:00
Marc Kleine-Budde	89cfa63565	can: bittiming(): replace open coded variants of can_bit_time() Commit `1c47fa6b31` ("can: dev: add a helper function to calculate the duration of one bit") added the helper function can_bit_time(). Replace open coded variants of can_bit_time() by the helper function. Link: https://lore.kernel.org/all/20230202110854.2318594-2-mkl@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2023-02-06 13:57:25 +01:00
David S. Miller	c21adf256f	Merge branch 'tuntap-socket-uid' Pietro Borrello says: ==================== tuntap: correctly initialize socket uid sock_init_data() assumes that the `struct socket` passed in input is contained in a `struct socket_alloc` allocated with sock_alloc(). However, tap_open() and tun_chr_open() pass a `struct socket` embedded in a `struct tap_queue` and `struct tun_file` respectively, both allocated with sk_alloc(). This causes a type confusion when issuing a container_of() with SOCK_INODE() in sock_init_data() which results in assigning a wrong sk_uid to the `struct sock` in input. Due to the type confusion, both sockets happen to have their uid set to 0, i.e. root. While it will be often correct, as tuntap devices require CAP_NET_ADMIN, it may not always be the case. Not sure how widespread is the impact of this, it seems the socket uid may be used for network filtering and routing, thus tuntap sockets may be incorrectly managed. Additionally, it seems a socket with an incorrect uid may be returned to the vhost driver when issuing a get_socket() on a tuntap device in vhost_net_set_backend(). Fix the bugs by adding and using sock_init_data_uid(), which explicitly takes a uid as argument. Signed-off-by: Pietro Borrello <borrello@diag.uniroma1.it> --- Changes in v3: - Fix the bug by defining and using sock_init_data_uid() - Link to v2: https://lore.kernel.org/r/20230131-tuntap-sk-uid-v2-0-29ec15592813@diag.uniroma1.it Changes in v2: - Shorten and format comments - Link to v1: https://lore.kernel.org/r/20230131-tuntap-sk-uid-v1-0-af4f9f40979d@diag.uniroma1.it ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-06 10:16:55 +00:00
Pietro Borrello	66b2c338ad	tap: tap_open(): correctly initialize socket uid sock_init_data() assumes that the `struct socket` passed in input is contained in a `struct socket_alloc` allocated with sock_alloc(). However, tap_open() passes a `struct socket` embedded in a `struct tap_queue` allocated with sk_alloc(). This causes a type confusion when issuing a container_of() with SOCK_INODE() in sock_init_data() which results in assigning a wrong sk_uid to the `struct sock` in input. On default configuration, the type confused field overlaps with padding bytes between `int vnet_hdr_sz` and `struct tap_dev __rcu *tap` in `struct tap_queue`, which makes the uid of all tap sockets 0, i.e., the root one. Fix the assignment by using sock_init_data_uid(). Fixes: `86741ec254` ("net: core: Add a UID field to struct sock.") Signed-off-by: Pietro Borrello <borrello@diag.uniroma1.it> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-06 10:16:55 +00:00
Pietro Borrello	a096ccca6e	tun: tun_chr_open(): correctly initialize socket uid sock_init_data() assumes that the `struct socket` passed in input is contained in a `struct socket_alloc` allocated with sock_alloc(). However, tun_chr_open() passes a `struct socket` embedded in a `struct tun_file` allocated with sk_alloc(). This causes a type confusion when issuing a container_of() with SOCK_INODE() in sock_init_data() which results in assigning a wrong sk_uid to the `struct sock` in input. On default configuration, the type confused field overlaps with the high 4 bytes of `struct tun_struct __rcu *tun` of `struct tun_file`, NULL at the time of call, which makes the uid of all tun sockets 0, i.e., the root one. Fix the assignment by using sock_init_data_uid(). Fixes: `86741ec254` ("net: core: Add a UID field to struct sock.") Signed-off-by: Pietro Borrello <borrello@diag.uniroma1.it> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-06 10:16:55 +00:00
Pietro Borrello	584f374289	net: add sock_init_data_uid() Add sock_init_data_uid() to explicitly initialize the socket uid. To initialise the socket uid, sock_init_data() assumes a the struct socket* sock is always embedded in a struct socket_alloc, used to access the corresponding inode uid. This may not be true. Examples are sockets created in tun_chr_open() and tap_open(). Fixes: `86741ec254` ("net: core: Add a UID field to struct sock.") Signed-off-by: Pietro Borrello <borrello@diag.uniroma1.it> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-06 10:16:55 +00:00
David S. Miller	b601135e8d	Merge branch 'ENETC-mqprio-taprio-cleanup' Vladimir Oltean says: ==================== net: ENETC mqprio/taprio cleanup Please excuse the increased patch set size compared to v4's 15 patches, but Claudiu stirred up the pot :) when he pointed out that the mqprio TXQ validation procedure is still incorrect, so I had to fix that, and then do some consolidation work so that taprio doesn't duplicate mqprio's bugs. Compared to v4, 3 patches are new and 1 was dropped for now ("net/sched: taprio: mask off bits in gate mask that exceed number of TCs"), since there's not really much to gain from it. Since the previous patch set has largely been reviewed, I hope that a delta overview will help and make up for the large size. v4->v5: - new patches: "[08/17] net/sched: mqprio: allow reverse TC:TXQ mappings" "[11/17] net/sched: taprio: centralize mqprio qopt validation" "[12/17] net/sched: refactor mqprio qopt reconstruction to a library function" - changed patches worth revisiting: "[09/17] net/sched: mqprio: allow offloading drivers to request queue count validation" v4 at: https://patchwork.kernel.org/project/netdevbpf/cover/20230130173145.475943-1-vladimir.oltean@nxp.com/ v3->v4: - adjusted patch 07/15 to not remove "#include <net/pkt_sched.h>" from ti cpsw https://patchwork.kernel.org/project/netdevbpf/cover/20230127001516.592984-1-vladimir.oltean@nxp.com/ v2->v3: - move min_num_stack_tx_queues definition so it doesn't conflict with the ethtool mm patches I haven't submitted yet for enetc (and also to make use of a 4 byte hole) - warn and mask off excess TCs in gate mask instead of failing - finally CC qdisc maintainers v2 at: https://patchwork.kernel.org/project/netdevbpf/patch/20230126125308.1199404-16-vladimir.oltean@nxp.com/ v1->v2: - patches 1->4 are new - update some header inclusions in drivers - fix typo (said "taprio" instead of "mqprio") - better enetc mqprio error handling - dynamically reconstruct mqprio configuration in taprio offload - also let stmmac and tsnep use per-TXQ gate_mask v1 (RFC) at: https://patchwork.kernel.org/project/netdevbpf/cover/20230120141537.1350744-1-vladimir.oltean@nxp.com/ The main goal of this patch set is to make taprio pass the mqprio queue configuration structure down to ndo_setup_tc() - patch 13/17. But mqprio itself is not in the best shape currently, so there are some consolidation patches on that as well. Next, there are some consolidation patches in the enetc driver's handling of TX queues and their traffic class assignment. Then, there is a consolidation between the TX queue configuration for mqprio and taprio. Finally, there is a change in the meaning of the gate_mask passed by taprio through ndo_setup_tc(). We introduce a capability through which drivers can request the gate mask to be per TXQ. The default is changed so that it is per TC. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-06 10:06:44 +00:00
Vladimir Oltean	06b1c9110a	net: enetc: act upon mqprio queue config in taprio offload We assume that the mqprio queue configuration from taprio has a simple 1:1 mapping between prio and traffic class, and one TX queue per TC. That might not be the case. Actually parse and act upon the mqprio config. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-06 10:06:44 +00:00
Vladimir Oltean	1a353111b6	net: enetc: act upon the requested mqprio queue configuration Regardless of the requested queue count per traffic class, the enetc driver allocates a number of TX rings equal to the number of TCs, and hardcodes a queue configuration of "1@0 1@1 ... 1@max-tc". Other configurations are silently ignored and treated the same. Improve that by allowing what the user requests to be actually fulfilled. This allows more than one TX ring per traffic class. For example: $ tc qdisc add dev eno0 root handle 1: mqprio num_tc 4 \ map 0 0 1 1 2 2 3 3 queues 2@0 2@2 2@4 2@6 [ 146.267648] fsl_enetc 0000:00:00.0 eno0: TX ring 0 prio 0 [ 146.273451] fsl_enetc 0000:00:00.0 eno0: TX ring 1 prio 0 [ 146.283280] fsl_enetc 0000:00:00.0 eno0: TX ring 2 prio 1 [ 146.293987] fsl_enetc 0000:00:00.0 eno0: TX ring 3 prio 1 [ 146.300467] fsl_enetc 0000:00:00.0 eno0: TX ring 4 prio 2 [ 146.306866] fsl_enetc 0000:00:00.0 eno0: TX ring 5 prio 2 [ 146.313261] fsl_enetc 0000:00:00.0 eno0: TX ring 6 prio 3 [ 146.319622] fsl_enetc 0000:00:00.0 eno0: TX ring 7 prio 3 $ tc qdisc del dev eno0 root [ 178.238418] fsl_enetc 0000:00:00.0 eno0: TX ring 0 prio 0 [ 178.244369] fsl_enetc 0000:00:00.0 eno0: TX ring 1 prio 0 [ 178.251486] fsl_enetc 0000:00:00.0 eno0: TX ring 2 prio 0 [ 178.258006] fsl_enetc 0000:00:00.0 eno0: TX ring 3 prio 0 [ 178.265038] fsl_enetc 0000:00:00.0 eno0: TX ring 4 prio 0 [ 178.271557] fsl_enetc 0000:00:00.0 eno0: TX ring 5 prio 0 [ 178.277910] fsl_enetc 0000:00:00.0 eno0: TX ring 6 prio 0 [ 178.284281] fsl_enetc 0000:00:00.0 eno0: TX ring 7 prio 0 $ tc qdisc add dev eno0 root handle 1: mqprio num_tc 8 \ map 0 1 2 3 4 5 6 7 queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 hw 1 [ 186.113162] fsl_enetc 0000:00:00.0 eno0: TX ring 0 prio 0 [ 186.118764] fsl_enetc 0000:00:00.0 eno0: TX ring 1 prio 1 [ 186.124374] fsl_enetc 0000:00:00.0 eno0: TX ring 2 prio 2 [ 186.130765] fsl_enetc 0000:00:00.0 eno0: TX ring 3 prio 3 [ 186.136404] fsl_enetc 0000:00:00.0 eno0: TX ring 4 prio 4 [ 186.142049] fsl_enetc 0000:00:00.0 eno0: TX ring 5 prio 5 [ 186.147674] fsl_enetc 0000:00:00.0 eno0: TX ring 6 prio 6 [ 186.153305] fsl_enetc 0000:00:00.0 eno0: TX ring 7 prio 7 The driver used to set TC_MQPRIO_HW_OFFLOAD_TCS, near which there is this comment in the UAPI header: TC_MQPRIO_HW_OFFLOAD_TCS, /* offload TCs, no queue counts */ which is what enetc was doing up until now (and no longer is; we offload queue counts too), remove that assignment. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-06 10:06:44 +00:00
Vladimir Oltean	735ef62c2f	net: enetc: request mqprio to validate the queue counts The enetc driver does not validate the mqprio queue configuration, so it currently allows things like this: $ tc qdisc add dev swp0 root handle 1: mqprio num_tc 8 \ map 0 1 2 3 4 5 6 7 queues 3@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 hw 1 But also things like this, completely omitting the queue configuration: $ tc qdisc add dev eno0 root handle 1: mqprio num_tc 8 \ map 0 1 2 3 4 5 6 7 hw 1 By requesting validation via the mqprio capability structure, this is no longer allowed, and we bring what is accepted by hardware in line with what is accepted by software. The check that num_tc <= real_num_tx_queues also becomes superfluous and can be dropped, because mqprio_validate_queue_counts() validates that no TXQ range exceeds real_num_tx_queues. That is a stronger check, because there is at least 1 TXQ per TC, so there are at least as many TXQs as TCs. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-06 10:06:44 +00:00
Vladimir Oltean	522d15ea83	net/sched: taprio: only pass gate mask per TXQ for igc, stmmac, tsnep, am65_cpsw There are 2 classes of in-tree drivers currently: - those who act upon struct tc_taprio_sched_entry :: gate_mask as if it holds a bit mask of TXQs - those who act upon the gate_mask as if it holds a bit mask of TCs When it comes to the standard, IEEE 802.1Q-2018 does say this in the second paragraph of section 8.6.8.4 Enhancements for scheduled traffic: \| A gate control list associated with each Port contains an ordered list \| of gate operations. Each gate operation changes the transmission gate \| state for the gate associated with each of the Port's traffic class \| queues and allows associated control operations to be scheduled. In typically obtuse language, it refers to a "traffic class queue" rather than a "traffic class" or a "queue". But careful reading of 802.1Q clarifies that "traffic class" and "queue" are in fact synonymous (see 8.6.6 Queuing frames): \| A queue in this context is not necessarily a single FIFO data structure. \| A queue is a record of all frames of a given traffic class awaiting \| transmission on a given Bridge Port. The structure of this record is not \| specified. i.o.w. their definition of "queue" isn't the Linux TX queue. The gate_mask really is input into taprio via its UAPI as a mask of traffic classes, but taprio_sched_to_offload() converts it into a TXQ mask. The breakdown of drivers which handle TC_SETUP_QDISC_TAPRIO is: - hellcreek, felix, sja1105: these are DSA switches, it's not even very clear what TXQs correspond to, other than purely software constructs. Only the mqprio configuration with 8 TCs and 1 TXQ per TC makes sense. So it's fine to convert these to a gate mask per TC. - enetc: I have the hardware and can confirm that the gate mask is per TC, and affects all TXQs (BD rings) configured for that priority. - igc: in igc_save_qbv_schedule(), the gate_mask is clearly interpreted to be per-TXQ. - tsnep: Gerhard Engleder clarifies that even though this hardware supports at most 1 TXQ per TC, the TXQ indices may be different from the TC values themselves, and it is the TXQ indices that matter to this hardware. So keep it per-TXQ as well. - stmmac: I have a GMAC datasheet, and in the EST section it does specify that the gate events are per TXQ rather than per TC. - lan966x: again, this is a switch, and while not a DSA one, the way in which it implements lan966x_mqprio_add() - by only allowing num_tc == NUM_PRIO_QUEUES (8) - makes it clear to me that TXQs are a purely software construct here as well. They seem to map 1:1 with TCs. - am65_cpsw: from looking at am65_cpsw_est_set_sched_cmds(), I get the impression that the fetch_allow variable is treated like a prio_mask. This definitely sounds closer to a per-TC gate mask rather than a per-TXQ one, and TI documentation does seem to recomment an identity mapping between TCs and TXQs. However, Roger Quadros would like to do some testing before making changes, so I'm leaving this driver to operate as it did before, for now. Link with more details at the end. Based on this breakdown, we have 5 drivers with a gate mask per TC and 4 with a gate mask per TXQ. So let's make the gate mask per TXQ the opt-in and the gate mask per TC the default. Benefit from the TC_QUERY_CAPS feature that Jakub suggested we add, and query the device driver before calling the proper ndo_setup_tc(), and figure out if it expects one or the other format. Link: https://patchwork.kernel.org/project/netdevbpf/patch/20230202003621.2679603-15-vladimir.oltean@nxp.com/#25193204 Cc: Horatiu Vultur <horatiu.vultur@microchip.com> Cc: Siddharth Vadapalli <s-vadapalli@ti.com> Cc: Roger Quadros <rogerq@kernel.org> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Kurt Kanzenbach <kurt@linutronix.de> # hellcreek Reviewed-by: Gerhard Engleder <gerhard@engleder-embedded.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-06 10:06:44 +00:00

1 2 3 4 5 ...

1156325 commits