mirror of
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
synced 2025-11-27 01:11:31 +00:00
Changing API xdp_return_frame() to take struct xdp_frame as argument, seems like a natural choice. But there are some subtle performance details here that needs extra care, which is a deliberate choice. When de-referencing xdp_frame on a remote CPU during DMA-TX completion, result in the cache-line is change to "Shared" state. Later when the page is reused for RX, then this xdp_frame cache-line is written, which change the state to "Modified". This situation already happens (naturally) for, virtio_net, tun and cpumap as the xdp_frame pointer is the queued object. In tun and cpumap, the ptr_ring is used for efficiently transferring cache-lines (with pointers) between CPUs. Thus, the only option is to de-referencing xdp_frame. It is only the ixgbe driver that had an optimization, in which it can avoid doing the de-reference of xdp_frame. The driver already have TX-ring queue, which (in case of remote DMA-TX completion) have to be transferred between CPUs anyhow. In this data area, we stored a struct xdp_mem_info and a data pointer, which allowed us to avoid de-referencing xdp_frame. To compensate for this, a prefetchw is used for telling the cache coherency protocol about our access pattern. My benchmarks show that this prefetchw is enough to compensate the ixgbe driver. V7: Adjust for commit |
||
|---|---|---|
| .. | ||
| arraymap.c | ||
| bpf_lru_list.c | ||
| bpf_lru_list.h | ||
| cgroup.c | ||
| core.c | ||
| cpumap.c | ||
| devmap.c | ||
| disasm.c | ||
| disasm.h | ||
| hashtab.c | ||
| helpers.c | ||
| inode.c | ||
| lpm_trie.c | ||
| Makefile | ||
| map_in_map.c | ||
| map_in_map.h | ||
| offload.c | ||
| percpu_freelist.c | ||
| percpu_freelist.h | ||
| sockmap.c | ||
| stackmap.c | ||
| syscall.c | ||
| tnum.c | ||
| verifier.c | ||