linux/net
Jakub Sitnicki 035ff358f2 net: Generate reuseport group ID on group creation
Commit 736b46027e ("net: Add ID (if needed) to sock_reuseport and expose
reuseport_lock") has introduced lazy generation of reuseport group IDs that
survive group resize.

By comparing identifiers we check that a BPF reuseport program is not
trying to select a socket from a BPF map that belongs to a different
reuseport group than the one the packet is for.
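
As a rough illustration of the check described above, here is a minimal
userspace sketch. The struct and function names (demo_reuseport, demo_sock,
demo_sock_in_group) are hypothetical stand-ins, not the kernel's definitions.
The sketch assumes the comparison is done on the ID rather than on the group
pointer, since the ID is what survives a group resize.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a reuseport group. */
struct demo_reuseport {
	unsigned int reuseport_id;	/* stable across group resize */
};

/* Hypothetical, simplified stand-in for a socket. */
struct demo_sock {
	struct demo_reuseport *group;	/* NULL if not in a reuseport group */
};

/*
 * Return true only if the socket selected from the map belongs to the
 * group the packet is for. The target group must not be NULL.
 */
static bool demo_sock_in_group(const struct demo_sock *selected,
			       const struct demo_reuseport *target)
{
	assert(target != NULL);

	if (!selected || !selected->group)
		return false;

	return selected->group->reuseport_id == target->reuseport_id;
}
```

A selection that fails this check would be rejected rather than used to
deliver the packet.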

Because SOCKARRAY used to be the only BPF map type usable with reuseport
BPF, it was possible to delay generating the reuseport group ID until a
socket from the group was inserted into a BPF map for the first time.

Now that SOCK{MAP,HASH} can be used with reuseport BPF, we have two
options: either generate the reuseport ID on map update, like SOCKARRAY
does, or allocate an ID from the start, when the reuseport group gets
created.

This patch takes the latter approach to keep sockmap free of calls into
reuseport code. This streamlines reuseport_id access, as the ID's lifetime
now matches that of the reuseport object.

The cost of this simplification, however, is that we allocate reuseport IDs
for all SO_REUSEPORT users, even those that don't use SOCKARRAY in their
setups. With the way identifiers are currently generated, we can have at
most S32_MAX reuseport groups, which hopefully is sufficient. If we ever
get close to the limit, we can switch to a u64 counter like sk_cookie.
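
The "allocate on creation" approach can be sketched in userspace as below.
This is an illustration under stated assumptions, not the kernel code: the
names (demo_id_alloc, demo_reuseport_alloc, and so on) are hypothetical, the
allocator is a toy 64-bit bitmap, and DEMO_MAX_IDS stands in for the S32_MAX
limit mentioned above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_MAX_IDS 64		/* toy limit; the text above cites S32_MAX */

static uint64_t demo_id_bitmap;	/* bit n set means ID n is in use */

/* Return the lowest free ID, or -1 when all IDs are taken. */
static int demo_id_alloc(void)
{
	for (int id = 0; id < DEMO_MAX_IDS; id++) {
		if (!(demo_id_bitmap & (UINT64_C(1) << id))) {
			demo_id_bitmap |= UINT64_C(1) << id;
			return id;
		}
	}
	return -1;
}

/* Release an ID so that a later group can reuse it. */
static void demo_id_free(int id)
{
	assert(id >= 0 && id < DEMO_MAX_IDS);
	demo_id_bitmap &= ~(UINT64_C(1) << id);
}

/* Hypothetical, simplified stand-in for a reuseport group. */
struct demo_reuseport {
	int reuseport_id;
};

/* The ID is assigned at creation, so it is valid for the object's whole life. */
static struct demo_reuseport *demo_reuseport_alloc(void)
{
	struct demo_reuseport *reuse = calloc(1, sizeof(*reuse));
	int id;

	if (!reuse)
		return NULL;

	id = demo_id_alloc();
	if (id < 0) {		/* ID space exhausted: group creation fails */
		free(reuse);
		return NULL;
	}
	reuse->reuseport_id = id;
	return reuse;
}

/* The ID is released together with the object. */
static void demo_reuseport_free(struct demo_reuseport *reuse)
{
	if (!reuse)
		return;
	demo_id_free(reuse->reuseport_id);
	free(reuse);
}
```

Because the ID is tied to the object's creation and destruction, readers of
reuseport_id never need to handle a "not yet assigned" state. The price is
that every group consumes an ID, whether or not it ever touches a BPF map.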

Another change is that we now always call into SOCKARRAY logic to unlink
the socket from the map when unhashing or closing the socket. Previously we
did it only when at least one socket from the group was in a BPF map.

It is worth noting that this doesn't conflict with sockmap tear-down when
a socket is in a SOCK{MAP,HASH} and also belongs to a reuseport group.
sockmap tear-down happens first:

  prot->unhash
  `- tcp_bpf_unhash
     |- tcp_bpf_remove
     |  `- while (sk_psock_link_pop(psock))
     |     `- sk_psock_unlink
     |        `- sock_map_delete_from_link
     |           `- __sock_map_delete
     |              `- sock_map_unref
     |                 `- sk_psock_put
     |                    `- sk_psock_drop
     |                       `- rcu_assign_sk_user_data(sk, NULL)
     `- inet_unhash
        `- reuseport_detach_sock
           `- bpf_sk_reuseport_detach
              `- WRITE_ONCE(sk->sk_user_data, NULL)

Suggested-by: Martin Lau <kafai@fb.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200218171023.844439-10-jakub@cloudflare.com
2020-02-21 22:29:45 +01:00
6lowpan
9p
802
8021q net: vlan: suppress "failed to kill vid" warnings 2020-02-17 14:30:54 -08:00
appletalk
atm proc: convert everything to "struct proc_ops" 2020-02-04 03:05:26 +00:00
ax25 net: Make sock protocol value checks more specific 2020-01-09 18:41:40 -08:00
batman-adv Merge ra.kernel.org:/pub/scm/linux/kernel/git/netdev/net 2020-01-19 22:10:04 +01:00
bluetooth Bluetooth: Fix race condition in hci_release_sock() 2020-01-26 10:34:17 +02:00
bpf
bpfilter kbuild: rename hostprogs-y/always to hostprogs/always-y 2020-02-04 01:53:07 +09:00
bridge net: bridge: vlan: add per-vlan state 2020-01-24 12:58:14 +01:00
caif caif_usb: fix spelling mistake "to" -> "too" 2020-01-24 08:12:06 +01:00
can
ceph Merge branch 'merge.nfs-fs_parse.1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-02-08 13:26:41 -08:00
core net: Generate reuseport group ID on group creation 2020-02-21 22:29:45 +01:00
dcb
dccp
decnet net: Make sock protocol value checks more specific 2020-01-09 18:41:40 -08:00
dns_resolver
dsa net: dsa: tag_ar9331: Make sure there is headroom for tag 2020-02-14 07:34:51 -08:00
ethernet net: remove eth_change_mtu 2020-01-27 11:09:31 +01:00
ethtool net/core: Replace driver version to be kernel version 2020-01-27 13:47:22 +01:00
hsr net: hsr: fix possible NULL deref in hsr_handle_frame() 2020-02-04 09:27:07 +01:00
ieee802154
ife
ipv4 tcp_bpf: Don't let child socket inherit parent protocol ops on copy 2020-02-21 22:29:45 +01:00
ipv6 net, ip6_tunnel: enhance tunnel locate with link check 2020-02-14 07:31:48 -08:00
iucv
kcm
key
l2tp l2tp: Allow duplicate session creation with UDP 2020-02-04 12:35:49 +01:00
l3mdev
lapb
llc llc2: Fix return statement of llc_stat_ev_rx_null_dsap_xid_c (and _test_c) 2019-12-20 21:19:36 -08:00
mac80211 A few big new things: 2020-02-16 19:00:22 -08:00
mac802154
mpls
mptcp mptcp: make the symbol 'mptcp_sk_clone_lock' static 2020-02-10 10:23:00 +01:00
ncsi net/ncsi: Support for multi host mellanox card 2020-01-09 18:36:22 -08:00
netfilter Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-02-04 13:32:20 +00:00
netlabel
netlink net: netlink: Replace zero-length array with flexible-array member 2020-02-17 19:05:06 -08:00
netrom
nfc NFC: digital: Replace zero-length array with flexible-array member 2020-02-17 19:05:05 -08:00
nsh
openvswitch openvswitch: add TTL decrement action 2020-02-16 19:34:44 -08:00
packet y2038: core, driver and file system changes 2020-01-29 14:55:47 -08:00
phonet net: Remove redundant BUG_ON() check in phonet_pernet 2020-01-03 12:25:50 -08:00
psample
qrtr net: qrtr: Remove receive worker 2020-01-14 18:36:42 -08:00
rds net/rds: Use prefetch for On-Demand-Paging MR 2020-01-18 11:48:19 +02:00
rfkill
rose Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-01-26 10:40:21 +01:00
rxrpc rxrpc: Fix call RCU cleanup using non-bh-safe locks 2020-02-07 11:20:57 +01:00
sched net: sched: don't take rtnl lock during flow_action setup 2020-02-17 14:17:02 -08:00
sctp Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-01-09 12:13:43 -08:00
smc net/smc: reduce port_event scheduling 2020-02-17 14:50:24 -08:00
strparser
sunrpc Highlights: 2020-02-07 17:50:21 -08:00
switchdev net: switchdev: Replace zero-length array with flexible-array member 2020-02-17 19:05:06 -08:00
tipc tipc: fix successful connect() but timed out 2020-02-10 10:23:00 +01:00
tls net, sk_msg: Annotate lockless access to sk_prot on clone 2020-02-21 22:29:45 +01:00
unix skbuff: fix a data race in skb_queue_len() 2020-02-06 13:59:10 +01:00
vmw_vsock net: virtio_vsock: Enhance connection semantics 2020-02-16 19:01:49 -08:00
wimax
wireless A few big new things: 2020-02-16 19:00:22 -08:00
x25 net: x25: convert to list_for_each_entry_safe() 2020-02-16 18:59:42 -08:00
xdp mm, tree-wide: rename put_user_page*() to unpin_user_page*() 2020-01-31 10:30:38 -08:00
xfrm xfrm: interface: use icmp_ndo_send helper 2020-02-13 14:19:00 -08:00
compat.c
Kconfig mptcp: Add MPTCP socket stubs 2020-01-24 13:44:07 +01:00
Makefile mptcp: Add MPTCP socket stubs 2020-01-24 13:44:07 +01:00
socket.c socket: fix unused-function warning 2020-01-08 15:02:21 -08:00
sysctl_net.c