linux/net/ipv4
Soheil Hassas Yeganeh b75eba76d3 tcp: send in-queue bytes in cmsg upon read
Applications with many concurrent connections, high variance
in receive queue length and tight memory bounds cannot
allocate worst-case buffer size to drain sockets. Knowing
the size of receive queue length, applications can optimize
how they allocate buffers to read from the socket.

The number of bytes pending on the socket is directly
available through ioctl(FIONREAD/SIOCINQ) and can be
approximated using getsockopt(MEMINFO) (rmem_alloc includes
skb overheads in addition to application data). But, both of
these options add an extra syscall per recvmsg. Moreover,
ioctl(FIONREAD/SIOCINQ) takes the socket lock.

Add the TCP_INQ socket option to TCP. When this socket
option is set, recvmsg() relays the number of bytes available
on the socket for reading to the application via the
TCP_CM_INQ control message.

Calculate the number of bytes after releasing the socket lock
to include the processed backlog, if any. To avoid an extra
branch in the hot path of recvmsg() for this new control
message, move all cmsg processing inside an existing branch for
processing receive timestamps. Since the socket lock is not held
when calculating the size of receive queue, TCP_INQ is a hint.
For example, it can overestimate the queue size by one byte,
if FIN is received.

With this method, applications can start reading from the socket
using a small buffer, and then use larger buffers based on the
remaining data when needed.

V3 change-log:
	As suggested by David Miller, added loads with barrier
	to check whether we have multiple threads calling recvmsg
	in parallel. When that happens we lock the socket to
	calculate inq.
V4 change-log:
	Removed inline from a static function.

Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-01 18:56:29 -04:00
..
netfilter kbuild: rename *-asn1.[ch] to *.asn1.[ch] 2018-04-07 19:04:02 +09:00
af_inet.c tcp: add TCP_ZEROCOPY_RECEIVE support for zerocopy receive 2018-04-29 21:29:55 -04:00
ah4.c
arp.c arp: fix arp_filter on l3slave devices 2018-04-05 22:05:03 -04:00
cipso_ipv4.c
datagram.c
devinet.c net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
esp4.c
esp4_offload.c
fib_frontend.c net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
fib_lookup.h
fib_notifier.c
fib_rules.c net: fib_rules: add extack support 2018-04-23 10:21:24 -04:00
fib_semantics.c net: Move fib_convert_metrics to metrics file 2018-04-17 23:41:15 -04:00
fib_trie.c net/ipv4: Allow notifier to fail route replace 2018-03-29 14:10:30 -04:00
fou.c net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
gre_demux.c
gre_offload.c
icmp.c net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
igmp.c net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
inet_connection_sock.c
inet_diag.c
inet_fragment.c inet: frags: remove inet_frag_maybe_warn_overflow() 2018-03-31 23:25:39 -04:00
inet_hashtables.c
inet_timewait_sock.c soreuseport: initialise timewait reuseport field 2018-04-07 22:32:32 -04:00
inetpeer.c inetpeer: fix uninit-value in inet_getpeer 2018-04-09 10:57:35 -04:00
ip_forward.c
ip_fragment.c inet: frags: fix ip6frag_low_thresh boundary 2018-04-04 12:04:59 -04:00
ip_gre.c erspan: auto detect truncated packets. 2018-04-30 11:43:45 -04:00
ip_input.c
ip_options.c
ip_output.c udp: paged allocation with gso 2018-04-26 15:08:15 -04:00
ip_sockglue.c
ip_tunnel.c ip_tunnel: better validate user provided tunnel names 2018-04-05 15:16:15 -04:00
ip_tunnel_core.c
ip_vti.c net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
ipcomp.c
ipconfig.c ipconfig: Write NTP server IPs to /proc/net/ipconfig/ntp_servers 2018-04-24 13:40:42 -04:00
ipip.c net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
ipmr.c net: fib_rules: add extack support 2018-04-23 10:21:24 -04:00
ipmr_base.c ipmr: Make ipmr_dump() common 2018-03-26 13:14:43 -04:00
Kconfig
Makefile net: Move fib_convert_metrics to metrics file 2018-04-17 23:41:15 -04:00
metrics.c net: Move fib_convert_metrics to metrics file 2018-04-17 23:41:15 -04:00
netfilter.c
ping.c net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
proc.c tcp: export packets delivery info 2018-04-19 13:05:16 -04:00
protocol.c
raw.c net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
raw_diag.c
route.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-04-09 17:04:10 -07:00
syncookies.c net/ipv4: disable SMC TCP option with SYN Cookies 2018-03-25 20:53:54 -04:00
sysctl_net_ipv4.c net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
tcp.c tcp: send in-queue bytes in cmsg upon read 2018-05-01 18:56:29 -04:00
tcp_bbr.c
tcp_bic.c
tcp_cdg.c
tcp_cong.c
tcp_cubic.c
tcp_dctcp.c
tcp_diag.c
tcp_fastopen.c
tcp_highspeed.c
tcp_htcp.c
tcp_hybla.c
tcp_illinois.c
tcp_input.c tcp: Add clean acked data hook 2018-05-01 09:42:46 -04:00
tcp_ipv4.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next 2018-03-31 23:33:04 -04:00
tcp_lp.c
tcp_metrics.c net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
tcp_minisocks.c crypto : chtls - CPL handler definition 2018-03-31 23:37:32 -04:00
tcp_nv.c
tcp_offload.c
tcp_output.c tcp: remove mss check in tcp_select_initial_window() 2018-04-27 14:05:36 -04:00
tcp_rate.c
tcp_recovery.c
tcp_scalable.c
tcp_timer.c
tcp_ulp.c
tcp_vegas.c
tcp_vegas.h
tcp_veno.c
tcp_westwood.c
tcp_yeah.c
tunnel4.c
udp.c udp: disable gso with no_check_tx 2018-05-01 14:20:14 -04:00
udp_diag.c
udp_impl.h
udp_offload.c udp: remove stray export symbol 2018-04-27 20:32:39 -04:00
udp_tunnel.c
udplite.c net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
xfrm4_input.c
xfrm4_mode_beet.c
xfrm4_mode_transport.c
xfrm4_mode_tunnel.c
xfrm4_output.c
xfrm4_policy.c net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
xfrm4_protocol.c
xfrm4_state.c
xfrm4_tunnel.c