linux/net
Chuck Lever 2324fbedc2 xprtrdma: Pad optimization, revisited
The NetApp Linux team discovered that with NFS/RDMA servers that do
not support RFC 8797, the Linux client is forming NFSv4.x WRITE
requests incorrectly.

In this case, the Linux NFS client disables implicit chunk round-up
for odd-length Read and Write chunks. The goal was to support old
servers that needed that padding to be sent explicitly by clients.

In that case the Linux NFS included the tail kvec in the Read chunk,
since the tail contains any needed padding. That meant a separate
memory registration is needed for the tail kvec, adding to the cost
of forming such requests. To avoid that cost for a mere 3 bytes of
zeroes that are always ignored by receivers, we try to use implicit
roundup when possible.

For NFSv4.x, the tail kvec also sometimes contains a trailing
GETATTR operation. The Linux NFS client unintentionally includes
that GETATTR operation in the Read chunk as well as inline.

The fix is simply to /never/ include the tail kvec when forming a
data payload Read chunk. The padding is thus now always present.

Note that since commit 9ed5af268e ("SUNRPC: Clean up the handling
of page padding in rpc_prepare_reply_pages()") [Dec 2020] the NFS
client passes payload data to the transport with the padding in
xdr->pages instead of in the send buffer's tail kvec. So now the
Linux NFS client appends XDR padding to all odd-sized Read chunks.
This shouldn't be a problem because:

 - RFC 8166-compliant servers are supposed to work with or without
   that XDR padding in Read chunks.

 - Since the padding is now in the same memory region as the data
   payload, a separate memory registration is not needed. In
   addition, the link layer extends data in RDMA Read responses to
   4-byte boundaries anyway. Thus there is now no savings when the
   padding is not included.

Because older kernels include the payload's XDR padding in the
tail kvec, a fix there will be more complicated. Thus backporting
this patch is not recommended.

Reported by: Olga Kornievskaia <Olga.Kornievskaia@netapp.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Tom Talpey <tom@talpey.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2021-02-05 11:16:56 -05:00
..
6lowpan
9p 9p for 5.11-rc1 2020-12-21 10:28:02 -08:00
802
8021q net: make free_netdev() more lenient with unregistering devices 2021-01-08 19:27:41 -08:00
appletalk
atm atm: nicstar: Replace in_interrupt() usage 2020-11-18 16:43:55 -08:00
ax25
batman-adv batman-adv: Drop unused soft-interface.h include in fragmentation.c 2020-12-04 08:41:16 +01:00
bluetooth Bluetooth: Increment management interface revision 2020-12-07 17:02:00 +02:00
bpf bpf: Reject too big ctx_size_in for raw_tp test run 2021-01-13 19:31:43 -08:00
bpfilter
bridge net: mrp: move struct definitions out of uapi 2021-01-23 12:38:42 -08:00
caif
can can: isotp: isotp_getname(): fix kernel information leak 2021-01-13 22:15:13 +01:00
ceph libceph: fix "Boolean result is used in bitwise operation" warning 2021-01-21 16:49:59 +01:00
core net: Disable NETIF_F_HW_TLS_RX when RXCSUM is disabled 2021-01-19 15:58:05 -08:00
dcb net: dcb: Accept RTM_GETDCB messages carrying set-like DCB commands 2021-01-12 15:55:21 -08:00
dccp selinux/stable-5.11 PR 20201214 2020-12-16 11:01:04 -08:00
decnet net: decnet: fix netdev refcount leaking on error path 2021-01-27 17:33:46 -08:00
dns_resolver
dsa net: dsa: clear devlink port type before unregistering slave netdevs 2021-01-12 18:48:50 -08:00
ethernet net: datagram: fix some kernel-doc markups 2020-11-17 14:15:03 -08:00
ethtool ethtool: fix error paths in ethnl_set_channels() 2020-12-16 13:27:17 -08:00
hsr
ieee802154 treewide: rename nla_strlcpy to nla_strscpy. 2020-11-16 08:08:54 -08:00
ife
ipv4 tcp: fix TLP timer not set when CA_STATE changes from DISORDER to OPEN 2021-01-23 21:33:01 -08:00
ipv6 ipv6: set multicast flag on the multicast route 2021-01-18 19:52:37 -08:00
iucv net/af_iucv: use DECLARE_SOCKADDR to cast from sockaddr 2020-12-08 15:56:53 -08:00
kcm
key af_key: relax availability checks for skb size calculation 2021-01-04 10:05:50 +01:00
l2tp lsm,selinux: pass flowi_common instead of flowi to the LSM hooks 2020-11-23 18:36:21 -05:00
l3mdev
lapb net: lapb: Add locking to the lapb module 2021-01-26 17:53:45 -08:00
llc
mac80211 mac80211: pause TX while changing interface type 2021-01-26 11:59:45 +01:00
mac802154 net: mac802154: convert tasklets to use new tasklet_setup() API 2020-11-07 10:40:56 -08:00
mpls mpls: drop skb's dst in mpls_forward() 2020-11-03 12:55:53 -08:00
mptcp mptcp: fix locking in mptcp_disconnect() 2021-01-14 11:25:21 -08:00
ncsi net/ncsi: Use real net-device for response handler 2020-12-23 12:22:23 -08:00
netfilter netfilter: nft_dynset: dump expressions when set definition contains no expressions 2021-01-16 19:54:42 +01:00
netlabel Merge https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-11-19 19:08:46 -08:00
netlink
netrom
nfc NFC: fix resource leak when target index is invalid 2021-01-23 13:34:35 -08:00
nsh
openvswitch net: openvswitch: fix TTL decrement exception action execution 2020-12-14 17:18:25 -08:00
packet net: af_packet: fix procfs header for 64-bit pointers 2020-12-18 12:17:23 -08:00
phonet
psample
qrtr net: qrtr: fix null-ptr-deref in qrtr_ns_remove 2021-01-05 16:50:09 -08:00
rds rds: stop using dmapool 2020-11-17 15:22:06 -04:00
rfkill rfkill: add a reason to the HW rfkill state 2020-12-11 12:47:17 +01:00
rose rose: Fix Null pointer dereference in rose_send_frame() 2020-11-20 10:04:58 -08:00
rxrpc rxrpc: Fix memory leak in rxrpc_lookup_local 2021-01-28 13:12:14 -08:00
sched net_sched: avoid shift-out-of-bounds in tcindex_set_parms() 2021-01-15 18:11:10 -08:00
sctp sctp: Fix some typo 2020-11-23 17:44:11 -08:00
smc net/smc: use memcpy instead of snprintf to avoid out of bounds read 2021-01-12 20:22:01 -08:00
strparser
sunrpc xprtrdma: Pad optimization, revisited 2021-02-05 11:16:56 -05:00
switchdev net: switchdev: don't set port_obj_info->handled true when -EOPNOTSUPP 2021-01-27 16:41:59 -08:00
tipc net: tip: fix a couple kernel-doc markups 2021-01-14 10:30:24 -08:00
tls net: fix proc_fs init handling in af_packet and tls 2020-12-14 19:39:30 -08:00
unix
vmw_vsock af_vsock: Assign the vsock transport considering the vsock address flags 2020-12-14 19:33:39 -08:00
wireless wext: fix NULL-ptr-dereference with cfg80211's lack of commit() 2021-01-26 11:59:42 +01:00
x25 net: x25: Remove unimplemented X.25-over-LLC code stubs 2020-12-12 17:15:33 -08:00
xdp xsk: Clear pool even for inactive queues 2021-01-19 22:47:04 +01:00
xfrm Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec 2021-01-21 11:05:10 -08:00
compat.c
devres.c
Kconfig
Makefile
socket.c for-5.11/io_uring-2020-12-14 2020-12-16 12:44:05 -08:00
sysctl_net.c