mirror of
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
synced 2025-09-18 22:14:16 +00:00
Lorenz recently reported:
In our TC classifier cls_redirect [0], we use the following sequence of
helper calls to decapsulate a GUE (basically IP + UDP + custom header)
encapsulated packet:
bpf_skb_adjust_room(skb, -encap_len, BPF_ADJ_ROOM_MAC, BPF_F_ADJ_ROOM_FIXED_GSO)
bpf_redirect(skb->ifindex, BPF_F_INGRESS)
It seems like some checksums of the inner headers are not validated in
this case. For example, a TCP SYN packet with invalid TCP checksum is
still accepted by the network stack and elicits a SYN ACK. [...]
That is, we receive the following packet from the driver:
| ETH | IP | UDP | GUE | IP | TCP |
skb->ip_summed == CHECKSUM_UNNECESSARY
ip_summed is CHECKSUM_UNNECESSARY because our NICs do rx checksum offloading.
On this packet we run skb_adjust_room_mac(-encap_len), and get the following:
| ETH | IP | TCP |
skb->ip_summed == CHECKSUM_UNNECESSARY
Note that ip_summed is still CHECKSUM_UNNECESSARY. After bpf_redirect()'ing
into the ingress, we end up in tcp_v4_rcv(). There, skb_checksum_init() is
turned into a no-op due to CHECKSUM_UNNECESSARY.
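The effect described above can be sketched in user space. This is only a simplified model, not kernel code: struct model_skb and model_l4_rcv() are hypothetical names invented for illustration, and the real path through tcp_v4_rcv() and skb_checksum_init() is considerably more involved. The CHECKSUM_* values are assumed to match include/linux/skbuff.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* ip_summed states; values assumed to match include/linux/skbuff.h. */
enum { CHECKSUM_NONE = 0, CHECKSUM_UNNECESSARY = 1, CHECKSUM_COMPLETE = 2 };

/* Hypothetical stand-in for struct sk_buff, reduced to what the model needs. */
struct model_skb {
	uint8_t ip_summed;   /* checksum state reported for the packet */
	bool    l4_csum_ok;  /* what a software verification would find */
};

/*
 * Simplified model of the L4 receive check: returns true when the
 * packet would be accepted. With CHECKSUM_UNNECESSARY the software
 * verification is skipped entirely, so a bad checksum goes unnoticed.
 */
static bool model_l4_rcv(const struct model_skb *skb)
{
	if (skb->ip_summed == CHECKSUM_UNNECESSARY)
		return true;            /* trusted, nothing is verified */
	return skb->l4_csum_ok;         /* verified in software */
}
```

In this model, an inner TCP segment with a broken checksum is accepted as long as ip_summed is still CHECKSUM_UNNECESSARY from the outer UDP validation, and rejected once ip_summed is CHECKSUM_NONE.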
The bpf_skb_adjust_room() helper is not aware of protocol specifics. Internally,
it handles the CHECKSUM_COMPLETE case via skb_postpull_rcsum(), but that does
not cover CHECKSUM_UNNECESSARY. In this case, skb->csum_level of the original
skb prior to the bpf_skb_adjust_room() call was 0, that is, covering UDP. Right now
there is no way to adjust the skb->csum_level. NICs that have checksum offload
disabled (CHECKSUM_NONE) or that support CHECKSUM_COMPLETE are not affected.
Use a safe default for CHECKSUM_UNNECESSARY by resetting to CHECKSUM_NONE and
add a flag to the helper called BPF_F_ADJ_ROOM_NO_CSUM_RESET that allows users
to opt out. Opting out is useful for the case where we don't remove/add
full protocol headers, or for the case where a user wants to adjust the csum
level manually, e.g. through the bpf_csum_level() helper that is added in a
subsequent patch.
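The default and the opt-out can be sketched as a small user-space model. This is an illustration under stated assumptions, not the actual patch: struct csum_state, model_adjust_room_csum() and MODEL_F_NO_CSUM_RESET are hypothetical names, and the bit position chosen for the flag is an assumption standing in for BPF_F_ADJ_ROOM_NO_CSUM_RESET.

```c
#include <stdint.h>

/* ip_summed states; values assumed to match include/linux/skbuff.h. */
enum { CHECKSUM_NONE = 0, CHECKSUM_UNNECESSARY = 1, CHECKSUM_COMPLETE = 2 };

/* Hypothetical placeholder for BPF_F_ADJ_ROOM_NO_CSUM_RESET (bit assumed). */
#define MODEL_F_NO_CSUM_RESET (1ULL << 5)

/* Hypothetical stand-in for the checksum-related fields of struct sk_buff. */
struct csum_state {
	uint8_t ip_summed;
	uint8_t csum_level;
};

/*
 * Model of the checksum handling after a successful room adjustment:
 * unless the caller opted out, a CHECKSUM_UNNECESSARY skb falls back
 * to CHECKSUM_NONE so that the stack verifies checksums in software.
 * Other states are left alone here; CHECKSUM_COMPLETE is handled
 * separately in the real helper and is not modelled.
 */
static void model_adjust_room_csum(struct csum_state *skb, uint64_t flags)
{
	if (flags & MODEL_F_NO_CSUM_RESET)
		return;                 /* caller manages csum state itself */

	if (skb->ip_summed == CHECKSUM_UNNECESSARY) {
		skb->ip_summed = CHECKSUM_NONE;
		skb->csum_level = 0;
	}
}
```

With flags set to 0, a decapsulated skb in this model ends up as CHECKSUM_NONE. Passing the opt-out flag keeps CHECKSUM_UNNECESSARY and the existing csum_level, which matches the case where the program adjusts the level on its own.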
The bpf_skb_proto_{4_to_6,6_to_4}() functions for NAT64/46 translation from the
bpf_skb_change_proto() helper use the bpf_skb_net_hdr_{push,pop}() pair internally
as well, but they don't change layers; they only transition from v4 to v6 and vice
versa, therefore no adaptation is required there.
[0] https://lore.kernel.org/bpf/20200424185556.7358-1-lmb@cloudflare.com/
Fixes:
| | | |
|---|---|---|
| .. | ||
| 6lowpan | ||
| 9p | ||
| 802 | ||
| 8021q | ||
| appletalk | ||
| atm | ||
| ax25 | ||
| batman-adv | ||
| bluetooth | ||
| bpf | ||
| bpfilter | ||
| bridge | ||
| caif | ||
| can | ||
| ceph | ||
| core | ||
| dcb | ||
| dccp | ||
| decnet | ||
| dns_resolver | ||
| dsa | ||
| ethernet | ||
| ethtool | ||
| hsr | ||
| ieee802154 | ||
| ife | ||
| ipv4 | ||
| ipv6 | ||
| iucv | ||
| kcm | ||
| key | ||
| l2tp | ||
| l3mdev | ||
| lapb | ||
| llc | ||
| mac80211 | ||
| mac802154 | ||
| mpls | ||
| mptcp | ||
| ncsi | ||
| netfilter | ||
| netlabel | ||
| netlink | ||
| netrom | ||
| nfc | ||
| nsh | ||
| openvswitch | ||
| packet | ||
| phonet | ||
| psample | ||
| qrtr | ||
| rds | ||
| rfkill | ||
| rose | ||
| rxrpc | ||
| sched | ||
| sctp | ||
| smc | ||
| strparser | ||
| sunrpc | ||
| switchdev | ||
| tipc | ||
| tls | ||
| unix | ||
| vmw_vsock | ||
| wimax | ||
| wireless | ||
| x25 | ||
| xdp | ||
| xfrm | ||
| compat.c | ||
| devres.c | ||
| Kconfig | ||
| Makefile | ||
| socket.c | ||
| sysctl_net.c | ||