2025-04-15 13:17:23 +02:00
|
|
|
// SPDX-License-Identifier: GPL-2.0
|
|
|
|
/* OpenVPN data channel offload
|
|
|
|
*
|
|
|
|
* Copyright (C) 2019-2025 OpenVPN, Inc.
|
|
|
|
*
|
|
|
|
* Author: Antonio Quartulli <antonio@openvpn.net>
|
|
|
|
*/
|
|
|
|
|
|
|
|
#include <linux/netdevice.h>
|
2025-04-15 13:17:24 +02:00
|
|
|
#include <linux/inetdevice.h>
|
|
|
|
#include <linux/skbuff.h>
|
2025-04-15 13:17:23 +02:00
|
|
|
#include <linux/socket.h>
|
|
|
|
#include <linux/udp.h>
|
2025-04-15 13:17:24 +02:00
|
|
|
#include <net/addrconf.h>
|
|
|
|
#include <net/dst_cache.h>
|
|
|
|
#include <net/route.h>
|
|
|
|
#include <net/ipv6_stubs.h>
|
2025-04-15 13:17:25 +02:00
|
|
|
#include <net/transp_v6.h>
|
2025-04-15 13:17:23 +02:00
|
|
|
#include <net/udp.h>
|
2025-04-15 13:17:24 +02:00
|
|
|
#include <net/udp_tunnel.h>
|
2025-04-15 13:17:23 +02:00
|
|
|
|
|
|
|
#include "ovpnpriv.h"
|
|
|
|
#include "main.h"
|
2025-04-15 13:17:24 +02:00
|
|
|
#include "bind.h"
|
|
|
|
#include "io.h"
|
|
|
|
#include "peer.h"
|
2025-04-15 13:17:25 +02:00
|
|
|
#include "proto.h"
|
2025-04-15 13:17:23 +02:00
|
|
|
#include "socket.h"
|
|
|
|
#include "udp.h"
|
|
|
|
|
2025-04-15 13:17:25 +02:00
|
|
|
/* Retrieve the corresponding ovpn object from a UDP socket
|
|
|
|
* rcu_read_lock must be held on entry
|
|
|
|
*/
|
|
|
|
static struct ovpn_socket *ovpn_socket_from_udp_sock(struct sock *sk)
|
|
|
|
{
|
|
|
|
struct ovpn_socket *ovpn_sock;
|
|
|
|
|
|
|
|
if (unlikely(READ_ONCE(udp_sk(sk)->encap_type) != UDP_ENCAP_OVPNINUDP))
|
|
|
|
return NULL;
|
|
|
|
|
|
|
|
ovpn_sock = rcu_dereference_sk_user_data(sk);
|
|
|
|
if (unlikely(!ovpn_sock))
|
|
|
|
return NULL;
|
|
|
|
|
|
|
|
/* make sure that sk matches our stored transport socket */
|
ovpn: ensure sk is still valid during cleanup
Removing a peer while userspace attempts to close its transport
socket triggers a race condition resulting in the following
crash:
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000077: 0000 [#1] SMP KASAN
KASAN: null-ptr-deref in range [0x00000000000003b8-0x00000000000003bf]
CPU: 12 UID: 0 PID: 162 Comm: kworker/12:1 Tainted: G O 6.15.0-rc2-00635-g521139ac3840 #272 PREEMPT(full)
Tainted: [O]=OOT_MODULE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-20240910_120124-localhost 04/01/2014
Workqueue: events ovpn_peer_keepalive_work [ovpn]
RIP: 0010:ovpn_socket_release+0x23c/0x500 [ovpn]
Code: ea 03 80 3c 02 00 0f 85 71 02 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8b 64 24 18 49 8d bc 24 be 03 00 00 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 01 38 d0 7c 08 84 d2 0f 85 30
RSP: 0018:ffffc90000c9fb18 EFLAGS: 00010217
RAX: dffffc0000000000 RBX: ffff8881148d7940 RCX: ffffffff817787bb
RDX: 0000000000000077 RSI: 0000000000000008 RDI: 00000000000003be
RBP: ffffc90000c9fb30 R08: 0000000000000000 R09: fffffbfff0d3e840
R10: ffffffff869f4207 R11: 0000000000000000 R12: 0000000000000000
R13: ffff888115eb9300 R14: ffffc90000c9fbc8 R15: 000000000000000c
FS: 0000000000000000(0000) GS:ffff8882b0151000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f37266b6114 CR3: 00000000054a8000 CR4: 0000000000750ef0
PKRU: 55555554
Call Trace:
<TASK>
unlock_ovpn+0x8b/0xe0 [ovpn]
ovpn_peer_keepalive_work+0xe3/0x540 [ovpn]
? ovpn_peers_free+0x780/0x780 [ovpn]
? lock_acquire+0x56/0x70
? process_one_work+0x888/0x1740
process_one_work+0x933/0x1740
? pwq_dec_nr_in_flight+0x10b0/0x10b0
? move_linked_works+0x12d/0x2c0
? assign_work+0x163/0x270
worker_thread+0x4d6/0xd90
? preempt_count_sub+0x4c/0x70
? process_one_work+0x1740/0x1740
kthread+0x36c/0x710
? trace_preempt_on+0x8c/0x1e0
? kthread_is_per_cpu+0xc0/0xc0
? preempt_count_sub+0x4c/0x70
? _raw_spin_unlock_irq+0x36/0x60
? calculate_sigpending+0x7b/0xa0
? kthread_is_per_cpu+0xc0/0xc0
ret_from_fork+0x3a/0x80
? kthread_is_per_cpu+0xc0/0xc0
ret_from_fork_asm+0x11/0x20
</TASK>
Modules linked in: ovpn(O)
This happens because the peer deletion operation reaches
ovpn_socket_release() while ovpn_sock->sock (struct socket *)
and its sk member (struct sock *) are still both valid.
Here synchronize_rcu() is invoked, after which ovpn_sock->sock->sk
becomes NULL, due to the concurrent socket closing triggered
from userspace.
After having invoked synchronize_rcu(), ovpn_socket_release() will
attempt dereferencing ovpn_sock->sock->sk, triggering the crash
reported above.
The reason for accessing sk is that we need to retrieve its
protocol and continue the cleanup routine accordingly.
This crash can be easily produced by running openvpn userspace in
client mode with `--keepalive 10 20`, while entirely omitting this
option on the server side.
After 20 seconds ovpn will assume the peer (server) to be dead,
will start removing it and will notify userspace. The latter will
receive the notification and close the transport socket, thus
triggering the crash.
To fix the race condition for good, we need to refactor struct ovpn_socket.
Since ovpn is always only interested in the sock->sk member (struct sock *)
we can directly hold a reference to it, raher than accessing it via
its struct socket container.
This means changing "struct socket *ovpn_socket->sock" to
"struct sock *ovpn_socket->sk".
While acquiring a reference to sk, we can increase its refcounter
without affecting the socket close()/destroy() notification
(which we rely on when userspace closes a socket we are using).
By increasing sk's refcounter we know we can dereference it
in ovpn_socket_release() without incurring in any race condition
anymore.
ovpn_socket_release() will ultimately decrease the reference
counter.
Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
Fixes: 11851cbd60ea ("ovpn: implement TCP transport")
Reported-by: Qingfang Deng <dqfext@gmail.com>
Closes: https://github.com/OpenVPN/ovpn-net-next/issues/1
Tested-by: Gert Doering <gert@greenie.muc.de>
Link: https://www.mail-archive.com/openvpn-devel@lists.sourceforge.net/msg31575.html
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
2025-04-30 02:26:49 +02:00
|
|
|
if (unlikely(!ovpn_sock->sk || sk != ovpn_sock->sk))
|
2025-04-15 13:17:25 +02:00
|
|
|
return NULL;
|
|
|
|
|
|
|
|
return ovpn_sock;
|
|
|
|
}
|
|
|
|
|
|
|
|
/**
|
|
|
|
* ovpn_udp_encap_recv - Start processing a received UDP packet.
|
|
|
|
* @sk: socket over which the packet was received
|
|
|
|
* @skb: the received packet
|
|
|
|
*
|
|
|
|
* If the first byte of the payload is:
|
|
|
|
* - DATA_V2 the packet is accepted for further processing,
|
|
|
|
* - DATA_V1 the packet is dropped as not supported,
|
|
|
|
* - anything else the packet is forwarded to the UDP stack for
|
|
|
|
* delivery to user space.
|
|
|
|
*
|
|
|
|
* Return:
|
|
|
|
* 0 if skb was consumed or dropped
|
|
|
|
* >0 if skb should be passed up to userspace as UDP (packet not consumed)
|
|
|
|
* <0 if skb should be resubmitted as proto -N (packet not consumed)
|
|
|
|
*/
|
|
|
|
static int ovpn_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
|
|
|
|
{
|
|
|
|
struct ovpn_socket *ovpn_sock;
|
|
|
|
struct ovpn_priv *ovpn;
|
|
|
|
struct ovpn_peer *peer;
|
|
|
|
u32 peer_id;
|
|
|
|
u8 opcode;
|
|
|
|
|
|
|
|
ovpn_sock = ovpn_socket_from_udp_sock(sk);
|
|
|
|
if (unlikely(!ovpn_sock)) {
|
|
|
|
net_err_ratelimited("ovpn: %s invoked on non ovpn socket\n",
|
|
|
|
__func__);
|
|
|
|
goto drop_noovpn;
|
|
|
|
}
|
|
|
|
|
|
|
|
ovpn = ovpn_sock->ovpn;
|
|
|
|
if (unlikely(!ovpn)) {
|
|
|
|
net_err_ratelimited("ovpn: cannot obtain ovpn object from UDP socket\n");
|
|
|
|
goto drop_noovpn;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Make sure the first 4 bytes of the skb data buffer after the UDP
|
|
|
|
* header are accessible.
|
|
|
|
* They are required to fetch the OP code, the key ID and the peer ID.
|
|
|
|
*/
|
|
|
|
if (unlikely(!pskb_may_pull(skb, sizeof(struct udphdr) +
|
|
|
|
OVPN_OPCODE_SIZE))) {
|
|
|
|
net_dbg_ratelimited("%s: packet too small from UDP socket\n",
|
|
|
|
netdev_name(ovpn->dev));
|
|
|
|
goto drop;
|
|
|
|
}
|
|
|
|
|
|
|
|
opcode = ovpn_opcode_from_skb(skb, sizeof(struct udphdr));
|
|
|
|
if (unlikely(opcode != OVPN_DATA_V2)) {
|
|
|
|
/* DATA_V1 is not supported */
|
|
|
|
if (opcode == OVPN_DATA_V1)
|
|
|
|
goto drop;
|
|
|
|
|
|
|
|
/* unknown or control packet: let it bubble up to userspace */
|
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
|
|
|
|
peer_id = ovpn_peer_id_from_skb(skb, sizeof(struct udphdr));
|
|
|
|
/* some OpenVPN server implementations send data packets with the
|
|
|
|
* peer-id set to UNDEF. In this case we skip the peer lookup by peer-id
|
|
|
|
* and we try with the transport address
|
|
|
|
*/
|
|
|
|
if (peer_id == OVPN_PEER_ID_UNDEF)
|
|
|
|
peer = ovpn_peer_get_by_transp_addr(ovpn, skb);
|
|
|
|
else
|
|
|
|
peer = ovpn_peer_get_by_id(ovpn, peer_id);
|
|
|
|
|
|
|
|
if (unlikely(!peer))
|
|
|
|
goto drop;
|
|
|
|
|
|
|
|
/* pop off outer UDP header */
|
|
|
|
__skb_pull(skb, sizeof(struct udphdr));
|
|
|
|
ovpn_recv(peer, skb);
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
drop:
|
|
|
|
dev_dstats_rx_dropped(ovpn->dev);
|
|
|
|
drop_noovpn:
|
|
|
|
kfree_skb(skb);
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2025-04-15 13:17:24 +02:00
|
|
|
/**
|
|
|
|
* ovpn_udp4_output - send IPv4 packet over udp socket
|
|
|
|
* @peer: the destination peer
|
|
|
|
* @bind: the binding related to the destination peer
|
|
|
|
* @cache: dst cache
|
|
|
|
* @sk: the socket to send the packet over
|
|
|
|
* @skb: the packet to send
|
|
|
|
*
|
|
|
|
* Return: 0 on success or a negative error code otherwise
|
|
|
|
*/
|
|
|
|
static int ovpn_udp4_output(struct ovpn_peer *peer, struct ovpn_bind *bind,
|
|
|
|
struct dst_cache *cache, struct sock *sk,
|
|
|
|
struct sk_buff *skb)
|
|
|
|
{
|
|
|
|
struct rtable *rt;
|
|
|
|
struct flowi4 fl = {
|
|
|
|
.saddr = bind->local.ipv4.s_addr,
|
|
|
|
.daddr = bind->remote.in4.sin_addr.s_addr,
|
|
|
|
.fl4_sport = inet_sk(sk)->inet_sport,
|
|
|
|
.fl4_dport = bind->remote.in4.sin_port,
|
|
|
|
.flowi4_proto = sk->sk_protocol,
|
|
|
|
.flowi4_mark = sk->sk_mark,
|
|
|
|
};
|
|
|
|
int ret;
|
|
|
|
|
|
|
|
local_bh_disable();
|
|
|
|
rt = dst_cache_get_ip4(cache, &fl.saddr);
|
|
|
|
if (rt)
|
|
|
|
goto transmit;
|
|
|
|
|
|
|
|
if (unlikely(!inet_confirm_addr(sock_net(sk), NULL, 0, fl.saddr,
|
|
|
|
RT_SCOPE_HOST))) {
|
|
|
|
/* we may end up here when the cached address is not usable
|
|
|
|
* anymore. In this case we reset address/cache and perform a
|
|
|
|
* new look up
|
|
|
|
*/
|
|
|
|
fl.saddr = 0;
|
|
|
|
spin_lock_bh(&peer->lock);
|
|
|
|
bind->local.ipv4.s_addr = 0;
|
|
|
|
spin_unlock_bh(&peer->lock);
|
|
|
|
dst_cache_reset(cache);
|
|
|
|
}
|
|
|
|
|
|
|
|
rt = ip_route_output_flow(sock_net(sk), &fl, sk);
|
|
|
|
if (IS_ERR(rt) && PTR_ERR(rt) == -EINVAL) {
|
|
|
|
fl.saddr = 0;
|
|
|
|
spin_lock_bh(&peer->lock);
|
|
|
|
bind->local.ipv4.s_addr = 0;
|
|
|
|
spin_unlock_bh(&peer->lock);
|
|
|
|
dst_cache_reset(cache);
|
|
|
|
|
|
|
|
rt = ip_route_output_flow(sock_net(sk), &fl, sk);
|
|
|
|
}
|
|
|
|
|
|
|
|
if (IS_ERR(rt)) {
|
|
|
|
ret = PTR_ERR(rt);
|
|
|
|
net_dbg_ratelimited("%s: no route to host %pISpc: %d\n",
|
|
|
|
netdev_name(peer->ovpn->dev),
|
|
|
|
&bind->remote.in4,
|
|
|
|
ret);
|
|
|
|
goto err;
|
|
|
|
}
|
|
|
|
dst_cache_set_ip4(cache, &rt->dst, fl.saddr);
|
|
|
|
|
|
|
|
transmit:
|
|
|
|
udp_tunnel_xmit_skb(rt, sk, skb, fl.saddr, fl.daddr, 0,
|
|
|
|
ip4_dst_hoplimit(&rt->dst), 0, fl.fl4_sport,
|
2025-06-17 00:44:09 +02:00
|
|
|
fl.fl4_dport, false, sk->sk_no_check_tx, 0);
|
2025-04-15 13:17:24 +02:00
|
|
|
ret = 0;
|
|
|
|
err:
|
|
|
|
local_bh_enable();
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
|
|
|
#if IS_ENABLED(CONFIG_IPV6)
|
|
|
|
/**
|
|
|
|
* ovpn_udp6_output - send IPv6 packet over udp socket
|
|
|
|
* @peer: the destination peer
|
|
|
|
* @bind: the binding related to the destination peer
|
|
|
|
* @cache: dst cache
|
|
|
|
* @sk: the socket to send the packet over
|
|
|
|
* @skb: the packet to send
|
|
|
|
*
|
|
|
|
* Return: 0 on success or a negative error code otherwise
|
|
|
|
*/
|
|
|
|
static int ovpn_udp6_output(struct ovpn_peer *peer, struct ovpn_bind *bind,
|
|
|
|
struct dst_cache *cache, struct sock *sk,
|
|
|
|
struct sk_buff *skb)
|
|
|
|
{
|
|
|
|
struct dst_entry *dst;
|
|
|
|
int ret;
|
|
|
|
|
|
|
|
struct flowi6 fl = {
|
|
|
|
.saddr = bind->local.ipv6,
|
|
|
|
.daddr = bind->remote.in6.sin6_addr,
|
|
|
|
.fl6_sport = inet_sk(sk)->inet_sport,
|
|
|
|
.fl6_dport = bind->remote.in6.sin6_port,
|
|
|
|
.flowi6_proto = sk->sk_protocol,
|
|
|
|
.flowi6_mark = sk->sk_mark,
|
|
|
|
.flowi6_oif = bind->remote.in6.sin6_scope_id,
|
|
|
|
};
|
|
|
|
|
|
|
|
local_bh_disable();
|
|
|
|
dst = dst_cache_get_ip6(cache, &fl.saddr);
|
|
|
|
if (dst)
|
|
|
|
goto transmit;
|
|
|
|
|
|
|
|
if (unlikely(!ipv6_chk_addr(sock_net(sk), &fl.saddr, NULL, 0))) {
|
|
|
|
/* we may end up here when the cached address is not usable
|
|
|
|
* anymore. In this case we reset address/cache and perform a
|
|
|
|
* new look up
|
|
|
|
*/
|
|
|
|
fl.saddr = in6addr_any;
|
|
|
|
spin_lock_bh(&peer->lock);
|
|
|
|
bind->local.ipv6 = in6addr_any;
|
|
|
|
spin_unlock_bh(&peer->lock);
|
|
|
|
dst_cache_reset(cache);
|
|
|
|
}
|
|
|
|
|
|
|
|
dst = ipv6_stub->ipv6_dst_lookup_flow(sock_net(sk), sk, &fl, NULL);
|
|
|
|
if (IS_ERR(dst)) {
|
|
|
|
ret = PTR_ERR(dst);
|
|
|
|
net_dbg_ratelimited("%s: no route to host %pISpc: %d\n",
|
|
|
|
netdev_name(peer->ovpn->dev),
|
|
|
|
&bind->remote.in6, ret);
|
|
|
|
goto err;
|
|
|
|
}
|
|
|
|
dst_cache_set_ip6(cache, dst, &fl.saddr);
|
|
|
|
|
|
|
|
transmit:
|
2025-05-06 01:05:01 +02:00
|
|
|
/* user IPv6 packets may be larger than the transport interface
|
|
|
|
* MTU (after encapsulation), however, since they are locally
|
|
|
|
* generated we should ensure they get fragmented.
|
|
|
|
* Setting the ignore_df flag to 1 will instruct ip6_fragment() to
|
|
|
|
* fragment packets if needed.
|
|
|
|
*
|
|
|
|
* NOTE: this is not needed for IPv4 because we pass df=0 to
|
|
|
|
* udp_tunnel_xmit_skb()
|
|
|
|
*/
|
|
|
|
skb->ignore_df = 1;
|
2025-04-15 13:17:24 +02:00
|
|
|
udp_tunnel6_xmit_skb(dst, sk, skb, skb->dev, &fl.saddr, &fl.daddr, 0,
|
|
|
|
ip6_dst_hoplimit(dst), 0, fl.fl6_sport,
|
2025-06-17 00:44:14 +02:00
|
|
|
fl.fl6_dport, udp_get_no_check6_tx(sk), 0);
|
2025-04-15 13:17:24 +02:00
|
|
|
ret = 0;
|
|
|
|
err:
|
|
|
|
local_bh_enable();
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
#endif
|
|
|
|
|
|
|
|
/**
|
|
|
|
* ovpn_udp_output - transmit skb using udp-tunnel
|
|
|
|
* @peer: the destination peer
|
|
|
|
* @cache: dst cache
|
|
|
|
* @sk: the socket to send the packet over
|
|
|
|
* @skb: the packet to send
|
|
|
|
*
|
|
|
|
* rcu_read_lock should be held on entry.
|
|
|
|
* On return, the skb is consumed.
|
|
|
|
*
|
|
|
|
* Return: 0 on success or a negative error code otherwise
|
|
|
|
*/
|
|
|
|
static int ovpn_udp_output(struct ovpn_peer *peer, struct dst_cache *cache,
|
|
|
|
struct sock *sk, struct sk_buff *skb)
|
|
|
|
{
|
|
|
|
struct ovpn_bind *bind;
|
|
|
|
int ret;
|
|
|
|
|
|
|
|
/* set sk to null if skb is already orphaned */
|
|
|
|
if (!skb->destructor)
|
|
|
|
skb->sk = NULL;
|
|
|
|
|
|
|
|
rcu_read_lock();
|
|
|
|
bind = rcu_dereference(peer->bind);
|
|
|
|
if (unlikely(!bind)) {
|
|
|
|
net_warn_ratelimited("%s: no bind for remote peer %u\n",
|
|
|
|
netdev_name(peer->ovpn->dev), peer->id);
|
|
|
|
ret = -ENODEV;
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
|
|
|
switch (bind->remote.in4.sin_family) {
|
|
|
|
case AF_INET:
|
|
|
|
ret = ovpn_udp4_output(peer, bind, cache, sk, skb);
|
|
|
|
break;
|
|
|
|
#if IS_ENABLED(CONFIG_IPV6)
|
|
|
|
case AF_INET6:
|
|
|
|
ret = ovpn_udp6_output(peer, bind, cache, sk, skb);
|
|
|
|
break;
|
|
|
|
#endif
|
|
|
|
default:
|
|
|
|
ret = -EAFNOSUPPORT;
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
|
|
|
out:
|
|
|
|
rcu_read_unlock();
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
|
|
|
/**
|
|
|
|
* ovpn_udp_send_skb - prepare skb and send it over via UDP
|
|
|
|
* @peer: the destination peer
|
ovpn: ensure sk is still valid during cleanup
Removing a peer while userspace attempts to close its transport
socket triggers a race condition resulting in the following
crash:
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000077: 0000 [#1] SMP KASAN
KASAN: null-ptr-deref in range [0x00000000000003b8-0x00000000000003bf]
CPU: 12 UID: 0 PID: 162 Comm: kworker/12:1 Tainted: G O 6.15.0-rc2-00635-g521139ac3840 #272 PREEMPT(full)
Tainted: [O]=OOT_MODULE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-20240910_120124-localhost 04/01/2014
Workqueue: events ovpn_peer_keepalive_work [ovpn]
RIP: 0010:ovpn_socket_release+0x23c/0x500 [ovpn]
Code: ea 03 80 3c 02 00 0f 85 71 02 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8b 64 24 18 49 8d bc 24 be 03 00 00 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 01 38 d0 7c 08 84 d2 0f 85 30
RSP: 0018:ffffc90000c9fb18 EFLAGS: 00010217
RAX: dffffc0000000000 RBX: ffff8881148d7940 RCX: ffffffff817787bb
RDX: 0000000000000077 RSI: 0000000000000008 RDI: 00000000000003be
RBP: ffffc90000c9fb30 R08: 0000000000000000 R09: fffffbfff0d3e840
R10: ffffffff869f4207 R11: 0000000000000000 R12: 0000000000000000
R13: ffff888115eb9300 R14: ffffc90000c9fbc8 R15: 000000000000000c
FS: 0000000000000000(0000) GS:ffff8882b0151000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f37266b6114 CR3: 00000000054a8000 CR4: 0000000000750ef0
PKRU: 55555554
Call Trace:
<TASK>
unlock_ovpn+0x8b/0xe0 [ovpn]
ovpn_peer_keepalive_work+0xe3/0x540 [ovpn]
? ovpn_peers_free+0x780/0x780 [ovpn]
? lock_acquire+0x56/0x70
? process_one_work+0x888/0x1740
process_one_work+0x933/0x1740
? pwq_dec_nr_in_flight+0x10b0/0x10b0
? move_linked_works+0x12d/0x2c0
? assign_work+0x163/0x270
worker_thread+0x4d6/0xd90
? preempt_count_sub+0x4c/0x70
? process_one_work+0x1740/0x1740
kthread+0x36c/0x710
? trace_preempt_on+0x8c/0x1e0
? kthread_is_per_cpu+0xc0/0xc0
? preempt_count_sub+0x4c/0x70
? _raw_spin_unlock_irq+0x36/0x60
? calculate_sigpending+0x7b/0xa0
? kthread_is_per_cpu+0xc0/0xc0
ret_from_fork+0x3a/0x80
? kthread_is_per_cpu+0xc0/0xc0
ret_from_fork_asm+0x11/0x20
</TASK>
Modules linked in: ovpn(O)
This happens because the peer deletion operation reaches
ovpn_socket_release() while ovpn_sock->sock (struct socket *)
and its sk member (struct sock *) are still both valid.
Here synchronize_rcu() is invoked, after which ovpn_sock->sock->sk
becomes NULL, due to the concurrent socket closing triggered
from userspace.
After having invoked synchronize_rcu(), ovpn_socket_release() will
attempt dereferencing ovpn_sock->sock->sk, triggering the crash
reported above.
The reason for accessing sk is that we need to retrieve its
protocol and continue the cleanup routine accordingly.
This crash can be easily produced by running openvpn userspace in
client mode with `--keepalive 10 20`, while entirely omitting this
option on the server side.
After 20 seconds ovpn will assume the peer (server) to be dead,
will start removing it and will notify userspace. The latter will
receive the notification and close the transport socket, thus
triggering the crash.
To fix the race condition for good, we need to refactor struct ovpn_socket.
Since ovpn is always only interested in the sock->sk member (struct sock *)
we can directly hold a reference to it, raher than accessing it via
its struct socket container.
This means changing "struct socket *ovpn_socket->sock" to
"struct sock *ovpn_socket->sk".
While acquiring a reference to sk, we can increase its refcounter
without affecting the socket close()/destroy() notification
(which we rely on when userspace closes a socket we are using).
By increasing sk's refcounter we know we can dereference it
in ovpn_socket_release() without incurring in any race condition
anymore.
ovpn_socket_release() will ultimately decrease the reference
counter.
Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
Fixes: 11851cbd60ea ("ovpn: implement TCP transport")
Reported-by: Qingfang Deng <dqfext@gmail.com>
Closes: https://github.com/OpenVPN/ovpn-net-next/issues/1
Tested-by: Gert Doering <gert@greenie.muc.de>
Link: https://www.mail-archive.com/openvpn-devel@lists.sourceforge.net/msg31575.html
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
2025-04-30 02:26:49 +02:00
|
|
|
* @sk: peer socket
|
2025-04-15 13:17:24 +02:00
|
|
|
* @skb: the packet to send
|
|
|
|
*/
|
ovpn: ensure sk is still valid during cleanup
Removing a peer while userspace attempts to close its transport
socket triggers a race condition resulting in the following
crash:
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000077: 0000 [#1] SMP KASAN
KASAN: null-ptr-deref in range [0x00000000000003b8-0x00000000000003bf]
CPU: 12 UID: 0 PID: 162 Comm: kworker/12:1 Tainted: G O 6.15.0-rc2-00635-g521139ac3840 #272 PREEMPT(full)
Tainted: [O]=OOT_MODULE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-20240910_120124-localhost 04/01/2014
Workqueue: events ovpn_peer_keepalive_work [ovpn]
RIP: 0010:ovpn_socket_release+0x23c/0x500 [ovpn]
Code: ea 03 80 3c 02 00 0f 85 71 02 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8b 64 24 18 49 8d bc 24 be 03 00 00 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 01 38 d0 7c 08 84 d2 0f 85 30
RSP: 0018:ffffc90000c9fb18 EFLAGS: 00010217
RAX: dffffc0000000000 RBX: ffff8881148d7940 RCX: ffffffff817787bb
RDX: 0000000000000077 RSI: 0000000000000008 RDI: 00000000000003be
RBP: ffffc90000c9fb30 R08: 0000000000000000 R09: fffffbfff0d3e840
R10: ffffffff869f4207 R11: 0000000000000000 R12: 0000000000000000
R13: ffff888115eb9300 R14: ffffc90000c9fbc8 R15: 000000000000000c
FS: 0000000000000000(0000) GS:ffff8882b0151000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f37266b6114 CR3: 00000000054a8000 CR4: 0000000000750ef0
PKRU: 55555554
Call Trace:
<TASK>
unlock_ovpn+0x8b/0xe0 [ovpn]
ovpn_peer_keepalive_work+0xe3/0x540 [ovpn]
? ovpn_peers_free+0x780/0x780 [ovpn]
? lock_acquire+0x56/0x70
? process_one_work+0x888/0x1740
process_one_work+0x933/0x1740
? pwq_dec_nr_in_flight+0x10b0/0x10b0
? move_linked_works+0x12d/0x2c0
? assign_work+0x163/0x270
worker_thread+0x4d6/0xd90
? preempt_count_sub+0x4c/0x70
? process_one_work+0x1740/0x1740
kthread+0x36c/0x710
? trace_preempt_on+0x8c/0x1e0
? kthread_is_per_cpu+0xc0/0xc0
? preempt_count_sub+0x4c/0x70
? _raw_spin_unlock_irq+0x36/0x60
? calculate_sigpending+0x7b/0xa0
? kthread_is_per_cpu+0xc0/0xc0
ret_from_fork+0x3a/0x80
? kthread_is_per_cpu+0xc0/0xc0
ret_from_fork_asm+0x11/0x20
</TASK>
Modules linked in: ovpn(O)
This happens because the peer deletion operation reaches
ovpn_socket_release() while ovpn_sock->sock (struct socket *)
and its sk member (struct sock *) are still both valid.
Here synchronize_rcu() is invoked, after which ovpn_sock->sock->sk
becomes NULL, due to the concurrent socket closing triggered
from userspace.
After having invoked synchronize_rcu(), ovpn_socket_release() will
attempt dereferencing ovpn_sock->sock->sk, triggering the crash
reported above.
The reason for accessing sk is that we need to retrieve its
protocol and continue the cleanup routine accordingly.
This crash can be easily produced by running openvpn userspace in
client mode with `--keepalive 10 20`, while entirely omitting this
option on the server side.
After 20 seconds ovpn will assume the peer (server) to be dead,
will start removing it and will notify userspace. The latter will
receive the notification and close the transport socket, thus
triggering the crash.
To fix the race condition for good, we need to refactor struct ovpn_socket.
Since ovpn is always only interested in the sock->sk member (struct sock *)
we can directly hold a reference to it, raher than accessing it via
its struct socket container.
This means changing "struct socket *ovpn_socket->sock" to
"struct sock *ovpn_socket->sk".
While acquiring a reference to sk, we can increase its refcounter
without affecting the socket close()/destroy() notification
(which we rely on when userspace closes a socket we are using).
By increasing sk's refcounter we know we can dereference it
in ovpn_socket_release() without incurring in any race condition
anymore.
ovpn_socket_release() will ultimately decrease the reference
counter.
Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
Fixes: 11851cbd60ea ("ovpn: implement TCP transport")
Reported-by: Qingfang Deng <dqfext@gmail.com>
Closes: https://github.com/OpenVPN/ovpn-net-next/issues/1
Tested-by: Gert Doering <gert@greenie.muc.de>
Link: https://www.mail-archive.com/openvpn-devel@lists.sourceforge.net/msg31575.html
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
2025-04-30 02:26:49 +02:00
|
|
|
void ovpn_udp_send_skb(struct ovpn_peer *peer, struct sock *sk,
|
2025-04-15 13:17:24 +02:00
|
|
|
struct sk_buff *skb)
|
|
|
|
{
|
ovpn: ensure sk is still valid during cleanup
Removing a peer while userspace attempts to close its transport
socket triggers a race condition resulting in the following
crash:
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000077: 0000 [#1] SMP KASAN
KASAN: null-ptr-deref in range [0x00000000000003b8-0x00000000000003bf]
CPU: 12 UID: 0 PID: 162 Comm: kworker/12:1 Tainted: G O 6.15.0-rc2-00635-g521139ac3840 #272 PREEMPT(full)
Tainted: [O]=OOT_MODULE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-20240910_120124-localhost 04/01/2014
Workqueue: events ovpn_peer_keepalive_work [ovpn]
RIP: 0010:ovpn_socket_release+0x23c/0x500 [ovpn]
Code: ea 03 80 3c 02 00 0f 85 71 02 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8b 64 24 18 49 8d bc 24 be 03 00 00 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 01 38 d0 7c 08 84 d2 0f 85 30
RSP: 0018:ffffc90000c9fb18 EFLAGS: 00010217
RAX: dffffc0000000000 RBX: ffff8881148d7940 RCX: ffffffff817787bb
RDX: 0000000000000077 RSI: 0000000000000008 RDI: 00000000000003be
RBP: ffffc90000c9fb30 R08: 0000000000000000 R09: fffffbfff0d3e840
R10: ffffffff869f4207 R11: 0000000000000000 R12: 0000000000000000
R13: ffff888115eb9300 R14: ffffc90000c9fbc8 R15: 000000000000000c
FS: 0000000000000000(0000) GS:ffff8882b0151000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f37266b6114 CR3: 00000000054a8000 CR4: 0000000000750ef0
PKRU: 55555554
Call Trace:
<TASK>
unlock_ovpn+0x8b/0xe0 [ovpn]
ovpn_peer_keepalive_work+0xe3/0x540 [ovpn]
? ovpn_peers_free+0x780/0x780 [ovpn]
? lock_acquire+0x56/0x70
? process_one_work+0x888/0x1740
process_one_work+0x933/0x1740
? pwq_dec_nr_in_flight+0x10b0/0x10b0
? move_linked_works+0x12d/0x2c0
? assign_work+0x163/0x270
worker_thread+0x4d6/0xd90
? preempt_count_sub+0x4c/0x70
? process_one_work+0x1740/0x1740
kthread+0x36c/0x710
? trace_preempt_on+0x8c/0x1e0
? kthread_is_per_cpu+0xc0/0xc0
? preempt_count_sub+0x4c/0x70
? _raw_spin_unlock_irq+0x36/0x60
? calculate_sigpending+0x7b/0xa0
? kthread_is_per_cpu+0xc0/0xc0
ret_from_fork+0x3a/0x80
? kthread_is_per_cpu+0xc0/0xc0
ret_from_fork_asm+0x11/0x20
</TASK>
Modules linked in: ovpn(O)
This happens because the peer deletion operation reaches
ovpn_socket_release() while ovpn_sock->sock (struct socket *)
and its sk member (struct sock *) are still both valid.
Here synchronize_rcu() is invoked, after which ovpn_sock->sock->sk
becomes NULL, due to the concurrent socket closing triggered
from userspace.
After having invoked synchronize_rcu(), ovpn_socket_release() will
attempt dereferencing ovpn_sock->sock->sk, triggering the crash
reported above.
The reason for accessing sk is that we need to retrieve its
protocol and continue the cleanup routine accordingly.
This crash can be easily produced by running openvpn userspace in
client mode with `--keepalive 10 20`, while entirely omitting this
option on the server side.
After 20 seconds ovpn will assume the peer (server) to be dead,
will start removing it and will notify userspace. The latter will
receive the notification and close the transport socket, thus
triggering the crash.
To fix the race condition for good, we need to refactor struct ovpn_socket.
Since ovpn is always only interested in the sock->sk member (struct sock *)
we can directly hold a reference to it, raher than accessing it via
its struct socket container.
This means changing "struct socket *ovpn_socket->sock" to
"struct sock *ovpn_socket->sk".
While acquiring a reference to sk, we can increase its refcounter
without affecting the socket close()/destroy() notification
(which we rely on when userspace closes a socket we are using).
By increasing sk's refcounter we know we can dereference it
in ovpn_socket_release() without incurring in any race condition
anymore.
ovpn_socket_release() will ultimately decrease the reference
counter.
Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
Fixes: 11851cbd60ea ("ovpn: implement TCP transport")
Reported-by: Qingfang Deng <dqfext@gmail.com>
Closes: https://github.com/OpenVPN/ovpn-net-next/issues/1
Tested-by: Gert Doering <gert@greenie.muc.de>
Link: https://www.mail-archive.com/openvpn-devel@lists.sourceforge.net/msg31575.html
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
2025-04-30 02:26:49 +02:00
|
|
|
int ret;
|
2025-04-15 13:17:24 +02:00
|
|
|
|
|
|
|
skb->dev = peer->ovpn->dev;
|
2025-06-04 15:11:58 +02:00
|
|
|
skb->mark = READ_ONCE(sk->sk_mark);
|
2025-04-15 13:17:24 +02:00
|
|
|
/* no checksum performed at this layer */
|
|
|
|
skb->ip_summed = CHECKSUM_NONE;
|
|
|
|
|
|
|
|
/* crypto layer -> transport (UDP) */
|
ovpn: ensure sk is still valid during cleanup
Removing a peer while userspace attempts to close its transport
socket triggers a race condition resulting in the following
crash:
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000077: 0000 [#1] SMP KASAN
KASAN: null-ptr-deref in range [0x00000000000003b8-0x00000000000003bf]
CPU: 12 UID: 0 PID: 162 Comm: kworker/12:1 Tainted: G O 6.15.0-rc2-00635-g521139ac3840 #272 PREEMPT(full)
Tainted: [O]=OOT_MODULE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-20240910_120124-localhost 04/01/2014
Workqueue: events ovpn_peer_keepalive_work [ovpn]
RIP: 0010:ovpn_socket_release+0x23c/0x500 [ovpn]
Code: ea 03 80 3c 02 00 0f 85 71 02 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8b 64 24 18 49 8d bc 24 be 03 00 00 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 01 38 d0 7c 08 84 d2 0f 85 30
RSP: 0018:ffffc90000c9fb18 EFLAGS: 00010217
RAX: dffffc0000000000 RBX: ffff8881148d7940 RCX: ffffffff817787bb
RDX: 0000000000000077 RSI: 0000000000000008 RDI: 00000000000003be
RBP: ffffc90000c9fb30 R08: 0000000000000000 R09: fffffbfff0d3e840
R10: ffffffff869f4207 R11: 0000000000000000 R12: 0000000000000000
R13: ffff888115eb9300 R14: ffffc90000c9fbc8 R15: 000000000000000c
FS: 0000000000000000(0000) GS:ffff8882b0151000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f37266b6114 CR3: 00000000054a8000 CR4: 0000000000750ef0
PKRU: 55555554
Call Trace:
<TASK>
unlock_ovpn+0x8b/0xe0 [ovpn]
ovpn_peer_keepalive_work+0xe3/0x540 [ovpn]
? ovpn_peers_free+0x780/0x780 [ovpn]
? lock_acquire+0x56/0x70
? process_one_work+0x888/0x1740
process_one_work+0x933/0x1740
? pwq_dec_nr_in_flight+0x10b0/0x10b0
? move_linked_works+0x12d/0x2c0
? assign_work+0x163/0x270
worker_thread+0x4d6/0xd90
? preempt_count_sub+0x4c/0x70
? process_one_work+0x1740/0x1740
kthread+0x36c/0x710
? trace_preempt_on+0x8c/0x1e0
? kthread_is_per_cpu+0xc0/0xc0
? preempt_count_sub+0x4c/0x70
? _raw_spin_unlock_irq+0x36/0x60
? calculate_sigpending+0x7b/0xa0
? kthread_is_per_cpu+0xc0/0xc0
ret_from_fork+0x3a/0x80
? kthread_is_per_cpu+0xc0/0xc0
ret_from_fork_asm+0x11/0x20
</TASK>
Modules linked in: ovpn(O)
This happens because the peer deletion operation reaches
ovpn_socket_release() while ovpn_sock->sock (struct socket *)
and its sk member (struct sock *) are still both valid.
Here synchronize_rcu() is invoked, after which ovpn_sock->sock->sk
becomes NULL, due to the concurrent socket closing triggered
from userspace.
After having invoked synchronize_rcu(), ovpn_socket_release() will
attempt dereferencing ovpn_sock->sock->sk, triggering the crash
reported above.
The reason for accessing sk is that we need to retrieve its
protocol and continue the cleanup routine accordingly.
This crash can be easily produced by running openvpn userspace in
client mode with `--keepalive 10 20`, while entirely omitting this
option on the server side.
After 20 seconds ovpn will assume the peer (server) to be dead,
will start removing it and will notify userspace. The latter will
receive the notification and close the transport socket, thus
triggering the crash.
To fix the race condition for good, we need to refactor struct ovpn_socket.
Since ovpn is always only interested in the sock->sk member (struct sock *)
we can directly hold a reference to it, raher than accessing it via
its struct socket container.
This means changing "struct socket *ovpn_socket->sock" to
"struct sock *ovpn_socket->sk".
While acquiring a reference to sk, we can increase its refcounter
without affecting the socket close()/destroy() notification
(which we rely on when userspace closes a socket we are using).
By increasing sk's refcounter we know we can dereference it
in ovpn_socket_release() without incurring in any race condition
anymore.
ovpn_socket_release() will ultimately decrease the reference
counter.
Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
Fixes: 11851cbd60ea ("ovpn: implement TCP transport")
Reported-by: Qingfang Deng <dqfext@gmail.com>
Closes: https://github.com/OpenVPN/ovpn-net-next/issues/1
Tested-by: Gert Doering <gert@greenie.muc.de>
Link: https://www.mail-archive.com/openvpn-devel@lists.sourceforge.net/msg31575.html
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
2025-04-30 02:26:49 +02:00
|
|
|
ret = ovpn_udp_output(peer, &peer->dst_cache, sk, skb);
|
|
|
|
if (unlikely(ret < 0))
|
2025-04-15 13:17:24 +02:00
|
|
|
kfree_skb(skb);
|
|
|
|
}
|
|
|
|
|
2025-04-15 13:17:25 +02:00
|
|
|
static void ovpn_udp_encap_destroy(struct sock *sk)
|
|
|
|
{
|
|
|
|
struct ovpn_socket *sock;
|
|
|
|
struct ovpn_priv *ovpn;
|
|
|
|
|
|
|
|
rcu_read_lock();
|
|
|
|
sock = rcu_dereference_sk_user_data(sk);
|
|
|
|
if (!sock || !sock->ovpn) {
|
|
|
|
rcu_read_unlock();
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
ovpn = sock->ovpn;
|
|
|
|
rcu_read_unlock();
|
|
|
|
|
2025-04-15 13:17:31 +02:00
|
|
|
ovpn_peers_free(ovpn, sk, OVPN_DEL_PEER_REASON_TRANSPORT_DISCONNECT);
|
2025-04-15 13:17:25 +02:00
|
|
|
}
|
|
|
|
|
2025-04-15 13:17:23 +02:00
|
|
|
/**
|
|
|
|
* ovpn_udp_socket_attach - set udp-tunnel CBs on socket and link it to ovpn
|
|
|
|
* @ovpn_sock: socket to configure
|
ovpn: ensure sk is still valid during cleanup
Removing a peer while userspace attempts to close its transport
socket triggers a race condition resulting in the following
crash:
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000077: 0000 [#1] SMP KASAN
KASAN: null-ptr-deref in range [0x00000000000003b8-0x00000000000003bf]
CPU: 12 UID: 0 PID: 162 Comm: kworker/12:1 Tainted: G O 6.15.0-rc2-00635-g521139ac3840 #272 PREEMPT(full)
Tainted: [O]=OOT_MODULE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-20240910_120124-localhost 04/01/2014
Workqueue: events ovpn_peer_keepalive_work [ovpn]
RIP: 0010:ovpn_socket_release+0x23c/0x500 [ovpn]
Code: ea 03 80 3c 02 00 0f 85 71 02 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8b 64 24 18 49 8d bc 24 be 03 00 00 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 01 38 d0 7c 08 84 d2 0f 85 30
RSP: 0018:ffffc90000c9fb18 EFLAGS: 00010217
RAX: dffffc0000000000 RBX: ffff8881148d7940 RCX: ffffffff817787bb
RDX: 0000000000000077 RSI: 0000000000000008 RDI: 00000000000003be
RBP: ffffc90000c9fb30 R08: 0000000000000000 R09: fffffbfff0d3e840
R10: ffffffff869f4207 R11: 0000000000000000 R12: 0000000000000000
R13: ffff888115eb9300 R14: ffffc90000c9fbc8 R15: 000000000000000c
FS: 0000000000000000(0000) GS:ffff8882b0151000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f37266b6114 CR3: 00000000054a8000 CR4: 0000000000750ef0
PKRU: 55555554
Call Trace:
<TASK>
unlock_ovpn+0x8b/0xe0 [ovpn]
ovpn_peer_keepalive_work+0xe3/0x540 [ovpn]
? ovpn_peers_free+0x780/0x780 [ovpn]
? lock_acquire+0x56/0x70
? process_one_work+0x888/0x1740
process_one_work+0x933/0x1740
? pwq_dec_nr_in_flight+0x10b0/0x10b0
? move_linked_works+0x12d/0x2c0
? assign_work+0x163/0x270
worker_thread+0x4d6/0xd90
? preempt_count_sub+0x4c/0x70
? process_one_work+0x1740/0x1740
kthread+0x36c/0x710
? trace_preempt_on+0x8c/0x1e0
? kthread_is_per_cpu+0xc0/0xc0
? preempt_count_sub+0x4c/0x70
? _raw_spin_unlock_irq+0x36/0x60
? calculate_sigpending+0x7b/0xa0
? kthread_is_per_cpu+0xc0/0xc0
ret_from_fork+0x3a/0x80
? kthread_is_per_cpu+0xc0/0xc0
ret_from_fork_asm+0x11/0x20
</TASK>
Modules linked in: ovpn(O)
This happens because the peer deletion operation reaches
ovpn_socket_release() while ovpn_sock->sock (struct socket *)
and its sk member (struct sock *) are still both valid.
Here synchronize_rcu() is invoked, after which ovpn_sock->sock->sk
becomes NULL, due to the concurrent socket closing triggered
from userspace.
After having invoked synchronize_rcu(), ovpn_socket_release() will
attempt dereferencing ovpn_sock->sock->sk, triggering the crash
reported above.
The reason for accessing sk is that we need to retrieve its
protocol and continue the cleanup routine accordingly.
This crash can be easily produced by running openvpn userspace in
client mode with `--keepalive 10 20`, while entirely omitting this
option on the server side.
After 20 seconds ovpn will assume the peer (server) to be dead,
will start removing it and will notify userspace. The latter will
receive the notification and close the transport socket, thus
triggering the crash.
To fix the race condition for good, we need to refactor struct ovpn_socket.
Since ovpn is always only interested in the sock->sk member (struct sock *)
we can directly hold a reference to it, raher than accessing it via
its struct socket container.
This means changing "struct socket *ovpn_socket->sock" to
"struct sock *ovpn_socket->sk".
While acquiring a reference to sk, we can increase its refcounter
without affecting the socket close()/destroy() notification
(which we rely on when userspace closes a socket we are using).
By increasing sk's refcounter we know we can dereference it
in ovpn_socket_release() without incurring in any race condition
anymore.
ovpn_socket_release() will ultimately decrease the reference
counter.
Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
Fixes: 11851cbd60ea ("ovpn: implement TCP transport")
Reported-by: Qingfang Deng <dqfext@gmail.com>
Closes: https://github.com/OpenVPN/ovpn-net-next/issues/1
Tested-by: Gert Doering <gert@greenie.muc.de>
Link: https://www.mail-archive.com/openvpn-devel@lists.sourceforge.net/msg31575.html
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
2025-04-30 02:26:49 +02:00
|
|
|
* @sock: the socket container to be passed to setup_udp_tunnel_sock()
|
2025-04-15 13:17:23 +02:00
|
|
|
* @ovpn: the openvp instance to link
|
|
|
|
*
|
|
|
|
* After invoking this function, the sock will be controlled by ovpn so that
|
|
|
|
* any incoming packet may be processed by ovpn first.
|
|
|
|
*
|
|
|
|
* Return: 0 on success or a negative error code otherwise
|
|
|
|
*/
|
ovpn: ensure sk is still valid during cleanup
Removing a peer while userspace attempts to close its transport
socket triggers a race condition resulting in the following
crash:
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000077: 0000 [#1] SMP KASAN
KASAN: null-ptr-deref in range [0x00000000000003b8-0x00000000000003bf]
CPU: 12 UID: 0 PID: 162 Comm: kworker/12:1 Tainted: G O 6.15.0-rc2-00635-g521139ac3840 #272 PREEMPT(full)
Tainted: [O]=OOT_MODULE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-20240910_120124-localhost 04/01/2014
Workqueue: events ovpn_peer_keepalive_work [ovpn]
RIP: 0010:ovpn_socket_release+0x23c/0x500 [ovpn]
Code: ea 03 80 3c 02 00 0f 85 71 02 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8b 64 24 18 49 8d bc 24 be 03 00 00 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 01 38 d0 7c 08 84 d2 0f 85 30
RSP: 0018:ffffc90000c9fb18 EFLAGS: 00010217
RAX: dffffc0000000000 RBX: ffff8881148d7940 RCX: ffffffff817787bb
RDX: 0000000000000077 RSI: 0000000000000008 RDI: 00000000000003be
RBP: ffffc90000c9fb30 R08: 0000000000000000 R09: fffffbfff0d3e840
R10: ffffffff869f4207 R11: 0000000000000000 R12: 0000000000000000
R13: ffff888115eb9300 R14: ffffc90000c9fbc8 R15: 000000000000000c
FS: 0000000000000000(0000) GS:ffff8882b0151000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f37266b6114 CR3: 00000000054a8000 CR4: 0000000000750ef0
PKRU: 55555554
Call Trace:
<TASK>
unlock_ovpn+0x8b/0xe0 [ovpn]
ovpn_peer_keepalive_work+0xe3/0x540 [ovpn]
? ovpn_peers_free+0x780/0x780 [ovpn]
? lock_acquire+0x56/0x70
? process_one_work+0x888/0x1740
process_one_work+0x933/0x1740
? pwq_dec_nr_in_flight+0x10b0/0x10b0
? move_linked_works+0x12d/0x2c0
? assign_work+0x163/0x270
worker_thread+0x4d6/0xd90
? preempt_count_sub+0x4c/0x70
? process_one_work+0x1740/0x1740
kthread+0x36c/0x710
? trace_preempt_on+0x8c/0x1e0
? kthread_is_per_cpu+0xc0/0xc0
? preempt_count_sub+0x4c/0x70
? _raw_spin_unlock_irq+0x36/0x60
? calculate_sigpending+0x7b/0xa0
? kthread_is_per_cpu+0xc0/0xc0
ret_from_fork+0x3a/0x80
? kthread_is_per_cpu+0xc0/0xc0
ret_from_fork_asm+0x11/0x20
</TASK>
Modules linked in: ovpn(O)
This happens because the peer deletion operation reaches
ovpn_socket_release() while ovpn_sock->sock (struct socket *)
and its sk member (struct sock *) are still both valid.
Here synchronize_rcu() is invoked, after which ovpn_sock->sock->sk
becomes NULL, due to the concurrent socket closing triggered
from userspace.
After having invoked synchronize_rcu(), ovpn_socket_release() will
attempt dereferencing ovpn_sock->sock->sk, triggering the crash
reported above.
The reason for accessing sk is that we need to retrieve its
protocol and continue the cleanup routine accordingly.
This crash can be easily produced by running openvpn userspace in
client mode with `--keepalive 10 20`, while entirely omitting this
option on the server side.
After 20 seconds ovpn will assume the peer (server) to be dead,
will start removing it and will notify userspace. The latter will
receive the notification and close the transport socket, thus
triggering the crash.
To fix the race condition for good, we need to refactor struct ovpn_socket.
Since ovpn is always only interested in the sock->sk member (struct sock *)
we can directly hold a reference to it, raher than accessing it via
its struct socket container.
This means changing "struct socket *ovpn_socket->sock" to
"struct sock *ovpn_socket->sk".
While acquiring a reference to sk, we can increase its refcounter
without affecting the socket close()/destroy() notification
(which we rely on when userspace closes a socket we are using).
By increasing sk's refcounter we know we can dereference it
in ovpn_socket_release() without incurring in any race condition
anymore.
ovpn_socket_release() will ultimately decrease the reference
counter.
Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
Fixes: 11851cbd60ea ("ovpn: implement TCP transport")
Reported-by: Qingfang Deng <dqfext@gmail.com>
Closes: https://github.com/OpenVPN/ovpn-net-next/issues/1
Tested-by: Gert Doering <gert@greenie.muc.de>
Link: https://www.mail-archive.com/openvpn-devel@lists.sourceforge.net/msg31575.html
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
2025-04-30 02:26:49 +02:00
|
|
|
int ovpn_udp_socket_attach(struct ovpn_socket *ovpn_sock, struct socket *sock,
|
2025-04-15 13:17:23 +02:00
|
|
|
struct ovpn_priv *ovpn)
|
|
|
|
{
|
2025-04-15 13:17:25 +02:00
|
|
|
struct udp_tunnel_sock_cfg cfg = {
|
|
|
|
.encap_type = UDP_ENCAP_OVPNINUDP,
|
|
|
|
.encap_rcv = ovpn_udp_encap_recv,
|
|
|
|
.encap_destroy = ovpn_udp_encap_destroy,
|
|
|
|
};
|
2025-04-15 13:17:23 +02:00
|
|
|
struct ovpn_socket *old_data;
|
2025-04-15 13:17:24 +02:00
|
|
|
int ret;
|
2025-04-15 13:17:23 +02:00
|
|
|
|
|
|
|
/* make sure no pre-existing encapsulation handler exists */
|
|
|
|
rcu_read_lock();
|
ovpn: ensure sk is still valid during cleanup
Removing a peer while userspace attempts to close its transport
socket triggers a race condition resulting in the following
crash:
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000077: 0000 [#1] SMP KASAN
KASAN: null-ptr-deref in range [0x00000000000003b8-0x00000000000003bf]
CPU: 12 UID: 0 PID: 162 Comm: kworker/12:1 Tainted: G O 6.15.0-rc2-00635-g521139ac3840 #272 PREEMPT(full)
Tainted: [O]=OOT_MODULE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-20240910_120124-localhost 04/01/2014
Workqueue: events ovpn_peer_keepalive_work [ovpn]
RIP: 0010:ovpn_socket_release+0x23c/0x500 [ovpn]
Code: ea 03 80 3c 02 00 0f 85 71 02 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8b 64 24 18 49 8d bc 24 be 03 00 00 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 01 38 d0 7c 08 84 d2 0f 85 30
RSP: 0018:ffffc90000c9fb18 EFLAGS: 00010217
RAX: dffffc0000000000 RBX: ffff8881148d7940 RCX: ffffffff817787bb
RDX: 0000000000000077 RSI: 0000000000000008 RDI: 00000000000003be
RBP: ffffc90000c9fb30 R08: 0000000000000000 R09: fffffbfff0d3e840
R10: ffffffff869f4207 R11: 0000000000000000 R12: 0000000000000000
R13: ffff888115eb9300 R14: ffffc90000c9fbc8 R15: 000000000000000c
FS: 0000000000000000(0000) GS:ffff8882b0151000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f37266b6114 CR3: 00000000054a8000 CR4: 0000000000750ef0
PKRU: 55555554
Call Trace:
<TASK>
unlock_ovpn+0x8b/0xe0 [ovpn]
ovpn_peer_keepalive_work+0xe3/0x540 [ovpn]
? ovpn_peers_free+0x780/0x780 [ovpn]
? lock_acquire+0x56/0x70
? process_one_work+0x888/0x1740
process_one_work+0x933/0x1740
? pwq_dec_nr_in_flight+0x10b0/0x10b0
? move_linked_works+0x12d/0x2c0
? assign_work+0x163/0x270
worker_thread+0x4d6/0xd90
? preempt_count_sub+0x4c/0x70
? process_one_work+0x1740/0x1740
kthread+0x36c/0x710
? trace_preempt_on+0x8c/0x1e0
? kthread_is_per_cpu+0xc0/0xc0
? preempt_count_sub+0x4c/0x70
? _raw_spin_unlock_irq+0x36/0x60
? calculate_sigpending+0x7b/0xa0
? kthread_is_per_cpu+0xc0/0xc0
ret_from_fork+0x3a/0x80
? kthread_is_per_cpu+0xc0/0xc0
ret_from_fork_asm+0x11/0x20
</TASK>
Modules linked in: ovpn(O)
This happens because the peer deletion operation reaches
ovpn_socket_release() while ovpn_sock->sock (struct socket *)
and its sk member (struct sock *) are still both valid.
Here synchronize_rcu() is invoked, after which ovpn_sock->sock->sk
becomes NULL, due to the concurrent socket closing triggered
from userspace.
After having invoked synchronize_rcu(), ovpn_socket_release() will
attempt dereferencing ovpn_sock->sock->sk, triggering the crash
reported above.
The reason for accessing sk is that we need to retrieve its
protocol and continue the cleanup routine accordingly.
This crash can be easily produced by running openvpn userspace in
client mode with `--keepalive 10 20`, while entirely omitting this
option on the server side.
After 20 seconds ovpn will assume the peer (server) to be dead,
will start removing it and will notify userspace. The latter will
receive the notification and close the transport socket, thus
triggering the crash.
To fix the race condition for good, we need to refactor struct ovpn_socket.
Since ovpn is always only interested in the sock->sk member (struct sock *)
we can directly hold a reference to it, raher than accessing it via
its struct socket container.
This means changing "struct socket *ovpn_socket->sock" to
"struct sock *ovpn_socket->sk".
While acquiring a reference to sk, we can increase its refcounter
without affecting the socket close()/destroy() notification
(which we rely on when userspace closes a socket we are using).
By increasing sk's refcounter we know we can dereference it
in ovpn_socket_release() without incurring in any race condition
anymore.
ovpn_socket_release() will ultimately decrease the reference
counter.
Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
Fixes: 11851cbd60ea ("ovpn: implement TCP transport")
Reported-by: Qingfang Deng <dqfext@gmail.com>
Closes: https://github.com/OpenVPN/ovpn-net-next/issues/1
Tested-by: Gert Doering <gert@greenie.muc.de>
Link: https://www.mail-archive.com/openvpn-devel@lists.sourceforge.net/msg31575.html
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
2025-04-30 02:26:49 +02:00
|
|
|
old_data = rcu_dereference_sk_user_data(ovpn_sock->sk);
|
2025-04-15 13:17:23 +02:00
|
|
|
if (!old_data) {
|
|
|
|
/* socket is currently unused - we can take it */
|
|
|
|
rcu_read_unlock();
|
ovpn: ensure sk is still valid during cleanup
Removing a peer while userspace attempts to close its transport
socket triggers a race condition resulting in the following
crash:
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000077: 0000 [#1] SMP KASAN
KASAN: null-ptr-deref in range [0x00000000000003b8-0x00000000000003bf]
CPU: 12 UID: 0 PID: 162 Comm: kworker/12:1 Tainted: G O 6.15.0-rc2-00635-g521139ac3840 #272 PREEMPT(full)
Tainted: [O]=OOT_MODULE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-20240910_120124-localhost 04/01/2014
Workqueue: events ovpn_peer_keepalive_work [ovpn]
RIP: 0010:ovpn_socket_release+0x23c/0x500 [ovpn]
Code: ea 03 80 3c 02 00 0f 85 71 02 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8b 64 24 18 49 8d bc 24 be 03 00 00 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 01 38 d0 7c 08 84 d2 0f 85 30
RSP: 0018:ffffc90000c9fb18 EFLAGS: 00010217
RAX: dffffc0000000000 RBX: ffff8881148d7940 RCX: ffffffff817787bb
RDX: 0000000000000077 RSI: 0000000000000008 RDI: 00000000000003be
RBP: ffffc90000c9fb30 R08: 0000000000000000 R09: fffffbfff0d3e840
R10: ffffffff869f4207 R11: 0000000000000000 R12: 0000000000000000
R13: ffff888115eb9300 R14: ffffc90000c9fbc8 R15: 000000000000000c
FS: 0000000000000000(0000) GS:ffff8882b0151000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f37266b6114 CR3: 00000000054a8000 CR4: 0000000000750ef0
PKRU: 55555554
Call Trace:
<TASK>
unlock_ovpn+0x8b/0xe0 [ovpn]
ovpn_peer_keepalive_work+0xe3/0x540 [ovpn]
? ovpn_peers_free+0x780/0x780 [ovpn]
? lock_acquire+0x56/0x70
? process_one_work+0x888/0x1740
process_one_work+0x933/0x1740
? pwq_dec_nr_in_flight+0x10b0/0x10b0
? move_linked_works+0x12d/0x2c0
? assign_work+0x163/0x270
worker_thread+0x4d6/0xd90
? preempt_count_sub+0x4c/0x70
? process_one_work+0x1740/0x1740
kthread+0x36c/0x710
? trace_preempt_on+0x8c/0x1e0
? kthread_is_per_cpu+0xc0/0xc0
? preempt_count_sub+0x4c/0x70
? _raw_spin_unlock_irq+0x36/0x60
? calculate_sigpending+0x7b/0xa0
? kthread_is_per_cpu+0xc0/0xc0
ret_from_fork+0x3a/0x80
? kthread_is_per_cpu+0xc0/0xc0
ret_from_fork_asm+0x11/0x20
</TASK>
Modules linked in: ovpn(O)
This happens because the peer deletion operation reaches
ovpn_socket_release() while ovpn_sock->sock (struct socket *)
and its sk member (struct sock *) are still both valid.
Here synchronize_rcu() is invoked, after which ovpn_sock->sock->sk
becomes NULL, due to the concurrent socket closing triggered
from userspace.
After having invoked synchronize_rcu(), ovpn_socket_release() will
attempt dereferencing ovpn_sock->sock->sk, triggering the crash
reported above.
The reason for accessing sk is that we need to retrieve its
protocol and continue the cleanup routine accordingly.
This crash can be easily produced by running openvpn userspace in
client mode with `--keepalive 10 20`, while entirely omitting this
option on the server side.
After 20 seconds ovpn will assume the peer (server) to be dead,
will start removing it and will notify userspace. The latter will
receive the notification and close the transport socket, thus
triggering the crash.
To fix the race condition for good, we need to refactor struct ovpn_socket.
Since ovpn is always only interested in the sock->sk member (struct sock *)
we can directly hold a reference to it, raher than accessing it via
its struct socket container.
This means changing "struct socket *ovpn_socket->sock" to
"struct sock *ovpn_socket->sk".
While acquiring a reference to sk, we can increase its refcounter
without affecting the socket close()/destroy() notification
(which we rely on when userspace closes a socket we are using).
By increasing sk's refcounter we know we can dereference it
in ovpn_socket_release() without incurring in any race condition
anymore.
ovpn_socket_release() will ultimately decrease the reference
counter.
Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
Fixes: 11851cbd60ea ("ovpn: implement TCP transport")
Reported-by: Qingfang Deng <dqfext@gmail.com>
Closes: https://github.com/OpenVPN/ovpn-net-next/issues/1
Tested-by: Gert Doering <gert@greenie.muc.de>
Link: https://www.mail-archive.com/openvpn-devel@lists.sourceforge.net/msg31575.html
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
2025-04-30 02:26:49 +02:00
|
|
|
setup_udp_tunnel_sock(sock_net(ovpn_sock->sk), sock, &cfg);
|
2025-04-15 13:17:23 +02:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* socket is in use. We need to understand if it's owned by this ovpn
|
|
|
|
* instance or by something else.
|
|
|
|
* In the former case, we can increase the refcounter and happily
|
|
|
|
* use it, because the same UDP socket is expected to be shared among
|
|
|
|
* different peers.
|
|
|
|
*
|
|
|
|
* Unlikely TCP, a single UDP socket can be used to talk to many remote
|
|
|
|
* hosts and therefore openvpn instantiates one only for all its peers
|
|
|
|
*/
|
ovpn: ensure sk is still valid during cleanup
Removing a peer while userspace attempts to close its transport
socket triggers a race condition resulting in the following
crash:
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000077: 0000 [#1] SMP KASAN
KASAN: null-ptr-deref in range [0x00000000000003b8-0x00000000000003bf]
CPU: 12 UID: 0 PID: 162 Comm: kworker/12:1 Tainted: G O 6.15.0-rc2-00635-g521139ac3840 #272 PREEMPT(full)
Tainted: [O]=OOT_MODULE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-20240910_120124-localhost 04/01/2014
Workqueue: events ovpn_peer_keepalive_work [ovpn]
RIP: 0010:ovpn_socket_release+0x23c/0x500 [ovpn]
Code: ea 03 80 3c 02 00 0f 85 71 02 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8b 64 24 18 49 8d bc 24 be 03 00 00 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 01 38 d0 7c 08 84 d2 0f 85 30
RSP: 0018:ffffc90000c9fb18 EFLAGS: 00010217
RAX: dffffc0000000000 RBX: ffff8881148d7940 RCX: ffffffff817787bb
RDX: 0000000000000077 RSI: 0000000000000008 RDI: 00000000000003be
RBP: ffffc90000c9fb30 R08: 0000000000000000 R09: fffffbfff0d3e840
R10: ffffffff869f4207 R11: 0000000000000000 R12: 0000000000000000
R13: ffff888115eb9300 R14: ffffc90000c9fbc8 R15: 000000000000000c
FS: 0000000000000000(0000) GS:ffff8882b0151000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f37266b6114 CR3: 00000000054a8000 CR4: 0000000000750ef0
PKRU: 55555554
Call Trace:
<TASK>
unlock_ovpn+0x8b/0xe0 [ovpn]
ovpn_peer_keepalive_work+0xe3/0x540 [ovpn]
? ovpn_peers_free+0x780/0x780 [ovpn]
? lock_acquire+0x56/0x70
? process_one_work+0x888/0x1740
process_one_work+0x933/0x1740
? pwq_dec_nr_in_flight+0x10b0/0x10b0
? move_linked_works+0x12d/0x2c0
? assign_work+0x163/0x270
worker_thread+0x4d6/0xd90
? preempt_count_sub+0x4c/0x70
? process_one_work+0x1740/0x1740
kthread+0x36c/0x710
? trace_preempt_on+0x8c/0x1e0
? kthread_is_per_cpu+0xc0/0xc0
? preempt_count_sub+0x4c/0x70
? _raw_spin_unlock_irq+0x36/0x60
? calculate_sigpending+0x7b/0xa0
? kthread_is_per_cpu+0xc0/0xc0
ret_from_fork+0x3a/0x80
? kthread_is_per_cpu+0xc0/0xc0
ret_from_fork_asm+0x11/0x20
</TASK>
Modules linked in: ovpn(O)
This happens because the peer deletion operation reaches
ovpn_socket_release() while ovpn_sock->sock (struct socket *)
and its sk member (struct sock *) are still both valid.
Here synchronize_rcu() is invoked, after which ovpn_sock->sock->sk
becomes NULL, due to the concurrent socket closing triggered
from userspace.
After having invoked synchronize_rcu(), ovpn_socket_release() will
attempt dereferencing ovpn_sock->sock->sk, triggering the crash
reported above.
The reason for accessing sk is that we need to retrieve its
protocol and continue the cleanup routine accordingly.
This crash can be easily produced by running openvpn userspace in
client mode with `--keepalive 10 20`, while entirely omitting this
option on the server side.
After 20 seconds ovpn will assume the peer (server) to be dead,
will start removing it and will notify userspace. The latter will
receive the notification and close the transport socket, thus
triggering the crash.
To fix the race condition for good, we need to refactor struct ovpn_socket.
Since ovpn is always only interested in the sock->sk member (struct sock *)
we can directly hold a reference to it, raher than accessing it via
its struct socket container.
This means changing "struct socket *ovpn_socket->sock" to
"struct sock *ovpn_socket->sk".
While acquiring a reference to sk, we can increase its refcounter
without affecting the socket close()/destroy() notification
(which we rely on when userspace closes a socket we are using).
By increasing sk's refcounter we know we can dereference it
in ovpn_socket_release() without incurring in any race condition
anymore.
ovpn_socket_release() will ultimately decrease the reference
counter.
Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
Fixes: 11851cbd60ea ("ovpn: implement TCP transport")
Reported-by: Qingfang Deng <dqfext@gmail.com>
Closes: https://github.com/OpenVPN/ovpn-net-next/issues/1
Tested-by: Gert Doering <gert@greenie.muc.de>
Link: https://www.mail-archive.com/openvpn-devel@lists.sourceforge.net/msg31575.html
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
2025-04-30 02:26:49 +02:00
|
|
|
if ((READ_ONCE(udp_sk(ovpn_sock->sk)->encap_type) == UDP_ENCAP_OVPNINUDP) &&
|
2025-04-15 13:17:23 +02:00
|
|
|
old_data->ovpn == ovpn) {
|
|
|
|
netdev_dbg(ovpn->dev,
|
|
|
|
"provided socket already owned by this interface\n");
|
|
|
|
ret = -EALREADY;
|
|
|
|
} else {
|
|
|
|
netdev_dbg(ovpn->dev,
|
|
|
|
"provided socket already taken by other user\n");
|
|
|
|
ret = -EBUSY;
|
|
|
|
}
|
|
|
|
rcu_read_unlock();
|
|
|
|
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
|
|
|
/**
|
|
|
|
* ovpn_udp_socket_detach - clean udp-tunnel status for this socket
|
|
|
|
* @ovpn_sock: the socket to clean
|
|
|
|
*/
|
|
|
|
void ovpn_udp_socket_detach(struct ovpn_socket *ovpn_sock)
|
|
|
|
{
|
ovpn: ensure sk is still valid during cleanup
Removing a peer while userspace attempts to close its transport
socket triggers a race condition resulting in the following
crash:
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000077: 0000 [#1] SMP KASAN
KASAN: null-ptr-deref in range [0x00000000000003b8-0x00000000000003bf]
CPU: 12 UID: 0 PID: 162 Comm: kworker/12:1 Tainted: G O 6.15.0-rc2-00635-g521139ac3840 #272 PREEMPT(full)
Tainted: [O]=OOT_MODULE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-20240910_120124-localhost 04/01/2014
Workqueue: events ovpn_peer_keepalive_work [ovpn]
RIP: 0010:ovpn_socket_release+0x23c/0x500 [ovpn]
Code: ea 03 80 3c 02 00 0f 85 71 02 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8b 64 24 18 49 8d bc 24 be 03 00 00 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 01 38 d0 7c 08 84 d2 0f 85 30
RSP: 0018:ffffc90000c9fb18 EFLAGS: 00010217
RAX: dffffc0000000000 RBX: ffff8881148d7940 RCX: ffffffff817787bb
RDX: 0000000000000077 RSI: 0000000000000008 RDI: 00000000000003be
RBP: ffffc90000c9fb30 R08: 0000000000000000 R09: fffffbfff0d3e840
R10: ffffffff869f4207 R11: 0000000000000000 R12: 0000000000000000
R13: ffff888115eb9300 R14: ffffc90000c9fbc8 R15: 000000000000000c
FS: 0000000000000000(0000) GS:ffff8882b0151000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f37266b6114 CR3: 00000000054a8000 CR4: 0000000000750ef0
PKRU: 55555554
Call Trace:
<TASK>
unlock_ovpn+0x8b/0xe0 [ovpn]
ovpn_peer_keepalive_work+0xe3/0x540 [ovpn]
? ovpn_peers_free+0x780/0x780 [ovpn]
? lock_acquire+0x56/0x70
? process_one_work+0x888/0x1740
process_one_work+0x933/0x1740
? pwq_dec_nr_in_flight+0x10b0/0x10b0
? move_linked_works+0x12d/0x2c0
? assign_work+0x163/0x270
worker_thread+0x4d6/0xd90
? preempt_count_sub+0x4c/0x70
? process_one_work+0x1740/0x1740
kthread+0x36c/0x710
? trace_preempt_on+0x8c/0x1e0
? kthread_is_per_cpu+0xc0/0xc0
? preempt_count_sub+0x4c/0x70
? _raw_spin_unlock_irq+0x36/0x60
? calculate_sigpending+0x7b/0xa0
? kthread_is_per_cpu+0xc0/0xc0
ret_from_fork+0x3a/0x80
? kthread_is_per_cpu+0xc0/0xc0
ret_from_fork_asm+0x11/0x20
</TASK>
Modules linked in: ovpn(O)
This happens because the peer deletion operation reaches
ovpn_socket_release() while ovpn_sock->sock (struct socket *)
and its sk member (struct sock *) are still both valid.
Here synchronize_rcu() is invoked, after which ovpn_sock->sock->sk
becomes NULL, due to the concurrent socket closing triggered
from userspace.
After having invoked synchronize_rcu(), ovpn_socket_release() will
attempt dereferencing ovpn_sock->sock->sk, triggering the crash
reported above.
The reason for accessing sk is that we need to retrieve its
protocol and continue the cleanup routine accordingly.
This crash can be easily produced by running openvpn userspace in
client mode with `--keepalive 10 20`, while entirely omitting this
option on the server side.
After 20 seconds ovpn will assume the peer (server) to be dead,
will start removing it and will notify userspace. The latter will
receive the notification and close the transport socket, thus
triggering the crash.
To fix the race condition for good, we need to refactor struct ovpn_socket.
Since ovpn is always only interested in the sock->sk member (struct sock *)
we can directly hold a reference to it, raher than accessing it via
its struct socket container.
This means changing "struct socket *ovpn_socket->sock" to
"struct sock *ovpn_socket->sk".
While acquiring a reference to sk, we can increase its refcounter
without affecting the socket close()/destroy() notification
(which we rely on when userspace closes a socket we are using).
By increasing sk's refcounter we know we can dereference it
in ovpn_socket_release() without incurring in any race condition
anymore.
ovpn_socket_release() will ultimately decrease the reference
counter.
Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
Fixes: 11851cbd60ea ("ovpn: implement TCP transport")
Reported-by: Qingfang Deng <dqfext@gmail.com>
Closes: https://github.com/OpenVPN/ovpn-net-next/issues/1
Tested-by: Gert Doering <gert@greenie.muc.de>
Link: https://www.mail-archive.com/openvpn-devel@lists.sourceforge.net/msg31575.html
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
2025-04-30 02:26:49 +02:00
|
|
|
struct sock *sk = ovpn_sock->sk;
|
2025-04-15 13:17:25 +02:00
|
|
|
|
2025-05-13 16:43:59 +02:00
|
|
|
/* Re-enable multicast loopback */
|
|
|
|
inet_set_bit(MC_LOOP, sk);
|
|
|
|
/* Disable CHECKSUM_UNNECESSARY to CHECKSUM_COMPLETE conversion */
|
|
|
|
inet_dec_convert_csum(sk);
|
|
|
|
|
|
|
|
WRITE_ONCE(udp_sk(sk)->encap_type, 0);
|
|
|
|
WRITE_ONCE(udp_sk(sk)->encap_rcv, NULL);
|
|
|
|
WRITE_ONCE(udp_sk(sk)->encap_destroy, NULL);
|
|
|
|
|
|
|
|
rcu_assign_sk_user_data(sk, NULL);
|
2025-04-15 13:17:23 +02:00
|
|
|
}
|