2025-04-15 13:17:23 +02:00
|
|
|
// SPDX-License-Identifier: GPL-2.0
|
|
|
|
/* OpenVPN data channel offload
|
|
|
|
*
|
|
|
|
* Copyright (C) 2020-2025 OpenVPN, Inc.
|
|
|
|
*
|
|
|
|
* Author: James Yonan <james@openvpn.net>
|
|
|
|
* Antonio Quartulli <antonio@openvpn.net>
|
|
|
|
*/
|
|
|
|
|
|
|
|
#include <linux/net.h>
|
|
|
|
#include <linux/netdevice.h>
|
|
|
|
#include <linux/udp.h>
|
|
|
|
|
|
|
|
#include "ovpnpriv.h"
|
|
|
|
#include "main.h"
|
|
|
|
#include "io.h"
|
|
|
|
#include "peer.h"
|
|
|
|
#include "socket.h"
|
2025-04-15 13:17:28 +02:00
|
|
|
#include "tcp.h"
|
2025-04-15 13:17:23 +02:00
|
|
|
#include "udp.h"
|
|
|
|
|
|
|
|
static void ovpn_socket_release_kref(struct kref *kref)
|
|
|
|
{
|
|
|
|
struct ovpn_socket *sock = container_of(kref, struct ovpn_socket,
|
|
|
|
refcount);
|
|
|
|
|
ovpn: ensure sk is still valid during cleanup
Removing a peer while userspace attempts to close its transport
socket triggers a race condition resulting in the following
crash:
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000077: 0000 [#1] SMP KASAN
KASAN: null-ptr-deref in range [0x00000000000003b8-0x00000000000003bf]
CPU: 12 UID: 0 PID: 162 Comm: kworker/12:1 Tainted: G O 6.15.0-rc2-00635-g521139ac3840 #272 PREEMPT(full)
Tainted: [O]=OOT_MODULE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-20240910_120124-localhost 04/01/2014
Workqueue: events ovpn_peer_keepalive_work [ovpn]
RIP: 0010:ovpn_socket_release+0x23c/0x500 [ovpn]
Code: ea 03 80 3c 02 00 0f 85 71 02 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8b 64 24 18 49 8d bc 24 be 03 00 00 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 01 38 d0 7c 08 84 d2 0f 85 30
RSP: 0018:ffffc90000c9fb18 EFLAGS: 00010217
RAX: dffffc0000000000 RBX: ffff8881148d7940 RCX: ffffffff817787bb
RDX: 0000000000000077 RSI: 0000000000000008 RDI: 00000000000003be
RBP: ffffc90000c9fb30 R08: 0000000000000000 R09: fffffbfff0d3e840
R10: ffffffff869f4207 R11: 0000000000000000 R12: 0000000000000000
R13: ffff888115eb9300 R14: ffffc90000c9fbc8 R15: 000000000000000c
FS: 0000000000000000(0000) GS:ffff8882b0151000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f37266b6114 CR3: 00000000054a8000 CR4: 0000000000750ef0
PKRU: 55555554
Call Trace:
<TASK>
unlock_ovpn+0x8b/0xe0 [ovpn]
ovpn_peer_keepalive_work+0xe3/0x540 [ovpn]
? ovpn_peers_free+0x780/0x780 [ovpn]
? lock_acquire+0x56/0x70
? process_one_work+0x888/0x1740
process_one_work+0x933/0x1740
? pwq_dec_nr_in_flight+0x10b0/0x10b0
? move_linked_works+0x12d/0x2c0
? assign_work+0x163/0x270
worker_thread+0x4d6/0xd90
? preempt_count_sub+0x4c/0x70
? process_one_work+0x1740/0x1740
kthread+0x36c/0x710
? trace_preempt_on+0x8c/0x1e0
? kthread_is_per_cpu+0xc0/0xc0
? preempt_count_sub+0x4c/0x70
? _raw_spin_unlock_irq+0x36/0x60
? calculate_sigpending+0x7b/0xa0
? kthread_is_per_cpu+0xc0/0xc0
ret_from_fork+0x3a/0x80
? kthread_is_per_cpu+0xc0/0xc0
ret_from_fork_asm+0x11/0x20
</TASK>
Modules linked in: ovpn(O)
This happens because the peer deletion operation reaches
ovpn_socket_release() while ovpn_sock->sock (struct socket *)
and its sk member (struct sock *) are still both valid.
Here synchronize_rcu() is invoked, after which ovpn_sock->sock->sk
becomes NULL, due to the concurrent socket closing triggered
from userspace.
After having invoked synchronize_rcu(), ovpn_socket_release() will
attempt dereferencing ovpn_sock->sock->sk, triggering the crash
reported above.
The reason for accessing sk is that we need to retrieve its
protocol and continue the cleanup routine accordingly.
This crash can be easily produced by running openvpn userspace in
client mode with `--keepalive 10 20`, while entirely omitting this
option on the server side.
After 20 seconds ovpn will assume the peer (server) to be dead,
will start removing it and will notify userspace. The latter will
receive the notification and close the transport socket, thus
triggering the crash.
To fix the race condition for good, we need to refactor struct ovpn_socket.
Since ovpn is always only interested in the sock->sk member (struct sock *)
we can directly hold a reference to it, raher than accessing it via
its struct socket container.
This means changing "struct socket *ovpn_socket->sock" to
"struct sock *ovpn_socket->sk".
While acquiring a reference to sk, we can increase its refcounter
without affecting the socket close()/destroy() notification
(which we rely on when userspace closes a socket we are using).
By increasing sk's refcounter we know we can dereference it
in ovpn_socket_release() without incurring in any race condition
anymore.
ovpn_socket_release() will ultimately decrease the reference
counter.
Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
Fixes: 11851cbd60ea ("ovpn: implement TCP transport")
Reported-by: Qingfang Deng <dqfext@gmail.com>
Closes: https://github.com/OpenVPN/ovpn-net-next/issues/1
Tested-by: Gert Doering <gert@greenie.muc.de>
Link: https://www.mail-archive.com/openvpn-devel@lists.sourceforge.net/msg31575.html
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
2025-04-30 02:26:49 +02:00
|
|
|
if (sock->sk->sk_protocol == IPPROTO_UDP)
|
2025-04-15 13:17:23 +02:00
|
|
|
ovpn_udp_socket_detach(sock);
|
ovpn: ensure sk is still valid during cleanup
Removing a peer while userspace attempts to close its transport
socket triggers a race condition resulting in the following
crash:
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000077: 0000 [#1] SMP KASAN
KASAN: null-ptr-deref in range [0x00000000000003b8-0x00000000000003bf]
CPU: 12 UID: 0 PID: 162 Comm: kworker/12:1 Tainted: G O 6.15.0-rc2-00635-g521139ac3840 #272 PREEMPT(full)
Tainted: [O]=OOT_MODULE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-20240910_120124-localhost 04/01/2014
Workqueue: events ovpn_peer_keepalive_work [ovpn]
RIP: 0010:ovpn_socket_release+0x23c/0x500 [ovpn]
Code: ea 03 80 3c 02 00 0f 85 71 02 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8b 64 24 18 49 8d bc 24 be 03 00 00 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 01 38 d0 7c 08 84 d2 0f 85 30
RSP: 0018:ffffc90000c9fb18 EFLAGS: 00010217
RAX: dffffc0000000000 RBX: ffff8881148d7940 RCX: ffffffff817787bb
RDX: 0000000000000077 RSI: 0000000000000008 RDI: 00000000000003be
RBP: ffffc90000c9fb30 R08: 0000000000000000 R09: fffffbfff0d3e840
R10: ffffffff869f4207 R11: 0000000000000000 R12: 0000000000000000
R13: ffff888115eb9300 R14: ffffc90000c9fbc8 R15: 000000000000000c
FS: 0000000000000000(0000) GS:ffff8882b0151000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f37266b6114 CR3: 00000000054a8000 CR4: 0000000000750ef0
PKRU: 55555554
Call Trace:
<TASK>
unlock_ovpn+0x8b/0xe0 [ovpn]
ovpn_peer_keepalive_work+0xe3/0x540 [ovpn]
? ovpn_peers_free+0x780/0x780 [ovpn]
? lock_acquire+0x56/0x70
? process_one_work+0x888/0x1740
process_one_work+0x933/0x1740
? pwq_dec_nr_in_flight+0x10b0/0x10b0
? move_linked_works+0x12d/0x2c0
? assign_work+0x163/0x270
worker_thread+0x4d6/0xd90
? preempt_count_sub+0x4c/0x70
? process_one_work+0x1740/0x1740
kthread+0x36c/0x710
? trace_preempt_on+0x8c/0x1e0
? kthread_is_per_cpu+0xc0/0xc0
? preempt_count_sub+0x4c/0x70
? _raw_spin_unlock_irq+0x36/0x60
? calculate_sigpending+0x7b/0xa0
? kthread_is_per_cpu+0xc0/0xc0
ret_from_fork+0x3a/0x80
? kthread_is_per_cpu+0xc0/0xc0
ret_from_fork_asm+0x11/0x20
</TASK>
Modules linked in: ovpn(O)
This happens because the peer deletion operation reaches
ovpn_socket_release() while ovpn_sock->sock (struct socket *)
and its sk member (struct sock *) are still both valid.
Here synchronize_rcu() is invoked, after which ovpn_sock->sock->sk
becomes NULL, due to the concurrent socket closing triggered
from userspace.
After having invoked synchronize_rcu(), ovpn_socket_release() will
attempt dereferencing ovpn_sock->sock->sk, triggering the crash
reported above.
The reason for accessing sk is that we need to retrieve its
protocol and continue the cleanup routine accordingly.
This crash can be easily produced by running openvpn userspace in
client mode with `--keepalive 10 20`, while entirely omitting this
option on the server side.
After 20 seconds ovpn will assume the peer (server) to be dead,
will start removing it and will notify userspace. The latter will
receive the notification and close the transport socket, thus
triggering the crash.
To fix the race condition for good, we need to refactor struct ovpn_socket.
Since ovpn is always only interested in the sock->sk member (struct sock *)
we can directly hold a reference to it, raher than accessing it via
its struct socket container.
This means changing "struct socket *ovpn_socket->sock" to
"struct sock *ovpn_socket->sk".
While acquiring a reference to sk, we can increase its refcounter
without affecting the socket close()/destroy() notification
(which we rely on when userspace closes a socket we are using).
By increasing sk's refcounter we know we can dereference it
in ovpn_socket_release() without incurring in any race condition
anymore.
ovpn_socket_release() will ultimately decrease the reference
counter.
Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
Fixes: 11851cbd60ea ("ovpn: implement TCP transport")
Reported-by: Qingfang Deng <dqfext@gmail.com>
Closes: https://github.com/OpenVPN/ovpn-net-next/issues/1
Tested-by: Gert Doering <gert@greenie.muc.de>
Link: https://www.mail-archive.com/openvpn-devel@lists.sourceforge.net/msg31575.html
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
2025-04-30 02:26:49 +02:00
|
|
|
else if (sock->sk->sk_protocol == IPPROTO_TCP)
|
2025-04-15 13:17:28 +02:00
|
|
|
ovpn_tcp_socket_detach(sock);
|
2025-04-15 13:17:23 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
/**
|
|
|
|
* ovpn_socket_put - decrease reference counter
|
|
|
|
* @peer: peer whose socket reference counter should be decreased
|
|
|
|
* @sock: the RCU protected peer socket
|
|
|
|
*
|
|
|
|
* This function is only used internally. Users willing to release
|
|
|
|
* references to the ovpn_socket should use ovpn_socket_release()
|
2025-04-15 13:17:28 +02:00
|
|
|
*
|
|
|
|
* Return: true if the socket was released, false otherwise
|
2025-04-15 13:17:23 +02:00
|
|
|
*/
|
2025-04-15 13:17:28 +02:00
|
|
|
static bool ovpn_socket_put(struct ovpn_peer *peer, struct ovpn_socket *sock)
|
2025-04-15 13:17:23 +02:00
|
|
|
{
|
2025-04-15 13:17:28 +02:00
|
|
|
return kref_put(&sock->refcount, ovpn_socket_release_kref);
|
2025-04-15 13:17:23 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
/**
|
|
|
|
* ovpn_socket_release - release resources owned by socket user
|
|
|
|
* @peer: peer whose socket should be released
|
|
|
|
*
|
2025-04-15 13:17:35 +02:00
|
|
|
* This function should be invoked when the peer is being removed
|
|
|
|
* and wants to drop its link to the socket.
|
2025-04-15 13:17:23 +02:00
|
|
|
*
|
|
|
|
* In case of UDP, the detach routine will drop a reference to the
|
|
|
|
* ovpn netdev, pointed by the ovpn_socket.
|
|
|
|
*
|
|
|
|
* In case of TCP, releasing the socket will cause dropping
|
|
|
|
* the refcounter for the peer it is linked to, thus allowing the peer
|
|
|
|
* disappear as well.
|
|
|
|
*
|
|
|
|
* This function is expected to be invoked exactly once per peer
|
|
|
|
*
|
|
|
|
* NOTE: this function may sleep
|
|
|
|
*/
|
|
|
|
void ovpn_socket_release(struct ovpn_peer *peer)
|
|
|
|
{
|
|
|
|
struct ovpn_socket *sock;
|
2025-04-15 13:17:28 +02:00
|
|
|
bool released;
|
2025-04-15 13:17:23 +02:00
|
|
|
|
|
|
|
might_sleep();
|
|
|
|
|
|
|
|
sock = rcu_replace_pointer(peer->sock, NULL, true);
|
|
|
|
/* release may be invoked after socket was detached */
|
|
|
|
if (!sock)
|
|
|
|
return;
|
|
|
|
|
|
|
|
/* Drop the reference while holding the sock lock to avoid
|
|
|
|
* concurrent ovpn_socket_new call to mess up with a partially
|
|
|
|
* detached socket.
|
|
|
|
*
|
|
|
|
* Holding the lock ensures that a socket with refcnt 0 is fully
|
|
|
|
* detached before it can be picked by a concurrent reader.
|
|
|
|
*/
|
ovpn: ensure sk is still valid during cleanup
Removing a peer while userspace attempts to close its transport
socket triggers a race condition resulting in the following
crash:
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000077: 0000 [#1] SMP KASAN
KASAN: null-ptr-deref in range [0x00000000000003b8-0x00000000000003bf]
CPU: 12 UID: 0 PID: 162 Comm: kworker/12:1 Tainted: G O 6.15.0-rc2-00635-g521139ac3840 #272 PREEMPT(full)
Tainted: [O]=OOT_MODULE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-20240910_120124-localhost 04/01/2014
Workqueue: events ovpn_peer_keepalive_work [ovpn]
RIP: 0010:ovpn_socket_release+0x23c/0x500 [ovpn]
Code: ea 03 80 3c 02 00 0f 85 71 02 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8b 64 24 18 49 8d bc 24 be 03 00 00 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 01 38 d0 7c 08 84 d2 0f 85 30
RSP: 0018:ffffc90000c9fb18 EFLAGS: 00010217
RAX: dffffc0000000000 RBX: ffff8881148d7940 RCX: ffffffff817787bb
RDX: 0000000000000077 RSI: 0000000000000008 RDI: 00000000000003be
RBP: ffffc90000c9fb30 R08: 0000000000000000 R09: fffffbfff0d3e840
R10: ffffffff869f4207 R11: 0000000000000000 R12: 0000000000000000
R13: ffff888115eb9300 R14: ffffc90000c9fbc8 R15: 000000000000000c
FS: 0000000000000000(0000) GS:ffff8882b0151000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f37266b6114 CR3: 00000000054a8000 CR4: 0000000000750ef0
PKRU: 55555554
Call Trace:
<TASK>
unlock_ovpn+0x8b/0xe0 [ovpn]
ovpn_peer_keepalive_work+0xe3/0x540 [ovpn]
? ovpn_peers_free+0x780/0x780 [ovpn]
? lock_acquire+0x56/0x70
? process_one_work+0x888/0x1740
process_one_work+0x933/0x1740
? pwq_dec_nr_in_flight+0x10b0/0x10b0
? move_linked_works+0x12d/0x2c0
? assign_work+0x163/0x270
worker_thread+0x4d6/0xd90
? preempt_count_sub+0x4c/0x70
? process_one_work+0x1740/0x1740
kthread+0x36c/0x710
? trace_preempt_on+0x8c/0x1e0
? kthread_is_per_cpu+0xc0/0xc0
? preempt_count_sub+0x4c/0x70
? _raw_spin_unlock_irq+0x36/0x60
? calculate_sigpending+0x7b/0xa0
? kthread_is_per_cpu+0xc0/0xc0
ret_from_fork+0x3a/0x80
? kthread_is_per_cpu+0xc0/0xc0
ret_from_fork_asm+0x11/0x20
</TASK>
Modules linked in: ovpn(O)
This happens because the peer deletion operation reaches
ovpn_socket_release() while ovpn_sock->sock (struct socket *)
and its sk member (struct sock *) are still both valid.
Here synchronize_rcu() is invoked, after which ovpn_sock->sock->sk
becomes NULL, due to the concurrent socket closing triggered
from userspace.
After having invoked synchronize_rcu(), ovpn_socket_release() will
attempt dereferencing ovpn_sock->sock->sk, triggering the crash
reported above.
The reason for accessing sk is that we need to retrieve its
protocol and continue the cleanup routine accordingly.
This crash can be easily produced by running openvpn userspace in
client mode with `--keepalive 10 20`, while entirely omitting this
option on the server side.
After 20 seconds ovpn will assume the peer (server) to be dead,
will start removing it and will notify userspace. The latter will
receive the notification and close the transport socket, thus
triggering the crash.
To fix the race condition for good, we need to refactor struct ovpn_socket.
Since ovpn is always only interested in the sock->sk member (struct sock *)
we can directly hold a reference to it, raher than accessing it via
its struct socket container.
This means changing "struct socket *ovpn_socket->sock" to
"struct sock *ovpn_socket->sk".
While acquiring a reference to sk, we can increase its refcounter
without affecting the socket close()/destroy() notification
(which we rely on when userspace closes a socket we are using).
By increasing sk's refcounter we know we can dereference it
in ovpn_socket_release() without incurring in any race condition
anymore.
ovpn_socket_release() will ultimately decrease the reference
counter.
Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
Fixes: 11851cbd60ea ("ovpn: implement TCP transport")
Reported-by: Qingfang Deng <dqfext@gmail.com>
Closes: https://github.com/OpenVPN/ovpn-net-next/issues/1
Tested-by: Gert Doering <gert@greenie.muc.de>
Link: https://www.mail-archive.com/openvpn-devel@lists.sourceforge.net/msg31575.html
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
2025-04-30 02:26:49 +02:00
|
|
|
lock_sock(sock->sk);
|
2025-04-15 13:17:28 +02:00
|
|
|
released = ovpn_socket_put(peer, sock);
|
ovpn: ensure sk is still valid during cleanup
Removing a peer while userspace attempts to close its transport
socket triggers a race condition resulting in the following
crash:
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000077: 0000 [#1] SMP KASAN
KASAN: null-ptr-deref in range [0x00000000000003b8-0x00000000000003bf]
CPU: 12 UID: 0 PID: 162 Comm: kworker/12:1 Tainted: G O 6.15.0-rc2-00635-g521139ac3840 #272 PREEMPT(full)
Tainted: [O]=OOT_MODULE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-20240910_120124-localhost 04/01/2014
Workqueue: events ovpn_peer_keepalive_work [ovpn]
RIP: 0010:ovpn_socket_release+0x23c/0x500 [ovpn]
Code: ea 03 80 3c 02 00 0f 85 71 02 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8b 64 24 18 49 8d bc 24 be 03 00 00 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 01 38 d0 7c 08 84 d2 0f 85 30
RSP: 0018:ffffc90000c9fb18 EFLAGS: 00010217
RAX: dffffc0000000000 RBX: ffff8881148d7940 RCX: ffffffff817787bb
RDX: 0000000000000077 RSI: 0000000000000008 RDI: 00000000000003be
RBP: ffffc90000c9fb30 R08: 0000000000000000 R09: fffffbfff0d3e840
R10: ffffffff869f4207 R11: 0000000000000000 R12: 0000000000000000
R13: ffff888115eb9300 R14: ffffc90000c9fbc8 R15: 000000000000000c
FS: 0000000000000000(0000) GS:ffff8882b0151000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f37266b6114 CR3: 00000000054a8000 CR4: 0000000000750ef0
PKRU: 55555554
Call Trace:
<TASK>
unlock_ovpn+0x8b/0xe0 [ovpn]
ovpn_peer_keepalive_work+0xe3/0x540 [ovpn]
? ovpn_peers_free+0x780/0x780 [ovpn]
? lock_acquire+0x56/0x70
? process_one_work+0x888/0x1740
process_one_work+0x933/0x1740
? pwq_dec_nr_in_flight+0x10b0/0x10b0
? move_linked_works+0x12d/0x2c0
? assign_work+0x163/0x270
worker_thread+0x4d6/0xd90
? preempt_count_sub+0x4c/0x70
? process_one_work+0x1740/0x1740
kthread+0x36c/0x710
? trace_preempt_on+0x8c/0x1e0
? kthread_is_per_cpu+0xc0/0xc0
? preempt_count_sub+0x4c/0x70
? _raw_spin_unlock_irq+0x36/0x60
? calculate_sigpending+0x7b/0xa0
? kthread_is_per_cpu+0xc0/0xc0
ret_from_fork+0x3a/0x80
? kthread_is_per_cpu+0xc0/0xc0
ret_from_fork_asm+0x11/0x20
</TASK>
Modules linked in: ovpn(O)
This happens because the peer deletion operation reaches
ovpn_socket_release() while ovpn_sock->sock (struct socket *)
and its sk member (struct sock *) are still both valid.
Here synchronize_rcu() is invoked, after which ovpn_sock->sock->sk
becomes NULL, due to the concurrent socket closing triggered
from userspace.
After having invoked synchronize_rcu(), ovpn_socket_release() will
attempt dereferencing ovpn_sock->sock->sk, triggering the crash
reported above.
The reason for accessing sk is that we need to retrieve its
protocol and continue the cleanup routine accordingly.
This crash can be easily produced by running openvpn userspace in
client mode with `--keepalive 10 20`, while entirely omitting this
option on the server side.
After 20 seconds ovpn will assume the peer (server) to be dead,
will start removing it and will notify userspace. The latter will
receive the notification and close the transport socket, thus
triggering the crash.
To fix the race condition for good, we need to refactor struct ovpn_socket.
Since ovpn is always only interested in the sock->sk member (struct sock *)
we can directly hold a reference to it, raher than accessing it via
its struct socket container.
This means changing "struct socket *ovpn_socket->sock" to
"struct sock *ovpn_socket->sk".
While acquiring a reference to sk, we can increase its refcounter
without affecting the socket close()/destroy() notification
(which we rely on when userspace closes a socket we are using).
By increasing sk's refcounter we know we can dereference it
in ovpn_socket_release() without incurring in any race condition
anymore.
ovpn_socket_release() will ultimately decrease the reference
counter.
Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
Fixes: 11851cbd60ea ("ovpn: implement TCP transport")
Reported-by: Qingfang Deng <dqfext@gmail.com>
Closes: https://github.com/OpenVPN/ovpn-net-next/issues/1
Tested-by: Gert Doering <gert@greenie.muc.de>
Link: https://www.mail-archive.com/openvpn-devel@lists.sourceforge.net/msg31575.html
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
2025-04-30 02:26:49 +02:00
|
|
|
release_sock(sock->sk);
|
2025-04-15 13:17:23 +02:00
|
|
|
|
|
|
|
/* align all readers with sk_user_data being NULL */
|
|
|
|
synchronize_rcu();
|
2025-04-15 13:17:28 +02:00
|
|
|
|
|
|
|
/* following cleanup should happen with lock released */
|
|
|
|
if (released) {
|
ovpn: ensure sk is still valid during cleanup
Removing a peer while userspace attempts to close its transport
socket triggers a race condition resulting in the following
crash:
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000077: 0000 [#1] SMP KASAN
KASAN: null-ptr-deref in range [0x00000000000003b8-0x00000000000003bf]
CPU: 12 UID: 0 PID: 162 Comm: kworker/12:1 Tainted: G O 6.15.0-rc2-00635-g521139ac3840 #272 PREEMPT(full)
Tainted: [O]=OOT_MODULE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-20240910_120124-localhost 04/01/2014
Workqueue: events ovpn_peer_keepalive_work [ovpn]
RIP: 0010:ovpn_socket_release+0x23c/0x500 [ovpn]
Code: ea 03 80 3c 02 00 0f 85 71 02 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8b 64 24 18 49 8d bc 24 be 03 00 00 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 01 38 d0 7c 08 84 d2 0f 85 30
RSP: 0018:ffffc90000c9fb18 EFLAGS: 00010217
RAX: dffffc0000000000 RBX: ffff8881148d7940 RCX: ffffffff817787bb
RDX: 0000000000000077 RSI: 0000000000000008 RDI: 00000000000003be
RBP: ffffc90000c9fb30 R08: 0000000000000000 R09: fffffbfff0d3e840
R10: ffffffff869f4207 R11: 0000000000000000 R12: 0000000000000000
R13: ffff888115eb9300 R14: ffffc90000c9fbc8 R15: 000000000000000c
FS: 0000000000000000(0000) GS:ffff8882b0151000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f37266b6114 CR3: 00000000054a8000 CR4: 0000000000750ef0
PKRU: 55555554
Call Trace:
<TASK>
unlock_ovpn+0x8b/0xe0 [ovpn]
ovpn_peer_keepalive_work+0xe3/0x540 [ovpn]
? ovpn_peers_free+0x780/0x780 [ovpn]
? lock_acquire+0x56/0x70
? process_one_work+0x888/0x1740
process_one_work+0x933/0x1740
? pwq_dec_nr_in_flight+0x10b0/0x10b0
? move_linked_works+0x12d/0x2c0
? assign_work+0x163/0x270
worker_thread+0x4d6/0xd90
? preempt_count_sub+0x4c/0x70
? process_one_work+0x1740/0x1740
kthread+0x36c/0x710
? trace_preempt_on+0x8c/0x1e0
? kthread_is_per_cpu+0xc0/0xc0
? preempt_count_sub+0x4c/0x70
? _raw_spin_unlock_irq+0x36/0x60
? calculate_sigpending+0x7b/0xa0
? kthread_is_per_cpu+0xc0/0xc0
ret_from_fork+0x3a/0x80
? kthread_is_per_cpu+0xc0/0xc0
ret_from_fork_asm+0x11/0x20
</TASK>
Modules linked in: ovpn(O)
This happens because the peer deletion operation reaches
ovpn_socket_release() while ovpn_sock->sock (struct socket *)
and its sk member (struct sock *) are still both valid.
Here synchronize_rcu() is invoked, after which ovpn_sock->sock->sk
becomes NULL, due to the concurrent socket closing triggered
from userspace.
After having invoked synchronize_rcu(), ovpn_socket_release() will
attempt dereferencing ovpn_sock->sock->sk, triggering the crash
reported above.
The reason for accessing sk is that we need to retrieve its
protocol and continue the cleanup routine accordingly.
This crash can be easily produced by running openvpn userspace in
client mode with `--keepalive 10 20`, while entirely omitting this
option on the server side.
After 20 seconds ovpn will assume the peer (server) to be dead,
will start removing it and will notify userspace. The latter will
receive the notification and close the transport socket, thus
triggering the crash.
To fix the race condition for good, we need to refactor struct ovpn_socket.
Since ovpn is always only interested in the sock->sk member (struct sock *)
we can directly hold a reference to it, raher than accessing it via
its struct socket container.
This means changing "struct socket *ovpn_socket->sock" to
"struct sock *ovpn_socket->sk".
While acquiring a reference to sk, we can increase its refcounter
without affecting the socket close()/destroy() notification
(which we rely on when userspace closes a socket we are using).
By increasing sk's refcounter we know we can dereference it
in ovpn_socket_release() without incurring in any race condition
anymore.
ovpn_socket_release() will ultimately decrease the reference
counter.
Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
Fixes: 11851cbd60ea ("ovpn: implement TCP transport")
Reported-by: Qingfang Deng <dqfext@gmail.com>
Closes: https://github.com/OpenVPN/ovpn-net-next/issues/1
Tested-by: Gert Doering <gert@greenie.muc.de>
Link: https://www.mail-archive.com/openvpn-devel@lists.sourceforge.net/msg31575.html
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
2025-04-30 02:26:49 +02:00
|
|
|
if (sock->sk->sk_protocol == IPPROTO_UDP) {
|
2025-04-15 13:17:28 +02:00
|
|
|
netdev_put(sock->ovpn->dev, &sock->dev_tracker);
|
ovpn: ensure sk is still valid during cleanup
Removing a peer while userspace attempts to close its transport
socket triggers a race condition resulting in the following
crash:
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000077: 0000 [#1] SMP KASAN
KASAN: null-ptr-deref in range [0x00000000000003b8-0x00000000000003bf]
CPU: 12 UID: 0 PID: 162 Comm: kworker/12:1 Tainted: G O 6.15.0-rc2-00635-g521139ac3840 #272 PREEMPT(full)
Tainted: [O]=OOT_MODULE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-20240910_120124-localhost 04/01/2014
Workqueue: events ovpn_peer_keepalive_work [ovpn]
RIP: 0010:ovpn_socket_release+0x23c/0x500 [ovpn]
Code: ea 03 80 3c 02 00 0f 85 71 02 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8b 64 24 18 49 8d bc 24 be 03 00 00 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 01 38 d0 7c 08 84 d2 0f 85 30
RSP: 0018:ffffc90000c9fb18 EFLAGS: 00010217
RAX: dffffc0000000000 RBX: ffff8881148d7940 RCX: ffffffff817787bb
RDX: 0000000000000077 RSI: 0000000000000008 RDI: 00000000000003be
RBP: ffffc90000c9fb30 R08: 0000000000000000 R09: fffffbfff0d3e840
R10: ffffffff869f4207 R11: 0000000000000000 R12: 0000000000000000
R13: ffff888115eb9300 R14: ffffc90000c9fbc8 R15: 000000000000000c
FS: 0000000000000000(0000) GS:ffff8882b0151000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f37266b6114 CR3: 00000000054a8000 CR4: 0000000000750ef0
PKRU: 55555554
Call Trace:
<TASK>
unlock_ovpn+0x8b/0xe0 [ovpn]
ovpn_peer_keepalive_work+0xe3/0x540 [ovpn]
? ovpn_peers_free+0x780/0x780 [ovpn]
? lock_acquire+0x56/0x70
? process_one_work+0x888/0x1740
process_one_work+0x933/0x1740
? pwq_dec_nr_in_flight+0x10b0/0x10b0
? move_linked_works+0x12d/0x2c0
? assign_work+0x163/0x270
worker_thread+0x4d6/0xd90
? preempt_count_sub+0x4c/0x70
? process_one_work+0x1740/0x1740
kthread+0x36c/0x710
? trace_preempt_on+0x8c/0x1e0
? kthread_is_per_cpu+0xc0/0xc0
? preempt_count_sub+0x4c/0x70
? _raw_spin_unlock_irq+0x36/0x60
? calculate_sigpending+0x7b/0xa0
? kthread_is_per_cpu+0xc0/0xc0
ret_from_fork+0x3a/0x80
? kthread_is_per_cpu+0xc0/0xc0
ret_from_fork_asm+0x11/0x20
</TASK>
Modules linked in: ovpn(O)
This happens because the peer deletion operation reaches
ovpn_socket_release() while ovpn_sock->sock (struct socket *)
and its sk member (struct sock *) are still both valid.
Here synchronize_rcu() is invoked, after which ovpn_sock->sock->sk
becomes NULL, due to the concurrent socket closing triggered
from userspace.
After having invoked synchronize_rcu(), ovpn_socket_release() will
attempt dereferencing ovpn_sock->sock->sk, triggering the crash
reported above.
The reason for accessing sk is that we need to retrieve its
protocol and continue the cleanup routine accordingly.
This crash can be easily produced by running openvpn userspace in
client mode with `--keepalive 10 20`, while entirely omitting this
option on the server side.
After 20 seconds ovpn will assume the peer (server) to be dead,
will start removing it and will notify userspace. The latter will
receive the notification and close the transport socket, thus
triggering the crash.
To fix the race condition for good, we need to refactor struct ovpn_socket.
Since ovpn is always only interested in the sock->sk member (struct sock *)
we can directly hold a reference to it, raher than accessing it via
its struct socket container.
This means changing "struct socket *ovpn_socket->sock" to
"struct sock *ovpn_socket->sk".
While acquiring a reference to sk, we can increase its refcounter
without affecting the socket close()/destroy() notification
(which we rely on when userspace closes a socket we are using).
By increasing sk's refcounter we know we can dereference it
in ovpn_socket_release() without incurring in any race condition
anymore.
ovpn_socket_release() will ultimately decrease the reference
counter.
Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
Fixes: 11851cbd60ea ("ovpn: implement TCP transport")
Reported-by: Qingfang Deng <dqfext@gmail.com>
Closes: https://github.com/OpenVPN/ovpn-net-next/issues/1
Tested-by: Gert Doering <gert@greenie.muc.de>
Link: https://www.mail-archive.com/openvpn-devel@lists.sourceforge.net/msg31575.html
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
2025-04-30 02:26:49 +02:00
|
|
|
} else if (sock->sk->sk_protocol == IPPROTO_TCP) {
|
2025-04-15 13:17:28 +02:00
|
|
|
/* wait for TCP jobs to terminate */
|
|
|
|
ovpn_tcp_socket_wait_finish(sock);
|
|
|
|
ovpn_peer_put(sock->peer);
|
|
|
|
}
|
ovpn: ensure sk is still valid during cleanup
Removing a peer while userspace attempts to close its transport
socket triggers a race condition resulting in the following
crash:
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000077: 0000 [#1] SMP KASAN
KASAN: null-ptr-deref in range [0x00000000000003b8-0x00000000000003bf]
CPU: 12 UID: 0 PID: 162 Comm: kworker/12:1 Tainted: G O 6.15.0-rc2-00635-g521139ac3840 #272 PREEMPT(full)
Tainted: [O]=OOT_MODULE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-20240910_120124-localhost 04/01/2014
Workqueue: events ovpn_peer_keepalive_work [ovpn]
RIP: 0010:ovpn_socket_release+0x23c/0x500 [ovpn]
Code: ea 03 80 3c 02 00 0f 85 71 02 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8b 64 24 18 49 8d bc 24 be 03 00 00 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 01 38 d0 7c 08 84 d2 0f 85 30
RSP: 0018:ffffc90000c9fb18 EFLAGS: 00010217
RAX: dffffc0000000000 RBX: ffff8881148d7940 RCX: ffffffff817787bb
RDX: 0000000000000077 RSI: 0000000000000008 RDI: 00000000000003be
RBP: ffffc90000c9fb30 R08: 0000000000000000 R09: fffffbfff0d3e840
R10: ffffffff869f4207 R11: 0000000000000000 R12: 0000000000000000
R13: ffff888115eb9300 R14: ffffc90000c9fbc8 R15: 000000000000000c
FS: 0000000000000000(0000) GS:ffff8882b0151000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f37266b6114 CR3: 00000000054a8000 CR4: 0000000000750ef0
PKRU: 55555554
Call Trace:
<TASK>
unlock_ovpn+0x8b/0xe0 [ovpn]
ovpn_peer_keepalive_work+0xe3/0x540 [ovpn]
? ovpn_peers_free+0x780/0x780 [ovpn]
? lock_acquire+0x56/0x70
? process_one_work+0x888/0x1740
process_one_work+0x933/0x1740
? pwq_dec_nr_in_flight+0x10b0/0x10b0
? move_linked_works+0x12d/0x2c0
? assign_work+0x163/0x270
worker_thread+0x4d6/0xd90
? preempt_count_sub+0x4c/0x70
? process_one_work+0x1740/0x1740
kthread+0x36c/0x710
? trace_preempt_on+0x8c/0x1e0
? kthread_is_per_cpu+0xc0/0xc0
? preempt_count_sub+0x4c/0x70
? _raw_spin_unlock_irq+0x36/0x60
? calculate_sigpending+0x7b/0xa0
? kthread_is_per_cpu+0xc0/0xc0
ret_from_fork+0x3a/0x80
? kthread_is_per_cpu+0xc0/0xc0
ret_from_fork_asm+0x11/0x20
</TASK>
Modules linked in: ovpn(O)
This happens because the peer deletion operation reaches
ovpn_socket_release() while ovpn_sock->sock (struct socket *)
and its sk member (struct sock *) are still both valid.
Here synchronize_rcu() is invoked, after which ovpn_sock->sock->sk
becomes NULL, due to the concurrent socket closing triggered
from userspace.
After having invoked synchronize_rcu(), ovpn_socket_release() will
attempt dereferencing ovpn_sock->sock->sk, triggering the crash
reported above.
The reason for accessing sk is that we need to retrieve its
protocol and continue the cleanup routine accordingly.
This crash can be easily produced by running openvpn userspace in
client mode with `--keepalive 10 20`, while entirely omitting this
option on the server side.
After 20 seconds ovpn will assume the peer (server) to be dead,
will start removing it and will notify userspace. The latter will
receive the notification and close the transport socket, thus
triggering the crash.
To fix the race condition for good, we need to refactor struct ovpn_socket.
Since ovpn is always only interested in the sock->sk member (struct sock *)
we can directly hold a reference to it, raher than accessing it via
its struct socket container.
This means changing "struct socket *ovpn_socket->sock" to
"struct sock *ovpn_socket->sk".
While acquiring a reference to sk, we can increase its refcounter
without affecting the socket close()/destroy() notification
(which we rely on when userspace closes a socket we are using).
By increasing sk's refcounter we know we can dereference it
in ovpn_socket_release() without incurring in any race condition
anymore.
ovpn_socket_release() will ultimately decrease the reference
counter.
Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
Fixes: 11851cbd60ea ("ovpn: implement TCP transport")
Reported-by: Qingfang Deng <dqfext@gmail.com>
Closes: https://github.com/OpenVPN/ovpn-net-next/issues/1
Tested-by: Gert Doering <gert@greenie.muc.de>
Link: https://www.mail-archive.com/openvpn-devel@lists.sourceforge.net/msg31575.html
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
2025-04-30 02:26:49 +02:00
|
|
|
/* drop reference acquired in ovpn_socket_new() */
|
|
|
|
sock_put(sock->sk);
|
2025-04-15 13:17:28 +02:00
|
|
|
/* we can call plain kfree() because we already waited one RCU
|
|
|
|
* period due to synchronize_rcu()
|
|
|
|
*/
|
|
|
|
kfree(sock);
|
|
|
|
}
|
2025-04-15 13:17:23 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
static bool ovpn_socket_hold(struct ovpn_socket *sock)
|
|
|
|
{
|
|
|
|
return kref_get_unless_zero(&sock->refcount);
|
|
|
|
}
|
|
|
|
|
ovpn: ensure sk is still valid during cleanup
Removing a peer while userspace attempts to close its transport
socket triggers a race condition resulting in the following
crash:
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000077: 0000 [#1] SMP KASAN
KASAN: null-ptr-deref in range [0x00000000000003b8-0x00000000000003bf]
CPU: 12 UID: 0 PID: 162 Comm: kworker/12:1 Tainted: G O 6.15.0-rc2-00635-g521139ac3840 #272 PREEMPT(full)
Tainted: [O]=OOT_MODULE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-20240910_120124-localhost 04/01/2014
Workqueue: events ovpn_peer_keepalive_work [ovpn]
RIP: 0010:ovpn_socket_release+0x23c/0x500 [ovpn]
Code: ea 03 80 3c 02 00 0f 85 71 02 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8b 64 24 18 49 8d bc 24 be 03 00 00 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 01 38 d0 7c 08 84 d2 0f 85 30
RSP: 0018:ffffc90000c9fb18 EFLAGS: 00010217
RAX: dffffc0000000000 RBX: ffff8881148d7940 RCX: ffffffff817787bb
RDX: 0000000000000077 RSI: 0000000000000008 RDI: 00000000000003be
RBP: ffffc90000c9fb30 R08: 0000000000000000 R09: fffffbfff0d3e840
R10: ffffffff869f4207 R11: 0000000000000000 R12: 0000000000000000
R13: ffff888115eb9300 R14: ffffc90000c9fbc8 R15: 000000000000000c
FS: 0000000000000000(0000) GS:ffff8882b0151000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f37266b6114 CR3: 00000000054a8000 CR4: 0000000000750ef0
PKRU: 55555554
Call Trace:
<TASK>
unlock_ovpn+0x8b/0xe0 [ovpn]
ovpn_peer_keepalive_work+0xe3/0x540 [ovpn]
? ovpn_peers_free+0x780/0x780 [ovpn]
? lock_acquire+0x56/0x70
? process_one_work+0x888/0x1740
process_one_work+0x933/0x1740
? pwq_dec_nr_in_flight+0x10b0/0x10b0
? move_linked_works+0x12d/0x2c0
? assign_work+0x163/0x270
worker_thread+0x4d6/0xd90
? preempt_count_sub+0x4c/0x70
? process_one_work+0x1740/0x1740
kthread+0x36c/0x710
? trace_preempt_on+0x8c/0x1e0
? kthread_is_per_cpu+0xc0/0xc0
? preempt_count_sub+0x4c/0x70
? _raw_spin_unlock_irq+0x36/0x60
? calculate_sigpending+0x7b/0xa0
? kthread_is_per_cpu+0xc0/0xc0
ret_from_fork+0x3a/0x80
? kthread_is_per_cpu+0xc0/0xc0
ret_from_fork_asm+0x11/0x20
</TASK>
Modules linked in: ovpn(O)
This happens because the peer deletion operation reaches
ovpn_socket_release() while ovpn_sock->sock (struct socket *)
and its sk member (struct sock *) are still both valid.
Here synchronize_rcu() is invoked, after which ovpn_sock->sock->sk
becomes NULL, due to the concurrent socket closing triggered
from userspace.
After having invoked synchronize_rcu(), ovpn_socket_release() will
attempt dereferencing ovpn_sock->sock->sk, triggering the crash
reported above.
The reason for accessing sk is that we need to retrieve its
protocol and continue the cleanup routine accordingly.
This crash can be easily produced by running openvpn userspace in
client mode with `--keepalive 10 20`, while entirely omitting this
option on the server side.
After 20 seconds ovpn will assume the peer (server) to be dead,
will start removing it and will notify userspace. The latter will
receive the notification and close the transport socket, thus
triggering the crash.
To fix the race condition for good, we need to refactor struct ovpn_socket.
Since ovpn is always only interested in the sock->sk member (struct sock *)
we can directly hold a reference to it, raher than accessing it via
its struct socket container.
This means changing "struct socket *ovpn_socket->sock" to
"struct sock *ovpn_socket->sk".
While acquiring a reference to sk, we can increase its refcounter
without affecting the socket close()/destroy() notification
(which we rely on when userspace closes a socket we are using).
By increasing sk's refcounter we know we can dereference it
in ovpn_socket_release() without incurring in any race condition
anymore.
ovpn_socket_release() will ultimately decrease the reference
counter.
Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
Fixes: 11851cbd60ea ("ovpn: implement TCP transport")
Reported-by: Qingfang Deng <dqfext@gmail.com>
Closes: https://github.com/OpenVPN/ovpn-net-next/issues/1
Tested-by: Gert Doering <gert@greenie.muc.de>
Link: https://www.mail-archive.com/openvpn-devel@lists.sourceforge.net/msg31575.html
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
2025-04-30 02:26:49 +02:00
|
|
|
static int ovpn_socket_attach(struct ovpn_socket *ovpn_sock,
|
|
|
|
struct socket *sock,
|
|
|
|
struct ovpn_peer *peer)
|
2025-04-15 13:17:23 +02:00
|
|
|
{
|
ovpn: ensure sk is still valid during cleanup
Removing a peer while userspace attempts to close its transport
socket triggers a race condition resulting in the following
crash:
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000077: 0000 [#1] SMP KASAN
KASAN: null-ptr-deref in range [0x00000000000003b8-0x00000000000003bf]
CPU: 12 UID: 0 PID: 162 Comm: kworker/12:1 Tainted: G O 6.15.0-rc2-00635-g521139ac3840 #272 PREEMPT(full)
Tainted: [O]=OOT_MODULE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-20240910_120124-localhost 04/01/2014
Workqueue: events ovpn_peer_keepalive_work [ovpn]
RIP: 0010:ovpn_socket_release+0x23c/0x500 [ovpn]
Code: ea 03 80 3c 02 00 0f 85 71 02 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8b 64 24 18 49 8d bc 24 be 03 00 00 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 01 38 d0 7c 08 84 d2 0f 85 30
RSP: 0018:ffffc90000c9fb18 EFLAGS: 00010217
RAX: dffffc0000000000 RBX: ffff8881148d7940 RCX: ffffffff817787bb
RDX: 0000000000000077 RSI: 0000000000000008 RDI: 00000000000003be
RBP: ffffc90000c9fb30 R08: 0000000000000000 R09: fffffbfff0d3e840
R10: ffffffff869f4207 R11: 0000000000000000 R12: 0000000000000000
R13: ffff888115eb9300 R14: ffffc90000c9fbc8 R15: 000000000000000c
FS: 0000000000000000(0000) GS:ffff8882b0151000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f37266b6114 CR3: 00000000054a8000 CR4: 0000000000750ef0
PKRU: 55555554
Call Trace:
<TASK>
unlock_ovpn+0x8b/0xe0 [ovpn]
ovpn_peer_keepalive_work+0xe3/0x540 [ovpn]
? ovpn_peers_free+0x780/0x780 [ovpn]
? lock_acquire+0x56/0x70
? process_one_work+0x888/0x1740
process_one_work+0x933/0x1740
? pwq_dec_nr_in_flight+0x10b0/0x10b0
? move_linked_works+0x12d/0x2c0
? assign_work+0x163/0x270
worker_thread+0x4d6/0xd90
? preempt_count_sub+0x4c/0x70
? process_one_work+0x1740/0x1740
kthread+0x36c/0x710
? trace_preempt_on+0x8c/0x1e0
? kthread_is_per_cpu+0xc0/0xc0
? preempt_count_sub+0x4c/0x70
? _raw_spin_unlock_irq+0x36/0x60
? calculate_sigpending+0x7b/0xa0
? kthread_is_per_cpu+0xc0/0xc0
ret_from_fork+0x3a/0x80
? kthread_is_per_cpu+0xc0/0xc0
ret_from_fork_asm+0x11/0x20
</TASK>
Modules linked in: ovpn(O)
This happens because the peer deletion operation reaches
ovpn_socket_release() while ovpn_sock->sock (struct socket *)
and its sk member (struct sock *) are still both valid.
Here synchronize_rcu() is invoked, after which ovpn_sock->sock->sk
becomes NULL, due to the concurrent socket closing triggered
from userspace.
After having invoked synchronize_rcu(), ovpn_socket_release() will
attempt dereferencing ovpn_sock->sock->sk, triggering the crash
reported above.
The reason for accessing sk is that we need to retrieve its
protocol and continue the cleanup routine accordingly.
This crash can be easily produced by running openvpn userspace in
client mode with `--keepalive 10 20`, while entirely omitting this
option on the server side.
After 20 seconds ovpn will assume the peer (server) to be dead,
will start removing it and will notify userspace. The latter will
receive the notification and close the transport socket, thus
triggering the crash.
To fix the race condition for good, we need to refactor struct ovpn_socket.
Since ovpn is always only interested in the sock->sk member (struct sock *)
we can directly hold a reference to it, raher than accessing it via
its struct socket container.
This means changing "struct socket *ovpn_socket->sock" to
"struct sock *ovpn_socket->sk".
While acquiring a reference to sk, we can increase its refcounter
without affecting the socket close()/destroy() notification
(which we rely on when userspace closes a socket we are using).
By increasing sk's refcounter we know we can dereference it
in ovpn_socket_release() without incurring in any race condition
anymore.
ovpn_socket_release() will ultimately decrease the reference
counter.
Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
Fixes: 11851cbd60ea ("ovpn: implement TCP transport")
Reported-by: Qingfang Deng <dqfext@gmail.com>
Closes: https://github.com/OpenVPN/ovpn-net-next/issues/1
Tested-by: Gert Doering <gert@greenie.muc.de>
Link: https://www.mail-archive.com/openvpn-devel@lists.sourceforge.net/msg31575.html
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
2025-04-30 02:26:49 +02:00
|
|
|
if (sock->sk->sk_protocol == IPPROTO_UDP)
|
|
|
|
return ovpn_udp_socket_attach(ovpn_sock, sock, peer->ovpn);
|
|
|
|
else if (sock->sk->sk_protocol == IPPROTO_TCP)
|
|
|
|
return ovpn_tcp_socket_attach(ovpn_sock, peer);
|
2025-04-15 13:17:23 +02:00
|
|
|
|
|
|
|
return -EOPNOTSUPP;
|
|
|
|
}
|
|
|
|
|
|
|
|
/**
|
|
|
|
* ovpn_socket_new - create a new socket and initialize it
|
|
|
|
* @sock: the kernel socket to embed
|
|
|
|
* @peer: the peer reachable via this socket
|
|
|
|
*
|
|
|
|
* Return: an openvpn socket on success or a negative error code otherwise
|
|
|
|
*/
|
|
|
|
struct ovpn_socket *ovpn_socket_new(struct socket *sock, struct ovpn_peer *peer)
|
|
|
|
{
|
|
|
|
struct ovpn_socket *ovpn_sock;
|
ovpn: ensure sk is still valid during cleanup
Removing a peer while userspace attempts to close its transport
socket triggers a race condition resulting in the following
crash:
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000077: 0000 [#1] SMP KASAN
KASAN: null-ptr-deref in range [0x00000000000003b8-0x00000000000003bf]
CPU: 12 UID: 0 PID: 162 Comm: kworker/12:1 Tainted: G O 6.15.0-rc2-00635-g521139ac3840 #272 PREEMPT(full)
Tainted: [O]=OOT_MODULE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-20240910_120124-localhost 04/01/2014
Workqueue: events ovpn_peer_keepalive_work [ovpn]
RIP: 0010:ovpn_socket_release+0x23c/0x500 [ovpn]
Code: ea 03 80 3c 02 00 0f 85 71 02 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8b 64 24 18 49 8d bc 24 be 03 00 00 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 01 38 d0 7c 08 84 d2 0f 85 30
RSP: 0018:ffffc90000c9fb18 EFLAGS: 00010217
RAX: dffffc0000000000 RBX: ffff8881148d7940 RCX: ffffffff817787bb
RDX: 0000000000000077 RSI: 0000000000000008 RDI: 00000000000003be
RBP: ffffc90000c9fb30 R08: 0000000000000000 R09: fffffbfff0d3e840
R10: ffffffff869f4207 R11: 0000000000000000 R12: 0000000000000000
R13: ffff888115eb9300 R14: ffffc90000c9fbc8 R15: 000000000000000c
FS: 0000000000000000(0000) GS:ffff8882b0151000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f37266b6114 CR3: 00000000054a8000 CR4: 0000000000750ef0
PKRU: 55555554
Call Trace:
<TASK>
unlock_ovpn+0x8b/0xe0 [ovpn]
ovpn_peer_keepalive_work+0xe3/0x540 [ovpn]
? ovpn_peers_free+0x780/0x780 [ovpn]
? lock_acquire+0x56/0x70
? process_one_work+0x888/0x1740
process_one_work+0x933/0x1740
? pwq_dec_nr_in_flight+0x10b0/0x10b0
? move_linked_works+0x12d/0x2c0
? assign_work+0x163/0x270
worker_thread+0x4d6/0xd90
? preempt_count_sub+0x4c/0x70
? process_one_work+0x1740/0x1740
kthread+0x36c/0x710
? trace_preempt_on+0x8c/0x1e0
? kthread_is_per_cpu+0xc0/0xc0
? preempt_count_sub+0x4c/0x70
? _raw_spin_unlock_irq+0x36/0x60
? calculate_sigpending+0x7b/0xa0
? kthread_is_per_cpu+0xc0/0xc0
ret_from_fork+0x3a/0x80
? kthread_is_per_cpu+0xc0/0xc0
ret_from_fork_asm+0x11/0x20
</TASK>
Modules linked in: ovpn(O)
This happens because the peer deletion operation reaches
ovpn_socket_release() while ovpn_sock->sock (struct socket *)
and its sk member (struct sock *) are still both valid.
Here synchronize_rcu() is invoked, after which ovpn_sock->sock->sk
becomes NULL, due to the concurrent socket closing triggered
from userspace.
After having invoked synchronize_rcu(), ovpn_socket_release() will
attempt dereferencing ovpn_sock->sock->sk, triggering the crash
reported above.
The reason for accessing sk is that we need to retrieve its
protocol and continue the cleanup routine accordingly.
This crash can be easily produced by running openvpn userspace in
client mode with `--keepalive 10 20`, while entirely omitting this
option on the server side.
After 20 seconds ovpn will assume the peer (server) to be dead,
will start removing it and will notify userspace. The latter will
receive the notification and close the transport socket, thus
triggering the crash.
To fix the race condition for good, we need to refactor struct ovpn_socket.
Since ovpn is always only interested in the sock->sk member (struct sock *)
we can directly hold a reference to it, raher than accessing it via
its struct socket container.
This means changing "struct socket *ovpn_socket->sock" to
"struct sock *ovpn_socket->sk".
While acquiring a reference to sk, we can increase its refcounter
without affecting the socket close()/destroy() notification
(which we rely on when userspace closes a socket we are using).
By increasing sk's refcounter we know we can dereference it
in ovpn_socket_release() without incurring in any race condition
anymore.
ovpn_socket_release() will ultimately decrease the reference
counter.
Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
Fixes: 11851cbd60ea ("ovpn: implement TCP transport")
Reported-by: Qingfang Deng <dqfext@gmail.com>
Closes: https://github.com/OpenVPN/ovpn-net-next/issues/1
Tested-by: Gert Doering <gert@greenie.muc.de>
Link: https://www.mail-archive.com/openvpn-devel@lists.sourceforge.net/msg31575.html
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
2025-04-30 02:26:49 +02:00
|
|
|
struct sock *sk = sock->sk;
|
2025-04-15 13:17:23 +02:00
|
|
|
int ret;
|
|
|
|
|
ovpn: ensure sk is still valid during cleanup
Removing a peer while userspace attempts to close its transport
socket triggers a race condition resulting in the following
crash:
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000077: 0000 [#1] SMP KASAN
KASAN: null-ptr-deref in range [0x00000000000003b8-0x00000000000003bf]
CPU: 12 UID: 0 PID: 162 Comm: kworker/12:1 Tainted: G O 6.15.0-rc2-00635-g521139ac3840 #272 PREEMPT(full)
Tainted: [O]=OOT_MODULE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-20240910_120124-localhost 04/01/2014
Workqueue: events ovpn_peer_keepalive_work [ovpn]
RIP: 0010:ovpn_socket_release+0x23c/0x500 [ovpn]
Code: ea 03 80 3c 02 00 0f 85 71 02 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8b 64 24 18 49 8d bc 24 be 03 00 00 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 01 38 d0 7c 08 84 d2 0f 85 30
RSP: 0018:ffffc90000c9fb18 EFLAGS: 00010217
RAX: dffffc0000000000 RBX: ffff8881148d7940 RCX: ffffffff817787bb
RDX: 0000000000000077 RSI: 0000000000000008 RDI: 00000000000003be
RBP: ffffc90000c9fb30 R08: 0000000000000000 R09: fffffbfff0d3e840
R10: ffffffff869f4207 R11: 0000000000000000 R12: 0000000000000000
R13: ffff888115eb9300 R14: ffffc90000c9fbc8 R15: 000000000000000c
FS: 0000000000000000(0000) GS:ffff8882b0151000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f37266b6114 CR3: 00000000054a8000 CR4: 0000000000750ef0
PKRU: 55555554
Call Trace:
<TASK>
unlock_ovpn+0x8b/0xe0 [ovpn]
ovpn_peer_keepalive_work+0xe3/0x540 [ovpn]
? ovpn_peers_free+0x780/0x780 [ovpn]
? lock_acquire+0x56/0x70
? process_one_work+0x888/0x1740
process_one_work+0x933/0x1740
? pwq_dec_nr_in_flight+0x10b0/0x10b0
? move_linked_works+0x12d/0x2c0
? assign_work+0x163/0x270
worker_thread+0x4d6/0xd90
? preempt_count_sub+0x4c/0x70
? process_one_work+0x1740/0x1740
kthread+0x36c/0x710
? trace_preempt_on+0x8c/0x1e0
? kthread_is_per_cpu+0xc0/0xc0
? preempt_count_sub+0x4c/0x70
? _raw_spin_unlock_irq+0x36/0x60
? calculate_sigpending+0x7b/0xa0
? kthread_is_per_cpu+0xc0/0xc0
ret_from_fork+0x3a/0x80
? kthread_is_per_cpu+0xc0/0xc0
ret_from_fork_asm+0x11/0x20
</TASK>
Modules linked in: ovpn(O)
This happens because the peer deletion operation reaches
ovpn_socket_release() while ovpn_sock->sock (struct socket *)
and its sk member (struct sock *) are still both valid.
Here synchronize_rcu() is invoked, after which ovpn_sock->sock->sk
becomes NULL, due to the concurrent socket closing triggered
from userspace.
After having invoked synchronize_rcu(), ovpn_socket_release() will
attempt dereferencing ovpn_sock->sock->sk, triggering the crash
reported above.
The reason for accessing sk is that we need to retrieve its
protocol and continue the cleanup routine accordingly.
This crash can be easily produced by running openvpn userspace in
client mode with `--keepalive 10 20`, while entirely omitting this
option on the server side.
After 20 seconds ovpn will assume the peer (server) to be dead,
will start removing it and will notify userspace. The latter will
receive the notification and close the transport socket, thus
triggering the crash.
To fix the race condition for good, we need to refactor struct ovpn_socket.
Since ovpn is always only interested in the sock->sk member (struct sock *)
we can directly hold a reference to it, raher than accessing it via
its struct socket container.
This means changing "struct socket *ovpn_socket->sock" to
"struct sock *ovpn_socket->sk".
While acquiring a reference to sk, we can increase its refcounter
without affecting the socket close()/destroy() notification
(which we rely on when userspace closes a socket we are using).
By increasing sk's refcounter we know we can dereference it
in ovpn_socket_release() without incurring in any race condition
anymore.
ovpn_socket_release() will ultimately decrease the reference
counter.
Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
Fixes: 11851cbd60ea ("ovpn: implement TCP transport")
Reported-by: Qingfang Deng <dqfext@gmail.com>
Closes: https://github.com/OpenVPN/ovpn-net-next/issues/1
Tested-by: Gert Doering <gert@greenie.muc.de>
Link: https://www.mail-archive.com/openvpn-devel@lists.sourceforge.net/msg31575.html
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
2025-04-30 02:26:49 +02:00
|
|
|
lock_sock(sk);
|
2025-04-15 13:17:23 +02:00
|
|
|
|
|
|
|
/* a TCP socket can only be owned by a single peer, therefore there
|
|
|
|
* can't be any other user
|
|
|
|
*/
|
ovpn: ensure sk is still valid during cleanup
Removing a peer while userspace attempts to close its transport
socket triggers a race condition resulting in the following
crash:
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000077: 0000 [#1] SMP KASAN
KASAN: null-ptr-deref in range [0x00000000000003b8-0x00000000000003bf]
CPU: 12 UID: 0 PID: 162 Comm: kworker/12:1 Tainted: G O 6.15.0-rc2-00635-g521139ac3840 #272 PREEMPT(full)
Tainted: [O]=OOT_MODULE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-20240910_120124-localhost 04/01/2014
Workqueue: events ovpn_peer_keepalive_work [ovpn]
RIP: 0010:ovpn_socket_release+0x23c/0x500 [ovpn]
Code: ea 03 80 3c 02 00 0f 85 71 02 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8b 64 24 18 49 8d bc 24 be 03 00 00 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 01 38 d0 7c 08 84 d2 0f 85 30
RSP: 0018:ffffc90000c9fb18 EFLAGS: 00010217
RAX: dffffc0000000000 RBX: ffff8881148d7940 RCX: ffffffff817787bb
RDX: 0000000000000077 RSI: 0000000000000008 RDI: 00000000000003be
RBP: ffffc90000c9fb30 R08: 0000000000000000 R09: fffffbfff0d3e840
R10: ffffffff869f4207 R11: 0000000000000000 R12: 0000000000000000
R13: ffff888115eb9300 R14: ffffc90000c9fbc8 R15: 000000000000000c
FS: 0000000000000000(0000) GS:ffff8882b0151000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f37266b6114 CR3: 00000000054a8000 CR4: 0000000000750ef0
PKRU: 55555554
Call Trace:
<TASK>
unlock_ovpn+0x8b/0xe0 [ovpn]
ovpn_peer_keepalive_work+0xe3/0x540 [ovpn]
? ovpn_peers_free+0x780/0x780 [ovpn]
? lock_acquire+0x56/0x70
? process_one_work+0x888/0x1740
process_one_work+0x933/0x1740
? pwq_dec_nr_in_flight+0x10b0/0x10b0
? move_linked_works+0x12d/0x2c0
? assign_work+0x163/0x270
worker_thread+0x4d6/0xd90
? preempt_count_sub+0x4c/0x70
? process_one_work+0x1740/0x1740
kthread+0x36c/0x710
? trace_preempt_on+0x8c/0x1e0
? kthread_is_per_cpu+0xc0/0xc0
? preempt_count_sub+0x4c/0x70
? _raw_spin_unlock_irq+0x36/0x60
? calculate_sigpending+0x7b/0xa0
? kthread_is_per_cpu+0xc0/0xc0
ret_from_fork+0x3a/0x80
? kthread_is_per_cpu+0xc0/0xc0
ret_from_fork_asm+0x11/0x20
</TASK>
Modules linked in: ovpn(O)
This happens because the peer deletion operation reaches
ovpn_socket_release() while ovpn_sock->sock (struct socket *)
and its sk member (struct sock *) are still both valid.
Here synchronize_rcu() is invoked, after which ovpn_sock->sock->sk
becomes NULL, due to the concurrent socket closing triggered
from userspace.
After having invoked synchronize_rcu(), ovpn_socket_release() will
attempt dereferencing ovpn_sock->sock->sk, triggering the crash
reported above.
The reason for accessing sk is that we need to retrieve its
protocol and continue the cleanup routine accordingly.
This crash can be easily produced by running openvpn userspace in
client mode with `--keepalive 10 20`, while entirely omitting this
option on the server side.
After 20 seconds ovpn will assume the peer (server) to be dead,
will start removing it and will notify userspace. The latter will
receive the notification and close the transport socket, thus
triggering the crash.
To fix the race condition for good, we need to refactor struct ovpn_socket.
Since ovpn is always only interested in the sock->sk member (struct sock *)
we can directly hold a reference to it, raher than accessing it via
its struct socket container.
This means changing "struct socket *ovpn_socket->sock" to
"struct sock *ovpn_socket->sk".
While acquiring a reference to sk, we can increase its refcounter
without affecting the socket close()/destroy() notification
(which we rely on when userspace closes a socket we are using).
By increasing sk's refcounter we know we can dereference it
in ovpn_socket_release() without incurring in any race condition
anymore.
ovpn_socket_release() will ultimately decrease the reference
counter.
Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
Fixes: 11851cbd60ea ("ovpn: implement TCP transport")
Reported-by: Qingfang Deng <dqfext@gmail.com>
Closes: https://github.com/OpenVPN/ovpn-net-next/issues/1
Tested-by: Gert Doering <gert@greenie.muc.de>
Link: https://www.mail-archive.com/openvpn-devel@lists.sourceforge.net/msg31575.html
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
2025-04-30 02:26:49 +02:00
|
|
|
if (sk->sk_protocol == IPPROTO_TCP && sk->sk_user_data) {
|
2025-04-15 13:17:23 +02:00
|
|
|
ovpn_sock = ERR_PTR(-EBUSY);
|
|
|
|
goto sock_release;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* a UDP socket can be shared across multiple peers, but we must make
|
|
|
|
* sure it is not owned by something else
|
|
|
|
*/
|
ovpn: ensure sk is still valid during cleanup
Removing a peer while userspace attempts to close its transport
socket triggers a race condition resulting in the following
crash:
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000077: 0000 [#1] SMP KASAN
KASAN: null-ptr-deref in range [0x00000000000003b8-0x00000000000003bf]
CPU: 12 UID: 0 PID: 162 Comm: kworker/12:1 Tainted: G O 6.15.0-rc2-00635-g521139ac3840 #272 PREEMPT(full)
Tainted: [O]=OOT_MODULE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-20240910_120124-localhost 04/01/2014
Workqueue: events ovpn_peer_keepalive_work [ovpn]
RIP: 0010:ovpn_socket_release+0x23c/0x500 [ovpn]
Code: ea 03 80 3c 02 00 0f 85 71 02 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8b 64 24 18 49 8d bc 24 be 03 00 00 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 01 38 d0 7c 08 84 d2 0f 85 30
RSP: 0018:ffffc90000c9fb18 EFLAGS: 00010217
RAX: dffffc0000000000 RBX: ffff8881148d7940 RCX: ffffffff817787bb
RDX: 0000000000000077 RSI: 0000000000000008 RDI: 00000000000003be
RBP: ffffc90000c9fb30 R08: 0000000000000000 R09: fffffbfff0d3e840
R10: ffffffff869f4207 R11: 0000000000000000 R12: 0000000000000000
R13: ffff888115eb9300 R14: ffffc90000c9fbc8 R15: 000000000000000c
FS: 0000000000000000(0000) GS:ffff8882b0151000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f37266b6114 CR3: 00000000054a8000 CR4: 0000000000750ef0
PKRU: 55555554
Call Trace:
<TASK>
unlock_ovpn+0x8b/0xe0 [ovpn]
ovpn_peer_keepalive_work+0xe3/0x540 [ovpn]
? ovpn_peers_free+0x780/0x780 [ovpn]
? lock_acquire+0x56/0x70
? process_one_work+0x888/0x1740
process_one_work+0x933/0x1740
? pwq_dec_nr_in_flight+0x10b0/0x10b0
? move_linked_works+0x12d/0x2c0
? assign_work+0x163/0x270
worker_thread+0x4d6/0xd90
? preempt_count_sub+0x4c/0x70
? process_one_work+0x1740/0x1740
kthread+0x36c/0x710
? trace_preempt_on+0x8c/0x1e0
? kthread_is_per_cpu+0xc0/0xc0
? preempt_count_sub+0x4c/0x70
? _raw_spin_unlock_irq+0x36/0x60
? calculate_sigpending+0x7b/0xa0
? kthread_is_per_cpu+0xc0/0xc0
ret_from_fork+0x3a/0x80
? kthread_is_per_cpu+0xc0/0xc0
ret_from_fork_asm+0x11/0x20
</TASK>
Modules linked in: ovpn(O)
This happens because the peer deletion operation reaches
ovpn_socket_release() while ovpn_sock->sock (struct socket *)
and its sk member (struct sock *) are still both valid.
Here synchronize_rcu() is invoked, after which ovpn_sock->sock->sk
becomes NULL, due to the concurrent socket closing triggered
from userspace.
After having invoked synchronize_rcu(), ovpn_socket_release() will
attempt dereferencing ovpn_sock->sock->sk, triggering the crash
reported above.
The reason for accessing sk is that we need to retrieve its
protocol and continue the cleanup routine accordingly.
This crash can be easily produced by running openvpn userspace in
client mode with `--keepalive 10 20`, while entirely omitting this
option on the server side.
After 20 seconds ovpn will assume the peer (server) to be dead,
will start removing it and will notify userspace. The latter will
receive the notification and close the transport socket, thus
triggering the crash.
To fix the race condition for good, we need to refactor struct ovpn_socket.
Since ovpn is always only interested in the sock->sk member (struct sock *)
we can directly hold a reference to it, raher than accessing it via
its struct socket container.
This means changing "struct socket *ovpn_socket->sock" to
"struct sock *ovpn_socket->sk".
While acquiring a reference to sk, we can increase its refcounter
without affecting the socket close()/destroy() notification
(which we rely on when userspace closes a socket we are using).
By increasing sk's refcounter we know we can dereference it
in ovpn_socket_release() without incurring in any race condition
anymore.
ovpn_socket_release() will ultimately decrease the reference
counter.
Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
Fixes: 11851cbd60ea ("ovpn: implement TCP transport")
Reported-by: Qingfang Deng <dqfext@gmail.com>
Closes: https://github.com/OpenVPN/ovpn-net-next/issues/1
Tested-by: Gert Doering <gert@greenie.muc.de>
Link: https://www.mail-archive.com/openvpn-devel@lists.sourceforge.net/msg31575.html
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
2025-04-30 02:26:49 +02:00
|
|
|
if (sk->sk_protocol == IPPROTO_UDP) {
|
|
|
|
u8 type = READ_ONCE(udp_sk(sk)->encap_type);
|
2025-04-15 13:17:23 +02:00
|
|
|
|
|
|
|
/* socket owned by other encapsulation module */
|
|
|
|
if (type && type != UDP_ENCAP_OVPNINUDP) {
|
|
|
|
ovpn_sock = ERR_PTR(-EBUSY);
|
|
|
|
goto sock_release;
|
|
|
|
}
|
|
|
|
|
|
|
|
rcu_read_lock();
|
ovpn: ensure sk is still valid during cleanup
Removing a peer while userspace attempts to close its transport
socket triggers a race condition resulting in the following
crash:
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000077: 0000 [#1] SMP KASAN
KASAN: null-ptr-deref in range [0x00000000000003b8-0x00000000000003bf]
CPU: 12 UID: 0 PID: 162 Comm: kworker/12:1 Tainted: G O 6.15.0-rc2-00635-g521139ac3840 #272 PREEMPT(full)
Tainted: [O]=OOT_MODULE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-20240910_120124-localhost 04/01/2014
Workqueue: events ovpn_peer_keepalive_work [ovpn]
RIP: 0010:ovpn_socket_release+0x23c/0x500 [ovpn]
Code: ea 03 80 3c 02 00 0f 85 71 02 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8b 64 24 18 49 8d bc 24 be 03 00 00 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 01 38 d0 7c 08 84 d2 0f 85 30
RSP: 0018:ffffc90000c9fb18 EFLAGS: 00010217
RAX: dffffc0000000000 RBX: ffff8881148d7940 RCX: ffffffff817787bb
RDX: 0000000000000077 RSI: 0000000000000008 RDI: 00000000000003be
RBP: ffffc90000c9fb30 R08: 0000000000000000 R09: fffffbfff0d3e840
R10: ffffffff869f4207 R11: 0000000000000000 R12: 0000000000000000
R13: ffff888115eb9300 R14: ffffc90000c9fbc8 R15: 000000000000000c
FS: 0000000000000000(0000) GS:ffff8882b0151000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f37266b6114 CR3: 00000000054a8000 CR4: 0000000000750ef0
PKRU: 55555554
Call Trace:
<TASK>
unlock_ovpn+0x8b/0xe0 [ovpn]
ovpn_peer_keepalive_work+0xe3/0x540 [ovpn]
? ovpn_peers_free+0x780/0x780 [ovpn]
? lock_acquire+0x56/0x70
? process_one_work+0x888/0x1740
process_one_work+0x933/0x1740
? pwq_dec_nr_in_flight+0x10b0/0x10b0
? move_linked_works+0x12d/0x2c0
? assign_work+0x163/0x270
worker_thread+0x4d6/0xd90
? preempt_count_sub+0x4c/0x70
? process_one_work+0x1740/0x1740
kthread+0x36c/0x710
? trace_preempt_on+0x8c/0x1e0
? kthread_is_per_cpu+0xc0/0xc0
? preempt_count_sub+0x4c/0x70
? _raw_spin_unlock_irq+0x36/0x60
? calculate_sigpending+0x7b/0xa0
? kthread_is_per_cpu+0xc0/0xc0
ret_from_fork+0x3a/0x80
? kthread_is_per_cpu+0xc0/0xc0
ret_from_fork_asm+0x11/0x20
</TASK>
Modules linked in: ovpn(O)
This happens because the peer deletion operation reaches
ovpn_socket_release() while ovpn_sock->sock (struct socket *)
and its sk member (struct sock *) are still both valid.
Here synchronize_rcu() is invoked, after which ovpn_sock->sock->sk
becomes NULL, due to the concurrent socket closing triggered
from userspace.
After having invoked synchronize_rcu(), ovpn_socket_release() will
attempt dereferencing ovpn_sock->sock->sk, triggering the crash
reported above.
The reason for accessing sk is that we need to retrieve its
protocol and continue the cleanup routine accordingly.
This crash can be easily produced by running openvpn userspace in
client mode with `--keepalive 10 20`, while entirely omitting this
option on the server side.
After 20 seconds ovpn will assume the peer (server) to be dead,
will start removing it and will notify userspace. The latter will
receive the notification and close the transport socket, thus
triggering the crash.
To fix the race condition for good, we need to refactor struct ovpn_socket.
Since ovpn is always only interested in the sock->sk member (struct sock *)
we can directly hold a reference to it, raher than accessing it via
its struct socket container.
This means changing "struct socket *ovpn_socket->sock" to
"struct sock *ovpn_socket->sk".
While acquiring a reference to sk, we can increase its refcounter
without affecting the socket close()/destroy() notification
(which we rely on when userspace closes a socket we are using).
By increasing sk's refcounter we know we can dereference it
in ovpn_socket_release() without incurring in any race condition
anymore.
ovpn_socket_release() will ultimately decrease the reference
counter.
Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
Fixes: 11851cbd60ea ("ovpn: implement TCP transport")
Reported-by: Qingfang Deng <dqfext@gmail.com>
Closes: https://github.com/OpenVPN/ovpn-net-next/issues/1
Tested-by: Gert Doering <gert@greenie.muc.de>
Link: https://www.mail-archive.com/openvpn-devel@lists.sourceforge.net/msg31575.html
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
2025-04-30 02:26:49 +02:00
|
|
|
ovpn_sock = rcu_dereference_sk_user_data(sk);
|
2025-04-15 13:17:23 +02:00
|
|
|
if (ovpn_sock) {
|
|
|
|
/* socket owned by another ovpn instance, we can't use it */
|
|
|
|
if (ovpn_sock->ovpn != peer->ovpn) {
|
|
|
|
ovpn_sock = ERR_PTR(-EBUSY);
|
|
|
|
rcu_read_unlock();
|
|
|
|
goto sock_release;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* this socket is already owned by this instance,
|
|
|
|
* therefore we can increase the refcounter and
|
|
|
|
* use it as expected
|
|
|
|
*/
|
|
|
|
if (WARN_ON(!ovpn_socket_hold(ovpn_sock))) {
|
|
|
|
/* this should never happen because setting
|
|
|
|
* the refcnt to 0 and detaching the socket
|
|
|
|
* is expected to be atomic
|
|
|
|
*/
|
|
|
|
ovpn_sock = ERR_PTR(-EAGAIN);
|
|
|
|
rcu_read_unlock();
|
|
|
|
goto sock_release;
|
|
|
|
}
|
|
|
|
|
|
|
|
rcu_read_unlock();
|
|
|
|
goto sock_release;
|
|
|
|
}
|
|
|
|
rcu_read_unlock();
|
|
|
|
}
|
|
|
|
|
|
|
|
/* socket is not owned: attach to this ovpn instance */
|
|
|
|
|
|
|
|
ovpn_sock = kzalloc(sizeof(*ovpn_sock), GFP_KERNEL);
|
|
|
|
if (!ovpn_sock) {
|
|
|
|
ovpn_sock = ERR_PTR(-ENOMEM);
|
|
|
|
goto sock_release;
|
|
|
|
}
|
|
|
|
|
ovpn: ensure sk is still valid during cleanup
Removing a peer while userspace attempts to close its transport
socket triggers a race condition resulting in the following
crash:
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000077: 0000 [#1] SMP KASAN
KASAN: null-ptr-deref in range [0x00000000000003b8-0x00000000000003bf]
CPU: 12 UID: 0 PID: 162 Comm: kworker/12:1 Tainted: G O 6.15.0-rc2-00635-g521139ac3840 #272 PREEMPT(full)
Tainted: [O]=OOT_MODULE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-20240910_120124-localhost 04/01/2014
Workqueue: events ovpn_peer_keepalive_work [ovpn]
RIP: 0010:ovpn_socket_release+0x23c/0x500 [ovpn]
Code: ea 03 80 3c 02 00 0f 85 71 02 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8b 64 24 18 49 8d bc 24 be 03 00 00 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 01 38 d0 7c 08 84 d2 0f 85 30
RSP: 0018:ffffc90000c9fb18 EFLAGS: 00010217
RAX: dffffc0000000000 RBX: ffff8881148d7940 RCX: ffffffff817787bb
RDX: 0000000000000077 RSI: 0000000000000008 RDI: 00000000000003be
RBP: ffffc90000c9fb30 R08: 0000000000000000 R09: fffffbfff0d3e840
R10: ffffffff869f4207 R11: 0000000000000000 R12: 0000000000000000
R13: ffff888115eb9300 R14: ffffc90000c9fbc8 R15: 000000000000000c
FS: 0000000000000000(0000) GS:ffff8882b0151000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f37266b6114 CR3: 00000000054a8000 CR4: 0000000000750ef0
PKRU: 55555554
Call Trace:
<TASK>
unlock_ovpn+0x8b/0xe0 [ovpn]
ovpn_peer_keepalive_work+0xe3/0x540 [ovpn]
? ovpn_peers_free+0x780/0x780 [ovpn]
? lock_acquire+0x56/0x70
? process_one_work+0x888/0x1740
process_one_work+0x933/0x1740
? pwq_dec_nr_in_flight+0x10b0/0x10b0
? move_linked_works+0x12d/0x2c0
? assign_work+0x163/0x270
worker_thread+0x4d6/0xd90
? preempt_count_sub+0x4c/0x70
? process_one_work+0x1740/0x1740
kthread+0x36c/0x710
? trace_preempt_on+0x8c/0x1e0
? kthread_is_per_cpu+0xc0/0xc0
? preempt_count_sub+0x4c/0x70
? _raw_spin_unlock_irq+0x36/0x60
? calculate_sigpending+0x7b/0xa0
? kthread_is_per_cpu+0xc0/0xc0
ret_from_fork+0x3a/0x80
? kthread_is_per_cpu+0xc0/0xc0
ret_from_fork_asm+0x11/0x20
</TASK>
Modules linked in: ovpn(O)
This happens because the peer deletion operation reaches
ovpn_socket_release() while ovpn_sock->sock (struct socket *)
and its sk member (struct sock *) are still both valid.
Here synchronize_rcu() is invoked, after which ovpn_sock->sock->sk
becomes NULL, due to the concurrent socket closing triggered
from userspace.
After having invoked synchronize_rcu(), ovpn_socket_release() will
attempt dereferencing ovpn_sock->sock->sk, triggering the crash
reported above.
The reason for accessing sk is that we need to retrieve its
protocol and continue the cleanup routine accordingly.
This crash can be easily produced by running openvpn userspace in
client mode with `--keepalive 10 20`, while entirely omitting this
option on the server side.
After 20 seconds ovpn will assume the peer (server) to be dead,
will start removing it and will notify userspace. The latter will
receive the notification and close the transport socket, thus
triggering the crash.
To fix the race condition for good, we need to refactor struct ovpn_socket.
Since ovpn is always only interested in the sock->sk member (struct sock *)
we can directly hold a reference to it, raher than accessing it via
its struct socket container.
This means changing "struct socket *ovpn_socket->sock" to
"struct sock *ovpn_socket->sk".
While acquiring a reference to sk, we can increase its refcounter
without affecting the socket close()/destroy() notification
(which we rely on when userspace closes a socket we are using).
By increasing sk's refcounter we know we can dereference it
in ovpn_socket_release() without incurring in any race condition
anymore.
ovpn_socket_release() will ultimately decrease the reference
counter.
Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
Fixes: 11851cbd60ea ("ovpn: implement TCP transport")
Reported-by: Qingfang Deng <dqfext@gmail.com>
Closes: https://github.com/OpenVPN/ovpn-net-next/issues/1
Tested-by: Gert Doering <gert@greenie.muc.de>
Link: https://www.mail-archive.com/openvpn-devel@lists.sourceforge.net/msg31575.html
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
2025-04-30 02:26:49 +02:00
|
|
|
ovpn_sock->sk = sk;
|
2025-04-15 13:17:23 +02:00
|
|
|
kref_init(&ovpn_sock->refcount);
|
|
|
|
|
ovpn: ensure sk is still valid during cleanup
Removing a peer while userspace attempts to close its transport
socket triggers a race condition resulting in the following
crash:
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000077: 0000 [#1] SMP KASAN
KASAN: null-ptr-deref in range [0x00000000000003b8-0x00000000000003bf]
CPU: 12 UID: 0 PID: 162 Comm: kworker/12:1 Tainted: G O 6.15.0-rc2-00635-g521139ac3840 #272 PREEMPT(full)
Tainted: [O]=OOT_MODULE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-20240910_120124-localhost 04/01/2014
Workqueue: events ovpn_peer_keepalive_work [ovpn]
RIP: 0010:ovpn_socket_release+0x23c/0x500 [ovpn]
Code: ea 03 80 3c 02 00 0f 85 71 02 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8b 64 24 18 49 8d bc 24 be 03 00 00 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 01 38 d0 7c 08 84 d2 0f 85 30
RSP: 0018:ffffc90000c9fb18 EFLAGS: 00010217
RAX: dffffc0000000000 RBX: ffff8881148d7940 RCX: ffffffff817787bb
RDX: 0000000000000077 RSI: 0000000000000008 RDI: 00000000000003be
RBP: ffffc90000c9fb30 R08: 0000000000000000 R09: fffffbfff0d3e840
R10: ffffffff869f4207 R11: 0000000000000000 R12: 0000000000000000
R13: ffff888115eb9300 R14: ffffc90000c9fbc8 R15: 000000000000000c
FS: 0000000000000000(0000) GS:ffff8882b0151000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f37266b6114 CR3: 00000000054a8000 CR4: 0000000000750ef0
PKRU: 55555554
Call Trace:
<TASK>
unlock_ovpn+0x8b/0xe0 [ovpn]
ovpn_peer_keepalive_work+0xe3/0x540 [ovpn]
? ovpn_peers_free+0x780/0x780 [ovpn]
? lock_acquire+0x56/0x70
? process_one_work+0x888/0x1740
process_one_work+0x933/0x1740
? pwq_dec_nr_in_flight+0x10b0/0x10b0
? move_linked_works+0x12d/0x2c0
? assign_work+0x163/0x270
worker_thread+0x4d6/0xd90
? preempt_count_sub+0x4c/0x70
? process_one_work+0x1740/0x1740
kthread+0x36c/0x710
? trace_preempt_on+0x8c/0x1e0
? kthread_is_per_cpu+0xc0/0xc0
? preempt_count_sub+0x4c/0x70
? _raw_spin_unlock_irq+0x36/0x60
? calculate_sigpending+0x7b/0xa0
? kthread_is_per_cpu+0xc0/0xc0
ret_from_fork+0x3a/0x80
? kthread_is_per_cpu+0xc0/0xc0
ret_from_fork_asm+0x11/0x20
</TASK>
Modules linked in: ovpn(O)
This happens because the peer deletion operation reaches
ovpn_socket_release() while ovpn_sock->sock (struct socket *)
and its sk member (struct sock *) are still both valid.
Here synchronize_rcu() is invoked, after which ovpn_sock->sock->sk
becomes NULL, due to the concurrent socket closing triggered
from userspace.
After having invoked synchronize_rcu(), ovpn_socket_release() will
attempt dereferencing ovpn_sock->sock->sk, triggering the crash
reported above.
The reason for accessing sk is that we need to retrieve its
protocol and continue the cleanup routine accordingly.
This crash can be easily produced by running openvpn userspace in
client mode with `--keepalive 10 20`, while entirely omitting this
option on the server side.
After 20 seconds ovpn will assume the peer (server) to be dead,
will start removing it and will notify userspace. The latter will
receive the notification and close the transport socket, thus
triggering the crash.
To fix the race condition for good, we need to refactor struct ovpn_socket.
Since ovpn is always only interested in the sock->sk member (struct sock *)
we can directly hold a reference to it, raher than accessing it via
its struct socket container.
This means changing "struct socket *ovpn_socket->sock" to
"struct sock *ovpn_socket->sk".
While acquiring a reference to sk, we can increase its refcounter
without affecting the socket close()/destroy() notification
(which we rely on when userspace closes a socket we are using).
By increasing sk's refcounter we know we can dereference it
in ovpn_socket_release() without incurring in any race condition
anymore.
ovpn_socket_release() will ultimately decrease the reference
counter.
Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
Fixes: 11851cbd60ea ("ovpn: implement TCP transport")
Reported-by: Qingfang Deng <dqfext@gmail.com>
Closes: https://github.com/OpenVPN/ovpn-net-next/issues/1
Tested-by: Gert Doering <gert@greenie.muc.de>
Link: https://www.mail-archive.com/openvpn-devel@lists.sourceforge.net/msg31575.html
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
2025-04-30 02:26:49 +02:00
|
|
|
/* the newly created ovpn_socket is holding reference to sk,
|
|
|
|
* therefore we increase its refcounter.
|
|
|
|
*
|
|
|
|
* This ovpn_socket instance is referenced by all peers
|
|
|
|
* using the same socket.
|
|
|
|
*
|
|
|
|
* ovpn_socket_release() will take care of dropping the reference.
|
|
|
|
*/
|
|
|
|
sock_hold(sk);
|
|
|
|
|
|
|
|
ret = ovpn_socket_attach(ovpn_sock, sock, peer);
|
2025-04-15 13:17:23 +02:00
|
|
|
if (ret < 0) {
|
ovpn: ensure sk is still valid during cleanup
Removing a peer while userspace attempts to close its transport
socket triggers a race condition resulting in the following
crash:
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000077: 0000 [#1] SMP KASAN
KASAN: null-ptr-deref in range [0x00000000000003b8-0x00000000000003bf]
CPU: 12 UID: 0 PID: 162 Comm: kworker/12:1 Tainted: G O 6.15.0-rc2-00635-g521139ac3840 #272 PREEMPT(full)
Tainted: [O]=OOT_MODULE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-20240910_120124-localhost 04/01/2014
Workqueue: events ovpn_peer_keepalive_work [ovpn]
RIP: 0010:ovpn_socket_release+0x23c/0x500 [ovpn]
Code: ea 03 80 3c 02 00 0f 85 71 02 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8b 64 24 18 49 8d bc 24 be 03 00 00 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 01 38 d0 7c 08 84 d2 0f 85 30
RSP: 0018:ffffc90000c9fb18 EFLAGS: 00010217
RAX: dffffc0000000000 RBX: ffff8881148d7940 RCX: ffffffff817787bb
RDX: 0000000000000077 RSI: 0000000000000008 RDI: 00000000000003be
RBP: ffffc90000c9fb30 R08: 0000000000000000 R09: fffffbfff0d3e840
R10: ffffffff869f4207 R11: 0000000000000000 R12: 0000000000000000
R13: ffff888115eb9300 R14: ffffc90000c9fbc8 R15: 000000000000000c
FS: 0000000000000000(0000) GS:ffff8882b0151000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f37266b6114 CR3: 00000000054a8000 CR4: 0000000000750ef0
PKRU: 55555554
Call Trace:
<TASK>
unlock_ovpn+0x8b/0xe0 [ovpn]
ovpn_peer_keepalive_work+0xe3/0x540 [ovpn]
? ovpn_peers_free+0x780/0x780 [ovpn]
? lock_acquire+0x56/0x70
? process_one_work+0x888/0x1740
process_one_work+0x933/0x1740
? pwq_dec_nr_in_flight+0x10b0/0x10b0
? move_linked_works+0x12d/0x2c0
? assign_work+0x163/0x270
worker_thread+0x4d6/0xd90
? preempt_count_sub+0x4c/0x70
? process_one_work+0x1740/0x1740
kthread+0x36c/0x710
? trace_preempt_on+0x8c/0x1e0
? kthread_is_per_cpu+0xc0/0xc0
? preempt_count_sub+0x4c/0x70
? _raw_spin_unlock_irq+0x36/0x60
? calculate_sigpending+0x7b/0xa0
? kthread_is_per_cpu+0xc0/0xc0
ret_from_fork+0x3a/0x80
? kthread_is_per_cpu+0xc0/0xc0
ret_from_fork_asm+0x11/0x20
</TASK>
Modules linked in: ovpn(O)
This happens because the peer deletion operation reaches
ovpn_socket_release() while ovpn_sock->sock (struct socket *)
and its sk member (struct sock *) are still both valid.
Here synchronize_rcu() is invoked, after which ovpn_sock->sock->sk
becomes NULL, due to the concurrent socket closing triggered
from userspace.
After having invoked synchronize_rcu(), ovpn_socket_release() will
attempt dereferencing ovpn_sock->sock->sk, triggering the crash
reported above.
The reason for accessing sk is that we need to retrieve its
protocol and continue the cleanup routine accordingly.
This crash can be easily produced by running openvpn userspace in
client mode with `--keepalive 10 20`, while entirely omitting this
option on the server side.
After 20 seconds ovpn will assume the peer (server) to be dead,
will start removing it and will notify userspace. The latter will
receive the notification and close the transport socket, thus
triggering the crash.
To fix the race condition for good, we need to refactor struct ovpn_socket.
Since ovpn is always only interested in the sock->sk member (struct sock *)
we can directly hold a reference to it, raher than accessing it via
its struct socket container.
This means changing "struct socket *ovpn_socket->sock" to
"struct sock *ovpn_socket->sk".
While acquiring a reference to sk, we can increase its refcounter
without affecting the socket close()/destroy() notification
(which we rely on when userspace closes a socket we are using).
By increasing sk's refcounter we know we can dereference it
in ovpn_socket_release() without incurring in any race condition
anymore.
ovpn_socket_release() will ultimately decrease the reference
counter.
Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
Fixes: 11851cbd60ea ("ovpn: implement TCP transport")
Reported-by: Qingfang Deng <dqfext@gmail.com>
Closes: https://github.com/OpenVPN/ovpn-net-next/issues/1
Tested-by: Gert Doering <gert@greenie.muc.de>
Link: https://www.mail-archive.com/openvpn-devel@lists.sourceforge.net/msg31575.html
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
2025-04-30 02:26:49 +02:00
|
|
|
sock_put(sk);
|
2025-04-15 13:17:23 +02:00
|
|
|
kfree(ovpn_sock);
|
|
|
|
ovpn_sock = ERR_PTR(ret);
|
|
|
|
goto sock_release;
|
|
|
|
}
|
|
|
|
|
2025-04-15 13:17:28 +02:00
|
|
|
/* TCP sockets are per-peer, therefore they are linked to their unique
|
|
|
|
* peer
|
|
|
|
*/
|
ovpn: ensure sk is still valid during cleanup
Removing a peer while userspace attempts to close its transport
socket triggers a race condition resulting in the following
crash:
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000077: 0000 [#1] SMP KASAN
KASAN: null-ptr-deref in range [0x00000000000003b8-0x00000000000003bf]
CPU: 12 UID: 0 PID: 162 Comm: kworker/12:1 Tainted: G O 6.15.0-rc2-00635-g521139ac3840 #272 PREEMPT(full)
Tainted: [O]=OOT_MODULE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-20240910_120124-localhost 04/01/2014
Workqueue: events ovpn_peer_keepalive_work [ovpn]
RIP: 0010:ovpn_socket_release+0x23c/0x500 [ovpn]
Code: ea 03 80 3c 02 00 0f 85 71 02 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8b 64 24 18 49 8d bc 24 be 03 00 00 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 01 38 d0 7c 08 84 d2 0f 85 30
RSP: 0018:ffffc90000c9fb18 EFLAGS: 00010217
RAX: dffffc0000000000 RBX: ffff8881148d7940 RCX: ffffffff817787bb
RDX: 0000000000000077 RSI: 0000000000000008 RDI: 00000000000003be
RBP: ffffc90000c9fb30 R08: 0000000000000000 R09: fffffbfff0d3e840
R10: ffffffff869f4207 R11: 0000000000000000 R12: 0000000000000000
R13: ffff888115eb9300 R14: ffffc90000c9fbc8 R15: 000000000000000c
FS: 0000000000000000(0000) GS:ffff8882b0151000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f37266b6114 CR3: 00000000054a8000 CR4: 0000000000750ef0
PKRU: 55555554
Call Trace:
<TASK>
unlock_ovpn+0x8b/0xe0 [ovpn]
ovpn_peer_keepalive_work+0xe3/0x540 [ovpn]
? ovpn_peers_free+0x780/0x780 [ovpn]
? lock_acquire+0x56/0x70
? process_one_work+0x888/0x1740
process_one_work+0x933/0x1740
? pwq_dec_nr_in_flight+0x10b0/0x10b0
? move_linked_works+0x12d/0x2c0
? assign_work+0x163/0x270
worker_thread+0x4d6/0xd90
? preempt_count_sub+0x4c/0x70
? process_one_work+0x1740/0x1740
kthread+0x36c/0x710
? trace_preempt_on+0x8c/0x1e0
? kthread_is_per_cpu+0xc0/0xc0
? preempt_count_sub+0x4c/0x70
? _raw_spin_unlock_irq+0x36/0x60
? calculate_sigpending+0x7b/0xa0
? kthread_is_per_cpu+0xc0/0xc0
ret_from_fork+0x3a/0x80
? kthread_is_per_cpu+0xc0/0xc0
ret_from_fork_asm+0x11/0x20
</TASK>
Modules linked in: ovpn(O)
This happens because the peer deletion operation reaches
ovpn_socket_release() while ovpn_sock->sock (struct socket *)
and its sk member (struct sock *) are still both valid.
Here synchronize_rcu() is invoked, after which ovpn_sock->sock->sk
becomes NULL, due to the concurrent socket closing triggered
from userspace.
After having invoked synchronize_rcu(), ovpn_socket_release() will
attempt dereferencing ovpn_sock->sock->sk, triggering the crash
reported above.
The reason for accessing sk is that we need to retrieve its
protocol and continue the cleanup routine accordingly.
This crash can be easily produced by running openvpn userspace in
client mode with `--keepalive 10 20`, while entirely omitting this
option on the server side.
After 20 seconds ovpn will assume the peer (server) to be dead,
will start removing it and will notify userspace. The latter will
receive the notification and close the transport socket, thus
triggering the crash.
To fix the race condition for good, we need to refactor struct ovpn_socket.
Since ovpn is always only interested in the sock->sk member (struct sock *)
we can directly hold a reference to it, raher than accessing it via
its struct socket container.
This means changing "struct socket *ovpn_socket->sock" to
"struct sock *ovpn_socket->sk".
While acquiring a reference to sk, we can increase its refcounter
without affecting the socket close()/destroy() notification
(which we rely on when userspace closes a socket we are using).
By increasing sk's refcounter we know we can dereference it
in ovpn_socket_release() without incurring in any race condition
anymore.
ovpn_socket_release() will ultimately decrease the reference
counter.
Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
Fixes: 11851cbd60ea ("ovpn: implement TCP transport")
Reported-by: Qingfang Deng <dqfext@gmail.com>
Closes: https://github.com/OpenVPN/ovpn-net-next/issues/1
Tested-by: Gert Doering <gert@greenie.muc.de>
Link: https://www.mail-archive.com/openvpn-devel@lists.sourceforge.net/msg31575.html
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
2025-04-30 02:26:49 +02:00
|
|
|
if (sk->sk_protocol == IPPROTO_TCP) {
|
2025-04-15 13:17:28 +02:00
|
|
|
INIT_WORK(&ovpn_sock->tcp_tx_work, ovpn_tcp_tx_work);
|
|
|
|
ovpn_sock->peer = peer;
|
|
|
|
ovpn_peer_hold(peer);
|
ovpn: ensure sk is still valid during cleanup
Removing a peer while userspace attempts to close its transport
socket triggers a race condition resulting in the following
crash:
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000077: 0000 [#1] SMP KASAN
KASAN: null-ptr-deref in range [0x00000000000003b8-0x00000000000003bf]
CPU: 12 UID: 0 PID: 162 Comm: kworker/12:1 Tainted: G O 6.15.0-rc2-00635-g521139ac3840 #272 PREEMPT(full)
Tainted: [O]=OOT_MODULE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-20240910_120124-localhost 04/01/2014
Workqueue: events ovpn_peer_keepalive_work [ovpn]
RIP: 0010:ovpn_socket_release+0x23c/0x500 [ovpn]
Code: ea 03 80 3c 02 00 0f 85 71 02 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8b 64 24 18 49 8d bc 24 be 03 00 00 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 01 38 d0 7c 08 84 d2 0f 85 30
RSP: 0018:ffffc90000c9fb18 EFLAGS: 00010217
RAX: dffffc0000000000 RBX: ffff8881148d7940 RCX: ffffffff817787bb
RDX: 0000000000000077 RSI: 0000000000000008 RDI: 00000000000003be
RBP: ffffc90000c9fb30 R08: 0000000000000000 R09: fffffbfff0d3e840
R10: ffffffff869f4207 R11: 0000000000000000 R12: 0000000000000000
R13: ffff888115eb9300 R14: ffffc90000c9fbc8 R15: 000000000000000c
FS: 0000000000000000(0000) GS:ffff8882b0151000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f37266b6114 CR3: 00000000054a8000 CR4: 0000000000750ef0
PKRU: 55555554
Call Trace:
<TASK>
unlock_ovpn+0x8b/0xe0 [ovpn]
ovpn_peer_keepalive_work+0xe3/0x540 [ovpn]
? ovpn_peers_free+0x780/0x780 [ovpn]
? lock_acquire+0x56/0x70
? process_one_work+0x888/0x1740
process_one_work+0x933/0x1740
? pwq_dec_nr_in_flight+0x10b0/0x10b0
? move_linked_works+0x12d/0x2c0
? assign_work+0x163/0x270
worker_thread+0x4d6/0xd90
? preempt_count_sub+0x4c/0x70
? process_one_work+0x1740/0x1740
kthread+0x36c/0x710
? trace_preempt_on+0x8c/0x1e0
? kthread_is_per_cpu+0xc0/0xc0
? preempt_count_sub+0x4c/0x70
? _raw_spin_unlock_irq+0x36/0x60
? calculate_sigpending+0x7b/0xa0
? kthread_is_per_cpu+0xc0/0xc0
ret_from_fork+0x3a/0x80
? kthread_is_per_cpu+0xc0/0xc0
ret_from_fork_asm+0x11/0x20
</TASK>
Modules linked in: ovpn(O)
This happens because the peer deletion operation reaches
ovpn_socket_release() while ovpn_sock->sock (struct socket *)
and its sk member (struct sock *) are still both valid.
Here synchronize_rcu() is invoked, after which ovpn_sock->sock->sk
becomes NULL, due to the concurrent socket closing triggered
from userspace.
After having invoked synchronize_rcu(), ovpn_socket_release() will
attempt dereferencing ovpn_sock->sock->sk, triggering the crash
reported above.
The reason for accessing sk is that we need to retrieve its
protocol and continue the cleanup routine accordingly.
This crash can be easily produced by running openvpn userspace in
client mode with `--keepalive 10 20`, while entirely omitting this
option on the server side.
After 20 seconds ovpn will assume the peer (server) to be dead,
will start removing it and will notify userspace. The latter will
receive the notification and close the transport socket, thus
triggering the crash.
To fix the race condition for good, we need to refactor struct ovpn_socket.
Since ovpn is always only interested in the sock->sk member (struct sock *)
we can directly hold a reference to it, raher than accessing it via
its struct socket container.
This means changing "struct socket *ovpn_socket->sock" to
"struct sock *ovpn_socket->sk".
While acquiring a reference to sk, we can increase its refcounter
without affecting the socket close()/destroy() notification
(which we rely on when userspace closes a socket we are using).
By increasing sk's refcounter we know we can dereference it
in ovpn_socket_release() without incurring in any race condition
anymore.
ovpn_socket_release() will ultimately decrease the reference
counter.
Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
Fixes: 11851cbd60ea ("ovpn: implement TCP transport")
Reported-by: Qingfang Deng <dqfext@gmail.com>
Closes: https://github.com/OpenVPN/ovpn-net-next/issues/1
Tested-by: Gert Doering <gert@greenie.muc.de>
Link: https://www.mail-archive.com/openvpn-devel@lists.sourceforge.net/msg31575.html
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
2025-04-30 02:26:49 +02:00
|
|
|
} else if (sk->sk_protocol == IPPROTO_UDP) {
|
2025-04-15 13:17:25 +02:00
|
|
|
/* in UDP we only link the ovpn instance since the socket is
|
|
|
|
* shared among multiple peers
|
|
|
|
*/
|
|
|
|
ovpn_sock->ovpn = peer->ovpn;
|
|
|
|
netdev_hold(peer->ovpn->dev, &ovpn_sock->dev_tracker,
|
|
|
|
GFP_KERNEL);
|
|
|
|
}
|
|
|
|
|
ovpn: ensure sk is still valid during cleanup
Removing a peer while userspace attempts to close its transport
socket triggers a race condition resulting in the following
crash:
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000077: 0000 [#1] SMP KASAN
KASAN: null-ptr-deref in range [0x00000000000003b8-0x00000000000003bf]
CPU: 12 UID: 0 PID: 162 Comm: kworker/12:1 Tainted: G O 6.15.0-rc2-00635-g521139ac3840 #272 PREEMPT(full)
Tainted: [O]=OOT_MODULE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-20240910_120124-localhost 04/01/2014
Workqueue: events ovpn_peer_keepalive_work [ovpn]
RIP: 0010:ovpn_socket_release+0x23c/0x500 [ovpn]
Code: ea 03 80 3c 02 00 0f 85 71 02 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8b 64 24 18 49 8d bc 24 be 03 00 00 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 01 38 d0 7c 08 84 d2 0f 85 30
RSP: 0018:ffffc90000c9fb18 EFLAGS: 00010217
RAX: dffffc0000000000 RBX: ffff8881148d7940 RCX: ffffffff817787bb
RDX: 0000000000000077 RSI: 0000000000000008 RDI: 00000000000003be
RBP: ffffc90000c9fb30 R08: 0000000000000000 R09: fffffbfff0d3e840
R10: ffffffff869f4207 R11: 0000000000000000 R12: 0000000000000000
R13: ffff888115eb9300 R14: ffffc90000c9fbc8 R15: 000000000000000c
FS: 0000000000000000(0000) GS:ffff8882b0151000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f37266b6114 CR3: 00000000054a8000 CR4: 0000000000750ef0
PKRU: 55555554
Call Trace:
<TASK>
unlock_ovpn+0x8b/0xe0 [ovpn]
ovpn_peer_keepalive_work+0xe3/0x540 [ovpn]
? ovpn_peers_free+0x780/0x780 [ovpn]
? lock_acquire+0x56/0x70
? process_one_work+0x888/0x1740
process_one_work+0x933/0x1740
? pwq_dec_nr_in_flight+0x10b0/0x10b0
? move_linked_works+0x12d/0x2c0
? assign_work+0x163/0x270
worker_thread+0x4d6/0xd90
? preempt_count_sub+0x4c/0x70
? process_one_work+0x1740/0x1740
kthread+0x36c/0x710
? trace_preempt_on+0x8c/0x1e0
? kthread_is_per_cpu+0xc0/0xc0
? preempt_count_sub+0x4c/0x70
? _raw_spin_unlock_irq+0x36/0x60
? calculate_sigpending+0x7b/0xa0
? kthread_is_per_cpu+0xc0/0xc0
ret_from_fork+0x3a/0x80
? kthread_is_per_cpu+0xc0/0xc0
ret_from_fork_asm+0x11/0x20
</TASK>
Modules linked in: ovpn(O)
This happens because the peer deletion operation reaches
ovpn_socket_release() while ovpn_sock->sock (struct socket *)
and its sk member (struct sock *) are still both valid.
Here synchronize_rcu() is invoked, after which ovpn_sock->sock->sk
becomes NULL, due to the concurrent socket closing triggered
from userspace.
After having invoked synchronize_rcu(), ovpn_socket_release() will
attempt dereferencing ovpn_sock->sock->sk, triggering the crash
reported above.
The reason for accessing sk is that we need to retrieve its
protocol and continue the cleanup routine accordingly.
This crash can be easily produced by running openvpn userspace in
client mode with `--keepalive 10 20`, while entirely omitting this
option on the server side.
After 20 seconds ovpn will assume the peer (server) to be dead,
will start removing it and will notify userspace. The latter will
receive the notification and close the transport socket, thus
triggering the crash.
To fix the race condition for good, we need to refactor struct ovpn_socket.
Since ovpn is always only interested in the sock->sk member (struct sock *)
we can directly hold a reference to it, raher than accessing it via
its struct socket container.
This means changing "struct socket *ovpn_socket->sock" to
"struct sock *ovpn_socket->sk".
While acquiring a reference to sk, we can increase its refcounter
without affecting the socket close()/destroy() notification
(which we rely on when userspace closes a socket we are using).
By increasing sk's refcounter we know we can dereference it
in ovpn_socket_release() without incurring in any race condition
anymore.
ovpn_socket_release() will ultimately decrease the reference
counter.
Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
Fixes: 11851cbd60ea ("ovpn: implement TCP transport")
Reported-by: Qingfang Deng <dqfext@gmail.com>
Closes: https://github.com/OpenVPN/ovpn-net-next/issues/1
Tested-by: Gert Doering <gert@greenie.muc.de>
Link: https://www.mail-archive.com/openvpn-devel@lists.sourceforge.net/msg31575.html
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
2025-04-30 02:26:49 +02:00
|
|
|
rcu_assign_sk_user_data(sk, ovpn_sock);
|
2025-04-15 13:17:23 +02:00
|
|
|
sock_release:
|
ovpn: ensure sk is still valid during cleanup
Removing a peer while userspace attempts to close its transport
socket triggers a race condition resulting in the following
crash:
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000077: 0000 [#1] SMP KASAN
KASAN: null-ptr-deref in range [0x00000000000003b8-0x00000000000003bf]
CPU: 12 UID: 0 PID: 162 Comm: kworker/12:1 Tainted: G O 6.15.0-rc2-00635-g521139ac3840 #272 PREEMPT(full)
Tainted: [O]=OOT_MODULE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-20240910_120124-localhost 04/01/2014
Workqueue: events ovpn_peer_keepalive_work [ovpn]
RIP: 0010:ovpn_socket_release+0x23c/0x500 [ovpn]
Code: ea 03 80 3c 02 00 0f 85 71 02 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8b 64 24 18 49 8d bc 24 be 03 00 00 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 01 38 d0 7c 08 84 d2 0f 85 30
RSP: 0018:ffffc90000c9fb18 EFLAGS: 00010217
RAX: dffffc0000000000 RBX: ffff8881148d7940 RCX: ffffffff817787bb
RDX: 0000000000000077 RSI: 0000000000000008 RDI: 00000000000003be
RBP: ffffc90000c9fb30 R08: 0000000000000000 R09: fffffbfff0d3e840
R10: ffffffff869f4207 R11: 0000000000000000 R12: 0000000000000000
R13: ffff888115eb9300 R14: ffffc90000c9fbc8 R15: 000000000000000c
FS: 0000000000000000(0000) GS:ffff8882b0151000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f37266b6114 CR3: 00000000054a8000 CR4: 0000000000750ef0
PKRU: 55555554
Call Trace:
<TASK>
unlock_ovpn+0x8b/0xe0 [ovpn]
ovpn_peer_keepalive_work+0xe3/0x540 [ovpn]
? ovpn_peers_free+0x780/0x780 [ovpn]
? lock_acquire+0x56/0x70
? process_one_work+0x888/0x1740
process_one_work+0x933/0x1740
? pwq_dec_nr_in_flight+0x10b0/0x10b0
? move_linked_works+0x12d/0x2c0
? assign_work+0x163/0x270
worker_thread+0x4d6/0xd90
? preempt_count_sub+0x4c/0x70
? process_one_work+0x1740/0x1740
kthread+0x36c/0x710
? trace_preempt_on+0x8c/0x1e0
? kthread_is_per_cpu+0xc0/0xc0
? preempt_count_sub+0x4c/0x70
? _raw_spin_unlock_irq+0x36/0x60
? calculate_sigpending+0x7b/0xa0
? kthread_is_per_cpu+0xc0/0xc0
ret_from_fork+0x3a/0x80
? kthread_is_per_cpu+0xc0/0xc0
ret_from_fork_asm+0x11/0x20
</TASK>
Modules linked in: ovpn(O)
This happens because the peer deletion operation reaches
ovpn_socket_release() while ovpn_sock->sock (struct socket *)
and its sk member (struct sock *) are still both valid.
Here synchronize_rcu() is invoked, after which ovpn_sock->sock->sk
becomes NULL, due to the concurrent socket closing triggered
from userspace.
After having invoked synchronize_rcu(), ovpn_socket_release() will
attempt dereferencing ovpn_sock->sock->sk, triggering the crash
reported above.
The reason for accessing sk is that we need to retrieve its
protocol and continue the cleanup routine accordingly.
This crash can be easily produced by running openvpn userspace in
client mode with `--keepalive 10 20`, while entirely omitting this
option on the server side.
After 20 seconds ovpn will assume the peer (server) to be dead,
will start removing it and will notify userspace. The latter will
receive the notification and close the transport socket, thus
triggering the crash.
To fix the race condition for good, we need to refactor struct ovpn_socket.
Since ovpn is always only interested in the sock->sk member (struct sock *)
we can directly hold a reference to it, raher than accessing it via
its struct socket container.
This means changing "struct socket *ovpn_socket->sock" to
"struct sock *ovpn_socket->sk".
While acquiring a reference to sk, we can increase its refcounter
without affecting the socket close()/destroy() notification
(which we rely on when userspace closes a socket we are using).
By increasing sk's refcounter we know we can dereference it
in ovpn_socket_release() without incurring in any race condition
anymore.
ovpn_socket_release() will ultimately decrease the reference
counter.
Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
Fixes: 11851cbd60ea ("ovpn: implement TCP transport")
Reported-by: Qingfang Deng <dqfext@gmail.com>
Closes: https://github.com/OpenVPN/ovpn-net-next/issues/1
Tested-by: Gert Doering <gert@greenie.muc.de>
Link: https://www.mail-archive.com/openvpn-devel@lists.sourceforge.net/msg31575.html
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
2025-04-30 02:26:49 +02:00
|
|
|
release_sock(sk);
|
2025-04-15 13:17:23 +02:00
|
|
|
return ovpn_sock;
|
|
|
|
}
|