Commit graph

4 commits

Author SHA1 Message Date
Antonio Quartulli
a6a5e87b3e ovpn: avoid sleep in atomic context in TCP RX error path
Upon error along the TCP data_ready event path, we have
the following chain of calls:

strp_data_ready()
  ovpn_tcp_rcv()
    ovpn_peer_del()
      ovpn_socket_release()

Since strp_data_ready() may be invoked from softirq context, and
ovpn_socket_release() may sleep, the above sequence may cause a
sleep in atomic context like the following:

    BUG: sleeping function called from invalid context at ./ovpn-backports-ovpn-net-next-main-6.15.0-rc5-20250522/drivers/net/ovpn/socket.c:71
    in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 25, name: ksoftirqd/3
    5 locks held by ksoftirqd/3/25:
     #0: ffffffe000cd0580 (rcu_read_lock){....}-{1:2}, at: netif_receive_skb+0xb8/0x5b0
     OpenVPN/ovpn-backports#1: ffffffe000cd0580 (rcu_read_lock){....}-{1:2}, at: netif_receive_skb+0xb8/0x5b0
     OpenVPN/ovpn-backports#2: ffffffe000cd0580 (rcu_read_lock){....}-{1:2}, at: ip_local_deliver_finish+0x66/0x1e0
     OpenVPN/ovpn-backports#3: ffffffe003ce9818 (slock-AF_INET/1){+.-.}-{2:2}, at: tcp_v4_rcv+0x156e/0x17a0
     OpenVPN/ovpn-backports#4: ffffffe000cd0580 (rcu_read_lock){....}-{1:2}, at: ovpn_tcp_data_ready+0x0/0x1b0 [ovpn]
    CPU: 3 PID: 25 Comm: ksoftirqd/3 Not tainted 5.10.104+ #0
    Call Trace:
    walk_stackframe+0x0/0x1d0
    show_stack+0x2e/0x44
    dump_stack+0xc2/0x102
    ___might_sleep+0x29c/0x2b0
    __might_sleep+0x62/0xa0
    ovpn_socket_release+0x24/0x2d0 [ovpn]
    unlock_ovpn+0x6e/0x190 [ovpn]
    ovpn_peer_del+0x13c/0x390 [ovpn]
    ovpn_tcp_rcv+0x280/0x560 [ovpn]
    __strp_recv+0x262/0x940
    strp_recv+0x66/0x80
    tcp_read_sock+0x122/0x410
    strp_data_ready+0x156/0x1f0
    ovpn_tcp_data_ready+0x92/0x1b0 [ovpn]
    tcp_data_ready+0x6c/0x150
    tcp_rcv_established+0xb36/0xc50
    tcp_v4_do_rcv+0x25e/0x380
    tcp_v4_rcv+0x166a/0x17a0
    ip_protocol_deliver_rcu+0x8c/0x250
    ip_local_deliver_finish+0xf8/0x1e0
    ip_local_deliver+0xc2/0x2d0
    ip_rcv+0x1f2/0x330
    __netif_receive_skb+0xfc/0x290
    netif_receive_skb+0x104/0x5b0
    br_pass_frame_up+0x190/0x3f0
    br_handle_frame_finish+0x3e2/0x7a0
    br_handle_frame+0x750/0xab0
    __netif_receive_skb_core.constprop.0+0x4c0/0x17f0
    __netif_receive_skb+0xc6/0x290
    netif_receive_skb+0x104/0x5b0
    xgmac_dma_rx+0x962/0xb40
    __napi_poll.constprop.0+0x5a/0x350
    net_rx_action+0x1fe/0x4b0
    __do_softirq+0x1f8/0x85c
    run_ksoftirqd+0x80/0xd0
    smpboot_thread_fn+0x1f0/0x3e0
    kthread+0x1e6/0x210
    ret_from_kernel_thread+0x8/0xc

Fix this issue by postponing the ovpn_peer_del() call to
a scheduled worker, as we already do in ovpn_tcp_send_sock()
for the very same reason.

Fixes: 11851cbd60 ("ovpn: implement TCP transport")
Reported-by: Qingfang Deng <dqfext@gmail.com>
Closes: https://github.com/OpenVPN/ovpn-net-next/issues/13
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
2025-06-03 13:08:15 +02:00
Antonio Quartulli
ba499a07ce ovpn: ensure sk is still valid during cleanup
Removing a peer while userspace attempts to close its transport
socket triggers a race condition resulting in the following
crash:

Oops: general protection fault, probably for non-canonical address 0xdffffc0000000077: 0000 [#1] SMP KASAN
KASAN: null-ptr-deref in range [0x00000000000003b8-0x00000000000003bf]
CPU: 12 UID: 0 PID: 162 Comm: kworker/12:1 Tainted: G           O        6.15.0-rc2-00635-g521139ac3840 #272 PREEMPT(full)
Tainted: [O]=OOT_MODULE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-20240910_120124-localhost 04/01/2014
Workqueue: events ovpn_peer_keepalive_work [ovpn]
RIP: 0010:ovpn_socket_release+0x23c/0x500 [ovpn]
Code: ea 03 80 3c 02 00 0f 85 71 02 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8b 64 24 18 49 8d bc 24 be 03 00 00 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 01 38 d0 7c 08 84 d2 0f 85 30
RSP: 0018:ffffc90000c9fb18 EFLAGS: 00010217
RAX: dffffc0000000000 RBX: ffff8881148d7940 RCX: ffffffff817787bb
RDX: 0000000000000077 RSI: 0000000000000008 RDI: 00000000000003be
RBP: ffffc90000c9fb30 R08: 0000000000000000 R09: fffffbfff0d3e840
R10: ffffffff869f4207 R11: 0000000000000000 R12: 0000000000000000
R13: ffff888115eb9300 R14: ffffc90000c9fbc8 R15: 000000000000000c
FS:  0000000000000000(0000) GS:ffff8882b0151000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f37266b6114 CR3: 00000000054a8000 CR4: 0000000000750ef0
PKRU: 55555554
Call Trace:
 <TASK>
 unlock_ovpn+0x8b/0xe0 [ovpn]
 ovpn_peer_keepalive_work+0xe3/0x540 [ovpn]
 ? ovpn_peers_free+0x780/0x780 [ovpn]
 ? lock_acquire+0x56/0x70
 ? process_one_work+0x888/0x1740
 process_one_work+0x933/0x1740
 ? pwq_dec_nr_in_flight+0x10b0/0x10b0
 ? move_linked_works+0x12d/0x2c0
 ? assign_work+0x163/0x270
 worker_thread+0x4d6/0xd90
 ? preempt_count_sub+0x4c/0x70
 ? process_one_work+0x1740/0x1740
 kthread+0x36c/0x710
 ? trace_preempt_on+0x8c/0x1e0
 ? kthread_is_per_cpu+0xc0/0xc0
 ? preempt_count_sub+0x4c/0x70
 ? _raw_spin_unlock_irq+0x36/0x60
 ? calculate_sigpending+0x7b/0xa0
 ? kthread_is_per_cpu+0xc0/0xc0
 ret_from_fork+0x3a/0x80
 ? kthread_is_per_cpu+0xc0/0xc0
 ret_from_fork_asm+0x11/0x20
 </TASK>
Modules linked in: ovpn(O)

This happens because the peer deletion operation reaches
ovpn_socket_release() while ovpn_sock->sock (struct socket *)
and its sk member (struct sock *) are still both valid.
Here synchronize_rcu() is invoked, after which ovpn_sock->sock->sk
becomes NULL, due to the concurrent socket closing triggered
from userspace.

After having invoked synchronize_rcu(), ovpn_socket_release() will
attempt dereferencing ovpn_sock->sock->sk, triggering the crash
reported above.

The reason for accessing sk is that we need to retrieve its
protocol and continue the cleanup routine accordingly.

This crash can be easily produced by running openvpn userspace in
client mode with `--keepalive 10 20`, while entirely omitting this
option on the server side.
After 20 seconds ovpn will assume the peer (server) to be dead,
will start removing it and will notify userspace. The latter will
receive the notification and close the transport socket, thus
triggering the crash.

To fix the race condition for good, we need to refactor struct ovpn_socket.
Since ovpn is always only interested in the sock->sk member (struct sock *)
we can directly hold a reference to it, raher than accessing it via
its struct socket container.

This means changing "struct socket *ovpn_socket->sock" to
"struct sock *ovpn_socket->sk".

While acquiring a reference to sk, we can increase its refcounter
without affecting the socket close()/destroy() notification
(which we rely on when userspace closes a socket we are using).

By increasing sk's refcounter we know we can dereference it
in ovpn_socket_release() without incurring in any race condition
anymore.

ovpn_socket_release() will ultimately decrease the reference
counter.

Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
Fixes: 11851cbd60 ("ovpn: implement TCP transport")
Reported-by: Qingfang Deng <dqfext@gmail.com>
Closes: https://github.com/OpenVPN/ovpn-net-next/issues/1
Tested-by: Gert Doering <gert@greenie.muc.de>
Link: https://www.mail-archive.com/openvpn-devel@lists.sourceforge.net/msg31575.html
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
2025-06-03 13:08:15 +02:00
Antonio Quartulli
36bb1d713a ovpn: add support for MSG_NOSIGNAL in tcp_sendmsg
Userspace may want to pass the MSG_NOSIGNAL flag to
tcp_sendmsg() in order to avoid generating a SIGPIPE.

To pass this flag down the TCP stack a new skb sending API
accepting a flags argument is introduced.

Cc: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
Link: https://patch.msgid.link/20250415-b4-ovpn-v26-13-577f6097b964@openvpn.net
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-17 12:30:03 +02:00
Antonio Quartulli
11851cbd60 ovpn: implement TCP transport
With this change ovpn is allowed to communicate to peers also via TCP.
Parsing of incoming messages is implemented through the strparser API.

Note that ovpn redefines sk_prot and sk_socket->ops for the TCP socket
used to communicate with the peer.
For this reason it needs to access inet6_stream_ops, which is declared
as extern in the IPv6 module, but it is not fully exported.

Therefore this patch is also adding EXPORT_SYMBOL_GPL(inet6_stream_ops)
to net/ipv6/af_inet6.c.

Cc: David Ahern <dsahern@kernel.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Simon Horman <horms@kernel.org>
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
Link: https://patch.msgid.link/20250415-b4-ovpn-v26-11-577f6097b964@openvpn.net
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-17 12:30:03 +02:00