Commit graph

1726 commits

Author SHA1 Message Date
Pavel Begunkov
62f666df76 io_uring/eventfd: dedup signalling helpers
Consolidate io_eventfd_flush_signal() and io_eventfd_signal(). Not much
of a difference for now, but it prepares it for following changes.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/5beecd4da65d8d2d83df499196f84b329387f6a2.1745493845.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-24 08:33:54 -06:00
Pavel Begunkov
5e16f1a68d io_uring: don't duplicate flushing in io_req_post_cqe
io_req_post_cqe() sets submit_state.cq_flush so that
*flush_completions() can take care of batch commiting CQEs. Don't commit
it twice by using __io_cq_unlock_post().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/41c416660c509cee676b6cad96081274bcb459f3.1745493861.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-24 06:28:43 -06:00
Pavel Begunkov
76f1cc98b2 io_uring/zcrx: add support for multiple ifqs
Allow the user to register multiple ifqs / zcrx contexts. With that we
can use multiple interfaces / interface queues in a single io_uring
instance.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/668b03bee03b5216564482edcfefbc2ee337dd30.1745141261.git.asml.silence@gmail.com
[axboe: fold in fix]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-23 07:13:14 -06:00
Pavel Begunkov
632b318672 io_uring/zcrx: move zcrx region to struct io_zcrx_ifq
Refill queue region is a part of zcrx and should stay in struct
io_zcrx_ifq. We can't have multiple queues without it, so move it there.

As a result there is no context global zcrx region anymore, and the
region is looked up together with its ifq. To protect a concurrent mmap
from seeing an inconsistent region we were protecting changes to
->zcrx_region with mmap_lock, but now it protect the publishing of the
ifq.

Reviewed-by: David Wei <dw@davidwei.uk>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/24f1a728fc03d0166f16d099575457e10d9d90f2.1745141261.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-21 05:12:31 -06:00
Pavel Begunkov
77231d4e46 io_uring/zcrx: let zcrx choose region for mmaping
In preparation for adding multiple ifqs, add a helper returning a region
for mmaping zcrx refill queue. For now it's trivial and returns the same
ctx global ->zcrx_region.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d9006e2ef8cd5e5b337c2ba2228ede8409eb60f2.1745141261.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-21 05:11:40 -06:00
Pavel Begunkov
59bc1ab922 io_uring/zcrx: remove sqe->file_index check
sqe->file_index and sqe->zcrx_ifq_idx are aliased, so even though
io_recvzc_prep() has all necessary handling and checking for
->zcrx_ifq_idx, it's already rejected a couple of lines above.

It's not a real problem, however, as we can only have one ifq at the
moment, which index is always zero, and it returns correct error
codes to the user.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b51acedd326d1a87bf53e33303e6fc87661a3e3f.1745141261.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-21 05:11:40 -06:00
Pavel Begunkov
a79154ae5d io_uring/zcrx: move io_zcrx_iov_page
We'll need io_zcrx_iov_page at the top to keep offset calculations
closer together, move it there.

Reviewed-by: David Wei <dw@davidwei.uk>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/575617033a8b84a5985c7eb760b7121efdbe7e56.1745141261.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-21 05:11:40 -06:00
Pavel Begunkov
37d26edd6b io_uring/zcrx: remove duplicated freelist init
Several lines below we already initialise the freelist, don't do it
twice.

Reviewed-by: David Wei <dw@davidwei.uk>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/71b07c4e00d1db65899d1ed8d603d961fe7d1c28.1745141261.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-21 05:11:40 -06:00
Pavel Begunkov
be6bad57b2 io_uring/rsrc: remove null check on import
WARN_ON_ONCE() checking imu for NULL in io_import_fixed() is in the hot
path and too protective, it's time to get rid of it.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/3782c01e311dbd474bca45aefaf79a2f2822fafb.1745083025.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-21 05:10:04 -06:00
Pavel Begunkov
9cebcf7b0c io_uring/rsrc: clean up io_coalesce_buffer()
We don't need special handling for the first page in
io_coalesce_buffer(), move it inside the loop.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Anuj Gupta <anuj20.g@samsung.com>
Link: https://lore.kernel.org/r/ad698cddc1eadb3d92a7515e95bb13f79420323d.1745083025.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-21 05:10:04 -06:00
Pavel Begunkov
ea76925614 io_uring/rsrc: use unpin_user_folio
We want to have a full folio to be left pinned but with only one
reference, for that we "unpin" all but the first page with
unpin_user_pages(), which can be confusing. There is a new helper to
achieve that called unpin_user_folio(), so use that.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Anuj Gupta <anuj20.g@samsung.com>
Link: https://lore.kernel.org/r/e0b2be8f9ea68f6b351ec3bb046f04f437f68491.1745083025.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-21 05:10:04 -06:00
Jens Axboe
8a2dacd49f io_uring/rsrc: remove node assignment helpers
There are two helpers here, one assigns and increments the node ref
count, and the other is simply a wrapper around that for the buffer node
handling.

The buffer node assignment benefits from checking and setting
REQ_F_BUF_NODE together, otherwise stalls have been observed on setting
that flag later in the process. Hence re-do it so that it's set when
checked, and cleared in case of (unlikely) failure. With that, the
buffer node helper can go, and then drop the generic
io_req_assign_rsrc_node() helper as well as there's only a single user
of it left at that point.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-21 05:06:58 -06:00
Jens Axboe
53db8a71ec io_uring: add support for IORING_OP_PIPE
This works just like pipe2(2), except it also supports fixed file
descriptors. Used in a similar fashion as for other fd instantiating
opcodes (like accept, socket, open, etc), where sqe->file_slot is set
appropriately if two direct descriptors are desired rather than a set
of normal file descriptors.

sqe->addr must be set to a pointer to an array of 2 integers, which
is where the fixed/normal file descriptors are copied to.

sqe->pipe_flags contains flags, same as what is allowed for pipe2(2).

Future expansion of per-op private flags can go in sqe->ioprio,
like we do for other opcodes that take both a "syscall" flag set and
an io_uring opcode specific flag set.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-21 05:06:58 -06:00
Pavel Begunkov
bd32923e5f io_uring: don't store bgid in req->buf_index
Pass buffer group id into the rest of helpers via struct buf_sel_arg
and remove all reassignments of req->buf_index back to bgid. Now, it
only stores buffer indexes, and the group is provided by callers.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/3ea9fa08113ecb4d9224b943e7806e80a324bdf9.1743437358.git.asml.silence@gmail.com
Link: https://lore.kernel.org/io-uring/0c01d76ff12986c2f48614db8610caff8f78c869.1743500909.git.asml.silence@gmail.com/
[axboe: fold in patch from second link]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-21 05:06:58 -06:00
Pavel Begunkov
c0e9650521 io_uring/kbuf: pass bgid to io_buffer_select()
The current situation with buffer group id juggling is not ideal.
req->buf_index first stores the bgid, then it's overwritten by a buffer
id, and then it can get restored back no recycling / etc. It's not so
easy to control, and it's not handled consistently across request types
with receive requests saving and restoring the bgid it by hand.

It's a prep patch that adds a buffer group id argument to
io_buffer_select(). The caller will be responsible for stashing a copy
somewhere and passing it into the function.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/a210d6427cc3f4f42271a6853274cd5a50e56820.1743437358.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-21 05:06:58 -06:00
Pavel Begunkov
e6f74fd67d io_uring: set IMPORT_BUFFER in generic send setup
Move REQ_F_IMPORT_BUFFER to the common send setup. Currently, the only
user is send zc, but we'll want for normal sends to support that in the
future.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/18b74edfc61d8a1b6c9fed3b78a9276fe80f8ced.1743437358.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-21 05:06:58 -06:00
Pavel Begunkov
e9ff9ae103 io_uring/net: don't use io_do_buffer_select at prep
Prep code is interested whether it's a selected buffer request, not
whether a buffer has already been selected like what
io_do_buffer_select() returns. Check for REQ_F_BUFFER_SELECT directly.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/4488a029ac698554bebf732263fe19d7734affa6.1743437358.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-21 05:06:58 -06:00
Caleb Sander Mateos
9fe99eed91 io_uring/wq: avoid indirect do_work/free_work calls
struct io_wq stores do_work and free_work function pointers which are
called on each work item. But these function pointers are always set to
io_wq_submit_work and io_wq_free_work, respectively. So remove these
function pointers and just call the functions directly.

Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Link: https://lore.kernel.org/r/20250329161527.3281314-1-csander@purestorage.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-21 05:06:58 -06:00
Pavel Begunkov
f12ecf5e1c io_uring/zcrx: fix late dma unmap for a dead dev
There is a problem with page pools not dma-unmapping immediately when
the device is going down, and delaying it until the page pool is
destroyed, which is not allowed (see links). That just got fixed for
normal page pools, and we need to address memory providers as well.

Unmap pages in the memory provider uninstall callback, and protect it
with a new lock. There is also a gap between when a dma mapping is
created and the mp is installed, so if the device is killed in between,
io_uring would be holding on to dma mappings to a dead device with no
one to call ->uninstall. Move it to page pool init and rely on
->is_mapped to make sure it's only done once.

Link: https://lore.kernel.org/lkml/8067f204-1380-4d37-8ffd-007fc6f26738@kernel.org/T/
Link: https://lore.kernel.org/all/20250409-page-pool-track-dma-v9-0-6a9ef2e0cba8@redhat.com/
Fixes: 34a3e60821 ("io_uring/zcrx: implement zerocopy receive pp memory provider")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/ef9b7db249b14f6e0b570a1bb77ff177389f881c.1744965853.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-18 06:12:10 -06:00
Jens Axboe
b419bed4f0 io_uring/rsrc: ensure segments counts are correct on kbuf buffers
kbuf imports have the front offset adjusted and segments removed, but
the tail segments are still included in the segment count that gets
passed in the iov_iter. As the segments aren't necessarily all the
same size, move importing to a separate helper and iterate the
mapped length to get an exact count.

Reviewed-by: Nitesh Shetty <nj.shetty@samsung.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-17 11:59:12 -06:00
Nitesh Shetty
80c7378f94 io_uring/rsrc: send exact nr_segs for fixed buffer
Sending exact nr_segs, avoids bio split check and processing in
block layer, which takes around 5%[1] of overall CPU utilization.

In our setup, we see overall improvement of IOPS from 7.15M to 7.65M [2]
and 5% less CPU utilization.

[1]
     3.52%  io_uring         [kernel.kallsyms]     [k] bio_split_rw_at
     1.42%  io_uring         [kernel.kallsyms]     [k] bio_split_rw
     0.62%  io_uring         [kernel.kallsyms]     [k] bio_submit_split

[2]
sudo taskset -c 0,1 ./t/io_uring -b512 -d128 -c32 -s32 -p1 -F1 -B1 -n2
-r4 /dev/nvme0n1 /dev/nvme1n1

Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
[Pavel: fixed for kbuf, rebased and reworked on top of cleanups]
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/7a1a49a8d053bd617c244291d63dbfbc07afde36.1744882081.git.asml.silence@gmail.com
[axboe: fold in fix factoring in buf reg offset]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-17 07:42:14 -06:00
Pavel Begunkov
59852ebad9 io_uring/rsrc: refactor io_import_fixed
io_import_fixed is a mess. Even though we know the final len of the
iterator, we still assign offset + len and do some magic after to
correct for that.

Do offset calculation first and finalise it with iov_iter_bvec at the
end.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/2d5107fed24f8b23245ef2ede9a5a7f7c426df61.1744882081.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-17 06:21:13 -06:00
Pavel Begunkov
50169d0754 io_uring/rsrc: separate kbuf offset adjustments
Kernel registered buffers are special because segments are not uniform
in size, and we have a bunch of optimisations based on that uniformity
for normal buffers. Handle kbuf separately, it'll be cleaner this way.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/4e9e5990b0ab5aee723c0be5cd9b5bcf810375f9.1744882081.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-17 06:21:13 -06:00
Pavel Begunkov
1ac5712888 io_uring/rsrc: don't skip offset calculation
Don't optimise for requests with offset=0. Large registered buffers are
the preference and hence the user is likely to pass an offset, and the
adjustments are not expensive and will be made even cheaper in following
patches.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1c2beb20470ee3c886a363d4d8340d3790db19f3.1744882081.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-17 06:21:13 -06:00
Pavel Begunkov
70e4f9bfc1 io_uring/zcrx: add pp to ifq conversion helper
It'll likely change how page pools store memory providers, so in
preparation for that, keep accesses in one place in io_uring by
introducing a helper.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/3522eb8fa9b4e21bcf32e7e9ae656c616b282210.1744722526.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-15 07:38:05 -06:00
Pavel Begunkov
25744f8495 io_uring/zcrx: return ifq id to the user
IORING_OP_RECV_ZC requests take a zcrx object id via sqe::zcrx_ifq_idx,
which binds it to the corresponding if / queue. However, we don't return
that id back to the user. It's fine as currently there can be only one
zcrx and the user assumes that its id should be 0, but as we'll need
multiple zcrx objects in the future let's explicitly pass it back on
registration.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/8714667d370651962f7d1a169032e5f02682a73e.1744722517.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-15 07:37:49 -06:00
Jens Axboe
cf960726eb io_uring/kbuf: reject zero sized provided buffers
This isn't fixing a real issue, but there's also zero point in going
through group and buffer setup, when the buffers are going to be
rejected once attempted to get used.

Cc: stable@vger.kernel.org
Reported-by: syzbot+58928048fd1416f1457c@syzkaller.appspotmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-07 07:51:23 -06:00
Pavel Begunkov
5a17131a5d io_uring/zcrx: separate niov number from pages
A preparation patch that separates the number of pages / folios from
the number of niovs. They will not match in the future to support huge
pages, improved dma mapping and/or larger chunk sizes.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/0780ac966ee84200385737f45bb0f2ada052392b.1743848231.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-07 07:36:52 -06:00
Pavel Begunkov
9b58440a5b io_uring/zcrx: put refill data into separate cache line
Refill queue lock and other bits are only used from the allocation path
on the rx softirq side, but it shares the cache line with other fields
like ctx that are used also in the "syscall" path, which causes cache
bouncing when softirq runs on a different CPU.

Separate them into different cache lines. The first one now contains
constant fields used by both contextx, followed by a line responsible
for refill queue data.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/6d1f598e27d623c07fc49d6baee13089a9b1216c.1743848241.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-07 07:36:52 -06:00
Pavel Begunkov
ab6005f391 io_uring: don't post tag CQEs on file/buffer registration failure
Buffer / file table registration is all or nothing, if it fails all
resources we might have partially registered are dropped and the table
is killed. If that happens, it doesn't make sense to post any rsrc tag
CQEs. That would be confusing to the application, which should not need
to handle that case.

Cc: stable@vger.kernel.org
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Fixes: 7029acd8a9 ("io_uring/rsrc: get rid of per-ring io_rsrc_node list")
Link: https://lore.kernel.org/r/c514446a8dcb0197cddd5d4ba8f6511da081cf1f.1743777957.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-04 09:40:44 -06:00
Pavel Begunkov
390513642e io_uring: always do atomic put from iowq
io_uring always switches requests to atomic refcounting for iowq
execution before there is any parallilism by setting REQ_F_REFCOUNT,
and the flag is not cleared until the request completes. That should be
fine as long as the compiler doesn't make up a non existing value for
the flags, however KCSAN still complains when the request owner changes
oter flag bits:

BUG: KCSAN: data-race in io_req_task_cancel / io_wq_free_work
...
read to 0xffff888117207448 of 8 bytes by task 3871 on cpu 0:
 req_ref_put_and_test io_uring/refs.h:22 [inline]

Skip REQ_F_REFCOUNT checks for iowq, we know it's set.

Reported-by: syzbot+903a2ad71fb3f1e47cf5@syzkaller.appspotmail.com
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d880bc27fb8c3209b54641be4ff6ac02b0e5789a.1743679736.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-03 08:31:57 -06:00
Ming Lei
1045afae4b io_uring: support vectored kernel fixed buffer
io_uring has supported fixed kernel buffer via io_buffer_register_bvec()
and io_buffer_unregister_bvec().

The vectored fixed buffer has been ready, so it is natural to support
fixed kernel buffer, one use case is ublk.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250325135155.935398-4-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-02 07:06:59 -06:00
Ming Lei
8622b20f23 io_uring: add validate_fixed_range() for validate fixed buffer
Add helper of validate_fixed_range() for validating fixed buffer
range.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Caleb Sander Mateos <csander@purestorage.com>
Link: https://lore.kernel.org/r/20250325135155.935398-2-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-02 07:06:43 -06:00
David Wei
fcfd94d696 io_uring/zcrx: return early from io_zcrx_recv_skb if readlen is 0
When readlen is set for a recvzc request, tcp_read_sock() will call
io_zcrx_recv_skb() one final time with len == desc->count == 0. This is
caused by the !desc->count check happening too late. The offset + 1 !=
skb->len happens earlier and causes the while loop to continue.

Fix this in io_zcrx_recv_skb() instead of tcp_read_sock(). Return early
if len is 0 i.e. the read is done.

Fixes: 6699ec9a23 ("io_uring/zcrx: add a read limit to recvzc requests")
Signed-off-by: David Wei <dw@davidwei.uk>
Link: https://lore.kernel.org/r/20250401195355.1613813-1-dw@davidwei.uk
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-01 14:00:46 -06:00
Pavel Begunkov
81ed18015d io_uring/net: avoid import_ubuf for regvec send
With registered buffers we set up iterators in helpers like
io_import_fixed(), and there is no need for a import_ubuf() before that.
It was fine as we used real pointers for offset calculation, but that's
not the case anymore since introduction of ublk kernel buffers.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/9b2de1a50844f848f62c8de609b494971033a6b9.1743437358.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-31 12:41:49 -06:00
Pavel Begunkov
a1fbe0a121 io_uring/rsrc: check size when importing reg buffer
We're relying on callers to verify the IO size, do it inside of
io_import_fixed() instead. It's safer, easier to deal with, and more
consistent as now it's done close to the iter init site.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/f9c2c75ec4d356a0c61289073f68d98e8a9db190.1743446271.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-31 12:41:49 -06:00
Pavel Begunkov
ed344511c5 io_uring: cleanup {g,s]etsockopt sqe reading
Add a local variable for the sqe pointer to avoid repetition.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/8dbac0f9acda2d3842534eeb7ce10d9276b021ae.1743357108.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-31 07:08:46 -06:00
Pavel Begunkov
296e169618 io_uring: hide caches sqes from drivers
There is now an io_uring private part of cmd async_data, move saved sqe
into it. Drivers are accessing it via struct io_uring_cmd::cmd.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/ecbe078dd57acefdbc4366d083327086c0879378.1743357121.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-31 07:08:34 -06:00
Pavel Begunkov
487a0710f8 io_uring: make zcrx depend on CONFIG_IO_URING
Reflect in the kconfig that zcrx requires io_uring compiled.

Fixes: 6f377873cb ("io_uring/zcrx: add interface queue and refill queue")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/8047135a344e79dbd04ee36a7a69cc260aabc2ca.1743356260.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-31 07:07:44 -06:00
Pavel Begunkov
697b2876ac io_uring: add req flag invariant build assertion
We're caching some of file related request flags in a tricky way, put
a build check to make sure flags don't get reshuffled.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/9877577b83c25dd78224a8274f799187e7ec7639.1743407551.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-31 07:07:34 -06:00
Pavel Begunkov
ea9106786e io_uring: don't pass ctx to tw add remote helper
Unlike earlier versions, io_msg_remote_post() creates a valid request
with a proper context, so don't pass a context to
io_req_task_work_add_remote() explicitly but derive it from the request.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/721f51cf34996d98b48f0bfd24ad40aa2730167e.1743190078.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-28 17:14:01 -06:00
Pavel Begunkov
9cc0bbdaba io_uring/msg: initialise msg request opcode
It's risky to have msg request opcode set to garbage, so at least
initialise it to nop. Later we might want to add a user inaccessible
opcode for such cases.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/9afe650fcb348414a4529d89f52eb8969ba06efd.1743190078.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-28 17:13:09 -06:00
Pavel Begunkov
b0e9570a6b io_uring/msg: rename io_double_lock_ctx()
io_double_lock_ctx() doesn't lock both rings. Rename it to prevent any
future confusion.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/9e5defa000efd9b0f5e169cbb6bad4994d46ec5c.1743190078.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-28 17:13:09 -06:00
Pavel Begunkov
fbe1a30c5d io_uring/net: import zc ubuf earlier
io_send_setup() already sets up the iterator for IORING_OP_SEND_ZC, we
don't need repeating that at issue time. Move it all together with mem
accounting at prep time, which is more consistent with how the non-zc
version does that.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/eb54f007c493ad9f4ca89aa8e715baf30d83fb88.1743202294.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-28 17:11:20 -06:00
Pavel Begunkov
ad3f6cc400 io_uring/net: set sg_from_iter in advance
In preparation to the next patch, set ->sg_from_iter callback at request
prep time.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/5fe2972701df3bacdb3d760bce195fa640bee201.1743202294.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-28 17:11:20 -06:00
Pavel Begunkov
49dbce5602 io_uring/net: clusterise send vs msghdr branches
We have multiple branches at prep for send vs sendmsg handling, put them
together so that the variant handling is more localised.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/33abf666d9ded74cba4da2f0d9fe58e88520dffe.1743202294.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-28 17:11:20 -06:00
Pavel Begunkov
63b16e4f0b io_uring/net: unify sendmsg setup with zc
io_sendmsg_zc_setup() duplicates parts of io_sendmsg_setup(), and the
only difference between them is that the former support vectored
registered buffers with nothing zerocopy specific. Merge them together,
we want regular sendmsg to eventually support fixed buffers either way.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/7e5ec40f9dc93355399dc6fa0cbc8b31f0b20ac5.1743202294.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-28 17:11:20 -06:00
Pavel Begunkov
c55e2845df io_uring/net: combine sendzc flags writes
Save an instruction / trip to the cache and assign some of sendzc flags
together.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/c519d6f406776c3be3ef988a8339a88e45d1ffd9.1743202294.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-28 17:11:20 -06:00
Pavel Begunkov
5f364117db io_uring/net: open code io_net_vec_assign()
Get rid of io_net_vec_assign() by open coding it into its only caller.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/19191c34b5cfe1161f7eeefa6e785418ea9ad56d.1743202294.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-28 17:11:20 -06:00
Pavel Begunkov
a20b8631c8 io_uring/net: open code io_sendmsg_copy_hdr()
io_sendmsg_setup() is trivial and io_sendmsg_copy_hdr() doesn't add
any good abstraction, open code one into another.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/565318ce585665e88053663eeee5178d2c15692f.1743202294.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-28 17:11:20 -06:00