Mirror of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Synced 2025-11-01 09:13:37 +00:00
block-6.16-20250606
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmhC7/UQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgps+6D/9BOhkMyMkUF9LAev4PBNE+x3aftjl7Y1AY
EHv2vozb4nDwXIaalG4qGUhprz2+z+hqxYjmnlOAsqbixhcSzKK5z9rjxyDka776
x03vfvKXaXZUG7XN7ENY8sJnLx4QJ0nh4+0gzT9yDyq2vKvPFLEKweNOxKDKCSbE
31vGoLFwjltp74hX+Qrnj1KMaTLgvAaV0eXKWlbX7Iiw6GFVm200zb27gth6U8bV
WQAmjSkFQ0daHtAWmXIVy7hrXiCqe8D6YPKvBXnQ4cfKVbgG0HHDuTmQLpKGzfMi
rr24MU5vZjt6OsYalceiTtifSUcf/I2+iFV7HswOk9kpOY5A2ylsWawRP2mm4PDI
nJE3LaSTRpEvs5kzPJ2kr8Zp4/uvF6ehSq8Y9w52JekmOzxusLcRcswezaO00EI0
32uuK+P505EGTcCBTrEdtaI6k7zzQEeVoIpxqvMhNRG/s5vzvIV3eVrALu2HSDma
P3paEdx7PwJla3ndmdChfh1vUR3TW3gWoZvoNCVmJzNCnLEAScTS2NsiQeEjy8zs
20IGsrRgIqt9KR8GZ2zj1ZOM47Cg0dIU3pbbA2Ja71wx4TYXJCSFFRK7mzDtXYlY
BWOix/Dks8tk118cwuxnT+IiwmWDMbDZKnygh+4tiSyrs0IszeekRADLUu03C0Ve
Dhpljqf3zA==
=gs32
-----END PGP SIGNATURE-----
Merge tag 'block-6.16-20250606' of git://git.kernel.dk/linux
Pull more block updates from Jens Axboe:

 - NVMe pull request via Christoph:
      - TCP error handling fix (Shin'ichiro Kawasaki)
      - TCP I/O stall handling fixes (Hannes Reinecke)
      - fix command limits status code (Keith Busch)
      - support vectored buffers also for passthrough (Pavel Begunkov)
      - spelling fixes (Yi Zhang)

 - MD pull request via Yu:
      - fix REQ_RAHEAD and REQ_NOWAIT IO err handling for raid1/10
      - fix max_write_behind setting for dm-raid
      - some minor cleanups

 - Integrity data direction fix and cleanup

 - bcache NULL pointer fix

 - Fix for loop missing write start/end handling

 - Decouple hardware queues and IO threads in ublk

 - Slew of ublk selftests additions and updates
* tag 'block-6.16-20250606' of git://git.kernel.dk/linux: (29 commits)
nvme: spelling fixes
nvme-tcp: fix I/O stalls on congested sockets
nvme-tcp: sanitize request list handling
nvme-tcp: remove tag set when second admin queue config fails
nvme: enable vectored registered bufs for passthrough cmds
nvme: fix implicit bool to flags conversion
nvme: fix command limits status code
selftests: ublk: kublk: improve behavior on init failure
block: flip iter directions in blk_rq_integrity_map_user()
block: drop direction param from bio_integrity_copy_user()
selftests: ublk: cover PER_IO_DAEMON in more stress tests
Documentation: ublk: document UBLK_F_PER_IO_DAEMON
selftests: ublk: add stress test for per io daemons
selftests: ublk: add functional test for per io daemons
selftests: ublk: kublk: decouple ublk_queues from ublk server threads
selftests: ublk: kublk: move per-thread data out of ublk_queue
selftests: ublk: kublk: lift queue initialization out of thread
selftests: ublk: kublk: tie sqe allocation to io instead of queue
selftests: ublk: kublk: plumb q_id in io_uring user_data
ublk: have a per-io daemon instead of a per-queue daemon
...
commit 6d8854216e
48 changed files with 708 additions and 393 deletions
@@ -115,15 +115,15 @@ managing and controlling ublk devices with help of several control commands:
 
 - ``UBLK_CMD_START_DEV``
 
-  After the server prepares userspace resources (such as creating per-queue
-  pthread & io_uring for handling ublk IO), this command is sent to the
+  After the server prepares userspace resources (such as creating I/O handler
+  threads & io_uring for handling ublk IO), this command is sent to the
   driver for allocating & exposing ``/dev/ublkb*``. Parameters set via
   ``UBLK_CMD_SET_PARAMS`` are applied for creating the device.
 
 - ``UBLK_CMD_STOP_DEV``
 
   Halt IO on ``/dev/ublkb*`` and remove the device. When this command returns,
-  ublk server will release resources (such as destroying per-queue pthread &
+  ublk server will release resources (such as destroying I/O handler threads &
   io_uring).
 
 - ``UBLK_CMD_DEL_DEV``
@@ -208,15 +208,15 @@ managing and controlling ublk devices with help of several control commands:
   modify how I/O is handled while the ublk server is dying/dead (this is called
   the ``nosrv`` case in the driver code).
 
-  With just ``UBLK_F_USER_RECOVERY`` set, after one ubq_daemon(ublk server's io
-  handler) is dying, ublk does not delete ``/dev/ublkb*`` during the whole
+  With just ``UBLK_F_USER_RECOVERY`` set, after the ublk server exits,
+  ublk does not delete ``/dev/ublkb*`` during the whole
   recovery stage and ublk device ID is kept. It is ublk server's
   responsibility to recover the device context by its own knowledge.
   Requests which have not been issued to userspace are requeued. Requests
   which have been issued to userspace are aborted.
 
-  With ``UBLK_F_USER_RECOVERY_REISSUE`` additionally set, after one ubq_daemon
-  (ublk server's io handler) is dying, contrary to ``UBLK_F_USER_RECOVERY``,
+  With ``UBLK_F_USER_RECOVERY_REISSUE`` additionally set, after the ublk server
+  exits, contrary to ``UBLK_F_USER_RECOVERY``,
   requests which have been issued to userspace are requeued and will be
   re-issued to the new process after handling ``UBLK_CMD_END_USER_RECOVERY``.
   ``UBLK_F_USER_RECOVERY_REISSUE`` is designed for backends who tolerate
@@ -241,10 +241,11 @@ can be controlled/accessed just inside this container.
 Data plane
 ----------
 
-ublk server needs to create per-queue IO pthread & io_uring for handling IO
-commands via io_uring passthrough. The per-queue IO pthread
-focuses on IO handling and shouldn't handle any control & management
-tasks.
+The ublk server should create dedicated threads for handling I/O. Each
+thread should have its own io_uring through which it is notified of new
+I/O, and through which it can complete I/O. These dedicated threads
+should focus on IO handling and shouldn't handle any control &
+management tasks.
 
 The's IO is assigned by a unique tag, which is 1:1 mapping with IO
 request of ``/dev/ublkb*``.
@@ -265,6 +266,18 @@ with specified IO tag in the command data:
   destined to ``/dev/ublkb*``. This command is sent only once from the server
   IO pthread for ublk driver to setup IO forward environment.
 
+  Once a thread issues this command against a given (qid,tag) pair, the thread
+  registers itself as that I/O's daemon. In the future, only that I/O's daemon
+  is allowed to issue commands against the I/O. If any other thread attempts
+  to issue a command against a (qid,tag) pair for which the thread is not the
+  daemon, the command will fail. Daemons can be reset only be going through
+  recovery.
+
+  The ability for every (qid,tag) pair to have its own independent daemon task
+  is indicated by the ``UBLK_F_PER_IO_DAEMON`` feature. If this feature is not
+  supported by the driver, daemons must be per-queue instead - i.e. all I/Os
+  associated to a single qid must be handled by the same task.
+
 - ``UBLK_IO_COMMIT_AND_FETCH_REQ``
 
   When an IO request is destined to ``/dev/ublkb*``, the driver stores
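The ownership rule documented above can be pictured with a short userspace sketch (plain pthreads only, not the ublk UAPI or the driver implementation): the first thread to fetch a given (qid, tag) becomes that I/O's daemon, and commands issued for it from any other thread are refused.

    /*
     * Conceptual sketch only: plain pthreads, not the ublk UAPI. The first
     * thread to fetch a tag becomes that I/O's daemon; commands issued for
     * it from any other thread fail, mirroring UBLK_F_PER_IO_DAEMON.
     */
    #include <pthread.h>
    #include <stdbool.h>
    #include <stdio.h>

    #define QUEUE_DEPTH 4

    static pthread_t io_daemon[QUEUE_DEPTH];
    static bool io_registered[QUEUE_DEPTH];

    /* FETCH_REQ analogue: the calling thread registers itself as the daemon. */
    static void fetch_io(int tag)
    {
            io_daemon[tag] = pthread_self();
            io_registered[tag] = true;
    }

    /* COMMIT_AND_FETCH_REQ analogue: only the registered daemon may proceed. */
    static bool commit_io(int tag)
    {
            if (!io_registered[tag] ||
                !pthread_equal(io_daemon[tag], pthread_self())) {
                    fprintf(stderr, "tag %d: caller is not this io's daemon\n", tag);
                    return false;   /* the real driver fails the command here */
            }
            printf("tag %d: committed by its daemon\n", tag);
            return true;
    }

    static void *other_thread(void *arg)
    {
            (void)arg;
            commit_io(0);           /* rejected: a different thread owns tag 0 */
            return NULL;
    }

    int main(void)
    {
            pthread_t t;

            fetch_io(0);            /* main thread becomes the daemon for tag 0 */
            commit_io(0);           /* accepted: same thread */

            pthread_create(&t, NULL, other_thread, NULL);
            pthread_join(t, NULL);
            return 0;
    }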
@@ -154,10 +154,9 @@ int bio_integrity_add_page(struct bio *bio, struct page *page,
 EXPORT_SYMBOL(bio_integrity_add_page);
 
 static int bio_integrity_copy_user(struct bio *bio, struct bio_vec *bvec,
-                                   int nr_vecs, unsigned int len,
-                                   unsigned int direction)
+                                   int nr_vecs, unsigned int len)
 {
-        bool write = direction == ITER_SOURCE;
+        bool write = op_is_write(bio_op(bio));
         struct bio_integrity_payload *bip;
         struct iov_iter iter;
         void *buf;
@@ -168,7 +167,7 @@ static int bio_integrity_copy_user(struct bio *bio, struct bio_vec *bvec,
                 return -ENOMEM;
 
         if (write) {
-                iov_iter_bvec(&iter, direction, bvec, nr_vecs, len);
+                iov_iter_bvec(&iter, ITER_SOURCE, bvec, nr_vecs, len);
                 if (!copy_from_iter_full(buf, len, &iter)) {
                         ret = -EFAULT;
                         goto free_buf;
@@ -264,7 +263,7 @@ int bio_integrity_map_user(struct bio *bio, struct iov_iter *iter)
         struct page *stack_pages[UIO_FASTIOV], **pages = stack_pages;
         struct bio_vec stack_vec[UIO_FASTIOV], *bvec = stack_vec;
         size_t offset, bytes = iter->count;
-        unsigned int direction, nr_bvecs;
+        unsigned int nr_bvecs;
         int ret, nr_vecs;
         bool copy;
 
@@ -273,11 +272,6 @@ int bio_integrity_map_user(struct bio *bio, struct iov_iter *iter)
         if (bytes >> SECTOR_SHIFT > queue_max_hw_sectors(q))
                 return -E2BIG;
 
-        if (bio_data_dir(bio) == READ)
-                direction = ITER_DEST;
-        else
-                direction = ITER_SOURCE;
-
         nr_vecs = iov_iter_npages(iter, BIO_MAX_VECS + 1);
         if (nr_vecs > BIO_MAX_VECS)
                 return -E2BIG;
@@ -300,8 +294,7 @@ int bio_integrity_map_user(struct bio *bio, struct iov_iter *iter)
                 copy = true;
 
         if (copy)
-                ret = bio_integrity_copy_user(bio, bvec, nr_bvecs, bytes,
-                                              direction);
+                ret = bio_integrity_copy_user(bio, bvec, nr_bvecs, bytes);
         else
                 ret = bio_integrity_init_user(bio, bvec, nr_bvecs, bytes);
         if (ret)
@@ -117,13 +117,8 @@ int blk_rq_integrity_map_user(struct request *rq, void __user *ubuf,
 {
         int ret;
         struct iov_iter iter;
-        unsigned int direction;
 
-        if (op_is_write(req_op(rq)))
-                direction = ITER_DEST;
-        else
-                direction = ITER_SOURCE;
-        iov_iter_ubuf(&iter, direction, ubuf, bytes);
+        iov_iter_ubuf(&iter, rq_data_dir(rq), ubuf, bytes);
         ret = bio_integrity_map_user(rq->bio, &iter);
         if (ret)
                 return ret;
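The two integrity hunks above lean on the same rule: the data direction follows the operation, so a write copies the user's integrity buffer in before submission and a read copies it back out on completion. A minimal userspace sketch of that rule (ordinary C, not kernel code; the bounce buffer stands in for the kernel's copy):

    #include <stdbool.h>
    #include <stdio.h>
    #include <string.h>

    static char bounce[16];          /* stands in for the kernel bounce buffer */

    /* Before submission: a write means the caller's buffer is the source. */
    static void map_buffer(bool write, const char *user_buf, size_t len)
    {
            if (write)
                    memcpy(bounce, user_buf, len);  /* copy_from_iter_full() analogue */
    }

    /* On completion: a read means the caller's buffer is the destination. */
    static void complete_buffer(bool write, char *user_buf, size_t len)
    {
            if (!write)
                    memcpy(user_buf, bounce, len);  /* copy-back on read completion */
    }

    int main(void)
    {
            char out[8] = "payload";
            char in[8] = { 0 };

            map_buffer(true, out, sizeof(out));      /* write path */
            complete_buffer(false, in, sizeof(in));  /* read path */
            printf("%s\n", in);
            return 0;
    }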
@@ -308,11 +308,14 @@ end_io:
 static void lo_rw_aio_do_completion(struct loop_cmd *cmd)
 {
         struct request *rq = blk_mq_rq_from_pdu(cmd);
+        struct loop_device *lo = rq->q->queuedata;
 
         if (!atomic_dec_and_test(&cmd->ref))
                 return;
         kfree(cmd->bvec);
         cmd->bvec = NULL;
+        if (req_op(rq) == REQ_OP_WRITE)
+                file_end_write(lo->lo_backing_file);
         if (likely(!blk_should_fake_timeout(rq->q)))
                 blk_mq_complete_request(rq);
 }
@@ -387,9 +390,10 @@ static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd,
                 cmd->iocb.ki_flags = 0;
         }
 
-        if (rw == ITER_SOURCE)
+        if (rw == ITER_SOURCE) {
+                file_start_write(lo->lo_backing_file);
                 ret = file->f_op->write_iter(&cmd->iocb, &iter);
-        else
+        } else
                 ret = file->f_op->read_iter(&cmd->iocb, &iter);
 
         lo_rw_aio_do_completion(cmd);
@@ -69,7 +69,8 @@
                 | UBLK_F_USER_RECOVERY_FAIL_IO \
                 | UBLK_F_UPDATE_SIZE \
                 | UBLK_F_AUTO_BUF_REG \
-                | UBLK_F_QUIESCE)
+                | UBLK_F_QUIESCE \
+                | UBLK_F_PER_IO_DAEMON)
 
 #define UBLK_F_ALL_RECOVERY_FLAGS (UBLK_F_USER_RECOVERY \
                 | UBLK_F_USER_RECOVERY_REISSUE \
@@ -166,6 +167,8 @@ struct ublk_io {
         /* valid if UBLK_IO_FLAG_OWNED_BY_SRV is set */
         struct request *req;
         };
+
+        struct task_struct *task;
 };
 
 struct ublk_queue {
@@ -173,11 +176,9 @@ struct ublk_queue {
         int q_depth;
 
         unsigned long flags;
-        struct task_struct *ubq_daemon;
         struct ublksrv_io_desc *io_cmd_buf;
 
         bool force_abort;
-        bool timeout;
         bool canceling;
         bool fail_io; /* copy of dev->state == UBLK_S_DEV_FAIL_IO */
         unsigned short nr_io_ready; /* how many ios setup */
@@ -1099,11 +1100,6 @@ static inline struct ublk_uring_cmd_pdu *ublk_get_uring_cmd_pdu(
         return io_uring_cmd_to_pdu(ioucmd, struct ublk_uring_cmd_pdu);
 }
 
-static inline bool ubq_daemon_is_dying(struct ublk_queue *ubq)
-{
-        return !ubq->ubq_daemon || ubq->ubq_daemon->flags & PF_EXITING;
-}
-
 /* todo: handle partial completion */
 static inline void __ublk_complete_rq(struct request *req)
 {
@@ -1275,13 +1271,13 @@ static void ublk_dispatch_req(struct ublk_queue *ubq,
         /*
          * Task is exiting if either:
          *
-         * (1) current != ubq_daemon.
+         * (1) current != io->task.
          * io_uring_cmd_complete_in_task() tries to run task_work
-         * in a workqueue if ubq_daemon(cmd's task) is PF_EXITING.
+         * in a workqueue if cmd's task is PF_EXITING.
          *
          * (2) current->flags & PF_EXITING.
          */
-        if (unlikely(current != ubq->ubq_daemon || current->flags & PF_EXITING)) {
+        if (unlikely(current != io->task || current->flags & PF_EXITING)) {
                 __ublk_abort_rq(ubq, req);
                 return;
         }
@@ -1330,24 +1326,22 @@ static void ublk_cmd_list_tw_cb(struct io_uring_cmd *cmd,
 {
         struct ublk_uring_cmd_pdu *pdu = ublk_get_uring_cmd_pdu(cmd);
         struct request *rq = pdu->req_list;
-        struct ublk_queue *ubq = pdu->ubq;
         struct request *next;
 
         do {
                 next = rq->rq_next;
                 rq->rq_next = NULL;
-                ublk_dispatch_req(ubq, rq, issue_flags);
+                ublk_dispatch_req(rq->mq_hctx->driver_data, rq, issue_flags);
                 rq = next;
         } while (rq);
 }
 
-static void ublk_queue_cmd_list(struct ublk_queue *ubq, struct rq_list *l)
+static void ublk_queue_cmd_list(struct ublk_io *io, struct rq_list *l)
 {
-        struct request *rq = rq_list_peek(l);
-        struct io_uring_cmd *cmd = ubq->ios[rq->tag].cmd;
+        struct io_uring_cmd *cmd = io->cmd;
         struct ublk_uring_cmd_pdu *pdu = ublk_get_uring_cmd_pdu(cmd);
 
-        pdu->req_list = rq;
+        pdu->req_list = rq_list_peek(l);
         rq_list_init(l);
         io_uring_cmd_complete_in_task(cmd, ublk_cmd_list_tw_cb);
 }
@@ -1355,13 +1349,10 @@ static void ublk_queue_cmd_list(struct ublk_queue *ubq, struct rq_list *l)
 static enum blk_eh_timer_return ublk_timeout(struct request *rq)
 {
         struct ublk_queue *ubq = rq->mq_hctx->driver_data;
+        struct ublk_io *io = &ubq->ios[rq->tag];
 
         if (ubq->flags & UBLK_F_UNPRIVILEGED_DEV) {
-                if (!ubq->timeout) {
-                        send_sig(SIGKILL, ubq->ubq_daemon, 0);
-                        ubq->timeout = true;
-                }
-
+                send_sig(SIGKILL, io->task, 0);
                 return BLK_EH_DONE;
         }
 
@@ -1429,24 +1420,25 @@ static void ublk_queue_rqs(struct rq_list *rqlist)
 {
         struct rq_list requeue_list = { };
         struct rq_list submit_list = { };
-        struct ublk_queue *ubq = NULL;
+        struct ublk_io *io = NULL;
         struct request *req;
 
         while ((req = rq_list_pop(rqlist))) {
                 struct ublk_queue *this_q = req->mq_hctx->driver_data;
+                struct ublk_io *this_io = &this_q->ios[req->tag];
 
-                if (ubq && ubq != this_q && !rq_list_empty(&submit_list))
-                        ublk_queue_cmd_list(ubq, &submit_list);
-                ubq = this_q;
+                if (io && io->task != this_io->task && !rq_list_empty(&submit_list))
+                        ublk_queue_cmd_list(io, &submit_list);
+                io = this_io;
 
-                if (ublk_prep_req(ubq, req, true) == BLK_STS_OK)
+                if (ublk_prep_req(this_q, req, true) == BLK_STS_OK)
                         rq_list_add_tail(&submit_list, req);
                 else
                         rq_list_add_tail(&requeue_list, req);
         }
 
-        if (ubq && !rq_list_empty(&submit_list))
-                ublk_queue_cmd_list(ubq, &submit_list);
+        if (!rq_list_empty(&submit_list))
+                ublk_queue_cmd_list(io, &submit_list);
         *rqlist = requeue_list;
 }
 
@@ -1474,17 +1466,6 @@ static void ublk_queue_reinit(struct ublk_device *ub, struct ublk_queue *ubq)
         /* All old ioucmds have to be completed */
         ubq->nr_io_ready = 0;
 
-        /*
-         * old daemon is PF_EXITING, put it now
-         *
-         * It could be NULL in case of closing one quisced device.
-         */
-        if (ubq->ubq_daemon)
-                put_task_struct(ubq->ubq_daemon);
-        /* We have to reset it to NULL, otherwise ub won't accept new FETCH_REQ */
-        ubq->ubq_daemon = NULL;
-        ubq->timeout = false;
-
         for (i = 0; i < ubq->q_depth; i++) {
                 struct ublk_io *io = &ubq->ios[i];
 
@@ -1495,6 +1476,17 @@ static void ublk_queue_reinit(struct ublk_device *ub, struct ublk_queue *ubq)
                 io->flags &= UBLK_IO_FLAG_CANCELED;
                 io->cmd = NULL;
                 io->addr = 0;
+
+                /*
+                 * old task is PF_EXITING, put it now
+                 *
+                 * It could be NULL in case of closing one quiesced
+                 * device.
+                 */
+                if (io->task) {
+                        put_task_struct(io->task);
+                        io->task = NULL;
+                }
         }
 }
 
@@ -1516,7 +1508,7 @@ static void ublk_reset_ch_dev(struct ublk_device *ub)
         for (i = 0; i < ub->dev_info.nr_hw_queues; i++)
                 ublk_queue_reinit(ub, ublk_get_queue(ub, i));
 
-        /* set to NULL, otherwise new ubq_daemon cannot mmap the io_cmd_buf */
+        /* set to NULL, otherwise new tasks cannot mmap io_cmd_buf */
         ub->mm = NULL;
         ub->nr_queues_ready = 0;
         ub->nr_privileged_daemon = 0;
@@ -1783,6 +1775,7 @@ static void ublk_uring_cmd_cancel_fn(struct io_uring_cmd *cmd,
         struct ublk_uring_cmd_pdu *pdu = ublk_get_uring_cmd_pdu(cmd);
         struct ublk_queue *ubq = pdu->ubq;
         struct task_struct *task;
+        struct ublk_io *io;
 
         if (WARN_ON_ONCE(!ubq))
                 return;
@@ -1791,13 +1784,14 @@ static void ublk_uring_cmd_cancel_fn(struct io_uring_cmd *cmd,
                 return;
 
         task = io_uring_cmd_get_task(cmd);
-        if (WARN_ON_ONCE(task && task != ubq->ubq_daemon))
+        io = &ubq->ios[pdu->tag];
+        if (WARN_ON_ONCE(task && task != io->task))
                 return;
 
         if (!ubq->canceling)
                 ublk_start_cancel(ubq);
 
-        WARN_ON_ONCE(ubq->ios[pdu->tag].cmd != cmd);
+        WARN_ON_ONCE(io->cmd != cmd);
         ublk_cancel_cmd(ubq, pdu->tag, issue_flags);
 }
 
@@ -1930,8 +1924,6 @@ static void ublk_mark_io_ready(struct ublk_device *ub, struct ublk_queue *ubq)
 {
         ubq->nr_io_ready++;
         if (ublk_queue_ready(ubq)) {
-                ubq->ubq_daemon = current;
-                get_task_struct(ubq->ubq_daemon);
                 ub->nr_queues_ready++;
 
                 if (capable(CAP_SYS_ADMIN))
@@ -2084,6 +2076,7 @@ static int ublk_fetch(struct io_uring_cmd *cmd, struct ublk_queue *ubq,
         }
 
         ublk_fill_io_cmd(io, cmd, buf_addr);
+        WRITE_ONCE(io->task, get_task_struct(current));
         ublk_mark_io_ready(ub, ubq);
 out:
         mutex_unlock(&ub->mutex);
@@ -2179,6 +2172,7 @@ static int __ublk_ch_uring_cmd(struct io_uring_cmd *cmd,
                                const struct ublksrv_io_cmd *ub_cmd)
 {
         struct ublk_device *ub = cmd->file->private_data;
+        struct task_struct *task;
         struct ublk_queue *ubq;
         struct ublk_io *io;
         u32 cmd_op = cmd->cmd_op;
@@ -2193,13 +2187,14 @@ static int __ublk_ch_uring_cmd(struct io_uring_cmd *cmd,
                 goto out;
 
         ubq = ublk_get_queue(ub, ub_cmd->q_id);
-        if (ubq->ubq_daemon && ubq->ubq_daemon != current)
-                goto out;
 
         if (tag >= ubq->q_depth)
                 goto out;
 
         io = &ubq->ios[tag];
+        task = READ_ONCE(io->task);
+        if (task && task != current)
+                goto out;
 
         /* there is pending io cmd, something must be wrong */
         if (io->flags & UBLK_IO_FLAG_ACTIVE) {
@@ -2449,9 +2444,14 @@ static void ublk_deinit_queue(struct ublk_device *ub, int q_id)
 {
         int size = ublk_queue_cmd_buf_size(ub, q_id);
         struct ublk_queue *ubq = ublk_get_queue(ub, q_id);
+        int i;
 
-        if (ubq->ubq_daemon)
-                put_task_struct(ubq->ubq_daemon);
+        for (i = 0; i < ubq->q_depth; i++) {
+                struct ublk_io *io = &ubq->ios[i];
+                if (io->task)
+                        put_task_struct(io->task);
+        }
+
         if (ubq->io_cmd_buf)
                 free_pages((unsigned long)ubq->io_cmd_buf, get_order(size));
 }
@@ -2923,7 +2923,8 @@ static int ublk_ctrl_add_dev(const struct ublksrv_ctrl_cmd *header)
         ub->dev_info.flags &= UBLK_F_ALL;
 
         ub->dev_info.flags |= UBLK_F_CMD_IOCTL_ENCODE |
-                UBLK_F_URING_CMD_COMP_IN_TASK;
+                UBLK_F_URING_CMD_COMP_IN_TASK |
+                UBLK_F_PER_IO_DAEMON;
 
         /* GET_DATA isn't needed any more with USER_COPY or ZERO COPY */
         if (ub->dev_info.flags & (UBLK_F_USER_COPY | UBLK_F_SUPPORT_ZERO_COPY |
@@ -3188,14 +3189,14 @@ static int ublk_ctrl_end_recovery(struct ublk_device *ub,
         int ublksrv_pid = (int)header->data[0];
         int ret = -EINVAL;
 
-        pr_devel("%s: Waiting for new ubq_daemons(nr: %d) are ready, dev id %d...\n",
-                        __func__, ub->dev_info.nr_hw_queues, header->dev_id);
-        /* wait until new ubq_daemon sending all FETCH_REQ */
+        pr_devel("%s: Waiting for all FETCH_REQs, dev id %d...\n", __func__,
+                 header->dev_id);
         if (wait_for_completion_interruptible(&ub->completion))
                 return -EINTR;
 
-        pr_devel("%s: All new ubq_daemons(nr: %d) are ready, dev id %d\n",
-                        __func__, ub->dev_info.nr_hw_queues, header->dev_id);
+        pr_devel("%s: All FETCH_REQs received, dev id %d\n", __func__,
+                 header->dev_id);
 
         mutex_lock(&ub->mutex);
         if (ublk_nosrv_should_stop_dev(ub))
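The reworked ublk_queue_rqs() above batches requests by the daemon task that owns them rather than by hardware queue. A minimal standalone sketch of that batching pattern (plain C, not driver code; the request list and owner fields are made up for illustration):

    #include <stdio.h>

    struct fake_req {
            int tag;
            int owner;      /* stands in for io->task */
    };

    static void flush_batch(const struct fake_req *batch, int n)
    {
            if (n == 0)
                    return;
            printf("dispatch %d request(s) to owner %d\n", n, batch[0].owner);
    }

    int main(void)
    {
            struct fake_req reqs[] = {
                    { 0, 1 }, { 1, 1 }, { 2, 2 }, { 3, 2 }, { 4, 1 },
            };
            struct fake_req batch[5];
            int n = 0;

            for (int i = 0; i < 5; i++) {
                    /* owner changed: hand off the current batch first */
                    if (n && batch[0].owner != reqs[i].owner) {
                            flush_batch(batch, n);
                            n = 0;
                    }
                    batch[n++] = reqs[i];
            }
            flush_batch(batch, n);          /* flush the tail batch */
            return 0;
    }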
@@ -89,8 +89,6 @@
  * Test module load/unload
  */
 
-#define MAX_NEED_GC             64
-#define MAX_SAVE_PRIO           72
 #define MAX_GC_TIMES            100
 #define MIN_GC_NODES            100
 #define GC_SLEEP_MS             100
@@ -1733,7 +1733,12 @@ static CLOSURE_CALLBACK(cache_set_flush)
                 mutex_unlock(&b->write_lock);
         }
 
-        if (ca->alloc_thread)
+        /*
+         * If the register_cache_set() call to bch_cache_set_alloc() failed,
+         * ca has not been assigned a value and return error.
+         * So we need check ca is not NULL during bch_cache_set_unregister().
+         */
+        if (ca && ca->alloc_thread)
                 kthread_stop(ca->alloc_thread);
 
         if (c->journal.cur) {
@@ -2233,15 +2238,47 @@ static int cache_alloc(struct cache *ca)
         bio_init(&ca->journal.bio, NULL, ca->journal.bio.bi_inline_vecs, 8, 0);
 
         /*
-         * when ca->sb.njournal_buckets is not zero, journal exists,
-         * and in bch_journal_replay(), tree node may split,
-         * so bucket of RESERVE_BTREE type is needed,
-         * the worst situation is all journal buckets are valid journal,
-         * and all the keys need to replay,
-         * so the number of RESERVE_BTREE type buckets should be as much
-         * as journal buckets
+         * When the cache disk is first registered, ca->sb.njournal_buckets
+         * is zero, and it is assigned in run_cache_set().
+         *
+         * When ca->sb.njournal_buckets is not zero, journal exists,
+         * and in bch_journal_replay(), tree node may split.
+         * The worst situation is all journal buckets are valid journal,
+         * and all the keys need to replay, so the number of RESERVE_BTREE
+         * type buckets should be as much as journal buckets.
+         *
+         * If the number of RESERVE_BTREE type buckets is too few, the
+         * bch_allocator_thread() may hang up and unable to allocate
+         * bucket. The situation is roughly as follows:
+         *
+         * 1. In bch_data_insert_keys(), if the operation is not op->replace,
+         * it will call the bch_journal(), which increments the journal_ref
+         * counter. This counter is only decremented after bch_btree_insert
+         * completes.
+         *
+         * 2. When calling bch_btree_insert, if the btree needs to split,
+         * it will call btree_split() and btree_check_reserve() to check
+         * whether there are enough reserved buckets in the RESERVE_BTREE
+         * slot. If not enough, bcache_btree_root() will repeatedly retry.
+         *
+         * 3. Normally, the bch_allocator_thread is responsible for filling
+         * the reservation slots from the free_inc bucket list. When the
+         * free_inc bucket list is exhausted, the bch_allocator_thread
+         * will call invalidate_buckets() until free_inc is refilled.
+         * Then bch_allocator_thread calls bch_prio_write() once. and
+         * bch_prio_write() will call bch_journal_meta() and waits for
+         * the journal write to complete.
+         *
+         * 4. During journal_write, journal_write_unlocked() is be called.
+         * If journal full occurs, journal_reclaim() and btree_flush_write()
+         * will be called sequentially, then retry journal_write.
+         *
+         * 5. When 2 and 4 occur together, IO will hung up and cannot recover.
+         *
+         * Therefore, reserve more RESERVE_BTREE type buckets.
          */
-        btree_buckets = ca->sb.njournal_buckets ?: 8;
+        btree_buckets = clamp_t(size_t, ca->sb.nbuckets >> 7,
+                                32, SB_JOURNAL_BUCKETS);
         free = roundup_pow_of_two(ca->sb.nbuckets) >> 10;
         if (!free) {
                 ret = -EPERM;
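The new sizing above reserves nbuckets/128 RESERVE_BTREE buckets, clamped between 32 and SB_JOURNAL_BUCKETS. A quick standalone check of that arithmetic (SB_JOURNAL_BUCKETS is assumed to be 256 here, so treat the exact ceiling as illustrative):

    #include <stdio.h>
    #include <stddef.h>

    #define SB_JOURNAL_BUCKETS 256U         /* assumed value, for illustration only */

    static size_t clamp_size(size_t v, size_t lo, size_t hi)
    {
            return v < lo ? lo : (v > hi ? hi : v);
    }

    int main(void)
    {
            /* nbuckets for a small, a mid-sized and a large cache device */
            size_t nbuckets[] = { 2048, 1 << 20, 1 << 26 };

            for (size_t i = 0; i < 3; i++) {
                    size_t btree_buckets =
                            clamp_size(nbuckets[i] >> 7, 32, SB_JOURNAL_BUCKETS);
                    printf("nbuckets=%zu -> btree_buckets=%zu\n",
                           nbuckets[i], btree_buckets);
            }
            return 0;       /* prints 32, 256, 256 for the sample sizes */
    }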
@@ -1356,11 +1356,7 @@ static int parse_raid_params(struct raid_set *rs, struct dm_arg_set *as,
                         return -EINVAL;
                 }
 
-                /*
-                 * In device-mapper, we specify things in sectors, but
-                 * MD records this value in kB
-                 */
-                if (value < 0 || value / 2 > COUNTER_MAX) {
+                if (value < 0) {
                         rs->ti->error = "Max write-behind limit out of range";
                         return -EINVAL;
                 }
@@ -105,9 +105,19 @@
  *
  */
 
+typedef __u16 bitmap_counter_t;
+
 #define PAGE_BITS (PAGE_SIZE << 3)
 #define PAGE_BIT_SHIFT (PAGE_SHIFT + 3)
 
+#define COUNTER_BITS 16
+#define COUNTER_BIT_SHIFT 4
+#define COUNTER_BYTE_SHIFT (COUNTER_BIT_SHIFT - 3)
+
+#define NEEDED_MASK ((bitmap_counter_t) (1 << (COUNTER_BITS - 1)))
+#define RESYNC_MASK ((bitmap_counter_t) (1 << (COUNTER_BITS - 2)))
+#define COUNTER_MAX ((bitmap_counter_t) RESYNC_MASK - 1)
+
 #define NEEDED(x) (((bitmap_counter_t) x) & NEEDED_MASK)
 #define RESYNC(x) (((bitmap_counter_t) x) & RESYNC_MASK)
 #define COUNTER(x) (((bitmap_counter_t) x) & COUNTER_MAX)
@@ -789,7 +799,7 @@ static int md_bitmap_new_disk_sb(struct bitmap *bitmap)
          * is a good choice? We choose COUNTER_MAX / 2 arbitrarily.
          */
         write_behind = bitmap->mddev->bitmap_info.max_write_behind;
-        if (write_behind > COUNTER_MAX)
+        if (write_behind > COUNTER_MAX / 2)
                 write_behind = COUNTER_MAX / 2;
         sb->write_behind = cpu_to_le32(write_behind);
         bitmap->mddev->bitmap_info.max_write_behind = write_behind;
@@ -1672,13 +1682,13 @@ __acquires(bitmap->lock)
                 &(bitmap->bp[page].map[pageoff]);
 }
 
-static int bitmap_startwrite(struct mddev *mddev, sector_t offset,
+static void bitmap_start_write(struct mddev *mddev, sector_t offset,
                              unsigned long sectors)
 {
         struct bitmap *bitmap = mddev->bitmap;
 
         if (!bitmap)
-                return 0;
+                return;
 
         while (sectors) {
                 sector_t blocks;
@@ -1688,7 +1698,7 @@ static int bitmap_startwrite(struct mddev *mddev, sector_t offset,
                 bmc = md_bitmap_get_counter(&bitmap->counts, offset, &blocks, 1);
                 if (!bmc) {
                         spin_unlock_irq(&bitmap->counts.lock);
-                        return 0;
+                        return;
                 }
 
                 if (unlikely(COUNTER(*bmc) == COUNTER_MAX)) {
@@ -1724,10 +1734,9 @@ static int bitmap_startwrite(struct mddev *mddev, sector_t offset,
                 else
                         sectors = 0;
         }
-        return 0;
 }
 
-static void bitmap_endwrite(struct mddev *mddev, sector_t offset,
+static void bitmap_end_write(struct mddev *mddev, sector_t offset,
                             unsigned long sectors)
 {
         struct bitmap *bitmap = mddev->bitmap;
@@ -2205,9 +2214,9 @@ static struct bitmap *__bitmap_create(struct mddev *mddev, int slot)
                 return ERR_PTR(err);
 }
 
-static int bitmap_create(struct mddev *mddev, int slot)
+static int bitmap_create(struct mddev *mddev)
 {
-        struct bitmap *bitmap = __bitmap_create(mddev, slot);
+        struct bitmap *bitmap = __bitmap_create(mddev, -1);
 
         if (IS_ERR(bitmap))
                 return PTR_ERR(bitmap);
@@ -2670,7 +2679,7 @@ location_store(struct mddev *mddev, const char *buf, size_t len)
                 }
 
                 mddev->bitmap_info.offset = offset;
-                rv = bitmap_create(mddev, -1);
+                rv = bitmap_create(mddev);
                 if (rv)
                         goto out;
 
@@ -3003,8 +3012,8 @@ static struct bitmap_operations bitmap_ops = {
         .end_behind_write = bitmap_end_behind_write,
         .wait_behind_writes = bitmap_wait_behind_writes,
 
-        .startwrite = bitmap_startwrite,
-        .endwrite = bitmap_endwrite,
+        .start_write = bitmap_start_write,
+        .end_write = bitmap_end_write,
         .start_sync = bitmap_start_sync,
         .end_sync = bitmap_end_sync,
         .cond_end_sync = bitmap_cond_end_sync,
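The clamp change above caps max_write_behind at COUNTER_MAX / 2, where COUNTER_MAX follows from the 16-bit per-chunk counter losing its two top bits to the NEEDED and RESYNC flags. A standalone check of those constants (the requested value below is hypothetical):

    #include <stdio.h>
    #include <stdint.h>

    typedef uint16_t bitmap_counter_t;

    #define COUNTER_BITS 16
    #define NEEDED_MASK ((bitmap_counter_t)(1 << (COUNTER_BITS - 1)))
    #define RESYNC_MASK ((bitmap_counter_t)(1 << (COUNTER_BITS - 2)))
    #define COUNTER_MAX ((bitmap_counter_t)RESYNC_MASK - 1)

    int main(void)
    {
            unsigned long requested = 100000;   /* hypothetical max_write_behind request */
            unsigned long write_behind = requested;

            if (write_behind > COUNTER_MAX / 2) /* the new clamp from the hunk above */
                    write_behind = COUNTER_MAX / 2;

            printf("NEEDED_MASK=0x%x RESYNC_MASK=0x%x COUNTER_MAX=%u\n",
                   NEEDED_MASK, RESYNC_MASK, COUNTER_MAX);
            printf("requested=%lu clamped=%lu\n", requested, write_behind);
            return 0;       /* COUNTER_MAX=16383, clamped=8191 */
    }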
@@ -9,15 +9,6 @@
 
 #define BITMAP_MAGIC 0x6d746962
 
-typedef __u16 bitmap_counter_t;
-#define COUNTER_BITS 16
-#define COUNTER_BIT_SHIFT 4
-#define COUNTER_BYTE_SHIFT (COUNTER_BIT_SHIFT - 3)
-
-#define NEEDED_MASK ((bitmap_counter_t) (1 << (COUNTER_BITS - 1)))
-#define RESYNC_MASK ((bitmap_counter_t) (1 << (COUNTER_BITS - 2)))
-#define COUNTER_MAX ((bitmap_counter_t) RESYNC_MASK - 1)
-
 /* use these for bitmap->flags and bitmap->sb->state bit-fields */
 enum bitmap_state {
         BITMAP_STALE = 1, /* the bitmap file is out of date or had -EIO */
@@ -72,7 +63,7 @@ struct md_bitmap_stats {
 
 struct bitmap_operations {
         bool (*enabled)(struct mddev *mddev);
-        int (*create)(struct mddev *mddev, int slot);
+        int (*create)(struct mddev *mddev);
         int (*resize)(struct mddev *mddev, sector_t blocks, int chunksize,
                       bool init);
 
@@ -89,9 +80,9 @@ struct bitmap_operations {
         void (*end_behind_write)(struct mddev *mddev);
         void (*wait_behind_writes)(struct mddev *mddev);
 
-        int (*startwrite)(struct mddev *mddev, sector_t offset,
+        void (*start_write)(struct mddev *mddev, sector_t offset,
                           unsigned long sectors);
-        void (*endwrite)(struct mddev *mddev, sector_t offset,
+        void (*end_write)(struct mddev *mddev, sector_t offset,
                          unsigned long sectors);
         bool (*start_sync)(struct mddev *mddev, sector_t offset,
                            sector_t *blocks, bool degraded);
@@ -6225,7 +6225,7 @@ int md_run(struct mddev *mddev)
         }
         if (err == 0 && pers->sync_request &&
             (mddev->bitmap_info.file || mddev->bitmap_info.offset)) {
-                err = mddev->bitmap_ops->create(mddev, -1);
+                err = mddev->bitmap_ops->create(mddev);
                 if (err)
                         pr_warn("%s: failed to create bitmap (%d)\n",
                                 mdname(mddev), err);
@@ -7285,7 +7285,7 @@ static int set_bitmap_file(struct mddev *mddev, int fd)
         err = 0;
         if (mddev->pers) {
                 if (fd >= 0) {
-                        err = mddev->bitmap_ops->create(mddev, -1);
+                        err = mddev->bitmap_ops->create(mddev);
                         if (!err)
                                 err = mddev->bitmap_ops->load(mddev);
 
@@ -7601,7 +7601,7 @@ static int update_array_info(struct mddev *mddev, mdu_array_info_t *info)
                                 mddev->bitmap_info.default_offset;
                         mddev->bitmap_info.space =
                                 mddev->bitmap_info.default_space;
-                        rv = mddev->bitmap_ops->create(mddev, -1);
+                        rv = mddev->bitmap_ops->create(mddev);
                         if (!rv)
                                 rv = mddev->bitmap_ops->load(mddev);
 
@@ -8799,13 +8799,13 @@ static void md_bitmap_start(struct mddev *mddev,
                 mddev->pers->bitmap_sector(mddev, &md_io_clone->offset,
                                            &md_io_clone->sectors);
 
-        mddev->bitmap_ops->startwrite(mddev, md_io_clone->offset,
+        mddev->bitmap_ops->start_write(mddev, md_io_clone->offset,
                                       md_io_clone->sectors);
 }
 
 static void md_bitmap_end(struct mddev *mddev, struct md_io_clone *md_io_clone)
 {
-        mddev->bitmap_ops->endwrite(mddev, md_io_clone->offset,
+        mddev->bitmap_ops->end_write(mddev, md_io_clone->offset,
                                     md_io_clone->sectors);
 }
 
@@ -293,3 +293,13 @@ static inline bool raid1_should_read_first(struct mddev *mddev,
 
         return false;
 }
+
+/*
+ * bio with REQ_RAHEAD or REQ_NOWAIT can fail at anytime, before such IO is
+ * submitted to the underlying disks, hence don't record badblocks or retry
+ * in this case.
+ */
+static inline bool raid1_should_handle_error(struct bio *bio)
+{
+        return !(bio->bi_opf & (REQ_RAHEAD | REQ_NOWAIT));
+}
@@ -373,14 +373,16 @@ static void raid1_end_read_request(struct bio *bio)
          */
         update_head_pos(r1_bio->read_disk, r1_bio);
 
-        if (uptodate)
+        if (uptodate) {
                 set_bit(R1BIO_Uptodate, &r1_bio->state);
-        else if (test_bit(FailFast, &rdev->flags) &&
-                 test_bit(R1BIO_FailFast, &r1_bio->state))
+        } else if (test_bit(FailFast, &rdev->flags) &&
+                   test_bit(R1BIO_FailFast, &r1_bio->state)) {
                 /* This was a fail-fast read so we definitely
                  * want to retry */
                 ;
-        else {
+        } else if (!raid1_should_handle_error(bio)) {
+                uptodate = 1;
+        } else {
                 /* If all other devices have failed, we want to return
                  * the error upwards rather than fail the last device.
                  * Here we redefine "uptodate" to mean "Don't want to retry"
@@ -451,16 +453,15 @@ static void raid1_end_write_request(struct bio *bio)
         struct bio *to_put = NULL;
         int mirror = find_bio_disk(r1_bio, bio);
         struct md_rdev *rdev = conf->mirrors[mirror].rdev;
-        bool discard_error;
         sector_t lo = r1_bio->sector;
         sector_t hi = r1_bio->sector + r1_bio->sectors;
-
-        discard_error = bio->bi_status && bio_op(bio) == REQ_OP_DISCARD;
+        bool ignore_error = !raid1_should_handle_error(bio) ||
+                (bio->bi_status && bio_op(bio) == REQ_OP_DISCARD);
 
         /*
          * 'one mirror IO has finished' event handler:
          */
-        if (bio->bi_status && !discard_error) {
+        if (bio->bi_status && !ignore_error) {
                 set_bit(WriteErrorSeen, &rdev->flags);
                 if (!test_and_set_bit(WantReplacement, &rdev->flags))
                         set_bit(MD_RECOVERY_NEEDED, &
@@ -511,7 +512,7 @@ static void raid1_end_write_request(struct bio *bio)
 
         /* Maybe we can clear some bad blocks. */
         if (rdev_has_badblock(rdev, r1_bio->sector, r1_bio->sectors) &&
-            !discard_error) {
+            !ignore_error) {
                 r1_bio->bios[mirror] = IO_MADE_GOOD;
                 set_bit(R1BIO_MadeGood, &r1_bio->state);
         }
@@ -399,6 +399,8 @@ static void raid10_end_read_request(struct bio *bio)
                  * wait for the 'master' bio.
                  */
                 set_bit(R10BIO_Uptodate, &r10_bio->state);
+        } else if (!raid1_should_handle_error(bio)) {
+                uptodate = 1;
         } else {
                 /* If all other devices that store this block have
                  * failed, we want to return the error upwards rather
@@ -456,9 +458,8 @@ static void raid10_end_write_request(struct bio *bio)
         int slot, repl;
         struct md_rdev *rdev = NULL;
         struct bio *to_put = NULL;
-        bool discard_error;
-
-        discard_error = bio->bi_status && bio_op(bio) == REQ_OP_DISCARD;
+        bool ignore_error = !raid1_should_handle_error(bio) ||
+                (bio->bi_status && bio_op(bio) == REQ_OP_DISCARD);
 
         dev = find_bio_disk(conf, r10_bio, bio, &slot, &repl);
 
@@ -472,7 +473,7 @@ static void raid10_end_write_request(struct bio *bio)
         /*
          * this branch is our 'one mirror IO has finished' event handler:
          */
-        if (bio->bi_status && !discard_error) {
+        if (bio->bi_status && !ignore_error) {
                 if (repl)
                         /* Never record new bad blocks to replacement,
                          * just fail it.
@@ -527,7 +528,7 @@ static void raid10_end_write_request(struct bio *bio)
                 /* Maybe we can clear some bad blocks. */
                 if (rdev_has_badblock(rdev, r10_bio->devs[slot].addr,
                                       r10_bio->sectors) &&
-                    !discard_error) {
+                    !ignore_error) {
                         bio_put(bio);
                         if (repl)
                                 r10_bio->devs[slot].repl_bio = IO_MADE_GOOD;
@@ -471,7 +471,7 @@ EXPORT_SYMBOL_GPL(nvme_auth_generate_key);
  * @c1: Value of challenge C1
  * @c2: Value of challenge C2
  * @hash_len: Hash length of the hash algorithm
- * @ret_psk: Pointer too the resulting generated PSK
+ * @ret_psk: Pointer to the resulting generated PSK
  * @ret_len: length of @ret_psk
  *
  * Generate a PSK for TLS as specified in NVMe base specification, section
@@ -759,8 +759,8 @@ int nvme_auth_derive_tls_psk(int hmac_id, u8 *psk, size_t psk_len,
                 goto out_free_prk;
 
         /*
-         * 2 addtional bytes for the length field from HDKF-Expand-Label,
-         * 2 addtional bytes for the HMAC ID, and one byte for the space
+         * 2 additional bytes for the length field from HDKF-Expand-Label,
+         * 2 additional bytes for the HMAC ID, and one byte for the space
          * separator.
          */
         info_len = strlen(psk_digest) + strlen(psk_prefix) + 5;
@@ -106,7 +106,7 @@ config NVME_TCP_TLS
         help
           Enables TLS encryption for NVMe TCP using the netlink handshake API.
 
-          The TLS handshake daemon is availble at
+          The TLS handshake daemon is available at
           https://github.com/oracle/ktls-utils.
 
           If unsure, say N.
@@ -145,7 +145,7 @@ static const char * const nvme_statuses[] = {
         [NVME_SC_BAD_ATTRIBUTES] = "Conflicting Attributes",
         [NVME_SC_INVALID_PI] = "Invalid Protection Information",
         [NVME_SC_READ_ONLY] = "Attempted Write to Read Only Range",
-        [NVME_SC_ONCS_NOT_SUPPORTED] = "ONCS Not Supported",
+        [NVME_SC_CMD_SIZE_LIM_EXCEEDED ] = "Command Size Limits Exceeded",
         [NVME_SC_ZONE_BOUNDARY_ERROR] = "Zoned Boundary Error",
         [NVME_SC_ZONE_FULL] = "Zone Is Full",
         [NVME_SC_ZONE_READ_ONLY] = "Zone Is Read Only",
@@ -290,7 +290,6 @@ static blk_status_t nvme_error_status(u16 status)
         case NVME_SC_NS_NOT_READY:
                 return BLK_STS_TARGET;
         case NVME_SC_BAD_ATTRIBUTES:
-        case NVME_SC_ONCS_NOT_SUPPORTED:
         case NVME_SC_INVALID_OPCODE:
         case NVME_SC_INVALID_FIELD:
         case NVME_SC_INVALID_NS:
@@ -1027,7 +1026,7 @@ static inline blk_status_t nvme_setup_rw(struct nvme_ns *ns,
 
         if (ns->head->ms) {
                 /*
-                 * If formated with metadata, the block layer always provides a
+                 * If formatted with metadata, the block layer always provides a
                  * metadata buffer if CONFIG_BLK_DEV_INTEGRITY is enabled. Else
                  * we enable the PRACT bit for protection information or set the
                  * namespace capacity to zero to prevent any I/O.
@@ -582,7 +582,7 @@ EXPORT_SYMBOL_GPL(nvmf_connect_io_queue);
  * Do not retry when:
  *
  * - the DNR bit is set and the specification states no further connect
- *   attempts with the same set of paramenters should be attempted.
+ *   attempts with the same set of parameters should be attempted.
  *
  * - when the authentication attempt fails, because the key was invalid.
  *   This error code is set on the host side.
@@ -80,7 +80,7 @@ enum {
  * @transport:  Holds the fabric transport "technology name" (for a lack of
  *              better description) that will be used by an NVMe controller
  *              being added.
- * @subsysnqn:  Hold the fully qualified NQN subystem name (format defined
+ * @subsysnqn:  Hold the fully qualified NQN subsystem name (format defined
  *              in the NVMe specification, "NVMe Qualified Names").
  * @traddr:     The transport-specific TRADDR field for a port on the
  *              subsystem which is adding a controller.
@@ -156,7 +156,7 @@ struct nvmf_ctrl_options {
  * @create_ctrl():      function pointer that points to a non-NVMe
  *                      implementation-specific fabric technology
  *                      that would go into starting up that fabric
- *                      for the purpose of conneciton to an NVMe controller
+ *                      for the purpose of connection to an NVMe controller
  *                      using that fabric technology.
  *
  * Notes:
@@ -165,7 +165,7 @@ struct nvmf_ctrl_options {
  * 2. create_ctrl() must be defined (even if it does nothing)
  * 3. struct nvmf_transport_ops must be statically allocated in the
  *    modules .bss section so that a pure module_get on @module
- *    prevents the memory from beeing freed.
+ *    prevents the memory from being freed.
  */
 struct nvmf_transport_ops {
         struct list_head entry;
@@ -1955,7 +1955,7 @@ nvme_fc_fcpio_done(struct nvmefc_fcp_req *req)
         }
 
         /*
-         * For the linux implementation, if we have an unsuccesful
+         * For the linux implementation, if we have an unsuccessful
          * status, they blk-mq layer can typically be called with the
          * non-zero status and the content of the cqe isn't important.
          */
@@ -2479,7 +2479,7 @@ __nvme_fc_abort_outstanding_ios(struct nvme_fc_ctrl *ctrl, bool start_queues)
          * writing the registers for shutdown and polling (call
          * nvme_disable_ctrl()). Given a bunch of i/o was potentially
          * just aborted and we will wait on those contexts, and given
-         * there was no indication of how live the controlelr is on the
+         * there was no indication of how live the controller is on the
          * link, don't send more io to create more contexts for the
          * shutdown. Let the controller fail via keepalive failure if
          * its still present.
@ -493,13 +493,15 @@ static int nvme_uring_cmd_io(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
|
||||||
d.timeout_ms = READ_ONCE(cmd->timeout_ms);
|
d.timeout_ms = READ_ONCE(cmd->timeout_ms);
|
||||||
|
|
||||||
if (d.data_len && (ioucmd->flags & IORING_URING_CMD_FIXED)) {
|
if (d.data_len && (ioucmd->flags & IORING_URING_CMD_FIXED)) {
|
||||||
/* fixedbufs is only for non-vectored io */
|
int ddir = nvme_is_write(&c) ? WRITE : READ;
|
||||||
if (vec)
|
|
||||||
return -EINVAL;
|
|
||||||
|
|
||||||
|
if (vec)
|
||||||
|
ret = io_uring_cmd_import_fixed_vec(ioucmd,
|
||||||
|
u64_to_user_ptr(d.addr), d.data_len,
|
||||||
|
ddir, &iter, issue_flags);
|
||||||
|
else
|
||||||
ret = io_uring_cmd_import_fixed(d.addr, d.data_len,
|
ret = io_uring_cmd_import_fixed(d.addr, d.data_len,
|
||||||
nvme_is_write(&c) ? WRITE : READ, &iter, ioucmd,
|
ddir, &iter, ioucmd, issue_flags);
|
||||||
issue_flags);
|
|
||||||
if (ret < 0)
|
if (ret < 0)
|
||||||
return ret;
|
return ret;
|
||||||
|
|
||||||
|
|
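The hunk above wires the passthrough path up to vectored registered buffers: one command may now describe several non-contiguous user buffers. Ordinary POSIX scatter/gather I/O is the same concept in miniature; the toy program below (plain writev(), nothing NVMe-specific) only illustrates what "vectored" buys, it is not the passthrough API itself.

#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

int main(void)
{
	char hdr[] = "header ";
	char payload[] = "payload\n";
	struct iovec iov[2] = {
		{ .iov_base = hdr,     .iov_len = strlen(hdr)     },
		{ .iov_base = payload, .iov_len = strlen(payload) },
	};

	/* one call, two scattered buffers; the NVMe *_VEC passthrough
	 * commands carry the same kind of buffer description */
	return writev(STDOUT_FILENO, iov, 2) < 0;
}
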
@@ -521,7 +523,7 @@ static int nvme_uring_cmd_io(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
	if (d.data_len) {
		ret = nvme_map_user_request(req, d.addr, d.data_len,
			nvme_to_user_ptr(d.metadata), d.metadata_len,
-			map_iter, vec);
+			map_iter, vec ? NVME_IOCTL_VEC : 0);
		if (ret)
			goto out_free_req;
	}

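The map_iter hunk above also removes an implicit bool-to-flags conversion: passing vec straight through would hand the callee the integer 1 rather than the intended flag bit. A minimal stand-alone illustration (DEMO_IOCTL_VEC is an invented flag value, not the kernel's NVME_IOCTL_VEC):

#include <stdbool.h>
#include <stdio.h>

#define DEMO_IOCTL_VEC (1 << 1)	/* invented stand-in for a flag bit */

static void demo_map(unsigned int flags)
{
	printf("vectored: %s\n", (flags & DEMO_IOCTL_VEC) ? "yes" : "no");
}

int main(void)
{
	bool vec = true;

	demo_map(vec ? DEMO_IOCTL_VEC : 0);	/* prints "yes" */
	demo_map(vec);				/* bool decays to 1: prints "no" */
	return 0;
}
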
@@ -727,7 +729,7 @@ int nvme_ns_head_ioctl(struct block_device *bdev, blk_mode_t mode,

	/*
	 * Handle ioctls that apply to the controller instead of the namespace
-	 * seperately and drop the ns SRCU reference early. This avoids a
+	 * separately and drop the ns SRCU reference early. This avoids a
	 * deadlock when deleting namespaces using the passthrough interface.
	 */
	if (is_ctrl_ioctl(cmd))

@@ -760,7 +760,7 @@ int nvme_mpath_alloc_disk(struct nvme_ctrl *ctrl, struct nvme_ns_head *head)
	 * controller's scan_work context. If a path error occurs here, the IO
	 * will wait until a path becomes available or all paths are torn down,
	 * but that action also occurs within scan_work, so it would deadlock.
-	 * Defer the partion scan to a different context that does not block
+	 * Defer the partition scan to a different context that does not block
	 * scan_work.
	 */
	set_bit(GD_SUPPRESS_PART_SCAN, &head->disk->state);

@@ -523,7 +523,7 @@ static inline bool nvme_ns_head_multipath(struct nvme_ns_head *head)
enum nvme_ns_features {
	NVME_NS_EXT_LBAS = 1 << 0, /* support extended LBA format */
	NVME_NS_METADATA_SUPPORTED = 1 << 1, /* support getting generated md */
-	NVME_NS_DEAC = 1 << 2, /* DEAC bit in Write Zeores supported */
+	NVME_NS_DEAC = 1 << 2, /* DEAC bit in Write Zeroes supported */
};

struct nvme_ns {

@@ -3015,7 +3015,7 @@ static void nvme_reset_work(struct work_struct *work)
		goto out;

	/*
-	 * Freeze and update the number of I/O queues as thos might have
+	 * Freeze and update the number of I/O queues as those might have
	 * changed. If there are no I/O queues left after this reset, keep the
	 * controller around but remove all namespaces.
	 */

@@ -3186,7 +3186,7 @@ static unsigned long check_vendor_combination_bug(struct pci_dev *pdev)
		/*
		 * Exclude some Kingston NV1 and A2000 devices from
		 * NVME_QUIRK_SIMPLE_SUSPEND. Do a full suspend to save a
-		 * lot fo energy with s2idle sleep on some TUXEDO platforms.
+		 * lot of energy with s2idle sleep on some TUXEDO platforms.
		 */
		if (dmi_match(DMI_BOARD_NAME, "NS5X_NS7XAU") ||
		    dmi_match(DMI_BOARD_NAME, "NS5x_7xAU") ||

@@ -82,8 +82,6 @@ static int nvme_status_to_pr_err(int status)
		return PR_STS_SUCCESS;
	case NVME_SC_RESERVATION_CONFLICT:
		return PR_STS_RESERVATION_CONFLICT;
-	case NVME_SC_ONCS_NOT_SUPPORTED:
-		return -EOPNOTSUPP;
	case NVME_SC_BAD_ATTRIBUTES:
	case NVME_SC_INVALID_OPCODE:
	case NVME_SC_INVALID_FIELD:

@@ -221,7 +221,7 @@ static struct nvme_rdma_qe *nvme_rdma_alloc_ring(struct ib_device *ibdev,

	/*
	 * Bind the CQEs (post recv buffers) DMA mapping to the RDMA queue
-	 * lifetime. It's safe, since any chage in the underlying RDMA device
+	 * lifetime. It's safe, since any change in the underlying RDMA device
	 * will issue error recovery and queue re-creation.
	 */
	for (i = 0; i < ib_queue_size; i++) {

@@ -800,7 +800,7 @@ static int nvme_rdma_configure_admin_queue(struct nvme_rdma_ctrl *ctrl,

	/*
	 * Bind the async event SQE DMA mapping to the admin queue lifetime.
-	 * It's safe, since any chage in the underlying RDMA device will issue
+	 * It's safe, since any change in the underlying RDMA device will issue
	 * error recovery and queue re-creation.
	 */
	error = nvme_rdma_alloc_qe(ctrl->device->dev, &ctrl->async_event_sqe,

@@ -452,7 +452,8 @@ nvme_tcp_fetch_request(struct nvme_tcp_queue *queue)
		return NULL;
	}

-	list_del(&req->entry);
+	list_del_init(&req->entry);
+	init_llist_node(&req->lentry);
	return req;
}

@@ -565,6 +566,8 @@ static int nvme_tcp_init_request(struct blk_mq_tag_set *set,
	req->queue = queue;
	nvme_req(rq)->ctrl = &ctrl->ctrl;
	nvme_req(rq)->cmd = &pdu->cmd;
+	init_llist_node(&req->lentry);
+	INIT_LIST_HEAD(&req->entry);

	return 0;
}

@@ -769,6 +772,14 @@ static int nvme_tcp_handle_r2t(struct nvme_tcp_queue *queue,
		return -EPROTO;
	}

+	if (llist_on_list(&req->lentry) ||
+	    !list_empty(&req->entry)) {
+		dev_err(queue->ctrl->ctrl.device,
+			"req %d unexpected r2t while processing request\n",
+			rq->tag);
+		return -EPROTO;
+	}
+
	req->pdu_len = 0;
	req->h2cdata_left = r2t_length;
	req->h2cdata_offset = r2t_offset;

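The three nvme-tcp hunks above lean on one idiom: whenever a request leaves the send/poll lists, its nodes are re-initialized (list_del_init(), init_llist_node()) so that list_empty()/llist_on_list() stay valid "is it still queued?" questions later, which is exactly what the new r2t sanity check asks. A stand-alone userspace sketch of why delete-and-reinit keeps that question answerable (all node_/demo_ names are invented):

#include <stdbool.h>
#include <stdio.h>

struct node {
	struct node *prev, *next;
};

static void node_init(struct node *n)          { n->prev = n->next = n; }
static bool node_on_list(const struct node *n) { return n->next != n; }

static void demo_add_tail(struct node *head, struct node *n)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void demo_del_init(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	node_init(n);		/* the removed node now reports "not queued" */
}

int main(void)
{
	struct node head, req;

	node_init(&head);
	node_init(&req);

	demo_add_tail(&head, &req);
	printf("queued: %d\n", node_on_list(&req));	/* 1 */

	demo_del_init(&req);
	printf("queued: %d\n", node_on_list(&req));	/* 0 */
	return 0;
}
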
@@ -1355,7 +1366,7 @@ static int nvme_tcp_try_recv(struct nvme_tcp_queue *queue)
	queue->nr_cqe = 0;
	consumed = sock->ops->read_sock(sk, &rd_desc, nvme_tcp_recv_skb);
	release_sock(sk);
-	return consumed;
+	return consumed == -EAGAIN ? 0 : consumed;
}

static void nvme_tcp_io_work(struct work_struct *w)

@@ -1383,6 +1394,11 @@ static void nvme_tcp_io_work(struct work_struct *w)
		else if (unlikely(result < 0))
			return;

+		/* did we get some space after spending time in recv? */
+		if (nvme_tcp_queue_has_pending(queue) &&
+		    sk_stream_is_writeable(queue->sock->sk))
+			pending = true;
+
		if (!pending || !queue->rd_enabled)
			return;

@@ -2350,7 +2366,7 @@ static int nvme_tcp_setup_ctrl(struct nvme_ctrl *ctrl, bool new)
		nvme_tcp_teardown_admin_queue(ctrl, false);
		ret = nvme_tcp_configure_admin_queue(ctrl, false);
		if (ret)
-			return ret;
+			goto destroy_admin;
	}

	if (ctrl->icdoff) {

@@ -2594,6 +2610,8 @@ static void nvme_tcp_submit_async_event(struct nvme_ctrl *arg)
	ctrl->async_req.offset = 0;
	ctrl->async_req.curr_bio = NULL;
	ctrl->async_req.data_len = 0;
+	init_llist_node(&ctrl->async_req.lentry);
+	INIT_LIST_HEAD(&ctrl->async_req.entry);

	nvme_tcp_queue_request(&ctrl->async_req, true);
}

@@ -1165,7 +1165,7 @@ static void nvmet_execute_identify(struct nvmet_req *req)
 * A "minimum viable" abort implementation: the command is mandatory in the
 * spec, but we are not required to do any useful work. We couldn't really
 * do a useful abort, so don't bother even with waiting for the command
- * to be exectuted and return immediately telling the command to abort
+ * to be executed and return immediately telling the command to abort
 * wasn't found.
 */
static void nvmet_execute_abort(struct nvmet_req *req)

@@ -62,14 +62,7 @@ inline u16 errno_to_nvme_status(struct nvmet_req *req, int errno)
		return NVME_SC_LBA_RANGE | NVME_STATUS_DNR;
	case -EOPNOTSUPP:
		req->error_loc = offsetof(struct nvme_common_command, opcode);
-		switch (req->cmd->common.opcode) {
-		case nvme_cmd_dsm:
-		case nvme_cmd_write_zeroes:
-			return NVME_SC_ONCS_NOT_SUPPORTED | NVME_STATUS_DNR;
-		default:
-			return NVME_SC_INVALID_OPCODE | NVME_STATUS_DNR;
-		}
-		break;
+		return NVME_SC_INVALID_OPCODE | NVME_STATUS_DNR;
	case -ENODATA:
		req->error_loc = offsetof(struct nvme_rw_command, nsid);
		return NVME_SC_ACCESS_DENIED;

@@ -651,7 +644,7 @@ void nvmet_ns_disable(struct nvmet_ns *ns)
	 * Now that we removed the namespaces from the lookup list, we
	 * can kill the per_cpu ref and wait for any remaining references
	 * to be dropped, as well as a RCU grace period for anyone only
-	 * using the namepace under rcu_read_lock(). Note that we can't
+	 * using the namespace under rcu_read_lock(). Note that we can't
	 * use call_rcu here as we need to ensure the namespaces have
	 * been fully destroyed before unloading the module.
	 */

@@ -1339,7 +1339,7 @@ nvmet_fc_portentry_rebind_tgt(struct nvmet_fc_tgtport *tgtport)
/**
 * nvmet_fc_register_targetport - transport entry point called by an
 *                         LLDD to register the existence of a local
- *                         NVME subystem FC port.
+ *                         NVME subsystem FC port.
 * @pinfo:     pointer to information about the port to be registered
 * @template:  LLDD entrypoints and operational parameters for the port
 * @dev:       physical hardware device node port corresponds to. Will be

@@ -133,7 +133,7 @@ u16 blk_to_nvme_status(struct nvmet_req *req, blk_status_t blk_sts)
	 * Right now there exists M : 1 mapping between block layer error
	 * to the NVMe status code (see nvme_error_status()). For consistency,
	 * when we reverse map we use most appropriate NVMe Status code from
-	 * the group of the NVMe staus codes used in the nvme_error_status().
+	 * the group of the NVMe status codes used in the nvme_error_status().
	 */
	switch (blk_sts) {
	case BLK_STS_NOSPC:

@@ -145,15 +145,8 @@ u16 blk_to_nvme_status(struct nvmet_req *req, blk_status_t blk_sts)
		req->error_loc = offsetof(struct nvme_rw_command, slba);
		break;
	case BLK_STS_NOTSUPP:
-		req->error_loc = offsetof(struct nvme_common_command, opcode);
-		switch (req->cmd->common.opcode) {
-		case nvme_cmd_dsm:
-		case nvme_cmd_write_zeroes:
-			status = NVME_SC_ONCS_NOT_SUPPORTED | NVME_STATUS_DNR;
-			break;
-		default:
-			status = NVME_SC_INVALID_OPCODE | NVME_STATUS_DNR;
-		}
+		status = NVME_SC_INVALID_OPCODE | NVME_STATUS_DNR;
+		req->error_loc = offsetof(struct nvme_common_command, opcode);
		break;
	case BLK_STS_MEDIUM:
		status = NVME_SC_ACCESS_DENIED;

@@ -99,7 +99,7 @@ static u16 nvmet_passthru_override_id_ctrl(struct nvmet_req *req)

	/*
	 * The passthru NVMe driver may have a limit on the number of segments
-	 * which depends on the host's memory fragementation. To solve this,
+	 * which depends on the host's memory fragmentation. To solve this,
	 * ensure mdts is limited to the pages equal to the number of segments.
	 */
	max_hw_sectors = min_not_zero(pctrl->max_segments << PAGE_SECTORS_SHIFT,

@@ -2171,7 +2171,7 @@ enum {
	NVME_SC_BAD_ATTRIBUTES = 0x180,
	NVME_SC_INVALID_PI = 0x181,
	NVME_SC_READ_ONLY = 0x182,
-	NVME_SC_ONCS_NOT_SUPPORTED = 0x183,
+	NVME_SC_CMD_SIZE_LIM_EXCEEDED = 0x183,

	/*
	 * I/O Command Set Specific - Fabrics commands:

@@ -272,6 +272,15 @@
 */
#define UBLK_F_QUIESCE (1ULL << 12)

+/*
+ * If this feature is set, ublk_drv supports each (qid,tag) pair having
+ * its own independent daemon task that is responsible for handling it.
+ * If it is not set, daemons are per-queue instead, so for two pairs
+ * (qid1,tag1) and (qid2,tag2), if qid1 == qid2, then the same task must
+ * be responsible for handling (qid1,tag1) and (qid2,tag2).
+ */
+#define UBLK_F_PER_IO_DAEMON (1ULL << 13)
+
/* device state */
#define UBLK_S_DEV_DEAD 0
#define UBLK_S_DEV_LIVE 1

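A ublk server would normally gate its per-io-daemon mode on this bit being advertised in the feature mask it reads back from the driver (GET_FEATURES control command). A minimal sketch of that check, with the flag value simply mirrored from the define above:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define DEMO_UBLK_F_PER_IO_DAEMON (1ULL << 13)	/* mirrors UBLK_F_PER_IO_DAEMON */

static bool can_use_per_io_daemons(uint64_t features)
{
	return (features & DEMO_UBLK_F_PER_IO_DAEMON) != 0;
}

int main(void)
{
	uint64_t features = DEMO_UBLK_F_PER_IO_DAEMON;	/* pretend driver reply */

	printf("per-io daemons: %s\n",
	       can_use_per_io_daemons(features) ? "available" : "not available");
	return 0;
}
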
@ -19,6 +19,7 @@ TEST_PROGS += test_generic_08.sh
|
||||||
TEST_PROGS += test_generic_09.sh
|
TEST_PROGS += test_generic_09.sh
|
||||||
TEST_PROGS += test_generic_10.sh
|
TEST_PROGS += test_generic_10.sh
|
||||||
TEST_PROGS += test_generic_11.sh
|
TEST_PROGS += test_generic_11.sh
|
||||||
|
TEST_PROGS += test_generic_12.sh
|
||||||
|
|
||||||
TEST_PROGS += test_null_01.sh
|
TEST_PROGS += test_null_01.sh
|
||||||
TEST_PROGS += test_null_02.sh
|
TEST_PROGS += test_null_02.sh
|
||||||
|
|
|
||||||
|
|
@@ -46,9 +46,9 @@ static int ublk_fault_inject_queue_io(struct ublk_queue *q, int tag)
		.tv_nsec = (long long)q->dev->private_data,
	};

-	ublk_queue_alloc_sqes(q, &sqe, 1);
+	ublk_io_alloc_sqes(ublk_get_io(q, tag), &sqe, 1);
	io_uring_prep_timeout(sqe, &ts, 1, 0);
-	sqe->user_data = build_user_data(tag, ublksrv_get_op(iod), 0, 1);
+	sqe->user_data = build_user_data(tag, ublksrv_get_op(iod), 0, q->q_id, 1);

	ublk_queued_tgt_io(q, tag, 1);

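With per-io daemons a completion may be reaped by a thread that serves several queues, so build_user_data() now also packs the queue id into the 64-bit user_data. The packing below is only an illustration of the idea; the field widths are assumptions for the demo, not kublk's authoritative layout.

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed layout: tag in bits 0-15, op in 16-23, tgt_data in 24-39,
 * q_id in 56-62, "target io" marker in bit 63.
 */
static uint64_t demo_build_user_data(unsigned tag, unsigned op,
				     unsigned tgt_data, unsigned q_id,
				     unsigned is_tgt_io)
{
	assert(!(tag >> 16) && !(op >> 8) && !(tgt_data >> 16) && !(q_id >> 7));
	return tag | (op << 16) | ((uint64_t)tgt_data << 24) |
	       ((uint64_t)q_id << 56) | ((uint64_t)!!is_tgt_io << 63);
}

static unsigned demo_user_data_to_q_id(uint64_t d) { return (d >> 56) & 0x7f; }
static unsigned demo_user_data_to_tag(uint64_t d)  { return d & 0xffff; }

int main(void)
{
	uint64_t d = demo_build_user_data(7, 2, 0, 3, 1);

	printf("q_id=%u tag=%u\n", demo_user_data_to_q_id(d),
	       demo_user_data_to_tag(d));
	return 0;
}
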
@ -18,11 +18,11 @@ static int loop_queue_flush_io(struct ublk_queue *q, const struct ublksrv_io_des
|
||||||
unsigned ublk_op = ublksrv_get_op(iod);
|
unsigned ublk_op = ublksrv_get_op(iod);
|
||||||
struct io_uring_sqe *sqe[1];
|
struct io_uring_sqe *sqe[1];
|
||||||
|
|
||||||
ublk_queue_alloc_sqes(q, sqe, 1);
|
ublk_io_alloc_sqes(ublk_get_io(q, tag), sqe, 1);
|
||||||
io_uring_prep_fsync(sqe[0], 1 /*fds[1]*/, IORING_FSYNC_DATASYNC);
|
io_uring_prep_fsync(sqe[0], 1 /*fds[1]*/, IORING_FSYNC_DATASYNC);
|
||||||
io_uring_sqe_set_flags(sqe[0], IOSQE_FIXED_FILE);
|
io_uring_sqe_set_flags(sqe[0], IOSQE_FIXED_FILE);
|
||||||
/* bit63 marks us as tgt io */
|
/* bit63 marks us as tgt io */
|
||||||
sqe[0]->user_data = build_user_data(tag, ublk_op, 0, 1);
|
sqe[0]->user_data = build_user_data(tag, ublk_op, 0, q->q_id, 1);
|
||||||
return 1;
|
return 1;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
@ -36,7 +36,7 @@ static int loop_queue_tgt_rw_io(struct ublk_queue *q, const struct ublksrv_io_de
|
||||||
void *addr = (zc | auto_zc) ? NULL : (void *)iod->addr;
|
void *addr = (zc | auto_zc) ? NULL : (void *)iod->addr;
|
||||||
|
|
||||||
if (!zc || auto_zc) {
|
if (!zc || auto_zc) {
|
||||||
ublk_queue_alloc_sqes(q, sqe, 1);
|
ublk_io_alloc_sqes(ublk_get_io(q, tag), sqe, 1);
|
||||||
if (!sqe[0])
|
if (!sqe[0])
|
||||||
return -ENOMEM;
|
return -ENOMEM;
|
||||||
|
|
||||||
|
|
@ -48,26 +48,26 @@ static int loop_queue_tgt_rw_io(struct ublk_queue *q, const struct ublksrv_io_de
|
||||||
sqe[0]->buf_index = tag;
|
sqe[0]->buf_index = tag;
|
||||||
io_uring_sqe_set_flags(sqe[0], IOSQE_FIXED_FILE);
|
io_uring_sqe_set_flags(sqe[0], IOSQE_FIXED_FILE);
|
||||||
/* bit63 marks us as tgt io */
|
/* bit63 marks us as tgt io */
|
||||||
sqe[0]->user_data = build_user_data(tag, ublk_op, 0, 1);
|
sqe[0]->user_data = build_user_data(tag, ublk_op, 0, q->q_id, 1);
|
||||||
return 1;
|
return 1;
|
||||||
}
|
}
|
||||||
|
|
||||||
ublk_queue_alloc_sqes(q, sqe, 3);
|
ublk_io_alloc_sqes(ublk_get_io(q, tag), sqe, 3);
|
||||||
|
|
||||||
io_uring_prep_buf_register(sqe[0], 0, tag, q->q_id, tag);
|
io_uring_prep_buf_register(sqe[0], 0, tag, q->q_id, ublk_get_io(q, tag)->buf_index);
|
||||||
sqe[0]->flags |= IOSQE_CQE_SKIP_SUCCESS | IOSQE_IO_HARDLINK;
|
sqe[0]->flags |= IOSQE_CQE_SKIP_SUCCESS | IOSQE_IO_HARDLINK;
|
||||||
sqe[0]->user_data = build_user_data(tag,
|
sqe[0]->user_data = build_user_data(tag,
|
||||||
ublk_cmd_op_nr(sqe[0]->cmd_op), 0, 1);
|
ublk_cmd_op_nr(sqe[0]->cmd_op), 0, q->q_id, 1);
|
||||||
|
|
||||||
io_uring_prep_rw(op, sqe[1], 1 /*fds[1]*/, 0,
|
io_uring_prep_rw(op, sqe[1], 1 /*fds[1]*/, 0,
|
||||||
iod->nr_sectors << 9,
|
iod->nr_sectors << 9,
|
||||||
iod->start_sector << 9);
|
iod->start_sector << 9);
|
||||||
sqe[1]->buf_index = tag;
|
sqe[1]->buf_index = tag;
|
||||||
sqe[1]->flags |= IOSQE_FIXED_FILE | IOSQE_IO_HARDLINK;
|
sqe[1]->flags |= IOSQE_FIXED_FILE | IOSQE_IO_HARDLINK;
|
||||||
sqe[1]->user_data = build_user_data(tag, ublk_op, 0, 1);
|
sqe[1]->user_data = build_user_data(tag, ublk_op, 0, q->q_id, 1);
|
||||||
|
|
||||||
io_uring_prep_buf_unregister(sqe[2], 0, tag, q->q_id, tag);
|
io_uring_prep_buf_unregister(sqe[2], 0, tag, q->q_id, ublk_get_io(q, tag)->buf_index);
|
||||||
sqe[2]->user_data = build_user_data(tag, ublk_cmd_op_nr(sqe[2]->cmd_op), 0, 1);
|
sqe[2]->user_data = build_user_data(tag, ublk_cmd_op_nr(sqe[2]->cmd_op), 0, q->q_id, 1);
|
||||||
|
|
||||||
return 2;
|
return 2;
|
||||||
}
|
}
|
||||||
|
|
|
||||||
|
|
@ -348,8 +348,8 @@ static void ublk_ctrl_dump(struct ublk_dev *dev)
|
||||||
|
|
||||||
for (i = 0; i < info->nr_hw_queues; i++) {
|
for (i = 0; i < info->nr_hw_queues; i++) {
|
||||||
ublk_print_cpu_set(&affinity[i], buf, sizeof(buf));
|
ublk_print_cpu_set(&affinity[i], buf, sizeof(buf));
|
||||||
printf("\tqueue %u: tid %d affinity(%s)\n",
|
printf("\tqueue %u: affinity(%s)\n",
|
||||||
i, dev->q[i].tid, buf);
|
i, buf);
|
||||||
}
|
}
|
||||||
free(affinity);
|
free(affinity);
|
||||||
}
|
}
|
||||||
|
|
@ -412,16 +412,6 @@ static void ublk_queue_deinit(struct ublk_queue *q)
|
||||||
int i;
|
int i;
|
||||||
int nr_ios = q->q_depth;
|
int nr_ios = q->q_depth;
|
||||||
|
|
||||||
io_uring_unregister_buffers(&q->ring);
|
|
||||||
|
|
||||||
io_uring_unregister_ring_fd(&q->ring);
|
|
||||||
|
|
||||||
if (q->ring.ring_fd > 0) {
|
|
||||||
io_uring_unregister_files(&q->ring);
|
|
||||||
close(q->ring.ring_fd);
|
|
||||||
q->ring.ring_fd = -1;
|
|
||||||
}
|
|
||||||
|
|
||||||
if (q->io_cmd_buf)
|
if (q->io_cmd_buf)
|
||||||
munmap(q->io_cmd_buf, ublk_queue_cmd_buf_sz(q));
|
munmap(q->io_cmd_buf, ublk_queue_cmd_buf_sz(q));
|
||||||
|
|
||||||
|
|
@ -429,20 +419,30 @@ static void ublk_queue_deinit(struct ublk_queue *q)
|
||||||
free(q->ios[i].buf_addr);
|
free(q->ios[i].buf_addr);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
static void ublk_thread_deinit(struct ublk_thread *t)
|
||||||
|
{
|
||||||
|
io_uring_unregister_buffers(&t->ring);
|
||||||
|
|
||||||
|
io_uring_unregister_ring_fd(&t->ring);
|
||||||
|
|
||||||
|
if (t->ring.ring_fd > 0) {
|
||||||
|
io_uring_unregister_files(&t->ring);
|
||||||
|
close(t->ring.ring_fd);
|
||||||
|
t->ring.ring_fd = -1;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
static int ublk_queue_init(struct ublk_queue *q, unsigned extra_flags)
|
static int ublk_queue_init(struct ublk_queue *q, unsigned extra_flags)
|
||||||
{
|
{
|
||||||
struct ublk_dev *dev = q->dev;
|
struct ublk_dev *dev = q->dev;
|
||||||
int depth = dev->dev_info.queue_depth;
|
int depth = dev->dev_info.queue_depth;
|
||||||
int i, ret = -1;
|
int i;
|
||||||
int cmd_buf_size, io_buf_size;
|
int cmd_buf_size, io_buf_size;
|
||||||
unsigned long off;
|
unsigned long off;
|
||||||
int ring_depth = dev->tgt.sq_depth, cq_depth = dev->tgt.cq_depth;
|
|
||||||
|
|
||||||
q->tgt_ops = dev->tgt.ops;
|
q->tgt_ops = dev->tgt.ops;
|
||||||
q->state = 0;
|
q->state = 0;
|
||||||
q->q_depth = depth;
|
q->q_depth = depth;
|
||||||
q->cmd_inflight = 0;
|
|
||||||
q->tid = gettid();
|
|
||||||
|
|
||||||
if (dev->dev_info.flags & (UBLK_F_SUPPORT_ZERO_COPY | UBLK_F_AUTO_BUF_REG)) {
|
if (dev->dev_info.flags & (UBLK_F_SUPPORT_ZERO_COPY | UBLK_F_AUTO_BUF_REG)) {
|
||||||
q->state |= UBLKSRV_NO_BUF;
|
q->state |= UBLKSRV_NO_BUF;
|
||||||
|
|
@ -467,6 +467,7 @@ static int ublk_queue_init(struct ublk_queue *q, unsigned extra_flags)
|
||||||
for (i = 0; i < q->q_depth; i++) {
|
for (i = 0; i < q->q_depth; i++) {
|
||||||
q->ios[i].buf_addr = NULL;
|
q->ios[i].buf_addr = NULL;
|
||||||
q->ios[i].flags = UBLKSRV_NEED_FETCH_RQ | UBLKSRV_IO_FREE;
|
q->ios[i].flags = UBLKSRV_NEED_FETCH_RQ | UBLKSRV_IO_FREE;
|
||||||
|
q->ios[i].tag = i;
|
||||||
|
|
||||||
if (q->state & UBLKSRV_NO_BUF)
|
if (q->state & UBLKSRV_NO_BUF)
|
||||||
continue;
|
continue;
|
||||||
|
|
@ -479,34 +480,6 @@ static int ublk_queue_init(struct ublk_queue *q, unsigned extra_flags)
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
ret = ublk_setup_ring(&q->ring, ring_depth, cq_depth,
|
|
||||||
IORING_SETUP_COOP_TASKRUN |
|
|
||||||
IORING_SETUP_SINGLE_ISSUER |
|
|
||||||
IORING_SETUP_DEFER_TASKRUN);
|
|
||||||
if (ret < 0) {
|
|
||||||
ublk_err("ublk dev %d queue %d setup io_uring failed %d\n",
|
|
||||||
q->dev->dev_info.dev_id, q->q_id, ret);
|
|
||||||
goto fail;
|
|
||||||
}
|
|
||||||
|
|
||||||
if (dev->dev_info.flags & (UBLK_F_SUPPORT_ZERO_COPY | UBLK_F_AUTO_BUF_REG)) {
|
|
||||||
ret = io_uring_register_buffers_sparse(&q->ring, q->q_depth);
|
|
||||||
if (ret) {
|
|
||||||
ublk_err("ublk dev %d queue %d register spare buffers failed %d",
|
|
||||||
dev->dev_info.dev_id, q->q_id, ret);
|
|
||||||
goto fail;
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
io_uring_register_ring_fd(&q->ring);
|
|
||||||
|
|
||||||
ret = io_uring_register_files(&q->ring, dev->fds, dev->nr_fds);
|
|
||||||
if (ret) {
|
|
||||||
ublk_err("ublk dev %d queue %d register files failed %d\n",
|
|
||||||
q->dev->dev_info.dev_id, q->q_id, ret);
|
|
||||||
goto fail;
|
|
||||||
}
|
|
||||||
|
|
||||||
return 0;
|
return 0;
|
||||||
fail:
|
fail:
|
||||||
ublk_queue_deinit(q);
|
ublk_queue_deinit(q);
|
||||||
|
|
@ -515,6 +488,52 @@ static int ublk_queue_init(struct ublk_queue *q, unsigned extra_flags)
|
||||||
return -ENOMEM;
|
return -ENOMEM;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
static int ublk_thread_init(struct ublk_thread *t)
|
||||||
|
{
|
||||||
|
struct ublk_dev *dev = t->dev;
|
||||||
|
int ring_depth = dev->tgt.sq_depth, cq_depth = dev->tgt.cq_depth;
|
||||||
|
int ret;
|
||||||
|
|
||||||
|
ret = ublk_setup_ring(&t->ring, ring_depth, cq_depth,
|
||||||
|
IORING_SETUP_COOP_TASKRUN |
|
||||||
|
IORING_SETUP_SINGLE_ISSUER |
|
||||||
|
IORING_SETUP_DEFER_TASKRUN);
|
||||||
|
if (ret < 0) {
|
||||||
|
ublk_err("ublk dev %d thread %d setup io_uring failed %d\n",
|
||||||
|
dev->dev_info.dev_id, t->idx, ret);
|
||||||
|
goto fail;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (dev->dev_info.flags & (UBLK_F_SUPPORT_ZERO_COPY | UBLK_F_AUTO_BUF_REG)) {
|
||||||
|
unsigned nr_ios = dev->dev_info.queue_depth * dev->dev_info.nr_hw_queues;
|
||||||
|
unsigned max_nr_ios_per_thread = nr_ios / dev->nthreads;
|
||||||
|
max_nr_ios_per_thread += !!(nr_ios % dev->nthreads);
|
||||||
|
ret = io_uring_register_buffers_sparse(
|
||||||
|
&t->ring, max_nr_ios_per_thread);
|
||||||
|
if (ret) {
|
||||||
|
ublk_err("ublk dev %d thread %d register spare buffers failed %d",
|
||||||
|
dev->dev_info.dev_id, t->idx, ret);
|
||||||
|
goto fail;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
io_uring_register_ring_fd(&t->ring);
|
||||||
|
|
||||||
|
ret = io_uring_register_files(&t->ring, dev->fds, dev->nr_fds);
|
||||||
|
if (ret) {
|
||||||
|
ublk_err("ublk dev %d thread %d register files failed %d\n",
|
||||||
|
t->dev->dev_info.dev_id, t->idx, ret);
|
||||||
|
goto fail;
|
||||||
|
}
|
||||||
|
|
||||||
|
return 0;
|
||||||
|
fail:
|
||||||
|
ublk_thread_deinit(t);
|
||||||
|
ublk_err("ublk dev %d thread %d init failed\n",
|
||||||
|
dev->dev_info.dev_id, t->idx);
|
||||||
|
return -ENOMEM;
|
||||||
|
}
|
||||||
|
|
||||||
#define WAIT_USEC 100000
|
#define WAIT_USEC 100000
|
||||||
#define MAX_WAIT_USEC (3 * 1000000)
|
#define MAX_WAIT_USEC (3 * 1000000)
|
||||||
static int ublk_dev_prep(const struct dev_ctx *ctx, struct ublk_dev *dev)
|
static int ublk_dev_prep(const struct dev_ctx *ctx, struct ublk_dev *dev)
|
||||||
|
|
@ -562,7 +581,7 @@ static void ublk_set_auto_buf_reg(const struct ublk_queue *q,
|
||||||
if (q->tgt_ops->buf_index)
|
if (q->tgt_ops->buf_index)
|
||||||
buf.index = q->tgt_ops->buf_index(q, tag);
|
buf.index = q->tgt_ops->buf_index(q, tag);
|
||||||
else
|
else
|
||||||
buf.index = tag;
|
buf.index = q->ios[tag].buf_index;
|
||||||
|
|
||||||
if (q->state & UBLKSRV_AUTO_BUF_REG_FALLBACK)
|
if (q->state & UBLKSRV_AUTO_BUF_REG_FALLBACK)
|
||||||
buf.flags = UBLK_AUTO_BUF_REG_FALLBACK;
|
buf.flags = UBLK_AUTO_BUF_REG_FALLBACK;
|
||||||
|
|
@ -570,8 +589,10 @@ static void ublk_set_auto_buf_reg(const struct ublk_queue *q,
|
||||||
sqe->addr = ublk_auto_buf_reg_to_sqe_addr(&buf);
|
sqe->addr = ublk_auto_buf_reg_to_sqe_addr(&buf);
|
||||||
}
|
}
|
||||||
|
|
||||||
int ublk_queue_io_cmd(struct ublk_queue *q, struct ublk_io *io, unsigned tag)
|
int ublk_queue_io_cmd(struct ublk_io *io)
|
||||||
{
|
{
|
||||||
|
struct ublk_thread *t = io->t;
|
||||||
|
struct ublk_queue *q = ublk_io_to_queue(io);
|
||||||
struct ublksrv_io_cmd *cmd;
|
struct ublksrv_io_cmd *cmd;
|
||||||
struct io_uring_sqe *sqe[1];
|
struct io_uring_sqe *sqe[1];
|
||||||
unsigned int cmd_op = 0;
|
unsigned int cmd_op = 0;
|
||||||
|
|
@ -596,13 +617,13 @@ int ublk_queue_io_cmd(struct ublk_queue *q, struct ublk_io *io, unsigned tag)
|
||||||
else if (io->flags & UBLKSRV_NEED_FETCH_RQ)
|
else if (io->flags & UBLKSRV_NEED_FETCH_RQ)
|
||||||
cmd_op = UBLK_U_IO_FETCH_REQ;
|
cmd_op = UBLK_U_IO_FETCH_REQ;
|
||||||
|
|
||||||
if (io_uring_sq_space_left(&q->ring) < 1)
|
if (io_uring_sq_space_left(&t->ring) < 1)
|
||||||
io_uring_submit(&q->ring);
|
io_uring_submit(&t->ring);
|
||||||
|
|
||||||
ublk_queue_alloc_sqes(q, sqe, 1);
|
ublk_io_alloc_sqes(io, sqe, 1);
|
||||||
if (!sqe[0]) {
|
if (!sqe[0]) {
|
||||||
ublk_err("%s: run out of sqe %d, tag %d\n",
|
ublk_err("%s: run out of sqe. thread %u, tag %d\n",
|
||||||
__func__, q->q_id, tag);
|
__func__, t->idx, io->tag);
|
||||||
return -1;
|
return -1;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
@ -617,7 +638,7 @@ int ublk_queue_io_cmd(struct ublk_queue *q, struct ublk_io *io, unsigned tag)
|
||||||
sqe[0]->opcode = IORING_OP_URING_CMD;
|
sqe[0]->opcode = IORING_OP_URING_CMD;
|
||||||
sqe[0]->flags = IOSQE_FIXED_FILE;
|
sqe[0]->flags = IOSQE_FIXED_FILE;
|
||||||
sqe[0]->rw_flags = 0;
|
sqe[0]->rw_flags = 0;
|
||||||
cmd->tag = tag;
|
cmd->tag = io->tag;
|
||||||
cmd->q_id = q->q_id;
|
cmd->q_id = q->q_id;
|
||||||
if (!(q->state & UBLKSRV_NO_BUF))
|
if (!(q->state & UBLKSRV_NO_BUF))
|
||||||
cmd->addr = (__u64) (uintptr_t) io->buf_addr;
|
cmd->addr = (__u64) (uintptr_t) io->buf_addr;
|
||||||
|
|
@ -625,37 +646,72 @@ int ublk_queue_io_cmd(struct ublk_queue *q, struct ublk_io *io, unsigned tag)
|
||||||
cmd->addr = 0;
|
cmd->addr = 0;
|
||||||
|
|
||||||
if (q->state & UBLKSRV_AUTO_BUF_REG)
|
if (q->state & UBLKSRV_AUTO_BUF_REG)
|
||||||
ublk_set_auto_buf_reg(q, sqe[0], tag);
|
ublk_set_auto_buf_reg(q, sqe[0], io->tag);
|
||||||
|
|
||||||
user_data = build_user_data(tag, _IOC_NR(cmd_op), 0, 0);
|
user_data = build_user_data(io->tag, _IOC_NR(cmd_op), 0, q->q_id, 0);
|
||||||
io_uring_sqe_set_data64(sqe[0], user_data);
|
io_uring_sqe_set_data64(sqe[0], user_data);
|
||||||
|
|
||||||
io->flags = 0;
|
io->flags = 0;
|
||||||
|
|
||||||
q->cmd_inflight += 1;
|
t->cmd_inflight += 1;
|
||||||
|
|
||||||
ublk_dbg(UBLK_DBG_IO_CMD, "%s: (qid %d tag %u cmd_op %u) iof %x stopping %d\n",
|
ublk_dbg(UBLK_DBG_IO_CMD, "%s: (thread %u qid %d tag %u cmd_op %u) iof %x stopping %d\n",
|
||||||
__func__, q->q_id, tag, cmd_op,
|
__func__, t->idx, q->q_id, io->tag, cmd_op,
|
||||||
io->flags, !!(q->state & UBLKSRV_QUEUE_STOPPING));
|
io->flags, !!(t->state & UBLKSRV_THREAD_STOPPING));
|
||||||
return 1;
|
return 1;
|
||||||
}
|
}
|
||||||
|
|
||||||
static void ublk_submit_fetch_commands(struct ublk_queue *q)
|
static void ublk_submit_fetch_commands(struct ublk_thread *t)
|
||||||
{
|
{
|
||||||
int i = 0;
|
struct ublk_queue *q;
|
||||||
|
struct ublk_io *io;
|
||||||
|
int i = 0, j = 0;
|
||||||
|
|
||||||
for (i = 0; i < q->q_depth; i++)
|
if (t->dev->per_io_tasks) {
|
||||||
ublk_queue_io_cmd(q, &q->ios[i], i);
|
/*
|
||||||
|
* Lexicographically order all the (qid,tag) pairs, with
|
||||||
|
* qid taking priority (so (1,0) > (0,1)). Then make
|
||||||
|
* this thread the daemon for every Nth entry in this
|
||||||
|
* list (N is the number of threads), starting at this
|
||||||
|
* thread's index. This ensures that each queue is
|
||||||
|
* handled by as many ublk server threads as possible,
|
||||||
|
* so that load that is concentrated on one or a few
|
||||||
|
* queues can make use of all ublk server threads.
|
||||||
|
*/
|
||||||
|
const struct ublksrv_ctrl_dev_info *dinfo = &t->dev->dev_info;
|
||||||
|
int nr_ios = dinfo->nr_hw_queues * dinfo->queue_depth;
|
||||||
|
for (i = t->idx; i < nr_ios; i += t->dev->nthreads) {
|
||||||
|
int q_id = i / dinfo->queue_depth;
|
||||||
|
int tag = i % dinfo->queue_depth;
|
||||||
|
q = &t->dev->q[q_id];
|
||||||
|
io = &q->ios[tag];
|
||||||
|
io->t = t;
|
||||||
|
io->buf_index = j++;
|
||||||
|
ublk_queue_io_cmd(io);
|
||||||
|
}
|
||||||
|
} else {
|
||||||
|
/*
|
||||||
|
* Service exclusively the queue whose q_id matches our
|
||||||
|
* thread index.
|
||||||
|
*/
|
||||||
|
struct ublk_queue *q = &t->dev->q[t->idx];
|
||||||
|
for (i = 0; i < q->q_depth; i++) {
|
||||||
|
io = &q->ios[i];
|
||||||
|
io->t = t;
|
||||||
|
io->buf_index = i;
|
||||||
|
ublk_queue_io_cmd(io);
|
||||||
|
}
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
static int ublk_queue_is_idle(struct ublk_queue *q)
|
static int ublk_thread_is_idle(struct ublk_thread *t)
|
||||||
{
|
{
|
||||||
return !io_uring_sq_ready(&q->ring) && !q->io_inflight;
|
return !io_uring_sq_ready(&t->ring) && !t->io_inflight;
|
||||||
}
|
}
|
||||||
|
|
||||||
static int ublk_queue_is_done(struct ublk_queue *q)
|
static int ublk_thread_is_done(struct ublk_thread *t)
|
||||||
{
|
{
|
||||||
return (q->state & UBLKSRV_QUEUE_STOPPING) && ublk_queue_is_idle(q);
|
return (t->state & UBLKSRV_THREAD_STOPPING) && ublk_thread_is_idle(t);
|
||||||
}
|
}
|
||||||
|
|
||||||
static inline void ublksrv_handle_tgt_cqe(struct ublk_queue *q,
|
static inline void ublksrv_handle_tgt_cqe(struct ublk_queue *q,
|
||||||
|
|
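When --per_io_tasks is used, the fetch path above stripes the flattened (q_id, tag) space across the server threads, so load concentrated on one queue can still fan out to every thread, and each io gets a dense per-thread buffer index. A stand-alone demo of that arithmetic (invented names, same idea; ceil_div mirrors the max_nr_ios_per_thread computation used when registering sparse buffers):

#include <stdio.h>

static unsigned ceil_div(unsigned a, unsigned b)
{
	return a / b + !!(a % b);
}

static void show_thread_assignment(unsigned nr_queues, unsigned queue_depth,
				   unsigned nthreads)
{
	unsigned nr_ios = nr_queues * queue_depth;

	printf("at most %u ios per thread\n", ceil_div(nr_ios, nthreads));

	for (unsigned t = 0; t < nthreads; t++) {
		unsigned buf_index = 0;	/* dense per-thread buffer slot */

		for (unsigned i = t; i < nr_ios; i += nthreads)
			printf("thread %u buf %u -> queue %u tag %u\n",
			       t, buf_index++, i / queue_depth, i % queue_depth);
	}
}

int main(void)
{
	/* 2 queues of depth 4 served by 3 threads */
	show_thread_assignment(2, 4, 3);
	return 0;
}
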
@ -673,14 +729,16 @@ static inline void ublksrv_handle_tgt_cqe(struct ublk_queue *q,
|
||||||
q->tgt_ops->tgt_io_done(q, tag, cqe);
|
q->tgt_ops->tgt_io_done(q, tag, cqe);
|
||||||
}
|
}
|
||||||
|
|
||||||
static void ublk_handle_cqe(struct io_uring *r,
|
static void ublk_handle_cqe(struct ublk_thread *t,
|
||||||
struct io_uring_cqe *cqe, void *data)
|
struct io_uring_cqe *cqe, void *data)
|
||||||
{
|
{
|
||||||
struct ublk_queue *q = container_of(r, struct ublk_queue, ring);
|
struct ublk_dev *dev = t->dev;
|
||||||
|
unsigned q_id = user_data_to_q_id(cqe->user_data);
|
||||||
|
struct ublk_queue *q = &dev->q[q_id];
|
||||||
unsigned tag = user_data_to_tag(cqe->user_data);
|
unsigned tag = user_data_to_tag(cqe->user_data);
|
||||||
unsigned cmd_op = user_data_to_op(cqe->user_data);
|
unsigned cmd_op = user_data_to_op(cqe->user_data);
|
||||||
int fetch = (cqe->res != UBLK_IO_RES_ABORT) &&
|
int fetch = (cqe->res != UBLK_IO_RES_ABORT) &&
|
||||||
!(q->state & UBLKSRV_QUEUE_STOPPING);
|
!(t->state & UBLKSRV_THREAD_STOPPING);
|
||||||
struct ublk_io *io;
|
struct ublk_io *io;
|
||||||
|
|
||||||
if (cqe->res < 0 && cqe->res != -ENODEV)
|
if (cqe->res < 0 && cqe->res != -ENODEV)
|
||||||
|
|
@ -691,7 +749,7 @@ static void ublk_handle_cqe(struct io_uring *r,
|
||||||
__func__, cqe->res, q->q_id, tag, cmd_op,
|
__func__, cqe->res, q->q_id, tag, cmd_op,
|
||||||
is_target_io(cqe->user_data),
|
is_target_io(cqe->user_data),
|
||||||
user_data_to_tgt_data(cqe->user_data),
|
user_data_to_tgt_data(cqe->user_data),
|
||||||
(q->state & UBLKSRV_QUEUE_STOPPING));
|
(t->state & UBLKSRV_THREAD_STOPPING));
|
||||||
|
|
||||||
/* Don't retrieve io in case of target io */
|
/* Don't retrieve io in case of target io */
|
||||||
if (is_target_io(cqe->user_data)) {
|
if (is_target_io(cqe->user_data)) {
|
||||||
|
|
@ -700,10 +758,10 @@ static void ublk_handle_cqe(struct io_uring *r,
|
||||||
}
|
}
|
||||||
|
|
||||||
io = &q->ios[tag];
|
io = &q->ios[tag];
|
||||||
q->cmd_inflight--;
|
t->cmd_inflight--;
|
||||||
|
|
||||||
if (!fetch) {
|
if (!fetch) {
|
||||||
q->state |= UBLKSRV_QUEUE_STOPPING;
|
t->state |= UBLKSRV_THREAD_STOPPING;
|
||||||
io->flags &= ~UBLKSRV_NEED_FETCH_RQ;
|
io->flags &= ~UBLKSRV_NEED_FETCH_RQ;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
@ -713,7 +771,7 @@ static void ublk_handle_cqe(struct io_uring *r,
|
||||||
q->tgt_ops->queue_io(q, tag);
|
q->tgt_ops->queue_io(q, tag);
|
||||||
} else if (cqe->res == UBLK_IO_RES_NEED_GET_DATA) {
|
} else if (cqe->res == UBLK_IO_RES_NEED_GET_DATA) {
|
||||||
io->flags |= UBLKSRV_NEED_GET_DATA | UBLKSRV_IO_FREE;
|
io->flags |= UBLKSRV_NEED_GET_DATA | UBLKSRV_IO_FREE;
|
||||||
ublk_queue_io_cmd(q, io, tag);
|
ublk_queue_io_cmd(io);
|
||||||
} else {
|
} else {
|
||||||
/*
|
/*
|
||||||
* COMMIT_REQ will be completed immediately since no fetching
|
* COMMIT_REQ will be completed immediately since no fetching
|
||||||
|
|
@ -727,92 +785,93 @@ static void ublk_handle_cqe(struct io_uring *r,
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
static int ublk_reap_events_uring(struct io_uring *r)
|
static int ublk_reap_events_uring(struct ublk_thread *t)
|
||||||
{
|
{
|
||||||
struct io_uring_cqe *cqe;
|
struct io_uring_cqe *cqe;
|
||||||
unsigned head;
|
unsigned head;
|
||||||
int count = 0;
|
int count = 0;
|
||||||
|
|
||||||
io_uring_for_each_cqe(r, head, cqe) {
|
io_uring_for_each_cqe(&t->ring, head, cqe) {
|
||||||
ublk_handle_cqe(r, cqe, NULL);
|
ublk_handle_cqe(t, cqe, NULL);
|
||||||
count += 1;
|
count += 1;
|
||||||
}
|
}
|
||||||
io_uring_cq_advance(r, count);
|
io_uring_cq_advance(&t->ring, count);
|
||||||
|
|
||||||
return count;
|
return count;
|
||||||
}
|
}
|
||||||
|
|
||||||
static int ublk_process_io(struct ublk_queue *q)
|
static int ublk_process_io(struct ublk_thread *t)
|
||||||
{
|
{
|
||||||
int ret, reapped;
|
int ret, reapped;
|
||||||
|
|
||||||
ublk_dbg(UBLK_DBG_QUEUE, "dev%d-q%d: to_submit %d inflight cmd %u stopping %d\n",
|
ublk_dbg(UBLK_DBG_THREAD, "dev%d-t%u: to_submit %d inflight cmd %u stopping %d\n",
|
||||||
q->dev->dev_info.dev_id,
|
t->dev->dev_info.dev_id,
|
||||||
q->q_id, io_uring_sq_ready(&q->ring),
|
t->idx, io_uring_sq_ready(&t->ring),
|
||||||
q->cmd_inflight,
|
t->cmd_inflight,
|
||||||
(q->state & UBLKSRV_QUEUE_STOPPING));
|
(t->state & UBLKSRV_THREAD_STOPPING));
|
||||||
|
|
||||||
if (ublk_queue_is_done(q))
|
if (ublk_thread_is_done(t))
|
||||||
return -ENODEV;
|
return -ENODEV;
|
||||||
|
|
||||||
ret = io_uring_submit_and_wait(&q->ring, 1);
|
ret = io_uring_submit_and_wait(&t->ring, 1);
|
||||||
reapped = ublk_reap_events_uring(&q->ring);
|
reapped = ublk_reap_events_uring(t);
|
||||||
|
|
||||||
ublk_dbg(UBLK_DBG_QUEUE, "submit result %d, reapped %d stop %d idle %d\n",
|
ublk_dbg(UBLK_DBG_THREAD, "submit result %d, reapped %d stop %d idle %d\n",
|
||||||
ret, reapped, (q->state & UBLKSRV_QUEUE_STOPPING),
|
ret, reapped, (t->state & UBLKSRV_THREAD_STOPPING),
|
||||||
(q->state & UBLKSRV_QUEUE_IDLE));
|
(t->state & UBLKSRV_THREAD_IDLE));
|
||||||
|
|
||||||
return reapped;
|
return reapped;
|
||||||
}
|
}
|
||||||
|
|
||||||
static void ublk_queue_set_sched_affinity(const struct ublk_queue *q,
|
static void ublk_thread_set_sched_affinity(const struct ublk_thread *t,
|
||||||
cpu_set_t *cpuset)
|
cpu_set_t *cpuset)
|
||||||
{
|
{
|
||||||
if (sched_setaffinity(0, sizeof(*cpuset), cpuset) < 0)
|
if (sched_setaffinity(0, sizeof(*cpuset), cpuset) < 0)
|
||||||
ublk_err("ublk dev %u queue %u set affinity failed",
|
ublk_err("ublk dev %u thread %u set affinity failed",
|
||||||
q->dev->dev_info.dev_id, q->q_id);
|
t->dev->dev_info.dev_id, t->idx);
|
||||||
}
|
}
|
||||||
|
|
||||||
struct ublk_queue_info {
|
struct ublk_thread_info {
|
||||||
struct ublk_queue *q;
|
struct ublk_dev *dev;
|
||||||
sem_t *queue_sem;
|
unsigned idx;
|
||||||
|
sem_t *ready;
|
||||||
cpu_set_t *affinity;
|
cpu_set_t *affinity;
|
||||||
unsigned char auto_zc_fallback;
|
|
||||||
};
|
};
|
||||||
|
|
||||||
static void *ublk_io_handler_fn(void *data)
|
static void *ublk_io_handler_fn(void *data)
|
||||||
{
|
{
|
||||||
struct ublk_queue_info *info = data;
|
struct ublk_thread_info *info = data;
|
||||||
struct ublk_queue *q = info->q;
|
struct ublk_thread *t = &info->dev->threads[info->idx];
|
||||||
int dev_id = q->dev->dev_info.dev_id;
|
int dev_id = info->dev->dev_info.dev_id;
|
||||||
unsigned extra_flags = 0;
|
|
||||||
int ret;
|
int ret;
|
||||||
|
|
||||||
if (info->auto_zc_fallback)
|
t->dev = info->dev;
|
||||||
extra_flags = UBLKSRV_AUTO_BUF_REG_FALLBACK;
|
t->idx = info->idx;
|
||||||
|
|
||||||
ret = ublk_queue_init(q, extra_flags);
|
ret = ublk_thread_init(t);
|
||||||
if (ret) {
|
if (ret) {
|
||||||
ublk_err("ublk dev %d queue %d init queue failed\n",
|
ublk_err("ublk dev %d thread %u init failed\n",
|
||||||
dev_id, q->q_id);
|
dev_id, t->idx);
|
||||||
return NULL;
|
return NULL;
|
||||||
}
|
}
|
||||||
/* IO perf is sensitive with queue pthread affinity on NUMA machine*/
|
/* IO perf is sensitive with queue pthread affinity on NUMA machine*/
|
||||||
ublk_queue_set_sched_affinity(q, info->affinity);
|
if (info->affinity)
|
||||||
sem_post(info->queue_sem);
|
ublk_thread_set_sched_affinity(t, info->affinity);
|
||||||
|
sem_post(info->ready);
|
||||||
|
|
||||||
ublk_dbg(UBLK_DBG_QUEUE, "tid %d: ublk dev %d queue %d started\n",
|
ublk_dbg(UBLK_DBG_THREAD, "tid %d: ublk dev %d thread %u started\n",
|
||||||
q->tid, dev_id, q->q_id);
|
gettid(), dev_id, t->idx);
|
||||||
|
|
||||||
/* submit all io commands to ublk driver */
|
/* submit all io commands to ublk driver */
|
||||||
ublk_submit_fetch_commands(q);
|
ublk_submit_fetch_commands(t);
|
||||||
do {
|
do {
|
||||||
if (ublk_process_io(q) < 0)
|
if (ublk_process_io(t) < 0)
|
||||||
break;
|
break;
|
||||||
} while (1);
|
} while (1);
|
||||||
|
|
||||||
ublk_dbg(UBLK_DBG_QUEUE, "ublk dev %d queue %d exited\n", dev_id, q->q_id);
|
ublk_dbg(UBLK_DBG_THREAD, "tid %d: ublk dev %d thread %d exiting\n",
|
||||||
ublk_queue_deinit(q);
|
gettid(), dev_id, t->idx);
|
||||||
|
ublk_thread_deinit(t);
|
||||||
return NULL;
|
return NULL;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
@ -855,20 +914,20 @@ static int ublk_send_dev_event(const struct dev_ctx *ctx, struct ublk_dev *dev,
|
||||||
static int ublk_start_daemon(const struct dev_ctx *ctx, struct ublk_dev *dev)
|
static int ublk_start_daemon(const struct dev_ctx *ctx, struct ublk_dev *dev)
|
||||||
{
|
{
|
||||||
const struct ublksrv_ctrl_dev_info *dinfo = &dev->dev_info;
|
const struct ublksrv_ctrl_dev_info *dinfo = &dev->dev_info;
|
||||||
struct ublk_queue_info *qinfo;
|
struct ublk_thread_info *tinfo;
|
||||||
|
unsigned extra_flags = 0;
|
||||||
cpu_set_t *affinity_buf;
|
cpu_set_t *affinity_buf;
|
||||||
void *thread_ret;
|
void *thread_ret;
|
||||||
sem_t queue_sem;
|
sem_t ready;
|
||||||
int ret, i;
|
int ret, i;
|
||||||
|
|
||||||
ublk_dbg(UBLK_DBG_DEV, "%s enter\n", __func__);
|
ublk_dbg(UBLK_DBG_DEV, "%s enter\n", __func__);
|
||||||
|
|
||||||
qinfo = (struct ublk_queue_info *)calloc(sizeof(struct ublk_queue_info),
|
tinfo = calloc(sizeof(struct ublk_thread_info), dev->nthreads);
|
||||||
dinfo->nr_hw_queues);
|
if (!tinfo)
|
||||||
if (!qinfo)
|
|
||||||
return -ENOMEM;
|
return -ENOMEM;
|
||||||
|
|
||||||
sem_init(&queue_sem, 0, 0);
|
sem_init(&ready, 0, 0);
|
||||||
ret = ublk_dev_prep(ctx, dev);
|
ret = ublk_dev_prep(ctx, dev);
|
||||||
if (ret)
|
if (ret)
|
||||||
return ret;
|
return ret;
|
||||||
|
|
@ -877,22 +936,44 @@ static int ublk_start_daemon(const struct dev_ctx *ctx, struct ublk_dev *dev)
|
||||||
if (ret)
|
if (ret)
|
||||||
return ret;
|
return ret;
|
||||||
|
|
||||||
|
if (ctx->auto_zc_fallback)
|
||||||
|
extra_flags = UBLKSRV_AUTO_BUF_REG_FALLBACK;
|
||||||
|
|
||||||
for (i = 0; i < dinfo->nr_hw_queues; i++) {
|
for (i = 0; i < dinfo->nr_hw_queues; i++) {
|
||||||
dev->q[i].dev = dev;
|
dev->q[i].dev = dev;
|
||||||
dev->q[i].q_id = i;
|
dev->q[i].q_id = i;
|
||||||
|
|
||||||
qinfo[i].q = &dev->q[i];
|
ret = ublk_queue_init(&dev->q[i], extra_flags);
|
||||||
qinfo[i].queue_sem = &queue_sem;
|
if (ret) {
|
||||||
qinfo[i].affinity = &affinity_buf[i];
|
ublk_err("ublk dev %d queue %d init queue failed\n",
|
||||||
qinfo[i].auto_zc_fallback = ctx->auto_zc_fallback;
|
dinfo->dev_id, i);
|
||||||
pthread_create(&dev->q[i].thread, NULL,
|
goto fail;
|
||||||
ublk_io_handler_fn,
|
}
|
||||||
&qinfo[i]);
|
|
||||||
}
|
}
|
||||||
|
|
||||||
for (i = 0; i < dinfo->nr_hw_queues; i++)
|
for (i = 0; i < dev->nthreads; i++) {
|
||||||
sem_wait(&queue_sem);
|
tinfo[i].dev = dev;
|
||||||
free(qinfo);
|
tinfo[i].idx = i;
|
||||||
|
tinfo[i].ready = &ready;
|
||||||
|
|
||||||
|
/*
|
||||||
|
* If threads are not tied 1:1 to queues, setting thread
|
||||||
|
* affinity based on queue affinity makes little sense.
|
||||||
|
* However, thread CPU affinity has significant impact
|
||||||
|
* on performance, so to compare fairly, we'll still set
|
||||||
|
* thread CPU affinity based on queue affinity where
|
||||||
|
* possible.
|
||||||
|
*/
|
||||||
|
if (dev->nthreads == dinfo->nr_hw_queues)
|
||||||
|
tinfo[i].affinity = &affinity_buf[i];
|
||||||
|
pthread_create(&dev->threads[i].thread, NULL,
|
||||||
|
ublk_io_handler_fn,
|
||||||
|
&tinfo[i]);
|
||||||
|
}
|
||||||
|
|
||||||
|
for (i = 0; i < dev->nthreads; i++)
|
||||||
|
sem_wait(&ready);
|
||||||
|
free(tinfo);
|
||||||
free(affinity_buf);
|
free(affinity_buf);
|
||||||
|
|
||||||
/* everything is fine now, start us */
|
/* everything is fine now, start us */
|
||||||
|
|
@ -914,9 +995,11 @@ static int ublk_start_daemon(const struct dev_ctx *ctx, struct ublk_dev *dev)
|
||||||
ublk_send_dev_event(ctx, dev, dev->dev_info.dev_id);
|
ublk_send_dev_event(ctx, dev, dev->dev_info.dev_id);
|
||||||
|
|
||||||
/* wait until we are terminated */
|
/* wait until we are terminated */
|
||||||
for (i = 0; i < dinfo->nr_hw_queues; i++)
|
for (i = 0; i < dev->nthreads; i++)
|
||||||
pthread_join(dev->q[i].thread, &thread_ret);
|
pthread_join(dev->threads[i].thread, &thread_ret);
|
||||||
fail:
|
fail:
|
||||||
|
for (i = 0; i < dinfo->nr_hw_queues; i++)
|
||||||
|
ublk_queue_deinit(&dev->q[i]);
|
||||||
ublk_dev_unprep(dev);
|
ublk_dev_unprep(dev);
|
||||||
ublk_dbg(UBLK_DBG_DEV, "%s exit\n", __func__);
|
ublk_dbg(UBLK_DBG_DEV, "%s exit\n", __func__);
|
||||||
|
|
||||||
|
|
@ -1022,13 +1105,14 @@ wait:
|
||||||
|
|
||||||
static int __cmd_dev_add(const struct dev_ctx *ctx)
|
static int __cmd_dev_add(const struct dev_ctx *ctx)
|
||||||
{
|
{
|
||||||
|
unsigned nthreads = ctx->nthreads;
|
||||||
unsigned nr_queues = ctx->nr_hw_queues;
|
unsigned nr_queues = ctx->nr_hw_queues;
|
||||||
const char *tgt_type = ctx->tgt_type;
|
const char *tgt_type = ctx->tgt_type;
|
||||||
unsigned depth = ctx->queue_depth;
|
unsigned depth = ctx->queue_depth;
|
||||||
__u64 features;
|
__u64 features;
|
||||||
const struct ublk_tgt_ops *ops;
|
const struct ublk_tgt_ops *ops;
|
||||||
struct ublksrv_ctrl_dev_info *info;
|
struct ublksrv_ctrl_dev_info *info;
|
||||||
struct ublk_dev *dev;
|
struct ublk_dev *dev = NULL;
|
||||||
int dev_id = ctx->dev_id;
|
int dev_id = ctx->dev_id;
|
||||||
int ret, i;
|
int ret, i;
|
||||||
|
|
||||||
|
|
@ -1036,29 +1120,55 @@ static int __cmd_dev_add(const struct dev_ctx *ctx)
|
||||||
if (!ops) {
|
if (!ops) {
|
||||||
ublk_err("%s: no such tgt type, type %s\n",
|
ublk_err("%s: no such tgt type, type %s\n",
|
||||||
__func__, tgt_type);
|
__func__, tgt_type);
|
||||||
return -ENODEV;
|
ret = -ENODEV;
|
||||||
|
goto fail;
|
||||||
}
|
}
|
||||||
|
|
||||||
if (nr_queues > UBLK_MAX_QUEUES || depth > UBLK_QUEUE_DEPTH) {
|
if (nr_queues > UBLK_MAX_QUEUES || depth > UBLK_QUEUE_DEPTH) {
|
||||||
ublk_err("%s: invalid nr_queues or depth queues %u depth %u\n",
|
ublk_err("%s: invalid nr_queues or depth queues %u depth %u\n",
|
||||||
__func__, nr_queues, depth);
|
__func__, nr_queues, depth);
|
||||||
return -EINVAL;
|
ret = -EINVAL;
|
||||||
|
goto fail;
|
||||||
|
}
|
||||||
|
|
||||||
|
/* default to 1:1 threads:queues if nthreads is unspecified */
|
||||||
|
if (!nthreads)
|
||||||
|
nthreads = nr_queues;
|
||||||
|
|
||||||
|
if (nthreads > UBLK_MAX_THREADS) {
|
||||||
|
ublk_err("%s: %u is too many threads (max %u)\n",
|
||||||
|
__func__, nthreads, UBLK_MAX_THREADS);
|
||||||
|
ret = -EINVAL;
|
||||||
|
goto fail;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (nthreads != nr_queues && !ctx->per_io_tasks) {
|
||||||
|
ublk_err("%s: threads %u must be same as queues %u if "
|
||||||
|
"not using per_io_tasks\n",
|
||||||
|
__func__, nthreads, nr_queues);
|
||||||
|
ret = -EINVAL;
|
||||||
|
goto fail;
|
||||||
}
|
}
|
||||||
|
|
||||||
dev = ublk_ctrl_init();
|
dev = ublk_ctrl_init();
|
||||||
if (!dev) {
|
if (!dev) {
|
||||||
ublk_err("%s: can't alloc dev id %d, type %s\n",
|
ublk_err("%s: can't alloc dev id %d, type %s\n",
|
||||||
__func__, dev_id, tgt_type);
|
__func__, dev_id, tgt_type);
|
||||||
return -ENOMEM;
|
ret = -ENOMEM;
|
||||||
|
goto fail;
|
||||||
}
|
}
|
||||||
|
|
||||||
/* kernel doesn't support get_features */
|
/* kernel doesn't support get_features */
|
||||||
ret = ublk_ctrl_get_features(dev, &features);
|
ret = ublk_ctrl_get_features(dev, &features);
|
||||||
if (ret < 0)
|
if (ret < 0) {
|
||||||
return -EINVAL;
|
ret = -EINVAL;
|
||||||
|
goto fail;
|
||||||
|
}
|
||||||
|
|
||||||
if (!(features & UBLK_F_CMD_IOCTL_ENCODE))
|
if (!(features & UBLK_F_CMD_IOCTL_ENCODE)) {
|
||||||
return -ENOTSUP;
|
ret = -ENOTSUP;
|
||||||
|
goto fail;
|
||||||
|
}
|
||||||
|
|
||||||
info = &dev->dev_info;
|
info = &dev->dev_info;
|
||||||
info->dev_id = ctx->dev_id;
|
info->dev_id = ctx->dev_id;
|
||||||
|
|
@ -1068,6 +1178,8 @@ static int __cmd_dev_add(const struct dev_ctx *ctx)
|
||||||
if ((features & UBLK_F_QUIESCE) &&
|
if ((features & UBLK_F_QUIESCE) &&
|
||||||
(info->flags & UBLK_F_USER_RECOVERY))
|
(info->flags & UBLK_F_USER_RECOVERY))
|
||||||
info->flags |= UBLK_F_QUIESCE;
|
info->flags |= UBLK_F_QUIESCE;
|
||||||
|
dev->nthreads = nthreads;
|
||||||
|
dev->per_io_tasks = ctx->per_io_tasks;
|
||||||
dev->tgt.ops = ops;
|
dev->tgt.ops = ops;
|
||||||
dev->tgt.sq_depth = depth;
|
dev->tgt.sq_depth = depth;
|
||||||
dev->tgt.cq_depth = depth;
|
dev->tgt.cq_depth = depth;
|
||||||
|
|
@ -1097,6 +1209,7 @@ static int __cmd_dev_add(const struct dev_ctx *ctx)
|
||||||
fail:
|
fail:
|
||||||
if (ret < 0)
|
if (ret < 0)
|
||||||
ublk_send_dev_event(ctx, dev, -1);
|
ublk_send_dev_event(ctx, dev, -1);
|
||||||
|
if (dev)
|
||||||
ublk_ctrl_deinit(dev);
|
ublk_ctrl_deinit(dev);
|
||||||
return ret;
|
return ret;
|
||||||
}
|
}
|
||||||
|
|
@ -1159,6 +1272,8 @@ run:
|
||||||
shmctl(ctx->_shmid, IPC_RMID, NULL);
|
shmctl(ctx->_shmid, IPC_RMID, NULL);
|
||||||
/* wait for child and detach from it */
|
/* wait for child and detach from it */
|
||||||
wait(NULL);
|
wait(NULL);
|
||||||
|
if (exit_code == EXIT_FAILURE)
|
||||||
|
ublk_err("%s: command failed\n", __func__);
|
||||||
exit(exit_code);
|
exit(exit_code);
|
||||||
} else {
|
} else {
|
||||||
exit(EXIT_FAILURE);
|
exit(EXIT_FAILURE);
|
||||||
|
|
@ -1266,6 +1381,7 @@ static int cmd_dev_get_features(void)
|
||||||
[const_ilog2(UBLK_F_UPDATE_SIZE)] = "UPDATE_SIZE",
|
[const_ilog2(UBLK_F_UPDATE_SIZE)] = "UPDATE_SIZE",
|
||||||
[const_ilog2(UBLK_F_AUTO_BUF_REG)] = "AUTO_BUF_REG",
|
[const_ilog2(UBLK_F_AUTO_BUF_REG)] = "AUTO_BUF_REG",
|
||||||
[const_ilog2(UBLK_F_QUIESCE)] = "QUIESCE",
|
[const_ilog2(UBLK_F_QUIESCE)] = "QUIESCE",
|
||||||
|
[const_ilog2(UBLK_F_PER_IO_DAEMON)] = "PER_IO_DAEMON",
|
||||||
};
|
};
|
||||||
struct ublk_dev *dev;
|
struct ublk_dev *dev;
|
||||||
__u64 features = 0;
|
__u64 features = 0;
|
||||||
|
|
@ -1360,8 +1476,10 @@ static void __cmd_create_help(char *exe, bool recovery)
|
||||||
exe, recovery ? "recover" : "add");
|
exe, recovery ? "recover" : "add");
|
||||||
printf("\t[--foreground] [--quiet] [-z] [--auto_zc] [--auto_zc_fallback] [--debug_mask mask] [-r 0|1 ] [-g]\n");
|
printf("\t[--foreground] [--quiet] [-z] [--auto_zc] [--auto_zc_fallback] [--debug_mask mask] [-r 0|1 ] [-g]\n");
|
||||||
printf("\t[-e 0|1 ] [-i 0|1]\n");
|
printf("\t[-e 0|1 ] [-i 0|1]\n");
|
||||||
|
printf("\t[--nthreads threads] [--per_io_tasks]\n");
|
||||||
printf("\t[target options] [backfile1] [backfile2] ...\n");
|
printf("\t[target options] [backfile1] [backfile2] ...\n");
|
||||||
printf("\tdefault: nr_queues=2(max 32), depth=128(max 1024), dev_id=-1(auto allocation)\n");
|
printf("\tdefault: nr_queues=2(max 32), depth=128(max 1024), dev_id=-1(auto allocation)\n");
|
||||||
|
printf("\tdefault: nthreads=nr_queues");
|
||||||
|
|
||||||
for (i = 0; i < sizeof(tgt_ops_list) / sizeof(tgt_ops_list[0]); i++) {
|
for (i = 0; i < sizeof(tgt_ops_list) / sizeof(tgt_ops_list[0]); i++) {
|
||||||
const struct ublk_tgt_ops *ops = tgt_ops_list[i];
|
const struct ublk_tgt_ops *ops = tgt_ops_list[i];
|
||||||
|
|
@ -1418,6 +1536,8 @@ int main(int argc, char *argv[])
|
||||||
{ "auto_zc", 0, NULL, 0 },
|
{ "auto_zc", 0, NULL, 0 },
|
||||||
{ "auto_zc_fallback", 0, NULL, 0 },
|
{ "auto_zc_fallback", 0, NULL, 0 },
|
||||||
{ "size", 1, NULL, 's'},
|
{ "size", 1, NULL, 's'},
|
||||||
|
{ "nthreads", 1, NULL, 0 },
|
||||||
|
{ "per_io_tasks", 0, NULL, 0 },
|
||||||
{ 0, 0, 0, 0 }
|
{ 0, 0, 0, 0 }
|
||||||
};
|
};
|
||||||
const struct ublk_tgt_ops *ops = NULL;
|
const struct ublk_tgt_ops *ops = NULL;
|
||||||
|
|
@ -1493,6 +1613,10 @@ int main(int argc, char *argv[])
|
||||||
ctx.flags |= UBLK_F_AUTO_BUF_REG;
|
ctx.flags |= UBLK_F_AUTO_BUF_REG;
|
||||||
if (!strcmp(longopts[option_idx].name, "auto_zc_fallback"))
|
if (!strcmp(longopts[option_idx].name, "auto_zc_fallback"))
|
||||||
ctx.auto_zc_fallback = 1;
|
ctx.auto_zc_fallback = 1;
|
||||||
|
if (!strcmp(longopts[option_idx].name, "nthreads"))
|
||||||
|
ctx.nthreads = strtol(optarg, NULL, 10);
|
||||||
|
if (!strcmp(longopts[option_idx].name, "per_io_tasks"))
|
||||||
|
ctx.per_io_tasks = 1;
|
||||||
break;
|
break;
|
||||||
case '?':
|
case '?':
|
||||||
/*
|
/*
|
||||||
|
|
|
||||||
|
|
@@ -49,11 +49,14 @@
 #define UBLKSRV_IO_IDLE_SECS	20

 #define UBLK_IO_MAX_BYTES	(1 << 20)
-#define UBLK_MAX_QUEUES		32
+#define UBLK_MAX_QUEUES_SHIFT	5
+#define UBLK_MAX_QUEUES		(1 << UBLK_MAX_QUEUES_SHIFT)
+#define UBLK_MAX_THREADS_SHIFT	5
+#define UBLK_MAX_THREADS	(1 << UBLK_MAX_THREADS_SHIFT)
 #define UBLK_QUEUE_DEPTH	1024

 #define UBLK_DBG_DEV		(1U << 0)
-#define UBLK_DBG_QUEUE		(1U << 1)
+#define UBLK_DBG_THREAD		(1U << 1)
 #define UBLK_DBG_IO_CMD		(1U << 2)
 #define UBLK_DBG_IO		(1U << 3)
 #define UBLK_DBG_CTRL_CMD	(1U << 4)

@@ -61,6 +64,7 @@

 struct ublk_dev;
 struct ublk_queue;
+struct ublk_thread;

 struct stripe_ctx {
 	/* stripe */

@@ -76,6 +80,7 @@ struct dev_ctx {
 	char tgt_type[16];
 	unsigned long flags;
 	unsigned nr_hw_queues;
+	unsigned short nthreads;
 	unsigned queue_depth;
 	int dev_id;
 	int nr_files;

@@ -85,6 +90,7 @@ struct dev_ctx {
 	unsigned int fg:1;
 	unsigned int recovery:1;
 	unsigned int auto_zc_fallback:1;
+	unsigned int per_io_tasks:1;

 	int _evtfd;
 	int _shmid;

@@ -123,10 +129,14 @@ struct ublk_io {
 	unsigned short flags;
 	unsigned short refs;	/* used by target code only */

+	int tag;
+
 	int result;

+	unsigned short buf_index;
 	unsigned short tgt_ios;
 	void *private_data;
+	struct ublk_thread *t;
 };

 struct ublk_tgt_ops {

@@ -165,28 +175,39 @@ struct ublk_tgt {
 struct ublk_queue {
 	int q_id;
 	int q_depth;
-	unsigned int cmd_inflight;
-	unsigned int io_inflight;
 	struct ublk_dev *dev;
 	const struct ublk_tgt_ops *tgt_ops;
 	struct ublksrv_io_desc *io_cmd_buf;
-	struct io_uring ring;
 	struct ublk_io ios[UBLK_QUEUE_DEPTH];
-#define UBLKSRV_QUEUE_STOPPING	(1U << 0)
-#define UBLKSRV_QUEUE_IDLE	(1U << 1)
 #define UBLKSRV_NO_BUF		(1U << 2)
 #define UBLKSRV_ZC		(1U << 3)
 #define UBLKSRV_AUTO_BUF_REG	(1U << 4)
 #define UBLKSRV_AUTO_BUF_REG_FALLBACK	(1U << 5)
 	unsigned state;
-	pid_t tid;
+};
+
+struct ublk_thread {
+	struct ublk_dev *dev;
+	struct io_uring ring;
+	unsigned int cmd_inflight;
+	unsigned int io_inflight;
+
 	pthread_t thread;
+	unsigned idx;
+
+#define UBLKSRV_THREAD_STOPPING	(1U << 0)
+#define UBLKSRV_THREAD_IDLE	(1U << 1)
+	unsigned state;
 };

 struct ublk_dev {
 	struct ublk_tgt tgt;
 	struct ublksrv_ctrl_dev_info dev_info;
 	struct ublk_queue q[UBLK_MAX_QUEUES];
+	struct ublk_thread threads[UBLK_MAX_THREADS];
+	unsigned nthreads;
+	unsigned per_io_tasks;

 	int fds[MAX_BACK_FILES + 1];	/* fds[0] points to /dev/ublkcN */
 	int nr_fds;
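The hunk above is the core of the decoupling: the io_uring ring and both in-flight counters move from struct ublk_queue into the new struct ublk_thread, struct ublk_dev gains a threads[] array plus nthreads/per_io_tasks, and every struct ublk_io now records its own tag and a pointer t to the thread that services it. How the ios of a queue get distributed over threads is decided elsewhere in kublk.c and is not shown here; purely as an illustration, a round-robin assignment over a queue's tags (the function name below is made up for the sketch, only the struct fields come from the diff) could look like:

/*
 * Illustrative sketch only: spread the ios of one queue across the
 * device's I/O handler threads. The policy actually used by kublk is
 * implemented in kublk.c and is not part of this hunk.
 */
static void assign_io_threads(struct ublk_dev *dev, struct ublk_queue *q)
{
	int tag;

	for (tag = 0; tag < q->q_depth; tag++) {
		struct ublk_io *io = &q->ios[tag];

		io->tag = tag;
		/* round-robin: consecutive tags land on consecutive threads */
		io->t = &dev->threads[(q->q_id + tag) % dev->nthreads];
	}
}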
@@ -211,7 +232,7 @@ struct ublk_dev {


 extern unsigned int ublk_dbg_mask;
-extern int ublk_queue_io_cmd(struct ublk_queue *q, struct ublk_io *io, unsigned tag);
+extern int ublk_queue_io_cmd(struct ublk_io *io);


 static inline int ublk_io_auto_zc_fallback(const struct ublksrv_io_desc *iod)

@@ -225,11 +246,14 @@ static inline int is_target_io(__u64 user_data)
 }

 static inline __u64 build_user_data(unsigned tag, unsigned op,
-		unsigned tgt_data, unsigned is_target_io)
+		unsigned tgt_data, unsigned q_id, unsigned is_target_io)
 {
-	assert(!(tag >> 16) && !(op >> 8) && !(tgt_data >> 16));
+	/* we only have 7 bits to encode q_id */
+	_Static_assert(UBLK_MAX_QUEUES_SHIFT <= 7);
+	assert(!(tag >> 16) && !(op >> 8) && !(tgt_data >> 16) && !(q_id >> 7));

-	return tag | (op << 16) | (tgt_data << 24) | (__u64)is_target_io << 63;
+	return tag | (op << 16) | (tgt_data << 24) |
+		(__u64)q_id << 56 | (__u64)is_target_io << 63;
 }

 static inline unsigned int user_data_to_tag(__u64 user_data)

@@ -247,6 +271,11 @@ static inline unsigned int user_data_to_tgt_data(__u64 user_data)
 	return (user_data >> 24) & 0xffff;
 }

+static inline unsigned int user_data_to_q_id(__u64 user_data)
+{
+	return (user_data >> 56) & 0x7f;
+}
+
 static inline unsigned short ublk_cmd_op_nr(unsigned int op)
 {
 	return _IOC_NR(op);
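With q_id now carried in the io_uring user_data, the 64-bit layout used by these helpers is: tag in bits 0-15, op in bits 16-23, tgt_data in bits 24-39, q_id in bits 56-62 and the target-io marker in bit 63, leaving bits 40-55 unused. A small self-contained sketch of the same packing and a round trip through the decode masks (rewritten here outside kublk.h, so the bodies are copies of the logic above rather than the selftest header itself):

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* same packing as build_user_data() in the hunk above */
static uint64_t pack(unsigned tag, unsigned op, unsigned tgt_data,
		     unsigned q_id, unsigned is_target_io)
{
	assert(!(tag >> 16) && !(op >> 8) && !(tgt_data >> 16) && !(q_id >> 7));
	return tag | (op << 16) | ((uint64_t)tgt_data << 24) |
	       (uint64_t)q_id << 56 | (uint64_t)is_target_io << 63;
}

int main(void)
{
	uint64_t data = pack(130, 2, 3, 5, 1);

	/* mirrors the user_data_to_*() decode helpers */
	printf("tag=%u op=%u tgt_data=%u q_id=%u tgt_io=%u\n",
	       (unsigned)(data & 0xffff),
	       (unsigned)((data >> 16) & 0xff),
	       (unsigned)((data >> 24) & 0xffff),
	       (unsigned)((data >> 56) & 0x7f),
	       (unsigned)(data >> 63));
	return 0;
}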
@@ -280,17 +309,23 @@ static inline void ublk_dbg(int level, const char *fmt, ...)
 	}
 }

-static inline int ublk_queue_alloc_sqes(struct ublk_queue *q,
+static inline struct ublk_queue *ublk_io_to_queue(const struct ublk_io *io)
+{
+	return container_of(io, struct ublk_queue, ios[io->tag]);
+}
+
+static inline int ublk_io_alloc_sqes(struct ublk_io *io,
 		struct io_uring_sqe *sqes[], int nr_sqes)
 {
-	unsigned left = io_uring_sq_space_left(&q->ring);
+	struct io_uring *ring = &io->t->ring;
+	unsigned left = io_uring_sq_space_left(ring);
 	int i;

 	if (left < nr_sqes)
-		io_uring_submit(&q->ring);
+		io_uring_submit(ring);

 	for (i = 0; i < nr_sqes; i++) {
-		sqes[i] = io_uring_get_sqe(&q->ring);
+		sqes[i] = io_uring_get_sqe(ring);
 		if (!sqes[i])
 			return i;
 	}
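Two things happen in the hunk above: SQE allocation is re-keyed from the queue to the io, so it draws from io->t->ring, the ring of whichever thread owns that io, and ublk_io_to_queue() recovers the owning queue from a bare io pointer by using the freshly added io->tag to name the ios[] slot inside container_of(). A reduced stand-alone illustration of that pointer arithmetic, using toy types rather than the selftest structs:

#include <stddef.h>
#include <stdio.h>

/* kernel-style container_of(), reduced to what this example needs */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct item {
	int tag;		/* index of this slot inside the owner's array */
};

struct owner {
	int id;
	struct item items[4];
};

static struct owner *item_to_owner(const struct item *it)
{
	/* same idea as ublk_io_to_queue(): subtract the offset of items[it->tag] */
	return container_of(it, struct owner, items[it->tag]);
}

int main(void)
{
	struct owner o = { .id = 7 };

	o.items[2].tag = 2;
	printf("owner id = %d\n", item_to_owner(&o.items[2])->id);	/* prints 7 */
	return 0;
}

The non-constant array index inside offsetof()/container_of() relies on the compiler accepting it (GCC and Clang do via __builtin_offsetof), which is what makes the ios[io->tag] form in the hunk work. Keeping one sq-ring per thread also means that two ios of the same queue owned by different threads never contend on the same ring at submission time.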
@@ -373,7 +408,7 @@ static inline int ublk_complete_io(struct ublk_queue *q, unsigned tag, int res)

 	ublk_mark_io_done(io, res);

-	return ublk_queue_io_cmd(q, io, tag);
+	return ublk_queue_io_cmd(io);
 }

 static inline void ublk_queued_tgt_io(struct ublk_queue *q, unsigned tag, int queued)

@@ -383,7 +418,7 @@ static inline void ublk_queued_tgt_io(struct ublk_queue *q, unsigned tag, int queued)
 	else {
 		struct ublk_io *io = ublk_get_io(q, tag);

-		q->io_inflight += queued;
+		io->t->io_inflight += queued;
 		io->tgt_ios = queued;
 		io->result = 0;
 	}

@@ -393,7 +428,7 @@ static inline int ublk_completed_tgt_io(struct ublk_queue *q, unsigned tag)
 {
 	struct ublk_io *io = ublk_get_io(q, tag);

-	q->io_inflight--;
+	io->t->io_inflight--;

 	return --io->tgt_ios == 0;
 }
@@ -43,7 +43,7 @@ static int ublk_null_tgt_init(const struct dev_ctx *ctx, struct ublk_dev *dev)
 }

 static void __setup_nop_io(int tag, const struct ublksrv_io_desc *iod,
-		struct io_uring_sqe *sqe)
+		struct io_uring_sqe *sqe, int q_id)
 {
 	unsigned ublk_op = ublksrv_get_op(iod);

@@ -52,7 +52,7 @@ static void __setup_nop_io(int tag, const struct ublksrv_io_desc *iod,
 	sqe->flags |= IOSQE_FIXED_FILE;
 	sqe->rw_flags = IORING_NOP_FIXED_BUFFER | IORING_NOP_INJECT_RESULT;
 	sqe->len = iod->nr_sectors << 9; /* injected result */
-	sqe->user_data = build_user_data(tag, ublk_op, 0, 1);
+	sqe->user_data = build_user_data(tag, ublk_op, 0, q_id, 1);
 }

 static int null_queue_zc_io(struct ublk_queue *q, int tag)

@@ -60,18 +60,18 @@ static int null_queue_zc_io(struct ublk_queue *q, int tag)
 	const struct ublksrv_io_desc *iod = ublk_get_iod(q, tag);
 	struct io_uring_sqe *sqe[3];

-	ublk_queue_alloc_sqes(q, sqe, 3);
+	ublk_io_alloc_sqes(ublk_get_io(q, tag), sqe, 3);

-	io_uring_prep_buf_register(sqe[0], 0, tag, q->q_id, tag);
+	io_uring_prep_buf_register(sqe[0], 0, tag, q->q_id, ublk_get_io(q, tag)->buf_index);
 	sqe[0]->user_data = build_user_data(tag,
-			ublk_cmd_op_nr(sqe[0]->cmd_op), 0, 1);
+			ublk_cmd_op_nr(sqe[0]->cmd_op), 0, q->q_id, 1);
 	sqe[0]->flags |= IOSQE_CQE_SKIP_SUCCESS | IOSQE_IO_HARDLINK;

-	__setup_nop_io(tag, iod, sqe[1]);
+	__setup_nop_io(tag, iod, sqe[1], q->q_id);
 	sqe[1]->flags |= IOSQE_IO_HARDLINK;

-	io_uring_prep_buf_unregister(sqe[2], 0, tag, q->q_id, tag);
-	sqe[2]->user_data = build_user_data(tag, ublk_cmd_op_nr(sqe[2]->cmd_op), 0, 1);
+	io_uring_prep_buf_unregister(sqe[2], 0, tag, q->q_id, ublk_get_io(q, tag)->buf_index);
+	sqe[2]->user_data = build_user_data(tag, ublk_cmd_op_nr(sqe[2]->cmd_op), 0, q->q_id, 1);

 	// buf register is marked as IOSQE_CQE_SKIP_SUCCESS
 	return 2;

@@ -82,8 +82,8 @@ static int null_queue_auto_zc_io(struct ublk_queue *q, int tag)
 	const struct ublksrv_io_desc *iod = ublk_get_iod(q, tag);
 	struct io_uring_sqe *sqe[1];

-	ublk_queue_alloc_sqes(q, sqe, 1);
-	__setup_nop_io(tag, iod, sqe[0]);
+	ublk_io_alloc_sqes(ublk_get_io(q, tag), sqe, 1);
+	__setup_nop_io(tag, iod, sqe[0], q->q_id);
 	return 1;
 }

@@ -136,7 +136,7 @@ static unsigned short ublk_null_buf_index(const struct ublk_queue *q, int tag)
 {
 	if (q->state & UBLKSRV_AUTO_BUF_REG_FALLBACK)
 		return (unsigned short)-1;
-	return tag;
+	return q->ios[tag].buf_index;
 }

 const struct ublk_tgt_ops null_tgt_ops = {
@@ -138,13 +138,13 @@ static int stripe_queue_tgt_rw_io(struct ublk_queue *q, const struct ublksrv_io_desc *iod, int tag)
 	io->private_data = s;
 	calculate_stripe_array(conf, iod, s, base);

-	ublk_queue_alloc_sqes(q, sqe, s->nr + extra);
+	ublk_io_alloc_sqes(ublk_get_io(q, tag), sqe, s->nr + extra);

 	if (zc) {
-		io_uring_prep_buf_register(sqe[0], 0, tag, q->q_id, tag);
+		io_uring_prep_buf_register(sqe[0], 0, tag, q->q_id, io->buf_index);
 		sqe[0]->flags |= IOSQE_CQE_SKIP_SUCCESS | IOSQE_IO_HARDLINK;
 		sqe[0]->user_data = build_user_data(tag,
-			ublk_cmd_op_nr(sqe[0]->cmd_op), 0, 1);
+			ublk_cmd_op_nr(sqe[0]->cmd_op), 0, q->q_id, 1);
 	}

 	for (i = zc; i < s->nr + extra - zc; i++) {

@@ -162,13 +162,14 @@ static int stripe_queue_tgt_rw_io(struct ublk_queue *q, const struct ublksrv_io_desc *iod, int tag)
 			sqe[i]->flags |= IOSQE_IO_HARDLINK;
 		}
 		/* bit63 marks us as tgt io */
-		sqe[i]->user_data = build_user_data(tag, ublksrv_get_op(iod), i - zc, 1);
+		sqe[i]->user_data = build_user_data(tag, ublksrv_get_op(iod), i - zc, q->q_id, 1);
 	}
 	if (zc) {
 		struct io_uring_sqe *unreg = sqe[s->nr + 1];

-		io_uring_prep_buf_unregister(unreg, 0, tag, q->q_id, tag);
-		unreg->user_data = build_user_data(tag, ublk_cmd_op_nr(unreg->cmd_op), 0, 1);
+		io_uring_prep_buf_unregister(unreg, 0, tag, q->q_id, io->buf_index);
+		unreg->user_data = build_user_data(
+			tag, ublk_cmd_op_nr(unreg->cmd_op), 0, q->q_id, 1);
 	}

 	/* register buffer is skip_success */

@@ -181,11 +182,11 @@ static int handle_flush(struct ublk_queue *q, const struct ublksrv_io_desc *iod, int tag)
 	struct io_uring_sqe *sqe[NR_STRIPE];
 	int i;

-	ublk_queue_alloc_sqes(q, sqe, conf->nr_files);
+	ublk_io_alloc_sqes(ublk_get_io(q, tag), sqe, conf->nr_files);
 	for (i = 0; i < conf->nr_files; i++) {
 		io_uring_prep_fsync(sqe[i], i + 1, IORING_FSYNC_DATASYNC);
 		io_uring_sqe_set_flags(sqe[i], IOSQE_FIXED_FILE);
-		sqe[i]->user_data = build_user_data(tag, UBLK_IO_OP_FLUSH, 0, 1);
+		sqe[i]->user_data = build_user_data(tag, UBLK_IO_OP_FLUSH, 0, q->q_id, 1);
 	}
 	return conf->nr_files;
 }
@@ -278,6 +278,11 @@ __run_io_and_remove()
 	fio --name=job1 --filename=/dev/ublkb"${dev_id}" --ioengine=libaio \
 		--rw=randrw --norandommap --iodepth=256 --size="${size}" --numjobs="$(nproc)" \
 		--runtime=20 --time_based > /dev/null 2>&1 &
+	fio --name=batchjob --filename=/dev/ublkb"${dev_id}" --ioengine=io_uring \
+		--rw=randrw --norandommap --iodepth=256 --size="${size}" \
+		--numjobs="$(nproc)" --runtime=20 --time_based \
+		--iodepth_batch_submit=32 --iodepth_batch_complete_min=32 \
+		--force_async=7 > /dev/null 2>&1 &
 	sleep 2
 	if [ "${kill_server}" = "yes" ]; then
 		local state
tools/testing/selftests/ublk/test_generic_12.sh (new executable file, 55 lines)
@@ -0,0 +1,55 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+. "$(cd "$(dirname "$0")" && pwd)"/test_common.sh
+
+TID="generic_12"
+ERR_CODE=0
+
+if ! _have_program bpftrace; then
+	exit "$UBLK_SKIP_CODE"
+fi
+
+_prep_test "null" "do imbalanced load, it should be balanced over I/O threads"
+
+NTHREADS=6
+dev_id=$(_add_ublk_dev -t null -q 4 -d 16 --nthreads $NTHREADS --per_io_tasks)
+_check_add_dev $TID $?
+
+dev_t=$(_get_disk_dev_t "$dev_id")
+bpftrace trace/count_ios_per_tid.bt "$dev_t" > "$UBLK_TMP" 2>&1 &
+btrace_pid=$!
+sleep 2
+
+if ! kill -0 "$btrace_pid" > /dev/null 2>&1; then
+	_cleanup_test "null"
+	exit "$UBLK_SKIP_CODE"
+fi
+
+# do imbalanced I/O on the ublk device
+# pin to cpu 0 to prevent migration/only target one queue
+fio --name=write_seq \
+	--filename=/dev/ublkb"${dev_id}" \
+	--ioengine=libaio --iodepth=16 \
+	--rw=write \
+	--size=512M \
+	--direct=1 \
+	--bs=4k \
+	--cpus_allowed=0 > /dev/null 2>&1
+ERR_CODE=$?
+kill "$btrace_pid"
+wait
+
+# check that every task handles some I/O, even though all I/O was issued
+# from a single CPU. when ublk gets support for round-robin tag
+# allocation, this check can be strengthened to assert that every thread
+# handles the same number of I/Os
+NR_THREADS_THAT_HANDLED_IO=$(grep -c '@' ${UBLK_TMP})
+if [[ $NR_THREADS_THAT_HANDLED_IO -ne $NTHREADS ]]; then
+	echo "only $NR_THREADS_THAT_HANDLED_IO handled I/O! expected $NTHREADS"
+	cat "$UBLK_TMP"
+	ERR_CODE=255
+fi
+
+_cleanup_test "null"
+_show_result $TID $ERR_CODE
@@ -41,5 +41,13 @@ if _have_feature "AUTO_BUF_REG"; then
 fi
 wait
+
+if _have_feature "PER_IO_DAEMON"; then
+	ublk_io_and_remove 8G -t null -q 4 --auto_zc --nthreads 8 --per_io_tasks &
+	ublk_io_and_remove 256M -t loop -q 4 --auto_zc --nthreads 8 --per_io_tasks "${UBLK_BACKFILES[0]}" &
+	ublk_io_and_remove 256M -t stripe -q 4 --auto_zc --nthreads 8 --per_io_tasks "${UBLK_BACKFILES[1]}" "${UBLK_BACKFILES[2]}" &
+	ublk_io_and_remove 8G -t null -q 4 -z --auto_zc --auto_zc_fallback --nthreads 8 --per_io_tasks &
+fi
+wait

 _cleanup_test "stress"
 _show_result $TID $ERR_CODE
@@ -38,6 +38,13 @@ if _have_feature "AUTO_BUF_REG"; then
 	ublk_io_and_kill_daemon 256M -t stripe -q 4 --auto_zc "${UBLK_BACKFILES[1]}" "${UBLK_BACKFILES[2]}" &
 	ublk_io_and_kill_daemon 8G -t null -q 4 -z --auto_zc --auto_zc_fallback &
 fi
+
+if _have_feature "PER_IO_DAEMON"; then
+	ublk_io_and_kill_daemon 8G -t null -q 4 --nthreads 8 --per_io_tasks &
+	ublk_io_and_kill_daemon 256M -t loop -q 4 --nthreads 8 --per_io_tasks "${UBLK_BACKFILES[0]}" &
+	ublk_io_and_kill_daemon 256M -t stripe -q 4 --nthreads 8 --per_io_tasks "${UBLK_BACKFILES[1]}" "${UBLK_BACKFILES[2]}" &
+	ublk_io_and_kill_daemon 8G -t null -q 4 --nthreads 8 --per_io_tasks &
+fi
 wait

 _cleanup_test "stress"
@@ -69,5 +69,12 @@ if _have_feature "AUTO_BUF_REG"; then
 	done
 fi
+
+if _have_feature "PER_IO_DAEMON"; then
+	ublk_io_and_remove 8G -t null -q 4 --nthreads 8 --per_io_tasks -r 1 -i "$reissue" &
+	ublk_io_and_remove 256M -t loop -q 4 --nthreads 8 --per_io_tasks -r 1 -i "$reissue" "${UBLK_BACKFILES[0]}" &
+	ublk_io_and_remove 8G -t null -q 4 --nthreads 8 --per_io_tasks -r 1 -i "$reissue" &
+fi
+wait

 _cleanup_test "stress"
 _show_result $TID $ERR_CODE
tools/testing/selftests/ublk/trace/count_ios_per_tid.bt (new file, 11 lines)
@@ -0,0 +1,11 @@
+/*
+ * Tabulates and prints I/O completions per thread for the given device
+ *
+ * $1: dev_t
+ */
+tracepoint:block:block_rq_complete
+{
+	if (args.dev == $1) {
+		@[tid] = count();
+	}
+}