linux/fs
Hao Xu 8bad28d8a3 io_uring: don't hold uring_lock when calling io_run_task_work*
Abaci reported the below issue:
[  141.400455] hrtimer: interrupt took 205853 ns
[  189.869316] process 'usr/local/ilogtail/ilogtail_0.16.26' started with executable stack
[  250.188042]
[  250.188327] ============================================
[  250.189015] WARNING: possible recursive locking detected
[  250.189732] 5.11.0-rc4 #1 Not tainted
[  250.190267] --------------------------------------------
[  250.190917] a.out/7363 is trying to acquire lock:
[  250.191506] ffff888114dbcbe8 (&ctx->uring_lock){+.+.}-{3:3}, at: __io_req_task_submit+0x29/0xa0
[  250.192599]
[  250.192599] but task is already holding lock:
[  250.193309] ffff888114dbfbe8 (&ctx->uring_lock){+.+.}-{3:3}, at: __x64_sys_io_uring_register+0xad/0x210
[  250.194426]
[  250.194426] other info that might help us debug this:
[  250.195238]  Possible unsafe locking scenario:
[  250.195238]
[  250.196019]        CPU0
[  250.196411]        ----
[  250.196803]   lock(&ctx->uring_lock);
[  250.197420]   lock(&ctx->uring_lock);
[  250.197966]
[  250.197966]  *** DEADLOCK ***
[  250.197966]
[  250.198837]  May be due to missing lock nesting notation
[  250.198837]
[  250.199780] 1 lock held by a.out/7363:
[  250.200373]  #0: ffff888114dbfbe8 (&ctx->uring_lock){+.+.}-{3:3}, at: __x64_sys_io_uring_register+0xad/0x210
[  250.201645]
[  250.201645] stack backtrace:
[  250.202298] CPU: 0 PID: 7363 Comm: a.out Not tainted 5.11.0-rc4 #1
[  250.203144] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[  250.203887] Call Trace:
[  250.204302]  dump_stack+0xac/0xe3
[  250.204804]  __lock_acquire+0xab6/0x13a0
[  250.205392]  lock_acquire+0x2c3/0x390
[  250.205928]  ? __io_req_task_submit+0x29/0xa0
[  250.206541]  __mutex_lock+0xae/0x9f0
[  250.207071]  ? __io_req_task_submit+0x29/0xa0
[  250.207745]  ? 0xffffffffa0006083
[  250.208248]  ? __io_req_task_submit+0x29/0xa0
[  250.208845]  ? __io_req_task_submit+0x29/0xa0
[  250.209452]  ? __io_req_task_submit+0x5/0xa0
[  250.210083]  __io_req_task_submit+0x29/0xa0
[  250.210687]  io_async_task_func+0x23d/0x4c0
[  250.211278]  task_work_run+0x89/0xd0
[  250.211884]  io_run_task_work_sig+0x50/0xc0
[  250.212464]  io_sqe_files_unregister+0xb2/0x1f0
[  250.213109]  __io_uring_register+0x115a/0x1750
[  250.213718]  ? __x64_sys_io_uring_register+0xad/0x210
[  250.214395]  ? __fget_files+0x15a/0x260
[  250.214956]  __x64_sys_io_uring_register+0xbe/0x210
[  250.215620]  ? trace_hardirqs_on+0x46/0x110
[  250.216205]  do_syscall_64+0x2d/0x40
[  250.216731]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  250.217455] RIP: 0033:0x7f0fa17e5239
[  250.218034] Code: 01 00 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05  3d 01 f0 ff ff 73 01 c3 48 8b 0d 27 ec 2c 00 f7 d8 64 89 01 48
[  250.220343] RSP: 002b:00007f0fa1eeac48 EFLAGS: 00000246 ORIG_RAX: 00000000000001ab
[  250.221360] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f0fa17e5239
[  250.222272] RDX: 0000000000000000 RSI: 0000000000000003 RDI: 0000000000000008
[  250.223185] RBP: 00007f0fa1eeae20 R08: 0000000000000000 R09: 0000000000000000
[  250.224091] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[  250.224999] R13: 0000000000021000 R14: 0000000000000000 R15: 00007f0fa1eeb700

This is caused by calling io_run_task_work_sig() to do work under
uring_lock while the caller io_sqe_files_unregister() already held
uring_lock.
To fix this issue, briefly drop uring_lock when calling
io_run_task_work_sig(), and there are two things to concern:

- hold uring_lock in io_ring_ctx_free() around io_sqe_files_unregister()
    this is for consistency of lock/unlock.
- add new fixed rsrc ref node before dropping uring_lock
    it's not safe to do io_uring_enter-->percpu_ref_get() with a dying one.
- check if rsrc_data->refs is dying to avoid parallel io_sqe_files_unregister

Reported-by: Abaci <abaci@linux.alibaba.com>
Fixes: 1ffc54220c ("io_uring: fix io_sqe_files_unregister() hangs")
Suggested-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
[axboe: fixes from Pavel folded in]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-20 19:02:12 -07:00
..
9p 9p for 5.11-rc1 2020-12-21 10:28:02 -08:00
adfs Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-10-24 12:26:05 -07:00
affs Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-10-24 12:26:05 -07:00
afs afs: Fix directory entry size calculation 2021-01-04 12:25:19 +00:00
autofs file: Replace ksys_close with close_fd 2020-12-10 12:42:59 -06:00
befs
bfs bfs: don't use WARNING: string when it's just info. 2020-12-15 22:46:18 -08:00
btrfs for-5.11-rc5-tag 2021-01-29 13:54:40 -08:00
cachefiles cachefiles: Drop superfluous readpages aops NULL check 2021-01-20 11:33:51 -08:00
ceph libceph, ceph: disambiguate ceph_connection_operations handlers 2021-01-04 17:31:32 +01:00
cifs cifs: fix dfs domain referrals 2021-01-28 21:40:43 -06:00
coda
configfs configfs: fix kernel-doc markup issue 2020-11-14 10:22:45 +01:00
cramfs
crypto f2fs-for-5.11-rc1 2020-12-17 11:18:00 -08:00
debugfs debugfs: remove return value of debugfs_create_devm_seqfile() 2020-10-30 08:37:39 +01:00
devpts
dlm fs: dlm: check on existing node address 2020-11-10 12:14:20 -06:00
ecryptfs ecryptfs: fix uid translation for setxattr on security.capability 2021-01-26 01:47:14 +00:00
efivarfs efivarfs: revert "fix memory leak in efivarfs_create()" 2020-11-25 16:55:02 +01:00
efs
erofs erofs: avoid using generic_block_bmap 2020-12-10 11:07:40 +08:00
exfat exfat: Avoid allocating upcase table using kcalloc() 2020-12-22 12:31:17 +09:00
exportfs exportfs: Add a function to return the raw output from fh_to_dentry() 2020-12-09 09:39:38 -05:00
ext2 ext2: Fix fall-through warnings for Clang 2020-11-23 10:36:53 +01:00
ext4 A number of bug fixes for ext4: 2021-01-15 14:54:24 -08:00
f2fs f2fs-for-5.11-rc1 2020-12-17 11:18:00 -08:00
fat
freevxfs
fscache
fuse fuse: fix bad inode 2020-12-10 15:33:14 +01:00
gfs2 gfs2: in signal_our_withdraw wait for unfreeze of _this_ fs only 2020-12-03 17:04:41 +01:00
hfs fs: Replace zero-length array with flexible-array member 2020-10-29 17:22:59 -05:00
hfsplus fs: Replace zero-length array with flexible-array member 2020-10-29 17:22:59 -05:00
hostfs fix hostfs_open() use of ->f_path.dentry 2020-12-21 21:42:29 -05:00
hpfs
hugetlbfs
iomap mm: memcontrol: Use helpers to read page's memcg data 2020-12-02 18:28:05 -08:00
isofs fs: Replace zero-length array with flexible-array member 2020-10-29 17:22:59 -05:00
jbd2 jbd2: add a helper to find out number of fast commit blocks 2020-12-17 13:30:45 -05:00
jffs2 jffs2: Fix NULL pointer dereference in rp_size fs option parsing 2020-12-13 21:57:21 +01:00
jfs jfs: Fix array index bounds check in dbAdjTree 2020-11-13 16:03:07 -06:00
kernfs kernfs: wire up ->splice_read and ->splice_write 2021-01-21 18:30:28 +01:00
lockd fs/lockd: convert comma to semicolon 2020-12-16 07:57:37 -05:00
minix
nfs pNFS/NFSv4: Improve rejection of out-of-order layouts 2021-01-24 20:52:31 -05:00
nfs_common nfs_common: need lock during iterate through the list 2020-12-09 09:38:34 -05:00
nfsd nfsd4: readdirplus shouldn't return parent of export 2021-01-12 08:54:14 -05:00
nilfs2 fs/nilfs2: remove some unused macros to tame gcc 2020-12-15 22:46:17 -08:00
nls
notify fanotify: Fix sys_fanotify_mark() on native x86-32 2020-12-28 11:58:59 +01:00
ntfs fs/ntfs: remove unused variable attr_len 2020-12-15 12:13:37 -08:00
ocfs2 ocfs2: ratelimit the 'max lookup times reached' notice 2020-12-15 12:13:37 -08:00
omfs
openpromfs
orangefs orangefs: add splice file operations 2020-12-16 16:14:08 -05:00
overlayfs ovl: unprivieged mounts 2020-12-14 15:26:14 +01:00
proc proc: don't allow async path resolution of /proc/thread-self components 2021-02-15 11:02:16 -07:00
pstore Tracing updates for 5.11 2020-12-17 13:22:17 -08:00
qnx4
qnx6
quota \n 2020-12-17 11:00:37 -08:00
ramfs ramfs: fix nommu mmap with gaps in the page cache 2020-10-16 11:11:22 -07:00
reiserfs reiserfs: add check for an invalid ih_entry_count 2020-11-26 16:57:28 +01:00
romfs Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-10-24 12:26:05 -07:00
squashfs Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-10-24 12:26:05 -07:00
sysfs
sysv
tracefs
ubifs This pull request contains changes for JFFS2, UBI and UBIFS: 2020-12-17 17:46:34 -08:00
udf udf: fix the problem that the disc content is not displayed 2021-01-18 12:06:33 +01:00
ufs Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-10-24 12:26:05 -07:00
unicode
vboxsf
verity Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2020-12-14 12:18:19 -08:00
xfs New code for 5.11: 2020-12-18 12:50:18 -08:00
zonefs zonefs: select CONFIG_CRC32 2021-01-04 09:06:42 +09:00
aio.c Merge branch 'akpm' (patches from Andrew) 2020-12-15 12:53:37 -08:00
anon_inodes.c
attr.c
bad_inode.c
binfmt_aout.c
binfmt_elf.c Merge branch 'exec-for-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2020-12-15 19:29:43 -08:00
binfmt_elf_fdpic.c binfmt_elf, binfmt_elf_fdpic: use a VMA list snapshot 2020-10-16 11:11:21 -07:00
binfmt_em86.c
binfmt_flat.c
binfmt_misc.c
binfmt_script.c
block_dev.c Revert "block: simplify set_init_blocksize" to regain lost performance 2021-01-27 09:14:12 -07:00
buffer.c for-5.11/block-2020-12-14 2020-12-16 12:57:51 -08:00
char_dev.c
compat_binfmt_elf.c elf: Expose ELF header on arch_setup_additional_pages() 2020-10-26 13:46:47 +01:00
coredump.c Merge branch 'exec-for-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2020-12-15 19:29:43 -08:00
d_path.c
dax.c mm: simplify follow_pte{,pmd} 2020-12-15 22:46:19 -08:00
dcache.c fs: Kill DCACHE_DONTCACHE dentry even if DCACHE_REFERENCED is set 2020-12-10 17:33:17 -05:00
dcookies.c
direct-io.c
drop_caches.c
eventfd.c eventfd: Export eventfd_ctx_do_read() 2020-11-15 09:49:10 -05:00
eventpoll.c epoll: add syscall epoll_pwait2 2020-12-19 11:18:38 -08:00
exec.c Merge branch 'parisc-5.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux 2020-12-16 12:10:40 -08:00
fcntl.c fcntl: Fix potential deadlock in send_sig{io, urg}() 2020-11-05 07:44:15 -05:00
fhandle.c
file.c fs: provide locked helper variant of close_fd_get_file() 2021-02-01 10:02:42 -07:00
file_table.c epoll: take epitem list out of struct file 2020-10-25 20:02:08 -04:00
filesystems.c
fs-writeback.c fs: fix lazytime expiration handling in __writeback_single_inode() 2021-01-13 17:26:21 +01:00
fs_context.c
fs_parser.c
fs_pin.c
fs_struct.c
fs_types.c
fsopen.c
init.c
inode.c Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-12-25 10:54:29 -08:00
internal.h fs: provide locked helper variant of close_fd_get_file() 2021-02-01 10:02:42 -07:00
io-wq.c io-wq: clear out worker ->fs and ->files 2021-02-12 14:02:54 -07:00
io-wq.h io_uring: provide FIFO ordering for task_work 2021-02-10 07:28:43 -07:00
io_uring.c io_uring: don't hold uring_lock when calling io_run_task_work* 2021-02-20 19:02:12 -07:00
ioctl.c
Kconfig
Kconfig.binfmt
kernel_read_file.c
libfs.c f2fs-for-5.11-rc1 2020-12-17 11:18:00 -08:00
locks.c Merge branch 'exec-for-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2020-12-15 19:29:43 -08:00
Makefile Refactored code for 5.10: 2020-10-23 11:33:41 -07:00
mbcache.c
mount.h
mpage.c
namei.c fs: add support for LOOKUP_CACHED 2021-01-04 11:42:21 -05:00
namespace.c umount(2): move the flag validity checks first 2021-01-04 15:31:58 -05:00
no-block.c
nsfs.c
open.c fs: expose LOOKUP_CACHED through openat2() RESOLVE_CACHED 2021-01-04 11:42:26 -05:00
pipe.c fs/pipe: allow sendfile() to pipe again 2021-01-25 12:32:26 -08:00
pnode.c
pnode.h fs/namespace.c: WARN if mnt_count has become negative 2020-12-10 17:33:17 -05:00
posix_acl.c
proc_namespace.c proc mountinfo: make splice available again 2020-12-27 12:00:36 -08:00
read_write.c Refactored code for 5.10: 2020-10-23 11:33:41 -07:00
readdir.c
remap_range.c vfs: verify source area in vfs_dedupe_file_range_one() 2020-12-14 15:26:13 +01:00
select.c poll: fix performance regression due to out-of-line __put_user() 2021-01-08 11:06:29 -08:00
seq_file.c fix return values of seq_read_iter() 2020-11-15 22:12:53 -05:00
signalfd.c
splice.c io_uring-5.10-2020-10-24 2020-10-24 12:40:18 -07:00
stack.c
stat.c
statfs.c block: remove i_bdev 2020-12-01 14:53:39 -07:00
super.c block: remove i_bdev 2020-12-01 14:53:39 -07:00
sync.c
timerfd.c
userfaultfd.c userfaultfd: add user-mode only option to unprivileged_userfaultfd sysctl knob 2020-12-15 12:13:46 -08:00
utimes.c
xattr.c vfs: move cap_convert_nscap() call into vfs_setxattr() 2020-12-14 15:26:13 +01:00