linux/drivers/md
Yufen Yu 8c24259323 md/raid1: exit sync request if MD_RECOVERY_INTR is set
We hit a stuck sync thread with the following stack:

 raid1_sync_request+0x2c9/0xb50
 md_do_sync+0x983/0xfa0
 md_thread+0x11c/0x160
 kthread+0x111/0x130
 ret_from_fork+0x35/0x40
 0xffffffffffffffff

At the same time, there is a stuck mdadm thread (mdadm --manage
/dev/md2 --add /dev/sda). It is trying to stop the sync thread:

 kthread_stop+0x42/0xf0
 md_unregister_thread+0x3a/0x70
 md_reap_sync_thread+0x15/0x160
 action_store+0x142/0x2a0
 md_attr_store+0x6c/0xb0
 kernfs_fop_write+0x102/0x180
 __vfs_write+0x33/0x170
 vfs_write+0xad/0x1a0
 SyS_write+0x52/0xc0
 do_syscall_64+0x6e/0x190
 entry_SYSCALL_64_after_hwframe+0x3d/0xa2

Debug tools show that the sync thread is waiting in raise_barrier()
until raid1d() ends all normal IO bios in bio_end_io_list (introduced
in commit 55ce74d4bf). But raid1d() cannot end these bios while the
MD_CHANGE_PENDING bit is set. It needs to take the mddev->reconfig_mutex
lock and then clear the bit in md_check_recovery().
However, the lock is held by mdadm in action_store().

Thus, there is a loop:
mdadm waiting for sync thread to stop, sync thread waiting for
raid1d() to end bios, raid1d() waiting for mdadm to release
mddev->reconfig_mutex lock and then it can end bios.
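
The loop above can be pictured as a wait-for chain that returns to its
starting point. The following is only an illustrative userspace sketch,
not kernel code: the enum labels and the waits_in_cycle() helper are
hypothetical names introduced here, and each party is assumed to block
on at most one other party.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical labels for the three parties named in the commit message. */
enum party { MDADM, SYNC_THREAD, RAID1D, NPARTIES };

/*
 * waits_on[p] is the party that p is blocked on, or -1 if p can run.
 * Returns 1 if following the chain from `start` leads back to `start`.
 */
static int waits_in_cycle(const int waits_on[NPARTIES], enum party start)
{
    int cur = (int)start;

    assert(waits_on != NULL);
    for (int steps = 0; steps < NPARTIES; steps++) {
        cur = waits_on[cur];
        if (cur < 0)
            return 0;   /* chain ends at a party that can run */
        if (cur == (int)start)
            return 1;   /* back where we began: circular wait */
    }
    return 0;           /* no path back to start within NPARTIES hops */
}
```

With mdadm -> sync thread -> raid1d() -> mdadm the helper reports a
cycle; once the sync thread is no longer blocked, the chain ends there
and no cycle is reported.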

Fix this by checking MD_RECOVERY_INTR while waiting in raise_barrier(),
so that the sync thread can exit while mdadm is stopping it.
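
As a rough illustration of the idea, and not the actual patch, the
sketch below models a barrier wait whose exit condition also tests an
interrupt flag. Everything in it is an assumption made for the example:
struct barrier_model, its fields, and raise_barrier_model() are
hypothetical userspace stand-ins, and the busy-wait replaces the
sleeping wait the kernel would use.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the state the sync thread waits on. */
struct barrier_model {
    atomic_int nr_pending;   /* normal I/O still in flight */
    atomic_int barrier;      /* barrier count raised by the sync thread */
    atomic_int interrupted;  /* plays the role of MD_RECOVERY_INTR */
};

/*
 * Raise the barrier, then wait until pending I/O drains OR the
 * interrupt flag is set. Returns 0 on success, or -EINTR if the wait
 * was abandoned, in which case the barrier count is rolled back.
 */
static int raise_barrier_model(struct barrier_model *b)
{
    assert(b != NULL);
    atomic_fetch_add(&b->barrier, 1);

    /* Busy-wait keeps the sketch dependency-free; real code would sleep. */
    while (atomic_load(&b->nr_pending) != 0 &&
           !atomic_load(&b->interrupted))
        ;

    if (atomic_load(&b->interrupted)) {
        atomic_fetch_sub(&b->barrier, 1);
        return -EINTR;
    }
    return 0;
}
```

A caller that sees -EINTR is expected to abandon the sync request
instead of retrying, which is what lets the thread that is stopping it
make progress.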

Fixes: 55ce74d4bf ("md/raid1: ensure device failure recorded before write request returns.")
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Shaohua Li <shli@fb.com>
2018-04-09 08:41:16 -07:00
bcache for-4.17/block-20180402 2018-04-05 14:27:02 -07:00
persistent-data dm bufio: move dm-bufio.h to include/linux/ 2018-04-03 15:04:23 -04:00
dm-bio-prison-v1.c
dm-bio-prison-v1.h
dm-bio-prison-v2.c
dm-bio-prison-v2.h
dm-bio-record.h
dm-bufio.c dm bufio: don't embed a bio in the dm_buffer structure 2018-04-03 15:04:29 -04:00
dm-builtin.c
dm-cache-background-tracker.c dm cache background tracker: limit amount of background work that may be issued at once 2017-11-10 15:45:03 -05:00
dm-cache-background-tracker.h
dm-cache-block-types.h
dm-cache-metadata.c
dm-cache-metadata.h
dm-cache-policy-internal.h
dm-cache-policy-smq.c dm cache policy smq: allocate cache blocks in order 2017-11-10 15:45:05 -05:00
dm-cache-policy.c
dm-cache-policy.h
dm-cache-target.c dm: allow targets to return output from messages they are sent 2018-04-03 15:04:10 -04:00
dm-core.h dm: various cleanups to md->queue initialization code 2018-01-29 13:44:55 -05:00
dm-crypt.c dm crypt: limit the number of allocated pages 2018-04-03 15:04:11 -04:00
dm-delay.c dm: backfill missing calls to mutex_destroy() 2018-01-17 09:16:15 -05:00
dm-era-target.c dm: allow targets to return output from messages they are sent 2018-04-03 15:04:10 -04:00
dm-exception-store.c
dm-exception-store.h
dm-flakey.c dm: remove fmode_t argument from .prepare_ioctl hook 2018-04-04 12:12:39 -04:00
dm-integrity.c dm bufio: move dm-bufio.h to include/linux/ 2018-04-03 15:04:23 -04:00
dm-io.c dm io: remove BIOSET_NEED_RESCUER flag from bios bioset 2017-12-13 12:15:56 -05:00
dm-ioctl.c dm: allow targets to return output from messages they are sent 2018-04-03 15:04:10 -04:00
dm-kcopyd.c dm: backfill missing calls to mutex_destroy() 2018-01-17 09:16:15 -05:00
dm-linear.c dm: remove fmode_t argument from .prepare_ioctl hook 2018-04-04 12:12:39 -04:00
dm-log-userspace-base.c
dm-log-userspace-transfer.c
dm-log-userspace-transfer.h
dm-log-writes.c dm: remove fmode_t argument from .prepare_ioctl hook 2018-04-04 12:12:39 -04:00
dm-log.c
dm-mpath.c dm: remove fmode_t argument from .prepare_ioctl hook 2018-04-04 12:12:39 -04:00
dm-mpath.h
dm-path-selector.c
dm-path-selector.h
dm-queue-length.c dm mpath selector: more evenly distribute ties 2018-01-29 13:44:58 -05:00
dm-raid.c dm raid: fix parse_raid_params() variable range issue 2018-04-04 12:12:37 -04:00
dm-raid1.c md: Convert timers to use timer_setup() 2017-11-14 20:11:57 -07:00
dm-region-hash.c
dm-round-robin.c
dm-rq.c for-linus-20180204 2018-02-04 11:16:35 -08:00
dm-rq.h
dm-service-time.c dm mpath selector: more evenly distribute ties 2018-01-29 13:44:58 -05:00
dm-snap-persistent.c dm bufio: move dm-bufio.h to include/linux/ 2018-04-03 15:04:23 -04:00
dm-snap-transient.c
dm-snap.c dm snapshot: use mutex instead of rw_semaphore 2018-01-17 09:16:14 -05:00
dm-stats.c dm: backfill missing calls to mutex_destroy() 2018-01-17 09:16:15 -05:00
dm-stats.h
dm-stripe.c dm: add support for secure erase forwarding 2018-04-03 15:04:21 -04:00
dm-switch.c dm: remove fmode_t argument from .prepare_ioctl hook 2018-04-04 12:12:39 -04:00
dm-sysfs.c
dm-table.c - DM core passthrough ioctl fix to retain reference to DM table, and 2018-04-06 11:50:19 -07:00
dm-target.c dm: remove unused macro DM_MOD_NAME_SIZE 2018-04-03 15:04:15 -04:00
dm-thin-metadata.c dm thin metadata: THIN_MAX_CONCURRENT_LOCKS should be 6 2018-01-17 09:07:54 -05:00
dm-thin-metadata.h
dm-thin.c dm: allow targets to return output from messages they are sent 2018-04-03 15:04:10 -04:00
dm-uevent.c
dm-uevent.h
dm-unstripe.c dm unstripe: remove unnecessary header includes 2018-04-03 15:04:15 -04:00
dm-verity-fec.c
dm-verity-fec.h
dm-verity-target.c dm: remove fmode_t argument from .prepare_ioctl hook 2018-04-04 12:12:39 -04:00
dm-verity.h dm verity: add 'check_at_most_once' option to only validate hashes once 2018-04-03 15:04:29 -04:00
dm-zero.c
dm-zoned-metadata.c dm: backfill missing calls to mutex_destroy() 2018-01-17 09:16:15 -05:00
dm-zoned-reclaim.c
dm-zoned-target.c dm: remove fmode_t argument from .prepare_ioctl hook 2018-04-04 12:12:39 -04:00
dm-zoned.h
dm.c - DM core passthrough ioctl fix to retain reference to DM table, and 2018-04-06 11:50:19 -07:00
dm.h dm: move dm_table_destroy() to same header as dm_table_create() 2018-01-17 09:16:06 -05:00
Kconfig dm: add unstriped target 2018-01-17 09:16:00 -05:00
Makefile dm: add unstriped target 2018-01-17 09:16:00 -05:00
md-bitmap.c Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md 2017-11-14 16:07:26 -08:00
md-bitmap.h Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md 2017-11-14 16:07:26 -08:00
md-cluster.c
md-cluster.h
md-faulty.c
md-linear.c block: Use blk_queue_flag_*() in drivers instead of queue_flag_*() 2018-03-08 14:13:48 -07:00
md-linear.h Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md 2017-11-14 16:07:26 -08:00
md-multipath.c md-multipath: Use seq_putc() in multipath_status() 2018-02-17 13:00:35 -08:00
md-multipath.h Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md 2017-11-14 16:07:26 -08:00
md.c md-cluster: don't update recovery_offset for faulty device 2018-04-09 08:39:36 -07:00
md.h md: fix md_write_start() deadlock w/o metadata devices 2018-02-18 10:11:59 -08:00
raid0.c block: Use blk_queue_flag_*() in drivers instead of queue_flag_*() 2018-03-08 14:13:48 -07:00
raid0.h
raid1-10.c
raid1.c md/raid1: exit sync request if MD_RECOVERY_INTR is set 2018-04-09 08:41:16 -07:00
raid1.h md: document lifetime of internal rdev pointer. 2018-02-18 10:22:27 -08:00
raid5-cache.c raid5-ppl: PPL support for disks with write-back cache enabled 2018-01-15 14:29:42 -08:00
raid5-log.h raid5-ppl: fix handling flush requests 2018-02-21 09:40:40 -08:00
raid5-ppl.c raid5-ppl: fix handling flush requests 2018-02-21 09:40:40 -08:00
raid5.c for-4.17/block-20180402 2018-04-05 14:27:02 -07:00
raid5.h md: document lifetime of internal rdev pointer. 2018-02-18 10:22:27 -08:00
raid10.c for-4.17/block-20180402 2018-04-05 14:27:02 -07:00
raid10.h md: document lifetime of internal rdev pointer. 2018-02-18 10:22:27 -08:00