Qu Wenruo ae76d8e3e1 btrfs: scrub: fix grouping of read IO
[REGRESSION]
There are several regression reports about scrub performance with the
v6.4 kernel.

On a PCIe 3.0 device, the older v6.3 kernel can scrub at 3GB/s, but
v6.4 only reaches 1GB/s, a drop of roughly 66%.

[CAUSE]
iostat shows very different behavior between the v6.3 and v6.4 kernels:

  Device         r/s      rkB/s   rrqm/s  %rrqm r_await rareq-sz aqu-sz  %util
  nvme0n1p3  9731.00 3425544.00 17237.00  63.92    2.18   352.02  21.18 100.00
  nvme0n1p3 15578.00  993616.00     5.00   0.03    0.09    63.78   1.32 100.00

The upper one is v6.3 while the lower one is v6.4.

There are several obvious differences:

- Very few read merges
  This turns out to be caused by a behavior change: we no longer do blk
  plug/unplug.

- Very low aqu-sz
  This is due to the submit-and-wait behavior of flush_scrub_stripes(),
  and the extra extent/csum tree searches.

Neither behavior is very noticeable on SATA SSDs: they have NCQ to merge
the reads, and they cannot handle high queue depths well anyway.

[FIX]
For now this patch focuses on fixing the read speed. Dev-replace speed
needs more work.

For the read part, we take two approaches to fix the problems:

- Re-introduce blk plug/unplug to merge read requests
  This is pretty simple, and the effect is easy to observe.

  This enlarges the average read request size to 512K.

- Introduce multi-group reads and no longer wait for each group
  Instead of the old behavior, which submits 8 stripes and waits for
  them, we enlarge the total number of stripes to 16 * 8. At 64K per
  stripe this is 8M per device, the same as the old scrub's in-flight
  bio size limit.

  Now, every time we fill a group (8 stripes), we submit it and
  continue with the next stripes.

  Only when all 16 * 8 stripes are filled do we submit the remaining
  ones (the last group) and wait for all groups to finish. Then we
  submit the repair writes and dev-replace writes.

  This should increase the queue depth.

This greatly improves the merge rate (and thus the read request size)
and the queue depth:

Before (with the regression, and with the cached extent/csum path):

 Device         r/s      rkB/s   rrqm/s  %rrqm r_await rareq-sz aqu-sz  %util
 nvme0n1p3 20666.00 1318240.00    10.00   0.05    0.08    63.79   1.63 100.00

After (with all patches applied):

 Device         r/s      rkB/s   rrqm/s  %rrqm r_await rareq-sz aqu-sz  %util
 nvme0n1p3  5165.00 2278304.00 30557.00  85.54    0.55   441.10   2.81 100.00

i.e. from 1287 MB/s to 2224 MB/s.

CC: stable@vger.kernel.org # 6.4+
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-08-21 14:54:49 +02:00
arch - Use LEA ...%rsp instead of ADD %rsp in the Zen1/2 SRSO return sequence 2023-08-19 10:46:02 +02:00
block block-6.5-2023-08-19 2023-08-19 17:31:46 +02:00
certs KEYS: Add missing function documentation 2023-04-24 16:15:52 +03:00
crypto crypto: algif_hash - Fix race between MORE and non-MORE sends 2023-07-08 22:48:42 +10:00
Documentation Usual set of driver fixes. A bit more than usual because I was 2023-08-19 19:22:41 +02:00
drivers TTY/Serial fixes for 6.5-rc7 2023-08-20 08:26:51 +02:00
fs btrfs: scrub: fix grouping of read IO 2023-08-21 14:54:49 +02:00
include btrfs: remove v0 extent handling 2023-08-21 14:54:48 +02:00
init Kbuild updates for v6.5 2023-07-01 09:24:31 -07:00
io_uring io_uring/parisc: Adjust pgoff in io_uring mmap() for parisc 2023-08-08 12:37:01 -06:00
ipc Merge branch 'work.namespace' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2023-02-24 19:20:07 -08:00
kernel Power management fixes for 6.5-rc6 2023-08-11 12:24:22 -07:00
lib 14 hotfixes. 11 of these are cc:stable and the remainder address post-6.4 2023-08-11 14:19:20 -07:00
LICENSES LICENSES: Add the copyleft-next-0.3.1 license 2022-11-08 15:44:01 +01:00
mm mm: remove folio_account_redirty 2023-08-21 14:52:16 +02:00
net Including fixes from ipsec and netfilter. 2023-08-18 06:52:23 +02:00
rust rust: macros: vtable: fix HAS_* redefinition (gen_const_name) 2023-08-09 21:15:07 +02:00
samples arm64: ftrace: Add direct call trampoline samples support 2023-07-10 17:51:54 -04:00
scripts Kbuild fixes for v6.5 (2nd) 2023-08-13 08:56:24 -07:00
security sysctl: set variable key_sysctls storage-class-specifier to static 2023-08-07 17:55:54 +00:00
sound ALSA: hda/realtek - Remodified 3k pull low procedure 2023-08-16 14:20:27 +02:00
tools - Use LEA ...%rsp instead of ADD %rsp in the Zen1/2 SRSO return sequence 2023-08-19 10:46:02 +02:00
usr initramfs: Encode dependency on KBUILD_BUILD_TIMESTAMP 2023-06-06 17:54:49 +09:00
virt KVM: Grab a reference to KVM for VM and vCPU stats file descriptors 2023-07-29 11:05:28 -04:00
.clang-format iommu: Add for_each_group_device() 2023-05-23 08:15:51 +02:00
.cocciconfig
.get_maintainer.ignore get_maintainer: add Alan to .get_maintainer.ignore 2022-08-20 15:17:44 -07:00
.gitattributes .gitattributes: set diff driver for Rust source code files 2023-05-31 17:48:25 +02:00
.gitignore Revert ".gitignore: ignore *.cover and *.mbx" 2023-07-04 15:05:12 -07:00
.mailmap mailmap: add entries for Simon Horman 2023-08-16 09:53:10 +01:00
.rustfmt.toml rust: add .rustfmt.toml 2022-09-28 09:02:20 +02:00
COPYING
CREDITS - Address -Wmissing-prototype warnings 2023-06-26 16:43:54 -07:00
Kbuild Kbuild updates for v6.1 2022-10-10 12:00:45 -07:00
Kconfig
MAINTAINERS TTY/Serial fixes for 6.5-rc7 2023-08-20 08:26:51 +02:00
Makefile Linux 6.5-rc7 2023-08-20 15:02:52 +02:00
README

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result from upgrading your kernel.