No description
Find a file
Dave Chinner 8b57b11cca pcpcntrs: fix dying cpu summation race
In commit f689054aac ("percpu_counter: add percpu_counter_sum_all
interface") a race condition between a cpu dying and
percpu_counter_sum() iterating online CPUs was identified. The
solution was to iterate all possible CPUs for summation via
percpu_counter_sum_all().

We recently had a percpu_counter_sum() call in XFS trip over this
same race condition and it fired a debug assert because the
filesystem was unmounting and the counter *should* be zero just
before we destroy it. That was reported here:

https://lore.kernel.org/linux-kernel/20230314090649.326642-1-yebin@huaweicloud.com/

likely as a result of running generic/648 which exercises
filesystems in the presence of CPU online/offline events.

The solution to use percpu_counter_sum_all() is an awful one. We
use percpu counters and percpu_counter_sum() for accurate and
reliable threshold detection for space management, so a summation
race condition during these operations can result in overcommit of
available space and that may result in filesystem shutdowns.

As percpu_counter_sum_all() iterates all possible CPUs rather than
just those online or even those present, the mask can include CPUs
that aren't even installed in the machine, or in the case of
machines that can hot-plug CPU capable nodes, even have physical
sockets present in the machine.

Fundamentally, this race condition is caused by the CPU being
offlined being removed from the cpu_online_mask before the notifier
that cleans up per-cpu state is run. Hence percpu_counter_sum() will
not sum the count for a cpu currently being taken offline,
regardless of whether the notifier has run or not. This is
the root cause of the bug.

The percpu counter notifier iterates all the registered counters,
locks the counter and moves the percpu count to the global sum.
This is serialised against other operations that move the percpu
counter to the global sum as well as percpu_counter_sum() operations
that sum the percpu counts while holding the counter lock.

Hence the notifier is safe to run concurrently with sum operations,
and the only thing we actually need to care about is that
percpu_counter_sum() iterates dying CPUs. That's trivial to do,
and when there are no CPUs dying, it has no addition overhead except
for a cpumask_or() operation.

This change makes percpu_counter_sum() always do the right thing in
the presence of CPU hot unplug events and makes
percpu_counter_sum_all() unnecessary. This, in turn, means that
filesystems like XFS, ext4, and btrfs don't have to work out when
they should use percpu_counter_sum() vs percpu_counter_sum_all() in
their space accounting algorithms

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2023-03-19 10:02:04 -07:00
arch cpumask: re-introduce constant-sized cpumask optimizations 2023-03-05 14:30:34 -08:00
block block-6.3-2023-03-03 2023-03-03 10:21:39 -08:00
certs Kbuild updates for v6.3 2023-02-26 11:53:25 -08:00
crypto Networking changes for 6.3. 2023-02-21 18:24:12 -08:00
Documentation A small set of updates for x86: 2023-03-05 11:27:48 -08:00
drivers This push fixes a regression in the caam driver. 2023-03-05 11:32:30 -08:00
fs xfs: test dir/attr hash when loading module 2023-03-19 09:55:49 -07:00
include cpumask: introduce for_each_cpu_or 2023-03-19 10:02:04 -07:00
init Kbuild updates for v6.3 2023-02-26 11:53:25 -08:00
io_uring io_uring-6.3-2023-03-03 2023-03-03 10:25:29 -08:00
ipc Merge branch 'work.namespace' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2023-02-24 19:20:07 -08:00
kernel A set of updates for the interrupt susbsystem: 2023-03-05 11:19:16 -08:00
lib pcpcntrs: fix dying cpu summation race 2023-03-19 10:02:04 -07:00
LICENSES LICENSES: Add the copyleft-next-0.3.1 license 2022-11-08 15:44:01 +01:00
mm mm: avoid gcc complaint about pointer casting 2023-03-04 14:03:27 -08:00
net nfsd-6.3 fixes: 2023-03-01 11:03:44 -08:00
rust Rust fixes for 6.3-rc1 2023-03-03 14:51:15 -08:00
samples LoongArch changes for v6.3 2023-03-01 09:27:00 -08:00
scripts Remove Intel compiler support 2023-03-05 10:49:37 -08:00
security capability: just use a 'u64' instead of a 'u32[2]' array 2023-03-01 10:01:22 -08:00
sound sound fixes for 6.3-rc1 2023-03-04 10:53:59 -08:00
tools Changes in this cycle were: 2023-03-02 09:45:34 -08:00
usr usr/gen_init_cpio.c: remove unnecessary -1 values from int file 2022-10-03 14:21:44 -07:00
virt KVM/riscv changes for 6.3 2023-02-15 12:33:28 -05:00
.clang-format cpumask: re-introduce constant-sized cpumask optimizations 2023-03-05 14:30:34 -08:00
.cocciconfig
.get_maintainer.ignore
.gitattributes .gitattributes: use 'dts' diff driver for *.dtso files 2023-02-26 15:28:23 +09:00
.gitignore .gitignore: ignore *.cover and *.mbx 2023-02-05 18:51:22 +09:00
.mailmap mailmap: map Dikshita Agarwal's old address to his current one 2023-03-02 21:54:24 -08:00
.rustfmt.toml
COPYING
CREDITS There is no particular theme here - mainly quick hits all over the tree. 2023-02-23 17:55:40 -08:00
Kbuild Kbuild updates for v6.1 2022-10-10 12:00:45 -07:00
Kconfig
MAINTAINERS Adding Christian Brauner as VFS co-maintainer. 2023-03-05 11:11:52 -08:00
Makefile Linux 6.3-rc1 2023-03-05 14:52:03 -08:00
README

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.