Commit graph

174 commits

Author SHA1 Message Date
Kent Overstreet
09b9c72bd4 bcachefs: bch_err_throw()
Add a tracepoint for any time we return an error and unwind.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-02 12:16:35 -04:00
Kent Overstreet
18dad454cd bcachefs: Replace rcu_read_lock() with guards
The new guard(), scoped_guard() allow for more natural code.

Some of the uses with creative flow control have been left.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-01 00:03:12 -04:00
Kent Overstreet
80a160e494 bcachefs: Plumb btree_trans for more locking asserts
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-23 07:59:43 -04:00
Kent Overstreet
eb34365ada bcachefs: Clear should_be_locked before unlock in key_cache_drop()
We're adding new should_be_locked assertions, also add a comment
explaining why clearing should_be_locked is safe here.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-23 07:59:43 -04:00
Kent Overstreet
9469556a5f bcachefs: btree key cache asserts
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21 20:15:02 -04:00
Kent Overstreet
49771a7578 bcachefs: Fix bch2_btree_path_traverse_cached() when paths realloced
btree_key_cache_fill() will allocate and traverse another path (for the
underlying btree), so we can't hold pointers to paths across a call - we
have to pass indices.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-17 18:46:17 -04:00
Kent Overstreet
9180ad2e16 bcachefs: Kill btree_iter.trans
This was planned to be done ages ago, now finally completed; there are
places where we have quite a few btree_trans objects on the stack, so
this reduces stack usage somewhat.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-02 10:24:34 -04:00
Kent Overstreet
1c8f4587d2 bcachefs: do_trace_key_cache_fill()
Reducing stack frame usage; this moves the printbuf out of the main
stack frame.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-02 10:24:34 -04:00
Alan Huang
677bdb7346 bcachefs: Fix deadlock
This fixes two deadlocks:

1.pcpu_alloc_mutex involved one as pointed by syzbot[1]
2.recursion deadlock.

The root cause is that we hold the bc lock during alloc_percpu, fix it
by following the pattern used by __btree_node_mem_alloc().

[1] https://lore.kernel.org/all/66f97d9a.050a0220.6bad9.001d.GAE@google.com/T/

Reported-by: syzbot+fe63f377148a6371a9db@syzkaller.appspotmail.com
Tested-by: syzbot+fe63f377148a6371a9db@syzkaller.appspotmail.com
Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-02-26 19:31:05 -05:00
Kent Overstreet
3539880ef1 bcachefs: Fix rcu imbalance in bch2_fs_btree_key_cache_exit()
Spotted by sparse.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-02-06 22:35:11 -05:00
Kent Overstreet
a858175227 bcachefs: Fix btree_trans_peek_key_cache()
BTREE_ITER_cached_nofill has some tricky corner cases; it's used
internally for iterators that aren't walking the key cache, but need to
be coherent with the key cache.

It tells traverse to look up and lock the key cache entry if present,
but don't create one if it doesn't exist.

That means we have to have a BTREE_ITER_UPTODATE path (because after
traverse the path has to be UPTODATE, or we pop assertions) that doesn't
point to anything (which is the less bad option, taken by the previous
fix).

The previous fix for this path missed an issue that can happen in
bch2_trans_peek_key_cache(): we can't set should_be_locked on a path
that doesn't point to anything and doesn't hold locks.

Fixes: bd5b09727f ("bcachefs: Don't set btree_path to updtodate if we don't fill")
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-21 12:26:25 -05:00
Kent Overstreet
bd5b09727f bcachefs: Don't set btree_path to updtodate if we don't fill
This fixes various locking asserts, and a null ptr deref in
bch2_btree_iter_peek_path().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 23:38:42 -05:00
Kent Overstreet
f908eacc34 bcachefs: fix bch2_btree_key_cache_drop()
When evicting, we shouldn't leave a pointer to the key cache entry lying
around - that screws up btree path asserts we're adding.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 23:38:41 -05:00
Kent Overstreet
861cd0f606 bcachefs: Write lock btree node in key cache fills
this addresses a key cache coherency bug

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 23:38:41 -05:00
Kent Overstreet
07c1a6fa90 bcachefs: trace_key_cache_fill
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-29 13:30:39 -05:00
Kent Overstreet
65b14fa3d8 bcachefs: Assert that we're not violating key cache coherency rules
We're not allowed to have a dirty key in the key cache if the key
doesn't exist at all in the btree - creation has to bypass the key
cache, so that iteration over the btree can check if the key is present
in the key cache.

Things break in subtle ways if cache coherency is broken, so this needs
an assert.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21 01:36:17 -05:00
Kent Overstreet
895fbf1cf0 bcachefs: Use __GFP_ACCOUNT for reclaimable memory
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-21 11:39:46 -04:00
Nathan Chancellor
5396e5af3c bcachefs: Fix format specifier in bch2_btree_key_cache_to_text()
When building for a 32-bit architecture, for which 'size_t' is
'unsigned int', there is a compiler warning due to use of '%lu':

  In file included from fs/bcachefs/vstructs.h:5,
                   from fs/bcachefs/bcachefs_format.h:80,
                   from fs/bcachefs/bcachefs.h:207,
                   from fs/bcachefs/btree_key_cache.c:3:
  fs/bcachefs/btree_key_cache.c: In function 'bch2_btree_key_cache_to_text':
  fs/bcachefs/btree_key_cache.c:795:25: error: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'size_t' {aka 'unsigned int'} [-Werror=format=]
    795 |         prt_printf(out, "pending:\t%lu\r\n",            per_cpu_sum(bc->nr_pending));
        |                         ^~~~~~~~~~~~~~~~~~~
  fs/bcachefs/util.h:78:63: note: in definition of macro 'prt_printf'
     78 | #define prt_printf(_out, ...)           bch2_prt_printf(_out, __VA_ARGS__)
        |                                                               ^~~~~~~~~~~
  fs/bcachefs/btree_key_cache.c:795:38: note: format string is defined here
    795 |         prt_printf(out, "pending:\t%lu\r\n",            per_cpu_sum(bc->nr_pending));
        |                                    ~~^
        |                                      |
        |                                      long unsigned int
        |                                    %u
  cc1: all warnings being treated as errors

Use the proper specifier, '%zu', to resolve the warning.

Fixes: e447e49977b8 ("bcachefs: key cache can now allocate from pending")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-09 09:41:47 -04:00
Kent Overstreet
5f1929f1f0 bcachefs: key cache can now allocate from pending
btree_trans objects can hold the btree_trans_barrier srcu read lock for
an extended amount of time (they shouldn't, but it's difficult to
guarantee).

the srcu barrier blocks memory reclaim, so to avoid too many stranded
key cache items, this uses the new pending_rcu_items to allocate from
pending items - like we did before, but now without a global lock on the
key cache.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-09 09:41:47 -04:00
Kent Overstreet
f2bfe7e837 bcachefs: Rip out freelists from btree key cache
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-09 09:41:47 -04:00
Kent Overstreet
a592cdf516 bcachefs: don't use rht_bucket() in btree_key_cache_scan()
rht_bucket() does strange complicated things when a rehash is in
progress.

Instead, just skip scanning when a rehash is in progress: scanning is
going to be more expensive (many more empty slots to cover), and some
sort of infinite loop is being observed

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-22 10:04:41 -04:00
Kent Overstreet
87313ac1f1 bcachefs: clear path->should_be_locked in bch2_btree_key_cache_drop()
bch2_btree_key_cache_drop() evicts the key cache entry - it's used when
we're doing an update that bypasses the key cache, because for cache
coherency reasons a key can't be in the key cache unless it also exists
in the btree - i.e. creates have to bypass the cache.

After evicting, the path no longer points to a key cache key, and
relock() will always fail if should_be_locked is true.

Prep for improving path->should_be_locked assertions

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-22 03:12:57 -04:00
Kent Overstreet
a24e6e7146 bcachefs: delete faulty fastpath in bch2_btree_path_traverse_cached()
bch2_btree_path_traverse_cached() was previously checking if it could
just relock the path, which is a common idiom in path traversal.

However, it was using btree_node_relock(), not btree_path_relock();
btree_path_relock() only succeeds if the path was in state
BTREE_ITER_NEED_RELOCK.

If the path was in state BTREE_ITER_NEED_TRAVERSE a full traversal is
needed; this led to a null ptr deref in
bch2_btree_path_traverse_cached().

And the short circuit check here isn't needed, since it was already done
in the main bch2_btree_path_traverse_one().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-13 22:56:50 -04:00
Kent Overstreet
d2cb6b219d bcachefs: Simplify btree key cache fill path
Don't allocate the new bkey_cached until after we've done the btree
lookup; this means we can kill bkey_cached.valid.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14 19:00:16 -04:00
Kent Overstreet
789566da25 bcachefs: bch2_btree_key_cache_drop() now evicts
As part of improving btree key cache coherency, the bkey_cached.valid
flag is going away.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14 19:00:16 -04:00
Kent Overstreet
c30402e548 bcachefs: btree_path_cached_set()
new helper - small refactoring

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14 19:00:15 -04:00
Kent Overstreet
bf2b356afd bcachefs: Leave a buffer in the btree key cache to avoid lock thrashing
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-10 13:17:16 -04:00
Kent Overstreet
2760bfe388 bcachefs: Fix reporting of freed objects from key cache shrinker
We count objects as freed when we move them to the srcu-pending lists
because we're doing the equivalent of a kfree_srcu(); the only
difference is managing the pending list ourself means we can allocate
from the pending list.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-10 13:17:16 -04:00
Kent Overstreet
bc65e98e68 bcachefs: increase key cache shrinker batch size
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-10 13:17:16 -04:00
Kent Overstreet
5ae67abcdf bcachefs: Enable automatic shrinking for rhashtables
Since the key cache shrinker walks the rhashtable, a mostly empty
rhashtable leads to really nasty reclaim performance issues.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-10 13:17:16 -04:00
Kent Overstreet
218e5e0c2a bcachefs: Fix locking assert
We now track whether a transaction is locked, and verify that we don't
have nodes locked when the transaction isn't locked; reorder relocks to
not pop the new assert.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-28 11:29:26 -04:00
Kent Overstreet
b895c70326 bcachefs: x-macroize journal flags enums
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:22 -04:00
Kent Overstreet
c749541353 bcachefs: uninline set_btree_iter_dontneed()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:21 -04:00
Nathan Chancellor
8bb0eddbbc bcachefs: Fix format specifiers in bch2_btree_key_cache_to_text()
When building for a 32-bit target, for which 'size_t' is 'unsigned int',
there are two warnings around mismatched format specifiers and argument
types:

  In file included from fs/bcachefs/vstructs.h:5,
                   from fs/bcachefs/bcachefs_format.h:79,
                   from fs/bcachefs/bcachefs.h:207,
                   from fs/bcachefs/btree_key_cache.c:3:
  fs/bcachefs/btree_key_cache.c: In function 'bch2_btree_key_cache_to_text':
  fs/bcachefs/btree_key_cache.c:1046:25: error: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'size_t' {aka 'unsigned int'} [-Werror=format=]
   1046 |         prt_printf(out, "nonpcpu freelist:\t%lu\r\n",   bc->nr_freed_nonpcpu);
        |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~    ~~~~~~~~~~~~~~~~~~~~
        |                                                           |
        |                                                           size_t {aka unsigned int}
  fs/bcachefs/util.h:192:63: note: in definition of macro 'prt_printf'
    192 | #define prt_printf(_out, ...)           bch2_prt_printf(_out, __VA_ARGS__)
        |                                                               ^~~~~~~~~~~
  fs/bcachefs/btree_key_cache.c:1046:47: note: format string is defined here
   1046 |         prt_printf(out, "nonpcpu freelist:\t%lu\r\n",   bc->nr_freed_nonpcpu);
        |                                             ~~^
        |                                               |
        |                                               long unsigned int
        |                                             %u
  fs/bcachefs/btree_key_cache.c:1047:25: error: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'size_t' {aka 'unsigned int'} [-Werror=format=]
   1047 |         prt_printf(out, "pcpu freelist:\t%lu\r\n",      bc->nr_freed_pcpu);
        |                         ^~~~~~~~~~~~~~~~~~~~~~~~~       ~~~~~~~~~~~~~~~~~
        |                                                           |
        |                                                           size_t {aka unsigned int}
  fs/bcachefs/util.h:192:63: note: in definition of macro 'prt_printf'
    192 | #define prt_printf(_out, ...)           bch2_prt_printf(_out, __VA_ARGS__)
        |                                                               ^~~~~~~~~~~
  fs/bcachefs/btree_key_cache.c:1047:44: note: format string is defined here
   1047 |         prt_printf(out, "pcpu freelist:\t%lu\r\n",      bc->nr_freed_pcpu);
        |                                          ~~^
        |                                            |
        |                                            long unsigned int
        |                                          %u
  cc1: all warnings being treated as error

Use the proper 'size_t' specifier, '%zu', to clear up the warnings for
these platforms.

Fixes: f2d47ec26af5 ("bcachefs: Btree key cache instrumentation")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:21 -04:00
Kent Overstreet
5147b9ae76 bcachefs: Btree key cache instrumentation
It turns out the btree key cache shrinker wasn't actually reclaiming
anything, prior to the previous patch. This adds instrumentation so that
if we have further issues we can see what's going on.

Specifically, sysfs internal/btree_key_cache is greatly expanded with
new counters, and the SRCU sequence numbers of the first 10 entries on
each pending freelist, and we also add trigger_btree_key_cache_shrink
for testing without having to prune all the system caches.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:20 -04:00
Kent Overstreet
4984faff5d bcachefs: Use bch2_btree_path_upgrade() in key cache traverse
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:19 -04:00
Kent Overstreet
5dd8c60e1e bcachefs: iter/update/trigger/str_hash flag cleanup
Combine iter/update/trigger/str_hash flags into a single enum, and
x-macroize them for a to_text() function later.

These flags are all for a specific iter/key/update context, so it makes
sense to group them together - iter/update/trigger flags were already
given distinct bits, this cleans up and unifies that handling.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:18 -04:00
Kent Overstreet
7423330e30 bcachefs: prt_printf() now respects \r\n\t
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:17 -04:00
Kent Overstreet
b30b70ad8b bcachefs: Fix early error path in bch2_fs_btree_key_cache_exit()
Reported-by: syzbot+a35cdb62ec34d44fb062@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-06 10:58:17 -04:00
Kent Overstreet
adfe9357c3 bcachefs: Tweak btree key cache shrinker so it actually frees
Freeing key cache items is a multi stage process; we need to wait for an
SRCU grace period to elapse, and we handle this ourselves - partially to
avoid callback overhead, but primarily so that when allocating we can
first allocate from the freed items waiting for an SRCU grace period.

Previously, the shrinker was counting the items on the 'waiting for SRCU
grace period' lists as items being scanned, but this meant that too many
items waiting for an SRCU grace period could prevent it from doing any
work at all.

After this, we're seeing that items skipped due to the accessed bit are
the main cause of the shrinker not making any progress, and we actually
want the key cache shrinker to run quite aggressively because reclaimed
items will still generally be found (more compactly) in the btree node
cache - so we also tweak the shrinker to not count those against
nr_to_scan.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-20 17:06:20 -04:00
Hongbo Li
09e913f582 bcachefs: fix the count of nr_freed_pcpu after changing bc->freed_nonpcpu list
When allocating bkey_cached from bc->freed_pcpu list, it missed
decreasing the count of nr_freed_pcpu which would cause the mismatch
between the value of nr_freed_pcpu and the list items. This problem
also exists in moving new bkey_cached to bc->freed_pcpu list.
If these happened, the bug info may appear in
bch2_fs_btree_key_cache_exit by the follow code:

   BUG_ON(list_count_nodes(&bc->freed_pcpu) != bc->nr_freed_pcpu);
   BUG_ON(list_count_nodes(&bc->freed_nonpcpu) != bc->nr_freed_nonpcpu);

Fixes: c65c13f0ea ("bcachefs: Run btree key cache shrinker less aggressively")
Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-07 13:40:35 -04:00
Kent Overstreet
6088234ce8 bcachefs: JOURNAL_SPACE_LOW
"bcachefs; Fix deadlock in bch2_btree_update_start()" was a significant
performance regression (nearly 50%) on multithreaded random writes with
fio.

The reason is that the journal watermark checks multiple things,
including the state of the btree write buffer, and on multithreaded
update heavy workloads we're bottleneked on write buffer flushing - we
don't want kicknig off btree updates to depend on the state of the write
buffer.

This isn't strictly correct; the interior btree update path does do
write buffer updates, but it's a tiny fraction of total accounting
updates and we're more concerned with space in the journal itself.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-06 13:50:26 -04:00
Kent Overstreet
3ed94062e3 bcachefs: Improve bch2_fatal_error()
error messages should always include __func__

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-18 00:24:24 -04:00
Kent Overstreet
b3f8e71117 bcachefs: Fix btree key cache coherency during replay
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:26 -04:00
Kent Overstreet
ccb7b08fbb bcachefs: trans_for_each_path() no longer uses path->idx
path->idx is now a code smell: we should be using path_idx_t, since it's
stable across btree path reallocation.

This is also a bit faster, using the same loop counter vs. fetching
path->idx from each path we iterate over.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:43 -05:00
Kent Overstreet
7f9821a7c1 bcachefs: btree_insert_entry -> btree_path_idx_t
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:43 -05:00
Kent Overstreet
07f383c71f bcachefs: btree_iter -> btree_path_idx_t
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:43 -05:00
Kent Overstreet
0c0ba8e9c5 bcachefs: skip journal more often in key cache reclaim
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:43 -05:00
Kent Overstreet
e4e49375a8 bcachefs; kill bch2_btree_key_cache_flush()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:41 -05:00
Kent Overstreet
cf5bacb6a5 bcachefs: delete useless commit_do()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:39 -05:00