Commit graph

559 commits

Author SHA1 Message Date
Kent Overstreet
14da58521e bcachefs: fix btree_trans_peek_prev_journal()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-29 00:47:52 -04:00
Kent Overstreet
32a01cd433 bcachefs: Don't log fsck err in the journal if doing repair elsewhere
This fixes exceeding the bump allocator limit when the allocator finds
many buckets that need repair - they're repaired asynchronously, which
means that every error logged a message in the bump allocator, without
committing.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-19 13:08:07 -04:00
Kent Overstreet
425da82c63 bcachefs: btree_iter: fix updates, journal overlay
We need to start searching from search_key - _not_ path->pos, which will
point to the key we found in the btree

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-15 22:11:56 -04:00
Alan Huang
9b54efe66c bcachefs: Fix alloc_req use after free
Now the alloc_req is allocated from the bump allocator, if there is
reallocation, the memory of alloc_req would be frees, fix by delaying the
reallocation to transaction restart, it has to restart anyway.

Reported-by: syzbot+2887a13a5c387e616a68@syzkaller.appspotmail.com
Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-15 22:11:55 -04:00
Alan Huang
9b9a327009 bcachefs: Don't allocate new memory when mempool is exhausted
Allocating new memory when mempool is exhausted is too complicated, just
return ENOMEM is fine. memcpy is not needed, since there might be
pointers point to the old memory, that's the bug.

Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-15 22:11:55 -04:00
Kent Overstreet
3bd6f8aeae bcachefs: btree iter tracepoints
We've been seeing some livelock-ish behavior in the index update part of
the main write path, and while we've got low level btree path
tracepoints, we've been lacking high level btree iterator tracepoints.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-15 22:11:55 -04:00
Kent Overstreet
09b9c72bd4 bcachefs: bch_err_throw()
Add a tracepoint for any time we return an error and unwind.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-02 12:16:35 -04:00
Kent Overstreet
18dad454cd bcachefs: Replace rcu_read_lock() with guards
The new guard(), scoped_guard() allow for more natural code.

Some of the uses with creative flow control have been left.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-01 00:03:12 -04:00
Kent Overstreet
19c0a8aa8a bcachefs: btree_node_missing_err()
Factor out an error path for a small stack usage improvement.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-30 01:21:12 -04:00
Kent Overstreet
0d25264ecf bcachefs: Kill bkey_buf in btree_path_down()
Allocate some (smaller) temporary storage in btree_trans for this -
btree_path_down() is in our max-stack call stack.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-30 01:21:12 -04:00
Kent Overstreet
3f2f028814 bcachefs: Fix btree_iter_next_node() for new locking asserts
We can't unlock a should_be_locked path unless we're in a transaction
restart.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-23 22:00:07 -04:00
Kent Overstreet
b41ac97fe0 bcachefs: Path must be locked if trans->locked && should_be_locked
If path->should_be_locked is true, that means user code (of the btree
API) has seen, in this transaction, something guarded by the node this
path has locked, and we have to keep it locked until the end of the
transaction.

Assert that we're not violating this; should_be_locked should also be
cleared only in _very_ special situations.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-23 07:59:43 -04:00
Kent Overstreet
22e921a6f9 bcachefs: Simplify bch2_path_put()
Simplify the "do we need to keep this locked?" checks.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-23 07:59:43 -04:00
Kent Overstreet
80a160e494 bcachefs: Plumb btree_trans for more locking asserts
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-23 07:59:43 -04:00
Kent Overstreet
aac49471b6 bcachefs: Give out new path if upgrade fails
Avoid transaction restarts due to failure to upgrade - we can traverse a
new iterator without a transaction restart.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-23 07:59:43 -04:00
Kent Overstreet
66782b2acb bcachefs: Fix btree_path_get_locks when not doing trans restart
btree_path_get_locks, on failure, shouldn't unlock if we're not issuing
a transaction restart: we might drop locks we're not supposed to (if
path->should_be_locked is set).

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-23 07:59:43 -04:00
Kent Overstreet
659489f37b bcachefs: Kill bch2_path_put_nokeep()
bch2_path_put_nokeep() was intended for paths we wouldn't need to
preserve for a transaction restart - it always frees them right away
when the ref hits 0.

But since paths are shared, freeing unconditionally is a bug, the path
might have been used elsewhere and have should_be_locked set, i.e. we
need to keep it locked until the end of the transaction.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-23 07:59:43 -04:00
Kent Overstreet
247abee6ae bcachefs: btree_trans_subbuf
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21 20:15:06 -04:00
Kent Overstreet
b42fac043f bcachefs: bch2_fs_emergency_read_only2()
More error message cleanup: instead of multiple printk()s per error, we
want to be building up a single error message in a printbuf, so that it
can be printed with indenting that shows grouping and avoid errors
getting interspersed or lost in the log.

This gets rid of most calls to bch2_fs_emergency_read_only(). We still
have calls to
 - bch2_fatal_error()
 - bch2_fs_fatal_error()
 - bch2_fs_fatal_err_on()

that need work.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21 20:14:56 -04:00
Kent Overstreet
c4e3889440 bcachefs: debug_check_iterators no longer requires BCACHEFS_DEBUG
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21 20:14:55 -04:00
Kent Overstreet
2842515575 bcachefs: Debug params are now static_keys
We'd like users to be able to debug without building custom kernels, so
this will help us get rid of CONFIG_BCACHEFS_DEBUG, at least for most
things.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21 20:14:54 -04:00
Kent Overstreet
ebf561b208 bcachefs: print_str_as_lines() -> print_str()
bch2_print_string_as_lines() is a low level helper that allows messages
longer than 1k to be printed without truncation.

But we should always be printing with the helpers that take a filesystem
object, if we're in fsck they direct output to the userspace process
controlling fsck instead of the dmesg log.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21 20:14:18 -04:00
Kent Overstreet
d02755b8c5 bcachefs: trace bch2_trans_kmalloc()
We're occasionally seeing the WARN_ON() for bump allocator usage
exceeding BTREE_TRANS_MEM_MAX; add some tracing so we can see what's
going on.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21 20:13:27 -04:00
Kent Overstreet
49771a7578 bcachefs: Fix bch2_btree_path_traverse_cached() when paths realloced
btree_key_cache_fill() will allocate and traverse another path (for the
underlying btree), so we can't hold pointers to paths across a call - we
have to pass indices.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-17 18:46:17 -04:00
Kent Overstreet
43b9fece2d bcachefs: Fix set_should_be_locked() call in peek_slot()
set_should_be_locked() needs to be called before peek_key_cache(), which
traverses other paths and may do a trans unlock/relock.

This fixes an assertion pop in path_peek_slot(), when the path we're
using is unexpectedly not uptodate.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-14 17:05:19 -04:00
Kent Overstreet
b1c71cb492 bcachefs: Fix broken btree_path lock invariants in next_node()
This fixes btree locking assert pops users were seeing during evacuate:

https://github.com/koverstreet/bcachefs/issues/878

May 09 22:45:02 sharon kernel: bcachefs (68116e25-fa2d-4c6f-86c7-e8b431d792ae):   bch2_btree_insert_node(): node not locked at level 1
May 09 22:45:02 sharon kernel:   bch2_btree_node_rewrite [bcachefs]: watermark=btree no_check_rw alloc l=0-1 mode=none nodes_written=0 cl.remaining=2 journal_seq=0
May 09 22:45:02 sharon kernel:   path: idx   1 ref 1:0   S B btree=alloc level=0 pos 0:3699637:0 0:3698012:1-0:3699637:0 bch2_move_btree.isra.0+0x1db/0x490 [bcachefs] uptodate 0 locks_want 2
May 09 22:45:02 sharon kernel:     l=0 locks intent seq 4 node ffff8bd700c93600
May 09 22:45:02 sharon kernel:     l=1 locks unlocked seq 1712 node ffff8bd6fd5e7a00
May 09 22:45:02 sharon kernel:     l=2 locks unlocked seq 2295 node ffff8bd6cc725400
May 09 22:45:02 sharon kernel:     l=3 locks unlocked seq 0 node 0000000000000000

Evacuate walks btree nodes with bch2_btree_iter_next_node() and rewrites
them, bch2_btree_update_start() upgrades the path to take intent locks
as far as it needs to.

But next_node() does low level unlock/relock calls on individual nodes,
and didn't handle the case where a path is supposed to be holding
multiple intent locks. If a path has locks_want > 1, it needs to be
either holding locks on all the btree nodes (at each level) requested,
or none of them.

Fix this with a bch2_btree_path_downgrade().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-14 17:05:19 -04:00
Kent Overstreet
353739f1d1 bcachefs: Fix btree_iter_peek_prev() at end of inode
At the end of the inode, on an extents iterator, peek_slot() has to
advance to the next position to avoid returning a 0 size extent, which
is not allowed.

Changing iter->pos confuses peek_prev(), but we don't need to call
peek_slot() in this case.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-24 19:09:52 -04:00
Kent Overstreet
c4f89a1d35 bcachefs: Make btree_iter_peek_prev() assert more precise
The issue this assert is guarding against is that in
BTREE_ITER_filter_snapshots mode we only want to be iterating within a
single inode number - if we iterate into another inode number with keys
for a different snapshot tree, we'll loop arbitrarily long before
finding a key we can return.

This comes up in the unit tests, where we're using inode 0 for our test
keys.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-24 19:09:52 -04:00
Kent Overstreet
9180ad2e16 bcachefs: Kill btree_iter.trans
This was planned to be done ages ago, now finally completed; there are
places where we have quite a few btree_trans objects on the stack, so
this reduces stack usage somewhat.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-02 10:24:34 -04:00
Kent Overstreet
6d77ce4a27 bcachefs: Better printing of inconsistency errors
Build up and emit the error message for an inconsistency error all at
once, instead of spread over multiple printk calls, so they're not
jumbled in the dmesg log.

Also, add better indenting.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-29 13:26:13 -04:00
Kent Overstreet
ff4e0f7de6 bcachefs: add missing newline in bch2_trans_updates_to_text()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-28 12:36:32 -04:00
Kent Overstreet
a2e9e68746 bcachefs: Kill a bit of dead code
Found with CC=clang W=1

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 21:02:16 -04:00
Kent Overstreet
9cf6b84b71 bcachefs: CONFIG_BCACHEFS_INJECT_TRANSACTION_RESTARTS
Incorrectly handled transaction restarts can be a source of heisenbugs;
add a mode where we randomly inject them to shake them out.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-02-12 18:40:19 -05:00
Kent Overstreet
a858175227 bcachefs: Fix btree_trans_peek_key_cache()
BTREE_ITER_cached_nofill has some tricky corner cases; it's used
internally for iterators that aren't walking the key cache, but need to
be coherent with the key cache.

It tells traverse to look up and lock the key cache entry if present,
but don't create one if it doesn't exist.

That means we have to have a BTREE_ITER_UPTODATE path (because after
traverse the path has to be UPTODATE, or we pop assertions) that doesn't
point to anything (which is the less bad option, taken by the previous
fix).

The previous fix for this path missed an issue that can happen in
bch2_trans_peek_key_cache(): we can't set should_be_locked on a path
that doesn't point to anything and doesn't hold locks.

Fixes: bd5b09727f ("bcachefs: Don't set btree_path to updtodate if we don't fill")
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-21 12:26:25 -05:00
Kent Overstreet
d3d0fac57d bcachefs: bch2_btree_iter_peek_slot() handles navigating to nonexistent depth
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 23:38:42 -05:00
Kent Overstreet
8cfdc6ce1f bcachefs: bch2_trans_node_drop()
Factor out a small common helper.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 23:38:42 -05:00
Kent Overstreet
baf13d8344 bcachefs: kill __bch2_btree_iter_flags()
bch2_btree_iter_flags() now takes a level parameter; this fixes a bug
where using a node iterator on a leaf wouldn't set
BTREE_ITER_with_key_cache, leading to fun cache coherency bugs.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 23:38:41 -05:00
Kent Overstreet
6679e363f4 bcachefs: bch2_btree_path_peek_slot() doesn't return errors
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-29 13:30:39 -05:00
Kent Overstreet
b9a37144da bcachefs: Fix key cache + BTREE_ITER_all_snapshots
Normally, whitouts (KEY_TYPE_whitout) are filtered from btree lookups,
since they exist only to represent deletions of keys in ancestor
snapshots - except, they should not be filtered in
BTREE_ITER_all_snapshots mode, so that e.g. snapshot deletion can clean
them up.

This means that that the key cache has to store whiteouts, and key cache
fills cannot filter them.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21 01:36:23 -05:00
Kent Overstreet
7e320a4063 bcachefs: Fix btree_trans_peek_key_cache() BTREE_ITER_all_snapshots
In BTREE_ITER_all_snapshots mode, we're required to only return keys
where the snapshot field matches the iterator position -
BTREE_ITER_filter_snapshots requires pulling keys into the key cache
from ancestor snapshots, so we have to check for that.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21 01:36:23 -05:00
Kent Overstreet
c50341be4e bcachefs: tidy btree_trans_peek_journal()
Change to match bch2_btree_trans_peek_updates() calling convention.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21 01:36:23 -05:00
Kent Overstreet
68eb4fdd8c bcachefs: tidy up __bch2_btree_iter_peek()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21 01:36:23 -05:00
Kent Overstreet
ff1dd05f82 bcachefs: bch2_trans_relock() is trylock for lockdep
fix some spurious lockdep splats

Reported-by: syzbot+e088be3c2d5c05aaac35@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21 01:36:20 -05:00
Kent Overstreet
90f3683e8f bcachefs: Fix null ptr deref in btree_path_lock_root()
Historically, we required that all btree node roots point to a valid
(possibly fake) node, but we're improving our ability to continue in the
presence of errors.

Reported-by: syzbot+e22007d6acb9c87c2362@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21 01:36:20 -05:00
Kent Overstreet
854724d116 bcachefs: btree_and_journal_iter: don't iterate over too many whiteouts when prefetching
To help ameloriate issues with peek operations having to skip over
deletions in the journal - just bail out if all we're doing is
prefetching btree nodes.

Since btree node prefetching runs every time we iterate to a new node,
and has to sequentially scan ahead, this avoids another O(n^2).

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21 01:36:18 -05:00
Kent Overstreet
ac745efb42 bcachefs: peek_prev_min(): Search forwards for extents, snapshots
With extents and snapshots, for slightly different reasons, we may have
to search forwards to find a key that compares equal to iter->pos (i.e.
a key that peek_prev() should return, as it returns keys <= iter->pos).

peek_slot() does this, and is an easy way to fix this case.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21 01:36:18 -05:00
Kent Overstreet
7e5b8e00e2 bcachefs: Implement bch2_btree_iter_prev_min()
A user contributed a filessytem dump, where the dump was actually
corrupted (due to being taken while the filesystem was online), but
which exposed an interesting bug in fsck - reconstruct_inode().

When itearting in BTREE_ITER_filter_snapshots mode, it's required to
give an end position for the iteration and it can't span inode numbers;
continuing into the next inode might mean we start seeing keys from a
different snapshot tree, that the is_ancestor() checks always filter,
thus we're never able to return a key and stop iterating.

Backwards iteration never implemented the end position because nothing
else needed it - except for reconstuct_inode().

Additionally, backwards iteration is now able to overlay keys from the
journal, which will be useful if we ever decide to start doing journal
replay in the background.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21 01:36:18 -05:00
Kent Overstreet
db6e584b85 bcachefs: Simplify btree_iter_peek() filter_snapshots
Collapse all the BTREE_ITER_filter_snapshots handling down into a single
block; btree iteration is much simpler in the !filter_snapshots case.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21 01:36:17 -05:00
Kent Overstreet
000fe8d573 bcachefs: Rename btree_iter_peek_upto() -> btree_iter_peek_max()
We'll be introducing btree_iter_peek_prev_min(), so rename for
consistency.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21 01:36:17 -05:00
Kent Overstreet
b318882022 bcachefs: bch2_trans_verify_not_unlocked_or_in_restart()
Fold two asserts into one.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21 01:36:16 -05:00