Commit graph

559 commits

Author SHA1 Message Date
Kent Overstreet
acb3b26e76 bcachefs: Move btree lock debugging to slowpath fn
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:58 -04:00
Kent Overstreet
e751c01a8e bcachefs: Start using bpos.snapshot field
This patch starts treating the bpos.snapshot field like part of the key
in the btree code:

* bpos_successor() and bpos_predecessor() now include the snapshot field
* Keys in btrees that will be using snapshots (extents, inodes, dirents
  and xattrs) now always have their snapshot field set to U32_MAX

The btree iterator code gets a new flag, BTREE_ITER_ALL_SNAPSHOTS, that
determines whether we're iterating over keys in all snapshots or not -
internally, this controlls whether bkey_(successor|predecessor)
increment/decrement the snapshot field, or only the higher bits of the
key.

We add a new member to struct btree_iter, iter->snapshot: when
BTREE_ITER_ALL_SNAPSHOTS is not set, iter->pos.snapshot should always
equal iter->snapshot, which will be 0 for btrees that don't use
snapshots, and alsways U32_MAX for btrees that will use snapshots
(until we enable snapshot creation).

This patch also introduces a new metadata version number, and compat
code for reading from/writing to older versions - this isn't a forced
upgrade (yet).

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:57 -04:00
Kent Overstreet
4cf91b0270 bcachefs: Split out bpos_cmp() and bkey_cmp()
With snapshots, we're going to need to differentiate between comparisons
that should and shouldn't include the snapshot field. bpos_cmp is now
the comparison function that does include the snapshot field, used by
core btree code.

Upper level filesystem code generally does _not_ want to compare against
the snapshot field - that code wants keys to compare as equal even when
one of them is in an ancestor snapshot.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:57 -04:00
Kent Overstreet
43d002432d bcachefs: Add a mechanism for running callbacks at trans commit time
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:57 -04:00
Kent Overstreet
f793fd85dc bcachefs: Fix for bch2_trans_commit() unlocking when it's not supposed to
When we pass BTREE_INSERT_NOUNLOCK bch2_trans_commit isn't supposed to
unlock after a successful commit, but it was calling
bch2_trans_cond_resched() - oops.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:57 -04:00
Kent Overstreet
a9d79c6e8b bcachefs: Use pcpu mode of six locks for interior nodes
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:57 -04:00
Kent Overstreet
08070cba4a bcachefs: Split btree_iter_traverse and bch2_btree_iter_traverse()
External (to the btree iterator code) users of bch2_btree_iter_traverse
expect that on success the iterator will be pointed at iter->pos and
have that position locked - but since we split iter->pos and
iter->real_pos, that means it has to update iter->real_pos if necessary.

Internal users don't expect it to modify iter->real_pos, so we need two
separate functions.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:57 -04:00
Kent Overstreet
bcad562259 bcachefs: Update iter->real_pos lazily
peek() has to update iter->real_pos - there's no need for
bch2_btree_iter_set_pos() to update it as well.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:57 -04:00
Kent Overstreet
818664f505 bcachefs: Consolidate bch2_btree_iter_peek() and peek_with_updates()
Ideally we'll be getting rid of peek_with_updates(), but the callers
will need to be checked.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:57 -04:00
Kent Overstreet
ca58cbd471 bcachefs: Improve iter->real_pos handling
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:57 -04:00
Kent Overstreet
3b0baf6f29 bcachefs: Internal btree iterator renaming
This just gives some internal helpers some better names.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:57 -04:00
Kent Overstreet
07fc72e103 bcachefs: Kill btree_iter_peek_uptodate()
Since we're no longer doing next() immediately followed by peek(), this
optimization isn't doing anything anymore.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:57 -04:00
Kent Overstreet
5cde51cd48 bcachefs: Iterators are now always consistent with iter->real_pos
This means bch2_btree_iter_traverse_one() can be made more efficient.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:57 -04:00
Kent Overstreet
345ca825e7 bcachefs: Have btree_iter_next_node() use btree_iter_set_search_pos()
btree node iterators need to obey the regular btree node invarionts
w.r.t. iter->real_pos; once they do, bch2_btree_iter_traverse will have
less that it needs to check.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:56 -04:00
Kent Overstreet
e0ba3b6429 bcachefs: Replace bch2_btree_iter_next() calls with bch2_btree_iter_advance
The way btree iterators work internally has been changing, particularly
with the iter->real_pos changes, and bch2_btree_iter_next() is no longer
hyper optimized - it's just advance followed by peek, so it's more
efficient to just call advance where we're not using the return value of
bch2_btree_iter_next().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:56 -04:00
Kent Overstreet
4ce41957a7 bcachefs: Optimize bch2_btree_iter_verify_level()
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:56 -04:00
Kent Overstreet
5c1ec980f9 bcachefs: Fix iterator picking
comparison was wrong

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:56 -04:00
Kent Overstreet
e9895f0ab9 bcachefs: Assert that iterators aren't being double freed
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:56 -04:00
Kent Overstreet
50dc0f692a bcachefs: Require all btree iterators to be freed
We keep running into occasional bugs with btree transaction iterators
overflowing - this will make those bugs more visible.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:56 -04:00
Kent Overstreet
8d956c2fb8 bcachefs: btree_iter_set_dontneed()
This is a bit clearer than using bch2_btree_iter_free().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:56 -04:00
Kent Overstreet
abcecb49f5 bcachefs: Fsck code refactoring
Change fsck code to always put btree iterators - also, make some flow
control improvements to deal with lock restarts better, and refactor
check_extents() to not walk extents twice for counting/checking
i_sectors.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:56 -04:00
Kent Overstreet
f2eaea2fc1 bcachefs: Kill btree_iter_pos_changed()
this is used in only one place now, so just inline it into the caller.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:56 -04:00
Kent Overstreet
57447b7acc bcachefs: Fix a btree iterator leak
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:56 -04:00
Kent Overstreet
c7bb769c81 bcachefs: __bch2_trans_get_iter() refactoring, BTREE_ITER_NOT_EXTENTS
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:55 -04:00
Kent Overstreet
a045be5a0e bcachefs: Simplify bch2_btree_iter_peek_prev()
Since we added iter->real_pos, btree_iter_set_pos_to_(next|prev)_leaf no
longer modify iter->pos, so we don't have to save it at the start
anymore.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:55 -04:00
Kent Overstreet
61a19ce425 bcachefs: Fix bpos_diff()
Previously, bpos_diff() did not handle borrows correctly. Minor thing
considering how it was used, but worth fixing.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:55 -04:00
Kent Overstreet
f020bfcdb0 bcachefs: Use bch2_bpos_to_text() more consistently
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:55 -04:00
Kent Overstreet
18fc6ae503 bcachefs: btree_iter_prev_slot()
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:55 -04:00
Kent Overstreet
1f7fdc0abd bcachefs: btree_iter_live()
New helper to clean things up a bit - also, improve iter->flags
handling.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:55 -04:00
Kent Overstreet
c052cf82f3 bcachefs: KEY_TYPE_discard is no longer used
KEY_TYPE_discard used to be used for extent whiteouts, but when handling
over overlapping extents was lifted above the core btree code it became
unused. This patch updates various code to reflect that.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:55 -04:00
Kent Overstreet
9620c3ec2f bcachefs: Add a mempool for the replicas delta list
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:54 -04:00
Kent Overstreet
e131b6aa0a bcachefs: Add a mempool for btree_trans bump allocator
This allocation is required for filesystem operations to make forward
progress, thus needs a mempool.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:54 -04:00
Kent Overstreet
8042b5b715 bcachefs: Extents may now cross btree node boundaries
When snapshots arrive, we won't necessarily be able to arbitrarily split
existis - when we need to split an existing extent, we'll have to check
if the extent was overwritten in child snapshots and if so emit a
whiteout for the split in the child snapshot.

Because extents couldn't span btree nodes previously, journal replay
would sometimes have to split existing extents. That's no good anymore,
but fortunately since extent handling has already been lifted above most
of the btree code there's no real need for that rule anymore.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:53 -04:00
Kent Overstreet
7e1a3aa9df bcachefs: iter->real_pos
We need to differentiate between the search position of a btree
iterator, vs. what it actually points at (what we found). This matters
for extents, where iter->pos will typically be the start of the key we
found and iter->real_pos will be the end of the key we found (which soon
won't necessarily be in the same btree node!) and it will also matter
for snapshots.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:53 -04:00
Kent Overstreet
3d4955952f bcachefs: Fix bch2_btree_iter_peek_prev()
This makes bch2_btree_iter_peek_prev() and bch2_btree_iter_prev()
consistent with peek() and next(), w.r.t. iter->pos.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:53 -04:00
Kent Overstreet
434094bec0 bcachefs: bch2_btree_iter_advance_pos()
This adds a new common helper for advancing past the last key returned
by peek().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:53 -04:00
Kent Overstreet
792e2c4c85 bcachefs: Kill bch2_btree_iter_set_pos_same_leaf()
The only reason we were keeping this around was for
BTREE_INSERT_NOUNLOCK semantics - if bch2_btree_iter_set_pos() advances
to the next leaf node, it'll drop the lock on the node that we just
inserted to.

But we don't rely on BTREE_INSERT_NOUNLOCK semantics for the extents
btree, just the inodes btree, and if we do need it for the extents btree
in the future we can do it more cleanly by cloning the iterator - this
lets us delete some special cases in the btree iterator code, which is
complicated enough as it is.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:53 -04:00
Kent Overstreet
2b2c1a89ce bcachefs: Simplify btree_iter_(next|prev)_leaf()
There's no good reason for these functions to not be using
bch2_btree_iter_set_pos().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:53 -04:00
Kent Overstreet
edfbba58e3 bcachefs: Add btree node prefetching to bch2_btree_and_journal_walk()
bch2_btree_and_journal_walk() walks the btree overlaying keys from the
journal; it was introduced so that we could read in the alloc btree
prior to journal replay being done, when journalling of updates to
interior btree nodes was introduced.

But it didn't have btree node prefetching, which introduced a severe
regression with mount times, particularly on spinning rust. This patch
implements btree node prefetching for the btree + journal walk,
hopefully fixing that.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:51 -04:00
Kent Overstreet
07a1006ae8 bcachefs: Reduce/kill BKEY_PADDED use
With various newer key types - stripe keys, inline data extents - the
old approach of calculating the maximum size of the value is becoming
more and more error prone. Better to switch to bkey_on_stack, which can
dynamically allocate if necessary to handle any size bkey.

In particular we also want to get rid of BKEY_EXTENT_VAL_U64s_MAX.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:50 -04:00
Kent Overstreet
07bd4c285b bcachefs: Fix btree lock being incorrectly dropped
__btree_trans_get_iter() was using bch2_btree_iter_upgrade, but it
shouldn't have been because on failure bch2_btree_iter_upgrade may drop
locks in other iterators, expecting the transaction to be restarted. But
__btree_trans_get_iter can't return an error to indicate that we need to
restart thet transaction - oops.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:50 -04:00
Kent Overstreet
537c49d6af bcachefs: Fix btree node merge -> split operations
If a btree node merger is followed by a split or compact of the parent
node, we could end up with the parent btree node iterator pointing to
the whiteout inserted by the btree node merge operation - the fix is to
ensure that interior btree node iterators always point to the first non
whiteout.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:49 -04:00
Kent Overstreet
cc578a36f9 bcachefs: Fix __btree_iter_next() when all iters are in use_next() when all iters are in use
Also, print out more information on btree transaction iterator overflow.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:49 -04:00
Kent Overstreet
a2bfc8412a bcachefs: Try to print full btree error message
Metadata corruption bugs are hard to debug if we can't see exactly what
went wrong - try to allocate a bigger buffer so we can print out
everything we have.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:49 -04:00
Kent Overstreet
3eb26d0157 bcachefs: bch2_trans_get_iter() no longer returns errors
Since we now always preallocate the maximum number of iterators when we
initialize a btree transaction, getting an iterator never fails - we can
delete a fair amount of error path code.

This patch also simplifies the iterator allocation code a bit.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:48 -04:00
Kent Overstreet
dbd1e8259a bcachefs: Dont' use percpu btree_iter buf in userspace
bcachefs-tools doesn't have a real percpu (per thread) implementation
yet

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:47 -04:00
Kent Overstreet
0b5c9f5940 bcachefs: Set preallocated transaction mem to avoid restarts
this will reduce transaction restarts, from observation of tracepoints.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:47 -04:00
Kent Overstreet
876c7af3a6 bcachefs: Take a SRCU lock in btree transactions
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:47 -04:00
Kent Overstreet
b3d1e6cab2 bcachefs: Fix build warning when CONFIG_BCACHEFS_DEBUG=n
this function is only used by debug code, but we'd like to always build
it so we know that it does build.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:46 -04:00
Kent Overstreet
7e7ae6ca57 bcachefs: Fix spurious transaction restarts
The checks for lock ordering violations weren't quite right.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:46 -04:00