Commit graph

135 commits

Author SHA1 Message Date
Kent Overstreet
865ad1dbf1 bcachefs: Check for bad write buffer key when moving from journal
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-24 15:48:00 -04:00
Kent Overstreet
cb6f5d0dec bcachefs: __bch2_insert_snapshot_whiteouts() refactoring
Now uses bch2_get_snapshot_overwrites(), and much shorter.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-31 22:03:17 -04:00
Kent Overstreet
7d886a82bf bcachefs: bch2_trans_log_str()
The data update path doesn't need a printbuf for its log message - this
will help reduce stack usage.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-22 15:13:17 -04:00
Kent Overstreet
e8f9992b0a bcachefs: Split out accounting in transaction commit
There can be a lot of rendundancy in accounting updates within a single
btree transaction.

Split out accounting updates so that they can be deduped, in the next
commit.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21 20:15:06 -04:00
Kent Overstreet
247abee6ae bcachefs: btree_trans_subbuf
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21 20:15:06 -04:00
Kent Overstreet
ac4c7ac90e bcachefs: Extra write buffer asserts
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21 20:14:56 -04:00
Kent Overstreet
cf95296295 bcachefs: bch2_trans_update_ip()
Allow btree_insert_entry.ip_allocated to be passed in, so we get better
info on where alloc updates are coming from.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21 20:14:37 -04:00
Kent Overstreet
d02755b8c5 bcachefs: trace bch2_trans_kmalloc()
We're occasionally seeing the WARN_ON() for bump allocator usage
exceeding BTREE_TRANS_MEM_MAX; add some tracing so we can see what's
going on.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21 20:13:27 -04:00
Kent Overstreet
edaed8ee8c bcachefs: BCH_JSET_ENTRY_log_bkey
Add a journal entry type for logging - but logging a bkey, not a string;
to be used for data move path debugging.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-30 18:25:12 -04:00
Kent Overstreet
28aa859b6b bcachefs: kmsan asserts
Catching these early makes them a lot easier to track down.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-24 09:50:36 -04:00
Kent Overstreet
90fd9ad5b0 bcachefs: Change btree wb assert to runtime error
We just had a report of the assert for "btree in write buffer for
non-write buffer btree" popping during the 6.14 upgrade.

- 150TB filesystem, after a reboot the upgrade was able to continue from
  where it left off, so no major damage.

But with 6.14 about to come out we want to get this tracked down asap,
and need more data if other users hit this.

Convert the BUG_ON() to an emergency read-only, and print out btree, the
key itself, and stack trace from the original write buffer update (which
did not have this check before).

Reported-by: Stijn Tintel <stijn@linux-ipv6.be>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14 10:25:25 -04:00
Kent Overstreet
54c9b92fc7 bcachefs: bch2_trans_log_msg()
Export a helper for logging to the journal when we're already in a
transaction context.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-29 13:30:39 -05:00
Kent Overstreet
7d1918b0d8 bcachefs: bch2_btree_bit_mod_iter()
factor out a new helper, make it handle extents bitset btrees
(freespace).

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21 01:36:17 -05:00
Kent Overstreet
a34b026482 bcachefs: Kill BCH_TRANS_COMMIT_lazy_rw
We unconditionally go read-write, if we're going to do so, before
journal replay: lazy_rw is obsolete.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21 01:36:16 -05:00
Kent Overstreet
a1ca525b82 bcachefs: avoid 'unsigned flags'
flags should have actual types, where possible: fix btree_update.h
helpers

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21 01:36:15 -05:00
Kent Overstreet
a0d11feefb bcachefs: Don't use commit_do() unnecessarily
Using commit_do() to call alloc_sectors_start_trans() breaks when we're
randomly injecting transaction restarts - the restart in the commit
causes us to leak the lock that alloc_sectorS_start_trans() takes.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-10-18 00:49:48 -04:00
Kent Overstreet
4a8f8fafbd bcachefs: Add extra padding in bkey_make_mut_noupdate()
This fixes a kasan splat in propagate_key_to_snapshot_leaves() -
varint_decode_fast() does reads (that it never uses) up to 7 bytes past
the end of the integer.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-27 21:46:34 -04:00
Ariel Miculas
49858d869b bcachefs: bch2_btree_insert() - add btree iter flags
The commit 65bd442397 ("bcachefs: bch2_btree_insert_trans() no longer
specifies BTREE_ITER_cached") removes BTREE_ITER_cached from
bch2_btree_insert_trans, which causes the update_inode function from
bcachefs-tools to take a long time (~20s).  Add an iter_flags parameter
to bch2_btree_insert, so the users can specify iter update trigger
flags, such as BTREE_ITER_cached.

Signed-off-by: Ariel Miculas <ariel.miculas@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14 19:00:15 -04:00
Kent Overstreet
1d16c605cc bcachefs: Disk space accounting rewrite
Main part of the disk accounting rewrite.

This is a wholesale rewrite of the existing disk space accounting, which
relies on percepu counters that are sharded by journal buffer, and
rolled up and added to each journal write.

With the new scheme, every set of counters is a distinct key in the
accounting btree; this fixes scaling limitations of the old scheme,
where counters took up space in each journal entry and required multiple
percpu counters.

Now, in memory accounting requires a single set of percpu counters - not
multiple for each in flight journal buffer - and in the future we'll
probably also have counters that don't use in memory percpu counters,
they're not strictly required.

An accounting update is now a normal btree update, using the btree write
buffer path. At transaction commit time, we apply accounting updates to
the in memory counters, which are percpu counters indexed in an
eytzinger tree by the accounting key.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14 19:00:13 -04:00
Kent Overstreet
9dec2a473b bcachefs: Accumulate accounting keys in journal replay
Until accounting keys hit the btree, they are deltas, not new versions
of the existing key; this means we have to teach journal replay to
accumulate them.

Additionally, the journal doesn't track precisely which entries have
been flushed to the btree; it only tracks a range of entries that may
possibly still need to be flushed.

That means we need to compare accounting keys against the version in the
btree and only flush updates that are newer.

There's another wrinkle with the write buffer: if the write buffer
starts flushing accounting keys before journal replay has finished
flushing accounting keys, journal replay will see the version number
from the new updates and updates from the journal will be lost.

To avoid this, journal replay has to flush accounting keys first, and
we'll be adding a flag so that write buffer flush knows to hold
accounting keys until then.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14 19:00:13 -04:00
Kent Overstreet
7773df19c3 bcachefs: metadata version bucket_stripe_sectors
New on disk format version for bch_alloc->stripe_sectors and
BCH_DATA_unstriped - accounting for unstriped data in stripe buckets.

Upgrade/downgrade requires regenerating alloc info - but only if erasure
coding is in use.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14 19:00:12 -04:00
Kent Overstreet
449ceafb49 bcachefs: bch2_trans_commit_flags_to_text()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:19 -04:00
Kent Overstreet
5dd8c60e1e bcachefs: iter/update/trigger/str_hash flag cleanup
Combine iter/update/trigger/str_hash flags into a single enum, and
x-macroize them for a to_text() function later.

These flags are all for a specific iter/key/update context, so it makes
sense to group them together - iter/update/trigger flags were already
given distinct bits, this cleans up and unifies that handling.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:18 -04:00
Kent Overstreet
e07c28ab92 bcachefs: bch2_btree_bit_mod()
Provide a non-write buffer version of bch2_btree_bit_mod_buffered(), for
the subvolume children btree.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:24 -04:00
Kent Overstreet
506b187603 bcachefs: bch2_btree_bit_mod -> bch2_btree_bit_mod_buffered
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:24 -04:00
Kent Overstreet
6474b70610 bcachefs: Clean up btree_trans
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:44 -05:00
Kent Overstreet
7f9821a7c1 bcachefs: btree_insert_entry -> btree_path_idx_t
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:43 -05:00
Kent Overstreet
74e600c19a bcachefs; bch2_path_put() -> btree_path_idx_t
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:43 -05:00
Kent Overstreet
559e6c2336 bcachefs: trans_for_each_update() now declares loop iter
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:42 -05:00
Kent Overstreet
679972348d bcachefs: kill btree_trans->wb_updates
the btree write buffer path now creates a journal entry directly

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:41 -05:00
Kent Overstreet
24de63dacb bcachefs: Improve trans->extra_journal_entries
Instead of using a darray, we now allocate journal entries for the
transaction commit path with our normal bump allocator - with an inlined
fastpath, and using btree_transaction_stats to remember how much to
initially allocate so as to avoid transaction restarts.

This is prep work for converting write buffer updates to use this
mechanism.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:41 -05:00
Kent Overstreet
c259bd95d1 bcachefs: No need to allocate keys for write buffer
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:38 -05:00
Kent Overstreet
cb52d23e77 bcachefs: Rename BTREE_INSERT flags
BTREE_INSERT flags are actually transaction commit flags - rename them
for clarity.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:37 -05:00
Kent Overstreet
5927310dcf bcachefs: bch_str_hash_flags_t
Create a separate enum for str_hash flags - instead of abusing the
btree_insert_flags enum - and create a __bitwise typedef for sparse
typechecking.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:37 -05:00
Kent Overstreet
aa62aabbc7 bcachefs: Kill dead BTREE_INSERT flags
BTREE_INSERT_NOWAIT and BTREE_INSERT_GC_LOCK_HELD are no longer used,
and can be deleted.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:37 -05:00
Jiapeng Chong
3b59fbec86 bcachefs: Remove duplicate include
./fs/bcachefs/btree_update.h: journal.h is included more than once.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=6573
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:14 -04:00
Kent Overstreet
6bd68ec266 bcachefs: Heap allocate btree_trans
We're using more stack than we'd like in a number of functions, and
btree_trans is the biggest object that we stack allocate.

But we have to do a heap allocatation to initialize it anyways, so
there's no real downside to heap allocating the entire thing.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:13 -04:00
Kent Overstreet
96dea3d599 bcachefs: Fix W=12 build errors
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:13 -04:00
Kent Overstreet
aaad530ac6 bcachefs: BTREE_ID_logged_ops
Add a new btree for long running logged operations - i.e. for logging
operations that we can't do within a single btree transaction, so that
they can be resumed if we crash.

Keys in the logged operations btree will represent operations in
progress, with the state of the operation stored in the value.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:12 -04:00
Kent Overstreet
aef32bf7cc bcachefs: __bch2_btree_insert() -> bch2_btree_insert_trans()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:12 -04:00
Kent Overstreet
1e81f89b02 bcachefs: Fix assorted checkpatch nits
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:10 -04:00
Kent Overstreet
85beefefd2 bcachefs: bch2_trans_update_extent_overwrite()
Factor out a new helper, to be used when fsck has to repair overlapping
extents.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:08 -04:00
Kent Overstreet
ac319b4f89 bcachefs: Move some declarations to the correct header
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:08 -04:00
Kent Overstreet
8e992c6c1f bcachefs: bch2_btree_bit_mod()
New helper for bitset btrees.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:08 -04:00
Brian Foster
eabb10dc95 bcachefs: support btree updates of prejournaled keys
Introduce support for prejournaled key updates. This allows a
transaction to commit an update for a key that already exists (and
is pinned) in the journal. This is required for btree write buffer
updates as the current scheme of journaling both on write buffer
insertion and write buffer (slow path) flush is unsafe in certain
crash recovery scenarios.

Create a small trans update wrapper to pass along the seq where the
key resides into the btree_insert_entry. From there, trans commit
passes the seq into the btree insert path where it is used to manage
the journal pin for the associated btree leaf.

Note that this patch only introduces the underlying mechanism and
otherwise includes no functional changes.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:08 -04:00
Kent Overstreet
f33c58fc46 bcachefs: Kill BTREE_INSERT_USE_RESERVE
Now that we have journal watermarks and alloc watermarks unified,
BTREE_INSERT_USE_RESERVE is redundant and can be deleted.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:05 -04:00
Kent Overstreet
ec14fc6010 bcachefs: Kill JOURNAL_WATERMARK
This unifies JOURNAL_WATERMARK with BCH_WATERMARK; we're working towards
specifying watermarks once in the transaction commit path.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:05 -04:00
Kent Overstreet
0fb3355d0a bcachefs: Improve bch2_bkey_make_mut()
bch2_bkey_make_mut() now takes the bkey_s_c by reference and points it
at the new, mutable key.

This helps in some fsck paths that may have multiple repair operations
on the same key.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:05 -04:00
Kent Overstreet
ad520141b1 bcachefs: Fix corruption with writeable snapshots
When partially overwriting an extent in an older snapshot, the existing
extent has to be split.

If the existing extent was overwritten in a different (sibling)
snapshot, we have to ensure that the split won't be visible in the
sibling snapshot.

data_update.c already has code for this,
bch2_insert_snapshot_writeouts() - we just need to move it into
btree_update_leaf.c and change bch2_trans_update_extent() to use it as
well.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:03 -04:00
Kent Overstreet
51e84d3bbf bcachefs: bch2_bkey_get_empty_slot()
Add a new helper for allocating a new slot in a btree.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:01 -04:00