Commit graph

96 commits

Author SHA1 Message Date
Kent Overstreet
ef6fac0f9e bcachefs: Plumb correct ip to trans_relock_fail tracepoint
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-26 00:01:16 -04:00
Alan Huang
0acb385ec1 bcachefs: Fix possible console lock involved deadlock
Link: https://lore.kernel.org/all/6822ab02.050a0220.f2294.00cb.GAE@google.com/T/
Reported-by: syzbot+2c3ef91c9523c3d1a25c@syzkaller.appspotmail.com
Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-11 23:21:30 -04:00
Kent Overstreet
18dad454cd bcachefs: Replace rcu_read_lock() with guards
The new guard() and scoped_guard() helpers allow for more natural code.

Some of the uses with creative flow control have been left.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-01 00:03:12 -04:00
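
As an illustration of the guard conversion described in 18dad454cd, here is a minimal userspace sketch of the scope-guard pattern. It assumes GCC or Clang (it relies on the `cleanup` variable attribute) and uses a nesting counter as a stand-in for an RCU read-side critical section. All `toy_`/`TOY_` names and both lookup functions are hypothetical and are not the kernel's guard() implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a read-side critical section: a nesting counter. */
static int toy_read_lock_depth;

static void toy_read_lock(void)   { toy_read_lock_depth++; }
static void toy_read_unlock(void) { toy_read_lock_depth--; }

/* Cleanup handler: the compiler calls this when the guard variable leaves scope. */
static void toy_guard_exit(int *unused)
{
	(void) unused;
	toy_read_unlock();
}

/* Take the lock now; release it automatically at the end of the enclosing scope. */
#define TOY_GUARD()							\
	int toy_guard_var						\
		__attribute__((cleanup(toy_guard_exit), unused)) =	\
		(toy_read_lock(), 0)

/* Open-coded style: every exit path has to remember to unlock. */
static bool lookup_open_coded(const int *table, int n, int key)
{
	toy_read_lock();
	for (int i = 0; i < n; i++)
		if (table[i] == key) {
			toy_read_unlock();
			return true;
		}
	toy_read_unlock();
	return false;
}

/* Guard style: early returns are safe because the unlock runs on scope exit. */
static bool lookup_guarded(const int *table, int n, int key)
{
	TOY_GUARD();

	for (int i = 0; i < n; i++)
		if (table[i] == key)
			return true;
	return false;
}
```

Both functions leave `toy_read_lock_depth` at zero on every path; the guarded version gets there without repeating the unlock call at each return.
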
Kent Overstreet
cd831a9494 bcachefs: factor out break_cycle_fail()
More stack usage work.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-30 01:21:12 -04:00
Kent Overstreet
b41ac97fe0 bcachefs: Path must be locked if trans->locked && should_be_locked
If path->should_be_locked is true, that means user code (of the btree
API) has seen, in this transaction, something guarded by the node this
path has locked, and we have to keep it locked until the end of the
transaction.

Assert that we're not violating this; should_be_locked should also be
cleared only in _very_ special situations.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-23 07:59:43 -04:00
Kent Overstreet
80a160e494 bcachefs: Plumb btree_trans for more locking asserts
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-23 07:59:43 -04:00
Kent Overstreet
df92f3500b bcachefs: Clear trans->locked before unlock
We're adding new should_be_locked assertions: it's going to be illegal
to unlock a should_be_locked path when trans->locked is true.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-23 07:59:43 -04:00
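
The rule stated in df92f3500b (unlocking a should_be_locked path while trans->locked is true is illegal) can be sketched in userspace as follows. This is a toy model under stated assumptions: the structs, the fixed path count, and every `toy_` function are hypothetical and only mirror the field names used in the commit message.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_PATHS 4

struct toy_path {
	bool node_locked;
	bool should_be_locked;
};

struct toy_trans {
	bool locked;
	struct toy_path paths[TOY_NR_PATHS];
};

/* The invariant: a should_be_locked path may not be unlocked while trans->locked. */
static bool toy_unlock_allowed(const struct toy_trans *trans,
			       const struct toy_path *path)
{
	return !(trans->locked && path->should_be_locked);
}

static void toy_path_unlock(struct toy_trans *trans, struct toy_path *path)
{
	assert(toy_unlock_allowed(trans, path));
	path->node_locked = false;
}

/* Clear trans->locked first, so the per-path assertion holds for every path. */
static void toy_trans_unlock(struct toy_trans *trans)
{
	trans->locked = false;

	for (int i = 0; i < TOY_NR_PATHS; i++)
		toy_path_unlock(trans, &trans->paths[i]);
}
```

Swapping the two steps in `toy_trans_unlock()` would trip the assertion as soon as a should_be_locked path is reached, which is the ordering problem the commit title refers to.
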
Kent Overstreet
be9fecdcda bcachefs: bch2_path_get() reuses paths if upgrade_fails & !should_be_locked
Small additional optimization over the previous patch, bringing us
closer to the original behaviour, except when we need to clone to avoid
a transaction restart.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-23 07:59:43 -04:00
Kent Overstreet
66782b2acb bcachefs: Fix btree_path_get_locks when not doing trans restart
btree_path_get_locks, on failure, shouldn't unlock if we're not issuing
a transaction restart: we might drop locks we're not supposed to (if
path->should_be_locked is set).

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-23 07:59:43 -04:00
Kent Overstreet
5b7b342c40 bcachefs: btree_node_locked_type_nowrite()
Small helper to improve locking assertions.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-23 07:59:43 -04:00
Kent Overstreet
136d082abc bcachefs: Improve trace_trans_restart_upgrade
- Convert to a 'fs_str' tracepoint that just emits as a string: this
  lets us build up the tracepoint with a printbuf, using our pretty
  printers, and they're much easier to manage

- Include locks_held, before and after

- Include the btree node pointer we failed on (error pointer, null, or
  real node)

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21 20:15:11 -04:00
Kent Overstreet
8a6fa52e07 bcachefs: relock_fail tracepoint now includes btree
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21 20:15:01 -04:00
Kent Overstreet
110bb6cb8b bcachefs: debug_check_btree_locking modparam
Don't put btree locking asserts behind CONFIG_BCACHEFS_DEBUG, put them
behind a module parameter.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21 20:14:54 -04:00
Kent Overstreet
b51b4055c3 bcachefs: Slim down inlined part of bch2_btree_path_upgrade()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21 20:14:53 -04:00
Kent Overstreet
ebf561b208 bcachefs: print_str_as_lines() -> print_str()
bch2_print_string_as_lines() is a low level helper that allows messages
longer than 1k to be printed without truncation.

But we should always be printing with the helpers that take a filesystem
object: if we're in fsck, they direct output to the userspace process
controlling fsck instead of the dmesg log.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21 20:14:18 -04:00
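
To illustrate the line-splitting idea behind the low level helper mentioned in ebf561b208, here is a small userspace sketch. It is not the bcachefs implementation: `toy_print_str_as_lines()`, `toy_emit_fn`, and `toy_emit_stdout()` are hypothetical names, and the emit callback stands in for whatever sink (dmesg, or the process controlling fsck) the caller wants.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sink: receives one line at a time, without the trailing newline. */
typedef void (*toy_emit_fn)(const char *line, size_t len, void *ctx);

/*
 * Split a long message on '\n' and hand each line to the sink separately, so
 * that a per-message length limit in the sink applies per line rather than to
 * the whole string. Returns the number of lines emitted.
 */
static size_t toy_print_str_as_lines(const char *s, toy_emit_fn emit, void *ctx)
{
	size_t nr = 0;

	while (*s) {
		const char *nl = strchr(s, '\n');
		size_t len = nl ? (size_t) (nl - s) : strlen(s);

		emit(s, len, ctx);
		nr++;

		s += len;
		if (nl)
			s++;	/* skip the newline itself */
	}
	return nr;
}

/* Example sink that writes each line to stdout. */
static void toy_emit_stdout(const char *line, size_t len, void *ctx)
{
	(void) ctx;
	printf("%.*s\n", (int) len, line);
}
```

Passing the sink as a callback is one way to model the point of the commit: the same splitting logic can feed either a kernel log or a userspace fsck process, depending on which emitter the caller supplies.
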
Alan Huang
f013b4ca35 bcachefs: Kill bch2_trans_unlock_noassert
Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21 20:14:14 -04:00
Kent Overstreet
739200c573 bcachefs: Fix race in print_chain()
00636 Unable to handle kernel NULL pointer dereference at virtual address 00000000000000b0
00636 Mem abort info:
00636   ESR = 0x0000000096000005
00636   EC = 0x25: DABT (current EL), IL = 32 bits
00636   SET = 0, FnV = 0
00636   EA = 0, S1PTW = 0
00636   FSC = 0x05: level 1 translation fault
00636 Data abort info:
00636   ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
00636   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
00636   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
00636 user pgtable: 4k pages, 39-bit VAs, pgdp=0000000101b10000
00636 [00000000000000b0] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
00636 Internal error: Oops: 0000000096000005 [#1] SMP
00636 Modules linked in:
00636 CPU: 12 UID: 0 PID: 79369 Comm: cat Not tainted 6.14.0-rc6-ktest-g3783b8973ab7 #17757
00636 Hardware name: linux,dummy-virt (DT)
00636 pstate: 20001005 (nzCv daif -PAN -UAO -TCO -DIT +SSBS BTYPE=--)
00636 pc : print_chain+0xb8/0x170
00636 lr : print_chain+0xa0/0x170
00636 sp : ffffff80d9c1bbb0
00636 x29: ffffff80d9c1bbb0 x28: 0000000000000002 x27: ffffff80c1be8250
00636 x26: ffffff80dd9b0000 x25: 0000000000000020 x24: 000000000000002d
00636 x23: 000000000000003c x22: ffffffc080a54518 x21: ffffff80da6e00d0
00636 x20: ffffff80da6e0170 x19: ffffff80c1a1d240 x18: 00000000ffffffff
00636 x17: 3535303937202d3c x16: 203139202d3c2035 x15: 00000000ffffffff
00636 x14: 0000000000000000 x13: ffffff80d71b63f1 x12: 0000000000000006
00636 x11: ffffffc080beb1c0 x10: 0000000000000020 x9 : 00000000000134cc
00636 x8 : 0000000000000020 x7 : 0000000000000004 x6 : 0000000000000020
00636 x5 : ffffff80d71b63f7 x4 : ffffffc080a5451b x3 : 0000000000000000
00636 x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000
00636 Call trace:
00636  print_chain+0xb8/0x170 (P)
00636  bch2_check_for_deadlock+0x444/0x5a0
00636  bch2_btree_deadlock_read+0xb4/0x1c8
00636  full_proxy_read+0x74/0xd8
00636  vfs_read+0x90/0x300

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-24 09:50:37 -04:00
Kent Overstreet
0b4fd56726 bcachefs: btree_trans_restart_foreign_task()
In debug mode, we save the call stack on transaction restart - but
there's no locking, so we can't touch it if we're issuing the restart
from another thread.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-24 09:50:37 -04:00
Alan Huang
677bdb7346 bcachefs: Fix deadlock
This fixes two deadlocks:

1. A deadlock involving pcpu_alloc_mutex, as pointed out by syzbot [1].
2. A recursion deadlock.

The root cause is that we hold the bc lock during alloc_percpu; fix it
by following the pattern used by __btree_node_mem_alloc().

[1] https://lore.kernel.org/all/66f97d9a.050a0220.6bad9.001d.GAE@google.com/T/

Reported-by: syzbot+fe63f377148a6371a9db@syzkaller.appspotmail.com
Tested-by: syzbot+fe63f377148a6371a9db@syzkaller.appspotmail.com
Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-02-26 19:31:05 -05:00
Alan Huang
5dd21b2712 bcachefs: Pop all the transactions from the abort one
The transaction is going to abort, so there will be no cycle involving
this transaction anymore.

Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-14 10:45:25 -05:00
Alan Huang
b169138d48 bcachefs: Only abort the transactions in the cycle
When the cycle doesn't involve the initiator of the cycle detection,
we might choose a transaction that is not involved in the cycle to abort.
Aborting that transaction won't break the cycle, so this patch instead
chooses a transaction that is in the cycle to abort.

Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-14 10:45:18 -05:00
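
The victim selection described in b169138d48 can be sketched with a toy lock graph kept as a stack of transaction ids. Everything here is an assumption made for illustration: the `toy_` names, the fixed-size array, and in particular the "highest id" victim policy, which is arbitrary and not taken from the source. `toy_graph_pop_from()` only mirrors the idea of the lock_graph_pop_from helper (pop the graph from i).

```c
#include <assert.h>

#define TOY_GRAPH_MAX 8

/* Stack of transaction ids; entry i + 1 is what entry i is waiting on. */
struct toy_lock_graph {
	int trans[TOY_GRAPH_MAX];
	int nr;
};

/* Drop entries i..nr-1 from the graph. */
static void toy_graph_pop_from(struct toy_lock_graph *g, int i)
{
	assert(i >= 0 && i <= g->nr);
	g->nr = i;
}

/* If 'next' is already on the stack, return the index where the cycle starts. */
static int toy_graph_find_cycle(const struct toy_lock_graph *g, int next)
{
	for (int i = 0; i < g->nr; i++)
		if (g->trans[i] == next)
			return i;
	return -1;
}

/*
 * Pick a victim only among entries [cycle_start, nr), i.e. transactions that
 * are actually part of the cycle. The policy (highest id) is arbitrary.
 */
static int toy_pick_victim(const struct toy_lock_graph *g, int cycle_start)
{
	int victim = g->trans[cycle_start];

	for (int i = cycle_start + 1; i < g->nr; i++)
		if (g->trans[i] > victim)
			victim = g->trans[i];
	return victim;
}
```

With a graph of 9 -> 2 -> 3 -> 4 where 4 waits on 2, the cycle starts at index 1, so the initiator (9) is never a candidate even though it would win under the same policy applied to the whole stack.
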
Alan Huang
6853a5e5d4 bcachefs: Introduce lock_graph_pop_from
This patch introduces a helper function called lock_graph_pop_from,
which pops the graph from i.

Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-14 10:45:13 -05:00
Alan Huang
b5c3dcd0db bcachefs: Convert open-coded lock_graph_pop_all to helper
Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-14 10:45:08 -05:00
Alan Huang
0ef9ab34f4 bcachefs: Do not allow no fail lock request to fail
If the transaction chose itself as a victim before and restarted, it
might issue a no fail lock request this time. But it might be added to
another transaction's lock graph and be chosen as the victim again, which
is no longer safe without an additional check. We could also convert the
cycle detector to be fully RCU-based to fix that unsoundness, but the
latency added to trans_put and the additional memory required may not be
worth it.

Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-14 10:45:08 -05:00
Alan Huang
9c13cc9c7d Revert "bcachefs: Fix bch2_btree_node_upgrade()"
This reverts commit 62448afee7.

six_lock_tryupgrade fails only if there is an intent lock held;
it won't fail no matter how many read locks are held.

Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-14 10:45:08 -05:00
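
The tryupgrade semantics stated in 9c13cc9c7d can be modelled with a toy, single-threaded lock state. This is only a sketch of the rule in the commit message, not the six lock implementation: `struct toy_six_lock` and the `toy_six_` functions are hypothetical, and there are no atomics or waiters here.

```c
#include <stdbool.h>

/* Toy shared/intent lock state: any number of readers, at most one intent holder. */
struct toy_six_lock {
	unsigned readers;
	bool	 intent_held;
};

static void toy_six_lock_read(struct toy_six_lock *lock)
{
	lock->readers++;
}

/*
 * Convert a read lock held by the caller into an intent lock.
 * Fails only if an intent lock is already held; the number of other
 * readers does not matter.
 */
static bool toy_six_tryupgrade(struct toy_six_lock *lock)
{
	if (lock->intent_held)
		return false;

	lock->intent_held = true;
	lock->readers--;	/* the caller's read lock becomes the intent lock */
	return true;
}
```

In this model an upgrade with three readers succeeds, and a second upgrade attempt fails purely because the intent slot is taken.
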
Kent Overstreet
0971a72c3d bcachefs: bch2_trans_unlock_write()
New helper for dropping all write locks. This is distinct from the
helper the transaction commit path uses, which is faster and only
touches updates.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 23:38:42 -05:00
Kent Overstreet
6adc5af50a bcachefs: btree_path_very_locks(): verify lock seq
If the btree_path's lock seq is wrong, the next bch2_trans_relock()
operation is guaranteed to fail and we take an unnecessary transaction
restart.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 23:38:41 -05:00
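
The reasoning in 6adc5af50a (a wrong lock seq guarantees that the next relock fails) can be sketched with a toy model. The assumptions are loud ones: the node carries a sequence number that is bumped on modification, the path remembers the value it saw when it took the lock, and all `toy_` names are hypothetical rather than the bcachefs API.

```c
#include <stdbool.h>

struct toy_node {
	unsigned seq;		/* bumped whenever the node is modified */
};

struct toy_path {
	struct toy_node *node;
	unsigned	 lock_seq;	/* node->seq as seen when we locked it */
	bool		 locked;
};

static void toy_path_lock(struct toy_path *path, struct toy_node *node)
{
	path->node	= node;
	path->lock_seq	= node->seq;
	path->locked	= true;
}

static void toy_path_unlock(struct toy_path *path)
{
	path->locked = false;
}

/* Relock succeeds only if the node has not changed since we last held it. */
static bool toy_path_relock(struct toy_path *path)
{
	if (path->lock_seq != path->node->seq)
		return false;

	path->locked = true;
	return true;
}

/* Debug check: a locked path must already carry the node's current seq. */
static bool toy_path_verify_seq(const struct toy_path *path)
{
	return !path->locked || path->lock_seq == path->node->seq;
}
```

Checking the seq while the path is still locked reports the inconsistency at the point where it was introduced, instead of later as an unexplained relock failure and transaction restart.
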
Kent Overstreet
ff1dd05f82 bcachefs: bch2_trans_relock() is trylock for lockdep
fix some spurious lockdep splats

Reported-by: syzbot+e088be3c2d5c05aaac35@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21 01:36:20 -05:00
Kent Overstreet
efb2018e4d bcachefs: Kill bch2_assert_btree_nodes_not_locked()
We no longer track individual btree node locks with lockdep, so this
will never be enabled.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14 19:59:12 -04:00
Kent Overstreet
1a616c2fe9 lockdep: lockdep_set_notrack_class()
Add a new helper to disable lockdep tracking entirely for a given class.

This is needed for bcachefs, which takes too many btree node locks for
lockdep to track. Instead, we have a single lockdep_map for "btree_trans
has any btree nodes locked", which makes more sense given that we have
centralized lock management and a cycle detector.

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Will Deacon <will@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14 19:00:16 -04:00
Kent Overstreet
f236ea4bca bcachefs: Set PF_MEMALLOC_NOFS when trans->locked
proper lock ordering is: fs_reclaim -> btree node locks

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-11 20:10:55 -04:00
Kent Overstreet
fd80d14005 bcachefs: fix scheduling while atomic in break_cycle()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-10 12:59:28 -04:00
Kent Overstreet
9a64e1bfd8 bcachefs: Fix GFP_KERNEL allocation in break_cycle()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-05 10:44:08 -04:00
Kent Overstreet
fd104e2967 bcachefs: bch2_trans_verify_not_unlocked()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:19 -04:00
Kent Overstreet
650db8a87c bcachefs: trans->locked
Add a field for tracking whether a transaction object holds btree locks,
and assertions to verify state.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:19 -04:00
Kent Overstreet
5d8c9d9428 bcachefs: bch2_btree_path_upgrade() checks nodes_locked, not uptodate
In the key cache fill path, we use path_upgrade() on a path that isn't
uptodate yet but should be locked.

This change makes bch2_btree_path_upgrade() slightly looser so we can
use it in key cache upgrade, instead of the __ version.

Also, make the related assert - that path->uptodate implies nodes_locked
- slightly clearer.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:19 -04:00
Kent Overstreet
923ed0ae5e bcachefs: bch2_trans_relock_fail() - factor out slowpath
Factor out slowpath into a separate helper

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:19 -04:00
Kent Overstreet
00589cadb1 bcachefs: bch2_btree_path_to_text()
Long form version of bch2_btree_path_to_text() - useful in error
messages and tracepoints.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:18 -04:00
Kent Overstreet
7423330e30 bcachefs: prt_printf() now respects \r\n\t
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:17 -04:00
Kent Overstreet
517236cb3e bcachefs: Kill read lock dropping in bch2_btree_node_lock_write_nofail()
Dropping read locks in bch2_btree_node_lock_write_nofail() dates from
before we had the cycle detector; we can now tell the cycle detector
directly when taking a lock may not fail because we can't handle
transaction restarts.

This is needed for adding should_be_locked asserts.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-10 22:28:36 -04:00
Kent Overstreet
29e11f9699 bcachefs: Drop redundant btree_path_downgrade()s
If a path doesn't have any active references, we shouldn't downgrade it;
it'll either be reused, possibly with intent refs again, or dropped at
bch2_trans_begin() time.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:25 -04:00
Kent Overstreet
612e1110d6 bcachefs: Add gfp flags param to bch2_prt_task_backtrace()
Fixes: e6a2566f7a ("bcachefs: Better journal tracepoints")
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Reported-by: smatch
2024-01-22 12:37:51 -05:00
Kent Overstreet
b97de45365 bcachefs: Improve trace_trans_restart_relock
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21 13:27:10 -05:00
Kent Overstreet
c13fbb7de2 bcachefs: Improve would_deadlock trace event
We now include backtraces for every thread involved in the cycle.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-05 23:24:21 -05:00
Kent Overstreet
31403dca5b bcachefs: optimize __bch2_trans_get(), kill DEBUG_TRANSACTIONS
- Some tweaks to greatly reduce locking overhead for the list of btree
  transactions, so that it can always be enabled: leave btree_trans
  objects on the list when they're on the percpu single item freelist,
  and only check for duplicates in the same process when
  CONFIG_BCACHEFS_DEBUG is enabled

- Don't zero out the full btree_trans() unless we allocated it from
  the mempool

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:44 -05:00
Kent Overstreet
fea153a845 bcachefs: rcu protect trans->paths
Upcoming patches are going to be changing trans->paths to a
reallocatable buffer. We need to guard against use after free when it's
used by other threads; this introduces RCU protection to those paths and
changes them to check for trans->paths == NULL

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:44 -05:00
Kent Overstreet
398c98347d bcachefs: kill btree_path.idx
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:44 -05:00
Kent Overstreet
ccb7b08fbb bcachefs: trans_for_each_path() no longer uses path->idx
path->idx is now a code smell: we should be using path_idx_t, since it's
stable across btree path reallocation.

This is also a bit faster, using the same loop counter vs. fetching
path->idx from each path we iterate over.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:43 -05:00
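
The point made in ccb7b08fbb, that an index stays valid across reallocation of the path array while a pointer may not, can be shown with a small userspace sketch. The type and function names (`toy_path_idx_t`, `toy_trans_grow()`, and so on) are hypothetical, and the 16-bit index width is an assumption chosen for the example.

```c
#include <stdlib.h>
#include <string.h>

typedef unsigned short toy_path_idx_t;	/* assumed width, for illustration */

struct toy_path {
	int level;
};

struct toy_trans {
	struct toy_path	*paths;		/* reallocatable buffer */
	toy_path_idx_t	 nr_paths;
};

static int toy_trans_init(struct toy_trans *trans, toy_path_idx_t nr)
{
	trans->paths	= calloc(nr, sizeof(*trans->paths));
	trans->nr_paths	= trans->paths ? nr : 0;
	return trans->paths ? 0 : -1;
}

/* Grow the array; the buffer may move, so old 'struct toy_path *' go stale. */
static int toy_trans_grow(struct toy_trans *trans, toy_path_idx_t new_nr)
{
	struct toy_path *p;

	if (new_nr <= trans->nr_paths)
		return 0;

	p = realloc(trans->paths, new_nr * sizeof(*p));
	if (!p)
		return -1;

	memset(p + trans->nr_paths, 0, (new_nr - trans->nr_paths) * sizeof(*p));
	trans->paths	= p;
	trans->nr_paths	= new_nr;
	return 0;
}

/* Iterate by index: the loop counter doubles as the stable path identifier. */
static long toy_sum_levels(const struct toy_trans *trans)
{
	long sum = 0;

	for (toy_path_idx_t i = 0; i < trans->nr_paths; i++)
		sum += trans->paths[i].level;
	return sum;
}

static void toy_trans_exit(struct toy_trans *trans)
{
	free(trans->paths);
	trans->paths	= NULL;
	trans->nr_paths	= 0;
}
```

Callers that remember `idx` and always go through `trans->paths[idx]` keep working after `toy_trans_grow()`, whereas a saved `struct toy_path *` would have to be refreshed after every grow.
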
Kent Overstreet
a83b6c895c bcachefs: kill btree_path->(alloc_seq|downgrade_seq)
These were for extra info in tracepoints for debugging a specialized
issue - we do not want to bloat btree_path for this, at least in release
builds.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:41 -05:00
Kent Overstreet
3398124444 bcachefs: Improve trace_trans_restart_would_deadlock
In the CI, we're seeing tests failing due to excessive would_deadlock
transaction restarts - the tracepoint now includes the lock cycle that
occurred.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:39 -05:00