2019-05-27 08:55:01 +02:00
|
|
|
// SPDX-License-Identifier: GPL-2.0-or-later
|
2007-04-26 15:49:28 -07:00
|
|
|
/* AFS cell and server record management
|
2005-04-16 15:20:36 -07:00
|
|
|
*
|
2017-11-02 15:27:50 +00:00
|
|
|
* Copyright (C) 2002, 2017 Red Hat, Inc. All Rights Reserved.
|
2005-04-16 15:20:36 -07:00
|
|
|
* Written by David Howells (dhowells@redhat.com)
|
|
|
|
*/
|
|
|
|
|
|
|
|
#include <linux/slab.h>
|
2007-04-26 15:57:07 -07:00
|
|
|
#include <linux/key.h>
|
|
|
|
#include <linux/ctype.h>
|
2010-08-04 15:16:38 +01:00
|
|
|
#include <linux/dns_resolver.h>
|
Detach sched.h from mm.h
First thing mm.h does is including sched.h solely for can_do_mlock() inline
function which has "current" dereference inside. By dealing with can_do_mlock()
mm.h can be detached from sched.h which is good. See below, why.
This patch
a) removes unconditional inclusion of sched.h from mm.h
b) makes can_do_mlock() normal function in mm/mlock.c
c) exports can_do_mlock() to not break compilation
d) adds sched.h inclusions back to files that were getting it indirectly.
e) adds less bloated headers to some files (asm/signal.h, jiffies.h) that were
getting them indirectly
Net result is:
a) mm.h users would get less code to open, read, preprocess, parse, ... if
they don't need sched.h
b) sched.h stops being dependency for significant number of files:
on x86_64 allmodconfig touching sched.h results in recompile of 4083 files,
after patch it's only 3744 (-8.3%).
Cross-compile tested on
all arm defconfigs, all mips defconfigs, all powerpc defconfigs,
alpha alpha-up
arm
i386 i386-up i386-defconfig i386-allnoconfig
ia64 ia64-up
m68k
mips
parisc parisc-up
powerpc powerpc-up
s390 s390-up
sparc sparc-up
sparc64 sparc64-up
um-x86_64
x86_64 x86_64-up x86_64-defconfig x86_64-allnoconfig
as well as my two usual configs.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-21 01:22:52 +04:00
|
|
|
#include <linux/sched.h>
|
2017-11-02 15:27:47 +00:00
|
|
|
#include <linux/inet.h>
|
2018-06-15 15:19:22 +01:00
|
|
|
#include <linux/namei.h>
|
2007-04-26 15:57:07 -07:00
|
|
|
#include <keys/rxrpc-type.h>
|
2005-04-16 15:20:36 -07:00
|
|
|
#include "internal.h"
|
|
|
|
|
2018-04-09 21:12:31 +01:00
|
|
|
static unsigned __read_mostly afs_cell_gc_delay = 10;
|
2018-10-20 00:57:57 +01:00
|
|
|
static unsigned __read_mostly afs_cell_min_ttl = 10 * 60;
|
|
|
|
static unsigned __read_mostly afs_cell_max_ttl = 24 * 60 * 60;
|
2020-10-13 20:51:59 +01:00
|
|
|
static atomic_t cell_debug_id;
|
2017-11-02 15:27:50 +00:00
|
|
|
|
afs: Simplify cell record handling
Simplify afs_cell record handling to avoid very occasional races that cause
module removal to hang (it waits for all cell records to be removed).
There are two things that particularly contribute to the difficulty:
firstly, the code tries to pass a ref on the cell to the cell's maintenance
work item (which gets awkward if the work item is already queued); and,
secondly, there's an overall cell manager that tries to use just one timer
for the entire cell collection (to avoid having loads of timers). However,
both of these are probably unnecessarily restrictive.
To simplify this, the following changes are made:
(1) The cell record collection manager is removed. Each cell record
manages itself individually.
(2) Each afs_cell is given a second work item (cell->destroyer) that is
queued when its refcount reaches zero. This is not done in the
context of the putting thread as it might be in an inconvenient place
to sleep.
(3) Each afs_cell is given its own timer. The timer is used to expire the
cell record after a period of unuse if not otherwise pinned and can
also be used for other maintenance tasks if necessary (of which there
are currently none as DNS refresh is triggered by filesystem
operations).
(4) The afs_cell manager work item (cell->manager) is no longer given a
ref on the cell when queued; rather, the manager must be deleted.
This does away with the need to deal with the consequences of losing a
race to queue cell->manager. Clean up of extra queuing is deferred to
the destroyer.
(5) The cell destroyer work item makes sure the cell timer is removed and
that the normal cell work is cancelled before farming the actual
destruction off to RCU.
(6) When a network namespace is destroyed or the kafs module is unloaded,
it's now a simple matter of marking the namespace as dead then just
waking up all the cell work items. They will then remove and destroy
themselves once all remaining activity counts and/or a ref counts are
dropped. This makes sure that all server records are dropped first.
(7) The cell record state set is reduced to just four states: SETTING_UP,
ACTIVE, REMOVING and DEAD. The record persists in the active state
even when it's not being used until the time comes to remove it rather
than downgrading it to an inactive state from whence it can be
restored.
This means that the cell still appears in /proc and /afs when not in
use until it switches to the REMOVING state - at which point it is
removed.
Note that the REMOVING state is included so that someone wanting to
resurrect the cell record is forced to wait whilst the cell is torn
down in that state. Once it's in the DEAD state, it has been removed
from net->cells tree and is no longer findable and can be replaced.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20250224234154.2014840-16-dhowells@redhat.com/ # v1
Link: https://lore.kernel.org/r/20250310094206.801057-12-dhowells@redhat.com/ # v4
2025-02-24 16:06:03 +00:00
|
|
|
static void afs_cell_timer(struct timer_list *timer);
|
|
|
|
static void afs_destroy_cell_work(struct work_struct *work);
|
|
|
|
static void afs_manage_cell_work(struct work_struct *work);
|
2017-11-02 15:27:50 +00:00
|
|
|
|
|
|
|
static void afs_dec_cells_outstanding(struct afs_net *net)
|
|
|
|
{
|
|
|
|
if (atomic_dec_and_test(&net->cells_outstanding))
|
2018-03-15 11:42:28 +01:00
|
|
|
wake_up_var(&net->cells_outstanding);
|
2017-11-02 15:27:50 +00:00
|
|
|
}
|
|
|
|
|
afs: Simplify cell record handling
Simplify afs_cell record handling to avoid very occasional races that cause
module removal to hang (it waits for all cell records to be removed).
There are two things that particularly contribute to the difficulty:
firstly, the code tries to pass a ref on the cell to the cell's maintenance
work item (which gets awkward if the work item is already queued); and,
secondly, there's an overall cell manager that tries to use just one timer
for the entire cell collection (to avoid having loads of timers). However,
both of these are probably unnecessarily restrictive.
To simplify this, the following changes are made:
(1) The cell record collection manager is removed. Each cell record
manages itself individually.
(2) Each afs_cell is given a second work item (cell->destroyer) that is
queued when its refcount reaches zero. This is not done in the
context of the putting thread as it might be in an inconvenient place
to sleep.
(3) Each afs_cell is given its own timer. The timer is used to expire the
cell record after a period of unuse if not otherwise pinned and can
also be used for other maintenance tasks if necessary (of which there
are currently none as DNS refresh is triggered by filesystem
operations).
(4) The afs_cell manager work item (cell->manager) is no longer given a
ref on the cell when queued; rather, the manager must be deleted.
This does away with the need to deal with the consequences of losing a
race to queue cell->manager. Clean up of extra queuing is deferred to
the destroyer.
(5) The cell destroyer work item makes sure the cell timer is removed and
that the normal cell work is cancelled before farming the actual
destruction off to RCU.
(6) When a network namespace is destroyed or the kafs module is unloaded,
it's now a simple matter of marking the namespace as dead then just
waking up all the cell work items. They will then remove and destroy
themselves once all remaining activity counts and/or a ref counts are
dropped. This makes sure that all server records are dropped first.
(7) The cell record state set is reduced to just four states: SETTING_UP,
ACTIVE, REMOVING and DEAD. The record persists in the active state
even when it's not being used until the time comes to remove it rather
than downgrading it to an inactive state from whence it can be
restored.
This means that the cell still appears in /proc and /afs when not in
use until it switches to the REMOVING state - at which point it is
removed.
Note that the REMOVING state is included so that someone wanting to
resurrect the cell record is forced to wait whilst the cell is torn
down in that state. Once it's in the DEAD state, it has been removed
from net->cells tree and is no longer findable and can be replaced.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20250224234154.2014840-16-dhowells@redhat.com/ # v1
Link: https://lore.kernel.org/r/20250310094206.801057-12-dhowells@redhat.com/ # v4
2025-02-24 16:06:03 +00:00
|
|
|
static void afs_set_cell_state(struct afs_cell *cell, enum afs_cell_state state)
|
2005-04-16 15:20:36 -07:00
|
|
|
{
|
afs: Simplify cell record handling
Simplify afs_cell record handling to avoid very occasional races that cause
module removal to hang (it waits for all cell records to be removed).
There are two things that particularly contribute to the difficulty:
firstly, the code tries to pass a ref on the cell to the cell's maintenance
work item (which gets awkward if the work item is already queued); and,
secondly, there's an overall cell manager that tries to use just one timer
for the entire cell collection (to avoid having loads of timers). However,
both of these are probably unnecessarily restrictive.
To simplify this, the following changes are made:
(1) The cell record collection manager is removed. Each cell record
manages itself individually.
(2) Each afs_cell is given a second work item (cell->destroyer) that is
queued when its refcount reaches zero. This is not done in the
context of the putting thread as it might be in an inconvenient place
to sleep.
(3) Each afs_cell is given its own timer. The timer is used to expire the
cell record after a period of unuse if not otherwise pinned and can
also be used for other maintenance tasks if necessary (of which there
are currently none as DNS refresh is triggered by filesystem
operations).
(4) The afs_cell manager work item (cell->manager) is no longer given a
ref on the cell when queued; rather, the manager must be deleted.
This does away with the need to deal with the consequences of losing a
race to queue cell->manager. Clean up of extra queuing is deferred to
the destroyer.
(5) The cell destroyer work item makes sure the cell timer is removed and
that the normal cell work is cancelled before farming the actual
destruction off to RCU.
(6) When a network namespace is destroyed or the kafs module is unloaded,
it's now a simple matter of marking the namespace as dead then just
waking up all the cell work items. They will then remove and destroy
themselves once all remaining activity counts and/or a ref counts are
dropped. This makes sure that all server records are dropped first.
(7) The cell record state set is reduced to just four states: SETTING_UP,
ACTIVE, REMOVING and DEAD. The record persists in the active state
even when it's not being used until the time comes to remove it rather
than downgrading it to an inactive state from whence it can be
restored.
This means that the cell still appears in /proc and /afs when not in
use until it switches to the REMOVING state - at which point it is
removed.
Note that the REMOVING state is included so that someone wanting to
resurrect the cell record is forced to wait whilst the cell is torn
down in that state. Once it's in the DEAD state, it has been removed
from net->cells tree and is no longer findable and can be replaced.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20250224234154.2014840-16-dhowells@redhat.com/ # v1
Link: https://lore.kernel.org/r/20250310094206.801057-12-dhowells@redhat.com/ # v4
2025-02-24 16:06:03 +00:00
|
|
|
smp_store_release(&cell->state, state); /* Commit cell changes before state */
|
|
|
|
smp_wmb(); /* Set cell state before task state */
|
|
|
|
wake_up_var(&cell->state);
|
2017-11-02 15:27:50 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
afs: Fix rapid cell addition/removal by not using RCU on cells tree
There are a number of problems that are being seen by the rapidly mounting
and unmounting an afs dynamic root with an explicit cell and volume
specified (which should probably be rejected, but that's a separate issue):
What the tests are doing is to look up/create a cell record for the name
given and then tear it down again without actually using it to try to talk
to a server. This is repeated endlessly, very fast, and the new cell
collides with the old one if it's not quick enough to reuse it.
It appears (as suggested by Hillf Danton) that the search through the RB
tree under a read_seqbegin_or_lock() under RCU conditions isn't safe and
that it's not blocking the write_seqlock(), despite taking two passes at
it. He suggested that the code should take a ref on the cell it's
attempting to look at - but this shouldn't be necessary until we've
compared the cell names. It's possible that I'm missing a barrier
somewhere.
However, using an RCU search for this is overkill, really - we only need to
access the cell name in a few places, and they're places where we're may
end up sleeping anyway.
Fix this by switching to an R/W semaphore instead.
Additionally, draw the down_read() call inside the function (renamed to
afs_find_cell()) since all the callers were taking the RCU read lock (or
should've been[*]).
[*] afs_probe_cell_name() should have been, but that doesn't appear to be
involved in the bug reports.
The symptoms of this look like:
general protection fault, probably for non-canonical address 0xf27d208691691fdb: 0000 [#1] PREEMPT SMP KASAN
KASAN: maybe wild-memory-access in range [0x93e924348b48fed8-0x93e924348b48fedf]
...
RIP: 0010:strncasecmp lib/string.c:52 [inline]
RIP: 0010:strncasecmp+0x5f/0x240 lib/string.c:43
afs_lookup_cell_rcu+0x313/0x720 fs/afs/cell.c:88
afs_lookup_cell+0x2ee/0x1440 fs/afs/cell.c:249
afs_parse_source fs/afs/super.c:290 [inline]
...
Fixes: 989782dcdc91 ("afs: Overhaul cell database management")
Reported-by: syzbot+459a5dce0b4cb70fd076@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Hillf Danton <hdanton@sina.com>
cc: syzkaller-bugs@googlegroups.com
2020-10-09 14:11:58 +01:00
|
|
|
* Look up and get an activation reference on a cell record. The caller must
|
|
|
|
* hold net->cells_lock at least read-locked.
|
2017-11-02 15:27:50 +00:00
|
|
|
*/
|
afs: Fix rapid cell addition/removal by not using RCU on cells tree
There are a number of problems that are being seen by the rapidly mounting
and unmounting an afs dynamic root with an explicit cell and volume
specified (which should probably be rejected, but that's a separate issue):
What the tests are doing is to look up/create a cell record for the name
given and then tear it down again without actually using it to try to talk
to a server. This is repeated endlessly, very fast, and the new cell
collides with the old one if it's not quick enough to reuse it.
It appears (as suggested by Hillf Danton) that the search through the RB
tree under a read_seqbegin_or_lock() under RCU conditions isn't safe and
that it's not blocking the write_seqlock(), despite taking two passes at
it. He suggested that the code should take a ref on the cell it's
attempting to look at - but this shouldn't be necessary until we've
compared the cell names. It's possible that I'm missing a barrier
somewhere.
However, using an RCU search for this is overkill, really - we only need to
access the cell name in a few places, and they're places where we're may
end up sleeping anyway.
Fix this by switching to an R/W semaphore instead.
Additionally, draw the down_read() call inside the function (renamed to
afs_find_cell()) since all the callers were taking the RCU read lock (or
should've been[*]).
[*] afs_probe_cell_name() should have been, but that doesn't appear to be
involved in the bug reports.
The symptoms of this look like:
general protection fault, probably for non-canonical address 0xf27d208691691fdb: 0000 [#1] PREEMPT SMP KASAN
KASAN: maybe wild-memory-access in range [0x93e924348b48fed8-0x93e924348b48fedf]
...
RIP: 0010:strncasecmp lib/string.c:52 [inline]
RIP: 0010:strncasecmp+0x5f/0x240 lib/string.c:43
afs_lookup_cell_rcu+0x313/0x720 fs/afs/cell.c:88
afs_lookup_cell+0x2ee/0x1440 fs/afs/cell.c:249
afs_parse_source fs/afs/super.c:290 [inline]
...
Fixes: 989782dcdc91 ("afs: Overhaul cell database management")
Reported-by: syzbot+459a5dce0b4cb70fd076@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Hillf Danton <hdanton@sina.com>
cc: syzkaller-bugs@googlegroups.com
2020-10-09 14:11:58 +01:00
|
|
|
static struct afs_cell *afs_find_cell_locked(struct afs_net *net,
|
2020-10-13 20:51:59 +01:00
|
|
|
const char *name, unsigned int namesz,
|
|
|
|
enum afs_cell_trace reason)
|
2017-11-02 15:27:50 +00:00
|
|
|
{
|
|
|
|
struct afs_cell *cell = NULL;
|
|
|
|
struct rb_node *p;
|
afs: Fix rapid cell addition/removal by not using RCU on cells tree
There are a number of problems that are being seen by the rapidly mounting
and unmounting an afs dynamic root with an explicit cell and volume
specified (which should probably be rejected, but that's a separate issue):
What the tests are doing is to look up/create a cell record for the name
given and then tear it down again without actually using it to try to talk
to a server. This is repeated endlessly, very fast, and the new cell
collides with the old one if it's not quick enough to reuse it.
It appears (as suggested by Hillf Danton) that the search through the RB
tree under a read_seqbegin_or_lock() under RCU conditions isn't safe and
that it's not blocking the write_seqlock(), despite taking two passes at
it. He suggested that the code should take a ref on the cell it's
attempting to look at - but this shouldn't be necessary until we've
compared the cell names. It's possible that I'm missing a barrier
somewhere.
However, using an RCU search for this is overkill, really - we only need to
access the cell name in a few places, and they're places where we're may
end up sleeping anyway.
Fix this by switching to an R/W semaphore instead.
Additionally, draw the down_read() call inside the function (renamed to
afs_find_cell()) since all the callers were taking the RCU read lock (or
should've been[*]).
[*] afs_probe_cell_name() should have been, but that doesn't appear to be
involved in the bug reports.
The symptoms of this look like:
general protection fault, probably for non-canonical address 0xf27d208691691fdb: 0000 [#1] PREEMPT SMP KASAN
KASAN: maybe wild-memory-access in range [0x93e924348b48fed8-0x93e924348b48fedf]
...
RIP: 0010:strncasecmp lib/string.c:52 [inline]
RIP: 0010:strncasecmp+0x5f/0x240 lib/string.c:43
afs_lookup_cell_rcu+0x313/0x720 fs/afs/cell.c:88
afs_lookup_cell+0x2ee/0x1440 fs/afs/cell.c:249
afs_parse_source fs/afs/super.c:290 [inline]
...
Fixes: 989782dcdc91 ("afs: Overhaul cell database management")
Reported-by: syzbot+459a5dce0b4cb70fd076@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Hillf Danton <hdanton@sina.com>
cc: syzkaller-bugs@googlegroups.com
2020-10-09 14:11:58 +01:00
|
|
|
int n;
|
2017-11-02 15:27:50 +00:00
|
|
|
|
|
|
|
_enter("%*.*s", namesz, namesz, name);
|
|
|
|
|
|
|
|
if (name && namesz == 0)
|
|
|
|
return ERR_PTR(-EINVAL);
|
|
|
|
if (namesz > AFS_MAXCELLNAME)
|
|
|
|
return ERR_PTR(-ENAMETOOLONG);
|
|
|
|
|
afs: Fix rapid cell addition/removal by not using RCU on cells tree
There are a number of problems that are being seen by the rapidly mounting
and unmounting an afs dynamic root with an explicit cell and volume
specified (which should probably be rejected, but that's a separate issue):
What the tests are doing is to look up/create a cell record for the name
given and then tear it down again without actually using it to try to talk
to a server. This is repeated endlessly, very fast, and the new cell
collides with the old one if it's not quick enough to reuse it.
It appears (as suggested by Hillf Danton) that the search through the RB
tree under a read_seqbegin_or_lock() under RCU conditions isn't safe and
that it's not blocking the write_seqlock(), despite taking two passes at
it. He suggested that the code should take a ref on the cell it's
attempting to look at - but this shouldn't be necessary until we've
compared the cell names. It's possible that I'm missing a barrier
somewhere.
However, using an RCU search for this is overkill, really - we only need to
access the cell name in a few places, and they're places where we're may
end up sleeping anyway.
Fix this by switching to an R/W semaphore instead.
Additionally, draw the down_read() call inside the function (renamed to
afs_find_cell()) since all the callers were taking the RCU read lock (or
should've been[*]).
[*] afs_probe_cell_name() should have been, but that doesn't appear to be
involved in the bug reports.
The symptoms of this look like:
general protection fault, probably for non-canonical address 0xf27d208691691fdb: 0000 [#1] PREEMPT SMP KASAN
KASAN: maybe wild-memory-access in range [0x93e924348b48fed8-0x93e924348b48fedf]
...
RIP: 0010:strncasecmp lib/string.c:52 [inline]
RIP: 0010:strncasecmp+0x5f/0x240 lib/string.c:43
afs_lookup_cell_rcu+0x313/0x720 fs/afs/cell.c:88
afs_lookup_cell+0x2ee/0x1440 fs/afs/cell.c:249
afs_parse_source fs/afs/super.c:290 [inline]
...
Fixes: 989782dcdc91 ("afs: Overhaul cell database management")
Reported-by: syzbot+459a5dce0b4cb70fd076@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Hillf Danton <hdanton@sina.com>
cc: syzkaller-bugs@googlegroups.com
2020-10-09 14:11:58 +01:00
|
|
|
if (!name) {
|
2025-03-06 08:46:57 +00:00
|
|
|
cell = rcu_dereference_protected(net->ws_cell,
|
|
|
|
lockdep_is_held(&net->cells_lock));
|
afs: Fix rapid cell addition/removal by not using RCU on cells tree
There are a number of problems that are being seen by the rapidly mounting
and unmounting an afs dynamic root with an explicit cell and volume
specified (which should probably be rejected, but that's a separate issue):
What the tests are doing is to look up/create a cell record for the name
given and then tear it down again without actually using it to try to talk
to a server. This is repeated endlessly, very fast, and the new cell
collides with the old one if it's not quick enough to reuse it.
It appears (as suggested by Hillf Danton) that the search through the RB
tree under a read_seqbegin_or_lock() under RCU conditions isn't safe and
that it's not blocking the write_seqlock(), despite taking two passes at
it. He suggested that the code should take a ref on the cell it's
attempting to look at - but this shouldn't be necessary until we've
compared the cell names. It's possible that I'm missing a barrier
somewhere.
However, using an RCU search for this is overkill, really - we only need to
access the cell name in a few places, and they're places where we're may
end up sleeping anyway.
Fix this by switching to an R/W semaphore instead.
Additionally, draw the down_read() call inside the function (renamed to
afs_find_cell()) since all the callers were taking the RCU read lock (or
should've been[*]).
[*] afs_probe_cell_name() should have been, but that doesn't appear to be
involved in the bug reports.
The symptoms of this look like:
general protection fault, probably for non-canonical address 0xf27d208691691fdb: 0000 [#1] PREEMPT SMP KASAN
KASAN: maybe wild-memory-access in range [0x93e924348b48fed8-0x93e924348b48fedf]
...
RIP: 0010:strncasecmp lib/string.c:52 [inline]
RIP: 0010:strncasecmp+0x5f/0x240 lib/string.c:43
afs_lookup_cell_rcu+0x313/0x720 fs/afs/cell.c:88
afs_lookup_cell+0x2ee/0x1440 fs/afs/cell.c:249
afs_parse_source fs/afs/super.c:290 [inline]
...
Fixes: 989782dcdc91 ("afs: Overhaul cell database management")
Reported-by: syzbot+459a5dce0b4cb70fd076@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Hillf Danton <hdanton@sina.com>
cc: syzkaller-bugs@googlegroups.com
2020-10-09 14:11:58 +01:00
|
|
|
if (!cell)
|
|
|
|
return ERR_PTR(-EDESTADDRREQ);
|
2019-07-23 11:24:59 +01:00
|
|
|
goto found;
|
afs: Fix rapid cell addition/removal by not using RCU on cells tree
There are a number of problems that are being seen by the rapidly mounting
and unmounting an afs dynamic root with an explicit cell and volume
specified (which should probably be rejected, but that's a separate issue):
What the tests are doing is to look up/create a cell record for the name
given and then tear it down again without actually using it to try to talk
to a server. This is repeated endlessly, very fast, and the new cell
collides with the old one if it's not quick enough to reuse it.
It appears (as suggested by Hillf Danton) that the search through the RB
tree under a read_seqbegin_or_lock() under RCU conditions isn't safe and
that it's not blocking the write_seqlock(), despite taking two passes at
it. He suggested that the code should take a ref on the cell it's
attempting to look at - but this shouldn't be necessary until we've
compared the cell names. It's possible that I'm missing a barrier
somewhere.
However, using an RCU search for this is overkill, really - we only need to
access the cell name in a few places, and they're places where we're may
end up sleeping anyway.
Fix this by switching to an R/W semaphore instead.
Additionally, draw the down_read() call inside the function (renamed to
afs_find_cell()) since all the callers were taking the RCU read lock (or
should've been[*]).
[*] afs_probe_cell_name() should have been, but that doesn't appear to be
involved in the bug reports.
The symptoms of this look like:
general protection fault, probably for non-canonical address 0xf27d208691691fdb: 0000 [#1] PREEMPT SMP KASAN
KASAN: maybe wild-memory-access in range [0x93e924348b48fed8-0x93e924348b48fedf]
...
RIP: 0010:strncasecmp lib/string.c:52 [inline]
RIP: 0010:strncasecmp+0x5f/0x240 lib/string.c:43
afs_lookup_cell_rcu+0x313/0x720 fs/afs/cell.c:88
afs_lookup_cell+0x2ee/0x1440 fs/afs/cell.c:249
afs_parse_source fs/afs/super.c:290 [inline]
...
Fixes: 989782dcdc91 ("afs: Overhaul cell database management")
Reported-by: syzbot+459a5dce0b4cb70fd076@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Hillf Danton <hdanton@sina.com>
cc: syzkaller-bugs@googlegroups.com
2020-10-09 14:11:58 +01:00
|
|
|
}
|
2017-11-02 15:27:50 +00:00
|
|
|
|
afs: Fix rapid cell addition/removal by not using RCU on cells tree
There are a number of problems that are being seen by the rapidly mounting
and unmounting an afs dynamic root with an explicit cell and volume
specified (which should probably be rejected, but that's a separate issue):
What the tests are doing is to look up/create a cell record for the name
given and then tear it down again without actually using it to try to talk
to a server. This is repeated endlessly, very fast, and the new cell
collides with the old one if it's not quick enough to reuse it.
It appears (as suggested by Hillf Danton) that the search through the RB
tree under a read_seqbegin_or_lock() under RCU conditions isn't safe and
that it's not blocking the write_seqlock(), despite taking two passes at
it. He suggested that the code should take a ref on the cell it's
attempting to look at - but this shouldn't be necessary until we've
compared the cell names. It's possible that I'm missing a barrier
somewhere.
However, using an RCU search for this is overkill, really - we only need to
access the cell name in a few places, and they're places where we're may
end up sleeping anyway.
Fix this by switching to an R/W semaphore instead.
Additionally, draw the down_read() call inside the function (renamed to
afs_find_cell()) since all the callers were taking the RCU read lock (or
should've been[*]).
[*] afs_probe_cell_name() should have been, but that doesn't appear to be
involved in the bug reports.
The symptoms of this look like:
general protection fault, probably for non-canonical address 0xf27d208691691fdb: 0000 [#1] PREEMPT SMP KASAN
KASAN: maybe wild-memory-access in range [0x93e924348b48fed8-0x93e924348b48fedf]
...
RIP: 0010:strncasecmp lib/string.c:52 [inline]
RIP: 0010:strncasecmp+0x5f/0x240 lib/string.c:43
afs_lookup_cell_rcu+0x313/0x720 fs/afs/cell.c:88
afs_lookup_cell+0x2ee/0x1440 fs/afs/cell.c:249
afs_parse_source fs/afs/super.c:290 [inline]
...
Fixes: 989782dcdc91 ("afs: Overhaul cell database management")
Reported-by: syzbot+459a5dce0b4cb70fd076@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Hillf Danton <hdanton@sina.com>
cc: syzkaller-bugs@googlegroups.com
2020-10-09 14:11:58 +01:00
|
|
|
p = net->cells.rb_node;
|
|
|
|
while (p) {
|
|
|
|
cell = rb_entry(p, struct afs_cell, net_node);
|
2005-04-16 15:20:36 -07:00
|
|
|
|
afs: Fix rapid cell addition/removal by not using RCU on cells tree
There are a number of problems that are being seen by the rapidly mounting
and unmounting an afs dynamic root with an explicit cell and volume
specified (which should probably be rejected, but that's a separate issue):
What the tests are doing is to look up/create a cell record for the name
given and then tear it down again without actually using it to try to talk
to a server. This is repeated endlessly, very fast, and the new cell
collides with the old one if it's not quick enough to reuse it.
It appears (as suggested by Hillf Danton) that the search through the RB
tree under a read_seqbegin_or_lock() under RCU conditions isn't safe and
that it's not blocking the write_seqlock(), despite taking two passes at
it. He suggested that the code should take a ref on the cell it's
attempting to look at - but this shouldn't be necessary until we've
compared the cell names. It's possible that I'm missing a barrier
somewhere.
However, using an RCU search for this is overkill, really - we only need to
access the cell name in a few places, and they're places where we're may
end up sleeping anyway.
Fix this by switching to an R/W semaphore instead.
Additionally, draw the down_read() call inside the function (renamed to
afs_find_cell()) since all the callers were taking the RCU read lock (or
should've been[*]).
[*] afs_probe_cell_name() should have been, but that doesn't appear to be
involved in the bug reports.
The symptoms of this look like:
general protection fault, probably for non-canonical address 0xf27d208691691fdb: 0000 [#1] PREEMPT SMP KASAN
KASAN: maybe wild-memory-access in range [0x93e924348b48fed8-0x93e924348b48fedf]
...
RIP: 0010:strncasecmp lib/string.c:52 [inline]
RIP: 0010:strncasecmp+0x5f/0x240 lib/string.c:43
afs_lookup_cell_rcu+0x313/0x720 fs/afs/cell.c:88
afs_lookup_cell+0x2ee/0x1440 fs/afs/cell.c:249
afs_parse_source fs/afs/super.c:290 [inline]
...
Fixes: 989782dcdc91 ("afs: Overhaul cell database management")
Reported-by: syzbot+459a5dce0b4cb70fd076@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Hillf Danton <hdanton@sina.com>
cc: syzkaller-bugs@googlegroups.com
2020-10-09 14:11:58 +01:00
|
|
|
n = strncasecmp(cell->name, name,
|
|
|
|
min_t(size_t, cell->name_len, namesz));
|
|
|
|
if (n == 0)
|
|
|
|
n = cell->name_len - namesz;
|
|
|
|
if (n < 0)
|
|
|
|
p = p->rb_left;
|
|
|
|
else if (n > 0)
|
|
|
|
p = p->rb_right;
|
|
|
|
else
|
|
|
|
goto found;
|
|
|
|
}
|
|
|
|
|
|
|
|
return ERR_PTR(-ENOENT);
|
2005-04-16 15:20:36 -07:00
|
|
|
|
afs: Fix rapid cell addition/removal by not using RCU on cells tree
There are a number of problems that are being seen by the rapidly mounting
and unmounting an afs dynamic root with an explicit cell and volume
specified (which should probably be rejected, but that's a separate issue):
What the tests are doing is to look up/create a cell record for the name
given and then tear it down again without actually using it to try to talk
to a server. This is repeated endlessly, very fast, and the new cell
collides with the old one if it's not quick enough to reuse it.
It appears (as suggested by Hillf Danton) that the search through the RB
tree under a read_seqbegin_or_lock() under RCU conditions isn't safe and
that it's not blocking the write_seqlock(), despite taking two passes at
it. He suggested that the code should take a ref on the cell it's
attempting to look at - but this shouldn't be necessary until we've
compared the cell names. It's possible that I'm missing a barrier
somewhere.
However, using an RCU search for this is overkill, really - we only need to
access the cell name in a few places, and they're places where we're may
end up sleeping anyway.
Fix this by switching to an R/W semaphore instead.
Additionally, draw the down_read() call inside the function (renamed to
afs_find_cell()) since all the callers were taking the RCU read lock (or
should've been[*]).
[*] afs_probe_cell_name() should have been, but that doesn't appear to be
involved in the bug reports.
The symptoms of this look like:
general protection fault, probably for non-canonical address 0xf27d208691691fdb: 0000 [#1] PREEMPT SMP KASAN
KASAN: maybe wild-memory-access in range [0x93e924348b48fed8-0x93e924348b48fedf]
...
RIP: 0010:strncasecmp lib/string.c:52 [inline]
RIP: 0010:strncasecmp+0x5f/0x240 lib/string.c:43
afs_lookup_cell_rcu+0x313/0x720 fs/afs/cell.c:88
afs_lookup_cell+0x2ee/0x1440 fs/afs/cell.c:249
afs_parse_source fs/afs/super.c:290 [inline]
...
Fixes: 989782dcdc91 ("afs: Overhaul cell database management")
Reported-by: syzbot+459a5dce0b4cb70fd076@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Hillf Danton <hdanton@sina.com>
cc: syzkaller-bugs@googlegroups.com
2020-10-09 14:11:58 +01:00
|
|
|
found:
|
2020-10-13 20:51:59 +01:00
|
|
|
return afs_use_cell(cell, reason);
|
afs: Fix rapid cell addition/removal by not using RCU on cells tree
There are a number of problems that are being seen by the rapidly mounting
and unmounting an afs dynamic root with an explicit cell and volume
specified (which should probably be rejected, but that's a separate issue):
What the tests are doing is to look up/create a cell record for the name
given and then tear it down again without actually using it to try to talk
to a server. This is repeated endlessly, very fast, and the new cell
collides with the old one if it's not quick enough to reuse it.
It appears (as suggested by Hillf Danton) that the search through the RB
tree under a read_seqbegin_or_lock() under RCU conditions isn't safe and
that it's not blocking the write_seqlock(), despite taking two passes at
it. He suggested that the code should take a ref on the cell it's
attempting to look at - but this shouldn't be necessary until we've
compared the cell names. It's possible that I'm missing a barrier
somewhere.
However, using an RCU search for this is overkill, really - we only need to
access the cell name in a few places, and they're places where we're may
end up sleeping anyway.
Fix this by switching to an R/W semaphore instead.
Additionally, draw the down_read() call inside the function (renamed to
afs_find_cell()) since all the callers were taking the RCU read lock (or
should've been[*]).
[*] afs_probe_cell_name() should have been, but that doesn't appear to be
involved in the bug reports.
The symptoms of this look like:
general protection fault, probably for non-canonical address 0xf27d208691691fdb: 0000 [#1] PREEMPT SMP KASAN
KASAN: maybe wild-memory-access in range [0x93e924348b48fed8-0x93e924348b48fedf]
...
RIP: 0010:strncasecmp lib/string.c:52 [inline]
RIP: 0010:strncasecmp+0x5f/0x240 lib/string.c:43
afs_lookup_cell_rcu+0x313/0x720 fs/afs/cell.c:88
afs_lookup_cell+0x2ee/0x1440 fs/afs/cell.c:249
afs_parse_source fs/afs/super.c:290 [inline]
...
Fixes: 989782dcdc91 ("afs: Overhaul cell database management")
Reported-by: syzbot+459a5dce0b4cb70fd076@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Hillf Danton <hdanton@sina.com>
cc: syzkaller-bugs@googlegroups.com
2020-10-09 14:11:58 +01:00
|
|
|
}
|
2019-08-22 13:28:43 +01:00
|
|
|
|
2019-07-23 11:24:59 +01:00
|
|
|
/*
|
|
|
|
* Look up and get an activation reference on a cell record.
|
|
|
|
*/
|
afs: Fix rapid cell addition/removal by not using RCU on cells tree
There are a number of problems that are being seen by the rapidly mounting
and unmounting an afs dynamic root with an explicit cell and volume
specified (which should probably be rejected, but that's a separate issue):
What the tests are doing is to look up/create a cell record for the name
given and then tear it down again without actually using it to try to talk
to a server. This is repeated endlessly, very fast, and the new cell
collides with the old one if it's not quick enough to reuse it.
It appears (as suggested by Hillf Danton) that the search through the RB
tree under a read_seqbegin_or_lock() under RCU conditions isn't safe and
that it's not blocking the write_seqlock(), despite taking two passes at
it. He suggested that the code should take a ref on the cell it's
attempting to look at - but this shouldn't be necessary until we've
compared the cell names. It's possible that I'm missing a barrier
somewhere.
However, using an RCU search for this is overkill, really - we only need to
access the cell name in a few places, and they're places where we're may
end up sleeping anyway.
Fix this by switching to an R/W semaphore instead.
Additionally, draw the down_read() call inside the function (renamed to
afs_find_cell()) since all the callers were taking the RCU read lock (or
should've been[*]).
[*] afs_probe_cell_name() should have been, but that doesn't appear to be
involved in the bug reports.
The symptoms of this look like:
general protection fault, probably for non-canonical address 0xf27d208691691fdb: 0000 [#1] PREEMPT SMP KASAN
KASAN: maybe wild-memory-access in range [0x93e924348b48fed8-0x93e924348b48fedf]
...
RIP: 0010:strncasecmp lib/string.c:52 [inline]
RIP: 0010:strncasecmp+0x5f/0x240 lib/string.c:43
afs_lookup_cell_rcu+0x313/0x720 fs/afs/cell.c:88
afs_lookup_cell+0x2ee/0x1440 fs/afs/cell.c:249
afs_parse_source fs/afs/super.c:290 [inline]
...
Fixes: 989782dcdc91 ("afs: Overhaul cell database management")
Reported-by: syzbot+459a5dce0b4cb70fd076@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Hillf Danton <hdanton@sina.com>
cc: syzkaller-bugs@googlegroups.com
2020-10-09 14:11:58 +01:00
|
|
|
struct afs_cell *afs_find_cell(struct afs_net *net,
|
2020-10-13 20:51:59 +01:00
|
|
|
const char *name, unsigned int namesz,
|
|
|
|
enum afs_cell_trace reason)
|
afs: Fix rapid cell addition/removal by not using RCU on cells tree
There are a number of problems that are being seen by the rapidly mounting
and unmounting an afs dynamic root with an explicit cell and volume
specified (which should probably be rejected, but that's a separate issue):
What the tests are doing is to look up/create a cell record for the name
given and then tear it down again without actually using it to try to talk
to a server. This is repeated endlessly, very fast, and the new cell
collides with the old one if it's not quick enough to reuse it.
It appears (as suggested by Hillf Danton) that the search through the RB
tree under a read_seqbegin_or_lock() under RCU conditions isn't safe and
that it's not blocking the write_seqlock(), despite taking two passes at
it. He suggested that the code should take a ref on the cell it's
attempting to look at - but this shouldn't be necessary until we've
compared the cell names. It's possible that I'm missing a barrier
somewhere.
However, using an RCU search for this is overkill, really - we only need to
access the cell name in a few places, and they're places where we're may
end up sleeping anyway.
Fix this by switching to an R/W semaphore instead.
Additionally, draw the down_read() call inside the function (renamed to
afs_find_cell()) since all the callers were taking the RCU read lock (or
should've been[*]).
[*] afs_probe_cell_name() should have been, but that doesn't appear to be
involved in the bug reports.
The symptoms of this look like:
general protection fault, probably for non-canonical address 0xf27d208691691fdb: 0000 [#1] PREEMPT SMP KASAN
KASAN: maybe wild-memory-access in range [0x93e924348b48fed8-0x93e924348b48fedf]
...
RIP: 0010:strncasecmp lib/string.c:52 [inline]
RIP: 0010:strncasecmp+0x5f/0x240 lib/string.c:43
afs_lookup_cell_rcu+0x313/0x720 fs/afs/cell.c:88
afs_lookup_cell+0x2ee/0x1440 fs/afs/cell.c:249
afs_parse_source fs/afs/super.c:290 [inline]
...
Fixes: 989782dcdc91 ("afs: Overhaul cell database management")
Reported-by: syzbot+459a5dce0b4cb70fd076@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Hillf Danton <hdanton@sina.com>
cc: syzkaller-bugs@googlegroups.com
2020-10-09 14:11:58 +01:00
|
|
|
{
|
|
|
|
struct afs_cell *cell;
|
|
|
|
|
|
|
|
down_read(&net->cells_lock);
|
2020-10-13 20:51:59 +01:00
|
|
|
cell = afs_find_cell_locked(net, name, namesz, reason);
|
afs: Fix rapid cell addition/removal by not using RCU on cells tree
There are a number of problems that are being seen by the rapidly mounting
and unmounting an afs dynamic root with an explicit cell and volume
specified (which should probably be rejected, but that's a separate issue):
What the tests are doing is to look up/create a cell record for the name
given and then tear it down again without actually using it to try to talk
to a server. This is repeated endlessly, very fast, and the new cell
collides with the old one if it's not quick enough to reuse it.
It appears (as suggested by Hillf Danton) that the search through the RB
tree under a read_seqbegin_or_lock() under RCU conditions isn't safe and
that it's not blocking the write_seqlock(), despite taking two passes at
it. He suggested that the code should take a ref on the cell it's
attempting to look at - but this shouldn't be necessary until we've
compared the cell names. It's possible that I'm missing a barrier
somewhere.
However, using an RCU search for this is overkill, really - we only need to
access the cell name in a few places, and they're places where we're may
end up sleeping anyway.
Fix this by switching to an R/W semaphore instead.
Additionally, draw the down_read() call inside the function (renamed to
afs_find_cell()) since all the callers were taking the RCU read lock (or
should've been[*]).
[*] afs_probe_cell_name() should have been, but that doesn't appear to be
involved in the bug reports.
The symptoms of this look like:
general protection fault, probably for non-canonical address 0xf27d208691691fdb: 0000 [#1] PREEMPT SMP KASAN
KASAN: maybe wild-memory-access in range [0x93e924348b48fed8-0x93e924348b48fedf]
...
RIP: 0010:strncasecmp lib/string.c:52 [inline]
RIP: 0010:strncasecmp+0x5f/0x240 lib/string.c:43
afs_lookup_cell_rcu+0x313/0x720 fs/afs/cell.c:88
afs_lookup_cell+0x2ee/0x1440 fs/afs/cell.c:249
afs_parse_source fs/afs/super.c:290 [inline]
...
Fixes: 989782dcdc91 ("afs: Overhaul cell database management")
Reported-by: syzbot+459a5dce0b4cb70fd076@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Hillf Danton <hdanton@sina.com>
cc: syzkaller-bugs@googlegroups.com
2020-10-09 14:11:58 +01:00
|
|
|
up_read(&net->cells_lock);
|
|
|
|
return cell;
|
2017-11-02 15:27:50 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Set up a cell record and fill in its name, VL server address list and
|
|
|
|
* allocate an anonymous key
|
|
|
|
*/
|
|
|
|
static struct afs_cell *afs_alloc_cell(struct afs_net *net,
|
|
|
|
const char *name, unsigned int namelen,
|
2018-10-20 00:57:57 +01:00
|
|
|
const char *addresses)
|
2017-11-02 15:27:50 +00:00
|
|
|
{
|
afs: Simplify cell record handling
Simplify afs_cell record handling to avoid very occasional races that cause
module removal to hang (it waits for all cell records to be removed).
There are two things that particularly contribute to the difficulty:
firstly, the code tries to pass a ref on the cell to the cell's maintenance
work item (which gets awkward if the work item is already queued); and,
secondly, there's an overall cell manager that tries to use just one timer
for the entire cell collection (to avoid having loads of timers). However,
both of these are probably unnecessarily restrictive.
To simplify this, the following changes are made:
(1) The cell record collection manager is removed. Each cell record
manages itself individually.
(2) Each afs_cell is given a second work item (cell->destroyer) that is
queued when its refcount reaches zero. This is not done in the
context of the putting thread as it might be in an inconvenient place
to sleep.
(3) Each afs_cell is given its own timer. The timer is used to expire the
cell record after a period of unuse if not otherwise pinned and can
also be used for other maintenance tasks if necessary (of which there
are currently none as DNS refresh is triggered by filesystem
operations).
(4) The afs_cell manager work item (cell->manager) is no longer given a
ref on the cell when queued; rather, the manager must be deleted.
This does away with the need to deal with the consequences of losing a
race to queue cell->manager. Clean up of extra queuing is deferred to
the destroyer.
(5) The cell destroyer work item makes sure the cell timer is removed and
that the normal cell work is cancelled before farming the actual
destruction off to RCU.
(6) When a network namespace is destroyed or the kafs module is unloaded,
it's now a simple matter of marking the namespace as dead then just
waking up all the cell work items. They will then remove and destroy
themselves once all remaining activity counts and/or a ref counts are
dropped. This makes sure that all server records are dropped first.
(7) The cell record state set is reduced to just four states: SETTING_UP,
ACTIVE, REMOVING and DEAD. The record persists in the active state
even when it's not being used until the time comes to remove it rather
than downgrading it to an inactive state from whence it can be
restored.
This means that the cell still appears in /proc and /afs when not in
use until it switches to the REMOVING state - at which point it is
removed.
Note that the REMOVING state is included so that someone wanting to
resurrect the cell record is forced to wait whilst the cell is torn
down in that state. Once it's in the DEAD state, it has been removed
from net->cells tree and is no longer findable and can be replaced.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20250224234154.2014840-16-dhowells@redhat.com/ # v1
Link: https://lore.kernel.org/r/20250310094206.801057-12-dhowells@redhat.com/ # v4
2025-02-24 16:06:03 +00:00
|
|
|
struct afs_vlserver_list *vllist = NULL;
|
2017-11-02 15:27:50 +00:00
|
|
|
struct afs_cell *cell;
|
|
|
|
int i, ret;
|
|
|
|
|
|
|
|
ASSERT(name);
|
|
|
|
if (namelen == 0)
|
|
|
|
return ERR_PTR(-EINVAL);
|
2010-08-04 15:16:38 +01:00
|
|
|
if (namelen > AFS_MAXCELLNAME) {
|
|
|
|
_leave(" = -ENAMETOOLONG");
|
2007-04-26 15:57:07 -07:00
|
|
|
return ERR_PTR(-ENAMETOOLONG);
|
2010-08-04 15:16:38 +01:00
|
|
|
}
|
2020-01-26 01:02:53 +00:00
|
|
|
|
|
|
|
/* Prohibit cell names that contain unprintable chars, '/' and '@' or
|
|
|
|
* that begin with a dot. This also precludes "@cell".
|
|
|
|
*/
|
|
|
|
if (name[0] == '.')
|
2018-04-06 14:17:23 +01:00
|
|
|
return ERR_PTR(-EINVAL);
|
2020-01-26 01:02:53 +00:00
|
|
|
for (i = 0; i < namelen; i++) {
|
|
|
|
char ch = name[i];
|
|
|
|
if (!isprint(ch) || ch == '/' || ch == '@')
|
|
|
|
return ERR_PTR(-EINVAL);
|
|
|
|
}
|
2007-04-26 15:57:07 -07:00
|
|
|
|
2018-10-20 00:57:57 +01:00
|
|
|
_enter("%*.*s,%s", namelen, namelen, name, addresses);
|
2017-11-02 15:27:50 +00:00
|
|
|
|
|
|
|
cell = kzalloc(sizeof(struct afs_cell), GFP_KERNEL);
|
2005-04-16 15:20:36 -07:00
|
|
|
if (!cell) {
|
|
|
|
_leave(" = -ENOMEM");
|
2007-04-26 15:55:03 -07:00
|
|
|
return ERR_PTR(-ENOMEM);
|
2005-04-16 15:20:36 -07:00
|
|
|
}
|
|
|
|
|
2025-01-07 18:34:49 +00:00
|
|
|
cell->name = kmalloc(1 + namelen + 1, GFP_KERNEL);
|
2020-06-24 17:00:24 +01:00
|
|
|
if (!cell->name) {
|
|
|
|
kfree(cell);
|
|
|
|
return ERR_PTR(-ENOMEM);
|
|
|
|
}
|
|
|
|
|
2025-01-07 18:34:49 +00:00
|
|
|
cell->name[0] = '.';
|
|
|
|
cell->name++;
|
2017-11-02 15:27:50 +00:00
|
|
|
cell->name_len = namelen;
|
|
|
|
for (i = 0; i < namelen; i++)
|
|
|
|
cell->name[i] = tolower(name[i]);
|
2020-06-24 17:00:24 +01:00
|
|
|
cell->name[i] = 0;
|
2017-11-02 15:27:50 +00:00
|
|
|
|
2025-01-07 18:34:49 +00:00
|
|
|
cell->net = net;
|
2022-07-06 10:52:14 +01:00
|
|
|
refcount_set(&cell->ref, 1);
|
2019-07-23 11:24:59 +01:00
|
|
|
atomic_set(&cell->active, 0);
|
afs: Simplify cell record handling
Simplify afs_cell record handling to avoid very occasional races that cause
module removal to hang (it waits for all cell records to be removed).
There are two things that particularly contribute to the difficulty:
firstly, the code tries to pass a ref on the cell to the cell's maintenance
work item (which gets awkward if the work item is already queued); and,
secondly, there's an overall cell manager that tries to use just one timer
for the entire cell collection (to avoid having loads of timers). However,
both of these are probably unnecessarily restrictive.
To simplify this, the following changes are made:
(1) The cell record collection manager is removed. Each cell record
manages itself individually.
(2) Each afs_cell is given a second work item (cell->destroyer) that is
queued when its refcount reaches zero. This is not done in the
context of the putting thread as it might be in an inconvenient place
to sleep.
(3) Each afs_cell is given its own timer. The timer is used to expire the
cell record after a period of unuse if not otherwise pinned and can
also be used for other maintenance tasks if necessary (of which there
are currently none as DNS refresh is triggered by filesystem
operations).
(4) The afs_cell manager work item (cell->manager) is no longer given a
ref on the cell when queued; rather, the manager must be deleted.
This does away with the need to deal with the consequences of losing a
race to queue cell->manager. Clean up of extra queuing is deferred to
the destroyer.
(5) The cell destroyer work item makes sure the cell timer is removed and
that the normal cell work is cancelled before farming the actual
destruction off to RCU.
(6) When a network namespace is destroyed or the kafs module is unloaded,
it's now a simple matter of marking the namespace as dead then just
waking up all the cell work items. They will then remove and destroy
themselves once all remaining activity counts and/or a ref counts are
dropped. This makes sure that all server records are dropped first.
(7) The cell record state set is reduced to just four states: SETTING_UP,
ACTIVE, REMOVING and DEAD. The record persists in the active state
even when it's not being used until the time comes to remove it rather
than downgrading it to an inactive state from whence it can be
restored.
This means that the cell still appears in /proc and /afs when not in
use until it switches to the REMOVING state - at which point it is
removed.
Note that the REMOVING state is included so that someone wanting to
resurrect the cell record is forced to wait whilst the cell is torn
down in that state. Once it's in the DEAD state, it has been removed
from net->cells tree and is no longer findable and can be replaced.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20250224234154.2014840-16-dhowells@redhat.com/ # v1
Link: https://lore.kernel.org/r/20250310094206.801057-12-dhowells@redhat.com/ # v4
2025-02-24 16:06:03 +00:00
|
|
|
INIT_WORK(&cell->destroyer, afs_destroy_cell_work);
|
2019-07-23 11:24:59 +01:00
|
|
|
INIT_WORK(&cell->manager, afs_manage_cell_work);
|
afs: Simplify cell record handling
Simplify afs_cell record handling to avoid very occasional races that cause
module removal to hang (it waits for all cell records to be removed).
There are two things that particularly contribute to the difficulty:
firstly, the code tries to pass a ref on the cell to the cell's maintenance
work item (which gets awkward if the work item is already queued); and,
secondly, there's an overall cell manager that tries to use just one timer
for the entire cell collection (to avoid having loads of timers). However,
both of these are probably unnecessarily restrictive.
To simplify this, the following changes are made:
(1) The cell record collection manager is removed. Each cell record
manages itself individually.
(2) Each afs_cell is given a second work item (cell->destroyer) that is
queued when its refcount reaches zero. This is not done in the
context of the putting thread as it might be in an inconvenient place
to sleep.
(3) Each afs_cell is given its own timer. The timer is used to expire the
cell record after a period of unuse if not otherwise pinned and can
also be used for other maintenance tasks if necessary (of which there
are currently none as DNS refresh is triggered by filesystem
operations).
(4) The afs_cell manager work item (cell->manager) is no longer given a
ref on the cell when queued; rather, the manager must be deleted.
This does away with the need to deal with the consequences of losing a
race to queue cell->manager. Clean up of extra queuing is deferred to
the destroyer.
(5) The cell destroyer work item makes sure the cell timer is removed and
that the normal cell work is cancelled before farming the actual
destruction off to RCU.
(6) When a network namespace is destroyed or the kafs module is unloaded,
it's now a simple matter of marking the namespace as dead then just
waking up all the cell work items. They will then remove and destroy
themselves once all remaining activity counts and/or a ref counts are
dropped. This makes sure that all server records are dropped first.
(7) The cell record state set is reduced to just four states: SETTING_UP,
ACTIVE, REMOVING and DEAD. The record persists in the active state
even when it's not being used until the time comes to remove it rather
than downgrading it to an inactive state from whence it can be
restored.
This means that the cell still appears in /proc and /afs when not in
use until it switches to the REMOVING state - at which point it is
removed.
Note that the REMOVING state is included so that someone wanting to
resurrect the cell record is forced to wait whilst the cell is torn
down in that state. Once it's in the DEAD state, it has been removed
from net->cells tree and is no longer findable and can be replaced.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20250224234154.2014840-16-dhowells@redhat.com/ # v1
Link: https://lore.kernel.org/r/20250310094206.801057-12-dhowells@redhat.com/ # v4
2025-02-24 16:06:03 +00:00
|
|
|
timer_setup(&cell->management_timer, afs_cell_timer, 0);
|
2023-11-07 17:59:33 +00:00
|
|
|
init_rwsem(&cell->vs_lock);
|
2020-04-30 01:03:49 +01:00
|
|
|
cell->volumes = RB_ROOT;
|
|
|
|
INIT_HLIST_HEAD(&cell->proc_volumes);
|
|
|
|
seqlock_init(&cell->volume_lock);
|
|
|
|
cell->fs_servers = RB_ROOT;
|
afs: Fix afs_server ref accounting
The current way that afs_server refs are accounted and cleaned up sometimes
cause rmmod to hang when it is waiting for cell records to be removed. The
problem is that the cell cleanup might occasionally happen before the
server cleanup and then there's nothing that causes the cell to
garbage-collect the remaining servers as they become inactive.
Partially fix this by:
(1) Give each afs_server record its own management timer that rather than
relying on the cell manager's central timer to drive each individual
cell's maintenance work item to garbage collect servers.
This timer is set when afs_unuse_server() reduces a server's activity
count to zero and will schedule the server's destroyer work item upon
firing.
(2) Give each afs_server record its own destroyer work item that removes
the record from the cell's database, shuts down the timer, cancels any
pending work for itself, sends an RPC to the server to cancel
outstanding callbacks.
This change, in combination with the timer, obviates the need to try
and coordinate so closely between the cell record and a bunch of other
server records to try and tear everything down in a coordinated
fashion. With this, the cell record is pinned until the server RCU is
complete and namespace/module removal will wait until all the cell
records are removed.
(3) Now that incoming calls are mapped to servers (and thus cells) using
data attached to an rxrpc_peer, the UUID-to-server mapping tree is
moved from the namespace to the cell (cell->fs_servers). This means
there can no longer be duplicates therein - and that allows the
mapping tree to be simpler as there doesn't need to be a chain of
same-UUID servers that are in different cells.
(4) The lock protecting the UUID mapping tree is switched to an
rw_semaphore on the cell rather than a seqlock on the namespace as
it's now only used during mounting in contexts in which we're allowed
to sleep.
(5) When it comes time for a cell that is being removed to purge its set
of servers, it just needs to iterate over them and wake them up. Once
a server becomes inactive, its destroyer work item will observe the
state of the cell and immediately remove that record.
(6) When a server record is removed, it is marked AFS_SERVER_FL_EXPIRED to
prevent reattempts at removal. The record will be dispatched to RCU
for destruction once its refcount reaches 0.
(7) The AFS_SERVER_FL_UNCREATED/CREATING flags are used to synchronise
simultaneous creation attempts. If one attempt fails, it will abandon
the attempt and allow another to try again.
Note that the record can't just be abandoned when dead as it's bound
into a server list attached to a volume and only subject to
replacement if the server list obtained for the volume from the VLDB
changes.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20250224234154.2014840-15-dhowells@redhat.com/ # v1
Link: https://lore.kernel.org/r/20250310094206.801057-11-dhowells@redhat.com/ # v4
2025-02-24 16:51:36 +00:00
|
|
|
init_rwsem(&cell->fs_lock);
|
2018-10-20 00:57:57 +01:00
|
|
|
rwlock_init(&cell->vl_servers_lock);
|
afs: Detect cell aliases 1 - Cells with root volumes
Put in the first phase of cell alias detection. This part handles alias
detection for cells that have root.cell volumes (which is expected to be
likely).
When a cell becomes newly active, it is probed for its root.cell volume,
and if it has one, this volume is compared against other root.cell volumes
to find out if the list of fileserver UUIDs have any in common - and if
that's the case, do the address lists of those fileservers have any
addresses in common. If they do, the new cell is adjudged to be an alias
of the old cell and the old cell is used instead.
Comparing is aided by the server list in struct afs_server_list being
sorted in UUID order and the addresses in the fileserver address lists
being sorted in address order.
The cell then retains the afs_volume object for the root.cell volume, even
if it's not mounted for future alias checking.
This necessary because:
(1) Whilst fileservers have UUIDs that are meant to be globally unique, in
practice they are not because cells get cloned without changing the
UUIDs - so afs_server records need to be per cell.
(2) Sometimes the DNS is used to make cell aliases - but if we don't know
they're the same, we may end up with multiple superblocks and multiple
afs_server records for the same thing, impairing our ability to
deliver callback notifications of third party changes
(3) The fileserver RPC API doesn't contain the cell name, so it can't tell
us which cell it's notifying and can't see that a change made to to
one cell should notify the same client that's also accessed as the
other cell.
Reported-by: Jeffrey Altman <jaltman@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2020-04-25 10:26:02 +01:00
|
|
|
cell->flags = (1 << AFS_CELL_FL_CHECK_ALIAS);
|
2017-11-02 15:27:47 +00:00
|
|
|
|
2019-05-07 15:30:34 +01:00
|
|
|
/* Provide a VL server list, filling it in if we were given a list of
|
|
|
|
* addresses to use.
|
2017-11-02 15:27:50 +00:00
|
|
|
*/
|
2018-10-20 00:57:57 +01:00
|
|
|
if (addresses) {
|
|
|
|
vllist = afs_parse_text_addrs(net,
|
|
|
|
addresses, strlen(addresses), ':',
|
|
|
|
VL_SERVICE, AFS_VL_PORT);
|
|
|
|
if (IS_ERR(vllist)) {
|
|
|
|
ret = PTR_ERR(vllist);
|
2025-07-21 15:26:51 +01:00
|
|
|
vllist = NULL;
|
2017-11-02 15:27:50 +00:00
|
|
|
goto parse_failed;
|
|
|
|
}
|
2007-04-26 15:57:07 -07:00
|
|
|
|
2019-05-07 15:06:36 +01:00
|
|
|
vllist->source = DNS_RECORD_FROM_CONFIG;
|
|
|
|
vllist->status = DNS_LOOKUP_NOT_DONE;
|
2017-11-02 15:27:50 +00:00
|
|
|
cell->dns_expiry = TIME64_MAX;
|
2018-10-20 00:57:57 +01:00
|
|
|
} else {
|
2019-05-07 15:30:34 +01:00
|
|
|
ret = -ENOMEM;
|
|
|
|
vllist = afs_alloc_vlserver_list(0);
|
|
|
|
if (!vllist)
|
|
|
|
goto error;
|
2019-05-07 15:06:36 +01:00
|
|
|
vllist->source = DNS_RECORD_UNAVAILABLE;
|
|
|
|
vllist->status = DNS_LOOKUP_NOT_DONE;
|
2018-10-20 00:57:57 +01:00
|
|
|
cell->dns_expiry = ktime_get_real_seconds();
|
2007-04-26 15:57:07 -07:00
|
|
|
}
|
|
|
|
|
2019-05-07 15:30:34 +01:00
|
|
|
rcu_assign_pointer(cell->vl_servers, vllist);
|
|
|
|
|
2019-05-07 15:06:36 +01:00
|
|
|
cell->dns_source = vllist->source;
|
|
|
|
cell->dns_status = vllist->status;
|
|
|
|
smp_store_release(&cell->dns_lookup_count, 1); /* vs source/status */
|
2019-07-23 11:24:59 +01:00
|
|
|
atomic_inc(&net->cells_outstanding);
|
2025-02-24 09:52:58 +00:00
|
|
|
ret = idr_alloc_cyclic(&net->cells_dyn_ino, cell,
|
|
|
|
2, INT_MAX / 2, GFP_KERNEL);
|
|
|
|
if (ret < 0)
|
|
|
|
goto error;
|
|
|
|
cell->dynroot_ino = ret;
|
2020-10-13 20:51:59 +01:00
|
|
|
cell->debug_id = atomic_inc_return(&cell_debug_id);
|
2025-02-24 09:52:58 +00:00
|
|
|
|
2020-10-13 20:51:59 +01:00
|
|
|
trace_afs_cell(cell->debug_id, 1, 0, afs_cell_trace_alloc);
|
2019-05-07 15:06:36 +01:00
|
|
|
|
2007-04-26 15:57:07 -07:00
|
|
|
_leave(" = %p", cell);
|
|
|
|
return cell;
|
|
|
|
|
2017-11-02 15:27:50 +00:00
|
|
|
parse_failed:
|
|
|
|
if (ret == -EINVAL)
|
|
|
|
printk(KERN_ERR "kAFS: bad VL server IP address\n");
|
2019-05-07 15:30:34 +01:00
|
|
|
error:
|
afs: Simplify cell record handling
Simplify afs_cell record handling to avoid very occasional races that cause
module removal to hang (it waits for all cell records to be removed).
There are two things that particularly contribute to the difficulty:
firstly, the code tries to pass a ref on the cell to the cell's maintenance
work item (which gets awkward if the work item is already queued); and,
secondly, there's an overall cell manager that tries to use just one timer
for the entire cell collection (to avoid having loads of timers). However,
both of these are probably unnecessarily restrictive.
To simplify this, the following changes are made:
(1) The cell record collection manager is removed. Each cell record
manages itself individually.
(2) Each afs_cell is given a second work item (cell->destroyer) that is
queued when its refcount reaches zero. This is not done in the
context of the putting thread as it might be in an inconvenient place
to sleep.
(3) Each afs_cell is given its own timer. The timer is used to expire the
cell record after a period of unuse if not otherwise pinned and can
also be used for other maintenance tasks if necessary (of which there
are currently none as DNS refresh is triggered by filesystem
operations).
(4) The afs_cell manager work item (cell->manager) is no longer given a
ref on the cell when queued; rather, the manager must be deleted.
This does away with the need to deal with the consequences of losing a
race to queue cell->manager. Clean up of extra queuing is deferred to
the destroyer.
(5) The cell destroyer work item makes sure the cell timer is removed and
that the normal cell work is cancelled before farming the actual
destruction off to RCU.
(6) When a network namespace is destroyed or the kafs module is unloaded,
it's now a simple matter of marking the namespace as dead then just
waking up all the cell work items. They will then remove and destroy
themselves once all remaining activity counts and/or a ref counts are
dropped. This makes sure that all server records are dropped first.
(7) The cell record state set is reduced to just four states: SETTING_UP,
ACTIVE, REMOVING and DEAD. The record persists in the active state
even when it's not being used until the time comes to remove it rather
than downgrading it to an inactive state from whence it can be
restored.
This means that the cell still appears in /proc and /afs when not in
use until it switches to the REMOVING state - at which point it is
removed.
Note that the REMOVING state is included so that someone wanting to
resurrect the cell record is forced to wait whilst the cell is torn
down in that state. Once it's in the DEAD state, it has been removed
from net->cells tree and is no longer findable and can be replaced.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20250224234154.2014840-16-dhowells@redhat.com/ # v1
Link: https://lore.kernel.org/r/20250310094206.801057-12-dhowells@redhat.com/ # v4
2025-02-24 16:06:03 +00:00
|
|
|
afs_put_vlserverlist(cell->net, vllist);
|
2025-01-07 18:34:49 +00:00
|
|
|
kfree(cell->name - 1);
|
2007-04-26 15:57:07 -07:00
|
|
|
kfree(cell);
|
|
|
|
_leave(" = %d", ret);
|
|
|
|
return ERR_PTR(ret);
|
|
|
|
}
|
2005-04-16 15:20:36 -07:00
|
|
|
|
2007-04-26 15:57:07 -07:00
|
|
|
/*
|
2017-11-02 15:27:50 +00:00
|
|
|
* afs_lookup_cell - Look up or create a cell record.
|
2017-11-02 15:27:45 +00:00
|
|
|
* @net: The network namespace
|
2017-11-02 15:27:50 +00:00
|
|
|
* @name: The name of the cell.
|
|
|
|
* @namesz: The strlen of the cell name.
|
|
|
|
* @vllist: A colon/comma separated list of numeric IP addresses or NULL.
|
|
|
|
* @excl: T if an error should be given if the cell name already exists.
|
2025-02-24 10:37:56 +00:00
|
|
|
* @trace: The reason to be logged if the lookup is successful.
|
2017-11-02 15:27:50 +00:00
|
|
|
*
|
|
|
|
* Look up a cell record by name and query the DNS for VL server addresses if
|
|
|
|
* needed. Note that that actual DNS query is punted off to the manager thread
|
|
|
|
* so that this function can return immediately if interrupted whilst allowing
|
|
|
|
* cell records to be shared even if not yet fully constructed.
|
2007-04-26 15:57:07 -07:00
|
|
|
*/
|
2017-11-02 15:27:50 +00:00
|
|
|
struct afs_cell *afs_lookup_cell(struct afs_net *net,
|
|
|
|
const char *name, unsigned int namesz,
|
2025-02-24 10:37:56 +00:00
|
|
|
const char *vllist, bool excl,
|
|
|
|
enum afs_cell_trace trace)
|
2007-04-26 15:57:07 -07:00
|
|
|
{
|
2017-11-02 15:27:50 +00:00
|
|
|
struct afs_cell *cell, *candidate, *cursor;
|
|
|
|
struct rb_node *parent, **pp;
|
2019-05-07 15:06:36 +01:00
|
|
|
enum afs_cell_state state;
|
2017-11-02 15:27:50 +00:00
|
|
|
int ret, n;
|
|
|
|
|
|
|
|
_enter("%s,%s", name, vllist);
|
|
|
|
|
|
|
|
if (!excl) {
|
2025-02-24 10:37:56 +00:00
|
|
|
cell = afs_find_cell(net, name, namesz, trace);
|
2017-11-17 16:40:32 -06:00
|
|
|
if (!IS_ERR(cell))
|
2017-11-02 15:27:50 +00:00
|
|
|
goto wait_for_cell;
|
|
|
|
}
|
2007-04-26 15:57:07 -07:00
|
|
|
|
2017-11-02 15:27:50 +00:00
|
|
|
/* Assume we're probably going to create a cell and preallocate and
|
|
|
|
* mostly set up a candidate record. We can then use this to stash the
|
|
|
|
* name, the net namespace and VL server addresses.
|
|
|
|
*
|
|
|
|
* We also want to do this before we hold any locks as it may involve
|
|
|
|
* upcalling to userspace to make DNS queries.
|
|
|
|
*/
|
|
|
|
candidate = afs_alloc_cell(net, name, namesz, vllist);
|
|
|
|
if (IS_ERR(candidate)) {
|
|
|
|
_leave(" = %ld", PTR_ERR(candidate));
|
|
|
|
return candidate;
|
2008-03-28 14:15:55 -07:00
|
|
|
}
|
|
|
|
|
2017-11-02 15:27:50 +00:00
|
|
|
/* Find the insertion point and check to see if someone else added a
|
|
|
|
* cell whilst we were allocating.
|
|
|
|
*/
|
afs: Fix rapid cell addition/removal by not using RCU on cells tree
There are a number of problems that are being seen by the rapidly mounting
and unmounting an afs dynamic root with an explicit cell and volume
specified (which should probably be rejected, but that's a separate issue):
What the tests are doing is to look up/create a cell record for the name
given and then tear it down again without actually using it to try to talk
to a server. This is repeated endlessly, very fast, and the new cell
collides with the old one if it's not quick enough to reuse it.
It appears (as suggested by Hillf Danton) that the search through the RB
tree under a read_seqbegin_or_lock() under RCU conditions isn't safe and
that it's not blocking the write_seqlock(), despite taking two passes at
it. He suggested that the code should take a ref on the cell it's
attempting to look at - but this shouldn't be necessary until we've
compared the cell names. It's possible that I'm missing a barrier
somewhere.
However, using an RCU search for this is overkill, really - we only need to
access the cell name in a few places, and they're places where we're may
end up sleeping anyway.
Fix this by switching to an R/W semaphore instead.
Additionally, draw the down_read() call inside the function (renamed to
afs_find_cell()) since all the callers were taking the RCU read lock (or
should've been[*]).
[*] afs_probe_cell_name() should have been, but that doesn't appear to be
involved in the bug reports.
The symptoms of this look like:
general protection fault, probably for non-canonical address 0xf27d208691691fdb: 0000 [#1] PREEMPT SMP KASAN
KASAN: maybe wild-memory-access in range [0x93e924348b48fed8-0x93e924348b48fedf]
...
RIP: 0010:strncasecmp lib/string.c:52 [inline]
RIP: 0010:strncasecmp+0x5f/0x240 lib/string.c:43
afs_lookup_cell_rcu+0x313/0x720 fs/afs/cell.c:88
afs_lookup_cell+0x2ee/0x1440 fs/afs/cell.c:249
afs_parse_source fs/afs/super.c:290 [inline]
...
Fixes: 989782dcdc91 ("afs: Overhaul cell database management")
Reported-by: syzbot+459a5dce0b4cb70fd076@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Hillf Danton <hdanton@sina.com>
cc: syzkaller-bugs@googlegroups.com
2020-10-09 14:11:58 +01:00
|
|
|
down_write(&net->cells_lock);
|
2017-11-02 15:27:50 +00:00
|
|
|
|
|
|
|
pp = &net->cells.rb_node;
|
|
|
|
parent = NULL;
|
|
|
|
while (*pp) {
|
|
|
|
parent = *pp;
|
|
|
|
cursor = rb_entry(parent, struct afs_cell, net_node);
|
|
|
|
|
|
|
|
n = strncasecmp(cursor->name, name,
|
|
|
|
min_t(size_t, cursor->name_len, namesz));
|
|
|
|
if (n == 0)
|
|
|
|
n = cursor->name_len - namesz;
|
|
|
|
if (n < 0)
|
|
|
|
pp = &(*pp)->rb_left;
|
|
|
|
else if (n > 0)
|
|
|
|
pp = &(*pp)->rb_right;
|
|
|
|
else
|
|
|
|
goto cell_already_exists;
|
2007-04-26 15:57:07 -07:00
|
|
|
}
|
|
|
|
|
2017-11-02 15:27:50 +00:00
|
|
|
cell = candidate;
|
|
|
|
candidate = NULL;
|
afs: Simplify cell record handling
Simplify afs_cell record handling to avoid very occasional races that cause
module removal to hang (it waits for all cell records to be removed).
There are two things that particularly contribute to the difficulty:
firstly, the code tries to pass a ref on the cell to the cell's maintenance
work item (which gets awkward if the work item is already queued); and,
secondly, there's an overall cell manager that tries to use just one timer
for the entire cell collection (to avoid having loads of timers). However,
both of these are probably unnecessarily restrictive.
To simplify this, the following changes are made:
(1) The cell record collection manager is removed. Each cell record
manages itself individually.
(2) Each afs_cell is given a second work item (cell->destroyer) that is
queued when its refcount reaches zero. This is not done in the
context of the putting thread as it might be in an inconvenient place
to sleep.
(3) Each afs_cell is given its own timer. The timer is used to expire the
cell record after a period of unuse if not otherwise pinned and can
also be used for other maintenance tasks if necessary (of which there
are currently none as DNS refresh is triggered by filesystem
operations).
(4) The afs_cell manager work item (cell->manager) is no longer given a
ref on the cell when queued; rather, the manager must be deleted.
This does away with the need to deal with the consequences of losing a
race to queue cell->manager. Clean up of extra queuing is deferred to
the destroyer.
(5) The cell destroyer work item makes sure the cell timer is removed and
that the normal cell work is cancelled before farming the actual
destruction off to RCU.
(6) When a network namespace is destroyed or the kafs module is unloaded,
it's now a simple matter of marking the namespace as dead then just
waking up all the cell work items. They will then remove and destroy
themselves once all remaining activity counts and/or a ref counts are
dropped. This makes sure that all server records are dropped first.
(7) The cell record state set is reduced to just four states: SETTING_UP,
ACTIVE, REMOVING and DEAD. The record persists in the active state
even when it's not being used until the time comes to remove it rather
than downgrading it to an inactive state from whence it can be
restored.
This means that the cell still appears in /proc and /afs when not in
use until it switches to the REMOVING state - at which point it is
removed.
Note that the REMOVING state is included so that someone wanting to
resurrect the cell record is forced to wait whilst the cell is torn
down in that state. Once it's in the DEAD state, it has been removed
from net->cells tree and is no longer findable and can be replaced.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20250224234154.2014840-16-dhowells@redhat.com/ # v1
Link: https://lore.kernel.org/r/20250310094206.801057-12-dhowells@redhat.com/ # v4
2025-02-24 16:06:03 +00:00
|
|
|
afs_use_cell(cell, trace);
|
2017-11-02 15:27:50 +00:00
|
|
|
rb_link_node_rcu(&cell->net_node, parent, pp);
|
|
|
|
rb_insert_color(&cell->net_node, &net->cells);
|
afs: Fix rapid cell addition/removal by not using RCU on cells tree
There are a number of problems that are being seen by the rapidly mounting
and unmounting an afs dynamic root with an explicit cell and volume
specified (which should probably be rejected, but that's a separate issue):
What the tests are doing is to look up/create a cell record for the name
given and then tear it down again without actually using it to try to talk
to a server. This is repeated endlessly, very fast, and the new cell
collides with the old one if it's not quick enough to reuse it.
It appears (as suggested by Hillf Danton) that the search through the RB
tree under a read_seqbegin_or_lock() under RCU conditions isn't safe and
that it's not blocking the write_seqlock(), despite taking two passes at
it. He suggested that the code should take a ref on the cell it's
attempting to look at - but this shouldn't be necessary until we've
compared the cell names. It's possible that I'm missing a barrier
somewhere.
However, using an RCU search for this is overkill, really - we only need to
access the cell name in a few places, and they're places where we're may
end up sleeping anyway.
Fix this by switching to an R/W semaphore instead.
Additionally, draw the down_read() call inside the function (renamed to
afs_find_cell()) since all the callers were taking the RCU read lock (or
should've been[*]).
[*] afs_probe_cell_name() should have been, but that doesn't appear to be
involved in the bug reports.
The symptoms of this look like:
general protection fault, probably for non-canonical address 0xf27d208691691fdb: 0000 [#1] PREEMPT SMP KASAN
KASAN: maybe wild-memory-access in range [0x93e924348b48fed8-0x93e924348b48fedf]
...
RIP: 0010:strncasecmp lib/string.c:52 [inline]
RIP: 0010:strncasecmp+0x5f/0x240 lib/string.c:43
afs_lookup_cell_rcu+0x313/0x720 fs/afs/cell.c:88
afs_lookup_cell+0x2ee/0x1440 fs/afs/cell.c:249
afs_parse_source fs/afs/super.c:290 [inline]
...
Fixes: 989782dcdc91 ("afs: Overhaul cell database management")
Reported-by: syzbot+459a5dce0b4cb70fd076@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Hillf Danton <hdanton@sina.com>
cc: syzkaller-bugs@googlegroups.com
2020-10-09 14:11:58 +01:00
|
|
|
up_write(&net->cells_lock);
|
2005-04-16 15:20:36 -07:00
|
|
|
|
afs: Simplify cell record handling
Simplify afs_cell record handling to avoid very occasional races that cause
module removal to hang (it waits for all cell records to be removed).
There are two things that particularly contribute to the difficulty:
firstly, the code tries to pass a ref on the cell to the cell's maintenance
work item (which gets awkward if the work item is already queued); and,
secondly, there's an overall cell manager that tries to use just one timer
for the entire cell collection (to avoid having loads of timers). However,
both of these are probably unnecessarily restrictive.
To simplify this, the following changes are made:
(1) The cell record collection manager is removed. Each cell record
manages itself individually.
(2) Each afs_cell is given a second work item (cell->destroyer) that is
queued when its refcount reaches zero. This is not done in the
context of the putting thread as it might be in an inconvenient place
to sleep.
(3) Each afs_cell is given its own timer. The timer is used to expire the
cell record after a period of unuse if not otherwise pinned and can
also be used for other maintenance tasks if necessary (of which there
are currently none as DNS refresh is triggered by filesystem
operations).
(4) The afs_cell manager work item (cell->manager) is no longer given a
ref on the cell when queued; rather, the manager must be deleted.
This does away with the need to deal with the consequences of losing a
race to queue cell->manager. Clean up of extra queuing is deferred to
the destroyer.
(5) The cell destroyer work item makes sure the cell timer is removed and
that the normal cell work is cancelled before farming the actual
destruction off to RCU.
(6) When a network namespace is destroyed or the kafs module is unloaded,
it's now a simple matter of marking the namespace as dead then just
waking up all the cell work items. They will then remove and destroy
themselves once all remaining activity counts and/or a ref counts are
dropped. This makes sure that all server records are dropped first.
(7) The cell record state set is reduced to just four states: SETTING_UP,
ACTIVE, REMOVING and DEAD. The record persists in the active state
even when it's not being used until the time comes to remove it rather
than downgrading it to an inactive state from whence it can be
restored.
This means that the cell still appears in /proc and /afs when not in
use until it switches to the REMOVING state - at which point it is
removed.
Note that the REMOVING state is included so that someone wanting to
resurrect the cell record is forced to wait whilst the cell is torn
down in that state. Once it's in the DEAD state, it has been removed
from net->cells tree and is no longer findable and can be replaced.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20250224234154.2014840-16-dhowells@redhat.com/ # v1
Link: https://lore.kernel.org/r/20250310094206.801057-12-dhowells@redhat.com/ # v4
2025-02-24 16:06:03 +00:00
|
|
|
afs_queue_cell(cell, afs_cell_trace_queue_new);
|
2005-04-16 15:20:36 -07:00
|
|
|
|
2017-11-02 15:27:50 +00:00
|
|
|
wait_for_cell:
|
|
|
|
_debug("wait_for_cell");
|
afs: Simplify cell record handling
Simplify afs_cell record handling to avoid very occasional races that cause
module removal to hang (it waits for all cell records to be removed).
There are two things that particularly contribute to the difficulty:
firstly, the code tries to pass a ref on the cell to the cell's maintenance
work item (which gets awkward if the work item is already queued); and,
secondly, there's an overall cell manager that tries to use just one timer
for the entire cell collection (to avoid having loads of timers). However,
both of these are probably unnecessarily restrictive.
To simplify this, the following changes are made:
(1) The cell record collection manager is removed. Each cell record
manages itself individually.
(2) Each afs_cell is given a second work item (cell->destroyer) that is
queued when its refcount reaches zero. This is not done in the
context of the putting thread as it might be in an inconvenient place
to sleep.
(3) Each afs_cell is given its own timer. The timer is used to expire the
cell record after a period of unuse if not otherwise pinned and can
also be used for other maintenance tasks if necessary (of which there
are currently none as DNS refresh is triggered by filesystem
operations).
(4) The afs_cell manager work item (cell->manager) is no longer given a
ref on the cell when queued; rather, the manager must be deleted.
This does away with the need to deal with the consequences of losing a
race to queue cell->manager. Clean up of extra queuing is deferred to
the destroyer.
(5) The cell destroyer work item makes sure the cell timer is removed and
that the normal cell work is cancelled before farming the actual
destruction off to RCU.
(6) When a network namespace is destroyed or the kafs module is unloaded,
it's now a simple matter of marking the namespace as dead then just
waking up all the cell work items. They will then remove and destroy
themselves once all remaining activity counts and/or a ref counts are
dropped. This makes sure that all server records are dropped first.
(7) The cell record state set is reduced to just four states: SETTING_UP,
ACTIVE, REMOVING and DEAD. The record persists in the active state
even when it's not being used until the time comes to remove it rather
than downgrading it to an inactive state from whence it can be
restored.
This means that the cell still appears in /proc and /afs when not in
use until it switches to the REMOVING state - at which point it is
removed.
Note that the REMOVING state is included so that someone wanting to
resurrect the cell record is forced to wait whilst the cell is torn
down in that state. Once it's in the DEAD state, it has been removed
from net->cells tree and is no longer findable and can be replaced.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20250224234154.2014840-16-dhowells@redhat.com/ # v1
Link: https://lore.kernel.org/r/20250310094206.801057-12-dhowells@redhat.com/ # v4
2025-02-24 16:06:03 +00:00
|
|
|
state = smp_load_acquire(&cell->state); /* vs error */
|
|
|
|
if (state != AFS_CELL_ACTIVE &&
|
|
|
|
state != AFS_CELL_DEAD) {
|
|
|
|
afs_see_cell(cell, afs_cell_trace_wait);
|
|
|
|
wait_var_event(&cell->state,
|
|
|
|
({
|
|
|
|
state = smp_load_acquire(&cell->state); /* vs error */
|
|
|
|
state == AFS_CELL_ACTIVE || state == AFS_CELL_DEAD;
|
|
|
|
}));
|
|
|
|
}
|
2019-05-07 15:06:36 +01:00
|
|
|
|
|
|
|
/* Check the state obtained from the wait check. */
|
afs: Simplify cell record handling
Simplify afs_cell record handling to avoid very occasional races that cause
module removal to hang (it waits for all cell records to be removed).
There are two things that particularly contribute to the difficulty:
firstly, the code tries to pass a ref on the cell to the cell's maintenance
work item (which gets awkward if the work item is already queued); and,
secondly, there's an overall cell manager that tries to use just one timer
for the entire cell collection (to avoid having loads of timers). However,
both of these are probably unnecessarily restrictive.
To simplify this, the following changes are made:
(1) The cell record collection manager is removed. Each cell record
manages itself individually.
(2) Each afs_cell is given a second work item (cell->destroyer) that is
queued when its refcount reaches zero. This is not done in the
context of the putting thread as it might be in an inconvenient place
to sleep.
(3) Each afs_cell is given its own timer. The timer is used to expire the
cell record after a period of unuse if not otherwise pinned and can
also be used for other maintenance tasks if necessary (of which there
are currently none as DNS refresh is triggered by filesystem
operations).
(4) The afs_cell manager work item (cell->manager) is no longer given a
ref on the cell when queued; rather, the manager must be deleted.
This does away with the need to deal with the consequences of losing a
race to queue cell->manager. Clean up of extra queuing is deferred to
the destroyer.
(5) The cell destroyer work item makes sure the cell timer is removed and
that the normal cell work is cancelled before farming the actual
destruction off to RCU.
(6) When a network namespace is destroyed or the kafs module is unloaded,
it's now a simple matter of marking the namespace as dead then just
waking up all the cell work items. They will then remove and destroy
themselves once all remaining activity counts and/or a ref counts are
dropped. This makes sure that all server records are dropped first.
(7) The cell record state set is reduced to just four states: SETTING_UP,
ACTIVE, REMOVING and DEAD. The record persists in the active state
even when it's not being used until the time comes to remove it rather
than downgrading it to an inactive state from whence it can be
restored.
This means that the cell still appears in /proc and /afs when not in
use until it switches to the REMOVING state - at which point it is
removed.
Note that the REMOVING state is included so that someone wanting to
resurrect the cell record is forced to wait whilst the cell is torn
down in that state. Once it's in the DEAD state, it has been removed
from net->cells tree and is no longer findable and can be replaced.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20250224234154.2014840-16-dhowells@redhat.com/ # v1
Link: https://lore.kernel.org/r/20250310094206.801057-12-dhowells@redhat.com/ # v4
2025-02-24 16:06:03 +00:00
|
|
|
if (state == AFS_CELL_DEAD) {
|
2017-11-02 15:27:50 +00:00
|
|
|
ret = cell->error;
|
|
|
|
goto error;
|
|
|
|
}
|
2005-04-16 15:20:36 -07:00
|
|
|
|
2017-11-02 15:27:50 +00:00
|
|
|
_leave(" = %p [cell]", cell);
|
2007-04-26 15:55:03 -07:00
|
|
|
return cell;
|
2005-04-16 15:20:36 -07:00
|
|
|
|
2017-11-02 15:27:50 +00:00
|
|
|
cell_already_exists:
|
|
|
|
_debug("cell exists");
|
|
|
|
cell = cursor;
|
|
|
|
if (excl) {
|
|
|
|
ret = -EEXIST;
|
|
|
|
} else {
|
2025-02-24 10:37:56 +00:00
|
|
|
afs_use_cell(cursor, trace);
|
2017-11-02 15:27:50 +00:00
|
|
|
ret = 0;
|
|
|
|
}
|
afs: Fix rapid cell addition/removal by not using RCU on cells tree
There are a number of problems that are being seen by the rapidly mounting
and unmounting an afs dynamic root with an explicit cell and volume
specified (which should probably be rejected, but that's a separate issue):
What the tests are doing is to look up/create a cell record for the name
given and then tear it down again without actually using it to try to talk
to a server. This is repeated endlessly, very fast, and the new cell
collides with the old one if it's not quick enough to reuse it.
It appears (as suggested by Hillf Danton) that the search through the RB
tree under a read_seqbegin_or_lock() under RCU conditions isn't safe and
that it's not blocking the write_seqlock(), despite taking two passes at
it. He suggested that the code should take a ref on the cell it's
attempting to look at - but this shouldn't be necessary until we've
compared the cell names. It's possible that I'm missing a barrier
somewhere.
However, using an RCU search for this is overkill, really - we only need to
access the cell name in a few places, and they're places where we're may
end up sleeping anyway.
Fix this by switching to an R/W semaphore instead.
Additionally, draw the down_read() call inside the function (renamed to
afs_find_cell()) since all the callers were taking the RCU read lock (or
should've been[*]).
[*] afs_probe_cell_name() should have been, but that doesn't appear to be
involved in the bug reports.
The symptoms of this look like:
general protection fault, probably for non-canonical address 0xf27d208691691fdb: 0000 [#1] PREEMPT SMP KASAN
KASAN: maybe wild-memory-access in range [0x93e924348b48fed8-0x93e924348b48fedf]
...
RIP: 0010:strncasecmp lib/string.c:52 [inline]
RIP: 0010:strncasecmp+0x5f/0x240 lib/string.c:43
afs_lookup_cell_rcu+0x313/0x720 fs/afs/cell.c:88
afs_lookup_cell+0x2ee/0x1440 fs/afs/cell.c:249
afs_parse_source fs/afs/super.c:290 [inline]
...
Fixes: 989782dcdc91 ("afs: Overhaul cell database management")
Reported-by: syzbot+459a5dce0b4cb70fd076@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Hillf Danton <hdanton@sina.com>
cc: syzkaller-bugs@googlegroups.com
2020-10-09 14:11:58 +01:00
|
|
|
up_write(&net->cells_lock);
|
2019-07-23 11:24:59 +01:00
|
|
|
if (candidate)
|
2020-10-13 20:51:59 +01:00
|
|
|
afs_put_cell(candidate, afs_cell_trace_put_candidate);
|
2017-11-02 15:27:50 +00:00
|
|
|
if (ret == 0)
|
|
|
|
goto wait_for_cell;
|
2017-11-02 15:27:50 +00:00
|
|
|
goto error_noput;
|
2007-04-26 15:49:28 -07:00
|
|
|
error:
|
2025-02-24 10:54:04 +00:00
|
|
|
afs_unuse_cell(cell, afs_cell_trace_unuse_lookup_error);
|
2017-11-02 15:27:50 +00:00
|
|
|
error_noput:
|
2017-11-02 15:27:50 +00:00
|
|
|
_leave(" = %d [error]", ret);
|
2007-04-26 15:55:03 -07:00
|
|
|
return ERR_PTR(ret);
|
2007-04-26 15:49:28 -07:00
|
|
|
}
|
2005-04-16 15:20:36 -07:00
|
|
|
|
|
|
|
/*
|
2007-04-26 15:55:03 -07:00
|
|
|
* set the root cell information
|
|
|
|
* - can be called with a module parameter string
|
|
|
|
* - can be called from a write to /proc/fs/afs/rootcell
|
2005-04-16 15:20:36 -07:00
|
|
|
*/
|
2017-11-02 15:27:50 +00:00
|
|
|
int afs_cell_init(struct afs_net *net, const char *rootcell)
|
2005-04-16 15:20:36 -07:00
|
|
|
{
|
|
|
|
struct afs_cell *old_root, *new_root;
|
2017-11-02 15:27:50 +00:00
|
|
|
const char *cp, *vllist;
|
|
|
|
size_t len;
|
2005-04-16 15:20:36 -07:00
|
|
|
|
|
|
|
_enter("");
|
|
|
|
|
|
|
|
if (!rootcell) {
|
|
|
|
/* module is loaded with no parameters, or built statically.
|
|
|
|
* - in the future we might initialize cell DB here.
|
|
|
|
*/
|
2007-04-26 15:55:03 -07:00
|
|
|
_leave(" = 0 [no root]");
|
2005-04-16 15:20:36 -07:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
cp = strchr(rootcell, ':');
|
2017-11-02 15:27:50 +00:00
|
|
|
if (!cp) {
|
2010-08-04 15:16:38 +01:00
|
|
|
_debug("kAFS: no VL server IP addresses specified");
|
2017-11-02 15:27:50 +00:00
|
|
|
vllist = NULL;
|
|
|
|
len = strlen(rootcell);
|
|
|
|
} else {
|
|
|
|
vllist = cp + 1;
|
|
|
|
len = cp - rootcell;
|
|
|
|
}
|
2005-04-16 15:20:36 -07:00
|
|
|
|
2025-01-07 18:34:50 +00:00
|
|
|
if (len == 0 || !rootcell[0] || rootcell[0] == '.' || rootcell[len - 1] == '.')
|
|
|
|
return -EINVAL;
|
|
|
|
if (memchr(rootcell, '/', len))
|
|
|
|
return -EINVAL;
|
|
|
|
cp = strstr(rootcell, "..");
|
|
|
|
if (cp && cp < rootcell + len)
|
|
|
|
return -EINVAL;
|
|
|
|
|
2025-02-24 10:37:56 +00:00
|
|
|
/* allocate a cell record for the root/workstation cell */
|
|
|
|
new_root = afs_lookup_cell(net, rootcell, len, vllist, false,
|
|
|
|
afs_cell_trace_use_lookup_ws);
|
2007-04-26 15:55:03 -07:00
|
|
|
if (IS_ERR(new_root)) {
|
|
|
|
_leave(" = %ld", PTR_ERR(new_root));
|
|
|
|
return PTR_ERR(new_root);
|
2005-04-16 15:20:36 -07:00
|
|
|
}
|
|
|
|
|
2018-04-09 21:12:31 +01:00
|
|
|
if (!test_and_set_bit(AFS_CELL_FL_NO_GC, &new_root->flags))
|
2020-10-13 20:51:59 +01:00
|
|
|
afs_use_cell(new_root, afs_cell_trace_use_pin);
|
2017-11-02 15:27:50 +00:00
|
|
|
|
2007-04-26 15:55:03 -07:00
|
|
|
/* install the new cell */
|
afs: Fix rapid cell addition/removal by not using RCU on cells tree
There are a number of problems that are being seen by the rapidly mounting
and unmounting an afs dynamic root with an explicit cell and volume
specified (which should probably be rejected, but that's a separate issue):
What the tests are doing is to look up/create a cell record for the name
given and then tear it down again without actually using it to try to talk
to a server. This is repeated endlessly, very fast, and the new cell
collides with the old one if it's not quick enough to reuse it.
It appears (as suggested by Hillf Danton) that the search through the RB
tree under a read_seqbegin_or_lock() under RCU conditions isn't safe and
that it's not blocking the write_seqlock(), despite taking two passes at
it. He suggested that the code should take a ref on the cell it's
attempting to look at - but this shouldn't be necessary until we've
compared the cell names. It's possible that I'm missing a barrier
somewhere.
However, using an RCU search for this is overkill, really - we only need to
access the cell name in a few places, and they're places where we're may
end up sleeping anyway.
Fix this by switching to an R/W semaphore instead.
Additionally, draw the down_read() call inside the function (renamed to
afs_find_cell()) since all the callers were taking the RCU read lock (or
should've been[*]).
[*] afs_probe_cell_name() should have been, but that doesn't appear to be
involved in the bug reports.
The symptoms of this look like:
general protection fault, probably for non-canonical address 0xf27d208691691fdb: 0000 [#1] PREEMPT SMP KASAN
KASAN: maybe wild-memory-access in range [0x93e924348b48fed8-0x93e924348b48fedf]
...
RIP: 0010:strncasecmp lib/string.c:52 [inline]
RIP: 0010:strncasecmp+0x5f/0x240 lib/string.c:43
afs_lookup_cell_rcu+0x313/0x720 fs/afs/cell.c:88
afs_lookup_cell+0x2ee/0x1440 fs/afs/cell.c:249
afs_parse_source fs/afs/super.c:290 [inline]
...
Fixes: 989782dcdc91 ("afs: Overhaul cell database management")
Reported-by: syzbot+459a5dce0b4cb70fd076@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Hillf Danton <hdanton@sina.com>
cc: syzkaller-bugs@googlegroups.com
2020-10-09 14:11:58 +01:00
|
|
|
down_write(&net->cells_lock);
|
2025-03-06 08:46:57 +00:00
|
|
|
old_root = rcu_replace_pointer(net->ws_cell, new_root,
|
|
|
|
lockdep_is_held(&net->cells_lock));
|
afs: Fix rapid cell addition/removal by not using RCU on cells tree
There are a number of problems that are being seen by the rapidly mounting
and unmounting an afs dynamic root with an explicit cell and volume
specified (which should probably be rejected, but that's a separate issue):
What the tests are doing is to look up/create a cell record for the name
given and then tear it down again without actually using it to try to talk
to a server. This is repeated endlessly, very fast, and the new cell
collides with the old one if it's not quick enough to reuse it.
It appears (as suggested by Hillf Danton) that the search through the RB
tree under a read_seqbegin_or_lock() under RCU conditions isn't safe and
that it's not blocking the write_seqlock(), despite taking two passes at
it. He suggested that the code should take a ref on the cell it's
attempting to look at - but this shouldn't be necessary until we've
compared the cell names. It's possible that I'm missing a barrier
somewhere.
However, using an RCU search for this is overkill, really - we only need to
access the cell name in a few places, and they're places where we're may
end up sleeping anyway.
Fix this by switching to an R/W semaphore instead.
Additionally, draw the down_read() call inside the function (renamed to
afs_find_cell()) since all the callers were taking the RCU read lock (or
should've been[*]).
[*] afs_probe_cell_name() should have been, but that doesn't appear to be
involved in the bug reports.
The symptoms of this look like:
general protection fault, probably for non-canonical address 0xf27d208691691fdb: 0000 [#1] PREEMPT SMP KASAN
KASAN: maybe wild-memory-access in range [0x93e924348b48fed8-0x93e924348b48fedf]
...
RIP: 0010:strncasecmp lib/string.c:52 [inline]
RIP: 0010:strncasecmp+0x5f/0x240 lib/string.c:43
afs_lookup_cell_rcu+0x313/0x720 fs/afs/cell.c:88
afs_lookup_cell+0x2ee/0x1440 fs/afs/cell.c:249
afs_parse_source fs/afs/super.c:290 [inline]
...
Fixes: 989782dcdc91 ("afs: Overhaul cell database management")
Reported-by: syzbot+459a5dce0b4cb70fd076@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Hillf Danton <hdanton@sina.com>
cc: syzkaller-bugs@googlegroups.com
2020-10-09 14:11:58 +01:00
|
|
|
up_write(&net->cells_lock);
|
2005-04-16 15:20:36 -07:00
|
|
|
|
2025-02-24 10:54:04 +00:00
|
|
|
afs_unuse_cell(old_root, afs_cell_trace_unuse_ws);
|
2007-04-26 15:55:03 -07:00
|
|
|
_leave(" = 0");
|
|
|
|
return 0;
|
2007-04-26 15:49:28 -07:00
|
|
|
}
|
2005-04-16 15:20:36 -07:00
|
|
|
|
|
|
|
/*
|
2017-11-02 15:27:50 +00:00
|
|
|
* Update a cell's VL server address list from the DNS.
|
2005-04-16 15:20:36 -07:00
|
|
|
*/
|
2019-05-07 15:06:36 +01:00
|
|
|
static int afs_update_cell(struct afs_cell *cell)
|
2005-04-16 15:20:36 -07:00
|
|
|
{
|
2019-05-07 15:06:36 +01:00
|
|
|
struct afs_vlserver_list *vllist, *old = NULL, *p;
|
2018-10-20 00:57:57 +01:00
|
|
|
unsigned int min_ttl = READ_ONCE(afs_cell_min_ttl);
|
|
|
|
unsigned int max_ttl = READ_ONCE(afs_cell_max_ttl);
|
|
|
|
time64_t now, expiry = 0;
|
2019-05-07 15:06:36 +01:00
|
|
|
int ret = 0;
|
2005-04-16 15:20:36 -07:00
|
|
|
|
2017-11-02 15:27:50 +00:00
|
|
|
_enter("%s", cell->name);
|
|
|
|
|
2018-10-20 00:57:57 +01:00
|
|
|
vllist = afs_dns_query(cell, &expiry);
|
2019-05-07 15:06:36 +01:00
|
|
|
if (IS_ERR(vllist)) {
|
|
|
|
ret = PTR_ERR(vllist);
|
|
|
|
|
|
|
|
_debug("%s: fail %d", cell->name, ret);
|
|
|
|
if (ret == -ENOMEM)
|
|
|
|
goto out_wake;
|
|
|
|
|
|
|
|
vllist = afs_alloc_vlserver_list(0);
|
2023-12-21 15:09:31 +00:00
|
|
|
if (!vllist) {
|
|
|
|
if (ret >= 0)
|
|
|
|
ret = -ENOMEM;
|
2019-05-07 15:06:36 +01:00
|
|
|
goto out_wake;
|
2023-12-21 15:09:31 +00:00
|
|
|
}
|
2019-05-07 15:06:36 +01:00
|
|
|
|
|
|
|
switch (ret) {
|
|
|
|
case -ENODATA:
|
|
|
|
case -EDESTADDRREQ:
|
|
|
|
vllist->status = DNS_LOOKUP_GOT_NOT_FOUND;
|
|
|
|
break;
|
|
|
|
case -EAGAIN:
|
|
|
|
case -ECONNREFUSED:
|
|
|
|
vllist->status = DNS_LOOKUP_GOT_TEMP_FAILURE;
|
|
|
|
break;
|
|
|
|
default:
|
|
|
|
vllist->status = DNS_LOOKUP_GOT_LOCAL_FAILURE;
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
_debug("%s: got list %d %d", cell->name, vllist->source, vllist->status);
|
|
|
|
cell->dns_status = vllist->status;
|
2018-10-20 00:57:57 +01:00
|
|
|
|
|
|
|
now = ktime_get_real_seconds();
|
|
|
|
if (min_ttl > max_ttl)
|
|
|
|
max_ttl = min_ttl;
|
|
|
|
if (expiry < now + min_ttl)
|
|
|
|
expiry = now + min_ttl;
|
|
|
|
else if (expiry > now + max_ttl)
|
|
|
|
expiry = now + max_ttl;
|
|
|
|
|
2019-05-07 15:06:36 +01:00
|
|
|
_debug("%s: status %d", cell->name, vllist->status);
|
|
|
|
if (vllist->source == DNS_RECORD_UNAVAILABLE) {
|
|
|
|
switch (vllist->status) {
|
|
|
|
case DNS_LOOKUP_GOT_NOT_FOUND:
|
2018-10-20 00:57:57 +01:00
|
|
|
/* The DNS said that the cell does not exist or there
|
|
|
|
* weren't any addresses to be had.
|
|
|
|
*/
|
|
|
|
cell->dns_expiry = expiry;
|
2017-11-02 15:27:50 +00:00
|
|
|
break;
|
2017-11-02 15:27:50 +00:00
|
|
|
|
2019-05-07 15:06:36 +01:00
|
|
|
case DNS_LOOKUP_BAD:
|
|
|
|
case DNS_LOOKUP_GOT_LOCAL_FAILURE:
|
|
|
|
case DNS_LOOKUP_GOT_TEMP_FAILURE:
|
|
|
|
case DNS_LOOKUP_GOT_NS_FAILURE:
|
2017-11-02 15:27:50 +00:00
|
|
|
default:
|
2018-10-20 00:57:57 +01:00
|
|
|
cell->dns_expiry = now + 10;
|
2017-11-02 15:27:50 +00:00
|
|
|
break;
|
|
|
|
}
|
|
|
|
} else {
|
|
|
|
cell->dns_expiry = expiry;
|
|
|
|
}
|
2010-08-11 09:38:04 +01:00
|
|
|
|
2019-05-07 15:06:36 +01:00
|
|
|
/* Replace the VL server list if the new record has servers or the old
|
|
|
|
* record doesn't.
|
|
|
|
*/
|
|
|
|
write_lock(&cell->vl_servers_lock);
|
|
|
|
p = rcu_dereference_protected(cell->vl_servers, true);
|
|
|
|
if (vllist->nr_servers > 0 || p->nr_servers == 0) {
|
|
|
|
rcu_assign_pointer(cell->vl_servers, vllist);
|
|
|
|
cell->dns_source = vllist->source;
|
|
|
|
old = p;
|
|
|
|
}
|
|
|
|
write_unlock(&cell->vl_servers_lock);
|
|
|
|
afs_put_vlserverlist(cell->net, old);
|
2010-08-11 09:38:04 +01:00
|
|
|
|
2019-05-07 15:06:36 +01:00
|
|
|
out_wake:
|
|
|
|
smp_store_release(&cell->dns_lookup_count,
|
|
|
|
cell->dns_lookup_count + 1); /* vs source/status */
|
|
|
|
wake_up_var(&cell->dns_lookup_count);
|
|
|
|
_leave(" = %d", ret);
|
|
|
|
return ret;
|
2007-04-26 15:49:28 -07:00
|
|
|
}
|
2005-04-16 15:20:36 -07:00
|
|
|
|
|
|
|
/*
|
2017-11-02 15:27:50 +00:00
|
|
|
* Destroy a cell record
|
2005-04-16 15:20:36 -07:00
|
|
|
*/
|
2017-11-02 15:27:50 +00:00
|
|
|
static void afs_cell_destroy(struct rcu_head *rcu)
|
2005-04-16 15:20:36 -07:00
|
|
|
{
|
2017-11-02 15:27:50 +00:00
|
|
|
struct afs_cell *cell = container_of(rcu, struct afs_cell, rcu);
|
2019-07-23 11:24:59 +01:00
|
|
|
struct afs_net *net = cell->net;
|
2022-07-06 10:52:14 +01:00
|
|
|
int r;
|
2005-04-16 15:20:36 -07:00
|
|
|
|
2017-11-02 15:27:50 +00:00
|
|
|
_enter("%p{%s}", cell, cell->name);
|
2005-04-16 15:20:36 -07:00
|
|
|
|
2022-07-06 10:52:14 +01:00
|
|
|
r = refcount_read(&cell->ref);
|
|
|
|
ASSERTCMP(r, ==, 0);
|
|
|
|
trace_afs_cell(cell->debug_id, r, atomic_read(&cell->active), afs_cell_trace_free);
|
2017-11-02 15:27:50 +00:00
|
|
|
|
2019-07-23 11:24:59 +01:00
|
|
|
afs_put_vlserverlist(net, rcu_access_pointer(cell->vl_servers));
|
2025-02-24 10:54:04 +00:00
|
|
|
afs_unuse_cell(cell->alias_of, afs_cell_trace_unuse_alias);
|
2017-11-02 15:27:50 +00:00
|
|
|
key_put(cell->anonymous_key);
|
2025-02-24 09:52:58 +00:00
|
|
|
idr_remove(&net->cells_dyn_ino, cell->dynroot_ino);
|
2025-01-07 18:34:49 +00:00
|
|
|
kfree(cell->name - 1);
|
2017-11-02 15:27:50 +00:00
|
|
|
kfree(cell);
|
|
|
|
|
2019-07-23 11:24:59 +01:00
|
|
|
afs_dec_cells_outstanding(net);
|
2017-11-02 15:27:50 +00:00
|
|
|
_leave(" [destroyed]");
|
2007-04-26 15:49:28 -07:00
|
|
|
}
|
2005-04-16 15:20:36 -07:00
|
|
|
|
afs: Simplify cell record handling
Simplify afs_cell record handling to avoid very occasional races that cause
module removal to hang (it waits for all cell records to be removed).
There are two things that particularly contribute to the difficulty:
firstly, the code tries to pass a ref on the cell to the cell's maintenance
work item (which gets awkward if the work item is already queued); and,
secondly, there's an overall cell manager that tries to use just one timer
for the entire cell collection (to avoid having loads of timers). However,
both of these are probably unnecessarily restrictive.
To simplify this, the following changes are made:
(1) The cell record collection manager is removed. Each cell record
manages itself individually.
(2) Each afs_cell is given a second work item (cell->destroyer) that is
queued when its refcount reaches zero. This is not done in the
context of the putting thread as it might be in an inconvenient place
to sleep.
(3) Each afs_cell is given its own timer. The timer is used to expire the
cell record after a period of unuse if not otherwise pinned and can
also be used for other maintenance tasks if necessary (of which there
are currently none as DNS refresh is triggered by filesystem
operations).
(4) The afs_cell manager work item (cell->manager) is no longer given a
ref on the cell when queued; rather, the manager must be deleted.
This does away with the need to deal with the consequences of losing a
race to queue cell->manager. Clean up of extra queuing is deferred to
the destroyer.
(5) The cell destroyer work item makes sure the cell timer is removed and
that the normal cell work is cancelled before farming the actual
destruction off to RCU.
(6) When a network namespace is destroyed or the kafs module is unloaded,
it's now a simple matter of marking the namespace as dead then just
waking up all the cell work items. They will then remove and destroy
themselves once all remaining activity counts and/or a ref counts are
dropped. This makes sure that all server records are dropped first.
(7) The cell record state set is reduced to just four states: SETTING_UP,
ACTIVE, REMOVING and DEAD. The record persists in the active state
even when it's not being used until the time comes to remove it rather
than downgrading it to an inactive state from whence it can be
restored.
This means that the cell still appears in /proc and /afs when not in
use until it switches to the REMOVING state - at which point it is
removed.
Note that the REMOVING state is included so that someone wanting to
resurrect the cell record is forced to wait whilst the cell is torn
down in that state. Once it's in the DEAD state, it has been removed
from net->cells tree and is no longer findable and can be replaced.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20250224234154.2014840-16-dhowells@redhat.com/ # v1
Link: https://lore.kernel.org/r/20250310094206.801057-12-dhowells@redhat.com/ # v4
2025-02-24 16:06:03 +00:00
|
|
|
static void afs_destroy_cell_work(struct work_struct *work)
|
2017-11-02 15:27:50 +00:00
|
|
|
{
|
afs: Simplify cell record handling
Simplify afs_cell record handling to avoid very occasional races that cause
module removal to hang (it waits for all cell records to be removed).
There are two things that particularly contribute to the difficulty:
firstly, the code tries to pass a ref on the cell to the cell's maintenance
work item (which gets awkward if the work item is already queued); and,
secondly, there's an overall cell manager that tries to use just one timer
for the entire cell collection (to avoid having loads of timers). However,
both of these are probably unnecessarily restrictive.
To simplify this, the following changes are made:
(1) The cell record collection manager is removed. Each cell record
manages itself individually.
(2) Each afs_cell is given a second work item (cell->destroyer) that is
queued when its refcount reaches zero. This is not done in the
context of the putting thread as it might be in an inconvenient place
to sleep.
(3) Each afs_cell is given its own timer. The timer is used to expire the
cell record after a period of unuse if not otherwise pinned and can
also be used for other maintenance tasks if necessary (of which there
are currently none as DNS refresh is triggered by filesystem
operations).
(4) The afs_cell manager work item (cell->manager) is no longer given a
ref on the cell when queued; rather, the manager must be deleted.
This does away with the need to deal with the consequences of losing a
race to queue cell->manager. Clean up of extra queuing is deferred to
the destroyer.
(5) The cell destroyer work item makes sure the cell timer is removed and
that the normal cell work is cancelled before farming the actual
destruction off to RCU.
(6) When a network namespace is destroyed or the kafs module is unloaded,
it's now a simple matter of marking the namespace as dead then just
waking up all the cell work items. They will then remove and destroy
themselves once all remaining activity counts and/or a ref counts are
dropped. This makes sure that all server records are dropped first.
(7) The cell record state set is reduced to just four states: SETTING_UP,
ACTIVE, REMOVING and DEAD. The record persists in the active state
even when it's not being used until the time comes to remove it rather
than downgrading it to an inactive state from whence it can be
restored.
This means that the cell still appears in /proc and /afs when not in
use until it switches to the REMOVING state - at which point it is
removed.
Note that the REMOVING state is included so that someone wanting to
resurrect the cell record is forced to wait whilst the cell is torn
down in that state. Once it's in the DEAD state, it has been removed
from net->cells tree and is no longer findable and can be replaced.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20250224234154.2014840-16-dhowells@redhat.com/ # v1
Link: https://lore.kernel.org/r/20250310094206.801057-12-dhowells@redhat.com/ # v4
2025-02-24 16:06:03 +00:00
|
|
|
struct afs_cell *cell = container_of(work, struct afs_cell, destroyer);
|
2017-11-02 15:27:50 +00:00
|
|
|
|
afs: Simplify cell record handling
Simplify afs_cell record handling to avoid very occasional races that cause
module removal to hang (it waits for all cell records to be removed).
There are two things that particularly contribute to the difficulty:
firstly, the code tries to pass a ref on the cell to the cell's maintenance
work item (which gets awkward if the work item is already queued); and,
secondly, there's an overall cell manager that tries to use just one timer
for the entire cell collection (to avoid having loads of timers). However,
both of these are probably unnecessarily restrictive.
To simplify this, the following changes are made:
(1) The cell record collection manager is removed. Each cell record
manages itself individually.
(2) Each afs_cell is given a second work item (cell->destroyer) that is
queued when its refcount reaches zero. This is not done in the
context of the putting thread as it might be in an inconvenient place
to sleep.
(3) Each afs_cell is given its own timer. The timer is used to expire the
cell record after a period of unuse if not otherwise pinned and can
also be used for other maintenance tasks if necessary (of which there
are currently none as DNS refresh is triggered by filesystem
operations).
(4) The afs_cell manager work item (cell->manager) is no longer given a
ref on the cell when queued; rather, the manager must be deleted.
This does away with the need to deal with the consequences of losing a
race to queue cell->manager. Clean up of extra queuing is deferred to
the destroyer.
(5) The cell destroyer work item makes sure the cell timer is removed and
that the normal cell work is cancelled before farming the actual
destruction off to RCU.
(6) When a network namespace is destroyed or the kafs module is unloaded,
it's now a simple matter of marking the namespace as dead then just
waking up all the cell work items. They will then remove and destroy
themselves once all remaining activity counts and/or a ref counts are
dropped. This makes sure that all server records are dropped first.
(7) The cell record state set is reduced to just four states: SETTING_UP,
ACTIVE, REMOVING and DEAD. The record persists in the active state
even when it's not being used until the time comes to remove it rather
than downgrading it to an inactive state from whence it can be
restored.
This means that the cell still appears in /proc and /afs when not in
use until it switches to the REMOVING state - at which point it is
removed.
Note that the REMOVING state is included so that someone wanting to
resurrect the cell record is forced to wait whilst the cell is torn
down in that state. Once it's in the DEAD state, it has been removed
from net->cells tree and is no longer findable and can be replaced.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20250224234154.2014840-16-dhowells@redhat.com/ # v1
Link: https://lore.kernel.org/r/20250310094206.801057-12-dhowells@redhat.com/ # v4
2025-02-24 16:06:03 +00:00
|
|
|
afs_see_cell(cell, afs_cell_trace_destroy);
|
|
|
|
timer_delete_sync(&cell->management_timer);
|
|
|
|
cancel_work_sync(&cell->manager);
|
|
|
|
call_rcu(&cell->rcu, afs_cell_destroy);
|
2017-11-02 15:27:50 +00:00
|
|
|
}
|
2005-04-16 15:20:36 -07:00
|
|
|
|
2017-11-02 15:27:50 +00:00
|
|
|
/*
|
|
|
|
* Get a reference on a cell record.
|
|
|
|
*/
|
2020-10-13 20:51:59 +01:00
|
|
|
struct afs_cell *afs_get_cell(struct afs_cell *cell, enum afs_cell_trace reason)
|
2017-11-02 15:27:50 +00:00
|
|
|
{
|
2022-07-06 10:52:14 +01:00
|
|
|
int r;
|
2020-10-13 20:51:59 +01:00
|
|
|
|
2022-07-06 10:52:14 +01:00
|
|
|
__refcount_inc(&cell->ref, &r);
|
|
|
|
trace_afs_cell(cell->debug_id, r + 1, atomic_read(&cell->active), reason);
|
2017-11-02 15:27:50 +00:00
|
|
|
return cell;
|
|
|
|
}
|
|
|
|
|
2017-11-02 15:27:50 +00:00
|
|
|
/*
|
|
|
|
* Drop a reference on a cell record.
|
|
|
|
*/
|
2020-10-13 20:51:59 +01:00
|
|
|
void afs_put_cell(struct afs_cell *cell, enum afs_cell_trace reason)
|
2019-07-23 11:24:59 +01:00
|
|
|
{
|
|
|
|
if (cell) {
|
2020-10-13 20:51:59 +01:00
|
|
|
unsigned int debug_id = cell->debug_id;
|
2022-07-06 10:52:14 +01:00
|
|
|
unsigned int a;
|
|
|
|
bool zero;
|
|
|
|
int r;
|
2019-07-23 11:24:59 +01:00
|
|
|
|
2020-10-13 20:51:59 +01:00
|
|
|
a = atomic_read(&cell->active);
|
2022-07-06 10:52:14 +01:00
|
|
|
zero = __refcount_dec_and_test(&cell->ref, &r);
|
|
|
|
trace_afs_cell(debug_id, r - 1, a, reason);
|
|
|
|
if (zero) {
|
2019-07-23 11:24:59 +01:00
|
|
|
a = atomic_read(&cell->active);
|
|
|
|
WARN(a != 0, "Cell active count %u > 0\n", a);
|
afs: Simplify cell record handling
Simplify afs_cell record handling to avoid very occasional races that cause
module removal to hang (it waits for all cell records to be removed).
There are two things that particularly contribute to the difficulty:
firstly, the code tries to pass a ref on the cell to the cell's maintenance
work item (which gets awkward if the work item is already queued); and,
secondly, there's an overall cell manager that tries to use just one timer
for the entire cell collection (to avoid having loads of timers). However,
both of these are probably unnecessarily restrictive.
To simplify this, the following changes are made:
(1) The cell record collection manager is removed. Each cell record
manages itself individually.
(2) Each afs_cell is given a second work item (cell->destroyer) that is
queued when its refcount reaches zero. This is not done in the
context of the putting thread as it might be in an inconvenient place
to sleep.
(3) Each afs_cell is given its own timer. The timer is used to expire the
cell record after a period of unuse if not otherwise pinned and can
also be used for other maintenance tasks if necessary (of which there
are currently none as DNS refresh is triggered by filesystem
operations).
(4) The afs_cell manager work item (cell->manager) is no longer given a
ref on the cell when queued; rather, the manager must be deleted.
This does away with the need to deal with the consequences of losing a
race to queue cell->manager. Clean up of extra queuing is deferred to
the destroyer.
(5) The cell destroyer work item makes sure the cell timer is removed and
that the normal cell work is cancelled before farming the actual
destruction off to RCU.
(6) When a network namespace is destroyed or the kafs module is unloaded,
it's now a simple matter of marking the namespace as dead then just
waking up all the cell work items. They will then remove and destroy
themselves once all remaining activity counts and/or a ref counts are
dropped. This makes sure that all server records are dropped first.
(7) The cell record state set is reduced to just four states: SETTING_UP,
ACTIVE, REMOVING and DEAD. The record persists in the active state
even when it's not being used until the time comes to remove it rather
than downgrading it to an inactive state from whence it can be
restored.
This means that the cell still appears in /proc and /afs when not in
use until it switches to the REMOVING state - at which point it is
removed.
Note that the REMOVING state is included so that someone wanting to
resurrect the cell record is forced to wait whilst the cell is torn
down in that state. Once it's in the DEAD state, it has been removed
from net->cells tree and is no longer findable and can be replaced.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20250224234154.2014840-16-dhowells@redhat.com/ # v1
Link: https://lore.kernel.org/r/20250310094206.801057-12-dhowells@redhat.com/ # v4
2025-02-24 16:06:03 +00:00
|
|
|
WARN_ON(!queue_work(afs_wq, &cell->destroyer));
|
2019-07-23 11:24:59 +01:00
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Note a cell becoming more active.
|
|
|
|
*/
|
2020-10-13 20:51:59 +01:00
|
|
|
struct afs_cell *afs_use_cell(struct afs_cell *cell, enum afs_cell_trace reason)
|
2019-07-23 11:24:59 +01:00
|
|
|
{
|
2022-07-06 10:52:14 +01:00
|
|
|
int r, a;
|
2019-07-23 11:24:59 +01:00
|
|
|
|
afs: Simplify cell record handling
Simplify afs_cell record handling to avoid very occasional races that cause
module removal to hang (it waits for all cell records to be removed).
There are two things that particularly contribute to the difficulty:
firstly, the code tries to pass a ref on the cell to the cell's maintenance
work item (which gets awkward if the work item is already queued); and,
secondly, there's an overall cell manager that tries to use just one timer
for the entire cell collection (to avoid having loads of timers). However,
both of these are probably unnecessarily restrictive.
To simplify this, the following changes are made:
(1) The cell record collection manager is removed. Each cell record
manages itself individually.
(2) Each afs_cell is given a second work item (cell->destroyer) that is
queued when its refcount reaches zero. This is not done in the
context of the putting thread as it might be in an inconvenient place
to sleep.
(3) Each afs_cell is given its own timer. The timer is used to expire the
cell record after a period of unuse if not otherwise pinned and can
also be used for other maintenance tasks if necessary (of which there
are currently none as DNS refresh is triggered by filesystem
operations).
(4) The afs_cell manager work item (cell->manager) is no longer given a
ref on the cell when queued; rather, the manager must be deleted.
This does away with the need to deal with the consequences of losing a
race to queue cell->manager. Clean up of extra queuing is deferred to
the destroyer.
(5) The cell destroyer work item makes sure the cell timer is removed and
that the normal cell work is cancelled before farming the actual
destruction off to RCU.
(6) When a network namespace is destroyed or the kafs module is unloaded,
it's now a simple matter of marking the namespace as dead then just
waking up all the cell work items. They will then remove and destroy
themselves once all remaining activity counts and/or a ref counts are
dropped. This makes sure that all server records are dropped first.
(7) The cell record state set is reduced to just four states: SETTING_UP,
ACTIVE, REMOVING and DEAD. The record persists in the active state
even when it's not being used until the time comes to remove it rather
than downgrading it to an inactive state from whence it can be
restored.
This means that the cell still appears in /proc and /afs when not in
use until it switches to the REMOVING state - at which point it is
removed.
Note that the REMOVING state is included so that someone wanting to
resurrect the cell record is forced to wait whilst the cell is torn
down in that state. Once it's in the DEAD state, it has been removed
from net->cells tree and is no longer findable and can be replaced.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20250224234154.2014840-16-dhowells@redhat.com/ # v1
Link: https://lore.kernel.org/r/20250310094206.801057-12-dhowells@redhat.com/ # v4
2025-02-24 16:06:03 +00:00
|
|
|
__refcount_inc(&cell->ref, &r);
|
2020-10-13 20:51:59 +01:00
|
|
|
a = atomic_inc_return(&cell->active);
|
afs: Simplify cell record handling
Simplify afs_cell record handling to avoid very occasional races that cause
module removal to hang (it waits for all cell records to be removed).
There are two things that particularly contribute to the difficulty:
firstly, the code tries to pass a ref on the cell to the cell's maintenance
work item (which gets awkward if the work item is already queued); and,
secondly, there's an overall cell manager that tries to use just one timer
for the entire cell collection (to avoid having loads of timers). However,
both of these are probably unnecessarily restrictive.
To simplify this, the following changes are made:
(1) The cell record collection manager is removed. Each cell record
manages itself individually.
(2) Each afs_cell is given a second work item (cell->destroyer) that is
queued when its refcount reaches zero. This is not done in the
context of the putting thread as it might be in an inconvenient place
to sleep.
(3) Each afs_cell is given its own timer. The timer is used to expire the
cell record after a period of unuse if not otherwise pinned and can
also be used for other maintenance tasks if necessary (of which there
are currently none as DNS refresh is triggered by filesystem
operations).
(4) The afs_cell manager work item (cell->manager) is no longer given a
ref on the cell when queued; rather, the manager must be deleted.
This does away with the need to deal with the consequences of losing a
race to queue cell->manager. Clean up of extra queuing is deferred to
the destroyer.
(5) The cell destroyer work item makes sure the cell timer is removed and
that the normal cell work is cancelled before farming the actual
destruction off to RCU.
(6) When a network namespace is destroyed or the kafs module is unloaded,
it's now a simple matter of marking the namespace as dead then just
waking up all the cell work items. They will then remove and destroy
themselves once all remaining activity counts and/or a ref counts are
dropped. This makes sure that all server records are dropped first.
(7) The cell record state set is reduced to just four states: SETTING_UP,
ACTIVE, REMOVING and DEAD. The record persists in the active state
even when it's not being used until the time comes to remove it rather
than downgrading it to an inactive state from whence it can be
restored.
This means that the cell still appears in /proc and /afs when not in
use until it switches to the REMOVING state - at which point it is
removed.
Note that the REMOVING state is included so that someone wanting to
resurrect the cell record is forced to wait whilst the cell is torn
down in that state. Once it's in the DEAD state, it has been removed
from net->cells tree and is no longer findable and can be replaced.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20250224234154.2014840-16-dhowells@redhat.com/ # v1
Link: https://lore.kernel.org/r/20250310094206.801057-12-dhowells@redhat.com/ # v4
2025-02-24 16:06:03 +00:00
|
|
|
trace_afs_cell(cell->debug_id, r + 1, a, reason);
|
2019-07-23 11:24:59 +01:00
|
|
|
return cell;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Record a cell becoming less active. When the active counter reaches 1, it
|
|
|
|
* is scheduled for destruction, but may get reactivated.
|
|
|
|
*/
|
2025-02-24 10:54:04 +00:00
|
|
|
void afs_unuse_cell(struct afs_cell *cell, enum afs_cell_trace reason)
|
2017-11-02 15:27:50 +00:00
|
|
|
{
|
2020-10-27 10:42:56 +00:00
|
|
|
unsigned int debug_id;
|
2017-11-02 15:27:50 +00:00
|
|
|
time64_t now, expire_delay;
|
afs: Simplify cell record handling
Simplify afs_cell record handling to avoid very occasional races that cause
module removal to hang (it waits for all cell records to be removed).
There are two things that particularly contribute to the difficulty:
firstly, the code tries to pass a ref on the cell to the cell's maintenance
work item (which gets awkward if the work item is already queued); and,
secondly, there's an overall cell manager that tries to use just one timer
for the entire cell collection (to avoid having loads of timers). However,
both of these are probably unnecessarily restrictive.
To simplify this, the following changes are made:
(1) The cell record collection manager is removed. Each cell record
manages itself individually.
(2) Each afs_cell is given a second work item (cell->destroyer) that is
queued when its refcount reaches zero. This is not done in the
context of the putting thread as it might be in an inconvenient place
to sleep.
(3) Each afs_cell is given its own timer. The timer is used to expire the
cell record after a period of unuse if not otherwise pinned and can
also be used for other maintenance tasks if necessary (of which there
are currently none as DNS refresh is triggered by filesystem
operations).
(4) The afs_cell manager work item (cell->manager) is no longer given a
ref on the cell when queued; rather, the manager must be deleted.
This does away with the need to deal with the consequences of losing a
race to queue cell->manager. Clean up of extra queuing is deferred to
the destroyer.
(5) The cell destroyer work item makes sure the cell timer is removed and
that the normal cell work is cancelled before farming the actual
destruction off to RCU.
(6) When a network namespace is destroyed or the kafs module is unloaded,
it's now a simple matter of marking the namespace as dead then just
waking up all the cell work items. They will then remove and destroy
themselves once all remaining activity counts and/or a ref counts are
dropped. This makes sure that all server records are dropped first.
(7) The cell record state set is reduced to just four states: SETTING_UP,
ACTIVE, REMOVING and DEAD. The record persists in the active state
even when it's not being used until the time comes to remove it rather
than downgrading it to an inactive state from whence it can be
restored.
This means that the cell still appears in /proc and /afs when not in
use until it switches to the REMOVING state - at which point it is
removed.
Note that the REMOVING state is included so that someone wanting to
resurrect the cell record is forced to wait whilst the cell is torn
down in that state. Once it's in the DEAD state, it has been removed
from net->cells tree and is no longer findable and can be replaced.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20250224234154.2014840-16-dhowells@redhat.com/ # v1
Link: https://lore.kernel.org/r/20250310094206.801057-12-dhowells@redhat.com/ # v4
2025-02-24 16:06:03 +00:00
|
|
|
bool zero;
|
2022-07-06 10:52:14 +01:00
|
|
|
int r, a;
|
2005-04-16 15:20:36 -07:00
|
|
|
|
2017-11-02 15:27:50 +00:00
|
|
|
if (!cell)
|
2005-04-16 15:20:36 -07:00
|
|
|
return;
|
|
|
|
|
2017-11-02 15:27:50 +00:00
|
|
|
_enter("%s", cell->name);
|
2007-04-26 15:55:03 -07:00
|
|
|
|
2017-11-02 15:27:50 +00:00
|
|
|
now = ktime_get_real_seconds();
|
|
|
|
cell->last_inactive = now;
|
|
|
|
expire_delay = 0;
|
2019-05-07 15:06:36 +01:00
|
|
|
if (cell->vl_servers->nr_servers)
|
2017-11-02 15:27:50 +00:00
|
|
|
expire_delay = afs_cell_gc_delay;
|
2005-04-16 15:20:36 -07:00
|
|
|
|
2020-10-27 10:42:56 +00:00
|
|
|
debug_id = cell->debug_id;
|
2019-07-23 11:24:59 +01:00
|
|
|
a = atomic_dec_return(&cell->active);
|
afs: Simplify cell record handling
Simplify afs_cell record handling to avoid very occasional races that cause
module removal to hang (it waits for all cell records to be removed).
There are two things that particularly contribute to the difficulty:
firstly, the code tries to pass a ref on the cell to the cell's maintenance
work item (which gets awkward if the work item is already queued); and,
secondly, there's an overall cell manager that tries to use just one timer
for the entire cell collection (to avoid having loads of timers). However,
both of these are probably unnecessarily restrictive.
To simplify this, the following changes are made:
(1) The cell record collection manager is removed. Each cell record
manages itself individually.
(2) Each afs_cell is given a second work item (cell->destroyer) that is
queued when its refcount reaches zero. This is not done in the
context of the putting thread as it might be in an inconvenient place
to sleep.
(3) Each afs_cell is given its own timer. The timer is used to expire the
cell record after a period of unuse if not otherwise pinned and can
also be used for other maintenance tasks if necessary (of which there
are currently none as DNS refresh is triggered by filesystem
operations).
(4) The afs_cell manager work item (cell->manager) is no longer given a
ref on the cell when queued; rather, the manager must be deleted.
This does away with the need to deal with the consequences of losing a
race to queue cell->manager. Clean up of extra queuing is deferred to
the destroyer.
(5) The cell destroyer work item makes sure the cell timer is removed and
that the normal cell work is cancelled before farming the actual
destruction off to RCU.
(6) When a network namespace is destroyed or the kafs module is unloaded,
it's now a simple matter of marking the namespace as dead then just
waking up all the cell work items. They will then remove and destroy
themselves once all remaining activity counts and/or a ref counts are
dropped. This makes sure that all server records are dropped first.
(7) The cell record state set is reduced to just four states: SETTING_UP,
ACTIVE, REMOVING and DEAD. The record persists in the active state
even when it's not being used until the time comes to remove it rather
than downgrading it to an inactive state from whence it can be
restored.
This means that the cell still appears in /proc and /afs when not in
use until it switches to the REMOVING state - at which point it is
removed.
Note that the REMOVING state is included so that someone wanting to
resurrect the cell record is forced to wait whilst the cell is torn
down in that state. Once it's in the DEAD state, it has been removed
from net->cells tree and is no longer findable and can be replaced.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20250224234154.2014840-16-dhowells@redhat.com/ # v1
Link: https://lore.kernel.org/r/20250310094206.801057-12-dhowells@redhat.com/ # v4
2025-02-24 16:06:03 +00:00
|
|
|
if (!a)
|
2019-07-23 11:24:59 +01:00
|
|
|
/* 'cell' may now be garbage collected. */
|
afs: Simplify cell record handling
Simplify afs_cell record handling to avoid very occasional races that cause
module removal to hang (it waits for all cell records to be removed).
There are two things that particularly contribute to the difficulty:
firstly, the code tries to pass a ref on the cell to the cell's maintenance
work item (which gets awkward if the work item is already queued); and,
secondly, there's an overall cell manager that tries to use just one timer
for the entire cell collection (to avoid having loads of timers). However,
both of these are probably unnecessarily restrictive.
To simplify this, the following changes are made:
(1) The cell record collection manager is removed. Each cell record
manages itself individually.
(2) Each afs_cell is given a second work item (cell->destroyer) that is
queued when its refcount reaches zero. This is not done in the
context of the putting thread as it might be in an inconvenient place
to sleep.
(3) Each afs_cell is given its own timer. The timer is used to expire the
cell record after a period of unuse if not otherwise pinned and can
also be used for other maintenance tasks if necessary (of which there
are currently none as DNS refresh is triggered by filesystem
operations).
(4) The afs_cell manager work item (cell->manager) is no longer given a
ref on the cell when queued; rather, the manager must be deleted.
This does away with the need to deal with the consequences of losing a
race to queue cell->manager. Clean up of extra queuing is deferred to
the destroyer.
(5) The cell destroyer work item makes sure the cell timer is removed and
that the normal cell work is cancelled before farming the actual
destruction off to RCU.
(6) When a network namespace is destroyed or the kafs module is unloaded,
it's now a simple matter of marking the namespace as dead then just
waking up all the cell work items. They will then remove and destroy
themselves once all remaining activity counts and/or a ref counts are
dropped. This makes sure that all server records are dropped first.
(7) The cell record state set is reduced to just four states: SETTING_UP,
ACTIVE, REMOVING and DEAD. The record persists in the active state
even when it's not being used until the time comes to remove it rather
than downgrading it to an inactive state from whence it can be
restored.
This means that the cell still appears in /proc and /afs when not in
use until it switches to the REMOVING state - at which point it is
removed.
Note that the REMOVING state is included so that someone wanting to
resurrect the cell record is forced to wait whilst the cell is torn
down in that state. Once it's in the DEAD state, it has been removed
from net->cells tree and is no longer findable and can be replaced.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20250224234154.2014840-16-dhowells@redhat.com/ # v1
Link: https://lore.kernel.org/r/20250310094206.801057-12-dhowells@redhat.com/ # v4
2025-02-24 16:06:03 +00:00
|
|
|
afs_set_cell_timer(cell, expire_delay);
|
|
|
|
|
|
|
|
zero = __refcount_dec_and_test(&cell->ref, &r);
|
|
|
|
trace_afs_cell(debug_id, r - 1, a, reason);
|
|
|
|
if (zero)
|
|
|
|
WARN_ON(!queue_work(afs_wq, &cell->destroyer));
|
2019-07-23 11:24:59 +01:00
|
|
|
}
|
2005-04-16 15:20:36 -07:00
|
|
|
|
2020-10-13 20:51:59 +01:00
|
|
|
/*
|
|
|
|
* Note that a cell has been seen.
|
|
|
|
*/
|
|
|
|
void afs_see_cell(struct afs_cell *cell, enum afs_cell_trace reason)
|
|
|
|
{
|
2022-07-06 10:52:14 +01:00
|
|
|
int r, a;
|
2020-10-13 20:51:59 +01:00
|
|
|
|
2022-07-06 10:52:14 +01:00
|
|
|
r = refcount_read(&cell->ref);
|
2020-10-13 20:51:59 +01:00
|
|
|
a = atomic_read(&cell->active);
|
2022-07-06 10:52:14 +01:00
|
|
|
trace_afs_cell(cell->debug_id, r, a, reason);
|
2020-10-13 20:51:59 +01:00
|
|
|
}
|
|
|
|
|
2019-07-23 11:24:59 +01:00
|
|
|
/*
|
|
|
|
* Queue a cell for management, giving the workqueue a ref to hold.
|
|
|
|
*/
|
2020-10-13 20:51:59 +01:00
|
|
|
void afs_queue_cell(struct afs_cell *cell, enum afs_cell_trace reason)
|
2019-07-23 11:24:59 +01:00
|
|
|
{
|
afs: Simplify cell record handling
Simplify afs_cell record handling to avoid very occasional races that cause
module removal to hang (it waits for all cell records to be removed).
There are two things that particularly contribute to the difficulty:
firstly, the code tries to pass a ref on the cell to the cell's maintenance
work item (which gets awkward if the work item is already queued); and,
secondly, there's an overall cell manager that tries to use just one timer
for the entire cell collection (to avoid having loads of timers). However,
both of these are probably unnecessarily restrictive.
To simplify this, the following changes are made:
(1) The cell record collection manager is removed. Each cell record
manages itself individually.
(2) Each afs_cell is given a second work item (cell->destroyer) that is
queued when its refcount reaches zero. This is not done in the
context of the putting thread as it might be in an inconvenient place
to sleep.
(3) Each afs_cell is given its own timer. The timer is used to expire the
cell record after a period of unuse if not otherwise pinned and can
also be used for other maintenance tasks if necessary (of which there
are currently none as DNS refresh is triggered by filesystem
operations).
(4) The afs_cell manager work item (cell->manager) is no longer given a
ref on the cell when queued; rather, the manager must be deleted.
This does away with the need to deal with the consequences of losing a
race to queue cell->manager. Clean up of extra queuing is deferred to
the destroyer.
(5) The cell destroyer work item makes sure the cell timer is removed and
that the normal cell work is cancelled before farming the actual
destruction off to RCU.
(6) When a network namespace is destroyed or the kafs module is unloaded,
it's now a simple matter of marking the namespace as dead then just
waking up all the cell work items. They will then remove and destroy
themselves once all remaining activity counts and/or a ref counts are
dropped. This makes sure that all server records are dropped first.
(7) The cell record state set is reduced to just four states: SETTING_UP,
ACTIVE, REMOVING and DEAD. The record persists in the active state
even when it's not being used until the time comes to remove it rather
than downgrading it to an inactive state from whence it can be
restored.
This means that the cell still appears in /proc and /afs when not in
use until it switches to the REMOVING state - at which point it is
removed.
Note that the REMOVING state is included so that someone wanting to
resurrect the cell record is forced to wait whilst the cell is torn
down in that state. Once it's in the DEAD state, it has been removed
from net->cells tree and is no longer findable and can be replaced.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20250224234154.2014840-16-dhowells@redhat.com/ # v1
Link: https://lore.kernel.org/r/20250310094206.801057-12-dhowells@redhat.com/ # v4
2025-02-24 16:06:03 +00:00
|
|
|
queue_work(afs_wq, &cell->manager);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Cell-specific management timer.
|
|
|
|
*/
|
|
|
|
static void afs_cell_timer(struct timer_list *timer)
|
|
|
|
{
|
|
|
|
struct afs_cell *cell = container_of(timer, struct afs_cell, management_timer);
|
|
|
|
|
|
|
|
afs_see_cell(cell, afs_cell_trace_see_mgmt_timer);
|
|
|
|
if (refcount_read(&cell->ref) > 0 && cell->net->live)
|
|
|
|
queue_work(afs_wq, &cell->manager);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Set/reduce the cell timer.
|
|
|
|
*/
|
|
|
|
void afs_set_cell_timer(struct afs_cell *cell, unsigned int delay_secs)
|
|
|
|
{
|
|
|
|
timer_reduce(&cell->management_timer, jiffies + delay_secs * HZ);
|
2007-04-26 15:49:28 -07:00
|
|
|
}
|
2005-04-16 15:20:36 -07:00
|
|
|
|
|
|
|
/*
|
2017-11-02 15:27:50 +00:00
|
|
|
* Allocate a key to use as a placeholder for anonymous user security.
|
2005-04-16 15:20:36 -07:00
|
|
|
*/
|
2017-11-02 15:27:50 +00:00
|
|
|
static int afs_alloc_anon_key(struct afs_cell *cell)
|
2005-04-16 15:20:36 -07:00
|
|
|
{
|
2017-11-02 15:27:50 +00:00
|
|
|
struct key *key;
|
|
|
|
char keyname[4 + AFS_MAXCELLNAME + 1], *cp, *dp;
|
2005-04-16 15:20:36 -07:00
|
|
|
|
2017-11-02 15:27:50 +00:00
|
|
|
/* Create a key to represent an anonymous user. */
|
|
|
|
memcpy(keyname, "afs@", 4);
|
|
|
|
dp = keyname + 4;
|
|
|
|
cp = cell->name;
|
|
|
|
do {
|
|
|
|
*dp++ = tolower(*cp);
|
|
|
|
} while (*cp++);
|
2005-04-16 15:20:36 -07:00
|
|
|
|
2017-11-02 15:27:50 +00:00
|
|
|
key = rxrpc_get_null_key(keyname);
|
|
|
|
if (IS_ERR(key))
|
|
|
|
return PTR_ERR(key);
|
2005-04-16 15:20:36 -07:00
|
|
|
|
2017-11-02 15:27:50 +00:00
|
|
|
cell->anonymous_key = key;
|
2005-04-16 15:20:36 -07:00
|
|
|
|
2017-11-02 15:27:50 +00:00
|
|
|
_debug("anon key %p{%x}",
|
|
|
|
cell->anonymous_key, key_serial(cell->anonymous_key));
|
|
|
|
return 0;
|
|
|
|
}
|
2005-04-16 15:20:36 -07:00
|
|
|
|
2017-11-02 15:27:50 +00:00
|
|
|
/*
|
|
|
|
* Activate a cell.
|
|
|
|
*/
|
|
|
|
static int afs_activate_cell(struct afs_net *net, struct afs_cell *cell)
|
|
|
|
{
|
2018-10-11 22:45:49 +01:00
|
|
|
struct hlist_node **p;
|
|
|
|
struct afs_cell *pcell;
|
2017-11-02 15:27:50 +00:00
|
|
|
int ret;
|
|
|
|
|
|
|
|
if (!cell->anonymous_key) {
|
|
|
|
ret = afs_alloc_anon_key(cell);
|
|
|
|
if (ret < 0)
|
|
|
|
return ret;
|
2007-04-26 15:55:03 -07:00
|
|
|
}
|
|
|
|
|
2018-05-18 11:46:15 +01:00
|
|
|
ret = afs_proc_cell_setup(cell);
|
2017-11-02 15:27:50 +00:00
|
|
|
if (ret < 0)
|
|
|
|
return ret;
|
2018-06-15 15:19:22 +01:00
|
|
|
|
|
|
|
mutex_lock(&net->proc_cells_lock);
|
2018-10-11 22:45:49 +01:00
|
|
|
for (p = &net->proc_cells.first; *p; p = &(*p)->next) {
|
|
|
|
pcell = hlist_entry(*p, struct afs_cell, proc_link);
|
|
|
|
if (strcmp(cell->name, pcell->name) < 0)
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
|
|
|
cell->proc_link.pprev = p;
|
|
|
|
cell->proc_link.next = *p;
|
|
|
|
rcu_assign_pointer(*p, &cell->proc_link.next);
|
|
|
|
if (cell->proc_link.next)
|
|
|
|
cell->proc_link.next->pprev = &cell->proc_link.next;
|
|
|
|
|
2018-06-15 15:19:22 +01:00
|
|
|
mutex_unlock(&net->proc_cells_lock);
|
2017-11-02 15:27:50 +00:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Deactivate a cell.
|
|
|
|
*/
|
|
|
|
static void afs_deactivate_cell(struct afs_net *net, struct afs_cell *cell)
|
|
|
|
{
|
|
|
|
_enter("%s", cell->name);
|
2005-04-16 15:20:36 -07:00
|
|
|
|
2018-05-18 11:46:15 +01:00
|
|
|
afs_proc_cell_remove(cell);
|
2005-04-16 15:20:36 -07:00
|
|
|
|
2018-06-15 15:19:22 +01:00
|
|
|
mutex_lock(&net->proc_cells_lock);
|
2025-01-07 18:34:49 +00:00
|
|
|
if (!hlist_unhashed(&cell->proc_link))
|
|
|
|
hlist_del_rcu(&cell->proc_link);
|
2018-06-15 15:19:22 +01:00
|
|
|
mutex_unlock(&net->proc_cells_lock);
|
2005-04-16 15:20:36 -07:00
|
|
|
|
2017-11-02 15:27:50 +00:00
|
|
|
_leave("");
|
2007-04-26 15:49:28 -07:00
|
|
|
}
|
2005-04-16 15:20:36 -07:00
|
|
|
|
afs: Simplify cell record handling
Simplify afs_cell record handling to avoid very occasional races that cause
module removal to hang (it waits for all cell records to be removed).
There are two things that particularly contribute to the difficulty:
firstly, the code tries to pass a ref on the cell to the cell's maintenance
work item (which gets awkward if the work item is already queued); and,
secondly, there's an overall cell manager that tries to use just one timer
for the entire cell collection (to avoid having loads of timers). However,
both of these are probably unnecessarily restrictive.
To simplify this, the following changes are made:
(1) The cell record collection manager is removed. Each cell record
manages itself individually.
(2) Each afs_cell is given a second work item (cell->destroyer) that is
queued when its refcount reaches zero. This is not done in the
context of the putting thread as it might be in an inconvenient place
to sleep.
(3) Each afs_cell is given its own timer. The timer is used to expire the
cell record after a period of unuse if not otherwise pinned and can
also be used for other maintenance tasks if necessary (of which there
are currently none as DNS refresh is triggered by filesystem
operations).
(4) The afs_cell manager work item (cell->manager) is no longer given a
ref on the cell when queued; rather, the manager must be deleted.
This does away with the need to deal with the consequences of losing a
race to queue cell->manager. Clean up of extra queuing is deferred to
the destroyer.
(5) The cell destroyer work item makes sure the cell timer is removed and
that the normal cell work is cancelled before farming the actual
destruction off to RCU.
(6) When a network namespace is destroyed or the kafs module is unloaded,
it's now a simple matter of marking the namespace as dead then just
waking up all the cell work items. They will then remove and destroy
themselves once all remaining activity counts and/or a ref counts are
dropped. This makes sure that all server records are dropped first.
(7) The cell record state set is reduced to just four states: SETTING_UP,
ACTIVE, REMOVING and DEAD. The record persists in the active state
even when it's not being used until the time comes to remove it rather
than downgrading it to an inactive state from whence it can be
restored.
This means that the cell still appears in /proc and /afs when not in
use until it switches to the REMOVING state - at which point it is
removed.
Note that the REMOVING state is included so that someone wanting to
resurrect the cell record is forced to wait whilst the cell is torn
down in that state. Once it's in the DEAD state, it has been removed
from net->cells tree and is no longer findable and can be replaced.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20250224234154.2014840-16-dhowells@redhat.com/ # v1
Link: https://lore.kernel.org/r/20250310094206.801057-12-dhowells@redhat.com/ # v4
2025-02-24 16:06:03 +00:00
|
|
|
static bool afs_has_cell_expired(struct afs_cell *cell, time64_t *_next_manage)
|
|
|
|
{
|
|
|
|
const struct afs_vlserver_list *vllist;
|
|
|
|
time64_t expire_at = cell->last_inactive;
|
|
|
|
time64_t now = ktime_get_real_seconds();
|
|
|
|
|
|
|
|
if (atomic_read(&cell->active))
|
|
|
|
return false;
|
|
|
|
if (!cell->net->live)
|
|
|
|
return true;
|
|
|
|
|
|
|
|
vllist = rcu_dereference_protected(cell->vl_servers, true);
|
|
|
|
if (vllist && vllist->nr_servers > 0)
|
|
|
|
expire_at += afs_cell_gc_delay;
|
|
|
|
|
|
|
|
if (expire_at <= now)
|
|
|
|
return true;
|
|
|
|
if (expire_at < *_next_manage)
|
|
|
|
*_next_manage = expire_at;
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
2005-04-16 15:20:36 -07:00
|
|
|
/*
|
2017-11-02 15:27:50 +00:00
|
|
|
* Manage a cell record, initialising and destroying it, maintaining its DNS
|
|
|
|
* records.
|
2005-04-16 15:20:36 -07:00
|
|
|
*/
|
afs: Simplify cell record handling
Simplify afs_cell record handling to avoid very occasional races that cause
module removal to hang (it waits for all cell records to be removed).
There are two things that particularly contribute to the difficulty:
firstly, the code tries to pass a ref on the cell to the cell's maintenance
work item (which gets awkward if the work item is already queued); and,
secondly, there's an overall cell manager that tries to use just one timer
for the entire cell collection (to avoid having loads of timers). However,
both of these are probably unnecessarily restrictive.
To simplify this, the following changes are made:
(1) The cell record collection manager is removed. Each cell record
manages itself individually.
(2) Each afs_cell is given a second work item (cell->destroyer) that is
queued when its refcount reaches zero. This is not done in the
context of the putting thread as it might be in an inconvenient place
to sleep.
(3) Each afs_cell is given its own timer. The timer is used to expire the
cell record after a period of unuse if not otherwise pinned and can
also be used for other maintenance tasks if necessary (of which there
are currently none as DNS refresh is triggered by filesystem
operations).
(4) The afs_cell manager work item (cell->manager) is no longer given a
ref on the cell when queued; rather, the manager must be deleted.
This does away with the need to deal with the consequences of losing a
race to queue cell->manager. Clean up of extra queuing is deferred to
the destroyer.
(5) The cell destroyer work item makes sure the cell timer is removed and
that the normal cell work is cancelled before farming the actual
destruction off to RCU.
(6) When a network namespace is destroyed or the kafs module is unloaded,
it's now a simple matter of marking the namespace as dead then just
waking up all the cell work items. They will then remove and destroy
themselves once all remaining activity counts and/or a ref counts are
dropped. This makes sure that all server records are dropped first.
(7) The cell record state set is reduced to just four states: SETTING_UP,
ACTIVE, REMOVING and DEAD. The record persists in the active state
even when it's not being used until the time comes to remove it rather
than downgrading it to an inactive state from whence it can be
restored.
This means that the cell still appears in /proc and /afs when not in
use until it switches to the REMOVING state - at which point it is
removed.
Note that the REMOVING state is included so that someone wanting to
resurrect the cell record is forced to wait whilst the cell is torn
down in that state. Once it's in the DEAD state, it has been removed
from net->cells tree and is no longer findable and can be replaced.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20250224234154.2014840-16-dhowells@redhat.com/ # v1
Link: https://lore.kernel.org/r/20250310094206.801057-12-dhowells@redhat.com/ # v4
2025-02-24 16:06:03 +00:00
|
|
|
static bool afs_manage_cell(struct afs_cell *cell)
|
2005-04-16 15:20:36 -07:00
|
|
|
{
|
2017-11-02 15:27:50 +00:00
|
|
|
struct afs_net *net = cell->net;
|
afs: Simplify cell record handling
Simplify afs_cell record handling to avoid very occasional races that cause
module removal to hang (it waits for all cell records to be removed).
There are two things that particularly contribute to the difficulty:
firstly, the code tries to pass a ref on the cell to the cell's maintenance
work item (which gets awkward if the work item is already queued); and,
secondly, there's an overall cell manager that tries to use just one timer
for the entire cell collection (to avoid having loads of timers). However,
both of these are probably unnecessarily restrictive.
To simplify this, the following changes are made:
(1) The cell record collection manager is removed. Each cell record
manages itself individually.
(2) Each afs_cell is given a second work item (cell->destroyer) that is
queued when its refcount reaches zero. This is not done in the
context of the putting thread as it might be in an inconvenient place
to sleep.
(3) Each afs_cell is given its own timer. The timer is used to expire the
cell record after a period of unuse if not otherwise pinned and can
also be used for other maintenance tasks if necessary (of which there
are currently none as DNS refresh is triggered by filesystem
operations).
(4) The afs_cell manager work item (cell->manager) is no longer given a
ref on the cell when queued; rather, the manager must be deleted.
This does away with the need to deal with the consequences of losing a
race to queue cell->manager. Clean up of extra queuing is deferred to
the destroyer.
(5) The cell destroyer work item makes sure the cell timer is removed and
that the normal cell work is cancelled before farming the actual
destruction off to RCU.
(6) When a network namespace is destroyed or the kafs module is unloaded,
it's now a simple matter of marking the namespace as dead then just
waking up all the cell work items. They will then remove and destroy
themselves once all remaining activity counts and/or a ref counts are
dropped. This makes sure that all server records are dropped first.
(7) The cell record state set is reduced to just four states: SETTING_UP,
ACTIVE, REMOVING and DEAD. The record persists in the active state
even when it's not being used until the time comes to remove it rather
than downgrading it to an inactive state from whence it can be
restored.
This means that the cell still appears in /proc and /afs when not in
use until it switches to the REMOVING state - at which point it is
removed.
Note that the REMOVING state is included so that someone wanting to
resurrect the cell record is forced to wait whilst the cell is torn
down in that state. Once it's in the DEAD state, it has been removed
from net->cells tree and is no longer findable and can be replaced.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20250224234154.2014840-16-dhowells@redhat.com/ # v1
Link: https://lore.kernel.org/r/20250310094206.801057-12-dhowells@redhat.com/ # v4
2025-02-24 16:06:03 +00:00
|
|
|
time64_t next_manage = TIME64_MAX;
|
|
|
|
int ret;
|
2017-11-02 15:27:50 +00:00
|
|
|
|
|
|
|
_enter("%s", cell->name);
|
|
|
|
|
|
|
|
_debug("state %u", cell->state);
|
|
|
|
switch (cell->state) {
|
afs: Simplify cell record handling
Simplify afs_cell record handling to avoid very occasional races that cause
module removal to hang (it waits for all cell records to be removed).
There are two things that particularly contribute to the difficulty:
firstly, the code tries to pass a ref on the cell to the cell's maintenance
work item (which gets awkward if the work item is already queued); and,
secondly, there's an overall cell manager that tries to use just one timer
for the entire cell collection (to avoid having loads of timers). However,
both of these are probably unnecessarily restrictive.
To simplify this, the following changes are made:
(1) The cell record collection manager is removed. Each cell record
manages itself individually.
(2) Each afs_cell is given a second work item (cell->destroyer) that is
queued when its refcount reaches zero. This is not done in the
context of the putting thread as it might be in an inconvenient place
to sleep.
(3) Each afs_cell is given its own timer. The timer is used to expire the
cell record after a period of unuse if not otherwise pinned and can
also be used for other maintenance tasks if necessary (of which there
are currently none as DNS refresh is triggered by filesystem
operations).
(4) The afs_cell manager work item (cell->manager) is no longer given a
ref on the cell when queued; rather, the manager must be deleted.
This does away with the need to deal with the consequences of losing a
race to queue cell->manager. Clean up of extra queuing is deferred to
the destroyer.
(5) The cell destroyer work item makes sure the cell timer is removed and
that the normal cell work is cancelled before farming the actual
destruction off to RCU.
(6) When a network namespace is destroyed or the kafs module is unloaded,
it's now a simple matter of marking the namespace as dead then just
waking up all the cell work items. They will then remove and destroy
themselves once all remaining activity counts and/or a ref counts are
dropped. This makes sure that all server records are dropped first.
(7) The cell record state set is reduced to just four states: SETTING_UP,
ACTIVE, REMOVING and DEAD. The record persists in the active state
even when it's not being used until the time comes to remove it rather
than downgrading it to an inactive state from whence it can be
restored.
This means that the cell still appears in /proc and /afs when not in
use until it switches to the REMOVING state - at which point it is
removed.
Note that the REMOVING state is included so that someone wanting to
resurrect the cell record is forced to wait whilst the cell is torn
down in that state. Once it's in the DEAD state, it has been removed
from net->cells tree and is no longer findable and can be replaced.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20250224234154.2014840-16-dhowells@redhat.com/ # v1
Link: https://lore.kernel.org/r/20250310094206.801057-12-dhowells@redhat.com/ # v4
2025-02-24 16:06:03 +00:00
|
|
|
case AFS_CELL_SETTING_UP:
|
|
|
|
goto set_up_cell;
|
|
|
|
case AFS_CELL_ACTIVE:
|
|
|
|
goto cell_is_active;
|
|
|
|
case AFS_CELL_REMOVING:
|
|
|
|
WARN_ON_ONCE(1);
|
|
|
|
return false;
|
|
|
|
case AFS_CELL_DEAD:
|
|
|
|
return false;
|
|
|
|
default:
|
|
|
|
_debug("bad state %u", cell->state);
|
|
|
|
WARN_ON_ONCE(1); /* Unhandled state */
|
|
|
|
return false;
|
|
|
|
}
|
2017-11-02 15:27:50 +00:00
|
|
|
|
afs: Simplify cell record handling
Simplify afs_cell record handling to avoid very occasional races that cause
module removal to hang (it waits for all cell records to be removed).
There are two things that particularly contribute to the difficulty:
firstly, the code tries to pass a ref on the cell to the cell's maintenance
work item (which gets awkward if the work item is already queued); and,
secondly, there's an overall cell manager that tries to use just one timer
for the entire cell collection (to avoid having loads of timers). However,
both of these are probably unnecessarily restrictive.
To simplify this, the following changes are made:
(1) The cell record collection manager is removed. Each cell record
manages itself individually.
(2) Each afs_cell is given a second work item (cell->destroyer) that is
queued when its refcount reaches zero. This is not done in the
context of the putting thread as it might be in an inconvenient place
to sleep.
(3) Each afs_cell is given its own timer. The timer is used to expire the
cell record after a period of unuse if not otherwise pinned and can
also be used for other maintenance tasks if necessary (of which there
are currently none as DNS refresh is triggered by filesystem
operations).
(4) The afs_cell manager work item (cell->manager) is no longer given a
ref on the cell when queued; rather, the manager must be deleted.
This does away with the need to deal with the consequences of losing a
race to queue cell->manager. Clean up of extra queuing is deferred to
the destroyer.
(5) The cell destroyer work item makes sure the cell timer is removed and
that the normal cell work is cancelled before farming the actual
destruction off to RCU.
(6) When a network namespace is destroyed or the kafs module is unloaded,
it's now a simple matter of marking the namespace as dead then just
waking up all the cell work items. They will then remove and destroy
themselves once all remaining activity counts and/or a ref counts are
dropped. This makes sure that all server records are dropped first.
(7) The cell record state set is reduced to just four states: SETTING_UP,
ACTIVE, REMOVING and DEAD. The record persists in the active state
even when it's not being used until the time comes to remove it rather
than downgrading it to an inactive state from whence it can be
restored.
This means that the cell still appears in /proc and /afs when not in
use until it switches to the REMOVING state - at which point it is
removed.
Note that the REMOVING state is included so that someone wanting to
resurrect the cell record is forced to wait whilst the cell is torn
down in that state. Once it's in the DEAD state, it has been removed
from net->cells tree and is no longer findable and can be replaced.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20250224234154.2014840-16-dhowells@redhat.com/ # v1
Link: https://lore.kernel.org/r/20250310094206.801057-12-dhowells@redhat.com/ # v4
2025-02-24 16:06:03 +00:00
|
|
|
set_up_cell:
|
|
|
|
ret = afs_activate_cell(net, cell);
|
|
|
|
if (ret < 0) {
|
|
|
|
cell->error = ret;
|
|
|
|
goto remove_cell;
|
|
|
|
}
|
2017-11-02 15:27:50 +00:00
|
|
|
|
afs: Simplify cell record handling
Simplify afs_cell record handling to avoid very occasional races that cause
module removal to hang (it waits for all cell records to be removed).
There are two things that particularly contribute to the difficulty:
firstly, the code tries to pass a ref on the cell to the cell's maintenance
work item (which gets awkward if the work item is already queued); and,
secondly, there's an overall cell manager that tries to use just one timer
for the entire cell collection (to avoid having loads of timers). However,
both of these are probably unnecessarily restrictive.
To simplify this, the following changes are made:
(1) The cell record collection manager is removed. Each cell record
manages itself individually.
(2) Each afs_cell is given a second work item (cell->destroyer) that is
queued when its refcount reaches zero. This is not done in the
context of the putting thread as it might be in an inconvenient place
to sleep.
(3) Each afs_cell is given its own timer. The timer is used to expire the
cell record after a period of unuse if not otherwise pinned and can
also be used for other maintenance tasks if necessary (of which there
are currently none as DNS refresh is triggered by filesystem
operations).
(4) The afs_cell manager work item (cell->manager) is no longer given a
ref on the cell when queued; rather, the manager must be deleted.
This does away with the need to deal with the consequences of losing a
race to queue cell->manager. Clean up of extra queuing is deferred to
the destroyer.
(5) The cell destroyer work item makes sure the cell timer is removed and
that the normal cell work is cancelled before farming the actual
destruction off to RCU.
(6) When a network namespace is destroyed or the kafs module is unloaded,
it's now a simple matter of marking the namespace as dead then just
waking up all the cell work items. They will then remove and destroy
themselves once all remaining activity counts and/or a ref counts are
dropped. This makes sure that all server records are dropped first.
(7) The cell record state set is reduced to just four states: SETTING_UP,
ACTIVE, REMOVING and DEAD. The record persists in the active state
even when it's not being used until the time comes to remove it rather
than downgrading it to an inactive state from whence it can be
restored.
This means that the cell still appears in /proc and /afs when not in
use until it switches to the REMOVING state - at which point it is
removed.
Note that the REMOVING state is included so that someone wanting to
resurrect the cell record is forced to wait whilst the cell is torn
down in that state. Once it's in the DEAD state, it has been removed
from net->cells tree and is no longer findable and can be replaced.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20250224234154.2014840-16-dhowells@redhat.com/ # v1
Link: https://lore.kernel.org/r/20250310094206.801057-12-dhowells@redhat.com/ # v4
2025-02-24 16:06:03 +00:00
|
|
|
afs_set_cell_state(cell, AFS_CELL_ACTIVE);
|
2017-11-02 15:27:50 +00:00
|
|
|
|
afs: Simplify cell record handling
Simplify afs_cell record handling to avoid very occasional races that cause
module removal to hang (it waits for all cell records to be removed).
There are two things that particularly contribute to the difficulty:
firstly, the code tries to pass a ref on the cell to the cell's maintenance
work item (which gets awkward if the work item is already queued); and,
secondly, there's an overall cell manager that tries to use just one timer
for the entire cell collection (to avoid having loads of timers). However,
both of these are probably unnecessarily restrictive.
To simplify this, the following changes are made:
(1) The cell record collection manager is removed. Each cell record
manages itself individually.
(2) Each afs_cell is given a second work item (cell->destroyer) that is
queued when its refcount reaches zero. This is not done in the
context of the putting thread as it might be in an inconvenient place
to sleep.
(3) Each afs_cell is given its own timer. The timer is used to expire the
cell record after a period of unuse if not otherwise pinned and can
also be used for other maintenance tasks if necessary (of which there
are currently none as DNS refresh is triggered by filesystem
operations).
(4) The afs_cell manager work item (cell->manager) is no longer given a
ref on the cell when queued; rather, the manager must be deleted.
This does away with the need to deal with the consequences of losing a
race to queue cell->manager. Clean up of extra queuing is deferred to
the destroyer.
(5) The cell destroyer work item makes sure the cell timer is removed and
that the normal cell work is cancelled before farming the actual
destruction off to RCU.
(6) When a network namespace is destroyed or the kafs module is unloaded,
it's now a simple matter of marking the namespace as dead then just
waking up all the cell work items. They will then remove and destroy
themselves once all remaining activity counts and/or a ref counts are
dropped. This makes sure that all server records are dropped first.
(7) The cell record state set is reduced to just four states: SETTING_UP,
ACTIVE, REMOVING and DEAD. The record persists in the active state
even when it's not being used until the time comes to remove it rather
than downgrading it to an inactive state from whence it can be
restored.
This means that the cell still appears in /proc and /afs when not in
use until it switches to the REMOVING state - at which point it is
removed.
Note that the REMOVING state is included so that someone wanting to
resurrect the cell record is forced to wait whilst the cell is torn
down in that state. Once it's in the DEAD state, it has been removed
from net->cells tree and is no longer findable and can be replaced.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20250224234154.2014840-16-dhowells@redhat.com/ # v1
Link: https://lore.kernel.org/r/20250310094206.801057-12-dhowells@redhat.com/ # v4
2025-02-24 16:06:03 +00:00
|
|
|
cell_is_active:
|
|
|
|
if (afs_has_cell_expired(cell, &next_manage))
|
|
|
|
goto remove_cell;
|
2017-11-02 15:27:50 +00:00
|
|
|
|
afs: Simplify cell record handling
Simplify afs_cell record handling to avoid very occasional races that cause
module removal to hang (it waits for all cell records to be removed).
There are two things that particularly contribute to the difficulty:
firstly, the code tries to pass a ref on the cell to the cell's maintenance
work item (which gets awkward if the work item is already queued); and,
secondly, there's an overall cell manager that tries to use just one timer
for the entire cell collection (to avoid having loads of timers). However,
both of these are probably unnecessarily restrictive.
To simplify this, the following changes are made:
(1) The cell record collection manager is removed. Each cell record
manages itself individually.
(2) Each afs_cell is given a second work item (cell->destroyer) that is
queued when its refcount reaches zero. This is not done in the
context of the putting thread as it might be in an inconvenient place
to sleep.
(3) Each afs_cell is given its own timer. The timer is used to expire the
cell record after a period of unuse if not otherwise pinned and can
also be used for other maintenance tasks if necessary (of which there
are currently none as DNS refresh is triggered by filesystem
operations).
(4) The afs_cell manager work item (cell->manager) is no longer given a
ref on the cell when queued; rather, the manager must be deleted.
This does away with the need to deal with the consequences of losing a
race to queue cell->manager. Clean up of extra queuing is deferred to
the destroyer.
(5) The cell destroyer work item makes sure the cell timer is removed and
that the normal cell work is cancelled before farming the actual
destruction off to RCU.
(6) When a network namespace is destroyed or the kafs module is unloaded,
it's now a simple matter of marking the namespace as dead then just
waking up all the cell work items. They will then remove and destroy
themselves once all remaining activity counts and/or a ref counts are
dropped. This makes sure that all server records are dropped first.
(7) The cell record state set is reduced to just four states: SETTING_UP,
ACTIVE, REMOVING and DEAD. The record persists in the active state
even when it's not being used until the time comes to remove it rather
than downgrading it to an inactive state from whence it can be
restored.
This means that the cell still appears in /proc and /afs when not in
use until it switches to the REMOVING state - at which point it is
removed.
Note that the REMOVING state is included so that someone wanting to
resurrect the cell record is forced to wait whilst the cell is torn
down in that state. Once it's in the DEAD state, it has been removed
from net->cells tree and is no longer findable and can be replaced.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20250224234154.2014840-16-dhowells@redhat.com/ # v1
Link: https://lore.kernel.org/r/20250310094206.801057-12-dhowells@redhat.com/ # v4
2025-02-24 16:06:03 +00:00
|
|
|
if (test_and_clear_bit(AFS_CELL_FL_DO_LOOKUP, &cell->flags)) {
|
|
|
|
ret = afs_update_cell(cell);
|
|
|
|
if (ret < 0)
|
|
|
|
cell->error = ret;
|
|
|
|
}
|
2020-10-16 13:21:14 +01:00
|
|
|
|
afs: Simplify cell record handling
Simplify afs_cell record handling to avoid very occasional races that cause
module removal to hang (it waits for all cell records to be removed).
There are two things that particularly contribute to the difficulty:
firstly, the code tries to pass a ref on the cell to the cell's maintenance
work item (which gets awkward if the work item is already queued); and,
secondly, there's an overall cell manager that tries to use just one timer
for the entire cell collection (to avoid having loads of timers). However,
both of these are probably unnecessarily restrictive.
To simplify this, the following changes are made:
(1) The cell record collection manager is removed. Each cell record
manages itself individually.
(2) Each afs_cell is given a second work item (cell->destroyer) that is
queued when its refcount reaches zero. This is not done in the
context of the putting thread as it might be in an inconvenient place
to sleep.
(3) Each afs_cell is given its own timer. The timer is used to expire the
cell record after a period of unuse if not otherwise pinned and can
also be used for other maintenance tasks if necessary (of which there
are currently none as DNS refresh is triggered by filesystem
operations).
(4) The afs_cell manager work item (cell->manager) is no longer given a
ref on the cell when queued; rather, the manager must be deleted.
This does away with the need to deal with the consequences of losing a
race to queue cell->manager. Clean up of extra queuing is deferred to
the destroyer.
(5) The cell destroyer work item makes sure the cell timer is removed and
that the normal cell work is cancelled before farming the actual
destruction off to RCU.
(6) When a network namespace is destroyed or the kafs module is unloaded,
it's now a simple matter of marking the namespace as dead then just
waking up all the cell work items. They will then remove and destroy
themselves once all remaining activity counts and/or a ref counts are
dropped. This makes sure that all server records are dropped first.
(7) The cell record state set is reduced to just four states: SETTING_UP,
ACTIVE, REMOVING and DEAD. The record persists in the active state
even when it's not being used until the time comes to remove it rather
than downgrading it to an inactive state from whence it can be
restored.
This means that the cell still appears in /proc and /afs when not in
use until it switches to the REMOVING state - at which point it is
removed.
Note that the REMOVING state is included so that someone wanting to
resurrect the cell record is forced to wait whilst the cell is torn
down in that state. Once it's in the DEAD state, it has been removed
from net->cells tree and is no longer findable and can be replaced.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20250224234154.2014840-16-dhowells@redhat.com/ # v1
Link: https://lore.kernel.org/r/20250310094206.801057-12-dhowells@redhat.com/ # v4
2025-02-24 16:06:03 +00:00
|
|
|
if (next_manage < TIME64_MAX && cell->net->live) {
|
|
|
|
time64_t now = ktime_get_real_seconds();
|
|
|
|
|
|
|
|
if (next_manage - now <= 0)
|
|
|
|
afs_queue_cell(cell, afs_cell_trace_queue_again);
|
|
|
|
else
|
|
|
|
afs_set_cell_timer(cell, next_manage - now);
|
2017-11-02 15:27:50 +00:00
|
|
|
}
|
afs: Simplify cell record handling
Simplify afs_cell record handling to avoid very occasional races that cause
module removal to hang (it waits for all cell records to be removed).
There are two things that particularly contribute to the difficulty:
firstly, the code tries to pass a ref on the cell to the cell's maintenance
work item (which gets awkward if the work item is already queued); and,
secondly, there's an overall cell manager that tries to use just one timer
for the entire cell collection (to avoid having loads of timers). However,
both of these are probably unnecessarily restrictive.
To simplify this, the following changes are made:
(1) The cell record collection manager is removed. Each cell record
manages itself individually.
(2) Each afs_cell is given a second work item (cell->destroyer) that is
queued when its refcount reaches zero. This is not done in the
context of the putting thread as it might be in an inconvenient place
to sleep.
(3) Each afs_cell is given its own timer. The timer is used to expire the
cell record after a period of unuse if not otherwise pinned and can
also be used for other maintenance tasks if necessary (of which there
are currently none as DNS refresh is triggered by filesystem
operations).
(4) The afs_cell manager work item (cell->manager) is no longer given a
ref on the cell when queued; rather, the manager must be deleted.
This does away with the need to deal with the consequences of losing a
race to queue cell->manager. Clean up of extra queuing is deferred to
the destroyer.
(5) The cell destroyer work item makes sure the cell timer is removed and
that the normal cell work is cancelled before farming the actual
destruction off to RCU.
(6) When a network namespace is destroyed or the kafs module is unloaded,
it's now a simple matter of marking the namespace as dead then just
waking up all the cell work items. They will then remove and destroy
themselves once all remaining activity counts and/or a ref counts are
dropped. This makes sure that all server records are dropped first.
(7) The cell record state set is reduced to just four states: SETTING_UP,
ACTIVE, REMOVING and DEAD. The record persists in the active state
even when it's not being used until the time comes to remove it rather
than downgrading it to an inactive state from whence it can be
restored.
This means that the cell still appears in /proc and /afs when not in
use until it switches to the REMOVING state - at which point it is
removed.
Note that the REMOVING state is included so that someone wanting to
resurrect the cell record is forced to wait whilst the cell is torn
down in that state. Once it's in the DEAD state, it has been removed
from net->cells tree and is no longer findable and can be replaced.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20250224234154.2014840-16-dhowells@redhat.com/ # v1
Link: https://lore.kernel.org/r/20250310094206.801057-12-dhowells@redhat.com/ # v4
2025-02-24 16:06:03 +00:00
|
|
|
_leave(" [done %u]", cell->state);
|
|
|
|
return false;
|
2017-11-02 15:27:50 +00:00
|
|
|
|
afs: Simplify cell record handling
Simplify afs_cell record handling to avoid very occasional races that cause
module removal to hang (it waits for all cell records to be removed).
There are two things that particularly contribute to the difficulty:
firstly, the code tries to pass a ref on the cell to the cell's maintenance
work item (which gets awkward if the work item is already queued); and,
secondly, there's an overall cell manager that tries to use just one timer
for the entire cell collection (to avoid having loads of timers). However,
both of these are probably unnecessarily restrictive.
To simplify this, the following changes are made:
(1) The cell record collection manager is removed. Each cell record
manages itself individually.
(2) Each afs_cell is given a second work item (cell->destroyer) that is
queued when its refcount reaches zero. This is not done in the
context of the putting thread as it might be in an inconvenient place
to sleep.
(3) Each afs_cell is given its own timer. The timer is used to expire the
cell record after a period of unuse if not otherwise pinned and can
also be used for other maintenance tasks if necessary (of which there
are currently none as DNS refresh is triggered by filesystem
operations).
(4) The afs_cell manager work item (cell->manager) is no longer given a
ref on the cell when queued; rather, the manager must be deleted.
This does away with the need to deal with the consequences of losing a
race to queue cell->manager. Clean up of extra queuing is deferred to
the destroyer.
(5) The cell destroyer work item makes sure the cell timer is removed and
that the normal cell work is cancelled before farming the actual
destruction off to RCU.
(6) When a network namespace is destroyed or the kafs module is unloaded,
it's now a simple matter of marking the namespace as dead then just
waking up all the cell work items. They will then remove and destroy
themselves once all remaining activity counts and/or a ref counts are
dropped. This makes sure that all server records are dropped first.
(7) The cell record state set is reduced to just four states: SETTING_UP,
ACTIVE, REMOVING and DEAD. The record persists in the active state
even when it's not being used until the time comes to remove it rather
than downgrading it to an inactive state from whence it can be
restored.
This means that the cell still appears in /proc and /afs when not in
use until it switches to the REMOVING state - at which point it is
removed.
Note that the REMOVING state is included so that someone wanting to
resurrect the cell record is forced to wait whilst the cell is torn
down in that state. Once it's in the DEAD state, it has been removed
from net->cells tree and is no longer findable and can be replaced.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20250224234154.2014840-16-dhowells@redhat.com/ # v1
Link: https://lore.kernel.org/r/20250310094206.801057-12-dhowells@redhat.com/ # v4
2025-02-24 16:06:03 +00:00
|
|
|
remove_cell:
|
|
|
|
down_write(&net->cells_lock);
|
2017-11-02 15:27:50 +00:00
|
|
|
|
afs: Simplify cell record handling
Simplify afs_cell record handling to avoid very occasional races that cause
module removal to hang (it waits for all cell records to be removed).
There are two things that particularly contribute to the difficulty:
firstly, the code tries to pass a ref on the cell to the cell's maintenance
work item (which gets awkward if the work item is already queued); and,
secondly, there's an overall cell manager that tries to use just one timer
for the entire cell collection (to avoid having loads of timers). However,
both of these are probably unnecessarily restrictive.
To simplify this, the following changes are made:
(1) The cell record collection manager is removed. Each cell record
manages itself individually.
(2) Each afs_cell is given a second work item (cell->destroyer) that is
queued when its refcount reaches zero. This is not done in the
context of the putting thread as it might be in an inconvenient place
to sleep.
(3) Each afs_cell is given its own timer. The timer is used to expire the
cell record after a period of unuse if not otherwise pinned and can
also be used for other maintenance tasks if necessary (of which there
are currently none as DNS refresh is triggered by filesystem
operations).
(4) The afs_cell manager work item (cell->manager) is no longer given a
ref on the cell when queued; rather, the manager must be deleted.
This does away with the need to deal with the consequences of losing a
race to queue cell->manager. Clean up of extra queuing is deferred to
the destroyer.
(5) The cell destroyer work item makes sure the cell timer is removed and
that the normal cell work is cancelled before farming the actual
destruction off to RCU.
(6) When a network namespace is destroyed or the kafs module is unloaded,
it's now a simple matter of marking the namespace as dead then just
waking up all the cell work items. They will then remove and destroy
themselves once all remaining activity counts and/or a ref counts are
dropped. This makes sure that all server records are dropped first.
(7) The cell record state set is reduced to just four states: SETTING_UP,
ACTIVE, REMOVING and DEAD. The record persists in the active state
even when it's not being used until the time comes to remove it rather
than downgrading it to an inactive state from whence it can be
restored.
This means that the cell still appears in /proc and /afs when not in
use until it switches to the REMOVING state - at which point it is
removed.
Note that the REMOVING state is included so that someone wanting to
resurrect the cell record is forced to wait whilst the cell is torn
down in that state. Once it's in the DEAD state, it has been removed
from net->cells tree and is no longer findable and can be replaced.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20250224234154.2014840-16-dhowells@redhat.com/ # v1
Link: https://lore.kernel.org/r/20250310094206.801057-12-dhowells@redhat.com/ # v4
2025-02-24 16:06:03 +00:00
|
|
|
if (atomic_read(&cell->active)) {
|
|
|
|
up_write(&net->cells_lock);
|
|
|
|
goto cell_is_active;
|
|
|
|
}
|
2017-11-02 15:27:50 +00:00
|
|
|
|
afs: Simplify cell record handling
Simplify afs_cell record handling to avoid very occasional races that cause
module removal to hang (it waits for all cell records to be removed).
There are two things that particularly contribute to the difficulty:
firstly, the code tries to pass a ref on the cell to the cell's maintenance
work item (which gets awkward if the work item is already queued); and,
secondly, there's an overall cell manager that tries to use just one timer
for the entire cell collection (to avoid having loads of timers). However,
both of these are probably unnecessarily restrictive.
To simplify this, the following changes are made:
(1) The cell record collection manager is removed. Each cell record
manages itself individually.
(2) Each afs_cell is given a second work item (cell->destroyer) that is
queued when its refcount reaches zero. This is not done in the
context of the putting thread as it might be in an inconvenient place
to sleep.
(3) Each afs_cell is given its own timer. The timer is used to expire the
cell record after a period of unuse if not otherwise pinned and can
also be used for other maintenance tasks if necessary (of which there
are currently none as DNS refresh is triggered by filesystem
operations).
(4) The afs_cell manager work item (cell->manager) is no longer given a
ref on the cell when queued; rather, the manager must be deleted.
This does away with the need to deal with the consequences of losing a
race to queue cell->manager. Clean up of extra queuing is deferred to
the destroyer.
(5) The cell destroyer work item makes sure the cell timer is removed and
that the normal cell work is cancelled before farming the actual
destruction off to RCU.
(6) When a network namespace is destroyed or the kafs module is unloaded,
it's now a simple matter of marking the namespace as dead then just
waking up all the cell work items. They will then remove and destroy
themselves once all remaining activity counts and/or a ref counts are
dropped. This makes sure that all server records are dropped first.
(7) The cell record state set is reduced to just four states: SETTING_UP,
ACTIVE, REMOVING and DEAD. The record persists in the active state
even when it's not being used until the time comes to remove it rather
than downgrading it to an inactive state from whence it can be
restored.
This means that the cell still appears in /proc and /afs when not in
use until it switches to the REMOVING state - at which point it is
removed.
Note that the REMOVING state is included so that someone wanting to
resurrect the cell record is forced to wait whilst the cell is torn
down in that state. Once it's in the DEAD state, it has been removed
from net->cells tree and is no longer findable and can be replaced.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20250224234154.2014840-16-dhowells@redhat.com/ # v1
Link: https://lore.kernel.org/r/20250310094206.801057-12-dhowells@redhat.com/ # v4
2025-02-24 16:06:03 +00:00
|
|
|
/* Make sure that the expiring server records are going to see the fact
|
|
|
|
* that the cell is caput.
|
|
|
|
*/
|
|
|
|
afs_set_cell_state(cell, AFS_CELL_REMOVING);
|
2017-11-02 15:27:50 +00:00
|
|
|
|
afs: Simplify cell record handling
Simplify afs_cell record handling to avoid very occasional races that cause
module removal to hang (it waits for all cell records to be removed).
There are two things that particularly contribute to the difficulty:
firstly, the code tries to pass a ref on the cell to the cell's maintenance
work item (which gets awkward if the work item is already queued); and,
secondly, there's an overall cell manager that tries to use just one timer
for the entire cell collection (to avoid having loads of timers). However,
both of these are probably unnecessarily restrictive.
To simplify this, the following changes are made:
(1) The cell record collection manager is removed. Each cell record
manages itself individually.
(2) Each afs_cell is given a second work item (cell->destroyer) that is
queued when its refcount reaches zero. This is not done in the
context of the putting thread as it might be in an inconvenient place
to sleep.
(3) Each afs_cell is given its own timer. The timer is used to expire the
cell record after a period of unuse if not otherwise pinned and can
also be used for other maintenance tasks if necessary (of which there
are currently none as DNS refresh is triggered by filesystem
operations).
(4) The afs_cell manager work item (cell->manager) is no longer given a
ref on the cell when queued; rather, the manager must be deleted.
This does away with the need to deal with the consequences of losing a
race to queue cell->manager. Clean up of extra queuing is deferred to
the destroyer.
(5) The cell destroyer work item makes sure the cell timer is removed and
that the normal cell work is cancelled before farming the actual
destruction off to RCU.
(6) When a network namespace is destroyed or the kafs module is unloaded,
it's now a simple matter of marking the namespace as dead then just
waking up all the cell work items. They will then remove and destroy
themselves once all remaining activity counts and/or a ref counts are
dropped. This makes sure that all server records are dropped first.
(7) The cell record state set is reduced to just four states: SETTING_UP,
ACTIVE, REMOVING and DEAD. The record persists in the active state
even when it's not being used until the time comes to remove it rather
than downgrading it to an inactive state from whence it can be
restored.
This means that the cell still appears in /proc and /afs when not in
use until it switches to the REMOVING state - at which point it is
removed.
Note that the REMOVING state is included so that someone wanting to
resurrect the cell record is forced to wait whilst the cell is torn
down in that state. Once it's in the DEAD state, it has been removed
from net->cells tree and is no longer findable and can be replaced.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20250224234154.2014840-16-dhowells@redhat.com/ # v1
Link: https://lore.kernel.org/r/20250310094206.801057-12-dhowells@redhat.com/ # v4
2025-02-24 16:06:03 +00:00
|
|
|
afs_deactivate_cell(net, cell);
|
|
|
|
afs_purge_servers(cell);
|
|
|
|
|
|
|
|
rb_erase(&cell->net_node, &net->cells);
|
|
|
|
afs_see_cell(cell, afs_cell_trace_unuse_delete);
|
|
|
|
up_write(&net->cells_lock);
|
2017-11-02 15:27:50 +00:00
|
|
|
|
2019-07-23 11:24:59 +01:00
|
|
|
/* The root volume is pinning the cell */
|
2023-11-08 13:01:11 +00:00
|
|
|
afs_put_volume(cell->root_volume, afs_volume_trace_put_cell_root);
|
2019-07-23 11:24:59 +01:00
|
|
|
cell->root_volume = NULL;
|
afs: Simplify cell record handling
Simplify afs_cell record handling to avoid very occasional races that cause
module removal to hang (it waits for all cell records to be removed).
There are two things that particularly contribute to the difficulty:
firstly, the code tries to pass a ref on the cell to the cell's maintenance
work item (which gets awkward if the work item is already queued); and,
secondly, there's an overall cell manager that tries to use just one timer
for the entire cell collection (to avoid having loads of timers). However,
both of these are probably unnecessarily restrictive.
To simplify this, the following changes are made:
(1) The cell record collection manager is removed. Each cell record
manages itself individually.
(2) Each afs_cell is given a second work item (cell->destroyer) that is
queued when its refcount reaches zero. This is not done in the
context of the putting thread as it might be in an inconvenient place
to sleep.
(3) Each afs_cell is given its own timer. The timer is used to expire the
cell record after a period of unuse if not otherwise pinned and can
also be used for other maintenance tasks if necessary (of which there
are currently none as DNS refresh is triggered by filesystem
operations).
(4) The afs_cell manager work item (cell->manager) is no longer given a
ref on the cell when queued; rather, the manager must be deleted.
This does away with the need to deal with the consequences of losing a
race to queue cell->manager. Clean up of extra queuing is deferred to
the destroyer.
(5) The cell destroyer work item makes sure the cell timer is removed and
that the normal cell work is cancelled before farming the actual
destruction off to RCU.
(6) When a network namespace is destroyed or the kafs module is unloaded,
it's now a simple matter of marking the namespace as dead then just
waking up all the cell work items. They will then remove and destroy
themselves once all remaining activity counts and/or a ref counts are
dropped. This makes sure that all server records are dropped first.
(7) The cell record state set is reduced to just four states: SETTING_UP,
ACTIVE, REMOVING and DEAD. The record persists in the active state
even when it's not being used until the time comes to remove it rather
than downgrading it to an inactive state from whence it can be
restored.
This means that the cell still appears in /proc and /afs when not in
use until it switches to the REMOVING state - at which point it is
removed.
Note that the REMOVING state is included so that someone wanting to
resurrect the cell record is forced to wait whilst the cell is torn
down in that state. Once it's in the DEAD state, it has been removed
from net->cells tree and is no longer findable and can be replaced.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20250224234154.2014840-16-dhowells@redhat.com/ # v1
Link: https://lore.kernel.org/r/20250310094206.801057-12-dhowells@redhat.com/ # v4
2025-02-24 16:06:03 +00:00
|
|
|
|
|
|
|
afs_set_cell_state(cell, AFS_CELL_DEAD);
|
|
|
|
return true;
|
2019-07-23 11:24:59 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
static void afs_manage_cell_work(struct work_struct *work)
|
|
|
|
{
|
|
|
|
struct afs_cell *cell = container_of(work, struct afs_cell, manager);
|
afs: Simplify cell record handling
Simplify afs_cell record handling to avoid very occasional races that cause
module removal to hang (it waits for all cell records to be removed).
There are two things that particularly contribute to the difficulty:
firstly, the code tries to pass a ref on the cell to the cell's maintenance
work item (which gets awkward if the work item is already queued); and,
secondly, there's an overall cell manager that tries to use just one timer
for the entire cell collection (to avoid having loads of timers). However,
both of these are probably unnecessarily restrictive.
To simplify this, the following changes are made:
(1) The cell record collection manager is removed. Each cell record
manages itself individually.
(2) Each afs_cell is given a second work item (cell->destroyer) that is
queued when its refcount reaches zero. This is not done in the
context of the putting thread as it might be in an inconvenient place
to sleep.
(3) Each afs_cell is given its own timer. The timer is used to expire the
cell record after a period of unuse if not otherwise pinned and can
also be used for other maintenance tasks if necessary (of which there
are currently none as DNS refresh is triggered by filesystem
operations).
(4) The afs_cell manager work item (cell->manager) is no longer given a
ref on the cell when queued; rather, the manager must be deleted.
This does away with the need to deal with the consequences of losing a
race to queue cell->manager. Clean up of extra queuing is deferred to
the destroyer.
(5) The cell destroyer work item makes sure the cell timer is removed and
that the normal cell work is cancelled before farming the actual
destruction off to RCU.
(6) When a network namespace is destroyed or the kafs module is unloaded,
it's now a simple matter of marking the namespace as dead then just
waking up all the cell work items. They will then remove and destroy
themselves once all remaining activity counts and/or a ref counts are
dropped. This makes sure that all server records are dropped first.
(7) The cell record state set is reduced to just four states: SETTING_UP,
ACTIVE, REMOVING and DEAD. The record persists in the active state
even when it's not being used until the time comes to remove it rather
than downgrading it to an inactive state from whence it can be
restored.
This means that the cell still appears in /proc and /afs when not in
use until it switches to the REMOVING state - at which point it is
removed.
Note that the REMOVING state is included so that someone wanting to
resurrect the cell record is forced to wait whilst the cell is torn
down in that state. Once it's in the DEAD state, it has been removed
from net->cells tree and is no longer findable and can be replaced.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20250224234154.2014840-16-dhowells@redhat.com/ # v1
Link: https://lore.kernel.org/r/20250310094206.801057-12-dhowells@redhat.com/ # v4
2025-02-24 16:06:03 +00:00
|
|
|
bool final_put;
|
2019-07-23 11:24:59 +01:00
|
|
|
|
afs: Simplify cell record handling
Simplify afs_cell record handling to avoid very occasional races that cause
module removal to hang (it waits for all cell records to be removed).
There are two things that particularly contribute to the difficulty:
firstly, the code tries to pass a ref on the cell to the cell's maintenance
work item (which gets awkward if the work item is already queued); and,
secondly, there's an overall cell manager that tries to use just one timer
for the entire cell collection (to avoid having loads of timers). However,
both of these are probably unnecessarily restrictive.
To simplify this, the following changes are made:
(1) The cell record collection manager is removed. Each cell record
manages itself individually.
(2) Each afs_cell is given a second work item (cell->destroyer) that is
queued when its refcount reaches zero. This is not done in the
context of the putting thread as it might be in an inconvenient place
to sleep.
(3) Each afs_cell is given its own timer. The timer is used to expire the
cell record after a period of unuse if not otherwise pinned and can
also be used for other maintenance tasks if necessary (of which there
are currently none as DNS refresh is triggered by filesystem
operations).
(4) The afs_cell manager work item (cell->manager) is no longer given a
ref on the cell when queued; rather, the manager must be deleted.
This does away with the need to deal with the consequences of losing a
race to queue cell->manager. Clean up of extra queuing is deferred to
the destroyer.
(5) The cell destroyer work item makes sure the cell timer is removed and
that the normal cell work is cancelled before farming the actual
destruction off to RCU.
(6) When a network namespace is destroyed or the kafs module is unloaded,
it's now a simple matter of marking the namespace as dead then just
waking up all the cell work items. They will then remove and destroy
themselves once all remaining activity counts and/or a ref counts are
dropped. This makes sure that all server records are dropped first.
(7) The cell record state set is reduced to just four states: SETTING_UP,
ACTIVE, REMOVING and DEAD. The record persists in the active state
even when it's not being used until the time comes to remove it rather
than downgrading it to an inactive state from whence it can be
restored.
This means that the cell still appears in /proc and /afs when not in
use until it switches to the REMOVING state - at which point it is
removed.
Note that the REMOVING state is included so that someone wanting to
resurrect the cell record is forced to wait whilst the cell is torn
down in that state. Once it's in the DEAD state, it has been removed
from net->cells tree and is no longer findable and can be replaced.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20250224234154.2014840-16-dhowells@redhat.com/ # v1
Link: https://lore.kernel.org/r/20250310094206.801057-12-dhowells@redhat.com/ # v4
2025-02-24 16:06:03 +00:00
|
|
|
afs_see_cell(cell, afs_cell_trace_manage);
|
|
|
|
final_put = afs_manage_cell(cell);
|
|
|
|
afs_see_cell(cell, afs_cell_trace_managed);
|
|
|
|
if (final_put)
|
|
|
|
afs_put_cell(cell, afs_cell_trace_put_final);
|
2017-11-02 15:27:50 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Purge in-memory cell database.
|
|
|
|
*/
|
|
|
|
void afs_cell_purge(struct afs_net *net)
|
|
|
|
{
|
|
|
|
struct afs_cell *ws;
|
afs: Simplify cell record handling
Simplify afs_cell record handling to avoid very occasional races that cause
module removal to hang (it waits for all cell records to be removed).
There are two things that particularly contribute to the difficulty:
firstly, the code tries to pass a ref on the cell to the cell's maintenance
work item (which gets awkward if the work item is already queued); and,
secondly, there's an overall cell manager that tries to use just one timer
for the entire cell collection (to avoid having loads of timers). However,
both of these are probably unnecessarily restrictive.
To simplify this, the following changes are made:
(1) The cell record collection manager is removed. Each cell record
manages itself individually.
(2) Each afs_cell is given a second work item (cell->destroyer) that is
queued when its refcount reaches zero. This is not done in the
context of the putting thread as it might be in an inconvenient place
to sleep.
(3) Each afs_cell is given its own timer. The timer is used to expire the
cell record after a period of unuse if not otherwise pinned and can
also be used for other maintenance tasks if necessary (of which there
are currently none as DNS refresh is triggered by filesystem
operations).
(4) The afs_cell manager work item (cell->manager) is no longer given a
ref on the cell when queued; rather, the manager must be deleted.
This does away with the need to deal with the consequences of losing a
race to queue cell->manager. Clean up of extra queuing is deferred to
the destroyer.
(5) The cell destroyer work item makes sure the cell timer is removed and
that the normal cell work is cancelled before farming the actual
destruction off to RCU.
(6) When a network namespace is destroyed or the kafs module is unloaded,
it's now a simple matter of marking the namespace as dead then just
waking up all the cell work items. They will then remove and destroy
themselves once all remaining activity counts and/or a ref counts are
dropped. This makes sure that all server records are dropped first.
(7) The cell record state set is reduced to just four states: SETTING_UP,
ACTIVE, REMOVING and DEAD. The record persists in the active state
even when it's not being used until the time comes to remove it rather
than downgrading it to an inactive state from whence it can be
restored.
This means that the cell still appears in /proc and /afs when not in
use until it switches to the REMOVING state - at which point it is
removed.
Note that the REMOVING state is included so that someone wanting to
resurrect the cell record is forced to wait whilst the cell is torn
down in that state. Once it's in the DEAD state, it has been removed
from net->cells tree and is no longer findable and can be replaced.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20250224234154.2014840-16-dhowells@redhat.com/ # v1
Link: https://lore.kernel.org/r/20250310094206.801057-12-dhowells@redhat.com/ # v4
2025-02-24 16:06:03 +00:00
|
|
|
struct rb_node *cursor;
|
2017-11-02 15:27:50 +00:00
|
|
|
|
|
|
|
_enter("");
|
|
|
|
|
afs: Fix rapid cell addition/removal by not using RCU on cells tree
There are a number of problems that are being seen by the rapidly mounting
and unmounting an afs dynamic root with an explicit cell and volume
specified (which should probably be rejected, but that's a separate issue):
What the tests are doing is to look up/create a cell record for the name
given and then tear it down again without actually using it to try to talk
to a server. This is repeated endlessly, very fast, and the new cell
collides with the old one if it's not quick enough to reuse it.
It appears (as suggested by Hillf Danton) that the search through the RB
tree under a read_seqbegin_or_lock() under RCU conditions isn't safe and
that it's not blocking the write_seqlock(), despite taking two passes at
it. He suggested that the code should take a ref on the cell it's
attempting to look at - but this shouldn't be necessary until we've
compared the cell names. It's possible that I'm missing a barrier
somewhere.
However, using an RCU search for this is overkill, really - we only need to
access the cell name in a few places, and they're places where we're may
end up sleeping anyway.
Fix this by switching to an R/W semaphore instead.
Additionally, draw the down_read() call inside the function (renamed to
afs_find_cell()) since all the callers were taking the RCU read lock (or
should've been[*]).
[*] afs_probe_cell_name() should have been, but that doesn't appear to be
involved in the bug reports.
The symptoms of this look like:
general protection fault, probably for non-canonical address 0xf27d208691691fdb: 0000 [#1] PREEMPT SMP KASAN
KASAN: maybe wild-memory-access in range [0x93e924348b48fed8-0x93e924348b48fedf]
...
RIP: 0010:strncasecmp lib/string.c:52 [inline]
RIP: 0010:strncasecmp+0x5f/0x240 lib/string.c:43
afs_lookup_cell_rcu+0x313/0x720 fs/afs/cell.c:88
afs_lookup_cell+0x2ee/0x1440 fs/afs/cell.c:249
afs_parse_source fs/afs/super.c:290 [inline]
...
Fixes: 989782dcdc91 ("afs: Overhaul cell database management")
Reported-by: syzbot+459a5dce0b4cb70fd076@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Hillf Danton <hdanton@sina.com>
cc: syzkaller-bugs@googlegroups.com
2020-10-09 14:11:58 +01:00
|
|
|
down_write(&net->cells_lock);
|
2025-03-06 08:46:57 +00:00
|
|
|
ws = rcu_replace_pointer(net->ws_cell, NULL,
|
|
|
|
lockdep_is_held(&net->cells_lock));
|
afs: Fix rapid cell addition/removal by not using RCU on cells tree
There are a number of problems that are being seen by the rapidly mounting
and unmounting an afs dynamic root with an explicit cell and volume
specified (which should probably be rejected, but that's a separate issue):
What the tests are doing is to look up/create a cell record for the name
given and then tear it down again without actually using it to try to talk
to a server. This is repeated endlessly, very fast, and the new cell
collides with the old one if it's not quick enough to reuse it.
It appears (as suggested by Hillf Danton) that the search through the RB
tree under a read_seqbegin_or_lock() under RCU conditions isn't safe and
that it's not blocking the write_seqlock(), despite taking two passes at
it. He suggested that the code should take a ref on the cell it's
attempting to look at - but this shouldn't be necessary until we've
compared the cell names. It's possible that I'm missing a barrier
somewhere.
However, using an RCU search for this is overkill, really - we only need to
access the cell name in a few places, and they're places where we're may
end up sleeping anyway.
Fix this by switching to an R/W semaphore instead.
Additionally, draw the down_read() call inside the function (renamed to
afs_find_cell()) since all the callers were taking the RCU read lock (or
should've been[*]).
[*] afs_probe_cell_name() should have been, but that doesn't appear to be
involved in the bug reports.
The symptoms of this look like:
general protection fault, probably for non-canonical address 0xf27d208691691fdb: 0000 [#1] PREEMPT SMP KASAN
KASAN: maybe wild-memory-access in range [0x93e924348b48fed8-0x93e924348b48fedf]
...
RIP: 0010:strncasecmp lib/string.c:52 [inline]
RIP: 0010:strncasecmp+0x5f/0x240 lib/string.c:43
afs_lookup_cell_rcu+0x313/0x720 fs/afs/cell.c:88
afs_lookup_cell+0x2ee/0x1440 fs/afs/cell.c:249
afs_parse_source fs/afs/super.c:290 [inline]
...
Fixes: 989782dcdc91 ("afs: Overhaul cell database management")
Reported-by: syzbot+459a5dce0b4cb70fd076@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Hillf Danton <hdanton@sina.com>
cc: syzkaller-bugs@googlegroups.com
2020-10-09 14:11:58 +01:00
|
|
|
up_write(&net->cells_lock);
|
2025-02-24 10:54:04 +00:00
|
|
|
afs_unuse_cell(ws, afs_cell_trace_unuse_ws);
|
2017-11-02 15:27:50 +00:00
|
|
|
|
afs: Simplify cell record handling
Simplify afs_cell record handling to avoid very occasional races that cause
module removal to hang (it waits for all cell records to be removed).
There are two things that particularly contribute to the difficulty:
firstly, the code tries to pass a ref on the cell to the cell's maintenance
work item (which gets awkward if the work item is already queued); and,
secondly, there's an overall cell manager that tries to use just one timer
for the entire cell collection (to avoid having loads of timers). However,
both of these are probably unnecessarily restrictive.
To simplify this, the following changes are made:
(1) The cell record collection manager is removed. Each cell record
manages itself individually.
(2) Each afs_cell is given a second work item (cell->destroyer) that is
queued when its refcount reaches zero. This is not done in the
context of the putting thread as it might be in an inconvenient place
to sleep.
(3) Each afs_cell is given its own timer. The timer is used to expire the
cell record after a period of unuse if not otherwise pinned and can
also be used for other maintenance tasks if necessary (of which there
are currently none as DNS refresh is triggered by filesystem
operations).
(4) The afs_cell manager work item (cell->manager) is no longer given a
ref on the cell when queued; rather, the manager must be deleted.
This does away with the need to deal with the consequences of losing a
race to queue cell->manager. Clean up of extra queuing is deferred to
the destroyer.
(5) The cell destroyer work item makes sure the cell timer is removed and
that the normal cell work is cancelled before farming the actual
destruction off to RCU.
(6) When a network namespace is destroyed or the kafs module is unloaded,
it's now a simple matter of marking the namespace as dead then just
waking up all the cell work items. They will then remove and destroy
themselves once all remaining activity counts and/or a ref counts are
dropped. This makes sure that all server records are dropped first.
(7) The cell record state set is reduced to just four states: SETTING_UP,
ACTIVE, REMOVING and DEAD. The record persists in the active state
even when it's not being used until the time comes to remove it rather
than downgrading it to an inactive state from whence it can be
restored.
This means that the cell still appears in /proc and /afs when not in
use until it switches to the REMOVING state - at which point it is
removed.
Note that the REMOVING state is included so that someone wanting to
resurrect the cell record is forced to wait whilst the cell is torn
down in that state. Once it's in the DEAD state, it has been removed
from net->cells tree and is no longer findable and can be replaced.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20250224234154.2014840-16-dhowells@redhat.com/ # v1
Link: https://lore.kernel.org/r/20250310094206.801057-12-dhowells@redhat.com/ # v4
2025-02-24 16:06:03 +00:00
|
|
|
_debug("kick cells");
|
|
|
|
down_read(&net->cells_lock);
|
|
|
|
for (cursor = rb_first(&net->cells); cursor; cursor = rb_next(cursor)) {
|
|
|
|
struct afs_cell *cell = rb_entry(cursor, struct afs_cell, net_node);
|
|
|
|
|
|
|
|
afs_see_cell(cell, afs_cell_trace_purge);
|
2017-11-02 15:27:50 +00:00
|
|
|
|
afs: Simplify cell record handling
Simplify afs_cell record handling to avoid very occasional races that cause
module removal to hang (it waits for all cell records to be removed).
There are two things that particularly contribute to the difficulty:
firstly, the code tries to pass a ref on the cell to the cell's maintenance
work item (which gets awkward if the work item is already queued); and,
secondly, there's an overall cell manager that tries to use just one timer
for the entire cell collection (to avoid having loads of timers). However,
both of these are probably unnecessarily restrictive.
To simplify this, the following changes are made:
(1) The cell record collection manager is removed. Each cell record
manages itself individually.
(2) Each afs_cell is given a second work item (cell->destroyer) that is
queued when its refcount reaches zero. This is not done in the
context of the putting thread as it might be in an inconvenient place
to sleep.
(3) Each afs_cell is given its own timer. The timer is used to expire the
cell record after a period of unuse if not otherwise pinned and can
also be used for other maintenance tasks if necessary (of which there
are currently none as DNS refresh is triggered by filesystem
operations).
(4) The afs_cell manager work item (cell->manager) is no longer given a
ref on the cell when queued; rather, the manager must be deleted.
This does away with the need to deal with the consequences of losing a
race to queue cell->manager. Clean up of extra queuing is deferred to
the destroyer.
(5) The cell destroyer work item makes sure the cell timer is removed and
that the normal cell work is cancelled before farming the actual
destruction off to RCU.
(6) When a network namespace is destroyed or the kafs module is unloaded,
it's now a simple matter of marking the namespace as dead then just
waking up all the cell work items. They will then remove and destroy
themselves once all remaining activity counts and/or a ref counts are
dropped. This makes sure that all server records are dropped first.
(7) The cell record state set is reduced to just four states: SETTING_UP,
ACTIVE, REMOVING and DEAD. The record persists in the active state
even when it's not being used until the time comes to remove it rather
than downgrading it to an inactive state from whence it can be
restored.
This means that the cell still appears in /proc and /afs when not in
use until it switches to the REMOVING state - at which point it is
removed.
Note that the REMOVING state is included so that someone wanting to
resurrect the cell record is forced to wait whilst the cell is torn
down in that state. Once it's in the DEAD state, it has been removed
from net->cells tree and is no longer findable and can be replaced.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20250224234154.2014840-16-dhowells@redhat.com/ # v1
Link: https://lore.kernel.org/r/20250310094206.801057-12-dhowells@redhat.com/ # v4
2025-02-24 16:06:03 +00:00
|
|
|
if (test_and_clear_bit(AFS_CELL_FL_NO_GC, &cell->flags))
|
|
|
|
afs_unuse_cell(cell, afs_cell_trace_unuse_pin);
|
|
|
|
|
|
|
|
afs_queue_cell(cell, afs_cell_trace_queue_purge);
|
|
|
|
}
|
|
|
|
up_read(&net->cells_lock);
|
2017-11-02 15:27:50 +00:00
|
|
|
|
|
|
|
_debug("wait");
|
2018-03-15 11:42:28 +01:00
|
|
|
wait_var_event(&net->cells_outstanding,
|
|
|
|
!atomic_read(&net->cells_outstanding));
|
2005-04-16 15:20:36 -07:00
|
|
|
_leave("");
|
2007-04-26 15:49:28 -07:00
|
|
|
}
|