linux/fs/afs/cell.c


// SPDX-License-Identifier: GPL-2.0-or-later
/* AFS cell and server record management
 *
 * Copyright (C) 2002, 2017 Red Hat, Inc. All Rights Reserved.
 * Written by David Howells (dhowells@redhat.com)
 */
#include <linux/slab.h>
#include <linux/key.h>
#include <linux/ctype.h>
#include <linux/dns_resolver.h>
#include <linux/sched.h>
#include <linux/inet.h>
#include <linux/namei.h>
#include <keys/rxrpc-type.h>
#include "internal.h"
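
/* Timing tunables, in seconds: an unused cell record lingers for
 * afs_cell_gc_delay seconds after its last user goes away, and the TTL
 * applied to a cell's DNS record is clamped to the range
 * [afs_cell_min_ttl, afs_cell_max_ttl].  afs_cell_gc_delay is also
 * exposed as a sysctl.
 */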
static unsigned __read_mostly afs_cell_gc_delay = 10;
static unsigned __read_mostly afs_cell_min_ttl = 10 * 60;
static unsigned __read_mostly afs_cell_max_ttl = 24 * 60 * 60;
static atomic_t cell_debug_id;
static void afs_cell_timer(struct timer_list *timer);
static void afs_destroy_cell_work(struct work_struct *work);
static void afs_manage_cell_work(struct work_struct *work);
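
/* net->cells_outstanding counts the cell records (and associated work)
 * still live in this network namespace.  The final decrement wakes the
 * thread draining the namespace, which is assumed to sleep in
 * wait_var_event() on &net->cells_outstanding.
 */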
static void afs_dec_cells_outstanding(struct afs_net *net)
{
	if (atomic_dec_and_test(&net->cells_outstanding))
		wake_up_var(&net->cells_outstanding);
}
static void afs_set_cell_state(struct afs_cell *cell, enum afs_cell_state state)
{
	smp_store_release(&cell->state, state); /* Commit cell changes before state */
	smp_wmb(); /* Set cell state before task state */
	wake_up_var(&cell->state);
}
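
/* A waiter pairs with the store-release and wake_up_var() above by
 * re-reading the state with an acquire load, so that it also observes the
 * cell changes committed before the state was set.  A minimal sketch
 * (illustrative only; real callers choose the state(s) they care about):
 *
 *	wait_var_event(&cell->state,
 *		       smp_load_acquire(&cell->state) == AFS_CELL_ACTIVE);
 */
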
/*
 * Look up and get an activation reference on a cell record.  The caller must
 * hold net->cells_lock at least read-locked.
 */
static struct afs_cell *afs_find_cell_locked(struct afs_net *net,
					     const char *name, unsigned int namesz,
					     enum afs_cell_trace reason)
{
	struct afs_cell *cell = NULL;
	struct rb_node *p;
	int n;

	_enter("%*.*s", namesz, namesz, name);

	if (name && namesz == 0)
		return ERR_PTR(-EINVAL);
	if (namesz > AFS_MAXCELLNAME)
		return ERR_PTR(-ENAMETOOLONG);

	if (!name) {
		cell = rcu_dereference_protected(net->ws_cell,
						 lockdep_is_held(&net->cells_lock));
		if (!cell)
			return ERR_PTR(-EDESTADDRREQ);
		goto found;
	}

	p = net->cells.rb_node;
	while (p) {
		cell = rb_entry(p, struct afs_cell, net_node);

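		/* Cells are ordered by case-insensitive name, with a tie on
		 * a common prefix broken by name length; this comparison
		 * mirrors the one used when cells are inserted into the
		 * net->cells tree.
		 */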
		n = strncasecmp(cell->name, name,
				min_t(size_t, cell->name_len, namesz));
		if (n == 0)
			n = cell->name_len - namesz;
		if (n < 0)
			p = p->rb_left;
		else if (n > 0)
			p = p->rb_right;
		else
			goto found;
	}

	return ERR_PTR(-ENOENT);

found:
	return afs_use_cell(cell, reason);
}
/*
 * Look up and get an activation reference on a cell record.
 */
struct afs_cell *afs_find_cell(struct afs_net *net,
			       const char *name, unsigned int namesz,
			       enum afs_cell_trace reason)
{
	struct afs_cell *cell;

	down_read(&net->cells_lock);
	cell = afs_find_cell_locked(net, name, namesz, reason);
	up_read(&net->cells_lock);
	return cell;
}
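
/* A sketch of typical usage (assumed caller; the trace labels shown are
 * placeholders, not necessarily real enum values):
 *
 *	cell = afs_find_cell(net, "example.org", 11, afs_cell_trace_use_lookup);
 *	if (IS_ERR(cell))
 *		return PTR_ERR(cell);
 *	...
 *	afs_unuse_cell(cell, afs_cell_trace_unuse_lookup);
 *
 * Passing a NULL name returns the workstation cell.  The activation
 * reference taken here via afs_use_cell() must be balanced by a call to
 * afs_unuse_cell().
 */
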
/*
 * Set up a cell record and fill in its name, VL server address list and
 * allocate an anonymous key
 */
static struct afs_cell *afs_alloc_cell(struct afs_net *net,
				       const char *name, unsigned int namelen,
				       const char *addresses)
{
	struct afs_vlserver_list *vllist = NULL;
	struct afs_cell *cell;
	int i, ret;

	ASSERT(name);
	if (namelen == 0)
		return ERR_PTR(-EINVAL);
	if (namelen > AFS_MAXCELLNAME) {
		_leave(" = -ENAMETOOLONG");
		return ERR_PTR(-ENAMETOOLONG);
	}

	/* Prohibit cell names that contain unprintable chars, '/' and '@' or
	 * that begin with a dot.  This also precludes "@cell".
	 */
	if (name[0] == '.')
		return ERR_PTR(-EINVAL);
	for (i = 0; i < namelen; i++) {
		char ch = name[i];

		if (!isprint(ch) || ch == '/' || ch == '@')
			return ERR_PTR(-EINVAL);
	}

	_enter("%*.*s,%s", namelen, namelen, name, addresses);

	cell = kzalloc(sizeof(struct afs_cell), GFP_KERNEL);
	if (!cell) {
		_leave(" = -ENOMEM");
		return ERR_PTR(-ENOMEM);
	}

	cell->name = kmalloc(1 + namelen + 1, GFP_KERNEL);
	if (!cell->name) {
		kfree(cell);
		return ERR_PTR(-ENOMEM);
	}

	cell->name[0] = '.';
	cell->name++;
	cell->name_len = namelen;
	for (i = 0; i < namelen; i++)
		cell->name[i] = tolower(name[i]);
	cell->name[i] = 0;
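
	/* Note the buffer layout above: '.' + lowercased name + NUL, with
	 * cell->name pointing just past the '.'.  cell->name - 1 thus gives
	 * the dotted form of the name (e.g. ".example.org") without a second
	 * allocation (useful, presumably, for the dynamic root's ".cell"
	 * entries).
	 */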

	cell->net = net;
	refcount_set(&cell->ref, 1);
	atomic_set(&cell->active, 0);
INIT_WORK(&cell->destroyer, afs_destroy_cell_work);
INIT_WORK(&cell->manager, afs_manage_cell_work);
timer_setup(&cell->management_timer, afs_cell_timer, 0);
init_rwsem(&cell->vs_lock);
cell->volumes = RB_ROOT;
INIT_HLIST_HEAD(&cell->proc_volumes);
seqlock_init(&cell->volume_lock);
cell->fs_servers = RB_ROOT;
init_rwsem(&cell->fs_lock);
rwlock_init(&cell->vl_servers_lock);
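/* Mark the new cell so that it gets checked to see whether it's an
 * alias of an already-known cell.
 */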
cell->flags = (1 << AFS_CELL_FL_CHECK_ALIAS);
/* Provide a VL server list, filling it in if we were given a list of
* addresses to use.
*/
if (addresses) {
vllist = afs_parse_text_addrs(net,
addresses, strlen(addresses), ':',
VL_SERVICE, AFS_VL_PORT);
if (IS_ERR(vllist)) {
ret = PTR_ERR(vllist);
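/* Clear the pointer so that the parse_failed path doesn't try to
 * put an ERR_PTR-valued list.
 */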
vllist = NULL;
goto parse_failed;
}
vllist->source = DNS_RECORD_FROM_CONFIG;
vllist->status = DNS_LOOKUP_NOT_DONE;
cell->dns_expiry = TIME64_MAX;
} else {
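/* No addresses were supplied, so install an empty list and set the
 * expiry to the present so that a DNS lookup gets performed when the
 * cell is used.
 */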
ret = -ENOMEM;
vllist = afs_alloc_vlserver_list(0);
if (!vllist)
goto error;
vllist->source = DNS_RECORD_UNAVAILABLE;
vllist->status = DNS_LOOKUP_NOT_DONE;
cell->dns_expiry = ktime_get_real_seconds();
}
rcu_assign_pointer(cell->vl_servers, vllist);
cell->dns_source = vllist->source;
cell->dns_status = vllist->status;
smp_store_release(&cell->dns_lookup_count, 1); /* vs source/status */
atomic_inc(&net->cells_outstanding);
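/* Allocate an inode number for the cell's dynamic root mountpoint.
 * The dynroot code doubles this to provide two inode numbers: one for
 * the plain cell name and one for the dotted (RW) form.
 */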
ret = idr_alloc_cyclic(&net->cells_dyn_ino, cell,
2, INT_MAX / 2, GFP_KERNEL);
if (ret < 0)
goto error;
cell->dynroot_ino = ret;
cell->debug_id = atomic_inc_return(&cell_debug_id);
trace_afs_cell(cell->debug_id, 1, 0, afs_cell_trace_alloc);
_leave(" = %p", cell);
return cell;
parse_failed:
if (ret == -EINVAL)
printk(KERN_ERR "kAFS: bad VL server IP address\n");
error:
afs_put_vlserverlist(cell->net, vllist);
kfree(cell->name - 1);
kfree(cell);
_leave(" = %d", ret);
return ERR_PTR(ret);
}
/**
* afs_lookup_cell - Look up or create a cell record.
* @net: The network namespace
* @name: The name of the cell.
* @namesz: The strlen of the cell name.
* @vllist: A colon/comma separated list of numeric IP addresses or NULL.
* @excl: T if an error should be given if the cell name already exists.
* @trace: The reason to be logged if the lookup is successful.
*
* Look up a cell record by name and query the DNS for VL server addresses if
* needed. Note that the actual DNS query is punted off to the manager thread
* so that this function can return immediately if interrupted whilst allowing
* cell records to be shared even if not yet fully constructed.
*/
struct afs_cell *afs_lookup_cell(struct afs_net *net,
const char *name, unsigned int namesz,
const char *vllist, bool excl,
enum afs_cell_trace trace)
{
struct afs_cell *cell, *candidate, *cursor;
struct rb_node *parent, **pp;
enum afs_cell_state state;
int ret, n;
_enter("%s,%s", name, vllist);
if (!excl) {
cell = afs_find_cell(net, name, namesz, trace);
if (!IS_ERR(cell))
goto wait_for_cell;
}
/* Assume we're probably going to create a cell and preallocate and
* mostly set up a candidate record. We can then use this to stash the
* name, the net namespace and VL server addresses.
*
* We also want to do this before we hold any locks as it may involve
* upcalling to userspace to make DNS queries.
*/
candidate = afs_alloc_cell(net, name, namesz, vllist);
if (IS_ERR(candidate)) {
_leave(" = %ld", PTR_ERR(candidate));
return candidate;
}
/* Find the insertion point and check to see if someone else added a
* cell whilst we were allocating.
*/
down_write(&net->cells_lock);
pp = &net->cells.rb_node;
parent = NULL;
while (*pp) {
parent = *pp;
cursor = rb_entry(parent, struct afs_cell, net_node);
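/* Compare the names case-blind up to the shorter of the two lengths,
 * then use the length difference as a tiebreak.
 */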
n = strncasecmp(cursor->name, name,
min_t(size_t, cursor->name_len, namesz));
if (n == 0)
n = cursor->name_len - namesz;
if (n < 0)
pp = &(*pp)->rb_left;
else if (n > 0)
pp = &(*pp)->rb_right;
else
goto cell_already_exists;
}
cell = candidate;
candidate = NULL;
afs_use_cell(cell, trace);
rb_link_node_rcu(&cell->net_node, parent, pp);
rb_insert_color(&cell->net_node, &net->cells);
up_write(&net->cells_lock);
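/* Kick the cell's manager work item to begin setting the cell up
 * (performing the DNS lookup and suchlike).
 */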
afs_queue_cell(cell, afs_cell_trace_queue_new);
wait_for_cell:
_debug("wait_for_cell");
state = smp_load_acquire(&cell->state); /* vs error */
if (state != AFS_CELL_ACTIVE &&
state != AFS_CELL_DEAD) {
afs_see_cell(cell, afs_cell_trace_wait);
wait_var_event(&cell->state,
({
state = smp_load_acquire(&cell->state); /* vs error */
state == AFS_CELL_ACTIVE || state == AFS_CELL_DEAD;
}));
}
/* Check the state obtained from the wait check. */
if (state == AFS_CELL_DEAD) {
ret = cell->error;
goto error;
}
_leave(" = %p [cell]", cell);
return cell;
cell_already_exists:
_debug("cell exists");
cell = cursor;
if (excl) {
ret = -EEXIST;
} else {
afs_use_cell(cursor, trace);
ret = 0;
}
up_write(&net->cells_lock);
if (candidate)
afs_put_cell(candidate, afs_cell_trace_put_candidate);
if (ret == 0)
goto wait_for_cell;
goto error_noput;
error:
afs_unuse_cell(cell, afs_cell_trace_unuse_lookup_error);
error_noput:
_leave(" = %d [error]", ret);
return ERR_PTR(ret);
}
/*
* set the root cell information
* - can be called with a module parameter string
* - can be called from a write to /proc/fs/afs/rootcell
*/
int afs_cell_init(struct afs_net *net, const char *rootcell)
{
struct afs_cell *old_root, *new_root;
const char *cp, *vllist;
size_t len;
_enter("");
if (!rootcell) {
/* module is loaded with no parameters, or built statically.
* - in the future we might initialize cell DB here.
*/
_leave(" = 0 [no root]");
return 0;
}
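/* The parameter should be of the form "<cellname>[:<vlserveraddrs>]",
 * where the addresses are colon/comma separated.
 */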
cp = strchr(rootcell, ':');
if (!cp) {
_debug("kAFS: no VL server IP addresses specified");
vllist = NULL;
len = strlen(rootcell);
} else {
vllist = cp + 1;
len = cp - rootcell;
}
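/* Reject invalid cell names: an empty name, a name beginning or ending
 * with '.', or a name containing '/' or "..".
 */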
if (len == 0 || !rootcell[0] || rootcell[0] == '.' || rootcell[len - 1] == '.')
return -EINVAL;
if (memchr(rootcell, '/', len))
return -EINVAL;
cp = strstr(rootcell, "..");
if (cp && cp < rootcell + len)
return -EINVAL;
/* allocate a cell record for the root/workstation cell */
new_root = afs_lookup_cell(net, rootcell, len, vllist, false,
afs_cell_trace_use_lookup_ws);
if (IS_ERR(new_root)) {
_leave(" = %ld", PTR_ERR(new_root));
return PTR_ERR(new_root);
}
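/* Pin the root cell so that it doesn't get garbage collected whilst
 * not in use.
 */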
if (!test_and_set_bit(AFS_CELL_FL_NO_GC, &new_root->flags))
afs_use_cell(new_root, afs_cell_trace_use_pin);
/* install the new cell */
down_write(&net->cells_lock);
old_root = rcu_replace_pointer(net->ws_cell, new_root,
lockdep_is_held(&net->cells_lock));
	up_write(&net->cells_lock);
	afs_unuse_cell(old_root, afs_cell_trace_unuse_ws);
	_leave(" = 0");
	return 0;
}

/*
 * Update a cell's VL server address list from the DNS.
 */
static int afs_update_cell(struct afs_cell *cell)
{
	struct afs_vlserver_list *vllist, *old = NULL, *p;
	unsigned int min_ttl = READ_ONCE(afs_cell_min_ttl);
	unsigned int max_ttl = READ_ONCE(afs_cell_max_ttl);
	time64_t now, expiry = 0;
	int ret = 0;

	_enter("%s", cell->name);

	vllist = afs_dns_query(cell, &expiry);
	if (IS_ERR(vllist)) {
		ret = PTR_ERR(vllist);
		_debug("%s: fail %d", cell->name, ret);
		if (ret == -ENOMEM)
			goto out_wake;
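
		/* For failures other than ENOMEM, fall back to an empty
		 * server list so that the failure mode can still be recorded
		 * against the cell.
		 */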
		vllist = afs_alloc_vlserver_list(0);
		if (!vllist) {
			if (ret >= 0)
				ret = -ENOMEM;
			goto out_wake;
		}

		switch (ret) {
		case -ENODATA:
		case -EDESTADDRREQ:
			vllist->status = DNS_LOOKUP_GOT_NOT_FOUND;
			break;
		case -EAGAIN:
		case -ECONNREFUSED:
			vllist->status = DNS_LOOKUP_GOT_TEMP_FAILURE;
			break;
		default:
			vllist->status = DNS_LOOKUP_GOT_LOCAL_FAILURE;
			break;
		}
	}
_debug("%s: got list %d %d", cell->name, vllist->source, vllist->status);
cell->dns_status = vllist->status;
now = ktime_get_real_seconds();
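	/* Clamp the expiry to the configured TTL bounds so that a record is
	 * neither refreshed too aggressively nor cached for too long.
	 */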
	if (min_ttl > max_ttl)
		max_ttl = min_ttl;
	if (expiry < now + min_ttl)
		expiry = now + min_ttl;
	else if (expiry > now + max_ttl)
		expiry = now + max_ttl;

	_debug("%s: status %d", cell->name, vllist->status);
	if (vllist->source == DNS_RECORD_UNAVAILABLE) {
		switch (vllist->status) {
		case DNS_LOOKUP_GOT_NOT_FOUND:
			/* The DNS said that the cell does not exist or there
			 * weren't any addresses to be had.
			 */
			cell->dns_expiry = expiry;
			break;

		case DNS_LOOKUP_BAD:
		case DNS_LOOKUP_GOT_LOCAL_FAILURE:
		case DNS_LOOKUP_GOT_TEMP_FAILURE:
		case DNS_LOOKUP_GOT_NS_FAILURE:
		default:
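			/* The lookup failed rather than positively denying
			 * the cell's existence, so only cache the result
			 * briefly before retrying.
			 */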
			cell->dns_expiry = now + 10;
			break;
		}
	} else {
		cell->dns_expiry = expiry;
	}

	/* Replace the VL server list if the new record has servers or the old
	 * record doesn't.
	 */
	write_lock(&cell->vl_servers_lock);
	p = rcu_dereference_protected(cell->vl_servers, true);
	if (vllist->nr_servers > 0 || p->nr_servers == 0) {
		rcu_assign_pointer(cell->vl_servers, vllist);
		cell->dns_source = vllist->source;
		old = p;
	}
	write_unlock(&cell->vl_servers_lock);
	afs_put_vlserverlist(cell->net, old);
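
	/* Bump the lookup counter with release semantics: a waiter that sees
	 * the new count is also guaranteed to see the dns_source and
	 * dns_status written above.
	 */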
out_wake:
	smp_store_release(&cell->dns_lookup_count,
			  cell->dns_lookup_count + 1); /* vs source/status */
	wake_up_var(&cell->dns_lookup_count);
	_leave(" = %d", ret);
	return ret;
}

/*
 * Destroy a cell record
 */
static void afs_cell_destroy(struct rcu_head *rcu)
{
	struct afs_cell *cell = container_of(rcu, struct afs_cell, rcu);
	struct afs_net *net = cell->net;
	int r;

	_enter("%p{%s}", cell, cell->name);

	r = refcount_read(&cell->ref);
	ASSERTCMP(r, ==, 0);
	trace_afs_cell(cell->debug_id, r, atomic_read(&cell->active), afs_cell_trace_free);

	afs_put_vlserverlist(net, rcu_access_pointer(cell->vl_servers));
	afs_unuse_cell(cell->alias_of, afs_cell_trace_unuse_alias);
	key_put(cell->anonymous_key);
	idr_remove(&net->cells_dyn_ino, cell->dynroot_ino);
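	/* The name buffer was allocated with a spare byte ahead of the name
	 * to hold the '.' of the dotted cell name, so step back to the true
	 * start of the allocation.
	 */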
	kfree(cell->name - 1);
	kfree(cell);

	afs_dec_cells_outstanding(net);
	_leave(" [destroyed]");
}
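
/*
 * Finish off a cell record whose refcount has reached zero: stop the cell's
 * timer and any queued management work, then hand the record to RCU for
 * actual destruction.
 */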
static void afs_destroy_cell_work(struct work_struct *work)
{
	struct afs_cell *cell = container_of(work, struct afs_cell, destroyer);

	afs_see_cell(cell, afs_cell_trace_destroy);
	timer_delete_sync(&cell->management_timer);
	cancel_work_sync(&cell->manager);
	call_rcu(&cell->rcu, afs_cell_destroy);
}
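
/*
 * A cell record carries two counts: ->ref governs the lifetime of the
 * structure itself (the destroyer is queued when it reaches zero), whilst
 * ->active counts the users that want the cell maintained.  The active count
 * owns one ref, so the record cannot be freed whilst it is still in use.
 */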

/*
 * Get a reference on a cell record.
 */
struct afs_cell *afs_get_cell(struct afs_cell *cell, enum afs_cell_trace reason)
{
	int r;

	__refcount_inc(&cell->ref, &r);
	trace_afs_cell(cell->debug_id, r + 1, atomic_read(&cell->active), reason);
	return cell;
}

/*
 * Drop a reference on a cell record.
 */
void afs_put_cell(struct afs_cell *cell, enum afs_cell_trace reason)
{
	if (cell) {
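		/* Sample everything the tracepoint needs first: if this isn't
		 * the last ref, the cell may be destroyed by another putter
		 * as soon as we let our ref go.
		 */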
		unsigned int debug_id = cell->debug_id;
		unsigned int a;
		bool zero;
		int r;

		a = atomic_read(&cell->active);
		zero = __refcount_dec_and_test(&cell->ref, &r);
		trace_afs_cell(debug_id, r - 1, a, reason);
		if (zero) {
			a = atomic_read(&cell->active);
			WARN(a != 0, "Cell active count %u > 0\n", a);
			WARN_ON(!queue_work(afs_wq, &cell->destroyer));
		}
	}
}

/*
 * Note a cell becoming more active.
 */
struct afs_cell *afs_use_cell(struct afs_cell *cell, enum afs_cell_trace reason)
{
	int r, a;

	__refcount_inc(&cell->ref, &r);
	a = atomic_inc_return(&cell->active);
	trace_afs_cell(cell->debug_id, r + 1, a, reason);
	return cell;
}

/*
 * Record a cell becoming less active.  When the active counter reaches 0, it
 * is scheduled for destruction, but may get reactivated.
 */
void afs_unuse_cell(struct afs_cell *cell, enum afs_cell_trace reason)
{
	unsigned int debug_id;
	time64_t now, expire_delay;
	bool zero;
	int r, a;

	if (!cell)
		return;

	_enter("%s", cell->name);

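	/* A cell that still has a usable VL server list is kept around for a
	 * grace period after its last use so that a quick reuse doesn't have
	 * to redo the DNS lookup.
	 */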
	now = ktime_get_real_seconds();
	cell->last_inactive = now;
	expire_delay = 0;
	if (cell->vl_servers->nr_servers)
		expire_delay = afs_cell_gc_delay;

	debug_id = cell->debug_id;
a = atomic_dec_return(&cell->active);
if (!a)
/* 'cell' may now be garbage collected. */
afs_set_cell_timer(cell, expire_delay);
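/* Drop the caller's ref. Final destruction is punted to the destroyer work item as the caller may be in an awkward context to sleep in. */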
zero = __refcount_dec_and_test(&cell->ref, &r);
trace_afs_cell(debug_id, r - 1, a, reason);
if (zero)
WARN_ON(!queue_work(afs_wq, &cell->destroyer));
}
/*
* Note that a cell has been seen.
*/
void afs_see_cell(struct afs_cell *cell, enum afs_cell_trace reason)
{
int r, a;
r = refcount_read(&cell->ref);
a = atomic_read(&cell->active);
trace_afs_cell(cell->debug_id, r, a, reason);
}
/*
* Queue a cell for management. The work item does not take a ref of its own;
* cleanup of any redundant queuing is deferred to the cell destroyer.
*/
void afs_queue_cell(struct afs_cell *cell, enum afs_cell_trace reason)
{
queue_work(afs_wq, &cell->manager);
}
/*
* Cell-specific management timer.
*/
static void afs_cell_timer(struct timer_list *timer)
{
struct afs_cell *cell = container_of(timer, struct afs_cell, management_timer);
afs_see_cell(cell, afs_cell_trace_see_mgmt_timer);
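/* Only kick the manager while the cell is still referenced and the network namespace is live. */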
if (refcount_read(&cell->ref) > 0 && cell->net->live)
queue_work(afs_wq, &cell->manager);
}
/*
* Set/reduce the cell timer.
*/
void afs_set_cell_timer(struct afs_cell *cell, unsigned int delay_secs)
{
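/* timer_reduce() only ever moves the expiry earlier, so racing callers cannot push a pending timeout further out. */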
timer_reduce(&cell->management_timer, jiffies + delay_secs * HZ);
}
/*
* Allocate a key to use as a placeholder for anonymous user security.
*/
static int afs_alloc_anon_key(struct afs_cell *cell)
{
struct key *key;
char keyname[4 + AFS_MAXCELLNAME + 1], *cp, *dp;
/* Create a key to represent an anonymous user. */
memcpy(keyname, "afs@", 4);
dp = keyname + 4;
cp = cell->name;
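/* Lower-case the cell name into the key name; the terminating NUL is copied too as the condition is tested after the copy. */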
do {
*dp++ = tolower(*cp);
} while (*cp++);
key = rxrpc_get_null_key(keyname);
if (IS_ERR(key))
return PTR_ERR(key);
cell->anonymous_key = key;
_debug("anon key %p{%x}",
cell->anonymous_key, key_serial(cell->anonymous_key));
return 0;
}
/*
* Activate a cell.
*/
static int afs_activate_cell(struct afs_net *net, struct afs_cell *cell)
{
struct hlist_node **p;
struct afs_cell *pcell;
int ret;
if (!cell->anonymous_key) {
ret = afs_alloc_anon_key(cell);
if (ret < 0)
return ret;
}
ret = afs_proc_cell_setup(cell);
if (ret < 0)
return ret;
mutex_lock(&net->proc_cells_lock);
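/* Find the insertion point that keeps the /proc cell list sorted by name. */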
for (p = &net->proc_cells.first; *p; p = &(*p)->next) {
pcell = hlist_entry(*p, struct afs_cell, proc_link);
if (strcmp(cell->name, pcell->name) < 0)
break;
}
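/* Open-coded RCU hlist insertion. ->next is the first member of struct hlist_node, so &cell->proc_link.next aliases the node itself in the assignment below. */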
cell->proc_link.pprev = p;
cell->proc_link.next = *p;
rcu_assign_pointer(*p, &cell->proc_link.next);
if (cell->proc_link.next)
cell->proc_link.next->pprev = &cell->proc_link.next;
mutex_unlock(&net->proc_cells_lock);
return 0;
}
/*
* Deactivate a cell.
*/
static void afs_deactivate_cell(struct afs_net *net, struct afs_cell *cell)
{
_enter("%s", cell->name);
afs_proc_cell_remove(cell);
mutex_lock(&net->proc_cells_lock);
if (!hlist_unhashed(&cell->proc_link))
hlist_del_rcu(&cell->proc_link);
mutex_unlock(&net->proc_cells_lock);
_leave("");
}
static bool afs_has_cell_expired(struct afs_cell *cell, time64_t *_next_manage)
{
const struct afs_vlserver_list *vllist;
time64_t expire_at = cell->last_inactive;
time64_t now = ktime_get_real_seconds();
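/* A cell with active users cannot be expired. */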
if (atomic_read(&cell->active))
return false;
if (!cell->net->live)
return true;
vllist = rcu_dereference_protected(cell->vl_servers, true);
if (vllist && vllist->nr_servers > 0)
expire_at += afs_cell_gc_delay;
if (expire_at <= now)
return true;
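/* Not expired yet; let the caller know when to check again. */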
if (expire_at < *_next_manage)
*_next_manage = expire_at;
return false;
}
/*
* Manage a cell record, initialising and destroying it, maintaining its DNS
* records.
*/
static bool afs_manage_cell(struct afs_cell *cell)
{
struct afs_net *net = cell->net;
time64_t next_manage = TIME64_MAX;
int ret;
_enter("%s", cell->name);
_debug("state %u", cell->state);
switch (cell->state) {
case AFS_CELL_SETTING_UP:
goto set_up_cell;
case AFS_CELL_ACTIVE:
goto cell_is_active;
case AFS_CELL_REMOVING:
WARN_ON_ONCE(1);
return false;
case AFS_CELL_DEAD:
return false;
default:
_debug("bad state %u", cell->state);
WARN_ON_ONCE(1); /* Unhandled state */
return false;
}
set_up_cell:
ret = afs_activate_cell(net, cell);
if (ret < 0) {
cell->error = ret;
goto remove_cell;
}
afs_set_cell_state(cell, AFS_CELL_ACTIVE);
cell_is_active:
if (afs_has_cell_expired(cell, &next_manage))
goto remove_cell;
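/* Refresh the cell's DNS record if a lookup has been requested. */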
if (test_and_clear_bit(AFS_CELL_FL_DO_LOOKUP, &cell->flags)) {
ret = afs_update_cell(cell);
if (ret < 0)
cell->error = ret;
}
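/* Arm the timer for the next expiry check, or requeue the work item immediately if the check is already due. */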
if (next_manage < TIME64_MAX && cell->net->live) {
time64_t now = ktime_get_real_seconds();
if (next_manage - now <= 0)
afs_queue_cell(cell, afs_cell_trace_queue_again);
else
afs_set_cell_timer(cell, next_manage - now);
}
_leave(" [done %u]", cell->state);
return false;
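/* The cell is unused: detach it from the namespace and kill it off.  The
 * active count must be rechecked under the lock in case someone resurrected
 * the cell whilst we weren't holding it.
 */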
remove_cell:
down_write(&net->cells_lock);
if (atomic_read(&cell->active)) {
up_write(&net->cells_lock);
goto cell_is_active;
}
/* Make sure that the expiring server records are going to see the fact
* that the cell is caput.
*/
afs_set_cell_state(cell, AFS_CELL_REMOVING);
afs_deactivate_cell(net, cell);
afs_purge_servers(cell);

rb_erase(&cell->net_node, &net->cells);
afs_see_cell(cell, afs_cell_trace_unuse_delete);
up_write(&net->cells_lock);
/* The root volume is pinning the cell */
afs_put_volume(cell->root_volume, afs_volume_trace_put_cell_root);
cell->root_volume = NULL;
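/* Once dead, the cell is gone from the cells tree, is no longer findable
 * and can be replaced by a new record of the same name.
 */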
afs_set_cell_state(cell, AFS_CELL_DEAD);
return true;
}
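
/*
 * Cell-management work item.  Drive the cell through afs_manage_cell() and,
 * if that reports that the cell is now dead, drop the ref that was owned by
 * the activity count.
 */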
static void afs_manage_cell_work(struct work_struct *work)
{
struct afs_cell *cell = container_of(work, struct afs_cell, manager);
bool final_put;
afs_see_cell(cell, afs_cell_trace_manage);
final_put = afs_manage_cell(cell);
afs_see_cell(cell, afs_cell_trace_managed);
if (final_put)
afs_put_cell(cell, afs_cell_trace_put_final);
}

/*
* Purge in-memory cell database.
*/
void afs_cell_purge(struct afs_net *net)
{
struct afs_cell *ws;
struct rb_node *cursor;

_enter("");
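/* Unpin the workstation cell: clear net->ws_cell under the write lock,
 * then drop the activity count that was pinning it.
 */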
down_write(&net->cells_lock);
ws = rcu_replace_pointer(net->ws_cell, NULL,
lockdep_is_held(&net->cells_lock));
up_write(&net->cells_lock);
afs_unuse_cell(ws, afs_cell_trace_unuse_ws);
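/* Kick the management work item of every remaining cell so that each one
 * tears itself down once its activity and ref counts are dropped.
 */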
_debug("kick cells");
down_read(&net->cells_lock);
for (cursor = rb_first(&net->cells); cursor; cursor = rb_next(cursor)) {
struct afs_cell *cell = rb_entry(cursor, struct afs_cell, net_node);
afs_see_cell(cell, afs_cell_trace_purge);
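/* Drop the use that was pinning cells marked as exempt from GC. */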
if (test_and_clear_bit(AFS_CELL_FL_NO_GC, &cell->flags))
afs_unuse_cell(cell, afs_cell_trace_unuse_pin);

afs_queue_cell(cell, afs_cell_trace_queue_purge);
}
up_read(&net->cells_lock);
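
/* Wait for the count of outstanding cell records to drop to zero. */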
_debug("wait");
wait_var_event(&net->cells_outstanding,
!atomic_read(&net->cells_outstanding));
_leave("");
}