linux/Documentation/core-api/refcount-vs-atomic.rst
Suren Baghdasaryan 7f8ceea0c5 refcount: provide ops for cases when object's memory can be reused
For speculative lookups, where a successful inc_not_zero() pins the object
but we still need to double-check that the object acquired is indeed the
one we set out to acquire (identity check), this validation needs to
happen *after* the increment.  Similarly, when a new object is initialized
and its memory might have been previously occupied by another object, all
stores to initialize the object should happen *before* refcount
initialization.

Notably, SLAB_TYPESAFE_BY_RCU is one example where this ordering is
required for reference counting.

Add refcount_{add|inc}_not_zero_acquire() to guarantee the proper ordering
between acquiring a reference count on an object and performing the
identity check for that object.

Add refcount_set_release() to guarantee proper ordering between stores
initializing object attributes and the store initializing the refcount.
refcount_set_release() should be called after all other object attributes
are initialized.  Once refcount_set_release() is called, the object should
be considered visible to other tasks even if it has not yet been added to
the object collection normally used to discover it.  This is because other
tasks might have discovered the object previously occupying the same
memory, and after memory reuse they can succeed in taking a refcount on
the new object and start using it.

Object reuse example to consider:

consumer:
    obj = lookup(collection, key);
    if (!refcount_inc_not_zero_acquire(&obj->ref))
        return;
    if (READ_ONCE(obj->key) != key) { /* identity check */
        put_ref(obj);
        return;
    }
    use(obj->value);

                 producer:
                     remove(collection, obj->key);
                     if (!refcount_dec_and_test(&obj->ref))
                         return;
                     obj->key = KEY_INVALID;
                     free(obj);
                     obj = malloc(); /* obj is reused */
                     obj->key = new_key;
                     obj->value = new_value;
                     refcount_set_release(&obj->ref, 1);
                     add(collection, new_key, obj);

refcount_{add|inc}_not_zero_acquire() is required to prevent the following
reordering when refcount_inc_not_zero() is used instead:

consumer:
    obj = lookup(collection, key);
    if (READ_ONCE(obj->key) != key) { /* reordered identity check */
        put_ref(obj);
        return;
    }
                 producer:
                     remove(collection, obj->key);
                     if (!refcount_dec_and_test(&obj->ref))
                         return;
                     obj->key = KEY_INVALID;
                     free(obj);
                     obj = malloc(); /* obj is reused */
                     obj->key = new_key;
                     obj->value = new_value;
                     refcount_set_release(&obj->ref, 1);
                     add(collection, new_key, obj);

    if (!refcount_inc_not_zero(&obj->ref))
        return;
    use(obj->value); /* USING WRONG OBJECT */

refcount_set_release() is required to prevent the following reordering
when refcount_set() is used instead:

consumer:
    obj = lookup(collection, key);

                 producer:
                     remove(collection, obj->key);
                     if (!refcount_dec_and_test(&obj->ref))
                         return;
                     obj->key = KEY_INVALID;
                     free(obj);
                     obj = malloc(); /* obj is reused */
                     obj->key = new_key; /* new_key == old_key */
                     refcount_set(&obj->ref, 1);

    if (!refcount_inc_not_zero_acquire(&obj->ref))
        return;
    if (READ_ONCE(obj->key) != key) { /* pass since new_key == old_key */
        put_ref(obj);
        return;
    }
    use(obj->value); /* USING STALE obj->value */

                     obj->value = new_value; /* reordered store */
                     add(collection, key, obj);

[surenb@google.com: fix title underlines in refcount-vs-atomic.rst]
  Link: https://lkml.kernel.org/r/20250217161645.3137927-1-surenb@google.com
Link: https://lkml.kernel.org/r/20250213224655.1680278-11-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>	[slab]
Tested-by: Shivank Garg <shivankg@amd.com>
  Link: https://lkml.kernel.org/r/5e19ec93-8307-47c2-bb13-3ddf7150624e@amd.com
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Klara Modin <klarasmodin@gmail.com>
Cc: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Sourav Panda <souravpanda@google.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16 22:06:19 -07:00


===================================
refcount_t API compared to atomic_t
===================================

.. contents:: :local:

Introduction
============

The goal of refcount_t API is to provide a minimal API for implementing
an object's reference counters. While a generic architecture-independent
implementation from lib/refcount.c uses atomic operations underneath,
there are a number of differences between some of the ``refcount_*()`` and
``atomic_*()`` functions with regards to the memory ordering guarantees.
This document outlines the differences and provides respective examples
in order to help maintainers validate their code against the change in
these memory ordering guarantees.

The terms used through this document try to follow the formal LKMM defined in
tools/memory-model/Documentation/explanation.txt.

memory-barriers.txt and atomic_t.txt provide more background to the
memory ordering in general and for atomic operations specifically.
Relevant types of memory ordering
=================================

.. note:: The following section only covers some of the memory
   ordering types that are relevant for the atomics and reference
   counters and used through this document. For a much broader picture
   please consult the memory-barriers.txt document.

In the absence of any memory ordering guarantees (i.e. fully unordered)
atomics & refcounters only provide atomicity and
program order (po) relation (on the same CPU). It guarantees that
each ``atomic_*()`` and ``refcount_*()`` operation is atomic and instructions
are executed in program order on a single CPU.
This is implemented using READ_ONCE()/WRITE_ONCE() and
compare-and-swap primitives.

A strong (full) memory ordering guarantees that all prior loads and
stores (all po-earlier instructions) on the same CPU are completed
before any po-later instruction is executed on the same CPU.
It also guarantees that all po-earlier stores on the same CPU
and all propagated stores from other CPUs must propagate to all
other CPUs before any po-later instruction is executed on the original
CPU (A-cumulative property). This is implemented using smp_mb().

A RELEASE memory ordering guarantees that all prior loads and
stores (all po-earlier instructions) on the same CPU are completed
before the operation. It also guarantees that all po-earlier
stores on the same CPU and all propagated stores from other CPUs
must propagate to all other CPUs before the release operation
(A-cumulative property). This is implemented using
smp_store_release().

An ACQUIRE memory ordering guarantees that all later loads and
stores (all po-later instructions) on the same CPU are
completed after the acquire operation. It also guarantees that all
po-later stores on the same CPU must propagate to all other CPUs
after the acquire operation executes. This is implemented using
smp_acquire__after_ctrl_dep().

A control dependency (on success) for refcounters guarantees that
if a reference for an object was successfully obtained (reference
counter increment or addition happened, function returned true),
then further stores are ordered against this operation.
A control dependency on stores is not implemented using any explicit
barriers, but relies on the CPU not speculating on stores. This is only
a single CPU relation and provides no guarantees for other CPUs.
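
For illustration only, a minimal, hypothetical sketch of how a RELEASE
store pairs with an ACQUIRE load (``struct foo``, the shared pointer
``foo`` and ``consume()`` are made-up names, not kernel APIs)::

    struct foo {
        int data;
        int ready;
    };

    /* writer, operating on a shared 'struct foo *foo' */
    foo->data = 42;                     /* po-earlier store */
    smp_store_release(&foo->ready, 1);  /* RELEASE: 'data' propagates first */

    /* reader */
    if (smp_load_acquire(&foo->ready))  /* ACQUIRE: pairs with the RELEASE */
        consume(foo->data);             /* guaranteed to observe 42 */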

Comparison of functions
=======================

case 1) - non-"Read/Modify/Write" (RMW) ops
-------------------------------------------

Function changes:

 * atomic_set() --> refcount_set()
 * atomic_read() --> refcount_read()

Memory ordering guarantee changes:

 * none (both fully unordered)
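
As an illustration only (``struct obj`` and ``obj_alloc()`` are
hypothetical), both functions are typically used on an object that is not
yet visible to other CPUs, where no ordering is needed::

    struct obj {
        refcount_t ref;
        int value;
    };

    static struct obj *obj_alloc(void)
    {
        struct obj *obj = kmalloc(sizeof(*obj), GFP_KERNEL);

        if (!obj)
            return NULL;
        obj->value = 0;
        refcount_set(&obj->ref, 1);     /* not yet published, unordered */
        pr_debug("refs: %u\n", refcount_read(&obj->ref));
        return obj;
    }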

case 2) - non-"Read/Modify/Write" (RMW) ops with release ordering
-----------------------------------------------------------------

Function changes:

 * atomic_set_release() --> refcount_set_release()

Memory ordering guarantee changes:

 * none (both provide RELEASE ordering)
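
A hypothetical sketch of a producer that reinitializes a reused object
(``obj``, ``add()`` and ``collection`` are placeholder names): when the
object's memory may be reused, e.g. with SLAB_TYPESAFE_BY_RCU, the
refcount must be (re)initialized last::

    obj->key = new_key;
    obj->value = new_value;
    refcount_set_release(&obj->ref, 1); /* prior stores are visible first */
    add(collection, new_key, obj);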

case 3) - increment-based ops that return no value
--------------------------------------------------

Function changes:

 * atomic_inc() --> refcount_inc()
 * atomic_add() --> refcount_add()

Memory ordering guarantee changes:

 * none (both fully unordered)
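
For illustration, a hypothetical ``obj_get()`` helper: no ordering is
needed because the caller is required to already hold a reference, which
keeps the object alive::

    static struct obj *obj_get(struct obj *obj)
    {
        refcount_inc(&obj->ref);        /* fully unordered */
        return obj;
    }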

case 4) - decrement-based RMW ops that return no value
------------------------------------------------------

Function changes:

 * atomic_dec() --> refcount_dec()

Memory ordering guarantee changes:

 * fully unordered --> RELEASE ordering
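
A hedged sketch (``obj`` is hypothetical): refcount_dec() is only safe
when the caller knows this is not the last reference; the RELEASE
ordering makes the caller's prior accesses to the object visible before
the count is seen to drop::

    /* we hold at least two references, so this cannot reach zero */
    refcount_dec(&obj->ref);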

case 5) - increment-based RMW ops that return a value
-----------------------------------------------------

Function changes:

 * atomic_inc_not_zero() --> refcount_inc_not_zero()
 * no atomic counterpart --> refcount_add_not_zero()

Memory ordering guarantees changes:

 * fully ordered --> control dependency on success for stores

.. note:: We really assume here that necessary ordering is provided as a
   result of obtaining a pointer to the object!
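
As an illustration only, a speculative lookup under RCU; ``obj_idr`` and
the use of an IDR are an assumption for the example, and the object is
assumed to be freed only after an RCU grace period::

    rcu_read_lock();
    obj = idr_find(&obj_idr, id);
    if (obj && !refcount_inc_not_zero(&obj->ref))
        obj = NULL;             /* lost the race with the final put */
    rcu_read_unlock();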

case 6) - increment-based RMW ops with acquire ordering that return a value
----------------------------------------------------------------------------

Function changes:

 * atomic_inc_not_zero() --> refcount_inc_not_zero_acquire()
 * no atomic counterpart --> refcount_add_not_zero_acquire()

Memory ordering guarantees changes:

 * fully ordered --> ACQUIRE ordering on success
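
A hypothetical consumer-side sketch (``struct collection``, ``lookup()``
and ``put_ref()`` are placeholders), following the object-reuse pattern
these ops were added for: the ACQUIRE ordering keeps the identity check
ordered after the increment::

    static struct obj *obj_find_get(struct collection *collection, u64 key)
    {
        struct obj *obj;

        rcu_read_lock();
        obj = lookup(collection, key);          /* hypothetical lookup */
        if (obj && !refcount_inc_not_zero_acquire(&obj->ref))
            obj = NULL;
        if (obj && READ_ONCE(obj->key) != key) {    /* identity check */
            put_ref(obj);                       /* hypothetical put */
            obj = NULL;
        }
        rcu_read_unlock();
        return obj;
    }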

case 7) - generic dec/sub decrement-based RMW ops that return a value
----------------------------------------------------------------------

Function changes:

 * atomic_dec_and_test() --> refcount_dec_and_test()
 * atomic_sub_and_test() --> refcount_sub_and_test()

Memory ordering guarantees changes:

 * fully ordered --> RELEASE ordering + ACQUIRE ordering on success
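
For illustration, a hypothetical ``obj_put()`` helper: the RELEASE plus
ACQUIRE-on-success ordering means the free cannot be reordered against
any CPU's prior use of the object::

    static void obj_put(struct obj *obj)
    {
        if (refcount_dec_and_test(&obj->ref))
            kfree(obj);         /* last reference was dropped */
    }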

case 8) other decrement-based RMW ops that return a value
----------------------------------------------------------

Function changes:

 * no atomic counterpart --> refcount_dec_if_one()
 * ``atomic_add_unless(&var, -1, 1)`` --> ``refcount_dec_not_one(&var)``

Memory ordering guarantees changes:

 * fully ordered --> RELEASE ordering + control dependency

.. note:: atomic_add_unless() only provides full order on success.
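
A hedged sketch (``obj``, ``collection``, ``remove()`` and
``collection_lock`` are hypothetical): refcount_dec_not_one() is
typically a lockless fast path for dropping a reference that is not the
last one, with a locked slow path for the final drop::

    /* fast path: drop a reference that is not the last one */
    if (refcount_dec_not_one(&obj->ref))
        return;

    /* possibly the last reference: finish the drop under the lock */
    spin_lock(&collection_lock);
    if (refcount_dec_and_test(&obj->ref)) {
        remove(collection, obj->key);   /* hypothetical helper */
        spin_unlock(&collection_lock);
        kfree(obj);
        return;
    }
    spin_unlock(&collection_lock);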

case 9) - lock-based RMW
------------------------

Function changes:

 * atomic_dec_and_lock() --> refcount_dec_and_lock()
 * atomic_dec_and_mutex_lock() --> refcount_dec_and_mutex_lock()

Memory ordering guarantees changes:

 * fully ordered --> RELEASE ordering + control dependency + hold
   spin_lock() on success
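
For illustration only (``obj``, ``obj->node`` and ``collection_lock`` are
hypothetical), refcount_dec_and_lock() returns with the lock held when it
dropped the last reference, so the object can be unlinked before it is
freed::

    if (refcount_dec_and_lock(&obj->ref, &collection_lock)) {
        list_del(&obj->node);           /* unlink while holding the lock */
        spin_unlock(&collection_lock);
        kfree(obj);
    }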