===========================================================================
Proper Locking Under a Preemptible Kernel: Keeping Kernel Code Preempt-Safe
===========================================================================

:Author: Robert Love <rml@tech9.net>


Introduction
============

A preemptible kernel creates new locking issues.  The issues are the same as
those under SMP: concurrency and reentrancy.  Thankfully, the Linux preemptible
kernel model leverages existing SMP locking mechanisms.  Thus, the kernel
requires explicit additional locking in very few situations.

This document is for all kernel hackers.  Developing code in the kernel
requires protecting these situations.

RULE #1: Per-CPU data structures need explicit protection
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Two similar problems arise.  An example code snippet::

	struct this_needs_locking tux[NR_CPUS];
	tux[smp_processor_id()] = some_value;
	/* task is preempted here... */
	something = tux[smp_processor_id()];

First, since the data is per-CPU, it may not have explicit SMP locking, but
it nonetheless needs protection once preemption is possible.  Second, when a
preempted task is finally rescheduled, the previous value of smp_processor_id
may not equal the current one.  You must protect these situations by disabling
preemption around them.

You can also use get_cpu() and put_cpu(), which disable and re-enable
preemption, respectively.
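
For instance, a minimal sketch of the snippet above rewritten with
get_cpu()/put_cpu() (this_needs_locking, some_value and something are the
placeholder names from the snippet)::

	struct this_needs_locking tux[NR_CPUS];
	int cpu;

	cpu = get_cpu();	/* disables preemption, returns this CPU's id */
	tux[cpu] = some_value;
	something = tux[cpu];	/* still on the same CPU: no preemption window */
	put_cpu();		/* re-enables preemption */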
					
						
RULE #2: CPU state must be protected.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Under preemption, the state of the CPU must be protected.  This is
arch-dependent, but includes CPU structures and state not preserved over a
context switch.  For example, on x86, entering and exiting FPU mode is now a
critical section that must occur while preemption is disabled.  Think what
would happen if the kernel is executing a floating-point instruction and is
then preempted.  Remember, the kernel does not save FPU state except for user
tasks.  Therefore, upon preemption, the FPU registers will be sold to the
lowest bidder.  Thus, preemption must be disabled around such regions.

Note, some FPU functions are already explicitly preempt-safe.  For example,
kernel_fpu_begin and kernel_fpu_end will disable and enable preemption.
However, fpu__restore() must be called with preemption disabled.
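
As an illustrative sketch, kernel code on x86 that uses the FPU would wrap the
region like this (the SIMD work itself is a placeholder, and the header path
is the current x86 location, which may differ by kernel version)::

	#include <asm/fpu/api.h>

	kernel_fpu_begin();	/* saves FPU state as needed, disables preemption */
	/* ... kernel-mode SSE/FPU instructions here ... */
	kernel_fpu_end();	/* re-enables preemption */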
					
						
RULE #3: Lock acquire and release must be performed by same task
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A lock acquired in one task must be released by the same task.  This
means you can't do oddball things like acquire a lock and go off to
play while another task releases it.  If you want to do something
like this, acquire and release the lock in the same code path and
have the caller wait on an event signaled by the other task.
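
For example, rather than having one task unlock what another task locked, the
second task can take and drop the lock itself and then signal an event the
caller waits on.  A minimal sketch using a completion (my_lock, my_done and
do_work() are illustrative names)::

	static DEFINE_MUTEX(my_lock);
	static DECLARE_COMPLETION(my_done);

	/* second task: acquire and release in the same code path */
	mutex_lock(&my_lock);
	do_work();
	mutex_unlock(&my_lock);
	complete(&my_done);

	/* caller: wait on the event instead of releasing the lock itself */
	wait_for_completion(&my_done);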

Solution
========

Data protection under preemption is achieved by disabling preemption for the
duration of the critical region.

::

  preempt_disable()		increment the preempt counter
  preempt_enable()		decrement the preempt counter
  preempt_enable_no_resched()	decrement, but do not immediately preempt
  preempt_check_resched()	if needed, reschedule
  preempt_count()		return the preempt counter

The functions are nestable.  In other words, you can call preempt_disable
n times in a code path, and preemption will not be reenabled until the n-th
call to preempt_enable.  The preempt statements compile to nothing if
preemption is not configured.
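
For instance, a sketch of the nesting behaviour, assuming an initial preempt
count of zero::

	preempt_disable();	/* counter: 0 -> 1, preemption now disabled */
	preempt_disable();	/* counter: 1 -> 2, still disabled */
	/* ... critical section ... */
	preempt_enable();	/* counter: 2 -> 1, still disabled */
	preempt_enable();	/* counter: 1 -> 0, preemption check may run here */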
					
						
Note that you do not need to explicitly prevent preemption if you are holding
any locks or interrupts are disabled, since preemption is implicitly disabled
in those cases.

But keep in mind that 'irqs disabled' is a fundamentally unsafe way of
disabling preemption - any cond_resched() or cond_resched_lock() might trigger
a reschedule if the preempt count is 0.  A simple printk() might trigger a
reschedule.  So use this implicit preemption-disabling property only if you
know that the affected codepath does not do any of this.  The best policy is
to use it only for small, atomic code that you wrote yourself and which calls
no complex functions.

Example::

	cpucache_t *cc; /* this is per-CPU */
	preempt_disable();
	cc = cc_data(searchp);
	if (cc && cc->avail) {
		__free_block(searchp, cc_entry(cc), cc->avail);
		cc->avail = 0;
	}
	preempt_enable();
	return 0;

Notice how the preemption statements must encompass every reference of the
critical variables.  Another example::

	int buf[NR_CPUS];
	set_cpu_val(buf);
	if (buf[smp_processor_id()] == -1) printk(KERN_INFO "wee!\n");
	spin_lock(&buf_lock);
	/* ... */

This code is not preempt-safe, but see how easily we can fix it by simply
moving the spin_lock up two lines.
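
That is, with the lock taken before the per-CPU accesses, the fixed version
reads::

	int buf[NR_CPUS];
	spin_lock(&buf_lock);	/* holding the lock implicitly disables preemption */
	set_cpu_val(buf);
	if (buf[smp_processor_id()] == -1) printk(KERN_INFO "wee!\n");
	/* ... */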
					
						
Preventing preemption using interrupt disabling
===============================================

It is possible to prevent a preemption event using local_irq_disable and
local_irq_save.  Note, when doing so, you must be very careful not to cause
an event that would set need_resched and result in a preemption check.  When
in doubt, rely on locking or explicit preemption disabling.

Note that as of 2.5, interrupt disabling is only per-CPU (i.e. local).
					
						
An additional concern is proper usage of local_irq_disable and local_irq_save.
These may be used to protect from preemption; however, on exit, if preemption
may be enabled, a test to see if preemption is required should be done.  If
these are called from the spin_lock and read/write lock macros, the right
thing is done.  They may also be called within a spin-lock protected region;
however, if they are ever called outside of this context, a test for
preemption should be made.  Do note that calls from interrupt context or
bottom half/tasklet context are also protected by preemption locks, and so
may use the versions which do not check preemption.
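
As an illustrative sketch of that last point, a small region protected with
local_irq_save, followed by an explicit preemption check on exit since it sits
outside the lock macros (the region body is a placeholder)::

	unsigned long flags;

	local_irq_save(flags);		/* interrupts off: no preemption events */
	/* ... small, atomic region that must not set need_resched ... */
	local_irq_restore(flags);
	preempt_check_resched();	/* outside the lock macros, test for a
					 * reschedule missed while irqs were off */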