===========================================================================
Proper Locking Under a Preemptible Kernel: Keeping Kernel Code Preempt-Safe
===========================================================================

:Author: Robert Love <rml@tech9.net>


Introduction
============

A preemptible kernel creates new locking issues.  The issues are the same as
those under SMP: concurrency and reentrancy.  Thankfully, the Linux preemptible
kernel model leverages existing SMP locking mechanisms.  Thus, the kernel
requires explicit additional locking in very few situations.

This document is for all kernel hackers.  Developing code in the kernel
requires protecting these situations.

RULE #1: Per-CPU data structures need explicit protection
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Two similar problems arise.  An example code snippet::

	struct this_needs_locking tux[NR_CPUS];
	tux[smp_processor_id()] = some_value;
	/* task is preempted here... */
	something = tux[smp_processor_id()];

First, since the data is per-CPU, it may not have explicit SMP locking, but
it nonetheless needs protection once preemption is possible.  Second, when a
preempted task is finally rescheduled, the previous value of smp_processor_id
may not equal the current one.  You must protect these situations by disabling
preemption around them.

You can also use get_cpu() and put_cpu(), which disable and re-enable
preemption, respectively.
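
For instance, a minimal sketch of the snippet above rewritten with
get_cpu()/put_cpu() (this_needs_locking, some_value and something are the
placeholder names from the snippet)::

	struct this_needs_locking tux[NR_CPUS];
	int cpu;

	cpu = get_cpu();	/* disables preemption, returns this CPU's id */
	tux[cpu] = some_value;
	something = tux[cpu];	/* still on the same CPU: no preemption window */
	put_cpu();		/* re-enables preemption */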
					
						
RULE #2: CPU state must be protected.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Under preemption, the state of the CPU must be protected.  This is
arch-dependent, but includes CPU structures and state not preserved over a
context switch.  For example, on x86, entering and exiting FPU mode is now a
critical section that must occur while preemption is disabled.  Think what
would happen if the kernel is executing a floating-point instruction and is
then preempted.  Remember, the kernel does not save FPU state except for user
tasks.  Therefore, upon preemption, the FPU registers will be sold to the
lowest bidder.  Thus, preemption must be disabled around such regions.

Note, some FPU functions are already explicitly preempt-safe.  For example,
kernel_fpu_begin and kernel_fpu_end will disable and enable preemption.
However, fpu__restore() must be called with preemption disabled.
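
As an illustrative sketch, kernel code on x86 that uses the FPU would wrap the
region like this (the SIMD work itself is a placeholder, and the header path
is the current x86 location, which may differ by kernel version)::

	#include <asm/fpu/api.h>

	kernel_fpu_begin();	/* saves FPU state as needed, disables preemption */
	/* ... kernel-mode SSE/FPU instructions here ... */
	kernel_fpu_end();	/* re-enables preemption */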
					
						
RULE #3: Lock acquire and release must be performed by same task
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A lock acquired in one task must be released by the same task.  This
means you can't do oddball things like acquire a lock and go off to
play while another task releases it.  If you want to do something
like this, acquire and release the lock in the same code path and
have the caller wait on an event signaled by the other task.
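
For example, rather than having one task unlock what another task locked, the
second task can take and drop the lock itself and then signal an event the
caller waits on.  A minimal sketch using a completion (my_lock, my_done and
do_work() are illustrative names)::

	static DEFINE_MUTEX(my_lock);
	static DECLARE_COMPLETION(my_done);

	/* second task: acquire and release in the same code path */
	mutex_lock(&my_lock);
	do_work();
	mutex_unlock(&my_lock);
	complete(&my_done);

	/* caller: wait on the event instead of releasing the lock itself */
	wait_for_completion(&my_done);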

Solution
========

Data protection under preemption is achieved by disabling preemption for the
duration of the critical region.

::

  preempt_disable()		increment the preempt counter
  preempt_enable()		decrement the preempt counter
  preempt_enable_no_resched()	decrement, but do not immediately preempt
  preempt_check_resched()	if needed, reschedule
  preempt_count()		return the preempt counter

The functions are nestable.  In other words, you can call preempt_disable
n times in a code path, and preemption will not be reenabled until the n-th
call to preempt_enable.  The preempt statements compile to nothing if
preemption is not configured.
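
For instance, a sketch of the nesting behaviour, assuming an initial preempt
count of zero::

	preempt_disable();	/* counter: 0 -> 1, preemption now disabled */
	preempt_disable();	/* counter: 1 -> 2, still disabled */
	/* ... critical section ... */
	preempt_enable();	/* counter: 2 -> 1, still disabled */
	preempt_enable();	/* counter: 1 -> 0, preemption check may run here */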
					
						
Note that you do not need to explicitly prevent preemption if you are holding
any locks or interrupts are disabled, since preemption is implicitly disabled
in those cases.

But keep in mind that 'irqs disabled' is a fundamentally unsafe way of
disabling preemption - any cond_resched() or cond_resched_lock() might trigger
a reschedule if the preempt count is 0.  A simple printk() might trigger a
reschedule.  So use this implicit preemption-disabling property only if you
know that the affected codepath does not do any of this.  The best policy is
to use it only for small, atomic code that you wrote yourself and which calls
no complex functions.

Example::

	cpucache_t *cc; /* this is per-CPU */
	preempt_disable();
	cc = cc_data(searchp);
	if (cc && cc->avail) {
		__free_block(searchp, cc_entry(cc), cc->avail);
		cc->avail = 0;
	}
	preempt_enable();
	return 0;

Notice how the preemption statements must encompass every reference of the
critical variables.  Another example::

	int buf[NR_CPUS];
	set_cpu_val(buf);
	if (buf[smp_processor_id()] == -1) printk(KERN_INFO "wee!\n");
	spin_lock(&buf_lock);
	/* ... */

This code is not preempt-safe, but see how easily we can fix it by simply
moving the spin_lock up two lines.
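
That is, with the lock taken before the per-CPU accesses, the fixed version
reads::

	int buf[NR_CPUS];
	spin_lock(&buf_lock);	/* holding the lock implicitly disables preemption */
	set_cpu_val(buf);
	if (buf[smp_processor_id()] == -1) printk(KERN_INFO "wee!\n");
	/* ... */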
					
						
Preventing preemption using interrupt disabling
===============================================

It is possible to prevent a preemption event using local_irq_disable and
local_irq_save.  Note, when doing so, you must be very careful not to cause
an event that would set need_resched and result in a preemption check.  When
in doubt, rely on locking or explicit preemption disabling.

Note that as of 2.5, interrupt disabling is only per-CPU (i.e. local).
					
						
An additional concern is proper usage of local_irq_disable and local_irq_save.
These may be used to protect from preemption; however, on exit, if preemption
may be enabled, a test to see if preemption is required should be done.  If
these are called from the spin_lock and read/write lock macros, the right
thing is done.  They may also be called within a spin-lock protected region;
however, if they are ever called outside of this context, a test for
preemption should be made.  Do note that calls from interrupt context or
bottom half/tasklet context are also protected by preemption locks, and so
may use the versions which do not check preemption.
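
As an illustrative sketch of that last point, a small region protected with
local_irq_save, followed by an explicit preemption check on exit since it sits
outside the lock macros (the region body is a placeholder)::

	unsigned long flags;

	local_irq_save(flags);		/* interrupts off: no preemption events */
	/* ... small, atomic region that must not set need_resched ... */
	local_irq_restore(flags);
	preempt_check_resched();	/* outside the lock macros, test for a
					 * reschedule missed while irqs were off */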