linux-concurrency

Translation

My English is limited; I translated this in my spare time at work, and picked up some concurrency-control knowledge along the way.

Concurrency in the Kernel

Multiple threads of execution need to be synchronized to avoid data corruption and even system freezes.

As the Linux kernel has grown in complexity to support Symmetric Multi-Processing (SMP) and kernel preemption, more and more scenarios generate multiple threads of execution. Because threads can simultaneously operate on shared kernel data structures, access to such data structures has to be serialized. In this column, let’s learn the basics of protecting shared kernel resources from concurrent access, starting with a simple example and slowly introducing complexities like interrupts, kernel preemption, and SMP.

Spinlocks and Semaphores

A code area that accesses shared resources is called a critical section. Spinlocks and semaphores are the two mechanisms used to protect critical sections in the kernel.

A spinlock ensures that only a single thread enters a critical section at any time. Any other thread that wants to enter the critical section must wait “spinning its wheels” until the first thread exits. Listing One shows a basic use of a spinlock.

Listing One: Basic spinlock usage

#include <linux/spinlock.h>
/* Initialize */
spinlock_t mylock = SPIN_LOCK_UNLOCKED;
/* Try to acquire the spinlock. This is inexpensive if there
 * is no one inside the critical section. In the face of contention,
 * spin_lock() busy waits.
 */
spin_lock (&mylock);
/* … Critical Section … */
/* Release the lock */
spin_unlock (&mylock);

In contrast to spinlocks, which put threads into a spin if they attempt to enter a busy critical section, a semaphore puts each contending thread to sleep until its turn arrives to occupy the critical section.

Because it’s bad to consume processor cycles to spin, semaphores are more suitable than spinlocks to protect critical sections when the estimated wait time is long. In semaphore terms, anything more than two context switches can be considered long, since semaphores have to switch out the contending thread to sleep, and switch it back in when it’s time to wake it up.

Thus, in many cases, it’s easy to make a decision on whether to use a spinlock or a semaphore:

  • If your critical section needs to sleep, you have no choice but to use a semaphore. It’s a deadly sin to schedule, preempt, or sleep on a wait queue after acquiring a spinlock.

  • Since semaphores put the calling thread to sleep in the face of contention, you have no choice but to use spinlocks inside interrupt handlers.

Unlike spinlocks, which allow only a single thread inside a critical section at a time, semaphores can be configured to allow a predetermined number of threads into the critical section simultaneously. (However, semaphores that permit only a single thread at a time are more common in the kernel.) A semaphore that permits only a single thread at a time is called a mutex. Listing Two shows basic mutex usage.

Listing Two: Basic mutex usage

/* Architecture dependent */
#include <asm/semaphore.h>
/* Statically declare a mutex. To dynamically create
 * a mutex, use init_MUTEX().
 */
static DECLARE_MUTEX (mysem);
/* Try to acquire the semaphore. This is inexpensive if there
 * is no one inside the critical section. In the face of contention,
 * down() puts the calling thread to sleep.
 */
down (&mysem);
/* … Critical Section … */
/* Release the semaphore */
up (&mysem);
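
As noted above, semaphores can also be configured to admit more than one thread. A minimal sketch of the counting-semaphore case, assuming the 2.6-era API (the count of three is an arbitrary example):

#include <asm/semaphore.h>

static struct semaphore mysem;

/* Initialize the semaphore to allow up to three holders */
sema_init (&mysem, 3);

down (&mysem); /* Claim one of the three slots */
/* ... Critical Section ... */
up (&mysem); /* Release the slot */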

To illustrate the use of locks, let’s start with a critical section that is present only in process context and gradually introduce complexities in the following order:

  • Critical section present only in process context on a uniprocessor box running a non-preemptible kernel (P-UP-N). This is the simplest case and needs no locking.

  • Critical section present in process and interrupt contexts on a uniprocessor machine running a non-preemptible kernel (PI-UP-N).

  • Critical section present in process and interrupt contexts on a uniprocessor machine running a preemptible kernel (PI-UP-P).

  • Critical section present in process and interrupt contexts on an SMP machine running a preemptible kernel (PI-SMP-P).

Except for the first case, the other three require the critical section to be protected.

Case Two: PI-UP-N

In the case of a critical section present in process and interrupt contexts on a uniprocessor box running a non-preemptible kernel (PI-UP-N), you only need to disable interrupts to protect the critical region:

cli (); /* Disable Interrupts */
/* ... Critical Region ... */
sti (); /* Enable Interrupts */

However, if interrupts are already disabled when execution reaches cli(), sti() has the unpleasant side effect of re-enabling interrupts, rather than restoring interrupt state. This can be fixed with:

unsigned long flags;
/* Point A: Disable Interrupts */
save_flags (flags);
cli ();
/* ... Critical Region ... */
/* Restore state to what it was at Point A above */
restore_flags (flags);

This latter code works correctly, regardless of the interrupt state when execution reaches cli().

Case Three: PI-UP-P

If preemption is enabled, mere disabling of interrupts won’t protect your critical region from being trampled. There is the possibility of multiple threads simultaneously using the critical section in process context. The solution, apparently, is to disable kernel preemption before the start of the critical section and re-enable it at the end. Spinlocks do that internally if CONFIG_PREEMPT is enabled:

/* Marker to turn OFF preemption */
spin_lock (&mylock);
/* ... Critical Region ... */
/* Marker to turn ON preemption */
spin_unlock (&mylock);

However, this still doesn’t prevent interrupt handlers from stomping through your critical section. Instead, use the IRQ variant of spinlocks:

unsigned long flags;
/*
* Point A:
* Save interrupt state.
* Disable interrupts and preemption.
*/
spin_lock_irqsave (&mylock, flags);
/* ... Critical Region ... */
/*
* Enable preemption. Restore interrupt state to that
* at Point A.
*/
spin_unlock_irqrestore (&mylock, flags);

Preemption state need not be restored to what it was at Point A, since the kernel does that for you via a variable called the preemption counter. The counter gets incremented whenever preemption is disabled (using the preempt_disable() function), and gets decremented whenever preemption is enabled (using the preempt_enable() function). Preemption kicks in only if the counter value is zero.
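
Because of the counter, disable/enable pairs can safely be nested. A minimal sketch (the counter values in the comments are illustrative):

#include <linux/preempt.h>

preempt_disable (); /* counter: 0 -> 1, preemption off */
preempt_disable (); /* counter: 1 -> 2, still off */
/* ... */
preempt_enable (); /* counter: 2 -> 1, still off */
preempt_enable (); /* counter: 1 -> 0, preemption possible again */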

Case Four: PI-SMP-P

Now assume that the critical section executes on an SMP machine and that your kernel has been configured with CONFIG_SMP and CONFIG_PREEMPT turned on.

In the scenarios discussed so far, the spinlock primitives have done little other than enable and disable preemption and interrupts. The actual lock features have been compiled away.

In the presence of SMP, the actual locking logic gets compiled in, and the spinlock primitives are rendered SMP-safe. The SMP-enabled semantics is:

unsigned long flags;
/*
* Point A:
* Save interrupt state on the local CPU
* Disable interrupts on the local CPU
* Disable preemption
* Lock the section to regulate access by other CPUs
*/
spin_lock_irqsave (&mylock, flags);
/* ... Critical Region ... */
/*
* Enable preemption. Restore interrupt state to what
* it was at Point A for the local CPU.
* Release the lock.
*/
spin_unlock_irqrestore (&mylock, flags);

In the SMP case, only interrupts in the local CPU are disabled when the lock is acquired. If interrupts are not disabled in the local CPU, a deadlock might arise if an interrupt is generated while execution is in the middle of a critical section. (The interrupt handler sits in a tight spin, waiting for the lock to become available, but the critical section cannot complete until the interrupt handler itself finishes execution.) However, this danger of deadlock does not arise for interrupts generated on other CPUs, since an interrupt handler executing on processor A doesn’t prevent process context code on processor B from executing. The interrupt handler on processor A spins, waiting until processor B exits the critical section.

The kernel has specialized locking primitives in its repertoire that help improve performance under specific conditions. Using a mutual exclusion scheme tailored to your needs will make your code more powerful. Let’s take a look at some of the specialized exclusion mechanisms.

Atomic Operators

Atomic operators are used to perform lightweight operations like bumping counters, conditional increments, and setting bit positions, in one shot. Atomic operations are guaranteed to be serialized and do not need locks for protection against concurrent access. The implementation of atomic operators is architecture-dependent.
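
For instance, a reference counter can be maintained with an atomic_t. A minimal sketch (refcnt is a hypothetical counter):

#include <asm/atomic.h>

static atomic_t refcnt = ATOMIC_INIT (1);

atomic_inc (&refcnt); /* Atomically bump the count */
/* Atomically decrement; returns true if the count reached zero */
if (atomic_dec_and_test (&refcnt)) {
    /* Last reference is gone */
}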

For example, to check whether there are any remaining data references before freeing a kernel network buffer (called an skbuff), the skb_release_data() routine defined in net/core/skbuff.c does the following:

if (!skb->cloned ||
    /* Atomically decrement and check the reference value */
    atomic_dec_and_test(&(skb_shinfo(skb)->dataref))) {
        /* ... */
        kfree(skb->head);
}

While skb_release_data() is thus executing, another thread using the skb_clone() function (defined in the same file) might be simultaneously incrementing the data reference counter:

/* .. */
/* Atomically bump the reference count */
atomic_inc(&(skb_shinfo(skb)->dataref));
/* .. */

The use of atomic operators protects the data reference counter from being trampled by these two threads. It also eliminates the hassle of using locks to protect a single integer variable from concurrent access.
The kernel also supports operators like set_bit(), clear_bit(), and test_and_set_bit(), to atomically manipulate a single bit.
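
A minimal sketch of these bit operators (myflags is a hypothetical flag word, and the bit positions are arbitrary):

#include <asm/bitops.h>

static unsigned long myflags;

set_bit (0, &myflags); /* Atomically set bit 0 */
clear_bit (0, &myflags); /* Atomically clear bit 0 */
/* Atomically set bit 1 and return its previous value */
if (test_and_set_bit (1, &myflags)) {
    /* Bit 1 was already set */
}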

Reader-Writer Locks

The reader-writer variant of a spinlock is another specialized concurrency regulation mechanism. If the usage of a critical section is such that separate threads either read or write, but don’t do both, a reader-writer lock is a natural fit.
Multiple reader threads are allowed inside a critical region simultaneously. Reader spinlocks are defined as:

rwlock_t myrwlock = RW_LOCK_UNLOCKED;
/* Acquire reader lock */
read_lock (&myrwlock);
/* ... Critical Region ... */
/* Release lock */
read_unlock (&myrwlock);

However, if a writer thread enters a critical section, other reader or writer threads are not allowed inside. To use writer spinlocks, you’d write:

rwlock_t myrwlock = RW_LOCK_UNLOCKED;
/* Acquire writer lock */
write_lock (&myrwlock);
/* ... Critical Region ... */
/* Release lock */
write_unlock (&myrwlock);

Look at the Internetwork Packet Exchange (IPX) routing code in net/ipx/ipx_route.c for a real life example of reader-writer spinlock usage. A reader-writer lock called ipx_routes_lock protects the IPX routing table from simultaneous access. Threads that need to look-up the routing table to forward packets don’t have to write to the table and can request reader locks. However, threads that need to add or delete entries from the routing table have to acquire writer locks. This improves performance since there will usually be far more instances of routing table look-ups than routing table updates.
Like regular spinlocks, reader-writer locks also have corresponding irq versions, namely, read_lock_irqsave(), read_unlock_irqrestore(), write_lock_irqsave(), and write_unlock_irqrestore(). The semantics of these functions are similar to those discussed for regular spinlocks; a sketch follows.
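
A minimal sketch of the irq variants, reusing myrwlock from the listings above:

unsigned long flags;

/* Reader path that can race with interrupt context */
read_lock_irqsave (&myrwlock, flags);
/* ... look-up only ... */
read_unlock_irqrestore (&myrwlock, flags);

/* Writer path that can race with interrupt context */
write_lock_irqsave (&myrwlock, flags);
/* ... update ... */
write_unlock_irqrestore (&myrwlock, flags);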

Corresponding reader-writer flavors of semaphores are down_read(), down_write(), up_read(), and up_write(), sketched below.
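
A minimal sketch, assuming the 2.6 reader-writer semaphore API from <linux/rwsem.h>:

#include <linux/rwsem.h>

static DECLARE_RWSEM (myrwsem);

down_read (&myrwsem); /* Several readers may hold this together */
/* ... read-only critical section, allowed to sleep ... */
up_read (&myrwsem);

down_write (&myrwsem); /* Exclusive access */
/* ... read-write critical section, allowed to sleep ... */
up_write (&myrwsem);
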
Sequence locks or seqlocks, introduced in the 2.6 kernel, are reader-writer locks where writers are favored over readers. Writer threads do not wait for readers who may be inside a critical section. Because of this, reader threads may discover that their critical section operation has failed, and may need to retry:

seqlock_t mylock = SEQLOCK_UNLOCKED;
unsigned int seq;
do {
    seq = read_seqbegin (&mylock);
    /* ... Critical Section ... */
} while (read_seqretry (&mylock, seq));

Writers protect critical regions using write_seqlock() and write_sequnlock().
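
A minimal sketch of the writer side, reusing the seqlock_t mylock declared above:

write_seqlock (&mylock); /* Take the writer lock and bump the sequence */
/* ... Critical Section ... */
write_sequnlock (&mylock); /* Bump the sequence again and release */
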
The 2.6 kernel introduced another mechanism called Read-Copy Update (RCU) that can yield improved performance in cases where readers far outnumber writers. The basic idea is that reader threads can execute without locking. Writer threads are more complex: each performs update operations on a copy of the data structure and replaces the pointer that readers will see. The original copy is maintained until the next context switch to ensure completion of all ongoing read operations. (Be aware that using RCU is more involved than using the primitives discussed thus far, and should be used only if you are sure that it’s the right tool for the job. There’s ample documentation in the kernel tree in Documentation/RCU/*.)
For an example on using RCU, look at fs/dcache.c. In Linux, each file is associated with directory entry information (stored in a structure called dentry), meta data information (stored in an inode), and actual data (stored in data blocks). Each time you operate on a file, the components in the file path are traversed and corresponding dentries are created. The dentries are kept cached in a data structure called the dcache to speed up future operations. At any time, the number of dcache lookups will be much more than dcache updates, so references to the dcache are protected using RCU primitives.
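
The following is a deliberately simplified sketch of the RCU pattern, not the actual dcache code; my_data, gptr, field, and gptr_lock are hypothetical, and it assumes a later 2.6 kernel that provides rcu_dereference(), rcu_assign_pointer(), and synchronize_rcu():

#include <linux/rcupdate.h>
#include <linux/slab.h>

struct my_data { int field; };
static struct my_data *gptr; /* Shared, RCU-protected pointer */
static spinlock_t gptr_lock = SPIN_LOCK_UNLOCKED; /* Serializes writers */
struct my_data *p, *old, *new;

/* Reader: no locks taken */
rcu_read_lock ();
p = rcu_dereference (gptr); /* Fetch the current version */
if (p) {
    /* ... read p->field without locking ... */
}
rcu_read_unlock ();

/* Writer: update a copy, then swing the pointer */
new = kmalloc (sizeof (*new), GFP_KERNEL); /* Allocate before locking */
spin_lock (&gptr_lock);
old = gptr;
*new = *old; /* Update a copy ... */
new->field = 42; /* ... with the new value */
rcu_assign_pointer (gptr, new); /* Publish; readers now see the copy */
spin_unlock (&gptr_lock);
synchronize_rcu (); /* Wait out all pre-existing readers */
kfree (old);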

Debugging

Concurrency-related problems are generally hard to debug since they are usually difficult to reproduce. It's a good idea to enable SMP (CONFIG_SMP) and preemption (CONFIG_PREEMPT) while compiling and testing your code, even if your production kernel is going to run on a UP machine with preemption disabled. There is a configuration option under Kernel Hacking called Spinlock Debugging (CONFIG_DEBUG_SPINLOCK) that can help you catch some common spinlock errors. Also, tools like lockmeter (http://oss.sgi.com/projects/lockmeter/) can be used to collect lock-related statistics.

The most common concurrency problem occurs when you forget to lock an access to a shared resource. This results in different threads "racing" through that access in an unregulated manner. The problem (called a race condition) may appear in the form of occasional strange code behavior.

Another potential problem arises when you fail to release held locks on certain code paths, resulting in deadlocks. To see how, consider the following example:

/* Acquire lock */
spin_lock (&mylock);
/* ... Critical Section ... */
/* Assume that this error is very rare.
 *
 * Oops: I forgot to release the lock!
 */
if (error) {
    return -EIO;
}
/* Release lock */
spin_unlock (&mylock);

After the occurrence of the error condition, any thread trying to acquire mylock gets deadlocked, and the kernel may freeze.
If the problem first manifests months or years after you write the code, it’s that much harder to go back and debug it. To avoid such unpleasant encounters, concurrency logic should be designed systematically at the beginning, when you architect your software.
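
For comparison, a corrected sketch of the same path releases the lock on every exit:

spin_lock (&mylock);
/* ... Critical Section ... */
if (error) {
    spin_unlock (&mylock); /* Release before bailing out */
    return -EIO;
}
spin_unlock (&mylock);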
