In multi-threaded programs running on multiprocessors, different threads may attempt to access shared data structures concurrently. Such access is generally coordinated via some form of locking: every thread acquires a lock before accessing the data structure and holds the lock for the duration of the access. The simplest form of locking is mutual exclusion: only one thread at a time can hold the lock, so only a single thread accesses the data structure at a time. However, in many cases the operations performed on the data structure can be classified as reader or writer operations; reader operations only read the data structure, whereas writer operations may update it. Several reader operations may safely run concurrently, but a writer operation may safely run only when no other reader or writer operation is running. It is therefore useful to reflect this distinction in the locking primitives, and the concurrent programming literature has long had the concept of reader/writer locks. Such a lock can be acquired either in read (or shared) mode or in write (or exclusive) mode: several threads may hold the lock in read mode at once, but only one thread may hold it in write mode, and only while no thread holds it in read mode. Thus, a reader/writer lock can protect a shared data structure: once the operations on the data structure have been classified as reader or writer operations, each operation acquires the lock in the corresponding mode. For many shared data structures, reader operations are much more frequent than writer operations, so reader/writer locks allow significantly greater parallel scalability than mutual exclusion.
It is common for computers today to have many more processing cores than computers of just a few years ago. Where once computers with more than two to four cores were found only in database servers or supercomputers, even desktop computer systems can be ordered today with eight or more processor cores. The increased number of processors increases the sharing of resources such as memory and raises the cost of using such resources inefficiently, including cache misses caused by multiple processors modifying the same data.
Unfortunately, the most common implementations of reader/writer locks include a single variable that tracks the number of readers, and sometimes a separate variable that tracks writers. With many readers executing code on different processors, the reader/writer lock itself can quickly become a source of cache contention. For example, one common strategy in reader/writer lock implementations is to maintain a variable representing the number of threads that have acquired the lock in read mode, updating this count variable with atomic hardware instructions. This causes cache contention, because each processor that updates the count must acquire the cache line containing the count variable in exclusive mode, which invalidates any copies held by other processors. As the number of processors in a machine increases, contention on even one cache line can severely limit performance. The impact of contention is determined partly by the rate at which processors access the contended cache line. If N processors each execute a loop in which they acquire a read lock to execute a read operation, the rate at which the read lock is acquired, and thus the rate at which the shared cache line is accessed, depends on the duration of the operation executed within the lock: the shorter the operation, the greater the contention. Thus, whereas for exclusive locks software developers generally increase parallelism by doing less inside locks, for reader/writer locks developers often receive the paradoxical guidance to increase parallelism by doing more inside locks.
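To make the source of the contention concrete, the following is a minimal sketch of the kind of centralized design described above, in which a single atomic word tracks the readers. The class `CountingRWLock`, its encoding of write mode as -1, and the helper `count_with_writers` are assumptions made for this illustration; they are not taken from any particular implementation, and the sketch omits fairness, so writers may starve under a steady stream of readers.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical sketch of a centralized reader/writer spin lock.
//   state_ >= 0  : number of threads holding the lock in read mode
//   state_ == -1 : one thread holds the lock in write mode
class CountingRWLock {
public:
    void read_lock() {
        int seen = state_.load(std::memory_order_relaxed);
        for (;;) {
            if (seen < 0) {  // a writer holds the lock: wait and re-read
                std::this_thread::yield();
                seen = state_.load(std::memory_order_relaxed);
                continue;
            }
            // Atomic read-modify-write on the one shared word. Every reader
            // on every processor executes this against the same cache line.
            // On failure, `seen` is refreshed with the current value.
            if (state_.compare_exchange_weak(seen, seen + 1,
                                             std::memory_order_acquire,
                                             std::memory_order_relaxed)) {
                return;
            }
        }
    }

    void read_unlock() {
        // A second atomic update of the same word for each read operation.
        int prev = state_.fetch_sub(1, std::memory_order_release);
        assert(prev > 0);
        (void)prev;
    }

    void write_lock() {
        int expected = 0;  // succeeds only with no readers and no writer
        while (!state_.compare_exchange_weak(expected, -1,
                                             std::memory_order_acquire,
                                             std::memory_order_relaxed)) {
            expected = 0;
            std::this_thread::yield();
        }
    }

    void write_unlock() { state_.store(0, std::memory_order_release); }

    // Exposed only so the example can be inspected.
    int state() const { return state_.load(std::memory_order_relaxed); }

private:
    std::atomic<int> state_{0};
};

// Hypothetical helper: each thread increments a plain counter under the
// write lock; the total is exact only if write mode is truly exclusive.
long count_with_writers(int threads, int iterations) {
    CountingRWLock lock;
    long total = 0;  // not atomic; protected only by the lock
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&lock, &total, iterations] {
            for (int i = 0; i < iterations; ++i) {
                lock.write_lock();
                ++total;
                lock.write_unlock();
            }
        });
    }
    for (auto& th : pool) th.join();
    return total;
}
```

In this sketch every read operation performs two atomic updates of `state_`, one in `read_lock` and one in `read_unlock`, even though the readers do not conflict with one another logically. The shorter the work done between those two calls, the larger the share of each processor's time that goes to obtaining that one cache line in exclusive mode.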