1. Field
The present invention relates generally to multithreaded programming and, more specifically, to mutual exclusion of readers and writers in a multithreaded programming environment.
2. Description
Mutual exclusion is a programming technique that ensures that only one program or routine at a time can access some resource, such as a memory location, an input/output (I/O) port, or a file, often through the use of semaphores, which are flags used in programs to coordinate the activities of more than one program or routine. An object for implementing mutual exclusion (or mutex) may be called a lock.
A reader-writer (RW) lock allows either multiple readers to inspect shared data or a single writer exclusive access for modifying that data. On shared memory multiprocessors, the cost of acquiring and releasing these locks can have a large impact on the performance of parallel applications. A major problem with naïve implementations of these locks, where processors spin on a global lock variable waiting for the lock to become available, is that the memory containing the lock and the interconnection network to that memory will become contended when the lock is contended.
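By way of illustration only, the naïve approach described above may be sketched in C++ as follows. The class name NaiveRWSpinLock, its member names, and the helper run_naive_demo are hypothetical and do not appear in the prior art discussed herein. The sketch assumes a single shared word that encodes the lock state, and every waiting thread polls that one word, which is the source of the memory and interconnect contention noted above.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical naive reader-writer spin lock (illustrative assumption only).
// state == -1 : one writer holds the lock
// state >=  0 : number of readers currently holding the lock
class NaiveRWSpinLock {
    std::atomic<int> state{0};  // the single global word every waiter spins on

public:
    void lock_shared() {
        for (;;) {
            int s = state.load();
            // Join the readers only if no writer holds the lock.
            if (s >= 0 && state.compare_exchange_weak(s, s + 1)) return;
            std::this_thread::yield();
        }
    }

    void unlock_shared() { state.fetch_sub(1); }

    void lock() {
        for (;;) {
            int expected = 0;  // free: no readers and no writer
            if (state.compare_exchange_weak(expected, -1)) return;
            std::this_thread::yield();
        }
    }

    void unlock() { state.store(0); }
};

// Four writer threads each increment a shared counter 1000 times.
int run_naive_demo() {
    NaiveRWSpinLock lock;
    int shared = 0;
    std::vector<std::thread> threads;
    for (int t = 0; t < 4; ++t) {
        threads.emplace_back([&] {
            for (int i = 0; i < 1000; ++i) {
                lock.lock();
                ++shared;
                lock.unlock();
            }
        });
    }
    for (auto& th : threads) th.join();

    lock.lock_shared();  // a reader inspects the final value
    int result = shared;
    lock.unlock_shared();
    return result;
}
```

The sketch makes no attempt at fairness, so a steady stream of readers could starve a writer. It is intended only to show that all waiters contend on the same memory location.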
Various approaches in the prior art implement scalable exclusive locks, that is, exclusive locks that can become contended without resulting in memory or interconnection contention. These approaches depend either on cache hardware support or on the existence of local memory, where accesses to local memory involve lower latency than accesses to remote memory.
In “Scalable Reader-Writer Synchronization for Shared-Memory Multiprocessors”, by John M. Mellor-Crummey and Michael L. Scott, Proceedings of the 3rd ACM Symposium on Principles and Practice of Parallel Programming, pp. 106-113, Williamsburg, Va., April 1991, the authors describe an exclusive lock which uses atomic operations to build a singly linked list of waiting processors. The processor at the head of the list has the lock and new processors add themselves to the list tail. Rather than spinning on a global lock variable, each processor spins on a variable in its local memory. A processor releases the lock by zeroing the variable on which the next processor in the queue is spinning.
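For purposes of illustration, a queue-based exclusive lock of the general kind described above may be sketched in C++ as follows. This is a minimal sketch under stated assumptions, not a reproduction of the authors' published code: the names QNode, QueueLock, and run_queue_lock_demo are hypothetical, default sequentially consistent atomics are assumed for simplicity, and only the exclusive lock is covered, not the RW variant.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical queue node: one per waiting thread.
struct QNode {
    std::atomic<QNode*> next{nullptr};  // successor in the waiting list
    std::atomic<bool> blocked{false};   // the variable this thread spins on
};

// Hypothetical queue-based exclusive lock (illustrative assumption only).
class QueueLock {
    std::atomic<QNode*> tail{nullptr};  // last node in the list, or nullptr if free

public:
    void lock(QNode& me) {
        me.next.store(nullptr);
        me.blocked.store(true);
        // Atomically append this node to the tail of the list.
        QNode* pred = tail.exchange(&me);
        if (pred != nullptr) {
            pred->next.store(&me);  // link behind the predecessor
            // Spin on this thread's own node rather than on a global variable.
            while (me.blocked.load()) std::this_thread::yield();
        }
    }

    void unlock(QNode& me) {
        QNode* succ = me.next.load();
        if (succ == nullptr) {
            // No visible successor: try to mark the lock free.
            QNode* expected = &me;
            if (tail.compare_exchange_strong(expected, nullptr)) return;
            // A successor swapped the tail but has not linked itself yet.
            while ((succ = me.next.load()) == nullptr) std::this_thread::yield();
        }
        succ->blocked.store(false);  // zero the successor's spin variable
    }
};

// Four threads each increment a shared counter 1000 times.
int run_queue_lock_demo() {
    QueueLock lock;
    int shared = 0;
    std::vector<std::thread> threads;
    for (int t = 0; t < 4; ++t) {
        threads.emplace_back([&] {
            QNode node;  // per-thread queue node
            for (int i = 0; i < 1000; ++i) {
                lock.lock(node);
                ++shared;
                lock.unlock(node);
            }
        });
    }
    for (auto& th : threads) th.join();
    return shared;
}
```

In this sketch each waiter polls only the blocked flag in its own node, so a release touches a single successor's memory rather than a location shared by every waiter.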
For the RW variant of this exclusive lock, each queue element contains an additional variable to maintain the state of the request. When a new reader request arrives, the state of the previous element in the queue is examined to determine if the new request must block. With a RW lock, readers must be able to release the lock in any order. Hence, the singly linked list of Mellor-Crummey and Scott becomes discontinuous as readers dequeue. To allow for this, two global variables were added to their exclusive lock, a count of the number of active readers and a pointer to the first writer in the queue. As readers acquire and release the lock, they keep the global count of active readers up to date. When releasing the lock, if a reader discovers that the reader count is zero, it unblocks the writer pointed to by the global variable.
In “A Fair Fast Scalable Reader-Writer Lock” by Orran Krieger, Michael Stumm, Ron Unrau, and Jonathan Hanna, Proceedings of the 1993 International Conference on Parallel Processing, the authors describe a fair scalable RW locking algorithm derived from Mellor-Crummey and Scott's exclusive locking algorithm. In the Krieger et al. process, rather than adding more global state (that can become contended), the extra state needed for a RW lock is distributed across the list associated with the lock. In particular, readers are maintained in a doubly linked list. With a doubly linked list, instead of synchronizing on a global variable, a reader that is releasing the lock can synchronize with its nearest neighbors to remove itself from the queue. This allows readers to dequeue in any order without the list becoming discontinuous. Hence, it is not necessary to keep either a global pointer to the first writer or a global count of the number of active readers.
There are several disadvantages with the two prior art approaches discussed above. In each of the above approaches, queue nodes cannot be allocated on a stack, because sometimes a queue node supplied by a caller is read or written by other threads, even after the caller has released its lock on the mutex. These approaches require the queue nodes to be allocated on a heap, which is slower than stack allocation, and may require acquiring other locks on the heap itself. Further, these methods require that queue nodes never be freed for the lifetime of the mutex, or somehow be atomically reference-counted to determine when it is safe to free them (which is expensive in a multithreaded environment, compared to ordinary reads and writes). The approaches also require that a queue node live longer than the time between acquisition and release of the lock. Additionally, the Krieger et al. method sometimes allows readers to block other readers: a reader may expect to be unblocked by its predecessor after that predecessor has already observed that it has no successor to unblock, in which case the reader remains blocked until all previous readers release the mutex.
Thus, there is a need for further advances in multithreaded programming techniques to overcome these and other disadvantages.