This invention relates to the field of concurrently accessed data structures within computer systems. More particularly, a method and apparatus are provided for reducing contention for a reader-writer lock among concurrent readers.
Reader-writer locks are mechanisms frequently implemented in concurrent programming environments for synchronizing and arbitrating access to a shared resource, such as memory, data storage, an input or output device, and so on. The lock serves as a control point for access to the resource, and resource requesters (e.g., threads of execution, software processes) needing to access the resource must have control of the lock before they can proceed.
With a typical reader-writer lock, a single requester may write to the shared resource, or multiple requesters may simultaneously read from the resource. However, a typical reader-writer lock inherently includes at least one control structure for which multiple requesters must contend.
This point of contention may be a lockword that can be controlled by only one writer, or by one or more readers, at a time. The lockword is the gateway to the resource: only a requester that has control of (has “locked”) the lockword can access the resource. Because all requesters simultaneously and continually attempt to gain control of the lockword, many processing cycles may be lost to contention.
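The cost of contending for a single lockword can be illustrated with a small, deterministic model. The sketch below is not taken from any particular lock implementation. It assumes that readers update the lockword with a compare-and-swap (CAS) retry loop and that, in the worst case, every pending reader observes the same lockword value before any of them attempts the update, so only one reader succeeds per round. The function name `simulate_cas_rounds` and the interleaving are illustrative assumptions.

```python
def simulate_cas_rounds(num_readers):
    """Model readers racing to increment one lockword with CAS.

    Assumed worst-case interleaving: in each round every pending reader
    reads the same snapshot of the lockword, then all attempt a CAS.
    Only the first CAS can succeed; the rest must retry in the next round.
    Returns (final lockword value, number of failed CAS attempts).
    """
    lockword = 0          # modeled here as a count of active readers
    pending = num_readers
    failed = 0
    while pending > 0:
        observed = lockword              # snapshot seen by every pending reader
        for _ in range(pending):
            if lockword == observed:     # CAS succeeds: value is unchanged
                lockword = observed + 1
            else:                        # CAS fails: another reader got in first
                failed += 1
        pending -= 1                     # exactly one reader succeeded this round
    return lockword, failed


if __name__ == "__main__":
    for n in (1, 8, 64):
        print(n, simulate_cas_rounds(n))
```

Under this assumed interleaving the number of wasted attempts grows as n(n-1)/2 for n readers, even though no reader ever excludes another from the resource itself.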
To mitigate contention on the lockword, many reader-writer schemes implement a queue that sequences requesters in an orderly fashion. However, these schemes usually just shift the point of contention, even if the queue is configured to replace the lockword altogether.
When a queue is implemented to order requesters seeking access to a lockword, a mutex (mutual exclusion) lock is sometimes implemented with the queue to allow only one requester at a time to modify the queue—whether to add itself to the queue, rearrange the queue, etc.
Even if a mutex lock is not used to control access to the queue, and even if the queue completely replaces the lockword (in which case the resource requester at the head of the queue is granted access to the resource), contention will still be found. For example, if each requester must add itself to the tail of the queue (e.g., to implement a FIFO scheme), the tail pointer of the queue becomes a single point of contention for which all new requesters vie.
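The tail-pointer bottleneck can be sketched as follows. This is a minimal model, assuming a FIFO queue in which each new requester atomically swaps itself into a shared tail pointer and then links itself behind the previous tail. Python has no atomic swap primitive, so a `threading.Lock` stands in for the hardware instruction here; the class names `Node` and `TailQueue` and the counter `tail_updates` are hypothetical and exist only for illustration.

```python
import threading


class Node:
    def __init__(self, name):
        self.name = name
        self.next = None


class TailQueue:
    """FIFO queue in which every enqueue goes through one shared tail pointer."""

    def __init__(self):
        self.head = None
        self.tail = None
        self.tail_updates = 0            # how many times the tail was swapped
        self._atomic = threading.Lock()  # stand-in for an atomic swap instruction

    def _swap_tail(self, node):
        # Emulates an atomic "swap": install node as tail, return the old tail.
        with self._atomic:
            prev = self.tail
            self.tail = node
            self.tail_updates += 1
            return prev

    def enqueue(self, node):
        prev = self._swap_tail(node)     # every requester touches the same word
        if prev is None:
            self.head = node             # queue was empty
        else:
            prev.next = node             # link behind the previous tail


def demo_enqueue(num_requesters):
    queue = TailQueue()
    threads = [
        threading.Thread(target=queue.enqueue, args=(Node(i),))
        for i in range(num_requesters)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Walk the queue once all requesters have linked themselves in.
    names, node = [], queue.head
    while node is not None:
        names.append(node.name)
        node = node.next
    return names, queue.tail_updates


if __name__ == "__main__":
    order, updates = demo_enqueue(16)
    print("queue order:", order)
    print("tail updates:", updates)
```

No mutex protects the queue as a whole, yet every one of the requesters still has to modify the same tail pointer, which is the single point of contention described above.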
One type of reader-writer lock that comprises a queue allows multiple successive readers in the queue to enter their critical sections (i.e., access the resource) simultaneously. As described immediately above, the tail pointer of the queue still acts as a point of contention for all requesters. In addition, if a later reader in the sequence of readers finishes before its predecessor, it must behave appropriately, for example by determining whether to splice itself out of the queue, notify the preceding or succeeding node of its departure, etc.
In this particular scheme, each reader node implements its own mutex lock. This allows a following node to modify its predecessor, so as to splice the following node out of the queue, for example. However, depending on when the predecessor node finishes (e.g., before the following node can grab the predecessor's lock), there may be contention for the predecessor's lock. Thus, this scheme not only suffers from contention on a tail pointer for the queue, but also from possible contention for locks on individual queue entries.
In addition, queue-based locks that maintain the order of waiting requesters without a mutex lock are often difficult to extend when additional features or more sophisticated fairness guarantees are required. For example, the reader-writer lock implementation used in the Solaris kernel employs a complicated algorithm that attempts to group readers together while considering the priorities of waiting writers and the possibility of priority inversion.
The Solaris lock forgoes a distributed queue-based design in favor of a central one: a single lockword is used to ensure reader-writer exclusion during access to a target resource, and a mutex-protected queuing data structure orders threads when the lockword is contended. The lockword contains a count of active readers, and a new reader can acquire the Solaris lock by incrementing the active reader count, but only if the lock is not write-locked and does not have writers waiting to acquire it. Thus, the lockword is a source of contention even under read-only workloads.
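The reader fast path described above can be sketched as follows. This is a simplified model, not the Solaris implementation: the bit layout (`WRITE_LOCKED`, `WRITE_WANTED`, `READER_UNIT`) is an assumption chosen for illustration, and a `threading.Lock` emulates the atomic compare-and-swap that real lock code would issue directly. The mutex-protected queue used when the lockword is contended is not modeled.

```python
import threading

# Hypothetical lockword layout (illustrative, not the actual Solaris layout).
WRITE_LOCKED = 0x1   # a writer holds the lock
WRITE_WANTED = 0x2   # at least one writer is waiting
READER_UNIT = 0x4    # added once per active reader


class Lockword:
    def __init__(self):
        self.value = 0
        self._atomic = threading.Lock()  # stand-in for a hardware CAS

    def compare_and_swap(self, expected, new):
        with self._atomic:
            if self.value == expected:
                self.value = new
                return True
            return False


def try_read_lock(lw):
    """Increment the reader count unless a writer holds or wants the lock."""
    while True:
        old = lw.value
        if old & (WRITE_LOCKED | WRITE_WANTED):
            return False                 # caller would fall back to the queue
        if lw.compare_and_swap(old, old + READER_UNIT):
            return True                  # this reader is now counted
        # CAS failed because another requester changed the lockword; retry.


def read_unlock(lw):
    """Decrement the reader count, retrying if the lockword changed."""
    while True:
        old = lw.value
        if lw.compare_and_swap(old, old - READER_UNIT):
            return


def active_readers(lw):
    return lw.value // READER_UNIT


if __name__ == "__main__":
    lw = Lockword()
    print(try_read_lock(lw), try_read_lock(lw), active_readers(lw))
    lw.value |= WRITE_WANTED             # simulate a writer announcing itself
    print(try_read_lock(lw), active_readers(lw))
```

Even in a read-only workload, every acquisition and every release writes to the same `value` word, which is why the lockword remains a source of contention.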
When the Solaris lock becomes contended (that is, when a writer wants to acquire the lock while it is held by another entity, or when a reader wants to acquire the lock while it is held by a writer), threads acquire the central mutex and add themselves to the queue data structure. The last active reader (or any writer) that releases the reader-writer lock must acquire the mutex and pass on ownership of the lock whenever the appropriate bit in the lockword indicates that a thread is waiting on the lock.
Thus, the mutex quickly becomes contended when writers are added to the workload. Because of these sources of contention, it is not surprising that the lock can be a performance bottleneck. In fact, it is possible for the lock to cause the kernel to panic due to thread timeouts.
In summary, existing reader-writer locks do not scale well under heavy loads, even heavy loads of requesters seeking read-only access. Contention over access to a single structure degrades the requesters' performance, whether that structure is a lockword, a tail pointer of a queue, a counter of the number of active readers, a mutex lock or something else. The smaller the critical section of a requester (i.e., the program code to be executed while the requester has access to the resource) and the greater the frequency with which it is executed, the greater the impact of the contention.