Field of the Disclosure
This disclosure relates generally to reader-writer locks, and more particularly to systems and methods for promoting reader groups for lock cohorting with reader-writer locks.
Description of the Related Art
In a multiprocessor environment with threads and preemptive scheduling, threads can participate in a mutual exclusion protocol through the use of lock or “mutex” constructs. A mutual exclusion lock can either be in a locked state or an unlocked state, and only one thread can hold or own the lock at any given time. The thread that owns the lock is permitted to enter a critical section of code protected by the lock or otherwise access a shared resource protected by the lock. If a second thread attempts to obtain ownership of a lock while the lock is held by a first thread, the second thread will not be permitted to proceed into the critical section of code (or access the shared resource) until the first thread releases the lock and the second thread successfully claims ownership of the lock.
Current trends in multicore architecture design imply that in coming years, there will be an accelerated shift away from simple bus-based designs towards distributed non-uniform memory-access (NUMA) and cache-coherent NUMA (CC-NUMA) architectures. Under NUMA, the memory access time for any given access depends on the location of the accessed memory relative to the processor. Such architectures typically consist of collections of computing cores with fast local memory (as found on a single multicore chip), communicating with each other via a slower (inter-chip) communication medium. In such systems, the processor can typically access its own local memory, such as its own cache memory, faster than non-local memory. In some systems, the non-local memory may include one or more banks of memory shared between processors and/or memory that is local to another processor. Access by a core to its local memory, and in particular to a shared local cache, can be several times faster than access to a remote memory (e.g., one located on another chip). Note that in various descriptions herein, the term “NUMA” may be used fairly broadly. For example, it may be used to refer to non-uniform communication access (NUCA) machines that exhibit NUMA properties, as well as other types of NUMA and/or CC-NUMA machines.
On large cache-coherent systems with Non-Uniform Memory Access (CC-NUMA, sometimes shortened to just NUMA), if lock ownership migrates frequently between threads executing on different nodes, the executing program can suffer from excessive coherence traffic, and, in turn, poor scalability and performance. Furthermore, this behavior can degrade the performance of other unrelated programs executing in the system.
Reader-writer locks are an important category of locks that help programmers overcome the scalability issues that are common with traditional mutual exclusion locks for workloads that include a significant percentage of read-only critical sections of code. At any given time, a reader-writer lock allows one or more reader threads to own a lock in a read-only mode or just one writer thread to own the lock in a write mode. In one very basic implementation of a reader-writer lock, there is a single variable to indicate the synchronization object. When there are multiple simultaneous lock acquisitions in read-only mode, this variable indicates number of reader threads. However, when there is an exclusive lock acquisition for writer thread, this variable indicates an address or other identifier of the writer thread.
With reader-writer locks, read-only or write access permission persists until it is explicitly surrendered using an unlock operation. Past research has shown that even though these locks can scale well for workloads with very high reader volumes (e.g., on the order of 99-100% reader threads), the performance quickly drops off with even a modest number of writer threads (e.g., 5-10%) competing for the lock. This drop-off can be expected to be even worse on cache-coherent NUMA architectures, where the writer threads can introduce significant interconnect traffic and latencies to access remotely situated lock metadata and data that is accessed in a related critical section of code. A reader-writer lock might provide better performance than a traditional mutex, as the reader-writer lock can admit multi-reader (reader-reader) parallelism. However, any actual benefit would be contingent on the workload of the executing application, the availability of true parallelism, and the specific implementation of the reader-writer lock.