1. Field of the Invention
The present invention relates generally to computer operating systems and more particularly to operating systems that implement locks to control access to critical resources.
2. Related Art
In today's computer systems, it is not uncommon for there to be two or more similar processors connected via a high-bandwidth link and managed by one operating system. Such systems are often referred to as symmetric multi-processing (SMP) systems. Typically, SMP systems employ an operating system (e.g., UNIX, IRIX, Linux or the like) which allows every processor equal access to memory and I/O devices. More specifically, the operating system's kernel (the part of an operating system that is responsible for resource allocation, low-level hardware interfaces, security, etc.) allows application programs to execute on any processor in the computer system, interchangeably, at the operating system's discretion.
Given a multi-processor, multi-application environment, read and write access to shared critical resources within a computer system must be restricted so that race conditions do not arise. That is, multiple processes executing concurrently on multiple processors may need to access a critical resource in order to change a common variable, update a data structure, read a file, write to the file, etc. Some of the processes may only need to read the contents of the critical resource (i.e., "readers"), while other processes may need to update (read and write) the contents of the critical resource (i.e., "writers"). If two readers simultaneously access the contents of the critical resource, no problems arise. If, however, a writer and another process (i.e., a reader or a writer) simultaneously attempt to access the contents of the critical resource, a race condition arises.
In order to avoid race conditions, it must be ensured that whenever a writer accesses a given critical resource, it has exclusive access; no other reader or writer may access the resource at the same time. One solution is to implement a mutual exclusion (mutex) lock (also referred to as a semaphore). Mutex locks, which are well-known in the relevant art(s), use a central data structure which encompasses a protected variable. Mutex locks allow only one process at a time to access a critical resource, whether to read or to write, and force other processes to wait for access to the critical resource by either "spinning" (i.e., performing no-ops while waiting) or "sleeping" (i.e., blocking and placing themselves in a waiting queue).
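For illustration, the spinning variant of a mutex lock can be sketched with C11 atomics. This is a minimal sketch rather than an implementation from the source: the type and function names (`spin_mutex_t`, `spin_mutex_lock`, and so on) are hypothetical, and a sleeping mutex would additionally need an operating-system wait queue, which is not shown.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical spin-style mutex: one shared flag guards the whole resource,
 * so only one process at a time, reader or writer, can hold it. */
typedef struct {
    atomic_bool held; /* false = unlocked, true = locked */
} spin_mutex_t;

void spin_mutex_init(spin_mutex_t *m) {
    atomic_init(&m->held, false);
}

/* Single attempt: atomically set the flag and report whether it was clear. */
bool spin_mutex_trylock(spin_mutex_t *m) {
    return !atomic_exchange_explicit(&m->held, true, memory_order_acquire);
}

/* "Spinning": keep retrying, doing no useful work, until the holder releases. */
void spin_mutex_lock(spin_mutex_t *m) {
    while (!spin_mutex_trylock(m)) {
        /* busy-wait; a sleeping lock would instead block on a wait queue */
    }
}

void spin_mutex_unlock(spin_mutex_t *m) {
    assert(atomic_load_explicit(&m->held, memory_order_relaxed));
    atomic_store_explicit(&m->held, false, memory_order_release);
}
```

The single `held` flag plays the role of the central protected variable: every process, including one that only wants to read, must win it before touching the resource.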
Another solution is to use multi-reader locks, which are also well-known in the relevant art(s). Multi-reader locks are similar to mutex locks but, as their name suggests, allow multiple readers to access the critical resource simultaneously. When multiple readers access a critical resource simultaneously, no race condition occurs because the resource is only being read, not modified.
Although useful for solving contention issues in SMP systems, multi-reader locks have limitations: conventional implementations only work well in situations where contention is low. Contention problems arise in conventional implementations because all readers and writers must access (and hence contend for) a centralized data structure (i.e., the lock) before they obtain access to the critical resource. This is counter-productive because the aim of a multi-reader lock is to allow readers to proceed in parallel, yet the readers all contend for the global multi-reader lock data structure. In other words, conventional implementations of multi-reader locks frequently break down on computer systems with large CPU counts when too many readers arrive at the lock at the same time, causing severe cache contention for the lock data structures themselves.
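The contention problem can be seen in a minimal C11 sketch of a conventional, centralized multi-reader lock. All names here are hypothetical and the encoding of the lock state in one integer is an assumption chosen for brevity. The point of the sketch is that every reader's acquire and release is an atomic read-modify-write on the same `state` word, so even a purely read-only workload keeps moving that word's cache line between processors.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical centralized multi-reader lock.
 *   state > 0  : that many readers hold the lock
 *   state == 0 : free
 *   state == -1: one writer holds the lock
 * Every reader and writer updates this one shared word. */
typedef struct {
    atomic_int state;
} central_rwlock_t;

void central_init(central_rwlock_t *l) { atomic_init(&l->state, 0); }

/* Readers may share the lock as long as no writer holds it. */
bool central_try_read(central_rwlock_t *l) {
    int s = atomic_load(&l->state);
    while (s >= 0) {
        if (atomic_compare_exchange_weak(&l->state, &s, s + 1))
            return true;
        /* a failed CAS reloaded s; retry unless a writer got in */
    }
    return false;
}

void central_read_unlock(central_rwlock_t *l) {
    int prev = atomic_fetch_sub(&l->state, 1);
    assert(prev > 0);
}

/* A writer needs the lock to be completely free. */
bool central_try_write(central_rwlock_t *l) {
    int expected = 0;
    return atomic_compare_exchange_strong(&l->state, &expected, -1);
}

void central_write_unlock(central_rwlock_t *l) { atomic_store(&l->state, 0); }
```

Readers do proceed in parallel once admitted, but admission itself is serialized through the shared counter.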
The above-described problem is exacerbated in today's computer systems that have cache-coherent non-uniform memory access (ccNUMA) architectures, where moving the lock's cache line between nodes is more costly than moving it within a node. Even when all of an SMP computer system's CPUs request the lock in read mode, a bottleneck on the lock data structure still occurs. In this situation, the performance of a multi-reader lock is no better than that of a normal mutex lock.
Therefore, given the foregoing, what is needed is a system, method and computer program product for scalable multi-reader/single-writer locks that overcome the deficiencies of conventional mutex and multi-reader locks. The system, method and computer program product should allow readers to proceed in parallel without contending for a common resource.
The present invention is directed to a system, method and computer program product for implementing a scalable multi-reader/single-writer lock, within a computer operating system, that meets the above-identified needs.
The system of the present invention includes a registry head data structure for each critical resource within the computer system that requires a multi-reader lock. Linked to each registry head data structure are one or more client data structures, one for each client (i.e., process, thread, interrupt handler, or the like) that needs read and/or write access to the critical resource represented by that registry head data structure.
The method and computer program product of the present invention involve initializing a registry head data structure for each critical resource in the computer system to which one or more clients need read and/or write access. That is, a registry head data structure corresponding to a critical resource within the computer system is allocated. The registry head data structure includes a writer flag initialized to zero and a spin lock initialized to an unlocked state.
Further, a plurality of client data structures, linked to the registry head data structure, are allocated. Each client data structure corresponds to one of a plurality of clients within the computer system that desire read and/or write access to the critical resource, and includes a read enable flag initialized to one and a read use flag initialized to zero.
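The registry head and client data structures, and their initial values, might be laid out as in the following C11 sketch. It is hypothetical throughout: the names `registry_t`, `client_t`, `registry_create`, `client_create` and `registry_destroy` are assumptions, the flags are modeled as C11 atomics, and the spin lock is modeled as an `atomic_bool`. It also assumes that clients are created before any writer runs, so each new read enable flag can simply start at one as described.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* One per client (process, thread, interrupt handler, ...). */
typedef struct client {
    atomic_int read_enable;  /* initialized to 1: reads are pre-approved */
    atomic_int read_use;     /* initialized to 0: not currently reading */
    struct client *next;     /* next client linked to the same registry */
} client_t;

/* One registry head per critical resource needing a multi-reader lock. */
typedef struct {
    int writer;              /* initialized to 0; guarded by spin */
    atomic_bool spin;        /* registry spin lock; false = unlocked */
    client_t *clients;       /* head of the linked list of clients */
} registry_t;

registry_t *registry_create(void) {
    registry_t *r = malloc(sizeof *r);
    if (r == NULL)
        return NULL;
    r->writer = 0;
    atomic_init(&r->spin, false);
    r->clients = NULL;
    return r;
}

/* Allocate a client record and link it onto the registry's list.
 * Assumption: no writer is active yet (clients set up at initialization). */
client_t *client_create(registry_t *r) {
    client_t *c = malloc(sizeof *c);
    if (c == NULL)
        return NULL;
    atomic_init(&c->read_enable, 1);
    atomic_init(&c->read_use, 0);
    while (atomic_exchange(&r->spin, true)) { }  /* acquire spin lock */
    c->next = r->clients;
    r->clients = c;
    atomic_store(&r->spin, false);               /* release spin lock */
    return c;
}

/* Free the registry and every client linked to it. */
void registry_destroy(registry_t *r) {
    assert(r->writer == 0);
    client_t *c = r->clients;
    while (c != NULL) {
        client_t *next = c->next;
        free(c);
        c = next;
    }
    free(r);
}
```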
Reading the critical resource involves the client determining whether its read enable flag is set to one. If not, a writer is currently updating the critical resource, and the client must spin (i.e., wait). If so, the client sets its read use flag to one and then performs at least one read operation on the critical resource. Once the client is done with its read operation(s), it resets the read use flag within its client data structure to zero. Note that the reader is not required to access the registry head data structure or obtain the global spin lock, thus avoiding contention in the common case.
Writing to the critical resource involves a client first obtaining the registry head data structure's spin lock, i.e., changing the spin lock's state to the locked state. The client then traverses every other client's client data structure to determine whether all the read use flags are set to zero. If not, the client must wait, as a non-zero read use flag indicates that another client is currently reading the critical resource. If so, the client sets the read enable flag to zero within each of the other clients' client data structures. This prevents any other client from reading the critical resource while the writing client is attempting to write to it.
Next, the client sets the writer flag within the registry head data structure to one. This prevents any other client from also becoming a writer. The client then releases the spin lock by changing its state to the unlocked state, which allows other operations that do not interfere with the write operation (e.g., deleting a client data structure) to proceed. After the client performs its write operation(s) on the critical resource, it again obtains the spin lock by changing its state to the locked state. The client then traverses every other client's client data structure and sets the read enable flag back to one. Finally, the writer flag is set back to zero and the spin lock is released by changing its state to the unlocked state.
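The write path can be sketched in C11 as follows. This is a hypothetical illustration, not the patented implementation: all names are assumptions, and the spin lock is modeled as an `atomic_bool`. Two details are choices of the sketch. First, a would-be writer that finds the writer flag already set releases the spin lock and retries. Second, the writer clears the other clients' read enable flags before waiting for their read use flags to drain, rather than checking first; this ordering closes the window in which a reader could slip in between the check and the clear, and it relies on readers setting their read use flag before re-checking their read enable flag.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct client {
    atomic_int read_enable;  /* 1 = reads pre-approved, 0 = writer active */
    atomic_int read_use;     /* 1 while this client is reading */
    struct client *next;
} client_t;

typedef struct {
    int writer;              /* 1 while some client holds write access */
    atomic_bool spin;        /* registry spin lock; false = unlocked */
    client_t *clients;       /* linked list of registered clients */
} registry_t;

static void spin_lock(atomic_bool *s) { while (atomic_exchange(s, true)) { } }
static void spin_unlock(atomic_bool *s) { atomic_store(s, false); }

void client_init(client_t *c) {
    atomic_init(&c->read_enable, 1);
    atomic_init(&c->read_use, 0);
    c->next = NULL;
}

void registry_init(registry_t *r) {
    r->writer = 0;
    atomic_init(&r->spin, false);
    r->clients = NULL;
}

void registry_add(registry_t *r, client_t *c) {
    spin_lock(&r->spin);
    c->next = r->clients;
    r->clients = c;
    spin_unlock(&r->spin);
}

void write_lock(registry_t *r, client_t *self) {
    for (;;) {
        spin_lock(&r->spin);
        if (r->writer == 0)
            break;                 /* no other writer: keep the spin lock */
        spin_unlock(&r->spin);     /* another writer is active: retry */
    }
    /* Revoke every other client's pre-approval to read... */
    for (client_t *c = r->clients; c != NULL; c = c->next)
        if (c != self)
            atomic_store(&c->read_enable, 0);
    /* ...then wait until reads already in progress have finished. */
    for (client_t *c = r->clients; c != NULL; c = c->next)
        if (c != self)
            while (atomic_load(&c->read_use) != 0) { }
    r->writer = 1;                 /* bar other would-be writers */
    spin_unlock(&r->spin);         /* let non-conflicting operations run */
}

void write_unlock(registry_t *r, client_t *self) {
    spin_lock(&r->spin);
    assert(r->writer == 1);
    for (client_t *c = r->clients; c != NULL; c = c->next)
        if (c != self)
            atomic_store(&c->read_enable, 1);  /* re-approve readers */
    r->writer = 0;
    spin_unlock(&r->spin);
}
```

A caller would bracket its updates with `write_lock(reg, me)` and `write_unlock(reg, me)`; the spin lock is not held while the write itself is in progress.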
An advantage of the present invention is that it does not use a centralized multi-reader data structure, but instead employs a separate data structure for each client, thereby allowing readers to proceed in parallel.
Another advantage of the present invention is that it assumes writers are rare and readers are the more common case. Consequently, the invention "pre-approves" readers by not requiring them to access the registry head data structure or obtain the global spin lock, thereby eliminating contention in the more common case.
Yet another advantage of the present invention is that it uses dynamic data structures, which grow and shrink during operation of the computer system, rather than the static data structures of conventional lock implementations.
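Growing and shrinking the set of client data structures at run time might look like the following C11 sketch; all names are hypothetical. Both operations take the registry spin lock, which a writer releases while it performs its update, so they can proceed during a write. One detail is an assumption of the sketch rather than something stated in the text: a client registered while the writer flag is set starts with its read enable flag at zero, so it cannot read mid-update, and the writer's release pass later sets it back to one.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct client {
    atomic_int read_enable;  /* 1 = reads pre-approved, 0 = writer active */
    atomic_int read_use;     /* 1 while this client is reading */
    struct client *next;
} client_t;

typedef struct {
    int writer;              /* 1 while some client holds write access */
    atomic_bool spin;        /* registry spin lock; false = unlocked */
    client_t *clients;       /* linked list of registered clients */
} registry_t;

static void spin_lock(atomic_bool *s) { while (atomic_exchange(s, true)) { } }
static void spin_unlock(atomic_bool *s) { atomic_store(s, false); }

/* Grow: allocate and link a new client at any time. */
client_t *client_register(registry_t *r) {
    client_t *c = malloc(sizeof *c);
    if (c == NULL)
        return NULL;
    atomic_init(&c->read_use, 0);
    spin_lock(&r->spin);
    /* Assumption: a newcomer arriving mid-write starts with reads disabled. */
    atomic_init(&c->read_enable, r->writer ? 0 : 1);
    c->next = r->clients;
    r->clients = c;
    spin_unlock(&r->spin);
    return c;
}

/* Shrink: unlink and free a client that is neither reading nor writing. */
void client_unregister(registry_t *r, client_t *c) {
    assert(atomic_load(&c->read_use) == 0);
    spin_lock(&r->spin);
    for (client_t **pp = &r->clients; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == c) {
            *pp = c->next;
            break;
        }
    }
    spin_unlock(&r->spin);
    free(c);
}
```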
A further advantage of the present invention is that it can be implemented in a distributed cluster environment.
Further features and advantages of the invention as well as the structure and operation of various embodiments of the present invention are described in detail below with reference to the accompanying drawings.