This invention relates generally to a method and apparatus for improving performance in systems where multiple processors contend for control of a shared resource through a lock associated with the shared resource, and more particularly to a method and apparatus for improving performance in intelligent data storage systems.
When a computer system resource is shared by multiple processes running on multiple processors, or even on one processor, there must often be some way of ensuring that no more than one such process may access that resource at any one time. In designing complex data storage systems including multiple processors, synchronizing access to shared resources has been recognized as an issue which must be addressed in order to maintain the consistency and validity of the data. However, the sharing issue may arise in connection with almost any resource that might be used by multiple requestors.
Many high-performance storage systems are intelligent data storage systems which may be accessible by multiple host computers. These may include, in addition to one or more storage device arrays, a number of intelligent controllers for controlling the various aspects of the data transfers associated with the storage system. In such systems, host controllers may provide the interface between the host computers and the storage system, and device controllers may be used to manage the transfer of data to and from an associated array of storage devices (e.g., disk drives). Often, the arrays may be accessed by multiple hosts and controllers. In addition, advanced storage systems, such as the SYMMETRIX® storage systems manufactured by EMC Corporation, generally include a global memory which is typically shared by the controllers in the system. The memory may be used as a staging area (or cache) for the data transfers between the storage devices and the host computers and may provide a communications path which buffers data transfer between the various controllers. Various communication channels, such as busses, backplanes or networks, link the controllers to one another and the global memory, the host controllers to the host computers, and the disk controllers to the storage devices. Such systems are described, for example, in Yanai et al., U.S. Pat. No. 5,206,939 issued Apr. 27, 1993 (hereinafter "the '939 patent"), Yanai et al., U.S. Pat. No. 5,381,539 issued Jan. 10, 1995 (hereinafter "the '539 patent"), Vishlitzky et al., U.S. Pat. No. 5,592,492 issued Jan. 7, 1997 (hereinafter "the '492 patent"), Yanai et al., U.S. Pat. No. 5,664,144 issued Sep. 2, 1997 (hereinafter "the '144 patent"), and Vishlitzky et al., U.S. Pat. No. 5,787,473 issued Jul. 28, 1998 (hereinafter "the '473 patent"), all of which are herein incorporated in their entirety by reference.
The systems described therein allow the controllers to act independently to perform different processing tasks and provide for distributed management of the global memory resources by the controllers. This high degree of parallelism permits improved efficiency in processing I/O tasks. Since each of the controllers may act independently, there may be contention for certain of the shared memory resources within the system. In these systems, the consistency of the data contained in some portions of global memory may be maintained by requiring each controller to lock those data structures which require consistency while it is performing any operations on them which are supposed to be atomic.
Since locking inherently reduces the parallelism of the system and puts a high load on system resources, locking procedures must be designed with care to preserve system efficiency. Adding features to the lock, such as queuing, lock override procedures, or multimodality, can help to avoid some pitfalls of common lock protocols, such as processor starvation, deadlocks, livelocks and convoys. However, it is also known that, while many of these lock features have individual advantages, multifeatured lock management procedures are difficult to design and implement without unduly burdening system resources or inadvertently introducing pitfalls such as additional deadlock or starvation situations. For example, multimodal locks, which permit the requestor to identify the kind of resource access it desires and the degree of resource sharing which its transaction can tolerate, can be useful in improving system performance and avoiding deadlocks, but providing a lock override which is suitable for a multimodal lock is quite difficult. If, for example, one lock mode is set to allow unusually long transactions, a timeout set to accommodate normal transactions will cut the long ones off in midstream, while a timeout set to accommodate the long transactions will allow failures occurring during normal transactions to go undetected for excessively long periods. Moreover, timeouts are competitive procedures which, in certain circumstances, undesirably offset the cooperative advantages of a queued lock. Because of the complexities introduced by multifeatured locks, it is desirable to validate features and modes which create particularly significant drains on system resources, such as long timeout modes, but introducing additional validation features can itself load system resources to the point where the system efficiency suffers.
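By way of illustration only, the timeout dilemma described above can be made concrete with a short sketch. The durations below are hypothetical figures chosen for the example, and the name evaluate_timeout is an assumption introduced here; neither is taken from any described system.

```python
# Hypothetical worst-case lock hold times, in milliseconds (illustrative only).
NORMAL_MAX_MS = 10      # longest legitimate hold in the normal mode
LONG_MAX_MS = 5_000     # longest legitimate hold in the long-transaction mode


def evaluate_timeout(timeout_ms):
    """Evaluate one timeout value shared by both lock modes.

    Returns a pair:
      - True if legitimate long-mode transactions would be cut off midstream
      - how long (ms) a holder that failed during a normal transaction can go
        undetected beyond the longest legitimate normal hold
    """
    cuts_off_long = timeout_ms < LONG_MAX_MS
    excess_undetected_ms = max(0, timeout_ms - NORMAL_MAX_MS)
    return cuts_off_long, excess_undetected_ms


# A timeout sized for normal transactions interrupts the long ones.
sized_for_normal = evaluate_timeout(NORMAL_MAX_MS)

# A timeout sized for long transactions hides normal-mode failures.
sized_for_long = evaluate_timeout(LONG_MAX_MS)
```

Under these assumed figures, neither choice of a single timeout is satisfactory, which is the difficulty a mode-aware override has to address.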
Providing suitable procedures becomes especially difficult in complex multiprocessor systems which may contain a number of queued locks associated with different shared resources and where a requestor may have to progress through a number of lock request queues in turn in order to complete a process. In these systems, it is desirable that whatever procedure is implemented be fair, ensure that each requestor eventually obtains access to the lock whether or not all other requestors in the system are operating properly, and minimize the average waiting time for each requestor in the queue to improve system efficiency. Queued locks which implement a first-in-first-out (FIFO) protocol meet the fairness criteria because denied requests are queued in the order they are received. One such lock services procedure, often known as the "bakery" or "deli" algorithm, is described, for example, in "Resource Allocation with Immunity to Limited Process Failure", Michael J. Fischer, Nancy A. Lynch, James E. Burns, and Alan Borodin, 20th Annual Symposium on Foundations of Computer Science, San Juan, Puerto Rico, October 1979, p 234-254; and "Distributed FIFO Allocation of Identical Resources Using Small Shared Space", ACM Transactions on Programming Languages and Systems, January 1989, 11(1): 90-114. When all requestors in the system are operating properly, the basic "deli" algorithm also meets the other criteria, but a protocol violation such as the failure of any processor in the lock request queue can lead to total system deadlock. However, in all complex multiprocessor systems, occasional protocol violations are inevitable, and the "deli" algorithm makes no provision either for detecting these through validation procedures or otherwise, or for handling them when they occur. Moreover, the basic "deli" lock is a unimodal lock.
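The FIFO behavior of a "deli"-style lock, and its vulnerability to a failed requestor, can be sketched as follows. This is a simplified ticket-counter illustration and not the algorithm of the cited papers; the class and method names are hypothetical, and a threading.Lock stands in for the hardware atomic fetch-and-increment that a real multiprocessor implementation would use.

```python
import threading


class TicketLock:
    """Simplified 'take a number' lock: requestors are served in ticket order."""

    def __init__(self):
        # Assumption: this guard simulates an atomic fetch-and-increment.
        self._atomic = threading.Lock()
        self.next_ticket = 0   # next number to hand out
        self.now_serving = 0   # number currently allowed to hold the lock

    def take_ticket(self):
        """Atomically take the next number in the queue."""
        with self._atomic:
            ticket = self.next_ticket
            self.next_ticket += 1
        return ticket

    def holds_lock(self, ticket):
        """A requestor holds the lock only when its number is being served."""
        return self.now_serving == ticket

    def release(self, ticket):
        """Pass the lock to the next number; only the holder may do this."""
        if not self.holds_lock(ticket):
            raise RuntimeError("only the current holder may release the lock")
        self.now_serving += 1


lock = TicketLock()
a, b, c = lock.take_ticket(), lock.take_ticket(), lock.take_ticket()

first_holder_is_a = lock.holds_lock(a)   # a arrived first, so a is served first
lock.release(a)                          # a finishes and hands off to b

# If b now fails and never calls release(), c can never be served:
# the basic scheme has no way to detect or override the stalled holder.
```

In this sketch the lock has a single mode and no validation of its own contents, which mirrors the two limitations of the basic scheme noted above.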
A lock is needed which supports multiple locking modes and makes provision both for validation features to detect protocol violations and lock override procedures to manage the violations without unduly reducing system efficiency, and which also meets desirable design criteria for fairness, wait time minimization and guaranteed access.
In accordance with the present invention, a lock mechanism for managing shared resources in a data processing system is provided.
In accordance with the present invention, a method for providing queued, multimodal, self-validating locking and unlocking services for managing a shared resource in a data processing system is provided. The lock mechanism is multimodal and self-validating. One or more supplemental validation procedures may be selectively associated with certain lock modes. Thus, lock modes which constitute a particularly heavy drain on system resources may be extensively validated after the lock is reserved but before the requestor commits to locking the shared resource in these modes to avoid committing the system in error, while the drain on system resources which would be created by validating every lock transaction to a comparably high level may be avoided.
In one aspect of the invention, a method for providing queued, multimodal, self-validating locking and unlocking services for managing a shared resource in a data processing system is provided. The system includes a plurality of processors as lock requestors. Each processor supports atomic operations and is coupled to the shared resource through one or more first common communication channels. The method includes providing, for the shared resource, an associated main lock data structure stored in a shared memory accessible by the processors. The main lock data structure includes, in a single atomic structure, the resources needed to lock the shared resource by a successful lock requestor, to identify one of at least two lock modes, to establish a queue of unsuccessful lock requestors, and to validate the existence of the lock. The method also includes providing for each processor a lock services procedure including at least first and second lock mode procedures, a lock mode selection procedure for selecting one from the lock mode procedures by a successful lock requestor, a queuing procedure for unsuccessful lock requestors, locking and unlocking procedures for locking and unlocking the shared resource in the selected lock mode by a successful lock requestor, and a supplemental validation procedure selectively associated with the second lock mode for validating the lock by a successful lock requestor. The method also includes the step of selecting one from the lock modes by a requesting processor.
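One way to picture a main lock data structure that holds all of these resources in a single atomic structure is a small set of fields packed into one machine word. The sketch below is illustrative only: the field names, field widths, mode codes and the SIGNATURE value are assumptions made for the example and are not specified in the text.

```python
from typing import NamedTuple

SIGNATURE = 0xA5C3               # hypothetical marker: "this word is a valid lock"
MODE_FIRST, MODE_SECOND = 1, 2   # hypothetical codes for the two lock modes


class LockWord(NamedTuple):
    signature: int       # 16 bits: validates the existence of the lock
    mode: int            # 8 bits: identifies the current lock mode
    current_holder: int  # 8 bits: queue number of the requestor holding the lock
    next_free: int       # 8 bits: next queue number to hand to a requestor


def pack(w: LockWord) -> int:
    """Pack all fields into one integer small enough for a 64-bit atomic word."""
    return ((w.signature & 0xFFFF) << 24
            | (w.mode & 0xFF) << 16
            | (w.current_holder & 0xFF) << 8
            | (w.next_free & 0xFF))


def unpack(raw: int) -> LockWord:
    """Recover the individual fields from the packed word."""
    return LockWord((raw >> 24) & 0xFFFF, (raw >> 16) & 0xFF,
                    (raw >> 8) & 0xFF, raw & 0xFF)


def is_valid(w: LockWord) -> bool:
    return w.signature == SIGNATURE


def is_locked(w: LockWord) -> bool:
    # Assumption: the lock is free when no queue numbers are outstanding.
    return w.current_holder != w.next_free


word = LockWord(SIGNATURE, MODE_SECOND, current_holder=4, next_free=7)
raw = pack(word)
round_trip = unpack(raw)
```

Because every field travels in the same word, a single atomic read or compare-and-swap on that word can test validity, lock state, mode and queue position together.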
The method also includes the step of, in a single atomic operation by the requesting processor, examining the contents of the main lock data structure to determine if another requesting processor has previously locked the shared resource and if the lock contents are valid, and writing data to the main lock data structure to establish its place in a queue of requestors for subsequent locks on the shared resource if some other requesting processor has previously locked the shared resource and the lock contents are also valid, or writing data to the main lock data structure to reserve and validate the lock and to identify the first lock mode if the lock contents are invalid or if no other requesting processor has previously locked the shared resource. Then, if the lock contents are invalid or if no other requesting processor has previously locked the shared resource, and the second lock mode has been selected by the requesting processor, the requesting processor executes the supplemental validation procedure to validate the lock allocation to that requesting processor. If the supplemental validation procedure does validate the lock allocation to that requesting processor, then the requesting processor writes data to the main lock data structure to identify the second lock mode, but if it does not, the requesting processor does not enter the second lock mode.
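The sequence of steps just described can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: all names (LockState, AtomicCell, request_lock, SIGNATURE and the mode codes) are hypothetical, the atomic operation is simulated by a compare-and-swap guarded with a threading.Lock, queue numbers are not wrapped to a fixed width, and a requestor whose supplemental validation fails is assumed simply to remain in the first mode.

```python
import threading
from typing import NamedTuple

SIGNATURE = 0xA5C3               # hypothetical "lock contents are valid" marker
MODE_FIRST, MODE_SECOND = 1, 2   # hypothetical mode codes


class LockState(NamedTuple):
    signature: int
    mode: int
    holder: int      # queue number currently holding the lock
    next_free: int   # next queue number to hand out


class AtomicCell:
    """Simulates one atomically updated word of shared memory."""

    def __init__(self, value):
        self._value = value
        self._guard = threading.Lock()   # stands in for hardware atomicity

    def load(self):
        with self._guard:
            return self._value

    def compare_and_swap(self, expected, new):
        with self._guard:
            if self._value != expected:
                return False
            self._value = new
            return True


def request_lock(cell, wanted_mode, validate):
    """Return (status, queue_number, mode); status is 'queued' or 'held'."""
    while True:
        old = cell.load()
        valid = old.signature == SIGNATURE
        if valid and old.holder != old.next_free:
            # Validly locked by another requestor: take a place in the queue.
            if cell.compare_and_swap(old, old._replace(next_free=old.next_free + 1)):
                return ("queued", old.next_free, None)
            continue
        # Invalid contents or no holder: reserve, validate, mark the first mode.
        number = old.holder if valid else 0
        if cell.compare_and_swap(old, LockState(SIGNATURE, MODE_FIRST, number, number + 1)):
            break
    if wanted_mode == MODE_SECOND and validate():
        # Subsequent atomic step: change only the mode field.
        while True:
            cur = cell.load()
            if cell.compare_and_swap(cur, cur._replace(mode=MODE_SECOND)):
                return ("held", number, MODE_SECOND)
    return ("held", number, MODE_FIRST)


cell = AtomicCell(LockState(0, 0, 0, 0))                 # starts with invalid contents
first = request_lock(cell, MODE_SECOND, lambda: True)    # reserves, validates, upgrades
second = request_lock(cell, MODE_FIRST, lambda: True)    # finds a valid lock and queues

other = AtomicCell(LockState(SIGNATURE, MODE_FIRST, 3, 3))   # valid but unlocked
refused = request_lock(other, MODE_SECOND, lambda: False)    # validation fails
```

The mode upgrade re-reads the word before writing so that queue entries added by other requestors during the supplemental validation are preserved.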
In another aspect of the invention, a more efficient intelligent storage system is provided. The intelligent storage system typically includes multiple processors as requestors, and these are coupled to a shared resource through one or more first common communication channels. Each processor supports atomic operations. A lock services procedure is implemented in each of the processors. The lock services procedure includes at least first and second lock mode procedures, a lock mode selection procedure for selecting one from the lock mode procedures by a successful lock requestor, a queuing procedure for unsuccessful lock requestors, locking and unlocking procedures for locking and unlocking the shared resource in the selected lock mode by a successful lock requestor, and also includes at least one supplemental validation procedure selectively associated with the second lock mode for validating the lock by a successful lock requestor. A main lock data structure, responsive to these lock services procedures and associated with the shared resource, is implemented in the shared memory accessible over one or more second common communications channels to all of the processors. The main lock data structure provides, in a single atomic structure, the resources needed to lock a shared resource, identify one of the at least two lock modes, establish a queue of unsuccessful lock requestors, and validate the existence of the lock. Resources are also provided to validate the identity of the successful lock requestor in connection with certain transactions. The lock services procedure also enables each processor, in a single transaction, to request locks on the shared resource, to validate the existence of a lock in the main lock data structure, to lock the shared resource if its request is successful, and to establish a place in a queue of requestors for subsequent locks on the shared resource if its request is unsuccessful.
Each requesting processor is operable in accordance with its lock services procedure to select one from the lock modes. Each requesting processor is also operable in accordance with its lock services procedure, in a single atomic operation, to examine the contents of the main lock data structure to determine if another requesting processor has previously locked the shared resource and if the lock contents are valid, and to write data to the main lock data structure to establish its place in a queue of requestors for subsequent locks on the shared resource if some other requesting processor has previously locked the shared resource and the lock contents are also valid, or to write data to the main lock data structure to reserve and validate the lock and to identify the first lock mode if the lock contents are invalid or if no other requesting processor has previously locked the shared resource. Then, if the lock contents are invalid or no other requesting processor has previously locked the shared resource, and the second lock mode has been selected by the requesting processor, the requesting processor is operable to execute the supplemental validation procedure in order to validate the lock allocation to that requesting processor. If the supplemental validation procedure does validate the lock allocation to that requesting processor, then the requesting processor is operable to write data to the main lock data structure to identify the second lock mode, but if it does not, the requesting processor does not enter the second lock mode. Thus, data may be written to lock the shared resource in a first one of at least two lock modes as a part of this atomic operation, or to lock the shared resource in a second one of at least two lock modes in a subsequent atomic operation following a supplemental validation procedure.
In yet another aspect of the invention, multiple processes running on a single processor may in some aspects act as requestors, and a lock allocation process or procedure may be invoked by each of these processes, but the operation of the invention is otherwise as described above.