This invention relates generally to a method and apparatus for improving performance in systems where multiple processors contend for control of a shared resource through a lock associated with the shared resource, and more particularly to a method and apparatus for improving performance in intelligent data storage systems.
When a computer system resource is shared by multiple processes running on multiple processors, or even on one processor, often there must be some way of ensuring that no more than one such process may access that resource at any one time. In designing complex data storage systems including multiple processors, synchronizing access to shared resources has been recognized as an issue which must be addressed in order to maintain the consistency and validity of the data. However, the sharing issue may arise in connection with almost any resource that might be used by multiple requesters.
Many high-performance storage systems are intelligent data storage systems which may be accessible by multiple host computers. These may include, in addition to one or more storage device arrays, a number of intelligent controllers for controlling the various aspects of the data transfers associated with the storage system. In such systems, host controllers may provide the interface between the host computers and the storage system, and device controllers may be used to manage the transfer of data to and from an associated array of storage devices (e.g., disk drives). Often, the arrays may be accessed by multiple hosts and controllers. In addition, advanced storage systems, such as the SYMMETRIX® storage systems manufactured by EMC Corporation, generally include a global memory which is typically shared by the controllers in the system. The memory may be used as a staging area (or cache) for the data transfers between the storage devices and the host computers and may provide a communications path which buffers data transfer between the various controllers. Various communication channels, such as busses, backplanes or networks, link the controllers to one another and the global memory, the host controllers to the host computers, and the disk controllers to the storage devices. Such systems are described, for example, in Yanai et al., U.S. Pat. No. 5,206,939 issued Apr. 27, 1993 (hereinafter "the '939 patent"), Yanai et al., U.S. Pat. No. 5,381,539 issued Jan. 10, 1995 (hereinafter "the '539 patent"), Vishlitzky et al., U.S. Pat. No. 5,592,492 issued Jan. 7, 1997 (hereinafter "the '492 patent"), Yanai et al., U.S. Pat. No. 5,664,144 issued Sept. 2, 1997 (hereinafter "the '144 patent"), and Vishlitzky et al., U.S. Pat. No. 5,787,473 issued Jul. 28, 1998 (hereinafter "the '473 patent"), all of which are herein incorporated in their entirety by reference.
The systems described therein allow the controllers to act independently to perform different processing tasks and provide for distributed management of the global memory resources by the controllers. This high degree of parallelism permits improved efficiency in processing I/O tasks. Since each of the controllers may act independently, there may be contention for certain of the shared memory resources within the system. In these systems, the consistency of the data contained in some portions of global memory may be maintained by requiring each controller to lock those data structures which require consistency while it is performing any operations on them which are supposed to be atomic.
Since locking inherently reduces the parallelism of the system and puts a high load on system resources, locking procedures must be designed with care to preserve system efficiency. Adding features to the lock, such as queuing, lock override procedures, or multimodality can help to avoid some pitfalls of common lock protocols, such as processor starvation, deadlocks, livelocks and convoys. However, it is also known that, while many of these lock features have individual advantages, multifeatured lock management procedures are difficult to design and implement without unduly burdening system resources or inadvertently introducing pitfalls such as additional deadlock or starvation situations. For example, multimodal locks, which permit the requestor to identify the kind of resource access desired by the requestor and the degree of resource sharing which its transaction can tolerate, can be useful in improving system performance and avoiding deadlocks, but providing a lock override which is suitable for a multimodal lock is quite difficult. If, for example, one lock mode is set to allow unusually long transactions, a timeout set to accommodate normal transactions will cut the long ones off in midstream while a timeout set to accommodate the long transactions will allow failures occurring during normal transactions to go undetected for excessively long periods. Moreover, timeouts are competitive procedures which, in certain circumstances, undesirably offset the cooperative advantages of a queued lock. Because of the complexities introduced by multifeatured locks, it is desirable to validate features and modes which create particularly significant drains on system resources, such as long timeout modes, but introducing additional validation features can itself load system resources to the point where the system efficiency suffers.
Providing suitable procedures becomes especially difficult in complex multiprocessor systems which may contain a number of queued locks associated with different shared resources and where a requestor may have to progress through a number of lock request queues in turn in order to complete a process. In these systems, it is desirable that whatever procedure is implemented be fair, ensure that each requester eventually obtains access to the lock whether or not all other requesters in the system are operating properly, and minimize the average waiting time for each requestor in the queue to improve system efficiency. Queued locks which implement a first-in-first-out (FIFO) protocol meet the fairness criterion because denied requests are queued in the order they are received. One such lock services procedure, often known as the "bakery" or "deli" algorithm, is described, for example, in "Resource Allocation with Immunity to Limited Process Failure", Michael J. Fischer, Nancy A. Lynch, James E. Burns, and Alan Borodin, 20th Annual Symposium on Foundations of Computer Science, San Juan, Puerto Rico, October 1979, pp. 234-254; and "Distributed FIFO Allocation of Identical Resources Using Small Shared Space", ACM Transactions on Programming Languages and Systems, January 1989, 11(1): 90-114. When all requesters in the system are operating properly, the basic "deli" algorithm also meets the other criteria, but a protocol violation such as the failure of any processor in the lock request queue can lead to total system deadlock. However, in all complex multiprocessor systems, occasional protocol violations are inevitable, and the "deli" algorithm makes no provision either for detecting these through validation procedures or otherwise, or for handling them when they occur. Moreover, the basic "deli" lock is a unimodal lock.
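The FIFO behavior of a basic "deli" lock, and its vulnerability to a failed requester, can be illustrated with a short sketch. This is a simplified single-threaded simulation written for illustration only, not the algorithm as published in the cited papers; the class name `DeliLock` and its method names are assumptions introduced here.

```python
class DeliLock:
    """Toy FIFO 'deli' (ticket) lock; single-threaded simulation, no real atomics."""

    def __init__(self):
        self.next_ticket = 0   # next number handed to an arriving requester
        self.now_serving = 0   # number of the requester allowed to hold the lock

    def take_ticket(self):
        """Join the queue and return this requester's place in line."""
        ticket = self.next_ticket
        self.next_ticket += 1
        return ticket

    def holds_lock(self, ticket):
        """A requester owns the lock only while its number is being served."""
        return ticket == self.now_serving

    def release(self, ticket):
        """Pass the lock to the next number in line."""
        if not self.holds_lock(ticket):
            raise RuntimeError("only the current holder may release")
        self.now_serving += 1


# Three requesters arrive in the order A, B, C and are served in that order,
# regardless of the order in which they poll the lock.
lock = DeliLock()
a, b, c = lock.take_ticket(), lock.take_ticket(), lock.take_ticket()
order = []
for _ in range(3):
    for name, ticket in (("C", c), ("B", b), ("A", a)):
        if lock.holds_lock(ticket):
            order.append(name)
            lock.release(ticket)
            break

# Failure case: y takes a number, then fails and never releases the lock,
# so z's number is never served. Nothing in the basic lock detects this.
stalled = DeliLock()
x, y, z = (stalled.take_ticket() for _ in range(3))
stalled.release(x)
z_blocked = not stalled.holds_lock(z)
```

The second scenario shows why the basic lock needs validation and override features: once a queued requester fails, every requester behind it waits indefinitely.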
A lock is needed which supports multiple locking modes and makes provision both for validation features to detect protocol violations and lock override procedures to manage the violations without unduly reducing system efficiency, and which also meets desirable design criteria for fairness, wait time minimization and guaranteed access.
In accordance with the present invention, a lock mechanism for managing a shared resource in a data processing system is provided. The lock mechanism includes a main lock data structure which provides, in a single atomic structure, the resources needed to lock the shared resource, to identify one of at least two lock modes, to establish a queue of unsuccessful lock requesters, and to validate the existence of the lock. Resources are also provided to validate the identity of the successful lock requestor in connection with certain transactions. This combination allows a processor, in a single transaction, to validate the main lock data structure, to request a lock, to take the lock and, in one aspect, to establish a lock mode if its request is successful and, in another aspect, to establish a place in a queue of requesters for subsequent locks on the shared resource if its request is unsuccessful. System overhead is thereby significantly reduced.
In one aspect of the invention, a more efficient intelligent storage system is provided. The intelligent storage system typically includes multiple processors as requesters, and these are coupled to a shared resource through one or more first common communication channels. Each processor supports atomic operations. A lock services procedure is implemented in each of the processors. A main lock data structure, responsive to these lock services procedures, is implemented in a shared memory accessible over one or more second common communications channels to all of the processors. The main lock data structure provides, in a single atomic structure, the resources needed to lock the shared resource by a successful requester, to identify one of at least two lock modes, to establish a queue of unsuccessful lock requestors, and to validate the existence of the lock. Resources are also provided to validate the identity of the successful lock requestor in connection with certain transactions. Each requesting processor is operable in accordance with its lock services procedure, in a single atomic operation, to examine the contents of the main lock data structure to determine if another requesting processor has previously locked the shared resource and if the lock contents are valid, and then to take one of two actions. If the lock contents are valid and some other requesting processor has previously locked the shared resource, the requesting processor writes data to the main lock data structure to establish its place in a queue of requesters for subsequent locks on the shared resource. If the contents are invalid or no other requesting processor has previously locked the shared resource, the requesting processor writes data to the main lock data structure to reserve and validate the lock.
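One way to picture a "single atomic structure" is as a set of fields packed into one machine word, so that all of them can be read or written together. The sketch below is illustrative only: the field names (`mode`, `password`, `current_holder`, `next_free`), the 16-bit field width, and the 64-bit word size are assumptions introduced here and are not taken from the text above.

```python
from collections import namedtuple

# Hypothetical layout: four 16-bit fields packed into one 64-bit word.
#   mode            which lock mode the holder selected
#   password        marker used to validate that the lock contents are genuine
#   current_holder  queue number of the requester that holds the lock
#   next_free       next queue number available to an arriving requester
MainLock = namedtuple("MainLock", "mode password current_holder next_free")

FIELD_BITS = 16
FIELD_MASK = (1 << FIELD_BITS) - 1


def pack(lock):
    """Combine the four fields into a single integer word (mode is highest)."""
    word = 0
    for value in lock:
        if not 0 <= value <= FIELD_MASK:
            raise ValueError("field does not fit in 16 bits")
        word = (word << FIELD_BITS) | value
    return word


def unpack(word):
    """Split a word back into its four fields."""
    fields = []
    for _ in range(4):
        fields.append(word & FIELD_MASK)  # lowest field first: next_free
        word >>= FIELD_BITS
    return MainLock(*reversed(fields))


example = MainLock(mode=1, password=0xBEEF, current_holder=3, next_free=5)
word = pack(example)
round_trip = unpack(word)
```

Because every field lives in the same word, a processor that supports atomic operations on a word of that size could inspect and update the lock state, the mode, the queue position, and the validation marker in one transaction.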
In one aspect, the lock services procedure also includes at least two lock mode procedures, and a lock mode selection procedure for selecting one from the lock mode procedures by a successful lock requester. Each requesting processor is operable in accordance with its lock services procedure to select one from the lock modes, and to write data to the main lock data structure to identify the selected lock mode so that the shared resource may be locked in a selected one of at least two lock modes. Data may be written to the main lock data structure to identify the selected lock mode in the same atomic operation which writes data to the main lock data structure to reserve and validate the lock.
Another aspect of the invention provides a method for providing queued, multimodal, self-validating locking and unlocking services for managing a shared resource in a data processing system. The system includes a plurality of processors as lock requesters. Each processor supports atomic operations and is coupled to the shared resource through one or more first common communication channels. The method includes providing, for each shared resource, an associated main lock data structure stored in a shared memory accessible by a plurality of processors as requesters. The main lock data structure includes, in a single atomic structure, the resources needed to lock the shared resource, to identify one of at least two lock modes, to establish a queue of unsuccessful lock requesters, and to validate the existence of the lock. Resources are also provided to validate the identity of the successful lock requestor in connection with certain transactions. The method also includes providing for each processor a lock services procedure including a queuing procedure for unsuccessful lock requesters, and locking and unlocking procedures for locking and unlocking the shared resource in the selected lock mode by a successful lock requestor. In one aspect, the lock services procedure also includes at least two lock mode procedures, and a lock mode selection procedure for selecting one from the lock modes by a successful lock requester.
The method also includes, in a single atomic operation by one of the requesting processors, examining the contents of the main lock data structure to determine if another requesting processor has previously locked the shared resource and if the lock contents are valid, and either, if the lock contents are valid and some other requesting processor has previously locked the shared resource, writing data to the main lock data structure to establish its place in a queue of requesters for subsequent locks on the shared resource, or if the contents are invalid or no other requesting processor has previously locked the shared resource, writing data to the main lock data structure to reserve and validate the lock. In one aspect, the requesting processor may select one from the lock modes in accordance with its lock services procedure and write data to the main lock data structure to lock the shared resource in a selected one of at least two lock modes either as a part of this atomic operation or in a subsequent atomic operation.
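The examine-then-write step described above can be sketched with a compare-and-swap loop. This is a minimal simulation under stated assumptions, not the claimed implementation: the `SharedWord` class uses a `threading.Lock` to stand in for a hardware atomic operation on shared memory, a named tuple stands in for the atomic structure, and the field names, the `VALID_PASSWORD` marker, and the convention that the lock is free when `current_holder == next_free` are all hypothetical. Queue-number wraparound is ignored.

```python
import threading
from collections import namedtuple

MainLock = namedtuple("MainLock", "mode password current_holder next_free")
VALID_PASSWORD = 0xA5A5  # hypothetical marker meaning "lock contents are valid"


class SharedWord:
    """Stand-in for a shared-memory word with an atomic compare-and-swap."""

    def __init__(self, value):
        self._value = value
        self._guard = threading.Lock()  # simulates hardware atomicity

    def load(self):
        with self._guard:
            return self._value

    def compare_and_swap(self, expected, new):
        """Write `new` only if the word still equals `expected`."""
        with self._guard:
            if self._value != expected:
                return False
            self._value = new
            return True


def request_lock(shared, mode):
    """Examine and update the lock as one atomic step; returns (outcome, number)."""
    while True:
        old = shared.load()
        valid = old.password == VALID_PASSWORD
        held = old.current_holder != old.next_free
        if valid and held:
            # Valid lock held by another requester: take the next place in line.
            number = old.next_free
            new = old._replace(next_free=number + 1)
            outcome = "queued"
        elif valid:
            # Valid but unheld: reserve the lock and record the selected mode.
            number = old.next_free
            new = MainLock(mode, VALID_PASSWORD, number, number + 1)
            outcome = "locked"
        else:
            # Invalid contents: reinitialize, reserve, and validate the lock.
            number = 0
            new = MainLock(mode, VALID_PASSWORD, 0, 1)
            outcome = "locked"
        if shared.compare_and_swap(old, new):
            return outcome, number
        # Another requester changed the word in the meantime; retry.


shared = SharedWord(MainLock(0, 0, 0, 0))   # starts with invalid contents
first = request_lock(shared, mode=2)        # reserves and validates the lock
second = request_lock(shared, mode=1)       # lock is held, so this one queues
final_state = shared.load()
```

In this sketch the mode is written in the same atomic step that reserves the lock, which corresponds to the first of the two alternatives described above; writing the mode in a subsequent atomic operation would be an equally valid variation.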
In yet another aspect of the invention, multiple processes running on a single processor may in some aspects act as requestors, and a lock allocation process or procedure may be invoked by each of these processes, but the operation of the invention is otherwise as described above.