This invention relates generally to a method and apparatus for improving performance in systems where multiple processors contend for control of a shared resource through a lock associated with the shared resource, and more particularly to a method and apparatus for improving performance in intelligent data storage systems.
When a computer system resource is shared by multiple processes running on multiple processors, or even on one processor, often there must be some way of ensuring that no more than one such process may access that resource at any one time. In designing complex data storage systems including multiple processors, synchronizing access to shared resources has been recognized as an issue which must be addressed in order to maintain the consistency and validity of the data. However, the sharing issue may arise in connection with almost any resource that might be used by multiple requestors.
Many high-performance storage systems are intelligent data storage systems which may be accessible by multiple host computers. These may include, in addition to one or more storage device arrays, a number of intelligent controllers for controlling the various aspects of the data transfers associated with the storage system. In such systems, host controllers may provide the interface between the host computers and the storage system, and device controllers may be used to manage the transfer of data to and from an associated array of storage devices (e.g., disk drives). Often, the arrays may be accessed by multiple hosts and controllers. In addition, advanced storage systems, such as the SYMMETRIX® storage systems manufactured by EMC Corporation, generally include a global memory which is typically shared by the controllers in the system. The memory may be used as a staging area (or cache) for the data transfers between the storage devices and the host computers and may provide a communications path which buffers data transfer between the various controllers. Various communication channels, such as busses, backplanes or networks, link the controllers to one another and the global memory, the host controllers to the host computers, and the disk controllers to the storage devices. Such systems are described, for example, in Yanai et al., U.S. Pat. No. 5,206,939 issued Apr. 27, 1993 (hereinafter "the '939 patent"), Yanai et al., U.S. Pat. No. 5,381,539 issued Jan. 10, 1995 (hereinafter "the '539 patent"), Vishlitzky et al., U.S. Pat. No. 5,592,492 issued Jan. 7, 1997 (hereinafter "the '492 patent"), Yanai et al., U.S. Pat. No. 5,664,144 issued Sept. 2, 1997 (hereinafter "the '144 patent"), and Vishlitzky et al., U.S. Pat. No. 5,787,473 issued Jul. 28, 1998 (hereinafter "the '473 patent"), all of which are herein incorporated in their entirety by reference.
The systems described therein allow the controllers to act independently to perform different processing tasks and provide for distributed management of the global memory resources by the controllers. This high degree of parallelism permits improved efficiency in processing I/O tasks. Since each of the controllers may act independently, there may be contention for certain of the shared memory resources within the system. In these systems, the consistency of the data contained in some portions of global memory may be maintained by requiring each controller to lock those data structures which require consistency while it is performing any operations on them which are supposed to be atomic.
Since locking inherently reduces the parallelism of the system and puts a high load on system resources, locking procedures must be designed with care to preserve system efficiency. Adding features to the lock, such as queuing, lock override procedures, or multimodality can help to avoid some pitfalls of common lock protocols, such as processor starvation, deadlocks, livelocks and convoys. However, it is also known that, while many of these lock features have individual advantages, multifeatured lock management procedures are difficult to design and implement without unduly burdening system resources or inadvertently introducing pitfalls such as additional deadlock or starvation situations. For example, multimodal locks, which permit the requestor to identify the kind of resource access desired by the requestor and the degree of resource sharing which its transaction can tolerate, can be useful in improving system performance and avoiding deadlocks, but providing a lock override which is suitable for a multimodal lock is quite difficult. If, for example, one lock mode is set to allow unusually long transactions, a timeout set to accommodate normal transactions will cut the long ones off in midstream while a timeout set to accommodate the long transactions will allow failures occurring during normal transactions to go undetected for excessively long periods. Moreover, timeouts are competitive procedures which, in certain circumstances, undesirably offset the cooperative advantages of a queued lock. Because of the complexities introduced by multifeatured locks, it is desirable to validate features and modes which create particularly significant drains on system resources, such as long timeout modes, but introducing additional validation features can itself load system resources to the point where the system efficiency suffers.
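By way of illustration only, the timeout dilemma described above can be worked through with a few numbers. The durations, the function name holder_outcome, and the idea of keying the timeout to the lock mode are assumptions introduced for this sketch and are not taken from any particular lock implementation.

```python
# Hypothetical transaction durations in milliseconds (illustrative figures only).
NORMAL_TXN_MS = 20
LONG_TXN_MS = 2_000

def holder_outcome(timeout_ms, txn_ms, holder_failed):
    """Describe what a waiting requestor observes for one lock holder."""
    if holder_failed:
        # A failed holder is only noticed when the timeout expires.
        return f"failure noticed after {timeout_ms} ms"
    if txn_ms > timeout_ms:
        # A healthy but slow holder is overridden in midstream.
        return f"healthy transaction cut off at {timeout_ms} ms"
    return "completes normally"

short_timeout = 2 * NORMAL_TXN_MS   # sized for normal transactions
long_timeout = 2 * LONG_TXN_MS      # sized for long transactions

# A single short timeout cuts long transactions off.
cut_off = holder_outcome(short_timeout, LONG_TXN_MS, holder_failed=False)
# A single long timeout leaves failures in normal transactions undetected for a long time.
slow_detect = holder_outcome(long_timeout, NORMAL_TXN_MS, holder_failed=True)

# Assumption: selecting the timeout by lock mode avoids both outcomes.
per_mode = {"normal": short_timeout, "long": long_timeout}
ok_long = holder_outcome(per_mode["long"], LONG_TXN_MS, holder_failed=False)
quick_detect = holder_outcome(per_mode["normal"], NORMAL_TXN_MS, holder_failed=True)
```

The sketch shows only the arithmetic of the tradeoff; it does not address the cost of validating which mode is in force, which is the burden the passage above identifies.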
Providing suitable procedures becomes especially difficult in complex multiprocessor systems which may contain a number of queued locks associated with different shared resources and where a requestor may have to progress through a number of lock request queues in turn in order to complete a process. In these systems, it is desirable that whatever procedure is implemented be fair, ensure that each requestor eventually obtains access to the lock whether or not all other requestors in the system are operating properly, and minimize the average waiting time for each requestor in the queue to improve system efficiency. Queued locks which implement a first-in-first-out (FIFO) protocol meet the fairness criteria because denied requests are queued in the order they are received. One such lock services procedure, often known as the "bakery" or "deli" algorithm, is described, for example, in "Resource Allocation with Immunity to Limited Process Failure", Michael J. Fischer, Nancy A. Lynch, James E. Burns, and Alan Borodin, 20th Annual Symposium on Foundations of Computer Science, San Juan, Puerto Rico, October '79, p 234-254; and "Distributed FIFO Allocation of Identical Resources Using Small Shared Space", ACM Transactions on Programming Languages and Systems, January '89, 11(1): 90-114. When all requestors in the system are operating properly, the basic "deli" algorithm also meets the other criteria, but a protocol violation such as the failure of any processor in the lock request queue can lead to total system deadlock. However, in all complex multiprocessor systems, occasional protocol violations are inevitable, and the "deli" algorithm makes no provision either for detecting these through validation procedures or otherwise, or for handling them when they occur. Moreover, the basic "deli" lock is a unimodal lock.
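By way of illustration only, the FIFO behavior of a take-a-number lock, and its sensitivity to a holder that fails without releasing, might be sketched as follows. This is a simplified ticket lock and not the algorithm of the cited papers; the class TicketLock, its field names, and the use of a Python threading.Lock to stand in for hardware atomic operations are all assumptions.

```python
import threading

class TicketLock:
    """Simplified FIFO 'take a number' lock (hypothetical names throughout)."""

    def __init__(self):
        self._guard = threading.Lock()  # stands in for hardware atomicity
        self.next_ticket = 0            # next number to hand out
        self.now_serving = 0            # number currently entitled to the lock

    def take_ticket(self):
        # Requests are numbered in arrival order, which gives FIFO fairness.
        with self._guard:
            ticket = self.next_ticket
            self.next_ticket += 1
            return ticket

    def may_enter(self, ticket):
        with self._guard:
            return ticket == self.now_serving

    def release(self, ticket):
        with self._guard:
            if ticket != self.now_serving:
                raise RuntimeError("release attempted by a non-holder")
            self.now_serving += 1

lock = TicketLock()
a, b, c = lock.take_ticket(), lock.take_ticket(), lock.take_ticket()

# Requestor A holds the lock and releases it normally.
a_entered = lock.may_enter(a)
lock.release(a)

# Requestor B now holds the lock but "fails": it never calls release().
b_entered = lock.may_enter(b)

# Requestor C can never enter, and nothing in the lock detects or repairs this.
c_blocked = not lock.may_enter(c)
```

In this sketch nothing ever advances now_serving past B's ticket, which is the deadlock the passage above attributes to a protocol violation by a queued processor.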
A lock is needed which supports multiple locking modes and makes provision both for validation features to detect protocol violations and lock override procedures to manage the violations without unduly reducing system efficiency, and which also meets desirable design criteria for fairness, wait time minimization and guaranteed access.
In accordance with the present invention, a lock mechanism for managing shared resources in a data processing system is provided.
In accordance with the present invention, a method for providing queued locking and unlocking services for a shared resource is provided. The services include a cooperative lock override procedure. In one aspect, the locking services are multimodal and the cooperative lock override procedure is selectively associated with a lock mode.
In another aspect of the invention, a method for providing self-validating, queued lock services for managing a shared resource in a data processing system includes providing a cooperative lock override procedure. The data processing system includes a plurality of processors as lock requestors. Each processor supports atomic operations and is coupled to the shared resource through one or more first common communication channels. The method includes providing for each shared resource an associated main lock data structure stored in a shared memory accessible by the plurality of processors. The main lock data structure includes, in a single atomic structure, the resources needed to lock the shared resource by a successful lock requestor, to establish a queue of unsuccessful lock requestors, and to validate the existence of the lock. Resources are also provided to validate the identity of the successful lock requestor in connection with certain transactions. The method also includes providing for each shared resource an associated auxiliary lock data structure stored in a shared memory accessible by the plurality of processors. The auxiliary lock data structure may be a single entry, the entry being a single atomic structure, or it may be an array which includes an entry for each processor, each entry being a single atomic structure. Each entry includes the resources needed to identify the successful lock requestor's place in a queue of requestors and to identify the successful lock requestor. Each entry may also include the resources needed to save a timestamp as a reference value. The method also includes providing for each processor a monitoring procedure for detecting a predetermined indication of protocol failure by any one of the plurality of processors and identifying the failing processor.
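By way of illustration only, a main lock data structure and an auxiliary lock entry of the kind described above might be laid out as in the following sketch. Every field name, the bit widths, and the marker value LOCK_PASSWORD are assumptions made for the example; the text does not prescribe a layout. The main lock is packed into a single integer word so that it could, in principle, be read and rewritten in one atomic operation.

```python
from dataclasses import dataclass

LOCK_PASSWORD = 0xA5  # hypothetical marker showing the lock contents are valid

@dataclass(frozen=True)
class MainLock:
    password: int        # validates the existence of the lock
    current_holder: int  # queue number of the successful lock requestor
    next_free: int       # next queue number to hand to a requestor

    def pack(self):
        """Pack all fields into one word (assumed 8/16/16-bit fields)."""
        return (self.password << 32) | (self.current_holder << 16) | self.next_free

    @staticmethod
    def unpack(word):
        return MainLock((word >> 32) & 0xFF, (word >> 16) & 0xFFFF, word & 0xFFFF)

    def is_valid(self):
        return self.password == LOCK_PASSWORD

    def is_locked(self):
        # Held or queued-for whenever numbers have been issued but not yet served.
        return self.is_valid() and self.current_holder != self.next_free

@dataclass(frozen=True)
class AuxEntry:
    holder_id: int      # identifies the successful lock requestor
    holder_number: int  # that requestor's place in the queue
    timestamp: int      # optional reference value saved when the lock was taken

main = MainLock(LOCK_PASSWORD, current_holder=3, next_free=5)
round_trip = MainLock.unpack(main.pack())

# Either a single auxiliary entry or one entry per processor (four assumed here).
aux_array = [AuxEntry(holder_id=0, holder_number=0, timestamp=0) for _ in range(4)]
```

In this layout an all-zero word fails the password check, so uninitialized or corrupted memory is treated as an invalid lock rather than as a free one.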
The method also includes providing for each processor a lock services procedure including a queuing procedure for unsuccessful lock requestors, locking and unlocking procedures for locking and unlocking the shared resource by a successful lock requestor, and a cooperative lock override procedure responsive to the detection of the predetermined indication of protocol failure. The method also includes detecting, by one of the processors, one of these predetermined indications of protocol failure and identifying the failing processor. The method also includes, in a single atomic operation, examining the contents of the auxiliary lock data structure by the detecting processor to determine whether the identified failing processor is the successful lock requestor, and either, if the identified failing processor is the successful lock requestor, in a single atomic operation by the detecting processor, examining the contents of the main lock data structure and writing data to the main lock data structure to reserve the lock to the next requestor in the queue of unsuccessful lock requestors and to revalidate the lock, or, if the identified failing processor is not the successful lock requestor, exiting the cooperative lock override procedure.
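By way of illustration only, the cooperative lock override steps recited above might be sketched as follows. The AtomicCell class simulates an atomic compare-and-swap with a Python threading.Lock; the packed field layout, the PASSWORD marker, and the function name cooperative_override are assumptions and not part of the text.

```python
import threading

class AtomicCell:
    """Simulated atomic cell; a real system would use a hardware atomic operation."""

    def __init__(self, value):
        self._value = value
        self._guard = threading.Lock()

    def load(self):
        with self._guard:
            return self._value

    def compare_and_swap(self, expected, new):
        with self._guard:
            if self._value != expected:
                return False
            self._value = new
            return True

PASSWORD = 0xA5  # hypothetical validity marker

def pack(password, holder, next_free):
    return (password << 32) | (holder << 16) | next_free

def unpack(word):
    return (word >> 32) & 0xFF, (word >> 16) & 0xFFFF, word & 0xFFFF

def cooperative_override(main, aux, failing_id):
    """Run by the detecting processor; returns True if the lock was passed on."""
    # One atomic read of the auxiliary entry: who holds the lock, and with what number?
    holder_id, holder_number = aux.load()
    if holder_id != failing_id:
        return False  # the failing processor is not the holder: exit the procedure

    old = main.load()
    _, current, next_free = unpack(old)
    if current != holder_number:
        return False  # the lock has already moved on since the failure was detected

    # Reserve the lock to the next requestor in the queue and revalidate it.
    new = pack(PASSWORD, (current + 1) & 0xFFFF, next_free)
    return main.compare_and_swap(old, new)

main = AtomicCell(pack(PASSWORD, 3, 6))   # number 3 holds; 4 and 5 are queued
aux = AtomicCell((7, 3))                  # processor 7 holds with queue number 3

wrong_target = cooperative_override(main, aux, failing_id=9)
first_override = cooperative_override(main, aux, failing_id=7)
repeat_override = cooperative_override(main, aux, failing_id=7)
state_after = unpack(main.load())
```

Because the write to the main lock is a compare-and-swap against the value just examined, a second detecting processor that races the first finds the lock already advanced and exits without skipping a further requestor.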
Prior to the step of examining the contents of the main lock data structure by the detecting processor, one of the requesting processors may, in a single atomic operation, examine the contents of the main lock data structure to determine if another requesting processor has previously locked the shared resource and if the lock contents are valid, and if it determines that the contents are invalid or no other requesting processor has previously locked the shared resource, it may write data to the main lock data structure to reserve and validate the lock.
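By way of illustration only, the lock request step described above, in which a requestor examines the main lock and either takes it or joins the queue, might be sketched as follows. The AtomicWord class simulates an atomic compare-and-swap with a Python threading.Lock; the field layout, the PASSWORD marker, and the function name request_lock are assumptions.

```python
import threading

class AtomicWord:
    """Simulated atomic word; a real system would use a hardware atomic operation."""

    def __init__(self, value=0):
        self._value = value
        self._guard = threading.Lock()

    def load(self):
        with self._guard:
            return self._value

    def compare_and_swap(self, expected, new):
        with self._guard:
            if self._value != expected:
                return False
            self._value = new
            return True

PASSWORD = 0xA5  # hypothetical marker showing the lock contents are valid

def pack(password, holder, next_free):
    return (password << 32) | (holder << 16) | next_free

def unpack(word):
    return (word >> 32) & 0xFF, (word >> 16) & 0xFFFF, word & 0xFFFF

def request_lock(main):
    """Return (queue_number, acquired) for the calling requestor."""
    while True:
        old = main.load()
        password, holder, next_free = unpack(old)
        if password != PASSWORD:
            # Contents invalid: reserve the lock and validate it in the same write.
            number, new = 0, pack(PASSWORD, 0, 1)
        else:
            # Contents valid: take the next number. The caller holds the lock
            # only if no other requestor currently holds it or is queued.
            number = next_free
            new = pack(PASSWORD, holder, (next_free + 1) & 0xFFFF)
        if main.compare_and_swap(old, new):
            return number, number == unpack(new)[1]
        # Another requestor changed the word first; examine it again.

main = AtomicWord(0)              # uninitialized, therefore invalid
first = request_lock(main)        # finds invalid contents and takes the lock
second = request_lock(main)       # finds a valid, held lock and is queued
state = unpack(main.load())
```

The examination and the write form one compare-and-swap, so two requestors that examine the same invalid or free lock cannot both conclude that they hold it.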
In one aspect of the invention, the lock services procedure further includes at least two lock mode procedures and a lock mode selection procedure for selecting one from the lock mode procedures by a successful lock requestor. The locking and unlocking procedures include one or more procedures for locking and unlocking the shared resource in the selected lock mode by a successful lock requestor and the cooperative lock override procedure is selectively associated with a lock mode. The atomic main lock data structure further includes the resources needed to identify one of the lock modes and the auxiliary lock data structure further includes the resources needed to identify one from the lock modes. In examining the contents of the main lock data structure, the detecting processor may, in the same atomic operation, verify that the identified lock mode is a lock mode associated with the cooperative lock override procedure and in writing data to the main lock data structure to reserve the lock to the next requestor in the queue of unsuccessful lock requestors the detecting processor may, in the same atomic operation, invalidate the identified lock mode.
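By way of illustration only, a mode-aware override of the kind described above might be sketched as follows. The mode constants, the choice to associate the override with a single "long" mode, the field layout, and the function name override_in_mode are all assumptions; the AtomicWord class simulates an atomic compare-and-swap with a Python threading.Lock.

```python
import threading

class AtomicWord:
    """Simulated atomic word; a real system would use a hardware atomic operation."""

    def __init__(self, value=0):
        self._value = value
        self._guard = threading.Lock()

    def load(self):
        with self._guard:
            return self._value

    def compare_and_swap(self, expected, new):
        with self._guard:
            if self._value != expected:
                return False
            self._value = new
            return True

PASSWORD = 0xA5                               # hypothetical validity marker
MODE_NONE, MODE_SHORT, MODE_LONG = 0, 1, 2    # hypothetical lock modes
OVERRIDE_MODES = {MODE_LONG}                  # assumption: override applies to one mode

def pack(mode, password, holder, next_free):
    return (mode << 40) | (password << 32) | (holder << 16) | next_free

def unpack(word):
    return ((word >> 40) & 0xFF, (word >> 32) & 0xFF,
            (word >> 16) & 0xFFFF, word & 0xFFFF)

def override_in_mode(main, failing_number):
    """Pass the lock on only if it is held in a mode the override is associated with."""
    old = main.load()
    mode, _, current, next_free = unpack(old)
    if mode not in OVERRIDE_MODES or current != failing_number:
        return False
    # Same write: advance to the next requestor, revalidate, and invalidate the mode.
    new = pack(MODE_NONE, PASSWORD, (current + 1) & 0xFFFF, next_free)
    return main.compare_and_swap(old, new)

long_lock = AtomicWord(pack(MODE_LONG, PASSWORD, 3, 6))
short_lock = AtomicWord(pack(MODE_SHORT, PASSWORD, 3, 6))

long_result = override_in_mode(long_lock, failing_number=3)
short_result = override_in_mode(short_lock, failing_number=3)
long_state = unpack(long_lock.load())
short_state = unpack(short_lock.load())
```

Keeping the mode in the same word as the queue numbers lets the mode check and the mode invalidation ride on the single compare-and-swap that passes the lock on, so no separate validation step is needed.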
In another aspect, the invention provides an intelligent data storage system. The intelligent storage system typically includes multiple processors as requestors, and these are coupled to a shared resource through one or more first common communication channels. The system also includes a shared memory accessible over one or more second common communications channels to all of the processors. Each processor supports atomic operations. Each processor implements a monitoring procedure for detecting a predetermined indication of protocol failure by any one of the plurality of processors and identifying the failing processor. A lock services procedure is also implemented in each of the processors. The lock services procedure includes a queuing procedure for unsuccessful lock requestors, locking and unlocking procedures for locking and unlocking the shared resource by a successful lock requestor, and a cooperative lock override procedure responsive to the detection of the predetermined indication of protocol failure. An atomic main lock data structure, responsive to the lock services procedures, is implemented in the shared memory and associated with the shared resource. The main lock data structure includes the resources needed to lock a shared resource by a successful lock requestor, to establish a place in a queue of unsuccessful lock requestors, and to validate the existence of the lock. An atomic auxiliary lock data structure, responsive to the lock services procedures, is also implemented in the shared memory and associated with the shared resource. The auxiliary lock data structure includes the resources needed to identify the successful lock requestor's place in a queue of requestors and to identify the successful lock requestor. Each processor is operable in accordance with its monitoring procedure to detect a predetermined indication of protocol failure and identify the failing processor.
Each processor is also operable in accordance with its lock services procedure, first to initiate its cooperative lock override procedure responsive to its detection of the predetermined indication of protocol failure, and then in a single atomic operation, to examine the contents of the auxiliary lock data structure to determine if the identified failing processor is the successful lock requestor, and either, if the identified failing processor is the successful lock requestor, in a single atomic operation, to examine the contents of the main lock data structure and write data to the main lock data structure to reserve the lock to the next requestor in the queue of unsuccessful lock requestors and to revalidate the lock, or, if the identified failing processor is not the successful lock requestor, to exit the cooperative lock override procedure.
Each of the requesting processors is also operable in accordance with its lock services procedure, in a single atomic operation, to examine the contents of the main lock data structure to determine if another requesting processor has previously locked the shared resource and if the lock contents are valid, and if it determines that the contents are invalid or no other requesting processor has previously locked the shared resource, it may write data to the main lock data structure to reserve and validate the lock.
In one aspect of the invention, the lock services procedure further includes at least two lock mode procedures and a lock mode selection procedure for selecting one from the lock mode procedures by a successful lock requestor. The locking and unlocking procedures include one or more procedures for locking and unlocking the shared resource in the selected lock mode by a successful lock requestor, and the cooperative lock override procedure is selectively associated with a lock mode. The atomic main lock data structure further includes the resources needed to identify one of the lock modes. In examining the contents of the main lock data structure, the detecting processor may, in the same atomic operation, verify that the identified lock mode is the lock mode associated with the cooperative lock override procedure, and in writing data to the main lock data structure to reserve the lock to the next requestor in the queue of unsuccessful lock requestors, the detecting processor may, in the same atomic operation, invalidate the identified lock mode.
In yet another aspect of the invention, multiple processes running on a single processor may in some aspects act as requestors, and a lock allocation process or procedure may be invoked by each of these processes, but the operation of the invention is otherwise as described above.