1. Field of the Invention
The present disclosure relates to a robust mechanism for assuring integrity and availability of data in the event of one or more failures of nodes and/or resources of a distributed system. More particularly, it deals with lock management and page registration for resources shared in a distributed system.
2. Description of the Related Art
Distributed systems, in particular data processing systems, have long used “locking” as a means of ensuring data integrity. At its most fundamental level, locking a resource is a technique by which a process prevents other processes from using the resource until the locking process is finished with it.
As an example of a distributed system including locking mechanisms, shared data cluster database systems allow read and write transactions to be directly executed on multiple computers in a cluster. The cluster database system provides a global locking mechanism to coordinate access to the data, and to ensure that updates are coherently performed.
In a shared data cluster database system, the global locking function may be implemented by a separate processing element in the cluster, referred to as the ‘primary coherency processing element’ (PCPE). The PCPE runs on its own dedicated computer or, alternatively, on the same computer as a database node. The PCPE may not run database transactions; instead, it may provide a global locking function and other functions that assist in the overall coherency of the cluster database, such as page registration services. In such systems, if a database node fails, the PCPE remains available to service new lock requests, and only the locks held by the failed host are unavailable until the failed host recovers.
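By way of illustration only, the failure behavior described above may be sketched as follows. The sketch is a simplified, single-process model and not an actual PCPE implementation. The class name GlobalLockManager, its method names, the node and page identifiers, the restriction to exclusive locks, and the release of retained locks once the failed host recovers are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class GlobalLockManager:
    """Hypothetical stand-in for the global locking function of a PCPE."""

    # Maps a resource name to the node currently holding its exclusive lock.
    owners: dict = field(default_factory=dict)
    # Nodes that have failed and have not yet recovered.
    failed: set = field(default_factory=set)

    def acquire(self, node: str, resource: str) -> bool:
        """Grant the lock if the resource is free or already held by node."""
        if node in self.failed:
            return False  # a failed node cannot obtain locks
        holder = self.owners.get(resource)
        if holder is None or holder == node:
            self.owners[resource] = node
            return True
        return False  # held by another node, possibly a failed one

    def release(self, node: str, resource: str) -> None:
        """Release the lock only if node is the current holder."""
        if self.owners.get(resource) == node:
            del self.owners[resource]

    def node_failed(self, node: str) -> None:
        """Mark a node as failed; its locks are retained, not released."""
        self.failed.add(node)

    def node_recovered(self, node: str) -> None:
        """Assumed policy: free the node's retained locks after recovery."""
        self.failed.discard(node)
        for resource in [r for r, n in self.owners.items() if n == node]:
            del self.owners[resource]


pcpe = GlobalLockManager()
pcpe.acquire("node_a", "page_1")          # node_a locks page_1
pcpe.node_failed("node_a")                # node_a fails while holding the lock
ok_other = pcpe.acquire("node_b", "page_2")    # unrelated resource: granted
blocked = pcpe.acquire("node_b", "page_1")     # held by failed node: refused
pcpe.node_recovered("node_a")
ok_after = pcpe.acquire("node_b", "page_1")    # granted after recovery
```

In this sketch the lock manager continues to grant requests for resources not held by the failed node, while requests for resources locked by the failed node are refused until recovery is signaled.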