1. Field of the Invention
The present invention relates to computer systems and methods in which common data are shared among nodes in a clustered data processing system while preserving data integrity and consistency. More particularly, the invention concerns an improved data consistency technique that does not require distributed locks or ad hoc messaging protocols and in which data read operations are able to run concurrently with data update operations without requiring locks or messages, thereby greatly improving their performance.
2. Description of the Prior Art
By way of background, a clustered data processing system represents a computational environment in which plural discrete computers (referred to as nodes) cooperate to provide a common data processing service or function. It is often the case that the nodes of a cluster cooperatively share mutable data that must remain in a consistent state throughout the cluster, yet can be manipulated locally at each node. For example, in a distributed database system, database server nodes managing a distributed pool of data storage must each maintain a consistent view of which server nodes are currently members of the cluster. It is necessary that each node be aware of state changes occurring at other nodes and that such state changes be coordinated among the nodes. By way of further example, in a distributed lock manager system, a flag can be used to indicate when a node is in a recovery mode following a node failure (and the system is attempting to recover the failed node's previous locking state). Lock requesters within the cluster that seek to acquire locks should see a consistent view of the flag, so that they are aware the recovery mode is in force and not a normal operational mode.
In the past, clustered systems have used globally mediated locks or leases to mediate access to shared mutable data. However, processes acquiring these locks or leases must incur substantial overhead. In cases where the data are rarely modified, this overhead is largely wasted. There are a number of methods for overlapping the latency of lock/lease acquisition with that of actual disk I/O (Input/Output). These are so-called “optimistic locking” techniques, in which processes perform data updates under the assumption that any commit may fail because at least one of the data objects being committed has been changed by another process since the transaction began. In contrast, under so-called “pessimistic locking,” a process explicitly obtains a lock before performing any update transaction. There are also timestamping and versioning techniques for maintaining data consistency, but these require that processes using a particular data version register in some way to prevent that version from being prematurely discarded. In all such cases, cluster-wide locking/leasing is required, even if the workload is entirely read-only. Although there are a number of techniques for caching locks, so that acquiring compatible locks does not require subsequent I/O, these techniques still incur the overhead of checking that the lock remains cached.
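By way of illustration only, the difference between the optimistic and pessimistic strategies described above can be sketched as follows. The sketch is a single-process analogy and not any particular prior art implementation: the class name VersionedRecord, its methods, and the use of a version counter are assumptions made for the example, and a local threading.Lock stands in for a cluster-wide lock or lease.

```python
import threading


class VersionedRecord:
    """Hypothetical shared record; a local lock stands in for a cluster-wide lock."""

    def __init__(self, value):
        self.value = value
        self.version = 0  # incremented on every successful update
        self._lock = threading.Lock()

    def read(self):
        # Return the value together with the version it was read at.
        with self._lock:
            return self.value, self.version

    def optimistic_commit(self, new_value, expected_version):
        # Optimistic: work is done without holding the lock; the commit
        # fails if another updater changed the record in the meantime.
        with self._lock:
            if self.version != expected_version:
                return False  # caller must re-read and retry
            self.value = new_value
            self.version += 1
            return True

    def pessimistic_update(self, update_fn):
        # Pessimistic: the lock is obtained first and held for the whole update.
        with self._lock:
            self.value = update_fn(self.value)
            self.version += 1
            return self.value
```

In this sketch, an optimistic updater calls read(), computes a new value, and then calls optimistic_commit() with the version it observed, retrying if the call returns False. Note that even the read path takes the lock, which mirrors the observation above that locking overhead is incurred for read-only workloads.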
As an alternative to lock/lease-oriented mutual exclusion, clustered data processing systems can also use message-based protocols, such as two-phase commit. The two-phase commit protocol is a distributed algorithm that lets nodes in a cluster agree to commit a transaction. The protocol results in either all nodes committing the transaction or all nodes aborting it. The two phases of the algorithm are a commit_request phase and a commit phase. In the commit_request phase, a node acting as a coordinator in connection with the transaction sends notification messages to all other nodes and waits for a response from each of them, in which the node either agrees to the request or asserts an abort reply. In the commit phase, if all nodes have agreed to commit, the coordinator sends a commit message, following which all of the nodes commit the transaction. Otherwise, the coordinator sends an abort message.
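By way of illustration only, the commit_request and commit phases described above can be sketched as follows. This is a minimal in-process simulation under stated assumptions: the names Participant, on_commit_request, on_commit, on_abort, and two_phase_commit are hypothetical, message exchange is modeled as direct method calls, and timeouts, logging, and failure recovery are omitted.

```python
class Participant:
    """Hypothetical cluster node taking part in a transaction."""

    def __init__(self, name, will_agree=True):
        self.name = name
        self.will_agree = will_agree  # assumed vote, fixed for the example
        self.state = "init"

    def on_commit_request(self, txn):
        # Phase 1 reply: agree to the request or assert an abort.
        self.state = "ready" if self.will_agree else "aborted"
        return self.will_agree

    def on_commit(self, txn):
        self.state = "committed"

    def on_abort(self, txn):
        self.state = "aborted"


def two_phase_commit(participants, txn):
    """Coordinator logic; each method call stands in for a network message."""
    # Phase 1 (commit_request): collect a vote from every node.
    # In a real cluster the coordinator would block here awaiting replies.
    votes = [p.on_commit_request(txn) for p in participants]

    # Phase 2 (commit): commit only if every node agreed, otherwise abort.
    if all(votes):
        for p in participants:
            p.on_commit(txn)
        return "committed"
    for p in participants:
        p.on_abort(txn)
    return "aborted"
```

For example, calling two_phase_commit() with two agreeing Participant objects leaves both in the "committed" state, whereas a single dissenting Participant causes every node to end in the "aborted" state.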
Although message-based mutual exclusion protocols are generally effective, a major disadvantage is that they tend to be blocking. For example, in the two-phase commit protocol, a node will block while waiting for a message. This means that other processes competing for resource locks held by the blocked processes will have to wait for the locks to be released. In addition, a single node will continue to wait even if all other nodes have failed. If the coordinator fails permanently, some cohorts will never resolve their transactions.
The foregoing motivates a search for a new cluster-oriented mutual exclusion technique that overcomes these problems. What is particularly needed is an improved technique that is not burdened with the overhead of managing distributed locks or leases, and that does not require extensive message exchange in which message-waiting processes block until responses are received.