Modern computer systems typically consist of a CPU to process data, a networking interface to communicate with other computer systems, and one or more durable storage units. The system may stop processing, for example, due to power failure, program incorrectness, or a hardware fault. Such failures are often called process failures. The durable storage units are able to keep the data intact while the fault is repaired.
A set of these computer systems can be networked to form a cluster. Although the network is generally reliable, occasional faults may disrupt communication between certain nodes or sets of nodes. This disruption in communication is often called a network partition.
Each of the nodes runs a transactional storage system that both reads and writes data (a database management system). Some of this data is concurrently accessed by applications operating on different nodes. To guarantee data consistency, distributed transaction and lock management techniques are used to manage and regulate access to that data. However, conventional distributed transaction and lock management techniques are associated with a number of problems.
For example, conventional systems typically exhibit poor lock transfer performance. In more detail, as several programs contend for the same shared resources, access grants must be transferred as quickly as possible from one program and/or node to the next. Access grants must guarantee fairness so that all programs have a chance to access the resource eventually, providing freedom from livelock. Existing distributed memory lock algorithms assume the primitive of an ordered unicast (also called point-to-point) messaging layer (e.g., TCP/IP) and attempt to optimize message traffic, typically by imposing a virtual hierarchical tree topology on the communication paths across the nodes. Despite improvements achieved in this area, message traffic and the associated processing costs and delays grow with the number of nodes in a cluster, so the performance of all of these algorithms degrades dramatically once the number of nodes increases beyond a relatively small number.
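By way of illustration only, the cost of routing a lock grant over a virtual tree can be sketched as a hop count between two nodes. The sketch below assumes a balanced binary tree with heap-style numbering (node 1 is the root, and node k has parent k // 2); neither the tree shape nor the function name `tree_hops` comes from any particular conventional algorithm.

```python
def tree_hops(src: int, dst: int) -> int:
    """Count the unicast messages needed to forward a lock grant from
    src to dst along the edges of a heap-numbered binary tree.

    Assumption: nodes are numbered from 1, and the parent of node k
    is k // 2. This layout is hypothetical and chosen for simplicity.
    """
    if src < 1 or dst < 1:
        raise ValueError("node numbers start at 1")
    hops = 0
    # Repeatedly move the deeper (larger-numbered) node up to its parent
    # until both walks meet at the lowest common ancestor.
    while src != dst:
        if src > dst:
            src //= 2
        else:
            dst //= 2
        hops += 1
    return hops


if __name__ == "__main__":
    # Two sibling leaves are close; leaves in opposite subtrees are not.
    for a, b in [(4, 5), (8, 15), (1, 1)]:
        print(a, b, tree_hops(a, b))
```

Under these assumptions, each hop is a separate ordered point-to-point message, so a grant passed between nodes in opposite subtrees must travel up to the root and back down.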
Conventional systems are also susceptible to network or process failure. Networks, computer systems, and the processes running on those computer systems occasionally fail. In such situations, conventional algorithms either stop working (e.g., stop granting locks on all nodes) or work incorrectly (e.g., the algorithm no longer provides the guarantees or semantics that it provides in a faultless environment, such as by granting locks on some nodes when the locks have not been released yet).
Conventional systems also exhibit poor scalability for resource ownership transfer. In most conventional algorithms, the acquisition and the release of N resources require numerous messages to successfully transfer the ownership of a lock from one node to another (worst case N*M, where M is the number of nodes, although best cases are typically log(N) or log(M)). In addition, many messages are required for hierarchical lock ownership transfer. In most conventional algorithms, the support for hierarchical locking (also called multi-granular locking) requires multiple messages for each acquisition of ownership of a distributed resource.
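The message counts mentioned above can be compared with a short calculation. This is a rough sketch only: the logarithm base is not stated, so base 2 is assumed here, and the function names `worst_case_messages` and `log_case_messages` are hypothetical.

```python
import math


def worst_case_messages(n_resources: int, m_nodes: int) -> int:
    """Worst case from the text: N*M messages for N resources and M nodes."""
    return n_resources * m_nodes


def log_case_messages(count: int) -> int:
    """Best case from the text: log(N) or log(M) messages.

    Assumption: base-2 logarithm, rounded up to a whole message.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    return math.ceil(math.log2(count)) if count > 1 else 0


if __name__ == "__main__":
    n = 10  # hypothetical number of resources
    for m in (4, 16, 64, 256):
        print(m, worst_case_messages(n, m), log_case_messages(m))
```

Under these assumptions, the worst case grows linearly with the number of nodes for a fixed number of resources, while the best case grows only logarithmically.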
What is needed, therefore, are better distributed transaction and lock management techniques.