In a database system, it is often a challenge to synchronize access to shared resources for distributed database applications. One situation in which this type of synchronization is needed is a multi-instance database system. A multi-instance database system has a shared architecture in which multiple running instances can each be used to manage a set of shared physical data files. An example of a multi-instance database system is the RAC (Real Application Clusters) product, available from Oracle Corporation of Redwood Shores, Calif., in which, in a typical scenario, each database instance resides on a separate host and has its own set of background processes and memory buffers, while the RAC infrastructure enables access to a single shared database through the multiple database instances. A synchronization mechanism is usually provided to prevent conflicts when the multiple instances seek to access the same set of shared resources.
Lock management is a common approach that is used to synchronize accesses to the shared resources. A resource corresponds to any object or entity to which shared access must be controlled. For example, the resource can be a file, a record, an area of shared memory, or anything else that can be shared by multiple processes in a system. A hierarchy of resources may be defined, so that multiple levels of locking can be implemented. For instance, a database might define a resource hierarchy as follows: Database->Table->Record (Row)->Field, in descending order. A process can then acquire locks on the database as a whole, and then on particular parts of the database. A lock must be obtained on a parent resource before a subordinate resource can be locked.
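As an illustration only, the parent-before-subordinate rule described above can be sketched as follows. The class name HierarchicalLockTable, its methods, and the use of tuples as resource paths are assumptions made for this sketch and are not taken from any particular product. The sketch also assumes a single exclusive lock mode for brevity, whereas practical systems typically provide shared and intention modes so that several processes can hold a parent resource at once.

```python
class HierarchicalLockTable:
    """Hypothetical lock table enforcing parent-before-child locking."""

    def __init__(self):
        # Maps a resource path, e.g. ("db", "orders", "row42"), to its owner.
        self._owners = {}

    def acquire(self, owner, path):
        path = tuple(path)
        parent = path[:-1]
        # A subordinate resource may be locked only if the same owner
        # already holds the lock on the parent resource.
        if parent and self._owners.get(parent) != owner:
            raise PermissionError(
                f"{owner} must lock {'->'.join(parent)} first")
        holder = self._owners.get(path)
        if holder is not None and holder != owner:
            return False  # Held by another process (exclusive mode only).
        self._owners[path] = owner
        return True

    def release(self, owner, path):
        path = tuple(path)
        # Refuse to release a parent while the owner still holds children.
        for held, held_by in self._owners.items():
            if (held_by == owner and len(held) > len(path)
                    and held[:len(path)] == path):
                raise RuntimeError("release subordinate locks first")
        if self._owners.get(path) == owner:
            del self._owners[path]
```

In this sketch, a process that calls acquire("p1", ["db", "orders"]) before locking ["db"] is rejected, which mirrors the Database->Table->Record (Row)->Field ordering.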
There are several possible approaches that can be taken to implement a lock management system. One common approach is to use a centralized lock management (CLM) service in which the locks are managed in a centralized location. The enqueues of the lock requests are centrally managed in a dedicated database (DB) component. When an instance seeks access to a given resource, the instance sends a request to the centralized lock management component to obtain a lock on the resource.
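A minimal sketch of the centralized arrangement is given below, assuming a single in-memory component that grants each resource to one instance at a time and enqueues later requesters in arrival order. The name CentralLockManager, the method names, and the first-in-first-out grant policy are all assumptions for illustration; the message transport between the instances and the central component is omitted.

```python
from collections import deque


class CentralLockManager:
    """Hypothetical central component that owns all lock state."""

    def __init__(self):
        self._holder = {}   # resource -> instance currently holding the lock
        self._waiters = {}  # resource -> deque of instances waiting for it

    def request(self, instance, resource):
        """Return True if the lock is granted, False if the request is enqueued."""
        if resource not in self._holder:
            self._holder[resource] = instance
            return True
        if self._holder[resource] == instance:
            return True  # Already held by the requester.
        queue = self._waiters.setdefault(resource, deque())
        if instance not in queue:
            queue.append(instance)
        return False

    def release(self, instance, resource):
        """Release the lock and return the next holder, or None if nobody waits."""
        if self._holder.get(resource) != instance:
            raise ValueError(f"{instance} does not hold {resource}")
        queue = self._waiters.get(resource)
        if queue:
            next_holder = queue.popleft()
            self._holder[resource] = next_holder
            return next_holder
        del self._holder[resource]
        return None
```

Because every request and release passes through the one CentralLockManager object, the sketch also makes visible why such a component can become both a throughput bottleneck and a single point of failure.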
One problem with the CLM approach is that performance bottlenecks may arise from the centralized nature of the lock management system, since all instances must go through a single centralized location to obtain locks on resources. Additionally, the CLM approach poses the risk of a single point of failure. Therefore, the centralized approach lacks the flexibility and autonomy needed by processes at each compute node in modern multi-instance database systems.
Another possibility is to use a Distributed Lock Management (DLM) approach, which permits multiple database instances to access the same database files simultaneously, with communications between the instances managed by a distributed lock manager. To address the possibility of two or more instances attempting to modify the same information simultaneously, the distributed lock manager uses multiple distributed processes (e.g., ten background processes named LCK0 through LCK9) to lock the resources in use by these instances.
However, this approach also presents many challenges. One problem is that communication between instances for the DLM often involves broadcasting enqueue messages among the compute nodes in the cluster, which imposes heavy network overhead and is particularly problematic when network transmission is unreliable. Another issue is that deadlock is often a major challenge, so a complicated deadlock detection algorithm has to be developed. In addition, the DLM approach often leads to longer latency because of the complicated coordination and deadlock detection among compute nodes in the cluster. For these reasons, this approach does not scale well to clusters with a large number of compute nodes, e.g., in the hundreds.
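To make the deadlock detection burden concrete, the following sketch shows one common technique, namely searching for a cycle in a wait-for graph by depth-first traversal. It is not presented as the algorithm used by any particular DLM. The function name find_deadlock and the dictionary representation of the graph are assumptions for illustration, and the sketch further assumes that the wait-for edges have already been gathered in one place; in a distributed system, collecting those edges from every compute node is itself a source of the coordination cost noted above.

```python
def find_deadlock(wait_for):
    """Return a list of processes forming a wait cycle, or None.

    wait_for maps each process to the set of processes it is waiting on
    (hypothetical representation chosen for this sketch).
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = {}
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for nxt in wait_for.get(node, ()):
            state = color.get(nxt, WHITE)
            if state == GREY:
                # nxt is already on the current path, so a cycle exists.
                return path[path.index(nxt):]
            if state == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for node in list(wait_for):
        if color.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None
```

For example, if process A waits on B, B waits on C, and C waits on A, the function reports the three processes as a cycle, whereas a simple chain of waits with no cycle yields None.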
Therefore, there is a need for an improved approach to synchronizing access to shared resources for distributed database applications that addresses at least these problems with the alternative approaches.