In typical database systems, users store, update and retrieve information by submitting commands to a database application. To be correctly processed, the commands must comply with the database language that is supported by the database application. One popular database language is known as Structured Query Language (SQL).
A logical unit of work that is comprised of one or more database language statements is referred to as a transaction. In a database server, a memory area called the System Global Area (SGA) is allocated and one or more processes are started to execute one or more transactions. The combination of the SGA and the processes executing transactions is called a database instance.
Database instances use resources while executing transactions. Even though resources may be shared between database instances, many resources may not be accessed in certain ways by more than one process at any given time. For example, resources such as data blocks of a storage medium or tables stored on a storage medium may be concurrently accessed in some ways (e.g. read) by multiple processes, but accessed in other ways (e.g. written to) by only one process at a time. Consequently, mechanisms have been developed which control access to resources.
One such mechanism is referred to as a lock. A lock is a data structure that indicates that a particular process has been granted certain rights with respect to a resource. There are many types of locks. Some types of locks may be shared on the same resource by many processes, while other types of locks prevent any other locks to be granted on the same resource.
The entity responsible for granting locks on resources is referred to as a lock manager. In a single node database system, a lock manager will typically consist of one or more processes on the node. In a multiple-node system, such as a multi-processing machine or a local area network, a lock manager may include processes distributed over numerous nodes. A lock manager that includes components that reside on two or more nodes is referred to as a distributed lock manager.
FIG. 1 is a block diagram of a multiple-node computer system 100. Each node has stored therein a database instance and a portion of a distributed lock management system 132. Specifically, the illustrated system includes three nodes 102, 112 and 122 on which reside database instances 104, 114 and 124, respectively, and lock manager units 106, 116 and 126, respectively. Database instances 104, 114 and 124 have access to the same database 120. The database 120 resides on a disk 118 that contains multiple blocks of data. Disk 118 generally represents one or more persistent storage devices which may be on any number of machines, including but not limited to the machines that contain nodes 102, 112 and 122.
A communication mechanism allows processes on nodes 102, 112, and 122 to communicate with each other and with the disks that contain portions of database 120. The specific communication mechanism between the nodes and disk 118 will vary based on the nature of system 100. For example, if the nodes 102, 112 and 122 correspond to workstations on a network, the communication mechanism will be different than if the nodes 102, 112 and 122 correspond to clusters of processors and memory within a multi-processing machine.
Before any of database instances 104, 114 and 124 can access a resource shared with the other database instances, it must obtain the appropriate lock on the resource from the distributed lock management system 132. Such a resource may be, for example, one or more blocks of disk 118 on which data from database 120 is stored.
Lock management system 132 stores data structures that indicate the locks held by database instances 104, 114 and 124 on the resources shared by the database instances. If one instance requests a lock on a resource while another instance has a lock on the resource, the distributed lock management system 132 must determine whether the requested lock is consistent with granted locks. If the requested lock is not consistent with granted locks, then the requester must wait until the instance holding the conflicting granted lock releases the conflicting granted lock.
According to one approach, lock management system 132 maintains one master resource object for every resource managed by lock management system 132, and includes one lock manager unit for each node that contains a database instance. The master resource object for a particular resource stores (1) an indication of all locks that have been granted on or requested for the particular resource, and (2) a VALID flag for the resource.
The master resource object for each resource resides within only one of the lock manager units 106, 116 and 126. For each resource, the lock manager units that do not manage the master resource store data that indicates (1) the name of the resource, (2) data indicating which lock manager unit maintains the master resource object of the resource, (3) a VALID flag for the resource, and (4) the lock held on the resource by the database instance that is associated with the lock manager unit.
For example, assume that three resources (R1, R2 and R3) exist in database system 100, and that the master resource objects for R1, R2 and R3 are stored in lock manager units 106, 106 and 126, respectively. Assume also that database instance 104 has been granted an exclusive mode lock on resource R1 and a shared mode lock on resource R2. Database instance 114 has been granted a shared mode lock on resource R2. Database instance 124 does not hold any shared or exclusive mode locks. FIG. 2A illustrates the information maintained in each of the lock manager units 106, 116 and 126 under these conditions.
Various types of failures may occur in a database system, including node failure and instance failure. When a database instance fails, the current state of and all data structures in the failed database instance are lost. When a node fails, all information stored in the volatile memory of the node is lost, including the current state of and all data structures in any instance that resides on the node and any portion of the distributed lock manager that resides on the node. Thus, if node 102 failed, the current state of and all data structures in both lock manager unit 106 and database instance 104 will be lost.
When a database instance fails, either pursuant to an instance failure or a node failure, a cleanup operation must be performed on the resources on which the failed instance held any type of exclusive lock. Other database instances are allowed to access those resources only after the appropriate cleanup operations are performed. All of the exclusive mode locks held by a failed instance can easily be identified if all portions of the distributed lock management system 132 continue to operate after the failure of the instance. However, instance recovery becomes more difficult if a portion of the distributed lock management system 132 also fails.
For example, assume that node 102 fails. With the failure of node 102, lock manager unit 106 and database instance 104 will crash. All of the information maintained within lock manager unit 106 will be lost. Potentially, the lost information includes many or all of the exclusive mode locks granted to database instance 104. Under the conditions illustrated in FIG. 2A, the master resource objects for R1 and R2 would be lost with the failure of node 102.
Based on the information that remains in lock manager units 116 and 126, it is possible to determine some information about the locks that were held by database instance 104 when database instance 104 failed. For example, the remaining lock manager units 116 and 126 may maintain the master resource objects of some of the locks that were held by database instance 104. In the illustrated example, lock manager unit 126 manages the master resource object for resource R3, and therefore knows that database instance 104 owned a null mode lock on R3. Consequently, cleanup does not have to be performed on R3. If database instance 104 held an exclusive mode lock on R3, then the lock manager unit 126 would mark R3 as "invalid".
In addition to managing some of the locks held by database instance 104, lock manager units 116 and 126 may contain information that may be used to determine that database instance 104 could not have owned exclusive locks on particular resources. For example, lock manager unit 116 stores data that indicates that database instance 114 has a shared mode lock on resource R2. A shared lock is incompatible with an exclusive mode lock, so it can be concluded that database instance 104 did not have an exclusive mode lock on resource R2. Consequently, no cleanup must be performed on resource R2.
Using the techniques described above, it is possible to determine that some resources must be cleaned up before a subsequent access, and that other resources do not have to be cleaned up before a subsequent access. However, there may be a large number of resources for which it is not possible to determine whether a cleanup is necessary. For example, based on the information contained in lock manager units 116 and 126, it is not possible to determine whether database instance 104 had an exclusive mode lock on resource R1. Resources that fall into this category are also marked "invalid".
Specifically, when a node fails, each surviving lock manager unit sets to FALSE the VALID flag of each resource on which the failed instance held or may have held an exclusive mode lock. The distributed lock management system 132 does not grant valid locks on any requests on resources that are marked invalid. Once cleanup has been performed on a resource, each lock manager unit sets the VALID flag for the resource back to TRUE, and the distributed lock management system 132 can grant valid locks on the resource.
Thus, after the failure of node 102, lock manager units 116 and 126 set the VALID flag associated with resource R1 to FALSE. While the VALID flag remains FALSE, lock management system 132 grants locks on resource R1 with a return value that indicates that the locks are invalid, and the associated database initiates cleanup of the appropriate resources. As part of the lock manager cleanup of R1, a new master resource object for R1 is created and any locks on R1 held by the surviving database instances are placed on the new master resource object. When the resource cleanup operation for R1 has been completed, lock manager units 116 and 126 set the VALID flag associated with resource R1 to TRUE. After lock manager units 116 and 126 have set the VALID flag of R1 to TRUE, lock management system 132 may once again grant locks on resource R1.
FIG. 2B illustrates lock manager units 116 and 126 after the failure of node 102. With the failure of node 102, the master resource objects for resources R1 and R2 are lost. In FIG. 2B, master resource objects for resources R1 and R2 have been recreated within lock manager unit 116. The master resource object for resource R1 has been marked invalid because it is not possible to determine whether instance 104 held an exclusive mode lock on resource R1. The master resource object for resource R2 is marked valid because the shared lock on resource R2 held by instance 114 would have prevented instance 104 from having an exclusive lock on resource R2.
A significant problem with the recovery technique described above is that every lock manager unit must store data for every resource. Consequently, each new resource used by any instance in the system increases the size of every lock manager unit within the system. In large systems that use small granularity locks, each lock manager unit within distributed lock management system 132 may have to store data for millions of resources, whether or not the resources are ever used by any process in the system. Consequently, the total overhead required by the distributed lock manager becomes enormous.
However, if each lock manager unit does not store information about each resource in the system, it is possible for a lock manager unit to grant a lock that should not be granted. For example, assume that lock manager units 116 and 126 did not store any lock information for resource R2. After the crash of lock manager unit 106, there is no indication that a master resource object had ever been created for R2. Consequently, if one of the remaining lock manager units 116 and 126 received a lock request for R2, the lock manager unit would simply create a master resource object for R2 and grant the requested lock. The process to which the lock was granted would then have access to potentially invalid data stored in R2.
Based on the foregoing, it is desirable to provide a distributed lock management system in which each lock manager unit does not have to store information about every resource in the system. It is further desirable to provide a distributed lock management system in which lock manager units will not grant locks on resources that are in invalid states even though the lock manager units do not have prior knowledge of the resource.