Within the context of computer systems, many types of resources can be shared among processes. However, many resources, though sharable, may not be accessed in certain ways by more than one process at any given time. For example, resources such as data blocks of a storage medium or tables stored on a storage medium may be concurrently accessed in some ways (e.g., read) by multiple processes, but accessed in other ways (e.g., written to) by only one process at a time. Consequently, mechanisms have been developed that control access to resources.
One such mechanism is referred to as a lock. A lock is a data structure that indicates that a particular process has been granted certain rights with respect to a resource. There are many types of locks. Some types of locks may be shared on the same resource by many processes, while other types of locks prevent any other locks from being granted on the same resource.
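To illustrate how some lock types can be shared while others exclude all other locks, the following minimal Python sketch assumes just two hypothetical modes, shared and exclusive. Real lock managers often define more modes than this. The names `LockMode`, `COMPATIBLE`, and `is_compatible` are illustrative only.

```python
from enum import Enum


class LockMode(Enum):
    SHARED = "S"     # many processes may hold this mode at once (e.g., readers)
    EXCLUSIVE = "X"  # prevents any other lock on the resource (e.g., a writer)


# Hypothetical compatibility matrix: COMPATIBLE[(held, requested)] is True
# when a lock in mode `requested` may coexist with one already held in `held`.
COMPATIBLE = {
    (LockMode.SHARED, LockMode.SHARED): True,
    (LockMode.SHARED, LockMode.EXCLUSIVE): False,
    (LockMode.EXCLUSIVE, LockMode.SHARED): False,
    (LockMode.EXCLUSIVE, LockMode.EXCLUSIVE): False,
}


def is_compatible(requested, granted_modes):
    """Return True if `requested` is compatible with every lock already granted."""
    return all(COMPATIBLE[(held, requested)] for held in granted_modes)
```

For example, a shared request is compatible with any number of shared locks, an exclusive request conflicts with even one shared lock, and any request is compatible when nothing has been granted yet.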
The entity responsible for granting locks on resources is referred to as a lock manager. In a single node database system, a lock manager will typically consist of one or more processes on the node. In a multiple-node system, such as a multi-processing machine or a local area network, a lock manager may include processes distributed over numerous nodes. A lock manager that includes components that reside on two or more nodes is referred to as a distributed lock manager.
FIG. 1 is a block diagram of a multiple-node computer system 100. Each node has stored therein a database server and a portion of a distributed lock management system 132. Specifically, the illustrated system includes three nodes 102, 112 and 122 on which reside database servers 104, 114 and 124, respectively, and lock manager units 106, 116 and 126, respectively. Database servers 104, 114 and 124 have access to the same database 120. The database 120 resides on a disk 118 that contains multiple blocks of data. Disk 118 generally represents one or more persistent storage devices which may be on any number of machines, including but not limited to the machines that contain nodes 102, 112 and 122.
A communication mechanism allows processes on nodes 102, 112, and 122 to communicate with each other and with the disks that contain portions of database 120. The specific communication mechanism between the nodes and disk 118 will vary based on the nature of system 100. For example, if the nodes 102, 112 and 122 correspond to workstations on a network, the communication mechanism will be different from the mechanism used if the nodes 102, 112 and 122 correspond to clusters of processors and memory within a multi-processing machine.
Before any of database servers 104, 114 and 124 can access a resource shared with the other database servers, it must obtain the appropriate lock on the resource from the distributed lock management system 132. Such a resource may be, for example, one or more blocks of disk 118 on which data from database 120 is stored.
Lock management system 132 stores data structures that indicate the locks held by database servers 104, 114 and 124 on the resources shared by the database servers. If one database server requests a lock on a resource while another database server has a lock on the resource, then the distributed lock management system 132 must determine whether the requested lock is consistent with the granted lock. If the requested lock is not consistent with the granted lock, then the requester must wait until the database server holding the granted lock releases the granted lock.
According to one approach, lock management system 132 maintains one master resource object for every resource managed by lock management system 132, and includes one lock manager unit for each node that contains a database server. The master resource object for a particular resource stores, among other things, an indication of all locks that have been granted on or requested for the particular resource. The master resource object for each resource resides within only one of the lock manager units 106, 116 and 126.
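The master resource object and the consistency check described above might be sketched as follows. This is a minimal, single-process Python sketch that rests on several assumptions: only two lock modes (shared and exclusive), waiting requests granted in first-come, first-served order, and no inter-node messaging. `MasterResourceObject`, `request`, and `release` are hypothetical names, not part of the described system.

```python
from collections import deque
from dataclasses import dataclass, field

SHARED, EXCLUSIVE = "S", "X"


def compatible(held, requested):
    # In this simplified two-mode scheme only two shared locks can coexist.
    return held == SHARED and requested == SHARED


@dataclass
class MasterResourceObject:
    """Hypothetical master-side record of all locks granted on or requested for one resource."""
    name: str
    granted: dict = field(default_factory=dict)    # holder -> granted mode
    waiting: deque = field(default_factory=deque)  # (holder, mode) in arrival order

    def _can_grant(self, mode):
        return all(compatible(held, mode) for held in self.granted.values())

    def request(self, holder, mode):
        """Grant at once if consistent with the granted locks and nobody is queued; else wait."""
        if not self.waiting and self._can_grant(mode):
            self.granted[holder] = mode
            return True
        self.waiting.append((holder, mode))
        return False

    def release(self, holder):
        """Drop the holder's lock, then grant queued requests in FIFO order while they fit."""
        self.granted.pop(holder, None)
        while self.waiting and self._can_grant(self.waiting[0][1]):
            waiter, mode = self.waiting.popleft()
            self.granted[waiter] = mode
```

A request that conflicts with a granted lock, or that arrives while other requests are already queued, waits; each release lets the queue drain in arrival order. Queuing behind existing waiters, even when the new request is compatible with the granted locks, is one design choice that keeps exclusive requests from being starved by a steady stream of shared ones.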
The node on which a lock manager unit resides is referred to as the “master node” (or simply “master”) of the resources whose master resource objects are managed by that lock manager unit. Thus, if the master resource object for a resource R1 is managed by lock manager unit 106, then node 102 is the master of resource R1.
In typical systems, a hash function is employed to select the particular node that acts as the master node for a given resource. For example, system 100 includes three nodes, and therefore may employ a hash function that produces three values: 0, 1 and 2. Each value is associated with one of the three nodes. The node that will serve as the master for a particular resource in system 100 is determined by applying the hash function to the name of the resource. All resources that have names that hash to 0 are mastered on node 102. All resources that have names that hash to 1 are mastered on node 112. All resources that have names that hash to 2 are mastered on node 122.
When a process on a node wishes to access a resource, a hash function is applied to the name of the resource to determine the master of the resource, and a lock request is sent to the master node for that resource. The lock manager on the master node for the resource controls the allocation and deallocation of locks for the associated resource.
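Hash-based master selection and request routing can be sketched as below. The sketch assumes the node identifiers from FIG. 1 and uses CRC-32 as the hash function; the document does not specify which hash function is used. `master_for` and `build_lock_request` are hypothetical helpers, and the message is a plain dict standing in for whatever the communication mechanism actually carries.

```python
import zlib

# Node identifiers from FIG. 1; list position defines which hash value maps to which node.
NODES = [102, 112, 122]


def master_for(resource_name, nodes=NODES):
    """Map a resource name to its master node: value 0 -> nodes[0], 1 -> nodes[1], ..."""
    # CRC-32 is stable across processes. Python's built-in hash() for str is
    # randomized per interpreter run, so different nodes could disagree about masters.
    bucket = zlib.crc32(resource_name.encode("utf-8")) % len(nodes)
    return nodes[bucket]


def build_lock_request(requester_node, resource_name, mode):
    """Build a hypothetical lock request message addressed to the resource's master node."""
    return {
        "to": master_for(resource_name),
        "from": requester_node,
        "resource": resource_name,
        "mode": mode,
    }
```

Any hash function works here as long as every node computes the same value for the same resource name, which is why a stable function is used instead of a per-process randomized one.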
While the hashing technique described above tends to distribute the resource mastering responsibility evenly among existing nodes, it has some significant drawbacks. In particular, it is sometimes desirable to be able to select the exact node that will function as master node for a lock resource. For example, consider the situation in which a particular lock resource is to be accessed exclusively by processes residing on node 102. In this situation, it would be inefficient to have the lock resource and the request queue for that resource located on any node in the network other than node 102. However, the relatively random distribution of lock resource management responsibilities that results from the hash function assignment technique makes it unlikely that resources will be mastered at the most efficient locations.
To address the inefficiency associated with the randomness of assigning masters based on a hash function, techniques have been developed for establishing resource-to-master-node assignments based on the affinity between (1) nodes and (2) the objects to which the resources belong. In this context, an “object” may be any entity that includes resources that are protected by locks. The types of objects to which the techniques described herein may be applied may vary based on the type of system in which the techniques are used. For example, within a relational database system, “objects” could include tables, table partitions, segments, extents, indexes, Large Objects (LOBs), etc. Within a file system, “objects” could include files, sets of file system metadata, etc. Within a storage system, “objects” could include storage devices, disk sectors, etc.
The “affinity” between a node and an object refers to the degree of efficiency achieved by assigning the node to be the master of the resources that belong to the object. For example, a particular node that accesses a table much more frequently than any other node has a high degree of affinity to the table, because assigning that node to be the master of the resources within the table avoids a large number of inter-node lock-related communications. On the other hand, a node that accesses a table much less frequently than other nodes has a low degree of affinity to the table, because assigning that node to be the master of the table would avoid few inter-node lock-related communications.
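The notion of affinity as inter-node lock-related communications avoided can be made concrete with a small Python sketch. It assumes that per-node counts of lock requests on an object's resources are available, and it treats every request from a node other than the master as one inter-node communication. The helper names and the `min_share` threshold are hypothetical; the document does not say how affinity is measured or when it is acted upon.

```python
def remote_lock_messages(access_counts, master):
    """Lock requests that must cross nodes if `master` masters every resource of the object."""
    return sum(n for node, n in access_counts.items() if node != master)


def affinity_master(access_counts, min_share=0.8):
    """Return the node with the highest affinity to an object, or None if no node dominates.

    access_counts maps node id -> number of lock requests on the object's resources.
    min_share is a hypothetical threshold: the fraction of all requests a node must
    account for before an affinity relationship is considered worth establishing.
    """
    total = sum(access_counts.values())
    if total == 0:
        return None
    # The node with the most requests is the one whose mastership avoids the most
    # inter-node communications, i.e. the node with the highest affinity.
    node, count = max(access_counts.items(), key=lambda kv: kv[1])
    return node if count / total >= min_share else None
```

With counts `{102: 950, 112: 30, 122: 20}`, mastering the object on node 102 leaves 50 remote requests, while mastering it on node 122 leaves 980, so node 102 is returned. With a more even split such as `{102: 400, 112: 350, 122: 250}`, no node reaches the threshold and `None` is returned.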
The Related Applications describe various techniques related to mastering resources based on the affinity between nodes and the objects to which the resources belong. In general, once an affinity relationship has been established between an object and a node, the resources for the object cease to be randomly mastered across the nodes in the system. Instead, the node becomes master for all of the resources that belong to the object. On the other hand, when an affinity relationship is dissolved, the resources of the object are no longer mastered by the node with which they had the affinity relationship. Instead, the resources are remastered across the nodes in the system.
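One way to picture the effect of establishing and dissolving an affinity relationship is a master lookup that consults an affinity table before falling back to the hash function. The Python sketch below is hypothetical (`MasterDirectory`, `establish_affinity`, and `dissolve_affinity` are invented names, and CRC-32 is an assumed hash). It models only where lock requests are routed and omits the actual transfer of master resource objects between lock manager units that remastering would require.

```python
import zlib

NODES = [102, 112, 122]


class MasterDirectory:
    """Hypothetical directory in which affinity relationships override hash-based mastering."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.affinity = {}  # object id -> node that masters all of that object's resources

    def _hash_master(self, resource_name):
        bucket = zlib.crc32(resource_name.encode("utf-8")) % len(self.nodes)
        return self.nodes[bucket]

    def master_for(self, object_id, resource_name):
        # While an affinity relationship exists, one node masters every resource of the object.
        if object_id in self.affinity:
            return self.affinity[object_id]
        return self._hash_master(resource_name)

    def establish_affinity(self, object_id, node):
        self.affinity[object_id] = node

    def dissolve_affinity(self, object_id):
        # The object's resources fall back to being spread across the nodes by the hash.
        self.affinity.pop(object_id, None)
```

Objects without an affinity entry are unaffected by the table, so the hash-based distribution remains the default for every object that has no dominant node.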
One problem arises when a transaction fails over to a node other than the node on which the transaction was started. The new node does not have the affinity relationships that had been established by the node that previously hosted the transaction. Until new access patterns cause the same affinity relationships to be established on the new node over time, the loss of those affinity relationships results in inefficiency.
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.