Multiple processes running on multi-processing systems may access “shared resources.” Some of these shared resources may be accessed by only one process at a time, while others may be accessed concurrently by multiple processes. Consequently, “synchronization mechanisms” have been developed to control access by multiple processes to shared resources. A synchronization mechanism grants locks to processes. A lock gives its holder the right to access a particular resource in a particular way. Once a lock is granted to a process, the process holds or owns the lock until the lock is relinquished, revoked, or otherwise terminated. Locks are represented by data structures such as semaphores, read/write latches, and condition variables. There are many types of locks. Some types of locks allow shared resources to be shared by many processes concurrently (e.g., a shared read lock), while other types of locks prevent any other lock from being granted on the same resource (e.g., an exclusive write lock).
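The difference between a shared read lock and an exclusive write lock can be illustrated with a minimal sketch. The class name `LockTable`, the mode constants, and the resource name `block-7` are hypothetical and chosen only for illustration; the sketch assumes a single node and does not queue waiters.

```python
from collections import defaultdict

SHARED = "S"     # shared read lock: many holders allowed at once
EXCLUSIVE = "X"  # exclusive write lock: no other holder allowed


class LockTable:
    """Toy lock table (hypothetical): tracks lock holders per resource."""

    def __init__(self):
        # resource -> {process_id: mode}
        self._holders = defaultdict(dict)

    def try_acquire(self, resource, process_id, mode):
        """Grant the lock only if it is compatible with locks held by others."""
        others = [m for p, m in self._holders[resource].items() if p != process_id]
        if mode == SHARED:
            compatible = all(m == SHARED for m in others)
        else:
            compatible = not others
        if compatible:
            self._holders[resource][process_id] = mode
        return compatible

    def release(self, resource, process_id):
        """Relinquish whatever lock the process holds on the resource."""
        self._holders[resource].pop(process_id, None)
```

In this sketch, two processes may hold `SHARED` on the same resource concurrently, while a request for `EXCLUSIVE` is refused until every other holder has released.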
The entity responsible for granting locks is referred to herein as a lock manager. In a single node multi-processing system, a lock manager is typically a software component executed and invoked by processes on the node accessing a shared resource.
In contrast to a single node system, a multi-node system consists of a network of computing devices or “nodes,” each of which may be a multi-processing system. Each of the nodes can access a set of shared resources. Multi-node systems use synchronization mechanisms, referred to as global synchronization mechanisms, to control access to the set of shared resources by nodes in the multi-node system.
A global lock mechanism includes a global lock manager that is responsible for issuing locks to processes on the multi-node system. In order for a node to access a shared resource, it is granted a “global lock” by a global lock manager. A global lock is a lock that can be granted by a global lock manager on a node in a multi-node system to one or more processes on another node to coordinate access to the shared resources among the processes executing on any node in a multi-node system.
One type of global lock manager, a central global lock manager, is responsible for issuing locks for all shared resources in a multi-node system. Another type of global lock manager, a distributed lock manager, is comprised of local lock managers, with one or more of the local lock managers running on each node in a multi-node system. Each local lock manager is responsible for coordinating the global locks that are needed to access a subset of shared resources.
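One way to picture how a distributed lock manager divides shared resources among local lock managers is a deterministic mapping from resource name to node. The sketch below assumes hash-based partitioning, which is an assumption made for illustration and is not stated in the text above; the function names and node names are likewise hypothetical.

```python
import zlib


def master_for(resource_name, nodes):
    """Return the node whose local lock manager coordinates this resource.

    Assumption: resources are partitioned by a CRC32 hash of their name.
    """
    digest = zlib.crc32(resource_name.encode("utf-8"))
    return nodes[digest % len(nodes)]


def is_local_request(requesting_node, resource_name, nodes):
    """True if the requester can be served by its own local lock manager."""
    return master_for(resource_name, nodes) == requesting_node
```

Under this assumed scheme, every node computes the same answer for a given resource, so a lock request is either handled locally or sent to exactly one other node.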
Nodes are described herein as performing actions and as being the object of actions. However, this is just a convenient way of expressing that one or more processes on a node are performing an action or are the object of an action. For example, a node accessing a shared resource or granting, holding, or being issued a lock is just a convenient way of expressing that a process on the node is accessing a shared resource or granting, holding, or being issued a lock.
Techniques have been developed for establishing resource-to-master-node assignments based on the affinity between (1) nodes and (2) the objects to which the resources belong. In this context, an “object” may be any entity that includes resources that are protected by locks. The types of resources to which the techniques described herein may be applied may vary based on the type of system in which the techniques are used. For example, within a relational database system, “resources” could include data blocks, tables, table partitions, segments, extents, indexes, Large Objects (LOBs), etc. Within a file system, “resources” could include files, sets of file system metadata, etc. Within a storage system, “resources” could include storage devices, disk sectors, etc.
The “affinity” between a node and an object refers to the degree of efficiency achieved by assigning the node to be the master of the resources that belong to the object. For example, a particular node that accesses a table much more frequently than any other node has a high degree of affinity to the table. Relative to that table, the degree of affinity for that particular node is high because, if that node is assigned to be the master of the resources within the table, a high number of inter-node lock-related communications would be avoided. On the other hand, a node that accesses a table much less frequently than other nodes has a low degree of affinity to the table, because assigning that node to be the master of the table would avoid few inter-node lock-related communications.
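The reasoning above can be made concrete with a small sketch that counts how many lock requests would have to cross the interconnect for a given choice of master. The function names, the `min_share` threshold, and the access counts in the usage note are hypothetical; the text does not specify how affinity is measured.

```python
def remote_lock_messages(access_counts, master):
    """Lock requests that must be sent to another node if `master` is master.

    access_counts maps node name -> number of lock requests for the object.
    Requests issued by the master itself are served locally.
    """
    return sum(access_counts.values()) - access_counts.get(master, 0)


def pick_affinity_master(access_counts, min_share=0.5):
    """Return the node with high affinity to the object, or None.

    Assumption: a node has affinity when it issues more than `min_share`
    of all lock requests for the object.
    """
    total = sum(access_counts.values())
    if total == 0:
        return None
    node, hits = max(access_counts.items(), key=lambda kv: kv[1])
    return node if hits / total > min_share else None
```

For example, with counts of 900, 60, and 40 requests from three nodes, making the first node the master leaves 100 remote requests, whereas making the third node the master leaves 960.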
While the locking mechanisms described above are usually effective at synchronizing access to resources by nodes in a multi-node system, problems exist where data is accessed from multiple instances, mostly for reads. While a read must return the most recently updated version of the block, a write must ensure that no other node or instance has a current copy of the block. Various techniques for dealing with this problem incur unnecessary amounts of overhead, among other shortcomings.
If data has affinity to an instance, the mastership for the data is dynamically transferred to this instance and the instance will be able to obtain “affinity locks” for the data. Affinity locks are so termed because they are granted to a master for the resource whose mastership was acquired by affinity. As master, the instance may create a lock without coordinating with another node and/or lock manager. If the data does not have affinity to a single instance, the mastership for this data may be uniformly distributed across all instances and each instance would obtain regular locks on blocks of the data.
This locking protocol is not efficient when both of the following conditions are satisfied: (1) most of the lock requests are for read locks, which result in very few read-write conflicts, and (2) there is very little read-sharing among instances, as most lock requests result in lock grants followed by a read of that block from disk.
Further, in many cases there will be no affinity and many nodes will be accessing the same piece of data, but different parts of the same piece of data, mostly for reads. In this case, no single node should be assigned to be the master, because all of the other nodes would have to send messages to that single master to obtain locks.
In these cases, locking incurs unnecessary overhead. One technique to reduce the locking overhead is to utilize high-performance interconnects with specialized operations. A significant disadvantage, however, is that this technique is not a generic solution and will not work with the ubiquitous UDP/Ethernet IPC stack.
Another technique to reduce the overhead cost of acquiring locks is to use coarse-grain locking. In this scheme, locks are acquired at a higher level of granularity, such as a table or file, instead of a finer level of granularity, such as a row or a disk block. When a lock is acquired at the higher level of granularity, it is implicitly granted for levels of shared data at a finer level of granularity. For example, if a global lock is acquired for an entire table, individual global locks for the rows or blocks for the table are implied and do not have to be acquired, avoiding the cost of obtaining a global lock for each row and block.
One disadvantage to this technique is that when an instance needs to modify data locked by a coarse lock, all instances must release their coarse locks because there is no way of detecting which data is to be modified. This takes a significant amount of time because a single coarse lock can protect several blocks. Further, because the non-modifying instance has released the coarse lock, it must reacquire the coarse lock if it has to access any block that is protected by the coarse lock even though the modifying instance is modifying a completely different block.
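The behavior of coarse-grain locking, including the disadvantage just described, can be sketched as follows. The class `CoarseLockManager` and its method names are hypothetical; the sketch assumes table-level read locks only and models a write simply as revoking every coarse lock on the table.

```python
class CoarseLockManager:
    """Toy model (hypothetical): one coarse read lock per table per instance."""

    def __init__(self):
        # table -> set of instances holding a coarse read lock on it
        self._holders = {}

    def acquire_coarse_read(self, table, instance):
        self._holders.setdefault(table, set()).add(instance)

    def can_read(self, table, block, instance):
        # Any block in the table is implicitly covered by the table-level
        # lock, so no per-block lock is consulted.
        return instance in self._holders.get(table, set())

    def write_block(self, table, block, writer):
        """Modify one block; returns the instances forced to release.

        The coarse lock does not record which block each holder is using,
        so every holder must release, whichever block is being modified.
        """
        revoked = sorted(self._holders.get(table, set()))
        self._holders[table] = set()
        return revoked
```

In this sketch, a write to a single block causes an instance that was reading an unrelated block of the same table to lose its coarse lock and reacquire it before reading again.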
Another technique to reduce the overhead cost of acquiring locks is to use hierarchical locking. In this scheme, locks are first acquired at a higher level in the hierarchy, such as a table. If a global lock is acquired at a higher level in the hierarchy, global locks are implicitly granted at the lower level of the hierarchy. When another node subsequently needs to access data in the lower level of the hierarchy, such as a row or a block, in a conflicting mode, the first node de-escalates its lock and acquires locks at the lower level in the hierarchy.
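De-escalation in a two-level hierarchy (table, then row) might be sketched as below. The class and method names are hypothetical, all locks are treated as exclusive for simplicity, and the sketch assumes the table-level holder remembers which rows it has touched so that it knows which row locks to take when it de-escalates.

```python
class HierarchicalLocks:
    """Toy two-level lock hierarchy (hypothetical): one table, many rows."""

    def __init__(self):
        self.table_owner = None   # node holding the table-level lock
        self.rows_in_use = set()  # rows the owner touched under that lock
        self.row_locks = {}       # row -> node holding an explicit row lock

    def acquire_table(self, node):
        """Grant the table-level lock if nothing in the table is locked."""
        if self.table_owner is None and not self.row_locks:
            self.table_owner = node
            return True
        return False

    def request_row(self, node, row):
        """Request access to a row, de-escalating a conflicting table lock."""
        if self.table_owner == node:
            # Row access is implied by the table-level lock.
            self.rows_in_use.add(row)
            return True
        if self.table_owner is not None:
            self._de_escalate()
        if self.row_locks.get(row, node) != node:
            return False  # row is explicitly locked by another node
        self.row_locks[row] = node
        return True

    def _de_escalate(self):
        """Swap the table-level lock for row locks on the rows in use."""
        owner, self.table_owner = self.table_owner, None
        for row in self.rows_in_use:
            self.row_locks[row] = owner
        self.rows_in_use = set()
```

After de-escalation in this sketch, the first node keeps only the rows it was actually using, and the second node can lock any other row without further conflict.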
This technique has significant disadvantages. First, it is prone to deadlocks. Second, it is only applicable when the object being shared has a natural hierarchy, such as a B-Tree. Many objects such as flat files, heap tables and other indexes do not have a natural hierarchy and are not candidates for this type of locking approach.
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.