A multi-node computer system is made up of interconnected nodes that share access to resources. Typically, the nodes are interconnected via a network and share access, in varying degrees, to shared storage (e.g. shared access to a set of disk drives). The nodes in a multi-node computer system may be in the form of a group of computers (e.g. workstations, personal computers) that are interconnected via a network. Alternatively, the nodes may be the nodes of a grid. A grid is composed of nodes in the form of server blades interconnected with other server blades on a rack.
The term resource herein refers to any resource used by a computer to which access between multiple processes is managed. Resources include units of memory, peripheral devices (e.g. printers, network cards), units of disk storage (e.g. a file, a data block), and data structures (a relational table, records of relational tables, a data block that holds records of a relational table). A shared resource is a resource shared and accessed by multiple nodes in a multi-node system.
Even though resources may be shared, many resources may not be used by more than one process at any given time. For example, most printers are unable to print more than one document at a time. Other resources, such as data blocks of a storage medium or tables stored on a storage medium, may be concurrently accessed in some ways (e.g. read) by multiple processes, but accessed in other ways (e.g. written to) by only one process at a time. Consequently, mechanisms have been developed which manage access to shared resources of a multi-node system.
One such mechanism is referred to herein as a two-tiered lock system. In a two-tiered lock system, for a given resource, one node in a multi-node computer system is the “master” of the resource and is responsible for managing access to the resource. Access by processes in a multi-node system, whether the process is executing on the master or on another node within the system, is controlled by the master of the resource. To gain access to a resource, a request must be made to the master of the resource, which may grant or deny the request. Processes on a node that is not the master (i.e. a “remote node”) are not individually granted access to a resource by a master node. Rather, the remote node is granted access to the resource, and once access is granted, a process on the remote node may access the resource.
With respect to a particular master node, processes on the master node are referred to herein as local processes; processes on a remote node are referred to herein as remote processes.
A master node uses locks to manage access rights (“rights”) to a resource. A lock is a data structure that indicates whether a particular entity has requested and/or been granted a certain right to a resource. When a request for the right represented by a lock has been granted, the lock itself is referred to as being granted.
With respect to a master node, locks requested by or granted to local processes are referred to as local locks, while locks requested by or granted to remote processes are referred to as remote locks. Lock requests made by local processes are referred to as local lock requests, and lock requests made by remote processes are referred to as remote lock requests.
Lock Types
There are many types of locks. For a given resource, a “shared lock” represents a right to share access to the resource. A shared lock may be concurrently granted to multiple processes, allowing them the right to share a form of access (e.g. read access). An “exclusive lock” may be granted to only one process at a time. Once granted, an exclusive lock prevents any other lock, shared or exclusive, from being granted for the resource.
Due to the various permissions and guarantees associated with these locks, certain combinations of locks are not allowed to be concurrently granted. For example, if a process owns an exclusive lock on a resource, then no other process can be granted an exclusive lock or a shared lock. If a process owns a shared lock, then other processes may be granted shared locks but may not be granted an exclusive lock. Locks which cannot be combined are referred to herein as being incompatible or conflicting.
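The compatibility rule described above can be illustrated with a minimal sketch. The names Mode, compatible, and can_grant are hypothetical and chosen only for illustration; the sketch assumes only the two lock types discussed here, although an actual lock system may define additional types.

```python
from enum import Enum
from typing import Iterable


class Mode(Enum):
    """The two lock types discussed in the text (hypothetical names)."""
    SHARED = "S"
    EXCLUSIVE = "X"


def compatible(held: Mode, requested: Mode) -> bool:
    """Two locks may be granted concurrently only if both are shared."""
    return held is Mode.SHARED and requested is Mode.SHARED


def can_grant(requested: Mode, granted: Iterable[Mode]) -> bool:
    """A requested lock may be granted only if it conflicts with no granted lock."""
    return all(compatible(held, requested) for held in granted)
```

Under this sketch, a shared request is compatible with any number of granted shared locks, while an exclusive request can be granted only when no lock is currently granted.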
Managing Locks Using Queues
To manage the granting of locks to a resource, a master node uses queues. FIG. 1 is a block diagram showing a set of queues used by a master node to manage the granting of locks.
FIG. 1 shows convert queue 102 and granted queue 103 for a master node N1. Master node N1 is part of a multi-node system that also includes nodes N2, N3, N4, and N5, which are not depicted. A convert queue, such as convert queue 102, holds locks for a right that has been requested but not granted. A granted queue, such as granted queue 103, holds locks that have been granted. When a master node receives a lock request for a resource, the master node places a lock representing the request on the convert queue. When a lock is granted, the lock is placed on the granted queue.
The term queue refers to any data structure with ordered elements or entries. The entry first in the order is referred to as being at the head of the queue, and the entry last in order is referred to as being at the tail of the queue. Convert queue 102, as depicted in FIG. 1, holds locks C1–C9, which are for nodes N1, N1, N2, N1, N3, N1, N1, N4, N1, respectively. The entries at the head and at the tail are locks for master node N1. Granted queue 103 holds one lock G1, which is for node N5.
Typically, entries in a queue are processed on a first-in-first-out (“FIFO”) basis. When an entry is removed from the head of the queue, the entry following in order moves to the head of the queue. When an entry is added to the tail of the queue, it is added as the last entry in order; the entry formerly at the end is no longer at the tail of the queue. If the lock at the head of the convert queue does not conflict with any lock in the granted queue, then the lock at the head is granted, removed from the convert queue, and added to the granted queue.
For example, lock G1 in granted queue 103 is an exclusive lock. Locks C1 and C2 are shared locks, and lock C3 is an exclusive lock. While exclusive lock G1 remains in the granted queue, shared lock C1 cannot be granted because it conflicts with exclusive lock G1. Shared lock C1 is referred to as being blocked by exclusive lock G1; exclusive lock G1 is referred to as being blocking. Shared lock C2 and exclusive lock C3 cannot be granted because they follow lock C1 in the queue and cannot be granted before lock C1 is removed from the convert queue.
Next, the master node removes exclusive lock G1 from the granted queue, when, for example, the owner of exclusive lock G1 relinquishes the lock. The master node then grants shared lock C1, removes it from the convert queue, and adds it to the granted queue.
As a result, shared lock C2 moves to the head of the queue. Given that shared lock C1 is compatible with shared lock C2, shared lock C2 is granted, leaving exclusive lock C3 on the convert queue.
Exclusive lock C3 is not compatible with the locks now on the granted queue, i.e. not compatible with shared locks C1 and C2. Exclusive lock C3 is therefore blocked.
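The sequence described for locks G1, C1, C2, and C3 can be traced with a short sketch. It assumes strict first-in-first-out processing and the shared/exclusive compatibility rule described earlier; the function names conflicts and grant_from_head, and the representation of a lock as a (name, mode) pair, are hypothetical.

```python
from collections import deque


def conflicts(mode_a, mode_b):
    """Only two shared ("S") locks are compatible; any exclusive ("X") lock conflicts."""
    return not (mode_a == "S" and mode_b == "S")


def grant_from_head(convert, granted):
    """Grant locks from the head of the convert queue until the head is blocked.

    Returns the names of the locks granted by this call.
    """
    newly_granted = []
    while convert:
        name, mode = convert[0]
        if any(conflicts(mode, held_mode) for _, held_mode in granted):
            break  # the head is blocked, so every lock behind it must wait
        granted.append(convert.popleft())
        newly_granted.append(name)
    return newly_granted


# State corresponding to the first three convert-queue entries of the example.
convert = deque([("C1", "S"), ("C2", "S"), ("C3", "X")])
granted = [("G1", "X")]

first_pass = grant_from_head(convert, granted)   # G1 blocks C1; nothing is granted
granted.remove(("G1", "X"))                      # the owner of G1 relinquishes the lock
second_pass = grant_from_head(convert, granted)  # C1 and C2 are granted; C3 is blocked
```

After the second pass, the granted queue holds shared locks C1 and C2, and exclusive lock C3 remains at the head of the convert queue.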
Inherent Unfair Resource Allocation
In a two-tier lock system, some remote nodes may suffer from an inherent bias with respect to the frequency and speed at which they are added to the convert queue. Such bias is referred to herein as queuing bias. As a result of queuing bias, the remote nodes may receive an unbalanced and disproportionately lesser share of a resource relative to the extent processes on the remote nodes request the resource. There are various forms of queuing bias that stem from a variety of causes.
One form of queuing bias favors local processes. This form of queuing bias occurs because of locality. Locality refers to the condition of being local on a master node. Locality gives local processes an inherent advantage in several ways with respect to the frequency and speed with which locks for local lock requests are placed in a convert queue. First, a local process does not have to transmit a lock request to the master node using an inter-node communication mechanism, as a remote node does. Transmitting requests in this way can involve substantial delay. As a result, a lock request for a local process can be processed and responded to much more quickly than one from a remote node.
Second, a remote node may be restricted to only one lock on a convert queue of a master node even though multiple remote processes on the remote node have requested a remote lock. This restriction is a measure designed to reduce network traffic. For a given resource, when remote processes on a remote node request a remote lock, the lock requests are not transmitted by the remote node while a remote lock for that node exists in the convert or granted queue of the resource. A local process, however, is not subject to such a restriction. Thus, many local lock requests may be added to the convert queue ahead of multiple remote lock requests while the transmission of the remote lock requests is deferred.
For example, convert queue 102 contains many locks for master node N1 but only one for each of remote nodes N2, N3, and N4. Other remote processes on N2 have made remote lock requests, which are deferred until lock C3 is granted and relinquished. Local locks C4, C6, C7, and C9 were generated for local lock requests after many of the deferred remote lock requests on node N2 were generated.
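The effect of the one-lock-per-remote-node restriction can be sketched as follows. The sketch is a simplification: it models the master and the remote nodes in a single function, whereas in the system described the deferral occurs on the remote node itself. The function name enqueue_requests and the particular arrival order of requests are hypothetical; the arrival order was chosen so that the resulting queue matches the node ordering given for locks C1–C9 of FIG. 1.

```python
from collections import deque

MASTER = "N1"


def enqueue_requests(requests, master=MASTER):
    """Build a convert queue from lock requests listed in arrival order.

    Local requests are always queued. A remote node may have only one lock
    outstanding, so its later requests are counted as deferred instead.
    """
    convert = deque()
    outstanding = set()  # remote nodes that already have a lock on the queue
    deferred = {}        # remote node -> number of deferred requests
    for node in requests:
        if node == master:
            convert.append(node)
        elif node in outstanding:
            deferred[node] = deferred.get(node, 0) + 1
        else:
            outstanding.add(node)
            convert.append(node)
    return list(convert), deferred


# Hypothetical arrival order: node N2 issues three requests, but only one is queued.
requests = ["N1", "N1", "N2", "N2", "N1", "N3", "N2", "N1", "N1", "N4", "N1"]
queue, deferred = enqueue_requests(requests)
```

In this sketch, two of the three requests from node N2 arrive before most of the local requests, yet four local locks are queued ahead of them because their transmission is deferred.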
Finally, another cause of queuing bias is the relative computing power of a node and the speed at which it may communicate via inter-node communication mechanisms with the master node. Nodes with relatively higher computing power or access to a faster inter-node communication mechanism can process and transmit remote lock requests much more quickly, allowing their lock requests to be queued more frequently and swiftly.
For a resource in high demand, queuing bias alone can allow one node to hoard the resource and starve other nodes of it. This causes uneven or unbalanced use of the shared resource among the nodes of a multi-node system that share the resource.
Inefficient Parallel Processing
Parallel processing is a very important feature of a multi-node system. Under parallel processing, a task may be performed more quickly if divided into subtasks that are each concurrently performed by a node in the multi-node system. Each node performs its respective subtask in parallel, i.e., concurrently.
Queuing bias leads to inefficient parallel processing. In general, parallel processing is performed more efficiently if all participating nodes complete their respective subtasks at the same time, because the task as a whole is not complete until the slowest node finishes. Queuing bias causes the participating nodes to complete their subtasks at different times, leading to inefficient parallel processing in a multi-node system.
Specifically, if a task to be performed in parallel involves use of resources mastered by a subset of the nodes participating in the parallel execution of the task, then queuing bias favors that subset of nodes. The subset of nodes, which hoard the resources from the other nodes participating in the parallel execution of the task, will thus complete their respective subtasks sooner than the other nodes.
Based on the foregoing, there is a clear need for techniques that lessen the adverse effects of queuing bias.