A “cluster” is the result of “clustering” computing resources together in such a way that they behave like a single resource. Clustering is often used for purposes of parallel processing, load balancing and fault tolerance. One common example of a cluster is a set of computers, or “nodes”, that are configured so that they behave like a single computer. Each computer in the cluster has shared access to a set of resources. A resource is, generally, any item that can be shared by the computers in the cluster. A common example of a resource is a block of memory in which information is stored. The block of memory may be part of a node in the cluster or may be external to the cluster, such as a database block.
A cluster comprises multiple nodes, each of which executes an instance of a server that facilitates access to a shared set of resources on behalf of clients of the cluster. One example of a cluster is a database cluster. A database cluster comprises multiple nodes, each of which executes an instance of a database server that facilitates access to a shared database. Among other database management functions, a database server governs and facilitates access to the database by processing client requests to access data in the database.
Typically, resources are assigned to masters, where each master coordinates the sharing of the resources assigned to it. A single node is the master of a given shared resource. A master has a global view of the state of the shared resources that it masters at any given time and acts as a coordinator for access to the shared resource. For example, a master coordinates and is aware of which node is currently granted a lock on the shared resource (and what type of lock) and which nodes are queued to obtain a lock on the shared resource. Typically, the master's global view of the status of a shared resource is embodied in metadata associated with the resource.
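For illustration only, the master's global view described above can be sketched as per-resource metadata that records the current lock holders and the queue of waiting nodes. The names `ResourceState`, `Master`, `request_lock`, and `release_lock`, the two lock modes, and the first-in-first-out grant policy are assumptions made for this sketch and are not taken from any particular implementation.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, Tuple


@dataclass
class ResourceState:
    """Hypothetical metadata a master keeps for one shared resource."""
    holders: Dict[str, str] = field(default_factory=dict)           # node -> lock mode
    waiters: Deque[Tuple[str, str]] = field(default_factory=deque)  # (node, mode), FIFO


class Master:
    """Coordinates lock requests for the resources assigned to one node."""

    def __init__(self, node_id: str) -> None:
        self.node_id = node_id
        self.resources: Dict[str, ResourceState] = {}

    @staticmethod
    def _compatible(state: ResourceState, mode: str) -> bool:
        # Assumed policy: any lock is compatible with no holders;
        # a shared lock is compatible with other shared locks only.
        if not state.holders:
            return True
        return mode == "shared" and all(m == "shared" for m in state.holders.values())

    def request_lock(self, resource: str, node: str, mode: str) -> bool:
        """Grant the lock now (True) or queue the requester (False)."""
        state = self.resources.setdefault(resource, ResourceState())
        if not state.waiters and self._compatible(state, mode):
            state.holders[node] = mode
            return True
        state.waiters.append((node, mode))
        return False

    def release_lock(self, resource: str, node: str) -> None:
        """Drop a holder, then grant to waiters in FIFO order while compatible."""
        state = self.resources[resource]
        state.holders.pop(node, None)
        while state.waiters and self._compatible(state, state.waiters[0][1]):
            next_node, next_mode = state.waiters.popleft()
            state.holders[next_node] = next_mode
```

In this sketch, the `holders` and `waiters` fields together play the role of the metadata that embodies the master's view of a resource.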
Each shared resource is mapped to a master. Various mechanisms may be used to establish the resource-to-master mapping. Techniques for using hash tables to establish the resource-to-master mapping are described in detail, for example, in U.S. Pat. No. 6,363,396. The techniques described herein are not limited to any particular mechanism for establishing the resource-to-master mapping.
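As one hedged illustration of a hash-based resource-to-master mapping, the sketch below hashes a resource identifier into a bucket and maps the bucket to a node. It is not the technique of the patent cited above; the function name `master_for`, the use of SHA-256, and the modulo bucket assignment are assumptions chosen only to keep the example short.

```python
import hashlib
from typing import Sequence


def master_for(resource_id: str, nodes: Sequence[str]) -> str:
    """Return the node assumed to master `resource_id` (hypothetical scheme)."""
    if not nodes:
        raise ValueError("cluster has no nodes")
    # A stable digest is used instead of Python's built-in hash(), which is
    # randomized per process for strings and would differ between nodes.
    digest = hashlib.sha256(resource_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(nodes)
    # Sorting makes the result independent of the order in which each node
    # happens to list the cluster membership.
    return sorted(nodes)[bucket]
```

Because every node evaluates the same function over the same membership list, each node can determine the master of a resource locally. A plain modulo scheme such as this one changes many assignments whenever the number of nodes changes.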
To ensure a balanced distribution of shared resource mastership among nodes in a cluster, resources should be remastered when a node membership change causes a reconfiguration in the cluster, such as when a node is added to or removed from the cluster. For example, if a node goes down and needs to be removed from the cluster, the resources that the node was mastering need to be remastered, i.e., the mastership of these resources needs to be redistributed to other nodes in the cluster. Likewise, if a node is added to the cluster, the new node should be assigned some resources to master, taken from the other nodes in the cluster. Resource remastering generally begins with message exchanges among the nodes regarding which resources need to be remastered as a result of the cluster reconfiguration. Once these messages are exchanged and the nodes agree on the resources to be remastered, remastering generally entails freezing access operations (e.g., granting locks) on the particular resources being remastered while the new resource-to-master mapping and the global view of the state of those resources (e.g., metadata) are transferred from the source master node to the target master node. Resource remastering associated with a cluster reconfiguration operation typically results in a better-performing system, because mastership is again balanced across the nodes.
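The remastering steps described above can be sketched, under simplifying assumptions, as a single-process model. The class `Cluster` and its methods are hypothetical; the sketch assumes a planned node removal in which the departing node's metadata is still reachable, omits the message exchange among nodes, freezes one resource at a time, and picks the least-loaded surviving node as the target.

```python
from typing import Dict, List, Set


class Cluster:
    """Toy in-memory model of resource mastership (illustrative only)."""

    def __init__(self, nodes: List[str]) -> None:
        self.nodes = list(nodes)
        self.master_of: Dict[str, str] = {}                  # resource -> master node
        self.metadata: Dict[str, Dict[str, dict]] = {n: {} for n in self.nodes}
        self.frozen: Set[str] = set()                        # resources being remastered

    def add_resource(self, resource: str, master: str, meta: dict) -> None:
        self.master_of[resource] = master
        self.metadata[master][resource] = meta

    def can_lock(self, resource: str) -> bool:
        # Lock operations are refused only while the resource is frozen.
        return resource not in self.frozen

    def remaster(self, resource: str, target: str) -> None:
        """Freeze, move metadata and mapping to `target`, then unfreeze."""
        source = self.master_of[resource]
        self.frozen.add(resource)
        try:
            self.metadata[target][resource] = self.metadata[source].pop(resource)
            self.master_of[resource] = target
        finally:
            self.frozen.discard(resource)

    def remove_node(self, departed: str) -> List[str]:
        """Redistribute everything `departed` mastered; return what moved."""
        survivors = [n for n in self.nodes if n != departed]
        moved = sorted(self.metadata[departed])
        for resource in moved:
            # Assumed balancing rule: give the resource to the survivor that
            # currently masters the fewest resources.
            target = min(survivors, key=lambda n: len(self.metadata[n]))
            self.remaster(resource, target)
        self.nodes = survivors
        del self.metadata[departed]
        return moved
```

For example, removing a node that masters four resources from a three-node cluster spreads those resources over the two surviving nodes, and each resource's lock metadata travels with it.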
However, with past approaches to cluster reconfiguration, all the shared resources associated with the cluster are left in an inconsistent state while the cluster is being reconfigured. Lock operations on any of the resources are not allowed until the remastering operation is completed. This constraint results in a total freeze of access to all the shared resources associated with the cluster. In the context of a database cluster, all the resources associated with the database, i.e., the database itself, are frozen during a reconfiguration remastering operation. Hence, there is room for improvement in keeping resources shared among nodes in a cluster available while the cluster is being reconfigured.