A “cluster” is made up of multiple “nodes,” each of which executes one or more database server instances that read data from and write data to a database that is located on shared storage. Each node may be a separate computing device. Nodes may communicate with other nodes and the shared storage through a network and/or other communication mechanisms.
Clusters offer many benefits not available in alternative data processing configurations. The computational power offered by a cluster of relatively inexpensive nodes often rivals the computational power offered by a single, much more expensive, computing device. Individual nodes can be added to or removed from a cluster according to need. Thus, clusters are highly scalable. Even when one node in a cluster fails, other nodes in the cluster may continue to provide services. Thus, clusters are highly fault-tolerant.
Each node in a cluster is associated with a numeric node identifier. Typically, the first node to join a cluster is assigned an identifier of “1.” The next node to join the cluster is typically assigned an identifier of “2.” As more nodes join the cluster, those nodes typically are assigned numerically increasing identifiers.
As time progresses, technology advances. Consequently, nodes that have been in a cluster the longest often are the least computationally powerful. Nodes that have joined a cluster most recently often are the most computationally powerful. Due to the manner in which identifiers are assigned, the numerically lowest identifiers are often, although not always, assigned to the least computationally powerful nodes.
As mentioned above, each node in a cluster may execute one or more database server instances, referred to herein simply as “instances.” Each such instance may have a separate buffer cache stored in the memory of the node on which that instance is resident. When a particular instance needs to access a block of data from the database, the instance determines whether the block is stored in any instance's buffer cache. If the block is stored in some instance's buffer cache, then the particular instance obtains the block from that buffer cache and places the block in the particular instance's buffer cache, unless the block is already stored in the particular instance's buffer cache. If the block is not stored in any instance's buffer cache, then the particular instance reads the block from the database and places the block in the particular instance's buffer cache. Either way, the particular instance can then access the block from the particular instance's buffer cache instead of the database. Accessing a block from a buffer cache is significantly faster than accessing a block from the database.
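By way of illustration only, the block lookup order described above may be sketched as follows. The names `CachingInstance` and `access_block`, and the use of dictionaries to stand in for buffer caches and for the database, are assumptions made for this sketch and are not taken from any actual database server.

```python
from typing import Dict, List, Tuple


class CachingInstance:
    """Hypothetical database server instance with its own buffer cache."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.buffer_cache: Dict[int, bytes] = {}


def access_block(
    requester: CachingInstance,
    block_id: int,
    all_instances: List[CachingInstance],
    database: Dict[int, bytes],
) -> Tuple[bytes, str]:
    """Return the block and a label naming where it was obtained from."""
    # Case 1: the block is already in the requester's own buffer cache.
    if block_id in requester.buffer_cache:
        return requester.buffer_cache[block_id], "local cache"

    # Case 2: some other instance holds the block; copy it into the
    # requester's buffer cache instead of reading the database.
    for other in all_instances:
        if other is not requester and block_id in other.buffer_cache:
            requester.buffer_cache[block_id] = other.buffer_cache[block_id]
            return requester.buffer_cache[block_id], "remote cache"

    # Case 3: no buffer cache holds the block; read it from the database
    # and keep a copy in the requester's buffer cache.
    requester.buffer_cache[block_id] = database[block_id]
    return requester.buffer_cache[block_id], "database"
```

In this sketch, a first request for a block is served from the database, a later request by a different instance is served from the first instance's buffer cache, and any repeated request by the same instance is served from that instance's own buffer cache.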
When an instance accesses a block, the instance may do so for the purpose of modifying the block. The instance modifies the block that is in the instance's buffer cache. In order to reduce the amount of writing to the database, which degrades performance, the writing of the modified block to the database might be deferred for some period of time. To protect against node failure, a “redo log” stored in the database maintains a history of modifications that the instance performs on data blocks.
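The deferred write described above may be illustrated by the following minimal sketch. The class `WritingInstance`, its `modify` and `flush` methods, and the representation of the redo log as a list of `(block_id, new_value)` pairs are hypothetical. Recording the log entry before changing the cached block is likewise an assumption of the sketch and not a statement about how any particular database orders these steps.

```python
from typing import Dict, List, Set, Tuple


class WritingInstance:
    """Hypothetical instance that defers writing modified blocks."""

    def __init__(
        self, database: Dict[int, str], redo_log: List[Tuple[int, str]]
    ) -> None:
        self.database = database      # stands in for blocks on shared storage
        self.redo_log = redo_log      # history of modifications
        self.buffer_cache: Dict[int, str] = {}
        self.dirty: Set[int] = set()  # modified but not yet written

    def modify(self, block_id: int, new_value: str) -> None:
        # Record the modification in the redo log (assumed to happen first).
        self.redo_log.append((block_id, new_value))
        # Modify only the cached copy; the database copy is left stale.
        self.buffer_cache[block_id] = new_value
        self.dirty.add(block_id)

    def flush(self) -> None:
        # Deferred write: copy every modified block to the database.
        for block_id in sorted(self.dirty):
            self.database[block_id] = self.buffer_cache[block_id]
        self.dirty.clear()
```

Between a call to `modify` and the next call to `flush`, the database in this sketch still holds the old block, and the redo log is the only record of the change outside the instance's memory.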
Sometimes, nodes fail. When a node fails, the blocks stored in the buffer caches resident on that node may be lost. Some of those lost blocks might be blocks that were modified but not yet written to the database. In such a situation, a recovery process needs to be initiated so that the database contains the correct blocks. According to one approach, an instance resident on the surviving node that has the lowest numerical identifier is selected from among instances resident on surviving nodes in the cluster. The selected instance is given the task of performing the recovery process.
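The selection approach described above may be sketched as follows, for illustration only. The `Node` record and the function `select_recovery_instance` are hypothetical names. The text above does not state which instance is chosen when the selected node hosts more than one instance, so taking the first listed instance is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    """Hypothetical cluster node."""
    node_id: int
    alive: bool
    instances: List[str]  # names of instances resident on this node


def select_recovery_instance(nodes: List[Node]) -> str:
    """Pick an instance on the surviving node with the lowest identifier."""
    survivors = [n for n in nodes if n.alive and n.instances]
    if not survivors:
        raise RuntimeError("no surviving node hosts an instance")
    chosen = min(survivors, key=lambda n: n.node_id)
    # Assumption: any instance on the chosen node may perform recovery,
    # so the first one listed is returned.
    return chosen.instances[0]
```

For example, if node 1 has failed and nodes 2 and 3 survive, the function returns an instance resident on node 2, regardless of how the computational power of node 2 compares with that of node 3.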
Selecting an instance in this manner is quick and easy. However, as is explained above, the nodes that have the lowest numerical identifiers often have the least computational power of any nodes in the cluster. When an instance on a node with relatively low computational power is selected to perform the recovery process, the recovery process takes longer to complete than it would on a more powerful node. To prevent potentially incorrect data from being retrieved from the database or surviving buffer caches, some blocks of data are made inaccessible until certain phases of the recovery process are completed. Consequently, selecting an instance to perform the recovery process according to the above approach often prolongs the period of inaccessibility.
An alternative approach to selecting an instance to perform the recovery process might involve selecting an instance resident on the surviving node that has the numerically highest identifier among the surviving nodes in the cluster. Such an approach would be just as fast and simple as the approach described above, and might result in the selection of an instance on a node with relatively high computational power. However, there is no guarantee that the node that has the numerically highest identifier will always be the most computationally powerful node. There is always the possibility that the least computationally powerful node will have the numerically highest identifier.
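The lack of any guarantee can be shown with a small worked example. In the sketch below, each dictionary maps a surviving node's identifier to a relative computational power. The power figures are invented for illustration, and the function `pick_strongest` exists only as a baseline for comparison.

```python
from typing import Dict


def pick_lowest_id(power_by_id: Dict[int, int]) -> int:
    """Identifier-based selection: lowest surviving identifier."""
    return min(power_by_id)


def pick_highest_id(power_by_id: Dict[int, int]) -> int:
    """Identifier-based selection: highest surviving identifier."""
    return max(power_by_id)


def pick_strongest(power_by_id: Dict[int, int]) -> int:
    """Comparison baseline: the node with the greatest relative power."""
    return max(power_by_id, key=power_by_id.get)


# Typical case: newer nodes (higher identifiers) are more powerful.
typical = {1: 2, 2: 4, 3: 8}

# Atypical case: the most recently added node is the weakest.
atypical = {1: 8, 2: 16, 3: 2}
```

With the `typical` figures, selecting the highest identifier happens to choose the most powerful node (node 3). With the `atypical` figures, the same rule still chooses node 3, which is now the least powerful survivor, while the most powerful survivor is node 2. Neither identifier-based rule consults computational power at all, so neither can be relied upon to choose a powerful node.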
These are some of the problems that attend approaches to the selection of an instance to perform database recovery. Because of these problems, approaches to such instance selection leave much to be desired. A technique that overcomes these problems is needed.
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.