The present invention relates generally to the field of databases, and more particularly to fault handling in databases. Databases with requirements for high availability are often run on multiple member nodes that are organized into clusters. It is known that a set of clusters hosting a database should preferably be designed to: (i) continue to operate properly even upon occurrence of a hardware failure; and (ii) continue to operate properly even when experiencing increased demand or software upgrades. Because clusters are at risk from disaster events (such as fires, floods, and power failures), disaster recovery solutions are customarily designed to replicate the entire cluster on another, geographically separate cluster.
With a replicated database running on a cluster architecture, a user can connect to, and alter data from, any member node in the primary cluster. Therefore, each member node in the primary cluster must ship its logs to the standby cluster for replication, and each member node in the standby cluster needs a copy of all changes from all member nodes in the primary cluster. In one example cluster architecture, all of the member nodes in the primary cluster are active, while only one node in the standby cluster (referred to as the replay master) is active. The member nodes in the primary cluster connect to the replay master and ship their logs to it. The replay master handles log merging: when the standby cluster takes over from the primary cluster, the replay master brings up the remaining member nodes in the standby cluster and provides a single consistent log to those member nodes.
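The log-shipping and merging arrangement described above can be illustrated with a minimal sketch. This sketch assumes that each log record carries a cluster-wide log sequence number (LSN) that defines a total order across member nodes; the class and field names (ReplayMaster, LogRecord, lsn) are illustrative, not part of any particular database product.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LogRecord:
    node_id: str  # primary-cluster member node that produced the record
    lsn: int      # cluster-wide log sequence number (assumed total order)
    change: str   # description of the data change


class ReplayMaster:
    """The single active node in the standby cluster.

    It collects the logs shipped by every member node in the primary
    cluster and merges them into one consistent log for the rest of
    the standby cluster at takeover time.
    """

    def __init__(self) -> None:
        self._shipped: List[LogRecord] = []

    def receive(self, records: List[LogRecord]) -> None:
        # Each primary member node ships its own log records here.
        self._shipped.extend(records)

    def merged_log(self) -> List[LogRecord]:
        # Merge the per-node logs into a single totally ordered log,
        # which would be provided to the other standby member nodes
        # when the standby cluster takes over.
        return sorted(self._shipped, key=lambda r: r.lsn)


# Usage: two primary member nodes ship interleaved log records.
master = ReplayMaster()
master.receive([LogRecord("node-A", 1, "insert"), LogRecord("node-A", 4, "update")])
master.receive([LogRecord("node-B", 2, "insert"), LogRecord("node-B", 3, "delete")])
consistent = master.merged_log()
```

In this sketch the merge is a simple sort by LSN; a real system would also handle gaps, duplicates, and records arriving out of order from slow member nodes.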