Replicating databases across a computer cluster can provide a basis for data protection and service availability, provided that the replicated copies remain in service and are available to become active when another copy fails. Different system design parameters may be used depending on the amount of redundancy in place and the rate of failure. Costs can be reduced if the system is designed to repair itself, including recovering from failed servers and disks.
One such system design is an active/passive replication system that uses a cluster as the scope for database replication. As used herein, a cluster is a group of computer systems that work together to provide one or more services so that the cluster can be viewed as a single system in one or more respects. In this active/passive design, a database is replicated to multiple copies in the cluster, with one copy being the active copy. An active copy is a copy to which access (e.g., reads and writes) is allowed. The active copy status is typically managed by a primary active manager within the cluster (a role that may float between computing machines), which determines which copy is active. Accordingly, if the active copy fails, the primary active manager can designate one of the passive copies to be the active copy. However, such an active/passive replication system is confronted with an additional cost impact. Specifically, the redundancy unit (the cluster) is designed with sufficient redundancy to automatically repair itself. This can further drive up costs, especially because designers may assume that the cluster redundancy unit fails at its worst rate.
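The failover behavior described above can be illustrated with a minimal Python sketch. It is not taken from any particular implementation: the names `DatabaseCopy`, `PrimaryActiveManager`, `activate`, and `report_failure` are hypothetical, and the selection policy (promote the first healthy passive copy in insertion order) is an assumption, since the description does not say how a replacement copy is chosen.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class DatabaseCopy:
    """One replicated copy of a database, hosted on a named server."""
    server: str
    healthy: bool = True
    active: bool = False


@dataclass
class PrimaryActiveManager:
    """Hypothetical manager role that tracks which copy is active."""
    copies: Dict[str, DatabaseCopy] = field(default_factory=dict)

    def add_copy(self, server: str) -> None:
        # New copies start out passive and healthy.
        self.copies[server] = DatabaseCopy(server)

    def active_server(self) -> Optional[str]:
        # Return the server hosting the active copy, if there is one.
        for copy in self.copies.values():
            if copy.active:
                return copy.server
        return None

    def activate(self, server: str) -> None:
        # Exactly one copy may be active, so demote every other copy.
        if not self.copies[server].healthy:
            raise ValueError(f"cannot activate failed copy on {server}")
        for copy in self.copies.values():
            copy.active = copy.server == server

    def report_failure(self, server: str) -> Optional[str]:
        """Mark a copy as failed and return the server that is now active."""
        failed = self.copies[server]
        failed.healthy = False
        if not failed.active:
            # A passive copy failed, so the active copy is unchanged.
            return self.active_server()
        failed.active = False
        # Assumed policy: promote the first healthy passive copy.
        for candidate in self.copies.values():
            if candidate.healthy:
                candidate.active = True
                return candidate.server
        return None  # No healthy copy remains in the cluster.


manager = PrimaryActiveManager()
for name in ("server-a", "server-b", "server-c"):
    manager.add_copy(name)
manager.activate("server-a")
new_active = manager.report_failure("server-a")
```

In this sketch, the failure of the active copy on `server-a` causes a passive copy to be promoted, while the failure of a passive copy leaves the active copy in place. A real system would also need to account for replication lag and for the manager role itself moving between machines, which the sketch omits.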