A distributed database, or a distributed data store, is a database in which information is stored on multiple storage devices with multiple computer processor units. A distributed database cluster is the system of storage devices on which a distributed database is stored. The distributed database cluster may be multiple computers in the same physical location, or may be multiple computers that are physically dispersed but connected via a communication network. Distributed database clusters store large amounts of data that are accessible by a large number of computers. For example, large corporations or other organizations that create, maintain, and allow access to a large amount of information internally or externally may use distributed database clusters to store the information.
Distributed database systems are subject to service disruptions if all or a part of the distributed database cluster is upgraded, replaced, or otherwise subject to maintenance. Several methods have been developed to maintain service in the event of a planned disruption in service of a distributed database cluster. One such method involves providing multiple distributed database systems that maintain the same information, an arrangement known as a replicated distributed database system. Before one distributed database cluster is shut down for maintenance, access requests to that cluster are directed to another distributed database cluster. The process of switching from one distributed database cluster to a replacement distributed database cluster is known as a “failover operation.”
Information is dispersed among replicated distributed database systems in at least two ways. In one method, one distributed database cluster is designated the master distributed database cluster through which all access requests are handled. When changes are made to the master distributed database, the changes are replicated to the backup, or slave, database systems. This configuration is known as a primary-backup or master-slave scheme. In another method, access requests can be made to any distributed database cluster. Changes to any one distributed database cluster are replicated to the other distributed database clusters. This configuration is known as a multi-primary or multi-master scheme. In a replicated distributed database system, it is important that all database systems maintain consistent information when accessed. More specifically, a read request for a datum in a replicated distributed database system should be directed to a distributed database cluster in which all previous write requests to that datum have been applied. This is known as access invariance. The access invariance of a replicated distributed database system should be maintained during a failover operation.
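The access invariance property described above can be illustrated with a minimal Python sketch. All names here (`Cluster`, `ReplicatedSystem`, `latest`, `pending`) are hypothetical and are not drawn from any particular database product. The sketch assumes, for simplicity, that write versions are assigned by a single in-memory counter per datum, which sidesteps the conflict resolution that a real multi-primary scheme would require.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical stand-in for one distributed database cluster."""
    name: str
    data: dict = field(default_factory=dict)     # datum key -> value
    applied: dict = field(default_factory=dict)  # datum key -> last applied write version

    def apply(self, key, value, version):
        self.data[key] = value
        self.applied[key] = version


class ReplicatedSystem:
    """Routes reads only to clusters that have applied every prior write."""

    def __init__(self, clusters):
        self.clusters = clusters
        self.latest = {}   # datum key -> version of the most recent write
        self.pending = []  # replication backlog: (target, key, value, version)

    def write(self, cluster, key, value):
        # Assumption: one counter per key orders all writes to that key.
        version = self.latest.get(key, 0) + 1
        cluster.apply(key, value, version)
        self.latest[key] = version
        # Queue the change for every other cluster (asynchronous replication).
        for other in self.clusters:
            if other is not cluster:
                self.pending.append((other, key, value, version))

    def replicate(self):
        # Drain the backlog, skipping changes older than what a target holds.
        for target, key, value, version in self.pending:
            if version > target.applied.get(key, 0):
                target.apply(key, value, version)
        self.pending.clear()

    def read(self, key):
        # Access invariance: pick a cluster that has applied all previous
        # writes to this datum; never serve the read from a stale cluster.
        needed = self.latest.get(key, 0)
        for cluster in self.clusters:
            if cluster.applied.get(key, 0) >= needed:
                return cluster.name, cluster.data.get(key)
        raise RuntimeError(f"no cluster has applied all writes to {key!r}")
```

In this sketch, a read issued before `replicate()` runs is served only by the cluster that accepted the write, while a read issued afterward may be served by any cluster. A primary-backup scheme corresponds to the special case in which `write` is always called with the same cluster.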
While replication of data among the distributed database clusters is an ongoing process, during a failover operation it is important that the current data in the original distributed database cluster are completely transferred to a new distributed database cluster that will remain operational. If a failover operation occurs and the new distributed database cluster does not contain exactly the same information as the original distributed database cluster, information will be lost, and future access requests to the new distributed database cluster may return outdated or erroneous information. In addition, when a failover operation occurs, all client software processes should stop accessing the original distributed database cluster. This is usually accomplished by forcing the software processes to abort and then restarting the processes after the failover operation is complete. Because of these constraints, failover operations have generally taken a substantial amount of time to complete, especially when the replicated distributed database system is large.
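The conventional failover sequence described above (stop client access, transfer the remaining data, confirm that the clusters match, then redirect clients) might be sketched as follows. This is an illustrative Python sketch under stated assumptions, not an implementation of any particular system: `Cluster`, `Router`, `failover`, and the `backlog` list of not-yet-replicated changes are all hypothetical names, and each cluster is modeled as a simple in-memory dictionary.

```python
class Cluster:
    """Hypothetical stand-in for one distributed database cluster."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class Router:
    """Hypothetical front end that directs client writes to the active cluster."""

    def __init__(self, active):
        self.active = active
        self.accepting = True

    def write(self, key, value):
        if not self.accepting:
            # Models clients being forced to stop during the failover.
            raise RuntimeError("failover in progress; access is suspended")
        self.active.data[key] = value


def failover(router, original, replacement, backlog):
    """Switch the router from `original` to `replacement`.

    `backlog` is an assumed list of (key, value) changes that were applied
    to the original cluster but not yet replicated to the replacement.
    """
    # Step 1: stop all client access to the original cluster.
    router.accepting = False

    # Step 2: transfer the changes that replication has not yet delivered.
    for key, value in backlog:
        replacement.data[key] = value
    backlog.clear()

    # Step 3: confirm the replacement holds exactly the same information.
    if replacement.data != original.data:
        router.accepting = True  # resume service on the original cluster
        raise RuntimeError("replacement cluster is not fully synchronized")

    # Step 4: redirect access requests and let clients resume.
    router.active = replacement
    router.accepting = True
```

Steps 2 and 3 touch every outstanding change and, in this naive form, compare the full contents of both clusters while client access is suspended, which suggests why such failover operations take longer as the replicated system grows.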