The subject matter discussed in the background section should not be assumed to be prior art merely as a result of its mention in the background section. Similarly, a problem mentioned in the background section or associated with the subject matter of the background section should not be assumed to have been previously recognized in the prior art. The subject matter in the background section merely represents different approaches, which in and of themselves may also correspond to implementations of the claimed inventions.
Cloud computing is an information technology paradigm, and a model for enabling ubiquitous access to shared pools of configurable resources (such as computer networks, servers, data storage, applications and services), which may be rapidly provisioned with minimal management effort, often over the Internet. Cloud computing allows users and enterprises with various computing capabilities to store and process data either in a privately-owned cloud, or on third-party servers located in data centers, thus making data-accessing mechanisms more efficient and reliable.
A distributed database can be an organized collection of information that is dispersed over a network of interconnected computers, which may be referred to as a cluster of nodes, such as a cloud computing network. A high availability distributed database system provides continued access to data in a database even after a node that stores a copy of the database fails and becomes unavailable for access by an end user. For example, if each of three nodes stores a copy (or replica) of a database, after the failure of one node, the end users can still access the data in the database through one of the other available nodes that stores a replica of the database. Further to this example, the three nodes that store the three replicas of the database may be distributed across three fault domains of nodes, such as three racks of nodes that each share a single point of failure. Consequently, if a rack of nodes, which includes a node that stores a replica of a database, is affected by a single point of failure, such as a power outage or a loss of network access that results in a failure for all nodes in the rack, the end users can still access the data in the database through one of the other racks that includes one of the other available nodes that stores a replica of the database. A replication process keeps a distributed database up to date by identifying changes in one replica of the database and propagating the changes to the other replicas of the database.
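The three-rack example above can be illustrated with a minimal sketch. The node names, rack names, and the function available_replica are hypothetical and chosen only for illustration; the sketch assumes one replica per rack and treats a rack failure as making every node in that rack unavailable.

```python
from typing import Dict, Optional, Set

# Hypothetical layout: each replica node sits in a different rack (fault domain).
REPLICA_RACKS: Dict[str, str] = {
    "node-a": "rack-1",
    "node-b": "rack-2",
    "node-c": "rack-3",
}


def available_replica(replica_racks: Dict[str, str],
                      failed_racks: Set[str]) -> Optional[str]:
    """Return the first node (in name order) whose rack has not failed.

    Returns None when every rack holding a replica has failed, which is
    the case in which the data becomes unavailable.
    """
    for node, rack in sorted(replica_racks.items()):
        if rack not in failed_racks:
            return node
    return None


if __name__ == "__main__":
    # A single rack failure still leaves two replicas reachable.
    print(available_replica(REPLICA_RACKS, {"rack-1"}))
    # Losing all three fault domains leaves no reachable replica.
    print(available_replica(REPLICA_RACKS, {"rack-1", "rack-2", "rack-3"}))
```

Placing each replica in a separate fault domain means that a single point of failure can remove at most one replica from service in this sketch.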
As a distributed database system grows in scale, a single node failure becomes increasingly likely. While a single node failure may not lead to immediate data loss or data unavailability, such a failure affects the probability of data loss, as presented in “Probability of Data Loss in Large Clusters,” by Martin Kleppmann (http://martin.kleppmann.com/2017/01/26/data-loss-in-large-clusters.html). The probability of data loss depends upon the number of nodes in a cluster of nodes, and the probability of a node failure. The final formula presented is:
Probability of Data Loss = k · p^r
where k is approximately the number of partitions in a cluster of nodes, p is the probability of a node failure within a time window equal to the recovery time of a node, and r is the replication factor of the data, that is, how many copies are stored of each data element. Kleppmann's article assumes a constant probability p of a node failure. Distributed database systems attempt to minimize the probability of simultaneous node failures by shortening the recovery time window: for a given cluster size (the number of partitions) and replication factor, a shorter recovery time from a node failure lowers p and therefore lowers the probability of data loss.
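The effect of the recovery time window on this formula can be sketched numerically. The sketch below is illustrative only: the function names are hypothetical, and it assumes (as an approximation not stated in the cited article) that for a small failure rate, p is roughly the per-node failure rate multiplied by the length of the recovery window.

```python
def node_failure_probability(failures_per_hour: float,
                             recovery_hours: float) -> float:
    """Approximate p, the chance that a node fails within the recovery window.

    Assumption: for small rates, p is about rate * window; capped at 1.0.
    """
    return min(1.0, failures_per_hour * recovery_hours)


def data_loss_probability(k: float, p: float, r: int) -> float:
    """Approximate probability of data loss as k * p**r, capped at 1.0.

    k: approximate number of partitions in the cluster
    p: probability of a node failure within the recovery window
    r: replication factor (copies of each data element)
    """
    return min(1.0, k * p ** r)


if __name__ == "__main__":
    k, r = 1000, 3        # hypothetical cluster size and replication factor
    rate = 0.001          # hypothetical per-node failures per hour
    for window in (1.0, 0.5, 0.25):
        p = node_failure_probability(rate, window)
        print(window, data_loss_probability(k, p, r))
```

Under these assumptions, halving the recovery window halves p, so the estimated probability of data loss falls by a factor of 2^r (a factor of 8 when r is 3).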