Failover mechanisms in current distributed computing systems are inefficient. Some current distributed computing systems use clusters of servers to serve clients. For example, such a system has a master server and one or more slave servers. The master server asynchronously replicates its data to the slave servers in other clusters. If the master server goes down, one of the slave servers becomes the master and serves the clients. When a section of the network fails, the distributed computing system fails over from one cluster to another. One problem in such systems is that, because replication is asynchronous, there is no guarantee of how long data replicated from one cluster takes to arrive at the other cluster. Replication can consume significant time, e.g., hours or even days, especially if the cluster holds a large amount of data.
Accordingly, if the servers or the network fail in a particular cluster, there can be data that has been written to the master server but has not yet been replicated to the other clusters. Consequently, users of the distributed computing system may experience data loss if the servers of that cluster are failed over to another cluster. To ensure that users do not experience data loss, the distributed computing system may need to wait until all of the servers in the cluster have replicated their data to the other cluster, which can consume significant time, before failing over to serve the clients. Furthermore, current distributed computing systems do not have an efficient way of selecting one of the servers as the master server.
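The data-loss scenario described above can be illustrated with a minimal sketch in Python. It is a toy, single-process model and not any particular system's implementation: the class names `Server` and `AsyncReplicatedPair`, the in-memory `pending` queue, and the `max_items` parameter are all assumptions introduced here for illustration. Writes are acknowledged as soon as the master stores them, replication drains the queue later, and a failover before the queue is empty loses the writes that were never shipped.

```python
from collections import deque


class Server:
    """A toy server holding key-value data (hypothetical, for illustration)."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class AsyncReplicatedPair:
    """One master and one slave with asynchronous replication (assumed model)."""

    def __init__(self):
        self.master = Server("master")
        self.slave = Server("slave")
        # Writes stored on the master but not yet copied to the slave.
        self.pending = deque()

    def write(self, key, value):
        # The write is considered complete once the master has stored it;
        # replication to the slave happens later.
        self.master.data[key] = value
        self.pending.append((key, value))

    def replicate(self, max_items):
        # Copy up to max_items queued writes to the slave, oldest first.
        shipped = 0
        while self.pending and shipped < max_items:
            key, value = self.pending.popleft()
            self.slave.data[key] = value
            shipped += 1
        return shipped

    def fail_over(self):
        # The master is lost: anything still queued never reached the slave.
        lost = [key for key, _ in self.pending]
        self.pending.clear()
        # The slave is promoted and now serves the clients.
        self.master, self.slave = self.slave, None
        self.master.name = "master"
        return lost


pair = AsyncReplicatedPair()
for i in range(5):
    pair.write(f"k{i}", i)

pair.replicate(max_items=3)   # only part of the backlog is shipped
lost_keys = pair.fail_over()  # master fails before the rest is replicated
print(lost_keys)              # ['k3', 'k4']
```

In this sketch the alternative to losing `k3` and `k4` would be to call `replicate` until `pending` is empty before failing over, which corresponds to the waiting cost described above.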