Cloud-based systems are proliferating as providers use cloud computing to offer services. A cloud-based system includes a plurality of servers interconnected to one another, including servers in different, geographically diverse data centers. Because a cloud-based system can be spread globally, there is a need for system replication and coordination. System replication includes distribution of files, databases, objects, etc. between different nodes, servers, clusters, etc. Subsequent to system replication, there is a need for validation and possibly recovery due to failures and/or errors in the system replication.

Paxos, Raft, and ZooKeeper are examples of conventional techniques in system replication and coordination. Paxos is a family of protocols for guaranteeing consistency across a group of unreliable replicas. The protocol attempts to make progress even during periods when some replicas are unresponsive. Raft is a consensus algorithm designed as an alternative to Paxos. It was meant to be more understandable than Paxos through separation of logic, and it is also formally proven safe and offers some additional features. Raft offers a generic way to distribute a state machine across a cluster of computing systems, ensuring that each node in the cluster agrees upon the same series of state transitions. ZooKeeper is a distributed, open-source coordination service for distributed applications. It exposes a simple set of primitives that distributed applications can build upon to implement higher-level services for synchronization, configuration maintenance, groups, and naming.

While these approaches provide replication and coordination, there is a need for additional fault-tolerance and recovery for in-service cloud-based systems.
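
For illustration only, the replicated state machine idea described above for Raft (each node applying the same series of state transitions) can be sketched as follows. This is a simplified sketch and not an implementation of Raft, Paxos, or ZooKeeper: leader election, terms, networking, and failure handling are omitted. The names `Replica`, `apply`, and `is_committed` are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical replica holding a simple key-value state machine."""
    name: str
    state: dict = field(default_factory=dict)
    applied: int = 0  # number of log entries applied so far

    def apply(self, log):
        # Apply, in order, only the entries this replica has not yet applied.
        for key, value in log[self.applied:]:
            self.state[key] = value
        self.applied = len(log)


def is_committed(acks, cluster_size):
    """Assumed rule: an entry is committed once a strict majority stores it."""
    return acks > cluster_size // 2


# The agreed, ordered series of state transitions (here: key-value writes).
log = [("x", 1), ("y", 2), ("x", 3)]

replicas = [Replica("a"), Replica("b"), Replica("c")]
replicas[2].apply(log[:1])  # replica "c" initially lags behind
for replica in replicas:
    replica.apply(log)      # every replica catches up from the same log
```

Because every replica applies the same entries in the same order to a deterministic state machine, all replicas end in the same state, including the replica that initially lagged. The majority rule in `is_committed` reflects why such schemes can continue to make progress while a minority of replicas is unresponsive.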