Data replication is the process of maintaining multiple copies of data in a distributed system. Typically, a copy of the data, referred to as a replica, is maintained on each device, such as a server for example, of the distributed system. Data replication is useful in the event of a failure of one of the devices, because a replica can be used to recover from the failure. Also, data replication can provide improved system performance. An application executing on the system can access a specific replica to improve access times and minimize traffic within the system.
In the event of a failure, recovery can be achieved via a process referred to as failover. Failover is the process by which a device having a replica of the data takes over processing for the failed device. Typical distributed systems that support replication and failover implement an approach, wherein each device in the distributed system having a replica stored thereon, also has an instance of a specific state machine stored thereon. All commands and operations, such as reads and writes, go through multiple rounds of message exchanges between the devices to execute a command, in addition to requests to stable storage (e.g., disks). For concerns of efficiency, it is desirable to decrease the number of message exchanges as well as the number of requests to stable storage.