Cloud computing is changing the ways in which computing services are provided. To take full advantage of the capabilities of a data center, many services are designed to scale with load and to tolerate faults. Some services (e.g., interactive services such as telecommunications and gaming) also have relatively tight limits on network performance parameters such as delay and jitter. These criteria (elasticity, fault tolerance, and network performance) can conflict in many ways. Various mechanisms exist for adding fault tolerance to processes, but they typically do so at the expense of network overhead. One such mechanism is primary-backup replication, in which the state of a primary process is synchronized with the state of a backup process so that the backup process can take over if the primary process fails. Disadvantageously, in many existing primary-backup replication schemes, including those that allow recovery of both memory and disk, regular synchronization between the primary process and the backup process adds delay at least equal to the round-trip delay between the two, because the primary process must wait for the backup process to acknowledge a synchronization before its output can be released. This seriously degrades the network performance of latency-sensitive services.