Cloud computing refers to the use of computing resources, including hardware and software, that are delivered as a service over a network such as the Internet. Cloud service providers operate data centers that host customer databases and applications. Customers of a cloud service provider build services for their own customers on top of this infrastructure. Some of those services are mission critical. That is, failure of those services will have an adverse effect on the ability of an entity, such as a business, to continue operations.
One of the biggest benefits of cloud computing is that customers, by virtue of employing a data center, no longer have to worry about the availability of data and code. At the same time, hardware and various other elements inside data centers have reliability limits, and failures can occur. The expectation is that, regardless of these failures, cloud service providers will continue to run applications and protect data. Mechanisms are already in place to handle the failure of individual components of a data center, for example, when a server fails. However, there are types of failures or disruptive events that could affect an entire data center. For example, natural disasters, catastrophic human errors, or malicious acts could result in a massive failure of a data center. Such a massive failure would result in application and data unavailability for a period of time while a cloud service provider repairs the facility.
In order to guarantee availability of applications and data even in cases of a massive data center failure, redundant copies can be maintained at a separate facility. A primary data center can be selected to host applications and data, and a secondary data center can be identified as a backup, wherein the primary and secondary data centers are in different geographical regions. If the primary data center suffers a massive failure and an application and database are no longer accessible, a process called failover can activate the secondary data center to provide access to the application and database on the secondary data center. To support such a disaster recovery scenario, the service provider replicates data between the primary and secondary data centers. Nevertheless, data can be lost in the event of a data center failure because of the distance between data centers located in different regions. The distance causes a lag between transactions committed on a primary database and transactions committed on a secondary database, for instance. When a failure occurs, not all data committed on the primary database will have been committed on the secondary database. Consequently, the transactions that had not yet been replicated are lost after failover.
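The effect of replication lag on failover can be illustrated with a minimal sketch. It is a simplified simulation and not an implementation used by any particular service provider. The function name `simulate_failover` and its parameters are hypothetical, and the lag is modeled, as an assumption, as a fixed number of transactions by which the secondary database trails the primary.

```python
def simulate_failover(num_transactions, lag):
    """Return the transaction ids lost when the primary fails.

    Assumptions (illustrative only): replication is asynchronous, and the
    secondary trails the primary by a fixed `lag` >= 0 transactions.
    """
    primary = []    # transaction ids committed on the primary database
    secondary = []  # transaction ids committed on the secondary database

    for txn_id in range(1, num_transactions + 1):
        primary.append(txn_id)
        # Only transactions older than the lag window have reached the secondary.
        replicated_id = txn_id - lag
        if replicated_id >= 1:
            secondary.append(replicated_id)

    # The primary fails at this point and the secondary takes over.
    # Anything committed on the primary but absent from the secondary is lost.
    replicated = set(secondary)
    return [txn_id for txn_id in primary if txn_id not in replicated]


if __name__ == "__main__":
    lost = simulate_failover(num_transactions=10, lag=3)
    print("Transactions lost after failover:", lost)
```

In this sketch, a lag of zero corresponds to the secondary being fully caught up, so nothing is lost, while any positive lag leaves the most recently committed transactions on the failed primary only.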