The present invention generally relates to disaster recovery systems. Some example embodiments of the present invention are applicable to distributed virtual computing services provided to a plurality of customers.
The present invention generally relates to computing services, e.g., virtual computing services provided by a service provider to an enterprise customer, or to multiple enterprise customers. In some example embodiments, disaster recovery locations for these enterprise customers may be provided, and the disaster recovery locations for multiple enterprise customers may be distributed across multiple data centers that are also used to provide regular production services.
Service providers typically provide services, e.g., virtual computing services such as hosting or storage management, from a number of data centers. Each data center may contain servers, networking devices, storage systems, security systems, and all other hardware and software resources required to provide for the computing needs of the enterprise customers it serves. Each enterprise customer's services may be provided primarily by a particular data center, in the same manner as a company that manages its own computing infrastructure has a primary data center. However, multiple enterprise customers may share the same primary data center. Enterprise customers contract with the service providers to provide computing services for customer applications. The service providers then allocate the resources needed for each customer application in a data center. Service contracts may include guarantees of certain levels of system performance and availability (e.g., service-level agreements, or SLAs).
To meet service availability targets, and to ensure service continuity in the event of a disaster, disaster recovery services may be provided. Examples of disasters include natural disasters, power failures, network failures, fires, and other events that impair the operation or use of a computing center. Consumers of virtual computing services typically require disaster recovery services able to ensure that critical applications remain functional in the event of a significant failure. Often these customers require disaster recovery services that are able to react to the failure of an entire data center. Therefore, service providers must allocate redundant systems and services in remote locations in order to implement the disaster recovery services expected.
An enterprise operating its own data center may have an entire dedicated backup data center set up to serve as a backup in the event of a disaster at its primary data center. The problem with this approach is that the resources spent on the backup are typically idle when the primary data center is operational. Thus, the overhead required for this approach is high, often 100% for many types of resources. Disaster recovery provided in this manner is therefore inefficient, because it ties up resources that go unused during normal operation.
Alternatively, some service providers maintain a dedicated disaster recovery data center to serve the needs of multiple customers. In the typical case, a service provider would locate a data center at a location separate from its other production data centers. The service provider would then equip the data center with the hardware and other resources used to provide virtual computing services to its customers at other locations. This shared backup data center is still idle when not in use for disaster recovery. Moreover, when major disaster events occur, there may not be sufficient capacity to meet the disaster recovery needs of all the customers. Also, the central disaster recovery site may not be configured to provide an easy transition to operation when a disaster occurs. When multiple customers all lose service from a primary site at the same time because of a common event, it may be difficult to transition all of them to the backup site at the same time in an orderly fashion. Finally, while the disaster recovery services are provided remote from other data centers, the disaster recovery data center itself becomes a single point of failure for a large group of customers. If a primary production site has failed and the disaster recovery data center is unable to provide services for some reason, the services of all enterprise customers located in the failed primary production site will be affected.
There exists a need for a distributed disaster recovery system able to provide disaster recovery services to enterprise customers efficiently and reliably.