The present invention relates to a data center topology that can recover from a disaster and more particularly to an improved disaster recovery method in the event the active data center malfunctions.
A data center stores information related to a particular business, provides global access to the information and application software through a plurality of computer resources and may include automated systems to monitor server activity, network traffic and performance. A data center may be known by a variety of names such as, by way of example, a server farm, hosting facility, data farm, data warehouse, co-location facility, co-located server hosting facility, corporate data center, managed data center, internet hotel, internet service provider, application service provider, full service provider, wireless application service provider, site or other data network facility. Regardless of the name used, a typical data center houses computer resources such as mainframe computers; web, application, file and printer servers executing various operating systems and application software; storage subsystems; and network infrastructure. A data center may be either a centralized data center or a distributed data center interconnected by either a public or private network.
A centralized data center provides a single data center where the computer resources are located. Because there is only one location, fewer computer resources are required to provide services to the user, management of those resources is much easier, and capital and operating costs are reduced. Unfortunately, centralized data centers are rarely capable of providing the reliability required under common service level agreements for a geographically diverse organization, and the service is susceptible to interruption in the event of a disaster, such as a fire or earthquake, an equipment malfunction or a denial of service attack. For these reasons, centralized data centers are rarely relied upon for critical applications.
A distributed data center is one that locates computer resources at geographically diverse data centers. The use of multiple data centers provides critical redundancy, business continuity, disaster recovery, and load-sharing solutions, albeit at higher capital and operating costs. Some distributed data centers use the Domain Name System (DNS) for managing business continuance and load sharing between multiple data centers. However, an Interior Gateway Protocol (IGP) and External Border Gateway Protocol (E-BGP) are more often used to route traffic between multiple data centers. IGP refers to an interior gateway protocol, that is, an internet protocol used to exchange routing information within an autonomous system. An autonomous system (AS) is a network or group of networks under a common administration and with common routing policies. BGP refers to the Border Gateway Protocol, which is an inter-autonomous-system routing protocol. BGP is used to exchange routing information for the Internet and is the protocol used between Internet service providers (ISPs). When BGP is used between autonomous systems, the protocol is referred to as External BGP (E-BGP). If BGP is used to exchange routes within an AS, then the protocol is referred to as Internal BGP (I-BGP).
One type of distributed data center topology comprises a pair of data centers, one of which is active and one of which is a standby data center. In an active/standby topology, applications are hosted on both data centers but only one data center is active at any given time. All traffic goes to the active data center until it fails, after which traffic is routed to the standby data center. With DNS routing, there could be a significant delay as DNS record caches are updated to redirect the traffic to the now-active data center. During this period, the hosted applications would be inaccessible to any users. A preferable method to implement an active/standby data center topology is to use the same IP address for both data centers and advertise the IP address with different metrics from each data center location. A metric is a standard of measurement, such as path bandwidth, that is used by routing algorithms to determine the optimal path to a destination. This may be preferable to a DNS solution because it avoids the vulnerabilities of DNS record caching.
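The metric-based approach described above can be illustrated with a minimal Python sketch. It is not a routing protocol implementation: the `Advertisement` type, the `best_route` helper, the site labels, the documentation prefix 192.0.2.0/24 and the metric values are all hypothetical, and the sketch assumes that a lower metric is preferred, as is typical for cost-style metrics.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Advertisement:
    site: str    # hypothetical label for the advertising data center
    prefix: str  # the IP prefix shared by both data centers
    metric: int  # assumption: a lower value is the preferred path


def best_route(adverts: Iterable[Advertisement], prefix: str) -> Optional[Advertisement]:
    """Return the lowest-metric advertisement for the prefix, or None if none exists."""
    candidates = [a for a in adverts if a.prefix == prefix]
    return min(candidates, key=lambda a: a.metric, default=None)


# Both sites advertise the same prefix; the active site uses the better metric.
adverts = [
    Advertisement("active", "192.0.2.0/24", 10),
    Advertisement("standby", "192.0.2.0/24", 100),
]
chosen_before = best_route(adverts, "192.0.2.0/24")

# If the active site's advertisement is withdrawn, the standby route is all that remains.
remaining = [a for a in adverts if a.site != "active"]
chosen_after = best_route(remaining, "192.0.2.0/24")

print(chosen_before.site, chosen_after.site)
```

Because clients keep using the same IP address throughout, no cached DNS record has to expire before traffic follows the remaining route.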
Although advertising IP addresses is relatively straightforward, there is a risk that both data centers can be active simultaneously, which is undesirable. Accordingly, there is a great need to enable the standby data center to accurately monitor the health of the active data center such that the standby data center will advertise its IP address only if the active data center is actually down. Further, since routing protocols use various metrics to evaluate which path will be best for traffic to travel, route information will vary depending on the routing algorithm. To aid the process of path determination, it is necessary that the IP address be advertised in a manner that minimizes or eliminates the ambiguity in taking different paths and also minimizes the time it takes to update adjacent routers with the new route information.
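One way to picture the monitoring requirement is the following Python sketch of a standby-side decision rule. It is an illustration under stated assumptions rather than the method of the invention: the `StandbyMonitor` class, the `record_probe` method, the threshold of three consecutive failed probes and the rule that the standby stops advertising as soon as the active site answers again are all hypothetical choices made for the example.

```python
class StandbyMonitor:
    """Decides whether the standby site should advertise the shared IP address."""

    def __init__(self, failure_threshold: int = 3) -> None:
        # Assumption: require several consecutive failures so that a single
        # lost probe does not make both data centers active at once.
        self.failure_threshold = failure_threshold
        self.consecutive_failures = 0
        self.advertising = False

    def record_probe(self, active_is_healthy: bool) -> bool:
        """Record one health probe result and return the advertising decision."""
        if active_is_healthy:
            # Assumption: the standby withdraws once the active site responds again.
            self.consecutive_failures = 0
            self.advertising = False
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.failure_threshold:
                self.advertising = True
        return self.advertising


monitor = StandbyMonitor(failure_threshold=3)
probes = [True, False, False, True, False, False, False]
decisions = [monitor.record_probe(p) for p in probes]
print(decisions)
```

In this sketch two isolated failures followed by a successful probe never trigger an advertisement; only the third consecutive failure does, which is the behavior intended to keep the two sites from being active simultaneously.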
What is needed is a way to make sure that two data centers are not simultaneously active in an active/standby topology. What is also needed is a standby data center that will advertise its IP address only if service by the active data center is interrupted due to a disaster, equipment malfunction or other reason.