Embodiments disclosed herein relate generally to data center availability and, more specifically, to systems for data center availability wherein one data center may assume the responsibilities of another data center. In particular, embodiments may relate to systems and methods for load balancing between data centers, for example in autonomous intranet embodiments.
Many organizations use servers connected to the Internet to provide information and service to customers and potential customers. When a primary server experiences a failure of some sort, customers may be notified, redirected to a backup server, or may lose their connection entirely. In any case, customers' interaction with the organization suffers, and their opinion or impression of the organization may suffer as well.
A data center is a facility that houses computing systems for a particular business, industry, governmental entity, or other organization. Such computing systems may include, for example, one or more server farms that perform various functions for the organization. Examples of such functions include hosting web sites, storing information, and providing processing for computing applications, among others. Other computing systems may be housed in a data center for performing other functions.
Security of information and application processing associated with a data center may be critical to particular organizations. Various efforts have been made to enhance the security of data centers. For example, some data centers are provided with physical security such as housing the data center in an inconspicuous location, providing restricted access to the data center, providing the data center with environmental isolation and control, and providing electrical power supply redundancy to the data center. Another element of security that has been added to data center design is to provide an organization with more than one physical data center, e.g., providing multiple data centers at different locations.
Providing “redundant” or “backup” data centers may provide an organization with the ability to protect data center functionality against harmful factors that extend beyond the scope of the organization's control over a single data center. For example, a single data center may be vulnerable to physical failure, e.g., from terrorist activity, fire, earthquake, etc. A single data center may be vulnerable to electronic failure, e.g., “hacker” activity such as viruses, broadcast storms, denial of service attacks, and the like. A single data center may be vulnerable to an electrical and/or telecommunications failure of such a magnitude that the systems internal to the data center are unable to mitigate it. Other failures that reduce or eliminate the functionality of a single data center are possible. In such instances, having additional data centers at separate geographic locations may provide the organization with the ability to maintain data center functionality after the loss of a single data center.
An organization may desire to provide “always-on” service from data centers such that a client using the functionality of the data centers perceives continuous service both during a failover from one data center to another and during simultaneous operation of multiple active data centers. Some methods have been proposed to provide such “always-on” service to clients connecting via the Internet. For example, U.S. patent application Ser. No. 11/065,871, “DISASTER RECOVERY FOR ACTIVE-STANDBY DATA CENTER USING ROUTE HEALTH AND BGP”; Ser. No. 11/066,955, “APPLICATION BASED ACTIVE-ACTIVE DATA CENTER NETWORK USING ROUTE HEALTH INJECTION AND IGP”; and Ser. No. 11/067,037, “ACTIVE-ACTIVE DATA CENTER USING RHI, BGP, AND IGP ANYCAST FOR DISASTER RECOVERY AND LOAD DISTRIBUTION”, all to Naseh et al., describe the use of border gateway protocol (BGP) and the advertisement of a block of IP addresses, e.g., 24.24.24.0/24, on a subnet basis for the respective data centers. Each of these applications is again incorporated herein by reference in its entirety.
The above-mentioned efforts to enhance the security of data centers may themselves create issues. For example, one networking issue for organizations that maintain multiple active data centers is session persistence. If route maps change during a client session (for example, because changes in network usage change the shortest network path), traffic from a single client session may be routed to more than one data center. For example, if two active data centers advertise the same block of Internet protocol (IP) addresses, a client may generally be routed, according to one of a number of routing metrics, along the shortest topological path to one of the data centers. However, the “shortest” path may change while the session is pending, e.g., as network traffic at various points throughout the network changes. In some circumstances, such changes could cause the route to a different data center to become “shorter” than the route initially taken by the client traffic. This may be particularly problematic for lengthy client sessions (for example, sessions associated with financial transactions performed over a network).
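The session-persistence problem described above can be illustrated with a minimal Python sketch. It assumes two hypothetical data centers, `dc_east` and `dc_west`, that both advertise the example block 24.24.24.0/24, and it reduces route selection to a single numeric path cost per data center (real BGP best-path selection weighs many more attributes). Each packet of one client session is routed using whatever costs are in effect when it is sent, so a mid-session change in costs moves the remainder of the session to the other data center.

```python
import ipaddress

# Hypothetical anycast block advertised by BOTH data centers (example prefix from the text).
ANYCAST_BLOCK = ipaddress.ip_network("24.24.24.0/24")


def select_data_center(path_costs: dict[str, int]) -> str:
    """Return the data center reachable over the lowest-cost path.

    Simplification: one scalar metric per data center; ties are broken by name.
    """
    return min(sorted(path_costs), key=lambda dc: path_costs[dc])


def route_session(dest: str, cost_snapshots: list[dict[str, int]]) -> list[str]:
    """Route each packet of one session using the routing state current at send time."""
    if ipaddress.ip_address(dest) not in ANYCAST_BLOCK:
        raise ValueError(f"{dest} is not in the anycast block {ANYCAST_BLOCK}")
    return [select_data_center(costs) for costs in cost_snapshots]


# Hypothetical path costs observed by the client's network at three points in one session.
snapshots = [
    {"dc_east": 10, "dc_west": 20},  # session starts: east is "shorter"
    {"dc_east": 10, "dc_west": 20},
    {"dc_east": 30, "dc_west": 20},  # congestion raises east's metric mid-session
]

hops = route_session("24.24.24.80", snapshots)
print(hops)                              # ['dc_east', 'dc_east', 'dc_west']
print("session split:", len(set(hops)) > 1)  # session split: True
```

Because the last packet lands at `dc_west`, which holds no state for a session that began at `dc_east`, the client may see the session fail even though both data centers are healthy.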
Route convergence is an example of a networking issue for organizations that maintain an active data center with a passive backup data center that may become active upon failover. When a network topology changes, e.g., due to a failure, some routers on the network may receive updated network information and use it to recompute routes and/or rebuild routing tables. On a large-scale network, e.g., the Internet, route convergence may take a significant amount of time relative to the duration of some client sessions, possibly allowing a client to become aware of a network problem, e.g., by receiving a failure dialog on a network interface. A client may also store domain name system (DNS) records locally, e.g., a cache of IP addresses corresponding to websites. Each such DNS record may carry a time to live (TTL) value; until the TTL expires, the client may not refresh the record, which may slow the route convergence process and/or allow the client to receive a failure dialog on a network interface.
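The effect of an unexpired TTL can be sketched with a simplified client-side cache in Python. The sketch is illustrative only: the class `StubResolverCache`, the hostname `www.example.com`, the 300-second TTL, and the documentation-range addresses standing in for the primary and backup data centers are all hypothetical, and a real resolver also handles negative caching, multiple record types, and other details omitted here. Time is passed in explicitly so that the behavior is deterministic.

```python
from dataclasses import dataclass


@dataclass
class CachedRecord:
    address: str
    expires_at: float  # absolute time (seconds) at which the record's TTL runs out


class StubResolverCache:
    """Hypothetical minimal client-side DNS cache keyed by hostname."""

    def __init__(self, authoritative: dict[str, tuple[str, int]]):
        # Maps hostname -> (address, ttl_seconds); stands in for the authoritative DNS server.
        self.authoritative = authoritative
        self.cache: dict[str, CachedRecord] = {}

    def resolve(self, name: str, now: float) -> str:
        record = self.cache.get(name)
        if record is not None and now < record.expires_at:
            # TTL not yet expired: the cached answer is used without refreshing,
            # even if the authoritative record has since changed (e.g., after failover).
            return record.address
        address, ttl = self.authoritative[name]
        self.cache[name] = CachedRecord(address, now + ttl)
        return address


zone = {"www.example.com": ("192.0.2.10", 300)}  # primary data center, 300 s TTL
client = StubResolverCache(zone)
print(client.resolve("www.example.com", now=0))    # 192.0.2.10
zone["www.example.com"] = ("198.51.100.10", 300)   # failover: record now points at backup
print(client.resolve("www.example.com", now=120))  # 192.0.2.10 (stale: the failed site)
print(client.resolve("www.example.com", now=301))  # 198.51.100.10
```

Between the failover and the expiry of the cached TTL, the client keeps directing traffic to the failed data center, which is the window in which it may receive a failure dialog.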