Many sites on the Internet are not as simple in their implementation as they appear to the user. Often what appears to be a single site on a single computer is actually a collection of servers on a Local Area Network (LAN). This collection of servers is commonly referred to as a server farm. The server farm frequently has more than one connection to the Internet to further ensure that the site does not have a single point of failure to the outside world. Sometimes these additional connections are actually mirrored sites located at different geographic locations.
The purpose of the server farm is to provide enough raw computing power for a site so that it does not get overwhelmed when the traffic is heavy. Many server farm sites use LAN load balancers to ensure that the traffic load is evenly balanced among all of the servers in the farm. The LAN load balancers present a single Virtual Internet Protocol address (VIP) to the outside world for the site, where the single VIP address represents a virtual site comprising all of the servers in the farm.
LAN load balancers typically use performance metrics of the individual servers, along with availability information for the requested service, to direct each connection to the server in the virtual site that can best fulfil a client's request. A service is defined as a process (application program) running on a server, uniquely identified by the Internet Protocol (IP) address of the server and the service port on which the process is listening. For instance, a HyperText Transfer Protocol (HTTP) server running on a server with an IP address of 192.10.1.12 listens on port 80 and would be referred to as the service 192.10.1.12:80. Performance metrics are quantitative data about a particular server, such as the number of connections to the server or the load on the server's central processing unit (CPU), which reflects, e.g., how much data the clients are transferring and how much additional processing the server must complete for each client request. The performance metrics often provide a better measure of the ability of the virtual site to satisfy a client request than service availability. For example, the ability to ping a server is not truly a measure of service availability (although it is sometimes used as one), since a successful ping does not mean that the service (such as HTTP) is actually available on that server. LAN load balancers typically use performance metrics to balance loads only among those sites where the service is available, since when the service is not available it is unlikely that performance metrics for the site can be obtained.
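The selection described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of any particular load balancer: the `Service` class, the `pick_service` function, the "fewest connections" rule, and the addresses 192.10.1.13 and 192.10.1.14 are all hypothetical. Only 192.10.1.12:80 comes from the text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Service:
    """A service is identified by the server's IP address and the service port."""
    ip: str
    port: int
    available: bool    # result of a service-level check (assumed, not a ping)
    connections: int   # performance metric: current connection count

    @property
    def name(self) -> str:
        return f"{self.ip}:{self.port}"


def pick_service(services: List[Service]) -> Optional[Service]:
    """Return the available service with the fewest connections, or None."""
    candidates = [s for s in services if s.available]
    if not candidates:
        return None
    return min(candidates, key=lambda s: s.connections)


# Hypothetical farm behind one VIP; only the first address appears in the text.
farm = [
    Service("192.10.1.12", 80, available=True, connections=40),
    Service("192.10.1.13", 80, available=False, connections=0),
    Service("192.10.1.14", 80, available=True, connections=25),
]

chosen = pick_service(farm)
print(chosen.name if chosen else "no service available")
```

Note that the unavailable service is excluded even though its connection count is the lowest, which reflects the point that performance metrics are only meaningful where the service is actually available.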
When the virtual site comprises multiple physical sites distributed across a Wide Area Network (WAN), multiple VIP addresses are required, with at least one VIP address assigned to each physical site. A WAN or multi-site load balancer distributes the traffic load more evenly among the multiple physical sites. Unlike the LAN load balancers, the multi-site load balancers typically do not view server farms as individual servers. Instead, they view each server farm as a single site and attempt to balance the traffic to each site.
The most common implementation for multi-site load balancing is to load balance the Domain Name System (DNS) requests for a host name. For example, when a client enters a uniform resource locator (URL) in a web browser or clicks on a link, the client's name server must translate the host name in the URL into an Internet Protocol (IP) address. The DNS request works its way through the Internet until it eventually finds a name server that claims to have an authoritative answer for the request, at which point the request may be balanced.
Like the LAN balancers, WAN or multi-site load balancers attempt to direct each connection to the server that can best fulfil the DNS request, in this case by referring the client to a site that is capable of providing an optimal response. There are several factors that influence whether a site is capable of providing an optimal response. One factor is server response time, which is based on several factors, primarily the performance metrics of the servers that comprise the site. Another factor is network response time, which is based on network latency. Network latency is a measure of how quickly packets can reach the site through the network.
Prior art approaches to multi-site load balancing refer clients to sites having either the fastest server response times or the fastest network response times. However, those sites may not be the sites actually capable of providing the client with the optimal response.
The most common prior art approach to multi-site balancing refers clients to the best available site of the moment based on server response time. But load balancing based solely on server response time may break down completely when there are significant differences in the network response time between the client and each of the available sites. A site may have the best server response time but the slowest network response time. For example, a site with the best performance metrics might have an unacceptably slow 600 millisecond network latency, whereas another site with only marginally poorer performance metrics, but a significantly lower network latency of 150 milliseconds, may be a better choice.
Another less commonly used prior art approach to multi-site balancing refers clients to the best available site of the moment based solely on network response time. But load balancing based solely on network response time may also be problematic, since the site with the least network latency may also be the most heavily loaded. Although the packet reaches the site quickly, the servers may be so overloaded that the server response time is totally unacceptable.
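The shortcomings of the two single-metric approaches can be illustrated with a small numeric sketch. The 600 and 150 millisecond latencies for sites A and B come from the example above. Everything else is assumed purely for illustration: the server response times, the third site C, and the idea of estimating the client's experience as the simple sum of server and network response time.

```python
# Hypothetical measurements in milliseconds. Only the 600 ms and 150 ms
# network latencies are taken from the text; the rest are assumed.
sites = {
    "A": {"server_ms": 20, "network_ms": 600},   # best server, slowest network
    "B": {"server_ms": 30, "network_ms": 150},   # marginally slower server
    "C": {"server_ms": 900, "network_ms": 50},   # closest site, but overloaded
}


def best_by(metric):
    """Return the name of the site that minimizes the given metric function."""
    return min(sites, key=lambda name: metric(sites[name]))


by_server = best_by(lambda s: s["server_ms"])
by_network = best_by(lambda s: s["network_ms"])
# Assumed combined estimate: server response time plus network response time.
by_total = best_by(lambda s: s["server_ms"] + s["network_ms"])

print(by_server, by_network, by_total)
```

Under these assumed numbers, balancing on server response time alone selects site A, balancing on network response time alone selects site C, and neither matches site B, which has the lowest combined estimate.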
Another problem with load balancing based solely on network response time occurs when there is only one site available to respond. For example, multiple VIPs may be associated with the same site, where the VIP addresses often correspond to the individual services available at that site (e.g., HTTP, HTTPS, FTP, etc.). Since there is only one site, the prior art network latency load balancing approach defaults to using a round-robin balancing of the VIP addresses mapped to the host name to return a random VIP address. But the randomly returned VIP address may not necessarily be the VIP address of the site capable of providing the optimal response.
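The round-robin fallback described above can be sketched as follows. This is a minimal illustration, not the behavior of any specific product: the VIP addresses are hypothetical values taken from the documentation address range, and `next_vip` is an illustrative helper name.

```python
from itertools import cycle

# Hypothetical VIP addresses mapped to a single host name at one site.
vips = ["192.0.2.10", "192.0.2.11", "192.0.2.12"]

# Round robin hands out the addresses in a fixed rotation.
rotation = cycle(vips)


def next_vip() -> str:
    """Return the next VIP in the rotation, ignoring all performance data."""
    return next(rotation)


answers = [next_vip() for _ in range(4)]
print(answers)
```

Because the rotation consults neither server metrics nor network latency, the address returned to a given client is arbitrary with respect to which VIP could actually provide the optimal response.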