Load balancing is a process of selectively routing successive client requests to one of a plurality of servers that is then currently best able to service each request. In past approaches, a client request is normally routed on a hop-by-hop basis from its source to a destination router that is communicatively coupled to a hardware or software load-balancing device. The load-balancing device then determines which server should service the request, and forwards the request to that server.
Although this approach is workable, when a plurality of servers is organized in a server farm that is distributed over numerous logically or geographically separate sites, the past approach becomes inefficient. There may be a great distance between the load-balancing device and one or more of the servers, but packets in a flow associated with the client request are required to arrive at the load-balancing device repeatedly and are then repeatedly re-routed to the selected server, resulting in delay. Specifically, for each new packet arriving for the same flow, the same server load-balancing decision must be carried out. There is a need to cache the first such decision and a need for a way to indicate the decision for future flows. There is a need for a way to route the flow of a client request to a selected router more directly and rapidly.
Load balancing can involve either global load balancing or local load balancing. In global load balancing over the Internet, locating a content site that can service a client request generally involves mapping a host name to an IP address using the Domain Name System (DNS). Some DNS servers can store multiple IP address entries for a host, and deliver successive mappings based on a round-robin scheme. DNS also enables a client to look up available servers for a particular protocol, and can return server preferences that may assist a client in selecting a particular server. Dynamic DNS can store weight values in server records, so that a server can be selected using a weighted approach. Proximity values obtained using Border Gateway Protocol (BGP) can be used to determine the server that is closest to a particular client. The proximity values can be obtained from the AS hop count value provided by BGP, or from hop count values that are obtained using a Layer 3 routing protocol such as DVMRP. Alternatively, round-trip delay time values could be measured by sending ICMP Echo Request packets and measuring the time delay between each ICMP Echo Request and its Reply.
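The round-robin and weighted DNS selection schemes described above can be sketched as follows. This is a minimal illustration only, not the behavior of any particular DNS server: the class name, the function name, the host name, and the addresses are all hypothetical.

```python
import itertools
import random


class RoundRobinResolver:
    """Hands out the stored addresses for a host name in rotating order."""

    def __init__(self, records):
        # records: host name -> list of IP address strings (hypothetical data)
        self._cycles = {host: itertools.cycle(addrs)
                        for host, addrs in records.items()}

    def resolve(self, host):
        # Each call returns the next address in the rotation for that host.
        return next(self._cycles[host])


def pick_weighted(weights, rng=random):
    """Pick one address with probability proportional to its stored weight.

    weights: address -> non-negative weight (an assumed record layout).
    """
    addrs = list(weights)
    return rng.choices(addrs, weights=[weights[a] for a in addrs], k=1)[0]
```

Successive calls to `resolve` walk the address list and wrap around, while `pick_weighted` favors servers with larger weights on average.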
A commercial example of a global load-balancing device is Distributed Director, from Cisco Systems, Inc., San Jose, Calif. Distributed Director operates in a DNS mode and an HTTP redirect mode. In DNS mode, Distributed Director maps a host name to one of many IP addresses. A disadvantage of this approach is that many hops may be required before the correct DNS server is discovered and the query is satisfied. Following such hops adds to latency from the client's perspective. Further, certain clients will cache the DNS replies; however, if a DNS server goes down or becomes unavailable, caching may cause a client request to attempt to reach a server that is not responding.
Still other disadvantages include the need for manual configuration of Distributed Directors in the DNS server; latency in that the client caches the mapping for a longer time period than the DNS server allows, such that the server mapping becomes invalid but the client is not notified; and lack of dynamic amortization of the cost of discovery of mapping.
In HTTP redirect mode, Distributed Director redirects an HTTP client to a nearby HTTP server. The HTTP server is selected by communicating, using a Director Response Protocol (DRP), with agents at the servers. The agents provide network-related metrics, external metrics, internal metrics, and server metrics. Network-related metrics may include round-trip packet delay and network topology information. External metrics may include the distance between a client and the DRP agents at the servers in terms of BGP hop counts. Internal metrics may be the distance between a DRP agent and the nearest BGP router as measured by IGP metrics. Server metrics may be the distance between servers and DRP agents as measured by IGP metrics. A disadvantage of this approach is that it is limited to HTTP.
Local load balancing may involve a Layer 2 rewriting approach or a Layer 3 rewriting approach. A commercial example of a local load-balancing device is Local Director, from Cisco Systems, Inc., San Jose, Calif. The Layer 3 rewriting approach is used when a selected host has only one IP address. The load-balancing device rewrites the destination IP address in all client packets destined for that host with the host IP address. The Layer 2 rewriting approach is used when a plurality of hosts share a single virtual IP address. In that case, the load-balancing device rewrites the destination MAC address of each packet to the MAC address of the selected host.
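The difference between the two rewriting approaches can be illustrated with a short sketch. The `Packet` type and both function names are hypothetical and model only the two header fields discussed above; a real device would also update checksums and operate on raw frames.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    dst_mac: str          # Layer 2 destination address
    dst_ip: str           # Layer 3 destination address
    payload: bytes = b""


def rewrite_layer3(packet, host_ip):
    """Layer 3 rewriting: replace the destination IP with the host's own IP."""
    return replace(packet, dst_ip=host_ip)


def rewrite_layer2(packet, host_mac):
    """Layer 2 rewriting: keep the shared virtual IP, change only the MAC."""
    return replace(packet, dst_mac=host_mac)
```

In the Layer 2 case the destination IP address is left untouched, which is why the hosts must all accept traffic addressed to the shared virtual IP address.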
In local load balancing, a particular host may be selected using numerous approaches, including a round robin approach, a weighted server approach, a weighted round robin approach, an approach based on the number of connections of the server, and an approach based on the amount of round-trip delay to each server.
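Two of the selection approaches listed above, weighted round robin and selection by number of connections, might be sketched as follows. The function names and the server names in the usage are hypothetical, and the weights are assumed to be small non-negative integers.

```python
import itertools


def weighted_round_robin(weights):
    """Return an endless iterator in which each server appears `weight`
    times per cycle (a simple, non-interleaved schedule)."""
    schedule = [name for name, w in weights.items() for _ in range(w)]
    return itertools.cycle(schedule)


def least_connections(active):
    """Pick the server that currently has the fewest open connections.

    active: server name -> current connection count (assumed to be tracked
    elsewhere by the load-balancing device).
    """
    return min(active, key=active.get)
```

A delay-based policy has the same shape as `least_connections`, with measured round-trip delay per server taking the place of the connection count.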
Load-balancing devices have also been available in the past from ArrowPoint, SightPath, Alteon, and other vendors.
All of these approaches have disadvantages when applied in the context of high-demand content networks that provide large amounts of multimedia content to millions of widely distributed clients. The owners or operators of these content networks, known as content providers, need approaches other than load balancing to ensure that all requesting clients receive requested content. Certain kinds of caching may be used to address this need. One past approach to caching is Web Content Cache Protocol (WCCP) as defined at the document draft-forster-web-pro-00.txt at domain “wrec.org” on the World Wide Web. A “boomerang” agent is configured to intercept a DNS request message and broadcast the same request to multiple WCCP-enabled DNS caches. The first WCCP DNS cache to send a reply is elected as the responding server, and its server address is returned to the client. However, sending DNS queries to many WCCP DNS caches wastes processing cycles and network resources of those caches that respond more slowly, because their replies are ignored. Thus, the flooding nature of the boomerang protocol creates excessive network traffic and is not scalable.
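The scaling problem of the boomerang election can be made concrete with a small model. This sketch does not implement WCCP; it only assumes that each cache's reply delay is known, elects the fastest, and counts the replies that are discarded. The function name and cache names are hypothetical.

```python
def boomerang_elect(reply_delays_ms):
    """Model a first-reply-wins election among DNS caches.

    reply_delays_ms: cache name -> time taken to reply, in milliseconds.
    Returns (elected cache, number of replies that were computed but ignored).
    """
    winner = min(reply_delays_ms, key=reply_delays_ms.get)
    wasted_replies = len(reply_delays_ms) - 1
    return winner, wasted_replies
```

With N caches, every client query produces N requests and N - 1 ignored replies, so the wasted work grows linearly with the number of caches.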
Further, caching does not solve all problems associated with content providers. Caching approaches deprive content providers of important content management data, such as usage tracking information that may be vital for billing purposes. Further, a caching approach that allows users to be untraceable is no longer acceptable for security reasons. As a result, there is a need for improved approaches to deliver content to clients without the disadvantages of past approaches.
One possible approach for improving content delivery involves configuring routers in a network to transmit content data to more than one client at a time. This improves data distribution efficiency and reduces network congestion. This approach may be implemented using “multicast” communications. In Internet Protocol Version 6 (IPv6), an anycast is communication between a single sender and the nearest of several receivers in a group; the receivers are identified by 27-bit subnet addresses. A multicast is communication between a single sender and all receivers in a multicast group, and a unicast is communication between a single sender and a single receiver in a network.
Labeling mechanisms offer other ways of distributing data in a network more efficiently. In a normally routed data network, frames of data pass from a source to a destination on a hop-by-hop basis. In this context, a “hop” represents a specific network data processing device, such as a router. Transit routers evaluate each frame and perform a route table lookup to determine the next hop toward the destination. Typically, the Layer 3 header of a frame is evaluated in this step. “Layer 3” refers to one of the logical communication layers defined in the Open Systems Interconnect (OSI) reference model. This evaluation process tends to reduce throughput in a network because of the intensive processing steps that are needed to process each frame. Although some routers implement hardware and software switching techniques to accelerate the evaluation process by creating high-speed cache entries, these methods rely upon the Layer 3 routing protocol to determine the path to the destination.
However, such routing protocols have little, if any, visibility into characteristics of the network at other layers, such as quality of service, loading, or the identity of a particular content server or other data source that is servicing a source device. To address these issues, multi-protocol label switching (MPLS) enables devices to specify paths in the network based upon quality of service or bandwidth needs of applications.
With MPLS, an edge label switch router (edge LSR) creates a label and applies it to packets. The label is used by label switch routers (LSRs), which may be switches or routers, to forward packets. The format of the label varies based upon the network media type. In one approach, in a LAN environment, the label is located between the Layer 2 header and Layer 3 header of a packet. A label-switched path (LSP) is a path defined by the labels through LSRs between end points. A label virtual circuit (LVC) is an LSP through an asynchronous transfer mode (ATM) system. In ATM systems, cells rather than frames are labeled.
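As an illustration of a label placed between the Layer 2 and Layer 3 headers, the sketch below packs a 32-bit label entry using the field layout of the MPLS label stack encoding in RFC 3032 (20-bit label, 3-bit experimental field, 1-bit bottom-of-stack flag, 8-bit TTL). That layout is an assumption about the format in use, since the text above notes that the format varies by media type. The function names and header bytes are hypothetical.

```python
import struct


def pack_shim(label, exp=0, bottom=True, ttl=64):
    """Encode one label entry as 4 bytes in network byte order."""
    if not 0 <= label < 2 ** 20:
        raise ValueError("label must fit in 20 bits")
    if not 0 <= exp < 8 or not 0 <= ttl < 256:
        raise ValueError("exp must fit in 3 bits and ttl in 8 bits")
    word = (label << 12) | (exp << 9) | (int(bottom) << 8) | ttl
    return struct.pack("!I", word)


def unpack_shim(data):
    """Decode the first 4 bytes into (label, exp, bottom, ttl)."""
    (word,) = struct.unpack("!I", data[:4])
    return word >> 12, (word >> 9) & 0x7, bool((word >> 8) & 1), word & 0xFF


def insert_label(l2_header, l3_packet, label):
    """Place the label entry between the Layer 2 header and the Layer 3 packet."""
    return l2_header + pack_shim(label) + l3_packet
```

Because the entry sits in front of the Layer 3 header, an LSR can read the label without parsing the IP header at all.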
Each LSR maintains a label forwarding information base (LFIB) that indicates where and how to forward frames with specific label values. In one implementation, each LFIB entry comprises an association of values that identify an incoming interface, incoming label, destination network, outgoing interface, and outgoing label. When an LSR receives a frame from a particular incoming interface that bears a label, the LSR looks up the incoming interface value and incoming label value in the LFIB. If there is a match, the LSR routes the frame to the destination network identified in the matching entry, on the outgoing interface, and replaces the incoming label of the frame with the outgoing label. This process is repeated at each successive LSR in a label-switched path until an edge LSR at the destination end is reached. At this router, known as an egress edge LSR, all label information is stripped and a standard frame is passed to the destination. Because each LSR in that path could switch the frame based upon content in the LFIB and did not need to perform the usual routing operations, the frame is handled more quickly.
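The LFIB lookup and label swap described above can be sketched as a dictionary keyed on incoming interface and incoming label. This is a simplified model under stated assumptions: the class names, interface names, and label values are hypothetical, and an outgoing label of `None` is used here to represent the egress edge LSR stripping the label.

```python
from typing import NamedTuple, Optional


class LfibEntry(NamedTuple):
    destination_network: str
    out_interface: str
    out_label: Optional[int]   # None: strip the label (egress edge LSR)


class Frame(NamedTuple):
    label: Optional[int]
    payload: bytes


class Lsr:
    def __init__(self, lfib):
        # lfib: (incoming interface, incoming label) -> LfibEntry
        self.lfib = lfib

    def forward(self, in_interface, frame):
        """Look up the frame's label and swap it for the outgoing label."""
        entry = self.lfib.get((in_interface, frame.label))
        if entry is None:
            return None        # no match; a real device would route or drop
        return entry.out_interface, frame._replace(label=entry.out_label)
```

Forwarding is a single exact-match lookup per hop, which is the source of the speed advantage over a full route table evaluation.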
An implementation of MPLS is generally structured using a forwarding component and a control component (or “control plane”). The forwarding component uses labels carried by packets and the label-forwarding information maintained by an LSR to perform packet forwarding. The control component is responsible for maintaining correct label-forwarding information among a group of interconnected label switches.
While MPLS has many applications, including quality of service management, traffic engineering, establishing and managing virtual private networks, provisioning, managing IP of any protocol, etc., it has no presently known applicability to server load balancing.
Based on the foregoing, broadly stated, there is a need for an improved network load balancing approach that is useful in a global (WAN) environment or local (LAN) environment.
There is a particular need for a load balancing approach that is appropriate for use in high-demand, high-volume content networks.
It would be advantageous to have a load-balancing approach that can make use of existing network protocols to carry load-balancing information.
Another deficiency of past approaches is that they fail to support “stickiness” of clients to servers. When a client makes multiple related requests for service, such related subsequent requests are referred to herein as “sticky” content requests because it is desirable to logically attach the requesting client to the same server, even though some of the network protocols that are conventionally involved in such client-server communications are connection-less, and do not support a persistent connection of the client to the server.
In past approaches, flow states have been tracked by storing tables of the “5-tuple” of Layer 4 values (source IP address, source port value, destination IP address, destination port value, and protocol identifier), which uniquely identify flows, in association with server identifier values. These tables provide a useful mapping of a client request flow to a specific server, and are generally required to carry out Layer 3 or Layer 2 network address translation for local server load-balancing, and sometimes for global server load-balancing if one or more intermediate routers are also carrying out global server load-balancing. However, these values have been stored at nodes such as routers at the expense of complexity and processing overhead.
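A flow table of this kind can be sketched as a mapping from the 5-tuple to a server identifier, with the load-balancing decision made once per flow and reused for later packets. The class names and the `choose_server` callback are hypothetical; the sketch omits entry aging and eviction, which a real node would need.

```python
from typing import NamedTuple


class FiveTuple(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: int      # IP protocol number, e.g. 6 for TCP


class FlowTable:
    def __init__(self, choose_server):
        self._flows = {}               # FiveTuple -> server identifier
        self._choose = choose_server   # called only for the first packet of a flow

    def server_for(self, flow):
        """Return the server bound to this flow, deciding on first sight."""
        if flow not in self._flows:
            self._flows[flow] = self._choose(flow)
        return self._flows[flow]
```

The table holds one entry per active flow, which illustrates the memory and processing cost that the text attributes to storing this state at routers.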
To address this problem, in an embodiment, a mapping of a client identifier to a server identifier is stored at the client side and server side in cookies. The cookies enable a device to determine if a new request has a past association with previous flows or a previously selected server. In order to read the cookie data, the client connection is TCP terminated at the server load-balancing device, and a new client connection is initiated to the chosen server. At the server load-balancing device, a TCP splicing approach is used to combine the original client connection and the new server connection, to preclude the need to repeatedly walk up and down the TCP stack as the mapping is evaluated. In effect, a first connection initiated by the client and terminated at the server load-balancing device, and a second connection initiated by the server load-balancing device and directed to the server, are spliced. This reduces overhead involved in traversing up and down the TCP stack.
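The cookie lookup step of this approach can be sketched as follows, leaving out TCP termination and splicing entirely. The cookie name `lb_server`, the function name, and the server identifiers are hypothetical; the sketch assumes the device has already terminated the client connection and can read the HTTP `Cookie` header.

```python
from http.cookies import SimpleCookie

COOKIE_NAME = "lb_server"   # hypothetical cookie carrying the server identifier


def route_request(cookie_header, servers, choose):
    """Return (server, set_cookie), where set_cookie is None for a sticky hit.

    cookie_header: raw value of the Cookie request header ("" if absent).
    servers: identifiers of servers that are currently eligible.
    choose: fallback selection function used for clients with no valid cookie.
    """
    cookie = SimpleCookie()
    cookie.load(cookie_header or "")
    morsel = cookie.get(COOKIE_NAME)
    if morsel is not None and morsel.value in servers:
        return morsel.value, None          # past association found
    server = choose(servers)
    return server, f"{COOKIE_NAME}={server}"
```

Because the mapping travels with each request, the device needs no per-flow table, but it must parse application-layer data, which is why the connection has to be terminated at the device.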
However, the processing overhead and intrusiveness of these approaches are undesirable. In particular, these approaches have proven not to be scalable; the performance of TCP termination hardware, which is typically implemented in ASIC chips, is known to degrade significantly when traffic involves millions of connections at gigabit speeds. Indeed, the volume of Internet traffic is growing at a rate greater than the rate of improvement in ASIC chip speed. Further, certain of the past approaches only work for HTTP traffic.
Thus, there is a need for an improved way to ensure that subsequent packets go to the same server without requiring termination of a TCP connection to do cookie-based stickiness. There is also a need for a way to support client stickiness in protocols and applications other than HTTP.