Web server clusters are a popular hardware platform in a web hosting infrastructure. Servers based on clusters of workstations are used to meet the growing traffic demands imposed by the World Wide Web. A cluster of servers, arranged to act as a single unit, provides incremental scalability because it can grow gradually with demand. However, for a cluster to achieve scalable performance as its size increases, mechanisms and policies must be employed for “balanced” request distribution.
Traditional load balancing solutions are represented by two major groups: 1) Domain Name System (DNS) based approaches; and 2) Internet Protocol (IP)/Transmission Control Protocol (TCP)/Hypertext Transfer Protocol (HTTP) redirection based approaches.
In a DNS based approach, the DNS server returns a list of IP addresses (e.g., a list of nodes in a cluster which can serve this content, placing a different address first in the list for each successive request) to distribute the requests among the nodes in the cluster. Thus, different clients are mapped to different server nodes in the cluster. DNS based approaches are widely used, as they require minimal setup time and provide reasonable load balancing. Further, they use the existing DNS infrastructure (i.e., there is no additional cost). However, DNS based approaches do not recognize either the load of the nodes in a cluster or the content of the request.
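By way of illustration only, the address rotation described above can be sketched as follows. The class name RoundRobinDNS and the IP addresses are hypothetical and are not taken from any particular DNS implementation; the sketch assumes the list is simply rotated by one position per query.

```python
from collections import deque


class RoundRobinDNS:
    """Toy resolver: returns the full address list, rotated once per query."""

    def __init__(self, addresses):
        self._addresses = deque(addresses)

    def resolve(self):
        answer = list(self._addresses)
        # Rotate so the next query sees a different address first.
        self._addresses.rotate(-1)
        return answer


# Hypothetical cluster of three nodes.
dns = RoundRobinDNS(["10.0.0.1", "10.0.0.2", "10.0.0.3"])

# Clients typically connect to the first address in the answer.
first_choices = [dns.resolve()[0] for _ in range(4)]
print(first_choices)
```

Note that nothing in this rotation consults node load or the requested content, which is the limitation identified above.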
The second group, IP/TCP/HTTP redirection based approaches, employs a specialized front-end node, the load-balancer, which acts as a single point of contact for the clients and distributes the requests among back-end server nodes in the cluster. These solutions can be classified in the following groups: layer four switching with layer two packet forwarding (L4/2); layer four switching with layer three packet forwarding (L4/3); and layer seven switching (L7), or content aware switching.
These terms refer to the techniques by which the systems in the cluster are configured together. In an L4/2 or L4/3 cluster, the load-balancer determines the least loaded server in the cluster, to which the packet is then sent (this decision is the job of the proprietary algorithms implemented in different products).
Traditional load balancing solutions for a web server cluster (L4/2 and L4/3) try to distribute the requests among all the back-end machines based on some load information.
The load-balancer can be either a switch or a load-balancing server (i.e., a hardware solution) or a software load balancer (i.e., a software solution). In both solutions, the load-balancer determines the least loaded server in a cluster to which the packet should be sent.
Load-balancing servers operate by intelligently distributing the incoming requests across multiple web servers. They determine where to send an incoming request, taking into account the processing capacity of attached servers, monitoring the responses in real time and shifting the load onto servers that can best handle the traffic. Load-balancing servers are typically positioned between a router (connected to the Internet) and a local area network (LAN) switch which fans traffic to the Web servers.
FIG. 1A illustrates a block diagram of a typical configuration of a network with a load-balancing server in accordance with the prior art. Client 110 issues a request which is received at load-balancing server 120, located at the front end. Load-balancing server 120 determines which back-end web server (e.g., web servers 130a and 130b) gets the request. The decision is based on a number of factors including: the number of servers available, the resources (CPU speed and memory) of each, and how many active TCP sessions are being serviced. All traffic is routed through load-balancing server 120.
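The decision factors listed above (server availability, CPU speed and memory, and active TCP sessions) could be combined in many ways, and commercial products use proprietary algorithms. The following is only a minimal sketch under an assumed metric, namely active sessions divided by a crude capacity figure. The names Backend, load_score, and pick_backend, and all numeric values, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    cpu_ghz: float          # relative CPU speed
    memory_gb: float        # installed memory
    active_sessions: int    # TCP sessions currently being serviced
    available: bool = True


def load_score(backend):
    # Assumed metric: sessions per unit of capacity; lower means less loaded.
    capacity = backend.cpu_ghz * backend.memory_gb
    return backend.active_sessions / capacity


def pick_backend(backends):
    # Only servers that are currently available are candidates.
    candidates = [b for b in backends if b.available]
    if not candidates:
        raise RuntimeError("no back-end server available")
    return min(candidates, key=load_score)


servers = [
    Backend("130a", cpu_ghz=2.0, memory_gb=4.0, active_sessions=40),
    Backend("130b", cpu_ghz=3.0, memory_gb=8.0, active_sessions=60),
]
chosen = pick_backend(servers)
print(chosen.name)
```

In this example the larger server is selected even though it carries more sessions, because its assumed capacity is three times greater.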
FIG. 1B illustrates a block diagram of a typical configuration of a network with a software load-balancer in accordance with the prior art. Client 160 issues a request which is received at server 170 located at the front end, wherein server 170 has stored upon it load-balancing software. The load-balancing software determines which back-end web server (e.g., web servers 180a and 180b) gets the request. The decision is based on a number of factors including the number of servers available, the resources (CPU speed and memory) of each, and how many active TCP sessions are being serviced. Once a connection has been established with a particular web server, the web server (e.g., web servers 180a and 180b) responds directly to client 160.
Traditional load balancing solutions for a web server cluster try to distribute the requests evenly among all the back-end machines based on some load information. This adversely affects efficient memory usage because the content is redundantly replicated across the caches of all the web servers, thus resulting in a significant decrease in overall system performance.
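The effect of redundant cache replication can be illustrated with a small simulation. This is a sketch under stated assumptions: each server has a least-recently-used cache that holds two documents, there are four servers and eight equally popular documents, and the workload is synthetic. The names LRUCache and hit_ratio are hypothetical.

```python
import random
from collections import OrderedDict


class LRUCache:
    """Minimal least-recently-used cache that records hits and misses."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def access(self, key):
        hit = key in self.items
        if hit:
            self.items.move_to_end(key)          # mark as most recently used
        else:
            self.items[key] = True
            if len(self.items) > self.capacity:
                self.items.popitem(last=False)   # evict least recently used
        return hit


def hit_ratio(requests, num_nodes, capacity, pick_node):
    caches = [LRUCache(capacity) for _ in range(num_nodes)]
    hits = 0
    for i, doc in enumerate(requests):
        hits += caches[pick_node(i, doc, num_nodes)].access(doc)
    return hits / len(requests)


rng = random.Random(0)                      # fixed seed for repeatability
requests = [rng.randrange(8) for _ in range(2000)]

# Even distribution ignoring content: every cache sees every document.
round_robin = hit_ratio(requests, 4, 2, lambda i, doc, n: i % n)
# Content-based partition: each document is always served by one node.
partitioned = hit_ratio(requests, 4, 2, lambda i, doc, n: doc % n)

print(round_robin, partitioned)
```

Under these assumptions the partitioned cluster misses at most once per document, whereas the evenly balanced cluster repeatedly evicts and reloads the same documents on every server.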
Content-aware request distribution (e.g., L7 switching) takes into account the content (which can be a Uniform Resource Locator (URL) name, URL type, or cookies) when deciding to which back-end server a request should be routed. Content-aware request distribution mechanisms enable intelligent routing inside the cluster to support additional quality of service requirements for different types of content and to improve overall cluster performance. Policies that distribute the requests based on cache affinity lead to significant performance improvements compared to strategies that take into account only load information.
There are three main components comprising a cluster configuration with a content aware request distribution strategy: the dispatcher, which implements the request distribution strategy and decides which web server will process a given request; the distributor, which interfaces with the client and implements the mechanism that distributes the client requests to a specific web server; and the web server, which processes HTTP requests.
In the content-aware request distribution approach, the cluster nodes are partitioned into two sets: front end and back ends. The front end acts as a smart router or a switch; its functionality is similar to that of the aforementioned load-balancing servers. The front end node implements the policy which routes the incoming requests to an appropriate node (e.g., web server) in the cluster. Content-aware request distribution can take into account both document locality and current load. In this configuration, the typical bottleneck is the front-end node, which combines the functions of distributor and dispatcher.
To be able to distribute the requests on the basis of the requested content, the distributor component should implement either a form of TCP handoff or the splicing mechanism. Splicing is an optimization of the front-end relaying approach, with the traffic flow represented in FIG. 1A. The TCP handoff mechanism was introduced to enable the forwarding of back-end responses directly to the clients without passing through the front-end, with the traffic flow represented in FIG. 1B. This difference in the response flow route allows substantially higher scalability of the TCP handoff mechanism than of TCP splicing. In considering different cluster designs for content aware balancing strategies, it is assumed that a distributor component implements some form of TCP handoff mechanism.
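As a simple illustration of routing on URL type, the sketch below selects a back-end pool from the file extension of the requested path. The routing table, the pool names, and the function name route are hypothetical; a real L7 switch could equally inspect the URL name or cookies.

```python
import posixpath
from urllib.parse import urlparse

# Hypothetical mapping from URL type (file extension) to a back-end pool.
ROUTES_BY_EXTENSION = {
    ".jpg": "image-server",
    ".png": "image-server",
    ".cgi": "dynamic-server",
}
DEFAULT_BACKEND = "static-server"


def route(url):
    """Pick a back-end pool based only on the content being requested."""
    path = urlparse(url).path                  # drop scheme, host, and query
    _, extension = posixpath.splitext(path)
    return ROUTES_BY_EXTENSION.get(extension.lower(), DEFAULT_BACKEND)


print(route("http://example.com/img/logo.PNG?x=1"))
print(route("http://example.com/cgi-bin/search.cgi"))
print(route("http://example.com/index.html"))
```

Because the decision depends on the request itself, the front end must read the HTTP request before choosing a server, which an L4 switch cannot do.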
FIG. 2A shows a typical cluster configuration 200 with content-aware request distribution strategy and a single front-end 210. In this configuration, the typical bottleneck is due to the front-end node 210 that combines the functions of a distributor 220 and a dispatcher 230. Back-end 240 comprises servers 245a, 245b, and 245c. 
Thus, another recent solution is shown in FIG. 2B. It is based on an alternative distributed cluster design 250 where the distributor components 260a, 260b, and 260c are co-located with the server components 270a, 270b, and 270c, while the dispatcher component 280 is centralized.
In this architecture the distributor is decoupled from the request distribution strategy defined by the centralized dispatcher module. The switch in front of the cluster can be a simple LAN switch or an L4 level load-balancer. For simplicity, it is assumed that the clients directly contact a distributor, for instance via round-robin DNS (RR-DNS). In this case, the typical client request is processed in the following way: 1) the client web browser uses the TCP/IP protocol to connect to the chosen distributor; 2) the distributor component accepts the connection and parses the request; 3) the distributor contacts the dispatcher for the assignment of the request to a server; 4) the distributor hands off the connection using the TCP handoff protocol to the server chosen by the dispatcher (since in this design the centralized dispatcher is the most likely bottleneck, the dispatcher module resides on a separate node in a typical configuration, as shown in FIG. 2B); 5) the server takes over the connection using the TCP handoff protocol; 6) the server application at the server node accepts the created connection; and 7) the server sends the response directly to the client.
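The seven steps above can be traced with a toy model. This is only a sketch: no real TCP handoff takes place, the class names Dispatcher and Node are hypothetical, and the dispatcher's assignment rule (first-seen paths are assigned to nodes in rotation) is a placeholder assumption rather than any particular policy.

```python
class Dispatcher:
    """Centralized component: maps each requested path to a server node."""

    def __init__(self, node_names):
        self.node_names = node_names
        self.assignment = {}

    def assign(self, path):
        # Placeholder rule (assumption): a path seen for the first time is
        # given the next node in rotation; later requests reuse that node.
        if path not in self.assignment:
            index = len(self.assignment) % len(self.node_names)
            self.assignment[path] = self.node_names[index]
        return self.assignment[path]


class Node:
    """Cluster node hosting a distributor and a server component together."""

    def __init__(self, name, dispatcher):
        self.name = name
        self.dispatcher = dispatcher

    def handle(self, raw_request):
        # Steps 1-2: accept the connection and parse the request line.
        _method, path, _version = raw_request.split()
        # Step 3: ask the centralized dispatcher which server should serve it.
        target = self.dispatcher.assign(path)
        # Steps 4-5: a handoff is needed only if another node was chosen.
        handed_off = target != self.name
        # Steps 6-7: the chosen server answers the client directly.
        return {"accepted_by": self.name, "served_by": target,
                "handed_off": handed_off}


dispatcher = Dispatcher(["n1", "n2", "n3"])
nodes = {name: Node(name, dispatcher) for name in dispatcher.node_names}

r1 = nodes["n1"].handle("GET /a.html HTTP/1.0")
r2 = nodes["n1"].handle("GET /b.html HTTP/1.0")
r3 = nodes["n3"].handle("GET /a.html HTTP/1.0")
print(r1, r2, r3, sep="\n")
```

The third request shows the key property of the design: whichever distributor accepts the connection, the same document is served by the same node.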
This design shows good scalability properties when distributing requests with the previously proposed locality-aware request distribution (LARD) policy. The main idea behind LARD is to logically partition the documents among the cluster nodes, aiming to optimize the usage of the overall cluster RAM. Thus, requests for the same document will be served by the same cluster node, which will most likely have the file in RAM. The distributed architecture eliminates the front-end distributor bottleneck and improves cluster scalability and performance.
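The partitioning idea can be sketched as follows. This is a deliberately simplified illustration and not the full LARD algorithm: it assumes that a document seen for the first time is assigned to the node that has received the fewest requests so far, and that the assignment never changes afterwards. The class name LocalityAwarePartitioner is hypothetical.

```python
class LocalityAwarePartitioner:
    """Assigns each document to one node and keeps sending it there."""

    def __init__(self, node_names):
        # Requests routed so far, used here as a crude stand-in for load.
        self.load = {name: 0 for name in node_names}
        self.owner = {}

    def route(self, document):
        if document not in self.owner:
            # First request for this document: pick the least loaded node.
            self.owner[document] = min(self.load, key=self.load.get)
        node = self.owner[document]
        self.load[node] += 1
        return node


partitioner = LocalityAwarePartitioner(["n1", "n2"])
routes = [partitioner.route(doc) for doc in ["/a", "/b", "/a", "/c", "/a"]]
print(routes)
```

Every request for "/a" reaches the same node, so that node is the only one that needs to keep "/a" in RAM.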
However, under the described policy in a sixteen-node cluster, each node statistically will serve only 1/16 of the incoming requests locally and will forward 15/16 of the requests to the other nodes using the TCP handoff mechanism. TCP handoff is an expensive operation. Moreover, the cost of the TCP handoff mechanism can vary depending on the implementation and the specifics of the underlying hardware. This could lead to significant forwarding overhead, decreasing the potential performance benefits of the proposed solution.
Web server performance greatly depends on efficient RAM usage. A web server operates much faster when it accesses files from a cache in RAM, and its throughput is correspondingly much higher.
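The 1/16 and 15/16 figures follow from assuming that documents are partitioned evenly across the nodes and that each request is equally likely to arrive at any distributor. Under that assumption the arithmetic generalizes to a cluster of any size, as in this short sketch (the function name local_and_forwarded is hypothetical):

```python
from fractions import Fraction


def local_and_forwarded(num_nodes):
    """Fractions of requests served locally and forwarded via TCP handoff."""
    local = Fraction(1, num_nodes)
    forwarded = 1 - local
    return local, forwarded


local, forwarded = local_and_forwarded(16)
print(local, forwarded)  # 1/16 15/16
```

The forwarded fraction approaches 1 as the cluster grows, so the handoff cost is paid on nearly every request in a large cluster.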
Accordingly, a need exists for a request distribution strategy that maximizes the number of requests served from the total cluster memory by partitioning files to be served by different servers. A need also exists for a request distribution strategy that minimizes the forwarding and the disk access overhead. Furthermore, a need exists for a request distribution strategy that accomplishes the above needs and that improves web server cluster throughput.