The number of users accessing the Internet has grown exponentially in recent years, and it is now commonplace for popular web sites to service millions of users, or clients, per day. For such a popular web site, thousands—and, in some instances, tens of thousands—of clients may attempt to access the web site at any given time. A single web server at the host site is no longer adequate to satisfy the demand for access. A failure to provide access to a web site is especially troublesome for business applications, where the lack of client access may lead to frustrated customers and/or lost revenue. Even for those situations where a single server is sufficient to service all clients accessing a web site, a single server provides minimal ability to increase capacity if network traffic on that web site grows. In other words, a single server does not provide scalability.
To increase the capacity of a web site, it is known to deploy a plurality of servers, or a server cluster, at the host site, as illustrated in FIG. 1. Referring to FIG. 1, a server hosting system 100 includes a plurality of servers 150, including servers 150a, 150b, . . . , 150k, that are coupled with a dispatcher 130 via a network 140, the network 140 typically having an Ethernet-based architecture. A communication link 120 couples the dispatcher 130 with a router 110, and the router 110, in turn, is coupled to the Internet 5. The server cluster 150a-k is assigned a single IP (Internet Protocol) address, or virtual IP address (VIP), and all network traffic destined for—or originating from—the server cluster 150a-k flows through the dispatcher 130. Thus, the server cluster 150a-k appears as a single network resource to those clients accessing the server hosting system 100.
When a client attempts to establish a connection with the server hosting system 100, a packet including a connection request (TCP SYN) is received from the client at the router 110, and the router 110 transmits the packet to the dispatcher 130. The dispatcher 130 then selects one of the servers 150a-k to process the connection request. In selecting a server 150, the dispatcher 130 employs a load balancing mechanism to distribute incoming connection requests among the plurality of servers 150a-k.
A number of load balancing mechanisms are known in the art. The dispatcher 130 may, for example, selectively forward a connection request to a server 150 based, at least in part, upon the load on each of the servers 150a-k. This form of load balancing is often referred to as “transactional” load balancing. Another load balancing mechanism commonly employed is known as “application-aware,” or “content-aware,” load balancing. In application-aware load balancing, a packet including a connection request is forwarded to a server 150 that is selected based upon the application associated with the packet. Stated another way, the packet is routed to a server 150, or one of multiple servers, that provides the application (e.g., email) initiated or requested by the client.
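The two selection mechanisms described above may be illustrated by the following minimal Python sketch. It is not taken from any particular dispatcher implementation. The `Server` record, the use of an active-session count as the measure of load, and the function names are assumptions made for illustration. Likewise, breaking ties among several servers that provide the same application by choosing the least loaded one is an assumption, not something the description above requires.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    name: str                # hypothetical label, e.g. "150a"
    active_sessions: int     # assumed measure of the load on the server
    applications: frozenset  # applications this server provides


def select_by_load(servers):
    """Load-based selection: return the server with the fewest active sessions."""
    return min(servers, key=lambda s: s.active_sessions)


def select_by_application(servers, application):
    """Application-aware selection: consider only servers that provide the
    requested application, then (by assumption) pick the least loaded one."""
    candidates = [s for s in servers if application in s.applications]
    if not candidates:
        raise LookupError(f"no server provides {application!r}")
    return select_by_load(candidates)


# Hypothetical cluster used only to exercise the two functions.
cluster = [
    Server("150a", active_sessions=2, applications=frozenset({"web"})),
    Server("150b", active_sessions=9, applications=frozenset({"web", "email"})),
    Server("150k", active_sessions=5, applications=frozenset({"email"})),
]

by_load = select_by_load(cluster)                  # least loaded overall
for_email = select_by_application(cluster, "email")  # least loaded email server
```

In this example the load-based choice falls on server 150a, whereas a request associated with email can only be given to server 150b or 150k, so the application-aware choice falls on server 150k.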
Using the load balancing mechanism, the dispatcher 130 selects one of the servers 150a-k and transmits the packet containing the connection request to the selected server 150 for processing. To route the packet to the selected server 150, the destination network address carried by the packet, which on arrival is the dispatcher's network address (e.g., its layer 2 address, or MAC (Media Access Control) address), is replaced with the selected server's network address. The selected server 150 then sends an acknowledgement (TCP SYN-ACK) to the client and creates a session.
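The address replacement described above may be sketched as follows, assuming an Ethernet frame in which the first six bytes hold the destination MAC address and the next six bytes hold the source MAC address. The function names and the example addresses are hypothetical, and the sketch leaves every byte other than the destination field unchanged.

```python
ETHERNET_HEADER_LEN = 14  # 6-byte destination + 6-byte source + 2-byte EtherType
MAC_LEN = 6


def parse_mac(text: str) -> bytes:
    """Convert a colon-separated MAC address string into 6 raw bytes."""
    raw = bytes.fromhex(text.replace(":", ""))
    if len(raw) != MAC_LEN:
        raise ValueError(f"not a 6-byte MAC address: {text!r}")
    return raw


def rewrite_destination(frame: bytes, server_mac: bytes) -> bytes:
    """Return a copy of the frame whose destination MAC address has been
    replaced with the selected server's MAC address."""
    if len(frame) < ETHERNET_HEADER_LEN:
        raise ValueError("frame is shorter than an Ethernet header")
    if len(server_mac) != MAC_LEN:
        raise ValueError("server MAC address must be 6 bytes")
    return server_mac + frame[MAC_LEN:]


# Hypothetical, locally administered addresses chosen for illustration only.
dispatcher_mac = parse_mac("02:00:00:00:01:30")
router_mac = parse_mac("02:00:00:00:01:10")
server_mac = parse_mac("02:00:00:00:15:0a")

# Frame as it arrives at the dispatcher: destination, source, EtherType, payload.
incoming = dispatcher_mac + router_mac + b"\x08\x00" + b"connection request"
forwarded = rewrite_destination(incoming, server_mac)
```

Because only the layer 2 destination field changes in this sketch, the remainder of the frame reaches the selected server exactly as it was received at the dispatcher.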
A dispatch table 135 containing a list of each session in progress is maintained in the dispatcher 130. When a session is created, the dispatcher 130 places a session entry in the dispatch table 135, the session entry identifying the client and the server 150 selected for that session. Accordingly, the server 150 assigned to a session can be identified while that session is in progress, and any packet subsequently received from the client can be associated with the selected server 150—i.e., the dispatcher's network address replaced with the selected server's network address—and the packet forwarded thereto. Thus, once a session has been established, all additional packets received at the dispatcher 130 and associated with that session are routed to the selected server 150.
When a packet including a termination request (TCP FIN) is received from the client, the dispatcher 130 removes the corresponding session entry from the dispatch table 135 and forwards that packet to the selected server 150. The selected server 150, in turn, terminates the session with the client.
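The life cycle of a session entry in the dispatch table, from connection request to termination request, may be sketched as follows. This is a simplified illustration under stated assumptions rather than an actual dispatcher: the `Dispatcher` class and its `handle` method are hypothetical, a client is identified by an assumed (IP address, port) pair, the session entry is recorded as soon as the connection request is seen, and the server with the fewest entries in the table is selected.

```python
from collections import Counter


class Dispatcher:
    def __init__(self, servers):
        self.servers = list(servers)
        self.dispatch_table = {}  # client -> server selected for the session

    def _least_loaded(self):
        # Assumed policy: the server with the fewest sessions in the table.
        counts = Counter(self.dispatch_table.values())
        return min(self.servers, key=lambda s: counts[s])

    def handle(self, client, flag):
        """Return the server a packet is forwarded to, or None if the
        packet belongs to no session in progress."""
        if flag == "SYN":
            # Connection request: select a server and record the session.
            server = self._least_loaded()
            self.dispatch_table[client] = server
            return server
        if flag == "FIN":
            # Termination request: remove the entry, then forward the packet.
            return self.dispatch_table.pop(client, None)
        # Any other packet follows the existing session entry.
        return self.dispatch_table.get(client)


# Hypothetical clients and servers used only to exercise the sketch.
dispatcher = Dispatcher(["150a", "150b"])
client_1 = ("198.51.100.7", 40001)
client_2 = ("198.51.100.8", 40002)

first = dispatcher.handle(client_1, "SYN")    # new session for client_1
second = dispatcher.handle(client_2, "SYN")   # new session for client_2
followup = dispatcher.handle(client_1, "DATA")  # routed by the table entry
closing = dispatcher.handle(client_1, "FIN")  # entry removed, packet forwarded
```

After the termination request, the dispatch table holds no entry for the first client, so a later packet from that client is not associated with any server until a new connection request arrives.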
The performance of a web site can be enhanced by employing a server cluster in conjunction with server load balancing, as shown and described above with respect to FIG. 1. Such an approach eliminates the bottleneck that occurs in a single server system; however, in many instances, as the number of clients attempting to gain access to a web site grows, the bottleneck is simply shifted to the dispatcher, which provides only a single entry point to the server cluster. It follows, therefore, that the dispatcher and server cluster system is not amenable to scaling for increased capacity, as the addition of more servers into the cluster will simply intensify the bottleneck occurring at the dispatcher. Further, this conventional load balancing solution provides minimal fault tolerance, as a failure at the dispatcher can disrupt operation of the entire system.