Many server-side techniques have been proposed to increase the throughput and scalability of web servers and to decrease the request latency for clients. In an exemplary server-side technique described in M. T. Kwan et al., "NCSA's World Wide Web Server: Design and Performance," IEEE Computer, pp. 68-74, November 1995, independent web servers use a distributed file system known as Andrew File System (AFS) to access documents requested by the clients. A round-robin Domain Name Service (DNS) is used to multiplex requests to the web servers. In this server system architecture, although the throughput is increased by balancing the load across the servers through multiplexing, a high degree of load balance may not be achieved due to DNS name caching at different places in the network. This DNS name caching will also prevent the clients from tolerating server failures, since a cached name continues to resolve to the address of a server that has failed.
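The effect of name caching on a round-robin DNS can be illustrated with a minimal sketch. The server addresses, the per-client request counts, and the function names below are hypothetical and are not taken from the cited NCSA design; the sketch assumes only that a caching client resolves the name once and reuses the answer for all of its requests.

```python
from collections import Counter
from itertools import cycle


def round_robin_dns(servers):
    """Return a resolver that hands out server addresses in rotation."""
    rotation = cycle(servers)
    return lambda: next(rotation)


def simulate(servers, requests_per_client, cache_answers):
    """Count requests per server.

    With cache_answers=True each client resolves the name once and reuses
    the answer; with cache_answers=False it resolves on every request.
    """
    resolve = round_robin_dns(servers)
    load = Counter({s: 0 for s in servers})
    for n_requests in requests_per_client:
        cached = resolve() if cache_answers else None
        for _ in range(n_requests):
            load[cached if cache_answers else resolve()] += 1
    return dict(load)


# Hypothetical addresses and request counts, chosen only for illustration.
SERVERS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
CLIENT_REQUESTS = [90, 5, 5, 10, 5, 5]

uncached = simulate(SERVERS, CLIENT_REQUESTS, cache_answers=False)
cached = simulate(SERVERS, CLIENT_REQUESTS, cache_answers=True)
print(uncached)
print(cached)
```

Under these assumptions the uncached run spreads the 120 requests evenly, while the cached run pins the heaviest client to a single server, which is the imbalance the paragraph above describes.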
Another approach that uses AFS is described in M. Garland et al., "Implementing Distributed Server Groups for the World Wide Web," Technical Report CMU-CS-95-114, School of Computer Science, Carnegie Mellon University, January 1995. In this approach, a front-end server, called a dispatcher, is used to dispatch a request to one of a number of back-end document servers. The dispatcher monitors the load on the document servers and, based on this information, determines which server should service a given incoming client request. The document servers have access to all the requested documents by using the AFS. Unfortunately, these and other approaches based on AFS are limited by the need for the web servers either to go across the network through the file servers to fetch the document, as in the NCSA server, or to store all the documents locally.
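A load-monitoring dispatcher of the kind described above can be sketched as follows. This is an illustration only: the class name, the server names, and the choice of "requests currently in service" as the load metric are assumptions, not details of the cited report.

```python
class Dispatcher:
    """Front end that forwards each request to the least-loaded back end."""

    def __init__(self, servers):
        # Assumed load metric: number of requests currently in service.
        self.load = {name: 0 for name in servers}

    def dispatch(self):
        """Pick the back end with the lowest load and charge it one request."""
        target = min(self.load, key=self.load.get)
        self.load[target] += 1
        return target

    def complete(self, server):
        """Record that a back end finished servicing a request."""
        self.load[server] -= 1


dispatcher = Dispatcher(["doc1", "doc2", "doc3"])  # hypothetical names
first_four = [dispatcher.dispatch() for _ in range(4)]
dispatcher.complete("doc2")          # doc2 becomes idle again
after_completion = dispatcher.dispatch()
print(first_four, after_completion)
```

Ties are broken in favor of the server listed first, so the first four requests go to doc1, doc2, doc3, and doc1 again; once doc2 finishes its request it is the least loaded and receives the next one.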
The SWEB approach described in D. Andresen et al., "SWEB: Towards a Scalable World Wide Web Server on Multicomputers," Department of Computer Science Tech Report--TRCS95-17, U.C. Santa Barbara, September, 1995, uses distributed memory machines and a network of workstations as web servers. Not every server stores all of the documents locally; a server can instead go over a LAN to fetch documents that are requested but are not locally available. At the front end, a round-robin DNS is used to direct a request to one of the web servers. This web server then uses a pre-processing step to determine whether the request should be serviced locally or should be redirected to another server. The redirection decision is made based on a dynamic scheduling policy that considers parameters such as CPU load, network latency and disk load. If a decision is made to service a request locally, and if the document is not available locally, an appropriate server is chosen from which the document is fetched. If a decision is made not to service the request locally, another server is chosen and the client is redirected to that server using HTTP redirection. This system is scalable and does not require each server to locally store all documents. Although this system alleviates the problem of DNS name caching through the use of server redirection, the increase in throughput is still limited by the dynamic redirection and the need to go over the network to fetch documents. Furthermore, failures are still a problem due to the use of DNS name caching.
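The pre-processing step described above can be sketched as a cost comparison across servers. The cost formula, the `REMOTE_FETCH_PENALTY` constant, the host names, and the rule for choosing the server to fetch from are all assumptions made for illustration; the cited report defines its own scheduling policy, which is not reproduced here.

```python
from dataclasses import dataclass

REMOTE_FETCH_PENALTY = 5.0  # assumed extra cost of fetching over the LAN


@dataclass
class ServerState:
    name: str
    cpu_load: float      # arbitrary comparable units
    disk_load: float
    net_latency: float   # estimated latency between client and this server
    documents: frozenset


def estimated_cost(server, path):
    """Assumed cost model: sum of the three parameters plus a fetch penalty."""
    cost = server.cpu_load + server.disk_load + server.net_latency
    if path not in server.documents:
        cost += REMOTE_FETCH_PENALTY
    return cost


def handle(receiver, cluster, path):
    """Decide whether the receiving server serves, fetches, or redirects."""
    best = min(cluster, key=lambda s: estimated_cost(s, path))
    if best.name != receiver.name:
        # The client is sent elsewhere with an HTTP redirect.
        return ("redirect", f"http://{best.name}{path}")
    if path in receiver.documents:
        return ("serve_local", receiver.name)
    holders = [s for s in cluster if path in s.documents]
    if not holders:
        return ("not_found", path)
    # Assumed rule: fetch from the holder with the lightest disk load.
    return ("fetch_from", min(holders, key=lambda s: s.disk_load).name)


# Hypothetical cluster state.
a = ServerState("a.example", 9, 5, 1, frozenset({"/x.html"}))
b = ServerState("b.example", 1, 1, 1, frozenset())
c = ServerState("c.example", 2, 4, 3, frozenset({"/x.html"}))
cluster = [a, b, c]
print(handle(a, cluster, "/x.html"))
print(handle(b, cluster, "/x.html"))
```

With these figures the lightly loaded server b is the cheapest choice even after the fetch penalty, so a request arriving at a is redirected to b, and a request arriving at b is serviced there after fetching the document from c.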
The "One-IP" approach described in O. P. Damani, P.-Y. Chung, Y. Huang, C. Kintala, Y.-M. Wang, "ONE-IP: Techniques for Hosting a Service on a Cluster of Machines," Sixth International World Wide Web Conference, Santa Clara, April 1997, and U.S. patent application Ser. No. 08/818,989 filed Mar. 14, 1997, distributes requests to different servers in a cluster by dispatching packets at the Internet Protocol (IP) level. A dispatcher redirects requests to the different servers based on the source IP address of the client. The One-IP approach provides a low-overhead scalable solution, but a potential drawback is that the load may not be optimally balanced if arriving requests do not have source IP addresses that are reasonably random.
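The sensitivity of source-address dispatching to the distribution of client addresses can be illustrated with a short sketch. The modulo mapping below is an assumed stand-in for whatever function the dispatcher applies to the source IP address, and the server labels and client addresses are hypothetical.

```python
import ipaddress
from collections import Counter


def pick_server(client_ip, servers):
    """Map a source IP address to one server (assumed modulo mapping)."""
    index = int(ipaddress.ip_address(client_ip)) % len(servers)
    return servers[index]


def distribution(client_ips, servers):
    """Count how many clients are mapped to each server."""
    return Counter(pick_server(ip, servers) for ip in client_ips)


SERVERS = ["s0", "s1", "s2", "s3"]  # hypothetical server labels

# Consecutive client addresses spread evenly across the four servers.
spread = distribution([f"192.168.0.{n}" for n in range(1, 9)], SERVERS)

# Client addresses that are all multiples of four land on a single server.
skewed = distribution([f"192.168.0.{n}" for n in range(0, 32, 4)], SERVERS)
print(spread, skewed)
```

The dispatch decision requires no per-connection state, which is consistent with the low overhead noted above, but the second case shows how a non-random address pattern can leave most servers idle.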
An approach referred to as "TCPRouter" is described in D. Dias et al., "A Scalable and Highly Available Server," COMPCON '96, pp. 85-92, 1996. This approach publicizes the address of the server-side router, which receives the client requests and dispatches each request to an appropriate server based on load information. The destination address of each IP packet is changed by the router before dispatching. This means that the kernel code of every server in the cluster needs to be modified, although the approach can provide fault-tolerance and load balancing in certain applications.
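The destination rewriting performed by such a router can be sketched as follows. The `Packet` and `TcpRouter` names, the addresses, and the per-client connection table are assumptions introduced for illustration; the cited paper's actual data structures and load metric are not reproduced here.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    payload: bytes


class TcpRouter:
    """Rewrites the destination of packets sent to the public address."""

    def __init__(self, public_addr, server_load):
        self.public_addr = public_addr
        self.server_load = dict(server_load)  # server address -> load
        self.connections = {}  # assumed: client address -> chosen server

    def route(self, packet):
        if packet.dst != self.public_addr:
            raise ValueError("packet is not addressed to the router")
        server = self.connections.get(packet.src)
        if server is None:
            # New client: choose the least-loaded server and remember it.
            server = min(self.server_load, key=self.server_load.get)
            self.connections[packet.src] = server
            self.server_load[server] += 1
        # Only the destination address changes; source and payload are kept.
        return replace(packet, dst=server)


# Hypothetical addresses and initial load figures.
router = TcpRouter("203.0.113.10", {"10.0.0.1": 3, "10.0.0.2": 1})
request = Packet("198.51.100.7", "203.0.113.10", b"GET /")
forwarded = router.route(request)
print(forwarded.dst)
```

Because the forwarded packet arrives at the server with the client's address as its source, the server must reply in a way the client will accept as coming from the published address, which suggests why server-side changes are needed.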
A number of other server-side techniques are based on caching or mirroring documents on geographically distributed sites. See, for example, J. Gwertzman and M. Seltzer, "The Case for Geographical Push-Caching," HotOS '95, 1995, A. Bestavros, "Speculative Data Dissemination and Service to Reduce Server Load, Network Traffic and Service Time in Distributed Information Systems," Proceedings of the International Conference on Data Engineering, March 1996, and A. Heddaya and S. Mirdad, "WebWave: Globally Load Balanced Fully Distributed Caching of Hot Published Documents," Computer Science Technical Report, BU-CS-96-024, Boston University, October 1996. These techniques are generally referred to as geographic push caching or server-side caching. Client requests are sent to a home server which then redirects the request to a proxy server closer to the client. The redirection can be based on both geographic proximity and the load on the proxies. Dissemination of document information from the home server is used to keep the caches consistent. These techniques are limited in their scalability because of the need for keeping caches consistent. Furthermore, the load balancing achieved may be limited if the location information of the document is cached at the client. Fault-tolerance is also an issue as it will be difficult for the home server to keep dynamic information about servers that are faulty.
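A redirection decision that weighs both proximity and proxy load, as described above, can be sketched as a simple scoring function. The additive score, the `load_weight` parameter, the region names, and the figures are assumptions for illustration and are not drawn from the cited works.

```python
def choose_proxy(client_region, proxies, load_weight=1.0):
    """Return the name of the proxy with the lowest distance-plus-load score."""
    def score(proxy):
        return proxy["distance"][client_region] + load_weight * proxy["load"]
    return min(proxies, key=score)["name"]


# Hypothetical proxies, distances, and load figures.
PROXIES = [
    {"name": "east", "load": 8, "distance": {"nyc": 1, "sf": 10}},
    {"name": "west", "load": 2, "distance": {"nyc": 10, "sf": 1}},
]

nearest_wins = choose_proxy("nyc", PROXIES)
load_wins = choose_proxy("nyc", PROXIES, load_weight=2.0)
print(nearest_wins, load_wins)
```

With the default weight the nearby "east" proxy is chosen for a client in the "nyc" region; doubling the weight given to load sends the same client to the more distant but lightly loaded "west" proxy. The home server can make this choice only if its load information is current, which relates to the fault-tolerance concern noted above.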