1. Field of the Invention
The present invention relates generally to Internet client-server applications, and more specifically to a way of maximizing server throughput while avoiding server overload by controlling the rate of establishing server-side network connections.
2. Background Art
The importance to the modern economy of rapid information and data exchange cannot be overstated. This explains the exponentially increasing popularity of the Internet. The Internet is a world-wide set of interconnected computer networks that can be used to access a growing amount and variety of information electronically.
One method of accessing information on the Internet is known as the World Wide Web (www, or the “web”). The web is a distributed, hypermedia system and functions as a client-server based information presentation system. Information that is intended to be accessible over the web is stored in the form of “pages” on general-purpose computers known as “servers.” Computer users can access a web (or HTML) page using general-purpose computers, referred to as “clients,” by specifying the uniform resource locator (URL) of the page. Via the URL, the network address of the requested server is determined and the client request for connection is passed to the requested server. FIG. 1 is a network block diagram showing a plurality of clients and servers connected to the Internet.
Once the requested server receives the client request for connection, the client and server must typically exchange three packets of information to set up a connection. The number of packets specified above for opening a connection (or specified below for closing a connection) assumes that there is no packet loss in the process of connection establishment. In the event packet loss occurs, the number of exchanged packets will increase correspondingly. A page typically consists of multiple URLs, and in fact it is not uncommon to find websites with 40 or more URLs per page.
Once the connection is established, a client sends one or more URL (page) requests to the server, each of which consists of one or more packets. The server will then send one or more response packets back to the client. Once a request and response have been exchanged between the client and server, both client and server may close their respective connections. The closing of the connection takes a minimum of four additional packets of information exchange. Therefore, there is a significant amount of processing overhead involved in downloading even a single URL for a client where that client does not already have a connection established with the server.
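The overhead described above can be made concrete with a short calculation. The following is a minimal sketch, not part of the described system: the three-packet open and four-packet close come from the text, while the single-packet request, the single-packet response, and the premise that every URL on a page uses its own connection are assumptions chosen only for illustration. The function names are hypothetical.

```python
# Packet counts taken from the text above; they assume no packet loss.
OPEN_PACKETS = 3   # minimum exchange to set up a connection
CLOSE_PACKETS = 4  # minimum exchange to close a connection


def packets_per_fetch(request_packets=1, response_packets=1):
    """Minimum packets to fetch one URL over a fresh connection.

    request_packets and response_packets are illustrative assumptions;
    the text says only that each is "one or more".
    Returns (overhead, payload, total).
    """
    overhead = OPEN_PACKETS + CLOSE_PACKETS
    payload = request_packets + response_packets
    return overhead, payload, overhead + payload


def page_cost(urls_per_page, request_packets=1, response_packets=1):
    """Total and overhead packets when every URL uses its own connection.

    Returns (total_packets, overhead_packets).
    """
    overhead, _, total = packets_per_fetch(request_packets, response_packets)
    return urls_per_page * total, urls_per_page * overhead


if __name__ == "__main__":
    overhead, payload, total = packets_per_fetch()
    print(f"one URL: {total} packets, {overhead} of them connection overhead")
    page_total, page_overhead = page_cost(40)
    print(f"40-URL page: {page_total} packets, {page_overhead} overhead")
```

Under these assumptions, seven of the nine packets needed to fetch a single URL carry no page content, and the same ratio holds for a 40-URL page fetched over 40 separate connections.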
Each packet that reaches the server interrupts the server's CPU to move that packet from the Network Interface Card (NIC) into the server's main memory. This process uses up server resources and results in loss of productivity on the server's CPU. In addition, to establish a connection at the server side, the packet needs to be processed by the driver layer, where Ethernet-specific information is handled. The driver layer sends the packet to the IP (Internet Protocol) layer, where all the IP-related processing is handled. After this, the packet is passed to the TCP (Transmission Control Protocol) layer, where the TCP-related information is processed. The TCP layer consumes significant server resources to create a connection table, etc.
Most servers incorporate multitasking, which also consumes server resources and therefore may increase server response time. Multitasking, which is well known in the relevant art(s), is the ability to execute more than one task at the same time. Examples of a task include processing a URL or page request in order to service an existing client, establishing a new connection in order to accept new clients (which involves, at a minimum, essentially three tasks as described above), closing a connection to an existing client (which involves, at a minimum, essentially four tasks as described above), etc. In multitasking, one or more processors are switched between multiple tasks so that all tasks appear to progress at the same time. There are at least two basic types of multitasking that are well known to those skilled in the art, including preemptive and cooperative.
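Cooperative multitasking, one of the two types named above, can be illustrated with a minimal sketch. This is an illustration only, not the scheduler of any particular operating system: the task names, step counts, and round-robin order are hypothetical. Each task voluntarily yields the processor after one unit of work, and a single loop switches between tasks so that all of them appear to progress at the same time.

```python
from collections import deque


def task(name, steps, log):
    """A hypothetical task that performs `steps` units of work.

    Each `yield` is a voluntary hand-back of the processor, which is the
    defining feature of cooperative multitasking.
    """
    for i in range(1, steps + 1):
        log.append(f"{name}:{i}")
        yield


def run_round_robin(tasks):
    """Switch one processor between tasks until all of them finish."""
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        try:
            next(current)          # run the task until its next yield
            ready.append(current)  # not finished: back of the queue
        except StopIteration:
            pass                   # finished: drop it from the queue


if __name__ == "__main__":
    log = []
    run_round_robin([
        task("open", 3, log),     # three steps, echoing the three-packet setup
        task("request", 2, log),
        task("close", 4, log),    # four steps, echoing the four-packet close
    ])
    print(log)
```

In a preemptive system, by contrast, the operating system would interrupt a running task on its own schedule instead of waiting for the task to yield.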
Whether the operating system of a particular server (including, but not limited to, application servers and database queuing) uses preemptive or cooperative multitasking, the response time to URL (page) requests increases as there are more tasks in the system, including tasks in the form of URL requests from more clients. In addition, the response time to a page request increases as the number of new clients trying to gain access to the server increases within a short period of time. For example, if a surge of new clients attempts to gain access to the server at the same time, then under certain load conditions the server may spend the majority of its processing resources accepting new clients rather than servicing its existing clients. A surge of new clients can be the result of a popular web site attracting many new visitors, a server attack, and so forth. A server attack happens when one or more malicious users issue ordinary requests at a very high rate in an attempt to crash a server.
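The effect of a surge can be sketched with a toy capacity model. Everything in it is an assumption made for illustration: the server has a fixed budget of work units per interval, it accepts new connections before serving existing clients, and each accepted connection and each served request costs one work unit. None of these figures or names comes from the text.

```python
def split_capacity(capacity, new_connections, pending_requests,
                   accept_cost=1.0, request_cost=1.0):
    """Split one interval of server capacity between two kinds of work.

    Hypothetical model: the server accepts new connections first and
    serves existing clients with whatever capacity remains. All costs
    are in arbitrary work units and are assumptions for illustration.
    Returns (share of capacity spent accepting, requests served).
    """
    accept_work = min(capacity, new_connections * accept_cost)
    remaining = capacity - accept_work
    service_work = min(remaining, pending_requests * request_cost)
    served = int(service_work // request_cost)
    return accept_work / capacity, served


if __name__ == "__main__":
    # Same backlog of 80 existing-client requests in both cases.
    for label, arrivals in (("normal", 10), ("surge", 90)):
        share, served = split_capacity(100, arrivals, 80)
        print(f"{label}: {share:.0%} of capacity on accepts, "
              f"{served} of 80 existing requests served")
```

In this model the surge case leaves only a small fraction of the interval's capacity for existing clients, which is the condition the paragraph above describes.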
Servers are also faced with the unpredictable and erratic nature of Internet traffic and the inconsistent arrival of requests over the web. Many factors contribute to the wide variability of web traffic, including the popularity of a URL or website; the variations in performance of the multiple points of web infrastructure encountered by a request as it traverses the net, including routers, switches and proxy devices; and the overall congestion on the infrastructure over which the traffic is being carried.
Servers are designed to do certain things well. Servers are typically general-purpose machines that are optimized for general tasks such as file management, application processing, database processing, and the like. Servers are not optimized to handle switching tasks, such as opening and closing network connections. Under certain load conditions, these tasks can represent a considerable overhead, consuming a large percentage of the server's processing resources, often on the order of twenty percent and sometimes up to fifty percent. This problem is referred to herein as “connection loading.”
The server may provide unacceptably slow response time to its existing clients when it is forced to spend most of its processing resources accepting new clients and therefore not servicing existing clients. In fact, when there is no limit on the number of clients a server is accepting and/or servicing, the result is often declining server performance, including server failure or crash and/or the failure to service some or all requests coming to it. Some servers, once they reach processing capacity, may just drop or block a connection request. When the response time for a server is unacceptably slow, and/or the server has a tendency to crash often, and/or the client's connection request is blocked or dropped, the owner of the server may lose business. This loss of business is detrimental to anyone seeking to conduct business over the Internet.