The widespread use of the Internet allows users to access various services via a network. Examples include mail, homepage browsing, search, online commerce, IP phones, and video on demand. These network services may be provided in various forms; recently, the Web server has become the mainstream interface with clients.
The basic mechanism of services (Web services) using a Web server is as follows. First, a client sends a Web server a request that specifies a URL (Uniform Resource Locator) identifying the content to be acquired. In response to this request, the Web server sends the content corresponding to the URL included in the request to the client as the response. Web services are provided by repeating this exchange of requests and responses.
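The request and response exchange described above can be sketched with the Python standard library. This is only an illustration: the `CONTENTS` table, the `ContentHandler` class, and the `fetch_once` helper are hypothetical names introduced here, and the server is a throwaway local instance rather than any particular Web server product.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical content table: URL path -> content returned for that URL.
CONTENTS = {"/index.html": b"<html>hello</html>"}


class ContentHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Look up the content identified by the URL in the request.
        body = CONTENTS.get(self.path)
        if body is None:
            self.send_error(404)
            return
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch_once(path):
    """Start a local server, send one request, and return (status, body)."""
    server = HTTPServer(("127.0.0.1", 0), ContentHandler)  # port 0: any free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        url = f"http://127.0.0.1:{server.server_port}{path}"
        # An empty ProxyHandler makes the request go directly to localhost.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(url, timeout=5) as response:
            return response.status, response.read()
    finally:
        server.shutdown()
        server.server_close()
```

Calling `fetch_once("/index.html")` performs exactly one request and one response; a Web service repeats this exchange for every piece of content the client needs.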
HTTP (Hyper Text Transfer Protocol) is used as a communication protocol to transfer requests and responses. In this specification, the whole server system that performs Web services is called a Web server, the function to process the HTTP protocol on a Web server is called an HTTP server, and the function to generate contents according to a request is called a Web application.
In addition, video and audio streaming is increasingly used as contents provided by the Web services. The basic mechanism of streaming is as follows.
First, the Web browser of a client acquires the metafile of stream contents from a Web server. The metafile describes the URL of the stream contents. At the same time, the Web browser starts the player (stream reproduction application) associated with the extension of the metafile. Based on the URL indicated by the metafile acquired from the Web server, the player requests the streaming server to send the stream contents. Finally, the streaming server sends streaming data to the player.
In streaming, RTSP (Real Time Streaming Protocol) is generally used to control the reproduction of stream contents. RTSP, a protocol based on HTTP, controls the reproduction of the stream contents through requests and their corresponding responses exchanged between the client and the server.
The major control methods that can be used in an RTSP request are initialization (SETUP), reproduction (PLAY), and stop (TEARDOWN). Because RTSP controls multiple streams at the same time, it has the concept of a session. That is, RTSP treats the period from the moment the player sends a SETUP request to the moment the player sends a TEARDOWN request to terminate streaming as one session.
When a SETUP request is received from the player, the streaming server issues a unique session ID. The session ID is attached to the response and returned to the client. The player attaches this session ID to subsequent requests so that the streaming server can identify the session to be controlled.
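The session handling described above can be modeled in a few lines. The following is a simplified sketch, not the RTSP wire format: the `StreamSessions` class and its method names are assumptions made for illustration, and only the SETUP, PLAY, and TEARDOWN lifecycle is represented.

```python
import secrets


class StreamSessions:
    """Toy model of server-side session bookkeeping (hypothetical class)."""

    def __init__(self):
        self._active = {}  # session ID -> state of that session

    def setup(self, stream_url):
        # SETUP: issue a unique session ID and return it to the player.
        session_id = secrets.token_hex(8)
        self._active[session_id] = {"url": stream_url, "playing": False}
        return session_id

    def play(self, session_id):
        # PLAY: the session ID tells the server which session to control.
        self._active[session_id]["playing"] = True

    def teardown(self, session_id):
        # TEARDOWN: the session ends and its ID is no longer valid.
        del self._active[session_id]

    def is_playing(self, session_id):
        return self._active[session_id]["playing"]

    def active_count(self):
        return len(self._active)
```

A player would call `setup` once, keep the returned ID, and pass it to `play` and `teardown`; a request carrying an unknown ID raises `KeyError` in this sketch.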
As the Web services become increasingly popular, the problems to be solved for using the services smoothly are becoming apparent. One of those problems is how to process extremely high traffic caused by the concentrated use of services.
Examples of the concentrated use of services include the concentration of requests for actively traded stocks or for ticket sales, and phone calls during a natural disaster. A malicious client sometimes sends a high volume of meaningless requests, as in an F5 attack. Too many requests arising from these factors degrade the request processing performance of the server.
The following factors degrade the server's request processing performance during extremely high traffic. First, the input/output overhead, such as interrupts and TCP/IP processing, increases when the server receives more requests than it can process. Second, the number of threads or processes for processing requests increases and, as a result, the context switching overhead, that is, the overhead required for switching between threads and processes, becomes significant. Third, because the response time until a response is returned to a client increases, clients that cannot wait long are forced to cancel their requests.
As a result of these factors, the processing performance of the server degrades as the server becomes more congested.
FIG. 1 shows an experimental result illustrating the decrease in the processing performance of a Web server when the Web server receives too many requests. The horizontal axis indicates the input request rate, and the vertical axis indicates the throughput. In FIG. 1, requests are sent to a Web server while varying the input request rate, that is, the number of requests per unit time (rps), and the throughput, that is, the number of requests the Web server can complete per unit time (rps), is measured. FIG. 1 shows that the throughput is proportional to the input request rate while the input request rate is within a fixed range (straight line (a) in FIG. 1). However, once the maximum throughput of the Web server is reached, the throughput begins to fall (straight line (c) in FIG. 1). There is therefore a need for a technology that keeps the Web server at its maximum performance, along the broken line (b) in FIG. 1, even after the Web server receives more requests than its maximum performance can handle. For reference, FIG. 2 shows the behavior of ideal throughput.
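The two behaviors contrasted above can be written as simple functions of the input request rate. This is a sketch under stated assumptions: the ideal curve follows directly from the description (throughput tracks the input rate and then stays at the maximum), while the linear decline in `overloaded_throughput` and its `loss_per_excess_request` parameter are hypothetical and are not measured values from FIG. 1.

```python
def ideal_throughput(input_rate, max_throughput):
    """Throughput follows the input rate, then stays at the maximum."""
    return min(input_rate, max_throughput)


def overloaded_throughput(input_rate, max_throughput, loss_per_excess_request):
    """Throughput that falls once the input rate exceeds the maximum.

    The linear decline is an assumption for illustration only; the
    actual shape of the falling curve depends on the server.
    """
    if input_rate <= max_throughput:
        return input_rate
    excess = input_rate - max_throughput
    return max(0.0, max_throughput - loss_per_excess_request * excess)
```

With a maximum of 100 rps, an input of 150 rps gives 100 rps in the ideal case, whereas the overloaded model with a hypothetical loss of 0.5 per excess request gives 75 rps.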
To prevent the server performance from being degraded by extremely high traffic, methods have been proposed that limit in advance the number of requests sent to a server. The following indexes are used to limit the number of requests: (a) number of TCP connections, (b) server load status, (c) bandwidth, and (d) degree of parallelism.
When (a) the number of TCP connections is used as the index, an upper limit on the number of simultaneous TCP connections is set to avoid overloading the server. This method is used in general-purpose HTTP servers such as Apache and in load balancers. However, the load varies greatly among TCP connections depending upon the request type, client line speed, and so on. Because of this, two problems arise: the server may become overloaded before the number of TCP connections reaches the upper limit or, conversely, a new TCP connection cannot be established because the number of TCP connections has reached the upper limit even though server resources are still available.
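A connection-count limit of this kind can be sketched as a simple counter. The `ConnectionLimiter` class below is a hypothetical illustration and does not reproduce how Apache or any load balancer implements its limit.

```python
class ConnectionLimiter:
    """Admit connections until a fixed upper limit is reached (sketch)."""

    def __init__(self, max_connections):
        self.max_connections = max_connections
        self.open_connections = 0

    def try_accept(self):
        # The decision depends only on the count, not on how much load
        # each connection actually places on the server.
        if self.open_connections >= self.max_connections:
            return False
        self.open_connections += 1
        return True

    def close(self):
        if self.open_connections > 0:
            self.open_connections -= 1
```

Because `try_accept` looks only at the count, it treats a connection carrying a heavy request and an idle connection from a slow client identically, which is the source of the mismatch described above.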
When (b) the server load status is used as the index, the server load status is estimated from the CPU usage rate, memory usage amount, or response time to determine if the server is overloaded. If it is determined that the server is overloaded, the traffic control is performed to reduce the server load, for example, by transferring or rejecting a new request. However, because the traffic control is performed after it is determined that the server is overloaded, a temporary decrease in the server performance cannot be avoided.
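The reactive nature of this approach can be seen in a minimal sketch. The `LoadBasedGate` class and its CPU-only threshold are assumptions for illustration; a real controller might also consider memory usage or response time, as noted above.

```python
class LoadBasedGate:
    """Reject new requests once the last load sample exceeds a threshold."""

    def __init__(self, cpu_threshold):
        self.cpu_threshold = cpu_threshold
        self.last_cpu_sample = 0.0  # most recent measured CPU usage (0.0 to 1.0)

    def report_cpu(self, usage):
        # Called whenever a new load measurement becomes available.
        self.last_cpu_sample = usage

    def admit(self):
        # The decision uses the last measurement, so requests are still
        # admitted until an overload has already been observed.
        return self.last_cpu_sample < self.cpu_threshold
```

Requests that arrive between the onset of overload and the next call to `report_cpu` are still admitted, which corresponds to the temporary decrease in performance mentioned above.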
When (c) bandwidth is used as the index, the bandwidth control function such as a shaper is used to limit the amount of traffic that reaches the server.
However, bandwidth cannot serve as an index that accurately measures the load on the server. For example, the download of an image file occupies a large bandwidth but places a relatively light load on the server. It is therefore difficult to reliably avoid overload by limiting the bandwidth while still fully utilizing the resources of the server.
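A common way to build a shaper is a token bucket; the following sketch uses one to show why a byte budget is a poor proxy for server load. The token bucket is chosen here for illustration and is not named in the text above, and the `TokenBucket` class, its parameters, and the explicit `now` argument are assumptions.

```python
class TokenBucket:
    """Byte-based shaper: traffic passes only while byte tokens remain."""

    def __init__(self, rate_bytes_per_s, burst_bytes):
        self.rate = rate_bytes_per_s
        self.burst = burst_bytes
        self.tokens = float(burst_bytes)
        self.last = 0.0  # time of the previous call, in seconds

    def allow(self, size_bytes, now):
        # Refill tokens for the elapsed time, capped at the burst size.
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if size_bytes <= self.tokens:
            self.tokens -= size_bytes
            return True
        return False
```

In this model a single large image response can use up the whole budget even though it costs the server little, while many small but expensive requests fit within the same budget.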
When (d) degree of parallelism is used as the index, the number of threads or processes that the server executes at the same time is limited. Limiting the number of threads or processes in this way can reduce the context switching overhead involved in the increase in the number of threads or processes for processing requests.
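Limiting the degree of parallelism can be sketched with a semaphore that caps how many handlers run at once. The `run_with_limit` function is a hypothetical example; the `time.sleep` call stands in for real request processing.

```python
import threading
import time


def run_with_limit(num_requests, max_parallel):
    """Process requests with at most max_parallel handlers running.

    Returns the highest number of handlers observed running at once.
    """
    gate = threading.BoundedSemaphore(max_parallel)
    lock = threading.Lock()
    running = 0
    peak = 0

    def handle():
        nonlocal running, peak
        with gate:  # blocks while max_parallel handlers are already running
            with lock:
                running += 1
                peak = max(peak, running)
            time.sleep(0.01)  # stand-in for request processing
            with lock:
                running -= 1

    threads = [threading.Thread(target=handle) for _ in range(num_requests)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak
```

Note that in this sketch every request has already been received by the server before it waits at the semaphore, so the limit applies only to processing and not to the reception of requests.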
An example of controlling the degree of parallelism is described in Document 1 (Masahiro Matsunuma, Hideaki Hibino, Yoshiki Sato, Kenichi Mitsuki, Sigeru Chiba: “Session-Level Queue Scheduling for Improving Performance Degradation of Web Application at Overload Time”, Second Dependable Software Workshop (DSW'05), pp. 105-114, January, 2005) in which the HTTP server is extended to limit the degree of parallelism on a page basis. However, even if the degree of parallelism is controlled on the server, the overhead of interrupts or TCP/IP processing, which is the primary cause of request-processing performance degradation and which is involved in the reception of too many requests for the server to process, cannot be avoided. The result is that the processing performance of the server is degraded as with other methods during extremely high traffic times. Another problem is that, because the HTTP server or the Web application must be changed, it is difficult to introduce this method into the services already in operation.
Another example of controlling the degree of parallelism is to limit the number of sessions on a streaming server. That is, a streaming server usually has an upper limit on the number of sessions that can be active thereon at the same time. Imposing this limit avoids the server overload caused by an increase in the number of sessions.
However, limiting the number of sessions does not limit the reception of control requests via RTSP. Because of this, the concentration of RTSP requests on a streaming server increases the overhead for processing requests and degrades the processing performance of the streaming server.
The performance of a server is degraded by an increase in the interrupt, input/output, and context switching overhead caused when new requests are received, as shown in FIG. 3(a). To remove this overhead and maximize the performance of the server, it is ideal that the next request arrives immediately after the server completes the processing of the current request, as shown in FIG. 3(b). In this case, the overhead that arises when the server receives more requests than it can process does not occur. In addition, the server has no idle time between the moment the processing is completed and the moment the next request arrives.
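The two properties of the ideal arrival pattern, no backlog at the server and no idle time, can be checked with a small simulation. This is a sketch under simplifying assumptions: the server processes one request at a time, every request takes the same `service_time`, and `arrival_times` is sorted; the function name and its return values are introduced here for illustration.

```python
def simulate(arrival_times, service_time):
    """Return (idle_time, max_backlog) for a single-threaded server.

    idle_time:   total time the server waits with nothing to process.
    max_backlog: largest number of earlier requests still unfinished
                 at the moment a new request arrives.
    """
    idle_time = 0
    max_backlog = 0
    completions = []
    free_at = arrival_times[0]  # server becomes free at this time
    for t in arrival_times:
        # Earlier requests that the server still holds when this one arrives.
        backlog = sum(1 for c in completions if c > t)
        max_backlog = max(max_backlog, backlog)
        if t > free_at:
            idle_time += t - free_at  # server sat idle waiting for work
        start = max(t, free_at)
        free_at = start + service_time
        completions.append(free_at)
    return idle_time, max_backlog
```

With a service time of 2, arrivals at times 0, 2, and 4 give `(0, 0)`, matching the ideal case; three simultaneous arrivals at time 0 give a backlog of 2, and arrivals at times 0, 5, and 10 leave the server idle for 6 time units.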