1. Field of the Invention
The present invention relates to computer performance, and deals more particularly with a technique, system, and computer program for enhancing performance of a computer running a multithreaded server application. A scheduling heuristic is defined for optimizing the number of available threads. A 2-stage queue is defined for passive sockets, in order to ensure threads are not assigned to connections unless data is being sent. A new type of socket is defined, for merging input from more than one source and making that merged input available for scheduling. A function is defined for optimizing assignment of threads to incoming requests when persistent connections are used.
2. Description of the Related Art
A multithreaded application is a software program that supports concurrent execution by multiple threads, that is, a re-entrant program. A thread is a single execution path within such a program. The threads execute sequentially within one process, under control of the operating system scheduler, which allocates time slices to available threads. A process is an instance of a running program. The operating system maintains information about each concurrent thread that enables the threads to share the CPU in time slices, but still be distinguishable from each other. For example, a different current instruction pointer is maintained for each thread, as are the values of registers. By maintaining some distinct state information, each execution path through the re-entrant program can operate independently, as if separate programs were executing. Other state information such as virtual memory and file descriptors for open I/O (input/output) streams are shared by all threads within the process for execution efficiency. On SMP (Symmetric Multiprocessor) machines, several of these threads may be executing simultaneously. The re-entrant program may contain mechanisms to synchronize these shared resources across the multiple execution paths.
Multithreaded applications are becoming common on servers running in an Internet environment. The Internet is a vast collection of computing resources, interconnected as a network, from sites around the world. It is used every day by millions of people. The World Wide Web (referred to herein as the "Web") is that portion of the Internet which uses the HyperText Transfer Protocol ("HTTP") as a protocol for exchanging messages. (Alternatively, the "HTTPS" protocol can be used, where this protocol is a security-enhanced version of HTTP.)
A user of the Internet typically accesses and uses the Internet by establishing a network connection through the services of an Internet Service Provider (ISP). An ISP provides computer users the ability to dial a telephone number using their computer modem (or other connection facility, such as satellite transmission), thereby establishing a connection to a remote computer owned or managed by the ISP. This remote computer then makes services available to the user's computer. Typical services include: providing a search facility to search throughout the interconnected computers of the Internet for items of interest to the user; a browse capability, for displaying information located with the search facility; and an electronic mail facility, with which the user can send and receive mail messages from other computer users.
The user working in a Web environment will have software running on his computer to allow him to create and send requests for information, and to see the results. These functions are typically combined in what is referred to as a "Web browser", or "browser". After the user has created his request using the browser, the request message is sent out into the Internet for processing. The target of the request message is one of the interconnected computers in the Internet network. That computer will receive the message, attempt to find the data satisfying the user's request, format that data for display with the user's browser, and return the formatted response to the browser software running on the user's computer. In order to enable many clients to access the same computer, the computer that receives and/or processes the client's request typically executes a multithreaded application. The same instance of the application can then process multiple requests, where separate threads are used to isolate one client's request from the requests of other clients.
This is an example of a client-server model of computing, where the machine at which the user requests information is referred to as the client, and the computer that locates the information and returns it to the client is the server. In the Web environment, the server is referred to as a "Web server". The client-server model may be extended to what is referred to as a "three-tier architecture". This architecture places the Web server in the middle tier, where the added tier typically represents databases of information that may be accessed by the Web server as part of the task of processing the client's request. This three-tiered architecture recognizes the fact that many client requests are not simply for the location and return of static data, but require an application program to perform processing of the client's request in order to dynamically create the data to be returned. In this architecture, the Web server may equivalently be referred to as an "application server". When the server executes a multithreaded application program, the server may equivalently be referred to as a "threaded server", or "multithreaded server".
The server is responsible for the threads. The set of threads that have been created but not destroyed will be referred to herein as a "pool" of threads. The number of threads to be created for the pool is typically specified by a user (e.g. a systems administrator), as a configuration parameter when initializing the server. Typically, this parameter is set so that the server creates a large number of threads, in order to deal with the maximum anticipated connection load (i.e. the maximum number of incoming client requests).
The TCP/IP protocol (Transmission Control Protocol/Internet Protocol) is the de facto standard method of transmitting data over networks, and is widely used in Internet transmissions. TCP/IP uses the concept of a connection between two "sockets" for exchanging data between two computers, where a socket is comprised of an address identifying one of the computers, and a port number that identifies a particular process on that computer. The process identified by the port number is the process that will receive the incoming data for that socket. A socket is typically implemented as a queue by each of the two computers using the connection, whereby the computer sending data on the connection queues the data it creates for transmission, and the computer receiving data on the connection queues arriving data prior to processing that data.
For applications which receive requests from a number of clients, a special "passive" socket is created which represents a queue of pending client connections. Each client that needs the services of this application requests a connection to this passive socket, by using the same server port number (although communications using a secure protocol such as Secure Sockets Layer, or "SSL", typically use a different port number than "normal" communications without security, for the same application). The server accepts a pending client connection from the special passive socket. This creates a new server socket, which is then assigned to an available thread for processing.
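The accept step described above can be illustrated with a minimal sketch using Python's standard `socket` and `threading` modules. The function names `handle` and `accept_and_dispatch_demo`, the loopback address, and the upper-casing "service" are assumptions made purely for illustration; they do not come from the described server.

```python
import socket
import threading


def handle(conn: socket.socket) -> None:
    """Worker thread body: read one request and reply with it upper-cased."""
    with conn:
        data = conn.recv(1024)
        conn.sendall(data.upper())


def accept_and_dispatch_demo() -> bytes:
    # The passive (listening) socket holds the queue of pending connections.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(5)
    port = listener.getsockname()[1]

    # A client connects to the passive socket's port and sends a request.
    client = socket.create_connection(("127.0.0.1", port))
    client.sendall(b"hello")

    # accept() removes one pending connection from the passive socket and
    # returns a NEW connected socket; the listener itself stays open.
    conn, _addr = listener.accept()

    # The new socket is handed to a thread for processing.
    worker = threading.Thread(target=handle, args=(conn,))
    worker.start()
    reply = client.recv(1024)
    worker.join()

    client.close()
    listener.close()
    return reply
```

Note that in this sketch the thread is assigned as soon as the connection is accepted, whether or not the client has sent any data yet.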
A number of shortcomings exist in the current approach to implementing multithreaded server applications running in this environment, which result in less than optimal performance of those applications. With the increasing popularity of applications such as those running on Web servers, which may receive thousands or even millions of "hits" (i.e. client requests for processing) per day, performance becomes a critical concern. The present invention addresses these performance concerns.
In existing server implementations, a separate "dispatcher" thread is typically responsible for monitoring the queue which receives incoming connection requests for the passive socket for a given application. To differentiate between the thread doing the dispatching, and those threads to which it dispatches work, the latter are referred to herein as "worker threads". The dispatcher thread keeps track of the status of each worker thread, and assigns each incoming request to an available thread. An "available" thread is one that is ready to run, but has no work currently assigned to it. A thread in this state may equivalently be referred to as an "idle thread". When work is assigned to an idle thread, it is no longer considered idle, and no further work will be assigned to it until it has completed its current work request. On SMP machines, the dispatcher thread may become a bottleneck that prevents worker threads from being scheduled fast enough to keep all of the processors busy.
Alternatively, a server may be implemented without using a dispatcher thread. In this approach, the threads are responsible for checking the passive socket queue to determine if there are any connection requests. As each thread completes the work request it has been processing, it looks on the queue for its next request. If a request is waiting, the thread removes the request from the queue, and begins to process it. If no request is waiting, the thread becomes an idle thread. The idle thread may then "sleep", whereby a system timer is used to cause the thread to wait for a predetermined period of time, and then "awaken" to recheck the queue to see if work has arrived. This is referred to as "polling" mode. A more common alternative to polling mode is to use event-driven interrupts. In that approach, the thread will go into the idle state and wait for a system-generated interrupt that will be invoked when work arrives, signalling the thread to become active again. Going into the idle state is also referred to as "blocking", and being awakened from the blocked state (i.e. receiving the interrupt) is referred to as "unblocking".
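The difference between polling mode and blocking mode can be sketched as follows, using Python's thread-safe `queue.Queue` as a stand-in for the passive socket queue. The names `polling_worker`, `blocking_worker`, `run`, the `STOP` sentinel and the 10 ms polling interval are all hypothetical choices made for this illustration only.

```python
import queue
import threading
import time

STOP = None  # hypothetical sentinel telling a worker to exit


def polling_worker(work: queue.Queue, results: list, interval: float = 0.01) -> None:
    """Polling mode: check the queue, sleep on a timer if it is empty, recheck."""
    while True:
        try:
            item = work.get_nowait()
        except queue.Empty:
            time.sleep(interval)  # "sleep", then "awaken" and look again
            continue
        if item is STOP:
            return
        results.append(("polled", item))


def blocking_worker(work: queue.Queue, results: list) -> None:
    """Event-driven mode: block until the queue signals that work has arrived."""
    while True:
        item = work.get()  # blocks; the thread is unblocked when an item is put
        if item is STOP:
            return
        results.append(("unblocked", item))


def run(worker) -> list:
    """Start one worker, feed it a single request, and collect what it processed."""
    work, results = queue.Queue(), []
    thread = threading.Thread(target=worker, args=(work, results))
    thread.start()
    work.put("request-1")
    work.put(STOP)
    thread.join()
    return results
```

Both workers process the same request; the polling worker may additionally spend wake-ups finding an empty queue, which the blocking worker avoids.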
In current server implementations that use event-driven interrupts, as each worker thread completes its current request, it checks the passive socket queue to see if any requests are waiting. When there is no waiting request, the thread blocks. Any number of threads may be blocked at a given time. When the next incoming request arrives, an event is generated to wake up the threads. Each blocked worker thread receives this interrupt, so each unblocks and tries to take the request from the queue. Only the first worker thread will be able to take the incoming request, and the others will again find the queue empty and return to the blocked state. However, a new API (Application Programming Interface) is under development to change this approach to interrupt generation. The API is referred to herein as "accept_and_receive". According to the accept_and_receive API, when an incoming request arrives, an interrupt will be generated only to a single blocked thread.
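The contrast between waking every blocked thread and waking a single blocked thread can be sketched with Python's `threading.Condition`. This is an analogy only: it does not use the accept_and_receive API, and the function `count_wakeups` and its bookkeeping names are hypothetical. The sketch also assumes the behaviour of the current CPython implementation, in which `notify(1)` wakes exactly one waiter; the Python documentation does not guarantee this for all implementations.

```python
import threading
import time


def count_wakeups(n_threads: int, wake_all: bool) -> tuple:
    """Block n_threads workers, deliver ONE request, return (wakeups, served)."""
    cond = threading.Condition()
    handled = threading.Semaphore(0)  # released once per worker that woke for work
    pending = []                      # stand-in for the passive socket queue
    state = {"waiting": 0, "wakeups": 0, "served": 0, "stopping": False}

    def worker() -> None:
        with cond:
            state["waiting"] += 1
            cond.wait()               # block until notified
            if state["stopping"]:     # shutdown wake-up: not counted
                return
            state["wakeups"] += 1
            if pending:               # only the first woken worker finds the request
                pending.pop()
                state["served"] += 1
        handled.release()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()

    # Wait until every worker is blocked, then deliver a single request.
    while True:
        with cond:
            if state["waiting"] == n_threads:
                pending.append("request")
                if wake_all:
                    cond.notify_all()  # every blocked thread is awakened
                else:
                    cond.notify(1)     # a single blocked thread is awakened
                break
        time.sleep(0.001)

    # Wait for each awakened worker to finish looking at the queue.
    for _ in range(n_threads if wake_all else 1):
        handled.acquire()

    # Release any workers that are still blocked so the demo can exit.
    with cond:
        state["stopping"] = True
        cond.notify_all()
    for t in threads:
        t.join()
    return state["wakeups"], state["served"]
```

With four blocked workers and one request, `count_wakeups(4, True)` reports four wake-ups for one served request, while `count_wakeups(4, False)` reports a single wake-up.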
This new interrupt approach leads to the first performance problem to be addressed by the present invention, which will be referred to herein as "over-scheduling". When the number of incoming connections is less than the number of threads in the thread pool (i.e. the connection load is less than the maximum for which the server is configured), too many threads from the pool are used to service the workload. In other words, the thread pool is being over-scheduled. This leads to inefficient use of resources.
The following scenario illustrates the over-scheduling problem. Suppose all threads are blocked, waiting for connection requests. A first request arrives. The system scheduler wakes up one of these blocked threads, and assigns the incoming request to that thread. The thread begins processing the request. Then, a second request arrives, so the scheduler wakes up a second blocked thread and assigns this new request to it. The second thread begins processing this new request. The first thread completes the request it was working on, and checks the passive socket. Finding no new connection requests there, the first thread blocks. For two requests, the scheduler has awakened two threads.
However, it may be that thread one was nearly finished with its first request at the time the second request arrived. When this is the case, it would be more efficient to wait for the first thread to finish and find the second request when it checks the passive socket, as opposed to awakening the second thread. If the scheduler awakens a new thread for each incoming request (i.e. it over-schedules the threads), a thread working on a request is guaranteed to find the incoming connection queue empty when it completes its current request and checks for another. The threads will therefore block after each completed request. The repeated blocking and unblocking operations are expensive in terms of the overall pathlength for servicing a request. When a thread blocks, the scheduler will save the context information for that thread, and the thread will move from the "ready" state to the "blocked" state. The unblocking operation requires the fairly-significant overhead associated with interrupt processing.
A further impact on the system's performance during over-scheduling is caused by the memory paging mechanism. As a thread executes, it will refer to stored information. That information must be in memory to be processed. If it is not already in memory, it will be paged in. Typically, another page must be paged out to make room for the one being paged in. Paging mechanisms use algorithms to decide which page to page out. Commonly, the least-recently-used page is selected for paging out. When over-scheduling occurs, each thread blocks after it executes, and its pages therefore become unused. The longer a thread blocks, the more likely it becomes that its pages will be paged out. Then, when the thread is awakened, its pages must be paged back in, causing another thread's pages to be paged out. The extra processing caused by these paging operations reduces the efficiency of processing the incoming request.
Additionally, the operation of checking the passive socket, only to find it empty, is a wasted operation which further reduces the efficiency of the blocking thread.
A second performance problem will be referred to herein as the "multiple input source" problem. As previously stated, a server application may receive unsecure connection requests on one passive socket, and secure connection requests on a second passive socket. This will be the case, for example, in on-line shopping applications. The client shopper may request to display available products from an on-line catalog, eventually selecting some products to be ordered. Such requests for display of information are usually sent on an unsecure connection, so as not to incur the additional processing overhead associated with a secure connection. When the shopper places his order, he may choose to pay by credit card, and submit his credit card information electronically. This part of the transaction will be sent on the secure connection, in order to protect the shopper's information. Typically, the seller will use the same server application for the entire sequence of shopping transactions. The application must therefore be able to accept both unsecure and secure connection requests from the two passive sockets.
When a Web server is hosting more than one hostname, each hostname having its own IP address, a pair of passive sockets is used for each hostname. Thus, a given application may need to accept connections that arrive on many passive sockets. The set of such sockets is referred to herein as multiple input sources.
With the previously-discussed dispatcher thread approach to socket queue management, one dispatcher (or "acceptor") thread is allocated to each passive socket. When an incoming connection request arrives, these dispatchers are responsible for finding an available worker thread from the thread pool, and assigning an incoming request to the thread. As the number of dispatcher threads increases, the interference between them for managing the shared pool of worker threads also increases.
When dispatcher threads are not used, and the responsibility for checking the arrival queue belongs with the worker threads, the thread pool will be statically partitioned across the set of passive socket queues. Because the workload at any particular time, and the corresponding distribution of requests among the passive sockets, is unpredictable, it is very likely that this static partitioning will be less than optimal. One queue may have too few threads to handle its workload, and another may have too many. When too few threads are available, incoming requests have to wait on the queue, while available system capacity is left idle. Because an incoming request normally has a human waiting for the response, this type of delay in processing the response must be avoided to the greatest extent possible. When too many threads are available, the inefficiencies discussed previously for over-scheduling will result. A more dynamic partitioning, whereby the pool of worker threads is divided based on the current distribution of work among the passive sockets, cannot be accomplished by the server application because the depth of the connection queues on the passive sockets is not available to it.
A third performance problem will be referred to herein as "persistent connection scheduling". Persistent connection capability was introduced in version 1.1 of HTTP, and enables a single connection to be used for a stream of requests (and corresponding responses) between the client and server. Persistent connections are intended to reduce the amount of overhead associated with processing a series of requests, eliminating the set-up and tear-down costs of TCP connections that would otherwise be required for each individual request: instead, a single set-up and a single tear-down are used. Previously, each request generated at the client created a new connection, which lasted only for the duration of that request. An exchange of messages was required to set up the connection, and another exchange was required to close it. Many Web-based applications generate quite complex pages of information to display to users, and each page may require a number of separate requests to be sent through the network. For example, one request may be sent for each graphic image on the page, another for the static text, and yet others for any dynamically-generated text. Thus, for display of a single Web page, use of a persistent connection saves a great deal of processing overhead. That is, once a connection has been created for use between the two computers, the client may send any number of requests over that connection without stopping to wait for acknowledgement that the server has received each of those requests. This is referred to as a "stream" mode of sending requests. The server is required to respond to all requests from the stream in order. Either the client or the server may terminate the connection on any request boundary, without creating a protocol error.
In practice, the client software in the browser keeps this persistent connection open until the user moves to a different Web site (where a different server socket address, and therefore a new connection, will be needed). Some amount of time may pass between the last request sent on an open persistent connection, and when the user moves to the new site. The socket for the existing connection will have no incoming data during this time. The server application cannot know whether the socket is in this particular state (i.e. the client is finished sending data, but the connection is still open), or whether the client simply has not generated its next request yet. Therefore, uncertainty exists at the server regarding reading the next request for this type of connection. There may be data on the queue, data that will arrive soon, or data that will not arrive for quite some time. And, any of these data packets may contain a client request for ongoing work on the connection, or a request to close the socket.
If data will arrive soon, it is most efficient to keep the connection bound to the worker thread, allowing the worker thread to go idle temporarily. However, if there will be a long delay before data arrives, it is more efficient to unbind the worker thread from this connection, and assign it to another request. Then, when the next request for the unbound connection arrives, a thread (most likely a different thread than the one to which it was originally bound) is assigned to continue the processing. There is no way to know in advance which connections will have long delays between any given requests, when those delays will occur, or how long they will last. Attempting to partition the pool of worker threads between those that will accept new connections, and those that will handle connections that reactivate after a delay presents a similar problem to that discussed above for the multiple input source problem: assigning too many threads, or too few threads, to either partition will result in inefficiencies.
Accordingly, a need exists for a technique by which these inefficiencies in the current implementations of multithreaded server applications can be overcome. The proposed technique defines: a scheduling heuristic for optimizing the number of available threads; a 2-stage queue for passive sockets; a new type of socket, for merging input from more than one source and making that merged input available for scheduling; and a function for optimizing assignment of threads to incoming requests when persistent connections are used.
An object of the present invention is to provide a technique for enhancing the performance of multithreaded servers.
Another object of the present invention is to provide a technique whereby these performance enhancements are achieved by optimizing the scheduling of requests to worker threads.
It is another object of the present invention to provide this optimization by defining a scheduling heuristic that optimizes the number of available threads.
It is a further object of the present invention to provide this optimization by defining a new type of socket for merging input from more than one source, and making that merged input available for scheduling.
It is yet another object of the present invention to provide this optimization by defining a function that optimizes assignment of threads to incoming requests when persistent connections are used.
Other objects and advantages of the present invention will be set forth in part in the description and in the drawings which follow and, in part, will be obvious from the description or may be learned by practice of the invention.
To achieve the foregoing objects, and in accordance with the purpose of the invention as broadly described herein, the present invention provides a system, method, and computer-readable code implementing a software process for use in a computing environment having a connection to a network, for enhancing performance of a multithreaded application, comprising: a plurality of client requests for connections; a plurality of worker threads; a subprocess for receiving said plurality of client requests; and a subprocess for implementing a scheduling heuristic to alleviate over-scheduling of said worker threads. Further, a first group of said worker threads are active threads, said first group being comprised of changeable ones of said plurality of worker threads, and having a changeable number of said changeable ones, said changeable number being at least one; and said subprocess for implementing a scheduling heuristic further comprises a subprocess for balancing said changeable number in said first group against a current workload comprised of one or more of said plurality of client requests. Said subprocess for balancing may further comprise using an average delay, and also a maximum delay. Preferably, said average delay and said maximum delay are configuration parameters. In addition to said first group of worker threads, there may be a second group of said worker threads which are blocked threads (said second group being comprised of ones of said plurality of worker threads which are not in said first group), and which are stored in a Last-In, First-Out queue. 
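The Last-In, First-Out ordering of blocked threads mentioned above can be illustrated with a minimal sketch. Only the LIFO ordering is taken from the text; the class name `BlockedThreadStack`, its methods, and the use of integer worker identifiers in place of real threads are assumptions for illustration, and the balancing heuristic itself (average delay, maximum delay) is deliberately not modelled here.

```python
class BlockedThreadStack:
    """Hypothetical LIFO store of blocked worker identifiers."""

    def __init__(self) -> None:
        self._stack = []  # most recently blocked worker is at the end

    def block(self, worker_id: int) -> None:
        """Record that a worker has gone from active to blocked."""
        self._stack.append(worker_id)

    def wake(self):
        """Return the most recently blocked worker, or None if none are blocked."""
        return self._stack.pop() if self._stack else None


def lifo_demo() -> list:
    blocked = BlockedThreadStack()
    for worker_id in (1, 2, 3):   # worker 3 blocks last
        blocked.block(worker_id)
    order = [blocked.wake(), blocked.wake()]
    blocked.block(3)              # worker 3 finishes its request and blocks again
    order.append(blocked.wake())
    return order
```

In this sketch the same few workers (3 and 2) are reused while worker 1 stays blocked. One plausible benefit, given the paging discussion earlier, is that a recently blocked thread is more likely to still have its pages in memory.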
Further, the present invention provides a system, method, and computer-readable code for enhancing performance of a multithreaded application, comprising: a subprocess for moving connections from a pending connections queue to a first queue when each of said connections are accepted; a subprocess for moving each of said connections from said first queue to a second queue when an initial data packet arrives for said connection; and a subprocess for assigning a worker thread to each of said connections on said second queue. Additionally, the present invention provides a system, method, and computer-readable code for enhancing performance of a multithreaded application, comprising: a subprocess for receiving input from multiple sources; and a subprocess for merging said received input onto a single queue for scheduling. Preferably, this further comprises: a subprocess for moving connections from a pending connections queue to a first queue when each of said connections are accepted; a subprocess for moving each of said connections from said first queue to said single queue when an initial data packet arrives for said connection; and a subprocess for assigning a worker thread to each of said connections on said single queue. Preferably, said subprocess for scheduling further comprises: a group of active worker threads comprised of changeable ones of a plurality of worker threads, and having a changeable number of said changeable ones, said changeable number being at least one; and a subprocess for implementing a scheduling heuristic for balancing said changeable number in said active group against a current workload comprised of said client requests stored on said single queue. 
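The two-stage movement of connections, and the merging of several input sources onto a single queue, can be sketched as a small data structure. The class `TwoStageQueue`, its method names, and the string identifiers used for connections and sources are hypothetical; the sketch models only the queue transitions described above, not real sockets.

```python
from collections import deque


class TwoStageQueue:
    """Hypothetical two-stage queue shared by several passive sockets."""

    def __init__(self) -> None:
        self.accepted = {}    # first stage: conn_id -> source; accepted, no data yet
        self.ready = deque()  # second stage: (conn_id, source) with data waiting

    def on_accept(self, conn_id: str, source: str) -> None:
        """A connection was accepted on one of the passive sockets."""
        self.accepted[conn_id] = source

    def on_data(self, conn_id: str) -> None:
        """The initial data packet arrived: promote the connection."""
        if conn_id in self.accepted:
            source = self.accepted.pop(conn_id)
            self.ready.append((conn_id, source))

    def next_for_worker(self):
        """Hand the oldest ready connection to a worker, or None if there is none."""
        return self.ready.popleft() if self.ready else None


def two_stage_demo() -> list:
    q = TwoStageQueue()
    q.on_accept("c1", "unsecure")  # accepted first, but sends data last
    q.on_accept("c2", "secure")
    handed_out = [q.next_for_worker()]  # nothing has data yet
    q.on_data("c2")
    q.on_data("c1")
    handed_out += [q.next_for_worker(), q.next_for_worker()]
    return handed_out
```

In the sketch no worker is given a connection until that connection has data, and connections from both sources are scheduled from the one `ready` queue in the order their data arrived.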
Further, the present invention provides a system, method, and computer-readable code for enhancing performance of a multithreaded application, comprising: a plurality of persistent connections; a plurality of worker threads; a subprocess for binding selected ones of said persistent connections to selected ones of said worker threads, wherein an execution of said subprocess for binding results in a bound connection; and a subprocess for unbinding selected ones of said bound connections, wherein an execution of said subprocess for unbinding results in an unbound worker thread. Preferably, said subprocess for binding further comprises using a 2-stage queue; and said subprocess for unbinding further comprises using said 2-stage queue. Said subprocess for binding using said 2-stage queue further comprises: a subprocess for moving each of said persistent connections to said first stage when an initial data packet arrives for said connection; a subprocess for moving each of said persistent connections from said second stage to said first stage when data is received for said connection; and a subprocess for scheduling said persistent connections from said first stage; and said subprocess for unbinding using said 2-stage queue further comprises: a subprocess for moving selected ones of said bound connections from said first stage to said second stage when said selected bound connection goes idle; a subprocess for closing selected ones of said persistent connections in said second stage, responsive to a maximum idle period being exceeded; and a subprocess for making said unbound worker thread available to said subprocess for binding. Preferably, said subprocess for unbinding further comprises: a subprocess for closing further selected ones of said persistent connections in said second stage, responsive to exceeding a maximum number of idle connections.
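The binding and unbinding of persistent connections through a two-stage queue can be sketched as follows. The class `PersistentConnectionQueue`, its method names, the explicit `now` timestamps, and in particular the choice to close the longest-idle connection when the idle limit is exceeded are assumptions made for this illustration; the text above does not state which idle connections are selected for closing.

```python
from collections import OrderedDict, deque


class PersistentConnectionQueue:
    """Hypothetical two-stage queue for persistent connections."""

    def __init__(self, max_idle_seconds: float, max_idle_connections: int) -> None:
        self.max_idle_seconds = max_idle_seconds
        self.max_idle_connections = max_idle_connections
        self.ready = deque()       # first stage: connections with data to process
        self.idle = OrderedDict()  # second stage: conn_id -> time it went idle
        self.closed = []           # connections closed by the queue

    def data_arrived(self, conn_id: str) -> None:
        """Data arrived: move the connection (new or idle) to the first stage."""
        self.idle.pop(conn_id, None)
        self.ready.append(conn_id)

    def bind(self):
        """Bind the next ready connection to a worker, or return None."""
        return self.ready.popleft() if self.ready else None

    def unbind(self, conn_id: str, now: float) -> None:
        """The bound connection went idle: park it and free the worker."""
        self.idle[conn_id] = now
        # Assumption: when too many connections are idle, close the oldest.
        while len(self.idle) > self.max_idle_connections:
            oldest, _since = self.idle.popitem(last=False)
            self.closed.append(oldest)

    def expire(self, now: float) -> None:
        """Close idle connections whose idle period exceeds the maximum."""
        while self.idle:
            conn_id, since = next(iter(self.idle.items()))
            if now - since <= self.max_idle_seconds:
                break
            self.idle.popitem(last=False)
            self.closed.append(conn_id)


def persistent_demo() -> tuple:
    q = PersistentConnectionQueue(max_idle_seconds=30, max_idle_connections=2)
    for conn_id in ("a", "b", "c"):
        q.data_arrived(conn_id)
    q.unbind(q.bind(), now=0)    # "a" served, then idle
    q.unbind(q.bind(), now=10)   # "b" served, then idle
    q.unbind(q.bind(), now=20)   # "c" idle: three idle exceeds limit, "a" closed
    q.data_arrived("b")          # "b" reactivates and returns to the first stage
    q.expire(now=51)             # "c" has been idle for 31 s, so it is closed
    return q.closed, q.bind()
```

In this sketch a worker is released as soon as its connection goes idle, and a reactivated connection may be bound to whichever worker next calls `bind`.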
The present invention will now be described with reference to the following drawings, in which like reference numbers denote the same element throughout.