1. Field of the Invention
The present invention relates to computer software, and deals more particularly with methods, systems, computer program products, and methods of doing business by programmatically tuning thread pools at run-time (e.g., to balance server workload in a multithreaded server environment).
2. Description of the Related Art
The popularity of client/server computing has increased tremendously in recent years, due in large part to growing business and consumer use of the public Internet and the subset thereof known as the “World Wide Web” (or simply “Web”). Other types of client/server computing environments, such as corporate intranets and extranets, are also increasingly popular. As solutions providers focus on delivering improved Web-based computing, many of the solutions which are developed are adaptable to other client/server computing environments. Thus, references herein to the Internet and Web are for purposes of illustration and not of limitation. (Furthermore, the terms “Internet”, “Web”, and “World Wide Web” are used interchangeably herein.)
Millions of people use the Internet on a daily basis, whether for their personal enjoyment or for business purposes or both. As consumers of electronic information and business services, people now have easy access to sources on a global level. When a human user is interacting with a software application over the Internet and is requesting content, delays or inefficiencies in returning responses may have a very negative impact on user satisfaction, even causing the users to switch to alternative sources. Delivering requested content quickly and efficiently is therefore critical to user satisfaction, and accordingly, it is important to ensure that the systems on the server side of the network perform as efficiently as possible.
Experience has shown that in an application server handling requests for various clients in this type of environment, it is usually necessary to constrain the usage of resources in order to provide the best throughput and response time across the variety of requests that are received. One of the primary resources of interest is execution threads (referred to equivalently hereinafter simply as “threads”). Unconstrained creation, usage, and destruction of threads can hurt both response time and throughput for various reasons which are known in the art. For example, if too many threads are created, the system overhead for managing the threads may be unacceptably high, and too much memory may be required for storing system state and other information for these threads. In addition, contention for shared resources is a primary reason for constraining the number of available threads, since queuing large numbers of threads for limited resources typically causes thrashing on those resources. On the other hand, if too few threads are available, incoming requests may wait a long time before being assigned to a thread, thereby increasing the response time to the end user.
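By way of illustration only, the effect of constraining the number of threads can be sketched with a fixed-size pool from Python's standard `concurrent.futures` module. The names `POOL_SIZE`, `NUM_REQUESTS`, `handle_request`, and `run_requests` are hypothetical and chosen for this sketch; the short sleep merely stands in for real request processing.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

POOL_SIZE = 4        # hypothetical upper bound on worker threads
NUM_REQUESTS = 20    # hypothetical number of incoming requests

_lock = threading.Lock()
_active = 0          # requests currently being serviced
peak_active = 0      # highest concurrency observed so far


def handle_request(request_id):
    """Simulate servicing one request while tracking concurrency."""
    global _active, peak_active
    with _lock:
        _active += 1
        peak_active = max(peak_active, _active)
    time.sleep(0.01)  # stand-in for real work
    with _lock:
        _active -= 1
    return request_id


def run_requests(pool_size=POOL_SIZE, num_requests=NUM_REQUESTS):
    # Requests beyond pool_size wait in the executor's internal queue
    # until a worker thread becomes free.
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        return list(pool.map(handle_request, range(num_requests)))


results = run_requests()
```

In this sketch the observed concurrency never exceeds `POOL_SIZE`; the remaining requests simply queue, which is the source of the waiting time noted above when too few threads are available.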
Therefore, it is useful to tune the number of threads in the system. The set of threads that have been created but not destroyed will be referred to herein as a “thread pool”. The number of threads to be created for the thread pool in a particular client/server environment is often specified by a user (e.g., a systems administrator) as a configuration parameter when initializing the server. Typically, in environments where the applications are under moderate to heavy load, tuning the thread pool size for a given set of applications is an iterative operation: the thread pool is resized repeatedly in an attempt to improve throughput and response times.
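The iterative nature of such tuning can be sketched as a loop that tries candidate pool sizes and keeps the one giving the lowest mean response time. The following is a toy model only, not a description of any particular server: it assumes all requests arrive at once, are served first-in first-out, and that each additional thread inflates every request's service time by a fixed `overhead` fraction as a crude stand-in for thread management cost and contention. The function names and parameter values are hypothetical.

```python
import heapq


def mean_response_time(pool_size, num_requests=64, work=1.0, overhead=0.05):
    """Toy model: all requests arrive at t=0 and are served FIFO.

    Assumption: each extra thread inflates every request's service time
    by `overhead`, a crude stand-in for management cost and contention.
    """
    service = work * (1.0 + overhead * (pool_size - 1))
    free_at = [0.0] * pool_size         # time at which each thread is next free
    heapq.heapify(free_at)
    total = 0.0
    for _ in range(num_requests):
        start = heapq.heappop(free_at)  # earliest available thread
        finish = start + service
        total += finish                 # arrival is t=0, so response == finish
        heapq.heappush(free_at, finish)
    return total / num_requests


def tune_pool_size(candidates):
    """Try each candidate size and keep the one with the lowest mean response."""
    best_size, best_time = None, float("inf")
    for size in candidates:
        t = mean_response_time(size)
        if t < best_time:
            best_size, best_time = size, t
    return best_size, best_time


best_size, best_time = tune_pool_size([1, 2, 4, 8, 16, 32, 64])
```

Under these assumed parameters, both very small and very large pools perform worse than an intermediate size, which mirrors the trade-off between queuing delay and per-thread overhead. In practice the measurement would come from the running system rather than from a model.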
In a homogeneous workload, the requests will often have very similar overall system response times, and iteratively resizing the thread pool works well for improving performance of the system. Similarly, where the workload contains a mix of request types but those varied requests have similar response times, this type of resizing operation also works fairly well. However, for workloads with a highly varied response time mix, the problem is more complex.
When a single thread pool, having a constrained number of threads, is used with a workload consisting of request types that have varied average response times, it is possible to find a “best size” for the thread pool, where (on average) the requests are processed in a reasonable amount of time. However, this use of a single thread pool for a mixed workload tends to be sub-optimal. In particular, this approach disproportionately elongates the response times of requests having shorter execution times.
The reason for this phenomenon is as follows. Constraining an application server's single thread pool is crucial to controlling resource utilization within that application server, as discussed above. However, the single thread pool also tends to become saturated with requests having longer execution times, and thus those requests that have shorter execution times will effectively be starved. Bursts of requests with longer execution times can essentially block requests with shorter execution times from being assigned to a thread from the single constrained thread pool. And even though a particular request may have been processed very quickly by its thread once the thread was assigned from the thread pool, the request may have had to wait a very long time before the thread was assigned. The end user's (or in the more general case, the requester's) perceived response time for such requests may therefore be inordinately long.
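The starvation effect described above can be illustrated with a small deterministic simulation of a single constrained pool serving requests in arrival order. This is a sketch under stated assumptions: the function `simulate_fifo_pool`, the pool size of 4, and the service times (10.0 time units for "long" requests, 0.1 for the "short" request) are hypothetical values chosen for illustration.

```python
import heapq


def simulate_fifo_pool(requests, pool_size):
    """Serve (arrival, service, label) tuples FIFO on a fixed-size pool.

    Returns a list of (label, wait, response) tuples in arrival order.
    """
    free_at = [0.0] * pool_size            # time at which each thread is next free
    heapq.heapify(free_at)
    results = []
    for arrival, service, label in sorted(requests, key=lambda r: r[0]):
        thread_free = heapq.heappop(free_at)
        start = max(arrival, thread_free)  # wait if no thread is free yet
        finish = start + service
        heapq.heappush(free_at, finish)
        results.append((label, start - arrival, finish - arrival))
    return results


# A burst of eight long requests at t=0, then one short request at t=1.
workload = [(0.0, 10.0, "long")] * 8 + [(1.0, 0.1, "short")]
outcome = simulate_fifo_pool(workload, pool_size=4)
short_label, short_wait, short_response = outcome[-1]
```

Here the short request needs only 0.1 time units of processing but waits 19 time units for a thread, because two waves of long requests occupy all four threads ahead of it. Its perceived response time is therefore dominated almost entirely by queuing rather than by execution.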
What is needed are techniques that overcome these problems of the prior art.