A variety of variables, and any changes thereto, affect the performance of individual and networked computer systems, including the various resources of each (e.g., the operating system, hardware, software applications, etc.). For example, a server may have multiple processors that can be allocated to different software applications according to prioritization of the workload. Or, for example, an operating system may be configured with larger or smaller disk caching buffers to efficiently balance various demands on memory allocation. Other examples may include system kernel parameters, disk bandwidth, process scheduling, application-specific parameters, etc.
Instrumentation exists that measures these variables and/or the performance of the computer system or various resources thereof. For example, instrumentation may measure the utilization of various resources (e.g., the central processing unit (CPU), memory, etc.), throughput, application health, response time, etc. This instrumentation is used by performance tools, and, albeit less often, by application software, to monitor the characteristics of the workload and the health of the services enabled by the applications running on the computer system. For example, the Application Response Measurement (ARM) standard may be used to instrument services to provide response time and throughput. An example of application-specific information may be the statistics on cache efficiency internal to a specific database. In addition, probes may simulate or generate service requests on a system, which may be measured to provide performance data. For example, Hewlett-Packard Company's (corporate headquarters, Palo Alto, Calif.) Vantage Point Internet Services makes use of such probes. However, the relationship between the system variables and the performance of the computer system is often indirect and non-deterministic.
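By way of illustration only, the response time and throughput instrumentation described above may be sketched as follows. This is a minimal sketch and does not use the ARM interface; the class `ResponseTimer`, its methods, and the injectable `clock` parameter are hypothetical names introduced here for illustration.

```python
import time


class ResponseTimer:
    """Hypothetical helper that records response times for one service."""

    def __init__(self, clock=time.perf_counter):
        # The clock is injectable so the timer can be exercised deterministically.
        self._clock = clock
        self.durations = []

    def measure(self, handler, *args, **kwargs):
        """Call the service handler and record how long the call took."""
        start = self._clock()
        try:
            return handler(*args, **kwargs)
        finally:
            # Record the elapsed time even if the handler raises.
            self.durations.append(self._clock() - start)

    def mean_response_time(self):
        """Average response time in seconds over all recorded requests."""
        if not self.durations:
            return 0.0
        return sum(self.durations) / len(self.durations)

    def throughput(self, window_seconds):
        """Requests completed per second over an observation window."""
        return len(self.durations) / window_seconds
```

A performance tool could wrap each service request in `measure` and periodically report `mean_response_time()` and `throughput()`. Such measurements describe the behavior of the service but do not, by themselves, identify which system variable caused it.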
In general, such performance tools are oriented towards obtaining performance metrics from the various instrumentation for display and report generation. Often, these performance tools do not recommend any changes to the variables affecting the performance of the computer system. Instead, the system administrator must interpret the results to determine which variables, if any, can be reconfigured. For example, the performance tool may indicate to the administrator that the response time of a service has slowed beyond an acceptable threshold. In response, the administrator (or an automated load balancer) may change the CPU scheduling policy to favor the application providing the service.
Where these performance tools do make recommendations, the recommendations are generated from simple thresholds (e.g., provisioning a specific resource in response to a specific event). Even so, these recommendations are based only on current behavior, and not on a historical analysis. Other factors, and often more than one resource, may account for a slower response time. For example, the application may be accessing a storage device that is bottlenecked (i.e., at capacity with a large queue) because of paging activity by the operating system, which may, in turn, have been caused by a bottleneck on physical memory allocation resulting from another application allocating excessive memory. In such a case, changing the CPU processing priority of the application will not improve the response time of the service. Instead, changes to memory (e.g., partitioning) or storage (e.g., moving paging areas away from application data paths) are required to speed up the response time.
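The limitation of a simple threshold may be illustrated with the following sketch, which is offered only as an example under stated assumptions. The function names, the metric keys, and the numeric limits are all hypothetical and are not taken from any particular performance tool.

```python
def threshold_recommendation(metrics, response_time_limit=2.0):
    """Simple threshold rule that looks only at the current response time."""
    if metrics["response_time_s"] > response_time_limit:
        return "raise CPU priority of the application"
    return None


def first_saturated_resource(metrics, limit=0.9):
    """Walk the chain from the example above: memory, then paging storage, then CPU."""
    for name in ("memory_utilization", "paging_disk_utilization", "cpu_utilization"):
        if metrics.get(name, 0.0) >= limit:
            return name
    return None


# Hypothetical snapshot: slow service, idle CPU, exhausted memory, busy paging disk.
snapshot = {
    "response_time_s": 3.5,
    "cpu_utilization": 0.35,
    "memory_utilization": 0.97,
    "paging_disk_utilization": 0.99,
}

print(threshold_recommendation(snapshot))
print(first_saturated_resource(snapshot))
```

For this snapshot, the threshold rule recommends a CPU priority change even though the CPU is lightly used, while the saturated resource at the head of the chain is memory. The second function is only a fixed-order check used to make the point; it is not a general method of diagnosis.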
Some workload management and load balancing tools, such as Hewlett-Packard's Workload Manager and WebQOS, are capable of adjusting system variables based on performance monitors. However, the changes are coarse (i.e., configuration at the level of a single system resource, such as processor allocation). Furthermore, the changes are not based on trends in historical data, and do not consider the effect of previous changes to these variables.