1. Technical Field
The present invention relates generally to the performance of computer systems and, in particular, to a system and method for automated performance tuning of computer systems and applications in a generic, application-independent manner.
2. Description of Related Art
The complexity of distributed and networked systems has grown tremendously in the past few years. In large part, this can be attributed to the exploitation of client-server architectures and other paradigms of distributed computing. Such computer systems and software (operating systems, middleware and applications) have become so complex that it is difficult to configure them for optimal performance.
Complex applications such as databases (e.g., ORACLE, DB2), message queuing systems (e.g., MQSERIES) and application servers (e.g., WEBSPHERE, DOMINO) have tens or even hundreds of parameters that control their configuration, behavior and performance (DOMINO/DB2 admin guide). The behavior of such a complex system is also governed by the dynamic loads that are placed on the system by the system users. It takes considerable expertise to set individual parameters, and it is even more challenging to understand the interaction between parameters and the resultant effect on the behavior and performance of the system. Administration is made more difficult still by the fact that such systems can be very dynamic and may therefore require constant monitoring and adjustment of their parameters, for instance when workloads change over time.
Thus, the total cost of ownership (TCO) of the particular system may increase not only due to the cost of hiring expert help, but also due to potentially lost revenue if the system is not configured properly. To reduce the TCO and the burden on system administrators, many software vendors are now turning to software agents to help manage the complexity of administering these complex systems.
Software agents are very well suited to the task of controlling such systems. Prior expert knowledge could be incorporated once and for all in the agent, thereby reducing the need for expertise by the end-user. In addition, the software agent can be more closely tied to the system and can perform even closer monitoring and updating than humanly possible. Recent advances in the fields of Control Theory, Optimization, Operations Research and Artificial Intelligence provide a wealth of algorithms and techniques to dynamically tune the behavior of complex systems, even in the absence of much expert knowledge.
A variety of target-specific or “customized automated tuning systems” (CATS) have been developed. Examples include systems by: (1) Abdelzaher et al., as described in “End-host Architecture for QoS-Adaptive Communication,” IEEE Real-time Technology and Applications Symposium, Denver, Colo., June 1998, the disclosure of which is incorporated by reference herein; and (2) Aman et al., as described in “Adaptive algorithms for managing a distributed data processing workload,” IBM Systems Journal, Vol. 36, No. 2, 1997, the disclosure of which is incorporated by reference herein. The system of Abdelzaher et al. controls quality of service for the delivery of multimedia using task priorities in a communications subsystem. The system of Aman et al. provides a means by which administrators specify response time and throughput goals for MVS (Multiple Virtual Storage) systems, and uses MVS-specific mechanisms to achieve these goals.
The concept of “tuning” seeks to improve service levels by adjusting existing resource allocations. Doing so requires access to metrics and to the controls that determine resource allocations. In general, there are three classes of metrics, as follows: (1) “configuration metrics” that describe performance-related features of the target that are not changed by adjusting tuning controls, such as, for example, line speeds, processor speeds, and memory sizes; (2) “workload metrics” that characterize the load on the target, such as, for example, arrival rates and service times; and (3) “service level metrics” that characterize the performance delivered, such as, for example, response times, queue lengths, and throughputs.
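By way of illustration only, the three classes of metrics described above may be represented as a simple data structure. The following is a minimal sketch in Python; the names MetricClass, Metric and group_by_class, as well as the sample values and units, are hypothetical and are not drawn from any particular target system.

```python
from dataclasses import dataclass
from enum import Enum


class MetricClass(Enum):
    """The three classes of metrics (names are illustrative)."""
    CONFIGURATION = "configuration"  # not changed by adjusting tuning controls
    WORKLOAD = "workload"            # characterizes the load on the target
    SERVICE_LEVEL = "service_level"  # characterizes the performance delivered


@dataclass(frozen=True)
class Metric:
    name: str
    metric_class: MetricClass
    value: float
    unit: str


def group_by_class(metrics):
    """Return a dict mapping each metric class to the names of its metrics."""
    grouped = {c: [] for c in MetricClass}
    for m in metrics:
        grouped[m.metric_class].append(m.name)
    return grouped


# Hypothetical sample values, for illustration only.
sample = [
    Metric("memory_size", MetricClass.CONFIGURATION, 4096.0, "MB"),
    Metric("arrival_rate", MetricClass.WORKLOAD, 120.0, "requests/s"),
    Metric("response_time", MetricClass.SERVICE_LEVEL, 0.35, "s"),
    Metric("queue_length", MetricClass.SERVICE_LEVEL, 7.0, "requests"),
]
```

Under these assumptions, a tuning system could treat service level metrics as the quantities to be improved, workload metrics as external inputs, and configuration metrics as fixed properties of the target.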
“Tuning controls” are parameters that adjust target resource allocations and hence change the target's performance characteristics. We give a few examples. LOTUS NOTES, an e-mail system and application framework, has a large set of controls. Among these are: NSF_BufferPoolSize for managing memory, Server_MaxSessions for controlling admission to the server, and Server_SessionTimeout for regulating the number of idle users. In Web-based applications that support differentiated services, there are tuning controls that determine routing fractions by service class and server type. MQSERIES, a reliable transport mechanism in distributed systems, has controls for allocating storage and assigning priorities. Database products (e.g., IBM's DB2) expose controls for sort indices and for allocating buffer pool sizes.
CATS require that metrics and tuning controls be identified in advance so that mechanisms for their interpretation and adjustment can be incorporated into the automated tuning system. Thus, CATS construction and maintenance still require considerable expertise. With the advent of the Internet, software systems and their components evolve rapidly, as do the workloads that they process. Thus, it may well be that automated tuning systems must be updated at a rate approaching that at which tuning occurs. Under such circumstances, the value of automated tuning is severely diminished.
The prior art related to automated tuning has mostly focused on developing specific algorithms and architectures that are very tightly coupled to the target system (i.e., the system being controlled). In such cases, the algorithms cannot be easily reapplied to other systems, nor can other control schemes be inserted into the proposed architecture.
Existing prior art for target-independent automated tuning does not consider architectural support for access to the metrics and controls. Realizing generic, automated tuning requires well-defined interfaces so that a generic automated tuning system can access the data required from the target. Previous work has ignored these considerations.
The search for appropriate settings of tuning controls is facilitated by exposing information about the semantics of metrics and the operation of tuning controls. In particular, it is helpful for the target to place metrics into the categories of configuration, workload, and service level. These designations can aid the construction of a generic system model. Further, there should be a way to express the directional effects of tuning control adjustments since having such knowledge reduces the complexity of the search for appropriate settings of tuning controls. Past work has not focused on these concerns.
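To illustrate why knowledge of directional effects reduces the complexity of the search, consider the following minimal sketch in Python. It is a toy hill-climbing search over a single tuning control and is not the method of any system described herein. The function tune, its parameters, and the response_time model are hypothetical; the direction argument is assumed to be +1 when increasing the control raises the service level metric, -1 when increasing the control lowers it, and None when the effect is unknown. The metric is assumed to be one for which lower values are better.

```python
def tune(measure, start, step, lo, hi, direction=None):
    """Hill-climb one control to minimize a metric.

    Returns (best_setting, number_of_measurements).
    """
    evaluations = 0

    def probe(setting):
        nonlocal evaluations
        evaluations += 1
        return measure(setting)

    current, best = start, probe(start)
    if direction is None:
        # Effect unknown: both neighbors must be tried.
        moves = (-step, +step)
    else:
        # Effect declared: move only the way that lowers the metric.
        moves = (-direction * step,)

    improved = True
    while improved:
        improved = False
        for delta in moves:
            candidate = current + delta
            if lo <= candidate <= hi:
                value = probe(candidate)
                if value < best:
                    current, best = candidate, value
                    improved = True
                    break
    return current, evaluations


def response_time(buffer_mb):
    """Hypothetical model: response time falls as the buffer pool grows."""
    return 1000.0 / buffer_mb


informed = tune(response_time, 100, 100, 100, 500, direction=-1)
uninformed = tune(response_time, 100, 100, 100, 500, direction=None)
```

In this toy setting both searches arrive at the same setting, but the search that is told the directional effect needs fewer measurements of the target because it never probes in the direction known to degrade the metric.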