The present invention relates generally to the performance of networked systems and, more particularly, to automated techniques for improving the performance of networked systems.
There has been a tremendous growth in the complexity of distributed and networked systems in the past few years. In large part, this can be attributed to the exploitation of client-server architectures and other paradigms of distributed computing.
Our interest is in automated techniques for improving the performance of heterogeneous distributed systems. Such systems comprise a variety of components, including computing elements, operating systems within the computing elements, middleware that links together computing elements, and applications that span computing elements. Herein, we use the term "target" to refer to the system, subsystem, or element that is being manipulated to improve its performance.
An initial question may be raised, such as "why not solve performance problems by using more hardware, such as faster processors and more memory?" Sometimes this is effective, at least up to the point where cost becomes a major issue. However, applying this approach in practice requires identifying resource bottlenecks, which requires some thought. Also, more hardware typically does not resolve logical bottlenecks, such as those due to locking or improper settings of task priorities.
The concept of "tuning" seeks to improve service levels by adjusting existing resource allocations. Doing so requires access to metrics and to the controls that determine resource allocations. In general, there are three classes of metrics: (1) "configuration metrics" that describe performance-related features of the target that are not changed by adjusting tuning controls, such as line speeds, processor speeds, and memory sizes; (2) "workload metrics" that characterize the load on the target, such as arrival rates and service times; and (3) "service level metrics" that characterize the performance delivered, such as response times, queue lengths, and throughputs.
"Tuning controls" are parameters that adjust target resource allocations and hence change the target's performance characteristics. We give a few examples. Lotus Notes, an e-mail system and application framework, has a large set of controls. Among these are: NSF_BufferPoolSize for managing memory, Server_MaxSessions for controlling admission to the server, and Server_SessionTimeout for regulating the number of idle users. In Web-based applications that support differentiated services, there are tuning controls that determine routing fractions by service class and server type. MQ Series, a reliable transport mechanism in distributed systems, has controls for storage allocations and assigning priorities. Database products (e.g., IBM's DB/2) expose controls for sort indices and allocating buffer pool sizes.
To determine the effect of tuning adjustments, there must be a model that relates workload levels and the settings of tuning controls to the service levels that will be achieved. We refer to this as the target's system model, or just "system model." It is difficult to acquire and maintain a system model, especially since application software can change dramatically from release to release (even between patch levels). Thus, system models are usually informal and imprecise.
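As an illustration only, the three classes of metrics described above can be represented as a small tagged data structure. The names `MetricClass`, `Metric`, and `group_by_class`, as well as the sample values and units, are assumptions made for this sketch and do not appear in the original text.

```python
from dataclasses import dataclass
from enum import Enum


class MetricClass(Enum):
    """The three metric classes named in the text."""
    CONFIGURATION = "configuration"   # not changed by tuning (e.g., processor speed)
    WORKLOAD = "workload"             # load on the target (e.g., arrival rate)
    SERVICE_LEVEL = "service_level"   # performance delivered (e.g., response time)


@dataclass(frozen=True)
class Metric:
    """A single named measurement tagged with its class (hypothetical type)."""
    name: str
    metric_class: MetricClass
    value: float
    unit: str


def group_by_class(metrics):
    """Return a dict mapping each MetricClass to the metrics of that class."""
    grouped = {cls: [] for cls in MetricClass}
    for metric in metrics:
        grouped[metric.metric_class].append(metric)
    return grouped


# Illustrative values only; they are not taken from any real system.
sample = [
    Metric("processor_speed", MetricClass.CONFIGURATION, 2.4, "GHz"),
    Metric("arrival_rate", MetricClass.WORKLOAD, 120.0, "requests/s"),
    Metric("response_time", MetricClass.SERVICE_LEVEL, 0.35, "s"),
]
```

Calling `group_by_class(sample)` separates the inputs a model would treat as fixed (configuration), as external drivers (workload), and as outcomes to be improved (service level).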
In the existing art, tuning typically involves the following steps: (1) collect data; (2) use the system model to determine how tuning controls should be adjusted; and (3) return to step (1).
There are many challenges here. First, as noted previously, acquiring and maintaining the system model is difficult. Second, the controls are complex and often impact service levels in nonlinear ways. This makes it challenging to select the tuning controls to adjust as well as to determine what the settings of these controls should be. Third, the above scenario only considers current workloads. It may well be that by the time the tuning controls are adjusted, the workload will have changed so that new adjustments are necessary.
Because the expertise required for manual tuning is scarce, many have pursued an automated approach. A variety of target-specific or "customized automated tuning systems" (CATS) have been developed. Examples include systems by: (1) Abdelzaher and Shin, as described in "End-host Architecture for QoS-Adaptive Communication," IEEE Real-Time Technology and Applications Symposium, Denver, Colo., June 1998, the disclosure of which is incorporated by reference herein, who control quality of service for the delivery of multimedia using task priorities in the communications subsystem; and (2) Aman et al., as described in "Adaptive algorithms for managing a distributed data processing workload," IBM Systems Journal, Vol. 36, No. 2, 1997, the disclosure of which is incorporated by reference herein, who provide a means by which administrators specify response time and throughput goals to achieve in MVS (Multiple Virtual Storage) systems using MVS-specific mechanisms to achieve these goals.
CATS require that metrics and tuning controls be identified in advance so that mechanisms for their interpretation and adjustment can be incorporated into the automated tuning system. Thus, CATS construction and maintenance still require considerable expertise. With the advent of the Internet, software systems and their components evolve rapidly, as do the workloads that they process. Thus, it may well be that automated tuning systems must be updated at a rate approaching that at which tuning occurs. Under such circumstances, the value of automated tuning is severely diminished.
Since customized automated tuning systems are difficult to build and maintain, it would be highly desirable to have a generic automated tuning system. Thus, instead of requiring experts to incorporate detailed knowledge of the target, such a generic automated tuning system may learn the target's performance characteristics. This may include having such a generic automated tuning system exploit prior knowledge of the target system, when such knowledge is available, reliable, and durable. As will be explained herein, the present invention provides such a generic automated tuning system and methodology.
As will be further explained in accordance with the present invention, a starting point in building such a generic automated tuning system is to construct a generic system model. Prior art in learning system models has largely focused on neural network approaches such as those in U.S. Pat. Nos. 5,893,905 to Main et al.; 5,461,699 to Arbabi and Fischthal; and 5,444,820 to Tzes and Tsotras, the disclosures of which are incorporated by reference herein. More related to objectives of the present invention is the work in U.S. Pat. No. 5,745,652 to Bigus, the disclosure of which is incorporated by reference herein, that describes a target-independent approach to automated tuning. This is accomplished by having a neural network that is trained off-line to learn the system model. In on-line operation, the system model is used in combination with a second neural network, the controller, to learn control actions.
The foregoing address two issues associated with constructing a generic automated tuning system, as is provided in accordance with the present invention: the generic system model and tuning control estimation. However, as will be explained, many other considerations are necessary as well.
First, existing art for target-independent automated tuning does not consider architectural support for access to the metrics and controls. Realizing generic, automated tuning requires well defined interfaces so that a generic automated tuning system can access the data required from the target. Previous work has ignored these considerations.
Second, the search for appropriate settings of tuning controls is facilitated by exposing information about the semantics of metrics and the operation of tuning controls. In particular, it is helpful for the target to place metrics into the categories of configuration, workload, and service level. These designations can aid the construction of a generic system model. Further, there should be a way to express the directional effects of tuning control adjustments since having such knowledge reduces the complexity of the search for appropriate settings of tuning controls. Past work has not focused on these concerns.
Third, the system model can be constructed using various learning approaches that enable different control algorithms to be employed. Prior work on automated tuning has largely focused on learning the system model using neural networks. It has not addressed other approaches that learn whether the system model falls into known classes of system models and then employ known effective controllers for such classes.
Fourth, one motivation for generic, adaptive tuning is that the system itself may change over time. This goes beyond the considerations discussed in Bigus in that we must detect changes and adjust the system model as well as retrain the controller. Prior art has not addressed these issues.
Lastly, it is well known in control theory that delays in the feedback loop in combination with "noise" (e.g., variable workloads) severely limit the controllability of a system. These limitations can, in part, be overcome in automated tuning if there are reasonably accurate models for forecasting load. Prior work such as in Hellerstein, Zhang, and Shahabuddin, "An Approach to Predictive Detection for Service Management," Symposium on Integrated Network Management, 1999, the disclosure of which is incorporated by reference herein, suggests that forecasting workloads is an achievable task, at least in some networked environments. However, to date, there has been no attempt to incorporate forecasting models into an automated tuning system.
The present invention provides systems and methodologies for adaptively tuning heterogeneous distributed systems and their components. In one aspect of the present invention, an architecture is provided for a generic automated tuning agent (referred to herein as a GATA), as well as the GATA enablements required by the target system. In another aspect of the present invention, a methodology is provided for incorporating prior knowledge of the target system, as well as for learning verifiable properties about the target system, that can aid in the construction of a generic system model. In yet another aspect of the present invention, an automated tuning process is provided that: (a) takes into account changes in the target's system model; and (b) exploits workload forecasting models.
The architecture of the present invention includes components for metric access, tuning control manipulation, administrative access, and a generic controller that determines the settings of tuning controls. In a preferred embodiment, the generic controller has components for change-point detection, system model construction and execution, and workload model construction and execution. The enabled target system, which we refer to as the controlled target, provides interfaces for access to workload, service level, and configuration metrics, as well as a mechanism to adjust tuning controls.
In accordance with the present invention, the target system is enabled by providing interfaces to metrics and tuning controls. Both interfaces expose metadata. The metrics metadata provides, among other things, a means to classify each metric as a configuration, workload, or service level indicator. For the tuning controls, the interface may indicate how service classes are affected by directional changes in specific controls. It is to be understood that these descriptions are not prerequisites to the operation of generic automated tuning. However, when present, these descriptions provide a means for generic automated tuning to operate more efficiently.
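One way to picture the optional directional information on tuning controls is sketched below. This is a minimal illustration under stated assumptions, not the interface of the invention: the type `TuningControl`, its field `directional_effects`, the helper `search_directions`, and the control and service class names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TuningControl:
    """A tuning control with bounds and optional directional metadata (hypothetical)."""
    name: str
    value: float
    minimum: float
    maximum: float
    # Optional metadata: service class -> +1 if raising the control is expected
    # to improve that class, -1 if lowering it is. May be left empty.
    directional_effects: Dict[str, int] = field(default_factory=dict)

    def adjust(self, new_value: float) -> float:
        """Set the control, clamped to its allowed range, and return the result."""
        self.value = min(self.maximum, max(self.minimum, new_value))
        return self.value


def search_directions(control: TuningControl, service_class: str) -> List[int]:
    """Return the directions a tuner would need to try for a service class.

    With directional metadata only one direction is explored; without it,
    both directions must be searched.
    """
    hint = control.directional_effects.get(service_class)
    return [hint] if hint is not None else [+1, -1]
```

In this sketch the metadata is optional, mirroring the statement above that such descriptions are not prerequisites: a control without a hint for a given service class is still usable, but the search covers both directions.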
In addition, the present invention provides a method and a system for detecting changes in the target's system model and for incorporating workload forecasting information. In one embodiment, the method comprises the steps of: (1) waiting for data; (2) if no system model is present or if a system model change-point has been detected, invalidating the current system model, and constructing a new system model or switching to a previously constructed system model, and if insufficient data is available to construct a new model, going to step (1); (3) if no workload forecast model is present or a workload model change-point has been detected, invalidating the current workload forecast model and constructing a new one, and if insufficient data is available to construct a new model, going to step (1); (4) selecting new settings of tuning controls by using the system model (at least implicitly) to evaluate settings of tuning controls that improve forecast values of service level metrics based on forecast values of workload metrics; and (5) going to step (1).
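The steps of this method can be sketched in code as follows. This is a toy illustration under strong assumptions, none of which come from the original text: a single tuning control, a system model of the form service level = k * workload / control, a workload forecast equal to the mean of observed workloads, change-point detection by a relative-error threshold, and strictly positive workloads and control settings. All names (`GenericTuner`, `fit_system_model`, and so on) are hypothetical, and switching to a previously constructed system model is omitted.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# (workload, control setting, observed service level), all assumed positive
Observation = Tuple[float, float, float]


def fit_system_model(data: Sequence[Observation]) -> Callable[[float, float], float]:
    """Toy system model (assumption): service level = k * workload / control."""
    k = sum(level * control / workload for workload, control, level in data) / len(data)
    return lambda workload, control: k * workload / control


def fit_workload_model(workloads: Sequence[float]) -> Callable[[], float]:
    """Toy forecast model (assumption): the next workload equals the observed mean."""
    mean = sum(workloads) / len(workloads)
    return lambda: mean


def relative_error(predicted: float, actual: float) -> float:
    return abs(predicted - actual) / max(abs(actual), 1e-9)


class GenericTuner:
    def __init__(self, candidates, goal, min_samples=3, tolerance=0.5):
        self.candidates = sorted(candidates)  # allowed control settings
        self.goal = goal                      # target service level (lower is better)
        self.min_samples = min_samples        # data needed to build a model
        self.tolerance = tolerance            # relative error that signals a change-point
        self.system_data: List[Observation] = []
        self.workload_data: List[float] = []
        self.system_model = None
        self.workload_model = None

    def step(self, obs: Observation) -> Optional[float]:
        """Process one observation (step 1); return a new setting or None."""
        workload, control, level = obs
        # Change-point checks: invalidate a model whose prediction is far off.
        if self.system_model is not None and relative_error(
                self.system_model(workload, control), level) > self.tolerance:
            self.system_model = None
            self.system_data.clear()
        if self.workload_model is not None and relative_error(
                self.workload_model(), workload) > self.tolerance:
            self.workload_model = None
            self.workload_data.clear()
        self.system_data.append(obs)
        self.workload_data.append(workload)
        # Step 2: (re)build the system model, or wait for more data.
        if self.system_model is None:
            if len(self.system_data) < self.min_samples:
                return None
            self.system_model = fit_system_model(self.system_data)
        # Step 3: (re)build the workload forecast model, or wait for more data.
        if self.workload_model is None:
            if len(self.workload_data) < self.min_samples:
                return None
            self.workload_model = fit_workload_model(self.workload_data)
        # Step 4: pick the smallest setting whose forecast service level meets the goal.
        forecast = self.workload_model()
        for candidate in self.candidates:
            if self.system_model(forecast, candidate) <= self.goal:
                return candidate
        return self.candidates[-1]
```

A caller would invoke `step` each time new data arrives (steps 1 and 5) and apply any returned setting to the target. Choosing the smallest setting that meets the goal is a design choice of this sketch; other selection rules fit the same loop.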
The present invention provides numerous benefits not present in the existing art. Some examples of such benefits are provided. First, although existing art describes attempts to provide means for target-independent control, the architectural considerations for generic automated tuning have not been addressed, especially considerations such as access to metrics and tuning controls. Without these capabilities, generic automated tuning systems cannot be built.
Second, existing art in the area of generic control does not provide a means to express prior knowledge of the target, especially its metrics and tuning controls. This knowledge is invaluable in terms of achieving service levels that are closer to the optimal in a shorter time frame.
Third, prior work has focused on specific approaches, such as neural networks, to generic automated tuning. The present invention is designed to support a spectrum of approaches to generic automated tuning, which ranges from learning an effective controller for the target system (assuming no knowledge about the target) through employing a known effective controller based on verifiable properties about the target system.
Fourth, existing art for automated tuning does not detect changes in the system model nor does the existing art indicate how to update the system model if it changes. Such capabilities are essential to adapt in dynamic environments.
Lastly, prior work in automated tuning does not incorporate workload forecast models into the automated tuning system. Incorporating such models can greatly improve the quality of service delivered, as well as reduce the variance of the service levels achieved by anticipating future target loadings, a consideration that is especially important when there are delays in the feedback loop.
In addition, we note that the present invention may be applied to distributed systems in the following way. First, a target may in fact be distributed. So, metrics may be extracted from multiple components and tuning controls on multiple components may be manipulated. Second, the target may be another GATA. This provides a means for hierarchical control.