The present invention relates to capacity management in a computer system such as a network or server and, more particularly, to a method and system for analyzing the performance of components of a computer system and applying rules to the results to identify bottlenecks as well as potential (or latent) bottlenecks resulting from improving system performance to remove other bottlenecks and to make recommendations for ameliorating the actual and latent bottlenecks.
Managing a computer system which includes a plurality of devices such as networks or servers is of special interest to data processing (or information technology) personnel. The computer systems typically include a plurality of diverse devices such as memory, disks, local area network (LAN) adapters and central processing units (CPUs) which interact in various interrelated ways when a variety of data processing applications are used in the computer system. As the systems get larger and more complex, these interactions and the relationships between the devices become hard to define, model or predict, and hence the capacity of the system becomes difficult to manage. These systems are quite expensive to install, and changes to them involve a significant investment. While an investment which improves the performance of the computer system is desirable, some investments in improvements to the computer system would not be worthwhile, since the performance of the overall system would not improve.
Frequently, the performance of the computer system or network is less than it could be because one or more of the components is not appropriate for the application loading of the computer system (or a network or server). It is desirable to know what changes to the computer system would be worthwhile in improving the capacity of the computer system, and to make those changes while avoiding changes which would not have a significant benefit to the performance of the computer system.
One way to address the proper components for the loading of the system is to provide a model of the load and simulate the system to provide an optimum (or desirable) combination of elements. While there are numerous simulation techniques, they all rely on approximations of the loading and the components, and, as the systems become larger and more complex and the loading becomes more complex, the simulations are approximations whose accuracy and reliability are subject to significant doubt.
Another approach to predicting performance of a complex computer system involves active monitors, that is, adding a known load to an existing system and measuring the resulting output and effect of the load. This requires that the system be available for experimentation and that the added load operate in a known manner, both of which are assumptions that may hold in some instances but not in others. For example, BlueCurve Dynameasure by BlueCurve, Inc. intentionally induces an artificial workload to determine performance characteristics of a computer system. Such an active monitor is disruptive to the network, in that it interferes, at least to some extent, with the ongoing work of the computer system. Moreover, the artificial load on the network may not accurately reflect the real-world work of the computer system, either now or in the future.
Another way to manage the capacity is described in the Performance Management Patent and involves sampling of indicators of system activity. These indicators can be displayed as described in the Performance Display Patent, if desired. In any event, the data must be interpreted by a professional who has experience in looking at the results and interpreting the data to make recommendations. Unfortunately, such experts are in demand and in short supply, so it is unlikely that a network expert would be available at any given time to analyze the results and to make suggestions for improvement, and keeping a local expert on hand for this purpose is an inefficient use of his time and expertise.
Accordingly, the prior art systems for capacity management have undesirable limitations and disadvantages.
The present invention overcomes the limitations and disadvantages of the prior art systems by providing an improved capacity management system which is easy to use and which provides an indication of the bottleneck(s) in the system, in an ordered list, along with recommendations on how to improve the computer system, based on the use of passive monitors.
The improved capacity management system uses data which is typically available from hardware and software and uses software tools which are typically available. Thus, it is not necessary to find some obscure data on the computer systems or to add additional overhead (such as additional hardware or new software) to the computer system in order to obtain the necessary data to make recommendations on improving the computer system.
The present system has the advantage that it does not require a simulation program and it does not require that the user find or create unusual sets of data (such as the artificial loads of an active monitoring system like BlueCurve Dynameasure) which other prior art capacity management systems may require for analysis.
The present system avoids the need for consulting with an expert in the field of analyzing computer system performance to look at the various data which is available on the performance of the computer system and make judgments of whether the system has bottlenecks and whether changes to the system could make a significant improvement to its performance.
The present invention has the advantageous effect that the recommendations can be sorted according to rules, such as addressing the most severe problem first, and can be combined so that the same problem is reported only once.
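By way of illustration only, the sorting and combining of recommendations might be sketched as follows. The Recommendation record, its fields, the numeric severity scale, and the rule that duplicates share the same component and problem are all assumptions made for this sketch and are not taken from the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recommendation:
    # All fields are hypothetical; the disclosure does not define a record layout.
    component: str   # e.g. "CPU", "disk"
    problem: str     # short description of the bottleneck
    severity: float  # higher means more severe (assumed scale)


def order_recommendations(recs):
    """Report each (component, problem) pair once, most severe first."""
    merged = {}
    for rec in recs:
        key = (rec.component, rec.problem)
        # Keep only the most severe report of a repeated problem.
        if key not in merged or rec.severity > merged[key].severity:
            merged[key] = rec
    return sorted(merged.values(), key=lambda r: r.severity, reverse=True)


if __name__ == "__main__":
    reports = [
        Recommendation("disk", "high utilization", 0.6),
        Recommendation("CPU", "high utilization", 0.9),
        Recommendation("disk", "high utilization", 0.8),
    ]
    for rec in order_recommendations(reports):
        print(rec.component, rec.problem, rec.severity)
```

Other sorting rules could be substituted by changing the sort key; severity-first ordering is simply the rule named above.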
The computer system of the present invention has the benefit that interactions between the performance of different parts of the system are taken into consideration during the analysis and the recommendations. The present invention also takes into consideration that a system may be operating at less than its optimal performance due to a bottleneck, and, when this is remedied, another bottleneck, referred to as a latent bottleneck, may occur. This is because a system which has a bottleneck condition is operating at less than its full speed. When the bottleneck is removed, the entire system will operate faster, so that those performance factors which are already above a second, lower threshold are likely to exceed the higher threshold once the first bottleneck is remedied.
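The two-threshold idea described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the use of a single utilization figure per component, and the threshold values 0.80 and 0.60 are hypothetical and do not come from the disclosure.

```python
def classify_components(utilization, high=0.80, low=0.60):
    """Label each component 'realized', 'latent', or 'ok'.

    utilization maps a component name to a utilization fraction (0.0 to 1.0).
    A component at or above `high` is treated as a realized bottleneck.
    A component between `low` and `high` is treated as a latent bottleneck,
    but only when some other component is already a realized bottleneck.
    Both threshold defaults are assumed values for illustration.
    """
    if not low < high:
        raise ValueError("low threshold must be below high threshold")
    realized = {name for name, u in utilization.items() if u >= high}
    labels = {}
    for name, u in utilization.items():
        if name in realized:
            labels[name] = "realized"
        elif realized and u >= low:
            labels[name] = "latent"
        else:
            labels[name] = "ok"
    return labels


if __name__ == "__main__":
    print(classify_components({"cpu": 0.92, "disk": 0.70, "memory": 0.30}))
```

In this sketch the disk is flagged as latent because the CPU is saturated; with no realized bottleneck present, the same disk reading would simply be reported as acceptable.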
The present invention also has the advantage that certain periods of performance (like evenings and/or weekends) may be disregarded, if desired, in order to avoid the impact on the analysis of periods not particularly relevant to the users of the system. That is, the periods of greatest concern to the users can be focused on, either completely or with appropriate emphasis, and periods of lesser importance can be ignored or considered less important.
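One simple way to disregard periods such as evenings and weekends is to filter the monitor samples by timestamp before analysis. The sketch below assumes samples are (timestamp, value) pairs and that the period of interest is Monday through Friday, 08:00 to 18:00; both the data layout and the hours are assumptions for illustration.

```python
from datetime import datetime


def in_period_of_interest(ts, start_hour=8, end_hour=18):
    """True for weekday timestamps within [start_hour, end_hour)."""
    # datetime.weekday() returns 0 for Monday through 6 for Sunday.
    return ts.weekday() < 5 and start_hour <= ts.hour < end_hour


def relevant_samples(samples, start_hour=8, end_hour=18):
    """Keep only the (timestamp, value) samples inside the period of interest."""
    return [
        (ts, value)
        for ts, value in samples
        if in_period_of_interest(ts, start_hour, end_hour)
    ]


if __name__ == "__main__":
    samples = [
        (datetime(2024, 1, 1, 10, 0), 0.75),  # Monday morning: kept
        (datetime(2024, 1, 1, 22, 0), 0.10),  # Monday evening: dropped
        (datetime(2024, 1, 6, 10, 0), 0.05),  # Saturday: dropped
    ]
    print(relevant_samples(samples))
```

Giving less important periods a reduced weight, rather than dropping them entirely, would be a straightforward variation on the same filter.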
The disclosed computer system also has the capability of averaging system performance over a period of time to prevent peak periods of short duration from unduly influencing the data and the conclusions about the performance of the system. As with other monitors, the duration of the averaging may be adjusted by the user, if desired.
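The effect of averaging can be illustrated with a simple moving average, which is one possible averaging scheme and is assumed here only for the sake of the example; the window length stands in for the user-adjustable averaging duration.

```python
def moving_average(values, window):
    """Return the averages of each run of `window` consecutive values."""
    if window < 1:
        raise ValueError("window must be at least 1")
    averages = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            # Drop the value that has just left the window.
            running -= values[i - window]
        if i >= window - 1:
            averages.append(running / window)
    return averages


if __name__ == "__main__":
    # A single short spike to 100 among readings of 10 (percent utilization).
    readings = [10, 10, 100, 10, 10]
    print(moving_average(readings, 5))
```

With a five-sample window, the single spike to 100 is smoothed to an average of 28, so a short peak of this kind would not by itself push the averaged figure over a high threshold.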
The present invention has the further advantage that the results can be made available over an Internet or intranet using hypertext markup language (HTML) format so that the results can be monitored from a remote site. The use of information in HTML format allows additional information (such as backup information and graphs, additional details, or a source of further information, such as an expert) to be made available by hot-links as well.
One further advantage of the present invention is that additional detail on the results can be added, such as warnings as to the strength of the recommendations and the confidence in the recommendations. If certain monitors are not present or have data only for a limited time, the results may be less reliable than if the same information were available over a longer period of time, and the system of the present invention has the advantage of providing information on the quality of the data on which the recommendations are based.
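As a rough sketch of how a data-quality note might accompany a recommendation, the function below compares the number of samples actually collected from a monitor against the number expected for the analysis period. The function name, the 0.75 coverage cutoff, and the wording of the notes are all hypothetical and are not taken from the disclosure.

```python
def data_quality_note(samples_collected, samples_expected, min_coverage=0.75):
    """Return a short note describing how much monitor data backs a result."""
    if samples_expected <= 0:
        raise ValueError("samples_expected must be positive")
    if samples_collected == 0:
        return "no data: monitor absent, recommendation not supported"
    coverage = samples_collected / samples_expected
    if coverage < min_coverage:
        # min_coverage is an assumed cutoff that a user could override.
        return "limited data: treat recommendation with caution"
    return "sufficient data"


if __name__ == "__main__":
    print(data_quality_note(30, 100))
```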
The present invention also has the advantage that the indicators are programmable and the definition of a bottleneck may be changed by the user. In this way, the user has his choice of a predetermined definition of a bottleneck or the use of his own customized version of a bottleneck. The present invention also includes preset parameters which define reliable data, but, again, the user can override these parameters, if desired, to customize his system.
The invention is an automated method of detecting and diagnosing computer system latent bottlenecks utilizing passive monitoring techniques. A latent bottleneck is a bottleneck that is suppressed by another bottleneck. Typically when the realized bottleneck is alleviated, the latent bottleneck will become realized.
This invention builds on the Performance Data Patent and the System Recommendation Patent. It has the advantages listed in these disclosures, plus the following. It tells the user what new bottleneck to expect after the current one is remedied. It is common for a system administrator to successfully fix a bottleneck, only to discover that a second bottleneck was waiting behind the first one and must also be fixed immediately. The user who knows about the second bottleneck ahead of time can be prepared to fix both at once. Detecting latent bottlenecks improves the quality of recommendations for alleviating realized bottlenecks because these recommendations are more likely to yield appreciable gains in system performance. It detects and diagnoses the latent bottlenecks using the same mechanism as is used for the realized bottlenecks. No additional input is required since it uses the same monitor types, diagnoses, and report. The output has the same format and features as the output for the realized bottlenecks. It works with forecasted data, such as that provided by the Performance Prediction Patent, as well as with real data, so that latent bottlenecks can be forecasted.
Other objects and advantages of the present invention will be apparent to those skilled in the relevant arts in view of the following description of the drawings, taken together with the accompanying drawings and the appended claims.