1. Field of the Invention
The present invention is generally related to general purpose, stored program, digital computers and more particularly relates to an automatic, resource efficient means for monitoring the performance of various portions of a computer system.
2. Description of the Prior Art
The term “performance monitoring” refers to the process of monitoring the performance of various system components within a computer system while the computer system is operating under normal operating conditions. Performance monitoring is a key factor in the operation and maintenance of many of today's complex computer systems.
In the past several decades, the demand on computer systems has steadily increased. Today's software packages require much more processing power and storage capacity than those produced just a few years ago. In addition, many more people are using computers to do tasks that were traditionally done using other means.
Many factors may reduce the performance of a computer system below its optimum. First, there may be a bottleneck at the input/output (I/O) interface, causing the central processing unit (CPU) to sit idle for a substantial portion of the time while waiting for data. Even modest relief of a key performance constraint can greatly increase the throughput of a system. In the case of I/O, there may not be sufficient channels or disks to provide adequate response time. Second, there may be an insufficient number of processor cycles available to accomplish the program workload awaiting execution. In an interactive or high volume transaction system, such as an airline reservation system or a bank transaction system with thousands of terminals and ATM stations, work is then held up at the human interface. This is known as latent workload and can result in unacceptable levels of service. Finally, there may not be enough internal memory within the computer system to store all of the computer programs and data that are to be simultaneously executed and used by the data processing system. This can result in paging. Paging occurs when internal memory limitations require a program's data to be loaded from, and unloaded to, an external storage device each time a process becomes active. Paging data in and out of external storage can greatly increase the time required to complete a given process, resulting in unacceptable levels of performance.
The above examples are given only to illustrate the necessity for performance monitoring techniques within a computer system and are not intended as an exhaustive list. It is recognized that many other performance inhibitors exist in modern computer systems and that many of them may be detected by using performance monitoring techniques. However, the basic metrics used in determining performance and levels of service in modern computer systems are input/output service time, processor utilization, and memory utilization.
Performance monitoring of today's computer systems is typically provided by using off-the-shelf software packages. Examples of such off-the-shelf performance monitoring software packages include the Viewpoint program available from Datametrics, the ALICE module of the SYSTAR products, and Online Activity Monitor (OSAM)/CMF Baseline available from TeamQuest. These software packages are executed on a particular computer or computer network and generate performance data based on a number of preselected factors. One such method is discussed in “Getting Started in 1100/2200 Performance Monitoring”, by George Gray, UNISPHERE Magazine, November 1993. All of the performance monitoring packages listed above use the Software Instrumentation Package (SIP), available from Unisys Corporation, for data collection.
These off-the-shelf software packages may prove useful for some users, but they are not an ideal solution for others. Problems that exist with these software packages include: (1) only the performance parameters selected by the software developer are available to the user; (2) the software packages are typically only available for standard computer systems and therefore cannot be used during the development stage of a computer system or on lesser-known computer systems without independent development of the performance monitoring software; (3) the software packages are typically run concurrently with and on the same CPU as the user software and therefore may slow down system performance (in some cases, by as much as 5-10%) while the performance monitoring software is executed; and (4) only hardware that is accessible by the software package, such as CPU activity and I/O requests, can be monitored by these software packages.
Problems (1) and (2) listed above may be minimized by having the user write a customized performance monitoring software package for the user's system. However, this requires a significant investment in resources to develop such a program. Problems (3) and (4) listed above cannot typically be eliminated by having the user write a customized software package, for several reasons. First, only the nodes within the computer system that are accessible to the performance monitoring software can be monitored. This limitation is a result of having the performance monitoring strategy determined after the computer hardware is designed. Many nodes within a computer system are neither controllable nor observable via software. Second, the performance monitoring software is run on the same CPU as the user programs and therefore may decrease overall system performance. Since the performance monitoring software may affect the performance of the very system that it is attempting to measure, the overall accuracy of the results obtained by the performance monitoring software packages may be limited.
Performance monitoring is often a highly technical process requiring an analyst with many years of experience to examine the performance monitoring results. In a typical scenario, a user first suspects he/she has a problem. The user may then run one or more programs over a period of time to collect utilization statistics, archive and compile the results, and give the data to an analyst, who will do a detailed inspection of the user data, and issue a diagnosis and recommendation. This process can take weeks to months to complete.
Another approach to performance monitoring is to use an external monitor. The external monitor is attached to the system, and often requires a first specialist to attach the device and a second specialist to interpret the results. This approach is both expensive and time consuming.
In many instances, users are unaware that their computer systems are either at or approaching performance limits, and thus performance monitoring is never initiated. When faced with a throughput problem, some customers will simply purchase a new computer system, unaware that the addition of a simple hardware upgrade to their existing computer system would provide better performance at a fraction of the cost of a new computer system.
The present invention overcomes many of the disadvantages associated with the prior art by providing an automated, real time performance monitoring facility for a computer system which runs periodically as a background process. This invention preferably uses built-in performance data collection sites already present in the hardware, microcode and/or operating system software of the computer system. At a user-selectable interval, a sampling of key performance factors is taken from the performance data collection sites. The performance monitor then analyzes the sampled results by comparing the results against two or more performance threshold levels for each performance criterion. In an illustrative embodiment, two performance threshold levels (early warning and actual) are established for the performance criterion of processor utilization. When processor utilization reaches the 90 percent performance threshold, for example, an early warning performance limiter is detected. If the processor utilization reaches the 100 percent performance threshold, for example, an actual performance limiter is detected. If either an early warning or actual performance limiter is detected, an easy-to-understand informational message is provided to a computer operator identifying subsystems that are performance inhibitors along with suggestions of specific upgrade solutions that will address the identified performance problems.
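The two-level threshold comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation of the invention: the names `Thresholds`, `PROCESSOR_UTILIZATION` and `classify` are hypothetical, and only the 90 percent and 100 percent example values for processor utilization come from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Thresholds:
    """Two threshold levels for one performance criterion, in percent."""
    early_warning: float
    actual: float


# Example values given in the text for processor utilization.
PROCESSOR_UTILIZATION = Thresholds(early_warning=90.0, actual=100.0)


def classify(sample: float, thresholds: Thresholds) -> Optional[str]:
    """Return the kind of performance limiter a sample indicates, if any."""
    # Check the higher threshold first so a sample at or above it is
    # reported as an actual limiter rather than an early warning.
    if sample >= thresholds.actual:
        return "actual"
    if sample >= thresholds.early_warning:
        return "early warning"
    return None
```

A sample of 95 percent, for instance, would be classified as an early warning limiter, while a sample of 85 percent would produce no message at all.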
An advantage of the present invention is that it is an automated, real time process that runs periodically during the normal operation of the computer. In past practice, performance monitoring often required an analyst with many years of experience to examine a system. This was often a procedurally complex, time consuming process, taking weeks to months to complete. In the present invention, an automated background process periodically samples a limited set of special purpose data collectors located at key performance sites in the computer system, compares the sampled results against a set of two or more performance thresholds for each performance criterion, and automatically issues a warning to a computer operator of any early warning or actual performance limiter. No special expertise is required to perform the analysis, or interpret the results.
The results are preferably a simple color coded message, much like a “service engine soon” warning light on an automobile's instrumentation.
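One way such a color coded message might be composed is sketched below. The color assignments, the function name `format_message`, the message wording and the sample upgrade suggestion are all hypothetical; the text states only that the message is color coded, identifies the limiting subsystem, and suggests an upgrade.

```python
# Hypothetical color assignments; the text does not name specific colors.
COLORS = {"early warning": "YELLOW", "actual": "RED"}


def format_message(subsystem: str, limiter: str, suggestion: str) -> str:
    """Build a one-line operator message for a detected performance limiter."""
    color = COLORS[limiter]
    return (
        f"[{color}] {subsystem}: {limiter} performance limiter detected. "
        f"Suggested upgrade: {suggestion}"
    )


# Example with an invented upgrade suggestion.
print(format_message("processor", "early warning", "add a processor"))
```

Keeping the message to a single line with a leading color tag mirrors the warning-light analogy: the operator sees the severity first and the detail second.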
Another advantage of the present invention is that minimal overhead is required. In the past, performance monitoring approaches could be quite inefficient, often requiring as much as 5-10% of system resources to monitor operations. In the present invention, the monitoring process preferably utilizes special purpose data collectors that are designed into the hardware, microcode and/or operating system software of a computer system. Further, the monitoring process of the present invention preferably runs only about 12 times an hour for approximately 15 seconds in order to gather the information from the performance data collector sites, analyze the results, and issue any necessary warnings.
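The sampling schedule implied by these figures can be worked out with simple arithmetic. The sketch below assumes that "approximately 15 seconds" is the length of each individual run, which is one reading of the sentence above; the function names are hypothetical.

```python
RUNS_PER_HOUR = 12       # from the text
SECONDS_PER_RUN = 15.0   # assumption: 15 seconds is the length of each run


def sampling_interval_seconds(runs_per_hour: int) -> float:
    """Time between the starts of consecutive monitoring runs."""
    return 3600.0 / runs_per_hour


def active_fraction(runs_per_hour: int, seconds_per_run: float) -> float:
    """Fraction of each hour during which the monitor is running."""
    return runs_per_hour * seconds_per_run / 3600.0


print(sampling_interval_seconds(RUNS_PER_HOUR))         # one run every 300 seconds
print(active_fraction(RUNS_PER_HOUR, SECONDS_PER_RUN))  # share of elapsed time
```

Under that assumption the monitor starts once every five minutes. The active fraction measures elapsed time only; it is not a measure of how many processor cycles the monitor consumes while it is running.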
Finally, since the performance monitoring process is automatically done as part of the normal operation of the computer, a user does not need to know or even suspect that there are performance problems in their computer system in order to receive notification of early warning or actual performance limiters. The present invention takes a proactive approach to detection of performance problems, often notifying the user of an early warning or actual problem even before the user suspects there is a problem, thus allowing the user adequate time to take measures to alleviate the problem.