A modern computer system is typically a complicated combination of software and hardware that has many different components for performing various functions and supporting various features. The optimal performance of a computer system often can be obtained only by continuously monitoring the health and performance of the components of the computer system, and correcting problems identified through such monitoring.
The need for continuous performance studies is present not only in operating an existing computer system but also in developing computer software and hardware products. For instance, during the development of an operating system, such as the Windows NT operating system by Microsoft Corporation, various components of the operating system are constantly being tested by subjecting them to strenuous operating conditions and observing whether they can withstand the heavy usage without failure. Such a performance study, often termed “stress testing,” helps the software developers to identify the weak spots or defects in the components of the operating system and provides valuable information as to the causes of failure.
In this regard, the collection of meaningful data regarding the operation characteristics of the system components and the compilation of the collected data into reports in useful formats are critical aspects of an effective system performance study. For each of the components being monitored, there may be a number of statistical variables that are of interest and should be tracked. The collected statistical data then have to be presented in easy-to-understand formats to facilitate identification of the status of the components and diagnosis of problems. Moreover, the results of a performance study often are to be reviewed by different levels of management. To that end, it is often necessary to provide reports that summarize the results of the performance study at different levels of abstraction to suit the different information needs of the management. For instance, in a network environment, a network administrator may want to know the total number of calls processed by a given server in the network, while a top-level manager may be interested only in knowing the general health of the network.
Existing tools for reporting the results of system performance studies, however, do not satisfactorily meet these reporting needs. For instance, in the example of software development of the operating system, the development team is divided into groups, with each group responsible for one or more components of the operating system. Stress tests for various components are run on a plurality of computers, and the states of the stressed components are closely monitored by the responsible groups. Generally, each group tracks and reports a variety of statistics collected from machines running the stress tests for its components, and shares the information among its own members and with other groups. To that end, each group typically implements its own ad hoc tracking and reporting applications. Due to the various types of statistical data tracked by different groups and the inconsistent ways the data are reported, the stress data provided by one group often cannot be readily used with data provided by other groups for analysis and summary purposes. Moreover, information on critical system attributes necessary for monitoring system performance and health is often not uniformly tracked and in some cases is simply omitted from the tracking tools of some individual groups. Such an inconsistency in tracking critical system attributes makes it difficult to establish a baseline for system evaluation.
The need to generate useful reports from the collected data poses another problem. In many cases, there are formatting requirements for stress reports that have to be adhered to. Experience has shown that constructing stress reports that meet the given formatting requirements is a very time-consuming task that is prone to errors. Oftentimes such reports are generated by hand, sometimes requiring hours to assemble and format the relevant data. Moreover, the reporting needs often evolve over time, and it is difficult for the various developer groups to keep track of the ever-changing reporting requirements and formats and to rewrite their reporting software in response to the changes.