A modern computer system has evolved into a complicated combination of multiple software and hardware components that perform various functions and support various features. To obtain optimal performance of a computer system, continuous monitoring of the performance of the computer system and/or its components is necessary.
Continuous studies of component performance are necessary, not only in operating an existing computer system, but also in developing computer software and hardware components. For example, when developing an operating system, such as the Microsoft® Windows® XP operating system, the development teams responsible for the various components of the operating system constantly stress test those components. Stress testing is the process of subjecting a component to strenuous operating conditions and observing whether the component can withstand heavy usage without failure. Stress testing thus helps a component development team to identify any weakness or defect in the component, and can provide valuable information as to the causes of a failure if meaningful data are collected during the stress testing process.
Therefore, an effective study of system performance should be able to collect meaningful data regarding the operating characteristics of a component and make the data easily accessible. During stress testing or other performance studies, a component may have a number of statistical variables that are of interest and should be tracked. For a computer system component, such statistical variables can capture usage and other information regarding system memory, the CPU, the event log, etc., in the component. The collected data can then be used to identify the status of the component and to diagnose problems in the component. Conventionally, such statistical data concerning a component in a computer system are called a metric.
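The notion of a metric as a tracked statistical variable can be illustrated with a minimal sketch. The class and method names below (Metric, record, latest) are hypothetical illustrations, not part of any particular system described here:

```python
import time

class Metric:
    """A sketch of a metric: a named statistical variable whose
    samples are recorded over time for a monitored component."""

    def __init__(self, component, name, unit):
        self.component = component   # e.g. a component under stress test
        self.name = name             # e.g. "memory_usage"
        self.unit = unit             # e.g. "KB"
        self.samples = []            # list of (timestamp, value) pairs

    def record(self, value, timestamp=None):
        # Keep a timestamped sample so the metric's variation over
        # time, not just a static snapshot, can be analyzed later.
        if timestamp is None:
            timestamp = time.time()
        self.samples.append((timestamp, value))

    def latest(self):
        # Return the most recently recorded value, if any.
        return self.samples[-1][1] if self.samples else None

# Example: track a component's memory usage during a stress run.
mem = Metric("component-under-test", "memory_usage", "KB")
for value in (1024, 1100, 1312):
    mem.record(value)
print(mem.latest())
```

Because each sample carries a timestamp, a series of such records captures how the metric changes over the course of a test rather than only its value at one instant.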
Conventional approaches to capturing metrics usually provide static snapshots of the current status of a component performing a task such as stress testing. However, a static snapshot fails to reflect changes in a metric over time. Moreover, conventional approaches usually collect metrics that are specific to an individual computer system or component, rather than metrics that are common to different computer systems or components. Thus, conventional approaches fail to reveal how the same metric may vary in different computing environments. Further, collected metrics are usually stored as a text report, which provides a user little flexibility or variation in presenting the metric data.
Furthermore, different component teams may store collected metrics in different formats and in different locations, rather than in a uniform format and at a centralized location that everyone can access and use. As a result, metric data provided by one component team often cannot be easily integrated with metric data provided by other component teams, thereby making it difficult to evaluate different components of a system consistently.
Therefore, there exists a need to collect any system metric during run time of a component in a predefined format, and to store the system metric in a way that enables analysis of either an individual metric or a combination of metrics, either offline or online.