Modern enterprise systems are often complex, and a large number of performance metrics may be monitored within them, ranging from relatively high-level metrics, such as transaction response time, throughput, and availability, to low-level metrics, such as the amount of physical memory in use on each computer on a network, the amount of available disk space, or the number of threads executing on each processor of each computer. Metrics relating to the operation of database systems, application servers, operating systems, physical hardware, network performance, and the like are often monitored, even across networks that may include many computers each executing numerous processes, so that problems can be detected and corrected when, or before, they arise.
Often, however, too much monitoring information may be sent to an administrator. For example, a large number of applications may be sending information at a given time, or there may be a large number of consoles to monitor at the same time. Frequently, an enterprise may not have enough experienced administrators to review all of the data generated by the various applications. In addition to monitoring the information, systems, or devices, an administrator may need to analyze the data further when problems arise in order to determine the root cause of a problem.
Because of the complexity of these systems, the problems that arise and the metrics involved can be numerous or complex. Some systems have been developed to call attention to those metrics that indicate possible abnormalities in system operation, and to correlate and group such metrics, so that an operator of the system is not overwhelmed by the amount of information presented. Correlating and grouping metrics may also help operators determine the cause of problems that arise, so that the proper corrections can be applied.
Metric correlation generally involves comparing each possible pair of metrics. In a large system having thousands of metrics, for example, metric correlation can involve comparing millions of possible pairs of metrics and can quickly become impractical. Additionally, many systems are dynamic, and the interrelations between metrics can change frequently, complicating the task of isolating and correcting system problems.