In the field of information technology, various types of metrics data, including numerical and unstructured metrics data, are collected for applications and networks in order to monitor application and network performance. When performance degradation occurs, these collected metrics may be analyzed via correlation to diagnose probable root cause(s). Correlating the collected metrics may allow for the identification of the metrics most strongly correlated with a problematic metric associated with the performance degradation. However, as the number of applications and the amount of sampled data for disparate metrics collected per application increase, traditional monitoring systems must find relevant “information” among a vast number of collected metrics. Beyond the sheer and increasing number of application metrics, applications are also operating on increasingly finer-grained data, such as finer time resolutions for performance data. This finer-grained data further increases the amount of sampled data. In the process of turning data into information, monitoring systems typically help users by determining various characteristics of the data and by making it clear why certain information is of interest. To help accomplish this goal, many monitoring systems compile and analyze data using “baselines” or “thresholds,” which dictate rules regarding expectations for the metric data. For example, a rule for CPU usage may state that “CPU usage can't be more than 90%.” In this example, data center support (IT support) personnel may need to receive a notification when the metric rises above the pre-defined baseline.
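By way of illustration only, the correlation analysis and the baseline rule described above may be sketched as follows. This is a minimal sketch under stated assumptions, not an implementation taken from any particular monitoring system: the use of the Pearson coefficient as the correlation measure, the function names `pearson`, `rank_by_correlation`, and `breaches`, the metric names, and all sample values are hypothetical and chosen solely for the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: treat a constant series as uncorrelated
    return cov / sqrt(var_x * var_y)


def rank_by_correlation(problem, candidates):
    """Return (name, r) pairs, strongest absolute correlation first."""
    scores = [(name, pearson(problem, series))
              for name, series in candidates.items()]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)


def breaches(samples, baseline):
    """Indices of the samples that exceed a static baseline."""
    return [i for i, value in enumerate(samples) if value > baseline]


# Hypothetical data: six consecutive samples per metric.
cpu_usage = [40, 55, 70, 85, 93, 96]  # percent; the problematic metric
metrics = {
    "request_latency_ms": [110, 130, 170, 220, 300, 340],
    "queue_depth": [3, 9, 2, 8, 4, 7],
    "cache_hit_ratio": [0.90, 0.91, 0.89, 0.90, 0.91, 0.90],
}

ranked = rank_by_correlation(cpu_usage, metrics)
alerts = breaches(cpu_usage, 90)  # rule: "CPU usage can't be more than 90%"

for name, r in ranked:
    print(f"{name}: r = {r:.2f}")
print("samples above baseline:", alerts)
```

In this sketch, the latency series ranks first because it rises together with CPU usage, and the last two CPU samples exceed the 90% baseline and would trigger a notification. Both steps scan every sample of every metric, which illustrates why the growing number of metrics and finer time resolutions increase the work required of such systems.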
It is desirable to have systems that are well equipped to quickly and efficiently identify various data anomalies.