With the advent of increased computing power and data storage, the development of computational tools to study increasingly complex systems in detail has accelerated. Examples of complex systems include weather systems, ecosystems, biological systems, and information technology systems. These computational tools enable vast amounts of information regarding a complex system to be collected, analyzed, and presented for human understanding. Of particular importance to those who study these complex systems is the ability to identify variations, such as abnormalities, that occur within the complex system. For instance, in the case of an information technology infrastructure, variations from normal or expected operation could lead to failures, slowdowns, threshold violations, and other problems. These types of problems are often triggered by unobserved variations or abnormalities in the operation of one or more nodes that cascade into larger problems.
In recent years, computational techniques have been developed to detect patterns in data produced by a complex system that do not conform to an established normal behavior for the complex system. These anomalies may translate into critical and actionable information in several application domains. However, many anomalies in complex systems do not adhere to common statistical definitions of an outlier. As a result, many anomaly detection techniques cannot be applied to a wide variety of different types of data generated by different complex systems. For instance, typical techniques for anomaly detection in time-series data rely heavily on parametric analysis. These techniques assume a known set of distributions for the metrics and perform simple calculations to detect percent out of normal. Non-parametric techniques, on the other hand, make no assumption about the data distribution and, as a result, can be applied to any data set, but at the cost of greater complexity and more resource-intensive algorithms. Those working in the computing industry continue to seek tools that can be used to detect anomalies in a given data set regardless of the type of data.
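The contrast between parametric and non-parametric detection described above can be illustrated with a minimal Python sketch. It is offered only as an illustration under stated assumptions and is not a technique taken from this document: the function names, the z-score threshold of 3.0, and the interquartile-range multiplier of 1.5 are all hypothetical choices. The parametric check assumes the metric is roughly normally distributed, while the non-parametric check uses only order statistics of the data.

```python
import statistics


def parametric_outliers(values, z_threshold=3.0):
    """Flag indices whose z-score exceeds z_threshold.

    Assumption: the data are roughly normally distributed, so the
    sample mean and standard deviation describe "normal" behavior.
    """
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)
    if sigma == 0:
        return []  # all values identical, nothing can be flagged
    return [i for i, v in enumerate(values)
            if abs(v - mu) / sigma > z_threshold]


def nonparametric_outliers(values, k=1.5):
    """Flag indices outside the interquartile-range fences.

    No distribution is assumed; only the quartiles of the data are
    used. The multiplier k is an illustrative default.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]
```

On a small sample containing a single extreme value, the extreme value itself inflates the sample standard deviation, so the z-score check can miss it at a threshold of 3.0, whereas the quartile-based check still flags it. The quartile-based check requires sorting the data, which hints at the additional cost that non-parametric methods tend to carry.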