Data collection is undertaken for a variety of reasons: to document or monitor system performance (e.g., the performance of a manufacturing plant), to monitor usage (e.g., traffic on a telecommunications system such as the internet), or to predict characteristics for decision making (e.g., to predict whether a particular credit card use is fraudulent). A variety of data manipulation techniques, including trend curve analysis, statistical analysis, and feature extraction, allow information to be extracted from a data set, and the analysis can be used to identify or characterize a data point as "anomalous," that is, a substantial deviation from the data set's tendency. If the data set is analyzed using trend analysis, for instance, a particular data point may be characterized as anomalous if it lies more than a designated distance from a fitted trend; if statistical analysis is used, a data point may be considered anomalous if it lies more than a designated number of standard deviations from some measure of central tendency. The particular scheme used to characterize, organize, or "measure" the data set thus provides the means of distinguishing anomalous data from non-anomalous data.
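The two characterization schemes described above can be illustrated with a minimal sketch. The function names, the example series, and the thresholds below are illustrative assumptions, not part of the original description; the sketch simply applies (a) a standard-deviation rule about the mean and (b) a distance rule about a least-squares fitted trend line.

```python
from statistics import mean, stdev

def zscore_anomalies(data, k=3.0):
    """Flag indices whose values lie more than k sample standard
    deviations from the mean (a measure of central tendency)."""
    mu, sigma = mean(data), stdev(data)
    return [i for i, x in enumerate(data) if abs(x - mu) > k * sigma]

def trend_anomalies(data, max_dist=2.0):
    """Fit a least-squares line to the series and flag indices whose
    values lie farther than max_dist from the fitted trend."""
    xs = range(len(data))
    x_bar, y_bar = mean(xs), mean(data)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, data))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return [i for i, y in enumerate(data)
            if abs(y - (slope * i + intercept)) > max_dist]

# Hypothetical series with one obvious outlier at index 7.
data = [1, 2, 3, 4, 5, 6, 7, 100, 9, 10]
print(zscore_anomalies(data, k=2.0))        # the outlier's index
print(trend_anomalies(data, max_dist=30.0))  # the same index
```

As the sketch shows, the two schemes can flag different points in general: the thresholds (`k`, `max_dist`) are the user-designated parameters the text refers to, and choosing them is part of the data set characterization the next paragraph discusses.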
Data set characterization can require substantial user input and knowledge of the data set. To overcome the need for user supervision or input, data set manipulation techniques have been developed that attempt to learn from a training data set, including machine learning techniques such as artificial neural networks, Kohonen's self-organizing maps, fuzzy classifiers, symbolic dynamics, and multivariate analysis. These techniques have become popular because of their high detection accuracies at low false positive rates. However, they have two drawbacks: (1) most are not readily adapted to different applications; and (2) they construct anomaly detection methods from a single machine learning method, such as an artificial neural network or pattern matching, rather than combining multiple methods.