Models of data can be used to show characteristics of that data. For example, network data traffic models can be used to show unique characteristics of specific network data traffic. Such models can also be used for detecting network data traffic content anomalies, such as malicious code, because the characteristics of normal data traffic differ from the characteristics of data traffic harboring malicious code, such as viruses, worms, Trojan horses, spyware, and/or other data that can cause harmful effects. Anomaly-based systems can be used to generate anomaly detection models and/or use anomaly detection models to monitor and detect anomalous code in, for example, network traffic, instruction streams, and/or streams of function calls.
Anomaly-based systems can be used to detect abnormal inputs and/or behavior without relying on, for example, a static set of signatures or a potentially incomplete behavioral specification. The efficacy of anomaly detection sensors can depend, however, on the quality of the data used to train them. Artificial or contrived training datasets may not provide a realistic view of the deployment environment. On the other hand, real-world datasets may be dirty; for example, they may contain a number of attacks or abnormal events. Moreover, the size of training datasets can make manual removal or labeling of anomalies difficult and/or impractical. As a result, sensors trained on such data may learn the embedded attacks as normal behavior and may, for example, miss some attacks and their variations.
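The effect of dirty training data described above can be illustrated with a minimal sketch. The toy model below (a set of byte n-grams seen during training) is an assumption chosen for brevity and is not a method taken from this document. The function names and the sample payloads are hypothetical and invented for illustration only.

```python
from typing import Iterable, Set


def ngrams(payload: bytes, n: int = 3) -> Set[bytes]:
    """Return the set of distinct byte n-grams in a payload."""
    return {payload[i:i + n] for i in range(len(payload) - n + 1)}


def train(payloads: Iterable[bytes], n: int = 3) -> Set[bytes]:
    """Build a toy 'normal' model: every n-gram seen in training."""
    model: Set[bytes] = set()
    for p in payloads:
        model |= ngrams(p, n)
    return model


def anomaly_score(model: Set[bytes], payload: bytes, n: int = 3) -> float:
    """Fraction of the payload's n-grams never seen in training (0.0 to 1.0)."""
    grams = ngrams(payload, n)
    if not grams:
        return 0.0
    return len(grams - model) / len(grams)


# Hypothetical payloads, invented for illustration only.
normal = [b"GET /index.html HTTP/1.1", b"GET /about.html HTTP/1.1"]
attack = b"GET /\x90\x90\x90\x90\xcc\xcc/bin/sh HTTP/1.1"
variant = b"GET /\x90\x90\x90\xcc\xcc\xcc/bin/sh HTTP/1.1"

clean_model = train(normal)
dirty_model = train(normal + [attack])  # attack left unlabeled in the training set

clean_attack_score = anomaly_score(clean_model, attack)
dirty_attack_score = anomaly_score(dirty_model, attack)
clean_variant_score = anomaly_score(clean_model, variant)
dirty_variant_score = anomaly_score(dirty_model, variant)

print(f"attack : clean={clean_attack_score:.2f} dirty={dirty_attack_score:.2f}")
print(f"variant: clean={clean_variant_score:.2f} dirty={dirty_variant_score:.2f}")
```

In this sketch, the model trained on the dirty dataset has already seen every n-gram of the attack, so the attack scores as fully normal, and a close variant of the attack also scores lower than it does against the clean model.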