Content anomaly detectors have been developed to identify anomalous data in an otherwise seemingly normal stream of data. Anomalous data can include instances of malicious code such as worms, viruses, Trojans, etc. In some of these detectors, the detector examines an n-gram to determine whether it is anomalous. An n-gram is a sequence of n consecutive units of data. For example, a 1-gram may be a single byte of data, and a 2-gram may be two consecutive bytes of data.
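To make the n-gram notion concrete, the following is a minimal sketch, assuming that each unit of data is one byte and that n-grams are taken with a sliding window that advances one byte at a time. The function names `ngrams` and `ngram_frequencies` and the sample payload are illustrative assumptions, not part of any particular detector.

```python
from collections import Counter


def ngrams(data: bytes, n: int):
    """Yield every run of n consecutive bytes in data (sliding window, step 1)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # A payload shorter than n bytes yields no n-grams at all.
    for i in range(len(data) - n + 1):
        yield data[i:i + n]


def ngram_frequencies(data: bytes, n: int) -> Counter:
    """Count how often each distinct n-gram occurs in data."""
    return Counter(ngrams(data, n))


# Hypothetical payload, used only for illustration.
payload = b"GET /index.html"
one_grams = ngram_frequencies(payload, 1)  # byte frequency distribution
two_grams = ngram_frequencies(payload, 2)  # frequency of adjacent byte pairs
```

A payload of k bytes contains k - n + 1 overlapping n-grams, so the 15-byte payload above produces fifteen 1-grams and fourteen 2-grams.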
A content anomaly detection model based on the 1-gram frequency distribution of datasets is effective at capturing attacks that display abnormal byte distributions, but it is vulnerable to attacks crafted to resemble normal byte distributions. A content anomaly detection model based on the frequency distribution of higher order n-grams can address this shortcoming. However, as the order of the n-grams increases, memory usage increases exponentially. This is because the maximum possible number of distinct n-grams increases exponentially as the order of the n-grams increases. For example, when each unit is a byte, the maximum possible number of distinct 5-grams is 256^5 = 2^40, or approximately 1.1 trillion.
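The exponential growth can be checked with a short calculation. The sketch below assumes byte-sized units, so that each position in an n-gram can take one of 256 values; the helper name `max_distinct_ngrams` is hypothetical.

```python
def max_distinct_ngrams(n: int, alphabet_size: int = 256) -> int:
    """Upper bound on distinct n-grams: one of alphabet_size values per position."""
    return alphabet_size ** n


for n in range(1, 6):
    print(f"{n}-gram: {max_distinct_ngrams(n):,}")

# Output:
# 1-gram: 256
# 2-gram: 65,536
# 3-gram: 16,777,216
# 4-gram: 4,294,967,296
# 5-gram: 1,099,511,627,776
```

Each additional byte of n-gram order multiplies the bound by 256.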
As new defensive techniques (e.g., anomaly detection) are developed to counter fast-spreading network threats, attackers have become more sophisticated as well. A model based on the frequency distribution of a mixture of high order n-grams can address threats posed by such sophisticated attackers, but only at the expense of heavy memory and computational overhead. For example, even for a small mixture of n-grams of modest orders, such as a mixture of 2-grams, 3-grams, and 4-grams, the total memory required may be impractical.
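As a rough illustration of the memory cost of such a mixture, the sketch below estimates the size of a dense frequency table. It assumes byte-sized units and one 4-byte counter for every possible n-gram; both the counter width and the dense layout are assumptions made here for the estimate, and the name `dense_table_bytes` is hypothetical.

```python
BYTES_PER_COUNTER = 4  # assumption: one 32-bit counter per possible n-gram


def dense_table_bytes(orders, alphabet_size=256,
                      bytes_per_counter=BYTES_PER_COUNTER) -> int:
    """Bytes needed to hold one counter for every possible n-gram of each order."""
    counters = sum(alphabet_size ** n for n in orders)
    return counters * bytes_per_counter


total = dense_table_bytes([2, 3, 4])
print(f"{total:,} bytes = {total / 2**30:.2f} GiB")
# Output: 17,247,240,192 bytes = 16.06 GiB
```

Under these assumptions the 4-gram table dominates: it accounts for 16 GiB of the total, while the 2-gram and 3-gram tables together add well under 0.1 GiB.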