Typical anti-virus and intrusion detection techniques rely on one of two methods: pattern matching or anomaly detection, both of which have shortcomings.
In pattern matching, binary software is scanned to see whether it matches any of a number of known binary patterns, called “signatures.” Because the patterns must be known in advance, pattern matching cannot detect a previously unknown item of malware unless the author of that malware included such a signature, for example by re-using a section of code from a previous virus. Further, pattern matching is easy to defeat by a process known as mutation. The malware author makes a series of alterations to his malware and then tests it against the commercial products. Once a state is reached where the malicious software still performs its malicious deeds but is no longer caught by the virus scanners, the malware author can distribute the newly working mutated virus. This is why signature-based products need to update their signatures so often: as new mutations are released, new patterns must also be released to detect them. For example, if a virus scanner is looking for a binary pattern like 0x2F5E . . . B00F, and the malware writer can produce a virus with a pattern that looks like 0x2F5E . . . B00A, the mutated virus will not be recognized, and the new pattern must be added to an updated signature file that every user must then download in order to be protected. It is not uncommon for a virus to have dozens or even hundreds of variants. Clever malware writers can even automate the mutation process so that no human interaction is required.
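The weakness described above can be illustrated with a minimal sketch. The signature name, the byte pattern, and the helper functions `scan` and `mutate_last_byte` are all hypothetical and chosen only for illustration; they are not taken from any real scanner, and real products use far more elaborate matching.

```python
# Hypothetical signature database: name -> byte pattern (illustrative values only).
SIGNATURES = {
    "example-virus-a": bytes.fromhex("deadbeef1337"),
}


def scan(binary: bytes) -> list[str]:
    """Return the names of all signatures found anywhere in the binary."""
    return [name for name, pattern in SIGNATURES.items() if pattern in binary]


def mutate_last_byte(binary: bytes, pattern: bytes) -> bytes:
    """Flip the lowest bit of the final byte of the first occurrence of pattern.

    This stands in for the (assumed) alteration a malware author might make;
    a real mutation would also have to preserve the malicious behavior.
    """
    idx = binary.find(pattern)
    if idx < 0:
        return binary
    pos = idx + len(pattern) - 1
    return binary[:pos] + bytes([binary[pos] ^ 0x01]) + binary[pos + 1:]


pattern = SIGNATURES["example-virus-a"]
original = b"\x00\x01" + pattern + b"\x02\x03"
mutated = mutate_last_byte(original, pattern)

print(scan(original))  # ['example-virus-a']
print(scan(mutated))   # []
```

A single changed byte is enough to evade the scanner until a signature for the mutated pattern is added to `SIGNATURES`, which mirrors the update cycle described above.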
In anomaly detection, an algorithm attempts to classify inputs using a “classifier” or “discriminator.” The techniques used in anomaly detection are often very similar to those used in machine learning or image processing. The goal is to sort incoming network data into two bins, one labeled “benign” and the other labeled “suspicious.”
There are at least two ways to defeat anomaly detectors. One approach is known as maladaptive training, in which an anomaly detector is flooded with benign but unusual data (for example, by taking a real virus, neutralizing it, and then mutating the neutralized version many times). If this unusual data represents a significant fraction of the total network data, the malware author can cause the anomaly detector to adapt its algorithm, so that an item of malware that is similar to the benign data will be classified as benign.
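Maladaptive training can be sketched with a toy detector. The class `AdaptiveDetector`, the single numeric feature, the z-score rule, and all sample values below are assumptions made for illustration; a real anomaly detector would use many features and a more sophisticated model.

```python
import statistics


class AdaptiveDetector:
    """Toy anomaly detector (hypothetical): flags a value as suspicious when it
    lies more than `threshold` standard deviations from the mean of all
    training data seen so far."""

    def __init__(self, threshold: float = 3.0):
        self.samples: list[float] = []
        self.threshold = threshold

    def train(self, values) -> None:
        # The detector adapts by folding every new observation into its model.
        self.samples.extend(values)

    def is_suspicious(self, value: float) -> bool:
        mean = statistics.fmean(self.samples)
        stdev = statistics.pstdev(self.samples)
        if stdev == 0:
            return value != mean
        return abs(value - mean) / stdev > self.threshold


detector = AdaptiveDetector()

# Ordinary traffic: feature values clustered around 10 (made-up numbers).
detector.train([10, 11, 9, 10, 12, 8, 10, 11, 9, 10] * 10)
malware_feature = 50.0
before = detector.is_suspicious(malware_feature)

# Flood: benign but unusual data resembling the malware, making up half of
# all data seen (standing in for neutralized, mutated copies of a virus).
detector.train([48, 49, 50, 51, 52] * 20)
after = detector.is_suspicious(malware_feature)

print(before)  # True
print(after)   # False
```

In this sketch the flood makes up half of all observed data, which shifts the mean and widens the spread enough that the malware-like value falls inside the range the detector now treats as benign.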
Another way to defeat anomaly detectors is an approach called “low and slow.” Anomaly detectors typically examine the network data associated with more than one host, so the amount of data they must handle is very large. As a result, anomaly detectors often have a “time window.” Any data that falls outside the time window by virtue of being too old is not used as input to the algorithm. If the malware author can partition his malware into chunks, and send the chunks slowly enough that at most a single chunk will be within the time window of the anomaly detector (which can be as small as 30 seconds), then the anomaly detector may not find the malware. This approach is “low” because every chunk looks benign on its own, even though the combined block of binary code would look malicious. It is “slow” because the chunks are spaced far enough apart in time to evade the anomaly detector's correlation process. Of course, in order to use a low and slow attack the malware author must include some code to reassemble the chunks; this is often sent in the first chunk (or sometimes the last).
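The effect of the time window can be sketched as follows. The class `WindowedDetector`, the marker `EVIL_PAYLOAD`, and the helper `send` are hypothetical, and the detector's correlation step is stood in for by a simple substring check over the data currently inside the window; only the 30-second window size comes from the text above.

```python
from collections import deque

WINDOW_SECONDS = 30.0               # window size mentioned in the text
MALICIOUS_MARKER = b"EVIL_PAYLOAD"  # hypothetical stand-in for "looks malicious"


class WindowedDetector:
    """Toy detector (hypothetical) that only correlates data inside its window."""

    def __init__(self, window: float = WINDOW_SECONDS):
        self.window = window
        self.buffer = deque()  # holds (timestamp, chunk) pairs, oldest first

    def observe(self, timestamp: float, chunk: bytes) -> bool:
        self.buffer.append((timestamp, chunk))
        # Discard anything that is too old to fall inside the time window.
        while self.buffer and timestamp - self.buffer[0][0] > self.window:
            self.buffer.popleft()
        combined = b"".join(c for _, c in self.buffer)
        return MALICIOUS_MARKER in combined


def send(chunks, interval: float) -> bool:
    """Send chunks `interval` seconds apart; return True if ever detected."""
    detector = WindowedDetector()
    detected = False
    for i, chunk in enumerate(chunks):
        if detector.observe(i * interval, chunk):
            detected = True
    return detected


chunks = [b"EVIL", b"_PAY", b"LOAD"]  # each chunk looks benign by itself

print(send(chunks, interval=1.0))   # True
print(send(chunks, interval=31.0))  # False
```

With a 1-second interval all three chunks share the window and the combined data is flagged; with a 31-second interval each earlier chunk has aged out before the next arrives, so the detector never sees more than one chunk at a time.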