1. Field
The invention disclosed and claimed herein generally pertains to a method of anomaly detection at the code level of a computer program. More particularly, the invention pertains to a method of the above type, wherein invariants associated with data structures of the program's concrete state are used to detect anomalies.
2. Description of the Related Art
Anomaly detection is the act of detecting patterns in a given data set that do not conform to an established normal behavior. Anomaly detection is a highly active area of research and development in academia as well as in industry, and divides into two subareas. One subarea is rule-based anomaly detection, which is the act of discovering anomalies based on a set of rules defining normal behavior. The other subarea is statistical anomaly detection, which uses learning techniques to automatically infer a set of “likely invariants” that characterize the normal behavior of the software system. In this case, there is no need for a user-provided specification of the normal behavior. Instead, the anomaly detection system needs to be trained prior to its deployment.
A recent and significant development in the area of statistical anomaly detection, published in a paper by Cova et al. and referred to herein as the “Swaddler approach”, suggests that anomalies can be discovered at the level of program code, rather than at the external interface of a program, i.e., input payloads. This is achieved by instrumenting the subject program, and establishing likely invariants at each basic block of the program visited during the training phase. These invariants are encoded as a model, which assigns a probability value to a feature of a state variable, or of a set of state variables, associated with a block that is about to be executed. This value reflects the probability of occurrence of a given feature value with regard to an established model of “normality”.
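By way of illustration only, the general idea of a per-block model that assigns a probability to a feature value can be sketched as follows. This is a minimal sketch under stated assumptions and is not the implementation described by Cova et al.: the class names, the block identifier "block_17", the choice of feature (the length of a string-valued state variable), the frequency-based probability estimate, the threshold of 0.05, and the policy of flagging blocks never visited in training are all hypothetical.

```python
from collections import Counter, defaultdict


class BlockModel:
    """Frequency-based model of one feature at one basic block (hypothetical)."""

    def __init__(self):
        self.counts = Counter()
        self.total = 0

    def train(self, feature_value):
        # Record one observation made during the training phase.
        self.counts[feature_value] += 1
        self.total += 1

    def probability(self, feature_value):
        # Relative frequency of this feature value during training.
        if self.total == 0:
            return 0.0
        return self.counts[feature_value] / self.total


class Detector:
    """Keeps one model per basic block and checks values against a threshold."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.models = defaultdict(BlockModel)

    def train(self, block_id, feature_value):
        self.models[block_id].train(feature_value)

    def is_anomalous(self, block_id, feature_value):
        model = self.models.get(block_id)
        if model is None:
            # Assumption: a block never visited in training is flagged.
            return True
        return model.probability(feature_value) < self.threshold


# Training phase: observed lengths of a string variable at one block.
detector = Detector(threshold=0.05)
for length in [8, 8, 9, 8, 10, 8, 9, 8, 8, 9]:
    detector.train("block_17", length)

# Detection phase: a check is made before the block is executed.
seen_value_flagged = detector.is_anomalous("block_17", 8)
unseen_value_flagged = detector.is_anomalous("block_17", 500)
```

In this sketch a length of 8 was seen in six of ten training observations and is therefore accepted, whereas a length of 500 was never seen and falls below the threshold.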
While the current state of the art as represented by the Swaddler approach has been shown, quite convincingly, to be of practical value, it is still characterized by a number of limitations. These include issues pertaining to expressiveness, portability, overhead and accuracy. In regard to expressiveness, Swaddler cannot capture invariants across more than one control flow. Regarding portability, making each basic block in the program anomaly-aware has the undesirable effect of making the detection system highly sensitive to code changes. Regarding overhead, performing anomaly checks at each basic block is very expensive. It is difficult to see how the Swaddler solution can scale to enterprise applications comprising on the order of hundreds of millions of lines of code, including their library dependencies.
Finally, in regard to accuracy, a further negative byproduct of testing for anomalies at each basic block is that the system is more likely to issue false alarms. Each check carries some chance of error, so the more checks there are, the more likely it is that at least one of them will reach the wrong conclusion.
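The effect of the number of checks on false alarms can be made concrete with a short calculation. The following is a simplified sketch, assuming that every check has the same false-alarm rate and that the checks are statistically independent. Neither assumption is stated in the Swaddler approach, and the rate of 0.001 and the counts of checks are hypothetical values chosen only for illustration.

```python
def prob_any_false_alarm(per_check_rate, num_checks):
    """Probability that at least one of num_checks independent checks
    raises a false alarm, given the same false-alarm rate per check."""
    return 1.0 - (1.0 - per_check_rate) ** num_checks


# Hypothetical per-check false-alarm rate of 0.1 percent.
rate = 0.001
single_check = prob_any_false_alarm(rate, 1)
thousand_checks = prob_any_false_alarm(rate, 1000)

print(f"1 check:     {single_check:.4f}")
print(f"1000 checks: {thousand_checks:.4f}")
```

Under these assumptions, a rate that is negligible for a single check grows to roughly 63 percent once a thousand checks are performed in the course of one execution.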