The present disclosure relates to computer system security, and more specifically to detecting anomalies through runtime verification of software execution using a behavioral model.
Anomaly detection is the detection of anomalous behavior in the context of a normal model built using past activity. In security, anomaly detection is a technique of comparing new activity with known “normal” activity patterns. Usually, normal activity is learned from past computer system operation. Techniques differ in the model of normal behavior they use. Current models representing computer system behavior are typically either very simple and of limited precision, or very complex and large.
N-grams are useful in implementing approximate system matching. It has been shown that n-gram models can be used to implement anomaly detection. For example, an n-gram model may be built from a trace of system calls to represent the system's normal behavior. This model records short-range correlations between system calls under normal operation.
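The n-gram approach described above can be sketched as follows. This is a minimal illustration, not the method of any particular system: the function names (`ngrams`, `build_model`, `unknown_ngrams`), the window length of 3, and the sample system call names are all assumptions chosen for the example.

```python
from typing import List, Sequence, Set, Tuple

NGram = Tuple[str, ...]


def ngrams(trace: Sequence[str], n: int) -> List[NGram]:
    """Return every contiguous window of length n in the trace, in order."""
    return [tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)]


def build_model(trace: Sequence[str], n: int = 3) -> Set[NGram]:
    """Learn the set of n-grams seen during normal operation."""
    return set(ngrams(trace, n))


def unknown_ngrams(model: Set[NGram], trace: Sequence[str], n: int = 3) -> List[NGram]:
    """List the n-grams of a new trace that never occurred in training."""
    return [g for g in ngrams(trace, n) if g not in model]


# Hypothetical trace of system calls recorded under normal operation.
normal_trace = ["open", "read", "write", "close", "open", "read", "close"]
model = build_model(normal_trace, 3)

# A new trace containing a call that never followed "open, read" in training.
observed = ["open", "read", "exec", "close"]
anomalies = unknown_ngrams(model, observed, 3)
print(anomalies)
```

Any n-gram reported by `unknown_ngrams` is a short-range correlation that was absent from the training trace, which is the signal such a detector would treat as anomalous.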
A model of system behavior as a single database of n-grams can be rather coarse-grained. While the system may run a number of distinct processes, only a single amalgamated behavior of the entire system is typically captured. This reduces the anomaly detection system's accuracy, because any input sequence is matched against all known sub-sequences, regardless of whether they ever appeared in that arrangement. It has been demonstrated that this weakness can be exploited to bypass the anomaly detection system. An attacker may craft a malicious sequence that, when decomposed, contains only known n-grams. Such a mimicry attack may be difficult to carry out, as the attacker must know the n-grams that have been used to model the particular system's behavior and encode a malicious sequence from them. However, for a large system, such databases can be very large. This large selection of n-grams may simplify the attacker's task of identifying a known n-gram.
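The weakness of a single amalgamated n-gram database can be illustrated with a small sketch. The two process traces, the operation names, and the choice of bigrams (n = 2) are hypothetical and chosen only to make the effect visible.

```python
from typing import List, Sequence, Set, Tuple

NGram = Tuple[str, ...]


def ngrams(trace: Sequence[str], n: int) -> List[NGram]:
    """Return every contiguous window of length n in the trace."""
    return [tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)]


# Hypothetical normal traces of two distinct processes on the same system.
process_a = ["login", "check_auth", "read_file", "logout"]
process_b = ["start", "read_file", "send_net", "stop"]

# A single system-wide model merges the n-grams of both processes.
N = 2
combined_model: Set[NGram] = set(ngrams(process_a, N)) | set(ngrams(process_b, N))

# A crafted sequence that splices the two behaviors at the shared "read_file" step.
crafted = ["login", "check_auth", "read_file", "send_net", "stop"]

# Every bigram of the crafted sequence is known, so the combined model accepts it.
accepted_by_combined = all(g in combined_model for g in ngrams(crafted, N))

# Models kept separately per process would each reject the same sequence.
accepted_by_a = all(g in set(ngrams(process_a, N)) for g in ngrams(crafted, N))
accepted_by_b = all(g in set(ngrams(process_b, N)) for g in ngrams(crafted, N))

print(accepted_by_combined, accepted_by_a, accepted_by_b)
```

The crafted sequence was never produced by either process, yet it decomposes entirely into known bigrams of the merged database, which is the property a mimicry attack relies on.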
Operation sequence behavior models can be used to detect the appearance of a previously unknown sequence of actions. Operation sequence behavior models can detect foreign code execution due to attacks such as buffer overflow or cross-site scripting. However, these models typically cannot be used to detect anomalies that do not manifest themselves by the appearance of an unknown sequence. For example, a typical sequence for a business transaction may contain a segment representing execution of a security mechanism. An attack may manifest as a sequence in which this segment is missing. Such a sequence may be accepted, as it does not introduce anything unknown.
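The following sketch shows one way a sequence with a missing security segment can pass an n-gram check. The transaction trace, the operation names (including `verify_signature` as the security step), and the use of bigrams are assumptions made for illustration.

```python
from typing import List, Sequence, Set, Tuple

NGram = Tuple[str, ...]


def ngrams(trace: Sequence[str], n: int) -> List[NGram]:
    """Return every contiguous window of length n in the trace."""
    return [tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)]


# Hypothetical normal transaction: a security check surrounded by logging steps.
normal = ["begin", "log", "verify_signature", "log", "commit"]
model: Set[NGram] = set(ngrams(normal, 2))

# The attack skips the security check entirely.
attack = ["begin", "log", "commit"]

# Bigrams of the attack that the model has never seen (none, in this case).
unseen = [g for g in ngrams(attack, 2) if g not in model]
accepted = not unseen

# The security step is absent even though no unknown n-gram was introduced.
security_step_present = "verify_signature" in attack

print(accepted, security_step_present)
```

Because `("begin", "log")` and `("log", "commit")` both occur in the normal trace, the shortened sequence contains nothing unknown and is accepted, even though the security step never ran.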
Another limitation of operation sequence behavior models is that they use only a single action attribute, which is often not enough to accurately represent a system's behavior. For example, a system may have a policy which requires that certain actions are executed by different users to ensure separation of duties. Without including other attributes in the model, it may not be possible to capture this policy and detect any violations.
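The separation-of-duties example can be made concrete with a short sketch. The event layout (an action paired with a user), the action names, and the added "same user as previous action" feature are all hypothetical; they show one possible extra attribute, not a prescribed model.

```python
from typing import List, Sequence, Set, Tuple

Event = Tuple[str, str]  # (action, user); an assumed log record layout


def action_bigrams(trace: Sequence[Event]) -> Set[Tuple[str, str]]:
    """Bigrams over the action attribute only."""
    return {(a[0], b[0]) for a, b in zip(trace, trace[1:])}


def action_user_bigrams(trace: Sequence[Event]) -> Set[Tuple[str, str, bool]]:
    """Bigrams over actions plus a flag: did the same user perform both?"""
    return {(a[0], b[0], a[1] == b[1]) for a, b in zip(trace, trace[1:])}


# Normal behavior: one user submits, a different user approves.
normal: List[Event] = [("submit", "alice"), ("approve", "bob")]

# Policy violation: the same user submits and approves.
violation: List[Event] = [("submit", "alice"), ("approve", "alice")]

# The action-only model sees nothing new in the violating trace.
flagged_action_only = not action_bigrams(violation) <= action_bigrams(normal)

# The model with the extra attribute sees a bigram it never learned.
flagged_with_user = not action_user_bigrams(violation) <= action_user_bigrams(normal)

print(flagged_action_only, flagged_with_user)
```

With actions alone, both traces reduce to the same `("submit", "approve")` bigram, so the violation is invisible; adding a user-derived attribute makes the two cases distinguishable.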
Recent anomaly detection systems operate by detecting behavioral norms, such as repeating patterns of behavior identified from system logs. For example, a log trace may be partitioned into a number of sub-traces, or “strands”, identified as executions of a transaction-like process. The resulting behavioral model includes multiple distinct n-gram models, one for each of the strands. This approach may allow for building a more precise model. However, the model size is related to the number of strands used for the learning process, and can be quite large. A large model may not be practical.
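A per-strand model of this kind, and the way its size grows with the number of strands, can be sketched as follows. The log format (a strand identifier paired with an action), the identifiers `t1` and `t2`, and the use of bigrams are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple

NGram = Tuple[str, ...]


def ngrams(trace: Sequence[str], n: int) -> List[NGram]:
    """Return every contiguous window of length n in the trace."""
    return [tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)]


# Hypothetical interleaved log: (strand id, action) records from two transactions.
log = [
    ("t1", "begin"), ("t2", "begin"),
    ("t1", "read"), ("t2", "read"),
    ("t1", "commit"), ("t2", "write"), ("t2", "commit"),
]

# Partition the log into strands, preserving the order of actions within each.
strands: Dict[str, List[str]] = defaultdict(list)
for strand_id, action in log:
    strands[strand_id].append(action)

# Build one n-gram model per strand.
models: Dict[str, Set[NGram]] = {sid: set(ngrams(t, 2)) for sid, t in strands.items()}

# Total stored n-grams across all per-strand models vs. one merged database.
per_strand_size = sum(len(m) for m in models.values())
merged_size = len(set().union(*models.values()))

print(per_strand_size, merged_size)
```

Here the per-strand models store five bigrams where a merged database would store four, because the shared `("begin", "read")` bigram is kept once per strand. With many strands learned from a real log, this duplication is one reason the overall model can become large.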
U.S. Pat. No. 8,225,402 discloses anomaly-based detection of SQL injection attacks. A method for detecting a SQL injection attack includes a training phase and a detection phase. In the training phase, a plurality of SQL queries is transformed into a respective plurality of SQL token domain queries, which are processed using an n-gram analysis to provide a threshold and an averaging vector. In the detection phase, each newly arrived SQL query is transformed into a new SQL token domain query, and the n-gram analysis is applied together with the averaging vector and the threshold to each new SQL token domain query to determine if the new SQL query is normal or abnormal. The detection may be online or offline.