In data processing security, anomaly detection is a technique of comparing new activity in a computer system with known “normal” activity patterns in the computer system. Typically, normal activity is learned from past operation of the computer system. Various prior art techniques differ in the model of “normal” behavior they use.
N-grams are useful in implementing approximate matching of current activity and past activity of the computer system. Further information about n-grams can be found at http://en.wikipedia.org/wiki/N-gram. In the past, it has been shown that n-gram models can be used to implement anomaly detection.
Stephanie Forrest, Steven A. Hofmeyr, Anil Somayaji and Thomas A. Longstaff, “A Sense of Self for Unix Processes”, Proceedings of the 1996 IEEE Symposium on Security and Privacy (SP '96), IEEE Computer Society, Washington, D.C., USA, 120, discloses a method for anomaly detection in which “normal” is defined by short-range correlations in system calls of a process. An n-gram model is built from a trace of system calls and is taken to represent the system's normal behavior. The n-gram model records short-range correlations between system calls under normal operation.
A model of behavior as a single database of n-grams is rather coarse-grained. Whilst a computer system may run a number of distinct processes, only a single, amalgamated behavior of the entire computer system is captured. This reduces the accuracy of determining anomalies because any input sequence is matched against all known sub-sequences, regardless of whether those sub-sequences ever appeared together in that arrangement.
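By way of illustration only, the n-gram approach described above might be sketched as follows. The system call names, the traces, the choice of n = 3 and the function names `ngrams`, `build_model` and `anomalies` are assumptions made for this example and are not taken from the cited work, whose exact matching scheme may differ.

```python
from typing import Iterable, Sequence


def ngrams(trace: Sequence[str], n: int) -> list[tuple[str, ...]]:
    """Return every contiguous window of n calls in the trace, in order."""
    return [tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)]


def build_model(traces: Iterable[Sequence[str]], n: int) -> set[tuple[str, ...]]:
    """Collect all n-grams seen during normal operation into one database."""
    model: set[tuple[str, ...]] = set()
    for trace in traces:
        model.update(ngrams(trace, n))
    return model


def anomalies(trace: Sequence[str], model: set, n: int) -> list[tuple[str, ...]]:
    """List the n-grams of a new trace that never occurred in the model."""
    return [gram for gram in ngrams(trace, n) if gram not in model]


# Hypothetical traces of system calls recorded under normal operation.
normal_traces = [
    ["open", "read", "mmap", "mmap", "open", "read", "close"],
    ["open", "read", "close"],
]
model = build_model(normal_traces, n=3)

# A new trace containing a call ("execve") never seen in this context.
observed = ["open", "read", "mmap", "execve", "close"]
unknown = anomalies(observed, model, n=3)
print(unknown)
```

In this sketch, any n-gram of the observed trace that is absent from the database is reported as anomalous, while a trace made up entirely of recorded n-grams is accepted.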
Warrender C., Forrest S. and Pearlmutter B., “Detecting intrusions using system calls: alternative data models”, IEEE Symposium on Security and Privacy (1999), 133-145, discloses using variable sub-sequence sizes, masks, and state machines to analyze sequences of system calls into the kernel of an operating system.
David Wagner and Paolo Soto, “Mimicry attacks on host-based intrusion detection systems”, Proceedings of the 9th ACM conference on Computer and communications security (CCS '02), ACM, New York, N.Y., USA, 255-264 discloses the notion of a mimicry attack, which allows a sophisticated attacker to cloak their intrusion to avoid detection by an intrusion detection system (IDS). An attacker may craft a malicious sequence that, when decomposed, contains only known n-grams.
This mimicry attack may be difficult to exploit on a real system, as an attacker needs to know the n-grams that have been used to model a particular system's behavior and to encode a malicious sequence from them. However, for a large system, the n-gram database can be very large, and a large selection of known n-grams may make it easier to assemble such a sequence.
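The following is a minimal sketch of how such a sequence might be assembled, assuming the attacker already knows the n-gram database and that n = 2. The call names, the traces and the function `craft_mimicry` are hypothetical and serve only to illustrate the idea; they do not reproduce the method of the cited paper.

```python
from collections import deque


def bigrams(trace):
    """Return the set of adjacent call pairs (2-grams) in a trace."""
    return {(a, b) for a, b in zip(trace, trace[1:])}


def craft_mimicry(model, start, goal_calls):
    """Breadth-first search for a shortest sequence that begins with `start`,
    uses only bigrams present in `model`, and contains `goal_calls` in order
    (possibly separated by padding calls). Returns None if none exists."""
    successors = {}
    for a, b in model:
        successors.setdefault(a, []).append(b)
    done = 1 if goal_calls and start == goal_calls[0] else 0
    queue = deque([(start, done, [start])])
    seen = {(start, done)}
    while queue:
        call, done, path = queue.popleft()
        if done == len(goal_calls):
            return path
        for nxt in sorted(successors.get(call, [])):
            next_done = done + 1 if nxt == goal_calls[done] else done
            if (nxt, next_done) not in seen:
                seen.add((nxt, next_done))
                queue.append((nxt, next_done, path + [nxt]))
    return None


# Hypothetical normal traces and the bigram database built from them.
normal_traces = [
    ["open", "read", "write", "close"],
    ["open", "read", "close", "open", "read", "write", "close"],
]
model = set().union(*(bigrams(t) for t in normal_traces))

# Calls the attacker wants executed, in this order.
goal = ["open", "write", "open", "write"]
direct_is_known = bigrams(goal) <= model      # issuing them directly
crafted = craft_mimicry(model, "open", goal)  # padded with harmless calls
print(direct_is_known, crafted)
```

Issued directly, the goal sequence contains unknown bigrams and would be flagged. The crafted sequence reaches the same calls in the same order but pads them with calls that keep every bigram inside the database.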
Stephanie Forrest, Steven Hofmeyr, and Anil Somayaji, “The Evolution of System-Call Monitoring”, Proceedings of the 2008 Annual Computer Security Applications Conference (ACSAC '08), IEEE Computer Society, Washington, D.C., USA, 418-430, discloses that the similarities between computer security and the problem of protecting a body against damage from externally and internally generated threats are compelling and were recognized as early as 1972 when the term “computer virus” was coined. The connection to immunology was made explicit in the mid 1990s, leading to a variety of prototypes, commercial products, attacks, and analyses. The use of system-call monitoring and its application to anomaly intrusion detection and response is discussed.
Operation sequence behavior models can be used to detect the appearance of a previously unknown sequence of actions. They are attractive for detecting foreign code execution due to attacks, such as buffer overflow or cross-site scripting.
Raman, P., “JaSpin: JavaScript Based Anomaly Detection of Cross-Site Scripting Attacks”, Master's thesis, Carleton University, Ottawa, Ontario (2008), discloses that the increasing use of sophisticated JavaScript in web applications has led to the widespread exploitation of cross-site scripting (XSS) flaws. An anomaly detection-based approach for detecting cross-site attacks is disclosed. JaSPIn is based on the observation that the patterns of JavaScript methods invoked by web sites are extremely consistent, even for complex AJAX-driven applications. Thus, web page behavioral profiles can be generated by recording the methods executed when legitimate content is displayed. These profiles can then be used to constrain JavaScript behavior so that XSS attacks cannot succeed.
However, operation sequence behavior models cannot be used to detect anomalies that do not manifest themselves by the appearance of an unknown sequence. For example, a typical sequence for a business transaction may contain a segment representing execution of a security mechanism. An attack may manifest itself as a sequence in which this segment is missing. Such a sequence may be accepted as it does not introduce anything unknown.
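The missing-segment case can be illustrated with a small sketch. The operation names, the traces and the use of bigrams are assumptions chosen for this example: reads are taken to be permitted without authentication, whereas writes are always preceded by an "auth" step in normal operation.

```python
def bigrams(trace):
    """Return the set of adjacent operation pairs (2-grams) in a trace."""
    return {(a, b) for a, b in zip(trace, trace[1:])}


# Hypothetical normal transactions.
normal_traces = [
    ["begin", "auth", "write", "commit"],
    ["begin", "read", "commit"],
    ["begin", "auth", "read", "write", "commit"],
]
model = set().union(*(bigrams(t) for t in normal_traces))

# A transaction that performs a write while skipping the "auth" segment.
attack = ["begin", "read", "write", "commit"]

unknown = bigrams(attack) - model
accepted = not unknown  # the sequence model sees nothing unfamiliar
auth_precedes_write = "auth" in attack[:attack.index("write")]
print(accepted, auth_precedes_write)
```

Every bigram of the attack sequence already occurs in some normal transaction, so the model accepts it even though the security step never ran.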
Another limitation of operation sequence behavior models is that they use only a single action attribute, which is often not enough to accurately represent a system's behavior. For example, a system may have a policy which requires that certain actions are executed by different users to ensure separation of duties. Without including other attributes in the model, it is not possible to capture this policy and detect any violations.
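As a hedged illustration of this limitation, the sketch below compares a model built over actions alone with one that also records which user performed each action. The encoding of users as "slots" numbered in order of first appearance, the event names and the helper functions are hypothetical choices made for this example rather than a technique described in the cited works.

```python
def bigrams(seq):
    """Return the set of adjacent pairs (2-grams) in a sequence."""
    return {(a, b) for a, b in zip(seq, seq[1:])}


def actions_only(events):
    """Keep only the action attribute of each (action, user) event."""
    return [action for action, _user in events]


def with_user_slot(events):
    """Replace each user name by the order in which that user first appears,
    so the model captures 'same user' versus 'different user' without
    depending on concrete names."""
    slots = {}
    encoded = []
    for action, user in events:
        slot = slots.setdefault(user, len(slots))
        encoded.append((action, slot))
    return encoded


# Hypothetical normal transaction: creation and approval by different users.
normal = [("create", "alice"), ("approve", "bob"), ("pay", "alice")]
# Violation of separation of duties: one user performs every step.
violation = [("create", "carol"), ("approve", "carol"), ("pay", "carol")]

action_model = bigrams(actions_only(normal))
attribute_model = bigrams(with_user_slot(normal))

missed = bigrams(actions_only(violation)) <= action_model
caught = not (bigrams(with_user_slot(violation)) <= attribute_model)
print(missed, caught)
```

The action-only model cannot tell the two transactions apart, while the model that includes the user attribute contains no n-gram in which creation and approval share a user.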
More recent research in this area is focused on finding behavioral norms, that is, emergent, repeating patterns of behavior built from system logs. A trace is partitioned into a number of sub-traces (called “strands”) identified as executions of some transaction-like process. The resulting behavioral model includes multiple distinct n-gram models, one for each of the strands. Such an approach allows the building of a much more precise model. However, the model may be significantly larger, and its size depends on the number of strands used in the learning process, which can make it impractical. An initial approach of aggregating the n-gram databases was useful in understanding system structure, but the resulting precision was very low.
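The trade-off between per-strand models and an aggregated database might be sketched as follows. It is assumed here, purely for illustration, that each log entry carries a transaction identifier by which strands can be separated and that n = 2; the log contents and function names are hypothetical.

```python
from collections import defaultdict


def ngrams(seq, n):
    """Return the set of contiguous n-grams of a sequence."""
    return frozenset(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))


def split_into_strands(log):
    """Group (transaction_id, action) entries into one strand per id."""
    strands = defaultdict(list)
    for txn_id, action in log:
        strands[txn_id].append(action)
    return list(strands.values())


def build_strand_models(strands, n):
    """One n-gram model per strand; identical models are stored once."""
    return {ngrams(strand, n) for strand in strands}


def accepted_by_strands(strand, models, n):
    """Accept only if a single strand model covers all of the n-grams."""
    grams = ngrams(strand, n)
    return any(grams <= model for model in models)


def accepted_by_aggregate(strand, models, n):
    """Accept if the union of all strand models covers the n-grams."""
    return ngrams(strand, n) <= frozenset().union(*models)


# Hypothetical interleaved log of three transactions.
log = [
    (1, "recv"), (2, "recv"), (1, "verify"), (2, "update"),
    (1, "update"), (2, "log"), (1, "send"),
    (3, "recv"), (3, "verify"), (3, "update"), (3, "send"),
]
models = build_strand_models(split_into_strands(log), n=2)

suspicious = ["recv", "update", "send"]  # an update sent out without "verify"
print(len(models),
      accepted_by_aggregate(suspicious, models, 2),
      accepted_by_strands(suspicious, models, 2))
```

In this sketch the aggregated database accepts the suspicious strand because each of its n-grams occurs in some strand, whereas no individual strand model covers all of them. The number of stored models grows with the number of distinct strands observed during learning.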
O. Pieczul and S. N. Foley, “Discovering emergent norms in security logs”, IEEE Conference on Communications and Network Security (CNS-SafeConfig), Washington D.C., 2013, discloses a model that characterizes security logs as a collection of norms that reflect patterns of emergent behavior. An analysis technique for detecting behavioral norms based on these logs is described and evaluated. The application of behavioral norms is considered, including its use in system security evaluation and anomaly detection.
O. Pieczul and S. N. Foley, “Collaborating as normal: detecting systemic anomalies in your partner”, 22nd International Workshop on Security Protocols, Cambridge, UK, 2014, considers whether anomaly detection techniques might be used to identify potentially malicious behavior by service providers. Data mining techniques can be used to derive patterns of repeating behavior from logs of past interactions between service consumers and providers. Consumers may use these patterns to detect anomalous provider behavior, while providers may seek to adapt their behavior in ways that cannot be detected by the consumer. A challenge is deriving a behavioral model that is a sufficiently precise representation of the consumer-provider interactions. Behavioral norms, which model these patterns of behavior, are used to explore these issues in an on-line photograph sharing style service.