Malicious executables (or malware) that propagate through the Internet can be classified into three main categories: (a) worm-related; (b) non-worm-related (e.g. viruses, Trojans); and (c) probes (e.g. adware, spyware, spam, phishing). Malicious executables that are known beforehand are typically detected using signature-based techniques. Said signature-based techniques rely on prior explicit knowledge of the malicious executable code, which is represented by one or more signatures or rules stored in a database. According to these prior art techniques, the database is frequently updated with new signatures based on new observations. The main disadvantage of these techniques is their inability to detect totally new, previously unencountered malicious executables (i.e. malicious executables whose signatures are not yet stored in the database).
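For illustration only, the signature-based detection described above can be sketched as a lookup of stored byte patterns in the executable's contents. This is a minimal sketch under stated assumptions: the signature names and byte patterns in `SIGNATURE_DB` and the helper names `scan` and `update_db` are hypothetical, and real signature databases use richer rule formats (wildcards, offsets, hashes) than plain substrings.

```python
from typing import Dict, List

# Hypothetical signature database: signature name -> byte pattern.
# Real products use far richer rule formats than plain substrings.
SIGNATURE_DB: Dict[str, bytes] = {
    "Example.TestPattern.A": b"\xde\xad\xbe\xef",
    "Example.TestPattern.B": b"EVIL_PAYLOAD",
}


def scan(executable: bytes, db: Dict[str, bytes]) -> List[str]:
    """Return the names of all signatures whose byte pattern occurs in the executable."""
    return [name for name, pattern in db.items() if pattern in executable]


def update_db(db: Dict[str, bytes], name: str, pattern: bytes) -> None:
    """Add a newly observed signature; before this update the sample goes undetected."""
    db[name] = pattern
```

A sample whose pattern is not yet in the database scans clean (`scan` returns an empty list) until `update_db` adds its signature, which is exactly the limitation noted above.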
An object of the present invention is to provide a technique that can detect new malicious executables whose signatures are not yet known. There are two main prior art approaches to this task: (a) static analysis of executables; and (b) dynamic analysis of executables.
The static analysis approach inspects the code of executables without actually running them, while the dynamic analysis approach monitors the executable during its execution in order to detect anomalous behavior.
The present invention proposes a new technique, within the dynamic analysis approach, for the detection of new, unknown malicious executables.
Traditionally, anomaly detection techniques based on the dynamic analysis approach have been used to detect new electronic threats (eThreats). These techniques build models of normal program behavior during a training phase, and then use the models to detect deviations from said normal behavior during a detection phase. For example, S. Forrest, “A Sense of Self for UNIX Processes”, Proceedings of the IEEE Symposium on Security and Privacy, Oakland, Calif., pages 120-128, 1996, introduces a simple anomaly detection technique based on monitoring the system calls issued by specific privileged processes. During a training phase, the system of Forrest records short sequences of system calls that represent normal process behavior into a “normal dictionary”. During a later detection phase, sequences of actual system calls are compared with said normal dictionary, and an alarm is issued if no match is found.
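The normal-dictionary scheme described above can be illustrated with a sliding window over a system call trace. This is a minimal sketch, not the published implementation: the window length `k`, the system call names used in practice, and the helper names `windows`, `train` and `detect` are illustrative assumptions.

```python
from typing import Iterable, List, Sequence, Set, Tuple

Window = Tuple[str, ...]


def windows(trace: Sequence[str], k: int) -> List[Window]:
    """All contiguous length-k windows (short sequences) of a system call trace."""
    return [tuple(trace[i:i + k]) for i in range(len(trace) - k + 1)]


def train(normal_traces: Iterable[Sequence[str]], k: int) -> Set[Window]:
    """Training phase: store every length-k window seen in normal traces."""
    dictionary: Set[Window] = set()
    for trace in normal_traces:
        dictionary.update(windows(trace, k))
    return dictionary


def detect(trace: Sequence[str], dictionary: Set[Window], k: int) -> List[Window]:
    """Detection phase: return the windows with no match in the normal dictionary.
    A non-empty result corresponds to raising an alarm."""
    return [w for w in windows(trace, k) if w not in dictionary]
```

Any window absent from the dictionary is flagged, so legitimate but previously unseen behavior is flagged as well; this is the source of the false alarms discussed for such techniques.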
Several data mining techniques for studying system call sequences have been proposed. W. Lee, S. J. Stolfo, and P. K. Chan, “Learning patterns from UNIX process execution traces for intrusion detection”, AAAI Workshop on AI Approaches to Fraud Detection and Risk Management, pages 50-56, AAAI Press, July 1997, and W. Lee and S. J. Stolfo, “Data mining approaches for intrusion detection”, Proceedings of the 7th USENIX Security Symposium, 1998, propose describing “normal” system call sequences by means of a generally small set of rules that cover the common elements of those sequences. During real-time detection, sequences that violate the rules are considered anomalies.
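The rule-based idea above can be sketched as follows. The cited papers learn their rules with a dedicated rule-induction algorithm; this sketch substitutes a much simpler, frequency-based stand-in, in which each rule states which calls may follow a given prefix of k-1 calls, and only prefixes seen at least `min_support` times become rules. The names `learn_rules`, `violations`, `k` and `min_support` are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Sequence, Set, Tuple

Prefix = Tuple[str, ...]


def learn_rules(normal_traces: Iterable[Sequence[str]], k: int,
                min_support: int) -> Dict[Prefix, Set[str]]:
    """Learn rules 'after these k-1 calls, the next call is one of S' (k >= 2).
    Only prefixes seen at least min_support times become rules, which keeps
    the rule set small and focused on common elements of normal traces."""
    prefix_counts: Counter = Counter()
    next_calls: Dict[Prefix, Set[str]] = defaultdict(set)
    for trace in normal_traces:
        for i in range(len(trace) - k + 1):
            prefix = tuple(trace[i:i + k - 1])
            prefix_counts[prefix] += 1
            next_calls[prefix].add(trace[i + k - 1])
    return {p: next_calls[p] for p, n in prefix_counts.items() if n >= min_support}


def violations(trace: Sequence[str], rules: Dict[Prefix, Set[str]],
               k: int) -> List[Tuple[Prefix, str]]:
    """Return (prefix, actual next call) pairs that contradict a learned rule."""
    found = []
    for i in range(len(trace) - k + 1):
        prefix = tuple(trace[i:i + k - 1])
        actual = trace[i + k - 1]
        # Prefixes not covered by any rule are not judged either way.
        if prefix in rules and actual not in rules[prefix]:
            found.append((prefix, actual))
    return found
```

Because rare prefixes are dropped, the rule set stays compact, but sequences starting with an uncovered prefix pass unchecked; the threshold trades rule-set size against coverage.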
The main advantage of said anomaly detection techniques is their ability to detect new, previously unencountered malicious code. Their main drawback is the need for complex and frequent retraining in order to distinguish “noise” and natural changes in programs from malicious code: legitimate program updates may result in false alarms, while malicious actions that appear normal may cause missed detections. Furthermore, most applications based on anomaly detection techniques identify malicious behavior of specific processes only.
Another technique based on the dynamic analysis approach has been proposed in T. Lee and Jigar J. Mody, “Behavioral Classification”, presented at the EICAR Conference, May 2006. Lee and Mody propose a malicious code classification technique based on clustering of system call sequences. In their technique, malicious programs of various classes are represented as sequences of system calls. A K-medoid clustering algorithm, as described in L. Kaufman and P. J. Rousseeuw, “Finding Groups in Data: An Introduction to Cluster Analysis”, New York: John Wiley & Sons, 1990, is applied to the sequences in order to map the input into a predefined number of different classes. The distance between two sequences is defined as the minimum “cost” required to transform one sequence of system calls into the other by applying a set of predefined operations. The process of Lee and Mody results in a classifier that includes a plurality of medoids, wherein each medoid is the best representative of its cluster. New objects are classified using the nearest neighbor method, as described in K. Beyer, J. Goldstein, R. Ramakrishnan, and U. Shaft, “When is ‘nearest neighbor’ meaningful?”, Proc. 7th Int. Conf. on Database Theory (ICDT'99), pages 217-235, 1999: a new object is compared to all medoids and receives the class label of the closest one.
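The clustering-and-classification pipeline described above can be illustrated under several assumptions. The distance uses unit costs for inserting, deleting or substituting one system call (the predefined operations and their costs are not specified above). The clustering is a simple alternating k-medoid variant with farthest-first initialization, not necessarily the exact algorithm of Kaufman and Rousseeuw. The function names and any class labels attached to the medoids are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Trace = Tuple[str, ...]


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Minimum number of single-call insertions, deletions and substitutions
    needed to turn sequence a into sequence b (assumed unit costs)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def k_medoids(items: List[Trace], k: int, max_iter: int = 20) -> List[Trace]:
    """Alternating k-medoid clustering. Assumes distinct items and k <= len(items)."""
    # Farthest-first initialization: start from the first item, then repeatedly
    # add the item farthest from all medoids chosen so far.
    medoids = [items[0]]
    while len(medoids) < k:
        medoids.append(max(items, key=lambda x: min(edit_distance(x, m) for m in medoids)))
    for _ in range(max_iter):
        # Assignment step: each item joins the cluster of its nearest medoid.
        clusters: Dict[Trace, List[Trace]] = {m: [] for m in medoids}
        for x in items:
            clusters[min(medoids, key=lambda m: edit_distance(x, m))].append(x)
        # Update step: the new medoid is the member with the smallest total
        # distance to the other members of its cluster.
        new_medoids = [min(members, key=lambda c: sum(edit_distance(c, o) for o in members))
                       for members in clusters.values()]
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return medoids


def classify(trace: Trace, labelled_medoids: Dict[Trace, str]) -> str:
    """Nearest-neighbour classification: return the label of the closest medoid."""
    closest = min(labelled_medoids, key=lambda m: edit_distance(trace, m))
    return labelled_medoids[closest]
```

For example, given two well-separated groups of traces (file-access calls and network calls), `k_medoids(items, 2)` returns one medoid from each group, and `classify` then assigns a new trace the label of whichever medoid is closest in edit distance.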
The technique above can be used to classify a given malicious code instance as belonging to one of a predefined number of classes, but cannot be used to detect new malicious code in real time.
It is therefore an object of the present invention to provide a general, real-time detection method and system that is more reliable than prior art methods and systems.
It is another object of the invention to provide a method that can detect new malicious code in any executable, and not only in specific, previously known programs.
Other objects and advantages will become apparent as the description proceeds.