The proliferation of computers, their interconnectivity and our reliance on them in our daily lives have given rise to the serious problem of the vulnerability of our computers and the information contained therein. Those who wish to take advantage of this vulnerability have developed software to carry out their malicious intent.
This malicious software, or “malware”, may take the form of “viruses”, “worms”, “Trojans”, “spyware” or other such software that can operate on a computer to compromise the integrity of the computer and the information contained therein. Such programs can cause the computer to cease operating, grant access to the computer's information to unintended parties, monitor the computer's operations and collect sensitive information input into the computer by the user, such as personal and business records and passwords. Such programs can also self-replicate and spread out from an infected computer to a non-infected computer, thus propagating their malicious actions and causing harm on a great scale.
Given the great potential for harm posed by malware, methods for detecting the presence of malware on a computer and for defusing the harmful effects of the malware have been developed.
Traditionally, there have been two approaches to detecting and defusing malware: (i) static analysis; and (ii) dynamic analysis.
Static analysis extracts information from the program without launching it. The subject file is processed and analyzed for characteristics of known malware using sets of signatures. However, most malware is packed or encrypted, and many different packers and encryptors are in use, which creates significant problems for this type of analysis. Modern malware packers and encryptors use polymorphism, which prevents reliable signature-based detection. As such, it is desirable to remove the packing or encryption first, before applying the signature. However, it is practically impossible to maintain unpacking or decrypting routines for each and every packer or encryptor and, in any event, doing so is costly and time consuming. Also, since the same packers and encryptors are used for both malware and non-malware, the use of this technique often results in a considerable percentage of false positives and wasted resources. To address this, the signatures are taken from packers and encryptors that use polymorphic algorithms to fight them. All of this requires the creation of more and more signatures that describe the same malware family.
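The effect of packing on signature matching can be illustrated with a minimal sketch. It is not taken from any actual detector: the one-byte XOR routine is a toy stand-in for a real packer or encryptor, and the names `xor_encode`, `matches_signature` and `SIGNATURE` are hypothetical.

```python
def xor_encode(data: bytes, key: int) -> bytes:
    """Toy stand-in for a packer/encryptor: XOR every byte with a one-byte key."""
    return bytes(b ^ key for b in data)


def matches_signature(sample: bytes, signature: bytes) -> bool:
    """Naive static check: does the raw file content contain the signature bytes?"""
    return signature in sample


# Hypothetical byte pattern standing in for a known-malware signature.
SIGNATURE = b"EVIL_PAYLOAD"
plain_sample = b"header" + SIGNATURE + b"trailer"

# The same payload "packed" with two different keys, loosely imitating the
# way a polymorphic packer produces a different byte image each time.
packed_a = xor_encode(plain_sample, 0x5A)
packed_b = xor_encode(plain_sample, 0x3C)

plain_hit = matches_signature(plain_sample, SIGNATURE)
packed_hits = [matches_signature(p, SIGNATURE) for p in (packed_a, packed_b)]

# Only after the matching unpacking routine is applied does the signature hit again.
unpacked_hit = matches_signature(xor_encode(packed_a, 0x5A), SIGNATURE)

print(plain_hit, packed_hits, unpacked_hit)
```

In this sketch the signature is found only in the unpacked image, so a scanner would need either an unpacking routine or a separate signature for every packed variant.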
A further static analysis technique that has been proposed builds signatures from the simplified control flow graph of the malware. This technique isolates all of the paths that may be traversed by the subject program and seeks to identify common paths between different samples of the same malware family. However, this technique uses a significant amount of computing time and is not scalable. As such, its effectiveness is limited.
Dynamic analysis requires the execution of the subject program and monitors the behaviour of the program during such execution. In proposed applications of dynamic analysis, the behaviour of the program is compared to signatures in a database or to sets of expert-created rules. When malware-like behaviour is identified, action can be taken to defuse the program.
The first problem with dynamic analysis is that permitting a potentially malicious program to run on a computer places the computer at risk. This has been addressed by running the program in an artificial environment that is isolated from the main computing environment and that seeks to emulate as many of the characteristics of the main computing environment as is reasonably possible. This “emulation” technique allows for the effects of malware to be isolated from the main computing environment. It is during such emulation that the behaviour of the malware is analyzed and methods for defusing it are determined.
Emulation presents its own problems. First, it is not possible to emulate the computer's entire operating system. As such, compromises must be made, which inherently limit emulation and permit some malware to defeat it. This also makes emulation suitable only for malware that is subsequently introduced to the computer and not for computers that are already infected with malware prior to the implementation of the detector. Second, emulation is a considerable drain on the computer's resources and takes a significant amount of time to perform. This limits the overall performance of the computer and, in some cases, defeats the detector's inherent purpose.
Another known method of employing dynamic analysis for the detection of malware involves the application of expert-created rules. In fact, the application of expert-created rules to an executing subject program is the dominant method in the industry for detecting malware. This is sometimes known as Host-Based Intrusion Prevention Systems or HIPS. Such rules can be applied to high-level events (such as Windows API, IRP or operating system callbacks) or to low-level events (such as system calls). The fundamental problem with this approach is the presence of false positives, which arise because the rules exist outside the context of the given program execution.
It is also necessary to consider that malware tries to counteract the restrictions imposed by HIPS. An example of such a counter-measure is the splitting of malware actions between different processes. The only way to mitigate that measure is to merge the different thread and process histories into a single context to which the rules will be applied. If a rule is poorly formed (such as when the behaviour it describes occurs statistically often in non-malware), a false positive will occur, and the probability of a false positive increases on the merged context. Other known counter-measures include discarding or compromising the program's history and damaging or compromising the list of rules. Furthermore, if malware modules are injected into an otherwise trusted process, HIPS will fail to notice the injected modules. As a result, HIPS-based analysis may cause false positives and missed detections.
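The trade-off between per-process and merged contexts can be sketched as follows. This is an illustration only, under the assumption that a rule is a set of events that must all occur within one context; the rule, the event names and the process identifiers are hypothetical and do not describe any particular HIPS product.

```python
from collections import defaultdict

# Hypothetical rule: flag a context that reads credentials, opens a socket
# and sends data. The event names are illustrative only.
RULE = {"read_credentials", "open_socket", "send_data"}


def rule_fires(events) -> bool:
    """Return True when every event named in the rule occurs in the context."""
    return RULE.issubset(set(events))


# (process id, event) pairs; the malicious behaviour is split across 101 and 102.
trace = [
    (101, "read_credentials"),
    (102, "open_socket"),
    (101, "write_temp_file"),
    (102, "read_temp_file"),
    (102, "send_data"),
]

# Per-process histories: neither process alone satisfies the rule.
per_process = defaultdict(list)
for pid, event in trace:
    per_process[pid].append(event)
per_process_hits = {pid: rule_fires(events) for pid, events in per_process.items()}

# Merged history: the split behaviour is now visible to the rule.
merged_hit = rule_fires(event for _, event in trace)

# Two unrelated benign programs, for example a password manager (201) and a
# browser (202), trip the same rule once their histories are merged.
benign_trace = [(201, "read_credentials"), (202, "open_socket"), (202, "send_data")]
benign_merged_hit = rule_fires(event for _, event in benign_trace)

print(per_process_hits, merged_hit, benign_merged_hit)
```

In this sketch merging recovers the split attack, but the same merged rule also fires on unrelated benign activity, which is the false-positive growth described above.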
It has been proposed to employ a dynamic analysis involving the creation of behaviour signatures to detect malware. Such behaviour signatures are proposed to be created from select groups of API calls or system calls generated by the subject program. However, signatures based on such limited sets of API and system calls are unreliable. For example, even if all API calls were analyzed, it would still be possible for the malware to generate system calls directly. Accordingly, any signature based on API calls is incomplete. On the other hand, if a signature is based only on limited groups of system calls, then it is vulnerable to missed detections and exploitation by malware producers. As such, to be completely effective, it is desired for all of the subject program's system calls to be analyzed.
It is also desirable to use the detector on computers that are already infected with malware prior to the implementation of the malware detecting program. In this case, the malware may already have been injected into some trusted processes. Accordingly, it is necessary to process the entire operating system. However, the average computer operating system can generate thousands of system calls each second for a single process and up to 200,000 system calls each second overall. Applying previously proposed dynamic analysis to this volume of calls would cause the operating system to stall.
Because the format of the behaviour signatures is important, signatures have been proposed that use the longest common subsequence (LCS). LCS patterns are obtained by comparing different malware samples of the same family. However, pure LCS-based signatures have certain drawbacks. First, each system call must have an accompanying thread identifier. A signature based on system calls without thread identifiers, such as a pure LCS-based signature, will be vulnerable to defeat by a task switch from the operating system scheduler. Also, attempts to spread actions across threads will not be detected. Furthermore, pure LCS-based signatures do not allow for “junk call” insertions or system call permutations. Malware will inevitably use “junk calls” to defeat pure LCS-based signatures. System call permutations can also happen as the result of operating system scheduler actions or of modifications to the malware, either of which will defeat pure LCS-based signatures.
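The derivation of an LCS pattern, and its sensitivity to call permutation, can be sketched as follows. This is an illustration under stated assumptions rather than the previously proposed method itself: the traces are invented, the system call names serve only as labels, the `lcs` helper is a textbook dynamic-programming routine, and a signature is assumed to match only when all of its calls appear in order.

```python
def lcs(a, b):
    """Return one longest common subsequence of two sequences."""
    m, n = len(a), len(b)
    # table[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # Walk the table from the start to recover the subsequence itself.
    out, i, j = [], 0, 0
    while i < m and j < n:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return out


# Invented system call traces for two samples of the same hypothetical family.
sample_a = ["NtOpenFile", "NtReadFile", "NtCreateSection", "NtWriteFile", "NtClose"]
sample_b = ["NtOpenFile", "NtQueryInformationFile", "NtReadFile", "NtWriteFile", "NtClose"]

# The signature is what the two samples have in common, in order.
signature = lcs(sample_a, sample_b)

# A variant that swaps two calls no longer contains the whole signature in order.
variant = ["NtOpenFile", "NtWriteFile", "NtReadFile", "NtClose"]
matched = len(lcs(signature, variant))
fully_matched = matched == len(signature)

print(signature, matched, fully_matched)
```

Here the two samples yield a four-call signature, while the permuted variant matches only three of the four calls in order and so escapes a strict in-order match.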
Considering the limitations of the existing techniques for detecting and defusing malware, it is desired to have a system and method for detecting the presence of malicious software on a computer and for defusing the malicious software before it can operate to cause undesirable effects on the computer. It is desired that such a system and method analyze all of the system calls of the operating system in real time on the main computer without emulation or the need for unpackers or decrypters, detect the malware without the use of HIPS or control flow graphs, using signatures resistant to OS scheduler actions, malware counter-measures and malware modifications, and defuse the malware before it can operate to cause harm to the main computer. It is further desired that such a system and method operate efficiently with respect to computer resources and time and be able to detect and defuse both known and previously unknown malware on computers that are infected before or after introduction of the malware detector to the computer.