At present, antivirus software can identify malware by several methods, such as signature matching and heuristic analysis. These methods are effective as long as malware growth is not aggressive and antivirus companies can deliver the necessary updates to their antivirus systems.
The significant growth in the number of Internet users over the past decade, fueled by advances in Internet services such as gaming, news, entertainment, shopping, banking, and social networking, has led to a significant increase in the emergence of new types of malware. The number of new malicious programs being detected has increased more than tenfold in the last three years alone, and the rate of growth continues to increase. Thus, antivirus software developers have been working hard to keep up with the proliferation of new kinds of malware by developing new methods and systems for the detection of malicious software.
As a result of this development, techniques of signature matching and heuristic analysis of malware have become widespread and most frequently used in antivirus applications and other computer and network security products. However, these techniques have limitations.
Signature matching methods are oriented exclusively towards the detection of known software objects and are poorly suited to the detection of previously unknown types of malware. This is because they are based on a comparison of the values of hash functions computed from small sections of a file. By virtue of the cryptographic properties of hash functions, a change in even one bit of the input data completely changes the output. Because of this, an insignificant mutation in a malware object can make the object unknown to the antivirus system. Two main parameters characterize an antivirus system. The first parameter is the detection rate of unknown malware, which is determined as the number of detected malware objects divided by the number of known malware objects. The second parameter is the false positive rate, which is determined as the number of safe objects detected as malware objects divided by the number of all scanned safe objects.
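The following minimal Python sketch illustrates both points made above: a one-bit mutation defeats a hash-based signature lookup, and the two parameters reduce to simple ratios. The choice of SHA-256, the sample bytes, and all function names are assumptions made for illustration only; they are not taken from any particular antivirus product.

```python
import hashlib


def section_signature(data: bytes) -> str:
    """Hash one section of a file (SHA-256 is an assumed choice)."""
    return hashlib.sha256(data).hexdigest()


def flip_one_bit(data: bytes, bit_index: int = 0) -> bytes:
    """Return a copy of data with exactly one bit inverted."""
    mutated = bytearray(data)
    mutated[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(mutated)


def detection_rate(detected_malware: int, known_malware: int) -> float:
    """Detected malware objects divided by known malware objects."""
    return detected_malware / known_malware


def false_positive_rate(safe_flagged: int, safe_scanned: int) -> float:
    """Safe objects flagged as malware divided by all scanned safe objects."""
    return safe_flagged / safe_scanned


# Hypothetical section of a known malicious file and its signature database.
original = b"example section of a malicious file"
known_signatures = {section_signature(original)}

# A single flipped bit yields a different hash, so the lookup misses.
mutated = flip_one_bit(original)
original_is_known = section_signature(original) in known_signatures
mutated_is_known = section_signature(mutated) in known_signatures
print(original_is_known, mutated_is_known)
```

With these hypothetical inputs, the unmodified section is found in the signature set while the mutated one is not, even though the two byte strings differ in a single bit.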
Heuristic analysis methods also have deficiencies in the detection of unknown malware: first, they take longer to run than signature methods; and second, they are already close to the limit of their capabilities, providing a detection rate of 60-70%. Heuristic analysis methods are also prone to false detections.
There are other methods of malware detection besides those described above. A method for detecting malicious software by identifying malicious structural features, decryption code, and cryptographic functions is described in U.S. Patent Application No. 20050223238 A1. A malicious structural feature is identified by comparing a known malicious structural feature to one or more instructions of the executable file. Harmless code (for example, API functions) is deleted to obtain a unique structural feature, which is only 10-20% of the original code. The structural feature of the object is compared with malicious structural features to determine if the object is malicious.
Another way to identify malware is to compare the functionality of the analyzed software with malware functionality using a portion of the execution path of the object (see FIG. 7).
FIG. 7 illustrates an example of a portion of the execution path of a worm program, i.e., a computer program that replicates but does not infect other files; instead, it installs itself on a victim computer and then looks for a way to spread to other computers (for more information, visit http://www.securelist.com/ru/glossary?glossid=152527951). The execution path of the worm program consists of determining the network address of the computer system, establishing a connection with the computer system, and transmitting data. Thus, the execution path shows which functions are called, and with what parameters, during program execution, which is useful for analyzing whether the program is malicious.
The analysis of program graphs can also be very useful for detecting malicious objects. The graph vertices are sets of assembler functions. Two vertices are connected by an edge if and only if there is a call from one of them to the other. There are two types of program graphs: a program flowchart of the object and a function call-graph of the object, illustrated in FIG. 8 and FIG. 9, respectively. FIG. 8 illustrates an example program flowchart of a Trojan-horse program (for more information, visit http://www.securelist.com/ru/glossary?letter=244#gloss152528302). As shown, the vertices are sets of assembler functions executed before a JMP instruction. FIG. 9 illustrates a portion of a function call-graph of a worm program. In this case, the vertices are certain procedures or functions of a high-level language. Among them, it is possible to separate standard or API functions 920 from unique ones 910, which were written by the programmer of the program.
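As a rough illustration of a function call-graph and of separating standard functions from unique ones, consider the Python sketch below. The call pairs, the function names, and the set of API names are hypothetical and are not taken from FIG. 9; the sketch only shows one possible representation.

```python
def build_call_graph(calls):
    """Build an adjacency map: an edge caller -> callee exists for each call."""
    graph = {}
    for caller, callee in calls:
        graph.setdefault(caller, set()).add(callee)
        graph.setdefault(callee, set())  # every callee is also a vertex
    return graph


def split_vertices(graph, api_names):
    """Separate standard/API functions from functions unique to the program."""
    standard = {vertex for vertex in graph if vertex in api_names}
    unique = set(graph) - standard
    return standard, unique


# Hypothetical calls observed in a worm-like program.
calls = [
    ("main", "find_targets"),
    ("find_targets", "gethostbyname"),
    ("main", "spread"),
    ("spread", "connect"),
    ("spread", "send"),
]
# Assumed list of standard API functions.
API_NAMES = {"gethostbyname", "connect", "send"}

graph = build_call_graph(calls)
standard, unique = split_vertices(graph, API_NAMES)
print(sorted(standard))
print(sorted(unique))
```

Representing the graph as a mapping from each vertex to the set of vertices it calls keeps the edge test ("is there a call from one to the other") a simple set lookup.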
The program execution path method is described in U.S. Patent Application Nos. 20070240215A1 and 20070094734A1 and in U.S. Pat. No. 7,093,239 B1. The patented method describes a program execution path that is formed when the application starts. A drawback of the method is that it can be performed only in a virtual environment, which increases computer resource consumption. The malware detection process in the mentioned patent and patent applications is based on determining the resemblance between the analyzed object and one or more malicious objects. Different approaches are used to determine the resemblance between two objects, for example, the Levenshtein distance.
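For reference, the Levenshtein distance mentioned above can be sketched in Python as follows. The function works on any two sequences, so it can compare strings or lists of called functions. The normalized `resemblance` score and the sample execution paths are assumptions added for illustration; the cited documents may define resemblance differently.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions
    needed to turn sequence a into sequence b."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                        # deletion
                current[j - 1] + 1,                     # insertion
                previous[j - 1] + (item_a != item_b),   # substitution or match
            ))
        previous = current
    return previous[-1]


def resemblance(a, b):
    """Assumed normalization: 1.0 for identical sequences, 0.0 for fully different."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


# Hypothetical execution paths expressed as lists of called functions.
known_path = ["gethostbyname", "connect", "send"]
analyzed_path = ["gethostbyname", "connect", "send", "closesocket"]

print(levenshtein("kitten", "sitting"))        # 3
print(levenshtein(known_path, analyzed_path))  # 1
print(resemblance(known_path, analyzed_path))  # 0.75
```

Keeping only the previous row of the dynamic-programming table limits memory use to the length of the second sequence.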
The U.S. Patent Application No. 20070136455A1 describes a method for classifying an application into an application group which is previously classified in a knowledge base. The U.S. Patent Application No. 20090631001 describes a method for detection, classification and reporting of malicious software based on analysis of the software activity log using the behavior patterns.
The known methods are not yet applicable to other types of objects, for instance, script files and documents. Another disadvantage is that some malicious objects may be detected only as suspicious objects because the distance between them exceeds the threshold value.
There is another problem connected with response times to new threats. Sometimes an antivirus system cannot determine whether an object is malicious. In this case, the antivirus system sends information about the object (name, size, hash function, etc.) to the antivirus server. The response time depends on whether such an object is present on the antivirus server. Known antivirus systems determine malicious objects based on analysis of the entire object. The antivirus server receives thousands of requests. To reduce traffic, it would be better to transmit only the necessary information rather than entire objects.
The detection efficiency of existing methods directly depends on the absence of vulnerabilities in them. While a method is undisclosed, there are almost no exploits that use its vulnerabilities. In the future, the vulnerabilities of the method will be revealed, and the use of such a method will no longer be as effective. One example is the reduced efficiency of line-by-line comparison analysis for malware detection. After this analysis was studied, new types of malware appeared that used dynamic code generation and encryption. FIG. 13A illustrates an example of a modification of program code that causes line-by-line comparison analysis to fail. Shown are two sets of code lines of two Trojan-horse programs, which have the same functionality but different file names. Both objects perform similar malicious actions, but when their code lines are compared for an exact match, practically all lines are found to be different. Thus, if the exact line-by-line comparison does not give a positive result, approximate string matching may be used. FIG. 13B illustrates an example of code obfuscation that prevents detection of a malicious object using analysis of function call-graphs of the program. In the specified list of functions being called, criminals have included dummy code that does not affect the end result of the program. The direct comparison of the provided function call-graphs fails to identify the program as known malware because the function call-graphs are too dissimilar. Therefore, such garbage instructions make call-graph analysis difficult to use for detecting unknown malware, and other malware analysis methods should be used.
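The contrast between exact line-by-line comparison and approximate string matching can be sketched in Python as shown below. The code lines are hypothetical stand-ins and do not reproduce FIG. 13A; the use of `difflib.SequenceMatcher` from the Python standard library and the 0.7 threshold are assumptions chosen only for illustration.

```python
from difflib import SequenceMatcher


def exact_matches(lines_a, lines_b):
    """Count positions where the two programs have identical lines."""
    return sum(1 for a, b in zip(lines_a, lines_b) if a == b)


def approximate_matches(lines_a, lines_b, threshold=0.7):
    """Count positions where the lines are similar enough (assumed threshold)."""
    return sum(
        1
        for a, b in zip(lines_a, lines_b)
        if SequenceMatcher(None, a, b).ratio() >= threshold
    )


# Hypothetical code lines of two programs that differ only in file names.
program_a = [
    "copy self to system dir as alpha.exe",
    "add autorun key for alpha.exe",
    "delete alpha.tmp",
]
program_b = [
    "copy self to system dir as beta.exe",
    "add autorun key for beta.exe",
    "delete beta.tmp",
]

exact = exact_matches(program_a, program_b)
approximate = approximate_matches(program_a, program_b)
print(exact, approximate)
```

In this sketch no line matches exactly, yet every pair of lines passes the approximate comparison, which mirrors the situation described for the two Trojan-horse programs.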
Therefore, new methods for detection of unknown malware are necessary.