The rapid development of computer technologies over the past decade, together with the widespread use of diverse computing devices (e.g., personal computers, notebooks, tablets, smartphones and the like), has been a powerful stimulus to the use of these devices in diverse areas of activity and for a tremendous number of tasks (e.g., from Internet surfing to bank transfers and electronic document circulation). In parallel with the growth in the number of computing devices and the software running on them, the number of harmful programs (also known as malicious software or malware) has also grown significantly.
There are many different kinds of harmful programs. Some of them steal personal and confidential data from users' devices, such as logins and passwords, banking information, electronic documents and the like. Others turn user devices into so-called botnets for attacks such as distributed denial of service (DDoS) or for brute-force guessing of passwords of other computers or computer networks. Still others push paid content on users (e.g., adware) through intrusive advertising, paid subscriptions, sending SMS messages to premium-rate numbers, etc.
Specialized antivirus programs are used to fight harmful programs, that is, to detect harmful programs, prevent infection and restore computer systems infected by harmful programs. To detect the full diversity of harmful programs, antivirus programs employ various technologies. One is “signature analysis”, which is a search for matches between a particular section of code of the program being analyzed and known code (a signature) from a database of signatures of harmful programs. Another is “heuristic analysis”, which emulates the working of the program being analyzed, creates emulation logs (containing, e.g., data on API function calls, the parameters passed, the code sections of the program being analyzed, and so on), and searches for matches between the data of the created logs and the data from a database of emulations of harmful programs. Yet other technologies use “white lists” and “black lists”, which involve searching for the calculated checksum of the program being analyzed (or portions thereof) in a database of checksums of harmful programs (black lists) or a database of checksums of legal programs (white lists). Finally, proactive protection technologies intercept the API function calls of a program being analyzed while it is running in the system, create logs of the working of that program (containing data on API function calls, the parameters passed, the code sections of the program being analyzed, and so on), and search for matches between the data of the created logs and the data from a database of calls of harmful programs.
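As a rough illustration of signature analysis and of checksum-based white and black lists, the following Python sketch searches a program's code for a known byte signature and looks up the program's checksum in the two lists. The signature name and byte pattern are hypothetical, and SHA-256 is assumed as the checksum purely for illustration; real antivirus databases use their own formats and matching engines.

```python
import hashlib

# Hypothetical signature database: signature name -> byte pattern from known malware.
SIGNATURES = {
    "Example.Trojan.A": bytes.fromhex("deadbeef90909090"),
}


def match_signatures(code: bytes, signatures: dict) -> list:
    """Signature analysis: return the names of all signatures found in the code."""
    return [name for name, pattern in signatures.items() if pattern in code]


def classify_by_checksum(code: bytes, black_list: set, white_list: set) -> str:
    """Look up the code's checksum (SHA-256 hex digest, an assumed choice)
    in a black list of harmful programs and a white list of legal programs."""
    digest = hashlib.sha256(code).hexdigest()
    if digest in black_list:
        return "harmful"
    if digest in white_list:
        return "legal"
    return "unknown"  # neither list knows this program; other technologies must decide


if __name__ == "__main__":
    sample = b"\x00\x01" + bytes.fromhex("deadbeef90909090") + b"\xff"
    print(match_signatures(sample, SIGNATURES))
    print(classify_by_checksum(sample, {hashlib.sha256(sample).hexdigest()}, set()))
```

A checksum lookup only recognizes byte-identical files, which is why it is combined with signature, heuristic and proactive techniques rather than used alone.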
In turn, harmful programs increasingly use methods to resist detection of their presence on infected computer systems by antivirus programs. To defeat signature analysis, they use code obfuscation, i.e., giving the source text (such as that of JavaScript scripts) or the executable code of programs a form that preserves their functionality yet resists analysis, understanding of their algorithms, and modification during decompilation. To defeat heuristic analysis, harmful programs exhibit more complicated behavior, including large numbers of operations or API function calls that do not affect the input/output results of the program yet disrupt its emulation by antivirus programs. Finally, to defeat proactive protection, such programs monitor the behavior of third-party programs, i.e., they continually monitor the behavior of third-party programs in the operating system, search for antivirus programs and take action against them (e.g., hiding or substituting their own code for analysis).
By using various tools, such as code generators (i.e., constructor programs able to automatically create harmful programs with specified functionality), obfuscators (i.e., programs able to alter the executable code of programs, complicating its analysis without changing its functionality), packers (i.e., program modules introduced into programs that encrypt the executable code of the programs and decrypt it at launch), and so forth, hackers are able to quickly and effortlessly create and spread a large number of new versions of their harmful programs that antivirus applications cannot detect.
To effectively detect harmful programs produced by the above-described methods, a technique is used in which a group of harmful programs (i.e., a cluster) sharing certain characteristics (for example, files of harmful programs packed by the same version of a packer, harmful programs with similar behavior, and so on) is collected. The files of the collected cluster are then searched for similar code sections, similar emulation log data, or similar behavior (for example, a sequence of API function calls). Finally, detection rules are created so that data from one harmful program of the cluster (such as its code sections) can be used to detect the other harmful programs of the cluster as well.
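The cluster-based approach above can be sketched, in a deliberately simplified form, as finding a byte sequence shared by every file in the cluster and using it as a detection rule. This Python sketch assumes fixed-length n-grams (with a hypothetical default of n = 8) as the "similar code sections" and picks one shared n-gram deterministically; it is an illustration of the idea, not the method of any particular product. The same intersection idea could be applied to sequences of API function calls taken from logs.

```python
from typing import List, Optional, Set


def common_ngrams(samples: List[bytes], n: int) -> Set[bytes]:
    """Return every n-byte sequence that occurs in all samples of the cluster."""
    if not samples:
        return set()

    def ngrams(data: bytes) -> Set[bytes]:
        return {data[i:i + n] for i in range(len(data) - n + 1)}

    common = ngrams(samples[0])
    for sample in samples[1:]:
        common &= ngrams(sample)  # keep only sequences shared with this sample too
    return common


def build_rule(samples: List[bytes], n: int = 8) -> Optional[bytes]:
    """Pick one shared n-gram as a hypothetical detection rule (None if nothing is shared)."""
    common = common_ngrams(samples, n)
    return min(common) if common else None  # min() makes the choice deterministic


def rule_matches(rule: bytes, data: bytes) -> bool:
    """A file matches the rule if it contains the shared code section."""
    return rule in data


if __name__ == "__main__":
    core = bytes.fromhex("cafebabe0102030405")
    cluster = [b"AAAA" + core + b"ZZ", b"B" + core, core + b"CCCCCC"]
    rule = build_rule(cluster)
    print(rule.hex(), rule_matches(rule, b"new variant " + core))
```

A rule built from a section common to the known members of a cluster can then flag new variants that reuse that section, even if other parts of the file differ.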
One significant limitation of this approach is that there is no universal solution for detecting harmful files independently of the platform on which the harmful program operates (for example, the ARM mobile architecture) or the type of data the harmful program consists of (such as a JavaScript script, JavaScript bytecode or compiled C++ code). Accordingly, it is often necessary to use clustering and detection-rule creation algorithms tailored to the specific platform and data type.
A large number of harmful programs actively exploit virtual machines (and their vulnerabilities) to propagate and carry out destructive activity on users' computers, especially virtual stack machines (such as the Adobe Flash or Java virtual machine). Detecting such harmful programs involves additional difficulties compared with detecting ordinary harmful programs on PCs, because the standard detection methods are either not applicable (due to the architectural features of virtual machines) or are slow or inefficient (having too low a detection rate). Therefore, there is a need for more effective techniques for detecting malware on virtual stack machines.