Harmful programs, known as malware, are becoming increasingly common and infect users' computers in various ways. Some malware, such as Trojan programs, is generally camouflaged as a harmless or useful program so that the user will download it onto the user's computer. Viruses and worms are capable of self-copying, which can result in the rapid spread of malware when no antivirus software is present on the computers.
Until recently, the overwhelming majority of harmful programs have been executable files, that is, files that contain machine code but may also contain pseudocode, such as bytecode or other instructions whose execution requires an interpreter. Examples of executable file formats include EXE and COFF. However, harmful files of other types, such as PDF or SWF, are now becoming increasingly common. This is because separate programs (such as Adobe Reader) are used to open such files, and the format of such a file is itself a kind of container holding resources that are used by the program which opens the file. Often such a resource is a malicious URL or a script (such as JavaScript).
FIG. 1 shows the structure of a PDF document, which includes four sections: a header, a body, a cross-reference table, and a trailer. In the general case, a PDF document can be represented as a hierarchy of objects (pages, images, scripts) that are stored in the body of the file, while the cross-reference table contains information about the location of these objects. Today it is possible to insert into the file body not just text but also a script, including a malicious one. Furthermore, popular programs that work with PDF (including Acrobat Reader itself) continue to have many vulnerabilities, and a document can be structured in such a way that, when it is opened, an exploit takes advantage of a vulnerability and initiates the execution of a malicious payload.
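By way of a non-limiting illustration, the four-section layout described above can be sketched in Python as follows. The function names `build_sample_pdf` and `outline_pdf`, the sample document, and the list of inspected name keys are assumptions introduced here for illustration only. The sketch scans the raw bytes with regular expressions and therefore does not see objects stored in compressed streams or names written with hexadecimal escapes, so it is not a detection method.

```python
import re

# Hypothetical list of PDF name keys that commonly accompany embedded
# scripts, automatic actions, or links; chosen for illustration only.
SUSPICIOUS_NAMES = (b"/JavaScript", b"/JS", b"/OpenAction", b"/Launch", b"/URI")


def build_sample_pdf() -> bytes:
    """Assemble a tiny illustrative PDF: header, body, xref table, trailer."""
    body = b"%PDF-1.4\n"  # header
    objects = [
        b"1 0 obj\n<< /Type /Catalog /OpenAction 2 0 R >>\nendobj\n",
        b"2 0 obj\n<< /S /JavaScript /JS (app.alert\\(1\\);) >>\nendobj\n",
    ]
    offsets = []
    for obj in objects:
        offsets.append(len(body))  # byte offset of each object in the file
        body += obj
    xref_pos = len(body)
    xref = b"xref\n0 3\n0000000000 65535 f \n"
    xref += b"".join(b"%010d 00000 n \n" % off for off in offsets)
    trailer = b"trailer\n<< /Size 3 /Root 1 0 R >>\n"
    trailer += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return body + xref + trailer


def outline_pdf(data: bytes) -> dict:
    """Locate the four sections of a PDF and count selected name keys."""
    header = re.match(rb"%PDF-(\d\.\d)", data)
    # Body: indirect objects of the form "<number> <generation> obj".
    objs = re.findall(rb"^(\d+) (\d+) obj\b", data, re.MULTILINE)
    # Cross-reference table: the keyword "xref" at the start of a line.
    xref = re.search(rb"^xref\s", data, re.MULTILINE)
    # Trailer: ends with "startxref <offset> %%EOF".
    declared = re.search(rb"startxref\s+(\d+)\s+%%EOF", data)
    counts = {}
    for name in SUSPICIOUS_NAMES:
        n = len(re.findall(re.escape(name) + rb"\b", data))
        if n:
            counts[name.decode()] = n
    return {
        "version": header.group(1).decode() if header else None,
        "object_ids": [(int(num), int(gen)) for num, gen in objs],
        "xref_offset": xref.start() if xref else None,
        "declared_xref_offset": int(declared.group(1)) if declared else None,
        "has_trailer": b"trailer" in data,
        "suspicious": counts,
    }


if __name__ == "__main__":
    print(outline_pdf(build_sample_pdf()))
```

In this sketch, the offset declared after `startxref` in the trailer is compared with the position at which the cross-reference table is actually found, and the presence of keys such as `/OpenAction` together with `/JavaScript` indicates a script that may run when the document is opened.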
Existing methods of detecting harmful files are ineffective for analyzing files of these different formats and in some instances cannot be used at all.