The term “malware” is short for malicious software and refers to any software designed to infiltrate or damage a computer system without the owner's informed consent. Malware can include viruses, worms, trojan horses, rootkits, adware, spyware and any other malicious or unwanted software. Many computing devices, such as desktop personal computers (PCs), laptops, personal digital assistants (PDAs) and mobile phones, can be at risk from malware.
Detecting malware is often challenging, as malware is usually designed to be difficult to detect, often employing technologies that deliberately hide its presence on a system. It is desirable, if possible, to prevent malware from being installed onto a computer in the first place, rather than having to detect and remove it once it has been installed. A common method used by creators of malware to bypass anti-virus software and firewalls installed on a computer is to embed the malicious executable code into a document such as a PDF, Excel™ or Flash™ file. A vulnerability in the software used to open the document can then be exploited by the attacker, allowing malware to be installed on the computer system. This is known as a “document exploit”. In recent years, there have been several vulnerabilities in Adobe Reader™ and Adobe Flash Player™, as well as in Microsoft Office™ software such as Excel™, Word™ and PowerPoint™. For example, a recent trend has been to embed malicious Flash objects within Microsoft Office files, as these are easy to deliver as email attachments and are generally mistakenly trusted by recipients, who will open them without much concern. By sending the malware embedded in a document, the attacker no longer requires the malware binary to be downloaded, for example from a known malicious server on the internet, and therefore increases the chances of evading anti-virus and Intrusion Detection System (IDS) security.
Typically, when a user opens a malicious document (i.e. one that has malicious code embedded within it) on a computer, this triggers “shellcode” to be executed. Shellcode is a small piece of code that is also embedded in the document. It is executed by the code that exploits a vulnerability in the software used to open the document. The shellcode attempts to find the malicious document that has been opened, and once it has been found, the shellcode can extract the embedded malicious data from the document. Once extracted, the malware can be run, and the computer will be infected.
A current method of detecting such malicious documents is to analyse the code that makes up the document. For example, analysis of the code may include searching for known sections of code that are indicative of known malware or known malicious shellcode. However, code analysis has significant limitations, examples of which include the relatively long time taken to carry out the analysis and the high processing resources required to do so. This can degrade the experience of the end user. In addition, malicious code can be obfuscated, making it difficult to detect. Furthermore, as shellcode is relatively easy to code (when compared to the malware), an extremely large number of shellcode variants can exist, many of which are likely to be unique and never seen before.
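By way of illustration only, the kind of code analysis described above can be sketched as a simple search for known byte sequences. The following Python sketch is not taken from any particular anti-virus product; the signature names, the byte patterns and the function name scan_bytes are all hypothetical and chosen purely for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical signature database. The names and byte patterns are
# illustrative assumptions, not real malware signatures.
SIGNATURES: Dict[str, bytes] = {
    "example-nop-sled": b"\x90" * 16,
    "example-marker": b"EVIL_MARKER",
}


def scan_bytes(
    data: bytes, signatures: Dict[str, bytes] = SIGNATURES
) -> List[Tuple[str, int]]:
    """Return (signature name, offset) for every occurrence of a known pattern."""
    hits: List[Tuple[str, int]] = []
    for name, pattern in signatures.items():
        offset = data.find(pattern)
        while offset != -1:
            hits.append((name, offset))
            # Continue searching just past the start of the previous match.
            offset = data.find(pattern, offset + 1)
    # Report matches in the order in which they appear in the document.
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    document = b"header" + b"EVIL_MARKER" + b"trailer"
    print(scan_bytes(document))

    # The same payload XOR-encoded with a single byte no longer matches.
    obfuscated = bytes(b ^ 0x5A for b in document)
    print(scan_bytes(obfuscated))
```

The second call in the example reflects the limitation noted above: even a trivial encoding of the embedded code is enough to defeat a fixed pattern, and every byte of the document must still be examined against every signature, so the cost grows with both the document size and the number of signatures.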