The most basic defense against malware in documents is to avoid executable code altogether (e.g., by removing it). For example, documents represented in the Portable Document Format (PDF) may include embedded interactive elements written in the JavaScript programming language, and malware may be avoided by removing or blocking that JavaScript code. Another defense is to disable execution of such code in the software application (e.g., PDF reader software) that reads the document. However, these approaches degrade usability because they also prevent the execution of harmless, useful features. For example, interactive forms may no longer function.
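As an illustration of the blocking approach, the following minimal sketch flags a PDF file whose raw bytes contain the name keys `/JS` or `/JavaScript`, which PDF uses for JavaScript actions. The function names are hypothetical, and the scan is deliberately naive: it is assumed here that the relevant objects are not stored in compressed streams, in which case the markers would not be visible in the raw bytes.

```python
import re

# PDF name keys associated with JavaScript actions. The lookahead avoids
# matching longer names that merely start with "JS" (e.g., "/JSON").
JS_NAME_PATTERN = re.compile(rb"/(?:JavaScript|JS)(?![A-Za-z0-9])")

# PDF names may encode any character as "#" plus two hex digits,
# so "/J#61vaScript" is the same name as "/JavaScript".
HEX_ESCAPE = re.compile(rb"#([0-9A-Fa-f]{2})")


def decode_name_escapes(data: bytes) -> bytes:
    """Replace each #xx escape with the single byte it encodes.

    Assumption: applying this to the whole file is acceptable for a sketch;
    a real parser would decode escapes only inside name objects.
    """
    return HEX_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), data)


def has_javascript_markers(pdf_bytes: bytes) -> bool:
    """Return True if the raw bytes contain a JavaScript-related name key."""
    return JS_NAME_PATTERN.search(decode_name_escapes(pdf_bytes)) is not None
```

A reader or gateway that rejects every document for which `has_javascript_markers` returns True would also reject documents whose scripts are benign, which is the usability cost described above.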
A common method for identifying malware, used by anti-virus software, is to search files for signatures or patterns of known malicious files. However, document-related malware often obfuscates its malicious code to evade detection methods that match the text of the executable code against a signature or pattern. Therefore, signature-based and pattern-based methods may be effective only against known malware and may be vulnerable to new attacks. Other methods for detecting malware analyze the behavior of the executable code while it executes in a sandbox environment. However, there is no guarantee that malicious behavior will be observed within the sandbox (e.g., the malicious behavior might be triggered by a combination of factors not present in the sandbox).
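The weakness of text-based signature matching can be illustrated with a minimal sketch. The signature list, the function name, and both script samples below are hypothetical and chosen only for illustration; they are not taken from any real anti-virus product or malware sample.

```python
# Hypothetical signature: a byte string assumed to appear in a known sample.
SIGNATURES = [b"eval(unescape("]


def matches_signature(code: bytes, signatures=SIGNATURES) -> bool:
    """Return True if any known signature occurs verbatim in the code."""
    return any(sig in code for sig in signatures)


# A script that contains the signature text verbatim.
plain = b"eval(unescape('%61%6c%65%72%74'))"

# The same calls, with the function names assembled at run time by string
# concatenation, so the signature text never appears contiguously.
obfuscated = (
    b"var f = 'ev' + 'al'; "
    b"this[f](this['un' + 'escape']('%61%6c%65%72%74'))"
)
```

In this sketch, `matches_signature(plain)` returns True while `matches_signature(obfuscated)` returns False, even though both scripts are intended to invoke the same functions. This is the sense in which signature matching may be limited to known, unobfuscated malware.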