The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.
Modern file specifications may allow a single file to include different types of content. For example, a HyperText Markup Language (“HTML”) file may include data stored in an eXtensible Markup Language (“XML”) format and/or one or more JavaScript instructions. A file that includes multiple types of content may be referred to as a mixed content file. Each content type may include data or instructions.
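As an illustration only, the notion of a mixed content file can be sketched in a few lines of Python. The sketch assumes that embedded content is carried in HTML script elements and that the element's type attribute names the content type; the names ContentTypeScanner and is_mixed_content are hypothetical and are not drawn from any particular viewer program.

```python
from html.parser import HTMLParser


class ContentTypeScanner(HTMLParser):
    """Collects the content types found in an HTML document (illustrative only)."""

    def __init__(self):
        super().__init__()
        # The enclosing document itself is HTML.
        self.content_types = {"text/html"}

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            # Assumption: a <script> element without a type attribute holds JavaScript.
            script_type = dict(attrs).get("type") or "text/javascript"
            self.content_types.add(script_type.lower())


def is_mixed_content(html_text):
    """Returns True when more than one content type is found in the file."""
    scanner = ContentTypeScanner()
    scanner.feed(html_text)
    scanner.close()
    return len(scanner.content_types) > 1


# Hypothetical sample: an HTML file carrying XML data and a JavaScript instruction.
sample = """<html><body>
<script type="application/xml"><order><item>book</item></order></script>
<script>document.title = "hello";</script>
</body></html>"""

print(is_mixed_content(sample))
```

In this sketch, the sample file would be reported as a mixed content file because it carries HTML, XML data, and a JavaScript instruction, whereas a file containing only HTML markup would not.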
Viewer programs, such as browsers, may be sophisticated software programs that process the data and execute the instructions in a mixed content file. Viewer programs may support a wide range of functionality to allow content providers to create rich user experiences and provide critical functionality.
Unfortunately, a malicious user may embed, in a file, malware that exploits functionality supported by the viewer program. Detecting malware in a file may be difficult for many reasons: instructions that are used for legitimate purposes may also be used maliciously; and malicious instructions (also referred to herein as malware) may be intermingled with legitimate instructions that provide additional functionality or improve a user's experience. For example, a file may include one or more legitimate instructions which, when executed by a viewer program, cause the viewer program to send data entered by a user to a different computer, such as a printer. However, the same file may include one or more similar instructions which, when executed by a viewer program, cause the viewer program to gather data entered by a user and send the data to an online database for the malicious user to use or sell.