Attackers are known to use several file- or document-based techniques for attacking a victim's computer. Known file-based attacks may exploit the structure of a file or document and/or vulnerabilities in a platform or document specification. Some file-based attacks use active content embedded in a document, file, or communication to cause an application to execute malicious code, or to enable other malicious activity, on a victim's computer when the file is rendered. Active content may include any content embedded in an electronic file or document that is configured to carry out or trigger an action. Common forms of active content include macros, formulas, or scripts in word processing documents and spreadsheets; JavaScript code within Portable Document Format (PDF) documents; web pages including plugins, applets, or other executable content; and browser or application toolbars and extensions. Some malicious active content is invoked automatically to perform its intended malicious functions when a computer runs a program or application to render (e.g., open or read) the received content, such as a file or document. One such example is a macro embedded in a spreadsheet and configured to execute automatically, taking control of the victim's computer as soon as the user opens the spreadsheet, without any additional action by the user. Active content used by hackers may also be invoked in response to some other action taken by a user or a computer process.
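As a rough illustration of what embedded active content looks like at the file level, the following Python sketch flags two common indicators: a VBA macro part inside an Office Open XML (ZIP-based) document, and PDF name objects such as `/JavaScript` or `/OpenAction` that are associated with scripts or automatic actions. The function name `find_active_content_indicators` and the marker list are hypothetical. This is a coarse heuristic under stated assumptions, not a parser, and it is easily evaded (for example, PDF names may be hex-escaped or stored in compressed object streams).

```python
import io
import zipfile

# Hypothetical marker list: PDF name objects commonly associated with
# JavaScript or with actions that run when the document is opened.
PDF_ACTIVE_NAMES = (b"/JavaScript", b"/JS", b"/OpenAction", b"/AA", b"/Launch")


def find_active_content_indicators(data: bytes) -> list[str]:
    """Return coarse indicators of embedded active content in a file's raw bytes."""
    findings = []
    if data.startswith(b"%PDF"):
        # Plain substring search; misses hex-escaped or compressed names.
        for name in PDF_ACTIVE_NAMES:
            if name in data:
                findings.append("pdf:" + name.decode("ascii"))
    elif data.startswith(b"PK\x03\x04"):
        # OOXML documents (.docm, .xlsm, ...) are ZIP archives; VBA macros
        # are stored in a part whose name ends with vbaProject.bin.
        try:
            with zipfile.ZipFile(io.BytesIO(data)) as archive:
                for part in archive.namelist():
                    if part.lower().endswith("vbaproject.bin"):
                        findings.append("ooxml-macro:" + part)
        except zipfile.BadZipFile:
            findings.append("malformed-zip")
    return findings


pdf_bytes = b"%PDF-1.7\n1 0 obj << /OpenAction << /S /JavaScript /JS (app.alert(1)) >> >> endobj"
print(find_active_content_indicators(pdf_bytes))
# ['pdf:/JavaScript', 'pdf:/JS', 'pdf:/OpenAction']
```

A check of this kind only reports that active content is present; it says nothing about whether that content is malicious, which is part of why such content is difficult to screen.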
Another file-based attack uses shellcode embedded in a file to take control of a victim's computer when the computer runs a program to open or read the file. Shellcode is a small piece of program code that hackers embed in a file to exploit vulnerable computers. It is called “shellcode” because it typically starts a “command shell” to take control of the computer, though any piece of program code or software that performs a malicious task, such as taking control of a computer, can be called “shellcode.”
Most shellcode is written in machine code, the low-level instruction encoding executed directly by a processor, because the vulnerability being exploited gives an attacker access to a process executing on the computer only at that low level. Shellcode in an infected or malicious file is typically encoded or embedded in byte-level data, the basic unit of information in the file. At this level, actual data or information of the file (e.g., a pixel value of an image) and executable machine code are indistinguishable. In other words, whether a data unit (i.e., one or more bytes or bits) represents a pixel value of an image file or executable shellcode typically cannot be readily determined by examining the byte-level data.
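To illustrate why byte-level data alone does not reveal intent, the following Python sketch reads the same five bytes two ways: as 8-bit grayscale pixel intensities and as x86 instructions (32-bit mode). The opcode table is a deliberately tiny, hypothetical subset of single-byte x86 opcodes chosen for this example; arbitrary data would require a real disassembler.

```python
# Tiny illustrative subset of single-byte x86 opcodes (32-bit mode).
X86_ONE_BYTE = {0x90: "nop", 0x50: "push eax", 0x58: "pop eax", 0xC3: "ret", 0xCC: "int3"}


def as_pixels(data: bytes) -> list[int]:
    # An 8-bit grayscale image stores one intensity (0-255) per byte.
    return list(data)


def as_instructions(data: bytes) -> list[str]:
    # The same bytes read as opcodes; bytes outside the table are shown as raw data.
    return [X86_ONE_BYTE.get(b, f"db 0x{b:02x}") for b in data]


data = bytes([0x90, 0x90, 0x50, 0x58, 0xC3])
print(as_pixels(data))        # [144, 144, 80, 88, 195]
print(as_instructions(data))  # ['nop', 'nop', 'push eax', 'pop eax', 'ret']
```

Both readings are equally valid; which one applies depends entirely on how the consuming process treats the bytes.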
Indeed, shellcode is typically crafted so that the infected or malicious file appears to be, and in many cases functions as, a legitimate file. Additionally, the shellcode embedded in an infected or malicious file may never be executed by some software applications, in which case the infected file may appear to be a legitimate file posing no threat to a computer. That is, an infected or malicious image file, for example, may be processed by an application executing on a computer to display a valid image and/or to “execute” the byte-level data as “machine code,” taking control of the computer or performing other functions dictated by the shellcode. Thus, whether a process executing on a computer interprets a byte or sequence of bytes of a file as information of the file, or instead executes it as malicious machine code, depends on a vulnerability in the targeted application process executing on the computer.
Shellcode is therefore often created to target one specific combination of processor, operating system and service pack, called a platform. Additionally, shellcode is often created as the payload of an exploit directed to a particular vulnerability of targeted software on a computer, which in some cases may be specific to a particular version of the targeted software. Thus, for some exploits, due to the constraints put on the shellcode by the target process or target processor architecture, a very specific shellcode must be created. However, it is possible for one shellcode to work for multiple exploits, service packs, operating systems and even processors.
Attackers typically use shellcode as the payload of an exploit that targets a vulnerability in an endpoint or server application, triggering a bug that leads to “execution” of the byte-level machine code. The actual malicious code may be contained within the byte-level payload of the infected file and, to be executed, must be made available in the application's process space, i.e., the memory allocated to the application for performing a desired task. This may be achieved by loading the malicious code into the process space, for example by exploiting a vulnerability in an application that is known to the shellcode developer. A common technique is a heap spray, which places copies of certain byte-level data of the file (e.g., portions of the embedded shellcode) at many locations in the memory allocated to an application process. Combined with a vulnerability in the application process, this can lead the processor to execute the shellcode payload.
Other file-based attacks are known and are generally characterized by the ability to control a victim's computer, or to perform malicious activity on it, when a user opens, executes, or renders a malicious document or file on that computer. Commonly, the user receives the malicious document or file via electronic communication, such as by downloading it from a remote repository, over the Internet, or via e-mail. Attackers are becoming increasingly sophisticated at disguising the nature of their attacks, making such attacks increasingly difficult to prevent using conventional techniques.
Computer systems are known to implement various protective tools at end-user computer devices and/or at gateways or access points to the computer system for screening or detecting malicious content before it is allowed to infect the computer system. Conventional tools commonly rely on the ability to identify or recognize a particular malicious threat, or characteristics known to be associated with malicious content or activity. For example, conventional techniques attempt to identify malicious files or content by screening incoming files at a host computer or server and comparing potentially malicious code against known malicious signatures. These signature-based malware detection techniques, however, cannot identify malicious files or content for which a malicious signature has not yet been identified. Accordingly, signature-based detection generally cannot identify new malicious exploits: a signature can only be created after an exploit has been discovered and analyzed, so the technique inevitably lags behind the crafty hacker. Furthermore, in most cases malicious content is embedded in otherwise legitimate files having proper structure and characteristics, and may also be disguised so that it appears innocuous. Thus, even when a document is inspected using known malware scanning techniques, its malicious content may be difficult to identify.
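The limitation of signature matching can be seen in a minimal Python sketch, assuming a hypothetical signature database that holds SHA-256 hashes of previously analyzed samples and byte patterns extracted from them. The sample contents, pattern, and function name are illustrative only. An exact copy, or a new file reusing a known pattern, is flagged; a variant that changes a single byte of the pattern matches neither signature and passes as clean.

```python
import hashlib

# Hypothetical signature database built from samples already analyzed.
KNOWN_SAMPLE = b"header|EVIL_PATTERN_01|trailer"
KNOWN_HASHES = {hashlib.sha256(KNOWN_SAMPLE).hexdigest()}
KNOWN_PATTERNS = (b"EVIL_PATTERN_01",)


def matches_signature(data: bytes) -> bool:
    """Flag data only if it matches an exact-hash or byte-pattern signature."""
    if hashlib.sha256(data).hexdigest() in KNOWN_HASHES:
        return True
    return any(pattern in data for pattern in KNOWN_PATTERNS)


print(matches_signature(KNOWN_SAMPLE))                        # True: exact copy
print(matches_signature(b"other|EVIL_PATTERN_01|file"))       # True: known pattern reused
print(matches_signature(b"header|EVIL_PATTERN_02|trailer"))   # False: one byte changed
```

Broader patterns catch more variants but raise the false-positive rate; no choice of pattern covers content that has never been analyzed.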
Another conventional approach uses behavior-based techniques or heuristics to identify characteristics of known malicious exploits or other suspicious activity or behavior, such as that associated with a heap spray attack. One such technique implements a “sandbox” (e.g., a secured, monitored, or virtual operating system environment) that can be used to execute untested or untrusted programs, files, or code without risking harm to the host machine or operating system. That is, conventional sandbox techniques may execute, or detonate, a file and then monitor the resulting operations, such as writes to disk, network activity, and the spawning of new processes, for suspicious behavior. This technique, however, also cannot identify new exploits targeting a (software) vulnerability that has not yet been identified, e.g., so-called zero-day exploits. Some sophisticated malware has also been developed to evade such “sandbox” techniques by halting, or skipping its malicious behavior, if it detects that it is running in such a virtual or monitored execution environment. Furthermore, clever hackers continually evolve their code to include delayed or staged attacks that may not be detected by evaluating a single file, for example, or that may lie in wait for a future, unknown process to complete the attack. Thus, in some situations it may be too computationally intensive or impracticable to identify some shellcode exploits using conventional sandbox techniques.
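The sandbox approach, and the evasion problem it faces, can be sketched in Python under simplifying assumptions: the monitored environment is assumed to report a list of observed events after detonation, and a file is flagged when enough of those events fall into a hypothetical set of suspicious kinds. The `Event` type, event kind names, and threshold are illustrative only and do not describe any particular sandbox product. A sample that detects the monitored environment and does nothing produces no events and is therefore scored as clean.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str    # hypothetical kinds: "file_write", "network_connect", "process_spawn"
    detail: str


# Hypothetical policy: event kinds treated as suspicious after detonation.
SUSPICIOUS_KINDS = frozenset({"process_spawn", "network_connect"})


def sandbox_verdict(events: list[Event], threshold: int = 2) -> str:
    """Score post-detonation behavior; flag the file if enough events are suspicious."""
    score = sum(1 for event in events if event.kind in SUSPICIOUS_KINDS)
    return "suspicious" if score >= threshold else "clean"


# Sample that runs its payload inside the sandbox.
detonated = [
    Event("file_write", "C:/Users/victim/AppData/dropper.tmp"),
    Event("process_spawn", "cmd.exe"),
    Event("network_connect", "203.0.113.7:443"),
]
# Evasive sample: detects the monitored environment and halts, so nothing is observed.
evasive: list[Event] = []

print(sandbox_verdict(detonated))  # suspicious
print(sandbox_verdict(evasive))    # clean
```

The verdict can only be as good as the behavior actually observed, which is why evasive, delayed, or staged attacks defeat this kind of scoring.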
Furthermore, because some malicious attacks are designed to exploit a specific vulnerability of a particular version of an application program, it is very difficult to identify a malicious file if that vulnerable version of the application program is not executed at the screening host computer or server. This creates additional problems for networks of computers that may run different versions of application or operating system software. For example, a shellcode attack may fail, and therefore go undetected, at a first computer whose application software does not include the targeted vulnerability; the malicious file may then be shared within the network and executed at a machine running the targeted, vulnerable version of the application software.
The present disclosure includes embodiments directed to solving problems rooted in the use of embedded or referenced malicious content generally, without regard to a specific vulnerability or to how the malicious content is configured to be invoked. The present disclosure further includes embodiments directed to addressing the problems and risks posed by malicious content generally, whether that content is considered active content, shellcode, or any other form of malicious content.