Shellcode is a small piece of program code that hackers may embed in a file and use to exploit vulnerable computers. Hackers typically embed shellcode in a file so as to take control of a computer when the computer runs a program to open or read the file. It is called “shellcode” because it typically starts a “command shell” to take control of the computer, though any piece of program code or software that performs a malicious task, such as taking control of a computer, can be called “shellcode.”
Most shellcode is written in a low-level programming language called “machine code” because the vulnerability being exploited gives an attacker access to a process executing on the computer at a correspondingly low level. Shellcode in an infected or malicious file is typically encoded or embedded in byte level data, the basic data unit of information for the file. At this data unit level of a file, actual data or information for the file (e.g., a pixel value of an image) and executable machine code are indistinguishable. In other words, whether a data unit (i.e., one or more bytes or bits) represents a pixel value for an image file or executable shellcode typically cannot be readily determined by examination of the byte level data.
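The ambiguity described above can be illustrated with a minimal Python sketch. The three byte values and the two-entry opcode table are chosen here purely for illustration and are not taken from any actual exploit; the sketch only shows that the same bytes read equally well as grayscale pixel intensities or as x86 instructions.

```python
# Three bytes that are valid both as image data and as x86 machine code.
# The values are illustrative assumptions, not taken from a real file.
data = bytes([0x90, 0x90, 0xC3])

# Interpretation 1: 8-bit grayscale pixel intensities (0 = black, 255 = white).
pixels = list(data)

# Interpretation 2: x86 opcodes, using a tiny hand-written lookup table.
# The table covers only the opcodes used in this example.
X86_MNEMONICS = {0x90: "nop", 0xC3: "ret"}
instructions = [X86_MNEMONICS.get(b, "unknown") for b in data]

print(pixels)        # [144, 144, 195]
print(instructions)  # ['nop', 'nop', 'ret']
```

Nothing in the bytes themselves indicates which reading is intended; that depends entirely on how the consuming process treats them.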
Indeed, shellcode is typically crafted so that the infected or malicious file appears to be a legitimate file and in many cases functions as a legitimate file. Additionally, an infected or malicious file including embedded shellcode may not be executable at all by some software applications, and thus the infected file may appear to be a legitimate file posing no threat to a computer. That is, an infected or malicious image file, for example, may be processed by an application executed on a computer to display a valid image and/or to “execute” the byte level data as “machine code” to take control of a computer or to perform other functions dictated by the shellcode. Thus, whether a process executing on a computer interprets a byte or sequence of bytes of a file as information of the file, or instead executes it as malicious machine code, depends on a vulnerability in a targeted application process executed on the computer.
Shellcode is therefore often created to target one specific combination of processor, operating system and service pack, called a platform. Additionally, shellcode is often created as the payload of an exploit directed to a particular vulnerability of targeted software on a computer, which in some cases may be specific to a particular version of the targeted software. Thus, for some exploits, due to the constraints put on the shellcode by the target process or target processor architecture, a very specific shellcode must be created. However, it is possible for one shellcode to work for multiple exploits, service packs, operating systems and even processors.
Attackers typically use shellcode as the payload of an exploit targeting a vulnerability in an endpoint or server application, triggering a bug that leads to “execution” of the byte level machine code. The actual malicious code may be contained within the byte level payload of the infected file, and to be executed, must be made available in the application process space, e.g., memory allocated to an application for performing a desired task. This may be achieved by loading the malicious code into the process space, which can be done by exploiting a vulnerability in an application known to the shellcode developer. A common technique includes performing a heap spray of the malicious byte level shellcode, which includes placing certain byte level data of the file (e.g., aspects of the embedded shellcode) at locations of allocated memory of an application process. This may exploit a vulnerability of the application process and lead the processor to execute the shellcode payload.
One known heap spray technique implemented by hackers includes embedding the payload of the malicious shellcode in an image file to be opened by a victim computer. An example of this technique is the CVE-2014-0322 exploit. This exploit stored the payload of the malicious machine code in a downloadable JPG image file. The payload of the JPG image file included legitimate image bytes together with bytes representing the actual malicious code that caused the victim computer to execute the first stage of the attack. Had the JPG image file been blocked or disarmed to prevent or disrupt execution of the malicious code, the attack could have been prevented.
Another example of a shellcode attack is the CVE-2014-0502 exploit in which, as a first stage of attack, shellcode was used as part of an exploit targeting a vulnerability in a version of the Adobe® Flash® Player application to download a malicious GIF file, which contained encrypted/encoded shellcode embedded within it. As part of a second stage of attack, the shellcode in the infected GIF file was eventually executed, leading to download of the actual backdoor that compromised the victim computer.
Another technique that has been used by hackers includes embedding shellcode in a file that does not itself contain the machine code that allows the hacker to take control of the computer. Instead, the executed shellcode points to another file or network location and directs the application process to load an executable side file (side channel) that allows the hacker to take control of the computer. One example of this is the CVE-2014-4114 exploit, which introduced a method of using a PowerPoint presentation that contained a remote or embedded image (e.g., slide1.gif) that is actually an executable file (i.e., a file with a PE header). The CVE-2014-4114 attack exploited a logic bug in application software to trigger the embedded image as an executable: slide1.gif was renamed to slide1.gif.exe, which was then automatically executed, leading to full control of the victim computer.
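The mismatch between a file's name and its actual content in an attack of this kind can be sketched as a simple magic-number check. The following Python example is a minimal illustration, not a description of any particular product: the function names `sniff_format` and `extension_matches_content` are hypothetical, and the check looks only at the leading bytes, relying on the facts that GIF files begin with `GIF87a` or `GIF89a` and that DOS/PE executables begin with the ASCII bytes `MZ`.

```python
def sniff_format(header: bytes) -> str:
    """Guess a file's real format from its leading bytes (magic numbers)."""
    if header.startswith((b"GIF87a", b"GIF89a")):
        return "gif"
    if header.startswith(b"MZ"):
        # DOS/PE executables begin with the ASCII bytes "MZ".
        return "pe-executable"
    return "unknown"


def extension_matches_content(filename: str, header: bytes) -> bool:
    """Return True when the filename extension agrees with the sniffed format."""
    claimed = filename.rsplit(".", 1)[-1].lower()
    return claimed == sniff_format(header)


# A file named like an image but starting with an executable header is flagged.
print(extension_matches_content("slide1.gif", b"MZ\x90\x00"))      # False
print(extension_matches_content("slide1.gif", b"GIF89a\x01\x00"))  # True
```

A check of this kind would catch only a disguised executable; it says nothing about shellcode embedded within a structurally valid image, such as in the two preceding examples.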
Each of the above attacks used shellcode exploits contained within image files that were run, opened, or downloaded by application software of the victim computer. Similar attacks may also be staged using files that include audio and/or video data, or other file types, and not just image data. In these examples, the malicious image files in some respect included legitimate image data that, but for a targeted vulnerability of particular application software, would not have resulted in execution of the embedded shellcode. The fact that a malicious image file can include legitimate data and be used in many respects as expected makes these kinds of attacks very difficult to prevent using conventional techniques.
For example, conventional techniques include attempts to identify malicious files by screening incoming files at a host computer or server based on a comparison of the possibly malicious code to a known malicious signature. These signature-based malware detection techniques, however, are incapable of identifying malicious files for which a malicious signature has not yet been identified. Accordingly, it is generally not possible to identify new malicious exploits using such techniques, as they lag behind the crafty hacker. Furthermore, in most cases, malicious shellcode is embedded in otherwise legitimate files having proper structure and characteristics, such that the files may not be detectable based on a signature-based comparison.
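The limitation of signature matching can be seen in a minimal Python sketch. Everything here is a hypothetical stand-in: `KNOWN_SIGNATURES` holds a harmless placeholder string rather than a real malware pattern, the image prefix is a dummy header, and single-byte XOR is used only as a simple example of an encoding step a hacker might apply to a new variant.

```python
# Hypothetical signature database: byte patterns from previously analyzed samples.
# The pattern is a harmless placeholder, not a real malware signature.
KNOWN_SIGNATURES = [b"KNOWN-BAD-PATTERN"]


def matches_signature(file_bytes: bytes) -> bool:
    """Flag a file if any known byte pattern occurs anywhere in it."""
    return any(sig in file_bytes for sig in KNOWN_SIGNATURES)


def xor_encode(data: bytes, key: int) -> bytes:
    """Re-encode bytes with a single-byte XOR key (stand-in for any encoding)."""
    return bytes(b ^ key for b in data)


# Stand-in for the legitimate image bytes that precede the embedded payload.
image_prefix = b"GIF89a" + bytes(16)

known_file = image_prefix + KNOWN_SIGNATURES[0]
new_variant = image_prefix + xor_encode(KNOWN_SIGNATURES[0], 0x5A)

print(matches_signature(known_file))   # True
print(matches_signature(new_variant))  # False
```

The re-encoded variant carries the same underlying content but no longer contains the stored byte pattern, so it passes the screen until a new signature is added.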
Another conventional technique is based on the use of behavior-based techniques or heuristics to identify characteristics of known shellcode exploits or other suspicious activity or behavior, such as that based on a heap spray attack. One such technique implements a “sandbox” (e.g., a type of secured, monitored, or virtual operating system environment), which can be used to virtually execute untested or untrusted programs, files, or code without risking harm to the host machine or operating system. That is, conventional sandbox techniques may execute or detonate a file while monitoring post-detonation damage or operations, such as writing to disk, network activity, or the spawning of new processes, for suspicious behavior. This technique, however, also suffers from the inability to identify new exploits for which a (software) vulnerability has not yet been identified, e.g., so-called zero-day exploits. Some sophisticated malware has also been developed to evade such “sandbox” techniques by halting or skipping execution if it detects that it is running in such a virtual execution or monitored environment. Furthermore, clever hackers continually evolve their code to include delayed or staged attacks that may not be detected from evaluation of a single file, for example, or that may lie in wait for a future unknown process to complete an attack. Thus, in some situations it may be too computationally intensive or impracticable to identify some shellcode exploits using conventional sandbox techniques.
Furthermore, because shellcode attacks are often designed to exploit a specific vulnerability of a particular version of an application program, it is very difficult to identify a malicious file if that vulnerable version of the application program is not executed at a screening host computer or server. This creates additional problems for networks of computers that may be operating different versions of application or operating system software. Thus, while a shellcode attack included in a file may be ineffective or go undetected at a first computer because its application software does not include the target vulnerability, the malicious file may then be shared within the network, where it may be executed at a machine that is operating the targeted vulnerable version of the application software.
Thus, there is a need for alternative techniques to prevent malicious shellcode attacks, including new zero-day exploits.