Application code is now typically delivered in compressed form, produced by a packing process. Packing can reduce binary file sizes and combine multiple files into one file. Modern packers create “self-extracting executables,” which may be executed to unpack their own contents. That is, the packed code is accompanied by an executable code section, or stub, that inflates or decompresses the packed code when executed. Accordingly, running a self-extracting executable can result in the packed executable being expanded on disk, in memory, or both.
When packing a file to create a self-extracting executable, many different types of compression algorithms and packing techniques may be employed. Some of these are well-known and documented while others are not. Employing different techniques on the same file to create a self-extracting executable will result in different files—both the packing code and the packed code may be different because of different packers and varying results from different compression algorithms. Further, if unknown or undocumented techniques are used to pack the file into a self-extracting executable, it may be difficult to even determine the distinction between the packing code and the packed code.
These characteristics of self-extracting executables are often exploited by malware developers to hide malware from antivirus programs or malware detection programs. One common method to detect malware is signature scanning. With signature scanning, files are scanned for bit patterns, or signatures, that are known or suspected to be associated with malware. When a bit pattern in a file matches a signature of known malware, then that file can be identified as being, or containing, malware. However, a signature of a malicious executable can be easily changed in an effort to obfuscate the executable. When malware is packed, detection may be avoided because the known signature of the unpacked malware will not match any bit pattern of the packed malware file.
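By way of illustration only, the effect described above can be sketched in a few lines of Python. The signature name, the byte pattern, and the use of plain zlib compression as the "packer" are assumptions made for this example; real signature databases and real packers are considerably more elaborate.

```python
import zlib

# Hypothetical signature database: name -> byte pattern (illustrative values only).
SIGNATURES = {
    "Example.TestPattern": b"\xde\xad\xbe\xef\x13\x37\xc0\xde",
}


def scan(data: bytes) -> list[str]:
    """Return the names of all signatures whose byte pattern occurs in data."""
    return [name for name, pattern in SIGNATURES.items() if pattern in data]


# A stand-in for an unpacked file that contains the known pattern.
unpacked = (
    b"header" + b"\x00" * 32
    + SIGNATURES["Example.TestPattern"]
    + b"\x00" * 32 + b"trailer"
)

# Packing (assumed here to be plain zlib compression) changes the stored bytes,
# so the pattern no longer appears verbatim in the packed file.
packed = zlib.compress(unpacked)

hits_unpacked = scan(unpacked)                     # pattern is found
hits_packed = scan(packed)                         # pattern is hidden
hits_after_unpack = scan(zlib.decompress(packed))  # found again once unpacked
```

In this sketch the scanner matches the original bytes and the decompressed bytes, but not the compressed form, which is why a scanner generally needs to unpack the file before signature matching can succeed.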
To overcome these efforts to hide malware, antivirus programs and malware detection programs may employ multiple techniques. One technique is to extract the packed code in memory without executing it and then scan the uncompressed binary for malware signatures. Packed code may be extracted by emulating its execution or, if the packing algorithm is known, by having the antivirus program perform the extraction itself. If the packing technique is not well known or documented, extracting the packed code under the control of the antivirus program may not be possible. Also, many packers use anti-emulation and anti-debugging techniques that simply terminate the unpacking process upon detecting that the unpacking is being performed by a debugger or through execution emulation. Time stamping parts of the code flow is a standard method for determining that code is being emulated. Similarly, whether code is being debugged can often be determined simply by querying the operating system.
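The time-stamping check mentioned above may be understood through the following minimal Python sketch. The function names, the threshold value, and the injectable clock are hypothetical and are introduced only for illustration; an actual packer would typically perform such a check in native code against a hardware or operating system timer.

```python
import time
from typing import Callable, Iterable


def looks_emulated(
    work: Callable[[], None],
    threshold_seconds: float,
    clock: Callable[[], float] = time.perf_counter,
) -> bool:
    """Timestamp before and after `work`; report True if it ran suspiciously slowly.

    Emulated or single-stepped execution usually takes far longer than native
    execution, so an elapsed time above the threshold is treated as a hint
    that the code is being analyzed.
    """
    start = clock()
    work()
    elapsed = clock() - start
    return elapsed > threshold_seconds


def fake_clock(values: Iterable[float]) -> Callable[[], float]:
    """Build a deterministic clock that returns the given readings in order."""
    readings = iter(values)
    return lambda: next(readings)


# Simulated native run: 1 ms elapsed, below a 0.5 s threshold.
native_result = looks_emulated(lambda: None, 0.5, clock=fake_clock([0.0, 0.001]))

# Simulated emulated run: 2 s elapsed, above the threshold.
emulated_result = looks_emulated(lambda: None, 0.5, clock=fake_clock([0.0, 2.0]))
```

The clock is passed in as a parameter only so that the sketch can be exercised deterministically; with the default clock the same function measures real elapsed time.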
Even if the self-extracting executable is allowed to execute or be emulated, an antivirus program may have difficulty in determining when the unpacking part of execution is complete and when the originally compressed executable begins execution. In a self-extracting executable, the unpacking code and the packed executable are part of the same binary, and determining the distinction between the two in memory can be difficult.
Another technique to overcome the efforts to hide malware is to add signatures of known malware-containing self-extracting executables to an antivirus signature database as each new packed-malware signature is identified. A weakness of this technique is that it may be easily avoided by slightly altering the packer code or the packing technique, resulting in a different self-extracting executable and thus a different signature. Adding signatures that account for these variations in packing techniques increases the size of the signature database, so the number of signatures and the difficulty of maintaining signature files increase correspondingly. These efforts may be further thwarted because the packing process can be repeated any number of times using different packing algorithms in different orders, creating an even greater number of signatures to identify and maintain.
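The growth in the number of signatures can be illustrated with a short Python sketch. The three standard-library compressors stand in for distinct packers, and a SHA-256 digest stands in for a signature; both are assumptions made for this example rather than a description of any particular antivirus product.

```python
import bz2
import hashlib
import itertools
import lzma
import zlib

# Three standard-library compressors used as stand-ins for different packers.
PACKERS = {"zlib": zlib.compress, "bz2": bz2.compress, "lzma": lzma.compress}


def pack_layers(data: bytes, order: tuple[str, ...]) -> bytes:
    """Apply the named packers one after another, innermost first."""
    for name in order:
        data = PACKERS[name](data)
    return data


# The same original payload is used for every packing order.
payload = b"the same original executable bytes " * 64

# Map each packing order (one, two, or three layers) to a digest of the result.
fingerprints: dict[tuple[str, ...], str] = {}
for depth in (1, 2, 3):
    for order in itertools.permutations(PACKERS, depth):
        packed = pack_layers(payload, order)
        fingerprints[order] = hashlib.sha256(packed).hexdigest()

distinct_count = len(set(fingerprints.values()))
```

With only three packers and no repeated layers, the single payload already yields fifteen packing orders, each producing a different packed file. Allowing repeated layers, additional packers, or small changes to packer parameters increases that number further.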
Because unpacking of the binary can be crucial for malware detection, malware clustering and classification, automated analysis, and automated reverse engineering, anti-malware software developers have tried various approaches to generic unpacking of malware, including PolyUnpack, Renovo, and OmniUnpack. However, previous heuristic approaches have limitations on the types of packing that can be unpacked, require considerable computational resources, and have high false positive rates. PolyUnpack and Renovo are based on variants of fine-grained analysis, which is very slow and is weak against the latest custom packers, while OmniUnpack has a complex implementation and produces very high false positive rates if used in raw form. Better generic unpacking approaches would be helpful.