Software developers want to protect their software from undesired change. Such changes can be manual, such as a malicious user bypassing a licensing check in commercial software, or automatic, such as a virus modifying a binary to include a copy of itself. To verify that it has not been modified, software can monitor its own code and change its execution when a modification to the code is detected (anti-tampering). To keep a malicious user from locating and disabling the self-checking code, the code is made difficult to understand, and the self-checking code is hidden within the rest of the application code (obfuscation). Unfortunately, attackers continue to thwart such checks using a variety of information gathered from the dynamic execution of the program. Protecting software from unauthorized tampering is a critical problem in modern software deployment. Billions of dollars in revenue are lost each year due to the efforts of malicious hackers and software pirates. For example, malicious users may modify software to bypass a licensing check in commercial software or alter programs to include a copy of a computer virus. Anti-tampering methods are also of great importance in the growing area of digital rights management, where tampering results in the loss of significant royalties and license fees.
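The self-checking idea described above can be sketched in a few lines. This is a minimal illustration only, not a method taken from the cited work: it assumes a CPython interpreter, and the names `code_digest`, `license_check`, `guarded_call`, and the key string are hypothetical. A digest of a function's compiled code is recorded, and execution changes (here, an exception is raised) when the digest no longer matches.

```python
import hashlib
from typing import Any, Callable


def code_digest(func: Callable[..., Any]) -> str:
    """Hash a function's bytecode and constants (CPython-specific attributes)."""
    code = func.__code__
    material = code.co_code + repr(code.co_consts).encode("utf-8")
    return hashlib.sha256(material).hexdigest()


def license_check(key: str) -> bool:
    """Hypothetical licensing check that an attacker might try to bypass."""
    return key == "VALID-KEY"


# Reference digest recorded while the code is known to be unmodified.
EXPECTED_DIGEST = code_digest(license_check)


def guarded_call(func: Callable[..., Any], expected_digest: str, *args: Any) -> Any:
    """Run func only if its code still matches the recorded digest."""
    if code_digest(func) != expected_digest:
        # Execution changes once a modification is detected.
        raise RuntimeError("code modification detected")
    return func(*args)
```

A check written this plainly is easy to locate and disable, which is why such checks are typically combined with obfuscation as described above.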
One popular method of protecting a program from modification is code obfuscation. Code obfuscation involves modifying computer code so that it is more difficult to understand, which in turn makes it more difficult for malicious hackers to determine what parts of a program to modify. While a powerful tool in the fight to make software more secure, existing obfuscation techniques unfortunately possess several drawbacks. Much of the work in this area only makes it more difficult to perform “static analysis” of programs. In other words, many techniques only deter knowledge of source code or object code but not knowledge of executing code (“dynamic analysis”), which can often be gained by piecing together the information from several different runs of the program. Many obfuscation strategies also involve extremely high overhead, which may be unacceptable for many people and prevent adoption of the security measure. Other obfuscation strategies require the use of special hardware or fail to present complete and implementable solutions.
Previous work suffers from a variety of major drawbacks, including but not limited to those discussed below.
Much work strives only to make the program hard to statically analyze [See 21, 33]. For example, an opaque predicate may be hard to analyze statically, but several runs of the program in a simulator can determine which branches are highly biased. That information can be fed into a static disassembler to identify the start of basic blocks. The same simulator runs can also help reveal the dynamically executed portions of the control flow graph. Combining information from several runs of the program can yield a highly accurate representation of the instructions in a program, the opaque predicates, and the control flow graph.
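The weakness described above can be illustrated with a small sketch. It is an assumption-laden toy rather than an example from the cited work: the predicate, the function names (`opaque_predicate`, `protected_step`, `profile_branch`, `branch_bias`), and the use of a simple counter in place of a simulator trace are all hypothetical. The predicate is always true because the product of two consecutive integers is even, yet a few profiled runs expose the branch as fully biased.

```python
from collections import Counter
from typing import Iterable


def opaque_predicate(x: int) -> bool:
    """Always True: x * (x + 1) is a product of consecutive integers, hence even."""
    return (x * (x + 1)) % 2 == 0


def protected_step(x: int, trace: Counter) -> int:
    """One obfuscated step; the else branch is a decoy that never executes."""
    if opaque_predicate(x):
        trace["taken"] += 1
        return x + 1  # real computation
    trace["not_taken"] += 1
    return x - 1  # decoy path


def profile_branch(inputs: Iterable[int]) -> Counter:
    """Stand-in for simulator runs: record how often each branch direction occurs."""
    trace: Counter = Counter()
    for x in inputs:
        protected_step(x, trace)
    return trace


def branch_bias(trace: Counter) -> float:
    """Fraction of executions in which the branch was taken."""
    total = trace["taken"] + trace["not_taken"]
    return trace["taken"] / total if total else 0.0
```

Calling `branch_bias(profile_branch(range(-50, 50)))` reports a fully biased branch, which is the kind of dynamic information an attacker could feed back into a static disassembler to discard the decoy path.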
Other work requires special hardware [See 28, 30, 24]. For example, a hardware device that stores a decryption key or a processor that only executes encrypted instructions can be used to guarantee that only programs that were generated with a proper encryption key can run. Unfortunately, specialized hardware may be expensive and not generally or widely available. Furthermore, users may reject hardware that is incapable of running a wide variety of programs.
Some previously proposed techniques have extremely high overhead [See 2]. In fact, some previous work imposes such an unreasonable execution overhead that an overhead measurement is not even suggested. Realistic, usable solutions must impose minimal overhead, or people will be unwilling to adopt the security measures.
Yet other work provides a threat model that does not meet the tamper-resistance needs of modern hardware, or provides only a partial or impractical solution [See 16, 18, 27]. Some previous techniques even assume that an optimal algorithm for performing a checksum calculation can be computed, that the optimal algorithm runs on a known hardware configuration (and even at a pre-calculated clock rate), and that the result can be transmitted over an unmonitored network within bounded time.
Thus, there is a continuing need for better ways to secure software, systems, and content by providing a mechanism for the protection of software from tampering and reverse engineering.