Ever since the days of early IBM PCs and 8-bit machines, application programmers and crackers have fought a never-ending battle in the field of software protection. Creating “uncrackable” programs remains an open problem, both theoretically and practically, especially against skilled and talented reverse engineers who devote significant time and effort to the job. However, various obfuscation and tamper-resistance techniques can be useful, depending on the piracy model in a given distribution channel: they can deter casual pirates or delay crackers who attempt to bypass security checks, attach viruses, patch code and data at runtime, and in general alter software behavior.
Developers have often treated software security as an independent feature that can easily be “plugged in” once an application is finished. Indeed, a large number of automatic “protectors” or “wrappers” are available to encrypt executables and add layers of anti-hacking code. Unfortunately, such plug-in solutions have typically proven quite breakable and ineffective, despite continuing upgrades and improvements to address hacks that appear within days or even hours. In certain theoretical models, automatic obfuscation of programs has been shown to be impossible. All this suggests that programmers who wish to create secure software need to be involved more deeply in its protection.
Programmers need to be involved in the protection process from the beginning. The programmer flags the variables, and the computations on them, that need protection and, whenever possible, indicates the degree of protection sought. The compiler may then pool these flagged portions with variables it generates itself, and it obfuscates and verifies common data types and operations. One goal is for accesses to protected data to leak very little information unless the attacker observes the program for a significant amount of time, across many local sections of code, as it executes. Tampering with a variable without the proper information leads to inconsistent or incorrect data or results. The programmer may also flag variables (e.g., a return address) as tamper-evident: such a variable is appended with a randomized checksum, akin to a cryptographic checksum, so that tampering with it is detected with high probability. OTL both provides protection tools and makes the programmer pay special attention to security-critical code. Additionally, OTL can inject “useless” data and code, both to disguise crucial data and to deflect the cracker's attention away from sensitive data and code; such “useless” data and code can be tightly interleaved with the rest of the application, including OTL-protected parts. To maximize security, OTL can be combined with other techniques we implemented, such as oblivious hashing, code-integrity checks, piecewise code encryption, anti-debugging, and others.
Software obfuscation is a widely used method for software vendors to thwart illegitimate attempts at reverse engineering their products. Past efforts have focused mostly on code obfuscation, leaving data largely in the clear. The tools available for hiding and protecting critical data invariably follow the decrypt-use-encrypt paradigm and are generally applied manually or semi-automatically by developers after the software is written. The following samples show how such a system is used. Sample 1 shows the unprotected starting point: a piece of code that manages a free-trial period. Every time this code is invoked, it checks a global counter, iTimeLeft; once its value has dropped below zero, the code asks the user to register the software. The counter is then decremented.
SAMPLE 1: A simple code sample to manage a free-trial period.

    extern int32 iTimeLeft;
    bool bTimeToRegister;

    bTimeToRegister = iTimeLeft < 0;
    if (bTimeToRegister) {
        // ask the user to register
    }
    iTimeLeft--;
Sample 2 shows the changes that result from applying a typical decrypt-use-encrypt method. First, the key variable, iTimeLeft, is stored in its encrypted (and expanded) form as a 64-bit integer. A macro, DECRYPT_AND_CHECK_MAC_INT32, decrypts the 64-bit data into a 32-bit clear-text value and checks its message authentication code (MAC). The clear-text time counter is then checked and decremented before it is re-encrypted and stored away.
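SAMPLE 2: The traditional decrypt-use-encrypt approach.

    extern int64 iTimeLeft;   // counter kept in encrypted (and expanded) form
    int32 iTemp;
    bool bTimeToRegister;

    iTemp = DECRYPT_AND_CHECK_MAC_INT32(iTimeLeft, MY_KEY);   // decrypt into clear text
    bTimeToRegister = iTemp < 0;
    if (bTimeToRegister) {
        // ask the user to register
    }
    iTemp--;
    iTimeLeft = ENCRYPT_AND_ADD_MAC_INT32(iTemp, MY_KEY);     // hypothetical re-encryption macro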
The decrypt-use-encrypt approach has two weaknesses. First, the protected variables appear in plaintext between the decryption and encryption stages. An attacker can discover what the program is doing by setting appropriate breakpoints and inspecting register contents. Although DRM systems often employ some form of anti-debugging, such measures have historically been easy to defeat; moreover, powerful debuggers, simulators, and in-circuit emulators can place the entire system at the attacker's disposal.
Second, data security tends to be applied as an afterthought: developers must explicitly insert macros or library function calls by hand. There is no guarantee that every access to secure data is bracketed by decryption and encryption macros; in fact, bugs have arisen because developers forgot to apply decryption macros before manipulating the secure data. Because the process is manual, decrypt-use-encrypt protection is applied to only a few key variables in a program.
What is needed is a systematic approach to data hiding and protection that meets the following criteria:
A non-intrusive methodology for annotating software during its development cycle, so that developers can adopt it easily and employ it efficiently. The underlying mechanisms and algorithms that support the indicated protection can be developed independently, in a modular way, and made available as programming-language tools and transformations (e.g., compilers).
Protected data objects are seldom manipulated in clear-text form.
Breaking data protection for one data object does not lead to cracking of other protected objects.
No automated tool can reverse engineer the data protection mechanisms. In particular, traditional program-flow analysis tools should not be able to discover the details of the exact protection mechanism used in a given copy of the protected application. The attacker has to work laboriously through the entire program, many times, to uncover the original data objects.
Oblivious (i.e., unobvious and well-disguised) comparison of data (e.g., is A = B?) and oblivious return of results (e.g., if A = B, return the result probabilistically in a variable C; if C does not have the right properties, it corrupts some protected data variable and leads to incorrect operation).