Some modern compilers, most notably the Java compiler from Sun Microsystems, are designed to compile source code (e.g., Java programs or Java applets) into sequences of instructions to be executed on a stack-based virtual machine. A key benefit of compiling source code for execution on a virtual machine is that any processor that can be programmed to implement the virtual machine, regardless of the processor's internal architecture, may execute the compiled code.
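For illustration only, a stack-based virtual machine of the kind described above can be sketched in a few lines. The instruction names (push, add, store, load, nop) and the tuple encoding are hypothetical and chosen for brevity; they are not the instruction set of the Java virtual machine or of any other real virtual machine.

```python
def run(stream):
    """Execute a list of instruction tuples on a toy stack machine.

    The instruction set here is an assumption made for this sketch.
    Returns the final operand stack and the variable table.
    """
    stack, variables = [], {}
    for ins in stream:
        op = ins[0]
        if op == "push":        # push a constant onto the operand stack
            stack.append(ins[1])
        elif op == "add":       # pop two operands, push their sum
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "store":     # pop the top of the stack into a named variable
            variables[ins[1]] = stack.pop()
        elif op == "load":      # push the value of a named variable
            stack.append(variables[ins[1]])
        elif op == "nop":       # null instruction: no effect
            pass
        else:
            raise ValueError(f"unknown instruction: {ins!r}")
    return stack, variables


# Instruction stream for the source statement "total = 2 + 3"
program = [("push", 2), ("push", 3), ("add",), ("store", "total")]
print(run(program))  # ([], {'total': 5})
```

Because the interpreter relies only on ordinary list and dictionary operations, the same instruction stream may be executed on any processor for which such an interpreter is available.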
When a typical compiler compiles a unit of human-readable source code into a stream of instructions for a virtual machine, the mechanically compiled virtual machine instructions can be deterministically transformed back into a version of the human-readable source code. This de-compilation of virtual machine instructions enables reverse engineering of the intellectual property embedded in the source code. After spending a large amount of time and resources developing a software program, developers do not want to distribute their applications publicly in a form that gives away their efforts.
Obfuscation is the process of transforming a stream of computer instructions into another stream of instructions that executes the same set of logical operations as the original stream but is more difficult to transform back into a version of the human-readable source code.
FIG. 1 shows one example of an obfuscation method according to one embodiment of the prior art. In operation 341, a typical compiler converts a unit of human-readable source code 302 into a virtual machine instruction stream 304, which can easily be de-compiled into a version of the human-readable source code. To obfuscate the virtual machine instruction stream 304, operation 343 breaks the stream 304 into a set of parts 310. These parts are transformed and padded with dummy instructions in operation 345. For example, part 316 is transformed into part 324, which is padded with dummy instructions 322. The transformations in operation 345 may include reversing loops, expanding loops, flow transformation, renaming identifiers, etc. After the transformation and padding, operation 347 assembles the set of transformed and padded parts 320 into a new instruction stream 330. The new instruction stream 330 is obfuscated and is more difficult to de-compile into a version of the human-readable source code than the mechanically compiled instruction stream 304.
For efficiency, dummy instructions 322 are not intended to be executed by a virtual machine. For example, null instructions may be used as the dummy instructions to change the patterns of mechanically compiled instruction streams, in order to prevent some software programs from de-compiling the instruction stream into a version of the human-readable source code.
FIG. 2 shows a block diagram of an obfuscation method according to one example of the prior art. Operation 202, corresponding to operation 343 in FIG. 1, breaks a virtual machine instruction stream into parts. Operation 204 transforms the parts, and operation 206 pads the transformed parts with dummy instructions. Operations 204 and 206 together correspond to operation 345 in FIG. 1. Operation 208, corresponding to operation 347 in FIG. 1, assembles the padded and transformed parts into a new instruction stream.
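The four operations of FIG. 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of any particular prior-art obfuscator: the tuple instruction encoding, the fixed part size, the choice of identifier renaming as the only transformation, and the use of a single null instruction as padding are all hypothetical.

```python
import random

NOP = ("nop",)  # hypothetical null instruction used as the dummy instruction


def split_into_parts(stream, part_size):
    """Operation 202: break the instruction stream into parts."""
    return [stream[i:i + part_size] for i in range(0, len(stream), part_size)]


def rename_identifiers(part, names):
    """Operation 204: one example transformation, renaming identifiers.

    `names` is shared across parts so each identifier is renamed consistently.
    """
    out = []
    for ins in part:
        if ins[0] in ("load", "store"):
            new_name = names.setdefault(ins[1], f"v{len(names)}")
            out.append((ins[0], new_name))
        else:
            out.append(ins)
    return out


def pad_part(part, rng, n_dummy=1):
    """Operation 206: pad a transformed part with dummy instructions."""
    padded = list(part)
    for _ in range(n_dummy):
        padded.insert(rng.randint(0, len(padded)), NOP)
    return padded


def obfuscate(stream, part_size=2, seed=0):
    """Operations 202 through 208 applied in sequence."""
    rng = random.Random(seed)
    names = {}
    parts = split_into_parts(stream, part_size)
    transformed = [rename_identifiers(p, names) for p in parts]
    padded = [pad_part(p, rng) for p in transformed]
    # Operation 208: assemble the padded, transformed parts into a new stream
    return [ins for p in padded for ins in p]


stream = [("push", 2), ("push", 3), ("add",), ("store", "total"), ("load", "total")]
print(obfuscate(stream))
```

In this sketch the output contains one dummy instruction per part at a position chosen by the seeded random generator, and the identifier total is replaced by an opaque name, while the logical operations of the original stream are unchanged.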
However, the obfuscation methods of FIGS. 1 and 2 are subject to attack. The parts, which are taken from a logically cohesive source, and the dummy instructions, which do not perform any logical operation, have distinct characteristics, which makes it possible to filter the dummy instructions out of the obfuscated instruction stream. Just as chaff can be seen and separated from wheat because of their different physical characteristics, the dummy instructions can be identified when an obfuscated instruction stream is compared to an instruction stream from a logically cohesive source. The dummy instructions may be shown to be garbage, or not producible from a valid source, and thus be detected and removed.
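The filtering attack described above can be sketched as a single pass over the obfuscated stream. The example assumes, hypothetically, that instructions are encoded as tuples and that the attacker already has a predicate recognizing instructions that perform no logical operation; here that predicate simply matches a null instruction.

```python
NOP = ("nop",)  # hypothetical null instruction used as padding


def strip_dummies(stream, is_dummy=lambda ins: ins == NOP):
    """Return the stream with every instruction recognized as a dummy removed.

    `is_dummy` is an assumed, attacker-supplied predicate; a real attack
    would need a test suited to the actual instruction set.
    """
    return [ins for ins in stream if not is_dummy(ins)]


obfuscated = [("push", 2), NOP, ("push", 3), ("add",), NOP, ("store", "v0")]
print(strip_dummies(obfuscated))
# [('push', 2), ('push', 3), ('add',), ('store', 'v0')]
```

Once the padding is removed in this way, only the transformed parts remain, and the protection rests entirely on the transformations applied to those parts.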
Since the transformations applied to the parts in operation 204 are chosen from a transformation library, a large pool of obfuscated virtual machine instruction streams may be processed to derive the transformation library. With a derived transformation library, an obfuscated instruction stream produced according to the methods of FIGS. 1 and 2 can be transformed back into a version of the human-readable source code once the dummy instructions are removed.