1. Field of the Invention
The present invention relates generally to software development systems and, more particularly, to a development system providing a methodology for hiding (steganographic embedding) information in a software program.
2. Description of the Background Art
Software is very easy to copy and distribute without any indication of the party to whom the original copy was licensed. This is a significant concern for electronic distribution of commercial software, since there is no way to distinguish the original software download provided by the vendor from a copy of that software offered for download by unauthorized parties. Consequently, illegally copied applications continue to be distributed on a wide scale over the Internet, costing software developers billions of dollars per year.
Digitally stamping software with some sort of identifier is one possible technique for detecting and tracing unauthorized copies of software packages. For example, licensee or license key information can be embedded into an executable in a variety of ways, ranging from appending the data to the executable in clear text, to encrypting the data appended to the executable, to appending the data and encrypting the entire executable. Besides allowing one to trace software, this information can be used to prevent software from being executed, manipulated, or copied. To date, however, such identification data is easy to remove and thus does not provide a sufficient obstacle to unauthorized copying and distribution of the software.
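The simplest of the stamping variants described above, appending licensee data to the executable in clear text, can be illustrated with a minimal sketch. The marker value and the function names (`stamp`, `read_stamp`, `strip_stamp`) are hypothetical and chosen only for illustration; they do not correspond to any particular product. The sketch also shows why such a stamp is a weak obstacle: it can be removed by truncating the file at the marker.

```python
from typing import Optional

# Hypothetical delimiter separating program bytes from the appended stamp.
MARKER = b"--LICENSEE--"


def stamp(executable: bytes, licensee: str) -> bytes:
    """Append the licensee identifier, in clear text, after the program bytes."""
    return executable + MARKER + licensee.encode("utf-8")


def read_stamp(stamped: bytes) -> Optional[str]:
    """Return the appended licensee identifier, or None if no stamp is present."""
    idx = stamped.rfind(MARKER)
    if idx == -1:
        return None
    return stamped[idx + len(MARKER):].decode("utf-8")


def strip_stamp(stamped: bytes) -> bytes:
    """Remove the stamp by truncating at the marker (what an attacker would do)."""
    idx = stamped.rfind(MARKER)
    return stamped if idx == -1 else stamped[:idx]


if __name__ == "__main__":
    original = b"\x4d\x5a\x90\x00 program bytes (placeholder)"
    copy = stamp(original, "Licensee 0042")
    print(read_stamp(copy))                 # identifier is readable by anyone
    print(strip_stamp(copy) == original)    # and trivially removable
```

Encrypting the appended data hides its contents but, under the same assumptions, does not prevent the truncation step.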
Another approach encodes data in an executable file (e.g., an .exe file on Microsoft Windows systems) by rewriting the machine opcodes using different but functionally equivalent instructions or instruction sequences. See, e.g., “Hydan: Hiding Information in Program Binaries” by Rakan El-Khalil and Angelos D. Keromytis, available via the Internet (currently available at www1.cs.columbia.edu/~angelos/Papers/hydan.pdf, and at www.crazyboy.com/hydan/), the disclosure of which is hereby incorporated by reference for purposes of indicating the background of the invention or illustrating the state of the art. Owing to their digital nature, computers essentially only understand “machine code,” i.e., the low-level, minute operational codes or instructions (“opcodes”) for performing specific tasks. Opcodes are therefore the executable binary instructions (the sequence of ones and zeros) that are interpreted as specific instructions by the computer's microprocessor, such as an Intel x86 microprocessor (e.g., Intel Pentium). The opcode-based approach to encoding or hiding data has the advantage that the embedded data is difficult to find (e.g., in a debugger tool). The technique of rewriting the opcodes is problematic, however, as changing the opcode sequence or stream may cause less-than-optimal instructions to be used, thus potentially degrading software performance. For example, a rewritten “jump” (JMP) instruction may take longer to execute than its original encoding. As another shortcoming, the data bandwidth that the technique can carry is very small. Rewriting the opcodes only allows a few additional bits to be accommodated over a given section of code, such as only 1 bit per 100 bytes. At that rate, embedding a 128-bit identifier would require roughly 12,800 bytes of code, so one would need a rather large executable file in order to embed just a modest amount of additional information using this technique.
The approach also suffers from being tied to a specific instruction set (e.g., x86 opcodes), and may even be tied to a specific model of a processor (e.g., dependent on Intel Pentium 4's flexibility with processing instructions). The technique is based on the assumption that one knows in advance the patterns that the compiler will produce. As a result, any subsequent optimizations or improvements in a compiler's processing that affect the opcode sequence will break the technique. Executables created with one version of a given compiler will likely be incompatible with executables that are created with a subsequent version of that compiler. Finally, the approach has the disadvantage that its use of unusual opcode sequences may in fact alert hackers, who then can attempt to decode the embedded bits.
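The equivalent-instruction rewriting discussed above can be illustrated with a toy model. The sketch below is not the Hydan implementation: it represents instructions as `(mnemonic, register, immediate)` tuples rather than real machine code, uses a single assumed equivalence (`add r, n` versus `sub r, -n`), and ignores side effects such as differences in processor flags. The names `embed` and `extract` are hypothetical. It shows both how a bit is carried by the choice of form and why capacity is limited to one bit per eligible instruction.

```python
from typing import List, Tuple

# Toy instruction: (mnemonic, register, immediate). Not real machine code.
Instr = Tuple[str, str, int]


def embed(program: List[Instr], bits: List[int]) -> List[Instr]:
    """Hide bits by choosing between 'add r, n' (bit 0) and 'sub r, -n' (bit 1)."""
    remaining = list(bits)
    out: List[Instr] = []
    for op, reg, imm in program:
        if op in ("add", "sub") and remaining:
            # Normalize to the 'add' form first so the result does not
            # depend on which form the compiler happened to emit.
            add_imm = imm if op == "add" else -imm
            bit = remaining.pop(0)
            out.append(("sub", reg, -add_imm) if bit else ("add", reg, add_imm))
        else:
            out.append((op, reg, imm))
    if remaining:
        raise ValueError("not enough eligible instructions to carry the payload")
    return out


def extract(program: List[Instr], count: int) -> List[int]:
    """Recover the first `count` hidden bits from the add/sub choices."""
    bits: List[int] = []
    for op, _reg, _imm in program:
        if len(bits) == count:
            break
        if op in ("add", "sub"):
            bits.append(1 if op == "sub" else 0)
    return bits


if __name__ == "__main__":
    program = [("mov", "eax", 1), ("add", "eax", 5),
               ("sub", "ebx", 3), ("add", "ecx", 7)]
    hidden = embed(program, [1, 0, 1])
    print(hidden)
    print(extract(hidden, 3))
```

In this model only three of the four instructions can carry data, and the extractor depends on knowing exactly which instruction forms are in play, which mirrors the dependence on a specific instruction set and compiler output noted above.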
What is needed is a technique for creating software in a manner that allows information hiding that is largely transparent to both developers and their end-users. In particular, such an approach should allow software to be conveniently installed and used by end-users, but at the same time support the embedding of hidden information that protects the software against unauthorized copying and distribution. Additionally, the approach should be fairly transparent to the software developer, and thus should not inject additional dependencies or incompatibilities into the development process. The present invention fulfills these and other needs.