1. Field of the Invention
The invention relates generally to protecting executable code against impermissible uses and more particularly to altering executable code to reduce the amount that can be learned from the executable code by decompiling or disassembling it.
2. Description of Related Art
As more and more of the devices attached to networks have become programmable, mobile code has become more and more important. Mobile code is code which is downloaded to a device attached to a network in the course of an interaction between a user of the device and the network (or another device attached to the network) and is then executed as part of the interaction. Mobile code is ubiquitous on the Internet. Many Web pages include mobile code written in the Java™ programming language or packaged as ActiveX controls. When the Web page is received in a browser, the mobile code is executed by the computer upon which the browser is running. Mobile code is also used to implement features in devices such as cellular telephones. When a user does something with the cellular telephone which requires the feature, mobile code for the feature is downloaded to the cellular telephone and then used in the interactions that involve the feature.
From the point of view of the owner of the intellectual property rights in a piece of mobile code, the very mobility of the code is a problem. In order to be useful, the code must be downloaded to the user; once it has been downloaded, it is available to the skilled user for study and reverse engineering. Using tools such as decompilers (programs which produce a high-level language version of a program, for example, a source code version, from an object code version), disassemblers (programs which produce an assembly-language program from an object code version), or debuggers (programs which permit a user to observe and manipulate another program as the other program executes), the skilled user can learn a great deal about the mobile code and can use what he or she learns to produce his or her own version of it.
A technique that has been widely used to make the study of programs generally and mobile programs in particular more difficult is obfuscation. To obfuscate a program, one rewrites it in a form which does not substantially affect the manner in which the program executes, but does make the program more difficult to study. For example, most of the entities in a program have names chosen by the programmer. Programmers generally choose the names with an eye to making the program more understandable for human readers of it. For the systems which are used to generate executable code from the program or to execute the code, though, it makes no difference whether a name is understandable. These systems require only that the name be used according to the rules of the relevant programming language. Thus, one way of obfuscating a program is to replace all of the names in the program with names that are legal in the programming language but as meaningless as possible to a human being reading the program. For a general discussion of obfuscation, see the published PCT application, WO 99/01815, Collberg, et al., Obfuscation techniques for enhancing software security, published 14 Jan. 1999.
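The renaming step described above can be sketched as follows. This is a minimal illustration, not the technique of the cited Collberg application or of any particular obfuscator: it assumes the programmer-chosen identifiers have already been collected, and it only builds the rename table (it does not rewrite source or class files). The class and method names (`NameObfuscator`, `meaninglessName`, `buildRenameMap`) are hypothetical. Each distinct name receives one short replacement so that every use of a name still refers to the same entity, and candidates that happen to be Java keywords (such as `do` or `if`) are skipped so the result stays legal in the language.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.lang.model.SourceVersion;

/** Hypothetical sketch: map programmer-chosen names to meaningless legal Java identifiers. */
public class NameObfuscator {

    // Returns the n-th name (0-based) in the sequence a, b, ..., z, aa, ab, ...
    static String meaninglessName(int n) {
        StringBuilder sb = new StringBuilder();
        n++; // bijective base-26: 1 -> "a", 26 -> "z", 27 -> "aa"
        while (n > 0) {
            n--;
            sb.append((char) ('a' + n % 26));
            n /= 26;
        }
        return sb.reverse().toString();
    }

    // Assigns each distinct identifier a fresh name, in order of first appearance,
    // skipping candidates such as "do" or "if" that are reserved Java keywords.
    static Map<String, String> buildRenameMap(List<String> identifiers) {
        Map<String, String> renames = new LinkedHashMap<>();
        int next = 0;
        for (String id : identifiers) {
            if (renames.containsKey(id)) {
                continue; // every occurrence of a name must get the same replacement
            }
            String candidate;
            do {
                candidate = meaninglessName(next++);
            } while (SourceVersion.isKeyword(candidate));
            renames.put(id, candidate);
        }
        return renames;
    }

    public static void main(String[] args) {
        Map<String, String> renames = buildRenameMap(
                List.of("computeInvoiceTotal", "taxRate", "customerAccount", "taxRate"));
        System.out.println(renames); // {computeInvoiceTotal=a, taxRate=b, customerAccount=c}
    }
}
```

A real obfuscator would additionally have to leave untouched any name it is not free to change, which is exactly the limitation the following paragraphs describe for names defined in the Java system classes.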
Many mobile programs are written in the Java programming language, developed by Sun Microsystems, Inc. and described in detail in Ken Arnold, et al., The Java Programming Language, Addison-Wesley Publishing Company, Reading, Mass., 1997. Programs written in the Java programming language are intended to be used in an infrastructure 101 of the type shown in FIG. 1. Writing a Java language program involves the portions of the infrastructure shown at 103 through 107. Java source code 103 is the Java language code as written by the programmer; Java compiler 105 is a program which generates Java byte code 107 from Java source code 103. Java byte code 107 is executable on any programmable device which includes a Java virtual machine. For a general discussion of the Java virtual machine, see Tim Lindholm and Frank Yellin, The Java Virtual Machine Specification, Addison-Wesley Publishing Company, Reading, Mass., 1999.
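One concrete reason byte code such as Java byte code 107 runs on any device with a Java virtual machine is that it is delivered in the standardized class file format defined in the Java Virtual Machine Specification: every class file begins with the magic number `0xCAFEBABE`, followed by minor and major version numbers. The sketch below reads that header from the class file of `java.lang.String` in the running JVM. It is illustrative only; the class name `ClassFileHeader` is hypothetical, and it assumes `Class.getResourceAsStream` can locate the `.class` resource of a top-level class, which holds for standard JDK runtime images but not necessarily for classes generated in memory.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Sketch: read the fixed header that begins every Java class file. */
public class ClassFileHeader {

    // Every class file starts with this four-byte magic number.
    static final int MAGIC = 0xCAFEBABE;

    // Returns {magic, major_version} read from the class file of a top-level class.
    static int[] readHeader(Class<?> c) {
        String resource = c.getSimpleName() + ".class";
        try (InputStream raw = c.getResourceAsStream(resource)) {
            if (raw == null) {
                throw new IllegalStateException("class file not found: " + resource);
            }
            DataInputStream in = new DataInputStream(raw);
            int magic = in.readInt();           // u4 magic
            in.readUnsignedShort();             // u2 minor_version (not used here)
            int major = in.readUnsignedShort(); // u2 major_version
            return new int[] { magic, major };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] header = readHeader(String.class);
        // The major version depends on the JDK in use; the magic number does not.
        System.out.printf("magic=%08X major_version=%d%n", header[0], header[1]);
    }
}
```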
Such a programmable device is shown at 111. Device 111 has two main hardware components: processor 113, which executes machine instructions 117, and memory 114, in which programs and data are stored. Included in the programs is Java virtual machine 115, which interprets the byte codes in Java byte code 107 to produce machine instructions 117. Programmable device 111 is connected to network 109, and Java byte code 107 is a mobile program which has been downloaded via network 109 from a server (not shown) upon which it was stored. As indicated above, byte code 107 may be part of an HTML page being interpreted by a Web browser.
In interpreting Java byte code 107, Java virtual machine 115 must resolve the names used in Java byte code 107. Some of the names in byte code 107 are defined by the Java infrastructure; others are defined in byte code 107. In the Java programming language, names are defined in class definitions; Java virtual machine 115 has access to two sets of class definitions: Java system classes 119, which are class definitions that are available to the Java virtual machine from sources other than byte code 107, and application classes 121, which are classes defined in byte code 107. Application classes 121, like the other data used in the execution of Java byte code 107, are stored in application runtime 123, an area private to the execution of Java byte code 107. The use of application runtime 123 ensures that an execution of byte code 107 will neither affect nor be affected by the execution of other Java byte codes. Moreover, application runtime 123 can be defined in a manner which limits the amount of control that byte code 107 may exercise over programmable device 111, and can thereby protect programmable device 111 from errors in byte code 107 and from malicious byte codes.
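In a standard Java virtual machine, one visible trace of the split between system classes and application classes is the class loader that defined each class: core classes such as `java.lang.String` are defined by the bootstrap class loader, for which `Class.getClassLoader()` returns `null`, while an application's own classes are defined by a separate, non-null loader. The sketch below is an illustration under that assumption and is not the terminology of FIG. 1; note that some platform classes (for example in `java.sql`) are loaded by a non-bootstrap platform loader on newer JDKs, so the check identifies bootstrap-loaded classes rather than every system class. The names `ClassOrigins`, `InvoiceCalculator`, and `isBootstrapLoaded` are hypothetical.

```java
/**
 * Illustrative sketch: core system classes come from the bootstrap class
 * loader, while an application's own classes come from a separate loader.
 */
public class ClassOrigins {

    // Hypothetical stand-in for an application class defined in downloaded byte code.
    static class InvoiceCalculator { }

    // Classes defined by the bootstrap loader report a null class loader.
    static boolean isBootstrapLoaded(Class<?> c) {
        return c.getClassLoader() == null;
    }

    public static void main(String[] args) {
        System.out.println(isBootstrapLoaded(String.class));            // true
        System.out.println(isBootstrapLoaded(InvoiceCalculator.class)); // false
    }
}
```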
The popularity of the Java programming language for mobile code is a result of the advantages offered by Java infrastructure 101. Because Java byte codes can be executed on any device with a Java virtual machine, Java byte codes are completely portable. Because application runtime 123 offers a protected execution environment for the byte codes, the byte codes may be safely executed on any of these devices. Infrastructure 101 does, however, have a significant disadvantage: Java byte codes are more difficult to protect against study and reverse engineering than other executable programs.
One reason for this is that a Java byte code and a Java virtual machine together contain far more information about the program than is available in the object code generally produced by compilers. Together, Java system classes 119 in the Java virtual machine and application classes 121 for a given Java byte code contain all of the information needed to define the symbolic names used in the Java byte code. Symbolic names include class, method, and field names. Some of the symbolic names are defined by the programmer for the particular application program and others are defined as part of the Java infrastructure. Because the name definitions are included in the byte code and the Java virtual machine, a programmer who is studying the byte code can use the Java reflection mechanism or a Java debugger to find out the complete class information for a particular program construct in the byte code.
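The reflection mechanism mentioned above can be sketched briefly. Given only a loaded class, the standard `java.lang.reflect` API reports every declared field and method, including private ones, together with their types and names. The class `LicenseChecker` below is a hypothetical application class standing in for one shipped in downloaded byte code, and `ReflectionProbe.describe` is an illustrative helper, not part of any library.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

/** Sketch: recover class structure from loaded byte code with the reflection API. */
public class ReflectionProbe {

    // Hypothetical application class; its private members are still visible to reflection.
    static class LicenseChecker {
        private String licenseKey;
        private int remainingDays;

        boolean isValid(String key) {
            return key != null && key.equals(licenseKey);
        }
    }

    // Describes every declared field and method (including private ones) by type and name.
    static List<String> describe(Class<?> c) {
        List<String> out = new ArrayList<>();
        for (Field f : c.getDeclaredFields()) {
            out.add("field " + f.getType().getSimpleName() + " " + f.getName());
        }
        for (Method m : c.getDeclaredMethods()) {
            StringBuilder params = new StringBuilder();
            for (Class<?> p : m.getParameterTypes()) {
                if (params.length() > 0) {
                    params.append(", ");
                }
                params.append(p.getSimpleName());
            }
            out.add("method " + m.getReturnType().getSimpleName() + " "
                    + m.getName() + "(" + params + ")");
        }
        return out;
    }

    public static void main(String[] args) {
        // The order of declared members is unspecified, so print them as found.
        describe(LicenseChecker.class).forEach(System.out::println);
    }
}
```

The output includes lines such as `field String licenseKey` and `method boolean isValid(String)`, which is why meaningful programmer-chosen names in byte code expose so much of a program's design.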
Another reason why Java byte code is difficult to protect is that when Java virtual machine 115 executes a Java byte code, it links the names in the byte code that are defined in the Java system classes to the definitions 119 of those classes in programmable device 111. The linking is done by matching the names in the byte code with names in the definitions 119. Consequently, the names defined in the Java system classes cannot simply be obfuscated in the byte code. If they were, virtual machine 115 could not find the corresponding definitions in system classes 119, and without those definitions it could not execute the byte code.
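The effect of matching by name can be illustrated with the reflective lookups `Class.forName` and `Class.getMethod`. These are used here only as stand-ins for the symbolic resolution that virtual machine 115 performs when it links references in byte code; they are not the linking mechanism itself. The class `NameLinking` and the renamed names `a.b.C` and `a` are hypothetical, representing what a naive obfuscator would produce if it renamed references to system classes.

```java
/**
 * Sketch: names are resolved by exact string match, so a renamed reference
 * to a system class or method no longer finds its definition.
 */
public class NameLinking {

    // Looks up a class by its fully qualified name.
    static boolean classResolves(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    // Looks up a public no-argument method by name on a given class.
    static boolean methodResolves(Class<?> owner, String methodName) {
        try {
            owner.getMethod(methodName);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(classResolves("java.util.ArrayList"));              // true
        System.out.println(classResolves("a.b.C"));                            // false: renamed system class
        System.out.println(methodResolves(java.util.ArrayList.class, "size")); // true
        System.out.println(methodResolves(java.util.ArrayList.class, "a"));    // false: renamed system method
    }
}
```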
It is an object of the present invention to overcome the above disadvantage of the Java infrastructure by providing improved techniques for obfuscating Java byte codes, including names in those byte codes that are defined in Java system classes.