1. Field of the Invention
The invention relates generally to techniques which protect code that is executable in a computer system from reverse engineering and/or modification. The invention relates more particularly to the use of obfuscation and watermarking to protect code that executes in an execution environment such as that provided by the Java platform.
2. Description of Related Art
As more and more of the devices attached to networks have become programmable, mobile code has become more and more important. Mobile code is code which is downloaded to a device attached to a network in the course of an interaction between a user of the device and the network (or another device attached to the network) and is then executed as part of the interaction. Mobile code is ubiquitous in the Internet. Many Web pages include mobile code written in the Java™ or ActiveX programming languages. When the Web page is received in a browser, the mobile code is executed by the computer upon which the browser is running. Mobile code is also used to implement features in devices such as cellular telephones. When a user does something with the cellular telephone which requires the feature, mobile code for the feature is downloaded to the cellular telephone and then used in the interactions that involve the feature.
From the point of view of the provider of a piece of mobile code, the very mobility of the code is a problem. In order to be useful, the code must be downloaded to the user; once it has been downloaded, it is available to the user not only for use, but also for illegitimate purposes such as copying, reverse engineering, and modification, including modification for the purpose of altering the interaction for which the code was downloaded. As modified, the code may permit the user to avoid paying a fee, to access restricted information, or to sabotage a network switch, to name just a few possibilities. Widely-available software tools such as decompilers (programs which produce a high-level language version of a program, for example, a source code version, from an object code version), disassemblers (programs which produce an assembly-language program from an object code version), or debuggers (programs which permit a user to observe and manipulate another program as the other program executes) make it relatively easy for the skilled user to study and modify the mobile code.
What has just been described is an example of the malicious host problem. The host is the processor upon which the downloaded mobile code executes; a host is malicious when it makes illegitimate use of the mobile code. The parents of the present patent application deal with two techniques for protecting mobile code from malicious hosts:
    obfuscation, in which code is rewritten in a form which does not substantially affect the manner in which the code executes, but makes it more difficult to study, decompile, or disassemble the program; and
    watermarking, in which information is hidden in the code which does not affect how the code executes, but makes it possible to detect whether the code has been altered.
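By way of illustration only, obfuscation in the sense just described can be pictured with the following minimal sketch. The class and method names (FeeCalculator, totalInCents, A, a) are hypothetical and are not taken from the parent applications; the sketch shows only the renaming of application-defined names, which is one simple form of obfuscation.

```java
// Readable version: the names reveal what the code is for.
class FeeCalculator {
    static int totalInCents(int unitPriceInCents, int quantity) {
        return unitPriceInCents * quantity;
    }
}

// Obfuscated version (hypothetical): identical behavior, but the names
// no longer tell a reader anything about the purpose of the code.
class A {
    static int a(int b, int c) {
        return b * c;
    }
}

class ObfuscationDemo {
    public static void main(String[] args) {
        // Both versions compute the same value for the same inputs.
        System.out.println(FeeCalculator.totalInCents(250, 4));
        System.out.println(A.a(250, 4));
    }
}
```

The two classes execute in the same manner; only the effort required to understand the second one differs.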
In the parent applications, obfuscation and watermarking techniques are applied to code written in the Java programming language, developed by Sun Microsystems, Inc. and described in detail in Ken Arnold, et al., The Java Programming Language, Addison-Wesley Publishing Company, Reading, Mass., 1997. Programs written in the Java programming language are intended to be used in an infrastructure 101 of the type shown in FIG. 1. Writing a Java language program involves the portions of the infrastructure shown at 103 through 107. Java source code 103 is the Java language code as written by the programmer; Java compiler 105 is a program which generates Java byte code 107 from Java source code 103. Java byte code 107 is executable on any programmable device which includes a Java virtual machine. For a general discussion of the Java virtual machine, see Tim Lindholm and Frank Yellin, The Java Virtual Machine Specification, Addison-Wesley Publishing Company, Reading, Mass., 1999.
Such a programmable device is shown at 111. Device 111 has two main hardware components, processor 113, which executes machine instructions 117, and memory 114, in which programs and data are stored. Included in the programs is Java virtual machine 115, which interprets the byte codes in Java byte code 107 to produce machine instructions 117. Programmable device 111 is connected to network 109 and Java byte code 107 is a mobile program which has been downloaded via network 109 from a server (not shown) upon which it was stored. As indicated above, byte code 107 may be a part of an HTML page being interpreted by a Web browser.
In interpreting Java byte code 107, Java virtual machine 115 must interpret Java byte code 107's names. Some of the names in byte code 107 are defined by the Java infrastructure; others are defined in byte code 107. In the Java programming language, names are defined in class definitions; Java virtual machine 115 has access to two sets of class definitions: Java system classes 119, which are class definitions that are available to the Java virtual machine from sources other than byte code 107, and application classes 121, which are classes defined in byte code 107. Application classes 121, like the other data used in the execution of Java byte code 107, are stored in application runtime 123, an area private to the execution of Java byte code 107. The use of application runtime 123 ensures that an execution of byte code 107 will neither affect nor be affected by the execution of other Java byte codes. Moreover, application runtime 123 can be defined in a manner which limits the amount of control that a byte code 107 may exercise over programmable device 111, and can thereby protect programmable device 111 from mistakes in byte code 107 or malicious byte codes.
The popularity of the Java programming language for mobile code is a result of the advantages offered by Java infrastructure 101. Because Java byte codes can be executed on any device with a Java virtual machine, Java byte codes are completely portable. Because application runtime 123 offers a protected execution environment for the byte codes, the byte codes may be safely executed on any of these devices. Infrastructure 101 does, however, have a significant disadvantage: Java byte codes are more difficult to protect against study and reverse engineering than other executable programs.
One reason for this is that a Java byte code and a Java virtual machine together contain far more information about the program than is available in the object code generally produced by compilers. Together, Java system classes 119 in the Java virtual machine and application classes 121 for a given Java byte code contain all of the information needed to define the symbolic names used in the Java byte code. Symbolic names include class, method, and field names. Some of the symbolic names are defined by the programmer for the particular application program and others are defined as part of the Java infrastructure. The latter names are termed herein Java system names. Because the definitions for all of the names used in the byte code are contained either in the byte code itself or in the Java virtual machine, a programmer who is studying the byte code can use the Java reflection mechanism or a Java debugger to find out the complete class information for a particular program construct in the byte code.
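The following minimal sketch suggests how the Java reflection mechanism mentioned above can expose the symbolic names of a class. LicenseCheck is a hypothetical stand-in for a class defined in downloaded byte code, and ReflectionProbe and describe are likewise names introduced here for illustration; only the java.lang.reflect calls are part of the Java platform.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical application class standing in for downloaded byte code.
class LicenseCheck {
    private int feeInCents = 500;

    boolean hasPaid(String user) {
        return false;
    }
}

class ReflectionProbe {
    // Collects the field and method names that a class declares,
    // together with their types, using only the reflection API.
    static List<String> describe(Class<?> c) {
        List<String> out = new ArrayList<>();
        for (Field f : c.getDeclaredFields()) {
            out.add("field " + f.getType().getSimpleName() + " " + f.getName());
        }
        for (Method m : c.getDeclaredMethods()) {
            out.add("method " + m.getReturnType().getSimpleName() + " " + m.getName());
        }
        return out;
    }

    public static void main(String[] args) {
        for (String line : describe(LicenseCheck.class)) {
            System.out.println(line);
        }
    }
}
```

Even a private field is listed, which is the kind of information that object code produced by a conventional compiler generally does not carry.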
Obfuscation of Java Byte Code
The possibilities for obfuscating Java byte code are limited. Application-defined names in Java byte code can of course be obfuscated in the same fashion as in any other computer code. However, when Java virtual machine 115 executes a Java byte code, it links the Java system names in the byte code to the definitions 119 of those classes in programmable device 111. The linking is done by matching the names in the byte code with names in the definitions 119. Consequently, the Java system names cannot be obfuscated in the byte code. If they are obfuscated, virtual machine 115 cannot find the definitions in system classes 119 and if it cannot do that, it cannot execute the byte code. A parent of the present patent application, U.S. Ser. No. 10/019,828, presents techniques for obfuscating Java system names in Java byte codes and executing the byte codes with the obfuscated system names.
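The dependence of linking on name matching can be approximated by the following sketch. It uses Class.forName as an analogy for the virtual machine's own linking step, which it is not; the name "q.x7.Zz9" is a hypothetical obfuscated replacement for a system name and is assumed not to exist on the class path.

```java
class SystemNameLinking {
    // Returns true if a class with exactly this name can be found.
    static boolean resolves(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The real system name matches a definition in the system classes.
        System.out.println(resolves("java.lang.String"));
        // A hypothetical obfuscated name matches nothing, so lookup fails.
        System.out.println(resolves("q.x7.Zz9"));
    }
}
```

Without some additional mechanism that maps the obfuscated name back to a definition, the lookup simply fails.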
Watermarking Java Byte Code
A digital watermark is a message which has been incorporated into the content of a digital representation in such a way that the message does not render the digital representation unfit for its intended purpose. Typically, the watermark adds an imperceptible amount of noise to the digital representation. Digital watermarks are used for a number of purposes; the one that is of interest here is to determine whether alterations were made to the digital content after the watermark was added. The alterations necessarily also alter the watermark, and consequently, to determine whether alterations were made, one simply compares the original watermark with the watermark currently in the digital representation. The difficulty with applying standard digital watermarking techniques to mobile code is that mobile code is executable code; that is, everything in it is functional. There is thus no “noise” to hide the watermark in and adding “noise” changes the behavior of the program.
Techniques have nevertheless been developed for using watermarks to authenticate executable code. These techniques have fallen into two broad classes: static watermarking and dynamic watermarking. In static watermarking, the watermark can be perceived from the text of the code; for example, IBM researchers used the order in which the code pushed and popped certain registers as a watermark, as disclosed in: Counsel for IBM Corporation. Software birthmarks. Talk to BCS Technology of Software Protection Special Interest Group. Microsoft researchers encoded a software serial number in the program's control flow graph, as disclosed in U.S. Pat. No. 5,559,884, Robert Davidson and Nathan Myhrvold, Method and system for generating and auditing a signature for a computer program, September 1996. To authenticate a program using such static watermarks, the sender includes an encrypted representation of the correct value of the property being used to watermark the code and the receiver can decrypt the representation and compare it with the value of the property in the code as received.
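The authentication step just described may be sketched as follows. The sketch makes several assumptions that are not taken from the cited references: code is represented as a list of textual instructions, the watermark property is the order in which registers are pushed, and the sender and receiver share an AES key (used here in GCM mode) by some means outside the sketch. All class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.List;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

class StaticWatermarkCheck {
    // Watermark property (assumed): the order in which registers are pushed.
    static String pushOrder(List<String> code) {
        StringBuilder sb = new StringBuilder();
        for (String insn : code) {
            if (insn.startsWith("push ")) {
                sb.append(insn.substring(5)).append(',');
            }
        }
        return sb.toString();
    }

    // Generates a fresh 128-bit AES key; key distribution is out of scope.
    static SecretKey newKey() {
        try {
            KeyGenerator kg = KeyGenerator.getInstance("AES");
            kg.init(128);
            return kg.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Encrypts or decrypts data with AES-GCM under the given key and IV.
    static byte[] crypt(int mode, SecretKey key, byte[] iv, byte[] data) {
        try {
            Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
            c.init(mode, key, new GCMParameterSpec(128, iv));
            return c.doFinal(data);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Sender: encrypts the correct value of the watermark property.
    static byte[] seal(List<String> code, SecretKey key, byte[] iv) {
        byte[] property = pushOrder(code).getBytes(StandardCharsets.UTF_8);
        return crypt(Cipher.ENCRYPT_MODE, key, iv, property);
    }

    // Receiver: decrypts the expected value and compares it with the
    // value of the property in the code as received.
    static boolean authenticate(List<String> received, byte[] sealed,
                                SecretKey key, byte[] iv) {
        byte[] plain = crypt(Cipher.DECRYPT_MODE, key, iv, sealed);
        String expected = new String(plain, StandardCharsets.UTF_8);
        return expected.equals(pushOrder(received));
    }
}
```

A sender would call seal on the original code and transmit the result with the code; a receiver would call authenticate on what arrives. Code whose push order has been changed is rejected.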
In dynamic watermarking, the watermark can be perceived from properties of the execution of the code. Published PCT application WO 99/64973, Collberg, et al., Software watermarking techniques, priority date Jun. 10, 1998, describes program watermarking techniques that are based on the program's dynamic response to a given input string.
While these techniques do make it possible to authenticate executable code, they have significant limitations. In the case of the static watermarking techniques described above, the information used for the watermark is an integral part of the executable code, which means that all copies of the executable code will have the same watermark. Moreover, if the property being used as the basis of the watermark is known, a malicious sender need only modify other aspects of the executable code. As long as the property that is the basis of the watermark is untouched, the modified code will appear to the receiver to be authentic.
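The second limitation can be illustrated with a small sketch under assumed conditions: code is modeled as a list of textual instructions, and the watermark property is taken to be the order in which registers are pushed. The instruction text and the names StaticWatermarkBlindSpot and pushOrder are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

class StaticWatermarkBlindSpot {
    // Watermark property (assumed): the order in which registers are pushed.
    static String pushOrder(List<String> code) {
        StringBuilder sb = new StringBuilder();
        for (String insn : code) {
            if (insn.startsWith("push ")) {
                sb.append(insn.substring(5)).append(',');
            }
        }
        return sb.toString();
    }

    // Code as originally sent: charges a fee of 500.
    static final List<String> ORIGINAL =
        Arrays.asList("push r3", "push r1", "load r1, 500", "pop r1", "pop r3");

    // Maliciously modified code: the fee is now 0, but no push is touched.
    static final List<String> TAMPERED =
        Arrays.asList("push r3", "push r1", "load r1, 0", "pop r1", "pop r3");

    public static void main(String[] args) {
        // The code differs, yet the watermark property is identical.
        System.out.println(ORIGINAL.equals(TAMPERED));
        System.out.println(pushOrder(ORIGINAL).equals(pushOrder(TAMPERED)));
    }
}
```

Because the comparison looks only at the push order, the altered fee goes unnoticed.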
In the case of dynamic watermarking, the dynamic response that provides the watermark is produced by adding additional code to the program being watermarked; because the additional code is not necessary for the functioning of the program, it can be removed, and when it is removed, the watermark is gone. Another parent of the present application, U.S. Ser. No. 10/019,827, describes techniques for overcoming these limitations of watermarks in code.
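The weakness of dynamic watermarking described above can be pictured with the following deliberately simplified sketch. It is not the technique of WO 99/64973; the secret input, the watermark string, and all names are hypothetical, and the program's ordinary function is assumed to be conversion of its input to upper case.

```java
import java.util.Locale;
import java.util.function.Function;

class DynamicWatermarkDemo {
    static final String SECRET_INPUT = "wm-probe-17"; // hypothetical
    static final String WATERMARK = "owner:ACME";     // hypothetical

    // Watermarked program: the first statement is the added code that
    // produces the watermark in response to the secret input.
    static String watermarked(String input) {
        if (input.equals(SECRET_INPUT)) {
            return WATERMARK;
        }
        return input.toUpperCase(Locale.ROOT);
    }

    // The same program after the added code has been removed.
    static String stripped(String input) {
        return input.toUpperCase(Locale.ROOT);
    }

    // Detection: run the program on the secret input and inspect its response.
    static boolean carriesWatermark(Function<String, String> program) {
        return WATERMARK.equals(program.apply(SECRET_INPUT));
    }

    public static void main(String[] args) {
        System.out.println(carriesWatermark(DynamicWatermarkDemo::watermarked));
        System.out.println(carriesWatermark(DynamicWatermarkDemo::stripped));
    }
}
```

On ordinary inputs the two versions behave identically, so removing the added code costs the malicious host nothing.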
As described in the parents of the present application, obfuscation and watermarking are stand-alone techniques for protecting mobile code against malicious hosts. What is needed, and what is provided by the invention disclosed herein, are techniques that combine obfuscation and watermarking to provide better protection for mobile code against malicious hosts than heretofore possible. It is thus an object of the present invention to provide better protection for mobile code against malicious hosts.