Computer languages and environments generally employ one of three methods for translating source code written by a programmer into machine instructions. In interpreted environments, each statement or command of source code is translated to machine language as the program runs, one or a few statements at a time. An example of an interpreted environment is the Restructured Extended Executor (REXX) language specified by IBM and supported on IBM's Virtual Machine (VM) and OS/2™ operating systems. In compiled environments, each part of a program is translated into machine code, and the several parts are linked together into a machine code executable file for a specific machine before the program is distributed or run. Compiled environments include C, C++, FORTRAN, and most mainstream programming languages. A third type of environment is a virtual machine environment, shown in FIG. 1, in which a program is translated 105, before being run, into a pseudo-machine code 115. The pseudo-machine code runs on a machine specification that may be implemented in a hardware machine or as a "virtual machine" 125 in software on several hardware and operating system platforms 130. This third environment allows the translated program to be run on every hardware and operating system environment on which there is a compliant implementation of the virtual machine. The present invention is directed toward virtual machine environments, the most prominent of which is the Java™ environment, recently released by Sun Microsystems, in which the pseudo-machine code is also called bytecode, and the files that contain the bytecode and other information needed to load and run programs are called class files.
The terms bytecode and class file are used generically herein to denote, respectively, the pseudo-machine code to be interpreted by a virtual machine environment and the files which contain bytecodes and other information needed to load and run programs in virtual machine environments. Use of these terms is not intended to restrict the invention to the present Java™ architecture, or to the Java™ language.
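By way of illustration only, the "other information" carried in a class file can be seen in the fixed header with which a Java™ class file begins: a four-byte magic number (0xCAFEBABE) followed by two-byte minor and major format version numbers. The following minimal sketch reads that header. The class name ClassFileHeader and its parse method are hypothetical names introduced for this example, and nothing in the sketch is specific to the invention.

```java
import java.nio.ByteBuffer;

/** Illustrative sketch: reads the fixed header at the start of a Java class file. */
final class ClassFileHeader {
    /** Magic number that begins every Java class file. */
    static final int MAGIC = 0xCAFEBABE;

    final int minorVersion;
    final int majorVersion;

    private ClassFileHeader(int minorVersion, int majorVersion) {
        this.minorVersion = minorVersion;
        this.majorVersion = majorVersion;
    }

    /** Parses the first eight bytes; throws IllegalArgumentException if they are not a class file header. */
    static ClassFileHeader parse(byte[] classBytes) {
        if (classBytes == null || classBytes.length < 8) {
            throw new IllegalArgumentException("too short to be a class file");
        }
        // ByteBuffer is big-endian by default, which matches the class file format.
        ByteBuffer in = ByteBuffer.wrap(classBytes);
        if (in.getInt() != MAGIC) {
            throw new IllegalArgumentException("bad magic number");
        }
        int minor = in.getShort() & 0xFFFF; // unsigned 16-bit minor_version
        int major = in.getShort() & 0xFFFF; // unsigned 16-bit major_version
        return new ClassFileHeader(minor, major);
    }
}
```

A tool that manipulates class files would typically perform a check of this kind before reading the constant pool, fields, and methods that follow the header.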
In virtual machine environments, programs and program parts are usually distributed in class files, with the source code not generally being made available. In some environments it may be desirable to derive the original source code from the class files, and in the Java™ world there are several tools available to do this derivation. These tools include a tool named Mocha, written by the late Hanpeter van Vliet and available at http://www.brouhaha.com/~eric/computers/mocha.html, and another tool named D-Java, written by Shawn Silverman and available at http://home.cc.umanitoba.ca/~umsilve1/djava/. Many authors of Java™ programs do not want their class files to be translated back to source code, a process known as disassembly. These authors may resort to another set of tools available in the Java™ world called obfuscators, which operate as shown in FIG. 3. Obfuscators modify Java™ class files 300 to make them difficult to understand if they are disassembled using tools like the Mocha or D-Java disassemblers. An example of an obfuscator is the hashjava tool, written by KB Sriram and available at http://webx.best.com/~kbs/hashjava.html. Obfuscators do not modify the logic or semantics of Java™ class files; they perform simple transformations on names to make the names harder to understand, and they give the user no flexibility to do anything else with class files that would change their logic or semantics. Some obfuscators give the user no control over how the class files are transformed, and others, such as the hashjava tool, allow the user only to specify simple transformations 320 that will be performed on the names.
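The kind of name transformation described above can be pictured with a minimal sketch in the Java™ language. The sketch is an assumption made for illustration and is not taken from hashjava or any other tool named herein; the class name NameObfuscator and the "a0", "a1" naming scheme are hypothetical. It operates on plain strings, whereas an actual obfuscator would rewrite the names stored inside the class file.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative sketch: maps meaningful identifiers to meaningless ones, consistently. */
final class NameObfuscator {
    // Remembers every replacement so that repeated uses of a name stay consistent.
    private final Map<String, String> mapping = new LinkedHashMap<>();

    /** Returns the replacement for a name, assigning a fresh one the first time the name is seen. */
    String rename(String original) {
        String replacement = mapping.get(original);
        if (replacement == null) {
            replacement = "a" + mapping.size(); // hypothetical scheme: a0, a1, a2, ...
            mapping.put(original, replacement);
        }
        return replacement;
    }
}
```

Because every occurrence of a given name receives the same replacement, references between program parts remain intact and the program's behavior is unchanged; only its readability after disassembly suffers.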
One may desire to modify the logic and/or semantics of class files without having the original source available. For example, in the area of security, one may wish to have a capability to "scrub" class files to prevent them from executing any instructions that may cause a security risk. In the prior art, the Java™ Virtual Machine (JVM) 125 has a "bytecode verifier" function that examines class files to verify that they are not performing risky operations. However, the bytecode verifier does not have any capability for modifying class files found to be troublesome; it simply refuses to load them.
Another reason one may desire to modify the logic or semantics of class files with no source code available is to enable different pieces of a program to be dynamically distributed across several computers, without requiring the programmer to be aware of such distribution, as in the invention described in the copending, commonly assigned patent application having Ser. No. 08/852,263, entitled "A Process for Running Objects Remotely," filed on May 6, 1997.
Further reasons to modify class file logic or semantics include, but are not limited to, performance optimization and performance and error tracing and notification.
Digital Equipment Corporation's Analysis Tools with OM (ATOM) tool provides for programmatic modification of compiled code for the compiled environment without resort to source code. The ATOM tool is designed to add calls to analysis and measurement routines, not to alter the logic and semantics of a program. Additionally, as the ATOM tool is designed to operate in the compiled environment, it has no capabilities for modifying compiled code as it is loaded to be run, and the ATOM tool works only on compiled code for Digital Equipment Corporation's supported machines. More information on the ATOM tool is available at http://www.research.digital.com/wrl/projects/om/om.html.
Using the teachings of the prior art, there are several unappealing methods available for modifying class files for virtual machine environments such as the Java.TM. environment.
1. Class file modification could be done by hand, as shown in FIG. 2, using the disassemblers of the prior art 205 to derive source code 210, modifying it by hand 215, and recompiling 225 it into a class file 230 again. If one is particularly skilled in the art of the language being used and is familiar with the formats of bytecodes and class files, one may also directly manipulate class files by hand (235 and 240). However, this process is tedious and error-prone, and particularly inefficient if many parts must be modified in similar ways that could be described programmatically. Furthermore, one may desire to do class file modification dynamically, as the bytecode files are being loaded for execution, especially if said bytecode files are being loaded from a remote server and therefore are not available to the local machine before they are to be run. Such dynamic, load-time modification clearly could not practically be done by hand.
2. Class file modification could be done with a tool which makes specific transformations, as shown in FIG. 3, for example changing commands that may pose a security risk into harmless or ineffective commands. Such a tool would work similarly to an obfuscator, searching for specific sequences in the class file and changing them to some other sequence. Such tools may even allow for the use of "profiles" 320 or other methods of exporting to the user 325 choices such as what sequences are to be changed into what other sequences, or giving a mapping from an existing sequence to a new sequence that is to replace the existing sequence. Such a method would simply represent the automation of the first method mentioned above.
3. Class file modification could be done with a program such as the ATOM tool, which would allow a programmatic and flexible interface for making changes to the compiled code. While a program such as the ATOM tool would allow the user flexibility to add and change compiled code in a programmatic way, such a tool contemplates adding analysis and measurement routines, and does not contemplate modifications in the logic or semantics of the modified code. Also, the ATOM tool does not teach or suggest modifying code in an automatic fashion as such code is being loaded for execution, since the concept of a class loader is absent in the compiled environments for which the ATOM tool is designed.
Hence, none of the prior art teaches or suggests a method for doing bytecode modification at load time in a programmatic way.
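To make concrete what programmatic, load-time modification means, the following minimal sketch in the Java™ language shows a class loader that passes the bytes of each class through a caller-supplied transformation before defining the class. It is an illustration under stated assumptions and is not a description of the claimed method: the class name TransformingClassLoader, the in-memory map standing in for a remote server, and the use of a UnaryOperator as the transformation hook are all hypothetical. Only the standard ClassLoader methods findClass and defineClass are relied upon.

```java
import java.util.Map;
import java.util.function.UnaryOperator;

/** Illustrative sketch: applies a programmatic transformation to class bytes as they are loaded. */
final class TransformingClassLoader extends ClassLoader {
    // Hypothetical stand-in for wherever class files come from (e.g. a remote server).
    private final Map<String, byte[]> classBytesByName;
    // Caller-supplied, programmatic description of the modification to perform.
    private final UnaryOperator<byte[]> transformer;

    TransformingClassLoader(ClassLoader parent,
                            Map<String, byte[]> classBytesByName,
                            UnaryOperator<byte[]> transformer) {
        super(parent);
        this.classBytesByName = classBytesByName;
        this.transformer = transformer;
    }

    /** Called by loadClass when the parent loader cannot find the class. */
    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        byte[] original = classBytesByName.get(name);
        if (original == null) {
            throw new ClassNotFoundException(name);
        }
        // Transform a copy so the stored bytes are never altered in place.
        byte[] modified = transformer.apply(original.clone());
        // The virtual machine still verifies the modified bytes when the class is defined and linked.
        return defineClass(name, modified, 0, modified.length);
    }
}
```

In this sketch the transformation is an opaque function from bytes to bytes; a practical transformation would need to parse and rewrite the class file structure so that the result remains a valid class file.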