This invention relates in general to the execution of computer programs and more specifically to a system which allows the run-time functionality of a compiled computer program to be modified.
Computer programs, or code, can be executed in a number of different ways. Two broad, and fundamentally different, ways are by executing interpreted code or by executing compiled code.
Interpreted code generally requires the processor to operate on a line-by-line basis on human-readable code. That is, the representation of the computer program is in a text-based form, or not far removed from a text-based form such as where the code is "tokenized," so that the code as written by a human programmer and the code as executed by the processor are quite similar. Interpreted code, unlike compiled code, has the advantage of not requiring long "build" times. Essentially, a programmer can write interpreted code and execute the code immediately for the purposes of testing the code. In this respect, interpreted code is useful for rapid prototyping. Interpreted code is generally well-suited to small applications where speed is not a major issue. This is because a big drawback of interpreted code is that it is notoriously slow compared to compiled code.
Compiled code produces very fast executable programs. However, the creation and maintenance of compiled code is much more involved than with interpreted code. Also, the programs produced with the compiled program approach are more complex to develop and modify. Typically, many program modules are required which must be compiled, linked and loaded before a change can be tested or before a deliverable executable is produced. There can be hundreds of different modules, or files, in large compiled computer programs, sometimes referred to as "projects." The build process for these projects is complicated in itself, requiring precise coordination of symbols, processes, resources and other aspects of developing the program. A complete build of a computer program can take hours of time, depending on the size of the program. Moreover, compiled code development requires precise archiving, bookkeeping and tracking of modules, utilities, tools and other developer support software. As a computer program ages it may be difficult, or impossible, to re-build a specific version of a program even though the executable version of the program is still in use. This is because the operating system, development environment, tools, utilities or other software (or hardware) used to build the program may have changed.
Since a programmer must have detailed knowledge of the compiled program and the development environment it is difficult for programmers who are not the original programmers of a compiled program project to "come up to speed" and make modifications to software written by another programmer. The compiled, linked and loaded executable code is not readable by a human programmer in any practical sense, forcing the programmer to learn not only the human-readable "source" code version of the program, but to also have a detailed working knowledge of the build process for the program. Thus, the maintenance and modification of compiled code poses problems.
Another property of both interpreted and compiled code is that it is difficult to change the run-time functionality, or behavior, of the code. In the interpreted code approach, an entire new set of interpreted code instructions must be loaded onto a user's computer. Compiled code is typically so much larger than interpreted code that approaches, discussed below, have been developed to circumvent the loading of a completely new version of the compiled executable. Although interpreted code is relatively small, thus permitting new versions to be loaded easily, it is not suitable for the majority of application programs which require fast execution of very large programs. Examples of interpreted code are BASIC, Lisp and script languages such as Perl and Java. Examples of languages used in compiled code approaches are Fortran, "C," assembly language, etc. However, these categories are somewhat loosely defined since any computer language can be implemented as a compiled code or interpreted code approach assuming an appropriate "interpreter" or "compiler" is written for the target machine intended to execute the code.
The prior art is discussed below with reference to FIGS. 1A-F.
FIG. 1A shows a simplified diagram illustrating the process for executing interpreted code.
In FIG. 1A, program 10 is created by a programmer. Typically this is done in a word-processing program which results in human-readable text. The program is loaded into a user computer 12. The transfer, or loading, can be by diskette, compact disk read-only memory (CDROM), downloading from a network, or by other means. The loaded program 14 is usually an exact copy of the original text produced by the programmer. Loaded program 14 is interpreted by interpreter 16 which results in the execution of functions as specified by the program code, or script, to produce the desired run-time functionality in the user's computer. Thus, there is only a single program definition in an interpreted code approach: that of program 10, which serves both as the human-readable definition of the program and as the executing image in the user's computer.
FIG. 1B shows a simplified diagram illustrating the process for executing compiled code.
In FIG. 1B, items 20-30 are part of a software "build" process whereby a machine-readable executable object is created from human-readable source code modules. Items 34-38 illustrate items involved with executing a compiled software program on a user's computer 32.
In FIG. 1B, source code 20 is created by a programmer and is the human-readable version of the program. Typically, programmers in compiled code development environments work with separate files of source code so, for example, source code 20 of FIG. 1B represents a single module of many modules used in the program project. A module can have from a few to several hundred or more lines of code. Source code 20 is compiled by compiler 22 to result in an object file 24. Each module is capable of being compiled independently of any other modules in the program project. This allows sections of the program to be modified on a module-by-module basis without requiring compilation of all of the many modules in the program project. However, note that changing even a single line in a module requires that the entire module be re-compiled and the entire project re-built as discussed below.
Once compiled, the object file can be linked to other object files in the program project. The other object files can be from other compiled source code modules that the programmer, or another programmer, has written. Other sources for linkable object modules include pre-existing library objects 26 to provide commonly needed functions. All of the object files for the program project are linked via linker 28 to produce a single executable object 30. Producing executable object 30 culminates the program build. Note that, generally (aside from dynamic linking, discussed below), it is necessary to have all of the object files from all of the modules, libraries and other object file sources on hand to do a build. This is problematic on large projects because different objects may be changing at different times as a result of different programmers' actions. When it is desired to change the functionality of a compiled program at a later time, by re-building the program, for example, it is necessary to have a set of object files that are compatible with each other. Essentially this means that all "symbol" references among the object files must agree. Symbols are merely text labels that refer to different objects or items in the source code such as processes, routines, subroutines, functions and data structures. Because the build process is so complex, "developer's environments" are provided by software development "tool" manufacturers to assist programmers in coordinating modules, object files and symbol references, and in performing other related development tasks.
Returning to FIG. 1B, the result of linking object files is executable object 30. Executable object 30 represents the deliverable product which is then transferred to the target, or user's, computer 32 for execution. Typically, executable object 30 is loaded by loader 32 which places the executable object into system random access memory (RAM) to produce an executable image 34. As with the interpreted code case, the executable object can be transferred on any computer-readable media, transferred over a communication link such as the Internet, or stored in the user's computer by other means. Executable image 34 is accessed by the processor in the user's computer to execute the compiled instructions to provide functionality according to the original source code instructions.
Note that execution of compiled code requires several steps and many different files and transformations of the original source code instructions. This is in contrast with execution of interpreted code. The complexity of preparing and handling compiled code is necessary to achieve maximum speed and efficiency desired by many of today's computer applications.
However, a problem exists with both the interpreted and compiled program approaches in that it is difficult to modify the functionality of these programs at run-time. That is, at the time when the user is executing the program it is difficult, or impossible, for a developer, manufacturer, or other provider of the original program to modify the functionality of the programs provided to the user. In order to fully illustrate this, more details of both the interpreted code and compiled code approaches are presented.
FIG. 1C shows an example of a small portion of interpreted source code. In FIG. 1C, a portion of a Java-type of code is shown. An example of this is the JavaScript language described in detail in such references as "JavaScript, The Definitive Guide," by David Flanagan, published by O'Reilly & Associates, Inc., 1997. FIG. 1C shows the JavaScript as it would appear resident in the user's computer system such as in loaded program 14 of FIG. 1A. Although the JavaScript is written in a specialized syntax, it uses standard alphanumeric English characters and is human-readable. Thus, a person of skill in the art can immediately look at the three lines of script resident on a user's computer and determine that these lines instruct the computer to print out the numbers from one to ten with a space between each number.
In order to modify the functionality of this code as, for example, to print out the numbers from one to twenty, an entire new source code module, or document, would have to be obtained and substituted for the source code module containing the lines shown in FIG. 1C. Typically, this is not a problem where the source code modules are very small in size. For example, Java-type languages are very popular on the Internet where they are used embedded within, or loaded in connection with, World Wide Web pages. Since the Java "applets" tend to be very small, a new version of the applet is provided each time a user accesses (i.e., loads) a page of information. In this manner, changes to the source code are always available in the version that the user is executing.
However, as mentioned above, interpreted source code has the major drawback of very slow execution (compared to compiled code). Also, large programs are not written in interpreted code because the human-readable format is not space-efficient.
FIG. 1D shows an example of what a compiled executable object portion would look like. The executable object is merely a series of numbers, represented in FIG. 1D as hexadecimal numbers. Typically these numbers are not even viewable as readable numbers unless special viewing programs are used. The numbers represent machine-level instructions that the central processing unit (CPU) in the user's computer system executes directly rather than the indirect, interpreted, approach.
For example, in the interpreted code instruction of FIG. 1C, a line-by-line interpretation may take place. The computer reads in a line of script, parses the line, and converts the line into a series of machine instructions and executes the instructions. In some cases the conversion results in numbers similar to those that would result had the interpreted code been compiled, instead. However, in many cases the interpreted code is not as fast or compact, even considering just executing a single line of the interpreted code. Naturally the reading, parsing, converting and executing of the interpreted line require the processor to execute many extra "overhead" steps to accomplish the ultimate execution of only a few machine instructions that actually perform the functionality intended by the programmer. The executable object of the compiled program approach, on the other hand, is already in machine readable form and contains just the instructions to implement the functionality desired by the programmer. Thus, the computer can directly load each machine instruction into its processor and execute the instruction without suffering the run-time execution overhead of the interpreted code approach. The use of compiled code is absolutely necessary to take full advantage of a processor's speed, and the computer's limited memory and disk space.
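The per-line overhead described above can be illustrated with a minimal sketch. The toy command language (`set`, `add`, `print`) and the function name `interpret` are hypothetical and are not the JavaScript of FIG. 1C; the sketch only shows that every line is read, parsed and dispatched before the one or two operations that do the useful work are carried out.

```python
def interpret(source):
    """Interpret a toy script one line at a time (hypothetical language)."""
    variables = {}
    output = []
    for line in source.splitlines():      # read one line of script
        tokens = line.split()             # parse the line into tokens
        if not tokens:
            continue                      # skip blank lines
        op, args = tokens[0], tokens[1:]
        # Convert the parsed line into an action and execute it.
        if op == "set":                   # set NAME VALUE
            variables[args[0]] = int(args[1])
        elif op == "add":                 # add NAME VALUE
            variables[args[0]] += int(args[1])
        elif op == "print":               # print NAME
            output.append(str(variables[args[0]]))
        else:
            raise ValueError(f"unknown instruction: {op}")
    return " ".join(output)


script = "set n 1\nprint n\nadd n 1\nprint n"
print(interpret(script))
```

In this sketch the splitting, token inspection and dispatch are repeated for every line on every run, whereas a compiled form of the same four lines would consist only of the store, add and output instructions.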
As can be seen from the executable object code portion in FIG. 1D, the numbers occupy contiguous locations in memory. This makes it difficult to perform a modification that adds instructions or data to the executable. Again, in order to properly change the functionality of the executable object resident in the user's computer a new, or modified, executable object would have to be produced and provided to the end-user. Although this can be done by downloading over a communication link, such as the Internet, the large size of today's executable objects requires several hours of download time and makes it impractical and undesirable to make changes to the run-time functionality of compiled executable objects.
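One reason that adding to a contiguous executable is difficult can be sketched as follows. The sketch assumes, as an illustration only, a toy instruction format in which a `JMP` is followed by an absolute position in the image; the opcode names and the `run` function are hypothetical and do not correspond to the contents of FIG. 1D.

```python
def run(image):
    """Execute a toy program: a list of opcodes with absolute jump targets."""
    trace = []
    pc = 0
    while pc < len(image):
        op = image[pc]
        if op == "JMP":                 # JMP is followed by an absolute index
            pc = image[pc + 1]
        elif op == "HALT":
            trace.append(op)
            break
        else:
            trace.append(op)            # any other opcode just records itself
            pc += 1
    return trace


# Position 4 holds STORE, so "JMP 4" skips over the SKIPPED instruction.
original = ["LOAD", "JMP", 4, "SKIPPED", "STORE", "HALT"]
print(run(original))

# Insert one new instruction in the middle of the contiguous image.
modified = original[:3] + ["NEW"] + original[3:]
print(run(modified))
```

After the insertion every later item has moved by one position, so the unchanged jump target now lands on the instruction that was meant to be skipped. Under this assumption, a modification that adds material must also find and adjust every address that refers past the insertion point.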
The software industry has developed two basic approaches to changing the functionality of compiled executable objects at run-time. These are (1) "patching" existing executable objects or (2) using dynamically linked libraries (DLLs).
FIG. 1E shows an example of patching a compiled executable object.
In FIG. 1E, executable object 52 can be modified by patch apply code 54 which uses patch data 56. In this approach, an instruction or value, such as the value 000A which represents a 16-bit word can be modified to a different value such as 0005 by executing instructions in patch apply code 54 which obtain the new value 0005 from patch data 56 and insert the data into the proper location in executable object 52. In this example, a portion of code shown in FIG. 1D would have, as a result of applying the patch, the code shown in executable object 52.
Where executable object 52 may be on the order of tens of megabytes in size, the patch apply code 54 and patch data 56 (collectively the "patch") may be on the order of thousands of bytes. The patch typically constructs a new executable object by using data in the previous executable object (i.e., the executable object of FIG. 1D) making the changes and saving the new executable object (the executable object 52 in FIG. 1E) as a replacement to the prior executable object. Naturally, the types of patching that can occur include deleting portions of the old executable object, adding new portions to the old executable object and modifying existing portions of the old executable object.
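The patching step described above can be sketched in a few lines. The image contents, the offset, and the names `apply_patch` and `patch_data` are hypothetical and chosen only for illustration; the sketch covers only the "modify existing portions" case, using the 000A to 0005 change from the example, and does not model adding or deleting portions.

```python
def apply_patch(old_image, patch_data):
    """Build a new image from the old one plus (offset, new_bytes) records."""
    new_image = bytearray(old_image)          # start from the prior executable
    for offset, new_bytes in patch_data:
        end = offset + len(new_bytes)
        if end > len(new_image):
            raise ValueError("patch record falls outside the image")
        new_image[offset:end] = new_bytes     # overwrite in place, same length
    return bytes(new_image)


# Hypothetical 8-byte fragment; the 16-bit word 000A sits at offset 4.
old_image = bytes.fromhex("12340001000AFFFF")
patch_data = [(4, bytes.fromhex("0005"))]     # replace 000A with 0005

new_image = apply_patch(old_image, patch_data)
print(new_image.hex().upper())
```

The patch data here is a few bytes regardless of the size of the image, which is the size advantage noted above, and the result is saved as a replacement executable rather than applied while the program is running.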
Patching is more efficient than downloading an entire new version of a large compiled executable object since the patch apply code and patch data are much smaller than the executable object. However, patching only provides a one-time change to a program's functionality. Also, the change in functionality takes place before the program is executed. In this respect, the patching approach does not provide run-time modification of the functionality of the computer code. Also, the process of downloading a patch and applying the patch to create a new executable object is time-consuming and is not transparent to the user who must often participate in the patching process by giving authorization to go ahead with the patch, designating which directory the patched executable object will reside in, etc. Patches can cause problems where multiple patches exist for a program and the patches are not applied by a user in the proper order. Also, patches cause an annoying delay in a user's execution of a program since the patch must be obtained and executed--a sometimes time-consuming process--before the patched program can begin execution.
The second approach, that of using DLLs, does allow a degree of run-time modification of functionality. However, this approach is limited and lacks desired flexibility as described below.
FIG. 1F illustrates the use of a DLL.
In FIG. 1F, executable image 60 links to routines, functions, and other items in DLL 62 just prior to, and during, run-time. The process of linking to a DLL is similar in concept to the process performed by linker 28 of FIG. 1B during a program build, discussed briefly above. To understand dynamic linking to a DLL, linking during a program build is first discussed.
As mentioned, modules are handled as separate entities during program development. The modules' information must be combined to create a single executable program during the program build. The major task in combining modules is to resolve symbolic references to items such as processes, routines, subroutines, functions, objects, data structures and other resources (collectively referred to as "items") that may be defined in other modules. Symbolic references, or symbols, are merely the alphanumeric human-readable names that a programmer uses during writing of the source code. These symbols are later mapped into addresses to generate the machine-readable code. In order to map symbols to addresses the symbol must be associated with the definition of a corresponding item.
Some item definitions will not reside in a given module that needs to access the item. In this case, the given module "declares" that the item is external to the module. The compiler can then ignore references to the externally defined item so that the given module can be compiled (and checked for internal errors) without needing to include other modules in the compile. Later, when the modules have been fixed with respect to the errors that the compiler detects, the given module is "linked" with the module that actually has the item definition. The linker can provide detailed information on a symbol name, such as where the symbol is used, where the symbol is defined, the relative address that the symbol maps to, etc. Large computer programs may use thousands, or tens of thousands or more, symbolic references. Since each programmer typically makes up their own symbol names, memorization and understanding of the symbol names is usually a major hurdle to be jumped if another programmer is to understand the program sufficiently to modify the program. As mentioned, symbols are defined in one module and may be used by another, or many other, modules by having the other modules declare the symbol as external to the module. There are several other types of mechanisms for handling symbolic references. These mechanisms vary according to particular computer languages. However, each computer language, assuming it is a compiled language, ultimately needs to resolve symbol definitions and symbol use among multiple source code modules (or other source code entities) by using a process like a linker. Since symbol resolution is so massive, and so pervasive throughout the program project, it is mandatory that a major change to a compiled computer program take place by modifying, adding to, or deleting from, source code modules and that a subsequent re-build of the entire program take place to create a new version. 
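The symbol resolution performed during a build can be sketched as a two-pass procedure. The module layout (a dictionary with `defines` and `externals` entries), the symbol names and the function `link` are hypothetical; a real linker also handles relocation, libraries and much more, so this is only an outline of the idea that each external reference must be matched to exactly one definition.

```python
def link(modules):
    """Resolve symbols across toy modules; return (symbol_table, resolved)."""
    symbol_table = {}
    address = 0
    # Pass 1: assign a relative address to every item definition.
    for name, module in modules.items():
        for symbol, size in module["defines"].items():
            if symbol in symbol_table:
                raise ValueError(f"duplicate definition of {symbol}")
            symbol_table[symbol] = address
            address += size
    # Pass 2: map every external reference to the address of its definition.
    resolved = {}
    for name, module in modules.items():
        for symbol in module["externals"]:
            if symbol not in symbol_table:
                raise KeyError(f"unresolved symbol {symbol} in {name}")
            resolved[(name, symbol)] = symbol_table[symbol]
    return symbol_table, resolved


modules = {
    "main": {"defines": {"main": 16}, "externals": ["draw_line"]},
    "gfx": {"defines": {"draw_line": 32}, "externals": []},
}
symbol_table, resolved = link(modules)
print(symbol_table)
print(resolved)
```

In this sketch, changing the size of any definition shifts the addresses of every definition placed after it, which is one way to see why a change in one module leads to the whole set of modules being linked again.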
As mentioned before, builds take considerable time and require accurate archiving, bookkeeping and tracking of various modules, tools, utilities and additional information.
Dynamic linking allows linking to occur at startup time or during run-time execution of the program. The basic method of a DLL is to declare certain symbols as "exported" or "imported" symbols to be resolved at execution time. The code that resolves the symbols (i.e., associates each symbol reference to a symbol definition) is "DLL linking" code which is, itself, linked into the program during the program build and becomes part of executing image 60. At run-time the DLL linking code is executed to provide the executing program with access to items contained in the DLL.
Typically, DLL 62 includes many functions, routines, data structures, or other items that can be referenced by instructions within the executing image. The use of a DLL provides a way to modify the functionality of an executable image just prior to, or during, run-time by changing the functions in the DLL prior to executing the executable image. For example, a DLL may contain a graphics library such as Microsoft's DirectX library. If the graphics routines are changed the user can update their DLL by obtaining the new DLL (e.g., from a CDROM, downloading from the Internet, etc.) and execute the same executable image which will then make use of the new functionality provided by the updated DLL.
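The run-time resolution of imported symbols can be sketched as follows. This is a simulation only: the "DLL" is modeled as a dictionary of callables, and the class `ExecutableImage`, the method `link_dll` and the exported name `scale` are hypothetical. No operating system loader is involved.

```python
class ExecutableImage:
    """Toy executable whose imported symbols are resolved at run time."""

    def __init__(self, imports):
        self.imports = list(imports)   # symbols declared as imported at build time
        self.bound = {}                # symbol -> item, filled in by link_dll

    def link_dll(self, dll):
        # "DLL linking" code: associate each symbol reference with a definition.
        for symbol in self.imports:
            if symbol not in dll:
                raise KeyError(f"DLL does not export {symbol}")
            self.bound[symbol] = dll[symbol]

    def run(self, x):
        # The image calls through the bound symbol, not a fixed address.
        return self.bound["scale"](x)


dll_v1 = {"scale": lambda x: x * 2}    # original library
dll_v2 = {"scale": lambda x: x * 3}    # updated library, same exported name

image = ExecutableImage(imports=["scale"])
image.link_dll(dll_v1)
print(image.run(10))                   # uses the original DLL
image.link_dll(dll_v2)
print(image.run(10))                   # same image, updated DLL
```

The image itself is unchanged between the two runs; only the library it binds to differs. The sketch also shows the preparation involved: the image must have been built with its import list and linking code in place before any such substitution is possible.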
While the use of DLLs in this manner has advantages, there are also drawbacks. One drawback is that the entire DLL is loaded whenever any one item within the DLL needs to be referenced by an instruction in the executable image. The requirement for handling items within DLLs as a group is inefficient where only a small percentage of the items in the DLL are updated. That is, the user must obtain an entirely new DLL which contains only a small amount of changed code.
A second drawback with DLLs is that they require specific preparation in the application program that results in an executable image that can make use of DLLs. Referring to FIG. 1B, this preparation takes place at the outset when a programmer is writing the source code, such as source code 20. Typically, the programmer must declare DLL item references in various portions of the code. Specific support files must also be linked by linker 28 in order to resolve references to DLL items that will be actually linked at a later time. There are additional preparations that need to be made in order to correctly implement a DLL that vary among operating systems. Many of the details of DLL use are created by a programmer and, in this respect, a later programmer needs to learn the details. In general, the DLL approach is still a "library" based approach that doesn't work well for small, selective, functional changes. For discussions on preparing application programs for use with DLLs, references describing operating systems such as Microsoft's Windows 95 and Sun Microsystems' SunOS should be consulted.
Thus, it is apparent that a system for providing modification of run-time functionality that overcomes the problems of the prior art is desirable. Such a system should allow transparent and efficient modification of functionality prior to, or during, run-time without requiring a user to update large files such as DLLs. Such a system should also allow general application programs to be modified without requiring intricate preparation or large overhead by programmers or other developers. Ideally, the system would allow persons not involved with the Program Conversion of the application to quickly and accurately modify the functionality of the application program. The system should provide efficient execution of instructions that modify the functionality of an executing image while not requiring large amounts of system RAM to accommodate the added, or changed, functionality. The system should provide the simplicity and flexibility of interpreted code while maintaining the compactness and speed of execution provided by compiled code.