A. Field of the Invention
This invention generally relates to memory management for computer systems and, more particularly, to a methodology for managing memory resources for an application program having two types of program code, native code executing directly in an operating environment and target code for execution by an abstract computing machine associated with the operating environment and responsible for memory management for both types of code.
B. Description of the Related Art
Object-oriented programming techniques have revolutionized the computer industry. For example, such techniques offer new methods for designing and implementing computer programs using an application programming interface (API) associated with a predefined set of "classes," each of which provides a template for the creation of "objects" sharing certain attributes determined by the class. These attributes typically include a set of data fields and a set of methods for manipulating the object.
The Java™ Development Kit (JDK) from Sun Microsystems, Inc., for example, enables developers to write object-oriented programs using an API with classes defined using the Java™ programming language. The Java programming language is described, for example, in a text entitled "The Java Language Specification" by James Gosling, Bill Joy, and Guy Steele, Addison-Wesley, 1996. The class library associated with the Java API defines a hierarchy of classes with a child class (i.e., subclass) inheriting attributes (i.e., fields and methods) of its parent class. Instead of having to write all aspects of a program from scratch, programmers can simply include selected classes from the API in their programs and extend the functionality offered by such classes as required to suit the particular needs of a program. This effectively reduces the amount of effort generally required for software development.
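By way of illustration only, the following minimal sketch shows a child class inheriting fields and methods from an API class and extending its functionality. The names InheritanceSketch, History, and latest are hypothetical and are not part of the Java API; only ArrayList is an actual API class.

```java
import java.util.ArrayList;

class InheritanceSketch {
    // Hypothetical subclass: inherits add(), get(), size(), and isEmpty()
    // from the API class ArrayList and adds one method of its own.
    static class History extends ArrayList<String> {
        // Returns the most recently added entry, or null if there is none.
        String latest() {
            return isEmpty() ? null : get(size() - 1);
        }
    }

    static String demo() {
        History h = new History();
        h.add("open");   // inherited from ArrayList
        h.add("save");   // inherited from ArrayList
        return h.latest();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```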
The JDK also includes a compiler and a runtime environment with a virtual machine (VM) for executing programs. In general, software developers write programs in a programming language (in this case the Java programming language) that use classes from the API. Using the compiler, developers compile their programs into "class files" containing instructions for an abstract computing model embodied by the Java VM; these instructions are often called "bytecodes." The runtime environment has a class loader that integrates the class files of the application with selected API classes into an executable application. The Java VM then executes the application by simulating (or "interpreting") bytecodes on the host operating system/computer hardware. The Java VM thus acts like an abstract computing machine, receiving instructions from programs in the form of bytecodes and interpreting these bytecodes. (Another mode of execution is "just in time" compilation in which the VM dynamically compiles bytecodes into so-called native code for faster execution.) Details on the VM for the JDK can be found in a text entitled "The Java Virtual Machine Specification," by Tim Lindholm and Frank Yellin, Addison-Wesley, 1996.
The Java VM also supports multi-threaded program execution. Multi-threading is the partitioning of a computer program or application into logically independent "threads" of execution that can execute in parallel. Each thread includes a sequence of instructions to carry out a particular program task, such as a method for computing a value or for performing an input/output function. When employing a computer system with multiple processors, separate threads may execute concurrently on each processor.
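As a minimal sketch of such partitioning, the following example divides a computation between two threads, each carrying out an independent task, and combines the results once both have finished. The class and method names (ThreadSketch, sumRange, parallelSum) are hypothetical and chosen only for illustration; Thread, start, and join are standard Java API facilities.

```java
class ThreadSketch {
    // A stand-in for "a method for computing a value": sums the integers in [from, to).
    static long sumRange(int from, int to) {
        long sum = 0;
        for (int i = from; i < to; i++) {
            sum += i;
        }
        return sum;
    }

    // Splits the range [0, n) into two halves, one per thread.
    static long parallelSum(int n) {
        final long[] partial = new long[2];
        final int mid = n / 2;
        Thread first = new Thread(() -> partial[0] = sumRange(0, mid));
        Thread second = new Thread(() -> partial[1] = sumRange(mid, n));
        first.start();
        second.start();
        try {
            // join() waits for each thread and makes its writes visible here.
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return partial[0] + partial[1];
    }

    public static void main(String[] args) {
        System.out.println(parallelSum(1000));
    }
}
```

Whether the two threads actually run concurrently depends on the number of processors available and on the scheduling decisions of the VM and the host operating system.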
Thus, object-oriented facilities like the JDK assist both development and execution of object-oriented systems. First, they enable developers to create programs in an object-oriented programming language using an API. Second, they enable developers to compile their programs, and third, they facilitate program execution by providing a virtual machine implementation.
However, object-oriented programs may not be suitable for all functions of a system or it may not be economically feasible to convert all of the programs in an existing legacy system into object-oriented programs. It may also be necessary, for a system having primarily object-oriented programs, to use features of a platform's operating system that are not available in implementations using a VM like the Java VM. Finally, the virtual machine implementation itself is generally not written in the language it executes but rather in the native code of the host machine. Thus, it is not uncommon for systems to have programs with "native" and "non-native" code.
For purposes of this description, native code includes code written in any programming language that is then compiled to run on a compatible operating system/hardware configuration. For example, native code in this context includes program code written in the C or C++ programming language and compiled by an appropriate compiler for execution on a particular platform, such as a computer having the Windows 95 operating system running on an Intel Pentium processor. Native code is distinguishable from the non-native code, which will be referred to as "target code," because while non-native code is foreign to a platform's operating system/hardware configuration, its target for purposes of this description is an abstract computing machine, such as a VM, operating on any compatible platform configuration. For example, target code for the Java VM is generally written in the Java programming language. This combination of native and target code in the same application tends to complicate the management of memory resources (i.e., the allocation and deallocation of memory) for such systems.
In practice, when an application seeks to refer to an object, the computer must first allocate or designate memory for the object. Using a "reference" to the allocated memory, the application can then properly manipulate the object. One way to implement a reference is by means of a "pointer" or "machine address," which uses multiple bits of information; however, other implementations are possible. Objects can themselves contain primitive data items, such as integers or floating point numbers, and/or references to other objects. In this manner, a chain of references can be created, each reference pointing to an object which, in turn, points to another object. When no chain of references in an application reaches a given object, the computer can deallocate or reclaim the corresponding memory for reuse.
Memory reclamation can be handled explicitly by the application program. This method, however, requires programmers to design programs to account for all allocated objects and to determine when the objects are available for reclamation. The alternative is to assign responsibility for memory management to a runtime system responsible for controlling program execution. The Java VM, for example, is one such runtime system; it includes a "garbage collector" to manage available memory resources used during execution of Java code.
"Garbage collection" is the term used to refer to a class of algorithms used to carry out memory management, specifically, automatic reclamation. Garbage collection algorithms generally determine reachability of objects from the references held in some set of roots. When an object is no longer reachable, the memory that the object occupies can be reclaimed and reused. There are many known garbage collection algorithms, including reference counting, mark-sweep, and generational garbage collection algorithms. These, and other garbage collection techniques, are described in detail in a book entitled "Garbage Collection, Algorithms For Automatic Dynamic Memory Management" by Richard Jones and Raphael Lins, John Wiley & Sons, 1996.
To be effective, garbage collection techniques should be able to, first, identify references that are directly accessible to the executing program, and, second, given the reference to an object, identify references contained within that object, thereby allowing the garbage collector to transitively trace chains of references. Unfortunately, many of the described techniques for garbage collection have specific requirements which cause implementation problems, particularly when a garbage collector is charged with managing memory for a system having programs written in both native and target code. For example, the Java VM's garbage collector manages resources for Java code with relative ease; however, it requires additional facilities to manage resources for native code, and even then the garbage collector has significant limitations.
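The two capabilities just described can be illustrated with a toy mark-sweep collector. This is a simplified sketch under stated assumptions, not the collector of any actual VM: the "heap" is modeled as a list of hypothetical Obj records, each of which exposes the references it contains, and the roots are supplied explicitly by the caller.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

class MarkSweepSketch {
    // Hypothetical heap object: a name plus the references it contains.
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        boolean marked;
        Obj(String name) { this.name = name; }
    }

    // Mark phase: transitively trace chains of references from the roots.
    static void mark(Collection<Obj> roots) {
        Deque<Obj> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            Obj o = work.pop();
            if (o.marked) continue;      // already visited
            o.marked = true;
            work.addAll(o.refs);         // references contained within the object
        }
    }

    // Sweep phase: reclaim every unmarked object; returns the reclaimed names.
    static List<String> collect(List<Obj> heap, Collection<Obj> roots) {
        mark(roots);
        List<String> reclaimed = new ArrayList<>();
        for (Iterator<Obj> it = heap.iterator(); it.hasNext(); ) {
            Obj o = it.next();
            if (o.marked) {
                o.marked = false;        // reset for the next collection cycle
            } else {
                reclaimed.add(o.name);
                it.remove();
            }
        }
        return reclaimed;
    }

    static List<String> demo() {
        Obj a = new Obj("a"), b = new Obj("b"), c = new Obj("c"), d = new Obj("d");
        a.refs.add(b);                   // a -> b, reachable from the root a
        c.refs.add(d);                   // c -> d and d -> c form a cycle
        d.refs.add(c);                   // that no root reaches
        List<Obj> heap = new ArrayList<>(Arrays.asList(a, b, c, d));
        return collect(heap, Arrays.asList(a));
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this sketch the objects c and d refer to each other but are not reachable from any root, so a tracing collector reclaims both. The sketch works only because every Obj can enumerate its own references; that requirement is precisely what becomes difficult to satisfy for native code.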
In most language implementations, including the implementation of the Java programming language embodied in the JDK, stacks form one component of the root set. A stack is a region of memory in which stack frames may be allocated and deallocated. In typical object-oriented systems, each method executing in a thread of control allocates a stack frame, and uses the slots of that stack frame to hold the values of local variables. Some of those variables may contain references to heap-allocated objects. (The heap is an area of memory designated for resources associated with objects.) Such objects must be considered reachable as long as the method is executing. The term stack is used because the stack frames obey a last-in/first-out allocation discipline within a given thread of control. There is generally a stack associated with each thread of control, and when a thread involves both native and target program code, there are often two stacks, one for each type of code. Another component of the root set includes global variables used to hold references to objects outside a stack frame, which makes the objects available to multiple methods.
A garbage collector may be exact or conservative in how it treats different sources of references, such as stacks. A conservative collector knows only that some region of memory (e.g., a slot for a local variable in the stack frame or a memory location holding a global variable) may contain references, but does not know whether or not a given value in that region is a reference. If such a collector encounters a value that is a possible reference value, it must keep the referenced object alive. Because of the uncertainty in recognizing references, the collector is constrained not to move the object, since that would require updating the reference, which might actually be an unfortunately-valued integer or floating-point number. The main advantage of conservative collection is that it allows garbage collection to be used with systems not originally designed to support collection. For example, the collectors described in Bartlett, Joel F., "Mostly-Copying Collection Picks Up Generations and C++," Technical Report TN-12, DEC Western Research Laboratory, October 1989, and Boehm, Hans Juergen and Weiser, Mark, "Garbage Collection in an Uncooperative Environment," Software-Practice & Experience, 18(9), pp. 807-820, September 1988, use conservative techniques to support collection for C and C++ programs.
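The conservative treatment of a stack can be sketched as follows. The model is deliberately simplified and rests on assumptions made only for illustration: heap objects are identified by numeric addresses, stack slots are raw 64-bit values, and the names ConservativeScanSketch and conservativeRoots are hypothetical. It does not reproduce the cited Bartlett or Boehm-Weiser collectors.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

class ConservativeScanSketch {
    // Treats every slot value that matches an allocated address as a possible
    // reference. The returned objects must be kept alive and must not be moved.
    static Set<Long> conservativeRoots(long[] stackSlots, Set<Long> allocated) {
        Set<Long> pinned = new TreeSet<>();
        for (long value : stackSlots) {
            if (allocated.contains(value)) {
                pinned.add(value);       // cannot tell a reference from an integer
            }
        }
        return pinned;
    }

    static Set<Long> demo() {
        Set<Long> allocated = new HashSet<>(Arrays.asList(0x1000L, 0x2000L, 0x3000L));
        // Slot 0 is a genuine reference, slot 1 is a small integer, and slot 2
        // is assumed to be an integer that merely happens to equal an address.
        long[] stack = { 0x1000L, 42L, 0x3000L };
        return conservativeRoots(stack, allocated);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this sketch the object at address 0x3000 is retained even though, by assumption, no reference to it exists, and neither retained object can be relocated because the collector could not safely rewrite the slot values.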
In contrast, a collector is exact in its treatment of a memory region if it can accurately distinguish references from non-reference values in that region. Exactness has several advantages over conservatism. A conservative collector may retain garbage referenced by a non-reference value that an exact collector would reclaim. Perhaps more importantly, an exact collector is always free to relocate objects since it is able to identify references exactly. This freedom enables a wide range of useful and efficient garbage-collection techniques that cannot easily be used in a conservative setting. For example, the ability to relocate objects enables an exact collector to compact used memory during a collection cycle. However, a drawback of exact systems is that they must provide the information that makes them exact, i.e., information on whether a given value in memory is a reference or a primitive value. A VM can do this effectively for its target code using techniques such as stack maps that distinguish references from primitive values in the target code's stack. However, there is no known implementation that uses exact garbage collection for programs including both native and target code and allows the same level of flexibility and convenience in writing native code.
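The role of a stack map can be sketched in simplified form. The model below is an assumption made for illustration and is not the format used by any actual VM: a stack frame is an array of raw 64-bit slots, the stack map is a parallel array of flags marking which slots hold references, and a forwarding table records where relocated objects now reside. All names (StackMapSketch, relocate, forwarding) are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class StackMapSketch {
    // Updates only the slots that the stack map identifies as references,
    // replacing each old address with the object's new location.
    static void relocate(long[] slots, boolean[] isReference, Map<Long, Long> forwarding) {
        for (int i = 0; i < slots.length; i++) {
            if (!isReference[i]) continue;       // primitive value: leave untouched
            Long newAddress = forwarding.get(slots[i]);
            if (newAddress != null) {
                slots[i] = newAddress;
            }
        }
    }

    static long[] demo() {
        long[] slots = { 0x1000L, 42L, 0x3000L };
        // Stack map: only slot 0 is a reference. Slot 2 is an integer whose
        // value happens to coincide with a heap address.
        boolean[] isReference = { true, false, false };
        Map<Long, Long> forwarding = new HashMap<>();
        forwarding.put(0x1000L, 0x0100L);        // object moved during compaction
        forwarding.put(0x3000L, 0x0200L);
        relocate(slots, isReference, forwarding);
        return slots;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(demo()));
    }
}
```

Because the stack map identifies slot 2 as a primitive value, its contents are left unchanged even though they coincide with a relocated address, while the genuine reference in slot 0 is updated. Without such a map the collector could not safely perform the update.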
Sun Microsystems, Inc. also developed an interface, called the Java™ Native Interface (JNI), for native program code executing within the Java VM. The JNI comprises a library of functions, i.e., an API, and developers of native code call these functions by name from the native code. The JNI functions enable the Java VM's garbage collector to obtain certain information concerning the native code for purposes of garbage collection. Using JNI functions, for example, the native code can reference objects in a heap managed by the Java VM's garbage collector. While the interface itself allows an implementation supporting exact garbage collection, in the most common implementation exact garbage collection is not possible. This is because references are maintained in the same stack used to hold references for the Java code and the Java VM uses an indicator in a special frame of the Java code stack to control garbage collection of the native code objects. This implementation is satisfactory for conservative garbage collection but it does not prevent the "leaking" of direct object references outside the JNI stack frame. In other words, direct references to objects may be lost during a garbage collection cycle because not all of the references may be located in the JNI stack frame. Consequently, such an implementation of the JNI does not support an exact collection algorithm.
There is, therefore, a need for a mechanism that facilitates flexible garbage collection for memory resources for an application having two types of program code, native code familiar to an operating environment and target code for execution by an abstract computing machine associated with the operating environment.