1. Field of the Invention
The present invention relates to memory recycling processes or “garbage collection” in computer systems.
2. Description of the Prior Art
Many modern programming languages allow the programmer to dynamically allocate and reclaim memory. Dynamically allocated storage is often automatically managed during execution of a computer program using a process known as garbage collection. A particular example of a programming language that uses garbage collection is Java. Java is an object-oriented programming language. In object-oriented programming systems a “class” can provide a template for the creation of “objects” (i.e. data items) having characteristics of that class. Objects are typically created dynamically (in Java, using an operator called the “new” operator) during program execution. Methods associated with a class typically operate on objects of the same class.
Java source code is compiled to produce runtime code known as bytecode. However the bytecode is not the binary code of any existing computer. Rather, Java bytecode is an architecture-neutral machine code that can be interpreted to run on any specific computer. A Java program is executed by running an interpreter program called a Java Virtual Machine (JVM). The JVM reads the bytecode program and interprets or translates it to the native instruction set of the computer on which it is installed. The bytecode will run on any computer platform on which a JVM is installed without any need for recompilation of the source code. A process called “bytecode verification” is used as part of the linking of a Java program, prior to execution. The verification process ensures that the bytecode adheres to a set of rules defining well-formed Java class files. For example, if the verifier fails to confirm that a method in a class file pushes two integers onto an operand stack before executing an “iadd” (i.e. integer addition) instruction, the verifier will reject that class file.
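As an illustration of the “iadd” case mentioned above, the following minimal sketch shows a Java method together with the bytecode a typical compiler emits for it. The class and method names are hypothetical, and the exact instruction sequence may vary between compilers.

```java
final class IaddExample {
    // A typical Java compiler translates the body of add() to roughly:
    //   iload_0   // push local variable 0 (a) onto the operand stack
    //   iload_1   // push local variable 1 (b) onto the operand stack
    //   iadd      // pop two ints, push their sum
    //   ireturn   // return the int on top of the operand stack
    static int add(int a, int b) {
        return a + b;
    }
}
```

A verifier would accept this method because two integers are on the operand stack whenever the “iadd” instruction is reached.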
Java allows multi-threaded execution of processes. A thread (abbreviation for “thread of control”) is a sequence of execution within a process. A thread is a lightweight process to the extent that it does not have its own address space but uses the memory and other resources of the process in which it executes. There may be several threads in one process and the JVM manages the threads and schedules them for execution. Threads allow switching between several different functions executing simultaneously within a single program. When the JVM switches from running one thread to another thread, a context switch within the same address space is performed.
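The sharing of one address space between threads can be sketched as follows. This is a minimal illustration only; the class name SharedHeapDemo and the iteration count are assumptions made for the example.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class SharedHeapDemo {
    /** Runs two threads that update the same heap object and returns the final count. */
    static int run() {
        AtomicInteger counter = new AtomicInteger(); // one heap object, visible to both threads
        Runnable work = () -> {
            for (int i = 0; i < 1000; i++) {
                counter.incrementAndGet();           // atomic update, safe across threads
            }
        };
        Thread first = new Thread(work);
        Thread second = new Thread(work);
        first.start();
        second.start();
        try {
            first.join();                            // wait for both threads to finish
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return counter.get();
    }
}
```

Each thread has its own stack holding its local variable i, while the counter object lives on the heap that both threads share.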
Garbage collection relieves programmers from the burden of explicitly freeing allocated memory and helps ensure program integrity by reducing the likelihood of incorrectly deallocating memory. However, a potential disadvantage of garbage collection is that at unpredictable times, a potentially large amount of garbage collection processing will be initiated when there is a need for more memory. In Java this problem is sometimes ameliorated by the multi-threaded nature of the system, which can allow the garbage collection to run in parallel with the user code. A garbage collection algorithm must perform two basic functions. Firstly, it must determine which objects are suitable for reclamation as garbage and secondly it must reclaim the memory space used by the garbage objects and make it available to the program. Garbage is defined as memory no longer accessible to the program.
The first stage of this process is typically performed by defining a set of “roots” and determining reachability from those roots. Objects that are not reachable via the roots are considered to be garbage since there is no way for the program to access them so they cannot affect the future course of program execution.
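The reachability determination described above can be sketched as a simple graph traversal. This is an illustrative model rather than the mechanism of any particular JVM; the Node and Reachability classes are hypothetical and stand in for heap objects and the collector's marking phase.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

/** Hypothetical model of a heap object: a name plus its outgoing references. */
final class Node {
    final String name;
    final List<Node> refs = new ArrayList<>();
    Node(String name) { this.name = name; }
}

final class Reachability {
    /** Returns every node reachable from the given roots, i.e. the live set. */
    static Set<Node> mark(List<Node> roots) {
        // Identity-based set: two distinct objects are never treated as the same node.
        Set<Node> live = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node current = pending.pop();
            if (live.add(current)) {           // first visit: mark, then follow references
                pending.addAll(current.refs);
            }
        }
        return live;
    }
}
```

Any node that is absent from the returned set cannot be reached from the roots and is therefore garbage in the sense defined above. The visited check also makes the traversal terminate on cyclic structures.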
In the JVM the set of roots is implementation-dependent but will always include any object references in the local variables. The JVM comprises four basic components: registers; stack memory; heap memory; and a method area. All Java objects reside in heap memory and it is heap memory that is garbage-collected. Different pieces of storage can be allocated from and returned to the heap in no particular order. Memory allocated with the “new” operator in Java comes from the heap. The heap is shared among all of the threads. The method area is where the bytecodes reside. A program counter, which points to (i.e. contains the address of) some byte in the method area, is used to keep track of the thread of execution. The Java stack is used: to store parameters for and results of bytecode instructions; to pass parameters to and return results from methods; and to keep the state of each method invocation. The JVM has few registers because the bytecode instructions operate primarily on the stack. The stack is so-called because it operates on a last-in-first-out basis. The state of a method invocation is called its “stack frame”. Each method executing in a thread of control allocates a stack frame. When the method returns, the stack frame is discarded.
In short, in the JVM all objects reside on the heap, the local variables typically reside on the stack and each thread of execution has its own stack. Each local variable is either an object reference or a primitive type (i.e. non-reference) such as an integer, character or floating point number. Therefore the roots include every object reference on the stack of every thread.
There are many known garbage collection algorithms including reference counting, mark and sweep and generational garbage collection algorithms. Details of these and other garbage collection algorithms are described in “Garbage Collection: Algorithms for Automatic Dynamic Memory Management” by Richard Jones and Rafael Lins, published by John Wiley & Sons 1996.
Garbage collection algorithms may be categorised as either precise (exact) or imprecise (conservative) in how they identify and keep track of reference values in sources of references such as stacks. An imprecise garbage collector knows that a particular region of memory (e.g. a slot in a stack frame) may contain a reference but does not know whether a given value stored in that particular region is a reference. The imprecise garbage collector must therefore keep alive any object to which a potential reference points. A disadvantage of this known technique is that a primitive type may be erroneously identified as a reference by the imprecise garbage collector if the variable value happens to coincide with a memory address. This means that a garbage object will be wrongly considered to be “live” by the imprecise collector, because an object-reference lookalike (i.e. the primitive type) referred to it.
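The lookalike problem can be illustrated with a small model. In this hedged sketch, heap addresses and stack slots are both represented as plain long values, which is an assumption made for illustration; the ConservativeScan class is hypothetical.

```java
import java.util.Set;
import java.util.TreeSet;

final class ConservativeScan {
    /**
     * Returns the heap addresses an imprecise collector would keep alive when
     * scanning stack words whose types (reference or primitive) are unknown.
     */
    static Set<Long> retained(long[] stackWords, Set<Long> heapAddresses) {
        Set<Long> kept = new TreeSet<>();
        for (long word : stackWords) {
            // The word might be a reference, so the collector must assume that it is.
            if (heapAddresses.contains(word)) {
                kept.add(word);
            }
        }
        return kept;
    }
}
```

If one stack slot holds a genuine reference to the object at address 0x1000 and another holds the integer 8192, which equals the address 0x2000 of an otherwise unreachable object, both objects are retained even though the second is garbage.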
Precise garbage collectors, on the other hand, can discriminate between a genuine object-reference and a primitive-type masquerading as a reference. Accordingly, precise garbage collectors alleviate the problem suffered by imprecise collectors of failure to garbage collect objects corresponding to object-reference lookalikes. To perform precise garbage collection, the system must be able to clearly distinguish between references and non-references. For objects, a “layout map” can be produced for each object describing which data fields of the object contain references. Reference identification for objects is relatively simple since the layout map will be invariant for the entire lifetime of the object and all objects of the same type have identical layout maps. However, the reference identification process is more complex for stack frames because the layout of a stack frame is likely to change during its lifetime. For example a given stack frame slot could be uninitialised at the start of the method, contain an integer for one block of the method and a reference for another block of the method.
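The changing layout of a stack frame can be seen in ordinary Java source. In the following minimal sketch (the class and method names are hypothetical), the two block-scoped variables are never live at the same time, so a compiler may assign both to the same local variable slot of the stack frame.

```java
final class SlotReuse {
    static String describe(boolean numeric) {
        if (numeric) {
            int count = 42;          // the slot holds a primitive in this block
            return "int:" + count;
        } else {
            Object ref = "hello";    // the same slot may hold a reference in this block
            return "ref:" + ref;
        }
    }
}
```

A precise collector examining this frame therefore needs to know which block is executing before it can decide whether the slot contains a reference.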
There are currently two known methods for storing the information required for implementing precise garbage collection. The first involves building a series of stack maps during the bytecode verification stage. A “stack map” is a data structure that, for each execution point at which garbage collection may occur, represents a snapshot in time of the stack frame indicating the stack frame slots that contain references. Because stack maps must be stored for a plurality of execution points in every verified method, this method consumes a large amount of Random Access Memory (RAM).
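One possible representation of the stack maps for a single method is sketched below. It is an illustrative assumption, not the format used by any particular JVM: the StackMaps class is hypothetical, execution points are identified by bytecode offset, and each map is a bit set with one bit per stack frame slot.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

final class StackMaps {
    // Bytecode offset of a garbage collection point -> bit i set when slot i holds a reference.
    private final Map<Integer, BitSet> maps = new HashMap<>();

    /** Records which slots hold references at the given execution point. */
    void record(int pc, int... referenceSlots) {
        BitSet bits = new BitSet();
        for (int slot : referenceSlots) {
            bits.set(slot);
        }
        maps.put(pc, bits);
    }

    /** Reports whether the given slot holds a reference at the given execution point. */
    boolean isReference(int pc, int slot) {
        BitSet bits = maps.get(pc);
        if (bits == null) {
            throw new IllegalArgumentException("no stack map recorded at pc " + pc);
        }
        return bits.get(slot);
    }

    /** Number of execution points for which a map is stored. */
    int size() {
        return maps.size();
    }
}
```

The memory cost noted above follows from the structure: one bit set is held for every recorded execution point of every method, whether or not garbage collection ever occurs there.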
The second method involves keeping track of each stack write operation during execution of the verified bytecode, tagging each stack slot to which a reference is written and clearing the stack tag whenever a non-reference value is written to that stack location. This requires that the reference values and non-reference values written to the stack are self-describing, i.e. each value has one or more bits used to indicate whether that value is a reference. The tag data itself may be stored either by widening the existing stack to include a tag field or by keeping a parallel stack. This stack-tagging process slows down execution of the program since both the stack and the variable must be marked as a reference or non-reference for each value written to the stack.
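The parallel-stack variant of stack tagging can be sketched as follows. This is a simplified model under stated assumptions: the TaggedStack class is hypothetical, values are stored as long words, and a boolean array serves as the parallel tag stack.

```java
import java.util.ArrayList;
import java.util.List;

final class TaggedStack {
    private final long[] slots;
    private final boolean[] isRef;   // parallel tag stack: true when the slot holds a reference
    private int top = 0;

    TaggedStack(int capacity) {
        slots = new long[capacity];
        isRef = new boolean[capacity];
    }

    /** Writes a non-reference value and clears the tag for that slot. */
    void pushInt(int value) {
        slots[top] = value;
        isRef[top] = false;
        top++;
    }

    /** Writes a reference value and sets the tag for that slot. */
    void pushRef(long address) {
        slots[top] = address;
        isRef[top] = true;
        top++;
    }

    long pop() {
        top--;
        return slots[top];
    }

    /** Returns the values of all live slots currently tagged as references. */
    List<Long> references() {
        List<Long> refs = new ArrayList<>();
        for (int i = 0; i < top; i++) {
            if (isRef[i]) {
                refs.add(slots[i]);
            }
        }
        return refs;
    }
}
```

Every push performs two writes, one for the value and one for its tag, which is the source of the execution overhead described above. In exchange, the collector can find the references at any moment without consulting a stored map.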
Accordingly, it would be desirable to provide a precise garbage collection method that is less memory intensive and has less impact on program execution than known methods.