One of the most important resources within a data processing system is the amount of memory directly available for utilization by tasks during execution. Accordingly, much interest has been directed to efficient utilization of memory and memory management strategies. An important concept in memory management is the manner in which memory is allocated to a task, deallocated and then reclaimed.
Memory deallocation and reclamation may be explicit and controlled by an executing program, or may be carried out by another special-purpose program that locates and reclaims memory that is unused but has not been explicitly deallocated. "Garbage collection" is the term used in technical literature and the relevant arts to refer to a class of algorithms utilized to carry out storage management, specifically automatic memory reclamation. There are many known garbage collection algorithms, including reference counting, mark-sweep, and generational garbage collection algorithms. These, and other garbage collection techniques, are described in detail in a book entitled "Garbage Collection: Algorithms for Automatic Dynamic Memory Management" by Richard Jones and Rafael Lins, John Wiley & Sons, 1996. Unfortunately, many of the described techniques for garbage collection have specific requirements which cause implementation problems, as described herein.
For the purpose of this specification, the term "object" refers to a data structure that is represented in the memory of a computing system. This usage of the term object is distinct from the usage of the term "object" in "object-oriented" systems, wherein objects have associated "methods", i.e. pieces of code associated with them, which code may be invoked through a reference to the object. However, the present invention is applicable to such object-oriented systems.
An object may be located by a "reference", that is, a small amount of information that can be used to access the data structure. One way to implement a reference is by means of a "pointer" or "machine address", which uses multiple bits of information; however, other implementations are possible. General-purpose programming languages and other programmed systems often use references to locate and access objects. Such objects can themselves contain references to data, such as integers or floating-point numbers, and to yet other objects. In this manner, a chain of references can be created, each reference pointing to an object which, in turn, points to another object.
Garbage collection techniques determine when a data structure is no longer reachable by an executing program, either directly or through a chain of references. When a data structure is no longer reachable, the memory that the data structure occupies can be reclaimed and reused even if it has not been explicitly deallocated by the program. To be effective, garbage collection techniques should be able to, first, identify references that are directly accessible to the executing program, and, second, given a reference to an object, identify references contained within that object, thereby allowing the garbage collector to transitively trace chains of references.
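The two capabilities described above, identifying directly accessible references and then following the references contained in each object, may be illustrated by the following minimal sketch. It is offered only as an illustration under stated assumptions: the class names HeapObject and ReachabilityTracer and the method trace are hypothetical, and objects are modeled as ordinary Java objects rather than as raw memory.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

/** Hypothetical heap object: a label plus the references it contains. */
final class HeapObject {
    final String name;                                      // label used only for illustration
    final List<HeapObject> references = new ArrayList<>();  // references held by this object

    HeapObject(String name) {
        this.name = name;
    }
}

/** Hypothetical tracer that computes the set of reachable objects. */
final class ReachabilityTracer {
    /** Returns every object reachable from the roots, directly or through a chain of references. */
    static Set<HeapObject> trace(List<HeapObject> roots) {
        // Identity-based set: two distinct objects are never treated as the same object.
        Set<HeapObject> marked = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<HeapObject> pending = new ArrayDeque<>();

        // Step 1: start from the references directly accessible to the program.
        for (HeapObject root : roots) {
            if (root != null && marked.add(root)) {
                pending.push(root);
            }
        }
        // Step 2: follow the references contained within each object found so far.
        while (!pending.isEmpty()) {
            HeapObject current = pending.pop();
            for (HeapObject child : current.references) {
                if (child != null && marked.add(child)) {
                    pending.push(child);
                }
            }
        }
        return marked;
    }
}
```

Any object absent from the returned set is unreachable, and the memory it occupies is a candidate for reclamation. The marked set also ensures that cyclic chains of references terminate.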
A subclass of garbage collectors, known as "relocating" garbage collectors, relocates data structures that are still reachable by the executing program. Relocation of a data structure is accomplished by making a copy of the data structure in another region of memory, then replacing all reachable references to the original data structure with references to the new copy. The memory occupied by the original data structure may then be reclaimed and reused. Relocating garbage collectors have the desirable property that they compact the memory used by the executing program and thereby reduce memory fragmentation.
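The copy-then-update procedure described above may be sketched as follows. This is one possible illustration, not a definitive implementation: the names Cell, Relocator, relocate and forward are hypothetical, the new region of memory is modeled as a Java list, and a forwarding table is assumed as the means of recording where each original was copied.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical object layout: one primitive field plus an array of references. */
final class Cell {
    int payload;        // primitive data, copied verbatim and never rewritten
    final Cell[] refs;  // references, rewritten to point at the new copies

    Cell(int payload, int refCount) {
        this.payload = payload;
        this.refs = new Cell[refCount];
    }
}

/** Hypothetical relocating collector that models the new memory region as a list. */
final class Relocator {
    /** Copies everything reachable from the roots and updates the roots in place. */
    static List<Cell> relocate(Cell[] roots) {
        Map<Cell, Cell> forwarding = new IdentityHashMap<>();  // original -> new copy
        List<Cell> toSpace = new ArrayList<>();                // densely packed copies

        for (int i = 0; i < roots.length; i++) {
            roots[i] = forward(roots[i], forwarding, toSpace);
        }
        // Scan the copies in order; toSpace grows as new objects are discovered.
        for (int scan = 0; scan < toSpace.size(); scan++) {
            Cell copy = toSpace.get(scan);
            for (int i = 0; i < copy.refs.length; i++) {
                copy.refs[i] = forward(copy.refs[i], forwarding, toSpace);
            }
        }
        return toSpace;
    }

    /** Returns the new copy of an original object, creating it on first visit. */
    private static Cell forward(Cell original, Map<Cell, Cell> forwarding, List<Cell> toSpace) {
        if (original == null) {
            return null;
        }
        Cell copy = forwarding.get(original);
        if (copy == null) {
            copy = new Cell(original.payload, original.refs.length);
            // The copy temporarily holds old references; the scan loop rewrites them.
            System.arraycopy(original.refs, 0, copy.refs, 0, original.refs.length);
            forwarding.put(original, copy);
            toSpace.add(copy);
        }
        return copy;
    }
}
```

In this sketch only the refs fields are rewritten, while payload is copied unchanged. The collector can make that distinction here only because the assumed layout declares which fields are references.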
Because relocating garbage collectors modify references during the garbage collection process, it is important that references be identified and distinguished from non-reference information, such as data, which cannot be modified for garbage collection purposes. Consequently, fully relocating garbage collectors belong to a subclass of garbage collection methods, known as "exact" garbage collectors, which require knowledge of whether a given piece of information in memory is a reference or a primitive value. For the purposes of this document, a "primitive value" or "primitive data" is defined as data which does not function as a reference, such as an integer or floating-point number.
In order to facilitate the use of exact garbage collection, some computing systems use a "tagged" representation for all memory locations to positively distinguish references from data. In such systems, references and primitive data, such as integers and floating-point numbers, are represented in memory in such a manner that a reference always has a different bit pattern than a primitive value. This is generally done by including tag bits in each memory location in addition to the bits holding the memory location value. The tag bits for a memory location holding a reference value are always different from the tag bits for a memory location holding a datum value. The MIT LISP Machine was one of the first architectures which used garbage collection and had a single stack with explicitly tagged memory values. Its successor, the Symbolics 3600, commercially available from Symbolics, Inc., Cambridge, Mass., also used explicitly tagged memory values. The Symbolics 3600 was able to accommodate either a 32-bit reference or a 32-bit primitive datum in a single stack by using 36-bit words, 4 bits of which were permanently allocated for tagging information. As such, the bit pattern within a 36-bit word for a reference was always distinguishable from the bit pattern for a primitive integer or floating-point value.
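By way of illustration only, a 36-bit tagged word of the kind described above may be modeled in the low 36 bits of a 64-bit Java long. The class name TaggedWord, its method names, and in particular the tag values chosen below are assumptions made for this sketch; they are not the encoding actually used by the Symbolics 3600 or any other machine.

```java
/** Hypothetical model of a 36-bit word: 4 tag bits above a 32-bit payload. */
final class TaggedWord {
    static final int PAYLOAD_BITS = 32;
    static final long PAYLOAD_MASK = 0xFFFF_FFFFL;

    // Assumed tag assignments, chosen only so that each kind differs from the others.
    static final int TAG_INTEGER = 0x1;
    static final int TAG_FLOAT = 0x2;
    static final int TAG_REFERENCE = 0x3;

    /** Combines a 4-bit tag and a 32-bit payload into one 36-bit word. */
    static long pack(int tag, int payload) {
        return ((long) (tag & 0xF) << PAYLOAD_BITS) | (payload & PAYLOAD_MASK);
    }

    /** Extracts the 4 tag bits. */
    static int tagOf(long word) {
        return (int) ((word >>> PAYLOAD_BITS) & 0xF);
    }

    /** Extracts the full 32-bit payload, unchanged by tagging. */
    static int payloadOf(long word) {
        return (int) (word & PAYLOAD_MASK);
    }

    /** A collector can classify any word by inspecting its tag alone. */
    static boolean isReference(long word) {
        return tagOf(word) == TAG_REFERENCE;
    }
}
```

With this layout, pack(TAG_REFERENCE, 0x1000) and pack(TAG_INTEGER, 0x1000) carry the same 32-bit payload yet differ as 36-bit words, so a reference can never be mistaken for an integer.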
Permanently allocated tag bits have the disadvantage that they consume memory space that might otherwise be used to store computational data. Consequently, many computer systems use an "untagged" data representation in which the entire memory word is devoted to representing the datum value. In such systems, the same bit pattern might represent a reference or a primitive value. As a result, in such systems, the distinction between references and primitive values is often made from external considerations or representations, such as the instruction that is to operate on the data, or the position of the data within an object. However, the use of external considerations to make this distinction is not possible in all systems.
For example, the Java programming language was originally designed for use in systems using untagged data representation. The Java programming language is described in detail in the text entitled "The Java Language Specification" by James Gosling, Bill Joy and Guy Steele, Addison-Wesley, 1996. The Java language was designed to run on computing systems with characteristics that are specified by the Java Virtual Machine Specification which is described in detail in a text entitled "The Java Virtual Machine Specification", by Tim Lindholm and Frank Yellin, Addison-Wesley, 1996.
According to the Java Virtual Machine (JVM) Specification, a local variable or stack slot in a computing system using 32-bit memory words may contain either a 32-bit integer, a 32-bit floating-point number, or a 32-bit reference. Consequently, tagged data representation cannot be used in all cases, because no bits of the word remain free to hold a tag. (Programming languages that use tagged data representation on 32-bit computer architectures typically restrict the size of integers to 30 bits.) Moreover, it is not possible to distinguish references from data in all cases by examining Java instructions, because many instructions operate indiscriminately on references and data.
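The ambiguity of an untagged 32-bit slot may be illustrated by the following sketch, in which a single bit pattern is read as an integer, as a floating-point number, and as an address. The class name UntaggedSlot and its methods are hypothetical, and treating a reference as a 32-bit unsigned address is an assumption made for illustration. The method fitsIn30Bits models the restricted integer range mentioned above for tagged 32-bit representations.

```java
/** Hypothetical view of one untagged 32-bit stack slot under three interpretations. */
final class UntaggedSlot {
    /** The slot read as a 32-bit signed integer. */
    static int asInt(int bits) {
        return bits;
    }

    /** The same bits read as an IEEE 754 single-precision number. */
    static float asFloat(int bits) {
        return Float.intBitsToFloat(bits);
    }

    /** The same bits read as an unsigned 32-bit machine address (assumed reference form). */
    static long asAddress(int bits) {
        return Integer.toUnsignedLong(bits);
    }

    /** True if the value would survive in a signed 30-bit field, leaving 2 bits for a tag. */
    static boolean fitsIn30Bits(int value) {
        return value >= -(1 << 29) && value < (1 << 29);
    }

    public static void main(String[] args) {
        int bits = 0x3F800000;
        // Nothing in the bit pattern itself says which of these readings is intended.
        System.out.println("as int:     " + asInt(bits));
        System.out.println("as float:   " + asFloat(bits));
        System.out.println("as address: 0x" + Long.toHexString(asAddress(bits)));
        System.out.println("fits in 30 bits: " + fitsIn30Bits(bits));
    }
}
```

The pattern 0x3F800000 is simultaneously the integer 1065353216, the floating-point value 1.0, and a plausible address. As an integer it does not fit in 30 bits, so a scheme that reserves two tag bits could not represent it.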
Another technique by which references and primitive data are identified and distinguished is to keep them on separate stacks. This technique was used in a system that also used garbage collection and goes back more than twenty years, to a version of the MacLisp system that had four stacks. Two of these stacks contained references and two did not. Of the two that contained references, one contained ordinary subroutine arguments and one contained "dynamic bindings" of so-called "special variables". Of the two that contained primitive data, one held integer values and one held floating-point values. While it was true that the garbage collector did not need to scan the integer and floating-point stacks while looking for references, the principal motivation for this separation was not to use two or more stacks to simulate a single logical stack, but rather to stack-allocate certain numerical quantities that might otherwise have required allocation in the garbage-collected heap, and to be able to tell whether such data was integer or floating-point by its address. This required two separate stacks for primitive data, not just one. Subroutine arguments were never passed on the two primitive stacks. Rather, every argument was represented as a reference on the main reference stack. Although some of these references were addresses of locations on the numerical stacks, this was a consequence of the more general mechanism for stack allocation of numerical values. Local variables of the reference type were allocated on the reference stack and local variables of numerical type might, or might not, be allocated on a numerical stack. If a local variable of the numerical type was allocated on a numerical stack, it might be represented in a dual form where the numerical value was stored in a numerical stack and the address of the location on the numerical stack was stored in the reference stack. A generalization of such a strategy was used in the S-1 LISP language.
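The general idea of keeping references and primitive data on separate stacks may be sketched as follows. This sketch illustrates only the basic segregation of operands and is not a model of the MacLisp or S-1 LISP mechanisms described above; the class name SplitStackFrame and its methods are hypothetical, and a single primitive stack of 32-bit integers is assumed for brevity.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical operand storage that segregates references from primitive data. */
final class SplitStackFrame {
    private final List<Object> referenceStack = new ArrayList<>();   // holds only references
    private final List<Integer> primitiveStack = new ArrayList<>();  // holds only primitive data

    void pushReference(Object reference) {
        referenceStack.add(reference);
    }

    void pushPrimitive(int value) {
        primitiveStack.add(value);
    }

    Object popReference() {
        return referenceStack.remove(referenceStack.size() - 1);
    }

    int popPrimitive() {
        return primitiveStack.remove(primitiveStack.size() - 1);
    }

    /** Roots for the collector: every slot of the reference stack and nothing else. */
    List<Object> roots() {
        return new ArrayList<>(referenceStack);
    }
}
```

Because each stack holds only one kind of value, a collector in this sketch never needs to inspect the primitive stack, and no tag bits are required to tell a reference from a datum.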
Accordingly, it is difficult to implement an exact garbage collection algorithm on a computer system architecture that neither accommodates tagged data nor supports segregation of operands into different stacks.