Basic Explanation of Garbage Collection
Garbage collection is a complex topic that has been the subject of hundreds of technical articles and at least one textbook. The following is a simplified explanation of dynamic memory allocation and garbage collection. For a more complete explanation of basic garbage collection technology, see U.S. Pat. No. 5,088,036, and Richard Jones and Rafael Lins, "Garbage Collection," John Wiley & Sons Ltd., 1996, both of which are incorporated by reference as background information.
Referring to FIG. 1, there is shown a typical multitasking computer system 100 that has a CPU 102, user interface 106, and memory 108 (including both fast random access memory and slower non-volatile memory). The memory 108 stores an operating system 110 and one or more mutator tasks 112 (also called processes or threads). Each active task in the system is assigned a portion of the computer's memory, including space for storing the application level code 112 executed by the task, space for storing a program stack 114, and a heap 116 that is used for dynamic memory allocation.
The CPU 102 includes an instruction cache 120 for providing instructions to an instruction decoder and execution logic 122. The CPU also includes a stack cache 124 for storing in high speed cache memory a portion of the program stack 114, and a set of registers 126 for storing data values, object references 128 and the like. The program stack 114, including the portion in the stack cache 124, is used to temporarily store various data structures and parameters, including activation records (sometimes called "frames") 130 that are pushed on the program stack each time a method or other procedure is invoked.
During garbage collection, the program stack 114 and the registers 126 in the CPU 102 are typically used to locate a "root set" of object references or pointers used by the mutator task 112. A root set locator procedure in the garbage collector will typically generate and store a root set list 132 of the located root set object references.
It should be understood that FIG. 1 depicts only a simplified representation of a CPU 102 and the items stored in memory. Also, it should be understood that multiple processes may be executing simultaneously in a computer system, each with its own address space that includes the same types of items as shown in the memory 108 of FIG. 1.
For the purposes of this description, the terms "task", "mutator", "mutator thread", "thread" and "process" are used interchangeably. Tasks and programs are sometimes called mutators because they change or "mutate" the contents of the heap 116.
The term "object" is herein defined to mean any data structure created by a program or process.
The terms "reference" and "object reference" are used interchangeably to describe a data structure that includes a pointer to an object. While the terms "pointer" and "object pointer" are sometimes used interchangeably with "reference" and "object reference", object references may include information in addition to a pointer. An object reference may be direct or indirect. A direct object reference directly points to an object header, while an indirect object reference points to an object handle. In this document the term "object reference" refers to both types.
The term "stack cache overflow" is defined for the purpose of this document to mean either that the relevant stack cache has become full and the CPU is requesting to write additional entries to it, or that the number of entries in the stack cache equals or exceeds a predefined or programmable "high water mark" (or equivalently, the number of unused entries in the stack cache falls below a predefined or programmable value).
The term "stack cache underflow" is defined for the purpose of this document to mean either that the relevant stack cache has become empty and the CPU is requesting further stack cache entries from it, or that the number of used entries in the stack cache falls below a predefined or programmable "low water mark" (or equivalently, the number of unused entries in the stack cache exceeds a predefined or programmable value).
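The two definitions above can be read as predicates over the occupancy of the stack cache. The following is a minimal sketch only; the names `StackCache`, `high_water`, `low_water`, `pending_push` and `pending_pop` are hypothetical and are introduced here purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class StackCache:
    capacity: int    # total number of entries in the stack cache
    used: int        # entries currently holding stack data
    high_water: int  # programmable "high water mark" (hypothetical name)
    low_water: int   # programmable "low water mark" (hypothetical name)

    def overflow(self, pending_push: bool = False) -> bool:
        """Cache is full and the CPU wants to write more entries,
        or the used-entry count equals or exceeds the high water mark."""
        full_and_writing = self.used == self.capacity and pending_push
        return full_and_writing or self.used >= self.high_water

    def underflow(self, pending_pop: bool = False) -> bool:
        """Cache is empty and the CPU wants further entries from it,
        or the used-entry count falls below the low water mark."""
        empty_and_reading = self.used == 0 and pending_pop
        return empty_and_reading or self.used < self.low_water
```

For example, a 64-entry cache with `high_water=56` reports overflow as soon as 56 entries are in use, before the cache is actually full.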
The term "in-band memory tagging" refers to storing stack cache tags in-band in memory, as a header word preceding N datum words. The term "out-band memory tagging" refers to storing stack cache tags in a header for N datum words in an alternate location in memory from the location used to store the N datum words.
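The difference between the two layouts can be illustrated as follows. This sketch assumes, for illustration only, that each tag is a single bit and that bit i of a header word tags datum word i of its group; the function names are hypothetical.

```python
def header_words(tags, n):
    """Build one header word per group of up to n datum words.
    Assumption: bit i of a header is set when datum i of the group
    holds an object reference."""
    headers = []
    for start in range(0, len(tags), n):
        header = 0
        for i, is_ref in enumerate(tags[start:start + n]):
            if is_ref:
                header |= 1 << i
        headers.append(header)
    return headers


def pack_in_band(words, tags, n):
    """In-band: each header word immediately precedes its n datum words."""
    out = []
    for g, header in enumerate(header_words(tags, n)):
        out.append(header)
        out.extend(words[g * n:(g + 1) * n])
    return out


def pack_out_band(words, tags, n):
    """Out-band: headers are kept in a separate region from the datum words."""
    return header_words(tags, n), list(words)
```

With `n = 2`, the words `[10, 20, 30, 40]` tagged `[True, False, False, True]` are stored in-band as `[1, 10, 20, 2, 30, 40]`, while the out-band form keeps the headers `[1, 2]` apart from the unchanged data.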
When the mutator task 112 associated with the heap 116 needs space for storing an array or other program "object", a Memory Allocator routine 140 in the operating system is called. The memory allocator 140 responds by allocating a block of unused memory in the heap 116 to the task. Additional requests for memory will result in the allocation of additional memory blocks. Clearly, if the task continues to ask for more memory, all the space in the heap 116 will eventually be used and the task will fail for lack of memory. Therefore, space must be reclaimed either by explicit actions of the program or by some other mechanism.
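The exhaustion problem described above can be shown with a very small allocator model. This is a sketch under stated assumptions: the `Heap` class and its sequential ("bump") allocation policy are hypothetical, and this document does not specify how the memory allocator 140 selects blocks.

```python
class Heap:
    """A fixed-size heap that hands out blocks sequentially and never frees them."""

    def __init__(self, size):
        self.size = size
        self.next_free = 0  # address of the first unallocated byte

    def allocate(self, nbytes):
        """Return the start address of a new block, or fail when the heap is used up."""
        if self.next_free + nbytes > self.size:
            raise MemoryError("heap exhausted")
        addr = self.next_free
        self.next_free += nbytes
        return addr
```

Because nothing is ever returned to the heap in this model, a task that keeps allocating must eventually fail, which is the situation that explicit deallocation or garbage collection is meant to prevent.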
It is well known that most tasks "abandon" much of the memory space that is allocated to them. Typically, the task stores many program objects in allocated memory blocks, and discards all references to many of those objects after it has finished processing them because it will never need to access those objects again. An object for which there are no references (sometimes called pointers) is often termed an "inaccessible object", and the memory space it occupies is "inaccessible" to the task that once used it.
The solution to this problem is to recover blocks of memory space in the heap 116 that are no longer being used by the task. Garbage collection is the term used to refer to automatic methods of recovering unused memory in the heap 116. The garbage collector generally gathers and recovers unused memory upon the occurrence of a predefined event, such as the expiration of a predefined time period, or usage of a certain amount of the available heap. Thus, FIG. 1 shows that the operating system 110 includes a garbage collector 142.
Thus, the purpose of the garbage collector 142 is to recover unused or abandoned portions of memory in the heap 116 so that the task using the heap will not run out of memory.
While there are a number of different garbage collection methodologies, the easiest one to explain is the Stop and Copy garbage collection technique. In this scheme the heap is divided into two halves, also called semi-spaces, and the program uses only one semi-space at a time. Stop and Copy garbage collectors reclaim unused memory and compact the program accessible memory used by a task by copying all "accessible objects" in the current semi-space to a contiguous block of memory in the other semi-space, and changing all references to the accessible objects so as to point to the new copy of these objects. An accessible object is any object (i.e., block of memory) that is referenced, directly or indirectly, by the "roots" or "root set" of the task. Typically, the "root set" of a task with respect to garbage collection is a set of object references stored in known locations, in the program stack 114 and registers 126 used by the task, which point to objects used by a task. Many of those objects, in turn, will contain references to other objects used by the task. The chain, or directed graph, of object references emanating from the root set indirectly points to all of the accessible objects in the heap.
The entire set of objects referenced by these object references (pointers) is called the set of accessible objects. Inaccessible objects are all objects not referenced by the set of object references derived from the root set.
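The distinction between accessible and inaccessible objects amounts to reachability in a directed graph. The following sketch assumes a hypothetical representation in which `refs` maps each object identifier to the identifiers of the objects it references, and `roots` lists the objects referenced from the root set.

```python
def accessible(roots, refs):
    """Return the set of objects reachable, directly or indirectly, from the roots."""
    seen = set()
    work = list(roots)
    while work:
        obj = work.pop()
        if obj in seen:
            continue  # already visited; also guards against reference cycles
        seen.add(obj)
        work.extend(refs.get(obj, ()))
    return seen


def inaccessible(roots, refs):
    """Every known object that is not reachable from the roots."""
    return set(refs) - accessible(roots, refs)
```

Note that an object such as "d" in `{"a": ["b"], "b": ["c", "a"], "c": [], "d": ["a"]}` is inaccessible from the root "a" even though it holds a reference to an accessible object; only references leading from the root set count.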
By copying all accessible objects to a new contiguous block of memory in the heap, and then using the new copy of the objects in place of the old copy, the Stop and Copy garbage collector eliminates all unused memory blocks in the heap. It also "compacts" the memory storage used by the task so that there are no "holes" between accessible objects. Compaction is a desirable property because it puts all of the memory available for allocation to a task in a contiguous block, which eliminates the need to keep track of numerous small blocks of unallocated memory. Compaction also improves virtual memory performance.
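The copying and reference-updating steps described above may be sketched as follows. This is an illustrative model only, not the collector 142 itself: each semi-space is modeled as a Python list of objects, each object as a list of fields, and the `Ref` class and `stop_and_copy` function are hypothetical names. The scan loop follows the well known Cheney style of copying, which is one common way to implement the technique.

```python
class Ref:
    """An object reference: the index of an object within a semi-space."""

    def __init__(self, addr):
        self.addr = addr


def stop_and_copy(from_space, roots):
    """Copy every object reachable from `roots` into a new contiguous to-space.

    from_space: list of objects; each object is a list of fields, and a field
                is either plain data or a Ref into from_space.
    roots:      list of Ref into from_space (the root set).
    Returns (to_space, new_roots), with all references rewritten to to-space.
    """
    to_space = []
    forwarding = {}  # from-space address -> to-space address

    def forward(ref):
        # Copy the target on first visit; afterwards reuse its new address.
        if ref.addr not in forwarding:
            forwarding[ref.addr] = len(to_space)
            to_space.append(list(from_space[ref.addr]))
        return Ref(forwarding[ref.addr])

    new_roots = [forward(r) for r in roots]
    scan = 0
    while scan < len(to_space):  # objects past `scan` still hold old references
        obj = to_space[scan]
        for i, field in enumerate(obj):
            if isinstance(field, Ref):
                obj[i] = forward(field)
        scan += 1
    return to_space, new_roots
```

Objects that are never reached are simply never copied, so the to-space holds only accessible objects, packed contiguously in the order in which they were discovered.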
Also shown in FIG. 1 are aspects of a computer system that is set up to execute "Java™" (a trademark of Sun Microsystems, Inc.) bytecode programs. In particular, the operating system of such a system includes:
a bytecode program verifier 144 for verifying whether or not a specified Java bytecode program satisfies certain predefined integrity criteria;
a class loader 146, which loads object classes into a user's address space and utilizes the bytecode program verifier 144 to verify the integrity of the methods associated with each loaded object class; and
a bytecode program interpreter (not shown) for executing Java bytecode programs. If the instruction decoder and execution logic 122 is designed to execute Java bytecode instructions, a bytecode program interpreter is not needed.
Furthermore, in a computer system set up to execute Java bytecode programs, memory 108 will include at least one class repository 150, for locally storing object classes 152 in use and/or available for use by users of the computer system 100. The heap 116 acts as an object repository for storing objects, which are instances of the object classes stored in the class repository 150.
The present invention is equally applicable to systems using incremental garbage collection, which is a collection of techniques for performing garbage collection in a manner that is interleaved in small increments with mutator functions. Incremental garbage collection is used primarily in systems that require real-time system performance. In most copying versions of incremental garbage collection, every time an existing object is accessed, the existing object is copied from old space to new space unless the object has already been moved to new space during the current collection cycle. There are also non-copying and non-compacting versions of incremental garbage collection. Incremental garbage collection reduces the length of any single system pause caused by garbage collection, but may increase latency in the execution of individual mutator task instructions. The procedure or set of instructions used for performing incremental garbage collection is sometimes referred to as "read barrier" instructions, since it is typically performed in conjunction with object field read instructions. Incremental collection may also be performed using write barrier instructions, which are typically performed in conjunction with object reference write instructions.
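The copy-on-access behavior of a copying incremental collector can be sketched as follows. The `Collector` class, its dictionary-based old space, and the `read_barrier` method name are hypothetical and serve only to illustrate the rule that an object is copied on its first access in a collection cycle and not thereafter.

```python
class Collector:
    def __init__(self, old_space):
        self.old_space = old_space  # dict: old-space address -> object (dict of fields)
        self.new_space = []         # objects already moved during this cycle
        self.forwarding = {}        # old-space address -> index in new_space

    def read_barrier(self, old_addr):
        """Return the new-space copy of an object, copying it on first access."""
        if old_addr not in self.forwarding:
            self.forwarding[old_addr] = len(self.new_space)
            self.new_space.append(dict(self.old_space[old_addr]))
        return self.new_space[self.forwarding[old_addr]]
```

Each access pays for a lookup, and the first access to an object also pays for a copy. This illustrates how the collection work is spread across mutator instructions rather than concentrated in one long pause.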
The present invention is also equally applicable to any garbage collection system in which the evaluation stack is part of the root set, and to any tracing garbage collection system.
When a computer system supports multithreading, it is possible for two or more threads of execution to share a single address space. In such systems, each thread of execution is typically considered to be a separate mutator task, and each has its own stack and register set.
In computer systems that use tagged memory to facilitate efficient garbage collection, every word of memory has a corresponding tag that specifies whether or not the value stored in the memory word is an object reference. Of course, in such systems it is necessary that the CPU be able to determine, while it is executing each instruction that changes the contents of any memory location, stack location or register, whether or not the value being written is an object reference. For this reason the instructions used by such computer systems are generally sufficiently data type specific that all instructions, other than instructions which copy values from one memory location to another, that can write an object reference, are only used to write object reference values (i.e., they are data type specific for handling only object reference data).
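The tagging discipline described above can be modeled in a few lines. In this sketch the `TaggedMemory` class and its method names are hypothetical: `store_int` and `store_ref` stand for data type specific write instructions, while `copy` stands for an instruction that moves a value, together with its tag, from one location to another.

```python
class TaggedMemory:
    def __init__(self, size):
        self.words = [0] * size
        self.is_ref = [False] * size  # one tag per memory word

    def store_int(self, addr, value):
        """Type specific write of a non-reference value: clears the tag."""
        self.words[addr] = value
        self.is_ref[addr] = False

    def store_ref(self, addr, ref):
        """Type specific write of an object reference: sets the tag."""
        self.words[addr] = ref
        self.is_ref[addr] = True

    def copy(self, dst, src):
        """Type agnostic copy: the tag travels with the value."""
        self.words[dst] = self.words[src]
        self.is_ref[dst] = self.is_ref[src]

    def references(self):
        """All values currently tagged as object references."""
        return [w for w, tagged in zip(self.words, self.is_ref) if tagged]
```

In this model, an integer that happens to have the same bit pattern as a reference is never mistaken for one, because the tag rather than the value decides what `references` returns.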
The identification of object references in the program stack is a tedious, time-consuming task in non-tagged memory based computers. Further, making a CPU or operating system compatible with non-tagged memory is usually considered to be desirable because virtually all desktop and workstation computers use memory with a standardized word width, such as 32, 64 or 128 bits per word. Also, updating an explicit main memory tag can incur expensive read-modify-write cycle operations, thus reducing available memory bandwidth.
Prior to the present invention there has been an implicit assumption that if tagged main memory was not available, then cache memory inside the CPU should be untagged since it was assumed that there was no point in tagging one without the other. However, given the well known advantages of using tagged memory for garbage collection, the present invention provides a system and method for using tagged stack cache memory inside a computer's CPU while using conventional untagged main memory outside the CPU.
One last point of background information is that the prior art provides a number of examples, other than object reference marking, in which the tagging of memory is useful. It is beyond the scope of this document to explain the details of such systems, because the preferred embodiment of the present invention is specifically addressed at making root set location efficient in object oriented computer systems that use automatic garbage collection for storage management. However, suffice it to say that most aspects of the present invention are applicable to any computer system or operating system that uses conventional, untagged main memory, but where using tagged cache memory in the CPU would be desirable.