1. Field of the Invention
The present invention relates to memory management, particularly to the aspect of memory management that has become known as “garbage collection.” More particularly, the present invention relates to garbage collection in systems having multiple processors sharing memory.
2. Background Information
In the field of computer systems, considerable effort has been expended on the task of allocating memory to data objects. For the purposes of this discussion, the term object refers to a data structure represented in a computer system's memory. Other terms sometimes used for the same concept are record and structure. An object may be identified by a reference, a relatively small amount of information that can be used to access the object. A reference can be represented as a “pointer” or a “machine address,” which may require, for instance, only sixteen, thirty-two, or sixty-four bits of information, although there are other ways to represent a reference.
In some systems, which are usually known as “object oriented,” objects may have associated methods, which are routines that can be invoked by reference to the object. An object may belong to a class, which is an organizational entity that may contain method code or other information shared by all objects belonging to that class. In the discussion that follows, though, the term object will not be limited to such structures; it will additionally include structures with which methods and classes are not associated.
Modern programs often run on systems using many processors and dynamically generate objects that are stored in a part of memory referred to in the field as the “heap.” Although there are some different uses of the term, the discussion that follows will use heap to refer to shared memory managed by automatic garbage collection. The garbage collector has control of, direct access to, and/or knowledge of the addresses, classes, roots, and other such detailed information about all live objects created in the system.
After an object is no longer needed, it sometimes becomes necessary to reclaim the memory allocated to the object in order to prevent the system from running out of memory as more and more temporary objects fill the heap. Such memory reclaiming is referred to as “garbage collection,” or GC. Known GC is well described by Richard Jones and Rafael Lins in their book, “Garbage Collection: Algorithms for Automatic Dynamic Memory Management,” published by John Wiley and Sons, 1996. This book is incorporated herein by reference. A brief description of known GC systems and techniques follows.
Garbage collectors operate by reclaiming space that is no longer “reachable.” Statically allocated objects represented by a program's global variables are normally considered reachable throughout a program's life. Such objects are not ordinarily stored in the garbage collector's managed memory space, but they may contain references to dynamically allocated objects that are, and such dynamically allocated objects are considered reachable, too. Clearly, objects referred to in the execution threads' call stacks are reachable, as are the objects referred to by register contents. And an object referred to by any reachable object is also reachable.
The use of automatic garbage collectors is advantageous because, whereas a programmer working on a particular sequence of code can perform his task creditably in most respects with only local knowledge of the application at any given time, memory allocation and reclamation require a global knowledge of the program. Specifically, a programmer dealing with a given sequence of code does tend to know whether some portion of memory is still in use by that sequence of code, but it is considerably more difficult for him to know what the rest of the application is doing with that memory. By tracing references from some conservative notion of a “root set,” e.g., global variables, registers, and the call stack, automatic garbage collectors obtain global knowledge in a methodical way. By using a garbage collector, the programmer is relieved of the need to worry about the application's global state and can concentrate on local-state issues, which are more manageable.
Garbage-collection mechanisms can be implemented in a wide range of combinations of hardware and/or software. As is true of most of the garbage-collection techniques described in the literature, the present invention makes use of and is applicable to most such systems.
To distinguish the part of the program that does “useful” work from that which does the garbage collection, the term mutator is sometimes used in discussions of these effects; from the collector's point of view, what the mutator does is mutate active data structures' connectivity. Some garbage-collection approaches rely heavily on interleaving garbage-collection steps among mutator steps. In one type of garbage-collection approach, for instance, the mutator operation of writing a reference is followed immediately by garbage-collector steps used to maintain a reference count in that object's header, and code for subsequent new-object allocation includes steps for finding space occupied by objects whose reference count has fallen to zero. Obviously, such an approach can slow mutator operation significantly.
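By way of illustration only, the reference-counting interleaving described above can be sketched in Python as follows. The names `Obj`, `write_ref`, `release`, and `free_list` are hypothetical and do not come from any particular collector; the sketch assumes a single mutator thread and ignores cyclic structures, which plain reference counting cannot reclaim.

```python
class Obj:
    """Toy heap object; `refcount` stands in for a count kept in the header."""
    def __init__(self, name):
        self.name = name
        self.refcount = 0
        self.fields = {}


def release(obj, free_list):
    """Drop one reference to obj; reclaim it (and its children) at zero."""
    obj.refcount -= 1
    if obj.refcount == 0:
        free_list.append(obj)              # space is now available to the allocator
        for child in obj.fields.values():
            if child is not None:
                release(child, free_list)
        obj.fields.clear()


def write_ref(holder, field, target, free_list):
    """Mutator reference write followed immediately by collector bookkeeping."""
    old = holder.fields.get(field)
    if target is not None:
        target.refcount += 1               # count the new reference first
    holder.fields[field] = target
    if old is not None:
        release(old, free_list)            # then drop the overwritten one


free_list = []
root = Obj("root")
root.refcount = 1                          # assumed to be pinned by the root set
a, b = Obj("a"), Obj("b")
write_ref(root, "x", a, free_list)
write_ref(a, "y", b, free_list)
write_ref(root, "x", None, free_list)      # a, and therefore b, become garbage
freed_names = [o.name for o in free_list]
```

Every reference write in this sketch pays for the count updates, which is the mutator slowdown noted above.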
Other, “stop-the-world” GC approaches use somewhat less interleaving. The mutator still typically allocates space within the heap by invoking the garbage collector, for example, and the garbage collector, at some level, manages access to the heap. Basically, the mutator asks the garbage collector for a pointer to a heap region where it can safely place the object's data. The garbage collector keeps track of the fact that the thus-allocated region is occupied, and it will refrain from allocating that region in response to any other request until it determines that the mutator no longer needs the region allocated to that object. In stop-the-world collectors, the task of memory reclamation is performed during separate garbage collection cycles. In such cycles the collector interrupts the mutator process, finds unreachable objects, and reclaims their memory space for reuse. As explained later when discussing “card tables,” the GC's finding of unreachable objects is facilitated by the mutator recording where in memory changes have been made.
Garbage collectors vary as to which objects they consider reachable and unreachable. For the present discussion, though, an object will be considered “reachable” if it is referred to by a reference in a root. The root set includes, for instance, reference values stored in the mutator's threads' call stacks, the CPU registers, and global variables outside the garbage-collected heap. An object is also reachable if it is referred to by another reachable object. Objects that are not reachable can no longer affect the program, so it is safe to re-allocate the memory spaces that they occupy.
A typical approach to garbage collection is therefore to identify all reachable objects and reclaim any previously allocated memory that the reachable objects do not occupy. A typical garbage collector may identify reachable objects by tracing objects pointed to from a root, tracing objects pointed to from those reachable objects, and so on until all the referenced or pointed to objects are found and are retained. Thus the last objects found will have no pointers to other untraced objects. In this way unreachable objects are in effect discarded and their memory space becomes free for alternative use.
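The tracing just described amounts to computing a transitive closure over references. The following Python sketch is offered only as an illustration; the function name `trace_reachable` and the dictionary used to model the heap are assumptions made for the example, standing in for the reading of reference fields out of real objects.

```python
from collections import deque


def trace_reachable(roots, references):
    """Return the set of object ids reachable from `roots`.

    `references` maps an object id to the ids that object refers to.
    """
    reachable = set()
    pending = deque(roots)
    while pending:
        obj = pending.popleft()
        if obj in reachable:
            continue                       # already traced
        reachable.add(obj)
        pending.extend(references.get(obj, ()))
    return reachable


# "d" refers to a live object but nothing refers to "d"; "e" refers only to itself.
heap = {"a": ["b"], "b": ["c"], "c": [], "d": ["a"], "e": ["e"]}
live = trace_reachable(["a"], heap)
garbage = set(heap) - live
```

In this example the objects "d" and "e" are never visited, so their space can be reclaimed even though "e" still holds a reference to itself.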
However, such free space is more useful when it is compacted than when it is distributed in a fragmented way throughout the heap. Compaction increases the data's “locality of reference.” This increases cache hits and therefore cache performance. To compact free space, many garbage collectors may relocate reachable objects. In one known technique the heap is partitioned into two halves, hereafter called “semi-spaces.” Between any two garbage-collection cycles, all objects are allocated in one semi-space (“from” space), leaving the other semi-space (“to” space) free. When the garbage-collection cycle occurs, objects identified as reachable are “evacuated,” i.e., copied compactly into the “to” semi-space from the “from” semi-space, which is then considered free. Once the garbage-collection cycle has occurred, the designations “from” and “to” are interchanged for the next GC cycle. Any new objects will be allocated in the newly labeled “from” semi-space until the next GC cycle.
Although this relocation requires the extra steps of copying the reachable objects and updating references to them, it tends to be quite efficient in both time and code, since most new objects quickly become unreachable, so most of the current semi-space is actually garbage. That is, only relatively few reachable objects need to be relocated, after which the entire semi-space contains only garbage and can be pronounced free for reallocation. One limitation of this technique is that half the memory so used is unusable for storing newly created objects.
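A single-threaded evacuation of this kind may be sketched as follows. This is a minimal illustration under stated assumptions, not a description of any particular collector: the “from” semi-space is modeled as a dictionary from old addresses to lists of referenced addresses, the “to” semi-space as a Python list whose indices serve as new addresses, and the names `evacuate` and `forwarding` are hypothetical.

```python
def evacuate(from_space, roots):
    """Copy objects reachable from `roots` compactly into a fresh to-space.

    Returns (to_space, new_roots).
    """
    to_space = []        # appended to in order, so survivors end up contiguous
    forwarding = {}      # old address -> new address

    def copy(old):
        if old not in forwarding:
            forwarding[old] = len(to_space)
            to_space.append(list(from_space[old]))   # fields still hold old addresses
        return forwarding[old]

    new_roots = [copy(r) for r in roots]
    scan = 0
    while scan < len(to_space):          # the scan index chases the copy index
        fields = to_space[scan]
        for i, old in enumerate(fields):
            fields[i] = copy(old)        # update the reference to the new address
        scan += 1
    return to_space, new_roots


# Objects 100 and 200 refer to each other; 300 is unreachable garbage.
from_space = {100: [200], 200: [100], 300: [100]}
to_space, new_roots = evacuate(from_space, [100])
```

Only the two reachable objects are touched; the garbage object at address 300 costs nothing to reclaim, which is why the technique is efficient when most objects die young.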
A way of not only reducing collection-cycle length but also increasing overall efficiency is to segregate the heap into one or more parts, called generations, that are subject to different collection policies. New objects are allocated in a “young” generation, and older objects are promoted from younger generations to older or more “mature” generations. Collecting the younger generations more frequently than the others yields greater efficiency because the younger generations tend to accumulate garbage faster; newly allocated objects tend to “die,” while older objects tend to “survive.” But generational collection greatly increases what is effectively the root set for a given generation since references to objects in one generation may be found in another generation, and thus other generations must be searched to uncover such references.
Consider FIGS. 1 and 2, which depict a heap as organized into an old generation 14 and a young generation 16. With such a partition, the system may take advantage of a copy-type GC's simplicity in managing the young generation because the unused half of its memory is relatively small. But, for the “old” generation, which uses the great majority of the memory, using only half for storage may not be practical. So a different approach may be used. Among the possibilities are the mark-sweep and mark-compact techniques described in the above-referenced book by Richard Jones and Rafael Lins.
In multiprocessor systems, one approach to speeding up garbage collections is to “parallelize” the GC process by involving any idle processors in the garbage collection task. Toshio Endo et al., in their paper “A Scalable Mark-Sweep Garbage Collector on Large-Scale Shared-Memory Machines,” published in the Proceedings of High Performance Networking and Computing (SC97) in 1997, describe such a system. This approach includes copying the work to lockable auxiliary queues from which work may be stolen. A thread whose queue is empty looks for a queue with work, locks that queue, and steals half the elements from the locked queue.
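The stealing step of such a scheme may be sketched as follows. The sketch is illustrative only and is not taken from the cited paper: the names `StealableQueue` and `steal_half` are hypothetical, and rounding the stolen half upward (so that a queue holding a single element can still be stolen from) is an assumption made for the example.

```python
import threading
from collections import deque


class StealableQueue:
    """A work queue guarded by its own lock."""
    def __init__(self, items=()):
        self.lock = threading.Lock()
        self.items = deque(items)


def steal_half(thief, queues):
    """Move half of some other queue's elements into the thief's queue.

    Returns the number of elements stolen (0 if no queue had work).
    """
    for victim in queues:
        if victim is thief:
            continue
        with victim.lock:                        # lock only the victim while taking work
            count = (len(victim.items) + 1) // 2 # rounding up is an assumption
            stolen = [victim.items.pop() for _ in range(count)]
        if stolen:
            with thief.lock:                     # never hold both locks at once
                thief.items.extend(stolen)
            return len(stolen)
    return 0


idle = StealableQueue()
busy = StealableQueue(range(6))
taken = steal_half(idle, [idle, busy])
```

Releasing the victim's lock before taking the thief's lock avoids any lock-ordering deadlock between two threads that try to steal from each other.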
The present invention provides advantages that are achieved in a parallel infrastructure garbage collection system using protocols, priorities and management aspects of such parallelization in a multi-threaded, multi-processor system.
In the present invention, the mutator work threads share heap memory. At least a portion of this shared memory is partitioned into semi-spaces referred to herein as “from” and “to” spaces. Reachable objects (here defined to include any “work tasks” or “live objects,” etc.) are stored as they are created by mutator threads in “from” space, and a later garbage collection cycle will transfer the reachable objects pointed to by a root source into “to” space.
The beginning of the free area in the “to” space is indicated by a copy pointer, available to all GC threads, pointing to free space, and a scan pointer is provided for indicating copied objects that have not yet been scanned for references to additional reachables. For example, when the object is scanned for references, the GC thread will execute an atomic fetch-and-set (or an instruction capable of implementing a read-modify-write sequence atomically, e.g., a CAS) operation on the copy pointer. The copy pointer is fetched and an offset equal to a predefined local buffer size is added to the copy pointer. The thread is then free to store into the local buffer with no competition from other threads. The GC thread will employ, in a typical application, a local copy pointer and a local scan pointer for controlling the copying of reachable objects into the local buffer.
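The claiming of a local buffer from the shared copy pointer may be sketched as follows. This is an illustration under stated assumptions only: Python has no hardware fetch-and-add, so a lock stands in for the atomic read-modify-write instruction, and the names `SharedCopyPointer`, `claim`, and `LOCAL_BUFFER_SIZE`, as well as the buffer size of 64, are hypothetical.

```python
import threading

LOCAL_BUFFER_SIZE = 64      # hypothetical predefined local buffer size


class SharedCopyPointer:
    """Shared to-space copy pointer; the lock emulates an atomic fetch-and-add."""
    def __init__(self, start, limit):
        self._value = start
        self._limit = limit
        self._lock = threading.Lock()

    def claim(self, size):
        """Atomically fetch the pointer and advance it by `size`.

        Returns the base of the claimed buffer, or None if to-space is full.
        """
        with self._lock:
            base = self._value
            if base + size > self._limit:
                return None
            self._value = base + size
            return base


def gc_thread(shared, claimed):
    """Claim buffers until to-space is exhausted, recording each base address."""
    while True:
        base = shared.claim(LOCAL_BUFFER_SIZE)
        if base is None:
            break
        claimed.append(base)    # [base, base + LOCAL_BUFFER_SIZE) is now private


shared = SharedCopyPointer(0, 200 * LOCAL_BUFFER_SIZE)
results = [[] for _ in range(4)]
threads = [threading.Thread(target=gc_thread, args=(shared, r)) for r in results]
for t in threads:
    t.start()
for t in threads:
    t.join()
all_bases = sorted(b for r in results for b in r)
```

Because each claim advances the shared pointer by a whole buffer, no two threads receive overlapping regions, and a thread contends with the others once per buffer rather than once per copied object.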
Once the reachable object is copied, the local copy pointer is incremented to point to the next free space. The object is scanned for references, and the scan pointer is incremented to point to the next object that has not been scanned. Any references found by the scanning are stored in a work-to-be-completed queue associated with the thread. The thread later pops the referenced objects one at a time from the queue and processes them. If the copy pointer and the scan pointer are equal, there are no tasks to be scanned.
When a first local buffer becomes filled, the thread allocates another buffer area and preserves the heap by storing a filler, for example an integer array, into the unused portion of the first buffer. In a preferred embodiment, the filler is a “dead object” that just takes up the space remaining in the buffer, but has the structure of an object.
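The filling of the unused tail of a buffer may be sketched as follows. The sketch is illustrative only. It assumes a simplified layout in which every object begins with a header word holding a type tag and the object's size in words, and it assumes that the purpose of the filler is to let a linear walk over the buffer step across the unused tail as if it were an ordinary object; the names `retire_buffer`, `walk`, and `FILLER` are hypothetical.

```python
FILLER = "filler"       # hypothetical type tag for the dead object


def retire_buffer(buffer, top):
    """Overwrite buffer[top:] with one filler object that has a valid header."""
    remaining = len(buffer) - top
    if remaining > 0:
        buffer[top] = (FILLER, remaining)       # header: tag and size in words
        for i in range(top + 1, len(buffer)):
            buffer[i] = 0                       # payload content is irrelevant


def walk(buffer):
    """Step through the buffer object by object using the header sizes."""
    headers = []
    pos = 0
    while pos < len(buffer):
        tag, size = buffer[pos]
        headers.append((tag, size))
        pos += size
    return headers


buffer = [None] * 8
buffer[0] = ("obj", 3)      # one live three-word object at the start
buffer[1] = buffer[2] = 0
retire_buffer(buffer, 3)    # five words remain unused
layout = walk(buffer)
```

Without the filler, the walk in this model would encounter uninitialized words after the last live object and could not determine where the next object begins.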
The present invention provides for the case where a first object is being copied into a first thread's local allocation buffer, and a second object is being copied into a second thread's local allocation buffer, and both the first and the second objects point to a common third object. The first object is copied into the first local allocation buffer starting at the location pointed to by a local copy pointer that is incremented to the start of free space in the first local allocation buffer. The second object is copied into the second local allocation buffer starting at the location pointed to by the local copy pointer in the second local allocation buffer.
In some instances, the first thread scans the first copied object finding the reference to the third object, and the second thread scans the second copied object finding the reference to the third object. Both local scan pointers are incremented accordingly.
Both the first and the second threads execute a primitive instruction, preferably a compare-and-swap as discussed later, on that object. The first to execute the primitive will find that the third object has not been copied, whereupon that thread copies the third object and updates the pointer in the first copied object to point to the copied third object. When the other thread (or threads) later executes the primitive, the primitive will indicate that the third object was already copied into the “to” semi-space. However, the primitive will return the new pointer to the copied third object, and the other thread (or threads) will update the pointers in their respective copied objects with the new pointer to the already copied third object.
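The race on the common third object may be sketched as follows. This is a minimal illustration under stated assumptions rather than the implementation described herein: a per-object lock emulates the hardware compare-and-swap, each thread prepares a candidate copy before attempting the swap (one common arrangement, assumed here for brevity), and the names `HeapObject`, `cas_forwardee`, and `forward` are hypothetical.

```python
import threading


class HeapObject:
    def __init__(self, name):
        self.name = name
        self.forwardee = None              # None means "not yet copied"
        self._lock = threading.Lock()      # emulates hardware atomicity only

    def cas_forwardee(self, expected, new):
        """Compare-and-swap on the forwarding slot; returns the value found."""
        with self._lock:
            found = self.forwardee
            if found is expected:
                self.forwardee = new
            return found


def forward(obj):
    """Return the single to-space copy of obj, installing one if this thread wins."""
    found = obj.forwardee
    if found is not None:
        return found                       # already copied by some thread
    candidate = HeapObject(obj.name + "'") # this thread's prospective copy
    found = obj.cas_forwardee(None, candidate)
    # The winner sees None and keeps its copy; a loser receives the winner's copy.
    return candidate if found is None else found


third = HeapObject("third")
results = [None] * 8


def scan(i):
    results[i] = forward(third)            # pointer each thread would store


threads = [threading.Thread(target=scan, args=(i,)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

However the threads interleave, exactly one candidate is installed, and every thread ends up holding a pointer to that same copy.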