1. Field of the Invention
The present invention relates to memory management, and particularly to the aspect of memory management that has become known as "garbage collection." More particularly, the present invention relates to garbage collection in systems having multiple processors sharing memory.
2. Background Information
In the field of computer systems, considerable effort has been expended on the task of allocating memory to data objects. For the purposes of this discussion, the term object refers to a data structure represented in a computer system's memory. Other terms sometimes used for the same concept are record and structure. An object may be identified by a reference, a relatively small amount of information that can be used to access the object. A reference can be represented as a "pointer" or a "machine address," which may require, for instance, only sixteen, thirty-two, or sixty-four bits of information, although there are other ways to represent a reference.
In some systems, which are usually known as "object oriented," objects may have associated methods, which are routines that can be invoked by reference to the object. An object may belong to a class, which is an organizational entity that may contain method code or other information shared by all objects belonging to that class. In the discussion that follows, though, the term object will not be limited to such structures; it will additionally include structures with which methods and classes are not associated.
Modern programs often run on systems using many processors and dynamically generate objects that are stored in a part of memory referred to in the field as the "heap." Although there are some different uses of the term, the discussion that follows will use heap to refer to shared memory managed by automatic garbage collection. The garbage collector has control of and/or direct access and/or knowledge of the addresses, classes, roots, and other such detailed information about all live objects created in the system.
After an object is no longer needed, it sometimes becomes necessary to reclaim the memory allocated to the object in order to prevent the system from running out of memory as more and more temporary objects fill the heap. Such memory reclaiming is referred to as "garbage collection," or GC. Known GC is well described by Richard Jones and Rafael Lins in their book, "Garbage Collection: Algorithms for Automatic Dynamic Memory Management," published by John Wiley and Sons, 1996. This book is incorporated herein by reference. A brief description of known GC systems and techniques follows.
Garbage collectors operate by reclaiming space that is no longer "reachable." Statically allocated objects represented by a program's global variables are normally considered reachable throughout a program's life. Such objects are not ordinarily stored in the garbage collector's managed memory space, but they may contain references to dynamically allocated objects that are, and such dynamically allocated objects are considered reachable, too. Clearly, objects referred to in the execution threads' call stacks are reachable, as are the objects referred to by register contents. And an object referred to by any reachable object is also reachable.
The use of automatic garbage collectors is advantageous because, whereas a programmer working on a particular sequence of code can perform his task creditably in most respects with only local knowledge of the application at any given time, memory allocation and reclamation require a global knowledge of the program. Specifically, a programmer dealing with a given sequence of code does tend to know whether some portion of memory is still in use by that sequence of code, but it is considerably more difficult for him to know what the rest of the application is doing with that memory. By tracing references from some conservative notion of a "root set," e.g., global variables, registers, and the call stack, automatic garbage collectors obtain global knowledge in a methodical way. By using a garbage collector, the programmer is relieved of the need to worry about the application's global state and can concentrate on local-state issues, which are more manageable.
Garbage-collection mechanisms can be implemented in a wide range of combinations of hardware and/or software. As is true of most of the garbage-collection techniques described in the literature, the present invention makes use of and is applicable to most such systems.
To distinguish the part of the program that does "useful" work from that which does the garbage collection, the term mutator is sometimes used in discussions of these effects; from the collector's point of view, what the mutator does is mutate active data structures' connectivity. Some garbage-collection approaches rely heavily on interleaving garbage-collection steps among mutator steps. In one type of garbage-collection approach, for instance, the mutator operation of writing a reference is followed immediately by garbage-collector steps used to maintain a reference count in that object's header, and code for subsequent new-object allocation includes steps for finding space occupied by objects whose reference count has fallen to zero. Obviously, such an approach can slow mutator operation significantly.
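The interleaving described above can be illustrated with a minimal sketch. It assumes a hypothetical object model (`Obj`, `write_ref`, `release`, and `free_list` are illustrative names, not taken from any particular system), and for brevity it places dead objects on a free list as soon as their count reaches zero rather than deferring that search to allocation time.

```python
class Obj:
    """Hypothetical heap object with a reference count in its header."""
    def __init__(self):
        self.refcount = 0   # number of references currently pointing here
        self.fields = {}    # field name -> referenced Obj (or None)

def release(obj, free_list):
    """Drop one reference to obj; reclaim it if the count falls to zero."""
    obj.refcount -= 1
    if obj.refcount == 0:
        free_list.append(obj)
        # References held by a dead object are dropped as well.
        for child in obj.fields.values():
            if child is not None:
                release(child, free_list)
        obj.fields.clear()

def write_ref(holder, field, target, free_list):
    """Mutator write followed immediately by collector bookkeeping."""
    old = holder.fields.get(field)
    if target is not None:
        target.refcount += 1        # count the new reference first
    holder.fields[field] = target
    if old is not None:
        release(old, free_list)     # then drop the overwritten one
```

Every reference write pays for two count updates in this sketch, which is the kind of mutator slowdown the paragraph above refers to.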
Other, "stop-the-world" GC approaches use somewhat less interleaving. The mutator still typically allocates space within the heap by invoking the garbage collector, for example, and the garbage collector, at some level, manages access to the heap. Basically, the mutator asks the garbage collector for a pointer to a heap region where it can safely place the object's data. The garbage collector keeps track of the fact that the thus-allocated region is occupied, and it will refrain from allocating that region in response to any other request until it determines that the mutator no longer needs the region allocated to that object. In stop-the-world collectors, the task of memory reclamation is performed during separate garbage collection cycles. In such cycles the collector interrupts the mutator process, finds unreachable objects, and reclaims their memory space for reuse. As explained later when discussing "card tables," the GC's finding of unreachable objects is facilitated by the mutator recording where in memory changes have been made.
Garbage collectors vary as to which objects they consider reachable and unreachable. For the present discussion, though, an object will be considered "reachable" if it is referred to by a reference in the root set. The root set includes, for instance, reference values stored in the mutator's threads' call stacks, the CPU registers, and global variables outside the garbage-collected heap. An object is also reachable if it is referred to by another reachable object. Objects that are not reachable can no longer affect the program, so it is safe to re-allocate the memory spaces that they occupy.
A typical approach to garbage collection is therefore to identify all reachable objects and reclaim any previously allocated memory that the reachable objects do not occupy. A typical garbage collector may identify reachable objects by tracing objects pointed to from a root, tracing objects pointed to from those reachable objects, and so on until all the referenced or pointed to objects are found and are retained. Thus the last objects found will have no pointers to other untraced objects. In this way unreachable objects are in effect discarded and their memory space becomes free for alternative use.
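The tracing just described can be sketched as follows. This is a minimal illustration assuming a hypothetical object model in which each object carries a list of outgoing references and a mark flag; the names `Obj`, `trace`, and `survivors` are illustrative only.

```python
class Obj:
    """Hypothetical heap object: a name plus outgoing references."""
    def __init__(self, name):
        self.name = name
        self.refs = []       # references to other Obj instances
        self.marked = False  # set once the tracer has reached this object

def trace(roots):
    """Mark every object reachable from the root set."""
    pending = list(roots)
    while pending:
        obj = pending.pop()
        if obj.marked:
            continue             # already traced; also breaks cycles
        obj.marked = True
        pending.extend(obj.refs)

def survivors(heap):
    """Return the objects to retain; everything else is reclaimable."""
    return [obj for obj in heap if obj.marked]
```

Tracing stops when the pending list is empty, which corresponds to the point at which the last objects found hold no pointers to untraced objects.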
However, such free space is more useful when it is compacted than when it is distributed in a fragmented way throughout the heap. Compaction increases the data's "locality of reference." This increases cache hits and therefore cache performance. To compact free space, many garbage collectors may relocate reachable objects. In one known technique the heap is partitioned into two halves, hereafter called "semi-spaces." Between any two garbage-collection cycles, all objects are allocated in one semi-space ("from" space), leaving the other semi-space ("to" space) free. When the garbage-collection cycle occurs, objects identified as reachable are "evacuated," i.e., copied compactly into the "to" semi-space from the "from" semi-space, which is then considered free. Once the garbage-collection cycle has occurred, the designations "from" and "to" are interchanged for the next GC cycle. Any new objects will be allocated in the newly labeled "from" semi-space until the next GC cycle.
Although this relocation requires the extra steps of copying the reachable objects and updating references to them, it tends to be quite time and code efficient, since most new objects quickly become unreachable, so most of the current semi-space is actually garbage. That is, only relatively few reachable objects need to be relocated, after which the entire semi-space contains only garbage and can be pronounced free for reallocation. One limitation of this technique is that half the memory so used is unusable for storing newly created objects.
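The evacuation step can be sketched as follows. This is a minimal illustration, not a definitive implementation: it assumes a hypothetical object model in which each object has a `forward` slot for a forwarding pointer, it models the "to" semi-space as a Python list, and it uses a scan index in the style commonly attributed to Cheney's copying algorithm, a choice made here for brevity.

```python
class Obj:
    """Hypothetical heap object with a slot for a forwarding pointer."""
    def __init__(self, name, refs=None):
        self.name = name
        self.refs = refs if refs is not None else []
        self.forward = None  # set to the copy once the object is evacuated

def evacuate(obj, to_space):
    """Copy obj into to_space once; later calls return the same copy."""
    if obj.forward is None:
        copy = Obj(obj.name, list(obj.refs))
        obj.forward = copy           # install the forwarding pointer
        to_space.append(copy)        # copies are laid out compactly
    return obj.forward

def collect(roots, to_space):
    """Evacuate everything reachable from roots; return the updated roots."""
    new_roots = [evacuate(r, to_space) for r in roots]
    scan = 0
    while scan < len(to_space):      # copies not yet scanned for references
        obj = to_space[scan]
        obj.refs = [evacuate(r, to_space) for r in obj.refs]
        scan += 1
    return new_roots
```

Only reachable objects are ever touched, so the cost of a cycle in this sketch is proportional to the survivors rather than to the size of the "from" semi-space.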
A way of not only reducing collection-cycle length but also increasing overall efficiency is to segregate the heap into one or more parts, called generations, that are subject to different collection policies. New objects are allocated in a "young" generation, and older objects are promoted from younger generations to older or more "mature" generations. Collecting the younger generations more frequently than the others yields greater efficiency because the younger generations tend to accumulate garbage faster; newly allocated objects tend to "die," while older objects tend to "survive." But generational collection greatly increases what is effectively the root set for a given generation since references to objects in one generation may be found in another generation, and thus other generations must be searched to uncover such references.
Consider FIGS. 1 and 2, which depict a heap as organized into an old generation 14 and a young generation 16. With such a partition, the system may take advantage of a copy-type GC's simplicity in managing the young generation because the unused half of that generation's memory is relatively small. But, for the "old" generation, which uses the great majority of the memory, using only half for storage may not be practical. So a different approach may be used. Among the possibilities are the mark-sweep and mark-compact techniques described in the above-referenced book by Richard Jones and Rafael Lins.
In multiprocessor systems, one approach to speeding up garbage collections is to "parallelize" the GC process by involving any idle processors in the garbage collection task. Toshio Endo et al., in their paper, "A Scalable Mark-Sweep Garbage Collector on Large Scale Shared-Memory Machines," published in the Proceedings of High Performance Networking and Computing (SC97) in 1997, describe such a system. This approach includes copying the work to lockable auxiliary queues from which work may be stolen. A thread whose queue is empty looks for a queue with work, locks that queue, and steals half the elements from the locked queue.
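The lock-and-steal-half scheme can be sketched as follows. This is an illustration of the general idea only, not a reconstruction of the cited system; `LockableQueue` and `steal_half` are hypothetical names, and rounding the stolen half upward (so that a single remaining element can still be stolen) is an assumption.

```python
import threading
from collections import deque

class LockableQueue:
    """Hypothetical auxiliary work queue guarded by a lock."""
    def __init__(self, items=()):
        self.lock = threading.Lock()
        self.items = deque(items)

def steal_half(thief, victims):
    """Move half of the first non-empty victim queue into the thief's queue.

    Returns the number of elements stolen (0 if no work was found).
    """
    for victim in victims:
        if victim is thief:
            continue
        with victim.lock:                        # lock the victim's queue
            count = (len(victim.items) + 1) // 2  # half, rounded up (assumed)
            stolen = [victim.items.popleft() for _ in range(count)]
        if stolen:
            with thief.lock:                     # never hold both locks at once
                thief.items.extend(stolen)
            return len(stolen)
    return 0
```

Because the thief must lock the victim's queue, the owner is blocked for the duration of the transfer in this sketch.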
The inventive system parallelizes the basic sequential GC operation on collection work tasks by employing a number of execution threads.
A separate work queue, with two ends wherein entries can be pushed and popped from the first end and at least popped from the second end, is created for each thread. As a thread proceeds, it dynamically identifies referenced work tasks. Identifiers, usually pointers, are created for the referenced work tasks and pushed onto the first end of its work queue. At some later point, the thread will pop from the first end of its work queue the identifiers of the referenced work tasks, and in performing those tasks it may identify further work tasks. As before, identifiers of these further found referenced work tasks are pushed onto the first end of its work queue for later processing.
When a thread has exhausted its own tasks (and any overflow tasks as described below), the thread steals work asynchronously from other threads' work queues by popping identifiers from the second ends thereof. By using the end of the queue opposite to the one used by the queue's "owner" thread, the stealing thread minimizes interrupting or blocking of the owner's operation.
Any contention that occurs is resolved, in a preferred embodiment of the invention, by using atomic instructions, i.e., instructions that represent operations that always run to completion without interruption by another instruction. As will be described below, such instructions provide a way for the stealing thread to know whether or not it has been successful in stealing, and to obtain a value useful for updating pointers when installing forwarding pointers to collected objects.
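The two-ended work queue and the stealing discipline described above can be sketched as follows. This is a simplified illustration under stated assumptions: `WorkQueue` and `next_task` are hypothetical names, and because Python exposes no atomic compare-and-swap instruction, a per-queue lock stands in for the atomic instructions of the preferred embodiment.

```python
import threading
from collections import deque

class WorkQueue:
    """Hypothetical per-thread queue: owner uses one end, thieves the other."""
    def __init__(self):
        self._items = deque()
        self._lock = threading.Lock()  # stand-in for atomic instructions

    def push(self, task):
        """Owner pushes a task identifier onto the first end."""
        with self._lock:
            self._items.append(task)

    def pop(self):
        """Owner pops from the first end; returns None if the queue is empty."""
        with self._lock:
            return self._items.pop() if self._items else None

    def steal(self):
        """Thief pops from the second end; None signals an unsuccessful steal."""
        with self._lock:
            return self._items.popleft() if self._items else None

def next_task(own, others):
    """Take local work first; steal from another queue only when out of work."""
    task = own.pop()
    if task is not None:
        return task
    for victim in others:
        task = victim.steal()
        if task is not None:
            return task
    return None
```

In this sketch the owner and a thief collide only on the shared lock; with true atomic instructions they would contend only when both ends of the queue refer to the same remaining entry.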
In one aspect, the present invention has adapted the work stealing to accommodate fixed size queues that can overflow. The queues' contents are popped and scanned by their respective threads, and, if the scanning determines that there is not enough room available on a queue, an overflow list is created. In the case where work tasks are objects to be scanned for further referenced objects, such overflow objects are linked by class in this overflow list by replacing class pointers found in the objects with linking pointers.
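One way to read the overflow scheme is sketched below. This is an interpretation under stated assumptions rather than the claimed implementation: `Klass`, `Obj`, `push_task`, `drain_overflow`, the `overflow_head` field, and the capacity of two are all hypothetical. Each object's class-pointer field is reused as the link to the next overflowed object of the same class, and the class itself remembers the head of its chain so that the class pointer can be restored later.

```python
QUEUE_CAPACITY = 2  # assumed fixed queue size, kept tiny for illustration

class Klass:
    """Hypothetical class descriptor holding the head of its overflow chain."""
    def __init__(self, name):
        self.name = name
        self.overflow_head = None

class Obj:
    """Hypothetical object whose class-pointer field doubles as a link."""
    def __init__(self, klass):
        self.klass = klass

def push_task(queue, overflow_classes, obj):
    """Queue obj if there is room; otherwise link it into its class's chain."""
    if len(queue) < QUEUE_CAPACITY:
        queue.append(obj)
        return
    klass = obj.klass
    if klass.overflow_head is None:
        overflow_classes.append(klass)  # first overflow object of this class
    obj.klass = klass.overflow_head     # class pointer replaced by a link
    klass.overflow_head = obj

def drain_overflow(overflow_classes):
    """Unlink all overflow objects, restoring each object's class pointer."""
    recovered = []
    while overflow_classes:
        klass = overflow_classes.pop()
        obj = klass.overflow_head
        klass.overflow_head = None
        while obj is not None:
            following = obj.klass       # the field currently holds the link
            obj.klass = klass           # restore the real class pointer
            recovered.append(obj)
            obj = following
    return recovered
```

Linking by class is what makes the overwritten class pointer recoverable in this sketch: every object on a given chain is known to belong to the class that heads it, so no extra storage per object is needed.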