Within the context of computer memory management, garbage collection is the automatic reclamation of computer storage. When data objects such as arrays, records and other data structures are created, space for each object is allocated in the heap. The term "object" is used herein to denote generally any piece of memory. When an object is no longer needed, its space must be freed so that the heap does not become saturated with objects that are no longer required for the computation. Programming languages such as Pascal or C typically require the programmer to reclaim heap storage manually. The programmer must keep track of information that allows them to determine when an object can be safely discarded. This manual heap maintenance is feasible, although prone to errors.
The continuing need to avoid such errors has rendered systems and languages supporting garbage-collected heaps very attractive. Developing software in such environments is much faster because garbage collection eliminates a large class of programmer errors, both in the design and implementation stages. Furthermore, programming languages such as Java from Sun Microsystems, which is emerging as a standard Internet tool and a platform-independent implementation vehicle, provide no explicit de-allocation by the programmer, and use of these languages therefore mandates a good garbage collection algorithm.
The garbage collector's task is to locate data objects that are no longer required, and to reclaim their space in memory for use by the running program. In mark-sweep garbage collectors, garbage collection is implemented in two successive stages. In a first stage, the object graph, i.e. the interrelation of objects starting from the roots and traversing all connected objects in the heap, is traced so as to identify live objects. An object is considered live if it is reachable either directly from the roots or from some other live object. Any other object is considered garbage and can be collected. The roots include global state (e.g. global variables) and the local state of each thread (e.g. the thread's stack and its local variables on that stack). The live objects are marked in some way so as to distinguish between live objects and garbage. In a second stage, the memory is swept: all the memory space occupied by unmarked objects (garbage) is reclaimed, and the marked objects are unmarked in preparation for the next garbage collection cycle.
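The two stages can be illustrated with a minimal single-threaded sketch. The Obj class, its refs and marked fields, and the function names are hypothetical and are not taken from any particular collector; a real heap would be a region of memory rather than a Python list.

```python
class Obj:
    """A heap object; `refs` lists the objects it points to (hypothetical layout)."""

    def __init__(self, name):
        self.name = name
        self.refs = []
        self.marked = False


def mark(roots):
    """Stage 1: mark every object reachable from the roots."""
    pending = list(roots)
    while pending:
        obj = pending.pop()
        if not obj.marked:
            obj.marked = True
            pending.extend(obj.refs)


def sweep(heap):
    """Stage 2: reclaim unmarked objects and unmark the survivors."""
    survivors, reclaimed = [], []
    for obj in heap:
        if obj.marked:
            obj.marked = False       # ready for the next collection cycle
            survivors.append(obj)
        else:
            reclaimed.append(obj)    # garbage: its space can be reused
    return survivors, reclaimed


def collect(roots, heap):
    """Run one complete mark-sweep cycle."""
    mark(roots)
    return sweep(heap)
```

Objects that refer to each other in a cycle but are not reachable from any root are reclaimed, since reachability rather than reference counting decides liveness.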
In so-called "concurrent" garbage collectors, the execution of the program which updates and changes the object graph is concurrent with the marking and sweeping operations carried out by the collector. Whilst this avoids processor inactivity during garbage collection, the running program may change the object graph during the very act of tracing out reachable data objects by the collector. For this reason, the running program is referred to as the mutator since it mutates or changes the object graph. As a result, there exists the risk that the collector may miss marking a live object and the live object may then be subsequently reclaimed by the collector. In order to avoid this possibility, synchronization between the mutator and collector threads is essential.
An important consideration with regard to concurrent collectors is their degree of conservatism with respect to changes made by the mutator during garbage collection. Thus, an object may have been marked as live by the garbage collector and subsequently made unreachable by the mutator. Such an object constitutes floating garbage which is not reclaimed during the current garbage collection cycle. It will, however, be collected during the next cycle since it will be identified as garbage at the beginning of the next collection.
Floating garbage clogs up the heap unnecessarily and thus is undesirable. Whilst a certain amount of floating garbage may be tolerated and, indeed, is inevitable since no garbage collector can be completely efficient, the reverse can under no circumstances be tolerated. That is to say, reachable objects must never be treated as unreachable by the tracer, since their space would then be erroneously collected, with possibly catastrophic effects on the application program. This asymmetry inclines garbage collectors towards being naturally conservative, since it is always better not to reclaim garbage than to reclaim a live object erroneously. This conservatism impacts on the manner in which conflicts between mutator allocation and garbage collector sweep are resolved.
The question arises as to how to mark an object newly allocated by the mutator, especially during the sweep phase of garbage collection, which collects unmarked objects and resets the mark of marked objects. During the sweep phase, an object which is allocated in a location of the heap that has not yet been swept in the current sweep cycle must be allocated as marked, so that the sweep will not collect it. An object which is allocated in an area which has already been swept must be allocated as unmarked, in order that it be unmarked at the start of the next collection. This requires synchronization, be it implicit or explicit, between the sweep process and the allocation procedure, lest an object be subsequently reclaimed whilst still alive.
A sub-class of concurrent garbage collectors are the so-called "on the fly" garbage collectors first introduced by Dijkstra et al. [1]. In this type of garbage collector, reachable objects are distinguished from unreachable objects by assigning each object a color attribute. This approach has been adopted in both concurrent and "on the fly" garbage collectors, a four-color marking conventionally being used. A "white" color indicates that an object is unmarked. A "gray" color indicates that an object is marked, but that its direct descendants may not yet be marked (i.e. some may be white). A "black" color indicates that an object is marked and that all its direct descendants are marked (either gray or black). Finally, a "blue" color indicates that the object is free. Use of a fourth color to distinguish free objects avoids the need for the garbage collector to trace these objects, and thus saves time. In such a scheme, "gray" or "black" objects are also referred to as "shaded" objects. At the start of the cycle all objects are white. During tracing, the color of live objects progresses from white to gray to black. After tracing, the collector sweeps: white objects are colored blue and appended to the free list; shaded objects are changed to white in preparation for the next collection cycle.
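The progression of colors during tracing can be sketched as follows. This is a single-threaded illustration only: the Color enumeration, the trace function and the dictionary-based object graph are assumptions made for the example, and no mutator runs concurrently with it.

```python
from enum import Enum


class Color(Enum):
    WHITE = "white"   # unmarked
    GRAY = "gray"     # marked, direct descendants possibly unmarked
    BLACK = "black"   # marked, all direct descendants marked
    BLUE = "blue"     # free


def trace(roots, children, color):
    """Shade the roots gray, then blacken gray objects until none remain.

    `children` maps an object to its direct descendants; `color` maps every
    object to its current Color and is updated in place.
    """
    gray = []
    for root in roots:
        if color[root] is Color.WHITE:
            color[root] = Color.GRAY
            gray.append(root)
    while gray:
        obj = gray.pop()
        for child in children.get(obj, ()):
            if color[child] is Color.WHITE:
                color[child] = Color.GRAY
                gray.append(child)
        # Every direct descendant is now gray or black, so obj may turn black.
        color[obj] = Color.BLACK
```

Blue objects are never visited by the trace, which is the time saving the fourth color is meant to provide.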
The advantage of "on the fly" garbage collectors resides in that there is no synchronization point where the mutator threads have to stop. This obviates the need for explicit locking which might otherwise lock out the mutator and collector threads in order to force synchronization between them. However, as will be seen, this does not itself preclude implicit synchronization whereby the order of operations as performed by a thread in a multiprocessor system is significant and must be the same order perceived by other threads. That is to say, given the absence of explicit synchronization between collector and mutator threads, what is referred to as "strong" or "sequential" consistency may be required for correctness of the collection algorithm. As defined by Lamport[6], a multiprocessor system is sequentially consistent if the result of any execution is the same as if the operations of all of the processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by its program. An analogous definition holds for sequential consistency of a multi-threaded or multi-process execution.
There are two requirements for sequential consistency. First, program order must be maintained among operations from a single processor thread, and secondly a single sequential order must be maintained among all operations. For reasons of performance, modern multiprocessors do not guarantee sequential consistency; rather they provide a more relaxed form of consistency. In the absence of sequential consistency in a multiprocessor system, special steps must be taken in order to ensure that when a new object is allocated during the sweep stage of the collector, it will be marked the appropriate color. This will now be explained in greater detail with particular regard to the Doligez and Gonthier collector[4].
When a mutator allocates a new object, i.e. removes it from the free list and starts using it, it must assign the proper color to the new object. The proper color depends on the stage of the collection cycle currently being executed by the collector thread. While no garbage collection is taking place, and at the start of the collection cycle, the proper color is white. At some point during the mark/trace phase, the proper color becomes black (the point depends on the specific collection algorithm). During sweep, the proper color is black if the object is in an area of the heap that has not yet been swept, and white if the object is in an area that has already been swept. Choosing the proper color during sweep requires synchronization between the mutator thread allocating the object and the collector thread. This synchronization may be implicit and depend on the ordering of read and write operations, as in the collector described by Doligez and Gonthier[4].
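The choice of color described above can be sketched as a small function. The sketch rests on several assumptions that are not stated in this paragraph: the sweep proceeds by ascending address, a shared value swept holds the address currently being swept, swept is infinite while no sweep or trace is under way and lies below the lowest heap address during tracing, and an object allocated exactly at the sweep position is shaded gray as the safe choice. The names creation_color, Status and Color are hypothetical.

```python
import math
from enum import Enum


class Color(Enum):
    WHITE = "white"
    GRAY = "gray"
    BLACK = "black"


class Status(Enum):
    ASYNC = "Async"
    SYNC1 = "Sync1"
    SYNC2 = "Sync2"


def creation_color(status, addr, swept):
    """Pick the color of an object newly allocated at address `addr`.

    Assumed convention: addresses below `swept` have already been swept,
    addresses above it have not.
    """
    if status is not Status.ASYNC or addr < swept:
        return Color.WHITE   # start of cycle, idle, or already swept
    if addr == swept:
        return Color.GRAY    # collector is at this address: gray is always safe
    return Color.BLACK       # not yet swept: must survive the current sweep


# Idle collector (swept is infinite): new objects are white.
idle = creation_color(Status.ASYNC, 5, math.inf)
# Sweep has reached address 3: address 5 is still ahead of it.
ahead = creation_color(Status.ASYNC, 5, 3)
```

In a real multiprocessor setting the value of swept read by the mutator may be stale, which is precisely the synchronization problem discussed in this section; the sketch ignores that issue.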
The Doligez and Gonthier collector is a descendant of the Dijkstra collector and is described in pseudocode. Mutator threads perform actions, including the coloring of newly created objects, in cooperation with the collector. Exactly what actions they need to perform is determined by where the collector thread is in the collection cycle. To facilitate this cooperation, each mutator thread has a status field associated with it which takes one of three values: Sync1, Sync2, Async. The collector calls for mutators to change their status three times per collection cycle. The mutators change status in a circular fashion, progressing from Async to Sync1 to Sync2 and back to Async. When the collector reaches a certain point in its cycle, it requests that all the mutators take on the succeeding state. These requests are known as handshake actions. For example, Handshake(Async) signifies that the collector is requesting all mutators to change their state from Sync2 to Async.
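The circular progression of mutator status can be sketched as below. This is a single-threaded simplification in which every mutator responds to a handshake immediately; in the actual collector the mutators respond asynchronously and the collector waits for them. The handshake function and the statuses dictionary are hypothetical.

```python
from enum import Enum


class Status(Enum):
    ASYNC = "Async"
    SYNC1 = "Sync1"
    SYNC2 = "Sync2"


# Each status and the status that succeeds it in the cycle.
SUCCESSOR = {
    Status.ASYNC: Status.SYNC1,
    Status.SYNC1: Status.SYNC2,
    Status.SYNC2: Status.ASYNC,
}


def handshake(requested, statuses):
    """Collector side: ask every mutator to take on the status `requested`.

    `statuses` maps a mutator name to its current Status. A request is only
    valid if `requested` is the successor of each mutator's current status.
    """
    for mutator, current in statuses.items():
        if SUCCESSOR[current] is not requested:
            raise ValueError(f"{mutator} cannot go from {current} to {requested}")
    for mutator in statuses:
        statuses[mutator] = requested


def run_cycle_handshakes(statuses):
    """Issue the three handshakes of one collection cycle, in order."""
    for requested in (Status.SYNC1, Status.SYNC2, Status.ASYNC):
        handshake(requested, statuses)
```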
The Doligez and Gonthier collector calls for the mutators to execute a create protocol every time an object, x, is allocated by a mutator, m. The purpose of the protocol is to choose a color for the newly created object. It is assumed that a mutator does not respond to a handshake action, i.e., change its collection status, during the execution of the create protocol:

    if (status[m] ≠ Async or x < swept)
        color[x] = White;
    else if (x == swept)
        color[x] = Gray;
    else
        color[x] = Black;

Checking the conditions in the create protocol involves accessing a global variable, swept, which must be reloaded from memory on each access. The value of swept represents the collector's progress in sweeping the heap. While the collector is not sweeping, the global variable swept is set to some value guaranteed to be larger than the value of any address in the heap. Just before Mark/Trace, the collector resets this value to less than the lowest address in the heap. During sweeping this value is gradually incremented as the collector processes the elements in the heap. Its value represents the address of the object currently being swept.
Execution of the create protocol is important: if a newly created object is colored White at the wrong time, it will be incorrectly collected. If it is colored Black, this implies that its immediate descendants have been marked. Therefore, coloring Black at the wrong time, i.e. before the immediate descendants are marked, may result in the descendants being incorrectly collected. It is always safe to color Gray, but inefficient: if an object is Gray, neither it nor its descendants can be collected. This contradicts the prime goal of the collector, namely to free unused memory.
Sweeping in the Doligez and Gonthier collector is done by the following pseudocode:

    swept = 0;
    while (swept < end_of_heap) do
        if (color[swept] == Black or color[swept] == Gray)
            color[swept] = White;
        else if (color[swept] == White)
            color[swept] = Blue;
            append_to_free_list(swept);
        swept = swept + 1;
    swept = +infinity;

(a) in a first collection cycle, associating a first attribute with objects believed to be reachable and associating a second attribute with objects believed to be unreachable,
(b) in a successive collection cycle, associating said first attribute with objects believed to be unreachable and associating said second attribute with objects believed to be reachable, and
(c) repeating steps (a) and (b) for all successive cycles.
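The sweep loop can be made runnable along the following lines. The sketch assumes that heap addresses are the indices 0 to end_of_heap - 1 of a list of colors, and it is written as a generator so that the position of swept can be observed after each step; the Heap class and the generator form are illustrative choices and not part of the published pseudocode.

```python
import math
from enum import Enum


class Color(Enum):
    WHITE = "white"
    GRAY = "gray"
    BLACK = "black"
    BLUE = "blue"


class Heap:
    """Hypothetical heap: color[i] is the color of the object at address i."""

    def __init__(self, colors):
        self.color = list(colors)
        self.free_list = []
        self.swept = math.inf   # larger than any address: not sweeping


def sweep_steps(heap):
    """Sweep the heap one address at a time, yielding each address processed."""
    heap.swept = 0
    while heap.swept < len(heap.color):
        addr = heap.swept
        if heap.color[addr] in (Color.BLACK, Color.GRAY):
            heap.color[addr] = Color.WHITE      # survivor, reset for next cycle
        elif heap.color[addr] is Color.WHITE:
            heap.color[addr] = Color.BLUE       # garbage, now free
            heap.free_list.append(addr)
        heap.swept = addr + 1
        yield addr
    heap.swept = math.inf


def sweep(heap):
    """Run the whole sweep to completion."""
    for _ in sweep_steps(heap):
        pass
```

Stepping the generator manually shows the window in which a concurrent allocation has to decide whether its address lies ahead of or behind swept.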
Synchronization between object allocation and sweep is implicit and complex to understand. It also depends on the allocating mutator thread reading an up-to-date value of the variable swept. On multiprocessor architectures that do not guarantee sequential consistency (e.g. the PowerPC), sweep may require a synchronizing instruction (e.g. sync on PowerPC) after incrementing the variable swept, and object allocation may require a synchronizing instruction before reading the value of the variable swept. These synchronizing instructions are multi-cycle instructions and may require memory access; thus they are quite expensive.
Hudak and Keller[2] describe a collector for an esoteric distributed applicative processing system (DAPS) model. In this model there is no shared memory between processors. By way of comparison, consider a standard stack implementation of the mark phase of a conventional collector in shared memory. Each root is marked and pushed on to the stack. Nodes are then repeatedly removed from the stack in order to examine each of their descendants in the object graph. If a descendant is already marked, no further action is required; otherwise, it is also marked and pushed on to the stack. Thus, the stack serves as a place-holder for nodes that have been marked but whose descendants have not yet been examined.
Implementing a stack for DAPS would impose a very high synchronization overhead. In place of the stack, Hudak and Keller employ a marking tree of tasks. The marking tree reflects the parallel nature of distributed marking in a manner analogous to the linear stack reflecting the nature of sequential marking. Thus, just as a sequential mutator adds nodes to a stack, so their distributed collector starts a new task and adds it as a branch in the marking tree.
In order to avoid the synchronization between object allocation and sweep, Hudak and Keller further propose switching the meaning of the black and white colors on successive collection cycles. In saying this, it is to be noted that Hudak and Keller themselves acknowledge that the term "color" has a different interpretation for their distributed system than for conventional shared data structures. In particular, their definition of "color" is related to their marking tree data structure.
The sweep phase in the garbage collector disclosed by Hudak and Keller comprises three separate phases. At the end of marking, white nodes are garbage, and all tasks pointing to white nodes are irrelevant. The sweep phase first terminates irrelevant tasks, then collects all white nodes by adding them to the free-list, and then prepares the system for the next collector cycle. In practice, adding white nodes to the free list requires that they first be "bleached" since nodes on the free-list have no color in the Hudak and Keller collector. Trace is finished when there are no gray nodes left and therefore at end of trace all nodes which are reachable are black. There can also be white and bleached nodes. This, incidentally, is distinct from the Doligez and Gonthier collector mentioned above, where there can be gray nodes. Doligez and Gonthier do not invest the effort to prevent this condition since their collector works correctly on the assumption that all reachable nodes are shaded and point to other nodes which are shaded.
Thus, at the start of the sweep in the Hudak and Keller collector, there can be no gray nodes. The question which remains, therefore, is what to do with the black nodes. It is inadmissible merely to paint them white in preparation for the next mark phase, since if this were done at the same time as the sweeping process is reclaiming white nodes, live nodes would be freed with fatal consequences. Therefore, Hudak and Keller simply ignore black nodes until the sweep is complete, whereafter the mutator is instructed to reverse its sense of black and white. That is, when the sweep phase is complete, the mutator sees only black nodes. If it now interprets them as being white, then the mark phase is ready to begin.
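The effect of reversing the sense of black and white can be illustrated with a one-bit mark per node. This is a single-threaded sketch of the idea only, not of Hudak and Keller's distributed implementation; the ColorSense class, the bits dictionary and the function name are hypothetical, and freed nodes are simply removed from the dictionary to model nodes that carry no color once on the free list.

```python
class ColorSense:
    """Records which bit value currently means black."""

    def __init__(self):
        self.black_bit = 1

    def is_black(self, bit):
        return bit == self.black_bit

    def is_white(self, bit):
        return bit != self.black_bit

    def reverse(self):
        """Swap the meaning of black and white without touching any node."""
        self.black_bit ^= 1


def sweep_then_reverse(bits, sense, free_list):
    """Free every white node, leave black nodes untouched, then reverse colors.

    `bits` maps each allocated node to its mark bit.
    """
    for node in list(bits):
        if sense.is_white(bits[node]):
            del bits[node]
            free_list.append(node)
    # Only black nodes remain; after the reversal they all read as white.
    sense.reverse()
```

No surviving node is written during the sweep, so there is no moment at which a live node could be mistaken for garbage by the sweeping process.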
The implementation of this approach by Hudak and Keller is intimately bound up with the parallel processing afforded by the distributed nature of their mutators since, in effect, there exist many processing elements each acting independently. When one processing element changes its sense of color, it views all nodes in the system as being white, even though some other processing element may view the same nodes as being black. As long as they are all either white or black, the mutators behave the same. It is only after all processing elements have "reversed colors" that the next mark phase is allowed to commence.
It is further to be noted that Hudak and Keller do require locking when a program thread updates a node, in order to prevent other processors from updating the same node. In this connection, particular reference should be made to their two complementary tasks add-ref and expand-node. Add-ref selectively adds an arc to the marking tree and is used to spawn a new node in the object graph during tracing. Expand-node allows a program thread to add a new subgraph to a selected node. In both cases, a child, or descendant node, may be selected only when the mutator threads are locked against accessing the memory address of the parent node. Moreover, the color which is assigned by expand-node to a child node depends on the node's hierarchy in the object graph. Thus, the color of the parent node must first be checked. If it is Black then the child node is also set to Black, whilst otherwise it is set to White.
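The coloring rule applied by expand-node can be sketched as follows. Only the rule itself (a child is Black if its parent is Black, and White otherwise) and the requirement that the parent be locked come from the description above; the Node class, the per-node lock and the function signature are assumptions made for illustration and do not reproduce Hudak and Keller's task-based formulation.

```python
import threading

BLACK, GRAY, WHITE = "black", "gray", "white"


class Node:
    """Hypothetical graph node with its own lock."""

    def __init__(self, color):
        self.color = color
        self.children = []
        self.lock = threading.Lock()   # guards the node against concurrent update


def expand_node(parent, count):
    """Attach `count` new children to `parent`, colored according to the parent."""
    with parent.lock:
        # The parent's color must be read while the parent is locked.
        child_color = BLACK if parent.color == BLACK else WHITE
        new_children = [Node(child_color) for _ in range(count)]
        parent.children.extend(new_children)
    return new_children
```

Both costs noted in the text are visible here: every allocation acquires a lock and reads the parent's color before the child can be colored.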
The need always to check the color of the parent before assigning a color to a newly allocated object, coupled with the need for explicit locking, constitutes a major overhead which degrades the performance of the garbage collector.
It is thus apparent that the color reversal proposed for the Hudak and Keller collector is very specific to their DAPS model and is by no means immediately applicable to other garbage collectors. This is borne out by the fact that Hudak and Keller published their marking-tree collector in 1982 and since that time no attempt has been made to apply their techniques to other concurrent garbage collectors.
Finally, mention is made of Lamport[5], who also describes a mechanism for changing the meaning of colors for a concurrent and on-the-fly collector. He proposes his mechanism in order to pipeline the collection algorithm, so that the trace of a new collection cycle can work in parallel with the sweep of the previous collection cycle. His algorithm does not have the race between allocate and sweep because he bases his algorithm on Dijkstra's original three-color scheme.