Programming languages such as Java relieve the programmer of the burden of explicit memory management through the use of automatic garbage collection (GC) techniques that are applied “behind the scenes.” When a data object is created, space for the object is allocated in the heap. Unused data objects, which are no longer reachable by the running program via any path of pointer traversals, are considered “garbage.” GC automatically reclaims computer storage assigned to such objects, in order to free the storage for reuse. This makes programming in garbage-collected languages significantly easier than in C or C++, for example, in which the programmer must include an explicit “free” statement in order to reclaim memory. GC allows many run-time errors to be avoided and naturally supports modular programming.
A variety of different GC techniques are known in the art. In mark-sweep garbage collectors, garbage collection is implemented in two successive stages. In a first stage, an object graph is traversed, tracing the interrelation of objects starting from specified roots and traversing all connected objects in the heap. Objects that are reachable on this graph are considered live objects. Any other object is considered garbage and can be collected. The live objects are marked in some way so as to distinguish between live objects and garbage. In a second stage, the memory is swept, and all memory space occupied by unmarked objects (garbage) is reclaimed, so that it is free to be reallocated. During the sweep stage, the marked objects are unmarked in preparation for the next GC cycle.
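The two stages described above can be illustrated with a minimal, single-threaded sketch. The class MarkSweepSketch, its toy heap, and all method names are hypothetical and chosen only for illustration; a real collector operates on raw memory rather than on Java lists.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical toy heap; every name here is an illustrative assumption.
class MarkSweepSketch {
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>(); // outgoing pointers
        boolean marked;                            // mark bit
        Obj(String name) { this.name = name; }
    }

    final List<Obj> heap = new ArrayList<>();  // every allocated object
    final List<Obj> roots = new ArrayList<>(); // starting points of the trace

    Obj allocate(String name) {
        Obj o = new Obj(name);
        heap.add(o);
        return o;
    }

    // Stage 1: mark every object reachable from the roots.
    void mark() {
        Deque<Obj> pending = new ArrayDeque<>();
        for (Obj r : roots) {
            if (!r.marked) { r.marked = true; pending.push(r); }
        }
        while (!pending.isEmpty()) {
            Obj o = pending.pop();
            for (Obj child : o.refs) {
                if (!child.marked) { child.marked = true; pending.push(child); }
            }
        }
    }

    // Stage 2: reclaim unmarked objects; unmark survivors for the next cycle.
    List<Obj> sweep() {
        List<Obj> reclaimed = new ArrayList<>();
        List<Obj> survivors = new ArrayList<>();
        for (Obj o : heap) {
            if (o.marked) { o.marked = false; survivors.add(o); }
            else { reclaimed.add(o); }
        }
        heap.clear();
        heap.addAll(survivors);
        return reclaimed;
    }

    List<Obj> collect() { mark(); return sweep(); }
}
```

In this sketch, an object that is allocated but never linked from a root, directly or indirectly, is returned by collect() as reclaimed, while every surviving object leaves the cycle unmarked.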
In “concurrent” GC, the execution of application program threads that may update and change the object graph goes on concurrently with the marking and sweeping operations carried out by collector threads. For this reason, threads of the running program are referred to as “mutators,” since they mutate, or change, the object graph. Although the concurrent approach avoids processor inactivity during GC, the running program may change the object graph even during the very steps of tracing out reachable data objects by the collector. As a result, there is a risk that the collector may miss marking a live object, and the live object will then be reclaimed during the sweep phase of the collector. In order to avoid this possibility, synchronization between the mutator and collector threads is essential.
“On-the-fly” concurrent GC schemes use implicit synchronization between the mutator and collector threads in order to allow the threads to run concurrently without having to stop for synchronization. This type of GC was first described by Dijkstra et al., in “On-the-Fly Garbage Collection: An Exercise in Cooperation,” published in Communications of the ACM 21:11 (1978), pages 966-975, which is incorporated herein by reference. Objects are marked by assigning a different “color” attribute to each object, with “white” indicating unmarked objects, and “black” indicating objects that have been marked and traced. Objects that are marked but have not been traced are marked “gray.” At the beginning of a GC cycle, all objects are white. The root objects of each local thread and global objects are then marked gray. When the collector encounters a gray object, it knows that its direct descendants in the pointer graph may not yet have been marked (i.e., some may still be white). On the other hand, when an object is marked black, all of its direct descendants are necessarily marked as well, either gray or black. During the mark/trace phase, the collector traces the graph of live objects, and in doing so changes the color of all gray objects to black and their descendants to gray, continuing until no untraced gray objects remain. After all of the live objects have been traced, the collector then sweeps: white objects are reclaimed and appended to the list of free memory, while black objects are changed to white in preparation for the next collection cycle.
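The color transitions described above can be sketched as follows. This is a simplified, single-threaded illustration only: it omits the mutator threads and the implicit synchronization that characterize an actual on-the-fly collector, and the names TriColorSketch, Node, trace, shade and sweep are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Illustrative assumption: colors are stored as a field of each object.
class TriColorSketch {
    enum Color { WHITE, GRAY, BLACK }

    static final class Node {
        Color color = Color.WHITE;                 // all objects start white
        final List<Node> refs = new ArrayList<>(); // direct descendants
    }

    // Trace phase: roots become gray; a gray object becomes black
    // once all of its white descendants have been grayed.
    static void trace(List<Node> roots) {
        Deque<Node> gray = new ArrayDeque<>();
        for (Node r : roots) shade(r, gray);
        while (!gray.isEmpty()) {
            Node n = gray.pop();
            for (Node child : n.refs) shade(child, gray);
            n.color = Color.BLACK;
        }
    }

    // Gray a white object and remember that it still needs tracing.
    static void shade(Node n, Deque<Node> gray) {
        if (n.color == Color.WHITE) {
            n.color = Color.GRAY;
            gray.push(n);
        }
    }

    // Sweep phase: white objects are reclaimed; black objects revert to white.
    static List<Node> sweep(List<Node> heap) {
        List<Node> free = new ArrayList<>();
        for (Node n : heap) {
            if (n.color == Color.WHITE) free.add(n);
            else n.color = Color.WHITE;
        }
        heap.removeAll(free);
        return free;
    }
}
```

The sketch maintains the property stated above: when a node turns black, none of its direct descendants is white. Cycles in the object graph terminate naturally, because an object that is already gray or black is never shaded again.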
Wilson defines a write barrier in “Uniprocessor Garbage Collection Techniques,” published in the 1992 International Workshop on Memory Management (September 1992), p. 18, herein incorporated by reference, as an action taken by the mutator to trap or record a write operation into an object. There are a variety of ways to implement write barriers for on-the-fly GC. For example, Doligez and Gonthier describe a collector, in “Portable Unobtrusive Garbage Collection for Multi-Processor Systems,” published in the Conference Record of the Twenty-first Annual ACM Symposium on Principles of Programming Languages (1994), pages 70-83, which is incorporated herein by reference, in which the mutator grays objects prior to updating and raises a flag to indicate to the collector that there are objects left to trace. Only afterward does it update the reference. While GC is active, the mutator grays pointers to white objects whenever they are overwritten, i.e., when an object cell is updated, the object previously referenced is grayed. Domani et al. describe another solution in “Implementing an On-the-Fly Garbage Collector for Java,” published in the International Symposium on Memory Management (November 2000), wherein the mutator writes the address of the object to a local markstack and afterwards updates the pointer. The present invention, as described hereinbelow, is not limited to a particular type of write barrier and can operate with write barriers of different types.
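By way of illustration, a write barrier of the first kind described above, in which the previously referenced object is grayed before the pointer is updated, might be sketched as follows. This is not the code of Doligez and Gonthier or of Domani et al.; the names WriteBarrierSketch, Cell, gcActive, moreToTrace and grayQueue are assumptions, and the sketch leaves out all of the inter-thread coordination that a real collector requires.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical mutator-side barrier; single-threaded for clarity.
class WriteBarrierSketch {
    enum Color { WHITE, GRAY, BLACK }

    static final class Cell {
        Color color = Color.WHITE;
        Cell ref; // a single pointer field, for brevity
    }

    boolean gcActive;    // assumed to be set by the collector during a cycle
    boolean moreToTrace; // flag telling the collector that gray objects remain
    final Deque<Cell> grayQueue = new ArrayDeque<>();

    // Barriered store, equivalent to: holder.ref = newTarget
    void writeRef(Cell holder, Cell newTarget) {
        Cell old = holder.ref;
        if (gcActive && old != null && old.color == Color.WHITE) {
            old.color = Color.GRAY; // gray the object previously referenced
            grayQueue.add(old);
            moreToTrace = true;     // raise the flag for the collector
        }
        holder.ref = newTarget;     // only afterward is the reference updated
    }
}
```

The barrier does nothing extra while no collection is in progress, so its cost outside a GC cycle is a single flag test per pointer update.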
Many current multiprocessor systems support relaxed-consistency memory models. Relaxed-consistency models, which include weak-consistency models, allow the constraints on sequential execution of the instructions in a program to be relaxed. (By contrast, in sequential consistency models, memory accesses must always appear as if they are executed in the sequence in which they appear in the program code.) Adve and Gharachorloo describe the concepts behind weak consistency in “Shared Memory Consistency Models: A Tutorial,” published in the IEEE Computer Magazine (December 1996), pp. 66-76, which is incorporated herein by reference. The authors describe the advantages of weak consistency in a multiprocessor execution environment, in which different pieces of a single program may execute on different processing units. They describe problems inherent in weak consistency and models for overcoming the problems. In particular, they introduce the concept of a safety net to enforce sequential program execution order when necessary. A fence instruction is an example of such a safety net, as the instruction imposes program order between various memory operations.
Relaxed consistency breaks the assumptions on which the implicit GC synchronization described above relies. For instance, assume that an object, A, holds a reference to an object, O. The GC has not yet traced another object B, and has not yet traced object O. The following sequence of actions occurs:
1. Mutator M1 grays O in preparation for an update.
2. Mutator M2 writes a reference to O in object B.
3. Mutator M1 overwrites the reference to O in A.
Under sequential consistency, the garbage collector will trace O as a result of step 1, or will trace O as a result of its connection to A. However, under weak consistency the order of steps 1 and 2 may be reversed. The collector could trace B before O is connected to B, conclude that there are no more gray objects, and terminate the trace before O and its descendants are marked. The result will be that the descendants of O will not be traced, and may be left with a white color attribute at the end of the trace phase of garbage collection. In the sweep phase, since all white objects are returned to the free memory list, memory that is still in use could be reclaimed and corrupted.
Both the solution of Doligez and Gonthier and that of Domani et al., as mentioned above, could be adapted for relaxed consistency by using a fence instruction between steps 1 and 2 in the example above. Fence instructions are expensive, however, since they increase program overhead. Since updates are a frequent operation, the fence instructions also execute frequently, slowing program execution. As a result, the advantages of weak consistency in terms of speeding program execution can be largely lost.
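The cost of such an adaptation can be seen in the following sketch, which places a fence between the graying step and the subsequent store in a write barrier. It is one plausible reading of the adaptation and is not taken from either cited work. It assumes Java 9 or later, where the static method VarHandle.fullFence() is available; the class FencedBarrierSketch, the Cell type and the fencesIssued counter are hypothetical and serve only to make the per-update overhead visible.

```java
import java.lang.invoke.VarHandle;

// Hypothetical fenced write barrier; names are illustrative assumptions.
class FencedBarrierSketch {
    static final int WHITE = 0, GRAY = 1, BLACK = 2;

    static final class Cell {
        int color = WHITE;
        Cell ref; // a single pointer field, for brevity
    }

    long fencesIssued; // counts fences so that the overhead is observable

    // Barriered store, equivalent to: holder.ref = newTarget
    void writeRef(Cell holder, Cell newTarget) {
        Cell old = holder.ref;
        if (old != null && old.color == WHITE) {
            old.color = GRAY; // graying step
        }
        // Loads and stores before the fence are not reordered with
        // loads and stores after it.
        VarHandle.fullFence();
        fencesIssued++;
        holder.ref = newTarget; // the store follows the fence
    }
}
```

Every call to writeRef issues one fence, whether or not any object was grayed, which is why a program that updates pointers frequently pays the fence cost frequently.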