In computer science, garbage collection (GC) is a form of automatic memory management. The garbage collector, or just collector, attempts to reclaim garbage, or memory used by objects that will never be accessed or mutated again by the application or computer program. Garbage collection was invented by John McCarthy around 1959 to solve the problems of manual memory management in Lisp.
Garbage collection is often portrayed as the opposite of manual memory management, which requires the programmer to specify which objects to deallocate and return to the memory system. However, many systems use a combination of the two approaches.
A typical tracing garbage collector maintains a set, U, of all memory objects known to the collector. During a collection cycle, the collector's task is to categorize every object as either mutable or immutable; these terms correspond to what is more commonly called reachable and unreachable. Mutable objects are objects that the mutator, i.e., the computer program, is able to read from or write to because the mutator has retained a reference to the object somewhere in memory. Objects to which the mutator retains no reference, either directly or through other objects, are considered immutable. The programmer provides the collector with a set of objects that are prejudged to be mutable. The collector uses this set, the root set, as the roots of a graph composed of objects, i.e., the vertices, and references, i.e., the edges. As the collector traverses this graph, each object it visits is added to the mutable set M. When the graph has been completely traversed, all of the objects in U that do not also belong to M, i.e., U-M, are considered immutable and safe to collect.
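As a minimal sketch of the traversal described above, the following Python function computes U-M for a small object graph. The function name collect_unreachable, the dictionary used to hold references, and the sample objects are assumptions made for illustration only; they are not taken from any particular collector.

```python
from collections import deque

def collect_unreachable(universe, roots, references):
    """Return U - M: the objects in `universe` not reachable from `roots`.

    `references` maps each object to the objects it refers to (hypothetical layout).
    """
    reachable = set()                      # the set M of the text
    pending = deque(r for r in roots if r in universe)
    while pending:
        obj = pending.popleft()
        if obj in reachable:
            continue                       # already visited via another edge
        reachable.add(obj)
        for target in references.get(obj, ()):
            if target not in reachable:
                pending.append(target)
    return set(universe) - reachable

# "d" and "e" refer to each other but cannot be reached from the root "a".
U = {"a", "b", "c", "d", "e"}
refs = {"a": ["b"], "b": ["c"], "d": ["e"], "e": ["d"]}
garbage = collect_unreachable(U, roots={"a"}, references=refs)
```

In this example the cycle formed by "d" and "e" is still collected, because membership in M depends only on reachability from the root set, not on whether an object is referenced at all.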
Many applications, such as those with real-time constraints and those that maintain large heaps, cannot afford to stop processing long enough for the collector to compute all immutable objects in memory. Incremental garbage collection addresses this issue by splitting the work of a single collection cycle into small parts, interrupting the application frequently for short periods of time instead of relatively infrequently for potentially long periods of time.
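The idea of splitting one cycle into small parts can be sketched as a marker that performs a bounded amount of work per call. The class name IncrementalMarker, the budget parameter, and the sample graph are hypothetical. The sketch also assumes that the object graph does not change between slices; a practical incremental collector needs additional bookkeeping to tolerate mutation.

```python
class IncrementalMarker:
    """Marks reachable objects a few at a time (illustrative sketch only)."""

    def __init__(self, roots, references):
        self.references = references   # object -> list of referenced objects
        self.marked = set()
        self.worklist = list(roots)    # objects discovered but not yet scanned

    def step(self, budget):
        """Mark at most `budget` objects; return True once marking is complete."""
        while self.worklist and budget > 0:
            obj = self.worklist.pop()
            if obj in self.marked:
                continue
            self.marked.add(obj)
            budget -= 1
            self.worklist.extend(
                t for t in self.references.get(obj, ()) if t not in self.marked
            )
        return not self.worklist

refs = {"a": ["b", "c"], "b": ["d"], "c": [], "d": []}
marker = IncrementalMarker(roots=["a"], references=refs)
slices = 0
done = False
while not done:
    done = marker.step(budget=2)   # the application would run between slices
    slices += 1
```

Each call to step represents one short interruption; the application would resume between calls rather than waiting for the whole graph to be traversed.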
Previous attempts to parallelize garbage collection incur a bottleneck, commonly referred to as “stop the world,” meaning that all threads in the application must be stopped for a portion of the collection cycle. Long pauses are anathema to parallelism, as shown in FIG. 4. Parallelizing incremental collection is even more challenging, because it must be accomplished without defeating the benefits of parallelization completely.
Tri-Color Marking
Most modern tracing garbage collectors implement some variant of the tri-color marking abstraction, but simple collectors, such as the mark-and-sweep collector, often do not make this abstraction explicit.
FIG. 1 (prior art) is a schematic view of a “tri-color” garbage collector algorithm 80. Tri-color marking works as follows:
1. Create initial white W, grey G, and black B sets; these sets will be used to maintain progress during the cycle. Initially the white W set, or condemned set, is the set of objects that are candidates for having their memory recycled. The black B set is the set of objects that can be proven to have no references to objects in the white W set; the diagram in FIG. 1 (prior art) shows an implementation that starts each collection cycle with an empty black B set. The grey G set is all the remaining objects, which may or may not have references to objects in the white W set and elsewhere. These sets partition memory; every object in the system, including the root set, is in precisely one set.
2. Mark the root set grey. This step is important since both the black and the grey sets start off empty.
3. Pick an object from the grey G set. Blacken this object, i.e., move it to the black B set, by greying all the white W objects it references directly.
4. Repeat the previous step until the grey G set is empty.
5. When there are no more objects in the grey G set, then all the objects remaining in the white W set are safe to consider unreachable and the storage occupied by them can be reclaimed safely.
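Steps 1 through 5 above can be sketched in Python as follows. The function name tri_color_collect, the use of Python sets for W, G, and B, and the sample graph are assumptions chosen for illustration; the sketch follows the variant in which the black set starts empty.

```python
def tri_color_collect(objects, roots, references):
    """Run one tri-color marking cycle; return (white, black) when grey is empty."""
    # Step 1: every object starts in the white (condemned) set.
    white = set(objects)
    grey = set()
    black = set()

    # Step 2: mark the root set grey.
    for root in roots:
        if root in white:
            white.remove(root)
            grey.add(root)

    # Steps 3 and 4: blacken grey objects until none remain.
    while grey:
        obj = grey.pop()
        for target in references.get(obj, ()):
            if target in white:        # grey the white objects referenced directly
                white.remove(target)
                grey.add(target)
        black.add(obj)

    # Step 5: whatever is still white is unreachable and may be reclaimed.
    return white, black

objects = {"r", "x", "y", "z"}
refs = {"r": ["x"], "x": ["r"], "y": ["z"], "z": ["y"]}
white, black = tri_color_collect(objects, roots=["r"], references=refs)
```

At every point in the loop each object belongs to exactly one of the three sets, which mirrors the partitioning requirement stated in step 1.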
The tri-color marking algorithm preserves an important invariant: “No black B object points directly to a white W object.” This ensures that the white W objects can be safely destroyed once the grey G set is empty.
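The invariant can be expressed as a simple check over the sets. The helper invariant_violations below is a hypothetical diagnostic written for illustration, not part of the algorithm itself; it lists every reference that leads directly from a black object to a white object.

```python
def invariant_violations(black, white, references):
    """Return sorted (source, target) pairs where a black object points to a white one."""
    return sorted(
        (src, dst)
        for src in black
        for dst in references.get(src, ())
        if dst in white
    )

# Mid-cycle state: "a" is black, "b" is grey, "c" is white.
black, grey, white = {"a"}, {"b"}, {"c"}
refs = {"a": ["b"], "b": ["c"], "c": []}
before = invariant_violations(black, white, refs)

# If the mutator stores a reference from "a" to "c" and nothing re-colors
# either object, the invariant no longer holds.
refs["a"].append("c")
after = invariant_violations(black, white, refs)
```

In the second state, "c" could be reclaimed while "a" still refers to it if the reference from "b" were later removed, which is why a collector that runs alongside the mutator must restore the invariant whenever references change.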
The tri-color method has an important advantage: it can be performed ‘on-the-fly’, without halting the system for significant time periods. This is accomplished by marking objects as they are allocated and as they are mutated, so that the various sets are maintained continuously. By monitoring the size of the sets, the system can perform garbage collection periodically, rather than as needed. Also, the need to touch the entire working set each cycle is avoided.
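One way to maintain the sets during allocation and mutation is to color objects at the moment a reference is stored. The sketch below uses an insertion-style write barrier that greys a white object whenever a reference to it is written; the text does not say which barrier is intended, so this choice, the class name TriColorHeap, and the decision to allocate new objects grey are all assumptions made for illustration.

```python
class TriColorHeap:
    """Toy heap whose sets are updated on allocation and on every write."""

    def __init__(self):
        self.white, self.grey, self.black = set(), set(), set()
        self.references = {}           # object -> list of referenced objects

    def allocate(self, obj):
        # Assumption: objects created mid-cycle start grey so they get scanned.
        self.references[obj] = []
        self.grey.add(obj)

    def write(self, src, dst):
        """Store a reference src -> dst, greying dst if it is still white."""
        self.references[src].append(dst)
        if dst in self.white:
            self.white.remove(dst)
            self.grey.add(dst)

    def mark_step(self):
        """Blacken one grey object; return False when no grey objects remain."""
        if not self.grey:
            return False
        obj = self.grey.pop()
        for target in self.references[obj]:
            if target in self.white:
                self.white.remove(target)
                self.grey.add(target)
        self.black.add(obj)
        return True

heap = TriColorHeap()
heap.black.add("a")                    # already scanned this cycle
heap.white.add("c")                    # not yet reached
heap.references.update({"a": [], "c": []})

heap.write("a", "c")                   # the barrier greys "c"
shaded = "c" in heap.grey
heap.allocate("n")                     # a new object enters the grey set
while heap.mark_step():
    pass
```

Because the barrier runs as part of the mutator's own write, the collector can interleave calls to mark_step with application work without a black object ever pointing directly to a white one.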
FIG. 2 (prior art) is a schematic view of the traditional actor model 90. The actor model has been described with respect to parallel programming in computing. In a network of active objects, all processes run concurrently and communicate through messaging. As an example, the internet is a model network in which each computer is an actor and all actors interact together in essentially real-time. Each actor has both a state and a thread of execution. The degree of parallelism is related to the degree of time-sharing, and not all messages can receive an immediate response. It will be understood, therefore, that with synchronous communication a thread must be stopped while it awaits a response, whereas with asynchronous processing there is no need to wait.
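The difference between asynchronous and synchronous messaging can be sketched with a single actor that owns its state and its own thread. The class CounterActor, its message format, and the method names send_add and ask_total are hypothetical and chosen only for illustration, using Python's standard queue and threading modules.

```python
import queue
import threading

_STOP = object()   # sentinel message that ends the actor's loop

class CounterActor:
    """An actor with private state, a mailbox, and its own thread of execution."""

    def __init__(self):
        self.mailbox = queue.Queue()
        self._total = 0                # touched only by the actor's thread
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send_add(self, amount):
        """Asynchronous send: enqueue the message and return immediately."""
        self.mailbox.put(("add", amount, None))

    def ask_total(self):
        """Synchronous request: the caller's thread blocks until the reply arrives."""
        reply = queue.Queue(maxsize=1)
        self.mailbox.put(("total", None, reply))
        return reply.get()

    def stop(self):
        self.mailbox.put(_STOP)
        self._thread.join()

    def _run(self):
        while True:
            message = self.mailbox.get()
            if message is _STOP:
                return
            kind, payload, reply = message
            if kind == "add":
                self._total += payload
            elif kind == "total":
                reply.put(self._total)

actor = CounterActor()
for n in (1, 2, 3):
    actor.send_add(n)                  # the sender never waits here
total = actor.ask_total()              # the sender waits here for a response
actor.stop()
```

The mailbox is processed in arrival order, so the request for the total is answered only after the three earlier messages have been handled; the calling thread is stopped only for the synchronous ask_total call.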