Automatic memory management, or garbage collection, is a mature field that has been studied for about fifty years. An extensive survey of garbage collection is provided by the book ‘Garbage Collection: Algorithms for Dynamic Memory Management’ by R. Jones and R. Lins (Wiley, 1996). This book is basic reading for anyone skilled in the art of garbage collection. Since the publication of this book, the field has continued to see active development due to the significant commercial interest in Java and other similar virtual machine based programming environments.
Another reference that generally should be reviewed when considering whether something is new in garbage collection is Bishop: Computer Systems with a Very Large Address Space and Garbage Collection, MIT/LCS/TR-178, MIT, 1977; NTIS ADA040601. While an old reference, it laid the groundwork for modern garbage collectors that operate on large memories incrementally by dividing the memory into regions (called areas by Bishop), and many patents have been granted in recent years where it may serve as invalidating prior art. It touches on concepts such as regions (areas), remembered sets (inter-area links), generations (cables), concurrency (multiple simultaneous collections), multiple-area cycles, read barriers (load operation), write barriers (store operation), and selecting which regions to collect next.
Much of the recent work in garbage collection has been driven by the need to make Java scale to server environments where applications have working sets of hundreds of megabytes or even several gigabytes, utilize multiple threads, and cannot tolerate pause times of more than some tens of milliseconds.
The Garbage-First collector, as described in Detlefs et al: Garbage-First Garbage Collection, ISMM'04, pp. 37-48, ACM, 2004, which is hereby incorporated herein by reference, can be considered representative of modern garbage collectors. It divides the heap into regions that can be collected independently, maintains remembered sets to know which objects in a region are referenced from outside the region, uses card marking to coarsely keep track of which memory locations have been written into between evacuation pauses, uses a parallel copying collector to copy and compact live objects in regions, uses metrics to decide which regions to collect next and uses global snapshot-at-the-beginning tracing running mostly concurrently with mutators to identify garbage data structures spanning multiple regions. Tracing takes place on the individual object level. Garbage collection is performed during evacuation pauses, which are short (typically less than 50 ms) pauses when mutator activity is stopped in order to perform garbage collection on one or more memory regions, typically also including a young object area.
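The card-marking idea mentioned above can be illustrated with a minimal sketch. It is not the Garbage-First implementation; the class name CardTable, its methods, and the card size are assumptions chosen only for illustration. The write barrier marks the card containing the written address as dirty, and at the next pause only the dirty cards need to be scanned for references.

```python
CARD_SIZE = 512  # bytes covered by one card; illustrative value only


class CardTable:
    """Hypothetical card table: one dirty flag per CARD_SIZE bytes of heap."""

    def __init__(self, heap_size):
        card_count = (heap_size + CARD_SIZE - 1) // CARD_SIZE
        self.dirty = bytearray(card_count)

    def write_barrier(self, address):
        # Called on every reference store; records only the card, not the slot.
        self.dirty[address // CARD_SIZE] = 1

    def drain_dirty_cards(self):
        # Return the address ranges to rescan and clear their dirty flags.
        cards = [i for i, flag in enumerate(self.dirty) if flag]
        for i in cards:
            self.dirty[i] = 0
        return [(i * CARD_SIZE, (i + 1) * CARD_SIZE) for i in cards]


table = CardTable(heap_size=4096)
for written_address in (10, 20, 1030):
    table.write_barrier(written_address)
ranges_to_scan = table.drain_dirty_cards()
```

The tracking is coarse by design: two writes into the same card cost a single flag, at the price of rescanning the whole card later.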
The work of Siegwart and Hirzel: Improving Locality with Parallel Hierarchical Copying GC, ISMM'06, pp. 52-63, ACM, 2006, which is hereby included herein by reference, is representative of work on clustering objects during garbage collection to improve memory access locality. Their paper shows how to reduce cache and TLB misses by changing the order in which a parallel garbage collector copies heap objects. They also discuss various copy orders, such as breadth first, depth first, and hierarchical copy order.
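To make the difference between copy orders concrete, the following sketch computes the order in which a copying collector would visit objects under breadth-first and depth-first traversal. The object graph, the function names, and the dictionary representation are hypothetical; hierarchical copy order, which blends the two, is not shown.

```python
from collections import deque


def breadth_first_order(graph, root):
    """Visit objects level by level, as a Cheney-style queue scan would."""
    order, seen, queue = [], {root}, deque([root])
    while queue:
        obj = queue.popleft()
        order.append(obj)
        for child in graph.get(obj, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return order


def depth_first_order(graph, root):
    """Visit each object's first child before its siblings, using a stack."""
    order, seen, stack = [], set(), [root]
    while stack:
        obj = stack.pop()
        if obj in seen:
            continue
        seen.add(obj)
        order.append(obj)
        # Push children reversed so the first child is popped first.
        stack.extend(reversed(graph.get(obj, [])))
    return order


# Hypothetical object graph: each key references the objects in its list.
graph = {"a": ["b", "c"], "b": ["d", "e"], "c": ["f"]}
bfs = breadth_first_order(graph, "a")
dfs = depth_first_order(graph, "a")
```

In this example breadth-first order places "b" next to its sibling "c", while depth-first order places "b" next to its children "d" and "e", which is the kind of placement difference that affects cache and TLB behavior.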
Systems where memory is divided into multiple independently collected regions need to be able to update references to objects to point to their new locations when objects are moved. Some systems use an indirection data structure that is updated when an object is moved. Most modern systems use a data structure called a remembered set, which lists for each independently collectable region the set of objects in it that are referenced from outside the region and identifies for each such object the memory locations outside that region that contain those references, so that the referring memory locations can be updated when the object is moved. In many systems only some references are maintained: in Bishop, a data structure called a cable is used to limit references; in generational garbage collectors, references from younger to older generations are not generally tracked; and in the train collector, references to higher numbered cars of a train are not tracked. In most collectors, references from the young object area (nursery) to older objects are not maintained in the remembered sets. Many systems do not track references to popular objects.
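A remembered set of the kind described above can be sketched as follows. This is a simplified model under stated assumptions, not any particular collector's design: memory is a dictionary from slot address to referenced address, regions are fixed-size address ranges, and the names RegionHeap, region_of, write, and move are hypothetical. References within a region are deliberately not recorded, since they would be found when that region itself is traced.

```python
from collections import defaultdict

REGION_SIZE = 1024  # illustrative region size in bytes


def region_of(address):
    return address // REGION_SIZE


class RegionHeap:
    def __init__(self):
        self.memory = {}  # slot address -> address of the referenced object
        # region -> referenced object address -> slots outside that region
        self.remsets = defaultdict(lambda: defaultdict(set))

    def write(self, slot, target):
        """Store a reference; the write barrier keeps remembered sets current."""
        old = self.memory.get(slot)
        if old is not None:
            self.remsets[region_of(old)][old].discard(slot)
        self.memory[slot] = target
        if region_of(slot) != region_of(target):
            self.remsets[region_of(target)][target].add(slot)

    def move(self, old_address, new_address):
        """Relocate an object and patch every recorded cross-region referrer."""
        slots = self.remsets[region_of(old_address)].pop(old_address, set())
        for slot in slots:
            self.memory[slot] = new_address
            if region_of(slot) != region_of(new_address):
                self.remsets[region_of(new_address)][new_address].add(slot)


heap = RegionHeap()
heap.write(100, 2000)   # slot in region 0 refers to an object in region 1
heap.write(3000, 2000)  # slot in region 2 refers to the same object
heap.write(2040, 2000)  # same-region reference: not recorded
heap.move(2000, 5000)   # object is evacuated to region 4
```

After the move, both external slots point at the new address without the collector having scanned regions 0 or 2, which is the purpose of the data structure.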
Existing systems generally only use the remembered set data structures for tracking references that cross region boundaries (including node boundaries in distributed systems). This is mandated by the fact that typical objects are small (e.g., a list node is usually 8-24 bytes) in comparison to the remembered set data structures, and the overhead of maintaining such data structures on a per-object basis would be prohibitive.
Detection and collection of garbage cycles spanning multiple regions is an important problem in garbage collection. Most known systems detect such garbage by tracing the entire heap object-by-object, with suitable bookkeeping (including special code in the write barrier) to implement snapshot-at-the-beginning or incremental-update tracing. The train algorithm of Hudson & Moss and the method of Bishop, on the other hand, detect such cycles by eventually moving all objects belonging to the same cycle to a single area (Bishop) or train (Hudson & Moss), after which the cycle can be detected as garbage. In distributed systems, garbage cycles spanning multiple nodes are reclaimed either using a centralized server or by using a protocol that transmits either reference lists or timestamps between servers. The reference lists may be compressed, and may only include references between externally referenced objects. Locally within nodes, such systems operate on a per-object level and perform object-level tracing to discover which external objects are reachable from which externally referenced objects.
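The write-barrier bookkeeping used for snapshot-at-the-beginning tracing can be sketched as below. This is a single-threaded toy model under stated assumptions: the class SatbMarker, its method names, and the heap representation (object name mapped to a list of reference fields) are hypothetical, and handling of objects allocated during marking is omitted. While marking is in progress, the barrier logs the reference that is about to be overwritten, so that every object reachable when marking started is still marked.

```python
class SatbMarker:
    def __init__(self, heap, roots):
        self.heap = heap            # object name -> list of reference fields
        self.marked = set()
        self.worklist = list(roots)
        self.marking = True

    def write_field(self, obj, index, new_target):
        """Mutator store with a snapshot-at-the-beginning write barrier."""
        if self.marking:
            old = self.heap[obj][index]
            if old is not None:
                self.worklist.append(old)  # keep the overwritten target alive
        self.heap[obj][index] = new_target

    def step(self):
        """Mark one unmarked object; return False once marking is finished."""
        while self.worklist:
            obj = self.worklist.pop()
            if obj in self.marked:
                continue
            self.marked.add(obj)
            self.worklist.extend(t for t in self.heap[obj] if t is not None)
            return True
        self.marking = False
        return False


# "d" is unreachable garbage; "c" is reachable only through "b" at the start.
heap = {"a": ["b"], "b": ["c"], "c": [None], "d": [None]}
marker = SatbMarker(heap, roots=["a"])
marker.step()                      # marks "a"
marker.write_field("b", 0, None)   # mutator cuts b -> c before "b" is scanned
while marker.step():
    pass
```

Here "c" is retained even though the mutator removed the only reference to it during marking, because it was reachable in the snapshot; it would be reclaimed by a later cycle.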
The state of the art in local garbage collection could be summarized as follows: tracing is performed on a per-object basis and in parallel by multiple threads, soft real-time performance has been achieved with reasonably large memories (up to several gigabytes), snapshot-at-the-beginning concurrent tracing algorithms (or various other forms of tracing that run concurrently with mutator execution) allow detecting garbage cycles spanning multiple regions, and various metrics are used for prioritizing regions for collection.