1. Field of the Invention
This invention generally relates to automatic memory management, and more specifically, the invention relates to methods and systems for copying garbage collection.
2. Background Art
In operation, computer programs spend a lot of time stalled in cache and Translation Lookaside Buffer (TLB) misses, because computation tends to be faster than memory access. For example, Adl-Tabatabai et al. report that the SPECjbb2000 benchmark spends 45% of its time stalled in misses on an Itanium processor [Ali-Reza Adl-Tabatabai, Richard L. Hudson, Mauricio J. Serrano, and Sreenivas Subramoney. Prefetch injection based on hardware monitoring and object metadata. In Programming Language Design and Implementation (PLDI), 2004]. Better locality reduces misses, and thus improves performance. Techniques such as prefetching or cache-aware memory allocation improve locality, and can significantly speed up a program.
Locality is in part determined by the order of heap objects in memory. If two objects reside on the same cache line or page, then an access to one causes the system to fetch that cache line or page, so a subsequent access to the other object is fast. Copying garbage collection (GC) can change the order of objects in memory. To improve locality, copying GC should strive to colocate related objects on the same cache line or page.
Copying GC traverses the graph of heap objects, copies objects when it reaches them, and recycles memory of unreachable objects afterwards. Consider copying a binary tree of objects, where each cache line can hold three objects. When the traversal uses a FIFO queue, the order is breadth-first and results in the cache line layout in FIG. 1A. When the traversal uses a LIFO stack, the order is depth-first and results in the cache line layout in FIG. 1B. In both cases, most cache lines hold unconnected objects. For example, breadth-first order colocates o10 and o11 with o12, even though o12 will usually not be accessed together with o10 or o11.
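The effect of the two traversal orders on the cache line layout can be reproduced with a small simulation. The following is a minimal sketch, not the layout procedure of any particular collector. It assumes a complete binary tree of 15 objects numbered o1 to o15 (o_i has children o_2i and o_2i+1), assumes an object is copied at the moment it is taken from the worklist, and uses hypothetical helper names (`children`, `copy_order`, `cache_lines`) chosen only for illustration.

```python
from collections import deque

LINE_SIZE = 3  # objects per cache line, as in the example above


def children(i, n):
    """Children of object o_i in a complete binary tree numbered 1..n (assumed numbering)."""
    return [c for c in (2 * i, 2 * i + 1) if c <= n]


def copy_order(n, fifo):
    """Order in which the traversal copies objects o1..on.

    fifo=True  uses the worklist as a FIFO queue (breadth-first).
    fifo=False uses the worklist as a LIFO stack (depth-first).
    """
    worklist = deque([1])
    order = []
    while worklist:
        i = worklist.popleft() if fifo else worklist.pop()
        order.append(i)  # assumption: the object is copied when it is dequeued
        kids = children(i, n)
        # For the stack, push the right child first so the left child is popped first.
        worklist.extend(kids if fifo else reversed(kids))
    return order


def cache_lines(order):
    """Split a copy order into consecutive cache lines of LINE_SIZE objects."""
    return [order[k:k + LINE_SIZE] for k in range(0, len(order), LINE_SIZE)]


bfs_lines = cache_lines(copy_order(15, fifo=True))
dfs_lines = cache_lines(copy_order(15, fifo=False))
print("breadth-first:", bfs_lines)
print("depth-first:  ", dfs_lines)
```

Under these assumptions, the breadth-first run places o10, o11, and o12 on one cache line, as described above, even though o12 has a different parent than o10 and o11. The depth-first run likewise produces lines that mix unconnected objects, such as o8, o9, and o5.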
Intuitively, it is better if an object occupies the same cache line as its siblings, parents, or children. Hierarchical copy order achieves this (FIG. 1C). Moon invented a hierarchical GC in 1984, and Wilson, Lam, and Moher improved it in 1991 [Paul R. Wilson, Michael S. Lam, and Thomas G. Moher. Effective “static-graph” reorganization to improve locality in a garbage-collected system. In Programming Language Design and Implementation (PLDI), 1991], calling it “hierarchical decomposition”. The algorithms by Moon and by Wilson, Lam, and Moher use only a single GC thread. Using multiple parallel GC threads reduces GC cost, and most product GCs today are parallel.
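The intuition behind a hierarchical order can be illustrated on the same kind of tree. The following sketch is a deliberately simplified grouping for a complete binary tree with three objects per cache line; it is not the algorithm of Moon or of Wilson, Lam, and Moher, and the function name `hierarchical_lines` and the object numbering (o_i has children o_2i and o_2i+1) are assumptions made for illustration. Each group consists of a parent and its children, and the grandchildren become the roots of later groups.

```python
from collections import deque


def hierarchical_lines(n):
    """Group a complete binary tree o1..on into cache lines of a parent plus its children.

    Simplified illustration only: assumes three objects per cache line and
    the numbering in which o_i has children o_2i and o_2i+1.
    """
    lines = []
    roots = deque([1])  # roots of subtrees that still have to be copied
    while roots:
        parent = roots.popleft()
        kids = [c for c in (2 * parent, 2 * parent + 1) if c <= n]
        lines.append([parent] + kids)  # parent and children share one line
        # The grandchildren become roots of the next groups.
        for kid in kids:
            roots.extend(g for g in (2 * kid, 2 * kid + 1) if g <= n)
    return lines


hier_lines = hierarchical_lines(15)
print(hier_lines)
```

With 15 objects, every cache line in this sketch holds a parent together with both of its children, so an access to any object brings in directly connected objects rather than unrelated ones.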