The vast majority of computer systems allow programs to dynamically allocate memory to data structures during execution. While dynamic allocation provides flexibility to programmers, systems that allocate memory dynamically must also identify and deallocate memory locations that are no longer in use during execution. Such techniques, generally known as garbage collection, allow for efficient use of memory and help prevent programs from running out of resources.
The efficiency of garbage collection schemes is often measured by reference to “throughput” and “pause time” metrics. Generally, “throughput” refers to the performance of a garbage collection technique. By one method of measurement, the throughput of a program is the inverse of its execution time while using a particular garbage collection scheme. By another method of measurement, throughput is related to the amount of memory that can be reclaimed per unit of time that a program is executing. In the description to follow, throughput is used in the former sense. Pause time, by contrast, is the amount of time during which the main program is prevented from executing while a garbage collector locates and reclaims memory.
Garbage collection methods are typically distinguished by how they identify memory locations that can no longer be reached during execution, and by how that choice affects throughput and pause time. For example, one collection technique called indirect collection periodically pauses execution of a main program in order to traverse memory references and identify memory locations that are no longer reachable by the program. While indirect-collection techniques usually show relatively high throughput, because they combine the reclamation of many memory locations into a single traversal, they tend to have high, and oftentimes unbounded, pause times.
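The traversal described above can be illustrated with a minimal sketch. It assumes a toy heap modeled as a dictionary that maps each object identifier to the identifiers it references; the names `find_unreachable`, `heap`, and `roots` are hypothetical and chosen only for illustration. The main program is assumed to be paused for the whole call, which is why the pause time grows with the amount of reachable data.

```python
from typing import Dict, List, Set


def find_unreachable(heap: Dict[str, List[str]], roots: List[str]) -> Set[str]:
    """Return the ids of heap objects that cannot be reached from any root."""
    marked: Set[str] = set()
    worklist = list(roots)
    while worklist:
        obj = worklist.pop()
        if obj in marked:
            continue
        marked.add(obj)
        # Follow every reference held by this object.
        worklist.extend(heap.get(obj, []))
    # Anything in the heap that was never marked is garbage.
    return set(heap) - marked


# Toy heap (an assumption for illustration): "c" and "d" reference each
# other but are not reachable from the single root "a".
heap = {"a": ["b"], "b": [], "c": ["d"], "d": ["c"]}
print(sorted(find_unreachable(heap, roots=["a"])))  # ['c', 'd']
```

A single traversal identifies every unreachable location at once, which is consistent with the relatively high throughput noted above.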
By contrast, another technique, known as reference-counting (“RC”) garbage collection, reclaims memory using a count maintained against each logically independent unit of data. For example, a count ρ(x) is maintained against a unit of data x. In this example, ρ(x) is a tally that signifies whether there are any references to x, and it changes as references to x are added and deleted. These count increments and decrements are referred to herein generally as “RC updates.” A ρ(x) value of zero means that there are no references to x, at which point it is safe to reclaim x. RC techniques are generally superior to indirect-collection techniques in the pause time metric, because garbage collection calls are usually of bounded time. However, because these techniques call garbage collection routines frequently, throughput can suffer.
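As a rough sketch of the counting scheme just described, the following toy class keeps a count ρ(x) per unit of data and reclaims x when an RC update brings the count to zero. The class `RCHeap` and its method names are hypothetical, and the sketch deliberately omits references held inside objects, so it does not show cascading reclamation.

```python
from typing import Dict, List


class RCHeap:
    """Toy model of reference counting; not a real allocator."""

    def __init__(self) -> None:
        self.rho: Dict[str, int] = {}   # rho[x] is the count kept against x
        self.reclaimed: List[str] = []  # units of data reclaimed so far

    def allocate(self, x: str) -> None:
        self.rho[x] = 0

    def add_ref(self, x: str) -> None:
        # RC update: a reference to x was created.
        self.rho[x] += 1

    def del_ref(self, x: str) -> None:
        # RC update: a reference to x was destroyed.
        self.rho[x] -= 1
        if self.rho[x] == 0:
            # No references remain, so x can safely be reclaimed.
            del self.rho[x]
            self.reclaimed.append(x)


rc_heap = RCHeap()
rc_heap.allocate("x")
rc_heap.add_ref("x")
rc_heap.add_ref("x")
rc_heap.del_ref("x")      # one reference still remains
rc_heap.del_ref("x")      # count reaches zero, x is reclaimed
print(rc_heap.reclaimed)  # ['x']
```

Each call does a small, fixed amount of work, which reflects the bounded pause times noted above, while the bookkeeping on every reference creation and destruction is the source of the throughput cost.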
Moreover, traditional RC implementations are typically based on a reachability view of memory management. That is, RC updates are applied only when references are actually created or destroyed (the latter due either to a redefinition or to a reference going out of scope), or at some point afterward. This can cause garbage objects to be held long after the references to them are last used, resulting in a program consuming more memory than it needs.
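The memory cost of holding an object past its last use can be illustrated with a small, purely hypothetical event trace. In the sketch below, object `a` is last used before object `b` is allocated, but the reference to `a` goes out of scope only at the end. The function `peak_memory`, the event format, and the sizes are assumptions made for illustration; the second trace is only a point of comparison showing what the peak would be if `a` were released at its last use.

```python
from typing import Dict, List, Tuple


def peak_memory(events: List[Tuple]) -> int:
    """Replay ("alloc", name, size) and ("release", name) events and
    return the largest total size that was live at any point."""
    live: Dict[str, int] = {}
    peak = 0
    for event in events:
        if event[0] == "alloc":
            _, name, size = event
            live[name] = size
        else:
            _, name = event
            del live[name]
        peak = max(peak, sum(live.values()))
    return peak


# Reachability view: a is released only when its reference leaves scope,
# which happens after b has already been allocated.
reachability_trace = [
    ("alloc", "a", 100), ("alloc", "b", 100),
    ("release", "a"), ("release", "b"),
]
# Comparison trace: a is released right after its last use, before b exists.
last_use_trace = [
    ("alloc", "a", 100), ("release", "a"),
    ("alloc", "b", 100), ("release", "b"),
]
print(peak_memory(reachability_trace))  # 200
print(peak_memory(last_use_trace))      # 100
```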
Thus, there remains room for improving the execution time and peak memory usage characteristics of the RC garbage collection technique.