This invention relates to automatic reclamation of allocated but unused memory, or garbage, in a computer system that uses a generational garbage collector and, particularly, to techniques for selectively allocating objects in younger or older generations used by the garbage collector. Memory reclamation may be carried out by a special-purpose garbage collection algorithm that locates and reclaims memory that is unused, but has not been explicitly de-allocated. There are many known garbage collection algorithms, including reference counting, mark-sweep, mark-compact and generational garbage collection algorithms. These, and other garbage collection techniques, are described in detail in a book entitled “Garbage Collection: Algorithms for Automatic Dynamic Memory Management” by Richard Jones and Raphael Lins, John Wiley & Sons, 1996.
However, many of the aforementioned garbage collection techniques often lead to long and unpredictable delays because normal processing must be suspended during the garbage collection process (called “stop the world” or STW processing) and these collectors at least occasionally scan the entire heap. The garbage collection process is performed by collection threads that perform collection work when all other threads are stopped. Therefore, such collectors are generally not suitable in situations, such as real-time or interactive systems, where non-disruptive behavior is of greatest importance.
Conventional generational collection techniques alleviate these delays somewhat by concentrating collection efforts on a small memory area, called the “young” generation, in which most of the object allocation activity occurs. Since many objects allocated in the younger generation do not survive to the next collection, they do not significantly contribute to the collection delay. In addition, the more frequent collection of the younger generation reduces the need for collecting the remaining large memory area, called the “old” or “mature” generation and, thus, reduces the overall time consumed during garbage collection.
“Pre-tenuring” is a technique that increases the efficiency of generational garbage collection by identifying object allocations likely to produce objects with longer-than-average lifetimes, and allocating such objects directly in the old generation. This selective allocation fills the young generation with objects with shorter-than-average lifetimes, decreasing the survival rate of the young generation and increasing the efficiency of its collection.
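The allocation decision described above can be pictured with a minimal sketch. It is illustrative only: the class name `PreTenuringSketch`, the integer site identifiers, and the set of flagged sites are all assumptions made for this example and do not correspond to any particular collector.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model: allocation sites flagged as long-lived bypass the young generation. */
final class PreTenuringSketch {
    enum Generation { YOUNG, OLD }

    // Hypothetical set of allocation-site identifiers selected for pre-tenuring.
    private final Set<Integer> preTenuredSites = new HashSet<>();

    /** Flags a site whose objects are expected to have longer-than-average lifetimes. */
    void markForPreTenuring(int siteId) {
        preTenuredSites.add(siteId);
    }

    /** Chooses the generation in which an object from the given site is allocated. */
    Generation generationFor(int siteId) {
        return preTenuredSites.contains(siteId) ? Generation.OLD : Generation.YOUNG;
    }
}
```

In this sketch, every allocation from an unflagged site stays in the young generation, so only the flagged sites change behavior.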
A key issue in pre-tenuring is identifying the object allocations to be allocated in the old generation. One approach is offline profiling in which program training runs are conducted with selected data in order to predict the behavior of subsequent “real” program runs. This approach has the advantage of allowing relatively extensive program “instrumentation” to aid in the prediction, but requires that the user perform extra work, and that the training runs accurately predict the behavior of subsequent “real” runs.
Another approach is static analysis conducted during compilation, such as just-in-time compilation. This static analysis examines object allocation “sites,” that is, instructions that allocate new objects. For example, it has been proposed that an allocation of an object from an allocation site followed by an assignment of that object to a static variable leads to the conclusion that an object allocated from that allocation site is a good candidate for pre-tenuring. See, for example, “Understanding the Connectivity of Heap Objects”, M. Hirzel, J. Henkel, A. Diwan and M. Hind, Proceedings of the Third International Symposium on Memory Management, June 2002. Another technique combines static analysis with dynamic techniques to allocate an object in the same generation as an existing object into which a reference to the newly allocated object is assigned. See “Finding Your Cronies: Static Analysis for Dynamic Object Colocation”, S. Guyer and K. McKinley, ACM Conference on Object-Oriented Systems, Languages and Applications, 2004.
Still another approach is to perform the profiling used to make pre-tenuring decisions dynamically, on the running program. This approach requires no extra effort on the part of users, and the training program run is the real program run, but the cost of the profiling must be very small, or else it will outweigh any efficiency advantages that might be gained. Therefore, techniques using this approach generally use some form of sampling, in which the lifetimes of only a subset of allocated objects are tracked. If this subset is large enough, it will gather enough information to permit accurate pre-tenuring decisions. But the subset cannot be too large, or else the expense of tracking the sampled objects will be too high. Examples of conventional sampling techniques are disclosed in “Dynamic Adaptive Pre-Tenuring”, T. Harris, Proceedings of the Second International Symposium on Memory Management, October, 2000 and “Dynamic Object Sampling for Pre-tenuring”, M. Jump, S. M. Blackburn, and K. S. McKinley, ACM International Symposium on Memory Management, October 2004. Rather than sampling all allocations directly, both of these techniques use an event, such as the allocation of a new local allocation buffer, to identify an allocation to be sampled.
However, these conventional sampling techniques are vulnerable to “sampling bias.” In particular, the allocations of larger objects often cause a local allocation buffer to overflow and, thus, require a new local allocation buffer to be allocated. Therefore, techniques that sample objects based on their allocation from new local allocation buffers tend to sample larger objects.
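The bias can be illustrated with a small deterministic simulation. This is a sketch under stated assumptions, not a model of any real collector: the buffer size (1024 bytes), the object sizes (16 and 400 bytes), the allocation pattern, and all names are hypothetical, and an object is "sampled" only when it does not fit in the current buffer and forces a new one.

```java
/** Toy simulation of sampling the allocation that triggers a new local allocation buffer. */
final class LabSamplingBiasSketch {
    /**
     * Replays the allocation stream against a buffer of bufferSize bytes.
     * Assumes every object fits in an empty buffer.
     * Returns {smallSampled, largeSampled}.
     */
    static int[] countSamples(int[] sizes, int bufferSize, int largeThreshold) {
        int remaining = bufferSize;
        int smallSampled = 0;
        int largeSampled = 0;
        for (int size : sizes) {
            if (size > remaining) {
                // Overflow: a new buffer is allocated and this object is sampled.
                remaining = bufferSize;
                if (size >= largeThreshold) {
                    largeSampled++;
                } else {
                    smallSampled++;
                }
            }
            remaining -= size;
        }
        return new int[] {smallSampled, largeSampled};
    }

    /** Hypothetical stream: each cycle allocates nine 16-byte objects, then one 400-byte object. */
    static int[] exampleStream(int cycles) {
        int[] sizes = new int[cycles * 10];
        for (int i = 0; i < sizes.length; i++) {
            sizes[i] = (i % 10 == 9) ? 400 : 16;
        }
        return sizes;
    }
}
```

Although only one allocation in ten in this stream is large, a large object is several times more likely than a small one to be the allocation that overflows the buffer, so the large objects are over-represented among the samples.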
In order to avoid this bias, in another technique, a pre-tenuring decision is made by a two-step process. In the first step, during a young-generation collection, the number of bytes that survive collection is determined for each allocation site and a predetermined number of sites with the highest number of surviving bytes are selected as candidate sites. In the second step, during a subsequent young-generation collection, the survival rates are determined for the candidate sites and objects to be allocated from sites with a sufficiently high survival rate are allocated directly in the old generation.
With the aforementioned process, allocated bytes need to be counted only for the candidate allocation sites. Since the maximum number of candidate sites is pre-selected, the counting process is limited. However, in this technique, survival rates for the candidate sites are calculated by counting the number of bytes allocated at each candidate site between two garbage collection cycles and storing the counts in a global byte array. Then, during a collection, the number of bytes allocated that survive can be determined. Code can easily be generated to increment the allocated bytes count stored in such a global array by the size of an allocated object if a single-threaded programming language is used. However, in a multi-threaded environment, such incrementing code becomes more difficult to generate and runs slower.
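The two steps can be sketched as follows. This is an illustration only: the class and method names, the use of maps keyed by integer site identifiers, and the `minSurvivalRate` threshold parameter are assumptions introduced for this example and are not taken from the technique itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Toy model of the two-step candidate selection. */
final class TwoStepPreTenuringSketch {
    /** Step 1: pick up to maxCandidates sites with the most surviving bytes. */
    static List<Integer> selectCandidates(Map<Integer, Long> survivingBytesBySite, int maxCandidates) {
        return survivingBytesBySite.entrySet().stream()
                .sorted(Map.Entry.<Integer, Long>comparingByValue().reversed())
                .limit(maxCandidates)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    /** Step 2: keep the candidates whose survival rate (survived / allocated) meets the threshold. */
    static List<Integer> selectPreTenured(List<Integer> candidates,
                                          Map<Integer, Long> allocatedBytes,
                                          Map<Integer, Long> survivedBytes,
                                          double minSurvivalRate) {
        List<Integer> result = new ArrayList<>();
        for (int site : candidates) {
            long allocated = allocatedBytes.getOrDefault(site, 0L);
            long survived = survivedBytes.getOrDefault(site, 0L);
            if (allocated > 0 && (double) survived / allocated >= minSurvivalRate) {
                result.add(site);
            }
        }
        return result;
    }
}
```

Note that only the candidate sites need an allocated-bytes count in the second step, which is what bounds the cost of counting.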
For example, the array count can be locked during the incrementing operation or atomic instructions such as fetch-and-add or compare-and-swap can be used to store the results of the increment, but these alternatives can slow the operation of the program considerably, especially if an allocation site is popular and the use of atomic instructions or locks causes contention. Even if atomic techniques are not used, thereby allowing some increments to be lost in the case of a conflict, cache memory line contention still may have deleterious effects on performance.
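As a rough illustration of the atomic alternative, the sketch below keeps per-site byte counts in a `java.util.concurrent.atomic.AtomicLongArray` and increments them with an atomic add. The class name, the `hammer` helper, and the use of a single shared site are assumptions made for this example; generated allocation code in a real virtual machine would look quite different.

```java
import java.util.concurrent.atomic.AtomicLongArray;

/** Toy model: global per-site allocated-byte counts updated with atomic adds. */
final class AtomicSiteCountersSketch {
    private final AtomicLongArray allocatedBytes;

    AtomicSiteCountersSketch(int siteCount) {
        allocatedBytes = new AtomicLongArray(siteCount);
    }

    /** Atomically adds the object size to the count for the given site. */
    void recordAllocation(int siteId, long objectSize) {
        allocatedBytes.addAndGet(siteId, objectSize);
    }

    long allocatedBytes(int siteId) {
        return allocatedBytes.get(siteId);
    }

    /** Hypothetical stress helper: many threads increment the count for one popular site. */
    static long hammer(int threadCount, int allocationsPerThread, long objectSize) {
        AtomicSiteCountersSketch counters = new AtomicSiteCountersSketch(1);
        Thread[] threads = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < allocationsPerThread; i++) {
                    counters.recordAllocation(0, objectSize);
                }
            });
            threads[t].start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counters.allocatedBytes(0);
    }
}
```

No increments are lost in this sketch, but every thread updates the same memory location, which is the contention scenario described above for a popular allocation site.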
One way to avoid the performance penalties introduced by atomic operations is to maintain a matrix mapping pairs of global allocation site identifiers and thread IDs to allocated byte counts. However, such matrices could consume significant memory space, since the number of application threads may be large. Further, the expense of summing the per-thread matrix entries at the next collection can also be significant.
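The space and summation costs of such a matrix can be made concrete with a small sketch. It is illustrative only: the dense `long[][]` layout indexed by a per-thread index and a site identifier, the class name, and the example dimensions are all assumptions. The sketch also assumes that each row is written only by its owning thread and that the rows are summed while the application threads are stopped for a collection.

```java
/** Toy model: one row of allocated-byte counts per thread, summed at collection time. */
final class PerThreadCountMatrixSketch {
    private final long[][] counts; // indexed as [threadIndex][siteId]

    PerThreadCountMatrixSketch(int threadCount, int siteCount) {
        counts = new long[threadCount][siteCount];
    }

    /** Plain, non-atomic increment; safe only because each thread touches its own row. */
    void recordAllocation(int threadIndex, int siteId, long objectSize) {
        counts[threadIndex][siteId] += objectSize;
    }

    /** Sums one column across all threads; the work grows with the number of threads. */
    long totalForSite(int siteId) {
        long total = 0;
        for (long[] row : counts) {
            total += row[siteId];
        }
        return total;
    }

    /** Approximate payload size of the matrix in bytes, ignoring array headers. */
    static long footprintBytes(int threadCount, int siteCount) {
        return (long) threadCount * siteCount * Long.BYTES;
    }
}
```

For instance, with a hypothetical 1,000 threads and 10,000 allocation sites, `footprintBytes` gives 80,000,000 bytes of counters, and a full summation visits ten million entries.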