1. Field of the Invention
The present invention relates to automatic memory management and, more particularly, to techniques for adapting tenuring and/or promotion policies in a garbage collector based on run-time sampling of object lifetimes.
2. Description of the Related Art
In general, the efficacy of certain automatic dynamic memory management systems (e.g., garbage collectors) can be significantly affected by lifetimes of software objects. In some cases, such runtime characteristics may be empirically predicted and computational strategies may be tailored in accordance with such predictions. In other cases, simulations or profile-driven (i.e., off-line) techniques may be employed to better tailor computational strategies.
Generational collection is an automatic dynamic memory management technique that aims to improve the performance of a garbage collected heap. Objects within the heap are divided into two or more generations according to the elapsed time since their allocation. As is customary, references to time, to age, and to objects being older or younger should be understood in terms of an allocation clock (measuring the total volume of objects allocated since the program started) rather than a wall clock (measuring elapsed real time). In the simplest case, there are only two generations, which can be termed young and old. The young generation is subject to more frequent garbage collection than the old generation. Objects are allocated into the young generation and are subsequently tenured into the old generation if they survive beyond some age threshold.
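The two-generation scheme can be illustrated with a small Python model in which age is measured on an allocation clock and survivors older than a tenuring threshold are moved to the old generation. This is a minimal sketch under stated assumptions, not a real collector: reachability is represented by a hypothetical live flag instead of being traced, copying is modeled by moving list entries, and the names Obj, TwoGenerationHeap, tenure_age and collect_young are illustrative only.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)
class Obj:
    size: int
    birth: int         # allocation-clock value at which the object was allocated
    live: bool = True  # hypothetical stand-in for reachability, which a real collector would trace


@dataclass
class TwoGenerationHeap:
    tenure_age: int    # survivors older than this (in bytes allocated) are tenured
    clock: int = 0     # allocation clock: total bytes allocated so far
    young: List[Obj] = field(default_factory=list)
    old: List[Obj] = field(default_factory=list)

    def allocate(self, size: int) -> Obj:
        obj = Obj(size=size, birth=self.clock)
        self.clock += size      # time advances with allocation volume, not wall-clock time
        self.young.append(obj)  # every object starts in the young generation
        return obj

    def age(self, obj: Obj) -> int:
        return self.clock - obj.birth

    def collect_young(self) -> None:
        """Minor collection: drop dead young objects and tenure old-enough survivors."""
        survivors = []
        for obj in self.young:
            if not obj.live:
                continue               # dead objects are never copied
            if self.age(obj) > self.tenure_age:
                self.old.append(obj)   # tenured (a real collector would copy it to the old space)
            else:
                survivors.append(obj)  # stays young (a copying collector would copy it here too)
        self.young = survivors


# Usage: a tenuring age of 100 bytes of allocation.
heap = TwoGenerationHeap(tenure_age=100)
a = heap.allocate(64)  # born at clock 0
b = heap.allocate(64)  # born at clock 64
c = heap.allocate(32)  # born at clock 128
c.live = False         # c dies before the collection
heap.collect_young()   # clock is 160: a (age 160) is tenured, b (age 96) stays young
```

Because age is measured in bytes allocated, a program that allocates little between collections ages its objects slowly, regardless of how much real time passes.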
This scheme is generally effective because many objects are short-lived, so it is worthwhile to concentrate collection effort on the young objects (which are likely to die) rather than on the older objects (which are likely to continue to survive). The fact that many objects die young also makes it desirable to use a copying collector for the young generation: the cost of a copying collection is proportional to the volume of surviving objects rather than to the size of the space being collected, so only the few objects that survive need to be copied. In a typical implementation, each generation is held in a separate part of the heap. This approach allows the generations to be managed by different garbage collectors and according to different allocation policies. The physical separation means that an object is copied when it is tenured. Consequently, long-lived objects are often copied several times before they come to rest in the old generation. Pretenured allocation, i.e., allocating an object directly into an old (or older) generation, can avoid some of this copying.
Previous work has investigated feedback-based techniques for segregating long-lived and short-lived objects. Barrett and Zorn attempted to predict short-lived objects in a number of allocation-intensive applications written in C. See generally, David A. Barrett and Benjamin G. Zorn, “Using Lifetime Predictors to Improve Memory Allocation Performance,” ACM SIGPLAN Notices, 28(6):187-193, June 1993. They used profile-driven full-run feedback based on observed object lifetimes. Their goal was to reduce the fragmentation caused by long-lived objects scattered throughout the heap. They were also able to reduce the cost of allocating short-lived objects by placing them contiguously and delaying deallocation until entire 4K batches became free.
In particular, Barrett and Zorn attempted to correlate short object lifetimes with the most recent n return addresses on the execution stack. They found that, typically, there was an abrupt step in the effectiveness of prediction when n reached some critical value. These critical values varied between applications, but were usually not greater than 4.
The effect of using these predictions was evaluated with a simulator that replayed allocation traces. Each entry in the allocation traces contained an identifier representing the object size and the complete call chain to the allocation site. They estimated that the cost of computing a reasonable approximation to such an identifier was between 9 and 94 RISC-style instructions for each memory allocation. While such costs may be acceptable for a free-list-based allocator such as the one in the libc library, they are unsatisfactory for other implementations. For example, the fast path of some allocators may use as few as 9 SPARC instructions. For such an allocator, even the best-case estimate of overhead would roughly double the cost of each allocation, and the worst-case estimate would increase it more than tenfold.
Seidl and Zorn proposed dividing the heap into a number of sections based on reference behavior and object lifetime. See generally, Matthew L. Seidl and Benjamin G. Zorn, “Segregating Heap Objects by Reference Behavior and Lifetime,” ACM SIGPLAN Notices, 33(11):12-23, November 1998. They identified four kinds of objects: highly referenced objects that are accessed frequently, non-highly referenced objects that are accessed infrequently, short-lived objects that are deallocated soon after they are created, and other objects, which form the remainder of the heap. These divisions were designed to improve the program's usage of virtual memory pages. Seidl and Zorn's work used trace-driven full-run feedback to gather statistics about a number of large C applications, including AWK, PostScript and Perl interpreters. They identified two effective techniques for predicting, at allocation time, into which category an object should be placed.
The first, a path point predictor, assumed that there is a high correlation between certain call sites in a program and the behavior of objects allocated in procedures ‘below’ these sites in the dynamic call graph. The intuition was that there are certain significant points at which the program changes between generating different kinds of objects.
The second, a stack contents predictor, used a subset of the call chain at the time of allocation as a predictor of object behavior; for example, it considered the most recent n return addresses, for small values of n such as 3. In previous work, the authors had shown that this approach was effective for programs written in C++, because a few stack frames were sufficient to disambiguate allocations occurring in common functions (such as object constructors or malloc wrappers) that are invoked throughout the application.
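The stack contents and path point predictors differ mainly in how an allocation is mapped to a lookup key. The Python sketch below is a hypothetical illustration, not Seidl and Zorn's or Barrett and Zorn's implementation. It assumes that a call chain is a list of integer return addresses (outermost caller first), learns for each key the fraction of short-lived allocations observed in a profiling run, and predicts "short-lived" when that fraction reaches an assumed threshold of 0.9. The function and class names, the addresses and the threshold are all illustrative assumptions.

```python
from collections import defaultdict
from typing import Callable, Hashable, Sequence, Set

CallStack = Sequence[int]  # return addresses, outermost caller first (assumed representation)


def stack_contents_key(n: int) -> Callable[[CallStack], Hashable]:
    """Key an allocation by its n most recent (innermost) return addresses."""
    if n < 1:
        raise ValueError("n must be at least 1")

    def key(stack: CallStack) -> Hashable:
        return tuple(stack[-n:])

    return key


def path_point_key(path_points: Set[int]) -> Callable[[CallStack], Hashable]:
    """Key an allocation by the innermost designated path point on its call chain."""

    def key(stack: CallStack) -> Hashable:
        return next((site for site in reversed(stack) if site in path_points), None)

    return key


class LifetimePredictor:
    """Profile-driven predictor: per key, the fraction of allocations that were short-lived."""

    def __init__(self, key: Callable[[CallStack], Hashable], min_short_fraction: float = 0.9):
        self.key = key
        self.min_short_fraction = min_short_fraction
        self.counts = defaultdict(lambda: [0, 0])  # key -> [short-lived count, total count]

    def record(self, stack: CallStack, short_lived: bool) -> None:
        """Profiling run: note whether an allocation made under this call chain died young."""
        entry = self.counts[self.key(stack)]
        entry[0] += int(short_lived)
        entry[1] += 1

    def predicts_short_lived(self, stack: CallStack) -> bool:
        """Later run: predict short-lived if enough profiled allocations with this key were."""
        short, total = self.counts.get(self.key(stack), (0, 0))
        return total > 0 and short / total >= self.min_short_fraction


# Two hypothetical call chains that reach the same malloc wrapper (return address 0x400).
TEMP = [0x100, 0x200, 0x400]   # e.g. main -> parse_line -> xmalloc: short-lived buffers
TABLE = [0x100, 0x300, 0x400]  # e.g. main -> build_table -> xmalloc: long-lived entries

one = LifetimePredictor(stack_contents_key(1))
two = LifetimePredictor(stack_contents_key(2))
paths = LifetimePredictor(path_point_key({0x200, 0x300}))
for predictor in (one, two, paths):
    for _ in range(10):
        predictor.record(TEMP, short_lived=True)
        predictor.record(TABLE, short_lived=False)
```

With n = 1, both chains collapse to the same key (the wrapper's return address), so half of the profiled allocations under that key are short-lived and neither chain is predicted short-lived. With n = 2, or with path points at the two distinct callers, the chains are distinguished. This is a toy analogue of the step in prediction effectiveness that Barrett and Zorn observed once n reached a critical value.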
Cheng, Harper and Lee described profile-based pretenuring in the context of the TIL compiler for Standard ML. See generally, Perry Cheng, Robert Harper, and Peter Lee, “Generational Stack Collection and Profile-Driven Pretenuring,” ACM SIGPLAN Notices, 33(5):162-173, May 1998. They implemented profile-driven full-run feedback, employing the program counter of each allocation site as a predictor for whether an object would be long-lived.
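Profile-driven pretenuring keyed on the allocation site can be sketched in two steps. First, a profiling run records, for each allocation site, how many of its objects outlived the tenuring age. Then, in a later run, objects from sites that were mostly long-lived are allocated directly into the old generation. The Python sketch below assumes a trace of (site, lifetime) pairs with lifetimes on the allocation clock and a selection threshold of 0.75. Neither the trace format nor the threshold is taken from Cheng, Harper and Lee, and the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Set, Tuple


def choose_pretenured_sites(trace: Iterable[Tuple[int, int]], tenure_age: int,
                            min_long_fraction: float = 0.75) -> Set[int]:
    """Select allocation sites whose objects mostly outlived tenure_age in a profiling run.

    trace holds (allocation-site PC, lifetime) pairs, lifetimes measured on the
    allocation clock; the format and the default threshold are illustrative assumptions.
    """
    total: Counter = Counter()
    long_lived: Counter = Counter()
    for site, lifetime in trace:
        total[site] += 1
        if lifetime > tenure_age:
            long_lived[site] += 1
    return {site for site, n in total.items() if long_lived[site] / n >= min_long_fraction}


def allocation_space(site: int, pretenured: Set[int]) -> str:
    """In a later run, route allocations from selected sites straight to the old generation."""
    return "old" if site in pretenured else "young"


# Usage with a made-up profile: site 0x10 allocates temporaries, site 0x20 mostly long-lived data.
profile = [(0x10, 5), (0x10, 8), (0x20, 500), (0x20, 900), (0x20, 700), (0x20, 3)]
sites = choose_pretenured_sites(profile, tenure_age=100)
```

In the sample profile, three of the four objects from site 0x20 outlive the tenuring age, so that site is selected for pretenuring; none of the objects from site 0x10 do. Because the decision is made once from a whole profiling run, it cannot respond to later changes in the program's behavior.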
Unfortunately, due to an absence of practical run-time sampling techniques, on-line, dynamic computation of object lifetimes has not been used. Instead, research has focused on profile-driven information gathered from previous runs of programs. Furthermore, previous approaches have had to use whole-program runs and have not been adaptive to changes in the execution of the application over time. In particular, they have not been able to capture phase-like changes in behavior of a program.