1. Field of the Invention
The present invention relates to execution behavior of computer programs and, more particularly, to techniques for sampling software objects during respective lifetimes thereof.
2. Description of the Related Art
In general, runtime characteristics and/or populations of software objects can affect performance and operation of a variety of computational systems. For example, the efficacy of certain automatic dynamic memory management systems (e.g., garbage collectors) can be significantly affected by lifetimes of software objects. In some cases, such runtime characteristics may be empirically predicted and computational strategies may be tailored in accordance with such predictions. In other cases, simulations or profile-driven (i.e., off-line) techniques may be employed to better tailor computational strategies.
Generational collection is an automatic dynamic memory management technique that aims to improve the performance of a garbage collected heap. Objects within the heap are divided into two or more generations according to the elapsed time since their allocation. As is customary, references to time, ages, older or younger should be taken to be in terms of an allocation clock (measuring the total volume of objects allocated since the program started) rather than a wall clock (measuring elapsed real time). In the simplest case, there are only two generations, which can be termed young and old. The young generation is subject to more frequent garbage collection than the old generation. Objects are allocated into the young generation and subsequently tenured into the old generation if they survive longer than some threshold.
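The two-generation arrangement and the allocation clock described above can be illustrated with a minimal sketch. The names used here (`TwoGenHeap`, `tenure_threshold`, `minor_collect`) are hypothetical and chosen only for illustration. Reachability is supplied by the caller rather than computed by a real collector.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Obj:
    size: int    # bytes occupied by the object
    birth: int   # allocation-clock reading when the object was created


@dataclass
class TwoGenHeap:
    tenure_threshold: int   # age, in allocated bytes, at which survivors are tenured
    clock: int = 0          # allocation clock: total bytes allocated so far
    young: list = field(default_factory=list)
    old: list = field(default_factory=list)

    def allocate(self, size):
        obj = Obj(size=size, birth=self.clock)
        self.clock += size  # time advances only when memory is allocated
        self.young.append(obj)
        return obj

    def age(self, obj):
        return self.clock - obj.birth

    def minor_collect(self, live):
        """Collect the young generation; `live` lists the objects still reachable."""
        live_ids = {id(o) for o in live}
        still_young = []
        for o in self.young:
            if id(o) not in live_ids:
                continue                  # unreachable: reclaimed
            if self.age(o) >= self.tenure_threshold:
                self.old.append(o)        # survived past the threshold: tenured
            else:
                still_young.append(o)
        self.young = still_young
```

In this sketch an object's age depends only on the volume allocated after it, so a program that pauses without allocating does not age its objects.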
This scheme is generally effective because many objects are short-lived and so it is worthwhile concentrating on the young objects (which are likely to die) rather than on the older objects (which are likely to continue to survive). The fact that many objects die young makes it desirable to use a copying collector for the young generation. In such implementations, only the few objects that survive need to be copied. In a typical implementation, each generation is held in a separate part of the heap. This approach allows the generations to be managed by different garbage collectors and according to different allocation policies. The physical separation means that an object is copied when it is tenured. Consequently, long-lived objects are often copied several times before they come to rest in the old generation. Pre-tenured allocation, i.e., allocating an object into an old (or older) generation, can avoid some of this copying.
Previous work has investigated feedback-based techniques for segregating long-lived and short-lived objects. Barrett and Zorn attempted to predict short-lived objects in a number of allocation-intensive applications written in C. See generally, David A. Barrett and Benjamin G. Zorn, "Using Lifetime Predictors to Improve Memory Allocation Performance," ACM SIGPLAN Notices, 28(6):187-193, June 1993. They used profile-driven full-run feedback based on observed object lifetimes. Their goal was to reduce the fragmentation caused by long-lived objects scattered throughout the heap. They were also able to reduce the cost of allocating short-lived objects by placing them contiguously and delaying deallocation until entire 4K batches became free.
In particular, Barrett and Zorn attempted to correlate short object lifetimes with the most recent n return addresses on the execution stack. They found that, typically, there was an abrupt step in the effectiveness of prediction when n reached some critical value. These critical values varied between applications, but were usually not greater than 4.
The effect of using these predictions was evaluated using a simulator which replayed allocation traces. Each entry in the allocation traces contained an identifier representing the object size and the complete call-chain to the allocation site. They estimated that the cost of computing a reasonable approximation to such an identifier was between 9 and 94 RISC-style instructions for each memory allocation made. While such costs may be acceptable for a free-list based allocator from the libc library, for other implementations, such overhead is unsatisfactory. For example, the fast-path of some allocators may utilize as few as 9 SPARC instructions. Accordingly, even the best-case estimate of overhead is substantial.
Seidl and Zorn proposed dividing the heap into a number of sections based on reference behavior and object lifetime. See generally, Matthew L. Seidl and Benjamin G. Zorn, "Segregating Heap Objects by Reference Behavior and Lifetime," ACM SIGPLAN Notices, 33(11):12-23, Nov. 1998. They identified four kinds of object: highly referenced objects that are accessed frequently, non-highly referenced objects that are accessed infrequently, short-lived objects that are deallocated soon after they are created and other objects which form the remainder of the heap. These divisions were designed to improve the program's usage of virtual memory pages. Seidl and Zorn's work used trace-driven full-run feedback to gather statistics about a number of large C applications, including AWK, PostScript and Perl interpreters. They identified two effective techniques for predicting, at allocation time, into which category an object should be placed.
The first, a path point predictor, assumed that there is a high correlation between certain call sites in a program and the behavior of objects allocated in procedures 'below' these sites in the dynamic call graph. The intuition was that there are certain significant points at which the program changes between generating different kinds of object.
The second, a stack contents predictor, used a subset of the call chain at the time of allocation as a predictor of object behavior. For example, it considered the most recent n return addresses, for small values of n such as 3. In previous work, the authors showed that this was effective for programs written in C++ because a few stack frames were sufficient to disambiguate allocations occurring in common functions (such as object constructors, or malloc wrappers) that are invoked throughout the application.
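The idea of keying predictions on the most recent n return addresses can be sketched as follows. This is an illustrative sketch only, not the implementation used in the cited work. Call chains are modeled as lists of symbolic names rather than real return addresses, and `stack_key`, `train`, `predict_short_lived` and the `min_fraction` cutoff are hypothetical names introduced here.

```python
from collections import defaultdict


def stack_key(call_chain, n):
    """Return the n most recent entries of the call chain (innermost last) as a key."""
    return tuple(call_chain[-n:])


def train(trace, n):
    """Build per-key counts from (call_chain, was_short_lived) records."""
    counts = defaultdict(lambda: [0, 0])  # key -> [short-lived count, total count]
    for call_chain, was_short_lived in trace:
        entry = counts[stack_key(call_chain, n)]
        entry[0] += int(was_short_lived)
        entry[1] += 1
    return counts


def predict_short_lived(counts, call_chain, n, min_fraction=0.9):
    """Predict short-lived only if nearly all earlier allocations under this key died young."""
    short, total = counts.get(stack_key(call_chain, n), (0, 0))
    return total > 0 and short / total >= min_fraction
```

With a shared allocation wrapper, a key of depth 1 sees only the wrapper and cannot separate its callers, whereas a key of depth 2 includes the caller and can. This mirrors the observation that a few stack frames suffice to disambiguate allocations made in common functions.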
Cheng, Harper and Lee described profile-based pretenuring in the context of the TIL compiler for Standard ML. See generally, Perry Cheng, Robert Harper, and Peter Lee, "Generational Stack Collection and Profile-Driven Pretenuring," ACM SIGPLAN Notices, 33(5):162-173, May 1998. They implemented profile-driven full-run feedback, employing the program counter of each allocation site as a predictor for whether an object would be long-lived.
Unfortunately, due to an absence of practical run-time sampling techniques, on-line, dynamic computation of object lifetimes has not been used. Instead, research has focused on profile-driven information gathered from previous runs of programs. Furthermore, previous approaches have had to use whole-program runs and have not been adaptive to changes in the execution of the application over time. In particular, they have not been able to capture phase-like changes in behavior of a program.
Accordingly, techniques have been developed whereby software objects can be sampled at run-time to obtain statistical information about allocation sites, object lifetimes, and other properties. Such run-time sampled information may be used to improve performance of memory systems by improving object tenuring and/or object placement decisions. While applications to pretenuring decisions in a garbage collected memory environment are particularly attractive, exploitations of techniques in accordance with the present invention are not limited thereto. Indeed, the object sampling techniques and facilities described herein may be employed in any of a variety of computational systems for which low-overhead runtime sampling of objects is desired. In some realizations, weak references are employed. In some realizations, representative subsets of objects are sampled.
In some realizations of the present invention, an allocator creates instances of data objects in response to requests from an application program or mutator. A subset of these objects is tracked by an object sampler. For an object selected to be tracked, a weak reference to the object is established to facilitate collection of information associated with the data object. Such information may identify the allocation time of the object, the application program requesting the object, the allocation call site, the type of the data object structure, etc. Once the data object is no longer reachable by a mutator, object termination begins. Typically, a garbage collector determines reachability using any of a variety of suitable techniques; however, explicit reclamation techniques may be employed in some realizations to trigger object termination. During this termination process, the object sampler collects additional information about the data object such as its termination time, and the weak reference established by the object sampler is removed. The object sampler then compiles and updates data object lifetime statistics based in part on the newly collected information. In some realizations, object statistics may be updated apart from termination. For example, in a generational collector implementation, statistics suitable for shaping a tenuring policy may be updated based on populations of sampled objects promoted from a younger generation to an older generation.
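By way of illustration only, the sequence described above (sample at allocation, attach a weak reference and allocation-time information, update statistics at termination) might be sketched using Python's standard `weakref` module. The class and method names (`ObjectSampler`, `allocate`, `track`, `mean_lifetime`) and the `Node` type are hypothetical, and lifetimes are measured on an allocation clock rather than in wall time. The sketch assumes CPython, where the weak reference callback runs promptly once the last strong reference is dropped; other runtimes may defer it until a later collection.

```python
import weakref
from collections import defaultdict


class Node:
    """Stand-in for an application data object (plain instances support weak references)."""


class ObjectSampler:
    """Tracks a sampled subset of objects through weak references."""

    def __init__(self):
        self.clock = 0                      # allocation clock, in bytes allocated
        self._tracked = {}                  # id(weak ref) -> (weak ref, site, birth)
        self.lifetimes = defaultdict(list)  # allocation site -> observed lifetimes

    def allocate(self, factory, size, site, sample):
        obj = factory()
        if sample:
            self.track(obj, site)
        self.clock += size
        return obj

    def track(self, obj, site):
        """Attach allocation-time information to obj without keeping it alive."""
        ref = weakref.ref(obj, self._on_death)
        self._tracked[id(ref)] = (ref, site, self.clock)

    def _on_death(self, ref):
        """Runs once the referent is unreachable: record lifetime, drop the weak ref."""
        _, site, birth = self._tracked.pop(id(ref))
        self.lifetimes[site].append(self.clock - birth)

    def mean_lifetime(self, site):
        samples = self.lifetimes.get(site, [])
        return sum(samples) / len(samples) if samples else None
```

Because the sampler holds only weak references, tracking an object does not extend its lifetime, so the recorded statistics reflect what the mutator actually did.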
In one embodiment in accordance with the present invention, a method of generating object lifetime statistics based on run-time observations includes selecting from amongst object instances of an observed category, a sampled subset of the object instances allocated in one or more execution threads of a computational system; establishing (coincident with allocation of a sampled instance of an object) a weak reference thereto and associating therewith information indicative of at least allocation time; and referencing the sampled instances at run-time via the weak references and updating the object lifetime statistics based on the associated allocation time and then-current state. In some variations, the computational system includes a garbage collector and object lifetime statistics updating is performed in response to a determination by the garbage collector that one or more sampled instances have become unreachable. In some variations, the observed category corresponds to an object class. In some variations, the associated information indicates an allocation site or an allocating one of the execution threads.
In another embodiment in accordance with the present invention and suitable for an automatically reclaimed storage environment, a method of sampling instances of software objects during their respective lifetimes includes establishing weak references to respective ones of the sampled instances, each of the weak references identifying at least one respective sampled instance; associating allocation-time information with each sampled instance; and accessing the sampled instances via the weak references and performing an action based at least in part on a state of one or more of the sampled instances and respective allocation-time information. In various realizations, the sampled instances include a representative subset of a category of software objects that is object class specific, that is call-site specific, that corresponds to an activation record stack profile, that covers an abstract class or interface or that is specific to a particular garbage collection space. In some realizations, the method further includes selecting (at allocation time) the sampled instances from amongst all instances of a particular type. Selection may be performed in a variety of ways including based on allocation buffer overflow or based on a subset of allocations for each type of sampled software object.
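The two selection approaches mentioned above might be sketched as follows. This is one plausible reading rather than a definitive implementation: `PerTypeSelector` picks every `period`-th allocation of each type, and `BufferOverflowSelector` picks the allocation that overflows a fixed-size allocation buffer. Both class names, and the `period` and `buffer_size` parameters, are assumptions introduced for illustration.

```python
from collections import defaultdict


class PerTypeSelector:
    """Selects every `period`-th allocation of each type for sampling."""

    def __init__(self, period):
        self.period = period
        self._counts = defaultdict(int)  # type name -> allocations seen so far

    def should_sample(self, type_name):
        self._counts[type_name] += 1
        return self._counts[type_name] % self.period == 0


class BufferOverflowSelector:
    """Selects the allocation that overflows the current allocation buffer."""

    def __init__(self, buffer_size):
        self.buffer_size = buffer_size
        self._used = 0                   # bytes consumed in the current buffer

    def should_sample(self, size):
        if self._used + size > self.buffer_size:
            self._used = size            # start a fresh buffer holding this object
            return True                  # overflow occurred: sample this allocation
        self._used += size
        return False
```

In the overflow variant, the selection test coincides with a check the allocator already performs, so allocations that fit in the current buffer incur no additional sampling work in this sketch.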
In still another embodiment of the present invention, an object sampling facility includes a weak reference construct implemented by a computational system and an object fingerprinter. The object fingerprinter is responsive to a storage allocator of the computational system and associates (1) allocation-time information and (2) an instance of the weak reference construct with at least a sampled subset of objects allocated by the storage allocator. In some variations, the object sampling facility includes an object sampler responsive to garbage collection events in the computational system. The object sampler references the sampled subset via the weak reference instances and maintains object lifetime statistics based on the associated allocation-time information and then-current state of the sampled subset. In some variations, the object sampling facility includes an object sampler that references the sampled subset via the weak reference instances and maintains object lifetime statistics based on the associated allocation-time information and sampled state of the sampled subset. The storage allocator can then be responsive to the object lifetime statistics in its allocation decisions.
In still yet another embodiment of the present invention, a computer program product includes at least one functional sequence for associating allocation-time information and an instance of a weak reference with at least a sampled subset of objects allocated by a storage allocator and at least one functional sequence for sampling the sampled subset using the weak reference instances and maintaining object lifetime statistics based on the associated allocation-time information and sampled state of the sampled subset. In some variations, the computer program product further includes at least one functional sequence for tenuring certain object instances in accordance with those of the object lifetime statistics corresponding thereto.
In still yet another embodiment of the present invention, an apparatus includes (1) means for associating allocation-time information with sampled instances of software objects, (2) means for referencing the sampled instances of software objects, wherein the referencing means is operable for both reachable and unreachable software objects and (3) means for updating lifetime predictions for categories of the software objects based on run-time access to states of corresponding sampled instances and associated allocation-time information therefor.
These and other realizations will be appreciated based on the description and claims that follow.