The role of a “garbage collector” in a JVM (Java Virtual Machine) is to remove unreachable objects from the heap and reclaim space for new ones. Generational garbage collection is a popular policy in many JVMs because it can quickly collect objects that die young. A common generational scheme divides the heap into two areas: a nursery (or new space), where new objects are allocated, and a tenured area (or old space), where longer-lived objects reside. The new space is itself divided into an allocate space and a survivor space. New objects are allocated in the allocate space. When that space fills up, the garbage collector (GC) determines which objects are still alive and copies them to the survivor space. The roles of the survivor and allocate spaces are then reversed, and new objects are allocated in the new allocate space. Once an object has survived a certain number of copies, it is no longer considered young and is copied to the tenured space.
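The copying scheme just described can be made concrete with a small simulation. The sketch below is a toy model, not a real collector: "copying" is modelled by moving objects between lists, liveness is supplied by the caller as a predicate, and the class name `Nursery` and the tenuring threshold `TENURE_AGE = 3` are hypothetical choices made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Toy model of a generational nursery: an allocate space and a survivor space
// whose roles flip after each scavenge, plus a tenured space. "Copying" is
// modelled by moving objects between lists; TENURE_AGE is a hypothetical value.
class Nursery {
    static final int TENURE_AGE = 3; // hypothetical: promoted on the 3rd surviving scavenge

    static final class Obj {
        final String name;
        int age; // number of scavenges survived so far
        Obj(String name) { this.name = name; }
    }

    List<Obj> allocateSpace = new ArrayList<>();
    List<Obj> survivorSpace = new ArrayList<>();
    final List<Obj> tenuredSpace = new ArrayList<>();

    void allocate(String name) {
        allocateSpace.add(new Obj(name));
    }

    // Copies live objects out of the allocate space, promoting old ones, then flips the spaces.
    void scavenge(Predicate<Obj> isLive) {
        for (Obj o : allocateSpace) {
            if (!isLive.test(o)) continue;                 // unreachable: never copied, so reclaimed
            o.age++;
            if (o.age >= TENURE_AGE) tenuredSpace.add(o);  // survived enough copies: tenure it
            else survivorSpace.add(o);                     // still young: copy to survivor space
        }
        allocateSpace.clear();                             // the whole allocate space is now free
        List<Obj> tmp = allocateSpace;                     // flip roles: the survivor space becomes
        allocateSpace = survivorSpace;                     // the new allocate space, and the emptied
        survivorSpace = tmp;                               // space becomes the next survivor space
    }

    public static void main(String[] args) {
        Nursery heap = new Nursery();
        heap.allocate("a");
        heap.allocate("b");
        heap.scavenge(o -> o.name.equals("a")); // "b" is unreachable and is not copied
        heap.scavenge(o -> true);
        heap.scavenge(o -> true);               // "a" reaches TENURE_AGE here
        System.out.println("tenured=" + heap.tenuredSpace.size()
                + " young=" + heap.allocateSpace.size()); // prints "tenured=1 young=0"
    }
}
```

Objects that are never copied cost nothing to reclaim, which is why this policy handles short-lived objects cheaply.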
On large workloads (e.g., an application server such as IBM WebSphere® Application Server), analysis reveals that a significant proportion of processor cycles is spent waiting for heap data to be paged in from backing store (a page fault), to arrive from main memory or an outer level of the cache hierarchy (a cache miss), or for a virtual address to be translated into a physical one (a translation look-aside buffer, or TLB, miss). Many of these misses are due to the poor locality of objects in the heap. Locality of reference is the principle in computer science that programs tend to repeatedly access data that is related either spatially or temporally. Spatial locality means that if a program accesses a certain memory location L, it is likely to access other locations close to L soon afterward. Temporal locality means that a memory location that has been accessed once is likely to be accessed again several times within a relatively short period. It is well known that improving the reference locality of objects in the heap can yield significant performance improvements by reducing cache and TLB misses.
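The effect of spatial locality on cache misses can be illustrated with a toy direct-mapped cache that counts misses for a sequence of byte addresses. This is a minimal sketch: the 64-byte line size and 64-line capacity are assumed parameters, not those of any particular processor, and the class name `LocalityDemo` is hypothetical.

```java
import java.util.Arrays;

// Toy direct-mapped cache used to illustrate locality of reference.
// LINE_SIZE and NUM_LINES are assumed, illustrative parameters.
class LocalityDemo {
    static final int LINE_SIZE = 64;  // bytes per cache line (assumed)
    static final int NUM_LINES = 64;  // lines in the toy cache (assumed)

    // Returns the number of misses when the given addresses are accessed in order.
    static int countMisses(long[] addresses) {
        long[] tags = new long[NUM_LINES];
        Arrays.fill(tags, -1);                    // -1 marks an empty line
        int misses = 0;
        for (long addr : addresses) {
            long lineNumber = addr / LINE_SIZE;
            int index = (int) (lineNumber % NUM_LINES);
            if (tags[index] != lineNumber) {      // line not cached: miss, then fill it
                tags[index] = lineNumber;
                misses++;
            }
        }
        return misses;
    }

    // n accesses, each `stride` bytes after the previous one.
    static long[] stridedAccesses(int n, long stride) {
        long[] a = new long[n];
        for (int i = 0; i < n; i++) a[i] = i * stride;
        return a;
    }

    public static void main(String[] args) {
        int dense = countMisses(stridedAccesses(1024, 8));     // adjacent 8-byte fields
        int sparse = countMisses(stridedAccesses(1024, 4096)); // one field per 4 KiB page
        System.out.println("dense: " + dense + " misses, sparse: " + sparse + " misses");
        // prints "dense: 128 misses, sparse: 1024 misses"
    }
}
```

The dense pattern misses once per 64-byte line and then hits on the next seven accesses, while the sparse pattern misses on every access and also touches 1,024 distinct 4 KiB pages, which is the kind of page spread that stresses the TLB.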
A garbage collector usually makes indiscriminate choices when deciding where to copy objects, and this is a typical cause of poor locality. Locality can be improved if the garbage collector understands both the relationships between Java object references at runtime and the memory hierarchy of the underlying hardware. Examining references to objects reveals that some objects are accessed much more frequently than others. Such objects are referred to as “hot”, and the remaining objects are referred to as “cold”. Placing hot objects close together on the heap reduces their page spread, which improves TLB performance, and also reduces cache line conflicts among hot objects.
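Hot/cold segregation during copying can be sketched as choosing a copy order: copy all hot objects first, then the cold ones, so that hot objects end up contiguous in the destination space. The sketch below assumes objects are copied with a bump pointer starting at address 0 and a 4 KiB page size. The class name `HotColdPlacement`, the object sizes, and the hot flags are hypothetical inputs, not the disclosed placement policy.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of hot/cold segregation when copying objects. Page size, object
// sizes and hot flags are hypothetical inputs chosen for illustration.
class HotColdPlacement {
    static final long PAGE_SIZE = 4096;

    record Obj(int size, boolean hot) {}

    // Copy order that places every hot object before every cold one (stable within each group).
    static List<Obj> segregate(List<Obj> objects) {
        List<Obj> result = new ArrayList<>();
        for (Obj o : objects) if (o.hot()) result.add(o);
        for (Obj o : objects) if (!o.hot()) result.add(o);
        return result;
    }

    // Lays objects out contiguously in the given order and counts the pages spanned by hot objects.
    static int pagesTouchedByHot(List<Obj> order) {
        Set<Long> pages = new HashSet<>();
        long addr = 0; // bump-pointer copy destination
        for (Obj o : order) {
            if (o.hot()) {
                long first = addr / PAGE_SIZE;
                long last = (addr + o.size() - 1) / PAGE_SIZE;
                for (long p = first; p <= last; p++) pages.add(p);
            }
            addr += o.size();
        }
        return pages.size();
    }

    public static void main(String[] args) {
        List<Obj> heap = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            heap.add(new Obj(64, true));    // small hot object
            heap.add(new Obj(4032, false)); // large cold object between hot ones
        }
        System.out.println(pagesTouchedByHot(heap) + " -> " + pagesTouchedByHot(segregate(heap)));
        // prints "4 -> 1"
    }
}
```

In the interleaved order the four hot objects span four pages; after segregation they share a single page, and therefore a single TLB entry.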
Although static analysis of class usage in methods can yield some limited information useful to the GC, the most accurate understanding of object relationships comes from profiling references at runtime (or from a combination of the two). Profiling which objects are referenced while an application is running is extremely challenging. The profiling overhead must be low enough not to cancel out the gains from better locality, profiling must scale well in a multi-threaded environment, and the GC has to process the collected data efficiently and act on it.
Some prior art solutions collect traces of object references into a buffer and/or set a bit in a word on the object to indicate that it has been referenced. In these solutions, profiling is either continuous or requires cloning methods, inserting instrumentation into the clones, and adding control logic that switches between the two bodies depending on whether a profiling phase is active.
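The method-cloning approach can be sketched as two copies of each method body behind a dispatch check. This is a hypothetical illustration of the general technique, not any specific prior art implementation. The names `ClonedMethodProfiling`, `profilingPhase` and `samples` are invented, and an "object" is modelled as an `int[]`.

```java
// Sketch of the method-cloning approach to profiling: each profiled method
// exists twice, an original body and an instrumented clone, and a dispatch
// check selects one on every call. All names are hypothetical.
class ClonedMethodProfiling {
    static volatile boolean profilingPhase = false; // flipped at regular intervals by some controller
    static long samples = 0;                         // data gathered by the instrumented clone

    // Original body: reads field 0 of an "object" modelled as an int array.
    static int readFieldOriginal(int[] obj) {
        return obj[0];
    }

    // Cloned body: the same logic plus instrumentation, which duplicates the code footprint.
    static int readFieldInstrumented(int[] obj) {
        samples++;
        return obj[0];
    }

    // Entry point: this check runs on every call, even outside profiling phases.
    static int readField(int[] obj) {
        return profilingPhase ? readFieldInstrumented(obj) : readFieldOriginal(obj);
    }

    public static void main(String[] args) {
        int[] obj = { 7 };
        readField(obj);          // not profiled
        profilingPhase = true;
        readField(obj);
        readField(obj);
        profilingPhase = false;
        System.out.println("samples=" + samples); // prints "samples=2"
    }
}
```

Two costs of this design are visible even at this scale: every method body exists twice, and the `profilingPhase` check remains on the call path permanently.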
The solution described herein presents a novel mechanism for determining the frequency of access of Java heap objects at runtime that has extremely low instrumentation overhead, is scalable, and is highly space efficient. Enhancements to a garbage collector are also described in greater detail herein that calibrate the quality of the collected data and use that information to optimize object placement so as to minimize cache and TLB misses in the heap.
Understanding the frequency of reference (or “hotness”) of an object is an important first step for many data-locality-based optimizations by a garbage collector. It is a challenge to collect and process hotness information efficiently at runtime without introducing overheads that outweigh the benefits of improved data locality. This disclosure proposes a novel mechanism for determining the frequency of access of Java heap objects at runtime that has low instrumentation overhead, is scalable, is highly space efficient, and makes hotness information readily available per object to a garbage collector for immediate use.
Previous runtime profiling solutions based on sampling have attempted to reduce overhead by cloning methods, inserting profiling instrumentation into the clones, and adding control logic that switches between the two bodies at regular intervals. This approach does not work well in large-scale production middleware applications because of the huge number of methods involved and the resulting footprint cost of cloning each method. In addition, the instrumentation control logic is never completely removed and imposes a continuous drag on throughput. Our mechanism is superior in production environments because it uses self-modifying code to insert and remove profiling instrumentation without duplicating methods. The metadata required to support patching a method is significantly smaller than a clone of the entire method, and the runtime profiling code itself can be shared among methods. In our mechanism, the instrumentation logic is completely removed from methods. As for identifying hot objects, the set of heap objects that are referenced more frequently than others is typically determined by static analysis, by associating a counter with each object, or by tracing references to objects into a buffer.
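The contrast between cloning and patching described above can be modelled with a toy instruction array. This is a minimal sketch under loud assumptions: a "method" is an array of hypothetical opcodes (`LOAD_REF`, `PROFILED_LOAD_REF`, `RETURN`), and "self-modifying code" is modelled by overwriting array slots. Real code patching rewrites machine code and must also handle threads that are concurrently executing it, which this model ignores. Only a (code, offset) pair is kept per patch site, and one stub is shared by every site.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of patch-based profiling. Enabling profiling overwrites each
// reference load in place with a profiled variant that also calls one shared
// stub; disabling restores the original instruction. No method body is cloned.
// The instruction set and all names here are hypothetical.
class PatchProfiler {
    enum Op { LOAD_REF, PROFILED_LOAD_REF, RETURN }

    record PatchSite(Op[] code, int offset) {} // the only per-site metadata

    final List<PatchSite> sites = new ArrayList<>();
    long loads = 0;     // reference loads actually performed
    long stubCalls = 0; // invocations of the shared profiling stub

    // Records every reference load in a method as a patch site.
    void register(Op[] code) {
        for (int i = 0; i < code.length; i++) {
            if (code[i] == Op.LOAD_REF) sites.add(new PatchSite(code, i));
        }
    }

    void insertProfiling() {
        for (PatchSite s : sites) s.code()[s.offset()] = Op.PROFILED_LOAD_REF;
    }

    void removeProfiling() { // restores the original instructions exactly
        for (PatchSite s : sites) s.code()[s.offset()] = Op.LOAD_REF;
    }

    void profilingStub() { stubCalls++; } // shared by every patched site in every method

    // Interprets a method body until RETURN.
    void execute(Op[] code) {
        for (Op op : code) {
            switch (op) {
                case PROFILED_LOAD_REF -> { profilingStub(); loads++; }
                case LOAD_REF -> loads++;
                case RETURN -> { return; }
            }
        }
    }

    public static void main(String[] args) {
        Op[] method = { Op.LOAD_REF, Op.LOAD_REF, Op.RETURN };
        Op[] original = method.clone();
        PatchProfiler p = new PatchProfiler();
        p.register(method);
        p.execute(method);    // profiling off: no stub calls
        p.insertProfiling();
        p.execute(method);    // profiling on: one stub call per load
        p.removeProfiling();
        p.execute(method);    // the original body again, with no residual checks
        System.out.println(p.stubCalls + " " + p.loads + " " + Arrays.equals(method, original));
        // prints "2 6 true"
    }
}
```

After `removeProfiling`, the method body is identical to the original, which is the property that distinguishes patching from a permanent dispatch check.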
The static analysis approach examines object allocation sites and field references along hot execution paths through an application and infers hotness from the classes being referenced. While the overhead is low, the results are coarse, because this approach cannot identify the specific object instances that are hot. Another approach associates with each object a counter that is incremented each time the object is referenced at runtime, and some mechanism is required to control when counting occurs. While this collects accurate reference count statistics, the counter bloats the object and requires the object to be touched on every reference to update the counter. In practice, this scheme contributes to the very cache locality problem we are trying to solve.
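The per-object counter approach can be sketched as an extra field plus a read barrier that increments it while counting is enabled. The class name `CountingHeapObject`, its fields, and the global `countingEnabled` switch are hypothetical and serve only to make the two costs named above visible: a larger object, and a write to the object on every read.

```java
// Sketch of the per-object reference counter approach. All names are
// hypothetical; countingEnabled stands in for the mechanism that controls
// when counting occurs.
class CountingHeapObject {
    static boolean countingEnabled = false;

    int payload;  // the object's real data
    int refCount; // extra field carried by every object, enlarging it

    // Read barrier: while counting, every read of the object also writes to it.
    // The increment is not atomic, so concurrent readers can lose counts.
    int readPayload() {
        if (countingEnabled) refCount++;
        return payload;
    }

    public static void main(String[] args) {
        CountingHeapObject o = new CountingHeapObject();
        o.readPayload();                    // not counted
        countingEnabled = true;
        for (int i = 0; i < 5; i++) o.readPayload();
        countingEnabled = false;
        System.out.println(o.refCount);     // prints 5
    }
}
```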
Finally, buffer-based approaches that write traces of object references into a buffer have also been used. An affinity graph can be constructed from the data in the buffer that shows not only the hotness of objects but also their temporal relationships. However, the number of objects involved in production systems will quickly overwhelm any buffer-based profiling scheme unless significant memory is dedicated to the buffers, and the overhead of managing buffer pointers and storing data is high. In addition, a separate buffer is needed per thread so that scalability does not suffer. In all of the above cases, a garbage collector must aggregate the collected profiling data before it can use it for locality-based optimizations, and this may incur significant time and space overheads. The present invention, by contrast, as disclosed in greater detail herein, uses probability to accurately determine whether objects are hot and represents this efficiently with a single bit associated with each object. Because hotness is determined at runtime, the information is available for immediate use by the garbage collector without any aggregation or processing of data.
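This section does not spell out how probability is used to set the single hot bit. One plausible scheme, shown here purely as an assumption and not as the disclosed mechanism, is to sample references: during a profiling window, each reference sets the object's bit with probability p. An object referenced n times is then marked with probability 1 - (1 - p)^n, which approaches 1 for frequently referenced objects and stays near n * p for rarely referenced ones. The class `SampledHotBit`, the method names, and the sampling rate are all hypothetical.

```java
import java.util.Random;

// Hypothetical sketch: one hot bit per object, set by probabilistically
// sampled references. Not the disclosed mechanism; an illustrative assumption.
class SampledHotBit {
    boolean hot; // the only per-object hotness state

    // Called on each reference during profiling; sets the bit with probability p.
    // Once the bit is set, no further random draws or stores are needed.
    void onReference(Random rng, double p) {
        if (!hot && rng.nextDouble() < p) hot = true;
    }

    // Probability that an object referenced n times ends up marked hot.
    static double markProbability(double p, int n) {
        return 1.0 - Math.pow(1.0 - p, n);
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        double p = 0.01; // hypothetical sampling rate
        SampledHotBit frequent = new SampledHotBit();
        SampledHotBit rare = new SampledHotBit();
        for (int i = 0; i < 1000; i++) frequent.onReference(rng, p);
        rare.onReference(rng, p);
        System.out.printf("P(hot | n=1000) = %.5f, P(hot | n=1) = %.2f%n",
                markProbability(p, 1000), markProbability(p, 1));
        System.out.println("frequent=" + frequent.hot + " rare=" + rare.hot);
    }
}
```

Under this assumption, a collector could read `hot` directly while deciding copy order, for example to copy hot objects first, with no separate aggregation step.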