Memory analysis has become an important area of focus for information processing systems. Problems such as excessive memory footprint or unbounded memory growth over time are common causes of system slowdown and failure. For large-scale systems, understanding how a program's memory behaves over time and finding the root cause of memory problems can be difficult with currently available techniques. One area of particular concern is that of memory leaks. Despite automatic garbage collection, memory leaks remain a significant problem for many Java applications. A memory leak occurs when a Java program inadvertently maintains references to objects that are no longer needed, preventing the garbage collector (GC) from reclaiming their space. Memory leaks are easy to spot, but are often difficult to diagnose. The likelihood that a memory leak exists can be determined by black-box analysis, that is, by monitoring the memory heap after each round of garbage collection. In a leaking application, each round of garbage collection frees less and less memory space until the application grinds to a halt for lack of space.
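By way of illustration only, the black-box check described above can be sketched in Java as follows. The class name PostGcTrend, the method names, and the "rising fraction" threshold are assumptions introduced for this sketch and are not part of any existing tool; the only library calls used are the standard java.lang.management memory pool beans.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

/** Black-box leak heuristic: does heap occupancy after each GC keep rising? */
final class PostGcTrend {

    /**
     * Returns true when the post-GC samples look like unbounded growth:
     * at least three samples, the last sample above the first, and at least
     * minRisingFraction of the consecutive pairs increasing.
     * (The thresholds are illustrative assumptions, not a standard.)
     */
    static boolean looksLikeLeak(long[] postGcUsedBytes, double minRisingFraction) {
        if (postGcUsedBytes.length < 3) {
            return false; // too few collections to call it a trend
        }
        int rising = 0;
        for (int i = 1; i < postGcUsedBytes.length; i++) {
            if (postGcUsedBytes[i] > postGcUsedBytes[i - 1]) {
                rising++;
            }
        }
        int pairs = postGcUsedBytes.length - 1;
        boolean netGrowth = postGcUsedBytes[pairs] > postGcUsedBytes[0];
        return netGrowth && rising >= minRisingFraction * pairs;
    }

    /** Heap bytes still in use after the most recent collection, summed over pools that report it. */
    static long usedAfterLastGc() {
        long total = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // only heap pools are of interest here
            }
            MemoryUsage usage = pool.getCollectionUsage(); // null if the pool does not support it
            if (usage != null) {
                total += usage.getUsed();
            }
        }
        return total;
    }
}
```

A monitor might call usedAfterLastGc() periodically, append the value to an array of samples, and pass that array to looksLikeLeak. A positive result indicates only that a leak is likely; it says nothing about which data structure is responsible.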
A number of diagnostic tools exist that help users determine the root cause of a leak. These tools rely on a combination of heap snapshot differencing and allocation and/or usage tracking at a fine level of detail. However, these techniques are not adequate for large-scale enterprise applications.
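As a minimal sketch of heap snapshot differencing, assume each snapshot has already been reduced to a map from type name to live instance count. That reduction, the class name SnapshotDiff, and the method name growthByType are assumptions made for illustration; real tools work from full heap dumps.

```java
import java.util.Map;
import java.util.TreeMap;

/** Compares two per-type instance counts taken at different times. */
final class SnapshotDiff {

    /** Per-type change in live instance count between two snapshots; unchanged types are omitted. */
    static Map<String, Long> growthByType(Map<String, Long> before, Map<String, Long> after) {
        Map<String, Long> delta = new TreeMap<>();
        // Types present in the later snapshot: report growth or shrinkage.
        for (Map.Entry<String, Long> e : after.entrySet()) {
            long change = e.getValue() - before.getOrDefault(e.getKey(), 0L);
            if (change != 0) {
                delta.put(e.getKey(), change);
            }
        }
        // Types that vanished entirely between the two snapshots.
        for (Map.Entry<String, Long> e : before.entrySet()) {
            if (!after.containsKey(e.getKey()) && e.getValue() != 0) {
                delta.put(e.getKey(), -e.getValue());
            }
        }
        return delta;
    }
}
```

Types with a large positive delta are candidates for closer inspection, although, as discussed below, intentionally cached objects produce the same signal.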
Many existing memory management tools work by dividing a program heap into old objects and newer objects, under the assumption that the older objects are more likely to be permanent. Referring to FIG. 1, there is shown an illustration of a set of objects 100 including older objects 102, recently created objects 104, and a boundary or fringe 106 between them. Having classified the objects, the user manually tries to discover why the newer, ostensibly more temporary objects are being retained, by exploring the boundary (or fringe) 106. We say an object is on the fringe if it is a new object pointed to by an older one. The objects 102 on the older side of the fringe 106 comprise old objects 108 and fringe-old objects 110. The objects 104 on the new side of the fringe 106 comprise new objects 112 and fringe-new objects 114. This classification scheme is used to analyze possible sources of memory leaks. This manual method of leak analysis is time-consuming and difficult to carry out.
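The old/new/fringe classification can be sketched as follows. The Obj type, the Region names, and the reading of "fringe-old" as an old object that points directly to a new object are assumptions made for this sketch; the text above defines only the fringe-new case explicitly.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Splits a reference graph into old, fringe-old, fringe-new, and new objects. */
final class FringeClassifier {

    enum Region { OLD, FRINGE_OLD, FRINGE_NEW, NEW }

    /** Hypothetical heap object: an id, an old/new flag, and outgoing references. */
    static final class Obj {
        final String id;
        final boolean old;
        final List<Obj> refs = new ArrayList<>();

        Obj(String id, boolean old) {
            this.id = id;
            this.old = old;
        }
    }

    static Map<Region, Set<String>> classify(List<Obj> heap) {
        Set<Obj> fringeOld = new HashSet<>();
        Set<Obj> fringeNew = new HashSet<>();
        // An old-to-new reference crosses the fringe: mark both endpoints.
        for (Obj source : heap) {
            if (!source.old) {
                continue;
            }
            for (Obj target : source.refs) {
                if (!target.old) {
                    fringeOld.add(source); // assumption: "fringe-old" means old object pointing to a new one
                    fringeNew.add(target); // new object pointed to by an older one
                }
            }
        }
        Map<Region, Set<String>> result = new EnumMap<>(Region.class);
        for (Region r : Region.values()) {
            result.put(r, new TreeSet<>());
        }
        for (Obj o : heap) {
            Region r;
            if (o.old) {
                r = fringeOld.contains(o) ? Region.FRINGE_OLD : Region.OLD;
            } else {
                r = fringeNew.contains(o) ? Region.FRINGE_NEW : Region.NEW;
            }
            result.get(r).add(o.id);
        }
        return result;
    }
}
```

Even with the classification automated in this way, the user is still left to decide which fringe-new objects are legitimately retained and which are leaking.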
To diagnose a memory leak, a user must look for a set of candidate data structures that are likely to have problems. Finding the right data structures on which to focus is difficult. As we will discuss herein, when exploring the reference graphs (sets of currently live objects and their references) of large applications, issues of noise, complexity, and scale make this a daunting task. For example, e-Business servers intentionally retain a large number of objects in caches. Existing analysis approaches require that the user manually distinguish these cached objects from truly leaking ones. In general, these approaches swamp the user with too much low-level detail about individual objects that were created, and leave the user with the difficult task of interpreting complex reference graphs or allocation paths in order to understand the larger context. This interpretation process requires considerable expertise and many hours of analysis to find the root cause of a leak. Moreover, these techniques will in some cases perturb the running application too much to be of practical value, especially in production environments, making them inadequate for leak detection in enterprise systems.
Many known applications have properties, common to many Java applications, that make memory leak diagnosis especially difficult. These applications make heavy use of reusable frameworks and libraries, often from many sources. Such framework-intensive applications contain large amounts of code whose inner workings are not well understood by the developers, let alone by those doing the problem determination. Server-side e-Business applications make use of particularly large frameworks, and introduce additional analysis difficulties due to their high degree of concurrency, their scale, and their long-running nature.
Existing tools have been used to help diagnose leaks. For example, the HPROF (Java H Profiler) tool works by categorizing each object according to its allocation call path and type, as shown in Table 1 below. This table shows the output of HPROF on a simple example application which, in a loop, leaks objects of various datatypes. As the program runs, the tool notes every object allocation: it remembers the call stack of the allocation and the allocated datatype, and in this way assigns a pair (STACK, TYPE) to each allocated object. It also records statistics for these pairs, for example how many allocations map to each pair, and how many allocated but not yet freed objects map to each pair. Then, when the program completes (or when the tool user requests), HPROF sorts the histogram by the “live” statistic and prints out the current top N entries. Table 1 shows the top five for our simple example.
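The bookkeeping just described can be sketched as a histogram keyed by (STACK, TYPE). This is not HPROF's implementation or API; the class AllocationHistogram, its methods, and the per-object sizes used in sampleRun are assumptions, with the sizes chosen so that the totals line up with the first three rows of Table 1.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Per-(stack trace, type) allocation statistics, in the style described above. */
final class AllocationHistogram {

    /** Running totals for one (stack trace id, type) pair. */
    static final class Stats {
        final int stackTraceId;
        final String type;
        long liveBytes, liveObjects, allocatedBytes, allocatedObjects;

        Stats(int stackTraceId, String type) {
            this.stackTraceId = stackTraceId;
            this.type = type;
        }
    }

    private final Map<String, Stats> byKey = new HashMap<>();

    private Stats stats(int stackTraceId, String type) {
        return byKey.computeIfAbsent(stackTraceId + "|" + type, k -> new Stats(stackTraceId, type));
    }

    void recordAllocation(int stackTraceId, String type, long bytes) {
        Stats s = stats(stackTraceId, type);
        s.allocatedBytes += bytes;
        s.allocatedObjects++;
        s.liveBytes += bytes; // live until a matching free is recorded
        s.liveObjects++;
    }

    void recordFree(int stackTraceId, String type, long bytes) {
        Stats s = stats(stackTraceId, type);
        s.liveBytes -= bytes;
        s.liveObjects--;
    }

    /** The n pairs with the most live bytes, largest first. */
    List<Stats> topLive(int n) {
        List<Stats> sorted = new ArrayList<>(byKey.values());
        sorted.sort(Comparator.comparingLong((Stats s) -> s.liveBytes).reversed());
        return new ArrayList<>(sorted.subList(0, Math.min(n, sorted.size())));
    }

    /** Replays a hypothetical leaking loop; all sizes are assumed, not taken from HPROF. */
    static AllocationHistogram sampleRun() {
        AllocationHistogram h = new AllocationHistogram();
        long arrayBytes = 84; // assumed size of the first backing array
        for (int i = 0; i < 10000; i++) {
            h.recordAllocation(1995, "byte[]", 1028);      // leaked payload
            h.recordAllocation(1994, "MemoryConsumer", 4); // leaked wrapper
        }
        for (int k = 0; k < 10; k++) {
            h.recordAllocation(1996, "Object[]", arrayBytes);
            if (k < 9) {
                h.recordFree(1996, "Object[]", arrayBytes); // replaced by a larger array
                arrayBytes = 2 * (arrayBytes - 4) + 4;      // doubled capacity plus a 4-byte header
            }
        }
        return h;
    }
}
```

Calling AllocationHistogram.sampleRun().topLive(5) yields three entries ordered by live bytes. The output identifies where the leaking objects were allocated and of what type, but not which data structure is keeping them alive.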
TABLE 1

        Percent          Live                  Allocated             Stack   Class
Rank    Self    Accum    Bytes      Objects    Bytes      Objects    Trace   Name
1       97.31   97.31    10280000   10000      10280000   10000      1995    Byte array
2       97.31   97.31    40964      1          81880      10         1996    Object array
3       97.31   97.31    40000      10000      40000      10000      1994    MemoryConsumer
4       97.31   97.31    16388      1          16388      1          1295    Character array
5       97.31   97.31    16388      1          16388      1          1304    Character array
Some recent work uses static semantics to enforce and detect ownership using ownership types. See D. Clarke, J. Noble, and J. Potter, “Simple Ownership Types for Object Containment,” European Conference on Object Oriented Programming, 2001. In this view, data structures are composed of the objects they own; thus, to diagnose a leak, one must find the data structures that own the leaking objects. Others have studied the interaction between the application's and the runtime's use of objects. See N. Rojemo and C. Runciman, “Lag, drag, void and use—heap profiling and space-efficient compilation revisited,” International Conference on Functional Programming, pages 34-41, 1996. They break an object's lifetime into several phases, such as the time after allocation and before first use (“lag”), and the time between last use and collection (“drag”). The Glasgow Haskell Compiler, as of version 5.03, has built-in support for this type of analysis, which it calls “biographical profiling.” See The Glasgow Haskell Compiler User's Guide, http://haskell.cs.yale.edu/ghc. Other works study how liveness information [see O. Agesen, D. Detlefs, and J. E. B. Moss, Garbage Collection And Local Variable Type Precision And Liveness In Java Virtual Machines, Programming Language Design and Implementation, 1998] or reachability [see M. Hirzel, J. Hinkel, A. Diwan, and M. Hind, Understanding The Connectivity Of Heap Objects, International Symposium on Memory Management, 2002] can benefit conservative garbage collection.
We next discuss three problems encountered in analyzing data structures: perturbation, noise, and data structure complexity.