1. Field of Invention
The present invention relates to the field of diagnosing a memory leak. In particular, the present invention relates to a method, system and an article of manufacture tangibly embodying a computer readable program for diagnosing a memory leak.
2. Description of the Related Art
At runtime of an application program (hereafter also referred to as a program) written in a programming language that does not implement garbage collection (GC), memory management is typically the responsibility of the application program itself. Memory that is no longer needed by the program must be released explicitly by the program designer (also referred to as a programmer). If the program fails to release such memory, memory resources are wasted, since the unreleased memory cannot be used by other programs.
Program errors that lead to such wasted memory are usually termed “memory leaks”. In some programming languages, automatic memory management is used rather than relying on the programmer to release the memory. Such automatic memory management is called “garbage collection” (GC) in the art and is performed by an active component of the runtime system associated with the program. It relieves programmers of part of the memory management effort by automatically releasing portions of memory that are no longer referenced by the running program. However, a disadvantage of automatic memory management is that some objects retain references to data structures in memory even though those data structures will not be used in the future execution of the application program. Such references prevent the garbage collector from reclaiming the unused portions of memory, and this also leads to “memory leaks”.
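By way of a non-limiting illustration, the following Java sketch shows how a retained reference can keep unused memory reachable under garbage collection. The class names `RequestLog` and `LingeringReferenceDemo`, the method names, and the buffer sizes are hypothetical and are chosen only for this example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical component that keeps a reference to every buffer it creates. */
class RequestLog {
    // Static collection: everything added here stays reachable until it is removed.
    private static final List<byte[]> HISTORY = new ArrayList<>();

    /** Handles one request and, as an oversight, never removes the buffer afterwards. */
    static void handle(int payloadBytes) {
        byte[] buffer = new byte[payloadBytes];
        HISTORY.add(buffer); // reference is retained although the buffer is never read again
    }

    /** Number of buffers still reachable through the static list. */
    static int retainedCount() {
        return HISTORY.size();
    }

    /** Dropping the references is what makes the buffers eligible for collection. */
    static void clear() {
        HISTORY.clear();
    }
}

class LingeringReferenceDemo {
    public static void main(String[] args) {
        for (int i = 0; i < 1_000; i++) {
            RequestLog.handle(1024);
        }
        System.out.println("buffers still reachable: " + RequestLog.retainedCount());
        RequestLog.clear();
        System.out.println("buffers reachable after clear: " + RequestLog.retainedCount());
    }
}
```

In this sketch the garbage collector cannot reclaim any buffer while the static list still refers to it, even though the program never uses the buffers again.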
Although garbage collection helps reduce the issue of “memory leaks”, the latter type of memory leak still exists and may in some instances cause the performance of the computer to degrade, and may even cause the running application program to consume all available memory, thereby causing the computer to crash. Therefore, “memory leaks” degrade the availability and security of the computer due to their large effect on its performance.
Usually, there are two kinds of memory leaks: in the first kind, memory is leaked rapidly with each execution of the leak-incurring code, so the leak is easy to notice; in the second kind, memory is leaked only from time to time and slowly at runtime.
An important issue to be solved is how to identify the objects that are leaking and rapidly confirm the cause of the memory leaks. Typically it is not easy to diagnose the memory leaks of a system, especially chronic memory leaks which occur continuously and with a small volume each time. It is rather difficult to identify, in time, an apparently insignificant but potentially important increase in heap usage. By the time the memory leaks are found, it may be rather late, and the leaking program may already have caused significant harm to the entire system. This is especially true for memory leaks that start out small but continue to grow over time. Sometimes, weeks of service uptime are required before the issue is large enough to be noticeable.
It is very difficult to identify these latent leaks, especially in an online production system that cannot tolerate repeated heap accesses, let alone heap dumps, because such a system cannot bear the execution pauses caused by heap traversal. Although various garbage-collection approaches exist, each with its respective benefits, such memory leaks remain a disadvantage, especially for Java® programs (Java is a registered trademark of Sun Microsystems).
Some existing technologies assist programmers in looking inside the black box to determine the root cause of a memory leak at runtime. For memory leak diagnosis, the existing technologies mainly compute differences between heap snapshots (a snapshot is a graph that consists of types as nodes and references as the connections among them) and rank candidates according to the growth in the volume of objects of a particular type.
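As a simplified illustration of such snapshot differencing, the following Java sketch reduces each snapshot to a per-type instance count and reports the types that grew. This is an assumption made for brevity: a real snapshot also carries the references among types, and the class name `HistogramDiff`, the method `growth`, and the sample type names and counts are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class HistogramDiff {
    /** Returns the types whose instance count grew between two snapshots, largest growth first. */
    static List<Map.Entry<String, Long>> growth(Map<String, Long> before, Map<String, Long> after) {
        List<Map.Entry<String, Long>> grown = new ArrayList<>();
        for (Map.Entry<String, Long> e : after.entrySet()) {
            // A type absent from the earlier snapshot is treated as having had zero instances.
            long delta = e.getValue() - before.getOrDefault(e.getKey(), 0L);
            if (delta > 0) {
                grown.add(Map.entry(e.getKey(), delta));
            }
        }
        grown.sort(Map.Entry.<String, Long>comparingByValue().reversed());
        return grown;
    }

    public static void main(String[] args) {
        // Hypothetical per-type instance counts taken at two points in time.
        Map<String, Long> before = Map.of("java.lang.String", 10_000L, "Order", 200L, "Session", 50L);
        Map<String, Long> after = Map.of("java.lang.String", 14_000L, "Order", 180L, "Session", 90L);
        for (Map.Entry<String, Long> e : growth(before, after)) {
            System.out.println(e.getKey() + " grew by " + e.getValue());
        }
    }
}
```

A ranking of this kind names the growing types but carries no information about the execution context in which the growing objects were created.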
These technologies monitor the heap after each round of garbage collection and observe a downward-sawtooth pattern in the free space (the ratio curve of the memory is used), which continues until the program cannot acquire any more space from the heap, since the used memory cannot be efficiently collected and fewer and fewer memory resources remain available. The existing technologies cannot be used in an online system, because this kind of acquisition and analysis of heap snapshots will cause a system having a large heap capacity to pause for several seconds. For an online system such as a server, these delays or pauses will lead to timeouts, thereby significantly affecting the performance of the online application. Such delays and pauses are undesirable for an online system.
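The post-collection monitoring described above may be sketched in Java as follows. The sketch reads the heap usage recorded after the most recent collection through the standard `java.lang.management` interfaces and applies a deliberately simple trend test. The class name `PostGcTrend`, the method names, the window size, and the rule that a fixed number of consecutive increases indicates a rising floor are assumptions made for illustration and are not taken from any particular existing tool.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

class PostGcTrend {
    /** Sums the bytes still in use in each heap pool after that pool's most recent collection. */
    static long usedAfterLastGc() {
        long total = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue;
            }
            MemoryUsage usage = pool.getCollectionUsage(); // null when the pool does not support it
            if (usage != null) {
                total += usage.getUsed();
            }
        }
        return total;
    }

    /** Returns true when each of the last `window` samples exceeds the sample before it. */
    static boolean floorIsRising(long[] samples, int window) {
        if (window < 2 || samples.length < window) {
            return false;
        }
        for (int i = samples.length - window + 1; i < samples.length; i++) {
            if (samples[i] <= samples[i - 1]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("heap bytes in use after last collection: " + usedAfterLastGc());
        // Hypothetical post-collection samples, in bytes, gathered over successive collections.
        long[] samples = {100, 90, 120, 150, 190};
        System.out.println("floor rising over last 3 samples: " + floorIsRising(samples, 3));
    }
}
```

Reading these counters does not traverse the heap, but it also reveals only that retained memory is growing, not which objects are responsible.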
Also, the memory heap of a large application program often has a large capacity, and thus frequently comparing heap snapshots offers little help in diagnosing the application program, because the objects that leak are not obvious. If the existing technologies are used to perform memory leak diagnosis, the application program will be perturbed considerably by the frequent comparisons of heap snapshots, which has a negative effect on service quality and on the programmers' experience. In some circumstances, these technologies perturb the running application programs or systems so much that they have no practical value, especially in wireless environments.
The existing methods for diagnosing memory leaks have a limited effect in industrial applications, because they normally recognize mostly the obvious type of memory leak as suspicious candidates. For example, one existing technology suggests using references to find the objects responsible for the leaks. However, a reference does not include execution context information. The analysis of a reference graph requires expertise, and complex reference connections often confuse users, especially where a plurality of references are caused by a common type. In this case, programmers may still have difficulty knowing why these references were produced and why the leaks are incurred. It is therefore difficult to judge whether a diagnosis and the corresponding fix are correct.
In practice, frequently taking full reference graph snapshots and analyzing the references is far too expensive for a large-scale online system. From the perspective of memory leak diagnosis, the user must identify the data structures that are likely to have issues, but finding the right data structures to focus on is difficult. When exploring the reference graphs of services (especially for a large online system), issues of noise, complexity, and scale make the analysis of the reference graphs a daunting task, which is especially problematic for long-running systems. Noise effects can dwarf the evidence needed to diagnose a slow leak until a crash occurs.
In general, the existing technologies mainly focus on the following points: frequent heap accesses, or even heap dumps, to produce heap snapshots; comparisons among different snapshots to find growing nodes as leak candidates; finding suspicious structures; and analyzing reference graphs to find the references causing memory to be inappropriately held, for later confirmation. Thus, the methods used for identifying the memory leaking path normally include two steps: detecting leak candidates, and diagnosing the reason for the leak. However, there is a gap between the two phases, and the existing technologies do not adequately help to diagnose memory leaks.
To sum up, current technologies for diagnosing memory leaks have the following disadvantages:
1. High requirements on the expertise of the analyst. The existing approaches require the user to manually distinguish the real cause of memory leaks from among cached objects. In general, these approaches swamp the user with too much low-level detail about individual objects that were created, and leave the user with the difficult task of interpreting complex reference graphs in order to understand the larger context. This interpretation process requires a lot of expertise. Even for experts, it usually takes several hours of analysis work to find the root cause of a memory leak.
2. Perturbation caused by heap access. These techniques will in some cases perturb the running service too much to be of practical value, especially in online environments. After the reference graphs are acquired, the heap snapshots must be compared and analyzed, which can cause a system with a large heap size to pause for several seconds. As mentioned above, for servers, these delays or pauses can cause timeouts, significantly changing the behavior of the system.
3. Limited leak analysis based on heap growth. Many existing tools find memory leaks by using heap differencing to find the objects in the heap that are growing. Although heap growth is a useful indicator, there are issues with using growth alone as a heuristic for finding leaks. After all, growing objects or types are not necessarily leaks, and leaks do not necessarily grow.
4. Limited leak analysis based on the reference graph. Knowing only the predominant type of the leaking objects, often a low-level type such as ‘String’, does not help explain why the leak occurs. This is because such Strings are likely to be used in many contexts, and may even be used for multiple purposes within the same data structure, such as a DOM document. In addition, because one low-level leaking object can simultaneously be inappropriately held by a plurality of references, it is easy to get lost quickly when analyzing the reference graph and extracting a reason for the memory leak. A single DOM object typically contains several objects, with a rich network of references among them. Without knowledge of the running program, it is difficult to know along which path the leaking references are created or, when analyzing allocation call paths, which call site is important.
5. Limited leak analysis based on the allocation stack. Some methods record the allocation stack of each object of a type while monitoring the heap, but not all instances of the suspected type are leaking, so the real leaking path tends to be buried among all the stacks, and storing and analyzing these stacks is very likely to be resource intensive. Often, the leaking site does not correspond to the allocation site. For example, Java Database Connectivity (JDBC) connections are created repeatedly by one agent class invoked by another class, and the invoking class forgets to invoke the JDBC-free function of this agent class. Here, analysis of the invoker is necessary.
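By way of illustration of item 5 above, the following Java sketch models an agent class with a single allocation site and two invokers, one of which forgets to release what it acquired. No real JDBC connection is opened: plain objects stand in for connections, and the class names `ConnectionAgent` and `InvokerDemo`, the method names, and the caller labels are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical agent class: the only place where handles are allocated. */
class ConnectionAgent {
    // Handles that are still open, mapped to the label of the invoker that acquired them.
    private final Map<Object, String> open = new HashMap<>();

    Object acquire(String invokerLabel) {
        Object handle = new Object(); // same allocation site for every invoker
        open.put(handle, invokerLabel);
        return handle;
    }

    /** Counterpart of the "free" function that an invoker is expected to call. */
    void release(Object handle) {
        open.remove(handle);
    }

    /** Counts the handles that are still open, grouped by invoker label. */
    Map<String, Integer> openByInvoker() {
        Map<String, Integer> counts = new HashMap<>();
        for (String label : open.values()) {
            counts.merge(label, 1, Integer::sum);
        }
        return counts;
    }
}

class InvokerDemo {
    static void carefulInvoker(ConnectionAgent agent) {
        Object handle = agent.acquire("carefulInvoker");
        agent.release(handle); // releases what it acquired
    }

    static void forgetfulInvoker(ConnectionAgent agent) {
        agent.acquire("forgetfulInvoker"); // never released: the leak originates here
    }

    public static void main(String[] args) {
        ConnectionAgent agent = new ConnectionAgent();
        for (int i = 0; i < 5; i++) {
            carefulInvoker(agent);
            forgetfulInvoker(agent);
        }
        System.out.println("open handles by invoker: " + agent.openByInvoker());
    }
}
```

Every handle in this sketch has the same allocation site inside the agent class, so the allocation site alone does not distinguish the leaking invoker from the well-behaved one.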
Overall, existing technologies need complex graph analysis and rich programming knowledge to provide even a limited clue for memory leak diagnosis. It is noted that existing methods mainly focus on searching for memory leaks, but rarely focus on recognizing the allocation paths which are directly related to the memory leakage issues.