1. Field of the Invention
The present invention is directed to technology for finding the source of memory leaks.
2. Description of the Related Art
A memory leak is allocated memory that is no longer in use; it should have been freed but was not. Memory leaks slow program execution and can cause programs to run out of memory. In many cases, the effects of memory leaks eventually cause a program to crash. Memory leaks are very difficult to detect because they rarely produce directly noticeable effects; instead, they cumulatively degrade overall performance. That is, a memory leak typically does not have a direct symptom. The cumulative effect of memory leaks is that memory is lost, which increases the size of the working memory being used. In the worst case, a program can consume the entire virtual memory of the host system.
The indirect symptom of a memory leak is that a process's address space grows during activity when one would have expected it to remain constant. Thus, a prior test methodology for finding memory leaks is to repeat an action many times and to conclude that there are no leaks if the address space growth levels out. However, there are two problems with this methodology. The first problem is that it does not rule out the possibility that there simply was enough unallocated heap memory in the existing address space to accommodate the leaks. In other words, the address space does not grow, but a leak nevertheless exists. Testers assume that if the leak were significant enough to care about, it would have consumed all of the unallocated heap memory within the chosen number of repetitions and forced an expansion of the process's address space.
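By way of illustration only, the repetition methodology described above may be sketched as follows. The class name RepetitionCheck, its methods, and the rule used to decide that growth has "leveled out" (comparing the last sample against the midpoint sample) are hypothetical assumptions made for this sketch and are not taken from any prior tool.

```java
import java.util.function.LongSupplier;

// Hypothetical sketch of the repetition methodology; all names are assumptions.
public class RepetitionCheck {

    // Repeats the action in batches and records a size measurement after each batch.
    public static long[] sample(Runnable action, LongSupplier size,
                                int batches, int repsPerBatch) {
        long[] samples = new long[batches];
        for (int b = 0; b < batches; b++) {
            for (int r = 0; r < repsPerBatch; r++) {
                action.run();
            }
            samples[b] = size.getAsLong();
        }
        return samples;
    }

    // Assumed rule: growth has "leveled out" if the size increased by no more
    // than the tolerance between the midpoint sample and the last sample.
    public static boolean levelsOut(long[] samples, long tolerance) {
        if (samples.length < 2) {
            throw new IllegalArgumentException("need at least two samples");
        }
        int mid = samples.length / 2;
        return samples[samples.length - 1] - samples[mid] <= tolerance;
    }

    public static void main(String[] args) {
        long[] flat = {10, 20, 30, 30, 30, 30};
        long[] growing = {10, 20, 30, 40, 50, 60};
        System.out.println(levelsOut(flat, 0));    // prints "true"
        System.out.println(levelsOut(growing, 5)); // prints "false"
    }
}
```

In a Java process, the measurement passed to sample could be, for example, Runtime.getRuntime().totalMemory() minus Runtime.getRuntime().freeMemory(), although that value reflects heap usage rather than the address space of the process. As noted above, a result of "true" does not rule out a leak that was absorbed by previously unallocated heap memory.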
The second problem with this repetition methodology is that it is quite time-consuming to build test sweeps that repetitively exercise every feature and automatically watch for improper address space growth. In fact, it is generally so time-consuming that it is rarely done at all. Suppose, however, that a developer builds an adequate leak-detecting sweep and finds that the address space grows unacceptably due to one or more leaks. The developer still must spend a considerable amount of time tracking down the problems. A developer could shrink the test sweep bit by bit until the address space growth is no longer observed, or modify the allocation process and free process to record their arguments and perform an analysis of what was allocated but not freed. The first technique is fairly brute force and can take many iterations to track down a single leak. The second technique is powerful in practice, but has problems. In any given repetition loop there may be chunks that are allocated but legitimately not freed until the next iteration. Thus, just because a chunk was allocated but not freed during an iteration does not mean the chunk represents a leak. It may simply be carried over into the next iteration. An improved technique is to record the allocation and free calls for an entire program run and look for chunks that are allocated but not freed. The problem with this is the existence of permanently allocated data, such as a symbol table, that is designed to be reclaimed only when the process terminates. Such permanently allocated data may show up as a leak.
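By way of illustration only, recording allocation and free calls and then reporting what was allocated but not freed may be sketched as follows. The class AllocationLog, its methods, and the call site names are hypothetical and are used only to show how permanently allocated data can appear as a false leak report.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of allocation/free recording; all names are assumptions.
public class AllocationLog {

    // Outstanding chunks, keyed by address, mapped to the recorded allocating site.
    private final Map<Long, String> live = new LinkedHashMap<>();

    public void recordAlloc(long address, String site) {
        live.put(address, site);
    }

    public void recordFree(long address) {
        live.remove(address);
    }

    // Sites of chunks that were allocated but never freed, in allocation order.
    public List<String> unfreedSites() {
        return new ArrayList<>(live.values());
    }

    public static void main(String[] args) {
        AllocationLog log = new AllocationLog();
        log.recordAlloc(0x1000L, "initSymbolTable"); // permanent by design
        log.recordAlloc(0x2000L, "handleRequest");   // genuine leak
        log.recordAlloc(0x3000L, "parseInput");
        log.recordFree(0x3000L);                     // correctly freed
        // prints "[initSymbolTable, handleRequest]"
        System.out.println(log.unfreedSites());
    }
}
```

In this sketch the symbol table allocation is reported alongside the genuine leak, because the log alone cannot distinguish data that is intentionally held until the process terminates.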
Memory leaks are so hard to detect and track down that they are often simply tolerated. In short-running programs, this is not serious. However, in long-running programs it can be a major problem. For example, consider a web application that is available to users twenty-four hours a day, seven days a week. In that case, a memory leak could grow and accumulate over time, such that the program degrades in performance so as to become unusable. An organization that relies on commerce or on functions via the Internet may not be able to live with such degradation of performance or crashing of its Internet applications.
Previous attempts to solve memory leak problems in applications written in the C++ programming language include malloc-debug packages. These packages implemented the malloc interface and also provided several levels of additional checking and memory marking. Unfortunately, malloc-debug packages do not detect errors at the point at which they occur; they detect errors only at the next malloc_verify call. Because malloc_verify has to scan the entire heap, it is expensive to call frequently.
Another previous tool for working with memory leaks used a mark and sweep algorithm. In the mark phase, the tool recursively followed potential pointers from the data and stack segments into the heap and marked all referenced blocks in the standard conservative and pessimistic manner. In the sweep phase, the tool stepped through the heap and reported allocated blocks that no longer seemed to be referenced by the program. The tool also modified malloc to label each allocated block with the return addresses of the functions then on the call stack. These addresses, when translated into function names and line numbers via the symbol table, identified the code path that allocated the leaked memory and often made it relatively easy for the programmer to eliminate the error.
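By way of illustration only, the mark and sweep approach described above may be sketched over a simplified model of the heap. The class MarkSweepSketch, the Block type, and the site labels are hypothetical. The sketch follows exact references between modeled blocks rather than conservatively scanning raw memory for potential pointers, and it uses an explicit worklist in place of recursion.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch of mark and sweep leak reporting; all names are assumptions.
public class MarkSweepSketch {

    // A modeled heap block labeled with the site that allocated it.
    public static final class Block {
        public final String allocSite;
        public final List<Block> pointers = new ArrayList<>();
        boolean marked;

        public Block(String allocSite) {
            this.allocSite = allocSite;
        }
    }

    public static List<String> findLeaks(Collection<Block> roots, Collection<Block> heap) {
        for (Block b : heap) {
            b.marked = false;
        }
        // Mark phase: follow pointers from the roots and mark every block reached.
        Deque<Block> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            Block b = work.pop();
            if (b.marked) {
                continue; // already visited; also guards against cycles
            }
            b.marked = true;
            work.addAll(b.pointers);
        }
        // Sweep phase: step through the heap and report every unmarked block.
        List<String> leaks = new ArrayList<>();
        for (Block b : heap) {
            if (!b.marked) {
                leaks.add(b.allocSite);
            }
        }
        return leaks;
    }

    public static void main(String[] args) {
        Block a = new Block("siteA"), b = new Block("siteB"), c = new Block("siteC");
        Block d = new Block("siteD"), e = new Block("siteE");
        a.pointers.add(b);  // b is reachable through a
        d.pointers.add(e);  // d and e refer only to each other
        e.pointers.add(d);
        // prints "[siteC, siteD, siteE]"
        System.out.println(findLeaks(List.of(a), List.of(a, b, c, d, e)));
    }
}
```

The unreferenced cycle formed by the blocks from siteD and siteE is reported because neither block is reachable from a root, even though each is referenced by the other.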
While some of the above-described tools were somewhat successful for use with applications created using the C and C++ programming languages, they were not sufficient for applications written in Java. Java differs from C and C++ in ways that have made prior memory leak solutions unavailable to Java applications. For example, in C and C++, the program is responsible for allocating and freeing memory. In addition, it is possible to monitor each allocation. In Java, on the other hand, the Java Virtual Machine (“JVM”) is responsible for freeing memory. Additionally, an application can use millions of objects. Thus, tracking each allocation may not be practical.
When Java first became popular, many programmers thought that they no longer had to worry about memory leaks because with Java the programmer simply creates objects and the JVM takes care of removing them when they are no longer needed. The task of removing unused objects is known as “garbage collection.” The garbage collector finds objects that are no longer needed by an application and removes them when they can no longer be accessed or referenced. The garbage collector starts at the root nodes, which are classes that persist throughout the life of a Java application, and sweeps through all the nodes. As it traverses the nodes, it keeps track of which objects are actively being referenced. Any objects that are no longer being referenced are then eligible to be garbage collected. The memory resources used by these objects can be returned to the JVM when the objects are deleted. Thus, Java does not necessarily require the programmer to be responsible for memory management and cleanup because it automatically garbage collects unused objects. However, an object is only counted as being unused when it is no longer referenced. Thus, if a set of objects is created for use for a short period of time, and the references to those objects are not removed, then a leak may be created.
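By way of illustration only, a leak of the kind just described may be sketched as follows. The class LingeringReferenceSketch, the HISTORY collection, and the handleRequest method are hypothetical. Each call creates an object intended for short-term use, but a reference to it is left in a collection that is reachable from a static field, so the object never becomes eligible for garbage collection.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a Java leak caused by a lingering reference.
public class LingeringReferenceSketch {

    // Long-lived collection reachable from a static field for the life of the class.
    private static final List<byte[]> HISTORY = new ArrayList<>();

    public static void handleRequest() {
        byte[] scratch = new byte[1024]; // intended for short-term use only
        scratch[0] = 1;                  // stand-in for the real work
        HISTORY.add(scratch);            // reference is never removed: the leak
    }

    // Number of buffers that are still referenced and so cannot be collected.
    public static int retainedCount() {
        return HISTORY.size();
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            handleRequest();
        }
        // prints "retained buffers: 1000"
        System.out.println("retained buffers: " + retainedCount());
    }
}
```

Removing the entry from HISTORY once the request completes, or never adding it, would allow the garbage collector to reclaim each buffer.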
One attempt to debug memory leaks in Java applications tracks every object allocated and keeps a record of every object created. After the application is run, the information is analyzed by a human. Tracking every object requires a lot of CPU time, which prevents the application from running in production when the memory leak debug tool is operating. Because the application has to be run in a non-production environment (e.g., a debugging or testing environment), the leak may not be reproduced in that environment. Also, there is a heavy burden on the human developer to read through all of the information.
Thus, there is a need for an improved means for debugging memory leaks in Java applications.