Memory leaks and related resource-exhaustion and resource-contention problems can degrade software reliability. Memory leaks can remain in programs despite extensive testing during the development phase and can consume enough of a server's resources to seriously hinder performance or even cause application hangs or system crashes. This problem can become more acute in a multi-user environment, where a single application, process, or collection of interacting processes exhibiting memory leaks can affect a large number of users. If applications or processes with memory leaks can be detected well in advance, preventive recovery actions can be taken to avoid potentially catastrophic failures affecting many users.
In many programming languages, memory for objects (or variables) can be allocated dynamically during program execution. In languages without automatic memory management, once a dynamically allocated object is no longer needed, the program must explicitly release the memory it occupies. Failing to free all of the allocated memory results in a memory leak. Memory leaks are also associated with programming errors through which a program gradually loses the ability to release memory that is no longer useful. For example, an error might overwrite the only pointer to a memory area, rendering that memory unreachable and preventing the program from either using it or freeing it. Memory leaks are common in programming languages such as C and C++, which rely heavily on pointers and pointer arithmetic and neither implement nor mandate “garbage collection”.
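The pointer-overwrite error described above can be illustrated with a toy model. Because Python is itself garbage-collected, the sketch below does not call C's `malloc`/`free`; instead it simulates a manual allocator with a hypothetical `ToyHeap` class. The class, its method names, and the addresses it hands out are illustrative assumptions, not part of any real allocator.

```python
class ToyHeap:
    """Hypothetical stand-in for a manual (non-garbage-collected) allocator."""

    def __init__(self):
        self._next_addr = 0x1000
        self._blocks = {}  # address -> size of every block not yet freed

    def malloc(self, size):
        addr = self._next_addr
        self._next_addr += size
        self._blocks[addr] = size
        return addr

    def free(self, addr):
        # As with C's free(), releasing a block requires still holding its address.
        del self._blocks[addr]

    def live_bytes(self):
        return sum(self._blocks.values())

    def unreachable(self, roots):
        """Addresses still allocated but not held by any root, i.e. leaked blocks."""
        return sorted(set(self._blocks) - set(roots))


heap = ToyHeap()
p = heap.malloc(64)    # p holds the only reference to the 64-byte block
p = heap.malloc(128)   # bug: overwriting p loses the 64-byte block's address
heap.free(p)           # only the 128-byte block can still be released
print(heap.live_bytes())           # 64 bytes remain allocated
print(heap.unreachable(roots=[]))  # the lost block at address 0x1000 (4096)
```

Once `p` is reassigned, nothing in the program refers to the first block, so no later code path can pass its address to `free`; the 64 bytes stay allocated for the lifetime of the process.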
The main problem with a memory leak is that it causes an ever-growing amount of memory to be consumed by the system as a whole, not merely by the erroneous process or program. Eventually, all (or too much) of the available memory may be allocated and never freed, and the performance of the entire system can degrade severely, or the system can even crash. This problem is compounded in a multi-user environment, where even one offending process or application can affect all of the users. System administrators typically do not receive an alarm until about 95% of the available memory has been used. Moreover, well before system administrators begin taking remedial action, individual users' applications may start requesting more memory than is physically available, which forces the system to swap memory pages to disk and can greatly degrade performance and increase transaction latencies.
Prior art has focused mostly on three areas.
First, memory leak detection when the program source code is available for analysis. However, this approach is generally not an option for end-user customers who run large commercial software systems that compete for resources in multi-user environments, or whenever third-party or off-the-shelf software is used, because the source code is typically unavailable in these cases.
Second, memory leak detection and recovery during runtime. Detecting and removing leaks (that is, recovering leaked memory) at runtime is often called garbage collection. One significant challenge for garbage collection is the additional performance overhead it incurs. This overhead is particularly conspicuous for mark-sweep approaches, which typically suspend the application temporarily while the algorithm executes.
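To show where the mark-sweep pause comes from, the following is a minimal, non-incremental mark-sweep sketch over a toy object graph. The `Obj` class and the `mark`, `sweep`, and `collect` functions are hypothetical names chosen for illustration; production collectors (including incremental and concurrent variants that shorten or avoid the pause) are considerably more involved.

```python
class Obj:
    """Hypothetical heap object with outgoing references and a mark bit."""

    def __init__(self, name):
        self.name = name
        self.refs = []
        self.marked = False


def mark(roots):
    """Mark phase: flag every object reachable from the roots (iterative DFS)."""
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if not obj.marked:  # the mark check also terminates on reference cycles
            obj.marked = True
            stack.extend(obj.refs)


def sweep(objects):
    """Sweep phase: keep marked objects, reclaim the rest, and reset marks."""
    survivors, reclaimed = [], []
    for obj in objects:
        if obj.marked:
            obj.marked = False
            survivors.append(obj)
        else:
            reclaimed.append(obj.name)
    return survivors, reclaimed


def collect(objects, roots):
    # The application (mutator) does not run between mark and sweep here;
    # this single synchronous call models the stop-the-world pause.
    mark(roots)
    return sweep(objects)


a, b, c, d = Obj("a"), Obj("b"), Obj("c"), Obj("d")
a.refs.append(b)  # a -> b: reachable from the root a
c.refs.append(d)  # c -> d: an island no root can reach
live, freed = collect([a, b, c, d], roots=[a])
print([o.name for o in live], freed)
```

The mark phase touches every reachable object and the sweep phase walks the whole heap, so the pause grows with heap size; this is the overhead the paragraph above refers to.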
Third, detecting gradual resource exhaustion in systems. Time series analysis is used to detect trends and estimate the time remaining until a resource is exhausted. See, for example, U.S. Pat. No. 7,100,079. Preventive action (such as software rejuvenation) is then performed to avoid an impending failure. However, identifying or pinpointing the offending application or process can be extremely difficult, for example in a multi-user environment with a highly chaotic system memory usage profile, in which case the entire system may have to be rebooted.
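As an illustration of trend-based exhaustion estimation, the sketch below fits an ordinary least-squares line to system-wide memory usage samples and extrapolates it to a capacity limit. This simple linear model is an assumption chosen for clarity; it is not necessarily the technique of the cited patent, and practical approaches may use more robust trend estimators. The function names, sample values, and capacity are hypothetical.

```python
def fit_linear_trend(times, usage):
    """Ordinary least-squares fit of usage ~ intercept + slope * time."""
    n = len(times)
    mean_t = sum(times) / n
    mean_u = sum(usage) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (u - mean_u) for t, u in zip(times, usage))
    slope = sxy / sxx
    intercept = mean_u - slope * mean_t
    return slope, intercept


def time_to_exhaustion(times, usage, capacity):
    """Time remaining after the last sample until the fitted line reaches capacity.

    Returns None when the trend is flat or decreasing (no exhaustion predicted).
    """
    slope, intercept = fit_linear_trend(times, usage)
    if slope <= 0:
        return None
    t_full = (capacity - intercept) / slope
    return max(0.0, t_full - times[-1])


# Hypothetical samples: hours since boot vs. system-wide memory in use (MB).
hours = [0, 1, 2, 3, 4, 5]
used_mb = [1000, 1110, 1190, 1310, 1400, 1500]
remaining = time_to_exhaustion(hours, used_mb, capacity=4096)
print(f"estimated hours until exhaustion: {remaining:.1f}")
```

Because the input is the aggregate usage of the whole system, the estimate indicates when memory may run out but says nothing about which process is responsible, which is the limitation noted above.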