Computer systems are well known in the art. They have attained widespread use for providing computing power to many segments of today's modern society. Computers are available in many different forms, such as desktop, floor-standing, or portable computers, and include, e.g., one or more central processing units (CPUs) and associated volatile and non-volatile memory. Some computer systems also include a display, an input-output device such as a keyboard and/or a mouse, one or more storage devices such as hard disk drives, and, in many instances, a network interface adapter. One of the distinguishing characteristics of these systems is the use of a motherboard or system planar to electrically connect these components together. Examples of such computer systems are IBM's e-Server series, ThinkPad series, and Intellistation series.
The widespread use of personal computers in conjunction with networks has resulted in a reliance on network resources, such as e-business enterprises, for, e.g., telecommuting, obtaining news and stock market information, trading, banking, shopping, shipping, communicating in the form of Voice over Internet Protocol (VoIP) and email, as well as other services. For many, PCs represent an essential tool for their livelihood. Thus, in today's networked world, the availability and performance of the network are as important as the availability and performance of the personal computer.
Today's e-business environment is very competitive, so there is no room for failure. Servers such as the IBM pSeries help e-business enterprises remain competitive by remaining operational 24 hours a day, 365 days a year. Because reliability is mission-critical, such servers include features to monitor for problems, features to correct or bypass minor problems on the fly, and hot swappable components to allow failed components to be replaced without powering down the server.
The problem of memory leakage, however, is not adequately addressed. Memory leakage refers to situations in which a task fails to deallocate at least some of the memory allocated to the task. For instance, when being executed, a task is allocated a portion of a heap, or other shared memory, to store variables, perform calculations, or the like. Once the task no longer needs to use the memory, the memory should be deallocated so it is available for allocation to other tasks. Memory leakage occurs when some portion of memory is not deallocated even though a task may no longer need the memory. Moreover, that same task, when executed again, may be allocated more memory and may not deallocate all of that memory either, resulting in a continual leakage that can eventually consume so much of the shared memory that insufficient memory is available for other tasks.
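The cumulative effect described above can be illustrated with a minimal sketch. The `SharedHeap` class, the `leaky_task` function, and the block sizes are hypothetical and chosen only for illustration; they model a shared memory pool with a simple counter and are not taken from any particular system.

```python
class SharedHeap:
    """Hypothetical shared heap that tracks only the total units in use."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.in_use = 0

    def allocate(self, size):
        # Refuse the request when it would exceed the heap capacity.
        if self.in_use + size > self.capacity:
            raise MemoryError("out of memory")
        self.in_use += size
        return size

    def deallocate(self, size):
        self.in_use -= size


def leaky_task(heap):
    """Allocates 100 units per run but releases only 90 of them."""
    block_a = heap.allocate(90)
    block_b = heap.allocate(10)
    heap.deallocate(block_a)
    # block_b is never deallocated, so 10 units leak on every run.


def runs_until_exhaustion(capacity):
    """Runs the leaky task repeatedly and counts the completed runs."""
    heap = SharedHeap(capacity)
    completed = 0
    try:
        while True:
            leaky_task(heap)
            completed += 1
    except MemoryError:
        return completed
```

In this sketch each run leaves a small residue behind, so a heap that easily satisfies any single run is eventually exhausted after enough repetitions.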
In many instances, memory leakage remains undetected. In such instances, the computer system produces an “out of memory” error the next time some task needs a memory allocation and reports the error along with an identification of the task that requested the memory. The source of the memory leakage, however, is not detectable from the available information, so the group of tasks must be restarted and, depending upon how critical the tasks are to the operation of the computer system, the entire computer system may need to be powered down and rebooted.
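The following sketch illustrates why the reported task need not be the source of the leakage. All names (`OutOfMemory`, `SharedHeap`, the task labels "leaker" and "requester") and all sizes are assumptions made for illustration only.

```python
class OutOfMemory(Exception):
    """Hypothetical error that records only the task that made the request."""

    def __init__(self, task_name):
        super().__init__(f"out of memory while serving {task_name}")
        self.task_name = task_name


class SharedHeap:
    """Hypothetical shared heap with no per-task accounting."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.in_use = 0

    def allocate(self, task_name, size):
        if self.in_use + size > self.capacity:
            raise OutOfMemory(task_name)
        self.in_use += size

    def deallocate(self, size):
        self.in_use -= size


def simulate():
    """Returns the task name carried by the out-of-memory error, if any."""
    heap = SharedHeap(100)
    # The "leaker" task allocates 30 units three times and never frees them.
    for _ in range(3):
        heap.allocate("leaker", 30)
    # A well-behaved task then asks for 20 units, which no longer fit.
    try:
        heap.allocate("requester", 20)
        heap.deallocate(20)
    except OutOfMemory as err:
        return err.task_name
    return None
```

Here the error identifies the well-behaved requesting task, while nothing in the error points back to the task that actually consumed the heap.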
Further, memory leakage may occur only under specific circumstances that result from a combination of tasks being executed simultaneously. Thus, debugging the error post-mortem is difficult.
Many servers forward the “out of memory” error to a technical service provider to resolve the error. The technical service provider must re-create the circumstances to reproduce the memory leakage and/or review thousands or hundreds of thousands of lines of code to find the code responsible for the memory leakage. In many instances, one or more highly experienced software “trouble-shooters” may work on the problem for months before finding a solution, while the customer continues to endure the problem.
Restarting the application or rebooting the computer system temporarily fixes the memory leakage condition. However, powering down and rebooting a network server significantly impacts the availability and reliability of the server, which is a very undesirable effect when availability and reliability are key features for distinguishing the server from a multitude of available servers on the market. For instance, many servers take hours or even days to return to service once they are powered down. Further, the memory leakage condition is not corrected; the condition is just delayed until the next time that the circumstances facilitate a memory leakage by the task.
Therefore, there is a need for methods, systems, and media to enhance memory leakage management by identifying attributes of a task potentially related to memory leakage and implementing measures to protect against memory leakage based upon the attributes identified. In some embodiments, upon detecting a memory leakage associated with a task, the embodiments may generate a signal adapted to terminate and restart the task or to request that the task be terminated and restarted. When the task restarts, memory usage by the task is reset, freeing up the leaked memory.
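One possible way to picture the approach outlined above is sketched below. It assumes, purely for illustration, that memory outstanding per task is tracked and that a leak is suspected when a task still holds more than a fixed threshold after finishing a unit of work. The `LeakMonitor` class, its method names, and the threshold heuristic are hypothetical and are not asserted to be the detection criteria of any embodiment.

```python
from collections import defaultdict


class LeakMonitor:
    """Hypothetical per-task memory accounting with a restart signal."""

    def __init__(self, threshold):
        self.threshold = threshold            # assumed leak heuristic
        self.outstanding = defaultdict(int)   # units held by each task
        self.restart_requests = []            # tasks signaled for restart

    def on_allocate(self, task, size):
        self.outstanding[task] += size

    def on_deallocate(self, task, size):
        self.outstanding[task] -= size

    def on_task_idle(self, task):
        """Called when a task finishes a unit of work.

        Memory still held at this point beyond the threshold is treated
        as a suspected leak, and a restart request is recorded.
        """
        if self.outstanding[task] > self.threshold:
            self.restart_requests.append(task)
            return True
        return False

    def on_task_restart(self, task):
        """Resets the task's usage and returns the units that were freed."""
        return self.outstanding.pop(task, 0)


def demo():
    monitor = LeakMonitor(threshold=25)
    signals = []
    for _ in range(3):
        monitor.on_allocate("task_a", 30)
        monitor.on_deallocate("task_a", 20)   # 10 units leak per run
        signals.append(monitor.on_task_idle("task_a"))
    freed = monitor.on_task_restart("task_a")
    return signals, freed, monitor.outstanding["task_a"]
```

In this sketch the restart signal is raised only on the third run, once the accumulated residue crosses the threshold, and restarting the task resets its recorded usage to zero. Attributing memory to individual tasks is the design choice that allows the leaking task, rather than the task that happens to request memory last, to be identified.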