Computer systems are well known in the art. They have attained widespread use for providing computing power to many segments of today's modern society. Computers are available in many different forms, such as desktop, floor-standing, or portable computers, and include, e.g., one or more central processing units (CPUs) and associated volatile and non-volatile memory. Some computer systems also include a display, an input-output device such as a keyboard and/or a mouse, one or more storage devices such as hard disk drives, and, in many instances, a network interface adapter. One of the distinguishing characteristics of these systems is the use of a motherboard or system planar to electrically connect these components together. Examples of such computer systems are IBM's e-Server series, ThinkPad series, and Intellistation series.
The widespread use of personal computers (PCs) in conjunction with networks has resulted in a reliance on network resources, such as those of e-business enterprises, for, e.g., telecommuting, obtaining news and stock market information, trading, banking, shopping, shipping, communicating in the form of Voice over Internet Protocol (VoIP) and email, as well as other services. For many, PCs represent an essential tool for their livelihood. Thus, in today's networked world, the availability and performance of the network are as important as the availability and performance of the personal computer.
Today's e-business environment is very competitive, so there is no room for failure. Servers such as the IBM pSeries help e-business enterprises remain competitive by remaining operational 24 hours a day, 365 days a year. Because reliability is mission-critical, such servers include features to monitor for problems, features to correct or bypass minor problems on the fly, and hot-swappable components that allow failed components to be replaced without powering down the server.
Memory overflows, however, are not adequately addressed. Memory overflows refer to situations in which tasks continue to store data in memory locations beyond their memory allocation. When a task exceeds its memory allocation, the task writes data in the space typically left between memory allocations and possibly in the next memory page. The next memory page may be allocated for use by another task, so writing data in that memory page corrupts data utilized by the other task. If left unchecked, the memory overflow condition may corrupt a sufficient amount of memory to cause that task and/or other tasks to crash. The computer system may even crash, depending upon the extent of the overflow and the types of tasks affected. The memory overflow condition is only temporarily fixed when the task is restarted or when the memory overflow forces the computer system to power down and reboot.
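By way of illustration only, the overflow described above can be simulated with a flat byte buffer standing in for memory. The layout, the four-byte gap, and the names `memory`, `task_a`, `task_b`, and `unchecked_write` are assumptions made for this sketch and are not taken from any particular system.

```python
# Hypothetical layout: one flat buffer stands in for system memory.
memory = bytearray(32)
task_a = range(0, 8)     # task A is allocated bytes 0..7
gap = range(8, 12)       # space left between the two allocations
task_b = range(12, 20)   # task B is allocated bytes 12..19


def unchecked_write(start, data):
    """Copy data into memory without checking the writer's allocation size."""
    memory[start:start + len(data)] = data


# Task B stores its own data first.
unchecked_write(task_b.start, b"B" * len(task_b))

# Task A then writes 14 bytes into an 8-byte allocation: 6 bytes too many.
unchecked_write(task_a.start, b"A" * 14)

gap_bytes = bytes(memory[gap.start:gap.stop])
b_bytes = bytes(memory[task_b.start:task_b.stop])
print(gap_bytes)  # the gap now holds task A's data
print(b_bytes)    # the first two bytes of task B's data are overwritten
```

In this sketch the excess bytes first fill the gap and then overwrite the start of task B's allocation, which is the corruption of another task's data described above.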
Current solutions attempt to avoid rebooting the computer system for memory overflow conditions by implementing memory overflow detection. The memory overflow detection may, for instance, monitor the content of the spaces between memory allocations to detect when the content of a space is corrupted. If a memory overflow, often referred to as a segmentation violation, is detected, the corresponding task is terminated to prevent the memory overflow from affecting the execution of another task. If the task can be restarted without significantly impacting the execution of other critical tasks, the memory allocated to the task and the adjacent space are flushed. Then, the task is restarted. On the other hand, some tasks are critical to the continued operation of the computer system, such as some of the tasks executed by a service processor of a server. Such tasks cannot be restarted without powering down and rebooting the computer system.
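One way such detection might be sketched, under stated assumptions, is to fill each space between allocations with a known pattern and to check periodically whether the pattern is intact. The class `Pool`, the pattern value `GUARD`, and the gap width `GAP` below are hypothetical and serve only to illustrate the monitoring described above.

```python
GUARD = 0xA5  # assumed fill pattern for the spaces between allocations
GAP = 4       # assumed width, in bytes, of each space


class Pool:
    """Toy memory pool that leaves a patterned space after every allocation."""

    def __init__(self, size):
        self.mem = bytearray([GUARD]) * size
        self.next_free = 0
        self.allocs = {}  # task name -> (start, size)

    def allocate(self, task, size):
        start = self.next_free
        if start + size + GAP > len(self.mem):
            raise MemoryError("pool exhausted")
        self.mem[start:start + size] = bytes(size)  # zero the task's region
        self.allocs[task] = (start, size)
        self.next_free = start + size + GAP
        return start

    def overflowed_tasks(self):
        """Return the tasks whose trailing space no longer holds the pattern."""
        bad = []
        for task, (start, size) in self.allocs.items():
            space = self.mem[start + size:start + size + GAP]
            if any(byte != GUARD for byte in space):
                bad.append(task)
        return bad


pool = Pool(64)
a_start = pool.allocate("A", 8)
pool.allocate("B", 8)
pool.mem[a_start:a_start + 10] = b"x" * 10  # task A writes 2 bytes too many
print(pool.overflowed_tasks())
```

A pattern check of this kind cannot detect an overflow whose bytes happen to equal the pattern, and it detects the condition only after the write has occurred.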
Once again, whether the application is restarted or the computer system is rebooted, the memory overflow condition is only temporarily fixed. Moreover, powering down and rebooting a network server significantly impacts the availability and reliability of the server, which is a very undesirable effect when availability and reliability are key features for distinguishing the server from a multitude of available servers on the market. For instance, many servers take hours or even days to return to service once they are powered down. Further, the memory overflow condition is not corrected; the condition is merely delayed until the next time that the circumstances facilitate a memory overflow by the task.
When a memory overflow condition is identified, the condition is reported to a technical support service. The technical support service may attempt to locate the erroneous code, fix the code, and supply customers who have the erroneous code with an update to prevent the error from occurring again. Depending upon the nature of the update, the customers may have to reboot the affected computer systems to install the update.
Therefore, there is a need for methods, systems, and media to enhance memory overflow management by identifying a memory overflow condition associated with execution of a task and adjusting memory allocation for the task to attenuate the memory overflow condition. In some situations, the adjustment may eliminate the cause of the memory overflow condition, advantageously eliminating the need to install an update. In other situations, the impact of the memory overflow condition is attenuated, reducing the frequency and/or severity of the memory overflow condition.
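As a minimal sketch of what adjusting a task's memory allocation might look like, the following enlarges the recorded allocation for a task each time an overflow is identified. The growth policy, the `headroom` factor, the `limit` cap, and the names `TaskAllocation` and `adjust_allocation` are assumptions chosen for illustration; the text above does not prescribe a particular policy.

```python
from dataclasses import dataclass


@dataclass
class TaskAllocation:
    size: int                 # bytes currently allocated to the task
    overflow_count: int = 0   # overflows identified so far for the task


def adjust_allocation(alloc, overrun_bytes, headroom=1.5, limit=1 << 20):
    """Enlarge the allocation to cover the observed overrun plus headroom.

    The new size is capped at `limit` and is never smaller than the
    current size. Returns the size to use for the next allocation.
    """
    alloc.overflow_count += 1
    wanted = int((alloc.size + overrun_bytes) * headroom)
    new_size = min(wanted, limit)
    if new_size > alloc.size:
        alloc.size = new_size
    return alloc.size


task = TaskAllocation(size=1024)
print(adjust_allocation(task, overrun_bytes=100))  # (1024 + 100) * 1.5
```

If the task's actual need is bounded, a larger allocation of this kind may remove the cause of the overflow; otherwise it may only reduce how often or how severely the overflow recurs.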