This invention relates generally to computer systems having multiple processes, and in particular to memory management cleanup systems for memory shared among those processes.
Shared memory is useful for interprocess communication and is provided by several computer operating systems. Operating systems that allow shared memory also provide some form of memory management for it. In more primitive systems, a process that terminates normally is disconnected from shared memory, but if it terminates abnormally, an unsophisticated system might not free the memory it used. Other applications would then see that portion of memory as still in use, even though the terminated process no longer needed it. Over time, memory resources could be severely depleted.
As a result, more recent operating systems incorporate some type of cleanup or "garbage collection" to free shared memory resources that were allocated to now-defunct processes or programs, including shared resources allocated to processes that terminated abnormally. Since the operating system controls the creation and termination of processes, it is usually designed not only to withstand but also to detect any abnormal terminations of processes when they occur. Given this knowledge, the operating system can easily free shared memory resources when such an abnormal termination is detected.
However, in some operating systems, there are limitations placed on shared memory use. For example, in the Unix operating system a single process can typically attach to only about 12 shared memory regions. There are also system-wide limits on the number of shared memory objects that can exist at any one time.
To circumvent such operating system limits in an otherwise desirable operating system, some users create application systems that use a single shared memory partition or region for one or more memory pools. Each pool may contain many shared memory elements, and layers of application software manage the elements within the shared space.
In this approach, a shared memory pool is created by the application program and contains memory elements. Each element has an object identifier (id) and a use count. When an application system client program wishes to share access to information about an object, it allocates an element in the shared memory pool using that object's identifier as a key. If it is the first program to share information about that object, it finds a use count of zero, so it increments the use count to 1 and stores the object id in the element. When additional application system programs want to share information about the same object, they find the existing element with the relevant object id and increment its use count.
To minimize overhead and avoid duplicating operating system memory management, some application systems further provide that when a process is no longer interested in the shared object information, it decrements the use count of the corresponding shared memory element. When all processes have disconnected from a particular object's information, the use count returns to 0, freeing the shared memory element for reuse. However, if a process does not explicitly disconnect from the element before it terminates, as is often the case in abnormal terminations, the use count is never decremented for that client program or process and never returns to 0. The storage is never reused. Since the shared memory partition is a fixed size, the repeated loss of these elements may ultimately jeopardize the ability to allocate new elements, leaving many unused elements that appear to be in use. FIG. 4 shows an existing implementation of this approach.
Processes that terminate abnormally are usually not able to disconnect from their shared memory elements before termination. Hence, they are likely to leave behind elements that appear to be in use, but are not. Since an application system, unlike an operating system, is not usually aware of abnormal terminations such as these, it is more difficult for it to provide the garbage collection function that the operating system does.
Application systems having multiple processes sharing memory elements are thus susceptible to out-of-memory situations if they take a considerable time to execute. As an example, backup application systems for large disk storage systems having multiple disks in array configurations may have several backups operating concurrently, each backup operating as a process. Such backups may take hours to execute.
Such backup application systems are often run unattended overnight from client workstations or servers over a network, and are responsible for backing up anywhere from a few megabytes to many gigabytes of data from disk to tape, on systems with from one to 96 or more disks. If the backup application system is several hours into backing up multiple gigabytes of data, an out-of-memory situation can be a severe problem.
If several processes terminate abnormally, the backup application system may attempt to continue but eventually may be unable to allocate shared memory to new or replacement processes, or even existing processes that need more.
This, in turn, can cause the failure of the entire backup. Since backups of large systems are often done unattended overnight, they may need to be rerun during the day, if possible, or rescheduled for the following night. If a scheduled backup does not occur, the user's data is more at risk if catastrophic disk failures occur.
It is an object of the present invention to provide a shared memory cleanup for an application system.
It is another object of the present invention to free up unused shared memory elements, returning them to the pool of available elements.
Still another object of the present invention is to provide a way for an application memory management program to free shared memory elements allocated to programs or processes that have terminated abnormally.