1. Field of the Invention
The invention relates generally to memory management in computer systems and more particularly to management of memory that belongs to a process's heap.
2. Description of Related Art: FIG. 1
In a computer system, a given execution of the code for a program is performed by a process that runs on the computer system. Each process has its own address space, i.e., a range of addresses that are accessible only to the process, and the program's code and the data for the given execution are all contained in the process's address space. A process's address space exists only for the life of the process.
FIG. 1 shows a process address space 102 as it appears during execution of an application program 105 that is written in a language such as C or C++, which permits the programmer to explicitly allocate and free memory for the program in process address space 102. Process address space 102 is subdivided into a number of different areas. Program code 103 contains the code for the program. The code includes the code for application program 105 and other code that is invoked by application program 105; here, the only additional code shown is for allocator 111, which is library code that allocates and frees memory. Next comes static storage 117, which contains static data used by application program 105 and by the library code that application program 105 executes. Then comes stack 121, which contains storage for data belonging to each procedure currently being executed by program 105. Then comes unused address space 123. Finally, there is heap 125, which contains storage that is explicitly allocated and freed by statements in program 105 that invoke functions in allocator 111. Both stack 121 and heap 125 may grow and shrink during execution of application program 105; if unused address space 123 is completely consumed, the process cannot continue to execute application program 105 and is said to crash.
Continuing in more detail with application program 105 and allocator 111, allocator 111 includes a malloc function 113, which allocates blocks 127 in heap 125, and a free function 115, which frees blocks in heap 125. Both of these functions are external, in the sense that they may be called by other code such as application program 105. This fact is indicated in FIG. 1 by the placement of the blocks representing the functions along one edge of the block representing the allocator. Allocator 111 maintains data structures including a free list 119 from which all of the free heap blocks 131 can be located; when the allocator allocates a block 127, it removes the block 127 from free list 119, and when it frees the block, it returns the block 127 to free list 119. If there are no blocks 127 on free list 119, allocator 111 expands heap 125 into unused address space 123. Allocation is done in response to an invocation 107 of malloc function 113 in application program 105; the invocation specifies a size for the block, and allocator 111 removes a block of that size from free list 119 and provides a pointer to the block to application program 105. A pointer is an item of data whose value is a location in the process's address space. Freeing is done in response to an invocation 109 of free function 115 in application program 105; the invocation provides a pointer to the block 127 to be freed, and free function 115 uses the pointer to return the block to free list 119. Because application program 105 must explicitly allocate and free blocks in heap 125, application program 105 is said to employ explicit heap management. An example of a widely used public-domain allocator for explicit heap management is Doug Lea's allocator. It is described in the paper, Doug Lea, A memory allocator, which could be found in September, 2001 at http://gee.cs.oswego.edu/dl/html/malloc.html.
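The free-list discipline described above can be illustrated with a minimal C sketch. It is not Doug Lea's allocator or any other allocator named here; the names toy_malloc, toy_free, SLOT_SIZE, and SLOT_COUNT are hypothetical, and the sketch assumes fixed-size slots in a static arena so that it stays short. Allocation removes a slot from the free list, and freeing pushes the slot back onto it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy allocator: a fixed arena carved into equal-size slots,
   with the free slots threaded onto a singly linked free list. */
#define SLOT_SIZE  64
#define SLOT_COUNT 16

typedef union slot {
    union slot *next;               /* link used while the slot is free */
    unsigned char bytes[SLOT_SIZE]; /* payload used while allocated     */
} slot;

static slot arena[SLOT_COUNT];
static slot *free_list = NULL;
static int initialized = 0;

/* Thread every slot onto the free list, lowest address first. */
static void toy_init(void) {
    for (int i = SLOT_COUNT - 1; i >= 0; i--) {
        arena[i].next = free_list;
        free_list = &arena[i];
    }
    initialized = 1;
}

/* Counterpart of a malloc function: remove a block from the free list. */
void *toy_malloc(size_t size) {
    if (!initialized) toy_init();
    if (size > SLOT_SIZE || free_list == NULL)
        return NULL;  /* a real allocator would expand the heap here */
    slot *s = free_list;
    free_list = s->next;
    return s;
}

/* Counterpart of a free function: return the block to the free list. */
void toy_free(void *p) {
    if (p == NULL) return;
    slot *s = (slot *)p;
    assert(s >= arena && s < arena + SLOT_COUNT);  /* must be one of ours */
    s->next = free_list;
    free_list = s;
}
```

A real allocator such as allocator 111 would handle variable sizes, split and coalesce blocks, and expand heap 125 when the free list is empty; the sketch simply returns NULL in that case.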
As is apparent from the foregoing, the programmer who writes application program 105 must take care to avoid two errors in managing heap 125:
- freeing a heap block 127 before application program 105 is finished using it; and
- failing to free a heap block 127 after application program 105 is finished using it.
The first error is termed premature freeing: if the application program references the block after it has been freed, the block may already have been reallocated, and its contents may differ from those the program expects. The second error is termed a memory leak. If an allocated heap block 127 is not freed after the program is done using it, the block becomes garbage, that is, a heap block 127 that is no longer being used by application program 105 but has not been returned to free list 119 and is therefore not available for reuse by application program 105. If the process executing application program 105 runs long enough, the garbage that accumulates from a memory leak can consume all of unused address space 123, causing the process to crash. Even before a memory leak causes a process to crash, the garbage in heap 125 ties up resources in the computer system and degrades the performance of the process executing application program 105 and of other processes running on the computer system.
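One way to see a memory leak concretely is to count blocks that have been allocated but not yet freed. The following C sketch wraps the C library's malloc and free with a counter; the names counted_malloc, counted_free, leaky_step, and tidy_step are hypothetical and are used only for illustration. Each call to leaky_step leaves one more block of garbage behind, while tidy_step leaves the count unchanged.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical instrumentation: number of blocks allocated but not yet
   freed. A count that keeps growing over a long run signals a leak. */
static long live_blocks = 0;

void *counted_malloc(size_t size) {
    void *p = malloc(size);
    if (p != NULL) live_blocks++;
    return p;
}

void counted_free(void *p) {
    if (p == NULL) return;
    /* More frees than successful mallocs, e.g. a double free. */
    assert(live_blocks > 0);
    live_blocks--;
    free(p);
}

/* Leaky routine: allocates a scratch buffer and never frees it. */
void leaky_step(void) {
    char *scratch = counted_malloc(128);
    if (scratch != NULL) scratch[0] = 'x';
    /* missing counted_free(scratch): the block becomes garbage */
}

/* Correct routine: frees what it allocates. */
void tidy_step(void) {
    char *scratch = counted_malloc(128);
    if (scratch != NULL) scratch[0] = 'x';
    counted_free(scratch);
}
```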
As application programs have grown in size and complexity, have been written and maintained by many different programmers over a span of years, and have been executed by processes that stop running only if the computer system they are running on fails, memory allocation errors such as memory leaks and premature frees have become an increasingly important problem. The larger and more complex a program is, the greater the chance that allocation errors will occur, particularly when a programmer working on one part of the program does not understand the conditions under which a heap block allocated by another part of the program may be freed. When the program is used and modified by many different programmers over a period of many years, the risk of allocation errors increases further. In addition, if a program uses library routines provided by third parties such as vendors of operating systems, these library routines may contain allocation errors. Finally, because programs that were developed for processes that ran only for a limited time are now being executed by processes that effectively “run forever”, memory leaks that were formerly harmless now result in sluggish performance and crashes. Problems caused by allocation errors are moreover difficult to diagnose and fix. They are difficult to diagnose because the state of a process's heap is a consequence of the entire history of the given execution of the program represented by the process; consequently, the manner in which problems caused by allocation errors manifest themselves will vary from one process to another. They are difficult to fix because the invocation of the free function (or lack thereof) that is causing the problem may be in a part of the code that is apparently completely unrelated to the part that allocated the heap block.
A fundamental solution to the problem of allocation errors is to make heap management automatic. The programmer is still permitted to allocate blocks 127 in heap 125, but not to free them. The automatic heap management is done by garbage collector code, which can be invoked from other code being executed by the process. A process with automatic heap management is shown at 133 in FIG. 1. Process address space 102 is as before, but allocator 111 has been replaced by garbage collector 139. Garbage collector 139 has an external malloc function 141, which is available to application program 135. Garbage collector 139 also has a free function 145, but free function 145 is an internal function that is not available to application program 135, as indicated by the location of free function 145 within garbage collector 139. Since only the malloc function is external, application program 135 contains invocations 137 of malloc function 141 but no invocations of free function 145. The process periodically executes the garbage collector, and when executed, the garbage collector scans heap 125 for heap blocks 127 that are no longer being used by application program 135 and returns unused heap blocks 127 to free list 119.
When executed, garbage collector 139 determines which heap blocks 127 are no longer being used by the process by scanning for pointers, both in the process's root data, that is, process data that is not contained in heap 125 (for example, data in static storage 117, stack 121, and machine registers), and in allocated heap blocks 129, and checking whether each pointer found points to a heap block 127. If no pointer points to a given heap block 127, that heap block is not being used by the process and can be freed. Garbage collector 139 frees such an unused block as described above, by invoking its internal free function 145, which returns the block to free list 119.
There are many different kinds of garbage collectors; for a general discussion, see Richard Jones and Rafael Lins, Garbage Collection: Algorithms for Automatic Dynamic Memory Management, John Wiley and Sons, Chichester, UK, 1996. In the following we are concerned with conservative garbage collectors. For purposes of the present discussion, a conservative garbage collector is any garbage collector that does not require pointers to have forms that make them distinguishable from other kinds of data. Conservative garbage collectors can thus be used with programs written in languages such as C or C++ that do not give pointers forms that distinguish them from other data. These garbage collectors are conservative in the sense that they guarantee never to free a heap block that is being used by the process, but do not guarantee that allocated heap blocks 129 contain only blocks that are being used by the process.
Conservative garbage collectors include a marker function 143, which makes an in use table 120 containing a list of the locations of all of the allocated heap blocks 129. Marker function 143 then scans the root data and allocated heap blocks 129 for data whose values could be pointers into heap 125. When it finds such a value, it uses in use table 120 to determine whether the value points to a heap block; if it does and the heap block has not yet been marked as in use in table 120, marker function 143 marks the block in the table. When the scan is complete, the locations of all heap blocks that are in fact in use have been marked in in use table 120. The blocks 127 in table 120 that have not been marked are not being used by the process, and garbage collector 139 returns these blocks 127 to free list 119. A commercially available example of a conservative garbage collector is the Great Circle® garbage collector manufactured by Geodesic Systems, Inc., 414 N. Orleans St., Suite 410, Chicago, Ill. 60610. Information about the Great Circle garbage collector can be obtained at the Geodesic Systems, Inc. Web site, geodesic.com.
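The mark phase described above can be sketched in C under several simplifying assumptions: the heap is simulated by a static array of machine words, the in use table is a small array of (start, length, mark) entries searched linearly, any value that falls anywhere inside a block (including its interior) is treated as a pointer to that block, and marking recurses into each newly marked block. The names table_add, find_block, mark_value, and collect are hypothetical, and collect only counts the unmarked blocks rather than actually returning them to a free list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simulated heap of machine words; blocks are word ranges within it. */
#define HEAP_WORDS 256
#define MAX_BLOCKS 32

static uintptr_t heap[HEAP_WORDS];

typedef struct {
    size_t start;  /* first word of the block */
    size_t words;  /* block length in words   */
    int marked;    /* set by the mark phase   */
} block_entry;

/* The in use table: one entry per allocated block. */
static block_entry in_use[MAX_BLOCKS];
static size_t n_blocks = 0;

size_t table_add(size_t start, size_t words) {
    assert(n_blocks < MAX_BLOCKS && start + words <= HEAP_WORDS);
    in_use[n_blocks] = (block_entry){ start, words, 0 };
    return n_blocks++;
}

/* Conservatively decide whether v could point into an allocated block.
   Returns the table index, or -1 if v points into no block. */
static long find_block(uintptr_t v) {
    uintptr_t lo = (uintptr_t)&heap[0];
    uintptr_t hi = (uintptr_t)&heap[HEAP_WORDS];
    if (v < lo || v >= hi) return -1;
    size_t word = (v - lo) / sizeof(uintptr_t);
    for (size_t i = 0; i < n_blocks; i++)
        if (word >= in_use[i].start && word < in_use[i].start + in_use[i].words)
            return (long)i;
    return -1;
}

/* Mark the block v may point to, then scan that block's words. */
static void mark_value(uintptr_t v) {
    long i = find_block(v);
    if (i < 0 || in_use[i].marked) return;
    in_use[i].marked = 1;
    for (size_t w = 0; w < in_use[i].words; w++)
        mark_value(heap[in_use[i].start + w]);
}

/* Mark from the root words; return the number of unmarked blocks,
   which a collector would return to the free list. */
size_t collect(const uintptr_t *roots, size_t n_roots) {
    for (size_t i = 0; i < n_blocks; i++) in_use[i].marked = 0;
    for (size_t r = 0; r < n_roots; r++) mark_value(roots[r]);
    size_t garbage = 0;
    for (size_t i = 0; i < n_blocks; i++)
        if (!in_use[i].marked) garbage++;
    return garbage;
}
```

For example, with three four-word blocks A, B, and C, a root pointing to A, and a word in A pointing to B, the mark phase reaches A and B, leaving only C unmarked.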
The performance of a conservative garbage collector can be enhanced if the conservative garbage collector can reduce the number of heap blocks pointed to by false pointers. A false pointer is a value that the garbage collector takes to be a pointer to a heap block 127, but is in fact not really a pointer at all. As mentioned above, a conservative garbage collector treats every data value that can be interpreted as a pointer as such; for example, if the pointers in the computer system on which the process is running are aligned 32-bit values, the garbage collector will treat every aligned 32-bit value as a pointer. The problem with false pointers is that when an allocated heap block has a false pointer pointing to it, the false pointer will prevent the garbage collector from returning the block to the free list even though there are no (or no more) real pointers pointing to it.
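The false-pointer problem follows directly from the scanning rule. The C sketch below assumes, as in the example above, that pointers are aligned 32-bit values and that the heap occupies a known address range; the function names and the address range used are hypothetical. The scanner counts every aligned 32-bit word whose value lies inside the heap range, so an ordinary integer that happens to fall in that range is counted exactly like a real pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical candidate test: any 32-bit value inside the heap's
   address range [heap_lo, heap_hi) is taken to be a pointer, whether
   or not the program meant it as one. */
int looks_like_pointer(uint32_t value, uint32_t heap_lo, uint32_t heap_hi) {
    assert(heap_lo <= heap_hi);
    return value >= heap_lo && value < heap_hi;
}

/* Scan a region at 4-byte alignment and count the candidate pointers. */
size_t count_candidates(const unsigned char *region, size_t len,
                        uint32_t heap_lo, uint32_t heap_hi) {
    size_t count = 0;
    for (size_t off = 0; off + 4 <= len; off += 4) {
        uint32_t word;
        memcpy(&word, region + off, sizeof word);  /* read one aligned word */
        if (looks_like_pointer(word, heap_lo, heap_hi)) count++;
    }
    return count;
}
```

For instance, if a region holds the words 0x1000, 0x2004, 12345, and 0x2ff0 and the heap occupies [0x2000, 0x3000), count_candidates reports two candidates, even if 0x2ff0 was only a counter value; the block it appears to point to cannot be freed while that value remains.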
One technique for reducing the number of heap blocks pointed to by false pointers is blacklisting. When the garbage collector detects a pointer that points to an area of the heap that does not presently contain allocated heap blocks, the pointer is clearly a false pointer. When the garbage collector detects such a pointer, it blacklists the block that the pointer points to by adding the block to a list of such blocks, termed the blacklist. When the collector expands the heap into an area that contains blacklisted blocks, it uses the blacklist to determine which blacklisted blocks are in the area. The blacklisted blocks are not placed on the free list; consequently, only unblacklisted blocks are allocated, which reduces the chance that an allocated block cannot later be freed because of a false pointer. Like real pointers, false pointers may disappear as a result of changes in the process's storage; when the mark phase of a later collection no longer finds any pointers that point to a blacklisted block, the garbage collector returns the blacklisted block to the free list in the sweep phase.
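Blacklisting can be sketched in C as follows. To keep the sketch short it tracks whole heap pages rather than individual blocks, which is an assumption of this illustration rather than something the description requires; the names note_candidate, expand_heap, clear_if_unreferenced, and MAX_PAGES are hypothetical. A candidate pointer beyond the current end of the heap blacklists its page, expanding the heap withholds blacklisted pages from the free list, and a page is released once a mark phase finds no pointer to it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical page-granularity model: the heap may grow to MAX_PAGES
   pages; pages [0, heap_end) are currently part of the heap. */
#define MAX_PAGES 64

static bool blacklisted[MAX_PAGES];
static size_t heap_end = 0;

/* Called when the scanner sees a candidate pointer to page p. A pointer
   into an area with no allocated blocks must be false, so blacklist it. */
void note_candidate(size_t page) {
    if (page >= heap_end && page < MAX_PAGES)
        blacklisted[page] = true;
}

/* Expand the heap by n pages, writing the pages that may go on the free
   list into out[]. Blacklisted pages are withheld. Returns the number of
   pages released to the free list. */
size_t expand_heap(size_t n, size_t *out) {
    assert(heap_end + n <= MAX_PAGES);
    size_t released = 0;
    for (size_t p = heap_end; p < heap_end + n; p++)
        if (!blacklisted[p])
            out[released++] = p;
    heap_end += n;
    return released;
}

/* After a mark phase: if nothing pointed to a blacklisted page this
   cycle, unblacklist it. Returns true if the caller should now place
   the page on the free list. */
bool clear_if_unreferenced(size_t page, bool referenced_this_cycle) {
    assert(page < MAX_PAGES);
    if (blacklisted[page] && !referenced_this_cycle) {
        blacklisted[page] = false;
        return true;
    }
    return false;
}
```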
A problem with prior-art garbage collectors such as garbage collector 139 is that garbage collector 139 replaces allocator 111. That fact makes it difficult to retrofit garbage collector 139 to a program which was written for an allocator 111. A prior-art technique for retrofitting is employed in the Great Circle garbage collector. When an application program is to be executed with the Great Circle garbage collector, a library of programs that belong to the garbage collector is linked to the application program when the process that is executing the program begins running. The library of programs includes functions that manage heap 125, among them a malloc function and a free function that replace the malloc and free functions 113 and 115 of allocator 111. The replacement malloc function is identical to malloc function 141 of garbage collector 139; the replacement free function is a function which does nothing and returns; garbage collector 139 then uses its own internal free function as described above to return blocks 127 that are not being used by the process to free list 119.
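The retrofitting technique, in which the replacement free function does nothing, can be illustrated with a C sketch. The sketch does not reproduce how the Great Circle library is actually linked; as an assumption for illustration, it routes the application's calls through a hypothetical table of function pointers (heap_ops), so that swapping the table stands in for linking in the replacement library. The names gc_malloc_stub and gc_noop_free are likewise hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical heap-management interface the application calls through;
   swapping the table stands in for linking in a replacement library. */
typedef struct {
    void *(*malloc_fn)(size_t);
    void (*free_fn)(void *);
} heap_ops;

static size_t ignored_frees = 0;

/* Replacement free: does nothing, so explicit frees in legacy code have
   no effect and reclamation is left entirely to the collector. */
static void gc_noop_free(void *p) {
    (void)p;
    ignored_frees++;
}

/* Stand-in for the collector's malloc; a real collector would also
   record the block in its in use table. */
static void *gc_malloc_stub(size_t size) {
    assert(size > 0);
    return malloc(size);
}

static const heap_ops original_ops = { malloc, free };
static const heap_ops gc_ops = { gc_malloc_stub, gc_noop_free };

/* Legacy application code written against malloc/free semantics. */
int legacy_work(const heap_ops *ops) {
    int *v = ops->malloc_fn(4 * sizeof *v);
    if (v == NULL) return -1;
    v[0] = 42;
    int result = v[0];
    ops->free_fn(v);
    return result;
}
```

Because gc_noop_free ignores its argument, the block allocated through gc_ops is never freed by the application itself; in the scheme described above, reclaiming it would be the collector's job.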
Simply replacing an existing allocator 111 with heap management functions belonging to garbage collector 139 is undesirable whenever the replacement involves a substantial change, or a risk of substantial change, in the behavior of the application program that uses allocator 111. One such situation arises with legacy programs that are known to work well with allocator 111, but where garbage collection would be desirable to deal with memory leaks caused by third-party library routines that are invoked by the application program. Such ancillary leaks are termed litter in the art, and the question for those responsible for maintaining the application program is whether the advantages of using garbage collector 139 for litter collection outweigh the risk of changing a known allocator 111. Another such situation arises where the application program works better with its present allocator than it will with the garbage collector's heap management functions, either because the application program has been optimized for use with allocator 111 or because a custom allocator has been optimized for use with the application program. In either case, replacing the allocator with the allocation functions of the garbage collector may result in substantial losses of efficiency, either in the speed of execution of the allocation functions or in the management of heap 125.
The undesirable effects of replacing an existing allocator 111 with heap management functions belonging to garbage collector 139 are a particular example of a general problem in the design of conservative garbage collectors: the garbage collector not only determines which heap blocks 127 may be freed, but also performs the general heap management functions of an allocator. What is needed, and what is provided by the present invention, is a conservative garbage collector that can use any existing allocator 111 to perform the heap management functions. Such a conservative garbage collector could be used with any application program without risk of substantially affecting the application program's behavior. More fundamentally, separating the garbage collector from the allocator permits modular development of both allocators and garbage collectors. It is thus an object of the present invention to provide a conservative garbage collector that does not include heap management functions, but instead uses those provided by an allocator that is separate from the garbage collector.
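The separation called for here can be pictured, in general terms, as a collector that reaches the heap only through an allocator interface. The C sketch below illustrates that modular idea under stated assumptions and is not a description of how the present invention implements it: the allocator and collector types and all function names are hypothetical, and the mark phase is omitted, with the caller setting the marks directly. The collector records each block it obtains from the plugged-in allocator and, in the sweep, hands unmarked blocks back to that allocator's own release function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical allocator interface: any allocator can be plugged in by
   supplying these two functions. */
typedef struct {
    void *(*alloc)(size_t);
    void (*release)(void *);
} allocator;

#define MAX_TRACKED 64

/* Collector state: which blocks exist and whether marking reached them.
   The collector keeps no free list of its own. */
typedef struct {
    const allocator *a;
    void *blocks[MAX_TRACKED];
    bool marked[MAX_TRACKED];
    size_t n;
} collector;

/* Allocation is delegated to the plugged-in allocator. */
void *collector_alloc(collector *c, size_t size) {
    assert(c->n < MAX_TRACKED);
    void *p = c->a->alloc(size);
    if (p != NULL) {
        c->blocks[c->n] = p;
        c->marked[c->n] = false;
        c->n++;
    }
    return p;
}

/* Sweep: hand every unmarked block back to the allocator, compact the
   table, and clear marks for the next cycle. Returns blocks freed. */
size_t collector_sweep(collector *c) {
    size_t kept = 0, freed = 0;
    for (size_t i = 0; i < c->n; i++) {
        if (c->marked[i]) {
            c->blocks[kept] = c->blocks[i];
            c->marked[kept] = false;
            kept++;
        } else {
            c->a->release(c->blocks[i]);
            freed++;
        }
    }
    c->n = kept;
    return freed;
}

/* Example plug-in: the C library allocator, counting releases. */
static size_t system_releases = 0;
static void *system_alloc(size_t n) { return malloc(n); }
static void system_release(void *p) { system_releases++; free(p); }
static const allocator system_allocator = { system_alloc, system_release };
```

Because the collector touches the heap only through the two interface functions, a different allocator could be substituted for system_allocator without changing the collector code.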