Many computer systems dynamically allocate memory to a task. The following is a somewhat simplified explanation of dynamic memory allocation and garbage collection.
Referring to FIG. 1, a typical multitasking computer system 20 which uses a garbage collector 22 includes a CPU 24 and a defined memory space 26, which may include virtual memory. Each active task in the system is assigned a portion 28 of the computer's memory space 26. The task's memory space 28 can be divided into three regions: one region 30 for holding the code which represents and controls the task, another region 32 that contains a set of "root" pointers used by the task, and a third region 40, called the heap, which is used for dynamic memory allocation.
It should be understood that FIG. 1 represents only one of many ways in which memory may be allocated for storing the roots, code and heap associated with a task or a set of tasks.
For the purposes of this description, the terms "task", "mutator", "mutator thread", "thread" and "process" are used interchangeably. Tasks and programs are sometimes called mutators because they change or "mutate" the contents of the heap 40. The term "thread" relates to the continuity of a task or process, especially in multi-threaded environments in which each process is periodically interrupted by other ones of the processes in the system.
The term "object" is herein defined to mean any data structure created by a program or process. Objects are sometimes herein called program objects.
When the task associated with the heap 40 needs space for storing an array or other program "object", a Memory Allocator routine 42 is called. The memory allocator 42 responds by allocating a block of unused memory 44 in the heap 40 to the task. Additional requests for memory will result in the allocation of additional memory blocks 46, 48 and so on. Clearly, if the task continues to ask for more memory, all the space in the heap 40 will eventually be used and the task will fail for lack of memory. Therefore space must be restored by either explicit actions of the program, or some other mechanism.
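The exhaustion problem just described can be illustrated with a minimal sketch. It assumes, purely for illustration, that the allocator hands out consecutive blocks from a fixed-size heap and never reclaims them; the names BumpAllocator and HeapExhausted are hypothetical and do not correspond to the Memory Allocator routine 42 of FIG. 1.

```python
class HeapExhausted(Exception):
    """Raised when the heap cannot satisfy an allocation request."""


class BumpAllocator:
    """Hands out consecutive blocks from a fixed-size heap and never frees them."""

    def __init__(self, heap_size):
        self.heap_size = heap_size
        self.next_free = 0  # offset of the first unused word in the heap

    def allocate(self, size):
        remaining = self.heap_size - self.next_free
        if size > remaining:
            raise HeapExhausted(f"need {size} words, only {remaining} left")
        start = self.next_free
        self.next_free += size
        return start


allocator = BumpAllocator(heap_size=10)
blocks = [allocator.allocate(4), allocator.allocate(4)]  # two 4-word blocks
try:
    allocator.allocate(4)  # only 2 words remain, so this request fails
    exhausted = False
except HeapExhausted:
    exhausted = True
```

In this sketch the third request fails even if the task has long since finished with the first two blocks, which is the situation that garbage collection is meant to prevent.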
It is well known that most tasks "abandon" much of the memory space that is allocated to them. Typically, the task stores many program objects in allocated memory blocks, and discards all pointers to many of those objects after it has finished processing them because it will never need to access those objects again. An object for which there are no pointers is often termed an "inaccessible object", and the memory space it occupies is "inaccessible" to the task which once used it.
The solution to this problem is to recover blocks of memory space in the heap 40 which are no longer being used by the task. Garbage collection is the term used to refer to automatic methods of recovering unused memory in the heap 40. Garbage collectors generally gather and recover unused memory upon the occurrence of a certain amount of memory usage, most typically when half of the storage space in the heap 40 has been allocated.
Thus, the purpose of garbage collection is to recover unused or abandoned portions of memory in a heap 40 so that the task using the heap 40 will not run out of memory.
Stop and Copy Garbage Collection. Stop and Copy garbage collectors compact the memory used by a task by copying all "accessible objects" in the heap to a contiguous block of memory in the heap, and changing all pointers to the accessible objects so as to point to the new copy of these objects. An accessible object is any object (i.e., block of memory) which is referenced, directly or indirectly, by the "roots" or "root set" of the task. Typically, the "roots" of a task are a set of pointers stored in known locations (generally in the program stack and registers used by the task), which point to the objects used by a task. Many of those objects, in turn, will contain pointers to other objects in the task. The chain, or graph, of pointers emanating from the root set indirectly points to all of the accessible objects in the heap.
The entire set of objects referenced by these pointers is herein called the set of accessible objects. Inaccessible objects are all objects not referenced by the set of pointers derived from the root set.
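The definition of accessibility given above amounts to a graph traversal starting at the root set. The following is a minimal sketch under assumed representations: objects are identified by strings, and the hypothetical mapping heap_graph lists, for each object, the objects it points to.

```python
def accessible_objects(roots, pointers):
    """Return the set of object ids reachable, directly or indirectly, from roots.

    `pointers` maps each object id to the ids of the objects it points to.
    """
    reached = set()
    pending = list(roots)
    while pending:
        obj = pending.pop()
        if obj in reached:
            continue  # already visited; also keeps cycles from looping forever
        reached.add(obj)
        pending.extend(pointers.get(obj, ()))
    return reached


# "D" points to a live object but nothing points to "D"; "E" points only to itself.
heap_graph = {"A": ["B"], "B": ["C"], "C": [], "D": ["C"], "E": ["E"]}
live = accessible_objects(roots=["A"], pointers=heap_graph)
garbage = set(heap_graph) - live
```

Note that an object is not made accessible merely by holding a pointer to an accessible object, nor by pointing to itself; only a chain of pointers beginning at the root set counts.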
By copying all accessible objects to a new contiguous block of memory in the heap, and then using the new copy of the objects in place of the old copy, the Stop and Copy garbage collector eliminates all unused memory blocks in the heap. It also "compacts" the memory storage used by the task so that there are no "holes" between accessible objects. Compaction is a desirable property because it puts all of the memory available for allocation to a task in a contiguous block, which eliminates the need to keep track of numerous small blocks of unallocated memory. Compaction also improves virtual memory performance.
FIG. 2 shows a "snap shot" of the Stop and Copy garbage collection process. "Old-space" 50 is the half of the heap 40 which was recently filled up and is now being compacted by copying the accessible objects into "new-space" 52. At the time of this snap shot the copying process has been only partially completed. As shown, new-space 52 is divided into several regions. Regions 54 and 56 both contain objects that have been copied from old-space. The objects in region 54 have already been "scanned", while those in region 56 are "unscanned".
When an object is scanned, all of the pointers in the object are inspected to determine whether they point to objects in new-space or old-space. Pointers to new-space objects need no further processing. Pointers to old-space objects are processed as follows. If the object 58 in old-space referenced by the pointer contains a "forwarding pointer" 60, this means that the referenced object has already been copied into new-space, and the pointer being processed is simply replaced with a copy of the forwarding pointer 60. The resulting pointer points to an object 62 in new-space 52 which is a copy of the object 58 in old-space 50.
If, however, a referenced object 64 in old-space does not contain a forwarding pointer, then a copy 66 of the referenced object 64 must be made in new-space 52, and a forwarding pointer 68 must be placed in the old-space object 64 so that object 64 will not be copied more than once into new-space 52. Note that objects are copied into new-space at the position of the UNSCANNED pointer 70, thereby using up a portion of the unused region 72 of new-space 52. After the object is copied into new-space, the position of the UNSCANNED pointer 70 is adjusted to point to the next available space in the unused region 72.
Stop and Copy garbage collection proceeds by sequentially scanning all of the objects in the unscanned region 56. As each object is scanned, the SCANNED pointer is advanced by one program object. The scanning process continues until there are no objects in the unscanned region 56. Once the scanning process is complete, garbage collection is complete, and the primary task associated with the heap 40 can be resumed.
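The scanning and copying steps described above can be sketched as follows. This is an illustrative model only, not the implementation of FIG. 2: new-space is assumed to be a Python list whose end plays the role of the UNSCANNED pointer 70, the index `scanned` plays the role of the SCANNED pointer, and the class Obj and function stop_and_copy are hypothetical names.

```python
class Obj:
    def __init__(self, name, fields=()):
        self.name = name
        self.fields = list(fields)  # pointers to other Obj instances
        self.forward = None         # forwarding pointer, set once the object is copied


def stop_and_copy(roots):
    """Copy everything reachable from `roots`; return (new_roots, new_space)."""
    new_space = []  # copies are appended here, i.e. at the UNSCANNED position

    def copy(old):
        if old.forward is None:  # not yet copied: copy it and leave a forwarding pointer
            old.forward = Obj(old.name, old.fields)
            new_space.append(old.forward)
        return old.forward       # already copied: reuse the forwarding pointer

    new_roots = [copy(r) for r in roots]
    scanned = 0
    while scanned < len(new_space):  # the unscanned region is new_space[scanned:]
        obj = new_space[scanned]
        obj.fields = [copy(f) for f in obj.fields]  # replace old-space pointers
        scanned += 1
    return new_roots, new_space


c = Obj("c")
b = Obj("b", [c])
a = Obj("a", [b, c])
c.fields.append(a)   # a cycle back to the root object
d = Obj("d", [a])    # inaccessible: no pointer leads to it
new_roots, new_space = stop_and_copy([a])
names = [o.name for o in new_space]
```

Because of the forwarding pointers, object "c" is copied once even though two objects point to it, and the cycle through "a" terminates. Object "d" is never visited, so the work done depends only on the accessible objects.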
After the completion of garbage collection, new objects created by the task are added to the New Object region 76, which is at the end of the unused region 72, at the position of the NEW pointer 74. New-space 52 is filled and a new garbage collection cycle must be started when there is insufficient space in the unused region 72 to store a new program object.
Generally, the new-space copy of a task's accessible objects occupies less space than the old-space copy, because old-space included abandoned, inaccessible objects. After copying the accessible objects into new-space 52, old-space 50 is unused until new-space 52 is completely filled with program objects. At that time, old-space and new-space are "flipped" (i.e., definitions of "old" and "new" space are interchanged), and the garbage collection process resumes.
An attractive property of Stop and Copy garbage collectors is that such collectors can have a running time proportional to the amount of accessible storage. The reason for this is that Stop and Copy collectors only process accessible objects, and ignore inaccessible objects. Thus, for example, if only thirty-five percent of the allocated memory space in the heap 40 is retained during garbage collection, the Stop and Copy collector only processes thirty-five percent of the allocated space.
However, a traditional Stop and Copy garbage collector cannot be used in a real-time computer system because the "latency" of the collector (i.e., the maximum amount of time that the mutator task is interrupted at any one time by the collector) can exceed the requirements of the real-time system. In other words, it is generally not possible to complete a Stop and Copy garbage collection cycle within the maximum latency permitted by a real-time computer system.
In summary, the primary problem with using classical Stop and Copy garbage collectors in real-time computer systems is that the collector stops the other tasks in the computer for an unacceptably long period of time.
Baker's Algorithm. The garbage collection algorithm known as Baker's Algorithm is perhaps the best known real-time garbage collection algorithm. As will be described below, Baker's Algorithm has several major liabilities, including the facts that it is not concurrent and requires the use of specialized hardware in order for it to be implemented efficiently. See H. G. Baker, "List processing in real time on a serial computer", Communications of the ACM, 21(4):280-294, 1978.
Referring to FIG. 2, when new-space 52 fills up, the Baker collector stops the mutator, flips old-space and new-space, but then copies only the root objects into new-space (for example, those referenced by the mutator's registers). It then resumes the mutator immediately. Accessible objects are copied incrementally from old-space 50 to new-space 52 while the mutator executes. In particular, every time the mutator allocates a new object, the collector 22 is invoked to copy a few more objects from old-space (i.e., to scan a few more objects in the unscanned region 56).
In addition, in order to make the garbage collector invisible to the mutator, it is necessary to ensure that the mutator sees only new-space pointers in its registers. To accomplish this, every pointer fetched by the mutator must be checked to see if it points to old-space. If a fetched pointer points to an old-space object, the old-space object is copied to new-space and the pointer is updated; only then is the pointer returned to the mutator. As a result, old-space pointers are replaced with new-space pointers before they can be processed by the mutator, and therefore the mutator only sees new-space objects.
In systems using Baker's garbage collection algorithm, every fetch of a pointer and allocation of a new object is slowed down by a small, bounded amount of time. Thus the latency of the garbage collection (copying) process is low and Baker's algorithm is suitable for real-time applications.
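The two sources of bounded overhead just described, the check on every pointer fetch and the small amount of copying done at every allocation, can be sketched as follows. This is a simplified model under stated assumptions and not Baker's published algorithm in full: each object carries a software tag `space` in place of hardware tags, the parameter scan_per_alloc is a hypothetical tuning knob, and the names BakerHeap, read and allocate are illustrative.

```python
class Obj:
    def __init__(self, name, fields, space):
        self.name = name
        self.fields = list(fields)  # pointers to other Obj instances
        self.space = space          # tag saying which space holds this object
        self.forward = None         # forwarding pointer, set once copied


class BakerHeap:
    def __init__(self, scan_per_alloc=1):
        self.current = 0            # tag of the present new-space
        self.to_space = []          # objects copied from old-space, in copy order
        self.new_objects = []       # objects allocated since the last flip
        self.scanned = 0            # index playing the role of the SCANNED pointer
        self.scan_per_alloc = scan_per_alloc  # assumed bound on work per allocation

    def new(self, name, fields=()):
        obj = Obj(name, fields, self.current)
        self.new_objects.append(obj)
        return obj

    def _copy(self, obj):
        if obj.space == self.current:   # already a new-space object
            return obj
        if obj.forward is None:         # first visit: copy and leave a forwarding pointer
            obj.forward = Obj(obj.name, obj.fields, self.current)
            self.to_space.append(obj.forward)
        return obj.forward

    def flip(self, roots):
        """Start a cycle: copy only the roots, then let the mutator resume."""
        self.current += 1
        self.to_space, self.new_objects, self.scanned = [], [], 0
        return [self._copy(r) for r in roots]

    def read(self, obj, index):
        """Pointer fetch with the check: never return an old-space pointer."""
        obj.fields[index] = self._copy(obj.fields[index])
        return obj.fields[index]

    def allocate(self, name, fields=()):
        """Scan at most scan_per_alloc copied objects, then allocate."""
        for _ in range(self.scan_per_alloc):
            if self.scanned == len(self.to_space):
                break
            obj = self.to_space[self.scanned]
            obj.fields = [self._copy(f) for f in obj.fields]
            self.scanned += 1
        return self.new(name, fields)


heap = BakerHeap(scan_per_alloc=1)
c = heap.new("c")
b = heap.new("b", [c])
a = heap.new("a", [b])
(root,) = heap.flip([a])             # only the root object is copied at the flip
copied_at_flip = len(heap.to_space)
b_new = heap.read(root, 0)           # the fetch check copies "b" on demand
heap.allocate("x")                   # scans the copy of "a"
heap.allocate("y")                   # scans the copy of "b", which copies "c"
copied_names = [o.name for o in heap.to_space]
```

In this sketch the mutator is paused only for the flip, for one object copy per fetch at most, and for at most scan_per_alloc scans per allocation, which is what keeps each interruption short.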
The pointer checking called for by Baker's algorithm requires hardware support to be implemented efficiently. In particular, every pointer in the heap must be tagged with a one-bit or multi-bit flag that identifies old-space pointers and new-space pointers. The tag checking hardware required by the Baker collector inspects the tag associated with each pointer and calls an object copying routine when the inspected pointer references an object in old-space.
It should be noted that Baker's garbage collection algorithm is not concurrent because the mutator stops whenever the collector does a bit of work. Also, implementing a concurrent version of Baker's algorithm on a multiprocessor computer would require fine-grain locking on each object, adding more overhead.
It is noted that Baker's garbage collection algorithm can be implemented on stock hardware at the cost of an extra word per object, an extra memory indirection per object reference, and several extra instructions to change the contents of a cell. See Rodney A. Brooks, "Trading data space for reduced time and code space in real-time garbage collection on stock hardware," SIGPLAN Notices, Proceedings of ACM SIGSOFT/SIGPLAN Software Engineering Symposium on Practical Software Development Environments, pages 256-262, 1984.
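The costs listed above (an extra word per object and an extra indirection per reference) can be illustrated with a minimal sketch. It assumes that the extra word is a forwarding pointer that initially refers to the object itself and is followed unconditionally on every access; the names Obj, deref and relocate are hypothetical and are not taken from the cited paper.

```python
class Obj:
    def __init__(self, value):
        self.value = value
        self.forward = self  # the extra word: initially points at the object itself


def deref(ref):
    """Every access follows the forwarding word: one extra indirection, no test."""
    return ref.forward


def read_value(ref):
    return deref(ref).value


def write_value(ref, value):
    deref(ref).value = value  # writes always land in the current copy


def relocate(old):
    """Collector side: copy the object and redirect the old copy to the new one."""
    new = Obj(old.value)
    old.forward = new
    return new


stale_ref = Obj(10)          # a reference the mutator keeps holding
before = read_value(stale_ref)
new_copy = relocate(stale_ref)
write_value(stale_ref, 42)   # reaches the new copy through the forwarding word
after = read_value(stale_ref)
```

The design choice shown here trades the conditional check on every pointer fetch for an unconditional indirection, so that no tag-checking hardware is needed; the mutator may continue to hold a pointer to the old copy without observing stale data.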
Real-Time Collection Requirements. The most typical requirement of a real-time computer system is that the real-time tasks or mutators in the system must never be interrupted for longer than a very small constant time.
A collector has small "latency" if the interruptions of the mutators are short. An interactive workstation typically requires latencies of less than 0.1 second if collections are not to affect communications, mouse tracking, or animation on the screen.
A garbage collection program or task is said to be "concurrent" if the collector can do its work in parallel with another task (i.e., the mutator). A concurrent collector should allow for multiple mutator threads (processes) and multiple processors. Concurrency is useful even on a single processor computer, because the collector can run while the mutator is waiting for external events such as user input, page faults, and i/o.
A garbage collection program or task is said to be "efficient" if the amortized cost to allocate and collect an object is small compared to the cost of initializing the object.
An algorithm runs on "stock hardware" if it can run on standard commercial computer architectures such as the VAX and the 68000. We assume that any multiprocessor computers used with a concurrent garbage collector have an efficient shared memory.
Shared-memory multiprocessors are becoming widespread, so it is important to find efficient concurrent collection algorithms. With today's technology, the marginal cost of adding extra processors and caches to a machine is small. Most new large mainframe computers are multiprocessors, and it has been shown that it is also economical to build multiprocessor workstations. See C.P. Thacker, L.C. Stewart, and E.H. Satterthwaite, Jr., "Firefly: A Multiprocessor Workstation," Research Report 23, Digital System Research Center, Dec. 30, 1987.
Synchronization ensures that objects in the heap are not referenced simultaneously by the garbage collector and a mutator. Fine-grained synchronization between the collector and the mutator is a problem for concurrent collectors because fine-grained synchronization either requires special hardware (which is expensive), or it requires extra instructions to be executed by the mutator and collector, which negatively impacts the speed of operation of the mutator. The present invention solves this problem by providing a less expensive medium-grained synchronization.