1. Field of the Invention
This invention relates generally to digital computers, and more particularly to methods and systems for optimizing the efficient use of digital computer memories.
2. Description of the Prior Art
Large digital computer systems often employ specialized hardware and/or software routines to optimize memory system characteristics or performance. Among these routines are two known as garbage collection and virtual memory.
The goal of garbage collection is to recover usable address space that is no longer being utilized by the application program. By increasing data density via garbage collection, system performance is increased by lowering memory access time, and cost is decreased by reducing required primary (e.g. RAM) and secondary memory (e.g. disk) capacity.
Virtual memory permits a user to access secondary memory as if it were primary memory. As such, virtual memory frees the user from concerns as to the size limitations of the primary memory.
Although both garbage collection and virtual memory free the user of unwanted work, they do so at the cost of additional computation. Unfortunately, the computations required for garbage collection are generally incompatible with the computations required for virtual memory, resulting in poor interaction between the two routines. The reason for this is that virtual memory minimizes its computational load by exploiting a general observation about program execution: instruction and data references tend to be localized to reasonably small areas of the address space, while garbage collection saves the user effort by searching through every data object in the system to locate those which are no longer in use. Consequently, garbage collection violates the foundation upon which virtual memory depends. A user pays for this poor interaction between garbage collection and virtual memory with long interruptions in computation during which memory is being moved back and forth between primary and secondary memories in a process known as "thrashing".
For the following discussion of garbage collection and virtual memory techniques, it will be useful to define some terminology. "Data memory" will refer to the area where essentially all data or data objects reside. "Data objects" are the structures containing the actual information to be processed by the program. Data objects can be stored in the execution stack(s) and in processor registers as well as in data memory, such as the system RAM. Examples of data objects are Lisp CONS cells, vectors, or bignums.
"References" identify data objects and provide a uniform and efficient way for handling them. Efficiency can be gained by passing around constant-sized references rather than arbitrary-sized data objects.
Data memory is broken into a dynamic area and a static area. The dynamic area will be called the "heap". For the purposes of this paper, a heap will always be considered to be a reasonably large contiguous area of memory. Storage in the heap is allocated for new data objects as the user's application program creates them.
The static area of data memory will be called the "base set". In a Lisp system, the base set consists of data objects referred to by code or objects such as symbols which must always be available. Data objects and references contained in the stack(s) and registers are usually also considered part of the base set.
A data object is considered to be "live" as long as it can be reached through some chain of references originating in the base set. The storage associated with live data objects must not be reclaimed. Only data objects in the heap are subject to reclamation; data objects in the base set can never die.
A "minimal base set" will be used to denote only the set of references from the base set into the heap. References which do not point into the heap, and self-contained data in the base set, are not considered part of the minimal base set. Duplicate references from the base set into the heap constitute separate entries in the minimal base set.
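As an illustration only, the definition of the minimal base set can be sketched in Python. The slot encoding, the `HEAP_START` boundary and the function name `minimal_base_set` are assumptions made for this sketch and are not taken from any particular system.

```python
# Hypothetical model: each base-set object is a list of slots. A slot is
# either plain data or a ("ref", address) pair. Addresses at or above
# HEAP_START are assumed to lie in the heap.
HEAP_START = 1000

def minimal_base_set(base_set):
    """Collect every reference from the base set into the heap.

    Duplicate references are kept, since each is a separate entry.
    """
    entries = []
    for obj in base_set:
        for slot in obj:
            is_ref = isinstance(slot, tuple) and slot[0] == "ref"
            if is_ref and slot[1] >= HEAP_START:
                entries.append(slot[1])
    return entries

base_set = [
    [42, ("ref", 1008)],              # self-contained data plus a heap reference
    [("ref", 16)],                    # reference that stays inside the base set
    [("ref", 1008), ("ref", 1024)],   # a duplicate reference and a new one
]
print(minimal_base_set(base_set))     # [1008, 1008, 1024]
```

The self-contained datum and the reference that does not point into the heap are skipped, while the duplicate reference to address 1008 appears twice.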
There are a number of classical garbage collection schemes in existence. While they vary in complexity and effectiveness, all garbage collection schemes include the steps of: (1) identifying the live and/or dead data objects; and (2) recovering the storage associated with the dead data objects.
One type of garbage collection is known as "reference counting". Reference counting systems attempt to identify the point at which the data object is no longer live by noting when the last reference to the data object is lost. This is accomplished by incrementing a counter, the "reference count," associated with each data object each time a reference to it is created, and decrementing the counter each time a reference is lost. References are created and lost through assignment or the creation and destruction of environments. When the reference count reaches zero, the data object is no longer referenced and its storage may be reclaimed.
Reference counting has the desirable feature that garbage collection is accomplished at the earliest possible moment. Furthermore, the reclamation is spread out through time rather than being lumped into a single long, and potentially disruptive, operation. On the negative side, reference counting cannot reclaim dead data objects linked into a circular structure nor data objects whose reference counts have overflowed. Also, the computational effort expended with reference counting is proportional to the number of dead data objects in the system's memory, placing large demands upon the system CPU.
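The counting discipline, and its failure on circular structures, can be sketched as follows. This is a minimal model under stated assumptions: the class `RcObj` and the helpers `add_ref`, `drop_ref` and `rc_store` are hypothetical names, and counter overflow is not modeled.

```python
class RcObj:
    def __init__(self, name):
        self.name = name
        self.count = 0      # the reference count
        self.fields = {}    # outgoing references, keyed by field name

reclaimed = []              # names of objects whose storage was reclaimed

def add_ref(target):
    if target is not None:
        target.count += 1

def drop_ref(target):
    if target is None:
        return
    target.count -= 1
    if target.count == 0:
        # Last reference lost: reclaim the object and release what it referenced.
        reclaimed.append(target.name)
        for child in target.fields.values():
            drop_ref(child)
        target.fields.clear()

def rc_store(obj, field, target):
    """Assign obj.fields[field] = target while keeping the counts up to date."""
    add_ref(target)                     # increment first, so re-storing the same target is safe
    drop_ref(obj.fields.get(field))
    obj.fields[field] = target

# A chain a -> b: dropping the only outside reference to a frees both.
a, b = RcObj("a"), RcObj("b")
add_ref(a)                              # stands in for a reference from the base set
rc_store(a, "next", b)
drop_ref(a)

# A cycle c -> d -> c: the counts never reach zero, so nothing is reclaimed.
c, d = RcObj("c"), RcObj("d")
add_ref(c)
rc_store(c, "next", d)
rc_store(d, "next", c)
drop_ref(c)

print(reclaimed)                        # ['a', 'b']
print(c.count, d.count)                 # 1 1
```

After the last outside reference to `c` is dropped, both members of the cycle still hold a count of one, so their storage is never recovered.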
Another garbage collection scheme is known as the "mark and sweep" collector. With reference to FIG. 1, storage is allocated from a "free list", i.e. a list of free storage spaces in the heap, until some minimum threshold of available storage is reached. Upon reaching this threshold, the user computation is stopped and garbage collection begins (see FIG. 1a). The base set is first traversed to find all data objects that are recursively reachable. When an object in the heap is encountered for the first time, it is marked. Then all objects in the heap which are recursively reachable from the object are traversed and marked, if not already marked. Marking usually involves setting a bit associated with the data object to indicate that it is live. FIG. 1b shows the state of the mark bits after the bottom element of the base set and those data objects recursively reachable from it have been traversed and marked. Should a marked object be encountered later through some other chain of references, the fact that it is marked will prevent further effort from being expended in traversing the object and those reachable from it. When all reachable objects have been marked, the mark phase ends and the sweep phase begins, as illustrated in FIG. 1c.
At this point in the mark and sweep garbage collection scheme, it is known that all live objects in the heap have been marked and that any unmarked objects are dead. The sweep phase recovers the storage used by dead data objects by linearly sweeping the heap looking for unmarked data objects. When a live object is encountered, the mark associated with it is cleared. When a dead object is encountered, the storage associated with it is placed on a free list. At the end of the sweep, the garbage collection is complete and the user computation may be resumed (FIG. 1d).
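The two phases can be sketched as follows. The heap layout (a dictionary from address to the list of addresses an object references) and the name `mark_and_sweep` are assumptions of this sketch. An explicit stack is used in place of recursion, and a set stands in for the mark bits.

```python
def mark_and_sweep(heap, base_set, free_list):
    """heap: dict mapping address -> list of referenced addresses.
    base_set: heap addresses referenced from the base set.
    free_list: list that receives the addresses of reclaimed objects.
    """
    marked = set()                  # stands in for the per-object mark bits

    # Mark phase: traverse everything reachable from the base set.
    stack = list(base_set)
    while stack:
        addr = stack.pop()
        if addr in marked:
            continue                # already marked: no further effort is spent
        marked.add(addr)
        stack.extend(heap[addr])

    # Sweep phase: one linear pass over the heap in address order.
    for addr in sorted(heap):
        if addr in marked:
            marked.discard(addr)    # clear the mark for the next collection
        else:
            del heap[addr]          # dead object: its storage goes on the free list
            free_list.append(addr)
    return free_list

heap = {0: [8], 8: [0, 16], 16: [], 24: [32], 32: [24]}
free_list = mark_and_sweep(heap, base_set=[0], free_list=[])
print(free_list)                    # [24, 32]
print(sorted(heap))                 # [0, 8, 16]
```

The objects at addresses 24 and 32 reference each other but are unreachable from the base set, so the sweep reclaims them. A reference counting scheme would not recover this circular structure.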
The benefits of a mark and sweep collector are that no dead structure can survive a garbage collection, data need not be moved (which may be important in some systems), and that the entire space allocated to the heap can be used for data. A negative aspect of mark and sweep collectors is that memory fragmentation can become a serious problem unless compaction or elaborate multi-space allocation is done. Also, compaction must be an atomic operation since the data memory is inconsistent during that time (barring complex schemes which will degrade system performance) and, because of the multiple garbage collection phases, accesses of the same location are guaranteed to occur at substantially disjoint points in time. This can have substantial bearing on the virtual memory performance.
Yet another garbage collection scheme is known as "stop and copy", which works by copying live data objects from the space being collected into an unused space. With reference to FIG. 2, this scheme "divides" the heap into two equally-sized spaces. All allocation takes place from one of the spaces, known as "FromSpace". The second space, known as "ToSpace", remains empty until garbage collection begins. When there is no more room in FromSpace, the user computation is stopped and garbage collection begins (FIG. 2a). The base set is traversed looking for references into FromSpace. When one is found, a check is made to see if there is a "forwarding pointer" where the data object should be. A forwarding pointer is used by the collector to indicate the new location of a moved data object. If there is a forwarding pointer, the original reference is updated to point where specified by the forwarding pointer and the base set traversal is continued. If the data object is still in FromSpace, it is copied to the next available location in ToSpace, a forwarding pointer is placed at the old location in FromSpace to indicate where the data object was copied to, and the original reference is updated to reference the moved data object (FIG. 2b).
When the base set has been fully traversed, ToSpace can be scanned linearly looking for references from copied objects into FromSpace. As references are found into FromSpace, the data objects are copied to ToSpace (if they have not already been copied) and the reference is updated to reflect the new location. Since ToSpace is scanned from the same end that allocation first occurred and the scan is in the direction of new allocation, data objects copied during the scan will always be placed at the end of ToSpace where scanning has yet to happen. When the scanning pointer reaches the new allocation pointer, the garbage collection is complete (FIG. 2c).
There should no longer be any reference into FromSpace; all that remain are dead objects and forwarding pointers. FromSpace and ToSpace are interchanged and the user computation can continue. The interchange of the spaces reclaims all space in FromSpace for use as ToSpace during the next collection (FIG. 2d). The classical version of the algorithm described permits only half of the available dynamic data memory to be allocated before a garbage collection must take place; the second half of the memory remains idle and empty, waiting for copying to occur.
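A compact sketch of the copying traversal described above is given below. The object layout, the `("ref", address)` and `("fwd", address)` encodings and the name `stop_and_copy` are assumptions made for illustration. ToSpace is modeled as a list whose length plays the role of the allocation pointer.

```python
def stop_and_copy(from_space, roots):
    """from_space: dict mapping address -> list of fields.
    A field is either plain data or a ("ref", address) pair.
    roots: FromSpace addresses held in the base set.
    Returns (to_space, new_roots); to_space is indexed by new address.
    """
    to_space = []                   # allocation pointer is len(to_space)

    def forward(addr):
        obj = from_space[addr]
        if isinstance(obj, tuple) and obj[0] == "fwd":
            return obj[1]           # already moved: follow the forwarding pointer
        new_addr = len(to_space)
        to_space.append(list(obj))  # copy to the next free location in ToSpace
        from_space[addr] = ("fwd", new_addr)   # leave a forwarding pointer behind
        return new_addr

    # Traverse the base set, updating each reference.
    new_roots = [forward(addr) for addr in roots]

    # Scan ToSpace linearly until the scan pointer meets the allocation pointer.
    scan = 0
    while scan < len(to_space):
        obj = to_space[scan]
        for i, field in enumerate(obj):
            if isinstance(field, tuple) and field[0] == "ref":
                obj[i] = ("ref", forward(field[1]))
        scan += 1
    return to_space, new_roots

from_space = {
    100: [1, ("ref", 200)],
    200: [2, ("ref", 100)],
    300: [3],                       # unreachable, so it is never copied
}
to_space, new_roots = stop_and_copy(from_space, roots=[100, 100])
print(to_space)                     # [[1, ('ref', 1)], [2, ('ref', 0)]]
print(new_roots)                    # [0, 0]
```

The second root finds a forwarding pointer at address 100 and is simply updated. The object at address 300 is left behind in FromSpace, whose storage is reclaimed when the spaces are interchanged.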
A virtual memory expense that might be excessive in the stop and copy scheme is that of having to actually access the heap when a reference to FromSpace is encountered. The least predictable accessing in garbage collection is that involved with marking or checking to see if a data object needs to be copied. If there were only one reference in existence for each data object, the same number of heap references would be required for either scheme. However, there is often more than one reference to a data object. In the mark and sweep scheme, a small table of mark bits can be used to avoid actually making accesses to a large heap for any but the first reference mark encountered. The stop and copy scheme essentially requires making these fairly random heap accesses to obtain the forwarding information.
The stop and copy scheme does have several advantages. Its implementation is quite simple, as there are no tricky tables to build and no elaborate techniques necessary to achieve recursions without using large amounts of space. In the simplest case, in which the base set is a single contiguous block, a linear scan of it followed by a linear scan of ToSpace will accomplish the entire traversal of all live objects.
Another nice feature of this scheme in virtual memory is that each reference is touched only once. While a compacting mark and sweep collector needs to access each reference once to initiate marking and once to relocate the reference after compaction has occurred, the stop and copy scheme combines these phases into a single pass, potentially reducing virtual memory paging by a factor of two.
Another aspect of the stop and copy scheme which has added to its attractiveness is its ability to be converted into an incremental algorithm. Since the heap can be guaranteed to be consistent by making the run-time systems cognizant of forwarding pointers and by making the copying of data objects and storing of forwarding pointers atomic, garbage collection can be distributed over long periods.
Prior art garbage collection schemes such as those described above tend to be slow and disruptive. Interruptions due to non-reference counted garbage collection are viewed as unwelcome distractions that impede progress and destroy trains of thought. As such, much research has been conducted into ways for reducing the time spent in garbage collection.
A logical starting point for speeding up garbage collection is to compress key information into small tables and to make memory accesses more orderly. For example, allocating objects in the base set such that they are physically contiguous (i.e. they may be linearly rather than randomly scanned) and using compact tables of mark bits rather than mark bits spread through memory can improve virtual memory performance.
Modifications such as those mentioned above can result in substantial improvements, but do not address a key problem. Garbage collectors historically must deal with, at a minimum, every live object in the system during a collection. This includes all the data structures used by those programs which were created but rarely, if ever, used. Requiring the collector to access and possibly move all this relatively stable structure takes time that would be better spent on user computation. In a worst case scenario, it is possible that the entire user computation must be moved out of primary memory to make room for the relatively stable structure which must be traversed during garbage collection.
Another approach to speeding up garbage collection involves the real or apparent reduction in size of the base set and/or heap. It is readily apparent that garbage collection in a system with a small base set and heap would be faster than garbage collection in a system with a large base set and heap. In fact, if the total space occupied by the base set and the heap is less than the primary memory size, it might be possible to accomplish the entire garbage collection routine without any virtual memory paging, resulting in enormous performance gains. Pragmatically, however, reducing the size of data memory available in the system is not popular with most users, particularly since reduced data space is already considered to be a disadvantage of the stop and copy garbage collection approach. A solution, then, is to make the base set and heap appear to be smaller to the garbage collector, but not to the user.
Recent work has suggested that effort spent in garbage collection can be reduced by making use of two simple heuristics which depend upon knowledge of the age of an object: (1) The mortality rate among newly created objects is substantially higher than that of older objects; and (2) References contained in an object tend to identify previously created objects.
The first heuristic leads to a reduction in effort by reducing the size of the heap to be collected. The reduction results from noting that newly allocated areas tend to yield the highest ratio of dead to live objects. By focusing the effort on regions of newly allocated objects where the mortality rate is expected to be high, the size of the heap is effectively lessened. The older portion of the heap can be temporarily absorbed into the base set. Only dead data objects in the newer part of the heap are reclaimed; dead objects that are contained in the older part of the heap are not reclaimed and may artificially prolong the life of some objects in the newer part.
Tests indicate that a 30% improvement is experienced with both compacting mark and sweep and stop and copy garbage collectors when modified as described above. A first reason for this speed-up is that the reduction in size of the collectable heap makes it possible to avoid much of the marking or copying that would normally happen. The entire base set is to be traversed, so pointers into the base set are not recursively followed. Thus, no effort is spent following pointers into the stable part of the heap; the stable heap will be traversed as part of the base set. A second reason for the speed-up is that the randomness of access has been reduced substantially. Random accesses into the stable heap dictated by the base set are replaced by a linear sweep of the stable heap. Basically, work has been reduced by replacing decision (involving which objects are live in the stable heap) with definition, and by imposing order on an otherwise chaotic accessing pattern.
On the negative side, it is still necessary to look through the entire base set to find references into the reduced heap. The price being paid is that of looking through every single data object that could ever be used by any program that was ever loaded into the system.
The second heuristic says that the base set can be reduced. This is done by noting that when data objects are created they are initialized to reference other data objects which already exist. It is quite hard to reference something which has yet to be created. Therefore, there is a tendency for references to point backward in time. Of course, it is possible for an old object to be modified to reference a newer object, but empirically that is the exception to the rule. Thus, to garbage collect a particular portion of the heap, it is only necessary to look for references in newer parts of the heap and in places where older parts of the heap have been modified to reference the newer parts.
Very few of the variations of garbage collectors discussed so far preserve information regarding order of creation (i.e. relative age) of objects across a collection. Using the heuristics requires having some knowledge of this age information.
It is useful to differentiate more than old and new objects; there needs to be multiple age groups. In one scheme it is suggested that the heap be broken into many smaller subheaps, each of which is associated with an "age". By maintaining a table (or entry vector) of older locations which have been modified to point into each younger subheap, the base set for a given subheap can be reduced to the entry vector plus younger subheaps.
Unfortunately, scanning newer areas of the heap to find references into older areas of the heap means that a substantial percentage of the objects scanned will be dead objects that are treated as live. Based on the heuristics, under normal circumstances it would be unwise to do a collection of some middle-aged subheap since it is the youngest subheaps that contain the bulk of the reclaimable space. If a subheap other than the youngest is to be collected, probably the best approach is to collect it at the same time younger areas are being collected.
Another scheme known as "generation scavenging" breaks the heap into several logical (not physical) subheaps or "generations". Each reference contains a "generation tag" which identifies the age of the data object. When a reference to a newer generation is stored in an older generation data object, the location of the older data object is added to an entry vector known as the "remembered set". At garbage collection time, the remembered set acts as the base set for the younger generations. By reducing the size of the base set which needs to be searched, the amount of paging that needs to take place will, in all likelihood, be reduced. If garbage collections occur frequently enough, the base set to be searched can be contained totally within the memory resident working set.
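The bookkeeping behind the remembered set can be sketched as follows. This is a simplified model under stated assumptions: the generation is stored on the object rather than in a tag within each reference, a higher number is taken to mean an older generation, and the names `GenObj`, `store_reference` and `roots_for_young_collection` are hypothetical.

```python
class GenObj:
    def __init__(self, generation):
        self.generation = generation    # assumed encoding: higher number = older
        self.fields = {}

remembered_set = set()

def store_reference(obj, field, target):
    """Store a reference, recording older-to-newer stores in the remembered set."""
    obj.fields[field] = target
    if target is not None and target.generation < obj.generation:
        remembered_set.add(obj)

def roots_for_young_collection(base_refs):
    """References treated as the base set when only generation 0 is collected.

    base_refs stands in for references held in the stack(s) and registers.
    """
    roots = [r for r in base_refs if r.generation == 0]
    for old in remembered_set:
        roots.extend(t for t in old.fields.values()
                     if t is not None and t.generation == 0)
    return roots

old = GenObj(generation=2)
young = GenObj(generation=0)
other = GenObj(generation=0)

store_reference(old, "x", young)        # older object now references a newer one
store_reference(young, "y", other)      # newer-to-newer store: nothing is recorded

print(remembered_set == {old})                         # True
print(roots_for_young_collection([]) == [young])       # True
```

Only `old` needs to be searched at collection time. The remainder of the older generations is never examined, which is the source of the reduced paging described above.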
Generation scavenging employs an interesting memory layout to help minimize paging. The dynamic memory is broken into an "old area" and a "new area". Objects are always created in the new area. When an object has survived for a sufficiently long time, it is moved from the new area to the old area. Old area objects are considered fairly stable. Only occasional garbage collection is performed on objects in the old area; during normal execution only the new area is collected. By moving objects to the old area, the heap size can be reduced.
The new area of the heap can be broken into three areas: "NewSpace", "PastSurvivorSpace", and "FutureSurvivorSpace". PastSurvivorSpace and FutureSurvivorSpace perform much the same functions as FromSpace and ToSpace in the stop and copy collector. During a collection, live objects in PastSurvivorSpace are moved to FutureSurvivorSpace. When the collection is complete, the two spaces are switched in preparation for the next collection.
NewSpace is used to avoid excessive virtual memory paging. Rather than allocate new objects in PastSurvivorSpace as would be done in a classic stop and copy scheme, all new objects are allocated in NewSpace. By keeping the new allocation local to NewSpace, the pages always used for new allocation stand a good chance of remaining memory resident. It is not necessary to move the allocation area back and forth as PastSurvivorSpace and FutureSurvivorSpace are switched following collections.
During a collection, live objects in NewSpace are moved to FutureSurvivorSpace. By careful layout of the area for new allocation, generation scavenging thus avoids gratuitous paging and permits the partial collections to be run unnoticed during interactive sessions.
References contain the age information in a generation tag, so data objects of different generations may be freely mixed in PastSurvivorSpace without fear of losing track of the age of any individual object. When an object survives a given number of collections, its generation data is modified to indicate that it has gotten older. When the tag shows that the object is old enough, it is moved to the old area in a process known as "tenuring".
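The movement of objects between NewSpace, the survivor spaces and the old area can be sketched as follows. The threshold `TENURE_AGE`, the `(object, age)` pair representation and the `is_live` predicate, which stands in for the actual reachability trace, are assumptions of this sketch. The text above does not specify how many collections an object must survive.

```python
TENURE_AGE = 3      # assumed threshold, not taken from the text

def scavenge(new_space, past_survivors, old_area, is_live):
    """Run one collection of the new area.

    new_space, past_survivors: lists of (obj, age) pairs, where age is the
    number of collections the object has survived.
    is_live: predicate standing in for the reachability trace.
    Returns FutureSurvivorSpace, which becomes the next PastSurvivorSpace.
    """
    future_survivors = []
    for obj, age in new_space + past_survivors:
        if not is_live(obj):
            continue                        # dead objects are simply not copied
        age += 1                            # the object has survived one more collection
        if age >= TENURE_AGE:
            old_area.append(obj)            # tenured: moved to the old area
        else:
            future_survivors.append((obj, age))
    new_space.clear()                       # NewSpace is reused for new allocation
    return future_survivors

new_space = [("a", 0), ("b", 0)]
past_survivors = [("c", 2), ("d", 1)]
old_area = []
live = {"a", "c", "d"}

past_survivors = scavenge(new_space, past_survivors, old_area, live.__contains__)
print(past_survivors)                       # [('a', 1), ('d', 2)]
print(old_area)                             # ['c']
print(new_space)                            # []
```

Objects of different ages sit side by side in the survivor list, which mirrors the observation that generations may be freely mixed in PastSurvivorSpace because the age travels with each entry.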
Another method to increase performance of garbage collection is known as the "ephemeral collector". Ephemeral collection is an incremental scheme which utilizes hardware to maintain the entry vectors and to decrease the effort necessary to deal with the non-uniformity resulting from the interleaving of the user computation (mutator) with the garbage collector (transporter and scavenger).
Much like generation scavenging, ephemeral collection is based upon a two space copying scheme modified to deal with reduced heap and base set size. The ephemeral collector supports two categories of dynamic data (dynamic and ephemeral) and a single category of static data. Dynamic objects can be viewed as either the tenured objects of generation scavenging, or as a third, intermediate lifetime generation.
Within the category of ephemeral, or short lifetime objects, are several "levels" which indicate the age of an object. The level of an object is derivable from its address through the use of a table; no bits in the reference are needed to encode the age. As an object survives collections it moves to higher levels until it is finally moved to the category of a dynamic object. The ephemeral collector provides for fine grained control over which levels are being collected at any time.
Hardware is used to maintain the entry vector for the ephemeral regions. When a store occurs, the hardware (the "barrier") which monitors the bus looks at the value being stored. If the tag of the data indicates that the data is a reference, and from looking at the address contained in the reference it is known that this reference is into an ephemeral area, then a mark is made in a page table (GCPT) to indicate that the page being written contains a reference into ephemeral space. Should it be necessary to remove a page from physical memory to make room for another page, the system software searches the page looking for references in ephemeral space. If any are found, the page and level of ephemeral object referenced are recorded in a B* tree (ESRT).
The GCPT and ESRT define the base set to be used during collection of any level of ephemeral object. The difference between the remembered set of generation scavenging and the GCPT and ESRT of the Ephemeral collector is in the way they are maintained (in-line code vs. hardware) and in the detail of information being kept. The remembered set contains specific objects in the base set. The GCPT and ESRT identify an object according to the page containing it. Because entire pages can be scanned quickly, the detail of exactly which object contains the possible reference of interest is not believed to be necessary.
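The page-granularity bookkeeping can be modeled in software as follows. This sketch is purely illustrative: in the ephemeral collector the marking is done by hardware monitoring the bus, the ESRT for paged-out pages is not modeled here, and `PAGE_SIZE`, the `EPHEMERAL` address range and the function names are assumptions.

```python
PAGE_SIZE = 256                     # assumed page size, in words
EPHEMERAL = range(4096, 8192)       # assumed address range of ephemeral space

memory = {}                         # address -> stored value
gcpt = set()                        # numbers of pages marked by the barrier

def is_ephemeral_ref(value):
    return (isinstance(value, tuple) and value[0] == "ref"
            and value[1] in EPHEMERAL)

def barrier_store(addr, value):
    """Store a value, which is either plain data or a ("ref", address) pair."""
    memory[addr] = value
    if is_ephemeral_ref(value):
        gcpt.add(addr // PAGE_SIZE)  # mark the page being written

def ephemeral_roots():
    """Scan only the marked pages for references into ephemeral space."""
    roots = []
    for page in sorted(gcpt):
        start = page * PAGE_SIZE
        for addr in range(start, start + PAGE_SIZE):
            value = memory.get(addr)
            if is_ephemeral_ref(value):
                roots.append(value[1])
    return roots

barrier_store(10, ("ref", 5000))    # reference into ephemeral space: page 0 is marked
barrier_store(300, 7)               # plain data: no mark
barrier_store(600, ("ref", 100))    # reference outside ephemeral space: no mark
barrier_store(20, ("ref", 4100))    # page 0 again

print(gcpt)                         # {0}
print(ephemeral_roots())            # [5000, 4100]
```

The table records only that page 0 may contain references of interest, not which objects hold them, so the whole page is scanned at collection time. This is the trade-off in detail, relative to the remembered set, noted above.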
The suggestions discussed above have all pointed out ways to increase performance of garbage collection. They are all based on reducing the heap to be collected and the base set to be traversed. Each of these schemes promises a solution to the problem of garbage collector performance, but there is a price to be paid. The ephemeral collector requires new hardware, namely the barrier and the GCPT, to be present in a tagged architecture to record references from region to region. Generation scavenging has been shown effective in systems running on conventional hardware, but requires a new tag in each reference. The cost in additional tag handling and reduced address range, along with the effort needed to maintain the remembered set, may make generation scavenging impractical for many new or existing implementations.
The following references are considered to be of interest as representative background art:
(1) O. Babaoglu and W. Joy, "Converting a Swap-Based System to do Paging in an Architecture Lacking Page-Referenced Bits", Proceedings of the Eighth Symposium on Operating Systems Principles, Pacific Grove, Calif., 1981, 78-86.
(2) H. Baker, "List Processing in Real Time on a Serial Computer", Communications of the ACM, Vol. 21, 4 (April 1978), 280-294.
(3) S. Ballard and S. Shirron, "The Design and Implementation of VAX/Smalltalk-80", in Smalltalk-80: Bits of History, Words of Advice, G. Krasner (editor), Addison Wesley, 1983, 127-150.
(4) C. Cheney, "A Nonrecursive List Compacting Algorithm", Communications of the ACM, Vol. 13, 11 (November 1970), 677-678.
(5) D. Clark and C. Green, "An Empirical Study of List Structure in Lisp", Communications of the ACM, Vol. 20, 2 (February 1977), 78-87.
(6) P. J. Denning, "The Working Set Model for Program Behavior", Communications of the ACM, Vol. 11, 5 (May 1968), 323-333.
(7) P. J. Denning, "Virtual Memory", Computing Surveys, Vol. 2, 3 (September 1970), 153-189.
(8) L. P. Deutsch and D. Bobrow, "An Efficient Incremental Automatic Garbage Collector", Communications of the ACM, Vol. 19, 9 (September 1976), 522-526.
(9) R. Fenichel and J. Yochelson, "A LISP Garbage-Collector for Virtual-Memory Computer Systems", Communications of the ACM, Vol. 12, 11 (November 1969), 611-612.
(10) J. Foderaro and R. Fateman, "Characterization of VAX Macsyma", Proceedings of the 1981 ACM Symposium on Symbolic and Algebraic Computation, Berkeley, Calif., 1981, 14-19.
(11) H. Lieberman and C. Hewitt, "A Real-Time Garbage Collector Based on the Lifetimes of Objects", Communications of the ACM, Vol. 26, 6 (June 1983), 419-429.
(12) D. Moon, "Garbage Collection in a Large Lisp System", ACM Symposium on Lisp and Functional Programming, Austin, Tex., 1984, 235-246.
(13) P. Rovner, On Adding Garbage Collection and Runtime Types to a Strongly-Typed, Statically-Checked, Concurrent Language, CSL-84-7, Xerox PARC, Palo Alto, Calif., 1985.
(14) G. Steele, Common Lisp, The Language, Digital Press, 465 pp., 1984.
(15) P. Steenkiste and J. Hennessy, "LISP on a Reduced-Instruction-Set-Processor", Proceedings of the 1986 ACM Conference on Lisp and Functional Programming, Cambridge, Mass., 1986, 192-201.
(16) G. Taylor, P. Hilfinger, J. Larus, D. Patterson, and B. Zorn, "Evaluation of the SPUR Lisp Architecture", Proceedings of the Thirteenth Symposium on Computer Architecture, Tokyo, Japan, 1986, 444-452.
(17) D. Ungar, The Design and Evaluation of a High Performance Smalltalk System, PhD. Thesis, UC Berkeley, UCB/CSD 86/287, March 1986.
(18) J. White, "Address/Memory Management for a Gigantic LISP Environment or, GC Considered Harmful", Conference Record of the 1980 LISP Conference, Redwood Estates, Calif., 1980, 119-127.