1. Field of the Invention
The present invention pertains to memory management systems for symbolic digital processors. More particularly, the invention pertains to systems for reclaiming for active use by the symbolic digital processor those portions of memory that are occupied by various forms of data and instructions that are obsolete and are no longer used by the symbolic digital processor. Because such systems are used to remove the data and instructions that are no longer needed, they are often referred to as "garbage collectors".
2. Prior Art
Digital processors that implement symbolic languages such as LISP, PROLOG and SMALLTALK utilize the concept of an "object". The term "object" means an associated collection of information contained in memory which the digital processor treats as a single entity. The object may consist of one piece of data, i.e. a number, it may consist of many pieces of data in the form of a list, it may contain code, i.e. a sequence of processor instructions, and it may contain pointers to other objects. An object can contain combinations of all of these various items. In a typical application that is programmed in an object-oriented language, an object will contain both data and the functions or procedures that operate on such data. The important characteristic of the object is that the digital processor is able to recognize the object and to treat it as a single entity, which entity may include within it pointers to other objects in memory.
In its simplest form, a pointer to an object is a memory location that contains the address of the location elsewhere in memory at which the object pointed to is located. Because an object may contain within it pointers to other objects, a particular object may be but one of a series of objects in a sequence or chain of objects that originate from a set of base objects. The sequence of objects may even close upon itself in that a pointer located in an object in the sequence may point to an object that appears earlier in the sequence or "chain" of objects.
In the implementation of a symbolic language such as LISP, PROLOG, SMALLTALK, OBJECTIVE C or C++, the digital processor, referred to here as the "operating processor", creates numerous objects in memory. The operating processor accesses the objects by means of the chains or sequences of pointers, all of which originate from within a collection of pointers and/or objects containing pointers, which collection is denoted the "root set". Languages such as SMALLTALK contain a standard library of objects from which all other objects are created. This standard library of objects is the "root set". In some other languages, such as C++, the programmer must first create a library of objects that together comprise the root set. For the purpose of this invention the important point is that, in operation, the root set of objects is fixed and unchanged, except that the pointers in the root set of objects may be altered so as to point to new objects created by the operating processor. Furthermore, all objects outside of the root set are accessible to the operating processor only by means of a chain or sequence of pointers that originate in the root set of objects. Because each object in such a "chain" of objects can, itself, point to more than one object, the objects that are accessible by the operating processor may be thought of as points on the limbs or branches of trees that represent the interconnecting pointers and that arise from the root set of objects.
In the computational process, the operating processor creates and alters (mutates) numerous objects in memory and creates, alters and destroys the pointers to these objects. Although during the computational process the operating processor may alter the location pointed to by a pointer contained in the root set, the operating processor does not otherwise alter the objects in the "root set". (Although the operating processor could be programmed to alter the objects in the root set, such a program would not arise in a realistic application.) In the normal course of the computational process, many objects that have been created by the operating processor and which occupy space in memory are no longer needed. Such obsolete objects are referred to as "garbage" and the identification and removal of such objects is known as "garbage collection".
Various garbage collection schemes exist which utilize the fact that only those objects that can be accessed by the operating processor through a chain of pointers that originate in the root set need be retained in memory. Those objects that can no longer be accessed via such chains of pointers from the root set have become obsolete and are "garbage", and the memory that was occupied by these obsolete objects may be reclaimed for the storage of new objects.
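The reachability criterion described above can be illustrated with a minimal sketch. The dictionary model of memory, the function name reachable and the object names are assumptions made for illustration only and do not appear in the prior art described here.

```python
def reachable(heap, roots):
    """Return the set of object names reachable from the root set.

    heap is a hypothetical model of memory: it maps each object name
    to the list of object names that the object points to.
    """
    seen = set()
    pending = list(roots)
    while pending:
        obj = pending.pop()
        if obj in seen:
            continue  # already visited; this also stops on closed chains
        seen.add(obj)
        pending.extend(heap[obj])
    return seen


heap = {
    "root": ["a"],
    "a": ["b"],
    "b": ["a"],  # the chain closes upon itself
    "c": ["d"],  # no chain of pointers from the root set reaches c or d
    "d": [],
}

live = reachable(heap, ["root"])
garbage = set(heap) - live  # objects whose memory may be reclaimed
```

In this sketch the objects c and d are garbage even though c still holds a pointer to d, because neither can be reached from the root set.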
One of the classical methods of garbage collection is called mark and sweep. With modifications, this algorithm is used in many modern LISP implementations. In this method the computational process, i.e. the processing or mutation of objects by the operating processor, is suspended periodically so that the operating processor may be used to identify and remove from memory those objects that have become obsolete.
There are two phases to the algorithm: the mark phase and the sweep phase. In the mark phase, beginning at the root set of objects, the pointers are followed along every "branch" of the tree. Every object that is encountered is marked by setting a "marking" bit in the object. All of the branches of the tree are searched to identify and mark all objects that can be accessed from the root set.
One method for systematically searching all of the branches is to follow a path from an object in the root set all of the way to the end of a branch or limb, recording the location (or address) of each point along the path at which the limb divides or branches. The search is then retraced to the most recently encountered point of branching, the limb just searched is marked as having been searched and all of the objects encountered in the return from the end of the limb to the point of branching are marked. Any unsearched limbs extending from the branching point are then searched to the end of each such limb and the location of any intervening points at which branching occurs are recorded. The process is repeated until all limbs (sequences of pointers) emanating from the objects in the root set have been searched.
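A search of this kind can be sketched as follows. This is an illustrative sketch only: the class Obj, the function mark and the example objects are hypothetical, and the sketch departs from the description above in one respect, namely that each object is marked when it is first encountered rather than on the return from the end of the limb, so that a chain of pointers that closes upon itself cannot be followed indefinitely.

```python
class Obj:
    """Hypothetical object with a marking bit and a list of pointers."""

    def __init__(self, name):
        self.name = name
        self.pointers = []
        self.marked = False


def mark(roots):
    """Mark every object reachable from roots; return the visiting order."""
    order = []
    stack = []  # recorded branch points: (object, its pointers not yet followed)
    for root in roots:
        if root.marked:
            continue
        root.marked = True
        order.append(root.name)
        stack.append((root, iter(root.pointers)))
        while stack:
            obj, remaining = stack[-1]
            child = next(remaining, None)
            if child is None:
                stack.pop()  # limb exhausted: retrace to the previous branch point
            elif not child.marked:
                child.marked = True
                order.append(child.name)
                stack.append((child, iter(child.pointers)))
    return order


root, a, b, c, d = (Obj(n) for n in ("root", "a", "b", "c", "d"))
root.pointers = [a, b]
a.pointers = [c]
c.pointers = [a]  # closes the chain back onto a
b.pointers = [c]
d.pointers = [a]  # d points into the tree but nothing points to d

order = mark([root])
```

After the call, d remains unmarked because no limb arising from the root set leads to it.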
After completion of the marking phase, the processor "sweeps" memory and removes all objects that have not been marked, i.e. the portions of memory that are occupied by objects that are no longer accessible from the root set are made available to the operating processor for the storage of newly created objects. In some implementations, in order to expedite the computational process, the accessible objects are then aggregated in one area of memory and the memory that has been reclaimed from occupation by obsolete objects is aggregated in another area. The pointers from the root set of objects are then altered so as to point to the new locations in memory to which the accessible objects have been moved.
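The sweep and the optional aggregation of accessible objects may be sketched as follows. The representation of memory as a list of cells, the function name sweep_and_compact and the example addresses are assumptions made for illustration. The sketch also assumes that the mark phase has completed correctly, so that a marked object points only to marked objects.

```python
def sweep_and_compact(memory, roots):
    """Drop unmarked cells and slide the survivors to the low addresses.

    memory is a hypothetical list of cells; each cell is a dict with a
    "marked" bit and a list of "pointers" (addresses, i.e. list indices).
    roots is a list of addresses held in the root set.
    Returns the compacted memory and the updated root addresses.
    """
    # First pass: decide the new address of every marked cell.
    new_address = {}
    for old, cell in enumerate(memory):
        if cell["marked"]:
            new_address[old] = len(new_address)

    # Second pass: copy survivors, rewriting their pointers and clearing
    # the marking bit in preparation for the next collection.
    compacted = [
        {"marked": False, "pointers": [new_address[p] for p in cell["pointers"]]}
        for cell in memory
        if cell["marked"]
    ]
    new_roots = [new_address[r] for r in roots]
    return compacted, new_roots


memory = [
    {"marked": False, "pointers": []},   # address 0: garbage
    {"marked": True, "pointers": [3]},   # address 1: accessible
    {"marked": False, "pointers": [1]},  # address 2: garbage that points at a live cell
    {"marked": True, "pointers": [1]},   # address 3: accessible
]

compacted, new_roots = sweep_and_compact(memory, [1])
reclaimed_cells = len(memory) - len(compacted)
```

The two surviving cells end up at addresses 0 and 1, and the pointer held in the root set is altered from address 1 to address 0.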
A major disadvantage of the mark and sweep garbage collection algorithm is that the processing by the operating processor is suspended during the garbage collection. In a large scale system in which a substantial portion of the memory resides on disk or tape, garbage collection may interrupt processing for a period of hours. In many applications, such extended interruptions cannot be tolerated.
The lengthy interruptions for garbage collection may be avoided using a technique known as "copying garbage collection". Baker's algorithm is an example of this technique. See H. G. Baker, Jr., "List Processing in Real Time on a Serial Computer", Communications of the ACM, Vol. 21, No. 4, April, 1978. In a "copying" garbage collector, memory is divided into equal-sized areas called old space and new space. The operating processor is used in three different roles: first as a "mutator" to carry out the computation process required by the particular user application; second as a "transporter" to transport or copy objects in old space to new space; and third as a "scavenger" to search for objects located along the branches emanating from the root set of objects. In a simple implementation, the operation of the processor as a "mutator" is interrupted intermittently for short periods of time so that the processor may follow a set of instructions that cause it to function for a few moments either as a "scavenger" or as a "transporter".
When functioning as a "scavenger," the processor searches for objects on each of the branches emanating from the root set in the manner indicated above. Whenever the scavenger finds an object in old space that has not already been moved or copied into new space, the scavenger invokes the "transporter" software which copies the object from old space into new space and which leaves a pointer in old space at the location of the copied object that points to the location of the copied object in new space. The scavenger also then replaces the pointer that was used by the scavenger to arrive at the object in old space by a pointer that points to the location of the object in new space.
The operation of the mutator is restricted such that it places all newly created objects only in new space. Whenever the mutator either reads or modifies (mutates) an object that is located in old space (i.e. that has not been copied and replaced by a pointer to its location in new space) the mutator suspends its operation and invokes the transporter to move (i.e. to copy) the object in old space into new space and to replace the object in old space by a pointer to its new location in new space.
After the scavenging of old space is complete, the computations of the mutator are briefly suspended so that the allocation in memory of old space and new space can be interchanged or "flipped". During the "flip" the pointers in the root set that point to objects in old space are replaced by pointers that point to the locations of these objects in new space. The designations of new space and old space are then interchanged and the entire garbage collection process is begun again.
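The cooperation of the transporter and the scavenger may be sketched as follows. This is a simplified, non-incremental illustration and not Baker's algorithm itself: the whole collection runs in one pass with the mutator stopped, and the scavenger advances a scan index through new space rather than following each branch to its end. The class Forward, the functions transport and collect, and the example addresses are hypothetical.

```python
class Forward:
    """Pointer left in old space giving an object's address in new space."""

    def __init__(self, new_address):
        self.new_address = new_address


def transport(old_space, new_space, address):
    """Copy one object into new space (at most once); return its new address."""
    entry = old_space[address]
    if isinstance(entry, Forward):
        return entry.new_address  # already copied: follow the forwarding pointer
    new_address = len(new_space)
    new_space.append(list(entry))  # the copy still holds old-space addresses
    old_space[address] = Forward(new_address)
    return new_address


def collect(old_space, roots):
    """Copy everything reachable from roots; return (new_space, new_roots)."""
    new_space = []
    new_roots = [transport(old_space, new_space, r) for r in roots]
    scan = 0
    while scan < len(new_space):  # scavenger: visit each copied object once
        obj = new_space[scan]
        for i, old_address in enumerate(obj):
            obj[i] = transport(old_space, new_space, old_address)
        scan += 1
    return new_space, new_roots


# Each object is modelled only by its pointer fields (old-space addresses).
old_space = [[1], [0, 2], [], [2]]  # the object at address 3 is unreachable
new_space, new_roots = collect(old_space, [1])
```

After the call, every accessible object has been replaced in old space by a Forward entry, the object at old address 3 has not been copied, and the roles of the two spaces could be interchanged.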
Improved copying garbage collection systems that sub-divide old space and new space into generations, so that the scavenging operations can be concentrated upon those generations of objects that are most likely to have become obsolete are described by Courts in U.S. Pat. No. 4,807,120 and by McEntee et al. in U.S. Pat. No. 4,797,810. Such garbage collection systems that take into account the temporal history of the objects are sometimes referred to as "ephemeral garbage collectors".
In "Garbage Collection in a Large Lisp System", Proc. 1984 ACM Symposium on Lisp and Functional Programming, August 1984, by D. Moon, the author describes a variation of the Baker Algorithm, in which variation the new space is sub-divided into copy space and new space. Objects from old space are copied into copy space and objects newly created by the mutator are placed only in new space. An operational "barrier" is erected between old space and the other spaces such that references to old space cannot spread to objects in new space. As a consequence only old space and copy space need to be searched by the scavenger for references to objects in old space.
Although the "copying garbage collection" schemes referred to above can avoid the interruption for extended periods of the operation of the processor as a mutator, the operation of the "mutator" is still interrupted whenever it encounters an object in old space that, since the last "flip", has not been copied to copy space. Since immediately following a "flip" nearly every existing object will be located in old space, the operation of the mutator, after a flip, will be repeatedly interrupted and slowed while the transporter is invoked to transport the object to copy space. The computational speed of the mutator is also slowed down as more objects are placed in copy space because each reference by the processor to an object that has been copied to copy space requires two accesses to memory, first to the pointer in old space that replaced the copied object and second to the object in copy space pointed to by the pointer in old space. Because the timing and rate at which such interruptions and delays may occur are unpredictable, a computational system that utilizes "copying" garbage collection will not be usable in a "real time" application if the computational speed of the mutator must be reliable and predictable.
Garbage collection systems that utilize a reference count that is maintained for each object are described by Watson in U.S. Pat. No. 4,755,939 ("939") and by Oxley et al. in U.S. Pat. No. 4,775,932 ("932"). When the count of reference pointers to an object becomes zero, the Watson ("939") and Oxley ("932") inventions assume that the object is no longer accessible from the root set and that the memory allocated to the object may be reclaimed. One problem with reference count systems, however, is that a group of objects may point to one another in a cyclical manner such that the reference count for each such object is not zero even though none of the objects in the group is pointed to by objects outside the group and though the objects are not accessible from the root set. Nevertheless, such a group of objects is obsolete and the memory occupied by such a group should be "collected" and reallocated for use by the processor. However, because none of the reference counts is zero, such groups are not detected by the reference count technique.
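The difficulty with cyclical groups can be seen in a minimal sketch of reference counting. The class Counted and the helper functions are hypothetical and are not taken from the Watson or Oxley patents; they model only the counting rule described above.

```python
class Counted:
    """Hypothetical object carrying a count of the pointers that reference it."""

    def __init__(self, name):
        self.name = name
        self.count = 0
        self.pointers = []


def add_pointer(src, dst):
    src.pointers.append(dst)
    dst.count += 1


def release(obj, reclaimed):
    """Remove one reference to obj; reclaim it if the count reaches zero."""
    obj.count -= 1
    if obj.count == 0:
        reclaimed.append(obj.name)
        for child in list(obj.pointers):
            release(child, reclaimed)  # a reclaimed object drops its own pointers
        obj.pointers.clear()


def drop_pointer(src, dst, reclaimed):
    src.pointers.remove(dst)
    release(dst, reclaimed)


root, a, b, c = (Counted(n) for n in ("root", "a", "b", "c"))
add_pointer(root, a)
add_pointer(a, b)
add_pointer(b, a)  # a and b point to one another
add_pointer(root, c)

reclaimed = []
drop_pointer(root, c, reclaimed)  # c's count reaches zero and c is reclaimed
drop_pointer(root, a, reclaimed)  # a and b are now inaccessible from the root set
```

In this sketch c is reclaimed, but a and b each retain a count of one and are never reclaimed, even though neither is accessible from the root set.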
The Oxley invention ("932") recognizes the fact that the reference count technique will not remove all obsolete objects. For this reason the Oxley invention ("932") also uses a second garbage collection technique based upon the Baker algorithm to remove the obsolete objects that have not been collected by the reference count technique. In addition to a central processing unit that acts as an operating processor, the Oxley invention ("932") also utilizes a second processor, a memory management processor, that operates in the background and that has the responsibility for garbage collection. In addition to removing objects for which the reference count has become zero, the memory management processor operates on individual sub-sections or sub-spaces of memory, one at a time, to identify objects that are not obsolete, to copy such current objects to new sub-sections or sub-spaces in memory and to reclaim the sub-sections of memory from which all currently accessible objects have been copied.
In the Oxley "932" invention, as each current object in an "old" sub-space is identified and copied to a "new" sub-space, a pointer is left in the old sub-space, which pointer points to the location in the new sub-space to which the object was copied, in a manner similar to that of the copying garbage collectors described above. Because the Oxley invention replaces the object that was copied from the old sub-space by a pointer to the location of the object in the new sub-space, the operating speed of the operating processor is affected by the garbage collection process. The operating speed is affected because whenever the operating processor accesses an object in the old sub-space that has been replaced by a pointer to the object's location in the new sub-space, the operating processor must perform a second access to memory to obtain access to the copied object. The Oxley invention also imposes the requirement that during the period of time in which an object is being copied from the old sub-space to the new sub-space, the operating processor must be denied access to this object. The need for a second access to memory for copied objects and the denial of access to objects that are in the process of being copied both degrade the predictability of the speed of the operating processor and reduce the attractiveness of this garbage collection technique for use in processors used in real time applications.