The present invention relates to management of memory in a computer, and in particular, to garbage collection.
Many computer systems provide for dynamic allocation of data objects. The performance of these systems relies on the ability to reclaim memory and re-use memory for dynamically allocated objects after the objects are no longer being used by an executing program. In practice, an object is considered unused when no reference on a computer system refers to the object. When no reference refers to an object, the object is referred to as being dead. Garbage collection includes the process of automatically reclaiming memory allocated to dead objects.
One conventional method of garbage collection is the xe2x80x9ctracingxe2x80x9d approach. A trace is the identification of objects which may be referenced, directly or indirectly, through a reference in a root set. A root set is one or more areas of memory that contain references which refer to, directly or indirectly, objects that are considered to be xe2x80x9clivexe2x80x9d for the purposes of garbage collection. A base set is a set of root sets that are traced by a garbage collector to find all the live objects in issue in the area of memory being managed by the garbage collector. Any object not identified through a trace of the root sets in the base set are considered dead, and memory allocated to the object may be reclaimed. For example, object A, object B, and object C reside in memory A. Call stack S is a root set. A reference from call stack S refers to object A, and a reference within object A refers to object B. Object A is thus directly referenced by the reference in call stack S and, object B is indirectly referenced by the reference in call stack S. A trace through the call stack identifies object A and object B, but not object C. Object C is therefore dead, and memory allocated to object C may be reclaimed.
The tracing approach poses several problems for computer systems that use large amounts of memory to store objects. Because execution of processes running on the computer system (e.g. real-time applications) are paused during garbage collection, and a trace accesses all the active objects, long delays in the execution of the processes may occur. Furthermore, accessing all the objects on a computer system violates the presumption of locality of reference that underlies virtual memory operating systems, and may result in excessive memory page thrashing.
These problems have prompted the development of the generational approach to garbage collection. Under the generational approach, two or more areas of memory are used to store objects according to age. The generational approach takes advantage of the empirical observation that newly created (xe2x80x9cyoungxe2x80x9d) objects tend to xe2x80x9cdiexe2x80x9d quickly (i.e. become unused). Newly created objects under a threshold size (small objects tend to have small life times) are stored in an area of memory referred to as a xe2x80x9cnurseryxe2x80x9d.
Under the generational approach, as the object in a nursery ages (e.g. remains alive after a threshold number of garbage collection cycles), the objects are moved from the nursery into another area of memory for storing older objects. Because the nursery contains the newer objects, the memory that is most often reclaimed and reallocated is clustered (i.e. in the nursery). Furthermore, garbage collection is performed more often on objects in the nursery. Thus, under the generational approach, locality of reference is improved.
One common approach to collecting memory from a nursery is the copying approach. Under the xe2x80x9ccopyingxe2x80x9d approach, an area of memory (i.e. the nursery) is divided into semispaces. One semispace is designated as the xe2x80x9cto-spacexe2x80x9d, and one is designated as the xe2x80x9cfrom-spacexe2x80x9d. Live objects are stored in the from-space, and newly created objects are allocated memory from the from-space.
When there is insufficient memory to allocate for a new object, garbage collection is performed. Objects identified as live through a trace are copied into the to-space. Because most objects in a nursery are dead due to the short life span of the objects, after copying the live objects the total memory allocated to objects in the to-space is much smaller than that was allocated in the from-space. The difference represents reclaimed memory.
In addition to copying objects, a reference referring to any object that was copied must be reset to refer to the new location of the copied object. Finally, the to-space is established as the current from-space, and the former from-space becomes the current to-space. New objects are allocated memory from the reclaimed portion of the newly established from-space.
Some computer languages lack runtime typing of data. It is not always possible to identify at runtime the references used by programs written in such languages. Garbage collectors used to manage the objects used by such programs are hampered by the difficulty in distinguishing object references from other types of data structures (e.g. integers, characters). A memory area that may contain one or more references (e.g. pointers) that may not be distinguishable from other types of data structures stored in the memory area is referred to as an ambiguous root set. A xe2x80x9cCxe2x80x9d call stack is an example of an ambiguous root set (i.e. a four byte entity stored in the call stack might represent a reference or a procedure parameter of the type integer).
The term xe2x80x9cambiguous referencexe2x80x9d refers to a portion of memory (e.g. the number of bytes in a pointer) which may or may not be a reference, but if evaluated as a reference refers to an area of memory occupied by an object. An object referred to by an ambiguous reference is considered to be live and may not be moved to another memory location for the following reason. After moving such an object, the ambiguous reference could not be modified because the ambiguous reference might in fact not be a reference, but instead, may be, for example, an integer. On the other hand, moving the object without modifying the ambiguous reference would break a reference to the object, if indeed the ambiguous reference was in fact a reference.
A hybrid of the copying approach that accounts for the problem of ambiguous references has been proposed by Bartlett. Structures shown in FIG. 1 are used to illustrate the Bartlett approach. FIG. 1 depicts from-space 102 and ambiguous root set 104. From-space 102 contains live object 132, and dead objects 134, 142, and 152. Garbage collection of objects in from-space 102 and its corresponding to-space (not shown) is performed by a garbage collector. Objects in from-space 102 and its corresponding to-space are sufficiently described to garbage collector 170 such that garbage collector 170 may discern boundaries between objects and the data structures contained within objects, as well as the data type of the data structures (i.e. integer, pointer).
Under the Bartlett approach, both the to-space (not shown) and from-space 102 are logically divided into pages, like page 130. Furthermore, an object cannot span a page. Thus, typically there is some memory left unallocated between the last object in a page and the end of the page.
Furthermore, during garbage collection, if any ambiguous reference in ambiguous root set 104 refers to memory that falls within a page, the whole page is xe2x80x9cpinnedxe2x80x9d. When a page is pinned, the objects contained in the page are left in the to-space and cannot be garbage collected. In addition, any other objects referred to (directly or indirectly) by objects in the pinned page become ineligible for garbage collection.
For example, ambiguous root set 104 contains an ambiguous reference 106, which refers to object 132 in page 130. Thus page 130 is pinned. Furthermore, object 134, which is in page 130, cannot be moved or garbage collected. Object 134 refers to object 142. Although object 142 can be moved, it cannot be garbage collected. Likewise, object 152, which is referred to by object 142 (and thus indirectly by object 134), can be moved but cannot be garbage collected.
The advantage of the Bartlett approach is that it can be performed very quickly. Scanning for ambiguous references in an ambiguous root set and determining which pages should be pinned only requires a few relatively quick operations (e.g. masking operations on the ambiguous pointer).
However, a major disadvantage of the Bartlett approach is that it inefficiently uses memory. In particular, memory between the last object in a page and the end of the page is not allocated. Furthermore, a large amount of memory occupied by objects within the same page as ambiguously referenced objects must remain uncollected, even though all or many of the objects are dead. Specifically, when an ambiguous reference refers to a first object in a first page, all the objects in the first page cannot be collected, even though all or many of the objects may be dead. In addition, any other object referred to directly or indirectly by any object in the first page becomes ineligible for garbage collection. For example, object 142 and object 152 cannot be garbage collected even though they are dead (i.e. not referred to by any reference, ambiguous or otherwise).
Based on the foregoing, it is desirable to provide a xe2x80x9ccopyingxe2x80x9d garbage collection method that accounts for ambiguous references and that quickly performs garbage collection without the overhead of wasted memory attendant the Bartlett approach.
A method and apparatus for a copying garbage collector are described. According to an aspect of the present invention, a set of ambiguously referenced objects is identified in a from-space and their boundaries are determined. Live objects in the from-space which are not ambiguously referenced are moved to the to-space, and the ambiguously referenced are transfixed (i.e. left behind to reside in the from-space). When moving objects to the to-space, the objects are moved into regions between boundaries of objects transfixed in the to-space.