The present invention relates to management of memory in a computer. More specifically, the invention relates to conservative garbage collection.
Many computer systems provide for dynamic allocation of data objects. The performance of these systems relies on the ability to reclaim memory and re-use memory for dynamically allocated objects after the objects are no longer being used by an executing program. In practice, an object is considered unused when no reference on a computer system refers to the object. When no reference refers to an object, the object is referred to as being dead. Garbage collection includes the process of automatically reclaiming memory allocated to dead objects.
One conventional method of garbage collection is the “tracing” approach. A trace is the identification of objects which may be referenced, directly or indirectly, through a reference in a root set. A root set is one or more areas of memory that contain references which refer to, directly or indirectly, objects that are considered to be “live” for the purposes of garbage collection. A base set is a set of root sets that are traced by a garbage collector to find all the live objects in issue in the area of memory being managed by the garbage collector. Any object not identified through a trace of the root sets in the base set are considered dead, and memory allocated to the object may be reclaimed. For example, object A, object B, and object C reside in memory A. Call stack S is a root set. A reference from call stack S refers to object A, and a reference within object A refers to object B. Object A is thus directly referenced by the reference in call stack S and, object B is indirectly referenced by the reference in call stack S. A trace through the call stack identifies object A and object B, but not object C. Object C is therefore dead, and memory allocated to object C may be reclaimed.
The tracing approach poses several problems for computer systems that use large amounts of memory to store objects. Because execution of processes running on the computer system (e.g., real-time applications) are paused during garbage collection, and a trace accesses all the active objects, long delays in the execution of the processes may occur. Furthermore, accessing all the objects on a computer system violates the presumption of locality of reference that underlies virtual memory operating systems, and may result in excessive memory page thrashing.
These problems have prompted the development of the generational approach to garbage collection. Under the generational approach, two or more areas of memory are used to store objects according to age. The generational approach takes advantage of the empirical observation that newly created (“young”) objects tend to “die” quickly (i.e., become unused). Newly created objects under a threshold size (small objects tend to have small life times) are stored in an area of memory referred to as a “nursery.”
Under the generational approach, as the object in a nursery ages (e.g., remains alive after a threshold number of garbage collection cycles), the objects are moved from the nursery into another area of memory for storing older objects. Because the nursery contains the newer objects, the memory that is most often reclaimed and reallocated is clustered (i.e., in the nursery). Furthermore, garbage collection is performed more often on objects in the nursery. Thus, under the generational approach, locality of reference is improved.
One common approach to collecting memory from a nursery is the copying approach. Under the “copying” approach, an area of memory (i.e., the nursery) is divided into semispaces. One semispace is designated as the “to-space,” and one is designated as the “from-space.” Live objects are stored in the from-space, and newly created objects are allocated memory from the from-space. An innovative approach to garbage collection is described in U.S. Pat. No. 6,421,689, which is hereby incorporated by reference for all purposes.
When there is insufficient memory to allocate for a new object, garbage collection is performed. Objects identified as live through a trace are copied into the to-space. Because most objects in a nursery are dead due to the short life span of the objects, after copying the live objects the total memory allocated to objects in the to-space is much smaller than that was allocated in the from-space. The difference represents reclaimed memory.
In addition to copying objects, a reference referring to any object that was copied must be reset to refer to the new location of the copied object. Finally, the to-space is established as the current from-space, and the former from-space becomes the current to-space. New objects are allocated memory from the reclaimed portion of the newly established from-space.
Some computer languages lack runtime typing of data. It is not always possible to identify at runtime the references used by programs written in such languages. Garbage collectors used to manage the objects used by such programs are hampered by the difficulty in distinguishing object references from other types of data structures (e.g., integers, characters). A memory area that may contain one or more references (e.g., pointers) that may not be distinguishable from other types of data structures stored in the memory area is referred to as an ambiguous root set. A “C” call stack is an example of an ambiguous root set (i.e., a four byte entity stored in the call stack might represent a reference or a procedure parameter of the type integer).
The term “ambiguous reference” refers to a portion of memory (e.g., the number of bytes in a pointer) which may or may not be a reference, but if evaluated as a reference refers to an area of memory occupied by objects. An object referred to by an ambiguous reference is considered to be live and may not be moved to another memory location for the following reason. After moving such an object, the ambiguous reference could not be modified because the ambiguous reference might in fact not be a reference, but instead, may be, for example, an integer. On the other hand, moving the object without modifying the ambiguous reference would break a reference to the object, if indeed the ambiguous reference was in fact a reference.
With Java virtual machines (“VMs”), interpreted code is executed in the same address space as compiled native code, such as C code. As the VM needs to perform garbage collection and the Java objects being collected may be referenced by this C code, it is necessary for the garbage collector (“GC”) to deal with the C code. There are two common approaches for this.
One solution is to have the C functions “register” the Java objects they are using before they receive a reference to the object. Similarly, when the C functions are done with an object, they send a message that they no longer need the object. This is called “precise garbage collection.” A disadvantage of this is that the C code has to be written in a way that is less performant, stylized and harder to maintain.
Another solution is to have the GC scan the C stack frames and registers to identify potential references to Java objects. Objects that are identified are marked in some manner that they are live. This is called “conservative garbage collection.” One disadvantage of this is that objects identified as live cannot be relocated. Another disadvantage is that the GC will sometimes identify what appears to be a reference to an object when, in fact, it is not and is an artifact of a previous computation or just a random bit pattern.
With conservative garbage collection, it can also be very difficult to identify the problem when a program that should work fails because the program ran out of memory due to the over-conservative nature on the part of the GC (i.e., identified references did not refer to live objects). These cases can also appear randomly and it can be difficult for users to provide test cases that reliably reproduce the error. Even if such a test case is available, it can be very difficult to determine what C code is responsible for the false references that are causing the failure.
Accordingly, it would be beneficial to have innovative techniques for debugging memory issues in a stack conservative GC. Additionally, it would be beneficial if the techniques did not require a detailed understanding of the code of the program of interest.