This invention relates to automatic reclamation of allocated, but unused memory, or garbage, in a computer system that uses a space-incremental garbage collector to process an object space. Modern programming languages like the Java™ programming language or C# allow the use of automatic memory reclamation, or “garbage collection”, which relieves programmers of the burden of explicitly freeing, or de-allocating, storage allocated to objects when the objects are no longer used, or reachable, by the application program. Memory reclamation may be carried out by a special-purpose garbage collection algorithm that locates and reclaims dynamically assigned memory (called “heap” memory) that is unused, but has not been explicitly de-allocated. There are many known garbage collection algorithms, including reference counting, mark-sweep, mark-compact and generational garbage collection algorithms. These, and other garbage collection techniques, are described in detail in a book entitled “Garbage Collection, Algorithms for Automatic Dynamic Memory Management” by Richard Jones and Raphael Lins, John Wiley & Sons, 1996.
However, many of the aforementioned garbage collection techniques often lead to long and unpredictable delays because normal application thread processing must be suspended during the garbage collection process and these collectors at least occasionally scan the entire heap memory. For example, many modern applications have large live data sets, sometimes measured in gigabytes. Even on fast modern processors, collecting the entire heap in one atomic operation can take several seconds. Some applications require only minimizing the total garbage collection overhead and may be able to tolerate such delays. Other applications will tolerate significant garbage collection overhead, if the total garbage collection delay is broken into a series of smaller delays and each individual delay is small enough that it does not unduly delay individual operations of the application. For example, real-time or interactive systems where non-disruptive behavior is of greatest importance generally cannot use techniques which collect the entire heap in one operation and, thus, cause considerable disruption.
Several conventional techniques are typically used to alleviate long garbage collection delays. In accordance with one such technique, some portion of the collection process occurs concurrently with the operation of the application program. For example, on a multiprocessor system, one processor might perform garbage collection while another processor concurrently executes the application. While concurrent operation works well for some collection tasks, it is more difficult to use with others. In particular, it is often desirable for a garbage collector to perform “compaction.” More specifically, if a portion of a garbage-collected heap contains a mix of reachable and unreachable objects, then even if the unreachable objects are identified and made available for re-allocation, it may be difficult to allocate new objects in the free space because it is fragmented into many relatively small non-contiguous pieces. Compaction moves the reachable objects together, creating larger contiguous areas of reachable objects and free space, which enables easier allocation of new objects. A common compaction method is to move or “evacuate” all the reachable objects in a sub-region of the heap memory to another, smaller, contiguous portion of the heap, making the evacuated region entirely free space.
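By way of illustration only, evacuation of a single sub-region can be sketched as follows. The sketch assumes a heap region modeled as a Python list of slots, with `None` standing for free space, and a set of reachable objects supplied by some earlier marking step; the function name `evacuate` and the forwarding table are hypothetical and do not appear in any of the cited collectors.

```python
def evacuate(region, reachable):
    """Copy the reachable objects of `region` into a new contiguous area.

    `region` is a list of slots (None marks free space). Returns the new
    contiguous area and a forwarding table mapping old slot -> new slot.
    """
    to_space = []
    forwarding = {}
    for old_index, obj in enumerate(region):
        if obj is not None and obj in reachable:
            forwarding[old_index] = len(to_space)
            to_space.append(obj)
    # Every slot of the evacuated region is now free space.
    for i in range(len(region)):
        region[i] = None
    return to_space, forwarding


# A fragmented region: "b" is unreachable and two slots are already free.
region = ["a", None, "b", "c", None, "d"]
to_space, forwarding = evacuate(region, reachable={"a", "c", "d"})
```

The survivors end up adjacent in `to_space`, and the forwarding table records where each one went so that pointers to the old locations can later be redirected.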
Another solution for limiting pause times is to use a space-incremental garbage collector. In such a collector, heap memory is divided into a set of equal-sized “regions”. Some process (perhaps a concurrent marking process) identifies reachable and unreachable objects. Then collection may be accomplished by selecting regions (often regions containing few reachable and many unreachable objects) for evacuation, and evacuating their reachable objects. A few such regions can be evacuated at a time, breaking up a large disruptive collection operation into a number of smaller, less disruptive collection operations. In addition, these collectors automatically compact the heap regions that are collected since all reachable objects are copied to a contiguous region of the heap memory. Examples of space-incremental collectors include the Mature Object Space (or “Train”) collector disclosed in “Incremental Garbage Collection for Mature Objects”, R. L. Hudson and J. E. B. Moss, Proceedings of the International Workshop on Memory Management, v. 637 of Lecture Notes in Computer Science, pp. 388-403, University of Massachusetts, USA, Sep. 16-18, 1992, Springer-Verlag; the MC2 collector described in “MC2: High-performance Garbage Collection for Memory-constrained Environments”, N. Sachindran, J. E. B. Moss and E. D. Berger, Proceedings of the 19th Annual ACM SIGPLAN Conference on Object-oriented Programming, Systems, Languages and Applications, pp. 81-98, Vancouver, BC, Canada, 2004, ACM Press, New York, N.Y.; and the Garbage-First garbage collector described in general in “Garbage-First Garbage Collection”, D. Detlefs, C. Flood, S. Heller and A. Printezis, Proceedings of the 4th International Symposium on Memory Management, pp. 37-48, Vancouver, BC, Canada, 2004.
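The region-selection step described above might be sketched as follows. This is a simplified illustration under stated assumptions, not the policy of any cited collector: the `Region` record, its `live_bytes` field, and the `max_regions` limit are hypothetical, and the amount of live data per region is assumed to have been measured by a prior marking pass.

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    live_bytes: int  # reachable data found by an earlier marking pass


def choose_collection_set(regions, max_regions):
    """Pick the regions with the least live data, up to `max_regions`.

    Regions with few reachable objects are cheap to evacuate and free
    the most space, so they are preferred.
    """
    return sorted(regions, key=lambda r: r.live_bytes)[:max_regions]


regions = [
    Region("r0", 900),
    Region("r1", 100),
    Region("r2", 500),
    Region("r3", 50),
]
chosen = choose_collection_set(regions, max_regions=2)
names = [r.name for r in chosen]
```

Limiting each collection to a small number of regions is what bounds the length of an individual pause.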
Since space-incremental collectors move reachable objects from one memory region to another, other objects located outside of a region being collected that contain pointers to the objects being moved must also be identified so that the pointers can be updated to refer to the new locations of the moved objects. One conventional mechanism for identifying these other objects is to use a “remembered set” for each region. A remembered set is a data structure associated with a region containing information that identifies all objects outside of that region that might contain pointers to objects inside that region. During collection, the remembered set data structure is scanned to locate the other objects. The aforementioned space-incremental collectors all use remembered set data structures to enable space-incremental collection.
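A minimal sketch of a per-region remembered set follows. It assumes objects are modeled as small Python objects tagged with a region number; the names `Obj`, `write_field`, and `move` are hypothetical. For brevity, the sketch only redirects pointers held by objects outside the region; it does not update remembered set entries that name the moved object itself as a source.

```python
from collections import defaultdict


class Obj:
    def __init__(self, name, region):
        self.name = name
        self.region = region
        self.fields = []  # outgoing pointers


# region number -> objects outside that region that may point into it
remembered = defaultdict(set)


def write_field(src, target):
    """Store a pointer and record it if it crosses a region boundary."""
    src.fields.append(target)
    if src.region != target.region:
        remembered[target.region].add(src)


def move(old, new_region):
    """Evacuate `old` and redirect every remembered pointer to the copy."""
    new = Obj(old.name, new_region)
    new.fields = list(old.fields)
    for holder in list(remembered[old.region]):
        if any(f is old for f in holder.fields):
            holder.fields = [new if f is old else f for f in holder.fields]
            if holder.region != new_region:
                remembered[new_region].add(holder)
    return new


a = Obj("a", region=0)
b = Obj("b", region=1)
write_field(a, b)               # a (region 0) now points into region 1
b_moved = move(b, new_region=2)
```

Only the objects named in the remembered set of region 1 are examined when `b` is moved, which is what lets a region be collected without scanning the rest of the heap.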
However, remembered sets can present their own problems. One problem is maintenance of the remembered sets by both the collector and the application program. Remembered sets are typically updated by the collector during a collection. A write barrier is used to intercept pointer modifications made by the application program and to update the appropriate remembered set when each modification is made. The overhead imposed by such a write barrier on the application program can be substantial and, therefore, known methods are used to reduce this overhead. One such method is to use “card marking.” In this technique, the heap memory is partitioned into “card” areas of equal size and a card table maintains a bit for each card area. Whenever an application program modifies an object located within a heap region represented by a card area, it marks that card area as “dirty” in the card table. Although each pointer modification must still be intercepted by a write barrier, in most systems, the cost of updating the card table is much less than the cost of updating the remembered set. Later, during a collection, the collector uses the card table to locate dirty card areas and then scans those areas to locate the inter-generational pointers and update the remembered sets.
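The card-marking technique can be sketched as follows. The card size, the heap size, and the function names are assumptions chosen for illustration, and the sketch stores one byte per card rather than one bit, purely to keep the code short.

```python
CARD_SIZE = 512    # bytes covered by one card (assumed value)
HEAP_SIZE = 4096   # total heap size in bytes (assumed value)

# One entry per card; 0 = clean, 1 = dirty.
card_table = bytearray(HEAP_SIZE // CARD_SIZE)


def write_barrier(address):
    """Run on every pointer store: mark the card holding `address` dirty."""
    card_table[address // CARD_SIZE] = 1


def take_dirty_cards():
    """Run by the collector: return the dirty card indices and clear them."""
    dirty = [i for i, flag in enumerate(card_table) if flag]
    for i in dirty:
        card_table[i] = 0
    return dirty


# The application modifies objects at four heap addresses.
for address in (10, 600, 700, 4095):
    write_barrier(address)

found = take_dirty_cards()
```

The barrier itself is a single indexed store, which is why it is cheaper than updating a remembered set directly; the work of finding the actual pointers is deferred to the collector, which scans only the cards returned by `take_dirty_cards`.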
Another problem with remembered sets is that, in some situations, the remembered set data structures can grow very large, consuming memory space that could be better utilized holding user data. The conventional solution to this problem is to trade space for precision. In accordance with this solution, when a remembered set grows too large, its representation is “coarsened” so that fewer bits of remembered set information represent a larger region of the heap that might contain relevant pointers. Collection of a region whose remembered set data structure has been coarsened in this way is more expensive, since larger heap portions represented by the remembered set information must be scanned to find pointers into the region being collected. For example, a more fine-grained remembered set representation might have represented small portions of heap memory in which pointers into the region being collected occurred densely, whereas the more coarse-grained representation represents large portions of the heap in which pointers into the region being collected occur sparsely.
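The space-for-precision trade can be illustrated with the following sketch. The class `RememberedSet`, the two granularities, and the `max_entries` threshold are all hypothetical; real collectors use more elaborate representations. The sketch records fine-grained card indices until the set exceeds its threshold, then re-expresses every entry at a coarser granularity.

```python
class RememberedSet:
    def __init__(self, card_size=512, coarse_size=4096, max_entries=4):
        self.card_size = card_size
        self.coarse_size = coarse_size
        self.max_entries = max_entries
        self.entries = set()   # card indices, or coarse chunk indices
        self.coarse = False

    def add(self, address):
        """Record that the heap location `address` may hold a pointer."""
        if self.coarse:
            self.entries.add(address // self.coarse_size)
            return
        self.entries.add(address // self.card_size)
        if len(self.entries) > self.max_entries:
            # Coarsen: map each card index to the larger chunk containing it.
            self.entries = {
                card * self.card_size // self.coarse_size
                for card in self.entries
            }
            self.coarse = True

    def bytes_to_scan(self):
        """Amount of heap the collector must scan to find the pointers."""
        unit = self.coarse_size if self.coarse else self.card_size
        return len(self.entries) * unit


rs = RememberedSet()
for address in (0, 5000, 10000, 15000):
    rs.add(address)
fine_cost = rs.bytes_to_scan()      # four 512-byte cards

rs.add(20000)                       # fifth entry exceeds the threshold
coarse_cost = rs.bytes_to_scan()    # five 4096-byte chunks
```

In this example the pointers are spread sparsely across the heap, so coarsening does not reduce the number of entries but multiplies the amount of memory to be scanned, from 2048 bytes to 20480 bytes.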
This additional collection expense can be a considerable problem in a garbage collector like the aforementioned Garbage-First garbage collector that maintains a detailed model of collection costs, including remembered set scanning costs, in an attempt to perform collections that reliably fit within a user-specified time limit. More specifically, if a remembered set data structure is coarsened, the associated region may be too expensive to collect.