1. Field
The present disclosure relates to computer systems and methods in which dynamically-allocated storage is periodically reclaimed to remove unused data objects. More particularly, this disclosure concerns improvements in automated heap memory reclamation, also known as garbage collection.
2. Description of Prior Art
By way of background, when a software program dynamically allocates heap memory for a data object, the object continues to “live” (as a valid object) as long as a reference to it (i.e., a pointer) exists somewhere in the active state of the program, such that the object is reachable via the reference. When an object ceases to be referenced from the active state and is no longer in use, it becomes “garbage.” Garbage collection is an automated memory management operation that can be performed periodically to identify unused objects and reclaim their memory space.
Garbage collection (GC) is supported by various software programming languages, including Java® (registered trademark of Sun Microsystems Inc., hereinafter referred to as “Java”), C#, C++ and C. One popular type of garbage collection uses a tracing algorithm known as “mark-sweep.” According to this technique, a mark phase is first implemented wherein live/reachable objects in the heap are located and “marked” as being reachable. A sweep phase is then implemented in which all unmarked memory locations are reclaimed. During the mark phase, the garbage collector first performs a root scan of the program call stacks to identify stack variables that reference objects on the heap. Such objects are known as garbage collection “roots” (GC roots). The set of all GC root objects found by the root scan is known as a garbage collection root set (GC root set). The garbage collector also traverses all object sub-trees that emanate from the GC root objects to find additional non-root objects that are reachable from the root objects. All objects that are encountered during the root scan and sub-tree traversal operations are marked as being “in use” by setting a temporary mark flag. The mark flag may be implemented as a bit in a “mark bit array”. During the sweep phase, the garbage collector uses the mark flags to note the memory locations of the marked objects, then sweeps the heap and finds all of the free space that can be reused. Compaction may thereafter be performed to consolidate the free space into contiguous regions of heap memory. All of the mark flags set during the mark phase of the current garbage collection cycle will be cleared prior to the mark phase of the next garbage collection cycle.
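By way of illustration only, the mark and sweep phases described above can be sketched in Python as follows. This is a minimal toy model and not an implementation from any actual garbage collector: the names Obj, Heap, allocate and collect are hypothetical, the root set is passed in by the caller instead of being found by scanning call stacks, a list of booleans stands in for the mark bit array, and compaction is omitted.

```python
class Obj:
    """Toy heap object: a name plus references to other heap objects."""
    def __init__(self, name, slot):
        self.name = name
        self.slot = slot      # index of this object's heap slot
        self.refs = []        # outgoing references (the object's sub-tree)


class Heap:
    """Hypothetical heap: a list of slots, where None denotes free space."""
    def __init__(self):
        self.slots = []

    def allocate(self, name):
        # For simplicity this sketch always appends and never reuses free slots.
        obj = Obj(name, len(self.slots))
        self.slots.append(obj)
        return obj

    def collect(self, roots):
        """Run one mark-sweep cycle; return the names of reclaimed objects."""
        # A fresh array each cycle plays the role of clearing the mark flags.
        mark_bits = [False] * len(self.slots)

        def mark(obj):
            # Mark phase: flag obj, then traverse everything it references.
            if mark_bits[obj.slot]:
                return
            mark_bits[obj.slot] = True
            for child in obj.refs:
                mark(child)

        for root in roots:
            mark(root)

        # Sweep phase: every occupied slot left unmarked is reclaimed.
        reclaimed = []
        for slot, obj in enumerate(self.slots):
            if obj is not None and not mark_bits[slot]:
                reclaimed.append(obj.name)
                self.slots[slot] = None
        return reclaimed


heap = Heap()
a, b, c, d = (heap.allocate(n) for n in "abcd")
a.refs.append(b)   # a -> b, reachable from the root
c.refs.append(d)   # c <-> d form a cycle that no root reaches
d.refs.append(c)

reclaimed = heap.collect(roots=[a])
live_names = [o.name for o in heap.slots if o is not None]
print(reclaimed)    # ['c', 'd']
print(live_names)   # ['a', 'b']
```

Note that the unreachable cycle formed by c and d is reclaimed, because tracing decides liveness by reachability from the roots and not by whether any reference to an object exists.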
In some mark-sweep implementations, the root scan and sub-tree traversal operations performed during the mark phase are implemented using an iterative technique involving the use of a “mark stack.” According to this technique, as each GC root object is encountered during the root scan, its mark flag is set and a reference to the root object is pushed onto a mark stack for deferred processing. Following root scanning, sub-tree traversal operations proceed by popping references off the mark stack. As each reference is popped, the referenced object is inspected and each sub-tree object that it points to is evaluated. If the sub-tree object has not already been marked, its mark flag is set and a reference thereto is pushed onto the mark stack. The mark stack continues to be processed in the same fashion until it is empty, at which point all of the reachable sub-tree objects have been identified for all the root objects.
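The mark stack technique described above can be sketched as follows, again as a hedged illustration and not as any particular collector's implementation. The function name mark_reachable and the dictionary-based object graph are assumptions made for this example, and a Python set stands in for the mark flags. The check that skips already-marked objects is what allows the loop to terminate when the object graph contains cycles.

```python
def mark_reachable(graph, roots):
    """Return the set of objects reachable from roots.

    graph maps each object to the objects it references (hypothetical layout).
    """
    marked = set()      # stands in for the per-object mark flags
    mark_stack = []     # references awaiting deferred processing

    # Root scan: mark each root and push it for later traversal.
    for root in roots:
        if root not in marked:
            marked.add(root)
            mark_stack.append(root)

    # Sub-tree traversal: pop a reference, inspect the object, and mark
    # and push each referenced object that has not been marked yet.
    while mark_stack:
        obj = mark_stack.pop()
        for child in graph.get(obj, ()):
            if child not in marked:
                marked.add(child)
                mark_stack.append(child)

    return marked


graph = {
    "r1": ["x", "y"],
    "x": ["y"],
    "y": ["r1"],        # cycle back to a root
    "r2": [],
    "orphan": ["x"],    # references a live object but is itself unreachable
}
reachable = mark_reachable(graph, ["r1", "r2"])
print(sorted(reachable))   # ['r1', 'r2', 'x', 'y']
```

One reason to prefer an explicit stack over recursion is that the traversal depth is then bounded by available memory for the stack and not by the call-stack limit of the runtime, which matters for long reference chains such as linked lists.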
As the cost of memory continues to drop, applications are running with infrastructures having increasing amounts of physical memory. This enables developers to write applications that maintain most of their data objects in main memory rather than saving the data in file systems. Such objects will often be referenced throughout the lifetime of the application. In-memory databases that use in-memory tables for faster transaction processing are one example.
Garbage collection has an associated overhead, referred to as pause time, that delays the operation of concurrently executing applications. Garbage collection pause times tend to grow with heap size, which can affect the scalability of applications that maintain large amounts of live in-memory data. Moreover, garbage collection can incur unnecessary overhead when there are a large number of persistent objects insofar as such objects are rarely discarded. Because persistent objects may persist indefinitely (sometimes for the life of an application), the overhead associated with scanning such objects is largely wasted effort.
To address this concern, some garbage collectors use strategies that process short-lived objects differently than long-lived objects. For example, generational garbage collection divides heap memory into two or more regions that segregate objects by generation, then focuses garbage collection on the younger generation region(s) where unused objects are more likely to be found. This mechanism incurs the overhead associated with copying long-lived objects multiple times between regions. Moreover, because generational garbage collection is heuristic in nature, it does not reclaim all memory associated with unused objects during every cycle, which means that global mark-and-sweep garbage collection needs to be performed periodically. These global mark-and-sweep garbage collection cycles still face the overhead of repeated processing of persistent objects.
The Metronome garbage collector in IBM® WebSphere® products uses another garbage collection strategy that allocates persistent objects in a separate memory area. However, the garbage collector cannot identify persistent objects by itself and relies on users to specify such objects through an API (application program interface). Moreover, once an object is identified as persistent, it remains so forever. If the object is released at some point, its memory will not be reclaimed unless the garbage collector is specifically advised.