One feature of Java is its garbage-collected heap, which takes care of freeing dynamically allocated memory that is no longer referenced. The Java Virtual Machine's (JVM's) heap stores all objects created by an executing Java program. Objects are created by Java's “new” operator, and memory for new objects is allocated on the heap at run time.
Garbage Collection is the process of automatically freeing objects that are no longer referenced by the program. This frees the programmer from having to keep track of when to free allocated memory, thereby preventing many potential bugs and headaches. When an object is no longer referenced by the program, the heap space it occupies must be recycled so that the space is available for subsequent new objects. The Garbage Collector must determine which objects are no longer referenced by the program and make available the heap space occupied by such unreferenced objects. In the process of freeing unreferenced objects, the Garbage Collector must run any finalizers of objects being freed. In addition to freeing unreferenced objects, a Garbage Collector may also combat heap fragmentation. Heap fragmentation occurs through the course of normal program execution. New objects are allocated, and unreferenced objects are freed such that free blocks of heap memory are left in between blocks occupied by live objects. Requests to allocate new objects may have to be filled by extending the size of the heap even though there is enough total unused space in the existing heap. This will happen if there is not enough contiguous free heap space available into which the new object will fit. On a virtual memory system, the extra paging required to service an ever-growing heap could degrade the performance of the executing program.
A potential disadvantage of a garbage-collected heap is that it adds an overhead that can affect program performance. The JVM has to keep track of which objects are being referenced by the executing program, and finalize and free unreferenced objects on the fly. This activity will likely require more CPU time than would have been required if the program explicitly freed unnecessary memory. In addition, programmers in a garbage-collected environment have less control over the scheduling of CPU time devoted to freeing objects that are no longer needed.
A Garbage Collector performs two basic tasks. First, the GC must detect garbage objects. Second, the GC must reclaim the heap space used by the garbage objects and make this space available to the program. Garbage detection is ordinarily accomplished by defining a set of roots and determining reachability from the roots. An object is reachable if there is some path of references from the roots by which the executing program can access the object. The roots are always accessible to the program. Any objects that are reachable from the roots are considered live. Objects that are not reachable are considered garbage, because they can no longer affect the future course of program execution.
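The reachability rule above can be illustrated with a small sketch. This is not how a JVM represents objects internally; the `Node` and `Reachability` classes and the `liveSet` method are hypothetical names chosen for illustration, and the "heap" is just a handful of ordinary Java objects.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical model of a heap object: a name plus its outgoing references.
final class Node {
    final String name;
    final List<Node> refs = new ArrayList<>();
    Node(String name) { this.name = name; }
}

final class Reachability {
    // Returns every object reachable from the roots (the live set).
    static Set<Node> liveSet(List<Node> roots) {
        Set<Node> live = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node n = pending.pop();
            if (live.add(n)) {            // first visit: mark it and follow its references
                pending.addAll(n.refs);
            }
        }
        return live;
    }

    public static void main(String[] args) {
        Node a = new Node("a"), b = new Node("b"), c = new Node("c"), d = new Node("d");
        a.refs.add(b);                    // root -> a -> b
        c.refs.add(d);                    // c and d reference each other,
        d.refs.add(c);                    // but no root reaches either of them
        Set<Node> live = liveSet(List.of(a));
        for (Node n : List.of(a, b, c, d)) {
            System.out.println(n.name + (live.contains(n) ? " live" : " garbage"));
        }
    }
}
```

Note that `c` and `d` are classified as garbage even though each is still referenced by the other: what matters is a path from the roots, not whether any reference exists.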
The heap also maintains a pointer that indicates where the next object is to be allocated within the heap. Initially, the pointer is set to the base address of the reserved address region. When a new object is created with the new operator, the heap first makes sure that the bytes required for the new object are available. The heap checks this by adding the size of the new object to the heap pointer. If the resulting pointer is beyond the end of the address region, then the heap is full and a collection must be performed.
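A minimal sketch of this allocation check follows, assuming the heap is modeled as a plain range of numeric addresses. The `BumpHeap` class, its fields, and the convention of returning -1 to signal "collection needed" are illustrative assumptions, not part of any JVM API.

```java
// Hypothetical model: the heap is the address range [base, limit).
final class BumpHeap {
    private final long limit;   // end of the reserved address region
    private long next;          // where the next object will be allocated

    BumpHeap(long base, long sizeInBytes) {
        this.next = base;
        this.limit = base + sizeInBytes;
    }

    // Returns the address of the new object, or -1 if a collection is needed.
    long allocate(long objectSize) {
        long newNext = next + objectSize;   // add the object size to the pointer
        if (newNext > limit) {
            return -1;                      // heap full: caller must collect and retry
        }
        long address = next;
        next = newNext;                     // advance the pointer past the new object
        return address;
    }

    public static void main(String[] args) {
        BumpHeap heap = new BumpHeap(1000, 64);
        System.out.println(heap.allocate(24)); // prints 1000
        System.out.println(heap.allocate(24)); // prints 1024
        System.out.println(heap.allocate(24)); // prints -1 (only 16 bytes remain)
    }
}
```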
When the Garbage Collector starts running, it assumes that all the objects in the heap are garbage. The Garbage Collector starts walking the roots and building a graph of all objects reachable from the roots. Once all the roots have been checked, the Garbage Collector's graph contains the set of all objects that are somehow reachable from the application's roots. Any objects that are not in the graph are not accessible by the application, and are therefore considered garbage. The Garbage Collector then walks through the heap linearly, looking for contiguous blocks of garbage objects, and copies the non-garbage objects down in memory, removing all of the gaps in the heap. Moving the objects in memory invalidates all pointers to the objects. Therefore, the Garbage Collector modifies the application's roots so that the pointers point to the objects' new locations. In addition, if any object contains a pointer to another object, the Garbage Collector is responsible for correcting these intra-heap pointers as well. Finally, the next-object pointer is positioned just after the last non-garbage object.
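The mark, compact, and pointer-correction steps described above can be sketched as follows. This is a toy model under stated assumptions: each heap slot holds exactly one object, a "pointer" is a slot index, and the heap starts completely full. `CompactingHeap`, `Obj`, and `collect` are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Hypothetical model: one object per slot; a "pointer" is a slot index.
final class CompactingHeap {
    static final class Obj {
        final String name;
        final int[] refs;               // slot indices of the objects this one references
        Obj(String name, int... refs) { this.name = name; this.refs = refs; }
    }

    final Obj[] slots;                  // assumed to be completely filled initially
    final int[] roots;                  // slot indices held by the application
    int next;                           // next-object pointer

    CompactingHeap(Obj[] slots, int[] roots) {
        this.slots = slots;
        this.roots = roots;
        this.next = slots.length;
    }

    void collect() {
        // 1. Everything starts out as garbage; mark what the roots can reach.
        boolean[] live = new boolean[slots.length];
        Deque<Integer> pending = new ArrayDeque<>();
        for (int r : roots) pending.push(r);
        while (!pending.isEmpty()) {
            int i = pending.pop();
            if (live[i]) continue;
            live[i] = true;
            for (int ref : slots[i].refs) pending.push(ref);
        }
        // 2. Walk the heap linearly and assign each live object its new address.
        int[] forward = new int[slots.length];
        int free = 0;
        for (int i = 0; i < slots.length; i++) {
            if (live[i]) forward[i] = free++;
        }
        // 3. Correct intra-heap pointers, then the application's roots.
        for (int i = 0; i < slots.length; i++) {
            if (!live[i]) continue;
            int[] refs = slots[i].refs;
            for (int k = 0; k < refs.length; k++) refs[k] = forward[refs[k]];
        }
        for (int k = 0; k < roots.length; k++) roots[k] = forward[roots[k]];
        // 4. Slide live objects down (forward[i] <= i, so nothing live is overwritten).
        for (int i = 0; i < slots.length; i++) {
            if (live[i]) slots[forward[i]] = slots[i];
        }
        Arrays.fill(slots, free, slots.length, null);
        next = free;                    // just after the last non-garbage object
    }

    public static void main(String[] args) {
        Obj[] slots = {
            new Obj("A", 2), new Obj("B"), new Obj("C"), new Obj("D", 1), new Obj("E", 0)
        };
        CompactingHeap heap = new CompactingHeap(slots, new int[] {4});
        heap.collect();
        System.out.println("root -> slot " + heap.roots[0]); // prints root -> slot 2
        System.out.println("next = " + heap.next);           // prints next = 3
    }
}
```

In the example, B and D are unreachable, so A, C, and E end up in slots 0, 1, and 2, and both the root and the references inside A and E are rewritten to the new slot numbers.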
Broadly, garbage collection schemes may be divided into copying and non-copying mechanisms. Non-copying collectors admit fragmentation of the heap, which, at the extreme, can mean that an allocation request may fail if no suitably sized block is free in the heap, even though the aggregate free space in the heap exceeds the size of the request. Depending on the workload, fragmentation may worsen over the lifetime of the heap. Copying collectors avoid fragmentation and move objects in an attempt to maximize the size of free regions. In addition, copying collectors may allow the use of simpler and faster allocation schemes, such as “bump pointers”. A bump pointer simply points to the base of a large free region in the heap. Allocation requests are serviced by simply incrementing the bump pointer.
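The fragmentation failure described above comes down to simple arithmetic, shown in the sketch below. The free-block sizes are made-up numbers, and `FragmentationDemo` and its methods are hypothetical helpers rather than part of any real allocator.

```java
// Hypothetical model: the heap's free space is a list of separate block sizes.
final class FragmentationDemo {
    // Non-copying case: the request needs one single block that is large enough.
    static boolean fitsInSomeBlock(int[] freeBlocks, int request) {
        for (int block : freeBlocks) {
            if (block >= request) return true;
        }
        return false;
    }

    // Copying case: after compaction, all free space forms one contiguous region.
    static boolean fitsAfterCompaction(int[] freeBlocks, int request) {
        int total = 0;
        for (int block : freeBlocks) total += block;
        return total >= request;
    }

    public static void main(String[] args) {
        int[] freeBlocks = {16, 8, 24};   // 48 bytes free in total, largest block is 24
        int request = 32;
        System.out.println(fitsInSomeBlock(freeBlocks, request));     // prints false
        System.out.println(fitsAfterCompaction(freeBlocks, request)); // prints true
    }
}
```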
Orthogonal to copying, collectors may use tracing or reference counting. At collection time, a tracing collector follows all references, identifying all reachable objects. Non-reachable objects are garbage, and the underlying memory is recycled and made available for subsequent allocation requests. A reference counting collector maintains a count of the number of references to each object. When the count reaches zero, the object may be immediately reclaimed. (In a sense, a tracing collector traces live, reachable objects, and a reference counting collector traces potentially dead or unreachable objects.) For ease of explication, the discussion below assumes a copying, tracing collector that uses bump pointer allocation.
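For contrast with tracing, reference counting can be sketched as below. Java itself does not expose reference counts, so the `RcObject` class and its `retain`, `release`, and `addField` methods are hypothetical, and the `reclaimed` flag merely stands in for returning the memory to the heap.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reference-counted object; "reclaimed" stands in for freeing memory.
final class RcObject {
    final String name;
    private int count;                                    // number of references to this object
    private final List<RcObject> fields = new ArrayList<>();
    boolean reclaimed;

    RcObject(String name) { this.name = name; }

    // A new reference to this object was created.
    void retain() { count++; }

    // A reference to this object was dropped.
    void release() {
        if (--count == 0) {
            reclaimed = true;                             // reclaim immediately
            for (RcObject f : fields) f.release();        // drop the references it held
            fields.clear();
        }
    }

    // Store a reference to target in one of this object's fields.
    void addField(RcObject target) {
        target.retain();
        fields.add(target);
    }

    public static void main(String[] args) {
        RcObject a = new RcObject("a"), b = new RcObject("b"), c = new RcObject("c");
        a.retain();          // a root references a
        c.retain();          // a root references c
        a.addField(b);       // a -> b
        a.addField(c);       // a -> c
        a.release();         // the root drops a
        System.out.println(a.reclaimed); // prints true
        System.out.println(b.reclaimed); // prints true  (only a referenced it)
        System.out.println(c.reclaimed); // prints false (a root still references it)
    }
}
```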
Furthermore, we assume a concurrent copying collector. The collector is concurrent in the sense that it runs in parallel with the Java application threads (sometimes called “mutator” threads). Our concurrent copying collector is page-based in that it selects a page to evacuate and then copies all live objects residing on that page to new locations. At a later time, that entire page can be reclaimed.
Garbage Collectors that move objects concurrently with application threads require the application to use a memory access barrier. The barrier detects an access to an object that is in the process of being moved by the collector, redirects the access to the new location of that object, and, if needed, waits for the copy to complete. There are several ways of employing a memory access barrier: adding inline instructions, using dedicated read-barrier assist logic in the processor, or leveraging the operating system's existing memory protection mechanisms.
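As a rough illustration of the inline-instruction variant only, the sketch below models a barrier as a method called on every reference load. It is heavily simplified: `RelocatingHeap`, `Obj`, `relocate`, and `load` are hypothetical names, a per-object flag stands in for "this object's page has been selected for evacuation", and the object's monitor is used so that a loading thread waits while a copy is in progress.

```java
// Hypothetical model of a software read barrier for a relocating collector.
final class RelocatingHeap {
    static final class Obj {
        final int payload;
        volatile boolean pageSelected;  // true once the object's page is being evacuated
        private Obj forwardee;          // new location, guarded by this object's monitor
        Obj(int payload) { this.payload = payload; }
    }

    // Collector side: copy one object off a page selected for evacuation.
    static void relocate(Obj old) {
        synchronized (old) {
            old.pageSelected = true;
            old.forwardee = new Obj(old.payload);   // the copy at its new location
        }
    }

    // Mutator side: inline barrier executed on every reference load.
    static Obj load(Obj ref) {
        if (ref == null || !ref.pageSelected) {
            return ref;                 // fast path: the object is not being moved
        }
        synchronized (ref) {            // blocks here while a copy is still in progress
            return ref.forwardee;       // redirect the access to the new location
        }
    }

    public static void main(String[] args) {
        Obj original = new Obj(7);
        System.out.println(load(original) == original); // prints true
        relocate(original);
        Obj moved = load(original);
        System.out.println(moved != original);          // prints true
        System.out.println(moved.payload);              // prints 7
    }
}
```

A production collector would also need to rewrite the stale reference, coordinate page selection with running threads, and handle mutable objects; those details are outside the scope of this sketch.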