This invention relates to garbage collectors that mark live objects using a bitmap structure and, more particularly, to bitmap marking garbage collectors that employ parallel marking. In general, memory reclamation may be carried out by a special-purpose garbage collection algorithm that locates and reclaims memory which is unused but has not been explicitly de-allocated. There are many known garbage collection algorithms, including reference counting, mark-sweep, mark-compaction and generational garbage collection algorithms. These and other garbage collection techniques are described in detail in a book entitled “Garbage Collection: Algorithms for Automatic Dynamic Memory Management” by Richard Jones and Rafael Lins, John Wiley & Sons, 1996.
An object may be located by a “reference”, or a small amount of information that can be used to access the object data structure. Objects can themselves contain references to yet other objects. In this manner, a chain of references can be created, each reference pointing to an object which, in turn, points to another object. Some garbage collection techniques determine whether a data structure is reachable by an executing program thread by starting at external roots (for example, program stack entries), following chains of references, and marking all objects encountered. After all reference chains have been followed, the memory occupied by unmarked objects can be reclaimed and reused. Object marking may be carried out by a single collector thread or by several collector threads operating in parallel. Reclamation of unused memory is generally performed during a “sweep” phase of garbage collection that follows the marking phase.
To follow chains of references from the program's roots and to mark all reachable objects, each marking thread must track the objects it marks and scan them, in turn, for references. Typically, this is achieved with a local mark stack that holds the objects that have been marked but not yet scanned. As objects are marked, they are pushed on the local mark stack. When an object is popped off the stack, its references are examined, and any referenced objects that are still unmarked are marked and pushed on the stack. When all the roots have been scanned and all the local mark stacks are empty, all reachable objects are marked.
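The following is a minimal, single-threaded sketch of marking with a mark stack, assuming a toy heap model in which objects are integer ids and `refs` maps each object to the objects it references (the model and the name `mark_from_roots` are hypothetical). Each object is marked when first encountered and pushed; popping an object scans its references. Everything left unmarked afterwards is what a sweep phase would reclaim.

```python
from typing import Dict, Iterable, List, Set

def mark_from_roots(roots: Iterable[int], refs: Dict[int, List[int]]) -> Set[int]:
    """Mark every object reachable from the roots using an explicit mark stack."""
    marked: Set[int] = set()
    mark_stack: List[int] = []  # objects that are marked but not yet scanned

    for root in roots:
        if root not in marked:
            marked.add(root)
            mark_stack.append(root)

    while mark_stack:
        obj = mark_stack.pop()
        for child in refs.get(obj, []):  # scan the popped object's references
            if child not in marked:
                marked.add(child)
                mark_stack.append(child)
    return marked

# Objects 1-4 form a cycle reachable from root 1; objects 5 and 6 are unreachable.
heap = {1: [2, 3], 2: [4], 3: [], 4: [1], 5: [6], 6: []}
live = mark_from_roots([1], heap)
garbage = sorted(set(heap) - live)  # what the sweep phase would reclaim
print(sorted(live), garbage)  # [1, 2, 3, 4] [5, 6]
```

Because an object is marked before it is pushed, cycles such as 4 → 1 are followed only once.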
There are generally two approaches to marking objects. The first approach uses a marking data structure, such as a bitmap, that is “external” to, or separate from, the memory occupied by the objects. The bitmap typically contains one bit for each address at which an object may start and is indexed by the object's address. Objects allocated in a garbage-collected heap typically have a minimum alignment, which limits the set of addresses at which they may start. Common constraints are alignment on single-word or double-word boundaries, which allow objects to start at every word or only at every second word, respectively. In the former case, one bit is needed for each word of memory; in the latter, one bit is needed for every two words. Each block of memory corresponding to a bit in the bitmap is referred to as a “unit of memory”.
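As a sketch of how such a bitmap is indexed, the following hypothetical class assumes 64-bit words and stores the bitmap itself as a list of 64-bit words. An object's offset from the heap base, divided by the size of a unit of memory, gives its bit index, which is then split into a bitmap word and a bit within that word. The `mark` method uses a plain read-modify-write and is therefore only safe with a single marking thread.

```python
from typing import Tuple

WORD_BYTES = 8  # assumption: 64-bit machine words

class MarkBitmap:
    """External mark bitmap with one bit per unit of memory (hypothetical sketch)."""

    def __init__(self, heap_base: int, heap_bytes: int, alignment_words: int = 1):
        self.heap_base = heap_base
        self.unit_bytes = alignment_words * WORD_BYTES  # heap bytes covered by one bit
        nbits = heap_bytes // self.unit_bytes
        self.words = [0] * ((nbits + 63) // 64)  # bitmap stored as 64-bit words

    def bit_position(self, addr: int) -> Tuple[int, int]:
        """Map an object address to (bitmap word index, bit index within that word)."""
        offset = addr - self.heap_base
        if offset % self.unit_bytes:
            raise ValueError("address is not on an allowed object boundary")
        return divmod(offset // self.unit_bytes, 64)

    def mark(self, addr: int) -> None:
        word, bit = self.bit_position(addr)
        self.words[word] |= 1 << bit  # non-atomic read-modify-write

    def is_marked(self, addr: int) -> bool:
        word, bit = self.bit_position(addr)
        return bool((self.words[word] >> bit) & 1)

base = 0x10000
single = MarkBitmap(base, 1 << 20, alignment_words=1)  # 1 MiB heap, word-aligned objects
double = MarkBitmap(base, 1 << 20, alignment_words=2)  # 1 MiB heap, double-word-aligned
print(len(single.words) * WORD_BYTES, len(double.words) * WORD_BYTES)  # 16384 8192
single.mark(base + 70 * WORD_BYTES)
print(single.bit_position(base + 70 * WORD_BYTES))  # (1, 6)
```

With single-word alignment the bitmap costs 1/64 of the heap size; doubling the alignment halves that overhead.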
The second approach, called “inline” marking, stores the mark bit in space reserved within each object's own data structure. Each marking approach has advantages. For example, in uniprocessor systems, external marking data structures typically have better locality, allow for less expensive sweep operations, and provide a natural data structure for handling mark-stack overflow. By contrast, inline marking requires no additional memory for external structures, typically needs one bit per object rather than one bit per unit of memory, and has a simpler marking operation (tagging and storing a reference rather than indexing a bitmap).
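Inline marking by tagging a reference can be sketched as follows, assuming (hypothetically) that each object's first header word holds its class pointer and that class pointers are aligned so their low bit is always free to serve as the mark. Marking sets that bit and stores the tagged value back into the object; readers of the class pointer strip the tag. The names `HeapObject`, `inline_mark` and `class_pointer` are illustrative.

```python
MARK_TAG = 0x1  # assumption: class pointers are at least 2-byte aligned, so bit 0 is free

class HeapObject:
    """Hypothetical object whose first header word holds its class pointer."""

    def __init__(self, class_ptr: int):
        if class_ptr & MARK_TAG:
            raise ValueError("class pointer must leave the tag bit clear")
        self.header = class_ptr

def inline_mark(obj: HeapObject) -> bool:
    """Mark the object in place; return True if this call marked it."""
    if obj.header & MARK_TAG:
        return False
    obj.header = obj.header | MARK_TAG  # tag the class pointer and store it back
    return True

def class_pointer(obj: HeapObject) -> int:
    """Strip the mark tag to recover the class pointer."""
    return obj.header & ~MARK_TAG

o = HeapObject(0x7F00)
print(inline_mark(o), inline_mark(o), hex(class_pointer(o)))  # True False 0x7f00
```

No bitmap lookup is needed: the mark lives next to data the collector reads anyway when it scans the object.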
Because the average object size is typically 40-64 bytes and because marking threads usually mark different subsets of objects, marking objects inline tends to disperse the marking activity, so two marking threads rarely mark two objects in the same cache line. This property does not hold when objects are marked in a single external bitmap, because the marking information is represented far more compactly. A typical cache line, which on most modern processors ranges from 64 to 256 bytes, holds hundreds to thousands of mark bits (512 bits in a 64-byte line, 2,048 in a 256-byte line), so multiple marking threads are more likely to write locations in the same cache lines and so contend with each other when accessing those lines.
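A back-of-the-envelope calculation makes this density concrete. The sketch assumes an 8-byte unit of memory (single-word alignment on a 64-bit machine); the helper names are illustrative. It counts how many bytes of heap, and hence roughly how many 40- to 64-byte objects, have their mark bits in one cache line of the bitmap.

```python
def heap_bytes_per_bitmap_line(line_bytes: int, unit_bytes: int) -> int:
    """Bytes of heap whose mark bits all fall in one cache line of the bitmap."""
    bits_per_line = line_bytes * 8
    return bits_per_line * unit_bytes

def objects_per_bitmap_line(line_bytes: int, unit_bytes: int, avg_object_bytes: int) -> float:
    """Approximate number of objects whose marks share one bitmap cache line."""
    return heap_bytes_per_bitmap_line(line_bytes, unit_bytes) / avg_object_bytes

UNIT_BYTES = 8  # assumption: one mark bit per 8-byte word
for line in (64, 256):
    for size in (40, 64):
        n = objects_per_bitmap_line(line, UNIT_BYTES, size)
        print(f"{line}-byte line, {size}-byte objects: ~{n:.0f} objects share one line of marks")
```

A 64-byte line of the bitmap covers 4,096 bytes of heap (about 64 to 102 such objects), whereas with inline marking the same line holds the headers of only one or two objects.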
One challenge when employing external bitmaps for marking objects is that modern computer instruction sets typically do not provide instructions to independently write single bits. Instead, a thread must read a larger unit of memory, such as a word, set the appropriate bit in that word, and write back the result. If this write-back is performed with an ordinary store instruction, the thread risks losing information about bits set by other threads in the same word of memory. For this reason, a marking thread writing to a shared bitmap typically uses atomic instructions such as compare-and-swap, swap, or load-locked/store-conditional instructions that enable it to detect updates made by other threads to the same sequence of bits. If two marking threads attempt to write the same word simultaneously, one of them may have to retry the operation, causing a delay. Contention for highly referenced portions of the shared bitmap can cause significant delays for multiple marking threads.
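The compare-and-swap retry loop can be sketched as below. Python exposes no CAS instruction, so the hypothetical `AtomicWords` class emulates one with a lock; the marking logic in `atomic_set_mark` (also a hypothetical name) is the part being illustrated. A thread reads the whole word, returns early if the bit is already set, and otherwise attempts to install the word with its bit added; if another thread changed any bit of that word in between, the CAS fails and the thread retries.

```python
import threading
from typing import List

class AtomicWords:
    """Array of 64-bit words with load and compare-and-swap.

    Assumption: a lock stands in for the hardware's atomic CAS instruction.
    """

    def __init__(self, n: int):
        self._words = [0] * n
        self._lock = threading.Lock()

    def load(self, i: int) -> int:
        return self._words[i]

    def compare_and_swap(self, i: int, expected: int, new: int) -> bool:
        with self._lock:
            if self._words[i] != expected:
                return False
            self._words[i] = new
            return True

def atomic_set_mark(bitmap: AtomicWords, index: int) -> bool:
    """Set mark bit `index`; return True only for the thread that actually set it."""
    word, bit = divmod(index, 64)
    mask = 1 << bit
    while True:
        old = bitmap.load(word)
        if old & mask:
            return False  # another thread already marked this object
        if bitmap.compare_and_swap(word, old, old | mask):
            return True
        # CAS failed: some bit in this word changed since the load; reread and retry

bitmap = AtomicWords(4)  # 256 mark bits
claims: List[int] = []
claims_lock = threading.Lock()

def worker() -> None:
    won = sum(atomic_set_mark(bitmap, i) for i in range(256))
    with claims_lock:
        claims.append(won)

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sum(claims))  # 256: every bit was claimed by exactly one thread
```

Because the CAS compares the entire word, a failure caused by a neighbouring bit costs a retry even though the two threads were marking different objects; this is the contention the paragraph describes.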
Conventional parallel marking systems have dealt with this problem using a variety of techniques. For example, some systems use a coarser representation, such as a single shared byte-map, in which each mark occupies its own byte and can therefore be written separately with an ordinary byte store. Other systems replicate the bitmap so that the marking threads work on separate copies. Still other systems partition the bitmap, indexing it by address, so that all threads work on a single bitmap but mark disjoint sets of objects.
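Of these techniques, bitmap replication is the simplest to sketch. The following hypothetical model (class and method names are illustrative, and each copy is a Python int used as a bit set) gives every marking thread a private copy, so a thread can set its bits with ordinary stores. The cost is that learning whether an object is marked requires combining all the copies, and nothing in a thread's own copy tells it that another thread has already claimed the object.

```python
from typing import List

class ReplicatedBitmap:
    """Hypothetical replicated mark bitmap: one private copy per marking thread."""

    def __init__(self, nthreads: int):
        self.copies: List[int] = [0] * nthreads

    def mark_local(self, thread: int, index: int) -> bool:
        """Mark in the caller's own copy; True if that copy did not have the bit yet."""
        mask = 1 << index
        if self.copies[thread] & mask:
            return False
        self.copies[thread] |= mask  # no other thread writes this copy, so no atomics
        return True

    def is_marked(self, index: int) -> bool:
        """An object is marked if its bit is set in any copy, so OR all copies together."""
        combined = 0
        for copy in self.copies:
            combined |= copy
        return bool((combined >> index) & 1)

bm = ReplicatedBitmap(nthreads=2)
print(bm.mark_local(0, 5), bm.mark_local(1, 5))  # True True: both threads claim object 5
print(bm.is_marked(5), bm.is_marked(6))          # True False
```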
However, some of these conventional techniques make duplicate work possible. For example, if ordinary non-atomic byte-store instructions are used to update the map, two threads may each read an object's mark as clear and then each store the mark; because storing a mark is idempotent, neither store reveals that the other occurred, so both threads believe they have marked the object and both go on to scan it. Atomic update instructions can be used on bytes to eliminate the duplicate work, but the instruction cost of writing each mark is then much greater than with non-atomic update instructions (40-50 cycles versus one cycle). Replicated bitmaps likewise allow multiple threads to mark the same object unless each thread incurs the expense of checking whether the object is marked in every replicated copy of the bitmap.
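The duplicate-marking race can be reproduced deterministically. In this sketch, which is a simulation rather than real concurrency, each "thread" is a Python generator and each `next` call runs one step, so the interleaving is fixed by the schedule (T1, T2, T1, T2). With a separate load and store, both threads see a clear mark byte and both claim the object; when the test and the store form one indivisible step, as a compare-and-swap provides, only the first thread claims it. All names are hypothetical.

```python
from typing import Callable, Generator, List

Step = Generator[None, None, None]

def nonatomic_mark(bytemap: bytearray, i: int, claims: List[str], name: str) -> Step:
    """Test-then-store with ordinary byte loads and stores."""
    seen = bytemap[i]        # step 1: load the mark byte
    yield
    if seen == 0:
        bytemap[i] = 1       # step 2: plain store; storing 1 twice leaves no trace
        claims.append(name)  # so this thread believes it marked the object
    yield

def atomic_mark(bytemap: bytearray, i: int, claims: List[str], name: str) -> Step:
    """Same work, but the test and the store happen in one indivisible step."""
    yield
    if bytemap[i] == 0:
        bytemap[i] = 1
        claims.append(name)
    yield

def run(mark: Callable[..., Step]) -> List[str]:
    bytemap = bytearray(1)
    claims: List[str] = []
    threads = [mark(bytemap, 0, claims, "T1"), mark(bytemap, 0, claims, "T2")]
    for t in (0, 1, 0, 1):  # interleaving: T1 step, T2 step, T1 step, T2 step
        next(threads[t])
    return claims

print(run(nonatomic_mark))  # ['T1', 'T2']: the object would be scanned twice
print(run(atomic_mark))     # ['T1']
```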
Some of these techniques also require extra communication and synchronization among the marking threads. For example, with a partitioned bitmap, each thread must forward every reference it finds that points to an object in another thread's partition to the thread that owns that partition. In the worst case, the linking patterns of objects in the heap can effectively serialize the marking performed by all the parallel threads. Finally, byte-maps and replicated bitmaps increase the space used, and replicated bitmaps require ORing the corresponding bits across all the copies to determine whether an object is marked.
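The forwarding cost of a partitioned bitmap can be sketched with a sequential simulation. In this hypothetical model, objects are addresses, `refs` gives each object's outgoing references, and thread t alone marks objects whose addresses fall in partition t; any reference into another partition is appended to the owning thread's inbox. Threads are run round-robin until no inbox has work. The example heap is a list whose links alternate between two partitions, so every link is a forward and only one thread has work at any time.

```python
from collections import deque
from typing import Deque, Dict, List, Set, Tuple

def partitioned_mark(roots: List[int], refs: Dict[int, List[int]], nthreads: int,
                     heap_base: int, partition_bytes: int) -> Tuple[Set[int], int]:
    """Return the marked set and the number of references forwarded between threads."""
    def owner(addr: int) -> int:
        return (addr - heap_base) // partition_bytes

    inboxes: List[Deque[int]] = [deque() for _ in range(nthreads)]
    marked: List[Set[int]] = [set() for _ in range(nthreads)]  # thread t's slice of the bitmap
    forwarded = 0
    for root in roots:
        inboxes[owner(root)].append(root)

    while any(inboxes):  # some thread still has pending work
        for t in range(nthreads):
            stack = list(inboxes[t])
            inboxes[t].clear()
            while stack:
                obj = stack.pop()
                if obj in marked[t]:
                    continue
                marked[t].add(obj)
                for child in refs.get(obj, []):
                    dest = owner(child)
                    if dest == t:
                        stack.append(child)
                    else:
                        inboxes[dest].append(child)  # communication with another thread
                        forwarded += 1
    return set().union(*marked), forwarded

chain = {0x000: [0x1000], 0x1000: [0x008], 0x008: [0x1008], 0x1008: []}
live, hops = partitioned_mark([0x000], chain, nthreads=2, heap_base=0, partition_bytes=0x1000)
print(len(live), hops)  # 4 3
```

No bitmap word is ever shared, so no atomic instructions are needed, but on this heap the two threads simply take turns.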
In addition, in collectors that use non-atomic operations to mark objects and claim them for scanning, an object can end up on multiple mark stacks. Its presence on multiple stacks, in turn, limits what can safely be done with it. For example, some overflow strategies for mark stacks thread overflowed objects through their class pointers, a strategy that records the excess work with no additional space overhead. However, objects on overflow lists no longer have references to their class information in their headers. If an object can be on multiple stacks and one marking thread places it on an overflow list, other marking threads whose mark stacks contain that object will have difficulty scanning it without its class information.
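Threading overflowed objects through their class pointers can be sketched as follows. This assumes, as one plausible arrangement not stated in the text, that each class keeps the head of a list of its overflowed instances, so the class is recovered from the list being drained rather than from the object. The names `Klass`, `Obj`, `overflow` and `drain_overflow` are hypothetical. While an object is on the list its header holds a link rather than its class, which is exactly what another thread holding the same object on its mark stack would trip over.

```python
from typing import List, Optional

class Klass:
    """Class metadata; hypothetically also heads a list of overflowed instances."""

    def __init__(self, name: str):
        self.name = name
        self.overflow_head: Optional["Obj"] = None

class Obj:
    """Heap object whose header normally holds a reference to its Klass."""

    def __init__(self, klass: Klass):
        self.header: object = klass

def overflow(obj: Obj) -> None:
    """Thread obj onto its class's overflow list, reusing the header as the link."""
    klass = obj.header
    assert isinstance(klass, Klass), "object is already on an overflow list"
    obj.header = klass.overflow_head  # no extra space, but the class is no longer here
    klass.overflow_head = obj

def drain_overflow(klass: Klass, mark_stack: List[Obj]) -> None:
    """Move a class's overflowed objects back to a mark stack, restoring their headers."""
    obj = klass.overflow_head
    klass.overflow_head = None
    while obj is not None:
        nxt = obj.header
        obj.header = klass  # class known only because we reached obj via klass's list
        mark_stack.append(obj)
        obj = nxt

point = Klass("Point")
a, b = Obj(point), Obj(point)
overflow(a)
overflow(b)
# A second thread that still had `a` on its own mark stack would find no class here:
print(isinstance(a.header, Klass))  # False
stack: List[Obj] = []
drain_overflow(point, stack)
print(stack == [b, a], a.header is point)  # True True
```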