The term “garbage” describes an object or data element that is no longer accessible by a computer program. Some systems are designed with no garbage detection and collection programs. In these systems, it is up to the programmer to remember to reclaim objects and data that are no longer needed. Garbage occupies part of the memory of a computer system but serves no purpose. If a computer program does not run for very long or is infrequently run, garbage is not a problem because the computer system generally has plenty of memory. However, if the program creates garbage and is run for a long time or frequently, the extraneous garbage can grow to occupy all of the useful memory of the computer system. This can cause a system shutdown or other deleterious effects. Today, many programs are designed to run continuously all day, every day. Business servers, in particular, simply cannot experience unscheduled shutdowns.
The effect of garbage has been known from the beginning of the computer era. In fact, forty years ago, two methods of automatic garbage collection for computer systems were introduced: reference counting and tracing. Reference counting is described in Collins, “A Method for Overlapping and Erasure of Lists,” Communications of the Ass'n of Computing Machinery (ACM) 3, 655–657 (1960), while tracing is described in McCarthy, “Recursive Functions of Symbolic Expressions and Their Computation by Machine,” Communications of ACM 3, 184–195 (1960), the disclosures of which are incorporated herein by reference. Briefly, in reference counting, each object holds a reference count that tracks how many other objects reference it. The count is incremented when a reference to the object is created and decremented when a reference is removed, and a reference count of zero indicates that the object is garbage because no other object references it. In tracing, the object graph is traversed starting from the references held directly by the program, and any object not reached by the traversal is garbage. Since this early time, tracing collectors and their variants have been much more widely used due to perceived deficiencies in reference counting.
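By way of illustration only, the reference counting behavior summarized above can be sketched as follows. The names Node, add_ref, and remove_ref are hypothetical and do not come from the cited references; the sketch assumes a single thread and ignores cyclic structures.

```python
class Node:
    """A heap object carrying its own reference count (hypothetical name)."""

    def __init__(self, name):
        self.name = name
        self.rc = 0      # number of references that point at this node
        self.refs = []   # nodes that this node references


def add_ref(src, dst):
    """Create a reference src -> dst and increment dst's count."""
    src.refs.append(dst)
    dst.rc += 1


def remove_ref(src, dst):
    """Delete one reference src -> dst; return the names of reclaimed nodes."""
    src.refs.remove(dst)
    return _release(dst)


def _release(obj):
    """Decrement obj's count; at zero, reclaim obj and release its children."""
    obj.rc -= 1
    if obj.rc > 0:
        return []
    freed = [obj.name]
    for child in obj.refs:
        freed += _release(child)
    obj.refs = []
    return freed


# Usage: root -> a -> b. Dropping root -> a reclaims a, and then b in turn.
root, a, b = Node("root"), Node("a"), Node("b")
add_ref(root, a)
add_ref(a, b)
reclaimed = remove_ref(root, a)
```

In this sketch an object is reclaimed at the moment its count reaches zero, and reclaiming it removes the references it held, which may in turn reclaim further objects.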
Changes in the relative costs of memory and processing power, and the widespread adoption of languages that employ garbage collection, have modified the landscape. As processor clock speeds increase while Random Access Memory (RAM) becomes plentiful but not significantly faster, certain properties of reference counting make it more appealing. Moreover, the purported extra processing power required is likely to be less relevant.
At the same time, the incorporation of garbage collection by the programming language Java has thrust the problem into the mainstream. Now, large, mission-critical systems are being built in Java. This stresses the flexibility and scalability of the underlying garbage collection implementations used in Java. As a result, the supposed advantages of tracing collectors, namely simplicity and low overhead, are being eroded as these collectors are made ever more complex in an attempt to address the real-world requirements of large and varied programs.
Furthermore, the fundamental assumption behind tracing collectors, namely that it is acceptable to periodically trace all of the live objects in the heap (an area of memory reserved for data that is created during runtime), will not necessarily scale to the very large main memories that are becoming increasingly common.
There are three primary problems with reference counting: (1) the storage overhead associated with keeping a count for each object; (2) the runtime overhead of incrementing and decrementing the reference count each time a pointer is copied; and (3) the inability to detect cyclic garbage and consequent necessity of including a second garbage collection technique to deal with cyclic garbage.
The inability to collect cyclic garbage (also called “cycles” herein) is generally considered to be the greatest weakness of reference counting collectors. It places the burden on the programmer to break cycles explicitly, requires special programming idioms, or requires a tracing collector to collect the cycles.
The problem of cycles in reference counting systems is illustrated in FIGS. 1 and 2. FIG. 1 shows a subgraph 100 containing a number of nodes 110, 125, 130, 135, 140, 145, 150, and 155 therein. When a computer program runs, it creates a number of objects or data structures or both. The interrelationship between the program, the objects, and the data structures is commonly called a graph. FIG. 1 shows a subset of a graph created by an executing program (the program is not shown). This subset is subgraph 100.
Subgraph 100, as discussed above, contains a number of nodes 110, 125, 130, 135, 140, 145, 150, and 155. Each node represents an object or part of a data structure. The nodes are connected by edges. For instance, between node 110 and node 125 is edge 115, and between node 110 and node 140 is edge 120. Additionally, node 110 is connected to the rest of the graph (not shown) through edge 105. Each edge represents a reference from one node to another node. In FIG. 1, node 110 is referencing node 125 through edge 115, and it is also referencing node 140 through edge 120.
In a reference counting system, the reference count for each node is tracked. For instance, node 125 has a Reference Count (RC) of two because nodes 110 and 135 reference node 125. In FIG. 1, subgraph 160 represents a cyclic structure, while subgraph 170 represents an acyclic structure. Subgraph 160 represents a cyclic structure because there is a series of edges that traverses nodes and that starts at node 125 and ends at node 125. In other words, one can traverse this graph by starting at one node and ending at the same node. Thus, subgraph 160 is cyclic. In subgraph 170, conversely, there is no series of edges that traverses nodes and that starts at one node and ends at the same node. Thus, subgraph 170 is acyclic.
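For illustration, the reference counts and the cyclic or acyclic character of the two subgraphs can be computed from a list of edges, as in the following sketch. The edges inside subgraph 170 and the direction of the cycle in subgraph 160 are assumptions made for this example, since the text identifies only edges 105, 115, and 120 and the reference from node 135 to node 125. The function names are hypothetical.

```python
from collections import Counter

# (source, target) pairs for FIG. 1. "rest" stands for the rest of the graph.
FIG1_EDGES = [
    ("rest", 110),                          # edge 105
    (110, 125),                             # edge 115
    (110, 140),                             # edge 120
    (125, 130), (130, 135), (135, 125),     # subgraph 160 (assumed direction)
    (140, 145), (140, 150), (150, 155),     # subgraph 170 (assumed shape)
]


def reference_counts(edges):
    """Count, for each node, how many edges point at it."""
    return Counter(dst for _, dst in edges)


def has_cycle(nodes, edges):
    """True if a series of edges among `nodes` starts and ends at one node."""
    succ = {n: [d for s, d in edges if s == n and d in nodes] for n in nodes}
    state = dict.fromkeys(nodes, 0)   # 0 = unvisited, 1 = on path, 2 = done

    def visit(n):
        state[n] = 1
        for m in succ[n]:
            if state[m] == 1 or (state[m] == 0 and visit(m)):
                return True
        state[n] = 2
        return False

    return any(state[n] == 0 and visit(n) for n in nodes)


rc = reference_counts(FIG1_EDGES)
cyclic_160 = has_cycle({125, 130, 135}, FIG1_EDGES)
cyclic_170 = has_cycle({140, 145, 150, 155}, FIG1_EDGES)
```

Under these assumptions, node 125 has a count of two, each of nodes 130 and 135 has a count of one, and only the subgraph containing nodes 125, 130, and 135 is reported as cyclic.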
FIG. 2 shows a resultant subgraph 200 that occurs after the program removes the references from node 110 that created edges 115 and 120. Even though the program explicitly removes the references, a “mutator” actually performs the low-level removal of references. The process undertaken by the mutator is generally hidden from a programmer. A garbage collector will easily recognize that subgraph 170 is garbage, because the reference count for node 140 is zero. A count of zero indicates that node 140 is no longer being referenced by the program, and, therefore, the node may be removed. Because node 140 can be removed, the references it holds are removed with it, and nodes 145, 150, and 155 can also be removed.
Subgraph 160 is more challenging for a garbage collector. There is no node that contains a reference count of zero. Even though this subgraph 160 cannot be accessed by the program, the reference counts are non-zero. A garbage collector in this instance will have to select a node and search through the entire subgraph to determine that no node in the subgraph is referenced by a node outside of the subgraph. It can then eliminate subgraph 160 as garbage.
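The contrast between subgraphs 170 and 160 can be sketched as follows. This is a simplified, single-threaded illustration and not the technique of any particular collector. The edges inside subgraph 170, the direction of the cycle in subgraph 160, and all function names are assumptions made for the example.

```python
FIG1_EDGES = [
    ("rest", 110),                          # edge 105
    (110, 125), (110, 140),                 # edges 115 and 120
    (125, 130), (130, 135), (135, 125),     # subgraph 160 (assumed direction)
    (140, 145), (140, 150), (150, 155),     # subgraph 170 (assumed shape)
]


def build(edges):
    """Return (refs, rc): outgoing references and reference count per node."""
    refs, rc = {}, {}
    for src, dst in edges:
        refs.setdefault(src, []).append(dst)
        refs.setdefault(dst, [])
        rc.setdefault(src, 0)
        rc[dst] = rc.get(dst, 0) + 1
    return refs, rc


def remove_edge(refs, rc, src, dst, freed):
    """Drop src -> dst; when dst's count reaches zero, reclaim it recursively."""
    refs[src].remove(dst)
    rc[dst] -= 1
    if rc[dst] == 0:
        freed.append(dst)
        for child in list(refs[dst]):
            remove_edge(refs, rc, dst, child, freed)


def no_outside_references(refs, rc, start):
    """Search the subgraph reachable from `start`; True if every count in it
    is fully explained by references that originate inside the subgraph."""
    seen, stack = set(), [start]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(refs[n])
    internal = dict.fromkeys(seen, 0)
    for n in seen:
        for m in refs[n]:
            internal[m] += 1
    return all(rc[n] == internal[n] for n in seen)


refs, rc = build(FIG1_EDGES)
before = no_outside_references(refs, rc, 125)   # edge 115 still present
freed = []
remove_edge(refs, rc, 110, 125, freed)          # remove edge 115
remove_edge(refs, rc, 110, 140, freed)          # remove edge 120
after = no_outside_references(refs, rc, 125)
```

In this sketch, removing edge 120 reclaims nodes 140, 145, 150, and 155 by counting alone, whereas nodes 125, 130, and 135 each retain a count of one. Only the search from node 125 establishes that those counts arise entirely from within the subgraph, which is why the search must visit every node of the subgraph.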
Many reference counting systems use a “stop the world” type of synchronous garbage collection, where all processes or threads other than the garbage collector are stopped. This means that the reference counts are not changing while the garbage collector collects garbage. However, “stop the world” garbage collection can take too much time. In fact, garbage collectors of this type have been known to run for many seconds or even minutes on large systems, which is too long for critical applications. Thus, concurrent garbage collection, which allows processes to run during garbage collection, is becoming increasingly necessary.
Concurrent collection of garbage creates additional problems, however. One of these problems is illustrated in FIG. 2. In FIG. 2, subgraph 160 is considered to be garbage once edge 115 is removed. However, node 210 might add edge 220 shortly before edge 115 is removed. This would cause the reference count for node 130 to be increased to two. If the garbage collector recognizes the removal of edge 115 but has not yet recognized the addition of edge 220, the garbage collector will determine that subgraph 160 and its nodes 125, 130, and 135 are garbage. However, they are not garbage because node 210 has added or will add edge 220 to allow node 210 to reference node 130.
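The hazard can be illustrated with a deterministic replay of one possible interleaving. The sketch assumes, purely for illustration, that each of two mutator threads records its updates in its own log and that the collector has processed one log but not the other. The logs, the function names, and the omission of subgraph 170 are all assumptions of the example.

```python
def no_outside_references(refs, rc, start):
    """True if every count in the subgraph reachable from `start` is
    explained by references that originate inside that subgraph."""
    seen, stack = set(), [start]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(refs[n])
    internal = dict.fromkeys(seen, 0)
    for n in seen:
        for m in refs[n]:
            internal[m] += 1
    return all(rc[n] == internal[n] for n in seen)


def apply_log(log, refs, rc):
    """Replay logged reference updates into the collector's view."""
    for op, src, dst in log:
        if op == "add":
            refs[src].append(dst)
            rc[dst] += 1
        else:
            refs[src].remove(dst)
            rc[dst] -= 1


# Collector's view before either update. Nodes 110 and 210 are assumed to be
# referenced once each from the rest of the graph.
refs = {110: [125], 125: [130], 130: [135], 135: [125], 210: []}
rc = {110: 1, 125: 2, 130: 1, 135: 1, 210: 1}

log_b = [("add", 210, 130)]      # edge 220 is added first ...
log_a = [("remove", 110, 125)]   # ... and edge 115 is removed afterwards

# The collector happens to process the removal before the addition.
apply_log(log_a, refs, rc)
premature = no_outside_references(refs, rc, 125)   # collector's wrong verdict
apply_log(log_b, refs, rc)
actual = no_outside_references(refs, rc, 125)      # verdict with both updates
```

With only the removal applied, the search concludes that nodes 125, 130, and 135 have no outside references. Once the addition of edge 220 is also applied, node 130 has a count of two and the same search no longer treats the subgraph as garbage.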
Concurrent collection therefore introduces additional problems for reference counting garbage collection systems. Techniques for concurrent collection of garbage exist, but these techniques do not use reference counting.
Thus, better techniques are needed for concurrent collection of cyclic garbage in reference counting computer systems.