The present invention relates generally to distributed computer systems in which multiple processes are able to access network objects, and particularly to a method for garbage collecting cycles of distributed network objects.
The term garbage collection describes a process implemented on one or more interconnected general purpose machines (real or virtual) for effectively deleting obsolete data from a memory associated with the machines. The problems of garbage collection, and solutions to them, are well known. For example, U.S. Pat. No. 5,241,673 and U.S. Pat. No. 5,446,901, hereby expressly incorporated by reference for all purposes, describe general background information as well as conventional solutions to Distributed Garbage Collection.
An object is a construct of a computing machine. To instantiate an object, a machine allocates a portion of its memory in order to define and make use of the object. During operation of a machine, objects are continually created, used and obsoleted. As memory is limited, it is desirable to identify and collect obsolete objects (objects no longer required by any existing object) so that memory previously allocated to obsolete objects may be used by the machine, such as to create new objects. Sometimes collection of these obsolete objects lags behind their obsolescence and the operation of the machine may begin to be degraded as a consequence.
Conventional solutions for garbage collection, such as those described in the patents incorporated above, include methods for checking each object to determine whether it is obsolete and should be collected. In a method of this type, referred to as a mark and sweep process, an analysis begins at all root objects stored in the memory of all of the machines making up the distributed system. A forward reference graph defines a relationship between a root object and all the secondary objects that the root object references. The secondary objects may include references to tertiary objects, which may include further references to other objects. Objects may be instantiated in different portions of the collective memory of all of the concurrent processes in all of the different machines. Mark and sweep requires that several messages be sent to and received from every object. As a consequence, mark and sweep solutions to distributed garbage collection are expensive in terms of time and message overhead.
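The mark and sweep process described above can be illustrated with a minimal single-process sketch. It is offered only as an illustration and is not taken from the incorporated patents: the dictionary `heap`, the object names, and the function `mark_and_sweep` are hypothetical. In a distributed system, each step that follows a reference could require a message exchange between machines, which is the overhead noted above.

```python
from collections import deque

def mark_and_sweep(forward_refs, roots):
    """Return (live, garbage) for a forward reference graph.

    forward_refs maps each object to the objects it references.
    All names here are hypothetical and for illustration only.
    """
    marked = set()
    pending = deque(roots)
    # Mark phase: visit everything reachable from the root objects.
    while pending:
        obj = pending.popleft()
        if obj in marked:
            continue
        marked.add(obj)
        pending.extend(forward_refs.get(obj, ()))
    # Sweep phase: any object that was never marked is obsolete.
    garbage = set(forward_refs) - marked
    return marked, garbage

# "root" references "a", which references "b"; "x" and "y" reference
# only each other and are not reachable from any root.
heap = {
    "root": ["a"],
    "a": ["b"],
    "b": [],
    "x": ["y"],
    "y": ["x"],
}
live, garbage = mark_and_sweep(heap, ["root"])
```

In this sketch the unreachable pair `x` and `y` ends up in `garbage`, while `root`, `a`, and `b` remain live.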
In addition to these incorporated patents, another reference describing a conventional solution to Distributed Garbage Collection is "Garbage Collection on an Open Network," International Workshop on Memory Management, Springer-Verlag LNCS 986, 1995, by Matthew Fuchs, also hereby expressly incorporated by reference for all purposes. "Garbage Collection on an Open Network" describes a total solution to Distributed Garbage Collection that makes use of inverse reference graphs. Construction, maintenance and use of inverse reference graphs are well known and will not be described in detail herein. An inverse reference graph includes objects represented as nodes, with edges between pairs of nodes defining a referential relationship between the pairs of objects represented by the nodes.
Cyclical garbage is a special class of garbage that requires special processing for identification so that it may be collected. FIG. 1 is an inverse reference graph for a cycle 100 including a collection of three objects (first object 102, second object 104, and third object 106). In cycle 100, object 102 has a first reference arrow 108 pointing to object 104. The direction of reference arrow 108 reflects that object 104 references object 102. In other words, reference arrow 108 starting from object 102 and extending to object 104 means that object 102 is referenced by object 104. Arrows directed away from a node represented on an inverse reference graph define the branches of the node.
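The arrow convention of FIG. 1 can be illustrated by inverting a forward reference graph. The sketch below is illustrative only: the function `invert` and the dictionary layout are hypothetical, and the integer keys simply reuse the reference numerals of objects 102, 104, and 106.

```python
def invert(forward_refs):
    """Build an inverse reference graph from a forward reference graph.

    forward_refs maps each object to the objects it references. The
    result maps each object to the objects that reference it, so an
    inverse edge runs from the referenced object to its referrer.
    """
    inverse = {obj: set() for obj in forward_refs}
    for referrer, targets in forward_refs.items():
        for target in targets:
            inverse.setdefault(target, set()).add(referrer)
    return inverse

# Forward references implied by FIG. 1: object 104 references 102,
# object 106 references 104, and object 102 references 106.
forward = {104: [102], 106: [104], 102: [106]}
inverse = invert(forward)
# inverse[102] contains 104, matching reference arrow 108 (102 to 104).
```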
In cycle 100, object 104 has a second reference arrow 110 pointing to object 106. Object 106 has a third reference arrow 112 pointing to object 102. The references between the objects are cyclical. Unless one of the objects is a rooted object, either locally rooted or remotely rooted, cycle 100 is garbage. A locally rooted object is an object that is referenced by a root (a persistent, non-collectable object) in the same machine as the object. A remotely rooted object is an object that is referenced only by objects in remote machines, where all of those references originate from one or more locally rooted objects.
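Whether a cycle such as cycle 100 is garbage can be illustrated by walking an inverse reference graph from one of its objects and checking whether any root is reached. This is a minimal single-process sketch under stated assumptions, not the method of any cited reference: the function `reaches_root`, the root name `"R"`, and the dictionary layout are all hypothetical, and the distinction between local and remote roots is ignored.

```python
def reaches_root(inverse_refs, start, roots):
    """Return True if a root is found by following inverse edges.

    inverse_refs maps each object to the set of objects that
    reference it. All names are hypothetical, for illustration only.
    """
    seen = set()
    stack = [start]
    while stack:
        obj = stack.pop()
        if obj in roots:
            return True
        if obj in seen:
            continue  # already visited; this is how a cycle terminates
        seen.add(obj)
        stack.extend(inverse_refs.get(obj, ()))
    return False

# Cycle 100 as drawn in FIG. 1: no object is referenced by a root.
cycle_100 = {102: {104}, 104: {106}, 106: {102}}
unrooted = reaches_root(cycle_100, 102, roots={"R"})

# The same cycle, with object 104 also referenced by a hypothetical root "R".
rooted_graph = {102: {104}, 104: {106, "R"}, 106: {102}}
rooted = reaches_root(rooted_graph, 102, roots={"R"})
```

In the first case the walk returns to object 102 without encountering a root, so the cycle would be garbage; in the second case the root is found and the cycle must be retained.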
In the case where cycle 100 resides in a single simple machine with few objects, it is straightforward and inexpensive (in terms of time and the number of message exchanges among the objects) to identify cycles. When cycle 100 is distributed across two or more machines, and those machines have large numbers of objects to create, use, and identify as obsolete, prior art solutions for collecting cycles become expensive.