The present invention is in the general field of memory management and concerns more specifically automatic memory management or garbage collection (GC).
Garbage collectors free the space that can no longer be used by a program so that this space can be reused for future allocations. In many systems, the unit of space allocated by a program and freed by the collector is called an object. A so-called “concurrent” garbage collector represents a general class of collectors in which the mutators continue to work while the collector is active. Note, however, that there may be a point during the GC cycle where all the mutator threads need to be stopped at once. An on-the-fly collector based on the original article by Dijkstra et al. [Edsger W. Dijkstra, Leslie Lamport, A. J. Martin, C. S. Scholten, E. F. M. Steffens, On-the-Fly Garbage Collection: An Exercise in Cooperation, November, 1978, Communications of the ACM] does not have a synchronization point where all threads are stopped at once. Doligez and Gonthier [Damien Doligez, Georges Gonthier, Portable, Unobtrusive Garbage Collection for Multiprocessor Systems, January, 1994, Conference Record of the Twenty-first Annual ACM Symposium on Principles of Programming Languages] described a more advanced and more efficient on-the-fly algorithm.
An on-the-fly garbage collector, i.e., a collector that reclaims unused space in parallel with the running program without stopping it for the collection, is a fascinating theoretical idea with important benefits in practice. In particular, on many server platforms, stopping all parallel threads in order to perform a garbage collection task is a costly, time-consuming operation. The reason is that threads cannot be stopped at an arbitrary point and, thus, there is a relatively long wait until the last (of many) threads reaches a point where it may stop. Additionally, stopping all program threads during garbage collection does not take advantage of all available processors.
On-the-fly garbage collectors are well known in the literature. On-the-fly collectors generally use mark sweep, whereas concurrent collectors may also use other garbage collection techniques, e.g., copying. In the mark sweep type of collectors, there is normally a first step, in which the live memory objects in the heap are marked, and a second step, in which the unmarked objects are “swept”, i.e., reclaimed for future use.
The trace of live objects is normally (although not necessarily) done with a 3-color scheme: Objects are white if they have not been traced, they are marked gray if they have been traced but their immediate children have not yet been traced, and they are marked black if they have been traced and their immediate children have been traced as well. The trace proceeds step by step by taking a gray object, marking it black and marking gray all its white children.
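The 3-color trace described above can be illustrated with a minimal, single-threaded sketch. The `Obj` class, the `trace` function, and the four sample objects are hypothetical names introduced only for this illustration; the sketch assumes no mutator runs during the trace.

```python
from enum import Enum


class Color(Enum):
    WHITE = 0  # not yet traced
    GRAY = 1   # traced, children not yet traced
    BLACK = 2  # traced, children traced as well


class Obj:
    """Hypothetical heap object with a color and a list of children."""

    def __init__(self, name):
        self.name = name
        self.color = Color.WHITE
        self.children = []


def trace(roots):
    """Take a gray object, gray its white children, then mark it black."""
    gray = []
    for root in roots:
        if root.color is Color.WHITE:
            root.color = Color.GRAY
            gray.append(root)
    while gray:
        obj = gray.pop()
        for child in obj.children:
            if child.color is Color.WHITE:
                child.color = Color.GRAY
                gray.append(child)
        obj.color = Color.BLACK


a, b, c, d = Obj("a"), Obj("b"), Obj("c"), Obj("d")
a.children = [b, c]
b.children = [c]
trace([a])  # d is unreachable from the root a
live = sorted(o.name for o in (a, b, c, d) if o.color is Color.BLACK)
garbage = sorted(o.name for o in (a, b, c, d) if o.color is Color.WHITE)
```

Objects still white when the gray list is exhausted are the ones a sweep step would reclaim.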
The fact that the collector works “on-the-fly” makes its task harder: while it is scanning the heap, the user program threads change the reachability graph concurrently. If the collector uses the naive scheme described above, it may miss some live objects. If, for example, (see FIG. 1) the user program moves a white node (1) from being referenced by a gray object (2) (i.e., one whose children (3 and 4) have not yet been traced) to being referenced by a black object (5) (whose children (6, 7) will not be traced any more), then the white object (1) (and its children, if any) may not be traced.
To solve this problem and let the collector spot all live objects during the trace, the program threads help the collector through use of a write barrier. During the garbage collecting cycle, whenever a pointer is modified from pointing to an object A into pointing to an object B, either A or B is marked gray by the modifying thread (in the example of FIG. 1, object (1) is marked gray either when the reference from (5) is created or when the reference from (2) is erased). Choosing which of the objects to mark depends on the specific algorithm or the stage of the algorithm. Sometimes, algorithms may mark both A and B gray and sometimes only A or only B. This operation of the program is sometimes called the “write barrier” or the “update protocol”.
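The effect of the write barrier on the FIG. 1 scenario can be sketched as follows. This is a single-threaded simulation of one interleaving, not a concurrent implementation; `Node`, `shade`, `move_reference`, and `run` are hypothetical names, and the sketch assumes the variant of the update protocol that grays the object being re-linked.

```python
WHITE, GRAY, BLACK = "white", "gray", "black"


class Node:
    def __init__(self, name):
        self.name = name
        self.color = WHITE
        self.refs = []


def shade(node, gray_list):
    """Mark a white node gray and record it for the collector."""
    if node.color == WHITE:
        node.color = GRAY
        gray_list.append(node)


def move_reference(src, dst, target, gray_list, barrier):
    """Mutator step: move the reference to `target` from src to dst."""
    if barrier:
        shade(target, gray_list)  # assumed update protocol: gray the moved object
    dst.refs.append(target)
    src.refs.remove(target)


def finish_trace(gray_list):
    """Collector step: drain the gray list as in the 3-color scheme."""
    while gray_list:
        node = gray_list.pop()
        for child in node.refs:
            shade(child, gray_list)
        node.color = BLACK


def run(barrier):
    n1, n2, n5 = Node("1"), Node("2"), Node("5")
    n2.refs = [n1]
    # Collector state mid-trace: (5) is already black, (2) is gray and pending.
    n5.color = BLACK
    n2.color = GRAY
    gray_list = [n2]
    move_reference(n2, n5, n1, gray_list, barrier)
    finish_trace(gray_list)
    return n1.color


without_barrier = run(barrier=False)
with_barrier = run(barrier=True)
```

Without the barrier, object (1) stays white even though it is reachable from (5), so a sweep would reclaim a live object; with the barrier it is traced.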
In a typical scenario, more than one program thread (referred to also as mutator thread) and the collector thread run simultaneously, meaning that the update (graying) of the objects is executed also during collection. Thus, not only do the mutators gray objects in parallel one with the other, but they also gray objects in parallel with the collector during trace. Collector, in this context, signifies one or more collector threads.
This manner of operation may create race conditions between mutators, and/or between the collector and the mutators, which is obviously undesired. Race conditions may occur for example in the following scenario. Marking an object gray by the mutators and the handling of gray objects by the collector may occur concurrently. This may create a race condition if there is a need to keep track of the gray objects.
In a multiprocessor environment, previous implementations have either required frequent explicit synchronization between the collector and the mutators in order to keep track of the gray objects (e.g. using a single mark buffer), or have been inefficient and required repeated scans of the heap (or a data structure proportional to the size of the heap) until there are no more gray objects. The first option slows down the mutators and the second option slows down the collector, delaying the collection of garbage.
Turning to the specified second solution of repeatedly scanning the heap to find the gray objects, it requires little synchronization between the mutator threads and the collector thread. However, not only is scanning the heap multiple times inefficient, but it may require bringing every page of the heap into memory, which may be very costly time-wise. This problem may be ameliorated by using a color bitmap (as described in “Garbage Collection” by Richard Jones and Rafael Lins, pp. 87-88) to hold the color representation of objects. However, this still requires multiple scans of the color bitmap, whose size is proportional to the size of the heap, until no grays remain; hence it suffers from the same inefficiency drawback.
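The cost of the repeated-scan approach can be made concrete with a small sketch. The color table below is a plain Python list standing in for a color bitmap, and `trace_by_rescanning` is a hypothetical name; the example heap is a deliberately unfavorable layout (each object references the one at the next lower index), chosen as an assumption to show how many passes may be needed.

```python
WHITE, GRAY, BLACK = 0, 1, 2


def trace_by_rescanning(colors, children):
    """Sweep the whole color table repeatedly until a pass finds no gray entry.

    Returns (passes, slots_visited).
    """
    passes = 0
    slots_visited = 0
    found_gray = True
    while found_gray:
        found_gray = False
        passes += 1
        for i in range(len(colors)):
            slots_visited += 1
            if colors[i] == GRAY:
                found_gray = True
                for c in children[i]:
                    if colors[c] == WHITE:
                        colors[c] = GRAY
                colors[i] = BLACK
    return passes, slots_visited


n = 4
colors = [WHITE] * n
colors[n - 1] = GRAY                                # the root is the last object
children = [[]] + [[i - 1] for i in range(1, n)]    # object i references i - 1
passes, slots_visited = trace_by_rescanning(colors, children)
```

For this layout, each pass discovers only one new object, so four live objects cost five full passes over the table, whereas a gray worklist would touch each object once.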
In accordance with an alternative approach, queuing gray objects in a mark buffer eliminates the need for multiple scans by keeping track of all remaining gray objects, i.e., those that still need to be traced by the collector. However, having multiple threads write to the same mark buffer requires synchronization, which, as specified before, gives rise to an undesired slowdown.
There is accordingly a need in the art for a novel technique that enables tracing of memory objects with little or no explicit synchronization. The proposed approach is also useful for other applications which employ multiple writers and a single reader.
In the context of the invention, reference to a memory object should not be construed as limited to any specific data type or size. Object should be construed in a broad manner, including any area of memory which is returned in response to an allocation request by a program thread.
Reference to colors of memory objects is provided for illustrative purposes only, indicating corresponding state associated with the memory object.
Thread should be construed in a broad manner, including “process”.
Whilst, for simplicity, the invention is described with reference to an on-the-fly garbage collection application, those versed in the art will readily appreciate that the invention is by no means bound by this example. Thus, by another non-limiting embodiment, the garbage collection technique of the invention is used with a concurrent garbage collection algorithm. It should be further noted that the use of the invention is not necessarily bound to the so-called “mark and sweep” algorithm.
In accordance with the broadest aspect of the invention, there is provided a generic data structure associated with at least Insert, Extract and isEmpty operations. The Insert operation is designated for inserting objects to selected parts of the generic data structure by multiple writer threads whilst avoiding (or substantially avoiding) synchronization between the writers. The Extract operation is designated for extracting objects by one or more readers from selected parts of the generic data structure whilst avoiding (or substantially avoiding) synchronization with any of the writers. The selected parts of the generic data structure that are utilized by the Insert operation may partially or fully overlap the selected parts of the generic data structure that are utilized by the Extract operation, all as required and appropriate, depending upon the particular application.
The isEmpty operation is designated for determining if there are remaining objects in selected parts of the generic data structure. In accordance with the invention, the isEmpty operation is not synchronized with either the Insert or the Extract operation, thereby bringing about the desired result that no (or substantially no) synchronization exists between the writers and between the writers and one or more readers.
Thus, in accordance with the broadest aspect, the invention provides for a computer implemented method that utilizes at least two writer threads and at least one reader thread, wherein said writer threads run on the computer simultaneously with said reader thread, the method comprising the steps of:
(a) providing a generic data structure for said threads; the generic data structure is associated with at least Insert, Extract and isEmpty operations;
(b) inserting objects to selected parts of the generic data structure by at least two writer threads, using said Insert operation;
(c) extracting objects by the reader thread from selected parts of the generic data structure, using said Extract operation;
(d) determining if there are remaining objects in selected parts of the generic data structure utilizing said isEmpty operation; said isEmpty operation is substantially not synchronized with said Insert and Extract operations;
whereby substantially no synchronization is required between the writer threads themselves; and between said reader threads and said writer threads.
In a preferred embodiment, the proposed technique is utilized for performing garbage collection of unused memory objects in a memory heap. By this embodiment the specified writer threads stand for mutator threads and the reader thread (or threads) stands for the respective one or more collector threads. Still further by this embodiment the generic data structure includes a dedicated mark buffer and associated fields for each one of the mutator threads as well as for the collector thread(s).
Using a dedicated mark buffer for each respective thread alleviates the problem of a potential race between mutator threads. However, this does not cope with a situation in which the collector extracts from a mutator's mark buffer while the mutator continues to add to that mark buffer.
In order to overcome the above problems, there are provided, as specified, at least three operations, Insert, Extract, and isEmpty, which are associated with each mark buffer. Insert inserts an element (being representative of a grayed memory object) to a mark buffer (constituting a part of said generic data structure). Extract chooses an arbitrary element, removes it from the mark buffer (constituting a part of said generic data structure) and returns it. The order of extraction is determined according to the application, e.g., FIFO or LIFO. isEmpty returns true if the data structure was empty (i.e., no remaining elements to extract) at the time the isEmpty operation was initiated.
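One possible shape for a mark buffer with these three operations is sketched below. It is an illustration under stated assumptions, not the layout used by the invention: `MarkBuffer`, `demo`, the fixed capacity, and the FIFO order are all hypothetical choices. The sketch assumes exactly one writer and one reader per buffer, with each index written by only one side, and it relies on the CPython interpreter to make the individual loads and stores safe; a native implementation would additionally have to enforce the store ordering noted in the comments.

```python
import threading


class MarkBuffer:
    """Single-writer, single-reader buffer (illustrative sketch only)."""

    def __init__(self, capacity):
        self._slots = [None] * capacity
        self._tail = 0  # next free slot; written only by the owning writer
        self._head = 0  # next slot to read; written only by the reader

    def insert(self, obj):
        tail = self._tail
        if tail == len(self._slots):
            raise OverflowError("mark buffer full")
        self._slots[tail] = obj  # store the element first,
        self._tail = tail + 1    # then publish it by advancing the index

    def extract(self):
        """Return the oldest element (FIFO), or None if nothing is published."""
        head = self._head
        if head == self._tail:
            return None
        obj = self._slots[head]
        self._head = head + 1
        return obj

    def is_empty(self):
        return self._head == self._tail


def demo(n=1000):
    """One writer thread inserts while the calling thread extracts."""
    buf = MarkBuffer(n)
    writer = threading.Thread(target=lambda: [buf.insert(i) for i in range(n)])
    writer.start()
    received = []
    while len(received) < n:
        item = buf.extract()
        if item is not None:
            received.append(item)
    writer.join()
    return received, buf.is_empty()


received, empty_at_end = demo()
```

Neither operation takes a lock: the writer only advances `_tail`, the reader only advances `_head`, and an element becomes visible to the reader only after it has been stored.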
All three operations are done with substantially no synchronization cost, or with very little synchronization cost, which can be made arbitrarily small.
Using a data structure of the kind specified (associated with the Insert, Extract and isEmpty operations) enables keeping track of the objects remaining to be traced efficiently, at a cost proportional to the number of remaining objects, as opposed to hitherto known techniques where the computational complexity depends on the heap size. Thus, in accordance with one prior art technique, the heap is repeatedly scanned to examine object colors; in accordance with another hitherto known technique, the bitmap (which varies in size with the heap) is repeatedly scanned.
As will be explained in greater detail below, using the technique of the invention substantially avoids synchronization between the collector thread performing the Extract operation and the mutator threads performing the Insert operation, and also among the mutator threads themselves.
The collector thread uses the isEmpty operation to check that there are no objects remaining to be traced. This operation is also done substantially without synchronization to the other Extract and/or Insert operations.
In accordance with this preferred embodiment, many mutator threads invoke Insert and preferably, although not necessarily, a single collector thread invokes both Insert and Extract. Accordingly, a reader should be construed also as possibly performing writing operations, i.e. reader/writer.
The implementation employs a buffer for each thread. Each thread can insert an element in its buffer without synchronization. The collector can extract from each of the buffers without synchronization. The check for the completion of tracing (isEmpty) is done by the collector without synchronization; the cost of the check is proportional to the number of threads, and independent of heap size.
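The per-thread arrangement can be sketched as follows. `PerThreadMarkBuffers`, `acquire_buffer`, and `demo` are hypothetical names. As a simplifying assumption, each buffer is a `collections.deque`, whose appends and pops are documented as thread-safe in CPython; this stands in for the synchronization-free buffer of the text rather than reproducing it. A lock is taken only when a thread acquires its buffer, never on Insert, Extract, or isEmpty. The sketch also sidesteps the question of how isEmpty remains correct while mutators are still inserting by having the demo confirm emptiness only after the mutator threads have finished.

```python
import threading
from collections import deque


class PerThreadMarkBuffers:
    """One buffer per thread; the collector visits them in turn (sketch)."""

    def __init__(self):
        self._buffers = []
        self._lock = threading.Lock()  # used only when a buffer is acquired

    def acquire_buffer(self):
        buf = deque()
        with self._lock:
            self._buffers.append(buf)
        return buf

    def extract(self):
        """Collector side: take an element from the first non-empty buffer."""
        for buf in self._buffers:
            try:
                return buf.popleft()
            except IndexError:
                continue
        return None

    def is_empty(self):
        # One check per buffer: cost grows with thread count, not heap size.
        return all(len(buf) == 0 for buf in self._buffers)

    def buffer_count(self):
        return len(self._buffers)


def demo(num_mutators=4, per_thread=250):
    buffers = PerThreadMarkBuffers()

    def mutator(base):
        buf = buffers.acquire_buffer()
        for i in range(per_thread):
            buf.append(base + i)  # Insert into this thread's own buffer

    threads = [threading.Thread(target=mutator, args=(k * per_thread,))
               for k in range(num_mutators)]
    for t in threads:
        t.start()
    collected = []
    while True:
        item = buffers.extract()
        if item is not None:
            collected.append(item)
        elif not any(t.is_alive() for t in threads) and buffers.is_empty():
            break
    return sorted(collected), buffers.buffer_count()


collected, buffer_count = demo()
```

The emptiness check iterates over one buffer per thread, which mirrors the statement that its cost is proportional to the number of threads and independent of heap size.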
John DeTreville [Experience with Concurrent Garbage Collectors for Modula-2+, November, 1990, Digital Systems Research Center, (copyright) Digital Equipment Corporation] describes a seemingly similar buffering scheme for a concurrent reference counting collector. However, his scheme does not allow the collector to access a mutator's buffer at the same time the mutator may be inserting entries. Also, his scheme requires a point in time when all mutator threads are stopped; at that time, the collector can process the partially filled buffers of the mutators. Thus, his scheme is not appropriate for an on-the-fly collector.
The invention further provides for a system of the kind specified mutatis mutandis: a computer implemented method for performing garbage collection of unused memory objects in a memory heap by at least one collector thread; at least one mutator thread running on the computer simultaneously with said at least one collector thread, the method comprising the steps of:
(a) providing a mark buffer data structure for each one of the mutator and collector threads; each mark buffer is associated with at least three operations, Insert, Extract, and isEmpty, wherein Insert inserts an element representative of a memory object, Extract chooses an arbitrary element representative of a memory object, removes it from the mark buffer, and isEmpty returns true if all the mark buffers include no remaining elements to extract, at the time the operation was initiated;
(b) applying on-the-fly garbage collection in order to collect unused memory objects in the heap; said on the fly garbage collection step includes:
i. the at least one mutator thread acquires, using synchronization primitives, a respective dedicated mark buffer and uses said Insert operation for inserting objects to said mark buffer;
ii. the at least one collector thread uses said Extract operation for extracting objects from a mark buffer;
determining if there are remaining objects in the mark buffers utilizing said isEmpty operation; said isEmpty operation is substantially not synchronized with said Insert and Extract operations;
whereby substantially no synchronization is required between the mutator threads themselves; and between said at least one collector thread and said mutator threads.
By an alternative embodiment, a concurrent garbage collector is employed instead of said on-the-fly garbage collector.
Still further, the invention provides for a program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform a method that utilizes at least two writer threads and at least one reader thread, wherein said writer threads run on the computer simultaneously with said reader thread, the method comprising the steps of:
(a) providing a generic data structure for said threads; the generic data structure is associated with at least Insert, Extract and isEmpty operations;
(b) inserting objects to selected parts of the generic data structure by the at least two writer threads, using said Insert operation;
(c) extracting objects by the reader thread from selected parts of the generic data structure, using said Extract operation;
(d) determining if there are remaining objects in selected parts of the generic data structure utilizing said isEmpty operation; said isEmpty operation is substantially not synchronized with said Insert and Extract operations;
whereby substantially no synchronization is required between the writer threads themselves; and between said reader threads and said writer threads.