In the context of memory management, garbage collection (GC) is the task of identifying and reclaiming memory objects that were previously allocated for a given computer application and are no longer used by it. Consider, for example, a contiguous (and relatively large) heap (10) (see FIG. 1) and a smaller, so-called "root memory module" (12), representative of memory currently in use by one or more computer applications.
All objects (14) that are directly or indirectly reachable from the root by pointers are "alive" and should not be collected. In contrast, all objects (16) that no chain of pointers from the root reaches are no longer in use and are therefore regarded as garbage to be collected. After collection, only live objects remain in the heap, and the memory space released by the collection may be allocated for new applications.
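The reachability rule above can be illustrated with a minimal marking sketch. Object identifiers, the `references` mapping and the function name `find_live` are hypothetical. The sketch only shows how "directly or indirectly reachable" separates live objects from garbage. It is not the collector of any particular system.

```python
from collections import deque


def find_live(roots, references):
    """Return every object reachable from `roots` by following pointers."""
    live = set()
    pending = deque(roots)
    while pending:
        obj = pending.popleft()
        if obj in live:
            continue                              # already marked
        live.add(obj)
        pending.extend(references.get(obj, ()))   # follow outgoing pointers
    return live


# Hypothetical heap: root -> A -> B -> C; D -> E, but nothing reachable points to D.
heap = {"A": ["B"], "B": ["C"], "C": [], "D": ["E"], "E": []}
live = find_live(["A"], heap)
garbage = set(heap) - live
print(sorted(live), sorted(garbage))   # ['A', 'B', 'C'] ['D', 'E']
```

E is referenced by a pointer, but only from D, which is itself unreachable. E is therefore garbage too.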
Scanning the entire heap in order to identify and collect garbage objects is time-consuming. An improved scheme called "generational collection" [5] was therefore developed, and it is now a well-accepted solution for reducing the pause times induced by garbage collection. Generational garbage collectors rely on the assumption that many objects die young. Under this assumption, it is useful to collect garbage in the young area more frequently. Young objects that survive several collections are "promoted" to the older generation. Since the young generation is kept small, most collections are fast and do not stall the application for long.
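The promotion policy described above can be sketched as follows. This assumes each young object carries a count of the young collections it has survived. `PROMOTION_AGE` and `collect_young` are hypothetical names, and the threshold of 3 is illustrative only. The set of live young objects is taken as given here; how it is found is a separate concern.

```python
PROMOTION_AGE = 3   # hypothetical threshold: promote after surviving 3 young collections


def collect_young(young, old, live_young):
    """One young-generation collection.

    young: dict of object id -> number of young collections it has survived
    old: set of object ids already in the old generation
    live_young: ids in `young` found reachable during this collection
    Returns the number of young objects reclaimed.
    """
    reclaimed = 0
    for obj in list(young):                 # copy keys: `young` is modified below
        if obj not in live_young:
            del young[obj]                  # garbage: its space can be reused
            reclaimed += 1
            continue
        young[obj] += 1                     # survived one more collection
        if young[obj] >= PROMOTION_AGE:
            del young[obj]                  # promote to the old generation
            old.add(obj)
    return reclaimed


young = {"a": 0, "b": 0, "c": 2}
old = set()
reclaimed = collect_young(young, old, live_young={"b", "c"})
print(reclaimed, young, old)   # 1 {'b': 1} {'c'}
```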
FIG. 2 illustrates schematically a generational garbage collection scheme in which the heap (20) is partitioned into two areas, young (21) and old (22). The young area (21) is scanned more frequently. After the garbage objects therein are collected and some of the surviving objects are moved to the old area, objects may be allocated in the young area for new applications. The advantages of the generational garbage collection approach are:
1. Most collections are fast and efficient: they concentrate on the young area, where a high percentage of garbage is expected.
2. The heap is collected frequently and is therefore frequently reused.
3. The collector uses a smaller working set, since most collections scan only a small part of the heap.
4. Advantages 2 and 3 give rise to better overall system behavior with less paging: the collector traces through fewer pages, and the program maintains a small working set since the heap is reused.

(a) partitioning said heap or a portion thereof into at least three generations; a generation from among said at least three generations constituting an oldest generation and being least frequently subject to garbage collection, other generations from among said at least three generations constituting younger generations of which one constitutes a youngest generation most frequently subject to garbage collection;
(b) partitioning said heap or portion thereof into cards;
(c) associating said generations with remembered sets and card markings data structures; said card marking including, for each card, a card scan indication indicative of a youngest one of said generations for which the card has not been scanned;
(d) for every card having a card scan indication value that does not exceed a selected generation:

(a) partitioning said heap or portion thereof into at least three generations; a generation from among said at least three generations constituting an oldest generation and being least frequently subject to garbage collection, other generations from among said at least three generations constituting younger generations of which one constitutes a youngest generation most frequently subject to garbage collection;
(b) partitioning said heap or portions thereof into cards;
(c) associating said generations with remembered sets and card markings data structures; each card in said card markings includes scan generation related data indicative of generations for which the card has or has not to be scanned;
(d) scanning cards according to said scan generation related data;
(e) in the case of identified updated inter-generational pointers, updating the remembered set with the identified inter-generational pointers; and
(f) updating the scan generation related data.

a processor communicating with said memory for scanning the cards having a card scan indication value that does not exceed a selected generation; and for each one of said cards:
(b) updating the respective remembered set with the identified inter-generational pointers for each identified updated inter-generational pointer; and
(c) updating the card scan indication of said card to a generation older by one than said selected generation.

a processor communicating with said memory for:
scanning the cards according to said scan generation related data; and in response to identifying updated inter-generational pointers:
updating the remembered set with the identified inter-generational pointers; and
updating the scan generation related data.

a memory heap or portion thereof that is partitioned into at least three generations; a generation from among said at least three generations constituting an oldest generation and being least frequently subject to garbage collection, other generations from among said at least three generations being younger generations and a generation from among said at least three generations constituting a youngest generation and being most frequently subject to garbage collection; said heap or portion thereof being partitioned into cards; said generations being associated with remembered sets and card markings data structure; said card marking including, for each card, a card scan indication indicative of the youngest generation for which the card has not been scanned.

a memory heap or portion thereof that is partitioned into at least three generations; a generation from among said at least three generations constituting an oldest generation and being least frequently subject to garbage collection, other generations from among said at least three generations being younger generations one of which constitutes a youngest generation most frequently subject to garbage collection; said heap or portions thereof being partitioned into cards; said generations being associated with remembered sets and card markings data structure; each card in said card markings including scan generation related data indicative of generations for which the card has or has not to be scanned.
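The card scan indication recited above can be sketched under several assumptions. Generation 0 is the youngest. Each card holds the youngest generation for which it has not yet been scanned. Before collecting generations 0 through a selected generation, only cards whose indication does not exceed it are scanned. Their remembered-set entries are refreshed, and the indication is set to one generation older than the selected one.

A few details are assumptions rather than part of the text above:
- An application store resets the indication to 0.
- An untouched card holds the value `CLEAN`.
- Remembered sets hold (card, slot) pairs.

All names (`CardTable`, `write`, `prepare_collection`, `CLEAN`) are hypothetical.

```python
NUM_GENERATIONS = 3       # hypothetical: generation 0 is youngest, 2 is oldest
CLEAN = NUM_GENERATIONS   # assumed: "no generation still needs this card scanned"


class CardTable:
    """Per-card scan indication: youngest generation the card is unscanned for."""

    def __init__(self, card_gen):
        self.card_gen = list(card_gen)              # generation owning each card
        self.scan_ind = [CLEAN] * len(self.card_gen)
        self.slots = [{} for _ in self.card_gen]    # card -> {slot: target generation}
        # remembered[g]: (card, slot) pairs holding older -> g pointers
        self.remembered = [set() for _ in range(NUM_GENERATIONS)]
        self.scans = 0

    def write(self, card, slot, target_gen):
        """Application store (assumed): the card becomes unscanned for every generation."""
        self.slots[card][slot] = target_gen
        self.scan_ind[card] = 0

    def prepare_collection(self, selected):
        """Before collecting generations 0..selected, scan only the cards that need it."""
        for card, ind in enumerate(self.scan_ind):
            if ind > selected:
                continue                 # already scanned for all collected generations
            self.scans += 1
            src = self.card_gen[card]
            for g in range(ind, selected + 1):
                # Replace this card's entries in generation g's remembered set.
                self.remembered[g] = {e for e in self.remembered[g] if e[0] != card}
                self.remembered[g] |= {(card, s) for s, t in self.slots[card].items()
                                       if t == g and src > t}
            self.scan_ind[card] = selected + 1  # one generation older than `selected`


table = CardTable(card_gen=[2, 2, 1])  # cards 0 and 1 in generation 2, card 2 in generation 1
table.write(0, slot=8, target_gen=0)   # generation 2 -> 0 pointer
table.write(2, slot=4, target_gen=0)   # generation 1 -> 0 pointer
table.prepare_collection(selected=0)   # scans cards 0 and 2
table.prepare_collection(selected=0)   # nothing to rescan
table.prepare_collection(selected=1)   # scans cards 0 and 2 again, for generation 1 only
print(table.scans, table.scan_ind)     # 4 [2, 3, 2]
```

The second young collection rescans nothing. With a single dirty bit, cards 0 and 2 could not have been cleared without also updating generation 1's remembered set.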
Since only part of the heap is scanned, it is necessary to identify not only pointers from the root that reference objects in the young area (e.g. pointer (25)), but also inter-generational pointers (e.g. pointer (26)), i.e. pointers that originate from objects residing in the old generation and reference objects in the young generation. As will be explained in greater detail below, data structures are known in the literature that assist in rapidly identifying the inter-generational pointers for GC purposes.
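Why inter-generational pointers must be identified can be shown with a small sketch of a young-only trace. Old objects are not traced, so the young targets of their pointers must be supplied as extra roots. Otherwise a live young object is misclassified as garbage. The function name and object identifiers are hypothetical.

```python
def young_live_set(roots, refs, young, old_to_young):
    """Young objects found live when only the young area is traced.

    roots: ids referenced from the root; refs: id -> ids it references
    young: ids in the young area
    old_to_young: young ids referenced by inter-generational pointers from old objects
    """
    # Old objects are not traced, so the targets of their pointers are extra roots.
    pending = [o for o in roots if o in young] + list(old_to_young)
    live = set()
    while pending:
        obj = pending.pop()
        if obj in live:
            continue
        live.add(obj)
        pending.extend(t for t in refs.get(obj, ()) if t in young)  # stay in young area
    return live


young = {"y1", "y2", "y3"}
refs = {"o1": ["y2"], "y1": [], "y2": [], "y3": []}   # o1 is an old object
with_ptrs = young_live_set(["y1"], refs, young, old_to_young={"y2"})
without_ptrs = young_live_set(["y1"], refs, young, old_to_young=set())
print(sorted(with_ptrs), sorted(without_ptrs))   # ['y1', 'y2'] ['y1']
```

Without the pointer from o1, y2 would be wrongly reclaimed.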
In a multi-generational scheme (i.e. one in which the heap is partitioned into more than two generations), when a generation is subject to GC, all younger generations are typically collected as well. This reduces the bookkeeping for inter-generational pointers, since only pointers from older to younger generations need to be maintained. The number of such pointers is typically relatively small, so generational collectors can use a data structure to maintain an (almost) up-to-date list of them. Two such data structures are suggested in the prior art [5], [7] and [8]: card marking and remembered sets. A combination of the two is suggested in [3].
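The bookkeeping argument above can be checked with a short sketch. It assumes generation 0 is the youngest and represents each pointer only by its (source generation, target generation) pair. Both function names are hypothetical. The loop confirms that every pointer a collection of generations 0..g needs from uncollected generations is an older-to-younger pointer.

```python
def needs_recording(src_gen, tgt_gen):
    """Must a pointer from generation src_gen into tgt_gen be remembered?

    Generation 0 is the youngest (assumed numbering). Collecting generation g
    also collects every younger generation, so whenever the target of a
    younger-to-older (or same-generation) pointer is collected, its source is
    collected too and the pointer is traced anyway.
    """
    return src_gen > tgt_gen


def extra_roots(pointers, g):
    """Pointers from uncollected generations (> g) into collected ones (<= g)."""
    return [(s, t) for s, t in pointers if s > g >= t]


# Hypothetical pointers as (source generation, target generation), 3 generations.
pointers = [(2, 0), (0, 2), (1, 1), (2, 1)]
recorded = [p for p in pointers if needs_recording(*p)]
print(recorded)                                  # [(2, 0), (2, 1)]
for g in range(3):
    # Every extra root a collection of 0..g needs is already recorded.
    assert set(extra_roots(pointers, g)) <= set(recorded)
```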
One way to record inter-generational pointers for a given generation is to maintain a remembered set for that generation [5], [8]. The remembered set of generation g holds the locations of all inter-generational pointers that reference objects in generation g. This set is maintained by the application whenever a pointer is stored, and by the collector when objects are promoted. Variations on this method are discussed in [1] and [9].
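The application-side maintenance described above can be sketched as follows. The sketch assumes generation 0 is the youngest and that each store reports the slot's own generation and the target's generation. The class and method names are hypothetical, and collector-side maintenance during promotion is not shown. Every pointer store may cost a deletion plus an insertion, which is the overhead discussed next.

```python
class RememberedSets:
    """One remembered set per generation (generation 0 = youngest, assumed)."""

    def __init__(self, num_generations):
        # remembered[g]: slots in older generations that point into generation g
        self.remembered = [set() for _ in range(num_generations)]
        self.target_of = {}   # slot -> generation currently referenced from it

    def store(self, slot, slot_gen, target_gen):
        """Application pointer store: keep the sets exact on every write."""
        previous = self.target_of.get(slot)
        if previous is not None:
            self.remembered[previous].discard(slot)   # deletion: old pointer is gone
        self.target_of[slot] = target_gen
        if slot_gen > target_gen:
            self.remembered[target_gen].add(slot)     # insertion: older -> younger


rs = RememberedSets(3)
rs.store(0x100, slot_gen=2, target_gen=0)   # older -> younger: recorded
rs.store(0x108, slot_gen=1, target_gen=2)   # younger -> older: not recorded
rs.store(0x100, slot_gen=2, target_gen=1)   # overwrite moves the entry
print(rs.remembered)                        # [set(), {256}, set()]
```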
Maintaining the remembered set imposes a costly overhead on the application during normal operation, since every pointer change requires inserting and/or deleting a member of the remembered set. Card marking reduces this cost [7]. Here, the heap is partitioned into cards of equal size, and whenever the application modifies an object in a card, it marks the card as dirty. Marking a card is a very short operation for the user program [2], [7], [10]; depending on the specific processor, it may be implemented in 3 to 6 instructions. However, the collector performs more work in a card marking system: it must scan all dirty cards to find the inter-generational pointers, instead of simply obtaining them from the remembered set. Dirty cards are cards recently modified by the application; some of them contain inter-generational pointers, and such cards are scanned repeatedly.
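The card marking write barrier and the collector's dirty-card scan can be sketched as follows. The sketch assumes addresses are byte offsets from the start of the heap. The 512-byte card size, the 64 KiB heap and all names are hypothetical. In compiled code the barrier amounts to a shift and a byte store, which is what keeps it to a handful of instructions. Python only mirrors the logic.

```python
CARD_SHIFT = 9                 # hypothetical: 512-byte cards
HEAP_BYTES = 1 << 16           # hypothetical 64 KiB heap

card_dirty = bytearray(HEAP_BYTES >> CARD_SHIFT)  # one byte per card, 0 = clean


def write_barrier(address):
    """Run by the application on each pointer store: index the card, mark it dirty."""
    card_dirty[address >> CARD_SHIFT] = 1


def dirty_cards():
    """Collector side: every dirty card must be scanned for pointers."""
    return [i for i, flag in enumerate(card_dirty) if flag]


write_barrier(0x0010)   # card 0
write_barrier(0x01F8)   # still card 0 (504 < 512)
write_barrier(0x0A00)   # card 5 (2560 // 512)
print(dirty_cards())    # [0, 5]
```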
Hosking and Moss [3] point out the advantage of combining these two methods. After a card has been scanned once to find all modifications, the relevant inter-generational pointers can be kept in a remembered set, and the card need not be scanned again unless it is modified. This retains the low overhead on the application while also increasing collector efficiency: cards are scanned once rather than repeatedly, their dirty flag is cleared, and only dirty (modified) cards are scanned.
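The combined scheme can be sketched for the two-generation case. The sketch assumes every card belongs to the old generation and that each store records only whether the new pointer targets the young area. The class and method names are hypothetical, and this is not the implementation described in [3]. Stores only mark cards. The collector scans each dirty card once, refreshes that card's remembered-set entries and clears its flag.

```python
class CardRememberedSet:
    """Card marking plus a remembered set, for a young and an old generation."""

    def __init__(self, num_cards):
        self.dirty = [False] * num_cards
        self.card_slots = [{} for _ in range(num_cards)]  # slot -> points into young?
        self.remembered = set()   # (card, slot) pairs holding old -> young pointers
        self.scans = 0

    def store(self, card, slot, points_to_young):
        """Application side: record the pointer and just mark the card."""
        self.card_slots[card][slot] = points_to_young
        self.dirty[card] = True

    def young_collection_roots(self):
        """Collector side: scan dirty cards once, then serve pointers from the set."""
        for card, is_dirty in enumerate(self.dirty):
            if not is_dirty:
                continue
            self.scans += 1
            # Replace this card's entries with what the card holds now.
            self.remembered = {e for e in self.remembered if e[0] != card}
            self.remembered |= {(card, s) for s, young in self.card_slots[card].items()
                                if young}
            self.dirty[card] = False   # no rescan until the card is modified again
        return set(self.remembered)


crs = CardRememberedSet(4)
crs.store(1, 0x20, True)
crs.store(3, 0x08, False)
first = crs.young_collection_roots()    # scans cards 1 and 3
second = crs.young_collection_roots()   # nothing dirty: no card rescanned
crs.store(1, 0x20, False)               # pointer overwritten: card 1 dirty again
third = crs.young_collection_roots()    # rescans card 1 only; stale entry removed
print(first, second, third, crs.scans)  # {(1, 32)} {(1, 32)} set() 3
```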
The use of conventional remembered set and card marking data structures imposes significant overhead in a multi-generational scheme. Suppose, for example, that a few young generations are collected, all dirty cards are scanned, and the remembered set of each collected generation is updated. The dilemma is whether all the remembered sets, including those of generations that were not collected, should be updated. If so, longer delays are incurred while collecting the younger generations. (Recall that updating a remembered set means removing all entries that have become irrelevant and adding entries for new inter-generational pointers.) If, on the other hand, not all remembered sets are updated, the card's mark cannot be cleared, since the card has not been scanned for the older generations. The inevitable consequence of failing to clear the marks is that the card is unnecessarily scanned again and again during future collections of the young generations.
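The dilemma above can be made concrete with a small counting model. It tracks one dirty card under a single dirty bit per card, assuming the card lives in the oldest of three hypothetical generations (generation 0 youngest). The model counts card scans and remembered-set updates, where an update means one generation's set is touched. This is a simplified cost model, and all names are hypothetical.

```python
CARD_GEN = 2  # hypothetical: the card lives in the oldest of three generations


def simulate(collections, update_all):
    """Count scans of one dirty card under a single dirty bit per card.

    collections: for each collection, the oldest generation collected
        (generation 0 is youngest; collecting g also collects 0..g-1).
    update_all: True  -> on each scan update the remembered sets of every
                         generation younger than the card, then clear the mark.
                False -> update only the collected generations' sets; the mark
                         is cleared only if that covered all of them.
    Returns (card scans, remembered-set updates).
    """
    dirty = True
    scans = updates = 0
    for g in collections:
        if not dirty:
            continue
        scans += 1
        covered = CARD_GEN if update_all else min(g + 1, CARD_GEN)
        updates += covered
        dirty = covered < CARD_GEN   # clear only if every younger set is updated
    return scans, updates


schedule = [0, 0, 0, 0, 1]   # four young collections, then one of generation 1
print(simulate(schedule, update_all=True))    # (1, 2)
print(simulate(schedule, update_all=False))   # (5, 6)
```

Updating every set makes the first young collection do twice the remembered-set work. Updating only the collected sets leaves the mark set, so the card is rescanned at every young collection.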
There is accordingly a need in the art to substantially reduce or overcome the inherent limitations of remembered set and card marking data structures in multi-generational GC applications.