In a computer system, programs and data reside in physical storage, such as RAM or disk, but are addressable in virtual memory, defined by the operating system. When a computer program is executed, the operating system establishes a run time environment. In the run time environment, storage must be allocated by the system not only for any external data needed by the program but also for data generated by the program itself. Several methods of storage allocation are known. Static allocation binds all names in the program to fixed storage locations at compile time. This is the oldest and least flexible technique but may still be used for storage of dynamically loadable library (DLL) files used by a program. Dynamic allocation of storage requires the creation of data structures in dedicated areas of memory known as the “stack” and the “heap”. Typically, modern programming language compilers or run time environments may provide all three types of storage (static, stack and heap) under overall system control.
The stack is typically a push down stack (last-in-first-out) and is used for data which must be organised and retrieved in a known and controlled manner. The heap is used for storage of transient data such as intermediate results and variable values which may not be needed for more than a short time during execution. Data structures in a heap may be allocated and deallocated in any order.
During program execution, the allocation of free virtual memory is managed by means of “free lists”, which are data structures containing pointers to storage locations in a free pool of memory that are available to a requesting program. There must of course be limits on the amount of storage which can be allocated to any particular program. In the case of the heap, a large amount of transient data may be generated by the program. To prevent the heap from becoming full, storage must be deallocated by the program as its contents become redundant.
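By way of illustration only, a free list may be sketched as follows. This is a minimal sketch and not a description of any particular run time environment: the class name FreeListHeap, the first-fit search policy and the (start, length) representation of free blocks are all assumptions made for the example.

```python
class FreeListHeap:
    """Toy heap of `size` cells managed with a first-fit free list (illustrative only)."""

    def __init__(self, size):
        # Each free-list entry is (start, length); initially one large free block.
        self.free_list = [(0, size)]
        self.allocated = {}  # start address -> length of the allocated block

    def allocate(self, length):
        """Return the start address of a block of `length` cells, or None if none fits."""
        for i, (start, free_len) in enumerate(self.free_list):
            if free_len >= length:
                remainder = free_len - length
                if remainder:
                    # Keep the unused tail of the block on the free list.
                    self.free_list[i] = (start + length, remainder)
                else:
                    del self.free_list[i]
                self.allocated[start] = length
                return start
        return None  # the heap is full for a request of this size

    def deallocate(self, start):
        """Return a block to the free list (adjacent blocks are not merged, for brevity)."""
        length = self.allocated.pop(start)
        self.free_list.append((start, length))


heap = FreeListHeap(16)
a = heap.allocate(8)
b = heap.allocate(8)
full = heap.allocate(1)   # no free storage remains, so this request fails
heap.deallocate(a)        # explicit deallocation makes the storage reusable
c = heap.allocate(4)      # satisfied from the storage previously held by `a`
```

In this sketch the third request fails until the program explicitly returns a block, which is the situation described above in which storage must be deallocated as its contents become redundant.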
However, because of the dynamic aspect of heap allocation and the transient nature of the program operations carried out on heap data, it is quite frequently the case that pointers to stored data objects may be destroyed after the objects have been used by the program, without the data object storage being explicitly deallocated. This means that the data object has become unreachable by the program. A single instance of this is referred to as a “leak” and collectively, all the leaks are referred to as “garbage”.
Automatic techniques known collectively as “garbage collection” have been developed to identify such garbage data and to reallocate its storage for reuse by the program. An in-depth treatise on the subject may be found in the book “Garbage Collection: Algorithms for Automatic Dynamic Memory Management” by Richard Jones and Rafael Lins (Wiley, 1996, ISBN 0471941484).
In the field of this invention it is known that garbage collection is a part of a programming language's runtime system, or an add-on library, perhaps assisted by the compiler, the hardware, the operating system, or any combination of the three, that automatically determines what memory a program is no longer using, and recycles it for other use. It is also known as “automatic storage (or memory) reclamation”. One example of a managed runtime programming language relying on garbage collection is the Java programming language (Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both). Another example is the Visual C# language and .NET programming framework from Microsoft Corporation (Visual C# is a trademark of Microsoft Corporation in the United States, other countries, or both).
Automatic garbage collection is preferred to explicit memory management by the programmer, which is time consuming and error prone, since programs often create leaks, particularly programs using exception-handling and/or threads. The benefits of garbage collection are increased reliability, decoupling of memory management from class interface design, and less developer time spent chasing memory management errors. However, garbage collection is not without its costs, including performance impact, pauses, configuration complexity, and non-deterministic finalization.
A common method of garbage collection, many versions of which are described in detail in the above referenced book, is known as “mark-sweep”, where allocated memory (that is, memory corresponding to accessible data objects) is first marked and a collector then sweeps the heap and collects unmarked memory for re-allocation. Broadly, the marking phase traces all live objects in the heap by following pointer chains from “roots” to which the program has access. Roots may typically be in the program stack or in processor registers. When a data object on the heap is reached, it is marked, typically by setting a bit in a mark map representing the heap storage, although alternatively, an extra bit could be set in the object itself. When all reachable objects have been traced, any other objects must be garbage. The sweep phase uses the marking results to identify unmarked data as garbage and returns the garbage containing areas to the free list for reallocation. An entire collection may be performed at once while the user program is suspended (so-called ‘stop-the-world’ collection). Alternatively, the collector may run incrementally (the entire heap not being collected at once, resulting in shorter collection pauses).
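The mark and sweep phases described above may be sketched as follows. This is a simplified stop-the-world illustration under stated assumptions, not the method of any particular collector: the names HeapObject, mark and sweep are hypothetical, the heap is modelled as a list of slots, and the mark map is a list holding one boolean per slot.

```python
class HeapObject:
    """A heap object that holds pointers (references) to other heap objects."""

    def __init__(self, addr):
        self.addr = addr   # index of this object's slot in the heap
        self.refs = []     # pointers to other heap objects


def mark(roots, heap_size):
    """Follow pointer chains from the roots and return the resulting mark map."""
    mark_map = [False] * heap_size
    pending = list(roots)
    while pending:
        obj = pending.pop()
        if mark_map[obj.addr]:
            continue               # already traced; this also terminates cycles
        mark_map[obj.addr] = True
        pending.extend(obj.refs)
    return mark_map


def sweep(heap, mark_map, free_list):
    """Return the slot of every unmarked object to the free list."""
    for addr, obj in enumerate(heap):
        if obj is not None and not mark_map[addr]:
            heap[addr] = None
            free_list.append(addr)


heap = [HeapObject(i) for i in range(5)]
heap[0].refs.append(heap[1])
heap[1].refs.append(heap[0])   # objects 0 and 1 form a cycle reachable from a root
heap[3].refs.append(heap[4])   # object 3 points to 4, but nothing reachable points to 3
roots = [heap[0], heap[2]]     # stands in for the program stack and registers

free_list = []
mark_map = mark(roots, len(heap))
sweep(heap, mark_map, free_list)
```

In this example objects 3 and 4 are unreachable from the roots, so they are left unmarked and their slots are returned to the free list by the sweep.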
However, these approaches have the disadvantage that the sweep phase of garbage collection can take a significant part of the pause time (greater than 50%). An alternative is to run the collector process concurrently, whereby the user program assists the garbage collection process being performed by the system. Typically, the amount of work done by a user program thread is a function of the amount of transient storage allocated by it. However, “concurrent sweep”, as this is known, has the drawback of decreasing application throughput.
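The pacing idea, in which the sweeping work done by a user program thread depends on the amount of storage it allocates, may be sketched as follows. This is a single-threaded simplification that illustrates only the pacing and not true concurrency; the class PacedSweeper, the parameter slots_per_unit and the linear pacing rule are assumptions made for the example and are not taken from any particular collector.

```python
class PacedSweeper:
    """Sweeps an already-marked heap a few slots at a time, paced by allocation."""

    def __init__(self, mark_map, slots_per_unit=2):
        self.mark_map = mark_map              # True = live slot, False = garbage slot
        self.cursor = 0                       # next slot that has not yet been swept
        self.free_list = []
        self.slots_per_unit = slots_per_unit  # hypothetical pacing factor

    def sweep_some(self, units_allocated):
        """Sweep an amount of heap proportional to the size of the allocation."""
        end = min(self.cursor + units_allocated * self.slots_per_unit,
                  len(self.mark_map))
        for addr in range(self.cursor, end):
            if not self.mark_map[addr]:
                self.free_list.append(addr)
        self.cursor = end

    def allocate(self, units=1):
        """Do the caller's share of sweeping, then take slots from the free list."""
        self.sweep_some(units)
        if len(self.free_list) < units:
            return None                       # not enough storage reclaimed yet
        taken = self.free_list[:units]
        del self.free_list[:units]
        return taken


sweeper = PacedSweeper([True, False, True, False, False, True, False, True])
first = sweeper.allocate(1)    # sweeps slots 0-1 before allocating
second = sweeper.allocate(2)   # sweeps slots 2-5 before allocating
```

Because each allocation in this sketch pays for a share of the sweep, the sweeping cost is moved out of the collection pause and into the running program, which corresponds to the reduction in application throughput noted above.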
In addition to the Jones and Lins book, reference is also made to a paper entitled “Dynamic selection of application specific garbage collectors” by S. Soman et al. (ISMM'04 Oct. 24-25, 2004 Vancouver, Copyright 2004 ACM). This paper reports results achieved using five different known methods of garbage collection and recommends switching between the methods for the greatest efficiency. However, it does not suggest how to increase the speed of garbage collection.
A need therefore exists for a garbage collection technique wherein the above mentioned disadvantage(s) may be alleviated.