Managing available memory is critically important to the performance and reliability of a computer system. Automatic memory management, or automatic storage reclamation, relieves programmers of the burden of tedious manual memory management, significantly reduces the possibility of programming mistakes, and thus enhances system reliability.
Without automatic memory management, programmers have to manually “free” or “delete” memory blocks or objects in the memory heap after use. Forgetting to reclaim objects causes memory leaks, which in turn degrade the performance of the application and of other applications in the same system. Mistakenly reclaiming an object that is still in use may lead to catastrophic corruption of the system. Automatic memory management techniques allow the computer to discover unused objects by itself and recycle the memory space.
Automatic memory management, also known as garbage collection (GC), generally falls into two categories: reference counting and tracing garbage collection.
Reference counting is a form of automatic memory management where each object has a count of the number of references to it. An object's reference count is incremented when a reference to it is created, and is decremented when a reference is destroyed. The object's memory is reclaimed when the count reaches zero.
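The counting rule described above can be sketched in a few lines. This is a minimal illustration rather than any particular collector's implementation; the names `RefCounted`, `add_ref`, `release`, and the `heap` set are hypothetical and are introduced here only for the example.

```python
class RefCounted:
    """Hypothetical object header carrying a reference count."""

    def __init__(self, name, heap):
        self.name = name
        self.count = 0
        self.heap = heap      # set of names of objects that are still allocated
        heap.add(name)


def add_ref(obj):
    """Called when a new reference to obj is created."""
    obj.count += 1


def release(obj):
    """Called when a reference to obj is destroyed; reclaims at zero."""
    obj.count -= 1
    if obj.count == 0:
        obj.heap.discard(obj.name)   # stands in for freeing the memory


heap = set()
buf = RefCounted("buf", heap)
add_ref(buf)                          # first reference created
add_ref(buf)                          # second reference created
release(buf)                          # one reference destroyed, object still live
alive_after_first_release = "buf" in heap
release(buf)                          # last reference destroyed, object reclaimed
alive_after_last_release = "buf" in heap
```

The object disappears from the modeled heap at the exact call that drops the last reference, which is the property the following paragraphs call deterministic reclamation.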
Because of the relatively high cost of maintaining the reference count and the failure to reclaim circular-referenced objects, reference counting is not a very attractive option for most garbage collection implementations. However, reference counting can detect and reclaim a garbage object immediately when all the references to the object are dropped. This feature is known as deterministic reclamation, which is beneficial to many aspects of programming. Moreover, because the maintenance of the reference count is interleaved with ordinary application execution, the work is done in very small increments and the pauses are negligible. Reference counting is therefore suitable for real-time applications.
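The failure on circular references can be reproduced with a small model. The sketch below assumes a hypothetical `Node` type with an explicit count and a list of outgoing references; `assign` and `drop` are illustrative helpers, not part of any real runtime.

```python
class Node:
    """Hypothetical heap object with a reference count and outgoing references."""

    def __init__(self, name):
        self.name = name
        self.count = 0
        self.fields = []
        self.freed = False


def assign(owner, target):
    """Store a reference to target inside owner."""
    owner.fields.append(target)
    target.count += 1


def drop(target):
    """Destroy one reference to target; free it and its references at zero."""
    target.count -= 1
    if target.count == 0:
        target.freed = True
        for child in target.fields:   # freeing an object drops what it referenced
            drop(child)
        target.fields = []


# Acyclic chain: root -> c -> d. Dropping the root reference frees both.
c, d = Node("c"), Node("d")
c.count += 1          # reference held by the program (a root)
assign(c, d)
drop(c)

# Cycle: root -> a, a -> b, b -> a. Dropping the root reference frees neither.
a, b = Node("a"), Node("b")
a.count += 1          # reference held by the program (a root)
assign(a, b)
assign(b, a)
drop(a)
```

After the last external reference is dropped, `a` and `b` each still hold a count of one because they reference each other, so neither count ever reaches zero.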
In contrast to reference counting, tracing garbage collection focuses on determining which objects are reachable (or potentially reachable), and then discarding all remaining objects. Live objects are traced or traversed to determine unreachable objects.
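A tracing collector of the mark-and-sweep kind can be sketched as follows. The object names, the `edges` table, and the `mark` and `sweep` helpers are assumptions made for illustration; real collectors operate on raw memory rather than on dictionaries.

```python
def mark(roots, edges):
    """Return the set of objects reachable from the roots."""
    reachable = set()
    worklist = list(roots)
    while worklist:
        obj = worklist.pop()
        if obj in reachable:
            continue
        reachable.add(obj)
        worklist.extend(edges.get(obj, ()))
    return reachable


def sweep(heap, reachable):
    """Discard every object that was not marked; return the survivors."""
    return {obj for obj in heap if obj in reachable}


heap = {"a", "b", "c", "d"}
# a references b; c and d reference each other but nothing reaches them.
edges = {"a": ["b"], "c": ["d"], "d": ["c"]}
roots = ["a"]

live = sweep(heap, mark(roots, edges))
```

Only `a` and `b` survive. The mutually referencing pair `c` and `d` is discarded because reachability, not a per-object count, decides what stays.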
The advantage of a tracing garbage collector is its ability to reclaim circular-referenced garbage objects. Notwithstanding that tracing garbage collection was developed more than forty years ago, there are some issues with this form of garbage collection. Some of the main issues are listed briefly here, and later a more detailed explanation will be given as to how these issues are resolved by the present invention.
(1) Non-deterministic reclamation: Because it cannot be determined with certainty when a garbage object is collected and reclaimed, there is a conflict with the principle of Resource Acquisition Is Initialization (RAII).
The traversal operation of tracing garbage collection is very expensive. It consumes processor resources, causes a large memory access footprint, and invalidates an excessive number of cache lines. Moreover, the complexity of the operation is proportional to the number of active objects, so even if there are no garbage objects, the expense of reference traversal remains high. Because the reference traversal operation cannot be executed too frequently, it does not give the programmer precise control over when objects are destroyed. In generational garbage collection or other partial collection algorithms, some garbage objects are not reclaimed until a seldom-run full garbage collection is performed. Thus, under tracing garbage collection, the reclamation of objects is not deterministic.
The RAII principle is advocated by Object-Oriented Programming (OOP). It uses objects to represent resources: acquisition is bound to the construction (initialization) of the object, whereas release is bound to its destruction (un-initialization). Resources here normally means those that are limited in number and subject to high acquisition contention, including file handles, network connections, software licenses, GUI windows, etc. Thus, resources ought to be released as soon as possible after use. Tracing garbage collectors cannot fulfill this requirement. They do not guarantee or provide precise control for deterministic reclamation of objects.
Because of the lack of deterministic reclamation, application design and programming can become awkward. For example, the Java™ programming language introduces the concept of a “weak reference” to the execution of an object finalization function; .NET™ uses a “Dispose” member function to explicitly release an object's associated resources, and uses a “Destructor” function for reclamation of stack-allocated objects. Programmers cannot depend on the GC system to reclaim resources and therefore must manually manage resource acquisition and release. This is bug-prone and unproductive, especially in a large, complicated application in which resources are mutually referenced and dependent. These programming tools only put the burden of resource management back on application programmers.
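The burden of explicit release can be illustrated with a small model of a Dispose-style interface. The `Connection` class, its `dispose` method, and the `open_count` counter are hypothetical; the sketch assumes the resource is released only by an explicit call and never by the collector.

```python
class Connection:
    """Hypothetical scarce resource; open_count models a limited pool."""

    open_count = 0

    def __init__(self):
        Connection.open_count += 1
        self.closed = False

    def dispose(self):
        """Explicit release that the caller must remember to invoke."""
        if not self.closed:
            self.closed = True
            Connection.open_count -= 1


def careful():
    conn = Connection()
    try:
        return "work done"
    finally:
        conn.dispose()        # released on every path, but only by hand


def forgetful():
    conn = Connection()
    return "work done"        # dispose() is never called; the resource stays held


careful()
after_careful = Connection.open_count
forgetful()
after_forgetful = Connection.open_count
```

Both functions compile and run without complaint; only the pool counter reveals that `forgetful` has leaked a connection, which is the kind of mistake the paragraph above describes.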
(2) Low memory efficiency: Those of skill in the art appreciate that Java, .NET, and similar applications use more memory than traditional native applications. One reason is that the garbage collector defers the complete reclamation of garbage objects until memory usage reaches a threshold. Therefore, memory usage rises until it reaches the threshold and a full tracing collection is triggered. After garbage collection, when all garbage objects have been collected, memory usage drops to the level actually required. The memory usage chart is frequently erratic, rising to the threshold, dropping, and rising again. In other words, between garbage collection cycles, there are always plenty of garbage objects occupying precious memory space. Thus, tracing garbage collection always requires more memory than what is actually needed.
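The rise-and-drop pattern can be reproduced with a toy simulation. The function `simulate` and all of its parameters are illustrative assumptions: the live data is held constant, a fixed amount of garbage is produced per step, and a full collection runs only once usage has reached the threshold.

```python
def simulate(steps, live_size, garbage_per_step, threshold):
    """Return heap usage after each step for a threshold-triggered collector."""
    usage = []
    garbage = 0
    for _ in range(steps):
        if live_size + garbage >= threshold:
            garbage = 0               # full collection reclaims all garbage
        garbage += garbage_per_step   # the application keeps producing garbage
        usage.append(live_size + garbage)
    return usage


trace = simulate(steps=8, live_size=40, garbage_per_step=20, threshold=100)
excess = [u - 40 for u in trace]      # memory held beyond the live data
```

With these numbers the trace is `[60, 80, 100, 60, 80, 100, 60, 80]`: usage never settles at the 40 units of live data, and the excess between collections is pure garbage.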
(3) Pause of execution: Even an incremental garbage collector will freeze applications for an unpredictable length of time. If the average delay is less than 1 millisecond, which is not perceptible to a human being, the collector is referred to as a real-time garbage collector. Unfortunately, real-time garbage collectors do not guarantee or provide a way to predict the worst-case maximum pause time. The length of a pause is affected by many factors, such as how many objects exist and the relationships between these objects, the point during application execution at which garbage collection starts, and the relative speed of the mutators and the collector. Therefore, it is difficult, if not impossible, to guarantee low execution delay in commercial real-time products using the foregoing garbage collection applications.
One reason for the pause is contention for shared data structures between the mutators (applications) and the garbage collector. Applications continually change the reference relationships while the collector needs a stable relationship graph to trace. The execution of an application thread is suspended when the garbage collector needs to identify all pointers in the application thread stack. Moreover, garbage collection can only start at a GC-Safe point, where the reference relationships are exposed completely to the collector. Application threads need to reach a GC-Safe rendezvous before garbage collection can start. This may cause threads to wait a long time for one errant thread that remains in a GC-Unsafe state for an extended period.
(4) Lack of accurate garbage collectors in C++: In order to perform a tracing collection, the system must be able to identify all references to an object. If an implementation can identify all references exactly as they are, it is referred to as a precise or accurate tracing collector. Implementations that merely guess, treating every variable that looks like a pointer as a pointer, might leave garbage objects uncollected. They are referred to as conservative garbage collectors.
A spurious pointer in a conservative garbage collection system might retain an object and all its descendant objects, leaving a large amount of memory uncollected. Memory effectiveness is therefore not guaranteed. In addition, optimized compilation could make some pointers undetectable by conservative garbage collectors. If a conservative garbage collection happens at one of those unsafe points, live objects may be reclaimed and the system may crash.
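The effect of a spurious pointer can be shown by comparing the two ways of finding roots. The addresses, the `HEAP` table, and the `slot_is_pointer` layout map below are invented for the example; the sketch assumes a stack of raw words in which one integer happens to equal a heap address.

```python
# Hypothetical heap: address -> object description.
HEAP = {0x1000: "small object", 0x2000: "large object graph"}

# Raw stack words. Slot 1 is an ordinary integer whose value equals 0x2000.
stack_words = [0x1000, 0x2000, 7]

# Layout information that only a precise collector has: which slots are pointers.
slot_is_pointer = [True, False, False]


def conservative_roots(words):
    """Treat every word that matches a heap address as a pointer."""
    return {w for w in words if w in HEAP}


def precise_roots(words, type_map):
    """Use exact layout information to select only genuine pointers."""
    return {w for w, is_ptr in zip(words, type_map) if is_ptr and w in HEAP}


retained_conservative = conservative_roots(stack_words)
retained_precise = precise_roots(stack_words, slot_is_pointer)
spuriously_retained = retained_conservative - retained_precise
```

The conservative scan keeps the object at `0x2000` alive solely because an integer looks like its address, whereas the precise scan, given the layout map, retains only `0x1000`.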
An accurate collector must determine the layout of all references at a GC-Safe point, if not all the time. Some compilers can generate the required information, such as the pointer layout in thread stacks and CPU registers, but this information occupies considerable memory, bloats the application working set, and decreases performance. Further, a language without built-in support for garbage collection, such as C/C++, does not generate this information. That is why there is a lack of precise collectors in C++.
See, e.g., Hans-J. Boehm, “A Proposal for Garbage-Collector-Safe C Compilation,” Journal of C Language Translation (1992).
The C/C++ language referred to herein is merely a typical example of languages which do not provide enough information for garbage collection.
(5) High cost of combining reference counting and tracing collection: Reference counting and tracing garbage collection each have advantages over the other. However, a simple combination of these two approaches will combine their shortcomings and also create many new problems. First, while the run-time costs of reference counting and tracing collection are each very high already, the sum of these costs is hardly acceptable. Second, reference counting reclaims garbage objects throughout the duration of application execution, while tracing collection reclaims garbage objects periodically. What happens when both collectors want to collect the same object? Synchronization ought to be applied, but if applied at every reference counting operation, the synchronization cost is tremendous. The Internet group “Resource Management” started by Brian Harry is an interesting forum for this issue.
Some fundamental approaches to garbage collection are described in Paul R. Wilson's paper “Uniprocessor Garbage Collection Techniques,” Proc. Int. Workshop on Memory Management (1992).
Thus, better and faster techniques are needed for automatic memory management.