The present invention relates to the field of memory management in computer programming. In particular, the present invention deals with garbage collection during program execution.
A computer system has a limited amount of memory that is primarily used by a number of software programs or applications that run on the computer system. It is imperative that a program use the limited memory allocated to it very judiciously and efficiently. A non-judicious use of the allocated limited memory can result in memory overruns and greater time-consumption in program execution. Most programs involve allocation of memory locations or objects based upon current requirements or demands by the various operations in a program. For example, a word processor would have to allocate a memory object for a table of data created by the user. Memory objects are memory locations that are well defined in terms of size, type, and structure. As the program executes, the memory requirements of various operations of the program change. The change in run-time memory requirement is addressed by dynamic memory management, which can be done manually or automatically.
Manual memory management requires the programmer to allocate memory objects at appropriate places in the program code. The programmer also has to free the allocated memory objects when they are no longer in use. Various programming languages allow programmers to allocate and de-allocate memory manually. For example, in the C language the function malloc() allocates memory while free() releases the allocated memory. Though it provides the programmer with flexibility, manual memory management is feasible primarily for small programs. In larger programs, manual memory management becomes progressively more difficult and can lead to errors. For example, a memory object can be de-allocated while it is still being referenced in other parts of the program, leaving a dangling reference. On the other hand, unused or dead memory objects might never be de-allocated, leading to a memory leak in which dead memory objects accumulate and occupy the memory space. Attempts to correct either of these two errors add to the complexity of the program and may cause further errors.
Automatic memory management, also known as garbage collection, relieves the programmer of most of the worries of memory management. It dispenses with the need for a programmer to de-allocate memory in the program code and hence avoids almost all the errors caused by manual memory management. Automatic memory management or garbage collection involves techniques that recycle unused memory objects. A garbage-collected system typically comprises a mutator and a collector. The part of the system that executes the user code is called the mutator, and the part that executes garbage collection is called the collector or garbage collector. The mutator and the collector can run in separate threads or can be interleaved in the same thread.
At the beginning of the garbage collection process, the collector receives a root set from the mutator. The root set is a collection of roots of the memory objects. A root of a memory object holds a set of references to that memory object. The set of references comprises references in registers of the mutator thread executing the program, all static references, and references to the memory object from any other memory location outside the allocated memory. Generally, the garbage collector carries out garbage collection in two phases. In the first phase, it identifies unused memory objects, or garbage. Various techniques are used to identify unused objects. For example, an object that is no longer referenced can be termed dead or unused. In the second phase, the garbage collector reclaims or relocates memory objects in such a manner that the memory occupied by the unused memory objects becomes available for use by the same program. A garbage collection process running concurrently with the execution of the program can satisfy the memory requirements dynamically.
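The first phase described above can be illustrated with a minimal sketch. It assumes a hypothetical heap represented as a dictionary that maps each object name to the names it references; the function name `reachable` and the sample data are illustrative only and are not taken from any particular collector.

```python
def reachable(heap, roots):
    """Return the set of objects reachable from the root set.

    heap maps an object name to the list of names it references
    (a hypothetical layout chosen for illustration).
    """
    live = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj in live:
            continue            # already visited
        live.add(obj)
        stack.extend(heap[obj])  # follow outgoing references
    return live


# "a" is a root and refers to "b"; "c" and "d" refer only to each other.
heap = {"a": ["b"], "b": [], "c": ["d"], "d": ["c"]}
roots = ["a"]

live = reachable(heap, roots)
garbage = set(heap) - live   # everything not reached is unused
```

In this sketch, the objects that are never reached from the root set form the garbage that the second phase would reclaim.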
It is desirable that during a garbage collection process, memory objects that are made available for re-use are in contiguous memory blocks. If the freed objects are not contiguous, the reallocation of these objects may not be possible if there is a requirement for a larger contiguous block. Further, it is also desirable that the freeing up and reallocation of memory objects happen concurrently with program execution. Ideally, a garbage collection process should address these two requirements.
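The contiguity requirement can be illustrated with a small sketch. It assumes a hypothetical memory of eight equal slots, where `True` marks a slot in use; the helper `largest_free_run` is an illustrative name.

```python
def largest_free_run(slots):
    """Return the length of the longest run of free (False) slots."""
    best = run = 0
    for used in slots:
        run = 0 if used else run + 1
        best = max(best, run)
    return best


# Both layouts have four free slots out of eight.
fragmented = [True, False, True, False, True, False, True, False]
compacted = [True, True, True, True, False, False, False, False]

request = 3  # a request for three contiguous slots
can_fit_fragmented = largest_free_run(fragmented) >= request
can_fit_compacted = largest_free_run(compacted) >= request
```

Although the same amount of memory is free in both layouts, only the compacted layout can satisfy the request.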
A number of garbage collection methods are currently being used for memory management. One method of garbage collection is reference counting. In this method, a count is kept of the number of references to a memory object. If the count of references becomes zero, the object is termed dead or unused. The memory occupied by the object is then reclaimed for reallocation. Another method of garbage collection is the Mark-Sweep method. In this method, a given subset of memory is traversed and all the live objects are marked. A live object is defined as an object that is currently in use or is currently referenced. The memory subset is then swept for unmarked objects. The memory occupied by these objects is then reclaimed for reallocation.
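Reference counting as described above can be sketched as follows. The class `RefCountHeap` and its method names are hypothetical and chosen for illustration; a real implementation would keep the count inside each object's header rather than in a separate table.

```python
class RefCountHeap:
    """Toy heap that reclaims an object when its reference count reaches zero."""

    def __init__(self):
        self.counts = {}   # object name -> number of incoming references
        self.fields = {}   # object name -> names this object refers to
        self.freed = []    # names reclaimed, in order

    def allocate(self, obj):
        self.counts[obj] = 0
        self.fields[obj] = []

    def add_ref(self, target, source=None):
        # source=None models a reference held by a root, such as a variable.
        self.counts[target] += 1
        if source is not None:
            self.fields[source].append(target)

    def drop_ref(self, target):
        self.counts[target] -= 1
        if self.counts[target] == 0:
            self._reclaim(target)

    def _reclaim(self, obj):
        children = self.fields.pop(obj)
        del self.counts[obj]
        self.freed.append(obj)
        for child in children:   # a dead object no longer references its children
            self.drop_ref(child)


heap = RefCountHeap()
heap.allocate("table")
heap.allocate("row")
heap.add_ref("table")                # a variable refers to the table
heap.add_ref("row", source="table")  # the table refers to the row
heap.drop_ref("table")               # the variable goes away
```

Dropping the last reference to the table reclaims it and, in turn, the row that only the table referred to.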
Another method of automatic memory management is the copying garbage collection process. In copying garbage collection, all reachable or referenced objects are copied to a new location. The objects that are left behind in the old location are termed dead, and the memory they occupy is reclaimed for reallocation. Another method of garbage collection is generational garbage collection. This method uses the generational hypothesis, which states that newer objects are more likely to have a shorter lifetime than older ones. The method involves gathering objects in generations. The objects are divided into new and old generations. The objects in the new generation are moved to the old generation if they survive for a particular amount of time. The objects in the new generation are collected periodically to reclaim free memory.
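Copying collection can be sketched as follows, under the assumption of a hypothetical from-space represented as a dictionary from old addresses to lists of referenced addresses. The function `copy_collect` and the sample addresses are illustrative; the scan loop follows the general shape of a Cheney-style breadth-first copy.

```python
def copy_collect(from_space, roots):
    """Copy every object reachable from roots into a new contiguous space.

    from_space maps an old address to the list of old addresses it references.
    Returns (to_space, new_roots); an index into to_space is a new address.
    """
    to_space = []      # new contiguous space
    forwarding = {}    # old address -> new address

    def copy(old):
        if old not in forwarding:
            forwarding[old] = len(to_space)
            to_space.append(list(from_space[old]))  # fields still hold old addresses
        return forwarding[old]

    new_roots = [copy(root) for root in roots]
    scan = 0
    while scan < len(to_space):
        # Copy the children and rewrite the fields to the new addresses.
        to_space[scan] = [copy(child) for child in to_space[scan]]
        scan += 1
    return to_space, new_roots


# Objects 10 and 30 refer to each other and are reachable from the root.
# Object 20 refers only to itself and object 40 is unreferenced.
from_space = {10: [30], 20: [20], 30: [10], 40: []}
to_space, new_roots = copy_collect(from_space, roots=[10])
```

Objects 20 and 40 are never copied, so the whole of the old space can afterwards be reused as one contiguous block.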
There exist a number of patents pertaining to various garbage collection methods. One such patent is U.S. Pat. No. 6,502,111, titled ‘Method and System for Concurrent Garbage Collection’. This patent describes a method for concurrent garbage collection wherein live memory objects are marked concurrently with the execution of the application program. A first marking act is performed using root information while the program executes. The method uses a write watch module to accumulate all the modifications that occur during the concurrent marking act in the memory structure. The logged information in the write watch module is then used to perform a second marking act. The application is paused or stopped to perform the second marking act. The garbage collection is then completed by using various techniques such as sweeping or copying. In this invention, the application is stopped or paused while the collection of garbage is carried out. The memory freed up after garbage collection is not available for reallocation as a contiguous block of memory.
Compaction makes a contiguous block of memory available for reallocation. Such compaction is described in U.S. Pat. No. 6,249,793, titled ‘Mostly Concurrent Compaction in a Garbage Collection System’. In this method, variables containing pointers that point to objects stored in a selected chunk or subset of memory are identified and stored in a data structure. Concurrently with these steps, a write barrier marks as ‘dirty’ the regions of memory in which one or more pointers have been modified by the program. Program execution is then stopped for examination of ‘dirty’ objects to identify any further variables pointing to objects in the memory subset. The data structure is updated accordingly. The variables in the data structure are examined to determine if they still point to the objects in the memory subset. The variables that continue to do so are modified to point to corresponding locations outside the subset of memory. The objects are then copied to the locations outside of the subset of memory, and the program is restarted. The subset of the memory can now be re-allocated as a contiguous block of memory. Extensive remapping of the objects is required, as the objects that are referenced in the program are relocated. This increases the complexity and the time taken for execution of the garbage collection process. In addition, the execution of the application program has to be stopped for relocation of the objects.
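The role of a write barrier that marks modified regions as ‘dirty’ can be illustrated with a minimal sketch. It is not the implementation of the cited patent; the class `BarrierHeap`, the region size `REGION_SIZE`, and the slot layout are assumptions made for illustration.

```python
REGION_SIZE = 4  # slots per region (an arbitrary illustrative granularity)


class BarrierHeap:
    """Toy heap whose pointer writes pass through a write barrier."""

    def __init__(self, size):
        self.slots = [None] * size
        self.dirty = set()   # indices of regions modified since the last scan

    def write(self, index, value):
        self.slots[index] = value
        self.dirty.add(index // REGION_SIZE)   # the write barrier

    def slots_to_rescan(self):
        """Return the slot indices in dirty regions and clear the dirty set."""
        indices = []
        for region in sorted(self.dirty):
            start = region * REGION_SIZE
            indices.extend(range(start, start + REGION_SIZE))
        self.dirty.clear()
        return indices


heap = BarrierHeap(16)
heap.write(5, "p")    # falls in region 1
heap.write(6, "q")    # also region 1
heap.write(13, "r")   # falls in region 3
dirty_regions = set(heap.dirty)
rescan = heap.slots_to_rescan()
```

Only the slots in the two dirty regions need to be re-examined while the program is stopped, rather than the whole heap.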
Most of the garbage collection techniques existing in the art, like the Mark-Sweep method, stop or suspend the execution of the program. In time-critical processes, such stoppages are not desirable. Techniques in the art like reference counting segregate unused or dead memory blocks from live memory blocks, that is, memory blocks that are currently being used. However, they do not provide contiguous free memory for further reallocation. The reference counting method also fails to detect circular references, which are references from one object to another and vice-versa. The copying method of garbage collection provides contiguous blocks of free memory; however, the method requires roughly twice as much memory, since a second region must be kept available into which the live objects are copied. The copying method also suspends the execution of the program for relatively longer periods. Even generational garbage collection methods that do not stop the execution initially do suspend the execution for garbage collecting the objects in the final stages of the collection process.
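The circular-reference limitation of reference counting can be observed in practice. The sketch below assumes the CPython interpreter, which combines reference counting with a separate cycle detector exposed through the standard `gc` module; the class `Node` is a hypothetical example type.

```python
import gc
import weakref


class Node:
    def __init__(self):
        self.other = None


was_enabled = gc.isenabled()
gc.disable()                  # switch off the cycle detector; reference counting remains

a, b = Node(), Node()
a.other, b.other = b, a       # each object refers to the other
probe = weakref.ref(a)        # a weak reference does not keep the object alive
del a, b                      # no variable refers to either object any more

alive_under_refcounting = probe() is not None   # counts never reached zero

gc.collect()                  # run the tracing cycle detector explicitly
alive_after_tracing = probe() is not None

if was_enabled:
    gc.enable()
```

Under the stated assumption, the two objects survive reference counting alone and are reclaimed only once a tracing pass runs.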
Besides the above-mentioned disadvantages, none of the existing techniques provides an estimate of the minimum amount of memory that can be freed in a particular iteration of the garbage collection process. Such an estimate can be used to pace the garbage collection process in accordance with the current demand.
Thus, from the above discussion, it is evident that there is a need for a garbage collection system that is highly concurrent with the execution of the program. The garbage collection process should not significantly interfere with the execution of the program. The garbage collection process should be able to provide contiguous free blocks of memory for reallocation. At the same time, it should not require excess memory space itself. The garbage collection process should also be able to pace itself as per the current memory demand.