1. Field of the Invention
This invention relates to the field of computer memory allocation and deallocation. Specifically, this invention is a new and useful method, apparatus, system, and computer program product for a write barrier of a garbage collected heap and for scanning card markers associated with the card marked heap.
2. Background
Memory allocation and deallocation techniques have become very important in structured programming and object oriented programming methodologies. Memory allocated from a heap can be used to store information. Often this information is an instantiated object within an object-oriented paradigm. The subsequently described techniques apply to both nodes in the heap containing data and nodes in the heap that are instantiated objects.
Introduction to Garbage Collection
Computer memory is a resource. Programs cause a computer to perform operations (to execute) based on instructions stored in memory. Executing programs also use memory to store information. This information is often organized into memory resident data structures. Usually, these data structures are linked together by pointers from one structure to another referenced through pointers in static and stack variable storage. The memory resource is managed to meet the storage requirements for information and program code.
Executing programs often need memory for a purpose that extends for a limited period of time. For example, a program may allocate memory to hold information, store the information into the allocated memory, operate on the stored information to produce a result, and then have no further need of the stored information. Once the program no longer needs the stored information, the allocated memory can be released for later reuse.
Modern programming languages provide facilities for static, stack and heap allocation of memory. Static allocation binds variables to storage locations at compile and/or link time. Stack allocation pushes an activation frame on the computer stack when a program block prepares to execute. This activation frame contains storage for variables within the scope of execution for the program block. Once the program block completes, the activation frame is popped from the stack. Thus, stacks store information in a last-in-first-out (LIFO) manner. Variables stored in the activation frame are not saved from one activation of the block to the next. Heap allocation allows memory for variables to be allocated and deallocated in any order, and these variables can outlive the procedure (or block) that created them. Once memory is deallocated, it is available for reallocation for another use.
A "node" is memory allocated from a heap. Nodes are accessed through pointers. A direct (or simple) pointer is the node's address in the heap. An indirect pointer (sometimes called a `handle`) points to an address in memory that contains the address of the node. More complex pointers exist. Indirect pointers allow nodes to be moved in the heap without needing to update the occurrences of the handle. One problem with indirect pointers is that they require an extra memory access to reach the node. This extra memory access slows execution of the program.
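The difference between direct and indirect pointers can be sketched as follows. This is a minimal illustration only: the heap is modeled as a dictionary keyed by address, and the names `heap`, `handle_table`, `deref_direct`, `deref_handle` and `move_node` are hypothetical and do not come from any particular implementation.

```python
# Hypothetical model: the heap is a dict from address to node contents.
heap = {0x100: "node-A"}

# Direct pointer: the node's address in the heap.
direct_ptr = 0x100

# Indirect pointer (handle): an index into a table that holds the node's address.
handle_table = [0x100]
handle = 0


def deref_direct(ptr):
    """One memory access: read the node at its address."""
    return heap[ptr]


def deref_handle(h):
    """Two memory accesses: read the address from the table, then the node."""
    return heap[handle_table[h]]


def move_node(h, new_addr):
    """Relocate a node; only the table entry changes, not the copies of the handle."""
    old_addr = handle_table[h]
    heap[new_addr] = heap.pop(old_addr)
    handle_table[h] = new_addr


move_node(handle, 0x200)
```

After the move, `deref_handle(handle)` still reaches the node, whereas `direct_ptr` refers to an address that no longer holds it and would itself need to be updated.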
The "root set" is a set of node references such that the referenced nodes must be retained regardless of the state of the heap. A node is reachable if the node is in the root set, or referenced by a reachable node. The "reference set" is the set of node references contained in a node. A memory leak occurs when a node becomes unreachable from the root set and is never reclaimed. A memory leak reduces the amount of heap memory available to the program. A node that becomes unreachable from the root set and can be reclaimed is a garbage node.
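The reachability definition above can be sketched as a simple traversal. This is an illustrative sketch, assuming that each node is named by a string and that `reference_sets` (a hypothetical name) maps each node to the nodes it references.

```python
def reachable(root_set, reference_sets):
    """Return the set of nodes reachable from the root set.

    reference_sets maps each node name to the list of nodes it references.
    """
    live = set()
    worklist = list(root_set)
    while worklist:
        node = worklist.pop()
        if node in live:
            continue
        live.add(node)
        # A node referenced by a reachable node is itself reachable.
        worklist.extend(reference_sets.get(node, []))
    return live


# D and E reference each other but are not reachable from the root set.
reference_sets = {"A": ["B"], "B": ["C"], "C": [], "D": ["E"], "E": ["D"]}
live = reachable({"A"}, reference_sets)
garbage = set(reference_sets) - live
```

In this example nodes D and E are garbage nodes even though each is referenced by the other, because neither is reachable from the root set.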
Usage of heap memory can be accomplished by manually programming node allocation and deallocation. However, although a programmer knows when a new node is required, it is often difficult for the programmer to know when a node is no longer reachable. Thus, problems may occur when programmers explicitly deallocate nodes. One of these problems is that it is very difficult to debug memory leaks. Often the design of the application being programmed obfuscates when the programmer can explicitly deallocate memory. Additionally, when one portion of a program is ready to deallocate memory, it must be certain that no other portion of the program will use that memory. Thus, in object oriented programming (OOP) languages, multiple modules must closely cooperate in the memory management process. This, contrary to OOP programming methodology, leads to tight binding between supposedly independent modules.
These difficulties are minimized if the programmer need not explicitly deallocate memory. Automatic garbage collection methods scan memory for referenced nodes and recover garbage nodes--but at a cost. The process of finding and deallocating garbage nodes takes processor time. Balancing the impact of the garbage collection process on an executing program is important because the main function of the program may require timely operation or uninterrupted user interaction. Real-time systems (those systems that must provide a response within a specified clock time) often cannot dedicate large amounts of processor time to garbage collection. In real-time systems the garbage collection algorithm must be able to be interrupted.
In a system using garbage collection, nodes are allocated from the heap as memory is needed. These nodes are not initially reclaimed when they are no longer needed. Instead, when a memory allocation attempt fails or in response to some condition (for example on expiration of a clock), the garbage collection process is automatically invoked and unused memory is reclaimed for subsequent reuse.
Some garbage collection methods copy nodes (that is, these methods relocate nodes that appear to be alive from one location in the heap to another location). When this happens, a mechanism is required to allow existing pointers to the original location of the node to be used to access the relocated node. These mechanisms include (among others) updating existing pointers to the node's original location and providing indirect pointers to the new location of the node.
The prior art in garbage collection is well discussed in Garbage Collection, Algorithms for Automatic Dynamic Memory Management, by Richard Jones and Rafael Lins, John Wiley & Sons, ISBN 0-471-94148-4, copyright 1996, hereby incorporated by reference as indicative of the prior art.
Types of Garbage Collection Algorithms
Garbage collection algorithms can be classified as `exact` or `conservative`. The exact algorithms operate by tracking variables that are known to contain pointers. These algorithms are often assisted by compiler modifications that help distinguish between pointers and data values. Often data values and pointer values are tagged to differentiate between them. The conservative algorithms do not receive any help from the compiler nor are the data values tagged. Thus, the conservative algorithms are unable to distinguish between data values and pointer values, so everything that looks like a pointer is treated as a pointer. Further, the conservative algorithms do not know the structure of the heap or the stack and do not expect pointers to be tagged. As such, the conservative algorithms must include steps for handling mis-identified pointers. Many garbage collection algorithms are a mixture of exact and conservative techniques.
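A conservative scan can be sketched as follows. This is an illustrative sketch under stated assumptions: the heap address range, the set of allocated node addresses, and the name `conservative_roots` are all hypothetical.

```python
HEAP_START, HEAP_END = 0x1000, 0x2000    # assumed heap address range
allocated = {0x1000, 0x1040, 0x1800}      # assumed node start addresses


def conservative_roots(words):
    """Treat every word that looks like a pointer to a node as a pointer."""
    return {w for w in words if HEAP_START <= w < HEAP_END and w in allocated}


# Suppose 0x1800 is really an integer data value that happens to equal a
# node address; the conservative scan cannot tell the difference.
stack_words = [42, 0x1040, 0x1800, 0x3000]
roots = conservative_roots(stack_words)
```

Here the integer that happens to equal a node address is mis-identified as a pointer, so the node at that address is retained even if nothing actually references it.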
Generational Garbage Collection
Generational garbage collection techniques use the observation that many nodes allocated from the heap are only used for a short period of time. These nodes are allocated for a specific short-term purpose, used for the purpose, and then can be deallocated for possible later reuse. Thus, garbage collection algorithms that concentrate on younger nodes are more efficient than those that process all nodes identically because fewer nodes need to be examined during the garbage collection process.
Generational garbage collection algorithms separate nodes into two or more areas in the heap depending on the node's age. Each area is a generation. Nodes are first allocated from the creation area within the youngest generation and are copied to the older generation if the node survives long enough ("long enough" is often until the next scavenge operation). These garbage collection algorithms concentrate on reclaiming storage from the youngest generation area, where most of the garbage is found. Generally, the number of live nodes in the youngest generation is significantly less than the number of live nodes in the other generation areas, so the time required to scavenge nodes in the youngest generation is less than the time required to scavenge the other generation areas. A scavenge operation of the creation area is termed a minor collection. Any garbage collection operation on an older generation area is termed a major collection. The minor collection operation occurs more frequently than the major collection operation because of the reduced overhead and higher efficiency of the minor collection process.
However, generational garbage collection algorithms need to record inter-generational pointers. These inter-generational pointers are created (1) by storing a pointer in a node or (2) when a node containing a pointer is copied to an older generation area. The pointers created by a copying algorithm can be recognized by the copying algorithm. A write-barrier is used to record pointers created by an assignment of a pointer within a node. If all younger generation areas are collected whenever an older generation area is collected, the write-barrier only need record pointers from the older generation area to the younger generation area.
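A write-barrier that records only pointers from the older generation area to the younger generation area can be sketched as follows. This is a minimal sketch: the `Node` class, the integer generation encoding, the `remembered` set and the `write_field` function are hypothetical names chosen for illustration.

```python
class Node:
    def __init__(self, name, generation):
        self.name = name
        self.generation = generation   # assumed encoding: 0 = younger, 1 = older
        self.fields = {}


# Names of older nodes that hold a pointer into the younger generation.
remembered = set()


def write_field(node, field, target):
    """Store a pointer in a node, then run the write-barrier."""
    node.fields[field] = target
    if target is not None and node.generation > target.generation:
        remembered.add(node.name)


old = Node("old", 1)
young = Node("young", 0)
other_old = Node("other_old", 1)

write_field(old, "next", young)        # older -> younger: recorded
write_field(young, "back", old)        # younger -> older: not recorded
write_field(other_old, "peer", old)    # older -> older: not recorded
```

Only the first assignment creates an inter-generational pointer that a minor collection would need to know about.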
Even though a minor collection operation is faster than a major collection operation, the minor collection operation often requires too much time to be satisfactory in a real-time situation. Thus, the minor collection process must be interrupted to meet real-time requirements. One difficulty with interrupting the minor collection is that the inter-nodal pointers are left in an indeterminate state such that some inter-nodal pointers point to the promoted node and others point to the original node. That is, when the minor collection operation is interrupted after a node is copied, often not all the references to the node's prior location are updated to the new location of the node.
Once a node is copied, any pointers to the copied node must be updated or tracked so that future references to the copied node eventually succeed. Further, pointers to nodes in the younger generation contained in copied nodes must be accessed to determine the reference set.
FIG. 1a illustrates a heap area indicated by general reference character 100. The heap area 100 includes a generational garbage collection area 101. The generational garbage collection area 101 includes a younger generation 103 and an older generation area 105. The younger generation 103 is often subdivided into a creation area 107, a `to` area 109, and a `from` area 111. Nodes (such as a new node 113) are first created in the creation area 107. When the creation area 107 fills, the meanings of the `to` area 109 and the `from` area 111 are interchanged. Then, active nodes, such as the new node 113, along with active nodes in the `from` area 111 are copied to the `to` area 109. Active nodes in the `to` area 109 are copied to the older generation area 105 when the `to` area 109 fills. This results in a promoted node 115 in the older generation area 105. One skilled in the art will understand that other generational implementations exist. Further, one skilled in the art will understand that the creation area 107 contains the youngest nodes.
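The movement of nodes among the areas of FIG. 1a can be sketched with a toy model. This is one simplified reading only: the class name `GenerationalHeap`, the fixed `to_capacity`, and the policy of promoting survivors that do not fit in the `to` area are assumptions made for illustration, and liveness is supplied by the caller instead of being computed.

```python
class GenerationalHeap:
    """Toy model of the areas in FIG. 1a; capacities and names are assumptions."""

    def __init__(self, to_capacity):
        self.creation = []       # creation area 107
        self.to_area = []        # 'to' area 109
        self.from_area = []      # 'from' area 111
        self.older = []          # older generation area 105
        self.to_capacity = to_capacity

    def minor_collection(self, is_live):
        # Interchange the meanings of the 'to' and 'from' areas.
        self.to_area, self.from_area = self.from_area, self.to_area
        # Active nodes in the creation area and the 'from' area survive.
        survivors = [n for n in self.creation + self.from_area if is_live(n)]
        self.creation, self.from_area = [], []
        for node in survivors:
            if len(self.to_area) < self.to_capacity:
                self.to_area.append(node)
            else:
                # Simplification: promote survivors once the 'to' area is full.
                self.older.append(node)


heap = GenerationalHeap(to_capacity=1)
heap.creation = ["a", "b", "c"]
heap.minor_collection(lambda n: n != "b")    # "b" is garbage
first = (list(heap.to_area), list(heap.older))
heap.creation = ["d"]
heap.minor_collection(lambda n: True)        # everything survives
```

After the first collection, node "a" occupies the `to` area and node "c" has been promoted; node "b" is simply not copied and its storage is reclaimed with the creation area.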
Card Marking
The process to determine the root set often takes significant processor time searching for pointers in the heap. One optimization used in the prior art is to segment the heap into equal size areas (called cards) and to mark each card when a write operation occurs within the card--a form of a write-barrier. Thus, only cards marked as `dirty` (instead of all the cards in the heap memory) are searched for pointers when updating the root set. FIG. 1b illustrates the use of card marking. A general reference character 120 illustrates a card-marked region of memory 121. The card-marked region of memory 121 contains a first card 123 and a second card 125. In this illustration, the first card 123 is adjacent in memory to the second card 125. Thus a plurality of nodes (A-F) 127 are distributed over the first card 123 and the second card 125. The first card 123 is associated with a first card marker 129 and the second card 125 is associated with a second card marker 131. When memory is modified in one of the cards 123, 125, the appropriate card marker is flagged. Thus, in the illustration of FIG. 1b, a write operation was performed within the first card 123 resulting in the first card marker 129 being marked `dirty` as indicated by the `X` in the first card marker 129. The fact that the second card marker 131 is not marked indicates that none of the memory in the second card 125 has been modified since the last scavenge. The fact that a node `D` 133 extends across the boundary between the first card 123 and the second card 125 complicates the ability to detect the start of the node. Generally, card markers are initialized to all ones (FF hex), so that a card can be marked by clearing its marker, because the computer's memory-clear operation is often faster than a store-value operation.
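A card-marking write-barrier of this kind can be sketched as follows. This is a minimal sketch under stated assumptions: the card size of 512 bytes, the heap base address, the number of cards, and the encoding in which a cleared marker (00 hex) means `dirty` are all chosen for illustration, as are the names `card_index` and `write_barrier`.

```python
CARD_SHIFT = 9                  # assumed card size: 2**9 = 512 bytes
HEAP_BASE = 0x10000             # assumed base address of the card-marked region
NUM_CARDS = 8                   # assumed number of cards
CLEAN, DIRTY = 0xFF, 0x00       # assumed encoding: markers start as all ones

card_markers = bytearray([CLEAN] * NUM_CARDS)


def card_index(address):
    """Map a heap address to the index of the card that contains it."""
    return (address - HEAP_BASE) >> CARD_SHIFT


def write_barrier(address):
    """Flag the card containing the written address by clearing its marker."""
    card_markers[card_index(address)] = DIRTY


write_barrier(HEAP_BASE + 100)     # falls in card 0
write_barrier(HEAP_BASE + 1300)    # falls in card 2
dirty_cards = [i for i, m in enumerate(card_markers) if m == DIRTY]
```

Because the card index is obtained with a subtraction and a shift, the barrier adds only a few operations to each store.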
When using card marking, it is often necessary to find the start of a node given a pointer to an address within the interior of the node or an index to a card. This is typically done in the prior art by scanning backwards in memory from the initial pointer (or start of a card) looking for the node's header. However, with programming language implementations that do not differentiate or tag integers, object headers and pointers, scanning backwards does not work due to the inability to detect the start of the node.
Another goal of card marking, when used with a generational garbage collection algorithm, is to skip over objects in the copied generation area of the heap that do not reference objects in the creation area of the heap. However, this goal is lost if the density of such nodes in the older generation is such that most cards are marked. FIG. 1c illustrates this problem of the prior art with a `card marking structure` as indicated by general reference character 140. A `younger area of the heap` 141 contains at least one node 143, 145, 147. An `older generation area of the heap` 149 is segmented into a plurality of cards 151, 153. The card 151 is associated with a `card marker` 155 and the card 153 is associated with a `card marker` 157. A `card boundary` 159 indicates the ending of the card 151 and the beginning of the card 153. The `older generation area of the heap` 149 contains a `number of nodes (A-F)` 161 including a `node E` 163 and a `node C` 165. The `node E` 163 includes a pointer to the node 145 and the `node C` 165 includes a pointer to the node 143 both in the `younger area of the heap` 141. Because a node in the card 151 references the `younger area of the heap` 141 the `card marker` 155 is marked. Because a node in the card 153 references the `younger area of the heap` 141 the `card marker` 157 is marked. Thus, even using card marking, each node in the `older generation area of the heap` 149 must be checked for pointers to the `younger area of the heap` 141. This eliminates the advantage sought by using card marking.
Another problem with card marking is that the operation of scanning the card indicators to find the marked cards is an overhead operation because a large number of memory locations (those containing the marking vector) must be examined to locate the marked cards.
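The scanning overhead can be sketched with a straightforward scan of the marking vector. This is an illustrative sketch: the vector length of 1024 markers, the FF hex value for an unmarked card, and the name `find_marked_cards` are assumptions.

```python
def find_marked_cards(card_markers, clean=0xFF):
    """Return (indices of marked cards, number of markers examined)."""
    marked, examined = [], 0
    for index, marker in enumerate(card_markers):
        examined += 1               # every marker is read, marked or not
        if marker != clean:
            marked.append(index)
    return marked, examined


markers = bytearray([0xFF] * 1024)  # assumed marking vector of 1024 cards
markers[7] = 0x00
markers[900] = 0x00
marked, examined = find_marked_cards(markers)
```

In this example all 1024 markers are examined in order to locate only two marked cards.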
A card marking implementation is described in A Fast Write Barrier for Generational Garbage Collectors by Urs Holzle, presented at the OOPSLA'93 Garbage Collection Workshop in Washington D.C. in October 1993. This paper is included by reference as illustrative of the prior art and can be found on the internet at:
"http://self.sunlabs.com/papers/write-barrier.html".
Object Oriented Programming
Object oriented programming (OOP) is a methodology for building computer software. Key OOP concepts include data encapsulation, inheritance and polymorphism. While these three key concepts are common to OOP languages, most OOP languages implement the three key concepts differently. Objects contain data and methods. Methods are procedures that generally access the object's data. The programmer using the object does not need to be concerned with the type of data in the object; rather, the programmer need only be concerned with creating the correct sequence of method invocations and using the correct method.
Smalltalk, Java and C++ are examples of OOP languages. Smalltalk was developed in the Learning Research Group at Xerox's Palo Alto Research Center (PARC) in the early 1970s. C++ was developed by Bjarne Stroustrup at the AT&T Bell Laboratories in 1983 as an extension of C. Java is an OOP language with elements from C and C++ and includes highly tuned libraries for the internet environment. It was developed at Sun Microsystems and released in 1995.
In an OOP system, objects hide (encapsulate) the internal structure of their data and the algorithms used by their methods. Instead of exposing these implementation details, well-designed OOP objects present interfaces that represent their abstractions cleanly with no extraneous information. Polymorphism takes encapsulation a step further. A software component can invoke a method in an OOP object without knowing exact details about how the method operates. Thus a software component can invoke the `draw` method for a square object and a circle object and the objects respectively draw a square and a circle. Inheritance allows developers to reuse pre-existing design and code and reduces the need for developers to create software from scratch. Rather, through inheritance, developers derive subclasses that inherit behaviors from existing OOP objects, that the developer then customizes to meet their particular needs.
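The `draw` example can be sketched as follows. This is a minimal illustration; the class names `Shape`, `Square` and `Circle` and the function `render` are hypothetical, and each `draw` method returns a string instead of producing graphics.

```python
class Shape:
    """Base class; subclasses inherit from it and customize draw()."""

    def draw(self):
        raise NotImplementedError


class Square(Shape):
    def draw(self):
        return "square"


class Circle(Shape):
    def draw(self):
        return "circle"


def render(shapes):
    # The caller invokes draw() without knowing how each object implements it.
    return [shape.draw() for shape in shapes]


drawn = render([Square(), Circle()])
```

The `render` function depends only on the interface, so a new subclass can be added without changing it.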
Objects
Objects are instantiated in the heap based on classes that contain the programmed methods for the object. Instantiated objects contain data (in instance variables) specific to that particular instantiated object. Generally, an object based on a class is instantiated (or constructed) when a node with memory for the object is allocated from the heap, the required information to tie the object to the class is stored in the object, the object is also associated with other objects as appropriate and the object's instance variables are initialized. FIG. 1d illustrates the conceptual aspects of an instantiated object as indicated by general reference character 170. The instantiated object 170 contains an object header 171, a base-class variable storage 173, a first subclass variable storage 175, a second subclass variable storage 177 and a final subclass variable storage 179 for the nth subclass. The object header 171 contains or refers to information (as indicated by a block 181) that supports the instantiated object 170. The information in the object header 171 often includes a pointer to a class definition and, either directly or indirectly, an instance-variable count. The base-class variable storage 173, and the first subclass variable storage 175 each include instance variables as indicated by a block 183 associated with the second subclass variable storage 177. The instance variables in the block 183 include intermixed pointer and data variables. One difficulty with the organization of information in the instantiated object 170 is that the data value and pointer instance variables cannot be distinguished simply by examination of the information stored in the instance variables. Hence, determining the pointers into the heap for garbage collection is inefficient. This inefficiency has led many object-oriented language implementations to sacrifice data value precision and to tag each value to distinguish a pointer value from a data value.
Another common approach provides a tag table that associates a tag for each variable defined in the class. The tag indicates whether the instance variable of an instantiated object of the class contains a data value or a pointer value. Using a tag table increases computational overhead because the tag table must be checked for each instance variable when determining the live nodes in the heap. One skilled in the art will understand that pointers may be either direct or indirect.
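The tag table approach can be sketched as follows. This is an illustrative sketch: the class name `Point`, its three instance variables, and the names `TAG_TABLE` and `pointer_fields` are hypothetical.

```python
# Hypothetical tag table: one tag per instance variable defined in the class.
# True marks a pointer value, False marks a data value.
TAG_TABLE = {"Point": [False, False, True]}    # variables: x, y, next


def pointer_fields(class_name, instance_values):
    """Return the instance variables that the tag table marks as pointers."""
    tags = TAG_TABLE[class_name]
    # The table is consulted once per instance variable, which is the
    # overhead noted above.
    return [value for tag, value in zip(tags, instance_values) if tag]


pointers = pointer_fields("Point", [3, 4, 0x1040])
```

Only the third instance variable is reported as a pointer, even though the first two could be mistaken for addresses by examination of their values alone.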
As previously discussed, objects are allocated from the heap. Thus, objects are a special case of nodes. Further, many OOP implementations assign a `hash value` to objects and provide methods to access this hash value. The hash value is a useful quasi-unique integer associated with a node in the heap. Determining this hash value and storing it in the object when that object is short lived (hence only existent in the heap for a limited period of time) is unnecessary overhead. One prior art method used to reduce this burden is to only generate the hash value when it is requested. Thus a counter containing the next hash value is accessed to get the hash value, the hash value is stored in the node, and the counter is incremented, requiring one memory read and two memory-write operations.
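Generating the hash value only on request can be sketched as follows. This is a minimal sketch; the names `HashCounter`, `Node` and `get_hash`, and the starting counter value of 1, are assumptions.

```python
class HashCounter:
    def __init__(self):
        self.next_value = 1     # assumed starting value


class Node:
    def __init__(self):
        self.hash_value = None  # no hash value is assigned at allocation


def get_hash(node, counter):
    """Assign the hash value on first request and return it thereafter."""
    if node.hash_value is None:
        value = counter.next_value          # one memory read
        node.hash_value = value             # first memory write
        counter.next_value = value + 1      # second memory write
    return node.hash_value


counter = HashCounter()
a, b = Node(), Node()
first_a = get_hash(a, counter)
again_a = get_hash(a, counter)
first_b = get_hash(b, counter)
```

A short-lived node whose hash value is never requested incurs none of these memory operations.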
Further information about OOP concepts may be found in Object Oriented Design with Applications by Grady Booch, the Benjamin/Cummings Publishing Co., Inc., Redwood City, Calif., (1991), ISBN 0-8053-0091-0.
Compilers, Virtual Machines (Interpreters) and Machines
Programming languages allow a programmer to use a symbolic textual representation (the source code) representing the operations that an application binary interface (ABI) (such as a computer or an interpreter running on a computer) is to perform. This symbolic representation is converted into opcodes understood by the ABI. Usually these opcodes are binary values. By processing the source code, compilers create an object file (or object module) containing the opcodes corresponding to the source code. (One skilled in the art will understand that the terms `object file` and `object module` are not related to the `OOP object` previously discussed.) This object module, when linked to other object modules, results in executable instructions that can be loaded into a computer's memory and run by the ABI.
An interpreter is a program that executes on a computer that accesses opcodes and causes the computer to perform one or more operations that effectuate the operation specified by the opcode. Thus, an interpreter can be thought of as a program that provides a virtual computer environment or virtual machine--the ABI. Any computer that is able to execute the interpreter is able to execute programs compiled for the ABI. Thus, the same program's opcodes can be downloaded over a network and executed on a variety of different computer architectures that implement the ABI.
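An interpreter of this kind can be sketched as a loop that accesses each opcode and performs the corresponding operation. This is an illustrative sketch only: the four opcodes, their numeric values, and the stack-based design are hypothetical and do not correspond to any real ABI.

```python
PUSH, ADD, MUL, HALT = 0, 1, 2, 3    # hypothetical opcodes


def run(opcodes):
    """Execute a list of opcodes and return the value left on the stack."""
    stack, pc = [], 0
    while True:
        op = opcodes[pc]
        if op == PUSH:
            stack.append(opcodes[pc + 1])    # the operand follows the opcode
            pc += 2
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
            pc += 1
        elif op == MUL:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
            pc += 1
        elif op == HALT:
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode {op}")


# Computes (2 + 3) * 4.
program = [PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT]
result = run(program)
```

The same `program` list yields the same result on any computer able to execute the interpreter.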
A program's source consists of an ordered grouping of strings (statements) that are converted into both opcodes and data suitable for execution by the execution environment. A source program provides a symbolic description of the operations that the ABI will perform when executing the opcodes resulting from compilation and linkage of the source. The conversion from source to opcodes is performed according to the grammatical and syntactical rules of the programming language used to write the source.
Each compiled statement can produce a multitude of opcodes that, according to the ABI, implement the operation described by the symbolic statement. A compiler may significantly change the structural organization represented by the source when producing the compiled opcodes. However, no matter how much the compiler changes this organization, the compiler is restricted in that the opcodes, when run by the ABI, must provide the same result as the programmer described using the source language--regardless of how this result is obtained. Similarly, the order in which data is stored in the structure need not be the same order as implied by the sequence of variable declarations supplied by the programmer. For example, the actual placement of instance variables in an instantiated object need not be in the same order as the variables were defined in the class declaration.
Many modern compilers can optimize the binary opcodes resulting from the compilation process. Due to the design of programming languages, a compiler can determine structural information about the program being compiled. This information can be used by the compiler to generate different versions of the sequence of opcodes that perform the same operation. (For example, the compiler may enable debugging capability, or may optimize instructions depending on the version of the target processor for which the source code is compiled.) Some optimizations minimize the amount of memory required to hold the instructions; other optimizations reduce the time required to execute the instructions.
One advantage of optimization is that the optimizing compiler frees the programmer from the time-consuming task of manually tuning the source code. This increases programmer productivity. Optimizing compilers also encourage a programmer to write maintainable code because manual tuning often makes the source code less understandable to other programmers. Finally, an optimizing compiler improves portability of code because source code tuned to one computer architecture may be inefficient on another computer architecture. A general discussion of optimizing compilers and the related techniques used can be found in Compilers: Principles, Techniques and Tools by Alfred V. Aho, Ravi Sethi and Jeffrey D. Ullman, Addison-Wesley Publishing Co. 1988, ISBN 0-201-10088-6, in particular chapters 9 and 10, pages 513-723.
One programming construct that can be significantly optimized is the loop. Loops often iterate using a loop-control variable. The loop-control variable is initialized to a starting value for the first iteration of the loop. The loop-control variable is modified by a stride value on each iteration of the loop until the loop-control variable reaches a last value. The loop completes when the loop-control variable reaches the last value. Such loops are often used to assign values to elements of an array of pointers (for example, an array of pointers to OOP objects). For applications using card marking or other write-barrier methods this means that the write-barrier instructions are also executed in the loop. Thus, a loop is inefficient if that loop assigns values to elements in a pointer array in a heap that uses a write-barrier.
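The cost of executing the write-barrier inside such a loop can be sketched as follows. This is an illustrative sketch: the card size of four array elements, the array length of sixteen, and the names `store_pointer` and `barrier_calls` are assumptions.

```python
CARD_SIZE = 4        # assumed number of array elements covered by one card
barrier_calls = 0    # counts how many times the write-barrier executes


def store_pointer(array, dirty, index, value):
    """Assign a pointer to an array element and run the write-barrier."""
    global barrier_calls
    array[index] = value
    dirty[index // CARD_SIZE] = True    # mark the card containing the element
    barrier_calls += 1


array = [None] * 16
dirty = [False] * (len(array) // CARD_SIZE)

# Loop-control variable i runs from a starting value of 0 with a stride of 1.
for i in range(0, len(array), 1):
    store_pointer(array, dirty, i, object())

marked_cards = dirty.count(True)
```

In this example the write-barrier executes sixteen times although only four distinct cards are marked, so most executions re-mark a card that is already marked.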