1. Field of the Invention
This invention relates to the field of computer memory allocation and deallocation. Specifically, this invention is a new and useful method, apparatus, system, and computer program product for processing null pointers in a garbage collected memory within an object-oriented programming environment with statically typed variables that may contain a null pointer.
2. Background
Memory allocation and deallocation techniques have become very important in structured programming and object oriented programming methodologies. Memory allocated from a heap can be used to store information. Within an object-oriented programming paradigm this information is often in the form of an instantiated object. An allocated portion of heap memory is a node. The subsequently described techniques apply to both nodes that contain data and nodes that are instantiated objects. These nodes are explicitly allocated by the program. However, many modern systems use heap-memory garbage collection techniques to recover previously allocated, but no longer used, nodes.
Introduction to Garbage Collection
Computer memory is a resource. Programs cause a computer to perform operations (to execute) based on instructions stored in memory. Executing programs also use memory to store information. This information is often organized into memory resident data structures. These data structures are often linked together by pointers from one structure to another and are often referenced through pointers in static, register and stack variable storage.
Modern programming languages provide facilities for static, stack and heap allocation of memory. Static allocation binds variables to storage locations at compile and/or link time. Stack allocation pushes an activation frame on the processor's stack when a program block prepares to execute. This activation frame contains storage for variables within the scope of execution for the program block executing in the processor. Once the program block completes, the activation frame is popped from the stack. Variables stored in the activation frame are not saved from one activation of the block to the next. Heap allocation allows memory for variables to be allocated and deallocated in any order and these variables can outlive the procedure (or block) that created them. Once memory is deallocated it is available for reallocation for another use.
A "node" is an area of memory allocated from a heap. Nodes are accessed through pointers. A direct (or simple) pointer is the node's address in the heap. An indirect pointer (sometimes called a "handle") points to an address in memory that contains the address of the node. More complex pointers exist. Indirect pointers allow nodes to be moved in the heap without needing to update the occurrences of the handle.
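The benefit of indirect pointers described above can be sketched as follows. This is a minimal illustration, not part of the described invention: the class `HandleHeap`, its fixed-size arrays, and its method names are hypothetical, and array indices stand in for memory addresses.

```java
// Hypothetical model: array indices stand in for heap addresses.
class HandleHeap {
    private final Object[] heap = new Object[8];      // the heap; each slot holds one node
    private final int[] handleTable = new int[8];     // handle -> current address of the node
    private int nextHandle = 0;

    // Stores a node at the given address and returns a handle (indirect pointer) to it.
    int allocate(int address, Object node) {
        heap[address] = node;
        handleTable[nextHandle] = address;
        return nextHandle++;
    }

    // Follows the handle to the node, wherever the node currently resides.
    Object dereference(int handle) {
        return heap[handleTable[handle]];
    }

    // Moves the node; only the handle table entry changes, not the holders of the handle.
    void relocate(int handle, int newAddress) {
        int oldAddress = handleTable[handle];
        heap[newAddress] = heap[oldAddress];
        heap[oldAddress] = null;
        handleTable[handle] = newAddress;
    }

    // Reports the node's current address (for illustration only).
    int addressOf(int handle) {
        return handleTable[handle];
    }
}
```

In this sketch, code holding the integer handle continues to reach the node after `relocate` is called, because only the single table entry is rewritten.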
The "root set" is a set of node references such that the referenced nodes must be retained regardless of the state of the heap. A node is reachable if the node is in the root set, or referenced by a reachable node. The "reference set" is the set of node references contained in a node. A memory leak occurs when a node becomes unreachable from the root set and is never reclaimed. A memory leak reduces the amount of heap memory available to the program. A garbage node is a node that becomes unreachable from the root set and can be reclaimed.
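The reachability definition above can be illustrated with a short traversal. This is a sketch under stated assumptions: `HeapNode` and `Reachability` are hypothetical names, each node's reference set is modeled as a list, and the traversal is a plain worklist search rather than any particular collector's algorithm.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical model of a node allocated from the heap.
class HeapNode {
    final String name;
    // The node's "reference set": the node references contained in this node.
    final List<HeapNode> references = new ArrayList<>();

    HeapNode(String name) {
        this.name = name;
    }
}

class Reachability {
    // Returns every node that is in the root set or referenced by a reachable node.
    static Set<HeapNode> reachableFrom(List<HeapNode> rootSet) {
        // Identity-based set: nodes are compared by reference, not by equals().
        Set<HeapNode> reachable =
                Collections.newSetFromMap(new IdentityHashMap<HeapNode, Boolean>());
        Deque<HeapNode> pending = new ArrayDeque<>(rootSet);
        while (!pending.isEmpty()) {
            HeapNode node = pending.pop();
            // add() returns false for nodes already visited, so cycles terminate.
            if (reachable.add(node)) {
                pending.addAll(node.references);
            }
        }
        return reachable;
    }
}
```

Any allocated node that is absent from the returned set is a garbage node in the sense defined above.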
Heap memory can be used by invoking explicit node allocation and deallocation procedures. However, although a programmer knows when a new node is required, it is often difficult for the programmer to know when a node is no longer reachable. Thus, problems may occur when programmers explicitly deallocate nodes. One of these problems is that it is very difficult to debug memory leaks. Often the design of the application being programmed obfuscates when the programmer can explicitly deallocate memory. Additionally, when one portion of a program is ready to deallocate memory, it must be certain that no other portion of the program will use that memory. Thus, in object oriented programming (OOP) languages, multiple modules must closely cooperate in the memory management process. This, contrary to OOP methodology, leads to tight binding between supposedly independent modules.
These difficulties are reduced if the programmer need not explicitly deallocate memory. Automatic garbage collection methods scan memory for referenced nodes and recover garbage nodes--but at a cost. The process of finding and deallocating garbage nodes takes processor resources. Balancing the impact of the garbage collection process on an executing program is important because the primary function of the program may require timely operation, uninterrupted user interaction or be subject to some other real-time constraint.
A mutator program changes (mutates) the connectivity of the graph of active nodes in the heap. In a system using garbage collection, nodes are allocated from the heap as memory is needed by the mutator program. These nodes are not initially reclaimed when they are no longer needed. Instead, when a memory allocation attempt fails or in response to some condition (for example, on expiration of a clock or counter), the mutation phase is paused, the garbage collection phase is automatically invoked and unused memory allocated to garbage nodes is reclaimed for subsequent reuse. The mutation phase resumes after the garbage collection phase completes.
Some garbage collection methods copy (or scavenge) nodes (that is, these methods relocate nodes that appear to be alive from one location in the heap to another location). These methods require a mechanism that allows existing pointers to the original location of the node to be used to access the relocated node. These mechanisms include (among others) updating existing pointers to the node's original location and providing indirect pointers to the new location of the node.
The prior art in garbage collection is well discussed in Garbage Collection, Algorithms for Automatic Dynamic Memory Management, by Richard Jones and Rafael Lins, John Wiley & Sons, ISBN 0-471-94148-4, copyright 1996, hereby incorporated by reference as indicative of the prior art.
Object Oriented Programming
Object oriented programming (OOP) is a methodology for building computer software. Key OOP concepts include data encapsulation, inheritance and polymorphism. While these three key concepts are common to OOP languages, most OOP languages implement the three key concepts differently. Objects contain data and methods. Methods are procedures that generally access the object's data. The programmer using the object does not need to be concerned with the type of data in the object; rather, the programmer need only be concerned with creating the correct sequence of method invocations and using the correct method.
Smalltalk, Java and C++ are examples of OOP languages. Smalltalk was developed in the Learning Research Group at Xerox's Palo Alto Research Center (PARC) in the early 1970s. C++ was developed by Bjarne Stroustrup at the AT&T Bell Laboratories in 1983 as an extension of C. Java is an OOP language with elements from C and C++ and includes highly tuned libraries for the internet environment. Java uses garbage collection techniques to manage its heap. Java was developed at Sun Microsystems and released in 1995. The Java environment is also an example of an object-oriented programming environment with statically typed variables that may contain a null pointer.
Further information about OOP concepts may be found in Object Oriented Design with Applications by Grady Booch, the Benjamin/Cummings Publishing Co., Inc., Redwood City, Calif., (1991), ISBN 0-8053-0091-0.
Objects
Objects are instantiated in the heap based on classes that contain the programmed methods for the object. Objects are specialized data structures that generally include data specific to the object and references to procedures that manipulate the data. Instantiated objects contain data (in instance variables) specific to that particular instantiated object. Generally, an object based on a class is instantiated (or constructed) when a node with memory for the object is allocated from the heap, the required information to tie the object to the class is stored in the object, the object is associated with other objects as appropriate and the object's instance variables initialized. Like any data structure, the object may contain instance variables that are used to store pointers. These instance variables are generally initialized to a specified initial value when the object is instantiated.
FIG. 1a illustrates a linked data object, indicated by general reference character 100, that includes a first data object 101. The first data object 101 includes an object header 103 and a `non-pointer` instance variable 105. The `non-pointer` instance variable 105 is used to contain data such as an integer or a floating point value. The first data object 101 also includes a `non-null pointer` instance variable 107 that contains a pointer to a second data object 109. In addition, the first data object 101 also contains a `null pointer` instance variable 111 that contains a value that is defined to be an invalid pointer. Invalid pointers are commonly defined to be either the ZERO pointer or an address of a NULL object (an identifiable special object located at a specified address). The term "null pointer" refers to whichever of these values is used to define the invalid pointer. The null pointer is often used as a linked object termination indicator such as in a leaf or end object in a linked list or tree. The null pointer is also used to initialize unassigned pointer variables so that an attempted access through an unassigned pointer value can be detected. Thus, pointer variables are generally initialized to the null pointer. These pointer variables are subsequently assigned pointer values that reference nodes in the heap memory. The Java environment, for example, provides facilities for detecting when a reference is attempted through a null pointer and, when this attempt occurs, for raising the NullPointerException. The use of the NullPointerException is described in Java Developer's Reference, by Mike Cohn et al., © 1996 by Sams.net Publishing, ISBN 1-57521-129-7, in chapter 22 and at pages 1009-1010.
The Java environment also provides garbage collection facilities. However, as is subsequently described, the two approaches previously described for defining the null pointer have conflicting advantages in an object-oriented programming environment with statically typed variables such as the Java environment.
FIG. 1b illustrates a garbage collection process, indicated by general reference character 120, used when the null pointer is defined as the ZERO pointer. The process 120 initiates at a `start` terminal 121 and continues to a decision procedure 123 that compares the contents of the pointer variable to ZERO. If the content of the pointer variable is ZERO, the process 120 completes through an `end` terminal 125. Otherwise, the process 120 continues to a `garbage collect reference` procedure 127 that performs prior art garbage collection operations on the contents of the pointer variable (the reference). The process 120 completes through the `end` terminal 125 after the `garbage collect reference` procedure 127 completes.
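The flow of process 120 can be sketched in a few lines. This is an illustrative model only: the class `ZeroNullCollector` and its counters are hypothetical, Java's `null` stands in for the ZERO pointer, and the `garbage collect reference` procedure 127 is stubbed to record the reference it receives rather than perform real collection work.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of process 120 (null pointer defined as the ZERO pointer).
class ZeroNullCollector {
    // References handed to the stubbed collection step.
    final List<Object> collected = new ArrayList<>();
    // Number of times the decision procedure was executed.
    int nullChecks = 0;

    // Applies process 120 to the contents of one pointer variable.
    void process(Object pointerVariable) {
        nullChecks++;
        if (pointerVariable == null) {            // decision procedure 123
            return;                               // `end` terminal 125
        }
        garbageCollectReference(pointerVariable); // procedure 127
    }

    // Stub for the prior art garbage collection operations on a reference.
    void garbageCollectReference(Object reference) {
        collected.add(reference);
    }
}
```

In this sketch the comparison runs once per pointer variable, whether or not the variable holds the null pointer.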
Those skilled in the art will understand that the decision procedure 123 is executed for every pointer variable. Thus, the process 120 is inefficient as compared to a process that omits the decision procedure 123. The decision procedure 123 can be omitted if the null pointer is defined to be a pointer to a NULL object instead of the ZERO pointer. In this situation, a separate NULL object is allocated and a pointer to the NULL object is used to indicate link termination and unassigned variables--thus, obviating the need for the decision procedure 123 and improving the process 120. However, as subsequently discussed, this definition of the null pointer adversely affects the mutator.
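A collector that relies on a NULL object might look like the following sketch. The names `SentinelNode` and `NullObjectCollector`, the single `next` field, and the mark-bit traversal are assumptions made for illustration; the text above does not prescribe a particular collection algorithm. The NULL object is treated as an ordinary node whose own pointer refers back to itself, so the traversal needs no comparison against the null pointer.

```java
// Hypothetical node with one pointer field and a mark bit.
class SentinelNode {
    boolean marked;
    SentinelNode next;
}

class NullObjectCollector {
    // The separately allocated NULL object; its address serves as the null pointer.
    static final SentinelNode NULL_OBJECT = new SentinelNode();

    static {
        // The NULL object's own pointer field refers to itself.
        NULL_OBJECT.next = NULL_OBJECT;
    }

    // Creates a node whose pointer variable is initialized to the null pointer.
    static SentinelNode newNode() {
        SentinelNode node = new SentinelNode();
        node.next = NULL_OBJECT;
        return node;
    }

    // Marks every node on the chain starting at reference.
    // The only test is the usual "already marked" test; there is no null test.
    static void mark(SentinelNode reference) {
        while (!reference.marked) {
            reference.marked = true;
            reference = reference.next;
        }
    }
}
```

The traversal ends when it reaches a node that is already marked, which includes the NULL object once it has been visited.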
The Java environment is an example of an object-oriented programming environment with statically typed variables that provides a facility to raise an exception (the NullPointerException in the Java environment) when a memory reference is attempted to an address region near address zero. Thus, if the null pointer is defined as the ZERO pointer, this facility will capture an attempted access through the null pointer. This allows the mutator to access objects through a pointer without explicitly checking that the pointer is the null pointer. Instead, if the mutator attempts to access an object through the null pointer, the Java environment will intercept the attempted access and raise the NullPointerException. The mutator either will explicitly handle this exception or will terminate. Other programming environments provide similar capabilities.
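The behavior described above can be observed from ordinary Java source. In the following sketch, the classes `Item` and `ImplicitNullCheck` are hypothetical; the mutator code accesses an instance variable without writing any explicit test, and the environment raises the NullPointerException when the reference is null.

```java
// Hypothetical object with one instance variable.
class Item {
    int value = 7;
}

class ImplicitNullCheck {
    // Reads an instance variable without an explicit null test in the source.
    static String read(Item item) {
        try {
            return "value=" + item.value;
        } catch (NullPointerException e) {
            // The mutator explicitly handles the exception instead of terminating.
            return "NullPointerException";
        }
    }
}
```

Calling `ImplicitNullCheck.read(new Item())` returns `value=7`, while `ImplicitNullCheck.read(null)` returns `NullPointerException`.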
FIG. 1c illustrates a mutator's object reference process, indicated by general reference character 140, used to validate object accesses when the address of the NULL object (instead of the ZERO pointer) is defined to be the null pointer. The mutator's object reference process 140 is invoked whenever an object is accessed by the mutator (that is, whenever the mutator performs an operation on an object and whenever the mutator accesses an object's instance variable). The mutator's object reference process 140 initiates at a `start` terminal 141 and continues to a decision procedure 143. The decision procedure 143 compares the address of the object being operated on (the value of the pointer) with the address of the NULL object. If the value of the pointer is not the same as the address of the NULL object, the mutator's object reference process 140 continues to an `operate on reference` procedure 145 that performs the desired operation on the object. Next, the mutator's object reference process 140 completes through an `end` terminal 147. However, if at the decision procedure 143 the address of the object is the same as the address of the NULL object, the mutator's object reference process 140 continues to an `invoke exception` procedure 149. The `invoke exception` procedure 149 invokes a system exception or other operation used to signal that an access has been attempted through a null pointer. Thus, even though the garbage collection process is more efficient, the mutator is less efficient when the address of the NULL object is defined to be the null pointer instead of the ZERO pointer because the decision procedure 143 must be executed at every object reference.
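The explicit check of process 140 can be sketched as follows. The classes `Account`, `NullObjectAccessException` and `CheckedMutator` are hypothetical names chosen for illustration, and the unchecked exception stands in for whatever system exception or other operation an environment would use.

```java
// Hypothetical object type whose NULL object address serves as the null pointer.
class Account {
    static final Account NULL_OBJECT = new Account();
    int balance;
}

// Hypothetical signal that an access was attempted through the null pointer.
class NullObjectAccessException extends RuntimeException {
}

class CheckedMutator {
    // Number of times the decision procedure was executed.
    static int checks = 0;

    // Process 140 applied to one access of an instance variable.
    static int readBalance(Account reference) {
        checks++;
        if (reference == Account.NULL_OBJECT) {     // decision procedure 143
            throw new NullObjectAccessException();  // `invoke exception` procedure 149
        }
        return reference.balance;                   // `operate on reference` procedure 145
    }
}
```

In this sketch every access pays for the comparison, including the common case in which the reference is valid.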
Thus, one skilled in the art will understand that although defining the address of a NULL object to be the null pointer improves the efficiency of the garbage collection process, this definition complicates the mutator's access to objects. Conversely, although using the ZERO pointer as the null pointer improves the efficiency of the mutator, it adversely impacts the garbage collection operation. Thus, the problem is to define a null pointer, and an approach for checking accesses through the null pointer, that are efficient for both the mutator's object accesses and the garbage collection process's accesses.