1. Field of the Invention
The present invention is directed to memory management. It particularly concerns what has come to be known as “garbage collection.”
2. Background Information
In the field of computer systems, considerable effort has been expended on the task of allocating memory to data objects. For the purposes of this discussion, the term object refers to a data structure represented in a computer system's memory. Other terms sometimes used for the same concept are record and structure. An object may be identified by a reference, a relatively small amount of information that can be used to access the object. A reference can be represented as a “pointer” or a “machine address,” which may require, for instance, only sixteen, thirty-two, or sixty-four bits of information, although there are other ways to represent a reference.
In some systems, which are usually known as “object oriented,” objects may have associated methods, which are routines that can be invoked by reference to the object. They also may belong to a class, which is an organizational entity that may contain method code or other information shared by all objects belonging to that class. In the discussion that follows, though, the term object will not be limited to such structures; it will additionally include structures with which methods and classes are not associated.
The invention to be described below is applicable to systems that allocate memory to objects dynamically. Not all systems employ dynamic allocation. In some computer languages, source programs can be so written that all objects to which the program's variables refer are bound to storage locations at compile time. This storage-allocation approach, sometimes referred to as “static allocation,” is the policy traditionally used by the Fortran programming language, for example.
Even for compilers that are thought of as allocating objects only statically, of course, there is often a certain level of abstraction to this binding of objects to storage locations. Consider the typical computer system 10 depicted in FIG. 1, for example. The data and instructions for operating on them that a microprocessor 11 uses may reside in on-board cache memory or be received from further cache memory 12, possibly through the mediation of a cache controller 13. That controller 13 can in turn receive such data from system read/write memory (“RAM”) 14 through a RAM controller 15 or from various peripheral devices through a system bus 16. The memory space made available to an application program may be “virtual” in the sense that it may actually be considerably larger than RAM 14 provides. So the RAM contents will be swapped to and from a system disk 17.
Additionally, the actual physical operations performed to access some of the most-recently visited parts of the process's address space often will actually be performed in the cache 12 or in microprocessor 11's on-board cache rather than on the RAM 14, with which those caches swap data and instructions just as RAM 14 and system disk 17 do with each other.
A further level of abstraction results from the fact that an application will often be run as one of many processes operating concurrently with the support of an underlying operating system. As part of that system's memory management, the application's memory space may be moved among different actual physical locations many times in order to allow different processes to employ shared physical memory devices. That is, a single location specified in the application's machine code may actually result in different physical locations at different times because the operating system adds different offsets to the machine-language-specified location.
Despite these expedients, the use of static memory allocation in writing certain long-lived applications makes it difficult to restrict storage requirements to the available memory space. Abiding by space limitations is easier when the platform provides for dynamic memory allocation, i.e., when the determination of whether memory space will be allocated to a given object is made only at run time.
Dynamic allocation has a number of advantages, among which is that the run-time system is able to adapt allocation to run-time conditions. For example, the programmer can specify that space should be allocated for a given object only in response to a particular run-time condition. Conversely, the programmer can specify conditions under which memory previously allocated to a given object can be reclaimed for reuse. The C-language library functions malloc( ) and free( ) are often used for such run-time allocation and reclamation.
Because dynamic allocation provides for memory reuse, it facilitates generation of large or long-lived applications, which over the course of their lifetimes may employ objects whose total memory requirements would greatly exceed the available memory resources if they were bound to memory locations statically. Particularly for long-lived applications, though, dynamic-memory allocation and reclamation must be performed carefully. If the application fails to reclaim unused memory—or, worse, loses track of the address of a dynamically allocated segment of memory—its memory requirements will grow over time to exceed the system's available memory. This kind of error is known as a “memory leak.”
Another kind of error occurs when an application reclaims memory for reuse even though it still maintains a reference to that memory. If the reclaimed memory is reallocated for a different purpose, the application may inadvertently manipulate the same memory in multiple inconsistent ways. This kind of error is known as a “dangling reference,” because an application should not retain a reference to a memory location once that location is reclaimed.
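By way of illustration only, the two kinds of error just described can be reproduced with a toy allocator. The sketch below is not any real library's interface: the class ToyHeap, its methods, and the stored values are hypothetical names introduced for this example, and the "heap" is simply a small Python list of slots with an explicit free list.

```python
class ToyHeap:
    """Hypothetical fixed-size heap of slots with explicit allocate/free."""

    def __init__(self, size):
        self.slots = [None] * size          # simulated memory cells
        self.free_list = list(range(size))  # addresses available for allocation

    def allocate(self, value):
        if not self.free_list:
            raise MemoryError("toy heap exhausted")
        address = self.free_list.pop()      # most recently freed slot is reused first
        self.slots[address] = value
        return address

    def free(self, address):
        self.free_list.append(address)      # the slot may now be handed out again


# Dangling reference: the program frees a slot but keeps using the old address.
heap = ToyHeap(size=2)
account = heap.allocate("account record")
stale = account                             # a second copy of the reference
heap.free(account)                          # reclaimed while 'stale' still exists
session = heap.allocate("session token")    # the same slot is reallocated
aliased = (stale == session)
value_seen_through_stale = heap.slots[stale]

# Memory leak: the program allocates but never frees, so the heap fills up.
leaky = ToyHeap(size=2)
leaky.allocate("a")
leaky.allocate("b")
try:
    leaky.allocate("c")
    exhausted = False
except MemoryError:
    exhausted = True
```

In this sketch the stale reference and the new allocation name the same slot, so a read through the stale reference observes the unrelated session data, while the leaking program exhausts its two-slot heap on the third allocation.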
Because of human limitations, managing dynamic memory explicitly by using interfaces like malloc( )/free( ) often leads to these problems. Whereas a programmer working on a particular sequence of code can perform his task creditably in most respects with only local knowledge of the program at any given time, memory allocation and reclamation require more-global knowledge. Specifically, a programmer dealing with a given sequence of code does tend to know whether some portion of memory is still in use by that sequence of code. But it is considerably more difficult for him to keep track of what all the rest of the program does with that memory.
A way to reduce the likelihood of such leaks and related errors is to provide memory-space reclamation in a more-automatic manner. Techniques used by systems that reclaim memory space automatically are commonly referred to as “garbage collection.” Garbage collectors operate by reclaiming space that they no longer consider “reachable.” Statically allocated objects represented by a program's global variables are normally considered reachable throughout a program's life. Such objects are not ordinarily stored in the garbage collector's managed memory space, but they may contain references to dynamically allocated objects that are. If so, the thereby-referenced dynamically allocated objects are considered reachable, too. Clearly, an object referred to in the processor's call stack is reachable, as is an object referred to by register contents. And an object referred to by any reachable object is also reachable.
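The notion of reachability just described can be sketched as a simple graph traversal. This is a minimal illustration under stated assumptions, not a description of any particular collector: the function name reachable_from, the dictionary representation of references, and the object names are all hypothetical.

```python
def reachable_from(roots, references):
    """Return the set of object ids reachable from the root set.

    references maps each object id to the ids that the object refers to.
    """
    seen = set()
    pending = list(roots)
    while pending:
        obj = pending.pop()
        if obj in seen:
            continue                      # already visited; also handles cycles
        seen.add(obj)
        pending.extend(references.get(obj, ()))
    return seen


references = {
    "global_table": ["a"],   # statically allocated, refers to a heap object
    "a": ["b"],
    "b": ["a"],              # a cycle among reachable objects
    "stack_local": ["c"],    # referred to from the call stack
    "c": [],
    "d": ["e"],              # d and e refer to each other but no root reaches them
    "e": ["d"],
}
roots = ["global_table", "stack_local"]

live = reachable_from(roots, references)
garbage = set(references) - live
```

Here the objects d and e refer to each other, yet neither is reachable from a root, so both are garbage under the definition given above.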
Automatic garbage collectors obtain the global knowledge required for proper dynamic memory management by tracing references from some collector-appropriate notion of a “root set,” e.g., global variables, registers, and the call stack. By using a garbage collector, the programmer is relieved of the need to worry about the application's global state and can concentrate on (more-manageable) local-state issues. The result is applications that are more robust, having no dangling references and fewer memory leaks.
Garbage-collection mechanisms can be implemented by various parts and levels of a computing system. One approach is simply to provide them as part of a batch compiler's output. Consider FIG. 2's simple batch-compiler operation, for example. A computer system executes in accordance with compiler object code and therefore acts as a compiler 20. The compiler object code is typically stored on a medium such as FIG. 1's system disk 17 or some other machine-readable medium, and it is loaded into RAM 14 to configure the computer system to act as a compiler. In some cases, though, the compiler object code's persistent storage may instead be provided in a server system remote from the machine that performs the compiling. The electrical signals that carry the digital data by which the computer systems exchange that code are exemplary forms of carrier waves transporting the information.
The input to the compiler is the application source code, and the end product of the compiler process is application object code. This object code defines an application 21, which typically operates on input such as files, mouse clicks, etc., to generate a display or some other type of output. This object code implements the relationship that the programmer intends to specify by his application source code. In one approach to garbage collection, the compiler 20, without the programmer's explicit direction, additionally generates code that automatically reclaims memory space containing unreachable objects.
Even in this simple case, though, there is a sense in which the application does not itself provide the entire garbage collector. Specifically, the application will typically call upon the underlying operating system's memory-allocation functions. And the operating system may in turn take advantage of various hardware that lends itself particularly to use in garbage collection. So even a very simple system may disperse the garbage-collection mechanism over a number of computer-system layers.
To get some sense of the variety of system components that can be used to implement garbage collection, consider FIG. 3's example of a more complex way in which various levels of source code can result in the machine instructions that a processor executes. In the FIG. 3 arrangement, the human applications programmer produces source code 22 written in a high-level language. A compiler 23 typically converts that code into “class files.” These files include routines written in instructions, called “byte codes” 24, for a “virtual machine” that various processors can be configured to emulate. This conversion into byte codes is almost always separated in time from those codes' execution, so FIG. 3 divides the sequence into a “compile-time environment” 25 separate from a “run-time environment” 26, in which execution occurs. One example of a high-level language for which compilers are available to produce such virtual-machine instructions is the Java™ programming language. (Java is a trademark or registered trademark of Sun Microsystems, Inc., in the United States and other countries.)
Most typically, the class files' byte-code routines are executed by a processor under control of a virtual-machine process 27. That process emulates a virtual machine from whose instruction set the byte codes are drawn. As is true of the compiler 23, the virtual-machine process 27 may be specified by code stored on a local disk or some other machine-readable medium from which it is read into FIG. 1's RAM 14 to configure the computer system to implement the garbage collector and otherwise act as a virtual machine. Again, though, that code's persistent storage may instead be provided by a server system remote from the processor that implements the virtual machine, in which case the code would be transmitted electrically or optically to the virtual-machine-implementing processor.
In some implementations, much of the virtual machine's action in executing these byte codes is most like what those skilled in the art refer to as “interpreting,” so FIG. 3 depicts the virtual machine as including an “interpreter” 28 for that purpose. In addition to or instead of running an interpreter, many virtual-machine implementations actually compile the byte codes concurrently with the resultant object code's execution, so FIG. 3 depicts the virtual machine as additionally including a “just-in-time” compiler 29. The arrangement of FIG. 3 differs from FIG. 2 in that the compiler 23 for converting the human programmer's code does not contribute to providing the garbage-collection function; that results largely from the virtual machine 27's operation.
Those skilled in the art will recognize that both of these organizations are merely exemplary, and many modern systems employ hybrid mechanisms, which partake of the characteristics of both traditional compilers and traditional interpreters. The invention to be described below is applicable independently of whether a batch compiler, a just-in-time compiler, an interpreter, or some hybrid is employed to process source code. In the remainder of this application, therefore, we will use the term compiler to refer to any such mechanism, even if it is what would more typically be called an interpreter.
Now, some of the functionality that source-language constructs specify can be quite complicated, requiring many machine-language instructions for their implementation. One quite-common example is a source-language instruction that calls for 64-bit arithmetic on a 32-bit machine. More germane to the present invention is the operation of dynamically allocating space to a new object; this may require determining whether enough free memory space is available to contain the new object and reclaiming space if there is not.
In such situations, the compiler may produce “inline” code to accomplish these operations. That is, all object-code instructions for carrying out a given source-code-prescribed operation will be repeated each time the source code calls for the operation. But inlining runs the risk that “code bloat” will result if the operation is invoked at many source-code locations.
The natural way of avoiding this result is instead to provide the operation's implementation as a procedure, i.e., a single code sequence that can be called from any location in the program. In the case of compilers, a collection of procedures for implementing many types of source-code-specified operations is called a runtime system for the language. The compiler and its runtime system are designed together so that the compiler “knows” what runtime-system procedures are available in the target computer system and can cause desired operations simply by including calls to procedures that the target system already contains. To represent this fact, FIG. 3 includes block 30 to show that the compiler's output makes calls to the runtime system as well as to the operating system 31, which consists of procedures that are similarly system resident but are not compiler-dependent.
Although the FIG. 3 arrangement is a popular one, it is by no means universal, and many further implementation types can be expected. Proposals have even been made to implement the virtual machine 27's behavior in a hardware processor, in which case the hardware itself would provide some or all of the garbage-collection function. In short, garbage collectors can be implemented in a wide range of combinations of hardware and/or software.
The invention to be described below is applicable to most such systems, so long as they are of the “generational” variety. Generational garbage collection is distinguished by the fact that the “heap” of dynamically allocable memory is divided into a “young generation” and one or more “old generations.” Most if not all objects are allocated initially in the young generation. If an object remains reachable after some period of time, it is “promoted” into an old generation, which is managed differently from the young generation. To understand the reason for dividing the heap into generations, it helps to recall certain garbage-collection basics, which we will now review.
Garbage collection involves searching through the heap for objects to which there are no chains of references from the root set. When the collector finds such objects, it returns the memory space that they occupy to its list of free space available for new-object allocation. A measure of a collector's efficiency is the amount of memory space it can thereby free in a given amount of collector execution time.
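As a rough sketch of the operation just described, the following example marks the objects reachable from the root set and returns the space occupied by the remainder to a free list. The function name collect, the heap layout (an address mapped to a size and a list of referenced addresses), and the sizes used are assumptions made for this illustration only.

```python
def collect(heap, roots, free_list):
    """Mark objects reachable from roots, then sweep the rest onto free_list.

    heap maps an address to a (size_in_bytes, referenced_addresses) pair.
    Returns the number of bytes reclaimed.
    """
    marked = set()
    pending = list(roots)
    while pending:
        address = pending.pop()
        if address not in marked:
            marked.add(address)
            pending.extend(heap[address][1])

    reclaimed = 0
    for address in sorted(set(heap) - marked):
        size, _ = heap.pop(address)          # the object no longer exists
        free_list.append((address, size))    # its space is available for reuse
        reclaimed += size
    return reclaimed


heap = {
    0: (16, [32]),     # referred to by a root
    16: (16, [48]),    # unreachable
    32: (16, []),      # reachable through the object at address 0
    48: (32, [16]),    # unreachable; forms a cycle with the object at 16
}
free_list = []
reclaimed = collect(heap, roots=[0], free_list=free_list)
```

Dividing the returned byte count by the time the call takes would give the efficiency measure mentioned above, namely space freed per unit of collector execution time.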
Now, it has been observed empirically that in most programs most dynamically allocated objects are used for only a short time. An implication of this fact for garbage collectors is that they will tend to be more productive if they spend their time disproportionately in parts of the heap where objects have been allocated most recently. If a collector treats a portion of the heap as a young generation, in which it allocates most or all new objects, then performing its activity disproportionately in the younger generation will tend to yield greater collector efficiency and thus greater efficiency of the application that employs the collector's services.
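The division into generations and the promotion of surviving objects can be illustrated with the following sketch of a single young-generation collection. It is a simplified model, not a definitive implementation: the promotion threshold PROMOTION_AGE, the function minor_collection, and the object names are hypothetical, and the sketch assumes that no old-generation object refers to a young-generation object.

```python
PROMOTION_AGE = 2  # hypothetical: collections an object must survive before promotion


def minor_collection(young, old, roots):
    """Collect only the young generation and promote long-lived survivors.

    young and old map an object id to {"refs": [...], "age": int}.
    Simplification: assumes no old-generation object refers to a young one.
    Returns the ids reclaimed from the young generation.
    """
    live = set()
    pending = [r for r in roots if r in young]
    while pending:
        obj = pending.pop()
        if obj in live:
            continue
        live.add(obj)
        pending.extend(r for r in young[obj]["refs"] if r in young)

    reclaimed = sorted(set(young) - live)
    for obj in reclaimed:
        del young[obj]                       # short-lived objects die young
    for obj in sorted(live):
        young[obj]["age"] += 1
        if young[obj]["age"] >= PROMOTION_AGE:
            old[obj] = young.pop(obj)        # promote into the old generation
    return reclaimed


young = {
    "cache": {"refs": [], "age": 1},         # has already survived one collection
    "temp1": {"refs": ["temp2"], "age": 0},  # no longer referred to by any root
    "temp2": {"refs": [], "age": 0},
    "request": {"refs": [], "age": 0},
}
old = {}
reclaimed = minor_collection(young, old, roots=["cache", "request"])
```

In this example the two temporary objects are reclaimed, the object that has now survived two collections is promoted, and the collector never examines the old generation at all.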
Further advantages follow from the fact that write operations in most applications tend to occur disproportionately in the fields of newly allocated objects. To appreciate those advantages, it is necessary to understand a further typical feature of generational garbage collectors, namely, the use of so-called remembered sets. We now turn to a discussion of that feature.
A typical generational garbage collector's young generation occupies only a fairly small proportion of the entire heap. The collector searches that portion very frequently for garbage. To do so, it typically interrupts the mutator, i.e., the non-garbage-collector part of the application. Also typically, a large number of these “minor cycles” of young-generation collecting occur for each “major cycle” of older-generation collecting.
Although in a minor cycle the collector may scan only the relatively small young generation for reachable objects, the number (and lengths) of reference chains leading from the root set can be large, with many extending into the (typically large) old generation. Usually, though, very few of the chains that include references in the old generation lead to objects in the young generation. Instead of following reference chains into the old generation, therefore, the collector will typically rely on “remembered sets.” A remembered set is a data structure that is associated with a generation (in this case, the young generation) or portion thereof and that lists locations in the heap where references to objects in that generation or portion thereof may be found. Instead of following all reference chains into the old generation, the garbage collector can restrict its attention to only the regions that the remembered set's entries identify. If a young-generation object is not referred to by a reference thus found, the collector can conclude that the object is not reachable through the old generation, so it is garbage if it is not reachable directly from a root or through a reference chain from a root that involves only young-generation objects. The number of such references in the old generation tends to be small, not only because the young generation itself is small but also because older objects tend not to refer to younger ones.
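The use of a remembered set during a minor cycle can be sketched as follows. The representation is an assumption made for illustration: each remembered-set entry is a hypothetical (object id, field name) pair naming an old-generation location, and the function young_survivors and all object names are invented for this example.

```python
def young_survivors(young_refs, roots, remembered_set, old_fields):
    """Find live young objects without tracing through the old generation.

    young_refs      maps a young object id to the ids it refers to.
    roots           ids referred to by globals, registers, and the call stack.
    remembered_set  old-generation locations, as (object id, field) pairs, that
                    may hold references into the young generation.
    old_fields      maps such a location to the id currently stored there.
    """
    pending = [r for r in roots if r in young_refs]
    for location in remembered_set:
        target = old_fields.get(location)
        if target in young_refs:             # entries only *may* point to young objects
            pending.append(target)

    live = set()
    while pending:
        obj = pending.pop()
        if obj not in live:
            live.add(obj)
            pending.extend(r for r in young_refs[obj] if r in young_refs)
    return live


young_refs = {"y1": ["y2"], "y2": [], "y3": [], "y4": []}
old_fields = {
    ("o1", "next"): "y3",   # an old object referring to a young one
    ("o2", "left"): "o1",   # entry whose field now holds an old-generation reference
}
remembered_set = {("o1", "next"), ("o2", "left")}

live = young_survivors(young_refs, ["y1"], remembered_set, old_fields)
garbage = set(young_refs) - live
```

The object y3 survives only because a remembered-set entry leads to it, whereas y4 is referred to neither by a root nor by any remembered location and is therefore garbage.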
To enable the collector to maintain remembered sets, the mutator must communicate with the garbage collector to tell it when the mutator has modified a heap-located reference that potentially identifies a heap-allocated object. The mechanism used for this purpose is known as a “write barrier.” Whenever source code calls for a reference to be written, the compiler will insert into the object code additional instructions that record the fact of that operation in a data structure to which the garbage collector will later refer. Specifically, the garbage collector refers to that data structure during the garbage-collection cycle in order to update the remembered sets. Since in most programs such write operations take place primarily in the young generation, not many entries need to be made in the data structure that keeps track of which old-generation objects have been “dirtied” by a reference write. So a minor-cycle collection can be performed efficiently.
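One common way to realize such a write barrier is "card marking," which the following sketch uses purely as an illustrative assumption; the text above does not prescribe it. The card size, the address boundary between generations, and the names Heap, write_reference, and update_remembered_set are all hypothetical. Addresses below old_start are taken to lie in the young generation.

```python
CARD_SIZE = 512  # hypothetical card size in bytes


class Heap:
    def __init__(self, old_start):
        self.memory = {}             # address of a reference field -> target address
        self.old_start = old_start   # addresses >= old_start are old-generation
        self.dirty_cards = set()     # cards containing a recently written reference

    def write_reference(self, address, target):
        self.memory[address] = target            # the store the source code asked for
        # Write barrier: the extra instructions the compiler would insert.
        if address >= self.old_start:
            self.dirty_cards.add(address // CARD_SIZE)


def update_remembered_set(heap, remembered_set):
    """Collector side: scan dirty cards and refresh the remembered set."""
    for card in sorted(heap.dirty_cards):
        start = card * CARD_SIZE
        for address in range(start, start + CARD_SIZE):
            target = heap.memory.get(address)
            if target is not None and target < heap.old_start:
                remembered_set.add(address)      # old location refers to a young object
            else:
                remembered_set.discard(address)
    heap.dirty_cards.clear()


heap = Heap(old_start=4096)
heap.write_reference(64, 128)      # young -> young: nothing recorded
heap.write_reference(4100, 64)     # old -> young: dirties card 8
heap.write_reference(4104, 5000)   # old -> old: same card, no remembered entry
heap.write_reference(8200, 4100)   # old -> old: dirties card 16

dirty_before = set(heap.dirty_cards)
remembered_set = set()
update_remembered_set(heap, remembered_set)
```

Of the four writes, only two cards are dirtied and only one location ends up in the remembered set, which reflects the observation that few old-generation locations need to be examined in a minor cycle.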
But the write barriers that make it possible to maintain such data structures exact a performance penalty: reference writing requires that the mutator execute more instructions than it would have to if it did not need to communicate with the collector. To a degree, therefore, obtaining such collector efficiency exacts a mutator-performance cost.