1. Field of the Invention
The present invention concerns computer-program compiling and in particular to what has in that discipline come to be referred to as “garbage collection.”
2. Background Information
Garbage collection is the term that has come to be used for the operations by which data objects that a program will no longer use are recognized so that the computer memory occupied by those objects can be reclaimed for reuse. For the purposes of this discussion, the term object refers to a data structure represented in a computer system's memory. Other terms sometimes used for the same concept are record and structure. An object may be identified by a reference, a relatively small amount of information that can be used to access the object. A reference can be represented as a “pointer” or a “machine address,” which may require, for instance, only sixteen, thirty-two, or sixty-four bits of information, although there are other ways to represent a reference.
In some systems, which are usually known as “object oriented,” objects may have associated methods, which are routines that can be invoked by reference to the object. An object also may belong to a class, which is an organizational entity that may contain method code or other information shared by all objects belonging to that class. In the discussion that follows, though, the term object will not be limited to such structures; it will additionally include structures with which methods and classes are not associated.
Garbage collection is used almost exclusively in environments in which memory can be allocated to some objects dynamically. Not all systems employ dynamic allocation. In some computer languages, source programs must be se written so that all objects to which the program's variables refer are bound to storage locations at compile time. This storage-allocation approach, sometimes referred to as “static allocation,” is the policy traditionally used by the Fortran programming language, for example.
Even for compilers that are thought of as allocating objects only statically, of course, there is often a certain level of abstraction to this binding of objects to storage locations. Consider the typical computer system 10 depicted in FIG. 1, for example. Data that a microprocessor 11 uses and instructions for operating on them may reside in onboard cache memory or be received from further cache memory 12, possibly through the mediation of a cache controller 13. That controller 13 can in turn receive such data from system read/write memory (“RAM”) 14 through a RAM controller 15 or from various peripheral devices through a system bus 16. The memory space made available to an application program may be “virtual” in the sense that it may actually be considerably larger than RAM 14 provides. So the RAM contents will be swapped to and from a system disk 17.
Additionally, the actual physical operations performed to access some of the most-recently visited parts of the process's address space often will actually be performed in the cache 12 or in a cache on board microprocessor 11 rather than on the RAM 14. Those caches would swap data and instructions with the RAM 14 just as RAM 14 and system disk 17 do with each other.
A further level of abstraction results from the fact that an application will often be run as one of many processes operating concurrently with the support of an underlying operating system. As part of that system's memory management, the application's memory space may be moved among different actual physical locations many times in order to allow different processes to employ shared physical memory devices. That is, the location specified in the application's machine code may actually result in different physical locations at different times because the operating system adds different offsets to the machine-language-specified location.
Some computer systems may employ a plurality of processors so that different processes' executions actually do occur simultaneously. Such systems come in a wide variety of configurations. Some may be largely the same as that of FIG. 1 with the exception that they include more than one microprocessor such as processor 11, possibly together with respective cache memories, sharing common read/write memory by communication over the common bus 16.
In other configurations, parts of the shared memory may be more local to one or more processors than to others. In such a situation, one or the other or both of the processors may need to fetch code or data or both from a remote location, but it will often be true that parts of the code will be replicated in both places.
Despite these expedients, the use of static memory allocation in writing certain long-lived applications makes it difficult to restrict storage requirements to the available memory space. Abiding by space limitations is easier when the platform provides for dynamic memory allocation, i.e., when the platform enables allocation of memory space to be delayed until after the program has been loaded and is already running.
Dynamic allocation has a number of advantages, among which is that the run-time system is able to adapt allocation to run-time conditions; for given objects the programmer can specify respective conditions on which space should be allocated to them. The C-language library function malloc( ) is often used for this purpose. Conversely, the programmer can specify conditions under which memory previously allocated to a given object can be reclaimed for reuse. The C-language library function free( ) results in such memory reclamation.
Because dynamic allocation provides for memory reuse, it facilitates generation of large or long-lived applications, which over the course of their lifetimes may employ objects whose total memory requirements would greatly exceed the available memory resources if they were bound to memory locations statically.
Particularly for long-lived applications, though, allocation and reclamation of dynamic memory must be performed carefully. If the application fails to reclaim unused memory—or, worse, loses track of the address of a dynamically allocated segment of memory—its memory requirements will grow over time to exceed the system's available memory. This kind of error is known as a “memory leak.” Another kind of error occurs when an application reclaims memory for reuse even though it still maintains a reference to that memory. If the reclaimed memory is reallocated for a different purpose, the application may inadvertently manipulate the same memory in multiple inconsistent ways. This kind of error is known as a “dangling reference,” because an application should not retain a reference to a memory location once that location is reclaimed. Explicitly managing dynamic memory by using interfaces like malloc( )/free( ) often leads to these problems.
Such leaks and related errors can be made less likely by reclaiming memory space more automatically. As was mentioned above, the software and/or hardware used for this purpose is typically referred to as a garbage collector. Garbage collectors operate by inspecting the running program's current state, determining from that state whether it can decide that there are some objects that the program can no longer reach, and reclaiming objects thus found not to be reachable. The criteria that garbage collectors use for this purpose vary, but, for example, a program's global variables are normally considered reachable throughout a program's life. Although they are not ordinarily stored in the memory space that the garbage collector manages, they may contain references to dynamically allocated objects that are, and the garbage collector will consider such objects reachable. It will typically also consider an object reachable if it is referred to by a reference in a register or a thread's call stack. And reachability is contagious: if a reachable object refers to another object, that other object is reachable, too.
It is advantageous to use garbage collectors because, whereas a programmer working on a particular sequence of code can perform his task creditably in most respects with only local knowledge of the application, memory allocation and reclamation tend to require more-global knowledge. A programmer dealing with a small subroutine, for example, may well be able to identify the point in the subroutine beyond which the routine has finished with a given memory portion, but knowing whether the application as a whole will be finished with it at that point is often much more difficult. In contrast, garbage collectors typically work by tracing references from some conservative notion of a “root set,” e.g., global variables, registers, and the call stack: they thereby obtain reachability information methodically. By using a garbage collector, the programmer is relieved of the need to worry about the application's global state and can concentrate on (more-manageable) local-state issues. The result is applications that are more robust, having no dangling references and fewer memory leaks.
Garbage-collection mechanisms can be implemented by various parts and levels of a computing system. One approach is simply to provide them as part of a batch compiler's output. Consider FIG. 2's simple batch-compiler operation, for example. A computer system executes in accordance with compiler object code and therefore acts as a compiler 20. The compiler object code is typically stored on a medium such as FIG. 1's system disk 17 or some other machine-readable medium, and it is loaded into RAM 14 to configure the computer system to act as a compiler. In some cases, though, the compiler object code's persistent storage may instead be provided in a server system remote from the machine that performs the compiling. The electrical signals that typically carry the digital data by which the computer systems exchange that code are examples of the kinds of electromagnetic signals by which the computer instructions can be communicated. Others are radio waves, microwaves, and both visible and invisible light.
The input to the compiler is the application source code, and the end product of the compiler process is application object code. This object code defines an application 21, which typically operates on input such as mouse clicks, etc., to generate a display or some other type of output. This object code implements the relationship that the programmer intends to specify by his application source code. In one approach to garbage collection, the compiler 20, without the programmer's explicit direction, additionally generates code that automatically reclaims unreachable memory space.
Even in this simple case, though, there is a sense in which the application does not itself provide the entire garbage collector. Specifically, the application will typically call upon the underlying operating system's memory-allocation functions. And the operating system may in turn take advantage of hardware that lends itself particularly to use in garbage collection. So even a very simple system may disperse the garbage-collection mechanism over a number of computer-system layers.
To get some sense of the variety of system components that can be used to implement garbage collection, consider FIG. 3's example of a more complex way in which various levels of source code can result in the machine instructions that a processor executes. In the FIG. 3 arrangement, the human applications programmer produces source code 22 written in a high-level language. A compiler 23 typically converts that code into “class files.” These files include routines written in instructions, called “byte code” 24, for a “virtual machine” that various processors can be software-configured to emulate. This conversion into byte code is almost always separated in time from that code's execution, so FIG. 3 divides the sequence into a “compile-time environment” 25 separate from a “run-time environment” 26, in which execution occurs. One example of a high-level language for which compilers are available to produce such virtual-machine instructions is the Java™ programming language. (Java is a trademark or registered trademark of Sun Microsystems, Inc., in the United States and other countries.)
Most typically, the class files' byte-code routines are executed by a processor under control of a virtual-machine process 27. That process emulates a virtual machine from whose instruction set the byte code is drawn. As is true of the compiler 23, the virtual-machine process 27 may be specified by code stored on a local disk or some other machine-readable medium from which it is read into FIG. 1's RAM 14 to configure the computer system to implement the garbage collector and otherwise act as a virtual machine. Again, though, that code's persistent storage may instead be provided by a server system remote from the processor that implements the virtual machine, in which case the code would be transmitted by electromagnetic signals to the virtual-machine-implementing processor.
In some implementations, much of the virtual machine's action in executing these byte codes is most like what those skilled in the art refer to as “interpreting,” so FIG. 3 depicts the virtual machine as including an “interpreter” 28 for that purpose. In addition to or instead of running an interpreter, many virtual-machine implementations actually compile the byte codes concurrently with the resultant object code's execution, so FIG. 3 depicts the virtual machine as additionally including a “just-in-time” compiler 29.
The resultant instructions typically invoke calls to a run-time system 30, which handles matters such as loading new class files as they are needed and includes much of garbage-collector implementation. The run-time system will typically call on the services of an underlying operating system 31. Among the differences between the arrangements of FIGS. 2 and 3 in that FIG. 3's compiler 23 for converting the human programmer's code does not contribute to providing the garbage-collection function; that results largely from the virtual machine 27's operation.
Independently of the specific system-architecture levels at which the collector resides, garbage collection usually includes some amount of reference tracing to determine whether objects are at least potentially reachable; if they are not potentially reachable, they are garbage, and their memory space can therefore be reclaimed. The most straightforward and accurate way to do the tracing is to start at the root set and scan recursively for referred-to objects until all have been identified. If an object is not encountered in that process, it is not reachable, even if it is referred to by a reference in some other object.
Now, it would be unacceptable in many applications to have the mutator pause while the collector traces references through the whole heap. So some garbage collectors operate “space-incrementally.” In each successive collection increment, which typically (but not necessarily) is all performed during a single pause in mutator execution, the collector considers a small portion, or collection set, of the heap. For respective regions of the heap into which the collector treats the heap as divided, the collector keeps track of the references that refer directly to objects in those regions. A collection-set object can be recognized as unreachable if no reference chain from such a reference includes it. This approach can often limit the lengths of collection pauses adequately.
Even in space-incremental approaches, though, it is desirable to reduce collection-pause lengths as much as possible, and one way of doing so involves the way in which the system updates “remembered sets.” For each of a number of heap regions, the system keeps a so-called remembered set of the locations where references to objects in those regions have been observed. To enable the collector to keep remembered sets complete, mutators written for use with space-incremental collectors typically include so-called write barriers. A write barrier is code added to a reference-writing operation to notify the collector of that operation. Write barriers of the type relevant here do so by recording information from which the collector can determine where references were written, although they may additionally record other types of information about the operation, such as the reference value written or replaced. Before it uses the remembered sets to determine collection-set objects' potential reachability, the collector typically updates them, and in doing so it can limit its attention to locations identified by the write barriers as possibly modified since the last collection interval.
To this end, the write barrier can simply add to a list, commonly called a sequential-store buffer, of locations where reference mutation has occurred. An alternative, which FIG. 4 illustrates, is for the write barrier instead to employ what is often referred to as a “card table.” FIG. 4 depicts a garbage-collected heap 40. Of that heap a portion 42, which for reasons that will become apparent will be referred to as the “mature generation,” is shown as divided into sections known for this purpose as “cards” such as card 46. Card size is a design choice, but it is usually such that a plurality fit into each region, such as region 48, with which a separate remembered set is associated. The collector maintains a card table 50 containing an entry for each card. When the mutator writes a reference, such as reference 52 in card 46, it makes the entry 54 associated with that card (or, say, with the card in which the object containing the reference begins) indicate that the card is “dirty,” i.e., that the card has had one or more of its references modified.
As was stated above, a collector updating remembered sets can restrict its attention to references located in dirty cards. Some collectors use this approach uniformly. To indicate that not all do, though, FIG. 4 shows only the mature generation as divided into cards. A typical approach to garbage collection is to place most newly allocated objects in a small heap portion, or “young generation.” Since newly allocated objects have a greater tendency to become unreachable than objects that have already remained reachable for some time, the collector collects the young generation frequently, and it copies (“promotes”) objects into the mature generation when they have survived a certain number of young-generation collections. In the common arrangement in which the collector collects the entire young generation just before every increment of mature-generation collection, it does not need write-barrier results to find references in the young generation, because that part of the heap is being completely scanned as part of the young-generation collection operation. Also, although the collector does need to keep track of references in the rest of the heap that refer to young-generation objects in the young generation, it may for that purpose employ a type of remembered set for the young generation that differs from the type it uses for mature-generation references to mature-generation objects.
In any event, even though the collector restricts its attention to references located in dirty cards, the amount of time required to update remembered sets can be significant. So, to limit the portion of the collection-pause length taken up by updating, some collectors employ one or more remembered-set-updating threads that execute concurrently with the mutator. Usually, some remembered-set-updating activity remains to be done when the collection pause occurs, but much of it will typically have been performed already, concurrently with the mutator's execution, so the pause's duration is less than it would have been without the concurrent remembered-set-updating.
Although concurrent remembered-set-updating thus reduces the remembered-set-updating portion of a pause's duration, it nonetheless exacts its own cost. If the same reference is modified more than once between collection pauses, for example, the number of times the remembered set needs to be updated in response to a given reference location is greater than it would be if updating were limited to collection pauses. Moreover, the resultant remembered-set size can be greater, so the time required to scan the thereby-listed locations for references to collection-set objects can be greater.