1. Field of the Invention
The present invention relates to computer-program compiling and in particular to what has in that discipline come to be referred to as “garbage collection.”
2. Background Information
Garbage collection is the term that has come to be used for the operations by which data objects that a program will no longer use are recognized so that the computer memory occupied by those objects can be reclaimed for reuse. For the purposes of this discussion, the term object refers to a data structure represented in a computer system's memory. Other terms sometimes used for the same concept are record and structure. An object may be identified by a reference, a relatively small amount of information that can be used to access the object. A reference can be represented as a “pointer” or a “machine address,” which may require, for instance, only sixteen, thirty-two, or sixty-four bits of information, although there are other ways to represent a reference.
In some systems, which are usually known as “object oriented,” objects may have associated methods, which are routines that can be invoked by reference to the object. An object also may belong to a class, which is an organizational entity that may contain method code or other information shared by all objects belonging to that class. In the discussion that follows, though, the term object will not be limited to such structures; it will additionally include structures with which methods and classes are not associated.
Garbage collection is used almost exclusively in environments in which memory can be allocated to some objects dynamically. Not all systems employ dynamic allocation. In some computer languages, source programs must be so written that all objects to which the program's variables refer are bound to storage locations at compile time. This storage-allocation approach, sometimes referred to as “static allocation,” is the policy traditionally used by the Fortran programming language, for example.
Even for compilers that are thought of as allocating objects only statically, of course, there is often a certain level of abstraction to this binding of objects to storage locations. Consider the typical computer system 10 depicted in FIG. 1, for example. Data that a microprocessor 11 uses and instructions for operating on them may reside in onboard cache memory or be received from further cache memory 12, possibly through the mediation of a cache controller 13. That controller 13 can in turn receive such data from system read/write memory (“RAM”) 14 through a RAM controller 15 or from various peripheral devices through a system bus 16. The memory space made available to an application program may be “virtual” in the sense that it may actually be considerably larger than RAM 14 provides, so the RAM contents are swapped to and from a system disk 17 as needed.
Additionally, the physical operations that access the most-recently visited parts of the process's address space will often be performed in the cache 12 or in a cache on board microprocessor 11 rather than in the RAM 14. Those caches swap data and instructions with the RAM 14 just as RAM 14 and system disk 17 do with each other.
A further level of abstraction results from the fact that an application will often be run as one of many processes operating concurrently with the support of an underlying operating system. As part of that system's memory management, the application's memory space may be moved among different actual physical locations many times in order to allow different processes to employ shared physical memory devices. That is, the location specified in the application's machine code may actually result in different physical locations at different times because the operating system adds different offsets to the machine-language-specified location.
Some computer systems may employ a plurality of processors so that different processes' executions actually do occur simultaneously. Such systems come in a wide variety of configurations. Some may be largely the same as that of FIG. 1 with the exception that they include more than one microprocessor such as processor 11, possibly together with respective cache memories, sharing common read/write memory by communication over the common bus 16.
In other configurations, parts of the shared memory may be more local to one or more processors than to others. In FIG. 2, for instance, one or more microprocessors 20 at a location 22 may have access both to a local memory module 24 and to a further, remote memory module 26, which is provided at a remote location 28. Because of the greater distance, though, port circuitry 29 and 30 may be necessary to communicate at the lower speed to which an intervening channel 32 is limited. A processor 34 at the remote location may similarly have different-speed access to both memory modules 24 and 26. In such a situation, one or the other or both of the processors may need to fetch code or data or both from a remote location, but it will often be true that parts of the code will be replicated in both places.
Despite these expedients, the use of static memory allocation in writing certain long-lived applications makes it difficult to restrict storage requirements to the available memory space. Abiding by space limitations is easier when the platform provides for dynamic memory allocation, i.e., when the platform enables allocation of memory space to be delayed until after the program has been loaded and is already running.
Dynamic allocation has a number of advantages, among which is that the run-time system is able to adapt allocation to run-time conditions; for given objects the programmer can specify respective conditions on which space should be allocated to them. The C-language library function malloc( ) is often used for this purpose. Conversely, the programmer can specify conditions under which memory previously allocated to a given object can be reclaimed for reuse. The C-language library function free( ) results in such memory reclamation.
Because dynamic allocation provides for memory reuse, it facilitates generation of large or long-lived applications, which over the course of their lifetimes may employ objects whose total memory requirements would greatly exceed the available memory resources if they were bound to memory locations statically.
Particularly for long-lived applications, though, allocation and reclamation of dynamic memory must be performed carefully. If the application fails to reclaim unused memory—or, worse, loses track of the address of a dynamically allocated segment of memory—its memory requirements will grow over time to exceed the system's available memory. This kind of error is known as a “memory leak.” Another kind of error occurs when an application reclaims memory for reuse even though it still maintains a reference to that memory. If the reclaimed memory is reallocated for a different purpose, the application may inadvertently manipulate the same memory in multiple inconsistent ways. This kind of error is known as a “dangling reference,” because an application should not retain a reference to a memory location once that location is reclaimed. Explicitly managing dynamic memory by using interfaces like malloc( )/free( ) often leads to these problems.
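Both failure modes can be reproduced in miniature. The following Python sketch assumes a hypothetical `ToyHeap` whose "addresses" are slot indices and whose `malloc` and `free` methods merely imitate the C-library functions of the same names; it illustrates the two errors, not the workings of any real allocator.

```python
class OutOfMemory(Exception):
    pass


class ToyHeap:
    """Hypothetical fixed-size heap; an 'address' is just a slot index."""

    def __init__(self, n_slots):
        self.cells = [None] * n_slots
        # pop() takes from the end, so slot 0 is handed out first.
        self.free_list = list(range(n_slots - 1, -1, -1))

    def malloc(self):
        if not self.free_list:
            raise OutOfMemory("no free slots")
        return self.free_list.pop()

    def free(self, addr):
        self.cells[addr] = None
        self.free_list.append(addr)


heap = ToyHeap(4)

# Dangling reference: 'a' is still used after its slot has been reclaimed.
a = heap.malloc()
heap.cells[a] = "account balance"
heap.free(a)
b = heap.malloc()            # the allocator reuses the slot just freed
heap.cells[b] = "user name"
heap.cells[a] = "oops"       # a write through the stale reference...
seen_by_b = heap.cells[b]    # ...silently changes b's object
print(seen_by_b)             # prints "oops"

# Memory leak: addresses are dropped without ever being freed.
heap.free(b)
leaked = 0
try:
    while True:
        heap.malloc()        # the returned address is discarded, so it can never be freed
        leaked += 1
except OutOfMemory as exc:
    print(f"heap exhausted after {leaked} leaked allocations: {exc}")
```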
Such leaks and related errors can be made less likely by reclaiming memory space more automatically. As was mentioned above, the software and/or hardware used for this purpose is typically referred to as a garbage collector. Garbage collectors operate by inspecting the running program's current state, determining from that state whether there are objects that the program can no longer reach, and reclaiming objects thus found not to be reachable. The criteria that garbage collectors use for this purpose vary, but, for example, a program's global variables are normally considered reachable throughout a program's life. Although they are not ordinarily stored in the memory space that the garbage collector manages, they may contain references to dynamically allocated objects that are, and the garbage collector will consider such objects reachable. It will typically also consider an object reachable if it is referred to by a reference in a register or a thread's call stack. And reachability is contagious: if a reachable object refers to another object, that other object is reachable, too.
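The root-set rule and the contagion of reachability together amount to a graph traversal. In this minimal Python sketch, the heap is assumed to be a dictionary that maps hypothetical object identifiers to the identifiers they refer to, and the root set is the union of the references held in globals, registers, and call-stack frames; a real collector reads these from machine state rather than from Python lists.

```python
from collections import deque


def reachable(heap, roots):
    """Return the set of heap objects reachable from the root references.

    heap  -- maps each object id to the ids of the objects it refers to
    roots -- object ids held in globals, registers, or call-stack slots
    """
    seen = set()
    work = deque(r for r in roots if r in heap)
    while work:
        obj = work.popleft()
        if obj in seen:
            continue
        seen.add(obj)
        # Reachability is contagious: whatever obj refers to is reachable too.
        work.extend(ref for ref in heap[obj] if ref not in seen)
    return seen


heap = {
    "A": ["B"],   # A -> B
    "B": ["C"],   # B -> C
    "C": [],
    "D": ["A"],   # D refers to A, but nothing refers to D
    "E": ["E"],   # a self-referencing cycle that nothing else refers to
}
globals_ = ["A"]
registers = []
stack_frames = [["C"]]

roots = globals_ + registers + [ref for frame in stack_frames for ref in frame]
live = reachable(heap, roots)
print(sorted(live))              # ['A', 'B', 'C']
print(sorted(set(heap) - live))  # ['D', 'E']: their memory can be reclaimed
```

Note that D is garbage even though it refers to a live object, and E is garbage even though a reference to it exists; only references reachable from the roots count.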
It is advantageous to use garbage collectors because, whereas a programmer working on a particular sequence of code can perform his task creditably in most respects with only local knowledge of the application, memory allocation and reclamation tend to require more-global knowledge. A programmer dealing with a small subroutine, for example, may well be able to identify the point in the subroutine beyond which the routine has finished with a given memory portion, but knowing whether the application as a whole will be finished with it at that point is often much more difficult. In contrast, garbage collectors typically work by tracing references from some conservative notion of a “root set,” e.g., global variables, registers, and the call stack: they thereby obtain reachability information methodically. By using a garbage collector, the programmer is relieved of the need to worry about the application's global state and can concentrate on (more-manageable) local-state issues. The result is applications that are more robust, having no dangling references and fewer memory leaks.
Garbage-collection mechanisms can be implemented by various parts and levels of a computing system. One approach is simply to provide them as part of a batch compiler's output. Consider FIG. 3's simple batch-compiler operation, for example. A computer system executes in accordance with compiler object code and therefore acts as a compiler 36. The compiler object code is typically stored on a medium such as FIG. 1's system disk 17 or some other machine-readable medium, and it is loaded into RAM 14 to configure the computer system to act as a compiler. In some cases, though, the compiler object code's persistent storage may instead be provided in a server system remote from the machine that performs the compiling. The electrical signals that typically carry the digital data by which the computer systems exchange that code are examples of the kinds of electromagnetic signals by which the computer instructions can be communicated. Others are radio waves, microwaves, and both visible and invisible light.
The input to the compiler is the application source code, and the end product of the compiler process is application object code. This object code defines an application 38, which typically operates on input such as mouse clicks to generate a display or some other type of output. This object code implements the relationship that the programmer intends to specify by his application source code. In one approach to garbage collection, the compiler 36, without the programmer's explicit direction, additionally generates code that automatically reclaims unreachable memory space.
Even in this simple case, though, there is a sense in which the application does not itself provide the entire garbage collector. Specifically, the application will typically call upon the underlying operating system's memory-allocation functions. And the operating system may in turn take advantage of hardware that lends itself particularly to use in garbage collection. So even a very simple system may disperse the garbage-collection mechanism over a number of computer-system layers.
To get some sense of the variety of system components that can be used to implement garbage collection, consider FIG. 4's example of a more complex way in which various levels of source code can result in the machine instructions that a processor executes. In the FIG. 4 arrangement, the human applications programmer produces source code 40 written in a high-level language. A compiler 42 typically converts that code into “class files.” These files include routines written in instructions, called “byte code” 44, for a “virtual machine” that various processors can be software-configured to emulate. This conversion into byte code is almost always separated in time from that code's execution, so FIG. 4 divides the sequence into a “compile-time environment” 46 separate from a “run-time environment” 48, in which execution occurs. One example of a high-level language for which compilers are available to produce such virtual-machine instructions is the Java™ programming language. (Java is a trademark or registered trademark of Sun Microsystems, Inc., in the United States and other countries.)
Most typically, the class files' byte-code routines are executed by a processor under control of a virtual-machine process 50. That process emulates a virtual machine from whose instruction set the byte code is drawn. As is true of the compiler 42, the virtual-machine process 50 may be specified by code stored on a local disk or some other machine-readable medium from which it is read into FIG. 1's RAM 14 to configure the computer system to implement the garbage collector and otherwise act as a virtual machine. Again, though, that code's persistent storage may instead be provided by a server system remote from the processor that implements the virtual machine, in which case the code would be transmitted by electromagnetic signals to the virtual-machine-implementing processor.
In some implementations, much of the virtual machine's action in executing these byte codes is most like what those skilled in the art refer to as “interpreting,” so FIG. 4 depicts the virtual machine as including an “interpreter” 52 for that purpose. In addition to or instead of running an interpreter, many virtual-machine implementations actually compile the byte codes concurrently with the resultant object code's execution, so FIG. 4 depicts the virtual machine as additionally including a “just-in-time” compiler 54.
The resultant instructions typically invoke calls to a run-time system 56, which handles matters such as loading new class files as they are needed and includes much of the garbage-collector implementation. The run-time system will typically call on the services of an underlying operating system 58. Among the differences between the arrangements of FIGS. 3 and 4 is that FIG. 4's compiler 42 for converting the human programmer's code does not contribute to providing the garbage-collection function; that function results largely from the virtual machine 50's operation.
Independently of the specific system-architecture levels at which the collector resides, garbage collection usually includes some amount of reference tracing to determine whether objects are at least potentially reachable; if they are not potentially reachable, they are garbage, and their memory space can therefore be reclaimed. The most straightforward and accurate way to do the tracing is to start at the root set and scan recursively for referred-to objects until all have been identified. If an object is not encountered in that process, it is not reachable, even if it is referred to by a reference in some other object.
Now, it would be unacceptable in many applications to have the mutator (the part of the program that performs the application's work and, in doing so, modifies references in the heap) pause while the collector traces references through the whole heap. So some garbage collectors perform the tracing in threads of execution that (mostly) operate concurrently with the mutator. They mark the objects thereby encountered, and, possibly with some exceptions, objects that remain unmarked at the end of the marking operation can be recognized as garbage. The memory blocks occupied by thus-recognized objects can be reclaimed concurrently with mutator execution. Since most of such a marking-and-sweeping operation executes concurrently with mutator execution, this approach limits pause times. For large heaps, though, the marking is expensive, and it can slow mutator execution.
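Concurrent marking, and the reason it needs "exceptions," can be sketched by interleaving collector steps with mutator stores in a single Python thread. The sketch assumes tricolor bookkeeping (unmarked objects, marked-but-unscanned "gray" objects, and scanned objects) and an incremental-update write barrier that marks any object the mutator stores a reference to while marking is in progress; that barrier is one common choice, not necessarily the one a given collector uses. It further assumes that the roots do not change and that no objects are allocated during marking. The `Heap` class and its methods are hypothetical.

```python
class Heap:
    """Hypothetical heap supporting incremental marking and sweeping."""

    def __init__(self):
        self.fields = {}      # object id -> {field name: referenced id or None}
        self.marked = set()   # gray or fully scanned objects
        self.gray = []        # marked but not yet scanned
        self.marking = False

    def new(self, oid, **fields):
        self.fields[oid] = dict(fields)
        return oid

    # ---- collector side ----
    def start_marking(self, roots):
        self.marking = True
        self.marked.clear()
        self.gray = []
        for r in roots:
            self._shade(r)

    def _shade(self, oid):
        if oid is not None and oid not in self.marked:
            self.marked.add(oid)
            self.gray.append(oid)

    def mark_step(self, budget):
        """Scan at most `budget` gray objects; return True when marking is done."""
        for _ in range(budget):
            if not self.gray:
                break
            obj = self.gray.pop()
            for ref in self.fields[obj].values():
                self._shade(ref)
        return not self.gray

    def sweep(self):
        self.marking = False
        garbage = set(self.fields) - self.marked
        for oid in garbage:
            del self.fields[oid]
        return garbage

    # ---- mutator side ----
    def write(self, obj, field, ref):
        if self.marking:
            # Incremental-update barrier: never hide an unmarked object
            # behind one the collector has already scanned.
            self._shade(ref)
        self.fields[obj][field] = ref


heap = Heap()
heap.new("C")
heap.new("B", g="C")
heap.new("A", f="B")
heap.new("D")                  # unreachable from the start
heap.start_marking(roots=["A"])
heap.mark_step(1)              # collector scans A; B is now gray
heap.write("A", "f", "C")      # mutator moves its only path to C into A...
heap.write("B", "g", None)     # ...and removes the old path through B
while not heap.mark_step(1):
    pass                       # the mutator would run between these steps
garbage = heap.sweep()
print(sorted(garbage))         # ['D']
```

Without the barrier in `write`, C would remain unmarked and be swept while A still refers to it, producing exactly the kind of dangling reference described earlier. B, by contrast, survives this cycle as floating garbage even though nothing refers to it any longer; a subsequent cycle would reclaim it.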
Another solution is for the garbage collector to operate “space-incrementally.” In each successive collection increment, which typically (but not necessarily) is all performed during a single pause in mutator execution, the collector considers a small portion, or collection set, of the heap. As the mutator executes, it notifies the collector when it writes a reference, and the collector thereby maintains, for each of a plurality of heap regions, a list, or remembered set, of locations in other regions where the mutator has reported references to that region's objects. Without tracing reference chains from the basic root set throughout the entire heap during every collection increment, the collector can recognize a collection-set object as unreachable if no reference to it resides in any location that the collection-set regions' remembered sets identify.
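A space-incremental increment can be sketched in the same style. The following Python sketch assumes a hypothetical `RegionHeap` class in which each mutator store passes through a write barrier that records cross-region references in the target region's remembered set. A collection increment then starts from the references that come from outside the collection set (the remembered-set locations, plus, as an added assumption, any root references that point directly into the collection set) and traces only within the collection set. Remembered-set entries name locations rather than values, so each entry is re-read and ignored if it no longer points into the collection set. Treating every outside referrer as live is conservative: an outside object that is itself garbage keeps its targets alive until its own region is collected.

```python
from collections import defaultdict


class RegionHeap:
    """Hypothetical heap split into regions, with per-region remembered sets."""

    def __init__(self):
        self.region_of = {}              # object id -> region number
        self.fields = {}                 # object id -> {field: referenced id or None}
        self.remset = defaultdict(set)   # region -> {(source object, field)} pointing into it

    def new(self, oid, region):
        self.region_of[oid] = region
        self.fields[oid] = {}
        return oid

    def write(self, src, field, dst):
        """Mutator store with a write barrier that reports cross-region references."""
        self.fields[src][field] = dst
        if dst is not None and self.region_of[dst] != self.region_of[src]:
            self.remset[self.region_of[dst]].add((src, field))

    def collect(self, cset, roots):
        """Reclaim unreachable objects in the regions of `cset` without tracing the whole heap."""
        in_cset = {o for o, r in self.region_of.items() if r in cset}
        work = [r for r in roots if r in in_cset]
        for region in cset:
            for src, field in self.remset[region]:
                dst = self.fields[src].get(field)   # re-read: the entry may be stale
                if src not in in_cset and dst in in_cset:
                    work.append(dst)
        live = set()
        while work:                                 # trace only inside the collection set
            obj = work.pop()
            if obj in live:
                continue
            live.add(obj)
            work.extend(d for d in self.fields[obj].values() if d in in_cset)
        garbage = in_cset - live
        for oid in garbage:
            del self.fields[oid]
            del self.region_of[oid]
        for region in list(self.remset):            # forget locations inside reclaimed objects
            self.remset[region] = {(s, f) for s, f in self.remset[region] if s in self.fields}
        return garbage


heap = RegionHeap()
for oid, region in [("A", 0), ("B", 1), ("C", 1), ("D", 1), ("E", 2)]:
    heap.new(oid, region)
heap.write("A", "next", "B")   # cross-region: recorded in region 1's remembered set
heap.write("B", "next", "C")   # same region: not recorded
heap.write("E", "next", "D")   # cross-region: recorded
heap.write("E", "next", None)  # overwritten; the stale entry is filtered at collection time
garbage = heap.collect(cset={1}, roots=[])
print(sorted(garbage))         # ['D']
```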
Although incremental collection helps to limit pause times, collector overhead remains significant, and a great amount of effort has been expended in improving collector efficiency.