Information processing applications require the manipulation of data objects. The manipulation of these objects consumes information processing resources including memory and processor time. Provision of these resources is costly. It is therefore desirable to employ methods which reduce the quantity of information processing resources required to accomplish a given manipulation.
In particular, the manipulation of data objects often requires the allocation and deallocation of memory intended to hold temporary and return values. The allocation and deallocation operations themselves require processor time, and memory remains in use between a successful allocation and its corresponding deallocation. It is therefore desirable to reduce the number of allocations and deallocations required by a given manipulation. This will in turn reduce the total amount of information processing resources used.
In the past, two main methods have been used to reduce run-time memory requirements: copy-by-reference and local reuse. Unfortunately, prior methods employing copy-by-reference require garbage collection procedures which are expensive in terms of CPU resources at run time. Existing local reuse methods require that the language used guarantees that every data object has at most one reader. Furthermore, a successful synthesis of these two methods has not previously been achieved.
The method of the present invention represents a novel synthesis of these two existing methods wherein copy-by-reference can be achieved without requiring expensive run-time garbage collection, and local reuse can be employed in connection with a declarative language. Because this method relies upon the concept of a subtype field, which in itself is entirely new, it can be seen that this method is at once innovative and non-obvious.
In the following subsection, the terms used in the present discussion are defined. Subsection 2.2 then briefly presents the known methods, for comparison and contrast with the present method.
2.1 Definitions
2.1.1 Applications
Information processing applications are programs or sets of programs which are executed on one or more computers 200. A computer consists of a central processing unit (CPU) 230 and memory 220, as well as various arithmetic, logical, and input/output (I/O) devices. After a program has been stored as machine instructions in memory 220, it can be executed by the CPU 230. The CPU 230 reads and executes the stored instructions one by one in sequence. When such a sequence is being executed, it is called a process 250. Computers usually contain memory management hardware and/or software 240 which controls the allocation and deallocation of blocks of memory 220 for specific processes 250. The CPU 230 communicates with memory 220, the memory manager 240 and processes 250 through messages 260 passed between them.
Information processing applications are commonly specified using a computer language. A translator is then used to convert this specification into a form which can be executed by the target machine. The translator can be either an interpreter or a compiler: an interpreter translates the source code statement by statement while executing a program, whereas a compiler translates the entire program before execution begins. For efficiency reasons, compilers are used most often. Compilers are able to effect optimizations which reduce the computational resources required by an application. Manipulations which are performed by the compiler are said to be done at compile time, while manipulations performed during execution of the application are said to be done at run time.
Programming languages can be split into two groups, functional and declarative. Functional languages do not include data structure declarations, whereas declarative languages do. Functional languages are mainly of academic interest, while the great majority of applications are coded using declarative languages.
An information processing application coded in a declarative language includes data declarations and function invocations. Most languages also include facilities for data structure definitions and function definitions. Data structure definitions define aggregations of data which are to be manipulated as a unit. Function definitions consist of a sequence of data structure declarations and function invocations. Execution of a data declaration causes the creation of a data structure. Execution of a function invocation causes the corresponding function definition to be executed.
2.1.2 Objects
In the object-oriented programming paradigm, a data structure is called an object, and data structure definitions are called classes or types. A type specification includes a list of all of the functions which can be used to manipulate an object of that type. Thus, object-oriented programming provides for the direct implementation of an abstract data type (ADT). Although the preferred embodiment of the present invention makes use of specific capabilities of object-oriented programming languages, it is more generally applicable. Henceforth, the term object is to be construed in its most general sense, i.e. "an instance of a data structure".
An object comprises two types of field: items and pointers. A pointer is the address of another memory location; an item is an object or data of a fundamental (non-pointer) type. Since data of a fundamental type can be considered to be an object, the terms "item" and "object" are used interchangeably.
In a well-behaved program, a pointer is either a reference to an object or has the special value nil. A pointer whose value is nil is called a null pointer. Application of a dereferencing operation to a pointer returns the object referenced or pointed to. The result of applying a dereferencing operation to a null pointer is undefined.
In general, an object 1 as shown in FIG. 1 is comprised of a type field 2, zero or more items 4, and zero or more pointers (6 and 7). Because the information contained in the type field is commonly used only at compile time, it is not necessarily stored with the object at run time. The type field is therefore drawn within a dashed box to distinguish it from information comprising the object which must be stored at run time. In the figure, nonnull pointers 6 are represented as arrows pointing to other objects 8. Null pointers 7 are represented by an X within the corresponding pointer field. Items 4 contained by an object are known as internal objects, while objects 8 referenced by pointers contained by an object are known as external objects.
An object 1 also has zero or more names 3 associated with it. Each name is a labelled pointer to the object. Since names are used only by the compiler at compile time, they do not require any storage at run time. This fact is represented by the use of dashed boxes to enclose the name pointers.
Note that external objects can also contain pointers to other objects recursively, creating an object with arbitrary "depth". The depth of an object can be determined by counting the number of pointers that must be followed to reach it, starting from a name. Thus in the figure, names 3 are at depth 0, the object 1 itself is at depth 1, and the external objects 8 are at depth 2. For consistency, the depth attributed to the manipulation of a pointer corresponds to the depth at which that pointer is stored. Thus, manipulations of pointers 6 as shown in FIG. 1 are considered to be at depth 1.
Since objects comprise data, memory resources must be allocated to store them. If the memory required to hold an object can be allocated before the information processing application begins execution, then it is said that memory for the object is allocated at compile time. Otherwise, memory for the object is allocated at run time. An object whose memory is allocated at run time is said to be dynamically allocated.
2.1.3 Functions
A function is the definition of a manipulation involving zero or more objects. These objects are called operands. At run time, a function modifies zero or more of its operands, using the information present in its operands. For the purposes of this discussion, global or static memory referenced by a function is considered to be an operand of that function.
Some functions have associated with them a special operand called a return value. Such functions are called operations. Functions which are not operations are called procedures. All operands of a function which are not the return value are called arguments. If a function is not permitted to modify an argument, then that argument is called a constant argument. Any argument not declared to be constant may be modified by the function. An operation may always modify the return value, if present.
In addition to reading and modifying their operands, functions can create and destroy objects called temporaries. These temporaries are necessarily dynamically allocated. The return values of operations are also commonly dynamically allocated, and the operands of a function can be dynamically allocated as well. Objects which are dynamically allocated by a function are allocated and deallocated each time the function is executed. This means, among other things, that temporaries must be deallocated before the function is exited.
Taken together, the objects which a function can access during its operation are called its local store. When a function begins execution, all of the arguments are in the local store, as shown in FIG. 2(a). As execution of the function progresses, temporaries may be added and removed. Thus, at any given time, there will also be a set of zero or more temporaries in the local store (FIG. 2(b)). Just before the function is exited, all temporaries are removed from the local store and deallocated, but a return value may remain (FIG. 2(c)).
A function definition consists of a header and a body. The header contains the name of the function and a list of formal parameters. The formal parameters identify the type of each operand including the return value, and provide a means for binding objects of the same name within the function body to the corresponding actual parameters in the function invocation. Some formal parameter lists include an ellipsis, meaning that an unspecified number of parameters follow. In this case, the function must determine the types of the additional operands by other means.
The body of a function contains a list of data declarations and function invocations. Objects which are declared within a function definition can be accessed only within that function body. That function body is called the scope of such an object. Objects declared in a function which invokes the present function can be accessed as actual parameters through bindings with this function's formal parameters. Objects in the calling (invoking) function are said to be in an enclosing scope.
2.1.4 Object-Oriented Languages
Object-oriented programming languages directly support the implementation of abstract data types (ADTs) by allowing a list of legal functions to be specified in conjunction with a type specification. Functions which operate implicitly on an object of that type are called member functions, and the object thus operated upon is called the target of the member function. Member functions which modify the target are called destructive functions. Member functions which do not modify the target are called nondestructive functions. It is also possible to specify non-member functions which access objects of that type explicitly.
Object-oriented languages provide in their syntax for special member functions to be applied in allocating and deallocating objects. These are called constructors and destructors, respectively. A type specification may contain many constructors, but has only one destructor. Most object-oriented languages automatically call the destructors of objects whenever the scope in which they are defined is exited.
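By way of illustration only, the automatic invocation of a destructor on scope exit may be sketched in C++ as follows. The type Tracked, its counter live, and the function live_inside_scope are hypothetical names introduced for this sketch and do not appear elsewhere in this description.

```cpp
#include <cassert>

// Hypothetical type: counts live instances so that construction and
// destruction can be observed.
struct Tracked {
    static int live;          // number of Tracked objects currently alive
    Tracked()  { ++live; }    // constructor: runs when the object is created
    ~Tracked() { --live; }    // destructor: runs when the scope is exited
};
int Tracked::live = 0;

// Returns the number of live objects as seen from inside the scope of t.
int live_inside_scope() {
    Tracked t;                // constructor is called here
    return Tracked::live;     // value is read before t is destroyed
}                             // destructor of t is called automatically here
```

In this sketch the counter returns to its previous value as soon as live_inside_scope is exited, without any explicit deallocation call in the source.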
Such languages also permit a given function to be multiply declared and defined, provided that any version of a function can be distinguished from all others according to the types of its operands. This is known as function overloading. When the compiler for such a language is presented with the invocation of a function, it selects the correct function definition by first identifying the family of functions having the same name. It then matches up the types of the actual parameters in the function invocation with those of the formal parameters in the function definitions. If there is exactly one match, it selects that one function as the correct match. Otherwise, a compiler error is generated.
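The selection among overloaded definitions may be illustrated by the following minimal C++ sketch. The function name describe is hypothetical and is chosen only for this example.

```cpp
#include <cassert>
#include <string>

// Two hypothetical definitions sharing one name. They are distinguished
// solely by the type of their operand.
std::string describe(int)    { return "int"; }
std::string describe(double) { return "double"; }

// describe(1)   selects the int version    (exact match).
// describe(1.5) selects the double version (exact match).
// describe(1L)  is rejected by the compiler: a long converts equally
//               well to int and to double, so no single best match exists.
```

The last case corresponds to the situation described above in which no unique match exists and a compiler error is generated.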
In addition to providing for the implementation of abstract data types, most object-oriented programming languages provide for inheritance. Inheritance allows most of a type specification (and the corresponding function definitions) to be shared among different types. Specifically, a derived type inherits its basic behavior from its parent type. It then suffices to specify how the behavior of the derived type differs from that of its parent.
In most implementations, inheritance also gives rise to an intentional ambiguity in the resolution of operand types. Specifically, if there is no function definition which matches the type of an operand exactly, then the compiler attempts to find a match with each of the operand type's ancestors in turn. If there is exactly one best match, then the compiler selects this one as the correct function definition. Otherwise, a compiler error is generated. This mechanism supports the definition of a function which applies to an entire subtree of operand types. When a function accepts a parameter of any type within an inheritance subtree, this is known as polymorphism.
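A minimal C++ sketch of this resolution through ancestors is given below. The types Shape, Polygon and Triangle and the functions side_count and nearest are hypothetical and serve only to illustrate the mechanism.

```cpp
#include <cassert>

// Hypothetical inheritance chain: Triangle derives from Polygon,
// which derives from Shape.
struct Shape              { int sides = 0; };
struct Polygon  : Shape   { };
struct Triangle : Polygon { Triangle() { sides = 3; } };

// Defined once for Shape; applies to the entire subtree of derived types.
int side_count(const Shape& s) { return s.sides; }

// Two candidates. For a Triangle operand neither matches exactly, so the
// compiler selects the one declared for the nearest ancestor (Polygon).
int nearest(const Shape&)   { return 0; }
int nearest(const Polygon&) { return 1; }
```

Here side_count accepts any type in the subtree rooted at Shape, while nearest shows that, among several ancestor matches, the closest ancestor is treated as the best match.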
2.2 Existing Methods for Memory Reuse
The operative concept used to reduce the cost of dynamic allocation is to re-use objects which have already been allocated in preference to allocating new ones. The techniques which have been used can be summarized as copy-by-reference and direct reuse.
2.2.1 Copy-by-Reference
Information processing applications may involve the manipulation of multiple copies of the same object. The most economical way to copy a declared object is to create an alias for it. An alias is an additional name 3 which refers to the same object 1. The example of FIG. 3 shows an object 1 which has the aliases "A" and "B". This is the simplest form of copy-by-reference.
It is possible to extend the copy-by-reference scheme to the case in which there are distinct objects containing pointers, one of which is to be given the same value as the other. In this case, the internal fields 4, 6 and 7 are copied verbatim from one to the other. Any nonnull pointers in the second object then point to the same external objects as are pointed to by those in the first. This is known as a shallow copy between two objects. An example of the result of a shallow copy is shown in FIG. 4. In FIG. 4, it is obvious that the external object 8 has been copied into the second object by reference.
There are two problems associated with shallow copies. The first is that even though the objects 1 are declared independently, modification of the external object 8 now affects both copies. Thus, objects which share external data can be termed virtual aliases. This behavior may be in contrast to what was intended. The second problem is that the external objects 8 are now jointly owned by all of the objects 1 referring to them, and no external object 8 can safely be deallocated until every reference 6 to it is destroyed.
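The first of these problems may be illustrated by the following C++ sketch, in which the type names External and Object and the function shallow_copy are assumptions made for the example. The field item stands for an item 4 and the field ptr for a pointer 6.

```cpp
#include <cassert>

struct External { int value; };   // an external object

struct Object {
    int       item;               // internal item, stored in the object
    External* ptr;                // pointer to an external object, or null
};

// Shallow copy: every field is copied verbatim, including the pointer.
// The copy therefore refers to the same External as the source.
Object shallow_copy(const Object& src) {
    return Object{src.item, src.ptr};
}
```

After a shallow copy, a change made to the external object through one copy is visible through the other, whereas a change to an internal item is not. Neither copy can safely deallocate the external object while the other still refers to it.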
Deep Copying
The first problem is solved by making provisions for a deep or verbatim copy. A deep copy is the same as a shallow copy, except that the external objects 8 are also copied verbatim. This involves the allocation of memory sufficient to hold each external object 8 referred to by the first object, and copying their contents. The nonnull pointers 6 in the second object are then set to point to the copies of the external objects 8 (as opposed to pointing to the originals). An example of the result of a deep copy is shown in FIG. 5. Note that a deep copy yields two data structures which are completely distinct (i.e. share no memory).
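A deep copy of a two-level object may be sketched in C++ as follows. The names External, Object, deep_copy and destroy are hypothetical, and only a single external object per object is treated.

```cpp
#include <cassert>

struct External { int value; };   // an external object

struct Object {
    int       item;               // internal item
    External* ptr;                // pointer to an external object, or null
};

// Deep copy: internal fields are copied, and a new External is allocated
// to hold a copy of the contents of the source's external object.
Object deep_copy(const Object& src) {
    Object dst{src.item, nullptr};
    if (src.ptr != nullptr) {
        dst.ptr = new External(*src.ptr);   // one allocation per external object
    }
    return dst;
}

// Each copy owns its external object outright and may deallocate it freely.
void destroy(Object& o) {
    delete o.ptr;
    o.ptr = nullptr;
}
```

The cost of the independence gained is visible in the sketch: each deep copy performs an allocation and a copy for every external object, and each resulting object requires its own deallocation.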
Since the definition of an object is recursive, it can be seen that there can be several levels of "depth" to a shallow copy. The present discussion treats only two levels, but the methods described here can easily be extended to multiple levels of reference.
Memory Reclamation
The second problem associated with copy-by-reference has been addressed by methods for memory reclamation or garbage collection. These methods deallocate an external object 8 only after all references 6 to it have been removed, thereby returning its memory to the free store for later use or use by other processes while ensuring that an existing reference copy always has access to it. Once a memory reclamation method is in place, it can be said that copy-by-reference is supported. Methods for memory reclamation include reference counting, mark-scan garbage collection, and the two-space copy[4].
The article by Corporaal and Veldman [1] provides an excellent taxonomy which classifies known memory reclamation methods according to six general descriptors.
Reference Counting
In reference counting[2], as diagrammed in FIG. 6, each external object 8 has associated with it a count 11 of the number of objects that reference (point to) it. Every time a new copy-by-reference is made to the external object 8, the count 11 is incremented, and every time an object which references it is destroyed, the count 11 is decremented. When the count 11 goes to zero, the object 8 (and its associated count 11) can be deallocated safely. However, the reference counting method requires additional resources for its implementation. The shared external data must contain the reference count--utilizing memory resources--and the count must be updated whenever a copy object is created or destroyed--utilizing CPU resources.
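The bookkeeping described above may be sketched in C++ as follows. This is a minimal single-threaded illustration under assumed names (Shared, Handle, make, copy_by_reference, release) and is not drawn from any of the cited references.

```cpp
#include <cassert>

// External object together with its reference count (count 11 in FIG. 6).
struct Shared {
    int count;    // number of handles that point to this object
    int value;    // payload
};

// An object holding a pointer to the shared external object.
struct Handle { Shared* ptr; };

// Allocate a new external object with exactly one reference.
Handle make(int value) { return Handle{new Shared{1, value}}; }

// Copy-by-reference: share the external object and increment the count.
Handle copy_by_reference(const Handle& h) {
    assert(h.ptr != nullptr);
    ++h.ptr->count;
    return Handle{h.ptr};
}

// Destroy one reference. Returns true if this was the last reference,
// in which case the external object has been deallocated.
bool release(Handle& h) {
    if (h.ptr == nullptr) return false;
    Shared* s = h.ptr;
    h.ptr = nullptr;
    if (--s->count == 0) {
        delete s;
        return true;
    }
    return false;
}
```

Both costs noted above appear in the sketch: the count field occupies memory in every shared object, and every copy and every release executes an increment or a decrement and test at run time.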
A variation of the reference counting scheme known as weighted reference counting removes the requirement of referencing shared memory, which makes such schemes amenable to parallel implementation, but some bookkeeping is still required at run time. Another variation known as lazy reference counting reduces the run-time CPU requirements by deferring deallocation operations and then combining them with allocations, but does not eliminate them entirely. Lazy reference counting is also inefficient in reclaiming unused memory. Run-time CPU resources are still required by all these methods. Both variations also require more memory for their implementation, in addition to the reference count. The fact that reference counting requires CPU resources at run time has fostered interest in compile-time reallocation techniques.
Garbage Collection
One alternative method is called Mark-Scan garbage collection. In this scheme, external objects are never explicitly deallocated. Periodically, a garbage collection process marks all data blocks which can be accessed by any object. Unreferenced memory can be reclaimed by scanning the entire memory and deallocating unmarked elements.
Mark-scan garbage collection has two major drawbacks. One is that garbage collection is costly in terms of CPU usage (though not as costly as reference counting). The other is that, on average, half of the available memory is occupied by data blocks which could be reclaimed, so it is extremely costly in terms of unused memory. It is also necessary to set aside one bit of storage in each memory cell, in order to mark the cells as "used" during the marking step. This again is less expensive than reference counting.
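The two phases of the mark-scan scheme may be sketched in C++ as follows. The sketch models memory as a vector of cells addressed by index. The names Cell, mark and scan, and the representation of references as indices, are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Cell {
    bool in_use = false;              // true while the cell is allocated
    bool marked = false;              // the mark bit set aside in each cell
    std::vector<std::size_t> refs;    // indices of the cells this cell references
};

// Mark phase: set the mark bit of every cell reachable from the roots.
void mark(std::vector<Cell>& heap, const std::vector<std::size_t>& roots) {
    std::vector<std::size_t> pending(roots);
    while (!pending.empty()) {
        std::size_t i = pending.back();
        pending.pop_back();
        if (heap[i].marked) continue;     // already visited
        heap[i].marked = true;
        for (std::size_t r : heap[i].refs) pending.push_back(r);
    }
}

// Scan phase: visit the entire memory, reclaim every allocated cell that
// is unmarked, and clear the mark bits. Returns the number reclaimed.
std::size_t scan(std::vector<Cell>& heap) {
    std::size_t reclaimed = 0;
    for (Cell& c : heap) {
        if (c.in_use && !c.marked) {
            c.in_use = false;
            c.refs.clear();
            ++reclaimed;
        }
        c.marked = false;
    }
    return reclaimed;
}
```

The scan phase visits every cell whether or not it is reachable, which is one source of the CPU cost noted above. Cells that become unreachable remain allocated until the next collection.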
Two-Space Copy garbage collection is similar to Mark-Scan garbage collection. However, instead of marking those items that can be reached by reference, all reachable data structures are periodically copied from one memory space into another. The first memory space can then be reclaimed in its entirety.
This method has the advantages that it avoids the problem of memory fragmentation and that the second memory space is only needed as the reclamation is being performed. It also eliminates the need for a "mark" bit in each memory cell. However, this method is just as expensive as mark-scan garbage collection in terms of CPU time. Both mark-scan and two-space copy require that all data structures remain unchanged during their execution, which prohibits running them in parallel with the application.
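A minimal C++ sketch of the two-space copy is given below. Memory is modeled as a vector of cells, each holding at most one reference expressed as an index. The names Cell, NIL and collect are hypothetical, and the forwarding table is one possible means, assumed here, of ensuring that each reachable cell is copied only once.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Index value standing for a null pointer.
constexpr std::size_t NIL = static_cast<std::size_t>(-1);

struct Cell {
    int         value;   // payload
    std::size_t ref;     // index of the referenced cell, or NIL
};

// Copies every cell reachable from root out of the first space into a new
// second space, which is returned. root is updated to its new index.
// The caller may then reclaim the first space in its entirety.
std::vector<Cell> collect(const std::vector<Cell>& from, std::size_t& root) {
    std::vector<Cell> to;
    std::vector<std::size_t> forward(from.size(), NIL);   // old index -> new index

    auto evacuate = [&](std::size_t old) -> std::size_t {
        if (old == NIL) return NIL;
        if (forward[old] == NIL) {          // not yet copied
            forward[old] = to.size();
            to.push_back(from[old]);
        }
        return forward[old];
    };

    root = evacuate(root);
    // Cells already copied are scanned in order; each reference is
    // redirected to the copy of its referent in the second space.
    for (std::size_t next = 0; next < to.size(); ++next) {
        std::size_t moved = evacuate(to[next].ref);
        to[next].ref = moved;
    }
    return to;
}
```

Because surviving cells are placed contiguously in the second space, the sketch also exhibits the absence of fragmentation noted above, and no mark bit is stored in any cell.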
Whereas the memory reclamation methods discussed above permit the use of copy-by-reference, they are all expensive in terms of CPU usage. Thus, while copy-by-reference is useful in avoiding deep copies, in the current art it has an unavoidable CPU cost associated with it. In addition, only special cases of reference counting can be parallelized.
2.2.2 Direct Reuse
An approach which avoids the problems associated with copy-by-reference is to ensure that every reachable object has exactly one reference to it. A language which enforces this restriction is called a functional or single-assignment language, to contrast it with more general declarative languages. Since functional languages disallow multiple references, an object can be deallocated as soon as the one and only reference to it is deleted. There are no other references to be concerned about. Indeed, all such deallocations can be scheduled at compile time. Thus for functional languages, no run-time resources need be involved in reclaiming memory.
Unfortunately, the use of functional languages forces the proliferation of deep copies. Since deep copies are expensive in terms of both memory and CPU resources, it is desirable to avoid them whenever possible. Therefore, merely abandoning shallow copies in favor of deep copies is not regarded as an acceptable alternative. The techniques of local reuse and targeting have been developed in conjunction with functional languages, and have succeeded in bringing their performance up to a level which is comparable with the more conventional declarative languages[6].
Local Reuse
Local reuse[1, 3, 6] (also termed update-in-place) has been applied to functional languages, and involves a compile-time analysis of the source code to determine when an object with only a single reader can be reused. In this case, this object's memory may be reused by the function which reads it.
Local reuse involves the transfer of memory from one object to another as diagrammed in FIG. 7. FIG. 7(a) shows an object A which owns some external memory and an object B which does not. Memory is transferred from A to B by copying one or more pointers to memory 6 from A to B, and then setting the memory pointer(s) in A to nil. The configuration in memory after such a transfer is performed is shown in FIG. 7(b). As an interesting special case, an entire named object can be reused by performing this update on the name pointers 3 of the objects.
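The transfer of FIG. 7 may be sketched in C++ as follows. The names External, Object and transfer are assumptions made for this example.

```cpp
#include <cassert>

struct External { int value; };   // external memory owned by an object

struct Object { External* ptr; }; // pointer to external memory, or null

// Transfers external memory from a to b: the pointer is copied from a
// to b, and the pointer in a is then set to nil. Nothing is allocated,
// deallocated, or copied apart from the pointer itself.
void transfer(Object& a, Object& b) {
    assert(b.ptr == nullptr);     // b owns no external memory, as in FIG. 7(a)
    b.ptr = a.ptr;
    a.ptr = nullptr;
}
```

After the transfer only b refers to the external memory, so the single-reference condition is preserved and the memory has been reused without a deep copy.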
Targeting
A global extension of the local reuse techniques is known as targeting[3, 6] or build-in-place. It involves compile-time analysis to determine the ultimate disposition of partial results. A data block of size sufficient to hold the result is allocated as early in the execution as possible. The data blocks used to hold partial results then become references to portions of this data block.
An example of targeting is shown in FIG. 8. In the figure, the arrows represent operations, while the boxes indicate blocks of memory. Blocks which must be contiguous in memory are shown as contiguous blocks in the figure. Targeting maps distinct portions of an ultimate large result to the results of intermediate calculations. Before the application of targeting, many different blocks of memory must be allocated to hold the partial and intermediate results, as shown in FIG. 8(a). After targeting, portions of a large contiguous block of memory are used to hold the partial and intermediate results, as well as the final result, as shown in FIG. 8(b).
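The effect of targeting may be illustrated by the following C++ sketch, which is not taken from the cited references. The partial computation fill_squares and the two builder functions are hypothetical. The first builder allocates separate blocks for the partial results and copies them into the final result. The second allocates the final block first and lets each partial computation write directly into its portion.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical partial computation: writes the squares of
// first, first+1, ..., first+n-1 into out[0..n).
void fill_squares(int* out, std::size_t n, int first) {
    for (std::size_t i = 0; i < n; ++i) {
        int v = first + static_cast<int>(i);
        out[i] = v * v;
    }
}

// Without targeting: each partial result occupies its own block,
// and the blocks are then copied into the result.
std::vector<int> build_with_copies(std::size_t n) {
    std::vector<int> left(n), right(n);           // two intermediate blocks
    fill_squares(left.data(), n, 0);
    fill_squares(right.data(), n, static_cast<int>(n));
    std::vector<int> result;                      // further allocation for the result
    result.insert(result.end(), left.begin(), left.end());
    result.insert(result.end(), right.begin(), right.end());
    return result;
}

// With targeting: one contiguous block is allocated up front, and the
// partial results are built in place within portions of that block.
std::vector<int> build_in_place(std::size_t n) {
    std::vector<int> result(2 * n);               // single allocation
    fill_squares(result.data(), n, 0);
    fill_squares(result.data() + n, n, static_cast<int>(n));
    return result;
}
```

Both builders produce the same contents. The second performs a single allocation and no copying of partial results.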
The techniques of local reuse and targeting have been developed in conjunction with functional languages, which disallow multiple references to an object. However, most commonly-used languages are declarative, and allow multiple references. Thus, the use of these methods also requires rendering the functions to be processed in a functional language. Among other things, this transformation requires that every shallow copy be replaced by a deep one.
2.2.3 Related Patents
Seventeen U.S. patents involving garbage collection (reclamation of shared memory) in computer systems were located using a patent bibliographic database. Among those, the ones most pertinent to the present invention are U.S. Pat. Nos. 5,293,614, 5,241,673, 5,218,698, 5,136,706, 5,088,036, 4,912,629, 4,907,151, 4,814,971, 4,775,932 and 4,755,939. Eighty-seven U.S. patents involving temporary memory were located by the same means. Of these, only U.S. Pat. No. 5,136,712 is pertinent to the present invention.
The U.S. Pat. Nos. 5,293,614, 5,218,698, 5,136,706, 5,088,036 and 4,907,151 present various implementations of the two-space copy method. U.S. Pat. No. 5,293,614 describes a system implemented in virtual memory which does not require a read barrier--to prevent inconsistencies from being introduced by running the garbage collection routine in parallel with other processes. U.S. Pat. No. 5,218,698 describes a two-space copy garbage collection scheme implemented in connection with a logic programming system. U.S. Pat. No. 5,136,706 describes a variation of the two-space copy method in which the lifetime of objects is determined adaptively. U.S. Pat. No. 5,088,036 describes another parallel two-space copy method implemented in a paged virtual memory system. U.S. Pat. No. 4,907,151 describes a two-space copy method in which root pointers may point to defunct objects.
The U.S. Pat. Nos. 5,241,673, 4,814,971 and 4,775,932 present modifications of the mark-scan garbage collection method. U.S. Pat. No. 5,241,673 describes a variation of the mark-scan garbage collection method in which the reachability of a node is determined in a manner which is distributed and incremental. This is in contrast to the usual mark-scan scheme in which reachability is determined afresh each time the routine is invoked, by starting from a root or roots and marking all nodes reachable therefrom. Because each memory element stores a list of objects which may reference it, this method also has characteristics in common with the reference-counting scheme. The other two U.S. Pat. Nos. 4,814,971 and 4,775,932 describe hardware systems for implementing the mark-scan method.
The U.S. Pat. Nos. 4,912,629 and 4,755,939 describe various reference counting methods. U.S. Pat. No. 4,912,629 describes a variation of the reference-counting method in which the size of the reference count can be increased as needed. U.S. Pat. No. 4,755,939 discloses exactly the weighted reference counting scheme.
U.S. Pat. No. 5,136,712 makes a distinction between temporary return value objects and declared objects, but uses this only to determine when such an object should be deallocated--making no mention of direct reuse. It also uses a standard reference-counting technique.
2.2.4 Summary of Prior Art
The method of copy-by-reference has been used to avoid expensive deep copy operations. However, this method involves the problem of reclaiming the shared data blocks. Known methods for reclaiming this memory require CPU resources at run time. The mark-scan and two-space copy methods, as well as "lazy" reference counting schemes, are also inefficient in reclaiming unreferenced memory.
The method of direct reuse has been developed in conjunction with functional languages. Functional languages explicitly disallow copy-by-reference, resulting in the proliferation of deep copies. Methods for direct reuse can replace some of these deep copies with shallow copies, but do not support the creation of shallow copies in the source language. Thus, although local reuse may produce a function which requires fewer allocations, it requires the use of a functional language.
Among existing methods, none supports copy-by-reference without requiring additional memory and CPU resources at run time, none implements local reuse in conjunction with a declarative language, and none provides a synthesis of copy-by-reference and direct reuse.