This invention relates generally to the field of mixed programming environments, and provides, in particular, mechanisms for combining memory management disciplines and for managing object instances in a mixed programming environment.
Modern operating systems and hardware platforms make available increasingly large addressable or dynamic memory spaces. Modern applications have correspondingly grown in size and complexity to take advantage of this available memory. Most applications today use a great deal of dynamic memory. Features such as multitasking and multithreading increase the demands on memory.
Where there is not enough space in dynamic memory for an application to execute, execution is slowed considerably while data that another application is referencing is swapped out of dynamic memory to make room for the currently executing thread or process. Object oriented (OO) programming languages such as C++, in particular, use dynamic memory much more heavily than comparable serial programming languages like C, often for small, short-lived allocations.
The effective management of dynamic memory, to locate usable free blocks and to deallocate blocks no longer needed in an executing program, has become an important programming consideration.
The C and C++ languages use the memory allocation function malloc to manage the dynamic memory space. Malloc is a library function defined under the ANSI C standard and, in its usual form, is implemented over a list of free blocks of memory. To allocate a specified number of bytes of memory for a process, the list is searched linearly until a suitably-sized block of free memory is located, and a pointer is returned to identify the address of the newly allocated block. A corresponding deallocation is made using the function free. This causes the space pointed to by the pointer to be deallocated and put back into the list of free memory.
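The linear search and the return of a block to the free list can be illustrated with a minimal sketch. It is not the implementation of malloc in any particular library; the names ToyHeap, FreeBlock, allocate and release are hypothetical, the arena is a fixed 4096-byte buffer, and adjacent free blocks are not coalesced.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical header placed in front of every block in the arena.
struct alignas(std::max_align_t) FreeBlock {
    std::size_t size;  // usable bytes that follow this header
    FreeBlock* next;   // next block on the free list (used only while free)
};

class ToyHeap {
public:
    // Initially the whole arena is a single free block.
    ToyHeap()
        : free_list_(new (arena_) FreeBlock{sizeof(arena_) - sizeof(FreeBlock), nullptr}) {}
    ToyHeap(const ToyHeap&) = delete;
    ToyHeap& operator=(const ToyHeap&) = delete;

    // First fit: walk the list linearly until a block is large enough.
    void* allocate(std::size_t bytes) {
        bytes = round_up(bytes);
        FreeBlock** link = &free_list_;
        for (FreeBlock* b = free_list_; b != nullptr; link = &b->next, b = b->next) {
            if (b->size < bytes) continue;
            if (b->size >= bytes + sizeof(FreeBlock) + kAlign) {
                // Split the block; the tail stays on the free list.
                unsigned char* tail = reinterpret_cast<unsigned char*>(b + 1) + bytes;
                FreeBlock* rest =
                    new (tail) FreeBlock{b->size - bytes - sizeof(FreeBlock), b->next};
                b->size = bytes;
                *link = rest;
            } else {
                *link = b->next;  // hand out the whole block
            }
            return b + 1;  // the payload starts right after the header
        }
        return nullptr;  // no suitably sized free block
    }

    // Put the block back at the head of the free list.
    void release(void* p) {
        if (p == nullptr) return;
        unsigned char* bytes = static_cast<unsigned char*>(p);
        assert(bytes > arena_ && bytes < arena_ + sizeof(arena_));
        FreeBlock* b = static_cast<FreeBlock*>(p) - 1;
        b->next = free_list_;
        free_list_ = b;
    }

    // Number of blocks currently on the free list.
    std::size_t free_blocks() const {
        std::size_t n = 0;
        for (FreeBlock* b = free_list_; b != nullptr; b = b->next) ++n;
        return n;
    }

private:
    static constexpr std::size_t kAlign = alignof(std::max_align_t);
    static std::size_t round_up(std::size_t n) {
        return (n + kAlign - 1) / kAlign * kAlign;
    }
    alignas(std::max_align_t) unsigned char arena_[4096];
    FreeBlock* free_list_;
};
```

Because a released block is pushed onto the head of the list, a later request that fits in it is satisfied from that block first. A production allocator would typically also merge neighbouring free blocks to limit fragmentation.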
There are modifications and additional allocation and deallocation functions that can be used, examples of which are discussed in:
“Advanced Programming in the UNIX® Environment”, W. Richard Stevens, 1992, Addison-Wesley Publishing Co.; and
“Rethinking Memory Management”, Arthur D. Applegate, Dr. Dobb's Journal, June 1994, pp. 52-55.
However, for the most part, memory management under C and C++ is handled explicitly, for the reasons discussed below.
In contrast, implicit memory management makes unused memory available again without the program having to run a deallocating routine. This is generally referred to as “garbage collection”, and is used in a number of interpreted OO programming languages such as Lisp, Smalltalk and Java.
C++ is a compiled OO language. Some of the differences between compiled and interpreted OO languages, as well as the differences between OO and serial programming technology, are discussed in two co-pending Canadian Patent Applications, No. 2,204,971 titled “Uniform Access to and Interchange Between Objects Employing a Plurality of Access Methods” (IBM Docket No. CA997-013) and No. 2,204,974 titled “Transparent Use of Compiled or Interpreted Objects in an Object Oriented System” (IBM Docket No. CA997-014), which are commonly assigned.
Reference counting is a technique used to provide automatic garbage collection in some language implementations (for example, Lisp), and is also sometimes used explicitly by programmers in languages such as C++ that do not provide implicit garbage collection, in order to achieve a similar effect in that language domain. Embedded in each object is an integer field called a reference count. Whenever a reference to the object is duplicated, this count is required to be incremented. Conversely, when a reference is discarded or replaced by a reference to some other object, the count is required to be decremented. By means of this counting discipline, which may be enforced by the language implementation itself (as in the case of the Lisp language) or by the programmer (as when this technique is used in C++), the reference count field indicates how many references exist to the object. If, after decrementing the count, it is found to have reached zero, then that object is known to be unreferenced and its storage can be freed.
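The counting discipline, as a programmer might apply it by hand in C++, can be sketched as follows. The class names Counted, Ref and Node are hypothetical and chosen for illustration only; the sketch is single-threaded and is one of many possible arrangements.

```cpp
#include <cassert>
#include <utility>

// Hypothetical base class: the embedded integer field is the reference count.
class Counted {
public:
    virtual ~Counted() = default;
    void add_ref() { ++count_; }
    void release() {
        assert(count_ > 0);
        if (--count_ == 0) delete this;  // unreferenced: free the storage
    }
    int ref_count() const { return count_; }

private:
    int count_ = 0;
};

// Hypothetical handle that applies the discipline automatically.
template <typename T>
class Ref {
public:
    Ref() = default;
    explicit Ref(T* p) : p_(p) { if (p_) p_->add_ref(); }
    // Duplicating a reference increments the count.
    Ref(const Ref& other) : p_(other.p_) { if (p_) p_->add_ref(); }
    // Replacing a reference: the old target is decremented when `other` dies.
    Ref& operator=(Ref other) { std::swap(p_, other.p_); return *this; }
    // Discarding a reference decrements the count.
    ~Ref() { if (p_) p_->release(); }
    T* operator->() const { return p_; }
    T* get() const { return p_; }

private:
    T* p_ = nullptr;
};

// Example object type that tracks how many instances are alive.
struct Node : Counted {
    static int live;
    Node() { ++live; }
    ~Node() override { --live; }
};
int Node::live = 0;
```

With such a handle, copying a Ref<Node> raises the count, letting the copy go out of scope lowers it, and the Node is deleted when the last handle is destroyed or reassigned.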
Although reference counting has advantages of simplicity and scalability, it does not deal well with data structures containing circular references. These are collections of objects containing references to one another, such that the reference counts of all objects in the collection are nonzero even though the executing application no longer holds a reference to any of them and will not access them. Reference counting will not discover that these objects are garbage.
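The circular reference problem can be reproduced with the reference-counted pointer std::shared_ptr of the C++ standard library. In the sketch below, the type Peer and the functions leak_a_cycle and drop_a_chain are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <memory>

struct Peer {
    static int live;               // number of Peer objects currently alive
    std::shared_ptr<Peer> other;   // counted reference to another Peer
    Peer() { ++live; }
    ~Peer() { --live; }
};
int Peer::live = 0;

// Two objects refer to each other; the application then drops its own references.
// Returns how many of the objects are still alive afterwards.
int leak_a_cycle() {
    int before = Peer::live;
    {
        auto a = std::make_shared<Peer>();
        auto b = std::make_shared<Peer>();
        a->other = b;
        b->other = a;
        assert(a.use_count() == 2);  // held by the local `a` and by b->other
    }  // the locals are gone, but each object still holds a count on the other
    return Peer::live - before;
}

// Same objects without the back reference: the counts reach zero as expected.
int drop_a_chain() {
    int before = Peer::live;
    {
        auto a = std::make_shared<Peer>();
        auto b = std::make_shared<Peer>();
        a->other = b;
    }
    return Peer::live - before;
}
```

In the first function neither count ever reaches zero, so both objects remain allocated although the application can no longer reach them; in the second, both are freed.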
A block of memory is implicitly available to be deallocated or returned to the list of free memory whenever there are no references to it. In a runtime environment supporting implicit memory management, a garbage collector usually scans the dynamic memory from time to time looking for unreferenced blocks and returning them. The garbage collector starts at locations known to contain references to allocated blocks. These locations are called “roots”. The garbage collector examines the roots and, when it finds a reference to an allocated block that is not yet marked, marks the block as referenced and recursively examines it for further references. When all the referenced blocks have been marked, a linear scan of all allocated memory is made and unreferenced blocks are swept into the free memory list. The memory may also be compacted by copying referenced blocks to lower memory locations that were occupied by unreferenced blocks and then updating references to point to the new locations for the allocated blocks. Because scanning collectors determine which objects are actually reachable, they are not fooled by circular references and will collect those objects when they are garbage.
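The mark and sweep phases described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not a production collector: the names Obj and ToyCollector are hypothetical, roots are registered by hand, marking uses plain recursion, and no compaction is performed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical allocated block: a mark bit plus the references it contains.
struct Obj {
    bool marked = false;
    std::vector<Obj*> refs;
};

class ToyCollector {
public:
    ToyCollector() = default;
    ToyCollector(const ToyCollector&) = delete;
    ToyCollector& operator=(const ToyCollector&) = delete;
    ~ToyCollector() { for (Obj* o : heap_) delete o; }

    Obj* allocate() {
        heap_.push_back(new Obj);
        return heap_.back();
    }
    void add_root(Obj* o) {
        assert(o != nullptr);
        roots_.push_back(o);
    }
    void clear_roots() { roots_.clear(); }
    std::size_t live() const { return heap_.size(); }

    // Runs one collection and returns the number of blocks reclaimed.
    std::size_t collect() {
        // Mark phase: everything reachable from the roots is marked.
        for (Obj* r : roots_) mark(r);
        // Sweep phase: a linear scan frees every unmarked block.
        std::vector<Obj*> survivors;
        std::size_t freed = 0;
        for (Obj* o : heap_) {
            if (o->marked) {
                o->marked = false;  // reset for the next collection
                survivors.push_back(o);
            } else {
                delete o;
                ++freed;
            }
        }
        heap_.swap(survivors);
        return freed;
    }

private:
    static void mark(Obj* o) {
        if (o == nullptr || o->marked) return;  // already visited
        o->marked = true;
        for (Obj* child : o->refs) mark(child);
    }
    std::vector<Obj*> heap_;   // every allocated block
    std::vector<Obj*> roots_;  // locations known to hold references
};
```

Two blocks that reference only each other are never marked, because no path leads to them from a root, and are therefore reclaimed by the sweep.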
Serious problems can arise if garbage collection of an allocated block occurs prematurely. For example, if a garbage collection occurs at a point during processing when there is no reference to the start of an allocated block that is still in use, the collector would move the block to the free memory list. If the program then allocates memory, the block may be reallocated and its contents overwritten, destroying the processing in progress. This could result in a system failure.
Applications that use garbage collection sometimes need to defeat the collection mechanism in controlled ways in order to achieve a particular desired effect. A weak reference to an object is a reference that is intentionally overlooked by the garbage collection mechanism, with the result that an object will be considered garbage and subject to collection if it is unreferenced, or if it is reachable only through weak references. Ordinary references are considered by the garbage collector and are sometimes called strong references to distinguish them from the weak variety.
One use of weak references is to break circular reference chains. Another use is in implementing an instance manager, which is a software module that keeps track of the instances of a particular class or group of classes. The instance manager needs to hold a reference to each object in order to locate it, but it is not the intent that these references alone should prevent the objects from being garbage collected. These requirements can be met by having the instance manager hold weak references to the objects.
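An instance manager of this kind can be approximated in C++ with std::weak_ptr, which observes an object without keeping it alive. The sketch below relies on the reference-counted std::shared_ptr rather than a scanning garbage collector, and the names Widget and InstanceManager are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct Widget {
    int id;
    explicit Widget(int i) : id(i) {}
};

// Hypothetical instance manager: it can locate instances but does not own them.
class InstanceManager {
public:
    std::shared_ptr<Widget> create(int id) {
        assert(find(id) == nullptr);  // ids are assumed unique in this sketch
        auto w = std::make_shared<Widget>(id);
        instances_.push_back(w);  // stored as a weak reference
        return w;                 // the caller receives the strong reference
    }

    // Returns the live instance with this id, or nullptr if it has been destroyed.
    std::shared_ptr<Widget> find(int id) const {
        for (const auto& weak : instances_) {
            if (auto strong = weak.lock()) {
                if (strong->id == id) return strong;
            }
        }
        return nullptr;
    }

    // Counts the tracked instances that are still alive.
    std::size_t live_count() const {
        std::size_t n = 0;
        for (const auto& weak : instances_) {
            if (!weak.expired()) ++n;
        }
        return n;
    }

private:
    std::vector<std::weak_ptr<Widget>> instances_;
};
```

Once the last strong reference held by the application is dropped, the Widget is destroyed even though the manager still has an entry for it; that entry simply reports the object as expired.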
The destruction of an object and reclamation of its storage is often preceded by a finalization step, in which user-defined code associated with the object is executed so as to give it an opportunity to “clean up” before it is destroyed. This corresponds to the execution of destructor methods in C++.
The Java language requires that when the garbage collector has determined that an object is unreferenced, it must first execute the finalize() methods defined for that object before reclaiming its storage. If, after the object has been finalized, it is subsequently found to still be unreferenced, its storage may be immediately reclaimed. However, Java also permits the user-defined finalize() methods to create a new strong reference to the object; if this occurs, the storage for the object will not be reclaimed. Only when the object is found to be unreferenced for a second time, after finalization, will it be reclaimed. This behaviour is called finalization with resurrection, since an apparently ‘dead’ object comes back to life.
Safe garbage collection cannot be easily achieved in programming languages such as C and C++ which permit the liberal use of indirect calls through pointer references.
Pointer references are particularly useful in complex programs where the exact number of elements in different data structures may not be ascertainable at compilation time. The number may vary with the program's actions as it is running. The use of pointer references allows individual pieces of storage to be allocated as needed, so that the required amount of storage is available at any given moment during program execution. Legacy C and C++ systems do not permit garbage collection as an automatic feature. One reason is that without a specific mechanism to deal with indirect references, garbage collection in these environments is difficult to implement. Current C++ implementations require explicit user control of the lifetime of objects.
The different programming languages mentioned above have been developed to support different types of applications. Users have become accustomed to having available increasingly rich applications. To reduce development time and cost, application developers want to be able to re-use code, in whatever language or programming environment it has been developed, and they want to be able to take advantage of the functionality offered in one environment, across several. Application developers want to have available to them the option of making cross-language function calls or method invocations.
This is particularly the case with Java, a programming environment that facilitates network communication (such as over the Internet) through a medium called bytecode. All applications can be written or re-written in Java for translation to the bytecode medium, but at great development and migration expense to the users that depend on them for day-to-day operations. Many of these existing applications are already in an OO programming language operating under the same general principles as Java. The ideal is to permit cross-language calls to access the data objects constructed in other language environments. The above-referenced application titled “Uniform Access to and Interchange Between Objects Employing a Plurality of Access Methods” (IBM Docket No. CA997-013) describes a system that supports both local and remote calls across OO languages.
In such a system, a problem arises when a call originates from a programming environment that supports automatic garbage collection, such as Java, to an environment that requires explicit memory release, such as C++. The call from the Java environment has the effect of constructing the data object in the C++ environment. On destruction of the Java handle, there is no way to pass a message to release the memory occupied by the C++ data object because C++ does not recognise the garbage collection mechanism. Similarly, the remote Java user does not have access to the C++ mechanism to explicitly release the memory.
Therefore, a programming model that manages memory correctly across the language boundary is needed.
It is an object of the present invention to provide an environment in which composite objects can be safely constructed over garbage collected and non-garbage collected environments in a manner to avoid untimely object destruction.
It is also an object of the present invention to correctly detect unreferenced objects in a computer program environment that includes both reference counting and garbage collection memory management domains, and where object references may be held in one domain to objects resident in the other domain.
A further object of the present invention is to activate a prescribed storage management function on an object when that object is detected to be unreferenced.
It is also an object of the present invention to provide distinguished references to objects and to correctly detect when all remaining references to a particular object are distinguished references, and to activate a prescribed storage management function on said object at the time of said detection.
Accordingly, the present invention provides a cross-language memory management system for use over an object oriented programming environment in which there is at least an explicit memory management domain with reference counting and an implicit memory management domain. This system includes an interface mechanism that operates to connect the explicit memory management domain and the implicit memory management domain. The interface mechanism is adapted to detect cross-language references intended for implementation objects in each domain and, in the case of any implementation object targeted by at least one cross-language reference, it maintains at least one strong reference to that implementation object.
Preferably, the interface mechanism, for any implementation object targeted by a cross-language reference, consists of a reference object containing a reference count of cross-language references intended for that implementation object. While the reference count is non-zero, the reference object is adapted to maintain a strong reference to the implementation object.
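The behaviour of such a reference object can be illustrated with a single-language sketch. It is an illustration under assumptions and not the implementation of the invention: std::shared_ptr stands in for a strong reference into the domain that owns the implementation object, and the names Implementation, ReferenceObject, acquire and release are hypothetical.

```cpp
#include <cassert>
#include <memory>

struct Implementation {
    int value = 42;
};

// Hypothetical reference object sitting on the language boundary.
class ReferenceObject {
public:
    explicit ReferenceObject(const std::shared_ptr<Implementation>& target)
        : observer_(target) {}

    // Called when a cross-language reference to the target is created.
    // Returns false if the target has already been destroyed.
    bool acquire() {
        if (count_ == 0) {
            strong_ = observer_.lock();  // take the strong reference
            if (!strong_) return false;
        }
        ++count_;
        return true;
    }

    // Called when a cross-language reference is destroyed.
    void release() {
        assert(count_ > 0);
        if (--count_ == 0) strong_.reset();  // count is zero: drop the strong reference
    }

    int count() const { return count_; }
    bool target_alive() const { return !observer_.expired(); }

private:
    std::weak_ptr<Implementation> observer_;   // does not keep the target alive
    std::shared_ptr<Implementation> strong_;   // held only while count_ > 0
    int count_ = 0;                            // number of cross-language references
};
```

While at least one cross-language reference is outstanding, the implementation object survives even if its home domain drops every reference of its own; when the count returns to zero, the object becomes eligible for destruction again.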
In another embodiment, the invention provides a mechanism to control object destruction in an object oriented programming environment permitting cross-language invocations from an explicit memory management domain having reference counting to an implicit memory management domain. In this embodiment, the mechanism consists of means to pass a strong reference from a calling object in the explicit memory management domain to an implementation object in the implicit memory management domain, and means to destroy the strong reference on destruction of the calling object.
The invention also provides a mechanism to control object destruction in the converse situation, the invocation of an implementation object in an explicit memory management domain having reference counting by a calling object in an implicit memory management domain. In this embodiment, the mechanism consists of means to increment a reference count in the implementation object, means to return a weak reference to the calling object, and means to decrement the reference count on destruction of the calling object.
In addition, the invention provides a method, and corresponding computer program product embodying means to program a general purpose computer to carry out the method, of implementing cross-language calls between objects in explicit memory management domains supporting reference counting and implicit memory management domains supporting garbage collection. The method includes the steps of invoking a method for the call on a proxy object permitting cross-language access, converting the method to a language-independent token having a reference count of at least one, selecting a pointer for invoking an implementation side mechanism, and receiving a return value at the client.
Finally, the invention provides a composite object constructed over at least first and second object oriented language environments. The composite object is constructed with a class hierarchy in the first environment having a proxy base class and at least one derived implementation object, a class hierarchy in the second environment having a base implementation class and at least one proxy derived class, and an interface mechanism adapted to detect a cross-language reference from a proxy class in the composite object and to select a pointer to a corresponding implementation class. When a method invoked on the composite object in one of the language environments executes on a proxy class, that proxy class invokes a cross-language method on its corresponding implementation class in the second environment.