A conventional method is described in the article by M. Schelvis and E. Bledoeg, "The implementation of a Distributed Smalltalk", ECOOP'88 proceedings, Lecture Notes in Computer Science 322, Springer-Verlag, pp. 212-232 (1988), which is incorporated by reference herein. This article describes a method for exchanging packet-bound information, as used in a distributed object-oriented system such as the Smalltalk-80 system, for the distributed reclamation of memory space, hereinafter referred to as "Distributed Garbage Collection". The Smalltalk-80 system is further described in an article by Glenn Krasner entitled "The Smalltalk-80 Virtual Machine", BYTE, August 1981, pp. 300-320 (1981), which is incorporated by reference herein.
The aforementioned "Smalltalk-80" article describes the Smalltalk system as one type of object-oriented system which can be represented by a digraph. The system encourages the development of large application programs. The system contains a compiler, a debugger, a storage management system, text and picture editors, and a file system. To aid in understanding the system in which the methods of "garbage collection" of the present invention can be used, the Smalltalk-80 system will be described subsequently. However, it should be noted that the present invention is not limited to operation on only the Smalltalk-80 system. Such a system is merely being described for exemplary purposes and should not be considered in any way to limit the scope of the present invention.
The pieces contained in the system described in the "Smalltalk-80" article (the compiler, the debugger, etc.) can be written in Smalltalk-80 and are called the "Smalltalk-80 Virtual Image". The remaining part of the Smalltalk-80 system is defined in terms of an abstract machine called the "Smalltalk-80 Virtual Machine". The Smalltalk-80 compiler translates source code into machine instructions for this virtual machine, rather than translating directly into machine instructions for a particular hardware machine. Therefore, implementing the Smalltalk-80 system on any actual computer (such as a SUN computer, for example), hereinafter referred to as a "target" computer, involves implementing (writing a program to simulate) the Smalltalk-80 Virtual Machine on the target computer.
Such a system as the Smalltalk-80 system is made up of objects that have state and exhibit behavior. Their state consists of the values of indexed and named variables, called fields, and their behavior is exhibited through sending and receiving messages. Programming in Smalltalk-80 is done by defining the procedures, or methods, that are executed when objects receive messages. Typically, messages are sent to other objects to invoke their methods. Most of the time, the fields mentioned above are containers for a pointer to another object (for performance optimization, the values in the fields of some objects are interpreted as numerical values themselves rather than as object pointers). A group of objects referring to each other by means of these pointers, including the pointer from each object to its class (which is itself an object), constitutes a directed graph. The objects are the nodes, and the references or pointers (through which messages can be transmitted from node to node) are the edges. Some objects are accessible from outside the graph and are termed "root objects". An object is live if it has a root, i.e., if it is accessible via a path through the graph starting from a root object. The purpose of garbage collection is to reclaim memory space occupied by "dead" objects, which are objects that are no longer accessible and hence useless.
An object's graph is a directed graph whose nodes are all the objects that can be accessed from the root object via a path of inter-object references (the graph's edges). If the roots of the objects are shared, the objects are called identical.
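The liveness rule described above can be illustrated with a minimal Python sketch. For illustration only, it assumes that the heap is given as a dictionary mapping each object to the objects its fields point to. The names `live_objects` and `heap` are hypothetical and do not come from the cited articles.

```python
from collections import deque


def live_objects(heap, roots):
    """Return every object reachable from a root via a path of pointers."""
    live = set(roots)
    queue = deque(live)
    while queue:
        obj = queue.popleft()
        for target in heap.get(obj, ()):
            if target not in live:  # visit each object at most once
                live.add(target)
                queue.append(target)
    return live


# Hypothetical heap: "c" and "d" point to each other, but no root reaches them.
heap = {"root": ["a"], "a": ["b"], "b": [], "c": ["d"], "d": ["c"]}
live = live_objects(heap, ["root"])
dead = set(heap) - live
print(sorted(live))  # ['a', 'b', 'root']
print(sorted(dead))  # ['c', 'd']
```

Each object is visited at most once, so the traversal terminates even when the graph contains cycles.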
An object represents a component of the Smalltalk-80 software system. For example, objects represent: Numbers; character strings; queues; dictionaries; rectangles; file directories; text editors; programs; compilers; computational processes; financial histories; and views of information. An object consists of some private memory and a set of operations. The nature of an object's operations depends on the type of component it represents. Objects representing numbers compute arithmetic operations. Objects representing data structures store and retrieve information. Objects representing positions and areas answer inquiries about their relation to other positions and areas.
A message is a request for an object to carry out one of its operations. A message specifies which operation is desired, but not how that operation should be carried out. The receiver, the object to which the message was sent, determines how to carry out the requested operation. For example, addition is performed by sending a message to an object representing a number. The message specifies that the desired operation is addition and also specifies what number should be added to the receiver. The receiver determines how to accomplish the addition.
The set of messages to which an object can respond is called its interface with the rest of the system. The only way to interact with an object is through its interface. A crucial property of an object is that its private memory can be manipulated only by its own operations. A crucial property of messages is that they are the only way to invoke an object's operations. These properties ensure that the implementation of one object cannot depend on the internal details of other objects, only on the messages to which they respond.
An example of a data structure commonly used in programming is a dictionary, which associates names with values. In the Smalltalk-80 system, a dictionary is represented by an object that can perform two operations: associate a name with a new value, and find the value last associated with a particular name. A programmer using a dictionary must know how to specify these two operations with messages. Dictionary objects understand messages that make requests like "associate the name Brett with the value 3" and "what is the value associated with the name Dave?" Since everything is an object, the names, such as Brett or Dave, and the values, such as 3 or 30, are also represented by objects. Although a curious programmer may want to know how associations are represented in the dictionary, this internal implementation information is unnecessary for successful use of a dictionary. Knowledge of a dictionary's implementation is of interest only to the programmer who works on the definition of the dictionary object itself.
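The two dictionary operations can be sketched in Python as follows. The class and method names are hypothetical. `at_put` and `at` merely mimic the Smalltalk-80 selectors `at:put:` and `at:`. Note that Python only marks `_associations` as private by convention, whereas Smalltalk-80 enforces that an object's private memory can be manipulated only by its own operations.

```python
class SmalltalkStyleDictionary:
    """Hypothetical dictionary object offering exactly two operations."""

    def __init__(self):
        # Private memory; by convention only this object's own operations touch it.
        self._associations = {}

    def at_put(self, name, value):
        """Associate `name` with a new value, replacing any earlier value."""
        self._associations[name] = value
        return value

    def at(self, name):
        """Answer the value last associated with `name`."""
        return self._associations[name]


d = SmalltalkStyleDictionary()
d.at_put("Brett", 3)
d.at_put("Dave", 30)
d.at_put("Brett", 4)  # re-association: the last value wins
print(d.at("Brett"), d.at("Dave"))  # 4 30
```

A user of `d` needs to know only the two messages. The fact that associations are held in a Python `dict` stays hidden behind them.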
Garbage collection is often used in systems in which memory is organized as a "heap", for example, Smalltalk or Lisp systems. When an object is created in a Smalltalk system, memory space in the heap is assigned to it dynamically. This memory space, and the relationships between objects and memory, will be described subsequently.
It should be noted that the Smalltalk-80 Virtual Machine includes three elements which must be implemented: The storage manager; the interpreter; and the primitive subroutines. To implement the storage manager, information is needed on how objects are represented in the computer's memory. This information includes the amount of memory that each object will occupy, which can be computed from the number of fields the object has, and the representation of fields in the memory. Thus, the storage manager for objects in a Smalltalk-80 system will fetch the class of an object, fetch and store fields of objects, create new objects, and collect and manage free space. It is the maintenance of free space to which garbage collection is essentially directed. This process will be explained subsequently in greater detail.
The interpreter executes the machine instructions of the Smalltalk-80 Virtual Machine. The information needed to design the interpreter is a description of these machine instructions, called byte codes. The byte codes are contained in methods, so the representation of methods must also be known. From this information it can be decided how the interpreter will fetch and execute byte codes and how the method to run will be found when a message is sent.
Finally, it must be determined which messages will invoke primitive subroutines (which will be described subsequently in greater detail). Thus, while typically messages are sent to other objects to invoke their methods, sometimes messages invoke primitive (machine-code) subroutines rather than Smalltalk-80 methods. Accordingly, it must be determined which methods must be implemented in machine code to terminate the recursion of message sending and optimize performance.
Everything in a Smalltalk system is an object, so from the storage point of view memory needs to be divided into blocks, one for each object, plus a pool of memory that is not yet used. Every time a new object is created, a new block of the appropriate size must be found for that object. Further, when objects are no longer used, their memory block may be returned to the pool of unused memory.
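The division of memory into object blocks plus a pool of unused memory can be sketched with a simple first-fit free list. This allocation policy is an assumption chosen for illustration. The text does not say how the Smalltalk-80 storage manager finds a block of the appropriate size, and `BlockPool` is a hypothetical name.

```python
class BlockPool:
    """Hypothetical first-fit pool of unused memory words."""

    def __init__(self, size):
        # Free extents as (start, length) pairs, kept sorted by start address.
        self.free = [(0, size)]

    def allocate(self, length):
        """Carve a block of `length` words out of the first extent that fits."""
        for i, (start, free_len) in enumerate(self.free):
            if free_len >= length:
                if free_len == length:
                    del self.free[i]
                else:
                    self.free[i] = (start + length, free_len - length)
                return start
        raise MemoryError(f"no free block of {length} words")

    def release(self, start, length):
        """Return a block to the pool, merging it with adjacent free extents."""
        extents = sorted(self.free + [(start, length)])
        merged = []
        for s, n in extents:
            if merged and merged[-1][0] + merged[-1][1] == s:
                merged[-1] = (merged[-1][0], merged[-1][1] + n)
            else:
                merged.append((s, n))
        self.free = merged


pool = BlockPool(100)
a = pool.allocate(10)  # block at word 0
b = pool.allocate(20)  # block at word 10
pool.release(a, 10)    # the first block returns to the pool
print(pool.free)       # [(0, 10), (30, 70)]
```

Merging adjacent extents on release keeps the pool from fragmenting into many small pieces that no later request could use.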
An object pointer is assigned to each object. The object pointer is an indirect pointer to the object through a table kept by the storage manager. This allows the storage manager to move an object around in memory without affecting any object that refers to it. It also ensures that the storage manager is the only entity in the system concerned with (and allowed to change) the actual memory. In the Smalltalk-80 Virtual Image, object pointers are single 16-bit words. For each object, the storage manager keeps three kinds of words in the block: one word for the length of the block, one word for the object pointer of the object that describes the object's class, and one word for each field of the object, since the fields are themselves objects. Accordingly, if the object is an instance of a class such as "point", one word must be kept for the object pointer of the object that is the X coordinate field of the point, and one for the object pointer of the object that is the Y coordinate field. Similarly, if the object is of the class "triangle", one word must be kept for the object pointer of an instance of class "point" representing the first vertex field, as well as one for the second vertex field and one for the third vertex field. Still further, for performance optimization, the values in the fields of some objects, such as those of class "Byte Array", are interpreted as the numerical values themselves rather than as object pointers.
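The indirection through the object table can be sketched as below. Memory is modeled as a Python list of words, and the block layout follows the description above: a length word, then a class pointer, then the fields. The names `StorageManager`, `instantiate`, `fetch_field` and `move` are hypothetical, as are the class pointers `100` and `101`. Word size is not modeled.

```python
class StorageManager:
    """Hypothetical storage manager using an object table for indirection."""

    def __init__(self, memory_words):
        self.memory = [None] * memory_words
        self.object_table = {}  # object pointer -> address of the object's block
        self.next_oop = 0

    def instantiate(self, address, class_oop, fields):
        """Lay out [length, class pointer, fields...] at `address`; return a new object pointer."""
        block = [2 + len(fields), class_oop, *fields]
        self.memory[address:address + len(block)] = block
        oop = self.next_oop
        self.next_oop += 1
        self.object_table[oop] = address
        return oop

    def fetch_field(self, oop, index):
        """Follow the object pointer through the table, then skip the two header words."""
        return self.memory[self.object_table[oop] + 2 + index]

    def move(self, oop, new_address):
        """Copy the block elsewhere; only the table entry changes, so referrers are unaffected."""
        old = self.object_table[oop]
        length = self.memory[old]
        self.memory[new_address:new_address + length] = self.memory[old:old + length]
        self.object_table[oop] = new_address


sm = StorageManager(32)
x = sm.instantiate(0, class_oop=100, fields=[])      # stand-in X coordinate object
y = sm.instantiate(2, class_oop=100, fields=[])      # stand-in Y coordinate object
p = sm.instantiate(4, class_oop=101, fields=[x, y])  # a "point" holding two object pointers
sm.move(p, 20)
print(sm.fetch_field(p, 0) == x, sm.fetch_field(p, 1) == y)  # True True
```

After the move, any object holding the pointer `p` still reaches the point, because only the table entry records the new address.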
The purpose of the storage manager is to fetch and store fields of objects, to create objects, and to manage free space. Requests can be made for new storage by calling a particular subroutine, but there is no request for returning storage. In some other systems, storage that is no longer used must be explicitly returned to the free storage pool. The Smalltalk-80 philosophy is that neither the user nor any part of the system other than the storage manager need have such concerns. Therefore, the storage manager must itself determine which objects are no longer being used, so that their storage may reenter the free storage pool. This management of memory requires a method of garbage collection. Objects that are not accessible via a path through their graph starting from a root object serve no further purpose. They must therefore be removed from memory, and the corresponding portions of memory returned to the free storage pool for future use.
The interpreter of the Smalltalk-80 Virtual Machine performs the actions described in the byte codes of methods (the machine code of the Virtual Machine). The information needed to implement the interpreter is a description of the byte codes, the representation of methods, and the technique for finding the method to run when a message is sent. The Smalltalk-80 Virtual Machine and its byte code set are stack oriented, like many procedure-based machines. However, the Smalltalk-80 Virtual Machine differs from procedure-based machines in the way the procedure is found. In the Smalltalk-80 system, only the name of the message, called the selector, is provided. The method to be executed is found through a search involving the selector and the class of the receiver. Object pointers are pushed onto and popped from a stack, and when a message is sent, the top few elements of the stack are used as the receiver and arguments of the method.
Methods are implemented as objects whose fields contain the byte codes plus a group of pointers to other objects, called the literal frame. The interpreter can use a particular subroutine of the storage manager to fetch the next byte code to execute. This suffices for returns, jumps, and pops, but the other byte codes require more information to be represented. In particular, for the push and store byte codes, it must be represented where to find the object pointers to push or store. Further, for the send byte codes, it must be represented where to find the selector of the message and which stack elements are the receiver and arguments.
The source code for a method contains variable names and literals, but the byte codes of the Virtual Machine are defined only in terms of field offsets. From the Virtual Machine's point of view, there are three types of variables: Variables local to the method (called temporaries); variables local to the receiver of the message (instance variables); and variables found in some dictionary that the receiver's class shares (global variables). Class variables are treated in the same way as other global variables. The Smalltalk-80 compiler (itself written in Smalltalk-80) translates references to these variables into byte codes that refer to field offsets in the receiver, the temporary area, or the globals. Instance variable names are translated using a field of the class-describing object that associates instance variable names with field offsets. Offsets are assigned to temporaries when the compiler translates a method, by associating the names of temporaries with offsets in the temporary area. The compiler creates instances for literals, puts their object pointers into the literal frame of the method, and produces byte codes in terms of offsets into the literal frame. For global variables, the compiler uses system dictionaries that associate global names with indirect references to objects. The object pointers of these indirect references to the global objects are also placed in the literal frame of the method, and the byte codes for accessing globals are encoded as indirect references through field offsets in the literal frame.
Therefore, when the interpreter is executing a method, it has to keep a stack, a temporary area, a pointer to the receiver and arguments of the method, and a pointer to the method itself. It uses the storage manager's subroutines to push and pop pointers on the stack object, to retrieve and set values of variables in the temporary area, to retrieve and set values of variables of the receiver, and to get byte codes and values of global variables from the method.
When a message is sent, the receiver and arguments must be identified, and the appropriate method must be found by the interpreter. The technique used in Smalltalk-80 is to include in each class-describing object a dictionary, called the method dictionary, that associates selectors with methods. Pointers to the selectors that will be sent by any method are kept in the method (along with global variable pointers and byte codes). A byte code that tells the interpreter to send a message encodes two things: the field offset in the literal frame where the selector is found, and the number of arguments that the method needs. By convention, the top elements of the stack are the arguments and the element just below them is the receiver. The interpreter then searches the method dictionary of the receiver's class for an association with the selector.
If no such association is found, the search does not end. The receiver's class may be a subclass of another class, called its superclass. If this is the case, the method may be defined in the superclass, so the interpreter checks there. This means that each class must have a field that refers to its superclass. The interpreter searches the method dictionary of the superclass, then that of its superclass, and so on, until either an appropriate method is found or it runs out of superclasses, in which case an error occurs.
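The method search described above can be sketched as follows. `Class`, `lookup`, and the example classes are hypothetical. Methods are represented by plain strings, and the final error is modeled as a Python `LookupError`. In Smalltalk-80 itself, a failed lookup results in a `doesNotUnderstand:` message being sent to the receiver.

```python
class Class:
    """Hypothetical class-describing object."""

    def __init__(self, name, superclass=None, methods=None):
        self.name = name
        self.superclass = superclass                  # field referring to the superclass
        self.method_dictionary = dict(methods or {})  # selector -> method


def lookup(receiver_class, selector):
    """Search the receiver's class, then each superclass in turn."""
    cls = receiver_class
    while cls is not None:
        if selector in cls.method_dictionary:
            return cls.method_dictionary[selector]
        cls = cls.superclass
    raise LookupError(f"{receiver_class.name} does not understand #{selector}")


Object = Class("Object", methods={"printString": "Object>>printString"})
Number = Class("Number", Object, {"+": "Number>>+"})
Integer = Class("Integer", Number, {"printString": "Integer>>printString"})

print(lookup(Integer, "printString"))  # Integer>>printString
print(lookup(Integer, "+"))            # Number>>+
```

A method defined in a subclass shadows one with the same selector higher up the chain, because the search stops at the first match.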
The Smalltalk-80 Virtual Machine implementation is a program running in the machine language of the target computer. The storage manager is the collection of subroutines in the program that deals with memory allocation and deallocation. The interpreter is the collection of subroutines in this program, one of which fetches the next byte code from the currently running method and calls one of the others to perform the appropriate action for that byte code.
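The interpreter's fetch-and-dispatch structure can be sketched as a small loop. The sketch also follows the stack convention for sends described earlier, with the arguments on top and the receiver just below them. The instruction names and the `(literal index, argument count)` encoding of the send instruction are hypothetical. The real Smalltalk-80 byte codes are single bytes with a different encoding. `send` is a caller-supplied stand-in for method lookup and execution.

```python
def run(method, receiver, send):
    """Fetch and execute byte codes until a return instruction."""
    bytecodes, literals = method["bytecodes"], method["literals"]
    stack, pc = [], 0
    while True:
        op, arg = bytecodes[pc]  # fetch the next byte code
        pc += 1
        if op == "push_self":
            stack.append(receiver)
        elif op == "push_literal":
            stack.append(literals[arg])
        elif op == "send":
            selector_index, nargs = arg      # the selector lives in the literal frame
            args = stack[len(stack) - nargs:]  # the top elements are the arguments
            del stack[len(stack) - nargs:]
            rcvr = stack.pop()               # the next one down is the receiver
            stack.append(send(rcvr, literals[selector_index], args))
        elif op == "return_top":
            return stack.pop()
        else:
            raise ValueError(f"unknown byte code {op!r}")


def send(rcvr, selector, args):
    # Stand-in for method lookup: only integer addition is understood here.
    if selector == "+":
        return rcvr + args[0]
    raise LookupError(selector)


method = {
    "literals": [3, 4, "+"],
    "bytecodes": [("push_literal", 0), ("push_literal", 1),
                  ("send", (2, 1)), ("return_top", None)],
}
print(run(method, None, send))  # 7
```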
In addition to these functions, there are several other places in the Smalltalk-80 system where performance considerations make it necessary, or at least desirable, to implement certain functions as machine code subroutines in the Smalltalk-80 Virtual Machine. These places are: Input/output, connecting the Smalltalk-80 system to the actual hardware; basic arithmetic functions; fetching and storing indexable instance variables; screen graphics, drawing and moving areas of the screen bit map quickly; and object allocation, connecting the Smalltalk-80 code for creating a new instance with the storage manager subroutines. This set of subroutines is called the primitive subroutines.
The primitive subroutines are represented in the Smalltalk-80 Virtual Image as methods with a special flag that says to run the corresponding subroutine rather than Smalltalk-80 byte codes. When the interpreter is executing the code to send a message and finds one of these flags set, it calls the subroutine and uses the value returned from it as the value of the method. The number of these methods in Smalltalk-80 is kept small, in order to keep the rest of the system as flexible and extensible as possible.
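The flag check performed on a send can be sketched as follows. The `primitive` key, the primitive number `1`, and the names `execute` and `interpret_bytecodes` are hypothetical. The sketch also models primitive failure: the primitive returns `None`, and the method's byte codes run instead. In Smalltalk-80, a primitive that cannot complete is handled in this way, although the text above does not describe that case.

```python
def primitive_add(receiver, args):
    """Hypothetical primitive 1: integer addition, or None to signal failure."""
    if isinstance(receiver, int) and isinstance(args[0], int):
        return receiver + args[0]
    return None


PRIMITIVES = {1: primitive_add}  # hypothetical primitive numbering


def execute(method, receiver, args, interpret):
    """Run the flagged subroutine if present; otherwise, or on failure, run the byte codes."""
    number = method.get("primitive")
    if number is not None:
        result = PRIMITIVES[number](receiver, args)
        if result is not None:
            return result  # the subroutine's value becomes the method's value
    return interpret(method, receiver, args)


def interpret_bytecodes(method, receiver, args):
    return "ran byte codes"  # stand-in for the ordinary interpreter


print(execute({"primitive": 1}, 3, [4], interpret_bytecodes))      # 7
print(execute({"primitive": 1}, "a", ["b"], interpret_bytecodes))  # ran byte codes
```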
Accordingly, the Smalltalk-80 Virtual Machine should be recognized as a fairly small computer program that consists of the storage manager, an interpreter, and a set of primitive subroutines. The task of implementing a Smalltalk-80 Virtual Machine for a new target computer is not too large, because most of the functions that would usually have to be implemented in machine code are already part of the Smalltalk-80 Virtual Image that runs on top of the Virtual Machine.
It should further be noted that a system such as the Smalltalk-80 Virtual Machine, while discussed here as a fairly small computer program, could also be implemented in hardware. Such an implementation would sacrifice some of the flexibility of the software, but it would gain the performance benefits that hardware provides. Hardware assists to a system such as the Smalltalk-80 Virtual Machine software can greatly improve performance. Writable microcode storage for frequently run pieces of code, hardware assists for graphics, or hardware assists for fetching byte codes could all potentially improve the performance of a Smalltalk-80 Virtual Machine implementation.
In the article entitled "The Implementation of a Distributed Smalltalk", a number of systems designed for distributed garbage collection are discussed. The article further identifies a number of specific disadvantages of the conventional systems. These disadvantages, as well as the distributed Smalltalk system itself, will be discussed subsequently.
Distributed Smalltalk comprises a number of cooperating Smalltalk Virtual Machines, as previously described, distributed over a network. Together they provide complete distribution transparency to the image level, including transparent message passing across machine boundaries. As a result, no modifications are necessary at the image level, and thus the standard Smalltalk debugger can be used for system-wide debugging. However, in such a network system, garbage must be collected not only from a single Virtual Machine but from a plurality of Virtual Machines. The objects of these Virtual Machines may be interconnected to form a path starting from a root object, indicating that an object should live. Thus, distributed garbage must be collected over such a system. It should be noted that the distributed Smalltalk system, to be hereinafter described, is only an exemplary system in which the methods of "garbage collection" of the present invention can be utilized. Further, it should be noted that the present invention is not limited to operation on a distributed Smalltalk system. Such a system is merely described for exemplary purposes and should not be considered in any way to limit the scope of the present invention. Garbage collection, by the methods and device of the present invention, can be utilized in any computer network system, for example, one which can be represented by a digraph. Further, the overall invention is certainly not restricted to object-oriented systems. One aspect of the present invention could, for instance, be a distributed document management system, programmed in C and running on a number of interconnected personal computers.
In a distributed programming system such as distributed Smalltalk, the system is completely distribution transparent. Distribution transparency implies that programmers writing distributed applications, such as multi-author document systems, e-mail or calendar systems, need not worry about object access, network location, replication, concurrency control, etc. Distributed Smalltalk is based upon an existing implementation of Smalltalk and is implemented on a network of SUN computers running Berkeley UNIX.
Distributed systems and applications are inherently more complex to program than non-distributed ones. Object-oriented languages like Smalltalk-80 help here by allowing programmers to construct applications in terms of communicating objects. Objects are an excellent way to structure a distributed system because they provide a means for data encapsulation. Data encapsulation is a powerful mechanism for controlling access to shared data.
The Smalltalk programming system can be seen as a set of objects that communicate with each other and with the user in a defined way. The Smalltalk system provides functions like storage management, display handling, text and picture editing, compiling and debugging. The Smalltalk system consists of a Virtual Image and a Virtual Machine, as previously discussed.
It is the Virtual Image which comprises the set of all objects. An object is a representation of a real world entity such as a display screen, or an abstract entity such as a number. Objects communicate with each other by sending messages, as previously described, a message specifying the name of the receiver object, the name of the operation, and a list of object names as arguments. A message only specifies which operation has to be performed. The receiver of the message determines how the operation will be carried out.
A class is a set of equivalent objects, a class itself being an object. It describes the private data and the set of operations of its instances. The private data of an object is described by its instance variables. An instance variable is a name which refers to one object, called its value.
Smalltalk-80 is a single-user programming system, as previously described. Multiple Smalltalk programmers can exchange objects only by writing objects (or source code) into a file, transferring the file over the network, and reading the file at the destination. In an image level approach to distribution, the Smalltalk Virtual Machines are enriched with some primitives that enable inter-Virtual Machine communication. The images are enriched with specific objects that make use of these new primitives in order to send messages to each other over the network. The image level approach has advantages such as: no substantial changes have to be made to the Virtual Machine, and hence it is relatively easy to make Virtual Machines from different vendors work together; and no performance is lost during local operation.
In the past, problems existed with the image level approach. One problem, dealing with input/output, arose when a remote Virtual Machine was executing some method on behalf of one person, and within this method a message was sent to the object display. That person would want things to happen on his or her own screen and not on the screen of a colleague. Other problems existed with standard classes on different host computers, and a further problem existed with regard to Smalltalk processes. With the image level approach, several Smalltalk processes are involved during a remote execution, at least one sender process and one receiver process. Therefore, the Smalltalk debugger could not be utilized for remote debugging.
These problems were solved by the concept of distribution transparency at the image level. This meant that all consequences of distribution were concealed from the image, and therefore also from applications and users. As a result of distribution transparency, all objects in the system could be referenced in a uniform manner regardless of factors such as access, location, migration and replication. Since distribution transparency had to be provided by the Virtual Machines, the approach was called a "Virtual Machine level" approach. When a message was sent to an object, the Virtual Machine knew whether the object was local, remote or replicated. Further, if the object was remote, the Virtual Machine knew where or how to find it, and if it was replicated, how to select the appropriate replica. If the receiver happened to be a local object, the local Virtual Machine handled the message just like an ordinary stand-alone Virtual Machine. If the receiver was remote, however, the message was forwarded to the remote Virtual Machine. As a result, objects on different Virtual Machines could work together as if they were on the same host.
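The routing decision made by each Virtual Machine can be sketched as below, under several simplifications. Object identifiers are assumed to be globally unique strings. The network is a plain dictionary of `VirtualMachine` instances, and messages map onto Python method calls via `getattr`. Replication is not modeled, and all names are hypothetical.

```python
class VirtualMachine:
    """Hypothetical VM that hides whether a receiver is local or remote."""

    def __init__(self, host_id, network):
        self.host_id = host_id
        self.network = network   # host id -> VirtualMachine, a stand-in for the real network
        self.local_objects = {}  # object id -> object living on this host
        self.remote_refs = {}    # object id -> host id where the object lives

    def send(self, obj_id, selector, *args):
        if obj_id in self.local_objects:
            # Local receiver: handle the message like a stand-alone VM.
            return getattr(self.local_objects[obj_id], selector)(*args)
        # Remote receiver: forward the message to the VM that owns the object.
        owner = self.network[self.remote_refs[obj_id]]
        return owner.send(obj_id, selector, *args)


network = {}
vm_a = VirtualMachine("A", network)
vm_b = VirtualMachine("B", network)
network.update(A=vm_a, B=vm_b)

vm_b.local_objects["log"] = []
vm_a.remote_refs["log"] = "B"
vm_a.send("log", "append", "hello")  # sent on host A, executed on host B
print(vm_b.local_objects["log"])     # ['hello']
```

The sender uses the same `send` call whether the receiver is local or remote, which is the transparency the paragraph above describes.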
With the Virtual Machine level approach, a number of Virtual Machines are capable of localizing objects and performing remote sends on one standard Smalltalk image. The objects of the standard image can thus be distributed at random over the plurality of Virtual Machines. The resulting system, however, is still a single-user system, since there is only one display object, one input sensor, etc., and worse, they are on different machines. Accordingly, the system was improved by creating extra objects on every Virtual Machine to represent the functionality of the underlying hardware, for example, display, sensor, processor and schedule controller. These objects were called host objects (a host is a Virtual Machine plus image). However, since multiple objects were now associated with the same name, selecting the right object became a problem.
To reduce network traffic between machines, multiple copies (replicas) of heavily used objects were created, one for each host. Host objects such as the display carried the flag "home" in their header, and process objects contained an instance variable holding the identity of the host where their process was initiated (their home). When a receiver was flagged "home", the message sent to it was forwarded to the home of the current process. Objects flagged in this way were called "home objects".
An important part of designing Smalltalk-80 programs is determining which kinds of objects should be described and which message names provide a useful vocabulary of interaction among these objects. A language is designed whenever the programmer specifies the messages that can be sent to an object. An appropriate choice of objects depends, of course, on the purposes to which the objects will be put and the granularity of information to be manipulated. For example, if a simulation of an amusement park is to be created for the purpose of collecting data on queues at the various rides, then it would be useful to describe objects representing the rides, the workers who control the rides, the waiting lines, and the people visiting the park. If the purpose of the simulation also includes monitoring the consumption of food in the park, then objects representing the consumable resources are required.
In distributed Smalltalk, objects within a local object space are uniquely identified and addressable by means of their object-oriented pointer. Sometimes objects are addressed indirectly. In that case, the object-oriented pointer is a pointer to a "forwarding object", a Smalltalk object that contains the actual pointer. Forwarding objects are not visible within the Smalltalk image, but for the Virtual Machine they are ordinary objects.
The problem of garbage collection is that of reclaiming space occupied by "dead" objects, that is, data that has become inaccessible. All data (objects) in a heap-oriented system form a graph structure of objects pointing to one another. This graph contains some root objects, which are accessible by definition. Objects are live when they are accessible via a path of pointers starting from a root. Otherwise, they are dead.
One conventional method of garbage collection, used in connection with the distributed Smalltalk system, is based upon the lifetime of objects and is called generation scavenging. Newly created objects are stored in New Space. When New Space fills up, New Space and Survivor Space are garbage collected by copying via a graph traversal, called scavenging. The roots of this graph are the set of New Space and Survivor Space objects referenced from Old Space, Replica Space or remote hosts. This root set is dynamically updated by checking stores of pointers to New Space and Survivor Space objects. The objects in this graph are moved to a new Survivor Space, except for objects that are old enough, which are moved to Old Space. At the end of a traversal, New Space is empty. Since most new objects die soon, Old Space fills up relatively slowly. Garbage collection of the much bigger Old Space and Replica Space is therefore necessary much less frequently.
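One scavenge, as described above, can be sketched as follows. The tenure threshold `TENURE_AGE`, the `remembered` root set, and the function signature are assumptions made for illustration. A real scavenger copies object bodies between memory areas, whereas this sketch only tracks set membership and ages.

```python
TENURE_AGE = 3  # hypothetical: scavenges survived before an object is "old enough"


def scavenge(young, edges, ages, remembered, old_space):
    """One scavenge of New Space plus Survivor Space.

    young:      objects currently in New Space or Survivor Space
    edges:      object -> objects it points to
    ages:       object -> scavenges survived so far (updated in place)
    remembered: young objects referenced from Old Space, Replica Space or
                remote hosts; these are the roots (updated in place)
    old_space:  receives promoted objects (updated in place)
    Returns the new Survivor Space; New Space is empty afterwards.
    """
    reached, stack = set(), [o for o in remembered if o in young]
    while stack:  # traverse only the young part of the graph
        obj = stack.pop()
        if obj not in reached:
            reached.add(obj)
            stack.extend(t for t in edges.get(obj, ()) if t in young)
    for obj in reached:
        ages[obj] = ages.get(obj, 0) + 1
    promoted = {o for o in reached if ages[o] >= TENURE_AGE}
    survivors = reached - promoted
    old_space.update(promoted)
    # Keep the root set current: drop promoted objects, add new old-to-young pointers.
    remembered.intersection_update(survivors)
    for obj in promoted:
        remembered.update(t for t in edges.get(obj, ()) if t in survivors)
    for obj in young - reached:
        ages.pop(obj, None)  # unreachable young objects are garbage
    return survivors


young = {"n1", "n2", "n3", "s1"}
edges = {"s1": ["n1"], "n1": ["n2"]}
ages = {"s1": 2}
remembered = {"s1"}  # s1 is referenced from Old Space
old_space = set()
survivor_space = scavenge(young, edges, ages, remembered, old_space)
print(sorted(survivor_space), sorted(old_space), sorted(remembered))
# ['n1', 'n2'] ['s1'] ['n1']
```

In this example `n3` is unreachable and is dropped. `s1` reaches the threshold and is promoted. Because `s1` now points from Old Space into the young generation, `n1` joins the root set for the next scavenge.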
However, one problem with generation scavenging is that it cannot detect cyclic structures. For example, suppose a first object is connected to a second object, which is connected to a third object, and the third object is connected back to the first object. Garbage collection of such a structure cannot occur immediately via generation scavenging. One has to wait until, by aging, the distributed cyclic garbage is expected to end up in the oldest generation of objects and thus become local cyclic garbage, which can then be collected. Therefore, a system is desired which can detect and break cycles formed by a plurality of connected objects. Such a system should further detect objects that do not trace back to a root, in order to reclaim the memory space occupied by these "dead" objects.
A further conventional system uses a type of mark and sweep scavenging, termed "reorganization". In Smalltalk, for example, the Old Space is garbage collected on user request using this mark and sweep system, with a file as temporary space. For distributed Smalltalk, scavenging is done within a background Smalltalk process. Each time this process is active, it copies a few living Old Space objects from one side of the Old Space to the other. With the distributed Smalltalk system, however, a global mark and sweep type system is necessary. In order to discover dead objects in a distributed system, all hosts must be checked for pointers to a particular object. The graph of living objects is traversed, the objects reached are marked, and at the end the space of unmarked objects is reclaimed, or "swept". The global mark and sweep type system, however, does not work properly unless all hosts are able and willing to cooperate. This is an important problem, because in an average distributed system some hosts are likely to be unable or unwilling to cooperate.
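A global mark and sweep, and its dependence on every host cooperating, can be sketched as follows. The host record layout and the `responsive` flag are hypothetical, and object identifiers are assumed to be globally unique. The sketch returns `None` rather than sweeping when any host cannot take part, since the missing host might hold the only pointer to some object.

```python
def global_mark_and_sweep(hosts):
    """Return the globally dead objects, or None if any host cannot take part.

    hosts: host id -> {"objects": {obj: [referenced objs]}, "roots": [...],
                       "responsive": bool}
    """
    if not all(h["responsive"] for h in hosts.values()):
        return None  # a missing host may hold the only pointer to some object
    edges, roots = {}, []
    for h in hosts.values():
        edges.update(h["objects"])
        roots.extend(h["roots"])
    marked, stack = set(), list(roots)
    while stack:  # mark phase: traverse living objects across all hosts
        obj = stack.pop()
        if obj not in marked:
            marked.add(obj)
            stack.extend(edges.get(obj, ()))
    return set(edges) - marked  # sweep phase: unmarked objects are reclaimed


hosts = {
    "A": {"objects": {"a1": ["b1"], "a2": ["b2"]}, "roots": ["a1"], "responsive": True},
    "B": {"objects": {"b1": [], "b2": ["a2"]}, "roots": [], "responsive": True},
}
print(sorted(global_mark_and_sweep(hosts)))  # ['a2', 'b2']
hosts["B"]["responsive"] = False
print(global_mark_and_sweep(hosts))          # None
```

Here `a2` and `b2` form a cycle spanning both hosts. A global marking phase can reclaim this cycle, but only when both hosts participate.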
A still further conventional method of garbage collection is known as reference counting, or weighted reference counting. Reference counting is an object-based method, which concentrates on the death of individual objects. Object-based methods keep track of the incoming pointers of each object, for example, by dynamically updating a count of them. When the count reaches zero, the object is dead and its space is reclaimed. This method, however, is even less efficient than the previously mentioned systems, because it cannot collect local cyclic garbage, let alone distributed cyclic garbage.
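The inability of reference counting to reclaim cycles can be demonstrated with a minimal sketch. The sketch shows plain reference counting rather than the weighted variant, and `RefCountedHeap` and its methods are hypothetical names.

```python
class RefCountedHeap:
    """Hypothetical heap using plain (unweighted) reference counting."""

    def __init__(self):
        self.counts = {}     # object -> number of incoming pointers
        self.fields = {}     # object -> objects it points to
        self.reclaimed = []  # objects whose space has been returned

    def new(self, obj):
        self.counts[obj] = 0
        self.fields[obj] = []

    def add_root(self, obj):
        self.counts[obj] += 1

    def drop_root(self, obj):
        self._decrement(obj)

    def store(self, src, dst):
        """Store a pointer to `dst` in a field of `src`."""
        self.fields[src].append(dst)
        self.counts[dst] += 1

    def _decrement(self, obj):
        self.counts[obj] -= 1
        if self.counts[obj] == 0:  # no incoming pointers: the object is dead
            del self.counts[obj]
            self.reclaimed.append(obj)
            for child in self.fields.pop(obj):
                self._decrement(child)


heap = RefCountedHeap()
for name in ("r", "x", "y"):
    heap.new(name)
heap.add_root("r")
heap.store("r", "x")
heap.store("x", "y")
heap.store("y", "x")  # x and y now form a cycle
heap.drop_root("r")
print(heap.reclaimed)  # ['r']
print(heap.counts)     # {'x': 1, 'y': 1}
```

After the root is dropped, `x` and `y` are unreachable, yet each still holds a count of 1 from the other. Neither count ever reaches zero, so this cyclic garbage is never reclaimed.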