The present invention is implemented in a distributed data processing system, that is, a system consisting of two or more data processing systems which are capable of functioning independently but which are so coupled as to send and receive messages to and from each other.
A local area network (LAN) is an example of a distributed data processing system. A typical LAN comprises a number of autonomous data processing "nodes", each comprising at least a processor and memory. Each node is capable of conducting data processing operations independently. In addition, each node is coupled to a network of other nodes which may be, for example, a loop, star, tree, etc., depending upon the design considerations.
As mentioned above, the present invention finds utility in such a distributed data processing system, since there is a need in such a system for processes which are executing or which are to be executed in the individual nodes to share data and to communicate data among themselves.
A "process", as used herein, is a self-contained package of data and executable procedures which operate on that data, comparable to a "task" in other known systems. Within the present invention a process can be thought of as comparable to a set (module) of subroutines in terms of size, complexity, and the way it is used. The difference between processes and subroutines is that processes can be created and terminated dynamically and can execute concurrently with their creator and other sets (modules) of "subroutines".
Within a process the data is private and cannot be accessed from the outside (i.e. by other processes). Processes can therefore be used to implement "objects", "modules", or other higher level data abstractions. Each process executes sequentially. Concurrent execution is achieved through multiple processes, possibly executing on multiple processors.
Every process in the distributed data processing system of the present invention has a unique identifier, or "connector", by which it can be referenced. The connector is assigned by the system when the process is created. The connector is used by the system to physically locate the process.
Every process also has a non-unique, symbolic "name", which is a variable-length string of characters. In general, the name of a process is known system-wide. To restrict the scope of names, the concept of a "context" is utilized. This concept is described in detail in copending U.S. Pat. applications having Ser. Nos. 000,621 and 000,624 cited in detail above. Basically, a context is a collection of related processes whose names are not known outside of the context.
A process in one context cannot symbolically communicate with, and does not know about, processes inside other contexts. All interaction across boundaries is by means of messages and passes through a "context process".
A "message" is a buffer containing data which tells a process what to do and/or supplies it with information it needs to carry out its operation. Messages are queued from one process to another by name or connector. Queuing avoids potential synchronization problems and is used instead of semaphores, monitors, etc. The sender of the message is free to continue after the message is sent. When the receiver attempts to get the message, it will be suspended until one arrives if none are already waiting in its queue. Optionally, the sender can specify that it wants to wait for a reply and is suspended until the specific message arrives. Messages from any other source are not dequeued until after that happens.
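The queuing behavior described above can be illustrated with a minimal sketch. The class name `Mailbox`, its methods, and the use of Python threads to stand in for processes are all assumptions made for illustration and are not taken from the specification. The sketch shows a non-blocking send, a receive that suspends until a message arrives, and a selective receive that leaves messages from other sources queued until the awaited reply has been taken.

```python
import threading
from collections import deque


class Mailbox:
    """Hypothetical per-process message queue (illustrative only)."""

    def __init__(self):
        self._messages = deque()
        self._ready = threading.Condition()

    def put(self, sender, body):
        # The sender is never suspended: it enqueues and continues.
        with self._ready:
            self._messages.append((sender, body))
            self._ready.notify_all()

    def get(self, only_from=None, timeout=None):
        """Suspend the caller until a suitable message is queued.

        When only_from is given, messages from any other sender remain
        queued, in arrival order, until the awaited message is taken.
        """
        with self._ready:
            while True:
                for i, (sender, body) in enumerate(self._messages):
                    if only_from is None or sender == only_from:
                        del self._messages[i]
                        return sender, body
                # Nothing suitable yet: wait for the next put().
                if not self._ready.wait(timeout):
                    raise TimeoutError("no matching message arrived")
```

A sender that wants a reply would call `put` on the receiver's mailbox and then `get(only_from=...)` on its own, which corresponds to the optional wait-for-reply mode; a plain `get()` corresponds to the ordinary receive.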
Messages provide the mechanism by which user transparency is achieved. A process located anywhere in the system may send a message to any other process anywhere within the system if the sending process has the receiving process's name or connector. This permits processes to be dynamically distributed across the system at any time to gain optimal throughput without changing the processes which reference them. Sending messages by connector obviates the need for a name search and ignores context boundaries. This is the most efficient method of communicating.
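The difference between addressing by name and addressing by connector can be sketched as follows. The `Registry` class, its method names, and the flat (non-nested) treatment of contexts are simplifying assumptions made for illustration; the actual context mechanism is described in the copending applications and is not reproduced here.

```python
import itertools


class Registry:
    """Hypothetical directory of processes (illustrative only)."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self._by_connector = {}  # connector -> (node, context, name)

    def create(self, node, context, name):
        # The system assigns the unique connector when the process is created.
        connector = next(self._next_id)
        self._by_connector[connector] = (node, context, name)
        return connector

    def resolve_name(self, sender_context, name):
        # Name search: names are non-unique, so several processes may match,
        # and only processes in the sender's own context are visible.
        return [c for c, (_, ctx, n) in self._by_connector.items()
                if ctx == sender_context and n == name]

    def locate(self, connector):
        # Direct lookup: no name search, and context boundaries are ignored.
        node, _, _ = self._by_connector[connector]
        return node
```

In this sketch a sender holding a connector reaches the target with a single lookup regardless of context, whereas a sender holding only a name must search, and the search is confined to its own context.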
In the present invention, messages are generally composed of a message ID and one or more "triples". The message ID is a word describing the purpose of the message (e.g. status) or the request (e.g. get) encoded in it. A triple is a data portion made of three fields. The first field generally identifies the type of triple. The second field indicates how many bytes of information are contained in the third field; this count may be zero (0). The third field contains the data of the message, such as a process status code.
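The message layout described above can be sketched as a simple encoder and decoder. The specification does not state field widths or byte order, so the 4-byte message ID, the 2-byte type field, the 2-byte length field, and the big-endian encoding used here are assumptions, as are the function names.

```python
import struct

# Assumed layout (not given in the text): each triple is a 2-byte type,
# a 2-byte length, then `length` bytes of data, all big-endian.
TRIPLE_HEADER = struct.Struct(">HH")
ID_SIZE = 4  # assumed width of the message ID word


def pack_message(message_id, triples):
    """Serialize a message ID followed by (type, data) triples."""
    if len(message_id) != ID_SIZE:
        raise ValueError("this sketch assumes a 4-byte message ID")
    out = bytearray(message_id)
    for triple_type, data in triples:
        # Second field: number of data bytes in the third field (may be 0).
        out += TRIPLE_HEADER.pack(triple_type, len(data))
        out += data
    return bytes(out)


def unpack_message(buf):
    """Return (message_id, [(type, data), ...]) parsed from buf."""
    message_id, offset, triples = buf[:ID_SIZE], ID_SIZE, []
    while offset < len(buf):
        triple_type, length = TRIPLE_HEADER.unpack_from(buf, offset)
        offset += TRIPLE_HEADER.size
        data = buf[offset:offset + length]
        if len(data) != length:
            raise ValueError("truncated triple")
        triples.append((triple_type, data))
        offset += length
    return message_id, triples
```

Because each triple carries its own length, a receiver can skip triples whose type it does not recognize without understanding their contents.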
In known data processing environments it is often necessary to add resources (software or hardware types) to, or remove them from, existing nodes. In addition, it is often necessary to add nodes to, or remove nodes from, the system. The connection between nodes may also become temporarily disrupted; this should not impact the correctness of the operation of the distributed service. Because of the interaction between nodes and resources of those nodes, it is essential that preexisting (remaining) nodes are notified of these additions (removals).
In the prior art, it is necessary for the system to be informed of these changes through a user interface. In addition, it is often necessary for the operation of the system to be discontinued during the reconfiguration of the system to compensate for the changes.
Presently, a local segment of a distributed resource service may lose or gain resources without the rest of the network becoming aware of the change. This change may be discovered at the next explicit access to the specific resource, but it may be too late at that time. A service could also poll all resources periodically in a logical ring. However, this polling could be too expensive in code and in performance.
In addition, presently nodes may get disconnected, restarted, and reconnected without the rest of the system becoming aware of the change. Connectors to remote resources residing on this node may become invalid with no indication to the rest of the service or to the users of the resources.
Further, some internal algorithms of the distributed services may rely on a single "master" node in order to maintain the transparency of a single virtual machine, and thus also risk the loss or modification of the "master" node without the rest of the system being notified.
Accordingly, it is an object of the present invention to provide a distributed computer system that overcomes the above deficiencies.
A further object of the present invention is to provide a distributed computer system with network and resource status monitoring.
Another object of the present invention is to provide a distributed computer system with network and resource status monitoring which operates in a manner transparent to the users.
Still another object of the present invention is to provide a distributed computer system with network and resource status monitoring which is self-configuring.
Yet another object of the present invention is to provide a distributed computer system with network and resource status monitoring which is capable of adding and removing nodes at run-time while sustaining a non-stop mode of operation.
Another object of the present invention is to provide a distributed computer system with network and resource status monitoring in which the node and resource status monitoring are coherent with each other.
Still another object of the present invention is to provide a distributed computer system with network and resource monitoring in which the monitoring is handled in a distributed fashion.
Yet another object of the present invention is to provide a distributed computer system with network and resource monitoring which does not significantly reduce the network throughput.
Another object of the present invention is to provide a distributed computer system that provides true transparency of services running on dynamically changing configurations.