This application claims priority under 35 U.S.C. §§119 to Application No. 9704565-2 filed in Sweden on Dec. 8, 1997; the entire content of which is hereby incorporated by reference.
The present invention relates to a distributed communication system which comprises a number of nodes interconnected via an interconnection network. Each node comprises a number of resources, such as for example processing resources, input/output resources and storing resources, and distributed applications are executed through the sending of messages between the resources in said nodes. The invention also relates to a node in a distributed network configuration, which node comprises a number of resources and communicates with other nodes via said interconnection network.
The invention still further relates to a method for sending messages in a distributed communication system comprising a number of nodes interconnected via an interconnecting network wherein each node comprises a number of resources. The invention also relates to a method of providing communication among user applications executing at resources located in different nodes in a distributed network through the sending of messages.
In a distributed system, which means a system based on a computer platform comprising a number of computer nodes interconnected through an interconnecting network, a number of specific problems arise which mainly are related to the communication between resources or entities, such as processes, threads and queues, residing at different nodes. Such problems among others relate to the communication between resources or entities, i.e. what operations are/should be available for communication and what are the semantics of such operations. Another issue relates to the detection of failures in a computer node and how the system reacts thereto. Still another important issue relates to the adaption of the system to changes in the available resources, such as for example when nodes are added, removed etc.
Systems implementing location transparent message passing are known. Location transparent message passing means that an entity or a resource which sends a message to another resource (a process, a thread or similar) in a distributed system does not have to know anything about the actual location of the receiving entity but merely a name is used to refer to the receiving entity and underlying software layers handle the location dependent parts of the communication. Basically two different types of location transparent message passing are known. Depending on the name that is used slightly different semantics in the message passing procedures are produced.
In a first case, the name refers to a specific resource or entity which has a particular name. In that case the name is bound to the specific entity and the name is regarded invalid if the actual entity disappears even if another entity reappears under one and the same name. This means that the life time of the name and the entity it denotes is the same.
The second case relates to a name referring to any entity which has the particular name. This means in practice that an entity carrying a given name may disappear and a new entity may be created using the same name. Both these kinds of naming provide for location transparency but they differ in the way entities or resources may relocate in the system. In the first case only migration is permitted, which means that an entity or a resource is actually moved; it does not allow destruction of an entity at one location and creation of a new entity at a new location, which however is allowable in the second case. Most distributed systems are generally location transparent or even weaker, the latter meaning that both the name and the location of the receiving entity have to be given. U.S. Pat. No. 5,133,053 shows an example of a system implementing location transparent message passing.
However, even if location transparent message passing actually is implemented this does not enable the provision of a truly scalable and fault tolerant distributed system. Scalability and tolerance to failures are two related properties since both of them are concerned with the behaviour of the system when changes occur in the available resources of the system. The same mechanisms are involved irrespective of whether a resource, such as for example an entire computer node, is removed or whether it is added. Adaption means that an application should start employing an added node, which means that it, or rather the entities residing on it, should become visible in the address space that is used when communicating in the system. Adaption also means that the entities which existed on a removed node should disappear from the communication address space.
FIG. 1A illustrates a location transparent system in which a thread named T1 sends a message to another thread named T2. Since the system applies location transparency, the sending thread T1 does not have to know the exact location of the destination thread, e.g. it does not have to know at which computer node the receiving thread T2 resides; the name T2 is sufficient.
FIG. 1B illustrates a case when T2 has disappeared from the system. This could for example be the case if a failure had occurred in node 3. However, this fact is effectively hidden to the sending thread T1 because of the location transparency and T1 still attempts to send the message using the name T2 to refer to the destination. However, the name T2 is now invalid.
Depending on which communication mechanism is used, the behaviour in such a situation may differ, but at best the sending thread is notified of the failure to deliver the message. The communication, even in the best case, thus does not offer any fault tolerance in itself; it merely gives the application, here the sending thread T1, information about the failure.
FIG. 1C illustrates the case when the disappeared thread T2 reappears at another node (here node 2). The location transparent message passing scheme will then automatically adjust to such a situation on condition that the reappearing thread has the same name, namely T2, as the disappeared thread.
Finally, FIG. 1D illustrates a situation with two instances of a thread T2, which means that two threads are functionally equivalent to the extent that they both are possible receivers of a message sent by T1. Through adding one more thread with the same functionality, the throughput of the application could potentially scale up. This scaling, however, does not mean that the application has to be rewritten, it is merely a question of creating one more copy of a thread. The main drawback is that, if location transparent communication, or any weaker form of communication, is used, the performance scaling will not take place since T1 still explicitly addresses T2 when sending messages.
Through the examples illustrated in FIGS. 1A-1D, it becomes evident that location transparent message passing can handle situations in which the locality of a thread changes. However, it cannot to a satisfactory extent handle situations with disappearing threads in a way which makes a distributed application fault-tolerant. Furthermore, scalability of an application is not supported since the application code is written to take advantage of specific, explicitly named resources (threads in the examples referred to above) and the addition of new resources requires changes to be made in the application code.
What is needed is therefore a distributed communication system which is efficient and easy to implement. Particularly a system is needed which permits scalability, i.e. which at least to a desired extent enables the use of newly added nodes, resources or similar, and which can handle the situation when nodes or resources are removed, when failures occur etc., i.e. a system which is tolerant to failures. In other words a system is needed which can handle changes as the system evolves, which can efficiently take advantage of such changes and which can handle the situation when errors occur.
A node is also needed which operates in a distributed environment and which fulfills the above mentioned objects and wherein a mechanism for sending of messages is used enabling the objects as referred to above to be achieved.
Furthermore a method of sending messages in a distributed communication system is needed which is efficient, allows scalability and which enables the handling of failures, i.e. which enables fault tolerant distribution of messages. Furthermore a system, a node and a method respectively are needed which are easy to manage.
Therefore a distributed communication system as initially referred to is provided wherein the resources are categorized or grouped into a number of function types. Resources which are grouped into one and the same function type are functionally equivalent at least to a given extent so that they all are possible receivers of a message intended for a resource of the function type. A number of function type instances, each corresponding to a particular resource, or type instance, are provided for each function type. Each node furthermore comprises information holding means keeping information about which resources/function type instances correspond to a given function type, and a distribution function is associated with said information holding means and selects a receiving function type instance among the instances available in the information holding means. For sending a message from a sending node, only the function type of the receiving resource is given as address information and the distribution function selects which function type instance will be the actual receiver among the function type instances associated with the input function type, so that messages from a sending resource are sent independently of the location of the resources with which the sending resource communicates as well as independently of the number and names thereof. Examples of resources are processing resources, input/output resources and storing resources.
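The addressing scheme described above can be illustrated with a minimal sketch. It is not taken from the specification: the function type name "billing", the instance names, the in-memory dictionary standing in for the information holding means and the round-robin policy are all assumptions made for illustration only.

```python
from itertools import count

# Hypothetical stand-in for the information holding means: maps a function
# type name to the names of its currently known function type instances.
instances_by_type = {"billing": ["billing@node1", "billing@node2"]}

_counter = count()


def round_robin(message, instances):
    """Example distribution function: cycle through the available instances."""
    return instances[next(_counter) % len(instances)]


def send(function_type, message, distribute=round_robin):
    """Address a message by function type only and return (receiver, message)."""
    instances = instances_by_type.get(function_type, [])
    if not instances:
        raise LookupError(f"no instance available for {function_type!r}")
    receiver = distribute(message, instances)
    # A real system would now pass the message on using ordinary
    # location transparent message passing; this sketch only returns it.
    return receiver, message
```

The sender names only "billing"; which instance receives the message, and how many instances exist, is decided by the distribution function and the table contents.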
Distribution state information is held in each node, particularly each computer node, regarding the currently available entities of any function type that some entity at the node has declared its intention to communicate with. Alternatively distribution state information can be held without any declaration about intention to communicate with particular function types. Advantageously the information holding means are updated, particularly continuously, so that in each node information is provided about which resources of at least a number of function types that actually are available for messages addressing the corresponding function types. In that way information about added/removed nodes is provided, making the system scalable and tolerant to failures.
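How such distribution state might be kept current when nodes appear or disappear can be sketched as follows. The class name, method names and the (node, instance name) representation are hypothetical; the specification does not prescribe how updates are delivered to a node.

```python
from collections import defaultdict


class DistributionState:
    """Per-node view of which instances are available per function type."""

    def __init__(self):
        # function type -> set of (node, instance name) pairs
        self._available = defaultdict(set)

    def instance_added(self, function_type, node, name):
        """Record a newly available instance, e.g. after a node was added."""
        self._available[function_type].add((node, name))

    def node_removed(self, node):
        """Drop every instance that resided on a removed or failed node."""
        for instances in self._available.values():
            instances.difference_update({i for i in instances if i[0] == node})

    def available(self, function_type):
        """Return the instances currently addressable for a function type."""
        return sorted(self._available.get(function_type, ()))
```

Because senders consult this state on each send, an instance on a removed node simply stops being a candidate receiver, and an instance on an added node becomes one.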
In an advantageous embodiment the information carrier protocol is TCP/IP (Transmission Control Protocol/Internet Protocol).
Advantageously messages are sent out by the distribution function, when a resource has been selected (i.e. a selected particular function type instance) using common location transparent message passing. Advantageously the distribution function executes in the same context as the sending resource.
Particularly the distribution function is specific for a particular function type and it is supplied and registered by said particular function type. Alternatively a number of distribution functions are predefined.
According to the invention, a resource which intends to send a message to a resource of a given function type invokes a send operation. Said send operation comprises information about the message and about the destination function type.
In an advantageous embodiment, which however is not necessary for the functioning of the present invention, a node may invoke a connect operation for each of a number of function types with which the node intends to communicate. In that manner a general reference is provided for subsequent send operation commands.
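The connect and send operations can be pictured with the following sketch. The names `connect`, `Connection` and `STATE`, the function type "storage" and the random selection policy are assumptions for illustration; the specification only states that connect yields a general reference used by subsequent send operations.

```python
import random

# Hypothetical distribution state: function type -> available instance names.
STATE = {"storage": ["storage-1", "storage-2", "storage-3"]}


class Connection:
    """General reference to a function type, as returned by connect()."""

    def __init__(self, function_type, rng):
        self.function_type = function_type
        self._rng = rng

    def send(self, message):
        # The state is consulted on every send, so instances added or
        # removed after connect() are taken into account.
        instances = STATE.get(self.function_type, [])
        if not instances:
            raise LookupError(f"no instance of {self.function_type!r} available")
        receiver = self._rng.choice(instances)
        return receiver, message  # stand-in for the actual message hand-off


def connect(function_type, rng=None):
    """Declare the intention to communicate with a function type."""
    return Connection(function_type, rng or random.Random())
```

A sender would call `connect("storage")` once and then `send(...)` on the returned reference; it never names an individual instance.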
Advantageously the function of a distributed application sending messages is decoupled from the configuration of the distributed application. Particularly the number of resources varies and the system scales automatically at runtime, enabling use of newly added resources or the adaption to the loss of one or more resources (as well as entire nodes).
According to the invention a function type is given by the association of a global name and all instances of the same function type execute the same function. Particularly the distribution function is provided by an application and invoked by the send operation.
In an advantageous embodiment the information holding means comprises a distribution state table which contains the function type instances available for each of a number of function types. Which function types are included in the table may for example be provided through the node and the connect operation referred to above in which an indication is provided about which function types a node/resource intends to communicate with. However, this can also be decided in other manners. In one particular embodiment all function types are included but this depends on the complexity of the system and its capacity, storing means etc.
Advantageously, for each function type included in the distribution state table, a function handling means is provided which contains the function type name, its associated distribution function and the function type instances, at least including the instance name. According to the invention a distributed function can scale without the application having to be rewritten or recompiled, i.e. it can start using newly added nodes (resources and nodes) and also handle the loss of nodes/resources.
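One possible shape for a distribution state table and its per-function-type entries is sketched below. The class and field names, and the trivial `first_available` distribution function, are hypothetical; only the three pieces of content per entry (function type name, distribution function, instances) come from the description.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A distribution function takes the message and the candidate instance
# names and returns the name of the chosen instance.
DistributionFunction = Callable[[object, List[str]], str]


def first_available(message, instances):
    """Trivial example distribution function: always pick the first instance."""
    return instances[0]


@dataclass
class FunctionHandler:
    """One table entry: type name, distribution function, known instances."""
    function_type: str
    distribute: DistributionFunction = first_available
    instances: List[str] = field(default_factory=list)


@dataclass
class DistributionStateTable:
    entries: Dict[str, FunctionHandler] = field(default_factory=dict)

    def register(self, handler):
        self.entries[handler.function_type] = handler

    def select(self, function_type, message):
        handler = self.entries[function_type]
        return handler.distribute(message, handler.instances)
```

Since the instance list lives in the table rather than in application code, adding an instance changes what `select` can return without the sending application being rewritten.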
In an advantageous embodiment a separately executing distribution service is provided for maintaining coherent information in some or all nodes. In one embodiment the separately executing distribution function is integrated in the managing system, or the communication system itself. In an alternative embodiment it is arranged hierarchically above the system.
A node in a distributed communication system which comprises a number of resources and communicates with other nodes via an interconnection network is therefore also provided. The resources are grouped into a number of function types, wherein resources grouped into the same function type form so-called function type instances of said function type and are at least to a given extent functionally equivalent. A node furthermore comprises information holding means which are provided for holding information about which function type instances correspond to a given function type. Furthermore it comprises a distribution function which is associated with the information holding means and which is used for selecting which function type instance will be the actual receiver of a particular message, so that only function type information has to be given as address information when a node or a resource sends a message.
Particularly location transparent message passing is used when a message is sent to a selected function type instance. The information holding means comprises a distribution state table which contains the function type instances that are available for each of a number of function types. Particularly, for each function type in the table, function handling means are provided which contain the function type name, the associated distribution function and the function type instances, including at least the instance name for each of the instances.
A method for sending messages in a distributed communication system comprising a number of nodes which are interconnected by an interconnecting network is therefore also provided. Each node comprises a number of resources. According to the method the resources are grouped into a number of function types depending on functionality. Furthermore information about the resources grouped depending on functionality is stored in a table in each node, or at least in a number of nodes which are decided according to any appropriate criteria. The function type is given as addressing information when a message is to be sent to a resource of a given function type. The information in the table is used and the distribution function is invoked. The distribution function uses the message, the information in the information holding means relevant for the addressed function type and optional application defined information to select a particular instance. The selected instance is returned and the message is location-transparently sent to the selected instance. Advantageously the method comprises a step of continuously updating the information holding means. Thereby information is provided in each node about which particular function type instances of at least a number of function types actually are available for messages addressing the corresponding function types. This enables scaling up/down of the application at runtime.
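The steps of the method can be sketched end to end as follows. The hash-based selection, the `key_field` convention for the application defined information, and the `transport` callback standing in for location transparent message passing are assumptions chosen for illustration, not features stated in the specification.

```python
import zlib


def key_hash_distribution(message, instances, app_info=None):
    """Pick an instance from a hash of one message field.

    app_info is optional application defined information; in this sketch it
    names the message field to hash (a hypothetical convention).
    """
    field_name = (app_info or {}).get("key_field", "key")
    key = str(message[field_name]).encode("utf-8")
    return instances[zlib.crc32(key) % len(instances)]


def send_to_function_type(table, function_type, message, transport, app_info=None):
    """Send a message addressed only by function type; return the receiver."""
    # 1. Look up the entry stored for the addressed function type.
    distribute, instances = table[function_type]
    if not instances:
        raise LookupError(f"no instance available for {function_type!r}")
    # 2. Invoke the distribution function to select one instance.
    receiver = distribute(message, instances, app_info)
    # 3. Hand over to the transport using only the instance name, i.e.
    #    without the sender knowing where the instance resides.
    transport(receiver, message)
    return receiver
```

With this particular distribution function, messages carrying the same key are sent to the same instance for as long as the list of instances is unchanged; other policies can be substituted without touching the send path.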
A method of providing communication among applications executing at resources located at a number of different nodes in a network interconnecting said nodes, wherein the actual locations of the nodes are transparent to the applications, is also provided. It comprises the steps of maintaining, in storing means in each node, information about resources which, depending on their functionality, are grouped into a number of function type instance groups, one for each particular function type, and accessing a distribution function by giving the function type of a resource as the receiver of the message, said distribution function using the information in the information holding means, the message and the selection algorithms to select a particular function type instance as the receiver of the message. Then the message is sent to the selected function type instance using location transparent message passing.