This application is based upon and claims the benefit of priority from the prior Japanese Patent Application No. 11-229355, filed Aug. 13, 1999, the entire contents of which are incorporated herein by reference.
The present invention relates to a method for synchronizing a program, such as a deterministic program, by using a reliable ordered multicast in a distributed computer system having a plurality of computers connected by a network, and also concerns such a distributed computer system, a computer, and a storage medium.
First, an explanation will be given of a deterministic program, a reliable ordered multicast and a synchronizing process that are used in the present specification.
The deterministic program is explained as follows: As illustrated in FIG. 1, a deterministic program 12 is designed in such a manner that, upon application of an input to a computer 10, the output and the next status are determined depending on the status 11 of the computer 10 at that time. In other words, in the deterministic program 12, once the input is determined, the next status 11 and the output are uniquely determined. More specifically, it refers to a program that makes no reference to undefined values or random numbers. The concept of such a deterministic property has been widely used in the field of automata.
As illustrated in FIG. 2, the characteristic of the deterministic program is that once the initial status and an input string have been determined, the operation is uniquely determined. Hereinafter, the deterministic program is referred to simply as the program.
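By way of illustration only, the deterministic property described above can be sketched as a transition function whose output and next status depend only on the current status and the input. The names `step` and `run` and the arithmetic inside them are hypothetical and do not appear in the figures.

```python
from typing import List, Tuple

State = int  # hypothetical status: a single running total


def step(state: State, data: int) -> Tuple[State, int]:
    """Deterministic transition: (status, input) -> (next status, output)."""
    next_state = state + data
    output = next_state * 2
    return next_state, output


def run(initial: State, inputs: List[int]) -> Tuple[State, List[int]]:
    """Replay an input string starting from an initial status."""
    state = initial
    outputs: List[int] = []
    for data in inputs:
        state, out = step(state, data)
        outputs.append(out)
    return state, outputs


# The same initial status and the same input string always give the same result.
first = run(0, [1, 2, 3])
second = run(0, [1, 2, 3])
```

Because `step` reads no clock, random number, or uninitialized value, replaying the same input string from the same initial status reproduces the same operation.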
Moreover, the reliable ordered multicast is explained as follows: In an environment such as a distributed computer system in which a plurality of computers are connected through a network, the respective computers are allowed to operate independently. Therefore, a special scheme is required so as to operate these computers in a synchronized manner. The reliable ordered multicast, which is one of such schemes, is a protocol which distributes data from each computer to all the computers, and which ensures that the order of arrivals of pieces of data is the same in all the computers.
Referring to FIG. 3, a specific example is given of the reliable ordered multicast. Data A, which was transmitted from a computer 10-2 at time t20, is received by all the computers 10-1, 10-2 and 10-3 at times t11, t21 and t31 through a reliable ordered multicast, not shown. Data B, transmitted from a computer 10-3 at time t30, is received by all the computers 10-1, 10-2 and 10-3 at times t12, t22 and t32. In this case, data A and data B are received by the respective computers 10-1, 10-2 and 10-3, and the reliable ordered multicast controls the system so that the order of receipt of these two pieces of data is the same in all the computers 10-1, 10-2 and 10-3.
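The ordering guarantee in the example of FIG. 3 can be sketched as follows. The specification does not state how the protocol is implemented; this sketch assumes a single in-process sequencer (one commonly used approach) and omits the network and any failure handling. The class name `Sequencer` and its methods are hypothetical.

```python
from typing import Dict, List, Tuple

Message = Tuple[int, str, str]  # (sequence number, sender, data)


class Sequencer:
    """Hypothetical in-process stand-in for a reliable ordered multicast."""

    def __init__(self, members: List[str]) -> None:
        self.next_seq = 0
        # One delivery log per computer, in order of arrival.
        self.delivered: Dict[str, List[Message]] = {m: [] for m in members}

    def multicast(self, sender: str, data: str) -> int:
        """Assign one global sequence number and deliver to every member."""
        seq = self.next_seq
        self.next_seq += 1
        for log in self.delivered.values():
            log.append((seq, sender, data))
        return seq


bus = Sequencer(["10-1", "10-2", "10-3"])
bus.multicast("10-2", "A")  # data A from computer 10-2
bus.multicast("10-3", "B")  # data B from computer 10-3

# Order of arrival as seen by each computer.
orders = {name: [data for _, _, data in log] for name, log in bus.delivered.items()}
```

Every computer, including each sender, sees data A before data B, which is the property the multicast is required to provide.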
Moreover, the synchronizing process is explained as follows: In the distributed computer system, any of the computers might fail independently of the others. Supposing that one failed computer causes a malfunction in the entire system, the operating ratio of the distributed computer system becomes lower than the operating ratio of any one of the computers.
In order to prevent such a problem, it is necessary to multiplex processes that relate to the entire system. The synchronizing process, which provides such multiplexing, makes it possible to set the operating ratio of the distributed computer system higher than the operating ratio of any one of the computers. For example, in the case when a distributed computer system, constituted by 10 computers each having an operating ratio of 99%, is not multiplexed at all, the operating ratio of the distributed computer system is approximately 90%. Here, when a process is multiplexed by two computers each having an operating ratio of 99%, the process has an operating ratio of approximately 99.99%.
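The figures above can be checked with a short calculation, assuming that the computers fail independently of one another (an assumption not stated in the text). The function names are hypothetical.

```python
def series_ratio(ratio: float, count: int) -> float:
    """Operating ratio when all `count` computers must be up at once."""
    return ratio ** count


def multiplexed_ratio(ratio: float, copies: int) -> float:
    """Operating ratio when at least one of `copies` computers must be up."""
    return 1.0 - (1.0 - ratio) ** copies


not_multiplexed = series_ratio(0.99, 10)   # 0.99 ** 10, about 0.904
duplexed = multiplexed_ratio(0.99, 2)      # 1 - 0.01 ** 2, about 0.9999
```

Under that assumption, ten computers that must all be operating yield roughly 90%, while a process duplicated on two computers yields roughly 99.99%.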
Next, referring to FIG. 4, an explanation will be given of a synchronizing method by using the reliable ordered multicast. In this example, in a distributed computer system having computers 10-1, 10-2 and 10-3, a program execution is multiplexed by using the reliable ordered multicast.
As illustrated in FIG. 4, first, all the computers 10-1, 10-2 and 10-3 are started with a predetermined initial status 11 in which, for example, all variables are set to zero. Input data is always distributed to all the computers 10-1, 10-2 and 10-3 through a reliable ordered multicast 13 and is input to the respective programs 12. Here, the output of any one of the computers is taken as the output of the system (in FIG. 4, that of computer 10-1). The reliable ordered multicast 13 ensures that the input string of each program has the same order, so that, because of the deterministic property of the program, all the computers 10-1, 10-2 and 10-3 are maintained in the same status 11 and their output strings are also the same. In other words, the execution of the program is multiplexed.
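The arrangement of FIG. 4 can be sketched as follows, assuming a trivial in-process loop in place of the reliable ordered multicast 13 and a hypothetical program that accumulates its inputs. The names `Replica` and `ordered_multicast` are illustrative only.

```python
from typing import List


class Replica:
    """One computer running the same deterministic program."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.status = 0              # predetermined initial status: all zero
        self.outputs: List[int] = []

    def deliver(self, data: int) -> None:
        # Hypothetical program: add the input to the status and output the sum.
        self.status += data
        self.outputs.append(self.status)


def ordered_multicast(replicas: List[Replica], data: int) -> None:
    """Stand-in for the multicast: every replica receives data in one order."""
    for replica in replicas:
        replica.deliver(data)


replicas = [Replica(name) for name in ("10-1", "10-2", "10-3")]
for data in (5, 7, 11):
    ordered_multicast(replicas, data)

# The output of any one computer is taken as the output (here, 10-1).
system_output = replicas[0].outputs
```

Since each replica starts from the same status and receives the same input string in the same order, any replica could supply the output in place of computer 10-1.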
Next, an explanation will be given of the difference between a system in which synchronizing is made by the reliable ordered multicast and a system in which synchronizing is made by using a master/slave method. In the master/slave method, while a program is being executed on a master computer, each status is transferred to a slave computer periodically, and in the event of any fault of the master, execution of the program is switched to the slave side; thus, a synchronizing process is achieved.
However, in the case of the master/slave method, back tracking occurs at the time of each taking over, with the result that the switching process at the time of any fault of the computer becomes complex and time-consuming.
In contrast, in the case of the application of the reliable ordered multicast, no back tracking occurs at the time of any fault of the computer so that the switching process is simple and no time-consuming task is required.
Moreover, in the master/slave system, overhead is required for copying each status regularly; however, in the application of the reliable ordered multicast, no overhead is required.
In this manner, with respect to processes relating to the reliability and performance of the entire system, it is preferable to use the reliable ordered multicast so as to carry out synchronizing.
The master/slave method, on the other hand, is suitable for cases in which a non-deterministic program is executed or in which executing a program on the slave side is not preferable.
The synchronizing method by the use of the reliable ordered multicast is based upon the premise that all the computers are operated from the beginning. However, in an actual operation, there are cases in which a synchronizing process has to be started in the middle of the operation. For example, such cases include cases in which a computer which has failed is recovered and in which a computer is newly added. In these cases, it is necessary to expand the synchronizing process.
Referring to FIG. 5, an explanation will be given of a conventional method for expanding the synchronizing process by the use of the reliable ordered multicast. In FIG. 5, at Step 1, the reliable ordered multicast 13 is temporarily stopped. Next, at Step 2, the status 11 is copied on the computer 10-3 to be included. Next, at Step 3, the group of the reliable ordered multicast 13 is expanded and resumed.
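Steps 1 to 3 of the conventional method can be sketched as follows. The sketch assumes that inputs arriving while the multicast is stopped are held and delivered on resumption, which the text does not specify; the classes `Computer` and `Group` and their methods are hypothetical.

```python
import copy
from typing import Dict, List


class Computer:
    def __init__(self, name: str) -> None:
        self.name = name
        self.status: Dict[str, int] = {"total": 0}

    def deliver(self, data: int) -> None:
        self.status["total"] += data


class Group:
    """Hypothetical multicast group; delivery is an in-process loop."""

    def __init__(self, members: List[Computer]) -> None:
        self.members = list(members)
        self.stopped = False
        self.held: List[int] = []

    def multicast(self, data: int) -> None:
        if self.stopped:
            self.held.append(data)   # assumption: hold inputs while stopped
            return
        for member in self.members:
            member.deliver(data)

    def stop(self) -> None:
        self.stopped = True

    def expand_and_resume(self, newcomer: Computer) -> None:
        self.members.append(newcomer)
        self.stopped = False
        held, self.held = self.held, []
        for data in held:
            self.multicast(data)


c1, c2, c3 = Computer("10-1"), Computer("10-2"), Computer("10-3")
group = Group([c1, c2])
group.multicast(3)

group.stop()                           # Step 1: temporarily stop the multicast
group.multicast(4)                     # arrives during the stop and is held
c3.status = copy.deepcopy(c1.status)   # Step 2: copy the status to 10-3
group.expand_and_resume(c3)            # Step 3: expand the group and resume
```

With a correct copy in Step 2, computer 10-3 resumes in the same status as the others and receives the held input together with them.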
In this method, supposing that a copy of any status is not appropriately carried out, the operation of the computer that has been included becomes different from the other computers. Of course, since the copy is made after the reliable ordered multicast has been temporarily stopped to maintain an invariable status, such an event will never occur in principle, and all the computers are allowed to start with the same operation when the reliable ordered multicast is resumed.
However, in the case when the status of a computer is complex, it is not easy to acquire the status accurately, and the copying procedure might contain a bug in some cases. In such a case, in the conventional system, only the computer that has been included malfunctions, and the other computers are operated normally, resulting in a problem in which it becomes all the more difficult to discover any defect.
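The failure mode described above can be illustrated with a deliberately faulty copy. The status fields, the averaging program, and the function `buggy_copy` are all hypothetical and chosen only to make the divergence visible.

```python
from typing import Dict

Status = Dict[str, int]


def deliver(status: Status, data: int) -> float:
    """Hypothetical program: output the running average of all inputs."""
    status["total"] += data
    status["count"] += 1
    return status["total"] / status["count"]


def buggy_copy(src: Status) -> Status:
    # Deliberate bug for illustration: "count" is not carried over.
    return {"total": src["total"], "count": 0}


existing = {"total": 10, "count": 2}   # status of a computer already running
included = buggy_copy(existing)        # status given to the included computer

out_existing = deliver(existing, 5)    # 15 / 3
out_included = deliver(included, 5)    # 15 / 1
```

The existing computer keeps producing correct output, so nothing in its behavior reveals that the included computer has diverged.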
The object of the present invention is to provide a program synchronizing method capable of synchronizing a deterministic program by using a reliable ordered multicast independent of the status of the computer, as well as a distributed computer system, a computer and a storage medium.
In order to solve the above-mentioned problem, the method in accordance with the present invention for synchronizing a program that is executed on one of a plurality of computers in a distributed computer system by using a reliable ordered multicast comprises the steps of: generating a new process comprising a program and the status in execution on a computer; and transferring the new process through the reliable ordered multicast to the computers, respectively.
Moreover, a system for synchronizing a program in accordance with the present invention, which is applied to a distributed computer system having a reliable ordered multicast, comprises: means for generating a new process comprising a program and the status in execution on one of the computers; and means for transferring the new process through the reliable ordered multicast to the computers, respectively.
Furthermore, a computer in accordance with the present invention, to be adapted to a distributed computer system using a reliable ordered multicast, comprises: generation means for generating a new process comprising a program and the status in execution on the computer; and transferring means for transferring the new process through the reliable ordered multicast to the other computers of the distributed computer system.
A computer readable storage medium in accordance with the present invention is readable by a computer, is applied to one of a plurality of computers in a distributed computer system having a reliable ordered multicast, and stores a synchronizing method including the steps of: generating a new process comprising a program and the status in execution on a computer; and transferring the new process through the reliable ordered multicast to the computers, respectively.
In accordance with the present invention, the process in execution need not be expanded, and a new process is generated separately so that the status is transferred or copied from the process in execution to the new process; therefore, any possible error or the like at the time of acquiring the status does not adversely affect the synchronizing process. Moreover, unlike the conventional system, it is not necessary to expand the reliable ordered multicast itself, so that it is possible to simplify the reliable ordered multicast protocol as compared with the conventional system, and consequently to improve performance and reliability.
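One possible reading of the two steps summarized above is sketched below, purely as an illustration. It assumes that all computers already belong to the multicast group, that a new process is described by a message carrying a program and a copy of the status, and that this message travels through the same ordered stream as ordinary inputs. The message layout and the names `Computer`, `multicast`, and `accumulate` are hypothetical.

```python
import copy
from typing import Callable, Dict, List, Tuple

Status = Dict[str, int]
Program = Callable[[Status, int], int]


def accumulate(status: Status, data: int) -> int:
    """Hypothetical deterministic program."""
    status["total"] += data
    return status["total"]


class Computer:
    def __init__(self, name: str) -> None:
        self.name = name
        self.processes: Dict[str, Tuple[Program, Status]] = {}

    def deliver(self, msg: tuple) -> None:
        if msg[0] == "spawn":
            # Start the new process from the program and status in the message.
            _, pid, program, status = msg
            self.processes[pid] = (program, copy.deepcopy(status))
        elif msg[0] == "input" and msg[1] in self.processes:
            _, pid, data = msg
            program, status = self.processes[pid]
            program(status, data)


def multicast(computers: List[Computer], msg: tuple) -> None:
    """Stand-in for the reliable ordered multicast (in-process loop)."""
    for computer in computers:
        computer.deliver(msg)


computers = [Computer(name) for name in ("10-1", "10-2", "10-3")]
c1 = computers[0]
c1.processes["p1"] = (accumulate, {"total": 7})  # process in execution on 10-1

# Generate a new process from the program and the status in execution,
# then transfer it to all the computers through the multicast.
program, status = c1.processes["p1"]
multicast(computers, ("spawn", "p2", program, status))
multicast(computers, ("input", "p2", 5))
```

In this reading, every computer starts the new process from the same transferred status at the same point in the input order, so even an inaccurately acquired status would be shared by all copies, and the process in execution is left untouched.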
Additional objects and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objects and advantages of the invention may be realized and obtained by means of the instrumentalities and combinations particularly pointed out hereinafter.