This invention relates to a method of synchronizing a plurality of controllable clock circuits for use in a loosely coupled distributed system comprising a plurality of processors which include the controllable clock circuits, respectively, and which are coupled to one another.
A loosely coupled distributed system is, for example, a multiprocessor system that comprises a plurality of processors having no common memory. Such a multiprocessor system may be a distributed filing system, a communication network system, and so on. The distributed filing system comprises the processors, each of which creates or compiles a plurality of files. Each processor includes a local memory, such as a magnetic disk, and a controllable clock circuit indicating a local time. Each file is stored in the local memory of its processor together with the local time at the creating or compiling time instant at which the file is created or compiled. Each file is assigned a version or an operation condition. The version of the file is often identified by using the creating or the compiling time instant of the file. In this case, the controllable clock circuits of all processors must indicate local times which are equal to one another and which serve as a system time instant for the distributed filing system.
Inasmuch as each processor, for example, successively stores its own history information in its local memory together with the system time instant, information for assisting in locating a failure can be obtained from the history information when the failure occurs.
The communication network system comprises, as the processors, a plurality of nodes which are connected to one another through transmission paths. Also in the communication network system, the controllable clock circuits of all processors must indicate the local times which are equal to one another as a network time instant for the communication network system. This is necessary because, for example, a service may be offered at a predetermined reservation time instant by connecting the transmission paths together.
Various clock synchronization methods are already known. By way of example, a clock synchronization method is disclosed in an article which is contributed by Riccardo Gusella et al to Proceedings of the 6th International Conference of IEEE on Distributed Computing Systems (May 1986), pages 364-371, under the title of "An Election Algorithm for a Distributed Clock Synchronization Program". The clock synchronization method according to the Gusella et al article is designed for a distributed network clock synchronizer which is called TEMPO and which is for Berkeley UNIX 4.3BSD computer systems. TEMPO works in a local area network where a specific one of the processors is called a master processor and each of the remaining processors is called a slave processor. The master processor inquires of all processors, at a predetermined time interval, about the local times indicated by the controllable clock circuits of the respective processors and then computes the network time instant as an average of the local times. The master processor issues an instruction for clock adjustment that includes the network time instant. Each processor adjusts its own controllable clock circuit by using the instruction issued from the master processor. Therefore, the clock synchronization method of TEMPO is called a master-slave method. In the master-slave method, one processor must become a new master processor when a failure happens to a previous master processor. According to the Gusella et al article, an election algorithm or method using a time-out scheme is implemented. A first one of the slave processors, whose timer circuit expires after the failure has occurred in the previous master processor, becomes a candidate for the new master processor. The candidate broadcasts an election message to all processors, notifying them of its candidacy, and the new master processor is elected from among such slave processors.
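The averaging step of the master-slave method described above may be illustrated by the following minimal sketch. It is not taken from the Gusella et al article: the function name `master_round`, the processor names, and the representation of local times as plain numbers are assumptions made only for illustration.

```python
from statistics import mean


def master_round(local_times):
    """One polling round of a master-slave scheme (illustrative only).

    local_times maps a hypothetical processor name to the local time that
    the processor reported to the master. The function returns the network
    time instant (the average of all reported local times) and, for each
    processor, the correction that would bring its clock to that instant.
    """
    network_time = mean(local_times.values())
    adjustments = {name: network_time - t for name, t in local_times.items()}
    return network_time, adjustments


# Hypothetical readings collected by the master in one round.
reported = {"master": 100.0, "slave_a": 98.0, "slave_b": 102.0, "slave_c": 104.0}
network_time, adjustments = master_round(reported)
print(network_time)   # 101.0
print(adjustments)    # slave_a is advanced by 3.0, slave_c is set back by 3.0
```

In this sketch the master alone handles every reading and every correction, which is the concentration of traffic and computation that becomes a burden as the number of processors grows.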
At any rate, inasmuch as the master processor inquires of all processors about the local times, traffic concentrates on the master processor, and the overhead of computing the average of the local times increases in the master processor when there are a large number of processors.
Another clock synchronization method is described in an article which is contributed by Flaviu Cristian et al to Proceedings of the 16th Annual International Symposium of IEEE on Fault-Tolerant Computing (1986), pages 218-223, under the title of "Clock Synchronization in the Presence of Omission and Performance Faults, and Processor Joins". In the clock synchronization method according to the Cristian et al article, each processor in the communication network system periodically diffuses a message for establishing synchronization that includes its own local time. Each processor adjusts its own controllable clock circuit by using the message which includes the fastest local time. The controllable clock circuits of all processors therefore indicate the fastest local time as the network time instant when clock synchronization is maintained. As a result, the network time instant is controlled by a specific processor having the controllable clock circuit which indicates the fastest local time. Therefore, the specific processor corresponds to the master processor which is described in the Gusella et al article. When the controllable clock circuits of the respective processors indicate local times which are nearly equal to one another, the messages for establishing synchronization are almost simultaneously transmitted from the processors. In this case, processing for synchronizing the controllable clock circuits is repeatedly carried out in a concentrated manner at a particular time instant.
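The adjustment rule described above, in which every processor follows the fastest local time, may be sketched as follows. The sketch assumes that "fastest" means the local time which is furthest ahead, and it ignores transmission delay and the fault handling of the Cristian et al article; the function name `adopt_fastest` is hypothetical.

```python
def adopt_fastest(own_time, received_times):
    """Return the local time a processor adopts after one diffusion round.

    Assumption: the fastest local time is the largest value among the
    processor's own local time and the local times carried by the
    synchronization messages it received.
    """
    return max([own_time, *received_times])


# Hypothetical local times of three processors before the round.
clocks = [100.0, 98.5, 101.2]

# Each processor receives the local times of the other two processors.
after = [
    adopt_fastest(t, clocks[:i] + clocks[i + 1:])
    for i, t in enumerate(clocks)
]
print(after)  # [101.2, 101.2, 101.2]
```

Every clock ends at the value of the third processor, which is why that processor effectively plays the role of a master.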
Still another example is disclosed in an article which is contributed by Leslie Lamport et al to Journal of the Association for Computing Machinery, Vol. 32, No. 1 (January 1985), pages 55-78, under the title of "Synchronizing Clocks in the Presence of Faults". In the clock synchronization method according to the Lamport et al article, all processors in the system periodically transmit the local times indicated by the controllable clock circuits of the respective processors to one another. Each processor controls its own controllable clock circuit by using the transmitted time instants and its own local time. The clock synchronization method according to the Lamport et al article is therefore called a fully distributed algorithm or method. The fully distributed method focuses on fault tolerance against even malicious failures. However, the fully distributed method is disadvantageous in that communication overhead increases in a practical large system. This is because all processors communicate with one another.
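The growth of the communication overhead can be made concrete with a simple count. The sketch below assumes that, in each round, every processor sends exactly one message to every other processor; the function name `full_exchange_messages` is hypothetical and the count is only an illustration, not a figure from the Lamport et al article.

```python
def full_exchange_messages(n):
    """Messages per round when each of n processors sends its local time
    to each of the other n - 1 processors (illustrative assumption)."""
    return n * (n - 1)


for n in (4, 16, 64):
    print(n, full_exchange_messages(n))
# 4 12
# 16 240
# 64 4032
```

Under this assumption the number of messages per round grows with the square of the number of processors.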
In order to remove the disadvantages discussed above, a clock synchronization method has been proposed by Syoichiro Nakai et al, the present joint inventors, in a paper read in "A Symposium on Integrity and Reliability for Information Communication Network" under the auspices of "Denshi Joho Tushin Gakkai (The Institute of Electronics, Information and Communication Engineers of Japan)" (July 1987), pages 35-38, under the title of "A Study on Clock Synchronization in Network Systems" according to contributors' translation. In the clock synchronization method described in the Nakai et al paper, the processors are equal in number to a first predetermined number which is not less than three. Each of the processors randomly and periodically selects a plurality of processors as selected processors from all processors. The selected processors are equal in number to a second predetermined number which is not less than two but is less than the first predetermined number. Each processor reads, as read time instants, the local times indicated by the controllable clock circuits of the respective selected processors and then controls its own controllable clock circuit by using its own local time and the read time instants. In order to read the local times, each processor transmits inquiry messages to the respective selected processors and then receives from the respective selected processors acknowledgement messages including the local times indicated by the controllable clock circuits of the respective selected processors.
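The random selection described above may be sketched as follows. The constraints on the first and the second predetermined numbers come from the description; everything else is an assumption made for illustration. In particular, the description does not state how the read time instants are combined or whether a processor may select itself, so the plain average in `adjusted_time` and the exclusion of the selecting processor are hypothetical choices, as are all names.

```python
import random
from statistics import mean


def select_processors(own_id, all_ids, k, rng):
    """Randomly pick k processors to be read in one synchronization period.

    len(all_ids) is the first predetermined number (not less than three);
    k is the second predetermined number (not less than two, less than the
    first). Assumption: a processor does not select itself.
    """
    if len(all_ids) < 3:
        raise ValueError("at least three processors are required")
    if not 2 <= k < len(all_ids):
        raise ValueError("k must satisfy 2 <= k < number of processors")
    candidates = [p for p in all_ids if p != own_id]
    return rng.sample(candidates, k)


def adjusted_time(own_time, read_times):
    """Assumption: combine the own local time and the read time instants
    by a plain average. The source does not specify the combining rule."""
    return mean([own_time, *read_times])


# Hypothetical local times of four processors, keyed by processor id.
clocks = {0: 100.0, 1: 99.0, 2: 101.0, 3: 104.0}
rng = random.Random(0)  # seeded only to make the sketch repeatable

selected = select_processors(0, list(clocks), k=2, rng=rng)
new_time = adjusted_time(clocks[0], [clocks[p] for p in selected])
print(selected, new_time)
```

Each processor exchanges messages with only k processors per period, so the traffic per processor stays constant as the system grows.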
However, transmission of the inquiry messages from all processors is repeatedly carried out in a concentrated manner at a predetermined time instant at which clock synchronization is maintained, that is, at which the controllable clock circuits of all processors indicate the local times which are nearly equal to one another. This is because all processors transmit the inquiry messages to the respective selected processors at the predetermined time instant. In addition, each processor waits for the acknowledgement messages transmitted from the respective selected processors. When any one of the selected processors develops a fault and becomes a faulty processor, the faulty processor cannot transmit its acknowledgement message. The processor waiting for that acknowledgement message therefore continues to wait and falls into a deadlock condition.
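The concentration of the inquiry messages may be illustrated by the following sketch. It assumes that each processor transmits its inquiry messages whenever its own local time reaches a multiple of a common period, and that each local time equals the real time plus a small fixed offset. The function name `inquiries_per_slot`, the offsets, and all numerical values are hypothetical.

```python
from collections import Counter


def inquiries_per_slot(offsets, period, k, horizon, slot):
    """Count inquiry messages per real-time slot (illustrative only).

    offsets: clock offset of each processor (local time = real time + offset)
    period:  local-time interval between synchronization attempts
    k:       number of inquiry messages each processor sends per attempt
    horizon: real-time length of the simulation
    slot:    width of the real-time slot used for counting
    """
    counts = Counter()
    for offset in offsets:
        m = 1
        # The m-th attempt happens when the local time equals m * period.
        while m * period - offset < horizon:
            send_time = m * period - offset
            counts[round(send_time / slot)] += k
            m += 1
    return counts


# Four processors whose clocks are nearly equal to one another.
counts = inquiries_per_slot(
    offsets=[0.0, 0.01, -0.02, 0.015], period=10, k=2, horizon=35, slot=1.0
)
print(dict(counts))  # {10: 8, 20: 8, 30: 8}
```

All 24 inquiry messages fall into three slots, and no message is transmitted during the remaining time.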