The present invention relates generally to devices for monitoring the synchronization in a multicomputer system having parallel-working individual computers. More particularly, it relates to such a device that evaluates the time between synchronization-readiness signals, which are produced by the individual computers of the multicomputer system or by their synchronization modules and which indicate that a specified synchronization point has been reached, and that severs any computer that does not output its synchronization-readiness signal within a specified time span after the synchronization-readiness signals of the other computers have been received. Such a device is disclosed by German Provisional Patent 12 698 27.
Data flow among computers must be synchronized from time to time, especially in multicomputer systems in which the individual computers have separate clock-pulse generators and must exchange data among themselves. This is particularly necessary where the number of computers is increased for security and reliability reasons and, in some instances, for availability reasons, and where the computers continually compare their results with one another to check for conformity and thereby detect malfunctions. Computers also have to be synchronized when data (e.g., messages and commands) are input simultaneously via external interrupts. The maximum permissible time span within which mutual synchronization must take place depends essentially on the clock frequency of the computers' clock-pulse generators and on the accuracy of those generators.
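As an illustrative estimate only (the cited documents give no formula, and the tolerance of one clock period is an assumption), suppose every clock-pulse generator has nominal frequency f and a relative accuracy of ±δ. Two generators then accumulate a count difference that grows linearly with the interval T since the last synchronization, which bounds T:

```latex
\begin{aligned}
f_1, f_2 &\in \left[\, f(1-\delta),\; f(1+\delta) \,\right] \\
\Delta N(T) &= \lvert f_1 - f_2 \rvert \, T \;\le\; 2\delta f T \\
\Delta N(T) < 1 \quad &\Longrightarrow \quad T < \frac{1}{2\delta f}
\end{aligned}
```

For example, with f = 10 MHz and δ = 10^-4, the bound gives T < 0.5 ms; halving δ doubles the permissible span, while doubling f halves it.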
German Provisional Patent 19 52 926 discloses a method for synchronizing two parallel-working data processing units, one of which is active while the other serves as a reserve unit. The active unit generates synchronizing signals at periodic intervals. In the reserve unit, these signals are used to phase-lock the local clock-pulse generator to the clock signals of the controlling unit. This known method is unsuitable for multicomputer systems having several controlling computers, because the computer emitting the synchronizing signals, in correcting the clock-pulse generators of the other computers, influences their data processing in an unpredictable manner.
German Provisional Patent 21 55 159 discloses an arrangement for synchronizing a multitude of computers in a computer system, in which the computer that is first to reach a synchronization point anchored in its program transmits a synchronization signal to the remaining computers over a line shared by all the computers. In each of the remaining computers, this synchronization signal is stored for a certain length of time; it blocks the synchronization signal generated locally in that computer and activates a pulse-generating circuit, which forces a clock-signal counter into the same switch position to which the corresponding counter of the fastest computer had been switched. This completes the synchronization process. With this known configuration, however, the failure of one computer, or rather of the circuit elements allocated to that computer for synchronization purposes, is not detected. Moreover, the configuration does not verify whether the counters of the slower-running computers actually assume the specified switch position.
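A minimal Python model of this common-line scheme, under the assumption that each computer's clock-signal counter can be represented by an integer, makes the weakness concrete. The `faulty` flag and all names are hypothetical: a faulty unit ignores the forcing pulse, and because the scheme never reads the counters back, the divergence goes undetected.

```python
from dataclasses import dataclass


@dataclass
class Computer:
    name: str
    counter: int                    # clock-signal counter (hypothetical model)
    faulty: bool = False            # a faulty unit silently ignores the forcing pulse
    own_sync_blocked: bool = False  # set when a foreign sync signal is latched


def broadcast_sync(computers: list[Computer], leader_name: str) -> None:
    """Model of the common-line scheme: the first computer to reach the
    synchronization point broadcasts; every other computer latches the
    signal, blocks its own, and has its counter forced to the leader's value."""
    leader = next(c for c in computers if c.name == leader_name)
    for c in computers:
        if c is leader:
            continue
        c.own_sync_blocked = True
        if not c.faulty:
            c.counter = leader.counter  # pulse circuit forces the same position
    # Nothing reads the counters back, so a faulty unit goes unnoticed.
```

For example, with A (counter 120) as leader, B (117) is forced to 120 while a faulty C (115) stays at 115, and `broadcast_sync` completes without any indication of the mismatch.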
German Provisional Patent 24 13 401 discloses a device for synchronizing a two-out-of-three computer system, in which processing of a new command begins only after at least two of the three computers have signalled completion of the preceding command. Time-delay elements ensure that the currently slowest computer can complete execution of the command and then begin processing the following command simultaneously with the other computers. If the slowest computer is unable to do so, it falls out of step and cannot resynchronize itself. The computer system as a whole then remains operational as a two-out-of-two system. However, this device cannot determine that one of the computers has failed because that computer can no longer be synchronized with the others; hence, no troubleshooting is initiated. Consequently, the failure of a second computer renders the computer system non-operational.
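The step rule can be modelled roughly as below. The sketch assumes integer completion times and a fixed delay; `two_of_three_step` is a hypothetical name, and the model does not attempt to reproduce the time-delay circuitry itself. Like the device it models, it never flags the computer that drops out: the dropped unit is merely absent from the returned set.

```python
def two_of_three_step(done_at: dict[str, int], delay: int) -> tuple[int, set[str]]:
    """Hypothetical model: the next command starts `delay` time units after the
    second of three computers reports completion; a computer that has not
    completed by then has fallen out of step."""
    if len(done_at) != 3:
        raise ValueError("model assumes exactly three computers")
    second_done = sorted(done_at.values())[1]  # at least two have completed
    start = second_done + delay                # grace period from the delay element
    in_step = {name for name, t in done_at.items() if t <= start}
    return start, in_step
```

With completion times A = 100, B = 104, C = 130 and a delay of 10, the next command starts at 114 with only A and B in step; had C finished at 112, all three would have stayed in step.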
German Provisional Patent 12 698 27 discloses a method and a device for synchronizing two parallel-working data processing systems, in which the synchronization signals generated by the two data-processing devices are monitored by a timer supervision to verify that they do not drift too far apart. If they drift apart to an unacceptable degree, a program interrupt due to a timing error occurs. If both synchronization signals arrive within the maximum time span specified by the timer supervision, they are combined in an AND operation that initiates a synchronization routine in both individual computers. Because the timer supervision is not supposed to respond during normal operation, its function must be verified by test programs to ensure that it would actually be effective in the event of a malfunction. These test programs adversely affect the application programs running in the data-processing devices and reduce the effective operating speed of the computer system.
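The supervision logic described above can be sketched as follows. This is a hypothetical Python model (the function name, integer time units, and polling at time `now` are assumptions), not the circuit of the cited document. The `"timing_error"` branch is never taken while both units run normally, which illustrates why that branch needs separate test programs to show it is effective.

```python
from typing import Optional


def timer_supervision(t_a: Optional[int], t_b: Optional[int],
                      now: int, max_skew: int) -> str:
    """Return "sync" when both signals arrived within max_skew of each other
    (the AND condition), "timing_error" when the skew is exceeded, and
    "waiting" while the supervised window is still open."""
    arrived = [t for t in (t_a, t_b) if t is not None]
    if not arrived:
        return "waiting"
    if len(arrived) == 2:
        # Both present: AND them only if they lie within the permitted skew.
        return "sync" if abs(arrived[0] - arrived[1]) <= max_skew else "timing_error"
    # Exactly one signal so far: the timer runs from its arrival.
    return "waiting" if now - arrived[0] <= max_skew else "timing_error"
```

For instance, signals at 100 and 103 with a permitted skew of 5 yield `"sync"`, whereas a single signal at 100 that is still alone at time 106 yields `"timing_error"`, i.e. the program interrupt.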
German Published Patent Application 34 31 169 discloses a method for synchronizing several parallel-working computers, in which each computer interrupts its program upon receiving a signal from another computer indicating that computer's synchronization readiness and, once the necessary conditions are met on its own side, outputs a corresponding signal to all the other computers. Each computer begins processing the next program step only after all computers of the computer system have signalled their synchronization readiness. In this case, therefore, the processing speed of the fastest computer is adapted to that of the slowest computer of the multicomputer system. To prevent the entire multicomputer system from ceasing to function after one computer fails, the computers also continue their program when, in addition to their own synchronization-readiness signal, the corresponding signal from another computer is available and a specified minimum time has elapsed. However, no means are provided for detecting and disconnecting an individual computer that has fallen out of synchronization with the other computers. In particular, this known method does not disclose means that, in the event of a malfunction, sever the computer that is no longer reliably operational from the multicomputer system, with the severing means operating within the computers that are still operational.
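The continuation rule of this known method can be modelled as a predicate. In this sketch, which assumes integer time units and uses the hypothetical name `may_proceed`, a computer that never signals is simply left behind: nothing in the rule reports or severs it.

```python
def may_proceed(own_ready: bool, ready_peers: set[str], all_peers: set[str],
                elapsed: int, min_time: int) -> bool:
    """Hypothetical continuation rule for one computer of the system."""
    if not own_ready:
        return False
    if ready_peers >= all_peers:
        return True  # every other computer has signalled readiness
    # Degraded rule: own signal plus at least one peer, after the minimum time.
    return len(ready_peers & all_peers) >= 1 and elapsed >= min_time
```

For computer A with peers B and C, readiness from B alone allows A to continue only once `elapsed` reaches `min_time`; C's silence is tolerated but never reported.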
The present invention is directed to the problem of developing a device for monitoring the synchronization in a multicomputer system consisting of parallel-working individual computers by evaluating the delay between synchronization-readiness signals that are produced by the various computers of the multicomputer system or by their synchronization modules and that indicate that the respective computer has reached a specified synchronization point. The present invention is also directed to the problem of developing such a device that severs any computer that does not output its synchronization-readiness signal within a specified time span after the device has received the synchronization-readiness signals from the other computers, and that guarantees that a defective computer is reliably detected and severed from the computer system when it can no longer be synchronized with the remaining computers. The present invention is further directed to the problem of developing a device that performs the above-stated functions without undesirably interrupting execution of the application program. Finally, the present invention is directed to the problem of developing such a device in which, when a single computer falls out of synchronization, the remaining computers continue to operate in two-out-of-two mode, and in which the computer system as a whole is disconnected for security reasons only if the unsynchronized computer cannot readily be severed.
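The objectives above can be restated as a decision rule. The following Python function is a minimal sketch of that rule under stated assumptions, not the claimed circuit: the system has three computers, arrival times are integers in arbitrary units, `None` means no signal yet, the rule is polled at time `now`, and the names `monitor`, `Verdict`, `window`, and `sever_ok` are hypothetical. The case of two computers lagging at once lies outside the stated objectives, so the sketch simply keeps waiting.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Verdict:
    action: str                    # "sync", "wait", "sever" or "shutdown"
    culprit: Optional[str] = None  # computer held responsible, if any


def monitor(ready_at: dict[str, Optional[int]], now: int, window: int,
            sever_ok: bool = True) -> Verdict:
    """Hypothetical rule for three computers. ready_at maps each computer to
    the arrival time of its readiness signal (None = not yet received)."""
    pending = [n for n, t in ready_at.items() if t is None]
    if len(pending) > 1:
        return Verdict("wait")  # several laggards: outside this sketch
    if pending:
        laggard = pending[0]
        # The timer starts once all *other* computers have signalled.
        others = [t for n, t in ready_at.items() if n != laggard]
        if now <= max(others) + window:
            return Verdict("wait", laggard)
    else:
        # All signals present: check that the last one came within the window.
        laggard = max(ready_at, key=lambda n: ready_at[n])
        others = [t for n, t in ready_at.items() if n != laggard]
        if ready_at[laggard] - max(others) <= window:
            return Verdict("sync")
    # The laggard missed its window: sever it, or shut down if that fails.
    if sever_ok:
        return Verdict("sever", laggard)  # the other two continue two-out-of-two
    return Verdict("shutdown", laggard)   # cannot isolate: disconnect the system
```

With A and B ready at 100 and 102 and a window of 5, a silent C is still awaited at time 106, is severed at time 108, and, if severing cannot be confirmed (`sever_ok=False`), the whole system is shut down instead.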