1. Field of the Invention
The present invention relates to a lockstep fault tolerant computer that simultaneously processes the same instruction strings in a plurality of clock-synchronized computer modules therein, and more particularly to a fault tolerant computer and its transaction synchronization control method that accept a time lag in I/O transactions issued from the plurality of CPU modules to I/O modules.
2. Description of the Related Art
A conventional fault tolerant computer that simultaneously processes the same instructions in a plurality of clock-synchronized computer modules comprises a plurality of CPU modules and a plurality of I/O modules. Each I/O module has a comparator that checks whether the CPU modules are operating in synchronization with each other. When an I/O module receives the same I/O transaction from all CPU modules at the same time, the comparator judges that the CPU modules are in synchronization, that is, that no CPU module has failed, and the I/O module executes the I/O processing. A conventional fault tolerant computer thus guarantees that synchronization is maintained unless the comparator judges that a CPU module has failed.
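The comparator check described above can be sketched as follows. This is only an illustrative model, not the circuit of any actual product: the names `IoTransaction` and `strict_compare`, and the fields `cpu_id`, `cycle`, and `payload`, are assumptions introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IoTransaction:
    cpu_id: int     # issuing CPU module (hypothetical field)
    cycle: int      # clock cycle at which the I/O module received it (hypothetical field)
    payload: bytes  # content of the I/O transaction (hypothetical field)


def strict_compare(transactions):
    """Return True only if all CPU modules delivered the same payload in the same cycle."""
    first = transactions[0]
    return all(
        t.cycle == first.cycle and t.payload == first.payload
        for t in transactions
    )


# Both CPU modules issue the same transaction in the same cycle: judged in synchronization.
in_step = [IoTransaction(0, 100, b"\x10\x00"), IoTransaction(1, 100, b"\x10\x00")]

# Same transaction content, but one arrives a cycle later: judged out of synchronization.
skewed = [IoTransaction(0, 100, b"\x10\x00"), IoTransaction(1, 101, b"\x10\x00")]

print(strict_compare(in_step))
print(strict_compare(skewed))
```

In this sketch, a comparator that requires identical arrival times cannot distinguish a one-cycle skew from a difference in transaction content.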
However, as processor speeds have increased in recent years, an out-of-synchronization condition can occur. This is because recent high-speed processors installed in the CPU modules do not always operate in full synchronization with each other even when the CPU modules receive the same clock. As a result, an I/O module sometimes finds a time difference between the I/O transactions issued from the plurality of CPU modules.
In such an out-of-synchronization condition, the program operations of the CPU modules are known to be the same. Nevertheless, the conventional fault tolerant computer sometimes enters the CPU module fallback condition or executes a re-installation operation in that case, even though no CPU module has failed.
As described above, one of the problems with the conventional fault tolerant computer is that, when an I/O module finds a time difference between the I/O transactions issued from the plurality of CPU modules, the fault tolerant computer enters the CPU module fallback condition or executes a re-installation operation even when the out-of-synchronization condition was not caused by a failure. This, in turn, reduces the MTBF (Mean Time Between Failures: the average time from one computer system failure to the next) of the fault tolerant computer and thus diminishes its advantage.
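The effect on MTBF can be illustrated with simple arithmetic. The sketch below assumes that MTBF is estimated as total operating time divided by the number of events counted as failures; the function name `mtbf` and all of the numbers are hypothetical and are not measurements from any system.

```python
def mtbf(operating_hours: float, failure_count: int) -> float:
    """Estimate MTBF as total operating time divided by the number of counted failures."""
    if failure_count <= 0:
        raise ValueError("failure_count must be positive")
    return operating_hours / failure_count


# Hypothetical figures chosen only to show the arithmetic.
hours = 10_000.0
real_failures = 2         # failures actually caused by a faulty CPU module
spurious_fallbacks = 3    # fallbacks triggered by a time difference alone

print(mtbf(hours, real_failures))                       # 5000.0
print(mtbf(hours, real_failures + spurious_fallbacks))  # 2000.0
```

Under these assumed figures, counting fallbacks that were not caused by a failure lowers the estimated MTBF from 5000 hours to 2000 hours.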
Many technologies have conventionally been proposed for resolving an out-of-synchronization condition among a plurality of processors. For example, Japanese Patent Laid-Open Publication No. Hei 11-338832 discloses a method in which the active processor and the standby processor use the inter-processor bus data transfer start signal and end signal to wait for the delayed processing of the standby processor, thereby quickly establishing synchronization between them. However, those technologies differ from the technology according to the present invention in that the latter accepts a time delay in the I/O transactions received by each I/O controller.