A fault tolerant (FT) system is known as one type of redundant system that provides high reliability.
A fault tolerant system is a computer system whose fault tolerance is improved by duplexing or multiplexing (hereinafter simply referred to as duplexing) the hardware modules constituting the system and operating all of the duplexed modules in synchronization, so that even if a fault occurs in any part, the failed module is separated and processing continues using the normal modules.
The basic configuration of a fault tolerant system consists of hardware modules to be duplexed, including a CPU (Central Processing Unit), a memory, an I/O device, and the like, and a fault tolerant control section (hereinafter referred to as an FT control section) which is connected to the modules and performs synchronous operation processing, switching control when a fault occurs, and the like. In general, a fault tolerant system is divided into a part in which the respective modules are duplexed by hardware and a part in which they are duplexed by software. For example, a CPU subsystem including a CPU and a memory is the infrastructure on which software operates, so it must be duplexed by hardware. Accordingly, the duplexed CPU subsystems must operate on the same clock. Operating duplexed CPU subsystems in complete synchronization with each other, clock by clock, is called a lockstep operation. If an error occurs in a CPU subsystem, the hardware (the FT control section) separates the CPU and the memory of that CPU subsystem from the system so that the error does not affect the CPU and the memory that are operating normally.
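The lockstep operation and hardware-level separation described above can be illustrated with a small software simulation. This is only a sketch under stated assumptions: the names `CpuSubsystem`, `FtControl`, and `fault_at_clock` are hypothetical, an accumulator stands in for the CPU and memory state, and a subsystem is assumed to report its own error to the FT control section. An actual FT control section is hardware and is not implemented this way.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class CpuSubsystem:
    """Hypothetical model of one duplexed CPU subsystem (CPU plus memory)."""
    name: str
    fault_at_clock: Optional[int] = None  # assumption: injected fault for the demo
    accumulator: int = 0                  # stands in for CPU/memory state
    active: bool = True

    def step(self, clock: int, operand: int) -> bool:
        """Execute one clock; return False if an error is detected."""
        if self.fault_at_clock is not None and clock >= self.fault_at_clock:
            return False
        self.accumulator += operand
        return True


class FtControl:
    """Hypothetical FT control section driving subsystems in lockstep."""

    def __init__(self, subsystems: Iterable[CpuSubsystem]) -> None:
        self.subsystems: List[CpuSubsystem] = list(subsystems)

    def run(self, operands: Iterable[int]) -> int:
        for clock, operand in enumerate(operands):
            # Every active subsystem receives the same input on the same clock.
            for subsystem in self.subsystems:
                if subsystem.active and not subsystem.step(clock, operand):
                    subsystem.active = False  # separate the failed subsystem
            active = [s for s in self.subsystems if s.active]
            if not active:
                raise RuntimeError("no normal subsystem remains; the system stops")
            # Lockstep invariant: all remaining subsystems hold identical state.
            if len({s.accumulator for s in active}) != 1:
                raise RuntimeError("lockstep divergence detected")
        return active[0].accumulator
```

For example, with subsystems "A" and "B" where "B" is given `fault_at_clock=2`, running `FtControl([a, b]).run([1, 2, 3, 4])` separates "B" at the third clock and finishes the computation on "A" alone.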
On the other hand, when a fault occurs in an I/O device, the I/O device can be switched by software, provided that the FT control section that detects the fault notifies the software controlling the I/O device (hereinafter referred to as an I/O device driver) of the error. In that case, the I/O device driver stops using the failed I/O device and uses its duplexed counterpart instead.
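The software-side switching of an I/O device might look roughly like the following. This is a minimal sketch, not an actual driver interface: `IoDevice`, `IoDeviceDriver`, and `on_error_notification` are hypothetical names, and the notification from the FT control section is modeled as a plain method call.

```python
from typing import List


class IoDevice:
    """Hypothetical I/O device that records what was written to it."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.written: List[str] = []

    def write(self, data: str) -> None:
        self.written.append(data)


class IoDeviceDriver:
    """Hypothetical driver that owns a duplexed pair of I/O devices."""

    def __init__(self, primary: IoDevice, secondary: IoDevice) -> None:
        self.devices = [primary, secondary]
        self.failed: List[IoDevice] = []
        self.current = primary

    def on_error_notification(self, device: IoDevice) -> None:
        """Assumed entry point called when the FT control section reports a fault."""
        if device not in self.failed:
            self.failed.append(device)
        if device is self.current:
            spares = [d for d in self.devices if d not in self.failed]
            if not spares:
                raise RuntimeError("no normal I/O device remains")
            self.current = spares[0]  # switch to the duplexed counterpart

    def write(self, data: str) -> None:
        # All I/O goes to whichever device is currently in use.
        self.current.write(data)
```

In this sketch the driver keeps using the same `write` call before and after the notification, so the switch is invisible to the caller.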
As described above, when a failure occurs in a fault tolerant system, the failed side is generally separated and operation continues using only the remaining normal side. However, once one side has been separated, redundancy is lost, so there is a problem that the system will stop if another failure occurs.
As related art of the present invention, Patent Document 1 (JP 11-134210 A) discloses a system redundancy method. In this related art, each module is made redundant by being at least duplexed, and the functions within a module are either divided into blocks or provided as a plurality of functional elements. If a failure occurs in part of any function in a module, the block in which the failure has occurred, or the failed functional element, is separated so that the module performs a degraded operation. Further, the normal redundant module operating in parallel is also caused to perform a degraded operation, so that it has the same configuration as the module performing the degraded operation due to the failure.
Patent Document 1: JP 11-134210 A
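The degraded operation of the related art, in which the normal module is reduced to match the failed one, can be sketched as follows. This is an illustrative model only; `Module`, `degrade_in_parallel`, and the block names are hypothetical and do not come from Patent Document 1.

```python
from typing import Iterable, List, Tuple


class Module:
    """Hypothetical redundant module whose functions are divided into blocks."""

    def __init__(self, name: str, blocks: Iterable[str]) -> None:
        self.name = name
        self.enabled = {block: True for block in blocks}

    def separate(self, block: str) -> None:
        # Take the block out of service; the module keeps running degraded.
        self.enabled[block] = False

    def configuration(self) -> Tuple[str, ...]:
        # The set of blocks still in service, in a comparable form.
        return tuple(sorted(b for b, on in self.enabled.items() if on))


def degrade_in_parallel(modules: List[Module], failed_block: str) -> None:
    """Separate the failed block in the failed module and, to keep the
    configurations identical, the corresponding block in every other module."""
    for module in modules:
        module.separate(failed_block)
```

After `degrade_in_parallel` is called for a block such as `"mem1"`, every module reports the same `configuration()`, which is the property the related art aims for.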
However, it is difficult to apply the above-described related art to a redundant system that performs a lockstep operation, for the following reason. In the related art, the failed system and the normal system perform different operations while the respective redundant modules run in parallel. That is, a system whose memory is separated because it has actually failed, and a system whose corresponding memory is separated only in order to realize the same configuration as the failed system, run in parallel but perform different operations. In a redundant system that performs a lockstep operation, however, all systems perform the same operation during parallel operation in the lockstep mode, so it is unlikely that the two systems can perform different operations.