This invention relates to a computer system comprising a plurality of processors and, in particular, to a fault-tolerant computer system comprising an active processor and a backup processor wherein the active processor carries out control for a controlled system when a failure does not occur in the active processor and the backup processor carries out control for the controlled system when a failure occurs in the active processor.
Such a fault-tolerant computer system is, for example, described by J. Gray et al. and translated by E. Watanabe et al. into Japanese in a book published by McGraw-Hill, Inc. (October, 1986) and entitled "FAULT TOLERANT SYSTEM." The fault-tolerant computer system comprises a first processor acting as an active processor, a second processor acting as a backup processor, and an input/output control device serving as a channel connection switching control device. The first processor is connected to the input/output control device via a first input/output channel while the second processor is connected to the input/output control device via a second input/output channel. The input/output control device is connected to a controlled system via a system input/output channel.
The first processor comprises a first central processing unit (CPU) and a first failure detecting circuit. The second processor comprises a second CPU and a second failure detecting circuit. The first CPU periodically produces a first periodic signal indicative of a first operation state of the first CPU. The second CPU periodically produces a second periodic signal indicative of a second operation state of the second CPU.
The first failure detecting circuit always monitors the second operation state of the second CPU by receiving the second periodic signal. The first failure detecting circuit delivers a first monitored result signal indicative of its monitored result. Inasmuch as the first monitored result signal indicates the second operation state of the second CPU, the first monitored result signal is called a second processor operation state signal. When the first CPU supplies a first input/output channel acquisition signal to the input/output control device, the input/output control device connects the first input/output channel with the system input/output channel.
Likewise, the second failure detecting circuit always monitors the first operation state of the first CPU by receiving the first periodic signal. The second failure detecting circuit delivers a second monitored result signal indicative of its monitored result. Inasmuch as the second monitored result signal indicates the first operation state of the first CPU, the second monitored result signal is called a first processor operation state signal. When the second CPU supplies a second input/output channel acquisition signal to the input/output control device, the input/output control device connects the second input/output channel with the system input/output channel.
The input/output control device carries out connection and switching of the first and the second input/output channels and the system input/output channel on the basis of the first and the second input/output channel acquisition signals supplied from the first and the second processors.
Operation of the fault-tolerant computer system will be described. Description will first be made as regards operation in a case where no failure occurs in either the first processor or the second processor.
The first CPU of the first processor periodically sends the first periodic signal indicating that no failure occurs in its own CPU (the first CPU) to the second failure detecting circuit of the second processor. The first failure detecting circuit receives the second periodic signal from the second processor and supplies the first CPU with the first monitored result signal indicating that no failure occurs in the second processor.
Each of the first and the second failure detecting circuits may be composed of general electronic circuit elements. Each of the first and the second failure detecting circuits may be, for instance, a "watchdog timer" which is described in detail by Yoshihiro Tohma et al. in a book published by Maki Shoten (Mar. 1991) and entitled "Structure and Design of Fault-Tolerant System," on pages 159-160. In the first failure detecting circuit using the "watchdog timer", the second periodic signal includes a second timer start condition signal and a second timer reset condition signal. Responsive to the second timer start condition signal, the first failure detecting circuit starts a timer. If the first failure detecting circuit does not receive the second timer reset condition signal before the timer expires, the first failure detecting circuit judges that a failure has occurred in the second processor.
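The watchdog behavior described above can be sketched in software as follows. This is only a minimal illustration, not the circuit of the cited book: the class name `WatchdogTimer`, the tick-based timing, and the latching of the failure judgment are all assumptions introduced for the example.

```python
class WatchdogTimer:
    """Hypothetical software model of a failure detecting circuit.

    Assumptions (not from the source): time advances in discrete ticks,
    and a failure judgment, once made, stays latched.
    """

    def __init__(self, timeout_ticks: int) -> None:
        if timeout_ticks <= 0:
            raise ValueError("timeout_ticks must be positive")
        self.timeout_ticks = timeout_ticks
        self.remaining = None          # None means the timer has not been started
        self.failure_detected = False

    def start(self) -> None:
        """Handle the timer start condition signal from the peer CPU."""
        self.remaining = self.timeout_ticks

    def reset(self) -> None:
        """Handle the timer reset condition signal: restart the countdown."""
        if self.remaining is not None and not self.failure_detected:
            self.remaining = self.timeout_ticks

    def tick(self) -> None:
        """Advance time by one tick; judge a failure if the timer expires."""
        if self.remaining is None or self.failure_detected:
            return
        self.remaining -= 1
        if self.remaining <= 0:
            self.failure_detected = True


wd = WatchdogTimer(timeout_ticks=3)
wd.start()
wd.tick()
wd.tick()
wd.reset()                      # peer is alive: countdown restarts
wd.tick()
wd.tick()
alive_after_reset = not wd.failure_detected
wd.tick()                       # no reset arrived before expiry
failed_after_silence = wd.failure_detected
```

In this sketch the monitored result signal corresponds to the `failure_detected` flag, which the monitoring CPU would read.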
In order to request connection of the first input/output channel and the system input/output channel, the first CPU supplies the input/output control device with the first input/output channel acquisition signal.
When the input/output control device receives the first input/output channel acquisition signal from the first CPU, the input/output control device connects the system input/output channel with the first input/output channel. Such an input/output control device is disclosed in the above-mentioned book entitled "FAULT TOLERANT SYSTEM," on pages 104-106. In this event, the input/output control device accommodates the first and the second input/output channels and the system input/output channel. On reception of the first or the second input/output channel acquisition signal, the input/output control device connects the system input/output channel with that one of the first and the second input/output channels which belongs to the processor that produced the input/output channel acquisition signal in question.
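The switching rule of the input/output control device may be illustrated by the following minimal sketch. The class `IOControlDevice` and its methods `acquire` and `send` are hypothetical names chosen for the example; the channel numbers 1 and 2 stand for the first and the second input/output channels.

```python
from typing import Optional


class IOControlDevice:
    """Hypothetical model of the channel connection switching control device."""

    def __init__(self) -> None:
        # Channel currently connected to the system input/output channel.
        self.connected: Optional[int] = None

    def acquire(self, channel: int) -> None:
        """Handle an input/output channel acquisition signal.

        The device connects the system channel to whichever processor
        channel issued the signal, replacing any earlier connection.
        """
        if channel not in (1, 2):
            raise ValueError("channel must be 1 or 2")
        self.connected = channel

    def send(self, channel: int, message: str) -> Optional[str]:
        """Forward a message to the controlled system only from the connected channel."""
        return message if channel == self.connected else None


device = IOControlDevice()
device.acquire(1)                            # first CPU requests the system channel
from_active = device.send(1, "control")      # forwarded to the controlled system
from_backup = device.send(2, "control")      # dropped: channel 2 is not connected
```

Note that, in this sketch as in the description above, the device acts on any acquisition signal it receives and has no knowledge of the state of the processor that issued it.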
The second processor is similar in structure to the above-mentioned first processor. When the second CPU recognizes, by way of the second monitored result signal which the second failure detecting circuit derives from the first periodic signal, that no failure occurs in the first processor, the second CPU makes the second processor operate as the backup processor. For this purpose, the second CPU does not supply the input/output control device with the second input/output channel acquisition signal, so that the second processor does not use the system input/output channel.
As is apparent from the above-mentioned operation, the first processor acquires the system input/output channel to carry out control of the controlled system. The second processor waits as the backup processor.
Description will be made as regards operation in a case where a failure occurs in the first processor which is operable as the active processor.
When a failure due to an abnormality in software or a fault in hardware occurs in the first processor, the first CPU stops delivery of the first periodic signal to the second failure detecting circuit of the second processor. In this event, the second failure detecting circuit recognizes that a failure has occurred in the first processor and supplies the second CPU with the second monitored result signal indicating that a failure has occurred in the first processor.
On reception of the second monitored result signal, the second CPU supplies the input/output control device with the second input/output channel acquisition signal to switch control of the controlled system from the first processor to the second processor. The input/output control device disconnects the system input/output channel from the first input/output channel and connects the system input/output channel with the second input/output channel. Connected to the controlled system, the second CPU carries out transmission and reception of control information to and from the controlled system by using the system input/output channel.
As is apparent from the above-mentioned operation, when switching of the processors is carried out, the second processor acting as the backup processor carries out control of the controlled system instead of the first processor serving as the active processor.
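The switching sequence described above may be traced with the following minimal simulation. It is a sketch under stated assumptions: time is modeled in discrete ticks, the constant `TIMEOUT_TICKS`, the `Processor` record, and the function `simulate` are hypothetical, and the second failure detecting circuit is reduced to a counter of ticks without the first periodic signal.

```python
from dataclasses import dataclass
from typing import List

TIMEOUT_TICKS = 3  # assumed watchdog timeout, in ticks


@dataclass
class Processor:
    channel: int
    healthy: bool = True
    silent_ticks: int = 0  # ticks since the peer's periodic signal was last seen


def simulate(fail_at: int, total_ticks: int) -> List[int]:
    """Return, per tick, the channel connected to the system input/output channel."""
    first = Processor(channel=1)    # active processor
    second = Processor(channel=2)   # backup processor
    connected = first.channel       # first CPU has issued its acquisition signal
    history = []
    for tick in range(total_ticks):
        if tick == fail_at:
            first.healthy = False   # failure: periodic signal stops
        if first.healthy:
            second.silent_ticks = 0         # periodic signal received
        else:
            second.silent_ticks += 1        # no periodic signal this tick
        # Second failure detecting circuit reports the failure; the second
        # CPU then issues the second acquisition signal.
        if second.silent_ticks >= TIMEOUT_TICKS and connected != second.channel:
            connected = second.channel
        history.append(connected)
    return history


trace = simulate(fail_at=2, total_ticks=7)
print(trace)  # [1, 1, 1, 1, 2, 2, 2]
```

With the first processor failing at tick 2, the backup takes over at tick 4, once three ticks have passed without the first periodic signal.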
As described above, in a conventional fault-tolerant computer system, the CPU which detects a failure in its mating processor produces the input/output channel acquisition signal so as to become operable as the active processor. However, it is impossible in the conventional fault-tolerant computer system to prevent the CPU of a processor in which a failure has occurred from supplying the input/output channel acquisition signal to the input/output control device. When the CPU of the failed processor accidentally supplies the input/output control device with the input/output channel acquisition signal, the system input/output channel is connected to the processor in which the failure has occurred. Under the circumstances, erroneous control information is supplied to the controlled system.
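The problem can be reproduced in a few lines. The following is a minimal sketch, assuming a hypothetical `IOControlDevice` that, like the conventional device, honors every acquisition signal; the `healthy` flag is visible only to the simulation and not to the device.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Processor:
    channel: int
    healthy: bool = True


class IOControlDevice:
    """Hypothetical conventional device: honors every acquisition signal."""

    def __init__(self) -> None:
        self.connected: Optional[int] = None

    def acquire(self, channel: int) -> None:
        self.connected = channel  # no check of the requesting processor's state


def controlled_by_faulty_processor() -> bool:
    first, second = Processor(1), Processor(2)
    processors = {1: first, 2: second}
    device = IOControlDevice()

    device.acquire(first.channel)    # normal operation: first processor is active
    first.healthy = False            # a failure occurs in the first processor
    device.acquire(second.channel)   # backup detects the failure and takes over
    device.acquire(first.channel)    # failed CPU accidentally emits the signal

    # True if the system channel is now connected to a failed processor.
    return not processors[device.connected].healthy


result = controlled_by_faulty_processor()
print(result)  # True
```

The final acquisition signal is accepted exactly like the legitimate ones, so the controlled system ends up connected to the failed processor.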