1. Field of the Invention
This invention relates to a majority circuit in a highly reliable multiplexed computer. In particular, it relates to a majority circuit for selecting the output of a normal processing unit from more than three processing units which execute the same instruction simultaneously.
2. Description of the Related Art
Various techniques are used to improve the reliability of computers in the field of fault-tolerant computing. One of the most popular is to multiplex a circuit that provides a critical logical function (usually a processing unit). Generally, the outputs of the multiplexed processing units are input to a majority circuit, which selects one output as the majority result. A typical example is the "two out of three" majority method, which selects one of at least two equal outputs from the three outputs of triplicated processing units. With this method, the majority circuit yields the correct output even when an error occurs in one of the three processing units. But when errors occur in two of the three processing units, or in the majority circuit itself, the normal output cannot be obtained, and this may bring down the whole system.
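The two-out-of-three selection described above can be illustrated with a minimal software sketch. The function name `vote_2oo3` and the use of integers for unit outputs are assumptions made for illustration; the related art describes hardware circuits, not this code.

```python
from typing import Optional


def vote_2oo3(a: int, b: int, c: int) -> Optional[int]:
    """Return the value shared by at least two of the three outputs.

    Returns None when all three outputs disagree, i.e. no majority exists.
    """
    if a == b or a == c:
        return a
    if b == c:
        return b
    return None


# One faulty unit is outvoted by the two units that agree.
single_fault = vote_2oo3(0x1234, 0x1234, 0xFFFF)   # 0x1234

# Two units failing in different ways leave no majority at all.
double_fault = vote_2oo3(0x1234, 0xAAAA, 0xFFFF)   # None
```

Note that the voter itself is a single point of failure in this sketch, which corresponds to the case in which the majority circuit itself is faulty.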
An example of a conventional technique for detecting an error in the majority circuit is described in Japanese Examined Patent No. HEI 3-26415 bulletin and shown in FIG. 27. In this method, majority circuit 200 detects the majority signal of input signals 31-33. A self-diagnostic circuit 4 contains another majority circuit, which is similar to the majority circuit 200. An output signal 7 of the majority circuit 200 is compared with the majority result of the self-diagnostic circuit 4. When these two majority results do not match, the majority circuit is judged to be faulty. When each input signal 31-33 has 2 bits, the majority circuit 200 is configured as shown in FIG. 28. This majority circuit 200 carries out logical operations bit by bit and obtains the output signals shown in FIG. 29. With this method, however, even when the input signals 31-33 differ from one another, as in the fourth, fifth, and seventh lines of FIG. 29, the output signal 7 of the majority circuit 200 is equal to the majority result of the self-diagnostic circuit 4. In other words, the output signal 7 of the majority circuit 200 becomes effective even when two of the processing units are faulty. Thus, a wrong signal may be provided to the circuit in the stage following the majority circuit 200.
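The weakness of bit-by-bit voting can be seen in a small sketch. FIG. 28 is not reproduced here, so the expression `(a & b) | (b & c) | (a & c)` is an assumed, commonly used form of a per-bit majority gate, and the input values are hypothetical rather than taken from FIG. 29.

```python
def bitwise_majority(a: int, b: int, c: int) -> int:
    """Vote on each bit position independently (assumed gate-level form)."""
    return (a & b) | (b & c) | (a & c)


# Three 2-bit inputs that all differ from one another.
a, b, c = 0b01, 0b10, 0b11

main_result = bitwise_majority(a, b, c)   # 0b11
diag_result = bitwise_majority(a, b, c)   # second, identical voter: 0b11

# The two voters agree, so comparing them raises no alarm, although
# no two processing units actually produced the same word.
no_alarm = main_result == diag_result     # True
word_level_agreement = (a == b) or (b == c) or (a == c)   # False
```

Because each bit is voted on separately, a result is always produced, and a second voter with the same logic reproduces it. Comparing the two voters therefore checks the voter, not whether the inputs agree as whole words.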
The usual output of a processing unit includes an address signal, a data signal, and a control signal. The address signal or the data signal may float (for example, in a high-impedance state) at certain timing points. Accordingly, if mismatch detection and error detection are carried out at arbitrary timing points, improper fault detection may result. To avoid this, a masking time setter 5, shown in FIG. 27, allows a mismatch to be treated as a fault only when the mismatch persists for a certain period. The period of the floating state varies with the operating status of the processing unit, so the masking time must be made long when the floating period is long. A long masking time, however, also masks periods that should be checked, which reduces the reliability of the processing unit. Thus, the masking time setter 5 cannot solve the problem completely. Moreover, this method requires a redundant circuit, the masking time setter 5, which includes a timer and the like.
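The trade-off of a masking time can be sketched as follows. Modeling the signals as one sample per clock cycle, the function name `masked_mismatch`, and the sample trace are all assumptions for illustration; the bulletin describes a timer-based circuit.

```python
from typing import Iterable, Optional, Tuple


def masked_mismatch(samples: Iterable[Tuple[int, int, int]],
                    mask_cycles: int) -> Optional[int]:
    """Return the cycle index at which a mismatch is reported, or None.

    A mismatch is reported only after it has persisted for more than
    mask_cycles consecutive cycles.
    """
    run = 0  # length of the current run of mismatching cycles
    for cycle, (a, b, c) in enumerate(samples):
        if a == b == c:
            run = 0
        else:
            run += 1
            if run > mask_cycles:
                return cycle
    return None


# Hypothetical trace: a 2-cycle floating period (cycles 1-2) followed
# by a genuine 3-cycle disagreement (cycles 4-6).
trace = [(1, 1, 1), (1, 0, 1), (0, 1, 1), (1, 1, 1),
         (1, 1, 0), (1, 1, 0), (1, 1, 0), (1, 1, 1)]

no_mask = masked_mismatch(trace, 0)      # 1: the floating period is flagged
short_mask = masked_mismatch(trace, 2)   # 6: only the genuine fault is flagged
long_mask = masked_mismatch(trace, 4)    # None: the genuine fault is hidden
```

With no masking, the floating period is improperly reported. With a mask long enough to cover a longer floating period, the genuine mismatch is missed.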
When the input signals 31-33 consist of 1 bit, the majority circuit 200, the self-diagnostic circuit 4, and a mismatch detector 3 do not become large. But the output of a current processing unit usually has more than 32 bits. Thus, each of the majority circuit 200, the self-diagnostic circuit 4, and the mismatch detector 3 needs to be at least 32 bits wide, and the total circuit scale becomes large.
Furthermore, according to Japanese Examined Patent No. HEI 3-26415 bulletin, the majority circuit is multiplexed to improve its reliability, and the majority result is output to the function circuit through a driver. In this method, high reliability is obtained by implementing the multiplexed majority circuit in an LSI circuit. FIG. 30 shows a configuration in which this method is applied to a computer and a controller connected to the computer. Although the majority circuit 200 is multiplexed in this way, when the driver 16-2 or the controller in the next stage (for example, a receiver 13-4 of a controller 18) is faulty, a correct result cannot be output to a system bus. And although a function circuit 14 is multiplexed, when the driver 16-2 of the majority circuit or the receiver 18-4 is faulty, the correct input cannot be provided to the function circuit 14. In this case, multiplexing the function circuit brings no benefit.
In a fault-tolerant computer, a processing unit is triplicated and an output signal is decided by majority logic, which improves the reliability of the computer. The reliability is further improved by duplicating the circuit modules in the stage following the majority circuit. FIG. 31 shows a system described in Japanese Unexamined Patent No. HEI 2-202636 bulletin. In this system, majority circuits 200-1 and 200-2 are provided in global memories #1 and #2, respectively. The majority circuits 200-1 and 200-2 compare the outputs of three processing units (CPU #A, CPU #B, and CPU #C), detect the majority, and output the majority result to a system bus. The majority circuit is duplicated in this configuration, and each majority circuit works independently. But the circuits do not exchange signals to confirm each other's operation, so they are not duplicated in the strict sense. Accordingly, an error occurring in one of the majority circuits results in a wrong signal being output to the system bus.
FIG. 32 shows another system for improving the reliability of the majority circuit, which is described in Japanese Examined Patent No. HEI 3-46851 bulletin. In this system, the outputs from three processing units are input to the majority circuit 200 through an error correction code encoder (ECC/ENC) 28, and the output of the majority circuit 200 is input to an error correction code decoder (ECC/DEC) 29. Accordingly, bit errors can be corrected by the error correction code decoder 29.
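How a decoder placed after the majority circuit can correct a bit error introduced inside it is sketched below. The bulletin's actual code is not specified in this text, so the Hamming(7,4) code used here is a stand-in chosen for brevity, and the function names are hypothetical.

```python
from typing import List, Tuple


def hamming74_encode(d: List[int]) -> List[int]:
    """Encode 4 data bits as the codeword [p1, p2, d1, p4, d2, d3, d4]."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # covers codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers codeword positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(code: List[int]) -> Tuple[List[int], int]:
    """Correct at most one flipped bit; return (data bits, error position).

    The error position is 1-based within the codeword, or 0 if the
    syndrome indicates no error.
    """
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s4
    if pos:
        c[pos - 1] ^= 1  # flip the bit the syndrome points at
    return [c[2], c[4], c[5], c[6]], pos


data = [1, 0, 1, 1]
codeword = hamming74_encode(data)

# Simulate a single-bit error introduced between encoder and decoder.
corrupted = list(codeword)
corrupted[4] ^= 1

recovered, position = hamming74_decode(corrupted)   # ([1, 0, 1, 1], 5)
```

A code of this kind corrects only a single flipped bit per codeword; two flipped bits would be miscorrected by this sketch.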
The relationship between a faulty element and an element for avoiding an error is shown in FIG. 33. When an error occurs in a path, the error can be avoided by the circuits or elements in the next stage. Thus, the principal object of this system is to improve the reliability of the multiplexed circuit. But this system pays no attention to specifying the error location and repairing the faulty element. Because the error location cannot be specified, it takes a long time for the CPU to automatically disconnect, repair, or exchange the faulty element.
FIG. 34 shows another system for improving the reliability of the majority circuit, which is described in Japanese Unexamined Patent No. HEI 1-98034 bulletin. This system includes parity generators 23-1 to 23-3, one for each output of systems 1-3, and a parity generator 23-4 for the output of the majority circuit 200. An error occurring in systems 1-3 is detected by comparing the outputs of the parity generators 23-1 to 23-3. An error occurring in the majority circuit 200 is detected by comparing the outputs of the parity generators 23-1 to 23-4. Generally, the systems 1-3 are processing units, and it is rare that only one bit of the plural output bits of a processing unit differs from the others. For example, plural bits may differ when a faulty processing unit executes a branch instruction and branches to a different location. Generally, the parity check method can detect only one-bit errors. Thus, errors occurring in systems 1-3 cannot be adequately specified by comparing the outputs of the parity generators 23.
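The limitation of comparing parity bits can be shown with a short sketch. More precisely, a single parity bit changes only when an odd number of bits differ. The even-parity convention, the function name `parity`, and the address values are assumptions for illustration.

```python
def parity(word: int) -> int:
    """Return 1 if the word has an odd number of 1 bits, else 0."""
    return bin(word).count("1") % 2


correct_target = 0x1000              # hypothetical correct branch address

one_bit_error = correct_target ^ 0x0001   # differs in exactly one bit
wrong_target = 0x2000                      # differs in two bits (0x3000)

# A one-bit difference changes the parity, so it is detected.
detected = parity(one_bit_error) != parity(correct_target)     # True

# A two-bit difference leaves the parity unchanged, so comparing
# parity bits sees nothing wrong.
undetected = parity(wrong_target) == parity(correct_target)    # True
```

Here a unit that branches to the wrong address produces an output with the same parity as the correct one, so the parity comparison does not identify it as faulty.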
A system for correcting an error occurring in the majority circuit is described in Japanese Examined Patent No. HEI 3-46851 bulletin and shown in FIG. 32. With this system, one-bit errors occurring in the majority circuit can be corrected automatically, and plural-bit errors can be detected. But the circuit scale of the error correction code encoder/decoder is large.
In a multiplexed system with majority logic, when a faulty system is left connected, the output of the majority circuit is still influenced by the faulty system. Systems for disconnecting a faulty system and selecting a main system from several systems have been proposed. For example, disconnecting a faulty system and selecting the main system from plural systems can be carried out by a system described in Japanese Unexamined Patent No. SHO 57-36356 bulletin (not illustrated). But even in the majority circuit for one bit described in the bulletin, many logic circuits are needed for disconnecting the faulty system and selecting the main system from plural systems. In the case of plural bits (n bits), almost n times as many logic circuits are needed. Moreover, in this system the majority result becomes ineffective when the main system is selected, so the reliability is reduced.
Another system for disconnecting the faulty system is described in Japanese Unexamined Patent No. HEI 1-126825 bulletin (not illustrated). In this system, a purging circuit can disconnect the faulty system. But in the case of a majority circuit for plural bits (n bits), almost n times as many logic circuits are needed.
Problems Solved by the Invention
As described above, the conventional techniques provide various kinds of systems for improving the reliability of the majority circuit. However, the following problems remain.
Problem 1: In the case of a majority circuit configured as a simple circuit, the signal from the majority circuit becomes effective even if two processing units are faulty. Thus, a wrong signal may be provided to the next circuit and cause it to malfunction.
Problem 2: When a masking time setter is provided to prevent improper fault detection of a mismatch among the inputs of the majority circuit, it is difficult to set the time based on the bus operation because the masking time setter is configured with a timer, etc. In addition, a redundant circuit, namely the masking time setter including a timer, etc., is needed.
Problem 3: When the majority circuit is applied to a processing unit, the circuit scale becomes large because the output of the processing unit has plural bits. The circuit scale also becomes large when errors are corrected to improve the reliability of the majority circuit.
Problem 4: When a function circuit is separated from the majority circuit, a faulty driver of the majority circuit or a faulty receiver of the function circuit causes a wrong input to the function circuit. Accordingly, even if the reliability of the majority circuit improves, the reliability of the total circuit including the function circuit does not improve.
Problem 5: Although the majority circuit connected to the processing unit (or a controller including the majority circuit) is duplicated, the two majority circuits do not exchange signals with each other. Thus, a wrong signal may be input to the system bus when an error occurs in a majority circuit.
Problem 6: When error correction code encoders/decoders are included, an error can be avoided, but it is difficult to repair or exchange a faulty element because the error location cannot be specified.
Problem 7: When parity generators are included, the outputs of the parity generators are compared and checked. But plural-bit errors in the outputs of the processing units may not be detected because the parity check method can detect only one-bit errors.
Problem 8: In a conventional system for disconnecting a faulty system, disconnection is carried out bit by bit, so that in the case of plural bits (n bits), almost n times as many logic circuits are needed.
Problem 9: When a specific system is selected from plural systems, only the output of the selected system is used and the majority result becomes ineffective. This reduces the reliability of the whole system.