1. Field of Use
The present invention relates to a fault tolerant computer architecture.
2. Prior Art
It is known that, due to the advent of integrated circuit technologies and the consequent reduction in the cost and size of electronic components, fault tolerant computers have been proposed and put on the market. The concept of fault tolerance, as used with reference to computers, is very broad and includes all the expedients which make possible the correct operation of a computer even in the presence of a failure, or at least the immediate detection of a failure, in order to avoid incorrect data handling and the propagation and spreading of errors in the set of handled data. In other words, it is essential that failures do not result in data errors.
Several computer architectures are used to achieve this result, ranging from majority logic architectures to simple logic redundancy architectures. In majority logic architectures, the various processing functions are performed jointly in parallel by three or more functional units, and the input and output data are compared so that, in case of discrepancy among the input data or the output data, valid data are recognized as those which coincide at the input or output of two functional units, and data which differ from the majority are discarded as incorrect.
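The majority principle described above can be illustrated in software. The following is only a minimal sketch, assuming each functional unit output is modeled as an integer; the names `majority_vote` and `dissenters` are hypothetical and do not come from any particular machine.

```python
from collections import Counter


def majority_vote(outputs):
    """Return (valid_value, dissenters) for a list of unit outputs.

    valid_value is the value produced by a strict majority of the units;
    dissenters lists the indices of the units whose output was discarded.
    Raises RuntimeError when no strict majority exists.
    """
    value, votes = Counter(outputs).most_common(1)[0]
    if votes * 2 <= len(outputs):
        raise RuntimeError("no majority among unit outputs")
    dissenters = [i for i, out in enumerate(outputs) if out != value]
    return value, dissenters


# Three units process the same data; unit 2 has a faulty low-order bit.
value, dissenters = majority_vote([0x5A, 0x5A, 0x5B])
```

With three units, the vote still yields valid data after one unit fails, and the returned index identifies the unit to be replaced or repaired.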
It is clear that this kind of system may operate without causing data errors even if a failure is present in one functional unit, until the occurrence of a further failure affecting a functional unit in a data flow parallel to that of the already faulty unit. During this time interval, it is possible to assure continued service of the equipment even if the faulty unit is temporarily removed from the system for the purpose of replacement or repair.
In simple logic redundancy architectures, this objective is relinquished and it is only assured, through suitable redundancies, that a failure is immediately detected, so as to stop the running of logical processes without affecting the data correctness and integrity. Thereafter, by suitable diagnostic procedures, the defective unit may be identified, excluded from the system, replaced with a spare unit if available, or repaired.
In the most elementary form of redundancy architecture, the computer may be provided with parity bit generators and checkers in those nodes or units where information loss is more likely to occur, for instance, in the working memories. In the most sophisticated redundancy architectures, all or most of the functional units are duplicated and simultaneously operated in parallel.
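The parity generator and checker mentioned above can be sketched as follows. This is an illustrative model only, assuming even parity over an integer word; the function names `parity_bit`, `store` and `check` are hypothetical.

```python
def parity_bit(word):
    """Even-parity bit: 1 when the word holds an odd number of 1 bits."""
    return bin(word).count("1") & 1


def store(word):
    """Parity generator: keep the word together with its parity bit."""
    return (word, parity_bit(word))


def check(stored):
    """Parity checker: True when the stored word still matches its parity bit."""
    word, parity = stored
    return parity_bit(word) == parity


cell = store(0b1011)                  # three 1 bits, so the parity bit is 1
corrupted = (cell[0] ^ 0b0001, cell[1])   # a single bit is lost in memory
```

A single altered bit changes the parity and is detected. An even number of altered bits leaves the parity unchanged, so this elementary check detects failures without locating or correcting them.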
Comparators, suitably located, compare the data pair at the input to, or output from, each pair of functional units and, upon the occurrence of a discrepancy, provide a fault signal and cause a system halt. The problem in such architectures is checking the functionality of the comparators. The most common approach is to duplicate the comparators as well, giving rise to further complications and other disadvantages, such as increases in driven loads and failure probability.
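The duplicate-and-compare scheme can be modeled in a few lines. This is a sketch under stated assumptions: functional units are modeled as plain functions on 8-bit integers, the system halt is modeled as an exception, and `FaultDetected`, `run_duplicated`, `healthy_unit` and `faulty_unit` are hypothetical names introduced for illustration.

```python
class FaultDetected(Exception):
    """Models the fault signal that halts the system."""


def run_duplicated(unit_a, unit_b, data):
    """Run both copies of a functional unit and compare their outputs."""
    out_a = unit_a(data)
    out_b = unit_b(data)
    if out_a != out_b:  # the comparator
        raise FaultDetected(f"outputs differ: {out_a:#04x} vs {out_b:#04x}")
    return out_a


def healthy_unit(x):
    """Hypothetical functional unit: 8-bit increment."""
    return (x + 1) & 0xFF


def faulty_unit(x):
    """Same unit with output bit 0 stuck at 1."""
    return healthy_unit(x) | 0x01


result = run_duplicated(healthy_unit, healthy_unit, 0x41)
```

Note that the scheme relies entirely on the comparison itself working: if the `!=` test never reported a mismatch, the stuck bit in `faulty_unit` would go unnoticed, which is the comparator-checking problem raised above.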
Another approach is to cause, under test conditions, an alteration of one of the data inputs to each of the several comparators, to check whether they effectively provide an error indication. This may be obtained by providing, upstream of one input set of each comparator, a set of exclusive OR (EX OR) gates. Such gates, depending on the logic level present at one "control" input, transfer to the output the logic level present at the other input in direct or inverted form. Therefore, they allow the data at the input to the comparators to be altered at will, to check the effective generation of an error signal. Even in this case, however, a further complication results, with an increase in failure probability and cost.
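The behavior of the EX OR gates under test conditions can be sketched as follows. This is a simplified model, assuming one control bit per data bit (a control mask) and comparators modeled as functions returning True for an error signal; the names `xor_gates`, `comparator` and `self_test` are hypothetical.

```python
def xor_gates(word, control_mask):
    """Set of EX OR gates: a control bit of 0 passes the data bit directly,
    a control bit of 1 passes it inverted."""
    return word ^ control_mask


def comparator(a, b):
    """Returns True (error signal) when the two inputs differ."""
    return a != b


def self_test(compare, word, width=8):
    """Check a comparator by altering one of its inputs through the EX OR gates.

    The comparator passes if it stays silent with all controls at 0 and
    raises the error signal for every single-bit alteration.
    """
    if compare(word, xor_gates(word, 0)):
        return False  # spurious error signal in normal operation
    return all(compare(word, xor_gates(word, 1 << bit)) for bit in range(width))


def stuck_comparator(a, b):
    """Hypothetical failed comparator that never signals an error."""
    return False
```

Calling `self_test(comparator, 0xA5)` exercises every bit position of the comparator, whereas `self_test(stuck_comparator, 0xA5)` exposes a comparator that has silently failed.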
It would be possible, in order to simplify the structure, to use a single set of exclusive OR gates, located in the information flow so as to simulate a data error with effects which propagate in cascade in a line of functional units and not in the other one, therefore with consequences affecting the operation of the functional units located downstream and detectable by the comparators located downstream. This approach reduces but does not overcome the above-mentioned disadvantages. In addition, it has the disadvantage of introducing propagation delays in the logic flow, delays which are generally unacceptable.
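The single-set variant can be pictured as an alteration injected once at the head of one line of functional units and observed at every comparator downstream. The sketch below is illustrative only: the stages are hypothetical 8-bit functions, and `run_chain` and `downstream_errors` are names introduced here, not taken from any actual design.

```python
def run_chain(stages, word, inject_mask=0):
    """Pass a word through a line of functional units.

    inject_mask models the single set of EX OR gates at the head of the
    line; the output of every stage is returned for comparison.
    """
    word ^= inject_mask
    outputs = []
    for stage in stages:
        word = stage(word)
        outputs.append(word)
    return outputs


def downstream_errors(stages, word, inject_mask):
    """Error signal of each downstream comparator when only one of the two
    parallel lines receives the altered data."""
    reference = run_chain(stages, word)
    altered = run_chain(stages, word, inject_mask)
    return [a != b for a, b in zip(reference, altered)]


# Two hypothetical stages: an 8-bit adder followed by a bit scrambler.
stages = [lambda w: (w + 3) & 0xFF, lambda w: w ^ 0x0F]
signals = downstream_errors(stages, 0x10, 0x01)
```

In this model one injection point exercises both downstream comparators, but the injection stage still sits in the data path of every word processed, which corresponds to the propagation delay noted above.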