To ensure reliable results, electronic data processing systems comprise, without exception, error check circuits for monitoring the arithmetical and logical operations performed in them. The best known devices used to this end are parity check circuits, which generate an additional bit, the parity bit, for a unit of data of fixed length. The parity bit is chosen so that the number of one bits within the fixed data length, including the parity bit itself, is either even or odd. At the end of each transfer section, the parity bit is generally checked for changes.
Processing steps that change the source information require the parity bit to be newly generated; the newly generated parity bit subsequently accompanies this information. Changes in the parity bit, on the other hand, indicate that an error has occurred which, depending upon its magnitude, may result in a machine stop.
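By way of illustration only, the generation, checking and regeneration of a parity bit may be sketched as follows. The names `parity_bit` and `check`, the 8-bit data length and the use of Python are assumptions made for this sketch; the description above does not prescribe any particular implementation.

```python
def parity_bit(word: int, width: int = 8, even: bool = True) -> int:
    """Return the parity bit for the low `width` bits of `word`.

    With even parity, the bit is chosen so that data plus parity bit
    contain an even number of one bits; with odd parity, an odd number.
    """
    ones = bin(word & ((1 << width) - 1)).count("1")
    bit = ones % 2  # 1 if the data alone holds an odd number of one bits
    return bit if even else bit ^ 1


def check(word: int, parity: int, width: int = 8, even: bool = True) -> bool:
    """True if the accompanying parity bit matches the one regenerated from the data."""
    return parity_bit(word, width, even) == parity


data = 0b10110010                # four one bits
p = parity_bit(data)             # parity bit generated at the source
assert check(data, p)            # check at the end of a transfer section

corrupted = data ^ 0b00001000    # one bit changed during the transfer
assert not check(corrupted, p)   # the parity check detects the change

# A processing step that changes the information requires a new parity bit.
result = (data + 1) & 0xFF
p_new = parity_bit(result)
assert check(result, p_new)
```

A single parity bit detects any odd number of changed bits within the data length; an even number of changed bits leaves the parity unchanged and goes unnoticed.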
Such machine stops reduce the availability of a digital computer system, thus affecting the execution of jobs in real time operation.
However, not all of the errors detected in a digital computer are attributable to defective circuits, which lead to permanent errors. There are many other causes, such as the discharge of high static voltages, which may produce erroneous pulses on the transfer lines. For example, a line which at the time of such a pulse should have carried a binary zero in the form of a low-level signal carries a high-level signal instead, which is equivalent to a binary one. In such a case the connected parity check circuit would detect an error in the information transferred. In known data processing systems, such an error is eliminated by operation retries. In the case of such intermittent errors, the digital computer is generally still capable of computing the correct result, so that their occurrence normally does not affect its availability.
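The operation retry mentioned above may be illustrated by the following minimal sketch. The helper names (`ones_parity`, `make_glitchy_line`, `transfer_with_retry`) and the limit of three retries are hypothetical and chosen for illustration; the description states only that known systems retry the operation.

```python
from typing import Callable


def ones_parity(word: int) -> int:
    """Parity of the number of one bits in `word` (0 = even, 1 = odd)."""
    return bin(word).count("1") % 2


def make_glitchy_line(glitches: int) -> Callable[[int], int]:
    """Hypothetical transfer line that flips the lowest bit on its first `glitches` uses."""
    remaining = [glitches]

    def send(word: int) -> int:
        if remaining[0] > 0:
            remaining[0] -= 1
            return word ^ 0b1  # erroneous pulse: one bit changes level
        return word

    return send


def transfer_with_retry(send: Callable[[int], int], word: int, max_retries: int = 3) -> int:
    """Transfer `word`, retrying while the parity check at the receiving end fails."""
    expected = ones_parity(word)
    for _ in range(1 + max_retries):
        received = send(word)
        if ones_parity(received) == expected:
            return received
    # The error survived every retry, so it is treated as permanent.
    raise RuntimeError("parity error persists after retries")


# An intermittent error disappears on retry and the correct word is obtained.
assert transfer_with_retry(make_glitchy_line(1), 0b1010) == 0b1010
```

An error that persists through every retry corresponds to the permanent errors discussed next, for which retrying cannot yield the correct result.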
However, when so-called permanent errors, i.e., errors caused by defective circuits or components, occur in a digital computer system, the correct result cannot be computed by retrying a faulty operation or function, so that the machine has to be stopped. In such cases the system remains at a standstill pending the exchange of the defective system components.
This, however, entails the disadvantage of valuable machine time being lost, which is particularly detrimental when urgent jobs have to be carried out. Thus, it can be said that permanent errors seriously affect the availability of the system.
As there are certain electronic data processing applications where interruptions have to be avoided at all costs, it has been previously proposed to provide a data processing system consisting of two synchronized data processing units which perform the same functions on the input data, each processing unit comprising a plurality of data sources corresponding to a plurality of data sources in the other processing unit. The two data processing units of the system concerned are connected in such a manner that they automatically monitor each other, disconnecting the defective data processing unit from the system in the case of an error. Such a system ensures largely trouble-free operation and meets the reliability standards expected of modern data processing systems. However, from the cost standpoint such a completely redundant data processing system is highly uneconomical, as it requires twice the number of units for handling its jobs.
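The duplicated arrangement described above may be sketched, under simplifying assumptions, as follows. The class `DuplexSystem`, the representation of a processing unit as a function returning a result together with the state of its own check circuit, and the two example units are all hypothetical; in particular, the sketch assumes that the defective unit is identified by its own check circuit, which the description does not specify.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# A unit takes two operands and returns (result, check_ok).
Unit = Callable[[int, int], Tuple[int, bool]]


@dataclass
class DuplexSystem:
    units: List[Unit]

    def execute(self, a: int, b: int) -> int:
        """Run the same function on every connected unit and drop any unit in error."""
        outcomes = [(unit, *unit(a, b)) for unit in self.units]
        # Disconnect every unit whose check circuit reported an error.
        self.units = [unit for unit, _, ok in outcomes if ok]
        good = [result for _, result, ok in outcomes if ok]
        if not good:
            raise RuntimeError("no operational processing unit left")
        return good[0]


def healthy_unit(a: int, b: int) -> Tuple[int, bool]:
    return a + b, True


def defective_unit(a: int, b: int) -> Tuple[int, bool]:
    return 0, False  # wrong result, flagged by the unit's own check circuit


system = DuplexSystem([healthy_unit, defective_unit])
assert system.execute(2, 3) == 5   # the job completes without interruption
assert len(system.units) == 1      # the defective unit has been disconnected
```

The sketch also makes the cost argument visible: every function is executed twice, although only one result is ever used.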
It has also been proposed to connect a main processing system and an error processing system via buses. The error processing system monitors the check circuits of the main processing system by means of an addressing arrangement and, in the case of an error, identifies the corresponding check circuit. It then takes over the source data from the registers and functional units that have contributed to the faulty operation, stores the erroneous function and computes it in its own processing system, and transfers the result, via a selectable transfer system, to the result register of the main processing system corresponding to that function. Finally, it starts the main processing system for the next functions by setting a switch.
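The sequence of steps performed by such an error processing system may be sketched as follows. This is a simplified model under stated assumptions: the class `MainSystem`, its fields, the table `FUNCTIONS` and the routine `error_processor_step` are hypothetical names introduced for illustration, and the buses and the selectable transfer system are reduced to plain assignments.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

# Functions the error processing system is assumed to be able to recompute.
FUNCTIONS: Dict[str, Callable[[int, int], int]] = {
    "add": lambda a, b: a + b,
    "and": lambda a, b: a & b,
}


@dataclass
class MainSystem:
    source_regs: Dict[str, int]        # source registers "A" and "B"
    check_to_function: Dict[int, str]  # check circuit address -> guarded function
    check_flags: Dict[int, bool] = field(default_factory=dict)  # address -> error seen
    result_regs: Dict[str, int] = field(default_factory=dict)   # one per function
    run_switch: bool = False           # set to start the next functions


def error_processor_step(main: MainSystem) -> Optional[str]:
    """Scan the check circuits by address and repair the first faulty function found."""
    for address in sorted(main.check_flags):
        if not main.check_flags[address]:
            continue
        function = main.check_to_function[address]           # identify the check circuit
        a, b = main.source_regs["A"], main.source_regs["B"]  # take over the source data
        result = FUNCTIONS[function](a, b)                   # recompute the function
        main.result_regs[function] = result                  # transfer to the result register
        main.check_flags[address] = False
        main.run_switch = True                               # restart the main system
        return function
    return None


main = MainSystem(
    source_regs={"A": 6, "B": 3},
    check_to_function={0: "add", 1: "and"},
    check_flags={0: False, 1: True},
)
assert error_processor_step(main) == "and"
assert main.result_regs["and"] == 2 and main.run_switch
```

Each repaired function involves one transfer of source data out of the main system and one transfer of the result back into it.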
Although from the cost standpoint such a system is altogether more favourable than the one previously described, it does not offer an optimal solution with regard to the cost and time factors involved, since each function and operation necessitates data transfers from the main processing system to the error processing system and vice versa.