1. Field of the Invention
The invention relates to digital processors, particularly to a processor architecture that provides highly fault-tolerant performance in both multiprocessor parallel operation and uniprocessor sequential operation.
2. Description of the Prior Art
Computer systems employing multiprocessor parallel operation and uniprocessor sequential operation are prevalent present-day computer architectures. Multiprocessor systems traditionally share common memory and execute relatively short sequences, or threads, of instructions in parallel. The parallelism of such systems is often quite fine-grained. In a uniprocessor system, the instructions of a task are executed sequentially. Fault-tolerant operation is generally a desideratum of such systems. In fine-grain parallel processing, fault tolerance is even more desirable than in sequential processing. The failure of a processor in a parallel system while it is executing one of several parallel execution paths wastes the efforts of several processors. This condition is exacerbated when parallel execution threads converge and are followed by sequential processing: a failure in a processor executing one of the threads may waste the work of all of the other processors.
The prior art endeavors to achieve fault tolerance by utilizing an auxiliary processor identical to the primary processor. The auxiliary processor operates in lock-step with the primary processor such that both processors simultaneously execute the same instruction of the same program thread. Whenever the thread being executed produces output, such as an external access to memory or to a message link, the outputs of the two processors are compared; if they disagree, the output is not used and the thread, or the entire program, is rerun. This technique is used in both multiprocessor and uniprocessor architectures. The disadvantage of this approach is that the instruction execution error that produced the erroneous output may have occurred many instruction execution cycles before the output and, by the time of the output, may have been obscured by subsequent processing. The only recovery option is then to rerun a significant portion of the program, which is an extremely time-consuming procedure.
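The detection latency of output-only lock-step comparison can be illustrated with a minimal Python sketch. It assumes a toy model that is not taken from the source: each processor holds a single integer of state, the function `run_lockstep`, the `("op", fn)` / `("out", None)` instruction encoding and the injected bit flip are all hypothetical, and the two copies are compared only at `out` instructions, as in the prior-art scheme described above.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical instruction encoding: ("op", fn) updates the processor state,
# ("out", None) is an external output at which the two copies are compared.
Instr = Tuple[str, Optional[Callable[[int], int]]]


def run_lockstep(
    program: List[Instr],
    fault_cycle: Optional[int] = None,
    fault_mask: int = 1,
) -> Tuple[Optional[int], int]:
    """Run primary and auxiliary copies in lock-step (toy model).

    Returns (detection_cycle, cycles_to_rerun). detection_cycle is None if
    every output agreed. cycles_to_rerun counts the cycles executed since the
    last agreeing output, all of which must be re-executed on a mismatch.
    """
    primary = 0
    auxiliary = 0
    last_good_output = 0  # first cycle after the most recent agreeing output
    for cycle, (kind, fn) in enumerate(program):
        if kind == "op":
            primary = fn(primary)
            auxiliary = fn(auxiliary)
            if cycle == fault_cycle:
                # Assumed transient fault: flip bits in the auxiliary copy only.
                auxiliary ^= fault_mask
        elif kind == "out":
            # Comparison happens only here, possibly long after the fault.
            if primary != auxiliary:
                return cycle, cycle - last_good_output
            last_good_output = cycle + 1
    return None, 0


if __name__ == "__main__":
    thread = [("op", lambda x: x + 1)] * 10 + [("out", None)]
    print(run_lockstep(thread, fault_cycle=2))
```

With a fault injected at cycle 2 of a ten-instruction thread that ends in a single output, the mismatch is detected only at cycle 10, so all ten preceding cycles must be rerun even though the fault arose near the start of the thread.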
Tightly-coupled multiprocessor systems used in fast real-time processing provide high throughput and reliability. Although fault tolerance in sequential systems is significant, it is believed to be even more important in parallel processing for hard real-time systems. The consequences of re-executing a segment of computation are undesirable enough for a single processor and are exacerbated when multiple processors are involved.