The ever-increasing demands placed on processing systems have often exceeded the computing capacity of a single central processor. One solution to this problem has been to attach one or more auxiliary, or adjunct, processors to the central processor. The function of the adjunct processor is to take some of the computational load off the central processor and thus to increase the overall processing capacity of the system.
Certain applications of processing systems, such as communications switching, cannot easily tolerate being put out of service by the failure of system units. Such applications require highly reliable processing systems. Reliability in such systems has been achieved by duplicating system units, in particular the processors. Generally, the duplicated processors operate in one of three configurations: a lock-step configuration, in which each processor performs all system tasks in parallel with the other processor; an active-standby configuration, in which one processor performs all system tasks while the second processor acts as a backup, standing idle but ready to take over should the first processor fail; or a checkpoint configuration, in which a processor periodically sends information about the transactions that it undertakes to another processor so that, should the first processor fail, the other processor can compute the current state of the failed processor and take over its transactions from that point. These arrangements are known as redundancy arrangements.
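The checkpoint configuration described above can be illustrated with a minimal sketch. It is not drawn from any particular system: the class names `Primary` and `Backup`, the `checkpoint_every` interval, and the call-state transactions are all hypothetical, and a transaction is assumed to be a simple key/value update.

```python
from dataclasses import dataclass, field


@dataclass
class Backup:
    """Hypothetical backup processor that records checkpoints from a primary."""
    log: list = field(default_factory=list)

    def receive_checkpoint(self, transaction):
        # Store information about a transaction the primary has undertaken.
        self.log.append(transaction)

    def recover_state(self):
        # Recompute the primary's last checkpointed state by replaying the log.
        state = {}
        for key, value in self.log:
            state[key] = value
        return state


@dataclass
class Primary:
    """Hypothetical primary processor that periodically checkpoints to a backup."""
    backup: Backup
    checkpoint_every: int = 2  # assumed checkpoint interval, in transactions
    state: dict = field(default_factory=dict)
    pending: list = field(default_factory=list)

    def execute(self, key, value):
        # Apply the transaction locally, then queue it for the next checkpoint.
        self.state[key] = value
        self.pending.append((key, value))
        if len(self.pending) >= self.checkpoint_every:
            self.flush()

    def flush(self):
        # Send all queued transaction records to the backup.
        for transaction in self.pending:
            self.backup.receive_checkpoint(transaction)
        self.pending.clear()


backup = Backup()
primary = Primary(backup)
primary.execute("call-1", "ringing")
primary.execute("call-1", "connected")  # second transaction triggers a checkpoint
primary.execute("call-2", "ringing")    # executed but not yet checkpointed

# Simulate failure of the primary: the backup recomputes state from its log.
recovered = backup.recover_state()
```

In this sketch the backup can resume only from the last checkpoint, so the third transaction, which had not yet been sent when the primary failed, is absent from the recovered state. A shorter checkpoint interval narrows that gap at the cost of more traffic between the processors.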
In redundant systems, either one of the duplicated processors is adapted to handle all system tasks alone. Thus, a fault in one of the processors does not bring about the failure of the processing system: the other processor carries on all system tasks, but without a backup, until the faulty processor is repaired. It takes the simultaneous failure of both processors to incapacitate such a system. System reliability is thus significantly improved, but at the cost of adding a second processor that effectively goes unused. Furthermore, the ability of the second processor to take over the processing of system tasks can be guaranteed on a continuing basis only through extensive, complex, and expensive monitoring arrangements.
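The reliability gain from duplication can be made concrete with a short calculation. The sketch below assumes, purely for illustration, that the two processors fail independently and that each is unavailable with the same probability; neither the independence assumption nor the numerical figure comes from any measured system, and the function name `duplex_unavailability` is hypothetical.

```python
def duplex_unavailability(p_single: float) -> float:
    """Probability that both processors are down at once.

    Assumes the two processors fail independently and that each is
    unavailable with probability p_single (an illustrative assumption).
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be a probability between 0 and 1")
    return p_single * p_single


single = 0.001                           # hypothetical per-processor unavailability
duplex = duplex_unavailability(single)   # both processors down simultaneously
```

Under these assumptions the duplicated system is unavailable only when both processors are down at the same time, so its unavailability is the square of the single-processor figure. Failures that affect both processors together, such as a fault in a shared resource, would make the real figure worse than this estimate.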
Multi-processor systems have also come into extensive use. Such systems include a plurality of processors that operate independently of each other, so that the processing power of none of the processors is wasted. The processors are commonly attached to a communication bus, which they share and over which they communicate both with each other and with shared resources such as memory. When one of the processors fails, the other processors take on the failed processor's processing load and continue to carry on all system tasks. Nevertheless, it has been difficult to make such systems highly reliable. One reason is that all of the processors share, and hence all depend upon the proper functioning of, the shared resources, such as communication buses and memory. Moreover, because a plurality of the processors share each resource, there is an increased chance that one of the processors will malfunction and adversely affect the shared resource, thereby also adversely affecting the operation of the other processors that depend upon that resource.
Attempts have been made to combine the desirable features of the redundant-processor and multi-processor architectures in a single architecture. An example is disclosed in U.S. Pat. No. 4,823,256, which describes a dual-processor system that can be configured, and reconfigured at will, to operate either as a multi-processor in which both processors operate independently of each other, or as a redundant processor operating in the active-standby redundancy mode. However, the complexity of the disclosed implementation makes it too expensive and commercially impractical for all but a few specialized applications.
What the art still lacks is a fault-tolerant processing system architecture that is simple in design and inexpensive to implement, use, and maintain, yet that does not sacrifice robustness, reliability, and fault-tolerance.