Conventional processing systems generally have been configured around a single central processor. The central processor has been the main computing unit of the processing system. The central processor has also generally been the main administrator of the system, charged with coordinating the operation of various system units to perform system tasks. Like the central processor, the other system units have largely not been duplicated in the system.
Such systems have not been highly reliable, in that they have been prone to partial or total operational failure when a component unit becomes faulty. In particular, a major fault in the central processor has generally brought down the whole processing system, and the system has been unavailable for performing assigned tasks until the fault was repaired.
The ever-increasing demands placed on the processing capabilities of systems have often exceeded the computing capacity of a single central processor. A solution to this problem has been to attach one or more auxiliary processors to the central processor in such systems. The function of the auxiliary processor has been to take some of the computational load off the central processor, and thus to increase the overall system processing capability. Commonly, however, the auxiliary processor has operated merely as a special purpose computing unit under the control of the central processor. It has commonly shared other system resources with the central processor and has often been adapted to communicate with those resources only via the central processor. The central processor has retained administrative control over the processing system.
The processing power of a system has been significantly increased by the attachment of an additional processor. However, since most system units have remained unduplicated and the central processor has retained its key position in such processing systems, system reliability has not been improved. In particular, the susceptibility of the system to faults in the central processor has remained substantially unchanged from the single processor configuration.
On the other hand, certain applications of processing systems, such as communication switching systems, cannot easily tolerate being put out of service by the failure of system units. Such applications require the use of highly reliable processing systems. Reliability in such systems has been achieved by the duplication of system units, in particular the central processor. Generally, the duplicated central processors in such systems operate in one of three configurations. In a lock-step configuration, each processor performs all system tasks in parallel with the other processor. In an active-standby configuration, one processor performs all system tasks while the second processor acts as a backup, standing idly by and ready to take over should the one processor fail. In a checkpoint configuration, a processor sends information about the transactions that it undertakes to another processor, so that if the one processor were to fail, the other processor could compute the current state of the failed processor and take over its transactions from that point.
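The checkpoint configuration may be illustrated by a minimal sketch. The sketch is hypothetical and is not drawn from any particular prior-art system: the class name Processor, the methods execute and take_over, and the representation of a transaction as a keyed numeric change are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Processor:
    """Hypothetical processor that can checkpoint its transactions to a peer."""
    name: str
    state: dict = field(default_factory=dict)     # this processor's own state
    received: list = field(default_factory=list)  # checkpoint records sent by a peer

    def execute(self, key, delta, peer=None):
        """Apply a transaction locally and, if a peer is given, checkpoint it."""
        self.state[key] = self.state.get(key, 0) + delta
        if peer is not None:
            # The peer only stores the record; it does not act on it yet.
            peer.received.append((key, delta))

    def take_over(self):
        """Compute the failed peer's state by replaying its checkpoint records."""
        for key, delta in self.received:
            self.state[key] = self.state.get(key, 0) + delta
        self.received.clear()
        return self.state
```

In this sketch the backup processor does no work on the transactions while the first processor is healthy. Only upon failure does it replay the stored records to arrive at the state from which it continues.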
In such systems either one of the duplicated processors is adapted to handle all system tasks alone. Thus a fault in one of the processors does not bring about the failure of the processing system. The other processor carries on all system tasks, but without a backup, until the faulty processor is repaired. In such systems it takes the simultaneous failure of both processors to incapacitate the system. System reliability is thus significantly improved over the single processor configuration. However, even though such highly reliable systems have incurred the expense of duplicating system units, including the central processor, they can effectively utilize the processing power of only one of the processors at any one time. Their processing capacity therefore is no better than that of a corresponding system having only one processor.
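The trade-off just described may be made concrete by a short calculation. The calculation rests on assumptions not stated above: that each processor is out of service with the same probability p, that the two processors fail independently of each other, and that each processor contributes one unit of processing capacity. The function name compare_configurations is likewise hypothetical.

```python
def compare_configurations(p):
    """Compare a single-processor system with a duplicated one.

    p is the assumed probability that one processor is out of service.
    Failures of the two processors are assumed to be independent.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")
    return {
        # One processor: the system is down whenever that processor is down.
        "single": {"unavailability": p, "capacity": 1.0},
        # Duplicated: the system is down only when both are down at once,
        # yet only one processor's capacity is usable at any one time.
        "duplicated": {"unavailability": p * p, "capacity": 1.0},
    }
```

Under these assumptions, duplication reduces the probability of system outage from p to the square of p while leaving the usable processing capacity unchanged.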
Multi-processor systems have also become known to the art. Such systems include a plurality of processing units, often operating independently of each other. The processing units are commonly attached to a shared communication bus, over which they communicate both with each other and with shared resources such as memory. It has been difficult to make such systems highly reliable, one reason being that all of the processing units share, and hence all depend upon the proper functioning of, the shared resources such as memory. And because a plurality of the processing units share the resource, there is an increased chance that one of the processing units will malfunction and adversely affect the shared resource, thereby also adversely affecting the operation of the other processing units that depend upon that resource.