Fault-tolerant computer systems have used several arrangements to detect and/or correct hardware faults, including parity on the bus system, EDAC (error detection and correction) for memory, and redundant systems. Redundant systems use two primary methods: N+1 and "hot" standby. Both require switching a backup system on-line when a fault in the primary system is detected. Detecting the fault, however, may be difficult.
Limiting the scope to the CPU (central processing unit), detecting a fault requires a comparison between a known value and an unknown value. For the CPU peripherals, parity or a derivative of it can be used to detect most errors. Detecting faults in the processor itself, however, is not as simple. The most common method is to run two processors synchronized with each other and compare their signals for any differences. If the comparison finds a difference, the CPU system is taken off-line and diagnostic software is used to help isolate the fault.
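The comparison-based detection described above can be illustrated with a minimal software model. This is a sketch under stated assumptions, not the circuitry of the present invention: each processor is modeled as a hypothetical deterministic step function (`good_cpu`, `faulty_cpu`), both receive identical stimulus on every cycle, and a stuck-at fault on one output bit is injected purely for illustration.

```python
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical model: a "processor" maps (state, stimulus) -> (new_state, output).
Step = Callable[[int, int], Tuple[int, int]]


def lockstep_compare(cpu_a: Step, cpu_b: Step, inputs: Iterable[int],
                     init: int = 0) -> Optional[int]:
    """Drive both processors with identical stimulus and compare outputs each cycle.

    Returns the index of the first cycle whose outputs differ, or None if the
    two processors agree on every cycle.
    """
    state_a = state_b = init
    for cycle, stimulus in enumerate(inputs):
        state_a, out_a = cpu_a(state_a, stimulus)
        state_b, out_b = cpu_b(state_b, stimulus)
        if out_a != out_b:
            # Mismatch: in a real system the CPU pair would go off-line for diagnostics.
            return cycle
    return None


def good_cpu(state: int, x: int) -> Tuple[int, int]:
    # Hypothetical 8-bit accumulator; its output equals its new state.
    s = (state + x) & 0xFF
    return s, s


def faulty_cpu(state: int, x: int) -> Tuple[int, int]:
    # Same accumulator, but with a hypothetical stuck-at-1 fault on output bit 3.
    s = (state + x) & 0xFF
    return s, s | 0x08


print(lockstep_compare(good_cpu, good_cpu, [1, 2, 3]))   # None
print(lockstep_compare(good_cpu, faulty_cpu, [8, 8]))    # 1
```

In the second call the fault stays latent on cycle 0 because the correct output (8) already has bit 3 set. It is only detected on cycle 1, when the output (16) would otherwise have that bit clear. This is why comparison only exposes a fault once the workload exercises it.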
In prior systems, the basic assumption for "microsynched" processor systems was that if the two CPUs started simultaneously and received the same input stimulus, they would always execute in lock-step absent a fault. (Examples of such prior art can be seen in U.S. Pat. Nos. 4,412,282 and 4,633,039, both assigned to the same assignee as the present invention.) This required that both processors receive the same input signals, such as CLOCK, RESET, and INTERRUPTS; as a result, the outputs of the two processors would be identical within timing specifications. This assumption may not be valid for today's complex processors. Therefore, simply providing the same stimulus may not ensure concurrent operation of the two processors.
It is therefore a primary objective of the present invention to provide a means of synchronizing two or more processors, thereby allowing detection of faults in a processor CPU system.