Over the past decade, the use of computers and related technology has increased tremendously. In particular, computers often support air traffic control systems, banking systems, and mission-critical defense systems, such as computer systems controlling the launch and flight of defense missiles. Because computers are deployed so ubiquitously, the failure of any one of them could severely disrupt the functioning of society. Given these potentially far-reaching adverse effects, computers are increasingly required to provide ever-higher reliability. Fault-tolerant computers generally provide this reliability in such systems.
Typically, a fault-tolerant computer includes one or more redundant central processing units (CPUs) and one or more redundant input-output (I/O) boards, or subsystems. In a fault-tolerant server, the redundant CPUs often execute in “lockstep”; that is, each CPU executes substantially identical copies of an operating system and application programs, and executes substantially identical instruction streams substantially simultaneously, or in cycle-by-cycle synchronism. As a result, if a second CPU fails, a first CPU can replace it without loss of operation of the fault-tolerant server. Such a replacement is unnoticeable to the user of the fault-tolerant computer.
To verify that the redundant CPUs are executing identical instruction streams, the I/O subsystems typically compare the I/O instructions that each redundant CPU generates. When the redundant CPUs and I/O subsystems are included in a single system, verifying that the CPUs execute in lockstep is straightforward, because each I/O subsystem can communicate with each CPU to compare the generated instructions.
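As an illustration only, the comparison described above can be sketched as follows. This is a minimal sketch, assuming each CPU's I/O instructions have been captured as an ordered sequence of strings; the function name `first_divergence` and the string encoding of instructions are hypothetical and are not taken from any particular fault-tolerant system. The function reports the position of the first I/O instruction on which the redundant CPUs disagree, which is the point at which lockstep operation would be considered lost.

```python
from typing import Optional, Sequence

# Hypothetical sentinel marking "this CPU produced no instruction at this position".
_MISSING = object()


def first_divergence(streams: Sequence[Sequence[str]]) -> Optional[int]:
    """Return the index of the first I/O instruction on which the redundant
    CPUs disagree, or None if every stream is identical (consistent with
    lockstep execution). A stream that ends early also counts as divergence."""
    if not streams:
        return None
    length = max(len(s) for s in streams)
    for i in range(length):
        # Collect the instruction each CPU issued at position i.
        ops = {s[i] if i < len(s) else _MISSING for s in streams}
        if len(ops) > 1:
            return i
    return None


# Usage: two CPUs in lockstep, then a third whose last write differs.
cpu_a = ["WRITE disk0 0x10", "READ net0", "WRITE disk0 0x20"]
cpu_b = list(cpu_a)
cpu_c = ["WRITE disk0 0x10", "READ net0", "WRITE disk0 0x24"]
print(first_divergence([cpu_a, cpu_b]))  # → None
print(first_divergence([cpu_a, cpu_c]))  # → 2
```

The sketch compares complete captured streams after the fact for clarity; an actual I/O subsystem would typically compare each instruction as it arrives.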
However, when these redundant components (CPUs and I/O subsystems) are located on more than one independent system, achieving lockstep operation of the CPUs is frequently difficult. Thus, there remains a need to enable CPUs located on more than one independent system to execute in lockstep.