Referring to FIG. 1, a typical prior art computer 10 includes a central processing unit (CPU) 12, a random-access memory (RAM) 20, and a mass storage device 32 connected by a system bus 18 that passes data and messages between the components connected to it. Computer performance can be increased, for example, by increasing the clock speed at which the CPU 12 operates. However, the performance gained from a faster CPU clock is limited by the rate at which the system bus 18 conveys data between the components of computer 10.
Typically, system bus 18 has a lower clock rate than CPU 12. Thus, computer performance may also be increased by increasing the clock speed of the bus 18, thereby increasing the throughput of communications carried by the bus 18. One implementation of a high-bandwidth integrated memory subsystem using a bus with a fast clock is the RAMBUS specification from RAMBUS, Inc. of Los Altos, Calif. The RAMBUS system uses a 400 MHz clock with data transfers triggered on both the rising and falling edges of the clock signal. Therefore, one line in a RAMBUS channel has a bandwidth of 800 Mb/s.
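The per-line figure follows directly from the clock rate and the double-edge triggering. As a worked check, assuming one bit is transferred per line on each clock edge (the symbols below are introduced here for illustration only):

```latex
\[
B_{\text{line}}
  = f_{\text{clk}} \times n_{\text{edges}} \times 1\ \text{bit}
  = \left(400 \times 10^{6}\ \text{s}^{-1}\right) \times 2 \times 1\ \text{bit}
  = 800\ \text{Mb/s}
\]
```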
FIG. 2 depicts one embodiment of the RAMBUS integrated memory subsystem. The memory system typically includes a direct RAMBUS controller 50 and at least one RAMBUS Integrated Memory Module (RDRAM) 52 connected by direct RAMBUS channels 54 ending in a channel terminator 56. The controller 50 operates as a bus master for the memory subsystem: it generates requests, controls the flow of data, and keeps track of RDRAM refresh and states. The channel 54 is composed of thirty individual lines triggered on both edges of a 400 MHz clock signal, resulting in a 2.4 Gb/s throughput. The terminator 56 is a matched impedance that absorbs any signal reaching the end of the channel 54, so that no reflections occur.
In order to operate at this level of throughput, the operation of components in the RAMBUS subsystem is tightly monitored and periodically adjusted to maintain performance within predetermined tolerances. During these periodic recalibration events, the memory subsystem is not available for memory read or write transactions. In a single processor or asynchronous multiprocessor computer, this recalibration results in a short delay when the memory subsystem is unavailable. However, unsynchronized recalibration events can cause errors in a synchronized multiprocessor computing environment.
Certain prior art computer systems achieve fault tolerance through multiply-redundant system components. Each computer has multiple CPUs, each CPU having its own memory subsystem and other support electronics. The CPUs are cycle-synchronized to run identical copies of the same program simultaneously. Additional logic monitors the output of each CPU at a given point in time and, if the outputs disagree, restarts the system or initiates a diagnostic sequence to correct or identify the problem. If each CPU is equipped with a high-bandwidth memory subsystem that requires periodic recalibration, then the output of each individual CPU will appear to stall during a recalibration period. If recalibration among the multiple memory subsystems is uncoordinated, then during recalibration events the outputs of the CPUs may differ, causing the monitor logic to halt or restart the system. Therefore, it is desirable to implement high-bandwidth memory in a lockstepped multiprocessor computing environment while avoiding delay-induced voter miscompares and other problems.
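The effect of uncoordinated recalibration on the monitor logic can be illustrated with a minimal simulation. This is a toy sketch, not the behavior of any actual RAMBUS controller or fault-tolerant system: the names `Cpu`, `STALL`, and `count_miscompares`, and the period, length, and offset values, are hypothetical and chosen only for illustration. Each simulated CPU runs the same program and emits no output during cycles in which its memory subsystem is recalibrating.

```python
from dataclasses import dataclass

STALL = None  # hypothetical marker: CPU produced no output this cycle


@dataclass
class Cpu:
    """Toy CPU whose memory subsystem recalibrates periodically (assumed model)."""
    recal_period: int   # cycles between the starts of recalibration events
    recal_length: int   # cycles during which memory is unavailable
    recal_offset: int   # cycle at which the first recalibration starts
    progress: int = 0   # program steps completed so far

    def step(self, cycle: int):
        """Advance one clock cycle and return this cycle's output, or STALL."""
        phase = (cycle - self.recal_offset) % self.recal_period
        if cycle >= self.recal_offset and phase < self.recal_length:
            return STALL            # memory unavailable, so the CPU appears to stall
        self.progress += 1
        return self.progress        # identical programs give identical step outputs


def count_miscompares(offsets, cycles=100, period=20, length=3):
    """Count cycles in which the monitor logic sees the CPU outputs disagree."""
    cpus = [Cpu(period, length, off) for off in offsets]
    miscompares = 0
    for cycle in range(cycles):
        outputs = [cpu.step(cycle) for cpu in cpus]
        if len(set(outputs)) > 1:   # any disagreement counts as a miscompare
            miscompares += 1
    return miscompares
```

Under these assumptions, `count_miscompares([0, 0])` returns 0 because both CPUs stall on the same cycles, whereas `count_miscompares([0, 7])` returns a nonzero count because each CPU stalls while the other continues to produce output.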