For mission-critical computer systems, one key operational parameter is availability. If parts of a system fail, the system should continue to be available, preferably with no reduction in performance.
It is known to provide spare memory modules, with a way to automatically substitute a working module for a defective module. See, for example, U.S. Pat. No. 4,093,985. Typically, because of the cost of memory, the amount of memory used as spare memory is much less than the amount of memory actively being used, so the spare memory cannot hold a duplicate of all active data. As a result, when errors are detected in a defective memory unit, the contents of the defective unit must be copied to the spare memory before the defective unit is inactivated. Depending on the size of the defective unit, copying the contents may affect performance.
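A minimal software model of such a sparing scheme follows, for illustration only. It is not the method of U.S. Pat. No. 4,093,985; the class name SparedMemory, the per-unit error counters, the error threshold, and the word-addressed layout are all assumptions. The model shows that a spare unit holds no data until it is substituted, so the whole defective unit is copied at that moment, and the copy cost grows with the size of the unit.

```python
class SparedMemory:
    """Hypothetical model: active units plus a smaller pool of spare units."""

    def __init__(self, num_units, unit_size, num_spares, error_threshold=2):
        self.unit_size = unit_size
        self.units = [[0] * unit_size for _ in range(num_units)]
        self.spares = [[0] * unit_size for _ in range(num_spares)]
        self.errors = [0] * num_units
        self.error_threshold = error_threshold  # assumed policy parameter
        self.remap = {}  # logical unit index -> spare index
        self.words_copied = 0

    def _storage(self, unit):
        # A substituted unit is served from its spare from now on.
        if unit in self.remap:
            return self.spares[self.remap[unit]]
        return self.units[unit]

    def write(self, addr, value):
        unit, offset = divmod(addr, self.unit_size)
        self._storage(unit)[offset] = value

    def read(self, addr):
        unit, offset = divmod(addr, self.unit_size)
        return self._storage(unit)[offset]

    def report_error(self, unit):
        self.errors[unit] += 1
        if self.errors[unit] > self.error_threshold and unit not in self.remap:
            self._substitute(unit)

    def _substitute(self, unit):
        used = set(self.remap.values())
        free = [i for i in range(len(self.spares)) if i not in used]
        if not free:
            raise RuntimeError("no spare unit available")
        spare = free[0]
        # The spare holds no copy yet, so every word must be copied
        # before the defective unit can be inactivated.
        for offset in range(self.unit_size):
            self.spares[spare][offset] = self.units[unit][offset]
            self.words_copied += 1
        self.remap[unit] = spare


mem = SparedMemory(num_units=4, unit_size=8, num_spares=1, error_threshold=2)
for a in range(32):
    mem.write(a, a * 10)
for _ in range(3):
    mem.report_error(2)  # the third error exceeds the threshold
print(mem.remap)         # {2: 0}
print(mem.read(17))      # 170, now served from the spare
print(mem.words_copied)  # 8, one full unit copied
```

In this sketch the copy happens synchronously inside report_error; a real controller would interleave it with normal traffic, which is where the performance impact described above arises.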
Where both availability and full performance are critical, it is known to provide two completely separate redundant memory systems with identical data contents. If errors in one of the systems exceed a predetermined threshold, the system with errors may be inactivated and the other memory system may be activated, with little or no impact on performance. Such systems are called mirrored memory systems. If memory modules can be replaced while the overall computer system is running, replacement is sometimes called hot swapping or hot plugging.
FIG. 1 illustrates a mirrored memory system with two separate controllers and two separate memory busses. A processor 100 communicates over a processor bus with two memory controllers 102 and 104. Controller 102 controls a first memory bus A. Controller 104 controls a second memory bus B. Two memory units, A0 and A1, are illustrated on memory bus A. Two memory units, B0 and B1, are illustrated on memory bus B. In the configuration illustrated in FIG. 1, controllers 102 and 104 operate in parallel. Whatever is written to memory unit A0 is also written to memory unit B0. Whatever is written to memory unit A1 is also written to memory unit B1. Memory read transactions use only one memory bus. For example, if memory bus A is active, then memory bus B is not used for memory read transactions. If, for example, memory bus A is active and memory unit A1 is determined to be defective (for example, on the basis of correctable memory errors), memory read transactions may be switched from memory bus A to memory bus B. Power to memory bus A may be disconnected, and an entire bank of memory containing memory unit A1 may be removed and replaced, with no interruption of service or impact on performance. After memory unit A1 is replaced, data in memory unit B1 is copied to replacement unit A1 to restore full mirroring. This copying of data may be performed as a background process without affecting performance.
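The read, write, and failover behavior of the FIG. 1 arrangement can be modeled in software as follows. This is a minimal sketch under stated assumptions: the class and method names, the per-unit error counters, the threshold value, the word-addressed layout, and the choice to replace every unit on the failed bus are hypothetical, and real controllers perform these steps in hardware. The model shows that, because both busses already hold identical data, switching reads from bus A to bus B requires no copying; copying is needed only to bring the replacement units back into the mirror.

```python
class MirroredMemory:
    """Hypothetical model of FIG. 1: busses A and B written in parallel,
    with reads served by a single active bus."""

    def __init__(self, num_units, unit_size, error_threshold=2):
        self.unit_size = unit_size
        self.error_threshold = error_threshold  # assumed policy parameter
        self.busses = {b: [[0] * unit_size for _ in range(num_units)] for b in "AB"}
        self.online = {"A": True, "B": True}
        self.active = "A"
        self.errors = {}  # (bus, unit) -> correctable error count

    @staticmethod
    def _other(bus):
        return "B" if bus == "A" else "A"

    def write(self, addr, value):
        unit, offset = divmod(addr, self.unit_size)
        # Every write goes to the same unit on each online bus (A0/B0, A1/B1).
        for bus, units in self.busses.items():
            if self.online[bus]:
                units[unit][offset] = value

    def read(self, addr):
        unit, offset = divmod(addr, self.unit_size)
        # Reads use only the active bus; the other bus carries no reads.
        return self.busses[self.active][unit][offset]

    def report_error(self, bus, unit):
        key = (bus, unit)
        self.errors[key] = self.errors.get(key, 0) + 1
        if self.errors[key] > self.error_threshold and self.online[bus]:
            other = self._other(bus)
            if not self.online[other]:
                raise RuntimeError("no healthy bus left")
            self.active = other       # switch reads; data is already there
            self.online[bus] = False  # bus may now be powered down and swapped

    def replace_bank(self, bus):
        # Replacement modules arrive empty and with clean error counters.
        self.busses[bus] = [[0] * self.unit_size for _ in self.busses[bus]]
        self.errors = {k: v for k, v in self.errors.items() if k[0] != bus}

    def resync(self, bus):
        # Copy from the surviving bus to restore full mirroring.
        source = self.busses[self._other(bus)]
        self.busses[bus] = [list(unit) for unit in source]
        self.online[bus] = True


mem = MirroredMemory(num_units=2, unit_size=4, error_threshold=2)
for a in range(8):
    mem.write(a, a + 100)
for _ in range(3):
    mem.report_error("A", 1)  # A1 exceeds the threshold
print(mem.active)             # B
print(mem.read(5))            # 105, served from B1
mem.write(6, 999)             # only bus B is online, so only B1 is updated
mem.replace_bank("A")
mem.resync("A")
print(mem.busses["A"][1])     # [104, 105, 999, 107]
```

Here resync runs as one step; in the system described above it would proceed in the background while writes continue to reach both busses.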
Mirrored memory systems typically duplicate complex and expensive memory controllers and memory busses. There is a need for less expensive and less complex mirrored memory systems.