1. Technical Field
The present inventions relate to memory systems with primary and redundant memory.
2. Background Art
Computer systems typically include memory devices. Dynamic random access memories (DRAMs) are commonly used memory devices that store relatively large amounts of data. Memory controllers issue write requests and read requests to DRAMs. The data to be stored in response to a write request may originate from a processor or another chip. The data provided by the DRAM in response to a read request may be used by the processor or another chip. The memory controller may be in a physically separate chip from the processor or may be on the same chip as the processor.
Computer systems, including server systems, follow a technology trend in which memory subsystems are increasing in both absolute size and device density. Accompanying the larger memory subsystems is an increasing occurrence of both soft and hard errors in the DRAM devices used to implement the memory subsystem. As the memory subsystem grows, so does the statistical probability of a multi-bit error in any given quantum of data manipulated by the memory controller. In many cases, the memory controller operates on a fixed data size corresponding to a fraction of a cache-line size of the platform processor complex. For example, a memory controller designed for CPUs with a 64-byte line may store eight 64-bit fragments independently.
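The fragment arithmetic in the example above can be illustrated with a minimal sketch. The function name `split_cache_line` and the little-endian byte order are assumptions made only for illustration; the text does not specify how a controller orders bytes within a fragment.

```python
CACHE_LINE_BYTES = 64   # line size taken from the example in the text
FRAGMENT_BITS = 64      # fragment width taken from the example in the text


def split_cache_line(line: bytes) -> list:
    """Split one cache line into independent 64-bit fragments (hypothetical helper)."""
    if len(line) != CACHE_LINE_BYTES:
        raise ValueError("expected a 64-byte cache line")
    step = FRAGMENT_BITS // 8  # bytes per fragment
    # Byte order is an arbitrary choice here, not something the text specifies.
    return [
        int.from_bytes(line[i:i + step], "little")
        for i in range(0, CACHE_LINE_BYTES, step)
    ]


fragments = split_cache_line(bytes(range(64)))
```

Each element of `fragments` is one 64-bit quantity that a controller of this kind could store and protect independently.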
Recent server system products have exhibited several features targeted at limiting the system impact of both soft and hard errors in the DRAM memory subsystem. Today, it is common for memory controllers to implement an “Error Correcting Code” (ECC) algorithm, where additional bits of data are stored along with each cache-line fragment, such that any single bit error or combination of bit errors within an aligned nibble may be corrected in hardware. This mechanism permits a system to continue operating reliably in the presence of occasional single-bit soft errors, as well as in the presence of a hard error affecting up to an entire ×4 DRAM device. Extensions of this algorithm are available to protect against failed ×8 DRAM devices. But the ECC mechanism may break down when multiple soft errors are encountered on a single access to the memory store, because the limited correcting code (typically 8 check bits for every 64 data bits, or 16 check bits for every 128 data bits) may not be able to cover all combinations of two or more bit errors scattered across the affected data.
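As a rough illustration of why the check-bit budget is limited, the following sketch computes the textbook minimum number of check bits for a single-error-correcting, double-error-detecting (SECDED) Hamming code, using the standard condition 2^r >= m + r + 1 plus one overall parity bit. This is only a lower-bound calculation under that assumption; it does not model the nibble-correcting codes described above, which belong to a different code family. The function name `secded_check_bits` is hypothetical.

```python
def secded_check_bits(data_bits: int) -> int:
    """Minimum check bits for an extended Hamming (SECDED) code over data_bits."""
    r = 0
    # Hamming condition: r check bits must distinguish "no error" plus
    # every single-bit error position among data_bits + r stored bits.
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r + 1  # one extra overall parity bit for double-error detection


min_64 = secded_check_bits(64)
min_128 = secded_check_bits(128)

# Storage overhead of the two configurations quoted in the text.
overhead_8_of_64 = 8 / 64
overhead_16_of_128 = 16 / 128
```

Under this calculation, 8 check bits is exactly the SECDED minimum for 64 data bits, while 16 check bits exceeds the SECDED minimum for 128 data bits; both configurations carry the same 12.5% storage overhead. In either case the number of distinct syndromes is fixed by the check-bit count, which is why arbitrary scattered multi-bit errors cannot all be covered.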
Mirroring data refers to maintaining two copies of every datum in the main memory store. Mirroring every bit of data cuts the effective capacity of a given memory subsystem implementation in half. Known solutions available today also require that the available bandwidth of the memory subsystem be cut in half to provide the mirroring capability.
FIG. 1 provides an example of a system using memory mirroring. In FIG. 1, a system 10 includes a memory controller 12 that is coupled to a primary channel 16 and a mirror channel 18. Memory modules M1, M3, M5, and M7 are coupled to primary channel 16, and memory modules M2, M4, M6, and M8 are coupled to mirror channel 18. Primary data sections DA1, DB1, DA2, and DB2 are provided to memory chips in modules M1, M3, M5, and M7, and redundant data sections DA1′, DB1′, DA2′, and DB2′ are provided to memory chips in modules M2, M4, M6, and M8. Note that primary data sections DA1, DB1, DA2, and DB2 are identical or essentially identical to redundant data sections DA1′, DB1′, DA2′, and DB2′.
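The mirroring arrangement of FIG. 1 can be modeled with a minimal sketch. The class `MirroredMemory`, its methods, and the `failed_primary` set (standing in for whatever mechanism reports an uncorrectable error on the primary channel) are all hypothetical and are not taken from any actual controller; the 1 GiB module size is likewise an assumed value.

```python
class MirroredMemory:
    """Toy model of a primary channel and a mirror channel (hypothetical)."""

    def __init__(self, modules_per_channel: int, module_bytes: int):
        self.primary = {}            # contents behind the primary channel
        self.mirror = {}             # contents behind the mirror channel
        self.failed_primary = set()  # addresses flagged as uncorrectable (assumption)
        self.raw_bytes = 2 * modules_per_channel * module_bytes
        # Every datum is stored twice, so only half the raw capacity is usable.
        self.usable_bytes = self.raw_bytes // 2

    def write(self, addr: int, data: bytes) -> None:
        # Each write is sent to both channels so the two copies stay identical.
        self.primary[addr] = data
        self.mirror[addr] = data

    def read(self, addr: int) -> bytes:
        # Fall back to the redundant copy when the primary copy is flagged bad.
        if addr in self.failed_primary:
            return self.mirror[addr]
        return self.primary[addr]


# Four modules per channel as in FIG. 1; module size is an assumed 1 GiB.
mem = MirroredMemory(modules_per_channel=4, module_bytes=1 << 30)
mem.write(0x40, b"DA1")
mem.primary[0x40] = b"corrupt"   # simulate a failure on the primary channel
mem.failed_primary.add(0x40)
recovered = mem.read(0x40)
```

In this model, eight modules of raw storage yield only four modules of usable capacity, which reflects the halving of effective capacity described above.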
In another memory system, a memory controller is coupled to a first repeater hub through the primary channel and a second repeater hub through the mirror channel. Two subchannels are coupled to each repeater hub. Memory modules are coupled to the subchannels. Primary data is stored in the memory modules of the subchannels coupled to the first repeater hub, and redundant data is stored in the memory modules of the subchannels coupled to the second repeater hub.
Memory systems with more than two channels have been proposed.