1. Field of the Invention
The present invention generally relates to computer systems. More particularly, the present invention relates to memory systems.
2. Related Art
To improve reliability, availability, and serviceability, a variety of techniques have evolved to facilitate hot swapping memory in computer systems such as personal computers and servers. Hot swapping allows a memory defect (or failing memory) to be healed (or replaced) without taking the computer system down. Moreover, substantial error correction capability has been integrated into servers, allowing them to run with a faulty memory module without crashing.
Traditionally, hot swapping memory has been accomplished by mirroring. That is, a second copy of the memory content is provided in the main memory system. For every memory bank in the main memory system, there exists a mirror memory bank having the same content. Every write operation to the main memory writes two copies: one copy to the memory bank and one copy to the mirror memory bank. Each read comes from a single copy of the main memory system.
Many implementations read just one copy at a time. If the copy being read has an error that the error correction code (ECC) scheme in use cannot correct, the computer system will report an uncorrectable error and crash, even though the unread copy probably holds a correct version of the data. Reading only one copy is an implementation optimization. The number of ECC corrections can be used as a trigger to switch which copy of the main memory is being read at any particular time.
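By way of illustration only, the mirrored write, the single-copy read, and the ECC-count trigger described above might be sketched as follows. The class name `MirroredMemory`, the `ecc_threshold` parameter, and the `corrected_error` flag (which stands in for the ECC logic reporting a corrected error) are hypothetical and are not taken from any particular implementation.

```python
class MirroredMemory:
    """Toy model of a mirrored main memory: two banks with identical content."""

    def __init__(self, size, ecc_threshold=3):
        self.banks = [[0] * size, [0] * size]  # bank 0 and its mirror, bank 1
        self.active = 0                        # index of the copy that serves reads
        self.ecc_corrections = [0, 0]          # corrected-error count per copy
        self.ecc_threshold = ecc_threshold     # hypothetical switch trigger

    def write(self, addr, value):
        # Every write updates both copies so that they stay identical.
        for bank in self.banks:
            bank[addr] = value

    def read(self, addr, corrected_error=False):
        # Each read is served from a single copy (the active one).
        # corrected_error simulates the ECC logic reporting a corrected error.
        value = self.banks[self.active][addr]
        if corrected_error:
            self.ecc_corrections[self.active] += 1
            if self.ecc_corrections[self.active] >= self.ecc_threshold:
                # Too many corrections on this copy: read from the other one.
                self.active = 1 - self.active
        return value
```

In this sketch the switch happens only after the threshold is reached, so an uncorrectable error on the active copy before that point would still go unrecovered, which is the weakness noted above.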
A hot swapping operation is accomplished by suspending all accesses to a memory bank (mirror or non-mirror), and then turning that memory bank off. Certain maintenance operations are done in order to make sure that the memory bank and the mirror memory bank are consistent, especially around hot swap operations. The mirroring approach is strongly analogous to RAID 1 (redundant array of independent disks). It is easy to implement, but quite expensive since two full copies of the contents of the main memory are needed.
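As a minimal sketch of the hot swap sequence just described, the following toy model takes one bank of a mirrored pair offline, lets the surviving bank serve all accesses, and then restores consistency before the replaced bank rejoins. The names `MirroredPair`, `take_offline`, and `bring_online` are hypothetical, and resynchronizing by copying the surviving bank is an assumption about what the maintenance operations might do, not a description of a specific system.

```python
class MirroredPair:
    """Toy model of a memory bank and its mirror during a hot swap."""

    def __init__(self, size):
        self.banks = [[0] * size, [0] * size]
        self.online = [True, True]

    def write(self, addr, value):
        # Writes go to every bank that is currently online.
        for i, bank in enumerate(self.banks):
            if self.online[i]:
                bank[addr] = value

    def read(self, addr):
        # Reads are served by the first bank that is online.
        for i, bank in enumerate(self.banks):
            if self.online[i]:
                return bank[addr]
        raise RuntimeError("no memory bank online")

    def take_offline(self, index):
        # Suspend all accesses to the bank, then turn it off.
        self.online[index] = False
        # The replacement module is simulated as blank memory.
        self.banks[index] = [0] * len(self.banks[index])

    def bring_online(self, index):
        # Assumed maintenance step: copy the surviving bank into the
        # replaced bank so both copies are consistent before it rejoins.
        self.banks[index] = list(self.banks[1 - index])
        self.online[index] = True
```

A real memory controller would have to keep servicing writes while the copy is in progress; the sketch sidesteps this by treating the resynchronization as a single step.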
Another approach to hot swapping memory is based on RAID 3. In this approach, the main memory system has one copy plus some extra information to help recover if a small portion of the main memory fails. Typically, this is accomplished by dividing the main memory system into several memory banks, striping the data across the memory banks, and adding one extra memory bank that stores the parity (or some other function) of the data stored in the other memory banks. In this way, if the failing memory bank is known, its contents can be reconstructed from the remaining memory banks and the extra memory bank storing the parity information. This has the advantage that less memory capacity is needed than in the mirroring approach, but at the cost of a more complex algorithm (e.g., to calculate parity) for managing the main memory system.
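The parity-based recovery described above can be illustrated with bitwise XOR, which is one common choice of parity function. The function names `parity`, `build_stripe`, and `reconstruct` are hypothetical, and each memory bank is reduced to a single integer word for brevity.

```python
from functools import reduce


def parity(words):
    # XOR of all words; XOR-ing in any missing word's peers recovers it.
    return reduce(lambda a, b: a ^ b, words, 0)


def build_stripe(data_words):
    # One word per data bank, plus one extra bank holding the parity.
    return list(data_words) + [parity(data_words)]


def reconstruct(stripe, failed_index):
    # The failing bank must be known. Its content is the XOR of every
    # surviving bank, including the parity bank.
    survivors = [w for i, w in enumerate(stripe) if i != failed_index]
    return parity(survivors)


stripe = build_stripe([0x12, 0x34, 0x56])  # three data banks, one parity bank
recovered = reconstruct(stripe, 1)         # recovers the content of bank 1
```

With N data banks, this arrangement needs one extra bank (an overhead of 1/N), whereas mirroring needs N extra banks.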