In a running computer system, memory reliability plays a significant role. On the one hand, as the quantity of memory configured in a system increases, the failure rate of the memory system increases exponentially; on the other hand, the introduction of low-voltage operating modes increases both the likelihood that a memory error occurs and the number of errors.
At present, error checking and correcting (ECC) memory is a widely adopted memory reliability solution. The basic idea of ECC memory, that is, a memory module carrying an ECC check code, is to protect data in units of the memory module bit width. Take a 64-bit memory module bit width as an example: each time 64 bits of data are written, 8 check bits are calculated for the data and stored in an independent ECC chip, and the 64 data bits and the 8 check bits together form a 72-bit ECC word. With this encoding, any 1-bit error in the 72-bit ECC word can be corrected. However, a 2-bit error can only be detected, not corrected, let alone an error affecting more bits.
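The 64 + 8 scheme described above can be illustrated with a minimal sketch. It assumes an extended Hamming code (seven Hamming check bits plus one overall parity bit), which is one common way to obtain single-error correction and double-error detection; the text does not state which code a given ECC memory uses, and the names `encode`, `decode`, `DATA_POS`, and `PARITY_POS` are hypothetical.

```python
from typing import List, Optional, Tuple

CODE_BITS = 72                         # 64 data bits + 8 check bits
PARITY_POS = [1, 2, 4, 8, 16, 32, 64]  # Hamming check-bit positions (assumed layout)
# Position 0 holds the overall parity bit; the remaining 64 positions hold data.
DATA_POS = [p for p in range(1, CODE_BITS) if p not in PARITY_POS]


def _syndrome(word: List[int]) -> int:
    """XOR of the positions of all set bits in positions 1..71."""
    s = 0
    for pos in range(1, CODE_BITS):
        if word[pos]:
            s ^= pos
    return s


def encode(data: int) -> List[int]:
    """Build a 72-bit ECC word (list of 0/1) from a 64-bit integer."""
    word = [0] * CODE_BITS
    for i, pos in enumerate(DATA_POS):
        word[pos] = (data >> i) & 1
    s = _syndrome(word)
    for p in PARITY_POS:               # set check bits so the syndrome becomes 0
        if s & p:
            word[p] = 1
    word[0] = sum(word) & 1            # overall parity: total number of 1s is even
    return word


def decode(word: List[int]) -> Tuple[str, Optional[int]]:
    """Return (status, data); data is None when the error is uncorrectable."""
    word = list(word)
    s = _syndrome(word)
    odd = sum(word) & 1
    if s == 0 and odd == 0:
        status = "ok"
    elif odd == 1 and s < CODE_BITS:
        word[s] ^= 1                   # single-bit error at position s (0 = parity bit)
        status = "corrected"
    else:
        # Even number of flips with a non-zero syndrome: detected, not correctable.
        # Three or more flips may go undetected or be miscorrected.
        return "uncorrectable", None
    data = sum(word[pos] << i for i, pos in enumerate(DATA_POS))
    return status, data
```

Flipping any one element of an encoded word and calling `decode` yields the status `"corrected"` together with the original data, whereas flipping two elements yields `"uncorrectable"`.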
IBM proposed the Chipkill memory technology on the basis of ECC memory. Chipkill is designed around the observation that, because of the cumulative effect of memory errors, errors tend to occur in the same dynamic random access memory (DRAM) chip; the Chipkill technology can therefore tolerate the failure of any one DRAM chip. The memory controller (MC) of a Chipkill memory needs to control four dual inline memory modules (DIMMs) with ECC simultaneously so that they work cooperatively. The bit width of the MC is formed by four 72-bit ECC words, and a 1-bit error can be detected and corrected in each ECC word. The bit width of each DRAM chip in each DIMM needs to be 4 bits, and by careful design the four input/output bits of a given DRAM chip are mapped to four different ECC words. With this design, even if the data on all four pins of a DRAM chip is erroneous, each of the four ECC words can recover its own bit; that is, the Chipkill technology can tolerate damage to any one DRAM chip in any DIMM. The Chipkill technology thus achieves relatively high reliability by means of a wider MC bit width and data encoding at a coarser granularity. However, in principle the technology can be applied only to DRAM chips with a 4-bit width and is therefore inflexible. In addition, because data is encoded at an extremely coarse granularity, the data read from the DIMMs on each access is much larger than the data actually requested by the memory access request, which causes a great deal of unnecessary power consumption.
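The pin-to-word mapping can be sketched as follows. This is only an illustration under stated assumptions: 18 four-bit chips per 72-bit DIMM (72 / 4), and a hypothetical mapping in which pin k of every chip feeds ECC word k. The actual Chipkill mapping is not given in the text, and `ecc_slot`, `naive_slot`, and `errors_per_word` are illustrative names.

```python
NUM_DIMMS = 4
CHIPS_PER_DIMM = 18   # assumption: 72-bit DIMM built from 4-bit-wide chips
PINS_PER_CHIP = 4
NUM_WORDS = 4         # the MC width is four 72-bit ECC words (288 bits)
WORD_BITS = 72


def ecc_slot(dimm, chip, pin):
    """Interleaved mapping: pin k of every chip goes to ECC word k."""
    return pin, dimm * CHIPS_PER_DIMM + chip   # (ecc_word, bit_index)


def naive_slot(dimm, chip, pin):
    """Contrast case: all four pins of a chip land in the same ECC word."""
    flat = (dimm * CHIPS_PER_DIMM + chip) * PINS_PER_CHIP + pin
    return flat // WORD_BITS, flat % WORD_BITS


def errors_per_word(failed_dimm, failed_chip, mapping=ecc_slot):
    """Count erroneous bits per ECC word when one whole chip fails."""
    counts = [0] * NUM_WORDS
    for pin in range(PINS_PER_CHIP):
        word, _ = mapping(failed_dimm, failed_chip, pin)
        counts[word] += 1
    return counts
```

With the interleaved mapping, a whole-chip failure contributes exactly one erroneous bit to each ECC word, which is within the single-bit correction capability of each word. With the naive mapping, all four erroneous bits fall into one word and exceed it.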