A memory module, such as a dual in-line memory module (DIMM), may include multiple memory integrated circuits or chips, such as dynamic random-access memory (DRAM) chips. A memory module that does not include error checking and correcting capability (referred to as non-ECC memory) typically includes 8 chips per side for storing data. A memory module may be designed with one or more independent sets of memory chips connected to the same address and data buses. Each such set of memory chips is referred to as a rank. Because all ranks of a memory module share the same buses, only one rank may be accessed at a time.
A memory module may include ×4 (by 4) memory chips or ×8 (by 8) memory chips. “×4” and “×8” refer to the data width of the memory chips in bits. For a ×4 memory module, each memory chip has a data width of 4 bits, which provides 4 data bits on each access. A ×4 memory module with 8 chips per side has a data width of 32 bits per side. A memory controller that accesses 64 bits of data at a time needs to access both sides of an 8-chip ×4 memory module at the same time to read or write the data. A two-sided memory module where both sides are accessed at the same time is a single-ranked memory module.
For a ×8 memory module, each memory chip has a data width of 8 bits, which provides 8 data bits on each access. A ×8 memory module with 8 chips per side has a data width of 64 bits per side. A memory controller that accesses 64 bits of data at a time accesses one side of an 8-chip ×8 memory module at a time to read or write the data. A two-sided memory module where one side is accessed at a time is a dual-ranked memory module.
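The relationship between chip width, per-side data width, and rank count described above can be sketched as follows. This is a minimal illustration, assuming a two-sided module with 8 data chips per side and a memory controller that accesses 64 bits at a time; the function names are hypothetical and are not part of this disclosure.

```python
def side_width_bits(chip_width: int, chips_per_side: int = 8) -> int:
    """Data width, in bits, that one side of a module provides per access."""
    return chip_width * chips_per_side


def ranks_per_module(chip_width: int, chips_per_side: int = 8,
                     sides: int = 2, controller_width: int = 64) -> int:
    """Number of ranks on a module for a given controller access width."""
    total_width = side_width_bits(chip_width, chips_per_side) * sides
    if total_width % controller_width != 0:
        raise ValueError("module width is not a multiple of the controller width")
    return total_width // controller_width


print(ranks_per_module(4))  # x4 chips: 32 bits per side, both sides form 1 rank
print(ranks_per_module(8))  # x8 chips: 64 bits per side, each side is 1 rank
```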
A memory module may include ECC capability to detect and correct bit errors in data stored in the memory module. A memory module that includes ECC capability (referred to as an ECC memory) encodes data by generating ECC bits, e.g., redundancy bits or parity bits, that are stored along with the data in the memory module. A conventional ECC memory may use a Single Error Correct, Double Error Detect (SEC-DED) algorithm to detect and correct single-bit errors and detect, but not correct, double-bit errors in a data word. A data word as used in this disclosure is the largest unit of data, not including the ECC bits, that can be transferred to and from a memory module in a single operation. Enhanced Hamming Code, which generates 7 Hamming check bits and 1 parity bit, may be used to provide SEC-DED protection for each data word.
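As an illustration of SEC-DED protection, the following sketch encodes a 64-bit data word with an extended Hamming code (7 Hamming check bits plus 1 overall parity bit) and decodes it again. The bit layout (overall parity at index 0, check bits at power-of-two positions) and all function names are assumptions made for this example; an actual ECC memory may arrange the bits differently.

```python
from typing import List, Optional, Tuple


def hamming_check_bits(data_bits: int) -> int:
    """Smallest r such that 2**r >= data_bits + r + 1."""
    r = 0
    while (1 << r) < data_bits + r + 1:
        r += 1
    return r


def is_data_position(pos: int) -> bool:
    """Positions that are not powers of two carry data bits."""
    return pos & (pos - 1) != 0


def encode(data: int, data_bits: int = 64) -> List[int]:
    """Return the encoded word as a list of bits.

    Index 0 holds the overall parity bit; indexes 1..n hold the Hamming
    codeword, with check bits at the power-of-two positions.
    """
    n = data_bits + hamming_check_bits(data_bits)
    code = [0] * (n + 1)
    data_positions = [p for p in range(1, n + 1) if is_data_position(p)]
    for i, pos in enumerate(data_positions):
        code[pos] = (data >> i) & 1
    # XOR of the positions of all set data bits; bit k of this value is
    # the check bit stored at position 2**k.
    checks = 0
    for pos in data_positions:
        if code[pos]:
            checks ^= pos
    p = 1
    while p <= n:
        code[p] = 1 if checks & p else 0
        p <<= 1
    code[0] = sum(code) % 2  # make the total number of 1 bits even
    return code


def decode(code: List[int]) -> Tuple[Optional[int], str]:
    """Return (data, status); status is 'ok', 'corrected', or 'uncorrectable'."""
    bits = list(code)
    n = len(bits) - 1
    syndrome = 0
    for pos in range(1, n + 1):
        if bits[pos]:
            syndrome ^= pos
    parity_ok = sum(bits) % 2 == 0
    if syndrome == 0 and parity_ok:
        status = "ok"
    elif not parity_ok and syndrome <= n:
        bits[syndrome] ^= 1  # syndrome 0 means the overall parity bit itself
        status = "corrected"
    else:
        return None, "uncorrectable"  # e.g. a double-bit error
    data = 0
    data_positions = [p for p in range(1, n + 1) if is_data_position(p)]
    for i, pos in enumerate(data_positions):
        data |= bits[pos] << i
    return data, status


word = 0x0123456789ABCDEF
encoded = encode(word)
print(len(encoded))  # 72: 64 data bits + 7 check bits + 1 parity bit
```

Flipping any one bit of `encoded` before calling `decode` yields the original word with status `corrected`, while flipping two bits yields status `uncorrectable`.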
An ECC memory typically includes 9 chips per side for storing data and ECC bits that can be used to detect and correct errors in the data. An ECC memory can also include an interface that can provide simultaneous access of a data word and its corresponding ECC bits. A data word and its corresponding ECC bits are referred to as an encoded word. For example, an ECC memory that can provide 8 ECC bits for each 32-bit data word may include a 40-bit wide interface to access a 40-bit encoded word. Similarly, an ECC memory that can provide 8 ECC bits for each 64-bit data word may include a 72-bit wide interface to access a 72-bit encoded word.
Because conventional ECC algorithms, such as SEC-DED, correct only single-bit errors in a data word, an unrecoverable data loss can occur when multiple bits in a data word are in error or when a memory chip on a memory module fails. In either situation, the number of bits in error is greater than the number of bits that a conventional ECC memory can correct. For example, when a ×8 memory chip fails, a conventional ECC memory cannot recover the 8 bits of data that are lost because a conventional ECC memory only protects against single-bit failures.
To prevent unrecoverable data loss, ECC memories have been designed to protect data from single memory chip failures and multi-bit errors from a single memory chip (referred to as advanced ECC). In such an advanced ECC memory, the bits of an encoded word are distributed across multiple memory chips such that failure of or multi-bit errors from a single memory chip will affect only one bit in an encoded word. Each bit that is stored on a memory chip corresponds to a different encoded word. For example, if an advanced ECC memory included ×4 memory chips, each of the 4 bits provided by the memory chip during a read operation would correspond to a different encoded word. Thus, even in the case of an entire memory chip failure, an encoded word will have no more than one bit of bad data, which can be corrected using conventional ECC algorithms such as SEC-DED.
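The distribution described above can be sketched with a toy model. The example assumes 72 ×4 memory chips, each storing one bit from each of 4 different 72-bit encoded words, and models a chip failure as every bit on that chip being inverted (a worst case chosen for illustration). The function names and the layout are hypothetical.

```python
from typing import List


def distribute(words: List[int], word_bits: int = 72) -> List[List[int]]:
    """Spread encoded words across chips: chips[c][k] is bit c of words[k]."""
    return [[(w >> c) & 1 for w in words] for c in range(word_bits)]


def gather(chips: List[List[int]]) -> List[int]:
    """Rebuild the encoded words from the per-chip bits."""
    words = [0] * len(chips[0])
    for c, chip in enumerate(chips):
        for k, bit in enumerate(chip):
            words[k] |= bit << c
    return words


def fail_chip(chips: List[List[int]], index: int) -> List[List[int]]:
    """Return a copy in which every bit stored on one chip is inverted."""
    damaged = [list(chip) for chip in chips]
    damaged[index] = [bit ^ 1 for bit in damaged[index]]
    return damaged


# Four arbitrary 72-bit encoded words, one per data bit of a x4 chip.
words = [0x0123456789ABCDEF01, 0xFFFFFFFFFFFFFFFFFF, 0x0, 0xA5A5A5A5A5A5A5A5A5]
chips = distribute(words)
recovered = gather(fail_chip(chips, 10))

# Each encoded word differs from its original in exactly one bit, which a
# SEC-DED code can correct.
for original, damaged in zip(words, recovered):
    print(bin(original ^ damaged).count("1"))
```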
As an example, a 72-bit encoded word may be protected from unrecoverable data loss using an advanced ECC memory that includes 4 memory modules each having 18 chips. The encoded word may be divided into four 18-bit segments. Each 18-bit segment may be stored in a separate memory module. Each bit of an 18-bit segment may be stored in a separate memory chip.
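One possible placement of the 72 bits is sketched below. The ordering (consecutive bits fill the chips of one memory module before moving to the next) is an assumption for illustration only, and `locate_bit` is a hypothetical helper.

```python
from typing import Tuple


def locate_bit(bit_index: int, modules: int = 4,
               chips_per_module: int = 18) -> Tuple[int, int]:
    """Map a bit of the encoded word to a (module, chip) pair."""
    if not 0 <= bit_index < modules * chips_per_module:
        raise ValueError("bit index outside the encoded word")
    return divmod(bit_index, chips_per_module)


print(locate_bit(0))   # (0, 0): first chip of the first module
print(locate_bit(71))  # (3, 17): last chip of the fourth module
```

Because every bit index maps to a distinct (module, chip) pair, no memory chip holds more than one bit of the same encoded word.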
If the advanced ECC memory includes four single-ranked ×4 memory modules, 4 ranks may need to be accessed to transfer a 72-bit encoded word to or from the advanced ECC memory. If the advanced ECC memory includes four dual-ranked ×8 memory modules, 8 ranks may need to be accessed to transfer a 72-bit encoded word to or from the advanced ECC memory. Accessing multiple ranks to transfer an encoded word may impact the performance of an advanced ECC memory by increasing access latency due to a time needed to switch shared buses between the multiple ranks and a time needed to transfer data to or from one rank at a time.
Data that is to be written to an advanced ECC memory may need to be interleaved in order to write each bit of an encoded word to a different memory chip. Data that is read from an advanced ECC memory may need to be de-interleaved to reconstruct the desired encoded word that was stored on multiple memory chips. The interleaving of data that is to be written and de-interleaving of data that is read may also impact the performance of an advanced ECC memory by further increasing access latency corresponding to a time needed to perform the interleaving or de-interleaving of the data.
To transfer a 72-bit encoded word to or from an advanced ECC memory that includes four single-ranked ×4 memory modules, 288 bits (72 bits×4 modules×1 rank per module) of data may need to be transferred to or from the advanced ECC memory. To transfer a 72-bit encoded word to or from an advanced ECC memory that includes four dual-ranked ×8 memory modules, 576 bits (72 bits×4 modules×2 ranks per module) of data may need to be transferred to or from the advanced ECC memory. Generally, to read an encoded word from an advanced ECC memory, large blocks of data that include data other than a desired encoded word may also need to be transferred along with the desired encoded word. To write data to an advanced ECC memory, large blocks of data that include multiple encoded words may need to be generated and buffered before the writing of the data can be performed.
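The transfer sizes given above follow from a simple product, sketched here with a hypothetical helper function:

```python
def bits_transferred(encoded_word_bits: int = 72, modules: int = 4,
                     ranks_per_module: int = 1) -> int:
    """Bits moved to or from the memory to access one encoded word."""
    return encoded_word_bits * modules * ranks_per_module


print(bits_transferred(ranks_per_module=1))  # 288: four single-ranked x4 modules
print(bits_transferred(ranks_per_module=2))  # 576: four dual-ranked x8 modules
```

In the first case the block transferred is 4 times the size of the desired encoded word; in the second case it is 8 times the size.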
Because large blocks of data that include multiple encoded words are transferred to or from an advanced ECC memory each time a single encoded word is requested or modified, the advanced ECC memory may need to perform read-modify-write (RMW) to modify a single encoded word. For RMW, the advanced ECC memory first reads the large block of data that includes the encoded word to be modified. The advanced ECC memory then replaces the bits in the large block of data that belong to that encoded word with the bits of the modified encoded word. The large block of data with the modified encoded word is then written back to the advanced ECC memory. RMW may impact the performance of an advanced ECC memory by decreasing data throughput and increasing access latency. To write a single encoded word to an advanced ECC memory, the advanced ECC memory needs to perform both a read operation and a write operation. This may decrease the rate at which data can be written to the advanced ECC memory and increase the amount of time needed to write data to the advanced ECC memory.
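The RMW sequence can be sketched with a toy model of a memory that transfers only whole blocks. The class name, the block size of 4 encoded words, and the operation counters are assumptions made for illustration.

```python
from typing import List


class BlockMemory:
    """Toy memory that can only read or write whole blocks of encoded words."""

    def __init__(self, num_blocks: int, words_per_block: int = 4) -> None:
        self.words_per_block = words_per_block
        self.blocks = [[0] * words_per_block for _ in range(num_blocks)]
        self.reads = 0
        self.writes = 0

    def read_block(self, block: int) -> List[int]:
        self.reads += 1
        return list(self.blocks[block])

    def write_block(self, block: int, words: List[int]) -> None:
        self.writes += 1
        self.blocks[block] = list(words)

    def write_word(self, address: int, encoded_word: int) -> None:
        """Modify one encoded word using read-modify-write."""
        block, offset = divmod(address, self.words_per_block)
        words = self.read_block(block)   # read the whole block
        words[offset] = encoded_word     # modify only the target word
        self.write_block(block, words)   # write the whole block back


memory = BlockMemory(num_blocks=2)
memory.write_word(5, 0xABC)
print(memory.reads, memory.writes)  # 1 1: one block read and one block write
```

Writing a single encoded word costs one block read and one block write in this model, which is the source of the throughput and latency penalty described above.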