Double data rate synchronous dynamic random-access memory (DDR SDRAM) is a type of memory integrated circuit (IC) used in computers. DDR SDRAM achieves faster transfer rates through strict timing control of the electrical data and clock signals, and transfers data on both the rising edge and the falling edge of the clock signal, thereby nearly doubling data bus bandwidth when compared to a single data rate SDRAM interface using the same clock frequency.
Different generations of DRAM are able to use error-correcting code (ECC) memory during data storage to both detect and sometimes correct common types of data corruption. ECC memory tolerates single-bit errors by storing redundant check bits, which extend the idea of parity checking. In DRAM systems, parity checking is accomplished by storing a redundant parity bit representing the parity (odd or even) of data (e.g., one byte of data) stored in memory (e.g., stored in a parity device, or in an ECC chip, of the DRAM module), by independently computing the parity when the data is read, and by comparing the stored parity to the computed parity to detect whether a data error/memory error has occurred.
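The store, recompute, and compare steps of parity checking can be sketched as follows. This is a minimal illustration only, assuming even parity over one byte; the function names (`parity_bit`, `write_byte`, `has_error`) are hypothetical and do not correspond to any particular DRAM device or controller.

```python
def parity_bit(byte):
    """Even-parity bit for an 8-bit value: 1 if the number of 1 bits is odd."""
    return bin(byte & 0xFF).count("1") % 2


def write_byte(byte):
    """Model a write: return the data byte and the parity bit stored alongside it."""
    return byte & 0xFF, parity_bit(byte)


def has_error(stored_byte, stored_parity):
    """Model a read: recompute parity and compare it with the stored parity bit."""
    return parity_bit(stored_byte) != stored_parity


data, parity = write_byte(0b10110010)

# An uncorrupted read passes the check.
print(has_error(data, parity))               # False

# A single flipped bit changes the parity and is detected.
print(has_error(data ^ 0b00001000, parity))  # True

# Two flipped bits cancel out and escape detection.
print(has_error(data ^ 0b00001001, parity))  # False
```

As the last case shows, a single parity bit detects only an odd number of flipped bits and cannot indicate which bit is wrong, which is why correction requires additional check bits.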
Accordingly, to ensure that data retrieved from the DRAM module (e.g., a dual in-line memory module (DIMM)), which may correspond to a data word or data symbol, is the same as the data written to the DRAM module, ECC can correct errors that arise when one or more bits of the data are flipped to the wrong state.
That is, by using ECC redundancy, the ECC chip is capable of single-error correction, double-error detection (SEC-DED), meaning that the ECC chip is able to detect the existence of two errors occurring in a single burst, and is also able to correct a single erroneous bit when occurring in isolation. Furthermore, if one data chip is corrupted or lost, the data of the corrupted or missing data chip can be reconstructed by using data of the remaining data chips and ECC data of the ECC chip. Accordingly, in standard ECC, SEC-DED enables correction of single-bit errors and detection of double-bit errors in a 64-bit datapath. However, conventional SEC-DED uses an extra chip to store ECC bits, and performs error detection and error correction in the memory controller by using Hamming codes.
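The SEC-DED behavior can be illustrated with a small extended Hamming code. The sketch below is an assumption made for brevity: it protects four data bits with four check bits (an extended Hamming (8,4) code) rather than the 64-bit datapath discussed above, and the names `encode` and `decode` are hypothetical. The principle is the same at larger widths: a syndrome locates a single flipped bit, and an overall parity bit distinguishes single errors from double errors.

```python
def encode(nibble):
    """Encode 4 data bits into an 8-bit extended Hamming codeword (list of bits).

    Index 0 holds the overall parity bit; indices 1..7 hold a Hamming(7,4)
    codeword with check bits at positions 1, 2, 4 and data at 3, 5, 6, 7.
    """
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) % 2  # makes the total number of 1 bits even
    return code


def decode(code):
    """Return (nibble, status); nibble is None when a double error is detected."""
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos          # XOR of the positions of all set bits
    overall_odd = sum(code) % 2      # 1 if overall parity is violated
    fixed = list(code)
    if syndrome == 0 and not overall_odd:
        status = "ok"
    elif overall_odd:
        # Odd number of flips: treat as a single error at position `syndrome`
        # (position 0 means the overall parity bit itself was flipped).
        fixed[syndrome] ^= 1
        status = "corrected"
    else:
        # Nonzero syndrome with even overall parity: two bits were flipped.
        return None, "double-error"
    nibble = fixed[3] | (fixed[5] << 1) | (fixed[6] << 2) | (fixed[7] << 3)
    return nibble, status


word = encode(0b1011)
print(decode(word))                  # (11, 'ok')

single = list(word)
single[6] ^= 1                       # one flipped bit
print(decode(single))                # (11, 'corrected')

double = list(word)
double[2] ^= 1
double[5] ^= 1                       # two flipped bits
print(decode(double))                # (None, 'double-error')
```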
Additionally, DRAM systems may have chipkill mechanisms (e.g., single chipkill and double chipkill) for erasing, or disabling, nonfunctional data chips. Various chipkill mechanisms for DDR4 use two or more ECC devices/chips per memory channel to detect, locate, and erase nonfunctional chips. Accordingly, in standard ECC, chipkill mechanisms are able to correct entire chip failures (e.g., failures of 4-bit chips). However, conventional chipkill mechanisms use SSCDCD for single chip failure, or double chip sparing for double chip failure, with older models using a (128/144) scheme and relatively newer modules using a (64/72) scheme.
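The idea of rebuilding the contents of a failed chip from the surviving chips plus redundant data can be sketched with plain XOR parity across chips. This is a deliberate simplification and an assumption for illustration: practical chipkill schemes use symbol-based codes that also locate the failed chip, whereas the sketch below assumes the failed chip's index is already known. The names `parity_chip` and `reconstruct` are hypothetical.

```python
from functools import reduce


def parity_chip(data_chips):
    """Contents of a redundant chip: XOR of the 4-bit values on all data chips."""
    return reduce(lambda a, b: a ^ b, data_chips, 0)


def reconstruct(data_chips, parity, failed_index):
    """Rebuild the value of the chip at failed_index from the others and parity."""
    survivors = [v for i, v in enumerate(data_chips) if i != failed_index]
    return reduce(lambda a, b: a ^ b, survivors, parity)


# Eight 4-bit data chips, as in one burst of a 32-bit memory channel.
chips = [0x3, 0xA, 0x7, 0x0, 0xF, 0x5, 0xC, 0x1]
parity = parity_chip(chips)

# Suppose chip 2 has failed and its output can no longer be trusted.
print(hex(reconstruct(chips, parity, 2)))  # 0x7
```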
For example, standard DDR4 has a prefetch length of 8n, a burst length of eight (i.e., eight bursts per memory transaction), and a memory channel width of sixty-four bits, where n is a number of bits of an interface width of the data used in the corresponding system architecture (e.g., if the interface width is four bits, then the prefetch length of the corresponding DDR4 system is thirty-two bits). Accordingly, DDR4 will transmit 512 bits (sixty-four bits in each of eight bursts) for each memory transaction.
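The DDR4 figures above follow from two multiplications, sketched here. The helper names `prefetch_bits` and `bits_per_transaction` are hypothetical and are used only to make the arithmetic explicit.

```python
def prefetch_bits(prefetch_n, interface_width_bits):
    """Prefetch length in bits for a device: prefetch multiple times interface width."""
    return prefetch_n * interface_width_bits


def bits_per_transaction(channel_width_bits, burst_length):
    """Bits moved over one memory channel in one memory transaction."""
    return channel_width_bits * burst_length


# DDR4 with 4-bit devices: 8n prefetch, burst length 8, 64-bit channel.
print(prefetch_bits(8, 4))          # 32
print(bits_per_transaction(64, 8))  # 512
```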
To continue increasing DDR interface bandwidth, a new DDR interface may increase the prefetch length. This new DDR interface may have a prefetch length of 16n, which is twice the prefetch length of the current DDR4 interface. The new DDR interface will, therefore, transfer twice the amount of data transferred by the DDR4 system for each memory transaction. This new DDR interface may also have a burst length of sixteen (i.e., sixteen bursts of data in each individual memory transaction), and a memory channel width of thirty-two bits per memory channel, and will therefore also transmit 512 bits per memory channel per memory transaction. However, this DDR interface has two memory channels per DIMM, each DIMM being a module having multiple DRAM chips on a circuit board including chip pins to enable connection to a computer motherboard. The two memory channels of the DDR DIMM effectively work independently of one another.
Each memory channel of the new DDR interface is narrower than the memory channel of DDR4, having a data width of thirty-two bits per memory channel, with eight data devices (e.g., 4-bit data chips) being configured to store and transfer data for each memory channel. This new DDR interface also has an ECC width of four bits per memory channel with one 4-bit ECC chip for each memory channel. Accordingly, to compensate for having half of the memory channel width of DDR4, this new DDR interface has twice the burst length of DDR4. Because this new DDR interface has two memory channels, each memory channel having eight 4-bit data chips dedicated to storing data, there will be a total of sixty-four bits of memory data per burst.
Furthermore, unlike DDR4, which has two ECC chips per memory channel, this new DDR interface may have a single ECC chip per memory channel, or even a single ECC chip per DIMM, to protect the sixteen data chips used for storing data. The new DDR interface can therefore have reduced ECC overhead when compared to DDR4. For example, if a new DDR interface uses one ECC chip per memory channel, for every burst there will be eight bits of ECC data corresponding to the two 4-bit ECC chips, one ECC chip being in each of the two memory channels of the DIMM. Accordingly, such a new DDR interface will transmit 72 bits (sixty-four data bits and eight ECC bits) for every burst.
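The per-channel and per-burst figures for the new DDR interface can be checked with the short calculation below. The constants restate the configuration described above (two memory channels per DIMM, eight 4-bit data chips and one 4-bit ECC chip per memory channel, burst length of sixteen); the constant and function names are hypothetical.

```python
CHANNELS_PER_DIMM = 2
DATA_CHIPS_PER_CHANNEL = 8
ECC_CHIPS_PER_CHANNEL = 1   # assumes the one-ECC-chip-per-channel option
CHIP_WIDTH_BITS = 4
BURST_LENGTH = 16


def channel_data_width():
    """Data width of one memory channel in bits."""
    return DATA_CHIPS_PER_CHANNEL * CHIP_WIDTH_BITS


def channel_bits_per_transaction():
    """Data bits moved over one memory channel in one memory transaction."""
    return channel_data_width() * BURST_LENGTH


def dimm_bits_per_burst():
    """Return (data bits, ECC bits, total bits) per burst across both channels."""
    data = CHANNELS_PER_DIMM * DATA_CHIPS_PER_CHANNEL * CHIP_WIDTH_BITS
    ecc = CHANNELS_PER_DIMM * ECC_CHIPS_PER_CHANNEL * CHIP_WIDTH_BITS
    return data, ecc, data + ecc


print(channel_data_width())            # 32
print(channel_bits_per_transaction())  # 512
print(dimm_bits_per_burst())           # (64, 8, 72)
```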
Modern servers require robust error correction and error detection to provide strong reliability, availability, and serviceability (RAS) features. However, this comes with the overhead of additional device and controller complexity. Accordingly, it may be difficult to maintain DDR RAS using current DDR4 techniques, as system ECC overhead increases with the corresponding decrease in data width. Furthermore, chipkill techniques require additional ECC overhead due to the increased number of memory channels per DIMM. Moreover, as DRAM systems scale, even more robust reliability methods are necessary to guarantee end-to-end data integrity.
Accordingly, it may be useful to provide novel methods of error correction and data recovery, and to provide a DRAM DIMM that is able to correct some types of memory errors internally without assistance from a memory controller, and that is able to direct the memory controller to assist in memory correction for other types of errors that the DRAM is unable to correct internally.
The above information disclosed in this Background section is only to enhance the understanding of the background of the invention, and therefore it may contain information that does not constitute prior art.