Embodiments of the present invention relate to memory errors and, more specifically, to reducing uncorrectable errors based on a history of correctable errors.
During memory reads and writes, dynamic random access memories (DRAMs) experience occasional errors. These errors can be transient or permanent, also referred to as soft errors and hard errors, respectively. Either kind of error can be correctable or uncorrectable. In the case of a correctable error, the data read out of memory is restored to its correct value and is usable by the system, whereas in the case of an uncorrectable error, the data cannot be restored to its correct value and is unusable by the system. To manage errors and error correction, memory systems using DRAMs can include error correction code (ECC) circuitry, memory mirroring, redundant array of independent memory (RAIM) ECC, scrubbing, marking, sparing, and retries.
In the case of correctable errors, error information can be accumulated and used to make decisions to mark DRAM chips or memory channels, thus taking them offline to avoid future errors. This error information is often collected over time during memory scrub operations. Additionally, the error information is used to periodically update tables in hardware that control the marks on DRAM chips and memory channels, to avoid future error events involving faulty DRAM chips or memory channels.
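By way of illustration only, the accumulation of correctable-error information during scrubbing and the periodic update of marks might be sketched as follows. The class name `ScrubErrorTracker`, the per-chip counter, and the threshold of three errors are hypothetical and chosen for the example; actual marking policies and hardware table formats may differ.

```python
from collections import Counter

# Hypothetical threshold: correctable errors tolerated before a chip is marked.
MARK_THRESHOLD = 3


class ScrubErrorTracker:
    """Accumulates correctable errors per (channel, chip) and decides marks."""

    def __init__(self, threshold=MARK_THRESHOLD):
        self.threshold = threshold
        self.ce_counts = Counter()  # (channel, chip) -> correctable error count
        self.marked = set()         # stands in for the hardware mark table

    def record_correctable_error(self, channel, chip):
        # Called whenever a scrub pass corrects an error on this chip.
        self.ce_counts[(channel, chip)] += 1

    def update_marks(self):
        # Called periodically, for example at the end of a scrub pass.
        # Returns only the chips that were newly marked by this call.
        newly_marked = {
            key
            for key, count in self.ce_counts.items()
            if count >= self.threshold and key not in self.marked
        }
        self.marked |= newly_marked
        return newly_marked
```

In this sketch a mark is applied only when `update_marks` runs, so errors that arrive between two updates are counted but not yet acted upon.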
There may be circumstances, however, when either a DRAM chip or a memory channel experiences a burst of errors before a memory scrub discovers the errors. Depending on the limitations of the ECC, these bursts of errors may result in uncorrectable errors. Further, if a burst of otherwise correctable errors in one channel coincides with errors in another channel before a scrub sets a mark, uncorrectable errors are likely to result.
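The effect of a missing mark can be illustrated with a simplified model. The sketch below assumes, purely for illustration, a code that can tolerate errors in at most one unmarked channel per read, with marked channels excluded from consideration; the function `classify_read` and its parameters are hypothetical and do not describe the capability of any particular ECC or RAIM implementation.

```python
def classify_read(failing_channels, marked_channels, max_correctable_channels=1):
    """Classify a read under a simplified, assumed correction model.

    Assumption: errors in marked channels are ignored because the data is
    reconstructed from the remaining channels, and the code can correct
    errors in at most `max_correctable_channels` unmarked channels.
    """
    failing = set(failing_channels)
    if not failing:
        return "no_error"
    unmarked_failures = failing - set(marked_channels)
    if len(unmarked_failures) <= max_correctable_channels:
        return "correctable"
    return "uncorrectable"


# Burst in channel 0 coincides with an error in channel 2 before any mark is set.
before_mark = classify_read({0, 2}, marked_channels=set())

# Same error pattern after channel 0 has been marked.
after_mark = classify_read({0, 2}, marked_channels={0})
```

Under these assumptions, `before_mark` is `"uncorrectable"` and `after_mark` is `"correctable"`, which is the gap that arises when a burst occurs before a scrub has had the opportunity to set a mark.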