Computing devices require storage for data and for code to be executed. Temporary storage provides faster access to data for execution, and has traditionally been implemented with volatile memory resources. Volatile memory must be periodically refreshed to retain a determinate state, but its density and low access latency make it a preferred technology for current computing platforms, whether servers, desktop or laptop computers, mobile devices, or consumer and business electronics. DRAM (dynamic random access memory) devices are the most common type of volatile memory device in use. Single bit DRAM failures are projected to increase as manufacturing processes produce DRAM components with smaller geometries, leading to an increase in persistent single bit errors.
One technique for addressing the increasing error rate is on-die ECC (error checking and correction), which refers to error detection and correction logic that resides on the memory device itself. In general, error checking and correction can vary from the lowest levels of protection (such as parity) to more complex algorithmic solutions (such as double-bit error correction). Parity generation and checking is fast, and can indicate an error in a long string with a single parity bit, but it provides no correction capability. Single error correction (SEC) requires more resources than parity, and can correct a single error per code word. Double-bit error correction requires still more resources (time and code store) to implement, which may not be feasible for on-die ECC in memory devices in high-speed, high-bandwidth applications. While stronger codes provide better error detection and correction, the added computation time and resources they require favor weaker codes in on-die ECC implementations.
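The difference between parity and SEC can be illustrated with a minimal sketch. It assumes a textbook Hamming(7,4) code purely for illustration; an actual on-die ECC code is much wider, and its construction is not specified here. The function names (parity_bit, hamming74_encode, hamming74_syndrome, hamming74_correct) are hypothetical.

```python
def parity_bit(bits):
    """Even parity: XOR of all bits. Detects an odd number of flips, locates none."""
    p = 0
    for b in bits:
        p ^= b
    return p


def hamming74_encode(data):
    """Encode 4 data bits into a 7-bit code word laid out at positions 1..7.

    Positions 1, 2 and 4 hold check bits; positions 3, 5, 6, 7 hold data.
    """
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_syndrome(code_word):
    """XOR of the 1-based positions of all set bits; 0 for a valid code word."""
    s = 0
    for pos, bit in enumerate(code_word, start=1):
        if bit:
            s ^= pos
    return s


def hamming74_correct(code_word):
    """Flip the bit the syndrome points at (if any). Returns (word, syndrome)."""
    s = hamming74_syndrome(code_word)
    fixed = list(code_word)
    if s:
        fixed[s - 1] ^= 1
    return fixed, s


if __name__ == "__main__":
    data = [1, 0, 1, 1]

    # Parity: one flipped bit changes the parity, but not where it happened.
    damaged = [1, 0, 0, 1]
    print("parity mismatch:", parity_bit(data) != parity_bit(damaged))

    # SEC: the syndrome names the flipped position, so it can be corrected.
    word = hamming74_encode(data)
    received = list(word)
    received[4] ^= 1  # single bit error at position 5
    fixed, s = hamming74_correct(received)
    print("syndrome:", s, "corrected:", fixed == word)
```

The parity check needs one stored bit and a single XOR tree, while the SEC code stores three check bits for four data bits and must compute and decode a syndrome, which reflects the resource tradeoff described above.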
In systems that employ DRAMs implementing on-die SEC, the ECC can correct a single bit error (SBE). On-die ECC can be used in addition to system-level ECC, and the memory device may return single-error corrected data that is indistinguishable to the system from data that had no errors. However, an SEC ECC system can attempt to correct a double bit error as an SBE. The miscorrection of a double bit error (which system-level ECC may be able to correct) as an SBE can create a triple bit error: the SEC code misinterprets the double bit error as a single error at the bit its syndrome indicates, and toggles that third bit. While a double bit error may be correctable or detectable with system-level ECC, a triple bit error may be neither correctable nor detectable. Aliasing refers to this erroneous changing of a bit value based on syndrome computation, which creates an additional error while attempting to correct an error. Traditional on-die ECC is subject to aliasing errors due to limitations on the resources available to implement ECC.
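The aliasing behavior can be reproduced in a small sketch. It again assumes a textbook Hamming(7,4) SEC code as a stand-in for the on-die code, which is an illustrative assumption only; the helper names (sec_encode, sec_syndrome, sec_decode, bit_distance) are hypothetical. In this small code every nonzero syndrome points at a valid bit position, so every double bit error is miscorrected; in a wider or shortened code, some double bit errors may instead produce a syndrome that matches no bit.

```python
def sec_encode(data):
    """Hamming(7,4): check bits at positions 1, 2, 4; data at 3, 5, 6, 7."""
    d1, d2, d3, d4 = data
    return [d1 ^ d2 ^ d4, d1 ^ d3 ^ d4, d1, d2 ^ d3 ^ d4, d2, d3, d4]


def sec_syndrome(code_word):
    """XOR of the 1-based positions of all set bits; 0 means no error seen."""
    s = 0
    for pos, bit in enumerate(code_word, start=1):
        if bit:
            s ^= pos
    return s


def sec_decode(code_word):
    """Treat any nonzero syndrome as a single bit error and flip that bit."""
    s = sec_syndrome(code_word)
    out = list(code_word)
    if s:
        out[s - 1] ^= 1
    return out, s


def bit_distance(a, b):
    """Number of bit positions in which two words differ."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    original = sec_encode([1, 0, 1, 1])

    # Inject a double bit error at positions 2 and 5.
    received = list(original)
    for pos in (2, 5):
        received[pos - 1] ^= 1

    decoded, s = sec_decode(received)

    # The syndrome is 2 XOR 5 = 7, so the decoder toggles position 7,
    # a bit that was never in error.
    print("errors before decode:", bit_distance(original, received))  # 2
    print("syndrome points at bit:", s)                               # 7
    print("errors after decode:", bit_distance(original, decoded))    # 3
```

The decoder has no way to tell that the syndrome came from two errors rather than one, so the data it returns carries three wrong bits and looks, to the rest of the system, like a normal corrected word.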
Descriptions of certain details and implementations follow, including a description of the figures, which may depict some or all of the examples described below, as well as a discussion of other potential examples or implementations of the inventive concepts presented herein.