Achieving and maintaining a desired level of performance reliability may be essential for certain business-critical computer systems, such as servers designed to provide high-volume storage capacity, intensive data processing, or high-speed communication interfacing. The reliability of the memory subsystem often plays a key role in meeting overall server reliability, availability, and serviceability (RAS) benchmarks. Single Device Data Correction (SDDC), an error checking and correcting technology for computer memory developed by Intel Corp., is a pivotal RAS feature for the Dynamic Random Access Memory (DRAM) subsystems of servers because of the significant hard-failure rate of DRAM devices: it keeps data correctable even when an entire DRAM device fails. SDDC is typically implemented using Error Correcting Code (ECC) memory, such as ECC Dual In-Line Memory Modules (DIMMs).
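To make "single device" correction concrete, the following is a minimal sketch of a single-symbol-correcting code. It assumes, hypothetically and not as Intel's actual SDDC design, that each of 18 devices contributes one 8-bit symbol per access, that 16 symbols carry data and 2 carry check symbols, and that arithmetic is done in GF(2^8) with primitive polynomial 0x11D. The construction is a distance-3 Reed-Solomon-style code: one syndrome equals the failed device's error value, and the ratio of the two syndromes gives its position. The names `ssc_encode` and `ssc_decode` are illustrative only.

```python
from __future__ import annotations

# Sketch of single-symbol (single-device) correction. Hypothetical parameters,
# not Intel's SDDC design: 16 data symbols + 2 check symbols, 8 bits each.
N_DATA, N_CHECK = 16, 2
PRIM = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1, a primitive polynomial for GF(2^8)

# Exponent/log tables for GF(2^8); alpha = 2 generates the multiplicative group.
EXP = [0] * 510
LOG = [0] * 256
_x = 1
for _i in range(255):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= PRIM
for _i in range(255, 510):
    EXP[_i] = EXP[_i - 255]


def gf_mul(a: int, b: int) -> int:
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def gf_div(a: int, b: int) -> int:
    if b == 0:
        raise ZeroDivisionError("division by zero in GF(2^8)")
    if a == 0:
        return 0
    return EXP[(LOG[a] - LOG[b]) % 255]


def ssc_encode(data: list[int]) -> list[int]:
    """Return [c0, c1, d0..d15] with sum(c_i) == 0 and sum(alpha^i * c_i) == 0."""
    assert len(data) == N_DATA
    a = b = 0
    for i, d in enumerate(data, start=N_CHECK):
        a ^= d                    # data contribution to syndrome 0
        b ^= gf_mul(EXP[i], d)    # data contribution to syndrome 1
    c1 = gf_div(a ^ b, 1 ^ 2)     # solves c0 + c1 = a and c0 + alpha*c1 = b
    c0 = a ^ c1
    return [c0, c1] + list(data)


def ssc_decode(word: list[int]) -> tuple[list[int], int | None]:
    """Correct one bad symbol; return (word, index of failed device or None)."""
    s0 = s1 = 0
    for i, c in enumerate(word):
        s0 ^= c
        s1 ^= gf_mul(EXP[i], c)
    if s0 == 0 and s1 == 0:
        return list(word), None       # no error detected
    if s0 == 0 or s1 == 0:
        raise ValueError("uncorrectable error")
    pos = (LOG[s1] - LOG[s0]) % 255   # s1 / s0 = alpha^pos
    if pos >= len(word):
        raise ValueError("uncorrectable error")
    fixed = list(word)
    fixed[pos] ^= s0                  # s0 is the error value itself
    return fixed, pos
```

Corrupting any single symbol, for example `word[7] ^= 0x5A`, and passing the word to `ssc_decode` returns the repaired word together with the index 7 of the failing device. Errors in two devices exceed this code's correction capability and may be flagged or miscorrected.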
ECC memory is common in the industry because of its positive impact on server reliability. Conventional ECC memory can detect and correct single-bit memory errors. However, increasing memory capacity, higher memory density on a single DIMM, and faster memory subsystems have significantly increased the risk of multi-bit memory errors, which conventional ECC memory cannot correct and which can result in system failure. An advanced type of ECC memory, referred to in the industry as “Chipkill” memory, is known to reduce the chance of system downtime caused by memory device failures, including multi-bit memory errors. The term ‘chipkill’ refers to detecting and correcting the errors produced by a failed memory device (chip). This technology was originally developed by IBM Corp. for mission-critical systems but is gradually filtering down to consumer systems as well. Market interest in cloud-based computing, for example, is pushing toward enhancing overall system reliability in a cost-effective and power-efficient way.
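The limitation of conventional ECC can be illustrated with a minimal single-error-correcting, double-error-detecting (SEC-DED) extended Hamming code over a 72-bit word (64 data bits plus 8 check bits). The bit layout below is a textbook Hamming arrangement chosen for illustration; it is not claimed to match any particular memory controller, and the helper names are hypothetical.

```python
from __future__ import annotations

# Textbook extended Hamming SEC-DED layout (illustrative): position 0 holds overall
# parity, positions 1, 2, 4, ..., 64 hold Hamming check bits, the rest hold data.
WORD_BITS = 72
CHECK_POSITIONS = [1, 2, 4, 8, 16, 32, 64]
DATA_POSITIONS = [p for p in range(1, WORD_BITS) if p not in CHECK_POSITIONS]  # 64


def _syndrome(word: list[int]) -> int:
    """XOR of the positions of all set bits in positions 1..71."""
    s = 0
    for pos in range(1, WORD_BITS):
        if word[pos]:
            s ^= pos
    return s


def secded_encode(data: int) -> list[int]:
    word = [0] * WORD_BITS
    for k, pos in enumerate(DATA_POSITIONS):
        word[pos] = (data >> k) & 1
    s = _syndrome(word)
    for p in CHECK_POSITIONS:
        word[p] = 1 if s & p else 0   # cancel the syndrome so it becomes 0
    word[0] = sum(word) & 1           # overall parity makes the bit count even
    return word


def secded_decode(word: list[int]) -> tuple[int | None, str]:
    s = _syndrome(word)
    parity = sum(word) & 1
    fixed = list(word)
    if s == 0 and parity == 0:
        status = "ok"
    elif parity == 1 and s < WORD_BITS:
        fixed[s] ^= 1                 # single-bit error at position s (0 = parity bit)
        status = "corrected"
    else:
        return None, "uncorrectable"  # double-bit or other detectable multi-bit error
    data = 0
    for k, pos in enumerate(DATA_POSITIONS):
        data |= fixed[pos] << k
    return data, status
```

Any single flipped bit is corrected, and any two flipped bits are reported as uncorrectable. If, hypothetically, one x4 device drove word positions 8 to 11 and all four of its bits failed, the syndrome would be zero with even parity, so `secded_decode` would report `"ok"` while returning corrupted data. Whole-device failures of this kind are what chipkill-style codes are designed to handle.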
Returning to the ECC memory architecture, a conventional x4 ECC DIMM has 2 spare devices and a conventional x8 ECC DIMM has 1 spare device that can be used for ECC. RAS-conscious customers use either x4 DIMMs, or x8 DIMMs together with special features (such as operating two channels in ‘lockstep’ to increase the number of available ECC devices), to achieve SDDC. Conventionally, SDDC requires a minimum of 2 spare devices. When a significant portion of the spare devices must also be used to store tag bits, there may not be enough bits left to implement SDDC using conventional ECC codes. There is therefore room for improved ECC memory subsystems that support SDDC while also freeing up capacity for metadata storage.
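The spare-device arithmetic in the preceding paragraph can be checked with a short calculation. This sketch assumes a standard 72-bit ECC channel (64 data bits plus 8 ECC bits per beat) and a simplified model in which SDDC consumes two check symbols, each one device wide; real SDDC codes may allocate bits differently. `MemoryConfig` is a hypothetical helper.

```python
from dataclasses import dataclass

# Assumed bus layout: one 72-bit ECC channel = 64 data bits + 8 ECC bits per beat.
DATA_BITS = 64
ECC_BITS = 8
SDDC_CHECK_SYMBOLS = 2  # assumed: one-device correction needs 2 device-wide check symbols


@dataclass
class MemoryConfig:
    device_width: int  # bits each DRAM device drives per beat (4 for x4, 8 for x8)
    channels: int = 1  # 2 models two channels operated in lockstep

    def total_devices(self) -> int:
        return self.channels * (DATA_BITS + ECC_BITS) // self.device_width

    def spare_devices(self) -> int:
        return self.channels * ECC_BITS // self.device_width

    def supports_sddc(self) -> bool:
        return self.spare_devices() >= SDDC_CHECK_SYMBOLS

    def tag_bits_left(self) -> int:
        """Spare bits per beat left for tags after SDDC check symbols (negative = shortfall)."""
        spare_bits = self.spare_devices() * self.device_width
        return spare_bits - SDDC_CHECK_SYMBOLS * self.device_width


for name, cfg in [("x4", MemoryConfig(4)), ("x8", MemoryConfig(8)),
                  ("x8 lockstep", MemoryConfig(8, channels=2))]:
    print(f"{name:12} devices={cfg.total_devices():2} spare={cfg.spare_devices()} "
          f"SDDC={cfg.supports_sddc()} tag_bits_left={cfg.tag_bits_left()}")
```

Under these assumptions, x4 DIMMs and lockstepped x8 DIMMs each have exactly 2 spare devices and 0 spare bits per beat left after the check symbols, while a single x8 channel falls 8 bits short. Any tag bits stored in the spare devices therefore come directly out of the SDDC budget.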