For many years, dynamic random access memory (DRAM) has served as a fundamental building block in computer memory subsystems. Over this time, memory capacity has increased significantly, with a corresponding reduction in memory chip size. Such DRAM scaling has led to larger memory capacities in computer devices and in an ever-increasing number of portable devices. However, memory scaling is not without its problems.
As scaling continues, semiconductor memories approach densities at the atomic level. Unfortunately, at this level individual atoms and electrons are likely to have negative effects on the correctness of stored data. Potentially, incorrect data storage could lead to the end of DRAM scaling. In response, the memory chip industry may soon turn its attention to resistive-memory technologies such as phase-change memory (PRAM). PRAM is one of the most promising candidates to replace DRAM because functional PRAM prototypes have been demonstrated at 22 nm and are projected to scale to 9 nm. Eventually, it is possible that PRAM, or other types of resistive memories, will replace most semiconductor memories, including those residing on the memory bus.
Currently, PRAM's greatest limiting factor is its write endurance. At the 65 nm technology node, a PRAM cell is expected to sustain 10^8 writes before the cell's heating element breaks and induces a stuck-at fault (or hard failure), where writes are no longer able to change the value stored in the cell. Moreover, as PRAM scales to near-atomic dimensions, variability across device lifetimes increases, causing many cells to fail much sooner than in systems with lower variations. Unfortunately, existing systems for managing hard failures in DRAM and flash memory technologies do not map easily to PRAM. Accordingly, there is a need for low-overhead, accurate detection of hard failures, and a simple hardware-software interface that provides lossless recovery from such failures.