Personal computers, workstations, and servers include at least one processor, such as a central processing unit (CPU), and some form of memory system that includes dynamic random-access memory (DRAM). The processor executes instructions and manipulates data stored in the DRAM.
DRAM stores binary bits by selectively charging or discharging capacitors to represent the logical values one and zero. Because the capacitors are exceedingly small, their stored charges can be upset by electrical interference or high-energy particles. The resulting changes to stored instructions and data produce undesirable computational errors.
Some computer systems, such as high-end servers, employ various forms of error detection and correction to manage DRAM errors, and even more permanent memory failures. The general idea is to store extra information that can be used to detect or correct errors. For example, conventional servers that support error correction commonly include pairs of memory modules, each of which provides a burst of 72-bit-wide data for each memory access, for a total width of 144 bits. Sixteen of these bits are used for error correction, so each memory access effectively provides 128 bits of data. This level of redundancy supports error detection and correction (EDC) robust enough to correct any single DRAM device failure, as well as any multi-bit error confined to a single DRAM device. An exemplary EDC technology of this type is marketed under the trademark Chipkill™.
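The bit budget described above can be checked with a short calculation. The sketch below is illustrative only: the constants come from the text (two 72-bit modules, 16 check bits), while the helper name `access_budget` is hypothetical. It computes the total width, the usable data width, and the check-bit overhead relative to the data bits.

```python
# Illustrative bit budget for one memory access, using figures from the text.
MODULE_WIDTH_BITS = 72    # width delivered by each memory module (from the text)
MODULES_PER_ACCESS = 2    # modules accessed as a pair (from the text)
CHECK_BITS = 16           # bits reserved for error correction (from the text)


def access_budget(module_width=MODULE_WIDTH_BITS,
                  modules=MODULES_PER_ACCESS,
                  check_bits=CHECK_BITS):
    """Hypothetical helper: return (total_bits, data_bits, overhead)
    where overhead is check bits as a fraction of data bits."""
    total_bits = module_width * modules
    data_bits = total_bits - check_bits
    return total_bits, data_bits, check_bits / data_bits


total, data, overhead = access_budget()
print(total, data, f"{overhead:.1%}")  # prints: 144 128 12.5%
```

Measured against the 128 data bits, the 16 check bits amount to a 12.5% storage overhead. Measured against the full 144-bit width, the overhead is about 11.1%.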