Error Correcting Code (ECC) is a technique commonly used to correct errors in semiconductor memory, although it may be used elsewhere. ECC is used with all forms of semiconductor memory but is especially beneficial in dynamic memories (DRAMs) and, to a lesser extent, in static memories (SRAMs). DRAMs are more susceptible than SRAMs to soft (transitory) errors and hard (permanent) errors caused by a variety of sources, including energetic particles, electrical noise, microwaves, aging, and high temperatures. An energetic particle (often a proton produced by a decayed cosmic ray neutron) can discharge the small capacitors that store bits in a DRAM and can, in some cases, permanently damage semiconductor circuits. Designers of airborne systems pay particular attention to energetic particles, whose prevalence increases greatly with altitude. A common form of ECC used with semiconductor memories is Single Error Correction Double Error Detection (SEC-DED), which, as the name implies, can detect and correct a single bit error and can detect, but not correct, a double bit error. Because single bit errors are corrected transparently, a system is usually unaware that one has occurred, but it may try to clear a double bit error by retrying the access. If a double bit error cannot be cleared, the operating system is often notified by way of a machine check and may then take appropriate action. Many systems cannot recover from a double bit error in critical code, e.g., the kernel of an operating system. Some systems therefore scrub memory by periodically reading data and writing it back, which removes single bit soft errors before a second error can accumulate in the same word and so reduces the likelihood that ECC will encounter a double bit error.
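The scrubbing policy described above can be sketched in Python as follows. This is a toy model, not a real memory interface: each stored word records which of its bits are currently upset, and the hypothetical `ecc_read` function stands in for SEC-DED hardware by reporting zero flips as clean, one flip as corrected, and two or more as uncorrectable. The names `Word`, `ecc_read`, and `scrub` are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Word:
    """Toy stored word (hypothetical): its data plus the bit positions currently upset."""
    data: int
    flipped: set = field(default_factory=set)


def ecc_read(word: Word):
    """Stand-in for SEC-DED hardware (hypothetical); returns (status, data).

    Simplification: two or more flips are always reported as uncorrectable,
    although real SEC-DED may miscorrect three or more flipped bits.
    """
    if not word.flipped:
        return "ok", word.data
    if len(word.flipped) == 1:
        return "corrected", word.data  # the single flipped bit is repaired on the fly
    return "uncorrectable", None


def scrub(memory: list[Word]):
    """One scrub pass: write corrected data back so single upsets cannot pile up."""
    rewritten, uncorrectable = 0, []
    for addr, word in enumerate(memory):
        status, data = ecc_read(word)
        if status == "corrected":
            word.data = data       # writing back regenerates the check bits...
            word.flipped.clear()   # ...which removes the soft error from the array
            rewritten += 1
        elif status == "uncorrectable":
            # A real system might retry the access, then report a machine check.
            uncorrectable.append(addr)
    return rewritten, uncorrectable


if __name__ == "__main__":
    memory = [Word(0x1), Word(0x2, {5}), Word(0x3, {1, 9})]
    print(scrub(memory))  # (1, [2])
```

After the pass, the word that had one upset is clean again, so a later second upset in that word is still only a correctable single bit error rather than an uncorrectable double.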
When data is written to an ECC-enabled memory, ECC logic examines a block of data bits, commonly 64 bits, and generates from them a set of additional bits, called check bits, that are stored with the data. Each check bit is a parity bit computed over a specific combination of data bits, and that combination is different for every check bit. SEC-DED requires 8 check bits to be generated from and stored with a 64-bit block of data, so 72 bits are stored in total. When the data is read, the check bits are read with it, and the ECC logic processes both to produce an error indicator called a syndrome. The syndrome identifies the flipped bit (in either the data or the check bits) if there is exactly one, or indicates that two erroneous bits exist somewhere in the 72 bits read. In the unlikely event that three or more bits are in error, the resulting syndrome is unreliable: it may wrongly indicate that a correct bit is incorrect or that the data is error-free.
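A minimal Python sketch of SEC-DED for a 64-bit data block, assuming an extended Hamming code: the 7 check bits sit at the power-of-two positions 1 through 64 of a 71-bit Hamming codeword, and an overall parity bit at position 0 brings the total to 72 bits. The syndrome is the XOR of the positions of all 1 bits, so a single flip at position p yields syndrome p, while the overall parity distinguishes one flip from two. Real memory controllers often use other SEC-DED constructions (for example Hsiao codes) with the same 72/64 layout; the bit layout and the names `encode` and `decode` are assumptions for illustration.

```python
# Assumed layout: positions 1..71 form a Hamming code, position 0 is overall parity.
CHECK_POSITIONS = [1 << j for j in range(7)]               # 1, 2, 4, ..., 64
DATA_POSITIONS = [p for p in range(1, 72) if p & (p - 1)]  # the 64 non-power-of-two positions


def encode(data: int) -> list[int]:
    """Return the 72-bit codeword (a list of 0/1 values) for a 64-bit data word."""
    if not 0 <= data < 1 << 64:
        raise ValueError("data must fit in 64 bits")
    word = [0] * 72
    syndrome = 0
    for i, pos in enumerate(DATA_POSITIONS):
        word[pos] = (data >> i) & 1
        if word[pos]:
            syndrome ^= pos
    # Check bit 2**j is the parity of the data positions whose index has bit j set,
    # which makes the XOR of the positions of all 1 bits equal to zero.
    for c in CHECK_POSITIONS:
        word[c] = 1 if syndrome & c else 0
    word[0] = sum(word) % 2  # overall parity: makes the total number of 1s even
    return word


def decode(word: list[int]):
    """Return (status, data); status is 'ok', 'corrected' or 'uncorrectable'."""
    word = list(word)
    syndrome = 0
    for pos in range(1, 72):
        if word[pos]:
            syndrome ^= pos
    parity = sum(word) % 2
    if syndrome == 0 and parity == 0:
        status = "ok"
    elif parity == 1 and syndrome < 72:
        word[syndrome] ^= 1  # single flip; syndrome 0 means the parity bit itself flipped
        status = "corrected"
    else:
        # Even parity with a nonzero syndrome: double error detected
        # (odd parity with syndrome >= 72 can only come from 3+ flips).
        return "uncorrectable", None
    data = 0
    for i, pos in enumerate(DATA_POSITIONS):
        data |= word[pos] << i
    return status, data


if __name__ == "__main__":
    cw = encode(0x0123456789ABCDEF)
    cw[17] ^= 1                   # flip one stored data bit
    status, data = decode(cw)
    print(status, hex(data))      # corrected 0x123456789abcdef
```

Because every position from 1 to 71 has a distinct nonzero index, any two flips leave a nonzero syndrome with even overall parity, which is why all double bit errors are detected.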
Double Error Correction (DEC) techniques exist but require 14 check bits to be generated and stored with 64 bits of data. Double Error Correction Triple Error Detection (DEC-TED) requires 15 check bits per 64 bits of data. DEC or DEC-TED is used in situations that require extreme reliability and/or operation in hazardous environments, e.g., spacecraft exposed to radiation or hardened weapons systems.
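The check-bit counts quoted for SEC-DED, DEC, and DEC-TED are consistent with the standard construction of shortened binary BCH codes, which the following Python sketch assumes: a code correcting t bit errors over GF(2^m) needs at most m*t check bits, where m is the smallest value with 2^m - 1 >= data bits + m*t, and one extra overall parity bit adds detection of t + 1 errors. The m*t figure is an upper bound in general; for 64 data bits (m = 7) it is exact. `bch_check_bits` is a hypothetical helper name.

```python
def bch_check_bits(data_bits: int, t: int, extra_detect: bool = False) -> int:
    """Check bits for a shortened binary BCH code that corrects t bit errors.

    Uses the usual bound r <= m*t, with m the smallest field degree such that
    the full code length 2**m - 1 can hold data_bits + m*t bits. An optional
    overall parity bit raises the distance by one (detects t + 1 errors).
    """
    m = 1
    while (1 << m) - 1 < data_bits + m * t:
        m += 1
    return m * t + (1 if extra_detect else 0)


if __name__ == "__main__":
    for name, t, ted in [("SEC-DED", 1, True), ("DEC", 2, False), ("DEC-TED", 2, True)]:
        print(name, bch_check_bits(64, t, ted))
    # SEC-DED 8
    # DEC 14
    # DEC-TED 15
```

With t = 1 the BCH code reduces to a Hamming code, so the SEC-DED figure is the familiar 7 Hamming check bits plus one overall parity bit.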
Byte correction codes are a type of ECC often employed in memory systems built from memory chips that each supply one byte of every access. In such an organization, a failed memory chip causes an entire byte of information to be incorrect. Byte-oriented error correction codes have been developed that provide single byte error correction and double byte error detection (SBC-DBD), enabling a system to continue operating with a failed memory chip. Other byte-oriented ECC techniques are possible.
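To illustrate byte-oriented correction, here is a minimal Python sketch assuming a shortened Reed-Solomon code over GF(2^8): each byte is one symbol, and three check bytes (generator roots alpha^0, alpha^1, alpha^2) give a symbol distance of 4, enough to correct any single bad byte and detect any two bad bytes. This is a swapped-in textbook construction, not necessarily one of the SBC-DBD codes mentioned above, and practical SBC-DBD designs are often more economical in check bits. The primitive polynomial 0x11D, the 8-byte data word, and the names `encode` and `decode` are assumptions.

```python
PRIM_POLY = 0x11D  # assumed primitive polynomial x^8 + x^4 + x^3 + x^2 + 1
NCHECK = 3         # check bytes: symbol distance 4 -> correct 1 byte, detect 2


def _build_tables():
    """Antilog (EXP) and log (LOG) tables for GF(2^8) with generator alpha = 2."""
    exp, log = [0] * 512, [0] * 256
    v = 1
    for i in range(255):
        exp[i], log[v] = v, i
        v <<= 1
        if v & 0x100:
            v ^= PRIM_POLY
    for i in range(255, 512):
        exp[i] = exp[i - 255]  # duplicate so exp[a + b] needs no modular reduction
    return exp, log


EXP, LOG = _build_tables()


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def gf_div(a, b):
    return 0 if a == 0 else EXP[(LOG[a] - LOG[b]) % 255]  # b must be nonzero


def poly_mul(p, q):
    """Multiply polynomials stored as coefficient lists, lowest degree first."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] ^= gf_mul(a, b)
    return out


def poly_eval(p, x):
    """Evaluate p(x) by Horner's rule."""
    y = 0
    for coef in reversed(p):
        y = gf_mul(y, x) ^ coef
    return y


# Generator g(x) = (x + 1)(x + alpha)(x + alpha^2); in GF(2^8), subtraction is XOR.
GEN = [1]
for k in range(NCHECK):
    GEN = poly_mul(GEN, [EXP[k], 1])


def encode(data: bytes) -> list[int]:
    """Systematic encoding: [check bytes] + [data bytes], a multiple of g(x)."""
    rem = [0] * NCHECK + list(data)  # x^NCHECK * m(x)
    for i in range(len(rem) - 1, NCHECK - 1, -1):
        coef = rem[i]
        if coef:
            for j, g in enumerate(GEN):  # g(x) is monic, so rem[i] becomes 0
                rem[i - NCHECK + j] ^= gf_mul(coef, g)
    return rem[:NCHECK] + list(data)


def decode(word):
    """Return (status, data bytes); status is 'ok', 'corrected' or 'uncorrectable'."""
    word = list(word)
    s0, s1, s2 = (poly_eval(word, EXP[k]) for k in range(NCHECK))
    if s0 == s1 == s2 == 0:
        return "ok", bytes(word[NCHECK:])
    # One bad byte with error value e at position j gives
    # s0 = e, s1 = e * alpha^j, s2 = e * alpha^(2j).
    if s0 and s1 and gf_mul(s1, s1) == gf_mul(s0, s2):
        j = LOG[gf_div(s1, s0)]
        if j < len(word):
            word[j] ^= s0
            return "corrected", bytes(word[NCHECK:])
    return "uncorrectable", None  # e.g. two bad bytes (two failed chips)


if __name__ == "__main__":
    data = bytes(range(1, 9))  # 8 data bytes, one per (hypothetical) byte-wide chip
    cw = encode(data)
    cw[5] ^= 0xFF              # a failed chip garbles an entire byte
    print(decode(cw) == ("corrected", data))  # True
```

Because the minimum distance is 4 symbols, two bad bytes can never produce syndromes that look like a single correctable byte error, so the decoder reports them instead of miscorrecting.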