The present invention is related to error correction in memory systems, and in particular to a system and method for automatically detecting and handling permanent bit errors in memory systems.
Computer systems generally contain, at minimum, a central processing unit and a memory storage device. Memory storage devices are necessary because they provide data retention for the computer system. These memory storage devices are susceptible to data bit errors, especially in environments with high levels of radiation. Such bit errors can occur when an energetic particle, such as a stray neutron, strikes a data bit storage cell and causes the stored data bit to flip from a logic 1 to a logic 0 or vice versa.
Computer systems are usually equipped to detect single or multiple bit errors in the data stored in a memory storage device. These errors can be temporary, such as bit errors caused by a single event upset (SEU), or permanent, such as bit errors caused by a single event latchup (SEL), which requires a reset of power to the memory storage device to clear. An SEU occurs when an energetic particle strikes a data bit storage cell, causing the storage cell to change state. An SEU can be corrected by simply rewriting the correct data to the data bit storage cell. In contrast, an SEL generally occurs when an energetic particle, usually a stray neutron, hits a metal oxide semiconductor field effect transistor (MOSFET) and creates a low-impedance path between the power rails of the transistor, causing it to conduct continuously. The transistor remains in this state as long as current flows through it, which is generally until the power is cycled.
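The distinction between temporary and permanent bit errors can be illustrated with a minimal sketch. It is not a description of any particular system: the `SimulatedMemory` class, its `upset` and `latch` methods, and the `classify_error` function are hypothetical names introduced here, a latched cell is modeled simply as a bit that ignores writes, and the correct data word is assumed to be available (for example, from an error correcting code). The sketch rewrites the correct data and reads it back; an error that disappears is treated as temporary, and one that persists is treated as permanent.

```python
class SimulatedMemory:
    """Toy word-addressable memory (hypothetical model, for illustration only)."""

    def __init__(self, size):
        self.cells = [0] * size
        self.stuck = {}  # address -> (mask, value) for bits that ignore writes

    def write(self, addr, word):
        mask, value = self.stuck.get(addr, (0, 0))
        # Stuck bits keep their latched level; all other bits take the new data.
        self.cells[addr] = (word & ~mask) | (value & mask)

    def read(self, addr):
        return self.cells[addr]

    def upset(self, addr, bit):
        """Flip one stored bit once (models a temporary, SEU-style error)."""
        self.cells[addr] ^= 1 << bit

    def latch(self, addr, bit, level):
        """Hold one bit at a fixed level (models a permanent error)."""
        mask, value = self.stuck.get(addr, (0, 0))
        mask |= 1 << bit
        value = (value & ~(1 << bit)) | (level << bit)
        self.stuck[addr] = (mask, value)
        # Re-apply the current contents so the stuck bit takes effect at once.
        self.write(addr, self.cells[addr])


def classify_error(mem, addr, expected):
    """Return 'none', 'temporary', or 'permanent' for the word at addr.

    Assumes the correct word `expected` is known to the caller.
    """
    if mem.read(addr) == expected:
        return "none"
    mem.write(addr, expected)  # a temporary error is cleared by a rewrite
    if mem.read(addr) == expected:
        return "temporary"
    return "permanent"  # the error survived the rewrite
```

In this model, a word disturbed with `upset` is reported as temporary because the rewrite restores it, while a word affected by `latch` is reported as permanent because the rewrite has no effect on the stuck bit.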
The prior art handles permanent bit errors by detecting the error and then automatically cycling power to the device. In flight applications, for example, powering down the latched device requires special circuitry to be added to the power supply to interrupt the input power and to allow the energy in the system to deplete before re-applying the input power. During this period of time, the system must remain offline.
Cycling power to the memory storage device can cause problems in systems that cannot be offline for any extended period of time, such as the systems of an aircraft. Thus, systems that are susceptible to permanent bit errors, particularly those operating in environments with high levels of radiation, are required to use memory storage devices that are insulated so as to be immune to permanent bit errors. These memory storage devices are often more expensive than those that are susceptible to permanent bit errors.