The present invention is generally directed to recovering from memory errors and, more specifically, to recovering from radiation induced memory errors.
Today, commercially available microprocessors are generally not designed to operate in a space borne environment. As such, these commercial microprocessors are subject to radiation induced errors. For example, single event upsets (SEUs) can occur when an ionized particle hits a flip-flop or a memory cell, associated with the microprocessor, and changes the state of the associated flip-flop or memory cell. These radiation induced errors can result in incorrect calculations and faulty program execution, as well as system reset and a loss of state, due to timeout of a watchdog timer. Further, even when a microprocessor is radiation hardened, the microprocessor is still subject to SEUs of internal flip-flops and/or memory cells.
In a memory that includes a parity generator/checker circuit (and when parity checking is enabled), each time a data byte, i.e., eight bits, is written to the memory, the circuit examines the byte and determines whether the byte has an even or odd number of ‘ones’. In the case of odd parity, when the data byte has an even number of ‘ones’, a parity bit, i.e., a ninth bit, is set to ‘one’. Otherwise, the parity bit is set to ‘zero’. The result is that no matter how many ‘ones’ were in the original eight bits of data, there is an odd number of ‘ones’ when all nine bits are examined. Alternatively, instead of implementing odd parity, the circuit may implement even parity such that the total number of ‘ones’ is an even number. In a typical microprocessor system, when a byte is read from memory, the circuit checks the parity of the byte to determine whether a parity error is indicated.
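The odd-parity scheme described above can be illustrated with a minimal software sketch. The function names (odd_parity_bit, store, parity_error) and the nine-bit word layout, with the parity bit placed above the eight data bits, are assumptions made for illustration only; an actual parity generator/checker circuit performs the equivalent operations in hardware.

```python
def odd_parity_bit(byte: int) -> int:
    """Return the ninth (parity) bit for an 8-bit value under odd parity."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected an 8-bit value")
    ones = bin(byte).count("1")
    # An even number of ones in the data sets the parity bit to 1,
    # so the nine bits together always hold an odd number of ones.
    return 1 if ones % 2 == 0 else 0


def store(byte: int) -> int:
    """Model a write: pack the parity bit (bit 8) with the data byte (bits 0-7)."""
    return (odd_parity_bit(byte) << 8) | byte


def parity_error(word: int) -> bool:
    """Model a read: a valid odd-parity word contains an odd number of ones."""
    return bin(word & 0x1FF).count("1") % 2 == 0


word = store(0b1010_0000)      # two ones in the data, so the parity bit is 1
upset = word ^ (1 << 3)        # hypothetical single event upset flips one stored bit
print(parity_error(word), parity_error(upset))
```

A single parity bit indicates an error only when an odd number of the nine stored bits have changed; two flipped bits in the same word restore the expected parity and go undetected.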
In a typical microprocessor system, when a parity error is detected, the parity generator/checker circuit generates a non-maskable interrupt (NMI), which is usually used to instruct the microprocessor to immediately halt. This is done to ensure that invalid data does not corrupt valid data. In many microprocessor systems, a watchdog timer, which can be implemented in hardware or software, may be the only means for detecting when an execution error occurs. Alternatively, the watchdog timer may be implemented in conjunction with a parity generator/checker circuit to detect memory errors. In either case, the microprocessor generally executes code until the watchdog timer times out or an unrecoverable software error occurs.
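The watchdog behavior described above can be sketched, under stated assumptions, as a simple tick-based software model. The class WatchdogTimer, its kick and tick methods, and the function ticks_until_reset are hypothetical names introduced for illustration; the model counts abstract ticks rather than real time and does not represent any particular watchdog implementation.

```python
class WatchdogTimer:
    """Hypothetical software model of a watchdog timer (counts ticks, not real time)."""

    def __init__(self, timeout_ticks: int) -> None:
        self.timeout_ticks = timeout_ticks
        self.remaining = timeout_ticks

    def kick(self) -> None:
        """Correctly executing code calls this periodically to restart the countdown."""
        self.remaining = self.timeout_ticks

    def tick(self) -> bool:
        """Advance one tick; return True once the timer has expired."""
        self.remaining -= 1
        return self.remaining <= 0


def ticks_until_reset(fault_at: int, timeout_ticks: int) -> int:
    """Return the tick at which the watchdog expires when execution fails at fault_at."""
    wdt = WatchdogTimer(timeout_ticks)
    tick = 0
    while True:
        tick += 1
        if tick < fault_at:
            wdt.kick()      # normal execution keeps servicing the watchdog
        if wdt.tick():
            return tick     # expiry: the system would be reset at this point


print(ticks_until_reset(fault_at=5, timeout_ticks=10))
```

In this model the fault itself is never observed; only the absence of servicing is, so the delay between the fault and its detection grows with the configured timeout.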
Microprocessors that have an internal cache memory with a parity generator/checker circuit may provide an output from the circuit off-chip and may also include an external flush line, which causes the microprocessor to flush its internal cache when it receives an appropriate signal. However, in general, space borne microprocessor systems have only used watchdog timers to detect memory errors attributable to SEUs. As a result, for microprocessor systems used in space borne environments, the time period between when an application error occurs and when the microprocessor system recovers from the error has generally been relatively long, and recovery has required resetting the microprocessor system.
What is needed is a technique for recovering from radiation induced memory errors that is both efficient and timely. It would also be desirable to recover from a radiation induced memory error without resetting the microprocessor system.