This invention relates to methods and apparatus for correcting soft errors in digital data and is particularly useful for correcting soft errors in a computer memory.
Data stored on modern-day integrated circuit memory chips is subject to so-called “soft errors” caused by gamma rays, cosmic rays, alpha particles and other environmental factors. The passage of a gamma ray through a memory chip, for example, will sometimes cause a disturbance which is sufficient to reverse the binary state of a stored data bit. This is called a “soft” error because no permanent damage is done to the structure of the chip and the disturbed memory cell is thereafter completely reusable for storing data.
Soft errors are particularly bothersome for the case of small, high-speed cache memory chips. If store updates are made to a “dirty” cache regardless of the presence of soft errors, data integrity is soon lost, especially when the error is in the unmodified segment of the data. Left uncorrected, soft errors can turn into fatal double bit errors.
Various error correction methods have been proposed for correcting soft errors. One proposed method is to generate and include with each line of stored data a set of error correcting code bits which can be used to detect and locate a bit which has been changed as a result of a soft error event. As each line of data is subsequently read out of memory, all data bits including the error correcting code bits are decoded as a group and the decoder output indicates which, if any, data bit is in error. The data is corrected by reversing the binary state of the erroneous bit.
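As an illustration of the decode-and-flip approach just described, the following is a minimal sketch using a Hamming(7,4) code. The choice of code, the bit layout and the function names are assumptions made for this example; the text above does not specify which error correcting code is used.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits [d1, d2, d3, d4] as a 7-bit codeword.

    Codeword positions 1..7 hold: p1, p2, d1, p4, d2, d3, d4.
    """
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_correct(codeword):
    """Decode all 7 bits as a group and reverse the erroneous bit, if any.

    Returns (corrected_codeword, error_position); position 0 means no error.
    """
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 | (s2 << 1) | (s4 << 2)  # syndrome = 1-based error position
    if position:
        c[position - 1] ^= 1  # reverse the binary state of the bad bit
    return c, position


def hamming74_data(codeword):
    """Extract the 4 data bits from a (corrected) codeword."""
    return [codeword[2], codeword[4], codeword[5], codeword[6]]
```

A single flipped bit anywhere in the codeword, including a check bit, yields a nonzero syndrome equal to its position, so one decode pass both locates and repairs it.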
Unfortunately, performing this error correction on every line read out of memory is time consuming; it increases latency and degrades system performance.
The present invention provides a solution to the data integrity problem without a large penalty in latency or data throughput. In particular, error correction testing is not performed on every data sample or segment; it is performed only when necessary. A simple parity check is used to determine when error correction is needed.
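The gating idea can be sketched as follows: a cheap parity comparison runs on every access, and only the lines that fail it are handed to the slower error correction logic. The even-parity convention, the sample data and the function names are assumptions made for this example.

```python
def parity_bit(word: int) -> int:
    """Even-parity bit: 1 if the word contains an odd number of 1 bits."""
    return bin(word).count("1") & 1


def lines_needing_ecc(lines):
    """Return indices of (data, stored_parity) pairs that fail the parity check."""
    return [i for i, (data, stored) in enumerate(lines)
            if parity_bit(data) != stored]


# Hypothetical stored lines, each kept with the parity bit computed at write time.
words = [0x3C, 0xA5, 0xFF, 0x01]
stored = [(w, parity_bit(w)) for w in words]

# Simulate a soft error: one bit of line 2 is reversed after it was stored.
stored[2] = (stored[2][0] ^ 0x10, stored[2][1])

flagged = lines_needing_ecc(stored)  # only line 2 needs error correction
```

A single parity bit catches any odd number of flipped bits in a line, which covers the single-bit soft error case; an even number of flips in the same line would pass the check unnoticed.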
For data stored in a cache memory, for example, each data write to the cache memory first reads out the existing data currently stored on the desired cache storage line and parity checks it. The read-out data is modified with new data only if there is no parity error. If a parity error is detected, a cache miss is signaled and the read-out line of data is written back into the cache memory. Error correction code checking and error correction are performed on the defective line of data as part of this write-back to the cache memory.
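The write sequence above can be sketched in software as follows. This is only an illustrative model under stated assumptions: an 8-bit line, a Hamming-style single-error-correcting code, a mask-based partial store, and a retry of the store by the caller after the miss are all hypothetical choices for this example, not details taken from the description.

```python
# Hamming codeword positions assigned to the 8 data bits (assumed layout;
# positions 1, 2, 4 and 8 would hold the check bits).
POS = [3, 5, 6, 7, 9, 10, 11, 12]


def parity(data: int) -> int:
    """Even-parity bit over the data bits."""
    return bin(data).count("1") & 1


def ecc_bits(data: int) -> int:
    """Check bits: XOR of the positions of all data bits that are set."""
    check = 0
    for i, pos in enumerate(POS):
        if (data >> i) & 1:
            check ^= pos
    return check


def ecc_correct(data: int, stored_ecc: int) -> int:
    """Reverse a single erroneous data bit located by the syndrome."""
    syndrome = ecc_bits(data) ^ stored_ecc
    if syndrome in POS:
        data ^= 1 << POS.index(syndrome)
    return data


class CacheLine:
    def __init__(self, data: int):
        self.write(data)

    def write(self, data: int) -> None:
        """Store data and regenerate its parity and check bits."""
        self.data = data & 0xFF
        self.parity = parity(self.data)
        self.ecc = ecc_bits(self.data)


def store(line: CacheLine, new_bits: int, mask: int) -> str:
    """Partial store into a line; returns "hit" or "miss"."""
    old = line.data                      # read out the existing data
    if parity(old) != line.parity:       # parity error detected
        # Signal a miss and write the line back, correcting it on the way.
        line.write(ecc_correct(old, line.ecc))
        return "miss"
    # No parity error: merge the new data into the read-out data.
    line.write((old & ~mask & 0xFF) | (new_bits & mask))
    return "hit"
```

In this model the common case costs only a parity comparison; the error correction path runs solely for a line that fails the check, after which the line is clean and a repeated store can proceed normally.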
For a better understanding of the present invention, together with other and further advantages and features thereof, reference is made to the following description taken in connection with the accompanying drawings, the scope of the invention being pointed out in the appended claims.