When data is read back from a memory in which it has been stored, an error occasionally occurs, i.e., the data read back is not identical to the data previously stored.
A number of error correcting codes (ECC) are known in the prior art that are capable not only of detecting errors but also of correcting them. Typically, these codes can detect a broader range of errors than they can correct. For example, a DED-SEC (double error detecting, single error correcting) code is capable of detecting any double error that occurs within the data field the code covers (i.e., an error in which two bits within the field are erroneous) and of correcting any single error (i.e., an error in which only one bit is wrong).
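The double-detect, single-correct behavior described above can be illustrated with a small sketch. The example below assumes an extended Hamming code over a 4-bit data field (8 stored bits in total), chosen only because it is small enough to follow by hand; it is not taken from any particular prior-art system, and the function names `encode` and `decode` are hypothetical.

```python
def encode(nibble):
    """Encode 4 data bits as an 8-bit extended Hamming codeword (list of bits).

    Index 0 holds an overall parity bit; indices 1..7 form a Hamming(7,4)
    codeword with check bits at positions 1, 2 and 4.
    """
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8
    code[3], code[5], code[6], code[7] = d
    for p in (1, 2, 4):
        # Check bit p covers every position whose index has bit p set.
        code[p] = sum(code[i] for i in range(1, 8) if i & p) % 2
    code[0] = sum(code[1:]) % 2
    return code


def decode(code):
    """Return (status, nibble); status is 'ok', 'corrected' or 'uncorrectable'."""
    syndrome = 0
    for i in range(1, 8):
        if code[i]:
            syndrome ^= i          # XOR of the positions of all set bits
    overall = sum(code) % 2        # 1 means an odd number of bits flipped
    fixed = list(code)
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Treated as a single error; syndrome 0 points at the parity bit itself.
        fixed[syndrome] ^= 1
        status = "corrected"
    else:
        # Even number of flips with a non-zero syndrome: detected, not correctable.
        return "uncorrectable", None
    nibble = fixed[3] | fixed[5] << 1 | fixed[6] << 2 | fixed[7] << 3
    return status, nibble


word = encode(0b1011)
word[5] ^= 1                       # one erroneous bit
print(decode(word))                # ('corrected', 11)
word[2] ^= 1                       # a second erroneous bit in the same field
print(decode(word))                # ('uncorrectable', None)
```

A code protecting a 64-bit double word works on the same principle with more check bits; only the field width differs in this sketch.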
As applied to main memory or Random Access Memory (RAM) within a computer system, it may be desirable to treat each 64-bit double word as its own data field, i.e., to store along with it its own ECC or redundancy check information. As the computer system reads words from memory, this ECC information would be checked so that errors in the word could be detected and, where possible, corrected.
If the ECC hardware detects a correctable error, then it is desirable to correct the word being read on-the-fly so as to provide the processor or I/O controller that is reading main memory with a corrected word. This correction lies on a performance-critical path, because access to main memory is one of the most performance-sensitive aspects of computer system design. Any improvement or degradation in the latency between an access request and the delivery of the requested data often has a substantial effect on overall system performance.
It is further desirable to correct the word in main memory itself, because errors accumulate over time. If a subsequent error occurs within the same word before the first has been repaired, a correctable error may become an uncorrectable one. The process of correcting the data stored in memory is called scrubbing the memory. Compared with the on-the-fly correction described above, scrubbing is more time-consuming and more costly, in that it requires additional hardware and/or software to implement.
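The difference between on-the-fly correction and scrubbing can be sketched with a simple model. The example below makes a simplifying assumption: the ECC logic is not implemented, and a read is classified only by counting how many bits of a word have flipped since it was last written. The names `StoredWord`, `read` and `scrub` are hypothetical and serve only to illustrate how errors accumulate.

```python
from dataclasses import dataclass


@dataclass
class StoredWord:
    good: int            # value originally written (what the ECC can recover)
    error_mask: int = 0  # bits that have flipped since the last write


def read(word):
    """Model a read through a double-detect, single-correct ECC."""
    flipped = bin(word.error_mask).count("1")
    if flipped == 0:
        return "ok", word.good
    if flipped == 1:
        # Corrected on-the-fly for the reader; the stored copy stays wrong.
        return "corrected", word.good
    return "uncorrectable", None


def scrub(memory):
    """Write corrected data back to every word holding a correctable error."""
    repaired = 0
    for word in memory:
        status, _ = read(word)
        if status == "corrected":
            word.error_mask = 0   # the write-back removes the stored error
            repaired += 1
    return repaired


# Without scrubbing, a second error in the same word is fatal.
unscrubbed = StoredWord(0xDEADBEEF)
unscrubbed.error_mask ^= 1 << 3
unscrubbed.error_mask ^= 1 << 9
print(read(unscrubbed)[0])        # uncorrectable

# With a scrub between the two errors, each one is corrected in turn.
scrubbed = StoredWord(0xDEADBEEF)
scrubbed.error_mask ^= 1 << 3
scrub([scrubbed])
scrubbed.error_mask ^= 1 << 9
print(read(scrubbed)[0])          # corrected
```

In this model, `read` never changes the stored word, which is why a correctable error remains in memory until `scrub` writes the corrected value back.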
In one approach to scrubbing memory, the goal is to impose none of the error correction task on software. In this case, it would be desirable to include in the memory controller a state machine that temporarily suspends the normal operation of the memory and writes the corrected word back to the erroneous memory location. Disadvantages of this approach include both the complexity of the hardware required to perform the write-back and the performance penalty incurred because the memory is not accessible for other purposes until the correct-and-rewrite process is completed.
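The performance penalty of such a hardware write-back can be sketched as a two-state machine. The model below is an illustration only: the class `ScrubbingController`, its methods, and the cycle count for a write-back are all hypothetical, and a real memory controller would have considerably more states.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()        # normal operation, requests are serviced
    WRITE_BACK = auto()  # memory suspended while the corrected word is rewritten


class ScrubbingController:
    def __init__(self, write_back_cycles=2):
        self.state = State.IDLE
        self.write_back_cycles = write_back_cycles  # assumed cost of one rewrite
        self.remaining = 0
        self.stalled_cycles = 0

    def report_correctable_error(self):
        """Called when the ECC logic corrects a word on a read."""
        self.state = State.WRITE_BACK
        self.remaining = self.write_back_cycles

    def tick(self, request_pending):
        """Advance one cycle; return True if a pending request was serviced."""
        if self.state is State.WRITE_BACK:
            self.remaining -= 1
            if request_pending:
                self.stalled_cycles += 1   # the requester has to wait
            if self.remaining == 0:
                self.state = State.IDLE
            return False
        return request_pending


ctrl = ScrubbingController(write_back_cycles=3)
ctrl.report_correctable_error()
serviced = [ctrl.tick(request_pending=True) for _ in range(4)]
print(serviced, ctrl.stalled_cycles)   # [False, False, False, True] 3
```

Every cycle spent in `WRITE_BACK` is a cycle in which a pending access is delayed, which is the latency cost this approach trades for keeping software out of the correction path.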
In another approach to scrubbing memory, the goal is to keep hardware cost and complexity to a minimum and to impose most of the error correction task on software. Under such an approach, it would be desirable to generate an interrupt that activates software or firmware, executing on the processor, to correct the erroneous memory location. Unfortunately, in some systems the limited interrupt request signals or vectors that are available are already all in use. Also, a different version of the correction routine may be required for each operating system that will be run on the computer system.