1. Field of the Invention
The present invention relates to computer memory. More specifically, the present invention relates to self-correcting memory in a shared memory multiprocessor system.
2. Related Art
Designers of modern computing systems are under constant pressure to increase the speed and density of the integrated circuit devices, including the memory devices, within these systems.
Increasing the density of memory devices, however, can increase the rate of “soft errors,” which can lead to erroneous computational results. Soft errors occur at random and are attributable to uncontrollable causes, such as alpha particle radiation. Higher density means smaller memory cells within the memory devices and, in turn, smaller stored charges to represent the logic state of each cell. These smaller charges make a cell more susceptible to soft errors.
To reduce the impact of soft errors within a memory system, designers have routinely used self-correcting memory systems such as the one described in U.S. Pat. No. 4,319,356 issued to James E. Kocol and David B. Schuck. These self-correcting memory systems store an error correcting code in additional bits within the device and use error correcting circuitry to correct any cell that has experienced a soft error. In operation, the memory system periodically visits each cell within the memory system and corrects any errors detected in the cell's data. This process is termed “scrubbing the memory.”
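The scrubbing process can be illustrated with a minimal sketch. It assumes a toy Hamming(7,4) single-error-correcting code in which each memory word holds 4 data bits as a 7-bit codeword stored in a Python integer; the names `encode`, `syndrome`, `decode`, and `scrub` are hypothetical and do not describe the circuitry of the cited patent. The scrubber visits every word once, recomputes the syndrome, and writes back a corrected word whenever a single bit has flipped.

```python
# Toy memory scrubber using a Hamming(7,4) single-error-correcting code.
# Assumption: each memory word holds 4 data bits as a 7-bit codeword in a
# Python int, with bit (p - 1) holding Hamming position p (p = 1..7).

DATA_POS = (3, 5, 6, 7)  # Hamming positions that carry data bits


def encode(data: int) -> int:
    """Encode 4 data bits into a 7-bit Hamming(7,4) codeword."""
    bits = [0] * 8  # bits[p] is Hamming position p; index 0 unused
    for i, p in enumerate(DATA_POS):
        bits[p] = (data >> i) & 1
    bits[1] = bits[3] ^ bits[5] ^ bits[7]  # parity over positions 3, 5, 7
    bits[2] = bits[3] ^ bits[6] ^ bits[7]  # parity over positions 3, 6, 7
    bits[4] = bits[5] ^ bits[6] ^ bits[7]  # parity over positions 5, 6, 7
    return sum(bits[p] << (p - 1) for p in range(1, 8))


def syndrome(word: int) -> int:
    """XOR of the positions of all set bits: 0 for a valid codeword,
    otherwise the position of a single flipped bit."""
    s = 0
    for p in range(1, 8):
        if (word >> (p - 1)) & 1:
            s ^= p
    return s


def decode(word: int) -> int:
    """Extract the 4 data bits (no correction is applied here)."""
    return sum(((word >> (p - 1)) & 1) << i for i, p in enumerate(DATA_POS))


def scrub(memory: list[int]) -> int:
    """Visit every word once, correct any single-bit error in place,
    and return the number of words corrected."""
    corrected = 0
    for addr, word in enumerate(memory):
        s = syndrome(word)
        if s:
            memory[addr] = word ^ (1 << (s - 1))  # flip the bad bit back
            corrected += 1
    return corrected


memory = [encode(v) for v in range(16)]
memory[5] ^= 1 << 2  # simulated soft error at Hamming position 3 of word 5
memory[9] ^= 1 << 6  # simulated soft error at Hamming position 7 of word 9
print(scrub(memory))  # 2
print([decode(w) for w in memory] == list(range(16)))  # True
```

Because each pass repairs single-bit errors before a second error can strike the same word, periodic scrubbing keeps main memory within the code's correction capability.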
Several methods can be used to form the error correcting code and to correct soft errors. In general, the number of bits assigned to the error correcting code determines how many bit errors per word the system can detect and correct. Commonly available schemes include single bit error correction with double bit error detection (SEC-DED) and double bit error correction.
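To show how one extra check bit turns single bit correction into single bit error correction/double bit error detection, the following sketch assumes an extended Hamming (8,4) code: bits 1 to 7 hold a Hamming(7,4) codeword and bit 0 holds an overall parity bit. This is a textbook construction chosen for illustration; the function names are hypothetical, and production memories use wider codes. The decoder classifies each word as valid, corrected, or uncorrectable, assuming at most two bit errors per word.

```python
from typing import Optional, Tuple

# Extended Hamming (8,4) SEC-DED sketch. Bit p (p = 1..7) holds Hamming
# position p; bit 0 holds overall parity so the whole word has even parity.
DATA_POS = (3, 5, 6, 7)  # Hamming positions that carry data bits


def parity(x: int) -> int:
    """Return 1 if x has an odd number of set bits, else 0."""
    return bin(x).count("1") & 1


def encode(data: int) -> int:
    word = 0
    for i, p in enumerate(DATA_POS):
        word |= ((data >> i) & 1) << p
    for k in (1, 2, 4):
        # Check bit k covers every other position whose index has bit k set.
        cover = sum(1 << p for p in range(1, 8) if p & k and p != k)
        word |= parity(word & cover) << k
    word |= parity(word)  # bit 0: overall parity over bits 1..7
    return word


def decode(word: int) -> Tuple[str, Optional[int]]:
    """Classify a word, assuming at most two bit errors (three or more
    errors may be miscorrected)."""
    s = 0
    for p in range(1, 8):
        if (word >> p) & 1:
            s ^= p
    if parity(word):
        # Odd overall parity: a single bit flipped. s == 0 means the
        # overall parity bit (bit 0) itself flipped, so 1 << s targets it.
        word ^= 1 << s
        status = "corrected"
    elif s:
        # Even overall parity with a nonzero syndrome: two bits flipped.
        return ("uncorrectable", None)
    else:
        status = "ok"
    data = sum(((word >> p) & 1) << i for i, p in enumerate(DATA_POS))
    return (status, data)


cw = encode(0b1011)
print(decode(cw))                        # ('ok', 11)
print(decode(cw ^ (1 << 6)))             # ('corrected', 11)
print(decode(cw ^ (1 << 6) ^ (1 << 3)))  # ('uncorrectable', None)
```

A second error in the same word is detected but cannot be repaired, which is why errors must be corrected before they accumulate.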
While effective, these error correcting memory systems do not provide error correction for cache memory within multiprocessor shared memory systems. Typically, devices and subsystems within these systems, such as a central processing unit or an input/output device, have an associated cache for storing data while it is in use by that device or subsystem. As the system operates, data from the memory system is “checked out” to the cache. While the data is checked out, scrubbing the corresponding cells in main memory does not correct errors in the cached copy. If the data remains checked out for a long time, multiple soft errors can accumulate within the cached data until their number exceeds the correction capability of the self-correcting memory system.
What is needed is a method and apparatus for eliminating soft errors in data checked out to a cache.