1. Field of the Invention
The present invention relates to techniques for maintaining the validity of data stored in memory. More particularly, this invention relates to scrubbing techniques for correcting errors in data stored in memory with reference to redundant information stored in association with that data.
2. Description of the Prior Art
It is known that data stored in a memory device can develop errors. Contemporary memory devices, being constructed at ever smaller scales and higher densities, can be particularly vulnerable to such errors, for example due to variations in the physical structure of the semiconductors from which such devices are constructed, or due to an external influence such as temperature variation or incident ionizing radiation. Due to this vulnerability, it is known for error correcting codes (ECC) to be stored in association with the stored data values, to provide a level of redundancy which allows isolated errors (typically single bit errors) to be corrected.
A data processing system may be arranged to check its memory reads for ECC errors, but because of data locality a sequence of memory reads can be confined to a small range of addresses, leaving other memory locations untouched for a long time. The longer a memory location is left untouched, the more likely it becomes that more than one soft (i.e. individually correctable) error will accumulate at that location.
In view of the recognised problem that memory locations which are left untouched for a long time may develop errors which cannot be corrected with reference to the ECC data, it is known to perform memory scrubbing. Typically, this technique involves the memory controller periodically reading in memory lines, checking for and correcting any errors, and subsequently writing those memory lines back. Assuming that this memory scrubbing can be performed frequently enough that single bit flip errors do not accumulate into multiple bit errors, then this technique can maintain the validity of the data in the memory device.
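By way of illustration only, the read, check, correct and write-back cycle described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the method of any of the cited systems: it uses a Hamming(7,4) single-error-correcting code as a stand-in for the ECC (real memory devices generally use wider codes), models the memory as a Python list of 7-bit codewords, and the names `hamming74_encode`, `hamming74_syndrome` and `scrub_pass` are hypothetical.

```python
def hamming74_encode(nibble):
    """Encode 4 data bits into a 7-bit Hamming codeword.

    Bit (p - 1) of the result holds codeword position p (1..7).
    Positions 1, 2 and 4 are parity bits; 3, 5, 6 and 7 carry data.
    """
    code = [0] * 8  # index 0 unused so that indices match positions
    code[3] = (nibble >> 0) & 1
    code[5] = (nibble >> 1) & 1
    code[6] = (nibble >> 2) & 1
    code[7] = (nibble >> 3) & 1
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    return sum(code[p] << (p - 1) for p in range(1, 8))


def hamming74_syndrome(word):
    """Return 0 for a valid codeword, else the position of a single flipped bit."""
    syndrome = 0
    for position in range(1, 8):
        if (word >> (position - 1)) & 1:
            syndrome ^= position
    return syndrome


def scrub_pass(memory):
    """Scrub every location once and return the number of corrections made."""
    corrected = 0
    for address, word in enumerate(memory):          # read the line in
        syndrome = hamming74_syndrome(word)          # check it against the ECC
        if syndrome:
            # Correct the flagged bit and write the line back.
            memory[address] = word ^ (1 << (syndrome - 1))
            corrected += 1
    return corrected


# Illustrative use: two isolated single-bit flips are repaired by one pass.
golden = [hamming74_encode(n) for n in range(16)]
memory = list(golden)
memory[3] ^= 1 << 4
memory[9] ^= 1 << 0
repaired = scrub_pass(memory)
```

In this sketch, if two bits of the same codeword flip before the next pass, the syndrome points at a third bit and the pass writes back a valid but wrong codeword. This is the accumulation of single bit errors into multiple bit errors that sufficiently frequent scrubbing is intended to prevent.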
Automatic memory scrubbing is used in various fields where data reliability is important, such as embedded processors used in safety-critical systems, and in environments which are exposed to radiation levels which increase the frequency of bit flip errors, such as space/aerospace applications (see for example M. Rodriguez, N. Silva, J. Esteves, L. Henriques, D. Costa, N. Holsti, K. Hjortnaes, “Challenges in Calculating the WCET of a Complex On-board Satellite Application”, In Proc. 3rd International Workshop on Worst-Case Execution Time Analysis, (WCET 2003), 2003). The technique is also used in data servers to promote reliability, availability and serviceability (RAS).
In order not to disturb regular memory requests, and thereby impact performance, memory scrubbing is usually performed during idle periods of the data processing system, being performed periodically rather than continuously. This automated process is usually initiated and controlled by the memory controller. Examples of such automated memory scrubbing processes are discussed in:
“The Intel Itanium Processor 9300 Series”, White Paper, retrieved from http://download.intel.com/products/processor/itanium/323247.pdf;
D. Abts, J. Thompson and G. Schwoerer, “Architectural Support for Mitigating DRAM Soft Errors in Large-Scale Supercomputers”, SELSE 3, 2007;
J. Mitchell, D. Henderson and G. Ahrens, “IBM POWER5 Processor-based Servers: A Highly Available Design for Business-Critical Applications”, 2006; and
“Advanced Memory Protection for HP ProLiant 300 Series G4 Servers”, http://h20000.www2.hp.com/bc/docs/support/SupportManual/c00218059/c00218059.pdf.
Some servers such as the Sun Solaris servers perform memory scrubbing by running a kernel memory scrub thread (see “Soft Memory Errors and Their Effect on Sun Fire™ Systems”, Sun Microsystems, 2002). This thread scrubs the whole memory periodically and the scrub process is set up so that it will traverse all of the physical memory within 12 hours. Furthermore it reads 8 MB pages so as not to be obtrusive. The read operation is accomplished using block load hardware to maximize read bandwidth (see UltraSPARC Architecture 2007, http://opensparc-t2.sunsource.net/specs/UA2007-current-draft-HP-EXT.pdf).
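The pacing implied by such a scheme can be worked through with simple arithmetic. The following is a rough sketch only: the 12 hour traversal period and the 8 MB read size come from the description above, whereas the 64 GiB memory size, the reading of "8 MB" as 8 MiB, and the function name `scrub_schedule` are assumptions made purely for illustration.

```python
def scrub_schedule(memory_bytes, chunk_bytes, period_seconds):
    """Return (chunks, seconds_between_chunks, average_bytes_per_second)
    for a scrubber that must cover all of memory once per period."""
    chunks = -(-memory_bytes // chunk_bytes)  # ceiling division
    seconds_between_chunks = period_seconds / chunks
    average_bytes_per_second = memory_bytes / period_seconds
    return chunks, seconds_between_chunks, average_bytes_per_second


GIB = 1 << 30
MIB = 1 << 20

# Assumed example system: 64 GiB of physical memory (not from the cited paper).
chunks, interval, bandwidth = scrub_schedule(
    memory_bytes=64 * GIB,
    chunk_bytes=8 * MIB,
    period_seconds=12 * 60 * 60,
)
# chunks == 8192, interval == 5.2734375 seconds, bandwidth is about 1.5 MiB/s
```

Under these assumptions the scrubber needs to read only one chunk roughly every five seconds, which indicates why a scrub of this kind can remain unobtrusive relative to regular memory traffic.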
G. R. Brown, “Radiation Hardened PowerPC 603e™ Based Single Board Computer”, Aerospace Conference, 2001, IEEE Proceedings, 2001, discusses a radiation-hardened embedded processor designed for space and aerospace applications, which implements a hardware memory scrubber in the memory controller.
Another known memory scrubbing technique involves performing memory scrubbing in a cache using the physical location (i.e. set/way) rather than the addresses. To do this, a dummy access, similar to a cache maintenance operation, is made to the cache. This technique is discussed in US Published Patent Application 2009/0044086.
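The distinction between scrubbing by physical location and scrubbing by address can be illustrated with a short sketch. It does not reproduce the method of the cited application; the cache is modelled, by assumption, as a list of sets each holding a list of ways, an invalid entry is represented by `None`, and `scrub_line` is a hypothetical callable standing in for the dummy access that checks and corrects one line.

```python
def scrub_cache_by_location(cache, scrub_line):
    """Visit every valid cache entry by (set, way) index, never by address.

    Returns the list of (set_index, way) locations that were scrubbed.
    """
    visited = []
    for set_index, ways in enumerate(cache):
        for way, line in enumerate(ways):
            if line is None:
                continue  # nothing is stored at this location
            # The stored tag plays no part in choosing what to visit.
            ways[way] = scrub_line(line)
            visited.append((set_index, way))
    return visited


# Illustrative use: a 2-set, 2-way cache with two valid lines.
cache = [
    [{"tag": 0x1F, "data": 1}, None],
    [None, {"tag": 0x02, "data": 2}],
]
locations = scrub_cache_by_location(cache, lambda line: line)
```

Because the traversal is driven by the set and way indices, every valid entry is reached in a bounded number of steps regardless of which addresses happen to be cached.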
However, such automated memory scrubbing techniques must be carefully set up so that they do not adversely affect the system performance, whilst on the other hand they must be sufficiently active to ensure that the data reliability is maintained. In this context it would be desirable to provide an improved technique for memory scrubbing.