1. Field of the Invention
The present invention relates to set associative storage devices in data processing apparatuses, and in particular to mechanisms for correcting errors in the data values stored in such storage devices.
2. Description of the Prior Art
It is known that both soft and hard errors may manifest themselves in logic designs, and that RAMs are particularly susceptible to such errors. In an attempt to combat this, it is known to provide these components with error correction code (ECC), parity or other check mechanisms in order to allow the detection, and optionally the correction, of errors that may occur.
Typically such a check and/or correct operation is only performed when a data value is directly accessed within the system. In a processor-based system, this will typically be as a result of a load/store or instruction fetch operation. Note that the term “data value” is used here to refer to any stored value, whether data or an instruction. Thus, for example, when a processor accesses a given data value in a cache RAM, part of the access procedure in the cache RAM involves invoking a checking mechanism to verify (e.g. by reference to parity bits associated with the stored data value) whether an error has arisen in that stored data value. This checking mechanism may further take steps to automatically correct such an error (where this is possible), such that the correct data value is returned to the processor. Alternatively, the occurrence of the error may be reported to the processor, in particular where correction of the error was not possible. Invoking the error correction mechanism only when an access requires it may leave certain RAM locations (and their associated check bits) dormant and untested for a long period of time, thus increasing the chance that enough corruption accumulates for the error to be uncorrectable or, worse still, undetectable.
One known approach, developed in recognition of the problems that may result from some portions of a RAM remaining untested for an extended period of time, uses a direct memory access (DMA) engine to perform accesses to the RAM in periods when it is not in use by the processor. These accesses exercise the error correction mechanism and thus correct any errors that are discovered. An example of this kind of approach is disclosed in “BIOS and Kernel Developer's Guide for the AMD Athlon™ 64 and AMD Opteron™ Processors”, AMD, February 2006. Such “hardware scrub” approaches are expensive due to the need to provide a dedicated hardware engine, and lack flexibility since they operate according to a predefined algorithm, e.g. cycling through a set of memory addresses. Consequently, they are not particularly power-efficient.
In addition, it is known to arrange a cache RAM in a set associative fashion. Whilst this has advantages for the storage and retrieval of data values in the cache (these advantages being known in the art and not being further discussed here), it also has the consequence that a given data value may be stored in any one of as many locations as there are ways in the cache, e.g. in a 4-way set associative cache there are four locations where a given data value may be stored. Thus, when the cache is addressed in terms of the memory addresses that may be cached there, some physical locations may rarely be used, depending on the manner in which the way is selected for a given memory address. Consequently, for some physical locations in a set associative storage device (such as a cache), the error correction mechanism may be exercised less often than for others, increasing the likelihood of uncorrectable corruption of data values occurring.
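The relationship between a memory address and the physical locations that may hold it can be sketched as below. The geometry (32-byte lines, 128 sets, 4 ways) and the function names are assumptions chosen purely for illustration.

```python
LINE_BYTES = 32   # assumed cache line size
NUM_SETS = 128    # assumed number of sets
NUM_WAYS = 4      # 4-way set associative, as in the example above


def set_index(address: int) -> int:
    """Select the set from the address bits just above the line offset."""
    return (address // LINE_BYTES) % NUM_SETS


def candidate_locations(address: int) -> list[tuple[int, int]]:
    """Return every (set, way) location in which the address may be cached."""
    s = set_index(address)
    return [(s, way) for way in range(NUM_WAYS)]
```

For instance, `candidate_locations(0x1000)` returns the four locations in set 0. An access by address exercises only whichever of those ways currently holds the line, so the other three locations in the set are not checked by that access.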
The article “POWER2 Fixed-Point, Data Cache, and Storage Control Units” by IBM (available for download from the IBM website at the following URL: http://www-03.ibm.com/servers/eserver/pseries/hardware/whitepapers/power/fxu—3.html) discloses a software-controlled memory-scrubbing function. The software uses three registers to control the scrub function: a start address register, an end address register and a timer value register. Thus, this function also references the memory in terms of memory addresses, and hence will not operate effectively when used with a set-associative storage device.
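An address-driven scrub of the general kind described above may be sketched as follows. The three fields mirror the three registers named in the text; the field names, the inclusive end address, the `step` parameter and the interpretation of the timer value are all assumptions, and the timing itself is not modelled.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass
class ScrubRegisters:
    """Hypothetical view of the three scrub control registers."""
    start_address: int
    end_address: int   # assumed to be inclusive
    timer_value: int   # assumed to pace the scrub; unused in this sketch


def scrub_addresses(regs: ScrubRegisters, step: int) -> Iterator[int]:
    """Yield each memory address the scrub would visit, in ascending order."""
    address = regs.start_address
    while address <= regs.end_address:
        yield address
        address += step
```

Because the scrub is expressed purely in terms of memory addresses, it does not identify which way of a set associative device, if any, is exercised by each access.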
Accordingly, it would be desirable to provide an improved error detection/correction mechanism which allows a more flexible, power-efficient approach to how and when the mechanism for correcting errors in the data values stored in a set associative storage device is invoked.