The invention relates to computer memory systems. More particularly, the invention relates to error detection and/or correction among multilevel cache memories.
In a computer system, the interface between a processor and memory is critically important to the performance of the system. Because fast memory is very expensive, memory in the amount needed to support a processor is generally much slower than the processor. In order to bridge the gap between fast processor cycle times and slow memory access times, cache memory is utilized. A cache is a small amount of very fast memory that is used to store a copy of frequently accessed data and instructions from main memory. The microprocessor can operate out of this very fast memory and thereby reduce the number of wait states that must be interposed during memory accesses. When the processor requests data from memory and the data resides in the cache, then a cache read "hit" takes place, and the data from the memory access can be returned to the processor from the cache without incurring the latency penalty of accessing main memory. If the data is not in the cache, then a cache read "miss" takes place, and the memory request is forwarded to the main memory, as would normally be done if the cache did not exist. On a cache miss, the data that is retrieved from the main memory is provided to the processor and is also written into the cache due to the statistical likelihood that this data will be requested again by the processor in the near future.
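By way of illustration only, the read hit and read miss behavior described above can be sketched in a few lines of Python. The class name `TinyCache`, its four-line capacity and its first-in, first-out replacement policy are assumptions made for this sketch and are not taken from the description.

```python
class TinyCache:
    """Toy read cache in front of a slow 'main memory' modeled as a dict."""

    def __init__(self, main_memory, capacity=4):
        self.main_memory = main_memory
        self.capacity = capacity      # assumed size, for illustration only
        self.lines = {}               # address -> cached copy, in insertion order
        self.hits = 0
        self.misses = 0

    def read(self, address):
        if address in self.lines:
            # Read hit: the data is returned without touching main memory.
            self.hits += 1
            return self.lines[address]
        # Read miss: forward the request to main memory ...
        self.misses += 1
        value = self.main_memory[address]
        # ... and keep a copy, displacing the oldest line if the cache is full
        # (first-in, first-out replacement is an assumption of this sketch).
        if len(self.lines) >= self.capacity:
            oldest = next(iter(self.lines))
            del self.lines[oldest]
        self.lines[address] = value
        return value


main_memory = {addr: addr * 10 for addr in range(16)}
cache = TinyCache(main_memory)
values = [cache.read(a) for a in (3, 3, 5, 3)]
```

In this run the first accesses to addresses 3 and 5 miss, and the two repeated accesses to address 3 hit.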
The individual data elements stored in a cache memory are referred to as "lines." Each line of a cache is meant to correspond to one addressable unit of data in the main memory. A cache line thus comprises data and is associated with a main memory address in some way. Schemes for associating a main memory address with a line of cache data include direct mapping, full association and set association, all of which are well known in the art.
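As one hedged example of such an association, a direct mapped cache conventionally splits a main memory address into a tag, a line index and a byte offset. The function name `split_address`, the 32 byte line size and the 256 line cache are assumptions chosen for this sketch.

```python
def split_address(address, line_bytes=32, num_lines=256):
    """Split an address into (tag, index, offset) for a direct mapped cache.

    line_bytes and num_lines are illustrative values, not taken from the text.
    """
    offset = address % line_bytes        # byte position within the line
    line_number = address // line_bytes  # which memory line the byte belongs to
    index = line_number % num_lines      # the single cache slot the line may occupy
    tag = line_number // num_lines       # distinguishes lines that share a slot
    return tag, index, offset


first = split_address(0x12345)
second = split_address(0x12345 + 32 * 256)   # one full cache size further on
```

Here `first` and `second` share an index but carry different tags, so in a direct mapped cache the two addresses would compete for the same line.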
The presence of a cache should be transparent to the overall system, and various protocols are implemented to achieve such transparency, including write-through and write-back protocols. In a write-through action, data to be stored is written to a cache line and to the main memory at the same time. In a write-back action, data to be stored is written to the cache and only written to the main memory later when the line in the cache needs to be displaced for a more recent line of data or when another processor requires the cached line. Because lines may be written to a cache exclusively in a write-back protocol, precautions must be taken to manage the status of data in a write-back cache so as to preserve coherency between the cache and the main memory. The preservation of cache coherency is especially challenging when there are several bus masters that can access memory independently. In this case, well known techniques for maintaining cache coherency include snooping and snarfing.
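The difference between the two protocols can be sketched as follows. The class names, the dictionary used as main memory and the explicit `evict` method are hypothetical and serve only to show when main memory is updated.

```python
class WriteThroughCache:
    """Every store updates the cache line and main memory together."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}

    def write(self, address, value):
        self.lines[address] = value
        self.memory[address] = value     # main memory is never stale


class WriteBackCache:
    """Stores update only the cache; memory is updated when a line is displaced."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}
        self.dirty = set()               # addresses whose memory copy is stale

    def write(self, address, value):
        self.lines[address] = value
        self.dirty.add(address)

    def evict(self, address):
        value = self.lines.pop(address)
        if address in self.dirty:
            self.memory[address] = value  # deferred write to main memory
            self.dirty.discard(address)


through_memory = {0: "old"}
through = WriteThroughCache(through_memory)
through.write(0, "new")

back_memory = {0: "old"}
back = WriteBackCache(back_memory)
back.write(0, "new")
stale_before_evict = back_memory[0]
back.evict(0)
```

Until `evict` is called, the write-back cache holds the only current copy of the line, which is why the status of each line must be tracked to preserve coherency.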
A cache may be designed independently of the microprocessor, in which case the cache is placed on the local bus of the microprocessor and interfaced between the processor and the system bus during the design of the computer system. However, as the density of transistors on a processor chip has increased, processors may be designed with one or more internal caches in order to further decrease memory access times. An internal cache is generally small, an exemplary size being 8 KB (8192 bytes). In computer systems that utilize processors with one or more internal caches, an external cache is often added to the system to further improve memory access time. The external cache is generally much larger than the internal cache(s), and, when used in conjunction with the internal cache(s), provides a greater overall hit rate than the internal cache(s) would provide alone.
In systems that incorporate multiple levels of caches, when the processor requests data from memory, the internal or first level cache is first checked to see if a copy of the data resides there. If so, then a first level cache hit occurs, and the first level cache provides the appropriate data to the processor. If a first level cache miss occurs, then the second level cache is checked. If a second level cache hit occurs, then the data is provided from the second level cache to the processor. If a second level cache miss occurs, then the data is retrieved from main memory (or higher levels of caches, if present). Write operations are similar, with mixing and matching of the operations discussed above being possible.
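The lookup order described above can be sketched as follows. The function name `read`, the use of dictionaries for each level, and the choice to copy data into the nearer levels on the way back are assumptions of this sketch.

```python
def read(address, first_level, second_level, main_memory):
    """Return (value, source) for a read that checks each level in order."""
    if address in first_level:
        return first_level[address], "first level hit"
    if address in second_level:
        value = second_level[address]
        first_level[address] = value     # assumed: copy into the first level
        return value, "second level hit"
    value = main_memory[address]
    second_level[address] = value        # assumed: a miss fills both levels
    first_level[address] = value
    return value, "main memory"


main_memory = {0x40: "payload"}
first_level, second_level = {}, {}
trace = [read(0x40, first_level, second_level, main_memory)[1] for _ in range(2)]
```

The first read falls through to main memory; the second is satisfied by the first level cache.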
A common transaction in a multilevel cache system is a fill operation. In a fill operation, a line of a higher level cache is copied into a lower level cache. Before writing the copied line into the lower level, it is prudent to take measures to ensure that the line is valid (i.e., free of errors). Errors can be introduced into a cache memory array (or any memory) when alpha particles, cosmic rays or some other electrical disturbance causes one or more bits to change logical state. Although data corruption is very rare, its consequences are significant: almost always a forced shutdown of the processor. To guard against this possibility, cache lines can be encoded using an error correction code (ECC). ECC encoding utilizes additional bits to represent the line as a codeword containing a small amount of controlled redundancy, so as to enable detection and correction of the most common errors (e.g., single bit errors or double bit errors). As the amount of redundancy is increased, the error detection and correction capability of the ECC encoding is increased. During a fill operation, an error detection and correction algorithm is performed on the basis of the ECC encoding before the line is copied to the lower level cache. Unfortunately, the time required for execution of the error detection and correction algorithm significantly slows the transfer of the line. In particular, the error detection and correction algorithm may require one or more computer cycles. Only after those cycles can the lower level cache begin to process the transferred data. Such processing typically includes buffering of the line and its tag (address) before the line is written to the lower level cache.
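To make the notion of controlled redundancy concrete, the following sketch uses a Hamming(7,4) code, in which 4 data bits are protected by 3 redundant bits and any single bit error can be corrected. This particular code is chosen only as a familiar example; the description does not state which ECC is used for cache lines, and the function names are hypothetical.

```python
def encode(data_bits):
    """Encode 4 data bits as a 7 bit Hamming codeword [p1, p2, d1, p4, d2, d3, d4]."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4   # parity over codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # parity over codeword positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4   # parity over codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def check_and_correct(codeword):
    """Return (data_bits, error_detected), correcting a single flipped bit."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s4   # 1-based position of the bad bit, 0 if none
    if position:
        c[position - 1] ^= 1          # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]], position != 0


data = [1, 0, 1, 1]
codeword = encode(data)
corrupted = list(codeword)
corrupted[4] ^= 1                     # simulate a single bit upset
recovered, error_seen = check_and_correct(corrupted)
```

The syndrome computation in `check_and_correct` is the kind of work that, in hardware, adds latency to a fill operation.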
This latency problem is better understood by considering FIG. 1, which shows a block diagram of known circuitry 100 for a filling operation from an L1 cache 105 to an L0 cache 110. The L1 cache 105 and the L0 cache 110 are solid state memories, which may be physically packaged together on the same integrated circuit or separately on distinct integrated circuits (and perhaps combined with other circuitry not shown). The L1 cache 105 outputs an M+L bit data codeword CODEWORD, an N bit address word TAG for the address of the data codeword CODEWORD, and a control line FILL VALID. The M+L bit data codeword CODEWORD contains M data bits and L redundant bits for ECC. The control line FILL VALID is asserted (i.e., one, set, hot or high) when the values on the address line TAG and/or the data codeword CODEWORD are valid and ready to transfer. The data codeword CODEWORD is input from the L1 cache 105 to an error detection and correction circuit 115, which outputs a control signal /ERROR, which is high when an error is not detected. The error detection and correction circuit 115 also outputs a possibly corrected data word DATA′, which is the same as the M raw data bits in the codeword CODEWORD, if no errors are detected (assuming that the ECC is systematic). If a correctable error is detected, then the data word DATA′ is the corrected data pattern. The control signal /ERROR is input to an AND gate 125, to which the control signal FILL VALID is also input. The output of the AND gate 125 is a control signal WRITE ENABLE, which is high when the L1 cache 105 is ready to proceed with the fill operation and when no errors have been detected in the data codeword CODEWORD.
The control signals /ERROR and WRITE ENABLE are not formed until the error detection and correction circuit 115 has completed processing of the data codeword CODEWORD. As mentioned above, this processing may require one or more computer cycles. The L0 cache 110 cannot write the data word DATA′ until after the control signal WRITE ENABLE is formed. Thus, a delay elapses before writing of the transferred cache line.
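The gating of the known circuitry can be modeled, purely for illustration, as a Boolean function together with a simple cycle count. The function names and the cycle arithmetic are assumptions of this sketch; FIG. 1 itself specifies only the signals and the AND gate 125.

```python
def write_enable(fill_valid, error_detected):
    """Model of AND gate 125: WRITE ENABLE = FILL VALID AND /ERROR."""
    not_error = not error_detected   # /ERROR is high when no error is detected
    return fill_valid and not_error


def first_write_cycle(fill_cycle, ecc_cycles):
    """Earliest cycle at which the line can be written in the known circuitry.

    WRITE ENABLE cannot be formed until the error check finishes, so the
    write is delayed by the full duration of that check.
    """
    return fill_cycle + ecc_cycles


truth_table = {
    (fill_valid, error): write_enable(fill_valid, error)
    for fill_valid in (False, True)
    for error in (False, True)
}
```

Only the combination of an asserted FILL VALID and no detected error enables the write, and that combination is known only after the error check completes.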
In one respect, the invention is a method for masking the latency of error detection and/or error correction applied to data units transferred between a first memory and a second memory. The data units may be codewords having redundant bits that provide error detection and/or correction capability. The method comprises the following steps: determining whether there is an error in a data unit in the first memory; transferring data based on the data unit from the first memory to a second memory, wherein the transferring step commences before completion of the determining step; and disabling at least part of the second memory if the determining step detects an error in the data unit. Optionally, the method corrects the error in the data unit, if the error is correctable. Preferably, the first memory and the second memory are cache memories, and the data unit is a cache line. The disabling step may be accomplished, for example, by forcing all accesses to the second memory to return misses or by stalling the second memory.
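The ordering of the steps of this method can be sketched in software as follows. This is a minimal model under stated assumptions, not the claimed circuitry: a single even parity bit stands in for the error protection, the class `LowerCache` and the function `speculative_fill` are hypothetical names, and sequential code can only imitate the overlap between the transfer and the error check.

```python
def has_parity_error(codeword_bits):
    """Even parity stand-in for error detection: a valid codeword has an even
    number of 1 bits (an assumption of this sketch, not the actual ECC)."""
    return sum(codeword_bits) % 2 == 1


class LowerCache:
    """Second memory that can be disabled by forcing every access to miss."""

    def __init__(self):
        self.lines = {}
        self.disabled = False

    def fill(self, tag, data_bits):
        self.lines[tag] = data_bits

    def lookup(self, tag):
        if self.disabled:
            return None               # forced miss while disabled
        return self.lines.get(tag)


def speculative_fill(lower, tag, codeword_bits):
    data_bits = codeword_bits[:-1]    # last bit is the assumed parity bit
    lower.fill(tag, data_bits)        # transfer commences before the check is done
    if has_parity_error(codeword_bits):
        lower.disabled = True         # check completed late and found an error


good_cache = LowerCache()
speculative_fill(good_cache, tag=0x1A, codeword_bits=[1, 0, 1, 0])

bad_cache = LowerCache()
speculative_fill(bad_cache, tag=0x1A, codeword_bits=[1, 1, 1, 0])
```

In the error-free case the line is available without waiting for the check; in the error case the possibly corrupt line has been written, but the disabled second memory returns misses so the line is never consumed.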
In another respect, the invention is an apparatus. The apparatus comprises a first memory, a second memory and error detection circuitry. The first memory stores error protection encoded codewords. The error protection may be error detection capability, error correction capability or both. Data corresponding to a codeword is transferred from the first memory to the second memory regardless of whether the data contains an error. The error detection circuitry disables at least part of the second memory if the data transferred to the second memory contains a detectable error. Preferably, the first memory and the second memory are cache memories, and the second cache memory resides between the first cache memory and a microprocessor core.
In yet another respect, the invention is an apparatus for avoiding the latency of error detection and/or correction applied to a unit of data in a first memory. The apparatus comprises a disable logic and a second memory. The disable logic is connected to the first memory and provides a disable signal in response to detection of an error in the unit of data. Receipt of the disable signal by the second memory causes a stall of the second memory. Preferably, the first memory is a higher level cache, and the second memory is a lower level cache. Optionally, the second cache memory comprises a tag buffer and a data buffer, and the disable signal is connected to the tag buffer.