Computer systems, from handheld electronic devices to medium-sized mobile and desktop systems to large servers and workstations, are becoming increasingly pervasive in our society. Each computer system includes one or more processors. A processor manipulates and controls the flow of data in a computer. Decreasing the size of the processor will tend to decrease its manufacturing cost. Improving processor reliability tends to improve the overall quality of the computer. Processor designers employ many different techniques to decrease processor size and to improve reliability to create less expensive and more robust computers for consumers.
One reliability problem arises from occurrences known as soft errors. A soft error is a situation in which a bit is set to a particular value in the processor and spontaneously changes to the opposite value (e.g. from a logical "1" to a logical "0", or vice-versa), thereby corrupting the associated data. A soft error may be caused by cosmic rays passing through a storage element within the processor, charging or discharging the storage element, thereby causing a stored bit to change its value.
As processor supply voltages continue to be reduced in an effort to lower processor power consumption, the difference in voltage values that define the 1's and 0's of bits is reduced as well. This makes processors more susceptible to soft errors. In addition, as storage elements become more densely packed within processors, the likelihood of a soft error increases.
One way to combat soft errors is through the use of error correction or error detection bits. The most common type of error correction and error detection bits are parity bits. One type of parity bit works by indicating whether the number of bits with a logical value of 1 is odd or even in a given data value. For example, for a 64 bit value containing 22 bits with a value of 0 and 42 bits with a value of 1, the parity bit is set to 0 (assuming even parity), indicating that the associated 64 bit data structure contains an even number of bits with a value of 1. If the processor transfers or stores this 64 bit value with parity checking, 65 bits are actually transferred or stored, the original 64 bit value plus the parity bit.
If a soft error occurs to any one bit of the 64 bits during transfer or storage, it can be detected by the processor. For example, if a bit with a value of 0 flips to a 1 by soft error, the 64 bit value would then contain 21 bits with a value of 0 and 43 bits with a value of 1. By performing a parity check on the 64 bit value, the soft error is detected because the odd number of bits with a value of 1 no longer agrees with the previously set parity bit corresponding to an even number of bits having a value of 1. Upon detecting the soft error, the processor may regenerate the original 64 bit value to preserve the integrity of the data.
One area where soft errors are of particular concern is in the cache. A cache is storage space located close to the processor, or in some cases within the processor, for fast data access by the processor. The processor finds data inside the cache by matching a portion of a desired data address, called a lookup tag, to the same portion of a data address stored in the cache, called a cache tag.
If a cache tag is found that matches the lookup tag, this is referred to as a hit, and the associated data is assumed to be the data desired by the processor. In the presence of soft errors, however, it is possible for a lookup tag to match two cache tags. For example, a lookup tag may match a first cache tag associated with the desired data, and a second cache tag that has been corrupted by a soft error to look identical to the first tag.
To handle the occurrence of multiple hits, processor designers have designed parity check circuitry into their caches. Parity bits are generated and stored with their associated tags in the cache. In the event two cache tags match a single lookup tag, the two cache tags are parity checked against their associated parity bits to determine which of the two tags is corrupted and which is the true match. Unfortunately, the circuits necessary to accommodate multiple hits and to perform parity checking in this manner are large and complex, increasing processor size and power consumption.