1. Field of the Invention
The present invention is directed to data processing systems. More specifically, the present invention is directed to a method, apparatus, and computer program product for testing a data processing system's ability to recover from cache directory errors.
2. Description of the Related Art
Cache memories are relatively small buffer memories used in computer systems to provide temporary storage for data retrieved from larger, slower memory devices, such as the main memory of the computer system or external storage devices such as hard disk drives. Cache memories are located near the processor cores that they serve and store copies of data that a processor core has recently used and/or is likely to use in the near future. Using cache memory can often reduce the data access time that would otherwise be required to access main memory. Properly managed, cache memories can significantly improve computer system performance.
A central processing unit in a computer system needs less time to obtain data from a cache memory than it does to obtain the same data by accessing the main memory or an external storage device. If a reasonable percentage of the data needed by a central processing unit is maintained in cache memory, the amount of processor time wasted waiting for data to be retrieved from main memory is significantly reduced, improving the computer system performance.
When a processor core needs particular information, it first checks its cache memory to determine whether that information is currently stored there. The information retrieved from the cache may be data, instructions, or a combination of data and instructions. Thus, the cache may be a data cache, an instruction cache, or a cache that holds both instructions and data.
If the requested information is found in the cache, the access is called a “hit”. If the requested information is not found in the cache, the access is called a “miss”. A miss necessitates a “fetch” operation to retrieve the information from other storage, such as the main memory of the computer system or an external storage device.
A processor chip may include multiple processor cores within that chip. Each processor core may include at least one cache memory that is associated with the processor core. A cache is often included in the processor core itself for use by that processor core. In addition, other caches may be included within a chip for use by the processor cores included in that chip.
There are several different kinds of caches including direct mapped caches, set associative caches, and content addressable memory or CAM type caches.
The simplest type of cache is a direct mapped cache. In a direct mapped cache, a subset of the address bits is used to address both the directory portion and the data portion of the cache. Generally, the low-order bits of the address are used to address the cache and directory arrays. The remaining address bits, which are not used to address the cache, are stored in the directory. In addition, a valid bit is associated with each entry in the cache. A hit in a direct mapped cache occurs when the address bits stored in the directory match the corresponding bits of the address being fetched and the entry is marked valid by its associated valid bit. As an example, consider a four-entry cache and four address bits. Bits 2:3 of the address (the low-order bits, with bit 0 as the most significant bit) would be used to address the directory, while bits 0:1 of the address would be stored in the directory along with a valid bit.
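The four-entry example above can be sketched as a small software model. This is an illustrative sketch only, assuming big-endian bit numbering (bit 0 is the most significant of the four address bits); the names DirectMappedDirectory, split_address, load, and is_hit are hypothetical and do not describe any particular hardware design.

```python
# Illustrative model of a four-entry direct mapped cache directory.
# Assumption: 4-bit addresses with bit 0 as the most significant bit, so
# bits 0:1 are the tag (stored in the directory) and bits 2:3 are the index.

NUM_ENTRIES = 4


def split_address(addr: int) -> tuple[int, int]:
    """Return (tag, index) for a 4-bit address."""
    tag = (addr >> 2) & 0b11  # bits 0:1, the high-order bits stored in the directory
    index = addr & 0b11       # bits 2:3, the low-order bits that select the entry
    return tag, index


class DirectMappedDirectory:
    def __init__(self) -> None:
        # Each directory entry holds a stored tag and a valid bit.
        self.tags = [0] * NUM_ENTRIES
        self.valid = [False] * NUM_ENTRIES

    def load(self, addr: int) -> None:
        """Record addr in the single entry its index bits select."""
        tag, index = split_address(addr)
        self.tags[index] = tag
        self.valid[index] = True

    def is_hit(self, addr: int) -> bool:
        """Hit only if the selected entry is valid and its stored tag matches."""
        tag, index = split_address(addr)
        return self.valid[index] and self.tags[index] == tag
```

For instance, after `load(0b1001)` the directory reports a hit for `0b1001`, a miss for `0b0101` (same index, different tag), and a miss for `0b1010` (its entry was never marked valid).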
A set associative cache is, in effect, several direct mapped caches side by side. In a direct mapped cache, each address can exist in only one place in the cache. In a set associative cache, the data can exist in any of several places, or “ways”. When data is loaded into the cache, the hardware must determine in which way to store it. Typically this is done with a least recently used (LRU) scheme, although other schemes are sometimes used. The way whose directory entry matches the fetched address determines which way's data is multiplexed out. If none of the ways' addresses match, a miss occurs.
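The lookup and LRU replacement just described can be illustrated with a minimal sketch. It assumes the set index is simply the address modulo the number of sets and the tag is the quotient; the class and method names are hypothetical, and real hardware compares all ways in parallel rather than in a loop.

```python
from typing import Optional


class SetAssociativeDirectory:
    """Hypothetical N-way set associative directory with LRU replacement."""

    def __init__(self, num_sets: int, num_ways: int) -> None:
        self.num_sets = num_sets
        self.num_ways = num_ways
        # tags[s][w] is the tag stored in way w of set s; None marks an invalid way.
        self.tags: list[list[Optional[int]]] = [[None] * num_ways for _ in range(num_sets)]
        # lru[s] lists the way numbers of set s from least to most recently used.
        self.lru = [list(range(num_ways)) for _ in range(num_sets)]

    def _split(self, addr: int) -> tuple[int, int]:
        # Assumption: index = addr mod num_sets, tag = remaining high-order part.
        return addr // self.num_sets, addr % self.num_sets

    def _touch(self, s: int, way: int) -> None:
        # Mark this way as most recently used.
        self.lru[s].remove(way)
        self.lru[s].append(way)

    def lookup(self, addr: int) -> Optional[int]:
        """Return the matching way (which selects the data to mux out), or None on a miss."""
        tag, s = self._split(addr)
        for way in range(self.num_ways):
            if self.tags[s][way] == tag:
                self._touch(s, way)
                return way
        return None

    def load(self, addr: int) -> int:
        """Store addr in an invalid way if one exists, otherwise evict the LRU way."""
        tag, s = self._split(addr)
        invalid = [w for w in range(self.num_ways) if self.tags[s][w] is None]
        way = invalid[0] if invalid else self.lru[s][0]
        self.tags[s][way] = tag
        self._touch(s, way)
        return way
```

With two sets and two ways, loading addresses 0 and 2 fills both ways of set 0; after a lookup of address 0, loading address 4 evicts address 2 because it is now the least recently used entry in that set.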
A CAM, as used here, is a set associative instruction cache (Icache) in which none of the address bits are used to address the array and all of the bits are used for the compare, which makes it effectively fully associative. For example, in a four-entry CAM, all four entries would hold addresses that are compared against all of the address bits.
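A CAM-style directory can be sketched by storing the full address in every entry and comparing against all entries. This is a software approximation: hardware performs the compares in parallel, and the round-robin replacement used here is an assumption chosen only to keep the example short. The names CamDirectory, lookup, and load are hypothetical.

```python
from typing import Optional


class CamDirectory:
    """Hypothetical CAM-style (fully associative) directory.

    No address bits index the array; every valid entry stores the full address
    and is compared against all bits of the requested address.
    """

    def __init__(self, num_entries: int) -> None:
        # Full stored address per entry; None marks an invalid entry.
        self.entries: list[Optional[int]] = [None] * num_entries
        # Assumption: simple round-robin replacement instead of LRU.
        self.next_victim = 0

    def lookup(self, addr: int) -> Optional[int]:
        """Compare addr with every entry; return the matching entry or None on a miss."""
        matches = [i for i, stored in enumerate(self.entries) if stored == addr]
        return matches[0] if matches else None

    def load(self, addr: int) -> int:
        """Place addr in the next round-robin victim entry and return its position."""
        slot = self.next_victim
        self.entries[slot] = addr
        self.next_victim = (self.next_victim + 1) % len(self.entries)
        return slot
```

Because any address can occupy any entry, a four-entry CAM can hold four addresses that would all collide on the same index in a four-entry direct mapped cache.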
Some caches use parity. In these caches, each entry in the cache and in the cache directory has an associated parity. The parity is typically one bit, although multiple bits may be used. Parity is used for error checking: it indicates whether the associated entry contains an even or odd number of logical ones. If, for a particular entry, the associated parity bit indicates that the parity for the entry is odd while the system uses even parity, it is determined that an error has occurred, and the parity for the entry is said to be “bad”. Conversely, if the parity bit indicates that the parity for the entry is even, it is determined that no error has occurred, and the parity for the entry is said to be “good”.
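The even-parity check described above reduces to counting ones. The following minimal sketch assumes a single parity bit per entry and non-negative integer entry values; the function names are hypothetical.

```python
def even_parity_bit(value: int) -> int:
    """Parity bit that makes the total number of ones (value plus parity bit) even."""
    return bin(value).count("1") % 2


def parity_is_good(value: int, stored_parity: int) -> bool:
    """Under even parity, the entry is good if value plus parity has an even count of ones."""
    return (bin(value).count("1") + stored_parity) % 2 == 0
```

For example, `0b1011` has three ones, so its even-parity bit is 1. If any single bit of the stored value later flips (say to `0b1010`), the total count of ones becomes odd and `parity_is_good(0b1010, 1)` returns `False`, flagging the entry as bad.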
When the parity associated with a cache directory entry is bad, an error has occurred and the cache directory entry is not valid. In this case, any information retrieved from the cache that corresponds to that directory entry should not be forwarded or processed. When such an error occurs, the system can either perform an error recovery process or execute a machine stop or machine check.
If the error recovery process is not executed properly, the machine could process the invalid information, producing inaccurate results or a system malfunction such as a system crash. It is therefore important that the error recovery process be executed properly.
An error recovery process must also be able to recover when the information in the cache is incorrect. This incorrect information should not be processed by the system.
The error recovery process may be tested by injecting an error into the cache directory and then verifying that the system's error recovery mechanism recovers properly from the injected defect. However, it can be difficult to determine whether the error recovery mechanism actually recovered properly.
For example, an error may be injected into the cache directory by changing a bit in a directory entry. This can produce a cache directory hit in which the address being fetched matches a directory entry only because that entry was corrupted. In this case, the hit occurs only because the defect changed the stored address from address A to address B, while the cache actually holds the data for address A.
However, the parity bit associated with this directory entry would now indicate bad parity. If address A were requested, a cache miss would occur and no additional recovery action would be required. If address B were fetched, there would be a hit in the cache, but the directory entry's parity would indicate bad parity. In this case, the error detection logic would be required to properly detect and recover from the error.
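The two cases above can be reproduced in a small software model of an error injection test. This sketch assumes the four-entry direct mapped layout used earlier (bits 0:1 tag, bits 2:3 index), even parity over the stored tag, and a recovery action of invalidating the corrupted entry and refetching from memory; that recovery action, the MEMORY dictionary, and all class and function names are hypothetical illustrations, not the mechanism of any particular system.

```python
from typing import Optional

# Hypothetical backing store: each 4-bit address maps to a recognizable value.
MEMORY = {addr: f"mem[{addr:04b}]" for addr in range(16)}


def parity(value: int) -> int:
    """Even-parity bit over the stored tag."""
    return bin(value).count("1") % 2


class Entry:
    def __init__(self, tag: int, data: str) -> None:
        self.tag = tag
        self.parity = parity(tag)  # parity computed when the entry is written
        self.data = data


class ParityCheckedCache:
    """Four-entry direct mapped cache whose directory tags carry parity."""

    def __init__(self) -> None:
        self.entries: list[Optional[Entry]] = [None] * 4
        self.recoveries = 0

    def load(self, addr: int) -> None:
        tag, index = addr >> 2, addr & 0b11
        self.entries[index] = Entry(tag, MEMORY[addr])

    def inject_tag_error(self, index: int, bit: int) -> None:
        # Injected defect: flip one stored tag bit without updating its parity.
        entry = self.entries[index]
        assert entry is not None
        entry.tag ^= 1 << bit

    def fetch(self, addr: int) -> tuple[str, str]:
        tag, index = addr >> 2, addr & 0b11
        entry = self.entries[index]
        if entry is None or entry.tag != tag:
            self.load(addr)  # ordinary miss: fetch from memory, no recovery needed
            return MEMORY[addr], "miss"
        if parity(entry.tag) != entry.parity:
            # Hit on a corrupted entry: the cached data must not be forwarded.
            self.recoveries += 1
            self.load(addr)  # assumed recovery: invalidate and refetch
            return MEMORY[addr], "recovered"
        return entry.data, "hit"


def run_injection_test(addr_a: int, flip_bit: int, fetch_addr: int) -> tuple[str, str]:
    """Cache address A, corrupt its directory tag, then fetch fetch_addr."""
    cache = ParityCheckedCache()
    cache.load(addr_a)
    cache.inject_tag_error(addr_a & 0b11, flip_bit)
    return cache.fetch(fetch_addr)
```

With A = `0b1001`, flipping the low tag bit turns the stored tag into that of B = `0b1101`. Fetching A then misses and returns memory data, while fetching B hits the corrupted entry. A correct recovery path returns `("mem[1101]", "recovered")` rather than the stale data cached for A, and a test can check exactly that distinction.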
Therefore, a need exists for a method, apparatus, and computer program product for testing a data processing system's ability to recover from cache directory errors.