1. Field of the Invention
The present invention relates generally to cache memory systems, and in particular, to mechanisms for accommodating hardware failures which disable a subset of the cache storage elements.
2. Description of the Related Art
High performance processors have used cache memory systems as an integral component of overall system design for many years. A cache memory typically has a much faster access time than main storage. For example, cache may make use of a relatively small number (N) of high-speed data storage elements, located in close proximity to an associated processor, while main storage typically uses larger numbers of storage elements and is located at some distance from the processor. Cache memory systems have been designed to overcome the access speed limitation of main storage by providing rapid access to a relatively small set of data which is likely to be used during a relatively short time interval.
If a datum with a given address can be stored in any location within the cache (a fully associative cache), access time is relatively slow, because of the large number of address comparisons required to find the datum. To provide rapid access to data in the cache, most caches limit the number of locations within the cache into which a datum with a given virtual address may be stored. The set-associativity of the cache determines the number of locations in the cache into which a datum with a given address (in main memory) can be placed. The set-associativity of the cache is an important design parameter, influencing the access speed and the cache hit ratio.
Set associative caches are commonly used. In a set associative cache, a datum with a given address may be stored in one of a limited group of locations in the cache, known as a congruence class. The directory for such a cache will include a row of addresses for each congruence class. To access data in the cache, the addresses in the row are compared to the desired address in parallel, to determine whether the desired datum is stored in the cache in any of the locations within the congruence class. Logic is required to determine which, if any, of the addresses match the desired address. Because a given datum can be stored in more than one location, a replacement strategy may be implemented to retain data in the cache which is likely to be accessed, improving the cache hit ratio.
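The lookup described above can be illustrated with a minimal software sketch. It is not taken from any particular hardware design: the names `Way`, `SetAssociativeCache`, `lookup` and `fill` are hypothetical, addresses are assumed to be line-granular integers, the parallel hardware compare is modeled as a sequential loop, and the replacement strategy is a placeholder that simply overwrites the first location when the congruence class is full.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Way:
    """One location (set) within a congruence class. Hypothetical name."""
    valid: bool = False
    tag: int = 0
    data: Optional[str] = None


class SetAssociativeCache:
    def __init__(self, num_classes: int, ways: int):
        self.num_classes = num_classes
        # One directory row per congruence class, `ways` locations per row.
        self.rows = [[Way() for _ in range(ways)] for _ in range(num_classes)]

    def split(self, address: int):
        # Assumption: low-order part selects the class, the rest is the tag.
        return address // self.num_classes, address % self.num_classes

    def lookup(self, address: int) -> Optional[str]:
        tag, index = self.split(address)
        # Hardware compares all tags in the row in parallel; a loop models it.
        for way in self.rows[index]:
            if way.valid and way.tag == tag:
                return way.data  # cache hit
        return None  # cache miss

    def fill(self, address: int, data: str) -> None:
        tag, index = self.split(address)
        row = self.rows[index]
        # Placeholder replacement strategy: use an invalid location if one
        # exists, otherwise overwrite the first location in the row.
        victim = next((w for w in row if not w.valid), row[0])
        victim.valid, victim.tag, victim.data = True, tag, data
```

With two locations per class, two addresses that map to the same congruence class can reside in the cache together; a third forces a replacement. A real design would typically substitute a policy such as least-recently-used for the placeholder.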
When the set associativity is 1, the cache is said to be direct mapped. That is, the congruence class only includes one location for a datum with a given address. FIG. 1 shows a conventional direct mapped cache unit 100 and a conventional apparatus for addressing data in it. Cache unit 100 includes a cache directory 110 and a cache memory 112. The cache memory includes a plurality of rows, 108a-n, each row including a line of data. Each row 108a-n forms a congruence class with a single set or location within the row into which a given datum may be stored. The cache directory includes entries 180a-n which are associated with respective rows 108a-n. Entries 180a-n include address tags 105a-n which store the high order bits of the respective addresses (in main memory) of the requested data stored in respective rows 108a-n. Valid bits 103a-n indicate whether the data in the associated cache memory storage elements 108a-n are valid. There is also a one-for-one correspondence between the low order bits 116 of the requested address and the entries 180a-n in directory 110, so that it is unnecessary to store the low order bits of the address in entries 180a-n.
Because cache 100 is direct mapped, identification of a row in cache memory 108a-n is sufficient to determine where a given datum is stored. Because cache 100 uses real placement, the translated ADHIGH bits 114 of the translatable portion of the address are stored in the address tag 105a-n.
When a datum 120 is requested by the processor 140, the low order bits of the address in ADLOW 116 are used to select a row in directory 110 to check. The directory entry 180a-n contains an address tag which comprises the high order bits of the address in main memory at which the data in the associated cache memory line 108a-n are stored. The address tag 105a-n is compared to ADHIGH 114 at the same time that valid bit 103a-n is checked. If the addresses match and the data is valid, then there is a cache hit and the associated cache memory entry 108a-n is provided to the processor 140.
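The direct mapped lookup just described may be sketched in software as follows. This is an illustrative model only, not the apparatus of FIG. 1: the class `DirectMappedCache` and its methods are hypothetical, the width of ADLOW (`index_bits`) is an assumed parameter, and the simultaneous tag compare and valid-bit check are modeled as a single condition.

```python
class DirectMappedCache:
    """Hypothetical model: one directory entry and one line per row."""

    def __init__(self, index_bits: int):
        self.index_bits = index_bits
        rows = 1 << index_bits
        self.valid = [False] * rows   # models valid bits 103a-n
        self.tags = [0] * rows        # models address tags 105a-n
        self.lines = [None] * rows    # models cache memory rows 108a-n

    def split(self, address: int):
        # ADLOW selects the row; ADHIGH is what the tag is compared against.
        adlow = address & ((1 << self.index_bits) - 1)
        adhigh = address >> self.index_bits
        return adhigh, adlow

    def lookup(self, address: int):
        adhigh, adlow = self.split(address)
        # Only one tag compare is needed, together with the valid check.
        if self.valid[adlow] and self.tags[adlow] == adhigh:
            return self.lines[adlow]  # cache hit
        return None                   # cache miss

    def fill(self, address: int, line) -> None:
        adhigh, adlow = self.split(address)
        # The row is fixed by ADLOW, so any previous occupant is displaced.
        self.valid[adlow] = True
        self.tags[adlow] = adhigh
        self.lines[adlow] = line
```

Because the low order bits alone determine the row, the directory stores only the high order bits, and two addresses sharing the same ADLOW value displace one another.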
The direct mapped structure has a very low access latency, because for any desired datum, only one address in the cache directory must be compared to the desired address to determine whether the datum is in the cache. The direct mapped cache may also be less expensive, because the logic used in a set-associative cache to perform multiple compares within each congruence class is not needed. For some applications, the faster cache access speed of a direct mapped cache outweighs the slightly higher (relative to set-associative mapping) cache miss rate.
Nonetheless, direct mapped caches have not been used as widely as set-associative caches. One reason is the inability of a direct mapped cache to accommodate hardware failures. If a location in the cache is subject to a hard failure (a failure of a storage element, the electrical path connecting the element, or the logic that is used to access the element), then data which map to the failed location in the cache cannot be stored in the cache at all. Any reference to data which map to the failed location results in a cache miss; the data must be fetched from main memory, which has a much longer access time than cache.
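The effect of a hard failure on a direct mapped cache can be seen in a small simulation. The function `count_hits` and its parameters are hypothetical and serve only to illustrate the point above: a reference that maps to a failed row is treated as a miss every time, because the datum can never be stored there.

```python
def count_hits(addresses, index_bits, failed_rows=frozenset()):
    """Count hits in a direct mapped cache, some rows of which have failed."""
    mask = (1 << index_bits) - 1
    stored_tag = {}  # row number -> tag currently held in that row
    hits = 0
    for address in addresses:
        row, tag = address & mask, address >> index_bits
        if row in failed_rows:
            continue  # always a miss: the datum goes to main memory
        if stored_tag.get(row) == tag:
            hits += 1
        else:
            stored_tag[row] = tag  # miss: fetch and store the line
    return hits


# A loop that alternates between two addresses in different rows.
trace = [0, 1, 0, 1, 0, 1]
healthy = count_hits(trace, index_bits=2)
degraded = count_hits(trace, index_bits=2, failed_rows={1})
```

In this trace every reference to address 1 misses once row 1 has failed, while references to address 0 are unaffected.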
While set-associative caches are more reliable, they are not immune to failure. The loss of a single storage element in a set associative cache only disables one set of the congruence class. But a failure in one of the lines used to access the congruence class may result in the loss of an entire congruence class, even in a set associative cache.
Another aspect of cache memory design is the determination of the number of congruence classes. The simplest method to increase the size of a cache memory is to increase the number of congruence classes without changing set associativity. So long as the number of bits required to uniquely identify the congruence class does not exceed the number of non-translatable bits in the virtual address, the non-translated (virtual) address may be used to request data from the cache. This is faster than using the real (translated) address to access the cache, because there is a delay associated with address translation.
When the number of congruence classes grows so large that the number of bits in the non-translated portions of the address is insufficient to uniquely identify the congruence class, then the requesting address expands into the translated field. This results in formation of cache synonym classes. A synonym class includes a plurality of congruence classes whose addresses have the same non-translatable address field but different low order bits within the translatable address field. If the virtual address is used to search for a datum in the cache, it is possible that the wrong congruence class within the synonym class is checked. The system responds as if there is a cache miss, even though the desired datum is actually present in the cache in another congruence class different from the class addressed by the request.
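The arithmetic behind synonym classes can be sketched as follows. The sketch assumes a byte-addressed layout in which the non-translated bits of the address cover a line offset followed by the congruence class selection bits, and it assumes the number of congruence classes is a power of two; the function names and the example figures (4096-byte pages, 64-byte lines) are illustrative assumptions, not values from any cited system.

```python
def synonym_bits(untranslated_bits: int, line_offset_bits: int,
                 num_classes: int) -> int:
    """Number of class-selection bits that fall in the translated field."""
    index_bits = (num_classes - 1).bit_length()  # power-of-two class count
    return max(0, line_offset_bits + index_bits - untranslated_bits)


def candidate_classes(virtual_index: int, index_bits: int, syn_bits: int):
    """All congruence classes in the synonym class of `virtual_index`."""
    low_bits = index_bits - syn_bits
    low = virtual_index & ((1 << low_bits) - 1)  # non-translated part
    # The translated part may differ from the virtual value after
    # translation, so every combination of those bits is a candidate.
    return [(s << low_bits) | low for s in range(1 << syn_bits)]


# Assumed example: 12 non-translated bits, 6 line offset bits.
small = synonym_bits(12, 6, num_classes=64)    # class bits fit: no synonyms
large = synonym_bits(12, 6, num_classes=256)   # two bits spill over
classes = candidate_classes(0b10000101, index_bits=8, syn_bits=large)
```

In the larger configuration each synonym class contains four congruence classes, and a lookup that uses the virtual address checks only one of them.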
The existence of cache synonym classes has generally been regarded as a problem in cache design, and a number of systems have been disclosed to detect the existence of synonym classes, so that virtual addresses may be used for retrieving data from the cache. Such systems are discussed in U.S. Pat. Nos. 4,332,010 to Messina and 4,400,770 to Chan et al.