Analysis of typical programs shows that memory references over any given interval tend to be confined to a limited area of memory. This phenomenon is referred to as “locality of reference.” It is readily understood from the fact that typical computer programs make frequent use of program loops and subroutines, whose instructions are processed sequentially. In addition, memory references to data are generally confined to a limited area of memory, as in table-lookup procedures and iterative procedures that refer to common memory and arrays.
If frequently referenced program segments and data are stored in a high-speed, small-size memory, the average memory access time is shortened, thereby reducing the time required to execute the program. Such a high-speed, small-size memory is commonly referred to as a “cache memory.” In general, a cache memory formed of SRAM (static random access memory) is 5-10 times faster to access than a main memory formed of DRAM (dynamic random access memory). Cache is generally the fastest memory device in the memory hierarchy and approaches the speed of the CPU (central processing unit).
The basic operation of a cache is as follows. When the CPU requires access to memory, the cache is examined first. If the desired word is found in the cache, the CPU reads the word from the cache. If the desired word is not found in the cache, the CPU accesses the main memory to read the word, and a block of the main memory that includes the word is transferred from the main memory to the cache memory. The block size is commonly on the order of 1-16 words.
The performance of a cache memory is measured by its “hit ratio.” When the CPU refers to memory and finds the desired word in the cache, the reference is called a “hit.” When the CPU finds the word not in the cache but in the main memory, the reference is called a “miss.” The hit ratio is the number of hits divided by the total number of memory references made by the CPU. The hit ratio is measured experimentally by running a typical program and counting the hits and misses over a given time period. The hit ratio is generally greater than 0.9, which demonstrates the locality of memory references.
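The hit ratio defined above, and its effect on average access time, can be sketched as follows. This is a minimal illustration only: the function names are hypothetical, and the access-time model (a hit costs one cache access, a miss costs one main-memory access) is a simplifying assumption that is not stated in the text.

```python
def hit_ratio(hits: int, misses: int) -> float:
    """Number of hits divided by the total number of memory references."""
    total = hits + misses
    if total == 0:
        raise ValueError("no memory references recorded")
    return hits / total


def average_access_time(ratio: float, cache_time: float, main_time: float) -> float:
    """Assumed model: a hit costs cache_time, a miss costs main_time."""
    return ratio * cache_time + (1.0 - ratio) * main_time


if __name__ == "__main__":
    ratio = hit_ratio(hits=950, misses=50)
    # Illustrative timings only: cache 10 ns, main memory 100 ns.
    print(ratio, average_access_time(ratio, cache_time=10.0, main_time=100.0))
```

With these illustrative numbers the hit ratio is 0.95 and the average access time is about 14.5 ns, far closer to the cache time than to the main-memory time.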
The basic characteristic of a cache memory is its short access time, so that very little time is spent locating a word in the cache. The transfer of data from the main memory to the cache memory is referred to as a “mapping process,” of which there are three types: associative mapping, direct mapping, and set-associative mapping.
A cache memory that adopts associative mapping is the fastest and the most flexible. The associative memory stores both the address and the data of each memory word. The address issued by the CPU is placed in an argument register. The associative memory searches for a stored address that matches the address in the argument register, reads the corresponding data, and sends the data to the CPU. If no match is found, the word is read from the main memory, and the address and data are stored in the associative cache memory. If the cache memory is full, an existing address-data pair must be replaced to make room for the word that is not in the cache. The pair to be removed is determined by a replacement algorithm selected by the designer.
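The associative lookup and replacement described above can be modeled in software as follows. This is a behavioral sketch, not a hardware description: the class name and the `main_memory` mapping are hypothetical, the dictionary lookup stands in for the parallel address comparison, and first-in-first-out (FIFO) replacement is an assumed choice, since the text leaves the replacement algorithm to the designer.

```python
from collections import OrderedDict


class AssociativeCache:
    """Fully associative cache: any address may occupy any entry."""

    def __init__(self, capacity, main_memory):
        self.capacity = capacity
        self.main_memory = main_memory   # hypothetical backing store: address -> word
        self.entries = OrderedDict()     # address -> data, oldest pair first

    def read(self, address):
        """Return (word, hit) for the requested address."""
        if address in self.entries:      # stands in for the parallel address match
            return self.entries[address], True
        word = self.main_memory[address]  # miss: read the word from main memory
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # assumed FIFO: evict the oldest pair
        self.entries[address] = word      # store the new address-data pair
        return word, False
```

With a capacity of two entries, reading addresses 1, 2, and 3 in turn evicts the pair for address 1, so a later read of address 1 misses again.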
In direct mapping, the CPU address is divided into an index field and a tag field. Each word of the cache memory consists of a data word and its tag. When the CPU issues a memory reference, the index field of the CPU address is used as the address for accessing the cache. The tag field of the CPU address is compared with the tag of the word read out from the cache. If the two match, there is a hit and the desired data is in the cache. Otherwise, there is a miss, and the desired word is read from the main memory and stored in the cache together with its new tag. A disadvantage of direct mapping is that the hit ratio may drop markedly if two or more words that have the same index but different tags are accessed repeatedly.
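A direct-mapped lookup along these lines might be sketched as below. The class and method names are hypothetical, and placing the index in the low-order address bits with the tag in the remaining high-order bits is an assumption; the text says only that the address is divided into the two fields.

```python
class DirectMappedCache:
    """Direct-mapped cache: each index selects exactly one (tag, word) line."""

    def __init__(self, index_bits, main_memory):
        self.index_bits = index_bits
        self.main_memory = main_memory            # hypothetical backing store
        self.lines = [None] * (1 << index_bits)   # each line: (tag, word) or None

    def split(self, address):
        """Split an address into (tag, index); low bits are assumed to be the index."""
        index = address & ((1 << self.index_bits) - 1)
        tag = address >> self.index_bits
        return tag, index

    def read(self, address):
        """Return (word, hit) for the requested address."""
        tag, index = self.split(address)
        line = self.lines[index]
        if line is not None and line[0] == tag:   # tags match: hit
            return line[1], True
        word = self.main_memory[address]          # miss: read from main memory
        self.lines[index] = (tag, word)           # store the word with its new tag
        return word, False
```

With three index bits, addresses 0 and 8 share index 0 but carry different tags, so alternating reads of the two addresses miss every time. This is the conflict case named as the disadvantage of direct mapping.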
Set-associative mapping compensates for this weakness of direct mapping by allowing each word of the cache to store two or more memory words under the same index address. Each data word is stored together with its tag, and the tag-data items in one word of the cache constitute a single set.
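The following sketch shows how keeping several tag-data items under one index removes the conflict that defeats a direct-mapped cache. The class name is hypothetical, the low-order address bits are assumed to form the index, and FIFO replacement within a set is an assumed policy that the text does not specify.

```python
class SetAssociativeCache:
    """Set-associative cache: each index holds up to `ways` (tag, word) items."""

    def __init__(self, index_bits, ways, main_memory):
        self.index_bits = index_bits
        self.ways = ways
        self.main_memory = main_memory   # hypothetical backing store
        # One set per index; items within a set are kept oldest first.
        self.sets = [[] for _ in range(1 << index_bits)]

    def read(self, address):
        """Return (word, hit) for the requested address."""
        index = address & ((1 << self.index_bits) - 1)
        tag = address >> self.index_bits
        items = self.sets[index]
        for stored_tag, word in items:   # compare the tag against every item in the set
            if stored_tag == tag:
                return word, True
        word = self.main_memory[address]  # miss: read from main memory
        if len(items) >= self.ways:
            items.pop(0)                  # assumed FIFO replacement within the set
        items.append((tag, word))
        return word, False
```

With three index bits and two ways, addresses 0 and 8 can reside in the cache at the same time, so alternating reads of the two hit after the first pass. A third address with the same index (for example 16) still forces a replacement.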
Recently, with developments in semiconductor fabrication, CPUs and cache memories have been integrated in a single chip. FIG. 1 is a schematic diagram illustrating a configuration of a single processor chip 1 in which a CPU 2 and a cache memory 3 are integrated. The processor chip 1 may comprise a CPU 2, or other types of processors such as DSPs (digital signal processors) or microprocessors. When the cache memory and the CPU are embedded in a single chip, the number of data bits communicated between the CPU and the cache memory can be greatly increased in order to raise the data input/output speed. As a result, the overall performance of the CPU may be improved. However, because an embedded cache memory is restricted in size as compared to an external cache memory, its cache hit ratio may be lowered. Accordingly, when cache memories are embedded in the CPU chip, the set size of a set-associative cache memory has been increased to compensate for the reduction in the cache hit ratio caused by downsizing.
FIG. 2 illustrates an example of a multi-way set-associative cache memory configuration, which is disclosed by Toshichika on Oct. 10, 2000 in U.S. Pat. No. 6,131,143 entitled “MULTI-WAY ASSOCIATIVE STORAGE TYPE CACHE MEMORY.”
The cache memory illustrated in FIG. 2 is a 9-way set-associative cache memory. In FIG. 2, an address 10 supplied from the CPU 2 to an address bus is formed of a tag address 11 and a set address 12.
Referring to FIG. 2, the cache memory 3 comprises a decoder 20, 9 tag memories 30a-30i, valid bits 31a-31i, data memories 40a-40i, comparators 50a-50i, and a way selector 55.
The decoder 20 decodes the set address 12 of the address 10. The tag memories 30a-30i and the data memories 40a-40i each include k lines (i.e., sets), and each of the sets stores data. The data memories 40a-40i supply the way selector 55 with the data stored in the set selected by the set address decoded by the decoder 20. The valid bits 31a-31i indicate whether or not the contents stored in the corresponding tag memories 30a-30i are valid, respectively. The way selector 55 selects one or none of the data outputs of the data memories 40a-40i in response to signals HITa-HITi, which represent the comparison results of the comparators 50a-50i.
In FIG. 2, the suffixes a through i of the reference numerals, such as 30a-30i for the tag memories and 40a-40i for the data memories, denote ways a through i, respectively.
Next, the operation of the cache memory 3 will be explained.
When the address 10 is supplied from the CPU 2, the set address 12 is decoded by the decoder 20. The contents stored in the selected set address of the tag memories 30a-30i are supplied to the comparators 50a-50i, respectively. In addition, the contents stored in the selected set address of the data memories 40a-40i are supplied to the way selector 55.
When one of the comparators 50a-50i determines that the contents stored in the selected set address of the corresponding tag memory 30a, 30b, . . . , or 30i match the tag address 11 and the corresponding valid bit 31a, 31b, . . . , or 31i is active, the corresponding hit signal HITa, HITb, . . . , or HITi becomes active.
When there is a way in which the contents stored in the selected set address of the tag memory 30a, 30b, . . . , or 30i are identical to the tag address 11, the data of that way is output from the way selector 55. When there is no such way, the CPU 2 accesses the data stored in a main memory (not shown) rather than the cache memory. In addition, the data accessed by the CPU 2 in the main memory is stored in the data memory 40a, 40b, . . . , or 40i of the cache memory 3.
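The lookup path of FIG. 2 can be approximated in software as shown below. This is an illustrative model only: the function name and the list-of-lists layout are hypothetical, the set address is assumed to occupy the low-order bits of the address 10, and the refill of the data memory after a miss is omitted.

```python
def lookup(address, set_bits, tag_memories, valid_bits, data_memories):
    """Return (data, hit_signals) for one access; data is None on a miss.

    tag_memories, valid_bits, and data_memories are indexed as [way][set].
    """
    # Role of the decoder 20: extract the set address (assumed to be the low bits).
    set_address = address & ((1 << set_bits) - 1)
    tag_address = address >> set_bits

    # Role of the comparators 50a-50i: a HIT signal is active only when the
    # stored tag matches the tag address AND the corresponding valid bit is active.
    hit_signals = [
        bool(valid_bits[way][set_address])
        and tag_memories[way][set_address] == tag_address
        for way in range(len(tag_memories))
    ]

    # Role of the way selector 55: output the data of the hitting way, or none.
    for way, hit in enumerate(hit_signals):
        if hit:
            return data_memories[way][set_address], hit_signals
    return None, hit_signals
```

For a 9-way configuration, the three arguments would each hold nine per-way lists, one per way a through i. Note that an entry whose tag matches but whose valid bit is inactive does not raise its HIT signal, so the access is treated as a miss.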
In the cache memory explained above, the tag memories 30a-30i and data memories 40a-40i are formed of a plurality of memory cells, and the cache memory further comprises redundant cells for replacing defective memory cells. Generally, the last step of manufacturing semiconductor integrated circuits is to test whether or not the manufactured semiconductor integrated circuit is defective. In the test step, if it is determined that a cache memory includes a defective memory cell, the memory cell is replaced by a redundant cell. However, because the number of the redundant cells is limited, when a defective cell cannot be replaced by a redundant cell, the integrated circuit should be scrapped or discarded.
In a processor chip in which a CPU and a cache memory are integrated, if the cache memory includes defective cells that cannot be replaced by redundant cells, the processor chip must be scrapped or discarded. In general, CPU fabrication is expensive relative to the fabrication of other semiconductor integrated circuits. Therefore, scrapping or discarding the entire processor chip because of a few defective cells in the cache memory is a substantial loss.