The present invention relates to a cache memory and, more particularly, to a cache memory suitable for use as incorporated in a microprocessor.
The cache memory is smaller than the main memory in storage capacity but faster in access. Therefore, the cache memory is located very close to the central processing unit (CPU) for the purpose of supplying data held in the main memory to the CPU. Various problems concerning the cache memory are discussed in, for example, the ACM Computing Surveys, Vol. 14, No. 3, 1982, pp. 473-530 and "Computer Organization & Design--The Hardware/Software Interface," Morgan Kaufmann Publishers, pp. 454-527, 1994. The main problems of the cache memory are access time and power consumption.
An example of a conventional cache memory of relatively small power consumption is shown in the NIKKEI Electronics, Feb. 14, 1994, pp. 79-92 (this cache memory is hereinafter referred to as the first prior-art technology). FIG. 2 shows a block diagram of the first prior-art technology.
As shown, the cache memory according to the first prior-art technology is a four-way set-associative cache memory. A set-associative cache memory is organized as follows. A plurality of areas in the cache memory, each of which can hold one block of data, are arranged in a plurality of rows and a plurality of columns. A plurality of areas in main memory (not shown), each of which can hold a data block, are likewise divided into a plurality of columns corresponding to the above-mentioned plurality of columns. The block storage areas in a given column in main memory are associated with the block storage areas in the cache memory column corresponding to that column.
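The column association described above can be sketched as a simple modulo mapping. The following Python fragment is only an illustration under assumed parameters: the names `NUM_COLUMNS`, `column_of`, and `candidate_areas` are hypothetical, the modulo rule is one common way to realize such an association, and the column count of 8 is chosen for readability rather than taken from FIG. 2.

```python
# Illustrative sketch only; sizes and names are assumptions, not from FIG. 2.
NUM_COLUMNS = 8   # hypothetical number of columns in the cache memory
NUM_WAYS = 4      # four-way set-associative, as in the first prior-art technology


def column_of(block_number):
    """Return the cache column that a main-memory block is associated with."""
    return block_number % NUM_COLUMNS


def candidate_areas(block_number):
    """List the (way, column) block storage areas that may hold the block."""
    col = column_of(block_number)
    return [(way, col) for way in range(NUM_WAYS)]


# Blocks 3, 11, and 19 all map to column 3 and compete for its four ways.
same_column = [column_of(b) for b in (3, 11, 19)]
```

Under these assumptions, any main-memory block has exactly four candidate storage areas in the cache memory, one per way, all in the same column.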
To be more specific, as shown in FIG. 2, in the prior-art cache memory, an address array 200 is composed of four memory mats (also called ways) 206 (namely, way 0, way 1, way 2, and way 3), a decoder 205 commonly provided for these ways, and a precharge and equalize circuit 207, a sense amplifier 208, and a comparator 209 provided for each of the ways. Likewise, a data array 201 is composed of four memory mats 218 (namely, way 0, way 1, way 2, and way 3) and an address decoder 217, a precharge and equalize circuit 219, a sense amplifier 220, and an output buffer 221 provided for each of the ways.
The above-mentioned prior-art cache memory operates as follows. First, access to the four ways 206 is started according to a middle address Am entered from a line 204. Addresses registered in the way 0, the way 1, the way 2, and the way 3 are read and are outputted from the sense amplifiers 208 provided for respective ways (these addresses are also referred to as tags). In the comparator 209 provided for each way, an upper address Au entered from a line 210 is compared with the address read from each way. If a match is found, namely if the cache memory has hit, the comparator 209 asserts a corresponding hit line 211, 212, 213 or 214. Conversely, if a mismatch is found, namely if the cache memory has not hit, the comparator 209 leaves the corresponding hit line negated.
Of the four ways of the data array 201, only the one way for which the address array 200 has hit is activated by the corresponding hit line.
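The lookup sequence of the first prior-art technology can be modeled in software as follows. This is a minimal behavioral sketch, not a description of the actual circuit: the array sizes, the bit widths, and the names `split_address` and `lookup` are assumptions introduced for illustration. It shows the tags of all four ways being read with the middle address Am, compared with the upper address Au, and only the hit way of the data array being activated.

```python
# Illustrative model of the first prior-art lookup; all sizes are assumptions.
NUM_WAYS = 4
NUM_SETS = 8        # hypothetical number of entries per way
OFFSET_BITS = 4     # hypothetical block size of 16 bytes
INDEX_BITS = 3      # log2(NUM_SETS)


def split_address(addr):
    """Split an address into upper address Au (tag) and middle address Am."""
    am = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    au = addr >> (OFFSET_BITS + INDEX_BITS)
    return au, am


def lookup(address_array, data_array, addr):
    """Read all four tags, compare with Au, then drive only the hit data way.

    Returns (data or None, number of data ways activated).
    """
    au, am = split_address(addr)
    # One hit line per way, asserted when the stored tag matches Au.
    hit_lines = [address_array[way][am] == au for way in range(NUM_WAYS)]
    for way, hit in enumerate(hit_lines):
        if hit:
            return data_array[way][am], 1   # only the hit way is activated
    return None, 0                          # miss: no data way is activated


address_array = [[None] * NUM_SETS for _ in range(NUM_WAYS)]
data_array = [[None] * NUM_SETS for _ in range(NUM_WAYS)]

# Register address 0x1234 in way 2 (hypothetical contents).
au, am = split_address(0x1234)
address_array[2][am] = au
data_array[2][am] = "block@0x1234"
```

In this model, `lookup(address_array, data_array, 0x1234)` hits in way 2 and activates a single data way, whereas an address with the same middle address but a different upper address activates none.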
Consequently, the above-mentioned prior-art technology is advantageous in power saving. However, the access time of the entire cache memory is a sum of the access time of the address array 200, the time required for the comparison operation in the comparator 209, and the access time of the data array 201, resulting in a relatively large value. This makes it difficult to enhance the operating frequency of the cache memory.
To overcome this problem, the present inventors considered a method in which the data array is activated at the same time as the address array. FIG. 3 shows a block diagram of a four-way set-associative cache memory 3000 that operates by this method (this cache memory is hereinafter referred to as the reference technology). In FIG. 3, the structures of an address array 300 and a data array 301 are generally the same as those of FIG. 2. The difference between the prior-art technology of FIG. 2 and the reference technology of FIG. 3 is that, when the address array 300 is activated, the data array 301 is activated at the same time. Of the four ways of the data array 301, only the data held in the output buffer 321 of the way corresponding to the way in which a hit occurred in the address array 300 is outputted to a data line 322. In this method, the address array 300 and the data array 301 are accessed simultaneously, so that the access time of the entire cache memory 3000 is approximately equal to the access time of the data array 301. Thus, the access time of the entire cache memory is relatively short. In this method, however, the ways of the data array corresponding to ways in which no hit occurred in the address array are also accessed, so that the power consumption of the data array increases significantly. Further, even if the operating frequency of the cache memory is lowered, the data array operates in the same manner as mentioned above, and therefore the power consumption is not reduced.
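The trade-off between the two schemes can be illustrated with a small numerical sketch. The delay values below are hypothetical and chosen only so that the data array is the slowest element; the function names are likewise assumptions. The number of data ways driven is used here as a rough stand-in for data array power consumption.

```python
# Illustrative comparison; the delay values are hypothetical, in nanoseconds.
T_ADDR = 3.0   # assumed access time of the address array
T_CMP = 1.0    # assumed comparison time in the comparator
T_DATA = 5.0   # assumed access time of the data array
NUM_WAYS = 4


def sequential_access():
    """First prior-art technology: data array driven after hit determination."""
    access_time = T_ADDR + T_CMP + T_DATA
    data_ways_driven = 1
    return access_time, data_ways_driven


def parallel_access():
    """Reference technology: address and data arrays driven together."""
    # Modeling assumption: the total is the longer of the two parallel paths,
    # which equals the data array access time for the values chosen above.
    access_time = max(T_ADDR + T_CMP, T_DATA)
    data_ways_driven = NUM_WAYS
    return access_time, data_ways_driven
```

With these assumed values, the sequential scheme takes 9.0 ns and drives one data way, while the parallel scheme takes 5.0 ns but drives all four data ways on every access.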
The NIKKEI Electronics, Mar. 27, 1995, pp. 13-20 introduces a new RISC (Reduced Instruction Set Computer) processor developed by the assignee hereof and others (hereinafter referred to as the second prior-art technology). In particular, page 16 of the same publication describes a technology for suppressing cache power consumption, as follows. The SH7708 employs three methods of suppressing cache power consumption. In the first method, only the way in which a hit occurred in the address array is driven. This method was also employed in the SH7604, but in the SH7708 it is impossible, during high-speed operation, to drive the data array after hit determination in the address array, because of the limitation of circuit speed. Hence, a circuit constitution for dynamically determining the drive timing of the data array was provided and, if hit determination cannot be made in time, all four ways of the data array are driven. The upper limit of the frequency at which one way of the data array can be selectively driven is about 40 MHz.
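The first method of the second prior-art technology can be approximated by the following sketch. It is a deliberately crude stand-in: the actual circuit determines the drive timing dynamically in hardware, whereas this model simply compares the clock frequency against the approximate 40 MHz limit quoted above. The names `hit_in_time` and `data_ways_driven` are hypothetical.

```python
# Illustrative sketch; the real circuit decides the drive timing dynamically.
NUM_WAYS = 4
SELECTIVE_DRIVE_LIMIT_MHZ = 40.0   # approximate limit quoted in the text


def hit_in_time(clock_mhz):
    """Assumption: hit determination is in time at or below the limit."""
    return clock_mhz <= SELECTIVE_DRIVE_LIMIT_MHZ


def data_ways_driven(clock_mhz):
    """Number of data array ways driven for one access that hits."""
    # If hit determination is in time, only the hit way is driven;
    # otherwise all four ways are driven.
    return 1 if hit_in_time(clock_mhz) else NUM_WAYS
```

Under this simplification, an access at 20 MHz drives one data way and an access at 60 MHz drives all four.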