1. Field of the Invention
The present invention relates to a cache memory to be connected to an MPU (Micro Processing Unit) and more particularly to a data memory in the cache memory.
2. Description of the Related Art
In general, a cache memory is provided between an arithmetic unit such as the MPU and a memory system serving as a main memory, and bridges the gap in processing speed between the arithmetic unit and the memory system. The cache memory has a tag memory used to store address information of the memory system and a data memory used to temporarily store part of the data contained in the memory system as cache data. In the data memory, as is well known, desired cache data is read in one cycle and a predetermined amount of data referenced from the memory system is written in another cycle. By these operations, the waiting cycle time of the MPU is reduced, thereby achieving high-speed operations between the cache memory and the MPU.
FIG. 2 is a schematic block diagram showing the configuration of a data memory in a conventional cache memory employing a set-associative method. In the example shown in FIG. 2, the data memory in the conventional cache memory employs a four-way set-associative method in which unitized data memory macro units 10-13, 20-23, 30-33 and 40-43 used to manage cache data are provided in four ways “0-3”, respectively. In FIG. 2, the configurations of the data memory macro units 20-23 mounted in the way 1, the data memory macro units 30-33 mounted in the way 2 and the data memory macro units 40-43 mounted in the way 3 are the same as those of the data memory macro units 10-13 mounted in the way “0”. Write data D0-D3 are values selected by multiplexers 50-53 from either the word data stored in a line buffer 1 or the MPU write data, and are input, as appropriate, to the data memory macro units 10-43. The read data output from the data memory macro units 10-43 are narrowed down by multiplexers 60-63 and a multiplexer 70, and the one read data selected in this way is output to the MPU.
In the data memory in the cache, a data memory macro unit as described above is provided for every word in every way (in the example shown in FIG. 2, the number of words is four) so that one final read data can be output in one cycle. The data memory macro units 10 to 43 are configured so that they can be accessed simultaneously. At the time of reading, one data can be read simultaneously from each of the data memory macro units 10 to 43 by inputting an address fed from the MPU to the address terminal A of every one of the data memory macro units 10 to 43 and by inputting the asserted chip enable signals 0 to 3 [0:3] to the chip enable input terminal CE of each of the data memory macro units 10 to 43. Here, “[0:3]” denotes the chip enable signals [0] to [3]. In the data memory, the one required data is finally selected from the data read from the data memory macro units 10 to 43, based on the word address contained in the address fed from the MPU and on the way number, fed from the tag memory, in which a cache hit has been found. The final data selected in this way is fed to the MPU.
Moreover, writing of data to the data memory macro units 10 to 43 is carried out when a request for writing is fed from the MPU or when a cache miss occurs due to the absence of required data in the data memory. In the case of a cache miss, however, the writing is carried out only after the data read from the memory system have been stored in all the word data areas 0 to 3 in the line buffer 1 as shown in FIG. 2. When data are stored in all the word data areas in the line buffer 1, the data are written to all the data memory macro units in any one of the ways 0 to 3. For example, when the data are written to the way 0, in order to write all word data simultaneously in one cycle, the address fed from the MPU is input to the address input terminal A of each of the data memory macro units 10 to 13 and, at the same time, the write data D0 to D3 are input to the data input terminals D of the data memory macro units 10 to 13, respectively. Moreover, by inputting the asserted chip enable signals 0 [0:3] to the chip enable input terminals CE of all the word data areas in the way 0 and by inputting the asserted write enable signals 0 [0:3] to the write enable input terminals WE of all the word data areas in the way 0, all the word data can be written simultaneously to the data memory macro units 10 to 13 in the way 0.
FIG. 3 is a diagram explaining a conventional format of an address fed from the MPU. In the cache memory, the address output from the MPU is used after being divided into four portions: a tag data portion X1, an index address portion X2, a word address portion X3, and a byte address portion X4. The tag data portion X1 is the data to be stored in the tag memory in the cache. The address of the data memory to which access is requested by the MPU is compared with the effective data in the tag memory and, when the two match, a cache hit occurs. The index address portion X2 is a bit string indicating a predetermined line position in each of the ways in the cache memory. The word address portion X3 is a bit string indicating a predetermined word position in a predetermined line. The byte address portion X4 is a bit string indicating a predetermined byte position in a predetermined word.
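The division of the address into the four portions X1 to X4 can be sketched as follows. This is only an illustration: the field widths are assumptions that are not stated in the text (a 2-bit byte address, a 2-bit word address for four words per line, and a 9-bit index address for line positions 0 to 511), and the names `CacheAddress` and `split_address` are hypothetical.

```python
from typing import NamedTuple

# Assumed field widths (not given in the text).
BYTE_BITS = 2    # X4: byte position within a word
WORD_BITS = 2    # X3: word position within a line (four words)
INDEX_BITS = 9   # X2: line position within a way (0..511)


class CacheAddress(NamedTuple):
    tag: int     # X1: value compared against the tag memory
    index: int   # X2
    word: int    # X3
    byte: int    # X4


def split_address(addr: int) -> CacheAddress:
    """Split an MPU address into tag, index, word and byte portions."""
    byte = addr & ((1 << BYTE_BITS) - 1)
    word = (addr >> BYTE_BITS) & ((1 << WORD_BITS) - 1)
    index = (addr >> (BYTE_BITS + WORD_BITS)) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (BYTE_BITS + WORD_BITS + INDEX_BITS)
    return CacheAddress(tag, index, word, byte)
```

Under these assumed widths, the low 13 bits select the line, word and byte, and all remaining upper bits form the tag.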
FIG. 4 is a diagram explaining conventional data storing positions in each of the data memory macro units 10-43 contained in the ways 0 to 3. For example, the data memory macro units 10 to 13 store the data corresponding to the values 0 to 3, respectively, of the word address portion X3 shown in FIG. 3. As the physical memory address of each of the data memory macro units 10-13 in the way 0, that is, as each cache memory address, the same number as used in the index address portion X2 is employed. Similarly, as the physical memory address of each of the data memory macro units 20 to 43 in the ways 1 to 3, the same number as used in the index address portion X2 is employed. Examples of the data storing positions at the time of reading and writing are shown by shaded areas in FIG. 4. At the time of reading, if the address requested for reading by the MPU is, for example, “0” for the index address and “2” for the word address, the data (x, 0, z) (x = 0 to 3, z = 0 to 3), which contain the data (x, 0, 2), are read as candidate data, as shown in FIG. 4. Out of these candidate data, one data is selected for each of the ways 0 to 3 and, further, out of the data selected for each of the ways 0 to 3, the final read data is selected and read. Moreover, at the time of writing, if the index address of the read miss address caused by the cache miss is “511” and if the way to be written is “0”, the data stored in the line buffer 1 are written to the positions (0, 511, z) as shown in FIG. 4.
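The conventional read and write paths described above can be modelled in a few lines. The sketch below is an assumption-laden software model, not a description of the actual circuit: each of the 16 data memory macro units is represented as a list addressed by the index, positions follow the (way, index, word) notation of FIG. 4, and the function names are hypothetical.

```python
NUM_WAYS = 4
NUM_WORDS = 4
NUM_LINES = 512  # index addresses 0..511, as in the example of FIG. 4


def make_data_memory():
    """One storage array per (way, word) pair, i.e. 16 macro units."""
    return [[[None] * NUM_LINES for _ in range(NUM_WORDS)]
            for _ in range(NUM_WAYS)]


def read(memory, index, word, hit_way):
    """Read one word: all 16 units are accessed, then two selection stages."""
    # Candidate data (x, index, z) from every macro unit in the same cycle.
    candidates = [[memory[x][z][index] for z in range(NUM_WORDS)]
                  for x in range(NUM_WAYS)]
    # First stage: pick the requested word in each way (multiplexers 60-63).
    per_way = [candidates[x][word] for x in range(NUM_WAYS)]
    # Second stage: pick the way reported as a hit (multiplexer 70).
    return per_way[hit_way]


def fill_line(memory, way, index, line_buffer):
    """Write all words of the line buffer to positions (way, index, z)."""
    for z in range(NUM_WORDS):
        memory[way][z][index] = line_buffer[z]
```

For instance, filling the way 0 at the index 511 and then reading the word 2 with a hit in the way 0 returns the third entry of the line buffer. Because every word of a line lives in its own macro unit, the loop in `fill_line` corresponds to four writes that the hardware can perform in parallel.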
FIG. 5 is a diagram explaining an example of a conventional floor plan for an LSI (Large Scale Integrated Circuit) having a cache memory. In FIG. 5, a TAG memory section 81 of the cache memory, an MPU 82, a control section 83, and a data memory section 84 of the cache memory are shown. A size of a die 80 indicates outer dimensions of the LSI chip. In the example of FIG. 5, the data memory section 84 has 16 pieces of data memory macro units 85. Each of the data memory macro units 85 is unitized, that is, is operating as a separate unit, which corresponds to each of the 16 pieces of the data memory macro units 10 to 13, 20 to 23, 30 to 33, and 40 to 43 shown in FIG. 2.
FIG. 6 is a time chart explaining operations of the conventional data memory macro units 10 to 43 at the time of reading and writing. Each of the data memory macro units 10 to 43 operates in synchronization with a predetermined clock. As shown in FIG. 6, at the time of reading, an address signal RA1 is input at an edge T2 of the clock and, at the same time, one of the asserted chip enable signals 0 to 3 [0:3] is input. A read data signal RD1 is output at an edge T3 of the clock so that the MPU can latch the data signal RD1. Moreover, at the time of writing (in the example, the writing is performed in the way 0), both an address signal WA2 and one of the write data signals (0 to 3) WD2 are input at an edge T4 of the clock and, at the same time, the asserted chip enable signal 0 [0:3] and the asserted write enable signal 0 [0:3] are also input. This causes the value of the write data (0 to 3) WD2 to be written to each of the corresponding data memory macro units.
However, in the conventional configurations described above, unitized data memory macro units have to be prepared in a number equal to “the number of ways × the number of words” (in the above example, 16 pieces), and each of them has to be connected to the MPU. As a result, as in the example of the floor plan shown in FIG. 5, the wirings between the MPU 82 and some of the data memory macro units 85 become long, which causes delays in data transmission between them and interferes with high-speed operations. Moreover, as the number of the data memory macro units in the cache memory becomes large, the area of the LSI also increases, which raises the unit price of the LSI.
In view of the above, it is an object of the present invention to provide a cache memory which is capable of reducing areas occupied by data memory macro units and preventing delays in data transmission caused by wirings mounted between an MPU and the data memory macro units, thus improving performance of the cache memory and inhibiting an increase in a unit price of LSIs.
According to a first aspect of the present invention, there is provided a cache memory for temporarily storing part of data stored in a main memory as cache data and having N pieces of ways being represented by two or more integers and employing a set associative method in which the cache data is managed for each of the ways, including:
a plurality of data memory macro units in which a storing position of each of the cache data is designated by a way number used to identify the way, an index number designated by part of an address to be fed to data stored in the main memory and a word number designated by other part of the address and each being able to be accessed simultaneously; and
wherein each of the cache data being given the same way number and same index number is stored in the data memory macro units being different from each other and each of the cache data being given the same index number and same word number is stored in the data memory macro units being different from each other.
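The two storing conditions recited above can be checked with a small sketch. The placement rule used here, which rotates the word number by the way number, is a hypothetical example chosen only because it satisfies both conditions; the text does not state this formula. Four ways, four words per line and four data memory macro units are likewise assumed for illustration.

```python
NUM_WAYS = 4
NUM_WORDS = 4
NUM_MACROS = 4  # assumed for illustration


def macro_for(way: int, word: int) -> int:
    """Hypothetical placement: the macro unit holding (way, index, word).

    The index number does not affect the choice of macro unit here; it
    would only select the position inside the chosen unit.
    """
    return (way + word) % NUM_MACROS


def placement_is_conflict_free() -> bool:
    """Check both storing conditions for every way and every word."""
    # Same way and same index: every word lands in a different macro unit,
    # so a whole line can be written in one cycle.
    same_way_ok = all(
        len({macro_for(way, word) for word in range(NUM_WORDS)}) == NUM_WORDS
        for way in range(NUM_WAYS)
    )
    # Same index and same word: every way lands in a different macro unit,
    # so the candidates of all ways can be read in one cycle.
    same_word_ok = all(
        len({macro_for(way, word) for way in range(NUM_WAYS)}) == NUM_WAYS
        for word in range(NUM_WORDS)
    )
    return same_way_ok and same_word_ok
```

Any placement in which the table of macro unit numbers over (way, word) forms a Latin square would pass the same check; the rotation is merely the simplest such table.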
In the foregoing, a preferable mode is one wherein a physical cache memory address being commonly applied among the data memory macro units is given in each of data storing positions in each of the data memory macro units and wherein each of the cache data being given the same index number and same way number is stored in the cache memory address being different among the data memory macro units.
Also, a preferable mode is one wherein each of the data being given the same index number and same word number is stored in the cache memory address being same among the data memory macro units.
Also, a preferable mode is one wherein a physical cache memory address being commonly applied among the data memory macro units is given in each of data storing positions in each of the data memory macro units and wherein each of the data being given the same index number and same way number is stored in the cache memory address being same among the data memory macro units.
Also, a preferable mode is one wherein each of the data being given the same index number and same word number is stored in the cache memory address being different among the data memory macro units.
Also, a preferable mode is one wherein each of cache data being given the same index number out of the cache data stored in each of the data memory macro units is given the different way number and different word number.
Also, a preferable mode is one wherein a plurality of the cache data stored in each of the data memory macro units includes the index numbers being arranged continuously so that the index numbers are sequentially increased by a group of the cache data having the same index number in a manner so as to correspond to arrangement of the cache memory addresses and wherein each of the cache data making up each group of the cache data having the same index number includes the way number or the word number cyclically arranged.
Also, a preferable mode is one wherein a phase in an arrangement cycle of the way numbers or the word numbers included in the cache data making up each group of the cache data is different among the data memory macro units.
Also, a preferable mode is one wherein the number of the data memory macro units exceeds the number of the ways being N pieces.
Furthermore, a preferable mode is one wherein the number of the data memory macro units is a multiple of the number N.
With the above configurations, the number of the data memory macro units can be made smaller than that in the conventional case. This can prevent the wiring distance between each of the data memory macro units and the arithmetic device from becoming long, thereby making delays in access to each of the data memory macro units less likely. One conceivable way to reduce the number of the data memory macro units with the wiring distance in mind is to increase the capacity of each data memory macro unit in proportion to the decrease in their number and to assign a specific way number to each of the data memory macro units. However, when the data memory macro units with increased capacity are merely classified by the way number, data having the same index address may be stored in one data memory macro unit. Moreover, in a cache memory, it is in general difficult to write data simultaneously to different cache memory addresses in one data memory macro unit connected to one port, so such data cannot be written to one data memory macro unit in one cycle. In the cache memory of the present invention, however, the data storing positions are assigned so as to solve the above problems; that is, as in the conventional cache, the cache data can be read in one cycle and can be written in another cycle, while using a comparatively small number of the data memory macro units.
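The difference between merely classifying enlarged macro units by way number and distributing the data across them can be illustrated by counting how many accesses fall on a single macro unit in one cycle. The sketch below assumes four ways, four words and four single-port macro units; both placement functions and all names are hypothetical and serve only as an illustration, not as the placement defined by the invention.

```python
from collections import Counter


def by_way(way, word):
    """Naive placement: one enlarged macro unit per way."""
    return way


def rotated(way, word):
    """Illustrative distributed placement (an assumption, not from the text)."""
    return (way + word) % 4


def max_accesses_per_macro(place, accesses):
    """Largest number of simultaneous accesses hitting one macro unit."""
    counts = Counter(place(way, word) for way, word in accesses)
    return max(counts.values())


# Writing a whole line into the way 0 touches the words 0..3 of that way.
line_fill_way0 = [(0, word) for word in range(4)]
# Reading the word 2 fetches one candidate from each of the ways 0..3.
read_word2 = [(way, 2) for way in range(4)]
```

With `by_way`, the line fill puts all four word writes on one single-port macro unit, so it cannot complete in one cycle, whereas with `rotated` both the line fill and the read touch each macro unit exactly once.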