The present invention relates to a multi-port cache memory and, more particularly, to a multi-port cache memory consisting of 1-port SRAM (Static Random Access Memory) cell blocks and adapted for decreasing the chip area of high performance microprocessors.
The multi-port cache memories used in conventional high performance microprocessors include a multi-port cache memory formed of multi-port SRAM cell blocks. FIG. 1 shows, as an example, the architecture of such a multi-port cache memory for a direct-map scheme.
The conventional multi-port cache memory shown in FIG. 1 comprises, on the side of the tag, a cache-hit comparing circuit 30 and a tag memory consisting of an N-port decoder 10 and a tag storage 20 and, on the side of the data, a data memory consisting of an N-port decoder 40 and a data storage 50. Tag storage 20 and data storage 50 are constructed from multi-port storage cells (e.g. multi-port SRAM cells). It is possible to store 2^mind tags in the tag memory. Also, 2^mind cache lines are included in the data memory.
In executing a cache access from a port, the internal identification of the cache memory is performed with a tag, a cache line index and a cache line offset. The tag, cache line index and cache line offset (data word) for the n-th port are represented by Atagn, Aindn, and Awordn, respectively. Also, the number of address bits used for the tag is represented by mtag, the number of address bits used for the cache line index is represented by mind, and the number of address bits used for the cache line offset is represented by mword. Further, the number of ports of the tag memory and the data memory is represented by N.
The tags Atagn for the N ports are transmitted through an N*mtag bit wide bus into the tag memory, and the cache line indices Aindn of N*mind bits are transmitted into the N-port decoder 10 of the tag memory so as to compare the tags of the accessed data lines with the tags of the data lines stored in the data memory of the cache under the line indices Aindn. The comparison is made in the cache-hit comparing circuit 30. If the tags Atagn are found to agree with the corresponding tags stored under the line indices Aindn, corresponding cache hit signals are transmitted into the data bus. If any of the tags Atagn do not agree with the corresponding tags stored under the line indices Aindn, the respective access operations are processed as cache misses. Incidentally, the symbol R/Wn shown in FIG. 1 represents read and write instructions transmitted from the processor core (not shown).
Also, the cache line indices Aindn of the N ports of N*mind bits and the cache line offsets Awordn of N*mword bits are transmitted through the address bus into the N-port decoder 40 of the data memory. In the case of cache hits, the data words Dn are transmitted between the cache lines identified by the line indices Aindn in the data memory and the processor core. A cache line can hold more than 1 data word because the cache line offsets Awordn are added to the addresses of the data memory and select the data word within the line.
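The direct-map access described above can be illustrated with a small behavioral model. This is only a sketch under stated assumptions: the class name, the `fill` helper, and the field widths are hypothetical, only read accesses are modeled, and port conflicts and timing are ignored.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical widths: 2^M_IND cache lines, 2^M_WORD data words per line.
M_IND, M_WORD = 2, 1


class DirectMappedCache:
    """Behavioral sketch of the FIG. 1 arrangement (reads only)."""

    def __init__(self) -> None:
        lines = 1 << M_IND
        # Corresponds to tag storage 20: one tag per cache line index.
        self.tags: List[Optional[int]] = [None] * lines
        # Corresponds to data storage 50: one cache line per index.
        self.data: List[List[int]] = [[0] * (1 << M_WORD) for _ in range(lines)]

    def fill(self, tag: int, index: int, words: Sequence[int]) -> None:
        """Hypothetical helper that loads one cache line."""
        self.tags[index] = tag
        self.data[index] = list(words)

    def access(
        self, requests: Sequence[Tuple[int, int, int]]
    ) -> List[Tuple[bool, Optional[int]]]:
        """Take one (Atag, Aind, Aword) triple per port.

        Returns one (hit, data word) pair per port; the data word is
        None on a cache miss.
        """
        results = []
        for tag, index, word in requests:
            # Role of cache-hit comparing circuit 30.
            hit = self.tags[index] == tag
            results.append((hit, self.data[index][word] if hit else None))
        return results


cache = DirectMappedCache()
cache.fill(tag=7, index=1, words=[10, 11])
# Two ports access the same line index with different tags.
print(cache.access([(7, 1, 1), (3, 1, 0)]))
```

In this model the first port hits and reads the second word of the line, while the second port presents a tag that does not match the stored one and is treated as a cache miss.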
Incidentally, in the multi-port cache memory shown in FIG. 1, the tag memory and the data memory are separated from each other. However, it is possible to combine the tag memory and the data memory into one tag-data memory.
An example of a multi-port cache memory of a 2-way set-associative scheme will now be described with reference to FIG. 2. The multi-port cache memory of the 2-way set-associative scheme is an extension of the direct-map scheme described above.
The multi-port cache memory shown in FIG. 2 comprises, on the side of the tag, N-port decoders 10, 10a and tag storages 20, 20a, which form 2 tag memories, cache hit comparing circuits 30, 30a, and OR gates 70 receiving the results of comparison. On the side of the data, it comprises N-port decoders 40, 40a and data storages 50, 50a, which form 2 data memories, and data enable circuits 80, 80a. Each of the tag storages 20, 20a and the data storages 50, 50a is formed from multi-port storage cells.
The multi-port cache memory of the 2-way set-associative scheme shown in FIG. 2 performs functions similar to those performed by the multi-port cache memory of the direct-map scheme shown in FIG. 1, with two additions. The OR gates 70 transmit cache hit signals upon receipt of the results of comparison performed in the cache hit comparing circuits 30, 30a. The data enable circuits 80, 80a permit the data words Dn to be transmitted between the data bus and the data memories upon receipt of the same results of comparison. The corresponding components of the two multi-port cache memories are denoted by the same reference numerals so as to avoid an overlapping description.
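The roles of the OR gates 70 and the data enable circuits 80, 80a can be sketched for a single port as follows. The names `Way` and `access_two_way` and the field widths are hypothetical; the sketch models read accesses only and assumes that a given tag is stored in at most one way per line index.

```python
from typing import List, Optional, Tuple

# Hypothetical widths: 2^M_IND lines per way, 2^M_WORD data words per line.
M_IND, M_WORD = 2, 1


class Way:
    """One tag memory plus one data memory (one of the two ways)."""

    def __init__(self) -> None:
        lines = 1 << M_IND
        self.tags: List[Optional[int]] = [None] * lines
        self.data: List[List[int]] = [[0] * (1 << M_WORD) for _ in range(lines)]


def access_two_way(
    ways: List[Way], tag: int, index: int, word: int
) -> Tuple[bool, Optional[int]]:
    """Read access of one port to a 2-way set-associative cache."""
    # Role of cache hit comparing circuits 30, 30a: one comparison per way.
    hits = [way.tags[index] == tag for way in ways]
    # Role of OR gate 70: a hit in either way is a cache hit.
    cache_hit = any(hits)
    # Role of data enable circuits 80, 80a: only the way that hit
    # is allowed to drive the data word.
    data = None
    for way, enable in zip(ways, hits):
        if enable:
            data = way.data[index][word]
    return cache_hit, data


ways = [Way(), Way()]
ways[1].tags[2] = 9
ways[1].data[2] = [40, 41]
print(access_two_way(ways, tag=9, index=2, word=0))  # hit in the second way
print(access_two_way(ways, tag=8, index=2, word=0))  # miss in both ways
```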
FIG. 3 shows the division of the address bits for the access of a port to the cache memory into the tag Atag, the cache line index Aind, the cache line offset Aword, and the byte offset Abyte.
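The division of an address into these four fields can be sketched as below. The field widths are hypothetical, since no numeric values are given here, and the sketch assumes the fields are packed in the order tag, cache line index, cache line offset, byte offset from the most significant bit to the least significant bit.

```python
from typing import NamedTuple

# Hypothetical field widths in bits (mtag, mind, mword, and a byte-offset
# width); together they make up a 32-bit address in this sketch.
M_TAG, M_IND, M_WORD, M_BYTE = 18, 9, 3, 2


class PortAddress(NamedTuple):
    tag: int    # Atag
    index: int  # Aind
    word: int   # Aword
    byte: int   # Abyte


def split_address(addr: int) -> PortAddress:
    """Split an address into (Atag, Aind, Aword, Abyte).

    Assumes the layout tag | index | word | byte from MSB to LSB.
    """
    byte = addr & ((1 << M_BYTE) - 1)
    addr >>= M_BYTE
    word = addr & ((1 << M_WORD) - 1)
    addr >>= M_WORD
    index = addr & ((1 << M_IND) - 1)
    addr >>= M_IND
    tag = addr & ((1 << M_TAG) - 1)
    return PortAddress(tag, index, word, byte)


# Build an address from known fields, then split it again.
example = (((5 << M_IND | 3) << M_WORD | 2) << M_BYTE) | 1
print(split_address(example))
```

With these assumed widths the cache would hold 2^9 = 512 lines of 2^3 = 8 data words each.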
The conventional multi-port cache memory using the multi-port storage cells described above has rarely been used in practice. The reason is as follows.
Specifically, it is necessary for the multi-port cache memory to have a large storage capacity in order to achieve a low cache miss rate. It should be noted in this connection that the area of the multi-port SRAM constructed from multi-port storage cells increases in proportion to the square of the number of ports. Therefore, if the number of ports is increased to make the multi-port SRAM adapted for use in a high performance microprocessor, the chip area of the microprocessor is markedly increased so as to give rise to the problem that the area efficiency is lowered (Electronics Letters 35, 2185-2187, (1999)).
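The quadratic dependence on the number of ports can be made concrete with a short calculation. This is only an illustration: the function name is hypothetical, the proportionality constant is taken as 1, and the area of decoders and other peripheral circuits is ignored.

```python
def relative_cell_area(n_ports: int) -> int:
    """Area of an N-port storage array relative to a 1-port array of the
    same capacity, assuming the area scales exactly as N squared."""
    if n_ports < 1:
        raise ValueError("number of ports must be at least 1")
    return n_ports ** 2


for n in (1, 2, 4, 8):
    print(f"{n} ports -> {relative_cell_area(n)}x the 1-port area")
```

Under this assumption, doubling the number of ports quadruples the storage area for the same capacity.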
Also, the reason why the multi-port cache memory was not used in the past can be summarized as follows:
(1) In the conventional general purpose microprocessor, the bandwidth required for the transmission of instructions and data between the cache memory and the processor core is small, with the result that a one-port cache was capable of achieving its objective. On the other hand, if it is necessary to double the bandwidth in a higher performance microprocessor, a one-port cache can be divided into, for example, a portion for transmitting the program instructions and another portion for transmitting the data for the execution of the program instructions. This division, however, comes at the penalty of a higher cache miss rate.
(2) As described above, the chip area is markedly increased in the conventional multi-port cache memory comprising multi-port storage cells as constituents. Therefore, it is highly uneconomical to prepare a multi-port cache memory of a large storage capacity in order to achieve a low cache miss rate.
(3) For forming a multi-port cache memory, complex wiring is required for transmitting a large number of port addresses and data. Therefore, if a multi-port cache memory having a large area due to the construction from multi-port SRAM cells is formed on a chip separately from the processor core for achieving a hybrid integration on a printed circuit board, the number of process steps is increased because the complex wiring must be formed on the printed circuit board, which is uneconomical.
For avoiding the complexity of the wiring on the printed circuit board, it is desirable for the processor core and the multi-port cache memory to be integrated on the same chip. In this case, however, the problem of the chip area is rendered more serious.
In recent microprocessors, it is possible to execute a plurality of instructions for each clock cycle as in, for example, Pentium II and III by Intel Inc. Such being the situation, it has become an important objective in recent years to increase the number of ports for coping with the large cache access bandwidth while developing a multi-port cache memory having a small chip area.
As described above, in a conventional multi-port cache memory constructed from multi-port SRAM cells, the area is increased in proportion to the square of the number of ports. Therefore, if the number of ports is increased, the chip area of the microprocessor is markedly increased so as to give rise to the problem that the area efficiency is lowered.