1. Field of the Invention
The present invention relates to a cache memory system, and more specifically to a cache memory system including a plurality of direct-mapped cache memories organized in a hierarchical structure.
2. Description of Related Art
a first cache memory and a second cache memory, each of the cache memories including a data memory section for storing data, a tag memory section for storing an address tag for data stored in the data memory section, a comparator for comparing an output of the tag memory section with the address tag of a given address, and a hit generator for generating a hit signal on the basis of an output of the comparator; and
means for controlling the first cache memory and the second cache memory in such a manner that, in response to an external access, the first cache memory is accessed first; if the hit signal is not generated in the first cache memory, the second cache memory is accessed; and if the hit signal is not generated in the second cache memory, an external memory is accessed.
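The access sequence recited above (first cache, then second cache, then external memory) can be sketched as follows. This is an illustrative behavioural model, not the patented circuit: the class name `DirectMappedCache`, the helper `read`, the 16-byte block size and the use of `None` as an "invalid" tag are assumptions introduced here. How blocks are filled or moved between the two caches is not stated in this passage, so `fill` is only a hypothetical test helper.

```python
from typing import Callable, List, Optional, Tuple


class DirectMappedCache:
    """Hypothetical model of one cache: data section, tag section, comparator, hit generator."""

    def __init__(self, num_entries: int, block_size: int = 16):
        self.num_entries = num_entries
        self.block_size = block_size
        self.tags: List[Optional[int]] = [None] * num_entries  # tag memory section (None = invalid)
        self.data: List[object] = [None] * num_entries         # data memory section

    def _split(self, addr: int) -> Tuple[int, int]:
        block = addr // self.block_size
        return block % self.num_entries, block // self.num_entries  # (index, tag)

    def lookup(self, addr: int) -> Tuple[bool, object]:
        index, tag = self._split(addr)
        hit = self.tags[index] == tag  # comparator output drives the hit signal
        return hit, (self.data[index] if hit else None)

    def fill(self, addr: int, value: object) -> None:
        # Assumed helper for loading an entry; the fill policy is not described in the text.
        index, tag = self._split(addr)
        self.tags[index], self.data[index] = tag, value


def read(first: DirectMappedCache, second: DirectMappedCache,
         external_read: Callable[[int], object], addr: int) -> Tuple[str, object]:
    """Access the first cache, then the second on a miss, then external memory."""
    hit, value = first.lookup(addr)
    if hit:
        return "first", value
    hit, value = second.lookup(addr)
    if hit:
        return "second", value
    return "external", external_read(addr)
```

For example, after `second.fill(0x40, "B")`, a call `read(first, second, mem, 0x40)` on an empty `first` returns `("second", "B")`.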
With recent advances in semiconductor devices, the clock frequency of microprocessors continues to increase, but the access time of the DRAM (dynamic random access memory) and ROM (read only memory) that constitute the main memory has not been shortened correspondingly. To compensate for this speed gap, a cache memory composed of a small-capacity, high-speed memory is frequently provided between the processor and the main memory.
A fundamental architecture of the cache memory is described in detail in, for example, "Computer Architecture: A Quantitative Approach", John L. Hennessy & David A. Patterson, Morgan Kaufmann Publishers Inc., 1990.
In brief, the main memory is divided in advance into a number of blocks of equal size, ordinarily on the order of 16 bytes, and some of the blocks are stored in the cache memory. A block held in the cache memory is called an "entry". Each entry of the cache memory is composed of three sections: (1) a data memory section for storing data, (2) a tag memory section for storing information (called a "tag") indicating where in the address space the data stored in the data memory section is located, and (3) a status flag indicating whether or not valid data is stored in the entry. The meaning of this status flag may differ from system to system.
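As a minimal sketch of the entry layout just described, the code below models the three sections of an entry and splits a byte address into tag, index and block offset. The 16-byte block matches the size mentioned above; the 256-entry cache, the names `Entry` and `split_address`, and the power-of-two requirement are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Entry:
    valid: bool = False  # status flag: does this entry hold valid data?
    tag: int = 0         # tag memory section: where the block lives in the address space
    data: bytes = b""    # data memory section: the cached block itself


def split_address(addr: int, block_size: int = 16, num_entries: int = 256) -> Tuple[int, int, int]:
    """Split a byte address into (tag, index, offset); sizes are assumed powers of two."""
    offset_bits = block_size.bit_length() - 1
    index_bits = num_entries.bit_length() - 1
    offset = addr & (block_size - 1)                   # byte within the block
    index = (addr >> offset_bits) & (num_entries - 1)  # selects the entry
    tag = addr >> (offset_bits + index_bits)           # stored in the tag section
    return tag, index, offset


print([hex(x) for x in split_address(0x12345)])  # ['0x12', '0x34', '0x5']
```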
The structure of the cache memory can be divided into three types: direct mapped, set associative, and fully associative.
1 Direct mapped:
Each entry includes one set consisting of a tag section, a status flag and a data section, each formed of RAM. The RAM (tag section and data section) is addressed by the least significant bits (the index) of a given address. If the output of the tag section is equal to the most significant bits (the tag) of the given address, and the status flag indicates "valid", the data in the data section of that entry is valid; that is, the access is a hit.
2 Set associative:
There are provided N sets of direct-mapped RAMs (ordinarily two or four sets), which are accessed in parallel. If any one of the sets hits, the output of the data section of the hit set is selected.
3 Fully associative:
The tag section of each entry has its own comparator for directly comparing the given address with the contents of the tag section. Ordinarily, it is constituted of a CAM (content-addressable memory).
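The hit conditions of the three organizations above can be sketched as plain functions. This is a behavioural approximation under stated assumptions: the data section is omitted, an entry is modeled as a `(valid, tag)` pair, and the function names are hypothetical. Comparisons that hardware performs in parallel (across ways, or across every CAM entry) are modeled as sequential loops. For the fully associative case, `tag` stands for the whole block address, since there is no index.

```python
from typing import List, Optional, Tuple

Entry = Tuple[bool, int]  # (status flag "valid", stored tag); data section omitted


def direct_mapped_hit(entries: List[Entry], index: int, tag: int) -> bool:
    """Type 1: the index selects one entry; hit if it is valid and its tag matches."""
    valid, stored_tag = entries[index]
    return valid and stored_tag == tag


def set_associative_hit(ways: List[List[Entry]], index: int, tag: int) -> Optional[int]:
    """Type 2: N direct-mapped RAMs probed with the same index; return the hitting way."""
    for way_no, entries in enumerate(ways):
        if direct_mapped_hit(entries, index, tag):
            return way_no  # hardware would select this way's data section
    return None


def fully_associative_hit(entries: List[Entry], tag: int) -> Optional[int]:
    """Type 3: every entry has its own comparator (CAM); modeled here as a scan."""
    for i, (valid, stored_tag) in enumerate(entries):
        if valid and stored_tag == tag:
            return i
    return None
```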
The hit rate (number of hits / number of accesses) of the above three types is highest in the fully associative type 3 and decreases in the order of the set associative type 2 and the direct mapped type 1. However, the access time is also longest in the fully associative type 3 and becomes shorter in the order of the set associative type 2 and the direct mapped type 1. In addition, when the cache memory is implemented on an LSI (large-scale integrated circuit) chip, the required area increases in the order 1 < 2 < 3. In particular, the fully associative type is disadvantageous in that it requires a large area and complicated processing on a miss.
In recent RISC (reduced instruction set computer) CPUs, the direct mapped type has been adopted in many cases, since the achievable clock frequency is directly limited by the access time of the cache memory.
Here, FIG. 8 is a graph of the miss rate (1 minus the hit rate) of the direct mapped type 1 and the set associative type 2. The graph is based on the data shown in FIG. 8.12 of "Computer Architecture: A Quantitative Approach" cited above.
In the direct mapped type 1 and the set associative type 2, different addresses are forced to share the same entry, so misses caused by this sharing (conflict misses) occur; therefore, the miss rate is higher than in the fully associative type 3.
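A small simulation illustrates the conflict misses described above. It is a sketch, not data from FIG. 8: the function `count_misses`, the 16-byte block size, the 4-entry capacity, LRU replacement within a set and the two-address trace are all assumptions chosen to make the effect visible. The two addresses share an index in the 4-entry direct-mapped cache, so they keep evicting each other, while the 2-way and fully associative caches of the same capacity hold both.

```python
from typing import List


def count_misses(trace: List[int], num_sets: int, ways: int, block_size: int = 16) -> int:
    """Count misses for a trace of byte addresses, with LRU replacement within each set.

    ways=1 models a direct mapped cache; num_sets=1 models a fully associative one.
    """
    sets: List[List[int]] = [[] for _ in range(num_sets)]  # tags per set, LRU first
    misses = 0
    for addr in trace:
        block = addr // block_size
        index, tag = block % num_sets, block // num_sets
        s = sets[index]
        if tag in s:
            s.remove(tag)  # hit: re-append below as most recently used
        else:
            misses += 1
            if len(s) == ways:
                s.pop(0)   # set full: evict the least recently used entry
        s.append(tag)
    return misses


trace = [0x000, 0x040] * 4  # two blocks that map to the same entry when num_sets=4
print(count_misses(trace, num_sets=4, ways=1))  # 8 (direct mapped, every access misses)
print(count_misses(trace, num_sets=2, ways=2))  # 2 (2-way set associative)
print(count_misses(trace, num_sets=1, ways=4))  # 2 (fully associative)
```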
In addition, it can be seen from FIG. 8 that the larger the capacity of the cache memory, the worse its cost/performance. Each doubling of the cache capacity lowers the miss rate only by a constant factor (to 0.7 to 0.8 times its previous value), whereas the chip area required when the cache memory is implemented in an LSI is proportional to its capacity.
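The cost/performance argument can be written out, assuming the constant-factor behaviour stated above holds over several successive doublings. The symbols are introduced here for illustration: m(C) is the miss rate at capacity C, A(C) the required area, r the reduction factor per doubling, and k the number of doublings.

```latex
\begin{aligned}
m(2C) &\approx r\, m(C), \qquad A(2C) \approx 2\, A(C), \qquad 0.7 \le r \le 0.8 \\
m(2^{k} C) &\approx r^{k}\, m(C), \qquad A(2^{k} C) \approx 2^{k} A(C)
\end{aligned}
```

With r = 0.75 and k = 2, for instance, the area grows fourfold while the miss rate falls only to 0.75^2 = 0.5625 of its original value.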
Therefore, to realize the cache memory and the CPU on a single LSI chip, a cache memory system that is both area-efficient and low in miss rate is desired.
Here, another conventional cache memory system architecture is discussed. A highly efficient cache memory system, in which a direct mapped cache memory (high in access speed but low in hit rate) is supplemented with a small fully associative cache memory (low in access speed but high in hit rate), is disclosed in "Improving Direct-Mapped Cache Performance by the Addition of a Small Fully-Associative Cache and Prefetch Buffers", Norman P. Jouppi, 1990 IEEE International Symposium on Computer Architecture.
In this cache memory system, when the direct-mapped cache memory (the primary cache) misses, the fully associative cache (called a "victim cache" in the above paper) is accessed as a secondary cache. If the secondary cache also misses, the main memory is accessed. Since both caches are implemented on the same LSI chip as the CPU, a transfer between the primary cache and the secondary cache is performed at high speed (in one clock cycle). Since most of the accesses that miss in the direct-mapped cache because of conflict misses hit in the secondary cache, a secondary cache of only four to eight entries achieves the same miss-rate reduction as doubling the capacity of the primary cache.
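The primary/victim lookup order described above can be sketched as a behavioural model. Several details here are assumptions rather than statements from this document: the class name `VictimCacheSystem`, the 16-byte block size, the policy of swapping a victim-cache hit back into the primary cache while moving the displaced primary block into the victim cache (our reading of Jouppi's scheme), and LRU replacement in the victim cache. The primary cache stores whole block numbers, which is equivalent to a tag comparison because the index is fixed by the block number.

```python
from collections import OrderedDict
from typing import List, Optional


class VictimCacheSystem:
    """Hypothetical model: direct-mapped primary cache plus small fully associative victim cache."""

    def __init__(self, primary_entries: int, victim_entries: int, block_size: int = 16):
        self.n = primary_entries
        self.block_size = block_size
        self.primary: List[Optional[int]] = [None] * primary_entries  # block per entry; None = invalid
        self.victim: OrderedDict = OrderedDict()  # block numbers, least recently used first
        self.victim_entries = victim_entries
        self.stats = {"primary_hit": 0, "victim_hit": 0, "memory": 0}

    def access(self, addr: int) -> str:
        block = addr // self.block_size
        index = block % self.n
        if self.primary[index] == block:
            self.stats["primary_hit"] += 1
            return "primary_hit"
        evicted = self.primary[index]
        self.primary[index] = block  # requested block now occupies the primary entry
        if block in self.victim:
            del self.victim[block]   # secondary hit: block swaps into the primary cache
            result = "victim_hit"
        else:
            result = "memory"        # secondary miss: block comes from main memory
        if evicted is not None:      # displaced primary block becomes a victim
            self.victim[evicted] = None
            if len(self.victim) > self.victim_entries:
                self.victim.popitem(last=False)  # LRU replacement in the victim cache
        self.stats[result] += 1
        return result
```

Under these assumptions, alternating between two conflicting addresses such as 0x000 and 0x040 in a 4-entry primary cache misses to memory only on the first two accesses; every later access hits in the victim cache.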
This cache memory system is disadvantageous in that the secondary cache is of the fully associative type. Since a fully associative cache memory has a comparator for each of its entries, the required chip area is remarkably increased. In addition, the control logic that determines which entry should be replaced on a miss (ordinarily LRU, least recently used) is very complicated, and testing is difficult. On the other hand, the low access speed is not a problem, since the secondary cache is accessed only when the primary cache misses.
As mentioned above, the direct-mapped cache is high in access speed and small in required chip area, but low in hit rate. The set-associative cache is inferior to the direct-mapped cache in access speed and area efficiency, and inferior to the fully associative cache in hit rate. The fully associative cache has the highest hit rate, but the lowest access speed and a remarkably large required chip area; in addition, its control logic is complicated and testing is difficult.
Furthermore, the victim cache is excellent in that it enjoys the high hit rate of the fully associative type while maintaining the high-speed operation of the direct-mapped type. However, it still has the disadvantages of the fully associative type, namely a large required chip area and the associated complexity. Therefore, a further improvement is needed.