1. Field of the Invention
The present invention relates to a cache memory configured to perform a pipeline processing of a memory access from a processor.
2. Description of Related Art
A cache memory that uses a clock-synchronous SRAM (synchronous SRAM) and adopts a pipeline structure has been put to practical use. A cache memory having a pipeline structure is arranged between a processor and a low-speed memory and processes a memory access request from the processor through a pipeline that is divided into a plurality of process stages (see Japanese Unexamined Patent Application Publication No. 10-63575, for example). The processor that performs a memory access to the cache memory having the pipeline structure is typically a RISC (Reduced Instruction Set Computer) type microprocessor. The processor may instead be of a CISC (Complex Instruction Set Computer) type or may be a DSP (Digital Signal Processor) for performing digital signal processing such as speech processing or image processing. When the cache memory having the pipeline structure is used as a secondary cache or as a lower-order cache, the higher-order cache memory corresponds to the processor that performs a memory access to the cache memory.
It can be expected that throughput is improved by increasing the number of pipeline stages of the cache memory. On the other hand, the cache access time, which is the time required to obtain a result after the processor gives the access request to the cache memory, is increased. The number of pipeline stages of the cache memory is typically two because an increase in the cache access time is undesirable.
Separately, especially in a set associative type cache memory, a configuration is also known that reads out data by accessing only the way that is hit in response to a load request, instead of reading out the data from all the ways of the data memory, for the purpose of reducing power consumption of the cache memory.
A configuration example of the cache memory having a two-stage pipeline structure is shown in FIG. 5. A cache memory 8 shown in FIG. 5 is a four-way set associative type cache memory and is arranged between a processor 2 and a main memory 3 which is a low-speed memory. The cache memory 8 and the processor 2 are connected by an address bus 4, a data bus 5, and a WAIT signal line 7. The cache memory 8 and the main memory 3 are connected by a memory bus 6.
A data memory 10 included in the cache memory 8 is configured to store the data corresponding to a subset of the data stored in the main memory 3. A storage area of the data memory 10 is physically or logically divided into four ways. Furthermore, each way is managed in data storage units of multiple words, each called a line. The place where the data is stored in the data memory 10 is designated by decoding a lower part of an input address which is supplied from the address bus 4. More specifically, the line is designated by an index address, which is the higher-order part of the lower part of the input address, and a word position in the line is designated by a word address, which is the lowest part of the input address. An example of the input address is shown in FIG. 7. The number of bits of each of the above-described word address, the index address, and a tag address, which is arranged above the word address and the index address, is decided depending on how the number of ways of the cache memory 8, the number of lines included in one way, and the number of words included in one line are designed.
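The division of the input address into the tag address, the index address, and the word address can be illustrated by the following minimal sketch. The field widths (a 2-bit word address for four words per line and an 8-bit index address for 256 lines per way), the word-unit addressing, and the names `split_address` and `AddressFields` are assumptions chosen only for illustration; they are not taken from FIG. 7.

```python
from typing import NamedTuple

# Assumed field widths, chosen only for illustration.
WORD_BITS = 2   # 4 words per line
INDEX_BITS = 8  # 256 lines per way


class AddressFields(NamedTuple):
    tag: int    # highest part of the input address
    index: int  # selects the line within each way
    word: int   # selects the word within the line


def split_address(address: int) -> AddressFields:
    """Split a word-unit input address into tag, index, and word fields."""
    word = address & ((1 << WORD_BITS) - 1)
    index = (address >> WORD_BITS) & ((1 << INDEX_BITS) - 1)
    tag = address >> (WORD_BITS + INDEX_BITS)
    return AddressFields(tag, index, word)


if __name__ == "__main__":
    fields = split_address(0x12345)
    print(hex(fields.tag), hex(fields.index), fields.word)
```

With wider index or word fields, fewer bits remain for the tag address, which reflects the dependence of the bit allocation on the number of lines per way and the number of words per line.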
A tag memory 11 is configured to store the tag address corresponding to the data stored in each line of the data memory 10. The tag memory 11 receives the index address value included in the input address and outputs the tag address identified by decoding the index address. Because the cache memory 8 shown in FIG. 5 is a four-way type cache memory, the tag memory 11 outputs four tag addresses corresponding to the four ways in response to one input index address. The tag memory 11 has a valid flag (not shown) showing the validity of the stored tag address and a dirty flag (not shown) showing that there is a mismatch between the data stored in the data memory 10 and the data stored in the main memory 3 due to the data memory 10 being updated by a store access.
A hit decision unit 12 decides whether there is a cache hit or a miss hit by comparing the tag address included in the input address with the four tag addresses output from the tag memory 11. More specifically, the hit decision unit 12 outputs a signal indicating the cache hit when the tag address included in the input address matches one of the tag addresses output from the tag memory 11. The hit decision unit 12 outputs a signal indicating the miss hit when the tag address included in the input address matches none of the tag addresses output from the tag memory 11. The output signal of the hit decision unit 12 is a four-bit signal in which each bit indicates, as a one-bit logical value, the hit decision result for one way.
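The comparison performed by the hit decision unit 12 can be sketched as follows. The function names `hit_decision` and `is_cache_hit` are hypothetical, and gating each comparison with the valid flag of the tag memory 11 is an assumption of this sketch rather than a detail stated above.

```python
from typing import List


def hit_decision(input_tag: int, stored_tags: List[int], valid: List[bool]) -> List[int]:
    """Return one bit per way: 1 if that way holds a valid, matching tag."""
    # Assumption: an invalid entry never produces a hit.
    return [1 if v and t == input_tag else 0 for t, v in zip(stored_tags, valid)]


def is_cache_hit(hit_bits: List[int]) -> bool:
    """A cache hit is signalled when any way matches; otherwise a miss hit."""
    return any(hit_bits)


if __name__ == "__main__":
    bits = hit_decision(0x48, [0x10, 0x48, 0x48, 0x00], [True, True, False, True])
    print(bits, is_cache_hit(bits))
```

The per-way result is what allows only the hit way of the data memory to be accessed, as in the power-saving configuration mentioned earlier.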
A controller 83 controls reading out of the data from the data memory 10 by outputting a chip select signal (CS signal) and a read strobe signal (RS signal) to the data memory 10 when the hit decision result by the hit decision unit 12 is the cache hit. On the other hand, when the hit decision result by the hit decision unit 12 is the miss hit, the controller 83 controls rewriting of the tag memory 11 in order to store the tag address included in the input address in the tag memory 11 and controls data refilling of the data memory 10. The control of the data refilling means the control of reading out the data from the main memory 3 and of rewriting the data memory 10 with the data read out from the main memory 3. The controller 83 outputs a WAIT signal using the WAIT signal line 7 to notify the processor 2 that the miss hit has occurred.
An address latch 14 is a circuit for holding at least the tag address part of the input address for one clock cycle. For example, the address latch 14 can be composed of D flip-flops. The data stored in the address latch 14 is used as a data input to the tag memory 11 when the tag memory 11 is rewritten.
Referring now to FIG. 6, the behavior of the cache memory 8 is described. FIG. 6 shows the pipeline behavior of the cache memory 8 when a load request made by the processor 2 is processed. Part (a) of FIG. 6 shows the behavior when the hit decision result is the cache hit and part (b) of FIG. 6 shows the behavior when the hit decision result is the miss hit. In a first stage of the pipeline, the tag memory 11 receives the input address supplied from the processor 2 and outputs four tag addresses corresponding to the index address of the input address. Also in the same first stage, the hit decision unit 12 performs the hit decision.
When the decision result made by the hit decision unit 12 is the cache hit, the input address, the CS signal, and the RS signal are input to the data memory 10 at the end of the first stage. As shown in part (a) of FIG. 6, in a second stage just after the first stage, the data is read out from the data memory 10 and output to the processor 2. The data output from the cache memory 8 is stored in a storage area of the processor 2 such as a general register.
On the other hand, when the decision result made by the hit decision unit 12 is the miss hit, the controller 83 does not output the CS signal and the RS signal at the end of the first stage. Then, as shown in part (b) of FIG. 6, in the second stage, the controller 83 decides a replacement way and updates the tag address that is held in the tag memory 11 for the line decided as the replacement way with the new tag address included in the input address. In the same second stage, the controller 83 performs a read access to the main memory 3, and the data corresponding to the input address is read out from the main memory 3 and stored in the data memory 10. Also in the same second stage, the data read out from the main memory 3 is output to the processor 2.
As stated above, the cache memory 8 shown in FIG. 5 reads out the tag address from the tag memory 11 and performs the hit decision by the hit decision unit 12 in the first pipeline stage. When the hit decision result is the cache hit, the cache memory 8 reads out the data from the hit way of the data memory 10 and transfers the read data to the processor 2 in the second pipeline stage. On the other hand, when the hit decision result is the miss hit, the cache memory 8 decides the replacement way, updates the tag memory 11, updates the data memory 10 with the data read out from the main memory 3, and transfers the data read out from the main memory 3 to the processor 2 in the second pipeline stage.
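The two-stage load behavior summarized above can be modeled, at a purely behavioral level, by the following sketch. It is an illustration under stated assumptions, not a description of the circuit of FIG. 5: the class name `TwoStageCache`, the small default sizes, the word-unit addressing, and the round-robin choice of the replacement way are all assumptions, and store accesses, the dirty flag, and the WAIT signal are not modeled.

```python
from typing import List, Optional, Tuple


class TwoStageCache:
    """Behavioral model of a load through a two-stage pipelined cache."""

    def __init__(self, main_memory: List[int], ways: int = 4, lines: int = 4, words: int = 4):
        self.mem = main_memory
        self.ways, self.lines, self.words = ways, lines, words
        # tags[index][way]; None marks an invalid entry.
        self.tags: List[List[Optional[int]]] = [[None] * ways for _ in range(lines)]
        self.data = [[[0] * words for _ in range(ways)] for _ in range(lines)]
        self.next_victim = [0] * lines  # assumed round-robin replacement
        self.refills = 0

    def load(self, address: int) -> Tuple[int, bool]:
        """Return (data, hit) for a word-unit address."""
        word = address % self.words
        index = (address // self.words) % self.lines
        tag = address // (self.words * self.lines)

        # First stage: read the tags of all ways and perform the hit decision.
        hit_bits = [stored == tag for stored in self.tags[index]]

        # Second stage, cache hit: read the data from the hit way only.
        if any(hit_bits):
            return self.data[index][hit_bits.index(True)][word], True

        # Second stage, miss hit: decide the replacement way, update the tag,
        # refill the line from main memory, and return the refilled word.
        way = self.next_victim[index]
        self.next_victim[index] = (way + 1) % self.ways
        self.tags[index][way] = tag
        base = address - word
        self.data[index][way] = [self.mem[base + i] for i in range(self.words)]
        self.refills += 1
        return self.data[index][way][word], False


if __name__ == "__main__":
    cache = TwoStageCache([v * 10 for v in range(256)])
    print(cache.load(37), cache.load(38), cache.refills)
```

In this model the tag update completes before the next request begins, so a second access to the same line hits. The miss branch shows how much work falls into the second stage.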
However, in the cache memory having the two-stage pipeline as stated above, it is difficult to improve the operating frequency of the cache memory 8 because a large number of operations are executed in one stage, and it is therefore impossible to make the whole process sufficiently fast (that is, impossible to improve the throughput of the cache memory 8). Therefore, the present inventor tried to give the cache memory a pipeline of three or more stages, and to build a configuration in which the process of reading out the tag address from the tag memory and the process of hit decision are performed in different pipeline stages. However, as the number of pipeline stages of a cache memory is increased, the present inventor has faced a problem, described below, which prevents efficient behavior of the cache memory.
Now we assume that the miss hit occurs in one memory access request and the tag memory is updated. The problem here is that the update result of the tag memory due to the occurrence of the miss hit is not reflected in the hit decision made in response to the memory access request which is made immediately after the miss hit occurs. If this situation is left as it is, when the memory access request made immediately after the miss hit is an access to the same memory block as the memory access request in which the miss hit occurred, it is decided again that the result is the miss hit, and an unwanted data refill process is performed even though the data refilling from the low-speed memory has already been performed in response to the detection of the miss hit. Similarly, when the memory access request made immediately after the miss hit is an access to the memory block that is to be replaced by the data refill process caused by the miss hit, it is decided that the result is the cache hit even though it should be decided as the miss hit, and incorrect data is read out.
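The two failure cases described above can be reproduced with the following simplified model. It is a sketch under assumptions that are not part of the description: the cache is reduced to one tag per index address (a single way) for brevity, one request is issued per cycle, and the hypothetical parameter `stale_window` gives the number of immediately following requests whose tag read takes place before the tag update of a miss hit becomes visible. The names `simulate` and `HazardCounts` are likewise hypothetical.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class HazardCounts(NamedTuple):
    refills: int            # all data refills that were started
    duplicate_refills: int  # refills of a block whose refill is already in flight
    false_hits: int         # hits on a line that an in-flight refill will replace


def simulate(requests: List[Tuple[int, int]], stale_window: int,
             initial_tags: Optional[Dict[int, int]] = None) -> HazardCounts:
    """requests: (index, tag) per access, one access per cycle."""
    tag_memory: Dict[int, int] = dict(initial_tags or {})
    pending: List[Tuple[int, int, int]] = []  # (first request that sees it, index, tag)
    refills = duplicate_refills = false_hits = 0

    for i, (index, tag) in enumerate(requests):
        # Tag updates that have become visible are committed before the tag read.
        for _, p_index, p_tag in [p for p in pending if p[0] <= i]:
            tag_memory[p_index] = p_tag
        pending = [p for p in pending if p[0] > i]
        in_flight = [p_tag for _, p_index, p_tag in pending if p_index == index]

        if tag_memory.get(index) == tag:
            # Decided as a cache hit from the (possibly stale) tag memory.
            if in_flight and in_flight[-1] != tag:
                false_hits += 1
        else:
            # Decided as a miss hit: start a refill and schedule the tag update.
            refills += 1
            if tag in in_flight:
                duplicate_refills += 1
            pending.append((i + 1 + stale_window, index, tag))

    return HazardCounts(refills, duplicate_refills, false_hits)


if __name__ == "__main__":
    print(simulate([(1, 7), (1, 7)], stale_window=1))
    print(simulate([(1, 9), (1, 7)], stale_window=1, initial_tags={1: 7}))
```

With `stale_window=0` the model behaves like the two-stage cache, in which the update is visible to the next request. With `stale_window=1`, two back-to-back accesses to the same block produce a duplicate refill, and an access to the block being replaced is wrongly decided as a cache hit.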
As one solution for preventing the unwanted data refill action and for preventing the incorrect data from being read out from the cache memory as described above, we also examined adopting another architecture. In this architecture, when the miss hit occurs in one memory access request, the process of the subsequent memory access request is performed again from the beginning of the pipeline, in other words from the process of accessing the tag memory. This architecture is adopted in the RISC type microprocessor, for example. However, this architecture causes other problems: the hardware size is increased, and the complexity of the control section for controlling the retry sequence is increased.