Conventionally, a computer has at least one level of cache memory between a Central Processing Unit (CPU) and a main storage device to hide the access latency of the main storage device and to compensate for its insufficient throughput.
The performance of memory systems has improved more slowly than the speed and performance of CPUs. In recent years it has therefore become increasingly necessary to improve the hit rate of the cache memory and to hide cache miss latency.
As a means of solving these problems, a prefetch technique to read data anticipated to be used in the near future into a cache memory in advance is used.
Methods of realizing a prefetch can roughly be divided into two types: a software prefetch performed by software, and a hardware prefetch performed by hardware.
In the software prefetch, a compiler or a programmer explicitly inserts prefetch instructions into an instruction sequence in advance.
In the hardware prefetch, on the other hand, address patterns such as past memory access addresses and cache miss addresses are stored in a prefetch address queue, and when continuous memory access is detected from the past address patterns, an anticipated address is prefetched.
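The detection described above can be illustrated with a minimal sketch. It assumes that addresses are cache line numbers, that the queue stores the next expected line for each access, and that capacity is unlimited. The class name `PrefetchAddressQueue`, the method `access`, and the `distance` parameter are hypothetical names chosen for illustration and do not come from the cited documents.

```python
class PrefetchAddressQueue:
    """Toy model of sequential-access detection (capacity limits ignored)."""

    def __init__(self, distance=1):
        self.expected = set()     # next cache line expected for each sequence
        self.distance = distance  # assumed fixed prefetch distance, in lines

    def access(self, line):
        """Record an access; return the line to prefetch, or None."""
        if line in self.expected:
            # The access continues a recorded pattern: advance the pattern
            # and issue a prefetch ahead of the current line.
            self.expected.discard(line)
            self.expected.add(line + 1)
            return line + self.distance
        # No matching pattern: register the following line as a new pattern.
        self.expected.add(line + 1)
        return None


queue = PrefetchAddressQueue()
results = [queue.access(line) for line in [10, 11, 12, 50]]
print(results)  # [None, 12, 13, None]
```

The first access to a sequence only registers a pattern. A prefetch is issued from the second consecutive access onward, and an unrelated access (line 50 here) simply starts a new pattern.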
Conventionally, regarding the hardware prefetch, a technology to determine a stride value used for the prefetch and a technology concerning an instruction cache device that prefetches an instruction from the memory for storage in a cache are known (see, for example, Published Japanese Translation of a PCT Application No. 2006-510082, and Japanese Patent Application Laid-Open No. HEI 11-306028).
In a conventional software prefetch, prefetch instructions are inserted into an instruction sequence in advance, which allows flexible control. However, it is difficult for the software prefetch to insert a necessary prefetch instruction in accordance with dynamic behavior such as the occurrence of a cache miss or the result of an address calculation.
Further, the software prefetch has a problem in that unnecessary, redundant prefetch instructions are actually inserted. A cache line is the unit in which a prefetch is performed, and it is difficult for the software prefetch to insert only the necessary minimum of a single prefetch instruction per cache line, so redundant prefetch instructions for the same cache line are frequently inserted.
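The scale of this redundancy can be estimated with a short calculation. The sketch below assumes a loop over a contiguous, line-aligned array in which one prefetch instruction is issued per element access. The function name `prefetch_counts` and the default sizes (8-byte elements, 64-byte cache lines) are illustrative assumptions and are not taken from the source.

```python
def prefetch_counts(num_elements, element_size=8, line_size=64):
    """Return (prefetches issued, prefetches needed) for a sequential loop.

    Assumes one prefetch instruction per element access and an array that
    starts on a cache line boundary.
    """
    issued = num_elements
    total_bytes = num_elements * element_size
    needed = -(-total_bytes // line_size)  # ceiling division: lines touched
    return issued, needed


issued, needed = prefetch_counts(1024)
print(issued, needed, issued - needed)  # 1024 128 896
```

Under these assumptions, seven of every eight prefetch instructions target a cache line that has already been requested.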
In a conventional hardware prefetch, on the other hand, if a plurality of sequences of continuous access exceeding the number of entries that can be recorded as address patterns in a prefetch address queue occurs at the same time, existing entries are overwritten with new address patterns related to continuous access based on LRU (Least Recently Used) control.
However, since the plurality of sequences of continuous access occurs at the same time, overwriting may occur among the plurality of sequences, creating a problem that continuous access cannot be detected so that a hardware prefetch is not generated.
For example, FIG. 10 shows changes of address patterns held in a prefetch address queue 100 under conventional LRU control. The prefetch address queue 100 has four entries, entries 0 to 3, and can record four address patterns.
FIG. 10 also shows changes of address patterns of the prefetch address queue 100 when memory access of five different sequences (here, access addresses A to E) occurs at the same time. Here, the vertical direction in FIG. 10 represents time.
First, address patterns of sequences A to D are successively registered at times t1 to t4. Here, as a general hardware prefetch mechanism, cache line addresses A+1 to D+1 following the access addresses are registered at the entries 0 to 3 of the prefetch address queue as address patterns respectively.
That is, the address pattern A+1 is registered at the entry 0 in accordance with an occurrence of access of the sequence A at time t1. Also, the address pattern B+1 is registered at the entry 1 in accordance with an occurrence of access of the sequence B at time t2. Further, the address pattern C+1 is registered at the entry 2 in accordance with an occurrence of access of the sequence C at time t3. Then, the address pattern D+1 is registered at the entry 3 in accordance with an occurrence of access of the sequence D at time t4.
Accordingly, all the entries 0 to 3 of the prefetch address queue 100 are used.
When continuous access further proceeds and access of the subsequent sequence E occurs at time t5, the address pattern A+1 of the oldest entry 0 is overwritten by the LRU control to register the address pattern E+1 at the entry 0.
Then, when continuous access of the sequence A occurs at time t6, the address pattern A+1, which was held at the entry 0 until time t4, has already been deleted by the overwrite at time t5. Thus, even if continuous access of the sequence A occurs at this point, a prefetch request of the sequence A cannot be issued.
Instead, at this point, the address pattern B+1 at the entry 1 is overwritten with the address pattern A+2.
Thereafter, as shown in FIG. 10, as with the sequence A, even if continuous access of the sequences B to D occurs at times t7 to t9, prefetch requests of the sequences B to D cannot be issued either.
According to the conventional LRU control, as described above, if access of more sequences than the number of entries of the prefetch address queue 100 occurs at the same time, address patterns of respective sequences are mutually overwritten in the prefetch address queue 100. As a result, a problem arises in that a prefetch request cannot be issued.
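The overwriting described for FIG. 10 can be reproduced with a small simulation. This is a sketch under stated assumptions: addresses are cache line numbers, each queue entry holds the next expected line of a sequence, and the least recently registered entry is evicted when the queue is full. The function name `simulate_lru_queue` and the base line numbers are hypothetical, and the model tracks LRU order rather than the physical entry indices 0 to 3.

```python
from collections import OrderedDict


def simulate_lru_queue(accesses, num_entries=4):
    """Return the accesses that match a queue entry (a prefetch can be issued)."""
    queue = OrderedDict()  # key: expected next line; oldest entry first
    hits = []
    for line in accesses:
        if line in queue:
            # Continuous access detected: the entry is replaced by line + 1.
            del queue[line]
            hits.append(line)
        elif len(queue) >= num_entries:
            # Queue full and no match: overwrite the oldest entry.
            queue.popitem(last=False)
        queue[line + 1] = True
    return hits


# Hypothetical base line numbers for the sequences A to E.
A, B, C, D, E = 100, 200, 300, 400, 500

# Times t1 to t9 of FIG. 10: five sequences share a four-entry queue.
five = [A, B, C, D, E, A + 1, B + 1, C + 1, D + 1]
print(simulate_lru_queue(five))  # []

# With only four sequences, every second access matches an entry.
four = [A, B, C, D, A + 1, B + 1, C + 1, D + 1]
print(simulate_lru_queue(four))  # [101, 201, 301, 401]
```

With five sequences, each pattern is evicted just before the access that would have matched it, so no prefetch request is ever issued. With four sequences, the same queue detects every sequence.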
Moreover, the hardware prefetch operates according to an algorithm implemented as a circuit. Thus, the timing of performing a prefetch and the prefetch distance between the access address and the address of data to be prefetched are fixed, making control less flexible than that of the software prefetch.
Further, the hardware prefetch is conventionally invisible to compilers and programmers, and thus the hardware prefetch and the software prefetch are optimized independently. This causes a problem of less efficient optimization; for example, the same address may be prefetched by both the hardware prefetch and the software prefetch.