Current computer systems introduce a cache system to conceal a speed difference between a memory and a computing unit. A cache memory, hereinafter simply referred to as “cache”, indicates a small-capacity and high-speed memory that temporarily stores therein data frequently used. To raise performance of the overall computer system, it is required to effectively utilize the cache.
There are two approaches to making efficient use of the cache. The first approach is to keep data that is likely to be reused in the cache wherever possible once it has been stored there. The second approach is to transfer data that is likely to be used in the near future from a slow memory to a fast cache in advance. The representative technique of the latter approach is referred to as prefetch (for example, see Japanese Laid-open Patent Publication No. 07-56809 and Japanese Laid-open Patent Publication No. 11-143774).
A computer system having a cache system includes a small, high-speed memory that is located near a processor, and a large, low-speed memory that is located far from the processor. Frequently used data is kept in the memory located near the processor. Ideally, all of the memory then appears to the processor to be near and fast. Recent computer systems achieve an access time close to this ideal state through a variety of techniques. Such a hierarchical memory mechanism is referred to as a cache. A computer system having a cache system is disclosed in John L. Hennessy and David A. Patterson, Computer Architecture: A Quantitative Approach, 3rd Edition, Morgan Kaufmann Publishers, ISBN 1-55860-724-2.
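The effect of such a hierarchy on access time can be illustrated with the usual average-access-time relation, in which the average equals the cache hit time plus the miss rate multiplied by the miss penalty. The following is a minimal sketch only; the function name and the cycle counts are hypothetical values chosen for illustration and do not describe any particular processor.

```python
def average_access_time(hit_time, miss_rate, miss_penalty):
    """Average memory access time in cycles: hit time plus the expected miss cost."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_time + miss_rate * miss_penalty


# Hypothetical latencies: a 2-cycle cache and a 200-cycle main-memory penalty.
for miss_rate in (0.0, 0.02, 0.10, 1.0):
    cycles = average_access_time(2, miss_rate, 200)
    print(f"miss rate {miss_rate:.2f}: {cycles:.1f} cycles on average")
```

Under these assumed numbers, a low miss rate keeps the average close to the cache latency, which is the "near and fast" state described above.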
Moreover, the details of micro-architecture, including the out-of-order technology in a superscalar processor, are described in Mike Johnson, Superscalar Microprocessor Design, Prentice-Hall, Inc., ISBN 0138756341. The out-of-order technology dynamically responds, while the processor executes instructions, to factors such as cache misses that cannot be predicted at the time of compilation, and reorders the instructions into an optimum execution order.
Next, a prefetch technology is described. The prefetch technology is one of technologies for optimizing cache access. In general, when an instruction such as a load instruction of loading data from a memory is executed, the data is loaded from a main memory if the data is not present in the cache. This access is an extremely time-consuming process compared with a process of accessing the cache to obtain data on the cache.
When a memory address to be read is preliminarily known, necessary data can be preliminarily loaded from a main memory to a cache. Performing data transfer from a memory to a cache in parallel with normal processes allows high speed data load from the cache when the data is loaded after that. A process of preliminarily transferring data from a main memory to a cache in parallel with other processes in this manner is referred to as prefetch.
In the case of normal memory access, the execution of a memory access instruction is not completed until the data has been acquired. On the other hand, the prefetch process is performed in parallel with the execution of other instructions. Therefore, a processor continues to process the next instruction even if data acquisition is not completed. For this reason, by performing prefetch beforehand, the data can already be in the cache by the time it is needed.
If the transfer of data to a cache is completed before the data actually becomes necessary, the data is loaded from the cache at high speed. When the transfer is not yet completed, the data is loaded after waiting until the transfer is completed. In this case, although the effect is smaller than when the prefetch completes in time, the memory load still responds faster than when no prefetch instruction is used, because the memory transfer was started before the load instruction itself was processed.
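The three cases described above (no prefetch, a prefetch that completes in time, and a prefetch that is still in flight) can be expressed as a simple timing model. This is an illustrative sketch under simplifying assumptions: the function and parameter names are hypothetical, the cache hit time is ignored, and the memory latency is treated as a fixed number of cycles.

```python
def load_stall(memory_latency, prefetch_lead=None):
    """Cycles a load waits for data that was not originally in the cache.

    prefetch_lead is the number of cycles between issuing the prefetch and
    executing the load; None means that no prefetch was issued.
    """
    if prefetch_lead is None:
        return memory_latency  # ordinary miss: wait for the full transfer
    # The transfer has been running for prefetch_lead cycles already,
    # so only the remaining part (if any) is exposed to the load.
    return max(0, memory_latency - prefetch_lead)


print(load_stall(200))        # no prefetch
print(load_stall(200, 300))   # prefetch issued early enough
print(load_stall(200, 120))   # prefetch issued late, partly hidden
```

With the assumed 200-cycle latency, the late prefetch still shortens the wait, which matches the observation that a late prefetch is better than none.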
The prefetch is further explained in detail. In general, prefetch is broadly classified into two kinds, i.e., software prefetch and hardware prefetch. A performance comparison between the software prefetch and the hardware prefetch is disclosed in Tien-Fu Chen and Jean-Loup Baer, “A performance study of software and hardware data prefetching schemes”, Proc. of the 21st Annual International Symposium on Computer Architecture, 1994.
To perform software prefetch, a compiler or a programmer explicitly embeds prefetch instructions as described in David Callahan, Ken Kennedy, and Allan Porterfield, “Software prefetching”, ACM SIGARCH Computer Architecture News Volume 19, Issue 2 (April 1991) and Todd C. Mowry, Monica S. Lam, and Anoop Gupta, “Design and evaluation of a compiler algorithm for prefetching”, ACM SIGPLAN Notices Volume 27, Issue 9 (September 1992). Based on the static characteristics of a program, points at which a cache miss is likely to occur are identified, and for each such point a prefetch instruction is embedded into the program ahead of the actual memory access instruction.
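How far ahead of the memory access a prefetch instruction is placed can be sketched as follows. The sketch assumes a commonly used rule of thumb in which the prefetch distance, counted in loop iterations, is the memory latency divided by the time of one iteration, rounded up. The function names, the operation tuples, and the cycle counts are hypothetical and only model where the prefetch operations would appear in the instruction order; no real prefetch instruction is issued.

```python
import math


def prefetch_distance(memory_latency, cycles_per_iteration):
    """Number of iterations ahead to prefetch so that the latency is covered."""
    return math.ceil(memory_latency / cycles_per_iteration)


def loop_schedule(n, distance):
    """Ordered operations for a loop over n elements with embedded prefetches.

    Each iteration i first prefetches element i + distance (if it exists)
    and then loads element i. The first `distance` elements are not
    prefetched in this simplified model.
    """
    ops = []
    for i in range(n):
        if i + distance < n:
            ops.append(("prefetch", i + distance))
        ops.append(("load", i))
    return ops


d = prefetch_distance(200, 25)   # assumed 200-cycle latency, 25-cycle loop body
print(d)
print(loop_schedule(4, 2))
```

In an actual program the prefetch operation would be a processor-specific prefetch instruction emitted by the compiler or written by the programmer; the schedule above only shows its placement relative to the loads.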
In hardware prefetch, a processor implicitly performs prefetch as described in Steven P. Vanderwiel and David J. Lilja, “Data Prefetch mechanisms”, ACM Computing Surveys Volume 32, Issue 2 (June 2000) and Wei-Fen Lin, Steven K. Reinhardt, and Doug Burger, “Designing a Modern Memory Hierarchy with Hardware Prefetching”, IEEE Transactions on Computers Volume 50, Issue 11 (November 2001). The hardware prefetch is performed based on the forecast from the dynamic behavior of an application. The processor detects a consecutive stream memory access or a stride memory access performing regular continuous accesses at constant intervals and performs prefetch to perform look-ahead on these accesses.
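The detection of a stream or stride access can be modeled in software as follows. This is a minimal sketch, assuming a single access stream and a simple rule that a stride is confirmed once the same nonzero address difference is observed twice in a row; the function name, the `degree` parameter, and the confirmation rule are illustrative assumptions and are not taken from the cited references.

```python
def stride_prefetches(addresses, degree=1):
    """Return (trigger_index, prefetch_address) pairs for one access stream.

    A prefetch is issued when the difference between the current and the
    previous address equals the previously observed difference.
    """
    prefetches = []
    last_addr = None
    last_stride = None
    for i, addr in enumerate(addresses):
        if last_addr is not None:
            stride = addr - last_addr
            # Same nonzero stride twice in a row: treat the pattern as confirmed.
            if stride != 0 and stride == last_stride:
                for k in range(1, degree + 1):
                    prefetches.append((i, addr + k * stride))
            last_stride = stride
        last_addr = addr
    return prefetches


# Consecutive 64-byte lines (a stream) and a 256-byte stride with degree 2.
print(stride_prefetches([0, 64, 128, 192]))
print(stride_prefetches([0, 256, 512], degree=2))
```

A consecutive stream is simply the special case in which the stride equals the cache line size, so the same check covers both access types in this model.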
In the conventional hardware prefetch technology, prefetch is predicted and performed based on a past access tendency only for stream access or stride access to a memory. Therefore, the technology is effective for an application having a regular memory access pattern, such as scientific computation. However, there is a problem in that the prediction accuracy of hardware prefetch is low, and thus the effect of hardware prefetch is small, in a general application (particularly, an application having an irregular memory access pattern).
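The difference between regular and irregular access patterns can be illustrated by measuring how often the stride between successive accesses repeats, since a repeated stride is what a stream or stride predictor relies on. The following sketch is illustrative only: the function name is hypothetical, and the two synthetic traces (a sequential array walk and a traversal of list nodes placed in shuffled order) are assumptions standing in for a regular and an irregular application.

```python
import random


def stride_repeat_fraction(addresses):
    """Fraction of accesses whose stride equals the immediately preceding stride."""
    strides = [b - a for a, b in zip(addresses, addresses[1:])]
    if len(strides) < 2:
        return 0.0
    repeats = sum(1 for s, t in zip(strides, strides[1:]) if s == t)
    return repeats / (len(strides) - 1)


# Regular pattern: walking an array one 64-byte line at a time.
array_walk = [i * 64 for i in range(1000)]

# Irregular pattern: visiting 1000 nodes whose locations are shuffled,
# as in a pointer-linked structure. The seed only makes the run repeatable.
nodes = [i * 64 for i in range(1000)]
random.Random(0).shuffle(nodes)

print(stride_repeat_fraction(array_walk))
print(stride_repeat_fraction(nodes))
```

In this model the array walk repeats its stride on every access, whereas the shuffled traversal almost never does, so a predictor limited to stream or stride patterns has little to work with in the second case.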
Therefore, it is important to detect the irregular memory access exhibited by a general application and to perform effective hardware prefetch for it.