An operation processing device, such as a central processing unit (CPU), includes an operation processing unit that performs operations and a cache memory disposed between the operation processing unit and a main memory. The operation processing unit performs operations by referring to data stored in the main memory or the cache memory. The cache memory stores part of the data held in the main memory.
The cache memory is disposed in the same semiconductor device as the operation processing device and operates with the same clock, so referring to data in the cache memory allows the operation processing device to shorten the wait time compared with referring to data stored in the main memory. However, in numerical calculation processing that uses large-sized data, such as an array, the hit rate of the cache memory decreases when the locality of data (for example, the likelihood that data referenced once is referenced again) is low. If a cache miss occurs, the wait time for the operation processing device to refer to the data is longer than in the case of a cache hit by the time taken to transfer the data from the main memory to the cache memory.
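As a rough illustration of the locality effect described above, the following C sketch simulates a very small direct-mapped cache and counts hits and misses for two access patterns. The line size, the number of lines, and all names (ToyCache, toy_cache_access, toy_cache_sweep) are hypothetical choices made for this example and do not describe any particular processor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy parameters, not taken from any real processor. */
#define LINE_BYTES 64u  /* bytes per cache line */
#define NUM_LINES  8u   /* lines in the direct-mapped cache */

typedef struct {
    uint64_t tag[NUM_LINES];   /* line number currently held in each slot */
    int      valid[NUM_LINES]; /* nonzero once the slot has been filled */
    unsigned hits;
    unsigned misses;
} ToyCache;

static void toy_cache_reset(ToyCache *c)
{
    assert(c != NULL);
    memset(c, 0, sizeof *c);
}

/* Look up one byte address. On a miss the slot is refilled, which stands in
 * for the transfer from the main memory to the cache memory. */
static void toy_cache_access(ToyCache *c, uint64_t addr)
{
    uint64_t line  = addr / LINE_BYTES;
    size_t   index = (size_t)(line % NUM_LINES);

    if (c->valid[index] && c->tag[index] == line) {
        c->hits++;
    } else {
        c->valid[index] = 1;
        c->tag[index]   = line;
        c->misses++;
    }
}

/* Access `count` elements of `elem_bytes` bytes each, `stride` elements
 * apart, and repeat the whole sweep `passes` times. */
static void toy_cache_sweep(ToyCache *c, size_t count, size_t elem_bytes,
                            size_t stride, unsigned passes)
{
    assert(c != NULL);
    for (unsigned p = 0; p < passes; p++)
        for (size_t i = 0; i < count; i++)
            toy_cache_access(c, (uint64_t)(i * stride * elem_bytes));
}
```

Under these assumptions, sweeping 32 contiguous 8-byte elements four times gives 124 hits and 4 misses, because the data fits in four lines that stay resident. Sweeping 32 elements that are each one cache line apart (a stride of 8 elements) four times gives 128 misses and no hits, because every line is evicted before it is referenced again.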
One method of suppressing a decrease in the hit rate of the cache memory is prefetch, in which data stored in the main memory is loaded into the cache memory in advance so that the required data is already held there when it is referenced (for example, refer to Japanese Laid-open Patent Publication No. 2001-166989 and Japanese National Publication of International Patent Application No. 2006-524375). Known methods of achieving prefetch are hardware prefetch, which is performed by hardware, and software prefetch, which is performed by software.
For hardware prefetch, a method has been proposed that uses a stride prediction table storing the address interval of memory accesses executed at predetermined address intervals (hereinafter also referred to as a stride value) (for example, refer to Japanese National Publication of International Patent Application No. 2006-516168). In hardware prefetch of this kind, when instructions in a loop defined by a loop statement are unrolled, a stride prediction table having a number of entries corresponding to the number of unrolled instructions (hereinafter also referred to as the number of unrolling) is used.
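The idea of a stride prediction table can be sketched in software as follows. This is only a minimal model under stated assumptions, not the design of the cited publication: the table is indexed by the address of the load instruction, instructions are assumed to be 4 bytes long, a prediction is issued once the same stride has been observed twice in a row, and the names StrideTable, spt_reset, and spt_observe are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical size: one entry per unrolled load instruction. */
#define SPT_ENTRIES 4u

typedef struct {
    int      valid;
    uint64_t pc;        /* address of the load instruction */
    uint64_t last_addr; /* data address of its previous access */
    int64_t  stride;    /* last observed address interval */
    int      confirmed; /* 1 once the same stride was seen twice in a row */
} SptEntry;

typedef struct {
    SptEntry e[SPT_ENTRIES];
} StrideTable;

static void spt_reset(StrideTable *t)
{
    assert(t != NULL);
    memset(t, 0, sizeof *t);
}

/* Record one load. Returns 1 and writes the predicted next address to
 * *prefetch_addr when the stride is confirmed, otherwise returns 0. */
static int spt_observe(StrideTable *t, uint64_t pc, uint64_t addr,
                       uint64_t *prefetch_addr)
{
    assert(t != NULL && prefetch_addr != NULL);

    /* Assumption: 4-byte instructions, so drop the two low bits of pc. */
    SptEntry *e = &t->e[(pc >> 2) % SPT_ENTRIES];

    if (!e->valid || e->pc != pc) {
        /* First access by this instruction: allocate the entry. */
        e->valid     = 1;
        e->pc        = pc;
        e->last_addr = addr;
        e->stride    = 0;
        e->confirmed = 0;
        return 0;
    }

    int64_t stride = (int64_t)(addr - e->last_addr);
    e->confirmed = (stride != 0 && stride == e->stride);
    e->stride    = stride;
    e->last_addr = addr;

    if (!e->confirmed)
        return 0;

    *prefetch_addr = addr + (uint64_t)stride;
    return 1;
}
```

In this model, a loop unrolled twice over 8-byte elements has two load instructions that each advance by 16 bytes per iteration. Each load occupies its own entry, so the table needs at least as many entries as there are unrolled loads for their strides to be tracked independently.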
Instructions in a loop defined by a loop statement are unrolled, for example at compile time, in order to increase the execution speed of the program compared with executing the loop statement without unrolling. For software prefetch, a method has been proposed that determines the position at which a prefetch instruction is inserted in accordance with the number of unrolling (for example, refer to Japanese Laid-open Patent Publication No. 7-306790).
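To make the relationship between unrolling and software prefetch concrete, the following C sketch shows a summation loop before and after unrolling by four, with a single prefetch placed at the top of each unrolled body. It is an illustration only and is not the method of the cited publication. The unroll factor, the prefetch distance, and the function names are hypothetical, and `__builtin_prefetch` is a GCC and Clang builtin, so the sketch omits the prefetch on other compilers.

```c
#include <assert.h>
#include <stddef.h>

#define UNROLL            4   /* hypothetical number of unrolling */
#define PREFETCH_DISTANCE 16  /* hypothetical distance, in elements */

/* Original loop: one element per iteration. */
static double sum_rolled(const double *a, size_t n)
{
    assert(a != NULL || n == 0);
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Unrolled loop: four elements per iteration, one prefetch per body. */
static double sum_unrolled_prefetch(const double *a, size_t n)
{
    assert(a != NULL || n == 0);
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;

    for (; i + UNROLL <= n; i += UNROLL) {
#if defined(__GNUC__) || defined(__clang__)
        /* Request data that later iterations will read (read access,
         * high temporal locality). Skipped near the end of the array. */
        if (i + PREFETCH_DISTANCE < n)
            __builtin_prefetch(&a[i + PREFETCH_DISTANCE], 0, 3);
#endif
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }

    /* Remainder loop for element counts that are not a multiple of UNROLL. */
    for (; i < n; i++)
        s0 += a[i];

    return (s0 + s1) + (s2 + s3);
}
```

Because the unrolled version accumulates into four partial sums, its result can differ from the rolled version in the last bits for general floating-point data, although both agree exactly for small integer-valued inputs. Placing the prefetch once per unrolled body, rather than once per element, is one simple way in which the insertion position depends on the number of unrolling.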