An arithmetic processing device includes, for example, a decoder which decodes commands, an arithmetic unit which executes computations based on the decoded commands, and a cache memory which is disposed between the arithmetic unit and a main memory serving as a main storage device. The arithmetic unit executes computations by referring to, for example, data stored in the main memory or the cache memory. The cache memory stores, for example, data which is referred to by the arithmetic unit.
By referring to data stored in the cache memory, the arithmetic processing device may, for example, shorten the waiting time for referring to data compared with referring to data stored in the main memory. However, numerical calculation processing which uses large-scale data such as an array has a low hit ratio in the cache memory because of the low locality of the data. In this case, the cache memory is not used effectively, and the waiting time for referring to data is shortened only slightly.
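The effect of locality on the hit ratio can be illustrated with a toy model. The following is a minimal sketch, assuming a fully associative cache with least-recently-used replacement; the function name `hit_ratio`, the cache size of 64 lines, and the 64-byte line size are illustrative assumptions and do not describe any particular device.

```python
from collections import OrderedDict

def hit_ratio(addresses, num_lines=64, line_size=64):
    """Return the hit ratio of a toy fully associative LRU cache.

    num_lines and line_size are assumed values chosen for illustration.
    """
    cache = OrderedDict()  # keys are line numbers, oldest entry first
    hits = 0
    for addr in addresses:
        line = addr // line_size
        if line in cache:
            hits += 1
            cache.move_to_end(line)        # mark the line as most recently used
        else:
            cache[line] = None             # fill the line on a miss
            if len(cache) > num_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return hits / len(addresses)

# High locality: four passes over 256 eight-byte elements (32 cache lines).
small = [i * 8 for i in range(256)] * 4
# Low locality: one pass over a large array, touching each line only once.
large = [i * 64 for i in range(4096)]

small_ratio = hit_ratio(small)
large_ratio = hit_ratio(large)
```

In this model the small working set misses only on the first touch of each line, while the large array never reuses a line, so the cache provides no benefit for it.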
One solution to the low hit ratio in the cache memory is, for example, a prefetch in which data stored in the main memory is transferred in advance to the cache memory. A software prefetch implemented by software and a hardware prefetch implemented by hardware are known as methods of implementing the prefetch.
For example, in the software prefetch, a compiler inserts, into a machine language program, a command (hereinafter referred to as a prefetch command) to transfer data stored in the main memory to the cache memory in advance. Note that the compiler executes compile processing of converting, for example, a source program into a machine language program executable by an arithmetic processing device such as a processor.
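The insertion of prefetch commands can be sketched as a pass over a toy command list. This is only an illustration under stated assumptions: the tuple format `(op, array, index)`, the function name `insert_prefetches`, and the prefetch distance of 8 elements are all hypothetical and are not taken from any actual compiler.

```python
def insert_prefetches(program, distance=8):
    """Return a copy of `program` with a prefetch command before each load.

    `program` is a list of (op, array, index) tuples (an assumed toy format).
    `distance` is an assumed number of elements to prefetch ahead.
    """
    out = []
    for op, array, index in program:
        if op == "load":
            # Request the element `distance` positions ahead of this load.
            out.append(("prefetch", array, index + distance))
        out.append((op, array, index))
    return out

program = [("load", "a", 0), ("store", "b", 0), ("load", "a", 1)]
result = insert_prefetches(program)
```

Each load in the toy program is now preceded by a prefetch command for a later element, which also shows how the number of commands grows when software prefetch is applied.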
On the other hand, in the hardware prefetch, hardware such as a hardware prefetch mechanism is provided in the arithmetic processing device. For example, upon determining that sequential memory accesses will be executed, the hardware prefetch mechanism predicts the data to be accessed next, and transfers that data from the main memory to the cache memory in advance.
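The behavior of such a mechanism can be sketched in software. The following is a minimal model, assuming that an access to the cache line immediately following the previously accessed line is treated as a sequential access; the class name `SequentialPrefetcher`, the 64-byte line size, and the one-line-ahead policy are illustrative assumptions, and actual hardware may use different detection rules.

```python
class SequentialPrefetcher:
    """Toy model of a hardware prefetch mechanism (assumed policy)."""

    def __init__(self, line_size=64):
        self.line_size = line_size  # assumed cache line size in bytes
        self.last_line = None       # line number of the previous access
        self.prefetched = []        # line numbers requested in advance

    def access(self, addr):
        line = addr // self.line_size
        # Two accesses to consecutive lines are treated as a sequential stream.
        if self.last_line is not None and line == self.last_line + 1:
            self.prefetched.append(line + 1)  # predict the next line
        self.last_line = line

prefetcher = SequentialPrefetcher()
for addr in (0, 64, 128, 4096):
    prefetcher.access(addr)
prefetched_lines = prefetcher.prefetched
```

In this run the accesses to lines 0, 1, and 2 form a sequential stream, so lines 2 and 3 are requested in advance, whereas the jump to address 4096 breaks the stream and triggers no prefetch.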
However, even when software prefetches are applied, an arithmetic processing device including a hardware prefetch mechanism may, for example, suffer lowered performance in some cases. For example, both a prefetch by the hardware prefetch mechanism and a prefetch by a prefetch command are executed on data at the same address in some cases. In other words, an unnecessary prefetch command is executed in some cases. In this case, the execution of the unnecessary prefetch command may cause a decrease in performance, such as a lowered transfer speed, due to an increase in the number of commands and an increase in the amount of data transferred on a bus.
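The overlap between the two kinds of prefetch can be made concrete with a small check. This is a hedged sketch: the function name `redundant_prefetches`, the 64-byte line size, and the sample addresses are assumptions chosen only for illustration.

```python
def redundant_prefetches(hw_lines, sw_addresses, line_size=64):
    """Return the software prefetch addresses already covered by hardware.

    hw_lines: cache line numbers fetched by the hardware prefetch mechanism.
    sw_addresses: byte addresses targeted by prefetch commands.
    """
    covered = set(hw_lines)
    return [addr for addr in sw_addresses if addr // line_size in covered]

# Assumed example: hardware has fetched lines 2 and 3, and the program
# issues prefetch commands for addresses 128, 192, and 8192.
unnecessary = redundant_prefetches([2, 3], [128, 192, 8192])
```

Here the prefetch commands for addresses 128 and 192 target lines that the hardware has already fetched, so they add commands and bus traffic without bringing in new data.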
Accordingly, a technique has been proposed for running a hardware prefetch and a software prefetch together with high efficiency to enhance the performance of the arithmetic processing device. An arithmetic processing device of this type uses, for example, a memory access command to which an indication is added informing whether or not the command is targeted for a hardware prefetch. For example, upon detecting memory access commands involving successive memory accesses during compile processing, the compiler creates a memory access command to which the indication is added. For example, the compiler does not create a prefetch command for a memory access command to which an indication is added informing that the command is targeted for a hardware prefetch.
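The compile-time decision described above can be sketched as follows. This is a toy model under stated assumptions: each memory access is described by a hypothetical `(array, stride)` pair, a stride of 1 is assumed to represent successive memory accesses, and the dictionary keys `op`, `array`, and `hw_target` are illustrative names rather than an actual command format.

```python
def compile_loop(accesses):
    """Emit toy commands for a loop body from (array, stride) descriptions.

    A stride of 1 is assumed to mean successive memory accesses, which are
    left to the hardware prefetch mechanism.
    """
    commands = []
    for array, stride in accesses:
        hw_target = (stride == 1)  # the indication added to the access command
        if not hw_target:
            # Only accesses not targeted for hardware prefetch get a
            # software prefetch command.
            commands.append({"op": "prefetch", "array": array})
        commands.append({"op": "load", "array": array, "hw_target": hw_target})
    return commands

commands = compile_loop([("a", 1), ("b", 512)])
```

In this sketch the access to `a` carries the indication and receives no prefetch command, while the strided access to `b` is preceded by one.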
The following are related prior art documents: Japanese Laid-open Patent Publication Nos. 2009-230374, 2010-244204, 2006-330813, 2011-81836, 2002-297379, and 2001-166989, and Japanese National Publication of International Patent Application No. 2011-504274.
In the case where software prefetch is applied to an arithmetic processing device including a hardware prefetch mechanism, the performance of the arithmetic processing device may be lowered due to execution of unnecessary prefetch commands. Further, even when the method of not creating a prefetch command during compile processing is used, unnecessary prefetch commands may be executed. For example, in some cases, during actual operation, the arithmetic processing device may perform a hardware prefetch for a memory access command which was determined, during the compile processing, not to be targeted for the hardware prefetch. In this case, the prefetch command created for that memory access command is executed unnecessarily.