An arithmetic processing unit such as a central processing unit (CPU) includes a cache memory that can be accessed faster than a main memory device (see, for example, International Publication Pamphlet No. WO 2009/104240 and Japanese Laid-open Patent Publication No. 2006-40090). The cache memory (also simply called a cache) is disposed between a processor core, such as a CPU core, and the main memory device, and retains a part of the information stored in the main memory device. For example, the cache memory includes a pipeline processing section that sequentially executes, in a plurality of stages, processing based on requests from the processor core.
Note that, when the cache memory has a hierarchical structure, the arithmetic processing unit includes, for example, a cache memory of a second level and a cache memory of a first level that can be accessed faster than the cache memory of the second level. In the following explanation, the cache memory of the first level and the cache memory of the second level are also referred to as a primary cache memory and a secondary cache memory, respectively. In some arithmetic processing units, the primary cache memory and the like are divided into an instruction cache memory that retains instructions and a data cache memory that retains data.
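The hierarchical lookup described above can be sketched as follows. This is a minimal illustrative model, not the actual circuit: the class name, the dictionary-based storage, and the latency values (4, 12, and 100 cycles) are all hypothetical, chosen only to reflect the property that the primary cache is accessed faster than the secondary cache, which in turn is faster than the main memory device.

```python
# Hypothetical per-level access latencies, in cycles (illustrative only).
L1_LATENCY, L2_LATENCY, MEM_LATENCY = 4, 12, 100

class TwoLevelCache:
    """Sketch of a two-level cache hierarchy in front of a main memory."""

    def __init__(self):
        self.l1 = {}  # primary cache: fastest, checked first
        self.l2 = {}  # secondary cache: slower than L1, faster than memory

    def read(self, addr, memory):
        """Return (value, cycles) for a read of addr."""
        if addr in self.l1:                      # primary cache hit
            return self.l1[addr], L1_LATENCY
        if addr in self.l2:                      # secondary cache hit
            value = self.l2[addr]
            self.l1[addr] = value                # fill the primary cache
            return value, L1_LATENCY + L2_LATENCY
        value = memory[addr]                     # miss in both levels
        self.l2[addr] = value                    # fill both levels on the
        self.l1[addr] = value                    # way back to the core
        return value, L1_LATENCY + L2_LATENCY + MEM_LATENCY
```

In this sketch, a first read of an address pays the full memory latency, while a repeated read hits the primary cache and returns in only L1_LATENCY cycles, which is the benefit the hierarchy provides.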
The number of stages of pipeline processing in the cache memory tends to increase with increases in the operating frequency and the degree of multi-threading of the arithmetic processing unit. As the number of pipeline stages increases, the penalty incurred when the pipeline processing stalls (for example, the latency of the pipeline processing) also increases.
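The relationship between pipeline depth and stall penalty can be expressed with a simple model. The function below is an illustrative assumption, not taken from the source: it supposes that a stalled request is replayed from the first stage, so the penalty grows in proportion to the number of stages (a worst-case view chosen only to show the trend).

```python
def request_latency(num_stages, stalled=False):
    """Model the latency of one request through a cache pipeline.

    Assumes one cycle per stage, and that a stall forces the request to
    be replayed from the first stage, adding a penalty equal to the
    pipeline depth (an illustrative worst case).
    """
    base = num_stages                      # normal flow: one cycle per stage
    penalty = num_stages if stalled else 0  # replay cost grows with depth
    return base + penalty
```

Under this model, a stall in a 5-stage pipeline costs 10 cycles in total, while the same stall in a 10-stage pipeline costs 20 cycles, mirroring the statement that deeper pipelines pay a larger penalty per stall.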
For example, in the instruction cache memory, processing based on requests from the processor core is executed in the order in which the processor core issues the requests. Therefore, when the input of one request among a plurality of requests to the pipeline processing section is delayed, the input of the remaining requests may be delayed as well. In this case, the latency from the issuance of a request by the processor core until the instruction cache memory returns the result of the request to the processor core increases. That is, the latency may increase even in the case of a cache hit in the instruction cache memory.
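The knock-on delay described above can be sketched with a small simulation. The function and its parameters are assumptions for illustration: requests enter the pipeline strictly in issue order, at most one per cycle, and every result returns after a fixed pipeline depth (here 5 stages, a hypothetical value).

```python
def completion_times(issue_times, depth=5):
    """Completion cycle of each request under strictly in-order input.

    issue_times[i] is the cycle at which request i becomes ready.
    Request i cannot enter the pipeline before request i-1 has entered
    (one entry per cycle), so delaying one request pushes back all of
    the requests that follow it, even if they would hit in the cache.
    """
    times, prev_entry = [], None
    for t in issue_times:
        entry = t if prev_entry is None else max(t, prev_entry + 1)
        times.append(entry + depth)  # result returns after `depth` stages
        prev_entry = entry
    return times
```

With three back-to-back requests ready at cycles 0, 1, 2, the results return at cycles 5, 6, 7. If only the first request is delayed to cycle 3, the later requests are pushed back behind it and every result returns at cycles 8, 9, 10, even though the later requests themselves were ready on time.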