A CPU (or processor), which is an arithmetic processing apparatus, has a plurality of CPU cores, primary cache memories (hereinafter referred to as “L1 caches”) provided inside the CPU cores, and secondary cache memories (hereinafter referred to as “L2 caches”) provided outside the CPU cores and shared by the plurality of CPU cores. Moreover, a CPU chip has memory access controllers that control access requests to a large-capacity main memory.
The number of CPU cores in a CPU is increased in order to improve the performance of the CPU. Progress in semiconductor miniaturization techniques enables an increase in the number of CPU cores, and the memory capacity of the L2 caches also needs to be increased in order to improve performance.
When the number of CPU cores or the cache capacity is increased in accordance with the miniaturization rate of semiconductors, the latency that depends on the distance between the CPU cores and the caches does not increase significantly. However, when the number of CPU cores or the cache capacity is increased beyond the miniaturization rate of semiconductors in order to improve performance, the distance between the CPU cores and the caches becomes relatively longer, which increases the latency between the CPU cores and the caches. The same applies to the latency between the CPU cores and the main memory. Thus, an increased number of CPU cores unexpectedly results in a cache or memory latency bottleneck, which hinders improvement of the performance of the CPU.
An arithmetic processing device is disclosed in JP2008-525902.
As a means for preventing cache or memory latency from deteriorating as a result of the use of multiple cores, it has been proposed to provide the CPU cores with one additional layer of caches between the L1 caches and the L2 caches. An object of this additional layer is to reduce the cache miss rate in the CPU cores as much as possible. However, an increased number of cache layers means an increase in the number of cache pipelines, which deteriorates the latency between the main memory and the CPU cores.
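The trade-off described above can be illustrated with the commonly used average memory access time model. The following is a minimal sketch only: the function names, the cycle counts, and the miss rates are hypothetical values chosen for illustration and are not taken from this disclosure or from any actual CPU.

```python
def average_access_time(levels, memory_latency):
    """Average access time in cycles for a cache hierarchy.

    levels: list of (hit_latency_cycles, local_miss_rate) tuples, ordered
            from the cache closest to the CPU core outward.
    memory_latency: cycles to access the main memory after the last cache.
    """
    total = 0.0
    reach = 1.0  # fraction of all accesses that reach the current level
    for hit_latency, miss_rate in levels:
        total += reach * hit_latency   # every access reaching a level pays its latency
        reach *= miss_rate             # only the misses continue to the next level
    return total + reach * memory_latency


def full_miss_latency(levels, memory_latency):
    """Cycles for an access that misses in every cache layer.

    Each layer adds its own pipeline latency on the way to the main memory.
    """
    return sum(hit_latency for hit_latency, _ in levels) + memory_latency


# Hypothetical figures (assumptions, for illustration only).
MEMORY_LATENCY = 200
TWO_LAYER = [(4, 0.10), (40, 0.30)]                 # L1, L2
THREE_LAYER = [(4, 0.10), (12, 0.50), (40, 0.40)]   # L1, added layer, L2

if __name__ == "__main__":
    for name, levels in (("two layers", TWO_LAYER), ("three layers", THREE_LAYER)):
        print(name,
              "average:", round(average_access_time(levels, MEMORY_LATENCY), 2),
              "full miss:", full_miss_latency(levels, MEMORY_LATENCY))
```

Under these assumed figures, the added layer lowers the average access time because fewer accesses travel beyond it, whereas an access that misses in every layer passes through one more cache pipeline and therefore takes longer to reach the main memory.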