An arithmetic processing device (a central processing unit (CPU) or a processor) includes a plurality of cores, a last level cache (LLC) shared by the plurality of cores, and a memory controller. For example, the cache hierarchy may include a level-1 cache (L1 cache) provided inside each core and a level-2 cache (L2 cache) provided outside the cores and shared by a plurality of cores. In this case, the L2 cache corresponds to the LLC. Alternatively, when the cache hierarchy includes an L1 cache and an L2 cache provided inside each core and a level-3 cache (L3 cache) provided outside the cores and shared by a plurality of cores, the L3 cache corresponds to the LLC.
In either hierarchy, when a cache miss occurs in the LLC, the LLC issues a fetch request to the memory controller, and the memory controller accesses a main memory to read the data and returns a data response to the LLC. The LLC registers (fills) the read data in the cache and returns the data response to the requesting core.
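The miss handling described above can be illustrated with a minimal software sketch. This is not a hardware implementation: the class names `MemoryController` and `LastLevelCache`, the method names, and the use of a dictionary in place of tags, sets, and replacement logic are all assumptions made for illustration only.

```python
class MemoryController:
    """Hypothetical memory controller; a dict stands in for main memory."""

    def __init__(self, main_memory):
        self.main_memory = main_memory
        self.fetch_count = 0  # number of fetch requests received from the LLC

    def fetch(self, address):
        # Access main memory and return the data response to the LLC.
        self.fetch_count += 1
        return self.main_memory[address]


class LastLevelCache:
    """Hypothetical LLC model; capacity limits and eviction are omitted."""

    def __init__(self, memory_controller):
        self.memory_controller = memory_controller
        self.lines = {}  # address -> data currently registered in the cache

    def read(self, address):
        """Return (data, hit) for a read request issued by a core."""
        if address in self.lines:
            return self.lines[address], True  # LLC hit
        # LLC miss: issue a fetch request to the memory controller.
        data = self.memory_controller.fetch(address)
        # Register (fill) the read data, then respond to the core.
        self.lines[address] = data
        return data, False


if __name__ == "__main__":
    controller = MemoryController({0x40: "A"})
    llc = LastLevelCache(controller)
    print(llc.read(0x40))  # first access misses and fills the line
    print(llc.read(0x40))  # second access hits in the LLC
```

In this sketch, only the first access to an address reaches the memory controller; later accesses to the same address are served from the filled line.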
Cache capacity tends to increase. As process miniaturization advances, the number of cores integrated on a chip increases. Moreover, as the number of cores (threads) increases, the associativity (the number of ways) of a set-associative cache also increases. As a result, the capacity of an LLC shared by a plurality of cores increases as well. Thus, the chip size of high-end processor chips tends to grow as performance improves, despite the reduction in area that miniaturization provides.
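The relation between associativity and capacity can be made concrete with a short calculation. The capacity of a set-associative cache is the product of the number of sets, the number of ways, and the line size. The function name and the numeric values below are hypothetical and are not taken from any particular processor.

```python
def cache_capacity_bytes(num_sets, num_ways, line_bytes):
    """Capacity of a set-associative cache: sets x ways x line size."""
    return num_sets * num_ways * line_bytes


if __name__ == "__main__":
    # Hypothetical parameters: 2048 sets, 64-byte lines.
    for ways in (16, 32):
        capacity = cache_capacity_bytes(2048, ways, 64)
        print(ways, "ways:", capacity // (1024 * 1024), "MiB")
```

With the number of sets and the line size held fixed, doubling the number of ways doubles the capacity, which is the growth trend described above.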
Under these circumstances, when a processor having many cores employs an LLC configuration in which all cores can access the LLC equally, the large chip size and the large LLC capacity lengthen the data access path to the LLC, and the hit latency of the LLC increases.
Thus, instead of a single-LLC configuration in which one LLC is shared by all cores, a configuration has been proposed in which the LLC is divided into a plurality of LLCs and each of a plurality of core groups shares a respective one of the divided LLCs. In such a configuration, the LLC shared by each core group has a small capacity, the physical distance from a core in the core group to its LLC is short, and the control is simple. Thus, high-speed access can be realized. That is, the LLC hit latency in a configuration including a plurality of clusters, in each of which a limited number of cores share a small-capacity LLC, is smaller than that in the large-capacity, single-LLC configuration in which the LLC can be accessed equally from all cores. In the clustered configuration, the LLC performs best when little cached data is shared between clusters.
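The effect of inter-cluster sharing can be illustrated with a simple first-order model. The sketch below assumes that cores are assigned to clusters in contiguous groups and that an access either hits the local LLC or must obtain data associated with another cluster at a higher cost. The function names, the linear latency model, and all cycle counts are hypothetical and serve only to show the trend.

```python
def cluster_of(core_id, cores_per_cluster):
    """Map a core to its cluster, assuming contiguous core numbering."""
    return core_id // cores_per_cluster


def average_llc_latency(local_hit_cycles, remote_cycles, shared_fraction):
    """Weighted average latency for a given fraction of inter-cluster accesses."""
    if not 0.0 <= shared_fraction <= 1.0:
        raise ValueError("shared_fraction must be between 0 and 1")
    return (1.0 - shared_fraction) * local_hit_cycles + shared_fraction * remote_cycles


if __name__ == "__main__":
    # Hypothetical latencies: 20 cycles locally, 60 cycles across clusters.
    for fraction in (0.0, 0.25, 0.5):
        print(fraction, average_llc_latency(20, 60, fraction))
```

Under these assumptions, the average latency equals the local hit latency when no data is shared between clusters and rises as the shared fraction grows.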
Japanese Laid-open Patent Publication No. H8-137749 discloses a technique of dynamically changing the cache capacity allocated to multiprocessors.