An operation processing device such as a central processing unit (CPU) includes a cache memory that is accessible at a higher speed than a main memory device (for example, refer to Japanese Laid-open Patent Publication No. 2004-199677). The cache memory is provided between the main memory device and a processor core (an operation processing unit such as a CPU core), and retains a part of the data stored in the main memory device.
In a case where the cache memory has a hierarchical structure, the operation processing device includes, for example, a level two cache memory and a level one cache memory that is accessible at a higher speed than the level two cache memory. Hereinafter, the level one cache memory and the level two cache memory will be referred to as a primary cache memory and a secondary cache memory, respectively.
An operation processing device including processor cores as a plurality of operation processing units includes, for example, primary cache memories provided in correspondence with the processor cores and a secondary cache memory shared by the plurality of processor cores. Hereinafter, the secondary cache memory shared by the plurality of processor cores will be referred to as a shared cache memory.
The storage capacity of the shared cache memory is larger than the storage capacity of the primary cache memory. The shared cache memory retains a part of data stored in the main memory device, and the primary cache memory retains a part of data retained in the shared cache memory. The shared cache memory retains management information for management of data retained in each primary cache memory.
The shared cache memory is accessed in a case where access target data is not retained in the primary cache memory (in a case where a cache miss occurs in the primary cache memory). The primary cache memory, in a case where a cache miss occurs, transfers a read request from the processor core to the shared cache memory. The shared cache memory transfers data specified by the read request to the processor core through the primary cache memory in a case where the shared cache memory retains data specified by the read request (in a case where a cache hit occurs in the shared cache memory). The processor core uses the data received from the shared cache memory in operation processing and the like. The primary cache memory retains the data received from the shared cache memory. Accordingly, the data specified by the read request is registered in the primary cache memory.
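The read path described above can be sketched as follows. This is a minimal illustrative model, not the device's actual logic: the function name `read`, the use of plain dictionaries for the two cache levels, and the returned hit-location label are all assumptions introduced here. Capacity limits and replacement are omitted, and a miss in the shared cache memory is not modeled.

```python
def read(address, primary, shared):
    """Serve a read request from the processor core (hypothetical model).

    primary, shared: dicts mapping an address to the data retained at that level.
    Returns (data, level), where level names the cache in which the hit occurred.
    """
    if address in primary:
        # Cache hit in the primary cache memory: the shared cache is not accessed.
        return primary[address], "primary"
    if address in shared:
        # Cache miss in the primary cache, cache hit in the shared cache.
        data = shared[address]
        primary[address] = data  # the data is registered in the primary cache
        return data, "shared"
    # A miss in the shared cache would go to the main memory device (not modeled).
    raise LookupError(f"address {address:#x} is not retained in the shared cache")
```

In this sketch, a second read of the same address hits in the primary cache, because the first read registered the data there.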
Before registering the data received from the shared cache memory, the primary cache memory performs replacement and, when the replacement is completed, notifies the shared cache memory of the completion. The replacement is a process of evicting some of the data retained in the primary cache memory (for example, the data that has gone unused for the longest time) in a case where there is no region in which to register the data received from the shared cache memory. Hereinafter, the data targeted by the replacement will be referred to as replaced data.
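As one illustration of the replacement described above, the following sketch models a primary cache that evicts the least recently used data when no free region remains. The class name `PrimaryCache`, its methods, and the fully associative organization are assumptions made for illustration; the text gives least-recently-used eviction only as an example policy.

```python
from collections import OrderedDict


class PrimaryCache:
    """Toy primary cache with least-recently-used replacement (hypothetical model)."""

    def __init__(self, num_lines):
        self.num_lines = num_lines
        self.lines = OrderedDict()  # address -> data, least recently used first

    def touch(self, address):
        """Record a use of a retained line by the processor core."""
        self.lines.move_to_end(address)

    def register(self, address, data):
        """Register data received from the shared cache.

        Returns the address of the replaced data, or None if a free region existed.
        """
        replaced = None
        if address not in self.lines and len(self.lines) >= self.num_lines:
            # No free region: evict the data that has gone unused the longest.
            replaced, _ = self.lines.popitem(last=False)
        self.lines[address] = data
        self.lines.move_to_end(address)
        return replaced
```

The returned address stands in for the completion notice sent to the shared cache memory, which then decides how to handle the replaced data.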
The shared cache memory performs a process related to the completion of the replacement. For example, in a case where the replaced data has been updated by a store operation of the processor core, the shared cache memory performs a write-back, in which the replaced data is written back from the primary cache memory to the shared cache memory. Meanwhile, in a case where the processor core has not performed a store operation on the replaced data, the shared cache memory invalidates the information related to the replaced data in the management information of the primary cache memory (hereinafter referred to as invalidation). No write-back of the replaced data from the primary cache memory to the shared cache memory is performed in the invalidation.
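The choice between a write-back and the invalidation can be sketched as below. The function name `complete_replacement`, the `dirty` flag (true when the replaced data was updated by a store operation), and the representation of the management information as a set of addresses are hypothetical. Clearing the management entry after a write-back is also an assumption; the text states it explicitly only for the invalidation.

```python
def complete_replacement(replaced_address, dirty, primary_data, shared_data, directory):
    """Handle a replacement-completion notice in the shared cache (hypothetical model).

    dirty:        True if the replaced data was updated by a store operation.
    primary_data: the replaced data as held by the primary cache.
    shared_data:  dict mapping an address to the data retained in the shared cache.
    directory:    set of addresses recorded as retained in the primary cache
                  (stands in for the management information).
    """
    if dirty:
        # Updated data: write it back from the primary cache to the shared cache.
        shared_data[replaced_address] = primary_data
        action = "write-back"
    else:
        # Unmodified data: no data is written back.
        action = "invalidation"
    # Assumption: the management entry is cleared in both cases, because the
    # replaced data is no longer retained in the primary cache.
    directory.discard(replaced_address)
    return action
```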
For one read request, the shared cache memory performs the invalidation or the like in addition to transferring the data specified by the read request. Therefore, in a shared cache memory that performs each process, such as a data transfer, a write-back, or the invalidation, in one cycle, data transfer to the primary cache memory based on read requests occurs at most once every two cycles (details will be described later). Thus, in a case where, for example, the upper limit of the amount of data transferred in one cycle is 128 bytes, the upper limit of the throughput of the shared cache memory is 64 bytes/cycle (128 bytes every two cycles).
In a case where each processor core transfers data at 16 bytes/cycle, a shared cache memory having a throughput of 64 bytes/cycle for read requests may be shared by at most four processor cores (64 / 16 = 4). Improving the throughput of the shared cache memory increases the number of processor cores that may share it.
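The arithmetic behind these figures can be checked with a short calculation. The constant names are introduced here for illustration; the numeric values are the example values given in the text.

```python
BYTES_PER_TRANSFER = 128     # upper limit of data transferred in one cycle
CYCLES_PER_READ_REQUEST = 2  # one cycle for the data transfer, one for the invalidation or the like
BYTES_PER_CYCLE_PER_CORE = 16  # amount of data transfer by each processor core

# Upper limit of the shared cache throughput, in bytes/cycle.
throughput = BYTES_PER_TRANSFER / CYCLES_PER_READ_REQUEST

# Largest number of processor cores the shared cache can serve at that rate.
max_cores = int(throughput // BYTES_PER_CYCLE_PER_CORE)

print(throughput, max_cores)
```

With these values, the throughput is 64 bytes/cycle and the number of cores is four. If each read request occupied the shared cache memory for a single cycle, the same calculation would give 128 bytes/cycle and eight cores, which is why the second cycle per request is the limiting factor.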
In one aspect, an object of an operation processing device and a method for controlling the operation processing device of the present disclosure is to improve the throughput of a shared cache memory.