Some related information processing apparatuses use Non-Uniform Memory Access (NUMA) technology. In an information processing apparatus that uses NUMA technology, each memory is connected to its own arithmetic processing unit (hereinafter referred to as a central processing unit (CPU)), and the CPUs are connected to one another so that the memories are shared among the CPUs.
When one of the CPUs that share a memory reads new data from a main memory, i.e., a main storage, as a result of executing a program, the CPU registers the data in a cache memory that is built into the CPU. At this point, if there is no free space in the cache memory, the CPU performs a replacement process by selecting an entry that was registered in the past and replacing the data in the selected entry with the newly read data.
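The replacement process described above can be illustrated by a minimal sketch. The class name `TinyCache`, its method `register`, and the choice of the least recently used entry as the entry to be replaced are assumptions made for illustration only; the description above does not specify how the entry is selected.

```python
from collections import OrderedDict


class TinyCache:
    """Fixed-capacity cache. Victim selection (least recently used) is an assumption."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # address -> data, oldest entry first

    def register(self, address, data):
        """Register data read from main memory.

        Returns the replaced (address, data) pair, or None if no entry was replaced.
        """
        victim = None
        if address in self.entries:
            # Already cached: refresh its position instead of replacing anything.
            self.entries.move_to_end(address)
        elif len(self.entries) >= self.capacity:
            # No free space: select an entry registered in the past and evict it.
            victim = self.entries.popitem(last=False)
        self.entries[address] = data
        return victim
```

In this sketch, the returned pair is the data that the CPU then either writes back or discards, depending on its cache state.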
Furthermore, when the CPU executes the replacement process, the CPU either discards the data or performs a write back, in which the data is written back to the main memory, depending on the cache state, which indicates the consistency of data shared among the multiple CPUs. For example, if the cache state of the data is dirty, indicating that the data has been changed, the CPU requests the CPU that is connected to the main memory that retains the data to write back the data. In contrast, if the cache state of the data is clean, indicating that the data has not been changed, the CPU notifies the CPU that is connected to the memory that retains the data that the data is to be discarded.
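The choice between a write back and a discard can be sketched as follows. The four state names follow the M, E, S, and I labels that appear later in this description; treating only M as dirty is an assumption of this sketch, and the function name and message strings are hypothetical.

```python
from enum import Enum


class CacheState(Enum):
    MODIFIED = "M"   # dirty: the data has been changed
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


def replacement_message(state):
    """Return the message the replacing CPU sends to the CPU connected to the memory."""
    if state is CacheState.MODIFIED:
        # Dirty data must reach the main memory before the entry is reused.
        return "write_back_request"
    # Clean (or already invalid) data can simply be dropped.
    return "discard_notification"
```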
Furthermore, there may be a case in which a fetch access from another CPU to data to be replaced occurs at the same time as the CPU executes the replacement process. In such a case, the CPU connected to the main memory that retains the data arbitrates the order of the replacement process and the fetch access to the data that is to be replaced.
A process for arbitrating the order of the replacement process and the fetch access to the data that is to be replaced will be described with reference to FIGS. 17 and 18. In a description below, a CPU that is the request source of the replacement process is represented by a Local (L)-CPU; a CPU that is connected to a memory that retains data to be subjected to the replacement process is represented by a Home (H)-CPU; and a CPU that requests the fetch access to the data to be replaced is represented by a Remote (R)-CPU.
FIG. 17 is a sequence diagram illustrating an example of the operation of a conventional replacement process. FIG. 17 illustrates a case in which the replacement process precedes the fetch access to the data to be replaced. First, when the replacement process occurs in the L-CPU, the L-CPU issues a replacement request to the H-CPU that corresponds to the target address (Step S901). Then, when the replacement process is established in the H-CPU, the H-CPU issues a start authorization to the L-CPU (Step S902).
Furthermore, after issuing the start authorization to the L-CPU, the H-CPU receives, from the R-CPU, a fetch access request for the data to be replaced (Step S903). At this point, the H-CPU makes the fetch access wait until the replacement process has been completed (Step S904).
After receiving the start authorization, the L-CPU checks the cache state of the data to be replaced at this time. In this case, because the cache state of the data is dirty, as represented by M (Modified), the L-CPU issues a write back request to which the data is attached (Step S905).
Then, the H-CPU executes the write back of the data received from the L-CPU, updates directory information to “L=I”, indicating that the L-CPU does not retain the data, and responds to the L-CPU by informing it of the completion of the replacement process (Step S906). At this time, the L-CPU does not have the data to be replaced. Then, the H-CPU updates the directory information to “R=E”, indicating that the R-CPU retains data that may possibly be dirty, and responds to the R-CPU by sending fetch data (Step S907).
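The ordering of FIG. 17 can be traced with a small model of the H-CPU. This is only a sketch under stated assumptions: the class `HomeCpu`, its method names, the starting directory contents, and the log strings are hypothetical, and message transport between CPUs is reduced to direct method calls.

```python
class HomeCpu:
    """Hypothetical H-CPU model for the case where the replacement comes first."""

    def __init__(self):
        self.directory = {"L": "M", "R": "I"}  # assumed starting directory
        self.memory = None
        self.replacing = False
        self.waiting_fetches = []
        self.log = []

    def replacement_request(self, source):
        # S901/S902: the replacement is established and authorized.
        self.replacing = True
        self.log.append("S902 start authorization -> " + source)

    def fetch_request(self, source):
        # S903/S904: a fetch to the same data waits while a replacement is active.
        if self.replacing:
            self.waiting_fetches.append(source)
            self.log.append("S904 fetch from " + source + " waits")
        else:
            self._serve_fetch(source)

    def write_back(self, source, data):
        # S905/S906: store the data, mark the source invalid, report completion.
        self.memory = data
        self.directory[source] = "I"
        self.replacing = False
        self.log.append("S906 completion -> " + source)
        while self.waiting_fetches:
            self._serve_fetch(self.waiting_fetches.pop(0))

    def _serve_fetch(self, source):
        # S907: the requester may now hold the data, possibly dirty later.
        self.directory[source] = "E"
        self.log.append("S907 fetch data -> " + source)
```

Calling `replacement_request("L")`, `fetch_request("R")`, and `write_back("L", data)` in that order reproduces the sequence S902, S904, S906, S907, with the directory ending at L=I and R=E.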
FIG. 18 is a sequence diagram illustrating an example of the operation of a conventional replacement process. FIG. 18 illustrates a case in which the fetch access to the data to be replaced precedes the replacement process.
The H-CPU receives, from the R-CPU, a fetch access request for the data to be replaced (Step S920) and requests the L-CPU, which retains the requested data in the cache memory, to transfer the data (Step S921). Furthermore, after the H-CPU has requested the L-CPU to perform the data transfer, the H-CPU receives a replacement request from the L-CPU (Step S922). At this point, the H-CPU makes the replacement process wait until the fetch access ends (Step S923).
The L-CPU that receives the transfer request for the data from the H-CPU instructs the H-CPU to execute the write back of the data to be replaced (Step S924) and transfers the data to be replaced to the R-CPU (Step S925). Then, the L-CPU updates the cache state of the data to be replaced to “I (Invalid)”, indicating that the cache is invalid.
Then, the H-CPU executes the write back and updates the directory information to “L=I”, indicating that the L-CPU does not retain the data, and to “R=E”, indicating that the R-CPU retains data that may possibly be dirty. After the fetch access ends, the H-CPU issues a start authorization to the L-CPU (Step S926).
The L-CPU receives the start authorization, recognizes that the cache state of the data to be replaced at this point is “I”, and thus instructs the H-CPU to discard the data (Step S927). Furthermore, the H-CPU updates the directory information to “S” indicating that the R-CPU retains clean data. Then, the H-CPU responds to the L-CPU by informing it of the completion of the replacement process (Step S928).
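The ordering of FIG. 18 can be traced in the same manner. The sketch below is an illustration only: the classes `LocalCpu` and `HomeCpu`, the function `run_fig18`, the starting directory contents, and the log strings are hypothetical. The point it shows is that the L-CPU decides between a write back and a discard from the cache state at the time the start authorization arrives, which by then is I.

```python
class LocalCpu:
    """Hypothetical L-CPU holding the data to be replaced."""

    def __init__(self):
        self.state = "M"  # the data starts out dirty

    def on_transfer_request(self):
        # S924/S925: write back to the H-CPU, send data to the R-CPU, invalidate.
        self.state = "I"
        return ["S924 write back -> H", "S925 data -> R"]

    def on_start_authorization(self):
        # S927: decide from the state as it is now, not as it was at S922.
        return "write back" if self.state == "M" else "discard"


class HomeCpu:
    """Hypothetical H-CPU that makes a late replacement wait for the fetch."""

    def __init__(self):
        self.directory = {"L": "M", "R": "I"}  # assumed starting directory
        self.fetch_active = False
        self.replacement_waiting = False
        self.log = []

    def fetch_request(self):
        # S920/S921: the fetch arrives first and is served first.
        self.fetch_active = True
        self.log.append("S921 transfer request -> L")

    def replacement_request(self):
        # S922/S923: a fetch is in progress, so the replacement waits.
        if self.fetch_active:
            self.replacement_waiting = True
            self.log.append("S923 replacement waits")
        else:
            self.log.append("start authorization -> L")

    def write_back_received(self):
        # Write back executed: L=I, R=E, then the waiting replacement may start.
        self.directory.update(L="I", R="E")
        self.fetch_active = False
        if self.replacement_waiting:
            self.replacement_waiting = False
            self.log.append("S926 start authorization -> L")

    def replacement_result(self, action):
        # S927/S928: on a discard, the directory records S for the R-CPU,
        # following the description of this step.
        if action == "discard":
            self.directory["R"] = "S"
        self.log.append("S928 completion -> L")


def run_fig18():
    """Drive both models through the fetch-first ordering."""
    l_cpu, h_cpu = LocalCpu(), HomeCpu()
    h_cpu.fetch_request()
    h_cpu.replacement_request()
    h_cpu.log.extend(l_cpu.on_transfer_request())
    h_cpu.write_back_received()
    action = l_cpu.on_start_authorization()
    h_cpu.log.append("S927 " + action + " -> H")
    h_cpu.replacement_result(action)
    return l_cpu, h_cpu
```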
As described above, in the information processing apparatus, the H-CPU arbitrates the order of the replacement process and the fetch access to the data that is to be replaced.
Patent Document 1: Japanese Laid-open Patent Publication No. 2007-048314
Patent Document 2: Japanese Laid-open Patent Publication No. 2007-199999
However, the problem with the conventional technology described above is that it increases the amount of communication traffic during the replacement process.
Specifically, the L-CPU requests the H-CPU to execute the replacement process and obtains a start authorization. Then, after receiving the start authorization, the L-CPU transmits, to the H-CPU, either a write back request or information indicating that the L-CPU discards the data, and receives a completion response from the H-CPU indicating that the replacement process is completed. In this way, two round trips of communication occur between the L-CPU and the H-CPU between the time the L-CPU requests the replacement process and the time the replacement process is completed.
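The message count can be made concrete by listing the messages of the conventional replacement process and counting the request and response pairs. The list below restates the four messages described above; the names `CONVENTIONAL_REPLACEMENT` and `round_trips` are hypothetical.

```python
# (sender, receiver, message) for one conventional replacement process
CONVENTIONAL_REPLACEMENT = [
    ("L-CPU", "H-CPU", "replacement request"),
    ("H-CPU", "L-CPU", "start authorization"),
    ("L-CPU", "H-CPU", "write back request or discard notification"),
    ("H-CPU", "L-CPU", "completion response"),
]


def round_trips(messages, initiator="L-CPU"):
    """Count request/response pairs that start at the initiator."""
    trips = 0
    awaiting_reply = False
    for sender, receiver, _ in messages:
        if sender == initiator:
            awaiting_reply = True
        elif receiver == initiator and awaiting_reply:
            trips += 1
            awaiting_reply = False
    return trips


print(round_trips(CONVENTIONAL_REPLACEMENT))  # prints 2
```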
Furthermore, in a large-scale system, there is sometimes a long physical distance between the L-CPU and the H-CPU because multiple XBs (crossbar switches) are arranged between the L-CPU, which requests the replacement, and the H-CPU. In such a case, because the communication time between the L-CPU and the H-CPU increases, a fetch access to the same address as that of a replacement process takes longer to complete.
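The effect of distance can be estimated with a rough latency model. This is a sketch under simplifying assumptions that are not part of the description above: latency grows linearly with the number of crossbar switches traversed, processing time inside each CPU is ignored, and the per-hop figure used in the example is an arbitrary illustrative value. All function and parameter names are hypothetical.

```python
def one_way_latency_ns(crossbar_hops, per_hop_ns):
    """Assumed linear model: each crossbar switch adds a fixed delay."""
    return crossbar_hops * per_hop_ns


def replacement_latency_ns(crossbar_hops, per_hop_ns):
    """Four one-way messages: request, authorization, write back or discard, completion."""
    return 4 * one_way_latency_ns(crossbar_hops, per_hop_ns)


def fetch_wait_ns(crossbar_hops, per_hop_ns):
    """Wait for a fetch that reaches the H-CPU just after the start authorization is issued.

    The H-CPU holds the fetch until the authorization reaches the L-CPU and the
    write back or discard returns, that is, two one-way messages.
    """
    return 2 * one_way_latency_ns(crossbar_hops, per_hop_ns)


# Illustrative comparison with an assumed 50 ns per crossbar hop.
for hops in (1, 3):
    print(hops, replacement_latency_ns(hops, 50), fetch_wait_ns(hops, 50))
```

Under these assumptions, both the duration of the replacement process and the time a waiting fetch access is held scale directly with the number of crossbar switches between the L-CPU and the H-CPU.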