In a parallel computer system that is constituted so that a plurality of information processing devices (computers) is coupled to a network as data processing computers (nodes), higher performance may be obtained as the number of nodes that are coupled to the network is increased. Therefore, processing for which high performance is requested is generally executed by the parallel computer system.
The parallel computer system is a distributed-memory type computer system in which each of the nodes includes its own memory space. Therefore, each of the nodes obtains data from a further node as appropriate.
Each of the nodes includes a communication control device such as a network interface card (NIC) for communication through the network, and an arithmetic processing device such as a central processing unit (CPU). Generally, in the arithmetic processing device, a plurality of processor cores, each of which functions as a single processor, is installed, and in each of the processor cores, a cache memory is provided. Each of the processor cores reads desired data on a main memory onto the cache memory of the processor core.
In the cache memory, a plurality of cache lines is provided as a data storage area. Reading and writing of data in the cache memory is performed in a unit of the cache line.
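The cache-line granularity described above can be illustrated with a minimal sketch. The line size of 64 bytes and the helper names `line_base` and `lines_touched` are assumptions made for illustration only; they do not appear in the related art.

```python
LINE_SIZE = 64  # bytes per cache line; an assumed value for illustration


def line_base(address, line_size=LINE_SIZE):
    """Return the first address of the cache line that contains `address`."""
    return address - (address % line_size)


def lines_touched(address, length, line_size=LINE_SIZE):
    """Return the base addresses of every cache line covered by an access
    of `length` bytes starting at `address`."""
    first = line_base(address, line_size)
    last = line_base(address + length - 1, line_size)
    return list(range(first, last + 1, line_size))
```

Under these assumptions, an 8-byte access that starts at address 60 crosses a line boundary, so two whole cache lines are read or written even though only 8 bytes are requested.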
Data on the cache memory is updated as appropriate. Therefore, the data on the cache memory may not match data on the main memory or data on a further cache memory. In order to execute appropriate processing, it is desirable that appropriate data is used. Therefore, in a system environment in which a plurality of cache memories exists, such as the parallel computer system, cache coherency control is performed so that there is no conflict between contents of the plurality of cache memories. A cache coherency protocol is a protocol that is used for the cache coherency control, and as the cache coherency protocol, there are an MSI protocol, a MESI protocol, a MOESI protocol, and the like.
In the MOESI protocol, the states of the cache lines on the cache memory are classified into five states of “M” (Modified), “O” (Owned), “E” (Exclusive), “S” (Shared), and “I” (Invalid).
In the “M” state, data exists only on the cache memory of the processor core, and the content of the data does not match the content of data on the main memory. In the “E” state, data exists only on the cache memory of the processor core, and the content of the data matches the content of data on the main memory. In the “S” state, data exists on the cache memory of the processor core and on a further cache memory. In the “I” state, a cache line is invalid. In the “O” state, the content of data on the cache memory of the processor core does not match the content of data on the main memory, and the data on the cache memory of the processor core exists on a further cache memory as well. The “O” state differs from the “S” state in that the cache memory in the “O” state is responsible for the write-back in which the data is stored in the main memory. That is, among the plurality of cache memories that hold data not matching the content of data on the main memory, only a single cache memory is in the “O” state, and the other cache memories are in the “S” state.
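The five states and the two properties that distinguish them in the description above can be summarized in a short sketch. The names `State`, `is_sole_copy`, and `must_write_back` are hypothetical and are introduced only for illustration.

```python
from enum import Enum


class State(Enum):
    M = "Modified"
    O = "Owned"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"


def is_sole_copy(state):
    """True when no further cache memory holds the data ("M" and "E")."""
    return state in (State.M, State.E)


def must_write_back(state):
    """True when this cache memory holds data that does not match the main
    memory and is responsible for writing it back ("M" and "O")."""
    return state in (State.M, State.O)
```

In this sketch, a line in the “S” state never reports a write-back duty, which mirrors the statement that only the single cache memory in the “O” state performs the write-back for shared, unmatched data.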
Generally, for the parallel computer system, “multi-thread” is employed in which the nodes execute the smallest execution units of programs, which are called threads, all at the same time. Pieces of data that are stored on the main memories by the nodes are shared resources, and it is desirable that the pieces of data are synchronized. As an operation that is executed in the parallel computer system, there is an atomic operation that is not divided into a smaller operation in order to synchronize the pieces of data or perform exclusive control.
The atomic operation corresponds to a series of a plurality of operations that are used to execute simple mathematical calculation or simple logical calculation for data. While the atomic operation is being executed, the data is locked, and the atomic operation is completed before access by a further thread is allowed. Therefore, the pieces of data may be synchronized.
As the atomic operation, for example, there is “Fetch and Add”. “Fetch and Add” includes the following series of operations.
(1) Read data from the main memory onto the cache memory
(2) Add the read data and an operand
(3) Write the addition result back to the main memory
While “Fetch and Add” is being executed, a further thread (or process) is not allowed to access the data that is the read target on the main memory. As a result, the further thread is prevented from obtaining the data as it was before the addition and then overwriting the addition result.
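The three steps of “Fetch and Add” and the exclusion of a further thread can be sketched as follows. This is a software model only, assuming a lock stands in for the hardware mechanism; the class `SharedWord` and the function `run_demo` are hypothetical names used for illustration.

```python
import threading


class SharedWord:
    """Hypothetical stand-in for one piece of data on the main memory."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def fetch_and_add(self, operand):
        # The lock keeps a further thread out until all three steps finish.
        with self._lock:
            old = self._value       # (1) read the data
            new = old + operand     # (2) add the read data and the operand
            self._value = new       # (3) write the addition result back
            return old

    def load(self):
        with self._lock:
            return self._value


def run_demo(threads=4, increments=1000):
    """Let several threads increment one shared word and return the total."""
    word = SharedWord(0)

    def worker():
        for _ in range(increments):
            word.fetch_and_add(1)

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return word.load()
```

Because no thread can observe the value between steps (1) and (3), no increment is lost in this model, and the final value equals the number of threads multiplied by the number of increments per thread.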
There are various further examples of the atomic operation. For example, “Compare and Swap” is an atomic operation in which the value of an operand and the value of data on the main memory are compared with each other, and when the two values match, the data on the main memory is replaced with the value of a further operand.
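“Compare and Swap” can be modeled in the same hedged way. The class `CasWord` and its method names are hypothetical, and the lock again only stands in for whatever mechanism makes the operation indivisible in an actual system.

```python
import threading


class CasWord:
    """Hypothetical stand-in for one piece of data on the main memory."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def compare_and_swap(self, expected, new_value):
        """Replace the stored value with `new_value` only when it equals
        `expected`. Return True when the replacement was performed."""
        with self._lock:
            if self._value == expected:
                self._value = new_value
                return True
            return False

    def load(self):
        with self._lock:
            return self._value
```

A caller typically reads the current value, computes a new value, and retries `compare_and_swap` until it returns True, so that an update by a further thread in the meantime is detected rather than silently overwritten.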
Even when the atomic operation is executed between nodes, it is desirable that consistency of the cache memories (cache coherency) is kept. Therefore, in a related art, an arithmetic processing device (a processor core that is installed in the arithmetic processing device) checks the state of the cache line in which target data of the atomic operation is stored, and executes processing that corresponds to the check result. For example, when the checked state of the cache line is the “E” state or the “S” state, the arithmetic processing device causes the state to transition to the “I” state, and when the checked state of the cache line is the “M” state or the “O” state, the arithmetic processing device writes the target data back to the main memory and causes the state to transition to the “I” state. After that, the arithmetic processing device executes the atomic operation for the target data on the main memory. Due to such processing, the cache coherency may be kept.
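The state check performed in the related art before the atomic operation can be sketched as follows. This is a simplified model of a single cache line and a dictionary-based main memory; the names `CacheLine`, `prepare_for_atomic`, and `atomic_fetch_and_add` are hypothetical and are introduced only for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    M = "Modified"
    O = "Owned"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"


@dataclass
class CacheLine:
    state: State
    data: int = 0


def prepare_for_atomic(line, main_memory, address):
    """Bring the main memory up to date and invalidate the cache line."""
    if line.state in (State.M, State.O):
        # The cached data does not match the main memory: write it back.
        main_memory[address] = line.data
        line.state = State.I
    elif line.state in (State.E, State.S):
        # No write-back is performed; the line is simply invalidated.
        line.state = State.I
    # A line already in the "I" state needs no action.


def atomic_fetch_and_add(line, main_memory, address, operand):
    """Execute "Fetch and Add" on the main memory after the state check."""
    prepare_for_atomic(line, main_memory, address)
    old = main_memory[address]
    main_memory[address] = old + operand
    return old
```

For example, when the line is in the “M” state and holds 12 while the main memory still holds 10, the sketch first writes 12 back, so that the subsequent addition operates on the up-to-date value.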
Japanese National Publication of International Patent Application No. 2010-507160 and Japanese Laid-open Patent Publication No. 2008-204101 are examples of the related art.