Parallel computer systems are known in which a plurality of information processing apparatuses send data to and receive data from one another and perform arithmetic operations. One example is a parallel computer system in which a plurality of information processing apparatuses that do not share a memory space are connected to one another through an interconnection network.
Each information processing apparatus in such a parallel computer system includes a main memory, which is a main storage device that stores data for use in arithmetic operations, an arithmetic processing unit that performs the arithmetic operations, and a communication device that exchanges data for use in arithmetic operations with a different information processing apparatus. The communication device exchanges this data with the different information processing apparatus through the interconnection network and stores the received data on the main memory.
Moreover, the arithmetic processing unit operates at a higher frequency than that at which data can be read out of the main memory, which is external to the arithmetic processing unit. Consequently, when data for use in arithmetic operations is stored on the main memory, the arithmetic processing unit cannot perform arithmetic operations as efficiently as when the data is held on a cache memory within the arithmetic processing unit. The arithmetic processing unit therefore includes a cache memory that can be read and written faster than the main memory, and stores data for use in arithmetic operations on the cache memory. This shortens the time taken to read data during arithmetic operations and allows the arithmetic operations to be performed efficiently.
Here, when a typical communication device receives data from a different information processing apparatus, the communication device causes the arithmetic processing unit to perform the series of processes related to receiving the data as an interrupt process that suspends the arithmetic operation processes. However, when the arithmetic processing unit performs the receive processes as an interrupt process, switching between processes requires the arithmetic processing unit to save the data held in a large number of arithmetic registers, setting registers, and the like, and later to restore the saved data. This increases communication delay.
In the parallel computer system, the plurality of information processing apparatuses are connected to one another through interconnects in such a way that the communication delay between the information processing apparatuses falls within a predetermined delay time. Moreover, the arithmetic processing unit in the parallel computer system repeats a cycle in which it waits to receive data sent from a different information processing apparatus, performs arithmetic operations, and sends the result of the arithmetic operations to the different information processing apparatus. Therefore, when the arithmetic processing unit performs the receive processes as an interrupt process and the associated process switching increases communication delay, the efficiency of calculation processing in the parallel computer system is degraded.
Therefore, in the parallel computer system, during the period in which the communication device stores the data received from the different information processing apparatus on the main memory, the arithmetic processing unit performs a polling process in which it repeatedly reads the memory addresses at which the data is to be stored. Because an arithmetic processing unit that performs such a polling process does not switch between receive processes and arithmetic operation processes, communication delay is reduced and the efficiency of calculation processing is maintained.
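The polling process described above can be illustrated with a minimal sketch. It is not taken from any cited document: the shared `main_memory` buffer, the `FLAG_ADDR` and `DATA_ADDR` addresses, and the use of a separate arrival flag are all assumptions made for illustration, and a thread stands in for the communication device.

```python
import threading
import time

# Hypothetical stand-in for a region of main memory shared by the
# communication device and the arithmetic processing unit.
FLAG_ADDR = 0   # assumed address of a "data arrived" flag
DATA_ADDR = 1   # assumed address of the first payload byte
main_memory = bytearray(16)


def communication_device(payload):
    """Simulated receive: store the payload, then raise the flag."""
    time.sleep(0.01)  # pretend the data is still in flight
    main_memory[DATA_ADDR:DATA_ADDR + len(payload)] = payload
    main_memory[FLAG_ADDR] = 1  # written last, so the payload is complete


def poll_for_data(length):
    """Repeatedly read the flag address until the data has been stored."""
    reads = 0
    while main_memory[FLAG_ADDR] == 0:  # no interrupt, no register save/restore
        reads += 1
    return bytes(main_memory[DATA_ADDR:DATA_ADDR + length]), reads


payload = b"\x0a\x14\x1e"
sender = threading.Thread(target=communication_device, args=(payload,))
sender.start()
received, reads = poll_for_data(len(payload))
sender.join()
print(received == payload, sum(received))
```

In this sketch the receiving side never leaves its loop, which mirrors the point made above: nothing is saved or restored when the data arrives, at the cost of spending cycles on repeated reads.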
Moreover, when the arithmetic processing unit acquires the data received at the communication device directly, without going through a receive buffer, communication delay can be reduced further than when the data is acquired through a receive buffer. However, because the volume of data sent and received between the information processing apparatuses is large, it is not practical to provide a new receive buffer in the arithmetic processing unit. Therefore, a technique is known in which data received at the communication device is stored on the cache memory included in the arithmetic processing unit.
An information processing apparatus to which such a technique is applied stores data received at the communication device from a different information processing apparatus directly on the cache memory included in the arithmetic processing unit. Because the arithmetic processing unit can then read the data for use in arithmetic operations out of the cache memory at high speed, communication delay is reduced.

Patent Document 1: Japanese Laid-open Patent Publication No. 11-039214
Patent Document 2: International Publication Pamphlet No. WO 2007/110898
Non Patent Literature 1: Ram Huggahalli, Ravi Iyer, and Scott Tetrick, “Direct Cache Access for High Bandwidth Network I/O,” ISCA '05 Proceedings of the 32nd annual international symposium on Computer Architecture
However, in the foregoing technique, received data is stored on the cache memory included in the arithmetic processing unit even when the received data is not used for arithmetic operations. A problem therefore arises in that it is difficult for the arithmetic processing unit to perform arithmetic operations efficiently, and calculation processing speed is reduced.
In other words, when data used for arithmetic operations is held on the cache memory and new data is received, the information processing apparatus sometimes evicts the data used for arithmetic operations from the cache memory in order to store the received data there. In this case, the information processing apparatus must read the evicted data back out of the main memory in order to perform arithmetic operations. It is therefore difficult for the information processing apparatus to perform arithmetic operations efficiently, and calculation processing speed is reduced.
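The effect described above can be illustrated with a small simulation. This is a sketch under stated assumptions, not a model of any particular apparatus: the `LRUCache` class, its four-line capacity, the least-recently-used replacement policy, and the address names are all hypothetical, since the text does not specify how the cache memory chooses which data to evict.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny fully associative cache with least-recently-used eviction (assumed policy)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()
        self.misses = 0

    def _insert(self, address):
        if address in self.lines:
            self.lines.move_to_end(address)  # mark as most recently used
            return
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)   # evict the least recently used line
        self.lines[address] = True

    def read(self, address):
        """Read by the arithmetic processing unit; a miss goes to main memory."""
        if address not in self.lines:
            self.misses += 1
        self._insert(address)

    def inject(self, address):
        """Received data stored directly on the cache by the communication device."""
        self._insert(address)


def run(received_unused_lines):
    cache = LRUCache(capacity=4)
    operands = ["a0", "a1", "a2", "a3"]
    for address in operands:                 # first pass: cold misses fill the cache
        cache.read(address)
    for address in received_unused_lines:    # received data that is never used
        cache.inject(address)
    for address in operands:                 # second pass over the same operands
        cache.read(address)
    return cache.misses


misses_clean = run([])
misses_polluted = run(["r0", "r1", "r2", "r3"])
print(misses_clean, misses_polluted)
```

With no received data, the second pass over the operands hits in the cache, so only the four initial misses occur. When four unused lines are stored directly on the cache in between, every operand is evicted and the second pass misses on all of them, doubling the number of reads from main memory in this toy setting.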