Non-patent Document 1 describes an example of a conventional distributed processing system. This conventional distributed processing system is called MapReduce, and is formed by a distributed file system and a plurality of nodes that perform calculation. On the distributed file system, one file is split into multiple pieces (chunks), which are distributed across the plurality of nodes. Further, copies of each of the chunks are placed on different nodes, thereby ensuring reliability.
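The splitting and replication described above can be sketched as follows. This is a minimal illustration and not the method of Non-patent Document 1: the function names, the fixed chunk size, the node names, and the round-robin placement rule are all assumptions introduced here.

```python
from typing import Dict, List


def split_into_chunks(data: bytes, chunk_size: int) -> List[bytes]:
    """Split one file's contents into fixed-size chunks (the last may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def place_replicas(num_chunks: int, nodes: List[str], replicas: int) -> Dict[int, List[str]]:
    """Assign each chunk to `replicas` distinct nodes (hypothetical round-robin rule)."""
    if replicas > len(nodes):
        raise ValueError("cannot place more replicas than there are nodes")
    return {
        chunk_id: [nodes[(chunk_id + r) % len(nodes)] for r in range(replicas)]
        for chunk_id in range(num_chunks)
    }


# Illustrative values only.
chunks = split_into_chunks(b"abcdefghij", chunk_size=4)
placement = place_replicas(len(chunks), ["node0", "node1", "node2"], replicas=2)
```

Because every chunk is held by two distinct nodes in this sketch, the loss of any single node leaves at least one copy of each chunk readable.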
The conventional distributed processing system configured as described above operates in the following manner. Calculation is performed in two phases: a Map phase and a Reduce phase. Worker processes run on each of the nodes that perform calculation. A task that performs the Map phase (hereinafter referred to as a Map task) and/or a task that performs the Reduce phase (hereinafter referred to as a Reduce task) is assigned to each worker process. Each worker process carries out its calculation by executing the tasks assigned to it.
In the Map phase, the Map tasks running in the worker processes on the respective nodes each read chunks of at least one input file from local or remote storage, and execute, in parallel, a Map function defined by a user. Here, it is assumed that the output from the Map function is a set of key-value pairs. Then, a split function defined by a user is used to determine which key is transferred to which Reduce task. On the basis of this determination, the key-value pairs are grouped by destination Reduce task and stored on a local disk.
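As an illustration of the Map phase, the sketch below runs a user-defined Map function over one chunk and uses a user-defined split function to bucket the resulting key-value pairs by destination Reduce task. The word-count Map function, the CRC32-based split function, and all names are assumptions chosen for this example; an in-memory dictionary stands in for the local disk.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple


def map_fn(chunk: str) -> Iterator[Tuple[str, int]]:
    """Example user-defined Map function: emit (word, 1) for each word."""
    for word in chunk.split():
        yield word, 1


def split_fn(key: str, num_reduce_tasks: int) -> int:
    """Example user-defined split function: stable hash of the key modulo task count."""
    return zlib.crc32(key.encode("utf-8")) % num_reduce_tasks


def run_map_task(chunk: str, num_reduce_tasks: int) -> Dict[int, List[Tuple[str, int]]]:
    """Apply map_fn to a chunk and group the pairs by destination Reduce task."""
    buckets: Dict[int, List[Tuple[str, int]]] = defaultdict(list)
    for key, value in map_fn(chunk):
        buckets[split_fn(key, num_reduce_tasks)].append((key, value))
    return dict(buckets)  # stands in for files written to the local disk


local_output = run_map_task("a b a", num_reduce_tasks=2)
```

A stable hash is used here instead of Python's built-in `hash`, whose value for strings varies between processes, so that every node routes a given key to the same Reduce task.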
Then, each Reduce task requests, from each of the nodes, the key-value pairs that the Reduce task is in charge of, and receives them. The Reduce task groups the values by key, and executes a Reduce function defined by a user with the grouped data as input. Each Reduce task outputs its calculation results to the distributed file system as a separate file.
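The Reduce phase can be sketched in the same way. In the following minimal example, each element of `map_outputs` stands in for the bucketed output held on one node's local disk, and the returned dictionary stands in for the separate file written by the Reduce task. The summing Reduce function, the data values, and all names are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def reduce_fn(key: str, values: List[int]) -> int:
    """Example user-defined Reduce function: sum the values for one key."""
    return sum(values)


def run_reduce_task(task_id: int,
                    map_outputs: List[Dict[int, List[Tuple[str, int]]]]) -> Dict[str, int]:
    """Collect this task's pairs from every node, group them by key, and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for node_output in map_outputs:
        # Request only the bucket that this Reduce task is in charge of.
        for key, value in node_output.get(task_id, []):
            grouped[key].append(value)
    return {key: reduce_fn(key, values) for key, values in sorted(grouped.items())}


# Hypothetical Map output held on two nodes, keyed by destination Reduce task.
map_outputs = [
    {0: [("apple", 1)], 1: [("pear", 1)]},
    {0: [("apple", 1), ("fig", 1)]},
]
result = run_reduce_task(0, map_outputs)
```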
In the case where a failure occurs in a node during processing, the conventional distributed processing system operates in the following manner. First, the distributed processing system causes another node to re-process all the Map tasks that had been completed on the failed node. This is because the output of those Map tasks is stored on the local disk of the failed node and can no longer be read. When the Map tasks are re-processed in this way, the source of the input data changes, and the worker processes are notified to that effect. Note that a completed Reduce task need not be executed again, because its execution results are already stored in the distributed file system.
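The failure handling described above can be sketched as follows. This is an assumed, simplified bookkeeping model and not the actual implementation of the conventional system: the function name, the task and node names, and the round-robin reassignment are hypothetical.

```python
from typing import Dict, List


def recover_from_node_failure(failed_node: str,
                              map_task_nodes: Dict[str, str],
                              nodes: List[str]) -> Dict[str, str]:
    """Reassign every Map task whose output lived on the failed node.

    Returns {map task: new node}, i.e. the changed input sources of which
    the worker processes are to be notified.
    """
    live_nodes = [n for n in nodes if n != failed_node]
    if not live_nodes:
        raise RuntimeError("no live node left to re-process the Map tasks")
    lost_tasks = sorted(t for t, n in map_task_nodes.items() if n == failed_node)
    relocated: Dict[str, str] = {}
    for i, task in enumerate(lost_tasks):
        new_node = live_nodes[i % len(live_nodes)]  # hypothetical round-robin choice
        map_task_nodes[task] = new_node
        relocated[task] = new_node
    return relocated


# Illustrative state: where each completed Map task stored its output.
map_task_nodes = {"map0": "nodeA", "map1": "nodeB", "map2": "nodeA"}
# Completed Reduce tasks are left untouched: their results are in the distributed file system.
completed_reduce_tasks = {"reduce0"}

relocated = recover_from_node_failure("nodeA", map_task_nodes, ["nodeA", "nodeB", "nodeC"])
```

In this sketch only the Map tasks of the failed node are relocated; the set of completed Reduce tasks is never consulted, reflecting that their output does not depend on any single node's local disk.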