A known technique for processing large-scale data in a distributed processing system divides the large-scale data into multiple data units and has multiple work units respectively perform processing on the divided data units.
For example, the distributed processing system first divides a processing request (job), which is given by a client and which targets all or some of the data units obtained by dividing the large-scale data, into units corresponding to the individual data units. Subsequently, the distributed processing system generates, for each of the divided data units, a processing request (task) that directs a work unit to perform predetermined processing on that data unit. Subsequently, in the distributed processing system, each of the multiple work units performs the predetermined processing on its data unit in accordance with the task, and outputs a processing result. Then, the distributed processing system collects the processing results of the tasks, thereby completing the processing for all or a part of the large-scale data.
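The flow described above (dividing data into data units, generating one task per targeted data unit, having work units process the tasks, and collecting the results) can be illustrated with the following minimal sketch. It is not taken from any cited system: the function names (`split_into_data_units`, `make_tasks`, `run_task`, `run_job`) are hypothetical, and a thread pool stands in for the work units purely for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_data_units(data, unit_size):
    """Divide the large-scale data into fixed-size data units."""
    return [data[i:i + unit_size] for i in range(0, len(data), unit_size)]


def make_tasks(job_fn, data_units, target_indices):
    """Generate one task (per-data-unit processing request) for each targeted unit."""
    return [(idx, job_fn, data_units[idx]) for idx in target_indices]


def run_task(task):
    """A work unit performs the predetermined processing on one data unit."""
    idx, fn, unit = task
    return idx, fn(unit)


def run_job(job_fn, data_units, target_indices, num_work_units=4):
    """Split a job into tasks, run them on the work units, and collect the results."""
    tasks = make_tasks(job_fn, data_units, target_indices)
    # Assumption: threads play the role of work units (e.g. physical servers).
    with ThreadPoolExecutor(max_workers=num_work_units) as pool:
        return dict(pool.map(run_task, tasks))


data = list(range(10))
units = split_into_data_units(data, 3)
# A job targeting all data units: sum each unit, then combine the partial results.
partial_sums = run_job(sum, units, range(len(units)))
total = sum(partial_sums.values())
# A job targeting only some of the data units.
subset_sums = run_job(sum, units, [1, 3])
```

In this sketch the controller side corresponds to `run_job`, and the per-unit results are keyed by data unit index so that a job targeting only some of the data units can be collected in the same way.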
For such a distributed processing system for large-scale data, a high degree of reliability is desired, such that a processing result can be returned even in a case where a failure or the like occurs in a small number of work units (for example, physical servers) and those work units go down. In addition, since such a distributed processing system requires many work units, it is desired to reduce the overhead associated with the tasks by reducing both the amount of communication and the number of times communication is performed between the work units and the controller that commands the work units to perform the processing.
In this regard, a technique for distributing the processing load among multiple work units in a distributed processing system is known (for example, see PTL 1). PTL 1 describes a distributed database management system in which, in a case where an access request for various kinds of data arranged in a distributed manner is given to multiple processors, the processing according to the access request is performed by the particular processor on which the desired data is arranged. This system includes a processing load deviation detector and a data arrangement change unit. The processing load deviation detector detects a deviation in the processing load on the basis of system load statistics information and access information for the data units, and the data arrangement change unit changes the arrangement configuration of the data in accordance with the detected load deviation. As a result, the load of the tasks can be distributed.
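The general idea of detecting a load deviation from access information and then changing the data arrangement can be sketched as follows. This is an illustrative simplification, not the method of PTL 1: the function names, the threshold rule (the busiest processor exceeding 1.5 times the average load), and the single-move heuristic are all assumptions introduced here.

```python
def processor_loads(placement, access_counts, processors):
    """Sum the access counts of the data units arranged on each processor."""
    loads = {p: 0 for p in processors}
    for unit, proc in placement.items():
        loads[proc] += access_counts.get(unit, 0)
    return loads


def detect_deviation(loads, threshold=1.5):
    """Return (busiest, idlest) if the busiest load exceeds threshold * average, else None."""
    avg = sum(loads.values()) / len(loads)
    hot = max(loads, key=loads.get)
    cold = min(loads, key=loads.get)
    if avg > 0 and loads[hot] > threshold * avg:
        return hot, cold
    return None


def rebalance(placement, access_counts, processors, threshold=1.5):
    """Return a new arrangement with at most one data unit moved to reduce the deviation."""
    new_placement = dict(placement)  # leave the caller's arrangement untouched
    loads = processor_loads(new_placement, access_counts, processors)
    deviation = detect_deviation(loads, threshold)
    if deviation is None:
        return new_placement
    hot, cold = deviation
    candidates = [u for u, p in new_placement.items() if p == hot]

    def peak_after_move(unit):
        # Larger of the two loads that would result from moving this unit.
        count = access_counts.get(unit, 0)
        return max(loads[hot] - count, loads[cold] + count)

    # Assumed heuristic: move the unit that minimizes the resulting peak load.
    new_placement[min(candidates, key=peak_after_move)] = cold
    return new_placement


processors = ["p0", "p1", "p2"]
placement = {"a": "p0", "b": "p0", "c": "p1", "d": "p2"}
access_counts = {"a": 50, "b": 40, "c": 5, "d": 5}
rebalanced = rebalance(placement, access_counts, processors)
```

With these hypothetical figures, processor `p0` carries a load of 90 against an average of about 33, so one of its data units is moved to the least loaded processor. A practical system would typically repeat such a step and account for the cost of moving data, both of which this sketch omits.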
Also known for distributed processing systems is a redundant data arrangement technique in which the same data unit is arranged on each of multiple work units in a distributed manner so that the data unit is not lost (for example, see PTL 2). The technique of PTL 2 includes means for classifying the physical nodes of the storage into groups, and means for allocating data so that distributed data and copied data of that distributed data do not exist in the same group. A distributed processing system having such a configuration arranges the copied data in multiple different groups, and is thereby capable of maintaining redundancy of the data.
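A group-aware arrangement of this kind can be sketched as below. The sketch is only an illustration under stated assumptions and is not the allocation method of PTL 2: the function name `place_replicas`, the round-robin choice of groups, and the group and node names are hypothetical, and every group is assumed to contain at least one node.

```python
def place_replicas(unit_id, groups, num_copies):
    """Choose one node per copy so that no two copies of a data unit share a group.

    groups maps a group name to a non-empty list of node names.
    Returns a list of (group, node) pairs, one per copy.
    """
    group_names = sorted(groups)
    if num_copies > len(group_names):
        raise ValueError("not enough groups to keep every copy in a separate group")
    start = unit_id % len(group_names)
    chosen = []
    for k in range(num_copies):
        # Walk consecutive groups (wrapping around) so each copy gets its own group.
        group = group_names[(start + k) % len(group_names)]
        nodes = groups[group]
        chosen.append((group, nodes[unit_id % len(nodes)]))
    return chosen


groups = {
    "rack-a": ["n1", "n2"],
    "rack-b": ["n3", "n4"],
    "rack-c": ["n5"],
}
copies = place_replicas(4, groups, 2)
```

Because each copy is taken from a different group, the loss of every node in one group (for example, a rack losing power) still leaves at least one copy of the data unit available, provided `num_copies` is two or more.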