In recent years, parallel processing platforms have been developed in which a plurality of computers (hereinafter also referred to as nodes) are connected in parallel using a local area network (LAN) or the like so that a large volume of data can be processed in parallel in a short time. For example, there is known a parallel processing platform called “Hadoop.” Hadoop builds a distributed file system using a plurality of computers so that data stored in the distributed file system can be subjected to distributed processing. A typical application is the creation of search indexes. Hadoop is also considered to be one of the effective methods for analyzing a large volume of data in a short time. To use Hadoop, however, the data must first be transferred from its original storage and stored in a distributed manner across the nodes as a precondition to performing parallel processing.
A case in which a large volume of data is analyzed is described below. First, data to be processed is entered into the distributed processing platform. This data is obtained by extracting data to be analyzed from a data source (e.g., a database), which is not included in the distributed processing platform, and translating it into a form that can be processed by the distributed processing platform. Translation and entry of such data are performed by one of the computers that constitute the distributed processing platform or by another computer.
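The extraction and translation step described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the data source is an SQLite database and that the distributed processing platform accepts tab-separated text lines. The function names `extract_rows` and `translate_rows` are hypothetical and are not taken from any particular platform.

```python
import sqlite3


def extract_rows(connection, query):
    """Extract the data to be analyzed from the data source.

    Assumption: the data source is reachable through a sqlite3 connection.
    """
    return connection.execute(query).fetchall()


def translate_rows(rows):
    """Translate extracted rows into a form the platform can process.

    Assumption: the platform reads one tab-separated text record per line.
    """
    return ["\t".join(str(value) for value in row) for row in rows]
```

In such a sketch, the computer performing the entry would call `extract_rows` against the external data source and pass the result to `translate_rows` before writing the resulting lines into the distributed file system.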
The entered data is stored in the distributed file system made up of the storage devices (e.g., hard disk drives) of the computers that constitute the distributed processing platform. At this time, the data is divided into blocks of a given size before being stored in each computer. Thereafter, each computer performs analytical processing on its associated partial data. In many cases, the associated partial data is the set of data blocks that were obtained by dividing the original data and that are stored in the storage device of that computer. The results of the analytical processing executed by the individual computers are then merged by the distributed processing platform into a single output result, which is stored in the distributed file system.
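The flow of dividing, storing, analyzing, and merging can be sketched as follows. This is a simplified, single-process illustration and not the behavior of Hadoop itself: blocks are measured in records rather than bytes, blocks are assigned to nodes in round-robin order, and the analysis (counting records by the first comma-separated field) is an arbitrary example. All function names are hypothetical.

```python
from collections import Counter


def split_into_blocks(records, block_size):
    """Divide the entered data into blocks of a given size (in records)."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]


def distribute(blocks, node_count):
    """Store each block on one node. Assumption: round-robin placement."""
    storage = [[] for _ in range(node_count)]
    for index, block in enumerate(blocks):
        storage[index % node_count].append(block)
    return storage


def analyze_on_node(local_blocks):
    """Analyze only the blocks held by one node.

    Example analysis: count records by their first comma-separated field.
    """
    counts = Counter()
    for block in local_blocks:
        for record in block:
            counts[record.split(",")[0]] += 1
    return counts


def merge_results(partial_results):
    """Merge the per-node results into a single output result."""
    merged = Counter()
    for partial in partial_results:
        merged.update(partial)
    return merged


def run_job(records, block_size, node_count):
    """Run the whole flow; nodes are simulated sequentially in one process."""
    blocks = split_into_blocks(records, block_size)
    storage = distribute(blocks, node_count)
    partial_results = [analyze_on_node(local) for local in storage]
    return merge_results(partial_results)
```

Because each node works only on its own blocks, the per-node analysis in this sketch could run concurrently on separate computers; only the final merge needs to see every partial result.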
A distributed database is also known as one of the distributed processing techniques. For example, Patent Literature 1 discloses a method for building a database system with a plurality of computers.