As the performance of hardware, software, and a communication network is improved, a technique to obtain high processing performance by performing distributed processing with a plurality of computers connected by the network has been developed.
In particular, recently, as a distributed processing technique is developed, a distributed parallel processing infrastructure capable of fast analyzing large quantities of data is provided, and it is applied to derivation of a tendency and knowledge with respect to large quantities of data. For example, Hadoop that is well known as a distributed parallel processing infrastructure is applied to mining of customer information and an action history, a trend analysis of large quantities of log information, and the like. In Hadoop, HDFS (Hadoop Distributed File System) is used as a distributed file system that deals with large quantities of data. In addition, as a scalable and high-performance distributed storage built on HDFS, HBase described in “Apache Hbase”, The Apache Software Foundation, [online], [retrieved on Mar. 5, 2014], the Internet <URL:http://hbase.apache.org/>is known.
HBase is a distributed storage using a KVS (Key-Value Store) technique. In HBase, a table data structure is used, and with Key called RowKey, Values of data sets correlated with the RowKey can be uniquely obtained. In addition, in HBase, the data sets are divided into a plurality of files each including a certain range of RowKey values, and stored in a plurality of nodes. In addition, in HBase, the data sets are sorted in dictionary order of RowKey values (for example, natural order) and stored. Thus, range retrieval with Key is fast performed.
In addition, in such a distributed storage, copies (replicas) of the data sets are generally stored in the plurality of nodes so as to achieve high availability. Accordingly, even when failures occur in one node, a node including a replica takes over processing, and thus, fault tolerance is increased.