In recent years, demand for large-scale data processing has been increasing as high-speed networks and the Internet have spread. As this demand grows, there is a need to increase the capacity available for storing data in a storage node used for data processing (hereinafter called “storage capacity”).
However, when a storage node is constituted using a single server, the following problems may occur. First, there is a limit to how far the storage node can be expanded. Second, it is difficult to increase the storage capacity or to improve the I/O (input/output) performance while the server is in operation. Third, the storage node may become a single point of failure. In order to solve these problems, a technique called a distributed data system has been proposed.
Generally, a distributed data system uses the external storage devices of a plurality of server devices as a distributed data storage node connected via a network. The distributed data storage node operates by linking a plurality of node groups that incorporate storage devices. The distributed data storage node is therefore a computer system capable of acting like a single node.
The distributed data storage system features scale-out expandability: its data storage capacity can be increased, or its data accessibility improved, by adding a node (i.e. a server device) constituting the system.
The distributed data storage system stores data for which a write request is received not in a single node but in a plurality of nodes, by duplicating the data. Therefore, even when a certain node becomes inaccessible due to a failure and can no longer function as a storage node, the distributed data storage system can continue processing as follows. When replying to an access to data stored in the failed node, another node that stores a duplicate of that data takes over the service that had been carried out by the failed node. In this way, the distributed data storage system can maintain the availability of the system.
Further, even when the data stored in a certain node cannot be read again due to a node failure, the distributed data storage system can use the duplicated data stored in another node. Loss of data can thus be prevented, and at the same time the reliability of data holding can be maintained.
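The duplication and takeover behavior described above may be illustrated by the following minimal sketch. It is a toy in-memory model under stated assumptions, not an implementation of any particular system: the class `ReplicatedStore`, its method names, the use of two duplicates, and the hash-based choice of holder nodes are all hypothetical.

```python
import hashlib


class ReplicatedStore:
    """Toy model of duplicated storage; every name here is hypothetical."""

    def __init__(self, node_names, replicas=2):
        # Assumes replicas <= number of nodes, so the holders are distinct.
        self.order = list(node_names)
        self.nodes = {name: {} for name in self.order}
        self.alive = set(self.order)
        self.replicas = replicas

    def targets(self, key):
        # Deterministic placement: a hash of the key picks the first holder,
        # and the following nodes in the list hold the duplicates.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        start = int(digest, 16) % len(self.order)
        return [self.order[(start + i) % len(self.order)]
                for i in range(self.replicas)]

    def write(self, key, value):
        # Store the data in every holder node that is currently reachable.
        for name in self.targets(key):
            if name in self.alive:
                self.nodes[name][key] = value

    def fail(self, name):
        # Mark a node as inaccessible; its stored data can no longer be read.
        self.alive.discard(name)

    def read(self, key):
        # The first reachable holder answers, so a duplicate holder takes
        # over when the first holder has failed.
        for name in self.targets(key):
            if name in self.alive and key in self.nodes[name]:
                return self.nodes[name][key]
        raise KeyError(key)
```

In this sketch, a read still succeeds after the first holder of a key has failed, and raises `KeyError` only when every holder of that key is inaccessible.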
Thus, distributed data storage systems have been introduced in a variety of systems, driven by trends such as freely expandable system scale, improved performance of the processing nodes constituting a system, increased capacity of storage devices, and lower cost.
In a distributed data storage system, a certain number of hardware nodes are required, depending on the service scale of an IT (Information Technology) system, in order to maintain the availability of the services provided and the reliability of data holding.
An HDD (Hard Disk Drive) is mainly employed as the actual storage device for physically storing data. The capacity of HDDs has continued to increase for many years. However, improvement in the I/O performance of HDDs has stagnated, and the I/O performance per unit of capacity has been decreasing year by year.
Meanwhile, the capacity of an SSD (Solid-State Drive), which uses a flash memory as its storage medium, has been increasing year by year, as has that of an HDD. In recent years, demand for SSDs has been increasing because an SSD has random-access performance far superior to that of an HDD.
Further, the capacity of volatile memories, as represented by a DRAM (Dynamic Random Access Memory), has been increasing, and development has also progressed on storage devices that incorporate a next-generation non-volatile semiconductor memory having I/O performance far superior to that of a flash memory. A plurality of storage devices having different costs, performance characteristics, and capacities are employed as a storage device group within a single storage system.
As described above, the nodes constituting a distributed data storage system can combine devices whose I/O performance differs remarkably from one another, in addition to devices that differ in data storage capacity.
Further, while a distributed data storage system is in operation, its scale-out expandability, by which the system is expanded by adding a new node, can be maintained.
However, in order to achieve the aforementioned scale-out expandability, it is necessary to distribute data equally among the nodes and to set data storage destinations uniquely and easily, even when the number of nodes constituting the distributed data storage system increases.
In order to achieve the aforementioned features, a method has been proposed in which rules for arithmetically setting data storage destinations are shared between each node and a client that accesses data, instead of having a certain node administer, in a data table, the IDs (identifiers) that specify data together with the data storage destinations.
Employing the aforementioned method allows the client to uniquely set a data storage destination each time the client accesses data, without querying other nodes for the data storage destination.
Further, when the number of nodes constituting a distributed data storage system increases to several hundred or several thousand, the following problem may occur. Specifically, when a specific one of the nodes administers all the data storage destinations or replies to queries from the clients for data storage destinations, that node may become a performance bottleneck in the system.
By employing the aforementioned arithmetical data storage destination setting method, the performance bottleneck described above may be avoided. Algorithms proposed for arithmetically setting data storage destinations include an algorithm that combines hash functions, called Consistent Hashing, and an algorithm that combines random functions.
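As one possible illustration of Consistent Hashing, the following minimal sketch places nodes on a hash ring and assigns each data ID to the first node point that follows the hash of the ID. The class name `HashRing`, the use of MD5 as the hash function, and the use of 64 virtual points per node are assumptions made for this sketch and are not taken from any particular system.

```python
import bisect
import hashlib


def ring_hash(value):
    # Map a string to a large integer position on the ring (MD5 is assumed).
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hashing ring with virtual points per node."""

    def __init__(self, nodes, vnodes=64):
        ring = []
        for node in nodes:
            # Several points per node smooth out the share each node receives.
            for i in range(vnodes):
                ring.append((ring_hash(f"{node}#{i}"), node))
        ring.sort()
        self._points = [point for point, _ in ring]
        self._owners = [owner for _, owner in ring]

    def locate(self, data_id):
        # The first point clockwise from the data's hash owns the data;
        # the modulo wraps around past the last point of the ring.
        idx = bisect.bisect_right(self._points, ring_hash(data_id))
        return self._owners[idx % len(self._points)]
```

Because the client and every node can build the same ring from the same node list, each can compute `locate()` locally without querying a central node. When a node is added, only the data IDs that fall before the new node's points change owner; all other IDs keep their previous destination.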
These algorithms assign data storage equally. By using these algorithms in a distributed data storage system in which a plurality of node groups having the same performance are linked to each other, two problems can be avoided: data being concentrated on a specific one of the nodes so that its data storage capacity is exceeded, and accesses being concentrated on a specific one of the nodes so that performance is degraded.
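The degree to which such an assignment is equal can be checked, for example, by counting how many data IDs fall on each node. The following sketch assumes a simple hash-modulo placement and hypothetical function names (`place`, `load_per_node`, `max_deviation`); it illustrates only the measurement and is not one of the proposed algorithms themselves.

```python
import hashlib
from collections import Counter


def place(data_id, node_count):
    # Assumed placement rule: hash the data ID and reduce it modulo the
    # number of nodes (SHA-256 is used here only for illustration).
    digest = hashlib.sha256(data_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


def load_per_node(data_ids, node_count):
    # Count how many data IDs are assigned to each node index.
    counts = Counter(place(data_id, node_count) for data_id in data_ids)
    return [counts.get(i, 0) for i in range(node_count)]


def max_deviation(loads):
    # Largest relative distance of any node's load from the mean load.
    mean = sum(loads) / len(loads)
    if mean == 0:
        return 0.0
    return max(abs(load - mean) for load in loads) / mean
```

A small value of `max_deviation` indicates that neither stored data nor, for uniformly accessed data, access load is concentrated on a specific node.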
The arithmetical data storage destination setting method has the following features. Data identifiers (IDs) assigned in advance to respective data, and the data itself, are set as input data. System configuration information is set as parameters for use in calculation, and includes “the number of nodes constituting a system”, “a logical data capacity assigned to each node”, “IP address (Internet Protocol Address) information of nodes”, “activation/non-activation of nodes”, and “storage information in which calculation result values obtained by a predetermined arithmetic expression are associated with data storage destinations”. In this method, the node serving as a data storage destination is set by the value calculated using these parameters.
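The relationship between the input data, the system configuration information, and the resulting data storage destination may be sketched as follows. All type names, field names, and the slot-table layout are hypothetical; in particular, giving each node a number of table slots equal to its logical capacity (in abstract units) is an assumption of this sketch, as is the choice of SHA-1 as the arithmetic expression.

```python
import hashlib
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class NodeInfo:
    name: str
    ip_address: str
    logical_capacity: int  # logical data capacity, in abstract units
    active: bool = True    # activation/non-activation of the node


@dataclass(frozen=True)
class SystemConfig:
    nodes: Tuple[NodeInfo, ...]
    # Associates calculation result values (slot indexes) with destinations.
    slot_table: Tuple[str, ...]


def build_config(nodes):
    # Each active node receives as many slots as its logical capacity.
    table = tuple(node.name
                  for node in nodes if node.active
                  for _ in range(node.logical_capacity))
    if not table:
        raise ValueError("no active node with capacity")
    return SystemConfig(tuple(nodes), table)


def locate(data_id, config):
    # Calculate a value from the data ID, then look it up in the slot table.
    digest = hashlib.sha1(data_id.encode("utf-8")).digest()
    value = int.from_bytes(digest[:8], "big")
    name = config.slot_table[value % len(config.slot_table)]
    return next(node for node in config.nodes if node.name == name)
```

Any client or node holding the same `SystemConfig` obtains the same destination for the same data ID. Note that this simple modulo table remaps many data IDs whenever the table length changes, which is one reason schemes such as Consistent Hashing are used in practice.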
Thus, according to the aforementioned setting method, when the system configuration is changed due to node addition or node failure, it suffices to share only the system configuration information between the nodes constituting the system and the clients that access data. It is therefore possible to uniquely set the node in which each piece of data is to be stored.
As described above, when all the nodes constituting a distributed data storage system have the same capacity and the same I/O performance, the system configuration information may be set in such a manner that data is equally distributed and stored in all the nodes, using the arithmetical data storage destination setting method.
In a distributed data storage system set up in this way, the consumption of storage capacity at each node and the number of data access I/Os can be distributed equally as data is newly generated or written.
However, when the aforementioned distributed data storage system is operated over a long term, a problem must be addressed: a storage node having the same performance as an existing storage node may no longer be procurable. In this case, a storage node whose capacity and I/O performance differ from those of the storage nodes already incorporated in the system may be incorporated as a newly introduced storage node.
Further, a system may also be constituted from storage nodes having different performances from the beginning.
For instance, PTL1 proposes a multi-node storage system. The multi-node storage system is configured such that an administration node and storage nodes of different types, which administer storage devices of different types, are connected to each other via a network. The administration node acquires information relating to data administration from the storage nodes of different types, and updates the logical volume as necessary.
The system described in PTL2 determines in which of the storage nodes data is to be stored on the basis of a hash value calculated from a key given to the data. The hash value for a key can be calculated using, for instance, MD5 (Message Digest algorithm 5). The system may use another hash function such as SHA (Secure Hash Algorithm).
The method of determining a target storage node (hereinafter also called a target node) on the basis of a hash value for a key may be called Consistent Hashing. PTL2 proposes a distributed storage system in which the target node serving as an access destination is determined on the basis of a target administration table that administers the target node for each hash value.
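A table-based lookup of this kind may be sketched as follows. This is not the implementation of PTL2; the class `TargetTable`, the division of a 32-bit hash space into equal ranges, and the use of the first four bytes of the digest are assumptions introduced only for illustration. The hash function is selectable so that MD5 or an SHA variant can be used.

```python
import bisect
import hashlib


class TargetTable:
    """Hypothetical table associating hash-value ranges with target nodes."""

    SPACE = 1 << 32  # assumed size of the hash-value space

    def __init__(self, node_names, algorithm="md5"):
        self.algorithm = algorithm
        self.targets = list(node_names)
        step = self.SPACE // len(self.targets)
        # Exclusive upper bound of each node's range; the last range is
        # extended so that it ends exactly at the top of the space.
        self.bounds = [step * (i + 1) for i in range(len(self.targets) - 1)]
        self.bounds.append(self.SPACE)

    def hash_value(self, key):
        # Reduce the digest of the key to a 32-bit hash value.
        digest = hashlib.new(self.algorithm, key.encode("utf-8")).digest()
        return int.from_bytes(digest[:4], "big")

    def lookup(self, key):
        # Find the first range whose upper bound exceeds the hash value.
        idx = bisect.bisect_right(self.bounds, self.hash_value(key))
        return self.targets[idx]
```

For example, `TargetTable(["t1", "t2", "t3"]).lookup("some-key")` returns one of the three target names, and the same key always maps to the same target for as long as the table is unchanged.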
In addition, the system described in PTL3 uses a data distribution method that divides a storage region into a first storage region and a second storage region. When there is no vacant region in the first storage region, data is stored in the second storage region as user information. Data is then distributed and stored by interchanging the user information stored in the first storage region with the user information stored in the second storage region according to a condition, i.e. on the basis of the frequency of use of the user information.
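The two-region arrangement may be illustrated by the following toy model. It is not the implementation of PTL3; the class `TwoRegionStore`, the counting of reads as the measure of frequency of use, and the rule of interchanging an entry with the least-used entry of the first region are assumptions of this sketch.

```python
from collections import Counter


class TwoRegionStore:
    """Hypothetical store with a bounded first region and a second region."""

    def __init__(self, first_capacity):
        self.first_capacity = first_capacity
        self.first = {}
        self.second = {}
        self.use_count = Counter()  # frequency of use per key

    def put(self, key, value):
        if key in self.first:
            self.first[key] = value
        elif key in self.second:
            self.second[key] = value
        elif len(self.first) < self.first_capacity:
            self.first[key] = value
        else:
            # No vacant region in the first region: use the second region.
            self.second[key] = value

    def get(self, key):
        if key in self.first:
            self.use_count[key] += 1
            return self.first[key]
        value = self.second[key]  # raises KeyError for unknown keys
        self.use_count[key] += 1
        self._rebalance(key)
        return value

    def _rebalance(self, hot_key):
        # Interchange the entry with the least-used entry of the first
        # region when the second-region entry has been used more often.
        if not self.first:
            return
        cold_key = min(self.first, key=lambda k: self.use_count[k])
        if self.use_count[hot_key] > self.use_count[cold_key]:
            self.second[cold_key] = self.first.pop(cold_key)
            self.first[hot_key] = self.second.pop(hot_key)
```

In this model, frequently read user information gradually moves into the first region, while rarely read user information is moved to the second region.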