The present invention relates to a distributed database made up of a plurality of computers, and in particular, relates to a process of distributing and locating data.
In recent years, the volume of data processed in a computing system that executes Web applications has increased dramatically, and various systems that improve the performance of accessing data by distributing data to a plurality of computers are known. For example, in a relational database management system (RDBMS), a method of improving the access performance in an entire system by splitting data into predetermined ranges and locating the split data in a plurality of computers is known.
Moreover, NoSQL (Not only SQL) databases such as a KVS (Key Value Store) are known as systems used in cache servers and the like. A KVS locates cache data, made up of keys (data identifiers) and values (data values), in a plurality of computer systems according to a predetermined distribution method.
The KVS employs various configurations such as a configuration of storing data in a volatile storage medium (for example, a memory) capable of accessing data at high speed, a configuration of storing data in a nonvolatile recording medium (for example, solid state disk (SSD), HDD, or the like) having excellent persistent data storage properties, or a combination configuration thereof.
In the combination configuration, the balance between a memory store formed by integrating the memories of a plurality of computers and a disk store made up of a nonvolatile storage medium of at least one computer can be changed in various ways according to various operating policies such as a policy that emphasizes high-speed accessibility or a policy that emphasizes data storage properties.
In the memory store and the disk store, data (values) and data identifiers (keys) are stored as pairs.
Moreover, in the KVS, a plurality of servers forms a cluster, and data is distributed and located in the servers included in the cluster to realize parallel processing. Specifically, data corresponding to a management range (for example, a key range) which is a range of data managed by a server is stored in the respective servers. Each server executes a process as a master of the data included in the management range that the server is in charge of. That is, a server in charge of the data of a management range in which a predetermined key is included reads the data corresponding to the key in response to a read request that includes the predetermined key.
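The relationship between a management range and the server in charge of it can be illustrated with a minimal sketch. The class name `RangeCluster`, the server names, and the integer keys are hypothetical and serve only as an illustration; the sketch assumes that each management range is a contiguous key range identified by its first key.

```python
from bisect import bisect_right


class RangeCluster:
    """Hypothetical cluster in which each server is the master of one key range."""

    def __init__(self, boundaries, servers):
        # boundaries[i] is the first key of the range managed by servers[i].
        # Assumption: boundaries are sorted in ascending order.
        if len(boundaries) != len(servers):
            raise ValueError("one boundary per server is required")
        self.boundaries = list(boundaries)
        self.servers = list(servers)
        # One in-memory key-value store per server.
        self.stores = {name: {} for name in servers}

    def master_for(self, key):
        # Find the last range whose first key is less than or equal to the key.
        index = bisect_right(self.boundaries, key) - 1
        if index < 0:
            raise KeyError(f"key {key!r} is below every management range")
        return self.servers[index]

    def put(self, key, value):
        self.stores[self.master_for(key)][key] = value

    def get(self, key):
        # The read request is served by the master of the range containing the key.
        return self.stores[self.master_for(key)][key]


cluster = RangeCluster([0, 100, 200], ["server_a", "server_b", "server_c"])
cluster.put(150, "value-150")
print(cluster.master_for(150))
print(cluster.get(150))
```

In this sketch, a read request is routed only to the server whose management range contains the requested key, so requests for keys in different ranges can be processed by different servers in parallel.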
Thus, the KVS can improve the parallel processing performance by scale-out.
A known KVS configuration secures data reliability by having each server that constitutes a cluster store copy data of the data managed by another server. That is, each server is a master that manages data included in a predetermined management range and is also a slave that holds copy data of the data managed by another server. Due to this, even when a failure occurs in a server, processing can continue because the slave that holds the copy data of the failed server uses that copy data as master data in place of the data that the failed server managed as a master.
Hereinafter, the server which is a master will be referred to as a master server and the server which is a slave will be referred to as a slave server.
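The master and slave roles can be illustrated with a minimal sketch. The class name `ReplicatedCluster`, the rule that the next server in the list acts as the slave, and the use of a single slave per master are assumptions made only for this illustration; an actual system may select slave servers and their number differently.

```python
class ReplicatedCluster:
    """Hypothetical cluster in which every server is both a master and a slave."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.alive = set(servers)
        # Data that each server manages as a master.
        self.master_data = {name: {} for name in servers}
        # Copy data that each server holds as a slave of another server.
        self.copy_data = {name: {} for name in servers}

    def slave_of(self, master):
        # Assumption: the next server in the list holds the copy data.
        index = self.servers.index(master)
        return self.servers[(index + 1) % len(self.servers)]

    def put(self, master, key, value):
        # Write to the master and replicate to its slave.
        self.master_data[master][key] = value
        self.copy_data[self.slave_of(master)][key] = value

    def fail(self, server):
        self.alive.discard(server)

    def get(self, master, key):
        if master in self.alive:
            return self.master_data[master][key]
        # The master has failed: the slave serves its copy data as master data.
        slave = self.slave_of(master)
        if slave not in self.alive:
            raise RuntimeError("both the master server and the slave server have failed")
        return self.copy_data[slave][key]


cluster = ReplicatedCluster(["server_a", "server_b", "server_c"])
cluster.put("server_a", "k1", "v1")
cluster.fail("server_a")
value_after_failure = cluster.get("server_a", "k1")
print(value_after_failure)
```

In this sketch, the read issued after the failure of `server_a` is answered from the copy data held by `server_b`, so processing continues without the failed server.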
As described above, the servers that constitute the KVS do not include a special server such as a management server, and therefore a single point of failure does not exist. That is, since another server can continue processing even when a certain server fails, the computer system does not stop. Thus, the KVS has failure resistance.
The number of slave servers (that is, the number of servers in which copy data is stored) can be arbitrarily set by the computer system.
Examples of a data location method used in the KVS or the like include a consistent hashing method, a range method, and a list method. The consistent hashing method will be described as a representative example. In the simplified form of the consistent hashing method described here, first, a hash value of a key is calculated, and then the residue (remainder) obtained by dividing the calculated hash value by the number of servers is calculated. The data is located in the server whose identification number is identical to the residue.
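The placement rule described above can be sketched as follows. The function name `locate_server`, the example key, and the choice of SHA-256 as the hash function are assumptions made only for this illustration; the text does not specify a hash function. Note that the sketch implements the residue-based placement exactly as described above and does not show ring-based variants of consistent hashing.

```python
import hashlib


def locate_server(key: str, num_servers: int) -> int:
    """Return the identification number of the server in which the key is located."""
    if num_servers <= 0:
        raise ValueError("num_servers must be positive")
    # Step 1: calculate a hash value of the key.
    # Assumption: SHA-256 truncated to 64 bits; any stable hash function would serve.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest[:8], "big")
    # Step 2: the residue of the hash value divided by the number of servers
    # identifies the server.
    return hash_value % num_servers


server_id = locate_server("user:42", 4)
print(server_id)
```

Because the residue depends on the number of servers, changing the number of servers in this sketch generally changes the server in which a given key is located.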
The system described above is a system for improving the access performance. However, if accesses concentrate on specific data, there is a problem in that the load on the computer that manages the specific data increases and the access performance of the entire system decreases. Thus, methods of addressing the decrease in the access performance by adding a computer, by scale-in or scale-out of the system, or the like are known (for example, see Japanese Patent Application Publication No. H6-259478).
Japanese Patent Application Publication No. H6-259478 discloses a technique of setting a splitting condition of a database according to a use state of computer resources, an access distribution, or the like and relocating data according to the splitting condition.
Moreover, a technique of suppressing a decrease in the access performance by adding a new server to a cluster and splitting the management range on which the load is concentrated is known (for example, see Japanese Patent Application Publication No. 2011-118525).
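The general idea of splitting a loaded management range when a server is added can be illustrated with a minimal sketch. The function name `split_hottest_range`, the per-server access counts, and the rule of splitting at the midpoint of the range are assumptions made only for this illustration and are not taken from the cited publication.

```python
def split_hottest_range(ranges, access_counts, new_server):
    """Split the most heavily accessed management range and assign half to a new server.

    ranges: list of (start, end, server) tuples with integer keys, end exclusive.
    access_counts: mapping from server name to its observed number of accesses.
    """
    # Select the range whose server received the most accesses.
    hottest = max(
        range(len(ranges)),
        key=lambda i: access_counts.get(ranges[i][2], 0),
    )
    start, end, server = ranges[hottest]
    if end - start < 2:
        raise ValueError("the management range is too small to split")
    # Assumption: split at the midpoint; the new server takes the upper half.
    middle = (start + end) // 2
    return (
        ranges[:hottest]
        + [(start, middle, server), (middle, end, new_server)]
        + ranges[hottest + 1:]
    )


ranges = [(0, 100, "server_a"), (100, 200, "server_b")]
access_counts = {"server_a": 10, "server_b": 90}
new_ranges = split_hottest_range(ranges, access_counts, "server_c")
print(new_ranges)
```

In this sketch, the range managed by `server_b`, on which the accesses concentrate, is divided so that the newly added `server_c` takes over the upper half of that range, while the range of `server_a` is unchanged.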