This invention relates to a distributed database constructed of a plurality of computers. In particular, this invention relates to setting processing for automatically constructing a distributed database.
In recent years, the amount of data handled by computer systems that execute Web applications has increased explosively, and various systems that improve data access performance by distributing data to a plurality of computers are known. For example, in a relational database management system (RDBMS), a method of improving the access performance of an entire system by splitting data into predetermined ranges and locating the split data in a plurality of computers is known (see, for example, JP 2002-297428 A).
In JP 2002-297428 A, there is disclosed an invention in which only one original site on a network executes processing of updating data stored in each of the databases allocated to the plurality of computers on the network, and each of the other replica sites receives the result of the update executed by the original site and reflects that result in the replica data held by the replica site itself. With this configuration, it is possible to maintain uniformity of the data used by the plurality of computers on the network.
Moreover, a NoSQL (Not only SQL) database such as a KVS (Key Value Store), which locates cache data made up of keys (data identifiers) and values (data values) in a plurality of computer systems according to a predetermined distribution method, is known as a system that is used in a cache server or the like.
The KVS employs various configurations, such as a configuration of storing data in a volatile storage medium (for example, a memory) that can be accessed at high speed, a configuration of storing data in a nonvolatile recording medium (for example, a solid state drive (SSD), a hard disk drive (HDD), or the like) having excellent persistent data storage properties, or a combination thereof.
In the combination configuration, the balance between a memory store formed by integrating the memories of a plurality of computers and a disk store made up of a nonvolatile storage medium of at least one computer can be changed in various ways according to various operating policies such as a policy that emphasizes high-speed accessibility or a policy that emphasizes data storage properties.
In the memory store and the disk store, data (values) and data identifiers (keys) are stored as pairs.
Moreover, in the KVS, a plurality of servers forms a cluster, and data is distributed and located in the servers included in the cluster to realize parallel processing. Specifically, data corresponding to a management range (for example, a key range) which is a range of data managed by a server is stored in the respective servers. Each server executes processing as a master of the data included in the management range that the server is in charge of. That is, a server in charge of the data of a management range in which a predetermined key is included reads the data corresponding to the key in response to a read request that includes the predetermined key.
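As an illustration only, the routing of a request to the server in charge of a management range can be sketched as follows. The numeric key boundaries, the server names, and the functions `master_for`, `put`, and `get` are hypothetical and are not taken from any particular KVS; the sketch merely assumes that each server manages one contiguous key range.

```python
import bisect

# Hypothetical management ranges: server i is in charge of keys in
# [LOWER_BOUNDS[i], LOWER_BOUNDS[i + 1]); the last range is open-ended.
LOWER_BOUNDS = [0, 100, 200]
SERVERS = ["server-A", "server-B", "server-C"]

# One in-memory key-value store per server (stand-in for real servers).
stores = {name: {} for name in SERVERS}


def master_for(key: int) -> str:
    """Return the server whose management range contains the key."""
    if key < LOWER_BOUNDS[0]:
        raise KeyError(f"key {key} is below every management range")
    # bisect_right finds the first boundary greater than the key;
    # the range that contains the key is the one just before it.
    index = bisect.bisect_right(LOWER_BOUNDS, key) - 1
    return SERVERS[index]


def put(key: int, value: str) -> None:
    """Store the key-value pair on the server in charge of the key."""
    stores[master_for(key)][key] = value


def get(key: int) -> str:
    """Read the value from the server in charge of the key."""
    return stores[master_for(key)][key]
```

In this sketch, a read request for a key is served only by the server whose range includes that key, so requests for keys in different ranges can be processed by different servers in parallel.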
Thus, the KVS can improve the parallel processing performance by scale-out.
In the KVS, a system is known that employs a configuration in which a server that constitutes a cluster stores copy data of the data managed by another server in order to secure data reliability. That is, each server is a master that manages the data included in a predetermined management range and is also a slave that holds copy data of the data managed by another server. Due to this, even when a failure occurs in a server, processing can continue, because another server which is a slave uses the copy data that it holds as master data in place of the data managed by the failed server as a master.
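The master and slave roles described above can be sketched, under simplifying assumptions, as follows. The classes `Server` and `Cluster`, the rule that the slave is the next server in the list, and the use of integer keys with a remainder-based master assignment are all hypothetical choices made for illustration; a single slave per master is assumed.

```python
class Server:
    """A server that is a master for its own data and a slave for another's."""

    def __init__(self, name: str):
        self.name = name
        self.alive = True
        self.master_data = {}  # data this server manages as master
        self.copy_data = {}    # copy of data managed by another server


class Cluster:
    def __init__(self, names):
        self.servers = [Server(n) for n in names]

    def _master_index(self, key: int) -> int:
        # Hypothetical placement rule, used only for this sketch.
        return key % len(self.servers)

    def _slave_index(self, master_index: int) -> int:
        # Assumption: the slave is the next server in the list.
        return (master_index + 1) % len(self.servers)

    def put(self, key: int, value: str) -> None:
        m = self._master_index(key)
        self.servers[m].master_data[key] = value
        # Replicate the value to the slave server as copy data.
        self.servers[self._slave_index(m)].copy_data[key] = value

    def get(self, key: int) -> str:
        m = self._master_index(key)
        master = self.servers[m]
        if master.alive:
            return master.master_data[key]
        # The master has failed: the slave serves its copy data instead.
        return self.servers[self._slave_index(m)].copy_data[key]
```

For example, after `cluster.put(4, "x")` on a three-server cluster, marking the master of key 4 as failed (`alive = False`) still allows `cluster.get(4)` to return the value from the slave's copy.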
It should be noted that the server as the master is hereinafter also referred to as “master server” and the server as the slave is hereinafter also referred to as “slave server”.
As described above, a single point of failure does not exist because the servers that constitute the KVS do not include a special server such as a management server. That is, since another server can continue processing even when a certain server fails, the computer system does not stop. Accordingly, the KVS can also ensure fault tolerance.
It should be noted that the computer system can arbitrarily determine the number of slave servers, in other words, the number of servers in which the replicated data is to be stored.
As a method of allocating data in a distributed manner used in the KVS or the like, various methods, such as a consistent hashing method, a range method, and a list method, are used.
For example, in consistent hashing, first, a hash value of a key is calculated, and then the remainder (residue) obtained by dividing the calculated hash value by the number of servers is calculated. The data is located in the server whose identification number is identical to the remainder.
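The placement rule as stated above (hash the key, then take the remainder of division by the number of servers) can be sketched as follows. This is a minimal illustration of that remainder-based rule only, not of ring-based variants of consistent hashing; the function name `server_for` and the choice of SHA-256 as the hash function are assumptions made for the example.

```python
import hashlib


def server_for(key: str, num_servers: int) -> int:
    """Return the identification number of the server that stores the key."""
    if num_servers <= 0:
        raise ValueError("num_servers must be positive")
    # Step 1: calculate a hash value of the key (SHA-256 is an assumption).
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest, "big")
    # Step 2: the remainder of division by the number of servers
    # identifies the server in which the data is located.
    return hash_value % num_servers
```

With this rule, the same key always maps to the same server as long as the number of servers is unchanged; changing the number of servers changes the remainder for most keys.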