The present invention relates to a distributed database including a plurality of computers, and more particularly to replication processing of distributed data.
In recent years, the amount of data handled by computing systems that execute Web-based applications has increased explosively, and computer systems including a NoSQL (Not Only Structured Query Language) database such as a KVS (Key Value Store) have come into widespread use. At present, such systems are being introduced into various enterprise systems, and further utilization is expected in the future.
In a KVS, various configurations are adopted: a configuration that stores data in a volatile storage medium enabling high-speed access, e.g. a memory; a configuration that stores data in a nonvolatile storage medium with excellent data storage durability, such as an SSD (Solid State Drive) or an HDD (Hard Disk Drive); and a configuration combining these. In a combined configuration, the balance between a memory store, configured by virtually integrating the memories of a plurality of computers, and a disk store, configured from the nonvolatile storage media of one or more computers, can be varied according to operating policies such as an emphasis on high-speed access or an emphasis on durable storage.
Data is stored in the memory store and the disk store as a pair consisting of a value and an identifier (key) of that value.
Further, in a KVS, a cluster is composed of a plurality of servers, and parallel processing is realized by distributing data among the servers included in the cluster. Specifically, data is stored in each server according to a range of keys (key range). Each server performs processing as the master of the data included in the key range it is in charge of. For example, in response to a read request including a predetermined key, the server in charge of the key range containing that key reads the data corresponding to the key.
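The key-range routing described above can be sketched as follows. The server names and key ranges below are hypothetical examples for illustration, not taken from the original text.

```python
# Illustrative sketch of key-range partitioning: each server is the
# master for one contiguous range of keys. Ranges are half-open.
RANGES = [
    ("server-A", "a", "h"),   # keys k with "a" <= k < "h"
    ("server-B", "h", "p"),   # keys k with "h" <= k < "p"
    ("server-C", "p", "~"),   # keys k with "p" <= k < "~"
]

def master_for(key: str) -> str:
    """Return the server acting as master for the key range containing key."""
    for server, low, high in RANGES:
        if low <= key < high:
            return server
    raise KeyError(f"no server is in charge of key {key!r}")
```

A read request for a given key would then be routed to the server returned by this lookup.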
Accordingly, in a KVS, parallel processing performance can be improved by scaling out.
It should be noted that the cluster is configured by connecting the servers in a ring, and a unique identification number is assigned to each server. Further, various methods, such as the consistent hashing method, the range method, and the list method, are used to determine the data arrangement for each server.
The consistent hashing method is described here as a representative example. In the consistent hashing method, a hash value corresponding to a key is calculated, and the remainder of dividing the calculated hash value by the number of servers is obtained. The data is arranged in the server whose identification number matches that remainder.
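The arrangement rule described above can be sketched as follows. The choice of MD5 as the hash function is an illustrative assumption; any hash function could be substituted.

```python
import hashlib

def server_for(key: str, num_servers: int) -> int:
    """Return the identification number of the server storing the key:
    the remainder of dividing the key's hash value by the server count."""
    # Interpret the digest as a large integer, as a stand-in for the
    # hash value described in the text.
    hash_value = int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)
    return hash_value % num_servers
```

Note that the same key always maps to the same server for a fixed cluster size, which is what allows read requests to be routed without a central directory.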
Some KVSs are configured so that replicated data of the data managed by each server is also stored in the other servers constituting the cluster, in order to ensure data reliability. Specifically, each server is a master managing the data included in a predetermined key range and, at the same time, a slave holding replicated data of the data managed by other servers. In this way, even if a failure occurs in one server, another server acting as a slave can become the master for the data managed by the failed server by promoting the replicated data it holds, and processing can be continued.
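The failover behavior described above can be sketched as follows. The cluster data structure and the function name are illustrative assumptions, not part of the original text.

```python
# Hypothetical sketch of failover: when the master for a key range
# fails, a slave holding the replicated data of that range is promoted
# to master so that processing can continue.
cluster = {
    "server-1": {"role": "master", "key_range": ("a", "m"), "data": {"apple": 1}},
    "server-2": {"role": "slave",  "key_range": ("a", "m"), "data": {"apple": 1}},
}

def promote_on_failure(failed_server: str) -> str:
    """Remove the failed master and promote a slave for the same key range."""
    key_range = cluster[failed_server]["key_range"]
    del cluster[failed_server]
    for name, server in cluster.items():
        if server["role"] == "slave" and server["key_range"] == key_range:
            server["role"] = "master"   # the replicated data is promoted
            return name
    raise RuntimeError("no slave holds replicated data for the key range")
```

Because the slave already holds a replica of the failed master's data, promotion requires no data transfer.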
It should be noted that, in the following, a server acting as a master is also written as a master server, and a server acting as a slave is also written as a slave server.
As described above, a KVS has no single point of failure because there is no special server such as a management server. Specifically, since another server can continue processing even if a failure occurs in an arbitrary server, the computer system does not stop. Thus, fault tolerance is ensured in the KVS.
It should be noted that the number of slave servers, i.e. the number of servers that become storage destinations of replicated data, can be set arbitrarily in the computer system.
The cost of the replication processing (replication) for storing replicated data in the slave servers is high. Specifically, if the system waits until the replicated data has been stored in all the slave servers in order to ensure data reliability, a waiting time occurs and high-speed processing of requests cannot be realized. Thus, it is recommended to perform the replication processing asynchronously with requests such as those for data read processing.
However, if processing is continued without waiting for the completion of the replication processing, there is a risk of losing data should a failure occur in the master server before the replication processing is completed, and data reliability cannot be ensured.
For example, the following methods are known as conventional replication processing.
A first method is as follows. When receiving a storage processing request from a client or the like, the master server stores the data in its memory store or disk store. The master server then notifies the client or other requester of the completion of the replication processing without first making a replication request to the slave servers. Thereafter, it requests the slave servers to replicate the data that was written (asynchronous replication processing). Although requests can be processed at high speed with the first method, data reliability is low because the storage of the replicated data in the slave servers has not been completed at the time of notification.
A second method is as follows. When receiving a storage processing request, the master server stores the data in its memory store or disk store and transmits the replicated data to one slave server. When receiving a completion notification of the storage of the replicated data from that slave server, the master server notifies the requesting computer of the completion of the replication processing (synchronous replication processing). Since the replicated data is stored in one slave server, data reliability is higher than in the first method. However, since a response from the slave server must be waited for, processing performance in response to a request is lower than in the first method. Further, in the second method, there is still a risk of losing data when a double failure occurs.
A third method is as follows. When receiving a storage processing request from a client device, the master server stores the data in its memory store or disk store and transmits the replicated data to all the slave servers. When receiving completion notifications of the storage of the replicated data from all the slave servers, the master server notifies the requesting computer of the completion of the replication processing (synchronous replication processing). Since the replicated data is stored in all the slave servers, data reliability is highest among the three methods. However, since responses from all the slave servers must be waited for, processing performance in response to a request is lowest.
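The three methods above can be contrasted in a simplified sketch. The Master and Slave classes below are hypothetical in-memory stand-ins for actual servers; network communication and failure handling are omitted.

```python
# Simplified sketch of the three replication methods: "async" notifies
# completion before replicating (first method), "sync_one" waits for one
# slave (second method), and "sync_all" waits for every slave (third
# method). All names here are illustrative.
class Slave:
    def __init__(self):
        self.data = {}

    def store(self, key, value):
        self.data[key] = value
        return True  # completion notification of the storage processing

class Master:
    def __init__(self, slaves):
        self.data = {}
        self.slaves = slaves
        self.pending = []  # replications deferred by the first method

    def put(self, key, value, method):
        self.data[key] = value
        if method == "async":        # first method: reply before replicating
            self.pending.append((key, value))
        elif method == "sync_one":   # second method: wait for one slave
            self.slaves[0].store(key, value)
        elif method == "sync_all":   # third method: wait for all slaves
            for slave in self.slaves:
                slave.store(key, value)
        return "completed"           # notification to the requester
```

The trade-off is visible in the control flow: the synchronous variants return only after one or all `store()` calls complete, while the asynchronous variant returns immediately and leaves the replication pending.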
It should be noted that methods other than the aforementioned ones are also known.
As described above, data reliability and high-speed request processing are in a trade-off relationship.
Various methods have been proposed for combining data reliability with high-speed request processing (see, for example, Patent Literature 1). In Patent Literature 1, a freshness threshold value is set in child nodes that replicate data of a root node, an update period is determined based on the freshness threshold value, and the data of each child node is updated accordingly.
Patent Literature 1: JP2009-545072A