Conventionally, there are several methods for putting data into, or getting data from, a distributed storage system, such as a NoSQL system typified by a distributed KVS (Key Value Store), in which data is multiplexed across a plurality of nodes. A node here is a computer including a CPU, a memory, a disk device, and the like, and the nodes are connected to one another via a network. The nodes in the distributed storage system serve as storage devices that store data in a distributed manner. To put data denotes to write data into the distributed storage system, and to get data denotes to read out data from the distributed storage system.
FIG. 12A is a diagram illustrating an example of a method to get data, and FIG. 12B is a diagram illustrating an example of a method to put data. In FIGS. 12A and 12B, data is triplicated, and each of the three copies is stored in a different node; the nodes have an order relation of “primary”→“secondary”→“tertiary”. Here, “primary”, “secondary”, and “tertiary” denote roles of the nodes in the distributed storage system. A primary 10 is a node having the role of “primary”, a secondary 20 is a node having the role of “secondary”, and a tertiary 30 is a node having the role of “tertiary”. Furthermore, a client 5 is a device that requests the distributed storage system to put or get data.
As illustrated in FIG. 12A, the client 5 can get data from any of the primary 10, the secondary 20, and the tertiary 30. Namely, whichever of the primary 10, the secondary 20, and the tertiary 30 the client 5 sends a get request to, the client 5 receives an OK in reply.
On the other hand, the client 5 can request only the primary 10 to put, as illustrated in FIG. 12B. A put request is transmitted in order of the client 5→the primary 10→the secondary 20→the tertiary 30, and “OK”, a reply to the put request, is transmitted in reverse order of the tertiary 30→the secondary 20→the primary 10→the client 5.
Non-patent document 1: Robbert van Renesse, Fred B. Schneider, “Chain Replication for Supporting High Throughput and Availability”, OSDI '04: 6th Symposium on Operating Systems Design and Implementation, p. 91.
Non-patent document 2: Jeff Terrace and Michael J. Freedman, “Object Storage on CRAQ: High-throughput chain replication for read-mostly workloads”, In Proc. USENIX Annual Technical Conference, San Diego, Calif., June 2009.
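The put and get paths of FIGS. 12A and 12B can be illustrated with a minimal Python sketch. The class name `Node`, its fields, and the in-memory dictionary used as storage are hypothetical and chosen only for illustration. The sketch also assumes that each node writes the data after it has received “OK” from its successor; this timing is inferred from the failure cases discussed in connection with FIG. 13 and is not stated for FIG. 12B itself.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    """One replica; `successor` is the next node in the order relation."""
    role: str
    successor: Optional["Node"] = None
    store: Dict[str, str] = field(default_factory=dict)

    def put(self, key: str, value: str) -> str:
        # Forward the put request down the chain first. The "OK" reply
        # travels back in reverse order as the nested calls return.
        if self.successor is not None:
            reply = self.successor.put(key, value)
            if reply != "OK":
                return reply  # no OK from downstream: do not write
        # Assumption: the node writes once downstream has acknowledged.
        self.store[key] = value
        return "OK"

    def get(self, key: str) -> Optional[str]:
        # Any node may serve a get from its own copy.
        return self.store.get(key)


# Build the chain primary -> secondary -> tertiary.
tertiary = Node("tertiary")
secondary = Node("secondary", successor=tertiary)
primary = Node("primary", successor=secondary)

# The client sends a put only to the primary ...
reply = primary.put("key", "value-1")

# ... but may send a get to any of the three nodes.
values = [node.get("key") for node in (primary, secondary, tertiary)]
```

In the absence of failures, `reply` is "OK" and every node returns the same value, which is why a get may be directed to any node.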
However, the conventional methods illustrated in FIGS. 12A and 12B have a problem that there may be a discrepancy among data held in the nodes. FIG. 13 is a diagram illustrating three cases of occurrence of the problem in the conventional methods.
Case (1) is a case where a failure occurs in the secondary 20 either after the secondary 20 has transmitted the put request to the tertiary 30 but before it has received a reply from the tertiary 30, or after the secondary 20 has received the reply from the tertiary 30 but before it sends a reply to the primary 10. In this case, the primary 10 detects a time-out; if the data is not written when the time-out occurs, old data remains stored in the primary 10 while the updated new data is stored in the tertiary 30.
Case (2) is a case where a temporary communication failure occurs in the network when the tertiary 30 sends a reply to the secondary 20, so that a time-out occurs in the secondary 20. In this case, if the data is not written when the time-out occurs, old data remains stored in the primary 10 and the secondary 20 while the updated new data is stored in the tertiary 30.
Case (3) is a case where a temporary communication failure occurs in the network when the secondary 20 sends a reply to the primary 10, so that a time-out occurs in the primary 10. In this case, if the data is not written when the time-out occurs, old data remains stored in the primary 10 while the updated new data is stored in the secondary 20 and the tertiary 30.
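The three cases can be reproduced with a small Python simulation. This is a sketch under stated assumptions, not a model of any particular system: the function `simulate_put` and its parameters `lost_replies` and `crash_after_forward` are hypothetical names, a node is assumed to write only after receiving “OK” from its successor, a node that does not receive “OK” is assumed to leave its old data in place, and the contents of a failed node are represented as `None`.

```python
from typing import AbstractSet, Dict, Optional, Tuple

ROLES = ["primary", "secondary", "tertiary"]


def simulate_put(
    lost_replies: AbstractSet[Tuple[str, str]] = frozenset(),
    crash_after_forward: Optional[str] = None,
) -> Dict[str, Optional[str]]:
    """Return what each node holds after one put of "new" over "old".

    lost_replies: (sender, receiver) pairs whose "OK" reply is dropped.
    crash_after_forward: role that fails after forwarding the request.
    """
    store: Dict[str, Optional[str]] = {role: "old" for role in ROLES}

    def handle(i: int) -> bool:
        """Process the put at ROLES[i]; True if "OK" reaches the sender."""
        role = ROLES[i]
        if i + 1 < len(ROLES):
            ok = handle(i + 1)  # forward the request down the chain
            if role == crash_after_forward:
                store[role] = None  # failed node: contents unavailable
                return False  # upstream node detects a time-out
            if not ok:
                return False  # time-out here: keep old data, send no OK
        store[role] = "new"  # write, then reply toward the client
        receiver = ROLES[i - 1] if i > 0 else "client"
        return (role, receiver) not in lost_replies

    handle(0)
    return store


no_fault = simulate_put()
case1 = simulate_put(crash_after_forward="secondary")
case2 = simulate_put(lost_replies={("tertiary", "secondary")})
case3 = simulate_put(lost_replies={("secondary", "primary")})
```

Under these assumptions, `case1`, `case2`, and `case3` each leave "old" in the primary 10 and "new" in the tertiary 30, which is the discrepancy described for FIG. 13; only `no_fault` leaves all three nodes holding the same data.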