In recent years, eventually consistent data stores have been widely utilized along with the development of scale-out system configurations. An eventually consistent data store is a distributed data store in which, with respect to the CAP theorem concerning consistency, availability, and partition tolerance of a distributed system, consistency is relaxed and availability and partition tolerance are prioritized. In such a distributed data store, mass data are stored, and distributed batch processing is utilized in order to efficiently analyze and manipulate the mass data.
For example, when an eventually consistent storage service such as Amazon S3 (registered trademark) is utilized, storage of mass data such as files into Amazon S3 and batch processing that includes manipulation of the stored data are executed by a plurality of computers.
FIG. 9 is a block diagram illustrating a configuration of a general distributed system that employs such distributed batch processing.
Referring to FIG. 9, the distributed system includes a control node 700, a plurality of processing nodes 800, and a distributed data store 900. The distributed data store 900 is the eventually consistent data store described above; it duplicates data contained in a processing-target file and stores the duplicates in a plurality of data store nodes 910. The control node 700 instructs each processing node 800 to execute a job of the distributed batch processing. Each processing node 800 executes the job of the distributed batch processing on the data designated by the control node 700.
FIG. 10 is a diagram illustrating an example of definitions of operations of the distributed batch processing.
In the example in FIG. 10, the distributed batch processing is made up of definitions of two jobs, a “job 1” and a “job 2”, and definitions of split execution for the respective jobs, “split by X” and “split by D”. The distributed batch processing is executed as follows. First, the control node 700 allocates data (file data) X (X1, X2, . . . ) contained in a file to each processing node 800 (split by X). Each processing node 800 executes the job 1 on the allocated data X (X1, X2, . . . ). The job 1 reads the allocated file data X (X1, X2, . . . ) from the file (FR(X)), and writes the file data as data D (D1, D2, . . . ) into the distributed data store 900 (SW(D)). When all the processing nodes 800 have finished the job 1 on all the file data X (X1, X2, . . . ), the control node 700 allocates the data D (D1, D2, . . . ) to each processing node 800 (split by D). Each processing node 800 executes the job 2 on the allocated data D (D1, D2, . . . ). The job 2 reads the allocated data D (D1, D2, . . . ) from the distributed data store 900 (SR(D)) and performs predetermined processing (P(D)). The job 2 writes a result of the processing as data E (E1, E2, . . . ) into the distributed data store 900 (SW(E)).
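The two-job flow described above can be sketched as follows. This is an illustrative model only: the function names are assumptions, a thread pool stands in for the processing nodes 800, and an in-memory dictionary stands in for the distributed data store 900 (a real system would use an eventually consistent store over the network).

```python
from concurrent.futures import ThreadPoolExecutor

# In-memory dict standing in for the distributed data store 900 (illustrative).
store = {}

def job1(x):
    # FR(X): read the allocated file data; SW(D): write it into the store as D.
    key = f"key-{x}"
    store[key] = f"D({x})"  # transform file data X into store data D
    return key

def job2(key):
    # SR(D): read D from the store; P(D): predetermined processing;
    # SW(E): write the result E back into the store.
    d = store[key]
    e = f"E({d})"
    store[f"result-{key}"] = e
    return e

file_data = ["X1", "X2", "X3"]  # data contained in the processing-target file

with ThreadPoolExecutor(max_workers=3) as pool:  # processing nodes 800
    # split by X: the control node allocates file data to processing nodes.
    keys = list(pool.map(job1, file_data))
    # split by D: only after job 1 ends on all file data does job 2 start.
    results = list(pool.map(job2, keys))

print(results)
```

Note that the barrier between the two `pool.map` calls mirrors the requirement that job 2 begins only after job 1 has ended on all the file data.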
In the eventually consistent data store, consistency of reading of data is not assured even immediately after the data are written. For example, when a process A updating the data X updates the value of the data X in a certain data store node 910, a certain amount of time is needed before the result of the update is reflected (synchronized) in the other data store nodes 910 that store replicas of the data X. Therefore, even when another process B reads the data X immediately after the update by the process A, if the data X are read from another data store node 910, a value different from the updated value of the data X may be read.
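This read-after-write anomaly can be modeled with a toy replica set. The class and method names below are assumptions for illustration; in a real store, propagation to the other data store nodes 910 happens asynchronously over the network rather than via an explicit `synchronize()` call.

```python
import random

class EventuallyConsistentStore:
    """Toy model of data store nodes 910 holding replicas of the data."""
    def __init__(self, num_replicas=3):
        self.replicas = [{} for _ in range(num_replicas)]
        self.pending = []  # updates not yet propagated to all replicas

    def write(self, key, value):
        # Process A's update lands on one replica immediately ...
        self.replicas[0][key] = value
        # ... and is queued for later synchronization to the others.
        self.pending.append((key, value))

    def read(self, key):
        # A reader may be routed to any replica, including a stale one.
        replica = random.choice(self.replicas)
        return replica.get(key)

    def synchronize(self):
        # Eventually, pending updates reach every replica.
        for key, value in self.pending:
            for replica in self.replicas:
                replica[key] = value
        self.pending.clear()

store = EventuallyConsistentStore()
store.write("X", "new")           # process A updates the data X
stale_possible = store.read("X")  # process B may still observe the old state
store.synchronize()
assert store.read("X") == "new"   # after synchronization, reads agree
```

Before `synchronize()` runs, `read("X")` may return either `"new"` or the pre-update state, which is exactly the inconsistency described above.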
FIG. 11 is a diagram illustrating an example of data processed by distributed batch processing in the distributed system in FIG. 9.
In the example in FIG. 11, there is a possibility that data read in the job 2 are data from before the update (synchronization) by the job 1, like the data D6′. In this case, an incorrect result E6′ of the processing is written into the distributed data store 900.
A technique for solving such a problem regarding consistency of reading of data in an eventually consistent data store is disclosed in, for example, PTL 1. In the technique of PTL 1, a time point at which data are certainly fixed is determined by using the logical clock value at the time of command reception in each data store node.
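One plausible reading of such a logical-clock criterion can be sketched as follows. This is an assumption-laden illustration, not the actual method of PTL 1: it supposes that each node's logical clock only moves forward and that a node applies writes in clock order, so a write is treated as fixed once every node's clock at command reception has passed the write's clock value.

```python
def data_is_fixed(write_clock, node_clocks):
    """Treat a datum written at logical clock `write_clock` as certainly
    fixed once the logical clock observed at command reception on every
    data store node has advanced to at least that value (illustrative
    assumption: clocks are monotonic and writes apply in clock order)."""
    return min(node_clocks) >= write_clock

# Logical clocks observed at command reception on three data store nodes.
assert data_is_fixed(5, [7, 6, 8]) is True   # every node has passed clock 5
assert data_is_fixed(5, [7, 4, 8]) is False  # one node may still lag behind
```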
Furthermore, as a related technique, PTL 2 discloses a technique in which, in an eventually consistent data store, an error in the execution sequence of operations is detected on the basis of the reception times of the operations, and the operations are re-executed in the correct sequence.
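As a rough illustration of that reception-time approach (the data model and function names below are assumptions for illustration, not the design of PTL 2), an applied operation log could be checked against reception-time order and replayed in the corrected sequence when an out-of-order application is detected:

```python
from dataclasses import dataclass

@dataclass
class Operation:
    reception_time: int  # timestamp assigned when the operation was received
    key: str
    value: str

def apply_in_received_order(applied_log):
    """Detect an execution-sequence error from reception times and
    re-execute the operations in the correct (reception-time) order."""
    out_of_order = any(
        a.reception_time > b.reception_time
        for a, b in zip(applied_log, applied_log[1:])
    )
    if out_of_order:
        applied_log = sorted(applied_log, key=lambda op: op.reception_time)
    state = {}
    for op in applied_log:  # re-execution in the corrected sequence
        state[op.key] = op.value
    return state

# Operations applied in the wrong order: the older write arrived later.
log = [Operation(2, "X", "new"), Operation(1, "X", "old")]
print(apply_in_received_order(log))  # the newer value wins: {'X': 'new'}
```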