Large-scale distributed file systems are attracting attention and are used as cloud computing platforms for executing very large-scale scientific and engineering calculations, among other purposes. The term “cloud computing” refers to a configuration in which, for example, a plurality of server computers is coupled through a network so that users can utilize the data processing capabilities of the server computers without being aware of the hardware configuration. Herein, it has the same meaning as network computing.
A representative example of such a cloud computing platform is the GFS (Google File System; see NPL 1). Because the file system employed in the GFS handles very large data, each data file is divided into relatively large units called “chunks”, which are stored on a plurality of servers (hereinafter, “chunk servers”) provided on a network. For example, one data file (hereinafter, simply “file”) may be on the order of gigabytes or more and consists of a plurality of chunks, each 64 MB in size. Information on each file, information on the file system as a whole, and the like are managed by a single master server communicatively coupled to the chunk servers through the network. A single master server managing a multitude of chunk servers thus forms one pseudo file system.
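The division of a file into fixed-size chunks, with the master server holding the mapping from each file to its chunks and their chunk servers, can be sketched as follows. This is a minimal, single-process illustration and not the GFS implementation: the names `Master`, `ChunkInfo`, and `locate`, the sequential chunk numbering, and the server names are hypothetical, and 64 MB is taken to mean 64 MiB.

```python
from dataclasses import dataclass, field

# Chunk size from the text (64 MB), taken here as 64 MiB.
CHUNK_SIZE = 64 * 1024 * 1024


@dataclass
class ChunkInfo:
    handle: int         # hypothetical chunk identifier assigned by the master
    servers: list[str]  # chunk servers holding a copy of this chunk


@dataclass
class Master:
    # File name -> ordered list of the chunks that make up the file.
    files: dict[str, list[ChunkInfo]] = field(default_factory=dict)
    next_handle: int = 1

    def create(self, name: str, size: int, servers: list[str]) -> None:
        """Register a file of `size` bytes, split into fixed-size chunks."""
        n_chunks = max(1, -(-size // CHUNK_SIZE))  # ceiling division
        chunks = []
        for _ in range(n_chunks):
            chunks.append(ChunkInfo(self.next_handle, list(servers)))
            self.next_handle += 1
        self.files[name] = chunks

    def lookup(self, name: str, chunk_index: int) -> ChunkInfo:
        return self.files[name][chunk_index]


def locate(master: Master, name: str, offset: int) -> tuple[ChunkInfo, int]:
    """Map a byte offset in a file to (chunk metadata, offset within that chunk)."""
    chunk_index, chunk_offset = divmod(offset, CHUNK_SIZE)
    return master.lookup(name, chunk_index), chunk_offset
```

With this sketch, a 1 GiB file is registered as 16 chunks, and byte offset 100 MiB falls in the second chunk at an offset of 36 MiB within that chunk. Keeping only this small per-chunk metadata on the master is what allows one master to manage many chunk servers.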
In calculations that handle such large-scale data, a single calculation may take from a few hours to a few days. In some cases, the final data is obtained only by repeatedly using the large-scale results of such a calculation as input to further calculations. Consequently, if a calculation error, missing data, or the like occurs partway through the process, a great deal of time is needed to redo the calculation. Moreover, in applications such as simulations, the same calculation is often repeated after changing only part of a huge data set, so many huge, similar data sets must be retained.
The GFS provides a snapshot function to prepare for such situations and to support such applications. A snapshot function retains an image of a file or the like as of a certain point in time so that the image at that point in time can be read later. The GFS snapshot retains differences on a per-chunk basis, only for chunks whose data have been updated, which enables multiple versions of data to be backed up and retained without copying all of the data making up a file.
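Per-chunk differencing of this kind is commonly realized with copy-on-write: taking a snapshot duplicates only the file-to-chunk metadata, and a chunk is copied only when it is first modified while still shared. The sketch below illustrates that idea under simplifying assumptions (a single process, no replication or master/chunk server separation); the class and method names are hypothetical and are not the GFS interface.

```python
from dataclasses import dataclass, field


@dataclass
class ChunkStore:
    data: dict[int, bytes] = field(default_factory=dict)    # chunk handle -> contents
    refcount: dict[int, int] = field(default_factory=dict)  # chunk handle -> referencing files
    next_handle: int = 1

    def new_chunk(self, contents: bytes) -> int:
        handle = self.next_handle
        self.next_handle += 1
        self.data[handle] = contents
        self.refcount[handle] = 1
        return handle


@dataclass
class SnapshotFS:
    store: ChunkStore = field(default_factory=ChunkStore)
    files: dict[str, list[int]] = field(default_factory=dict)  # file name -> chunk handles

    def write_file(self, name: str, chunks: list[bytes]) -> None:
        self.files[name] = [self.store.new_chunk(c) for c in chunks]

    def snapshot(self, src: str, dst: str) -> None:
        # Only metadata is duplicated; the snapshot shares every chunk with the source.
        handles = list(self.files[src])
        for h in handles:
            self.store.refcount[h] += 1
        self.files[dst] = handles

    def write(self, name: str, index: int, offset: int, data: bytes) -> None:
        handle = self.files[name][index]
        if self.store.refcount[handle] > 1:
            # Chunk is shared with a snapshot: copy it before modifying (copy-on-write).
            self.store.refcount[handle] -= 1
            handle = self.store.new_chunk(self.store.data[handle])
            self.files[name][index] = handle
        buf = bytearray(self.store.data[handle])
        buf[offset:offset + len(data)] = data
        self.store.data[handle] = bytes(buf)

    def read_chunk(self, name: str, index: int) -> bytes:
        return self.store.data[self.files[name][index]]
```

For example, after snapshotting a two-chunk file and writing into its second chunk, only that chunk is copied; the first chunk remains shared between the file and its snapshot, so the storage cost of the snapshot is proportional to the number of updated chunks.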
A differential snapshot, on the other hand, is a similar technology provided in an ordinary file system or the like built locally on a single server (hereinafter, “local file system”). It implements a snapshot with a small storage capacity by retaining, for each block of several KB to several tens of KB, the difference between the current image and the image of a file, a directory, the entire local file system, or the like as of a certain point in time. Differential snapshots are disclosed in PTL 1 and PTL 2.
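One generic way to retain such per-block differences is copy-on-first-write: when a block is overwritten after a snapshot is taken, its old contents are saved once into that snapshot's difference area, and reads of the snapshot fall back to the current volume for blocks that have not changed. The following sketch illustrates this general scheme only; it is not necessarily the method disclosed in PTL 1 or PTL 2, and the `Volume` class, its method names, and the 4 KB block size are assumptions chosen for illustration.

```python
BLOCK_SIZE = 4096  # assumed block size; the text gives several KB to several tens of KB


class Volume:
    def __init__(self, n_blocks: int):
        self.blocks = [bytes(BLOCK_SIZE) for _ in range(n_blocks)]
        # One difference area per snapshot: block number -> block image at snapshot time.
        self.snapshots: dict[str, dict[int, bytes]] = {}

    def take_snapshot(self, name: str) -> None:
        # Taking a snapshot copies no data; its difference area starts empty.
        self.snapshots[name] = {}

    def write_block(self, n: int, data: bytes) -> None:
        if len(data) != BLOCK_SIZE:
            raise ValueError("this sketch only supports whole-block writes")
        for diff in self.snapshots.values():
            # Copy-on-first-write: save the old image only once per snapshot.
            diff.setdefault(n, self.blocks[n])
        self.blocks[n] = data

    def read_block(self, n: int) -> bytes:
        return self.blocks[n]

    def read_snapshot_block(self, name: str, n: int) -> bytes:
        # Blocks not in the difference area are unchanged and read from the volume.
        return self.snapshots[name].get(n, self.blocks[n])

    def snapshot_size(self, name: str) -> int:
        # Space consumed by a snapshot grows only with the number of changed blocks.
        return len(self.snapshots[name]) * BLOCK_SIZE
```

Because each snapshot keeps its own difference area, several snapshots taken at different times can coexist, and a block rewritten many times after a snapshot still costs that snapshot only one saved block.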
Citation List
Patent Literature
PTL 1: U.S. Pat. No. 5,963,962
PTL 2: U.S. Pat. No. 7,237,076
Non Patent Literature
NPL 1: Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung, “The Google File System”, ACM SIGOPS Operating Systems Review, Volume 37, Issue 5 (December 2003), SOSP '03