The quantity of digital data produced by companies and other organizations increases year after year. Accordingly, the storage capacity required to hold this digital data has grown, driving up data management costs. Against this background, storage products and techniques with a data reduction function called “deduplication” are drawing attention.
General deduplication processing is performed through the following three processes.
(1) Chunking Process: Divide data to be stored in a storage device into data fragments called chunks.
(2) Duplication Judgment Process: Judge whether the storage device already contains a chunk identical to any of the newly created chunks (in other words, whether data are stored in a duplicated manner).
(3) Metadata Creation Process: Store only the non-duplicate chunks among the newly created chunks in the storage device, and create information (hereinbelow called “metadata”) to be used in the duplication judgment process and in restoring the original data from the stored chunk data.
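The three processes above can be sketched as follows. This is a minimal illustration, assuming fixed-size chunking and a SHA-256 fingerprint for the duplication judgment; the chunk size, the in-memory chunk store, and the function names are assumptions for the example, not part of any specific product.

```python
import hashlib

CHUNK_SIZE = 8  # bytes; illustrative only, real systems use KB-scale chunks


def deduplicate(data, store):
    """Chunk `data`, store only non-duplicate chunks in `store`, return metadata.

    `store` maps a chunk's fingerprint to its real data; the returned
    metadata is the ordered list of fingerprints needed to restore the
    original data.
    """
    metadata = []
    # (1) Chunking process: split into fixed-size chunks
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        fp = hashlib.sha256(chunk).hexdigest()
        # (2) Duplication judgment: is an identical chunk already stored?
        if fp not in store:
            # (3) Store only the non-duplicate chunk
            store[fp] = chunk
        metadata.append(fp)
    return metadata


def restore(metadata, store):
    """Rebuild the original data from the metadata and the chunk store."""
    return b"".join(store[fp] for fp in metadata)


store = {}
meta = deduplicate(b"abcdefghabcdefghXYZ", store)
assert restore(meta, store) == b"abcdefghabcdefghXYZ"
```

Here the repeated 8-byte run appears twice in the metadata but its real data is stored only once, which is the source of the data reduction.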
In the above processing, only one piece of real data is stored in the storage device for multiple duplicate chunks. Thus, considerable data reduction can be expected when approximately the same data appear many times, as in the case of backup data.
In such deduplication processing, the chunking method is an important factor that determines performance. Generally, the smaller the chunk size, the higher the rate of real data that can be reduced (the deduplication rate). However, too small a chunk size increases both the amount of metadata needed to manage the chunks and the time required to restore the original data from them. Conversely, a large chunk size decreases the amount of metadata and the restoration time, but lowers the deduplication rate.
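The trade-off can be made concrete with a small experiment. The sketch below, assuming fixed-size chunking over backup-like data (a large repeated block with a small edit in the middle), counts how many bytes of real data must be stored and how many metadata entries are created at each chunk size; the data pattern and sizes are invented for illustration.

```python
import hashlib


def dedup_stats(data, chunk_size):
    """Return (stored_bytes, metadata_entries) for fixed-size chunking."""
    seen = set()
    stored = 0
    entries = 0
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        fp = hashlib.sha256(chunk).digest()
        entries += 1
        if fp not in seen:          # duplication judgment
            seen.add(fp)
            stored += len(chunk)    # only non-duplicate real data is stored
    return stored, entries


# Backup-like data: a large repeated block with a small edit in the middle
data = b"A" * 4096 + b"edit" + b"A" * 4096
for size in (64, 512, 4096):
    stored, entries = dedup_stats(data, size)
    print(f"chunk_size={size:5d}: stored {stored:5d} of {len(data)} bytes, "
          f"{entries} metadata entries")
```

Running this shows the dilemma directly: the smallest chunk size stores the fewest bytes of real data but produces the most metadata entries, while the largest chunk size needs little metadata but barely deduplicates at all.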
As a countermeasure against this dilemma regarding chunk size, techniques are known that apply multiple chunk sizes depending on the data on which deduplication is performed (see, for example, Patent Literatures 1 and 2). Patent Literature 1 discloses a method of first performing chunking with a small chunk size, then detecting the largest sequence of repeated chunks among the chunks created, and newly outputting chunks having a chunk size larger than the initial value. Moreover, Patent Literature 2 discloses a method of chunking the target data with a large chunk size for long stretches of duplicate data and long stretches of non-duplicate data, and with a small chunk size for data near the border between the duplicate data and the non-duplicate data.
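The flavor of the first approach, chunking small first and then emitting larger chunks over repetitive regions, can be sketched as follows. To keep the example short, it merges only maximal runs of identical consecutive small chunks; this is a simplified stand-in for the idea, not the patented algorithm itself, and the chunk size and data are invented for illustration.

```python
SMALL = 4  # initial (small) chunk size; illustrative only


def chunk_small(data):
    """Initial pass: fixed-size chunking with the small chunk size."""
    return [data[i:i + SMALL] for i in range(0, len(data), SMALL)]


def coalesce_repeats(chunks):
    """Merge each maximal run of identical consecutive small chunks into
    one larger output chunk (a simplified stand-in for detecting repeated
    chunk sequences and re-emitting them at a larger chunk size)."""
    out = []
    i = 0
    while i < len(chunks):
        j = i
        while j + 1 < len(chunks) and chunks[j + 1] == chunks[i]:
            j += 1
        out.append(b"".join(chunks[i:j + 1]))  # one chunk covering the run
        i = j + 1
    return out


chunks = chunk_small(b"XXXX" * 5 + b"data" + b"YYYY" * 3)
merged = coalesce_repeats(chunks)
```

In this sketch the nine small chunks collapse to three output chunks: the repetitive stretches become single large chunks, while the unique stretch keeps the small chunk size, reducing the number of metadata entries without hurting the deduplication rate on the unique data.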