Deduplication has been used to reduce the ever-growing amount of data stored in memory devices such as storage devices. In deduplication, data (or a record) is divided into pieces of a predetermined division size, of either fixed or variable length (for example, into 8 KB blocks), and a fingerprint (FP) is generated by calculating a hash value for each piece of the divided data. Duplicate data is then detected by comparing the FPs, and only one copy of the pieces of data having the same FP is stored. Eliminating such duplicate data reduces the amount of data stored in the memory device.
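The divide, fingerprint, and compare steps described above can be illustrated with a minimal sketch. It assumes fixed-length division into 8 KB pieces and uses SHA-256 as the hash function; the hash algorithm and the names `deduplicate`, `restore`, `store`, and `recipe` are illustrative assumptions, not taken from the description.

```python
import hashlib


def deduplicate(data: bytes, chunk_size: int = 8 * 1024):
    """Divide data into fixed-size pieces and keep one copy per fingerprint.

    Returns (store, recipe): store maps each fingerprint to the single
    stored piece, and recipe lists the fingerprints in original order.
    """
    store = {}   # fingerprint -> piece of data (assumed in-memory store)
    recipe = []  # ordered fingerprints needed to rebuild the data
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        # SHA-256 is an assumed choice of hash for the fingerprint (FP).
        fp = hashlib.sha256(chunk).hexdigest()
        if fp not in store:
            store[fp] = chunk  # first piece with this FP: store it
        recipe.append(fp)      # duplicates are recorded by FP only
    return store, recipe


def restore(store, recipe) -> bytes:
    """Rebuild the original data from the stored pieces."""
    return b"".join(store[fp] for fp in recipe)
```

For input made of three 8 KB pieces in which the first and third are identical, `store` holds two pieces while `recipe` holds three fingerprints, and `restore` returns the original bytes.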
There are three kinds of deduplication for storing backup data in a storage device.
(1) Post-Process Deduplication
With post-process deduplication, data is first stored in the storage device as is, and the stored pieces of data are then compared with one another and deduplicated by a control unit of the storage device.
(2) Client-Side Deduplication
With client-side deduplication, pieces of data are compared with one another and deduplicated at an external device other than the storage device, such as a backup management server or a client, and are then stored in the storage device.
(3) In-Line Deduplication
With in-line deduplication, pieces of data are compared with one another at the control unit of the storage device as the data is being stored in the storage device.
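The difference among the three kinds lies in where and when the comparison takes place. The following toy model is a sketch under stated assumptions: the `StorageDevice` class, its methods, the `client_side_dedup` function, the 8 KB piece size, and the use of SHA-256 are all hypothetical and serve only to contrast the three flows.

```python
import hashlib

CHUNK = 8 * 1024  # assumed division size


def fingerprint(chunk: bytes) -> str:
    # SHA-256 is an assumed choice of hash for the fingerprint.
    return hashlib.sha256(chunk).hexdigest()


class StorageDevice:
    """Hypothetical storage device whose methods stand in for its control unit."""

    def __init__(self):
        self.raw = []        # pieces stored as is, not yet compared
        self.unique = {}     # fingerprint -> single stored piece
        self.peak_bytes = 0  # largest amount held after any write

    def used_bytes(self) -> int:
        raw_total = sum(len(c) for c in self.raw)
        unique_total = sum(len(c) for c in self.unique.values())
        return raw_total + unique_total

    def _record_peak(self):
        self.peak_bytes = max(self.peak_bytes, self.used_bytes())

    def store_as_is(self, chunk: bytes):
        # Store the piece without comparing it to anything.
        self.raw.append(chunk)
        self._record_peak()

    def post_process(self):
        # (1) Post-process: compare pieces that are already stored.
        for chunk in self.raw:
            self.unique.setdefault(fingerprint(chunk), chunk)
        self.raw.clear()

    def store_inline(self, chunk: bytes):
        # (3) In-line: compare each piece while it is being stored.
        self.unique.setdefault(fingerprint(chunk), chunk)
        self._record_peak()


def client_side_dedup(chunks):
    # (2) Client-side: compare on the external device before sending.
    seen = {}
    for chunk in chunks:
        seen.setdefault(fingerprint(chunk), chunk)
    return list(seen.values())
```

In this model, writing four pieces of which three are identical makes the post-process device hold all four pieces until `post_process` runs, whereas the in-line device never holds more than two, and `client_side_dedup` returns only two pieces to be sent and stored as is.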
Reducing the amount of data stored in a storage device is desirable both to suppress growth in the amount of stored data and to reduce costs. Deduplication is expected to be a technique that further reduces the amount of data.