In many traditional computer data storage applications, magnetic tape has been used for data backup. However, tape is a sequential access device. Although its streaming bandwidth is high once the head is positioned at the right point and the tape is moving at full speed, the setup time needed to reach that point is long.
With the rapid growth in disk capacity, it is now affordable to use hard disks for data backup. Hard disks, of course, are random access devices and can significantly speed up backup and restore operations. Accordingly, disk-to-disk (D2D) backup has become the preferred backup option for many organizations. Backed-up data typically contains massive redundancy, because a large proportion of the data generally does not change between backup sessions. De-duplication removes this redundancy by storing duplicate data only once, which increases the effective capacity of the storage device.
Thus, de-duplication has become an essential feature of disk-to-disk backup solutions. Chunk-based de-duplication is a well-known de-duplication technique. In this approach, the data to be de-duplicated is broken into chunks, and each incoming chunk is compared with the chunks already in the store by comparing their hashes. Only chunks that are not already in the store are backed up; duplicate chunks are replaced with pointers to the identical stored copies.
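The chunk-based approach described above can be illustrated with a minimal in-memory sketch. It assumes fixed-size chunks, SHA-256 as the hash, and a dictionary as the chunk store. The text does not specify how chunks are formed or which hash is used, so `CHUNK_SIZE`, `ChunkStore`, `backup`, `restore` and `stored_bytes` are hypothetical names chosen for illustration. Each backup returns a list of digests (the "pointers") from which the original data can be rebuilt.

```python
import hashlib

# Hypothetical chunk size; the source does not say how chunks are formed.
CHUNK_SIZE = 4096


class ChunkStore:
    """Illustrative in-memory chunk store keyed by SHA-256 digest."""

    def __init__(self):
        self.chunks = {}  # digest -> chunk bytes; each unique chunk is kept once

    def backup(self, data: bytes) -> list:
        """Split data into fixed-size chunks, store only chunks not seen
        before, and return the list of digests that recreates the data."""
        recipe = []
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:
                self.chunks[digest] = chunk  # new chunk: store the bytes
            recipe.append(digest)  # new or duplicate: record a pointer
        return recipe

    def restore(self, recipe: list) -> bytes:
        """Rebuild the original data by following the stored pointers."""
        return b"".join(self.chunks[d] for d in recipe)

    def stored_bytes(self) -> int:
        """Physical bytes actually held in the store."""
        return sum(len(c) for c in self.chunks.values())


# Two hypothetical backup sessions in which only the middle chunk changed.
store = ChunkStore()
first = b"A" * CHUNK_SIZE + b"B" * CHUNK_SIZE + b"C" * CHUNK_SIZE
second = b"A" * CHUNK_SIZE + b"X" * CHUNK_SIZE + b"C" * CHUNK_SIZE
r1 = store.backup(first)
r2 = store.backup(second)
print(store.stored_bytes(), "bytes stored for", len(first) + len(second), "bytes backed up")
```

Here the two sessions contain six chunks in total, but only four unique chunks are stored. Like most chunk-based schemes, the sketch treats equal digests as identical chunks. It relies on the negligible collision probability of SHA-256 rather than performing a byte-for-byte comparison.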