1. Field
This invention relates to deduplication and more particularly relates to enhanced block-level deduplication.
2. Description of the Related Art
The amount of stored data is growing at an enormous rate. A large storage system commonly holds as much as 3,000 petabytes (one petabyte being 1024 terabytes) of data, and that amount grows by about 30% a year. Furthermore, many copies of the same data typically exist on a storage system. This has given rise to methods that compress this redundant data or eliminate the duplicate copies in order to improve the effective capacity of the storage system. Such methods are commonly referred to as deduplication.
Deduplication at the data block level is difficult because there is no knowledge of the file name, file structure, or application, and therefore no way to infer or check the supposed duplication of the files. Block-level deduplication requires a large amount of processing to compare blocks individually for duplicates, by reading sectors of data and comparing them to see if they are identical. A system may read blocks of data, compute a signature for each block, and determine whether multiple blocks have the same signature. If signatures match, the system may perform a bit-by-bit comparison of the data blocks to determine whether the blocks are actually duplicates.
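The signature-then-compare approach described above can be sketched as follows. This is a minimal illustration, not the method of the invention: it assumes fixed-size blocks held in memory and uses SHA-256 as the block signature, and the function name `find_duplicate_blocks` is hypothetical. A signature match only marks a candidate; the full comparison of block contents confirms actual duplication.

```python
import hashlib
from collections import defaultdict


def find_duplicate_blocks(data: bytes, block_size: int) -> list[tuple[int, int]]:
    """Return (duplicate_index, original_index) pairs for identical blocks.

    Hypothetical sketch: SHA-256 stands in for "a signature of the block".
    """
    # Split the raw data into fixed-size blocks; the last block may be short.
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    # Directory mapping each signature to the indexes of distinct blocks with it.
    directory: dict[bytes, list[int]] = defaultdict(list)
    duplicates = []
    for index, block in enumerate(blocks):
        signature = hashlib.sha256(block).digest()
        # Matching signatures only identify candidates; compare the full
        # contents to confirm the blocks are truly identical.
        for candidate in directory[signature]:
            if blocks[candidate] == block:
                duplicates.append((index, candidate))
                break
        else:
            # No confirmed match: record this block as a new distinct block.
            directory[signature].append(index)
    return duplicates


data = b"AAAA" + b"BBBB" + b"AAAA" + b"CCCC" + b"BBBB"
print(find_duplicate_blocks(data, 4))  # [(2, 0), (4, 1)]
```

Keeping a list of indexes per signature, rather than a single index, lets the sketch tolerate a signature collision between blocks whose contents differ.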
Deduplication is most effective with the smallest block size, because smaller blocks expose more duplicate data. However, the smaller the block size, the larger the directory of indexes to the signatures of the blocks. A large directory requires a large amount of storage to hold it and more processing resources to search it. Furthermore, because deduplication at the block level is resource intensive, the signature comparisons and the bit-by-bit comparisons are often performed by background processes of the storage system. These background processes themselves consume valuable resources of the storage system.
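The trade-off between block size and directory size can be made concrete with simple arithmetic. The sketch below assumes, hypothetically, one directory entry per block and 32 bytes per entry (for example, a SHA-256 digest alone, ignoring any block pointer); the function name `directory_size` is illustrative only. For 1 TiB of capacity it yields 8 GiB of directory at 4096-byte blocks, 2 GiB at 16384-byte blocks, and 0.5 GiB at 65536-byte blocks.

```python
def directory_size(capacity_bytes: int, block_size: int,
                   entry_bytes: int = 32) -> tuple[int, int]:
    """Return (number of directory entries, directory size in bytes).

    Hypothetical model: one signature entry of `entry_bytes` per block.
    """
    # Ceiling division so a partial final block still gets an entry.
    entries = -(-capacity_bytes // block_size)
    return entries, entries * entry_bytes


TIB = 1024 ** 4
for block_size in (4096, 16384, 65536):
    entries, size = directory_size(TIB, block_size)
    print(f"{block_size:>6}-byte blocks: {entries:,} entries, "
          f"{size / 1024 ** 3:.1f} GiB directory")
```

Each fourfold reduction in block size quadruples both the number of entries and the directory size, which is why the finest-grained deduplication is also the most expensive to index and search.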