According to one estimate, the size of the digital universe in 2007 was two hundred eighty-one billion gigabytes. The estimate goes on to note that the digital universe had a compound annual growth rate of almost sixty percent. With so much information being generated, the need to store information efficiently is increasing.
Traditionally, data has been stored by backing up a copy of the data to a storage device. However, there is frequently a substantial amount of redundancy in the data that is stored in the storage device. For example, the data may have numerous copies of a file, or there may be only minor modifications in the data between consecutive backups. Redundant data wastes storage capacity and unnecessarily consumes bandwidth. Thus, storing data would be more efficient if the data redundancy were removed.
There have been attempts to remove data redundancy. One approach is to divide the data into blocks, assign a unique signature to each block, and store the blocks and unique signatures in a hash table or image file. During subsequent backup operations, new data is divided into blocks, each block is assigned a signature, and the blocks and signatures are compared to previous ones to determine whether a block was previously stored. If an identical block or signature is found, the block is discarded; otherwise, the new block is stored. This approach is commonly known as deduplication, or “deduping.” Other approaches include storing the blocks in a binary tree and determining whether an incoming block should be stored by searching the binary tree.
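By way of illustration only, the block-and-signature approach described above may be sketched as follows. The sketch assumes fixed-size blocks, a SHA-256 digest as the signature, and an in-memory dictionary standing in for the hash table; the names BLOCK_SIZE, split_blocks, backup, and restore are hypothetical and do not correspond to any particular existing system.

```python
import hashlib

# Hypothetical block size, kept tiny so the example is easy to follow.
BLOCK_SIZE = 4


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list:
    """Divide the data into fixed-size blocks (the last block may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def backup(data: bytes, store: dict) -> list:
    """Store only blocks whose signature has not been seen before.

    `store` maps signature -> block and stands in for the hash table.
    Returns the ordered list of signatures needed to rebuild `data`.
    """
    recipe = []
    for block in split_blocks(data):
        # Assumption: a SHA-256 digest serves as the block's signature.
        signature = hashlib.sha256(block).hexdigest()
        if signature not in store:
            store[signature] = block  # new block: keep it
        # A block with a known signature is discarded; only the
        # signature is recorded in the recipe.
        recipe.append(signature)
    return recipe


def restore(recipe: list, store: dict) -> bytes:
    """Rebuild the original data from its list of signatures."""
    return b"".join(store[signature] for signature in recipe)


store = {}
first = backup(b"AAAABBBBAAAA", store)   # two distinct blocks, one repeated
second = backup(b"AAAABBBBCCCC", store)  # only the last block is new
```

In this sketch, the second backup adds a single block to the store, because its first two blocks share signatures with blocks already stored. A digest is not strictly guaranteed to be unique, so a practical system might also compare block contents when signatures match.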
While such approaches achieve some efficiency by not storing redundant data, they incur significant disk overhead as a result of constantly accessing the disk to search for data blocks. Also, the searching techniques employed in existing systems often involve searching for the signature in a database, which becomes less efficient as the size of the database grows. There is a need, therefore, for an improved method, article of manufacture, and apparatus for backing up information.