Data deduplication (also referred to simply as “deduplication”) is a space-saving technology intended to eliminate redundant (duplicate) data (such as, files) on a data storage system. By saving only one instance of a file, disk space can be significantly reduced. For example, if a file of size 10 megabytes (MB) is stored in ten folders of each employee in an organization that has ten employees. Thus, 100 megabytes (MB) of the disk space is consumed to maintain the same file of size 10 megabytes (MB). Deduplication ensures that only one complete copy is saved to a disk. Subsequent copies of the file are only saved as references that point to the saved copy, such that end-users still see their own files in their respective folders. Similarly, a storage system may retain 200 e-mails, each with an attachment of size 1 megabyte (MB). With deduplication, the disk space needed to store each attachment of size 1 megabyte (MB) is reduced to just 1 megabyte (MB) from 200 megabyte (MB) because deduplication only stores one copy of the attachment.
Data deduplication can operate at a file or a block level. File deduplication eliminates duplicate files (as in the example above), but block deduplication processes blocks within a file and saves unique copy of each block. For example, if only a few bytes of a document or presentation or a file are changed, only the changed blocks are saved. The changes made to few bytes of the document or the presentation or the file does not constitute an entirely new file.