As technology advances, data storage is becoming increasingly important and data storage capacities are increasing rapidly. Correspondingly, the size of data storage arrays and the demands placed on them have increased rapidly, and ever-increasing amounts of data are required to be highly available. As a result, conservation of storage space and the use of space-saving techniques have become particularly important.
To save space, deduplication detects and eliminates duplicate data in storage. Previous solutions have been in the area of backup deduplication and are not well suited to primary or live storage. Previous attempts at inline deduplication for filesystem storage, that is, deduplication performed before data from an application is written to the disk, have significant performance issues. For example, when an application makes a request to write data, the data must be compared to the data currently stored before it is written, and this comparison imposes a performance penalty. Even if the deduplication process results in no write operation, the penalty is still incurred and negatively impacts application performance. The penalty may be reduced if a large amount of memory is used to avoid performing lookups on the storage. However, this is not practical in most systems.
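The write-path cost described above can be illustrated with a minimal sketch. It assumes that the comparison is done by looking up a SHA-256 digest of each block in an index; the class name InlineDedupStore, the in-memory dictionary standing in for the disk, and the lookups counter are hypothetical and are not taken from any particular prior system.

```python
import hashlib


class InlineDedupStore:
    """Toy block store that deduplicates on the write path (illustrative only)."""

    def __init__(self):
        self.blocks = {}   # digest -> block bytes; stands in for on-disk storage
        self.lookups = 0   # number of index lookups performed so far

    def write(self, block: bytes) -> str:
        # The incoming data must be compared with stored data before the write.
        digest = hashlib.sha256(block).hexdigest()
        self.lookups += 1  # this cost is paid on every write request
        if digest not in self.blocks:
            # Only previously unseen data is actually stored.
            self.blocks[digest] = block
        return digest


store = InlineDedupStore()
first = store.write(b"x" * 4096)
second = store.write(b"x" * 4096)  # duplicate: nothing new is stored
```

In this sketch the second request stores nothing, yet it still pays for the digest computation and the index lookup. Here the index lives in memory; the background's concern is the case where the index is too large for memory and each lookup must go to storage.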
File level deduplication is not feasible for primary storage because of the significant resources involved in comparing the files. In addition, file level deduplication generally does not provide acceptable deduplication percentages, in particular when files are modified by as little as a single byte, since the modified file no longer matches the original as a whole.
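The single-byte case can be shown with a short sketch. It assumes SHA-256 fingerprints and a fixed 4096-byte block size, and it uses block-level fingerprinting only as a point of contrast; the function names and the block size are illustrative assumptions, not part of any described system.

```python
import hashlib
from typing import List

BLOCK_SIZE = 4096  # assumed block size, chosen for illustration only


def file_fingerprint(data: bytes) -> str:
    """Fingerprint of the whole file, as file level deduplication would use."""
    return hashlib.sha256(data).hexdigest()


def block_fingerprints(data: bytes, block_size: int = BLOCK_SIZE) -> List[str]:
    """One fingerprint per fixed-size block of the file."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def shared_blocks(a: bytes, b: bytes) -> int:
    """Count blocks at the same position whose fingerprints match."""
    pairs = zip(block_fingerprints(a), block_fingerprints(b))
    return sum(1 for x, y in pairs if x == y)


original = bytes(range(256)) * 64  # 16384 bytes, i.e. four 4096-byte blocks
changed = bytearray(original)
changed[5000] ^= 0xFF              # flip one byte, which falls in the second block
modified = bytes(changed)

files_match = file_fingerprint(original) == file_fingerprint(modified)
blocks_in_common = shared_blocks(original, modified)
```

Under these assumptions the whole-file fingerprints differ, so file level deduplication saves nothing, even though three of the four blocks are unchanged.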
Thus, a need exists to efficiently deduplicate data of a filesystem on primary storage so that large amounts of RAM are not required and applications do not experience a performance penalty.