The present invention relates generally to data storage. In particular, the present invention relates to data deduplication for streaming sequential data from storage such as tape storage.
Data deduplication reduces the data footprint on storage media. For example, for backup data, deduplication is reported to reduce the data footprint by a factor of 10 to 100. Data deduplication is a form of data compression that eliminates redundant data and thereby improves storage utilization. Large data sets often contain long stretches of duplicate bytes. Data deduplication identifies these duplicate stretches and replaces each of them with a reference to a single stored copy of the data. As such, the amount of data that must be stored is reduced.
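By way of illustration only, the replacement of duplicate stretches with references to a single copy may be sketched as follows. This sketch assumes fixed-size chunking and SHA-256 fingerprints, neither of which is specified above; the names `CHUNK_SIZE`, `deduplicate`, and `reconstruct` are hypothetical and are introduced solely for this example.

```python
import hashlib

# Assumption: a fixed chunk size in bytes; the text above does not specify one.
CHUNK_SIZE = 4096


def deduplicate(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Split data into fixed-size chunks and keep one copy of each unique chunk.

    Returns (store, recipe): store maps a fingerprint to the single stored
    copy of a chunk; recipe is the ordered list of fingerprints (references)
    needed to rebuild the original data.
    """
    store = {}
    recipe = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        # Assumption: SHA-256 is used as the chunk fingerprint.
        fingerprint = hashlib.sha256(chunk).hexdigest()
        # Store the chunk only the first time its fingerprint is seen.
        store.setdefault(fingerprint, chunk)
        # Every occurrence, duplicate or not, is recorded as a reference.
        recipe.append(fingerprint)
    return store, recipe


def reconstruct(store, recipe) -> bytes:
    """Rebuild the original data by following the references in order."""
    return b"".join(store[fingerprint] for fingerprint in recipe)
```

In this sketch, an input consisting of ten identical 4096-byte chunks followed by two further identical chunks is stored as two unique chunks plus twelve references, and `reconstruct` returns the original bytes unchanged.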