The present invention relates generally to the field of data storage management and more particularly to identifying data for deduplication.
Data deduplication is a data compression technique for eliminating repeated copies of the same data. Data deduplication improves storage utilization and, when applied to network data transfers, reduces the volume of data transmitted. In data deduplication, unique files (or, more generally, byte patterns) are identified and stored for analysis. This analysis may include comparing other files to the unique files and eliminating redundant files. Current data deduplication methods face two difficulties: identifying redundant data files, and the amount of memory required to identify data for deduplication.
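By way of illustration only, the general technique described above may be sketched as follows. The sketch assumes that two byte patterns are treated as identical when their SHA-256 digests match; the use of a digest, the function name `deduplicate`, and the in-memory dictionary are assumptions made for this example and are not drawn from any particular deduplication method.

```python
import hashlib


def deduplicate(blobs):
    """Store each unique byte pattern once and return references to it.

    Hypothetical helper for illustration: `store` maps a digest to the
    single retained copy, and `refs` holds one digest per input item.
    """
    store = {}
    refs = []
    for blob in blobs:
        # Assumption: equal SHA-256 digests are taken to mean equal content.
        digest = hashlib.sha256(blob).hexdigest()
        if digest not in store:
            # First occurrence of this byte pattern: keep it.
            store[digest] = blob
        # Redundant copies are represented only by a reference.
        refs.append(digest)
    return store, refs


if __name__ == "__main__":
    store, refs = deduplicate([b"alpha", b"beta", b"alpha"])
    print(len(store), "unique items for", len(refs), "inputs")
```

The index in such a sketch holds one entry per unique item, so it grows with the number of unique items, which is one way the memory difficulty noted above can arise.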