Safeguarding electronic data by backing the data up is a common operation, and one that consumes increasing amounts of memory and processing power. Data files today typically occupy much more memory than files produced by earlier software programs, and thus backing these files up requires more storage space, as well as more processing power and communication-link bandwidth to transfer and store the files. With enormous amounts of data to back up, it is desirable to reduce the backup workload by not backing up data that has not changed, and by backing up as few copies (preferably one) of a file as possible.
A number of techniques have been developed for network-based computer backup systems that greatly reduce the bandwidth and storage needs of the backup system. Two examples are differential file backup and common file elimination (e.g., Cane et al., U.S. Pat. No. 5,765,173). Generally, differential file backup is performed by determining changes that have occurred within a file using a set of hash codes that represents the information within the file, as it previously existed, in fixed-size blocks. These hash codes are matched against the same file as now modified, to determine which areas of the file have changed and which have not. Only the changed portions of the file need to be sent and stored, which results in significant bandwidth and space savings. Common file elimination determines whether a file to be backed up is the same as another file (e.g., a file already backed up), and if so, stores only one copy of that file. Common file elimination techniques can be applied to data groups other than files.
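The two techniques described above can be illustrated with a minimal sketch. It is not the method of the cited patent; it only assumes that the hash codes are SHA-256 digests, that blocks are compared at the same fixed offsets, and that identical files are recognized by a digest of their whole content. The names `block_hashes`, `changed_blocks`, and `SingleCopyStore`, and the tiny block size, are hypothetical and chosen purely for illustration.

```python
import hashlib

# Assumption: a very small block size so the example is easy to follow;
# a practical system would use much larger blocks.
BLOCK_SIZE = 4


def block_hashes(data, block_size=BLOCK_SIZE):
    """Return one hash code per fixed-size block (the last block may be shorter)."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old_hashes, new_data, block_size=BLOCK_SIZE):
    """Differential backup sketch: map block index -> new bytes for every
    block whose hash code differs from the stored one, or that is new."""
    changes = {}
    for index, digest in enumerate(block_hashes(new_data, block_size)):
        if index >= len(old_hashes) or old_hashes[index] != digest:
            start = index * block_size
            changes[index] = new_data[start:start + block_size]
    return changes


class SingleCopyStore:
    """Common file elimination sketch: keep one copy of identical content."""

    def __init__(self):
        self.blobs = {}  # content digest -> file content (stored once)
        self.names = {}  # file name -> content digest

    def backup(self, name, content):
        """Record the file; return True only if its content had to be stored."""
        digest = hashlib.sha256(content).hexdigest()
        is_new = digest not in self.blobs
        if is_new:
            self.blobs[digest] = content
        self.names[name] = digest
        return is_new


# Usage: only the modified second block would need to be transferred.
previous = block_hashes(b"aaaabbbbcccc")
delta = changed_blocks(previous, b"aaaaXbbbcccc")

# Usage: the second file has the same content, so no new copy is stored.
store = SingleCopyStore()
first_stored = store.backup("report.doc", b"same bytes")
second_stored = store.backup("copy of report.doc", b"same bytes")
```

Because this sketch compares blocks only at identical offsets, inserting a byte near the start of a file would make every later block appear changed, and it does not record that a file has become shorter. A practical system would need to handle both cases.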