Data is very important to individuals and businesses. Many businesses regularly back up data stored on computer systems to avoid losing it should a storage device or system fail or become damaged. One current trend is to back up data to disks and to use tapes only for long-term retention. The amount of disk space needed to store a month of backups can be very large, in some examples around 70 terabytes per server in a multi-server computing environment. The amount of data is likely to keep increasing.
One strategy for backing up data is performed as the backup data is copied from a storage device. It involves backing up only data that has changed, as opposed to all of the data, and then using prior backups of the unchanged data to reconstruct the backed-up data if needed. In one approach, data may be divided into fixed-size chunks. An MD5 hash or a SHA-256 hash may be calculated over the data in each chunk, based on logical or natural boundaries of the data, resulting in a signature for each chunk. The signature may be searched against an in-memory database or an embedded database of previous signatures. The next time the data is backed up, signatures are generated for the chunks and searched against the database of signatures to find duplicates: a chunk whose signature is already present has not changed and need not be stored again. However, this strategy is applied to only a single volume of a single computing device, and it is performed as the data is backed up, which increases the time needed to take a backup.
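The chunk-and-signature approach described above can be illustrated with a minimal sketch. It is not taken from any particular backup product: the 4096-byte chunk size, the function names, and the use of a plain dictionary as the in-memory signature database are all assumptions made for illustration. SHA-256 is used, and MD5 could be substituted by changing the algorithm name.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical chunk size; the text does not specify one


def chunk_signatures(data: bytes, chunk_size: int = CHUNK_SIZE,
                     algorithm: str = "sha256"):
    """Yield (chunk, signature) for each fixed-size chunk of the data."""
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        yield chunk, hashlib.new(algorithm, chunk).hexdigest()


def backup(data: bytes, store: dict, chunk_size: int = CHUNK_SIZE):
    """Store only chunks whose signature is not already in the database.

    `store` stands in for the in-memory signature database and maps each
    signature to its chunk. Returns the ordered list of signatures needed
    to rebuild the data, plus the number of chunks newly stored.
    """
    recipe = []
    new_chunks = 0
    for chunk, signature in chunk_signatures(data, chunk_size):
        if signature not in store:  # signature lookup finds duplicates
            store[signature] = chunk
            new_chunks += 1
        recipe.append(signature)
    return recipe, new_chunks


def restore(recipe, store: dict) -> bytes:
    """Rebuild the backed-up data from previously stored chunks."""
    return b"".join(store[signature] for signature in recipe)
```

In this sketch, a second backup of mostly unchanged data adds only the chunks whose signatures are new, and `restore` reassembles the data from chunks saved by earlier backups. Note that every chunk is still hashed and looked up while the backup runs, which is the source of the added backup time mentioned above.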