A backup copy of an active database captures the state of the database at a point in time, so that state can be restored if, for example, the database subsequently becomes corrupted or is lost. Deduplication of data can reduce the amount of storage space and bandwidth consumed by backups and restores. With deduplication, after a “chunk” of data is stored, other instances of data that match that chunk are replaced with a reference to the stored chunk of data.
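The chunk-and-reference behavior described above can be sketched as follows. This is a minimal illustration only, assuming fixed-size chunks and an in-memory index keyed by chunk hash; real deduplication systems typically use content-defined chunking and persistent indexes.

```python
import hashlib

CHUNK_SIZE = 4  # deliberately tiny chunks so the example is easy to follow


def dedup_store(data: bytes, store: dict) -> list:
    """Split data into chunks; store new chunks, reference existing ones."""
    refs = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        key = hashlib.sha256(chunk).hexdigest()
        if key not in store:   # first instance of this chunk: store it
            store[key] = chunk
        refs.append(key)       # later instances become references only
    return refs


store = {}
refs1 = dedup_store(b"AAAABBBBCCCC", store)
refs2 = dedup_store(b"AAAABBBBDDDD", store)  # shares two chunks with refs1
# Six chunk references are recorded, but only four unique chunks are stored.
```

Here the second backup's matching chunks consume no additional chunk storage; only the references are recorded, which is the source of the space and bandwidth savings.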
While deduplication reduces storage and bandwidth consumption, it can reduce the efficiency of restore operations, especially as the number of backups increases. For example, as a result of a first backup, data is copied and stored in a first set of disk locations. When a second backup is performed, data that is duplicated in the second backup is replaced with a reference to the matching data in the first backup. Thus, some of the data associated with the second backup is stored in a second set of disk locations, while the rest is stored in the first set of disk locations. In a similar manner, data associated with a third backup may be stored in three sets of disk locations. As the number of backups increases, the data associated with the later backups becomes more and more fragmented.
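The fragmentation progression described above can be modeled with a short sketch. This is a hypothetical simplification in which each backup generation is treated as one set of disk locations: a chunk is written to the generation in which it first appears, and later backups that reference it must read from that older location set.

```python
import hashlib

def backup(chunks, index, generation):
    """Record where each chunk of this backup lives and return the set of
    location groups (backup generations) the backup's data spans."""
    locations = set()
    for chunk in chunks:
        key = hashlib.sha256(chunk).hexdigest()
        if key not in index:
            index[key] = generation  # new data lands in this backup's locations
        locations.add(index[key])    # duplicates stay where first written
    return locations


index = {}
b1 = backup([b"A", b"B", b"C"], index, generation=1)
b2 = backup([b"A", b"B", b"D"], index, generation=2)
b3 = backup([b"A", b"D", b"E"], index, generation=3)
# b1 spans one location set, b2 spans two, and b3 spans three: restoring
# the third backup must read from three sets of disk locations.
```

Each additional generation that a restore must touch adds seek and read overhead, which is why restore throughput degrades as the backup count grows.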
As a result, data throughput during a restore decreases as the number of backups increases. For the same set of files, the restore throughput of the first, 10th, 20th, and 30th backups can be 500, 300, 213, and 156 megabytes/second (MB/sec), respectively, for example.
Restores may be performed routinely for a variety of reasons, such as compliance. Consequently, a technique that improves restore throughput while maintaining the benefits of deduplication would be advantageous.