Systems that perform backups of data often make use of a fingerprint index and deduplication. In deduplication, a fingerprint is derived for each arriving data segment and compared to the fingerprints in the fingerprint index. If the fingerprint of the arriving data segment matches a fingerprint in the fingerprint index, the arriving data segment is discarded, since the match indicates that a copy of that data segment is already stored in backup storage. If no match is found, the arriving data segment is stored in backup storage and its fingerprint is added to the fingerprint index. Over time, and over many backup operations, the data segments of a backup image can end up stored in many different containers. In a restore operation, the data segments must be retrieved from these containers in accordance with the appropriate backup image, which references the containers and the segments. Because the data segments are scattered, the restore operation may be very time-consuming.
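The deduplication flow described above can be illustrated with a minimal sketch. It is not taken from any particular backup product: the class name `DedupStore`, the use of SHA-256 as the fingerprint function, the in-memory containers, and the fixed `container_capacity` are all assumptions made for illustration only.

```python
import hashlib


class DedupStore:
    """Toy in-memory deduplicating backup store (hypothetical, illustrative only)."""

    def __init__(self, container_capacity=4):
        # Segments per container; an assumed, arbitrary limit.
        self.container_capacity = container_capacity
        self.containers = [[]]        # each container is a list of segments
        self.fingerprint_index = {}   # fingerprint -> (container id, slot)

    def _store_segment(self, segment):
        # Open a new container when the current one is full.
        if len(self.containers[-1]) >= self.container_capacity:
            self.containers.append([])
        container_id = len(self.containers) - 1
        self.containers[container_id].append(segment)
        return container_id, len(self.containers[container_id]) - 1

    def backup(self, segments):
        """Deduplicate the segments and return a backup image (list of fingerprints)."""
        image = []
        for segment in segments:
            # SHA-256 stands in for the fingerprint function (an assumption).
            fingerprint = hashlib.sha256(segment).hexdigest()
            if fingerprint not in self.fingerprint_index:
                # No match: store the segment and add its fingerprint to the index.
                self.fingerprint_index[fingerprint] = self._store_segment(segment)
            # On a match the arriving segment is discarded; the image only references it.
            image.append(fingerprint)
        return image

    def restore(self, image):
        """Rebuild the data and report how many distinct containers were read."""
        data, touched = [], set()
        for fingerprint in image:
            container_id, slot = self.fingerprint_index[fingerprint]
            touched.add(container_id)
            data.append(self.containers[container_id][slot])
        return data, len(touched)


store = DedupStore(container_capacity=2)
first_image = store.backup([b"a", b"b", b"c", b"d"])
second_image = store.backup([b"a", b"x", b"c", b"y"])
restored, containers_read = store.restore(second_image)
```

In this toy run the second backup stores only two new segments, yet restoring it reads from three containers, because its deduplicated segments remain in the containers written by the first backup.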
Deduplication systems tend to reduce backup storage by discarding as many duplicate data segments as possible, so over time the stored segments of a backup image tend to become scattered across the whole storage system. Data locality is desirable for a data image restore or a data image tape-out, because for a backup image with poor data locality a large amount of disk I/O (input/output) time is spent in disk track seeking. It would be desirable for restoring a backup image from a year ago to have the same performance as restoring the most recent backup image. Therefore, there is a need in the art for a solution which overcomes the drawbacks described above.