The storage of a computer may be backed up using a backup system. This may be done repeatedly over many days, with one backup taken per day, so that data loss discovered only after some time has passed can still be recovered. While some backups may be “fulls”, where a complete copy of the storage is made, others may be “incrementals”, which save only the files/sections that have been modified since the last backup. Although an incremental is not a complete copy of the storage, when combined with the last full and any incrementals in between, it can be used to restore the contents of the storage as of the time the incremental was taken.
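The way an incremental combines with the last full can be illustrated with a minimal sketch. It assumes each backup is a simple mapping from file path to contents and that deletions are not tracked; the function name `restore` and the sample data are hypothetical.

```python
def restore(full, incrementals):
    """Rebuild storage contents from the last full and later incrementals.

    `full` maps path -> contents. Each incremental maps only the paths
    modified since the previous backup (assumption: deletions not modeled).
    """
    state = dict(full)
    for inc in incrementals:  # must be applied oldest first
        state.update(inc)
    return state


full = {"a.txt": "v1", "b.txt": "v1"}
tuesday = {"a.txt": "v2"}
wednesday = {"b.txt": "v2", "c.txt": "v1"}
restored = restore(full, [tuesday, wednesday])
```

Restoring to Tuesday would pass only `[tuesday]`; restoring to Wednesday requires both incrementals, in order.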
In order to reduce the amount of storage required by the backups, the backup system may deduplicate the backups. It may do this by breaking the backups into small pieces (~4-12 KB) called chunks and keeping only one copy of each unique chunk. By saving only the unique chunks plus backup recipes (instructions for reconstructing a backup from the set of unique chunks), the backup system can use orders of magnitude less storage to store any given set of backups.
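Deduplication with recipes might be sketched as follows. This is an illustration under stated assumptions, not a description of any particular backup system: chunks are cut at a fixed 8 KB size, SHA-256 digests identify chunks, and the names `dedup_backup`, `rebuild`, and `store` are hypothetical.

```python
import hashlib

CHUNK_SIZE = 8 * 1024  # assumed fixed size within the ~4-12 KB range


def dedup_backup(data, store):
    """Add a backup's unique chunks to `store` and return its recipe."""
    recipe = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # keep one copy per unique chunk
        recipe.append(digest)
    return recipe


def rebuild(recipe, store):
    """Reassemble a backup by reading its chunks in recipe order."""
    return b"".join(store[d] for d in recipe)


store = {}
monday = b"A" * CHUNK_SIZE + b"B" * CHUNK_SIZE
tuesday = b"A" * CHUNK_SIZE + b"C" * CHUNK_SIZE
r1 = dedup_backup(monday, store)
r2 = dedup_backup(tuesday, store)
```

The two backups contain four chunks in total, but the store holds only three because the shared first chunk is kept once; each recipe is still enough to rebuild its backup in full.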
When it comes time to retrieve a backup for use in restoring the computer's storage, the various unique chunks making up that backup must be read and assembled in order. With many modern storage technologies (e.g., hard disk drives), the speed of this process depends heavily on how fragmented the backup's chunks are. More precisely, every time the read process must switch to reading from a different part of the backup storage, it may pay a random seek penalty (~10 ms with current drives). If there is no locality in where a backup's chunks are located (e.g., 76, 12, 34, 224, 103, 876, ... rather than 76, 77, 78, 224, 225, 226, ...), then restore speed may be quite slow.
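The effect of locality can be made concrete by counting seeks over the two example location sequences. The sketch assumes a seek is paid whenever a chunk does not immediately follow the previously read location, that the first read also counts as a seek, and that each seek costs a flat 10 ms; `count_seeks` is a hypothetical helper.

```python
SEEK_MS = 10  # approximate random seek penalty, assumed flat per seek


def count_seeks(locations):
    """Count reads that do not directly follow the previous location."""
    seeks = 0
    prev = None
    for loc in locations:
        if prev is None or loc != prev + 1:
            seeks += 1
        prev = loc
    return seeks


scattered = [76, 12, 34, 224, 103, 876]
clustered = [76, 77, 78, 224, 225, 226]
scattered_ms = count_seeks(scattered) * SEEK_MS  # 6 seeks -> 60 ms
clustered_ms = count_seeks(clustered) * SEEK_MS  # 2 seeks -> 20 ms
```

Under these assumptions the scattered layout pays one seek per chunk, while the clustered layout pays one seek per run of adjacent chunks, so the gap widens as runs grow longer.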
Prior art stores new unique chunks sequentially. While this places all the new chunks from a given backup together, it does not place them next to the old chunks from that backup. If a backup is taken every day, then the data that was new on each day is located together. Unfortunately, many files such as log files and draft documents change a little bit each day, which results in their chunks being scattered across the backup system's storage. Sharing of chunks between different backups can also result in chunks being placed in a sub-optimal location from the perspective of a given backup. Accordingly, many users complain that restoring a computer's storage from a deduplicated backup is a sluggish and time-consuming process.
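How sequential placement scatters a slowly changing backup can be seen in a small simulation. It is a sketch only: chunk labels such as `"log-3b"` stand in for chunk contents, storage is modeled as an append-only list, and `append_backup` is a hypothetical name.

```python
def append_backup(chunks, index, log):
    """Store new unique chunks at the end of `log`; return chunk locations."""
    locations = []
    for chunk in chunks:
        if chunk not in index:
            index[chunk] = len(log)  # next sequential slot
            log.append(chunk)
        locations.append(index[chunk])
    return locations


index, log = {}, []
day1 = ["log-1", "log-2", "log-3", "doc-1", "doc-2"]
day2 = ["log-1", "log-2", "log-3b", "doc-1", "doc-2b"]
day3 = ["log-1", "log-2b", "log-3b", "doc-1b", "doc-2b"]
loc1 = append_backup(day1, index, log)  # [0, 1, 2, 3, 4]
loc2 = append_backup(day2, index, log)  # [0, 1, 5, 3, 6]
loc3 = append_backup(day3, index, log)  # [0, 7, 5, 8, 6]
```

The first backup is laid out contiguously, but by the third day no two consecutive chunks of the backup are adjacent in storage, even though only a few chunks changed each day.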