1. Field of the Invention
The present invention relates to a method and system for backing up data from a data source, and particularly to using metadata to provide an efficient and cost-effective incremental backup.
2. Related Art
Backing up data from one or more computer disks is typically performed to recover from inadvertent user deletions/overwrites or from disk hardware failure. In the case of an inadvertent user deletion/overwrite, only the destroyed data is copied from the backup to the original disk. In the case of disk hardware failure, the user can restore all files to the original disk from the most recent backup. In most computer systems, the backup device is a tape drive, which can accommodate large amounts of data at a relatively low cost per byte of storage.
Generally, conventional backup methods provide for either file-by-file backup or image backup. In a file-by-file backup, the backup program copies one file at a time from the disk to the tape. Specifically, the program places all pieces of data for each file, irrespective of their actual locations on the disk, into a single sequential block that is stored on the tape. Because each file is handled as a unit, a file-by-file backup can easily provide an incremental backup, wherein only those files that have been modified or added since the last backup are written to tape. However, a file-by-file backup fails to ensure that all changes to the files are noted. Specifically, the file-by-file backup fails to indicate removes (wherein a file has been deleted), renames (wherein a file has been renamed), or links (wherein a file, such as an email, includes pointers to other files, e.g. other mailboxes). A file-by-file backup can also be slow because files are written to tape in file order, not disk order.
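The limitation described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that an incremental file-by-file backup selects files by comparing each file's modification time against the time of the last backup; the function names `files_modified_since` and `demo`, the file names, and the timestamps are hypothetical and are not taken from any particular backup program.

```python
import os
import tempfile


def files_modified_since(root, last_backup_time):
    """Return relative paths under root whose modification time is later
    than last_backup_time (the assumed selection rule for this sketch)."""
    selected = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.stat(path).st_mtime > last_backup_time:
                selected.append(os.path.relpath(path, root))
    return sorted(selected)


def demo():
    # Hypothetical timestamps (seconds since the epoch).
    before, backup_time, after = 1_000_000_000, 1_000_000_100, 1_000_000_200
    with tempfile.TemporaryDirectory() as root:
        # Three files that were last modified before the last backup.
        for name in ("a.txt", "b.txt", "c.txt"):
            path = os.path.join(root, name)
            with open(path, "w") as f:
                f.write("original")
            os.utime(path, (before, before))

        # Changes made after the last backup:
        a_path = os.path.join(root, "a.txt")
        with open(a_path, "w") as f:
            f.write("modified")                      # a.txt is modified
        os.utime(a_path, (after, after))
        os.remove(os.path.join(root, "b.txt"))       # b.txt is removed
        os.rename(os.path.join(root, "c.txt"),
                  os.path.join(root, "d.txt"))       # c.txt is renamed

        return files_modified_since(root, backup_time)


if __name__ == "__main__":
    print(demo())
```

In this sketch only the modified file is selected. The removal of `b.txt` leaves nothing to copy, and the renamed file keeps its earlier modification time, so neither change is reflected in the incremental backup.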
In an image backup, the data image is read sequentially from the disk and written to the tape. Because disk order (not file order) is used, an image backup can be significantly faster than a file-by-file backup. Image backups have most often been used for full backups only. Image incremental backups exist today, but are based on block-change lists. That is, an additional software layer, at the file system layer or at the device driver layer, must track changes to the underlying storage on a per-block basis. Typically, when a portion of a file is re-written, the data can be written directly over the old data.
In systems that provide image incremental backups, the additional software to track changes must be enabled. This software, at a minimum, must track which portions of the file system or storage have been re-written. This usually involves updating a map or a list that tracks which blocks have been re-written. Thus, every write operation requires at least two writes: one write to update the change list or map and another write to write the data. Therefore, this method adds at least 100% overhead for writes on systems that enable image incremental backups. Note that some implementations require more than two writes, thereby further increasing the overhead. To perform an image incremental backup, these systems read the list of changed blocks, and then copy each changed block from the disk to the tape.
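The write overhead of a block-change list can be sketched as follows. This is a simplified model rather than any actual file system or device driver layer: the class `TrackedDisk`, its counter `physical_writes`, and the block size are hypothetical, and the model assumes that each change-list update costs exactly one physical write.

```python
class TrackedDisk:
    """Toy block device that keeps a change list for incremental image backups."""

    def __init__(self, num_blocks, block_size=4):
        self.blocks = [bytes(block_size)] * num_blocks
        self.changed = set()        # change list: numbers of re-written blocks
        self.physical_writes = 0    # writes issued to storage (modeling assumption)

    def write(self, block_no, data):
        # Write 1: update the change list so the backup can find this block.
        self.changed.add(block_no)
        self.physical_writes += 1
        # Write 2: write the data itself over the old data.
        self.blocks[block_no] = data
        self.physical_writes += 1

    def incremental_backup(self):
        """Copy each changed block (in disk order) and reset the change list."""
        image = {n: self.blocks[n] for n in sorted(self.changed)}
        self.changed.clear()
        return image


if __name__ == "__main__":
    disk = TrackedDisk(num_blocks=8)
    disk.write(2, b"AAAA")
    disk.write(5, b"BBBB")
    disk.write(2, b"CCCC")
    print(disk.physical_writes)        # three logical writes, six physical writes
    print(disk.incremental_backup())   # only blocks 2 and 5 are copied
```

Under these assumptions, three logical writes cost six physical writes, which corresponds to the 100% overhead noted above, while the incremental backup itself copies only the two changed blocks.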
Therefore, a need arises for a system that provides quick image incremental backups, without requiring the additional overhead of updating a change list or map.