1. The Field of the Invention
The present invention relates to computing backup and restore technology; and more specifically, to mechanisms for generating an incremental backup of a partial volume, and for performing the backup of the same.
2. Background and Related Art
Computing technology has transformed the way we work and play. Businesses, residences, and other enterprises have come to rely on computing systems to manage their key operational data. Often, the data itself is many times more valuable to an enterprise than the computing hardware that stores the data. Accordingly, in this information age, many enterprises have taken precautions to protect their data.
One way of protecting data is to introduce storage redundancy. For example, a primary computing system maintains and operates upon the active data, also referred to herein as a “live volume”. A volume is a logical group of data blocks (e.g., sectors on a disk) that are set aside for use by a file system. On desktop systems, a volume is usually equivalent to a disk partition.
At a particular point in time, the primary computing system captures the current state of the active data. The process of capturing the current state of active data on the primary computing system is often referred to as taking a “snapshot” of the active data. While there may be a variety of ways of taking a snapshot of the active data, one example will now be described. In the example, from the point of the snapshot forward, if there is a write to the active data, the data that is about to be overwritten is first copied to another location, and a snapshot table is updated to reflect that the snapshot copy of that portion of the data is in the other location. Thus, the snapshot may be preserved while the primary computing system continues to operate upon the active data. At some point, the data from the snapshot may be backed up to a backup computing system. Hereinafter, the active volume that continues to be operated upon by the system separate and apart from the snapshot may be referred to as the “live volume”. The snapshot may be referred to as the “snapped volume”.
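By way of illustration only, the copy-on-write behavior described above may be sketched as follows. The sketch assumes a volume modeled as an in-memory list of blocks; the class name `CowSnapshot` and its methods are hypothetical and are not drawn from any particular snapshot driver.

```python
class CowSnapshot:
    """Illustrative copy-on-write snapshot over a list of blocks (hypothetical model)."""

    def __init__(self, live_blocks):
        self.live = live_blocks  # the live volume, modified in place by writes
        self.saved = {}          # snapshot table: block index -> data at snapshot time

    def write(self, index, data):
        # On the first write to a block after the snapshot, copy the data that
        # is about to be overwritten to another location and record it.
        if index not in self.saved:
            self.saved[index] = self.live[index]
        self.live[index] = data

    def read_snapshot(self, index):
        # The snapshot copy is in the side table if the block has been
        # overwritten; otherwise the live block is still the snapshot copy.
        return self.saved.get(index, self.live[index])


live = [b"A", b"B", b"C"]
snap = CowSnapshot(live)
snap.write(1, b"X")
snap.write(1, b"Y")  # second write does not disturb the preserved copy
snapped = [snap.read_snapshot(i) for i in range(len(live))]
```

In this sketch the live volume ends with the newest data in block 1, while the snapped volume still presents the contents of that block as of the snapshot time.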
In order to do a full (also called a “base”) backup, the backup process conventionally compiles the snapshot version of all of the used clusters in a file system into a base backup image file. The used data blocks (e.g., sectors or clusters) that need to be included in this backup may be determined from a system bitmap. As used herein, a “bitmap” is a data structure that has one bit for every data block in a volume. A conventional system bitmap has each bit set if the corresponding data block is in use (e.g., is allocated) by the file system, and clear if the corresponding data block is not in use by the file system.
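As a minimal sketch of the base backup just described, the following assumes the bitmap is represented as a list of 0/1 integers and the image as a mapping from block index to block data. The function name `base_backup` and the sample data are hypothetical.

```python
def base_backup(snapped_blocks, system_bitmap):
    """Collect the snapshot copy of every block whose system-bitmap bit is set."""
    return {
        index: snapped_blocks[index]
        for index, in_use in enumerate(system_bitmap)
        if in_use
    }


# Hypothetical five-block volume; blocks 1 and 3 are not allocated.
snapped_volume = [b"boot", b"", b"file1", b"", b"file2"]
system_bitmap = [1, 0, 1, 0, 1]
base_image = base_backup(snapped_volume, system_bitmap)
```

Only blocks 0, 2, and 4 appear in the resulting image, since those are the blocks marked as in use by the file system.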
After a full backup is taken, a snapshot device driver monitors the live volume of the primary computing system and keeps track of each block that has been modified since the last backup. It does this by using what will be referred to herein as a “vdiff” bitmap. When the snapshot is taken for the full backup, the vdiff bitmap has all of its bits initially clear. Until the next snapshot time, if there is a write to the blocks of the live volume, the bit corresponding to the data block being written to is set.
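The tracking performed with the vdiff bitmap may be sketched as follows. This is an illustrative model only; the class `VdiffTracker` and its method names are assumptions, and an actual snapshot device driver would intercept writes at the block-device level rather than through a method call.

```python
class VdiffTracker:
    """Illustrative tracker of blocks written since the last snapshot (hypothetical)."""

    def __init__(self, block_count):
        # All bits are clear when the snapshot for the backup is taken.
        self.bits = [0] * block_count

    def on_write(self, index):
        # Set the bit corresponding to the data block being written to.
        self.bits[index] = 1

    def reset(self):
        # Clear every bit again when the next snapshot is taken.
        self.bits = [0] * len(self.bits)


vdiff = VdiffTracker(block_count=5)
vdiff.on_write(2)
vdiff.on_write(2)  # repeated writes leave the bit set
vdiff.on_write(3)
bits_before_reset = list(vdiff.bits)
vdiff.reset()
```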
When the next incremental backup is taken, only the blocks that have been modified and that are part of the file system are captured. The incremental bitmap specifies which blocks need to be captured. In conventional incremental imaging, the incremental bitmap may be computed by bit-wise ANDing the system bitmap with the vdiff bitmap. The full backup corresponding to the time that the incremental image is taken can be reconstructed by accessing blocks in the incremental image, and if they are not present in the incremental image, accessing blocks from the base image.
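The conventional computation described above, together with the reconstruction of a block from the incremental and base images, may be sketched as follows. The sketch assumes bitmaps held as lists of 0/1 integers and images held as mappings from block index to block data; all function names and sample values are hypothetical.

```python
def incremental_bitmap(system_bitmap, vdiff_bitmap):
    """Bit-wise AND of the system bitmap with the vdiff bitmap."""
    return [s & v for s, v in zip(system_bitmap, vdiff_bitmap)]


def incremental_backup(snapped_blocks, inc_bitmap):
    """Capture only the blocks whose incremental-bitmap bit is set."""
    return {i: snapped_blocks[i] for i, bit in enumerate(inc_bitmap) if bit}


def read_block(index, incremental_image, base_image):
    """Prefer the incremental image; fall back to the base image."""
    if index in incremental_image:
        return incremental_image[index]
    return base_image.get(index)  # None if the block was never backed up


# Hypothetical state: block 1 was written but is not allocated, and
# block 2 is both allocated and modified since the base backup.
system_bitmap = [1, 0, 1, 1]
vdiff_bitmap = [0, 1, 1, 0]
base_image = {0: b"boot", 2: b"file1", 3: b"file2"}
snapped_now = [b"boot", b"tmp", b"file1-v2", b"file2"]

inc_bitmap = incremental_bitmap(system_bitmap, vdiff_bitmap)
inc_image = incremental_backup(snapped_now, inc_bitmap)
restored = [read_block(i, inc_image, base_image) for i in range(4)]
```

Here only block 2 is captured in the incremental image, because it is the only block that is both in use by the file system and modified since the base backup. The remaining allocated blocks are read from the base image.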
However, it is often not necessary or desirable to back up all files on a system. Some files are simply not a high priority for backing up. Excluding these files from a backup can reduce the size of the backup, as well as the time that it takes to create or restore the backup. A good example of files that do not need to be backed up is a user's temporary Internet files, which serve as a cache of files visited recently. Generally there is no need to back them up, and because this cache can be large and changes often, eliminating these files can significantly reduce the size of base and incremental images. Another example would be the WINDOWS recycle bin, which also contains old files that the user should not need to back up.
Accordingly, what would be advantageous are mechanisms that permit base and incremental images to be taken while allowing certain files to be excluded from the base and incremental images.